Instructions for using ibm-granite/granite-switch-4.1-8b-preview with libraries and local apps:
- Libraries
- Transformers
How to use ibm-granite/granite-switch-4.1-8b-preview with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ibm-granite/granite-switch-4.1-8b-preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load the model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-switch-4.1-8b-preview", dtype="auto")
```
- Local Apps
- vLLM
How to use ibm-granite/granite-switch-4.1-8b-preview with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (this command keeps running):
vllm serve "ibm-granite/granite-switch-4.1-8b-preview"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ibm-granite/granite-switch-4.1-8b-preview",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
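For calling the server from Python rather than curl, a minimal standard-library sketch follows. It sends the same JSON body as the curl command to the OpenAI-compatible `/v1/chat/completions` route on vLLM's default port 8000. `build_chat_request` and `chat` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "ibm-granite/granite-switch-4.1-8b-preview"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat completions route.

    Hypothetical helper: mirrors the JSON body of the curl example.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = "http://localhost:8000", timeout: float = 120.0) -> dict:
    """Send the request to a running server and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With `vllm serve` running, `chat("What is the capital of France?")` returns the parsed response dictionary. Changing `base_url` points the same helper at another OpenAI-compatible server.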
- SGLang
How to use ibm-granite/granite-switch-4.1-8b-preview with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (this command keeps running):
python3 -m sglang.launch_server \
  --model-path "ibm-granite/granite-switch-4.1-8b-preview" \
  --host 0.0.0.0 \
  --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ibm-granite/granite-switch-4.1-8b-preview",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ibm-granite/granite-switch-4.1-8b-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server with the same curl command as in the pip example (port 30000).
```
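The curl calls return an OpenAI-style chat completion object, where the assistant's text sits at `choices[0].message.content`. The sketch below extracts it from an already-decoded response. `first_reply` is a hypothetical helper, and the sample dictionary is hand-written to show the response shape, not captured server output.

```python
from typing import Any


def first_reply(response: dict[str, Any]) -> str:
    """Return the assistant text of the first choice in an OpenAI-style chat completion."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    content = (choices[0].get("message") or {}).get("content")
    if not isinstance(content, str):
        raise ValueError("first choice has no text content")
    return content


# Hand-written example of the response shape (not real server output).
example = {"choices": [{"index": 0, "message": {"role": "assistant", "content": "Paris"}}]}
print(first_reply(example))  # Paris
```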
- Docker Model Runner
How to use ibm-granite/granite-switch-4.1-8b-preview with Docker Model Runner:
```shell
docker model run hf.co/ibm-granite/granite-switch-4.1-8b-preview
```
Granite Switch Composed Model
Base Model
- Identifier: ibm-granite/granite-4.1-8b
- Model type: granite
- Architectures: GraniteForCausalLM
- Hidden size: 4096
- Hidden layers: 40
- Attention heads: 32
- Vocab size: 100352
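The base-model figures can be captured in a small config object, which also makes the derived per-head width explicit (4096 / 32 = 128). This is a sketch: the field names follow common Hugging Face `config.json` keys and are an assumption about this checkpoint's config, and `head_dim` assumes the hidden size is split evenly across attention heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseModelConfig:
    """Base-model dimensions copied from the list above (ibm-granite/granite-4.1-8b)."""

    hidden_size: int = 4096
    num_hidden_layers: int = 40
    num_attention_heads: int = 32
    vocab_size: int = 100352

    @property
    def head_dim(self) -> int:
        """Width of one attention head, assuming an even split of hidden_size across heads."""
        if self.hidden_size % self.num_attention_heads:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads

    @property
    def embedding_params(self) -> int:
        """Size of a vocab_size x hidden_size token-embedding matrix."""
        return self.vocab_size * self.hidden_size


cfg = BaseModelConfig()
print(cfg.head_dim)  # 128
```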
Embedded Adapters
Total adapters: 12
| # | Name | Technology | Control Token | Token ID | Rank | Alpha | Target Modules | Source |
|---|---|---|---|---|---|---|---|---|
| 1 | citations | lora | `<\|citations\|>` | 100352 | 16 | 32 | k_proj, o_proj, q_proj, v_proj | ibm-granite/granitelib-rag-r1.0 |
| 2 | query_rewrite | alora | `<\|query_rewrite\|>` | 100353 | 32 | 32 | down_proj, gate_proj, k_proj, o_proj, q_proj, up_proj, v_proj | ibm-granite/granitelib-rag-r1.0 |
| 3 | query_clarification | alora | `<\|query_clarification\|>` | 100354 | 32 | 64 | down_proj, gate_proj, k_proj, o_proj, q_proj, up_proj, v_proj | ibm-granite/granitelib-rag-r1.0 |
| 4 | hallucination_detection | lora | `<\|hallucination_detection\|>` | 100355 | 16 | 32 | down_proj, gate_proj, k_proj, o_proj, q_proj, up_proj, v_proj | ibm-granite/granitelib-rag-r1.0 |
| 5 | answerability | alora | `<\|answerability\|>` | 100356 | 16 | 32 | down_proj, gate_proj, k_proj, o_proj, q_proj, up_proj, v_proj | ibm-granite/granitelib-rag-r1.0 |
| 6 | factuality-detection | alora | `<\|factuality-detection\|>` | 100357 | 32 | 64 | k_proj, q_proj, v_proj | ibm-granite/granitelib-guardian-r1.0 |
| 7 | policy-guardrails | alora | `<\|policy-guardrails\|>` | 100358 | 16 | 32 | k_proj, o_proj, q_proj, v_proj | ibm-granite/granitelib-guardian-r1.0 |
| 8 | factuality-correction | alora | `<\|factuality-correction\|>` | 100359 | 32 | 64 | k_proj, q_proj, v_proj | ibm-granite/granitelib-guardian-r1.0 |
| 9 | guardian-core | alora | `<\|guardian-core\|>` | 100360 | 16 | 64 | k_proj, o_proj, q_proj, v_proj | ibm-granite/granitelib-guardian-r1.0 |
| 10 | uncertainty | alora | `<\|uncertainty\|>` | 100361 | 32 | 64 | down_proj, gate_proj, k_proj, o_proj, q_proj, up_proj, v_proj | ibm-granite/granitelib-core-r1.0 |
| 11 | requirement-check | alora | `<\|requirement-check\|>` | 100362 | 16 | 64 | k_proj, o_proj, q_proj, v_proj | ibm-granite/granitelib-core-r1.0 |
| 12 | context-attribution | lora | `<\|context-attribution\|>` | 100363 | 16 | 32 | k_proj, q_proj, v_proj | ibm-granite/granitelib-core-r1.0 |
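The table can be loaded into a small registry for lookups by control token. The sketch below copies the rows (target modules omitted) and makes two properties of the table easy to check: the token IDs run contiguously from 100352, which equals the base vocab size and suggests the control tokens are appended after the base vocabulary, and 3 adapters use `lora` while 9 use `alora`. `Adapter`, `BY_CONTROL_TOKEN`, and `lora_params` are hypothetical names; `lora_params` applies the standard low-rank count rank × (d_in + d_out) per adapted matrix.

```python
from collections import Counter
from typing import NamedTuple

RAG = "ibm-granite/granitelib-rag-r1.0"
GUARDIAN = "ibm-granite/granitelib-guardian-r1.0"
CORE = "ibm-granite/granitelib-core-r1.0"


class Adapter(NamedTuple):
    """One row of the Embedded Adapters table (target modules omitted for brevity)."""

    name: str
    technology: str  # "lora" or "alora", copied from the table
    token_id: int
    rank: int
    alpha: int
    source: str

    @property
    def control_token(self) -> str:
        # Control tokens in the table have the form <|name|>.
        return f"<|{self.name}|>"


ADAPTERS = [
    Adapter("citations", "lora", 100352, 16, 32, RAG),
    Adapter("query_rewrite", "alora", 100353, 32, 32, RAG),
    Adapter("query_clarification", "alora", 100354, 32, 64, RAG),
    Adapter("hallucination_detection", "lora", 100355, 16, 32, RAG),
    Adapter("answerability", "alora", 100356, 16, 32, RAG),
    Adapter("factuality-detection", "alora", 100357, 32, 64, GUARDIAN),
    Adapter("policy-guardrails", "alora", 100358, 16, 32, GUARDIAN),
    Adapter("factuality-correction", "alora", 100359, 32, 64, GUARDIAN),
    Adapter("guardian-core", "alora", 100360, 16, 64, GUARDIAN),
    Adapter("uncertainty", "alora", 100361, 32, 64, CORE),
    Adapter("requirement-check", "alora", 100362, 16, 64, CORE),
    Adapter("context-attribution", "lora", 100363, 16, 32, CORE),
]

# Lookup from control-token text to its adapter row.
BY_CONTROL_TOKEN = {a.control_token: a for a in ADAPTERS}
TECHNOLOGY_COUNTS = Counter(a.technology for a in ADAPTERS)


def lora_params(rank: int, d_in: int, d_out: int, layers: int) -> int:
    """Parameters of one low-rank pair (A: rank x d_in, B: d_out x rank) applied in every layer."""
    return rank * (d_in + d_out) * layers
```

For example, assuming `q_proj` maps 4096 to 4096 (32 heads × 128), the `citations` adapter's `q_proj` pair would add `lora_params(16, 4096, 4096, 40)`, or 5,242,880 parameters across the 40 layers. The table does not list projection shapes, so this figure is illustrative and is not meant to reproduce the composed parameter delta.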
Composition Details
- base_param_count: 8,380,551,168
- composed_param_count: 9,568,112,640
- Param delta: +14.17%
- compose_settings:
  - adapter_substitute_token_ids: [100264, 100264, 100264, 100264, 100264, 27, 27, 27, 27, 27, 27, 100264]
  - target_model: "granite-4.1-8b"
  - adapter_sources:
    - "ibm-granite/granitelib-rag-r1.0": "2f0b2c79c6731068625aca8045c2eb2e8912b353"
    - "ibm-granite/granitelib-guardian-r1.0": "773b254e98f993a605ec4b6259634906e0e64e8e"
    - "ibm-granite/granitelib-core-r1.0": "d0a2a96a4cd07e96f0fe7ca29a42bfe088299d43"
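The composition figures lend themselves to quick checks. The sketch below recomputes the parameter delta from the two counts (1,187,561,472 added parameters, about 14.17% of the base) and treats each `adapter_sources` value as a pinned commit hash of the source repository on the Hugging Face Hub. That reading of the hashes is an assumption, as are the helper names `param_delta`, `validate_sources`, and `fetch_pinned_sources`; the last one calls `huggingface_hub.snapshot_download`, which accepts `repo_id` and `revision`.

```python
import re

# Values copied from the Composition Details above.
BASE_PARAM_COUNT = 8_380_551_168
COMPOSED_PARAM_COUNT = 9_568_112_640
ADAPTER_SOURCES = {
    "ibm-granite/granitelib-rag-r1.0": "2f0b2c79c6731068625aca8045c2eb2e8912b353",
    "ibm-granite/granitelib-guardian-r1.0": "773b254e98f993a605ec4b6259634906e0e64e8e",
    "ibm-granite/granitelib-core-r1.0": "d0a2a96a4cd07e96f0fe7ca29a42bfe088299d43",
}

# A full git commit hash: exactly 40 lowercase hexadecimal characters.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def param_delta(base: int, composed: int) -> tuple[int, float]:
    """Return (parameters added by composition, increase as a percentage of base)."""
    added = composed - base
    return added, 100.0 * added / base


def validate_sources(sources: dict[str, str]) -> None:
    """Raise ValueError if any revision is not a full commit hash (assumed pinning scheme)."""
    for repo_id, revision in sources.items():
        if not _COMMIT_SHA.fullmatch(revision):
            raise ValueError(f"{repo_id}: revision {revision!r} is not a full commit hash")


def fetch_pinned_sources(sources: dict[str, str]) -> dict[str, str]:
    """Download each source repository at its pinned revision; return local paths."""
    # Imported lazily so the checks above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    validate_sources(sources)
    return {
        repo_id: snapshot_download(repo_id=repo_id, revision=revision)
        for repo_id, revision in sources.items()
    }


added, pct = param_delta(BASE_PARAM_COUNT, COMPOSED_PARAM_COUNT)
print(added, f"+{pct:.2f}%")  # 1187561472 +14.17%
```

Pinning to a full commit hash rather than a branch name keeps the fetched adapters fixed even if the source repositories are updated later.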