antonpuz committed 14 days ago

Commit c739556 (verified) · 1 parent: 6cdcd19

Updating GraniteSwitch 4.1 8b

Files changed (13)

BUILD.md +16 -5
chat_template.jinja +19 -8
compose_report.json +1 -1
config.json +13 -71
io_configs/factuality-correction/io.yaml +8 -2
io_configs/factuality-detection/io.yaml +8 -2
io_configs/guardian-core/io.yaml +6 -1
io_configs/policy-guardrails/io.yaml +13 -1
io_configs/requirement-check/io.yaml +3 -1
io_configs/uncertainty/io.yaml +2 -2
model-00001-of-00004.safetensors +2 -2
model-00002-of-00004.safetensors +2 -2
model.safetensors.index.json +3 -4

BUILD.md CHANGED

@@ -35,10 +35,21 @@ Total adapters: **12**
 - composed_param_count: 9,568,112,640
 - Param delta: +14.17%
 - compose_settings:
-  - exclude_adapters:
-    - "context_relevance"
   - target_model: "granite-4.1-8b"
 - adapter_sources:
-  - "ibm-granite/granitelib-rag-r1.0": "6e4a75e35f1cb272e8d15b4615fb0a123398d1cf"
-  - "ibm-granite/granitelib-guardian-r1.0": "882ccf11cf1e4cdc3a66044f17872e55078dbc85"
-  - "ibm-granite/granitelib-core-r1.0": "8f78babf3f0d5baba230464838050a71fe59dee5"

 - composed_param_count: 9,568,112,640
 - Param delta: +14.17%
 - compose_settings:
+  - adapter_substitute_token_ids:
+    - 100264
+    - 100264
+    - 100264
+    - 100264
+    - 100264
+    - 27
+    - 27
+    - 27
+    - 27
+    - 27
+    - 27
+    - 100264
   - target_model: "granite-4.1-8b"
 - adapter_sources:
+  - "ibm-granite/granitelib-rag-r1.0": "2f0b2c79c6731068625aca8045c2eb2e8912b353"
+  - "ibm-granite/granitelib-guardian-r1.0": "773b254e98f993a605ec4b6259634906e0e64e8e"
+  - "ibm-granite/granitelib-core-r1.0": "d0a2a96a4cd07e96f0fe7ca29a42bfe088299d43"

chat_template.jinja CHANGED

@@ -38,7 +38,8 @@
                        adapter_token=adapter_token,
                        adapter_type=adapter_type,
                        adapter_invocation_text=adapter_invocation_text,
-                       alora_target_idx=-1
                        ) %}
 {%- if tools %}
     {%- for tool in tools %}
@@ -59,6 +60,7 @@
 {#- For lora adapters: insert activation token at the very beginning -#}
 {%- if ns.adapter_token and ns.adapter_type == 'lora' %}
 {{- ns.adapter_token }}
 {%- endif %}
 {%- if messages[0].role == 'system' %}
@@ -91,7 +93,8 @@
     {%- endif %}
 {%- endif %}
 {%- if ns.system_message %}
-    {{- '<|start_of_role|>system<|end_of_role|>' + ns.system_message + '<|end_of_text|>\n' }}
 {%- endif %}
 {#- ALoRA Pass 1: find the last user message containing the invocation text.
      ns.alora_target_idx stays -1 when the invocation sequence is the assistant role
@@ -129,17 +132,22 @@
             {%- endfor %}
         {%- endif %}
     {%- endif %}
-        {#- ALoRA Pass 2: inject activation token before invocation text in the target message -#}
     {%- if loop.index0 == ns.alora_target_idx %}
         {%- set _parts = content.val.rsplit(ns.adapter_invocation_text, 1) %}
         {%- if _parts | length > 1 %}
-            {%- set content.val = _parts[0] + ns.adapter_token + ns.adapter_invocation_text + _parts[1] %}
         {%- endif %}
     {%- endif %}
 {%- if (message.role == 'user') or (message.role == 'system' and not loop.first) %}
-        {{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val + '<|end_of_text|>\n' }}
     {%- elif message.role == 'assistant' %}
-        {{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val }}
         {%- if message.tool_calls %}
             {%- for tool_call in message.tool_calls %}
                 {%- if (loop.first and content.val) or (not loop.first) %}
@@ -162,7 +170,8 @@
         {{- '<|end_of_text|>\n' }}
     {%- elif message.role == 'tool' %}
         {%- if loop.first or (messages[loop.index0 - 1].role != 'tool') %}
-            {{- '<|start_of_role|>user<|end_of_role|>' }}
         {%- endif %}
         {{- '\n<tool_response>\n' }}
         {{- content.val }}
@@ -178,7 +187,9 @@
      role token boundary rather than inside a user message. -#}
 {%- if ns.adapter_token and ns.adapter_type == 'alora' and ns.alora_target_idx == -1 %}
 {{- ns.adapter_token }}
 {%- endif %}
 {%- if add_generation_prompt %}
-    {{- '<|start_of_role|>assistant<|end_of_role|>' }}
 {%- endif %}

                        adapter_token=adapter_token,
                        adapter_type=adapter_type,
                        adapter_invocation_text=adapter_invocation_text,
+                       alora_target_idx=-1,
+                       skip_next_start_of_role=false
                        ) %}
 {%- if tools %}
     {%- for tool in tools %}
 {#- For lora adapters: insert activation token at the very beginning -#}
 {%- if ns.adapter_token and ns.adapter_type == 'lora' %}
 {{- ns.adapter_token }}
+{%- set ns.skip_next_start_of_role = true %}
 {%- endif %}
 {%- if messages[0].role == 'system' %}
     {%- endif %}
 {%- endif %}
 {%- if ns.system_message %}
+    {%- if ns.skip_next_start_of_role %}{%- set ns.skip_next_start_of_role = false %}{%- else %}{{- '<|start_of_role|>' }}{%- endif %}
+        {{- 'system<|end_of_role|>' + ns.system_message + '<|end_of_text|>\n' }}
 {%- endif %}
 {#- ALoRA Pass 1: find the last user message containing the invocation text.
      ns.alora_target_idx stays -1 when the invocation sequence is the assistant role
             {%- endfor %}
         {%- endif %}
     {%- endif %}
+        {#- ALoRA Pass 2: inject activation token AND drop the first char of
+         the invocation text so the runtime-swapped embedding doesn't duplicate. -#}
     {%- if loop.index0 == ns.alora_target_idx %}
         {%- set _parts = content.val.rsplit(ns.adapter_invocation_text, 1) %}
         {%- if _parts | length > 1 %}
+            {%- set content.val = _parts[0] + ns.adapter_token + ns.adapter_invocation_text[1:] + _parts[1] %}
         {%- endif %}
     {%- endif %}
 {%- if (message.role == 'user') or (message.role == 'system' and not loop.first) %}
+        {%- if ns.skip_next_start_of_role %}{%- set ns.skip_next_start_of_role = false %}{%- else %}{%- if ns.skip_next_start_of_role %}{%- set ns.skip_next_start_of_role = false %}{%- else %}{{- '<|start_of_role|>' }}{%- endif %}
+        {{- '' }}{%- endif %}
+        {{- message.role + '<|end_of_role|>' + content.val + '<|end_of_text|>\n' }}
     {%- elif message.role == 'assistant' %}
+        {%- if ns.skip_next_start_of_role %}{%- set ns.skip_next_start_of_role = false %}{%- else %}{%- if ns.skip_next_start_of_role %}{%- set ns.skip_next_start_of_role = false %}{%- else %}{{- '<|start_of_role|>' }}{%- endif %}
+        {{- '' }}{%- endif %}
+        {{- message.role + '<|end_of_role|>' + content.val }}
         {%- if message.tool_calls %}
             {%- for tool_call in message.tool_calls %}
                 {%- if (loop.first and content.val) or (not loop.first) %}
         {{- '<|end_of_text|>\n' }}
     {%- elif message.role == 'tool' %}
         {%- if loop.first or (messages[loop.index0 - 1].role != 'tool') %}
+            {%- if ns.skip_next_start_of_role %}{%- set ns.skip_next_start_of_role = false %}{%- else %}{{- '<|start_of_role|>' }}{%- endif %}
+        {{- 'user<|end_of_role|>' }}
         {%- endif %}
         {{- '\n<tool_response>\n' }}
         {{- content.val }}
      role token boundary rather than inside a user message. -#}
 {%- if ns.adapter_token and ns.adapter_type == 'alora' and ns.alora_target_idx == -1 %}
 {{- ns.adapter_token }}
+{%- set ns.skip_next_start_of_role = true %}
 {%- endif %}
 {%- if add_generation_prompt %}
+    {%- if ns.skip_next_start_of_role %}{%- set ns.skip_next_start_of_role = false %}{%- else %}{{- '<|start_of_role|>' }}{%- endif %}
+        {{- 'assistant<|end_of_role|>' }}
 {%- endif %}
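
The two template changes above can be sketched in plain Python. This is only an illustration of the string handling visible in the diff, not the repository's runtime code: `inject_alora_token` mirrors ALoRA Pass 2 (split on the last occurrence of the invocation text, insert the adapter token, drop the first character of the invocation text), and `role_header` mirrors the `skip_next_start_of_role` flag. The function names, the `state` dict, and the `<|adapter|>` placeholder are hypothetical.

```python
def inject_alora_token(content: str, adapter_token: str, invocation_text: str) -> str:
    """Plain-string analogue of the template's ALoRA Pass 2."""
    # Split on the LAST occurrence of the invocation text, like rsplit(..., 1) in the template.
    parts = content.rsplit(invocation_text, 1)
    if len(parts) < 2:
        return content  # invocation text absent: leave the message unchanged
    # The adapter token takes the place of the invocation text's first character.
    return parts[0] + adapter_token + invocation_text[1:] + parts[1]


def role_header(role: str, state: dict) -> str:
    """Emit a role header, omitting <|start_of_role|> once after an adapter token."""
    if state.get("skip_next_start_of_role"):
        state["skip_next_start_of_role"] = False  # the flag is consumed by one header only
        prefix = ""
    else:
        prefix = "<|start_of_role|>"
    return prefix + role + "<|end_of_role|>"


ADAPTER = "<|adapter|>"  # hypothetical placeholder, not a token name from the repository

print(inject_alora_token("Is this safe? <guardian>judge it", ADAPTER, "<guardian>"))

state = {"skip_next_start_of_role": True}  # as set right after an adapter token is emitted
print(role_header("assistant", state))
print(role_header("user", state))
```

In both cases the template leaves out exactly one piece of text that follows the adapter token, which fits the diff's own comment that the embedding is swapped in at runtime and should not be duplicated.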

compose_report.json CHANGED

@@ -1,6 +1,6 @@
 {
   "metadata": {
-    "timestamp": "2026-05-03T16:10:19.726075"
   },
   "base_model_mapping": [
     {

 {
   "metadata": {
+    "timestamp": "2026-05-21T05:14:40.268526"
   },
   "base_model_mapping": [
     {

config.json CHANGED

@@ -27,19 +27,19 @@
     16,
     16
   ],
-  "adapter_third_party": [
-    "citations",
-    "query_rewrite",
-    "query_clarification",
-    "hallucination_detection",
-    "answerability",
-    "factuality-detection",
-    "policy-guardrails",
-    "factuality-correction",
-    "guardian-core",
-    "uncertainty",
-    "requirement-check",
-    "context-attribution"
   ],
   "adapter_token_ids": [
     100352,
@@ -62,7 +62,6 @@
   "attention_dropout": 0.0,
   "attention_multiplier": 0.0078125,
   "bos_token_id": 100257,
-  "control_dims": 32,
   "control_token_gain": 15.0,
   "dtype": "bfloat16",
   "embedding_multiplier": 12.0,
@@ -70,63 +69,6 @@
   "fused_add_norm": false,
   "hidden_act": "silu",
   "hidden_size": 4096,
-  "hiding_groups": {
-    "all_controls": [
-      "citations",
-      "query_rewrite",
-      "query_clarification",
-      "hallucination_detection",
-      "answerability",
-      "factuality-detection",
-      "policy-guardrails",
-      "factuality-correction",
-      "guardian-core",
-      "uncertainty",
-      "requirement-check",
-      "context-attribution"
-    ]
-  },
-  "hiding_policy": {
-    "answerability": [
-      "all_controls"
-    ],
-    "base": [
-      "all_controls"
-    ],
-    "citations": [
-      "all_controls"
-    ],
-    "context-attribution": [
-      "all_controls"
-    ],
-    "factuality-correction": [
-      "all_controls"
-    ],
-    "factuality-detection": [
-      "all_controls"
-    ],
-    "guardian-core": [
-      "all_controls"
-    ],
-    "hallucination_detection": [
-      "all_controls"
-    ],
-    "policy-guardrails": [
-      "all_controls"
-    ],
-    "query_clarification": [
-      "all_controls"
-    ],
-    "query_rewrite": [
-      "all_controls"
-    ],
-    "requirement-check": [
-      "all_controls"
-    ],
-    "uncertainty": [
-      "all_controls"
-    ]
-  },
   "initializer_range": 0.1,
   "intermediate_size": 12800,
   "layer_types": [

     16,
     16
   ],
+  "adapter_substitute_token_ids": [
+    100264,
+    100264,
+    100264,
+    100264,
+    100264,
+    27,
+    27,
+    27,
+    27,
+    27,
+    27,
+    100264
   ],
   "adapter_token_ids": [
     100352,
   "attention_dropout": 0.0,
   "attention_multiplier": 0.0078125,
   "bos_token_id": 100257,
   "control_token_gain": 15.0,
   "dtype": "bfloat16",
   "embedding_multiplier": 12.0,
   "fused_add_norm": false,
   "hidden_act": "silu",
   "hidden_size": 4096,
   "initializer_range": 0.1,
   "intermediate_size": 12800,
   "layer_types": [
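
One plausible reading of the new `adapter_substitute_token_ids` list, together with the `model.switch.control_to_substitute_lut` tensor added to the weight index, is a per-adapter lookup that replaces each adapter control token with an ordinary vocabulary token. The sketch below shows that idea with plain Python lists. It is an assumption about intent, not the model's implementation: the helper names are hypothetical, and the second adapter token ID (100353) is made up for illustration, since the diff shows only the first entry of `adapter_token_ids`.

```python
def build_substitute_lut(adapter_token_ids, substitute_token_ids):
    """Pair each adapter control token ID with its substitute token ID, by position."""
    if len(adapter_token_ids) != len(substitute_token_ids):
        raise ValueError("both lists must have one entry per adapter")
    return dict(zip(adapter_token_ids, substitute_token_ids))


def substitute(input_ids, lut):
    """Replace adapter control tokens; every other token ID passes through unchanged."""
    return [lut.get(token_id, token_id) for token_id in input_ids]


# 100352, 100264 and 27 appear in the diff; 100353 is a hypothetical second adapter ID.
lut = build_substitute_lut([100352, 100353], [100264, 27])
print(substitute([5, 100352, 9, 100353], lut))
```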

io_configs/factuality-correction/io.yaml CHANGED

@@ -14,10 +14,16 @@ response_format: |
     "required": ["correction"]
   }
 transformations: ~
-instruction: ~
 parameters:
   # corrected response can several hundred tokens at high temperatures
   max_completion_tokens: 4096
   temperature: 0.0
 # No sentence boundary detection
-sentence_boundaries: ~

     "required": ["correction"]
   }
 transformations: ~
+instruction: |2
+  <guardian>As a judge agent, your role is to help assess whether the provided text meets the given judging criteria, utilizing all available information, including conversations, documents, and tools.
+  ### Criteria: A factually incorrect response occurs when the assistant's message contains one or more factual claims that are unsupported by, inconsistent with, or directly contradicted by the information provided in the documents or context. This includes situations where the assistant: introduces details not grounded in the context, misstates or distorts facts contained within the context, misinterprets the meaning or implications of the context, supplies erroneous or conflicting information relative to the context. Even if only a small portion of the response contains such inaccuracies, the overall message is considered factually incorrect.
+  ### Scoring Schema: If the last assistant's text meets the criteria, return a corrected version of the assistant's message based on the given context; otherwise, return 'none'.
 parameters:
   # corrected response can several hundred tokens at high temperatures
   max_completion_tokens: 4096
   temperature: 0.0
 # No sentence boundary detection
+sentence_boundaries: ~

io_configs/factuality-detection/io.yaml CHANGED

@@ -16,9 +16,15 @@ response_format: |
     "additionalProperties": false
   }
 transformations: ~
-instruction: ~
 parameters:
   max_completion_tokens: 20
   temperature: 0.0
 # No sentence boundary detection
-sentence_boundaries: ~

     "additionalProperties": false
   }
 transformations: ~
+instruction: |2
+  <guardian>As a judge agent, your role is to help assess whether the provided text meets the given judging criteria, utilizing all available information, including conversations, documents, and tools.
+  ### Criteria: A factually incorrect response occurs when the assistant's message contains one or more factual claims that are unsupported by, inconsistent with, or directly contradicted by the information provided in the documents or context. This includes situations where the assistant: introduces details not grounded in the context, misstates or distorts facts contained within the context, misinterprets the meaning or implications of the context, supplies erroneous or conflicting information relative to the context. Even if only a small portion of the response contains such inaccuracies, the overall message is considered factually incorrect.
+  ### Scoring Schema: If the last assistant's text meets the criteria, return 'yes'; otherwise, return 'no'.
 parameters:
   max_completion_tokens: 20
   temperature: 0.0
 # No sentence boundary detection
+sentence_boundaries: ~

io_configs/guardian-core/io.yaml CHANGED

@@ -25,7 +25,12 @@ transformations:
   - type: nest
     input_path: []
     field_name: "guardian"
-instruction: ~
 parameters:
   max_completion_tokens: 15
   temperature: 0.0

   - type: nest
     input_path: []
     field_name: "guardian"
+instruction: |-
+  <guardian>As a judge agent, your role is to help assess whether the provided text meets the given judging criteria, utilizing all available information, including conversations, documents, and tools.
+  ### Criteria: {criteria}
+  ### Scoring Schema: {scoring_schema}
 parameters:
   max_completion_tokens: 15
   temperature: 0.0

io_configs/policy-guardrails/io.yaml CHANGED

@@ -16,7 +16,19 @@ response_format: |
     "additionalProperties": false
   }
 transformations: ~
-instruction: ~
 parameters:
   temperature: 0.0
   max_completion_tokens: 20

     "additionalProperties": false
   }
 transformations: ~
+instruction: |-
+  <guardian> You are a compliance agent trying to help determine whether a scenario is compliant with a given policy.
+  ### Criteria: Policy: {policy_text}
+  ### Scoring Schema: Does the scenario described in the previous message comply with the policy? Compliance is determined based on the following guidelines:
+  <start_of_guidelines>:
+  - "Yes"  if, given the scenario and the policy, it can be decided with certainty that the scenario complies with the policy
+  - "No"  if, given the scenario and the policy, it can be decided with certainty that the scenario does not comply with the policy
+  - "Ambiguous" if, given the scenario and the policy, it is not possible to decide whether the scenario complies with, or violates, the policy and more information is needed to decide with certainty.
+  <end_of_guidelines>
+  Your answer must be either "Yes", "No", or "Ambiguous". You MUST return your answer as a valid JSON object with the key "label". For example, if your answer is "Yes", respond as "{{"label":"Yes"}}".
 parameters:
   temperature: 0.0
   max_completion_tokens: 20

io_configs/requirement-check/io.yaml CHANGED

@@ -25,7 +25,9 @@ transformations:
   - type: nest
     input_path: []
     field_name: "requirement_check"
-instruction: ~
 parameters:
   max_completion_tokens: 15
   temperature: 0.0

   - type: nest
     input_path: []
     field_name: "requirement_check"
+instruction: |-
+  <requirements>: {requirement}
+  Please verify if the assistant's generation satisfies the user's requirements or not and reply with a binary label accordingly. Respond with a json {{"score": "yes"}} if the constraints are satisfied or respond with {{"score": "no"}} if the constraints are not satisfied.
 parameters:
   max_completion_tokens: 15
   temperature: 0.0
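
The new `instruction` fields combine single-brace placeholders such as `{requirement}` with doubled braces around literal JSON. That is the escaping convention of Python's `str.format`, so the sketch below assumes the instruction is rendered that way; the diff does not show the rendering code. The template string is abridged from the requirement-check instruction above, and `render_instruction` is a hypothetical helper.

```python
# Abridged from the requirement-check instruction in the diff above.
TEMPLATE = (
    "<requirements>: {requirement}\n"
    'Respond with a json {{"score": "yes"}} if the constraints are satisfied '
    'or respond with {{"score": "no"}} if the constraints are not satisfied.'
)


def render_instruction(template: str, **fields: str) -> str:
    """Fill {name} placeholders; doubled braces collapse to literal single braces."""
    return template.format(**fields)


rendered = render_instruction(TEMPLATE, requirement="Answer in French.")
print(rendered)
```

Under this assumption a missing field raises `KeyError`, so a caller has to supply every placeholder the template names.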

io_configs/uncertainty/io.yaml CHANGED

@@ -35,8 +35,8 @@ transformations:
     input_path: []
     retained_fields:
       score: "certainty"
-instruction: ~
 parameters:
   max_completion_tokens: 15
   temperature: 0.0
-sentence_boundaries: ~

     input_path: []
     retained_fields:
       score: "certainty"
+instruction: <certainty>
 parameters:
   max_completion_tokens: 15
   temperature: 0.0
+sentence_boundaries: ~

model-00001-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0e2e60a9d257ce24d7196c82ded9965b4a37b3f4d40b0dd9d2378e60ae2259aa
-size 4999587489

 version https://git-lfs.github.com/spec/v1
+oid sha256:8b153d175bf5a24225a5138dddd9501a77fb3595bc95255a5a0603e038a1eb68
+size 4997144072

model-00002-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6364123669111c1a904ffe832a030df2c9539e61267e1e56a7c3f3d1bb2088a8
-size 4979764800

 version https://git-lfs.github.com/spec/v1
+oid sha256:06644fb2f44e8b96391b0fc9d5b27c627969d76efe162767d523b484d94e828b
+size 4982910656

model.safetensors.index.json CHANGED

@@ -1,10 +1,9 @@
 {
   "metadata": {
     "total_parameters": 9568112640,
-    "total_size": 19136325753
   },
   "weight_map": {
-    "model.adapter_hiding_matrix": "model-00001-of-00004.safetensors",
     "model.adapter_token_ids": "model-00001-of-00004.safetensors",
     "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
     "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
@@ -794,7 +793,7 @@
     "model.layers.9.self_attn.o_proj.lora_B": "model-00002-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.base_layer.weight": "model-00001-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_A_slices.0": "model-00001-of-00004.safetensors",
-    "model.layers.9.self_attn.qkv_proj.lora_A_slices.1": "model-00001-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_A_slices.2": "model-00002-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_B_slices.0": "model-00002-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_B_slices.1": "model-00002-of-00004.safetensors",
@@ -808,6 +807,6 @@
     "model.layers.9.shared_mlp.output_linear.lora_A": "model-00002-of-00004.safetensors",
     "model.layers.9.shared_mlp.output_linear.lora_B": "model-00002-of-00004.safetensors",
     "model.norm.weight": "model-00004-of-00004.safetensors",
-    "model.token_to_group_mask": "model-00001-of-00004.safetensors"
   }
 }

 {
   "metadata": {
     "total_parameters": 9568112640,
+    "total_size": 19137028288
   },
   "weight_map": {
     "model.adapter_token_ids": "model-00001-of-00004.safetensors",
     "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
     "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
     "model.layers.9.self_attn.o_proj.lora_B": "model-00002-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.base_layer.weight": "model-00001-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_A_slices.0": "model-00001-of-00004.safetensors",
+    "model.layers.9.self_attn.qkv_proj.lora_A_slices.1": "model-00002-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_A_slices.2": "model-00002-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_B_slices.0": "model-00002-of-00004.safetensors",
     "model.layers.9.self_attn.qkv_proj.lora_B_slices.1": "model-00002-of-00004.safetensors",
     "model.layers.9.shared_mlp.output_linear.lora_A": "model-00002-of-00004.safetensors",
     "model.layers.9.shared_mlp.output_linear.lora_B": "model-00002-of-00004.safetensors",
     "model.norm.weight": "model-00004-of-00004.safetensors",
+    "model.switch.control_to_substitute_lut": "model-00001-of-00004.safetensors"
   }
 }
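
The index change can be summarized by comparing the `weight_map` of the old and new files. The sketch below does this for the handful of keys visible in the hunks above; `diff_weight_maps` is a hypothetical helper, and in practice the two dictionaries would come from `json.load` on each version of `model.safetensors.index.json`.

```python
def diff_weight_maps(old: dict, new: dict):
    """Return (added keys, removed keys, keys whose shard file changed)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    moved = {
        key: (old[key], new[key])
        for key in sorted(old.keys() & new.keys())
        if old[key] != new[key]
    }
    return added, removed, moved


SHARD1 = "model-00001-of-00004.safetensors"
SHARD2 = "model-00002-of-00004.safetensors"

# Only the entries that differ in the hunks above, plus one unchanged key.
old_map = {
    "model.adapter_hiding_matrix": SHARD1,
    "model.adapter_token_ids": SHARD1,
    "model.layers.9.self_attn.qkv_proj.lora_A_slices.1": SHARD1,
    "model.token_to_group_mask": SHARD1,
}
new_map = {
    "model.adapter_token_ids": SHARD1,
    "model.layers.9.self_attn.qkv_proj.lora_A_slices.1": SHARD2,
    "model.switch.control_to_substitute_lut": SHARD1,
}

added, removed, moved = diff_weight_maps(old_map, new_map)
print("added:  ", added)
print("removed:", removed)
print("moved:  ", moved)
```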