Instructions for using LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF",
	filename="Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-Compact.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)
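The call above returns an OpenAI-style response dictionary rather than printing anything, so on its own it produces no visible output. A minimal sketch of pulling the reply text out of that dictionary, assuming the usual `choices[0].message.content` layout. `extract_reply` is a hypothetical helper name, and the sample dictionary is a hand-written stand-in, not real model output:

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    # content can be None, e.g. when the model only emitted tool calls
    return message.get("content") or ""


# Hand-written stand-in for the dict returned by llm.create_chat_completion(...)
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "A placeholder reply."}}
    ]
}
print(extract_reply(sample_response))
```

In practice you would pass the return value of `llm.create_chat_completion(...)` in place of `sample_response`.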

llama.cpp

How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16
# Run inference directly in the terminal:
./llama-cli -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Use Docker

docker model run hf.co/LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16
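With `llama-server` running from any of the install routes above, the OpenAI-compatible endpoint can also be called from Python using only the standard library. This is a sketch under stated assumptions: the server listens on `http://localhost:8080` (the same base URL the Pi and Hermes configurations further down use), and `build_chat_request` and `send_chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at its default local address.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `send_chat("Say hello in one sentence.")` while the server is up returns the reply as a string. Without a running server, `urlopen` raises `URLError`.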

vLLM

How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Ollama
How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with Ollama:
```
ollama run hf.co/LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16
```

Unsloth Studio

How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF to start chatting

Pi

How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Run Hermes

hermes

Docker Model Runner
How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with Docker Model Runner:
```
docker model run hf.co/LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16
```

Lemonade

How to use LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF:F16

Run and chat with the model

lemonade run user.Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF-F16

List all available models

lemonade list

Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF / chat_template.jinja

LuffyTheFox

Upload chat_template.jinja

8d42ce1 verified 16 days ago

18.4 kB
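The template that follows settles thinking mode from three inputs, in this order: the `enable_thinking` kwarg, the `preserve_thinking` kwarg, and then a pre-scan of the messages in which the last flag wins. A rough Python mirror of that logic, meant only as a reading aid. `resolve_enable_thinking` is a hypothetical name and is not code shipped with the model:

```python
from typing import Optional


def resolve_enable_thinking(
    messages: list,
    enable_thinking: Optional[bool] = None,
    preserve_thinking: Optional[bool] = None,
) -> bool:
    """Mirror the template's kwarg handling and pre-scan (Sections 2 and 3)."""
    enabled = False  # namespace default in the template
    if enable_thinking is not None:
        enabled = bool(enable_thinking)
    if preserve_thinking is not None and not preserve_thinking:
        enabled = False  # preserve_thinking=false forces non-thinking mode

    # Messages are scanned in order, so the last flag encountered wins.
    for msg in messages:
        content = msg.get("content")
        text = content if isinstance(content, str) else ""
        role = msg.get("role")
        if role == "user":
            stripped = text.rstrip()
            if stripped.endswith("/no_think"):
                enabled = False
            elif stripped.endswith("/think"):
                enabled = True
        elif role in ("system", "developer"):
            if "<|think_off|>" in text:
                enabled = False
            elif "<|think_on|>" in text:
                enabled = True
    return enabled


print(resolve_enable_thinking([{"role": "user", "content": "Explain MoE routing /think"}]))
```

Because the pre-scan runs after the kwargs are resolved, an in-message flag such as `/think` overrides `enable_thinking=False` passed by the caller.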

	{#- ===== HELPER: raise_exception macro =====
	Jinja2 doesn't have a built-in raise_exception.
	This macro outputs an error marker in the rendered output.
	Callers should check output for "ERROR:" pattern to detect validation failures.
	-#}
	{%- macro raise_exception(message) -%}
	{{- '\n[ERROR: ' ~ message ~ ']' -}}
	{%- endmacro -%}

	{#- ===== SECTION 1A: MACRO render_content =====
	Handles string, list (image/video/text items), or None/undefined.
	count_vision (default false): when true, increments ns.image_count / ns.video_count.
	is_system_content (default false): set true when rendering system/developer content
	to enable media validation (emits an error marker if images or videos are present).
	-#}
	{%- macro render_content(content, count_vision=false, is_system_content=false) -%}
	{#- VALIDATION: System messages cannot contain images or videos (from v18) -#}
	{#- FIX: also exclude strings and handle None - llama.cpp treats strings as non-iterable in for loops -#}
	{%- if is_system_content and content is iterable and content is not mapping and content is not string and content is not none -%}
	{%- for item in content -%}
	{%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
	{{- raise_exception('System message cannot contain images.') -}}
	{%- endif -%}
	{%- if item.type == 'video' or 'video' in item -%}
	{{- raise_exception('System message cannot contain videos.') -}}
	{%- endif -%}
	{%- endfor -%}
	{%- endif -%}

	{#- Main content rendering -#}
	{#- Handle None/undefined content -#}
	{%- if content is none or content is defined == false -%}
	{{- '' -}}
	{%- elif content is string -%}
	{{- content -}}
	{#- FIX: also exclude strings - llama.cpp treats strings as non-iterable in for loops -#}
	{%- elif content is iterable and content is not mapping and content is not string -%}
	{%- for item in content -%}
	{#- Handle different item types -#}
	{%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
	{%- if count_vision -%}{%- set ns.image_count = ns.image_count + 1 -%}{%- endif -%}
	{%- if add_vision_id is defined and add_vision_id -%}
	{{- 'Picture ' ~ ns.image_count ~ ': ' -}}
	{%- endif -%}
	{{- '<|vision_start|><|image_pad|><|vision_end|>' -}}
	{%- elif item.type == 'video' or 'video' in item -%}
	{%- if count_vision -%}{%- set ns.video_count = ns.video_count + 1 -%}{%- endif -%}
	{%- if add_vision_id is defined and add_vision_id -%}
	{{- 'Video ' ~ ns.video_count ~ ': ' -}}
	{%- endif -%}
	{{- '<|vision_start|><|video_pad|><|vision_end|>' -}}
	{%- elif item.type == 'text' or 'text' in item -%}
	{{- item.text -}}
	{#- ERROR: Unknown content type - raise explicit exception (from v18) -#}
	{%- else -%}
	{{- raise_exception('Unexpected content type in message content.') -}}
	{%- endif -%}
	{%- endfor -%}
	{#- ERROR: Unknown content type - raise explicit exception (from v18) -#}
	{%- elif content is not none and content is defined -%}
	{{- raise_exception('Unexpected content type.') -}}
	{%- endif -%}
	{%- endmacro -%}

	{#- ===== SECTION 1B: MACRO detect_tool_error (NEW in v0.7) =====
	Detects if a tool response contains error indicators.
	Uses heuristics from v18:
	- Checks for error keywords (error, exception, traceback, failed to)
	- Ignores responses with '$ ' (shell output prefix) or 'took ' (timing info)
	- Ignores responses of 500+ chars (likely valid output, not error)

	Sets: ns.last_tool_failed (true/false)
	Side effect: Updates ns.consecutive_failures counter (reset to 0 when no error is detected)
	-#}
	{%- macro detect_tool_error(content) -%}
	{#- Type guard: ensure content is string (llama.cpp compatibility) -#}
	{%- set content = content if content is string else '' -%}
	{%- set content_lower = content | lower -%}
	{%- set content_length = content | length -%}

	{#- Error detection heuristics: short response + no shell prefix + has error keywords -#}
	{%- if content_length < 500
	and '$ ' not in content
	and 'took ' not in content_lower
	and ('"error":' in content_lower or 'error:' in content_lower
	or 'exception:' in content_lower or 'traceback' in content_lower
	or 'command not found' in content_lower or 'invalid syntax' in content_lower
	or 'failed to' in content_lower or 'permission denied' in content_lower) -%}
	{#- Error detected - update failure tracking -#}
	{%- set ns.last_tool_failed = true -%}
	{%- set ns.consecutive_failures = ns.consecutive_failures + 1 -%}
	{%- else -%}
	{#- No error - reset failure tracking -#}
	{%- set ns.last_tool_failed = false -%}
	{%- set ns.consecutive_failures = 0 -%}
	{%- endif -%}
	{%- endmacro -%}

	{#- ===== SECTION 2: NAMESPACE INITIALISATION =====
	Single ns object for all mutable state.

	enable_thinking: default=false (controls think-block in generation prompt)
	preserve_thinking: default=false (controls think-block display in conversation history)
	image_count: Vision counter for images
	video_count: Vision counter for videos

	NEW in v0.7:
	- consecutive_failures: Tracks consecutive tool call failures (from v18)
	- last_tool_failed: Boolean flag for current tool response (from v18)
	-#}
	{%- set ns = namespace(
	enable_thinking=false,
	preserve_thinking=false,
	image_count=0,
	video_count=0,
	consecutive_failures=0,
	last_tool_failed=false
	) -%}

	{#- Resolve enable_thinking kwarg -#}
	{%- if enable_thinking is defined -%}
	{%- if enable_thinking -%}
	{%- set ns.enable_thinking = true -%}
	{%- else -%}
	{%- set ns.enable_thinking = false -%}
	{%- endif -%}
	{%- endif -%}

	{#- Resolve preserve_thinking kwarg (FIXED in v0.7: now also affects conversation history, not just generation prompt).
	preserve_thinking=false => force non-thinking mode (same as enable_thinking=false) and do not preserve think blocks in history.
	preserve_thinking=true => keep think blocks in history; thinking stays controlled by enable_thinking.
	When not defined => no override; the namespace default (false) applies.
	-#}
	{%- if preserve_thinking is defined -%}
	{%- if not preserve_thinking -%}
	{%- set ns.enable_thinking = false -%}
	{%- set ns.preserve_thinking = false -%}
	{%- else -%}
	{%- set ns.preserve_thinking = true -%}
	{%- endif -%}
	{%- endif -%}

	{#- ===== SECTION 3: PRE-SCAN =====
	Track last /no_think or /think flag in user messages.
	Also scan system messages for <|think_off|> / <|think_on|> markers
	(allows apps to control thinking mode via system prompt injection).
	The model follows the last flag encountered in multi-turn conversations.
	-#}
	{%- for i in range(messages | length) -%}
	{%- set _msg = messages[i] -%}
	{%- if _msg.role == 'user' -%}
	{%- set _u = _msg.content if _msg.content is string else '' -%}
	{%- if _u.rstrip().endswith('/no_think') -%}
	{%- set ns.enable_thinking = false -%}
	{%- elif _u.rstrip().endswith('/think') -%}
	{%- set ns.enable_thinking = true -%}
	{%- endif -%}
	{%- elif _msg.role == 'system' or _msg.role == 'developer' -%}
	{%- set _s = _msg.content if _msg.content is string else '' -%}
	{%- if '<|think_off|>' in _s -%}
	{%- set ns.enable_thinking = false -%}
	{%- elif '<|think_on|>' in _s -%}
	{%- set ns.enable_thinking = true -%}
	{%- endif -%}
	{%- endif -%}
	{%- endfor -%}

	{#- ===== SECTION 4: VALIDATE MESSAGES (NEW in v0.7) =====
	Validate that messages is provided and not empty.
	From v18: raises exception if no messages provided.
	-#}
	{%- if not messages -%}
	{{- raise_exception('No messages provided.') -}}
	{%- endif -%}

	{#- ===== SECTION 5: COLLECT SYSTEM CONTENT =====
	Merge all system/developer messages with \n\n separator.
	<|think_off|> / <|think_on|> markers are stripped from output.

	FIXED in v0.7: Pass is_system_content=true to render_content to trigger
	media validation (raises exception if system contains images/videos).
	-#}
	{%- set ns_sys = namespace(content='') -%}
	{%- for msg in messages -%}
	{%- if msg.role == 'system' or msg.role == 'developer' -%}
	{#- Pass is_system_content=true for media validation -#}
	{%- set _c = render_content(msg.content | default(''), false, true) | trim -%}
	{%- set _c = _c | replace('<|think_off|>', '') | replace('<|think_on|>', '') | trim -%}
	{%- if _c -%}
	{%- if ns_sys.content == '' -%}
	{%- set ns_sys.content = _c -%}
	{%- else -%}
	{%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
	{%- endif -%}
	{%- endif -%}
	{%- endif -%}
	{%- endfor -%}

	{#- ===== SECTION 6: BUILD TOOLS LIST =====
	Normalise each tool to {"type":"function","function":{...}} format.
	Serialisation happens later at output time (avoids Markup + str escaping bugs).
	-#}
	{%- set _has_tools = tools is defined and tools -%}
	{%- if _has_tools -%}
	{%- set ns_tb = namespace(list=[]) -%}
	{%- for tool in tools -%}
	{%- if tool.function is defined -%}
	{%- set ns_tb.list = ns_tb.list + [tool] -%}
	{%- else -%}
	{%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
	{%- endif -%}
	{%- endfor -%}
	{%- endif -%}

	{#- ===== SECTION 7: OUTPUT SYSTEM TURN =====
	Each fragment output via its own {{ }} block so tojson Markup objects are
	never Python-concatenated with plain strings (would trigger HTML-escaping).
	User system content appears BEFORE the tools block (correct ordering).
	No default system prompt injected.
	-#}
	{%- if ns_sys.content or _has_tools -%}
	{{- '<|im_start|>system\n' -}}
	{%- if ns_sys.content -%}
	{{- ns_sys.content -}}
	{%- if _has_tools -%}{{- '\n\n' -}}{%- endif -%}
	{%- endif -%}
	{%- if _has_tools -%}
	{{- '# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n' -}}
	{%- for tool in ns_tb.list -%}
	{{- tool | tojson -}}
	{%- if not loop.last -%}{{- '\n' -}}{%- endif -%}
	{%- endfor -%}
	{{- '\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call>' -}}
	{%- endif -%}
	{{- '<|im_end|>\n' -}}
	{%- endif -%}

	{#- ===== SECTION 8: MAIN MESSAGE LOOP =====
	FIXED in v0.7:
	- Tool responses now have error detection via detect_tool_error macro
	- Warning messages injected for failed tool calls
	- consecutive_failures tracking for escalating warnings
	-#}
	{%- for message in messages -%}

	{#- 8a: System / Developer — already rendered above, skip -#}
	{%- if message.role == 'system' or message.role == 'developer' -%}

	{#- 8b: User messages -#}
	{%- elif message.role == 'user' -%}
	{%- set _uc = render_content(message.content | default(''), true, false) -%}
	{{- '<|im_start|>user\n' + _uc + '<|im_end|>\n' -}}

	{#- 8c: Assistant messages -#}
	{%- elif message.role == 'assistant' -%}
	{#- Safely extract content as string — guard against absent key.
	Also support message.reasoning_content as an explicit think-block source
	(used by some frameworks that store thinking separately from content). -#}
	{%- if message.content is defined and message.content is string -%}
	{%- set _ac = message.content -%}
	{#- FIX: also exclude strings - llama.cpp treats strings as non-iterable in for loops -#}
	{%- elif message.content is defined and message.content is iterable and message.content is not mapping and message.content is not string -%}
	{%- set _ac = render_content(message.content, false, false) -%}
	{%- else -%}
	{%- set _ac = '' -%}
	{%- endif -%}

	{#- Reconstruct content from reasoning_content + content when the framework
	stores thinking separately (e.g. OpenAI-style reasoning_content field).
	Only apply when no think-block already present in _ac. -#}
	{%- if message.reasoning_content is defined and message.reasoning_content is string
	and message.reasoning_content | trim
	and '<think>' not in _ac -%}
	{%- set _ac = '<think>\n' + message.reasoning_content | trim + '\n</think>\n\n' + _ac -%}
	{%- endif -%}

	{#- Collect tool_calls if present -#}
	{#- Type check: ensure tool_calls is a list, not string (llama.cpp compatibility) -#}
	{%- set _tc = message.tool_calls if message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls is not string else [] -%}

	{#- Strip <tool_call> prefix from content when tool_calls also present
	(some frameworks duplicate the data in both fields) -#}
	{%- if _tc and '<tool_call>' in _ac -%}
	{%- set _ac = _ac.split('<tool_call>')[0] | trim -%}
	{%- endif -%}

	{#- FIXED in v0.7: Think-block handling with preserve_thinking support

	New logic (from v18): preserve_thinking controls think-block display on ALL
	assistant messages, not just generation prompt:

	- Tool-call turns : never strip (think block is part of the tool-call format)
	- preserve_thinking : if true, show think blocks on ALL messages
	- Last-history turn : if preserve_thinking false, apply last-turn handling
	- Historical turns : if preserve_thinking false, strip think blocks

	The old behavior (strip unless add_generation_prompt) is now controlled
	by preserve_thinking parameter.
	-#}
	{%- set _show_think = false -%}
	{%- if _tc -%}
	{#- Tool calls: always show think block -#}
	{%- set _show_think = true -%}
	{%- elif ns.preserve_thinking -%}
	{#- preserve_thinking=true: show think blocks on all messages -#}
	{%- set _show_think = true -%}
	{%- elif loop.last -%}
	{#- Last message without preserve_thinking: show if thinking enabled -#}
	{%- set _show_think = ns.enable_thinking -%}
	{%- endif -%}

	{#- Apply think-block stripping based on _show_think flag -#}
	{%- if not _show_think -%}
	{#- Fuzzy end-tag detection for stripping -#}
	{%- set _think_end = '' -%}
	{%- if '</think>' in _ac -%}
	{%- set _think_end = '</think>' -%}
	{%- elif '</thinking>' in _ac -%}
	{%- set _think_end = '</thinking>' -%}
	{%- elif '</ think>' in _ac -%}
	{%- set _think_end = '</ think>' -%}
	{%- elif '</think >' in _ac -%}
	{%- set _think_end = '</think >' -%}
	{%- endif -%}
	{%- if _think_end -%}
	{%- set _ac = _ac.split(_think_end)[-1].lstrip('\n') -%}
	{%- endif -%}
	{%- elif not _tc and loop.last and '<think>' not in _ac and not ns.enable_thinking -%}
	{#- Last turn, non-thinking: inject empty think block if missing -#}
	{%- set _ac = '<think>\n\n</think>\n\n' + _ac -%}
	{%- endif -%}

	{#- Emit the assistant turn -#}
	{{- '<|im_start|>assistant\n' -}}
	{%- if _ac -%}
	{{- _ac -}}
	{%- if _tc -%}{{- '\n' -}}{%- endif -%}
	{%- endif -%}

	{#- Render tool calls in Hermes format.
	Each value output via its own {{ }} block — never concatenated with plain strings
	in Python, which would trigger Markup HTML-escaping. -#}
	{%- if _tc -%}
	{%- for tc in _tc -%}
	{{- '<tool_call>\n' -}}
	{{- '{"name": ' -}}{{- tc.function.name | tojson -}}
	{%- if tc.function.arguments is string -%}
	{{- ', "arguments": ' + tc.function.arguments -}}
	{%- else -%}
	{{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
	{%- endif -%}
	{{- '}' -}}
	{%- if not loop.last -%}
	{{- '\n</tool_call>\n' -}}
	{%- else -%}
	{{- '\n</tool_call>' -}}
	{%- endif -%}
	{%- endfor -%}
	{%- endif -%}
	{{- '<|im_end|>\n' -}}

	{#- 8d: Tool results — with error detection (NEW in v0.7) -#}
	{%- elif message.role == 'tool' -%}
	{%- set _prev_role = messages[loop.index0 - 1].role if loop.index0 > 0 else '' -%}
	{%- set _next_role = messages[loop.index0 + 1].role if not loop.last else '' -%}

	{#- NEW in v0.7: Detect errors in tool response -#}
	{%- set _tool_content = message.content | default('') -%}
	{{- detect_tool_error(_tool_content) -}}

	{%- if _prev_role != 'tool' -%}
	{{- '<|im_start|>user\n' -}}
	{%- endif -%}
	{{- '<tool_response>\n' -}}
	{{- _tool_content -}}

	{#- NEW in v0.7: Inject warning if tool error detected -#}
	{#- v0.8: Replaced emoji with text-only for tokenization safety -#}
	{%- if ns.last_tool_failed -%}
	{%- if ns.consecutive_failures >= 2 -%}
	{{- '\n\n[SYSTEM WARNING: ' ~ ns.consecutive_failures ~ ' consecutive tool errors detected. Your previous approach is incorrect.]' -}}
	{%- else -%}
	{{- '\n\n[SYSTEM WARNING: The previous tool call returned an error. Diagnose the failure and retry with corrected arguments.]' -}}
	{%- endif -%}
	{%- endif -%}

	{%- if _next_role == 'tool' -%}
	{{- '\n</tool_response>\n' -}}
	{%- else -%}
	{{- '\n</tool_response>' -}}
	{{- '<|im_end|>\n' -}}
	{%- endif -%}

	{#- 8e: Unknown role - explicit error (from v18) -#}
	{%- else -%}
	{{- raise_exception('Unexpected message role: ' + message.role) -}}
	{%- endif -%}

	{%- endfor -%}

	{#- ===== SECTION 9: GENERATION PROMPT =====
	FIXED in v0.7: preserve_thinking now affects conversation history (Section 8),
	so generation prompt logic is simplified.

	enable_thinking=True → open <think>\n prefill so llama.cpp reasoning-budget
	and other inference engines can hook into the think-stream.
	The model continues generating inside the open block.
	enable_thinking=False → closed empty think block as the non-thinking prefill: <think>\n\n</think>\n\n

	NOTE: The <think>\n opener is EPHEMERAL — it lives only in the generation
	prompt, never in chat history. Historical think-block stripping is handled
	in Section 8 based on preserve_thinking setting.
	-#}
	{%- if add_generation_prompt -%}
	{{- '<|im_start|>assistant\n' -}}
	{%- if ns.enable_thinking -%}
	{{- '<think>\n' -}}
	{%- else -%}
	{{- '<think>\n\n</think>\n\n' -}}
	{%- endif -%}
	{%- endif -%}
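The tool-error heuristic in the `detect_tool_error` macro above can be hard to follow in Jinja. Below is a rough Python mirror of the same checks, useful for predicting which tool responses will trigger the injected `[SYSTEM WARNING: ...]` text. The function and constant names are hypothetical and this is a reading aid, not part of the template:

```python
ERROR_MARKERS = (
    '"error":', "error:", "exception:", "traceback",
    "command not found", "invalid syntax", "failed to", "permission denied",
)


def looks_like_tool_error(content) -> bool:
    """Mirror the detect_tool_error heuristic for a single tool response."""
    text = content if isinstance(content, str) else ""  # same type guard as the macro
    lowered = text.lower()
    if len(text) >= 500:      # long responses are assumed to be valid output
        return False
    if "$ " in text:          # shell-output prefix
        return False
    if "took " in lowered:    # timing info
        return False
    return any(marker in lowered for marker in ERROR_MARKERS)


def count_trailing_failures(tool_outputs: list) -> int:
    """Consecutive-failure counter: resets to 0 on any non-error response."""
    failures = 0
    for output in tool_outputs:
        failures = failures + 1 if looks_like_tool_error(output) else 0
    return failures


print(looks_like_tool_error("Error: file not found"))
```

The template switches to the stronger "consecutive tool errors" warning once this counter reaches 2. The counter is kept across the whole conversation and is never reset between assistant turns, only by a tool response that does not look like an error.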