Jun 07, 2026 17:00:00

Forge is a free framework that adds guardrails to local AI, including retry prompting, step enforcement, error recovery, and VRAM-aware context management.

AI models do not always produce accurate results. Hallucination, in which a model produces plausible-sounding falsehoods, is the best-known failure, but multi-step processing also carries the risk of timeouts and loops. 'Forge' is a reliability layer that can improve the accuracy of self-hosted LLMs by applying 'guardrails' such as prompting retries.

antoinezambelli/forge: A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows

https://github.com/antoinezambelli/forge/tree/main

Forge is a Python framework in which you define a toolset, and a self-hosted LLM can then call the tools it needs in any order. Workflow structure is opt-in: loops can be constrained as needed by specifying 'required steps,' 'prerequisites,' and 'terminal tools.' Forge's 'guardrails' (rescue analysis, retry acceleration, and response verification) still apply even when no required steps are defined.
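
The opt-in structure described above can be pictured with a small sketch. This is not Forge's actual API: `WorkflowPolicy`, `check_call`, and the tool names are hypothetical, and the logic only illustrates how required steps, prerequisites, and terminal tools might constrain an otherwise free-order tool loop.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkflowPolicy:
    """Hypothetical policy object; the names are illustrative, not Forge's API."""
    required_steps: set = field(default_factory=set)
    prerequisites: dict = field(default_factory=dict)  # tool -> tools that must run first
    terminal_tools: set = field(default_factory=set)

    def check_call(self, tool: str, done: set) -> Optional[str]:
        """Return None if the call is allowed, else a message to feed back to the model."""
        missing = self.prerequisites.get(tool, set()) - done
        if missing:
            return f"Call {sorted(missing)} before '{tool}'."
        if tool in self.terminal_tools:
            # A terminal tool ends the loop, so every required step must be done first.
            skipped = self.required_steps - done
            if skipped:
                return f"Required steps not done yet: {sorted(skipped)}."
        return None


policy = WorkflowPolicy(
    required_steps={"search", "summarize"},
    prerequisites={"summarize": {"search"}},
    terminal_tools={"submit"},
)

# Simulate a model that tries to finish early and calls tools out of order.
done = set()
log = []
for tool in ["submit", "summarize", "search", "summarize", "submit"]:
    problem = policy.check_call(tool, done)
    if problem is None:
        done.add(tool)
    log.append((tool, problem))
```

In this sketch the first two calls are rejected with a corrective message, and the last three are accepted once the prerequisite and required steps have been satisfied.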

According to scores measured with the forge v0.7.0 evaluation suite, which consists of 26 scenarios, an 8B-class local LLM improved from less than 10% to 84% when using Forge, and Claude Sonnet 4.6 also improved from 85% to 98%.

Forge supports the following backends:

Backend	Best for	Native function calling?
Ollama	Easiest setup, with model management built in	Yes
llama-server	Top performance, full control	Yes
Llamafile	Single binary, no dependencies	No
vLLM	High throughput, AWQ/GPTQ quantized weights	Yes
Anthropic	Frontier baseline, hybrid workflows	Yes

Forge can also be used in three different ways, depending on the purpose and application.

• Proxy server: By placing it between the client and the local model server, the client perceives that it is communicating with a smarter model.
• Workflow Runner: Forge manages the entire lifecycle through tool definition, backend selection, and execution of structured agent loops.
• Guardrail middleware: The user controls the loop, while Forge acts as a reliability stack, verifying responses, rescuing invalid tool calls, and enforcing required steps.
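
As a rough illustration of the guardrail idea (verifying responses, rescuing invalid tool calls, and prompting a retry), here is a minimal sketch. It does not use Forge's API: `rescue_tool_call`, `call_with_guardrails`, the `{"tool": ..., "args": ...}` reply format, and the `get_weather` tool are all assumptions made for the example, and the model is a stand-in function.

```python
import json
import re
from typing import Callable, Optional

KNOWN_TOOLS = {"get_weather"}  # hypothetical toolset


def rescue_tool_call(text: str) -> Optional[dict]:
    """Try to recover a tool call from free-form model output."""
    match = re.search(r"\{.*\}", text, re.DOTALL)  # first '{' through last '}'
    if match is None:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in KNOWN_TOOLS:
        return None
    if not isinstance(call.get("args"), dict):
        return None
    return call


def call_with_guardrails(model: Callable[[str], str], prompt: str,
                         max_retries: int = 3) -> dict:
    """Ask the model, validate the reply, and re-prompt with a nudge on failure."""
    message = prompt
    for _ in range(max_retries):
        call = rescue_tool_call(model(message))
        if call is not None:
            return call
        message = (prompt + "\nYour last reply was not a valid tool call. "
                   'Reply with JSON: {"tool": ..., "args": {...}}')
    raise RuntimeError(f"no valid tool call after {max_retries} attempts")


# Stand-in for a local model: answers in prose once, then wraps JSON in chatter.
replies = iter([
    "I think it might rain.",
    'Sure! {"tool": "get_weather", "args": {"city": "Oslo"}} Hope that helps.',
])
result = call_with_guardrails(lambda _msg: next(replies), "What's the weather in Oslo?")
```

Here the first reply fails validation and triggers a retry, and the second reply is rescued even though the JSON is surrounded by extra text. Because the caller owns the loop, the same wrapper could sit around any backend.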

While Forge has a high barrier to entry, using it properly can significantly improve the reliability of local LLMs, so those who regularly use local LLMs should definitely check it out.

Jun 07, 2026 17:00:00 in AI, Software, Posted by log1c_sh