
OpenSquilla 0.4.0 Released: AI Coding Can “Self-Verify” for the First Time

The open-source AI agent project OpenSquilla recently released version 0.4.0. The core innovation of this release is the introduction of a coding workflow called “Coding Mode,” bringing a “self-verification” mechanism to AI coding for the first time: instead of simply claiming “it’s fixed,” the AI now generates verifiable evidence by running tests itself before returning the result, proving that the fix actually works.

This mechanism addresses the biggest bottleneck in AI coding today—trust. Over the past year, AI’s coding capabilities have advanced significantly, but “being able to write code” does not mean “being trustworthy.” Most coding agents deliver code immediately after making a fix, while humans still have to review it line by line to determine whether it is actually correct. This remains the fundamental obstacle preventing AI coding from operating autonomously and scaling into production environments. Integrating verification directly into the agent means the industry’s standard for evaluating AI coding is shifting from “claims it fixed the problem correctly” to “can prove it fixed the problem correctly.”

Coding Mode’s approach is an independent “red-green-regression proof chain.” First, the agent identifies the bug and writes a test that must fail against the unfixed code, proving that the bug has truly been captured. Next, it fixes the code so that the test turns from red to green. Finally, it runs the project’s existing test suite again to verify that nothing else has been broken. The result is delivered only if all three stages pass; if any stage fails, the result is rejected immediately. Coding Mode also includes an automatic repair loop, enabled by default, which keeps revising the code until all tests pass, and isolated execution: all changes are made inside an isolated copy and are applied to the source code only after they are accepted.

In its official demonstration, Coding Mode added a “correct gradient calculation” capability to the well-known open-source project micrograd, the minimal automatic differentiation library created by AI researcher Andrej Karpathy and widely regarded as a landmark project in AI education. When gradients are calculated incorrectly, a model neither throws an error nor crashes; instead, it silently drifts farther and farther from the correct result, making this one of the most difficult kinds of bug to detect manually. The demonstration proceeded in two steps. First, the AI completed the three-stage “red → green → regression” process and produced its own proof of correctness. Then, a human compared micrograd’s new functionality side by side with the industry-standard tool PyTorch using the same input. The forward values and every gradient matched to the 10th decimal place. In other words, it was not simply “the AI says it is correct,” but rather “there is no deviation at any point from the standard answer.” The demonstration is also the team’s latest work on agent runtimes, following its next-generation benchmark, claw-swe-bench.
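
To show why an independent reference catches silent gradient bugs, here is a self-contained sketch of a gradient check. It is not the demonstration's code and uses neither micrograd nor PyTorch: the `Value` class is a hypothetical micrograd-style stand-in written for this example, and a central finite difference stands in for the PyTorch reference, which is why the tolerance is far looser than the 10 decimal places reported above.

```python
import math


class Value:
    """Minimal scalar autograd node (illustrative stand-in, not micrograd)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


def f_plain(a, b):
    """The same function on plain floats, used as the independent reference."""
    return math.tanh(a * b + a)


def numeric_grad(fn, args, i, h=1e-6):
    """Central finite-difference estimate of d fn / d args[i]."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (fn(*up) - fn(*down)) / (2 * h)


inputs = (0.5, -1.5)
a, b = Value(inputs[0]), Value(inputs[1])
out = (a * b + a).tanh()   # `a` is used twice, so its gradient must accumulate
out.backward()

num_a = numeric_grad(f_plain, inputs, 0)
num_b = numeric_grad(f_plain, inputs, 1)
print(abs(a.grad - num_a) < 1e-6, abs(b.grad - num_b) < 1e-6)
```

If `+=` in any `_backward` were replaced with `=`, the code would still run without error, but the gradient for `a` would no longer match the reference. That is the kind of silent failure the side-by-side comparison is meant to expose.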

During the same period, OpenSquilla also introduced its first signed and notarized desktop installation package. It can be installed on macOS and Windows with a double click, requiring no command-line operations.

OpenSquilla aims to “increase agent intelligence per unit cost” and, with Learnable Harness as its entry point, seeks to build the agent product with the best price-to-performance ratio. While mainstream agent frameworks generally increase the number of model calls and drive up token costs, OpenSquilla cuts costs “before the call” through local intelligent routing that automatically selects a model based on task complexity, loading capabilities only when needed, retrieving memory only when required, and preprocessing tool outputs. According to data cited in earlier reporting by Silicon Star, its intelligent routing achieves routing accuracy approximately 4.4 percentage points higher than the general-purpose gateway OpenRouter while reducing costs by approximately 75%. On the same tasks, its output quality remains essentially on par with flagship models, while the cost differs by a factor of approximately nine. According to the official OpenSquilla website, internal testing under typical scenarios shows that total costs can generally be reduced by approximately 60% to 80%.
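
The cost figures above are quoted in two different forms, percentage reductions and a cost multiple. The short sketch below converts between the two so they can be read on one scale. The helper names are hypothetical, and the conversion is plain arithmetic: it does not imply that the figures, which come from different sources and test setups, measure the same thing.

```python
def reduction_to_ratio(reduction: float) -> float:
    """Convert a fractional cost reduction (0 <= r < 1) to 'N times cheaper'."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def ratio_to_reduction(ratio: float) -> float:
    """Convert 'N times cheaper' (N >= 1) to a fractional cost reduction."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1")
    return 1.0 - 1.0 / ratio


# A 75% reduction means paying one quarter of the original cost.
print(round(reduction_to_ratio(0.75), 2))   # 4.0
# A ninefold cost difference corresponds to roughly an 89% reduction.
print(round(ratio_to_reduction(9.0), 3))    # 0.889
# The 60% to 80% range corresponds to roughly 2.5x to 5x cheaper.
print(round(reduction_to_ratio(0.60), 2), round(reduction_to_ratio(0.80), 2))
```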

Wang Yunhe, the founder of Jiyuan Lüdong (基元律动), previously led large-model research and development at leading technology companies; the company’s CTO is Han Kai. Within weeks of OpenSquilla’s launch, its GitHub star count reached the thousands. According to public reports, the company completed its first round of financing only months after its establishment, making it one of the few representative players in the Harness and agent local model sector.
