unsorry - Executive Summary

The future of autonomous work is being rewritten not by monolithic super-intelligences, but by swarms of specialized agents working in iterative cycles. Agentics NZ has introduced a framework that treats complex engineering and research tasks as a series of recurring “loops.” In this model, progress is defined strictly by the production of tangible artifacts, such as code commits or verified research files. If an agent completes a cycle without producing a verifiable result, the system considers it a failure. By organizing tasks into “goal ladders” of increasing difficulty and using an “LLM as a judge” to filter out suboptimal suggestions, these systems ensure that AI agents remain focused on high-value outcomes rather than getting lost in technical noise.
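The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's actual code: the names `Artifact`, `Goal`, `run_ladder`, the 0.0 to 1.0 judge score, and the `threshold` and `max_attempts` parameters are all hypothetical. It shows the three ideas together: goals are attempted in order of difficulty, a cycle counts only if it yields an artifact, and a judge function filters what is accepted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Artifact:
    """A tangible result of one loop, e.g. a commit hash or a verified file."""
    kind: str
    ref: str


@dataclass
class Goal:
    """One rung of a hypothetical goal ladder; higher difficulty comes later."""
    name: str
    difficulty: int


# Assumed signatures: an agent attempts a goal and may return an artifact;
# a judge scores a proposed artifact between 0.0 and 1.0.
Agent = Callable[[Goal], Optional[Artifact]]
Judge = Callable[[Goal, Artifact], float]


def run_ladder(
    goals: List[Goal],
    agent: Agent,
    judge: Judge,
    threshold: float = 0.5,
    max_attempts: int = 3,
) -> List[Artifact]:
    """Climb the ladder in order of difficulty; stop at the first unsolved rung."""
    accepted: List[Artifact] = []
    for goal in sorted(goals, key=lambda g: g.difficulty):
        for _ in range(max_attempts):
            artifact = agent(goal)
            # A cycle with no artifact, or one the judge rejects, is a failure.
            if artifact is not None and judge(goal, artifact) >= threshold:
                accepted.append(artifact)
                break
        else:
            # No attempt produced an accepted artifact; do not climb further.
            break
    return accepted
```

In a real deployment the `agent` and `judge` callables would wrap model calls; here they are plain functions so the control flow can be tested in isolation.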

This methodology has found a demanding testing ground in a project called “unsorry,” which applies agentic swarms to formal mathematics. Using the Lean theorem prover, the platform lets autonomous agents tackle complex proofs that were previously the sole domain of human mathematicians, with every result checked mechanically. The system uses Git and GitHub as its primary coordination layer, effectively turning version control into a decentralized, serverless coordination engine. This infrastructure allows agents to collaborate asynchronously, producing over 1,400 merged pull requests and 7,500 commits in just ten days. The system is also self-healing: when the pipeline hits a bottleneck, agents are tasked with writing their own “bypass” updates to fix the underlying infrastructure.
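For readers unfamiliar with Lean, the following toy example illustrates what a verifiable artifact looks like in this setting. In Lean, `sorry` is a placeholder that lets an unfinished proof compile with a warning; the project's name presumably alludes to removing such placeholders, though that reading is an assumption. The theorem names below are hypothetical and far simpler than anything a swarm would target.

```lean
-- Before: the statement type-checks, but the proof is only a placeholder.
theorem add_zero_example (n : Nat) : n + 0 = n := by
  sorry

-- After: the placeholder is replaced by a proof that Lean accepts.
theorem add_zero_example' (n : Nat) : n + 0 = n := by
  rfl
```

Because Lean either accepts a proof or rejects it, a change that replaces `sorry` with a real proof can be checked mechanically before it is merged.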

The success of these swarms is closely tied to the evolution of model sophistication. Newer experimental models like “Fable” demonstrate a significant leap in capability, showing that they can manage broad, complex workflows with minimal human intervention. The developers have learned that the best way to manage these agents is to provide high-level goals rather than micromanaging their every move. Over-structuring prompts can actually hinder a model’s ability to solve problems creatively. However, this high-speed autonomy comes with challenges, particularly regarding memory. Agents can become bogged down by irrelevant past data, leading to a need for “memory pruning” to keep them focused on the task at hand.
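One simple way to picture memory pruning is a relevance filter applied before each loop. The sketch below is an illustration only: the `Memory` record, the `prune_memory` function, and the word-overlap score are assumptions made for the example, and a production system would likely use a stronger relevance signal such as embeddings.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Memory:
    text: str
    step: int  # loop iteration in which the entry was recorded


def _tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens; deliberately crude."""
    return set(text.lower().split())


def prune_memory(memories: List[Memory], task: str, keep: int) -> List[Memory]:
    """Keep the `keep` entries that share the most words with the task.

    Ties are broken in favor of more recent entries.
    """
    task_words = _tokens(task)
    ranked = sorted(
        memories,
        key=lambda m: (len(task_words & _tokens(m.text)), m.step),
        reverse=True,
    )
    # Return survivors in chronological order so the context reads naturally.
    return sorted(ranked[:keep], key=lambda m: m.step)
```

Calling `prune_memory(memories, task, keep=2)` before each cycle bounds the context the agent carries forward, at the cost of occasionally discarding an entry that would have mattered later.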

For organizations looking to adopt this approach, the lessons are clear. Success depends on grounding agents in business principles, such as prioritizing user experience or technical correctness, to prevent them from over-engineering trivial tasks. Using Git as a persistent state machine allows for seamless, decentralized collaboration, while formal verification languages like Lean ensure that AI-generated research is mathematically sound. Finally, as these systems scale, developers must remain mindful of infrastructure costs. High-frequency loops can quickly consume massive amounts of computing resources, making it essential to implement fair-queuing mechanisms to keep the swarm efficient, productive, and financially sustainable.
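The fair-queuing idea can be illustrated with a round-robin scheduler. This is a hedged sketch, not the mechanism the project uses: the `FairQueue` class, its method names, and the `budget` parameter (standing in for a cap on compute per scheduling pass) are hypothetical.

```python
from collections import OrderedDict, deque
from typing import Deque, Iterator, Tuple


class FairQueue:
    """Round-robin queue: each agent gets one job per pass, however many it submits."""

    def __init__(self) -> None:
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def submit(self, agent_id: str, job: str) -> None:
        """Append a job to the submitting agent's own queue."""
        self._queues.setdefault(agent_id, deque()).append(job)

    def drain(self, budget: int) -> Iterator[Tuple[str, str]]:
        """Yield up to `budget` (agent_id, job) pairs, cycling across agents."""
        while budget > 0 and self._queues:
            # Iterate over a snapshot so empty queues can be removed safely.
            for agent_id in list(self._queues):
                if budget == 0:
                    break
                queue = self._queues[agent_id]
                yield agent_id, queue.popleft()
                budget -= 1
                if not queue:
                    del self._queues[agent_id]
```

With this scheme an agent that floods the queue cannot starve a quieter one, and the budget gives operators a single knob for limiting spend per pass.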