What We Learned Building AI Inside One of the Most Compliance-Intensive Operations We Run

By Sooraj Chandran, Group Product Manager, International, Justworks

I want to tell you a story about building AI inside a high-stakes environment, and I want to tell all of it, including the part where something broke.

Project Turing is an AI Co-Pilot built for Justworks' international EOR operations team. EOR, employer of record, is the service we provide to small businesses hiring employees outside the United States. We are the legal employer in those countries on the customer's behalf. We own the compliance obligations, the tax filings, the entity structures, the employment agreements. The customer gets an international employee. We get the responsibility of making sure everything is done correctly across 50 countries, each with its own employment law and regulatory environment.

This is not a simple environment to layer AI on top of. The stakes are high. The edge cases are real. And a wrong answer carries legal and financial consequences for a small business that trusted us to get it right.

Which is exactly why it is the right environment to build AI in. Because if you can make AI trustworthy here, you have proven something meaningful about what AI can do in high-compliance, high-stakes work.


Why we started here

The international ops team was spending an enormous proportion of their time on work that was mechanical, pattern-based, and high-volume. Looking up employment agreement clauses. Cross-referencing job descriptions against compliance requirements. Pulling together information from multiple systems to answer a question that a consultant already knew how to answer. They just needed the data assembled first.

That is the kind of work AI is genuinely well-suited for: removing the assembly work that sits in front of judgment so the humans can spend their time on the parts that actually require expertise.

The goal from the beginning was to give our people back the time they were spending on work that did not require them, so they could spend more of their time on the work that did.


What we built and how we built it

We started with a team of two. No large project team, no lengthy planning phase. A clear problem, a clear standard, and a commitment to ship something real as fast as possible so we could learn from it.

The standard we held ourselves to was simple: the output has to be trustworthy enough that our people will actually rely on it. Accurate, consistent, and grounded in the specific compliance context of the country and situation in question.

In one recent week, that team of two shipped five new capabilities, and usage more than doubled over the same week. Messages went from 188 to 426, a 127% increase. Unique conversations nearly tripled, from 64 to 179. Token usage rose 3.5x, outpacing message growth, which suggests the queries are getting more complex, not just more frequent. People are asking harder questions.
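The week-over-week figures above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the helper names are hypothetical, the inputs are the counts quoted above, and it assumes the 3.5x token figure covers the same week as the message counts.

```python
def pct_increase(before: float, after: float) -> float:
    """Percentage increase going from `before` to `after`."""
    return (after - before) / before * 100


def growth_multiple(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


# Messages: 188 -> 426
messages_pct = pct_increase(188, 426)      # about 126.6, which rounds to 127
messages_x = growth_multiple(188, 426)     # about 2.27

# Unique conversations: 64 -> 179
conversations_x = growth_multiple(64, 179)  # about 2.8, i.e. "nearly tripled"

# If tokens grew 3.5x while messages grew about 2.27x, the average
# tokens per message grew by roughly 3.5 / 2.27, or about 1.5x.
tokens_per_message_x = 3.5 / messages_x
```

On those numbers, the average message consumed roughly one and a half times as many tokens as the week before, which is consistent with heavier queries rather than simply more of them.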

The reaction from Julio, one of our international consultants, after completing a task in under 10 minutes that would otherwise have taken multiple people several days: "Honestly, wow, blown away."

That is the signal we were building toward. A person who does this work every day telling us that something genuinely changed.


What broke and what it told us

A few weeks in, we tried something more ambitious.

A mega query: scanning every employment agreement and job description across hundreds of employees for specific clauses and phrases. A team of people spent several weeks completing this same task last quarter as part of a compliance audit. Exactly the kind of high-volume, pattern-based work where the cost of human time is high and the value of the output is unambiguous.

The Project Turing Co-Pilot nearly completed it in under 15 minutes.

Nearly. The agent hit its limits before finishing. The run failed.

That failure did not tell us the approach was wrong. It told us exactly what to build next. The system was capable enough to attempt a task that would take a human team weeks, and it got most of the way there before hitting an infrastructure constraint we now have a clear path to resolve. A roadmap, not a setback.

What we are building toward: long-running autonomous agents that can handle tasks of this complexity without hitting context or memory limits. Sub-agents that can be orchestrated across a multi-step compliance workflow. And the goal we have set for the next three months: an ops team member who can create an agent or workflow entirely without engineering intervention.

That last milestone matters most to me. The people who know this work best, the international consultants who understand the compliance environment, the edge cases, and the places where the data is ambiguous, will be able to build their own tools directly, in plain English. The expertise and the tool-building will be in the same hands.
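One common way to keep a large review from exhausting a single agent's limits is to split the documents into batches that each fit a fixed budget, hand each batch to a separate worker, and keep completed results so a failed run can resume. The sketch below illustrates that general pattern only; it is not the Project Turing implementation. Every name in it (Document, estimate_tokens, make_batches, review_batch, run_review) is hypothetical, the token estimate is a crude assumption, and the "sub-agent" is a plain phrase search standing in for a model call.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per four characters.
    return max(1, len(text) // 4)


def make_batches(docs, budget):
    """Group documents so each batch's estimated tokens stay within `budget`.

    A single document larger than the budget still gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for doc in docs:
        cost = estimate_tokens(doc.text)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += cost
    if current:
        batches.append(current)
    return batches


def review_batch(batch, phrases):
    """Stand-in for a sub-agent: case-insensitive phrase search per document."""
    return {
        doc.doc_id: [p for p in phrases if p.lower() in doc.text.lower()]
        for doc in batch
    }


def run_review(docs, phrases, budget, checkpoint=None):
    """Review batch by batch; documents already in `checkpoint` are skipped."""
    results = {} if checkpoint is None else checkpoint
    pending = [d for d in docs if d.doc_id not in results]
    for batch in make_batches(pending, budget):
        results.update(review_batch(batch, phrases))
    return results


docs = [
    Document("a1", "The employee agrees to a non-compete clause."),
    Document("a2", "Standard probation period applies."),
    Document("a3", "Includes a Non-Compete and a probation period."),
]
results = run_review(docs, ["non-compete", "probation period"], budget=20)
```

Because results are stored per document, a run that stops partway can be restarted with the partial results passed in as `checkpoint`, and only the unfinished documents are processed again.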


What this requires that pure software cannot provide

International EOR compliance is not a domain where approximate answers are acceptable. A missed clause in an employment agreement. A misclassified employee in a jurisdiction with strict employment law. A benefits structure that does not comply with local requirements. Preventing exactly these failures is at the core of what we do.

Building AI in this environment requires deep institutional knowledge baked into every design decision. Which data sources can be trusted. Where the ambiguity in a regulation requires human judgment rather than a model output. What the failure modes look like and why they matter. How to design the system so that the human is always in the right position to catch what the AI cannot.
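As a rough illustration of keeping the human in position to catch what the AI cannot, the sketch below sends every finding to a person and only decides how much scrutiny it gets. It is a hypothetical example, not a description of how Turing works: the Finding fields, the confidence score, the 0.9 threshold, and the queue names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    employee_id: str
    answer: str
    confidence: float           # hypothetical score in [0, 1]
    source_trusted: bool        # did the answer come from a trusted data source?
    regulation_ambiguous: bool  # does the rule itself call for human judgment?


def route(finding: Finding, threshold: float = 0.9) -> str:
    """Pick a review queue; nothing bypasses a human in this sketch."""
    needs_close_look = (
        not finding.source_trusted
        or finding.regulation_ambiguous
        or finding.confidence < threshold
    )
    return "priority_review" if needs_close_look else "standard_review"


clear_case = Finding("e1", "clause present", 0.97, True, False)
shaky_case = Finding("e2", "clause present", 0.97, True, True)
```

Here `route(clear_case)` lands in the standard queue, while `route(shaky_case)` is escalated because the regulation is flagged as ambiguous, even though the confidence score is high.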

We could not have built Turing in a meaningful way without twelve years of Justworks operating in this environment. The knowledge layer, the understanding of what the compliance questions actually are, what a trustworthy answer looks like, and where the boundaries of automation should sit, is not something you acquire quickly. It is something you accumulate by doing the work at scale over time.

Trust is not a claim you make. It is something you demonstrate through the specificity of the work, and the willingness to be honest about what broke along the way.


What we are building next

For the next several weeks, the team is focused on the infrastructure investment that the mega query failure showed we need: long-running agents and core agent capabilities. More depth, executed more reliably, on harder problems.

We are also moving toward making Ops Co-Pilot an internal agents platform, enabling other teams across Justworks to build their own capabilities directly on top of the infrastructure we have created. The international team will not be the only ones building this way for long.

Within three months, an ops team member will create an agent or workflow without any engineering intervention. When that happens, the people who know the compliance environment best will be building the tools that serve it. That is when this gets really interesting.

