Cloud outages happen. But when your own AI coding tool might be the one pulling the trigger? That’s a different kind of headline.

A recent 13-hour Amazon Web Services disruption was reportedly triggered by Kiro, Amazon’s own agentic AI coding tool. According to reporting by the Financial Times, which spoke to four people familiar with the matter, engineers deployed Kiro in December to make certain infrastructure changes. The bot then reportedly decided it needed to “delete and recreate the environment.” Things went sideways from there.

Amazon, however, tells a very different story.

What Kiro Actually Did

Before we get into the finger-pointing, it helps to understand what Kiro is. Launched in July 2024, Kiro is what’s called an agentic AI tool. Think of it less like a chatbot you chat with and more like an AI assistant you hand the wheel to. It can take autonomous actions on your behalf without requiring you to approve every single step.

That autonomy is exactly what makes these tools powerful. It’s also what makes them risky when something goes wrong.

According to the Financial Times report, the December incident primarily impacted China and affected AWS Cost Explorer, the service customers use to visualize and track their cloud spending. Multiple AWS employees confirmed to the publication that this was “at least” the second time in recent months that Amazon’s AI tools were connected to a service disruption.

“The outages were small but entirely foreseeable,” one senior AWS employee reportedly said.

Kiro AI agent autonomously deletes and recreates AWS environment causing outage

Amazon Hits Back Hard

Amazon did not take the Financial Times report lying down. The company pushed back sharply, calling the story inaccurate, and published a detailed response on its own news blog.

Here’s Amazon’s core argument: this wasn’t an AI problem. It was a user access control problem.

The company says the staffer involved had “broader permissions than expected.” Kiro, by default, asks for authorization before taking any action. But because this particular user had misconfigured access controls, the tool had more runway than it should have. Amazon’s position is that any developer tool, AI-powered or not, could have caused the same outcome under those conditions.

Amazon also pushed back on the scale of the incident. The company described it as an “extremely limited event” affecting a single service in just one of its 39 geographic regions worldwide. Compute, storage, databases, and AI services were untouched. Amazon also noted it received zero customer inquiries about the disruption.

“The same issue could occur with any developer tool or manual action,” the company stated. “We want to address the inaccuracies in the Financial Times’ reporting.”

As for the claim of a second AI-related AWS incident, Amazon flatly denied it. “The Financial Times’ claim that a second event impacted AWS is entirely false,” the company said.

!AWS data center interior showing server racks with blue LED lighting, illustrating cloud infrastructure behind the Kiro AI outage story

Kiro agentic AI tool autonomously deleted and recreated AWS environment

The Pressure to Use AI Tools Internally

Here’s where things get genuinely interesting. Amazon didn’t just build Kiro for external customers. It sells monthly subscriptions to the tool while simultaneously pushing its own employees to use it heavily.

Leadership reportedly set an 80 percent weekly usage goal for Kiro across the company and has been closely tracking adoption rates. That’s a significant internal mandate, and it raises a fair question: when you’re actively driving employees toward a powerful agentic tool, are you also building enough guardrails to match that pace?

Amazon says yes. The company announced mandatory peer review for production access as one of several new safeguards implemented after the December incident. But adding those safeguards after the fact suggests something wasn’t tight enough before.

This Isn’t the First Big AWS Headache

Context matters here. Just two months before the December incident, AWS dealt with a much more serious 15-hour outage in October that disrupted Alexa, Snapchat, Fortnite, and Venmo, among other services. Amazon blamed that one on a bug in its automation software, not AI tools specifically.

That event was broader and more disruptive by any measure. Still, two significant operational incidents within months of each other raises legitimate questions about reliability, especially as AWS customers increasingly depend on these services for critical workloads.

Amazon, for its part, points to its Correction of Error (COE) process as evidence of its commitment to learning from every incident, regardless of how big or small the customer impact is. “We review these together so that we can learn from any incident, irrespective of customer impact, to address issues before their potential impact grows larger,” the company said.

Misconfigured user permissions caused limited AWS Cost Explorer regional outage

!Screenshot-style illustration of AWS Cost Explorer dashboard interface showing usage graphs, representing the affected service during the December outage

Agentic AI Is Powerful and That’s Exactly the Problem

There’s a larger conversation happening here that goes beyond this one incident. Agentic AI tools are spreading fast across the tech industry. They’re genuinely useful. They can automate complex tasks, speed up development cycles, and reduce repetitive manual work.

But they also act. And when an AI agent acts on misconfigured permissions in a production environment, the consequences can ripple quickly.

Amazon’s framing, that this was a user error rather than an AI error, is technically defensible. Access control failures happen without AI involved. However, agentic tools do amplify the blast radius when those failures occur, because the tool keeps moving until something stops it.

The AWS incident, however you assign blame, is a useful reminder that giving AI agents broader autonomy requires broader safeguards. Not after something breaks. Before.

Whether you’re a developer, an IT administrator, or just someone trying to understand why their cloud-based app was acting strange in December, the takeaway is pretty clear. AI tools are only as safe as the guardrails around them. And those guardrails need to exist from day one, not get bolted on afterward.

Amazon says it has learned from this. The mandatory peer review requirement is a real step. It would be more reassuring if it hadn’t taken an outage to get there.