Incident Report Status: Resolved
FRIDAY → SUNDAY

How many IAM roles with administrator access exist in your AWS account right now?

If you can't answer that instantly, keep reading.
I couldn't either — and it cost us a full weekend.

Last year I was the DevOps lead at a fast-growing startup, responsible for AWS infrastructure, CI/CD pipelines, and platform security across multiple client projects. On a Friday afternoon I got a call that a CloudFront distribution wasn't deploying. Routine issue. Except it wasn't.

AWS had flagged and frozen our account. An access key had been exposed. Lambda was down. Nothing new could be provisioned. The weekend had just become a work weekend — and I didn't fully understand why yet.

By Sunday night, I did. Here are the three things that incident taught me — and the story of how I learned each one the hard way.

The credential you warned about
will be the one that leaks.

For months I had pushed the team to stop hardcoding AWS access keys in application code and use IAM roles instead: instance profiles for EC2, task roles for ECS. Every workload should assume a role, not carry a key. The request was ignored. There were always more urgent things.

Then a developer used an AI coding tool carelessly, and the key ended up somewhere public.

Friday evening. CloudTrail open. I'm tracing every action that key had taken over the previous week. What I found wasn't just an accidental leak: the attacker had already moved. They had created a dummy IAM user with administrator access, then an IAM role with administrator access as a second foothold, for use after we rotated the first key. They were planning to stay.
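
That kind of trace can be sketched in a few lines. The example below is a minimal illustration, not the tooling used during the incident: it assumes the CloudTrail records have already been exported and parsed into dictionaries, and the function name and the list of "persistence" event names are my own illustrative choices, not an exhaustive catalogue.

```python
from typing import Iterable

# IAM API calls that often indicate someone creating a foothold.
# Illustrative only; extend it for your own environment.
PERSISTENCE_EVENTS = {
    "CreateUser", "CreateRole", "CreateAccessKey", "CreateLoginProfile",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy", "PutRolePolicy",
    "UpdateAssumeRolePolicy",
}


def persistence_actions(records: Iterable[dict], access_key_id: str) -> list[dict]:
    """Return IAM persistence-style events made with one access key, oldest first.

    `records` are parsed CloudTrail records (the objects inside "Records").
    """
    hits = [
        r for r in records
        if r.get("userIdentity", {}).get("accessKeyId") == access_key_id
        and r.get("eventSource") == "iam.amazonaws.com"
        and r.get("eventName") in PERSISTENCE_EVENTS
    ]
    # CloudTrail eventTime is an ISO 8601 UTC string, so string order is time order.
    return sorted(hits, key=lambda r: r.get("eventTime", ""))
```

Anything this returns for a leaked key deserves a follow-up look at the user, role, or policy named in the record's request parameters.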

We killed the compromised user entirely. Spent the rest of Friday hunting every place that key lived — Bitbucket pipelines, application configs, environment variables across three environments. Replaced everything. Locked the replacement down to no console access and minimal permissions.
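
Hunting for every copy of a key is easier with a script than by eye. Here is a minimal sketch, assuming the repositories and config directories are checked out locally. The function names are hypothetical, and the pattern only matches access key IDs (the `AKIA...` and `ASIA...` identifiers), not the secret half of the key.

```python
import re
from pathlib import Path

# Long-lived IAM user access key IDs start with "AKIA" followed by 16
# uppercase letters or digits; temporary STS key IDs start with "ASIA".
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list[str]:
    """Return the distinct access key IDs found in a piece of text."""
    return sorted(set(ACCESS_KEY_ID.findall(text)))


def scan_tree(root: str) -> dict[str, list[str]]:
    """Walk a directory and map each file path to the key IDs it contains."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        found = find_access_key_ids(text)
        if found:
            hits[str(path)] = found
    return hits
```

A scan like this only covers files on disk. Pipeline variables and environment settings stored in a hosted service still have to be checked in that service.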

By midnight Friday we thought it was over.
It wasn't.

The access you didn't set up
is the access you don't watch.

Saturday morning. Still seeing suspicious activity. Something was still wrong. This is where the real story starts.

A third-party vendor, integrated into our AWS account for payment processing, had their own IAM role sitting in our account. With administrator access. With a trust relationship that let the vendor assume it directly. Set up before I joined, never reviewed, never questioned because the vendor had always "just worked."

That role was the actual entry point. The attacker had compromised something on the vendor's side and used their trusted role to get into our account with full administrator privileges. The Friday night rotation we were so confident about had done nothing to close that door.

We cut the vendor's access immediately, down to read-only for support purposes. Then I called a senior contact at the vendor to understand what had happened on their end. That conversation was not comfortable. But it was necessary.

We worked through Saturday and into Sunday rebuilding.

An incident is a gift
if you use it properly.

AWS restored our account by Sunday after reviewing our remediation. But the real work wasn't getting back to normal — it was making sure normal was no longer the problem.

No more hardcoded credentials. Every EC2 instance now uses an IAM instance profile and every ECS task uses a task role, so the application assumes a role and never touches a key. This is what I had asked for months earlier. The incident made it non-negotiable.

Separated keys by purpose. Application-level access scoped tightly to exactly the services each app needs — specific S3 buckets, specific Lambda functions, nothing broader. CI/CD pipelines got completely separate credentials with their own scope.
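
To make "scoped tightly" concrete, here is a minimal sketch of a policy document limited to a single S3 bucket. The helper name, the bucket name in the usage note, and the exact choice of actions are assumptions for illustration, not the policies we actually deployed.

```python
import json


def s3_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document that only allows access to one bucket.

    Listing is granted on the bucket ARN; reads and writes are granted
    only on objects under the given key prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOneBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    # "app-uploads" is a made-up bucket name.
    print(json.dumps(s3_bucket_policy("app-uploads", "invoices/"), indent=2))
```

The point of the design is that neither statement uses a wildcard for the bucket, so a leaked credential with this policy reaches one bucket and nothing else.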

Audited every third-party IAM role. For each one: who set it up, what does it trust, what can it do, when was it last assumed. Anything we couldn't explain got removed or scoped down. This audit alone found two other roles that had no business existing.
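
The "what does it trust" question can be answered mechanically from each role's trust policy. The sketch below is one hedged way to do it: it takes a trust policy already parsed into a dictionary and returns the foreign AWS account IDs allowed to assume the role. The function names are hypothetical, and it deliberately ignores service and federated principals as well as `Condition` blocks such as an external ID.

```python
import re

# Matches the account ID inside an IAM or STS principal ARN.
ACCOUNT_IN_ARN = re.compile(r"^arn:aws[a-z-]*:(?:iam|sts)::(\d{12}):")


def _as_list(value):
    """Policy fields may be a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def external_accounts(trust_policy: dict, own_account_id: str) -> set[str]:
    """Return account IDs (or "*") other than our own that may assume the role."""
    found: set[str] = set()
    for stmt in _as_list(trust_policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            found.add("*")
            continue
        if not isinstance(principal, dict):
            continue
        for p in _as_list(principal.get("AWS")):
            if p == "*":
                found.add("*")
            elif re.fullmatch(r"\d{12}", p):
                found.add(p)  # bare account ID form
            else:
                match = ACCOUNT_IN_ARN.match(p)
                if match:
                    found.add(match.group(1))
    found.discard(own_account_id)
    return found
```

With boto3, the trust policy for each role is available as `AssumeRolePolicyDocument` in the `list_roles` output, and `get_role` returns a `RoleLastUsed` field that helps with the "when was it last assumed" question. Any role where this function returns a non-empty set is a third-party door that someone should be able to explain.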

Redesigned CloudTrail alerting to catch unusual role assumptions and cross-account activity in near real-time. Not after the fact. Not in a Monday morning review. In real time.
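
The core check behind such an alert is small. This is a minimal sketch, not our production rule: it assumes each CloudTrail record arrives as a parsed dictionary in whatever consumes the trail, and the function name and the `allowed_accounts` allowlist are hypothetical.

```python
def is_unexpected_cross_account_assume(
    record: dict,
    own_account_id: str,
    allowed_accounts: frozenset = frozenset(),
) -> bool:
    """Flag sts:AssumeRole calls made by an account we do not expect.

    The caller's account is read from userIdentity.accountId. Records
    without one (for example, calls made by an AWS service) are not flagged.
    """
    if record.get("eventSource") != "sts.amazonaws.com":
        return False
    if record.get("eventName") != "AssumeRole":
        return False
    caller = record.get("userIdentity", {}).get("accountId")
    if caller is None:
        return False
    return caller != own_account_id and caller not in allowed_accounts
```

Keeping the allowlist explicit means every vendor account has to be written down somewhere, which is the review that never happened for the role that let the attacker in.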

The questions worth asking today, before Friday comes.

Run this audit on your AWS account — now
How many IAM roles with administrator access exist in your account right now?
When was each of them last used?
Can you explain every third-party trust relationship in your account?
Are your EC2 and ECS workloads using IAM roles (instance profiles, task roles) or carrying hardcoded keys?
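
For the first question, a rough starting point is to check each role's policies for full administrator access. The sketch below assumes you have already collected a role's attached policy ARNs and its inline policy documents as parsed dictionaries. The function names are hypothetical, and the check is deliberately narrow: it looks for the AWS managed `AdministratorAccess` policy or an inline statement allowing every action on every resource. It does not evaluate conditions, customer managed policies, or subtler privilege-escalation paths.

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def _as_list(value):
    """Policy fields may be a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def grants_full_admin(policy_document: dict) -> bool:
    """True if any Allow statement covers Action "*" on Resource "*"."""
    for stmt in _as_list(policy_document.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in _as_list(stmt.get("Action")) and "*" in _as_list(stmt.get("Resource")):
            return True
    return False


def is_admin_role(attached_policy_arns, inline_policy_documents) -> bool:
    """True if the role has AdministratorAccess attached or an equivalent inline policy."""
    if ADMIN_POLICY_ARN in attached_policy_arns:
        return True
    return any(grants_full_admin(doc) for doc in inline_policy_documents)
```

With boto3, the inputs can be gathered from `list_attached_role_policies`, `list_role_policies`, and `get_role_policy` for each role returned by `list_roles`. Count the roles where `is_admin_role` is true, and that number is your answer.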

If any of these made you pause — that's the work. Not glamorous. Not urgent until it suddenly is.

I found our unknown administrator role on a Saturday, mid-incident, while thinking we had already fixed everything the night before.

You have time to find yours first.

Divesh Kumar

Solution Engineer · DevOps Lead · Cloud Platforms

4+ years in AWS, Azure, GCP · Kubernetes · Terraform · CI/CD