A weird thing about the claim that bots surpassed humans is that most people hear it as a story about fake people. As if half the internet is now bots arguing on Reddit, posting slop, and liking each other’s comments.
That is not what the 51% number means. Imperva/Thales’s 2025 Bad Bot Report said automated traffic made up 51% of all web traffic in 2024, versus 49% from humans, and 37% of total traffic was classified as bad bots, up from 32% in 2023. Search crawlers, scrapers, monitoring tools, API probes, price bots, and attack traffic all get counted in that same mix.
Here’s the update that makes the headline more interesting, not less: by 2025, Human Security was already breaking AI-driven traffic into training crawlers (67.5%), AI scrapers (31.9%), and agentic AI (1.7%). That is the market moving on from “did bots beat humans?” to the question that actually matters: which automated traffic gets allowed, charged, or blocked.
The core argument is simple: the 51% headline matters because it collapses useful automation, commercial extraction, and malicious abuse into one traffic metric, turning web traffic from an audience signal into a cost-allocation problem.
Did bots surpass humans in internet traffic in 2024?
Yes. By Imperva/Thales’s measurement, bots surpassed humans in internet traffic in 2024.
But wait, what exactly is being measured here? Not “how many fake users exist online.” Not “how many social accounts are bots.” The report is about web requests observed across Imperva’s network, which means it is counting machine hits to websites, apps, and APIs.
A single product page can now get requests from:
– Google indexing it
– an AI scraper copying it for training or search
– a competitor’s price bot checking it every minute
– a monitoring tool verifying uptime
– a credential-stuffing bot trying leaked passwords
– actual humans, maybe
Those all land in traffic totals. That is why "bots surpassed humans" is both true and easy to misunderstand.
A compact view helps:
| Traffic category (2024, Imperva/Thales) | Share of web traffic |
|---|---|
| Human traffic | 49% |
| Automated traffic | 51% |
| └─ Bad bots | 37% |
| └─ Other automation / good bots | 14% |
That last line is the whole fight. The 51% figure is not “the robots are here.” It is a single top-line bucket mixing together useful automation, extractive automation, and outright abuse.
Why the 51% figure matters more than the headline
The important part is not that machines “won.” It’s that traffic has stopped being a clean proxy for audience because one metric now hides radically different economic behaviors.
If your analytics spike, was that:
– more readers,
– more search crawler activity,
– more AI-driven traffic,
– more scraping,
– or more attack traffic?
Those are not different shades of the same thing. They are different cost structures pretending to be one number.
Here’s a simple hypothetical. Say a publisher had 10 million monthly page requests in January and 12 million by April, a tidy 20% traffic increase. Sounds healthy. But subscriptions are flat, ad revenue is flat, and hosting costs are up 28%. After segmenting logs, they find the extra 2 million requests came mostly from AI scrapers, aggressive monitoring, and bot probes hitting uncached pages.
That is not growth. It is somebody else’s automation using your infrastructure.
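What "segmenting logs" means in practice is mostly boring counting. Here is a minimal sketch in Python, assuming combined-format access logs and a few hand-rolled user-agent patterns; the class names are illustrative, and serious classification needs maintained bot lists and IP verification, not a handful of regexes.

```python
import re
from collections import Counter

# Rough user-agent buckets for illustration only; a real setup would use a
# maintained bot list plus IP/ASN verification rather than a few regexes.
UA_CLASSES = [
    ("ai_scraper", re.compile(r"GPTBot|ClaudeBot|CCBot|Bytespider", re.I)),
    ("search_crawler", re.compile(r"Googlebot|bingbot", re.I)),
    ("monitoring", re.compile(r"UptimeRobot|Pingdom|StatusCake", re.I)),
]

def classify(user_agent: str) -> str:
    for label, pattern in UA_CLASSES:
        if pattern.search(user_agent):
            return label
    return "human_or_unknown"

def segment(log_lines) -> Counter:
    """Count requests per traffic class for one batch of access-log lines."""
    counts = Counter()
    for line in log_lines:
        # Combined log format keeps the user agent in the last quoted field.
        parts = line.rsplit('"', 2)
        ua = parts[-2] if len(parts) == 3 else ""
        counts[classify(ua)] += 1
    return counts

# Comparing segment(january_lines) with segment(april_lines) shows whether
# the extra 2 million requests came from readers or from someone else's bots.
```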
This is why the 51% number matters. Once useful crawling, commercial extraction, and malicious abuse all hit the same servers, “traffic” becomes less of an audience metric and more of a billing dispute.
What the report counts as bots, and what it does not
A lot of people hear “bots” and think fake social-media personas. That category exists, but it is not what this report is mainly about. The report is about automated web traffic.
Imperva splits traffic into broad classes. “Good bots” include search engine crawlers and some useful automation. “Bad bots” include scraping, fraud, account takeover attempts, vulnerability probing, and other abusive behaviors. In Imperva’s count, bad bots alone reached a record 37% of total internet traffic in 2024.
Wait, if search crawlers and attack bots both count as bots, doesn’t that make the 51% number kind of mushy? Yes. Exactly.
The number is real, but it is not morally sorted. It bundles together:
– useful automation that helps the web function
– commercial extraction that copies or monitors without much reciprocity
– malicious abuse that drives everyone’s costs up
A small comparison table makes this clearer than another paragraph of abstraction:
| Traffic type | Typical purpose | Value created | Cost to site owner |
|---|---|---|---|
| Human sessions | Read, browse, buy, subscribe | Direct business value | Usually worth serving |
| Search crawlers | Index pages for discovery | Referral value | Moderate infrastructure cost |
| AI scrapers / training crawlers | Collect content or product data | Mostly captured by the crawler operator | Can create high uncached load with little return |
| Monitoring / API automation | Uptime checks, integrations | Operational value | Usually predictable and accepted |
| Bad bots | Fraud, scraping, credential stuffing, probing | Negative value | Security spend, origin load, incident risk |
Different vendors will produce different percentages because they see different traffic and define bot categories differently. That does not kill the pattern. It sharpens the real question: which machine load are you paying for, and what do you get back?
Why 2024 was the tipping point for bad bots
The cleanest progression in the data is this: bad bots rose from 32% of traffic in 2023 to 37% in 2024, AI made them cheaper to adapt, trusted crawler identities became camouflage, and operators started writing access policy by bot type instead of pretending all automation was one category.
First, the jump itself. Five points in one year is huge. At internet scale, that means a lot more scraping, probing, fraud, and attack traffic hitting applications that still often report “traffic” as if it implies demand.
Second, generative AI changed the workflow more than the existence of automation. Bots were already scraping pages, rotating IPs, testing leaked credentials, and hitting APIs long before ChatGPT. What AI changes is iteration cost. It helps operators rewrite scripts after blocks fail, generate more realistic request patterns, produce convincing spam or form fills, and keep adjusting faster than simple defenses can keep up.
According to coverage of the Imperva findings, the company observed roughly 2 million AI-enabled attacks per day in 2024. That does not mean every attack was some autonomous super-agent. It means the boring parts of bot operations got cheaper.
Then there is the stranger detail. Reporting around the data said ByteSpider, the crawler associated with ByteDance, accounted for 54% of AI-enabled attacks seen by Imperva. The revealing part is not a cartoon villain story. It is that known crawler identities can become useful cover. If a name is widely recognized, attackers spoof it. Reputation becomes attack surface.
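The practical counter to spoofed crawler names is to verify identity against the network rather than trust the User-Agent header. Below is a minimal sketch using the reverse-DNS-plus-forward-confirm method that Google and Bing document for their crawlers; the suffix map is an assumption to re-check against each operator's published guidance, and other operators may offer no verification path at all.

```python
import socket

# Hostname suffixes the major search operators publish for crawler verification.
# Treat this map as an assumption to re-check against current documentation.
VERIFIED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def crawler_identity_matches(claimed_name: str, client_ip: str) -> bool:
    """Does the connecting IP actually belong to the crawler it claims to be?"""
    suffixes = VERIFIED_SUFFIXES.get(claimed_name)
    if not suffixes:
        return False  # no published verification path -> treat as unverified
    try:
        # Step 1: reverse-DNS the connecting IP and check the hostname suffix.
        hostname, _, _ = socket.gethostbyaddr(client_ip)
        if not hostname.endswith(suffixes):
            return False
        # Step 2: forward-confirm the hostname resolves back to the same IP,
        # so an attacker cannot win just by controlling their own reverse DNS.
        forward_ips = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
        return client_ip in forward_ips
    except (socket.herror, socket.gaierror):
        return False

# A request whose User-Agent says "Googlebot" but fails this check is exactly
# the "reputation as camouflage" pattern described above.
```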
By 2025 and into 2026, the classification got more explicit: training crawlers (67.5%), AI scrapers (31.9%), agentic AI (1.7%). Those categories are not just taxonomy; they are the beginnings of a pricing and permission system for the web.
A training crawler raises one policy question: can it ingest your content at all? An AI scraper raises another: does it pay if it extracts value at scale? Agentic traffic raises a third: what actions can an automated system take on your site before it needs identity, rate limits, or a contract?
That is the payoff. The web is moving toward three tiers for automation, sketched in code just after this list:
– Allow: indexing, verified integrations, predictable monitoring
– Charge: high-volume extraction, premium API access, expensive dynamic endpoints
– Block: fraud, credential stuffing, abusive scraping, spoofed crawlers
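Here is a minimal sketch of how allow / charge / block becomes something a gateway can execute, assuming requests are already classified into categories like the ones above; the class names and the per-1,000-request prices are invented for illustration, not a real billing scheme.

```python
# Map traffic classes to an access decision and an optional (illustrative) price
# per 1,000 requests. Unknown classes fall through to a challenge, not a block.
POLICY = {
    "search_crawler": ("allow", None),
    "verified_integration": ("allow", None),
    "monitoring": ("allow", None),
    "training_crawler": ("charge", 0.50),
    "ai_scraper": ("charge", 2.00),
    "agentic_ai": ("charge", 5.00),
    "bad_bot": ("block", None),
    "spoofed_crawler": ("block", None),
}

def decide(traffic_class: str):
    """Return (action, price_per_1k_requests) for a classified request."""
    return POLICY.get(traffic_class, ("challenge", None))

print(decide("ai_scraper"))    # ('charge', 2.0)
print(decide("bad_bot"))       # ('block', None)
print(decide("weird_new_ua"))  # ('challenge', None)
```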
What readers should take from the bot traffic shift
If you run a site, app, store, or API, the move now is not “get serious about bots” in the abstract. It is to classify traffic by economic behavior and tie each class to an action.
Here’s a concrete operator checklist:
| Metric to watch | What it tells you | Trigger | Action |
|---|---|---|---|
| Verified-human sessions | Real audience | Requests rise while verified-human sessions stay flat | Treat the increase as suspect until segmented |
| Conversion rate | Business value from traffic | Traffic up, conversions flat or down | Check for scraper or low-intent bot inflation |
| Uncached origin requests | Expensive infrastructure load | Origin hits rise faster than pageviews | Rate-limit heavy paths, cache harder, gate expensive endpoints |
| WAF challenge rate | Active bot pressure | Sharp rise in challenges or failed challenges | Tighten bot rules, fingerprint repeat offenders, protect login/search/API routes |
| Top scraper user agents and ASN/IP clusters | Who is extracting data | New crawler identities spike or “known” crawlers behave oddly | Verify identity, throttle, require robots.txt compliance, or block |
| Cost per 1,000 requests by traffic class | Who is making you pay | One traffic class has high marginal cost and low value | Move it to paid API, metered access, or denial |
That last metric is the one more teams should compute. Not just traffic share. Cost per 1,000 requests by traffic class. Once you have that, the policy gets much easier.
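Here is the arithmetic for that metric, with request counts, attributed infrastructure spend, and attributed value all invented for the sake of the sketch; the point is the shape of the calculation, not the numbers.

```python
# Monthly requests, attributed cost (USD), and attributed value (USD) per class.
# All figures below are made up for illustration.
requests = {"human": 6_000_000, "search_crawler": 800_000,
            "ai_scraper": 1_500_000, "bad_bot": 1_200_000}
cost = {"human": 3_000.0, "search_crawler": 500.0,
        "ai_scraper": 2_400.0, "bad_bot": 2_900.0}
value = {"human": 25_000.0, "search_crawler": 6_000.0,
         "ai_scraper": 0.0, "bad_bot": 0.0}

for cls, reqs in requests.items():
    cost_per_1k = cost[cls] / (reqs / 1_000)
    net = value[cls] - cost[cls]
    print(f"{cls:15s} {cost_per_1k:6.2f} USD per 1k requests, net {net:+10.2f}")

# Classes with high cost per 1k and near-zero return are the candidates for
# metered access, a paid API, or outright denial.
```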
For publishers, the practical shift is from SEO policy to access policy. Decide which crawlers can index, which can train, which can scrape article archives, and which get cut off. The AI content feedback loop is one version of the same problem: machine systems consuming content, reproducing it elsewhere, and sending back less value than they take.
For platforms, the visible bot problem is only one layer. Fake accounts matter, and our piece on How Many Bots Are on X (Twitter)? gets into that. But underneath the feed is the bigger infrastructure shift: more of the web’s activity is systems talking to systems before any human shows up.
For ordinary users, the change is quieter. More of what looks like “internet activity” is now fetching, indexing, copying, ranking, probing, and acting by automation. The web is still made for people. But a growing share of its traffic economy is machine-to-machine negotiation over who gets access, who gets blocked, and who picks up the bill.
Key Takeaways
- Imperva/Thales said automated traffic reached 51% of all web traffic in 2024, so by that measure bots surpassed humans.
- That headline is easy to misread because it mixes useful automation, commercial extraction, and malicious abuse into one traffic bucket.
- The real shift is economic: web traffic is becoming less of an audience signal and more of a cost-allocation problem.
- Generative AI matters mostly because it lowers the cost of adapting, disguising, and scaling bot operations.
- By 2025/2026, firms were already segmenting AI-driven traffic into training crawlers, AI scrapers, and agentic AI, which is really a blueprint for allow / charge / block policy.
Further Reading
- Bots now make up more than half of global internet traffic (The Independent). News coverage of the Imperva/Thales finding that automated traffic reached 51% in 2024.
- Artificial Intelligence Fuels Rise of Hard-to-Detect Bots… Syndicated reporting with the key figures: 51% total bot traffic and 37% bad-bot traffic.
- 2025 Bad Bot Report (Imperva/Thales). The original report, including definitions, methodology, and year-over-year comparisons.
- 36% of global internet traffic originated from bots. A useful reminder that bot estimates vary depending on how they are measured.
- AI-driven traffic is the fastest-growing category of internet traffic. Follow-on reporting showing how AI crawlers, scrapers, and agents are becoming distinct traffic categories.
