Perplexity Says Cloudflare's Accusations of 'Stealth' AI Scraping Are Based on Embarrassing Errors

Cloudflare accuses Perplexity of sneaking into websites to steal their content
Elyse Betters Picaro / ZDNET

ZDNET's key takeaways

  • Cloudflare claims Perplexity ignores websites' wishes in its content hunt.
  • Other AI companies, such as OpenAI, don't swipe content, Cloudflare says.
  • Cloudflare now offers services to block aggressive AI crawlers.
  • Perplexity is denying Cloudflare's claims.

Cloudflare, a leading content delivery network (CDN) company, has accused the AI startup Perplexity of evading websites' "no crawl" directives by stealthily deploying web crawlers to scrape content from sites that have explicitly blocked its official bots.

If that sounds familiar, you've heard these accusations before. Last year, WIRED and Forbes both accused Perplexity of doing the same thing to their sites.

How Perplexity allegedly bypasses 'no crawl' directives

According to Cloudflare, when Perplexity's web crawler encounters a robots.txt file, which sites use to block their content from being crawled, Perplexity pretends to be an ordinary Chrome web browser on a Mac. This enables it to bypass the bot barriers.


Cloudflare started investigating when it received complaints from customers who had "both disallowed Perplexity crawling activity in their robots.txt files and also created WAF [Web Application Firewall] rules to specifically block both of Perplexity's declared crawlers: PerplexityBot and Perplexity-User." The customers said their content still ended up in Perplexity, even after they had blocked it.
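
To make the mechanics concrete, here is a minimal Python sketch of how a well-behaved crawler would read such a robots.txt file. The file contents, the example.com URL, and the Chrome-style user-agent string are illustrative assumptions, not taken from Cloudflare's report; only the two crawler names come from the quote above. It uses the standard library's urllib.robotparser and suggests why robots.txt works as an honor system: the rules are keyed to whatever name a client declares for itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows Perplexity's two declared crawlers.
ROBOTS_TXT = """\
User-agent: PerplexityBot
User-agent: Perplexity-User
Disallow: /
"""

# Illustrative browser-style user-agent string (an assumption, not a captured request).
CHROME_ON_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the robots.txt above permits this self-declared user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Perplexity-User", CHROME_ON_MAC):
        print(f"{agent[:20]:<20} allowed={allowed(agent)}")
```

In this sketch the two named crawlers are refused, while a client presenting a generic browser string is not matched by any rule. That gap is why the customers in Cloudflare's account paired robots.txt with firewall rules.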

The CDN then set up new test domains, explicitly prohibiting all automated access in their robots.txt files and through specific WAF rules that blocked crawling from Perplexity's acknowledged crawlers. Cloudflare found that Perplexity would use multiple IP addresses not listed in its official IP range and rotate through these IPs to sneak into the sites' content and records.
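
A site operator could look for this kind of behavior by comparing each request's source address with the ranges a crawler operator publishes. The sketch below is a simplified assumption of such a check, not Cloudflare's detection method: the ranges are reserved documentation blocks (RFC 5737) standing in for a real published list, and the function names are hypothetical.

```python
import ipaddress

# Placeholder ranges from RFC 5737; NOT Perplexity's real published ranges.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_declared_range(ip: str) -> bool:
    """Return True if the address falls inside any declared crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in DECLARED_RANGES)


def undeclared_sources(request_ips):
    """Return the sorted, de-duplicated IPs that fall outside every declared range."""
    return sorted({ip for ip in request_ips if not in_declared_range(ip)})


if __name__ == "__main__":
    sample = ["192.0.2.10", "203.0.113.7", "203.0.113.7", "198.51.100.3"]
    print(undeclared_sources(sample))
```

An IP list like this only identifies traffic that stays inside the declared ranges. Requests that rotate through other addresses and networks look like ordinary unknown visitors, which is the gap Cloudflare describes.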

"In addition to rotating IPs, we observed requests coming from different Autonomous System Numbers (ASNs) to evade website blocks," Cloudflare said. "This activity was observed across tens of thousands of domains and millions of requests per day."


The result? Cloudflare said it observed "Perplexity not only accessed such content but was able to provide detailed answers about it when queried by users."

Cloudflare's plan to stop Perplexity

Moving forward, Cloudflare has claimed its bot management system can spot and block Perplexity's hidden user agent. Any bot management customer who has an existing block rule in place is already protected.

If you don't want to block such traffic on the grounds that it might be from real users, you can set up rules to challenge requests. This allows real humans to proceed. Customers with existing challenge rules are already protected.


Finally, Cloudflare has added signature matches for the stealth crawler to its managed rule, which blocks AI crawling activity. This rule is available to all Cloudflare customers, including free users.

Cloudflare noted that OpenAI does obey the robots.txt restrictions and doesn't try to break into websites. That said, Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed copyrights in training and operating its AI systems.

Cloudflare has recently started offering its customers the option to automatically block all AI crawlers. To complement that move, Cloudflare has also launched its Pay Per Crawl program, enabling publishers to set rates for AI companies that want to scrape their content.


This follows many deals in which media businesses are permitting AI companies to legally use their content to train their large language models (LLMs). Examples include The New York Times with Amazon, The Washington Post with OpenAI, and Perplexity with Gannett Publishing.

In the meantime, Perplexity appears to continue to break the rules in its hunt for content. ZDNET has asked Perplexity about Cloudflare's claims, but the company has not responded.

Perplexity strikes back

Since then, Perplexity has publicly and loudly announced that Cloudflare has it all wrong. In a blog post, Perplexity claims:

This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats. If you can't tell a helpful digital assistant from a malicious scraper, then you probably shouldn't be making decisions about what constitutes legitimate web traffic.

Those are fighting words! Further, Perplexity states, "Technical errors in Cloudflare's analysis aren't just embarrassing -- they're disqualifying. When you misattribute millions of requests, publish completely inaccurate technical diagrams, and demonstrate a basic misunderstanding of how modern AI assistants work, you've forfeited any claim to expertise in this space."

This fight is on. Stay tuned for what's next in this conflict between an internet giant and an AI powerhouse.
