AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare.
On Monday, Cloudflare published research saying it observed the AI startup ignore blocks and hide its crawling and scraping activities. The web infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an effort to circumvent the website’s preferences,” Cloudflare’s researchers wrote.
AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the internet, many times without permission, to make their products work. In recent times, websites have tried to fight back by using the web standard Robots.txt file, which tells search engines and AI companies which pages can be indexed and which shouldn’t, efforts that have seen mixed results so far.
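To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler could consult a Robots.txt file before fetching a page, using Python's standard `urllib.robotparser`. The file contents, the bot name `ExampleAIBot`, and the `example.com` URLs are assumptions made up for illustration; they are not taken from Cloudflare's research or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. "ExampleAIBot" is a made-up name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named bot is asked to stay away from the entire site.
blocked = parser.can_fetch("ExampleAIBot", "https://example.com/article")
# Other bots may fetch public pages but not anything under /private/.
allowed = parser.can_fetch("SomeOtherBot", "https://example.com/article")
private = parser.can_fetch("SomeOtherBot", "https://example.com/private/page")

print(blocked, allowed, private)
```

The check runs entirely on the crawler's side. Nothing in the file stops a client that never performs it, which is one reason the approach depends on crawlers choosing to comply.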
Perplexity appears to be willingly circumventing these blocks by changing its bots’ “user agent,” meaning a signal that identifies a website visitor by their device and version type, as well as changing their autonomous system networks, or ASN, essentially a number that identifies large networks on the internet, according to Cloudflare.
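As a rough illustration of why the user agent matters, the sketch below shows a naive server-side filter that refuses requests whose user agent names a blocked crawler. The block list, the bot name `ExampleAIBot`, and both user-agent strings are hypothetical; this is not Cloudflare's detection method, only an assumed minimal filter.

```python
# Hypothetical block list: lowercase substrings of crawler names a site refuses.
BLOCKED_TOKENS = ("exampleaibot",)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header names a blocked crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


# A crawler that declares itself (made-up name and URL).
DECLARED_UA = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"

# An illustrative string of the kind a desktop Chrome browser on macOS sends.
GENERIC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

print(is_blocked(DECLARED_UA))  # True
print(is_blocked(GENERIC_UA))   # False
```

Because the client sets this header itself, a filter like this only catches crawlers that identify themselves honestly. The same request sent with a browser-like string passes straight through.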
“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.
Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”
Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites, even after they had added rules to their Robots.txt file and rules specifically blocking Perplexity’s known bots. Cloudflare said it then performed tests to check and confirmed that Perplexity was circumventing these blocks.
“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” according to Cloudflare.
The company also said that it has de-listed Perplexity’s bots from its verified list and added new techniques to block them.
Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace allowing website owners and publishers to charge AI scrapers who visit their sites. Cloudflare’s chief executive Matthew Prince sounded the alarm at the time, saying AI is breaking the business model of the internet, particularly for publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.
This is not the first time Perplexity has been accused of scraping without authorization.
Last year, news outlets, such as Wired, alleged Perplexity was plagiarizing their content. Weeks later, Perplexity’s CEO Aravind Srinivas was unable to immediately answer when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.
Lorenzo Franceschi-Bicchierai is a Senior Writer at TechCrunch, where he covers hacking, cybersecurity, surveillance, and privacy. You can contact Lorenzo securely on Signal at +1 917 257 1382, on Keybase/Telegram @lorenzofb, or via email at lorenzo@techcrunch.com.