Bots Bypass Captcha
CAPTCHA: Hard for Humans, Easy for Bots – PerimeterX
CAPTCHA: A Well-worn Approach to Bot Defense
For years, website owners have used a number of approaches and technologies to battle constantly evolving bot threats. One of the most common ways to battle bots has been to use CAPTCHAs, a challenge-response mechanism that promised an easy way to distinguish between a bot and a human. CAPTCHA is an acronym for completely automated public Turing test to tell computers and humans apart. Used in millions of sites, CAPTCHA is employed to help prevent bots from doing form submissions, executing logins and accessing sensitive pages or processes.
How CAPTCHA Has Evolved
As bot-based threats have evolved, so have the CAPTCHA mechanisms intended to stop them. In its early forms, users were asked to read distorted text and submit it in a form.
An example of one of the types of Google reCAPTCHAs that are most commonly used today.
Today, Google reCAPTCHA represents the dominant form of CAPTCHA technology in use. One study found that, across one million of the world’s top websites that employ CAPTCHA, Google reCAPTCHA was deployed by 94% of them.
How CAPTCHA Is Failing
In spite of its widespread, continued usage, there are two very fundamental problems with CAPTCHA:
User experience: From a user standpoint, as just about anyone alive can tell you, the experience is a poor one. It’s time-consuming, increasingly difficult, and can often keep legitimate users from doing what they want and need to do.
Efficacy: From a security standpoint, quite simply, it doesn’t work. The challenge is supposed to be easy for users, and hard for bots, but in fact, it’s become quite the opposite.
Following is an overview of the plethora of options available that make it easy to bypass CAPTCHA challenges.
How Attackers are Easily Bypassing CAPTCHA Challenges
There are a number of CAPTCHA-solving technologies and services available to attackers today. Attackers choose the solvers that work best against the type of CAPTCHA used on a target site. Here are two high-level categories:
Automated Technologies and Plug-ins
There is a range of automated technologies, including APIs, browser plug-ins and extensions that enable attackers to bypass or solve CAPTCHA challenges. Here are a few examples:
A group of researchers from Lancaster University, Northwest University and Peking University used the concept of a generative adversarial network (GAN) in order to create an extremely fast and accurate CAPTCHA solver.
There are several free online CAPTCHA solving services and libraries that leverage deep learning-based technologies, including GRIS, Alchemy, Clarifai and NeuralTalk. Academic studies show that deep-learning-based approaches are highly accurate in solving CAPTCHA challenges.
DeCaptcher is an example of one of the solving services available via APIs making it easy to integrate into applications. Based on an optical character recognition system, the service solves challenges and provides a file to download that details the time, the challenge image, and the text used to solve the challenges.
Open-source tools and browser extensions, including Buster and UnCaptcha, abuse the audio challenge that was intended to help visually impaired users: they pass it to speech recognition and so bypass CAPTCHA mechanisms in an automated fashion.
Human-assisted Solving Services
There are also human-powered services, often staffed by people who work in so-called farms. They are easy to find via a simple Google search, and they make it cost-effective for attackers to bypass the object recognition challenges used in reCAPTCHA.
2captcha and anti-captcha are among the most popular examples of such services. At a high level, these services enable customers to submit target websites, often via an API, to the vendor. The vendor’s staff will solve the challenge and provide the solution back to the customer. These vendors advertise solving 1,000 regular CAPTCHA challenges for as little as $1.00, and 1,000 reCAPTCHA challenges for between $1.99 and $2.99.
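To put those advertised prices in perspective, here is a minimal sketch of the arithmetic. The function name and the 50,000-attempt campaign size are illustrative assumptions; only the per-1,000 prices come from the figures quoted above.

```python
def solving_cost(challenges: int, price_per_thousand: float) -> float:
    """Return the cost in dollars of outsourcing `challenges` solves."""
    return challenges / 1000 * price_per_thousand


# Hypothetical campaign size, chosen only for illustration.
attempts = 50_000

regular = solving_cost(attempts, 1.00)         # regular CAPTCHA, $1.00 per 1,000
recaptcha_low = solving_cost(attempts, 1.99)   # reCAPTCHA, low end of the range
recaptcha_high = solving_cost(attempts, 2.99)  # reCAPTCHA, high end of the range

print(f"regular CAPTCHA: ${regular:.2f}")
print(f"reCAPTCHA: ${recaptcha_low:.2f} to ${recaptcha_high:.2f}")
```

Under these assumptions, 50,000 solved challenges cost about $50 for regular CAPTCHAs and roughly $100 to $150 for reCAPTCHA, which is why solver fees are rarely a meaningful barrier for an attacker.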
Increasing Prevalence and Usage of CAPTCHA Solvers
Given their low/no cost, availability and efficacy, the use of CAPTCHA solvers continues to grow. With our PerimeterX Bot Defender solution, we’ve detected a rapid expansion in the use of CAPTCHA solvers. As the diagram below illustrates, between August 2019 and March 2020, we saw a significant increase in the volume of attempted attacks that employed CAPTCHA solvers.
Given their accessibility and ease, the use of CAPTCHA solvers has grown rapidly.
Conclusion
It’s abundantly clear that users and businesses can’t stand CAPTCHA mechanisms that interrupt the user flow and ultimately lower conversions on websites. Particularly as artificial intelligence continues to improve, standalone visual-challenge-response approaches aren’t viable. Quite simply, organizations can’t rely solely on CAPTCHA-based mechanisms to combat bots, given the abundance of CAPTCHA solvers. These realities are exposing a very clear demand for advanced mechanisms that don’t frustrate users and are difficult for bots to solve.
CAPTCHA and reCAPTCHA: How Can You Bypass It?
If you have spent any time on the internet in recent years, you’ve had to check a little box to tell the world, “I’m not a robot.” This little box was invariably accompanied by a small visual or audio test, called CAPTCHA.
You have to pass the CAPTCHA test to prove you are “not a robot” before you can access some part of a website. Usually, this occurs at a point where you need to complete a form to sign up, subscribe, or make a purchase on a website or app.
For many users, these have been an annoying and time-consuming necessity of the internet, often leaving them wondering how to avoid CAPTCHA. For the companies using them, however, CAPTCHA tools have been a reassuring security measure, giving them confidence that the people accessing their website are genuine visitors and not fraudsters. There is one problem, though: they don’t always work.
In this article, we will go through exactly what CAPTCHAs are, how they can easily be bypassed or are otherwise ineffective, and what you can do instead to truly protect yourself from fraudulent users.
Table of Contents:
What Is CAPTCHA?
What Is reCAPTCHA?
The Downsides of CAPTCHA
What Can You Do about CAPTCHA Bypasses?
What Is a CAPTCHA?
As the internet started gaining traction in the 90s, internet malpractice followed close behind. CAPTCHAs were created in response to this as a way of differentiating genuine users from bad bots merely crawling through websites to perform some form of fraud.
The very name CAPTCHA explains this goal, standing for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’, with a Turing Test being a creation designed to differentiate between human intelligence and that of a machine.
These early CAPTCHAs took the form of text altered in some way with the aim of making it impossible for bots to read. They were very successful at first, but rapid advances in computing soon meant that bots could read the text.
In fact, pretty soon bots got so good at bypassing CAPTCHA that, by 2014, Google found that their reCAPTCHA program (a development from the original CAPTCHAs) could be bypassed by bots over 99% of the time.
reCAPTCHA is a human verification system developed in 2007 and purchased by Google in 2009. Initially, the tool was developed to help digitize books that couldn’t be scanned by computers. Once enacted to verify users, reCAPTCHA displayed two different distorted words with lines running through them (compared to CAPTCHA’s random sequences of letters and numbers).
By 2012, the project began incorporating images from Google Street View. By now, you’ve almost certainly spent a decent chunk of time clicking all of the images that contain a stoplight just to prove you’re not a bot. And you’ve probably failed some of these tests, too! As noted by Baymard Institute, “Only 66% of users during our qualitative usability testing successfully entered the CAPTCHA on the first attempt.”
There were a few more iterations of reCAPTCHA, including the noCAPTCHA reCAPTCHA (where low-risk users only had to click a checkbox that stated “I’m not a robot”) and reCAPTCHA v3.
About reCAPTCHA v3
In 2018, Google unveiled reCAPTCHA v3, the latest iteration of the tool. Even if you’re an incredibly proficient internet user, there’s a good chance you’re scratching your chin and wondering whether you’ve come across reCAPTCHA v3 before.
With reCAPTCHA v3, you don’t have to decipher distorted words, you don’t have to click boxes to indicate you know what a car looks like, and you don’t even have to click the “I’m not a robot” checkbox, either. That’s because reCAPTCHA v3 exists largely in the background—completely invisible to the average user.
As such, reCAPTCHA v3 helps companies detect bots while ostensibly delivering a better user experience—but it hurts user privacy in exchange.
Here’s how it works: Google analyzes behavior as users navigate a website, and they rank that behavior to determine how “risky” the user is, i.e., how likely it is that the session is actually a bot and not a human.
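On the website's side, that ranking arrives as a score that the site has to interpret. The sketch below shows one way a server might act on the parsed JSON returned by Google's verification endpoint, which includes `success`, `score` (0.0 to 1.0, higher meaning more likely human) and `action` fields. The function name and the 0.5 threshold are assumptions for illustration, and the network call itself is left out.

```python
from typing import Any, Mapping


def is_probably_human(verify_result: Mapping[str, Any],
                      expected_action: str,
                      min_score: float = 0.5) -> bool:
    """Decide whether to let a request through, given the parsed
    verification response for the token the client submitted."""
    # A failed verification means the token was invalid or expired.
    if not verify_result.get("success", False):
        return False
    # The token must have been issued for the action being protected.
    if verify_result.get("action") != expected_action:
        return False
    # The threshold is the site owner's judgment call (assumed 0.5 here).
    return float(verify_result.get("score", 0.0)) >= min_score


print(is_probably_human({"success": True, "score": 0.9, "action": "login"}, "login"))
print(is_probably_human({"success": True, "score": 0.1, "action": "login"}, "login"))
```

Wherever the threshold is set, some bots will score above it and some real users below it, so the choice is always a trade-off between false negatives and false positives.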
While reCAPTCHA v3 can help websites detect bots, it’s only good for that use case. If you want to protect your website from ad fraud, you’ll need to do more than rely on this service. Based on client performance data, carefully crafted malware and human fraud will get past reCAPTCHA v3, and the tool has a high false positive rate, mismarking real people as fraud.
As useful as CAPTCHAs have been in the past, it’s important to realize that they aren’t without their downsides. These tools leave much to be desired as ad fraud prevention methods. Some key issues with CAPTCHA and reCAPTCHA include:
CAPTCHAs Hurt the User Experience
Imagine you’re heading to a retailer’s website to complete an e-commerce transaction. You just found out about a new product, and you’re eager to buy it as soon as possible. As you begin the process of checking out, you run into a CAPTCHA. Worse yet, you fail the test. Would such an experience make you more or less likely to complete the purchase?
If the CAPTCHA test is poorly made, it can be failed multiple times. For example, if there’s a requirement to “pick all boxes that have a fire hydrant” and it’s all one big fire hydrant with just the tip of a piece on a few pixels on one box, should it be clicked or not?
This can be extraordinarily frustrating for users—which impacts user engagement and conversions.
CAPTCHAs Can Waste Customers’ Time
In more recent news, CAPTCHAs have been shown to eat up extra time for users. For example, the PS5 and Xbox Series X console launches have pitted human buyers against bots owned and operated by scalpers on retailer websites.
When a human encounters a CAPTCHA test, they have to spend precious seconds looking at it and responding. A bot can bypass the test—acting like a CAPTCHA skipper and proceeding almost directly to purchase in milliseconds. The result? The bot buys dozens of consoles and the human gets an “out of stock” error message by the time they finish the test.
Killing Conversion Rates
Taken together, it comes as no surprise that annoying experiences and more time required to complete actions translate into a 40% lower conversion rate with CAPTCHA. It’s worth noting that CAPTCHAs won’t just prevent you from generating more leads or selling more products at that moment. Since consumers are likely to stop supporting brands after a bad experience, they may very well prevent you from racking up sales in the future, too.
CAPTCHA Bypass Is Too Easy with Modern Bots
If hurting the user experience wasn’t enough to cause you to think about ditching CAPTCHAs, here’s something else to consider: Due to the evolution of technology, artificial intelligence (AI) has gotten to the point where a modern “CAPTCHA bot” or “block reCAPTCHA tool” can bypass the test with ease—defeating their purpose entirely.
Since CAPTCHAs don’t offer any kind of support or analytics, you can’t zero in on where fraud is coming from. Even if your CAPTCHAs somehow prevented bots from getting around them, you’d still have to deal with malware and human fraud.
Unfortunately, despite attempts to outrun malicious users in digital advertising, just a quick Google search will provide you with an abundance of sites telling you exactly how to get around even the most complex tests.
Additionally, these tests are often so difficult or poorly-made that users get genuinely angry in dealing with them, painting a less than ideal picture of CAPTCHAs. Best case, this leads to a sour taste in their mouth from the user experience. In the worst case they leave the site altogether.
Even when it comes to reCAPTCHA v3, it is shockingly easy for fraudsters to gain a high score using a carefully crafted CAPTCHA bot or by employing human fraud farms. These sophisticated fraudsters can easily bypass the CAPTCHAs they face.
These tools put the responsibility on the website owner, who is left to decide which traffic should probably be allowed onto the site. Decisions based on probability carry a high risk of false positives. The most commonly used CAPTCHAs today should not be used as a definitive solution to block fraudulent traffic.
Thankfully, there are ways to block fraudulent traffic that are better at identifying malicious bots, malware, and human fraud that do not ruin the user experience and don’t leave the decision-making in your hands.
Using Biometrics
You could verify users are real humans and not bots by using biometrics. For example, you might ask people on smartphones to prove their identity with their fingerprint. There are other kinds of biometrics to consider, too—including typing biometrics, speech recognition, and facial recognition.
Depending on your use case, however, biometrics might not be the best option. On one hand, such systems tend to be pretty pricey. On the other, not too many consumers are keen on giving away their biometric data to a company that sells socks, for example.
Multi-Factor Authentication
You can also implement a multi-factor authentication (MFA) method to make sure actual humans are accessing your systems. For example, you might have someone log into their account and then send them a text message with a one-time passcode they need to input on your website to get to the next step.
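As a rough illustration of the one-time passcode step, the sketch below generates a six-digit code and checks a submitted value against it. The five-minute validity window, the function names and the six-digit length are assumptions; delivery by text message and storage of the pending code are out of scope.

```python
import hmac
import secrets
import time
from typing import Optional, Tuple

CODE_TTL_SECONDS = 300  # assumed five-minute validity window


def issue_code(now: Optional[float] = None) -> Tuple[str, float]:
    """Create a random six-digit code and the time at which it expires."""
    now = time.time() if now is None else now
    # secrets draws from a cryptographically secure random source.
    code = f"{secrets.randbelow(1_000_000):06d}"
    return code, now + CODE_TTL_SECONDS


def verify_code(submitted: str, expected: str, expires_at: float,
                now: Optional[float] = None) -> bool:
    """Accept the code only if it matches and has not expired."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(submitted.encode(), expected.encode())
```

A real deployment would also limit the number of attempts per code and invalidate the code after its first successful use.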
While this method can be helpful in secure environments—like banking and brokerage accounting apps—it will likely create far too much user friction for the average company.
Ad Fraud Solutions
An ad fraud solution like Anura enables you to stop bots in their tracks while also protecting you from malware and human fraud. The solution sits entirely in the background of your website, with no effect on the user experience at all.
Anura detects fraud with precision via a robust, fine-tuned solution that delivers virtually no false positives. Get the peace of mind that comes with knowing you’re never blocking real visitors. This definitive and accurate approach gives you the freedom to run your business without the worries of fraudulent visitors.
With Anura, you’re able to sell more, generate more leads, and optimize your campaigns with the peace of mind that comes with knowing your data is accurate and that fraudsters haven’t taken advantage of you. It’s the easiest way to stop bot traffic—and several other kinds of ad fraud, too—without hurting the user experience.
How to detect Captcha farms and block Captcha bots
Easy to implement and often free, Captchas are widely used as a basic bot protection measure. But they’re not immune to bots. On the contrary, fraudsters are finding increasingly sophisticated ways to bypass Captchas. For example, some advanced Captcha-solving bots rely on artificial intelligence for automated image or audio recognition.
But another, increasingly common strategy relies on a much more time-tested problem-solving device: the human brain. In an interesting turn of the tables, Captcha farms aren’t about bot-assisted humans. They’re about human-assisted bots.
What are Captcha farms?
Captcha farms are services that bot developers can query via an API to automate solving Captchas. Instead of using AI to solve Captcha challenges, Captcha farms such as 2Captcha and DeathByCaptcha distribute Captchas to a pool of human workers, usually in developing countries.
Due to the low cost of labor in these countries, these services can cost as little as $1-3 per 1,000 Captchas solved, depending on the type of Captcha (image or text-based Captchas, reCaptchas, hCaptchas, Geetest, FunCaptchas, etc.).
Let’s follow the trajectory of a bot that gets challenged with a Captcha. Here’s what happens:
The bot is blocked by a Captcha challenge.
It makes an API call to the Captcha farm with the website’s Captcha public key & its domain name as parameters.
The Captcha farm asks one of its workers to solve the Captcha.
After ~30-45 seconds, the Captcha is solved and the bot obtains its response token.
The bot solves the Captcha by submitting the response token.
In short, solving a Captcha is as simple as calling a function in the bot’s code. The attacker doesn’t even need to interact directly with the Captcha by clicking on it. If the attackers know the structure and the URL of the Captcha callback, i.e., the request in which the website sends the Captcha response token after a successful response has been submitted (which is straightforward to find in the browser devtools), they can prove that they’ve solved a Captcha without even using a real browser.
Captcha farms enable bot developers to significantly reduce their infrastructure costs. For example, for an attacker conducting large-scale crawling or credential stuffing attacks, using real automated browsers or automated headless browsers is costly. Such bots require significant computational resources (RAM/CPU) when compared to bots that only use a simple HTTP request library such as Curl, the Python requests module, or the Axios library.
Captcha farms enable bot developers to run their bots with cheaper infrastructure, which is why their small service fees deliver an excellent return on investment.
Captcha bots target every industry
If you believe that your website isn’t impacted by Captcha farms, you’re probably wrong. At DataDome, we see them in every domain.
Here are three quick snapshots from three very different industries.
Case study #1: Public transportation website and app
A customer in the public transportation industry recently activated the DataDome bot protection solution on all its websites and applications.
Before the protection was implemented, bots could easily crawl the site. It was also targeted by frequent credential stuffing attacks. Once we activated the protection, the bot traffic volume decreased significantly. This is a common observation: when bot operators realize that their bots are being blocked, many will simply stop trying and look for another victim instead (hopefully not you).
A few days after activating the protection, we started to see an uptick in solved Captchas for this customer.
The volume of solved Captchas (the green curve) starts to increase around April 11.
A superficial analysis could easily have concluded that the solved Captchas were false positives, i.e., humans mistakenly identified as bots. However, our detection engine was able to determine that they originated from Captcha farms.
In the span of six days, our Captcha bot detection invalidated approximately 12,000 Captchas solved by Captcha farm workers. (And no, our customer received no complaints about legitimate users being blocked.)
Case study #2: Price comparison website
Our second case study is a high-traffic price comparison website that constantly came under fire from Captcha bots. Between November 2019 and April 2020, bots attempted to forge more than 265,000 Captchas using Captcha farms.
Significant Captcha farm activity (the red curve) on a price comparison website.
The perceptive reader may have spotted an interesting phenomenon in this graph: in some instances, the volume of Captchas submitted by Captcha bots is higher than the volume of Captchas served. How can this be the case?
It’s because we only consider a Captcha as served when the browser has executed the Javascript responsible for rendering the Captcha. This means that the bots which submitted the “surplus” Captchas did not even come from real browsers.
A website relying exclusively on Captchas for bot protection would have been tricked into accepting these visitors as humans, since the Captchas were effectively “passed.”
In contrast, the DataDome detection engine analyzes around 250 events for every request to determine whether a visitor is human or a bot. A Captcha will only be accepted as passed when we are sure that it has been solved by a human actually browsing the website, not by a Captcha farm worker.
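The idea of counting a Captcha as served only once the browser has rendered it can be sketched as a small ledger. This is a simplified illustration under assumed names (`ChallengeLedger`, `issue`, `mark_rendered`, `accept_solution`), not a description of how DataDome implements it.

```python
import secrets
from typing import Set


class ChallengeLedger:
    """Tracks which challenges were issued and which were actually rendered."""

    def __init__(self) -> None:
        self._issued: Set[str] = set()
        self._rendered: Set[str] = set()

    def issue(self) -> str:
        """Create an unguessable ID for a challenge sent to a visitor."""
        challenge_id = secrets.token_urlsafe(16)
        self._issued.add(challenge_id)
        return challenge_id

    def mark_rendered(self, challenge_id: str) -> None:
        """Record that the visitor's browser ran the rendering script."""
        if challenge_id in self._issued:
            self._rendered.add(challenge_id)

    def accept_solution(self, challenge_id: str) -> bool:
        """Accept a solution once, and only for a challenge that was rendered."""
        was_rendered = challenge_id in self._rendered
        # Forget the challenge either way so a token cannot be replayed.
        self._issued.discard(challenge_id)
        self._rendered.discard(challenge_id)
        return was_rendered
```

A solution submitted for a challenge that was never rendered is exactly the "surplus" described above. A determined bot could also forge the rendering report, so a check like this is one signal among many and not proof of a real browser.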
Case study #3: Retailer website
Our last example is a large retailer with both online and physical stores. During the first three weeks of February 2020, their applications and websites received more than 26,000 Captchas forged by Captcha bots.
Regular Captcha farm activity on a retailer website
Again, a website with no other bot protection solution would have taken the solved Captchas at face value and let the bots through. This illustrates our main point: Captcha farms make Captchas a very inadequate bot protection measure.
Detecting Captcha farms
Detecting bots that leverage Captcha farms to bypass bot protection solutions is challenging. In fact, many bot management solutions accept solved Captchas as proof of the visitor’s humanity.
Other solutions will give the user a certain “credit” of allowed requests, based on the session cookie, after passing a Captcha. In this case, the bot just needs to send its trusted session cookie along with malicious requests in order not to be challenged by more Captchas for a while.
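A minimal sketch of such a credit scheme makes the weakness visible. The class and method names are hypothetical, and the credit size is an arbitrary assumption.

```python
from typing import Dict


class SessionCredit:
    """Grants a fixed number of unchallenged requests after a solved Captcha."""

    def __init__(self, allowed_requests: int) -> None:
        self._allowed = allowed_requests
        self._remaining: Dict[str, int] = {}

    def grant(self, session_cookie: str) -> None:
        """Called once a Captcha has been passed for this session."""
        self._remaining[session_cookie] = self._allowed

    def allow(self, session_cookie: str) -> bool:
        """Spend one unit of credit; refuse when none is left."""
        remaining = self._remaining.get(session_cookie, 0)
        if remaining <= 0:
            return False
        self._remaining[session_cookie] = remaining - 1
        return True
```

Nothing in `allow` looks at who is sending the request: the credit belongs to the cookie value alone, so a bot that paid a farm for one solved Captcha can spend the whole allowance on malicious requests.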
Solved Captchas are also often used as a feedback mechanism for false positives (we do this at DataDome, too). If a human is blocked by mistake, the detection system can correct the error by letting the user continue to browse the website after solving a Captcha. If the bot detection system is using machine learning, the algorithm will use this mistake to self-correct.
The DataDome dashboard shows the number of Captchas passed, which enables users to monitor detection quality.
Captcha farms make this feedback loop less reliable. If the bot detection system accepts solved Captchas as absolute proof of humanity, Captcha farms will increase the number of false negatives (bots passing as humans).
On the other hand, if the bot protection system invalidates Captchas too diligently, it increases the risk of false positives (hard-blocking legitimate human users).
In order to efficiently stop bad bots while preserving the user experience for humans, accurate detection of Captcha farms is therefore essential.
For obvious reasons, we can’t give away too much detail about how we detect Captcha farms. But below are a couple of examples of the high-level approaches that the DataDome real-time detection engine relies on to detect bots that submit Captchas solved by Captcha farm workers.
Fingerprinting
Our detection engine always conducts a deep analysis of visitors’ browser fingerprints. If the engine is 100% sure that the user is a bot, e.g., if it detects a modified Selenium, Puppeteer or Playwright bot, it will automatically invalidate the passed Captcha.
Besides detecting well-known browser automation frameworks and headless browsers, our engine analyzes hundreds of other fingerprinting signals in order to make its decision.
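For a sense of what the simplest fingerprinting signals look like, the sketch below checks two well-known ones: the `navigator.webdriver` flag that automated browsers expose, and the `HeadlessChrome` marker in the default headless Chrome user agent. The function name and the dictionary layout are assumptions, and modified bots routinely hide both signals, which is why real engines look at far more.

```python
from typing import Any, List, Mapping


def automation_signals(fingerprint: Mapping[str, Any]) -> List[str]:
    """Return the obvious automation indicators present in a fingerprint
    collected by client-side script (hypothetical field names)."""
    hits: List[str] = []
    # Browsers driven through WebDriver report navigator.webdriver as true.
    if fingerprint.get("webdriver") is True:
        hits.append("navigator.webdriver is true")
    # Headless Chrome advertises itself in its default user agent string.
    if "HeadlessChrome" in str(fingerprint.get("user_agent", "")):
        hits.append("headless Chrome user agent")
    return hits


print(automation_signals({"webdriver": True, "user_agent": "Mozilla/5.0 HeadlessChrome/120.0"}))
```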
Solve speed
Alongside the browser fingerprints of visitors, we also look at how quickly a visitor solves a Captcha. The graph below shows two cumulative distribution functions: one for how quickly a Captcha farm worker solves a Captcha, another for how quickly other users solve them.
The blue line grows significantly faster than the orange line. This means that Captcha farm workers are much faster at solving Captchas than regular users. Indeed, Captcha farm workers solved around 50% of Captchas in less than 5 seconds, whereas normal users solved only around 30% that quickly.
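Reading a point off a cumulative distribution amounts to asking what share of solve times fall under a threshold. The sketch below computes that share. The two sample lists are invented solely to mirror the rough proportions quoted above and are not measured data.

```python
from typing import Sequence


def fraction_solved_within(solve_times: Sequence[float], threshold: float) -> float:
    """Share of Captchas solved in strictly less than `threshold` seconds."""
    if not solve_times:
        raise ValueError("solve_times must not be empty")
    return sum(t < threshold for t in solve_times) / len(solve_times)


# Invented solve times in seconds, for illustration only.
farm_times = [1.8, 2.5, 3.1, 3.9, 4.6, 5.4, 6.8, 8.0, 9.7, 12.3]
human_times = [3.2, 4.1, 4.9, 5.6, 6.3, 7.5, 8.8, 10.2, 13.0, 17.4]

print(fraction_solved_within(farm_times, 5.0))
print(fraction_solved_within(human_times, 5.0))
```

Because the two distributions overlap heavily, solve speed on its own cannot label an individual Captcha; it is useful as one input alongside other signals.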
AI outlier detection
The DataDome detection engine also leverages AI outlier detection to identify suspicious Captcha-solving traffic.
For example, if it detects a sudden increase in outdated browsers (based on the user agent) coming from unusual countries for that website, it may flag these Captcha responses as potentially coming from Captcha farms. DataDome recently did an analysis of 2 million Captchas passed over 3 months to better understand which countries tend to have the most Captcha farm workers.
We did so by looking at the IP addresses of the specific workers of Captcha farms and using MaxMind’s database to map those IPs to particular countries (anything more granular than a country has a much higher chance of being inaccurate). Here’s what we found:
This was further broken down into the ISPs and telecom providers of Captcha farm workers:
Using this data, along with more advanced heuristics and statistical techniques based on different signals, such as the IP score or the device fingerprint, DataDome’s detection engine determines whether or not it should validate the Captcha attempt.
If the solved Captcha is invalidated, the engine generates a pattern to automatically block similar Captcha attempts in the future.
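The outlier idea described in this section, such as a sudden increase in outdated browsers from unusual countries, can be illustrated with a basic z-score test on a daily count. This is a deliberately simple stand-in, not DataDome's method; the function name, the threshold of 3 and the sample counts are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_spike(history: Sequence[float], current: float,
             z_threshold: float = 3.0) -> bool:
    """Flag `current` when it sits far above the historical average."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No variation in the past: any increase is unusual.
        return current > mu
    return (current - mu) / sigma > z_threshold


# Invented daily counts of solved Captchas from outdated browsers.
past_days = [10, 12, 9, 11, 10, 13, 11]
print(is_spike(past_days, 40))
print(is_spike(past_days, 12))
```

A flagged spike would only mark the matching Captcha responses as suspicious; the decision to invalidate them would still depend on the other signals mentioned above.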
Captchas are not an anti-bot silver bullet
To conclude: while Captchas still have their uses, they won’t keep your website or your mobile app safe from malicious bots. Even Captchas that claim to be smarter, such as reCaptcha v3, have their downsides:
The difficulty of setting proper block/allow thresholds
The need to monitor false positives and false negatives (feedback loop)
The risk of blocking legitimate good bots (search engine bots, SEO bots, technical partners).
And as we have seen, for very small fees, motivated bot developers can easily leverage Captcha farms to bypass security systems that put too much trust in Captchas.
A good bot protection solution, on the other hand, will effectively block bots—including bots that rely on Captcha farms—while remaining invisible to human users.
The DataDome bot protection software analyzes 100% of the requests that hit our customers’ applications. We collect and analyze more than 250 different events for each and every request, in order to accurately distinguish between humans and bots.
New threats are identified via statistical and behavioral detection, using data from server-side fingerprints, a JS rendering engine, SDK inputs and session tracking. We make extensive use of online machine learning, and detect a new bad bot pattern every 10 milliseconds. Our false positive rate is below 0.01%: of 10,000 Captchas served, fewer than one is seen by a human.
If you’d like to watch real-time bot activity on your own website, you can set up a free DataDome trial yourself in less than an hour (no commitment, no credit card). All you have to do is create your free account and follow the installation instructions. You can then access your personal dashboard, and get a full overview of good bot, bad bot and human traffic to your site.