Can Bots Get Past Captcha
CAPTCHA: Hard for Humans, Easy for Bots – PerimeterX
CAPTCHA: A Well-worn Approach to Bot Defense
For years, website owners have used a number of approaches and technologies to battle constantly evolving bot threats. One of the most common ways to battle bots has been to use CAPTCHAs, a challenge-response mechanism that promised an easy way to distinguish between a bot and a human. CAPTCHA is an acronym for completely automated public Turing test to tell computers and humans apart. Used in millions of sites, CAPTCHA is employed to help prevent bots from doing form submissions, executing logins and accessing sensitive pages or processes.
How CAPTCHA Has Evolved
As bot-based threats have evolved, so have the CAPTCHA mechanisms intended to stop them. In its early forms, users were asked to read distorted text and submit it in a form.
An example of one of the types of Google reCAPTCHAs that are most commonly used today.
Today, Google reCAPTCHA represents the dominant form of CAPTCHA technology in use. One study found that, across one million of the world’s top websites that employ CAPTCHA, Google reCAPTCHA was deployed by 94% of them.
How CAPTCHA Is Failing
In spite of its widespread, continued usage, there are two very fundamental problems with CAPTCHA:
User experience: From a user standpoint, as just about anyone alive can tell you, the experience is a poor one. It’s time-consuming, increasingly difficult, and can often keep legitimate users from doing what they want and need to do.
Efficacy: From a security standpoint, quite simply, it doesn’t work. The challenge is supposed to be easy for users, and hard for bots, but in fact, it’s become quite the opposite.
The following is an overview of the many options that make it easy to bypass CAPTCHA challenges.
How Attackers are Easily Bypassing CAPTCHA Challenges
There are a number of CAPTCHA-solving technologies and services available to attackers today. Attackers choose the solvers that work best against the type of CAPTCHA used on a target site. Here are two high-level categories:
Automated Technologies and Plug-ins
There is a range of automated technologies, including APIs, browser plug-ins and extensions that enable attackers to bypass or solve CAPTCHA challenges. Here are a few examples:
A group of researchers from Lancaster University, Northwest University and Peking University used the concept of a generative adversarial network (GAN) in order to create an extremely fast and accurate CAPTCHA solver.
There are several free online CAPTCHA solving services and libraries that leverage deep learning-based technologies, including GRIS, Alchemy, Clarifai and NeuralTalk. Academic studies show that deep-learning-based approaches are highly accurate in solving CAPTCHA challenges.
DeCaptcher is an example of one of the solving services available via APIs making it easy to integrate into applications. Based on an optical character recognition system, the service solves challenges and provides a file to download that details the time, the challenge image, and the text used to solve the challenges.
Open-source tools and browser extensions, including Buster and UnCaptcha, abuse the audio challenge that was intended to help visually impaired users, applying speech recognition to bypass CAPTCHA mechanisms in an automated fashion.
Human-assisted Solving Services
In addition, there are human-powered solving services, often staffed by people who work in so-called farms. They are easy to find via a simple Google search, and they make it cost-effective for attackers to bypass the object recognition challenges used in reCAPTCHA.
2captcha and anti-captcha are among the most popular examples of such services. At a high level, these services enable customers to submit target websites, often via an API, to the vendor. The vendor’s staff will solve the challenge and provide the solution back to the customer. These vendors advertise solving 1,000 regular CAPTCHA challenges for as little as $1.00, and 1,000 reCAPTCHA challenges for between $1.99 and $2.99.
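To put the advertised prices in perspective, here is a minimal Python sketch of the arithmetic. The function name and the 50,000-challenge example are assumptions made for illustration; only the per-1,000 prices come from the figures quoted above, and the "regular" price is treated as a single value because only a lower bound is advertised.

```python
from decimal import Decimal

# Advertised prices per 1,000 solved challenges, as quoted above: (low, high).
# The "regular" entry uses the quoted "as little as" price for both bounds.
PRICE_PER_1000 = {
    "regular": (Decimal("1.00"), Decimal("1.00")),
    "recaptcha": (Decimal("1.99"), Decimal("2.99")),
}


def solving_cost(challenges: int, kind: str) -> tuple:
    """Return the (low, high) cost in dollars of solving `challenges` CAPTCHAs."""
    low, high = PRICE_PER_1000[kind]
    thousands = Decimal(challenges) / Decimal(1000)
    return (low * thousands, high * thousands)


if __name__ == "__main__":
    # Hypothetical attack size, chosen only to illustrate the scale of the cost.
    low, high = solving_cost(50_000, "recaptcha")
    print(f"50,000 reCAPTCHA solves: ${low:.2f} to ${high:.2f}")
```

Under these assumptions, even tens of thousands of solved challenges cost an attacker on the order of a hundred dollars.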
Increasing Prevalence and Usage of CAPTCHA Solvers
Given their low or no cost, availability and efficacy, the use of CAPTCHA solvers continues to grow. With our PerimeterX Bot Defender solution, we’ve detected a rapid expansion in the use of CAPTCHA solvers: between August 2019 and March 2020, we saw a significant increase in the volume of attempted attacks that employed CAPTCHA solvers.
Conclusion
It’s abundantly clear that users and businesses can’t stand CAPTCHA mechanisms that interrupt the user flow and ultimately lower conversions on websites. Particularly as artificial intelligence continues to improve, standalone visual-challenge-response approaches aren’t viable. Quite simply, organizations can’t rely solely on CAPTCHA-based mechanisms to combat bots, given the abundance of CAPTCHA solvers. These realities are exposing a very clear demand for advanced mechanisms that don’t frustrate users and are difficult for bots to solve.
CAPTCHA and reCAPTCHA: How Can You Bypass It?
If you have spent any time on the internet in recent years, you’ve had to check a little box to tell the world, “I’m not a robot.” This little box was invariably accompanied by a small visual or audio test, called CAPTCHA.
You have to pass the CAPTCHA test to prove you are “not a robot” before you can access some part of a website. Usually, this occurs at a point where you need to complete a form to sign up, subscribe, or make a purchase on a website or app.
For many users, these have been an annoying and time-consuming necessity of the internet, often leaving them wondering how to avoid CAPTCHA. For the companies using them, however, CAPTCHA tools have been a reassuring security measure. This has given them confidence that the people accessing their website are genuine visitors and not fraudsters. There is one problem, though: they don’t always work.
In this article, we will go through exactly what CAPTCHAs are, how they can easily be bypassed or are otherwise ineffective, and what you can do instead to truly protect yourself from fraudulent users.
Table of Contents:
What Is CAPTCHA?
What Is reCAPTCHA?
The Downsides of CAPTCHA
What Can You Do about CAPTCHA Bypasses?
What Is a CAPTCHA?
As the internet started gaining traction in the 90s, internet malpractice followed close behind. CAPTCHAs were created in response to this as a way of differentiating genuine users from bad bots merely crawling through websites to perform some form of fraud.
The very name CAPTCHA explains this goal: it stands for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’, where a Turing test is a test designed to differentiate between human intelligence and that of a machine.
These early CAPTCHAs took the form of text altered in some way to make it impossible for bots to read. While they were initially very successful, quick advances in computing meant that bots were soon able to read the text.
In fact, pretty soon bots got so good at bypassing CAPTCHA that, by 2014, Google found that their reCAPTCHA program (a development from the original CAPTCHAs) could be bypassed by bots over 99% of the time.
reCAPTCHA is a human verification system developed in 2007 and purchased by Google in 2009. Initially, the tool was developed to help digitize books that couldn’t be scanned by computers. Once enacted to verify users, reCAPTCHA displayed two different distorted words with lines running through them (compared to CAPTCHA’s random sequences of letters and numbers).
By 2012, the project began incorporating images from Google Street View. By now, you’ve almost certainly spent a decent chunk of time clicking all of the images that contain a stoplight just to prove you’re not a bot. And you’ve probably failed some of these tests, too! As noted by Baymard Institute, “Only 66% of users during our qualitative usability testing successfully entered the CAPTCHA on the first attempt.”
There were a few more iterations of reCAPTCHA, including the noCAPTCHA reCAPTCHA (where low-risk users only had to click a checkbox that stated “I’m not a robot”) and reCAPTCHA v3.
About reCAPTCHA v3
In 2018, Google unveiled reCAPTCHA v3, the latest iteration of the tool. Even if you’re an incredibly proficient internet user, there’s a good chance you’re scratching your chin and wondering whether you’ve come across reCAPTCHA v3 before.
With reCAPTCHA v3, you don’t have to decipher distorted words, you don’t have to click boxes to indicate you know what a car looks like, and you don’t even have to click the “I’m not a robot” checkbox, either. That’s because reCAPTCHA v3 exists largely in the background—completely invisible to the average user.
As such, reCAPTCHA v3 helps companies detect bots while ostensibly delivering a better user experience—but it hurts user privacy in exchange.
Here’s how it works: Google analyzes behavior as users navigate a website, and they rank that behavior to determine how “risky” the user is, i.e., how likely it is that the session is actually a bot and not a human.
While reCAPTCHA v3 can help websites detect bots, it’s only good for that use case. If you want to protect your website from ad fraud, you’ll need to do more than rely on this service. Based on client performance data, carefully crafted malware and human fraud will get past reCAPTCHA v3, which also has a high false positive rate, mismarking real people as fraudulent.
As useful as CAPTCHAs have been in the past, it’s important to realize that they aren’t without their downsides. These tools leave much to be desired as ad fraud prevention methods. Some key issues with CAPTCHA and reCAPTCHA include:
CAPTCHAs Hurt the User Experience
Imagine you’re heading to a retailer’s website to complete an e-commerce transaction. You just found out about a new product, and you’re eager to buy it as soon as possible. As you begin the process of checking out, you run into a CAPTCHA. Worse yet, you fail the test. Would such an experience make you more or less likely to complete the purchase?
If the CAPTCHA test is poorly made, users can fail it multiple times. For example, suppose the instruction is to “pick all boxes that have a fire hydrant” and the image is one big fire hydrant whose tip spills a few pixels into a neighboring box. Should that box be clicked or not?
This can be extraordinarily frustrating for users—which impacts user engagement and conversions.
CAPTCHAs Can Waste Customers’ Time
In more recent news, CAPTCHAs have been shown to eat up extra time for users. For example, the PS5 and Xbox Series X console launches have pitted human buyers against bots owned and operated by scalpers on retailer websites.
When a human encounters a CAPTCHA test, they have to spend precious seconds looking at it and responding. A bot can bypass the test—acting like a CAPTCHA skipper and proceeding almost directly to purchase in milliseconds. The result? The bot buys dozens of consoles and the human gets an “out of stock” error message by the time they finish the test.
Killing Conversion Rates
Taken together, it comes as no surprise that annoying experiences and more time required to complete actions translate into a 40% lower conversion rate with CAPTCHA. It’s worth noting that CAPTCHAs won’t just prevent you from generating more leads or selling more products at that moment. Since consumers are likely to stop supporting brands after a bad experience, they may very well prevent you from racking up sales in the future, too.
CAPTCHA Bypass Is Too Easy with Modern Bots
If hurting the user experience wasn’t enough to cause you to think about ditching CAPTCHAs, here’s something else to consider: Due to the evolution of technology, artificial intelligence (AI) has gotten to the point where a modern “CAPTCHA bot” or “block reCAPTCHA tool” can bypass the test with ease—defeating their purpose entirely.
Since CAPTCHAs don’t offer any kind of support or analytics, you can’t zero in on where fraud is coming from. Even if your CAPTCHAs somehow prevented bots from getting around them, you’d still have to deal with malware and human fraud.
Unfortunately, despite attempts to outrun malicious users in digital advertising, just a quick Google search will provide you with an abundance of sites telling you exactly how to get around even the most complex tests.
Additionally, these tests are often so difficult or poorly made that users get genuinely angry dealing with them, painting a less than ideal picture of CAPTCHAs. In the best case, this leaves a sour taste in their mouths. In the worst case, they leave the site altogether.
Even when it comes to reCAPTCHA v3, it is shockingly easy for fraudsters to gain a high score using a carefully crafted CAPTCHA bot or by employing human fraud farms. These sophisticated fraudsters can easily bypass the CAPTCHAs they face.
Because the responsibility falls on the website owner, people are left deciding which traffic should probably be allowed onto their sites. Decisions based on probability come with a high risk of false positives. The most commonly used CAPTCHAs today should not be used as a definitive solution to block fraudulent traffic.
Thankfully, there are ways to block fraudulent traffic that are better at identifying malicious bots, malware, and human fraud, that do not ruin the user experience, and that don’t leave the decision-making in your hands.
Using Biometrics
You could verify users are real humans and not bots by using biometrics. For example, you might ask people on smartphones to prove their identity with their fingerprint. There are other kinds of biometrics to consider, too—including typing biometrics, speech recognition, and facial recognition.
Depending on your use case, however, biometrics might not be the best option. On one hand, such systems tend to be pretty pricey. On the other, not too many consumers are keen on giving away their biometric data to a company that sells socks, for example.
Multi-Factor Authentication
You can also implement a multi-factor authentication (MFA) method to make sure actual humans are accessing your systems. For example, you might have someone log into their account and then send them a text message with a one-time passcode they need to input on your website to get to the next step.
While this method can be helpful in secure environments—like banking and brokerage accounting apps—it will likely create far too much user friction for the average company.
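The one-time passcode step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production MFA implementation: the function names, the six-digit format and the five-minute validity window are all hypothetical, and delivery by text message, rate limiting and single-use enforcement are left out.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 300  # hypothetical validity window (5 minutes)


def issue_code(now=None):
    """Generate a random six-digit passcode and its expiry timestamp."""
    now = time.time() if now is None else now
    # secrets draws from a cryptographically secure random source.
    code = f"{secrets.randbelow(10**6):06d}"
    return code, now + CODE_TTL_SECONDS


def verify_code(submitted, expected, expires_at, now=None):
    """Accept the submitted code only if it matches and has not expired."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    code, expires_at = issue_code()
    # In a real flow the code would be sent to the user's phone, not printed.
    print("accepted" if verify_code(code, code, expires_at) else "rejected")
```

Even this small sketch shows where the friction comes from: the user has to leave the page, wait for a message and type the code back before it expires.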
Ad Fraud Solutions
An ad fraud solution like Anura enables you to stop bots in their tracks while also protecting you from malware and human fraud. The solution sits entirely in the background of your website, with no effect on the user experience at all.
Anura detects fraud with precision via a robust, fine-tuned solution that delivers virtually no false positives. Get the peace of mind that comes with knowing you’re never blocking real visitors. This definitive and accurate approach gives you the freedom to run your business without the worries of fraudulent visitors.
With Anura, you’re able to sell more, generate more leads, and optimize your campaigns with the peace of mind that comes with knowing your data is accurate and that fraudsters haven’t taken advantage of you. It’s the easiest way to stop bot traffic—and several other kinds of ad fraud, too—without hurting the user experience.
reCAPTCHA v2 vs v3: efficient bot protection? [2021 Update]
The promise of Google’s reCaptcha v3 is to prevent bot traffic to your website, without the user friction we all associate with Google reCaptcha v2. But does reCaptcha v3 keep its promise?
Let’s take an honest look at what reCaptcha v3 can and cannot do for your website security. We’ll recap the differences between reCaptcha v2 vs v3, uncover the pitfalls of reCaptcha v3 configuration, and sum up what a truly effective bot protection and mitigation solution must deliver.
Here’s what we’ll cover:
ReCaptcha recap
ReCaptcha v2: Hard on humans, too easy on bots
AI to solve reCaptcha v2 challenges
Captcha farms
ReCaptcha v3: Easy on humans, except website admins
Mapping reCaptcha v3 user scores to actions
There’s no feedback loop
Detection quality
Why reCaptcha is not a bot management solution
The reCaptcha alternative which really stops bots
Without further ado, let’s dive in.
What is reCAPTCHA?
First, a quick recap: reCaptcha is a security service provided by Google, currently used by more than 6 million websites. Its purpose is to protect websites from bot-driven abuse.
Google promotes reCaptcha as a free service, but in reality it’s only free for accounts that generate less than 1 million API calls per month.
For heavier reCaptcha use, Google recently started charging a fee. Accounts that generate more than 1,000 calls per second or 1 million calls per month must sign up for a reCaptcha Enterprise account. For up to 10 million calls per month, the fee is $1 per 1,000 calls (beyond 10 million calls, custom fees apply). So for example, if your website generates 3 million calls per month, your reCaptcha bill will be $3,000.
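The pricing described above can be written as a small Python function. This is only a sketch of the figures quoted in this article, not Google’s actual billing logic: the function name is hypothetical, the per-second limit is ignored, and treating exactly 1 million calls as free is an assumption. Following the article’s own example, the per-call fee is applied to all calls once the free tier is exceeded.

```python
from typing import Optional

FREE_TIER_CALLS = 1_000_000       # free up to this monthly volume (boundary assumed)
STANDARD_TIER_CALLS = 10_000_000  # beyond this, custom fees apply
FEE_PER_1000_CALLS = 1.0          # dollars, as quoted in the article


def monthly_recaptcha_fee(calls_per_month: int) -> Optional[float]:
    """Estimate the monthly fee in dollars, or None when custom pricing applies."""
    if calls_per_month <= FREE_TIER_CALLS:
        return 0.0
    if calls_per_month <= STANDARD_TIER_CALLS:
        # Mirrors the article's example: 3 million calls -> $3,000.
        return calls_per_month / 1000 * FEE_PER_1000_CALLS
    return None  # custom fees: cannot be computed from the published figures


if __name__ == "__main__":
    for calls in (500_000, 3_000_000, 20_000_000):
        print(calls, monthly_recaptcha_fee(calls))
```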
Most websites are still using reCaptcha v2, which was launched in 2014. If a website visitor’s behavior triggers suspicion, reCaptcha v2 will serve a challenge that the visitor must solve to prove they’re human.
As users, we’re all familiar with the various versions of reCaptcha v2. Sometimes, all you need to do is check a box that says “I’m not a robot”. Other times, the reCaptcha will challenge you with an image or audio recognition task. Whether or not you get the full challenge will depend on how confident Google is that you really are a human.
reCaptcha v2 is based on an “advanced risk analysis system” which relies quite heavily on Google cookies. If someone is browsing the web using Chrome, or has been logged into a Google account for a while, they’ll most likely just have to tick a box. A Firefox user who has disabled third-party cookies, on the other hand, is much more likely to get a difficult image recognition challenge.
But not everyone uses Chrome, and not everyone is comfortable using Google’s services. In fact, people are increasingly concerned about their online privacy. They prefer privacy-conscious browsers such as Firefox or Brave, and might even use a VPN to browse the Internet. ReCaptcha v2 will give these users tougher challenges, which will degrade their user experience and lead to lower conversion rates.
Furthermore, due to the ubiquity of reCaptcha v2, cybercriminals have found increasingly efficient automated solutions to bypass even the most difficult reCaptcha v2 challenges.
Some bots leverage recent progress in artificial intelligence to solve reCaptcha v2 challenges. More specifically, advanced neural networks help train AI models in such a way that they can automatically solve captchas.
It’s quite ironic, in a way: Google uses reCaptchas to train their image and audio recognition AI models, and cybercriminals use those advances in AI to beat the reCaptchas. The circle of digital life!
Cybercriminals can also outsource reCaptcha solving to human workers in low-cost countries via so-called Captcha farms.
Thanks to Captcha farms, attackers can use bots that aren’t even able to execute JavaScript. That’s because all that’s required to pass a reCaptcha v2 is to send a callback request containing the response token, and the Captcha farm provides this token.
Without the need for JavaScript, attackers can create bots which leverage simple HTTP request libraries instead of having to use fully automated browsers like Selenium or Puppeteer with headless Chrome. In turn, this decreases their infrastructure cost, so that they can crawl pages faster or stuff more credentials. The fees they pay the Captcha farms are minor in comparison: it costs approximately $1-3 to solve 1,000 v2 reCaptchas.
Having listened to some of the complaints from its users, Google developed reCaptcha v3 to provide a better user experience. Unlike v2, reCaptcha v3 is transparent for website visitors. There are no challenges to solve. Instead, reCaptcha v3 continuously monitors the visitor’s behavior to determine whether it’s a human or a bot.
For each request the user makes, reCaptcha v3 returns a score between 0 and 1 that represents how likely it is that the request originated from a human rather than a bot. Close to 0: sorry, you’re a bot. Close to 1: congrats, you’re a human.
In order to improve the accuracy of this score, website administrators can define specific actions, such as “sending a friend request” or “homepage” to help the reCaptcha understand how normal user behavior will vary depending on the context.
However, there’s a catch. While reCaptcha v3 clearly improves the experience for human users by eliminating the need to disrupt their browsing with reCaptcha challenges, it also raises new problems for website administrators.
With reCaptcha v2, the only required action was to verify whether the user correctly solved the challenge or not. With reCaptcha v3, you now need to decide which action to take depending on the score. Getting this configuration right is a tricky task for even the most experienced webmaster.
For each action a user makes on your website, you have three possible responses:
Give the user access to the requested resource
Ask the user to solve a v2 reCaptcha to determine if they’re human
Block the user (hard block).
This means that you need to decide, for each action, where you want to place the threshold for a particular response. Will you block the user when their score falls below 0.25, or will you serve them a v2 reCaptcha? What about 0.15? Will you fully block them then, or does 0.10 seem more appropriate? There are no clear-cut answers, which is what makes these questions so difficult.
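The mapping from score to response can be sketched as follows in Python. The threshold values, the action names and the function name are all hypothetical, picked only to mirror the numbers discussed above; choosing real values is exactly the hard part.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"          # give access to the requested resource
    CHALLENGE = "challenge"  # serve a v2 reCaptcha
    BLOCK = "block"          # hard block


# Hypothetical per-action thresholds: (block_below, challenge_below).
THRESHOLDS = {
    "login": (0.15, 0.50),
    "homepage": (0.10, 0.25),
}
DEFAULT_THRESHOLDS = (0.10, 0.25)


def decide(score: float, action_name: str) -> Action:
    """Map a v3 score (0 = likely bot, 1 = likely human) to a response."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    block_below, challenge_below = THRESHOLDS.get(action_name, DEFAULT_THRESHOLDS)
    if score < block_below:
        return Action.BLOCK
    if score < challenge_below:
        return Action.CHALLENGE
    return Action.ALLOW


if __name__ == "__main__":
    # The same score leads to different responses depending on the action.
    print(decide(0.3, "login"), decide(0.3, "homepage"))
```

Note that the same score of 0.3 triggers a challenge on the stricter "login" action but is allowed on "homepage", which is why thresholds have to be chosen per action.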
The issue here is that the stricter you make your thresholds, the more likely you are to block actual users. The contrary is also true: the looser your thresholds, the more likely you are to leave bots undetected. You’ll need to make an unpleasant compromise between not blocking too many users and not allowing too many bots.
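A small Python sketch makes this compromise concrete. The scores and labels below are toy values invented purely for illustration, not measurements, and in practice you would not know the true label of each request. The function counts both kinds of error when every request scoring below a threshold is blocked.

```python
# Toy sample invented for illustration: (v3 score, is the visitor really human?)
SAMPLE = [
    (0.95, True), (0.80, True), (0.60, True), (0.30, True), (0.20, True),
    (0.90, False), (0.40, False), (0.20, False), (0.10, False), (0.05, False),
]


def error_counts(sample, threshold):
    """Block every request scoring below `threshold`; count both error types.

    Returns (blocked_humans, allowed_bots).
    """
    blocked_humans = sum(
        1 for score, is_human in sample if is_human and score < threshold
    )
    allowed_bots = sum(
        1 for score, is_human in sample if not is_human and score >= threshold
    )
    return blocked_humans, allowed_bots


if __name__ == "__main__":
    for threshold in (0.1, 0.25, 0.5):
        humans, bots = error_counts(SAMPLE, threshold)
        print(f"threshold={threshold}: blocked humans={humans}, allowed bots={bots}")
```

On this toy sample, raising the threshold from 0.1 to 0.5 cuts the bots let through from four to one but raises the humans blocked from zero to two; no setting removes both errors.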
The reCaptcha v3 dashboard will show you a distribution of user scores for each action on your website. But that’s not enough to help you understand whether you’ve set the right thresholds, because there’s no other information to help you better understand those users.
This is particularly true when you consider that the Internet is far more diverse than we often imagine it to be. Sure, the majority of your legitimate users might browse the Internet with Chrome, Edge, or Safari, but what about the 10% of people who don’t? What about your privacy-savvy users? Their user scores will be significantly lower, and do you really want to make their lives harder with v2 reCaptchas or by blocking them without a second chance at all?
Setting blocking and authorization thresholds without a proper monitoring mechanism is like playing Russian roulette with your website’s traffic. However, collecting, storing and analyzing enough data to set these thresholds accurately requires deep bot detection knowledge and would entail significant software development costs.
There’s another problem with reCaptcha v3. It uses behavioral detection to predict whether a given request originates from a human or not. While behavioral detection is indeed extremely helpful for detecting advanced bots, learning how to accurately distinguish bots from humans requires very large data volumes.
reCaptcha v3 also needs a user to interact with your website for a while before it can make an accurate decision based on behavior. When used alone, it therefore leaves your site vulnerable to large-scale distributed crawlers that leverage IP rotation to frequently change their IP address.
Here at DataDome, we did a quick experiment to determine whether reCaptcha v3 also uses basic client-side fingerprinting signals. Turns out it does. While v3 can easily detect “naive” bots, such as those that don’t remove the navigator.webdriver attribute or that use unpatched Selenium, bots that forge their fingerprint will easily bypass detection.
We created a Headless Chrome bot and used the Puppeteer extra framework to forge its fingerprint. The screenshot below was taken by that bot. It had obtained an almost-perfect user score of 0.9. A perfect intruder.
While reCaptcha v2 and v3 can help limit bot traffic, both versions come with several problems:
User experience: human users hate the image/audio recognition challenges
Captcha farms and advances in AI allow cybercriminals to bypass reCaptchas
Defining the right thresholds for v3 user scores is a very difficult task
There’s no way to monitor false positives and negatives
Advanced bots are able to bypass them.
The bottom line is that neither v2 nor v3 serves as a replacement for a proper bot management solution.
The DataDome SaaS bot protection solution offers a reCaptcha alternative which actually works for e-commerce and classified ads websites. Here’s how we address each of the above-mentioned problems.
User experience
Like reCaptcha v3, DataDome is transparent for human users. There’s no challenge to solve. But unlike v3, DataDome uses a wide range of techniques to distinguish bots from people: behavioral analysis, device fingerprinting, IP reputation, and more. All these approaches are invisible to human users.
In fact, DataDome’s customers frequently report that the user experience has improved. For some customers, bots could represent 40% of their traffic. This took a heavy toll on their server loads and, as a result, on the performance of their websites. Activating DataDome instead of reCaptcha significantly improved loading speed and user experience, since bots didn’t swamp their servers anymore.
Captcha farm and AI detection
DataDome does use Captchas as a feedback loop to enable blocked human users to continue their navigation. However, we don’t consider a solved Captcha as indisputable proof. We’ve developed different approaches to make sure that Captchas are solved by actual people, not by Captcha farms or neural networks. Every day, we invalidate thousands of forged Captcha responses.
Blocking and allowing thresholds
If you are looking for a reCaptcha alternative that works on autopilot, DataDome has you covered. Once you’ve installed our server-side module and our mobile SDK, and whitelisted your partners’ bots, you don’t need to add any other detection logic or thresholds. Our advanced detection engine takes care of figuring out whether your visitors are human or not, so there’s no complex exercise for you to go through.
Unlike reCaptcha v3, DataDome also recognizes good bots such as less popular search engine bots, content aggregators and SEO bots. This means you don’t have to worry about forgetting something or making a mistake and accidentally degrading your SEO rankings.
Of course, if you want, DataDome gives you the possibility to add custom detection logic or whitelist some of your traffic based on criteria such as IP, country, user agent, and so on.
Feedback loops
Thanks to our advanced detection engine, DataDome (unlike reCaptcha v3) has an extremely low false positive rate: below 0.01%. In other words, per 10,000 Captchas served, fewer than one is seen by a human. In those rare instances where that happens, our real-time feedback loop propagates the information to our detection engine in less than 2 ms to ensure we don’t hard block humans.
To deal with false negatives (letting bots through), DataDome’s detection engine constantly learns new bot patterns using AI. It also leverages bad traffic detected on one website to protect other websites. But we also keep humans in the loop: our team of data analysts conducts frequent traffic reviews to ensure we don’t miss any bots.
DataDome also comes with an intuitive dashboard which enables you to monitor your main traffic metrics, such as the volume and nature of bad bot requests, the number of Captchas served, etc. If you want to explore your traffic in more detail, you can do so with a real query language that you can use to explore a wide range of dimensions, such as IP address, country, type of bots blocked, and more.
Detecting advanced bots
Every day, DataDome encounters new advanced bots. Our detection engine uses behavioral AI detection, advanced fingerprinting, and IP reputation to make sure we detect even the most cunning bots. Contrary to other bot management solutions that analyze requests in batches, DataDome analyzes each request in less than 2 ms to determine if it originated from a bot or a person.
The bot we discussed above, which received a 0.9 user score with reCaptcha v3, would be caught immediately with DataDome. Our fingerprinting module can detect advanced bots that use residential IP proxies, forge their fingerprint, or use real browsers and headless browsers automated with modified Puppeteer. The same goes for advanced Playwright bots or modified Selenium bots, even if they modify the Chromedriver binary. In a single request, we stop them.
In conclusion
While reCaptcha v2 and v3 can help block some bot traffic, they come with many problems. They degrade the user experience, can be bypassed with Captcha farms or AI, have no real feedback mechanisms, can lead to false positives and negatives, and don’t detect advanced bots.
Neither version of reCaptcha should therefore be considered as a proper bot management solution.