• December 21, 2024

WordPress Anti Scraper Plugin

How to Prevent Blog Content Scraping in WordPress

Last updated on February 17th, 2021
Are you looking for a way to keep spammers and scammers from stealing your WordPress blog content using content scrapers?
It is very frustrating as a website owner to see that someone is stealing your content without permission, monetizing it, outranking you in Google, and stealing your audience.
In this article, we’ll cover what blog content scraping is, how you can reduce and prevent content scraping, and even how to take advantage of content scraping for your own benefit.
What Is Blog Content Scraping?
Blog content scraping is when content is taken from numerous sources and republished on another site. Usually this is done automatically via your blog’s RSS feed.
Content scraping is so easy now that anyone can start a WordPress site, install a free or commercial theme, and add a few plugins that will go and scrape content from selected blogs.
Why Are Content Scrapers Stealing my Content?
Some of our users have asked us, “Why are they stealing my content?” The simple answer is because you are AWESOME. The truth is that these content scrapers have ulterior motives. Below are just a few reasons why someone would scrape your content:
Affiliate commission – There are some dirty affiliate marketers out there who just want to exploit the system to make a few extra bucks. They will use your content and others’ content to bring traffic to their site through search engines. These sites are usually targeted towards a specific niche, so they have related products that they are promoting.
Lead Generation – Often we see lawyers and realtors doing this. They want to seem like industry leaders in their small communities. They do not have the bandwidth to produce quality content, so they go out and scrape content from other sources. Sometimes, they are not even aware of this because they are paying some scumbag $30/month to add content and help them get better SEO. We have encountered quite a few of these in the past.
Advertising Revenue – Some folks just want to create a “hub” of knowledge, a one-stop shop for users in a specific niche. Often we notice that our site content is being scraped. The scraper always replies, “I was doing this for the good of the community.” Except the site is plastered with ads.
These are just a few reasons why someone would steal your content.
How to Catch Content Scrapers?
Catching content scrapers is a tedious task and can take up a lot of time. There are a few ways that you can catch content scrapers.
Search Google with Your Post Titles
Yup that is as painful as it sounds. This method is probably not worth it especially if you are writing about a very popular topic.
Trackbacks
If you add internal links in your posts, you will notice a trackback if a site steals your content. This way is pretty much the scraper telling you that they are scraping your content.
If you are using Akismet, then a lot of these trackbacks will show up in the SPAM folder. Again, this will only work if you have internal links in your posts.
Ahrefs
If you have access to an SEO tool like Ahrefs, you can monitor your backlinks and keep an eye out for stolen content.
How to Deal with Content Scrapers
There are a few approaches that people take when dealing with content scrapers: the Do Nothing approach, the Take Down approach, or the Take Advantage of Them approach.
Let’s take a look at each one.
The Do Nothing Approach
This is by far the easiest approach you can take. Usually the most popular bloggers would recommend this because it takes A LOT of time fighting the scrapers.
Now obviously if it is a well-known blog like Smashing Magazine, CSS-Tricks, Problogger, or others, then they do not have to worry about it. They are authority sites in Google’s eyes.
However, we know some good sites that have gotten flagged as scrapers because Google thought their scrapers were the original content. So this approach is not always the best in our opinion.
Take Down Approach
This is the exact opposite of the “Do Nothing Approach”. In this approach, you simply contact the scraper and ask them to take the content down.
If they refuse to do so or simply do not reply to your requests, then you file a DMCA (Digital Millennium Copyright Act) complaint with their host.
In our experience, the majority of scraping websites do not have a contact form available. If they do, then use it. If they do not have a contact form, then you need to do a Whois lookup.
You can see the contact info under the administrative contact. Usually the administrative and technical contacts are the same.
It will also show the domain registrar. Most well-known web hosting companies and domain registrars have DMCA forms or emails. You can see that this specific person is with HostGator because of their nameservers. HostGator has a form for DMCA complaints.
If the nameserver is something like, then you have to dig deeper by doing reverse IP lookups and searching for IPs.
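The nameserver check described above can be sketched in a few lines of Python. This is only an illustration and assumes you have already saved the raw text of a Whois lookup; the sample record, the `HOST_HINTS` mapping, and the function names are all made up for this example.

```python
import re

# Hypothetical mapping from nameserver domain fragments to hosting companies.
# Extend it with the hosts you run into; it is not an authoritative list.
HOST_HINTS = {
    "hostgator.com": "HostGator",
}

def extract_nameservers(whois_text):
    """Return the lowercase nameserver hostnames found in raw Whois output."""
    pattern = re.compile(r"^\s*Name Server:\s*(\S+)", re.IGNORECASE | re.MULTILINE)
    return [m.group(1).lower().rstrip(".") for m in pattern.finditer(whois_text)]

def guess_host(nameservers):
    """Guess the hosting company from nameserver names, or None if unknown."""
    for ns in nameservers:
        for fragment, host in HOST_HINTS.items():
            if ns.endswith(fragment):
                return host
    return None

# Made-up Whois record, used only to exercise the functions.
SAMPLE = """\
Domain Name: EXAMPLE-SCRAPER.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.HOSTGATOR.COM
Name Server: NS2.HOSTGATOR.COM
"""
servers = extract_nameservers(SAMPLE)
print(servers)
print(guess_host(servers))
```

When `guess_host` returns `None`, the nameservers do not give the host away, and that is the case where you fall back to the reverse IP lookups mentioned above.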
You can also use a third-party service for takedowns.
Jeff Starr suggests in his article that you should block the bad guys’ IPs. Check your access logs for their IP address, and then block it with something like this in your root .htaccess file:
Deny from 123.456.789
You can also redirect them to a dummy feed by doing something like this:
RewriteCond %{REMOTE_ADDR} 123\.456\.789\.
RewriteRule .* [R,L]
You can get really creative here, as Jeff suggests. Send them to really large text feeds full of Lorem Ipsum. You can send them some disgusting images of bad things. You can also send them right back to their own server, causing an infinite loop that will crash their site.
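Finding the IP address in your logs is the tedious part, so here is a rough Python sketch of that step. It assumes your server writes the common Apache access log format and that your feed lives under `/feed`; the sample log lines, the threshold, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line:
# client IP, two fields, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST|HEAD) (\S+)')

def feed_hits_by_ip(log_lines, feed_prefix="/feed"):
    """Count requests per client IP whose path starts with feed_prefix."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(feed_prefix):
            hits[match.group(1)] += 1
    return hits

def deny_rules(hits, threshold):
    """Build 'Deny from' lines for IPs at or above the threshold."""
    return [f"Deny from {ip}" for ip, count in sorted(hits.items()) if count >= threshold]

# Made-up log lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [21/Dec/2024:10:00:00 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '203.0.113.7 - - [21/Dec/2024:10:00:05 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '203.0.113.7 - - [21/Dec/2024:10:00:09 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '198.51.100.4 - - [21/Dec/2024:10:01:00 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '198.51.100.4 - - [21/Dec/2024:10:02:00 +0000] "GET /about/ HTTP/1.1" 200 900',
]
hits = feed_hits_by_ip(SAMPLE_LOG)
for rule in deny_rules(hits, threshold=3):
    print(rule)
```

Treat the output as a list of candidates rather than a verdict, because legitimate feed readers also poll your feed regularly. Also note that `Deny from` is the older Apache 2.2 syntax; on Apache 2.4 it only works through the compatibility module, and `Require not ip` is the newer equivalent.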
The last approach that we take is to take advantage of them.
How to Take Advantage of Content Scrapers
This is our approach to dealing with content scrapers, and it turns out quite well. It helps our SEO and also helps us make a few extra bucks.
The majority of scrapers use your RSS Feed to steal your content. So these are some of the things that you can do:
Internal Linking – You need to interlink your blog posts a lot. When you have internal links in your article, it helps you increase pageviews and reduce bounce rate on your own site. Secondly, it gets you backlinks from the people who are stealing your content. Lastly, it allows you to steal their audience. If you are a talented blogger, then you understand the art of internal linking. You have to place your links on interesting keywords. Make it tempting for the user to click. If you do that, then the scraper’s audience will click on it too. Just like that, you took a visitor from their site and brought them back to where they should have been in the first place.
Auto Link Keywords with Affiliate Links – There are a few plugins like ThirstyAffiliates that will automatically replace assigned keywords with affiliate links.
Get Creative with RSS Footer – You can use the All in One SEO plugin to add custom items to your RSS footer. You can add just about anything you want here. We know some people who like to promote their own products to their RSS readers, so they add banners. Guess what, now those banners will appear on the scrapers’ websites as well. In our case, we always add a little disclaimer at the bottom of our posts in our RSS feeds. By doing this, we get a backlink to the original article from the scraper’s site, which lets Google and other search engines know we are the authority. It also lets their users know that the site is stealing our content.
Check out our guide on how to control your RSS feed footer in WordPress for more tips and ideas.
How You Can Reduce and Prevent WordPress Blog Scraping
If you take our approach of heavy internal linking, affiliate links, RSS banners, and the like, chances are you will reduce content scraping considerably. If you follow Jeff Starr’s suggestion of redirecting content scrapers, that will stop those scrapers too. Aside from what we have shared above, there are a few other tricks that you can use.
Full vs. Summary RSS Feed
There has been a debate in the blogging community about whether to offer a full RSS feed or a summary RSS feed. We are not going to go into much detail about that debate; however, one of the pros of having a summary-only RSS feed is that it helps prevent content scraping.
You can change the settings by going to your WordPress admin panel and going under Settings » Reading. Then change the setting For each article in a feed show: Summary.
Trackback SPAM
Trackbacks and pingbacks definitely had great uses; however, they are now constantly being abused.
Often themes display trackbacks and pingbacks under or among the comments. This gives the spammer an incentive to scrape your site and send trackbacks. If you mistakenly approve one, then they get a backlink and a mention from your site. You can disable trackbacks on all future posts under Settings » Discussion by unchecking the option that allows link notifications from other blogs (pingbacks and trackbacks).
Here is an article that will show you how to disable trackbacks and pings on existing WordPress posts as well.
Is Content Scraping Ever Good?
It can be. If you see that you are making money from the scraper’s site, then sure it can be. If you see a lot of traffic from a scraper’s site, then it can be.
In most cases however, it is not. You should always try to get your content taken off. But you will realize as your blog gets larger, it is almost impossible to keep track of all content scrapers. We still send out DMCA complaints, however we know that there are tons of other sites that are stealing our content that we just cannot keep up with.
We hope this article helped you prevent blog content scraping in WordPress. You might also want to see our guide on how to prevent image theft in WordPress.
WordPress Content Scraping (Fight Back or Ignore?) - Kinsta

Content scraping, or what we like to refer to as “content stealing,” has been a problem since the internet began. For anyone publishing on a regular basis or working with search engine optimization (SEO), it can be downright infuriating. The bigger you grow, the more you notice just how many content scraping farms are out there. We publish a lot of content here at Kinsta and content scraping is an issue we deal with on a regular basis. The question is, should you try to fight back or simply ignore them and move on? Today we’ll dive into some of the pros and cons of both sides.
What is Content Scraping?
Content scraping is basically when someone takes your content and uses it on their own site (either manually or automatically with a plugin or bot) without giving you attribution or credit. This is usually done in hopes of somehow gaining traffic, SEO, or new users. This is actually against copyright laws in the United States and some other countries. Google also doesn’t condone this and recommends that you should be creating your own unique content.
Here are a couple of examples of scraped content that Google mentions:
Sites that copy and republish content from other sites without adding any original content or value
Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it
Sites that reproduce content feeds from other sites without providing some type of unique organization or benefit to the user
Sites dedicated to embedding content such as video, images, or other media from other sites without substantial added value to the user
This is not to be confused with content syndication, which is typically when you republish your own content for broader reach. Content syndication can also be done by a third party, but there is a fine line between this and content scraping. If someone is syndicating content, special tags such as rel=canonical or noindex should always be used.
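To make the difference concrete, here is a small Python sketch that checks whether a republished page carries either of those tags. It is only an illustration using the standard library HTML parser; the class name, function name, sample page, and example.com URL are all hypothetical.

```python
from html.parser import HTMLParser

class HeadTagParser(HTMLParser):
    """Collect the rel=canonical URL and robots meta directives from HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = ""

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href")
        elif tag == "meta" and attributes.get("name", "").lower() == "robots":
            self.robots = attributes.get("content", "").lower()

def syndication_status(html, original_url):
    """Report whether a republished page credits the original via its tags."""
    parser = HeadTagParser()
    parser.feed(html)
    return {
        "canonical_points_to_original": parser.canonical == original_url,
        "noindex": "noindex" in parser.robots,
    }

# Made-up republished page that does the right thing.
PAGE = """<html><head>
<link rel="canonical" href="https://example.com/original-post/">
<meta name="robots" content="noindex, follow">
</head><body>Republished text</body></html>"""
print(syndication_status(PAGE, "https://example.com/original-post/"))
```

A page where both values come back `False` is republishing your content without telling search engines where the original lives, which is the pattern a scraper typically shows.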
There are a lot of third-party WordPress plugins now that allow you to automatically grab third-party RSS feeds. And while the developers have good intentions, unfortunately, these are sometimes abused and used for content scraping. One of the reasons WordPress is so popular is for ease of use, but sometimes that can also backfire.
Live Example of Content Scraping Farm
We call them “farms” when the same owner scrapes content across dozens of sites. These are typically easy to spot as the WordPress site owner usually uses the same theme across all sites and even just a slight variation between domain names.
We are using a live example in today’s post! We have no shame in calling out these types of sites as they don’t provide any value and only negate the hard work done by content publishers. Here is an example of a content scraping farm. We archived each link in case the sites go down in the future. You can click on each one of them and see they are all using the same theme, and same scraped content. Typically a scraper will grab content from a lot of different sources, our blog is one of them.
(archived link)
You can see below, they are simply scraping our blog posts word for word, along with all of our articles across all of the domains above.
How to Find Them?
One of the easiest ways to find them is to utilize a tool like Copyscape or Ahrefs (if they are also copying your internal links). Copyscape even allows you to submit your sitemap file and have it automatically notify you as it scans the web and finds content.
Copyscape
You can also manually search Google using the “allintitle” tag. Simply input the tag along with your post’s title. Example: allintitle: Kinsta Handles WordPress Caching So You Don’t Have To
Search Google with allintitle tag
The allintitle keyword prompts Google to search for those words in the titles of posts only. The second and more effective way is to search for some text within your post, with the search term in double-quotes. Putting the double quotes tells Google to search for the exact same text. You may get false positives with your title search, as someone might use the same title, but the second way is far more effective because it’s highly unlikely that someone will have the exact same sentences or paragraphs.
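If you check a lot of posts, building these two searches by hand gets old quickly. The following Python sketch only builds the search URLs for you to open in a browser; the function names are our own invention, and the exact-phrase sentence is just an example.

```python
from urllib.parse import urlencode

def allintitle_query(title):
    """Google query restricted to page titles."""
    return f"allintitle: {title}"

def exact_phrase_query(sentence):
    """Google query for an exact sentence, wrapped in double quotes."""
    # Remove any double quotes inside the sentence so the phrase stays intact.
    cleaned = sentence.strip().replace('"', "")
    return f'"{cleaned}"'

def search_url(query):
    """Build a Google search URL for the given query string."""
    return "https://www.google.com/search?" + urlencode({"q": query})

title = "Kinsta Handles WordPress Caching So You Don't Have To"
print(search_url(allintitle_query(title)))
print(search_url(exact_phrase_query("Content scraping is basically when someone takes your content")))
```

Picking a sentence from the middle of a post, rather than the opening line, tends to work better for the exact-phrase search because scrapers sometimes rewrite introductions.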
Does Content Scraping Affect SEO?
The next question you probably have is, how does this affect SEO? Because in the example above, the content scraping farm isn’t using rel=canonical tags, giving credit, or noindex tags. This means that when Google bot crawls it, it’s going to think that it’s their original content. That’s not fair you might think. You’re right, it’s not. We published the content and then they just scrape it. However, before you start panicking, it’s important to understand what really goes on behind the scenes.
First off, even though the Google crawler might see it as their content, most likely the Google algorithm doesn’t. Google isn’t stupid and has many rules and checks in place to ensure original content owners still get the credit. How do we know this? Well, let’s take a look at each of these posts from an SEO perspective.
This person scraped our blog post back in November 2017, so it’s had plenty of time to rank if it was going to. So we pull up our handy Ahrefs tool and check to see what current keywords their post is ranking for. And we can see it’s not ranking for any keywords. So as far as organic traffic goes, they don’t benefit from this post at all.
Content scraping SEO
If we pull up our original blog post in Ahrefs we can see we rank for 96 keywords.
Original content SEO
When Google sees what you might think is duplicate content, it uses a lot of different signals and data points to figure out who originally wrote the content and what should be ranked. Here are a couple of examples:
Publish dates (although in this case the content was scraped on the same day)
Domain authority and page rank. Yes, Google is probably still using page rank internally
Social signals
Traffic
Backlinks
Again these are all safe assumptions, being that no one really knows what Google uses. But the point here is that you probably don’t need to lose sleep over someone scraping your content. However, you still might want to do something about it. It’s also not impossible for someone else to outrank you with your own content. We’ll go into this further below.
What We Do About Content Scraping
Creating useful, unique and share-worthy content is not easy, it takes a lot of your valuable time (and often costs a lot of money) so you should definitely protect it. But here are some additional reasons why you might not want to ignore scrapers.
If a site with a significant amount of traffic is scraping your content and using it to supplement their other content, it could very well be that they are benefiting from it. This definitely isn’t right as you’re the original owner of the content.
Things like this can seriously skew data in your reporting tools and make your life harder. For example, these will show up in backlink reports in tools such as Ahrefs or Majestic. The bigger you are, the messier it gets.
Do you want to put your trust solely in Google to figure out if theirs or yours is the original content? Even though they are pretty smart about this, we surely don’t. Also, even though their post has no search engine rankings for any keywords, it actually is indexed by Google (as seen below).
Scraped content is indexed
Contact Website Owner and File DMCA Complaint
To ensure we get credit where credit is due, we usually first contact the owner of the website and request removal. We recommend creating a few email templates you can reuse to speed this process up and not waste your time. If we don’t hear from them after a couple tries, we take this a step further and file a DMCA complaint.
DMCA complaints can be a little tricky as you’ll need to look up the IP of the site, find the host, etc. But not to worry, we have all the steps documented on how to easily file a DMCA complaint, as well as track down the owner. You can also file a legal removal request directly with Google.
As far as the live case study example above, it looks like it’s time to take that next step as we haven’t been able to reach the website owner.
Update Disavow File
To ensure these don’t impact our site in any way (regardless of what happens with the DMCA complaint), we also add these entire domains in our disavow file. This tells Google we want nothing to do with them, and that we’re not trying to manipulate SERPs in any way.
If you’re doing this for a higher quality site, you can also just submit the URL for disavowal, instead of the entire domain. Although typically we don’t see high-quality sites scraping content.
Step 1
In Ahrefs we select the domain in question and click on “Disavow Domains.” This ensures that nothing from this content-scraping website ever impacts us.
Ahrefs disavow domain
The great thing about Ahrefs when dealing with these types of issues is their “Hide disavowed links” option. It then automatically hides the domains and URLs from showing up in your main report in the future. This is super helpful for organization and keeping your sanity, especially if you are exclusively using Ahrefs to manage your backlinks.
Hide disavowed links
Step 2
As you can see below we added all of the domains from the content scraping farm to our disavow links section in Ahrefs. The next step is to click on “Export” and get the disavow file (TXT) that we need to submit over in Google Search Console.
Export disavow file
Step 3
Then head over to Google’s Disavow Tool. Select your Google Search Console profile and click on “Disavow Links.”
Disavow links
Step 4
Choose the disavow file you exported from Ahrefs and submit it. This will overwrite your previous disavow file. If you haven’t been using Ahrefs and a disavow file already exists, we recommend downloading the current one, merging it with your new one, and then uploading the result. From then on, if you’re only using Ahrefs, you can simply upload and overwrite.
Disavow file
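The merge step is easy to get wrong by hand, so here is a minimal Python sketch of it. It assumes the usual disavow file layout of one URL or `domain:` entry per line, with `#` marking comments; the function names and the sample domains are hypothetical.

```python
def parse_disavow(text):
    """Return the non-comment, non-blank entries of a disavow file, in order."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries

def merge_disavow(existing_text, new_domains):
    """Merge new scraper domains into an existing disavow file without duplicates."""
    merged = parse_disavow(existing_text)
    seen = set(merged)
    for domain in new_domains:
        entry = f"domain:{domain.strip().lower()}"
        if entry not in seen:
            merged.append(entry)
            seen.add(entry)
    return "\n".join(merged) + "\n"

# Made-up existing file with one domain entry and one URL entry.
EXISTING = """\
# Previously disavowed
domain:old-spam.example
https://another.example/bad-page.html
"""
print(merge_disavow(EXISTING, ["scraper-one.example", "old-spam.example"]))
```

This sketch drops comment lines from the old file to keep things short, so keep a copy of the original if your comments record why each entry was added.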
Block IPs of Scrapers
You could also take this a step further and block the IPs of the scrapers. Once you have identified unusual traffic (which can sometimes be hard to do), you could block it on your server using .htaccess files or Nginx rules. If you’re a Kinsta client, our support team can also block IPs for you. Or if you’re using a third-party WAF such as Sucuri or Cloudflare, these also have options to block IPs.
Summary
Content scraping farms might not always affect your SEO, but they definitely aren’t adding anything of value for users. We highly recommend taking a few moments to get them taken down. We have a whole Trello card devoted to “takedown” requests. This helps make the web a better place for everyone and ensures your unique content is only seen and ranked on your site.
What do you think about content scraping? Do you try and fight them or just ignore them? We would love to hear your thoughts down below in the comments.
WP Shieldon – WordPress Firewall - Plugins

WP Shieldon is a WordPress security plugin based on Shieldon library, a Web Application Firewall (WAF) for PHP.
When users or robots try to view many of your web pages in a short period of time, they are temporarily banned. They can get unbanned by solving a CAPTCHA.
You can visit the blog of the plugin author, Terry L., and try reloading the pages several times to see how this plugin works. You can also try Terry’s login page to see that it is protected. For more information about Shieldon, please visit
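The temporary ban described above can be pictured as a sliding-window counter per IP address. The Python sketch below is a generic illustration of that idea and is not taken from the Shieldon source code; the class name, limits, and IP address are all hypothetical.

```python
from collections import defaultdict, deque

class PageViewLimiter:
    """Temporarily ban clients that request too many pages in a short window."""

    def __init__(self, max_views, window_seconds, ban_seconds):
        self.max_views = max_views
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self.views = defaultdict(deque)   # ip -> timestamps of recent views
        self.banned_until = {}            # ip -> time the ban expires

    def allow(self, ip, now):
        """Return True if the request may proceed, False if the IP is banned."""
        if self.banned_until.get(ip, 0) > now:
            return False
        recent = self.views[ip]
        # Drop page views that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_views:
            self.banned_until[ip] = now + self.ban_seconds
            recent.clear()
            return False
        return True

    def unban(self, ip):
        """Lift a ban early, for example after the visitor solves a CAPTCHA."""
        self.banned_until.pop(ip, None)

limiter = PageViewLimiter(max_views=3, window_seconds=10, ban_seconds=60)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 30, 70)]
print(results)
```

The fourth request inside the ten-second window triggers the ban, later requests are refused until it expires, and `unban` stands in for the CAPTCHA step that lets a real visitor back in sooner.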
Please note that there are three important things you must understand before using WP Shieldon:
WP Shieldon is not for beginners.
Turn the Trusted Bot component on to allow search engine crawlers such as Google, Bing, Yahoo, and others to crawl your website smoothly.
Open Source Code
Plugin:
Core library:
Features
Realtime statistics – See who is browsing your website and their status.
Beautiful and detailed statistics and dashboard.
Block bad bots by default – Backlink crawlers, copyright crawlers and the Wayback Machine bot.
IP manager – Block a single IP or an IP range as you want. (IPv6 supported)
Online session control – You can limit how many visitors can browse your website at once. Good for webmasters whose blog is hosted on shared hosting.
SEO friendly – You can allow popular search engines such as Google, Bing, Yahoo and others by putting them in the whitelist.
XML RPC, Login, Signup page protection.
Multiple data drivers – Redis, SQLite, File system, MySQL.
Multiple CAPTCHA modules – Google reCAPTCHA v2, v3 and Image CAPTCHA.
XSS Protection.
Page authentication.
Many others you can find by yourself
Check out my other WordPress works here:
Markdown Editor – WP Githuber MD – an all in one Markdown editor.
SEO Search Permalink – Static search permalink.
Mynote Theme – Theme for programmers.
Documents
Traditional Chinese
English
Copyright
WP Shieldon, Copyright 2019
WP Shieldon is distributed under the terms of the GNU GPL
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
“WP Shieldon – WordPress Firewall” is open source software. The following people have contributed to this plugin.
Contributors
Terry L.
= 1.0.0
First release.
= 1.1
Fix Redis check.
= 1.2
Add CDN setting.
= 1.3
Fix setting URL in plugin page.
Fix SQLite check.
Fix daemon Enable button not working.
= 1.1.0
Add dashboard.
Fix passcode not working.
= 1.2.0
Add report – IP log table.
Add report – Session table.
Add report – Rule table.
= 1.1
Fix variable type error.
= 1.2
Add new tabs for more period data on dashboard.
Fix component loading issue.
= 1.3
Update readme.
Fix l10n issues.
Improve statistics pages.
= 1.3.0
Add IP management on IP rule table page.
= 1.1
Fix passcode login issue.
= 1.2
Fix JavaScript conflicts.
= 1.3
Improve performance.
= 1.4
Exclude password-protected posts’ form.
Update
= 1.4.0
Update Shieldon core.
Add feature – XSS protection.
Add feature – WWW-Authenticate page protection.
Add setting page – Overview.
= 1.1
Update localization strings. (zh_TW, zh_CN)
Fix typo.
= 1.2
Fix issue #6 – Issue with ‘Save Draft’ feature.
= 1.3
Update Shieldon core up to 0.7
Update translation strings.
= 1.5.0 (WordCamp Taipei version)
Assign language code to dialog UI.
= 1.6.0
Update Shieldon kernel.
Add options in settings allowing to avoid conflicts with some WordPress core functions.
Add an option for switching Action Logger.
Add import/export settings feature.
Add page – operation status.
= 1.1
Fix Website Health Check issue.
Add an option for allowing only logged-in users to access the REST API.
Add an option for disabling XML-RPC.
= 1.2
Add Facebook and Twitter bots into whitelist in Trusted Bot component.
Add current website’s IP address into whitelist by default.
Prevent automatic favicon requests.
= 1.3
Ignore AJAX calls.
Fix issue #3 – Notice appearing on PHP 7

Frequently Asked Questions about wordpress anti scraper plugin

How do I stop WordPress site from scraping?

Below are 10 methods to protect your site from content scrapers.
Rate Limiting and Blocking. You can fight off a large portion of bots by detecting the problem first. …
Registration and Login. …
Honeypots and Fake Data. …
Use a CAPTCHA. …
Frequently Change the HTML. …
Obfuscation. …
Don’t Post It!
(Apr 29, 2017)

How do I stop content scrapers?

Preventing Web Scraping: Best Practices for Keeping Your Content Safe
Rate Limit Individual IP Addresses. …
Require a Login for Access. …
Change Your Website’s HTML Regularly. …
Embed Information Inside Media Objects. …
Use CAPTCHAs When Necessary. …
Create “Honey Pot” Pages. …
Don’t Post the Information on Your Website.
(Aug 11, 2014)

What is scraping in WordPress?

If you want to create a price comparison site or dropshipping store, WordPress scraper plugins can be very useful. Web scraping consists of gathering information from the web. That information is then organized or imported. Some people consider scraping an unethical or questionable activity. (Jun 18, 2021)
