How to Create and Automate a Large Disavow Link File for Google
If you're managing a site that's popular, you'll notice a large influx of suspicious-looking links. This shows up in various ways - sites that duplicate your content, spun content linking to you, spammy directories linking to your site, and so on. How do you tell Google that these aren't your sites and you don't want to be affiliated with them? The disavow feature.
In the early days of a website, it's no problem to add a link here and there to a disavow file, but when you start getting into the thousands of links, this becomes a seriously time-consuming task. In this article, we break down our approach to automating this task as much as possible.
Why Disavow Links?
There's no consensus in the SEO industry as to whether or not we should bother disavowing links. Google themselves have been fairly consistent in saying that they recognize that there is a natural, expected level of internet spam and scraping, and that they don't penalize websites for these links.
However, I personally err on the side of "better safe than sorry." I've seen these spammy links become more and more sophisticated, and some of them very much look like they're embedded as part of a link farming or PBN (private blog network) scheme. These are the types of links we don't want our websites to be associated with, as they can trigger penalties. They don't have any value to bolster a website's rank, so why not disavow them? And of course, as is tradition with Google and their messaging, they recently muddied the waters on the subject.
So, to ensure that you're set to protect against future penalties, and to help value the links you actually want counted more, there's simply no reason to not disavow links with Google.
How to Submit a Disavow File
This guide is aimed more at existing webmasters who have some knowledge of the disavow tool and how to use it. For a comprehensive walkthrough, check out Google's support article and Ahrefs' guide on the disavow tool. There are a few notes I want to point out, though:
- Always disavow at the domain level, as this will cover all future spammy links generated by that website.
- Don't use this tool if you've only got a handful of links, or are unsure as to whether the links you're adding are low quality or not. Only use it for obvious spam.
- Check, double-check and triple-check. You absolutely don't want to accidentally disavow a high-quality link.
- If you've used the tool before, Google allows you to download your current disavow file. So, add to that rather than overwriting it, so you can retain all your previous disavow work.
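The last point, adding to your existing file rather than overwriting it, is easy to get wrong by hand. Here is a minimal Python sketch of one way to do it, assuming you've downloaded your current disavow file as plain text. The function name and the example domains are made up for illustration, and this is not an official Google utility.

```python
def merge_disavow(existing_text, new_domains):
    """Keep every line of the existing file and append any new domain entries."""
    existing_lines = [line.strip() for line in existing_text.splitlines() if line.strip()]
    # Track entries that are already present (comment lines start with "#").
    seen = {line.lower() for line in existing_lines if not line.startswith("#")}
    merged = list(existing_lines)
    for domain in new_domains:
        entry = "domain:" + domain.strip().lower()
        if entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return "\n".join(merged) + "\n"


# Hypothetical contents of a previously submitted disavow file.
current = "# Existing entries\ndomain:spam-one.example\nhttp://spam-two.example/page.html\n"
print(merge_disavow(current, ["Spam-One.example", "spam-three.example"]))
```

The existing comment, domain entry and URL-level entry are all carried over untouched, and only the domain that wasn't already listed gets appended.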
What's the general breakdown of backlink types across most sites?
Tools We Use to Create Our Master Disavow File
In order to create a disavow file that's as accurate and effective as possible, we have to resort to using multiple tools. The problem with backlinks, as they relate to internet crawlers, is that no single crawler is comprehensive. You've got to use multiple sources to determine what links your site has, and what the quality of each is. So here's a breakdown of what we use to create our list:
Excel
Good ol' faithful data management software Excel is where we create our list, remove duplicates, strip down URLs to just the domain, and compare the value of links from each data source. You're going to want to refresh your Excel skills to the point where you're comfortable with data comparison, filtering and the 'text to column' and 'concatenate' functions.
Google Search Console
As a webmaster, you should be pretty familiar with Search Console (formerly Webmaster Tools) to navigate crawl information, search performance, and other important info needed to manage and monitor your website. This is one of the sources we use to build our link list.
Ahrefs
Our preferred SEO tool, Ahrefs gives us a lot of information as to the value of a website. We download our list of links here, and use their metrics to get a general idea as to the value of a link.
Majestic
Another tool similar to Ahrefs, Majestic gives us another source for finding backlinks to compare with the previous two. We include this one because none of the three link sources we use seem to have the same info. We find links in each that the others haven't yet crawled or don't display. So it gives us another backup to make sure we don't disavow links that might actually have value.
Agency Analytics
If you're invested in disavowing links for the benefit of SEO, you're going to want to monitor your traffic and ranks, and we prefer to use Agency Analytics for this. It's a comprehensive all-in-one dashboard, but for the purpose of this article, we use it to make sure we have historical SEO data to make sure that our efforts either help, or more importantly, don't hurt our overall site value. You can use Agency Analytics or your preferred rank-tracking tool for this.
Do You Need to Submit a Disavow File?
If you're unsure as to whether or not you should invest in a disavow file, consider a few things:
- If you have a low total number of backlinks overall, say under 100, you probably don't need to worry too much.
- If you don't have a high number of quality backlinks, links you may consider disavowing could actually be giving you value.
- If SEO isn't a priority - if most of your traffic is referral or direct - don't fret too much.
- If you don't have the ability or time to constantly monitor your page rankings, and your disavow file negatively affects you, you may not catch it for weeks or months and lose business in the meantime.
- If the majority of links you're considering disavowing are tagged as 'nofollow,' don't worry about it, as disavowing has the same general result as that tag.
The disavow tool is for more advanced webmasters and SEOs. That's not to sound exclusive in any way; it's just a minor part of your overall SEO strategy and won't have as strong an impact as other things like site optimization, content creation and quality backlink building.
How to Create Your Disavow File and Automate Most of the Link Auditing
When you've got a website with thousands of backlinks (or in some cases we've seen, millions), it's no longer feasible to manually audit each one to see if it's high quality or not. So we've got to automate at least a portion of the process for our sanity, and so we can spend time on more valuable work. The problem with this, of course, is that Google does not publish any information on the value of links according to its algorithm. This is why we have to include multiple tools to automate a portion of the appraisal process. Here's the step-by-step on how to build your file with as much of it automated as possible.
Step 1 - Compiling All of Your Links
To get started, we're going to want to create a master list of all links to your website. We pull this data from all three of our sources - Google, Ahrefs, and Majestic.
For Google, you'll notice that there are two options to download your links - 'more sample links' and 'latest links.' What's the difference between the two? Exporting your latest links will give you up to 100,000 of the most recent links, while exporting more sample links provides you just that, a sample of the overall list not limited by recency. According to Google's page on the topic, "This is useful when you have many more than 100,000 pages linking to your site, because it shows some data truncated by the Latest Links export due to length limits." In my example, and for most websites, we have far fewer than the limit, so the data will be the same for each report.
For Ahrefs, exporting the file is fairly simple, but pay attention to the 'one link per domain' selection. We don't need all links, but rather just all domains to determine value - plus, we're building our disavow file at the domain level, not for each URL.
The same goes for Majestic - make sure to select to display only one backlink per domain, to download only the links we need for this job.
Step 2 - Creating the Master Domain List
On each Excel sheet you've downloaded, for now, only copy the URL column and paste it to a new sheet you create - no header needed for the row. At this point, you've got a comprehensive list of all links to your website in one master sheet. Now it's time to clean it up. First up is to strip the list to show only the domains. We do this using Excel's 'text to columns' function.
Select the entire column, then navigate to 'data' -> 'text to columns.' From there, you'll want to choose 'delimited' and click 'next.' On the next screen, uncheck 'tabs' under the 'delimiters' section, check 'other' and enter '/' as the delimiter. Check 'treat consecutive delimiters as one,' and click next. On the next screen, keep all the settings as default and then click 'finish.' What we've done here is split the URLs across different columns, with the forward slashes - the standard way to denote folder levels in URLs - indicating where to split the text. By doing this, we can strip each URL down to its domain name only - which is what we'll be using for our appraisal.
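At this point, delete all of the columns aside from the one containing the domain names. Note that if your URLs included the protocol, the leftmost column will hold 'http:' or 'https:' and the domains will sit in the second column, so keep that one instead. Then, select the domain column and click the 'remove duplicates' selection under the 'data' tab. Now we've got a spreadsheet of only the domains of all our backlinks!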
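If you'd rather script this step than click through Excel, the same strip-and-dedupe can be sketched in a few lines of Python using the standard library's URL parser. This is an alternative to the spreadsheet method above, not part of it, and the sample links are made up.

```python
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lowercase host name of a URL, or None if there isn't one."""
    url = url.strip()
    if not url:
        return None
    # urlsplit only recognizes the host when a scheme or a leading // is present.
    if "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname
    return host.lower() if host else None


def unique_domains(urls):
    """Return each domain once, in the order it first appears."""
    seen = set()
    domains = []
    for url in urls:
        domain = extract_domain(url)
        if domain and domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


# Made-up links standing in for the combined export column.
links = [
    "https://blog.example.com/post/1",
    "http://blog.example.com/post/2?ref=x",
    "spam.example.net/dir/page.html",
    "",
]
print(unique_domains(links))  # ['blog.example.com', 'spam.example.net']
```

Like the Excel approach, this keeps subdomains as they are (it won't collapse 'blog.example.com' to 'example.com'), so decide up front whether that's the level you want to disavow at.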
Step 3 - Getting the Domain Ratings
Now that you've got your master list of domains, you're going to want to automate the valuation of each domain. But, we'll have to do a bit of manual work here, with automation helping to expedite the process. Unfortunately, due to the nature of SEO and the fact that Google does not publish its algorithms or rating criteria, the tools we use are based on third-party "best guesses." But, it's generally a decent start to whittle down our list to only a handful that we'll then have to manually evaluate.
The first step is to use Ahrefs' Batch Analysis tool to get their ratings on our list of domains. Of course, they can't make it easy on us, and don't provide the ability to upload our newly created Excel spreadsheet to their system. We have to copy and paste the data into a field, with a limit of 200 at a time. If you've got thousands of domains left at this point, you may opt to skip Ahrefs altogether for now, but I don't recommend it if possible, as they still provide us valuable data.
Navigate to their Batch Analysis tool, copy your domain column (200 rows at a time) and paste that list into the page field. From there, make sure 'protocol' is set to 'http + https,' 'auto mode' (or 'subdomains' - the result will be the same) is selected under 'target mode,' and lastly select the 'live' index option. Click 'analyse' (you Europeans and your odd spellings and useful units of measurement!) and then export the results. Repeat this until you've got all your domains done. Then, you'll have to undertake the process of merging all your newly created spreadsheets.
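Counting out 200 rows at a time by hand gets old quickly. As a rough sketch, here's how the domain list could be pre-split into paste-ready chunks in Python. The 200 limit is the one mentioned above, and the domain names are placeholders.

```python
def batches(items, size=200):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder domains standing in for the master list from Step 2.
domains = [f"site{i}.example" for i in range(450)]

for number, batch in enumerate(batches(domains), start=1):
    paste_text = "\n".join(batch)  # one domain per line, ready to paste
    print(f"Batch {number}: {len(batch)} domains")
```

Each `paste_text` value is one block to paste into the field, so 450 domains become three batches of 200, 200 and 50.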
Once you've got your spreadsheet, delete every column except for 'target' (this is the domain), 'domain rating' and 'total traffic.' These are all that we'll be using to value a site. Save this file, or add another tab to your master domain spreadsheet and copy/paste the results. Next up is Majestic.
Thankfully, Majestic allows us to upload our spreadsheet of domains (get with the times, Ahrefs!), so we don't have to worry about repeating the same task over and over, creating a bajillion spreadsheets. Navigate to 'tools' -> 'bulk backlink checker,' and select the option to upload your file. From there, select 'single column' and hit the 'submit' button. It will pop up a screen letting you know how much of your crawl budget you'll be using, so as long as you're not out, click 'accept' and it'll process your file. From there you can download it, but note that Majestic exports this file with a .gz extension. This is a gzip-compressed file, which is a different format from .zip, but you can still extract it using a program like 7-Zip or WinZip.
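If you don't have an archive program handy, Python's standard library can unpack a .gz file as well. The sketch below assumes the download is a single gzip-compressed CSV. The file name and contents are stand-ins created on the fly so the example runs on its own.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip(source_path, target_path):
    """Decompress a gzip file and write the result to target_path."""
    with gzip.open(source_path, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Demo with a throwaway file standing in for the Majestic download.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bulk_backlinks.csv.gz"  # hypothetical file name
    with gzip.open(archive, "wb") as f:
        f.write(b"item,trust flow,citation flow\nspam.example.net,0,3\n")
    extracted = Path(tmp) / "bulk_backlinks.csv"
    gunzip(archive, extracted)
    print(extracted.read_bytes().decode("utf-8"))
```

For a real export, you'd call `gunzip()` with the path of the downloaded file and the name you want for the extracted CSV.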
You've got your file from Majestic now. Same as with the Ahrefs file, delete all columns except for 'item' (which is the domain), 'trust flow' and 'citation flow.' These are the metrics we'll be using for domain quality from Majestic. Copy the results into your spreadsheet or tab that contains the Ahrefs results, and make sure that the domains line up so that rows aren't mismatched. Now we're ready to compare and appraise our domains.
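Lining up rows by eye is where mismatches creep in, since the two tools won't necessarily return domains in the same order or even return the same set. One way to avoid that is to join the two exports on the domain itself. The sketch below assumes CSV exports with the column names used in this article ('target', 'domain rating', 'total traffic', 'item', 'trust flow', 'citation flow'). Your actual export headers may differ, and the sample rows are invented.

```python
import csv
import io

# Invented sample exports. In practice you would read these from the CSV files.
AHREFS_CSV = """target,domain rating,total traffic
blog.example.com,45,1200
spam.example.net,2,0
"""
MAJESTIC_CSV = """item,trust flow,citation flow
spam.example.net,0,14
blog.example.com,30,28
newsite.example.org,5,3
"""


def normalize(domain):
    """Lowercase and trim so the same domain matches across both exports."""
    return domain.strip().lower().rstrip("/")


def index_by_domain(csv_text, key_column):
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize(row[key_column]): row for row in reader}


def merge_metrics(ahrefs_csv, majestic_csv):
    """Return one row per domain, with blanks where a tool had no data."""
    ahrefs = index_by_domain(ahrefs_csv, "target")
    majestic = index_by_domain(majestic_csv, "item")
    merged = []
    for domain in sorted(set(ahrefs) | set(majestic)):
        a = ahrefs.get(domain, {})
        m = majestic.get(domain, {})
        merged.append({
            "domain": domain,
            "ahrefs_dr": a.get("domain rating", ""),
            "ahrefs_traffic": a.get("total traffic", ""),
            "majestic_tf": m.get("trust flow", ""),
            "majestic_cf": m.get("citation flow", ""),
        })
    return merged


for row in merge_metrics(AHREFS_CSV, MAJESTIC_CSV):
    print(row)
```

A domain that only one tool knows about still gets a row, with empty values for the other tool, which is a useful prompt to check it manually rather than trusting a half-empty score.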
Step 4 - Filtering Out the Good Domains
We're at the final push - getting the domains for our disavow file! At this point, we should have a spreadsheet that has the following columns: Domain, Ahrefs Domain Rating, Ahrefs Traffic, Majestic Trust Flow and Majestic Citation Flow. Now it's time to filter out the domains that meet the robots' criteria, leaving us with only suspicious domains with low scores that indicate they may be the source of our low-quality links.
In Excel, select the 'data' tab and then press 'filter.' This will add a small dropdown arrow to the top row of each column on our spreadsheet. From here, we can select to filter each column based on our criteria - in this case, a numerical value. In my example, I filter domains with an 'Ahrefs DR' less than 20 and a 'Majestic TF' of less than 10. This gives me a general list of the domains I'd consider low quality.
However, and this is important, those numbers are just general estimates. I highly suggest you play with filtering each column based on criteria applicable to your own website and its backlink profile. I added the other metrics like traffic and citation flow to give me another column that will alert me that a site might have value (sometimes these numbers are high when the others aren't, which should trigger a manual look).
For instance, we're a web design firm, and as such, often put a footer link back to our website on sites we build. Brand new websites for young or new businesses won't have much history or value to them. In my example above, I immediately recognized several websites that we built showing up under my criteria for poor quality, so I had to continue to filter and even manually remove some domains for my final disavow file.
If you want to play it completely safe, but still improve Google's opinion of your link profile, you can enter '0' (zero) as the value for both filters, and you'll still get a good number of spammy domains to disavow. Find your happy levels that work for you, and make sure to manually take a look at those domains you're not sure about. Sometimes you'll find new, legitimate blogs that reference you which you want to keep in your link profile. Sometimes you'll find low quality directory sites that just happen to have really good PBNs powering them, allowing them to pass your filters. So always take a look at that middle area between "obviously good" and "obviously poor."
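The filtering logic in this step can also be written down as a small rule set, which makes it easier to tweak thresholds and rerun. The sketch below uses the DR and TF limits from my example. The 'review' thresholds for traffic and citation flow, the allowlist and the sample rows are hypothetical values you'd replace with your own.

```python
DR_LIMIT = 20          # Ahrefs DR below this counts as low (from the example above)
TF_LIMIT = 10          # Majestic TF below this counts as low (from the example above)
TRAFFIC_REVIEW = 100   # hypothetical: traffic at or above this triggers a manual look
CF_REVIEW = 20         # hypothetical: citation flow at or above this triggers a manual look


def classify(row, allowlist=frozenset()):
    """Return 'keep', 'review' or 'disavow' for one domain's metrics."""
    if row["domain"] in allowlist:
        return "keep"  # e.g. client sites you built yourself
    if not (row["dr"] < DR_LIMIT and row["tf"] < TF_LIMIT):
        return "keep"
    # Low scores, but another metric suggests the site may still have value.
    if row["traffic"] >= TRAFFIC_REVIEW or row["cf"] >= CF_REVIEW:
        return "review"
    return "disavow"


# Invented rows for illustration only.
rows = [
    {"domain": "blog.example.com", "dr": 45, "tf": 30, "traffic": 1200, "cf": 28},
    {"domain": "spam.example.net", "dr": 2, "tf": 0, "traffic": 0, "cf": 3},
    {"domain": "odd.example.org", "dr": 5, "tf": 4, "traffic": 0, "cf": 35},
    {"domain": "client.example.com", "dr": 1, "tf": 2, "traffic": 10, "cf": 1},
]
my_sites = frozenset({"client.example.com"})

for row in rows:
    print(row["domain"], "->", classify(row, my_sites))
```

Only the 'disavow' bucket should go straight into the file. The 'review' bucket is the middle area that still needs a human look.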
Step 5 - Creating the Disavow File
Now that you've got your list of domains you want to disavow, it's time to format them properly for Google. This part is pretty easy, thanks to Excel's concatenate function. The format for our final disavow file has each row shown as "domain:somethinghere.com" - so we just need to quickly format our spreadsheet to look correct.
Copy your filtered list of bad quality domains to a new spreadsheet or tab. From there, insert a new column in front of the domains and enter "domain:" as the value in each cell. In the third column, enter the formula "=concatenate(A1,B1)" in the first cell. That will create a function to merge the previous two columns. Copy that function down the entire column, and you'll have your values for your disavow file. Copy that column into a new text file or spreadsheet and save it as a text file, and you should end up with a text file that has one "domain:somethinghere.com" entry per line.
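For anyone scripting the earlier steps, the same formatting takes only a few lines of Python. This is a sketch with placeholder domains. The optional comment line relies on Google's disavow format treating lines that start with "#" as comments.

```python
def build_disavow_text(domains, note=None):
    """Format domains as disavow entries, one 'domain:' line each."""
    lines = []
    if note:
        lines.append("# " + note)  # comment line, ignored by the disavow tool
    for domain in domains:
        lines.append("domain:" + domain)
    return "\n".join(lines) + "\n"


# Placeholder domains standing in for the filtered list from Step 4.
text = build_disavow_text(
    ["spam.example.net", "scraper.example.org"],
    note="Low-quality domains from link audit",
)
print(text)
```

Save the result as a plain .txt file in UTF-8 (for example with `Path("disavow.txt").write_text(text, encoding="utf-8")`) before uploading it.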
That's it, you're done! Upload the file according to the instructions mentioned earlier in the post. You should get verification from Google's system with the number of domains you've disavowed, which should match with your spreadsheet. Now we sit back, wait, and monitor.
How to Monitor and Gauge the Results of Your Disavowed Links
The ultimate goal of disavowing links is to improve your backlink profile, thus improving your website's overall SEO. So, the principles of monitoring the effect of your disavow file are the same as monitoring any SEO work - wait, be patient, and track major changes in keyword ranking and website traffic.
We use the aforementioned Agency Analytics to track our targeted keywords long-term, and Google Analytics to track web traffic. This is good enough to get a sense as to how your efforts are working. Mark the day that you submitted your file, and monitor changes from that date on. If you see a gradual (or even dramatic) increase in your website's presence in the SERPs, congratulations, it worked! But don't be disappointed if you don't see any change - disavowing spammy links won't make a difference for most websites, as Google has likely given no positive or negative value to those links in the first place.
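If your tracking tool lets you export daily numbers, a before-and-after comparison around the submission date is simple to script. The sketch below assumes a list of (date, value) pairs such as daily organic clicks. The figures are made up purely to show the shape of the input and are not results from any real site.

```python
from datetime import date


def before_after(daily_values, submitted_on):
    """Compare the average daily value before and after the submission date."""
    before = [value for day, value in daily_values if day < submitted_on]
    after = [value for day, value in daily_values if day >= submitted_on]
    if not before or not after:
        return None  # not enough data on one side of the date
    avg_before = sum(before) / len(before)
    avg_after = sum(after) / len(after)
    change = (avg_after - avg_before) * 100 / avg_before if avg_before else None
    return {"before": avg_before, "after": avg_after, "change_pct": change}


# Made-up daily organic clicks for illustration only.
clicks = [
    (date(2024, 3, 1), 100), (date(2024, 3, 2), 110), (date(2024, 3, 3), 90),
    (date(2024, 3, 4), 120), (date(2024, 3, 5), 130), (date(2024, 3, 6), 110),
]
print(before_after(clicks, submitted_on=date(2024, 3, 4)))
```

A real comparison should span several weeks on each side of the date, and an average like this can't separate the disavow file's effect from seasonality or other SEO work, so treat it as a rough signal only.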
However, if you see a noticeable drop-off in your ranks and traffic, you may have to reevaluate your disavow file and undo it. If you got too aggressive in your setup process and disavowed links that were actually giving your website value, or if you didn't have any other quality links to fall back on, you can see a negative result.
The general rule of thumb for how long it takes to see results from submitting your disavow file is a few weeks. It takes a few days to process, and a longer time for Google to crawl and assign new value to your URLs. So, be patient. If you've got a very popular site with a lot of traffic, you'll likely see results sooner since Google is crawling your site more, but if you've got a newer website or one that isn't highly visited, it may take much longer to see any results.
Use the disavow feature to clean up large amounts of webspam pointing to your website, and automate the process as much as you can, but don't forget to be smart and safe with your approach.
On the Process of Evaluating a Backlink
When it comes to determining whether or not a backlink is good and worth keeping associated with your website, there isn't a clear-cut rule or formula we can apply. Since Google doesn't provide their criteria to the public, and the nature of the web is such that new websites of all shapes and sizes are being created every day, we have to apply our critical thinking to evaluating links. Here are a few rules I follow:
Get a General Idea with Third-Party Tools
As we show in this guide, we use Ahrefs and Majestic to view quality scores for URLs, which gives us a general idea of site value. However, remember that these quality scores are based primarily on the number of backlinks a site has pointing to it. You may very well have a backlink from a website that's valid, but doesn't have a lot of backlinks because it's new or not being marketed yet. A low quality score doesn't mean its link to you is unnatural or without value.
Look at the URL Structure
A quick and dirty way to tell the quality of a link is by looking at the URL and its structure. If you see that it's one of the billion websites created that show results based on keywords or spun content, you'll see a lot of query strings or page-level type URLs. And, if the TLD (the .com or .org portion of a domain name) is from a country you don't operate in, it may be fishy.
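These two checks are easy to turn into a rough first-pass flag. The Python sketch below marks URLs that carry a query string or use a TLD outside a set you expect. The expected-TLD set and the sample URLs are hypothetical, and a flag here means "take a closer look," not "disavow."

```python
from urllib.parse import urlsplit

# Hypothetical: the TLDs you'd normally expect links from for your market.
EXPECTED_TLDS = {"com", "org", "net"}


def url_flags(url):
    """Return a list of reasons a linking URL deserves a closer look."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    flags = []
    if parts.query:
        flags.append("query string")
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    if tld and tld not in EXPECTED_TLDS:
        flags.append("unexpected TLD ." + tld)
    return flags


# Made-up URLs for illustration.
print(url_flags("http://results.example.xyz/search?q=web+design"))
print(url_flags("https://blog.example.com/post"))
```

The first URL comes back with both flags and the second with none, which mirrors the manual check described above.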
Look at the Linking Page's Title
Ahrefs helps a lot here, as they show the title of the page linking back to you. On spammy spun-content websites, the title will often be nothing more than keyword-stuffed gibberish. That's a clear sign of a site you want to disavow.
Look at the Anchor Text
A dead giveaway for a low-quality link is the anchor text - the text used to link to you. You can see how you're linked contextually, and often it won't make any sense. In Ahrefs, you can see a snippet of the sentence you're linked in, and if it's nonsense, you likely want to disavow that page.
Actually Look at the Page Linking to You
If all else fails, take a look at the website that's linking to you. Sometimes you'll be surprised at where you're getting a link and want to keep it, but most of the time you'll realize that the site looks obviously spun and templated or just low-quality altogether. Make sure to have an ad blocker running in your browser, and decent antivirus software, if you're going to go digging into questionable websites.