Screaming Frog Explained
If you're not familiar with the world of internet marketing and someone told you that you should use Screaming Frog, what would you think? You'd probably call them crazy. As it turns out, though, Screaming Frog is an excellent tool that far too few people actually use to its full extent.
What is Screaming Frog?
Now, when people talk about Screaming Frog, they aren’t talking about the company and their web marketing services. You’re certainly free to contract them for any and all SEO work you want done, but that’s not what I’m here for. I’m here to teach you how to use the free tool they provide, the Screaming Frog SEO Spider Tool.
A spider is a piece of software that crawls around on a website, harvesting data and presenting it to the owner of the spider. Google has a fleet of these things it uses to index the Internet as completely as possible, for use in the search results. Other search engines – everything from Yahoo and Bing to oddballs like Million Short – either use search spiders of their own or pull index data from other entities that use spiders.
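To make that concrete, here is a minimal, purely illustrative sketch of what a spider does: fetch a page, record a few data points about it, pull out its links, and repeat. This is not how Screaming Frog or Googlebot is actually implemented, and the starting URL is a placeholder.

```python
# A bare-bones illustration of what a crawler/spider does (not Screaming Frog's code).
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(start_url, max_pages=50):
    """Fetch pages, record basic data, and follow internal links."""
    domain = urlparse(start_url).netloc
    seen, queue, results = set(), deque([start_url]), []

    while queue and len(results) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)

        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")

        # Harvest a few of the same data points an SEO spider reports.
        results.append({
            "url": url,
            "status": response.status_code,
            "title": soup.title.string.strip() if soup.title and soup.title.string else "",
        })

        # Queue internal links for later crawling.
        for link in soup.find_all("a", href=True):
            absolute = urljoin(url, link["href"])
            if urlparse(absolute).netloc == domain:
                queue.append(absolute)

    return results


# Example (placeholder URL): crawl("https://www.example.com")
```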
This particular spider is a desktop application you can download and run from your local PC, regardless of platform. It fetches SEO data, including URL, meta data, Schema categories, and more.
The primary benefit of Screaming Frog’s Spider is the ability to search for and filter various SEO issues. You don’t have to have a deep knowledge of SEO to figure out what is and isn’t done properly; the tool will help filter it for you. It can find bad redirects, meta refreshes, duplicate pages, missing meta data, and a whole lot more.
The tool is extremely robust. The data it collects includes server and link errors, redirects, URLs blocked by robots.txt, external and internal links and their status, the security status of links, URL issues, issues with page titles, meta data, page response time, page word count, canonicalization, link anchor text, image URLs, sizes, and alt text, and a heck of a lot more.
Essentially, when I talk about doing a site audit or a content audit, everything I recommend you harvest can be harvested with Screaming Frog, and a whole lot more. Plus, since the tool is made to be SEO-friendly, it follows Google's AJAX crawling scheme.
Now, the basic tool is the Lite version, which you can download and use for free. However, it limits you in several notable ways. Primarily, you can only crawl 500 URLs with it, and you lack access to some custom options, Google Analytics integration, and a handful of other features.
I highly recommend, if you have a medium or large-sized site with over 500 URLs you would want to crawl, that you buy the full license. It's an annual fee of 99 British Pounds, which works out as of this writing to be about $140. That's under $12 per month, which most businesses can easily afford, and it's well worth the price.
By default, Screaming Frog obeys the same directives as Googlebot, including your robots.txt rules and noindex/nofollow tags. However, if you want, you can give it unique directives using its own user agent, "Screaming Frog SEO Spider". This allows you to control it more directly, and potentially give it more access than Google gets. You can read more about how to do that at the bottom of their download page.
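For example, a robots.txt block aimed at that user agent might look roughly like this; the paths here are placeholders, so adapt the rules to your own site.

```
# Hypothetical rules just for Screaming Frog's spider (placeholder paths)
User-agent: Screaming Frog SEO Spider
Allow: /staging/
Disallow: /tmp/

# Everything else, including Googlebot, follows the general rules
User-agent: *
Disallow: /staging/
Disallow: /tmp/
```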
The First-Timer’s Guide to Screaming Frog
Regardless of the size of your site, unless you’re 100% certain you’ve done everything right and you haven’t made a mistake – you’re wrong if you believe that, by the way – the first thing you want to do is complete a total site crawl.
I’m going to be assuming you’re using the full version of Screaming Frog to make sure you haven’t missed anything.
Again, it’s super cheap, just buy the license.
- Click the Configuration menu and select Spider.
- In the menu that appears, check "crawl all subdomains." You can also crawl CSS, JavaScript, images, SWF, and external links to get a complete view of your site, or leave those unchecked if you want a faster crawl of just page and text elements, with no media or scripts.
- Initiate the crawl and wait for it to complete. It will be faster the less you have checked in the configuration menu. It is also limited by what processing power and memory you have allocated to the program. The more powerful your computer, the faster it will crawl.
- Click the Internal tab and filter your results by HTML. Click to export. You will be given a CSV file of all the data crawled, sorted by individual HTML page. You can then use this to identify issues on a given page and fix them quickly and easily (a quick way to slice that export is sketched below).
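If you prefer to work on that CSV outside of a spreadsheet, here is a rough sketch of the kind of quick checks you can run on it. The filename and column names ("Address", "Status Code", "Title 1") are assumptions based on a typical export and may differ between versions.

```python
# A quick pass over the exported Internal > HTML CSV to see the crawl at a glance.
# The filename and column names are assumptions; adjust them to match your export.
import csv
from collections import Counter

with open("internal_html.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# How many pages returned each status code?
status_counts = Counter(row.get("Status Code", "unknown") for row in rows)
for status, count in status_counts.most_common():
    print(f"{status}: {count} pages")

# Which pages came back without a title tag at all?
missing_titles = [row.get("Address", "") for row in rows if not row.get("Title 1", "").strip()]
print(f"\n{len(missing_titles)} pages with no title")
```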
If you find that Screaming Frog crashes when crawling a large site, you are probably running into memory limits. The spider will use all the memory available to it, and sometimes it needs more than your computer can provide. To keep it from crashing, go back to that spider configuration menu and, under Advanced, check "pause on high memory usage." This pauses the spider when it starts consuming more resources than your machine can handle.
If you find that your crawl is timing out, your server may not be able to handle requests as fast as the spider sends them. To rate-limit the crawl, go to the Speed submenu in the configuration menu and set a limit on the number of requests it can make per second.
If you want to use proxies with your crawling – for competitive research or to avoid being blocked as a bot – click Configuration and then Proxy. From this menu, you can set up whatever proxy configuration you like. Screaming Frog supports pretty much any kind of proxy, though you will want to make sure it's fast and responsive, otherwise your crawl will probably take forever.
Performing a Link Audit with Screaming Frog
Links are difficult to audit because they can be difficult to harvest. How many links are on a typical page? Couple that with all of your parameters and you have a lot of information to gather. Here's how to do it with the spider.
- In the spider configuration menu, check all subdomains but uncheck CSS, images, JavaScript, Flash, and any other options you don’t need. Decide if you want to crawl nofollowed links and check the boxes accordingly.
- Initiate the crawl and let it run until it’s finished.
- Click the Advanced Report menu and click “All Links” to generate and export a CSV of all of the links it crawls, including their locations, their destinations, their anchor text, their directives, and other data.
From here you can export the data, or you can sort it as much as you like. Here are some sorts and actions you can perform.
- Click the internal tab and sort by outlinks. This will show you the pages with the most links on your site. Pages with over 100 links are generally suspect according to Google, so you may want to audit those pages to determine why they have so many links, and how you can minimize them.
- Click the internal tab and sort by status code. Any links that show a 404 status code are broken; you will want to fix those. Links that report a 301 or other redirect may be pointing at homepages or at harmful pages; check them and determine whether they should be removed. You can also generate specific reports for different types of status codes – 3XX, 4XX, or 5XX for redirects, client errors, or server errors, respectively – under the Advanced Report drop-down. (A quick way to run the outlink and status-code checks against the export is sketched after this list.)
- This guide shows you how to use Majestic and Screaming Frog together to find internal linking opportunities.
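Here is the sketch mentioned above: both checks run against the exported links CSV rather than inside the tool. The filename and column names ("Source", "Destination", "Status Code") are assumptions about a typical export, so adjust them to whatever your version produces.

```python
# Flag pages with more than 100 outgoing links and list broken (404) links,
# working from an exported links CSV. Filename and column names are assumptions.
import csv
from collections import Counter

with open("all_links.csv", newline="", encoding="utf-8") as f:
    links = list(csv.DictReader(f))

# Pages carrying more than 100 outgoing links deserve a closer look.
outlink_counts = Counter(link.get("Source", "") for link in links)
for page, count in outlink_counts.most_common():
    if count > 100:
        print(f"{page} has {count} outgoing links")

# Links resolving to a 404 are broken and should be fixed or removed.
for link in links:
    if link.get("Status Code", "").strip() == "404":
        print(f'Broken link on {link.get("Source")} -> {link.get("Destination")}')
```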
Performing a Content Audit with Screaming Frog
Content audits are hugely important, because many of the most important search ranking factors today are content-based. Site speed, HTTPS, mobile friendliness, Schema.org markup; these are all important, but they aren't as important as having high quality content, good images, and a lack of duplication.
- Perform a full site crawl, including CSS, Images, scripts, and all the rest. You want as much data as possible.
- In the internal tab, filter by HTML, then scroll over to the word count column and sort it low to high. Pages with anything under 500-1,000 words are likely to be thin content; determine whether they should be improved, noindexed, or removed entirely. Note: this will require some interpretation for e-commerce sites, particularly pages with minimal but valuable product information. (A quick way to pull thin pages and duplicate titles out of the export is sketched after this list.)
- In the images tab, filter by "missing alt text" to find images that are holding back your site because they have no alt text. You can also filter by "alt text over 100 characters" to find images with excessive alt text, which is generally detrimental to the user experience and to your search ranking.
- In the page titles tab, filter for titles over 70 characters. Google doesn't display much more than that, so extra-lengthy titles aren't doing you any favors. Truncate or rewrite them to cut the excess.
- In the same page titles tab, filter by duplicate to find pages that have duplicate meta titles. Duplicate titles indicate duplicate content, which can trigger a Panda penalty and hurt your search ranking significantly. If the pages are unique, change their titles to reflect their content. If the pages are duplicates, remove one and redirect its URL to the other, or canonicalize the content if necessary.
- In the URL tab, filter by duplicate to find similar duplication issues that need canonicalization to fix.
- In the meta description tab, filter by duplicate to find lazy duplicated meta descriptions on unique pages, or duplicate pages that have had their titles changed to make them appear more unique. Fix these issues ASAP; they are hurting your site.
- In the URL tab, filter by various options to determine pages that have non-standard or non-human-readable URLs that could be changed. This is particularly important for pages with non-ASCII characters or excessive underscores in the URL.
- In the directives tab, filter by any directive you want to identify pages or links that have directives attached to them. Directives include index/noindex, follow/nofollow, and several other directives in much less common use. This can also be used to determine where canonicalization is already implemented.
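As promised above, here is a rough sketch of the word-count and duplicate-title checks run against the Internal > HTML export. Again, the filename and column names ("Address", "Word Count", "Title 1") are assumptions based on a typical export and may need adjusting.

```python
# Group the Internal > HTML export into thin pages and duplicated titles.
# Filename and column names are assumptions; adjust them to your export.
import csv
from collections import defaultdict

with open("internal_html.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Pages under the thin-content threshold discussed above.
thin_pages = [
    row.get("Address", "")
    for row in rows
    if row.get("Word Count", "").isdigit() and int(row["Word Count"]) < 500
]
print(f"{len(thin_pages)} pages under 500 words")

# Titles shared by more than one page usually point at duplicate content.
pages_by_title = defaultdict(list)
for row in rows:
    title = row.get("Title 1", "").strip()
    if title:
        pages_by_title[title].append(row.get("Address", ""))

for title, pages in pages_by_title.items():
    if len(pages) > 1:
        print(f'"{title}" is used on {len(pages)} pages: {pages}')
```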
Creating an XML Sitemap
Sitemaps are incredibly helpful for Google, as they let the search engine know where all of your pages are and when they were last updated. You can generate one in a number of different ways, but Screaming Frog has its own method if you want to use it. All you need to do is crawl your site completely, including all subdomains. Then click the "Advanced Export" menu and choose the bottom option, XML Sitemap. This saves your sitemap as a file you can open and edit in Excel. Open it, select read-only, and choose "open as an XML table," ignoring any warnings that pop up. In table form, you can edit your sitemap easily and save it back out as an XML file. When that's done, you can submit it to Google.
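For reference, the exported file follows the standard sitemap protocol, so each page ends up as an entry roughly like the one below. The URL and date are placeholders, and the exact fields included depend on your export settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/sample-page/</loc>
    <lastmod>2016-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```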
If you are finding that certain sections of your site are not being indexed, you may have an issue with robots.txt blocking those subfolders, or with noindex directives on those pages. Additionally, if a page has no internal links pointing to it, the spider has no way to discover it. Make sure any page you know exists but that doesn't show up in the crawl has an internal link pointing to it.