Create a content inventory with Screaming Frog

Business Benefits

Catalog all the URLs on your site that you may want to optimize for users and search engines.

Remove unnecessary items from your crawl.

Under Configuration > Spider, uncheck the Crawl boxes for Images, CSS, JavaScript, SWF, and External Links.

Enter your homepage URL and start the crawl.

Crawling can take anywhere from a few seconds to a couple of hours, depending on the size of your site. If your site has more than 20,000 pages, Screaming Frog may crash, so either save your progress periodically or consider a different tool.

Monitor the crawl to identify any “crawl traps.”

If the crawler gets stuck in a subdirectory (for example, /wp-content/) with thousands of irrelevant pages, pause and clear the crawl, add the subdirectory to the Exclude filter, and start the crawl again. You may need to repeat this step.
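Before restarting a long crawl, it can help to check an exclude pattern against a few sample URLs. The sketch below assumes the Exclude filter takes regular expressions that must match the whole URL (check the Screaming Frog documentation for your version). The domain example.com, the pattern list, and the helper name is_excluded are hypothetical, and Python's regex engine only approximates the one the tool uses.

```python
import re

# Hypothetical exclude patterns; replace example.com with your own domain.
EXCLUDE_PATTERNS = [
    r"https://www\.example\.com/wp-content/.*",
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if any pattern matches the entire URL."""
    return any(re.fullmatch(pattern, url) for pattern in patterns)


# Sample URLs to sanity-check the patterns before re-running the crawl.
sample_urls = [
    "https://www.example.com/wp-content/uploads/2023/01/report.pdf",
    "https://www.example.com/blog/content-inventory/",
]

for url in sample_urls:
    status = "excluded" if is_excluded(url) else "crawled"
    print(f"{url} -> {status}")
```

If a URL you expected to drop still shows as crawled, the pattern probably does not cover the full URL (protocol, subdomain, or trailing path).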

Once the crawl has completed, export it as a CSV file and open the file in Google Sheets or Excel.

Using the Content column, delete all rows that are not HTML.

Using the Status Code column, delete all rows with a status code other than 200.

Delete all columns except for Address, Title 1, and Meta Description 1.

Optionally, also keep H1 and Word Count if you want to review those elements.

Sort the spreadsheet by the Address column.
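For large exports, the same cleanup can be scripted instead of done by hand. The following is a minimal sketch using Python's standard csv module. It assumes the export headers are exactly Address, Content, Status Code, Title 1, and Meta Description 1, as named in the steps above; header names can differ between Screaming Frog versions, so check your file first. The function names build_inventory and convert are hypothetical.

```python
import csv

# Columns to keep, as named in the steps above. Extend this list
# (for example with your export's H1 and word count headers) if needed.
KEEP_COLUMNS = ["Address", "Title 1", "Meta Description 1"]


def build_inventory(rows):
    """Keep HTML rows with a 200 status, trim columns, sort by Address."""
    kept = [
        {col: row.get(col, "") for col in KEEP_COLUMNS}
        for row in rows
        if "html" in row.get("Content", "").lower()
        and row.get("Status Code", "").strip() == "200"
    ]
    return sorted(kept, key=lambda r: r["Address"])


def convert(in_path, out_path):
    """Read a crawl export and write the trimmed inventory as CSV."""
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(in_path, newline="", encoding="utf-8-sig") as src:
        rows = list(csv.DictReader(src))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=KEEP_COLUMNS)
        writer.writeheader()
        writer.writerows(build_inventory(rows))
```

Calling convert("crawl_export.csv", "inventory.csv") with your own file names would produce a sheet equivalent to the manual steps; the file names here are placeholders.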

Remove any pages that you don’t expect to optimize.

Examples include:

  • Privacy Policy
  • Pages with UTM parameters
  • Tag pages
  • Author pages
  • Pagination URLs

Your content inventory should include only pages that marketers will actively try to improve.
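If the inventory is long, a small filter can flag the page types listed above for removal. This is a rough sketch only: the path patterns (/privacy-policy/, /tag/, /author/, /page/N/) are assumptions based on common WordPress-style URLs, and should_skip is a hypothetical helper name. Adjust the patterns to match how your own site structures these pages, and review the flagged URLs before deleting anything.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumed URL shapes for pages marketers rarely optimize.
SKIP_PATH_PATTERNS = [
    r"/privacy-policy/?$",  # privacy policy
    r"/tag/",               # tag pages
    r"/author/",            # author pages
    r"/page/\d+/?$",        # path-based pagination, e.g. /blog/page/2/
]


def should_skip(url):
    """Return True if the URL looks like a page to drop from the inventory."""
    parts = urlsplit(url)
    # Any query parameter starting with utm_ marks a tracking URL.
    query_keys = [key for key, _ in parse_qsl(parts.query, keep_blank_values=True)]
    if any(key.lower().startswith("utm_") for key in query_keys):
        return True
    return any(re.search(pattern, parts.path) for pattern in SKIP_PATH_PATTERNS)


urls = [
    "https://www.example.com/blog/content-inventory/",
    "https://www.example.com/blog/?utm_source=newsletter",
    "https://www.example.com/tag/seo/",
]
inventory = [url for url in urls if not should_skip(url)]
print(inventory)
```

Query-string pagination (for example ?page=2) is not covered by these patterns and would need its own rule.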

Last edited by @hesh_fekry 2023-11-14T16:25:18Z