How to Search Through the Source Code of the Entire Website

Ahrefs Site Audit, also available as part of the free Ahrefs Webmaster Tools, allows you to search through the raw HTML code or the JS-rendered code across all crawled pages of the website.

This feature is particularly useful when you need to verify analytics tags, identify pages that call certain scripts or stylesheets, detect unwanted injections into the page code, or research competitors’ technologies.

It is important to understand that in the era of JavaScript-powered websites, the page code can exist in two forms:

Raw (Source): the HTML code before any JavaScript on the page has been executed. This is what you see using the “View Page Source” feature in the browser.

Rendered: the final HTML code after being altered/generated by JavaScript. It is visible in the “Inspect” mode in the browser.

The source and rendered versions can be significantly different, so it’s important to ensure you’re searching through the correct version of the page code.
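To illustrate the difference, here is a minimal Python sketch. The two HTML strings, the script path, and the contains helper are all hypothetical, made up for this example. The sketch uses a plain case-sensitive substring check and does not claim to reproduce how Site Audit matches text.

```python
# Hypothetical page: the raw HTML ships an empty container
# that a script is expected to fill in later.
RAW_HTML = """<html><body>
<div id="pricing"></div>
<script src="/static/pricing.js"></script>
</body></html>"""

# What the same page might look like after the (hypothetical) script has run.
RENDERED_HTML = """<html><body>
<div id="pricing"><table><tr><td>Basic</td></tr></table></div>
<script src="/static/pricing.js"></script>
</body></html>"""


def contains(code: str, snippet: str) -> bool:
    """Plain case-sensitive substring check (an assumption of this sketch)."""
    return snippet in code


SNIPPET = "<table"
found_in_raw = contains(RAW_HTML, SNIPPET)            # table is not in the raw HTML
found_in_rendered = contains(RENDERED_HTML, SNIPPET)  # table appears after rendering

print("raw:", found_in_raw, "rendered:", found_in_rendered)
```

The same search gives different answers depending on which version of the code is searched, which is why the crawl settings matter.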

How to search through the rendered code of the pages

If you need to search through the JS-rendered HTML code of all the pages on the website, run a crawl in Site Audit or Ahrefs Webmaster Tools. Ensure that the “Execute JavaScript” option is activated in the crawl settings.

Once the crawl is complete, go to the Page Explorer and open the Advanced filter. Select “Page source” followed by “Contains” from the dropdown menu. Then, enter the specific piece of code you are searching for.

The example above finds all pages on our blog that contain an embedded table.
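The idea behind a “Contains” search can be sketched offline in Python, for instance to spot-check a handful of pages you have saved yourself. Everything here is an assumption made for illustration: the PAGES dictionary, the example.com URLs, the analytics.js file name, and both helper functions. It is a plain substring match, not a description of how Site Audit works internally.

```python
# Hypothetical mapping of URL -> page code (raw or rendered, whichever you saved).
PAGES = {
    "https://example.com/": (
        '<html><head><script src="/js/analytics.js"></script></head>'
        "<body><h1>Home</h1></body></html>"
    ),
    "https://example.com/blog/": (
        "<html><head></head>"
        "<body><table><tr><td>1</td></tr></table></body></html>"
    ),
    "https://example.com/pricing/": (
        '<html><head><script src="/js/analytics.js"></script></head>'
        "<body><table><tr><td>Basic</td></tr></table></body></html>"
    ),
}


def pages_containing(pages: dict, snippet: str) -> list:
    """Return the URLs whose code contains the snippet, sorted for stable output."""
    return sorted(url for url, code in pages.items() if snippet in code)


def pages_missing(pages: dict, snippet: str) -> list:
    """Return the URLs whose code does NOT contain the snippet."""
    return sorted(url for url, code in pages.items() if snippet not in code)


with_table = pages_containing(PAGES, "<table")
without_analytics = pages_missing(PAGES, "analytics.js")

print("Pages with a table:", with_table)
print("Pages missing the analytics script:", without_analytics)
```

The first call mirrors the embedded-table search, and the second shows the inverse question that comes up when verifying analytics tags: which pages lack the snippet.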

How to search through the raw HTML of the pages

Searching through the raw HTML (also called source HTML) requires a few extra actions:

1. Disable JavaScript rendering in the crawl settings.

2. Ensure discoverability of all pages by the crawler.

This is crucial for websites where page content (including the internal links) is generated via JavaScript, because the AhrefsSiteAudit bot may not discover all pages from the raw HTML code alone.

That’s why you need to supply the Site Audit tool with a list of input URLs that we call “Seeds.”

The easiest way to do that is to make sure that the Sitemaps are used in the “URL Sources.” If that’s not feasible, use the Custom URL list.
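If you need to build a custom URL list by hand, one option is to extract the URLs from an existing sitemap file. The following is a minimal sketch using only the Python standard library. The sample sitemap, the example.com URLs, and the seed_urls helper are hypothetical, and the one-URL-per-line output is an assumption about a convenient format rather than a documented requirement of Site Audit.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; in practice you would read this from a file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/blog/ </loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""


def seed_urls(sitemap_xml: str) -> list:
    """Extract unique <loc> values from a sitemap, keeping first-seen order."""
    root = ET.fromstring(sitemap_xml)
    seen = set()
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()  # tolerate stray whitespace
        if url and url not in seen:     # skip empty entries and duplicates
            seen.add(url)
            urls.append(url)
    return urls


seeds = seed_urls(SAMPLE_SITEMAP)
print("\n".join(seeds))  # one URL per line (assumed list format)
```

This sketch handles a plain urlset only. A sitemap index file, which points to other sitemaps, would need an extra step to collect the URLs from each child sitemap.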

When the crawl is finished, go to the Page Explorer and use the Advanced filter (“Page source” followed by “Contains”) to search through the source code of all crawled pages.
