Google quietly updated their Google Search Central documentation to note that they are now indexing .csv files.
This opens up a new way for content to get crawled. Publishers who don't want their .csv files crawled may need to update robots.txt to exclude those files.
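For publishers weighing the robots.txt option, here is a minimal sketch of how a rule such as `Disallow: /*.csv$` matches URL paths. Google documents support for `*` (any sequence of characters) and `$` (end of URL) in robots.txt path patterns. Python's standard `urllib.robotparser` only does prefix matching and does not honor these wildcards, so this sketch translates the pattern into a regular expression instead. The rule set and helper names are hypothetical, and the sketch deliberately ignores `User-agent` grouping and `Allow` lines.

```python
import re

# Hypothetical robots.txt rules a publisher might add to keep Googlebot
# away from CSV files. "*" matches any sequence of characters and
# "$" anchors the pattern to the end of the URL.
ROBOTS_RULES = """\
User-agent: Googlebot
Disallow: /*.csv$
"""


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped "*" into ".*".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def disallowed_patterns(rules: str) -> list[re.Pattern]:
    """Collect Disallow patterns (this sketch ignores User-agent and Allow)."""
    patterns = []
    for line in rules.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(pattern_to_regex(value.strip()))
    return patterns


def is_blocked(path: str, rules: str = ROBOTS_RULES) -> bool:
    """Return True if any Disallow pattern matches the URL path."""
    return any(p.match(path) for p in disallowed_patterns(rules))


for path in ["/data/report.csv", "/data/report.html", "/data/report.csv?download=1"]:
    print(path, is_blocked(path))
```

Because of the trailing `$`, the rule skips URLs that add a query string after `.csv`. Dropping the `$` would catch those too, at the cost of matching any path that merely contains `.csv`.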
Comma-separated values (CSV) files are text files that save data in a tabular format that can be displayed as a spreadsheet.
CSV files contain data in plain text, which means they do not contain style elements like fonts, nor do they contain images or active links.
They are useful for doing things like uploading a list of URLs for crawling to software like Screaming Frog.
But they are also useful for organizing data in a spreadsheet.
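To make the plain-text, tabular nature of the format concrete, here is a minimal sketch that reads a small CSV of URLs, the kind of list that might be uploaded to a crawling tool, using Python's standard `csv` module. The sample data and the `read_rows` helper are hypothetical.

```python
import csv
import io

# Hypothetical CSV content: one record per line, fields separated by
# commas, first row used as the header. The URLs are just text here,
# not active links, and there is no styling of any kind.
RAW_CSV = """url,status
https://example.com/,200
https://example.com/old-page,301
"""


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(RAW_CSV)
for row in rows:
    print(row["url"], row["status"])
```

Every value comes back as a string, which reflects the point above: a CSV file carries only the data, and any typing or formatting is left to whatever spreadsheet or tool opens it.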
Google's ability to index CSV files appears to be new functionality, because a "filetype" search on Google for CSV files does not currently return CSV files.
Searches like the following currently do not return CSV files:
filetype:csv site:.gov
filetype:csv site:.edu
filetype:csv site:.com
Something curious about the indexing of CSV files by Google is that Google’s Dataset search appearance already used CSV files but apparently only when described with structured data.
The Dataset structured data page in Google's old Developers documentation (viewable on Archive.org) states that CSV files are an acceptable format for appearing in dataset search features.
The use of tabular data as a search appearance goes back to 2018, when Google announced that they would show that kind of data in search when the data is accompanied by structured data.
According to the original documentation:
“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats are provided as structured data…
Here are some examples of what can qualify as a dataset:
A table or a CSV file with some data
An organized collection of tables
A file in a proprietary format that contains data
A collection of files that together constitute some meaningful dataset
A structured object with data in some other format that you might want to load into a special tool for processing
Images capturing data
Files relating to machine learning, such as trained parameters or neural network structure definitions
Anything that looks like a dataset to you”
Google updated the above documentation in 2022 and redirected it to the new Search Central Documentation.
The updated documentation makes it clearer that Google relies on the structured data to use CSV files in their dataset search appearance.
But will this change mean that Google will eventually crawl CSV files and use those for search appearances (in addition to tabular data notated in structured data)?
This is what the current documentation explains today:
“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats as structured data.
Google’s approach to dataset discovery makes use of schema.org and other metadata standards that can be added to pages that describe datasets…
Here are some examples of what can qualify as a dataset:
A table or a CSV file with some data…”
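As an illustration of the structured data the documentation refers to, here is a minimal sketch that builds schema.org `Dataset` markup as JSON-LD, with a `DataDownload` distribution pointing at a CSV file. The dataset name, description, creator, and URL are hypothetical placeholders; the property names (`name`, `description`, `creator`, `distribution`, `encodingFormat`, `contentUrl`) come from schema.org. Consult Google's Dataset documentation for which properties are required or recommended.

```python
import json

# Hypothetical dataset details; every value below is a placeholder.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example city air quality readings",
    "description": (
        "Hourly air quality readings collected by a hypothetical network "
        "of city sensors, published as a downloadable CSV file."
    ),
    "creator": {"@type": "Organization", "name": "Example Data Team"},
    # The distribution describes the downloadable CSV file itself.
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.com/data/air-quality.csv",
        }
    ],
}

# Wrap the JSON-LD in the script tag that would go on the page
# describing the dataset.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(dataset, indent=2)
    + "\n</script>"
)
print(script_tag)
```

In this arrangement the CSV file is the thing being described, while the markup lives on an HTML page about the dataset, which matches how the documentation frames dataset discovery.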
A core algorithm update, as Google defines it, is one in which Google makes "significant" and "broad changes" to its core algorithm.
It may be a coincidence that the indexing of CSV files and the core algorithm update happened at virtually the same time.
But it is worth considering whether Google improved its crawling engine to index CSV files or whether that capability was already there.
Read the updated list of indexable file types:
File types indexable by Google
Read Google’s Search Central Dataset Documentation:
Dataset (Dataset, DataCatalog, DataDownload) structured data
Featured image by Shutterstock/Jane Kelly