Existing laws in different jurisdictions do not draw a clear line between legal and illegal web crawling. But court rulings have offered guidance, which rests on the fact that web crawling involves public data.
By definition, public data is any information that is not bound by any law or legal restriction and can therefore be used, reused, distributed, and redistributed by anyone. However, according to the court rulings, even public data may not be used or redistributed under certain conditions. So, how does this impact web crawling, and is web crawling legal? But first, let’s explore what a web crawler is and what web crawling involves.
What is a web crawler?
Web crawling is the process by which web-based companies such as search engines and content aggregators discover new web pages and recently uploaded content. It entails using a web crawler, also known as a spider, to crawl the web by following links included on web pages.
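To make the idea of "following links" concrete, here is a minimal Python sketch of a breadth-first crawler. It is an illustration only, not how any particular search engine works. The names LinkExtractor, crawl, and PAGES, as well as the example.com URLs, are hypothetical, and the "website" is an in-memory dictionary so that the sketch runs without touching a real site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the target of every <a href="..."> in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None if unavailable."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; skip it
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # queue each discovered page only once
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "website" (assumption: stands in for real HTTP requests).
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "https://example.com/b": '<a href="/c#top">C</a>',
}

order = crawl("https://example.com/", PAGES.get)
print(order)
```

A real crawler would replace PAGES.get with a function that downloads each page, and would add politeness delays and error handling on top of this.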
In the case of search engines and aggregator sites, the spiders collect and organize the data in indexes (databases) that facilitate future retrieval. On the other hand, when web crawlers are used alongside web scrapers (bots that automatically collect data from websites and convert it into a structured, downloadable format), the crawlers discover the new web pages from which the scrapers then retrieve data. If you want to know more, Oxylabs wrote an article on the topic.
Regardless of how web crawlers are applied, they have one thing in common: the bots collect data from websites. So, does this constitute an infringement of copyright laws, and does it breach the site’s terms and conditions, which are legally binding? To answer this question, let’s look at some court rulings.
Court rulings on data extraction from websites
In the United States, LinkedIn lost a web scraping appeal against hiQ Labs in 2019. The court of appeals ruled that the country’s Computer Fraud and Abuse Act (CFAA) does not prohibit the extraction of publicly accessible data on the internet. The ruling suggested that the CFAA applies only where a company bypasses authentication requirements, for example to scrape data hidden behind login pages. In 2021, the US Supreme Court set that ruling aside and sent the case back to the court of appeals for reconsideration.
In a 2015 case involving Ryanair Ltd and PR Aviation BV, the Court of Justice of the European Union noted that intellectual property rights did not apply to the scraped data. This is because the data did not result from any creative input. However, the court noted that website owners could restrict the reuse of scraped data by including such a clause in their terms and conditions.
Earlier, in the UK, the courts in the NLA v Meltwater litigation held that a news aggregator site’s use of headlines scraped from news websites could amount to copyright infringement, a finding the Court of Appeal upheld in 2011. This, the courts noted, was because headlines can result from a creative process. The UK Supreme Court’s 2013 ruling in the same dispute dealt with a narrower question, namely the temporary copies made while browsing.
It is worth pointing out that although these rulings offer guidance, there is still no law that makes web crawling illegal in itself. It becomes unlawful only when certain conditions are breached. So, is web crawling legal? Yes, when certain conditions are fulfilled.
The legality of web crawling
As stated, web crawling is legal if it satisfies the following conditions:
⦁ It follows the stipulations outlined in the terms and conditions
⦁ It does not breach copyright/intellectual property laws by facilitating the use or reuse of content that resulted from a creative process/required creative input
⦁ It does not bypass authentication or login pages to gain unauthorized access to specific pages
In this regard, if you intend to undertake web crawling, it is important to read the site’s terms and conditions. It is equally crucial to program the web crawler to read the robots.txt file, which contains instructions for bots. The file outlines which parts of the site a spider may crawl and which parts it should stay away from. This means that the crawler should not follow links to pages that the file disallows.
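As a rough illustration of the robots.txt check, the sketch below uses Python’s standard urllib.robotparser module. The robots.txt content, the example.com URLs, the is_allowed helper, and the "example-crawler" user agent name are all assumptions made up for this example. The rules are parsed from a string so that the sketch runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption: a real site publishes its own rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="example-crawler"):
    """Return True if the parsed robots.txt rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.com/blog/post",
    "https://example.com/private/data.html",
    "https://example.com/login",
):
    print(url, "->", "allowed" if is_allowed(url) else "disallowed")
```

Against a live site, the crawler would instead point the parser at the site’s own file with set_url() and read(), and call can_fetch() before every request. Passing this check does not replace reading the terms and conditions.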
At the same time, the web crawler should not access, collect, or archive copyrighted material.
Disclaimer: This section and article do not constitute legal advice
Web crawling is integral to the workings of the internet. Without it, some companies as we know them may not exist. For this reason, it is reasonable to say that web crawling, or the use of web crawlers, is generally legal. After all, businesses such as search engine companies, which are regulated in their respective countries, remain operational. In addition, no law outright criminalizes web crawling.
But businesses should be aware of the conditions that, once breached, make the practice unlawful. These include breaching terms and conditions, contravening copyright laws, and bypassing authentication requirements. That said, we reiterate that this crawlinfo.com article does not constitute legal advice and is only meant to guide your web crawling journey.