Quite frankly, when I got started (mostly trying out scraping libraries and bits of code I'd lifted from GitHub scraping repositories), I wasn't aware of that guideline at all. Soon enough I was posting on r/webscraping and other forums, pleading with users to explain why I was failing to get data that seemed perfectly accessible on the websites I was targeting.
But, in my defense, as time passed I discovered this was not exactly common knowledge; like me, plenty of others were crashing like waves against the walls of websites' request rate limits, IP tagging, and other blocking mechanisms.
So now, no longer a newbie, I figure it's best to help others speed up the process and reach the top of the mountain faster. That is, scraping data from websites at scale.
