Top 7 Proxy Solutions for Web Scraping: Pricing Evaluation

02.05.2023 at 08:06

For web scraping, bandwidth-priced proxies (proxies billed by traffic used) can inflate the bill very quickly and should be avoided where possible. Let's take a look at some usage scenarios and how bandwidth-based pricing would scale:

| target | avg document page size | pages per 1 GB (document) | avg browser page size | pages per 1 GB (browser) |
| --- | --- | --- | --- | --- |
| Walmart.com | 16 kb | 1k - 60k | 1 - 4 MB | 200 - 2,000 |
| Indeed.com | 20 kb | 1k - 50k | 0.5 - 1 MB | 1,000 - 2,000 |
| LinkedIn.com | 35 kb | 300 - 30k | 1 - 2 MB | 500 - 1,000 |
| Airbnb.com | 35 kb | 30k | 0.5 - 4 MB | 250 - 2,000 |
| Target.com | 50 kb | 20k | 0.5 - 1 MB | 1,000 - 2,000 |
| Crunchbase.com | 50 kb | 20k | 0.5 - 1 MB | 1,000 - 2,000 |
| G2.com | 100 kb | 10k | 1 - 2 MB | 500 - 2,000 |
| Amazon.com | 200 kb | 5k | 2 - 4 MB | 250 - 500 |

The table above shows example bandwidth usage estimates for several popular web scraping targets.

Note that the bandwidth used by web scrapers varies wildly based on the scraped target and the scraping technique. For example, reverse engineering a website's behavior and fetching only the data documents (HTML or JSON) uses significantly less bandwidth than driving an automated browser with tools like Puppeteer, Selenium or Playwright. As a result, bandwidth-based pricing can be very expensive for browser-based scraping. A quick comparison is sketched below.
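
To make the difference concrete, here is a small back-of-the-envelope calculator in Python; the page sizes are illustrative figures taken from the table above, not measured values:

```python
# Rough pages-per-GB estimator for bandwidth-priced proxies.
# Page sizes below are illustrative figures from the table above.
KB_PER_GB = 1024 * 1024  # kilobytes in one gigabyte

def pages_per_gb(avg_page_kb: float) -> int:
    """How many pages of a given average size fit into 1 GB of proxy traffic."""
    return int(KB_PER_GB / avg_page_kb)

print(pages_per_gb(200))       # amazon.com HTML document (~200 kb) -> ~5,200 pages
print(pages_per_gb(3 * 1024))  # amazon.com in a browser (~3 MB)    -> ~340 pages
```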

Finally, all estimates should be at least doubled to account for retry logic and other usage overhead (such as session warm-up and request headers). For example, say we have a $400/mo plan that gives us 20 GB of data. That would only net us ~50k Amazon product scrapes at best, and only a few hundred if we use a web browser with no special caching or optimization techniques.
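
As a sketch of that arithmetic, assuming the hypothetical $400/mo, 20 GB plan above and the ~200 kb amazon.com document size from the table:

```python
# Budget math for a hypothetical $400/mo plan with 20 GB of proxy traffic,
# with every estimate doubled to cover retries, session warm-up and headers.
PLAN_GB = 20
OVERHEAD = 2  # double the estimate for retry logic and other usage overhead

def scrapes_per_plan(avg_page_kb: float) -> int:
    """Pages the plan covers after applying the overhead multiplier."""
    kb_available = PLAN_GB * 1024 * 1024
    return int(kb_available / (avg_page_kb * OVERHEAD))

print(scrapes_per_plan(200))  # amazon.com documents -> 52,428, i.e. the ~50k above
```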

Conversely, bandwidth-priced proxies can work well with web scrapers that take advantage of AJAX/XHR requests. For example, the same $400/mo plan with 20 GB of data would yield ~600k walmart.com product scrapes if we can reverse engineer Walmart's web page behavior, which is a much more reasonable proposition!
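
A minimal sketch of what such document-only scraping can look like; the proxy address, request headers, product URL and the assumption that the product data is embedded as JSON in a __NEXT_DATA__ script tag are all illustrative and should be verified against the live page:

```python
# Document-only product scrape routed through a rotating proxy.
# The proxy URL, product URL and the __NEXT_DATA__ location are illustrative
# assumptions - only the small HTML document is transferred, no page resources.
import json
import requests
from parsel import Selector

PROXIES = {"https": "http://user:pass@proxy.example.com:8000"}  # hypothetical proxy

def scrape_product(url: str) -> dict:
    response = requests.get(
        url,
        proxies=PROXIES,
        headers={"User-Agent": "Mozilla/5.0"},  # minimal headers for illustration
        timeout=30,
    )
    response.raise_for_status()
    # Many modern storefronts embed their page state as JSON in the document itself.
    raw = Selector(text=response.text).css("script#__NEXT_DATA__::text").get()
    return json.loads(raw)

product = scrape_product("https://www.walmart.com/ip/123456")  # hypothetical URL
```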

Bandwidth-based proxies usually give access to big proxy pools, but it's very rare for web scrapers to need more than 100-1,000 proxies per project. For example, to scrape a website at 5,000 requests per minute with each proxy limited to 30 requests per minute, we only need 167 rotating proxies! The pool-size math is sketched below.
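
The same pool-size arithmetic plus a trivial rotator, as a sketch; the proxy addresses are placeholders:

```python
# Estimate the rotating proxy pool size needed for a target request rate,
# then cycle through a (placeholder) pool of private proxies.
import math
from itertools import cycle

def pool_size(target_rpm: int, per_proxy_rpm: int = 30) -> int:
    """Proxies needed to sustain target_rpm when each proxy handles per_proxy_rpm."""
    return math.ceil(target_rpm / per_proxy_rpm)

print(pool_size(5_000))  # -> 167 rotating proxies

proxy_pool = cycle(f"http://user:pass@proxy-{i}.example.com:8000" for i in range(167))
next_proxy = next(proxy_pool)  # pick the next proxy for the next request
```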

Proxy-count-based pricing is often a much safer and easier pricing model to work with. Buying a starter pool of private proxies (accessible only to a single client or a very small group of clients) is an easier and safer commitment for web scraping projects.