Life Hacks

Small, useful tricks

20+ Best Rotating Proxies for Web Crawling & Scraping. How do proxies and scrapers work together?

17.06.2023 at 10:45


Before we answer that question, first we should define what proxies are.

Proxies are IP addresses that are part of a network. There are a variety of types, which we will go over later in this article. A proxy works as a gateway that provides you with anonymity: the server sees the proxy’s IP address, not your own.

When you access a website, you send a request from your own IP address to the website’s server. When you scrape, the tool can send hundreds of those requests every second. Once the server sees all those requests, it will assume that it is under a DoS attack and block the IP address that is sending them. In simpler terms, you will scrape for less than a second if you use your own IP address.

This is where proxies come into play. Since they provide anonymity and hide your original IP address from the destination server, you can scrape longer without getting detected.

What types of proxies are best for scraping?

There are several types of proxies that you can use, and each one has its pros and cons.

Datacenter proxies are IP addresses that providers purchase from datacenters and resell as proxies. They usually come in bulk, in sequential ranges. They can be used for scraping, but because they come from datacenters, there is a good chance that they are already marked as such. Strict websites may already have them blacklisted, which leaves you in a tight spot.

Residential and mobile proxies are very similar. Residential proxies are IP addresses from real people’s home internet connections. On the other hand, mobile proxies are IP addresses from connections of mobile networks – 3G and 4G. We still haven’t been able to find 5G proxies.

Compared to datacenter proxies, residential or mobile proxies are a better choice because they are real IP addresses from other people’s connections, so the chances of them being flagged as proxies are very small.

Regardless of which proxies you go for, make sure to get rotating proxies. Using a proxy is one thing, but the type of proxy and how you use it can make a huge difference.

Rotating proxies are proxies that you can set to rotate at a specific interval. Some are flexible, while others are preconfigured by the provider.
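Interval-based rotation can be sketched in a few lines of Python. This is a minimal illustration, not any provider's implementation: the class name `IntervalRotator`, the placeholder proxy addresses, and the injectable `clock` parameter are all assumptions made for the example.

```python
import itertools
import time


class IntervalRotator:
    """Hand out the same proxy until `interval_seconds` have passed, then move on.

    Hypothetical helper for illustration only.
    """

    def __init__(self, proxies, interval_seconds, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._interval = interval_seconds
        self._clock = clock  # injectable so the behaviour can be checked without waiting
        self._current = next(self._cycle)
        self._switched_at = clock()

    def current(self):
        """Return the proxy to use right now, switching if the interval has elapsed."""
        now = self._clock()
        if now - self._switched_at >= self._interval:
            self._current = next(self._cycle)
            self._switched_at = now
        return self._current


# Placeholder addresses from a documentation-only IP range.
rotator = IntervalRotator(["203.0.113.10:8080", "203.0.113.11:8080"], interval_seconds=60)
proxy = rotator.current()
```

Calling `rotator.current()` before every request keeps the same address for a minute and then moves to the next one in the list.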

The advantage of rotating proxies over static ones is that they can rotate with every new request. When they do, each request arrives from a different IP address, and the website’s server will think that a new person is making each request. A static proxy behaves like your own home IP address: all requests are sent from the same proxy address.
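Per-request rotation on the client side might look like the following sketch, which uses only the Python standard library. The proxy addresses are placeholders and the function name `rotating_openers` is made up for this example; a commercial rotating proxy usually does this on the provider's side instead.

```python
import itertools
import urllib.request

# Placeholder proxy addresses (documentation-only IP range), not real servers.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def rotating_openers(proxies):
    """Yield (proxy, opener) pairs forever, moving to the next proxy each time."""
    for proxy in itertools.cycle(proxies):
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield proxy, urllib.request.build_opener(handler)


openers = rotating_openers(PROXIES)
proxy, opener = next(openers)  # take a fresh pair before every request
# response = opener.open("http://example.com", timeout=10)  # would go out via `proxy`
```

Each call to `next(openers)` returns an opener bound to the next address, so consecutive requests leave from different proxies.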

Rotating Proxy buy. Frequently Asked Questions


    What is a rotating datacenter proxy?

    A rotating datacenter proxy is a type of proxy that changes its IP address for each session. The addresses are hosted in datacenters rather than assigned by an Internet Service Provider (ISP). With such a proxy you can send multiple requests to one site from rotating datacenter IPs and get multiple responses. When you buy a rotating datacenter proxy, you can arrange IP rotation through a single proxy server.

    How to use a rotating datacenter IP address proxy?

    When you buy backconnect datacenter proxy servers, you need to specify the range of IPs available to you from your rotating datacenter IP provider. If you use rotating proxies inside a bot, read the bot’s manual carefully.

    Can I use your datacenter proxies for scraping safely?

    We, as your rotating proxies provider, can assure you that you can use our datacenter proxies safely inside any software tools for scraping. Our account managers provide round-the-clock support to make sure that your satisfaction with our proxy performance is at the highest possible level.

    Is it safe for me to buy rotating datacenter proxy servers only from a single datacenter backconnect proxies provider?

    Absolutely, we provide round-the-clock support of our rotating datacenter proxies services and our account manager is ready to handle all issues related to our datacenter IP performance when necessary. Also, our proxy network is vast and stable, so you will be able to find a solution for your specific case easily.

    Why is a rotating datacenter proxy better than a regular datacenter proxy?

    Sometimes websites manage to track your datacenter proxy activity and eventually restrict access from your IP. This can happen if the target server detects your proxy or if you exceed its request limit. We constantly grow our proxy network to avoid such cases, and it includes both datacenter and commercial IPs. Our rotating proxies backconnect to new IPs at an interval that you can set in your proxy dashboard.

    How do I upgrade to more proxies?

    You can consult our account support manager to provide you with additional IPs. This is an easy procedure and you will see your new proxies in your dashboard in no time.

    Why do you need to rotate your IPs?

    A datacenter proxy routes your connection to a random IP address in the IP pool. You have a range of IPs for each proxy, which significantly reduces the chances of getting detected during your sessions.

    Can I have a free trial period on my rotating proxies services before I buy the rotating proxies?

    Yes, we provide a two-day free trial period before you buy a rotating datacenter proxy server, so that you can be fully confident in the quality of our services. If you have any questions, our managers will guide you through the process of datacenter IP setup and initialization.

    How can I be sure that I get the best price for my datacenter proxies?

    We are aware of the pricing policy of all rotating datacenter IPs providers and are ready to offer you the best price when you buy datacenter rotating proxy servers considering their price/performance ratio.

    How many simultaneous connections/threads do your proxies feature?

    How many rotating datacenter proxies can you provide?

    We have a whole range of plans depending on your current needs. Please consult our account manager to find out which one is right for your particular case.

    What if I need proxies from a specific country?

    In this case, we are ready to provide you with proxies from various geographic locations. We have rotating datacenter proxies from most European, American and Asian countries. Please contact our support to find out about the availability of proxies in the country you need.

Rotating Proxy scrapy. Scrapy with a Rotating Tor Proxy

This post shows an approach to using a rotating Tor proxy with Scrapy.

I’m using the scrapy-rotating-proxies download middleware package to rotate through a set of proxies, ensuring that my requests are originating from a selection of IP addresses. However, I need to have those IP addresses evolve over time too, so I’m using the Tor network.

Setup

I’ve got the following in the settings.py for my Scrapy project:

    DOWNLOADER_MIDDLEWARES = {
        'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
        'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    }

    ROTATING_PROXY_LIST_PATH = 'proxy-list.txt'
    ROTATING_PROXY_PAGE_RETRY_TIMES = 5

This (1) specifies where the package middleware fits into the pipeline for processing requests and (2) points to a file, proxy-list.txt, which contains a list of proxies. There are other settings for the package, but they are not important right now.

Proxy List

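The contents of proxy-list.txt are four proxy addresses, one per line, each pointing at a local port (9990 to 9993).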

So I’m running four local proxies. How? Well, with Docker, of course!

The scrapy-rotating-proxies package ensures that

  • requests are sent out via these proxies and
  • the proxies are used in rotation, so that consecutive requests use distinct proxies.

The reason for rotating through a list of proxies is to ensure that at any given time there are multiple proxies (each with a different IP address) available for sending requests.

Tor Proxies

In order to access a truly diverse set of IP addresses I’m tapping into the Tor network via the pickapp/tor-proxy Docker image.

Using Docker Compose it’s easy to spin up a cluster of Tor proxies. This is my docker-compose.yml:

There are four services defined, each of which maps port 8888 on the container to a specific host port (a sequence of ports starting at 9990 and corresponding to the ports listed in proxy-list.txt).

    CONTAINER ID   IMAGE                    PORTS                    NAMES
    98feb5a034e6   datawookie/tor-privoxy   0.0.0.0:9990->8888/tcp   tor-bart
    26f05b1deb17   datawookie/tor-privoxy   0.0.0.0:9991->8888/tcp   tor-homer
    b856ded83585   datawookie/tor-privoxy   0.0.0.0:9992->8888/tcp   tor-marge
    c352aea63eed   datawookie/tor-privoxy   0.0.0.0:9993->8888/tcp   tor-lisa

Setting the

To make this setup more flexible I have a script, create-proxies, which generates the contents of proxy-list.txt and docker-compose.yml.

If I want to add or remove proxies then I simply edit the NAMES list, run the script again, restart Docker Compose and voila!
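The original create-proxies script is not shown, so the following is only a hypothetical sketch of what such a generator could look like. The image name, the container port 8888, the host ports starting at 9990, and the tor-* container names come from the listing above; the `127.0.0.1` host, the `host:port` line format, the Compose layout, and all function names are assumptions.

```python
from pathlib import Path

# Edit this list to add or remove proxies (names taken from the container listing).
NAMES = ["bart", "homer", "marge", "lisa"]
BASE_PORT = 9990                   # first host port
CONTAINER_PORT = 8888              # port exposed inside each container
IMAGE = "datawookie/tor-privoxy"   # image shown in the container listing


def proxy_list(names, base_port=BASE_PORT, host="127.0.0.1"):
    """Return the text of proxy-list.txt: one host:port entry per proxy."""
    return "".join(f"{host}:{base_port + i}\n" for i, _ in enumerate(names))


def compose_file(names, base_port=BASE_PORT, image=IMAGE):
    """Return the text of a docker-compose.yml with one service per name."""
    lines = ["services:"]
    for i, name in enumerate(names):
        lines += [
            f"  tor-{name}:",
            f"    image: {image}",
            f"    container_name: tor-{name}",
            "    ports:",
            f'      - "{base_port + i}:{CONTAINER_PORT}"',
        ]
    return "\n".join(lines) + "\n"


def write_files(directory="."):
    """Write both generated files into `directory`."""
    target = Path(directory)
    (target / "proxy-list.txt").write_text(proxy_list(NAMES))
    (target / "docker-compose.yml").write_text(compose_file(NAMES))
```

Calling `write_files()` and then restarting Docker Compose would regenerate both files from the single NAMES list, so the two can never drift apart.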

Results

This is what an extract from the crawler logs looks like:

    Proxies(good: 0, dead: 0, unchecked: 4, reanimated: 0, mean backoff: 0s)
    Proxy is GOOD
    Proxy is GOOD
    Proxies(good: 2, dead: 0, unchecked: 2, reanimated: 0, mean backoff: 0s)
    Proxy is GOOD
    Proxies(good: 3, dead: 0, unchecked: 1, reanimated: 0, mean backoff: 0s)
    Proxies(good: 3, dead: 0, unchecked: 1, reanimated: 0, mean backoff: 0s)
    Proxies(good: 3, dead: 0, unchecked: 1, reanimated: 0, mean backoff: 0s)
    Proxy is GOOD
    Proxies(good: 4, dead: 0, unchecked: 0, reanimated: 0, mean backoff: 0s)

The addresses for the proxies are fixed (sampled from the list in proxy-list.txt). However, each Tor proxy refreshes its exit node every minute. Here are the logs from a slightly updated version of the Tor proxy Docker image:

    HUP → Tor. exit IP: 109.70.100.50.
    HUP → Tor. exit IP: 31.7.61.190.
    HUP → Tor. exit IP: 178.20.55.18.
    HUP → Tor. exit IP: 185.220.102.242.
    HUP → Tor. exit IP: 109.70.100.51.

This is happening for each of the proxies, so requests effectively are being sent from a constantly changing set of IP addresses. Good way to stay below the radar!

Rotating Proxy Scrapy. Scrapy with a Rotating Tor Proxy

This post demonstrates an approach to using a rotating Tor proxy with Scrapy.

I'm using the scrapy-rotating-proxies download middleware package to rotate through a set of proxies, ensuring that my requests are originating from a selection of IP addresses. However, I need to have those IP addresses evolve over time too, so I'm using the Tor network.

I'm running four local proxies. How? Well, with Docker, of course!

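The scrapy-rotating-proxies package ensures that requests are sent out via these proxies and that the proxies are used in rotation, so that consecutive requests use distinct proxies.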

The reason for rotating through a list of proxies is to ensure that at any given time there are multiple proxies (each with a different IP address) available for sending requests.

In order to access a truly diverse set of IP addresses I'm tapping into the Tor network via the pickapp/tor-proxy Docker image.

Setting the Tor proxy:

docker run -d --name tor-proxy -p 9050:9050 pickapp/tor-proxy

This is what an extract from the crawler logs looks like:

    INFO: Starting new request: GET https://example.com
    INFO: Rotating proxy: 192.168.1.1
    INFO: Starting new request: GET https://example.com
    INFO: Rotating proxy: 192.168.1.2
    INFO: Starting new request: GET https://example.com
    INFO: Rotating proxy: 192.168.1.3
    INFO: Starting new request: GET https://example.com
    INFO: Rotating proxy: 192.168.1.4

This is happening for each of the proxies, so requests effectively are being sent from a constantly changing set of IP addresses. Good way to stay below the radar!

What a Rotating Proxy Is. Rotating Proxy for Xevil: what it is, how to use it, and how to get around the ban.

Today we will talk about rotating proxies: what they are, where to get them, and how to use them correctly in Xevil and Hrefer.


A rotating proxy is a type of proxy that changes its external IP on every request. In other words, you enter a single address in your software, for example 127.0.0.1:23000, and the outgoing IP is random.

A live demo using wget, where we request our external address from the service http://ifconfig.co without changing the proxy, looks like this (the arrow marks the IP returned by the service, and I have blanked out the proxy address itself)
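The same check can be sketched in Python instead of wget. This is an illustration under assumptions: the `/ip` plain-text endpoint of ifconfig.co, the function names, and the number of attempts are not from the original post, and 127.0.0.1:23000 is only the example address mentioned above.

```python
import urllib.request


def fetch_ip_via_proxy(proxy_url, service_url="http://ifconfig.co/ip", timeout=10):
    """Ask an IP-echo service which address it sees when we go through `proxy_url`.

    The /ip path is assumed to return the address as plain text.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(service_url, timeout=timeout) as response:
        return response.read().decode().strip()


def observed_ips(fetch, attempts):
    """Call `fetch` several times and collect the IPs it reports, in order."""
    return [fetch() for _ in range(attempts)]


# Example call (needs a working rotating proxy on the address below):
# ips = observed_ips(lambda: fetch_ip_via_proxy("http://127.0.0.1:23000"), 5)
# print(len(set(ips)), "distinct exit IPs in", len(ips), "requests")
```

With a rotating proxy, most of the collected addresses should differ even though the proxy address in the software never changes.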


However, such proxies have a serious drawback that limits where they can be used, and as a result not all proxy providers offer this option. Using a rotating proxy for browsing is very problematic, because a real browser opens many connections, so all resources such as CSS, scripts and images are also loaded through the proxy, and in our case through different ones. But for cases where you need to send a single request, for example when parsing or, as it turned out, when solving captchas, this solution is just right, because Xevil cannot automatically refresh proxies from a link and the lists gradually go stale.

For example, on my main production server I got the following picture (this is about 20 days of work)

However, do not rush to paste the proxy into Xevil, because there is one important nuance that can kill the idea at the root. If a proxy turns out to be bad, it is banned for 30 minutes during operation. Since we have only one proxy address, the program can no longer work, because all of the proxies, in the form of that single address, are banned!

Getting around this limitation is easy: set the file «Modules\ReCaptcha2\BannedProxies.csv» to read-only and the problem is solved.
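The read-only flag can also be set from a script rather than through the file properties dialog. The sketch below is only an illustration: the helper name `make_read_only` is made up, and the path is assumed to be relative to the Xevil installation folder.

```python
import os
import stat
from pathlib import Path


def make_read_only(path):
    """Clear every write-permission bit on `path`.

    On Windows this sets the file's read-only attribute.
    """
    target = Path(path)
    mode = target.stat().st_mode
    os.chmod(target, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Assumed location, relative to the Xevil installation folder:
# make_read_only(r"Modules\ReCaptcha2\BannedProxies.csv")
```

Running it once is enough; the program can then no longer write banned proxies to the file.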

This does cause an Xevil error on exit, but it does not affect how the program works.

Such proxies can be used in the same way in Hrefer or other parser programs.