Python crawler tutorial: How to get a proxy IP using Python

JasonBen 2024-08-16 16:36

Python crawlers often need to extract large amounts of data, and many developers use proxy IPs to get around the restrictions that websites place on repeated requests. A proxy IP hides the user's real IP address, which makes it harder for the target website to track or block the crawler, and spreading requests across several proxies can keep a crawl moving when a single address would be rate-limited. This article explains how to get a proxy IP using Python and how to use it in a Python crawler.

Preparatory work

Before you begin, do some preparatory work. First, install Python on your system; you can download it from the official website. Then make sure the Requests library is installed, since it supports accessing web resources through proxies. If it is not installed, you can install it with the following command:

pip install requests

How to get a proxy IP using Python

Before using a proxy IP, you need to find a reliable proxy IP source. Many websites provide free or paid proxy IP services. You can use free proxy IPs directly, but they tend to be less stable and slower. It is recommended that you choose a high-quality proxy IP service provider and get proxy IPs through the API it provides. The following is sample code:

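import requests

def get_proxy_ips():
    url = 'https://www.ipxproxy.com'  # Replace with the proxy list API URL from your provider
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            # Assume that the returned data is in JSON format
            return response.json()
    except (requests.RequestException, ValueError) as e:
        # Network errors, timeouts, or a response body that is not valid JSON
        print("Failed to get proxy IPs:", e)
    return []

proxy_ips = get_proxy_ips()
print(proxy_ips)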
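
Provider APIs differ in what they return, so the list usually needs normalizing before it can be used. The sketch below assumes, purely for illustration, that each entry is a dictionary with "ip" and "port" keys. The to_proxy_urls name and that response shape are hypothetical and not taken from any particular provider.

```python
def to_proxy_urls(entries, scheme="http"):
    """Turn API entries into proxy URLs such as http://203.0.113.10:8080.

    Assumes each entry looks like {"ip": "...", "port": ...}; this shape is
    hypothetical, so adjust the keys to match your provider's response.
    """
    urls = []
    for entry in entries:
        # Skip anything that is not a dict (for example strings or None)
        if not isinstance(entry, dict):
            continue
        ip = entry.get("ip")
        port = entry.get("port")
        # Skip malformed entries instead of failing the whole batch
        if not ip or not port:
            continue
        urls.append(f"{scheme}://{ip}:{port}")
    return urls


sample = [
    {"ip": "203.0.113.10", "port": 8080},
    {"ip": "203.0.113.11"},  # missing port, will be skipped
]
print(to_proxy_urls(sample))
```

Each resulting string can then be used as a proxy address in the steps that follow.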

Verify the validity of the proxy IP

After getting the proxy IP, we need to verify its availability. A common way to verify it is to send an HTTP request to the target website through the proxy and check whether a response comes back successfully. Below is the sample code:

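import requests

def check_proxy(proxy):
    url = 'http://ipxproxy.com'  # The URL of the target website
    # Route both HTTP and HTTPS requests through the proxy being tested
    proxies = {"http": proxy, "https": proxy}
    try:
        response = requests.get(url, proxies=proxies, timeout=5)
        if response.status_code == 200:
            return True
    except requests.RequestException as e:
        print("Failed to verify proxy IP:", e)
    return False

proxy = "http://gate3.ipxproxy.com:7778"  # Proxy address in scheme://host:port form
if check_proxy(proxy):
    print("Proxy IP available")
else:
    print("Proxy IP is not available, need to get it again")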
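
When the provider returns many proxies, checking them one at a time is slow because each check waits on the network. A minimal sketch using a thread pool from the standard library follows. filter_working is a hypothetical helper, and its check argument is expected to be a function such as check_proxy above that takes a proxy and returns True or False. A stand-in checker is used here so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_working(proxies, check, max_workers=8):
    """Return the proxies for which check(proxy) is true, in input order."""
    proxies = list(proxies)
    if not proxies:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the checks in parallel and yields results in input order
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]


def fake_check(proxy):
    # Stand-in for a real network check: treat port 8080 as "working"
    return proxy.endswith(":8080")


candidates = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:3128",
    "http://203.0.113.12:8080",
]
print(filter_working(candidates, fake_check))
```

In a real crawler the call would be filter_working(candidates, check_proxy). The checker should catch its own network errors, as check_proxy does, because an exception raised inside a worker is re-raised when the results are collected.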

Setting up proxy IP in Python

Using a proxy IP address for web crawling helps you cope with the anti-crawling strategies of the target website. To set the proxy IP, pass the proxy address and port number to the proxies parameter of the requests library. Below is the sample code:

import requests

# Define the proxy IP address and port
proxies = {
    "http": "http://gate3.ipxproxy.com:7778",
    "https": "http://gate3.ipxproxy.com:7778",
}

# URL of the target website
url = 'http://ipxproxy.com'

try:
    # Using proxies to access websites; the timeout stops a dead proxy from hanging the request
    response = requests.get(url, proxies=proxies, timeout=10)
    # Print page content
    print(response.text)
except Exception as e:
    print(f"Error accessing {url} through proxy: {e}")

If your proxy server requires authentication, you also need to include a username and password in the proxy address in the following format:

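# Format: http://USERNAME:PASSWORD@PROXY_HOST:PORT
# Replace PASSWORD and PROXY_HOST with the values from your provider
proxies = {
    "http": "http://IPX10000_custom_zone_GB_st_1855_city_13695_sid_62601125_time_90:PASSWORD@PROXY_HOST:7778",
    # Same http:// proxy scheme as in the unauthenticated example
    "https": "http://IPX10000_custom_zone_GB_st_1855_city_13695_sid_62601125_time_90:PASSWORD@PROXY_HOST:7778",
}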

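Credentials that contain characters such as @ or : have to be percent-encoded before they are placed in the proxy address, otherwise the URL is parsed incorrectly. Below is a small sketch using the standard library. build_proxy_url is a hypothetical helper, and the host, user name, and password are made-up example values.

```python
from urllib.parse import quote


def build_proxy_url(username, password, host, port, scheme="http"):
    """Build scheme://username:password@host:port with encoded credentials."""
    # safe="" makes quote() encode every reserved character, including @ : /
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


proxy_url = build_proxy_url("user_zone_GB", "p@ss:word", "proxy.example.com", 7778)
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)
```

The resulting dictionary can be passed as the proxies argument in the same way as in the unauthenticated example.
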
With the above steps, you should now know how to get a proxy IP using Python. Using a proxy IP when crawling data helps deal with network access restrictions and anti-crawler strategies. For better Python crawling, it is recommended to use rotating proxy IPs, which change your IP address dynamically to avoid being blocked by the target website.
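
The rotation idea can be sketched in a few lines: cycle through a pool of proxies and move on to the next one when a request fails. This is only an illustration of client-side rotation, not a description of how any provider's rotating gateway works. fetch_with_rotation and the fetch callback are hypothetical names, and a stand-in fetch function replaces the real requests.get call so the example runs offline.

```python
from itertools import cycle


def fetch_with_rotation(url, proxy_pool, fetch, max_attempts=3):
    """Try fetch(url, proxy) with successive proxies until one succeeds.

    fetch should return the response body or raise an exception on failure.
    Returns a (proxy, result) tuple, or re-raises the last error if all fail.
    """
    if not proxy_pool or max_attempts < 1:
        raise ValueError("need a non-empty proxy_pool and max_attempts >= 1")
    rotation = cycle(proxy_pool)
    last_error = None
    for _ in range(max_attempts):
        proxy = next(rotation)
        try:
            return proxy, fetch(url, proxy)
        except Exception as error:  # a real crawler would catch requests.RequestException
            last_error = error
    raise last_error


def fake_fetch(url, proxy):
    # Stand-in for requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=5)
    if proxy.endswith(":3128"):
        raise ConnectionError(f"{proxy} refused the connection")
    return f"fetched {url} via {proxy}"


pool = ["http://203.0.113.11:3128", "http://203.0.113.10:8080"]
used_proxy, body = fetch_with_rotation("http://ipxproxy.com", pool, fake_fetch)
print(used_proxy)
print(body)
```

Here the first proxy fails, so the request is retried through the second one. Passing the fetch function in as an argument keeps the rotation logic separate from the HTTP library and makes it easy to test.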
