When writing Python crawlers, you often need to extract large amounts of data, and to bypass website restrictions many developers use proxy IPs. A proxy IP hides the user's real IP address, can improve crawling throughput, and helps you avoid being tracked or blocked by the target website. This article introduces how to get a proxy IP with Python and how to use it in a Python crawler.
Preparatory work
Before you begin, do some preparatory work. First, make sure Python is installed on your system; you can download it from the official website. Then, ensure the Requests library is installed, since it supports accessing web resources through proxies. If it is not installed, you can install it with the following command:
pip install requests
How to get a proxy IP using Python
Before using a proxy IP, you need to find a reliable source. Many websites provide free or paid proxy IP services. You can use free proxy IPs directly, but they tend to be unstable and slow. It is recommended to choose a high-quality proxy IP service provider and fetch proxy IPs through the API it exposes. The following is sample code:
import requests

def get_proxy_ips():
    url = 'https://www.ipxproxy.com'  # Proxy IP API URL of the provider
    response = requests.get(url)
    if response.status_code == 200:
        proxy_ips = response.json()  # Assume the returned data is in JSON format
        return proxy_ips
    else:
        return []

proxy_ips = get_proxy_ips()
print(proxy_ips)
Verify the validity of the proxy IP
After getting a proxy IP, we need to verify that it works. A common way to verify it is to send an HTTP request to the target website through the proxy and check whether we get a successful response. Below is the sample code:
import requests

def check_proxy(proxy):
    url = 'http://ipxproxy.com'  # The URL of the target website
    # Build the proxies mapping from the single proxy address
    proxies = {"http": proxy, "https": proxy}
    try:
        response = requests.get(url, proxies=proxies, timeout=5)
        if response.status_code == 200:
            return True
    except Exception as e:
        print("Failed to verify proxy IP:", e)
    return False

proxy = "http://gate3.ipxproxy.com:7778"  # Example proxy address to test
if check_proxy(proxy):
    print("Proxy IP available")
else:
    print("Proxy IP is not available, need to get it again")
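In practice you will usually validate a whole batch of addresses at once. Below is a minimal sketch that keeps only the working proxies; it assumes the check_proxy function defined above is in scope, and the candidate list is a placeholder:

def filter_proxies(candidates):
    # Keep only the proxies that respond successfully
    return [proxy for proxy in candidates if check_proxy(proxy)]

# Placeholder candidates; in practice these come from get_proxy_ips()
candidates = ["http://gate3.ipxproxy.com:7778"]
working = filter_proxies(candidates)
print(f"{len(working)} of {len(candidates)} proxies are usable")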
Setting up proxy IP in Python
Using a proxy IP for web crawling helps you cope with the target website's anti-crawling strategies. To set a proxy IP, pass the proxy address and port to the proxies parameter of the requests library. Below is the sample code:
import requests

# Define the proxy IP address and port
proxies = {
    "http": "http://gate3.ipxproxy.com:7778",
    "https": "http://gate3.ipxproxy.com:7778",
}

# URL of the target website
url = 'http://ipxproxy.com'

try:
    # Use the proxies to access the website
    response = requests.get(url, proxies=proxies)
    # Print the page content
    print(response.text)
except Exception as e:
    print(f"Error accessing {url} through proxy: {e}")
If your proxy server requires authentication, you also need to include a username and password in the proxy address in the following format:
proxies = {
    "http": "http://IPX10000_custom_zone_GB_st_1855_city_13695_sid_62601125_time_90:password@gate3.ipxproxy.com:7778",
    "https": "https://IPX10000_custom_zone_GB_st_1855_city_13695_sid_62601125_time_90:password@gate3.ipxproxy.com:7778",
}
# Replace "password" with your account password
With the above steps, you now know how to get and use a proxy IP in Python. Using a proxy IP when crawling data helps you deal with network access restrictions and anti-crawler strategies. For better results, it is recommended to use rotating proxy IPs, which dynamically change your IP address so the target website is less likely to block you.
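As a simple illustration of rotation, the sketch below picks a random proxy from a validated pool for each request; the pool contents are placeholders, and in practice you would fill it from your provider's API:

import random
import requests

# Placeholder pool; fill it with validated proxies from your provider
proxy_pool = [
    "http://gate3.ipxproxy.com:7778",
]

def fetch(url):
    # Choose a random proxy from the pool for this request
    proxy = random.choice(proxy_pool)
    proxies = {"http": proxy, "https": proxy}
    return requests.get(url, proxies=proxies, timeout=5)

response = fetch("http://httpbin.org/ip")
print(response.json())  # The reported IP varies as proxies rotate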