List Of User Agents For Scraping

Daniel - August 9, 2021


Spoofing your user agent is essential if you want to scrape websites successfully. In this post, we list user agents you can use for scraping and show how to use them to protect your scraper bots from web server bans.

What Is A User Agent?

A user agent is a string of text included in the headers of requests sent to web servers. A web server uses the details in the user agent to identify the device type, the operating system and its version, and the browser being used.
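To see where this string lives, here is a minimal sketch using Python's standard-library urllib.request (chosen here only so the example needs no third-party packages; the post itself uses Requests later on). It builds a request carrying a User-Agent header without actually sending it:

```python
import urllib.request

# Chrome 80 on 64-bit Windows 10 (the example string used in this post)
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/80.0.3987.149 Safari/537.36")

# Build (but do not send) a request whose headers include the user agent.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": USER_AGENT},
)

# urllib stores header names via str.capitalize(), so the key becomes "User-agent".
print(req.get_header("User-agent"))
```

The capitalized lookup key is a urllib detail only; when the request is sent, urllib title-cases the name back to "User-Agent" on the wire.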

Example: Windows 10 with Google Chrome

user_agent_desktop = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' \
                     'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 ' \
                     'Safari/537.36'

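This string tells the web server that you're browsing with Google Chrome 80 on a 64-bit Windows 10 device. (The leading "Mozilla/5.0" is a legacy token that nearly all browsers send, not a sign that you're using a Mozilla browser.) The server can use this information to tailor its response to your device type, operating system, and browser. User-agent strings generally follow the format below: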

User-Agent: Mozilla/<version> (<system-information>) <platform> (<platform-details>) <extensions>
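HTTP (RFC 9110) defines the User-Agent value as a sequence of product tokens ("name/version") and parenthesized comments. The sketch below splits a string along those lines; parse_user_agent is a hypothetical helper written for this post, and it deliberately ignores edge cases such as nested parentheses, so treat it as an illustration of the format rather than a full parser:

```python
import re

# Matches either a product token "name/version" or a parenthesized comment.
TOKEN_RE = re.compile(r"([^\s/()]+)/([^\s()]+)|\(([^)]*)\)")


def parse_user_agent(ua: str):
    """Split a User-Agent string into (name, version) products and comment parts."""
    products, comments = [], []
    for match in TOKEN_RE.finditer(ua):
        if match.group(3) is not None:
            # A comment such as "Windows NT 10.0; Win64; x64", split on ";"
            comments.append([part.strip() for part in match.group(3).split(";")])
        else:
            products.append((match.group(1), match.group(2)))
    return products, comments


ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36")
products, comments = parse_user_agent(ua)
print(products)
# [('Mozilla', '5.0'), ('AppleWebKit', '537.36'), ('Chrome', '80.0.3987.149'), ('Safari', '537.36')]
print(comments)
# [['Windows NT 10.0', 'Win64', 'x64'], ['KHTML, like Gecko']]
```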

Reasons For Spoofing User Agent

Web servers can tell browsers, web scrapers, download managers, spambots, etc. apart because each sends a distinctive user-agent string. For example, the Python Requests library identifies itself as python-requests/<version> by default. For that reason, many anti-bot websites can identify and ban a web scraper based on its user-agent string alone.
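To make the risk concrete, here is a hypothetical sketch of the kind of naive check a server might run. The BOT_MARKERS list and looks_like_bot function are invented for illustration; real anti-bot systems combine many more signals than the user agent:

```python
# Hypothetical substrings a naive server-side filter might block.
BOT_MARKERS = ("python-requests", "python-urllib", "curl", "wget", "scrapy", "bot", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the User-Agent is empty or contains a known bot marker."""
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in BOT_MARKERS)


print(looks_like_bot("python-requests/2.31.0"))  # True
print(looks_like_bot("curl/8.4.0"))              # True
print(looks_like_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                     "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"))  # False
```

A scraper using a library's default user agent trips a filter like this immediately, while a real-browser string passes it.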

Web scrapers, spambots, download managers, etc. can pose as legitimate clients by sending user-agent strings that belong to popular browsers. Changing the user-agent string this way is known as user-agent spoofing.

Therefore, changing or spoofing your user agent is essential for scraping data from anti-bot websites.

List Of User Agents For Scraping

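Here is a list of common desktop user agents. These strings carry older browser versions (Chrome 47 and Firefox 15, for example), so consider updating the version numbers to current releases before relying on them: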

  1. Windows 10 / Edge browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246

  2. Windows 7 / Chrome browser: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36

  3. Mac OS X 10.11 / Safari browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9

  4. Linux (Ubuntu) / Firefox browser: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1

  5. Chrome OS / Chrome browser: Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36
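If you want to use these strings from code, one option is to keep them in a dict keyed by platform. The key names and the headers_for helper below are hypothetical, chosen only for this sketch:

```python
# The desktop user agents listed above, keyed by platform and browser.
DESKTOP_USER_AGENTS = {
    "windows10_edge": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246",
    "windows7_chrome": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36",
    "macos_safari": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 "
                    "(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
    "linux_firefox": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
    "chromeos_chrome": "Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36",
}


def headers_for(platform: str) -> dict:
    """Return a headers dict carrying the user agent for the given platform key."""
    return {"User-Agent": DESKTOP_USER_AGENTS[platform]}


print(headers_for("linux_firefox"))
```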

How To Change User Agent Using Python Requests

For scraping, the best user agents are strings taken from real browsers. To change your scraper's user agent with Python Requests, copy the user-agent string of a well-known browser (Firefox, Chrome, Edge, Opera, etc.) and put it in a dict under the "User-Agent" key, for example:

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}

To check which headers your scraper actually sends, pass the dict as the headers argument in a request to HTTPBin's /headers endpoint, which echoes them back:

import requests
from pprint import pprint

r = requests.get('http://httpbin.org/headers', headers=headers, timeout=10)
pprint(r.json())
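If you'd rather not depend on httpbin.org while developing, you can run a throwaway local server that does the same job. This is a minimal sketch using only the standard library (http.server and urllib.request instead of Requests); EchoHeadersHandler and echo_headers are hypothetical names for this example:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Replies with the received request headers as JSON, similar to httpbin.org/headers."""

    def do_GET(self):
        # Title-case the names (e.g. "User-Agent") so lookups are predictable.
        received = {name.title(): value for name, value in self.headers.items()}
        body = json.dumps({"headers": received}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def echo_headers(headers: dict) -> dict:
    """Send one GET with `headers` to a local echo server and return what it received."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler ignores proxy environment variables for this local test.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = f"http://127.0.0.1:{server.server_port}/headers"
        request = urllib.request.Request(url, headers=headers)
        with opener.open(request, timeout=5) as response:
            return json.load(response)["headers"]
    finally:
        server.shutdown()
        server.server_close()


seen = echo_headers({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
})
for name, value in seen.items():
    print(f"{name}: {value}")
```

Because the server only echoes what arrives, it shows exactly what your client adds on its own (here, urllib's Host, Accept-Encoding, and Connection headers) alongside the headers you set.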

However, some web servers can still tell that you're using a bot, because a request sent this way lacks headers that a real browser sends (you can see that they're also missing from HTTPBin's response):

  • Accept (Requests sends a generic */* instead of the browser's detailed value)

  • Accept-Language

  • Dnt

  • Upgrade-Insecure-Requests
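You can check a headers dict for these gaps before sending it. The sketch below is a hypothetical helper; EXPECTED_BROWSER_HEADERS mirrors the list above and is not an exhaustive browser fingerprint, and the requests_like dict only approximates what Requests sends by default (exact values vary by version and installed extras):

```python
# Headers from the list above that a real browser sends besides User-Agent.
EXPECTED_BROWSER_HEADERS = ("Accept", "Accept-Language", "Dnt", "Upgrade-Insecure-Requests")


def suspicious_headers(headers: dict) -> list:
    """List browser headers that are missing, plus a generic */* Accept value."""
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = [name for name in EXPECTED_BROWSER_HEADERS if name.lower() not in lowered]
    if lowered.get("accept") == "*/*":
        problems.append("Accept (generic */*)")
    return problems


# Roughly what Requests sends when you only override the user agent.
requests_like = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*",
    "Connection": "keep-alive",
}
print(suspicious_headers(requests_like))
# ['Accept-Language', 'Dnt', 'Upgrade-Insecure-Requests', 'Accept (generic */*)']
```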

So, for successful scraping, send those headers along with your user agent, for example:

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,"
              "image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Dnt": "1",
    # "Host" is omitted: Requests sets it from the URL, and a hard-coded
    # "httpbin.org" would be wrong for any other site.
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
}

To confirm that your scraper now sends the full set of headers, repeat the HTTPBin request:

r = requests.get('http://httpbin.org/headers', headers=headers, timeout=10)
pprint(r.json())

How To Rotate Your User Agent Using Proxies

Following the steps above will successfully spoof your user agent to that of a real browser. However, the web server can still ban your scraper's IP address if it sends more requests per minute than any human could. To prevent an IP ban, rotate your user agent using a list of real-browser user agents and route your requests through rotating proxies.

To rotate user agents and proxies:

  • Collect user agents from popular web browsers

  • Store them in a Python list

  • Program your web scraper to pick a user-agent string from that list for each request

  • Change the exit IP address by routing requests through rotating proxies
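The steps above can be sketched in a few lines of Python. The proxy URLs below are hypothetical placeholders, and next_request_settings is an illustrative helper that returns keyword arguments in the shape Requests accepts (headers= and proxies=). Some providers instead give you a single gateway address that rotates the exit IP on their side; in that case the pool would have just one entry:

```python
import itertools
import random

# Real-browser user agents taken from the list in this post.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/47.0.2526.111 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) "
    "Version/9.0.2 Safari/601.3.9",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
]

# Hypothetical proxy endpoints: substitute the addresses and credentials from your provider.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]
proxy_pool = itertools.cycle(PROXIES)  # round-robin over the proxy list


def next_request_settings() -> dict:
    """Return request kwargs: a random user agent plus the next proxy in the pool."""
    proxy = next(proxy_pool)
    return {
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
        "proxies": {"http": proxy, "https": proxy},
    }


for _ in range(3):
    settings = next_request_settings()
    print(settings["proxies"]["http"], "->", settings["headers"]["User-Agent"][:50])
    # With Requests installed, you would send it like this:
    # requests.get("https://example.com/", timeout=10, **settings)
```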

For best results, source your rotating proxies from ProxyRack; we have a large pool of rotating proxies worldwide.

Also, rotate each user agent together with the full set of headers that goes with it, as in the examples above, so the web server doesn't identify your web scraper as a bot.
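One way to keep a user agent and its headers together is to rotate whole header profiles instead of bare strings. In this sketch, PROFILES and pick_profile are hypothetical names, and the header values are illustrative; capture real ones from the browser you are imitating so that, for example, a Firefox user agent is never paired with Chrome-only headers:

```python
import random

# Each profile keeps a user agent together with headers that browser would plausibly send.
PROFILES = [
    {   # Chrome 83 on macOS (values from the example earlier in this post)
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,"
                  "image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Upgrade-Insecure-Requests": "1",
    },
    {   # Firefox 15 on Ubuntu (illustrative values)
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
    },
]


def pick_profile() -> dict:
    """Pick a whole profile at random; return a copy so the template stays unchanged."""
    return dict(random.choice(PROFILES))


print(pick_profile()["User-Agent"])
```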

Note: When you copy headers from HTTPBin's response, delete any header whose name starts with "X-" (such as X-Amzn-Trace-Id). HTTPBin's load balancer adds these; a browser does not send them.
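A small filter handles this cleanup. strip_httpbin_headers is a hypothetical helper, and the echoed dict below is sample data shaped like an HTTPBin response, not a captured one:

```python
def strip_httpbin_headers(echoed: dict) -> dict:
    """Drop headers whose names start with "X-" (added by HTTPBin's infrastructure)."""
    return {name: value for name, value in echoed.items()
            if not name.lower().startswith("x-")}


# Sample data shaped like the "headers" object HTTPBin returns.
echoed = {
    "Accept": "*/*",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
    "X-Amzn-Trace-Id": "Root=1-example",
}
print(strip_httpbin_headers(echoed))
```

You may also want to drop the Host header before reusing the copied headers, since your HTTP client fills it in from the URL you request.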

Conclusion

Scraping websites successfully depends on how well you spoof user agents and on the type of proxies you use. Therefore, you should:

  1. Make sure the user-agent string you send matches the other headers you send with it.

  2. Send your headers in the same order as the browser whose user agent you're spoofing; websites using sophisticated anti-bot tools can detect a bot when the header order doesn't match.

  3. Add a Referer header to your requests so they look like genuine page visits.

  4. Don't keep cookies or log in to the website you're scraping, so the web server can't identify you from past activity.

  5. Get premium rotating proxies from ProxyRack, especially if you intend to scrape a large volume of data.
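Tips 2 to 4 can be combined in one helper. This is a hypothetical sketch: BROWSER_ORDER and build_headers are illustrative names, and the order shown is an example rather than a verified capture of any browser, so record the real order from your browser's developer tools. Also note that your HTTP client may still reorder headers on the wire (Requests, for instance, may place its session default headers first):

```python
# Header names in the order captured from the browser you are imitating (example only).
BROWSER_ORDER = [
    "Upgrade-Insecure-Requests",
    "User-Agent",
    "Accept",
    "Referer",
    "Accept-Encoding",
    "Accept-Language",
]


def build_headers(values: dict, referer: str) -> dict:
    """Return headers in BROWSER_ORDER, with a Referer added and no Cookie header."""
    merged = {**values, "Referer": referer}
    # Python dicts keep insertion order, so known headers come first, in browser order.
    ordered = {name: merged[name] for name in BROWSER_ORDER if name in merged}
    # Append any remaining headers, but never send stored cookies.
    for name, value in merged.items():
        if name not in ordered and name.lower() != "cookie":
            ordered[name] = value
    return ordered


headers = build_headers(
    {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Cookie": "session=abc123",
        "Upgrade-Insecure-Requests": "1",
    },
    referer="https://www.google.com/",
)
print(list(headers))
# ['Upgrade-Insecure-Requests', 'User-Agent', 'Accept', 'Referer']
```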
