I'm scraping house rental listings with Selenium using Firefox on a Windows 7 box and have run into a captcha on a website where I have three pages to traverse. Some searching tells me that one of the many techniques I'll have to implement in order to avoid detection is to change the user agent for each request.
I have found Python code on Stack Overflow, posted as recently as five months ago, for doing the same sort of thing in Google Chrome using the following user agent string:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
But if I go to https://www.whatsmyua.info/ from Firefox on my desktop I see the user agent string is:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0
And if I do the same thing using the Tor browser I get this:
Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0
Those two user agent strings look nothing like the one used in the Python code!
Wouldn't putting references to Windows, AppleWebKit, Chrome and Safari into the user agent string be an obvious tip-off to the bot detector that this is no ordinary browser access?
Related
I'm trying to make a file download app using VB.NET, but when I debug the app and press the button I see this error. How can I solve this problem?
MY CODE :
ERROR :
Looks like thinkbroadband's server doesn't like serving that file to things it considers bots/not real browsers. You'll have to mimic a real browser instead:
' Requires "Imports System.Net" at the top of the file
Dim x As New WebClient
' Send a browser-like User-Agent so the server does not treat the request as a bot
x.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.45")
x.DownloadFile("http://download.thinkbroadband.com/10MB.zip", "c:\temp\a.zip")
How did I know this? I opened my browser developer tools (F12), downloaded the file successfully in the browser and looked at which headers were sent for the successful request, then pared them down to just the important one (User-Agent).
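For comparison, here is a minimal Python sketch of the same idea using only the standard library. The function names build_request and download are hypothetical helpers introduced for illustration; the user agent string is the one from the answer above. Whether a given server accepts this header is an assumption you would need to verify yourself.

```python
import shutil
import urllib.request

# Browser-like User-Agent string copied from the VB.NET answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.45"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest_path, user_agent=BROWSER_UA):
    """Stream the response body to dest_path (performs a real network request)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        with open(dest_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
```

Calling download("http://download.thinkbroadband.com/10MB.zip", "a.zip") would then mirror the DownloadFile call above.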
For error reporting etc. I would like to know the version of Chromium used by a Chromium-based web browser. Can I find that somewhere in the user's web browser?
Execute "navigator.userAgent" in JavaScript.
It will return a string similar to
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
with the Chromium version in the third product token (here Chrome/73.0.3683.86).
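If the string is sent along with an error report, the version can be pulled out on the receiving side. The following is a minimal sketch, assuming the report contains the raw navigator.userAgent value; chromium_version is a hypothetical helper name, and the pattern simply looks for a "Chrome/<version>" token.

```python
import re

# Matches the "Chrome/<dotted version>" token that Chromium-based browsers send.
CHROME_TOKEN = re.compile(r"Chrome/(\d+(?:\.\d+)*)")


def chromium_version(user_agent):
    """Return the version from the Chrome/ token, or None if there is none."""
    match = CHROME_TOKEN.search(user_agent)
    return match.group(1) if match else None
```

A Firefox user agent has no Chrome/ token, so the function returns None for it rather than guessing.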
Thanks for the suggestions about using navigator.userAgent. They might be useful (not sure), but a safer way seems to be something like
if (typeof chrome === "undefined") {
// Not chromium based, probably Firefox then
}
Any comments on this? @Asesh?
I am trying to identify and study which user agents are sending requests to our application.
When I execute the request in Chrome, I see that it says
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
When I execute it from Safari, it says
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Safari/602.1.50
When I execute it from Mozilla Firefox, it says
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:49.0) Gecko/20100101 Firefox/49.0
Why does it say "Mozilla/5.0" for all requests, and why does Chrome specifically list Mozilla, Chrome and Safari?
Can anyone please explain why this is the case? Thanks.
User agent strings are a mess, and it is very difficult to write code to get accurate information from them. (See this link for a detailed history of why).
If you just want to gather statistics, try adding Google Analytics to your front end, or use a library that specializes in parsing user agent strings on your back end, such as MobileDetect for PHP.
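To illustrate why hand-rolled parsing is fiddly, here is a deliberately naive sketch that classifies the strings quoted in the question. The function name browser_family and the token list are assumptions made for illustration, not a complete or authoritative parser; a dedicated library will cover far more cases.

```python
def browser_family(user_agent):
    """Guess the browser family from a User-Agent string (naive sketch)."""
    # Order matters: Edge strings also contain "Chrome/" and "Safari/",
    # and Chrome strings also contain "Safari/", so the most specific
    # token has to be checked first.
    checks = (
        ("Edg/", "Edge"),
        ("Firefox/", "Firefox"),
        ("Chrome/", "Chrome"),
        ("Safari/", "Safari"),
    )
    for token, family in checks:
        if token in user_agent:
            return family
    return "Unknown"
```

Swapping the order of the Chrome and Safari checks would misreport every Chrome request as Safari, which is the kind of trap that makes a maintained parsing library the safer choice.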
If you want to do something different depending on what features the browser supports, try detecting support for that feature instead of the browser version. Modernizr is great for this.
I'm coding a Qt Quick app that contains a small web-page view and I can't find any settings for WebKit 3.0 in the QtWebKit or QML QtWebView documentation.
Q1 - How do I enable flash plugin?
Q2 - How do I set a different User Agent string?
Q3 - How do I configure disk cache and cookie cache?
Changing the User Agent string
To change the user agent string, you could use this QtWebKit.experimental extension:
import QtWebKit.experimental 1.0
...
// and add this line in your WebView
experimental.userAgent:"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36"
See this post
But like you, I haven't found out yet about:
How to enable flash plugins?
How to configure disk cache? ...
Why in the world is this the WebKit user agent:
Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.27+ (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27
Why not:
Webkit/5.04 (Windows NT 6.0; WOW64) AppleWebKit/534.27+ (KHTML, like Gecko) Safari/533.20.27
Thanks for clearing this up :)
It really is a leftover from the early days of the Web. Many sites were only compatible with Netscape Navigator, which was the dominant browser at that point, so they sniffed the User-Agent for the "Mozilla/*" part. When IE showed up, MS wanted those websites to work in its browser as well, so IE pretended to be Mozilla too. So did all the browsers that popped up later on, including WebKit-based ones. That artifact doesn't seem to be going away anytime soon: many old sites still do that type of sniffing, so dropping the convention would probably mean breaking thousands of sites.
It's a throwback to the browser wars: the browser is identifying itself as a 'Mozilla Compatible' agent.