I'm trying to extract not just the browser and its version number but also the rendering engine and its version number from common user-agent strings. Most browsers report this just fine, e.g.:
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)"
"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.12) Gecko/2009070611 Firefox/3.5.12"
Safari also reports the WebKit version number, but it seems to do so twice. Here's my own UA:
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9.1"
In this case, it seems that one is just more detailed than the other.
But when I look at databases of Safari UA strings, e.g. useragentstring.com, the two versions are often entirely different.
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532+ (KHTML, like Gecko) Version/4.0.2 Safari/530.19.1"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; cs-CZ) AppleWebKit/525.28.3 (KHTML, like Gecko) Version/3.2.3 Safari/525.29"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/530.19.2 (KHTML, like Gecko) Version/4.0.2 Safari/530.19.1"
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/528.4+ (KHTML, like Gecko) Version/4.0dp1 Safari/526.11.2"
Etc.
Which one do I use? It's not a major issue, but just wondering. Thanks!
The AppleWebKit/xxx section tells you which rendering engine is being used. The Version/xxx Safari/xxx section tells you which version of the "Browser Frontend" is being used.
WebKit nightly builds run the currently installed Safari frontend but render with the latest nightly engine. This is why you can get different AppleWebKit/xxx numbers with the same Version/xxx Safari/xxx.
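To pull the three numbers apart, here is a minimal Python sketch. It assumes the UA string uses plain "Name/version" product tokens, as in the samples in the question; the function name and the keys of the returned dict are made up for illustration.

```python
import re

# Matches "Name/version" product tokens such as "AppleWebKit/531.9".
PRODUCT_RE = re.compile(r"([A-Za-z]+)/([^\s;()]+)")


def parse_safari_ua(ua):
    """Return the engine and frontend versions found in a Safari-style UA string."""
    products = dict(PRODUCT_RE.findall(ua))
    return {
        "engine_version": products.get("AppleWebKit"),  # rendering engine
        "browser_version": products.get("Version"),     # Safari release
        "frontend_build": products.get("Safari"),       # frontend build number
    }


ua = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532+ "
      "(KHTML, like Gecko) Version/4.0.2 Safari/530.19.1")
info = parse_safari_ua(ua)
print(info)
```

For the rendering engine, read engine_version; the other two values describe the Safari frontend.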
I've been trying to use this endpoint for a personal project:
GET https://cdn-api.co-vin.in/api/v2/appointment/sessions/public/calendarByDistrict?district_id=512&date=06-05-2021
My problem is that it returns a 403 error in Postman, as well as when I use curl or wget, but it works fine in Chrome.
Any solutions to this? Why is this happening?
Try adding a User-Agent header to your request.
header = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
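As a rough sketch of how such a header can be attached using only the Python standard library (the build_request helper is a made-up name, and whether the server accepts this particular User-Agent is an assumption, not something verified here):

```python
import urllib.request

URL = ("https://cdn-api.co-vin.in/api/v2/appointment/sessions/public/"
       "calendarByDistrict?district_id=512&date=06-05-2021")

headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"),
}


def build_request(url, headers):
    """Create a GET request object that carries the given headers."""
    return urllib.request.Request(url, headers=headers)


request = build_request(URL, headers)

# Sending is left commented out because it needs network access:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read()[:200])
```

curl and wget send their own default User-Agent, which is presumably what the server rejects; a browser-like value makes the request look like the one Chrome sends.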
I installed goaccess and am trying to parse/analyse a log file, but I'm having trouble with the log format. Does anyone know which format to use for a log like the one below? [updated the log sample]
::1 - - [24/Jun/2013:17:10:39 -0500] "GET /favicon.ico HTTP/1.1" 404 286 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36" 0 -
It worked after using --log-format=COMBINED.
Answer credits to #Pete Darrow.
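For reference, the fields that a combined-style line carries can be seen with a small Python regular expression. This is only an illustrative sketch of the layout (host, identity, user, time, request, status, size, referrer, user agent), not how goaccess itself parses the file; the pattern and group names are my own, and the trailing "0 -" in the sample is simply ignored.

```python
import re

# One named group per field of a combined-style access log line.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('::1 - - [24/Jun/2013:17:10:39 -0500] "GET /favicon.ico HTTP/1.1" 404 286 '
        '"-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36" 0 -')

# match() anchors at the start only, so extra trailing fields do not matter.
match = COMBINED_RE.match(line)
fields = match.groupdict() if match else {}
print(fields)
```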
Below are typical log lines from Apache being stored in AWS Elasticsearch. I'd like to add a viz to my dashboard showing top referrers. The problem is that many static files have referrers from the site's own domain, which prevents me from seeing the data I want.
Is it possible to have a search expression like "where REFERRER does not contain VHOST"?
123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET / HTTP/1.1" 200 42766 "http://facebook.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A456 Safari/602.1" Server=aws8 SSL=- 8868 0
123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET /js/lib/jquery-ui/jquery-ui.js HTTP/1.1" 200 42766 "http://example.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A456 Safari/602.1" Server=aws8 SSL=- 8868 0
I have been monitoring Google Analytics for our ecommerce server. Normally we have fewer than 10 visitors at a time. Recently, however, I have been seeing unusual bot activity. Sometimes it jumps to over 50 connections at a time, all within a few minutes. I am not sure if it is a bad crawler or someone committing click fraud on our Google PPC ad campaigns.
The following is a small part of our access_log. Checking the IP addresses does not reveal much. The IP addresses are also unique: I could not find any repeat access from the same IP when comparing over a few days.
76.189.130.73 - - [27/Feb/2016:21:32:25 -0600] "GET /hp-ce260x-toner-cartridge.html HTTP/1.1" 200 11548 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/4E423F"
71.82.43.43 - - [27/Feb/2016:21:32:26 -0600] "GET /hp-cb540a-oem-black-toner-cartridge.html HTTP/1.1" 200 11497 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
68.4.69.7 - - [27/Feb/2016:21:32:25 -0600] "GET /hp-c9723a-magenta-laser-toner-cartridge.html HTTP/1.1" 200 11233 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36"
50.54.179.218 - - [27/Feb/2016:21:32:26 -0600] "GET /hp-q5942xd-black-toner-cartridge.html HTTP/1.1" 200 11299 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
64.213.217.226 - - [27/Feb/2016:21:32:28 -0600] "GET /hp-q2682a-yellow-toner-cartridge.html HTTP/1.1" 200 11336 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36"
50.25.245.238 - - [27/Feb/2016:21:32:29 -0600] "GET /hp-ce255x-oem-high-yield-toner-cartridge.html HTTP/1.1" 200 11196 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36"
I am not sure if this is related, but I also see a few crawlers from ahrefs.com/robot/ and webmeup-crawler.com/; their IP addresses are consistent, though. I have already modified robots.txt to block the ahrefs.com bot.
robots.txt can be abused, but it's mainly meant for Google's bots looking for what's available to be searched. I have noticed in my own log that both Google and random IP addresses try a variety of different paths, including these:
/phpMyAdmin/scripts/setup.php
/phpmyadmin/scripts/setup.php
/pma/scripts/setup.php
/robots.txt (Google in this case)
'9\xdd\xb1\xf8\xa1\xa8\xa8\x82\x904\x1f\x84\xbeNv\x7fa\xd9\xd4,)\x98^\xbf\x98\x14\x82q
\x19\xa5\b\x7f\xee\x98\x02\xde_\xa1\x1b\xc0
\x06\xe6\xf2\xba\"!=\xe1\x18?\xb6\xf5$\xb4n0[\x92\xe9_
\x8b[Y5nS\x1d (some kind of hash cracker)
//wp-login.php
/blog//wp-login.php
/wordpress//wp-login.php
/wp//wp-login.php
/?author=1
What they are looking for are mostly pre-created directories from free downloadable templates.
You should know that nearly all IPs starting with 66.249 are Google.
The rest you can look up yourself.
In your case it looks like the bot(s) are looking for an HP printer to mess with.
Hope this helped
The most used IE user-agent strings in my stats are:
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; MATM)
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
What does the MATM stand for?
MATM is a codename for the hardware, one of several by the same vendor:
useragent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; MATMJS)
vendor: TS
-
useragent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/7.0; MATM)
vendor: TS
-
useragent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/7.0; MATP)
vendor: TS
-
useragent: Mozilla/5.0 (MSIE 9.0; Windows NT 6.3; WOW64; Trident/7.0; MATBJS; rv:11.0) like Gecko
vendor: TS
-
useragent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MATPJS; rv:11.0) like Gecko
vendor: TS
-
useragent: Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; TNJB; rv:11.0) like Gecko
vendor: TS
-
useragent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; TAJB; rv:11.0) like Gecko
vendor: TS
Where TS is Toshiba:
'TS' => 'Toshiba',
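If you want to flag these in your own stats, here is a minimal Python sketch. It assumes the OEM token appears as one of the semicolon-separated entries inside the parentheses, as in the examples above; the token set is just the handful listed here, not a complete list, and the function name is made up.

```python
import re

# Tokens and vendor code taken from the examples above; "TS" maps to Toshiba.
OEM_TOKENS = {"MATM", "MATMJS", "MATP", "MATBJS", "MATPJS", "TNJB", "TAJB"}
VENDORS = {"TS": "Toshiba"}


def find_oem_vendor(ua):
    """Return the vendor name if a known OEM token appears in the UA comment."""
    comment = re.search(r"\(([^)]*)\)", ua)
    if not comment:
        return None
    # Split the parenthesized section into its individual tokens.
    tokens = [t.strip() for t in comment.group(1).split(";")]
    if any(t in OEM_TOKENS for t in tokens):
        return VENDORS["TS"]
    return None


with_token = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; MATM)"
without_token = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)"
print(find_oem_vendor(with_token), find_oem_vendor(without_token))
```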
Use the following registry key to see the definition:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Internet Settings\5.0\User Agent
Look for it under the Pre-Platform and Post-Platform keys.
Many factors affect the user-agent string, including OEM vendors, carriers, network administrators, and user preferences.
Additional tokens can be added to the user-agent string by using the Registry Editor to create new string values under the Pre-Platform key or Post-Platform key. The value name should be the complete token; the value data is ignored. Tokens added to the Pre-Platform key appear before the platform token in the final user-agent string. Tokens added to the Post-Platform key appear after the platform token in the final user-agent string. Multiple tokens in either the Pre-Platform key or Post-Platform key are displayed in an unpredictable order.
Earlier versions of Internet Explorer included feature tokens defined using the Pre-Platform and Post-Platform keys as part of the user-agent string during the HTTP negotiation process. Over time, this led to overly long user-agent strings, which in turn created problems for certain web servers. Problems usually appeared when user-agent strings were longer than 256 characters. As of Internet Explorer 9, the user-agent string no longer includes feature tokens during HTTP negotiation. Feature tokens are included in the value returned by the userAgent property of the navigator object. Applications that rely on the earlier behavior should be modified accordingly.
References
Device Detector Github repo
Internet Explorer compatibility cookbook: User-agent string changes
Understanding user-agent strings
Registry Keys Affected by WOW64