I use Selenium::Remote::Driver and need to set a "normal" user agent. I've tried Selenium::UserAgent, but it only has user agents for mobile phones and tablets, none for a typical desktop PC. Maybe there's a way to expand the list of devices in Selenium::UserAgent, or to set a proper User-Agent (ideally Firefox) in Selenium::Remote::Driver manually?
Update:
I'm trying to scrape a website that is protected against bots ("solve the puzzle if you are a human"). When I access it with the default Selenium+Firefox user agent, the protection appears.
I tried Selenium::UserAgent and it worked: the protection disappeared, but I couldn't scrape the data I needed, because with a mobile user agent the target site promotes its mobile application instead of showing the data.
So after that I checked the user agent of my home computer's browser and tried to set it up using LWP::UserAgent:
my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0" );
my $driver = Selenium::Remote::Driver->new( browser_name => 'firefox', ua => $ua );
But this way the protection appeared again.
After that, I connected to my server through a VNC viewer, opened the same Firefox I've been using with Selenium, and there was no anti-bot protection that way. That's why I'm sure I need to set a correct user agent and/or some other settings.
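For reference, this is the kind of override I have in mind. It's an untested sketch, assuming extra_capabilities is passed through to geckodriver and that Firefox honours the general.useragent.override preference:

use strict;
use warnings;
use Selenium::Remote::Driver;

# Desktop Firefox UA string copied from my home machine's browser.
my $desktop_ua = 'Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0';

my $driver = Selenium::Remote::Driver->new(
    browser_name       => 'firefox',
    extra_capabilities => {
        'moz:firefoxOptions' => {
            # Assumed approach: general.useragent.override changes both
            # the User-Agent request header and navigator.userAgent.
            prefs => { 'general.useragent.override' => $desktop_ua },
        },
    },
);

# Quick check that the override took effect.
print $driver->execute_script('return navigator.userAgent;'), "\n";
$driver->quit;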
I am trying to grab some PDFs and other files from a subscription web site (I have a paid subscription) where the current process of downloading is a tedious right click, "Save As"...
I have a list of all of the URLs to each file I want, and if I copy and paste these URLs DIRECTLY into a browser, I have access to the file, and can successfully "Save As".
However, if I try to download the files directly from code, I get an Access Denied error. I assume that whatever security the site applies based on my login isn't being passed along unless I'm using the browser directly (Origin, etc.).
My goal is naturally to be able to just loop through my list of URLs...
So, is there a way to pass my "login" info so that I can access the files directly? Or a different way to download them?
I've been trying to use VB.Net WebClient for the download -
wc.DownloadFile(sFile, sDest)
It works fine for non-"protected" files.
Wondering if I can have better luck using ASP.Net...
Thanks.
Found the answer in this thread: "VB.NET The remote server returned an error: (403) Forbidden A FILE DOWNLOAD APP".
Imports System.Net
' Sending a browser-like User-Agent header is what gets past the 403.
Dim x As New WebClient
x.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.45")
x.DownloadFile(sWebFile, sDestLocalFile)
Thanks
I am trying to use PageSpeed Insights in Google Search Console for a Weebly/Square website and am getting an error:
Lighthouse returned error: NOT_HTML. The page provided is not HTML (served as MIME type )
It worked for me at the beginning (I tested 2-3 times). Then I resized some images and tried again, and I've been getting this error ever since.
Square's support states it's not on their side.
Lighthouse returning NOT_HTML can have at least three causes:
The page is really served as text/plain or without any valid Content-Type, potentially because of browser or bot detection.
You might be able to reproduce this by making a request with the same User-Agent as Lighthouse and checking the Content-Type response header:
curl -IA "Mozilla/5.0 (Linux; Android 7.0; Moto G (4)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.175 Mobile Safari/537.36 Chrome-Lighthouse" 'https://www.rustichappyplace.com/'
The webserver supports HTTP/2 or QUIC, but doesn't implement the protocol exactly as Lighthouse expects it, causing the Content-Type to be misdetected.
You should be able to reproduce the error in the newest Google Chrome or Chromium Nightly browser. In that case there is little you can do except ask your hosting provider to disable these features or update the server software.
Lighthouse has a bug that is triggered because of some feature the web server uses.
Currently (March 2021), Lighthouse on Google PageSpeed Insights seems to have a bug that produces NOT_HTML in some configurations when HTTP/2 Early Hints are activated in the web server. I had a similar problem today and found that disabling H2EarlyHints in Apache 2.4.46 prevented the issue.
If your hosting provider uses that feature to accelerate page loading, ask them to disable it for now.
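As a rough sketch, assuming the site runs on Apache 2.4 with mod_http2, disabling the feature amounts to a single directive in the server or virtual host configuration (followed by an Apache reload):

H2EarlyHints off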
Why in the world is this the WebKit user agent:
Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.27+ (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27
Why not:
Webkit/5.04 (Windows NT 6.0; WOW64) AppleWebKit/534.27+ (KHTML, like Gecko) Safari/533.20.27
Thanks for clearing this up :)
It really is a leftover from the early days of the Web. Many sites were only compatible with Netscape Navigator, which was the dominant browser at that point, so what they did was sniff the User-Agent for the "Mozilla/*" part. When IE showed up, MS wanted those websites to work in their browser as well, so it went for pretending to be Mozilla too. And so did all the browsers that popped up later on, including WebKit-based ones. That artifact doesn't seem to be going away any time soon, since many old sites still do that kind of sniffing, and for browsers, dropping the convention would probably mean breaking thousands of sites.
It's a throwback to the browser wars, the browser is identifying as a 'Mozilla Compatible' agent.
Given that Chrome and Safari both use WebKit, has anyone yet found anything that renders differently in Chrome than in Safari? Is there any reason at the moment to test sites in both, or would testing Safari be sufficient for now?
Part of this is knowing what is dependent on the rendering engine and what isn't. JavaScript, for example, is handled differently in the two browsers (Google has its own JavaScript engine), so if your page uses JavaScript substantially, I'd test it in both.
This is probably a good place to note that Chrome has been added to BrowserShots so you don't even need to have it installed to test on it and Safari.
Google Chrome also uses an earlier version of Webkit than the current Safari, so pages should be checked in both browsers.
They are very similar, but not identical. For example, I remember reading that Apple put a lot of work into Safari to get Apple-style font rendering, and I doubt Google duplicated that effort.
They don't ship synchronized releases of WebKit. For example,
Google Chrome (Official Build 2200):
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.30 Safari/525.1
versus Safari 3.1.2:
Mozilla/5.0 (Windows; U; Windows NT 6.0; pl-PL) AppleWebKit/525.19 (KHTML, like Gecko) Version/3.1.2 Safari/525.21
WebKit is very modular, so they have different drawing and JavaScript engines. Plugins are handled in significantly different ways as well.
In practice, I have not seen any site that acts differently, and the two browsers should have identical behavior as far as any sane web page is concerned. You could, of course, sniff the user agent and force different behavior...
So no, there is no reason at the moment to test both. Keep in mind that Google does not have a stable release of Chrome yet.
No, and some specific UI differences include text-shadow and box-shadow not rendering the same between them. The same goes for border-radius. I'd avoid relying on these three (advanced) CSS properties if you're working with Chrome.
Chrome and Safari have different font rendering on Windows. Safari includes Apple's font rendering, which to a Windows user looks a bit fuzzy. On OS X, they both use the platform's native font rendering. So Safari looks like OS X on both systems, whereas Chrome looks like the platform it's running on.
This is in addition to other points mentioned by people who know more than I do. :)
No. This is a similar question to "Does Chrome render the same as Konqueror?" Although the WebKit (HTML renderer) versions may be different, the JavaScript engines are very different between Chrome, Safari, and Konqueror. This will affect a lot of Google apps, since they are written with JavaScript-heavy techniques (AJAX). It also seems to affect a lot of modern sites, especially ones with complex menus and editors (such as this one). In the end it depends how much of the site you are viewing is written with JavaScript features.
FWIW, Google's Chrome FAQ says they should render very similarly:
http://www.google.com/chrome/intl/en/webmasters-faq.html#useragent
http://www.google.com/chrome/intl/en/webmasters-faq.html#renderie
They still have different JavaScript engines, which might behave differently (probably only in some rare conditions, however).
Chrome is currently using a slightly older version of WebKit than Safari.
Over time it will be updated, of course, but there is the possibility that it will always be a little behind, depending on how Apple release their source.
In addition, the JavaScript engines are different, which may affect behaviour, although both are extremely fast.