Is it possible to use Browser control or E2 to download files without "Access Denied" error - vb.net

I am trying to grab some PDFs and other files from a subscription web site (I have a paid subscription) where the current process of downloading is a tedious right click, "Save As"...
I have a list of all of the URLs to each file I want, and if I copy and paste these URLs DIRECTLY into a browser, I have access to the file, and can successfully "Save As".
However, if I try to download the files directly using code, I get an Access Denied error. I assume this is because whatever security the site applies based on my login (origin, etc.) is only satisfied when I'm using the browser directly.
My goal is naturally to be able to just loop through my list of URLs...
So, is there a way to somehow figure out how to pass my "login" info so that I can access the files directly? Or a different way to download?
I've been trying to use VB.Net WebClient for the download -
wc.DownloadFile(sFile, sDest)
It works fine for non "protected" files.
Wondering if I can have better luck using ASP.Net...
Thanks.

VB.NET The remote server returned an error: (403) Forbidden A FILE DOWNLOAD APP
Found the answer here.
Dim x As New WebClient
x.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.45")
x.DownloadFile(sWebFile, sDestLocalFile)
Thanks
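A minimal sketch of the loop the question asks for, written in Python rather than VB.NET and not taken from the linked answer. It sends the same browser-like User-Agent with every request. The optional Cookie header is an assumption: the question does not say how the site tracks the login, so this only helps if the session is carried by a cookie copied from the browser's developer tools. The names build_request, download_all and dest_for are hypothetical.

```python
import urllib.request

# Same browser-like User-Agent string as in the answer above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.45"
)


def build_request(url, cookie=None):
    """Return a Request carrying a browser-like User-Agent and, optionally, a session cookie."""
    headers = {"User-Agent": USER_AGENT}
    if cookie:
        # Assumption: the site identifies the logged-in session via a Cookie header.
        headers["Cookie"] = cookie
    return urllib.request.Request(url, headers=headers)


def download_all(urls, dest_for, cookie=None):
    """Download every URL to the local path returned by dest_for(url)."""
    for url in urls:
        request = build_request(url, cookie)
        with urllib.request.urlopen(request) as response, open(dest_for(url), "wb") as out:
            out.write(response.read())
```

If the server still answers 403 with only the User-Agent set, comparing the remaining request headers shown in the browser's developer tools against what the code sends is the usual next step.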

Related

How to set a custom User-Agent in Perl's Selenium::Remote::Driver

I use Selenium::Remote::Driver and need to set up a "normal" user-agent. I've tried to use Selenium::UserAgent, but it only has User-Agents for mobiles and tablets, and none for ordinary desktop PCs. Is there a way to expand the list of devices in Selenium::UserAgent, or to set a correct User-Agent (ideally Firefox) in Selenium::Remote::Driver manually?
Update:
I'm trying to parse a website which is protected from bots (solve the puzzle if you are a human). When I try to parse it with the default Selenium+Firefox UserAgent - the protection appears.
I've tried to use Selenium::UserAgent and it worked: the protection disappeared. But I wasn't able to scrape the needed data, because the target site then promotes its mobile application instead of showing the data.
So, after that I've checked the UserAgent of my home computer's browser and set it up using LWP::UserAgent:
my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0" );
my $driver = Selenium::Remote::Driver->new( browser_name => 'firefox', ua => $ua );
But this way the protection arrived again.
After that, I've connected to my server through the VNC viewer, opened the same Firefox I've been using with Selenium, and there was no anti-bot protection this way. So, that's why I'm sure that I need to use a correct UserAgent and/or some other settings.

Syntax of User Agent in HTTP Header

I am running Google Chrome version 89.0. My browser reports the following user agent string:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
What does each field in the above format represent?
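What you put in the user agent field is largely arbitrary: HTTP only defines a loose grammar of product tokens (name/version) and parenthesized comments, and there is no standard that dictates which of them a browser must send.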
That said, for historical reasons, browsers like to claim to be lots of different browsers. Great post on that here: https://webaim.org/blog/user-agent-string-history/
This was mostly because some sites would use the user agent to guess the capabilities of the browser asking for a page and try to serve different versions depending on what it said.
This was fraught with problems (which is why browsers had to pretend to be other browsers to avoid getting a substandard page), and there are now much better ways to detect features on the client side in CSS and JavaScript.
Additionally, there are privacy issues with exposing such a specific version: combined with just a few other items, it makes it pretty easy to track individual users.
On the server side, User Agent Client Hints will allow a browser to tell the site what it supports, rather than a site guessing based on the user agent. Much more accurate and future proof.
Chrome has even said it intends to freeze the user agent at some point to stop people depending on it. So I wouldn’t build anything that depends on it.
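To make the structure of the string in the question concrete, here is a rough sketch that splits it into product tokens (name/version) and parenthesized comments. The function name split_user_agent and the regular expression are illustrative assumptions, not a complete parser of the HTTP grammar; nested parentheses, for example, are not handled.

```python
import re

# Either a parenthesized comment, or a product token such as "Chrome/89.0.4389.82".
_PART = re.compile(r"\(([^)]*)\)|([^\s/()]+)(?:/([^\s()]+))?")


def split_user_agent(ua):
    """Split a User-Agent string into ("product", name, version) and ("comment", text, None) tuples."""
    parts = []
    for comment, name, version in _PART.findall(ua):
        if name:
            parts.append(("product", name, version or None))
        else:
            parts.append(("comment", comment, None))
    return parts


ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36")
for part in split_user_agent(ua):
    print(part)
```

Read against the history linked above: Mozilla/5.0 is a compatibility token nearly every browser sends, the first comment describes the platform (64-bit Windows 10), AppleWebKit/537.36 and "KHTML, like Gecko" reflect the rendering engine's lineage, Chrome/89.0.4389.82 is the actual browser and version, and Safari/537.36 is there for sites that sniff for Safari.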

How can I fix the Lighthouse returned error: NOT_HTML. The page provided is not HTML (served as MIME type ) error for square/weebly website?

I am trying to use PageSpeed Insights in Google Search Console for Weebly/Square website and getting an error:
Lighthouse returned error: NOT_HTML. The page provided is not HTML (served as MIME type )
It worked for me at the beginning (I tested 2-3 times). I resized some images and tried again. Getting this error since then.
Square's support states it's not on their side.
Lighthouse returning NOT_HTML can have at least three causes:
1. The page is really served as text/plain or without any valid Content-Type, potentially because of browser or bot detection.
You might be able to reproduce this by making a request with the same User-Agent as Lighthouse:
curl -IA "Mozilla/5.0 (Linux; Android 7.0; Moto G (4)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.175 Mobile Safari/537.36 Chrome-Lighthouse" 'https://www.rustichappyplace.com/'
2. The web server supports HTTP/2 or QUIC, but doesn't implement the protocol exactly as Lighthouse expects it, causing the Content-Type to be misdetected.
You should be able to reproduce the error in the newest Google Chrome or Chromium Nightly browser. In that case there is little you can do except ask your hoster to disable these features or to update the server software.
3. Lighthouse has a bug that is triggered by some feature the web server uses.
Currently (March 2021) Lighthouse on Google PageSpeed Insights seems to have a bug that produces NOT_HTML in some configurations when HTTP/2 Early Hints are activated in the web server. I had a similar problem today and found that disabling H2EarlyHints in Apache 2.4.46 prevented the issue.
If your hoster uses that feature to accelerate page loading, ask them to disable it for now.
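The check for the first cause can also be scripted. The sketch below sends a HEAD request with the Lighthouse User-Agent from the curl command and reports whether the Content-Type denotes HTML. The function names are hypothetical, and treating only text/html as HTML is an assumption made for this illustration, not a statement of Lighthouse's exact rule.

```python
import urllib.request

# User-Agent copied from the curl command above.
LIGHTHOUSE_UA = (
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (4)) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/88.0.4324.175 Mobile Safari/537.36 Chrome-Lighthouse"
)


def is_html_content_type(header_value):
    """Return True if a Content-Type header value denotes text/html (assumed definition of HTML)."""
    if not header_value:
        # Missing or empty header, which matches the "(served as MIME type )" message.
        return False
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def fetch_content_type(url, user_agent=LIGHTHOUSE_UA):
    """Send a HEAD request with the given User-Agent and return the Content-Type header, or None."""
    request = urllib.request.Request(url, method="HEAD", headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type")
```

Calling is_html_content_type(fetch_content_type(url)) once with the Lighthouse string and once with an ordinary browser User-Agent shows whether the server treats the two differently.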

VB.NET The remote server returned an error: (403) Forbidden A FILE DOWNLOAD APP

I'm trying to make a file download app using VB.NET, but when I debug the app and press the button I see this error. How can I solve this problem?
MY CODE :
ERROR :
Looks like thinkbroadband's server doesn't like serving that file to things it considers bots/not real browsers. You'll have to mimic a real browser instead:
Dim x As New WebClient
x.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.45")
x.DownloadFile("http://download.thinkbroadband.com/10MB.zip", "c:\temp\a.zip")
How did I know this? I opened my browser developer tools (F12), downloaded the file successfully in the browser, looked at which headers were sent for the successful request, then pared them down to just the important one (User-Agent).

How to make VB.NET webbrowser controls look like a different Web Browser to websites

Some websites, especially the ones that are fancy with HTML 5 and whatnot, will verify the browser that you are using, and give you a little warning message like: "Warning you are using an untested browser" if your web browser isn't in their little white-list.
Sadly, these websites do not recognize IE controls as Internet Explorer browsers, so sometimes they show unnecessary warnings/errors.
Is there any feasible way for me to make my WebBrowser control show up as Internet Explorer 9 instead of whatever it actually shows up as? That way, if the website has already been tested with Internet Explorer 9, it will not show any errors.
Thank you!
By default, the WebBrowser control is detected as IE7. To see this, try navigating to "What is My User Agent":
WebBrowser1.Navigate("http://www.whatsmyuseragent.com/")
The easiest way to change this is to pass the user agent of a different browser as the last parameter of the Navigate method. Open http://www.whatsmyuseragent.com/ in your normal IE9, copy the displayed string, and use it as the parameter, e.g.:
WebBrowser1.Navigate("http://www.whatsmyuseragent.com/", Nothing, Nothing, "User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)")
The problem you're hitting is that, by default, IE WebBrowser controls run in legacy compatibility modes. To resolve that, set FEATURE_BROWSER_EMULATION for your process (ensure you write to both the 32-bit and 64-bit registry keys if your project is compiled for AnyCPU). See "webbrowser using ie10 c# winform" for more details.
If you want to send a different user-agent string (which is how such sites determine what browser version you're using), you need to use the URLMon API UrlMkSetSessionOption as discussed here: http://blogs.msdn.com/b/ieinternals/archive/2009/10/08/extending-the-user-agent-string-problems-and-alternatives.aspx