Retrieve current chrome open page in html without saving it - selenium

I'm implementing a python script mainly based on pyautogui. One of the things the script does is to open a chrome webpage. After that I would need to access the DOM of this currently open webpage.
Since I've not opened the browser with selenium, I can't use it to analyze the DOM.
However, my question is: is this currently open chrome page available/saved somewhere in the hard drive so that I can access it with selenium? Like an .html file?
I checked many other questions here and users talk about chrome cache, but there are no html files there.
I just need to be able to access the current open page and not all the historical data in the cache.
Opening web browser directly with selenium is not an option either, since most of the websites analyzed have captchas and distil technology.
Thanks.

If you start the original chrome with --remote-debugging-port=PORT_NR argument, and visit localhost:PORT_NR from another browser, you will have access to the full content of the browser, including dev console.
Once you have this, you have multiple ways to go:
You can visit http://localhost:PORT_NR with with any other browser (or even with the same browser), and you should have full access to the content of the original Chrome. With Selenium you should have a relatively easy time to get by.
You can also use the devtools api (the documentation.. is.. well... there is room for improvement. Search for chrome devtools protocol to be amazed by the lack of docs). As an example you can get to http://localhost:PORT_NR/json to get the available debugging URIs. Grab the relevant websocket endpoint (webSocketDebuggerUrl). Open a websocket connection, and issue a command, like {"method": "DOM.getDocument", "id":12}. You can find available DOM related commands here: https://chromedevtools.github.io/devtools-protocol/1-3/DOM

Sice I had to reinvet the wheel I may give some extra info that I coudn't find anywhere:
Start the Browser with remote debugging enabled (see previous posts)
Connect to the given port on localhost and use these HTTP-GET-Requests to geta very limited control on your browser:
https://chromedevtools.github.io/devtools-protocol/#endpoints
Most important:
GET /json/new?{url}
GET /json/activate/{targetId}
GET /json/close/{targetId}
GET /json or /json/list
To gain full control over the browser, you need to use a "websocket" connection. Each Object in the GET /json or /json/list has it's own ID. Use this ID to interact with the tab. Btw: Type "page" are normal tabs, the other stuff are extentions and so on. Once you know which Tab you want to influence, get it's "webSocketDebuggerUrl".
Use this URL and connect with something that can speak the Websocket-protocol.
Once connected, you must craft a valid Json by the following structure:
{
"id":0,
"method":"Page.navigate",
"params":{url:http://google.com}}
}
Notes:
ID is a simple counter (int) that get bigger - not the ID of the tab(!)
Method is the method described in the docs params is also in the docs.
The return values are always JSONs.
From now on you can use the official docs:
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-navigate
Dunno how other ppl found out about it but it took a few hours to get it working. Probably cause everyone is just using python's selenium to do it.

Related

Login screen recording in JMeter repeating url issue

I need your help, I have recorded a login script in blaze meter and importing it into JMeter what I noticed that browsing URL is repeating like site.com/0, site.com/1,site.com/2 and so on. Please suggest what to do to fix it asap help required. thanks.
I am trying to record a login script in blaze meter when I imported the script in JMeter I found that the browsing URL is repeating. like example.com/0, example.com/1,and so on. please help me.
We cannot "help" without knowing what are your expectations.
When it comes to performance testing of web applications you need to ensure that JMeter is properly configured to behave exactly like a real browser.
It means that JMeter should send the same requests and in the same manner as the real browser does.
In case if the network footprint generated by JMeter matches the one which the real browser produces - you don't need any "help" there. If it doesn't - we need to see:
the dump of requests from "Network" tab of your browser's developer tools
how did you configure the BlazeMeter Chrome Extension, i.e. choosing "Only top level requests" might "help" you
Normally these numeric postfixes are used as the naming convention for the Transaction Controller to all nested redirects, embedded resources and so on would be considered an integral part of the parent "transaction"

Can I use Selenium with password-protected URLs?

I'm trying to use Selenium for fetching information automatically from an ADSL modem's status page.
To log in to the modem, it requires certain username + password combination. Unlike all the samples that I have found, this comes before fetching the page, and therefore is not the case of finding the right id and then 'typing' the text into them.
Does Selenium have support for reaching such access controlled pages?
If I understand you correctly, if you use the following url, it will work:
http://username:password#modemstatusurl/bar/foo

Script to download Google web history

How does one write a script to download one's Google web history?
I know about
https://www.google.com/history/
https://www.google.com/history/lookup?hl=en&authuser=0&max=1326122791634447
feed:https://www.google.com/history/lookup?month=1&day=9&yr=2011&output=rss
but they fail when called programmatically rather than through a browser.
I wrote up a blog post on how to download your entire Google Web History using a script I put together.
It all works directly within your web browser on the client side (i.e. no data is transmitted to a third-party), and you can download it to a CSV file. You can view the source code here:
http://geeklad.com/tools/google-history/google-history.js
My blog post has a bookmarklet you can use to easily launch the script. It works by accessing the same feed, but performs the iteration of reading the entire history 1000 records at a time, converting it into a CSV string, and making the data downloadable at the touch of a button.
I ran it against my own history, and successfully downloaded over 130K records, which came out to around 30MB when exported to CSV.
EDIT: It seems that number of foks that have used my script have run into problems, likely due to some oddities in their history data. Unfortunately, since the script does everything within the browser, I cannot debug it when it encounters histories that break it. If you're a JavaScript developer, use my script, and it appears your history has caused it to break; please feel free to help me fix it and send me any updates to the code.
I tried GeekLad's system, unfortunately two breaking changes have occurred #1 URL has changed ( I modified and hosted my own copy which led to #2 type=rss arguments no longer works.
I only needed the timestamps... so began the best/worst hack I've written in a while.
Step 1 - https://stackoverflow.com/a/3177718/9908 - Using chrome disable ALL security protocols.
Step 2 - https://gist.github.com/devdave/22b578d562a0dc1a8303
Using contentscript.js and manifest.json, make a chrome extension, host ransack.js locally to whatever service you want ( PHP, Ruby, Python, etc ). Goto https://history.google.com/history/ after installing your contentscript extension in developer mode ( unpacked ). It will automatically inject ransack.js + jQuery into the dom, harvest the data, and then move on to the next "Later" link.
Every 60 seconds, Google will force you to re-login randomly so this is not a start and walk away process BUT it does work and if they up the obfustication ante, you can always resort to chaining Ajax calls and send the page back to the backend for post processing. At full tilt, my abomination script collected 1 page a second of data.
On moral grounds I will not help anyone modify this script to get search terms and results as this process is not sanctioned by Google ( though not blocked apparently ) and recommend it only to sufficiently motivated individuals to make it work for them. By my estimates it took me 3-4 hours to get all 9 years of data ( 90K records ) # 1 page every 900ms or faster.
While this thing is going, DO NOT browse the rest of the web because Chrome is running with no safeguards in place, most of them exist for a reason.
One can download her search logs directly from Google (In case downloading it using a script is not the primary purpose),
Steps:
1) Login and Go to https://history.google.com/history/
2) Just below your profile picture logo, towards the right side, you can find an icon for settings. See the second option called "Download". Click on that.
3) Then click on "Create Archive", then Google will mail you the log within minutes.
maybe before issuing a request to get the feed the script shuld add a User-Agent HTTP header of well known browser, for Google to decide that the request came from that browser.

How might I provided a URL use the FireShot API to take a screenshot, upload to Imgur, and return some output (eg. markdown)

I am looking for a way to utilize the FireShot API with JS to given a URL (or perhaps a list) use the FireShot API to take screenshot, upload to Imgur, then return the user the URLs or perhaps something like markdown to use quickly in forums.
Method 1: Open new window
I tried opening the URL in a new window, but found that I cant control that page with JS dues to cross domain problems. The same with iFrames.
Method 2: simple $.get()
A simple $.get() wont work because of the same cross domain issues I guess?
http://jsfiddle.net/t6aeq/
$.get($url.val(), function(data) {
console.log(data);
});
Via PHP "Proxy"
So I tried creating a simple PHP script that gets the HTML of the URL and returns it to my JS (using file_get_contents($url)). But some sites like Microsoft will detect that I am using some automated methods and give an error page of sorts. I also cant seem to find a way to use jQuery to query that returned HTML for link[rel=stylesheet], script, style and body to append to the head and a div respectively. I posted abt that on another question
A new Idea: Embed scripts on browser level
So I thought away of getting around these is using iMacros or GreeseMonkey or something to insert scripts into pages on the browser level instead? But any guidance or tips on how can I do that? Also, I'd prefer a pure JS/PHP method if available so users are not limited to using Browser plugin/scripts (tho I will be the only user for now)
It suddenly came to my mind that this may not work because the FireShot API key and Imgur is limited to the domain? Any solutions?
You might be able to inject the FireShot script using Greasemonkey. But, first use GM_xmlhttpRequest() to fetch an API key, for that page's domain, from the "Create FireShot API Key" page.
Note that GM_xmlhttpRequest() does not have the same cross-domain issues that $.get() has.
However, at this point you might be better off just writing your own Firefox add-on. Maybe start with FireShot's code for ideas. Also see the Screengrab add-on.

Refresh browser via cron(or not) to a different page on remote request?

I need to display pages in a tutorial fashion. I looked in to netsupport, beamyourscreen and other possibilities but, I do not want the viewers to download anything. I cannot use gd / send screenshots due to audio / video instructions embedded in some of the pages.
Basically, I need the ability to "refresh" a users browser window to a different page via an interface on my end. Whether via a form submission, javascript or any other type of "controller" that allows me to change the page on the viewers browser. PERL preferred but, PHP / javascript whatever works and is cross browser. I set up a simple javascript page forward timer that "works" but, page load times and conversation interruptions are a huge factor.
The entire tutorial website will be developed around this ability.
I was looking in to curl / cron / wget methods but, found little information.
I have seen forum and chat scripts that basically perform a similar task but, there must be a simple(ish) solution in leau of hacking up another script to suit my needs.
I do not want others to control the pages either. The site really, only needs to be accessable during the tutorial however, It "could" remain web accessable as long as user interaction was normal unless (being controlled).
The initial site concept is based on instructing people how to properly introduce new pets into a home. Will be operated by a veteranarian that saved my pets life. I wanted to give something back.
Possible? I really appreciate simple examples etc...
You have no other way but to keep polling the server for "instructions" using javascript. No, you can't send nothing to the end user browser, neither curl nor wget.
Mainly, you'll have to set up a simple request/response protocol between the browser and the server.
If you want to go deeper, you can use something like cometd/meteord/etc. If not, a hidden iframe that reloads himself and receives pages with javascript code for the needed actions can do the trick.
Another alternative.
With javascript dopolling and single character flatfile. Have a simple one character flatfile with a single var. Write it in perl (it is faster and uses less resources than php). The parent script calls a javascript variable in a flatfile. It hits the flatfile and goes wherever the var sets it. The flatfile is written to by the controller. Done.
I guess you could also rename an empty flatfile and use that as the controller. I am usure which is faster, open and read a specific file or hit the directory and return the file name. On the controller side, opening and writing to a file vs renaming a file. Maybe they counter each other in resources and time?
This way the site can act as a normal site. When you want to have remote users see a "presentation" (automatically being shown the site pages at the controllers pace), the controller activates polling and tells the viewers to push a start button. This allows a remote instructor to load pages for the viewers at his leisure.
It is a simple solution that works with nothing really sophisticated going on. No frames are needed either. Just need javascript enabled.
Any better suggestions are welcome!
It occurred to me that what you might want to use is HTML Push technology. Check out the wiki, they have several links. I have never used it myself