How can I group Header fields together to show counts for Browser Versions? - amazon-cloudwatch

I am parsing #headers to create a dashboard in Cloudwatch. I'd like to be able to show the counts of each browser version being used from the information contained in the logs. The browser information is in the User-Agent header and I'd like to be able to show all version 4 Chrome, version 5, version 6 etc. I'm not sure where to start with this one.

Related

Retrieve current chrome open page in html without saving it

I'm implementing a python script mainly based on pyautogui. One of the things the script does is to open a chrome webpage. After that I would need to access the DOM of this currently open webpage.
Since I've not opened the browser with selenium, I can't use it to analyze the DOM.
However, my question is: is this currently open chrome page available/saved somewhere in the hard drive so that I can access it with selenium? Like an .html file?
I checked many other questions here and users talk about chrome cache, but there are no html files there.
I just need to be able to access the current open page and not all the historical data in the cache.
Opening web browser directly with selenium is not an option either, since most of the websites analyzed have captchas and distil technology.
Thanks.
If you start the original chrome with --remote-debugging-port=PORT_NR argument, and visit localhost:PORT_NR from another browser, you will have access to the full content of the browser, including dev console.
Once you have this, you have multiple ways to go:
You can visit http://localhost:PORT_NR with with any other browser (or even with the same browser), and you should have full access to the content of the original Chrome. With Selenium you should have a relatively easy time to get by.
You can also use the devtools api (the documentation.. is.. well... there is room for improvement. Search for chrome devtools protocol to be amazed by the lack of docs). As an example you can get to http://localhost:PORT_NR/json to get the available debugging URIs. Grab the relevant websocket endpoint (webSocketDebuggerUrl). Open a websocket connection, and issue a command, like {"method": "DOM.getDocument", "id":12}. You can find available DOM related commands here: https://chromedevtools.github.io/devtools-protocol/1-3/DOM
Sice I had to reinvet the wheel I may give some extra info that I coudn't find anywhere:
Start the Browser with remote debugging enabled (see previous posts)
Connect to the given port on localhost and use these HTTP-GET-Requests to geta very limited control on your browser:
https://chromedevtools.github.io/devtools-protocol/#endpoints
Most important:
GET /json/new?{url}
GET /json/activate/{targetId}
GET /json/close/{targetId}
GET /json or /json/list
To gain full control over the browser, you need to use a "websocket" connection. Each Object in the GET /json or /json/list has it's own ID. Use this ID to interact with the tab. Btw: Type "page" are normal tabs, the other stuff are extentions and so on. Once you know which Tab you want to influence, get it's "webSocketDebuggerUrl".
Use this URL and connect with something that can speak the Websocket-protocol.
Once connected, you must craft a valid Json by the following structure:
{
"id":0,
"method":"Page.navigate",
"params":{url:http://google.com}}
}
Notes:
ID is a simple counter (int) that get bigger - not the ID of the tab(!)
Method is the method described in the docs params is also in the docs.
The return values are always JSONs.
From now on you can use the official docs:
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-navigate
Dunno how other ppl found out about it but it took a few hours to get it working. Probably cause everyone is just using python's selenium to do it.

Issue in submitting and updating the safari extension

I have my safari extension in the gallery.
Now I made few minor changes in my code and then resubmitted it for review (it takes a month to review, :( that is really a long time)
After almost a month, I received an email from safari , that my extension is rejected because of this Issue
Please strip the extended attributes (xattr -c path/to/extension.safariextension) and rebuild your Safari Extension package using the latest version of Safari.
I tried the above command but every time I face the same issue.
and also I am confused that if every time I update my code, I need to resubmit extension for review, then what is the use of this link:
https://developer.apple.com/library/archive/documentation/Tools/Conceptual/SafariExtensionGuide/UpdatingExtensions/UpdatingExtensions.html
also, when I was going through all this, I got an email from safari that my certificate (which is used in extension builder to create .extz file) is going to expire in a month
I created a new certificate and created a new extz file.
but don't know that do I need to resubmit my extension or just update it on webserver.
so basically I have 3 issues/ confusions
Do I need to resubmit extension everytime I update code or just updating extz on web server will work
what do I have to do for this:
"extension description within the Safari extension preference pane appears to be a placeholder"
Safari Extension certificate expires in a year, everyYear I need to resubmit extension or can update extz with new certificate on web server will work
Thanks in Advance
================update======================================
Submitting Extension removing extended attributes is actually resolved
by just running this command from MAC xattr -c path/to/extension.safariextension, and then rebuild and submit.
So, Question 2 is resolved , now I just want to clarify about other 2 parts of my question 1st and 3rd.
Had same issue with this placeholder -issue, fixed by adding/modifying proper Description in Information -section of Extension Builder.

Can I use Selenium with password-protected URLs?

I'm trying to use Selenium for fetching information automatically from an ADSL modem's status page.
To log in to the modem, it requires certain username + password combination. Unlike all the samples that I have found, this comes before fetching the page, and therefore is not the case of finding the right id and then 'typing' the text into them.
Does Selenium have support for reaching such access controlled pages?
If I understand you correctly, if you use the following url, it will work:
http://username:password#modemstatusurl/bar/foo

Script to download Google web history

How does one write a script to download one's Google web history?
I know about
https://www.google.com/history/
https://www.google.com/history/lookup?hl=en&authuser=0&max=1326122791634447
feed:https://www.google.com/history/lookup?month=1&day=9&yr=2011&output=rss
but they fail when called programmatically rather than through a browser.
I wrote up a blog post on how to download your entire Google Web History using a script I put together.
It all works directly within your web browser on the client side (i.e. no data is transmitted to a third-party), and you can download it to a CSV file. You can view the source code here:
http://geeklad.com/tools/google-history/google-history.js
My blog post has a bookmarklet you can use to easily launch the script. It works by accessing the same feed, but performs the iteration of reading the entire history 1000 records at a time, converting it into a CSV string, and making the data downloadable at the touch of a button.
I ran it against my own history, and successfully downloaded over 130K records, which came out to around 30MB when exported to CSV.
EDIT: It seems that number of foks that have used my script have run into problems, likely due to some oddities in their history data. Unfortunately, since the script does everything within the browser, I cannot debug it when it encounters histories that break it. If you're a JavaScript developer, use my script, and it appears your history has caused it to break; please feel free to help me fix it and send me any updates to the code.
I tried GeekLad's system, unfortunately two breaking changes have occurred #1 URL has changed ( I modified and hosted my own copy which led to #2 type=rss arguments no longer works.
I only needed the timestamps... so began the best/worst hack I've written in a while.
Step 1 - https://stackoverflow.com/a/3177718/9908 - Using chrome disable ALL security protocols.
Step 2 - https://gist.github.com/devdave/22b578d562a0dc1a8303
Using contentscript.js and manifest.json, make a chrome extension, host ransack.js locally to whatever service you want ( PHP, Ruby, Python, etc ). Goto https://history.google.com/history/ after installing your contentscript extension in developer mode ( unpacked ). It will automatically inject ransack.js + jQuery into the dom, harvest the data, and then move on to the next "Later" link.
Every 60 seconds, Google will force you to re-login randomly so this is not a start and walk away process BUT it does work and if they up the obfustication ante, you can always resort to chaining Ajax calls and send the page back to the backend for post processing. At full tilt, my abomination script collected 1 page a second of data.
On moral grounds I will not help anyone modify this script to get search terms and results as this process is not sanctioned by Google ( though not blocked apparently ) and recommend it only to sufficiently motivated individuals to make it work for them. By my estimates it took me 3-4 hours to get all 9 years of data ( 90K records ) # 1 page every 900ms or faster.
While this thing is going, DO NOT browse the rest of the web because Chrome is running with no safeguards in place, most of them exist for a reason.
One can download her search logs directly from Google (In case downloading it using a script is not the primary purpose),
Steps:
1) Login and Go to https://history.google.com/history/
2) Just below your profile picture logo, towards the right side, you can find an icon for settings. See the second option called "Download". Click on that.
3) Then click on "Create Archive", then Google will mail you the log within minutes.
maybe before issuing a request to get the feed the script shuld add a User-Agent HTTP header of well known browser, for Google to decide that the request came from that browser.

How do you access browser history?

Some e-Marketing tools claim to choose which web page to display based on where you were before. That is, if you've been browsing truck sites and then go to Ford.com, your first page would be of the Ford Explorer.
I know you can get the immediate preceding page with HTTP_REFERRER, but how do you know where they were 6 sites ago?
Javascript this should get you started: http://www.dicabrio.com/javascript/steal-history.php
There are more nefarius means to: http://ha.ckers.org/blog/20070228/steal-browser-history-without-javascript/
Edit:I wanted to add that although this works it is a sleazy marketing teqnique and an invasion of privacy.
Unrelated but relevant, if you only want to look one page back and you can't get to the headers of a page, then document.referrer gives you the place a visitor came from.
You can't access the values for the entries in browser history (neither client side nor server side). All you can do is to send the browser back or forward a number of steps. The entries of the history are otherwise hidden from programmatic access.
Also note that HTTP_REFERER won't be there if the user typed the address in the URL bar instead of following a link to your page.
The browser history can't be directly accessed, but you can compare a list of sites with the user's history. This can be done because the browser attributes a different CSS style to a link that hasn't been visited and one that has.
Using this style difference you can change the content of you pages using pure CSS, but in general javascript is used. There is a good article here about using this trick to improve the user experience by displaying only the RSS aggregator or social bookmarking links that the user actually uses: http://www.niallkennedy.com/blog/2008/02/browser-history-sniff.html