Can I use selenium to get to certain point and after that use request + lxml(or other package like scrapy) to make the process faster? - selenium

I have a web crawling task which the website needs the user to login and pass the Email 2FA authentication before getting to the main page. On the main page, I have a lot of detail pages(all in the same format) to visit and collect the data and store them in a CSV file.
Now I got no problem with getting to the main page. However, I'd like to optimise tasks after entering the main page, my current solution for this part is I use a for loop and visit each page and collect the data.
for each in page_id_list:
driver.get("https://...." + str(each))
driver.find_elements_by_xpath("...")
....
Functionally it works fine, but I was thinking if I can make the process faster. As selenium loads the whole web page and other packages like request + lxml only retrieve HTML maybe I should somehow use another way to visit those pages, and also I wonder if Scrapy will be useful in my case?
Thank you.

Related

Zap Vaadin setup issues

Being absolutely new to security testing, I am trying to run basic steps and then trying to run a spider and active scan. I have seen couple of videos from owasp & youtube and tried to make sense of the included ZAP documentation. However, I don't think ZAP is logged in while making the spider crawl or active scans.
Mine is a Java + Vaadin & Spring based application and the POST URL's don't change on making any kind of request. Just the request parameters change. POST URL's are always
http://example.com/UIDL/?v-uiId=0
I have used
Set Active Session under HTTP session, before running the spider/active scan
have setup the context(though very few URL's appear under Sites),
I have also tried to update structure parameter with words like "page", "UIDL" etc. which i could see in the URL's
I have also tried to exclude logout URL, but since all the POST URL's are the same, the spider and active scan don't really do anything, in that case.
On right clicking i got an option to select the request as Form based Authentication.
I have tried that as well, however, it populates the "Login Request POST data" with vaues like
12929ddf-5d7f-4264-b810-2fa8f38eca6f[["597","v","v",["pagelength",["i","28"]]],["597","v","v",["firstToBeRendered",["i","0"]]],["597","v","v",["lastToBeRendered",["i","26"]]],["597","v","v",["reqfirstrow",["i","15"]]],["597","v","v",["reqrows",["i","12"]]]]
and fills the username/password fields with exactly the same.
One last thing, do we really need to disable the XSRF to be able to use ZAP properly with Java/Vaadin applications?
I am simply trying to get it working and then continue learning from there. Would appreciate if anyone can please help me with the correct setup.

How to check in jmeter if entered fields remain same in the first page after navigating back from nth page

I want to test a page.Where i want to fill up the fields like first name last name etc.and after going two pages further if i come back to the original page by using back navigation ,data entered for first name and last name remains the same.or it is filled up.
In jmeter i want to check the same if data entered for the fields remain same if i navigate back .
How can i achieve this.
I tried gving url directly in the path its not happening since it is not the way.
please help me since i'm new to jmeter.
You need to understand 2 things. How JMeter works and how your application works.
JMeter only captures data that is communicated to server. It does not matter how data is entered from UI. It does not check if data retains in the fields or not. It only records the request that is sent by your application to server-side.
So, if you understand above, you also need to understand how your application sends data to server. Does it sends the request as you move from first page to second. Or does it send (Submit) data on final page.
Either way, JMeter is not a tool to test if your form fields are retaining data in them as you navigate between pages. As mentioned earlier it only monitors data requests/responses.
Selenium seems a better option for your test requirement.
Please read the apache documentation carefully:
JMeter is not a browser. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages. Nor does it render the HTML pages as a browser does (it's possible to view the response as HTML etc., but the timings are not included in any samples, and only one sample in one thread is ever viewed at a time).
First, you have to understand how JMeter works!!! To do the Functional Testing, Selenium would be a good choice.
Thanks

Script to download Google web history

How does one write a script to download one's Google web history?
I know about
https://www.google.com/history/
https://www.google.com/history/lookup?hl=en&authuser=0&max=1326122791634447
feed:https://www.google.com/history/lookup?month=1&day=9&yr=2011&output=rss
but they fail when called programmatically rather than through a browser.
I wrote up a blog post on how to download your entire Google Web History using a script I put together.
It all works directly within your web browser on the client side (i.e. no data is transmitted to a third-party), and you can download it to a CSV file. You can view the source code here:
http://geeklad.com/tools/google-history/google-history.js
My blog post has a bookmarklet you can use to easily launch the script. It works by accessing the same feed, but performs the iteration of reading the entire history 1000 records at a time, converting it into a CSV string, and making the data downloadable at the touch of a button.
I ran it against my own history, and successfully downloaded over 130K records, which came out to around 30MB when exported to CSV.
EDIT: It seems that number of foks that have used my script have run into problems, likely due to some oddities in their history data. Unfortunately, since the script does everything within the browser, I cannot debug it when it encounters histories that break it. If you're a JavaScript developer, use my script, and it appears your history has caused it to break; please feel free to help me fix it and send me any updates to the code.
I tried GeekLad's system, unfortunately two breaking changes have occurred #1 URL has changed ( I modified and hosted my own copy which led to #2 type=rss arguments no longer works.
I only needed the timestamps... so began the best/worst hack I've written in a while.
Step 1 - https://stackoverflow.com/a/3177718/9908 - Using chrome disable ALL security protocols.
Step 2 - https://gist.github.com/devdave/22b578d562a0dc1a8303
Using contentscript.js and manifest.json, make a chrome extension, host ransack.js locally to whatever service you want ( PHP, Ruby, Python, etc ). Goto https://history.google.com/history/ after installing your contentscript extension in developer mode ( unpacked ). It will automatically inject ransack.js + jQuery into the dom, harvest the data, and then move on to the next "Later" link.
Every 60 seconds, Google will force you to re-login randomly so this is not a start and walk away process BUT it does work and if they up the obfustication ante, you can always resort to chaining Ajax calls and send the page back to the backend for post processing. At full tilt, my abomination script collected 1 page a second of data.
On moral grounds I will not help anyone modify this script to get search terms and results as this process is not sanctioned by Google ( though not blocked apparently ) and recommend it only to sufficiently motivated individuals to make it work for them. By my estimates it took me 3-4 hours to get all 9 years of data ( 90K records ) # 1 page every 900ms or faster.
While this thing is going, DO NOT browse the rest of the web because Chrome is running with no safeguards in place, most of them exist for a reason.
One can download her search logs directly from Google (In case downloading it using a script is not the primary purpose),
Steps:
1) Login and Go to https://history.google.com/history/
2) Just below your profile picture logo, towards the right side, you can find an icon for settings. See the second option called "Download". Click on that.
3) Then click on "Create Archive", then Google will mail you the log within minutes.
maybe before issuing a request to get the feed the script shuld add a User-Agent HTTP header of well known browser, for Google to decide that the request came from that browser.

jMeter simulate user's progress through site

I'm a newbie to jMeter, so please bear with me.
I've been assigned the task of testing how an e-commerce website responds under load. I've managed to set up basic tests in jMeter that basically just repeatedly visit the home page, but I'd like to simulate something a bit more realistic:
User arrives on home page
User goes to catalogue page
User views product
User adds product to cart
User returns to catalogue, selects another product, adds to cart
User removes first product from cart
User proceeds to checkout
User completes checkout process.
I'm having trouble finding adequate documentation to explain how to do this. I figured out that I need a cookie manager in my test so that the user session will be maintained, but I haven't figured out how to get the user to traverse the site in a realistic use pattern (such as the one described above). Can anyone help out with this, give me some pointers as where to look for good examples, etc?
This should be no problem, record or manually create the necessary steps as HTTP Samplers, then add them into a Runtime Controller for example to execute them iteratively.
The individual steps will be executed in the order they are in the tree and, in case Cookies are used to handle session state, you might need to add the Cookie Manager to the top of the tree which will handle cookie headers for each user.
Add some timers to simulate user's think time and scale up by increasing the number of virtual users in the thread group.
Use some listener like the Aggregate Report to view the response times for every step.
Try to read http://jmeter.apache.org/usermanual/index.html at first.
Also you'll encounter the problem that Jmeter can't process dynamic pages:
http://wiki.apache.org/jmeter/JMeterFAQ#Does_JMeter_process_dynamic_pages_.28e.g._Javascript_and_applets.29
Does JMeter process dynamic pages (e.g. Javascript and applets)?
No. JMeter does not process Javascript or applets embedded in HTML pages.
JMeter can download the relevant resources (some embedded resources are downloaded automatically if the correct options are set), but it does not process the HTML and execute any Javascript functions.
If the page uses Javascript to build up a URL or submit a form, you can use the Proxy Recording facility to create the necessary sampler. If this is not possible, then manual inspection of the code may be needed to determine what the Javascript is doing.

How to speed up Google Translate

I have a web page that has 70000 characters. As you know when doing translation through Google API you can only send up to 5000 characters at a time. Which means I have to send data to Google 14 times (70000/5000) which takes a lot of time and then my page is displayed. Is there a way to speed up the process?
Thanks
have you tried caching the translation?
If you were using some AJAX framework (you don't mention what your web page is created with eg c#) then you can make it faster by making the API call via the AJAX framework.
It would look something like this (psuedo-code since we don't know what you are using):
Serve web page (almost instant)
Web page starts AJAX call:
Break text into chunks
Foreach chunk
Translate via API
Append to the page
This way the user will see the page immediately, and will also see the translation appear piece by piece as it is processsed instead of having to wait until the end.
My best bet would be to generate a page in one language, then ask google to translate it trough HTTP and display result as your own, to make it seamless for user. I believe that is what Google Chrome does when translating web pages.
Example of URL that makes Google translate the whole web page:
http://translate.google.com/translate?hl=en&sl=ru&tl=en&u=http%3A%2F%2Flinux.org.ru%2F
Of course, another option is to use Google Translate API and cache result if page content is not changing frequently.
go to the Javascript file in Google, it will lead you also to the CSS file, make a file or perhaps two, or you may be able to add CSS to your own, now make Javascript page on your web site in own directory. make a nip of code to update the Javascript code every so many seconds or minutes, and this will make the transition much faster, just by refreshing the content they give.. have fun :) also ultimately you can also send a request at the same time as the first one to translate after char 5000 which should be relatively easy to do.