Injected scripts not firing for Google results pages - safari

As a newbie to Safari Extensions I have what I am sure is a terribly trivial question ...
Here goes: I'm building an extension to work with some search engines. To cut a long story short I have boiled my issue down to its simplest form. I have an injected script (an end script). This fires as planned on the Google homepage. But when I enter a query to Google the script does not fire when the subsequent results page loads.
For example, to keep things really basic for testing, I created a simple script that just writes to the console, and I've set the access level to All so that it fires for all pages. I can see the console message when I open the Google homepage, but I don't see it when the subsequent results pages load.
For all intents and purposes it seems as if the transition from Google homepage to results page is not a normal one (that is, not a conventional page load) and does not cause injected scripts to fire. I've only seen this problem on Google so I assume it is something to do with their page loading mechanism. I've tried it with Google Instant on and off and both produce the same behaviour.
It's one of those problems that seems so basic as to be stupefying! Please help.

Related

Python Scrapy Splash doesn't render website, stuck at loading screen

I would like to render the following website with Scrapy Splash.
https://m.mobilebet.com/en/sports/football/england-premier-league/
Unfortunately, Splash always gets stuck at the loading screen.
I have already tried using a long waiting time (up to 60 seconds) with no results. My Splash version is 3.3.1 and obeying robots.txt has been turned off (ROBOTSTXT_OBEY = False).
Thanks!
There's not quite enough info to answer, but I've got a good guess.
You see, the major difference between Splash and your browser is the user agent string. You have one that looks like a person. Splash generally doesn't.
This kind of infinite loading is a method used by sites to mitigate repetitive load. Often when you're developing locally without a proxy you'll trip these issues. They are quite maddening to develop against because they're inconsistent.
Your requests are just getting dropped; you'll probably see a 403 after 5-10 minutes.
I think it's likely you can solve this issue with the method mentioned in this answer: Scrapy+Splash return 403 for any site.
I don't think it'll be possible - this website needs JS to be rendered. So you'll need to use something like Selenium to scrape information from it.
Also, perhaps what you are looking for is an API for that information, since scraping it from a website can be very inefficient. Try googling "sports REST API" and look for one with a Python SDK.
OK, so it seems Splash is supposed to render the JS for you. But I wouldn't rely on it too much: those websites change constantly and are developed against the latest browsers. Your best bet is to use Selenium with the Chromium driver (though using an API is much preferable).

Causes of duplicate apache POST requests, other than double submission of form?

This might sound like a question that gets asked frequently but I am not looking for solutions to handle duplicate requests. I just want to know what could cause Apache to receive duplicate requests in the first place.
I have been running into a rather sporadic problem. I have a form that does a POST request on submit, but the request can somehow get duplicated just a second later (according to the access logs). This used to be a more frequent problem because we were not handling it as gracefully, so I put in some client-side code to disable the submit button during the form submit event. This prevents double submission (as long as JavaScript is enabled, obviously), but the problem still occurs at seemingly random times. One thing I noticed from the logs is that the clients that cause the issue are Android phones running Chrome. Does mobile Chrome do funky things like retry POST requests on its own? When testing it myself, I could never get duplicate POST requests to occur unless I removed the JavaScript code that disables the submit button.
Web server setup is super simple. No load balancing or anything, just a single server running Apache 2.2.15. We use PHP 5.6 but that probably has nothing to do with this.
I guess users are double-clicking rather than clicking, and the application they use turns every click into a new POST request. I'd look into the application design here.
Usually I use frameworks that totally cover this and thus can only guess. Clicking the button should not only trigger the POST request but also disable the button while the action is in progress. So JavaScript code could look like
disable button
post the data
enable button
If, due to the POST, the browser navigates to another page this would not be harmful at all.
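The three steps above might be sketched roughly as follows. This is only an illustration: submitOnce, button, and sendForm are hypothetical names, and sendForm stands for whatever function actually performs the POST and calls done() when it finishes. The in-flight flag is an extra assumption, added so that a second click is ignored even if the disabled state of the button is not honoured.

```javascript
// Hypothetical helper: wraps a submit action so it cannot run twice at once.
// `button` is any object with a boolean `disabled` property (e.g. a <button>).
// `sendForm(done)` is an assumed function that performs the POST and calls
// `done()` once the request has finished (successfully or not).
function submitOnce(button, sendForm) {
  let inFlight = false;

  return function handleSubmit() {
    if (inFlight) {
      return false; // a request is already running: ignore this click
    }
    inFlight = true;
    button.disabled = true; // 1. disable button

    sendForm(function done() { // 2. post the data
      inFlight = false;
      button.disabled = false; // 3. enable button
    });
    return true;
  };
}
```

In a browser, the returned function could be attached as a click or submit listener. Note that this only guards against repeated clicks in the same page; it cannot stop the browser itself from re-sending a request.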
EDIT: Seeing you did exactly what I suggested, maybe there is another cause.
Suppose users POST their data, and then the screen goes dark, or they switch applications. When they reactivate the browser, is it possible the browser reloads the page by repeating the last request?
I think frameworks cover such situations by responding to the POST with a redirect, so that the browser then retrieves the result via GET. Since GET is idempotent, it can be repeated without further damage.
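The redirect-after-POST idea is commonly called Post/Redirect/Get. The question uses PHP on Apache, so the following Node-style handler is only a sketch of the pattern, not of the asker's setup. The names handleRequest and saveOrder and the /orders URL layout are hypothetical, and reading the request body is left out.

```javascript
// Sketch of Post/Redirect/Get as a Node-style (req, res) handler.
// `saveOrder` is an assumed function that stores the submission and returns
// its id; the "/orders" URLs are made up for illustration.
function handleRequest(req, res, saveOrder) {
  if (req.method === "POST" && req.url === "/orders") {
    const id = saveOrder(); // the state-changing work happens here, once
    // 303 See Other: the browser fetches the result with a GET instead of
    // staying on a page whose "last request" is the POST.
    res.writeHead(303, { Location: "/orders/" + encodeURIComponent(id) });
    res.end();
    return;
  }

  if (req.method === "GET" && req.url.startsWith("/orders/")) {
    // Reloading or restoring this page only repeats the harmless GET.
    res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
    res.end("Order " + req.url.slice("/orders/".length));
    return;
  }

  res.writeHead(404, { "Content-Type": "text/plain; charset=utf-8" });
  res.end("Not found");
}
```

With this shape, a browser that restores the tab after the screen went dark would re-request the GET URL rather than replay the POST.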

Handling SEO on Isomorphic React

I'm using React and Node.js to build universal apps. I'm also using react-helmet as the library to handle the page title, meta tags, description, etc.
But I have some trouble when loading content dynamically using Ajax: the Google crawler cannot fetch my site correctly because the content is loaded dynamically. Any suggestions for tackling this problem?
Thank you!
I had a similar situation, but with Django as the backend. I don't think the choice of backend matters, though.
First, the basics: Google's bots don't actually wait for your Ajax calls to complete. If you want to test this, register your page at Google Webmaster Tools and try Fetch as Google; you will see how your page is seen by bots (mine was just an empty page with a loading icon). Since the calls don't complete, there is no data and the page is empty, which is bad for SEO, as bots read text.
So what you need to do is try server-side rendering. You can do this in two ways: either use prerender.io, or create templates on the backend which are loaded when the page is requested for the first time, after which your single-page app kicks in.
Prerender is a paid service, but internally it uses PhantomJS, which you can also use directly. It did not work out very well for me, though, so I went with creating templates on the backend. So when a bot or a user comes to the page for the first time (the first entry), the page is served from the backend; after that, from the front end.
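The "templates on the backend" approach described above could look roughly like this. It is a minimal sketch with plain string building, not the answerer's Django templates: renderInitialPage, escapeHtml, the articles shape, and window.__INITIAL_DATA__ are all hypothetical names. In a React app, renderToString from react-dom/server would typically take the place of the hand-built markup.

```javascript
// Minimal sketch: build the first response on the server so that crawlers
// receive real text instead of an empty shell with a loading icon.
// All names here are hypothetical.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderInitialPage(title, articles) {
  // Content that would otherwise arrive later via Ajax is rendered up front.
  const items = articles
    .map((article) => "<li>" + escapeHtml(article.headline) + "</li>")
    .join("");

  // Embed the same data so the single-page app can take over without
  // refetching. "<" is escaped so the JSON cannot close the script tag early.
  const state = JSON.stringify(articles).replace(/</g, "\\u003c");

  return (
    "<!doctype html><html><head><title>" + escapeHtml(title) + "</title></head>" +
    '<body><ul id="app">' + items + "</ul>" +
    "<script>window.__INITIAL_DATA__ = " + state + ";</script>" +
    "</body></html>"
  );
}
```

The server would send this string for the first request to a URL; subsequent navigation can stay on the client as before.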
Feel free to ask if you have any questions :)

jquery tabs taking long time to load due to Google OAuth2

I recently moved to Google OAuth2 authorisation for my webpage, which has multiple jQuery tabs.
Clicking a tab triggers a GET request. After shifting to Google OAuth2, the time to load these tabs has increased at least 5x. Previously I used server-side Apache authorisation. I am assuming the GET request is being authorised by Google before the data is rendered on the page (which might be causing this latency). Is there some way to solve this problem?
One thing I tried was loading each tab serially, using this solution, but that didn't work well. I think there could be a better way to solve this.

Testing progressively enhanced features with Capybara

I'm using Capybara to test features on a progressively enhanced website. Let's say my feature is to navigate around a hierarchy of locations. The non-javascript version involves getting a new version of the page when we click around on different locations. The enhanced javascript version opens up hidden elements, or loads up new information via Ajax.
I start by writing a test for the non javascript version, which looks something like this:
When I visit the page for "UK"
And I click "London"
Then I should see the information for "London"
Using the default mechanize driver, the test fails, I develop the feature, then the test passes.
I then create an identical test for the JavaScript version, tagged with @javascript. It runs the test with the JavaScript driver, and that test passes because the feature has already been implemented (it's running through the non-JS flow). However, I want the JavaScript version of the test to fail at this point, because the feature has not yet been enhanced with JavaScript.
So I'm looking for a good strategy for determining whether or not a whole new page has come from the server, and making sure both versions of the feature work. (I plan on integrating this with pushState so testing for a changed URL won't do)
I'm interested to hear other people's opinions on this. I'm not convinced Cucumber is the right tool for the job, since you're describing features from the perspective of user interaction, and it sounds like your implementation of progressive enhancement will result in essentially the same user interaction.
That aside, I think you may want to consider building in some kind of testing hook to the page itself to help with this. Hard to say what without knowing your exact situation, but maybe one of:
* The script-enhanced version of the page could add some element to the DOM, indicating that the enhancements are active, or indicating that the data came from an AJAX request rather than a page load.
* You could generate a random page identifier (from the server) on every load (e.g. a new GUID), embed this into the DOM and assert that it hasn't changed after the interaction (on the enhanced version). This would be a very simple way of achieving your stated goal ("determining whether or not a whole new page has come from the server").
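The page identifier idea could be sketched as below. The question's tests are written for Capybara, so this JavaScript is only an illustration of the mechanism; stampPage, readPageLoadId, isSamePageLoad, and the data-page-load-id attribute are hypothetical names, and the identifier is passed in rather than generated so the sketch stays independent of any particular GUID library.

```javascript
// Sketch of the "page load identifier" hook. All names are hypothetical.

// Server side: stamp every full page response with a fresh identifier.
function stampPage(bodyHtml, newId) {
  return '<body data-page-load-id="' + newId + '">' + bodyHtml + "</body>";
}

// Test side: read the identifier back out of the markup. In a browser the
// same value is available as document.body.dataset.pageLoadId.
function readPageLoadId(html) {
  const match = /data-page-load-id="([^"]+)"/.exec(html);
  return match ? match[1] : null;
}

// True when no new page came from the server between two observations.
function isSamePageLoad(htmlBefore, htmlAfter) {
  const before = readPageLoadId(htmlBefore);
  return before !== null && before === readPageLoadId(htmlAfter);
}
```

In the enhanced (JavaScript) scenario the test would expect the identifier to stay the same after clicking "London"; in the plain scenario it would expect a different one.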
Why does your JavaScript "enhanced" version work the same as your non-enhanced one? Your Cucumber tests using @javascript should be testing to ensure the enhancements work.
For instance,
* if the javascript opens a modal dialog instead of following a link to a new page, test for that.
* if the javascript submits a form and updates a value, test for that.
These tests would fail if run without javascript support.