How can I extract data behind a login page using import.io - import.io

I need to crawl some data that sits behind a login page. To be able to scrap it I need a tool that is able to login and then crawl the pages behind it. Is it possible to do this behind import.io?

Short version: yes, it is.
Longer version:
There are at least two ways, both require you to sign up and download the desktop app (all free)
Extractor version (simpler):
Point the browser to the page where the login is. Login normally, then train your API to extract the data you need. The downside of using this method is that it will only work as long as you are logged in. If you want import.io to login for you you'll need the..
Authenticated version:
As above, but create an authenticated API. This will record for login procedure and execute it for you every time you execute the API

Since the chosen answer doesn't work anymore :( I recommend Cloudscrape. You will get a free trial with 20 hours of crawling and/or scraping if you sign up. For data behind a login you will need a scraper.
Handy tutorials
Tutorial for logging in with scraper.
Tutorial for pagination.

Related

Verification Google OAuth2 concert scren with the apps for personal use only

I recently asked this question and user's #DalmTo and #Sergio NH they gave me an exhaustive answer for which I thank them very much.
Moving forward to question, we started publishing the application, and its verification was not required, since no scope was added (here it is a little unclear why the requests worked in an application with a test mode in which these scope were not added (google drive, google sheet and google ads)).
However, this time the application in the "In Production" mode began to give us an "Unverified app screen" (see Unverified app screen). We decided that we still need to add scope to the list, and, of course, that the scope list (their list is described above) requires verification by Google.
We started filling in the necessary fields, while studying the Google documentation at the same time, and came across the following information (see block Verification process -> What are the requirements for verification?):
Apps not applicable for verification
Apps for internal use only
(single domain use) Apps for personal use only Apps that are Gmail
SMTP plugins for WordPress Apps that are in development or
staging/testing
Apps for personal use only
And this is just our case: we have already received permission from Google Ads and are just generating simple reports that we want to integrate with Google Sheet. I.e., this is an elementary script that works within this account (however, we still need to request the first concert screen, even for this developer account) and cannot be distributed to any other accounts.
But when adding our scope, Google requires us to pass verification, forcing us to fill in the required fields, in the form of domains and their verification via the Search Console (we have already done this and this stage does not cause difficulties) and links to Youtube videos - where we must show how scope is used.
And just this stage is not clear. We do not allow other people's accounts to connect to this application, and the software does not have any interface, it is just a script that receives data from Google Ads and saves it to Google Sheet (creating a file via Google Drive). We have described all this in the scope usage description field. But the link to the Youtube video is require field, and we sincerely do not understand why (considering our case) we should record something, and most importantly, what exactly we should record in this case. If the documentation itself says that in our case we do not even need a verification.
Maybe we did not understand something and now we are doing it wrong? We will be glad to receive any tips from experts working with Google Cloud Console and apologize in advance for broken English.
We also apologize in advance to the StackOverflow community that we have to publish such elementary (which we are absolutely sure of from our side) questions here. We come here from Google Cloud Console - > Support - > Community support, and we must first try to publish posts in the Google Groups specified there, but they simply do not answer us, apparently considering our questions too elementary and not worthy of attention (however, these same questions in Google Groups are moderated) (for example, the previous question). And we are no longer able to contact any other support. Once again, we apologize for having to ask about this here.
It is true that if your app is a single use app then you do not need to be verified.
However if you don't get your app verified then there will be some restrictions.
you will see the unverified app screen
your refresh tokens will probably only be good for two weeks.
In the case of the YouTube api uploaded videos will be suck private.
If you can live with those points then you don't need to verify your app and you can continue as is.
If on the other hand you don't want to see the unverified app screen and you want a refresh token that will last longer then two weeks. You will need to verify your app. Yes, Even if your app is a console application running as a job some where you still show the consent screen. This is the YouTube video you will need to show Google. Show the consent screen popping up show the URL bar and then show your script running. You also need to set up the homepage and privacy policy screens. Yes i 100% agree with you that this is silly.
When you go though the process. Explain to google that this is a single use script running as a job some where.
Unfortunately when Google changed it so that Refresh tokens expire for unverified apps they pretty much tied the hands of all developers who are running such single user scripts. We now have to get our apps verified if we don't want to have to request a new refresh token every two weeks.
If your program needs to access the requested scopes of the Google account privacy, even though the user is yourself, you also need to provide a youtube video to demonstrate how you use this program. The auditor cannot guarantee whether you will make this program public.

Handling SEO on Isomorphic React

i'm using React & Node JS for building universal apps (). I'm also using react-helmet as library to handle page title, meta, description, etc.
But i have some trouble when loading content dynamically using ajax, google crawler cannot fetch my site correctly because content will be loaded dynamically. Any suggestion to tackle this problem?
Thank you!
I had similar situation but with backend as django, but I think which backend you use doesnt matter.
First of let me get to the basics, the google bots dont actually wait for your ajax calls to get completed. If you want to test it out register your page at google webmaster tools and try fetch as google, you will see how your page is seen by bots(mine was just empty page with loading icon), so since calls dont complete, not data and page is empty, which is bad for SEO ,as bots read text.
So what you need to do, is try server side rendering. This you can do in 2 ways either you prerender.io or create templates on backend which are loaded when the page is called for the first time, after which your single page app kicks in.
If you use prerender its paid but pre-render internally uses phantom.js which you are you can directly use. But it did not work out really well for me so I went with creating templates on the backend. So the bots or the user when come to page for first time(or first entry) the page is served from backend else front end.
Feel free to ask in case in any questions :)

Google Analytics APIs: get two specific numbers only (as simple as possible)

I want to get two statistics (Visits today, visits total) from my GoogleAnalytics account.
I checked Google Analytics resources such as
https://developers.google.com/analytics/devguides/reporting/?hl=en
https://developers.google.com/analytics/devguides/reporting/core/v3/reference
But it seems pretty time-consuming to get a certain ID, oAuth and everything working.
I do not need any user authentication, just an API request from my backend (GA authentication should be provided via request url for example).
To be honest, I found myself jumping from one link to another when doing tutorial and did not accomplish anything at the end.
What is the quickest way to get everything working? If there is a nice tutorial on getting JUST basic (two numbers) stuff from GoogleAnalytics I would be very grateful (everything I see is working almost as GA itself - just with custom styles/graphs etc. I need plain and simple number returned via REST api for instance.)
Thanks for any info!
Auth is understandably complicated, but it sounds like you need service account authentication since you are querying your own data and need it to run on the back end.
The quickest way to get from zero to querying the API is to follow the Hello Analytics Guide. I have linked you to the PHP service account page. But there are examples of service accounts, web apps, and installed apps in four different languages.
But an outline of the main steps are
Create a project in the Google developer console
Within that project create a service account and download the p12 file.
add that service account email to the particular view you wish to query.
You are now ready to modify the Hello Analytics example.
Below is a simple query for the number sessions today:
function getResults(&$analytics) {
return $analytics->data_ga->get(
'ga:XXXX', // Replace with your view ID.
'today',
'today',
'ga:sessions');
}
Feel free to ask any clarifying questions in the comments section.

Tracking query strings in Shopify, using webhooks and admin options

I was redirected here by Shopify support. I have three main questions for a project I'll be working on and wanted to see how possible some of the things would be.
We are looking to develop a plugin for use with Shopify to track purchases through the use of a link shortener (to see which link referred what purchases, etc.). I have a few questions that I'm not 100% sure on even after reading through the documentation.
The first problem that I seem to have is tracking the query string that the link shortener appends to the URL once it redirects. For this service, they use "?visit_id={hash}" and I need to be able to access this--at the very least on the "Thank You" page after an order. I saw in the docs that there is "landing_page_ref" (http://wiki.shopify.com/Order#landing_site_ref) but considering our query string is "visit_id" instead of one of the acceptable parameters, how would I be able to use that query string?
Lastly, I just have a question about how webhooks work with plugins that are on the app store. I know I can just call webhooks to wherever I want, like my personal server, but if this app gets onto the app store, I obviously don't want to hook everything to my own server. Is there a way to make it run on the store itself, and which URL should I use?
Lastly, what is the preferred method for handling configuration options for the plugin? Is there a way to hook into the admin backend or would all configuration have to be in a file within the plugin?
Thanks,
Andrew
I'll do my best to answer these for you. It sounds like you're used to building plugins for something like Wordpress - Shopify apps are a bit different.
You can't access anything on the thank you page for the order.
The thank you page/checkout process goes through a secured Shopify page that you don't have access to - so if you want information about what your URL shortener attached to the store pages, you'll need to retrieve it while they're on the page (using something like a ScriptTag + Javascript to track the query string), or hope that it's inside the Order when you retrieve it later (using the API or a webhook).
Webhooks need to talk to a server you run.
They send the information to you, and then you process it and deal with it. If you want to use webhooks, you will need to run a server with your app on it for the webhooks to talk to.
You manage your own config.
Because you're running your own server to handle those webhooks, you handle configuration for your plugin there. The apps I've worked on typically have their own database for managing configuration options, as well as an admin panel to manage them (it's what the user accesses when they click 'Log Into [Your App]' on the "Manage Apps" screen).
You'll need to run your own server to host your Shopify app.

Do any Google Voice APIs still work?

I am interested in writing a simple CLI program that will send an SMS using Google Voice.
There are several scripts and an API or two available, but I have run into an issue that none of them seem to work any longer; as they mostly rely on parsing returned web pages.
Is anyone familiar with a current API that works so that I can send an SMS on Google Voice?
Thanks!
Looks like Google Voice changed the login procedure, and it now expects you to pass back a cookie. This issue for the Python wrapper sums it up: http://code.google.com/p/pygooglevoice/issues/detail?id=60
UPDATE AND FIX: Actually all that needs to happen is to change the login URL in your code:
Old URL - https://www.google.com/accounts/ServiceLoginAuth?service=grandcentral
New URL - https://accounts.google.com/ServiceLogin?service=grandcentral