Import.io returns empty columns for a JavaScript-enabled API

I have searched here and couldn't find any answers. Some columns of an import.io API are not returning any data. The data is behind JavaScript: during training the columns return data, but during bulk extract or crawling they are empty. I mailed import.io support, and they asked me to post my question here. Can anyone help me?

If you look at the HTML of the page, you can see attributes on the element near where "days to go" is displayed, even without JS.
e.g:
data-hours-remaining="532.6704760581918"
and:
data-end_time="2016-04-13T16:00:00-04:00"
I believe this corresponds to the "days to go" text.
Use a custom XPath, e.g. .//*[@id='project_duration_data']/@data-hours-remaining, to get the data you want from these attributes. You can then post-process them into whatever format you want: days, weeks, etc.
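The post-processing step can be sketched in Python. This is only an illustration under stated assumptions: the sample markup is a hypothetical snippet built from the two attributes quoted above, and the names DurationParser and hours_to_days are invented for this example. It uses the standard library's HTML parser rather than import.io itself.

```python
from html.parser import HTMLParser


class DurationParser(HTMLParser):
    """Collects data-* attributes from the element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id") == self.element_id:
            # Keep only the data-* attributes of the matching element.
            self.data = {k: v for k, v in attr_map.items() if k.startswith("data-")}


def hours_to_days(hours_text):
    """Convert the attribute's hour count (a string) to fractional days."""
    return float(hours_text) / 24.0


# Hypothetical markup modeled on the attributes quoted in the answer.
sample_html = (
    '<div id="project_duration_data" '
    'data-hours-remaining="532.6704760581918" '
    'data-end_time="2016-04-13T16:00:00-04:00"></div>'
)

parser = DurationParser("project_duration_data")
parser.feed(sample_html)
days_left = hours_to_days(parser.data["data-hours-remaining"])
print(round(days_left, 1))
```

The same conversion applies to a value extracted by the custom XPath: divide the hours by 24 for days, or by 168 for weeks.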

Related

How to grab Xpath query in Googlesheet IMPORTXML function?

I'm trying to grab a value from a link (https://www.valueresearchonline.com/stocks/1764/infosys-ltd?utm_source=direct-click&utm_medium=stocks&utm_term=&utm_content=Infosys&utm_campaign=vro-search#snapshot). This is the relevant part of the page:
Essential Checks
Altman Z-Score
I've made the following query to try to work with that HTML:
=IMPORTXML($A$2,"//*[@id='z-score']/div/div[2]/div/div")
A2 holds the relevant URL.
I think the XPath is correct, but I'm not sure why it won't give me the result.
According to the IMPORTXML documentation:
IMPORTXML imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.
Therefore, the =IMPORTXML() function you are using reads only the static HTML source of the page; it does not execute any JavaScript associated with it.
Since the website you are trying to import data from builds its content dynamically, the value you want is not in that source, and the results are not the expected ones. In this case, unfortunately, IMPORTXML() cannot be used.

How to use exactTerms and excludeTerms with Google Custom Search JSON API

I've been working with Google Custom Search API and faced some inconveniences I hope you can help me with.
Google Custom Search API offers parameters for searching by an exact text and for excluding a text from the results: exactTerms and excludeTerms. However, the q parameter is mandatory and cannot be omitted, so if I want to search only by a specific text, I just can't.
So how can I do a query using JSON API that contains specifically the text I want? Does the q parameter work as the search form in Google?
If I want results including 'foo', should I do this:
service.cse().list(cx=const.SEARCH_ENGINE_KEY, q='"foo"').execute()
or this?:
service.cse().list(cx=const.SEARCH_ENGINE_KEY, q=None, exactTerms='foo').execute()
Thank you in advance for your time.
Given the success of the answers (hehe), I'm posting my own conclusions. If you have any facts regarding the original question, please post them.
I've been testing some calls to the Google CSE API, and it looks like you can pass to the q parameter the same query you'd type into the text field on Google's main page. So (at least for my needs) you don't need exactTerms and excludeTerms to get what I was trying to achieve.
Anyway, as I said before, if you know how to work with these parameters I'm sure everybody will thank you.
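A minimal sketch of that conclusion: build the q string with the same operators you would type into the Google search box (quotes for an exact phrase, a leading minus to exclude). It assumes, as observed above but not confirmed by documentation quoted here, that q accepts these operators. The helper name build_cse_query is hypothetical.

```python
def build_cse_query(terms="", exact_phrases=(), excluded=()):
    """Compose a search-box style query string for the q parameter."""
    parts = []
    if terms:
        parts.append(terms)
    # Exact phrases are wrapped in double quotes.
    parts.extend('"{}"'.format(phrase) for phrase in exact_phrases)
    for term in excluded:
        # Multi-word exclusions are quoted so the minus applies to the phrase.
        parts.append('-"{}"'.format(term) if " " in term else "-" + term)
    return " ".join(parts)


query = build_cse_query(exact_phrases=["foo"], excluded=["bar"])
print(query)
```

The resulting string would then be passed as q in the call from the question, e.g. service.cse().list(cx=const.SEARCH_ENGINE_KEY, q=query).execute().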

How to dynamically extract data from dropdown lists or multiple textboxes using import.io

I am making an API in which I want to dynamically get data from the site http://transportformumbai.com/mumbai_local_train.php
Depending on the start and end station and the timings, I want to get the list of all available trains, along with the table shown when clicking an entry in the "view route" column.
I am using an import.io Connector. It works well with a single textbox, but not with multiple textboxes or dropdown lists.
Can anyone guide me on what I should do next?
Apart from import.io, is there any other alternative?
I am a newbie working with crawlers, so please explain your answer.
What is web scraping? Do I have to use a web scraper?
Thank you.
Actually, if you look in the URL bar, the parameters for destination and time are defined there, so you don't need to worry about dropdown menus or about using a Connector.
Use an Extractor on this page:
http://transportformumbai.com/get_schedule_new.php?user_route=western&start_station=khar_road&end_station=malad&start_time=00&end_time=18
Train it to get every column - note that the view route column contains links.
You can create a separate Extractor for the "view route" page:
http://transportformumbai.com/view_route_new.php?trainno=BYR1097&user_route=western&train_origin=Churchgate&train_end=Bhayandar&train_speed=S
Now you should "Chain" the second Extractor to the first one and it will pull that information from every link on the first one.
If you want to choose different destinations and times, just change the URL parameters of the original link.
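Changing the URL parameters can be scripted. The sketch below rebuilds the schedule URL quoted above from its parts; the parameter names are taken from that URL, while the function name schedule_url is made up for this example. It only constructs the URL and does not fetch the page.

```python
from urllib.parse import urlencode

BASE_URL = "http://transportformumbai.com/get_schedule_new.php"


def schedule_url(route, start_station, end_station, start_time, end_time):
    """Build a schedule URL from the query parameters seen in the answer."""
    params = {
        "user_route": route,
        "start_station": start_station,
        "end_station": end_station,
        "start_time": start_time,
        "end_time": end_time,
    }
    return BASE_URL + "?" + urlencode(params)


url = schedule_url("western", "khar_road", "malad", "00", "18")
print(url)
```

Each generated URL can then be given to the first Extractor in place of the original link.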
http://support.import.io/knowledgebase/articles/613374-how-do-i-get-data-behind-dropdown-menus
Your best bet here seems to be to have an API for every URL combination. To do that, you have to analyze the URL structure.

How does the archive.org API work?

As you surely know, web.archive.org lets you inspect the history of a domain, e.g. http://web.archive.org/web/*/besttatoo.com
It also has an API: http://archive.org/help/json.php
I need to get data from the API, but I can't find much information on how to use it. Has anyone used it, and can you paste some usage examples?
This link provides details about the item LovingU on archive.org:
http://archive.org/details/LovingU&output=json
To create an API query to your liking, use this page:
https://archive.org/advancedsearch.php#raw
That page lets you choose the output format (JSON, XML, HTML, CSV or RSS) as well as the fields you want returned. You can limit the number of results, too.
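As a rough sketch, the kind of query URL that page produces can also be assembled in code. The parameter names used here (q, fl[], rows, output) are assumptions based on the URLs the advanced search form generates, so check them against the form's own output. The function name advanced_search_url and the example query are hypothetical.

```python
from urllib.parse import urlencode


def advanced_search_url(query, fields=("identifier",), rows=10, output="json"):
    """Build an advancedsearch.php URL (parameter names are assumptions)."""
    params = [("q", query)]
    # One fl[] entry per field to include in each result.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("output", output)]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


url = advanced_search_url("title:LovingU", fields=("identifier", "title"), rows=5)
print(url)
```

urlencode percent-encodes the brackets and the colon, which the server is expected to decode back to their literal form.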

Programmatic access to On-Line Encyclopedia of Integer Sequences

Is there a way to search and retrieve results from the On-Line Encyclopedia of Integer Sequences (http://oeis.org) programmatically?
I have searched their site, and the results are always returned in HTML. They do not seem to provide an API, but their policy statement says it's acceptable to access the database programmatically. How can this be done without screen scraping?
Thanks a lot for your help.
The OEIS now provides several points of access, not just ones using their internal format. These seem largely undocumented, so here are all of the endpoints that I have found:
https://oeis.org/search?fmt=json&q=<sequenceTerm>&start=<itemToStartAt>
Returns a JSON formatted response of the results found from the sequenceTerm given. If too many results were returned, count will be > 0 whilst results will be null. If no results were returned, count will be 0. itemToStartAt is used for pagination of results, as only a maximum of 10 are ever returned. This starts at 0. If you wanted to return a second page of results, this would equal 10. Information about what each of the entries means can be found here.
https://oeis.org/search?fmt=text&q=<sequenceTerm>&start=<itemToStartAt>
The arguments are exactly the same as before, but the results are returned in the OEIS internal format, which is largely documented here. Unless your project requires it, I'd highly recommend using the JSON format over this.
https://oeis.org/search?fmt=<json|text>&q=id:A<sequenceNumber>
Will return a single result if the sequenceNumber is found. This is the suggested method for obtaining single sequences, as it appears to be far more optimised than some of the alternative methods that can be used as queries. Requests often take under a second. Alternative search query methods can be found on this page.
https://oeis.org/A<sequenceNumber>/graph?png=1
This endpoint can be used to grab the images used to graph the data points. Alternatively, setting png to zero returns the HTML page containing the graph.
https://oeis.org/recent.txt
This returns a list of recently updated entries in the OEIS internal format. There are no parameters and no JSON format, as this seems to be a static text file that is simply served to the client. Because replies from the OEIS database can be slow (for some sequences they can take more than five seconds), I'd highly recommend caching responses heavily and using this endpoint to refresh them when entries change.
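The search endpoints listed above can be wrapped in small URL builders. This is a sketch under the assumptions stated in this answer (the fmt, q and start parameters, and pages of at most 10 results); the function names are invented for illustration, and nothing here contacts the OEIS server. A-numbers are zero-padded to six digits, as in A000032.

```python
from urllib.parse import urlencode

OEIS_SEARCH = "https://oeis.org/search"


def search_url(query, start=0, fmt="json"):
    """URL for a term search; start is the offset of the first result."""
    return OEIS_SEARCH + "?" + urlencode({"fmt": fmt, "q": query, "start": start})


def sequence_url(number, fmt="json"):
    """URL for a single sequence looked up by its A-number."""
    a_number = "id:A{:06d}".format(number)
    return OEIS_SEARCH + "?" + urlencode({"fmt": fmt, "q": a_number})


def page_starts(total, page_size=10):
    """Offsets to request in order to page through total results."""
    return list(range(0, total, page_size))


print(search_url("2,5,14,50,233"))
print(sequence_url(32, fmt="text"))
print(page_starts(25))
```

Keeping URL construction separate from fetching makes it easy to put a cache in front of the requests, keyed by the generated URL.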
A URL of the form http://oeis.org/search?fmt=text&q=2,5,14,50,233 gives a nicely formatted text output.
But it seems there is no way to get a single sequence in text form.
If you happen to use Mathematica, it sounds like the following notebook might help. It allows you to specify a sequence and automatically import a detailed list of matching entries from the OEIS:
http://www.brotherstechnology.com/math/oeis_mathematica.html
It looks like direct use of their CGI program is the only API they provide.
URL for Searching the Database
https://oeis.org/search?q=id:A000032&fmt=text
gives the plain-text form of an entry in their internal format, which is described at:
https://oeis.org/eishelp1.html