Google Sheets IMPORTXML: Xpath not working (document node does provide data) - api

Using Google Sheets, I am trying to retrieve text passages from the Perseus Scaife Library, which has a working API.
When I query for the document node (=importxml("https://scaife-cts.perseus.org/api/cts?request=GetPassage&urn=urn:cts:greekLit:tlg0527.tlg001.opp-grc2:1.1","/")) I get all the data, including the URNs etc. However, any other xpath_query gives an error.
I know that Google Sheets can access the data, but I would like to be able to select only one node (//p).

You want to retrieve the text in passage. If my understanding is correct, how about this answer?
=importxml(A1, "//*[local-name()='passage']")
Result :
Note :
https://scaife-cts.perseus.org/api/cts?request=GetPassage&urn=urn:cts:greekLit:tlg0527.tlg001.opp-grc2:1.1 is converted by URL encode and put to "A1".
Converted URL is https://scaife-cts.perseus.org/api/cts?request=GetPassage&urn=urn%3acts%3agreekLit%3atlg0527%2etlg001%2eopp%2dgrc2%3a1%2e1.
Reference :
local-name
If this was not what you want, I'm sorry.

Related

[Mendeley API]: How to search for partial terms

I am using the Mendeley API to retrieve documents in the profile of a user.
Specifically, I am using this API:
GET https://api.mendeley.com/search/documents?view=all&limit=25&title=ONTOLOGY
I would like to search for all the documents that match a partial term, i.e. instead of the full word "ONTOLOGY" I would like to get the same result if I do an HTTP call like
GET https://api.mendeley.com/search/documents?view=all&limit=25&title=ONTOLO
How can I achieve that?
Should I put any jolly character?
I tried
ONTOLO*
ONTOLO$
ONTOLO?
with no luck.
I haven't found any documentation related to this feature.
Thanks!!

How to include the query filter in URL (cloudSearch)

I am trying to retrieve data from cloudSearch, searching for the word "Person" and adding the following filter:
(prefix field=claimedgalleryid '')
The problem is that I don't know how to create the URL using that exact filter.
Could someone give me a suggestion or some link to Amazon documentation related to this topic?
What I've tried and didn't work:
...search?q=Gallerist&size=10&start=0&fq=(prefix%20field=claimedgalleryid%20%27%27)
...search?q=Gallerist&size=10&start=0&filter=(prefix%20field=claimedgalleryid%20%27%27)
You were close with your first attempt--it looks like you forgot to URI encode the = sign as %3D. Try this instead:
&fq=(prefix+field%3Dclaimedgalleryid+'')
I highly recommend using the "test search" feature to work out the kinks in your query syntax. You can see the results right there, and then use the "View Raw: JSON" link to copy the full request URL and see how characters get escaped and such.

Openrefine not working as expected

I'm very new to OpenRefine, so please bear with me if i have made a simple mistake.
I'm parsing a HTML website to gather some date.
Everything went fine with fetching the individual pages, but now the parsing of the HTML fails.
I'm creating a new column, based on the one holding all the page's HTML. I'm trying to get to the data in a specific DIV[20].
In the"create column based on this column" window it gives me a preview when using value.parseHtml().select("DIV")[20] , which results in exactly what i need... executing it gives me nothing but blank cells.
it even tells me that it is "filling 0 rows with grel:value.parseHtml().select("DIV")[20]"
Any clue what i'm doing wrong here?
You just need to finalize with .toString() to output the JSON.org object AS a string.
This is explained on our wiki here: https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML#extract-html-attributes-text-links-with-integrated-grel-commands
I also updated the select() function with that example: https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#selectelement-e-string-s

Use custom function to populate gSpreadsheet cell based on a XML/JSON response

Ok, this one has become a little tricky for me and I really need some assistance to work through it.
Problem
I have a GSpreadsheet which has a list of data, in this case Twitter usernames. Using the API of a service provider (in this case the Klout API), I would like to retrieve information about that user to populate a cell within a spreadsheet.
Based on what I can work out so far, I would need to write a custom function to do this but I have no idea where to start, how I might construct it, or if there are any examples of doing this.
Scenario
The Klout API can return either an XML or JSON response (see http://developer.klout.com/docs/read/api/API), based on the string passed. For example, the URL:
http://api.klout.com/1/users/show.xml?key=SECRET&users=thewinchesterau
would return the following XML response:
<users>
<user>
<twitter_id>17439480</twitter_id>
<twitter_screen_name>thewinchesterau</twitter_screen_name>
<score>
<kscore>56.63</kscore>
<slope>0</slope>
<description>creates content that is spread throughout their network and drives discussions.</description>
<kclass_id>10</kclass_id>
<kclass>Socializer</kclass>
<kclass_description>You are the hub of social scenes and people count on you to find out what's happening. You are quick to connect people and readily share your social savvy. Your followers appreciate your network and generosity.</kclass_description>
<kscore_description>thewinchesterau has a low level ofinfluence.</kscore_description>
<network_score>58.06</network_score>
<amplification_score>29.16</amplification_score>
<true_reach>90</true_reach>
<delta_1day>0.3</delta_1day>
<delta_5day>0.5</delta_5day>
</score>
</user>
</users>
Based on this response, I would like to be able to populate different cells with the values returned within the XML (or JSON if easier) packet.
So, for example, I would have a spreadsheet like the following which would have custom functions to go out and retrieve the value of the relevant XML element response to populate the cell:
Cell A B C D E
1 Username kscore Network score Amplification score True reach
2 thewinchester =kscore(A2) =nscore(A2) =ascore(A2) =tscore(A2)
Questions
Are there any gSpreadsheet examples you know of that use an API to pull data in from an external source?
How would one write a custom function to fetch the result from the API and populate a cell with a result of a specific element?
Any information, examples or helpers you have are greatly appreciated.
You want the importXML function, documented here. The formula you want will look something like this:
=importXML("http://api.klout.com/1/users/show.xml?key=SECRET&users=" + A1, "//users/user/score/kscore")
You could write a custom script with Google AppScript, but there's a simple solution to this similar to what Nick Johnson posted. I've tested this against the score function, but it could be easily adapted to the show endpoint with different XPath.
=importXML("http://api.klout.com/1/klout.xml?users="&A1&"&key=YOUR_API_KEY", "//users/user/kscore")
This presumes your Twitter IDs are in the A column.
Note, Google Docs limits the number of such importXML functions to 50 per spreadsheet. You could concatenate groups of 5 userids for each importXML call, effectively putting your limit to 250 a sheet.
This could also be adapted to a similar call in Excel that doesn't have that limit. Keep in mind the Klout ToS, though, using proper attribution and rate limits.

google analytics API, how to extract pageviews for a specific page?

Google Analytics API: how to extract pageviews for a specific page?
I tried using something like
ga:pagePath=~page.php%3fid%3d44 (page.php?id=44)
but it doesn't seem to work... I get "no results found" where I have 20 pageviews for sure
UPDATE
I think I found the solution
ga:pagePath==/website/page.php?id=44
for some reason I had to include the complete path and ==
To use a partial path to match for a page in filters you should use
ga:pagePath=#page.php?id=44
=# tells ga to match a substring.
What you were originally using was incorrect for this.
I think your problem is that you put the hex version of the ? and = characters into your query, which doesn't match how Analytics stores the page paths. If you change these to the normal characters it should work:
ga:pagePath=~page.php?id=44
Your other solution should work as well but is a bit more inflexible in case you wanted to tweak the query to return other pages.