Use custom function to populate gSpreadsheet cell based on a XML/JSON response - api

Ok, this one has become a little tricky for me and I really need some assistance to work through it.
Problem
I have a GSpreadsheet which has a list of data, in this case Twitter usernames. Using the API of a service provider (in this case the Klout API), I would like to retrieve information about that user to populate a cell within a spreadsheet.
Based on what I can work out so far, I would need to write a custom function to do this but I have no idea where to start, how I might construct it, or if there are any examples of doing this.
Scenario
The Klout API can return either an XML or JSON response (see http://developer.klout.com/docs/read/api/API), based on the string passed. For example, the URL:
http://api.klout.com/1/users/show.xml?key=SECRET&users=thewinchesterau
would return the following XML response:
<users>
<user>
<twitter_id>17439480</twitter_id>
<twitter_screen_name>thewinchesterau</twitter_screen_name>
<score>
<kscore>56.63</kscore>
<slope>0</slope>
<description>creates content that is spread throughout their network and drives discussions.</description>
<kclass_id>10</kclass_id>
<kclass>Socializer</kclass>
<kclass_description>You are the hub of social scenes and people count on you to find out what's happening. You are quick to connect people and readily share your social savvy. Your followers appreciate your network and generosity.</kclass_description>
<kscore_description>thewinchesterau has a low level ofinfluence.</kscore_description>
<network_score>58.06</network_score>
<amplification_score>29.16</amplification_score>
<true_reach>90</true_reach>
<delta_1day>0.3</delta_1day>
<delta_5day>0.5</delta_5day>
</score>
</user>
</users>
Based on this response, I would like to be able to populate different cells with the values returned within the XML (or JSON if easier) packet.
So, for example, I would have a spreadsheet like the following which would have custom functions to go out and retrieve the value of the relevant XML element response to populate the cell:
Cell  A              B            C              D                    E
1     Username       kscore       Network score  Amplification score  True reach
2     thewinchester  =kscore(A2)  =nscore(A2)    =ascore(A2)          =tscore(A2)
Questions
Are there any gSpreadsheet examples you know of that use an API to pull data in from an external source?
How would one write a custom function to fetch the result from the API and populate a cell with a result of a specific element?
Any information, examples or helpers you have are greatly appreciated.

You want the importXML function, documented here. The formula you want will look something like this:
=importXML("http://api.klout.com/1/users/show.xml?key=SECRET&users=" & A1, "//users/user/score/kscore")

You could write a custom script with Google Apps Script, but there's a simpler solution, similar to what Nick Johnson posted. I've tested this against the score function, but it could easily be adapted to the show endpoint with a different XPath.
=importXML("http://api.klout.com/1/klout.xml?users="&A1&"&key=YOUR_API_KEY", "//users/user/kscore")
This presumes your Twitter IDs are in the A column.
Note that Google Docs limits the number of such importXML functions to 50 per spreadsheet. You could concatenate groups of 5 user IDs in each importXML call, effectively raising your limit to 250 per sheet.
This could also be adapted to a similar call in Excel that doesn't have that limit. Keep in mind the Klout ToS, though, using proper attribution and rate limits.
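For readers who do want the custom-function route the question asks about, here is a minimal Apps Script sketch. It assumes the show.xml endpoint and the XML layout quoted in the question. The helper names buildShowUrl and fetchScoreValue_ and the script property name KLOUT_API_KEY are hypothetical. UrlFetchApp, XmlService and PropertiesService exist only inside Apps Script, so only the URL builder runs elsewhere.

```javascript
// Hypothetical helper: builds the show.xml URL from the question.
function buildShowUrl(username, apiKey) {
  return 'http://api.klout.com/1/users/show.xml' +
    '?key=' + encodeURIComponent(apiKey) +
    '&users=' + encodeURIComponent(username);
}

// Hypothetical helper: fetches one child of <score> for a user as a number.
// Assumes the API key is stored in a script property named KLOUT_API_KEY.
// The trailing underscore keeps it out of the spreadsheet's function list.
function fetchScoreValue_(username, elementName) {
  var apiKey = PropertiesService.getScriptProperties().getProperty('KLOUT_API_KEY');
  var xml = UrlFetchApp.fetch(buildShowUrl(username, apiKey)).getContentText();
  var score = XmlService.parse(xml).getRootElement() // <users>
    .getChild('user')
    .getChild('score');
  return Number(score.getChildText(elementName));
}

// Custom functions matching the layout in the question, e.g. =kscore(A2).
function kscore(username) { return fetchScoreValue_(username, 'kscore'); }
function nscore(username) { return fetchScoreValue_(username, 'network_score'); }
function ascore(username) { return fetchScoreValue_(username, 'amplification_score'); }
function tscore(username) { return fetchScoreValue_(username, 'true_reach'); }
```

Each formula in this sketch makes its own HTTP request, so four formulas per row mean four fetches per user. For a long list, the importXML approach or a cached fetch is likely to be kinder to the rate limits mentioned above.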

Related

Using an API to Extract All Comments from a Reddit Post

I am using the Reddit API (Pushshift) : https://github.com/pushshift/api
Using the documentation, I understand how I can use this to extract every comment containing the word "covid" that was left in a certain time period:
https://api.pushshift.io/reddit/search/comment?q=covid&after=3h&before=2h&size=1
The output looks something like this:
{"data":[{"subreddit_id":"t5_2qh6p","author_is_blocked":false,"comment_type":null,"edited":false,"author_flair_type":"richtext","total_awards_received":0,"subreddit":"Conservative","author_flair_template_id":null,"id":"j98zf27","gilded":0,"archived":false,"collapsed_reason_code":null,"no_follow":false,"author":"VamboRoolOkay","send_replies":true,"parent_id":41917615743,"score":1,"author_fullname":"t2_7uxkru5f","all_awardings":[],"body":"I will never believe that election fraud wasn't a significant factor. Go ahead - call it a conspiracy theory. But I also maintained that Covid was lab-created. Truth is the Daughter of Time.","top_awarded_type":null,"author_flair_css_class":null,"author_patreon_flair":false,"collapsed":false,"author_flair_richtext":[{"e":"text","t":"Conservative"}],"is_submitter":false,"gildings":{},"collapsed_reason":null,"associated_award":null,"stickied":false,"author_premium":false,"can_gild":true,"link_id":"t3_116l7ct","unrepliable_reason":null,"author_flair_text_color":"dark","score_hidden":true,"permalink":"/r/Conservative/comments/116l7ct/kamala_harris_plans_on_running_with_biden_in_2024/j98zf27/","subreddit_type":"public","locked":false,"author_flair_text":"Conservative","treatment_tags":[],"created_utc":1676866031,"subreddit_name_prefixed":"r/Conservative","controversiality":0,"author_flair_background_color":"","collapsed_because_crowd_control":null,"distinguished":null,"retrieved_utc":1676866047,"updated_utc":1676866048,"body_sha1":"328df3784d15f77b98a84418c4ce720822227cfe","utc_datetime_str":"2023-02-20 
04:07:11"}],"error":null,"metadata":{"es":{"took":98,"timed_out":false,"_shards":{"total":828,"successful":828,"skipped":824,"failed":0},"hits":{"total":{"value":573,"relation":"eq"},"max_score":null}},"es_query":{"size":1,"query":{"bool":{"must":[{"bool":{"must":[{"simple_query_string":{"fields":["body"],"query":"covid","default_operator":"and"}},{"range":{"created_utc":{"gte":1676862433000}}},{"range":{"created_utc":{"lt":1676866033000}}}]}}]}},"aggs":{},"sort":{"created_utc":"desc"}},"es_query2":"{\"size\":1,\"query\":{\"bool\":{\"must\":[{\"bool\":{\"must\":[{\"simple_query_string\":{\"fields\":[\"body\"],\"query\":\"covid\",\"default_operator\":\"and\"}},{\"range\":{\"created_utc\":{\"gte\":1676862433000}}},{\"range\":{\"created_utc\":{\"lt\":1676866033000}}}]}}]}},\"aggs\":{},\"sort\":{\"created_utc\":\"desc\"}}","api_launch_time":1673017478.254743,"api_request_start":1676873233.6143198,"api_request_end":1676873233.7406816,"api_total_time":0.12636184692382812}}
My Question: Suppose I identify a post that contains the word "covid" - now, I want to retrieve every comment on this post : Is this possible to do?
For instance, based on the output of these results, I see that :
link_id: t3_116l7ct
parent_id:41917615743
Can I somehow use this information to write an API query to retrieve all comments from this post?
I tried the following query but got an empty result: https://api.pushshift.io/reddit/comment/search/?link_id=t3_116cjib
Thanks!

Read from Google Sheet connection only allows first 100,000 rows

I can only read from the first 100,000 rows of any particular tab in a Google Sheet via the API.
Is this a known limitation of the Google Sheets API? I didn't see a reference to it in the documentation.
Issue and workaround:
When you retrieve the values from a Spreadsheet that contains a large amount of data with the "spreadsheets.get" method, the data cannot be retrieved correctly because of an error like Response Code: 413. Message: response too large. I thought that this might be the reason for your issue. I also confirmed that the issue still occurs when the method is changed from "spreadsheets.get" to "spreadsheets.values.get" or "spreadsheets.values.batchGet". So I thought that this situation might be the current specification of the Sheets API.
Fortunately, when I tested retrieving the values from your Spreadsheet using the query language, all values could be retrieved. I confirmed the same when I tested with the Spreadsheet service of Google Apps Script.
So in this answer, I would like to propose these 2 patterns.
Pattern 1:
In this pattern, the values are retrieved by the query language. I thought that in your situation, this might be suitable. The endpoint is as follows.
https://docs.google.com/spreadsheets/d/###spreadsheetId###/gviz/tq?tqx=out:csv&gid=###sheetId###&access_token=###accessToken###
The access token can also be sent in the request header instead of the query parameter. In that case, please use Authorization: Bearer ###. When the above endpoint is accessed with your access token, all values are returned as CSV data.
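As a rough illustration of Pattern 1, the sketch below sends the access token in the Authorization header rather than the query string. The function names buildGvizCsvRequest and fetchSheetCsv are hypothetical, and the code assumes a JavaScript runtime with a global fetch (for example Node 18 or later) and an access token you have already obtained.

```javascript
// Hypothetical helper: builds the URL and headers for the gviz CSV endpoint.
function buildGvizCsvRequest(spreadsheetId, sheetId, accessToken) {
  const url = 'https://docs.google.com/spreadsheets/d/' +
    encodeURIComponent(spreadsheetId) +
    '/gviz/tq?tqx=out:csv&gid=' + encodeURIComponent(sheetId);
  return { url, headers: { Authorization: 'Bearer ' + accessToken } };
}

// Fetches the whole tab as CSV text. Assumes a runtime with global fetch.
async function fetchSheetCsv(spreadsheetId, sheetId, accessToken) {
  const { url, headers } = buildGvizCsvRequest(spreadsheetId, sheetId, accessToken);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return response.text();
}
```

Keeping the token in the header avoids leaving it in URLs that may end up in logs or browser history.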
Pattern 2:
In this pattern, the values are retrieved by the Spreadsheet service of Google Apps Script. The sample script is as follows. Please run it in the script editor of Google Apps Script.
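function myFunction() {
  const id = "###spreadsheetId###";
  // Open the Spreadsheet by ID and select the tab to read.
  const sheet = SpreadsheetApp.openById(id).getSheetByName("Tab 1");
  // Read every populated cell as a 2D array (rows x columns).
  const values = sheet.getDataRange().getValues();
  // Log the number of rows retrieved.
  console.log(values.length);
}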
When I tested this script on your sample Spreadsheet, 405028 appeared in the log.
References:
Query Language
Spreadsheet Service

How can I get page id, wikidata id of some title along with multiple languages in a single API call?

I have been trying to call Wikipedia API to retrieve page id and wikidata item id using below call and it works fine.
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&redirects=1&format=xml&titles=Cat
But I need to retrieve the same information for other languages of my choice. For example, if I specify German and French in my call, it should look up their translations of the word Cat and retrieve their page info. There is a langlinks property in the Wikipedia API, but somehow it doesn't work with the query action along with pageprops.
So ideally, I want something like this:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&prop=langlinks&lllang=de&lllang=fr&titles=Cat
Any help would be appreciated.
Using lllang twice will just result in the second value overwriting the first one. You'll have to omit the parameter, and then you get all the links:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops|langlinks&ppprop=wikibase_item&titles=Cat
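Since the call above returns links for every language, the wanted ones can be picked out on the client. The sketch below assumes format=json with the default response layout, where pages are keyed by page ID under query.pages and each langlinks entry has a lang field and the title under the "*" key. The names buildQueryUrl and summarizePages are hypothetical. lllimit=max is added as an assumption that you want all links in one response, since langlinks otherwise returns only a limited number per request.

```javascript
// Hypothetical helper: builds the combined pageprops + langlinks query.
function buildQueryUrl(title) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'pageprops|langlinks',
    ppprop: 'wikibase_item',
    lllimit: 'max',
    redirects: '1',
    format: 'json',
    titles: title,
  });
  return 'https://en.wikipedia.org/w/api.php?' + params.toString();
}

// Hypothetical helper: reduces the response to page ID, Wikidata item,
// and the langlinks for the requested language codes only.
function summarizePages(response, langs) {
  return Object.values(response.query.pages).map((page) => ({
    pageid: page.pageid,
    wikibaseItem: page.pageprops ? page.pageprops.wikibase_item : undefined,
    langlinks: (page.langlinks || [])
      .filter((link) => langs.includes(link.lang))
      .map((link) => ({ lang: link.lang, title: link['*'] })),
  }));
}
```

Calling summarizePages(parsedJson, ['de', 'fr']) would keep only the German and French titles. Getting the page IDs of those translated pages would still need a follow-up query against each language's own wiki.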

Obtain Deviantart Deviation ID / UUID from page URL

I was looking at the DeviantArt API to see what you can do with it.
A lot of requests require you to provide a deviation ID to work with.
Take for instance adding a deviation to favorites (under Collections -> Add deviation to favorites in the API docs; I cannot post more than 2 links).
I looked through the API to figure out how to obtain that ID, but I could not find out how to do so.
If I only have the deviation URL, for instance http://kennyklent.deviantart.com/art/Pinkie-Pie-Dancing-296143815, how can I tell its deviation ID?
It is not the number at the end (296143815). I would have thought so, but it's not.
If it helps, here's one example from the api's /browse/dailydeviations endpoint
"deviationid": "27FD366A-30CB-FC3E-DE54-9621E90FCE60",
"printid": "E984FC87-8B57-239C-FE7C-E2674A0DDFC4",
"url": "http://mudimba.deviantart.com/art/SF-Botanical-Gardens-57879397",
So this deviation SF-Botanical-Gardens-57879397 has the id 27FD366A-30CB-FC3E-DE54-9621E90FCE60 - but how would I find out if it wasn't listed in the examples?
Update 06/2017:
For anyone stumbling across this 2 years later, the answer below still works but there is now another way to get the UUID. Every Deviation now has a meta property da:appurl showing the UUID value on the deviation page itself.
To stay with the SF-Botanical-Gardens-57879397 example from above, looking at the page source at http://mudimba.deviantart.com/art/SF-Botanical-Gardens-57879397 reveals:
<meta property="da:appurl" content="DeviantArt://deviation/27FD366A-30CB-FC3E-DE54-9621E90FCE60">
Which contains exactly the UUID value 27FD366A-30CB-FC3E-DE54-9621E90FCE60
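A minimal sketch of pulling that value out of the page source, assuming the meta tag keeps the attribute order shown above. The function name extractDeviationUuid is hypothetical, and fetching the page HTML is left out.

```javascript
// Hypothetical helper: pulls the UUID out of a deviation page's HTML.
// Returns null when no da:appurl meta tag in the expected form is found.
function extractDeviationUuid(html) {
  const match = html.match(
    /<meta\s+property="da:appurl"\s+content="DeviantArt:\/\/deviation\/([0-9A-Fa-f-]{36})"/i
  );
  return match ? match[1] : null;
}

// The tag quoted above, split only to keep the line short.
const sampleTag = '<meta property="da:appurl" ' +
  'content="DeviantArt://deviation/27FD366A-30CB-FC3E-DE54-9621E90FCE60">';
console.log(extractDeviationUuid(sampleTag));
// prints 27FD366A-30CB-FC3E-DE54-9621E90FCE60
```

A regular expression is enough for a single known tag. If the markup varies, a real HTML parser would be a sturdier choice.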
Original answer
I got an answer from a Deviantart dev directly, http://comments.deviantart.com/1/492518964/3755610860
You cannot convert integer IDs into UUID format, you have to query the api to find the correct uuid. So for your example, you would query the /gallery/folders endpoint and then the gallery/{folderid} endpoint to get the list of deviations in that folder.
There's no easier way to obtain the UUID for a given URL for now.

Is it possible to compare 2 dimensions in a Google Analytics query filter?

I'm just starting out with the Google Analytics API and am wondering if it's possible to compare two dimensions via an operand in the filters I pass in the query. And by wondering I mean I've tried it, but have had no success.
Specifically I'm trying to compare 2 custom variable values. One holds the user who created a post (customVarValue3), the other the user who is viewing the post (customVarValue5). I want to get the pageviews only for the visitors who are not also the creator. The filter looks like this (without urlencoding applied):
ga:customVarValue3!=ga:customVarValue5
The full query (url encoded) looks like this:
https://www.google.com/analytics/feeds/data?ids=ga%3Axxxxxx&dimensions=ga%3AcustomVarValue1%2Cga%3AcustomVarValue2%2Cga%3AcustomVarValue3&metrics=ga%3Apageviews&filters=ga%3AcustomVarValue3!%3Dga%3AcustomVarValue5&sort=-ga%3Apageviews&start-date=2012-02-09&end-date=2012-02-23&max-results=50
However, it returns the same results (and I know there are results where ga:customVarValue3 == ga:customVarValue5).
Probably it isn't possible, but I just wanted to see if anyone knew how to achieve this or has a workaround or something.
No, it is not possible using the GAv3 API in its present state. You can, however, get all the results by using the specified two custom variables as the dimensions for a report, and programmatically filter out the unnecessary results.
Some simple construct like
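var newResultRows = [];
collatedResultsListwithDimensions.forEach(function (item) {
  item.rows.forEach(function (row) {
    // Keep only rows where the two custom variables differ.
    if (row[0] != row[1]) {
      newResultRows.push(row);
    }
  });
});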
Now your newResultRows will have those rows where row[1]!=row[0] assuming the two custom variables you mentioned are the first two dimensions.