I have been trying to export data from DataSift to a Google BigQuery dataset, but apart from 4 empty rows, no other relevant data has been pushed.
I followed the instructions from this link: http://dev.datasift.com/docs/push/connectors/bigquery. I'm not sure if the CSDL code that I used is the cause.
For example, I configured a stream using:
wikipedia.title contains "Audi".
The live preview has no output. Also, the only data sources that I've set as active are Interaction and Wikipedia.
Please let me know what the reason for this may be. At the end of every stream recording I don't see any changes, except for the creation of the table mentioned in the destination, with 4 empty rows (some rows have null values, and interaction.type is ds_verify).
Thank you!
I want to diagnose autism, and one of the best autism diagnosis datasets is ABIDE (Autism Brain Imaging Data Exchange). My final goal in this part is to have a connectivity matrix which I can use for the rest of my research. At the moment I am trying to download the ABIDE dataset using the Nilearn library. I use the code below as described in Nilearn's documentation, but unfortunately I get an empty list from dataset['func_preproc']. I don't know the reason, because it works fine for the other Nilearn datasets which I tested.
import nilearn.datasets

dataset = nilearn.datasets.fetch_abide_pcp(derivatives=['func_preproc', 'func_mean'])
print(dataset['func_preproc'])
and the output is:
[]
As you can see, dataset['func_preproc'] is an empty list.
Does anyone have any idea about this?
I can only read from the first 100,000 rows of any particular tab in a Google Sheet via the API.
Is this a known limitation of the Google Sheets API? I didn't see a reference to it in the documentation.
Issue and workaround:
If you want to retrieve the values from the Spreadsheet, it seems that when the "spreadsheets.get" method is used on a Spreadsheet containing a large amount of data, the data cannot be correctly retrieved because of an error like "Response Code: 413. Message: response too large." I thought that this might be the reason for your issue. In this situation, I confirmed that the issue still occurs even when the method is changed from "spreadsheets.get" to "spreadsheets.values.get" and "spreadsheets.values.batchGet". So I think this situation might be the current specification of the Sheets API.
But, fortunately, I confirmed that when I retrieved the values from your Spreadsheet using the query language, all values could be retrieved. I also confirmed that all values could be retrieved using the Spreadsheet service of Google Apps Script.
So in this answer, I would like to propose these 2 patterns.
Pattern 1:
In this pattern, the values are retrieved using the query language. I thought that this might be suitable for your situation. The endpoint is as follows.
https://docs.google.com/spreadsheets/d/###spreadsheetId###/gviz/tq?tqx=out:csv&gid=###sheetId###&access_token=###accessToken###
In this case, the access token can also be included in the request header instead of the query parameter. In that case, please use Authorization: Bearer ###. When the above endpoint is accessed with your access token, all values are returned as CSV data.
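For illustration, here is a minimal Python sketch of Pattern 1 (the ### placeholders are yours to fill in, and it assumes the access token has a scope that can read the Spreadsheet):

import requests

# Placeholders: fill in your own values.
spreadsheet_id = "###spreadsheetId###"
sheet_id = "###sheetId###"
access_token = "###accessToken###"

# Query-language (gviz) endpoint that returns the whole sheet as CSV.
url = ("https://docs.google.com/spreadsheets/d/"
       f"{spreadsheet_id}/gviz/tq?tqx=out:csv&gid={sheet_id}")

# Send the access token in the request header instead of the query parameter.
res = requests.get(url, headers={"Authorization": f"Bearer {access_token}"})
res.raise_for_status()
print(res.text.splitlines()[:5])  # first 5 rows of the CSV data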
Pattern 2:
In this pattern, the values are retrieved by the Spreadsheet service of Google Apps Script. The sample script is as follows. When you use this script, please run it in the script editor of Google Apps Script.
function myFunction() {
  const id = "###spreadsheetId###";
  const sheet = SpreadsheetApp.openById(id).getSheetByName("Tab 1");

  // Retrieve all values in the tab and log the row count.
  const values = sheet.getDataRange().getValues();
  console.log(values.length);
}
When I tested this script on your sample Spreadsheet, 405028 was shown in the log.
References:
Query Language
Spreadsheet Service
I am trying to create a column that imports the analyst price target from the TipRanks website.
I uploaded two images:
Image 1: the cell that I want to import.
Image 2: my function, which doesn't work.
What should I change in order to get this live info?
Thanks.
The site you are checking is actually JavaScript generated, so the import functions won't work properly on it.
To check, just try to import the whole site's data. If it returns a JavaScript function, then the page is JavaScript generated.
Sample (tipranks.com)
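If you want to run the same check outside of Sheets, a rough Python sketch is below: fetch the raw HTML and see whether the value you can see in the browser is actually in it. The URL and expected value here are only examples and may not match the live site:

import requests

# If the value you see in the browser is missing from the raw HTML,
# the page is rendered client-side by JavaScript, and IMPORTXML /
# IMPORTHTML will not see it either.
url = "https://www.tipranks.com/stocks/csiq/forecast"  # example URL
expected = "50.38"  # the value visible in the browser

html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
print(expected in html)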
What you can do is actually try to find other sites that provide the same data.
I did find one with the same data you are looking for: 50.38 for CSIQ. The link is https://www.marketwatch.com/investing/stock/csiq/analystestimates. And since the data is shown as a table, it is easier to import using IMPORTHTML.
Cell formula is:
=INDEX(IMPORTHTML("https://www.marketwatch.com/investing/stock/csiq/analystestimates", "table", 5), 2, 2)
Sample output:
The table is the fifth one in the DOM, and INDEX(table, 2, 2) retrieves the 2nd row, 2nd column of that table.
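Outside of Sheets, the same lookup can be sketched in Python with pandas (an untested sketch: MarketWatch may block non-browser requests, and the table position may change):

import pandas as pd

url = "https://www.marketwatch.com/investing/stock/csiq/analystestimates"

# read_html returns every <table> on the page; IMPORTHTML's 5th table
# (1-based) is index 4 here (0-based).
table = pd.read_html(url)[4]

# INDEX(table, 2, 2) is 1-based; read_html consumes the first row as the
# header, so row 2 / column 2 of the raw table becomes iloc[0, 1].
print(table.iloc[0, 1])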
If that site is no good for you, you can try finding other sites that suit your needs, and then use either IMPORTHTML or IMPORTXML depending on the site structure.
When you inspect the network while the website is loading, you will see that the prices come from calling the forecast endpoint https://www.tipranks.com/stocks/tsla/forecast. This in turn returns an HTML response which is probably generated with JavaScript on the client, because they use React on the frontend, but you can still see the preview in the Network tab of the browser dev tools.
You can then copy the preview into VS Code and prettify it, to try to pinpoint the span holding the price. Of course it won't be an exact science, because the HTML tags are generated with some media queries, but you will get close enough.
If you enter the XPath and get an empty result, delete some tags until you get some text. Use search in Google Sheets to find the highest price label, and then continue adding tags back until you get the desired value.
Here is what I managed to get:
Lowest price target:
=importxml("https://www.tipranks.com/stocks/snow/forecast", "/html/body/div[1]/div[1]/div[4]/div[1]/div[2]/div[1]/div[4]/div[2]/div[2]/div[4]/div[1]/div[1]/div[5]/span[2]")
Average price target:
=importxml("https://www.tipranks.com/stocks/snow/forecast", "/html/body/div[1]/div[1]/div[4]/div[1]/div[2]/div[1]/div[4]/div[2]/div[2]/div[4]/div[1]/div[1]/div[3]/span[2]")
Highest price target:
=importxml("https://www.tipranks.com/stocks/snow/forecast", "/html/body/div[1]/div[1]/div[4]/div[1]/div[2]/div[1]/div[4]/div[2]/div[2]/div[4]/div[1]/div[1]/div[1]/span[2]")
In time these methods might change depending on their development process, but you can use the steps above to update the formulas.
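For reference, the same XPath can be evaluated outside of Sheets. A rough Python sketch for the average price target (untested against the live site; it assumes the values appear in the server-delivered HTML, which must be the case for IMPORTXML to return them since IMPORTXML does not execute JavaScript, and the absolute path will break whenever TipRanks changes their markup):

import requests
from lxml import html

url = "https://www.tipranks.com/stocks/snow/forecast"
xpath = ("/html/body/div[1]/div[1]/div[4]/div[1]/div[2]/div[1]"
         "/div[4]/div[2]/div[2]/div[4]/div[1]/div[1]/div[3]/span[2]")

page = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
tree = html.fromstring(page.content)

# The absolute XPath matches at most one span; guard against it breaking.
nodes = tree.xpath(xpath)
print(nodes[0].text_content() if nodes else "not found")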
P.S. I wasn't satisfied with the MarketWatch analyst price targets. I think the wisdom of the crowd is better on TipRanks.
Try this one. It works perfectly fine on my personal stock portfolio in Google Sheets:
Lowest Price Target:
=importxml(CONCATENATE("https://www.tipranks.com/stocks/", A1,"/forecast"), "//*[@class='colorpurple-dark ml3 mobile_fontSize7 laptop_ml0']")
Average Price Target:
=importxml(CONCATENATE("https://www.tipranks.com/stocks/", A1,"/forecast"), "//*[@class='colorgray-1 ml3 mobile_fontSize7 laptop_ml0']")
Highest Price Target:
=importxml(CONCATENATE("https://www.tipranks.com/stocks/", A1,"/forecast"), "//*[@class='colorpale ml3 mobile_fontSize7 laptop_ml0']")
I'm very new to OpenRefine, so please bear with me if I have made a simple mistake.
I'm parsing an HTML website to gather some data.
Everything went fine with fetching the individual pages, but now the parsing of the HTML fails.
I'm creating a new column based on the one holding all of a page's HTML. I'm trying to get to the data in a specific DIV[20].
In the "create column based on this column" window it gives me a preview when using value.parseHtml().select("DIV")[20], which shows exactly what I need... but executing it gives me nothing but blank cells.
It even tells me that it is "filling 0 rows with grel:value.parseHtml().select("DIV")[20]".
Any clue what I'm doing wrong here?
You just need to finalize with .toString() to output the JSON.org object as a string.
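That is, based on the expression in your question:

value.parseHtml().select("DIV")[20].toString()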
This is explained on our wiki here: https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML#extract-html-attributes-text-links-with-integrated-grel-commands
I also updated the select() function with that example: https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#selectelement-e-string-s
Ok, this one has become a little tricky for me and I really need some assistance to work through it.
Problem
I have a GSpreadsheet which has a list of data, in this case Twitter usernames. Using the API of a service provider (in this case the Klout API), I would like to retrieve information about that user to populate a cell within a spreadsheet.
Based on what I can work out so far, I would need to write a custom function to do this, but I have no idea where to start, how I might construct it, or whether there are any examples of doing this.
Scenario
The Klout API can return either an XML or JSON response (see http://developer.klout.com/docs/read/api/API), based on the string passed. For example, the URL:
http://api.klout.com/1/users/show.xml?key=SECRET&users=thewinchesterau
would return the following XML response:
<users>
  <user>
    <twitter_id>17439480</twitter_id>
    <twitter_screen_name>thewinchesterau</twitter_screen_name>
    <score>
      <kscore>56.63</kscore>
      <slope>0</slope>
      <description>creates content that is spread throughout their network and drives discussions.</description>
      <kclass_id>10</kclass_id>
      <kclass>Socializer</kclass>
      <kclass_description>You are the hub of social scenes and people count on you to find out what's happening. You are quick to connect people and readily share your social savvy. Your followers appreciate your network and generosity.</kclass_description>
      <kscore_description>thewinchesterau has a low level of influence.</kscore_description>
      <network_score>58.06</network_score>
      <amplification_score>29.16</amplification_score>
      <true_reach>90</true_reach>
      <delta_1day>0.3</delta_1day>
      <delta_5day>0.5</delta_5day>
    </score>
  </user>
</users>
Based on this response, I would like to be able to populate different cells with the values returned within the XML (or JSON if easier) packet.
So, for example, I would have a spreadsheet like the following, which would have custom functions that go out, retrieve the value of the relevant XML element, and populate the cell:
Cell  A              B            C              D                    E
1     Username       kscore       Network score  Amplification score  True reach
2     thewinchester  =kscore(A2)  =nscore(A2)    =ascore(A2)          =tscore(A2)
Questions
Are there any gSpreadsheet examples you know of that use an API to pull data in from an external source?
How would one write a custom function to fetch the result from the API and populate a cell with a result of a specific element?
Any information, examples or helpers you have are greatly appreciated.
You want the importXML function, documented here. The formula you want will look something like this:
=importXML("http://api.klout.com/1/users/show.xml?key=SECRET&users=" + A1, "//users/user/score/kscore")
You could write a custom script with Google Apps Script, but there's a simpler solution, similar to what Nick Johnson posted. I've tested this against the score method, but it could easily be adapted to the show endpoint with a different XPath.
=importXML("http://api.klout.com/1/klout.xml?users="&A1&"&key=YOUR_API_KEY", "//users/user/kscore")
This presumes your Twitter IDs are in the A column.
Note, Google Docs limits the number of such importXML functions to 50 per spreadsheet. You could concatenate groups of 5 user IDs for each importXML call, effectively raising your limit to 250 per sheet.
This could also be adapted to a similar call in Excel, which doesn't have that limit. Keep in mind the Klout ToS, though, using proper attribution and respecting rate limits.
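For completeness, the same lookup can also be scripted directly against the XML endpoint. A rough Python sketch, using the endpoint and placeholders from the question (the Klout v1 API has long since been retired, so treat this purely as an illustration of the pattern):

import requests
import xml.etree.ElementTree as ET

# Endpoint and placeholders taken from the question above.
url = "http://api.klout.com/1/users/show.xml"
params = {"key": "SECRET", "users": "thewinchesterau"}

root = ET.fromstring(requests.get(url, params=params).content)

# The same element paths the importXML formulas address.
for field in ("kscore", "network_score", "amplification_score", "true_reach"):
    node = root.find(f"./user/score/{field}")
    print(field, node.text if node is not None else "missing")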