I am trying to fetch data on medical or doctor-related protests from GDELT. I have tried the GDELT API; it returns results (with a few irrelevant links), but it exposes too few columns for further analysis.
When using BigQuery the returned columns are more numerous and much better suited for analysis, but I want to know which event codes I should use to fetch only medical or doctor-related protest data.
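A minimal sketch of the kind of BigQuery filter this would need, assuming the public gdelt-bq.gdeltv2.events table: CAMEO root code 14 covers protest events, and the actor type code HLH (health sector) is the closest CAMEO match I know of for doctors and medical workers. Both codes, and the date cutoff, are assumptions worth double-checking against the CAMEO codebook before relying on the results.

from google.cloud import bigquery

# Protest events (CAMEO root code 14) where either actor is tagged as
# health sector (CAMEO actor type code HLH). Table and column names follow
# the public GDELT 2.0 events table; verify the codes against the codebook.
client = bigquery.Client()
query = """
SELECT
  SQLDATE,
  Actor1Name, Actor1Type1Code,
  Actor2Name, Actor2Type1Code,
  EventCode, EventRootCode,
  ActionGeo_FullName,
  SOURCEURL
FROM `gdelt-bq.gdeltv2.events`
WHERE EventRootCode = '14'                                 -- 140-145: protest events
  AND ('HLH' IN (Actor1Type1Code, Actor1Type2Code)
    OR 'HLH' IN (Actor2Type1Code, Actor2Type2Code))        -- health-sector actors
  AND SQLDATE >= 20200101                                   -- arbitrary cutoff, adjust
"""
for row in client.query(query).result():
    print(row.SQLDATE, row.Actor1Name, row.EventCode, row.SOURCEURL)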
Most of the larger cities' Wikipedia articles have a table with climate data; I think Wikipedia calls these weatherboxes.
Is there a way to retrieve these tables via the API?
For instance, to get the climate section for NYC I do:
https://en.wikipedia.org/w/api.php?action=parse&prop=wikitext&pageid=645042&section=18&format=json
I get the wikitext, but not the actual table, which is only referenced as {{New York City weatherbox}}.
There seems to be another way to retrieve some weatherboxes, but it doesn't work for many cities.
This one works:
https://en.wikipedia.org/wiki/Template:New_York_City_weatherbox
but this one doesn't:
https://en.wikipedia.org/wiki/Template:Turin_weatherbox
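One explanation worth checking: for cities like New York the weatherbox lives on its own template page, so you can ask the API to parse that template directly, while for cities like Turin the table appears to be written inline in the article, which would be why Template:Turin_weatherbox does not exist. A sketch of that two-step approach (try the template page first, then fall back to the article's climate section; the section index 18 is taken from the URL in the question):

import requests

API = "https://en.wikipedia.org/w/api.php"

def get_weatherbox_wikitext(city, article, climate_section):
    # 1) Try the dedicated template page, e.g. Template:New York City weatherbox.
    r = requests.get(API, params={
        "action": "parse",
        "page": f"Template:{city} weatherbox",
        "prop": "wikitext",
        "format": "json",
    }).json()
    if "parse" in r:
        return r["parse"]["wikitext"]["*"]
    # 2) No such template: fall back to the article's climate section.
    r = requests.get(API, params={
        "action": "parse",
        "page": article,
        "section": climate_section,
        "prop": "wikitext",
        "format": "json",
    }).json()
    return r["parse"]["wikitext"]["*"]

print(get_weatherbox_wikitext("New York City", "New York City", 18)[:500])

If you want the rendered HTML table rather than the wikitext, the same action=parse call also accepts a text parameter containing the transclusion (e.g. {{New York City weatherbox}}) together with prop=text and contentmodel=wikitext.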
I am trying to query the patents-public-data:patents dataset. This dataset includes information on U.S. patent classifications according to the CPC guidelines.
There are a few "publications" tables within the patents dataset. Each of them (except one) carries a date suffix, e.g. 201710 or 201809. I wonder what these dates signify. Which "publications" table is the most up to date, and how often is it updated?
As was mentioned, SO is not the appropriate channel for this question; however, if you check the dataset information in the GCP Marketplace, you will see that this dataset is updated quarterly. It looks like the table named "publications" is the most up-to-date one, and the tables "publications_201710", "publications_201802", "publications_201809" and "publications_201903" contain the publications up to the date indicated in their names.
You can find additional information regarding this dataset at this link. In addition, the BigQuery public datasets documentation lists the alias to contact the team that manages the BigQuery public dataset program.
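If you want to confirm which table is freshest yourself rather than rely on the Marketplace page, a small sketch (assuming the google-cloud-bigquery client library) that lists the tables in the dataset with their last-modified timestamps:

from google.cloud import bigquery

# List every table in patents-public-data.patents and print its row count
# and last-modified date, to see which "publications" table is most current.
client = bigquery.Client()
dataset_ref = bigquery.DatasetReference("patents-public-data", "patents")

for item in client.list_tables(dataset_ref):
    table = client.get_table(item.reference)
    print(f"{table.table_id:30s} rows={table.num_rows:>12,} "
          f"last modified={table.modified:%Y-%m-%d}")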
I'm looking to pull together a full list of the current FTSE 100 constituents with the addition of a column highlighting when the company was founded.
Each company's Wikipedia infobox contains the founding date. I'm struggling to work out the SPARQL query against DBpedia that would take the existing FTSE 100 table and add that column.
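A rough sketch of the kind of DBpedia query to start from, run via SPARQLWrapper. The category dbc:FTSE_100_Index and the founding-date property dbo:foundingDate are assumptions here; check the dct:subject values and properties of one constituent (e.g. dbr:Barclays) on DBpedia first and adjust accordingly.

from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed modelling: constituents carry dct:subject dbc:FTSE_100_Index and a
# dbo:foundingDate. Verify both on DBpedia before relying on this query.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?company ?name ?founded WHERE {
  ?company dct:subject dbc:FTSE_100_Index ;
           rdfs:label ?name .
  OPTIONAL { ?company dbo:foundingDate ?founded }
  FILTER (lang(?name) = 'en')
}
ORDER BY ?name
""")
sparql.setReturnFormat(JSON)

for b in sparql.query().convert()["results"]["bindings"]:
    print(b["name"]["value"], b.get("founded", {}).get("value", "n/a"))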
I am new to linked data and trying to link Ordnance Survey postcodes to LSOA data.
I am using the OS Linked Data SPARQL API, and the endpoint is:
http://data.ordnancesurvey.co.uk/datasets/os-linked-data/apis/sparql
I have a query which returns the postcodes, but I am trying to link LSOA codes to those postcodes. The query I have so far is:
SELECT ?postcode WHERE {
  ?postcodeUnit a <http://data.ordnancesurvey.co.uk/ontology/postcode/PostcodeUnit> .
  BIND (STRAFTER(STR(?postcodeUnit), 'postcodeunit/') AS ?postcode)
} LIMIT 10
This query brings back the postcodes, but I am trying to link them to LSOA codes.
Thanks in advance.
You may be a bit stuck with this data source.
The ontology page at http://data.ordnancesurvey.co.uk/ontology/postcode/PostcodeUnit shows that there is no LSOA data associated with a PostcodeUnit, and a quick look at the RDF on the site shows that it doesn't have an LSOA field.
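To double-check what is actually attached to a PostcodeUnit, here is a small sketch against the endpoint from the question (via SPARQLWrapper) that lists the distinct predicates; if nothing LSOA-like appears, you will need another data source for that linkage.

from SPARQLWrapper import SPARQLWrapper, JSON

# List the distinct predicates used on PostcodeUnit resources, to confirm
# whether anything LSOA-related is published on this endpoint.
sparql = SPARQLWrapper("http://data.ordnancesurvey.co.uk/datasets/os-linked-data/apis/sparql")
sparql.setQuery("""
SELECT DISTINCT ?p WHERE {
  ?unit a <http://data.ordnancesurvey.co.uk/ontology/postcode/PostcodeUnit> ;
        ?p ?o .
} LIMIT 100
""")
sparql.setReturnFormat(JSON)

for b in sparql.query().convert()["results"]["bindings"]:
    print(b["p"]["value"])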
I was wondering if it is possible to work on a per-row basis in Kettle?
I am trying to implement a reporting scheme which consists of a table where requests get queued for processing, and a Pentaho job that picks up the records from that table.
My job currently has 3 transformations in it:
The 1st gets the records from the queued requests table.
The 2nd analyzes the values on each record and comes up with multiple results based on that record. For example, a user might request records of movies in the horror genre; it should then output the horror movies.
The 3rd further retrieves information about the movies, such as the year, director, etc., which is to be output to an Excel file.
That is the idea, but it's a bit challenging to do in Pentaho because it processes everything at the same time. Is there a way I can make my job work on the records one by one?
EDIT:
Just to add, I have been trying to extend the implementation from the Pentaho cookbook sample, but compared to my design it only covers steps 2 and 3.
I can't seem to make the Table input step work one row at a time.
I just made it act like the implementation in the cookbook, with some adjustments: instead of using two transformations to gather all the necessary fields, I retrieved all the information I need in one transformation.
After that I copied that information to the next steps, then ran some queries to complete the information, and it is now working.
Passing parameters between transformations is a bit confusing; there are parameters to be set on the transformation itself and also on the job where the transformations sit, so I went guessing for some time just to make it work.