How to get date of creation of Wikipedia page by API? - wikipedia-api

Wikipedia article United States presidential election, 2016 has page information which contains "Date of page creation" under "Edit history" section. How do I get this date through an API?

For this goal you need to use action query with property revisions:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&rvprop=timestamp&rvdir=newer&titles=United States presidential election, 2016
where:
rvdir=newer - sort revisions by oldest first
rvlimit=1 - get only one revision (the 1st one)
rvprop=timestamp - get only information about the revision timestamp
In your case the result will be: "2009-02-03T09:59:37Z" → 09:59, 3 February 2009.

Related

How to display only searched string in column in postgresql

I want to only display searched string from a table, as example this is my table:
Table name: guidelines
id content
1 An individual is accused “of” a crime, not “with” or “for” a crime. Accused, often as “the accused”, refers to the individual or individuals standing trial. EXAMPLES: The prosecutor accused the politician of bribery. The accused politician stood trial for bribery. See alleged, charged, suspected.
2 There were a lot of people getting accused on this particular town.
If I use search query to search for "accused", it will show the full result:
SELECT content FROM "guidelines" WHERE "content" 'ILIKE' '%accused%';
Result:
content
An individual is accused “of” a crime, not “with” or “for” a crime. Accused, often as “the accused”, refers to the individual or individuals standing trial. EXAMPLES: The prosecutor accused the politician of bribery. The accused politician stood trial for bribery. See alleged, charged, suspected.
There were a lot of people getting accused on this particular town.
How can I only get the first matching string and followed by the data on the column, as example this is my goal:
content
Accused, often as “the accused”, refers to th...
accused on this particular to...
update: I updated the table and column name to make it better to differentiate table and column
In Postgresql, you can do that by using position function and substring function. see the following query as an example:
SELECT
id,
substring(content, position ('accused' in content)) as matched
FROM
guidelines
WHERE
content LIKE '%accused%'
Try this :
SELECT substring(content from '%#"accused%#"%' for '#') from guidelines;
each # is the place holder defined in the last part for '#' and need and aditional "
So you have % and function will return what is found inside both placeholder. In this case is % or the rest of the string after accused

Understanding Themes in Google BigQuery GDELT GKG 2.0

I'm using Google bigquery to analyze the GDELT GKG 2.0 dataset and would like to better understand how to query based on themes (or V2Themes). The docs mention a 'Category List' spreadsheet but so far I've been unsuccessful in finding that list.
the following asesome blog mentions that you can use World Bank Taxonomy among others to narrow down your search. My objective is to find all items that mention "droughts / too little water" ,all items that mention "floods / too much water" and all items that mention " poor quality / too dirty water" that have a geographical match on a sub-country level.
So far I've been able to get a list of distinct themes but this is non-extensive and I don't get the hierarchy / structure of it.
SELECT
DISTINCT theme
FROM (
SELECT
GKGRECORDID,
locations,
REGEXP_EXTRACT(themes,r'(^.[^,]+)') AS theme,
CAST(REGEXP_EXTRACT(locations,r'^(?:[^#]*#){0}([^#]*)') AS NUMERIC) AS location_type,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){1}([^#]*)') AS location_fullname,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){2}([^#]*)') AS location_countrycode,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){3}([^#]*)') AS location_adm1code,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){4}([^#]*)') AS location_adm2code,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){5}([^#]*)') AS location_latitude,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){6}([^#]*)') AS location_longitude,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){7}([^#]*)') AS location_featureid,
REGEXP_EXTRACT(locations,r'^(?:[^#]*#){8}([^#]*)') AS location_characteroffset,
DocumentIdentifier
FROM
`gdelt-bq.gdeltv2.gkg_partitioned`,
UNNEST(SPLIT(V2Locations,';')) AS locations,
UNNEST(SPLIT(V2Themes,';')) AS themes
WHERE
_PARTITIONTIME >= "2018-08-20 00:00:00"
AND _PARTITIONTIME < "2018-08-21 00:00:00" )
WHERE
(location_type = 5
OR location_type = 4
OR location_type = 2) --WorldState, WorldCity or US State
ORDER BY
theme
And a list of water related themes I've been able to find so far (sample, not exhaustive):
CRISISLEX_C06_WATER_SANITATION
ENV_WATERWAYS
HUMAN_RIGHTS_ABUSES_WATERBOARD
HUMAN_RIGHTS_ABUSES_WATERBOARDED
HUMAN_RIGHTS_ABUSES_WATERBOARDING
NATURAL_DISASTER_FLOODWATER
NATURAL_DISASTER_FLOODWATERS
NATURAL_DISASTER_FLOOD_WATER
NATURAL_DISASTER_FLOOD_WATERS
NATURAL_DISASTER_HIGH_WATER
NATURAL_DISASTER_HIGH_WATERS
NATURAL_DISASTER_WATER_LEVEL
TAX_AIDGROUPS_WATERAID
TAX_DISEASE_WATERBORNE_DISEASE
TAX_DISEASE_WATERBORNE_DISEASES
TAX_FNCACT_WATERBOY
TAX_FNCACT_WATERMAN
TAX_FNCACT_WATERMEN
TAX_FNCACT_WATER_BOY
TAX_WEAPONS_WATER_CANNON
TAX_WEAPONS_WATER_CANNONS
TAX_WORLDBIRDS_WATERFOWL
TAX_WORLDMAMMALS_WATER_BUFFALO
UNGP_CLEAN_WATER_SANITATION
WATER_SECURITY
WB_1000_WATER_MANAGEMENT_STRUCTURES
WB_1021_WATER_LAW
WB_1063_WATER_ALLOCATION_AND_WATER_SUPPLY
WB_1064_WATER_DEMAND_MANAGEMENT
WB_1199_WATER_SUPPLY_AND_SANITATION
WB_1215_WATER_QUALITY_STANDARDS
WB_137_WATER
WB_138_WATER_SUPPLY
WB_139_SANITATION_AND_WASTEWATER
WB_140_AGRICULTURAL_WATER_MANAGEMENT
WB_141_WATER_RESOURCES_MANAGEMENT
WB_143_RURAL_WATER
WB_144_URBAN_WATER
WB_1462_WATER_SANITATION_AND_HYGIENE
WB_149_WASTEWATER_TREATMENT_AND_DISPOSAL
WB_150_WASTEWATER_REUSE
WB_155_WATERSHED_MANAGEMENT
WB_156_GROUNDWATER_MANAGEMENT
WB_159_TRANSBOUNDARY_WATER
WB_1729_URBAN_WATER_FINANCIAL_SUSTAINABILITY
WB_1731_NON_REVENUE_WATER
WB_1778_FRESHWATER_ECOSYSTEMS
WB_1790_INTERNATIONAL_WATERWAYS
WB_1798_WATER_POLLUTION
WB_1805_WATERWAYS
WB_1998_WATER_ECONOMICS
WB_2008_WATER_TREATMENT
WB_2009_WATER_QUALITY_MONITORING
WB_2971_WATER_PRICING
WB_2981_DRINKING_WATER_QUALITY_STANDARDS
WB_2992_FRESHWATER_FISHERIES
WB_427_WATER_ALLOCATION_AND_WATER_ECONOMICS
While this link is provided as a theme listing:
http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_CategoryList.xlsx
...it is far from complete (perhaps just the original theme list?). I just pulled a single day's worth of GKG, and there are tons of themes not on the list of 283 themes in that spreadsheet.
GKG documentation located at https://blog.gdeltproject.org/world-bank-group-topical-taxonomy-now-in-gkg/ points to a World Bank Taxonomy located at http://pubdocs.worldbank.org/en/275841490966525495/Theme-Taxonomy-and-definitions.pdf. The GKG post implies this World Bank taxonomy has been rolled into the GKG theme list.
This is presented as a complete listing of World Bank Taxonomy themes. Unfortunately, I've found numerous World Bank themes in GKG that aren't in this publication. The union of these two lists represents a portion of GKG themes, but it definitely isn't all of them.
Here is the list of GKG Themes:
http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_CategoryList.xlsx
If anyone needs this, I have added a list of all themes in the GKG v1 in the timeperiod from 1/1/2017-31/12/2020 which are at least present in 10 or more articles for that particular day: Themes.parquet
It consists of 17639 unique themes with the count per day. Looks like this:
The complete numbers for that 4 year dataset is 36 713 385 unique actors, 50 845 unique themes as well as 26 389 528 unique organizations. These numbers are not filtered for different spellings for the same entity, and hence Donald Trump and Donald J. Trump will count as two separate actors.
The best GDELT GKG Themes list I could find is here, as described in this blog post.
I put it into a CSV file, which I find slightly easier to work with, and put that file here.

OpenRefine cell.cross creates column but fills zero rows

I have two projects with a column in common I am attempting to merge.
Project 1 has columns Date Published, Type, Story, Subtopics and Author
Project 2 has columns PageTitle, UniquePageviews and AvgTimeOnPage
PageTitle and Story have equivalent values. I want to add UniquePageviews and AvgTimeOnPage to Project 1.
When I use GREL forEach(cross(cell,"Project2","PageTitle"),v,v.cells["AvgTimeOnPage"].value)[0] (either manually or using VIB-Bits extension), I am notified that two new columns have been created ... but "by filling 0 rows." The new columns are blank except for the header, which is added correctly.
How do I get cell.cross to fill any of the rows it says it's filling?
Edit: Database samples here: https://imgur.com/a/fRGhhNw
Project 1:
Date published Type Story Subtopic(s) Author
4/30/2018 News in Brief Last year's solar eclipse set off a wave in the upper atmosphere Planetary Science, Earth Lisa Grossman
4/30/2018 News in Brief New genetic details may help roses come up smelling like, well, roses Plants, Genetics Susan Milius
4/30/2018 Science Visualized See (and hear) the stunning diversity of bowhead whales' songs Animals, Biophysics, Ecology Helen Thompson
4/29/2018 News New genetic sleuthing tools helped track down the Golden State Killer suspect Genetics, Science & Society Tina Hesman Saey
Project 2:
PageTitle UniquePageviews AvgTimeOnPage
The truth about animals isn't always pretty 63398 Sun Dec 31 00:03:06 EST 1899
Birds get their internal compass from this newly ID'd eye protein 53566 Sun Dec 31 00:03:30 EST 1899
Last year's solar eclipse set off a heat wave in the upper atmosphere 35496 Sun Dec 31 00:07:03 EST 1899
City heat is getting hazardous for humans 32199 Sun Dec 31 00:05:49 EST 1899
Finally got it to work. Lessons learned:
Use 2.8, not 3.0 beta
Transform smart quotes to regular quotes
value.replace(/[\u2018\u2019\u201A\u201B\u2032\u2035]/,"'")

NYT article search API not returning results for certain queries

I have a set of queries and I am trying to get web_urls using the NYT article search API. But I am seeing that it works for q2 below but not for q1.
q1: Seattle+Jacob Vigdor+the University of Washington
q2: Seattle+Jacob Vigdor+University of Washington
If you paste the url below with your API key in the web browser, you get an empty result.
Search request for q1
api.nytimes.com/svc/search/v2/articlesearch.json?q=Seattle+Jacob%20Vigdor+the%20University%20of%20Washington&begin_date=20170626&api-key=XXXX
Empty results for q1
{"response":{"meta":{"hits":0,"time":27,"offset":0},"docs":[]},"status":"OK","copyright":"Copyright (c) 2013 The New York Times Company. All Rights Reserved."}
Instead if you paste the following in your web browser (without the article 'the' in the query) you get non-empty results
Search request for q2
api.nytimes.com/svc/search/v2/articlesearch.json?q=Seattle+Jacob%20Vigdor+University%20of%20Washington&begin_date=20170626&api-key=XXXX
Non-empty results for q2
{"response":{"meta":{"hits":1,"time":22,"offset":0},"docs":[{"web_url":"https://www.nytimes.com/aponline/2017/06/26/us/ap-us-seattle-minimum-wage.html","snippet":"Seattle's $15-an-hour minimum wage law has cost the city jobs, according to a study released Monday that contradicted another new study published last week....","lead_paragraph":"Seattle's $15-an-hour minimum wage law has cost the city jobs, according to a study released Monday that contradicted another new study published last week.","abstract":null,"print_page":null,"blog":[],"source":"AP","multimedia":[],"headline":{"main":"New Study of Seattle's $15 Minimum Wage Says It Costs Jobs","print_headline":"New Study of Seattle's $15 Minimum Wage Says It Costs Jobs"},"keywords":[],"pub_date":"2017-06-26T15:16:28+0000","document_type":"article","news_desk":"None","section_name":"U.S.","subsection_name":null,"byline":{"person":[],"original":"By THE ASSOCIATED PRESS","organization":"THE ASSOCIATED PRESS"},"type_of_material":"News","_id":"5951255195d0e02550996fb3","word_count":643,"slideshow_credits":null}]},"status":"OK","copyright":"Copyright (c) 2013 The New York Times Company. All Rights Reserved."}
Interestingly, both queries work fine on the api test page
http://developer.nytimes.com/article_search_v2.json#/Console/
Also, if you look at the article below returned by q2, you see that the query term in q1, 'the University of Washington' does occur in it and it should have returned this article.
https://www.nytimes.com//aponline//2017//06//26//us//ap-us-seattle-minimum-wage.html
I am confused about this behaviour of the API. Any ideas what's going on? Am I missing something?
Thank you for all the answers. Below I am pasting the answer I received from NYT developers.
NYT's Article Search API uses Elasticsearch. There are lots of docs online about the query syntax of Elasticsearch (it is based on Lucene).
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax
If you want articles that contain "Seattle", "Jacob Vigdor" and "University of Washington", do
"Seattle" AND "Jacob Vigdor" AND "University of Washington"
or
+"Seattle" +"Jacob Vigdor" +"University of Washington"
I think you need to change encoding of spaces (%20) to + (%2B):
In your example,
q=Seattle+Jacob%20Vigdor+the%20University%20of%20Washington
When I submit from the page on the site, it uses %2B:
q=Seattle%2BJacob+Vigdor%2Bthe+University+of+Washington
How are you URL encoding? One way to fix it would be to replace your spaces with + before URL encoding.
Also, you may need to replace %20 with +. There are various schemes for URL encoding, so the best way would depend on how you are doing it.

Podio API filtering by date range

I'm trying to filter tasks by date range and I'm getting errors whatever I try. This is how my request looks like: http://api.podio.com/task?completed=true&created_on%5Bfrom%5D=2016-06-23&created_on%5Bto%5D=2016-06-28&limit=100&offset=0&sort_by=rank&sort_desc=false&space=4671314
Here I'm trying to filter by created_on and I'm suplying {from: "2016-06-23", to: "2016-06-28"} but it's always returning the same error - invalid filter. I'm trying to filter tasks that are created in the last 5 days here.
The tasks API reference can be found in their API docs.
What am I doing wrong?
Date ranges can be separated by -.
To display "all my tasks created between 1st Jan 2014 and 1st Jan 2016" :-
/task?created_on=2014-01-01-2016-01-01&responsible=0'
Podio API get task filtering by date range use below :
/task/?created_on=2017-04-25-2017-05-01&offset=0&sort_by=rank&sort_desc=false&space=xxxxxxx