How to obtain data in a table from the Wikipedia API?

I'm trying to get all the content from Wikipedia:Unusual_articles, and I'm able to get the list of sections by calling this endpoint:
https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=sections&page=Wikipedia:Unusual_articles
The data I get back looks something like this:
{
title: "Wikipedia:Unusual articles",
pageid: 154126,
sections: [
{
toclevel: 1,
level: "2",
line: "Places and infrastructure",
number: "1",
index: "T-1",
fromtitle: "Wikipedia:Unusual_articles/Places_and_infrastructure",
byteoffset: null,
anchor: "Places_and_infrastructure"
},
{
toclevel: 2,
level: "3",
line: "Americas",
number: "1.1",
index: "T-2",
fromtitle: "Wikipedia:Unusual_articles/Places_and_infrastructure",
byteoffset: null,
anchor: "Americas"
},
...
But I'm not able to get the content of a particular section. For example, under Americas there is a table in which each row has a link and a short description. Is there a way to obtain the link and short description from the API?

You can get the content of any page section with the MediaWiki API's action=parse in two steps. First, get all sections of the page with:
https://en.wikipedia.org/w/api.php?action=parse&prop=sections&page=Wikipedia:Unusual_articles
From the response you can see that the section Americas has index=T-2 (the T prefix means the section comes from a transcluded page) and fromtitle=Wikipedia:Unusual_articles/Places_and_infrastructure. Now use this index (without the T- prefix) and fromtitle to get the content of the section with:
https://en.wikipedia.org/w/api.php?action=parse&page=Wikipedia:Unusual_articles/Places_and_infrastructure&section=2&prop=...
where:
prop=wikitext - gives the original wikitext of the section.
prop=text - gives the parsed text (HTML) of the section.
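The two steps above can be sketched in Python as follows. This is only a minimal sketch: the helper names (sections_url, find_section, section_content_url, fetch_json) and the User-Agent string are made up for illustration, and it assumes the section list has the shape shown in the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def sections_url(page):
    """Step 1: URL that lists all sections of a page."""
    params = {"action": "parse", "format": "json", "prop": "sections", "page": page}
    return f"{API_URL}?{urlencode(params)}"


def find_section(sections, heading):
    """Return (page_title, section_number) for the section with this heading.

    Sections from a transcluded page carry an index such as "T-2"; the
    number after the prefix is then used against the page in "fromtitle".
    """
    for section in sections:
        if section["line"] == heading:
            index = section["index"]
            number = index[2:] if index.startswith("T-") else index
            return section.get("fromtitle"), number
    raise KeyError(heading)


def section_content_url(page, section, prop="wikitext"):
    """Step 2: URL that returns one section as wikitext or parsed text."""
    params = {"action": "parse", "format": "json", "page": page,
              "section": section, "prop": prop}
    return f"{API_URL}?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (placeholder User-Agent)."""
    request = Request(url, headers={"User-Agent": "section-fetch-sketch/0.1"})
    with urlopen(request) as response:
        return json.load(response)
```

A typical run would call fetch_json(sections_url("Wikipedia:Unusual_articles")), pass the returned parse.sections list to find_section together with "Americas", and then fetch section_content_url(title, number). The links and descriptions still have to be extracted from the returned wikitext or HTML.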

Related

Google Docs API for creating invoice containing table of variable number of rows

I have a template file for my invoice that contains a table with a sample row, but I want to add more rows dynamically based on the size of a given array, and write the cell values from the array...
I've been struggling for almost 3 days now.
Is there any easy way to accomplish that?
Here's the template file: Link to the Docs file(template)
And here are a few sample arrays of input data to be inserted into the template file:
[
[
"Sample item 1s",
"Sample Quantity 1",
"Sample price 1",
"Sample total 1"
],
[
"Sample item 2",
"Sample Quantity 2",
"Sample price 2",
"Sample total 2"
],
[
"Sample item 3",
"Sample Quantity 3",
"Sample price 3",
"Sample total 3"
],
]
Now, the length of the parent array can vary depending on the number of items in the invoice, and that's the only problem I'm struggling with.
And yes, this is a duplicate question. I've found another question on the same topic, but judging by the answers and comments, nobody understood the question, whereas it looks perfectly clear to me:
Google Docs Invoice template with dynamically items row from Google Sheets
I think the person who asked that question has already given up on it. :(
By the way, I am using the API for PHP (Google API Client Library for PHP), and the code for replacing dummy text in a Google Docs document with the actual data is given below:
public function replaceTexts(array $replacements, string $document_id) {
    # code...
    $req = new Docs\BatchUpdateDocumentRequest();
    // var_dump($replacements);
    // die();
    foreach ($replacements as $replacement) {
        $target = new Docs\SubstringMatchCriteria();
        $target->text = "{{" . $replacement["targetText"] . "}}";
        $target->setMatchCase(false);
        $req->setRequests([
            ...$req->getRequests(),
            new Docs\Request([
                "replaceAllText" => [
                    "replaceText" => $replacement["newText"],
                    "containsText" => $target
                ]
            ]),
        ]);
    }
    return $this->docs_service->documents->batchUpdate(
        $document_id,
        $req
    );
}
A possible solution would be the following:
1. First, prep the document by removing every row from the table apart from the title row.
2. Get the full document tree from the Google Docs API. This is a simple call with the document ID:
$doc = $service->documents->get($documentId);
3. Traverse the returned document object to get to the table and then find the location of the right cell. This can be done by looping through the elements in the body object until one with the right table field is found. Note that this is not necessarily the first one, since in your template the section with the {{CustomerName}} placeholder is also a table. So you may have to look for a table whose first cell has a text value of "Item".
4. Add a new row to the table. This is done by creating a request with the shape:
[
    'insertTableRow' => [
        'tableCellLocation' => [
            'rowIndex' => 1,
            'columnIndex' => 1,
            'tableStartLocation' => [
                'index' => 177
            ]
        ]
    ]
]
The tableStartLocation->index value is the start index of the table, i.e. the startIndex of the body->content[i] element that holds the table. Send the request.
5. Repeat steps 2 and 3 to get the updated $doc object, and then locate the newly created cell, i.e. body->content[i]->table->tableRows[j]->tableCells[k]->content[m]->paragraph->elements[l]->startIndex.
6. Send a request to insert the text content of the cell at the startIndex from step 5, i.e.
[
    'insertText' => [
        'location' => [
            'index' => 206,
        ],
        'text' => 'item_1'
    ]
]
7. Repeat steps 5 and 6 for the next cell. Note that after each update you need to fetch an updated version of the document object, because the indexes change after inserts.
To be honest, this approach is pretty cumbersome, and it's probably more efficient to insert all the data into a spreadsheet and then embed the spreadsheet into your Google Docs document. Information on that can be found in "How to insert an embedded sheet via Google Docs API?".
As a final note, I created a copy of your template and used the "Try this method" feature in the API documentation to validate my approach so some of the PHP syntax may be a bit off, but I hope you get the general idea.
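The traversal and request-building steps can be sketched language-neutrally on the JSON form of the document. The following Python is a rough sketch, not tested against a live document: the helper names are hypothetical, the document is assumed to be the plain dict returned by documents.get, and the start index is read from the structural element that holds the table, which is my understanding of the REST representation.

```python
def cell_text(cell):
    """Concatenate the text runs inside one table cell."""
    parts = []
    for element in cell.get("content", []):
        for run in element.get("paragraph", {}).get("elements", []):
            parts.append(run.get("textRun", {}).get("content", ""))
    return "".join(parts).strip()


def find_table(document, header_text):
    """Return the body element whose table starts with a cell reading header_text."""
    for element in document["body"]["content"]:
        table = element.get("table")
        if not table:
            continue
        first_cell = table["tableRows"][0]["tableCells"][0]
        if cell_text(first_cell) == header_text:
            return element
    return None


def insert_row_request(table_element, row_index):
    """Build an insertTableRow request that adds a row below row_index."""
    return {
        "insertTableRow": {
            "tableCellLocation": {
                "tableStartLocation": {"index": table_element["startIndex"]},
                "rowIndex": row_index,
                "columnIndex": 0,
            },
            "insertBelow": True,
        }
    }


def insert_text_request(index, text):
    """Build an insertText request; 'text' sits next to 'location'."""
    return {"insertText": {"location": {"index": index}, "text": text}}
```

Each request would go into the requests list of a batchUpdate call. Because indexes shift after every insert, the document has to be fetched again before the next insert_text_request is built.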

Text search in aggregation using pymongo

I have a collection named users whose documents have the following attributes:
{
    "_id": "937a04d3f516443e87abe8308a1fe83e",
    "username": "andy",
    "full_name": "andy white",
    "image": "https://example.com/xyz.jpg",
    … etc
}
I want to run a text search on full_name and username using an aggregation pipeline, so that if a user searches for any 3 letters, the most relevant full_name or username values are returned, sorted by relevance.
I have already created a text index on username and full_name, and then I tried the query from the link below:
https://www.mongodb.com/docs/manual/tutorial/text-search-in-aggregation/#return-results-sorted-by-text-search-score
pipeline_stage = [
    {"$match": {"$text": {"$search": "whit"}}},
    {"$sort": {"score": {"$meta": "textScore"}}},
    {"$project": {"username": 1, "full_name": 1, "image": 1}}
]
stages = [*pipeline_stage]
users = users_db.aggregate(stages)
But I am getting the error below:
pymongo.errors.OperationFailure: FieldPath field names may not start with ‘$’. Consider using $getField or $setField., full error: {‘ok’: 0.0, ‘errmsg’: “FieldPath field names may not start with ‘$’. Consider using $getField or $setField.”, ‘code’: 16410, ‘codeName’: ‘Location16410’, ‘$clusterTime’: {‘clusterTime’: Timestamp(1657811022, 14), ‘signature’: {‘hash’: b’a\xb4rem\x02\xc3\xa2P\x93E\nS\x1e\xa6\xaa\xb0\xb1\x85\xb5’, ‘keyId’: 7062773414158663703}}, ‘operationTime’: Timestamp(1657811022, 14)}
I also tried the approach from the link below (my query follows), but it only returns full-word matches; it does not work for partial text search:
https://www.mongodb.com/docs/manual/tutorial/text-search-in-aggregation/#match-on-text-score
pipeline_stage = [
{"$match": {"$text": {"$search": search_key}}},
{"$project": {"full_name": 1, "score": {"$meta": "textScore"}}},
]
Any help will be appreciated.
Note: I want to do a partial text search, with the most relevant records sorted to the top.
Thanks
Your $project stage is incorrect; it should be:
pipeline_stage = [
{"$match": {"$text": {"$search": "and"}}},
{"$sort": {"score": {"$meta": "textScore"}}},
{"$project": { "username": "$username", "full_name": "$full_name", "image": "$image"}}
]
Also note that if you use an English text index, stop words such as "and" are not indexed.
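For reuse, the pipeline from this answer can be wrapped in a small helper. This is just a sketch: the function names are invented here, and collection is assumed to be a pymongo Collection (anything with an aggregate method works) that already has the text index described in the question.

```python
def build_text_search_pipeline(term):
    """Return the $match / $sort / $project stages for a text search."""
    return [
        # Requires a text index on the searched fields.
        {"$match": {"$text": {"$search": term}}},
        # Order by the relevance score computed by the text search.
        {"$sort": {"score": {"$meta": "textScore"}}},
        # Keep only the fields the caller needs.
        {"$project": {
            "username": "$username",
            "full_name": "$full_name",
            "image": "$image",
        }},
    ]


def search_users(collection, term):
    """Run the pipeline and return the matching documents as a list."""
    return list(collection.aggregate(build_text_search_pipeline(term)))
```

Usage would look like search_users(users_db, "white"), with users_db being the collection object from the question.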

Is there a way to index a doc to Elasticsearch with a specific _id field?

I'm looking to simulate a state where I have a specific _id field inside an index.
Let's assume I want to take the EXACT same log from index1 in my example and index it into index2.
Like so:
This is my index1
{
_index: "index-number-one",
_type: "doc",
_id: "S0meSpec!f!cID",
_score: 1,
_source: {
message: "message1",
type: "type1",
tags: [
"_bla"],
number: 3
}
}
Now I want that exact same log in my index2
{
_index: "index-number-two",
_type: "doc",
_id: "S0meSpec!f!cID",
_score: 1,
_source: {
message: "message1",
type: "type1",
tags: [
"_bla"],
number: 3
}
}
I couldn't find an API in Elasticsearch that can insert a doc into an index with a specific _id field... (?)
If this action isn't possible because the Elasticsearch cluster must not have duplicate _id values, I can imagine it's because they want to keep the ability to look up a doc by its _id field, which needs to be unique. In that case, assume that I don't mind deleting the entire doc from index1 (maybe saving it aside in some variable in my code), but in the end I need the doc in index2 to have the EXACT _id that it once had in index1.
If there's a way to edit an existing _id field, that would also solve my problem.
Can anyone please shed some light on how to achieve that goal?
Answering my own question:
I found that it can be done with a POST request on the index, like so:
POST test-index-1234/abctype/Som3Cust0mID
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
And the outcome in ES:
{
_index: "test-index-1234",
_type: "abctype",
_id: "Som3Cust0mID",
_score: 1,
_source: {
user: "kimchy",
post_date: "2009-11-15T14:12:12",
message: "trying out Elasticsearch"
}
}
It is definitely possible to do this. IDs are unique per index, not per cluster.
Check the reindex API: it copies the documents of one index into another and keeps the document IDs.
It is also possible to change the ID using a script inside the reindex call.
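Both approaches boil down to plain HTTP requests, which the sketch below builds without sending them. It is an illustration under assumptions: the helper names are made up, and the /{index}/_doc/{id} path assumes a typeless (7.x or later) cluster rather than the custom mapping types used in the post above.

```python
import json
from urllib.parse import quote


def index_with_id_request(index, doc_id, source):
    """Build (method, path, body) for indexing a document under a chosen _id."""
    # Percent-encode the ID so characters such as "!" are safe in the path.
    path = f"/{index}/_doc/{quote(doc_id, safe='')}"
    return "PUT", path, json.dumps(source)


def reindex_request(source_index, dest_index):
    """Build (method, path, body) for copying an index; document IDs are kept."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return "POST", "/_reindex", json.dumps(body)
```

For example, index_with_id_request("index-number-two", "S0meSpec!f!cID", {...}) yields the request that writes the document from the question into the second index under the same ID, and reindex_request("index-number-one", "index-number-two") yields the request that copies the whole index.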

How to save in chart original ID of element in array that is used to build chart data labels?

Let's say I have an array of users:
usersData = [
{ id: 21, name: 'John Dean' },
{ id: 3, name: 'Mike Brine' },
{ id: 6, name: 'Tom Kalvi' }
]
The names in usersData are generated from another array with full user information: user.first_name + user.last_name
When I build a Vue-Chart Bar, the labels array stores this data by index:
labels = { 0: "John Dean", 1: "Mike Brine", 2: "Tom Kalvi" }
When I handle the Bar onClick event to get the original ID, I receive only the index.
I need to load additional information for each user based on the Bar click, but for that I need the original ID.
What is the easiest way to get it?
Thank you in advance
If you have the clicked index, just use it to look up the original element:
usersData[clickedIndex].id

Collapsing a group using Google Sheets API

As a workaround for difficulties creating a new sheet with groups, I am trying to create and collapse these groups in a separate call to batchUpdate. I can request an addDimensionGroup successfully, but when I request updateDimensionGroup to collapse the group I just created, either in the same API call or in a separate one, I get this error:
{
"error": {
"code": 400,
"message": "Invalid requests[1].updateDimensionGroup: dimensionGroup.depth must be \u003e 0",
"status": "INVALID_ARGUMENT"
}
}
But I'm passing depth as 0 as seen by the following JSON which I send in my request:
{
"requests":[{
"addDimensionGroup":{
"range":{
"dimension":"ROWS",
"sheetId":0,
"startIndex":2,
"endIndex":5}
}
},{
"updateDimensionGroup":{
"dimensionGroup":{
"range": {
"dimension":"ROWS",
"sheetId":0,
"startIndex":2,
"endIndex":5
},
"depth":0,
"collapsed":true
},
"fields":"*"
}
}],
"includeSpreadsheetInResponse":true}',
...
I'm not entirely sure what I am supposed to provide for "fields", the documentation for UpdateDimensionGroupRequest says it is supposed to be a string ("string ( FieldMask format)"), but the FieldMask definition itself shows the possibility of multiple paths, and doesn't tell me how they are supposed to be separated in a single string.
What am I doing wrong here?
The error message is telling you that the dimensionGroup.depth value must be > 0.
If you call spreadsheets.get() on your sheet and request only the DimensionGroup data, you'll see that the group you created is actually at depth 1:
GET https://sheets.googleapis.com/v4/spreadsheets/{SSID}?fields=sheets(rowGroups)&key={API_KEY}
This makes sense, since the depth is defined (per the API spec) as:
depth (number): The depth of the group, representing how many groups have a range that wholly contains the range of this group.
Note that any given DimensionGroup "wholly contains its own range" by definition.
If your goal is to change the status of the DimensionGroup, then you need to set its collapsed property:
{
"requests":
[
{
"updateDimensionGroup":
{
"dimensionGroup":
{
"range":
{
"sheetId": <your sheet id>,
"dimension": "ROWS",
"startIndex": 2,
"endIndex": 5
},
"collapsed": true,
"depth": 1
},
"fields": "collapsed"
}
}
]
}
For this particular request, the only attribute you can set is collapsed - the other properties are used to identify the DimensionGroup to manipulate. Thus, specifying fields: "*" is equivalent to fields: "collapsed". This is not true for the majority of requests: there, specifying fields: "*" and then omitting a non-required request parameter is interpreted as "Delete that missing parameter from the server's representation".
To change a DimensionGroup's depth, you must add or remove other DimensionGroups that encompass it.
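Putting the question and the answer together, a batchUpdate body that creates a group and then collapses it could be assembled as below. This is a sketch only: the function name is hypothetical, and it assumes the new group is not nested inside another group, so its depth is 1.

```python
def add_and_collapse_group_body(sheet_id, start_index, end_index,
                                depth=1, dimension="ROWS"):
    """Build a batchUpdate body that adds a dimension group and collapses it."""
    def group_range():
        # A fresh dict per request avoids sharing one mutable object.
        return {
            "sheetId": sheet_id,
            "dimension": dimension,
            "startIndex": start_index,
            "endIndex": end_index,
        }

    return {
        "requests": [
            {"addDimensionGroup": {"range": group_range()}},
            {
                "updateDimensionGroup": {
                    "dimensionGroup": {
                        "range": group_range(),
                        # Range and depth identify the group; depth must be > 0.
                        "depth": depth,
                        "collapsed": True,
                    },
                    # collapsed is the only attribute this request can change.
                    "fields": "collapsed",
                }
            },
        ]
    }
```

Serialising add_and_collapse_group_body(0, 2, 5) with json.dumps gives the corrected version of the request body from the question.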