Extract table data from Wikipedia


Is there any way to extract only table data? I am trying to extract a table from the specific section "Grade One" of this article, https://en.wikipedia.org/wiki/List_of_motor_racing_circuits_by_FIA_grade, using the API sandbox, but I am only getting the whole content of the page.
This is the URL from the API sandbox, which gives me all of the content:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=List%20of%20motor%20racing%20circuits%20by%20FIA%20grade&prop=text

I followed the steps I described in my answer in order to get the data you want.
This is the URL:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=List%20of%20motor%20racing%20circuits%20by%20FIA%20grade&prop=sections%7Ctext&section=1&disablelimitreport=1&utf8=1
The output contains the table and the text in the "Grade One" section.
This is the API Sandbox example.
Response:
{
"parse": {
"title": "List of motor racing circuits by FIA grade",
"pageid": 57151782,
"text": {
"*": "<div class=\"mw-parser-output\"><h2><span class=\"mw-headline\" id=\"Grade_One\">Grade One</span><span class=\"mw-editsection\"><span class=\"mw-editsection-bracket\">[</span>edit<span class=\"mw-editsection-bracket\">]</span></span></h2>\n<p>There are 40 Grade One circuits for a total of 49 layouts in 27 nations as of December 2021. Circuits holding Grade One certification may host events involving \"Automobiles of Groups D (FIA International Formula) and E (Free Formula) with a weight/power ratio of less than 1 kg/hp.\"<sup id=\"cite_ref-ISC2019_1-0\" class=\"reference\">[1]</sup> As such, a Grade One certification is required to host events involving Formula One cars.<sup id=\"cite_ref-2\" class=\"reference\">[2]</sup><sup id=\"cite_ref-2021_December_list_3-0\" class=\"reference\">[3]</sup>\n</p>\n<table class=\"wikitable sortable\" width=\"75%\" style=\"font-size: 95%;\">\n<tbody><tr>\n<th>Circuit\n</th>\n<th>Location\n</th>\n<th>Country\n</th>\n<th>Layout\n</th>\n<th>Length\n</th>\n<th>Continent\n</th></tr>\n<tr>\n<td>Albert Park Circuit\n</td>\n<td>Melbourne\n</td>\n<td><span class=\"flagicon\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/88/Flag_of_Australia_%28converted%29.svg/23px-Flag_of_Australia_%28converted%29.svg.png\" decoding=\"async\" width=\"23\" height=\"12\" class=\"thumbborder\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/8/88/Flag_of_Australia_%28converted%29.svg/35px-Flag_of_Australia_%28converted%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/88/Flag_of_Australia_%28converted%29.svg/46px-Flag_of_Australia_%28converted%29.svg.png 2x\" data-file-width=\"1280\" data-file-height=\"640\" /> </span>Australia\n</td>\n<td>Grand Prix\n</td>\n<td>5.279 km (3.280 mi)\n</td>\n<td>Australia\n</td></tr>\n<tr>[...the rest of the table is shown here]"
},
"sections": [
{
"toclevel": 1,
"level": "2",
"line": "Grade One",
"number": "1",
"index": "1",
"fromtitle": "List_of_motor_racing_circuits_by_FIA_grade",
"byteoffset": 0,
"anchor": "Grade_One"
}
]
}
}
I can't see a way to make the API return only the table, so you have to extract the table from the API response yourself (for example, with a script).
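As a rough illustration of that last step, the sketch below pulls the table rows out of the HTML returned by the API, using only the Python standard library. The helper names (`build_parse_url`, `TableExtractor`, `extract_rows`) and the shortened sample HTML are made up for this example. In a real script the HTML would come from `response["parse"]["text"]["*"]`, and the request would go to `https://en.wikipedia.org/w/api.php` with the same parameters as the sandbox URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page, section):
    """Build an action=parse request URL for one section of a page."""
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": "text",
        "section": section,
        "disablelimitreport": 1,
    }
    return API_ENDPOINT + "?" + urlencode(params)


class TableExtractor(HTMLParser):
    """Collect every <tr> in the document as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse whitespace and newlines.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Shortened stand-in for response["parse"]["text"]["*"].
sample_html = (
    '<table class="wikitable sortable"><tbody>'
    "<tr><th>Circuit\n</th><th>Location\n</th><th>Country\n</th></tr>"
    "<tr><td>Albert Park Circuit\n</td><td>Melbourne\n</td>"
    '<td><span class="flagicon"><img alt="" src="x.png" /> </span>Australia\n</td></tr>'
    "</tbody></table>"
)

if __name__ == "__main__":
    print(build_parse_url("List of motor racing circuits by FIA grade", 1))
    for row in extract_rows(sample_html):
        print(row)
```

This sketch collects the rows of every table it finds and ignores nested tables and `rowspan`/`colspan`; for anything beyond a simple table, a dedicated HTML library is probably the better choice.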

Related

Add computed field to query in Grafana using JSON API as data source

What am I trying to achieve:
I would like to have a time series chart showing the total number of members in my club at any time. This member count should be calculated by using the field "Eintrittsdatum" (joining-date) and "Austrittsdatum" (leaving-date). I’m thinking of it as a running sum - every filled field with a joining-date means +1 on the member count, every leaving-date entry is a -1.
Data structure
I’m calling the API of webling.ch with a secret key. This is my data structure with sample data per member:
[
{
"type": "member",
"meta": {
"created": "2020-03-02 11:33:00",
"createuser": {
"label": "Joana Doe",
"type": "user"
},
"lastmodified": "2022-12-06 16:32:56",
"lastmodifieduser": {
"label": "Joana Doe",
"type": "user"
}
},
"readonly": true,
"properties": {
"Mitglieder ID": 99,
"Anrede": "Dear",
"Vorname": "Jon",
"Name": "Doe",
"Strasse": "Doeington Street",
"Adresszusatz": null,
"PLZ": "9999",
"Ort": "Doetown",
"E-Mail": "jon.doe#doenet.net",
"Telefon Privat": null,
"Telefon Geschäft": null,
"Mobile": "099 877 54 54",
"Geschlecht": "m",
"Geburtstag": "1966-03-10",
"Mitgliedschaftstyp": "Aktivmitgliedschaft",
"Eintrittsdatum": "2020-03-01",
"Austrittsdatum": null,
"Passfoto": null,
"Wordpress Benutzername": null,
"Wohnhaft im Glarnerland": false,
"Lat": "43.1563379",
"Long": "6.0474622"
},
"parents": [
240
],
"children": {
},
"links": {
"debitor": [
2124,
3056,
3897
],
"attendee": [
2576
]
},
"id": 1815
}
]
Grafana data source
I am using the “JSON API” by Marcus Olsson: GitHub - grafana/grafana-json-datasource: A data source plugin for loading JSON APIs into Grafana.
Grafana v9.3.1 (89b365f8b1) on Linux
My current approach
Queries:
Query C - uses a filter on the source-API to only show entries with "Eintrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Eintrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Query D - uses a filter on the source-API to only show entries with "Austrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Austrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Here's a screenshot to clarify things
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-1.png)
Transformations:
My applied transformations
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-2.png)
What's working
I can correctly gather the number of members added/subtracted per day.
What's not working
I can't get the graph to display the way I want: I'd like to have a running sum of these numbers instead of the following two graphs.
Time series graph with merged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-3.png)
Time series graph with unmerged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-4.png)
I can't get the names to display within the tooltip of the data points (really not THAT necessary).
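To make the intended calculation concrete, here is a minimal sketch of the running sum in plain Python, outside Grafana. It assumes records shaped like the sample above (only "Eintrittsdatum" and "Austrittsdatum" are read); the function name and the three sample members are invented for illustration, and this is not a Grafana configuration.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def member_count_series(members):
    """Return [(day, total_members)] sorted by day.

    Each joining date counts +1 and each leaving date counts -1;
    the running sum of those deltas is the member count.
    """
    deltas = Counter()
    for member in members:
        props = member["properties"]
        joined = props.get("Eintrittsdatum")
        left = props.get("Austrittsdatum")
        if joined:
            deltas[date.fromisoformat(joined)] += 1
        if left:
            deltas[date.fromisoformat(left)] -= 1
    days = sorted(deltas)
    totals = accumulate(deltas[day] for day in days)
    return list(zip(days, totals))


# Invented sample data, reduced to the two relevant fields.
sample_members = [
    {"properties": {"Eintrittsdatum": "2020-03-01", "Austrittsdatum": None}},
    {"properties": {"Eintrittsdatum": "2020-03-01", "Austrittsdatum": "2021-01-15"}},
    {"properties": {"Eintrittsdatum": "2020-06-10", "Austrittsdatum": None}},
]

if __name__ == "__main__":
    for day, total in member_count_series(sample_members):
        print(day.isoformat(), total)
```

The key point is that joins and leaves have to end up in one series with opposite signs before the cumulative sum is taken; summing the two queries separately gives two unrelated curves.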

How to get new Search Engine results in the past 24h using a SERP API?

Assume I have a SERP API which, given a keyword, returns the Google results for that keyword in JSON format (for example: https://serpapi.com/):
{
"organic_results": [
{
"position": 1,
"title": "Coffee - Wikipedia",
"link": "https://en.wikipedia.org/wiki/Coffee",
"displayed_link": "https://en.wikipedia.org › wiki › Coffee",
"snippet": "Coffee is a brewed drink prepared from roasted coffee beans, the seeds of berries from certain Coffea species. From the coffee fruit, the seeds are ...",
"sitelinks":{/*snip*/}
,
"rich_snippet":
{
"bottom":
{
"extensions":
[
"Region of origin: Horn of Africa and ‎South Ara...‎",
"Color: Black, dark brown, light brown, beige",
"Introduced: 15th century"
]
,
"detected_extensions":
{
"introduced_th_century": 15
}
}
}
,
"about_this_result":
{
"source":
{
"description": "Wikipedia is a free content, multilingual online encyclopedia written and maintained by a community of volunteers through a model of open collaboration, using a wiki-based editing system. Individual contributors, also called editors, are known as Wikipedians.",
"source_info_link": "https://en.wikipedia.org/wiki/Wikipedia",
"security": "secure",
"icon": "https://serpapi.com/searches/6165916694c6c7025deef5ab/images/ed8bda76b255c4dc4634911fb134de53068293b1c92f91967eef45285098b61516f2cf8b6f353fb18774013a1039b1fb.png"
}
,
"keywords":
[
"coffee"
]
,
"languages":
[
"English"
]
,
"regions":
[
"the United States"
]
}
,
"cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:U6oJMnF-eeUJ:https://en.wikipedia.org/wiki/Coffee+&cd=4&hl=en&ct=clnk&gl=us",
"related_pages_link": "https://www.google.com/search?q=related:https://en.wikipedia.org/wiki/Coffee+Coffee"
},
/* Results 2,3,4... */
]}
What is a good way to get new results from the past 24h? I added the &tbs=qdr:d query parameter, which only shows the results from the past day. That's a good first step.
The 2nd step is to filter out only relevant results. When there are no relevant results, Google shows this message box:
What is their algorithm to show this box?
Idea 1: "grep -i {exact_keywords}"
For example, if I search a keyword like "Alexander Pope", the 24h Google query might return results about the pope, written by a guy called Alexander. That's not super relevant. My naive idea is to grep (case insensitive) the exact keyword "Alexander Pope".
But that might leave out some good results.
Any other ideas?
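For reference, Idea 1 could be sketched like this in Python. It assumes the response has already been parsed into a dict with an "organic_results" list whose items carry "title" and "snippet", as in the JSON above; the function name and the sample results are invented for the example.

```python
import re


def filter_exact_phrase(organic_results, phrase):
    """Keep results whose title or snippet contains the exact phrase.

    Matching is case-insensitive and tolerates any run of whitespace
    between the words of the phrase.
    """
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    kept = []
    for result in organic_results:
        text = result.get("title", "") + "\n" + result.get("snippet", "")
        if pattern.search(text):
            kept.append(result)
    return kept


# Invented sample results, reduced to the fields the filter reads.
sample_results = [
    {"title": "Alexander Pope - Poems", "snippet": "Selected works."},
    {"title": "Pope visits Rome", "snippet": "Report by Alexander Smith."},
    {"title": "Poetry news", "snippet": "A new study of alexander pope's letters."},
]

if __name__ == "__main__":
    for result in filter_exact_phrase(sample_results, "Alexander Pope"):
        print(result["title"])
```

As noted above, this is strict: a relevant page that only says "Pope, Alexander" or uses the name in the body but not in the title or snippet would be dropped.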

Create order with different extra bags for outbound and inbound

I want to test the order create API by adding extra bags, and I am running into a strange problem.
I search for a Paris-NYC round trip, then I send the request to the offer price API using the include=detailed-fare-rules,bags parameter.
In the response, I get 2 kinds of extra bag information:
1 bag, 30 EUR
2 bags, 75 EUR
"bags": {
"1": {
"quantity": 1,
"name": "CHECKED_BAG",
"price": {
"amount": "30.00",
"currencyCode": "EUR"
},
"bookableByItinerary": true,
"segmentIds": [
"1",
"3"
],
"travelerIds": [
"1"
]
},
"2": {
"quantity": 2,
"name": "CHECKED_BAG",
"price": {
"amount": "75.00",
"currencyCode": "EUR"
},
"bookableByItinerary": true,
"segmentIds": [
"1",
"3"
],
"travelerIds": [
"1"
]
}
}
Everything goes well if I create the order by:
adding 1 bag for outbound (Paris to NYC) and 1 bag for inbound (NYC to Paris)
adding only 1 bag for outbound (0 extra bags for inbound)
adding 2 bags for outbound (Paris to NYC) and 2 bags for inbound (NYC to Paris)
The problem occurs in this scenario:
I create the order by adding 1 bag for outbound and 2 bags for inbound.
In this case, the order is created with a warning message:
"warnings": [
{
"status": 200,
"code": 0,
"title": "BookingWithPriceMarginWarning",
"detail": "The prices are lower than expected"
}
]
The created order then contains 1 extra bag for outbound and 1 extra bag for inbound.
So I have 2 questions about this strange problem:
Is it normal that my order is modified while the order is being created?
Is adding a different number of extra bags for different itineraries supported?
Thanks

Is it normal that my order is modified while the order is being created?
It depends on whether you are a Self-Service or an Enterprise user:
For Enterprise users, Flight Create Orders offers the possibility of a "best-effort" booking of additional services. If this option is activated, Flight Create Orders gives priority to the reservation of your flight and removes any additional service that cannot be booked. That's why you receive the warning in the response when this happens.
Self-Service users get the default behavior, which rejects the creation of the order if at least one additional service cannot be booked. In this case you will receive the following error:
{
"errors": [
{
"status": 400,
"code": 38034,
"title": "ONE OR MORE SERVICES ARE NOT AVAILABLE",
"detail": "Error booking additional services"
}
]
}
Is adding a different number of extra bags for different itineraries supported?
Yes, that is supported. Be aware that a plane cannot carry an unlimited number of bags, so you may get an error when adding extra bags if other passengers have already added too many.
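Since the two behaviors differ only in what comes back, a client can tell them apart by inspecting the response body. The sketch below is a minimal, hypothetical helper (the function name and status labels are mine, not part of the API); it relies only on the "warnings" and "errors" arrays shown in this thread.

```python
def classify_order_response(body):
    """Classify a parsed order-creation response body.

    Returns (status, titles), where status is one of "rejected",
    "created_with_warnings" or "created". These labels are made up
    for this example.
    """
    errors = body.get("errors") or []
    if errors:
        return "rejected", [error.get("title") for error in errors]
    warnings = body.get("warnings") or []
    if warnings:
        return "created_with_warnings", [w.get("title") for w in warnings]
    return "created", []


# The two bodies quoted in this thread.
warning_body = {
    "warnings": [
        {
            "status": 200,
            "code": 0,
            "title": "BookingWithPriceMarginWarning",
            "detail": "The prices are lower than expected",
        }
    ]
}
error_body = {
    "errors": [
        {
            "status": 400,
            "code": 38034,
            "title": "ONE OR MORE SERVICES ARE NOT AVAILABLE",
            "detail": "Error booking additional services",
        }
    ]
}

if __name__ == "__main__":
    print(classify_order_response(warning_body))
    print(classify_order_response(error_body))
```

When the status is "created_with_warnings", it is worth comparing the bags in the created order with what was requested, because some services may have been dropped.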

Querying BigQuery Events data in PowerBI

Hi, I have analytics events data exported from Firebase to BigQuery and need to create visualizations in Power BI using that BigQuery dataset. I'm able to access the dataset in Power BI, but some fields are of array type. I generally use UNNEST when querying in the console, but how can I run such a query inside Power BI? Is there any other option available? Thanks.
Table In BigQuery

Until the driver fully supports arrays, what we do is flatten the data in a view: create a view in BigQuery with UNNEST() and query that view in Power BI instead.
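A sketch of what such a view might look like, written as a small Python helper that builds the DDL string to run in BigQuery. The project, dataset, table and view names are placeholders, and the column names (event_date, event_name, user_pseudo_id, event_params with key and value.string_value / value.int_value) assume the standard Firebase export schema; adjust them to your table.

```python
def build_flatten_view_sql(source_table, view_name):
    """Return a CREATE VIEW statement that yields one row per event parameter."""
    return (
        f"CREATE OR REPLACE VIEW `{view_name}` AS\n"
        "SELECT\n"
        "  e.event_date,\n"
        "  e.event_name,\n"
        "  e.user_pseudo_id,\n"
        "  ep.key AS param_key,\n"
        "  ep.value.string_value AS param_string_value,\n"
        "  ep.value.int_value AS param_int_value\n"
        f"FROM `{source_table}` AS e,\n"
        "  UNNEST(e.event_params) AS ep"
    )


if __name__ == "__main__":
    # Placeholder names, not taken from the question.
    print(
        build_flatten_view_sql(
            "my-project.analytics_123456.events_20230101",
            "my-project.reporting.events_flat",
        )
    )
```

Note that the comma join with UNNEST drops events whose event_params array is empty; a LEFT JOIN UNNEST(...) keeps them. Power BI then sees the view as an ordinary flat table.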

You might need to transform your specific column (in your case event_params), that is, parse the JSON into columns/rows.
Here is some sample JSON as an example:
{
"quiz": {
"sport": {
"q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
},
"maths": {
"q1": {
"question": "5 + 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12"
},
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
}
}
}
}
I added this JSON to my table; currently the table has only one column.
Now go to Edit Queries and open the Transform tab. There you will find Parse; in my case I chose JSON.
When you parse the column as JSON, you get an expandable column.
Now click to expand it; sometimes it asks whether to expand to new rows.
Finally, you will have a table like this.

Unwind and alias used together in an OrientDB SQL query

Using OrientDB 2.2.16 and given the following data imported in a document database:
{
"teams": [
{
"name": "McLaren F1 Team",
"nationality": "british",
"headquarters": {
"city": "Woking",
"country": "England"
},
"drivers": [
{
"name": "Fernando Alonso",
"nationality": "Spanish",
"yearOfBirth": "1980"
},
{
"name": "Jenson Button",
"nationality": "British",
"yearOfBirth": "1980"
}
]
},
{
"name": "Scuderia Ferrari",
"nationality": "italian",
"headquarters": {
"city": "Maranello",
"country": "Italy"
},
"drivers": [
{
"name": "Sebastian Vettel",
"nationality": "German",
"yearOfBirth": "1987"
},
{
"name": "Kimi Raikkonen",
"nationality": "Finnish",
"yearOfBirth": "1979"
}
]
}
]
}
Using unwind, I want to find the query returning the names of all the drivers. To be exact, the result has to be a list of documents where each document contains the name of a driver in a property called "name".
My (not working) attempts:
SELECT drivers.name FROM Teams unwind drivers
It returns almost what I expect, but the name is placed under a property called "drivers".
SELECT drivers.name AS name FROM Teams unwind drivers
Fails totally, there's no unwind at all.
SELECT drivers.name AS name FROM Teams unwind name
This works, but it's kind of a bug actually because the alias applies to drivers, not name, and that's why the unwind works.

A little background on how the query is processed:
1. Teams data is fetched from the storage.
2. Each record is filtered (no filtering in this case, because there is no WHERE condition).
3. For each source record, the engine calculates the projections and creates a new document containing the projection values bound to aliases.
In your case, at this step you have two records:
Query 1: the default alias for drivers.name in v2.2 is drivers (in v3.0 this will change: the default alias will be drivers.name)
+-----------------------------------------+
| drivers                                 |
+-----------------------------------------+
| ["Fernando Alonso", "Jenson Button" ]   |
| ["Sebastian Vettel", "Kimi Raikkonen" ] |
+-----------------------------------------+
Queries 2 and 3: in this case the alias is name, because you define it explicitly
+-----------------------------------------+
| name                                    |
+-----------------------------------------+
| ["Fernando Alonso", "Jenson Button" ]   |
| ["Sebastian Vettel", "Kimi Raikkonen" ] |
+-----------------------------------------+
4. The UNWIND is calculated on the result of step 3. The aliases of the result are the same as in the previous step:
Query 1 unwinds drivers as expected, but the alias remains drivers.
Query 2 tries to unwind drivers but cannot find it, because the only property at that point is name; this is why it fails.
Query 3 unwinds name as expected.
In conclusion: this is the expected behavior.
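To see why the order of projection and UNWIND matters, here is a simplified Python model of those steps. It is only an illustration of the explanation above, not OrientDB code: `project_driver_names` and `unwind` are made-up helpers, and the data is reduced to the fields the three queries touch.

```python
teams = [
    {
        "name": "McLaren F1 Team",
        "drivers": [{"name": "Fernando Alonso"}, {"name": "Jenson Button"}],
    },
    {
        "name": "Scuderia Ferrari",
        "drivers": [{"name": "Sebastian Vettel"}, {"name": "Kimi Raikkonen"}],
    },
]


def project_driver_names(records, alias):
    """Projection step: evaluate drivers.name and bind the list to an alias."""
    return [{alias: [d["name"] for d in rec["drivers"]]} for rec in records]


def unwind(rows, field):
    """UNWIND step: one output row per element of rows[i][field].

    Rows without a list under `field` pass through unchanged, which
    models the "no unwind at all" result of query 2.
    """
    out = []
    for row in rows:
        value = row.get(field)
        if isinstance(value, list):
            out.extend({**row, field: item} for item in value)
        else:
            out.append(row)
    return out


# Query 1: SELECT drivers.name FROM Teams UNWIND drivers (default alias: drivers)
query1 = unwind(project_driver_names(teams, "drivers"), "drivers")
# Query 2: SELECT drivers.name AS name FROM Teams UNWIND drivers
query2 = unwind(project_driver_names(teams, "name"), "drivers")
# Query 3: SELECT drivers.name AS name FROM Teams UNWIND name
query3 = unwind(project_driver_names(teams, "name"), "name")

if __name__ == "__main__":
    print(query1)
    print(query2)
    print(query3)
```

In this model only query 3 yields four documents with a "name" property, because by the time the unwind runs the projected rows contain only the alias.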
