Wikipedia API Extraction of abstracts in 2 languages - wikipedia-api

I am trying to connect 2 API queries.
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro=&explaintext=&titles=Albert+Einstein&format=json
Where I search for article descriptions and
https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllang=de&titles=Companion%20dog
Where I retrieve the name of the article in another language (here German).
Is there a way to connect them to retrieve description data both in English and German?
I have tried connecting them via "generators", but I do not understand how to apply them here.
I also tried issuing a second query for the descriptions after extracting the titles in both languages. However, the titles are sometimes formatted in a way that prevents me from reusing them in the query.

No. The description is a snippet from the start of the article. If you want a German description, you need to get it from the German Wikipedia (i.e. a different API endpoint).
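A minimal sketch of that two-endpoint approach, using only the Python standard library. The function names (api_get, first_page, extracts_in_two_languages) are made up for illustration. It assumes the default JSON format, where pages are keyed by page ID and the linked title sits under the "*" key, and it asks the English wiki for prop=extracts|langlinks in a single request before querying the German wiki. Building the URL with urlencode also takes care of titles containing spaces or other special characters.

```python
import json
import urllib.parse
import urllib.request

API = "https://{lang}.wikipedia.org/w/api.php"


def api_get(lang, **params):
    """Call the action API of one language edition and return the parsed JSON."""
    params["format"] = "json"
    url = API.format(lang=lang) + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent string; replace it with one identifying your tool.
    req = urllib.request.Request(url, headers={"User-Agent": "extract-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def first_page(response):
    """Return the single page object from a one-title query response."""
    pages = response["query"]["pages"]  # dict keyed by page id (default format)
    return next(iter(pages.values()))


def extracts_in_two_languages(title, source="en", target="de", get=api_get):
    """Return {source: intro, target: intro}; the target is None if no link exists."""
    # Request 1: intro extract plus the language link, both from the source wiki.
    src = first_page(get(source, action="query", prop="extracts|langlinks",
                         exintro="", explaintext="", lllang=target, titles=title))
    links = src.get("langlinks", [])
    if not links:
        return {source: src.get("extract"), target: None}
    target_title = links[0]["*"]
    # Request 2: intro extract from the target wiki, i.e. a different endpoint.
    tgt = first_page(get(target, action="query", prop="extracts",
                         exintro="", explaintext="", titles=target_title))
    return {source: src.get("extract"), target: tgt.get("extract")}
```

Calling extracts_in_two_languages("Companion dog") would perform the two requests and return a dictionary with an "en" and a "de" entry.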

Related

Is it possible to rewrite a description with the same meaning but different words using data manipulation?

I want to copy data from a website that sells courses such as ITIL, Prince2, PMP and many other IT-sector courses. It has descriptions for 20,000 different courses.
I want to use Selenium to scrape all the data, but the descriptions are still subject to copyright.
Kindly let me know how I can rewrite all of those descriptions so that they keep the same meaning but use different words.
Is there any API I could use to build code that rewrites these descriptions, either by substituting synonyms or by changing the grammar to produce completely new sentences with the same meaning?
Kindly let me know where to start.
Thanks,
The task you are referring to is called paraphrasing.
There is a lot of research in the field. On arXiv you will find research papers on the topic. However, since you are asking for an API, I am assuming you don't want to implement these models yourself. Luckily, some authors have published their models online on GitHub. (Note: some are re-implementations by someone else.)
When you use some of these implementations, note that most offer a pre-trained model. Do read which data set was used for training and try to pick the one that is the most similar to the data that you are facing. By doing so, more words in the domain of your descriptions will be available and more synonyms can be used.

fetching wikidata labels in other languages from reconciled column

I want to use wikidata reconciliation to translate a column of terms into various languages by fetching the labels in those languages. Using SPARQL, I'd filter a query for label by language (this is the approach suggested in various similar cases). I don't see how to do the same using OpenRefine reconciliation, however.
Maybe the problem is that the wikidata API is language-specific?
Say that you want to fetch labels in Italian, which has language code it. You can do that by entering Lit in the property input. You can also fetch descriptions with Dit or aliases with Ait. To fetch these terms in other languages, replace it by other language codes.
This is only documented at https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation so far - I acknowledge that we need a more visible documentation for this (ideally it should be easily accessible from OpenRefine's user interface, given that the reconciliation service comes preconfigured in OpenRefine).

MediaWiki api for Wikipedia - is it possible to search by title on ALL languages?

I know that to look up the page ID of a Wikipedia article with a known title, I can do:
https://en.wikipedia.org/w/api.php?action=query&titles=7_Studios
However, in this case, 7_Studios is a French Wikipedia article, so the above link would not work. Instead I need to try
https://fr.wikipedia.org/w/api.php?action=query&titles=7_Studios
My question is: if I know only the title and not which language the article is written in, how can I make sure I find it using the API?
As Bergi mentioned, you can use Wikidata for this: it contains the database of interwiki links. It's possible that some article titles won't be there, but most should be.
To do this, you can use the wbgetentities module: you specify the title to search for and a list of wikis to search. For example:
https://www.wikidata.org/w/api.php?action=wbgetentities&titles=7_Studios&sites=enwiki|frwiki|nlwiki|dewiki
You can specify up to 50 wikis in one query. Currently, there are around 300 Wikipedias, so if you really need to query all of them, you may need up to 6 requests for each title.
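The batching described above could be sketched as follows. The helper names are hypothetical, props=sitelinks is added only to keep the response small, and the check for a "missing" key on unmatched lookups is an assumption about the response shape that should be verified against a real reply.

```python
import urllib.parse

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
MAX_SITES = 50  # per-request limit mentioned in the answer


def wbgetentities_urls(title, sites, batch_size=MAX_SITES):
    """Build one wbgetentities URL per batch of at most batch_size site ids."""
    urls = []
    for start in range(0, len(sites), batch_size):
        batch = sites[start:start + batch_size]
        query = urllib.parse.urlencode({
            "action": "wbgetentities",
            "titles": title,
            "sites": "|".join(batch),
            "props": "sitelinks",
            "format": "json",
        })
        urls.append(f"{WIKIDATA_API}?{query}")
    return urls


def found_sitelinks(response):
    """Map site id -> page title for every entity that was actually found."""
    found = {}
    for entity in response.get("entities", {}).values():
        if "missing" in entity:  # assumption: unmatched lookups carry this key
            continue
        for site, link in entity.get("sitelinks", {}).items():
            found[site] = link["title"]
    return found
```

Each URL can then be fetched in turn, and the results of found_sitelinks merged, until the title turns up on some wiki.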

How to get information in info box of Wikipedia articles using Wikipedia api?

I'm trying to get the lead actor's name from a movie's Wikipedia article.
I tried different values for prop, prop=info seems most relevant. But this doesn't contain the information in info box of Wikipedia article.
See:
http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Casino_Royale_(2006_film)&format=jsonfm
Is it possible to extract information in infobox using Wikipedia API?
The MediaWiki API doesn't understand infoboxes. So, you have basically two options:
Parse the infobox yourself. You can either parse the wikitext directly or the generated HTML table (both are available from the API).
Let somebody else do the parsing. This is exactly what DBPedia does. Wikidata tries to do something similar, but it probably won't contain enough data to be usable for a long time; see growth statistics.
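For the first option, a rough sketch of parsing the wikitext yourself might look like this. It assumes the page wikitext has already been fetched (for example with action=query&prop=revisions&rvprop=content), and the function names are made up for illustration. It only splits the first infobox into its top-level parameters by tracking {{ }} and [[ ]] nesting, so unusual markup can still trip it up; a dedicated wikitext parser would be more robust.

```python
import re


def infobox_fields(wikitext):
    """Return {name: value} for the top-level parameters of the first infobox."""
    match = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if match is None:
        return {}
    depth, i = 0, match.start()
    parts, current = [], []
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:  # closing braces of the infobox itself
                break
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            # A pipe at depth 1 separates two infobox parameters.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    parts.append("".join(current))

    fields = {}
    for part in parts[1:]:  # parts[0] is the template name, e.g. "{{Infobox film"
        name, sep, value = part.partition("=")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def linked_names(value):
    """Return the targets of all [[wiki links]] found in a parameter value."""
    return [target.strip() for target in re.findall(r"\[\[([^\]|]+)", value)]
```

With a film infobox, linked_names(infobox_fields(text).get("starring", "")) would give the linked actor names in the order they are listed, provided the infobox uses a parameter called starring.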

Developing English query-based search for an embodied agent

I'm looking to create an Embodied Agent for handling search requests on a website. The agent needs to be able to handle simple questions, and provide a series of website links for an answer.
All the articles are in a database. Each article has a title field, and a series of tags to categorize the article.
At this point, my simple algorithm would be:
Split the question up into a series of words.
Remove all common words like "a", "the", "how", etc.
Create a "where" clause, searching the article body, article title, and tags for the remaining words.
Display the list, possibly ranked with those articles with matches in the title first, tags second, and article body third.
Is there a better algorithm for converting an English question into a SQL query? Are there specific details that should be tracked along with each article by the article author to further improve search results? Are there details that should be recorded over time while the search is in use to further improve search results?
UPDATE: The website will be running on IIS, with the latest ASP.NET. The backend database will be a SQL Server.
There really isn't an easy solution for true English query parsing. Most search engines simply eliminate noise words, like you're proposing, and look for the remaining terms. If you're using Microsoft SQL, you may want to look at Full-Text Search (SQL Server). You may also want to read Semantic Search (SQL Server), if you can use Microsoft SQL Server 2012. If you're using MySQL, see 12.9. Full-Text Search Functions.
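As an illustration of the noise-word approach from the question, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the articles table layout, the stop-word list and the 3/2/1 weighting for title, tag and body matches are all assumptions made for the example. On a real site, the full-text search features mentioned above would replace the LIKE clauses.

```python
import re
import sqlite3

# Assumed noise-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "the", "how", "do", "i", "is", "to", "of", "in", "what", "for"}


def keywords(question):
    """Split the question into lowercase words and drop the noise words."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOP_WORDS]


def search(conn, question):
    """Return (title, score) rows, best match first.

    Each keyword adds 3 for a title match, 2 for a tag match, 1 for a body match.
    """
    terms = keywords(question)
    if not terms:
        return []
    score_parts, params = [], []
    for term in terms:
        like = f"%{term}%"
        score_parts.append("((title LIKE ?) * 3 + (tags LIKE ?) * 2 + (body LIKE ?))")
        params += [like, like, like]
    sql = (
        "SELECT title, score FROM ("
        f"SELECT title, {' + '.join(score_parts)} AS score FROM articles"
        ") WHERE score > 0 ORDER BY score DESC, title"
    )
    # Terms are passed as bound parameters, never pasted into the SQL text.
    return conn.execute(sql, params).fetchall()
```

The connection passed to search is expected to contain an articles table with title, tags and body columns.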
You might find Kueri.me relevant.
Kueri converts natural language to SQL. It comes with a JavaScript library out of the box that can be integrated into a website.
You will be able to ask:
show me articles
top 10 articles by rating
bottom 5 articles by creation date
last 7 articles added in the last week and with description containing "xx" or "yy"
show all articles with more than 2 rankings
how many articles with no rating per section
etc