Select a language in the Google search API - vb.net

I am using the Google search API to fetch some results from Google and store them in a ListBox.
This works fine, but I want to search in my own language (Dutch) instead of the default language (English).
I tried "dutch", "nl", "nl-NL" and language.dutch, but none of them seems to work. I've also searched a lot but couldn't find the right answer.
Dim cl As New GwebSearchClient("www.google.nl")
Dim rtnList As IList(Of IWebResult) = cl.Search(KEYWORD, LENGTH, LANGUAGE)
For Each itm As IWebResult In rtnList
ListBox1.Items.Add(itm.Url)
Next

I am not sure if this is what you are looking for, as I have no experience with this API, but maybe this information helps:
Specific UI language - if you want to specify the language for the UI components (SearchControl, branding, etc.) instead of having it auto-detected, then set "language" to the specific language code, such as en, es, zh-CN or pt-PT.
google.load("search", "1", {"language" : "en"});
From here.

Related

Bing Spell Check API works only in English

I'm trying the Bing Spell Check API, but it doesn't seem to work correctly with languages other than English (see the list of available languages for Spell Check).
I tried to check French text, but the results flag mistakes in perfectly fine text and, conversely, miss real mistakes.
I've tried checking this text:
La Terre a un noyau interne solide
This is how I've passed the language:
var result = client.SpellCheckerWithHttpMessagesAsync(text: text, mode: "spell", acceptLanguage: "fr-FR").Result;
I've also tried setLang:
var result = client.SpellCheckerWithHttpMessagesAsync(text: text, mode: "spell", setLang: "fr-FR").Result;
The result suggested changing solide to solid which is wrong.
I've tried other texts as well as different languages with the same results.
Am I missing something in how to use this API?
Please use the market parameter mkt=fr-fr in the query and drop the setLang parameter.
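For reference, here is a minimal Python sketch of what that request could look like at the REST level. The endpoint URL and the Ocp-Apim-Subscription-Key header are assumptions based on the v7 REST interface, and build_spellcheck_request is a hypothetical helper. The only point is that the language goes into mkt and setLang is left out.

```python
from urllib.parse import urlencode

# Assumed v7 REST endpoint; adjust to whatever your subscription uses.
ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/spellcheck"


def build_spellcheck_request(text, market="fr-FR", mode="spell", key="YOUR-KEY"):
    """Return (url, headers) for a spell-check call (hypothetical helper)."""
    # The language is selected through mkt only; setLang is deliberately omitted.
    params = {"text": text, "mode": mode, "mkt": market}
    headers = {"Ocp-Apim-Subscription-Key": key}
    return ENDPOINT + "?" + urlencode(params), headers


url, headers = build_spellcheck_request("La Terre a un noyau interne solide")
print(url)
```

The sketch only builds the request, so it can be checked without a subscription key; sending it is left to whichever HTTP client you prefer.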

Normalize string from HtmlAgilityPack document

I'm trying to get a web page using vb.net and HtmlAgilityPack with this code:
Dim mWPage As New HtmlAgilityPack.HtmlDocument
Dim wC As New WebClient()
mWPage.Load(wC.OpenRead(mUrl))
My problem is getting text from a table: when I extract InnerText, I get something like this:
Modificat<!--span-->i dati
instead of (Note that I wrote the same string and below it's displayed correctly):
Modificati dati
I tried to use the answer here, but it doesn't work in this case (or I wasn't able to make it work).
I noticed that the content changes when I change the "User-Agent", so I tried various "User-Agent" values, but I never got clean text.
So my questions are:
Can I use the code indicated in that answer to solve the problem?
If not, can I get clean text by using the right "User-Agent"?
If so, how can I find the right "User-Agent"?
If not, how can I fix the received string?
The response from the server based on a new User-Agent is fully dependent on the server so we will not be able to predict which one will yield the response you're looking for.
But you can use the HttpUtility.HtmlDecode method to decode the HTML-encoded text and turn it into the string you're looking for.
To filter out the HTML comment you may need to change the XPath you're using. If you append //text(), you should get only the text elements that match the rest of your expression.
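The same idea (decode entities, keep only text nodes, ignore comments) can be illustrated outside .NET. This is a Python analogue using only the standard library, not HtmlAgilityPack code; TextOnly and extract_text are names made up for the sketch.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes only; tags and comments are ignored."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; inside text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(fragment):
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.parts)


print(extract_text("Modificat<!--span-->i dati"))
```

Because the comment never reaches handle_data, the two text pieces on either side of it are joined back into one string.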

Date range search using Google Custom Search API

I am using the Google Custom Search API to search for images. My implementation is using Java, and this is how I build my search string:
URL url = new URL("https://ajax.googleapis.com/ajax/services/search/images?"
+ "v=1.0&q=barack%20obama&userip=INSERT-USER-IP");
How would I modify the URL to limit the search results to a date range, for example from 2014-08-15 to 2014-09-31?
You can specify a date range using the sort parameter. For your example, you would add this to your query string: sort=date:r:20140815:20140931.
This is documented at https://developers.google.com/custom-search/docs/structured_data#page_dates
Also if you use Google's Java API you can use the Query class and its setSort() method rather than building the URL by hand.
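If the range comes from date objects rather than hand-typed strings, the sort value can be generated. A small Python sketch, where date_range_sort is a hypothetical helper and the date:r:YYYYMMDD:YYYYMMDD shape is taken from the answer above:

```python
from datetime import date


def date_range_sort(start, end):
    """Format a sort value of the form date:r:YYYYMMDD:YYYYMMDD."""
    if start > end:
        raise ValueError("start must not be after end")
    return "date:r:{:%Y%m%d}:{:%Y%m%d}".format(start, end)


# 30 September is used here because date(2014, 9, 31) raises ValueError:
# September has only 30 days.
sort_value = date_range_sort(date(2014, 8, 15), date(2014, 9, 30))
print(sort_value)
```

Going through date objects also catches impossible dates before they end up in the query string.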
I think the better way is to put this into the query itself. The query parameter supports an 'after' operator, which can be used like this:
https://customsearch.googleapis.com/customsearch/v1?
key=<api_key>&
cx=<search_engine_id>&
q="<your_search_word> after:<YYYY-MM-DD>"
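When building that URL in code, the query needs to be URL-encoded, since it contains spaces and a colon. A minimal Python sketch, assuming the after: operator behaves as described above; build_search_url is a hypothetical helper and the key and engine ID are placeholders.

```python
from urllib.parse import urlencode

BASE = "https://customsearch.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, terms, after=None):
    """Build the request URL, appending after:<YYYY-MM-DD> to the query if given."""
    q = terms if after is None else "{} after:{}".format(terms, after)
    # urlencode escapes spaces and the colon inside the query for us.
    return BASE + "?" + urlencode({"key": api_key, "cx": engine_id, "q": q})


search_url = build_search_url("<api_key>", "<search_engine_id>",
                              "barack obama", after="2014-08-15")
print(search_url)
```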

How do I access the "See Also" Field in the Wiktionary API?

Many of the Wiktionary pages for Chinese Characters (Hanzi) include links at the top of the page to other similar-looking characters. I'd like to use the Wiktionary API to send a single character in the query and receive a list of similar characters as the response. Unfortunately, I can't seem to find any query that includes the "See Also" field. Is this kind of query possible?
The “see also” field is just a line of wiki code in the page source, and there is no way for the API to know that it's different from any other piece of text on the page.
If you are happy with using only the English version of Wiktionary, you can fetch the wikicode: index.php?title=太&action=raw, and then parse the result for the template also. In this case, the line you are looking for is {{also|大|犬}}.
To check if the template is used on the page at all, query the API for titles=太&prop=templates&tltemplates=Template:also
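Once the raw wikicode has been fetched, pulling the entries out of the also template is a small parsing job. A Python sketch, assuming the template has the simple positional form shown above; see_also_entries is a hypothetical helper, and skipping named parameters (anything containing "=") is my own assumption.

```python
import re

# Matches {{also|...}} and captures everything between the first | and }}.
ALSO_TEMPLATE = re.compile(r"\{\{also\|([^{}]*)\}\}")


def see_also_entries(wikitext):
    """Return the positional arguments of the first {{also|...}} template."""
    match = ALSO_TEMPLATE.search(wikitext)
    if match is None:
        return []
    parts = (p.strip() for p in match.group(1).split("|"))
    # Skip empty pieces and named parameters such as key=value (assumption).
    return [p for p in parts if p and "=" not in p]


print(see_also_entries("{{also|大|犬}}\n==Translingual=="))
```

For the other language editions, the template name in the pattern would have to be swapped for the local one.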
Similar templates are available in more language editions of Wiktionary, in case you want to use sources other than the English one. The current list is:
br:Patrom:gwelet
ca:Plantilla:vegeu
cs:Šablona:Viz
de:Vorlage:Siehe auch
el:Πρότυπο:δείτε
es:Plantilla:desambiguación
eu:Txantiloi:Esanahi desberdina
fi:Malline:katso
fr:Modèle:voir
gl:Modelo:homo
id:Templat:lihat
is:Snið:sjá einnig
it:Template:Vedi
ja:テンプレート:see
no:Mal:se også
oc:Modèl:veire
pl:Szablon:podobne
pt:Predefinição:ver também
ru:Шаблон:Cf
sk:Šablóna:See
sv:Mall:se även
It has been suggested that the Wikidata project be expanded to cover Wiktionary. If and when that happens, you might be able to query the Wikidata API for that kind of stuff!

citeseerx search api

Is there a way to access CiteSeerX programmatically (e.g. search by author and/or title?) Surprisingly I cannot find anything relevant; surely others too are trying to get scholarly article metadata without resorting to scraping?
EDIT: note that CiteSeerX supports OAI PMH, but that seems to be an API geared towards digital libraries keeping up to date with each other ("content dissemination") and does not specifically support search. Moreover the citeseer info on that page is very sparse and even says "Currently, there are difficulties with the OAI".
There is another SO question about CiteSeerX API (though not specifically search); the 2 answers do not resolve the problem (one talks about Mendeley, another piece of software, and the other says OAI-PMH implementations are free to offer extensions to the minimal spec).
Alternatively, can anyone suggest a good way to obtain citations from authors/titles programmatically?
As suggested by one of the commenters, I tried jabref first:
jabref -n -f "citeseer:title:(lessons from) author:(Brewer)"
However jabref seems to not realize that the query string needs to contain colons and so throws an error.
For search results, I ended up scraping the CiteSeerX results with Python's BeautifulSoup:
from urllib.request import urlopen
from bs4 import BeautifulSoup
# author_last and title are assumed to be defined earlier
url = "http://citeseerx.ist.psu.edu/search?q="
q = "title%3A%28{1}%29+author%3A%28{0}%29&submit=Search&sort=cite&t=doc"
url += q.format(author_last, title.replace(" ", "+"))
soup = BeautifulSoup(urlopen(url).read(), "html.parser")
# take the first hit in the result list
result = soup.html.body("div", id="result_list")[0].div
title = result.h3.a.string.strip()
authors = result("span", "authors")[0].string
authors = authors[len("by "):].strip()  # drop the leading "by "
date = result("span", "pubyear")[0].string.strip(", ")
It is possible to get a document ID from the results (the misleadingly-named "doi=..." part in the summary link URL) and then pass that to the CiteSeerX OAI engine to get Dublin Core XML (e.g. http://citeseerx.ist.psu.edu/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:CiteSeerX.psu:10.1.1.42.2177); however that XML ends up containing multiple dc:date elements, which makes it less useful than the scrape output.
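To show what dealing with the repeated dc:date elements involves, here is a rough Python sketch that collects them from a Dublin Core record. The SAMPLE record is made up for illustration and is not real CiteSeerX output, and treating the earliest date as the publication date is only an assumption.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"


def dc_dates(xml_text):
    """Return the text of every dc:date element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(DC + "date") if el.text]


# Made-up record for illustration only.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:date>2009-04-19</dc:date>
  <dc:date>2007-11-22</dc:date>
  <dc:date>1999</dc:date>
</record>"""

dates = dc_dates(SAMPLE)
# Strings that start with a four-digit year compare in chronological order.
earliest = min(dates)
print(dates, earliest)
```

Whether the earliest value really is the publication year depends on how the record was populated, which is why the scraped pubyear may still be the more reliable field.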
Too bad CiteSeerX makes people resort to scraping in spite of all the open archives / open access rhetoric.