MediaWiki API is returning empty extracts from Wikipedia

This page is returning an empty extract:
https://fr.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Alerte%20Rouge%20%28groupe%29&explaintext&rvprop=content&format=json
But the same query works for other pages:
https://fr.wikipedia.org/w/api.php?action=query&prop=extracts&titles=B%C3%A9rurier_noir&explaintext&rvprop=content&format=json
Adding "&exlimit=max&exintro" as suggested in other topics didn't fix the issue.
Am I doing something wrong?

On French Wikipedia, "Alerte Rouge (groupe)" is a redirect to "Alerte rouge (groupe)".
Requesting extracts on redirects does not work. Thus you have to use the correct title, i.e. https://fr.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Alerte%20rouge%20(groupe)&explaintext&format=json
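
As a rough sketch of an alternative, the MediaWiki query API also accepts a redirects parameter that asks the server to resolve redirects in the requested titles, so the exact capitalization does not have to be known in advance. The helper name build_extract_url and its follow_redirects argument are made up for this illustration; only the URL is built here, no request is sent.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://fr.wikipedia.org/w/api.php"


def build_extract_url(title, follow_redirects=True):
    """Build a prop=extracts query URL (hypothetical helper, no network call)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "explaintext": 1,
        "format": "json",
    }
    if follow_redirects:
        # Ask the API to resolve redirects such as
        # "Alerte Rouge (groupe)" -> "Alerte rouge (groupe)".
        params["redirects"] = 1
    return API_ENDPOINT + "?" + urlencode(params)


url = build_extract_url("Alerte Rouge (groupe)")
print(url)
```

Fetching the printed URL with any HTTP client should then return the extract of the redirect target, assuming the redirect still exists on French Wikipedia.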

Related

Apache regex: 301 redirects to eradicate duplicates in URL path

We are using a CMS that produces URLs of the format www.domain.com/home/help/contact/contact. Here the first occurrence of contact is the directory and the second occurrence is the HTML page itself. These URLs are causing issues in the SEO space.
We have implemented canonical tags, but the business wants to make sure these duplicates don't show up in either the search engines or Google Analytics, and has asked us to implement a 301 solution on our web server.
My question: we have a regex that finds these matches, but I also need the part of the URL before the match. The regex we have is .*?([\w]+)\/\1+ and it returns contact in /home/help/contact/contact. How can I get the /home/help/ path as well, so I can redirect to the right page? Can someone help with this, please? I am a beginner when it comes to regex.
Since you're able to get contact using a matching group, enclose everything before that inside a matching group as well:
(.*?)(/[\w]+)\2+
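I have put the / inside a matching group too, so that you won't get false positives for paths such as
/home/some/app/page
where the original pattern would find a repetition in app/page: the last p of app is captured, and the p that follows the / matches the backreference. With the new pattern, \1 holds everything before the repeated segment.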
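
A minimal sketch of how the two groups could be combined into a redirect target, written in Python rather than Apache configuration. The function name redirect_target and the ^ and $ anchors are assumptions added for illustration (the anchors assume the duplicated segment is the end of the path); they are not part of the answer above.

```python
import re

# Group 1: everything before the repeated segment.
# Group 2: the segment including its leading slash, repeated once via \2.
# Assumption: the duplicate sits at the end of the path, hence ^ and $.
DUPLICATE_TAIL = re.compile(r"^(.*?)(/\w+)\2$")


def redirect_target(path):
    """Return the de-duplicated path, or None when nothing is repeated."""
    match = DUPLICATE_TAIL.match(path)
    if match is None:
        return None
    prefix, segment = match.groups()
    return prefix + segment


print(redirect_target("/home/help/contact/contact"))
print(redirect_target("/home/some/app/page"))
```

The first call yields /home/help/contact and the second yields None, since app/page contains no repeated segment. In an Apache rewrite rule, the same two groups would be referenced as $1 and $2 in the substitution.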

The first result in Google is not my homepage

When you type hackisition on Google, it returns the following url as first result:
https://www.hackisition.com/en/
// instead of
https://www.hackisition.com/
I'd like to replace that link by the real homepage. How can I do that? Is there a way to specifically ask Google to show this homepage?
I am getting "https://www.hackisition.com/" as the first result for "hackisition". I'm not sure why you're getting that result. Try clearing your cookies and turning off your VPN/proxy if you're using one.
We are getting the result you described, so I suggest clearing your cache.
Maybe that is because of the user's location.
That means users in countries where English is the preferred search language would see the /en result,
and users in other countries would see the non-/en result.
Search for your website on
Google.ae
Google.DK
and you'll see https://www.hackisition.com in the results, not https://www.hackisition.com/en.
But if you search on Google.com or Google.eu, you'll see https://www.hackisition.com/en.
Hope this solves your problem.

Google Analytics API: how to extract pageviews for a specific page?

I tried using something like
ga:pagePath=~page.php%3fid%3d44 (page.php?id=44)
but it doesn't seem to work... I get "no results found" where I have 20 pageviews for sure
UPDATE
I think I found the solution
ga:pagePath==/website/page.php?id=44
for some reason I had to include the complete path and ==
To match a page by partial path in filters, you should use
ga:pagePath=@page.php?id=44
=@ tells GA to match a substring.
What you were originally using (=~) is the regular-expression operator, which is not what you want here.
I think your problem is that you put the hex-encoded versions of the ? and = characters into your query, which doesn't match how Analytics stores the page paths. If you change these to the normal characters it should work:
ga:pagePath=~page.php?id=44
Keep in mind that =~ treats the value as a regular expression, so the ? and . may need escaping (page\.php\?id=44) to be matched literally.
Your other solution should work as well, but it is less flexible if you want to tweak the query to return other pages.
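
The difference between the three operators can be sketched as follows. The helper names are hypothetical, and Python's re module is used only to illustrate how an unescaped ? behaves in a regular expression; the Analytics regex engine is not Python's and may differ in details.

```python
import re

PAGE_PATH = "/website/page.php?id=44"  # full path from the question
FRAGMENT = "page.php?id=44"            # partial path to look for


def exact_filter(path):
    # == requires the complete page path.
    return "ga:pagePath==" + path


def substring_filter(fragment):
    # =@ matches when the fragment appears anywhere in the path.
    return "ga:pagePath=@" + fragment


def regex_filter(fragment):
    # =~ takes a regular expression, so literal text is escaped first.
    return "ga:pagePath=~" + re.escape(fragment)


# Unescaped, "?" makes the preceding "p" optional instead of matching "?".
unescaped_matches = re.search(FRAGMENT, PAGE_PATH) is not None
escaped_matches = re.search(re.escape(FRAGMENT), PAGE_PATH) is not None

print(exact_filter(PAGE_PATH))
print(substring_filter(FRAGMENT))
print(regex_filter(FRAGMENT))
print(unescaped_matches, escaped_matches)
```

When the filter is sent as part of a request URL, the whole filter string still has to be URL-encoded by the HTTP client; that encoding is separate from the regex escaping shown here.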

Google Custom Search API: how to return country-specific results only

I am writing some PHP code that takes a given search phrase and URL and searches through the Google search results until it finds the URL (only the first 100 results). My problem is that this only works for the US. I have tried adding the "&cr=" option, but it still only returns US results.
The full URL I am using for the request is:
https://www.googleapis.com/customsearch/v1?key=API_KEY&cx=CX_VALUE&q=KEYWORD&cr=COUNTRY&alt=JSON
Does anyone have any experience with this? I want to be able to see UK results. I tried inserting &cr=countryUK, but it still only returns US results.
Thanks :)
Regards,
Stian
Use the gl=<country code> parameter to limit it to your country of choice (so gl=gb for the UK).
More info here:
http://googleajaxsearchapi.blogspot.com/2009/10/web-search-in-your-country.html
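
A small sketch of how the request URL from the question might look with gl added. It is written in Python rather than PHP, the function name build_search_url is made up for illustration, and API_KEY, CX_VALUE and KEYWORD are the placeholders from the question, not working values. No request is sent.

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, cx, query, country_code):
    """Build a Custom Search request URL with a gl country code."""
    params = {
        "key": api_key,
        "cx": cx,
        "q": query,
        "gl": country_code,  # two-letter country code, e.g. "gb" for the UK
        "alt": "json",
    }
    return ENDPOINT + "?" + urlencode(params)


url = build_search_url("API_KEY", "CX_VALUE", "KEYWORD", "gb")
print(url)
```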

Controller not found issue when rewriting url with exclamation mark

In MonoRail I'm trying to create a URL rewriting rule to give friendly URLs to article posts. Here's what the URLs look like:
http://domain.com/2010/11/29/Winter-snow-warning
And here's the code in global.asax.cs to rewrite the urls:
RoutingModuleEx.Engine.Add(
    new PatternRoute("/<year>/<month>/<day>/<title>")
        .DefaultForController().Is("post")
        .DefaultForAction().Is("show")
        .Restrict("year").ValidInteger
        .Restrict("month").ValidInteger
        .Restrict("day").ValidInteger
);
This works great, however if there is an exclamation point in the url:
http://domain.com/2010/11/29/Winter-snow-warning!!
Then it doesn't match the rewriting rule and errors out, saying the controller "2010" cannot be found. What am I missing here? Is this a bug in MonoRail?
Perhaps the default matching mechanism of MonoRail's routing does not accept the exclamation mark, so the route does not match and the default /controller/action rule is matched instead, which fails because no 2010 controller exists.
A quick workaround could be to restrict the title to the exact expression that fits your needs, e.g.: .Restrict("title").ValidRegex("[-_.+!*'() a-zA-Z0-9]+")
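
To check which titles such a character class would accept, the expression can be tried outside the application first. The sketch below uses Python's re module, not the .NET regex engine that MonoRail relies on, and it assumes the restriction must match the whole title segment; the pattern is the character class from the workaround followed by +, and the helper name accepts is made up.

```python
import re

# Character class from the workaround, followed by "+" (one or more characters).
TITLE_PATTERN = r"[-_.+!*'() a-zA-Z0-9]+"


def accepts(title):
    """Return True when the whole title consists of allowed characters."""
    return re.fullmatch(TITLE_PATTERN, title) is not None


print(accepts("Winter-snow-warning"))
print(accepts("Winter-snow-warning!!"))
print(accepts("bad/title"))
```

Both warning titles are accepted, while a title containing a slash is rejected because / is not in the class.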