Can I get a list of Wikimedia files filtered by a regex? - wikipedia-api

I am looking to find all images by Kawahara Keiga from Wikimedia.
The filenames usually contain the strings "RMNH.ART" and "Kawahara Keiga" - see:
https://en.wikipedia.org/wiki/File:Naturalis_Biodiversity_Center_-_RMNH.ART.5_-_Carcinoplax_longimana_(De_Haan,_1833)_-_Kawahara_Keiga.jpg
https://en.wikipedia.org/wiki/File:Naturalis_Biodiversity_Center_-_RMNH.ART.537_-_Halieutaea_stellata_-_Kawahara_Keiga_-_Siebold_Collection.jpg
https://en.wikipedia.org/wiki/File:Naturalis_Biodiversity_Center_-_RMNH.ART.256_-_Hemitrygon_akajei_(M%C3%BCller_%26_Henle,_1841)_-_Kawahara_Keiga_-_Siebold_Collection.jpg
Is it possible to query a Wikimedia API and get a list of files filtered by "contains" or a regex or similar?

Answering your specific question, you can use:
https://commons.wikimedia.org/w/api.php?action=query&list=search&srsearch=RMNH.ART&srnamespace=6&srlimit=500&format=json
Alternatively though, since the images are categorised already, you could use this instead:
https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Kawahara_Collection_at_Naturalis_Biodiversity_Center&cmlimit=500&format=json
These will both return the first 500 results. To get all of them, you need to page through the rest: for the search, add &sroffset=500 (then 1000, and so on); for the category listing, add &cmcontinue= with the value found in the "continue" part of the previous response.
The docs for both of these are at https://www.mediawiki.org/wiki/API:Search and https://www.mediawiki.org/wiki/API:Categorymembers
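If you want an actual regex rather than the search engine's own matching, one option is to page through the category and filter the titles locally. A minimal Python sketch, assuming the category above contains all the files you want; `http_fetch`, `iter_category_files` and `filter_titles` are hypothetical helper names, and the User-Agent value is a placeholder:

```python
import json
import re
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"


def http_fetch(params: dict) -> dict:
    """GET the Commons API with the given parameters and decode the JSON reply."""
    url = API + "?" + urlencode(params)
    # Wikimedia asks clients to send a descriptive User-Agent (placeholder value here).
    req = Request(url, headers={"User-Agent": "kawahara-list-example/0.1"})
    with urlopen(req) as resp:
        return json.load(resp)


def iter_category_files(fetch: Callable[[dict], dict], category: str) -> Iterator[str]:
    """Yield every file title in `category`, following continuation tokens."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 6,  # 6 = File: namespace
        "cmlimit": 500,
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Copy the reply's "continue" values (cmcontinue, continue) into the next request.
        params = {**params, **data["continue"]}


def filter_titles(titles, pattern: str) -> list:
    """Keep only the titles that contain a match for the regex `pattern`."""
    rx = re.compile(pattern)
    return [t for t in titles if rx.search(t)]
```

Something like filter_titles(iter_category_files(http_fetch, "Category:Kawahara_Collection_at_Naturalis_Biodiversity_Center"), r"RMNH\.ART\.\d+.*Kawahara Keiga") would then list the matching files. Passing the fetch function in as a parameter keeps the paging logic testable without network access.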

Related

Can we reference tags in karate? [duplicate]

I'm wondering if you can use wildcard characters with tags to get all tagged scenarios/features that match a certain pattern.
For example, I've used 17 unique tags on many scenarios throughout many of my feature files. The pattern is "@jira=CIS-" followed by 4 digits, like @jira=CIS-1234 and @jira=CIS-5678.
I'm hoping I can use a wildcard character or something that will find all of the matches for me.
I want to be able to exclude them from being run when I run all of my features/scenarios.
I've tried the following:
--tags ~@jira
--tags ~@jira*
--tags ~@jira=*
--tags ~@jira=
Unfortunately, none of these gave me the results I wanted. I was only able to exclude them when I used the exact tag, e.g. ~@jira=CIS-1234. Adding each one (of the 17 different tags) to the command line is not a good solution: the tags change frequently, with new ones added and old ones removed, and it would make for one really long command.
Yes. First read this: there is an undocumented expression language (based on JS) for advanced tag selection based on the @key=val1,val2 form: https://stackoverflow.com/a/67219165/143475
So you should be able to do this:
valuesFor('@jira').isPresent
And even (here s will be a string, on which you can even do a JS regex if you know how):
valuesFor('@jira').isEach(s => s.startsWith('CIS-'))
It would be great to get your confirmation; then this thread can help others, and we can add it to the docs at some point.
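For intuition about what those expressions check, here is a rough Python model (not Karate's implementation). It assumes tags of the form @jira=CIS-1234 with optional comma-separated values, and the four-digit CIS- format from the question; `values_for` and `is_jira_tagged` are hypothetical names. In the JS expression itself, the regex version would presumably be valuesFor('@jira').isEach(s => /^CIS-\d{4}$/.test(s)), though that is untested:

```python
import re

# Assumed ticket format from the question: "CIS-" followed by four digits.
JIRA_ID = re.compile(r"CIS-\d{4}")


def values_for(tags, key):
    """Collect the values of key=value tags, e.g. '@jira=CIS-1234' -> ['CIS-1234'].

    Accepts '@' or '#' prefixes and comma-separated values ('@key=v1,v2').
    """
    wanted = key.lstrip("#@")
    values = []
    for tag in tags:
        name, sep, rest = tag.partition("=")
        if sep and name.lstrip("#@") == wanted:
            values.extend(v for v in rest.split(",") if v)
    return values


def is_jira_tagged(tags):
    """Model of 'isPresent' combined with an 'isEach' regex check on the @jira values."""
    values = values_for(tags, "@jira")
    return bool(values) and all(JIRA_ID.fullmatch(v) for v in values)
```

A scenario for which is_jira_tagged returns True is one you would want to exclude from the run.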

SharePoint crawl rule to exclude AllItems.aspx, but get an item/document in search results if queried in the search box

I followed this blog (Tips 1), created a crawl rule http://.*forms/allitems.aspx and ran a full crawl. I no longer get results for AllItems.aspx. However, if there is a document named Something.doc in a document library, it no longer shows up in the search results either.
What I want seems like basic functionality: users should not see AllItems.aspx pages in the search results, but should still get the items/documents whose names they enter in the search box.
Please let me know if I am missing anything. I have already put in 24 hours and googled as much as I could.
It seems that an Index Reset is required. Here are the steps I took:
1. Add the following crawl rule to exclude: *://*allitems.aspx.
2. Index Reset.
3. Full Crawl.
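Before spending another full crawl, it can help to check a pattern against a few sample URLs. A rough Python approximation, assuming SharePoint's wildcard * behaves like a shell glob and that matching is case-insensitive (SharePoint's own matcher may differ); `matches_crawl_rule` is a hypothetical helper and the intranet URLs are made-up examples:

```python
from fnmatch import fnmatchcase


def matches_crawl_rule(url: str, pattern: str) -> bool:
    """Approximate a wildcard crawl rule such as '*://*allitems.aspx'.

    '*' matches any run of characters. Lower-casing both sides is an
    assumption about case-insensitivity; fnmatch also treats '?' and
    '[...]' as wildcards, so this is only a quick sanity check.
    """
    return fnmatchcase(url.lower(), pattern.lower())
```

For example, with the rule *://*allitems.aspx, http://intranet/Shared Documents/Forms/AllItems.aspx matches while http://intranet/Shared Documents/Something.doc does not.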
I could not find a good way to do this using crawl rules. Instead, I opted to set up a restriction on the search results web part.
In the search results web part properties, select "Change Query".
Add a property filter to exclude anything with "AllItems" (and any other exclusions you want in place).
Used Steve Mann's blog as a reference and for the images: http://stevemannspath.blogspot.com/2013/04/sharepoint-2013-search-removing-junk.html

How to specify multiple values on siteSearch in google custom search api?

I'm using the Google Custom Search API and want to create a search using siteSearch:
https://www.googleapis.com/customsearch/v1?key=k&cx=cx&q=cocos2d&siteSearch=www.cocos2d-iphone.org&siteSearchFilter=i
and it works fine (it returns results only from the given site).
Then I wanted to specify TWO sites to search, so I tried changing:
siteSearch=www.cocos2d-iphone.org
to
siteSearch=www.cocos2d-iphone.org www.XXXXXXXX.org
siteSearch=www.cocos2d-iphone.org|www.XXXXXXXX.org
siteSearch=www.cocos2d-iphone.org||www.XXXXXXXX.org
but none of these work.
I hope someone can help here, thanks :)
Currently, I don't believe you can specify more than one site through the siteSearch query parameter.
Nevertheless, you can configure your Custom Search Engine here: https://www.google.com/cse/manage/all
in the "Site to search" area.
This also works for excluding, as you can read here: https://support.google.com/customsearch/bin/answer.py?hl=en&answer=2631038&topic=2601037&ctx=topic
You cannot do this with the as_sitesearch parameter as that only accepts a single value. But you can achieve what you want with the as_q parameter, setting it to some value like: "site:google.com OR site:microsoft.com" - that will work in a similar way to this search.
The as_q parameter is documented here as:
The as_q parameter provides search terms to check for in a document.
This parameter is also commonly used to allow users to specify
additional terms to search for within a set of search results.
Examples q=president&as_q=John+Adams
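A minimal Python sketch of that approach against the JSON endpoint from the question, assuming the site: operators are honoured inside q the same way the answer describes for as_q; `site_or_clause` and `build_search_url` are hypothetical helpers, and k/cx are placeholders:

```python
from urllib.parse import urlencode

BASE = "https://www.googleapis.com/customsearch/v1"


def site_or_clause(sites):
    """Build 'site:a OR site:b ...' for the given hosts."""
    return " OR ".join(f"site:{s}" for s in sites)


def build_search_url(key, cx, terms, sites):
    """Put the site restriction into the query text instead of siteSearch."""
    query = f"{terms} {site_or_clause(sites)}"
    return BASE + "?" + urlencode({"key": key, "cx": cx, "q": query})
```

With the two sites from the question this produces q=cocos2d site:www.cocos2d-iphone.org OR site:www.XXXXXXXX.org, URL-encoded.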
Use a space as the separator.
Below is sample PHP code that works for me:
$url = "https://www.googleapis.com/customsearch/v1?key=k&cx=cx&q=cocos2d&siteSearch=" . urlencode("www.cocos2d-iphone.org www.XXXXXXXX.org") . "&siteSearchFilter=i";
Thanks,
Ojal Suthar
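For reference, the same request URL built in Python. This only reproduces the URL from the PHP snippet; whether the API honours more than one space-separated site is exactly what the answers here disagree on. `build_cse_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def build_cse_url(key, cx, query, sites):
    """Python counterpart of the PHP line: sites joined by a space in siteSearch."""
    params = {
        "key": key,
        "cx": cx,
        "q": query,
        # urlencode turns the space into '+', as PHP's urlencode() does.
        "siteSearch": " ".join(sites),
        "siteSearchFilter": "i",
    }
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)
```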

google analytics API, how to extract pageviews for a specific page?

I tried using something like
ga:pagePath=~page.php%3fid%3d44 (page.php?id=44)
but it doesn't seem to work: I get "no results found" even though I'm sure there are 20 pageviews.
UPDATE
I think I found the solution
ga:pagePath==/website/page.php?id=44
For some reason, I had to include the complete path and use ==.
To match a page by a partial path in filters, you should use
ga:pagePath=@page.php?id=44
=@ tells GA to match a substring.
What you were originally using was incorrect for this.
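I think your problem is that you put the hex (URL-encoded) versions of the ? and = characters into your query, which doesn't match how Analytics stores the page paths. If you change these to the normal characters it should work, but since =~ is a regular-expression match, the . and ? must also be made literal, for example by putting them in brackets:
ga:pagePath=~page[.]php[?]id=44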
Your other solution should work as well but is a bit more inflexible in case you wanted to tweak the query to return other pages.
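The three operators discussed here behave quite differently on the same path. A rough Python model to compare them locally, assuming Python's re is close enough to the regex engine Analytics uses for these simple patterns; `ga_filter_matches` is a hypothetical helper, and /website/page.php?id=44 is the path from the question. It shows that with =~ the unescaped value page.php?id=44 does not match, because ? acts as a quantifier, while a bracketed or escaped form does:

```python
import re


def ga_filter_matches(path: str, operator: str, expr: str) -> bool:
    """Rough local model of three ga:pagePath filter operators.

    ==  exact match
    =@  substring match
    =~  regular-expression match (Python's re stands in for GA's engine)
    """
    if operator == "==":
        return path == expr
    if operator == "=@":
        return expr in path
    if operator == "=~":
        return re.search(expr, path) is not None
    raise ValueError(f"unsupported operator: {operator}")
```

The == case also explains the update in the question: an exact match only succeeds when the full stored path is given.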