Filter Google Custom Search for pages that contain a specific metatag - api

I have been trying to filter my google custom searches to pages that specifically contain a meta tag structured as ilustrated below:
<meta data-rh="true" id="meta-og-image" property="og:image" content="https://xxx.jpg"/>
When I get the JSON from GCS I can see the metatag as:
"metatags": [
{
"og:image": "xxx.jpg",
"theme-color": "#fff",...
is it possible to craft a GET command informing in the URL that only pages with that specific meta tag should be part of the results?
As google only returns 10 items per CGS request and the total numer of items found is much bigger than 1000, I need to filter the search itself instead of crawling through results.
Could´nt get it to work until now.
Thanks for the support!

A query like more:pagemap:metatags:og_image will return pages that that have an og:image metatag
Learn more: https://developers.google.com/custom-search/docs/structured_search

Related

Google Custom Search Engine - Edit the URLs in SERP

Good evening, I'm working with Google Custom Search Engine: https://cse.google.it/cse/ and I need to add, only for some URLs (in total 10 URLs) a parameter at the end of each search results URL. For example: if the domain is www.dominio-test.com then the URL to show is www.dominio-test.com/123456 the CSE searches on the web without restriction.
The code I use is the following:
<script async src = "https://cse.google.com/cse.js cx=014431187084467459449:v2cmjgvgjr0"> </script>
<div class = "gcse-search"> </div>
I thought of proceeding with reading the DOM with getElementsByClassName (), extracting the content and adding it to a variable, at this point add the if / else / replace controls and output again.
But I don't know how to proceed, can you please support me?
Thanks and good job
Search Element callbacks might work for you: https://developers.google.com/custom-search/docs/element#callbacks

Netsuite PDF Templating: get number of pages as attribute

I am templating pdfs in Netsuite using freemarker and I want to display the footer only on the last page. I have been doing some research, but couldn't find a solution (since looks like the environment does not allow me to include or import libs), so I thought that just comparing the number of the page with the total pages in an if tag would be a nice and easy workaround. I already know how to display the numbers by using the <pagenumber/> and <totalpages/> tags, but still cannot get them as values so I can use them like this:
<#if (pagenumber == totalpages) >
... footer html...
</#if>
Any ideas of how or where can I get those values from?
The approach you are trying won't work, because you are mixing BFO and Freemarker syntax. Netsuite uses two different "engines" to process PDF Templates. The first step is Freemarker, which merges the record fields with your template and produces an XML file, which is then converted by BFO into a PDF file. The <totalpages/> element is meaningless to Freemarker, as it is only converted into a number by BFO later.
Unfortunately, the ability to add a footer to only the last page of a document is currently a limitation of BFO, as per the BFO FAQ:
At the moment we do not have a facility for explicitly assigning a
footer or header to the last page in a document when the number of
pages is unknown.
You CAN add it after a page break - and put the page break at the end of the body
<pbr footer="nlfooter" footer-height="25%"></pbr>
</body>
The issue here is - on a one page output - you will get 2 pages minimum... it will always ADD a page for the disclaimer / footer...

Query Wikipedia pages with properties

I need to use Wikipedia API Query or any other api such as Opensearch to query for a simple list of pages with some properties.
Input: a list of page (article) titles or ids.
Output: a list of pages that contain the following properties each:
page id
title
snippet/description (like in opensearch api)
page url
image url (like in opensearch api)
A result similar to this:
http://en.wikipedia.org/w/api.php?action=opensearch&search=miles%20davis&limit=20&format=xml
Only with page ids and not for a search, but rather an exact list of pages by either titles or pageids.
This should be a fairly simple thing but I have been stuck with that for quite some time trying all kinds of URL combinations from the MW api manual, without success.
I dont't think there is another way than the Open Search API to fetch Open Search data, but depending on which Wikipedia you are interested in, there might be other extensions installed to help you. Taking English Wikipedia as an example, we can make use of the MobileFrontend and PageImages extensions, that happens to be installed there.
Title and url are available from the native MediaWiki API. To get the url, you can use prop=info, and specify with inprop=url that it is the url you are interested in.
Prominent images of a page is returned by prop=pageimages, thanks to PageImages.
MobileFrontend adds a property called extracts, that you can use with the directive exintro to get the first paragraph. Note however that MediWiki markup is complex, and result might not always be perfect. If we put it all together in one single query, it would be something like this:
http://en.wikipedia.org/w/api.php?action=query&pageids=21482&prop=pageimages|info|extracts&inprop=url&exintro
giving this:
<api>
<query>
<pages>
<page pageid="21482" ns="0" title="Nairobi" pageimage="Nairobi_Montage.jpg" contentmodel="wikitext" pagelanguage="en" touched="2014-02-06T06:10:01Z" lastrevid="594161616" counter="" length="89157" fullurl="http://en.wikipedia.org/wiki/Nairobi" editurl="http://en.wikipedia.org/w/index.php?title=Nairobi&action=edit">
<thumbnail source="http://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Nairobi_Montage.jpg/45px-Nairobi_Montage.jpg" width="45" height="50" />
<extract xml:space="preserve">
<p><b>Nairobi</b> /naɪˈroʊbi/ is the [...]
</extract>
</page>
</pages>
</query>
</api>
Here is a multistep process to get a list of Wikipedia page titles and properties for articles, and then getting the page IDs and URLS.
Please note: It does use a portion of a previous answer: "Title and url are available from the native MediaWiki API. To get the url, you can use prop=info, and specify with inprop=url that it is the url you are interested in."
If you would like to use the Wikipedia API for your own applications and search Wikipedia for getting a list of articles about a certain topic, and you wanted the answer in JSON format, then you could could use the following URL:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=REPLACE_ME_WITH_SEARCH_TOPIC&format=json&callback=?
And if your eyes are having trouble parsing results from that, then replace "format=json&callback=?" with "formatversion=2" like the following example to make it easier for your eyes:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=REPLACE_ME_WITH_SEARCH_TOPIC&formatversion=2
The following example will give me a batch list of article titles and properties about/for "Thailand" in JSON format, and after that I will use the resulting titles to find the page IDs and URLS of those articles.
URL step 1:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=thailand&format=json&callback=?
From step 1, I can get the list of titles I need from inside the resulting JSON, with step 2, I use those titles gained in step 1 in another API query (aka step 2) for gaining the page IDs and URLs of those articles in the resulting JSON...results of step2.
Here are the Wikipedia article titles from the resulting JSON of step 1:
Thailand
Outline of Thailand
Geography of Thailand
Economy of Thailand
Football in Thailand
Southern Thailand
Government of Thailand
Northern Thailand
Culture of Thailand
Cinema of Thailand
URL step 2:
https://en.wikipedia.org/w/api.php?action=query&titles=Thailand|Outline%20of%20Thailand|Geography%20of%20Thailand|Economy%20of%20Thailand|Football%20in%20Thailand|Southern%20Thailand|Government%20of%20Thailand|Northern%20Thailand|Culture%20of%20Thailand|Cinema%20of%20Thailand&prop=info&inprop=url&format=json&callback=?

How to get the result of "all pages with prefix" using Wikipedia api?

I wish to use Wikipedia api to extract the result of this page:
http://en.wikipedia.org/wiki/Special:PrefixIndex
When searching "something" on it, for example this:
http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4
Then, I would like to access each of the resulting pages and extract their information.
What api call might I use?
You can use list=allpages and specify apprefix. For example:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=allpages&apprefix=tal&aplimit=max
This query will give you the id and title of each article that starts with tal. If you want to get more information about each page, you can use this list as a generator:
http://en.wikipedia.org/w/api.php?format=xml&action=query&generator=allpages&gapprefix=tal&gaplimit=max&prop=info
You can give different values to the prop parameter to get different information about the page.

google analytics API, how to extract pageviews for a specific page?

Google Analytics API: how to extract pageviews for a specific page?
I tried using something like
ga:pagePath=~page.php%3fid%3d44 (page.php?id=44)
but it doesn't seem to work... I get "no results found" where I have 20 pageviews for sure
UPDATE
I think I found the solution
ga:pagePath==/website/page.php?id=44
for some reason I had to include the complete path and ==
To use a partial path to match for a page in filters you should use
ga:pagePath=#page.php?id=44
=# tells ga to match a substring.
What you were originally using was incorrect for this.
I think your problem is that you put the hex version of the ? and = characters into your query, which doesn't match how Analytics stores the page paths. If you change these to the normal characters it should work:
ga:pagePath=~page.php?id=44
Your other solution should work as well but is a bit more inflexible in case you wanted to tweak the query to return other pages.