How do I search this? Is it possible to access more than 100 JSON API search results if I pay for it? - google-custom-search

How to search this?
I want to be able to:
1. create a search engine
2. programmatically search it through an API (Python, or another language)
3. paginate through the results (all of them, if I choose)
4. store the URLs or results that I want.
Is this even possible with Google Custom Search Engine?
I have enabled billing and my credit card is on file with Google, and I can do steps 1-3 above.
A search may report 4,000 total results, for example, but the API only returns 10 at a time, and once I reach 100 results I am cut off.
I want to be able to process 1000 results if I wish.
Before you reply, do you personally have working code that goes beyond the 100 limit?
If so, I would be very interested in talking and learning how you did it.
I am using Python at the moment, but it could be any language.
--
I tried using &start=100, 200, and so on to paginate through the results, but this does not work.
I also tried getting 100 results in a Python script, ending the program, and calling it again with start=100 (after the first set returned), but nothing came back.
I want to use the Google Custom Search API and pay Google a monthly subscription, but I have not found that this is possible.
For any given search, I want to decide how many results to process: it could be 1K, it could be 20K. I simply need access to the full result set, but I have not found a way to do this.

The API allows a maximum result depth of 100. See https://developers.google.com/custom-search/v1/cse/list
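To make the cap concrete, here is a minimal Python sketch (using the requests library; API_KEY and CX are placeholders) that pages through the Custom Search JSON API in steps of 10. The start parameter tops out at 91 with num=10, which is exactly why pagination stops at 100 results no matter what total the API reports:

```python
import requests

API_KEY = "YOUR_API_KEY"        # placeholder: your API key
CX = "YOUR_SEARCH_ENGINE_ID"    # placeholder: your engine ID

def cse_results(query, max_results=100):
    """Yield result items from the Custom Search JSON API.

    The API caps result depth at 100: with num=10 the highest
    accepted start value is 91, so pages run start = 1, 11, ..., 91
    and no loop can go deeper than 100 results.
    """
    start = 1
    while start <= min(max_results, 100) - 9:
        resp = requests.get(
            "https://www.googleapis.com/customsearch/v1",
            params={"key": API_KEY, "cx": CX, "q": query,
                    "start": start, "num": 10},
        )
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            break  # fewer results actually available than reported
        yield from items
        start += 10

for item in cse_results("example query"):
    print(item["link"])
```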

Related

Mailchimp Archive get more than 20 results

I am using Mailchimp's archive URL in PHP -- I am simply fetching the URL and displaying it as it sits, in order to white-label the funky URL, i.e.
https://us17.campaign-archive.com/home/?u=xxxxxyyyyyxxxxyyyy&id=xxxxyyyyyxxxxyy
In doing so I have read through both the Archive and API documentation, and have found nothing on a parameter for row count. It defaults to 20, as stated in the Archive docs, but I know I have seen archives with a larger row count than that. Is anyone familiar enough with the URL parameters used by Mailchimp to increase the row count to, say, 100? E.g.
https://us17.campaign-archive.com/home/?u=xxx&id=yyy&count=100
It's been a problem for years. Even in 2022 there is still no known way for an end user to get more than the past 20 issues from Mailchimp; they simply refuse to add that ability.
However, the newsletter creator can go into their backend and generate/enable a JavaScript API that has a &show= parameter, which can be increased.
https://mailchimp.com/help/add-an-email-campaign-archive-to-your-website/
Again, only the campaign creator can do this, not some random end-user/reader.

How to fetch results from an offset when the API doesn't support offset (HERE Maps API)

I have a search functionality that gets data from HERE API's Search endpoint. I maintain records of each search's results so I can add metadata that I need for my own purposes, and also so I can provide results without always going back to the HERE API.
The problem I have is with pagination, specifically with providing a starting index when fetching results from HERE. Similar to how Algolia does it, I want to be able to search for a term and begin with the results at a certain index, the offset. The HERE API apparently doesn't allow this at all. The closest it comes to such a feature is that it provides the URL for the next search, as described here. This is limited because it doesn't let me start the search results at a particular index that I specify.
So essentially I want to know if there's a "standard" way of getting such functionality even when it's not provided by the API.
My own solution
The HERE API provides a size parameter that allows specifying the total number of results that I want, so I can specify a larger size than I need, and basically use code to start the results from my desired index. But this feels a bit hacky, and I wonder if there's a better/more established way of doing this.
Happy to listen to any ideas! Thanks. :)
Such an 'offset' for starting the paging after a specific number of results is indeed not supported by the Places API itself.
You have to set up a workaround within your application.
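One such workaround, matching the asker's own solution, is to over-fetch and slice client-side. Here is a minimal Python sketch; the endpoint URL and parameter names follow the legacy Places API discover/search call and are assumptions, so adjust them to your HERE plan and API version:

```python
import requests

API_KEY = "YOUR_HERE_API_KEY"  # placeholder

def search_with_offset(query, lat, lng, offset, limit):
    """Emulate an offset by over-fetching and slicing client-side.

    The Places API has no offset parameter, but it accepts `size`,
    so we request offset+limit results and discard the first
    `offset` of them. (Endpoint and parameter names are assumptions
    based on the legacy Places API.)
    """
    resp = requests.get(
        "https://places.ls.hereapi.com/places/v1/discover/search",
        params={
            "q": query,
            "at": f"{lat},{lng}",
            "size": offset + limit,  # over-fetch to cover the offset
            "apiKey": API_KEY,
        },
    )
    resp.raise_for_status()
    items = resp.json()["results"]["items"]
    return items[offset:offset + limit]  # slice away the skipped prefix
```

The obvious cost of this design is bandwidth and quota: every deep page re-downloads all the shallower results, which is why it feels hacky, but it is the standard fallback when an API only exposes a page size.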

What is the maximum results returned for YouTube Data API v3 call

Context
I am in the process of providing some consultancy on doing an HTTP GET using the YouTube Data API v3, in order to develop a Windows-based application to GET a list of results from YouTube for, say, a specific CATEGORY or a specific TAG.
We are open to using any programming language (I'm from a C++ background and am hoping YouTube will support direct HTTP connections without using the Google client SDK and so on) to connect to YouTube and (HTTP) GET data. (Once a month or so, so YouTube API quotas should not be a problem.)
The Issue
We are being told by some of my client's web developers that YouTube API v3 will only return a maximum of 500 records/results for, say, a query that returns just the total viewers, the video's link, and basic metadata such as that.
So, say I wish to find 5,000 results for the category "House music" or "basketball", and I have the Developer Key etc. all set up: would that be possible?
If so, what GET fields would I need to populate (such as "max_results_per_page")?
Thank you.
The API won't provide more than ~500 search results for any arbitrary query. It's by design. Technically, it means that the nextPageToken field won't be returned once you hit ~500 results. No additional parameter can change that.
If you want more than ~500 results for a query, you have to split it into more specific sub-queries. I'd suggest using the publishedAfter and publishedBefore parameters to achieve that, but feel free to experiment with the other ones here.
This only holds for search queries. Other endpoints like PlaylistItems.list deliver more results; I have tested fetching the videos of a playlist with 100,000 items.
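Here is a minimal Python sketch of the date-window splitting suggested above (using the requests library; the API key is a placeholder). Each window is drained with pageToken until nextPageToken disappears, which happens around 500 results, and the windows together can yield far more:

```python
import requests
from datetime import datetime, timedelta, timezone

API_KEY = "YOUR_API_KEY"  # placeholder
URL = "https://www.googleapis.com/youtube/v3/search"

def search_window(query, published_after, published_before):
    """Drain one date window (itself capped at ~500 results)."""
    params = {
        "key": API_KEY, "part": "snippet", "type": "video",
        "q": query, "maxResults": 50,
        "publishedAfter": published_after.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "publishedBefore": published_before.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    while True:
        data = requests.get(URL, params=params).json()
        yield from data.get("items", [])
        token = data.get("nextPageToken")
        if not token:  # the token stops coming back around ~500 results
            break
        params["pageToken"] = token

def search_beyond_500(query, days_back=365, window_days=30):
    """Collect more than ~500 results by splitting into monthly windows.

    Boundary items may appear in adjacent windows; dedupe by videoId
    if exact counts matter.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days_back)
    while start < end:
        window_end = min(start + timedelta(days=window_days), end)
        yield from search_window(query, start, window_end)
        start = window_end
```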

Filter google query results

I'm writing a search engine for Wikipedia articles using Lucene on the wiki XML dump, and I want to calculate the accuracy of the engine compared to Google's Wikipedia results on a particular query, when I give "site:en.wikipedia.org" along with the query. I want to do it for multiple queries, so I'm getting the Google search result URLs manually. I got Google APIs to use a bot to search Google, but the problem is I want to get rid of certain types of results, like
"/Category:"
"/icon:"
"/file:"
"/photo:"
and user pages.
But I haven't found a convenient way to do this except an iterative method: issue a query, get n results, filter them out using regular expressions, then retrieve the remaining (n-x) results, and so on. Google keeps blocking me when I do that.
Is there an intelligent way to get Google results the way I want using Java?
Thanks in advance guys.
You could just try excluding those pages from the Google results, like this:
living people site:en.wikipedia.org -inurl:category -inurl:category_talk -inurl:file -inurl:file_talk -inurl:user -inurl:user_talk
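Those exclusions can be baked directly into the query string sent to the Custom Search JSON API, so no post-filtering or re-querying is needed. A minimal sketch in Python rather than Java (API_KEY and CX are placeholders, and it assumes the API honors the same -inurl: operators as web search); the query construction carries over to any HTTP client:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
CX = "YOUR_ENGINE_ID"     # placeholder

# Wikipedia namespaces to exclude from the results.
EXCLUDED = ["category", "category_talk", "file", "file_talk",
            "user", "user_talk"]

def wikipedia_results(query):
    """Search with the exclusions built into the query itself."""
    q = (query + " site:en.wikipedia.org "
         + " ".join(f"-inurl:{ns}" for ns in EXCLUDED))
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CX, "q": q, "num": 10},
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]

print(wikipedia_results("living people"))
```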

Programmatic Querying of Google and Other Search Engines With Domain and Keywords

I'm trying to find out if there is a programmatic way to determine how far down in a search engine's results my site shows up for given keywords. For example, my query would provide my domain name and keywords, and the result would be, say, 94, indicating that my site was the 94th result. I'm specifically interested in how to do this with Google, but also interested in Bing and Yahoo.
No.
There is no programmatic access to such data. People generally roll their own version of such trackers: fetch the Google search results page and use regexes to find your position. But different results are now shown in different geographies, and results are personalized.
The gl=us parameter will get you results from the US; you can change the geography accordingly.
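A rough Python sketch of the scrape-and-regex tracker described above. This is an illustration only: the href="/url?q=..." pattern is an assumption about Google's no-JavaScript HTML that changes often, and scraping may be blocked or violate the terms of service:

```python
import re
import requests

def google_rank(domain, keywords, num=100):
    """Return the 1-based position of `domain` in Google's results,
    or None if it isn't in the first `num` results. Fragile by
    nature: the markup pattern below is an assumption."""
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": keywords, "num": num, "gl": "us"},  # gl pins geography
        headers={"User-Agent": "Mozilla/5.0"},
    )
    # In the no-JavaScript page, result links look like href="/url?q=<target>&..."
    urls = re.findall(r'href="/url\?q=(https?://[^&"]+)', resp.text)
    for position, url in enumerate(urls, start=1):
        if domain in url:
            return position
    return None

print(google_rank("example.com", "my keywords"))
```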
Before creating this from scratch, you may want to save yourself some time (and money) by using a service that does exactly that [and more]: Ginzametrics.
They have a free plan (so you can test if it fits your requirements and check if it's really worth creating your own tool), an API and can even import data from Google Analytics.