How to get the next page of Google CSE RESTful API results & confusion about the daily request limit? - google-custom-search

I am using the Google CSE RESTful API, and my code to get results is:
Google.Apis.Customsearch.v1.CseResource.ListRequest listRequest = svc.Cse.List(query);
listRequest.Cx = cx;
Google.Apis.Customsearch.v1.Data.Search search = listRequest.Fetch();
foreach (Google.Apis.Customsearch.v1.Data.Result result in search.Items)
{
    // do something with each result
}
It returns 10 results out of a total of 100. To see the next 10 records I have to set:
listRequest.Start = 11;
search = listRequest.Fetch();
And now 'search.Items' holds results 11-20.
Now I have 2 questions:
1- Is this the right way to get the results of the next page (the next 10 records)?
2- Does doing so mean that I have consumed 2 requests out of the 100 allowed per day?
If this is correct, then effectively a user can only get a total of 1000 results per day from the Google CSE API. It also means that to see all 100 results of my first query, I would have to make 10 requests.
Thanks,
Wasim

Yes, it's the right way: setting the start parameter to the next index requests the next page of results for your query.
You are also right on the second question: each request (paginated or not) counts against the maximum of 100 allowed per day, which works out to a maximum of 1000 results per day.
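For reference, here is a minimal sketch of the same pagination loop using the Python client (google-api-python-client); the API key, cx, and query strings are placeholders rather than values from the question:

    from googleapiclient.discovery import build  # pip install google-api-python-client

    service = build('customsearch', 'v1', developerKey='YOUR_API_KEY')  # placeholder key
    all_items = []
    for start in range(1, 101, 10):  # the API serves at most 100 results per query
        response = service.cse().list(q='my query', cx='YOUR_CX', start=start).execute()
        items = response.get('items', [])
        all_items.extend(items)
        if len(items) < 10:  # a short page means there are no more results
            break
    # each loop iteration consumes one of the 100 free requests per day

Fetching all 100 results of a single query this way costs 10 of the daily 100 requests, which is exactly the 1000-results-per-day ceiling described above.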

Related

How can I paginate the results of GEOSEARCH?

I am following the tutorial at https://redis.io/commands/geosearch/ and I have successfully migrated ~300k records (from an existing pg database) into the testkey key (sorry for the unfortunate name, but I am just testing it out!).
However, a query returning items within 5 km yields thousands of items. I'd like to limit the results to 10 items at a time, and be able to load the next 10 using some sort of keyset pagination.
So, to limit the results I am using:
GEOSEARCH testkey FROMLONLAT -122.2612767 37.7936847 BYRADIUS 5 km WITHDIST COUNT 10
How can I execute GEOSEARCH queries with pagination?
Some context: I have a postgres + postgis database with ~3m records. I have a service that fetches items within a radius, and even with the right indexes it is starting to get sluggish. For comparison, my other endpoints can handle 3-8k rps, while this one can barely handle 1500 (8ms average query execution time). I am exploring moving the items into a redis cache, either the entire payload or just the IDs, and then running an IN query against postgres (<1ms query time).
I am struggling to find any articles on this via Google search.
You can use GEOSEARCHSTORE to create a sorted set with the results from your search. You can then paginate this sorted set with ZRANGE. This is shown as an example on the GEOSEARCHSTORE page:
redis> GEOSEARCHSTORE key2 Sicily FROMLONLAT 15 37 BYBOX 400 400 km ASC COUNT 3 STOREDIST
(integer) 3
redis> ZRANGE key2 0 -1 WITHSCORES
1) "Catania"
2) "56.441257870158204"
3) "Palermo"
4) "190.44242984775784"
5) "edge2"
6) "279.7403417843143"
redis>
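The same two-step approach can be sketched in Python with redis-py (assuming redis-py 4.x, which exposes geosearchstore; the destination key name results:tmp is made up for the example):

    import redis  # pip install redis

    r = redis.Redis()
    page_size = 10

    # Step 1: materialize the search into a sorted set scored by distance.
    # 'testkey' and the coordinates come from the question above.
    total = r.geosearchstore(
        'results:tmp', 'testkey',
        longitude=-122.2612767, latitude=37.7936847,
        radius=5, unit='km', storedist=True)
    r.expire('results:tmp', 60)  # discard the snapshot after a minute

    # Step 2: paginate the sorted set with ZRANGE, 10 members at a time.
    for offset in range(0, total, page_size):
        page = r.zrange('results:tmp', offset, offset + page_size - 1, withscores=True)
        # each entry is a (member, distance-in-km) pair

Note that the stored set is a snapshot: re-run GEOSEARCHSTORE when the underlying data changes.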

How to set up the current database for accurate asynchronous pagination

So currently, I am working on a way to do accurate pagination for an asynchronously written database. Basically, a request comes in and is broken down into X jobs (in a queue), which get pulled by another app that batch-writes Y ResponseItems per job to a PostgreSQL table.
Two tables:
RequestMetaData: RequestId, ..., ..., ...
ResponseItemData: Id, RequestId, ..., ...
Each ResponseItemData row belongs to a request.
Let's say a request A comes in. It gets split into 700 jobs. The application pulls a job and writes 1000 items. But another request B could have come in as well, so the next 1000 items written could belong to B:
1
...
1000 ==> last item in Job 1 of Request A
1001
...
2000 ==> last item in Job 1 of Request B
2001
...
3000 ==> last item in Job 2 of Request A
So my question is: how do I do accurate pagination here? I already know how to do relative cursors and can retrieve X items per requestId by using the last item seen in an auto-incrementing table. Something like this:
SELECT *
FROM ResponseItemData
WHERE ResponseItemData.Id > $Last_Item_Seen  -- NULL on the first GET request
  AND ResponseItemData.RequestId = $RequestId
ORDER BY ResponseItemData.Id ASC
LIMIT X;
My question:
A) How do I know when a request is done? If I keep paginating by requestId, 1000 items at a time, 699 times, how do I know that the 700th page is the last one? How do I keep track of when my request is done? In my RequestMetaData table, I store the amount_of_jobs_request_split_into as well as the expected_number_of_response_items.
Thanks for the help
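One possible shape of the completion check that this metadata enables, sketched in Python with psycopg2 (table and column names follow the question's description; the logic is an assumption, not a confirmed solution):

    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect("dbname=mydb")  # placeholder connection string

    def request_is_done(request_id):
        # A request is "done" once the number of ResponseItemData rows written
        # for it reaches the expected total stored in RequestMetaData.
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT COUNT(r.Id) >= m.expected_number_of_response_items
                FROM RequestMetaData m
                LEFT JOIN ResponseItemData r ON r.RequestId = m.RequestId
                WHERE m.RequestId = %s
                GROUP BY m.expected_number_of_response_items
                """,
                (request_id,))
            row = cur.fetchone()
            return bool(row and row[0])

The paginator would then keep issuing keyset queries until it sees a short page and this check returns true.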

CKAN - Why do I only get the first 10 results out of the count?

With the CKAN API query below I get count = 47 (that's correct) but only 10 results.
How do I get all 47 results with the API query?
CKAN API Query:
https://suche.transparenz.hamburg.de/api/3/action/package_search?q=title:Fahrplandaten+(GTFS)&sort=score+asc
From this source (for me the page loads very slowly, so be patient):
https://suche.transparenz.hamburg.de/dataset?q=hvv-fahrplandaten+gtfs&sort=score+desc%2Ctitle_sort+asc&esq_not_all_versions=true&limit=50&esq_not_all_versions=true
The count only shows the total number of results found. You can change the number of results actually returned by setting the limit and rows parameters, e.g. https://suche.transparenz.hamburg.de/api/3/action/package_search?q=title:Fahrplandaten+(GTFS)&sort=score+asc&rows=100. The rows limit is 1000 per query. You can find more info here.
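A small Python sketch of collecting every page via the rows and start parameters (standard CKAN package_search parameters; the page size of 100 is an arbitrary choice):

    import requests  # pip install requests

    url = "https://suche.transparenz.hamburg.de/api/3/action/package_search"
    params = {"q": "title:Fahrplandaten (GTFS)", "sort": "score asc",
              "rows": 100, "start": 0}

    results = []
    while True:
        result = requests.get(url, params=params).json()["result"]
        results.extend(result["results"])
        if len(results) >= result["count"]:
            break  # all matches collected (47 for this query)
        params["start"] = len(results)  # CKAN paginates with a start offset

    print(len(results))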

What is the logic behind the Prometheus sum-after-rate functionality?

In about two minutes I have 2000 requests, which should be 1000 requests/minute or about 17 requests/second.
The total counter works fine and gives me a nice graph:
sum by(status_code) ( request_duration_count{service_id="myserviceId", path=~".*myservice/frag/.*"} )
The request rate results in a flatline at 0:
sum by(status_code) (rate(request_duration_count{service_id="myserviceId", path=~".*myservice/frag/.*"}[1m]))
I think the problem here is that I have one request per URL, which is not great, but it is what it is.
The URLs look like this:
https://myserver/myservice/frag/1
https://myserver/myservice/frag/2
https://myserver/myservice/frag/3
https://myserver/myservice/frag/4
https://myserver/myservice/frag/5
...
Each of these URLs is set as the "path" label, so I get 2000 series for this metric.
So if I calculate the rate over one minute I get 0.008 requests per second for each series.
If I sum this up (0.008... * 2000) I should get roughly 17.
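As a quick sanity check on that arithmetic (plain Python, nothing Prometheus-specific):

    per_series = 1 / 120       # one request per series over ~2 minutes, in req/s
    print(per_series)          # ~0.0083
    print(per_series * 2000)   # ~16.7 requests/second summed over all series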
So why do I have a flatline at zero?

Fetch time series from Google BigQuery

I am trying to fetch a list of prices from Google BigQuery using the following query:
query_request = service.jobs()
query_data = {
    'query': '''
        SELECT
          open
        FROM
          timeseries.price_2015
    '''
}
query_response = query_request.query(
    projectId=project_id,
    body=query_data).execute()
The table contains 370,000 records, but the query loads only the first 100,000. I guess I am hitting some limit? Can you tell me how I can fetch all records for the price column?
The number of rows returned is limited by the lesser of the maximum page size and the maxResults property. See more in Paging Through list Results.
Consider using Jobs: getQueryResults or Tabledata: list; you can call those APIs in a loop, passing the pageToken from the previous response into the next call and collecting the whole result set on the client side.
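A sketch of that loop in the same google-api-python-client style as the question (it assumes the service, project_id, and query_response names from the snippet above):

    # Take the job id from the initial response, then page until no token remains.
    job_id = query_response['jobReference']['jobId']
    rows = query_response.get('rows', [])
    page_token = query_response.get('pageToken')

    while page_token:
        page = service.jobs().getQueryResults(
            projectId=project_id,
            jobId=job_id,
            pageToken=page_token).execute()
        rows.extend(page.get('rows', []))
        page_token = page.get('pageToken')

    print(len(rows))  # all ~370000 rows once every page has been fetched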