Google Plus API: search activities by timestamp - google-plus

I am doing a cron-style search of activities, and I want to retrieve Google Plus activities published after the timestamp of the last search run. How can this be done?
The current documentation seems to allow searching only by keywords and doesn't mention a timestamp-range filter for search.
Here is the link to the documentation
https://developers.google.com/+/api/latest/activities/search

One way to do this would be to:
a. Store the timestamp of the previous search as "previous_search_timestamp".
b. In every search, sort the results by recency (as allowed by the API).
c. Iterate over the results of the current search until you come across an activity whose published timestamp is <= previous_search_timestamp.
d. Stop processing the results from that activity onwards (and stop making further pagination requests), as those activities would already have been retrieved in the previous search. You don't want to make redundant API calls or do redundant data processing on your server :) (A sketch of this loop follows below.)
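A minimal sketch of steps b-d, assuming the REST endpoint and field names from the Activities: search documentation linked above (activities carry a published RFC 3339 timestamp and the list can be ordered with orderBy=recent); the API key and query are placeholders:

```python
import requests

API_KEY = "YOUR_API_KEY"      # placeholder
QUERY = "some keyword"        # placeholder
SEARCH_URL = "https://www.googleapis.com/plus/v1/activities"

def fetch_new_activities(previous_search_timestamp):
    """Yield activities published after previous_search_timestamp (RFC 3339 string)."""
    page_token = None
    while True:
        params = {"query": QUERY, "orderBy": "recent", "maxResults": 20, "key": API_KEY}
        if page_token:
            params["pageToken"] = page_token
        resp = requests.get(SEARCH_URL, params=params)
        resp.raise_for_status()
        data = resp.json()
        for activity in data.get("items", []):
            # RFC 3339 UTC timestamps in the same format compare correctly as plain strings
            if activity["published"] <= previous_search_timestamp:
                return  # everything older was already retrieved in the previous run
            yield activity
        page_token = data.get("nextPageToken")
        if not page_token:
            return
```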

Related

flight-offers-search and flight-cheapest-date-search - limit by number of connections and layover duration

I am testing flight-offers-search and flight-cheapest-date-search.
Are there parameters available to limit by number of connections and layover duration? I didn't see them in the docs.
Also, is there functionality to fetch future prices for a given period, e.g. get the average price for 2-week trips in the next month, 3 months, or 1 year?
Thank you.
Regarding your first point: as of today, the Flight Offers Search API doesn't offer a parameter to control layover duration; you will have to check the response and filter on your side. For the number of connections, you can filter direct and non-direct flights using the nonStop parameter. Then, if you want to limit the number of stops, you have to do it by filtering the response (by looking at the number of segments inside the itineraries), as sketched below.
Flight Cheapest Date Search has a similar parameter to control direct and non-direct offers: nonStop.
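For illustration, a minimal sketch of that response-side filtering, assuming the Flight Offers Search v2 response shape where each offer in data[] has itineraries[], each with segments[]; the max_stops threshold is just an example:

```python
def filter_by_stops(flight_offers, max_stops=1):
    """Keep offers where every itinerary has at most max_stops stops.

    Assumes the Flight Offers Search v2 response shape:
    stops per itinerary = number of segments - 1.
    """
    return [
        offer
        for offer in flight_offers
        if all(len(it["segments"]) - 1 <= max_stops for it in offer["itineraries"])
    ]

# usage: keep only offers with at most one stop in each direction
# short_enough = filter_by_stops(response["data"], max_stops=1)
```

For a round trip, an offer survives only if both the outbound and the return itinerary stay within max_stops.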
Regarding your second point: not directly. You can approximate it by:
Using the Flight Offers Search API to do multiple searches and averaging the prices you find (see the sketch after this list)
Using the Flight Cheapest Date Search API to do the same (keep in mind that this API uses a pre-computed cache and supports a limited number of origin-destination pairs)
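A rough sketch of that averaging idea, assuming the Python amadeus SDK's shopping.flight_offers_search endpoint and that each offer's total price sits at price.total; the credentials, route, and window sizes are placeholders:

```python
from datetime import timedelta
from amadeus import Client

amadeus = Client(client_id="...", client_secret="...")  # placeholder credentials

def average_price(origin, destination, start, weeks=4, trip_days=14):
    """Average the cheapest offer over `weeks` consecutive 2-week trips starting at `start`."""
    prices = []
    for w in range(weeks):
        dep = start + timedelta(weeks=w)
        ret = dep + timedelta(days=trip_days)
        response = amadeus.shopping.flight_offers_search.get(
            originLocationCode=origin,
            destinationLocationCode=destination,
            departureDate=dep.isoformat(),
            returnDate=ret.isoformat(),
            adults=1,
        )
        if response.data:
            prices.append(min(float(o["price"]["total"]) for o in response.data))
    return sum(prices) / len(prices) if prices else None
```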

Twitter API retrieve tweets with search_query for time range (more than year past)

Is there a way to collect tweets that match a specific keyword but fall in a date range more than a year in the past? I know you can collect tweets with a search query on the streaming API, but I would like to grab tweets that are more than 1 year old, and I would also like to specify the date range.
From the Twitter Search documentation:
The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.
So, you can't grab historic data via the API.
You can specify date ranges using since:2015-12-20 and until:2015-12-21 as part of your query.
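For example, a small sketch that passes those operators inside the query against the standard 1.1 search endpoint (which, as noted above, only covers roughly the last 7 days); the bearer token is a placeholder:

```python
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"

params = {
    # the since:/until: operators go inside the query string itself
    "q": "python since:2015-12-20 until:2015-12-21",
    "count": 100,
}
headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}

resp = requests.get(SEARCH_URL, params=params, headers=headers)
resp.raise_for_status()
for tweet in resp.json().get("statuses", []):
    print(tweet["created_at"], tweet["text"])
```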

Soundcloud API: How to write conditions concerning dates/ids?

I have a use case I can't figure out from the SoundCloud documentation page: I fetch the 10 most recent tracks. Two hours later, I want to check whether there are new tracks. So technically, I want to ask "give me the tracks whose created_at is greater than the created_at of my last fetched track". How can I do that with the current SoundCloud API?
You can send the created_at[from] parameter in the request, which allows you to set a minimum creation date for your query.
For example
/users/x/tracks.json?created_at[from]=2012-11-01%2016%3A02%3A00
For more info check the filters heading underneath each resource :)
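As a minimal sketch of that polling step, assuming the /users/{id}/tracks endpoint from the example above and client_id query-parameter authentication (an assumption); requests URL-encodes the brackets, space, and colons for you:

```python
import requests

CLIENT_ID = "YOUR_CLIENT_ID"             # placeholder
USER_ID = "x"                            # placeholder, as in the example URL above
last_created_at = "2012-11-01 16:02:00"  # created_at of the newest track from the previous fetch

resp = requests.get(
    f"https://api.soundcloud.com/users/{USER_ID}/tracks.json",
    params={
        "created_at[from]": last_created_at,  # only tracks created at/after this time
        "client_id": CLIENT_ID,
    },
)
resp.raise_for_status()
new_tracks = resp.json()
```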

What exactly does 'since_id' and 'max_id' mean in the Twitter API

I've been poring over the Twitter docs for some time now, and I've hit a wall trying to figure out how to get stats for the growth of followers over a period of time / the count of tweets over a period of time...
I want to understand from the community what since_id, max_id, and count mean in the Twitter API.
I've been following this page https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline
I'm trying to get stats for a user --
counts of tweets in a particular time period
count of followers over a particular time period
count of retweets
I'd like some help forming query strings for the above.
Thanks.
since_id and max_id are both very simple parameters you can use to limit what you get back from the API. From the docs:
since_id - Returns results with an ID greater than (that is, more recent than) the specified ID. There are limits to the number of Tweets which can be accessed through the API. If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available.
max_id - Returns results with an ID less than (that is, older than) or equal to the specified ID.
So, if you have a given tweet ID, you can search for older or newer tweets by using these two parameters.
count is even simpler -- it specifies a maximum number of tweets you want to get back, up to 200.
Unfortunately the API will not give you back exactly what you want -- you cannot specify a date/time when querying user_timeline -- although you can specify one when using the search API. Anyway, if you need to use user_timeline, you will need to poll the API, gather up tweets, figure out whether they match the parameters you want, and then calculate your stats accordingly.
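As a hedged sketch of that polling, against the statuses/user_timeline.json endpoint documented at the link above, using since_id to fetch only tweets newer than the last one already seen and max_id to page backwards through the remainder; the bearer token and screen name are placeholders:

```python
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder
TIMELINE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"
HEADERS = {"Authorization": f"Bearer {BEARER_TOKEN}"}

def fetch_since(screen_name, since_id):
    """Collect all tweets newer than since_id, paging backwards with max_id."""
    collected, max_id = [], None
    while True:
        params = {"screen_name": screen_name, "count": 200, "since_id": since_id}
        if max_id:
            params["max_id"] = max_id
        batch = requests.get(TIMELINE_URL, params=params, headers=HEADERS).json()
        if not batch:
            return collected
        collected.extend(batch)
        # max_id is inclusive, so step below the lowest ID we just received
        max_id = min(t["id"] for t in batch) - 1
```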
max_id = the top of the tweet ID list.
since_id = the bottom of the tweet ID list.
For more, take a deep look at the last diagram here.
max_id and since_id are used to prevent redundancy in Twitter API calls. Visualize the incoming tweets as piling onto a stack. One API call has to specify how many tweets (count) will be processed. But while that call is being made, new tweets may be added. In that case, if you draw out the stack and run through the process, you notice that there can be some 'fragmentation': sections of unprocessed tweets stuck in between processed ones.
To get around this problem, the two parameters are used to keep track of the latest/greatest-ID tweet previously processed (since_id) and the oldest/lowest-ID tweet recently processed (max_id). The since_id points to the bottom of the 'fragment' and (max_id - 1) points to the top of the 'fragment'. (Note that max_id is inclusive, unlike since_id.)
So, together the two parameters keep track of which part of the tweet stack still needs to be processed.

Show hit documents in the same series together in Lucene

There are some articles written in several parts;
for example, I got these articles from IBM developerWorks:
Distributed data processing with Hadoop, Part 1: Getting started
Distributed data processing with Hadoop, Part 2: Going further
Distributed data processing with Hadoop, Part 3: Application development
I will index those three articles separately. When someone searches for certain keywords, it is possible that Part 3 is at the top of the hits while Part 1 is 32nd. Therefore, if I list the results page by page, Part 1 and Part 3 will be displayed on different pages.
How can I make sure that hit documents from the same series are displayed together?
I guess in SQL, we can use "group by".
I believe what you are asking for is Field Collapsing, which is currently a trunk feature in Solr, and will be incorporated into the next Solr version.
If you want to roll your own, one possible way to do this is:
Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
Make an initial query to Lucene, and get a hit list.
For each hit, check to see if it has a series id; if it does, make another query by the series id in order to retrieve all the members of the series.
An alternative is to store the ids of all the series members in a field inside each member's document.
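A rough sketch of that roll-your-own approach, where search_index() is a hypothetical helper wrapping your actual Lucene query (via PyLucene, Solr, or anything else) and series_id is the field described above:

```python
def grouped_results(query, search_index):
    """Group hits so that all parts of a series come back together.

    search_index(q) is a hypothetical helper that runs a Lucene query
    and returns hit documents as dicts of their stored fields.
    """
    seen_series = set()
    grouped = []
    for hit in search_index(query):           # initial query for the user's keywords
        series_id = hit.get("series_id")      # check whether the hit belongs to a series
        if series_id is None:
            grouped.append([hit])             # standalone document
        elif series_id not in seen_series:
            seen_series.add(series_id)
            # follow-up query retrieves every member of the series so they display together
            grouped.append(search_index(f'series_id:"{series_id}"'))
    return grouped
```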