Historical aggregate Twitter data - API

I want to graph the number of tweets and the number of followers over the last three months, but I haven't been able to find a way to do that either through the API or any ready-made tool.
I tried TwitterCounter, but the data they provided was basically the result of some sort of interpolation function, not based on actual historical data.
Is there a way to get historical aggregate data from Twitter (not the actual tweets, but the sums, averages, etc.)?

There are no such numbers, or at least none that I am aware of. Before Twitter updated their tweet ID algorithm it was possible to estimate the number of tweets per day via a simple difference of IDs, but now that a different algorithm generates the IDs, that is no longer possible.
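To illustrate the ID-difference trick mentioned above: while tweet IDs were still assigned sequentially, you could snapshot the newest tweet ID at two points in time and subtract. A toy sketch (the IDs below are invented, and this stopped working once the new ID algorithm landed):

```python
# Pre-change, tweet IDs were roughly sequential, so the difference
# between the newest ID observed on two days approximated the number
# of tweets posted in between. The IDs here are made up.

def estimated_tweets_between(older_id: int, newer_id: int) -> int:
    return newer_id - older_id

# Newest IDs snapshotted 24 hours apart (hypothetical values):
print(estimated_tweets_between(21_000_000_000, 21_090_000_000))  # ~90,000,000/day
```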
You could check whether Google's Twitter search gives you some stats.
What do you mean by 'number of followers'? Whose followers?

Related

Best way to implement multiple continuous aggregates in postgres

Imagine you have to display information about rainfall in various cities over time.
You have tables that record how much it rained in a specific city for every hour. There is an endpoint that returns the average rainfall for the requested timeframe/city.
(So imagine tables called rainfall_california, rainfall_texas, etc. I realize this schema isn't ideal for rainfall, but I'm using it as an example.)
So instead of calculating the average on each request, I set up a continuous aggregate that calculates the average into a new view, with a policy to refresh the last hour of data once every hour.
ca_texas_rainfall_1_day
ca_texas_rainfall_7_day
ca_texas_rainfall_30_day
ca_california_rainfall_1_day
ca_california_rainfall_7_day
ca_california_rainfall_30_day
This works great and is super fast, but I'm a little confused about the best way to set it up. Should I have a separate view for each continuous aggregate and each city? Wouldn't that result in a ton of different views? Or should I consolidate the averages from all the tables into a single view?
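For what it's worth, one common pattern is to consolidate the per-city tables into a single hypertable with a city column, so you need only one continuous aggregate per bucket size instead of one per (city, bucket) pair. A minimal sketch using TimescaleDB through psycopg2; the table and column names are assumptions, not from the question:

```python
import psycopg2

# Sketch: one hypertable rainfall(city, ts, rainfall_mm) and one
# continuous aggregate per bucket size, covering every city at once.

DDL = """
CREATE MATERIALIZED VIEW ca_rainfall_1_day
WITH (timescaledb.continuous) AS
SELECT city,
       time_bucket('1 day', ts) AS bucket,
       avg(rainfall_mm) AS avg_rainfall
FROM rainfall
GROUP BY city, bucket;
"""

POLICY = """
SELECT add_continuous_aggregate_policy('ca_rainfall_1_day',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
"""

conn = psycopg2.connect("dbname=weather")  # hypothetical DSN
conn.autocommit = True  # continuous aggregates can't be created inside a transaction
with conn.cursor() as cur:
    cur.execute(DDL)
    cur.execute(POLICY)
```

Requests then filter by city (SELECT avg_rainfall FROM ca_rainfall_1_day WHERE city = 'texas'), and the number of views stays at one per bucket size no matter how many cities you add.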

Build a Kibana Histogram with buckets dynamically created by ElasticSearch terms aggregation

I want to combine the functionality of the Kibana Terms Graph (create buckets based on the unique values of a particular attribute) and the Histogram Graph (separate data into buckets based on queries and then plot the data over time).
Overall, I want to create a Histogram, but I only want to build it from the results of one query, not multiple queries as is done in the Kibana demo app. Instead, I want each bucket to be created dynamically per unique value of my particular field. For example, consider the following data returned by my query:
{"myValueType": "New York"}
{"myValueType": "New York"}
{"myValueType": "New York"}
{"myValueType": "San Francisco"}
{"myValueType": "San Francisco"}
Also assume that each record has a timestamp field for separating histogram data by date. For that particular date, I want the data to be communicated as a count of 3 into the New York bucket and a count of 2 into the San Francisco bucket. However, I am only able to show a count of 5 for my one linked query. When I configure the Histogram, I am able to specify a field to use for my timestamp, but not to create buckets from. I could've sent a field to compute a total/min/max/mean, but this field would've had to be numeric, so that is not the solution either.
If I were to use a Term Graph to create a pie or bar graph, I am indeed able to separate my data into buckets based on the unique values of my specified field (in this case, "myValueType"), but this would total up the data for all-time, not split up the data by timestamp. Although this is good information to know, it is not ideal because I wouldn't be able to detect trends in my data.
I am looking for a solution that will do one of the following:
Let me dynamically create queries in my Kibana dash board to create "buckets" in a Histogram
Allow me to run an ElasticSearch Terms Aggregation to split up my data into buckets based on "myValueType" and integrate these results into my Histogram
Customize the JSON of my dashboard, but this doesn't look possible to me
Create my own custom panel, but this is not desirable
Link a Kibana "TopN" query in Kibana. Actually, this has proven to be a work-around for my problem because the TopN query dynamically created one query per unique value/term from the specified fieldName. However, the problem is that I can only link one colour to this TopN query and each unique term will be placed in a bucket that uses a different shade of the colour. Ideally, every bucket in my Histogram will have a completely different colour associated to it. Imagine how difficult it will be to distinguish unique terms as the number of buckets grows.
If all else fails, I make one query per unique value from my search field. This will allow me to have one unique colour per bucket, but as the number of unique terms in the "myValueType" field changes, I need to keep adding/removing queries from Kibana, which can get quite messy.
I'm sure there is something that I am missing here. Please help me out. Many thanks.
A highly related SOF question: Is it Possible to Use Histogram Facet or Its Curl Response in Kibana
This would be a great feature. It looks like it will be supported in Kibana 4, but there doesn't seem to be much more info out there than that.
For reference: https://github.com/elasticsearch/kibana/issues/1249
Maybe a little late, but it is actually possible in the newest beta release:
Kibana 4 Beta 3 (installation download)
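For reference, the chart described here boils down to a plain Elasticsearch request: a date_histogram with a terms sub-aggregation. A minimal sketch over HTTP; the index name, field names, and local URL are assumptions (and newer Elasticsearch versions spell interval as calendar_interval):

```python
import requests

# One date bucket per day, one sub-bucket per unique myValueType value.
query = {
    "size": 0,
    "aggs": {
        "over_time": {
            "date_histogram": {"field": "timestamp", "interval": "day"},
            "aggs": {
                "by_value_type": {"terms": {"field": "myValueType"}}
            }
        }
    }
}

resp = requests.post("http://localhost:9200/myindex/_search", json=query)

# e.g. for one day: New York -> 3, San Francisco -> 2
for day in resp.json()["aggregations"]["over_time"]["buckets"]:
    for term in day["by_value_type"]["buckets"]:
        print(day["key_as_string"], term["key"], term["doc_count"])
```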

Facebook shares count not accurate

May I know why these three URLs return different counts?
https://www.facebook.com/plugins/like.php?href=http://www.rotikaya.com/iqram-dinzly-tinggalkan-jalan-jalan-cari-makan-kerana-takut-gemuk/&layout=standard&show_faces=false&width=300&action=like&colorscheme=light&height=30
https://graph.facebook.com/?id=http://www.rotikaya.com/iqram-dinzly-tinggalkan-jalan-jalan-cari-makan-kerana-takut-gemuk/
http://api.facebook.com/restserver.php?method=links.getStats&format=json&urls=http://www.rotikaya.com/iqram-dinzly-tinggalkan-jalan-jalan-cari-makan-kerana-takut-gemuk/
(deprecated but more accurate)
The Graph link is intended to reflect the number a like button would show, which is a combination of several metrics.
The getStats endpoint gives a more detailed breakdown. If you look at its total value, you'll see the numbers match up.
https://graph.facebook.com/?id=http://www.imdb.com/title/tt0117500/
This displays the total of likes and (deprecated) shares as a single number.
http://api.facebook.com/restserver.php?method=links.getStats&format=json&urls=http://www.imdb.com/title/tt0117500/
This one breaks the data out; as you can see, if you add the like count and the share count, it is the same as the number from the previous URL.
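A quick way to check the claim that the numbers add up: fetch both endpoints and compare. A sketch only; both endpoints are deprecated, and the field names are from the old link_stat schema as I recall them:

```python
import requests

url = "http://www.imdb.com/title/tt0117500/"

# Combined number, as a like button would show it.
graph = requests.get("https://graph.facebook.com/", params={"id": url}).json()

# Detailed breakdown from the deprecated REST endpoint.
stats = requests.get(
    "http://api.facebook.com/restserver.php",
    params={"method": "links.getStats", "format": "json", "urls": url},
).json()[0]

print("graph total:", graph.get("shares"))
print("likes + shares:", stats.get("like_count", 0) + stats.get("share_count", 0))
print("getStats total:", stats.get("total_count"))
```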

What exactly do 'since_id' and 'max_id' mean in the Twitter API

I've been poring over the Twitter docs for some time now, and I've hit a wall trying to work out how to get stats for growth of followers over a period of time / count of tweets over a period of time...
I want to understand from the community what since_id, max_id, and count mean in the Twitter API.
I've been following this page https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline
I'm trying to get stats for a user --
counts of tweets in a particular time period
count of followers over a particular time period
count of retweets
I'd like some help forming query strings for the above.
Thanks.
since_id and max_id are both very simple parameters you can use to limit what you get back from the API. From the docs:
since_id - Returns results with an ID greater than (that is, more recent than) the specified ID. There are limits to the number of Tweets which can be accessed through the API. If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available.
max_id - Returns results with an ID less than (that is, older than) or equal to the specified ID.
So, if you have a given tweet ID, you can search for older or newer tweets by using these two parameters.
count is even simpler -- it specifies a maximum number of tweets you want to get back, up to 200.
Unfortunately the API will not give you back exactly what you want -- you cannot specify a date/time when querying user_timeline -- although you can specify one when using the search API. Anyway, if you need to use user_timeline, then you will need to poll the API, gathering up tweets, figuring out if they match the parameters you desire, and then calculating your stats accordingly.
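A sketch of that polling approach: page backwards through user_timeline with max_id, keep the tweets that fall inside your window, and count them. The bearer token and screen name are placeholders, and note user_timeline itself only reaches back about 3,200 tweets:

```python
from datetime import datetime, timezone
import requests

API = "https://api.twitter.com/1.1/statuses/user_timeline.json"
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}  # hypothetical token

def tweets_since(screen_name, cutoff):
    """Collect tweets newer than `cutoff` (an aware datetime)."""
    collected, max_id = [], None
    while True:
        params = {"screen_name": screen_name, "count": 200, "trim_user": 1}
        if max_id is not None:
            params["max_id"] = max_id
        batch = requests.get(API, headers=HEADERS, params=params).json()
        if not batch:
            return collected
        for tw in batch:
            created = datetime.strptime(tw["created_at"],
                                        "%a %b %d %H:%M:%S %z %Y")
            if created < cutoff:
                return collected          # paged past the window
            collected.append(tw)
        max_id = batch[-1]["id"] - 1      # next page: strictly older tweets

count = len(tweets_since("someuser", datetime(2015, 1, 1, tzinfo=timezone.utc)))
```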
max_id = top (newest end) of the tweet ID range.
since_id = bottom (oldest end) of the tweet ID range.
For more, take a deep look at the last diagram here.
max_id and since_id are used to prevent redundancy in Twitter API calls. Visualize incoming tweets as piling onto a stack. Each API call specifies how many tweets (count) will be processed, but while the call is being made, new tweets keep arriving. If you draw the stack out and run through the process, you will notice that 'fragments' of unprocessed tweets can get stuck between processed sections.
To get around this problem, the two parameters keep track of the greatest-ID tweet previously processed (since_id) and the lowest-ID tweet recently processed (max_id). since_id points to the bottom of the fragment and (max_id - 1) points to its top. (Note that max_id is inclusive, unlike since_id.)
So, together the parameters keep track of which part of the tweet stack still needs to be processed.
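A sketch of that gap-filling pattern: since_id is the exclusive lower bound (the newest tweet already processed below the fragment), and max_id the inclusive upper bound (one less than the oldest tweet of the newer processed section). Endpoint and token are placeholders as before:

```python
import requests

API = "https://api.twitter.com/1.1/statuses/home_timeline.json"
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}  # hypothetical token

def fill_gap(newest_processed_id, oldest_recent_id):
    """Yield the unprocessed tweets sitting between two processed regions."""
    max_id = oldest_recent_id - 1             # max_id is inclusive
    while True:
        batch = requests.get(API, headers=HEADERS, params={
            "since_id": newest_processed_id,  # exclusive lower bound
            "max_id": max_id,                 # inclusive upper bound
            "count": 200,
        }).json()
        if not batch:
            return                            # fragment fully processed
        yield from batch
        max_id = batch[-1]["id"] - 1          # continue below this page
```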

Show hit documents from the same series together in Lucene

There are some articles written in several parts;
for example, I got these articles from IBM developerWorks:
Distributed data processing with Hadoop, Part 1: Getting started
Distributed data processing with Hadoop, Part 2: Going further
Distributed data processing with Hadoop, Part 3: Application development
I will index those three articles separately. When someone searches for certain keywords, it is possible that Part 3 is at the top of the hits while Part 1 is 32nd. Therefore, if I list results page by page, Part 1 and Part 3 will display on different pages.
How can I make sure that hit documents from the same series are displayed together?
I guess in SQL we would use "group by".
I believe what you are asking for is Field Collapsing, which is currently a trunk feature in Solr, and will be incorporated into the next Solr version.
If you want to roll your own, one possible way to do this is:
Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
Make an initial query to Lucene, and get a hit list.
For each hit, check to see if it has a series id; If it does, make another query by the series id in order to retrieve all the members of the series.
An alternative is to store the ids of all the series members in a field inside each member's document.
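A simplified sketch of the grouping step in Python, assuming all series members already appear in the hit list (step 3 above would instead issue a second query per series id to fetch missing members). Hits are modeled as plain dicts with a stored series_id field:

```python
from collections import OrderedDict

def group_by_series(hits):
    """Keep members of a series together; order groups by best member score."""
    groups = OrderedDict()  # insertion order == descending best score
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        key = hit["series_id"] or ("doc", hit["doc_id"])  # singletons stand alone
        groups.setdefault(key, []).append(hit)
    return [hit for members in groups.values() for hit in members]

hits = [
    {"doc_id": 1, "series_id": "hadoop-ddp", "score": 0.9, "title": "Part 3"},
    {"doc_id": 2, "series_id": None,         "score": 0.7, "title": "Unrelated"},
    {"doc_id": 3, "series_id": "hadoop-ddp", "score": 0.2, "title": "Part 1"},
]
for h in group_by_series(hits):
    print(h["title"])  # Part 3, Part 1, Unrelated
```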