I want to run a contest on Instagram where users enter by leaving a comment. How would I go about getting past the 150-comment API limit?
Are there any paid solutions out there? Is there any way I can get the comments and add them to a spreadsheet in real time?
Thanks
The Instagram comments endpoint (https://api.instagram.com/v1/media/{media-id}/comments?access_token=ACCESS-TOKEN) only returns the most recent comments, so I would recommend setting up a cron job to request the comments for the media you are using. Every minute would probably be sufficient, but you can poll more often if you think more than 150 comments will be added every minute. Since each request returns the most recent comments, you can cache everything if you start making requests as soon as your photo is posted.
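The polling-and-caching idea above could look roughly like the sketch below. It leaves out the HTTP request and only shows how overlapping polls might be merged and appended to a spreadsheet-style CSV. The helper names (merge_comments, append_rows) and the flattened comment fields (id, username, text) are assumptions for illustration, not the exact shape of the API response.

```python
import csv
import io

def merge_comments(cache, batch):
    """Add comments from one poll to the cache; return only the new ones."""
    new = []
    for comment in batch:
        if comment["id"] not in cache:
            cache[comment["id"]] = comment
            new.append(comment)
    return new

def append_rows(out, comments):
    """Append one spreadsheet row per comment to an open CSV file object."""
    writer = csv.writer(out)
    for c in comments:
        writer.writerow([c["id"], c["username"], c["text"]])

cache = {}
sheet = io.StringIO()  # stand-in for open("entries.csv", "a", newline="")

# Two overlapping polls, as successive cron runs would produce (made-up data).
poll_1 = [{"id": "1", "username": "ann", "text": "me!"},
          {"id": "2", "username": "bob", "text": "pick me"}]
poll_2 = [{"id": "2", "username": "bob", "text": "pick me"},
          {"id": "3", "username": "cy", "text": "hi"}]

for batch in (poll_1, poll_2):
    append_rows(sheet, merge_comments(cache, batch))
```

Because the cache is keyed by comment id, it does not matter how much successive polls overlap, as long as no more than one response's worth of comments arrives between polls.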
Right, this is really getting on my nerves, but Instagram has to do something about their bloody documentation.
I have been trying for a week to live-update my website with new Instagram posts without refreshing the page. Twitter was fairly easy, but Instagram is just one big mess. Basically I use the realtime Instagram API. The callback and all that stuff is working fine, but it does not return the ID of the new post; the callback only returns some basic stuff:
[{"changed_aspect": "media", "object": "tag", "object_id": "nofilter", "time": 1391091743, "subscription_id": xxxxx, "data": {}}]
This data is of little use apart from the tag, and I already knew the tag before the callback, so that doesn't help. All it really tells me is that there is a new post. When the callback fires, I have tried making the same request as on page load and keeping only the posts that are newer than those already on the page, but I have not managed to get that working. I took the ID of the most recently posted Instagram post and checked whether it appears in the callback request, and it does not.
What am I doing wrong?
I'd appreciate some help, thanks!
Edit:
I'd like to note that this is a problem not only with the realtime API but also with the normal API. I just don't know how to compare data so that I don't get duplicates in my database (normal API) or on my website (realtime). I can't find any tutorial or documentation (yes, I might be blind) that explains how to compare data. I can only find min_id and max_id, with no explanation of what these IDs contain. I checked them against the IDs in the results, and they do not match, so they are not media item IDs.
I also checked next_url, which by my logic should be a URL to the next page (like Twitter).
Am I looking at this all wrong?
Ok strike my old answer, I changed the way I do this. Here's how I'll do it now.
I still wait for 10 hits on my real-time subscription. When I reach 10, I start a new sync thread (if one is not already running).
The sync thread queries my DB for the last min_tag_id I used. Then I query:
https://api.instagram.com/v1/tags/*/media/recent?access_token=*&min_tag_id=*
Try it out here: https://api.instagram.com/v1/tags/montreal/media/recent?access_token=*
You'll get 20 results, and a min_tag_id value. Append that to your url, you'll see you get no results. Wait a couple of seconds and refresh. Eventually you'll get some media, and a new min_tag_id.
(You can ignore the "next_url" value they give you, you won't be using that).
Basically you only need to store that min_tag_id and keep querying until you get no more results; at that point you're done.
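As a rough sketch of that loop (not my production code): sync_tag keeps calling the endpoint with the stored min_tag_id until a response comes back empty. The fetch_page callable is hypothetical; in real use it would wrap the HTTP request to the tags/.../media/recent URL above and return the media list plus the new min_tag_id. Here it is replaced by a fake so the example runs without network access.

```python
def sync_tag(fetch_page, min_tag_id=None):
    """Drain the recent-media endpoint, following min_tag_id until empty.

    fetch_page(min_tag_id) is a hypothetical callable returning
    (media_list, new_min_tag_id or None).
    """
    collected = []
    while True:
        media, new_min = fetch_page(min_tag_id)
        if not media:
            break  # no results: we are caught up
        collected.extend(media)
        if new_min is None or new_min == min_tag_id:
            break  # cursor did not advance, avoid looping forever
        min_tag_id = new_min
    return collected, min_tag_id

# Fake responses keyed by the cursor that was sent (made-up values).
pages = {
    None: (["m1", "m2"], "100"),
    "100": (["m3"], "200"),
    "200": ([], None),
}

def fake_fetch(min_tag_id):
    return pages[min_tag_id]

media, cursor = sync_tag(fake_fetch)
```

The returned cursor is the value to write back to the DB so the next sync starts where this one stopped.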
When you get a subscription push, you need to query that endpoint (tag / recent).
I normally start an asynchronous thread to perform this so I can answer Instagram in under 2 seconds.
Then you parse that endpoint and look for a "next url" value.
Keep querying that end point, parsing the media and going to the next url until you find your stop condition.
For me, the stop condition is matching 10 consecutive records in my DB. Basically, from the tag, I store media when they meet my business rules.
The Instagram documentation is accurate and actually well written.
The realtime API is working correctly. As stated in the documentation:
The changed data is not included in the payload, so it is up to you
how you'd like to fetch the new data. For example, you may decide only
to fetch new data for specific users, or after a certain number of
photos have been posted.
http://instagram.com/developer/realtime/
You only receive a notification that an update has happened to your subscribed object. It is up to you to call the API to find out what that data is.
You can call the /tags/[tag-name]/media/recent with an access token that you have previously stored on your own server or DB. Then, you should be able to compare the data returned from that endpoint with any data you have retrieved prior, and just pull the objects that you do not yet have.
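A minimal sketch of that flow, assuming you keep a set of media IDs you have already stored: the notification body only tells you which tag to refresh, and the comparison against stored IDs picks out the new objects. The function names (tags_to_refresh, unseen) are made up for this example, the subscription_id is a dummy value, and the media items are reduced to an id field.

```python
import json

def tags_to_refresh(payload):
    """Return the distinct tag names mentioned in a real-time notification body."""
    updates = json.loads(payload)
    return sorted({u["object_id"] for u in updates if u.get("object") == "tag"})

def unseen(media, known_ids):
    """Keep only media whose id is not stored yet, and record those ids."""
    fresh = []
    for m in media:
        if m["id"] not in known_ids:
            known_ids.add(m["id"])
            fresh.append(m)
    return fresh

# Notification in the shape quoted in the question (subscription_id is a dummy).
payload = ('[{"changed_aspect": "media", "object": "tag", "object_id": "nofilter", '
           '"time": 1391091743, "subscription_id": 1, "data": {}}]')
tags = tags_to_refresh(payload)

# Pretend this list came back from /tags/nofilter/media/recent.
known = {"a"}
fresh = unseen([{"id": "a"}, {"id": "b"}], known)
```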
With https://dev.twitter.com/docs/api/1/get/statuses/user_timeline I can get the 3,200 most recent tweets. However, certain sites like http://www.mytweet16.com/ seem to bypass the limit, and I could not find anything about this in the API documentation.
How do they do it, or is there another API that doesn't have the limit?
You can use the Twitter search page to bypass the 3,200 limit. However, you have to scroll down many times in the search results page. For example, I searched for tweets from @beyinsiz_adam. This is the link to the search results:
https://twitter.com/search?q=from%3Abeyinsiz_adam&src=typd&f=realtime
Now in order to scroll down many times, you can use the following javascript code.
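var myVar = setInterval(function () { myTimer(); }, 1000);
function myTimer() {
    // Jump to the bottom of the page so the next batch of results loads.
    window.scrollTo(0, document.body.scrollHeight);
}
// Call clearInterval(myVar) to stop scrolling.
Run it in the Firebug console and wait for all the tweets to load.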
The only way to see more is to start saving them before the user's tweet count hits 3200. Services which show more than 3200 tweets have saved them in their own dbs. There's currently no way to get more than that through any Twitter API.
http://www.quora.com/Is-there-a-way-to-get-more-than-3200-tweets-from-a-twitter-user-using-Twitters-API-or-scraping
https://dev.twitter.com/discussions/276
Note from that second link: "…the 3,200 limit is for browsing the timeline only. Tweets can always be requested by their ID using the GET statuses/show/:id method."
I've been in this (Twitter) industry for a long time and witnessed lots of changes in Twitter API and documentation. I would like to clarify one thing to you. There is no way to surpass 3200 tweets limit. Twitter doesn't provide this data even in its new premium API.
The only way someone can surpass this limit is by saving the tweets of an individual Twitter user.
There are tools available which claim to have a large database and provide more than 3200 tweets. Two that I know of are followersanalysis.com and keyhole.co.
You can use a tool I wrote that bypasses the limit.
It saves the Tweets in a JSON format.
https://github.com/pauldotknopf/twitter-dump
You can use the Python library snscrape to do it. Or you can use the ExportData tool to get all tweets for the user, which returns already preprocessed CSV and spreadsheet files. The first option is free, but gives less information and requires more manual work.
We are working on some analytics using the number of times a user is retweeted or mentioned. I can't seem to find a way to get these numbers using the APIs. Does anyone have any ideas?
https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&screen_name={screen_name}&count={count}
It's important to include the parameter include_entities=true in the request. This will give you an expanded response including re-tweet and mention counts.
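Once you have the parsed JSON from that request, adding up the numbers could look something like the sketch below. It assumes each tweet object carries a retweet_count field and, with include_entities=true, an entities.user_mentions list; the engagement helper and the sample tweets are made up for illustration. Note that this counts the accounts mentioned in the user's own tweets. Since the timeline only contains that user's tweets, counting how often other people mention the user would need a different data source.

```python
from collections import Counter

def engagement(tweets):
    """Return (total retweets, Counter of mentioned screen names)."""
    total_retweets = sum(t.get("retweet_count", 0) for t in tweets)
    mentions = Counter(
        m["screen_name"].lower()
        for t in tweets
        for m in t.get("entities", {}).get("user_mentions", [])
    )
    return total_retweets, mentions

# Made-up sample in the assumed shape of the timeline response.
tweets = [
    {"retweet_count": 3, "entities": {"user_mentions": [{"screen_name": "Alice"}]}},
    {"retweet_count": 1, "entities": {"user_mentions": [{"screen_name": "alice"},
                                                        {"screen_name": "bob"}]}},
    {"retweet_count": 0},
]

total, mentions = engagement(tweets)
```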
Get Status / User Timeline
Twitter API Console
Update:
To get tweets from the last 90 days, there is a Node.js library you can use called Snapbird:
https://github.com/remy/snapbird
Here is another resource covering the same topic:
http://blog.tweetsmarter.com/twitter-search/10-ways-and-20-features-for-searching-old-tweets/
For some reason, when searching on one specific Twitter user, the search API returns nothing (i.e. http://search.twitter.com/search.atom?q=+from%3ATWITTERHANDLE_A). TWITTERHANDLE_A here is the Twitter account name. This user has been active for over a month, has had many RTs and hashtags, and has sent such tweets out as well.
Meanwhile, I created a new Twitter account - we will call it TWITTERHANDLE_B. Immediately after I created the account, I sent 1 tweet and performed the same search as above (http://search.twitter.com/search.atom?q=+from%3ATWITTERHANDLE_B)
The tweet was returned.
Is there ANY WAY to find out if and/or why a particular user would be blocked from search results? Thanks so much for any help... I'm going crazy here. Twitter's documentation just says sometimes a user's tweets won't be searched!
The documentation you refer to has a link to this page:
http://support.twitter.com/forums/10713/entries/42646
The bottom entry addresses your issue, hopefully you can fix it that way.
Bear in mind the search API only grabs results from the last couple of weeks, so if the user hasn't tweeted in a while then there won't be any results.
You could try not using the Search API and grab the results directly:
http://twitter.com/statuses/user_timeline/-username-.json
Works fine for http requests at least... >_<
Feel free to edit the title if you know how to formulate the question better. (Tagging is a problem as well.) The problem may be too difficult in this general form, so let us consider a concrete example.
You get a screenful of stackoverflow questions by requesting the /questions?sort=newest page. The next page link leads to /questions?page=2&sort=newest. I suppose that on the server side the request is translated into an SQL query with a LIMIT clause. The problem with this approach is that if new questions are added while the user browses the first page, their second page will start with some questions they have already seen. (With 10 questions per page, if 10 new questions happen to be added, they'll get exactly the same content a second time!)
Is there an elegant way to solve this common problem? I realize that it is not that big a problem, at least not for stackoverflow, but still.
The best idea I have (apart from storing request history per client) is to use /questions?answer_id=NNN format. Server returns a page that starts with the requested answer, and puts the id of the first answer on the next page into next page link. There must be a way to write SQL for that, right?
Is this how it is usually done? Or is there a better way?
This can't be done in an easy way. For instance, the "Unanswered" list here at stackoverflow is sorted by number of votes. So if you saved the last ID of the page you're viewing (in a cookie, request, session, wherever) and someone upvotes a post while you're browsing page 2, page 3 isn't complete, since the recently upvoted post could have been moved to page 1 or 2.
The only way to do it is to load the complete list into someone's session. Please don't...
As already mentioned, let's hope people are used to this by now.
Most web sites I've seen don't solve this problem - they show you a page including some content you've already seen.
You might consider that a feature: when you click "next" and see some content you've seen before, it's a signal that you may want to go back to the front, because there's some new content.
Tag each question with its time entered into the database, carry the time the frontpage was last loaded as a cookie or part of the URL, and limit the search to items n through n+displaynum as you go forward.
But I wouldn't bother. This behavior is uniform enough that most users expect it, and it serves as a flag for when new data is becoming available. You can even open a new tab/window that starts back at the top of the list to see what has come up.
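If you did want to try the timestamp approach described above, a minimal sketch might look like this. It uses an in-memory SQLite database so it runs as-is; the questions table, its created column, and the page helper are assumptions for illustration, and integer timestamps stand in for real ones. The snapshot value is what you would carry in the cookie or URL.

```python
import sqlite3

def page(conn, snapshot, page_no, size):
    """Rows created at or before `snapshot`, newest first, for a 0-based page."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM questions WHERE created <= ? "
        "ORDER BY created DESC, id DESC LIMIT ? OFFSET ?",
        (snapshot, size, page_no * size))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany("INSERT INTO questions VALUES (?, ?)",
                 [(i, i) for i in range(1, 7)])

snapshot = 6                           # time the front page was first loaded
first = page(conn, snapshot, 0, 3)     # front page
conn.execute("INSERT INTO questions VALUES (7, 7)")  # a new question arrives
second = page(conn, snapshot, 1, 3)    # next page ignores the newcomer
```

Because every page filters on the same snapshot, rows added later never shift the offsets, so the second page does not repeat anything from the first.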
I believe the SQL (for MySQL) would be:
SELECT *
FROM entries
WHERE entry_id >= ?  -- bind the entry_id the page should start from
ORDER BY entry_id
LIMIT 50
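Here is a small runnable sketch of the same idea, using SQLite instead of MySQL so it works without a server. The next_page helper is made up for this example. It fetches one extra row so it can hand back the ID that the following page should start from, which is the value you would put into the "next page" link.

```python
import sqlite3

def next_page(conn, start_id, size):
    """Return (rows, next_start_id): one page beginning at start_id."""
    rows = [r[0] for r in conn.execute(
        "SELECT entry_id FROM entries WHERE entry_id >= ? "
        "ORDER BY entry_id LIMIT ?", (start_id, size + 1))]
    # The extra row, if present, is the first entry of the following page.
    next_start = rows[size] if len(rows) > size else None
    return rows[:size], next_start

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (entry_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO entries VALUES (?)", [(i,) for i in range(1, 6)])

page1, start2 = next_page(conn, 1, 2)
conn.execute("INSERT INTO entries VALUES (6)")   # entry added while browsing
page2, start3 = next_page(conn, start2, 2)
page3, start4 = next_page(conn, start3, 2)
```

Since each page is anchored to an ID rather than an offset, rows inserted while the user is browsing cannot push already-seen entries onto the next page.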