How does Facebook organize posts in the news feed page? - sql

I have always wondered how Facebook organizes posts in the news feed page. Facebook doesn't simply use date and time to order posts there. This is obvious when some posts acquire many likes or comments: these posts are displayed first, even though they may be older.
Let's suppose a simple database table for posts:
Post_Id
Post_Owner_Id
Post_Text
Post_Image
Post_Date
So what field (or fields) must be added to organize posts the way Facebook does?

From what I've heard, the algorithm Facebook uses to sort the news feed isn't public, but what the algorithm looks for isn't completely secret.
Have a look at these articles for a rough idea of what they do and why:
Bufferapp - Decoding the Facebook newsfeed
Forbes - Facebook Changes News Feed Algorithm To Prioritize Content From Friends Over Pages
Everything You Need To Know About Facebook’s News Feed Algorithm
So if you want to recreate their algorithm, you could get a very rough imitation by sorting first on the date rounded to the closest week, second on the type of post (message, page, etc.), and then perhaps on the number of likes it got.
That means you would need a like count and a Post_Type attribute.
You would also need to sort on friend status (direct friends or friends-of-friends) and on whether or not the post comes from someone verified, such as a celebrity.
There is a lot more to it than that.
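As a rough illustration of the week / type / likes ordering described above, here is a minimal sketch using an in-memory SQLite database. The Post_Type and Like_Count columns, the sample rows, and the rule that 'message' posts rank ahead of 'page' posts are all assumptions made for the example. This is not Facebook's actual ranking.

```python
import sqlite3


def build_demo_db():
    """Create the question's table plus two hypothetical ranking columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Posts (
            Post_Id       INTEGER PRIMARY KEY,
            Post_Owner_Id INTEGER,
            Post_Text     TEXT,
            Post_Image    TEXT,
            Post_Date     TEXT,               -- ISO date string
            Post_Type     TEXT,               -- added: 'message' or 'page' (assumed values)
            Like_Count    INTEGER DEFAULT 0   -- added: assumed denormalized counter
        )
        """
    )
    rows = [
        (1, 10, "lunch", None, "2024-05-01", "message", 5),
        (2, 11, "promo", None, "2024-05-02", "page", 100),
        (3, 12, "news", None, "2024-05-03", "message", 50),
        (4, 13, "viral", None, "2024-04-10", "message", 500),
    ]
    conn.executemany("INSERT INTO Posts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


# Sort by 7-day bucket (newest first), then post type, then likes.
FEED_QUERY = """
    SELECT Post_Id
    FROM Posts
    ORDER BY CAST(julianday(Post_Date) / 7 AS INTEGER) DESC,
             CASE Post_Type WHEN 'message' THEN 0 ELSE 1 END,
             Like_Count DESC
"""


def ranked_post_ids(conn):
    """Return post ids in feed order."""
    return [row[0] for row in conn.execute(FEED_QUERY)]


print(ranked_post_ids(build_demo_db()))
```

Within the same 7-day bucket, the more-liked message outranks the newer but less-liked one. The heavily liked post from weeks earlier still comes last, because the week bucket is the first sort key.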

Related

Disqus API: How to get recent upvotes across all posts?

I am the founder of an educational site + app that uses Disqus for commenting.
Recently we have had a spate of spammers upvoting random posts. The upvote causes an email to be generated to the author of the original post, and the upvoter's name is also visible when you hover over the number of votes. The upvoter's display name is usually something like "(Heart) See Bio (Heart)" and the bio contains a link to a porn site, etc. Sometimes the display name is more explicit.
Since the spammer is not actually leaving comments but simply voting on existing comments, the existing spam countermeasures do not catch this. If the spammer votes on a recent post, a moderator might notice, but if they vote on an old post, it would probably go undetected (except to the victim who receives an email notification, and any users who happen to view the upvoter list).
Is there any way to see a list of all recent upvotes, across all posts on a forum, so that we can manually moderate? I've looked through the methods in the Disqus API, but I don't see anything. Any pointers would be greatly appreciated.

Twitter API Standard Search: Can I get hidden replies?

I am trying to get as much data as I can out of the Twitter API for an academic research project. Even though I only have access to the Standard API, the data should be as accurate as possible. I am building myself a "wrapper" around Twarc and other utilities in Python that gets me most of the data I want in just the format I need. A big problem was getting all the replies, but I was able to solve it with a bit of trickery: searching from the tweet in question onwards and then checking whether the tweets in the obtained sample have the original tweet ID in "in_reply_to_tweet_id". Rinse and repeat with those newly obtained tweets.
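Roughly, that reply-collection loop looks like the sketch below. The `search_newer_than` callable is a hypothetical stand-in for whatever search wrapper is used (it is not a Twarc function), and the reply field is written as `in_reply_to_status_id`, which is the name used in v1.1 tweet objects.

```python
from collections import deque


def collect_replies(root_id, search_newer_than):
    """Breadth-first walk of a reply tree.

    search_newer_than(tweet_id) is a hypothetical callable returning an
    iterable of tweet dicts posted after tweet_id.
    """
    found = {}
    queue = deque([root_id])
    while queue:
        parent_id = queue.popleft()
        for tweet in search_newer_than(parent_id):
            is_reply = tweet.get("in_reply_to_status_id") == parent_id
            if is_reply and tweet["id"] not in found:
                found[tweet["id"]] = tweet
                queue.append(tweet["id"])  # later look for replies to this reply
    return list(found.values())


# Tiny offline stand-in for a search call, for demonstration only.
SAMPLE = [
    {"id": 2, "in_reply_to_status_id": 1},
    {"id": 3, "in_reply_to_status_id": 2},
    {"id": 4, "in_reply_to_status_id": None},
    {"id": 5, "in_reply_to_status_id": 1},
]


def fake_search(tweet_id):
    return [t for t in SAMPLE if t["id"] > tweet_id]


reply_ids = sorted(t["id"] for t in collect_replies(1, fake_search))
print(reply_ids)
```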
Then I noticed the new moderation feature Twitter implemented in March. Now the moderated comments under "More replies" do not show up in my search output.
Example: https://twitter.com/NDRreporter/status/1113353224730365952
I find all replies except the following: under "More replies" ("Mehr Antworten" in German), there is a reply chain started by an extreme right-leaning (possibly troll) account ("#Der Steuerzahler") that got moderated and shoved down there. This does not show up in API searches, even if I let the code iterate for over an hour just looking for replies to this particular original tweet.
My question is pretty general: Aside from getting replies as they come in (i.e. before they are moderated) via Filter API, is it possible to find these moderated tweets via the Standard Search API? Not looking for a ready-made solution, general pointers suffice. If I can't find them via Search, then I obviously won't try it with that anymore.
Thanks in advance.

Creating a SOLR index for activity stream or newsfeed

I am trying to index the activity feed of a social portal I am building. The portal allows users to follow each other and get updates from the people they follow as an activity feed sorted by date.
For example, user A will be following users B, C, D, E & F. So user A should see all the posts from B, C, D, E & F on his/her activity feed.
Let's assume the post consist of just two fields.
1. The text of the post. (text_field)
2. The name/UID of the user who posted it. (user_field)
Currently, I am creating an index for all the posts and indexing the text_field & user_field. At scale, there can be 1,000,000+ posts. A user may follow 100s if not 1000s of users. What is the best way to create an index for this scenario?
Should I also index a person's followers, so that they can be looked up quickly and then passed to a second query that gets the posts of all those users sorted by date?
What is the best way to query the index of all these posts by passing the UIDs of all the followed users, considering there may be 100s or more?
Update:
The motivation for using Solr for the news feed was mainly inspired by this detailed slide and my brief discussion with OpenSocial team.
When starting off with a social portal, fan-out on write seems like overkill and more expensive, while fan-out on read is a better fit. Both the slide and the OpenSocial team suggested using a search backend for fan-out on read. The slide mentioned above also has data on how it helped them.
At present, the feed is going to be flat and the only sort criterion will be the date (recency). We won't be considering relevance or prioritizing posts from closer groups.
It's kind of abstract, but I will do my best here. Based on what you mentioned, I am not sure Solr is really the right tool for the job. You can still use Solr for full-text search, but I am not sure about generating a news feed from it in this scenario. Remember that although Solr is pretty impressive, it is a search engine. For the rest of this post I will assume that you are sticking with Solr, but keep in mind that we are trying to put a square peg through a round hole here.
Here are a few additional points you should think about.
You will probably want to add a timestamp of the post to the data element.
You need to figure out how to properly sort the results. Is it in order of recency? Or based on posts that the user is more likely to interact with?
If a user has 1000+ connections, would he want to see an update from every one of them in the main feed? Or should posts from a closer group of friends show up higher?
Here are some comments about your questions:
1) If you index a person's followers, it may be hard to keep the index up to date. I am assuming followers are going to change often, and re-indexing in this scenario would not really be practical.
2) That sounds more reasonable, but again, you need to figure out the sorting. You can get a list of connections for the user, then run a search for the top posts from all of them.
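To make point 2 concrete, here is a minimal sketch of the request parameters for a fan-out-on-read query. It assumes the followed UIDs have already been fetched from wherever the follow graph is stored, and that each post document carries a timestamp field (called `post_date` here, a hypothetical name). The `{!terms}` query parser in recent Solr versions is used to avoid a long chain of OR clauses; treat this as one possible shape rather than a tuned setup.

```python
from urllib.parse import urlencode


def build_feed_params(followed_uids, rows=20):
    """Build Solr /select parameters for a flat, recency-sorted feed.

    user_field comes from the question; post_date is an assumed timestamp field.
    """
    if not followed_uids:
        raise ValueError("followed_uids must not be empty")
    return {
        "q": "*:*",
        # Filter query: keep only posts written by the followed users.
        "fq": "{!terms f=user_field}" + ",".join(followed_uids),
        "sort": "post_date desc",
        "rows": rows,
    }


params = build_feed_params(["B", "C", "D", "E", "F"])
query_string = urlencode(params)
print(query_string)
```

For thousands of followed users it is probably better to send these parameters in a POST body than in the URL.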

Instagram: sort photos with a specific tag with most likes

I'm running a contest on the web where the image with the most likes wins. It's tiresome having to go through 900 images manually, so what I want to do is sort all images with a given tag, let's say #computer, by the number of likes, with the most-liked pics on top. I have searched the net like crazy for some program or site that does this (ExtraGram, gramhoot, statigram, webstagram), but none offer sorting by number of likes and it drives me INSANE! It's a really relevant request.
I've tried instafeed.js but it doesn't include all images; in fact it leaves out the ones with the most likes, which defeats the purpose.
There's nothing I know of in the Instagram API that sends back media sorted by likes in advance. I don't think there's a tool to do this either, but writing one is relatively simple IMO and I've done it before for a contest specifically.
The simplest approach is the following:
1. Use the Instagram API (via a library or pure REST) to query by tag. For instance, if you only care about the most recently tagged media or you want to process by date, you can use the /tag/tag-name/media/recent endpoint.
2. Page through each result page by processing the next_max_id/next_max_tag_id.
3. Collect the results locally into a database. You will receive the "like" count for each media item. You will have to update the data if you want to track the likes over time.
4. Sort the results using your database or, if it's a small result set, you could skip #3 and just sort in memory.
If you need to refresh the results, you need to subscribe to the tag via the API. You can give Instagram a URL to push updates to, and then you'll have to retrieve 1 or more media items and update them in your database accordingly.
You will of course need to register your application with Instagram to get an API key if you want to do this. Then you can either send them your client_id or use OAuth.
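The paging and in-memory sorting steps can be sketched as below. `fetch_page` is a hypothetical callable that performs the HTTP request and returns the parsed JSON. The response shape used here (a `data` list, a `likes.count` per item, and `pagination.next_max_tag_id`) is an assumption about the API's output, so check it against a real response.

```python
def collect_media(fetch_page):
    """Page through tagged media until no next cursor is returned.

    fetch_page(max_tag_id) is a hypothetical callable; pass None for page one.
    """
    media = []
    max_tag_id = None
    while True:
        page = fetch_page(max_tag_id)
        media.extend(page.get("data", []))
        max_tag_id = page.get("pagination", {}).get("next_max_tag_id")
        if not max_tag_id:
            break
    return media


def sort_by_likes(media):
    """Most-liked items first."""
    return sorted(media, key=lambda item: item["likes"]["count"], reverse=True)


# Offline stand-in for two API result pages, for demonstration only.
FAKE_PAGES = {
    None: {
        "data": [{"id": "a", "likes": {"count": 3}}, {"id": "b", "likes": {"count": 40}}],
        "pagination": {"next_max_tag_id": "p2"},
    },
    "p2": {"data": [{"id": "c", "likes": {"count": 12}}], "pagination": {}},
}

ranked_ids = [item["id"] for item in sort_by_likes(collect_media(FAKE_PAGES.get))]
print(ranked_ids)
```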
The best way to achieve this is to pull the photos in and then sort them programmatically by their numeric like count. For anyone interested, I've designed a plugin that does this automatically:
Instagram Journal

Is it possible to access Open Graph Insights via Graph API or other programmatic means?

I'm able to access FQL Insights metrics (http://developers.facebook.com/docs/reference/fql/insights/) such as "application_active_users" for my App via the Graph API; however, the Open Graph Insights metrics (http://developers.facebook.com/docs/opengraph/insights/) such as "published actions" and "ticker impressions" don't seem to be available via Graph API. I looked through the documentation thoroughly and don't see this addressed. Has anyone been able to access Open Graph Insights programmatically?
My impression is that these metrics are combined for us, and there is no way to access the individual metrics.
Referring to: https://developers.facebook.com/docs/reference/fql/insights/
page_impressions_organic
The number of times your posts were seen in News Feed or Ticker or on visits to your Page. These impressions can be Fans or non-Fans
page_impressions_organic_unique
The number of people who visited your Page, or saw your Page or one of its posts in News Feed or Ticker. These impressions can be Fans or non-Fans
page_posts_impressions_organic
The number of impressions of your posts in News Feed or Ticker or on your Page (periods: day, week, days_28)
page_posts_impressions_organic_unique
The number of people who saw your Page posts in News Feed or Ticker, or on your Page's Wall
post_impressions_organic*
The number of impressions of your post in Newsfeed, Ticker, or on your Page's Wall
post_impressions_organic_unique*
The number of people who saw your post in their Newsfeed or Ticker or on your Page's Wall
Have you tried running FQL through the Graph API?
http://graph.facebook.com/fql?q=[your insights query]
We can filter campaigns by the number of impressions they received by adding a "filtering" parameter to the request body. It specifies the field to filter on (in this case, impressions), then an "operator", which is the comparison operator (here "GREATER_THAN"), and finally a "value", which is a number.
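As a small sketch of what that parameter might look like when serialized: the key names `field`, `operator` and `value`, and the field name `impressions`, are assumptions based on the description above, so check them against the API reference before relying on them.

```python
import json


def impressions_filter(min_impressions):
    """Serialize a 'filtering' value that keeps rows above an impressions threshold."""
    return json.dumps(
        [
            {
                "field": "impressions",      # assumed field name
                "operator": "GREATER_THAN",  # operator named in the answer
                "value": min_impressions,
            }
        ]
    )


request_body = {"filtering": impressions_filter(1000)}
print(request_body["filtering"])
```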