How do social media monitoring sites fetch such a huge number of user posts? - social-media

There are many social media monitoring sites on the market. I am very curious about how these sites fetch the posts of such a huge number of users. How do they know which users' posts should be fetched?
For example, if a site requires me to log in with my Facebook account and then only fetches/analyzes my posts or my friends' posts, that would be reasonable. But I tried several social media monitoring services a few days ago, and I found that a massive amount of data is fetched, covering users of all kinds.
How do the services know whose data to fetch? If they fetch all the posts on a given social site, how do they achieve that? Don't social sites' APIs prohibit apps from fetching data in bulk?

The application Social Radar is primarily crawler-driven. This is similar to how the Google.com search engine works.
Google doesn't really worry about which users' content it's crawling; it just indexes what it can find. Content is typically structured in ecosystems, so if you can find part of a conversation, you can often discover the rest of it as well. This is also true of, and helpful in, spam filtering.
APIs are leveraged as well; the terms differ by service.
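The crawler approach described above amounts to a breadth-first traversal of public pages: start from a few seed URLs and follow whatever links are discovered. Here is a minimal sketch in Python; the in-memory page graph is a made-up stand-in for real HTTP fetches (a real crawler would fetch pages politely and respect robots.txt):

```python
from collections import deque

# Toy "web": each public page lists the pages it links to.
# These page names are invented for illustration only.
PAGES = {
    "post/1": ["user/alice", "post/2"],   # part of a conversation...
    "post/2": ["user/bob", "post/3"],     # ...links lead to the rest of it
    "post/3": ["user/alice"],
    "user/alice": ["post/1", "post/3"],
    "user/bob": ["post/2"],
}

def crawl(seed):
    """Breadth-first crawl: finding one page of a conversation
    discovers the rest through its outbound links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for link in PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

# Starting from a single post discovers the whole connected ecosystem:
print(sorted(crawl("post/1")))
```

This is why the crawler never needs to know in advance which users to fetch: reachability from the seeds decides that.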

Related

How can I acquire data from social media to analyze it using machine learning?

I have a project where I'm required to predict a user's future location so that we can provide them with location-specific services, as well as collect data from their device to be used to provide a service for another user, etc.
I have already developed an Android app that collects some data, but as social media is the richest source of information, I would like to make use of it. For example, if a user checks in at a restaurant and gives it a good review (on Facebook, for example), then they are likely to go back there. Or if they tweet something negative about a place, then they are unlikely to go back there. These are just examples I thought of.
So my main issue is: how do I even get access to that information? It's not as if the user is going to send me a copy of every social media activity they have, so how do I get it, and is that even possible? I know Facebook, Twitter, and other social networks have security policies, so I initially thought it couldn't be done and that only Facebook gets access to its users' information to predict their likes and dislikes and show them ads and sponsored posts accordingly. But when I googled it, I found a lot of tools that claim to be able to provide that sort of data. How did they acquire it, and is it possible for me to do the same?
Facebook, Twitter, etc. have well-documented APIs that may or may not allow you to access the data.
For the APIs, see the official documentation of each, because anything I write here will likely be outdated in a year or two, as their APIs change.
Don't rely on web scraping. Websites change their design more often than their APIs, and you will likely violate the terms of service.
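Whatever the specific service, official social media APIs typically return JSON a page at a time, with a cursor pointing at the next page. A hedged sketch of the client-side pattern in Python; the payload shape and cursor field name here are invented for illustration, so check the actual API documentation for the real field names:

```python
import json

# Hypothetical paginated responses in the general shape many social
# APIs use: a page of items plus a next_cursor (null on the last page).
PAGES = {
    None: '{"posts": [{"id": 1, "text": "checked in at a cafe"}], "next_cursor": "p2"}',
    "p2": '{"posts": [{"id": 2, "text": "great trip!"}], "next_cursor": null}',
}

def fetch_all_posts(fetch_page):
    """Follow next_cursor until the API signals there are no more pages.
    fetch_page(cursor) stands in for an authenticated HTTP GET."""
    posts, cursor = [], None
    while True:
        payload = json.loads(fetch_page(cursor))
        posts.extend(payload["posts"])
        cursor = payload["next_cursor"]
        if cursor is None:
            return posts

posts = fetch_all_posts(lambda cursor: PAGES[cursor])
print([p["id"] for p in posts])  # [1, 2]
```

Rate limits and access scopes are enforced per token, which is exactly why bulk access "of all users" is usually not something a third-party app can get this way.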

How do web comparison websites like kayak get their data?

I have been looking online and saw many similar/identical posts, but all were extremely old (the latest I found was from 2011), so since technology changes, I thought I'd ask too.
I wonder how a flight comparison website (where you cannot book flights and can only be redirected to other websites) gets its data.
Is it all done through APIs by now, or is it through scraping data (which would not be so reliable)? I've been reading online, trying to find out if that's the case, but it doesn't really seem that EVERY airline and EVERY flight search website (with a booking option) provides an API. So I wonder how sites like Kayak get their data if not every airline/flight booking website provides an API.
Also, I came across some APIs such as:
QPX Express API
Skyscanner Travel API (which I checked out on a website that uses it, and the data does seem quite limited?!)
Travelport API
Amadeus API
Sabre Travel API
Wego Affiliate Network (which seems really great, but searches take very long)
I wonder if anyone has experience with the mentioned APIs and how good they are, whether using them is 'the way' of doing it, or whether it is actually much more reliable to request data directly from each airline and booking website (if that's possible).
Thanks a lot!
If we take Kayak as the example, since that is who you mentioned, they approach the data in two forms.
They have API PULL connections to GDS companies (e.g. Sabre), some airlines, and large online travel companies such as Expedia.
Smaller airlines in particular PUSH their inventory and fares to companies such as Kayak.
Aggregation companies generally provide PUSH access, though companies that want to PUSH their data have to comply with the aggregator's requirements/standards.
It is a supply-and-demand service. Aggregation companies will generally request access to large, established companies, but will also allow companies to push their data to them if they wish.
The data is not normally scraped; it comes through API and web service platforms.
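The PULL/PUSH split described above can be sketched as a small aggregator: it polls (PULLs) large providers on its own schedule, while smaller providers PUSH fares into an ingest endpoint that validates them against the aggregator's schema. All provider names and the schema below are made up for illustration; real aggregators use far richer fare formats:

```python
class FareAggregator:
    """Toy aggregator: PULLs from registered providers, accepts PUSHed fares."""

    # Stand-in for the aggregator's "requirements/standards".
    REQUIRED_FIELDS = {"airline", "route", "price"}

    def __init__(self):
        self.pull_sources = []  # callables returning fare dicts (GDS, big OTAs)
        self.fares = []

    def register_pull_source(self, source):
        """Register a large provider the aggregator polls itself."""
        self.pull_sources.append(source)

    def push(self, fare):
        """Smaller airlines PUSH inventory; reject fares missing the schema."""
        if not self.REQUIRED_FIELDS <= fare.keys():
            raise ValueError(f"fare missing fields: {self.REQUIRED_FIELDS - fare.keys()}")
        self.fares.append(fare)

    def pull_all(self):
        """Poll every PULL source (a GDS API, a large online travel agency, ...)."""
        for source in self.pull_sources:
            self.fares.extend(source())

agg = FareAggregator()
agg.register_pull_source(lambda: [{"airline": "BigAir", "route": "JFK-LHR", "price": 420}])
agg.push({"airline": "TinyJet", "route": "BER-VIE", "price": 89})
agg.pull_all()
print(len(agg.fares))  # 2
```

The validation in `push` is the interesting part: PUSH shifts the integration cost onto the smaller provider, which is why aggregators can accept data from companies they would never bother polling.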

Running Multiple websites with same profiles

Is it a bad practice, in terms of search traffic, to maintain multiple websites in the same niche? For example, using the same set of social profiles from Twitter, Facebook, and Google+ on two websites related to laptop shopping.
I am interested to know the search traffic impact with and without using social sharing at all.
It is not a bad practice for SEO at all. You could be penalized for duplicate content, but the social profiles themselves would not cause that.
The impact of social networks grows more important every day, simply for gaining more reputation and more traffic. Your Google+ profile in particular will improve your standing as an author.
Author reputation will soon matter: even without a good SEO position for your website, if you are well positioned as an author, the sites you collaborate with will gain a better reputation.
I hope this helps.

how to have google sites pages display things specific to the currently logged in user

I've been wondering for a while whether I can have Google Sites pages display things specific to the currently logged-in user. I keep googling for a solution but never come up with anything conclusive.
Basically, I use Google Sites for my university class websites. I'd love it if I could display things specific to an individual student, such as: "You have completed homeworks 1, 2, and 3; now it's time to work on homework 4."
Many thanks in advance
I searched on Google and found this:
You can use Page-level permissions to display certain pages to certain
users. You can use Google Apps Script to detect the signed in user
too, but what you could display in Google Apps Script is limited. I'd
go for Page-level permissions.
(Link to that post on Google community support site.)
I think this is a good start for you.

Google Policy on interlinking my websites together

I was wondering what Google's official policy is on linking my own websites together: do they forbid it, allow it, allow it as long as it's nofollow, etc.?
For clarification, I will give both a white-hat and a black-hat example:
White-hat:
I'm a web designer who also has several affiliate websites. I designed those websites, so I would like to give myself credit by linking from each affiliate website to my professional bio website, where people can hire me as a designer.
Black-hat:
I buy 100 different domains and link each one to the other 99, sharing all the link juice between them. The content of each website abides by Google's policy and isn't spammy; the only thing wrong is the fact that I have 99 links to each of them, and I'm the only one doing the linking.
First solution - nofollow:
Well, if the links are nofollow, I don't see why Google would care.
So you'd probably be safe with that, if what you want to achieve is indeed giving yourself credit.
But as for SEO, as you already know, the sites wouldn't benefit much.
However, with nofollow, even if you don't increase PageRank, the number of visits to each site should still increase (the traffic from your other sites). This could also be beneficial.
Second solution - portfolio site:
There is one scenario which could suit your purpose:
Create your "portfolio": a site with links to all the sites you created, as an example of your skills.
Place a link on each of your sites to this portfolio.
Now, you have a page with 100 outbound links, each perfectly legitimate. And each of your sites contains just one outbound link connecting it to your other sites.
This should be fine both for your presentation and for SEO, and you avoided having a link farm.
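For concreteness, a nofollow link as used in the first solution is just an ordinary anchor tag carrying `rel="nofollow"` (the URL here is a placeholder):

```html
<!-- Passes visitor traffic, but tells Google not to pass link equity -->
<a href="https://example.com/my-portfolio" rel="nofollow">Designed by Me</a>
```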
EDIT: You can find actual info from Google here: http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf