How can I acquire data from social media to analyze it using machine learning? - api

I have a project where I'm required to predict future user location so that we can provide him with location specific services as well as collect data from his device that would be used to provide a service for another user etc...
I have already developed an android app that collects some data but as social media is the richest in terms of information, I would like to make use of that. For example, if the user checks in in a restaurant and gives it a good review (on fb for example) then he is likely to go back there. Or if he tweets a negative tweet about a place then he is unlikely to go back there... these are just examples I thought of.
So my main issue is: how do I even get access to that information? I mean it's not like the user is going to send me a copy of every social media activity they have so how do I get it and is that even possible? Because I know fb, twitter and other social medias have security policies so I initially thought it couldn't be done and that only facebook gets access to their users' information to predict their likes and dislikes and show them adds and sponsored posts accordingly but when googling it, I found a lot of tools that claim to be able to provide that sort of data. How did they even acquire it and is it possible for me to do the same?

Facebook, Twitter, etc. have well-documented APIs that may or may not allow you to access the data.
For the APIs, see the official documentation of each, because anything I write here will likely be outdated in a year or two, as their APIs change.
Don't rely on web scraping. The web sites change design more often than the API, and you will likely violate the terms-of-service.

Related

Verification Google OAuth2 concert scren with the apps for personal use only

I recently asked this question and user's #DalmTo and #Sergio NH they gave me an exhaustive answer for which I thank them very much.
Moving forward to question, we started publishing the application, and its verification was not required, since no scope was added (here it is a little unclear why the requests worked in an application with a test mode in which these scope were not added (google drive, google sheet and google ads)).
However, this time the application in the "In Production" mode began to give us an "Unverified app screen" (see Unverified app screen). We decided that we still need to add scope to the list, and, of course, that the scope list (their list is described above) requires verification by Google.
We started filling in the necessary fields, while studying the Google documentation at the same time, and came across the following information (see block Verification process -> What are the requirements for verification?):
Apps not applicable for verification
Apps for internal use only
(single domain use) Apps for personal use only Apps that are Gmail
SMTP plugins for WordPress Apps that are in development or
staging/testing
Apps for personal use only
And this is just our case: we have already received permission from Google Ads and are just generating simple reports that we want to integrate with Google Sheet. I.e., this is an elementary script that works within this account (however, we still need to request the first concert screen, even for this developer account) and cannot be distributed to any other accounts.
But when adding our scope, Google requires us to pass verification, forcing us to fill in the required fields, in the form of domains and their verification via the Search Console (we have already done this and this stage does not cause difficulties) and links to Youtube videos - where we must show how scope is used.
And just this stage is not clear. We do not allow other people's accounts to connect to this application, and the software does not have any interface, it is just a script that receives data from Google Ads and saves it to Google Sheet (creating a file via Google Drive). We have described all this in the scope usage description field. But the link to the Youtube video is require field, and we sincerely do not understand why (considering our case) we should record something, and most importantly, what exactly we should record in this case. If the documentation itself says that in our case we do not even need a verification.
Maybe we did not understand something and now we are doing it wrong? We will be glad to receive any tips from experts working with Google Cloud Console and apologize in advance for broken English.
We also apologize in advance to the StackOverflow community that we have to publish such elementary (which we are absolutely sure of from our side) questions here. We come here from Google Cloud Console - > Support - > Community support, and we must first try to publish posts in the Google Groups specified there, but they simply do not answer us, apparently considering our questions too elementary and not worthy of attention (however, these same questions in Google Groups are moderated) (for example, the previous question). And we are no longer able to contact any other support. Once again, we apologize for having to ask about this here.
It is true that if your app is a single use app then you do not need to be verified.
However if you don't get your app verified then there will be some restrictions.
you will see the unverified app screen
your refresh tokens will probably only be good for two weeks.
In the case of the YouTube api uploaded videos will be suck private.
If you can live with those points then you don't need to verify your app and you can continue as is.
If on the other hand you don't want to see the unverified app screen and you want a refresh token that will last longer then two weeks. You will need to verify your app. Yes, Even if your app is a console application running as a job some where you still show the consent screen. This is the YouTube video you will need to show Google. Show the consent screen popping up show the URL bar and then show your script running. You also need to set up the homepage and privacy policy screens. Yes i 100% agree with you that this is silly.
When you go though the process. Explain to google that this is a single use script running as a job some where.
Unfortunately when Google changed it so that Refresh tokens expire for unverified apps they pretty much tied the hands of all developers who are running such single user scripts. We now have to get our apps verified if we don't want to have to request a new refresh token every two weeks.
If your program needs to access the requested scopes of the Google account privacy, even though the user is yourself, you also need to provide a youtube video to demonstrate how you use this program. The auditor cannot guarantee whether you will make this program public.

Requirements for beginner using twitter api to retweet specific tweets

I am new to twitter api.
I want to search tweets that have 2 specific terms and 1 specific hashtag, and then I want to retweet them in my account for the purpose of consolidating all the tweets.
Do I need to have a developer account?
Should I look to an already existing app (I prefer one that is free or open source), or can I do this with twitter api as a regular user?
Any tutorials or instructions are greatly appreciated. TIA.
I have applied for a developer account, but I don't know how long it will take - I also don't know what the criteria are for being granted one.
I found different kinds of "retweet" applets on ifttt.com - I implemented one of them, and it accomplished what I wanted to achieve, though not perfectly, and there was no documentation to customize functionality, etc.
I couldn't find information anywhere about using twitter API without a developer account, so I applied for that type of account. They emailed approximately 3 times to get more information about my use case, and purposes, intent of use, what I intend to develop, etc. My application was approved within approximately 48 hours.
I will update this answer if there is more information I think might be valuable to share.

Is there a way to register an application on Google+ like on Facebook?

In particular I'm interested in the possibility of getting an App Access Token with no expiration time, exactly as I do with Facebook.
I want to publish on behalf of the user via server, and I found very useful and convenient the Facebook's procedure in which we ask for the user permissions only the first time.
I have been working with this kind of social-networks interaction for merely three weeks, so I will be very happy to hear any type of suggestions or critics.
Google+ does not currently have a public write API. There are selected partners that they work with (such as HootSuite) that provide this feature, but they are making access to it available very slowly. See https://developers.google.com/+/api/pages-signup for further details.
Google+ does have a concept of Moments, which are activities that happen in your app that are reported to Google+ and which the user may later wish to share, or may make available to people in their circles on a limited non-notification basis. This is probably not what you want, but may serve some needs. See https://developers.google.com/+/api/latest/moments for more info and examples how to use it.
Simply, No there is no way to do that in Google+ in current time. In general, apps for Google plus is read only.

Can client side mess with my API?

I have a website that revolves around transactions between two users. Each user needs to agree to the same terms. If I want an API so other websites can implement this into their own website, then I want to make sure that the other websites cannot mess with the process by including more fields in between or things that are irrelevant to my application. Is this possible?
If I was to implement such a thing, I would allow other websites to use tokens/URLs/widgets that would link them to my website. So, for example, website X wants to use my service to agree user A and B on the same terms. Their page will have an embedded form/frame which would be generated from my website and user B will also receive an email with link to my website's page (or a page of website X with a form/frame generated from my server).
Consider how different sites use eBay to enable users to pay. You buy everything on the site but when you are paying, either you are taken to ebay page and come back after payment, or the website has a small form/frame that is directly linked to ebay.
But this is my solution, one way of doing it. Hope this helps.
It depends on how your API is implemented. It takes considerably more work, thought, and engineering to build an API that can literally take any kind of data or to build an API that can take additional, named, key/value pairs as fields.
If you have implemented your API in this manner, then it's quite possible that users of this API could use it to extend functionality or build something slightly different by passing in additional data.
However, if your API is built to where specific values must be passed and these fields are required, then it becomes much more difficult for your API to be used in a manner that differs from what you originally intended.
For example, Google has many different API's for different purposes, and each API has a very specific number of required parameters that a developer must use in order to make a successful HTTP request. While the goal of these API's are to allow developers to extend functionality, they do allow access to only very specific pieces of data.
Lastly, you can use authentication to prevent unauthorized access to your API. The specific implementation details depend largely on the platform you're working with as well as how the API will be used. For instance, if users must login to use services provided by your API, then a form of OAuth may suffice. However, if other servers will consume your API, then the authorization will have to take place in the HTTP headers.
For more information on API best practices, see 7 Rules of Thumb When You Build an API, and a slideshow from a Google Engineer titled How to Design a Good API and Why That Matters.

Design an API for a web service without "selling the farm"?

I'm going to try to phrase this as a generic question.
A company runs a website that has a lot of valuable information on it. This information is queried from an internal private database. So technically, the information in the database is the valuable part.
If this company wished to develop an API that developers could use to access their database of valuable & useful information, what approach should the company take?
It's important to give developers what they need. But it is also important to keep competing websites from essentially using the API to steal everything and essentially steal all traffic from the company's website.
Is there was some way the API could be used in a way that drives traffic back to the original company's website somehow? Something that gives users a reason to keep going there.
This is a design consideration that my company is struggling with that I can imagine other web-based services have come across before.
Institute API keys - don't make it public. Maybe make the signup process more complex than "anyone with an e-mail address".
Rate limit the API based on keys. If you're running more than X requests a minute, you're likely mining the database.
Don't provide a "fetch everything" API. Make the users know something to get information on it. Don't reveal what you know.
I've seen a lot of companies giving out API keys and stating a TOS that all developers must adhere to. For example, any page that uses data from the API must include your logo and a link back to your website. If any developer is found breaking the rules, the API key can be cancelled and your data is safe again.
Who is meant to use the API?
A good general method of solving this problem is to limit access to the data to end users (rather than allow applications or developers at it). Provide applications and users with identification, each, and make sure that to access a subset of the data, a combination of both user and application key is required.
Following this pattern, each user will have access to a very limited subset of the data (presumably, the data that they require for their own specific use), and you can put measures in place to enforce this. Any attempts at data-mining will become obvious.
This type of approach meshes well with capability-type security models on the server side.