Curious about the algorithm behind the personalization feature of the Spotify API

https://developer.spotify.com/web-api/get-users-top-artists-and-tracks/
Get a User’s Top Artists and Tracks
Get the current user’s top artists or tracks based on calculated affinity.
Affinity is a measure of the expected preference a user has for a particular track or artist. It is based on user behavior, including play history, but does not include actions made while in incognito mode. Light or infrequent users of Spotify may not have sufficient play history to generate a full affinity data set.
As a user’s behavior is likely to shift over time, this preference data is available over three time spans. See time_range in the query parameter table for more information.
For each time range, the top 50 tracks and artists are available for each user. In the future, it is likely that this restriction will be relaxed. This data is typically updated once each day for each user.
I am wondering which algorithm is used to implement this feature: collaborative filtering or a content-based algorithm?

Related

How to calculate the similarity between two Redis sorted sets?

I have an application which keeps a redis sorted set of each user's browsing history for a particular type of resource.
Each user has a set containing the ids of resources they've viewed, scored by the number of times they've viewed the resource.
I'm wondering if there is a practical way to work out the sets which are "nearest" or "most similar" to the current user's set. (So that I can use the sets to make content recommendations.)
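One common way to compare such sets is cosine similarity over the view counts. The following is a minimal sketch, not a Redis-specific solution: it assumes each sorted set has already been read into a plain dict of resource id to score (with redis-py, something like zrange(key, 0, -1, withscores=True) returns the member/score pairs). The function names cosine_similarity and most_similar are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {resource_id: view_count} mappings."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty history is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(current, others, top_n=3):
    """Rank other users' histories by similarity to the current user's."""
    scored = [(cosine_similarity(current, hist), uid) for uid, hist in others.items()]
    scored.sort(reverse=True)
    return [(uid, score) for score, uid in scored[:top_n]]
```

Comparing against every other user is linear in the number of users, so for a large user base this would probably need to run offline or against a sampled candidate set.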

How to get number of Instagram followers on a specified date like minter.io does?

From the picture, you can see how follower statistics look on minter.io
The only way I can imagine counting the change in followers: I would download the list of all the followers every day through the Instagram API into my DB. Once I have this history, I can calculate any change.
But on minter.io you get such a graph a few minutes after registration... How?
They are probably storing this information on a daily basis and hence are able to keep a historical trend.
If you go to the minter.io website, they mention at the bottom that they have collected data for close to 198 million accounts. I guess you were one of those.
You don't need to get the list of all followers just to show the absolute change in the numbers. The Instagram API gives that directly when you query any of the endpoints giving user information.
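As a rough illustration of that point, storing only one follower count per day is enough to derive the day-over-day change. This sketch assumes the counts have already been fetched and saved; the function name daily_changes and the snapshot layout are hypothetical, and no Instagram API call is shown.

```python
from datetime import date


def daily_changes(snapshots):
    """Given {date: follower_count}, return [(date, change_since_previous_snapshot)]."""
    days = sorted(snapshots)
    return [(cur, snapshots[cur] - snapshots[prev]) for prev, cur in zip(days, days[1:])]


history = {
    date(2016, 1, 1): 1000,
    date(2016, 1, 2): 1012,
    date(2016, 1, 3): 1007,
}
changes = daily_changes(history)
```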
I know how it works at smartmetrics.co.
Smartmetrics collects information about all followers of tracked accounts and builds a history from this data. So if you followed someone who is already tracked, you can get history for your account.
But Minter draws a fake linear graph, according to some tests: How to Get Historical Data from Instagram API
Crowdbabble and Minter re-use Twitter tokens, which allows them to collect data on millions of accounts. This gives you the historical data that you want -- change in followers over time. As an individual, you aren't able to access the Twitter API and aggregate data like that for storage as easily. You don't have thousands of people giving you tokens that you can then scrape and store on a regular basis.
Crowdbabble has a free 14 day trial with no payment info required. If you don't want in-depth analytics, Twittercounter will give you your follower numbers over the past 30 days -- you can view each day separately.

Apply pattern recognition to user authentication for malicious attempts

To strengthen the authentication mechanism (web), I would like to log a user fingerprint for every attempt and apply pattern recognition to distinguish malicious attempts. For example if the user always logs in from European computers and there is an attempt made from China, the user is blocked until the user confirms (via email, for example) to allow logins from China.
I have a very, very small knowledge of pattern recognition from a university course. However, I cannot recall enough to start developing this service. What I know is that you should look at these various features:
Browser agent string, resulting in:
Operating system
Browser vendor
IP address, resulting in:
Location
Time stamp of login
Number of (failed) attempts (within session, or total)
You search for patterns and any extraordinary attempt is marked because it does not follow the average pattern. You probably will apply a threshold, so if a user logs in at night or has a new PC, it still works.
There are also a few requirements: first, the check of an attempt must be made in real time. You cannot block access after 2 minutes if the credentials were OK but you found out later on that the attempt could have been malicious. Furthermore, all our apps are written in PHP, but PHP is probably too slow for this. I would prefer to use Python then, but in that case a binding to Python is also required.
So the question is: where to start? What is the best approach to accomplish this? I can log all data in a key-value store like Redis or a document store like Mongo. How would I design a service that can validate a new attempt with certain features against a bulk of known other attempts, and return whether the attempt matches the average in a timely fashion, say 250ms?
What you want to do is called anomaly detection; Wikipedia is a good place to start. As a first stab, you might want to try clustering:
You will need a data set. The good news is clustering is unsupervised, so you will not have to mark up a ton of login attempts as regular or malicious.
For a given user, keep a history of their past N logins (big brother warning!) and features of those logins. The features you have listed are a good start, but you can think of more.
Apply a clustering algorithm to estimate what the average login is like. For every new attempt you can calculate the distance from the average and decide if it looks malicious or not.
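The "distance from the average" step could be sketched as follows. This is a deliberate simplification rather than a full clustering algorithm: it treats each user's history as a single cluster, and scores a new attempt by its standardized Euclidean distance from that cluster's mean. The feature layout (login hour, failed attempts), the function names, and the threshold of 3.0 are all assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def fit_profile(history, min_std=1.0):
    """Summarize past logins (equal-length numeric feature vectors) as per-feature mean and spread."""
    columns = list(zip(*history))
    means = [mean(c) for c in columns]
    # Floor the spread so a nearly constant feature does not blow up the score.
    stds = [max(pstdev(c), min_std) for c in columns]
    return means, stds


def anomaly_score(profile, attempt):
    """Standardized Euclidean distance of one attempt from the user's average login."""
    means, stds = profile
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(attempt, means, stds)))


def is_suspicious(profile, attempt, threshold=3.0):
    return anomaly_score(profile, attempt) > threshold


# Each vector is [login_hour, failed_attempts]; values are made up.
profile = fit_profile([[9, 0], [10, 0], [11, 1], [10, 0]])
```

Scoring one attempt is a handful of arithmetic operations once the profile is cached, so the 250ms budget is mostly a matter of how quickly the profile can be loaded. Note that treating the login hour as a plain number ignores that 23:00 and 01:00 are close; a real feature set would need to handle that.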
As a side note, you can go a long way without learning. My intuition is that the location of the login and the number of failed attempts will get you most of the way there. A simple if-else might be good enough.
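Such an if-else might look like the sketch below. The function name check_login, the three return values, and the limit of 5 failed attempts are hypothetical choices, and resolving an IP address to a country is assumed to happen elsewhere.

```python
def check_login(known_countries, country, failed_attempts, max_failed=5):
    """Return 'allow', 'confirm' (e.g. ask via email), or 'block' for a login attempt."""
    if failed_attempts >= max_failed:
        return "block"
    if country not in known_countries:
        # New location: hold the login until the user confirms it.
        return "confirm"
    return "allow"
```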

Filtering Foursquare Venue Results

I am currently evaluating several different APIs in order to get venue information. A key component of any provider is the ability to not just return all venues nearby but tailor the list based on previously entered user preferences.
Foursquare does not allow 'munging' their venue data with other data, like Google's places to create an aggregated service. But can I take Foursquare's venues for a given area, apply some filtering based on user preferences and recommendation engine techniques, and present a modified, personalized version of their information? Do they frown on only using their venue info as a jumping off point, even if attribution on the final results is given?
This customization would be above and beyond using retailer categories, something that can be included in the facebook request. Asking because other services require results presented exactly as returned from the API, including ads.
First, check out the policies at https://developer.foursquare.com/overview/community
We welcome you to use foursquare as your location database. You can associate additional content with our venue data in your system, but you may not combine our database with another database or export it on your own.
I think that they even encourage you to manipulate the data and create creative solutions with it, as long as you do not break their ground rule of not merging it with another database (see the full text at the link).
The API even lets you filter the results according to your needs with the categoryId and intent parameters. For example, in our app we filter out places where fewer than 2 unique people have checked in, because we assume they are fake places. We do other filtering on the result set as well, but we display only data from the foursquare venues database, and we attribute it.
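A client-side filter of that kind could be as small as the sketch below. It assumes the venues have already been fetched and flattened into dicts; the key unique_visitors is a hypothetical name, not a field taken from the Foursquare response format.

```python
def filter_venues(venues, min_unique_visitors=2):
    """Keep only venues where at least min_unique_visitors distinct people checked in."""
    return [v for v in venues if v.get("unique_visitors", 0) >= min_unique_visitors]


venues = [
    {"name": "Cafe A", "unique_visitors": 40},
    {"name": "My Couch", "unique_visitors": 1},
    {"name": "Unknown Place"},
]
kept = filter_venues(venues)
```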

Store total online users in RRDtool given login/logout, checkin/checkout logs

Given a log file with clear check-in and check-out messages per user, how can I feed this data into RRDtool to track the total users logged in to the site? (At this time, I do not care about unique users, but that would be nice too of course!)
I read about the DERIVE data source type. How do I get a hypothetical INTEGRAL type instead? Can this be done straightforwardly?
Of course, I could process the file myself, updating a GAUGE in the RRD and saving state somehow; however, I am hoping to avoid that if I can get away with it.
Since rrdtool deals in 'rates', it cannot really count the way you would like it to. The best you could do is use an ABSOLUTE type data source, but this would only work for creating something like a login/logout rate statistic: you could answer how many logins and how many logouts you had per hour.
You would then do an rrdtool update login.rrd N:1 whenever you have a login.
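For comparison, the approach the question hoped to avoid (processing the log and feeding a running total into a GAUGE data source) might be sketched like this. The log format of "epoch action user" per line, the file name online.rrd, and both function names are assumptions; the sketch also assumes the log is in chronological order and only builds the rrdtool update command strings without running them.

```python
def online_counts(lines):
    """Turn 'epoch checkin|checkout user' lines into [(epoch, users_online)] samples."""
    online = set()
    samples = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that are not check-in/check-out messages
        ts, action, user = parts
        if action == "checkin":
            online.add(user)
        elif action == "checkout":
            online.discard(user)
        else:
            continue
        # Keep only the last count per second, since rrdtool wants increasing timestamps.
        samples[int(ts)] = len(online)
    return sorted(samples.items())


def rrd_update_commands(samples, rrd_file="online.rrd"):
    """Format one 'rrdtool update' command per sample for a GAUGE data source."""
    return [f"rrdtool update {rrd_file} {ts}:{count}" for ts, count in samples]


log = ["100 checkin alice", "100 checkin bob", "160 checkout alice", "200 checkin carol"]
samples = online_counts(log)
commands = rrd_update_commands(samples)
```

Because the set holds user names, the count is of distinct users currently online, which also covers the "unique users" wish from the question.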