Automatic music rating based on listening habits - automation

I've created a Winamp-like music player in Delphi. Not so complex, of course. Just a simple one.
But now I would like to add a more complex feature: Songs in the library should be automatically rated based on the user's listening habits.
This means: The application should "understand" if the user likes a song or not. And not only whether he/she likes it but also how much.
My approach so far (data which could be used):
Simply measure how often a song was played per unit of time. Start counting from the moment the song was added to the library so that recently added songs aren't at a disadvantage.
Measure how long a song was played on average (minutes).
Starting a song but switching directly to another one should hurt the rating, since the user didn't seem to like the song.
...
Could you please help me with this problem? I would just like to have some ideas. I don't need the implementation in Delphi.

I would track all of your users' listening habits in a central database, so you can also make recommendations based on what other people like ("people who liked this song also liked these other songs").
Some other metrics to consider (a sketch combining a few of these follows the list):
proportion of times that the song was immediately replayed (ex. this song was immediately replayed 12% of the times it was played)
did they turn on the "repeat this song" button during play?
times played per hour, day, week, month
proportion of times this song was skipped. (ex. this song was played, but immediately skipped 99% of the time)
proportion of song listened to (the user listened to 50% of this song on average, versus 100% of some other song)
also:
listen in on the user's microphone. do they sing along? :D
at what volume do they play the song? do they crank it up?
Put in a "recommend this song to friends" button (that emails song title to friend or something). Songs they recommend, they probably like.
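For concreteness, here is one possible way to fold several of these proportions into a single rating. This is only a sketch: the weights are arbitrary placeholders and the input names are invented, so treat it as the shape of a solution rather than a tuned model.

def rating(replay_rate, repeat_used_rate, skip_rate, avg_fraction_listened):
    # All inputs are per-song proportions in [0, 1].
    return (2.0 * replay_rate            # immediate replays count strongly
            + 1.0 * repeat_used_rate     # turning on "repeat this song"
            + 1.5 * avg_fraction_listened
            - 3.0 * skip_rate)           # skips pull the rating down hard

You would tune the weights against real listening data; the point is only that each metric can enter the score independently.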
You might want to do some feature extraction on the audio stream, and find similar songs. This is hard, but you can read more about it here:
"Automatic Feature Extraction for Classifying Audio Data "
Link
"Understandable models Of music collections based on exhaustive feature generation with temporal statistics"
http://portal.acm.org/citation.cfm?id=1150523
"Collaborative Use of Features in a Distributed System for the Organization of Music Collections"
http://www.idea-group.com/Bookstore/Chapter.aspx?TitleId=24432

Measure how long a song was played on average (minutes).
I don't think this is a good metric, because a long song would gain an unfair advantage over a short song. You should use a percentage instead:
avg. time played / total song length

Please let the rating degrade over time. You seem to like songs better if you have heard them often during the last n days, while older songs should only get a casual mention: you still like them, but have probably heard them far too often.
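One way to implement this decay is to weight each play by its age with an exponential half-life. A sketch; the 30-day half-life is an arbitrary assumption:

import time

HALF_LIFE_DAYS = 30.0  # assumed; tune to taste

def play_weight(play_timestamp, now=None):
    # 1.0 for a play right now, 0.5 after 30 days, 0.25 after 60 days, ...
    now = now if now is not None else time.time()
    age_days = (now - play_timestamp) / 86400.0
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def recency_score(play_timestamps):
    return sum(play_weight(t) for t in play_timestamps)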
Last but not least, you could add beat detection (and maybe frequency-spectrum analysis) to find similar songs, which could give you more data than what the user provides just by listening to the songs.
I would also group songs having the same MP3 ID tag, since this also gives a hint about what the user is currently in the mood for. And if you want to provide an autoplay function, it would also help: after hearing a great Goa song, switching to Punk is strange, even if I like songs from both worlds.

Concerning your additional metrics: Shouldn't one combine metric #4 and metric #5? If a song is immediately skipped, then the proportion listened to is just 1% or so, right? – marco92w May 21 at 15:08
These should be separate. Skipping should result in negative rating for the song that was skipped. However, if the user closes the application when a song begins, you should not consider it as negative rating, even though only a low percentage of the song was played.

(ListenPartCount * (ListenFullCount ^ 2)) + (AverageTotalListenTime * ListenPartTimeAverage)
--------------------------------------------------------------------------------------------
((AverageTotalListenTime - ListenPartTimeAverage) + 0.0001f)
This formula produces a nice result: a user could really like just part of a song, and that should show up in the score; and if the user likes the full song, the weight should be doubled. You can tweak this formula in various ways, e.g. by including the user's listening sequence: if the user listens to one song and right afterwards listens to another song a few times, etc.
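Transcribed into Python so the terms are unambiguous (the variable names come straight from the formula above; this is only a restatement of it, not a tuned model):

def song_score(listen_part_count, listen_full_count,
               average_total_listen_time, listen_part_time_average):
    numerator = (listen_part_count * listen_full_count ** 2
                 + average_total_listen_time * listen_part_time_average)
    # The tiny constant keeps the denominator from being zero when the
    # average partial listen time equals the average total listen time.
    denominator = (average_total_listen_time - listen_part_time_average) + 0.0001
    return numerator / denominator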

Use the date the song was added to the library as a starting point.
Measure how often the song/genre/artist/album is played (fully, in part, or skipped) - this will also allow you to measure how often a song/genre/artist/album is not played.
Come up with a weighting based on these parameters: when a song, its genre, artist or album has not been played frequently, it should rank poorly. When an artist is played every day, songs from this artist should get a boost; but if one of the artist's songs is never played, that song should still rank pretty low. A sketch of such a weighting follows.
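A minimal sketch, assuming plays are counted since the library-add date; the blend weights are invented for illustration:

def combined_rank(song_plays, artist_plays, album_plays, genre_plays,
                  days_in_library):
    # Normalise every count by the song's age so new songs aren't penalised.
    per_day = lambda n: n / max(days_in_library, 1)
    return (0.6 * per_day(song_plays)      # the song's own popularity dominates
            + 0.2 * per_day(artist_plays)  # a much-played artist gives a boost
            + 0.1 * per_day(album_plays)
            + 0.1 * per_day(genre_plays))

Because the song's own plays carry the largest weight, a never-played song by a daily-played artist gets a boost but still ranks below the artist's actually-played songs, as described.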

Simply measure how often a song was played per time.
Often, I go to play a particular song, and then just let my iPod run until the end of an album. So this method would give an unfair advantage to songs late in an album. Something you might want to compensate for if your music player works the same way.

What about applying artificial intelligence to this problem?
Well! Let me say that, starting from scratch, it could be really fun to use
a network of clients, each with its own "intelligence", and finally collect
the client results in a central "intelligence".
Each client could produce its own "user ratings" based on the user's habits
(as already said: average listening time, listen count, etc...).
Then a central "intelligent" collector could merge the individual ratings into "global ratings",
showing trends, suggestions and whatever high-level rating you need.
Anyway, to train such a "brain" you first have to solve the problem analytically, but it really could be fun to build such a cloud of interconnected small brains to produce a higher-level "intelligence".
As usual, since I don't know your skills, take a look at neural networks, genetic algorithms, fuzzy logic, pattern recognition and similar topics for a deeper understanding.

You can use some simple function like:
listened_time_of_song/(length_of_song + 15s)
or
listened_time_of_song/(length_of_song * 1.1)
The idea is that a song stopped after only 15 seconds gets a score close to zero; the second variant may be even better (the length of the song has no effect on the final score if the user listened to the whole song).
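Both variants, written out (the constants are the ones above; a full listen scores just under 1.0, an early stop scores near zero):

def score_additive(listened_time_of_song, length_of_song):
    return listened_time_of_song / (length_of_song + 15.0)

def score_multiplicative(listened_time_of_song, length_of_song):
    return listened_time_of_song / (length_of_song * 1.1)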
Another way may be to use neural networks, if you are familiar with that subject.

Related

Text Analysis Services/Libraries

Right now I'm getting tweets from Twitter Streaming API and doing some semantic analysis. Basically I want to extract tweets about people leaving home and going somewhere (a place) for a vacation or business trip or some related matters, so I can recommend them a weather app that can show the real-time weather of the place they're going to.
Now I'm using key words like: going to, heading for, leaving for, trip and travel for the streaming API and then I feed the filtered tweets to AYLIEN to do semantic labeling. I'm currently using labels: trip, travel, vacation and holiday. As long as any of those labels has a score higher than 0.01, I consider the corresponding tweet as the one I want.
But I found AYLIEN is not very satisfying for my task, I still get a lot of tweets such as I am leaving for a while, Babe is leaving for 2 weeks and I'm leaving for work, which are not what I want.
So I want to know if anybody knows some decent text analysis services or libraries that can help me achieve my goal. Thanks.

Create winnable "random" solitaire shuffles

I've created a solitaire game for Mac, but people keep complaining that there aren't enough "winning" shuffles. Players have a win rate of about 5%-10%, whereas their usual win rate is around 50%.
Right now, I'm creating an array with all the cards in the deck and then shuffling that array using the Fisher-Yates method.
So my question is... is there any way that I can "check" for a winning solitaire shuffle, so I can bump up the numbers of winning solitaire shuffles I'm dealing to people?
Do you have a backend for your app? If so, you can store all successful (winning) shuffled arrays on the server from all users and increase the win rate by serving those known wins to other users with some frequency.
I read that for some sort of these games, there is no more efficient way than to do brute force checking of all possible moves.
My suggestion for situations like this is to start from the completed state (end of game), and move backwards randomly to create a random start state.
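To illustrate the backwards idea, here is a sketch on a toy patience variant (real Klondike has face-down cards and a stock, so its inverse moves are more involved; this only shows why the trick guarantees winnability). Forward rule in the toy game: the top card of any tableau pile may go to its foundation if it is the next rank up. The inverse is popping a foundation's top card onto any pile; since every backward step inverts a legal forward move, replaying the steps in reverse order wins the game:

import random

SUITS = "SHDC"
N_PILES = 7

def winnable_deal(steps=40):
    # Start from the solved state: every foundation holds ace..king.
    foundations = {s: list(range(1, 14)) for s in SUITS}
    piles = [[] for _ in range(N_PILES)]
    for _ in range(steps):
        nonempty = [s for s in SUITS if foundations[s]]
        if not nonempty:
            break
        suit = random.choice(nonempty)
        card = foundations[suit].pop()      # inverse of "send card to foundation"
        random.choice(piles).append((card, suit))
    return piles  # a starting deal guaranteed to be solvable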
I run a site with thousands of solitaire games played a day. We tried to build an AI that would let us determine which hands were winnable. It turns out, as many of the responses say, that there are so many permutations of a solitaire game that an AI to detect winnable versus non-winnable games is extremely difficult to build and requires significant processing power. We weren't able to succeed in building it.
Now, we simply leverage our user data to find which games are winnable, and offer that as an option for players who want to play only winnable games. Otherwise, they can choose a random game. I would recommend trying something similar!

Using Redis for "trending now" functionality

I'm working on a very high-throughput site with many items and am looking into implementing "trending now" functionality: a prioritized list of the top N items that have been viewed recently by many people, which gradually fade away as they get fewer views.
One idea is to give more weight to recent views of an item: something like a weight of 16 for every view within the past 15 minutes, a weight of 8 for every view in the past hour, a weight of 4 for views in the past 4 hours, and so on. I don't know if this is the right way to approach it, though.
I'd like to do this in Redis; we've had good success with Redis in the past for other projects.
What is the best way to do this, both technologically and the determination of what is trending?
The first answer hints at a solution but I'm looking for more detail -- starting a bounty.
These are both decent ideas, but not quite detailed enough. One got half the bounty, but I'm leaving the question open.
So, I would start with a basic time ordering (a zset of item_ids scored by timestamp, for example), and then float things up based on interactions. You might decide that a single interaction is worth 10 minutes of "freshness", so each interaction adds that much time to the score of the relevant item. If all interactions are valued equally, you can do this with one zset and just increment the scores as interactions occur.
If you want some kind of back-off, say, scoring by the square root of the interaction count instead of the count directly, you could build a second zset with your interaction score and use zunionstore to combine it with your timestamp index. For this you'll probably want to pull out the existing score, do some math on it and write a new score over it (zadd will let you overwrite a score).
The zunionstore is potentially expensive, and for sufficiently large sets even the zadd/zincrby calls get expensive. To this end, you might want to keep only the N highest-scoring items, for N = 10,000 say, depending on your application's needs. A sketch of the simple variant follows.
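A minimal redis-py sketch of the single-zset variant, assuming the 10-minutes-per-interaction rule (the key name and constants are placeholders):

import time
import redis

r = redis.Redis()
FRESHNESS = 600        # seconds of freshness one interaction is worth
MAX_TRACKED = 10000    # keep only the N highest-scoring items

def record_view(item_id):
    # Seed the item with the current timestamp on first sight (nx=True
    # means "don't overwrite an existing score"), then add freshness.
    r.zadd("trending", {item_id: time.time()}, nx=True)
    r.zincrby("trending", FRESHNESS, item_id)
    # Trim from the low end so the zset stays bounded.
    r.zremrangebyrank("trending", 0, -MAX_TRACKED - 1)

def trending_now(n=10):
    return r.zrevrange("trending", 0, n - 1, withscores=True)

The square-root back-off variant would keep the raw counts in a second zset and periodically zunionstore it with the timestamp index, as described above.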
These two links are very helpful:
http://stdout.heyzap.com/2013/04/08/surfacing-interesting-content/
http://word.bitly.com/post/41284219720/forget-table
The Reddit ranking algorithm does a pretty good job of what you describe. There's a good write-up here that talks through how it works:
https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9
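As described in that write-up, Reddit's open-sourced "hot" ranking boils down to a log-weighted vote score plus a time term (shown here in Python; the 45000-second divisor and the 2005 epoch come from Reddit's code):

from datetime import datetime, timezone
from math import log10

REDDIT_EPOCH = datetime(2005, 12, 8, 7, 46, 43, tzinfo=timezone.utc)

def hot(ups, downs, date):
    s = ups - downs
    order = log10(max(abs(s), 1))             # every 10x of score adds 1
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = (date - REDDIT_EPOCH).total_seconds()
    return round(sign * order + seconds / 45000, 7)

The effect is the freshness trade-off you want: an item posted 12.5 hours later outranks an older one unless the older item has ten times the votes.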
Consider an ordered set with the number of views as the scores. Whenever an item is accessed, increment its score (http://redis.io/commands/zincrby). This way you can get the top items out of the set, ordered by score.
You will need to "fade" the items too, maybe with an external process that periodically decrements the scores; a possible shape for that process is sketched below.
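A sketch of that fade process; the key name, decay factor and interval are assumptions for illustration:

import time
import redis

r = redis.Redis()

def fade_forever(key="views", factor=0.5, interval=3600):
    # Every hour, halve every score; items decay unless views keep coming.
    while True:
        time.sleep(interval)
        for item, score in r.zrange(key, 0, -1, withscores=True):
            new_score = score * factor
            if new_score < 1:
                r.zrem(key, item)          # drop items that have faded out
            else:
                r.zadd(key, {item: new_score})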

I am looking for a radio advertising scheduling algorithm / example / experience

Tried doing a bit of research on the following with no luck. Thought I'd ask here in case someone has come across it before.
I help a volunteer-run radio station with their technology needs. One of the main things that have come up is they would like to schedule their advertising programmatically.
There are a lot of neat and complex rule engines out there for advertising, but all we need is something pretty simple (along with any experience that's worth thinking about).
I would like to write something in SQL if possible to deal with these entities. Ideally if someone has written something like this for other advertising mediums (web, etc.,) it would be really helpful.
Entities:
Ads (consisting of a category, # of plays per day, start date, end date or permanent play)
Ad Category (Restaurant, Health, Food store, etc.)
To over-simplify the problem, this will be an elegant SQL statement. Getting there... :)
I would like to be able to generate a playlist per day using the above two entities where:
No two ads in the same category are played within x number of ads of each other.
(nice to have) high promotion ads can be pushed
At this time, there are no "ad slots" to fill. There are no "time of day" considerations.
We queue up the ads for the day and go through them between songs/shows, etc. We know how many per hour we have to fill, etc.
Any thoughts/ideas/links/examples? I'm going to keep on looking and hopefully come across something instead of learning it the long way.
Very interesting question, SMO. Right now it looks like a constraint programming problem, because you aren't looking for an optimal solution, just one that satisfies all the constraints you have specified. In response to those who wanted to close the question, I'd say they need to check out constraint programming a bit. It's far closer to Stack Overflow than any operations research site.
Look into constraint programming and scheduling - I'll bet you'll find an analogous problem toot sweet !
Keep us posted on your progress, please.
Ignoring the T-SQL request for the moment since that's unlikely to be the best language to write this in ...
One of my favorite approaches to tough 'layout' problems like this is simulated annealing. It's a good approach because you don't need to think HOW to solve the actual problem: all you define is a measure of how good the current layout is (a score, if you will), and then you allow random changes that either increase or decrease that score. Over many iterations you gradually reduce the probability of moving to a worse score. This 'simulated annealing' approach reduces the probability of getting stuck in a local minimum.
So in your case the scoring function for a given layout might be based on the distance to the next advert in the same category and the distance to another advert of the same series. If you later have time of day considerations you can easily add them to the score function.
Initially you allocate the adverts sequentially, evenly or randomly within their time windows (it doesn't really matter which). Now you pick two slots and consider what happens to the score when you switch the contents of those two slots. If either advert moves out of its allowed range, you can reject the change immediately. If both are still in range, does the swap move you to a better overall score? Initially you accept changes randomly even if they make the score worse, but over time you reduce the probability of that happening, so that by the end you are moving monotonically towards a better score.
It's easy to implement, easy to add new 'rules' that affect the score, and easy to adjust the run time to accept a 'good enough' answer. A minimal sketch follows.
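A sketch of that loop for the ad problem, assuming the Ad shape, penalty rule, and cooling schedule shown here (all illustrative, not a tuned scheduler):

import math
import random
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    category: str

MIN_GAP = 3  # no two same-category ads within 3 slots of each other

def score(playlist):
    # Lower is better: count same-category pairs that are too close.
    penalty = 0
    for i, ad in enumerate(playlist):
        for j in range(i + 1, min(i + MIN_GAP + 1, len(playlist))):
            if playlist[j].category == ad.category:
                penalty += 1
    return penalty

def anneal(playlist, steps=20000, start_temp=2.0):
    current = score(playlist)
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-6
        i, j = random.sample(range(len(playlist)), 2)
        playlist[i], playlist[j] = playlist[j], playlist[i]
        new = score(playlist)
        # Always accept improvements; accept worse layouts with a
        # probability that shrinks as the temperature drops.
        if new <= current or random.random() < math.exp((current - new) / temp):
            current = new
        else:
            playlist[i], playlist[j] = playlist[j], playlist[i]  # undo swap
    return playlist, current

Range constraints (an ad's start/end dates) would be checked right after the swap, rejecting immediately as described.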
Another approach would be to use a genetic algorithm; see this similar question: Best Fit Scheduling Algorithm. It is likely harder to program but will probably converge more quickly on a good answer.

Implementation of achievement systems in modern, complex games

Many games that are created these days come with their own achievement system that rewards players/users for accomplishing certain tasks. The badges system here on stackoverflow is exactly the same.
There are some problems though for which I couldn't figure out good solutions.
Achievement systems have to watch out for certain events all the time. Think of a game that offers 20 to 30 achievements for, e.g., combat: the server would have to check for these events (e.g. the player avoided x attacks of the opponent in this battle, or the player walked x miles) all the time.
How can a server handle this large amount of operations without slowing down and maybe even crashing?
Achievement systems usually need data that is only used in the core engine of the game and wouldn't be needed outside of it if it weren't for those nasty achievements (think of, e.g., how often the player jumped during each fight; you don't want to store all this information in a database). What I mean is that in some cases the only way to add an achievement would be to add the code that checks its current state to the game core, and that's usually a very bad idea.
How do achievement systems interact with the core of the game that holds the later unnecessary information? (see examples above)
How are they separated from the core of the game?
My examples may seem "harmless" but think of the 1000+ achievements currently available in World of Warcraft and the many, many players online at the same time, for example.
Achievement systems are really just a form of logging. For a system like this, publish/subscribe is a good approach. In this case, players publish information about themselves, and interested software components (that handle individual achievements) can subscribe. This allows you to watch public values with specialised logging code, without affecting any core game logic.
Take your 'player walked x miles' example. I would implement the distance walked as a field in the player object, since this is a simple value to increment and does not require increasing space over time. An achievement that rewards players that walk 10 miles is then a subscriber of that field. If there were many players then it would make sense to aggregate this value with one or more intermediate broker levels. For example, if 1 million players exist in the game, then you might aggregate the values with 1000 brokers, each responsible for tracking 1000 individual players. The achievement then subscribes to these brokers, rather than to all the players directly. Of course, the optimal hierarchy and number of subscribers is implementation-specific.
In the case of your fight example, players could publish details of their last fight in exactly the same way. An achievement that monitors jumping in fights would subscribe to this info, and check the number of jumps. Since no historical state is required, this does not grow with time either. Again, no core code need be modified; you only need to be able to access some values.
Note also that most rewards do not need to be instantaneous. This allows you some leeway in managing your traffic. In the previous example, you might not update the broker's published distance travelled until a player has walked a total of one more mile, or a day has passed since last update (incrementing internally until then). This is really just a form of caching; the exact parameters will depend on your problem.
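A bare-bones sketch of that structure (the EventBus, field name and achievement are invented for illustration; a real system would put the broker hierarchy described above between players and subscribers):

from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, field, callback):
        self._subs[field].append(callback)

    def publish(self, field, player, value):
        for cb in self._subs[field]:
            cb(player, value)

bus = EventBus()

class Player:
    def __init__(self, name):
        self.name = name
        self.miles_walked = 0.0

    def walk(self, miles):
        # Core game logic only updates the field and publishes it;
        # it knows nothing about achievements.
        self.miles_walked += miles
        bus.publish("miles_walked", self, self.miles_walked)

_awarded = set()

def ten_mile_achievement(player, miles):
    if miles >= 10 and player.name not in _awarded:
        _awarded.add(player.name)   # award only once per player
        print(player.name, "unlocked: Walked 10 miles!")

bus.subscribe("miles_walked", ten_mile_achievement)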
You can even do this if you don't have access to the source, for example in videogame emulators. A simple memory-scan tool can be written to find the displayed score, for example. Once you have that, your achievement system is as easy as polling that memory location every frame and seeing whether the current "score" (or whatever) is higher than the highest score so far. The cool thing about videogame emulators is that memory locations are deterministic (there is no operating system).
There are two ways this is done in normal games.
Offline games: nothing as complex as pub/sub - that's massive overkill. Instead you just use a big map/dictionary and log named "events". Then every X frames, or Y seconds (or, usually, "every time something dies, and once at end of level"), you iterate across the achievements and do a quick check. When the designers want a new event logged, it's trivial for a programmer to add a line of code to record it. A sketch of this shape appears below.
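A tiny sketch of the event-map approach (event names and achievements are invented):

from collections import Counter

events = Counter()            # the big map of named event counts

def log_event(name, amount=1):
    events[name] += amount     # the one line a programmer adds per new event

# Achievements are just (name, predicate) pairs checked periodically.
ACHIEVEMENTS = [
    ("Goomba Stomper", lambda e: e["enemy_killed"] >= 100),
    ("Frequent Flyer", lambda e: e["jumped"] >= 500),
]
awarded = set()

def check_achievements():
    # Called every X frames / when something dies / at end of level.
    for name, predicate in ACHIEVEMENTS:
        if name not in awarded and predicate(events):
            awarded.add(name)
            print("Achievement unlocked:", name)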
NB: pub/sub is a poor fit for this IME because the designers never want "when player.distance = 50". What they actually want is "when player's distance as perceived by someone watching the screen seems to have travelled past the first village, or at least 4 screen widths to the right" -- i.e. far more vague and abstract than a simple counter.
In practice, that means that the logic goes at the point where the change happens (before the event is even published), which is a poor way to use pub/sub. There are some game engines that make it easier to do a "logic goes at the point of receipt" (the "sub" part), but they're not the majority, IME.
Online games: almost identical, except you store "counters" (ints that only go up), and usually also "deltas" (circular buffers of what's happened frame to frame) and "events" (complex things that happened in game, hard-coded into a single ID plus a fixed-size array of parameters). These are then exposed via e.g. SNMP for other servers to collect at low CPU cost and asynchronously.
i.e. almost the same as 1 above, except that you're careful to do two things:
Fixed-size memory usage; and if the "reading" servers go offline for a while, achievements won in that time will need to be re-won (although you can usually have a customer support person go through the main system logs manually, work out that the achievement "probably" was won, and award it by hand)
Very low overhead; SNMP is a good standard for this, and most teams I know end up using it
If your game architecture is event-driven, then you can implement the achievement system using finite-state machines, as sketched below.
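For example, an achievement can be a small state machine fed by game events (a sketch; the event names and threshold are invented):

class KillStreakAchievement:
    # States: counting -> unlocked; a death resets the streak counter.
    def __init__(self, required=5):
        self.required = required
        self.streak = 0
        self.unlocked = False

    def on_event(self, event):
        if self.unlocked:
            return
        if event == "enemy_killed":
            self.streak += 1
            if self.streak >= self.required:
                self.unlocked = True
                print("Achievement unlocked: Killing Spree!")
        elif event == "player_died":
            self.streak = 0

Each game event is dispatched to every active achievement machine; once a machine reaches its terminal state it can be discarded.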