We have a model in our Rails app whose objects are assigned a score based on positive user actions. We'll call them products for simplicity's sake. If a user likes, buys, or views a product, the score is incremented with various weights (a like might be worth more than a view, two views in the span of 30 seconds might be worth more than three views spread over an hour, etc.).
We'd like to use these scores to help sort and rank products, say for a popular products list, but using the raw score is going to unevenly favor older products, since they've had more time to amass a higher score.
My question is how to normalize the scores between new and old products. I thought about dividing the product's score by a unit of time, say the number of days it's been in existence, but I'm worried that will cut down the older products too much. Any thoughts on the best way to fairly normalize the scores between the old and new products?
I'm also considering an example of a bayesian rating system I found in another question:
rating = ((avg_num_votes * avg_rating) + (product_num_votes * product_rating)) / (avg_num_votes + product_num_votes)
Where the avg numbers are calculated by looking at the scores across all products that have more than one vote (or in our case, a positive action). This might not be the best way, because we don't have a negative rating in our system and it doesn't take time into consideration at all.
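As a minimal sketch of the formula above in Python (the function name and the sample numbers are hypothetical, and the global averages are assumed to be computed elsewhere):

```python
def bayesian_rating(product_num_votes, product_rating, avg_num_votes, avg_rating):
    """Shrink a product's rating toward the global average.

    Products with few votes stay close to avg_rating; products with
    many votes stay close to their own rating.
    """
    total_votes = avg_num_votes + product_num_votes
    if total_votes == 0:
        # Assumption: with no votes anywhere, fall back to the global average.
        return avg_rating
    return (avg_num_votes * avg_rating
            + product_num_votes * product_rating) / total_votes


# Hypothetical numbers: the average product has 50 votes and a rating of 3.0.
new_product = bayesian_rating(2, 5.0, 50, 3.0)    # 2 votes, all very positive
old_product = bayesian_rating(500, 4.0, 50, 3.0)  # 500 votes, rating 4.0
```

With these made-up inputs the new product ends up only slightly above the global average, while the old product stays close to its own rating. As noted above, the age of the product plays no part in this formula.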
Your question reminds me of the concept of exponential discounting of cash flows in finance.
The concept is the following: $100 in two years is worth less than $100 in one year, which is worth less than $100 now, and so on.
I think we can make a good comparison here: a product from yesterday is worth more than a product from the day before, but less than a product from today.
The formula is simple:
Vn = V0 * (1-t)^n
with V0 the initial value (the real number of positive votes), t a discount rate (which you have to choose, e.g. 10%) and n the time elapsed (for example, n days). Thus a product loses 10% of its value each day (10% of the previous day's value, not of the initial value).
You can also look at hyperbolic discounting, which is closer to what you tried. The formula would be something like this, I guess:
Vn = V0 * (1/(1+k*n))
Another approach, simpler but cruder, is linear discounting. You give the scores an initial value, like 1000, and each day you decrement all scores by 1 (or another constant).
Vn = V0 - k*n
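To compare the three discounting formulas in this answer side by side, here is a rough Python sketch. The function names and the sample rates are made up for illustration, and clamping the linear variant at zero is my own assumption, not part of the formula.

```python
def exponential_discount(v0, t, n):
    """Vn = V0 * (1 - t)^n: lose a fraction t of the current value per period."""
    return v0 * (1 - t) ** n


def hyperbolic_discount(v0, k, n):
    """Vn = V0 * (1 / (1 + k*n)): fast decay at first, slower later."""
    return v0 / (1 + k * n)


def linear_discount(v0, k, n):
    """Vn = V0 - k*n, clamped at zero (the clamp is an assumption)."""
    return max(v0 - k * n, 0)


# A score of 100 after 7 days, with illustrative rates.
after_week_exp = exponential_discount(100, 0.10, 7)  # about 47.8
after_week_hyp = hyperbolic_discount(100, 0.10, 7)   # about 58.8
after_week_lin = linear_discount(100, 1, 7)          # 93
```

In practice you would probably apply the discount to each action at the moment of ranking, using the age of the action, rather than to the accumulated total.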
Is there a simple algorithm or process for maximizing quantity. Assume there are two products, A and B. The price of each changes each day and is independent of the other. You start with 100 units of A. Each day you can exchange (sell and buy) one product for the other. Your objective is to increase your quantity of A over say 100 days/iterations. What process do you use?
Price of A    Quantity of A    Price of B    Quantity of B
$10           100              $43           0
$11           -                $39           -
$12           -                $41           -
Note: I’m using prices and products in this example, but the problem could involve any countable thing with a variable feature.
I've modeled this process with excel/numbers using combinations of buy hi/lo, random, etc., with decent results, but I'm sure this is a problem that has already been studied. I just haven't found much on this topic in my research so far.
Is there a simple algorithm or process for maximizing quantity
To make it a little easier you can reduce the problem to a single unknown variable - "price of A / price of B". To maximize your quantity of A; you want to exchange all of your A for B when "price of A / price of B" will decrease on the next day/iteration, and exchange all of your B for A when "price of A / price of B" will increase on the next day/iteration.
However, to do this (to guarantee you maximize the quantity of A) you have to accurately predict the future with no mistakes.
If you can't predict the future with no mistakes, the best you can do is rely on "statistical probability" to try to increase your quantity of A (with a risk that you will fail and reduce your quantity of A, and an extremely low chance that you'll "maximize your quantity of A by accident").
If you can't predict the future at all, then it just becomes pure luck (better to do nothing and keep the quantity of A you already have). For an example; over 100 days/iterations, you could spend the first 50 days gathering information about how prices change (e.g. calculate "past min/max/average prices" and maybe find that "price of A / price of B" was always between 0.2 and 0.3 for the first 50 days/iterations); but it'd be foolish to merely assume that the past predicts the future in any way at all (e.g. the "price of A / price of B" might suddenly jump in any direction and never return to the range of previously seen values).
In other words; to improve your probability of increasing your quantity of A you need to improve your ability to predict the future; and to improve your ability to predict the future you need more information than what was provided.
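To make the "perfect prediction" case concrete, here is a small Python sketch of the upper bound you could reach if every next-day ratio were known in advance. It assumes no transaction costs and that fractional units can be exchanged; the function name is hypothetical.

```python
def best_quantity_with_foresight(prices_a, prices_b, start_quantity_a):
    """Final quantity of A when tomorrow's price ratio is always known today."""
    ratios = [a / b for a, b in zip(prices_a, prices_b)]
    quantity_a = float(start_quantity_a)
    for today, tomorrow in zip(ratios, ratios[1:]):
        if tomorrow < today:
            # Hold B overnight: A -> B at today's ratio, then B -> A at
            # tomorrow's ratio, which multiplies the holding by today/tomorrow.
            quantity_a *= today / tomorrow
        # Otherwise hold A overnight; the quantity of A is unchanged.
    return quantity_a


# Prices from the table in the question: the ratio rises every day,
# so the best move is to keep holding A.
from_question = best_quantity_with_foresight([10, 11, 12], [43, 39, 41], 100)

# Hypothetical prices where A halves in price relative to B overnight.
hypothetical = best_quantity_with_foresight([10, 5], [10, 10], 100)
```

Any real strategy without foresight can only be compared against this bound; it cannot beat it.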
The problem: A company has $120 000 to spend on the development and promotion of a new product. The company estimates that if x is spent on the development and y is spent on promotion, then approximately (x^(1/2)y^(3/2))/(400000) items of new product will be sold. Based on this estimate, what is the maximum number of products that the company can sell?
Not sure if this is an optimization or related rates problem, but even then I am not sure as to how to start it. I know the answer is supposed to be 11691.
The critical observation here is to notice that y = 120000 - x, so your expression simplifies to a function of x alone:
f(x) = x^(1/2) * (120000 - x)^(3/2) / 400000
[Plot of f(x) on the plane]
The max occurs at x=30000 (you can verify analytically by finding zeroes of the first derivative); substituting, you'll find the max number of products to be 11,691.
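As a quick check of the numbers above, here is a short Python sketch. The names are made up; the closed form x = budget / 4 comes from setting the first derivative to zero, which gives 120000 - x = 3x.

```python
BUDGET = 120_000


def items_sold(x, budget=BUDGET):
    """Estimated sales when x goes to development and the rest to promotion."""
    y = budget - x
    return (x ** 0.5) * (y ** 1.5) / 400_000


# Analytic optimum: the derivative vanishes where budget - x = 3x.
best_x = BUDGET / 4
best_items = items_sold(best_x)

# Coarse numeric cross-check on a grid of $1000 steps.
grid_best_x = max(range(0, BUDGET + 1, 1000), key=items_sold)

print(best_x, int(best_items), grid_best_x)
```

So $30,000 goes to development and the remaining $90,000 to promotion.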
I'm using AMPL to model a production where I have two particular constraints that I am not very sure how to handle.
subject to Constraint1 {t in T}:
prod[t] = sum{i in I} x[i,t]*u[i] + Recycle[f]*RecycledU[f];
subject to Constraint2 {t in T}:
Solditems[t]+Recycle[t]=prod[t];
EDIT: where x[i,t] is the amount of products from supply point i. u[i] denotes the "exchange rate" of the raw material from supply point i to create the product, i.e. a percentage of the raw material will become finished products, whereas some raw material will go to waste. The same is true for RecycledU[f], where f is in F, which denotes the refinement station where it has been refined. The difference is that RecycledU[f] has a much lower percentage that will go to waste, since Recycled is already a finished product from f (albeit a much less profitable one). In other words, Recycle has already gone through the process of being a raw material, x, earlier, and has become a finished product at some earlier stage, or hopefully (if it can be modelled) in the same time period as this. In the actual model, things such as "products" and "refinement station" exist as well, but I figured they could be left out of this question to keep it simpler.
What I want to accomplish is that the amount of products produced is the sum of all items sold in time period t and the amount of products recycled in time period t (by recycled I mean that the finished product is kept at the production site for further refinement in some timestep g, g>t).
Is it possible to write two equal signs for prod[t] like I have done? Also, how to handle Recycle[t]? Can AMPL "understand" that since these are represented at the same time step, that AMPL must handle the constraints recursively, i.e. compute a solution for Recycle[t] and subsequently try to improve that solution in every timestep?
EDIT: The time periods are expressed in years which is why I want to avoid having an expression with Recycle[t-1].
EDIT2: prod and x are parameters and Recycle and Solditems are variables.
Hope someone can shed some light on this!
Cenderze
The two constraints will be considered simultaneously (unless you explicitly exclude one from the problem). AMPL and optimization solvers don't have a notion of time steps; the complete problem is considered at once, so you might need to add some linking constraints between time periods yourself to model them. In particular, you might need to make sure that the inventory (such as the amount of finished product kept at the production site for further refinement) is carried over from one period to the next, something like:
Recycle[t + 1] = Recycle[t] - RecycleDecrease + RecycleIncrease;
You have to figure out the expressions for the amounts by which Recycle is increased (RecycleIncrease) and decreased (RecycleDecrease).
Also if you want some kind of an iterative procedure with one constraint considered at a time instead, then you should use AMPL script.
This is a problem set for one of my CS classes and I am kind of stuck. Here is a summary of the problem.
Create a program that will:
1) take a list of grocery stores and their available items and prices
2) take a list of required items that you need to buy
3) output a supermarket where you can get all your items with the cheapest price
input: supermarkets.list, [tomato, orange, turnip]
output: supermarket_1
The list looks something like
supermarket_1
$2.00 tomato
$3.00 orange
$4.00 tomato, orange, turnip
supermarket_2
$3.00 tomato
$2.00 orange
$3.00 turnip
$15.00 tomato, orange, turnip
If we want to buy tomato and orange, then the optimal solution would be buying
from supermarket_1 for $4.00. Note that it is possible for an item to be bought
twice. So if we wanted to buy 2 tomatoes, buying from supermarket_1 would be the
optimal solution.
So far, I have been able to put the dataset into a data structure that I hope will allow me to easily do operations on it. I basically have a dictionary of supermarkets, and each value points to another dictionary containing the mapping from each entry to its price.
supermarket_1 --> [turnip --> $2.00]
[orange --> $1.50]
One way is to use brute force, to get all combinations and find whichever satisfies the solution and find the one with the minimum. So far, this is what I can come up with. There is no assumption that the price of a combination of two items would be less than buying each separately.
Any suggestions or hints are welcome.
Finding the optimal solution for a specific supermarket is a generalization of the set cover problem, which is NP-complete. The reduction goes as follows:
Given an instance of the set cover problem, just define a cost function assigning 1 to each combination, apply an algorithm that solves your problem, and you obtain an optimal solution of the set cover instance. (Finding the minimal price hence corresponds to finding the minimum number of covering sets.) Thus, your problem is NP-hard, and you cannot expect to find an algorithm that solves it in polynomial time.
You really should implement the brute-force method you mentioned. I too recommend doing this as a first step. If the performance is not sufficient, you can try a MIP formulation with a solver like CPLEX, or you have to develop a heuristic approach.
For a single supermarket, it is rather trivial to formulate a mixed integer program (MIP). Let x_i be the (integer) number of times product combination i is contained in a solution, c_i its cost, and w_ij the number of times product j is contained in product combination i. Then you are minimizing
sum_i x_i * c_i
subject to conditions like
sum_i x_i * w_ij >= r_j,
where r_j is the number of times product j is required.
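Before reaching for a real MIP solver, the same formulation can be solved by plain enumeration on small inputs. The Python sketch below is only an illustration: the function name and data layout are made up, prices are kept in cents to avoid floating-point issues, and it tries every vector x with each x_i between 0 and the largest r_j. That bound is safe for non-negative costs, because buying a combination more often than that can never be needed.

```python
from itertools import product


def cheapest_cover(offers, required):
    """offers: list of (cost_in_cents, {item: count}); required: {item: count}.

    Returns the minimum total cost in cents, or None if it cannot be met.
    """
    max_copies = max(required.values(), default=0)
    best = None
    # Enumerate every purchase vector x (exponential in the number of offers).
    for counts in product(range(max_copies + 1), repeat=len(offers)):
        covered = all(
            sum(x * contents.get(item, 0)
                for x, (_, contents) in zip(counts, offers)) >= need
            for item, need in required.items()
        )
        if covered:
            cost = sum(x * price for x, (price, _) in zip(counts, offers))
            if best is None or cost < best:
                best = cost
    return best


# Data from the question, with prices converted to cents.
all_three = {"tomato": 1, "orange": 1, "turnip": 1}
supermarkets = {
    "supermarket_1": [(200, {"tomato": 1}), (300, {"orange": 1}),
                      (400, all_three)],
    "supermarket_2": [(300, {"tomato": 1}), (200, {"orange": 1}),
                      (300, {"turnip": 1}), (1500, all_three)],
}

required = {"tomato": 1, "orange": 1}
costs = {name: cheapest_cover(offers, required)
         for name, offers in supermarkets.items()}
feasible = {name: cost for name, cost in costs.items() if cost is not None}
best_store = min(feasible, key=feasible.get)
```

For the tomato and orange example this picks supermarket_1 at 400 cents, matching the question. The running time grows exponentially with the number of offers per supermarket, so this is only a baseline to test a faster approach against.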
Well, you have one method, so implement it now so you have something that works to submit. A brute-force solution should not take long to code up; then you can get some performance data and think about the problem more deeply. Guesstimate the number of supermarkets in a reasonable shopping range in a large city. Create that many supermarket records and link them to product tables with random-ish prices (this is more work than the solution).
Run your brute-force solution. Does it work? If it outputs a solution, 'manually' add up the prices and list them against three other 'supermarket' records taken at random (just pick a number), showing that the total is less than or equal. Modify the price of an item on your list so that the solution is no longer the cheapest and re-run, so that you get a different solution.
Is it so fast that no further work is justified? If so, say so in the conclusions section of your report and post the lot to your prof/TA. You understood the task, thought about it, came up with a solution, implemented it, tested it with a representative dataset and so shown that functionality is demonstrable and performance is adequate - your assignment is over, go to the bar, think about the next one over a beer.
I am not sure what you mean by "brute force" solution.
Why don't you just calculate the cost of your list of items in each of the supermarkets, and then select the minimum? Complexity would be in O(#items x #supermarkets) which is good.
Regarding your data structure, you can also simply use a matrix (two-dimensional array) price[super][item], and use ids for your supermarkets/items.
Ok, I'm just curious what the formula would be for calculating expected income over the next X weeks/months/etc., if the only data I have in a MySQL DB is all past transactions (dates of transactions, amounts, etc.).
I am thinking of taking some averages and whatnot, but I can't think of a specific formula (there must be something along those lines) to take, say, the average rise of income over time (weekly/monthly), apply it to a selected future period, and display it weekly/monthly/etc.
Any suggestions?
Use AVG() on the past income, and divide it into proper weekly/monthly amounts if necessary.
See http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_avg for more info on AVG().
Linear regression + simple integration is probably sufficient for your needs. I leave sorting out the exact implementation for your DB up to you, but follow that link to the "Estimation Methods" section, and probably use Ordinary Least Squares.
Alternatively, you can always slurp your data into something like R where the details are already implemented.
EDIT:
For more detail: you're trying to model INCOME = BASE + SCALING*T, where we are assuming that a linear model is "good" (it's probably not great, but it's probably good enough on a short time scale). For two-variable linear regression, you're pretty much just taking averages; follow that link to "Fitting the Regression Line" and you'll see which things you need to average (y = INCOME and x = T). There are some tricks you can play to simplify the calculation for the computer if you can enforce some other conditions (e.g., having equally spaced time periods + no missing data), but you'll need to math a bit more yourself first if you want to do that (and you'll be less flexible in the face of changing db assumptions).
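If you do pull the data out of the DB, the fit itself is only a few lines. Here is a rough Python sketch of ordinary least squares for INCOME = BASE + SCALING*T, followed by a forecast that sums the predicted income over future periods (the discrete version of the "simple integration"). The function names and the weekly totals are made up, and the periods are assumed to be already aggregated and equally spaced.

```python
def fit_line(ts, incomes):
    """Ordinary least squares fit of incomes = base + scaling * t."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(incomes) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, incomes))
    scaling = sxy / sxx
    base = mean_y - scaling * mean_t
    return base, scaling


def forecast_total(base, scaling, first_period, num_periods):
    """Sum of predicted income over num_periods starting at first_period."""
    return sum(base + scaling * t
               for t in range(first_period, first_period + num_periods))


# Hypothetical weekly income totals for weeks 0..4.
weeks = [0, 1, 2, 3, 4]
income = [100.0, 110.0, 120.0, 130.0, 140.0]

base, scaling = fit_line(weeks, income)
next_four_weeks = forecast_total(base, scaling, 5, 4)  # weeks 5, 6, 7, 8
```

With this perfectly linear toy data the fit recovers a base of 100 and a rise of 10 per week, so the next four weeks sum to 660. Real transaction data will be noisier, and the further ahead you extrapolate, the less the linear assumption can be trusted.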