What approach is best suited for analysing the relationship between X_inputs and Y_output?

A set of restaurants has product mix information as well as Food Cost/Netto % like in the data sample below:
Restaurant 1:
Input: Pizza - 60%, Pasta - 25%, Hamburgers - 10%, Steaks - 5%. Output: Cost - 25% of Netto
Restaurant 2:
Input: Pizza - 45%, Pasta - 45%, Hamburgers - 8%, Steaks - 2%. Output: Cost - 20% of Netto
Restaurant 3:
Input: Pizza - 47%, Pasta - 38%, Hamburgers - 14%, Steaks - 1%. Output: Cost - 27% of Netto
Assume we have prior knowledge that the food cost of Pizza is higher than the rest, and that the rest have approximately the same food cost per unit. By eyeballing this sample we can see that Restaurant 3 has an unusually high food cost %, as it sells less pizza and should therefore have a lower cost %.
What is the best analysis approach for such a problem? I've tried a multivariate LSTM to predict food cost and then taken the difference from the actual value to find the worst-performing restaurants. Results are mixed.
Thank you

Why use Long Short-Term Memory neurons?
You could just use a basic Neural Network with 4 inputs:
Pizza
Pasta
Hamburgers
Steaks
and 1 output:
Cost / Netto%
Don't mind me if I'm wrong, but the 'Cost/Netto %' term is kind of confusing.
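As a rough sketch of that idea (this is only an illustration: the use of Keras, the layer sizes, and the epoch count are my own assumptions, and with only three restaurants you would want far more data, or a plain linear regression, before trusting the residuals):
import numpy as np
from tensorflow import keras

# Product-mix fractions per restaurant: [pizza, pasta, hamburgers, steaks]
X = np.array([
    [0.60, 0.25, 0.10, 0.05],
    [0.45, 0.45, 0.08, 0.02],
    [0.47, 0.38, 0.14, 0.01],
])
# Food cost as a fraction of netto revenue
y = np.array([0.25, 0.20, 0.27])

# Basic feed-forward network: 4 inputs -> 1 output
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)

# Residuals (actual - predicted): large positive values flag restaurants
# whose cost is unusually high for their product mix
residuals = y - model.predict(X, verbose=0).ravel()
print(residuals)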

Expected Value formula

I have a table as follows
Day Savings
1 : 251
2 : 722
3 : 1132
4 : 1929
5 : 3006
6 : 4522
7 : 8467
...
14 : x
These savings are growing day by day, and I want to find a formula to estimate the final value on day 14, which is x.
I didn't look at the data in any detail, but it seems like an exponential growth situation. If that's the case, then you can estimate the growth rate by fitting an exponential curve to the data using least squares approximation to get an estimated interest rate, r. If you find the data not conducive to that, you could try fitting it to some other curve. You can then use the estimated interest rate to compute the expected funds using the standard a = p*exp(r*i) where p is the initial principal and i is the elapsed time.
This all assumes compounding interest which is an exponential growth situation. If that assumption is incorrect, this approach is probably not going to work for you.
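A minimal sketch of that least-squares exponential fit (assuming numpy and the day/savings values quoted in the question):
import numpy as np

days = np.arange(1, 8)
savings = np.array([251, 722, 1132, 1929, 3006, 4522, 8467])

# Fit log(savings) = log(p) + r*day by ordinary least squares,
# which corresponds to the exponential model a = p*exp(r*i)
r, log_p = np.polyfit(days, np.log(savings), 1)
p = np.exp(log_p)

# Extrapolate to day 14
print(p * np.exp(r * 14))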

Where did I go wrong in numpy normalization of input data in linear regression?

While working through Andrew Ng's Machine Learning course assignment (Exercise 1) in Python,
I had to predict the price of a house given the size of the house in sq-feet and the number of bedrooms, using multivariable linear regression.
In one of the steps we had to predict the cost of the house on a new example X = [1,1650,3], where 1 is the bias term, 1650 is the size of the house, and 3 is the number of bedrooms. I used the code below to normalize and predict the output:
X_vect = np.array([1,1650,3])
X_vect[1:3] = (X_vect[1:3] - mu)/sigma
pred_price = np.dot(X_vect,theta)
print("the predicted price for 1650 sq-ft,3 bedroom house is ${:.0f}".format(pred_price))
Here mu is the mean of the training set calculated previously as [2000.68085106 3.17021277], sigma is the standard deviation of the training data calculated previously as [7.86202619e+02 7.52842809e-01], and theta is [340412.65957447 109447.79558639 -6578.3539709]. The value of X_vect after the calculation was [1 0 0]. Hence the prediction code:
pred_price = np.dot(X_vect,theta_vals[0])
gave the result: the predicted price for a 1650 sq-ft, 3 bedroom house is $340413.
But this was wrong according to the answer key, so I did it manually as below:
print((np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma)
This is the normalized form of X_vect, and the output was [[-0.44604386 -0.22609337]].
The next line of code to calculate the hypothesis was:
print(340412.65957447 + 109447.79558639*-0.44604386 + 6578.3539709*-0.22609337)
Or in cleaner code:
X1_X2 = (np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma
xo = 1
x1 = X1_X2[:,0:1]
x2 = X1_X2[:,1:2]
hThetaOfX = (340412.65957447*xo + 109447.79558639*x1 + 6578.3539709*x2)
print("The price of a 1650 sq-feet house with 3 bedrooms is ${:.02f}".format(hThetaOfX[0][0]))
This gave a predicted price of $290106.82, which matched the answer key.
My question is where did I go wrong in my first approach?
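For context, the [1 0 0] value mentioned above can be reproduced in isolation; this is only a sketch using the mu and sigma quoted in the post, showing that the integer dtype inferred for the freshly created array truncates the floats assigned into it:
import numpy as np

mu = np.array([2000.68085106, 3.17021277])
sigma = np.array([7.86202619e+02, 7.52842809e-01])

X_int = np.array([1, 1650, 3])              # integer dtype inferred
X_int[1:3] = (X_int[1:3] - mu) / sigma      # floats truncated toward zero
print(X_int)                                # [1 0 0]

X_float = np.array([1, 1650, 3], dtype=float)
X_float[1:3] = (X_float[1:3] - mu) / sigma
print(X_float)                              # [ 1. -0.44604386 -0.22609337]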

Pandas: Multiple Grand Totals in a Summary Dataframe

Apologies for the noob question as I try to learn Python. Looking forward to getting up to speed and giving back.
Assuming I have the following data,
YEAR SECTOR PROFIT STARTMVYEAR TOTALPROFIT STARTMV
IBM TECHNOLOGY -500 2500 500 1500
APPLE TECHNOLOGY 800 4000 300 4500
GM INDUSTRIAL 250 1000 0 1250
CHRYSLER INDUSTRIAL 600 3000 100 3500
I want to create a summary that looks as follows
SECTOR PROFITYEAR TOTALPROFIT
TECHNOLOGY .046 .133
INDUSTRIAL .213 .021
Where for each group, we have sum(PROFIT)/sum(STARTMVYEAR) and sum(TOTALPROFIT)/sum(STARTMV)
If I wanted to do it for just the first benchmark, I could do
by_profit_totals =(df.groupby(['SECTOR'])['PROFIT'].sum()/by_first_count.groupby(['SECTOR'])['STARTMVYEAR'].sum())
But how do I do it for both? Also, is there an easy function I could use that takes, for example, profit and startmvyear and returns the summary value?
You can use groupby with the Cython-optimized sum aggregation and then div by the numpy array created by values:
g = df.groupby('SECTOR').sum()
print(g[['PROFIT','TOTALPROFIT']].div(g[['STARTMVYEAR','STARTMV']].values).reset_index())
SECTOR PROFIT TOTALPROFIT
0 INDUSTRIAL 0.212500 0.021053
1 TECHNOLOGY 0.046154 0.133333
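If you would rather have a named helper that takes the numerator and denominator columns explicitly (as asked at the end of the question), a small sketch along these lines should work; ratio_by_sector is a hypothetical name, and the DataFrame is rebuilt here from the sample data:
import pandas as pd

# Sample data from the question (company names used as the index)
df = pd.DataFrame(
    {
        'SECTOR': ['TECHNOLOGY', 'TECHNOLOGY', 'INDUSTRIAL', 'INDUSTRIAL'],
        'PROFIT': [-500, 800, 250, 600],
        'STARTMVYEAR': [2500, 4000, 1000, 3000],
        'TOTALPROFIT': [500, 300, 0, 100],
        'STARTMV': [1500, 4500, 1250, 3500],
    },
    index=['IBM', 'APPLE', 'GM', 'CHRYSLER'],
)

def ratio_by_sector(frame, num_col, den_col):
    # Sum both columns per SECTOR and return their ratio
    g = frame.groupby('SECTOR')[[num_col, den_col]].sum()
    return g[num_col] / g[den_col]

summary = pd.DataFrame({
    'PROFITYEAR': ratio_by_sector(df, 'PROFIT', 'STARTMVYEAR'),
    'TOTALPROFIT': ratio_by_sector(df, 'TOTALPROFIT', 'STARTMV'),
}).reset_index()
print(summary)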

Dynamic Discount on Products

In the real world the discount on products you purchase is quite interesting. For example a seller offers a discount on his products in the following way:
On buying one quantity there will be no discount
On buying 2 he offers 10% discount
On buying 4 or 5 the discount will be 20%
On buying 6 or more, 22%
What is the best way to accomplish this in an eCommerce application?
Take a ceiling function of the exponent or an approximation thereof. For example, Discount = MaxDiscount * (N - 1) / N, where N is the number of items: for 1 item the discount is 0, for 2 items the discount is 1/2 of the max, and for a large number of items it approaches MaxDiscount. Use a ceiling function if you want the discount to be an integer.
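A minimal Python sketch of that formula (the 22% maximum is taken from the question; rounding up to a whole percentage is an assumption):
import math

def discount_percent(quantity, max_discount=22.0):
    # 0% for a single item, approaching max_discount as quantity grows
    if quantity < 2:
        return 0
    return math.ceil(max_discount * (quantity - 1) / quantity)

for q in (1, 2, 4, 6, 20):
    print(q, discount_percent(q))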

storing weight in a database table

I am playing around with learning MVC and want to create a recipe recorder application to store my recipes.
I am using .net with Sql Server 2008 R2 however I don't think that really matters with what I am trying to do.
I want to be able to record all of the measures I use. In my country we use metric however I want people to be able to use imperial with my application.
How do I structure my table to cope with the differences? I was thinking of storing all of the measurements as ints and having a foreign key to store the kind of weight.
Ideally I would like to be able to share the recipes between people and display the measurements in their preferred way.
Is this the right kind of approach?
IngredientID PK
Weight int
TypeOfWeight int e.g. tsp=1,tbl=2,kilogram=3,pound=4,litre=5,ounce=6 etc
UserID int
Or is this way off track? Any suggestions would be great!
I think you should store the weights (kilo/pound etc.) as a single weight type (metric) and simply "display" them in the correct conversion using the user's preference. If the user has their weight settings set to Imperial, values entered into the system would need to be converted as well. This should simplify your data anyway.
It is similar to dates: you could store every date along with the timezone it is from, or store all dates in the same (or no) timezone and then display them in the application using offsets according to the user's preference.
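A tiny sketch of that store-metric, convert-on-display idea (the function name and unit factors here are just placeholders):
# Weights are stored in grams; only the display layer converts
GRAMS_PER_UNIT = {'kg': 1000.0, 'pound': 453.59}

def display_weight(stored_grams, prefers_imperial):
    unit = 'pound' if prefers_imperial else 'kg'
    return f"{stored_grams / GRAMS_PER_UNIT[unit]:.2f} {unit}"

print(display_weight(1500, False))  # 1.50 kg
print(display_weight(1500, True))   # 3.31 pound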
If you are storing weights (a non-discrete value) I would strongly suggest using numeric or decimal for this data. You have the right idea with the typeofweight column. Store a reference table somewhere showing what the conversion ratio is for each (to a certain standard).
This gets quite tricky when you want to show ounces as TSP, because the conversion depends on the ingredient itself, so you need a 3rd table - ingredient: id, name, volume-to-weight ratio.
Example typeofweight table, where the standard unit is grams
type | conversion
gram | 1
ounce | 28.35
kg | 1000
tsp | 5 // assuming that 1 tsp = 5 grams of water
pound | 453.59
Example ingredient volume to weight conversion
type | vol-to-weight
water | 1
sugar | 1.4 // i.e. 1 tsp holds 5g of water, but 7g of sugar
So to display 500 ounces of sugar in tsp, you would use the formula
units x ounce.conversion / (tsp.conversion x sugar.vol-to-weight)
= 500 x 28.35 / (5 x 1.4) = 2025 tsp
Another example with 2 weights
Ingredient is specified as 3 ounces of starch. Show in grams
= 3 x 28.35 (straightforward isn't it)
or
Ingredient is specified as 3 ounces of starch. Show in pounds
= 3 * 28.35 / 453.59
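A small Python sketch of that lookup logic (the unit names and factors are copied from the tables above; treating tsp as the only volume unit is a simplification):
# Grams per unit, from the typeofweight table above
UNIT_TO_GRAMS = {'gram': 1.0, 'ounce': 28.35, 'kg': 1000.0, 'tsp': 5.0, 'pound': 453.59}
VOLUME_UNITS = {'tsp'}
# Volume-to-weight ratio relative to water, from the ingredient table above
VOL_TO_WEIGHT = {'water': 1.0, 'sugar': 1.4}

def convert(amount, from_unit, to_unit, ingredient='water'):
    density = VOL_TO_WEIGHT[ingredient]
    # Normalise to grams, adjusting volume units by the ingredient's density
    grams = amount * UNIT_TO_GRAMS[from_unit]
    if from_unit in VOLUME_UNITS:
        grams *= density
    # Convert grams into the target unit
    result = grams / UNIT_TO_GRAMS[to_unit]
    if to_unit in VOLUME_UNITS:
        result /= density
    return result

print(convert(3, 'ounce', 'gram'))             # 85.05
print(convert(3, 'ounce', 'pound'))            # ~0.19
print(convert(500, 'ounce', 'tsp', 'sugar'))   # ~2025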