Modeling products pricing structure - sql
I need to model a rather complex pricing structure for some of our products.
Today we look up the prices manually. Here's a picture with explanations of the "matrix" that we use today: Sample model (sorry for the link, but I'm not allowed to post images because I've just opened my account.)
Now I need to transfer this model to an RDBMS (SQL Server 2008 R2). The entry point when looking up a price is the Category, then the yearly interval, and finally the interval that depends on how many products we're selling on this order. The result of the query should be two prices.
Do you have any suggestions on how to model this? I was thinking of modeling it as a matrix with a RowNumber, CellNumber and a CellValue. But then I need another table for describing what is contained in each cell (by referencing the row and cell numbers). If doing that, I could just include the prices in that description table. But that doesn't seem like the best solution.
Do you have any hints/solutions on how to model this problem the best way?
I think I would make something like this:
Categories are separated into their own table.
Each row in the price table is uniquely identified by the category and the starting points of the sold and shipped ranges. I don't think you need to store the end point in the table (since the end point of a range should be the starting point of the next range minus one).
Edit: With this model, you will need to add a row in the Prices table for each combination of category, units sold-interval and units shipped-interval, but right now I can't think of an easier way.
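A minimal sketch of that model and its lookup, written in Python with an in-memory SQLite database standing in for SQL Server only so that the example is runnable. All table names, column names (SoldFrom, ShippedFrom, PriceA, PriceB) and sample prices are hypothetical; the lookup picks the greatest range start that does not exceed the requested value, which assumes every combination of range starts has a row, as noted in the edit above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    CategoryId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL UNIQUE
);
CREATE TABLE Prices (
    CategoryId  INTEGER NOT NULL REFERENCES Categories(CategoryId),
    SoldFrom    INTEGER NOT NULL,  -- first value of the units-sold interval
    ShippedFrom INTEGER NOT NULL,  -- first value of the units-shipped interval
    PriceA      REAL NOT NULL,     -- the two prices the lookup returns
    PriceB      REAL NOT NULL,
    PRIMARY KEY (CategoryId, SoldFrom, ShippedFrom)
);
""")
conn.execute("INSERT INTO Categories VALUES (1, 'A')")
# One row per combination of interval starts (made-up sample prices).
conn.executemany(
    "INSERT INTO Prices VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, 1, 10.0, 12.0),
        (1, 0, 10, 9.0, 11.0),
        (1, 1000, 1, 8.0, 10.0),
        (1, 1000, 10, 7.0, 9.0),
    ],
)

LOOKUP = """
SELECT p.PriceA, p.PriceB
FROM Prices p
JOIN Categories c ON c.CategoryId = p.CategoryId
WHERE c.Name = :category
  AND p.SoldFrom = (SELECT MAX(SoldFrom) FROM Prices
                    WHERE CategoryId = c.CategoryId AND SoldFrom <= :sold)
  AND p.ShippedFrom = (SELECT MAX(ShippedFrom) FROM Prices
                       WHERE CategoryId = c.CategoryId AND ShippedFrom <= :shipped)
"""

def lookup_prices(conn, category, sold, shipped):
    """Return (PriceA, PriceB) for the intervals containing sold and shipped, or None."""
    params = {"category": category, "sold": sold, "shipped": shipped}
    return conn.execute(LOOKUP, params).fetchone()

print(lookup_prices(conn, "A", 1500, 3))
```

Storing only the start of each range keeps the ranges gap-free by construction, at the cost of the two MAX subqueries in the lookup.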
Related
How to populate all possible combination of values in columns, using Spark/normal SQL
I have a scenario where my original dataset looks like this:

Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3

There is also a very small country lookup table (the country_lookup csv) that lists every country the source data can contain.

I want the output data to always have a fixed number of columns, so that the reporting/visualization tool doesn't get a varying number of columns with each day's ingestion, depending on how many distinct countries are present. So I have to somehow join the source data with the country_lookup csv and populate all those columns with a default value of F. Every country column is binary, with T or F as the possible values.
The original dataset above has to be converted into the following. (For rows whose Type is Derived Yield, I've left the Amount as an unevaluated expression rather than calculating it, so that you can match it against the formula.)

Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F

Formula for populating the Amount field for the Derived Yield type: Derived Amount = (sum of Harvested over all countries marked T, grouped by Year and Commodity) divided by (sum of Production over all countries marked T, grouped by Year and Commodity).

So the target is to generate every combination of the countries in the source and, for each combination, compute the sums of the respective Harvested and Production values, which are then divided. In the actual scenario a country can have more than one commodity, but that should not matter because the amounts are summed per commodity and year.

Note: Users in the frontend can select any combination of countries. The only reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool) can compute sums over the selected column filters but doesn't yet support calculations on those derived sums. Hence the result for every combination of countries has to be pre-populated (a very naive approach) so that it is available in the report whatever countries the user selects.

If you know a better approach than the naive one described in the note, you are most welcome to guide me. I've also posted a question on the same problem without describing my expected approach, so that experts can show me how to solve this kind of problem better than with this naive approach.
If you want to help solve it with some other technique, you're most welcome; here is the link to that question. Any help would be greatly appreciated.
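The naive pre-population described above can be sketched in plain Python as follows (not Spark). It uses only the 2010 rows from the source data; the function name and tuple layout are hypothetical, the lookup order is taken from the header of the expected output, and it assumes every country has both a Harvested and a Production row for each commodity and year. Like the expected output, it emits each combination once per member country.

```python
from itertools import combinations

# 2010 rows from the source data: (Country, Commodity, Year, Type, Amount)
SOURCE_ROWS = [
    ("US", "Vegetable", 2010, "Harvested", 2.44),
    ("US", "Vegetable", 2010, "Yield", 15.8),
    ("US", "Vegetable", 2010, "Production", 6.48),
    ("Argentina", "Vegetable", 2010, "Harvested", 15.2),
    ("Argentina", "Vegetable", 2010, "Yield", 40.5),
    ("Argentina", "Vegetable", 2010, "Production", 2.66),
    ("Bhutan", "Vegetable", 2010, "Harvested", 7),
    ("Bhutan", "Vegetable", 2010, "Yield", 35),
    ("Bhutan", "Vegetable", 2010, "Production", 5),
]
# Fixed country columns, in the order of the expected-output header.
COUNTRY_LOOKUP = ["US", "Argentina", "Bhutan", "India", "Nepal", "Bangladesh"]

def derived_yield_rows(rows, lookup):
    """Build one Derived Yield row per country per combination of 2+ countries."""
    # totals[(commodity, year)][country] = {"Harvested": x, "Production": y}
    totals = {}
    for country, commodity, year, kind, amount in rows:
        if kind in ("Harvested", "Production"):
            by_country = totals.setdefault((commodity, year), {})
            by_country.setdefault(country, {})[kind] = amount

    out = []
    for (commodity, year), by_country in totals.items():
        present = [c for c in lookup if c in by_country]
        for size in range(2, len(present) + 1):
            for combo in combinations(present, size):
                harvested = sum(by_country[c]["Harvested"] for c in combo)
                production = sum(by_country[c]["Production"] for c in combo)
                flags = ["T" if c in combo else "F" for c in lookup]
                for country in combo:
                    out.append((country, commodity, year, "Derived Yield",
                                harvested / production, *flags))
    return out

DERIVED = derived_yield_rows(SOURCE_ROWS, COUNTRY_LOOKUP)
for row in DERIVED:
    print(row)
```

With n countries present there are 2^n - n - 1 combinations of two or more countries per commodity and year, so the pre-populated table grows exponentially with the number of countries.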
Qlikview - Target missing where no actual value
I have a fact table of Delay by Date by Category (and many other Fields). I have another (target) table of DelayTarget by Month and Category. I am currently associating the target table to the fact table on Month & Category but when there is no Delay for a given Category in a given Month, then the DelayTarget value does not display in my dashboard. How do I associate the DelayTarget to all Months in my main dataset - even when there is no Delay to report? I think I want to create a Zero value for Delay when it is null but I don't know how to do this or if this is the best method.
You need to create a MasterCalendar to fill the gaps in dates. I can give you a more detailed answer, but it would be best if you shared your data model (Ctrl+T) and some example data from the tables (or, even better, just the .qvw).
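The idea behind filling the gaps can be illustrated outside QlikView. The Python sketch below is not QlikView load script, and all names and figures in it are made up: it builds every Month and Category pair from the target table and defaults Delay to zero where the fact table has no row, so the target still has something to attach to.

```python
from itertools import product

# Hypothetical sample data keyed by (Month, Category).
delays = {("2015-01", "A"): 12, ("2015-02", "A"): 7, ("2015-01", "B"): 3}
targets = {
    ("2015-01", "A"): 10, ("2015-02", "A"): 10,
    ("2015-01", "B"): 5, ("2015-02", "B"): 5,
}

def fill_gaps(delays, targets):
    """Return one row per Month/Category pair, with Delay = 0 where none was recorded."""
    months = sorted({month for month, _ in targets})
    categories = sorted({category for _, category in targets})
    table = []
    for month, category in product(months, categories):
        table.append({
            "Month": month,
            "Category": category,
            "Delay": delays.get((month, category), 0),
            "DelayTarget": targets.get((month, category)),
        })
    return table

FILLED = fill_gaps(delays, targets)
for row in FILLED:
    print(row)
```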
Tableau - Adding dimensions together to show overall revenue
I am very new to Tableau (first day user) and have been a long time Excel user. I am trying to fully understand the power of Tableau to eventually move away from Excel. I have a question concerning dimensions and creating a calculated field. My table has multiple categories and sub-categories. My goal is to display the total revenue and average order value per chosen sub-category (this seems easy enough). I want to then take those sub-categories and show a combined sum of revenue and average of the average order value. I am stuck on trying to also combine these sub-categories to show a blended view. Furthermore, the 2 sub-categories are weighted very differently. The average order value of 1 has a much heavier weight than the other and will definitely affect the AOV when combined. How do you also assign a weight to this combined total? Any help will be much appreciated. I know this may be a rather simple solution but I am new to the program and am having difficulty finding this answer. Tableau screen: or img1 http://postimg.org/image/dq5wqgnyl/ Best, CR
Put the sub-categories in the rows column. Put sum of revenue on the text pill in the marks section. In the analysis tab at the top, select column grand totals. I'm unable to see your images; I hope this answers a part of your question.
Need advice in designing tables in SQL-Server
I have a quote that contains items (stored in table QuoteItem): QuoteItemId, QuoteId, ItemId, Quantity etc.
Now I need to be able to create a group of chosen items in the quote and apply a discount to it. That's simple; I create two more tables:
Group: GroupId, DiscountPercentage
GroupQuoteItem: GroupId, QuoteItemId
Let's say I have 30 items in a quote. I made a group that contains items 1-20 from the quote and applied a discount to it. Now I need another group that contains items 10-30. The problem is with those inner 10 items that belong to both groups: I need to control whether the second discount applies to the items after the other discount or to the items' base price.
For instance, take item no. 15 in the quote: QuoteItem.Cost = 100. I applied a first discount of 10%, giving 90. Now I want to apply the second discount, and I need to be able to control whether it applies to the 100 or to the 90. The same goes for when I have multiple discount groups and want to apply a complex structure of discounts.
Any assistance will be really appreciated.
I would look into adding a column to the GroupQuoteItem table, GroupQuoteItem.Priority. This column would be used in the query that determines the final price. If you have N discounts with the same, highest priority, they are stacked on top of each other (the order doesn't matter, because multiplication is commutative). If all of these high-priority discounts are later removed, lower-priority discounts can take their place. This should help you set up fairly complex discount structures. I hope that at least gives you somewhere to start.
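One possible reading of the priority rule, as a small Python sketch: only the discounts that share the highest priority are applied, and they are compounded. The function name and the (priority, percentage) tuple layout are hypothetical, and in practice this logic would live in the query that computes the final price.

```python
def final_price(cost, discounts):
    """Apply only the discounts sharing the highest priority, compounding them.

    discounts: list of (priority, percentage) pairs for one quote item.
    """
    if not discounts:
        return cost
    top_priority = max(priority for priority, _ in discounts)
    price = cost
    for priority, percentage in discounts:
        if priority == top_priority:
            price *= 1 - percentage / 100
    return price

# Two priority-2 discounts stack; the priority-1 discount is ignored.
print(round(final_price(100, [(2, 10), (2, 20), (1, 50)]), 2))
# Once the priority-2 discounts are removed, the priority-1 discount applies.
print(round(final_price(100, [(1, 50)]), 2))
```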
It really depends on your own business rules. Do you want to apply the discounts to the price after discount or to the original price? When you ask questions like this, it helps to include sample data and then show us the expected results.
This may be one of those rare times in normalization when you want to store data that you could calculate otherwise. So, in QuoteItem, you could have a Cost field and a DiscountedCost field. If they're the same, then you know no discount has been applied, if they are not, then a discount has been applied. By having this field, you would also be able to do comparisons on what the discount is already and whether you want to add the additional discount. In fact, you could also store that number in an ExistingDiscount field.
Why not store a column in the Group table that specifies whether or not the discount can be accumulated with other discounts versus if it must be applied to the base price only? You could name the field something like "ApplyToBasePriceOnly." Other than that, I agree with JonH that a lot of this logic should be placed in business rules. I think your general database structure looks pretty good.
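A sketch of how such a flag could drive the calculation, using the question's example of an item costing 100 with two 10% discounts. The function name and the (percentage, apply_to_base_price_only) pair are hypothetical, and the order in which discounts are applied is an assumption that the real business rules would have to define.

```python
def apply_discounts(base_cost, discounts):
    """Apply discounts in order.

    discounts: iterable of (percentage, apply_to_base_price_only) pairs.
    A base-only discount is computed from the original cost; any other
    discount is computed from the price after the earlier discounts.
    """
    price = base_cost
    for percentage, base_only in discounts:
        basis = base_cost if base_only else price
        price -= basis * percentage / 100
    return price

# Second discount taken from the base price of 100: 90 - 10
print(apply_discounts(100, [(10, False), (10, True)]))
# Second discount taken from the already discounted 90: 90 - 9
print(apply_discounts(100, [(10, False), (10, False)]))
```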
track sales for week/month and find the best sellers
Let's say I have a website that sells widgets. I would like to do something similar to a tag cloud tracking best sellers. However, because we are constantly acquiring and selling new widgets, I would like the sales to decay on a weekly time scale. I'm having trouble working out how to store and manipulate this data and have it decay properly over time, so that something that was an ultra hot item 2 months ago but has since tapered off doesn't show at the top of the list above the current best sellers. What would be the logic and database design for this?
Part 1: You have to have tables storing the data that you want to report on. Date/time sold is obviously key. If you need to work in decay factors, that raises the question: for how long is the data good and/or relevant? At what point in time has the "value" of the data decayed so much that you no longer care about it? When this point is reached for any given entry in the database, what do you do: keep it there but ensure it gets factored out of all subsequent computations? Or do you archive it, that is, copy it to a "history" table and delete it from your main "sales" table? This is relevant, as it has to be factored into your decay formula (as well as your capacity planning, annual reporting requirements, and who knows what all else).

Part 2: How much thought has been given to the decay formula that you want to use? There's no end of detail you can work into this. Options and factors to wade through include but are not limited to:

Simple age-based. Every sale more recent than the cutoff date counts as 1; everything older counts as 0. Sum and you're done.

What's the cutoff date? Precisely 14 days ago, to the minute? Midnight as of two Saturdays ago from (now)?

Does the cutoff date depend on the item that was sold? If some items are hot but some are not, does that affect things? What if you want to emphasize some things (the expensive/hard to sell ones) over others (the fluff you'd sell anyway)?

Simple age-based decays are trivial, but can be insufficient. Time to go nuclear. Perhaps you want some kind of half-life, Dr. Freeman? Everything sold is "worth" X, where the value of X is either always the same or varies with the item sold. And the value of X can decay over time. Perhaps the value of X decreases by one-half every week. Or every day. Or every month. Or (again) it may vary depending on the item.

If you do half-lives, the value of X may never reach zero, and you're stuck tracking it forever (which is why I wrote "part 1" first). At some point, you probably need some kind of cut-off, some point after which you just don't care. X has decreased to one-tenth of the initial value? Three months have passed? Either/or, but the "range" depends on the inherent value of the item?

My real point here is that how you calculate your decay rate is far more important than how you store it in the database. So long as the data is there that the formula needs to do its calculations, you should be good. And if you only need the last month's data to do this, you should perhaps move everything older to some kind of archive table.
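As a rough illustration of the half-life option with a cut-off, here is a Python sketch. The one-week half-life and 90-day cut-off are placeholder assumptions, the function name is hypothetical, and every sale is given the same initial worth of 1.

```python
from datetime import datetime, timedelta

def decayed_score(sale_times, now,
                  half_life=timedelta(weeks=1),
                  cutoff=timedelta(days=90)):
    """Sum the decayed worth of one item's sales.

    Each sale starts at 1 and halves every half_life; sales older than
    cutoff (or dated in the future) are ignored entirely.
    """
    score = 0.0
    for sold_at in sale_times:
        age = now - sold_at
        if age < timedelta(0) or age > cutoff:
            continue
        score += 0.5 ** (age / half_life)
    return score

now = datetime(2024, 1, 29)
sales = [
    now,                        # worth 1
    now - timedelta(weeks=1),   # worth 0.5
    now - timedelta(weeks=2),   # worth 0.25
    now - timedelta(days=120),  # past the cut-off, ignored
]
print(decayed_score(sales, now))
```

Computing the score from the raw sale timestamps at query time means only the date/time sold has to be stored; nothing in the table needs updating as sales age.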
You could just count the sales for the last month/week/whatever and sort your items according to that. If you want, you can always add the total amount of sold items into your formula.
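A short Python sketch of that windowed count; the function name, the (item, sold_at) tuple layout and the sample sales are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def best_sellers(sales, now, window=timedelta(days=7), top=3):
    """Count sales per item inside the window and return the top sellers.

    sales: iterable of (item, sold_at) pairs.
    """
    cutoff = now - window
    counts = Counter(item for item, sold_at in sales if cutoff <= sold_at <= now)
    return counts.most_common(top)

now = datetime(2024, 1, 29)
sales = [
    ("gizmo", now - timedelta(days=1)),
    ("gizmo", now - timedelta(days=2)),
    ("widget", now - timedelta(days=3)),
    ("widget", now - timedelta(days=40)),  # outside the one-week window
    ("widget", now - timedelta(days=50)),  # outside the one-week window
]
print(best_sellers(sales, now))
```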
You might have a table that contains the definitions of the scoring criteria (most sales, most this, most that, etc.). Then, for a given period, store in another table the points awarded for each criterion defined in the criteria table. Obviously, a historical table would be used to store the score for each seller for a given period or promotion, call it whatever you want. Does it help a little?