Two tables, Three prices, Different periods, What can I do? - sql

I have tried and failed to solve this puzzle, and now I'm looking for some help if anyone is out there. Explanation:
The first table holds the original price of an item.
Table1 (Sales_Data):
+---------+----------+-------+------------+
| Item_Id | Store_Id | Price | Sales_Date |
+---------+----------+-------+------------+
The second table holds two kinds of prices:
one is used when a store has its own price for the item; it has a 0 value in To_Date
because this price should last forever. (Let's call this the forever price.)
the other is used when a store has a different price for an item for just a period (02.03.2014-10.03.2014); let's call this the discount.
Both prices are stored in Price; the dates are the only difference.
Table2 (Discount_Data):
+---------+----------+-------+------------+
| Item_Id | Store_Id | Price | Sales_Date |
+---------+----------+-------+------------+
Now to the big question:
The forever price should always override the original price.
The discount price should always override the original or forever price for the exact period.
Item_Id and Store_Id have to be the same.
How can I go about solving this? Can anyone point me in the right direction?

I hate to make an assumption, but it appears your second table should include two more columns, From_Date and To_Date. I will alias Discount_Data forever price as ddf and Discount_Data period price as ddp
select sd.Item_Id,
sd.Store_Id,
sd.Price as OriginalPrice,
coalesce(ddp.Price,ddf.Price,sd.Price) as UpdatedPrice
from Sales_Data sd
left join Discount_Data ddf
on sd.Item_Id=ddf.Item_Id
and sd.Store_Id=ddf.Store_Id
and sd.Sales_Date >= ddf.From_Date
and ddf.To_Date=0
left join Discount_Data ddp
on sd.Item_Id=ddp.Item_Id
and sd.Store_Id=ddp.Store_Id
and sd.Sales_Date between ddp.From_Date and ddp.To_Date
Hope that helps.
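The join above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module only to make the SQL runnable; the From_Date and To_Date columns, the ISO date strings, the '0' marker stored as text, and all sample rows are assumptions for illustration, not the asker's real schema. The extra `To_Date <> '0'` guard on the period join is also an addition, so a forever row can never be picked up as a period discount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Data (Item_Id INT, Store_Id INT, Price REAL, Sales_Date TEXT);
-- From_Date / To_Date are assumed columns; To_Date = '0' marks a forever price.
CREATE TABLE Discount_Data (Item_Id INT, Store_Id INT, Price REAL,
                            From_Date TEXT, To_Date TEXT);
INSERT INTO Sales_Data VALUES
  (1, 10, 100.0, '2014-03-01'),
  (1, 10, 100.0, '2014-03-05'),
  (2, 10,  50.0, '2014-03-05');
INSERT INTO Discount_Data VALUES
  (1, 10, 90.0, '2014-01-01', '0'),           -- forever price
  (1, 10, 70.0, '2014-03-02', '2014-03-10');  -- period discount
""")

rows = conn.execute("""
SELECT sd.Item_Id,
       sd.Sales_Date,
       sd.Price AS OriginalPrice,
       -- discount wins over forever price, which wins over the original price
       COALESCE(ddp.Price, ddf.Price, sd.Price) AS UpdatedPrice
FROM Sales_Data sd
LEFT JOIN Discount_Data ddf
       ON sd.Item_Id = ddf.Item_Id
      AND sd.Store_Id = ddf.Store_Id
      AND sd.Sales_Date >= ddf.From_Date
      AND ddf.To_Date = '0'
LEFT JOIN Discount_Data ddp
       ON sd.Item_Id = ddp.Item_Id
      AND sd.Store_Id = ddp.Store_Id
      AND ddp.To_Date <> '0'
      AND sd.Sales_Date BETWEEN ddp.From_Date AND ddp.To_Date
ORDER BY sd.Item_Id, sd.Sales_Date
""").fetchall()

for row in rows:
    print(row)
```

With these sample rows, the sale on 2014-03-01 falls back to the forever price, the sale on 2014-03-05 gets the discount, and item 2 keeps its original price.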

Related

How do I iterate through subsets of a table in SQL

I'm new to SQL and I would appreciate any advice!
I have a table that stores the history of an order. It includes the following columns: ORDERID, ORDERMILESTONE, NOTES, TIMESTAMP.
There is one TIMESTAMP for every ORDERMILESTONE in an ORDERID and vice versa.
What I want to do is compare the TIMESTAMPs for certain ORDERMILESTONEs to obtain the amount of time it takes to go from start to finish or from order to shipping, etc.
To get this, I have to gather all of the rows for a specific ORDERID and then somehow iterate through them. I was trying to do this by declaring a TVP for each ORDERID, but that is going to take too long because some of my datasets are around 20000 rows long.
What do you recommend? Thanks in advance.
EDIT:
In my problem, I want to find the number of days that the order spends in QA. For example, once an order is placed, we need to make the requested item and then send it to QA, so there's a milestone "Processing" and a milestone "QA". The item could be in "Processing" then "QA" once and get shipped out, or it could be sent back to QA several times, or go back and forth between "Processing" and "Engineering". I want to find the total amount of time that the item spends in QA.
Here's some sample data:
ORDERID | ORDERMILESTONE | NOTES | TIMESTAMP
43 | Placed | newly ordered custom time machine | 07-11-2020 12:00:00
43 | Processing | first time assembling| 07-11-2020 13:00:05
43 | QA | sent to QA | 07-11-2020 13:30:12
43 | Engineering | Engineering is fixing the crank on the time machine that skips even years | 07-12-2020 13:00:02
43 | QA | Sent to QA to test the new crank. Time machine should no longer skip even years. | 07-13-2020 16:00:18
0332AT | Placed | lightsaber custom made with rainbow colors | 07-06-2020 01:00:09
0332AT | Processing| lightsaber being built | 07-06-2020 06:00:09
0332AT | QA | lightsaber being tested | 07-06-2020 06:00:09
I want the total number of days that each order spends with QA.
So I suppose I could create a lookup table that has each QA milestone and its next milestone. Then sum up the difference between each QA milestone and the one that follows. My main issue is that I don't necessarily know how many times the item will need to be sent to QA on each order...
To get the hours taken to complete a specific milestone for all orders, you can do:
select orderid,
DATEDIFF(hh, min(TIMESTAMP), max(TIMESTAMP))
from your_table
where ORDERMILESTONE = 'mile stone name'
group by orderid
Assuming you are using SQL Server and your milestones are not repeated, then you can use:
select om.orderid,
datediff(second, min(timestamp), max(timestamp))
from order_milestones om
where milestone in ('milestone1', 'milestone2')
group by om.orderid;
If you want to do this more generally on every row, you can use a cumulative aggregation function:
select om.*,
datediff(second,
timestamp,
min(case when milestone = 'order' then timestamp end) over
(partition by orderid
order by timestamp
rows between current row and unbounded following
)
) as time_to_order
from order_milestones om;
You can create a lookup table taking a milestone and giving you the previous milestone. Then you can left join to it and left join back to the original table to get the row for the same order at the previous milestone and compare the dates (sqlfiddle):
select om.*, datediff(minute, pm.TIMESTAMP, om.TIMESTAMP) as [Minutes]
from OrderMilestones om
left join MilestoneSequence ms on ms.ORDERMILESTONE = om.ORDERMILESTONE
left join OrderMilestones pm on pm.ORDERID = om.ORDERID
and pm.ORDERMILESTONE = ms.PREVIOUSMILESTONE
order by om.TIMESTAMP
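For the case in the question where an order can enter QA any number of times, one possible approach is to pair each row with the timestamp of the row that follows it in the same order, and then sum those gaps for the QA rows only. The sketch below is a minimal illustration, not a tested production query: it runs on SQLite through Python's sqlite3 module, renames the timestamp column to TS, and stores the sample timestamps as ISO strings (all assumptions). A QA visit with no later milestone is treated as still open and contributes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderMilestones (ORDERID TEXT, ORDERMILESTONE TEXT, TS TEXT);
INSERT INTO OrderMilestones VALUES
  ('43',     'Placed',      '2020-07-11 12:00:00'),
  ('43',     'Processing',  '2020-07-11 13:00:05'),
  ('43',     'QA',          '2020-07-11 13:30:12'),
  ('43',     'Engineering', '2020-07-12 13:00:02'),
  ('43',     'QA',          '2020-07-13 16:00:18'),
  ('0332AT', 'Placed',      '2020-07-06 01:00:09'),
  ('0332AT', 'Processing',  '2020-07-06 06:00:09'),
  ('0332AT', 'QA',          '2020-07-06 06:00:09');
""")

rows = conn.execute("""
WITH steps AS (
  SELECT ORDERID, ORDERMILESTONE, TS,
         -- timestamp of the next milestone within the same order
         LEAD(TS) OVER (PARTITION BY ORDERID ORDER BY TS) AS NEXT_TS
  FROM OrderMilestones
)
SELECT ORDERID,
       -- open QA visits have NEXT_TS = NULL and are skipped by SUM
       COALESCE(SUM(CAST(strftime('%s', NEXT_TS) AS INTEGER)
                  - CAST(strftime('%s', TS) AS INTEGER)), 0) AS QA_SECONDS
FROM steps
WHERE ORDERMILESTONE = 'QA'
GROUP BY ORDERID
ORDER BY ORDERID
""").fetchall()

# Convert seconds to days, since the question asks for days in QA.
qa_days = {order_id: seconds / 86400 for order_id, seconds in rows}
print(qa_days)
```

LEAD is also available in SQL Server 2012 and later, where DATEDIFF would take the place of the strftime arithmetic used here.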

Avoiding roundtrips in the database caused by looping

I am using Postgres, and I recently found that the code I am using makes too many round trips.
What I am doing is basically getting data from a table on a daily basis because I have to look for changes on a daily basis, but the whole function that does this job is called once a month.
An example of my table
Amount
Id | Itemid | Amount | Date
1 | 2 | 50 | 20-5-20
Now this table can be updated to add items at any point in time, and I have to see the total amount, that is SUM(Amount), for every day.
But here's the catch: I have to add interest to each day's amount at the rate of 5%.
So I can't just call the function once; I have to look at its value every day.
For example, if I add an item of 50$ on the 1st of May, then the interest on that day is 5/100*50.
I add another item worth 50$ on the 5th of May, and now the interest on the 5th is 5/100*100.
But prior to the 5th, the interest was on only 50$, so simply using SUM(Amount)*5/100 is wrong.
Also, another issue is the fact that dates are stored as timestamps and I need to group it by date of the timestamp because if I group it on the basis of timestamp then it will create multiple rows for the same date which I want to avoid while taking the sum.
So if there are two entries on the same date but different hours ideally the query should sum it up as one single date.
Example
Amount Table
Date | Amount
2020-5-5 20:8:8 | 100
2020-5-5 7:8:8 | 100
Result should be
Amount Table
Date | Amount
2020-5-5 | 200
My current code.
for i in numberofdaysinthemonth:
    amount = amount + session.query(func.sum(Amount.Amount)).filter(Amount.date<current_date).scalar() * 5/100
I want a query that gets all these values according to dates, for example
date | Sum of amount till that date
20-5-20 | 50
20-6-20 | 100
Any ideas about what I should do to avoid a loop that runs 30 times, given that the function is called once a month?
I am supposed to get all this data in a table, day by day, aggregated as the sum of the amount for each day.
That is a simple "running total":
select "date",
sum(amount) over (order by "date") as amount_til_date
from the_table
order by "date";
If you need the amount per itemid:
select "date",
sum(amount) over (partition by itemid order by "date") as amount_til_date
from the_table
order by "date";
If you also need to calculate the "compound interest rate" up to that day, you can do that as well:
select itemid,
"date",
sum(amount) over (partition by itemid order by "date") as amount_til_date,
sum(amount) over (partition by itemid order by "date") * power(1.05, count(*) over (partition by itemid order by "date")) as compound_interest
from the_table
from the_table
order by "date";
To get that for a specific month, add a WHERE clause:
where "date" >= date '2020-06-01'
and "date" < date '2020-07-01'
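The question also asks for timestamps on the same calendar day to be collapsed into one row before the running total is taken. A minimal sketch of that combination follows: first aggregate per day, then apply the window function over the daily sums. It runs on SQLite through Python's sqlite3 module so it can be executed as is; the column name ts, the third sample row, and the simple (non-compounding) 5% interest column are assumptions for illustration. In PostgreSQL the day would more typically be derived with `cast(ts as date)` instead of SQLite's `date(ts)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (id INTEGER, itemid INTEGER, amount REAL, ts TEXT);
INSERT INTO the_table VALUES
  (1, 2, 100, '2020-05-05 20:08:08'),
  (2, 3, 100, '2020-05-05 07:08:08'),
  (3, 2,  50, '2020-05-07 09:00:00');  -- extra illustrative row
""")

rows = conn.execute("""
WITH daily AS (
  -- collapse all timestamps of one calendar day into a single row
  SELECT date(ts) AS day, SUM(amount) AS day_amount
  FROM the_table
  GROUP BY date(ts)
)
SELECT day,
       day_amount,
       SUM(day_amount) OVER (ORDER BY day) AS amount_til_date,
       -- 5% of the running total on that day
       SUM(day_amount) OVER (ORDER BY day) * 5 / 100 AS interest_that_day
FROM daily
ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

Days without any rows (2020-05-06 here) do not appear in the result; if every calendar day is needed, the daily CTE would have to be joined to a generated list of dates.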
In general, to avoid round trips between application and database, application code must be moved from the application into the database as stored code (stored procedures and stored functions) written in a procedural language. This approach is sometimes called "thick database" in commercial databases like Oracle Database.
PostgreSQL's default procedural language is PL/pgSQL, but you can also use Java, Perl, Python, or JavaScript through extensions that you would need to install in PostgreSQL.

How to join rows from the same table with different dates

I am only a beginner in SQL and I am encountering the following problem:
I have a table with a list of SKU orders where each row displays the SKU, DELIVERY DATE, and ORDER QUANTITY. I want to rearrange the table so that each row contains not only the delivery date for that quantity, but also the next delivery date that occurs after it.
The table currently looks like that:
SKU/ DELIVERY_DATE/ QUANTITY_ORDERED
1.SKUx 14/3/2020 200
2.SKUx 19/3/2020 400
3.SKUx 27/3/2020 550
What I want to achieve is this:
SKU/ DELIVERY_DATE/ **NEXT_DELIVERY_DATE**/ QUANTITY_ORDERED
1.SKUx 14/3/2020 **19/3/2020** 200
2.SKUx 19/3/2020 **27/3/2020** 400
3.SKUx 27/3/2020 **NULL** 550
Keep in mind, as shown above, that the number of days between two deliveries varies (5 days between 14/3 and 19/3, and 8 days between 19/3 and 27/3), so I cannot pick a fixed offset to derive the next date, e.g.
SELECT SKU, DELIVERY_DATE,
DELIVERY_DATE + 5 AS NEXT_DELIVERY_DATE,
QUANTITY_ORDERED
FROM TABLE1
Any help is much appreciated!
Use lead():
select t1.*,
lead(delivery_date) over (partition by sku order by delivery_date) as next_delivery_date
from table1 t1
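A small runnable sketch of the lead() approach, using the three sample rows from the question. It runs on SQLite via Python's sqlite3 module purely for demonstration; storing the dates as ISO strings (so that they sort correctly as text) is an assumption, and a real table would normally use a proper date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (sku TEXT, delivery_date TEXT, quantity_ordered INTEGER);
INSERT INTO table1 VALUES
  ('SKUx', '2020-03-14', 200),
  ('SKUx', '2020-03-19', 400),
  ('SKUx', '2020-03-27', 550);
""")

rows = conn.execute("""
SELECT sku,
       delivery_date,
       -- next delivery for the same SKU; NULL for the last one
       LEAD(delivery_date) OVER (PARTITION BY sku ORDER BY delivery_date)
         AS next_delivery_date,
       quantity_ordered
FROM table1
ORDER BY sku, delivery_date
""").fetchall()

for row in rows:
    print(row)
```

The last delivery of each SKU has no successor, so its next_delivery_date comes back as NULL (None in Python), matching the desired output in the question.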

SQL Pattern to select only first row from a group

I have a table which has a large number of entries and from which I need only the first in each group.
The table is used to store daily fund prices (1000+ funds) over the last 30 years. I need to find the last price prior to, or on a specific date for each fund existing on that date (so only one row per fund).
In its simplified form, the table has columns Date, FundCode and Price.
The following input
Date FundCode Price
2016/01/05 X123 1.234
2016/01/04 X123 1.233
2016/01/03 X123 1.222
2016/01/05 A456 1.876
2016/01/04 A456 1.822
2016/01/03 A456 1.776
2016/01/03 M234 3.234
...when queried for 2016/01/04, should produce
Date FundCode Price
2016/01/04 X123 1.233
2016/01/04 A456 1.822
2016/01/03 M234 3.234
I have a solution which uses a correlated subquery in the where but no amount of messing with indexes will make it run in a reasonable amount of time.
I'm sure there's a straightforward solution to this but I just can't see it.
Something like
SELECT fundCode, price, MAX(date) AS date FROM your_table WHERE date<='date_you_need' GROUP BY fundCode
A query like this works in SQLite, where the bare columns take their values from the row that holds the MAX. Which database do you use?
A single nested query gives me the max date for each fund, then inner join to this on fundCode/Date, thus...
SELECT
H.Date,
H.FundCode,
H.Price
FROM
PriceHistory H
INNER JOIN
/* this nested query gives the max date for each fund*/
(SELECT
FundCode,
max(Date) AS MaxDate
FROM
PriceHistory H2
WHERE
Date<=#DateToSearchFor
GROUP BY
FundCode) AS RowSelector
ON H.FundCode=RowSelector.FundCode AND H.Date=RowSelector.MaxDate
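Another way to express "latest row per fund on or before a date" is to number the rows of each fund from newest to oldest and keep row number 1. The sketch below is one possible formulation, not a benchmarked answer to the performance problem: it uses ROW_NUMBER() on SQLite through Python's sqlite3 module, with the sample data from the question rewritten as ISO date strings (an assumption) and the search date passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PriceHistory (Date TEXT, FundCode TEXT, Price REAL);
INSERT INTO PriceHistory VALUES
  ('2016-01-05', 'X123', 1.234),
  ('2016-01-04', 'X123', 1.233),
  ('2016-01-03', 'X123', 1.222),
  ('2016-01-05', 'A456', 1.876),
  ('2016-01-04', 'A456', 1.822),
  ('2016-01-03', 'A456', 1.776),
  ('2016-01-03', 'M234', 3.234);
""")

date_to_search_for = "2016-01-04"

rows = conn.execute("""
SELECT Date, FundCode, Price
FROM (
  SELECT Date, FundCode, Price,
         -- newest row per fund gets rn = 1
         ROW_NUMBER() OVER (PARTITION BY FundCode ORDER BY Date DESC) AS rn
  FROM PriceHistory
  WHERE Date <= ?
) AS ranked
WHERE rn = 1
ORDER BY FundCode
""", (date_to_search_for,)).fetchall()

for row in rows:
    print(row)
```

Whether this is faster than the join on the grouped subquery depends on the database and its indexes; an index on (FundCode, Date) is the usual candidate for both forms.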

Finding StartDate,EndDate from overriding StartDates

I have two tables with the following (simplified) structures:
table "Factors" which holds data about purchased goods' factors and has these columns:
FactorSerial, PurchaseDate, PurchasedGood
table "Prices" which holds goods prices on different dates
Serial, GoodCode, EvaluationDate, Price
A price is valid until a new row with the same Code but different date is added and thus updates its value
Now, I want to create a table which adds the price to the table 1 according to purchase date.
So if we have:
PurchaseDate PurchasedGood
-----------------------------
05/20/2011 A111
and:
GoodCode EvaluationDate Price
--------------------------------
A111 02/01/2011 100
...
A111 04/01/2011 110
...
A111 06/01/2011 120
the result would be
PurchaseDate PurchasedGood Price
-----------------------------------
05/20/2011 A111 110
Preferred method is creating the view Prices1 as
Serial GoodCode StartDate EndDate Price
and then joining Factors with this view by
PurchasedDate between StartDate AND EndDate
Can anybody show me how to create the Prices1 view (or how to obtain the final result with any other method)? Thanks in advance!
P.S. sorry for my bad English!
I want to create a table which adds the price to the table 1 according to purchase date.
Here is a query that returns such data. The syntax is pretty standard SQL, I believe, but this was tested on SQL Server (looks like you may be using PostgreSQL with your "serial" naming).
select a.FactorSerial, a.PurchasedGood, a.PurchaseDate
, (select max(Price) from Prices where GoodCode = a.PurchasedGood and EvaluationDate = a.EvaluationDate) as Price
from (
select f.FactorSerial, f.PurchasedGood, f.PurchaseDate, max(p.EvaluationDate) as EvaluationDate
from Factors as f
join Prices as p on f.PurchasedGood = p.GoodCode
where f.PurchaseDate >= p.EvaluationDate
group by f.FactorSerial, f.PurchasedGood, f.PurchaseDate
) as a
This query assumes that there are no Purchases before a Price existed.
Also, considering:
Preferred method is creating the view Prices1 as
Serial GoodCode StartDate EndDate Price
and then joining Factors with this view by
PurchasedDate between StartDate AND EndDate
BETWEEN is inclusive. Using the method you've described, you would get duplicates if a PurchaseDate falls on both the EndDate of one row and the StartDate of the next.
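If the Prices1 view is still preferred, one way to sidestep the inclusive-BETWEEN problem is to derive EndDate as the next EvaluationDate for the same good and join on a half-open interval (StartDate inclusive, EndDate exclusive). The following is a sketch under assumptions: it runs on SQLite via Python's sqlite3 module, uses ISO date strings, leaves EndDate NULL for the current price, and adds a second, hypothetical purchase on 2011-06-01 to show that a boundary date matches exactly one price row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Factors (FactorSerial INTEGER, PurchaseDate TEXT, PurchasedGood TEXT);
CREATE TABLE Prices (Serial INTEGER, GoodCode TEXT, EvaluationDate TEXT, Price REAL);
INSERT INTO Factors VALUES
  (1, '2011-05-20', 'A111'),
  (2, '2011-06-01', 'A111');  -- hypothetical purchase on a boundary date
INSERT INTO Prices VALUES
  (1, 'A111', '2011-02-01', 100),
  (2, 'A111', '2011-04-01', 110),
  (3, 'A111', '2011-06-01', 120);

-- EndDate is the start of the next price; NULL means "still valid".
CREATE VIEW Prices1 AS
SELECT Serial,
       GoodCode,
       EvaluationDate AS StartDate,
       LEAD(EvaluationDate) OVER (PARTITION BY GoodCode
                                  ORDER BY EvaluationDate) AS EndDate,
       Price
FROM Prices;
""")

rows = conn.execute("""
SELECT f.FactorSerial, f.PurchaseDate, f.PurchasedGood, p.Price
FROM Factors f
JOIN Prices1 p
  ON p.GoodCode = f.PurchasedGood
 AND f.PurchaseDate >= p.StartDate
 AND (p.EndDate IS NULL OR f.PurchaseDate < p.EndDate)
ORDER BY f.FactorSerial
""").fetchall()

for row in rows:
    print(row)
```

As with the query above, a purchase dated before the first price of its good matches no row and drops out of this inner join.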