Compare values from one column in table A and another column in table B - sql

I need to create a NeedDate column in the expected output. I will compare the QtyShort from Table B with QtyReceive from table A.
In the expected output, if QtyShort = 0, NeedDate = MatlDueDate.
For the first row of table A, if 0 < QtyShort (in Table B) <= QtyReceive (=6), NeedDate = 10/08/2021 (DueDate from Table A).
If 6 < QtyShort <= 10 (QtyReceive), move to the second row, NeedDate = 10/22/2021 (DueDate from Table A).
If 10 < QtyShort <= 20 (QtyReceive), move to the third row, NeedDate = 02/01/2022 (DueDate from Table A).
If QtyShort > QtyReceive (=20), NeedDate = 09/09/9999.
This should continue in a loop until the last row in table B has been compared.
How could we do this? Any help will be appreciated. Thank you in advance!
Table A
Item DueDate QtyReceive
A1 10/08/2021 6
A1 10/22/2021 10
A1 02/01/2022 20
Table B
Item MatlDueDate QtyShort
A1 06/01/2022 0
A1 06/02/2022 0
A1 06/03/2022 1
A1 06/04/2022 2
A1 06/05/2022 5
A1 06/06/2022 7
A1 06/07/2022 10
A1 06/08/2022 15
A1 06/09/2022 25
Expected Output:
Item MatlDueDate QtyShort NeedDate
A1 06/01/2022 0 06/01/2022
A1 06/02/2022 0 06/02/2022
A1 06/03/2022 1 10/08/2021
A1 06/04/2022 2 10/08/2021
A1 06/05/2022 5 10/08/2021
A1 06/06/2022 7 10/22/2021
A1 06/07/2022 10 10/22/2021
A1 06/08/2022 15 02/01/2022
A1 06/09/2022 25 09/09/9999

Use the OUTER APPLY operator to find the earliest DueDate in TableA whose QtyReceive is enough to cover the QtyShort:
select b.Item, b.MatlDueDate, b.QtyShort,
NeedDate = case when b.QtyShort = 0
then b.MatlDueDate
else isnull(a.DueDate, '9999-09-09')
end
from TableB b
outer apply
(
select DueDate = min(a.DueDate)
from TableA a
where a.Item = b.Item
and a.QtyReceive >= b.QtyShort
) a
Result:
Item MatlDueDate QtyShort NeedDate
A1 2022-06-01 0 2022-06-01
A1 2022-06-02 0 2022-06-02
A1 2022-06-03 1 2021-10-08
A1 2022-06-04 2 2021-10-08
A1 2022-06-05 5 2021-10-08
A1 2022-06-06 7 2021-10-22
A1 2022-06-07 10 2021-10-22
A1 2022-06-08 15 2022-02-01
A1 2022-06-09 25 9999-09-09
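To check the logic outside SQL Server, here is a minimal sketch that runs the same lookup in SQLite through Python's built-in sqlite3 module. SQLite has no OUTER APPLY or ISNULL, so this version swaps in a correlated scalar subquery and COALESCE; storing the dates as ISO 'YYYY-MM-DD' text is an assumption that lets MIN() pick the earliest date. Like the answer, it relies on QtyReceive growing with DueDate (6, 10, 20), so the earliest qualifying row is also the correct bucket.

```python
import sqlite3

# In-memory copy of the sample data; dates stored as ISO text (assumption)
# so that MIN() and comparisons sort chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Item TEXT, DueDate TEXT, QtyReceive INTEGER);
CREATE TABLE TableB (Item TEXT, MatlDueDate TEXT, QtyShort INTEGER);
INSERT INTO TableA VALUES
  ('A1', '2021-10-08', 6), ('A1', '2021-10-22', 10), ('A1', '2022-02-01', 20);
INSERT INTO TableB VALUES
  ('A1', '2022-06-01', 0), ('A1', '2022-06-02', 0), ('A1', '2022-06-03', 1),
  ('A1', '2022-06-04', 2), ('A1', '2022-06-05', 5), ('A1', '2022-06-06', 7),
  ('A1', '2022-06-07', 10), ('A1', '2022-06-08', 15), ('A1', '2022-06-09', 25);
""")

# Correlated scalar subquery in place of OUTER APPLY: MIN() over zero
# matching rows yields NULL, which COALESCE turns into the 9999 sentinel.
rows = conn.execute("""
SELECT b.Item, b.MatlDueDate, b.QtyShort,
       CASE WHEN b.QtyShort = 0 THEN b.MatlDueDate
            ELSE COALESCE((SELECT MIN(a.DueDate)
                           FROM TableA a
                           WHERE a.Item = b.Item
                             AND a.QtyReceive >= b.QtyShort),
                          '9999-09-09')
       END AS NeedDate
FROM TableB b
ORDER BY b.MatlDueDate
""").fetchall()

for row in rows:
    print(row)
```

The printed NeedDate column should match the result table above.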

Related

Display values that are out of scope in addition to in scope

So I have a complex situation.
I have 3 tables:
Product
Resource Name | Resource Type
C1 | Bold
E2 | Crema
C2 | Bold
C3 | Bold
Purchase_History
Resource Name | Qty | Cust ID | Date | Batch
C1 | 7 | 123 | Jun 1 | 324
C1 | 7 | 222 | Jun 10 | 324
C1 | 7 | 333 | Jun 11 | 4BZ
C1 | 7 | 124 | Jun 11 | 4BZ
C1 | 7 | 125 | Jun 11 | 324
C1 | 7 | 111 | Jun 21 | 324
C2 | 7 | 55 | Jun 22 | A22
C2 | 7 | 1 | Jun 24 | A22
Inventory
Resource Name | Available | Qty | Batch
C1 | 1 | 40 | 324
C2 | 1 | 50 | 3GC
C1 | 2 | 0 | 4BZ
C2 | 1 | 99 | A22
E2 | 1 | 99 | B22
E2 | 2 | 0 | C22
So I've created a query as below:
Select
p.resource_name
, ph.cust_id
, ph.batch
, case when i.available=1 then 'Yes' when i.available=2 then 'No' else '' end 'In Stock'
from product p
join purchase_history ph on ph.resource_name=p.resource_name
join inventory i on i.batch=ph.batch
where
ph.date >='Jun 1'
and ph.date <='Jun 20'
I am getting the following:
Resource Name | Cust Id | Batch | In Stock
C1 | 123 | 324 | Yes
C1 | 222 | 324 | Yes
C1 | 333 | 4BZ | No
C1 | 124 | 4BZ | No
C1 | 123 | 324 | Yes
What I would like to achieve is the output below, where the remaining batches and products still appear even though they fall outside the date range of the transactions. I know this is a weird ask, but essentially the team wants to see what has been sold so far (within the date range) along with the availability status of every product. Is this achievable?
Resource Name | Cust Id | Batch | In Stock
C1 | 123 | 324 | Yes
C1 | 222 | 324 | Yes
C1 | 333 | 4BZ | No
C1 | 124 | 4BZ | No
C1 | 123 | 324 | Yes
C2 | n/a | 3GC | Yes
C2 | n/a | A22 | Yes
E2 | n/a | B22 | Yes
E2 | n/a | C22 | No
You need to use a LEFT JOIN to the purchase_history table, and the date filter belongs in its ON clause: in the WHERE clause it would discard the NULL rows that the LEFT JOIN adds for batches with no purchases in the range.
Select
p.resource_name
, ph.cust_id
, i.batch
, case when i.available=1 then 'Yes' when i.available=2 then 'No' else '' end 'In Stock'
from product p
join inventory i on i.resource_name=p.resource_name
left join purchase_history ph
on ph.resource_name=p.resource_name
and i.batch=ph.batch
and ph.date >='Jun 1' and ph.date <='Jun 20'
Notice that I changed the order of the tables for better readability.
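A minimal sketch of the LEFT JOIN approach, run in SQLite via Python's sqlite3 module against the sample data. Assumptions: dates are stored as 'MM-DD' text (the question gives only month and day) so that string comparison is chronological, and COALESCE(..., 'n/a') is an addition that reproduces the n/a placeholder from the desired output. C3 drops out because it has no inventory row. Note that the question's sample output lists customer 123 twice for batch 324, while Purchase_History shows customer 125 for the Jun 11 purchase of that batch, so this query returns 125.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (resource_name TEXT, resource_type TEXT);
CREATE TABLE purchase_history (resource_name TEXT, qty INTEGER, cust_id INTEGER,
                               date TEXT, batch TEXT);
CREATE TABLE inventory (resource_name TEXT, available INTEGER, qty INTEGER, batch TEXT);
INSERT INTO product VALUES ('C1','Bold'), ('E2','Crema'), ('C2','Bold'), ('C3','Bold');
INSERT INTO purchase_history VALUES
  ('C1',7,123,'06-01','324'), ('C1',7,222,'06-10','324'), ('C1',7,333,'06-11','4BZ'),
  ('C1',7,124,'06-11','4BZ'), ('C1',7,125,'06-11','324'), ('C1',7,111,'06-21','324'),
  ('C2',7,55,'06-22','A22'),  ('C2',7,1,'06-24','A22');
INSERT INTO inventory VALUES
  ('C1',1,40,'324'), ('C2',1,50,'3GC'), ('C1',2,0,'4BZ'),
  ('C2',1,99,'A22'), ('E2',1,99,'B22'), ('E2',2,0,'C22');
""")

# The date range sits in the ON clause, so batches without purchases in the
# range survive the LEFT JOIN with a NULL cust_id (shown as 'n/a').
rows = conn.execute("""
SELECT p.resource_name,
       COALESCE(CAST(ph.cust_id AS TEXT), 'n/a') AS cust_id,
       i.batch,
       CASE WHEN i.available = 1 THEN 'Yes'
            WHEN i.available = 2 THEN 'No'
            ELSE '' END AS in_stock
FROM product p
JOIN inventory i ON i.resource_name = p.resource_name
LEFT JOIN purchase_history ph
       ON ph.resource_name = p.resource_name
      AND ph.batch = i.batch
      AND ph.date BETWEEN '06-01' AND '06-20'
ORDER BY p.resource_name, i.batch, ph.cust_id
""").fetchall()

for row in rows:
    print(row)
```

Moving the BETWEEN condition into a WHERE clause instead would drop the four n/a rows, which is the behaviour the question was running into.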

SQL/Presto: how to rank within a subgroup of each group

I have a table like the following:
group_id sub_group_id user_id score time
1 a1 ann 1 2019
1 a1 bob 1 2020
1 a2 cat 0 2020
2 b1 dan 0 2019
2 b1 eva 0 2019
2 b1 ed 1 2020
2 b2 liz 1 2020
I want to rank each user_id within the subgroup of each group by its score and then by time (earlier is better). So the desired output is:
group_id sub_group_id user_id score time rank
1 a1 ann 1 2019 1
1 a1 bob 1 2020 2
1 a2 cat 0 2020 1
2 b1 dan 0 2019 1
2 b1 eva 0 2019 1
2 b1 ed 1 2020 2
2 b2 liz 1 2020 1
Use rank():
select t.*,
rank() over (partition by group_id, sub_group_id order by score desc, time) as ranking
from t;
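Actually, I'm not sure whether higher scores are better than lower ones, so you might want score asc. Also note that rank() leaves gaps after ties: with score asc, dan and eva both get 1 and ed gets 3. Use dense_rank() if ed should get 2, as in the desired output.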
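The sketch below checks the ordering against the sample data in SQLite (3.25 or later, which added window functions) through Python's sqlite3 module, using score ascending as the desired output implies. It returns both RANK() and DENSE_RANK() so the difference at the dan/eva tie is visible; Presto supports both functions with the same OVER (PARTITION BY ... ORDER BY ...) syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (group_id INTEGER, sub_group_id TEXT, user_id TEXT,
                score INTEGER, time INTEGER);
INSERT INTO t VALUES
  (1,'a1','ann',1,2019), (1,'a1','bob',1,2020), (1,'a2','cat',0,2020),
  (2,'b1','dan',0,2019), (2,'b1','eva',0,2019), (2,'b1','ed',1,2020),
  (2,'b2','liz',1,2020);
""")

# Same partition and ordering for both functions; only the tie handling differs.
rows = conn.execute("""
SELECT user_id,
       RANK()       OVER (PARTITION BY group_id, sub_group_id ORDER BY score, time) AS rnk,
       DENSE_RANK() OVER (PARTITION BY group_id, sub_group_id ORDER BY score, time) AS dense_rnk
FROM t
""").fetchall()

ranks = {user: (rnk, dense_rnk) for user, rnk, dense_rnk in rows}
print(ranks)
```

Only ed differs between the two columns: RANK() gives 3 and DENSE_RANK() gives 2.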

Create column based on other column but with conditions

I'm very new to SQL and I simply cannot work out a method for the following.
I have this table with the start dates of the codes:
Row Code Start_Date Product
1 A1 2020-01-01 X
2 A1 2020-05-15 Y
3 A2 2020-02-02 X
4 A3 2020-01-31 Z
5 A3 2020-02-15 Y
6 A3 2020-12-31 X
Ultimately I need to be able to query another table and find out which Product a code had on a certain date, so code A1 on 2020-01-10 was X, but code A1 today is Y.
I think I can work out how to use BETWEEN in the WHERE clause, but I cannot work out how to alter the table to have an End_Date so it looks like this:
Row Code Start_Date Product End_Date
1 A1 2020-01-01 X 2020-05-14
2 A1 2020-05-15 Y NULL
3 A2 2020-02-02 X NULL
4 A3 2020-01-31 Z 2020-02-14
5 A3 2020-02-15 Y 2020-12-30
6 A3 2020-12-31 X NULL
Please note the database does not have an End_Date field.
I think you want lead():
select t.*,
dateadd(day, -1,
lead(start_date) over (partition by code order by start_date)
) as end_date
from t;
Note: I would recommend not subtracting one day, so that the end date is non-inclusive. This makes each end date the same as the next start date, which I find makes it easier to ensure that there are no gaps or overlaps in the data.
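Here is a minimal sketch of the same idea in SQLite through Python's sqlite3 module (SQLite 3.25 or later for LEAD). SQLite has no DATEADD, so date(..., '-1 day') stands in for it, and the Row column is renamed row_id here to avoid clashing with the ROW keyword. The second query does the point-in-time lookup from the question with a non-inclusive next start date, as the note above recommends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (row_id INTEGER, code TEXT, start_date TEXT, product TEXT);
INSERT INTO t VALUES
  (1,'A1','2020-01-01','X'), (2,'A1','2020-05-15','Y'), (3,'A2','2020-02-02','X'),
  (4,'A3','2020-01-31','Z'), (5,'A3','2020-02-15','Y'), (6,'A3','2020-12-31','X');
""")

# Inclusive end date: the next start date for the same code, minus one day.
# date(NULL, '-1 day') is NULL, so the latest row per code stays open-ended.
rows = conn.execute("""
SELECT row_id, code, start_date, product,
       date(LEAD(start_date) OVER (PARTITION BY code ORDER BY start_date),
            '-1 day') AS end_date
FROM t
ORDER BY row_id
""").fetchall()

# Point-in-time lookup with a non-inclusive upper bound (next start date).
product = conn.execute("""
SELECT product FROM (
  SELECT code, product, start_date,
         LEAD(start_date) OVER (PARTITION BY code ORDER BY start_date) AS next_start
  FROM t) AS periods
WHERE code = 'A1' AND start_date <= '2020-01-10'
  AND (next_start IS NULL OR '2020-01-10' < next_start)
""").fetchone()[0]

print(rows)
print(product)
```

Because the lookup compares against the next start date directly, it never needs the minus-one-day arithmetic at all.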

create a new table from 2 other tables

I want to fill a table a from 2 other tables, b and c, where table a contains the columns (Parent, Style, Ending_Date, WeekNum, Net_Requirment), and calculate how much is required to make product A on a certain date.
The table should look like a BOM (bill of materials).
Can this be done with pandas?
Table b represents the demand for product A per date:
Style Date WeekNum Quantity
A 24/11/2019 0 600
A 01/12/2019 1 500
Table c represents the components and quantities used to make product A:
Parent Child Q
A A1 2
A1 A11 3
A1 A12 2
So table a should be filled like this:
Parent Child Date WeekNum Net_Quantity
A A1 24/11/2019 0 1200
A1 A11 24/11/2019 0 3600
A1 A12 24/11/2019 0 2400
A A1 01/12/2019 1 1000
A1 A11 01/12/2019 1 3000
A1 A12 01/12/2019 1 2000
Welcome! To merge these tables properly, you need a common key to merge on. What you could do is add such a key to each table, like this:
import pandas as pd
data2 = {'Parent':['A','A1','A1'], 'Child':['A1','A11','A12'],
'Q':[2,3,2], 'Style':['A','A','A']}
df2 = pd.DataFrame(data2)
After this you can do a left join from the first table, which gives you multiple rows for the same date. So essentially this (notice that with a left join, each row of the left table is repeated as many times as there are matching keys in the right table):
data = {'Style':['A','A'], 'Date':['24/11/2019', '01/12/2019'],
'WeekNum':[0,1], 'Quantity':[600,500]}
df = pd.DataFrame(data)
mergeDf = df.merge(df2,how='left', left_on='Style', right_on='Style')
mergeDf
Then to calculate:
mergeDf['Net_Quantity'] = mergeDf.Quantity * mergeDf.Q
mergeDf.drop(['Q'], axis=1, inplace=True)
result:
Style Date WeekNum Quantity Parent Child Net_Quantity
0 A 24/11/2019 0 600 A A1 1200
1 A 24/11/2019 0 600 A1 A11 1800
2 A 24/11/2019 0 600 A1 A12 1200
3 A 01/12/2019 1 500 A A1 1000
4 A 01/12/2019 1 500 A1 A11 1500
5 A 01/12/2019 1 500 A1 A12 1000
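The merge above multiplies each component's Q by the top-level demand only, so A11 comes out as 1800 rather than the 3600 in the expected table, where A11's 3 per A1 is also multiplied through A1's 2 per A. One way to get the expected numbers is to walk the BOM recursively. The sketch below does this in plain Python rather than pandas and assumes the BOM contains no cycles; the names demand, bom and explode are illustrative, not from the question.

```python
from collections import defaultdict

# Demand for the top-level style (table b): (date, week_num, quantity).
demand = [("24/11/2019", 0, 600), ("01/12/2019", 1, 500)]

# Bill of materials (table c): parent -> list of (child, quantity per parent).
bom = defaultdict(list)
for parent, child, q in [("A", "A1", 2), ("A1", "A11", 3), ("A1", "A12", 2)]:
    bom[parent].append((child, q))

def explode(parent, qty, date, week):
    """Yield (parent, child, date, week, net_quantity), depth-first through the BOM."""
    for child, q in bom[parent]:
        need = qty * q  # a child's need scales with its parent's need
        yield (parent, child, date, week, need)
        yield from explode(child, need, date, week)

table_a = [row for date, week, qty in demand for row in explode("A", qty, date, week)]

for row in table_a:
    print(row)
```

If you want the result back in pandas, pd.DataFrame(table_a, columns=['Parent', 'Child', 'Date', 'WeekNum', 'Net_Quantity']) turns the list of tuples into the expected table.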

SQL: Counting and Numbering Duplicates - Optimising Correlated Subquery

In an SQLite database I have one table where I need to count the duplicates across certain columns (i.e. rows where 3 particular columns are the same) and then also number each of these cases (i.e. if there are 2 occurrences of a particular duplicate, they need to be numbered as 1 and 2). I'm finding it a bit difficult to explain in words so I'll use a simplified example below.
The data I have is similar to the following (first line is header row, table is referenced in following as "idcountdata"):
id match1 match2 match3 data
1 AbCde BC 0 data01
2 AbCde BC 0 data02
3 AbCde BC 1 data03
4 AbCde AB 0 data04
5 FGhiJ BC 0 data05
6 FGhiJ AB 0 data06
7 FGhiJ BC 1 data07
8 FGhiJ BC 1 data08
9 FGhiJ BC 2 data09
10 HkLMop BC 1 data10
11 HkLMop BC 1 data11
12 HkLMop BC 1 data12
13 HkLMop DE 1 data13
14 HkLMop DE 2 data14
15 HkLMop DE 2 data15
16 HkLMop DE 2 data16
17 HkLMop DE 2 data17
And the output I need to generate for the above would be:
id match1 match2 match3 data matchid matchcount
1 AbCde BC 0 data01 1 2
2 AbCde BC 0 data02 2 2
3 AbCde BC 1 data03 1 1
4 AbCde AB 0 data04 1 1
5 FGhiJ BC 0 data05 1 1
6 FGhiJ AB 0 data06 1 1
7 FGhiJ BC 1 data07 1 2
8 FGhiJ BC 1 data08 2 2
9 FGhiJ BC 2 data09 1 1
10 HkLMop BC 1 data10 1 3
11 HkLMop BC 1 data11 2 3
12 HkLMop BC 1 data12 3 3
13 HkLMop DE 1 data13 1 1
14 HkLMop DE 2 data14 1 4
15 HkLMop DE 2 data15 2 4
16 HkLMop DE 2 data16 3 4
17 HkLMop DE 2 data17 4 4
Previously I was using a couple of correlated subqueries to achieve this as follows:
SELECT id, match1, match2, match3, data,
(SELECT count(*) FROM idcountdata d2
WHERE d1.match1=d2.match1 AND d1.match2=d2.match2 AND d1.match3=d2.match3
AND d2.id<=d1.id)
AS matchid,
(SELECT count(*) FROM idcountdata d2
WHERE d1.match1=d2.match1 AND d1.match2=d2.match2 AND d1.match3=d2.match3)
AS matchcount
FROM idcountdata d1;
But the table has over 200,000 rows (and the data can be variable in length/content) and hence this takes hours to run. (Strangely, when I first used the same query on the same data back in mid-to-late 2013 it took minutes rather than hours, but that is beside the point - even back then I thought it was inelegant and inefficient.)
I've already converted the correlated subquery for "matchcount" in the above to an uncorrelated subquery with a JOIN as follows:
SELECT d1.id, d1.match1, d1.match2, d1.match3, d1.data,
matchcount
FROM idcountdata d1
JOIN
(SELECT id,match1,match2,match3,count(*) matchcount
FROM idcountdata
GROUP BY match1,match2,match3) d2
ON (d1.match1=d2.match1 and d1.match2=d2.match2 and d1.match3=d2.match3);
So it's just the subquery for "matchid" that I would like some help to optimise.
In short, the following query runs too slowly for larger datasets:
SELECT id, match1, match2, match3, data,
(SELECT count(*) FROM idcountdata d2
WHERE d1.match1=d2.match1 AND d1.match2=d2.match2 AND d1.match3=d2.match3
AND d2.id<=d1.id)
matchid
FROM idcountdata d1;
How can I improve the performance of the above query?
It doesn't have to run in seconds, but it needs to be minutes rather than hours (for around 200,000 rows).
A self join may be faster than a correlated subquery:
SELECT d1.id, d1.match1, d1.match2, d1.match3, d1.data, count(*) matchid
FROM idcountdata d1
JOIN idcountdata d2 on d1.match1 = d2.match1
and d1.match2 = d2.match2
and d1.match3 = d2.match3
and d1.id >= d2.id
GROUP BY d1.id, d1.match1, d1.match2, d1.match3, d1.data
This query can take advantage of a composite index on (match1, match2, match3, id).
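If the SQLite version is 3.25 or later (window functions arrived in 2018), both numbers can be computed in a single pass with ROW_NUMBER() and COUNT(*) over the same partition, with no self join or correlated subquery. The sketch below, run through Python's sqlite3 module on the sample data, is an alternative to the self join rather than the answerer's method; the index name idx_match is hypothetical. On an older SQLite, the self join remains the option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE idcountdata (
    id INTEGER PRIMARY KEY, match1 TEXT, match2 TEXT, match3 INTEGER, data TEXT)""")
# Hypothetical index name; same column order as the composite index suggested above.
conn.execute("CREATE INDEX idx_match ON idcountdata (match1, match2, match3, id)")

# The 17 sample rows: (match1, match2, match3) in id order.
keys = ([("AbCde", "BC", 0)] * 2
        + [("AbCde", "BC", 1), ("AbCde", "AB", 0), ("FGhiJ", "BC", 0), ("FGhiJ", "AB", 0)]
        + [("FGhiJ", "BC", 1)] * 2 + [("FGhiJ", "BC", 2)]
        + [("HkLMop", "BC", 1)] * 3 + [("HkLMop", "DE", 1)] + [("HkLMop", "DE", 2)] * 4)
conn.executemany("INSERT INTO idcountdata VALUES (?, ?, ?, ?, ?)",
                 [(i, *k, f"data{i:02d}") for i, k in enumerate(keys, start=1)])

# ROW_NUMBER() numbers each duplicate by id; COUNT(*) over the whole
# partition gives the size of its duplicate group.
rows = conn.execute("""
SELECT id, match1, match2, match3, data,
       ROW_NUMBER() OVER (PARTITION BY match1, match2, match3 ORDER BY id) AS matchid,
       COUNT(*)     OVER (PARTITION BY match1, match2, match3)             AS matchcount
FROM idcountdata
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

The matchid and matchcount columns should reproduce the expected output table in the question.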