SQL: append data based on multiple dates

I have two tables; one contains encounter dates and the other order dates. They look like this:
id  enc_id  enc_dt
1   5       06/11/20
1   6       07/21/21
1   7       09/15/21
2   2       04/21/20
2   5       05/05/20

id  enc_id  ord_dt
1   1       03/7/20
1   2       04/14/20
1   3       05/15/20
1   4       05/30/20
1   5       06/12/20
1   6       07/21/21
1   7       09/16/21
1   8       10/20/21
1   9       10/31/21
2   1       04/15/20
2   2       04/21/20
2   3       04/30/20
2   4       05/02/20
2   5       05/05/20
2   6       05/10/20
The order and encounter date can be the same, or differ slightly, for the same encounter ID. I'm trying to get a table that contains all order dates on or before each encounter date. So the data would look like this:
id enc_id enc_dt enc_key
1 1 03/7/20 5
1 2 04/14/20 5
1 3 05/15/20 5
1 4 05/30/20 5
1 5 06/11/20 5
1 1 03/7/20 6
1 2 04/14/20 6
1 3 05/15/20 6
1 4 05/30/20 6
1 5 06/12/20 6
1 6 07/21/21 6
1 1 03/7/20 7
1 2 04/14/20 7
1 3 05/15/20 7
1 4 05/30/20 7
1 5 06/12/20 7
1 6 07/21/21 7
1 7 09/15/21 7
2 1 04/15/20 2
2 2 04/21/20 2
2 1 04/15/20 5
2 2 04/21/20 5
2 3 04/30/20 5
2 4 05/02/20 5
2 5 05/05/20 5
Is there a way to do this? I'm having trouble figuring out how to combine the orders and encounters tables so that, for each encounter, I get the orders that occur on or before a certain date.

You may join the two tables as follows:
SELECT O.id, O.enc_id, O.ord_dt, E.enc_id AS enc_key
FROM order_tbl O
JOIN encounter_tbl E
  ON O.ord_dt <= E.enc_dt
 AND O.id = E.id
See a demo from db<>fiddle.
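Note that the date comparison in the ON clause assumes enc_dt and ord_dt are stored as a real date type; MM/DD/YY text does not sort chronologically across years. As a rough sketch of the schema this assumes (table names taken from the query above, sample values from the question, syntax as in e.g. PostgreSQL):
CREATE TABLE encounter_tbl (id INT, enc_id INT, enc_dt DATE);
CREATE TABLE order_tbl     (id INT, enc_id INT, ord_dt DATE);

INSERT INTO encounter_tbl VALUES (1, 5, '2020-06-11'), (1, 6, '2021-07-21'), (1, 7, '2021-09-15');
INSERT INTO order_tbl     VALUES (1, 1, '2020-03-07'), (1, 5, '2020-06-12'), (1, 6, '2021-07-21');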

Related

Pandas sum with groupby on condition

I have this dataframe:
id priority quantity
0 A 1 2
1 A 2 4
2 A 3 4
3 A 4 2
4 B 1 5
5 B 2 7
6 B 3 2
7 B 4 3
that I want to turn into this one:
id priority quantity cumulativeQuantity
0 A 1 2 2
1 A 2 4 6
2 A 3 4 10
3 A 4 2 12
4 B 1 5 5
5 B 2 7 12
6 B 3 2 14
7 B 4 3 17
Columns id, priority and quantity haven't changed.
cumulativeQuantity is, for each id, the sum of quantity over all rows whose priority is less than or equal to the priority of the current row.
priority can take any value; only its ordering matters.
ANSWER:
df.groupby(['id','priority']).sum().groupby(level=0).cumsum().reset_index()

Group counts in new column

I want a new column "group_count" that shows, for each attribute, in how many groups in total that attribute occurs.
Group Attribute group_count
0 1 10 4
1 1 10 4
2 1 10 4
3 2 10 4
4 2 20 1
5 3 30 1
6 3 10 4
7 4 10 4
I tried to group by Group and Attribute and then transform using count:
df["group_count"] = df.groupby(["Group", "Attribute"])["Attribute"].transform("count")
Group Attribute group_count
0 1 10 3
1 1 10 3
2 1 10 3
3 2 10 1
4 2 20 1
5 3 30 1
6 3 10 1
7 4 10 1
But it doesn't work.
Use df.drop_duplicates(['Group','Attribute']) to get the unique Attribute values per Group, then group by Attribute to count the groups, and finally map the result back onto the original Attribute column.
m=df.drop_duplicates(['Group','Attribute'])
df['group_count']=df['Attribute'].map(m.groupby('Attribute')['Group'].count())
print(df)
Group Attribute group_count
0 1 10 4
1 1 10 4
2 1 10 4
3 2 10 4
4 2 20 1
5 3 30 1
6 3 10 4
7 4 10 4
Use DataFrameGroupBy.nunique with transform:
df['group_count1'] = df.groupby('Attribute')['Group'].transform('nunique')
print (df)
Group Attribute group_count group_count1
0 1 10 4 4
1 1 10 4 4
2 1 10 4 4
3 2 10 4 4
4 2 20 1 1
5 3 30 1 1
6 3 10 4 4
7 4 10 4 4

Repeat values within the GROUP in SQL

I am trying to repeat a row value in the subsequent rows within a GROUP. A GROUP can have one or more TAGs. The requirement is to populate NEW_TAG in the row where the TAG is populated and in the subsequent rows, until another TAG is populated within the same GROUP or we reach the end of that GROUP.
Current Table           Required Table
GROUPID SEQ TAG         GROUPID SEQ TAG NEW_TAG
------- --- ---         ------- --- --- -------
1       1               1       1
1       2               1       2
1       3               1       3
1       4   4           1       4   4   4
1       5               1       5       4
1       6               1       6       4
1       7               1       7       4
1       8               1       8       4
2       1               2       1
2       2               2       2
2       3               2       3
2       4               2       4
2       5   5           2       5   5   5
2       6               2       6       5
2       7               2       7       5
2       8               2       8       5
2       9   9           2       9   9   9
2       10              2       10      9
2       11              2       11      9
select groupid,
       seq,
       tag,
       last_value(tag) ignore nulls over (
           partition by groupid
           order by seq
       ) as new_tag
from t
order by groupid, seq;
GRO SEQ TAG NEW_TAG
1 1 - -
1 2 - -
1 3 - -
1 4 4 4
1 5 - 4
1 6 - 4
1 7 - 4
1 8 - 4
2 1 - -
2 2 - -
2 3 - -
2 4 - -
2 5 5 5
2 6 - 5
2 7 - 5
2 8 - 5
2 9 9 9
2 10 - 9
2 11 - 9
19 rows selected.
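Note that IGNORE NULLS is Oracle syntax. For databases that do not support it (e.g. PostgreSQL or SQL Server), a common workaround is sketched below, assuming the same table t as above: a running count of non-NULL tags splits each group into islands, and MAX() then repeats the single tag found in each island.
select groupid, seq, tag,
       max(tag) over (partition by groupid, grp) as new_tag
from (
    select t.*,
           count(tag) over (partition by groupid order by seq) as grp
    from t
) x
order by groupid, seq;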

Summarising a 3-month sales report across 2 branches into the top 3 products for each month

I have the following REPORT table
m = month,
pid = product_id,
bid = branch_id,
s = sales
m pid bid s
--------------------------
1 1 1 20
1 3 1 11
1 2 1 14
1 4 1 16
1 5 1 31
1 1 2 30
1 3 2 10
1 2 2 24
1 4 2 17
1 5 2 41
2 3 1 43
2 5 1 21
2 4 1 10
2 1 1 5
2 2 1 12
2 3 2 22
2 5 2 10
2 4 2 5
2 1 2 4
2 2 2 10
3 3 1 21
3 5 1 10
3 4 1 44
3 1 1 4
3 2 1 14
3 3 2 10
3 5 2 5
3 4 2 6
3 1 2 7
3 2 2 10
I'd like to have a summary of this sales table showing the top 3 products by total sales across all branches, something like this:
m pid total
---------------------
1 5 72
1 1 50
1 4 33
2 3 65
2 5 31
2 2 22
3 4 50
3 3 31
3 2 24
So in month 1, product #5 has the highest total sales with 72, followed by product #1 with 50, and so on. If I could separate them into a different table for each month, that would be even better.
So far all I can do is make a summary for one month, and it shows everything rather than the top 3:
select pid, sum(s)
from report
where m = 1
group by pid
order by sum(s);
thanks a lot!
Most databases support the ANSI standard window functions. You can do what you want with row_number():
select m, pid, s
from (select r.m, r.pid, sum(s) as s,
             row_number() over (partition by m order by sum(s) desc) as seqnum
      from report r
      group by r.m, r.pid
     ) r
where seqnum <= 3
order by m, s desc;
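If products can tie on total sales and you want to keep every tied product, a dense_rank() variant should do it (a sketch, otherwise identical to the query above):
select m, pid, s
from (select r.m, r.pid, sum(s) as s,
             dense_rank() over (partition by m order by sum(s) desc) as seqnum
      from report r
      group by r.m, r.pid
     ) r
where seqnum <= 3
order by m, s desc;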

Return unique combinations from many to many join

I have a hierarchy table with the following data:
SOURCE TARGET Level ID
0 1 1 1
0 2 1 2
2 3 2 3
2 4 2 4
2 5 2 5
1 3 2 6
1 4 2 7
1 5 2 8
5 3 3 9
5 3 3 10
4 3 3 11
4 3 3 12
3 6 3 13
3 6 3 14
3 6 4 15
3 6 4 16
3 6 4 17
3 6 4 18
The SOURCE and TARGET columns are the original data and are used to connect parents and children. For example, the third row (SOURCE 2, TARGET 3 on LEVEL 2) connects to the second row (SOURCE 0, TARGET 2 on LEVEL 1), since the SOURCE of the former equals the TARGET of the latter.
The ID column is added at the end using a ROW_NUMBER function and is used to give each row a unique ID.
It may be easier to understand if SOURCE is replaced with PARENT and TARGET with CHILD.
I join the table to itself in order to find the "parent".
I want each "instance" of a "source" on each level to connect to one of its parents. It's not important which ones connect but all need to be connected and to different parents.
The final results should look something like this:
SOURCE TARGET Level ID P_ID
0 1 1 1 NULL
0 2 1 2 NULL
2 3 2 3 2
2 4 2 4 2
2 5 2 5 2
1 3 2 6 1
1 4 2 7 1
1 5 2 8 1
5 3 3 9 5
5 3 3 10 8
4 3 3 11 4
4 3 3 12 7
3 6 3 13 3
3 6 3 14 6
3 6 4 15 9
3 6 4 16 10
3 6 4 17 11
3 6 4 18 12
Any suggestions on how to write a good ms-sql query for this?
Link to sample data and SQL Fiddle
The query to use is below.
;with cte as (
    select *,
           rn = row_number() over (partition by level, target order by id),
           lc = count(1) over (partition by level, target)
    from tbl
)
select a.*, b.id as parent_id
from cte a
left join cte b
       on b.level = a.level - 1
      and b.target = a.source
      and b.rn = (a.rn - 1) % b.lc + 1
order by id
Items are sequenced within each level/target combination.
Children are linked to parents by sequence; if there are more children than parents, the MOD (%) operator wraps around to the first parent and continues the distribution.
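To see the modulo at work with the sample data: the four level-3 rows with TARGET 3 (ids 9 to 12) get rn 1 to 4, and each of their parent partitions on level 2 (TARGET 5 holds ids 5 and 8, TARGET 4 holds ids 4 and 7) has lc = 2. Then (1-1)%2+1 = 1, (2-1)%2+1 = 2, (3-1)%2+1 = 1 and (4-1)%2+1 = 2, so the children are assigned parents 5, 8, 4 and 7 respectively, which matches the P_ID column in the required output.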