How to order "group" of row? - sql

This is the query:
SELECT WorkTypeId, WorktypeWorkID, LevelID
FROM Worktypes as w
LEFT JOIN WorktypesWorks as ww on w.ID = ww.WorktypeID
LEFT JOIN WorktypesWorksLevels as wwl on ww.ID = wwl.WorktypeWorkID
This is the result:
WorkTypeId WorktypeWorkID LevelID
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
1 3 1
1 4 1
1 4 2
1 5 1
NULL NULL NULL
3 19 2
4 6 1
4 7 1
4 7 2
4 7 3
4 17 1
4 17 2
4 18 1
4 18 2
NULL NULL NULL
I'd like to order the block of rows of each WorktypeWorkID, placing at the top the blocks which have the lower LevelID within the block.
Here's the result that I'd like to get:
WorkTypeId WorktypeWorkID LevelID
NULL NULL NULL
NULL NULL NULL
1 3 1 // blocks which have MinLevel 1
1 5 1
4 6 1
1 4 1 // blocks which have MinLevel 2
1 4 2
3 19 2
4 17 1
4 17 2
4 18 1
4 18 2
1 1 1 // blocks which have MinLevel 3
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
4 7 1
4 7 2
4 7 3

I think this is what you are looking for:
SELECT WorkTypeId, WorktypeWorkID, LevelID, MAX(LevelID) OVER (PARTITION BY WorktypeWorkID) as maxLevelID
FROM Worktypes as w
LEFT JOIN WorktypesWorks as ww on w.ID = ww.WorktypeID
LEFT JOIN WorktypesWorksLevels as wwl on ww.ID = wwl.WorktypeWorkID
ORDER BY maxLevelID

Related

SQL append data based on multiple dates

I have two tables; one contains encounter dates and the other order dates. They look like this:
id enc_id enc_dt
1 5 06/11/20
1 6 07/21/21
1 7 09/15/21
2 2 04/21/20
2 5 05/05/20
id enc_id ord_dt
1 1 03/7/20
1 2 04/14/20
1 3 05/15/20
1 4 05/30/20
1 5 06/12/20
1 6 07/21/21
1 7 09/16/21
1 8 10/20/21
1 9 10/31/21
2 1 04/15/20
2 2 04/21/20
2 3 04/30/20
2 4 05/02/20
2 5 05/05/20
2 6 05/10/20
The order and encounter date can be the same, or differ slightly for the same encounter ID. I'm trying to get a table that contains all order dates before each encounter date. So the data would like this:
id enc_id enc_dt enc_key
1 1 03/7/20 5
1 2 04/14/20 5
1 3 05/15/20 5
1 4 05/30/20 5
1 5 06/11/20 5
1 1 03/7/20 6
1 2 04/14/20 6
1 3 05/15/20 6
1 4 05/30/20 6
1 5 06/12/20 6
1 6 07/21/21 6
1 1 03/7/20 7
1 2 04/14/20 7
1 3 05/15/20 7
1 4 05/30/20 7
1 5 06/12/20 7
1 6 07/21/21 7
1 7 09/15/21 7
2 1 04/15/20 2
2 2 04/21/20 2
2 1 04/15/20 5
2 2 04/21/20 5
2 3 04/30/20 5
2 4 05/02/20 5
2 5 05/05/20 5
Is there a way to do this? I am having trouble figuring out how to append the orders and encounter table for each encounter based on orders that occur before a certain date.
You may join the two tables as the following:
SELECT O.id, O.enc_id, O.ord_dt, E.enc_id
FROM
order_tbl O
JOIN encounter_tbl E
ON O.ord_dt <= E.enc_dt AND
O.id = E.id
See a demo from db<>fiddle.

How to compute column sum on the basis of other column value in pandas dataframe?

P
T1
T2
T3
0
1
2
3
1
1
2
0
2
3
1
2
3
1
0
2
In the above pandas dataframe df,
I want to add columns on the basis of the value of column 'P'.
if df['P'] == 0: 0
if df['P'] == 1: T1 (=1)
if df['P'] == 2: T1+T2 (=3+1=4)
if df['P'] == 3: T1+T2+T3 (=1+0+2=3)
In other words, I want to add from T1 to TN if df['P'] == N.
How can I implement this with Python code?
EDIT:
For sum values by P column create mask by broadcasting np.arange by length of filtered columns by DataFrame.filter, compare by P values and this mask pass to DataFrame.where, last use sum per rows:
np.random.seed(20)
c = [f'{x}{i + 1}' for x in ['T','U','V'] for i in range(3)]
df = pd.DataFrame(np.random.randint(4, size=(10,10)), columns=['P'] + c)
arrP = df['P'].to_numpy()[:, None]
for c in ['T','U','V']:
df1 = df.filter(regex=rf'^{c}')
df[f'{c}_SUM'] = df1.where(np.arange(len(df1.columns)) < arrP, 0).sum(axis=1)
print (df)
P T1 T2 T3 U1 U2 U3 V1 V2 V3 T_SUM U_SUM V_SUM
0 3 2 3 3 0 2 1 0 3 2 8 3 5
1 3 2 0 2 0 1 2 2 3 3 4 3 8
2 0 1 2 2 2 0 1 1 3 1 0 0 0
3 3 2 2 2 1 3 2 1 3 2 6 6 6
4 3 1 1 3 1 2 2 0 2 3 5 5 5
5 2 3 2 3 1 1 1 0 3 0 5 2 3
6 2 3 2 3 3 3 2 1 1 2 5 6 2
7 3 2 0 2 1 1 2 2 2 3 4 4 7
8 2 2 1 0 2 2 0 3 3 0 3 4 6
9 2 2 3 2 2 3 2 2 1 1 5 5 3

Repeat values with in the GROUP in SQL

I am trying to repeat a row value in the subsequent rows with in GROUP. A Group can have one or more TAG. The requirement is to populate NEW_TAG in the row where the TAG is populated and in the subsequent rows until another TAG populated with in the same group or we reach end of that GROUP.
Current Table Required Table
GROUPID SEQ TAG GROUPID SEQ TAG NEW_TAG
------- --- ---- ------- --- --- --------
1 1 1 1
1 2 1 2
1 3 1 3
1 4 4 1 4 4 4
1 5 1 5 4
1 6 1 6 4
1 7 1 7 4
1 8 1 8 4
2 1 2 1
2 2 2 2
2 3 2 3
2 4 2 4
2 5 5 2 5 5 5
2 6 2 6 5
2 7 2 7 5
2 8 2 8 5
2 9 9 2 9 9 9
2 10 2 10 9
2 11 2 11 9
select
groupid,
seq,
tag,
last_value(tag) ignore nulls over (
partition by groupid
order by seq
) as new_tag
from t
order by groupid, seq;
GRO SEQ TAG NEW_TAG
1 1 - -
1 2 - -
1 3 - -
1 4 4 4
1 5 - 4
1 6 - 4
1 7 - 4
1 8 - 4
2 1 - -
2 2 - -
2 3 - -
2 4 - -
2 5 5 5
2 6 - 5
2 7 - 5
2 8 - 5
2 9 9 9
2 10 - 9
2 11 - 9
19 rows selected.

Hive : Column data mismatch after left outer join?

I am facing strange behaviour as explained below. I have tow hive table T1 and T2 , joined with LEFT OUTER JOIN ..I am getting strange value for two columns t2c2 t2c3 of table T2 after join.
See below complete detail :
Table T1 :
create table T1 ( t1c1 int , t1c2 int , t1c3 int ) clustered by (t1c1) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
4 4 1
4 4 0
1 1 1
1 1 0
5 5 1
Table T2:
create table T2 ( t2c0 int , t2c1 int , t2c2 int , t2c3 int ) clustered by ( t2c1) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true'); ;
0 1 -1 3
0 1 0 0
0 4 6 6
0 4 1 6
1 1 0 2
1 4 3 5
1 4 2 5
Query :
select *
from T1 a
LEFT OUTER JOIN
T2 b
ON a.t1c2 = b.t2c1;
Result set : (not expected )
a.t1c1 a.t1c2 a.t1c3 b.t2c0 b.t2c1 b.t2c2 b.t2c3
4 4 1 1 4 4 3
4 4 1 0 4 4 6
4 4 1 1 4 4 2
4 4 1 0 4 4 1
4 4 0 1 4 4 3
4 4 0 0 4 4 6
4 4 0 1 4 4 2
4 4 0 0 4 4 1
1 1 1 0 1 1 -1
1 1 1 1 1 1 0
1 1 1 0 1 1 0
1 1 0 0 1 1 -1
1 1 0 1 1 1 0
1 1 0 0 1 1 0
5 5 1 NULL NULL NULL NULL
Error description :
values in result set b.t2c2 and b.t2c3 are strange and not expected . -1 in b.t2c3 is no more belong to T2.t2c3 , and 4 in b.t2c2 is no more belong to T2.t2c2.
I am not sure whats wrong with. Please help me to identify the issue and resolve it.
Expected result :
1 1 1 0 1 -1 3
1 1 1 1 1 0 2
1 1 1 0 1 0 0
1 1 0 0 1 -1 3
1 1 0 1 1 0 2
1 1 0 0 1 0 0
4 4 1 1 4 3 5
4 4 1 0 4 6 6
4 4 1 1 4 2 5
4 4 1 0 4 1 6
4 4 0 1 4 3 5
4 4 0 0 4 6 6
4 4 0 1 4 2 5
4 4 0 0 4 1 6
5 5 1 <null> <null> <null> <null>
but if i change query to below , it start giving correct result.
select *
from ( select * from T1) a
LEFT OUTER JOIN
( select * from T2) b
ON a.t1c2 = b.t2c1;

summarising a 3 months sales report across 2 branches into top 3 product for each month

I have the following REPORT table
m = month,
pid = product_id,
bid = branch_id,
s = sales
m pid bid s
--------------------------
1 1 1 20
1 3 1 11
1 2 1 14
1 4 1 16
1 5 1 31
1 1 2 30
1 3 2 10
1 2 2 24
1 4 2 17
1 5 2 41
2 3 1 43
2 5 1 21
2 4 1 10
2 1 1 5
2 2 1 12
2 3 2 22
2 5 2 10
2 4 2 5
2 1 2 4
2 2 2 10
3 3 1 21
3 5 1 10
3 4 1 44
3 1 1 4
3 2 1 14
3 3 2 10
3 5 2 5
3 4 2 6
3 1 2 7
3 2 2 10
I'd like to have a summary of this sales table
by showing the top 3 sales among the products across all branches.
something like this:
m pid total
---------------------
1 5 72
1 1 50
1 4 33
2 3 65
2 5 31
2 2 22
3 4 50
3 3 31
3 2 24
so on month 1, product #5 has the highest total sales with 72, followed by product #1 is 50.. and so on. if i could separate them into different table for each month would be better
so far what i can do is make a summary for 1 month and shows the entire thing and not top 3.
select pid, sum(s)
from report
where m = 1
group by pid
order by sum(s);
thanks a lot!
Most databases support the ANSI standard window functions. You can do what you want with row_number():
select m, pid, s
from (select r.m, r.pid, sum(s) as s,
row_number() over (partition by m order by sum(s) desc) as seqnum
from report r
group by r.m, r.pid
) r
where seqnum <= 3
order by m, s desc;