Repeat values with in the GROUP in SQL - hive

I am trying to repeat a row value in the subsequent rows with in GROUP. A Group can have one or more TAG. The requirement is to populate NEW_TAG in the row where the TAG is populated and in the subsequent rows until another TAG populated with in the same group or we reach end of that GROUP.
Current Table Required Table
GROUPID SEQ TAG GROUPID SEQ TAG NEW_TAG
------- --- ---- ------- --- --- --------
1 1 1 1
1 2 1 2
1 3 1 3
1 4 4 1 4 4 4
1 5 1 5 4
1 6 1 6 4
1 7 1 7 4
1 8 1 8 4
2 1 2 1
2 2 2 2
2 3 2 3
2 4 2 4
2 5 5 2 5 5 5
2 6 2 6 5
2 7 2 7 5
2 8 2 8 5
2 9 9 2 9 9 9
2 10 2 10 9
2 11 2 11 9

select
groupid,
seq,
tag,
last_value(tag) ignore nulls over (
partition by groupid
order by seq
) as new_tag
from t
order by groupid, seq;
GRO SEQ TAG NEW_TAG
1 1 - -
1 2 - -
1 3 - -
1 4 4 4
1 5 - 4
1 6 - 4
1 7 - 4
1 8 - 4
2 1 - -
2 2 - -
2 3 - -
2 4 - -
2 5 5 5
2 6 - 5
2 7 - 5
2 8 - 5
2 9 9 9
2 10 - 9
2 11 - 9
19 rows selected.

Related

SQL append data based on multiple dates

I have two tables; one contains encounter dates and the other order dates. They look like this:
id enc_id enc_dt
1 5 06/11/20
1 6 07/21/21
1 7 09/15/21
2 2 04/21/20
2 5 05/05/20
id enc_id ord_dt
1 1 03/7/20
1 2 04/14/20
1 3 05/15/20
1 4 05/30/20
1 5 06/12/20
1 6 07/21/21
1 7 09/16/21
1 8 10/20/21
1 9 10/31/21
2 1 04/15/20
2 2 04/21/20
2 3 04/30/20
2 4 05/02/20
2 5 05/05/20
2 6 05/10/20
The order and encounter date can be the same, or differ slightly for the same encounter ID. I'm trying to get a table that contains all order dates before each encounter date. So the data would like this:
id enc_id enc_dt enc_key
1 1 03/7/20 5
1 2 04/14/20 5
1 3 05/15/20 5
1 4 05/30/20 5
1 5 06/11/20 5
1 1 03/7/20 6
1 2 04/14/20 6
1 3 05/15/20 6
1 4 05/30/20 6
1 5 06/12/20 6
1 6 07/21/21 6
1 1 03/7/20 7
1 2 04/14/20 7
1 3 05/15/20 7
1 4 05/30/20 7
1 5 06/12/20 7
1 6 07/21/21 7
1 7 09/15/21 7
2 1 04/15/20 2
2 2 04/21/20 2
2 1 04/15/20 5
2 2 04/21/20 5
2 3 04/30/20 5
2 4 05/02/20 5
2 5 05/05/20 5
Is there a way to do this? I am having trouble figuring out how to append the orders and encounter table for each encounter based on orders that occur before a certain date.
You may join the two tables as the following:
SELECT O.id, O.enc_id, O.ord_dt, E.enc_id
FROM
order_tbl O
JOIN encounter_tbl E
ON O.ord_dt <= E.enc_dt AND
O.id = E.id
See a demo from db<>fiddle.

How to order "group" of row?

This is the query:
SELECT WorkTypeId, WorktypeWorkID, LevelID
FROM Worktypes as w
LEFT JOIN WorktypesWorks as ww on w.ID = ww.WorktypeID
LEFT JOIN WorktypesWorksLevels as wwl on ww.ID = wwl.WorktypeWorkID
This is the result:
WorkTypeId WorktypeWorkID LevelID
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
1 3 1
1 4 1
1 4 2
1 5 1
NULL NULL NULL
3 19 2
4 6 1
4 7 1
4 7 2
4 7 3
4 17 1
4 17 2
4 18 1
4 18 2
NULL NULL NULL
I'd like to order the block of rows of each WorktypeWorkID, placing at the top the blocks which have the lower LevelID within the block.
Here's the result that I'd like to get:
WorkTypeId WorktypeWorkID LevelID
NULL NULL NULL
NULL NULL NULL
1 3 1 // blocks which have MinLevel 1
1 5 1
4 6 1
1 4 1 // blocks which have MinLevel 2
1 4 2
3 19 2
4 17 1
4 17 2
4 18 1
4 18 2
1 1 1 // blocks which have MinLevel 3
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
4 7 1
4 7 2
4 7 3
I think this is what you are looking for:
SELECT WorkTypeId, WorktypeWorkID, LevelID, MAX(LevelID) OVER (PARTITION BY WorktypeWorkID) as maxLevelID
FROM Worktypes as w
LEFT JOIN WorktypesWorks as ww on w.ID = ww.WorktypeID
LEFT JOIN WorktypesWorksLevels as wwl on ww.ID = wwl.WorktypeWorkID
ORDER BY maxLevelID

Group counts in new column

I want a new column "group_count". This shows me in how many groups in total the attribute occurs.
Group Attribute group_count
0 1 10 4
1 1 10 4
2 1 10 4
3 2 10 4
4 2 20 1
5 3 30 1
6 3 10 4
7 4 10 4
I tried to groupby Group and attributes and then transform by using count
df["group_count"] = df.groupby(["Group", "Attributes"])["Attributes"].transform("count")
Group Attribute group_count
0 1 10 3
1 1 10 3
2 1 10 3
3 2 10 1
4 2 20 1
5 3 30 1
6 3 10 1
7 4 10 1
But it doesnt work
Use df.drop_duplicates(['Group','Attribute']) to get unique Attribute per group , then groupby on Atttribute to get count of Group, finally map with original Attribute column.
m=df.drop_duplicates(['Group','Attribute'])
df['group_count']=df['Attribute'].map(m.groupby('Attribute')['Group'].count())
print(df)
Group Attribute group_count
0 1 10 4
1 1 10 4
2 1 10 4
3 2 10 4
4 2 20 1
5 3 30 1
6 3 10 4
7 4 10 4
Use DataFrameGroupBy.nunique with transform:
df['group_count1'] = df.groupby('Attribute')['Group'].transform('nunique')
print (df)
Group Attribute group_count group_count1
0 1 10 4 4
1 1 10 4 4
2 1 10 4 4
3 2 10 4 4
4 2 20 1 1
5 3 30 1 1
6 3 10 4 4
7 4 10 4 4

pandas drop duplicate row value from a specific column

I want to remove the duplicate row value from a specific column - in this case the column name is "number".
Before:
number qty status
0 10 2 go
1 10 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 5 6 go
7 5 3 nogo
8 1 10 yes
9 1 10 go
10 5 2 nah
After:
number qty status
0 10 2 go
5 nogo
1 4 6 yes
2 3 1 no
3 2 7 go
4 5 2 nah
6 go
3 nogo
5 1 10 yes
10 go
6 5 2 nah
It is possible replace values to empty string or NaNs by mask with duplicated by new Series a created by comparing column with shifted column with cumsum:
a = df['number'].ne(df['number'].shift()).cumsum()
#for replace ''
df['number'] = df['number'].mask(a.duplicated(), '')
#for replace NaNs
#df['number'] = df['number'].mask(a.duplicated())
print (df)
number qty status
0 10 2 go
1 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 6 go
7 3 nogo
8 1 10 yes
9 10 go
10 5 2 nah
Detail:
a = df['number'].ne(df['number'].shift()).cumsum()
print (a)
0 1
1 1
2 2
3 3
4 4
5 5
6 5
7 5
8 6
9 6
10 7
Name: number, dtype: int32

Return unique combinations from many to many join

I have a hierarchy table with the following data :
SOURCE TARGET Level ID
0 1 1 1
0 2 1 2
2 3 2 3
2 4 2 4
2 5 2 5
1 3 2 6
1 4 2 7
1 5 2 8
5 3 3 9
5 3 3 10
4 3 3 11
4 3 3 12
3 6 3 13
3 6 3 14
3 6 4 15
3 6 4 16
3 6 4 17
3 6 4 18
The SOURCE and TARGET rows are the original data and are used to connect between parents and children. for example, the third row (SOURCE 2, TARGET 3 on LEVEL 2) connects to the second row (SOURCE 0, TARGET 2 on LEVEL 1) since the Source of the first equals the target of the second.
The ID column is added at the end using a ROW_NUMBER function and is used to give each row a unique ID.
It may be easier to understand if SOURCE is replaced with PARENT and TARGET with CHILD.
I join the table to itself in order to find the "parent".
I want each "instance" of a "source" on each level to connect to one of its parents. It's not important which ones connect but all need to be connected and to different parents.
The final results should look something like this:
SOURCE TARGET Level ID P_ID
0 1 1 1 NULL
0 2 1 2 NULL
2 3 2 3 2
2 4 2 4 2
2 5 2 5 2
1 3 2 6 1
1 4 2 7 1
1 5 2 8 1
5 3 3 9 5
5 3 3 10 8
4 3 3 11 4
4 3 3 12 7
3 6 3 13 3
3 6 3 14 6
3 6 4 15 9
3 6 4 16 10
3 6 4 17 11
3 6 4 18 12
Any suggestions on how to write a good ms-sql query for this?
Link to sample data and SQL Fiddle
The query to use is below.
;with cte as (
select *,rn=row_number() over (partition by level, target
order by id),
lc=count(1) over (partition by level, target)
from tbl
)
select a.*, b.id as parent_id
from cte a
left join cte b on b.level=a.level-1
and b.target=a.source
and b.rn=(a.rn-1)%b.lc+1
order by id
Items are sequenced at each level/target combination
Children are linked to parents using by sequence, however if there are more children than parents, the MOD (%) operator takes care of going back to the first parent and continues distribution