I have two tables; one contains encounter dates and the other order dates. They look like this:
id enc_id enc_dt
1 5 06/11/20
1 6 07/21/21
1 7 09/15/21
2 2 04/21/20
2 5 05/05/20
id enc_id ord_dt
1 1 03/7/20
1 2 04/14/20
1 3 05/15/20
1 4 05/30/20
1 5 06/12/20
1 6 07/21/21
1 7 09/16/21
1 8 10/20/21
1 9 10/31/21
2 1 04/15/20
2 2 04/21/20
2 3 04/30/20
2 4 05/02/20
2 5 05/05/20
2 6 05/10/20
The order and encounter date can be the same, or differ slightly for the same encounter ID. I'm trying to get a table that contains all order dates before each encounter date. So the data would like this:
id enc_id enc_dt enc_key
1 1 03/7/20 5
1 2 04/14/20 5
1 3 05/15/20 5
1 4 05/30/20 5
1 5 06/11/20 5
1 1 03/7/20 6
1 2 04/14/20 6
1 3 05/15/20 6
1 4 05/30/20 6
1 5 06/12/20 6
1 6 07/21/21 6
1 1 03/7/20 7
1 2 04/14/20 7
1 3 05/15/20 7
1 4 05/30/20 7
1 5 06/12/20 7
1 6 07/21/21 7
1 7 09/15/21 7
2 1 04/15/20 2
2 2 04/21/20 2
2 1 04/15/20 5
2 2 04/21/20 5
2 3 04/30/20 5
2 4 05/02/20 5
2 5 05/05/20 5
Is there a way to do this? I am having trouble figuring out how to append the orders and encounter table for each encounter based on orders that occur before a certain date.
You may join the two tables as the following:
SELECT O.id, O.enc_id, O.ord_dt, E.enc_id
FROM
order_tbl O
JOIN encounter_tbl E
ON O.ord_dt <= E.enc_dt AND
O.id = E.id
See a demo from db<>fiddle.
I have a dataframe:
df = 0 1 2 3 4
1 1 3 2 5
4 1 5 7 8
7 1 2 3 9
I want to enforce monotonically per row, to get:
df = 0 1 2 3 4
1 1 3 3 5
4 4 5 7 8
7 7 7 7 9
What is the best way to do so?
Try cummax
out = df.cummax(1)
Out[80]:
0 1 2 3 4
0 1 1 3 3 5
1 4 4 5 7 8
2 7 7 7 7 9
I need help with comparing two dataframes. For example:
The first dataframe is
df_1 =
0 1 2 3 4 5
0 1 1 1 1 1 1
1 2 2 2 2 2 2
2 3 3 3 3 3 3
3 4 4 4 4 4 4
4 2 2 2 2 2 2
5 5 5 5 5 5 5
6 1 1 1 1 1 1
7 6 6 6 6 6 6
The second dataframe is
df_2 =
0 1 2 3 4 5
0 1 1 1 1 1 1
1 2 2 2 2 2 2
2 3 3 3 3 3 3
3 4 4 4 4 4 4
4 5 5 5 5 5 5
5 6 6 6 6 6 6
May I know if there is a way (without using for loop) to find the index of the rows of df_1 that have the same row values of df_2. In the example above, my expected output is below
index =
0
1
2
3
5
7
The size of the column of the "index" variable above should have the same column size of df_2.
If the same row of df_2 repeated in df_1 more than once, I only need the index of the first appearance, thats why I don't need the index 4 and 6.
Please help. Thank you so much!
Tommy
Use DataFrame.merge with DataFrame.drop_duplicates and DataFrame.reset_index for convert index to column for avoid lost index values, last select column called index:
s = df_2.merge(df_1.drop_duplicates().reset_index())['index']
print (s)
0 0
1 1
2 2
3 3
4 5
5 7
Name: index, dtype: int64
Detail:
print (df_2.merge(df_1.drop_duplicates().reset_index()))
0 1 2 3 4 5 index
0 1 1 1 1 1 1 0
1 2 2 2 2 2 2 1
2 3 3 3 3 3 3 2
3 4 4 4 4 4 4 3
4 5 5 5 5 5 5 5
5 6 6 6 6 6 6 7
Check the solution
df1=pd.DataFrame({'0':[1,2,3,4,2,5,1,6],
'1':[1,2,3,4,2,5,1,6],
'2':[1,2,3,4,2,5,1,6],
'3':[1,2,3,4,2,5,1,6],
'4':[1,2,3,4,2,5,1,6],
'5':[1,2,3,4,2,5,1,6]})
df1=pd.DataFrame({'0':[1,2,3,4,5,6],
'1':[1,2,3,4,5,66],
'2':[1,2,3,4,5,6],
'3':[1,2,3,4,5,66],
'4':[1,2,3,4,5,6],
'5':[1,2,3,4,5,6]})
df1[df1.isin(df2)].index.values.tolist()
### Output
[0, 1, 2, 3, 4, 5, 6, 7]
I want to remove the duplicate row value from a specific column - in this case the column name is "number".
Before:
number qty status
0 10 2 go
1 10 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 5 6 go
7 5 3 nogo
8 1 10 yes
9 1 10 go
10 5 2 nah
After:
number qty status
0 10 2 go
5 nogo
1 4 6 yes
2 3 1 no
3 2 7 go
4 5 2 nah
6 go
3 nogo
5 1 10 yes
10 go
6 5 2 nah
It is possible replace values to empty string or NaNs by mask with duplicated by new Series a created by comparing column with shifted column with cumsum:
a = df['number'].ne(df['number'].shift()).cumsum()
#for replace ''
df['number'] = df['number'].mask(a.duplicated(), '')
#for replace NaNs
#df['number'] = df['number'].mask(a.duplicated())
print (df)
number qty status
0 10 2 go
1 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 6 go
7 3 nogo
8 1 10 yes
9 10 go
10 5 2 nah
Detail:
a = df['number'].ne(df['number'].shift()).cumsum()
print (a)
0 1
1 1
2 2
3 3
4 4
5 5
6 5
7 5
8 6
9 6
10 7
Name: number, dtype: int32
I am trying to repeat a row value in the subsequent rows with in GROUP. A Group can have one or more TAG. The requirement is to populate NEW_TAG in the row where the TAG is populated and in the subsequent rows until another TAG populated with in the same group or we reach end of that GROUP.
Current Table Required Table
GROUPID SEQ TAG GROUPID SEQ TAG NEW_TAG
------- --- ---- ------- --- --- --------
1 1 1 1
1 2 1 2
1 3 1 3
1 4 4 1 4 4 4
1 5 1 5 4
1 6 1 6 4
1 7 1 7 4
1 8 1 8 4
2 1 2 1
2 2 2 2
2 3 2 3
2 4 2 4
2 5 5 2 5 5 5
2 6 2 6 5
2 7 2 7 5
2 8 2 8 5
2 9 9 2 9 9 9
2 10 2 10 9
2 11 2 11 9
select
groupid,
seq,
tag,
last_value(tag) ignore nulls over (
partition by groupid
order by seq
) as new_tag
from t
order by groupid, seq;
GRO SEQ TAG NEW_TAG
1 1 - -
1 2 - -
1 3 - -
1 4 4 4
1 5 - 4
1 6 - 4
1 7 - 4
1 8 - 4
2 1 - -
2 2 - -
2 3 - -
2 4 - -
2 5 5 5
2 6 - 5
2 7 - 5
2 8 - 5
2 9 9 9
2 10 - 9
2 11 - 9
19 rows selected.