Remove duplicates from dataframe, based on two columns A,B, keeping [list of values] in another column C


I have a pandas DataFrame that contains duplicate rows with respect to two columns (A and B):
A B C
1 2 1
1 2 4
2 7 1
3 4 0
3 4 8
I want to remove the duplicates while keeping the values from column C in a list of N values (2 values in this example). This would lead to:
A B C
1 2 [1,4]
2 7 1
3 4 [0,8]
I cannot figure out how to do that. Maybe use groupby and drop_duplicates?
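
One possible approach, sketched under the assumption that every C value of a duplicated (A, B) pair should be collected: group by A and B and aggregate C with `list`. The final `map` step is optional and only unwraps single-element lists so that unique rows keep a plain scalar, as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 3],
                   "B": [2, 2, 7, 4, 4],
                   "C": [1, 4, 1, 0, 8]})

# Collect every C value of each (A, B) group into a list.
out = df.groupby(["A", "B"])["C"].agg(list).reset_index()

# Optional: unwrap single-element lists so rows without duplicates
# keep a scalar instead of a one-item list.
out["C"] = out["C"].map(lambda v: v[0] if len(v) == 1 else v)

print(out)
```

Mixing lists and scalars gives column C the object dtype; skipping the `map` step keeps every cell a list, which is usually easier to process later.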

Related

Merge and interleave rows of two dataframes [duplicate]

Suppose we have:
>>> df1
A B
0 1 a
1 2 a
2 3 a
3 4 a
>>> df2
A B
0 1 b
1 2 b
2 3 b
3 5 b
I would like to merge them on "A" and then list them by interleaving rows like:
A B
0 1 a
0 1 b
1 2 a
1 2 b
2 3 a
2 3 b
I tried merge, but it places them side by side, column by column. For example, if I have 3 or more data frames, merge can join them on some columns, but my problem would then be to interleave the rows.

If you need to match on A, filter the rows with Series.isin in boolean indexing, pass the results to concat, and then sort with DataFrame.sort_index:
df = pd.concat([df1[df1.A.isin(df2.A)],
                df2[df2.A.isin(df1.A)]]).sort_index(kind='stable')
print(df)
A B
0 1 a
0 1 b
1 2 a
1 2 b
2 3 a
2 3 b
EDIT:
For general data, it is possible to sort by A and create a default index so that the rows interleave correctly:
df = (pd.concat([df1[df1.A.isin(df2.A)].sort_values('A', kind='stable').reset_index(drop=True),
df2[df2.A.isin(df1.A)].sort_values('A', kind='stable').reset_index(drop=True)])
.sort_index(kind='stable'))
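
The same idea can be extended to three or more frames, which the question mentions. The sketch below is only an illustration: `interleave_on` is a hypothetical helper name, `df3` is made-up data, and it assumes the key values are unique within each frame (otherwise the positions would not line up).

```python
from functools import reduce

import pandas as pd


def interleave_on(frames, key):
    """Keep rows whose key occurs in every frame, then interleave them."""
    # Key values that are present in all frames.
    common = reduce(lambda a, b: a & b, (set(f[key]) for f in frames))
    parts = [
        f[f[key].isin(common)]
        .sort_values(key, kind="stable")
        .reset_index(drop=True)          # same position -> same index label
        for f in frames
    ]
    # A stable sort on the index keeps the frame order within each position.
    return pd.concat(parts).sort_index(kind="stable")


df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["a"] * 4})
df2 = pd.DataFrame({"A": [1, 2, 3, 5], "B": ["b"] * 4})
df3 = pd.DataFrame({"A": [3, 1, 2, 6], "B": ["c"] * 4})  # hypothetical third frame

out = interleave_on([df1, df2, df3], "A")
print(out)
```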

Dynamic transpose of rows to column without pivot (Number of rows are not fixed all the time)

I have a table like:
a 1
a 2
b 1
b 3
b 2
b 4
I want output like this:
1 2 3 4
a a
b b b b
Number of rows in output may vary.
Pivoting does not work because this is in Exasol, and CASE cannot work because the columns are dynamic.

How to concatenate two dfs having a similar datetime column?

I have two dfs which have one identical datetime column. I want to concatenate columns from one df to another, skipping where the data is missing. I want to print NaN for missing data.
I tried writing a while loop to concatenate. It gave this error:
ValueError: Can only compare identically-labeled Series objects
while df['TIMESTAMP'] == x['TIMESTAMP']:
    z = pd.concat([df,x],axis=1)
I expect to concatenate two dfs, x and df. df is full timestamp range and x has some missing values. I want to write the data from x to df w.r.t. datetime column. Write NaN for missing values.

When you concatenate DataFrames, one is appended to the bottom of the other:
DF1:
A B C
1 2 5
2 5 3
DF2:
A D E
1 2 3
3 4 7
Given my two example DataFrames, if you concatenate them you will get:
DF_Concat:
A B C D E
1 2 5 NULL NULL
2 5 3 NULL NULL
1 NULL NULL 2 3
3 NULL NULL 4 7
Whereas a merge will return
DF_Merge:
A B C D E
1 2 5 2 3
2 5 3 NULL NULL
3 NULL NULL 4 7
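It sounds to me like you are looking for a merge. Note that pd.merge defaults to an inner join, which would keep only the row with A = 1; the result shown above requires an outer join:
pd.merge(DF1, DF2, on='A', how='outer')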
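
To make the difference concrete, here is a small sketch that rebuilds the two example frames and compares `pd.concat` with `pd.merge`. The `how` argument matters: `'outer'` keeps keys from both frames, while `'left'` keeps every row of the first frame and fills NaN where the second has no match. The left join is probably what the question needs (with `on='TIMESTAMP'`), though that is an assumption about the asker's data.

```python
import pandas as pd

DF1 = pd.DataFrame({"A": [1, 2], "B": [2, 5], "C": [5, 3]})
DF2 = pd.DataFrame({"A": [1, 3], "D": [2, 4], "E": [3, 7]})

# Stacks the rows; columns missing from one frame become NaN.
stacked = pd.concat([DF1, DF2], ignore_index=True)

# Aligns rows on the key column A.
merged_outer = pd.merge(DF1, DF2, on="A", how="outer")  # keys from both frames
merged_left = pd.merge(DF1, DF2, on="A", how="left")    # every row of DF1 only

print(stacked)
print(merged_outer)
print(merged_left)
```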

i want to know how i get corresponding value of columns to the selective columns value

I am trying to get the rows of a table that correspond to selected indexes. For example, I have an xls file with several columns of data. Currently my code searches two selected columns and finds the matching indexes; now I want to find the elements in other columns that belong to those selected rows.
Let A B C D E F G be the column names, each with thousands of rows of numbers,
like
A B C D E F G
1 3 4 5 6 3 3
3 4 5 6 3 2 7
.............
4 7 3 2 5 3 2
So currently my code searches two specific columns (say B and F, for values within some range); now I want to find the column A values in the rows where those selected values occur.
B F A
3 4 5
3 5 3
7 7 3
5 4 6
...
like this
This is my current code VI

I hope we've finally gotten to the bottom of it. How about this one?

Python: Add column to panda data frame with different column length

I have a pandas DataFrame and would like to add data columns using one common column as the index. If the new data does not contain an index value, a 0 should be entered. The new column will have a different length. Is there a better way than using a loop? Example below:
main Dataframe:
index_column date value
1 1 A
2 2 B
3 3 C
4 4 D
add new column:
date value
2 G
3 J
Result:
index_column date value new value
1 1 A 0
2 2 B G
3 3 C J
4 4 D 0
Many thanks!
Rolf
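
A loop-free sketch of one way to do this, assuming `date` is the common column: rename the incoming `value` column, left-merge it onto the main frame, and fill the unmatched rows with 0. The frame names `main` and `new` are chosen here for illustration only.

```python
import pandas as pd

main = pd.DataFrame({"index_column": [1, 2, 3, 4],
                     "date": [1, 2, 3, 4],
                     "value": ["A", "B", "C", "D"]})
new = pd.DataFrame({"date": [2, 3], "value": ["G", "J"]})

# Left merge keeps every row of `main`; dates missing from `new` get NaN.
result = main.merge(new.rename(columns={"value": "new value"}),
                    on="date", how="left")

# Replace the NaN gaps with 0. Casting to object first lets the column
# hold both strings and the integer 0.
result["new value"] = result["new value"].astype(object).fillna(0)

print(result)
```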
