I have a dataframe df as follows:
Col1 Col2
A1 A1
B1 A1
B1 B1
C1 C1
D1 A1
D1 B1
D1 D1
E1 A1
I am trying to achieve the following:
Col1 Group
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
That is, every value in df that is related to another, directly or indirectly, should be grouped under a single value. In the example above, (A1, A1), (B1, A1), (B1, B1), (D1, A1), (D1, B1), (D1, D1) and (E1, A1) can all be linked, directly or indirectly, to A1 (the first in alphabetical order), so they are all assigned the group id A1, and so on.
I am not sure how to do this.
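This can be approached using a graph: treat each row of df as an edge between Col1 and Col2, so that values linked directly or indirectly end up in the same connected component.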
You can use networkx to find the connected_components:
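import networkx as nx
import pandas as pd

# each row of df becomes an edge between Col1 and Col2
G = nx.from_pandas_edgelist(df, source='Col1', target='Col2')

d = {}
for g in nx.connected_components(G):
    g = sorted(g)  # the alphabetically first label becomes the group id
    for x in g:
        d[x] = g[0]

out = pd.Series(d)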
output:
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
dtype: object
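If networkx is not available, the same grouping can be sketched with a small union-find in plain Python. This is a different technique from the answer above, not the networkx implementation; the function name group_by_smallest and the pairs list (copied from the rows of df in the question) are illustrative assumptions.

```python
def group_by_smallest(pairs):
    """Map each label to the smallest label in its connected component."""
    parent = {}

    def find(x):
        # Register unseen labels as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller label as root, so the root is the group id.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)
    return {x: find(x) for x in parent}


# Rows of df from the question, as (Col1, Col2) pairs.
pairs = [
    ("A1", "A1"), ("B1", "A1"), ("B1", "B1"), ("C1", "C1"),
    ("D1", "A1"), ("D1", "B1"), ("D1", "D1"), ("E1", "A1"),
]
groups = group_by_smallest(pairs)
```

With a real dataframe, the pairs could be built with zip(df['Col1'], df['Col2']).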
Related
I have an in-memory table as follows:
A1 B1 C1
A2 B2 C2
A3 B3 C3
How can I turn it into the table below?
A3 B3 C3
A2 B2 C2
A1 B1 C1
I tried adding an auto-increment field, sorting the records on that new column with order by ... desc, and then deleting the column, which gives the expected result. Is there a more convenient way to get a reversed table?
You can use the rowNo function to sort the rows in descending order of their row number, without adding a new auto-increment field.
t=table(`A1`A2`A3 as col1,`B1`B2`B3 as col2,`C1`C2`C3 as col3)
select * from t order by rowNo(col1) desc
Output:
col1 col2 col3
---- ---- ----
A3 B3 C3
A2 B2 C2
A1 B1 C1
I have a table with sample values as below
In this table, every value in Col1 has its supporting values in Col2. The values A1 and A2 are master values and never appear in Col2. I need an output that displays these master values in a new column, like below
What would be the best way to achieve this in Oracle SQL?
Looks like a hierarchical query:
SQL> select connect_by_root t.col1 as main,
2 t.col1,
3 t.col2
4 from test t
5 start with t.col1 in ('A1', 'A2')
6 connect by t.col1 = prior t.col2
7 order by main, t.col1, t.col2;
MAIN COL1 COL2
----- ----- -----
A1 A1 B1
A1 A1 B2
A1 A1 B3
A1 B1 C1
A1 B2 C2
A1 C1 D1
A2 A2 E1
A2 A2 E2
A2 E1 F1
A2 E1 F2
10 rows selected.
SQL>
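To illustrate what the hierarchical query computes, here is a minimal Python sketch that walks down from each master value and tags every row it reaches with that root. The (col1, col2) pairs are read off the query output above; the function name with_root is hypothetical, and the sketch assumes the data contains no cycles.

```python
from collections import defaultdict

# (col1, col2) pairs, taken from the COL1/COL2 columns of the output above.
rows = [
    ("A1", "B1"), ("A1", "B2"), ("A1", "B3"), ("B1", "C1"), ("B2", "C2"),
    ("C1", "D1"), ("A2", "E1"), ("A2", "E2"), ("E1", "F1"), ("E1", "F2"),
]


def with_root(rows, masters):
    """Return (main, col1, col2) for every row reachable from a master value."""
    children = defaultdict(list)
    for col1, col2 in rows:
        children[col1].append(col2)

    out = []
    for root in masters:
        stack = [root]
        while stack:  # depth-first walk; assumes no cycles
            node = stack.pop()
            for child in children.get(node, []):
                out.append((root, node, child))
                stack.append(child)
    return sorted(out)  # same ordering as "order by main, col1, col2"


result = with_root(rows, ["A1", "A2"])
```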
I have a dataframe that looks like:
A B C
---------------
A1 B1 C1
A1 B1 C2
A1 B1 C3
A2 B1 C1
A2 B1 C2
A2 B1 C3
A3 B2 C1
A3 B2 C2
A3 B2 C3
How do I rank the rows within each (A, B) group? Expected output:
A B C rank
-----------------------
A1 B1 C1 1
A1 B1 C2 2
A1 B1 C3 3
A2 B1 C1 1
A2 B1 C2 2
A2 B1 C3 3
A3 B2 C1 1
A3 B2 C2 2
A3 B2 C3 3
I want to group by columns A and B and assign the rank according to the value of column C within each group.
Can you try the following?
from pyspark.sql import Window, functions as F
df = df.withColumn("rank", F.rank().over(Window.partitionBy("A", "B").orderBy("C")))
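To make the partition-and-order logic concrete, here is a plain Python sketch of ranking within each (A, B) group by C. It does not use Spark; the function name rank_within is hypothetical, and ties on C are given the same rank, which is my reading of how a rank over an ordered partition behaves.

```python
from itertools import groupby

# The dataframe from the question as (A, B, C) tuples.
rows = [
    ("A1", "B1", "C1"), ("A1", "B1", "C2"), ("A1", "B1", "C3"),
    ("A2", "B1", "C1"), ("A2", "B1", "C2"), ("A2", "B1", "C3"),
    ("A3", "B2", "C1"), ("A3", "B2", "C2"), ("A3", "B2", "C3"),
]


def rank_within(rows):
    """Append a rank to each row, restarting at 1 for every (A, B) group."""
    ranked = []
    ordered = sorted(rows)  # sorts by A, then B, then C
    for _, group in groupby(ordered, key=lambda r: (r[0], r[1])):
        prev_c, rank = None, 0
        for pos, row in enumerate(group, start=1):
            if row[2] != prev_c:  # a new C value takes its position as rank
                rank, prev_c = pos, row[2]
            ranked.append(row + (rank,))
    return ranked


ranked = rank_within(rows)
```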
I want to make a copy of some of the data in a table, and change only one column. It looks like the following:
Before:
C1 C2 C3 .... // C1, C2, C3 are columns; C3 has the same value in every row
a1 b1 c
a2 b2 c
a3 b3 c
After:
C1 C2 C3 .... // in the copy, all columns are the same except C3; c and f are each a single constant value
a1 b1 c
a2 b2 c
a3 b3 c
...
a1 b1 f
a2 b2 f
a3 b3 f
Is there any quick way to do this in sql? Thanks!
insert into your_table (C1, C2, C3)
select C1, C2, 'f'
from your_table
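A quick way to try this out is an in-memory SQLite database driven from Python. The sketch below is an illustration under assumptions: the column types are guessed, and the WHERE clause is my addition to show how only some of the rows could be copied; it is not part of the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (C1 TEXT, C2 TEXT, C3 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("a1", "b1", "c"), ("a2", "b2", "c"), ("a3", "b3", "c")],
)

# Copy the selected rows, replacing C3 with the constant 'f'.
# The WHERE filter is an assumption, added to restrict which rows are copied.
conn.execute(
    "INSERT INTO your_table (C1, C2, C3) "
    "SELECT C1, C2, 'f' FROM your_table WHERE C3 = 'c'"
)

rows = conn.execute(
    "SELECT C1, C2, C3 FROM your_table ORDER BY C3, C1"
).fetchall()
conn.close()
```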
I have a table:
A B c
a1 1 a11
a1 2 a12
a1 3 a13
a2 1 a21
a2 2 a22
a2 3 a23
and I want to convert it to:
A C1 C2 C3
a1 a11 a12 a13
a2 a21 a22 a23
How can I write a SQL query to achieve this? I do not want to export the table to CSV and do it in Python.
SELECT A,
       MAX(CASE WHEN B = 1 THEN c END) AS C1,
       MAX(CASE WHEN B = 2 THEN c END) AS C2,
       MAX(CASE WHEN B = 3 THEN c END) AS C3  -- add one such line per further value of B
FROM table1
GROUP BY A
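The conditional-aggregation query can be checked against the sample data with an in-memory SQLite database. Python is used here only as a harness to run the SQL; the column types are assumptions, and the ORDER BY is added just to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (A TEXT, B INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("a1", 1, "a11"), ("a1", 2, "a12"), ("a1", 3, "a13"),
        ("a2", 1, "a21"), ("a2", 2, "a22"), ("a2", 3, "a23"),
    ],
)

# One MAX(CASE ...) per value of B turns rows into columns;
# MAX simply picks the single non-NULL value in each group.
pivot = conn.execute(
    """
    SELECT A,
           MAX(CASE WHEN B = 1 THEN c END) AS C1,
           MAX(CASE WHEN B = 2 THEN c END) AS C2,
           MAX(CASE WHEN B = 3 THEN c END) AS C3
    FROM table1
    GROUP BY A
    ORDER BY A
    """
).fetchall()
conn.close()
```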