How to rank a column in PySpark with a group by clause - apache-spark-sql

I have a dataframe that looks like:
A B C
---------------
A1 B1 C1
A1 B1 C2
A1 B1 C3
A2 B1 C1
A2 B1 C2
A2 B1 C3
A3 B2 C1
A3 B2 C2
A3 B2 C3
How do I rank the rows within each combination of columns A and B? Expected output:
A B C rank
-----------------------
A1 B1 C1 1
A1 B1 C2 2
A1 B1 C3 3
A2 B1 C1 1
A2 B1 C2 2
A2 B1 C3 3
A3 B2 C1 1
A3 B2 C2 2
A3 B2 C3 3
I want to group by columns A and B and assign the rank according to the change in value of column C.

Can you try the following?
from pyspark.sql import Window, functions as F
df = df.withColumn("rank", F.rank().over(Window.partitionBy("A", "B").orderBy("C")))
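To make the semantics of that window concrete, here is a minimal plain-Python sketch of what ranking over "partition by A, B order by C" computes. It does not use Spark at all; the sample rows and the helper name rank_within_groups are made up for illustration.

```python
from itertools import groupby

# Hypothetical sample rows (A, B, C), deliberately out of order.
rows = [
    ("A1", "B1", "C2"), ("A1", "B1", "C1"), ("A1", "B1", "C3"),
    ("A2", "B1", "C1"), ("A2", "B1", "C2"), ("A2", "B1", "C3"),
]

def rank_within_groups(rows):
    """Mimic rank() over (partition by A, B order by C) on plain tuples."""
    ranked = []
    # Sorting by (A, B, C) puts each partition together, ordered by C.
    for _, group in groupby(sorted(rows), key=lambda r: (r[0], r[1])):
        prev_c, rank = None, 0
        for position, (a, b, c) in enumerate(group, start=1):
            # Ties on C share a rank; the next distinct value jumps to its position.
            if c != prev_c:
                rank, prev_c = position, c
            ranked.append((a, b, c, rank))
    return ranked

ranked = rank_within_groups(rows)
for row in ranked:
    print(row)
```

With ties in C, rank() leaves gaps (1, 1, 3). If you want 1, 1, 2 or a strict 1, 2, 3 numbering instead, PySpark also provides F.dense_rank() and F.row_number(), which can be used with the same window.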

Related

Pandas create groups from column values

I have a dataframe df as follows:
Col1 Col2
A1 A1
B1 A1
B1 B1
C1 C1
D1 A1
D1 B1
D1 D1
E1 A1
I am trying to achieve the following:
Col1 Group
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
That is, every pair of values in df that is related, directly or indirectly, should end up in the same group. In the example above, (A1, A1), (B1, A1), (B1, B1), (D1, A1), (D1, B1), (D1, D1) and (E1, A1) can all be linked to A1 (the first in alphabetical order), so each of those values is assigned the group id A1, and so on.
I am not sure how to do this.
This can be approached using a graph: treat each (Col1, Col2) row as an edge between two nodes.
You can then use networkx to find the connected_components:
import networkx as nx
import pandas as pd
G = nx.from_pandas_edgelist(df, source='Col1', target='Col2')
d = {}
for g in nx.connected_components(G):
    g = sorted(g)  # the alphabetically first node labels the group
    for x in g:
        d[x] = g[0]
out = pd.Series(d)
output:
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
dtype: object
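If networkx is not available, the same grouping can be sketched with a small union-find (disjoint-set) structure in plain Python. This is an alternative technique, not the answer's method; the pairs list below simply retypes the rows of df from the question, and the find/union helpers are illustrative names.

```python
# The (Col1, Col2) rows from the question, as plain tuples.
pairs = [("A1", "A1"), ("B1", "A1"), ("B1", "B1"), ("C1", "C1"),
         ("D1", "A1"), ("D1", "B1"), ("D1", "D1"), ("E1", "A1")]

parent = {}

def find(x):
    """Return the representative of x's group, compressing the path."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    """Merge two groups, keeping the alphabetically smaller label as root."""
    ra, rb = find(a), find(b)
    if ra != rb:
        lo, hi = sorted((ra, rb))
        parent[hi] = lo

for a, b in pairs:
    union(a, b)

# Map every value to the label of its group.
groups = {x: find(x) for x in sorted(parent)}
print(groups)
```

Because each merge keeps the smaller label as the root, the representative of a group is always its alphabetically first member, which matches the group ids asked for in the question.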

Create two separate columns based on two conditions from single table using ORACLE/SQL

I have a table like
A1 A2 A3 A4 A5 A6
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
and I want to do something like:
SELECT A2 AS B2 WHERE A1 = 1
SELECT A2 AS B3 WHERE A1 = 2
How should I do this?
I think you can use a query like this:
SELECT
CASE WHEN A1 = 1 THEN A2 END As B2,
CASE WHEN A1 = 2 THEN A2 END As B3
FROM
yourTable;
Note that B2 equals A2 when A1 = 1 and is NULL otherwise; likewise B3 equals A2 when A1 = 2 and is NULL otherwise.
SELECT
CASE WHEN A1 = 1 THEN A2 END As B2,
CASE WHEN A1 = 2 THEN A2 END As B3
FROM TABLE_NAME
WHERE A1 = 1 OR A1 = 2
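To see what the CASE approach returns, here is a small sketch that runs the query against an in-memory SQLite database from Python. SQLite is only a stand-in for Oracle here, the table name your_table is hypothetical, and the A2 values (10, 20, 30) are invented so the two output columns are easy to tell apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (A1 INTEGER, A2 INTEGER)")
# Invented sample data; only A1 and A2 matter for this query.
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

case_rows = conn.execute(
    """
    SELECT CASE WHEN A1 = 1 THEN A2 END AS B2,
           CASE WHEN A1 = 2 THEN A2 END AS B3
    FROM your_table
    WHERE A1 IN (1, 2)
    ORDER BY A1
    """
).fetchall()
print(case_rows)
```

Each source row still produces one output row, so every row has a value in at most one of B2 and B3, and NULL (None in Python) in the other.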

PostgreSQL cross join Table

Initial Situation:
Table1:
Table1 s1_a s2_b s3_c s_key
Table1 a1 b1 c1 1
Table1 a2 b2 c2 2
Table1 a3 b3 c3 3
Table1 a4 b4 c4 4
Table2:
Table2 d1_q d2_w d3_e d_key
Table2 q1 w1 e1 1
Table2 q2 w2 e2 2
Table2 q3 w3 e3 3
How can I get the following result, where the common columns s_key and d_key are merged into a single key column?
Extract View s1_a s2_b s3_c key d1_q d2_w d3_e
Extract View a1 b1 c1 1
Extract View a2 b2 c2 2
Extract View a3 b3 c3 3
Extract View a4 b4 c4 4
Extract View 1 q1 w1 e1
Extract View 2 q2 w2 e2
Extract View 3 q3 w3 e3
There is no reason for a cross join here. A plain old-fashioned UNION ALL will do the trick:
SELECT s1_a, s2_b, s3_c, s_key AS key, NULL as d1_q, NULL as d2_w, NULL as d3_e FROM Table1
UNION ALL
SELECT NULL, NULL, NULL, d_key, d1_q, d2_w, d3_e FROM Table2
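As a quick check of the shape this produces, the sketch below runs the same UNION ALL against an in-memory SQLite database from Python. SQLite stands in for PostgreSQL, the tables hold only a few of the question's rows, and the shared column is aliased to "key" as an assumption based on the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (s1_a TEXT, s2_b TEXT, s3_c TEXT, s_key INTEGER);
    CREATE TABLE Table2 (d1_q TEXT, d2_w TEXT, d3_e TEXT, d_key INTEGER);
    INSERT INTO Table1 VALUES ('a1', 'b1', 'c1', 1), ('a2', 'b2', 'c2', 2);
    INSERT INTO Table2 VALUES ('q1', 'w1', 'e1', 1);
""")

# Column names of the result come from the first SELECT.
union_rows = conn.execute("""
    SELECT s1_a, s2_b, s3_c, s_key AS "key",
           NULL AS d1_q, NULL AS d2_w, NULL AS d3_e
    FROM Table1
    UNION ALL
    SELECT NULL, NULL, NULL, d_key, d1_q, d2_w, d3_e
    FROM Table2
""").fetchall()

for row in union_rows:
    print(row)
```

Rows from Table1 carry NULLs in the d columns and rows from Table2 carry NULLs in the s columns, which matches the blank cells in the expected extract.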

SQL Copy data within a table

I want to make a copy of some of the data in a table, and change only one column. It looks like the following:
Before:
C1 C2 C3 .... // C1, C2, C3 are columns; C3 has the same value in every row
a1 b1 c
a2 b2 c
a3 b3 c
After:
C1 C2 C3 .... // in the copy, all columns are the same except C3; every c is one value and every f is another
a1 b1 c
a2 b2 c
a3 b3 c
...
a1 b1 f
a2 b2 f
a3 b3 f
Is there any quick way to do this in sql? Thanks!
insert into your_table (C1, C2, C3)
select C1, C2, 'f'
from your_table
where C3 = 'c' -- copy only the rows you want; without this, every row is copied
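The INSERT ... SELECT pattern can be tried out end to end with an in-memory SQLite database from Python. This is only a sketch: the table has just the three columns shown in the question, and the WHERE C3 = 'c' filter is an assumption about which rows should be copied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (C1 TEXT, C2 TEXT, C3 TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)",
                 [("a1", "b1", "c"), ("a2", "b2", "c"), ("a3", "b3", "c")])

# Copy the selected rows back into the same table, replacing C3 with 'f'.
conn.execute("""
    INSERT INTO your_table (C1, C2, C3)
    SELECT C1, C2, 'f' FROM your_table WHERE C3 = 'c'
""")

copied = conn.execute(
    "SELECT C1, C2, C3 FROM your_table ORDER BY C3, C1"
).fetchall()
for row in copied:
    print(row)
```

Filtering on the original C3 value also keeps the statement from copying the new 'f' rows again if it is run a second time with a different target value.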

Convert sql row into columns

I have a table:
A B c
a1 1 a11
a1 2 a12
a1 3 a13
a2 1 a21
a2 2 a22
a2 3 a23
and I want to convert it to:
A C1 C2 C3
a1 a11 a12 a13
a2 a21 a22 a23
How can I write a SQL query to achieve this? I would rather not export the table to CSV and do it in Python.
SELECT A,
MAX(CASE WHEN B=1 THEN c END) AS C1,
MAX(CASE WHEN B=2 THEN c END) AS C2,
MAX(CASE WHEN B=3 THEN c END) AS C3 -- add one such line per value of B
FROM table1
GROUP BY A
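The conditional-aggregation pivot can be verified against the question's data with an in-memory SQLite database from Python. This is a sketch under the assumption that B only takes the values 1, 2 and 3; the question does not say which database engine is in use, so SQLite is only a convenient stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (A TEXT, B INTEGER, c TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("a1", 1, "a11"), ("a1", 2, "a12"), ("a1", 3, "a13"),
    ("a2", 1, "a21"), ("a2", 2, "a22"), ("a2", 3, "a23"),
])

# One MAX(CASE ...) per target column: each picks the single matching value
# in the group and ignores the NULLs produced by the non-matching rows.
pivoted = conn.execute("""
    SELECT A,
           MAX(CASE WHEN B = 1 THEN c END) AS C1,
           MAX(CASE WHEN B = 2 THEN c END) AS C2,
           MAX(CASE WHEN B = 3 THEN c END) AS C3
    FROM table1
    GROUP BY A
    ORDER BY A
""").fetchall()
for row in pivoted:
    print(row)
```

MAX is used only because an aggregate is required under GROUP BY; with one row per (A, B) pair, MIN would return the same result.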