SQL: Copy data within a table

I want to make a copy of some of the data in a table, and change only one column. It looks like the following:
Before:
C1 C2 C3 .... // C1, C2, C3 are columns; every row has the same value c in C3
a1 b1 c
a2 b2 c
a3 b3 c
After:
C1 C2 C3 .... // in the copy, all columns are the same except C3, where the value c is replaced by f
a1 b1 c
a2 b2 c
a3 b3 c
...
a1 b1 f
a2 b2 f
a3 b3 f
Is there any quick way to do this in SQL? Thanks!

insert into your_table (C1, C2, C3)
select C1, C2, 'f'
from your_table
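A minimal sqlite3 sketch of this INSERT ... SELECT; `your_table` and the column names are the placeholders from the answer, and the sample rows come from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE your_table (C1 TEXT, C2 TEXT, C3 TEXT)")
cur.executemany("INSERT INTO your_table VALUES (?, ?, ?)",
                [("a1", "b1", "c"), ("a2", "b2", "c"), ("a3", "b3", "c")])
# Copy every row, overriding only C3 with the new value 'f'
cur.execute("INSERT INTO your_table (C1, C2, C3) "
            "SELECT C1, C2, 'f' FROM your_table")
rows = cur.execute("SELECT C1, C2, C3 FROM your_table ORDER BY C3, C1").fetchall()
```

Because the SELECT reads the table before the new rows are written, the copy does not recurse: you end up with exactly twice the original row count.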

Related

How to flip records of an in-memory table upside down?

I have an in-memory table as follows:
A1 B1 C1
A2 B2 C2
A3 B3 C3
How can I turn it into the table below?
A3 B3 C3
A2 B2 C2
A1 B1 C1
I tried adding an auto-increment column, sorting the records on it in descending order with order by ... desc, and then deleting the column to get the expected result. I wonder if there is a more convenient way to get a reversed table.
You can reverse the rows with the rowNo function, without adding an auto-increment column:
t=table(`A1`A2`A3 as col1,`B1`B2`B3 as col2,`C1`C2`C3 as col3)
select * from t order by rowNo(col1) desc
Output:
col1 col2 col3
---- ---- ----
A3 B3 C3
A2 B2 C2
A1 B1 C1
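For comparison only (this is pandas, not DolphinDB), the same row reversal can be sketched with positional slicing; the DataFrame below mirrors table t from the answer:

```python
import pandas as pd

t = pd.DataFrame({"col1": ["A1", "A2", "A3"],
                  "col2": ["B1", "B2", "B3"],
                  "col3": ["C1", "C2", "C3"]})
# Reverse row order with a negative-step slice, then renumber the index
reversed_t = t.iloc[::-1].reset_index(drop=True)
```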

Pandas create groups from column values

I have a dataframe df as follows:
Col1 Col2
A1 A1
B1 A1
B1 B1
C1 C1
D1 A1
D1 B1
D1 D1
E1 A1
I am trying to achieve the following:
Col1 Group
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
i.e. every set of values in df that is related, directly or indirectly, gets grouped under a single value. In the example above, (A1, A1), (B1, A1), (B1, B1), (D1, A1), (D1, B1), (D1, D1), and (E1, A1) can all be linked, directly or indirectly, to A1 (the first in alphabetical order), so they are all assigned the group id A1, and so on.
I am not sure how to do this.
This can be approached using a graph.
You can use networkx to find the connected_components:
import pandas as pd
import networkx as nx

G = nx.from_pandas_edgelist(df, source='Col1', target='Col2')
d = {}
for g in nx.connected_components(G):
    g = sorted(g)
    for x in g:
        d[x] = g[0]
out = pd.Series(d)
output:
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
dtype: object
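If networkx is not available, a small union-find (disjoint-set) sketch produces the same grouping; the DataFrame below reproduces df from the question:

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["A1", "B1", "B1", "C1", "D1", "D1", "D1", "E1"],
                   "Col2": ["A1", "A1", "B1", "C1", "A1", "B1", "D1", "A1"]})

parent = {}

def find(x):
    # Walk to the root, halving paths as we go
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:
        # Attach the alphabetically larger root under the smaller one,
        # so each component's root is its smallest label
        if ra < rb:
            parent[rb] = ra
        else:
            parent[ra] = rb

for a, b in zip(df["Col1"], df["Col2"]):
    union(a, b)

out = {x: find(x) for x in parent}
```

This keeps each component's alphabetically-first member as the group id, matching the sorted() choice in the networkx answer.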

how to rank the column in pyspark with group by clause

I have a dataframe that looks like:
A B C
---------------
A1 B1 C1
A1 B1 C2
A1 B1 C3
A2 B1 C1
A2 B1 C2
A2 B1 C3
A3 B2 C1
A3 B2 C2
A3 B2 C3
How do I rank as per column A,B? Expected Output:
A B C rank
-----------------------
A1 B1 C1 1
A1 B1 C2 2
A1 B1 C3 3
A2 B1 C1 1
A2 B1 C2 2
A2 B1 C3 3
A3 B2 C1 1
A3 B2 C2 2
A3 B2 C3 3
I want to group by columns A and B and rank rows within each group by the value of column C.
You can try the following:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df.withColumn("rank", F.rank().over(Window.partitionBy("A", "B").orderBy("C")))
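As a way to sanity-check the expected output without a Spark session, here is a pandas sketch of the same per-group ranking; cumcount reproduces rank() here only because column C has no ties within a group:

```python
import pandas as pd

df = pd.DataFrame({"A": ["A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"],
                   "B": ["B1", "B1", "B1", "B1", "B1", "B1", "B2", "B2", "B2"],
                   "C": ["C1", "C2", "C3"] * 3})
# Order within each (A, B) group by C, then number rows starting at 1
df = df.sort_values(["A", "B", "C"])
df["rank"] = df.groupby(["A", "B"]).cumcount() + 1
```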

create delimited string from table1 depend table 2 [duplicate]

This question already has answers here:
join comma delimited data column
(7 answers)
Closed 8 years ago.
my table1 is :
T1
col1 col2
A1 C1,C2
A2 C3,C5,C6
A3 C4
A4 C2,C5
and so table 2:
T2
col1 col2 col3
A1 C1 reaction
A1 C2 accept
A2 C5 reaction
A2 C6 manager
A4 C2 manager
How can I get the following result?
query result
col1 col2
A1 reaction,accept
A2 NULL,reaction,manager
A3 NULL
A4 manager,NULL
Please help!
Never, never, never store multiple values in one column.
As you can see, this will only give you headaches. Normalize your table T1; then you can join normally.
It should look like this
col1 col2
A1 C1
A1 C2
A2 C3
A2 C5
A2 C6
A3 C4
A4 C2
A4 C5
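A minimal sqlite3 sketch of the suggested approach, assuming T1 has been normalized as shown above. GROUP_CONCAT over a LEFT JOIN rebuilds the expected output; COALESCE keeps the NULL markers, since GROUP_CONCAT would otherwise skip them (note that SQLite does not formally guarantee concatenation order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (col1 TEXT, col2 TEXT)")  # normalized T1
cur.executemany("INSERT INTO t1 VALUES (?, ?)", [
    ("A1", "C1"), ("A1", "C2"), ("A2", "C3"), ("A2", "C5"),
    ("A2", "C6"), ("A3", "C4"), ("A4", "C2"), ("A4", "C5")])
cur.execute("CREATE TABLE t2 (col1 TEXT, col2 TEXT, col3 TEXT)")
cur.executemany("INSERT INTO t2 VALUES (?, ?, ?)", [
    ("A1", "C1", "reaction"), ("A1", "C2", "accept"),
    ("A2", "C5", "reaction"), ("A2", "C6", "manager"),
    ("A4", "C2", "manager")])
rows = cur.execute("""
    SELECT t1.col1,
           GROUP_CONCAT(COALESCE(t2.col3, 'NULL'), ',') AS col2
    FROM t1
    LEFT JOIN t2 ON t2.col1 = t1.col1 AND t2.col2 = t1.col2
    GROUP BY t1.col1
    ORDER BY t1.col1
""").fetchall()
```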

Convert sql row into columns

I have a table:
A B c
a1 1 a11
a1 2 a12
a1 3 a13
a2 1 a21
a2 2 a22
a2 3 a23
and I want to convert it to:
A C1 C2 C3
a1 a11 a12 a13
a2 a21 a22 a23
How can I write a SQL query to achieve this? I do not want to export my table to CSV and do it in Python.
SELECT A,
       MAX(CASE WHEN B=1 THEN c END) AS C1,
       MAX(CASE WHEN B=2 THEN c END) AS C2,
       MAX(CASE WHEN B=3 THEN c END) AS C3 -- etc. for other values of B
FROM table1
GROUP BY A
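A minimal sqlite3 sketch of this conditional-aggregation pivot, using the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (A TEXT, B INTEGER, c TEXT)")
cur.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("a1", 1, "a11"), ("a1", 2, "a12"), ("a1", 3, "a13"),
    ("a2", 1, "a21"), ("a2", 2, "a22"), ("a2", 3, "a23")])
# Each CASE isolates one value of B per group; MAX collapses the
# resulting column (one value plus NULLs) to that single value
rows = cur.execute("""
    SELECT A,
           MAX(CASE WHEN B=1 THEN c END) AS C1,
           MAX(CASE WHEN B=2 THEN c END) AS C2,
           MAX(CASE WHEN B=3 THEN c END) AS C3
    FROM table1
    GROUP BY A
    ORDER BY A
""").fetchall()
```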