Suppose we have a table with a column whose values are comma-separated. I need to split those values and turn each one into a separate row.
SELECT * FROM the_table

customer_id   customer_value
1             aaa,bbb,ccc
2             ddd,ggg,ttt,lll
3             ppp,nnn,mmm,kkk,fff
I do not know if regexp_extract is the right function to use here, but either way I am unable to create new rows with it.
SELECT *,
       regexp_extract(customer_value, "^(?:[^,]*,){0}([^,]*)(?:[^,]*,){1}([^,]*)", 1) AS value_1,
       regexp_extract(customer_value, "^(?:[^,]*,){0}([^,]*)(?:[^,]*,){1}([^,]*)", 2) AS value_2
FROM the_table

customer_id   customer_value        value_1   value_2
1             aaa,bbb,ccc           aaa       bbb
2             ddd,ggg,ttt,lll       ddd       ggg
3             ppp,nnn,mmm,kkk,fff   ppp       nnn
What I am looking for:

SELECT * FROM the_table

customer_id   customer_value        customer_value_comma
1             aaa,bbb,ccc           aaa
1             aaa,bbb,ccc           bbb
1             aaa,bbb,ccc           ccc
2             ddd,ggg,ttt,lll       ddd
2             ddd,ggg,ttt,lll       ggg
2             ddd,ggg,ttt,lll       ttt
2             ddd,ggg,ttt,lll       lll
...
Here's your SQL:

SELECT
    *,
    -- split makes an array, explode turns the array into rows
    explode(split(customer_value, ",")) AS customer_value_comma
FROM the_table
Here it is in PySpark:

from pyspark.sql.functions import split, explode, col

data = [(1, "aaa,bbb,ccc"),
        (2, "ddd,ggg,ttt,lll"),
        (3, "ppp,nnn,mmm,kkk,fff")]
df = sc.parallelize(data).toDF(["customer_id", "customer_value"])
df.withColumn("customer_value_comma",
              explode(split(col("customer_value"), ","))).show()
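If you just want to sanity-check the expected output without a Spark session, the same row expansion can be sketched in plain Python; this only mirrors what explode(split(...)) produces, it is not Spark itself:

```python
# Each (customer_id, customer_value) row is expanded into one row
# per comma-separated token, mirroring explode(split(customer_value, ",")).
data = [(1, "aaa,bbb,ccc"),
        (2, "ddd,ggg,ttt,lll"),
        (3, "ppp,nnn,mmm,kkk,fff")]

exploded = [(cid, value, token)
            for cid, value in data
            for token in value.split(",")]

for row in exploded:
    print(row)  # (1, 'aaa,bbb,ccc', 'aaa'), (1, 'aaa,bbb,ccc', 'bbb'), ...
```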
table

id    text
1     aaa
121   bbb
4     ccc
1     ddd

new table

id    text2
1     aaaddd
121   bbb
4     ccc
I do not think I can use PIVOT, since I never know how many id and text values there will be, so I cannot hardcode them in a PIVOT instruction.
Use GROUP BY with STRING_AGG:

SELECT id, STRING_AGG(text, '') AS text2
FROM table
GROUP BY id
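STRING_AGG requires SQL Server 2017+. As a quick way to check the grouping logic outside SQL Server, SQLite's group_concat behaves the same for this case (table and column names taken from the question; note that group_concat does not guarantee concatenation order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, text TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "aaa"), (121, "bbb"), (4, "ccc"), (1, "ddd")])

# group_concat(text, '') plays the role of STRING_AGG(text, '')
rows = conn.execute(
    "SELECT id, group_concat(text, '') AS text2 FROM t GROUP BY id"
).fetchall()
print(rows)
```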
I have two tables with schemas as shown below:

user.table1

pid       => varchar
timestamp => timestamp
sid       => varchar

pid  timestamp   sid  attribute1  attribute2  ...
-------------------------------------------------
1    2020/01/20  10   ...         ...         ...
2    2020/02/28  10   ...         ...         ...
3    2020/03/01  10   ...         ...         ...
4    2020/04/08  20   ...         ...         ...
5    2020/05/31  20   ...         ...         ...
6    2020/06/30  20   ...         ...         ...
7    2020/06/31  20   ...         ...         ...
8    2020/07/31  20   ...         ...         ...
user.table2

pid  => varchar
text => blob

pid  text
---------
1    xxx
2    abc
3    def
4    yyy
5    sss
6    abc
7    rrr
8    fff
I need to create a third table that should contain the information shown below:

user.table3

pid  timestamp   sid  text
--------------------------
1    2020/01/20  10   xxx
2    2020/02/28  10   abc
3    2020/03/01  10   def
4    2020/04/08  20   yyy
5    2020/05/31  20   sss
6    2020/06/30  20   abc
7    2020/06/31  20   rrr
8    2020/07/31  20   fff
Any idea how to do the select, inner join, and the insert in one SQL statement?
The reason I want to do it in one SQL statement is that I don't want to pull the information from the DB into Java and then write it back to the DB. I did it that way earlier, but it runs very slowly.
Currently, even the select I wrote below is not working.
select
    table1.pid, table1.sid, table1.timestamp,
    utl_raw.cast_to_varchar2(dbms_lob.substr(table2.text, 10, 1))
from user.table1 inner join user.table2
using pid
where pid in ('1', '2');
Just turn your SELECT into an INSERT statement:

INSERT INTO USER.table3 (pid, sid, timestamp, text)
SELECT t1.pid,
       t1.sid,
       t1.timestamp,
       UTL_RAW.cast_to_varchar2(DBMS_LOB.SUBSTR(t2.text, 10, 1))
FROM USER.table1 t1 INNER JOIN USER.table2 t2 ON (t1.pid = t2.pid)
WHERE t1.pid IN ('1', '2');
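The INSERT ... SELECT pattern itself is portable across databases. A minimal sketch using SQLite from Python (the UTL_RAW/DBMS_LOB calls are Oracle-specific and are dropped here, and the timestamp column is renamed ts only for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (pid TEXT, ts TEXT, sid TEXT);
CREATE TABLE table2 (pid TEXT, text TEXT);
CREATE TABLE table3 (pid TEXT, ts TEXT, sid TEXT, text TEXT);

INSERT INTO table1 VALUES ('1', '2020/01/20', '10'), ('2', '2020/02/28', '10');
INSERT INTO table2 VALUES ('1', 'xxx'), ('2', 'abc');

-- one statement: the SELECT feeds the INSERT directly, no round-trip to the client
INSERT INTO table3 (pid, ts, sid, text)
SELECT t1.pid, t1.ts, t1.sid, t2.text
FROM table1 t1 INNER JOIN table2 t2 ON t1.pid = t2.pid
WHERE t1.pid IN ('1', '2');
""")
print(conn.execute("SELECT * FROM table3 ORDER BY pid").fetchall())
```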
How can I add an identity number so that, when a row is inserted, a trigger assigns an incremental number per name, as below? I am using SQL Server.
1 AAA
2 AAA
3 BBB
4 CCC
5 CCC
6 CCC
7 DDD
8 DDD
9 EEE
....
And I want to convert it to:
1 AAA 1
2 AAA 2
4 CCC 1
5 CCC 2
6 CCC 3
7 DDD 1
8 DDD 2
You could create a FUNCTION which takes a name and returns the next identity for that name:

CREATE FUNCTION [dbo].[GetIdentityForName] (@Name VARCHAR(MAX))
RETURNS INT
AS
BEGIN
    RETURN
        (SELECT ISNULL(MAX(NameId), 0) + 1
         FROM YourTable
         WHERE Name = @Name);
END
and then add a DEFAULT constraint on NameId that calls the function when a record is inserted:

ALTER TABLE YourTable ADD CONSTRAINT
    DF_Identity_NameId DEFAULT ([dbo].[GetIdentityForName](Name)) FOR NameId

Assuming that YourTable is (Id, Name, NameId).
I hope this helps :)
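The ISNULL(MAX(NameId), 0) + 1 logic the function implements, sketched in plain Python against the sample data (illustrative only; next_name_id is a made-up helper name):

```python
def next_name_id(rows, name):
    # ISNULL(MAX(NameId), 0) + 1 for the given name:
    # take the largest NameId already stored for that name, or 0 if none,
    # and add one.
    ids = [name_id for n, name_id in rows if n == name]
    return (max(ids) if ids else 0) + 1

rows = [("AAA", 1), ("AAA", 2), ("CCC", 1)]
print(next_name_id(rows, "AAA"))  # next AAA gets 3
print(next_name_id(rows, "EEE"))  # an unseen name starts at 1
```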
There is no reason why you have to store the value. You could calculate it when you need it:
select t.*, row_number() over (partition by name order by id) as nameId
from t;
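SQLite 3.25+ supports the same window function, so the computed approach is easy to try from Python (table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "AAA"), (2, "AAA"), (4, "CCC"), (5, "CCC"), (6, "CCC")])

# ROW_NUMBER restarts at 1 for each name, ordered by id within the group
rows = conn.execute("""
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS nameId
    FROM t
""").fetchall()
for r in rows:
    print(r)
```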
This question already has an answer here:
Insert Into... Merge... Select (SQL Server)
(1 answer)
Closed 4 years ago.
I'm currently working on some kind of data mapping. Let's say I have the following three tables:
TemporaryTable

RUNID | DocId | Amount
E     | 7     | 50
C     | 6     | 12

Table1

T1ID | DocID | Amount
1    | 5     | 10
2    | 6     | 20
3    | 6     | 50

Table2

T2ID | RUNID | T1Id
1    | B     | 1
2    | C     | 2
3    | D     | 3
In Table1 and Table2, the columns T1ID and T2ID are identity columns that are populated automatically.
What I want to do now is to insert the values from TemporaryTable into Table1, and save the RUNID value from TemporaryTable together with the newly generated T1ID into Table2.
The resulting tables should look like this:

Table1

T1ID | DocID | Amount
1    | 5     | 10
2    | 6     | 20
3    | 6     | 50
4    | 7     | 50
5    | 6     | 12

Table2

T2ID | RUNID | T1Id
1    | B     | 1
2    | C     | 2
3    | D     | 3
4    | E     | 4
5    | C     | 5
I would like to do so with the help of the OUTPUT clause. Something like this:

CREATE TABLE #map (T1ID INT, RUNID VARCHAR(10))

INSERT INTO Table1 (DocId, Amount)
OUTPUT inserted.T1ID, t.RunId INTO #map
SELECT t.DocId, t.Amount
FROM TemporaryTable t

This obviously doesn't work, since I have no access to t.RunId in the OUTPUT clause. How could this be done?
You can use the MERGE command with an always-false match condition to simulate your insert, with all source columns available in OUTPUT:

MERGE Table1 t1
USING TemporaryTable t
   ON 1 = 2
WHEN NOT MATCHED THEN
    INSERT (DocId, Amount)
    VALUES (t.DocId, t.Amount)
OUTPUT inserted.T1ID, t.RunId
  INTO #map;
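SQLite has no MERGE, but the effect the trick achieves, pairing each source RUNID with its newly generated T1ID, can be sketched row by row with cursor.lastrowid (illustrative only; the MERGE version does the same thing in a single set-based statement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (T1ID INTEGER PRIMARY KEY AUTOINCREMENT,
                     DocID INTEGER, Amount INTEGER);
CREATE TABLE map (T1ID INTEGER, RUNID TEXT);
-- the three pre-existing rows, so generated ids continue at 4
INSERT INTO table1 (DocID, Amount) VALUES (5, 10), (6, 20), (6, 50);
""")

temporary_table = [("E", 7, 50), ("C", 6, 12)]
cur = conn.cursor()
for runid, doc_id, amount in temporary_table:
    cur.execute("INSERT INTO table1 (DocID, Amount) VALUES (?, ?)",
                (doc_id, amount))
    # lastrowid is the T1ID just generated; pair it with the source RUNID
    cur.execute("INSERT INTO map VALUES (?, ?)", (cur.lastrowid, runid))

print(conn.execute("SELECT * FROM map").fetchall())  # [(4, 'E'), (5, 'C')]
```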