regexp_extract insert new row for each comma - sql

Suppose we have this table and it has a column with multiple values separated by commas. I need to be able to separate the comma values and make a separate row out of it.
SELECT * FROM the_table
customer_id customer_value
1 aaa,bbb,ccc
2 ddd,ggg,ttt,lll
3 ppp,nnn,mmm,kkk,fff
I do not know if regexp_extract is the right function to use here but I am unable to create a new row.
SELECT *,
regexp_extract(customer_value,"^(?:[^,]*,){0}([^,]*)(?:[^,]*,){1}([^,]*)",1) as value_1,
regexp_extract(customer_value,"^(?:[^,]*,){0}([^,]*)(?:[^,]*,){1}([^,]*)",2) as value_2
FROM the_table
customer_id customer_value value_1 value_2
1 aaa,bbb,ccc aaa bbb
2 ddd,ggg,ttt,lll ddd ggg
3 ppp,nnn,mmm,kkk,fff ppp nnn
What I am looking for:
SELECT * FROM the_table
customer_id customer_value customer_value_comma
1 aaa,bbb,ccc aaa
1 aaa,bbb,ccc bbb
1 aaa,bbb,ccc ccc
2 ddd,ggg,ttt,lll ddd
2 ddd,ggg,ttt,lll ggg
2 ddd,ggg,ttt,lll ttt
2 ddd,ggg,ttt,lll lll.........

Here's your SQL:
SELECT
  *,
  explode(                       -- turn the array into one row per element
    split(customer_value, ",")   -- split the string into an array
  ) AS customer_value_comma      -- name the new column
FROM the_table
Here it is in PySpark:
from pyspark.sql.functions import split, explode, col
# sc is the SparkContext that the pyspark shell provides
data = [(1,"aaa,bbb,ccc"),
(2,"ddd,ggg,ttt,lll"),
(3,"ppp,nnn,mmm,kkk,fff")]
df = sc.parallelize(data).toDF(["customer_id","customer_value"])
# split makes an array column, explode emits one row per array element
df.withColumn("customer_value_comma",explode(split(col("customer_value"),","))).show()
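To see what split followed by explode does without a Spark session, the same transformation can be sketched in plain Python. The helper name explode_rows is hypothetical; it only mimics the one-row-per-element semantics and says nothing about how Spark executes the query.

```python
def explode_rows(rows, sep=","):
    """Yield one (id, original_value, piece) tuple per separated piece.

    Hypothetical helper that mimics explode(split(col, sep)) for illustration.
    """
    for customer_id, customer_value in rows:
        for piece in customer_value.split(sep):
            yield customer_id, customer_value, piece


data = [(1, "aaa,bbb,ccc"),
        (2, "ddd,ggg,ttt,lll"),
        (3, "ppp,nnn,mmm,kkk,fff")]

exploded = list(explode_rows(data))
for row in exploded:
    print(*row)
```

Each input row is repeated once per comma-separated piece, which is the shape asked for in the question.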

Related

How to transform data rows into new column?

table
id text
1 aaa
121 bbb
4 ccc
1 ddd
new table
id text2
1 aaaddd
121 bbb
4 ccc
I do not think I can use PIVOT since I never know how many and what id and text values would be so I cannot hardcode them in a PIVOT instruction.
Use GROUP BY with string_agg:
select id,string_agg(text,'') as text2
from table
group by id
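As a rough illustration of what the grouped concatenation produces, here is a plain-Python sketch. The function name concat_by_id is hypothetical, and it concatenates in input order, whereas a SQL aggregate generally does not promise an order unless the dialect lets you specify one.

```python
def concat_by_id(rows):
    """Concatenate text per id, in input order, keeping first-seen id order."""
    grouped = {}
    for row_id, text in rows:
        grouped[row_id] = grouped.get(row_id, "") + text
    return list(grouped.items())


rows = [(1, "aaa"), (121, "bbb"), (4, "ccc"), (1, "ddd")]
result = concat_by_id(rows)
print(result)
```

No id or text value is hardcoded, which is the property the asker could not get from PIVOT.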

SQL Auto Number rows with grouping

I am not sure if this can even be done, but I need to number rows based on specific columns. A simple row_number() is not enough, because I need a specific sequence and am not sure how to produce it. Below is the desired result I am trying to get in SQL.
COL_1 COL_2 DESIRED RESULT
AAA AAA 0
AAA BBB 1
AAA BBB 1
AAA CCC 2
AAA DDD 3
ABB ABB 0
ABB BBB 1
ABB CCC 2
ABB CCC 2
ABB DDD 3
I interpret this as wanting to enumerate the values of col_2 within col_1.
If so, you can use dense_rank():
select t.*,
dense_rank() over (partition by col_1 order by col_2) - 1 as ranking
from t;
This assumes that the ordering you want is based on the col_2 column within each col_1.
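The zero-based dense rank can be checked against the sample data with a small Python sketch. The function name is hypothetical, and Python's default string ordering stands in for whatever collation the database would use.

```python
def dense_rank_zero_based(rows):
    """Attach a 0-based dense rank of col_2 within each col_1 to every row."""
    distinct = {}
    for col_1, col_2 in rows:
        distinct.setdefault(col_1, set()).add(col_2)
    # Equal col_2 values share a rank; ranks have no gaps (dense).
    rank = {c1: {v: i for i, v in enumerate(sorted(vals))}
            for c1, vals in distinct.items()}
    return [(c1, c2, rank[c1][c2]) for c1, c2 in rows]


rows = [("AAA", "AAA"), ("AAA", "BBB"), ("AAA", "BBB"), ("AAA", "CCC"),
        ("AAA", "DDD"), ("ABB", "ABB"), ("ABB", "BBB"), ("ABB", "CCC"),
        ("ABB", "CCC"), ("ABB", "DDD")]
ranked = dense_rank_zero_based(rows)
for row in ranked:
    print(*row)
```

The third column reproduces the DESIRED RESULT column from the question, including the repeated 1 and 2 for duplicate rows.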

SQL find distinct values count, 2 times repeated values count, 3 times repeated values count and so on

For Example,
AAA
BBB
BBB
CCC
BBB
CCC
DDD
DDD
There are 4 unique values (AAA, BBB, CCC, DDD), 2 values repeated two times
(CCC, DDD), and 1 value repeated three times (BBB). I want to write a SQL query for this problem. Please help.
So the answer is 4, 2, 1.
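You can do this using row_number():
select seqnum, count(*)
from (select col, row_number() over (partition by col order by col) as seqnum
      from t
     ) h
group by seqnum
order by seqnum;
Each output row gives the number of values that occur at least seqnum times, so the sample data returns 4, 3, 1. To count the values that occur exactly n times (the 2 and the 1 in the question), first compute count(*) per value with group by col, then group by that count.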
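The 4, 2, 1 in the question combines two tallies: the number of distinct values, and the number of values that occur exactly n times. The sketch below, using only the sample data and Python's collections.Counter, shows both next to an "at least n times" tally so the difference is explicit. The variable names are illustrative.

```python
from collections import Counter

values = ["AAA", "BBB", "BBB", "CCC", "BBB", "CCC", "DDD", "DDD"]

per_value = Counter(values)            # occurrences of each value
distinct_count = len(per_value)        # number of distinct values
exactly = Counter(per_value.values())  # how many values occur exactly n times

# How many values occur at least n times, for n = 1 .. highest count.
max_count = max(per_value.values())
at_least = {n: sum(1 for c in per_value.values() if c >= n)
            for n in range(1, max_count + 1)}

print(distinct_count, exactly[2], exactly[3])
print(at_least)
```

Here distinct_count, exactly[2] and exactly[3] give the asker's 4, 2, 1, while at_least counts BBB among the values seen twice as well.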

SQL SERVER: TOP 10 rows per field - if less than 10 rows then display empty rows

I need to select the TOP 10 ACCT rows for each SYS_CD value, so I wrote the query below. The query works fine.
SELECT SYS_CD, ACCT, CNTACCT, rowid
FROM
(   SELECT SYS_CD, ACCT, COUNT(ACCT) AS CNTACCT,
           ROW_NUMBER() OVER (PARTITION BY SYS_CD
                              ORDER BY COUNT(ACCT) DESC) AS rowid
    FROM [FCIDIAL].[dbo].table1
    WHERE ERR_CD != 'Y'
    GROUP BY SYS_CD, ACCT
) tmp
WHERE rowid <= 10
ORDER BY SYS_CD, rowid, ACCT;
It produces the result below:
SYS_CD ACCT CNTACCT rowid
AAA 606000 4 1
AAA 566000 3 2
AAA 503200 1 3
BBB 251260 42433978 1
BBB 400601 41181797 2
BBB 400401 8399908 3
BBB 503200 2087703 4
BBB 604000 40795 5
BBB 130039 4748 6
BBB 252000 655 7
BBB 736000 40 8
BBB 735000 38 9
BBB 734000 36 10
CCC 233210 73611 1
CCC 464250 39397 2
CCC 186020 35231 3
CCC 265155 4949 4
The query result is also correct.
However, for any SYS_CD with fewer than 10 rowids, I want to display blank rows for the remaining positions.
Example: in the result above, 'AAA' has only 3 rowids, so I need to display 7 blank rows.
'BBB' has 10 rowids, so it needs no blank rows.
'CCC' has 4 rowids, so I need to display 6 blank rows.
I expect the output below.
SYS_CD ACCT CNTACCT rowid
AAA 606000 4 1
AAA 566000 3 2
AAA 503200 1 3
- Blank Row
- Blank Row
- Blank Row
- Blank Row
- Blank Row
- Blank Row
- Blank Row
BBB 251260 42433978 1
BBB 400601 41181797 2
BBB 400401 8399908 3
BBB 503200 2087703 4
BBB 604000 40795 5
BBB 130039 4748 6
BBB 252000 655 7
BBB 736000 40 8
BBB 735000 38 9
BBB 734000 36 10
CCC 233210 73611 1
CCC 464250 39397 2
CCC 186020 35231 3
CCC 265155 4949 4
- Blank Row
- Blank Row
- Blank Row
- Blank Row
- Blank Row
- Blank Row
How can I achieve this result?
You need every distinct value of SYS_CD, cross joined with a table of the numbers 1 to 10:
SELECT ccd.SYS_CD, n.RowID
FROM (SELECT DISTINCT SYS_CD FROM [FCIDIAL].[dbo].table1 WHERE ERR_CD != 'Y') AS ccd
CROSS JOIN (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS n (RowID);
Once you have this you can LEFT JOIN back to your original query, so you will end up with NULL for missing records:
WITH tmp AS
( SELECT SYS_CD,
ACCT,
COUNT(ACCT) AS CNTACCT,
ROW_NUMBER() OVER (PARTITION BY SYS_CD ORDER BY COUNT(ACCT) DESC) AS rowid
FROM [FCIDIAL].[dbo].table1
WHERE ERR_CD != 'Y'
GROUP BY SYS_CD, ACCT
)
SELECT ccd.SYS_CD, tmp.ACCT, tmp.CNTACCT, n.RowID
FROM (SELECT DISTINCT SYS_CD FROM [FCIDIAL].[dbo].table1 WHERE ERR_CD != 'Y') AS ccd
CROSS JOIN (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS n (RowID)
LEFT JOIN tmp
ON tmp.SYS_CD = ccd.SYS_CD
AND tmp.rowid = n.RowID
ORDER BY ccd.Sys_CD, n.RowID;
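The cross join followed by a left join can be sketched outside the database as well. In this Python illustration the function name pad_groups and the tuple layout are assumptions, and the input is a subset of the ranked rows shown in the question.

```python
def pad_groups(rows, slots=10):
    """Pad each SYS_CD to `slots` rows; missing slots get None values.

    rows: iterable of (sys_cd, acct, cnt, rowid) tuples, already ranked.
    """
    by_key = {(sys_cd, rowid): (acct, cnt) for sys_cd, acct, cnt, rowid in rows}
    sys_cds = sorted({row[0] for row in rows})
    padded = []
    for sys_cd in sys_cds:                  # cross join: every SYS_CD ...
        for rowid in range(1, slots + 1):   # ... with every slot 1..slots
            # left join: keep the slot even when no ranked row matches
            acct, cnt = by_key.get((sys_cd, rowid), (None, None))
            padded.append((sys_cd, acct, cnt, rowid))
    return padded


ranked = [("AAA", 606000, 4, 1), ("AAA", 566000, 3, 2), ("AAA", 503200, 1, 3),
          ("CCC", 233210, 73611, 1), ("CCC", 464250, 39397, 2),
          ("CCC", 186020, 35231, 3), ("CCC", 265155, 4949, 4)]
padded = pad_groups(ranked)
for row in padded:
    print(*row)
```

Matching on both the SYS_CD and the slot number is what keeps the padding per group; matching on the slot number alone would not produce the blank rows.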
If you don't have an existing tally table, you can generate one on the fly pretty easily. This should work assuming you are on SQL Server 2008 or later.
with MyData as
(
SELECT SYS_CD
, ACCT
, COUNT(ACCT) AS CNTACCT
, ROW_NUMBER() OVER (PARTITION BY SYS_CD ORDER BY COUNT(ACCT) DESC) AS rowid
FROM [FCIDIAL].[dbo].table1
WHERE ERR_CD != 'Y'
GROUP BY SYS_CD
, ACCT
)
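select s.SYS_CD, d.ACCT, d.CNTACCT, n.x as rowid
from (select distinct SYS_CD from MyData) s
cross join (Values(1),(2), (3), (4), (5), (6), (7), (8), (9), (10)) n(x)
-- join on SYS_CD as well as rowid, otherwise rows are not padded per SYS_CD
left join MyData d on d.SYS_CD = s.SYS_CD and d.rowid = n.x
order by s.SYS_CD, n.x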

filter column + insert

I have two tables:
TABLE1:
field1 | field2 | field3
1 5 aaa
2 10 bbb
3 10 ccc
4 10 ddd
5 10 eee
6 6 fff
7 7 ggg
TABLE2 should receive every row of TABLE1 whose field2 value occurs at least twice.
In this case it should look like this:
TABLE2:
field1 | field2 | field3
2 10 bbb
3 10 ccc
4 10 ddd
5 10 eee
How can I find which field2 values occur two or more times, and then do this insert?
If I understand, you want the second table to have all duplicates (relative to field2) in the first table.
select field1, field2, field3
into table2
from (select t.*, count(*) over (partition by field2) as cnt
from table1 t
) t
where cnt >= 2;
This creates the second table using select into. If it already exists, use insert . . . select instead.
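The windowed count amounts to "attach to each row the number of rows sharing its field2, then filter". A short Python sketch over the sample rows, with illustrative variable names, shows the same selection:

```python
from collections import Counter

table1 = [(1, 5, "aaa"), (2, 10, "bbb"), (3, 10, "ccc"), (4, 10, "ddd"),
          (5, 10, "eee"), (6, 6, "fff"), (7, 7, "ggg")]

# Equivalent of count(*) over (partition by field2)
counts = Counter(field2 for _, field2, _ in table1)

# Equivalent of where cnt >= 2
table2 = [row for row in table1 if counts[row[1]] >= 2]
for row in table2:
    print(*row)
```

Only the four rows with field2 = 10 survive, matching the TABLE2 contents in the question.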