How to normalize multiple values in a single field (SQL)


I have a table with columns Foo and Bar, where Foo is a unique ID and Bar contains multiple values separated by ~:
Foo Bar
1 A~B~
2 A~C~D
I need it to be normalised as such:
Foo Bar
1 A
1 B
2 A
2 C
2 D
While I can do this in Excel using Text to Columns followed by a pivot, that is not practical here: I have over 1 million records, and the Bar column may contain up to 12 different values.
Is there a simple way to do this directly in SQL?

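You have a standard one-to-many relationship here: one Foo to many Bars. Storing several values in a single column violates First Normal Form (1NF), which requires each column to hold one atomic value, so the fix is to split the values into one row each.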
Here is a SO post explaining the best way to split the string column value into rows like you want:
Turning a Comma Separated string into individual rows

You didn't specify your DBMS so this is for Postgres:
select t.foo, b.bar
from the_table t,
unnest(string_to_array(t.bar, '~')) as b(bar);
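For readers who want to see the row-expansion logic outside the database, here is a minimal Python sketch of the same idea: split each Bar value on the separator and emit one (Foo, Bar) pair per piece. The sample rows mirror the question's table, and the helper name split_rows is hypothetical. Note that the first sample value ends with a trailing ~, which leaves an empty piece after splitting; the sketch skips it, and it may be worth checking how your database treats that case.

```python
# Hypothetical sample rows mirroring the question's table (Foo, Bar).
rows = [(1, "A~B~"), (2, "A~C~D")]


def split_rows(rows, sep="~"):
    """Yield one (foo, bar) pair per non-empty piece of the delimited field."""
    for foo, bar in rows:
        for piece in bar.split(sep):
            if piece:  # skip the empty piece left by a trailing separator
                yield foo, piece


normalized = list(split_rows(rows))
print(normalized)
```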

Thanks all. The script below works wonders, even though I do not understand XML or the logic behind it.
SELECT A.FOO,
Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
(
SELECT FOO,
CAST ('<M>' + REPLACE(BAR, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM Table1
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
Reference:
Turning a Comma Separated string into individual rows
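Since the logic of the XML approach is not obvious, here is a small Python sketch of what it appears to do: turn every separator into a closing tag plus an opening tag, wrap the whole string in one more pair of tags, and then read each M element back as a separate value. The function name split_via_xml and the extra root wrapper are assumptions of this sketch (Python's parser needs a single root element); they are not part of the T-SQL script.

```python
import xml.etree.ElementTree as ET


def split_via_xml(value, sep=","):
    """Split a delimited string by turning it into <M>...</M> elements."""
    # Same string surgery as the REPLACE call: every separator becomes
    # a close tag followed by an open tag.
    fragment = "<M>" + value.replace(sep, "</M><M>") + "</M>"
    # ElementTree needs a single root element, so wrap the fragment
    # (an assumption of this sketch, not part of the T-SQL script).
    root = ET.fromstring("<root>" + fragment + "</root>")
    # Each <M> node holds one value; empty elements have text None.
    return [node.text or "" for node in root.findall("M")]


pieces = split_via_xml("A,C,D")
print(pieces)
```

One consequence of this design is that values containing characters such as & or < are not valid XML text and would need escaping first.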

Related

BigQuery - Concatenate multiple columns into a single column for large numbers of columns

I have data that looks like:
row col1 col2 col3 ... coln
1 A null B ... null
2 null B C ... D
3 null null null ... A
I want to condense the columns together to get:
row final
1 A, B
2 B, C, D
3 A
The order of the letters doesn't matter, and if the solution includes the nulls (e.g. A,null,B,null etc.) I can work out how to remove them later. I've used "up to coln" because I have about 200 columns to condense.
I've tried a few things. If I were trying to condense rows, I could use STRING_AGG().
Additionally I could do this:
SELECT
CONCAT(col1,", ",col2,", ",col3,", ",coln) # etc.
FROM mytable
However, this would involve writing out each column name by hand, which isn't really feasible. Is there a better way to achieve this, ideally for the whole table?
Additionally CONCAT returns NULL if any value is NULL.

#standardSQL
select row,
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.* except(row))), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as final
from `project.dataset.table` t
Applied to the sample data in your question, this returns one row per input row, with the non-null values joined into final.
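As a rough illustration of the target transformation (not of the BigQuery query itself), here is a minimal Python sketch that joins every non-null column value of a record except the key. The sample records follow the question's table, with the elided columns left out; the function name condense is hypothetical.

```python
# Sample records shaped like the question's table; None stands in for NULL.
rows = [
    {"row": 1, "col1": "A", "col2": None, "col3": "B", "coln": None},
    {"row": 2, "col1": None, "col2": "B", "col3": "C", "coln": "D"},
    {"row": 3, "col1": None, "col2": None, "col3": None, "coln": "A"},
]


def condense(record, key="row"):
    """Join all non-null values except the key column, in column order."""
    values = [v for k, v in record.items() if k != key and v is not None]
    return {key: record[key], "final": ", ".join(values)}


condensed = [condense(r) for r in rows]
print(condensed)
```

No column names are written out by hand, which is the property the question asks for.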

Not in exact format that you asked for, but you can try if this simplifies things for you:
SELECT TO_JSON_STRING(mytable) FROM mytable
If you want the exact format, you can write a regex to extract values from the output JSON string.

sql how to convert multi select field to rows with totals

I have a table with a field whose contents are a concatenated list of selections from a multi-select form. I would like to convert the data in this field into another table where each row has the text of a selection and a count of the number of times that selection was made.
eg.
Original table:
id selections
1 A;B
2 B;D
3 A;B;D
4 C
I would like to get the following out:
selection count
A 2
B 3
C 1
D 2
I could easily do this with split and maps in JavaScript etc., but I am not sure how to approach it in SQL (I use PostgreSQL). The goal is to use the second table to plot a graph in Google Data Studio.

A much simpler solution:
select regexp_split_to_table(selections, ';'), count(*)
from test_table
group by 1
order by 1;

You can use a lateral join and handy set-returning function regexp_split_to_table() to unnest the strings to rows, then aggregate and count:
select x.selection, count(*) cnt
from mytable t
cross join lateral regexp_split_to_table(t.selections, ';') x(selection)
group by x.selection
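The split-then-count logic in both queries can be sketched outside the database as follows. This is only an illustration in Python, using the question's sample data; the variable names are hypothetical.

```python
from collections import Counter

# The "selections" column from the question's sample table.
selections = ["A;B", "B;D", "A;B;D", "C"]

# Unnest every field into individual items, then count each item.
counts = Counter(item for field in selections for item in field.split(";"))

# Sort by selection, like the ORDER BY in the first query.
result = sorted(counts.items())
print(result)
```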

Get Distinct value from a list in SQL Server

I have a DB column that has a comma delimited list:
VALUES ID
--------------------
1,11,32 A
11,12,28 B
1 C
32,12,1 D
When I run my SQL statement, in my WHERE clause I have tried IN, CONTAINS and LIKE with varying degrees of errors and success, but none offer an exact return of what I need.
What I need is a WHERE clause that finds all IDs whose list contains the value '1' (the whole value, NOT the digit).
Example of problem:
WHERE values like (1)
This will return A,B,C,D because 1 is included in the value (11). I would expect IDs (A,C,D).
WHERE values like (2)
This will return A,B,D because 2 is included in the values (32,28,12). I would expect zero records.
Thanks in advance for your help!

I will begin my answer by quoting the spot-on comment given by @Jarlh above:
Never, ever store data as comma separated items. It will only cause you lots of trouble.
That being said, if you're really stuck with this design, you could use:
SELECT *
FROM yourTable
WHERE ',' + [VALUES] + ',' LIKE '%,1,%';
The trick here is to convert every VALUES string into something that looks like:
,11,12,28,
Then, we can search for a target number with comma delimiters on both sides. Since we placed commas at both ends, then every number in the CSV list is now guaranteed to have commas around it.
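To make the difference from a plain substring match concrete, here is a short Python sketch of the same delimiter-wrapping idea, run against the question's sample data. The function name ids_containing is hypothetical.

```python
# (VALUES, ID) pairs from the question's sample table.
table = [("1,11,32", "A"), ("11,12,28", "B"), ("1", "C"), ("32,12,1", "D")]


def ids_containing(table, target):
    """Return IDs whose comma-separated list contains target as a whole item."""
    needle = "," + target + ","
    # Wrapping the list in commas guarantees every item has one on each side.
    return [row_id for values, row_id in table if needle in "," + values + ","]


# A plain substring test also matches the "1" inside "11" and "12".
naive = [row_id for values, row_id in table if "1" in values]

print(naive)
print(ids_containing(table, "1"))
print(ids_containing(table, "2"))
```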

If you are stuck with such a poor data model, I would suggest:
select t.*
from t
where exists (select 1
from string_split(t.[values], ',') s
where s.value = '1'
);

Exactly, I echo what jarlh and Tim say. A relational table is not the right place to store comma-delimited strings.
Here is an approach that can likely use an index if there is one on column x:
select distinct x
from t
cross apply string_split(t.x,',')
where value=1 /* the literal here can be parameterized; the filter could also make use of an index on value if there is one */
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b9b3084f52b0f42ffd17d90427016999
--SQL Server older versions
with data
as (
SELECT t.c.value('.', 'VARCHAR(1000)') as val
,y
,x
FROM (
SELECT x1 = CAST('<t>' +
REPLACE(x , ',', '</t><t>') + '</t>' AS XML)
,y
,x
FROM t
) a
CROSS APPLY x1.nodes('/t') t(c)
)
select x,y
from data
where val = '1'
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=011a096bbdd759ea5fe3aa74b08bc895

need to split a column value

I have the table below:
id name total
1 a 2
2 b 3
3 c,d,e,f 15
Expected Output:-
id name total
1 a 2
2 b 3
3 c 15
4 d 15
5 e 15
5 f 15
I tried a split function and also XML, but neither worked.

As you don't specify the database, I am assuming SQL Server. You can try this:
Working Example
SELECT A.[id],
Split.a.value('.', 'VARCHAR(100)') AS String,A.total
FROM (SELECT [id],
CAST ('<M>' + REPLACE([name], ',', '</M><M>') + '</M>' AS XML) AS String ,
[total]
FROM #t) AS A
CROSS APPLY String.nodes ('/M') AS Split(a);
Refer to this article

Which version of SQL are you using?
The split function is for splitting a string of text, but what you are requesting is a change to the format of the table itself.
Your table has a tuple of id=3, name=c,d,e,f, total=15.
If you want id=3, name=c and so on, you have to change the data.
From the way your question is phrased, it implies that you want the data to be presented in a different way, but the id is the defining column which differentiates between rows in the database.
You could automatically generate a new table, in which case the split statement would be useful to get each element out of your comma separated record.
Once you have that list of items, assuming your id field is an identity field (auto incrementing), you could run an insert statement for each element.
You might be able to get the sort of output you're looking for using an inner select that splits the comma separated list of values, but you would need some procedural SQL (or T-SQL... you do not specify your SQL server) to iterate over the values and insert them into a new table.
If you do go down this route, the id values will have to be thrown away, and you would treat the list as just a raw data set.
EDIT: The example posted by Have No Display Name is about as close as you're going to get with the data in the form it is.
The IDs for the names 'c','d','e' and 'f' will all be 3, but your format will be very close.

substring and trim in Teradata

I am working in Teradata with some descriptive data that needs to be transformed from a generic VARCHAR(60) into different field lengths based on the type of data element and the attribute value. So I need to take whatever is in the VARCHAR(60) and, based on field 'ABCD', act on field 'XYZ'. In this case XYZ is a VARCHAR(3). To do this I am using CASE logic within my select. What I want to do is:
eliminate all occurrences of non-alphanumeric data, so that all I have left are upper-case alpha characters and numbers (in this case, where abcd = 'GROUP', xyz should come out as '000', '002', 'A', 'C')
eliminate extra padding
shift everything right
abcd xyz
1 GROUP NULL
2 GROUP $
3 GROUP 000000000000000000000000000000000000000000000000000000000000
4 GROUP 000000000000000000000000000000000000000000000000000000000002
5 GROUP A
6 GROUP C
7 GROUP r
To do this I have tried TRIM and SUBSTR, amongst several other things that did not work. I have pasted what I have working now, but I am not reliably working through the data within the select. I am really looking for some options on how to better work with strings in Teradata. I have been working out of the "SQL Functions, Operators, Expressions and Predicates" online PDF. Is there a better reference? We are on TD 13.
SELECT abcd
, CASE
-- xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
WHEN abcd= 'GROUP'
THEN(
CASE
WHEN SUBSTR(tx.abcd,60, 4) = 0
THEN (
SUBSTR(tx.abcd,60, 3)
)
ELSE
TRIM (TRAILING FROM tx.abcd)
END
)
END AS abcd
FROM db.descr tx
WHERE tx.abcd IS IN ( 'GROUP')
The end result should look like this
abcd xyz
1 GROUP 000
2 GROUP 002
3 GROUP A
4 GROUP C
I will have to deal with approx 60 different "abcd" types, but they should all conform to the type of data I am currently seeing.. ie.. mixed case, non numeric, non alphabet, padded, etc..
I know there is a better way, but I have come in several circles trying to figure this out over the weekend and need a little push in the right direction.
Thanks in advance,
Pat

The SQL below uses the CHARACTER_LENGTH function to first determine if there is a need to perform what amounts to a RIGHT(tx.xyz, 3) using the native functions in Teradata 13.x. I think this may accomplish what you are looking to do. I hope I have not misinterpreted your explanation:
SELECT CASE WHEN tx.abcd = 'GROUP'
AND CHARACTER_LENGTH(TRIM(BOTH FROM tx.xyz)) > 3
THEN SUBSTRING(tx.xyz FROM (CHARACTER_LENGTH(TRIM(BOTH FROM tx.xyz)) - 2))
ELSE tx.abcd
END
FROM db.descr tx;
EDIT: Fixed parenthesis in SUBSTRING
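The index arithmetic behind a RIGHT(..., 3) built from a length function and a 1-based substring is easy to get off by one, so here is a small Python sketch of it. It assumes the value is trimmed first and that only the last n characters are wanted; the function name right_trimmed is hypothetical, and the sketch does not cover the upper-casing or non-alphanumeric filtering from the question.

```python
def right_trimmed(value, n=3):
    """Return the last n characters of the trimmed value (or all of it if shorter)."""
    trimmed = value.strip()
    if len(trimmed) > n:
        # In 1-based terms, the last n characters start at length - n + 1.
        start = len(trimmed) - n + 1
        # Python slices are 0-based, hence the extra - 1.
        return trimmed[start - 1:]
    return trimmed


print(right_trimmed("0" * 59 + "2"))
print(right_trimmed("  A "))
```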
