Unpack all arrays in a JSON column (SQL Server 2019)

Say I have a table Schema.table with these columns
id | json_col
with rows of the form, e.g.
id = 1
json_col = {"names":["John","Peter"],"ages":["31","40"]}
The lengths of names and ages are always equal but might vary from id to id (size is at least 1 but no upper limit).
How do we get an "exploded" table, i.e. a table with a row for each names/ages pair, e.g.
id | names | ages
---+-------+------
1 | John | 31
1 | Peter | 40
2 | Jim | 17
3 | Foo | 2
...
I have tried OPENJSON and CROSS APPLY, but the following gives every combination of names and ages, which is not correct, so I need to do a lot of filtering afterwards:
SELECT *
FROM Schema.table
CROSS APPLY OPENJSON(Schema.table,'$.names')
CROSS APPLY OPENJSON(Schema.table,'$.ages')

Here's my suggestion
DECLARE @tbl TABLE(id INT, json_col NVARCHAR(MAX));
INSERT INTO @tbl VALUES(1, N'{"names":["John","Peter"],"ages":["31","40"]}')
                      ,(2, N'{"names":["Jim"],"ages":["17"]}');

SELECT t.id
      ,B.[key] AS ValueIndex
      ,B.[value] AS PersonName
      ,JSON_VALUE(A.ages, CONCAT('$[', B.[key], ']')) AS PersonAge
FROM @tbl t
CROSS APPLY OPENJSON(t.json_col)
            WITH(names NVARCHAR(MAX) AS JSON
                ,ages NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.names) B;
The idea in short:
We use OPENJSON with a WITH clause to read names and ages into new json variables.
We use one more OPENJSON to "explode" the names-array
As the key is the value's position within the array, we can use JSON_VALUE() to read the corresponding age-value by its position.
One general remark: if this JSON is under your control, you should change this to an entity-centered approach (an array of objects). Position-dependent storage like this can be quite error-prone. Try something like:
{"persons":[{"name":"John","age":"31"},{"name":"Peter","age":"40"}]}

Conditional aggregation along with CROSS APPLY might be used:
SELECT id,
MAX(CASE WHEN RowKey = 'names' THEN value END) AS names,
MAX(CASE WHEN RowKey = 'ages' THEN value END) AS ages
FROM
(
SELECT id, Q0.[value] AS RowArray, Q0.[key] AS RowKey
FROM tab
CROSS APPLY OPENJSON(JsonCol) AS Q0
) r
CROSS APPLY OPENJSON(r.RowArray) v
GROUP BY id, v.[key]
ORDER BY id, v.[key]
id | names | ages
---+-------+------
1 | John | 31
1 | Peter | 40
2 | Jim | 17
3 | Foo | 2
Demo
Note on the attempt in the question: the first argument to OPENJSON must be a JSON value (e.g. the json_col column), not the table itself.
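For reference, a corrected form of the attempt from the question (still producing the unwanted cross product of names × ages) might look like:
SELECT t.id, n.[value] AS name, a.[value] AS age
FROM [Schema].[table] t
CROSS APPLY OPENJSON(t.json_col, '$.names') n
CROSS APPLY OPENJSON(t.json_col, '$.ages') a;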

Select concatenated columns based on criteria list in other table

I have a table1:
line | a  | b | c | d   | e | f | g | h
-----+----+---+---+-----+---+---+---+---
1    | 18 | 2 | 2 | 22  | 0 | 2 | 1 | 2
2    | 20 | 2 | 2 | 2   | 0 | 0 | 0 | 2
3    | 10 | 2 | 2 | 222 | 0 | 2 | 1 | 2
4    | 12 | 2 | 2 | 3   | 0 | 0 | 0 | 0
5    | 15 | 2 | 2 | 3   | 0 | 0 | 0 | 0
And a table2:
line | criteria
-----+----------
1    | a,b
2    | b,c,f,h
3    | a,b,e,g,h
4    | c,e
I am using this code to see/select the unique results of concatenated/joined columns, like concat(c,',',d), concat(b,',',d,',',g) and so on from table1, and it is working perfectly:
SELECT DISTINCT(CONCAT(c,',',d))
FROM table1
But instead of writing concat(c,',',d) manually, I want to refer to table2.criteria to get the column references to be concatenated from table1, so that I can see the entire set of unique results for each criteria row.
Tried this, but getting an error:
SELECT DISTINCT(SELECT criteria FROM table2)
FROM table1
ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
The expected unique result is something like this:
| criteria | result |
| ------------ | ---------- |
| a,b | 15,2 |
| a,b | 10,2 |
| a,b | 20,2 |
| a,b | 12,2 |
| a,b | 18,2 |
| b,c,f,h | 2,2,2,2 |
| b,c,f,h | 2,2,0,2 |
| b,c,f,h | 2,2,0,0 |
| a,b,e,g,h | 20,2,0,0,2 |
| a,b,e,g,h | 12,2,0,0,0 |
| a,b,e,g,h | 15,2,0,0,0 |
| a,b,e,g,h | 10,2,0,1,2 |
| a,b,e,g,h | 18,2,0,1,2 |
| c,e | 2,0 |
SQL does not allow parameterizing identifiers. There are various ways to work around this restriction.
It's unclear from the question, but according to comments you want to concatenate the given pattern for every row in table1.
1. Dynamic SQL
Create a helper function (once!) that concatenates and executes statements dynamically.
Basics:
Define table and column names as arguments in a plpgsql function?
CREATE OR REPLACE FUNCTION f_concat_cols(_cols text)
RETURNS TABLE (result text)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$SELECT concat_ws(',', %s) FROM table1 ORDER BY line$q$, _cols);
END
$func$;
It's a set-returning function (a.k.a. "table function"), to return one result row for every row in table1 for each given pattern.
Warning: Converting user input to code like this is a prime opportunity for SQL injection. You must make sure that table2.criteria can only hold valid column lists!
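For instance, a standalone call (a quick sketch against table1 above) returns one concatenated row per row of table1:
SELECT * FROM f_concat_cols('a,b');

result
------
18,2
20,2
10,2
12,2
15,2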
To get the full result matrix (with distinct results per row in table2), the query is simple now:
SELECT DISTINCT line AS t2_line, criteria, t1.*
FROM table2, f_concat_cols(criteria) t1
ORDER BY t2_line;
2. Workaround with conversion to JSON
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, to_json(t) AS js FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.js->>sub, ',') AS result
FROM unnest(string_to_array(t2.criteria, ',')) sub
) c
ORDER BY t2_line;
After converting rows from t1 to a JSON record, we can access keys (converted from column names) directly.
I unnest the pattern, access each single key, and aggregate the result in a LATERAL subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
You could encapsulate the logic in a function like in 1., but that's optional in this case.
3. Workaround with conversion to Postgres arrays
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, ARRAY [a,b,c,d,e,f,g,h] AS arr FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.arr[idx]::text, ',') AS result
FROM unnest(string_to_array(translate(t2.criteria, 'abcdefgh', '12345678'), ',')::int[]) idx
) c
ORDER BY t2_line;
Similar to the "trick" with JSON, we can avoid dynamic SQL by converting columns to a plain Postgres array. Then project column names to integer array indices. I use translate() for the simple case, but that only works for single letters! Use replace() or regexp_replace() or some other method for longer names.
The rest is like the above.
fiddle - showing all.

What would be a good way to pivot this data?

I have data in the following granularity:
CityID | Name | Post_Science | Pre_Science | Post_Reading | Pre_Reading | Post_Writing | Pre_Writing
123 | Bob | 2.0 | 1.0 | 2.0 | 4.0 | 1.0 | 1.0
I'll be calling those <Post/Pre>_XXXXXX columns Labels. Basically, these column names without the 'Pre' or 'Post' prefix map to a Label in another table.
I want to pivot the data in a way so that the pre and post values of the same Label are in the same row, for each group of CityID, Name, Label. So it would look like this:
CityID | Name | Pre Category | Post Category | Label
123 | Bob | 1.0 | 2.0 | Science
123 | Bob | 4.0 | 2.0 | Reading
123 | Bob | 1.0 | 1.0 | Writing
The Label comes from a separate table via a join. Hopefully that doesn't confuse anyone. If it does, ignore the column for now.
There are many more of these categories; Science, Reading, and Writing are just a few I picked as examples.
I've thought of two options to get the data in this format:
1. Unpivot all the data into a long list of all the values at a group of CityID, Name, Label. Then parse the Label name and pivot back so that the pre and post values of one category land in one row.
2. Do a bunch of unions: select all the Science in one select statement, all the Reading in another, and union them. There are about 50 pairings, so 50 union statements.
I'm imagining the first option is cleaner than the latter. Any other options though?
This is unpivoting and I strongly recommend apply:
select t.CityId, t.Name, v.*
from t cross apply
(values (t.Post_Science, t.Pre_Science, 'Science'),
(t.Post_Reading, t.Pre_Reading, 'Reading'),
(t.Post_Writing, t.Pre_Writing, 'Writing')
) v(postcategory, precategory, label);
UNPIVOT is very particular syntax to do one thing. APPLY introduces lateral joins, which are very powerful for this and many other purposes.
Clearly Gordon's solution is more performant, but if you have many or variable columns, here is an option that will dynamically UNPIVOT your data without actually using dynamic SQL.
Example
Select A.CityID
,A.Name
,PreCat = max(case when Item Like 'Pre%' then Value end)
,PostCat = max(case when Item Like 'Post%' then Value end)
,Label = substring(Item,charindex('_',Item+'_')+1,50)
From YourTable A
Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
Cross Apply (
Select Item = xAttr.value('local-name(.)', 'varchar(100)')
,Value = xAttr.value('.','varchar(max)')
From B.XMLData.nodes('//@*') xNode(xAttr)
Where xAttr.value('local-name(.)','varchar(100)') not in ('CityId','Name','Other-Columns','To-Exclude')
) C
Group By A.CityID
,A.Name
,substring(Item,charindex('_',Item+'_')+1,50)
Returns
CityID Name PreCat PostCat Label
123 Bob 4.0 2.0 Reading
123 Bob 1.0 2.0 Science
123 Bob 1.0 1.0 Writing

Using dynamic unpivot with columns with different types

I have a table with around 100 columns named F1, F2, ..., F100.
I want to query the data row-wise, like this:
F1: someVal1
F2: someVal2
...
I am doing all this inside a stored procedure, therefore I am generating the SQL dynamically.
I have successfully generated the following SQL:
select CAST(valname as nvarchar(max)), CAST(valvalue as nvarchar(max)) from tbl_name unpivot
(
valvalue for valname in ([form_id], [F1],[F2],[F3],[F4],[F5],[F6],[F7],[F8],[F9],[F10],[F11],[F12],[F13],[F14],[F15],[F16],[F17],[F18],[F19],[F20],[F21],[F22],[F23],[F24],[F25],[F26],[F27],[F28],[F29],[F30],[F31],[F32],[F33],[F34],[F35],[F36],[F37],[F38],[F39],[F40],[F41],[F42],[F43],[F44],[F45],[F46],[F47],[F48],[F49],[F50],[F51],[F52],[F53],[F54],[F55],[F56],[F57],[F58],[F59],[F60],[F61],[F62],[F63],[F64],[F65],[F66],[F67],[F68],[F69],[F70],[F71],[F72],[F73],[F74],[F75],[F76],[F77],[F78],[F79],[F80],[F81],[F82],[F83],[F84],[F85])
) u
But on executing this query, I get this exception:
The type of column "F3" conflicts with the type of other columns
specified in the UNPIVOT list.
I guess this is because F3 is varchar(100) while form_id, F1 and F2 are varchar(50). According to my understanding, I shouldn't be getting this error because I am casting all the results to nvarchar(max) in the select statement.
This table has all kinds of columns like datetime, smallint and int.
Also, all the columns of this table except one have the SQL_Latin1_General_CP1_CI_AS collation.
What is the fix for this error?
The fix is to use a subquery so that all columns share the same type (and length) before the unpivot: the CAST in your outer SELECT only runs after UNPIVOT has already compared the original column types, which is why it doesn't help. CAST the values in a subquery, then unpivot that instead:
select valname, valvalue
from (
SELECT
CAST([form_id] as nvarchar(max)) form_id,
CAST([F1] as nvarchar(max)) F1,
CAST([F2] as nvarchar(max)) F2,
CAST([F3] as nvarchar(max)) F3,
CAST([F4] as nvarchar(max)) F4,
....
FROM tbl_name
) t1 unpivot
(
valvalue for valname in ([form_id], [F1],[F2],[F3],[F4],[F5],[F6],[F7],[F8],[F9],[F10],[F11],[F12],[F13],[F14],[F15],[F16],[F17],[F18],[F19],[F20],[F21],[F22],[F23],[F24],[F25],[F26],[F27],[F28],[F29],[F30],[F31],[F32],[F33],[F34],[F35],[F36],[F37],[F38],[F39],[F40],[F41],[F42],[F43],[F44],[F45],[F46],[F47],[F48],[F49],[F50],[F51],[F52],[F53],[F54],[F55],[F56],[F57],[F58],[F59],[F60],[F61],[F62],[F63],[F64],[F65],[F66],[F67],[F68],[F69],[F70],[F71],[F72],[F73],[F74],[F75],[F76],[F77],[F78],[F79],[F80],[F81],[F82],[F83],[F84],[F85])
) u
The simplest way, I would say, is to use CROSS APPLY with VALUES to do the unpivot:
SELECT *
FROM People CROSS APPLY (VALUES
(CAST([form_id] as nvarchar(max))),
(CAST([F1] as nvarchar(max))),
(CAST([F2] as nvarchar(max))),
(CAST([F3] as nvarchar(max))),
(CAST([F4] as nvarchar(max))),
....
) v (valvalue)
Here is a sample using CROSS APPLY with VALUES to do the unpivot.
There are several different data types in the People table, so we cast everything to varchar(max) to give the columns a common type.
CREATE TABLE People
(
IntVal int,
StringVal varchar(50),
DateVal date
)
INSERT INTO People VALUES (1, 'Jim', '2017-01-01');
INSERT INTO People VALUES (2, 'Jane', '2017-01-02');
INSERT INTO People VALUES (3, 'Bob', '2017-01-03');
Query 1:
SELECT *
FROM People CROSS APPLY (VALUES
(CAST(IntVal AS VARCHAR(MAX))),
(CAST(StringVal AS VARCHAR(MAX))),
(CAST(DateVal AS VARCHAR(MAX)))
) v (valvalue)
Results:
| IntVal | StringVal | DateVal | valvalue |
|--------|-----------|------------|------------|
| 1 | Jim | 2017-01-01 | 1 |
| 1 | Jim | 2017-01-01 | Jim |
| 1 | Jim | 2017-01-01 | 2017-01-01 |
| 2 | Jane | 2017-01-02 | 2 |
| 2 | Jane | 2017-01-02 | Jane |
| 2 | Jane | 2017-01-02 | 2017-01-02 |
| 3 | Bob | 2017-01-03 | 3 |
| 3 | Bob | 2017-01-03 | Bob |
| 3 | Bob | 2017-01-03 | 2017-01-03 |
Note
When you use UNPIVOT, you need to make sure the unpivoted columns' data types are the same.
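As a minimal illustration (a sketch against the People table defined above), unpivoting the raw columns without the casts fails:
-- Raises "The type of column ... conflicts with the type of other
-- columns specified in the UNPIVOT list."
SELECT valname, valvalue
FROM People
UNPIVOT (valvalue FOR valname IN (IntVal, StringVal, DateVal)) u;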
Many ways a cat can skin you, or vice-versa.
Jokes apart, what D-Shih suggested is what you should start with; it may well get you home and dry.
In a majority of cases: essentially, the UNPIVOT operation stacks the data from multiple columns into one, so starting with a CAST is the best way forward, as it makes the data types identical (preferably a string type like varchar or nvarchar). It's also a good idea to use the same length for all unpivoted columns, in addition to the same type.
In other cases: if this still does not solve the problem, look deeper and check whether the ANSI_PADDING setting is ON or OFF across the columns of the table. In recent versions of SQL Server it is ON by default, but some developers may have created certain columns with ANSI_PADDING set to OFF. With a mixed setup like this, it's best to move the data to another table whose columns all have ANSI_PADDING ON; the same UNPIVOT query should then work against that table.
Check ANSI_Padding Status
SELECT name
,CASE is_ansi_padded
WHEN 1 THEN 'ANSI_Padding_On'
ELSE 'ANSI_Padding_Off'
END AS [ANSI_Padding_Check]
FROM sys.all_columns
WHERE object_id = object_id('yourschema.yourtable')
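If you do find mixed settings, the rebuild could look something like this (a sketch; tbl_name_padded is a hypothetical name, and the explicit CASTs ensure the new columns are created under the session's ANSI_PADDING ON setting):
SET ANSI_PADDING ON;

SELECT CAST(form_id AS nvarchar(50)) AS form_id
,CAST(F1 AS nvarchar(50)) AS F1
,CAST(F2 AS nvarchar(50)) AS F2
-- ...remaining columns...
INTO dbo.tbl_name_padded  -- hypothetical target table
FROM dbo.tbl_name;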
Many situations may be better suited to CROSS APPLY ... VALUES. It all depends on you, the jockey, to choose horses for courses.
Cheers.

Add Value column using another column as Key

Hopefully the table itself states the problem. Essentially, given the Type column on the left, is it possible to add a unique code/value column, using Type as a hash key/set, based on the order in which the types appear:
Type | Code
-----------
ADA | 1
ADA | 1
BIM | 2
BIM | 2
CUR | 3
BIM | 2
DEQ | 4
ADA | 1
... | ...
We can't simply hard-code the conversion, as each time there's an arbitrary number of Types.
You can use dense_rank():
select type, dense_rank() over (order by type) as code
from t;
However, I would advise you to create another table and to use that:
create table Types as
select row_number() over (order by type) as TypeId,
       type
from t
group by type;
Then, join that in:
select t.type, tt.TypeId
from t join
types tt
on t.type = tt.type;

Get even / odd / all numbers between two numbers

I want to display all the numbers (even / odd / mixed) between two numbers (1-9; 2-10; 11-20) in one (or two) column.
Example initial data:
| rang  |      | r1 | r2 |
|-------|      |----|----|
| 1-9   |      |  1 |  9 |
| 2-10  |      |  2 | 10 |
| 11-20 |  or  | 11 | 20 |
CREATE TABLE initialtableone(rang TEXT);
INSERT INTO initialtableone(rang) VALUES
('1-9'),
('2-10'),
('11-20');
CREATE TABLE initialtabletwo(r1 NUMERIC, r2 NUMERIC);
INSERT INTO initialtabletwo(r1, r2) VALUES
('1', '9'),
('2', '10'),
('11', '20');
Result:
| output |
----------------------------------
| 1,3,5,7,9 |
| 2,4,6,8,10 |
| 11,12,13,14,15,16,17,18,19,20 |
Something like this:
create table ranges (range varchar);
insert into ranges
values
('1-9'),
('2-10'),
('11-20');
with bounds as (
select row_number() over (order by range) as rn,
range,
(regexp_split_to_array(range,'-'))[1]::int as start_value,
(regexp_split_to_array(range,'-'))[2]::int as end_value
from ranges
)
select rn, range, string_agg(i::text, ',' order by i.ordinality)
from bounds b
cross join lateral generate_series(b.start_value, b.end_value) with ordinality i
group by rn, range
This outputs:
rn | range | string_agg
---+-------+------------------------------
3 | 2-10 | 2,3,4,5,6,7,8,9,10
1 | 1-9 | 1,2,3,4,5,6,7,8,9
2 | 11-20 | 11,12,13,14,15,16,17,18,19,20
Building on your first example, simplified, but with PK:
CREATE TABLE tbl1 (
tbl1_id serial PRIMARY KEY -- optional
, rang text -- can be NULL ?
);
Use split_part() to extract lower and upper bound. (regexp_split_to_array() would be needlessly expensive and error-prone). And generate_series() to generate the numbers.
Use a LATERAL join and aggregate the set immediately, which keeps the outer query simple. An ARRAY constructor is fastest in this case:
SELECT t.tbl1_id, a.output -- array; added id is optional
FROM (
SELECT tbl1_id
, split_part(rang, '-', 1)::int AS a
, split_part(rang, '-', 2)::int AS z
FROM tbl1
) t
, LATERAL (
SELECT ARRAY( -- preserves rows with NULL
SELECT g FROM generate_series(a, z, CASE WHEN (z-a)%2 = 0 THEN 2 ELSE 1 END) g
) AS output
) a;
AIUI, you want every number in the range only if the upper and lower bounds are a mix of even and odd numbers; otherwise, return only every 2nd number, which yields the all-even / all-odd output for those cases. This expression implements the calculation of the step:
CASE WHEN (z-a)%2 = 0 THEN 2 ELSE 1 END
For example, for '1-9' the difference 9-1 = 8 is even, so the step is 2 and we get 1,3,5,7,9; for '11-20' the difference 20-11 = 9 is odd, so the step is 1 and we get every number.
Result as desired:
output
-----------------------------
1,3,5,7,9
2,4,6,8,10
11,12,13,14,15,16,17,18,19,20
You do not need WITH ORDINALITY in this case, because the order of elements is guaranteed.
The aggregate function array_agg() makes the query slightly shorter (but slower) - or use string_agg() to produce a string directly, depending on your desired output format:
SELECT a.output -- string
FROM (
SELECT split_part(rang, '-', 1)::int AS a
, split_part(rang, '-', 2)::int AS z
FROM tbl1
) t
, LATERAL (
SELECT string_agg(g::text, ',') AS output
FROM generate_series(a, z, CASE WHEN (z-a)%2 = 0 THEN 2 ELSE 1 END) g
) a;
Note a subtle difference when using an aggregate function or ARRAY constructor in the LATERAL subquery: normally, rows with rang IS NULL are excluded from the result because the LATERAL subquery returns no row.
If you aggregate the result immediately, "no row" is transformed into one row with a NULL value, so the original row is preserved. I added demos to the fiddle.
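A minimal sketch of the preserved-row case (assuming tbl1 from above contains a row where rang IS NULL; the even/odd step logic is omitted for brevity):
-- generate_series() returns no rows for NULL bounds, but string_agg()
-- over an empty set still yields one row with a NULL result,
-- so the row with rang IS NULL survives the CROSS JOIN LATERAL.
SELECT t.tbl1_id, a.output
FROM tbl1 t
, LATERAL (
   SELECT string_agg(g::text, ',') AS output
   FROM generate_series(split_part(t.rang, '-', 1)::int
                      , split_part(t.rang, '-', 2)::int) g
   ) a;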
SQL Fiddle.
You do not need a CTE for this, which would be more expensive.
Aside: the type conversion to integer removes leading / trailing white space automatically, so a string like ' 1 - 3' works as well for rang.