Comma-separated string to JSON object - SQL

I need to update/migrate a table IdsTable in my SQL Server database which has the following format:
+----+------------------+---------+
| id | ids | idType |
+----+------------------+---------+
| 1 | id11, id12, id13 | idType1 |
| 2 | id20 | idType2 |
+----+------------------+---------+
The ids column is a comma-separated list of ids. I need to combine the ids and idType columns to form a single JSON string for each row and update the ids column with that object.
The JSON object has the following format:
{
"idType": string,
"ids": string[]
}
Final table after transforming/migrating data should be:
+----+-----------------------------------------------------+---------+
| id | ids | idType |
+----+-----------------------------------------------------+---------+
| 1 | {"idType": "idType1","ids": ["id11","id12","id13"]} | idType1 |
| 2 | {"idType": "idType2","ids": ["id20"]} | idType2 |
+----+-----------------------------------------------------+---------+
The best I've figured out so far is to get the results into a format where I could GROUP BY id to try and get the correct JSON format:
SELECT X.id, Y.value, X.idType
FROM
IdsTable AS X
CROSS APPLY STRING_SPLIT(X.ids, ',') AS Y
Which gives me the results:
+----+------+---------+
| id | ids | idType |
+----+------+---------+
| 1 | id11 | idType1 |
| 1 | id12 | idType1 |
| 1 | id13 | idType1 |
| 2 | id20 | idType2 |
+----+------+---------+
But I'm not familiar enough with SQL Server JSON to move forward.

If it's a one-off op I think I'd just do it the dirty way:
UPDATE IdsTable SET ids =
CONCAT('{"idType": "', idType, '","ids": ["', REPLACE(ids, ', ', '","'), '"]}');
You might need to do some prep first, like if your ids column can look like:
id1,id2,id3
id4, id5, id6
id7 ,id8 , id9
etc., a series of replacements like:
UPDATE IdsTable SET ids = REPLACE(ids, ' ,', ',') WHERE ids LIKE '% ,%'
UPDATE IdsTable SET ids = REPLACE(ids, ', ', ',') WHERE ids LIKE '%, %'
Keep running those until they don't update any more records
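If you'd rather not re-run them by hand, a small sketch of the same idea in a loop (stops once nothing is updated) could be:
WHILE 1 = 1
BEGIN
    -- same two REPLACEs as above, applied until the spacing is clean
    UPDATE IdsTable SET ids = REPLACE(REPLACE(ids, ' ,', ','), ', ', ',')
    WHERE ids LIKE '% ,%' OR ids LIKE '%, %';
    IF @@ROWCOUNT = 0 BREAK;
END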
PS: if you've removed all the spaces from around the commas, you'll need to tweak the REPLACE in the original query - I specified ', ' as the needle.
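Alternatively, if you want SQL Server to build the JSON itself, a minimal sketch of the same one-off update (assuming SQL Server 2016+ for FOR JSON, and that the id values never contain quotes or embedded commas) might look like:
UPDATE IdsTable
SET ids = (
    SELECT IdsTable.idType AS [idType],
           -- strip the spaces, then turn the commas into "," to form a JSON array;
           -- JSON_QUERY stops FOR JSON from re-escaping it as a plain string
           JSON_QUERY('["' + REPLACE(REPLACE(IdsTable.ids, ' ', ''), ',', '","') + '"]') AS [ids]
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
);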

I found this blog post that helped me construct my answer:
-- Create temporary table
SELECT
    [TAB].[id], [TAB].[ids],
    (
        SELECT [STRING_SPLIT_RESULTS].value AS [ids], [TAB].[idType] AS [idType]
        FROM [IdsTable] AS [REQ]
        CROSS APPLY STRING_SPLIT([REQ].[ids], ',') AS [STRING_SPLIT_RESULTS]
        WHERE [REQ].[id] = [TAB].[id]
        FOR JSON PATH
    ) AS [newIds]
INTO [#TEMP_RESULTS]
FROM [IdsTable] AS [TAB]

-- Update rows
UPDATE [IdsTable]
SET [ids] = [#TEMP_RESULTS].[newIds]
FROM [#TEMP_RESULTS]
WHERE [IdsTable].[id] = [#TEMP_RESULTS].[id]

-- Delete temporary table
DROP TABLE [#TEMP_RESULTS]
Which replaces the ids column with the newIds JSON (shown un-replaced below for comparison):
+----+----------------+---------+------------------------------------------------------------------------------------------------------+
| id | ids | idType | newIds |
+----+----------------+---------+------------------------------------------------------------------------------------------------------+
| 1 | id11,id12,id13 | idType1 | [{"id":"id11","idType":"idType1"},{"id":"id12","idType":"idType1"},{"id":"id13","idType":"idType1"}] |
| 2 | id20 | idType2 | [{"id":"id20","idType":"idType2"}] |
+----+----------------+---------+------------------------------------------------------------------------------------------------------+
This is more verbose than I wanted, but considering the table size and the number of ids stored in the ids column (which translates to the size of the JSON object), this is fine for me.

Related

array clustering with unique identifier for file datasets

I have a dataset with a bigint array column in S3 and I want to filter rows efficiently based on the array values. I know we can use a GIN index on a SQL table, but I need a solution that works on the S3 dataset. I am planning to use a cluster id for each combination of elements in the array (their cardinality is not huge, max 2500) and then store it as a new column on which a filter can later be applied.
Example,
Table A
+------+------+-----------+
| Col1 | Col2 | Col3 |
+------+------+-----------+
| 1 | 101 | [123,234] |
| 2 | 102 | [123] |
| 3 | 103 | [234,345] |
+------+------+-----------+
I am trying to add a new column like this:
Table B (column Col3 will be removed from actual schema)
+------+------+-----------+-----------+
| Col1 | Col2 | Col3 | Cid |
+------+------+-----------+-----------+
| 1 | 101 | [123,234] | 1 |
| 2 | 102 | [123] | 2 |
| 3 | 103 | [234,345] | 3 |
+------+------+-----------+-----------+
and there will be another mapping table between Col3 and Cid, like:
Table C
+-----------+-----+
| Col3 | Cid |
+-----------+-----+
| [123,234] | 1 |
| [123] | 2 |
| [234,345] | 3 |
+-----------+-----+
A new entry is added to Table C whenever a new combination appears, and Table B is updated if any array element gets added or removed. The goal is to be able to filter records from Table A efficiently based on the values in the array column. Queries like
123 = Any(Col3) can be served as Cid = 2, and queries like [123, 345] = Any(Col3) can be served as Cid in (2,3).
Is there any better way to solve this problem?
Also, I am thinking of creating the required combinations at runtime to limit the number of combinations. Is it a good idea to create only the minimum combinations?
In Postgres, you can create the mapping table and use a join to calculate the values:
create table array_dim as
select col3 as arr, row_number() over (order by min(col1)) as array_id
from t
group by col3;
You can then add the new column:
select a.*, ad.array_id
from a
join array_dim ad on a.col3 = ad.arr;
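Once array_id is stored alongside the rows, a hypothetical filter for a query like 123 = Any(Col3) can go through the mapping table instead of the arrays themselves (assuming the join above has been materialised as table b with the new array_id column):
select b.*
from b
where b.array_id in (
    select array_id
    from array_dim
    where 123 = any(arr)
);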

Casting string to int i.e. the string "res"

I have a column in a table which is of type array<string>. The table has been partitioned daily since 2018-01-01. At some stage, the values in the array go from strings to integers. The data looks like this:
| yyyy_mm_dd | h_id | p_id | con |
|------------|-------|------|---------------|
| 2018-10-01 | 52988 | 1 | ["res", "av"] |
| 2018-10-02 | 52988 | 1 | ["1","2"] |
| 2018-10-03 | 52988 | 1 | ["1","2"] |
There is a mapping between the strings and integers. "res" maps to 1 and "av" maps to 2 etc. However, I've written a query to perform some logic. Here is a snippet (subquery) of it:
SELECT
t.yyyy_mm_dd,
t.h_id,
t.p_id,
CAST(e.con AS INT) AS api
FROM
my_table t
LATERAL VIEW EXPLODE(con) e AS con
My problem is that this doesn't work for the earlier dates, when strings were used instead of integers. Is there any way to select con and remap the strings to integers so the data is consistent across all partitions?
Expected output:
| yyyy_mm_dd | h_id | p_id | con |
|------------|-------|------|---------------|
| 2018-10-01 | 52988 | 1 | ["1","2"] |
| 2018-10-02 | 52988 | 1 | ["1","2"] |
| 2018-10-03 | 52988 | 1 | ["1","2"] |
Once the values selected are all integers (within a string array), the CAST(e.con AS INT) will work.
Edit: To clarify, I will put the solution as a subquery before I use lateral view explode. This way I am exploding on a table where all partitions have integers in con. I hope this makes sense.
CAST(e.api as INT) returns NULL if it is not possible to cast. collect_list will collect an array including duplicates and without NULLs. If you need an array without duplicated elements, use collect_set().
SELECT
    t.yyyy_mm_dd,
    t.h_id,
    t.p_id,
    collect_list( -- array of integers
        -- cast the CASE as string if you need an array of strings
        CASE WHEN e.api = 'res' THEN 1
             WHEN e.api = 'av'  THEN 2
             -- add more cases
             ELSE CAST(e.api as INT)
        END
    ) as con
FROM
    my_table t
    LATERAL VIEW EXPLODE(con) e AS api
GROUP BY t.yyyy_mm_dd, t.h_id, t.p_id
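Following the edit above (remap first, explode later), a hedged sketch that keeps con as an array of strings in a subquery and only casts in the outer query could look like this:
SELECT s.yyyy_mm_dd, s.h_id, s.p_id,
       CAST(e2.api AS INT) AS api
FROM (
    -- remap the old string values, kept as strings to match the expected output
    SELECT t.yyyy_mm_dd, t.h_id, t.p_id,
           collect_list(CAST(CASE WHEN e.api = 'res' THEN 1
                                  WHEN e.api = 'av'  THEN 2
                                  ELSE CAST(e.api AS INT)
                             END AS STRING)) AS con
    FROM my_table t
    LATERAL VIEW EXPLODE(con) e AS api
    GROUP BY t.yyyy_mm_dd, t.h_id, t.p_id
) s
LATERAL VIEW EXPLODE(s.con) e2 AS api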

How do I update a column in a table with data from another column in the same table?

I have a table "table1" like this:
+------+-------------+------+
| id | barcode | lot |
+------+-------------+------+
| 0 | ABC-123-456 | |
| 1 | ABC-123-654 | |
| 2 | ABC-789-EFG | |
| 3 | ABC-456-EFG | |
+------+-------------+------+
I have to extract the number in the center of the "barcode" column, like with this query:
SELECT SUBSTR(barcode, 5, 3) AS ToExtract FROM table1;
The result:
+-----------+
| ToExtract |
+-----------+
| 123 |
| 123 |
| 789 |
| 456 |
+-----------+
And insert the result into the "lot" column.
Follow along these lines:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
i.e. in your case:
UPDATE table_name
SET lot = SUBSTR(barcode, 5, 3)
WHERE condition; -- if any
UPDATE table1 SET Lot = SUBSTR(barcode, 5, 3)
-- WHERE ...;
Many databases support generated columns (aka "virtual" or "computed" columns). This allows you to define a column as an expression. The syntax is something like this:
alter table table1 add column lot varchar(3) generated always as (SUBSTR(barcode, 5, 3))
Using a generated column has several advantages:
It is always up-to-date.
It generally does not occupy any space.
There is no overhead when creating the table (although there is overhead when querying the table).
I should note that the syntax varies a bit among databases. Some don't require the type specification. Some use just as instead of generated always as.
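In SQL Server, for instance, the equivalent would be a computed column, roughly (assuming lot does not already exist as a regular column):
alter table table1 add lot as (SUBSTRING(barcode, 5, 3))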
CREATE TABLE Table1(id INT,barcode varchar(255),lot varchar(255))
INSERT INTO Table1 VALUES (0,'ABC-123-456',NULL),(1,'ABC-123-654',NULL),(2,'ABC-789-EFG',NULL)
,(3,'ABC-456-EFG',NULL)
UPDATE a
SET a.lot = SUBSTRING(b.barcode, 5, 3)
FROM Table1 a
INNER JOIN Table1 b ON a.id=b.id
WHERE a.lot IS NULL
id | barcode | lot
-: | :---------- | :--
0 | ABC-123-456 | 123
1 | ABC-123-654 | 123
2 | ABC-789-EFG | 789
3 | ABC-456-EFG | 456
db<>fiddle here

Pivot SSRS Dataset

I have a dataset which looks like so
ID | PName | Node | Val |
1 | Tag | Name | XBA |
2 | Tag | Desc | Dec1 |
3 | Tag | unit | Int |
6 | Tag | tids | 100 |
7 | Tag | post | AAA |
1 | Tag | Name | XBB |
2 | Tag | Desc | Des9 |
3 | Tag | unit | Float |
7 | Tag | post | BBB |
6 | Tag | tids | 150 |
I would like the result in my report to be
Name | Desc | Unit | Tids | Post |
XBA | Dec1 | int | 100 | AAA |
XBB | Des9 | Float | 150 | BBB |
I have tried using an SSRS Matrix with
Row: PName
Data: Node
Value: Val
The results were simply one row with Name, the next row with Desc, the next with Unit, etc. It's not all in the same rows, and the second record was also missing. This is possibly because there is no grouping on the dataset.
What is a good way of achieving the expected results?
I would not recommend this for a production scenario but if you need to knock out a report quickly or something you can try this. I would just not feel comfortable that the order of the records you get will always be what you expect.
You COULD try to insert the results of the SP into a table (regular table, temp table, table variable... doesn't matter really, as long as you can get an identity column added). Assuming that the rows always come out in the correct order (which is probably not a valid assumption 100% of the time), add an identity column to the table to get a unique row number for each row. From there you should be able to write some math logic to "group" your values together and then pivot out what you want.
create table #temp (ID int, PName varchar(100), Node varchar(100), Val varchar(100))
insert #temp exec (your stored proc)
alter table #temp add UniqueID int identity
then use UniqueID (modulo on 5 perhaps?) to group records together and then pivot
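A hedged sketch of that grouping-and-pivot step, assuming every logical record always produces the same five Node rows in order (integer division rather than modulo gives one group number per record):
SELECT (UniqueID - 1) / 5 AS RecordNo,
       MAX(CASE WHEN Node = 'Name' THEN Val END) AS [Name],
       MAX(CASE WHEN Node = 'Desc' THEN Val END) AS [Desc],
       MAX(CASE WHEN Node = 'unit' THEN Val END) AS Unit,
       MAX(CASE WHEN Node = 'tids' THEN Val END) AS Tids,
       MAX(CASE WHEN Node = 'post' THEN Val END) AS Post
FROM #temp
GROUP BY (UniqueID - 1) / 5
ORDER BY RecordNo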

Multiple selection in T-SQL and transposing one table's cells into another table's rows

I have a project that uses dynamic field details for an ASP page. When the page is executed, the database is queried for the list of dynamic fields and then all the controls are created at runtime on the ASP page. All these dynamic fields have corresponding data in another table (or more than one).
The data in the Dynamic Control Detail table is like this:
FieldID | FieldName | MappingTableName | MappingColumnName
-------------------------------------------------------------------------
92 | txtPrsnFirstName | Table1 | FirstName
93 | txtPrsnLastName | Table1 | LastName
94 | ddlPrsnGender | Table2 | Gender
95 | txtCompany | Table2 | Company
96 | txtDesignation | Table1 | Designation
The corresponding datatable's data is as follows:
Table1:
PersonID | FirstName | LastName | Designation
-----------------------------------------
1 | Person1 | SomeName | Manager
2 | Person2 | MoreName | Executive
Table2:
PersonID | Gender | Company
--------------------------------------
1 | Male | ABC Cons
2 | Female | XYZ PVT.LTD
Currently, what I have done is create a stored proc that returns all three tables' data at once. I fetch each of them into a DataSet, then iterate through the FieldTable data first and fetch the corresponding data from Table1 and Table2.
I need to write a query that fetches the data in such a way that the data from Table1 and Table2 gets transposed into multiple rows (fetching only one record at a time), something like this:
FieldID | FieldName | MappingTableName | MappingColumnName | Data
---------------------------------------------------------------------------------------
92 | txtPrsnFirstName | Table1 | FirstName | Person1
93 | txtPrsnLastName | Table1 | LastName | SomeName
94 | ddlPrsnGender | Table2 | Gender | Male
95 | txtCompany | Table2 | Company | ABC Cons
96 | txtDesignation | Table1 | Designation | Manager
I would add a hidden column (I will call it PersonMap) in the Field-Table which indicates a mapping between FieldID and PersonID (meaning: Field with ID XY shows data from Person Z).
Now create a function like (my TSQL is a little rusty - therefore pseudo-SQL):
create function returnPersonData(in tablename varchar, in columnname varchar, in personID varchar) RETURNS VARCHAR
BEGIN
declare res varchar;
set res = EXEC('SELECT %columnname% FROM %tablename% WHERE PersonID = %personID%');
return res;
END
Now use the created function for your Data_column. The FROM statement is your FieldTable + the PersonMap-column:
SELECT FieldID, FieldName, MappingTableName, MappingColumnName, returnPersonData(MappingTableName, MappingColumnName, PersonMap) AS Data
FROM (SELECT FieldID, FieldName, MappingTableName, MappingColumnName, (SELECT PersonID from somewhere) AS PersonMap FROM FieldTable ... )
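SQL Server will not let a scalar function run dynamic SQL, so a hedged alternative that follows the same idea is to build one dynamic statement from the mapping table and run it once (this assumes the PersonMap column suggested above and SQL Server 2017+ for STRING_AGG):
DECLARE @sql nvarchar(max);

-- one "SELECT ... FROM <mapped table> WHERE PersonID = <mapped person>" per field,
-- glued together with UNION ALL
SELECT @sql = STRING_AGG(CAST(
           'SELECT ' + CAST(FieldID AS varchar(10)) + ' AS FieldID, '
         + QUOTENAME(FieldName, '''') + ' AS FieldName, '
         + QUOTENAME(MappingTableName, '''') + ' AS MappingTableName, '
         + QUOTENAME(MappingColumnName, '''') + ' AS MappingColumnName, '
         + 'CAST(' + QUOTENAME(MappingColumnName) + ' AS varchar(100)) AS Data'
         + ' FROM ' + QUOTENAME(MappingTableName)
         + ' WHERE PersonID = ' + CAST(PersonMap AS varchar(10))
       AS nvarchar(max)), ' UNION ALL ') WITHIN GROUP (ORDER BY FieldID)
FROM FieldTable;

EXEC sp_executesql @sql;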
I hope I covered your question... greetings.