Redundant data to comma separated string - sql

I wish to convert redundant row values into comma-separated strings in order to build a JSON string. In my example, the columns I need to convert to comma-separated strings are attrValueId, attrValue and name.
Use the following snippet to build the schema:
CREATE TABLE t
([attrId] int, [displayPosition] int,
[attrValueId] int, [attrValue] varchar(30),
name varchar(30), attrName varchar(30),attrType varchar(30),
isRequired bit);
INSERT INTO t VALUES
(1,2,1,'123',NULL,'testattribute','dropdown',0);
INSERT INTO t VALUES
(1,2,2,'1234',NULL,'testattribute','dropdown',0);
INSERT INTO t VALUES
(3,1,6,'miuu2',NULL,'mult','multi-select',1);
INSERT INTO t VALUES
(3,1,7,'miuu3396',NULL,'mult','multi-select',1);
The table data looks like this:
attrId displayPosition attrValueId attrValue name attrName attrType isRequired
1 2 1 123 NULL testattribute dropdown 0
1 2 2 1234 NULL testattribute dropdown 0
3 1 6 miuu2 NULL mult multi-select 1
3 1 7 miuu3396 NULL mult multi-select 1
My required result is:
attrId displayPosition attrValueId attrValue name attrName attrType isRequired
1 2 1,2 123,1234 NULL,NULL testattribute dropdown 0
3 1 6,7 miuu2,miuu3396 NULL,NULL mult multi-select 1
My ultimate aim is to construct a JSON string in this format:
[
{"attrId":"1","displayPosition":"2","attrValueId":["1,2"],"attrValue":["123,1234"],"name":["null","null"],"attrName":"testattribute","attrType":"dropdown","isRequired":"0"}
,
{second row goes here}]

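Try using STRING_AGG() (SQL Server 2017 and later) with a GROUP BY over the columns that do not change within a group.
Here is a sample:
SELECT
[attrId]
,[displayPosition]
,STRING_AGG([attrValueId], ',') WITHIN GROUP (ORDER BY [attrValueId]) AS [attrValueId]
,STRING_AGG([attrValue], ',') WITHIN GROUP (ORDER BY [attrValueId]) AS [attrValue]
-- STRING_AGG skips NULLs, so turn them into the text 'NULL' first
,STRING_AGG(ISNULL(name, 'NULL'), ',') WITHIN GROUP (ORDER BY [attrValueId]) AS name
,attrName
,attrType
,isRequired
FROM t
GROUP BY [attrId], [displayPosition], attrName, attrType, isRequired;
I have assumed that [attrId], [displayPosition], attrName, attrType and isRequired are identical for every row of a group, so together they identify one output row. Ordering all three aggregates by attrValueId keeps the lists aligned with each other.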
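For readers who want to see the same GROUP BY plus STRING_AGG idea outside SQL Server, here is a minimal Python sketch over the question's sample rows. It illustrates the logic only and is not T-SQL: the tuple layout, the helper names group_key and group_rows, and the choice of comma-joined strings (instead of the exact array shape in the question) are assumptions. Inside SQL Server 2016 and later, FOR JSON PATH can serialize a grouped result like this directly.

```python
import json
from itertools import groupby

# Sample rows from the question as tuples:
# (attrId, displayPosition, attrValueId, attrValue, name, attrName, attrType, isRequired)
# None stands in for SQL NULL.
ROWS = [
    (1, 2, 1, "123", None, "testattribute", "dropdown", 0),
    (1, 2, 2, "1234", None, "testattribute", "dropdown", 0),
    (3, 1, 6, "miuu2", None, "mult", "multi-select", 1),
    (3, 1, 7, "miuu3396", None, "mult", "multi-select", 1),
]


def group_key(row):
    # The columns that stay the same within a group (the GROUP BY list).
    return (row[0], row[1], row[5], row[6], row[7])


def group_rows(rows):
    """Collapse rows per group, joining attrValueId, attrValue and name with commas."""
    result = []
    # groupby only merges adjacent rows, so sort by the key first;
    # attrValueId as a tiebreaker keeps the three lists in the same order.
    ordered = sorted(rows, key=lambda r: (group_key(r), r[2]))
    for (attr_id, pos, attr_name, attr_type, required), grp in groupby(ordered, key=group_key):
        grp = list(grp)
        result.append({
            "attrId": str(attr_id),
            "displayPosition": str(pos),
            "attrValueId": ",".join(str(r[2]) for r in grp),
            "attrValue": ",".join(r[3] for r in grp),
            # Spell out NULL, like ISNULL(name, 'NULL') in T-SQL.
            "name": ",".join("NULL" if r[4] is None else r[4] for r in grp),
            "attrName": attr_name,
            "attrType": attr_type,
            "isRequired": str(required),
        })
    return result


if __name__ == "__main__":
    print(json.dumps(group_rows(ROWS), indent=2))
```

Sorting before itertools.groupby matters because groupby only merges adjacent rows; that sort is the in-memory counterpart of GROUP BY.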
For older SQL Server versions (before 2017), which lack STRING_AGG, use the FOR XML PATH('') plus STUFF() technique instead.
Check the following threads:
Group By and STUFF combined result in sql server
How to use GROUP BY to concatenate strings in SQL Server?

Related

How to use the SQL REPLACE Function, so that it will replace some text between a certain range, rather than one specific value

I have a table called Product and I am trying to replace part of the values in the ProductID column shown below:
ProductID
PIDLL0000074853
PIDLL000086752
PIDLL00000084276
I am familiar with the REPLACE function and have used this like so:
SELECT REPLACE(ProductID, 'LL00000', '/') AS 'Product Code'
FROM Product
Which returns:
Product Code
PID/74853
PIDLL000086752
PID/084276
The ProductID will always contain the letter L twice (LL). However, the number of zeros after it ranges from 4 to 6. The Ls and the zeros should be replaced with a single /.
If anyone could suggest the best way to achieve this, it would be greatly appreciated. I'm using Microsoft SQL Server, so standard SQL syntax would be ideal.
Please try the following solution.
All credit goes to @JeroenMostert.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ProductID VARCHAR(50));
INSERT INTO @tbl (ProductID) VALUES
('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PITLL0000084770');
-- DDL and sample data population, end
SELECT *
, CONCAT(LEFT(ProductID,3),'/', CONVERT(DECIMAL(38, 0), STUFF(ProductID, 1, 5, ''))) AS [After]
FROM @tbl;
Output
+----+------------------+-----------+
| ID | ProductID | After |
+----+------------------+-----------+
| 1 | PIDLL0000074853 | PID/74853 |
| 2 | PIDLL000086752 | PID/86752 |
| 3 | PIDLL00000084276 | PID/84276 |
| 4 | PITLL0000084770 | PIT/84770 |
+----+------------------+-----------+
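The same idea as a short Python sketch, for illustration only. It assumes every ID is a 3-character prefix, then 'LL', then digits; the function name to_product_code is hypothetical. Python's int() drops the leading zeros just as the DECIMAL(38, 0) conversion does, although Python integers have no 38-digit limit.

```python
def to_product_code(product_id: str) -> str:
    """Keep the 3-character prefix, drop 'LL', and strip the leading zeros."""
    prefix = product_id[:3]   # LEFT(ProductID, 3)
    digits = product_id[5:]   # STUFF(ProductID, 1, 5, '') removes the first 5 characters
    # int() discards leading zeros, like CONVERT(DECIMAL(38, 0), ...)
    return f"{prefix}/{int(digits)}"


if __name__ == "__main__":
    for pid in ("PIDLL0000074853", "PIDLL000086752", "PIDLL00000084276", "PITLL0000084770"):
        print(pid, to_product_code(pid))
```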
This isn't particularly pretty in T-SQL, as T-SQL doesn't support regex or even pattern replacement. Therefore, the approach is to use functions like CHARINDEX and PATINDEX to find the start and end positions and then replace that part of the text (using STUFF, not the REPLACE function).
This uses CHARINDEX to find the 'LL', and then PATINDEX to find the first non-'0' character after that position. As PATINDEX doesn't support a start position, I have to use STUFF to remove the leading characters first.
Then, finally, we can use STUFF (again) to replace the LL and the run of zeros after it with a single '/':
SELECT STUFF(V.ProductID,CI.I,ISNULL(PI.I,0)+2,'/')
FROM (VALUES('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PIDLL3246954384276'))V(ProductID)
CROSS APPLY(VALUES(NULLIF(CHARINDEX('LL',V.ProductID),0)))CI(I)
CROSS APPLY(VALUES(NULLIF(PATINDEX('%[^0]%',STUFF(V.ProductID,1,CI.I+2,'')),1)))PI(I);
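Here is the find-LL, skip-zeros, splice sequence as a Python sketch, following the question's requirement that the LL and the zeros become a single '/'. It is illustrative only: str.find plays the role of CHARINDEX (0-based, -1 when absent), a loop stands in for PATINDEX('%[^0]%', ...), and slicing replaces STUFF. The function name and the choice to leave IDs without 'LL' unchanged are assumptions.

```python
def replace_ll_zeros(product_id: str) -> str:
    """Replace the first 'LL' and the zeros right after it with a single '/'."""
    start = product_id.find("LL")  # like CHARINDEX('LL', ...), but 0-based and -1 if absent
    if start == -1:
        return product_id          # assumption: IDs without 'LL' stay unchanged
    end = start + 2                # first character after 'LL'
    # Advance past the run of zeros, like PATINDEX('%[^0]%', ...) on the remainder.
    while end < len(product_id) and product_id[end] == "0":
        end += 1
    # STUFF-style splice: drop product_id[start:end] and put '/' in its place.
    return product_id[:start] + "/" + product_id[end:]
```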
If the IDs always start with "PIDLL", you can simply remove the "PIDLL", cast the rest as an INT to lose the leading zeros, and then prepend "PID/". One line of code. (If the numeric part can exceed the INT range, cast to BIGINT or DECIMAL(38, 0) instead.)
-- Sample Data
DECLARE @t TABLE (ProductID VARCHAR(40));
INSERT @t VALUES('PIDLL0000074853'),('PIDLL000086752'),('PIDLL00000084276');
-- Solution
SELECT t.ProductID, NewProdID = 'PID/'+LEFT(CAST(REPLACE(t.ProductID,'PIDLL','') AS INT),20)
FROM @t AS t;
Returns:
ProductID NewProdID
------------------ ----------------
PIDLL0000074853 PID/74853
PIDLL000086752 PID/86752
PIDLL00000084276 PID/84276

Capturing particular part of Integer Value from part of a String value

I have a table, cust_attbr, with a column attbr that has values like:
{"SRCTAXAMT":"11300",เอ็ก10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}
{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}
........ ... ...
{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}
So, I need to write a single SELECT statement that fetches only the VAT_NUMBER value, like:
0835546003122
0835546003122
.... ... ..
null
With the sample data you posted:
SQL> select * From test;
ID ATTBR
---------- ----------------------------------------------------------------------------------------------------------------
1 "{"SRCTAXAMT":"11300",????10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}"
2 "{"SRCTAXAMT":"11300", ?????10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}"
3 "{"SRCTAXAMT":"11300", ?????10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}"
This might be one option:
SQL> select id,
2 regexp_substr(regexp_substr(attbr, 'VAT_NUMBER":"(\d+)?'), '\d+$') vat
3 from test;
ID VAT
---------- --------------------
1 0835546003122
2 0835546003122
3
SQL>
The inner REGEXP_SUBSTR returns VAT_NUMBER":" followed by an optional number, while the outer one extracts only the digits anchored to the end of that substring. When there are no digits (row 3), the outer call finds nothing and returns NULL.
If you're on Oracle 18c and the data is actual JSON (it currently is not, because of the double quotes around the curly braces and fragments such as ",.กรุงค10110"; it is unclear whether this only comes from your sample data), you could use the JSON_TABLE function:
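The two nested REGEXP_SUBSTR calls translate directly to Python's re module. This sketch (function name hypothetical) mirrors them so the two steps are easy to test in isolation; as in Oracle, a VAT_NUMBER with no digits yields no match, which the sketch returns as None.

```python
import re


def extract_vat(attbr: str):
    """Return the digits of VAT_NUMBER, or None when there are none."""
    # Inner step: the key followed by an optional run of digits.
    inner = re.search(r'VAT_NUMBER":"(\d+)?', attbr)
    if inner is None:
        return None
    # Outer step: only the digits anchored at the end of the inner match.
    outer = re.search(r"\d+$", inner.group(0))
    return outer.group(0) if outer else None


if __name__ == "__main__":
    rows = [
        '{"SRCTAXAMT":"11300",เอ็ก10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}',
        '{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}',
    ]
    for row in rows:
        print(extract_vat(row))
```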
WITH t (json_val) AS
(
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}' FROM DUAL UNION ALL
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}' FROM DUAL UNION ALL
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}' FROM DUAL
)
SELECT jt.*
FROM t,
JSON_TABLE(json_val, '$'
COLUMNS (vat_number VARCHAR2(50 CHAR) PATH '$."VAT_NUMBER"')) jt;
0835546003122
0835546003122
One option would be to convert those column values to valid JSON syntax and then extract the value of the VAT_NUMBER key, provided the database version is 12c Release 1 or later. The issue here is that the strings contain unquoted fragments in a non-Latin script (apparently Thai), so they are not valid JSON. We therefore remove everything up to the TAXAMT key, prefix an opening curly brace ('{'), and extract the VAT_NUMBER value with the JSON_VALUE() function:
SELECT JSON_VALUE(
'{'||REGEXP_REPLACE(str,'(.*10110",)(.*)','\2'),
'$.VAT_NUMBER'
) AS VAT_NUMBER
FROM tab --> your original data source
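A Python sketch of the same clean-up-then-parse idea, for illustration: re.sub with the same pattern stands in for REGEXP_REPLACE, and json.loads for JSON_VALUE. Returning None on a parse error mirrors JSON_VALUE's default NULL ON ERROR. Mapping a blank VAT_NUMBER to None is an extra step added here to match the asker's expected null; JSON_VALUE itself would return the blank string. The function name is hypothetical.

```python
import json
import re


def vat_via_json(attbr: str):
    """Cut the string down to valid JSON, then read the VAT_NUMBER key."""
    # Same pattern as REGEXP_REPLACE(str, '(.*10110",)(.*)', '\2'):
    # keep only what follows the last '10110",'.
    tail = re.sub(r'(.*10110",)(.*)', r"\2", attbr, count=1)
    try:
        doc = json.loads("{" + tail)  # prefix the opening brace that was cut off
    except json.JSONDecodeError:
        return None                   # like JSON_VALUE's default NULL ON ERROR
    value = doc.get("VAT_NUMBER")
    # Extra step (not in the SQL): treat a blank value as missing.
    if value is None or not value.strip():
        return None
    return value
```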

Split a column in two based on a variable-length field

Hi, I have a table with rows like this:
ID_CATEGORIA CATEGORIA_DRG
------------ ---------------------------------------------------------------
1 001-002-003-543 Craniotomia
2 004-531-532 Interventi midollo spinale
3 005-533-534 Interventi vasi extracranici
4 006 Decompressione tunnel carpale
I'd like to get something like this:
ID CATEGORIA DESCRIZIONE
------------ ------------------ --------------------------------------
1 001-002-003-543 Craniotomia
2 004-531-532 Interventi midollo spinale
3 005-533-534 Interventi vasi extracranici
4 006 Decompressione tunnel carpale
I don't need to alter the table, a 'formatted' query can be enough.
I think SUBSTRING() is the right function for me, but I don't know how to measure the length of the first (numeric, dash-separated) field.
In Python I would find that length with len("005-533-534 Interventi vasi extracranici".split(' ')[0]), but I have no idea how to write it in SQL.
Something like this should do it (the -1 and +1 keep the separating space out of both columns):
SELECT ID_CATEGORIA AS ID,
SUBSTRING(CATEGORIA_DRG, 1, CHARINDEX(' ', CATEGORIA_DRG) - 1) AS CATEGORIA,
SUBSTRING(CATEGORIA_DRG, CHARINDEX(' ', CATEGORIA_DRG) + 1, LEN(CATEGORIA_DRG)) AS DESCRIZIONE
FROM TABLENAME
Try this:
select id_categoria ID,
substring(categoria_drg, 1, idx - 1) CATEGORIA,
substring(categoria_drg, idx + 1, 1000) DESCRIZIONE
from (
select id_categoria, categoria_drg, charindex(' ', categoria_drg) idx from my_table
) a
It uses CHARINDEX to detect where the code ends: the code is followed by the first space in the string, which is exactly what the function finds.
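The CHARINDEX-based split is the SQL counterpart of the Python expression quoted in the question. A minimal Python sketch of the split, for comparison (the function name is hypothetical, and treating a value without any space as all code is an assumption):

```python
def split_categoria(categoria_drg: str):
    """Split at the first space into (CATEGORIA, DESCRIZIONE)."""
    idx = categoria_drg.find(" ")  # 0-based; -1 when there is no space (CHARINDEX gives 0)
    if idx == -1:
        return categoria_drg, ""   # assumption: no space means the whole value is the code
    # Exclude the space itself from both parts.
    return categoria_drg[:idx], categoria_drg[idx + 1:]
```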

SQL: search a string for multiple words, knowing that some of them might not exist

I have a table full of strings. I want to search for a set of words and retrieve the strings that contain at least one of them. For example:
The words I want to search are "cat", "dog" and "bat".
And some strings in a table are:
1 "hi cat, I like dogs and owls"
2 "hi cat"
3 "hey bat, cat, owl, dog"
4 "orange is sweet"
This query should return 1, 2, & 3, but not 4.
I know a join might be a solution, but I have no idea how to implement it with, for instance, more than 6 words.
Edit:
Is there a way to sort the results by the best match? That is, I want the first results to be the ones that contain the most keywords, which is why I ruled out a LIKE/OR combination.
You can use the PATINDEX function, which returns the position of a match in a string. If there's no match it returns 0, so try the following script:
declare @x table(
ID int identity(1,1),
name varchar(30)
)
insert into @x values ('hi cat, I like dogs and owls'), ('hi cat'),('hey bat, cat, owl, dog'),('orange is sweet')
select ID from @x where PATINDEX('%cat%', name) + PATINDEX('%dog%', name) + PATINDEX('%bat%', name) > 0
Another solution would be to use the LIKE operator:
select ID from @x where name LIKE '%cat%' or name LIKE '%dog%' or name LIKE '%bat%'
To answer your edit, the first approach works best here. You can achieve the desired result with the following query (unfortunately it gets a little complicated):
select ID from @x
where PATINDEX('%cat%', name) + PATINDEX('%dog%', name) + PATINDEX('%bat%', name) > 0
order by case PATINDEX('%cat%', name) when 0 then 0 else 1 end +
case PATINDEX('%dog%', name) when 0 then 0 else 1 end +
case PATINDEX('%bat%', name) when 0 then 0 else 1 end
desc
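The scoring in the ORDER BY (one point per keyword found) can be sketched in Python like this. It is an illustration, not SQL: the function name is hypothetical, and both sides are lowercased to imitate SQL Server's usual case-insensitive collation, which is an assumption about your database.

```python
KEYWORDS = ("cat", "dog", "bat")


def rank_by_matches(rows, keywords=KEYWORDS):
    """Return the IDs of rows containing at least one keyword, most matches first."""
    scored = []
    for row_id, text in rows:
        lowered = text.lower()  # assumption: case-insensitive matching, like a CI collation
        # One point per keyword found anywhere, like CASE PATINDEX(...) WHEN 0 THEN 0 ELSE 1 END.
        score = sum(1 for kw in keywords if kw.lower() in lowered)
        if score > 0:           # the WHERE ... > 0 filter
            scored.append((score, row_id))
    # Highest score first; sort() stays stable with reverse=True, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row_id for _, row_id in scored]
```

Note that "dogs" in row 1 counts as a hit for "dog", exactly as '%dog%' does in the SQL.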
DROP TABLE IF EXISTS #Temp
create table #Temp
(
string Varchar(50)
)
insert into #Temp (string) values
('hi cat, I like dogs and owls'),
('hi cat'),
('orange is sweet'),
('hey bat, cat, owl, dog')
SELECT * FROM #Temp WHERE string LIKE '%cat%' OR string LIKE '%dog%' OR string LIKE '%bat%'

How to delete values from a comma-separated list

I have a row, for example: 1,2,3,5,9,7, and I filter with NOT IN (3,7).
The values 3 and 7 also need to be deleted from the list itself, so the result should be 1,2,5,9.
How can I do it?
For example :
drop table test.table_4;
create table test.table_4 (
id integer,
list_id text
);
insert into test.table_4 values(1,'1,2,3,5,9,7');
insert into test.table_4 values(2,'1,2,3,5');
insert into test.table_4 values(3,'7,9');
insert into test.table_4 values(5,'1,2');
insert into test.table_4 values(9,'1');
insert into test.table_4 values(7,'5,7,9');
Query:
select id, list_id from test.table_4 where id not in (3,7) -- returns 4 rows
id list_id
1. 1 '1,2,3,5,9,7'
2. 2 '1,2,3,5'
3. 5 '1,2'
4. 9 '1'
How can I remove 3 and 7 from rows 1 and 2?
id list_id
1. 1 '1,2,5,9'
2. 2 '1,2,5'
3. 5 '1,2'
4. 9 '1'
The following should deal with 3 or 7 at the start of the string, at the end of the string, or anywhere in the middle. It also ensures that the 3 in 31 and the 7 in 17 don't get replaced:
select
list_id,
regexp_replace(list_id, '(^[37],|,[37](,)|,[37]$)', '\2', 'g')
from test.table_4
where id not in (3,7)
Explanation:
^[37], matches a 3 or 7 followed by a comma at the start of the string. This should be replaced with nothing.
,[37](,) matches ,3, or ,7, in the middle of the string. This needs to be replaced with a single comma, which is captured by the parentheses around it.
,[37]$ matches a 3 or 7 preceded by a comma at the end of the string. This should be replaced with nothing.
\2 refers to the second capture group, so the replacement is a comma for the second case above and empty for cases 1 and 3.
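To check the pattern, here is a Python sketch: for this pattern, re.sub with a backreference behaves like PostgreSQL's regexp_replace with the 'g' flag. It also exposes one limitation: when two removed values are adjacent, as in '1,3,7,9', the match for ',3,' consumes the comma that ',7,' would need, so the 7 survives. Splitting on commas, filtering and joining again (an alternative swapped in here, not part of the answer) avoids that. Function names are hypothetical.

```python
import re

# The pattern from the answer above; the outer parentheses are group 1 and
# the comma in ",[37](,)" is group 2.
PATTERN = r"(^[37],|,[37](,)|,[37]$)"


def regex_remove(list_id: str) -> str:
    # Like regexp_replace(list_id, PATTERN, '\2', 'g'); Python 3.5+ substitutes
    # an empty string when group 2 did not take part in a match.
    return re.sub(PATTERN, r"\2", list_id)


def split_remove(list_id: str, drop=("3", "7")) -> str:
    """Alternative: split on commas, drop exact matches, join the rest."""
    return ",".join(v for v in list_id.split(",") if v not in drop)


if __name__ == "__main__":
    for value in ("1,2,3,5,9,7", "1,2,3,5", "17,31,7", "1,3,7,9"):
        print(value, regex_remove(value), split_remove(value))
```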
You could use the following statements to update all of the records. In the example below, the first statement removes every ,7 it finds. The second statement then removes any remaining 7, such as one at the front of the string.
UPDATE test.table_4 SET list_id = REPLACE(list_id, ',7', '')
UPDATE test.table_4 SET list_id = REPLACE(list_id, '7', '')
If you also want to remove all occurrences of 3, execute the following statements:
UPDATE test.table_4 SET list_id = REPLACE(list_id, ',3', '')
UPDATE test.table_4 SET list_id = REPLACE(list_id, '3', '')
Be aware that plain REPLACE does not respect the commas: it also strips the 7 out of values such as 17, and removing a leading 7 from '7,9' leaves ',9'. In any case, it is bad design to store values that you need to search against or otherwise work with in a delimited string.
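You can use regexp_replace to get the expected output (the pattern '3,|,7' fits the sample data: it assumes 3 is never the last value, 7 is never the first, and neither digit appears inside a larger number):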
select id, regexp_replace(list_id,'3,|,7', '','g')
from table_4
where id not in (3,7)
Output:
id regexp_replace
1 1,2,5,9
2 1,2,5
5 1,2
9 1