Split column value into substrings and search for pattern in SQL - sql

I have a table like this:
| campaign              | code |
|-----------------------|------|
| AL2330GH_HDKASL_QCLKP | NULL |
| JPDJK34_QPKSLL_QKPAL  | NULL |
| QCK32_SDSKDS_TLLKA    | NULL |
I want to update the above table by populating the column 'code' with a substring in column 'campaign' which starts with 'AL', 'QC', or 'QP'. All the column values have 3 substrings separated by an '_'. If none of the substrings matches with the provided values, then keep the 'code' column value as NULL. And if multiple matches happen, take the first substring.
Desired Output:
| campaign              | code     |
|-----------------------|----------|
| AL2330GH_HDKASL_QCLKP | AL2330GH |
| JPDJK34_QPKSLL_QKPAL  | QPKSLL   |
| QCK32_SDSKDS_TLLKA    | QCK32    |
Link to try out the problem: https://www.db-fiddle.com/f/8qoFDL1RmjwpwFNP3LP4eK/1
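The selection rule described above can be stated compactly outside SQL. The following Python sketch only illustrates the intended logic and is not one of the SQL solutions; the names first_matching_code and PREFIXES are made up for this example.

```python
from typing import Optional

# Prefixes taken from the question; the names used here are illustrative only.
PREFIXES = ("AL", "QC", "QP")


def first_matching_code(campaign: str, prefixes=PREFIXES) -> Optional[str]:
    """Return the first '_'-separated token that starts with a wanted prefix."""
    for token in campaign.split("_"):
        if token.startswith(prefixes):  # str.startswith accepts a tuple
            return token
    return None  # corresponds to leaving code as NULL


for campaign in (
    "AL2330GH_HDKASL_QCLKP",
    "JPDJK34_QPKSLL_QKPAL",
    "QCK32_SDSKDS_TLLKA",
):
    print(campaign, first_matching_code(campaign))
```

Tokens are checked left to right, so when several tokens match, the earliest one wins, which is the tie-break the question asks for.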

Here's a method using OPENJSON():
;WITH src AS
(
SELECT campaign, value, code,
rn = ROW_NUMBER() OVER (PARTITION BY campaign ORDER BY [key])
FROM
(
SELECT campaign, [key], value, code
FROM dbo.SomeTable
CROSS APPLY OPENJSON(CONCAT('["',
REPLACE(STRING_ESCAPE(campaign,'JSON'),'_','","'),'"]')) AS j
) AS x WHERE LEFT(value,2) IN ('AL','QC','QP')
)
UPDATE src SET code = value WHERE rn = 1;
Example db<>fiddle

You can use STRING_SPLIT with CROSS APPLY and the ROW_NUMBER window function.
STRING_SPLIT does not guarantee the order of the rows it returns, so CHARINDEX(v.value, t1.campaign) is used to find the position at which each split value first appears in the original string. Ordering by that position tells us which matching substring appears first.
SELECT campaign,value
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY campaign ORDER BY CHARINDEX(v.value,t1.campaign)) rn
FROM mainTable t1
CROSS APPLY STRING_SPLIT(t1.campaign,'_') v
WHERE (value LIKE 'AL%'
OR value LIKE 'QC%'
OR value LIKE 'QP%')
) t1
WHERE rn = 1
If you want to UPDATE values you can try UPDATE like this.
UPDATE t1
SET
code = value
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY campaign ORDER BY CHARINDEX(v.value,t1.campaign)) rn
FROM mainTable t1
CROSS APPLY STRING_SPLIT(t1.campaign,'_') v
WHERE (value LIKE 'AL%'
OR value LIKE 'QC%'
OR value LIKE 'QP%')
) t1
WHERE rn = 1
sqlfiddle

Please try the following solution.
It uses XML and XQuery for tokenization. The XML/XQuery data model is based on ordered sequences, which is exactly what this scenario needs: the tokens keep their original order.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, campaign varchar(50), code varchar(20));
INSERT INTO @tbl (campaign, code) VALUES
('AL2330GH_HDKASL_QCLKP', NULL),
('JPDJK34_QPKSLL_QKPAL', NULL),
('QCK32_SDSKDS_TLLKA', NULL);
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = '_';
UPDATE t
SET code = c.query('
for $x in /root/r[substring(text()[1],1,2)=("AL","QC","QP")]
return $x').value('(/r/text())[1]', 'VARCHAR(20)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(campaign, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Output
+----+-----------------------+----------+
| id | campaign | Code |
+----+-----------------------+----------+
| 1 | AL2330GH_HDKASL_QCLKP | AL2330GH |
| 2 | JPDJK34_QPKSLL_QKPAL | QPKSLL |
| 3 | QCK32_SDSKDS_TLLKA | QCK32 |
+----+-----------------------+----------+

Related

Compare two rows (both with different ID) & check if their column values are exactly the same. All rows & columns are in the same table

I have a table named "ROSTER" and in this table I have 22 columns.
I want to query and compare any 2 rows of that table to check whether every column value in those 2 rows is exactly the same. The ID column always differs between rows, so I will not include it in the comparison; I will only use it to identify which rows to compare.
If all column values are the same: Either just display nothing (I prefer this one) or just return the 2 rows as it is.
If there are some column values not the same: Either display those column names only or display both the column name and its value (I prefer this one).
Example:
ROSTER Table:
| ID | NAME | TIME |
|----|------|------|
| 1  | N1   | 0900 |
| 2  | N1   | 0801 |
Output:
| ID | TIME |
|----|------|
| 1  | 0900 |
| 2  | 0801 |
OR
Display "TIME"
Note: Actually I'm okay with whatever result or way of output as long as I can know in any way that the 2 rows are not the same.
What are the possible ways to do this in SQL Server?
I am using Microsoft SQL Server Management Studio 18, Microsoft SQL Server 2019-15.0.2080.9
Please try the following solution based on the ideas of John Cappelletti. All credit goes to him.
SQL
-- DDL and sample data population, start
DECLARE @roster TABLE (ID INT PRIMARY KEY, NAME VARCHAR(10), TIME CHAR(4));
INSERT INTO @roster (ID, NAME, TIME) VALUES
(1,'N1','0900'),
(2,'N1','0801')
-- DDL and sample data population, end
DECLARE @source INT = 1
, @target INT = 2;
SELECT id AS source_id, @target AS target_id
,[key] AS [column]
,source_Value = MAX( CASE WHEN Src=1 THEN Value END)
,target_Value = MAX( CASE WHEN Src=2 THEN Value END)
FROM (
SELECT Src=1
,id
,B.*
FROM @roster AS A
CROSS APPLY ( SELECT [Key]
,Value
FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
) AS B
WHERE id=@source
UNION ALL
SELECT Src=2
,id = @source
,B.*
FROM @roster AS A
CROSS APPLY ( SELECT [Key]
,Value
FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
) AS B
WHERE id=@target
) AS A
GROUP BY id, [key]
HAVING MAX(CASE WHEN Src=1 THEN Value END)
<> MAX(CASE WHEN Src=2 THEN Value END)
AND [key] <> 'ID' -- exclude this PK column
ORDER BY id, [key];
Output
+-----------+-----------+--------+--------------+--------------+
| source_id | target_id | column | source_Value | target_Value |
+-----------+-----------+--------+--------------+--------------+
| 1 | 2 | TIME | 0900 | 0801 |
+-----------+-----------+--------+--------------+--------------+
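The idea behind the query above (turn each row into key/value pairs, then keep only the keys whose values differ) can be sketched in Python. This is an illustration of the comparison logic only; differing_columns and the roster dictionary are hypothetical names for this example.

```python
def differing_columns(source: dict, target: dict, ignore=("ID",)) -> dict:
    """Map each column whose values differ to a (source, target) pair."""
    return {
        col: (source[col], target[col])
        for col in source
        if col not in ignore and source[col] != target[col]
    }


# Sample rows from the question, keyed by ID.
roster = {
    1: {"ID": 1, "NAME": "N1", "TIME": "0900"},
    2: {"ID": 2, "NAME": "N1", "TIME": "0801"},
}

# An empty dict means the two rows agree on every compared column.
print(differing_columns(roster[1], roster[2]))
```

One difference worth noting: this sketch reports a column when one side is None and the other is not, whereas a `<>` comparison in SQL evaluates to unknown when either side is NULL, so such a pair would be filtered out by the HAVING clause.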
A general approach here might be to just aggregate over the entire table and report the state of the counts:
SELECT
CASE WHEN COUNT(DISTINCT ID) = 1 THEN 'Yes' ELSE 'No' END AS [ID same],
CASE WHEN COUNT(DISTINCT NAME) = 1 THEN 'Yes' ELSE 'No' END AS [NAME same],
CASE WHEN COUNT(DISTINCT TIME) = 1 THEN 'Yes' ELSE 'No' END AS [TIME same]
FROM yourTable;

How to show only the latest record in SQL

I have this issue where I want to show only the latest record (Col 1). I removed the date column from the result, thinking that it might not work if the dates differ. But even then, the record itself has a different name (Col 1), because the date is part of the name.
Is it possible to fetch one record in this case?
The code:
SELECT distinct p.ID,
max(at.Date) as date,
at.[RAPID3 Name] as COL1,
at.[DLQI Name] AS COL2,
at.[HAQ-DI Name] AS COL3,
phy.name as phyi,
at.State_ID
FROM dbo.[Assessment Tool] as at
Inner join dbo.patient as p on p.[ID] = at.[Owner (Patient)_Patient_ID]
Inner join dbo.[Physician] as phy on phy.ID = p.Physician_ID
where (at.State_ID in (162, 165,168) and p.ID = 5580)
group by
at.[RAPID3 Name],
at.[DLQI Name],
at.[HAQ-DI Name],
p.ID, phy.name,
at.State_ID
SS:
In this SS I want to show only the latest record (COL 1) of this ID "5580". Means the first row for this ID.
Thank you
The most accurate way to handle this is to extract the date from the name, then use TOP with ORDER BY:
create table #Temp(
ID int,
Col1 Varchar(50) null,
Col2 Varchar(50) null,
Col3 Varchar(50) null,
Phyi Varchar(50) null,
State_ID int)
Insert Into #Temp values(5580,'[9/29/2021]-[9.0]High Severity',null,null,'Eman Elshorpagy',168)
Insert Into #Temp values(5580,'[10/3/2021]-[9.3]High Severity',null,null,'Eman Elshorpagy',168)
select top 1 * from #Temp as t
order by cast((Select REPLACE((SELECT REPLACE((SELECT top 1 Value FROM STRING_SPLIT(t.Col1,'-')),'[','')),']','')) as date) desc
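The extraction step can be shown in isolation. The Python sketch below assumes, as the sample rows suggest, that the name always starts with a bracketed month/day/year date followed by a hyphen; leading_date is a hypothetical helper name.

```python
from datetime import date, datetime


def leading_date(name: str) -> date:
    """Parse the date inside the first [...] of a name such as
    '[10/3/2021]-[9.3]High Severity' (month/day/year is an assumption)."""
    first_part = name.split("-", 1)[0]       # text before the first hyphen
    text = first_part.strip().strip("[]")    # drop the surrounding brackets
    return datetime.strptime(text, "%m/%d/%Y").date()


names = [
    "[9/29/2021]-[9.0]High Severity",
    "[10/3/2021]-[9.3]High Severity",
]

# Equivalent in spirit to TOP 1 ... ORDER BY <extracted date> DESC.
latest = max(names, key=leading_date)
print(latest)
```

Converting to a real date matters here: compared as plain strings, '[9/29/2021]' sorts after '[10/3/2021]', which would pick the older record.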
This is close to ANSI standard, and it also caters for the newest row per id.
The principle is to use ROW_NUMBER() using a descending order on the date/timestamp (using a DATE type instead of a DATETIME and avoiding the keyword DATE for a column name) in one query, then to select from that query using the result of row number for the filter.
-- your input, but with 2 ids to show how it works with many ..
WITH indata(id,dt,col1,phyi,state_id) AS (
SELECT 5580,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5580,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
)
-- real query starts here, replace following comma with "WITH" ...
,
with_rank AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY dt DESC) AS rank_id
FROM indata
)
SELECT
id
, dt
, col1
, phyi
, state_id
FROM with_rank
WHERE rank_id=1
;
id | dt | col1 | phyi | state_id
------+------------+-----------------------------------+-----------------+----------
5580 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168
5581 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168
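For readers who want the "newest row per id" rule spelled out procedurally, here is a small Python sketch. It mirrors what ROW_NUMBER() ... ORDER BY dt DESC followed by the rank_id = 1 filter achieves; latest_per_id is a hypothetical name and the rows are a trimmed copy of the sample data above.

```python
from datetime import date

# (id, dt, col1) tuples, trimmed from the sample input above.
rows = [
    (5580, date(2021, 10, 3), "[10/3/2021] - [9,3] High Severity"),
    (5580, date(2021, 9, 29), "[9/29/2021] - [9,0] High Severity"),
    (5581, date(2021, 10, 3), "[10/3/2021] - [9,3] High Severity"),
    (5581, date(2021, 9, 29), "[9/29/2021] - [9,0] High Severity"),
]


def latest_per_id(rows):
    """Keep, for every id, the row with the greatest date."""
    latest = {}
    for row in rows:
        row_id, dt = row[0], row[1]
        if row_id not in latest or dt > latest[row_id][1]:
            latest[row_id] = row
    return [latest[key] for key in sorted(latest)]


for row in latest_per_id(rows):
    print(row)
```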

Append values from 2 different columns in SQL

I have the following table
I need to get the output "SVGFRAMXPOSLSVG" from the 2 columns.
Is it possible to get these appended values from the 2 columns?
Please try this.
SELECT STUFF((
SELECT '' + DEPART_AIRPORT_CODE + ARRIVE_AIRPORT_CODE
FROM #tblName
FOR XML PATH('')
), 1, 0, '')
For Example:-
Declare @tbl Table(
id INT ,
DEPART_AIRPORT_CODE Varchar(50),
ARRIVE_AIRPORT_CODE Varchar(50),
value varchar(50)
)
INSERT INTO @tbl VALUES(1,'g1','g2',NULL)
INSERT INTO @tbl VALUES(2,'g2','g3',NULL)
INSERT INTO @tbl VALUES(3,'g3','g1',NULL)
SELECT STUFF((
SELECT '' + DEPART_AIRPORT_CODE + ARRIVE_AIRPORT_CODE
FROM @tbl
FOR XML PATH('')
), 1, 0, '')
Summary
Use Analytic functions and listagg to get the job done.
Detail
Create two lists of code_id and code values. Match the code_id values for the same airport codes (passengers depart from the same airport they just arrived at). Using lag and lead to grab values from other rows. NULLs will exist for code_id at the start and end of the itinerary. Default the first NULL to 0, and the last NULL to be the previous code_id plus 1. A list of codes will be produced, with a matching index. Merge the lists together and remove duplicates by using a union. Finally use listagg with no delimiter to aggregate the rows onto a string value.
with codes as
(
select
nvl(lag(t1.id) over (order by t1.id),0) as code_id,
t1.depart_airport_code as code
from table1 t1
union
select
nvl(lead(t1.id) over (order by t1.id)-1,lag(t1.id) over (order by t1.id)+1) as code_id,
t1.arrive_airport_code as code
from table1 t1
)
select
listagg(c.code,'') WITHIN GROUP (ORDER BY c.code_id) as result
from codes c;
Note: This solution does rely on an integer id field being available. Otherwise the analytic functions wouldn't have a column to sort by. If id doesn't exist, then you would need to manufacture one based on another column, such as a timestamp or another identifier that ensures the rows are in the correct order.
Use row_number() over (order by myorderedidentifier) as id in a subquery or view to achieve this. Don't use rownum. It could give you unpredictable results. Without an ORDER BY clause, there is no guarantee that the same query will return the same results each time.
Output
| RESULT |
|-----------------|
| SVGFRAMXPOSLSVG |
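The merge-and-deduplicate idea can be sketched in Python. The question's table is not reproduced in the text, so the legs below are an assumption chosen to be consistent with the expected output; join_itinerary is a hypothetical name.

```python
def join_itinerary(legs):
    """Concatenate airport codes along an ordered list of (depart, arrive)
    legs, writing a connecting airport only once."""
    parts = []
    for depart, arrive in legs:
        # Skip the departure code when it repeats the previous arrival.
        if not parts or parts[-1] != depart:
            parts.append(depart)
        parts.append(arrive)
    return "".join(parts)


# Assumed sample data, already sorted by id (the original table is not shown).
legs = [("SVG", "FRA"), ("FRA", "MXP"), ("MXP", "OSL"), ("OSL", "SVG")]
print(join_itinerary(legs))
```

As with the SQL version, the result depends on the legs being in a known order, which is why an id or other ordering column is needed.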

Change an iterative query to a relational set-based query

SQL Fiddle
I'm trying without success to change an iterative/cursor query (that is working fine) to a relational set query to achieve a better performance.
What I have:
table1
| ID | NAME |
|----|------|
| 1 | A |
| 2 | B |
| 3 | C |
Using a function, I want to insert my data into another table. The following function is a simplified example:
Function
CREATE FUNCTION fn_myExampleFunction
(
@input nvarchar(50)
)
RETURNS @ret_table TABLE
(
output nvarchar(50)
)
AS
BEGIN
IF @input = 'A'
INSERT INTO @ret_table VALUES ('Alice')
ELSE IF @input = 'B'
INSERT INTO @ret_table VALUES ('Bob')
ELSE
INSERT INTO @ret_table VALUES ('Foo'), ('Bar')
RETURN
END;
My expected result is to insert data in table2 like the following:
table2
| ID | NAME |
|----|-------|
| 1 | Alice |
| 2 | Bob |
| 3 | Foo |
| 3 | Bar |
To achieve this, I've tried some CTEs (Common Table Expressions) and relational queries, but none worked as desired. The only working solution that I've got so far is iterative and does not perform well.
My current working solution:
BEGIN
DECLARE
@ID int,
@i int = 0,
@max int = (SELECT COUNT(name) FROM table1)
WHILE ( @i < @max ) -- In this example, it will iterate 3 times
BEGIN
SET @i += 1
-- Select table1.ID where row_number() = @i
SET @ID =
(SELECT
id
FROM
(SELECT
id,
ROW_NUMBER() OVER (ORDER BY id) as rn
FROM
table1) rows
WHERE
rows.rn = @i
)
-- Insert into table2 one or more rows related with table1.ID
INSERT INTO table2
(id, name)
SELECT
@ID,
fn_result.output
FROM
fn_myExampleFunction (
(SELECT name FROM table1 WHERE id = @ID)
) fn_result
END
END
The objective is to achieve the same without iterating through the IDs.
If the question is about how to apply a function in a set-oriented way, then CROSS APPLY (or OUTER APPLY) is your friend:
insert into table2 (
id, name
) select
t1.id,
t2.output
from
table1 t1
cross apply
fn_myExampleFunction(t1.name) t2
Example SQLFiddle
If the non-simplified version of your function is amenable to rewriting, the other solutions will likely be faster.
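Conceptually, CROSS APPLY calls the table-valued function once per input row and emits one output row for every row the function returns. The Python sketch below mimics that behaviour with the question's sample data; it illustrates the semantics only and says nothing about how SQL Server executes the query.

```python
def my_example_function(name: str) -> list:
    """Python stand-in for fn_myExampleFunction from the question."""
    if name == "A":
        return ["Alice"]
    if name == "B":
        return ["Bob"]
    return ["Foo", "Bar"]


table1 = [(1, "A"), (2, "B"), (3, "C")]

# One output row per (input row, function result) pair, like CROSS APPLY.
table2 = [
    (row_id, output)
    for row_id, name in table1
    for output in my_example_function(name)
]

for row in table2:
    print(row)
```

An input row for which the function returns nothing produces no output row, which matches CROSS APPLY; OUTER APPLY would keep such a row with NULLs instead.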
A query like this will do what you want:
insert into table2(id, name)
select id, (case when name = 'A' then 'Alice'
when name = 'B' then 'Bob'
when name = 'C' then 'Foo'
end)
from table1
union all
select id, 'Bar'
from table1
where name = 'C';
Why wouldn't you store this data as a table? It's relational. Coding it in a function or stored procedure seems less than ideal.
In any case, I hope the following gives you ideas about how to improve your code. I realize that you said your function is more complicated than your example, but you can still use this idea even inside of the function as necessary.
INSERT dbo.table2 (ID, Name)
SELECT
T1.ID,
N.FullName
FROM
dbo.table1 T1
INNER JOIN (VALUES -- A "derived table" made up of only constants
('A', 'Alice'),
('B', 'Bob'),
('C', 'Foo'),
('C', 'Bar')
) N (ShortName, FullName)
ON T1.Name = N.ShortName
;
But of course, that could just be rendered INNER JOIN dbo.NameTranslation N if it were in a real table (and then updating it would be so much easier!).
If your function absolutely can't be rewritten to be relational (it must take a single name at a time) then you would use CROSS APPLY:
INSERT dbo.table2 (ID, Name)
SELECT
T1.ID,
F.OutputName
FROM
dbo.table1 T1
CROSS APPLY dbo.YourFunction(T1.Name) F
;
However, this will not perform very well for large rowsets. Rewriting the function to be the type that RETURNS TABLE is a step in the right direction (instead of RETURNS @variable TABLE (definition)).

Add a rownumber based on the sequence of values provided

SELECT Code, Value FROM dbo.Sample
Output:
Code Value
Alpha Pig
Beta Horse
Charlie Dog
Delta Cat
Echo Fish
I want to add a sequence column by specifying a list of Codes and sort the list based on the order specified in the IN clause.
SELECT Code, Value FROM dbo.Sample
WHERE Code in ('Beta', 'Echo', 'Alpha')
I could declare a variable at the top to specify the Codes if that is easier.
The key is that I want to add the row number based on the order that I specify them in.
Output:
Row Code Value
1 Beta Horse
2 Echo Fish
3 Alpha Pig
Edit: I realized after that my Codes are all a fixed length which makes a big difference in how it could be done. I marked the answer below as correct, but my solution is to use a comma-separated string of values:
DECLARE @CodeList TABLE (Seq int, Code nchar(3))
DECLARE @CodeSequence varchar(255)
DECLARE @ThisCode char(3)
DECLARE @Codes int
SET @Codes = 0
-- string of comma-separated codes
SET @CodeSequence = 'ZZZ,ABC,FGH,YYY,BBB,CCC'
----loop through and create index and populate @CodeList
WHILE @Codes*4 < LEN(@CodeSequence)
BEGIN
SET @ThisCode = SUBSTRING(@CodeSequence,@Codes*4+1,3)
SET @Codes = @Codes + 1
INSERT @CodeList (Seq, Code) VALUES (@Codes, @ThisCode)
END
SELECT Seq, Code from @CodeList
Here are the only 2 ways I've seen work accurately:
The first uses CHARINDEX (similar to Gordon's, but I think the WHERE statement is more accurate using IN):
SELECT *
FROM Sample
WHERE Code IN ('Beta','Echo','Alpha')
ORDER BY CHARINDEX(Code+',','Beta,Echo,Alpha,')
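Appending the comma to Code keeps a code from matching the start of a longer item in the list. A code that is the tail of an earlier item can still match too early, so for full safety the delimiter is needed on both sides.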
Alternatively, you could use a CASE statement:
SELECT *
FROM Sample
WHERE Code in ('Beta','Echo','Alpha')
ORDER BY CASE
WHEN Code = 'Beta' THEN 1
WHEN Code = 'Echo' THEN 2
WHEN Code = 'Alpha' THEN 3
END
SQL Fiddle Demo
Updated Demo with sub-matches.
Also you can use Values as Table Source
SELECT Row, Code, Value
FROM [Sample] s JOIN (
SELECT ROW_NUMBER() OVER(ORDER BY(SELECT 1)) AS Row, Match
FROM (VALUES ('Beta'),
('Echo'),
('Alpha'))
x (Match)
) o ON s.Code = o.Match
ORDER BY Row
Demo on SQLFiddle
Here is a solution for a code list of any length.
Create a table with an auto-incrementing field and the code. Insert the codes in the given order. Then join the tables and order by the auto-incrementing field.
Some details: please read this. There you will find a routine that creates a table with an auto-increment field from a comma-delimited string, i.e.
mysql> call insertEngineer('dinusha,nuwan,nirosh');
Query OK, 1 row affected (0.12 sec)
mysql> select * from engineer;
+----+----------+
| ID | NAME |
+----+----------+
| 1 | dinusha |
| 2 | nuwan |
| 3 | nirosh |
+----+----------+
Next, join your Sample table with the result of the above. Good luck.
Just a small change to what has been done above, to include the row numbers as well.
SELECT CASE
WHEN Code = 'BetaBeta' THEN 1
WHEN Code = 'Beta' THEN 2
WHEN Code = 'Alpha' THEN 3
END CodeOrder,
*
FROM Sample
WHERE Code in ('BetaBeta','Beta','Alpha')
ORDER BY CodeOrder
SQL Fiddle Demo
I might be tempted to do this using string functions:
declare @list varchar(8000) = 'Beta,Echo,Alpha';
with Sample as (
select 'Alpha' as Code, 'Pig' as Value union all
select 'Beta', 'Horse' union all
select 'Charlie', 'Dog' union all
select 'Delta', 'Cat' union all
select 'Echo', 'Fish'
)
select * from Sample
where charindex(Code, @list) > 0
order by charindex(Code, @list)
If you are worried about submatches, just do the "delimiter" trick, wrapping both the list and the code in delimiters:
where ',' + @list + ',' like '%,' + Code + ',%'
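To see why the delimiters matter, here is a small Python sketch of the same trick. It is an illustration only: position_in_list and order_by_list are hypothetical names, and Python's str.find is 0-based whereas CHARINDEX is 1-based.

```python
def position_in_list(code: str, code_list: str, delimiter: str = ",") -> int:
    """Position of code as a whole item of code_list, or -1 if it is not one."""
    wrapped = delimiter + code_list + delimiter
    return wrapped.find(delimiter + code + delimiter)


def order_by_list(codes, code_list):
    """Keep only codes that are items of code_list, in the list's order."""
    found = [c for c in codes if position_in_list(c, code_list) >= 0]
    return sorted(found, key=lambda c: position_in_list(c, code_list))


codes = ["Alpha", "Beta", "BetaBeta", "Charlie", "Echo"]
code_list = "BetaBeta,Echo,Beta"

# A plain substring search finds "Beta" inside "BetaBeta" at the very start.
print(code_list.find("Beta"))
# With delimiters on both sides, only whole items match.
print(order_by_list(codes, code_list))
```

Without the wrapping, 'Beta' and 'BetaBeta' would tie for first place; with it, each code is located at its own item.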