SQL Server: split a single row into 2 rows

I'm trying to import data from an Excel spreadsheet into a SQL Server 2008 database and I need to massage the data a bit first.
I've used the Data Import wizard to load it into a staging table in the database in the following format:
Id ISOCode Data
1 US Foo
2 CA Bar
3 US or CA Blah
In cases where ISOCode is an or-delimited string, e.g. US or CA, I need to split it into 2 rows, so that in the final destination table it looks like this:
Id ISOCode Data
1 US Foo
2 CA Bar
3 US Blah
3 CA Blah
I do have a SplitString table-valued function available but I'm not sure how to work it into the equation.

Here is my solution:
SELECT ID,
       CASE
         WHEN ISOCODE LIKE '% or %'
           THEN LEFT(ISOCODE, CHARINDEX(' or ', ISOCODE) - 1) -- part before ' or '
         ELSE ISOCODE
       END AS ISOCode,
       DATA
FROM TBL
UNION
SELECT ID,
       RIGHT(ISOCODE, LEN(ISOCODE) - (CHARINDEX(' or ', ISOCODE) + 3)) AS ISOCode, -- part after ' or '
       DATA
FROM TBL
WHERE ISOCODE LIKE '% or %'
You can take a look at the full solution (with data) on SQL Fiddle.

select t.Id, c.ISOCode, t.Data
from t
cross apply (select charindex(' or ', t.ISOCode) as OrIndex) idx
cross apply
(
select t.ISOCode where idx.OrIndex = 0
union all select left(t.ISOCode, idx.OrIndex - 1) where idx.OrIndex > 0
union all select substring(t.ISOCode, idx.OrIndex + 4, len(t.ISOCode) - idx.OrIndex - 3) where idx.OrIndex > 0
) c
(this query doesn't require 2 table scans)
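The split-on-' or ' idea from the answers above can be tried outside SQL Server. This is only a sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in: the table name staging and the helper column part are made up for the illustration, and instr/substr replace CHARINDEX/LEFT/RIGHT.

```python
import sqlite3

# In-memory stand-in for the staging table from the question (name is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (Id INTEGER, ISOCode TEXT, Data TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [(1, "US", "Foo"), (2, "CA", "Bar"), (3, "US or CA", "Blah")],
)

SPLIT_SQL = """
SELECT Id, ISOCode, Data
FROM (
    -- Every row contributes its first (or only) code.
    SELECT Id,
           CASE WHEN instr(ISOCode, ' or ') > 0
                THEN substr(ISOCode, 1, instr(ISOCode, ' or ') - 1)
                ELSE ISOCode
           END AS ISOCode,
           Data,
           1 AS part
    FROM staging
    UNION ALL
    -- Rows containing ' or ' contribute a second row with the trailing code.
    SELECT Id,
           substr(ISOCode, instr(ISOCode, ' or ') + 4),
           Data,
           2
    FROM staging
    WHERE instr(ISOCode, ' or ') > 0
)
ORDER BY Id, part
"""

rows = conn.execute(SPLIT_SQL).fetchall()
for row in rows:
    print(row)
```

Searching for ' or ' with the surrounding spaces, rather than 'or' alone, keeps the split from leaving a stray space on the first code.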

Related

Condition check to get output

I have 2 tables, name and match. Both tables have a type column.
The columns and data in the name table
ID| type |
--| ---- |
1| 1ABC |
2| 2DEF |
3| 3DEF |
4| 4IJK |
The columns and data in the match table are
|type|DATA|
|-|-|
|NOT %ABC% AND NOT %DEF%|NOT ABC AND NOT DEF|
|%DEF%|DEF ONLY|
|NOT %DEF% AND NOT %IJK%|NOT DEF AND NOT IJK|
I have tried using case statement. The first 3 characters will be NOT if there is a NOT in the type in match table.
The below query is giving me a missing keyword error. I am not sure what I am missing here
SELECT s.id, s.type, m.data
where case when substr(m.type1,3)='NOT' then s.type not in (REPLACE(REPLACE(m.type,'NOT',''),'AND',','))
ELSE s.type in (m.type) end
from source s, match m;
I need the output to match the type in source column and display the data in match column.
The output should be
ID|type|DATA
1 |1ABC|NOT DEF AND NOT IJK
2 |2DEF|DEF ONLY
3 |3DEF|DEF ONLY
4 |4IJK|NOT ABC AND NOT DEF
The biggest problem with your attempted query seems to be that SQL requires the WHERE clause to come after the FROM clause.
But your query is flawed in other ways as well. Although it can have complicated logic within it, including subqueries, a CASE expression must ultimately return a single value, not a condition. Conditions within it are not applied as if they were in the WHERE clause of the main query (as you appear to be trying to do).
My recommendation would be to not store the match table as you currently are. It seems much preferable to have something that contains each condition you want to evaluate. Assuming that's not possible, I suggest a CTE (or even a view) that breaks it down that way first.
This query (based on Nefreo's answer for breaking strings into multiple rows)...
SELECT
data,
regexp_count(m.type, ' AND ') + 1 num,
CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
FROM match m INNER JOIN
table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ') + 1) as sys.OdciNumberList)) levels
ON 1=1
... breaks your match table into something more like:
|DATA|NUM|NEGATE|MATCH|
|-|-|-|-|
|NOT ABC AND NOT DEF|2|1|%ABC%|
|NOT ABC AND NOT DEF|2|1|%DEF%|
|DEF ONLY|1|0|%DEF%|
|NOT DEF AND NOT IJK|2|1|%DEF%|
|NOT DEF AND NOT IJK|2|1|%IJK%|
So we now know each specific like condition, whether it should be negated, and the number of conditions that need to be matched for each MATCH row. (For simplicity, I am using match.data as essentially a key for this since it is unique for each row in match and is what we want to return anyway, but if you were actually storing the data this way you'd probably use a sequence of some sort and not repeat the human-readable text.)
That way, your final query can be quite simple:
SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
ON
(criteria.negate = 0 AND name.type LIKE criteria.match)
OR
(criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id
The conditions in the ON do the appropriate LIKE or NOT LIKE (matches one condition from the CRITERIA view/CTE), and the condition in the HAVING makes sure we had the correct number of total matches to return the row (makes sure we matched all the conditions in one row of the MATCH table).
You can see the entire thing...
WITH criteria AS
(
SELECT
data,
regexp_count(m.type, ' AND ') + 1 num,
CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
FROM match m INNER JOIN
table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ') + 1) as sys.OdciNumberList)) levels
ON 1=1
)
SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
ON
(criteria.negate = 0 AND name.type LIKE criteria.match)
OR
(criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id
... working in this fiddle.
As a one-off, I don't think this is significantly different than the other answer already provided, but I wanted to do this since I think this is probably more maintainable if the complexity of your conditions changes.
It already handles arbitrary numbers of conditions, mixes of NOT and not-NOT within the same row of MATCH, and allows for the % signs (for the like) to be placed arbitrarily (e.g. startswith%, %endswith, %contains%, start%somewhere%end, exactmatch should all work as expected). If in the future you want to add different types of conditions or handle ORs, I think the general ideas here will apply.
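The "one row per condition, then count the matches" approach described above can be checked on the sample data. This is a rough sketch under assumptions, not the Oracle solution itself: it uses SQLite via Python's sqlite3 module, the breakdown is done by a hypothetical Python helper break_down instead of REGEXP_SUBSTR, and the tables are renamed names and criteria (with a pattern column) because MATCH is a keyword in SQLite.

```python
import sqlite3

# Sample data from the question.
names = [(1, "1ABC"), (2, "2DEF"), (3, "3DEF"), (4, "4IJK")]
match = [
    ("NOT %ABC% AND NOT %DEF%", "NOT ABC AND NOT DEF"),
    ("%DEF%", "DEF ONLY"),
    ("NOT %DEF% AND NOT %IJK%", "NOT DEF AND NOT IJK"),
]


def break_down(type_, data):
    """Turn one match row into one (data, num, negate, pattern) row per condition."""
    parts = type_.split(" AND ")
    rows = []
    for part in parts:
        negate = 1 if part.startswith("NOT ") else 0
        pattern = part[4:] if negate else part  # drop the leading 'NOT '
        rows.append((data, len(parts), negate, pattern))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER, type TEXT)")
conn.execute(
    "CREATE TABLE criteria (data TEXT, num INTEGER, negate INTEGER, pattern TEXT)"
)
conn.executemany("INSERT INTO names VALUES (?, ?)", names)
for type_, data in match:
    conn.executemany(
        "INSERT INTO criteria VALUES (?, ?, ?, ?)", break_down(type_, data)
    )

# A name matches a match row only if it satisfies all of that row's conditions,
# i.e. the number of satisfied conditions equals num.
rows = conn.execute(
    """
    SELECT n.id, n.type, c.data
    FROM names n
    JOIN criteria c
      ON (c.negate = 0 AND n.type LIKE c.pattern)
      OR (c.negate = 1 AND n.type NOT LIKE c.pattern)
    GROUP BY n.id, n.type, c.data
    HAVING COUNT(*) = MAX(c.num)
    ORDER BY n.id
    """
).fetchall()
for row in rows:
    print(row)
```

Note that LIKE in SQLite is case-insensitive for ASCII by default, which does not matter for this upper-case sample data but may differ from Oracle on other inputs.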
Not knowing the possible other rules for selecting rows, just with your data from the question, maybe you could use this:
WITH
tbl_name AS
(
Select 1 "ID", '1ABC' "A_TYPE" From Dual Union All
Select 2 "ID", '2DEF' "A_TYPE" From Dual Union All
Select 3 "ID", '3DEF' "A_TYPE" From Dual Union All
Select 4 "ID", '4IJK' "A_TYPE" From Dual
),
tbl_match AS
(
Select 'NOT %ABC% AND NOT %DEF%' "A_TYPE", 'NOT ABC AND NOT DEF' "DATA" From Dual Union All
Select '%DEF%' "A_TYPE", 'DEF ONLY' "DATA" From Dual Union All
Select 'NOT %DEF% AND NOT %IJK%' "A_TYPE", 'NOT DEF AND NOT IJK' "DATA" From Dual
)
Select
n.ID "ID",
n.A_TYPE,
m.DATA
From
tbl_match m
Inner Join
tbl_name n ON (1=1)
Where
(
INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 1) = 0
AND
INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 2) = 0
AND
Length(m.A_TYPE) > Length(SubStr(n.A_TYPE, 2)) + 2
)
OR
(
Length(m.A_TYPE) = Length(SubStr(n.A_TYPE, 2)) + 2
AND
'%' || SubStr(n.A_TYPE, 2) || '%' = m.A_TYPE
)
Order By n.ID
Result:
|ID|A_TYPE|DATA|
|-|-|-|
|1|1ABC|NOT DEF AND NOT IJK|
|2|2DEF|DEF ONLY|
|3|3DEF|DEF ONLY|
|4|4IJK|NOT ABC AND NOT DEF|
Any other format of condition should be evaluated separately ...
Regards...
WITH match_cte AS (
SELECT m.data
,m.type
,decode(instr(m.type,'NOT')
,1 -- found at position 1
,0
,1) should_find_str_1
,substr(m.type
,instr(m.type,'%',1,1) + 1
,instr(m.type,'%',1,2) - instr(m.type,'%',1,1) - 1) str_1
,decode(instr(m.type,'NOT',instr(m.type,'%',1,2))
,0 -- no second NOT
,1
,0) should_find_str_2
,substr(m.type
,instr(m.type,'%',1,3) + 1
,instr(m.type,'%',1,4) - instr(m.type,'%',1,3) - 1) str_2
FROM match m
)
SELECT s.id
,s.type
,m.data
FROM source s
CROSS JOIN match_cte m
WHERE m.should_find_str_1 = sign(instr(s.type,m.str_1))
AND (m.str_2 IS NULL
OR m.should_find_str_2 = sign(instr(s.type, m.str_2))
)
ORDER BY s.id, m.data
MATCH_CTE
|DATA|TYPE|SHOULD_FIND_STR_1|STR_1|SHOULD_FIND_STR_2|STR_2|
|-|-|-|-|-|-|
|NOT ABC AND NOT DEF|NOT %ABC% AND NOT %DEF%|0|ABC|0|DEF|
|DEF|%DEF%|1|DEF|1|NULL|
|NOT DEF AND NOT IJK|NOT %DEF% AND NOT %IJK%|0|DEF|0|IJK|

looping in sql with delimiter

I was wondering how I can loop in SQL.
For example,
I have this column:
PARAMETER_VALUE
E,C;S,C;I,X;G,T;S,J;S,F;C,S;
I want to store every value before a comma (,) in one temp column and every value after it, up to the semicolon (;), in another column.
It should not stop until there are no more values after a semicolon (;).
Expected Output for Example
COL1 E S I G S S C
COL2 C C X T J F S
etc . . .
You can get this by using the regexp_substr() function with a connect by level <= clause:
with t1(PARAMETER_VALUE) as
(
select 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual
), t2 as
(
select level as rn,
regexp_substr(PARAMETER_VALUE,'([^,]+)',1,level) as str1,
regexp_substr(PARAMETER_VALUE,'([^;]+)',1,level) as str2
from t1
connect by level <= regexp_count(PARAMETER_VALUE,';')
)
select listagg( regexp_substr(str1,'([^;]+$)') ,' ') within group (order by rn) as col1,
listagg( regexp_substr(str2,'([^,]+$)') ,' ') within group (order by rn) as col2
from t2;
COL1 COL2
------------- -------------
E S I G S S C C C X T J F S
Demo
Assuming that you need to separate the input into rows, at the ; delimiters, and then into columns at the , delimiter, you could do something like this:
-- WITH clause included to simulate input data. Not part of the solution;
-- use actual table and column names in the SELECT statement below.
with
t1(id, parameter_value) as (
select 1, 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual union all
select 2, ',U;,;V,V;' from dual union all
select 3, null from dual
)
-- End of simulated input data
select id,
level as ord,
regexp_substr(parameter_value, '(;|^)([^,]*),', 1, level, null, 2) as col1,
regexp_substr(parameter_value, ',([^;]*);' , 1, level, null, 1) as col2
from t1
connect by level <= regexp_count(parameter_value, ';')
and id = prior id
and prior sys_guid() is not null
order by id, ord
;
 ID ORD COL1 COL2
--- --- ---- ----
  1   1 E    C
  1   2 S    C
  1   3 I    X
  1   4 G    T
  1   5 S    J
  1   6 S    F
  1   7 C    S
  2   1      U
  2   2
  2   3 V    V
  3   1
Note - this is not the most efficient way to split the inputs (nothing will be very efficient - the data model, which is in violation of First Normal Form, is the reason). This can be improved using standard instr and substr, but the query will be more complicated, and for that reason, harder to maintain.
I generated more input data, to illustrate a few things. You may have several inputs that must be broken up at the same time; that must be done with care. (Note the additional conditions in CONNECT BY). I also illustrate the handling of NULL - if a comma comes right after a semicolon, that means that the "column 1" part of that pair must be NULL. That is shown in the output.
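For comparison, the same splitting rule (rows at ;, columns at ,, empty fields as NULL) is easy to express outside the database. A minimal sketch in Python, assuming every pair is terminated by a semicolon as in the sample data; the function name split_pairs is made up here, and unlike the Oracle query it returns no rows at all for a NULL input.

```python
def split_pairs(parameter_value):
    """Split 'a,b;c,d;' into [(a, b), (c, d)], using None for empty fields."""
    if not parameter_value:
        return []
    pairs = []
    # Each pair ends with ';', so the piece after the last ';' is dropped.
    for chunk in parameter_value.split(";")[:-1]:
        col1, _, col2 = chunk.partition(",")
        # Oracle treats empty strings as NULL; mirror that with None.
        pairs.append((col1 or None, col2 or None))
    return pairs


for value in ["E,C;S,C;I,X;G,T;S,J;S,F;C,S;", ",U;,;V,V;", None]:
    print(value, "->", split_pairs(value))
```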

SQL Server Convert Particular Column Values into comma separated String

I have a table in Database as below :
Id Name
1 Item1
1 Item2
2 Item3
1 Item4
3 Item5
I need output as below(3rd column is count):
1 Item1,Item2,Item4 3
2 Item3 1
3 Item5 1
How can this be achieved with a SQL query?
SQL Server has the STUFF() function, which can help you here when combined with FOR XML PATH.
SELECT t.Id,
Name = STUFF( (SELECT DISTINCT ','+Name
FROM table
WHERE Id = t.Id
FOR XML PATH('')
), 1, 1, ''
)
FROM table t
GROUP BY t.Id;
SQL Server 2017 has introduced a much easier way to achieve this using STRING_AGG(expression, separator).
Here's an example:
SELECT STRING_AGG(T.Name, ', ') FROM MyTable T where MyColumnID = 78
You could even play around with formatting in other ways like this one:
SELECT STRING_AGG(CONCAT(T.MyColumnID,' - ',T.Name), ', ') FROM MyTable T where MyColumnID = 78
More info in this blog I found about it: https://database.guide/the-sql-server-equivalent-to-group_concat/
-- MySQL syntax: GROUP_CONCAT is not available in SQL Server
select id, group_concat(name) csv
from Table
group by id
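None of the queries above return the count column the question asks for. As a sketch of the complete shape (id, comma-separated names, count), here is the aggregation run against SQLite through Python's sqlite3 module. This is an assumption made for runnability: SQLite's group_concat plays the role of STRING_AGG in SQL Server 2017+, and the table name items is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Id INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "Item1"), (1, "Item2"), (2, "Item3"), (1, "Item4"), (3, "Item5")],
)

# One row per Id: the concatenated names plus how many rows were aggregated.
rows = conn.execute(
    """
    SELECT Id, group_concat(Name, ',') AS Names, COUNT(*) AS Cnt
    FROM items
    GROUP BY Id
    ORDER BY Id
    """
).fetchall()

# SQLite does not promise the order of elements inside group_concat,
# so the names are sorted here before being displayed or compared.
result = {id_: (sorted(names.split(",")), cnt) for id_, names, cnt in rows}
for id_, (names, cnt) in result.items():
    print(id_, ",".join(names), cnt)
```

In SQL Server 2017+ the equivalent would presumably pair STRING_AGG(Name, ',') with COUNT(*) in the same GROUP BY query.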

Count the number of not null columns using a case statement

I need some help with my query...I am trying to get a count of names in each house, all the col#'s are names.
Query:
SELECT House#,
COUNT(CASE WHEN col#1 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#2 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#3 IS NOT NULL THEN 1 ELSE 0 END) as count
FROM myDB
WHERE House# in (house#1,house#2,house#3)
GROUP BY House#
Desired results:
house 1 - the count is 3 /
house 2 - the count is 2 /
house 3 - the count is 1
...with my current query the results for count would be just 3's
In this case, it seems that counting names is the same as counting the commas (,) plus one:
SELECT House_Name,
LEN(Names) - LEN(REPLACE(Names,',','')) + 1 as Names
FROM dbo.YourTable;
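The commas-plus-one arithmetic can be verified on the sample houses. A small sketch, assuming SQLite via Python's sqlite3 module (length and replace stand in for LEN and REPLACE) and a hypothetical table named houses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (house TEXT, names TEXT)")
conn.executemany(
    "INSERT INTO houses VALUES (?, ?)",
    [
        ("house 1", "peter, paul, mary"),
        ("house 2", "sarah, sally"),
        ("house 3", "joe"),
    ],
)

# Removing the commas shortens the string by exactly the number of commas;
# a list with n commas holds n + 1 names.
rows = conn.execute(
    """
    SELECT house,
           length(names) - length(replace(names, ',', '')) + 1 AS name_count
    FROM houses
    ORDER BY house
    """
).fetchall()
for row in rows:
    print(row)
```

The trick assumes every row holds at least one name: an empty string would still be counted as 1.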
Another option since Lamak stole my thunder, would be to split it and normalize your data, and then aggregate. This uses a common split function but you could use anything, including STRING_SPLIT for SQL Server 2016+ or your own...
declare @table table (house varchar(16), names varchar(256))
insert into @table
values
('house 1','peter, paul, mary'),
('house 2','sarah, sally'),
('house 3','joe')
select
t.house
,NumberOfNames = count(s.Item)
from
@table t
cross apply dbo.DelimitedSplit8K(names,',') s
group by
t.house
Notice how the answers you are getting are quite complex for what they're doing? That's because relational databases are not designed to store data that way.
On the other hand, if you change your data structure to something like this:
house name
1 peter
1 paul
1 mary
2 sarah
2 sally
3 joe
The query now is:
select house, count(name)
from housenames
group by house
So my recommendation is to do that: use a design that's more suitable for SQL Server to work with, and your queries become simpler and more efficient.
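To see the normalized design at work, the one-name-per-row table and its count query can be run as-is. A minimal sketch, assuming SQLite via Python's sqlite3 module in place of SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housenames (house INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO housenames VALUES (?, ?)",
    [(1, "peter"), (1, "paul"), (1, "mary"), (2, "sarah"), (2, "sally"), (3, "joe")],
)

# With one name per row, counting names needs no string manipulation at all.
rows = conn.execute(
    "SELECT house, COUNT(name) FROM housenames GROUP BY house ORDER BY house"
).fetchall()
for row in rows:
    print(row)
```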
One dirty trick is to replace commas with empty strings and compare the lengths:
SELECT house +
' has ' +
CAST((LEN(names) - LEN(REPLACE(names, ',', '')) + 1) AS VARCHAR) +
' names'
FROM mytable
You can parse using xml and find count as below:
Select *, a.xm.value('count(/x)','int') from (
Select *, xm = CAST('<x>' + REPLACE((SELECT REPLACE(names,', ','$$$SSText$$$') AS [*] FOR XML PATH('')),'$$$SSText$$$','</x><x>')+ '</x>' AS XML) from #housedata
) a
select House, 'has '+cast((LEN(Names)-LEN(REPLACE(Names, ',', ''))+1) as varchar)+' names'
from TempTable

Could a sql database return a sql result set plus a score for the results?

Just curious, if I wanted to send strings to a database (perhaps for MS SQL Server) can anyone provide any insight on what the best way would be to return results from a database where the result set might be sorted and "scored" on its closeness to the string passed in?
So, if I sent a query for :
SELECT name FROM table where name LIKE 'Jon'
and then get a result of 1000 results that looks like:
100 Jon
98 John
80 Jonathan
32 Nathan
Views, indexes, stored procedures, coded solution? What is the recommendation?
You could, but you'd need to use another function to do it. Levenshtein ratio or Jaro distance would be the most common solutions. I'm not sure what, if anything, SQL Server includes builtin for this. If nothing else I think you can use the SimMetrics library as described here. Regardless, it would look something like this.
select top 1000
jaro('John', name) as score, name
from table
where name like '%John%'
order by 1 desc
EDIT
Due to some persistent prodding from the comments, I present here an implementation of the Levenshtein distance calculation in SQL. TSQL for SQL Server 2005+ is used here, but the technique can be converted to other DBMS as well. Maximum score is 100.
;with tbl as (
select 'Jon' AS Name union all
select 'Jonathan' union all
select 'Jonny' union all
select 'John' union all
select 'Bone' union all
select 'BJon' union all
select 'Nathan' union all
select 'Jonne')
SELECT *, SCORE_Levenshtein + SCORE_SOUNDEX TotalScore
FROM
(
SELECT name,
CAST(50 /
(
select 1.0 + MAX(LDist)
FROM
(
select startAt.number,
LEN(longer) -
sum(case when SUBSTRING(longer, startAt.number+offset.number, 1)
= SUBSTRING(shorter, 1+offset.number, 1) then 1 else 0 end ) LDist
FROM
(select case when LEN(Name) < LEN(LookFor) then Name else LookFor end shorter) shorter
cross join
(select case when LEN(Name) >= LEN(LookFor) then Name else LookFor end longer) longer
inner join master..spt_values startAt
on startAt.type='P' and startAt.number between 1 and len(longer) - LEN(shorter) + 1
inner join master..spt_values offset
on offset.type='P' and offset.number between 0 and LEN(shorter)-1
group by startAt.number, longer, shorter
) X
) AS NUMERIC(16,4)) SCORE_Levenshtein
,
CAST(50 / (5- -- inversely proportional to soundex difference
(
SELECT 0.0 +
case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
+
case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
+
case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
+
case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
FROM (select soundex(name) as A, SOUNDEX(LookFor) as B) X
)) AS NUMERIC(16,4)) AS SCORE_SOUNDEX
FROM tbl
CROSS JOIN (SELECT 'Jon' as LookFor) LookFor
) Scored
Order by SCORE_Levenshtein + SCORE_SOUNDEX DESC
Note - This line CROSS JOIN (SELECT 'Jon' as LookFor) LookFor is used so that the input 'Jon' does not need to be repeated many times in the query. One could also define a variable instead and use it where LookFor is used in the query.
Output
It is worth noting that together with SOUNDEX, Jonny gets to score higher than Bone which won't happen with Levenshtein alone.
name SCORE_Levenshtein SCORE_SOUNDEX TotalScore
Jon 50.0000 50.0000 100.0000
John 12.5000 50.0000 62.5000
Jonny 8.3333 50.0000 58.3333
Jonne 8.3333 50.0000 58.3333
Bone 10.0000 25.0000 35.0000
BJon 10.0000 12.5000 22.5000
Jonathan 5.5556 16.6667 22.2223
Nathan 7.1429 12.5000 19.6429
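For reference, the textbook Levenshtein distance is usually computed with dynamic programming rather than set-based SQL. The sketch below is that classic algorithm in Python, not a translation of the query above, so its numbers will not match the SCORE_Levenshtein column; the rank helper and its ordering rule are assumptions made for the illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or keep)
        prev = cur
    return prev[-1]


def rank(names, look_for):
    """Return (name, distance) pairs, closest match first."""
    scored = [(name, levenshtein(name, look_for)) for name in names]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


for name, dist in rank(["Jonathan", "John", "Jon", "Jonny"], "Jon"):
    print(name, dist)
```

A distance of 0 means an exact match; it could be turned into a 0 to 100 score in many ways, and which formula fits best depends on the application.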
Original answer follows, based on pre-filtering the input based on LIKE '%x%' which collapses the Levenshtein to a simple Len(column) - Len(Like-expression) calculation
Have a look at this example - it tests the length and SOUNDEX differences, for lack of better measures.
The maximum score is 100.
;with tbl as (
select 'Jon' AS Name union all
select 'Jonathan' union all
select 'Jonny' union all
select 'John' union all -- doesn't match LIKE
select 'BJon' union all
select 'Jonne')
SELECT name,
50 / (Len(Name) - LEN('Jon') + 1.0) -- inversely proportional to length difference
+
50 / (5- -- inversely proportional to soundex difference
(
SELECT 0.0 +
case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
+
case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
+
case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
+
case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
FROM (select soundex(name) as A, SOUNDEX('Jon') as B) X
)) AS SCORE
FROM tbl
where name LIKE '%Jon%'
Order by SCORE DESC
Output
name SCORE
Jon 100.00000000000000000
Jonny 66.66666666666660000
Jonne 66.66666666666660000
BJon 37.50000000000000000
Jonathan 24.99999999999996666
Something like this might help:
http://www.mombu.com/microsoft/microsoft/t-equivalent-sql-server-functions-for-match-against-in-mysql-2292412.html