SQL comma-separated values comparison

I am having a challenge comparing values in the Available column with values in the Required column. They are both comma-separated.
Available        Required     Match
---------------  -----------  -----
One, Two, Three  One, Three   1
One, Three       Three, Five  0
One, Two, Three  Two          1
What I want to achieve is: if all of the values in the Required column are found in the Available column, then it gives me a match of 1, and 0 if one or more of the values in the Required column are missing from the Available column.
I want to achieve this in SQL.
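For reference, the matching rule can be sketched outside SQL; a minimal Python sketch of the subset check the question describes (split on commas, trim whitespace, then test set containment):

```python
def is_match(available: str, required: str) -> int:
    """1 if every trimmed Required value appears in Available, else 0."""
    avail = {v.strip() for v in available.split(",")}
    req = {v.strip() for v in required.split(",")}
    return 1 if req <= avail else 0

rows = [("One, Two, Three", "One, Three"),
        ("One, Three", "Three, Five"),
        ("One, Two, Three", "Two")]
print([is_match(a, r) for a, r in rows])  # → [1, 0, 1]
```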

If I understand the question correctly, an approach based on STRING_SPLIT() and an appropriate JOIN is an option:
Sample data:
SELECT *
INTO Data
FROM (VALUES
('One, Two, Three', 'One, Three'),
('One, Three', 'Three, Five'),
('One, Two, Three', 'Two')
) v (Available, Required)
Statement:
SELECT
    Available, Required,
    CASE
        WHEN EXISTS (
            SELECT 1
            FROM STRING_SPLIT(Required, ',') s1
            LEFT JOIN STRING_SPLIT(Available, ',') s2 ON TRIM(s1.[value]) = TRIM(s2.[value])
            WHERE s2.[value] IS NULL
        ) THEN 0
        ELSE 1
    END AS Match
FROM Data
Result:
Available        Required     Match
---------------  -----------  -----
One, Two, Three  One, Three   1
One, Three       Three, Five  0
One, Two, Three  Two          1

A variation of Zhorov's solution, using the set-based operator EXCEPT.
-- DDL and sample data population, start
DECLARE @tbl TABLE (Available VARCHAR(100), Required VARCHAR(100));
INSERT INTO @tbl (Available, Required) VALUES
('One, Two, Three', 'One, Three'),
('One, Three', 'Three, Five'),
('One, Two, Three', 'Two');
-- DDL and sample data population, end

SELECT t.*
     , [Match] = CASE
           WHEN EXISTS (
               SELECT TRIM([value]) FROM STRING_SPLIT(Required, ',')
               EXCEPT
               SELECT TRIM([value]) FROM STRING_SPLIT(Available, ',')
           ) THEN 0
           ELSE 1
       END
FROM @tbl AS t;
Output
+-----------------+-------------+-------+
| Available | Required | Match |
+-----------------+-------------+-------+
| One, Two, Three | One, Three | 1 |
| One, Three | Three, Five | 0 |
| One, Two, Three | Two | 1 |
+-----------------+-------------+-------+
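Here EXCEPT acts as set difference: Required minus Available is non-empty exactly when a required value is missing. The same logic in a small Python sketch (assuming comma-separated values, trimmed of whitespace):

```python
def match(available: str, required: str) -> int:
    # EXCEPT ~ set difference: any leftovers mean a Required value is missing
    required_set = {v.strip() for v in required.split(",")}
    available_set = {v.strip() for v in available.split(",")}
    return 0 if required_set - available_set else 1

print(match("One, Three", "Three, Five"))      # → 0
print(match("One, Two, Three", "One, Three"))  # → 1
```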

You need to do a cross join to look in all Available values; your query would be:
SELECT t.*
     , CASE WHEN SUM(CASE WHEN t1.Available LIKE '%' + t.Required + '%'
                          THEN 1 ELSE 0 END) > 0
            THEN 1 ELSE 0 END AS [Match_Calculated]
FROM YOUR_TABLE t
CROSS JOIN YOUR_TABLE t1
GROUP BY t.Available
       , t.Required
       , t.Match
Here's a dbfiddle

You can use STRING_SPLIT to achieve your request:
;with Source as
(
select 1 id,'One,Two,Three' Available,'One,Three' Required
union all
select 2 id,'One,Three' Available,'Three,Five' Required
union all
select 3 id,'One,Two,Three' Available,'Two' Required
)
,AvailableTmp as
(
SELECT t.id,
x.value
FROM Source t
CROSS APPLY (SELECT trim(value) value
FROM string_split(t.Available, ',')) x
)
,RequiredTmp as
(
SELECT t.id,
x.value
FROM Source t
CROSS APPLY (SELECT trim(value) value
FROM string_split(t.Required, ',')) x
)
,AllMatchTmp as
(
select a.id
,1 Match
From RequiredTmp a
left join AvailableTmp b on a.id=b.id and a.value = b.value
group by a.id
having max(case when b.value is null then 1 else 0 end ) = 0
)
select a.id
,a.Available
,a.Required
,ISNULL(b.Match,0) Match
from Source a
left join AllMatchTmp b on a.id = b.id

Another way using STRING_SPLIT
DECLARE @data TABLE (Available VARCHAR(100), [Required] VARCHAR(100),
                     INDEX IX_data(Available,[Required]));
INSERT @data
VALUES ('One, Two, Three', 'One, Three'),('One, Three', 'Three, Five'),
       ('One, Two, Three', 'Two');

SELECT
  Available  = d.Available,
  [Required] = d.[Required],
  [Match]    = MIN(f.X)
FROM @data AS d
CROSS APPLY STRING_SPLIT(REPLACE(d.[Required],' ',''),',') AS split
CROSS APPLY (VALUES(REPLACE(d.[Available],' ',''))) AS cleaned(String)
CROSS APPLY (VALUES(IIF(split.[value] NOT IN
  (SELECT s.[value] FROM STRING_SPLIT(cleaned.String,',') AS s),0,1))) AS f(X)
GROUP BY d.Available, d.[Required];


SQL conditional aggregation?

Let's say I have the following table:
name  virtual  message
----------------------
a     1        'm1'
a     1        'm2'
a     0        'm3'
a     0        'm4'
b     1        'm5'
b     0        'm6'
c     0        'm7'
I want to group by name but only concat the message if virtual is 1.
The result I am looking for is:
name  concat_message
--------------------
a     'm1,m2'
b     'm5'
c     ''
I couldn't find a way to conditionally aggregate using string_agg.
Standard SQL offers listagg() to aggregate strings. So this looks something like:
select name,
listagg(case when virtual = 1 then message end, ',') within group (order by message)
from t
group by name;
However, most databases have different names (and syntax) for string aggregation, such as string_agg() or group_concat().
EDIT:
In BQ the syntax would be:
select name,
string_agg(case when virtual = 1 then message end, ',')
from t
group by name;
That said, I would recommend array_agg() rather than string_agg().
Consider below
select name,
ifnull(string_agg(if(virtual=1,message,null)), '') as concat_message
from your_table
group by name
If applied to the sample data in your question, the output matches the expected result shown above.
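A quick way to sanity-check this pattern is SQLite, whose GROUP_CONCAT likewise skips the NULLs produced by the CASE (a sketch, not BigQuery syntax; element order within a group is not guaranteed):

```python
import sqlite3

# SQLite analogue of the conditional aggregation: the CASE yields NULL for
# virtual = 0 rows, and the aggregate silently drops those NULLs
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (name TEXT, virtual INT, message TEXT);
INSERT INTO t VALUES
  ('a',1,'m1'),('a',1,'m2'),('a',0,'m3'),('a',0,'m4'),
  ('b',1,'m5'),('b',0,'m6'),('c',0,'m7');
""")
rows = conn.execute("""
SELECT name,
       IFNULL(GROUP_CONCAT(CASE WHEN virtual = 1 THEN message END, ','), '')
FROM t
GROUP BY name
ORDER BY name
""").fetchall()
print(rows)
```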
Use XML PATH to rotate the row data into a single column:
declare @temp table(name varchar(1), virtual int, message varchar(2))
insert into @temp
values ('a', 1, 'm1'),
       ('a', 1, 'm2'),
       ('a', 0, 'm3'),
       ('a', 0, 'm4'),
       ('b', 1, 'm5'),
       ('b', 0, 'm6'),
       ('c', 0, 'm7')

select tmp2.name, stuff((select ',' + message from @temp tmp1
                         where tmp1.virtual = 1
                           and tmp1.name = tmp2.name
                         for xml path('')), 1, 1, '') result
from @temp tmp2
where tmp2.virtual = 1
group by tmp2.name
output:
name  result
a     m1,m2
b     m5

SQL - Return a default value when my search returns no results along with search criteria

I am searching with a query
--Code Format
SELECT COLA,COLB,COLC from MYTABLE where SWITCH IN (1,2,3);
If MYTABLE does not contain rows with SWITCH 1,2 or 3 I need default values returned along with the SWITCH value. How do I do it?
Below is my table format
COLA | COLB | COLC | SWITCH
---------------------------
A    | B    | C    | 1
a    | b    | c    | 2
I want a query such that when I search with
select * from MYTABLE where switch in (1,2,3)
That gets results like this --
COLA | COLB | COLC | SWITCH
---------------------------
A    | B    | C    | 1
a    | b    | c    | 2
NA   | NA   | NA   | 3
--Check to see if any row exists matching your conditions
IF NOT EXISTS (SELECT COLA,COLB,COLC from MYTABLE where SWITCH IN (1,2,3))
BEGIN
--Select your default values
END
ELSE
BEGIN
--Found rows, return them
SELECT COLA,COLB,COLC from MYTABLE where SWITCH IN (1,2,3)
END
if not exists( SELECT 1 from MYTABLE where SWITCH IN (1,2,3))
select default_value
How about:
SELECT COLA,COLB,COLC from MYTABLE where SWITCH IN (1,2,3)
union select 5555, 6666, 7777 where not exists (
SELECT COLA,COLB,COLC from MYTABLE where SWITCH IN (1,2,3)
);
5555, 6666, 7777 being the default row in case there aren't any rows matching your criteria.
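The shape of this pattern is easy to verify with a toy table (SQLite sketch; 5555/6666/7777 are the placeholder defaults from the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (cola, colb, colc, switch)")
q = """
SELECT cola, colb, colc FROM mytable WHERE switch IN (1, 2, 3)
UNION
SELECT 5555, 6666, 7777
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE switch IN (1, 2, 3))
"""
print(conn.execute(q).fetchall())  # no matching rows → [(5555, 6666, 7777)]

conn.execute("INSERT INTO mytable VALUES ('A', 'B', 'C', 1)")
print(conn.execute(q).fetchall())  # a match exists → [('A', 'B', 'C')]
```

Note that the default branch disappears as soon as any row matches, since the NOT EXISTS guard fails.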
Here is one way to tackle this. You need a table of the SWITCH values you want to look at. Then a simple left join makes this super easy.
select ColA
     , ColB
     , ColC
     , v.Switch
from
(
    values
      (1)
    , (2)
    , (3)
) v (Switch)
left join YourTable yt on yt.Switch = v.Switch
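The same "value list on the left" idea can be demonstrated in SQLite, using a CTE to supply the wanted SWITCH values (missing ones come back as NULL, shown here as 'NA'):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (cola TEXT, colb TEXT, colc TEXT, switch INT);
INSERT INTO yourtable VALUES ('A','B','C',1), ('a','b','c',2);
""")
rows = conn.execute("""
WITH v(switch) AS (VALUES (1), (2), (3))
SELECT IFNULL(yt.cola, 'NA'), IFNULL(yt.colb, 'NA'),
       IFNULL(yt.colc, 'NA'), v.switch
FROM v
LEFT JOIN yourtable yt ON yt.switch = v.switch
ORDER BY v.switch
""").fetchall()
print(rows)  # → [('A','B','C',1), ('a','b','c',2), ('NA','NA','NA',3)]
```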
You can use a split function and a left join as shown below:
Select ISNULL(ColA,'NA') As ColA, ISNULL(ColB,'NA') As ColB, ISNULL(ColC,'NA') As ColC, ISNULL(Switch, a.splitdata)
from [dbo].[fnSplitString]('1,2,3',',') a
LEFT JOIN #MYTABLE t on a.splitdata = t.Switch
[dbo].[fnSplitString] is a split function with 2 arguments, a delimiter-separated string and the delimiter, and it outputs a table.
EDIT:
Given the new explanation, I changed the answer completely. I think I got your question now:
SELECT * FROM MYTABLE AS mt
RIGHT JOIN (SELECT 1 AS s UNION SELECT 2 AS s UNION SELECT 3 AS s) AS st
ON st.s = mt.SWITCH
You could change the SELECT 1 AS s UNION SELECT 2 AS s UNION SELECT 3 AS s part to a subquery that results in all possible values SWITCH could assume. E.g.:
SELECT DISTINCT SWITCH FROM another_table_with_all_switches
If all you want is the value of SWITCH that is not in MYTABLE, not the whole table with null values, you could try:
SELECT * FROM
(SELECT 1 AS s UNION SELECT 2 AS s UNION SELECT 3) AS st
WHERE st.s NOT IN (SELECT DISTINCT SWITCH FROM MYTABLE)

SQL query to get column names if it has specific value

I have a situation here: I have a table with a flag assigned to the column names (like 'Y' or 'N'). I have to select the column names of a row if they have a specific value.
My Table:
Name | sub-1 | sub-2 | sub-3 | sub-4 | sub-5 | sub-6 |
------------------------------------------------------
Tom  |   Y   |       |   Y   |   Y   |       |   Y   |
Jim  |   Y   |   Y   |       |       |   Y   |   Y   |
Ram  |       |   Y   |       |   Y   |   Y   |       |
So I need to get all of the subs that have a 'Y' flag for a particular Name.
For Example:
If I select Tom I need to get the list of 'Y' column name in query output.
Subs
____
sub-1
sub-3
sub-4
sub-6
Your help is much appreciated.
The problem is that your database model is not normalized. If it was properly normalized the query would be easy. So the workaround is to normalize the model "on-the-fly" to be able to make the query:
select col_name
from (
select name, sub_1 as val, 'sub_1' as col_name
from the_table
union all
select name, sub_2, 'sub_2'
from the_table
union all
select name, sub_3, 'sub_3'
from the_table
union all
select name, sub_4, 'sub_4'
from the_table
union all
select name, sub_5, 'sub_5'
from the_table
union all
select name, sub_6, 'sub_6'
from the_table
) t
where name = 'Tom'
and val = 'Y'
The above is standard SQL and should work on any (relational) DBMS.
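Since it is standard SQL, the on-the-fly normalization is easy to try anywhere; a SQLite sketch, trimmed to three sub columns for brevity (empty cells stored as NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (name TEXT, sub_1 TEXT, sub_2 TEXT, sub_3 TEXT);
INSERT INTO the_table VALUES ('Tom','Y',NULL,'Y'), ('Jim','Y','Y',NULL);
""")
rows = conn.execute("""
SELECT col_name
FROM (
  SELECT name, sub_1 AS val, 'sub_1' AS col_name FROM the_table
  UNION ALL SELECT name, sub_2, 'sub_2' FROM the_table
  UNION ALL SELECT name, sub_3, 'sub_3' FROM the_table
) t
WHERE name = 'Tom' AND val = 'Y'
ORDER BY col_name
""").fetchall()
print(rows)  # → [('sub_1',), ('sub_3',)]
```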
Below code works for me:
select t.Subs
from (select name, u.subs, u.val
      from TableName s
      unpivot
      (
          val
          for subs in ([sub-1], [sub-2], [sub-3], [sub-4], [sub-5], [sub-6])
      ) u
      where u.val = 'Y') T
where t.name = 'Tom'
Somehow I am near to the solution. I can get it for all rows (I just used 2 columns):
select col from (
    select col,
           case s.col when 'sub-1' then [sub-1]
                      when 'sub-2' then [sub-2]
           end AS val
    from mytable
    cross join (select 'sub-1' AS col union all select 'sub-2') s
) s
where val = 'Y'
It gives the columns for all rows. I need the same data for a single row: if I select "Tom", I need the column names with a 'Y' value.
I'm answering this under a few assumptions. The first is that you KNOW the names of the columns of the table in question. Second, that this is SQL Server. Oracle and MySQL have ways of performing this, but I don't know the syntax for that.
Anyway, what I'd do is perform an UNPIVOT on the data.
There are a lot of parens there, so to explain: the actual UNPIVOT statement (aliased as unpvt) takes the data and twists the columns into rows, and the SELECT associated with it provides the data that is being returned. Here I used the Name, and placed the column names under the Subs column and the corresponding value into the Val column. To be precise, I'm talking about this aspect of the following code:
SELECT [Name], [Subs], [Val]
FROM
    (SELECT [Name], [Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6]
     FROM pvt) p
UNPIVOT
    ([Val] FOR [Subs] IN
        ([Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6])
) AS unpvt
My next step was to make that a 'sub-select' where I could find the specific name and val that was being hunted for. That would leave you with a SQL Statement that looks something along these lines
SELECT [Name], [Subs], [Val]
FROM (
    SELECT [Name], [Subs], [Val]
    FROM
        (SELECT [Name], [Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6]
         FROM pvt) p
    UNPIVOT
        ([Val] FOR [Subs] IN
            ([Sub-1], [Sub-2], [Sub-3], [Sub-4], [Sub-5], [Sub-6])
    ) AS unpvt
) AS pp
WHERE 1 = 1
  AND pp.[Val] = 'Y'
  AND pp.[Name] = 'Tom'
select col from (
    select col,
           case s.col
               when 'sub-1' then [sub-1]
               when 'sub-2' then [sub-2]
               when 'sub-3' then [sub-3]
               when 'sub-4' then [sub-4]
               when 'sub-5' then [sub-5]
               when 'sub-6' then [sub-6]
           end AS val
    from mytable
    join
    (
        select 'sub-1' AS col union all
        select 'sub-2' union all
        select 'sub-3' union all
        select 'sub-4' union all
        select 'sub-5' union all
        select 'sub-6'
    ) s on name = 'Tom'
) s
where val = 'Y'
included the join condition as
on name = 'Tom'

Tricky SQL. Consolidating rows

I have a (in my opinion) tricky SQL problem.
I got a table with subscriptions. Each subscription has an ID and a set of attributes which will change over time. When an attribute value changes a new row is created with the subscription key and the new values – but ONLY for the changed attributes. The values for the attributes that weren’t changed are left empty. It looks something like this (I left out the ValidTo and ValidFrom dates that I use to sort the result correctly):
SubID  Att1  Att2
1      J
1            L
1      B
1            H
1      A     H
I need to transform this table so I can get the following result:
SubID  Att1  Att2
1      J
1      J     L
1      B     L
1      B     H
1      A     H
So basically: if an attribute is empty, take the previous value for that attribute.
Any solution goes. I mean it doesn't matter what I have to do to get the result: a view on top of the table, an SSIS package to create a new table, or something third.
You can do this with a correlated subquery:
select t.subid,
       (select t2.att1 from t t2 where t2.rowid <= t.rowid and t2.att1 is not null order by rowid desc limit 1) as att1,
       (select t2.att2 from t t2 where t2.rowid <= t.rowid and t2.att2 is not null order by rowid desc limit 1) as att2
from t
This assumes that you have a rowid or equivalent (such as date time created) that specifies the ordering of the rows. It also uses limit to limit the results. In other databases, this might use top instead. (And Oracle uses a slightly more complex expression.)
I would write this using ValidTo. However, because there is ValidTo and ValidFrom, the actual expression is much more complicated. I would need for the question to clarify the rules for using these values with respect to imputing values at other times.
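SQLite happens to match this answer's assumptions exactly (it has rowid and supports LIMIT in a scalar subquery), so the fill-forward can be demonstrated directly, with rowid standing in for the ordering column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (subid INT, att1 TEXT, att2 TEXT);
INSERT INTO t VALUES
  (1,'J',NULL),(1,NULL,'L'),(1,'B',NULL),(1,NULL,'H'),(1,'A','H');
""")
rows = conn.execute("""
SELECT t.subid,
  (SELECT t2.att1 FROM t t2
   WHERE t2.rowid <= t.rowid AND t2.att1 IS NOT NULL
   ORDER BY t2.rowid DESC LIMIT 1) AS att1,
  (SELECT t2.att2 FROM t t2
   WHERE t2.rowid <= t.rowid AND t2.att2 IS NOT NULL
   ORDER BY t2.rowid DESC LIMIT 1) AS att2
FROM t
ORDER BY t.rowid
""").fetchall()
print(rows)
# → [(1,'J',None), (1,'J','L'), (1,'B','L'), (1,'B','H'), (1,'A','H')]
```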
This one works in Oracle 11g:
select SUBID
,NVL(ATT1,LAG(ATT1) over(order by ValidTo)) ATT1
,NVL(ATT2,lag(ATT2) over(order by ValidTo)) ATT2
from table_name
I agree with Gordon Linoff and Jack Douglas. This code has a limitation when multiple records with nulls are inserted, but the code below will handle that:
select SUBID
,NVL(ATT1,LAG(ATT1 ignore nulls) over(order by VALIDTO)) ATT1
,NVL(ATT2,LAG(ATT2 ignore nulls) over(order by VALIDTO)) ATT2
from Table_name
Please see this SQL Fiddle: http://sqlfiddle.com/#!4/3b530/4
Assuming (based on the fact that you mentioned SSIS) you can use OUTER APPLY to get the previous row:
DECLARE @T TABLE (SubID INT, Att1 CHAR(1), Att2 CHAR(2), ValidFrom DATETIME);
INSERT @T VALUES
(1, 'J', '', '20121201'),
(1, '', 'l', '20121202'),
(1, 'B', '', '20121203'),
(1, '', 'H', '20121204'),
(1, 'A', 'H', '20121205');
SELECT T.SubID,
       Att1 = COALESCE(NULLIF(T.Att1, ''), prev.Att1, ''),
       Att2 = COALESCE(NULLIF(T.Att2, ''), prev.Att2, '')
FROM @T T
OUTER APPLY
(   SELECT TOP 1 Att1, Att2
    FROM @T prev
    WHERE prev.SubID = T.SubID
      AND prev.ValidFrom < T.ValidFrom
    ORDER BY ValidFrom DESC
) prev
ORDER BY T.ValidFrom;
(I've had to add random values for ValidFrom to ensure the order by is correct)
EDIT
The above won't work if you have multiple consecutive rows with blank values - e.g.
DECLARE @T TABLE (SubID INT, Att1 CHAR(1), Att2 CHAR(2), ValidFrom DATETIME);
INSERT @T VALUES
(1, 'J', '', '20121201'),
(1, '', 'l', '20121202'),
(1, 'B', '', '20121203'),
(1, '', 'H', '20121204'),
(1, '', 'J', '20121205'),
(1, 'A', 'H', '20121206');
If this is likely to happen you will need two OUTER APPLYs:
SELECT T.SubID,
       Att1 = COALESCE(NULLIF(T.Att1, ''), prevAtt1.Att1, ''),
       Att2 = COALESCE(NULLIF(T.Att2, ''), prevAtt2.Att2, '')
FROM @T T
OUTER APPLY
(   SELECT TOP 1 Att1
    FROM @T prev
    WHERE prev.SubID = T.SubID
      AND prev.ValidFrom < T.ValidFrom
      AND COALESCE(prev.Att1, '') != ''
    ORDER BY ValidFrom DESC
) prevAtt1
OUTER APPLY
(   SELECT TOP 1 Att2
    FROM @T prev
    WHERE prev.SubID = T.SubID
      AND prev.ValidFrom < T.ValidFrom
      AND COALESCE(prev.Att2, '') != ''
    ORDER BY ValidFrom DESC
) prevAtt2
ORDER BY T.ValidFrom;
However, since each OUTER APPLY is only returning one value, I would change this to a correlated subquery, since the above will evaluate prevAtt1.Att1 and prevAtt2.Att2 for every row whether required or not. However if you change this to:
SELECT T.SubID,
       Att1 = COALESCE(
                  NULLIF(T.Att1, ''),
                  (   SELECT TOP 1 Att1
                      FROM @T prev
                      WHERE prev.SubID = T.SubID
                        AND prev.ValidFrom < T.ValidFrom
                        AND COALESCE(prev.Att1, '') != ''
                      ORDER BY ValidFrom DESC
                  ), ''),
       Att2 = COALESCE(
                  NULLIF(T.Att2, ''),
                  (   SELECT TOP 1 Att2
                      FROM @T prev
                      WHERE prev.SubID = T.SubID
                        AND prev.ValidFrom < T.ValidFrom
                        AND COALESCE(prev.Att2, '') != ''
                      ORDER BY ValidFrom DESC
                  ), '')
FROM @T T
ORDER BY T.ValidFrom;
The subquery will only evaluate when required (i.e. when Att1 or Att2 is blank) rather than for every row. The execution plan does not show this, and although the "Actual Execution Plan" of the latter appears more intensive, it almost certainly won't be. But as always, the key is testing: run both on your data and see which performs the best, and check the IO statistics for reads etc.
I never touched SQL Server, but I read that it supports analytical functions just like Oracle.
> select * from MYTABLE order by ValidFrom;
SUBID  ATT1  ATT2  VALIDFROM
    1  J           2012-12-06 15:14:51
    2  j           2012-12-06 15:15:20
    1        L     2012-12-06 15:15:31
    2        l     2012-12-06 15:15:39
    1  B           2012-12-06 15:15:48
    2  b           2012-12-06 15:15:55
    1        H     2012-12-06 15:16:03
    2        h     2012-12-06 15:16:09
    1  A     H     2012-12-06 15:16:20
    2  a     h     2012-12-06 15:16:29
select
t.SubID
,last_value(t.Att1 ignore nulls)over(partition by t.SubID order by t.ValidFrom rows between unbounded preceding and current row) as Att1
,last_value(t.Att2 ignore nulls)over(partition by t.SubID order by t.ValidFrom rows between unbounded preceding and current row) as Att2
,t.ValidFrom
from MYTABLE t;
SUBID  ATT1  ATT2  VALIDFROM
    1  J           2012-12-06 15:45:33
    1  J     L     2012-12-06 15:45:41
    1  B     L     2012-12-06 15:45:49
    1  B     H     2012-12-06 15:45:58
    1  A     H     2012-12-06 15:46:06
    2  j           2012-12-06 15:45:38
    2  j     l     2012-12-06 15:45:44
    2  b     l     2012-12-06 15:45:53
    2  b     h     2012-12-06 15:46:02
    2  a     h     2012-12-06 15:46:09
with Tricky1 as (
Select SubID, Att1, Att2, row_number() over(order by ValidFrom) As rownum
From Tricky
)
select T1.SubID, T1.Att1, T2.Att2
from Tricky1 T1
cross join Tricky1 T2
where (ABS(T1.rownum-T2.rownum) = 1 or (T1.rownum = 1 and T2.rownum = 1))
and T1.Att1 is not null
;
Also, have a look at accessing previous value, when SQL has no notion of previous value, here.
I was at it for quite a while. I found a rather simple way of doing it. Not the best solution as such, as I know there must be other ways, but here it goes.
I had to consolidate duplicates too, and in 2008 R2.
So, if you can, try to create a table which contains one set of the duplicate records.
According to your example, create one table where Att1 is blank. Then use UPDATE queries with an INNER JOIN on SubID to populate the data that you need.

SELECT DISTINCT for data groups

I have following table:
ID  Data
1   A
2   A
2   B
3   A
3   B
4   C
5   D
6   A
6   B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID  Data
101     A
102     A
102     B
103     C
104     D
So DataID 102 would resemble data (A,B), DataID 103 would resemble data (C), etc. In order to be able to rewrite my original table in this form:
ID  DataID
1   101
2   102
3   102
4   103
5   104
6   102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'
In my opinion you have to create a custom aggregate that concatenates data (in the case of strings, a CLR approach is recommended for perf reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number() function or a dense_rank(), your choice. Anyway it should look like this:
with groupings as (
    select concat(data) groups
    from Table1
    group by ID
)
select groups, row_number() over (order by groups) from groupings
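The underlying idea, reduce each ID to a group signature and then number the distinct signatures, can be sketched in plain Python (the 101, 102, ... numbering mimics the desired DataIDs):

```python
rows = [(1, 'A'), (2, 'A'), (2, 'B'), (3, 'A'), (3, 'B'),
        (4, 'C'), (5, 'D'), (6, 'A'), (6, 'B')]

# Collect each ID's set of data values
groups = {}
for id_, data in rows:
    groups.setdefault(id_, set()).add(data)

# Number distinct signatures in order of first appearance (dense_rank-like)
sig_to_dataid, id_to_dataid = {}, {}
for id_ in sorted(groups):
    sig = tuple(sorted(groups[id_]))
    if sig not in sig_to_dataid:
        sig_to_dataid[sig] = 100 + len(sig_to_dataid) + 1   # 101, 102, ...
    id_to_dataid[id_] = sig_to_dataid[sig]

print(id_to_dataid)  # → {1: 101, 2: 102, 3: 102, 4: 103, 5: 104, 6: 102}
```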
The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
SELECT
id,
MAX(CASE data WHEN 'A' THEN data ELSE '' END) +
MAX(CASE data WHEN 'B' THEN data ELSE '' END) +
MAX(CASE data WHEN 'C' THEN data ELSE '' END) +
MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM t1
GROUP BY id
ID  DataGroups
1   A
2   AB
3   AB
4   C
5   D
6   AB
However, this kind of logic will only work in case the "Data" values are both fixed and known beforehand.
In your case, you do say that is the case. However, considering that you also say there are 1000 of them, this will frankly be a ridiculous-looking query for sure :-)
LuckyLuke's suggestion above would, frankly, be the more generic and probably saner way to go about implementing the solution in your case.
From your sample data (having added the missing 2,'A' tuple), the following gives the renumbered (and uniqueified) data:
with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
select ID,data
from #t1 where id in (select id from NonDups)
), Renumbered as (
select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered
Giving:
1  A
2  A
2  B
3  C
4  D
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.
Just to share my own dirty solution that I'm using for the moment:
SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY (
SELECT CAST(Data AS VARCHAR) + ','
FROM #t1 t2
WHERE t2.id = t1.id
ORDER BY Data ASC
FOR XML PATH('') )
D ( Data )
And then proceeding analogously to LuckyLuke's solution.