Get rows in which any column contains sequence using exists select 1 - sql

I have a table with three columns with the following values (dbFiddle)
C1 C2 C3
----------------------------
Red Yellow Blue
null Red Green
Yellow null Violet
I'm trying to create a query that returns all the rows that contain the value "Yellow" without using IN or OR. If I execute the following query:
SELECT 1
FROM test
WHERE CONCAT(C1, C2, C3) LIKE '%Yellow%'
It correctly returns the rows specified. However, if I try to use this query inside an exists:
SELECT *
FROM test
WHERE EXISTS (SELECT 1 FROM test WHERE CONCAT(C1, C2, C3) LIKE '%Yellow%')
it returns all the rows, not just the two with the "Yellow" word. What am I doing wrong here?
Any help would be greatly appreciated.

Re:
SELECT 1 FROM test WHERE CONCAT(C1, C2, C3) LIKE '%Yellow%'
"correctly returns the rows specified"
That SELECT returns a single column containing the constant 1, with one row for every row whose concatenated columns contain "Yellow" somewhere in their text.
This is because EXISTS:
Returns TRUE if a subquery contains any rows.
i.e., all of the following queries also return all rows in your test table:
SELECT * FROM test WHERE EXISTS (SELECT 1);
SELECT * FROM test WHERE EXISTS (SELECT 0);
SELECT * FROM test WHERE EXISTS (SELECT NULL);
... simply because the SELECT returns at least one row!
The usual usage of EXISTS also includes correlation of the subquery in the EXISTS back to the outer select.
Example of Correlation
In the below contrived example, we've got 4 people living in two houses. Here we're using EXISTS to find the names of the people who are happy and who have someone else living in the same (correlated) house who is also happy.
CREATE TABLE House
(
HouseId INT PRIMARY KEY,
Name VARCHAR(MAX)
);
CREATE TABLE Person
(
PersonId INT PRIMARY KEY,
HouseId INT FOREIGN KEY REFERENCES HOUSE(HouseId),
Name VARCHAR(MAX),
IsHappy BIT
);
INSERT INTO House(HouseId, Name) VALUES (1, 'House1'), (2, 'House2');
INSERT INTO Person(PersonId, HouseId, Name, IsHappy) VALUES
(1, 1, 'Joe', 0),
(2, 1, 'Jim', 1),
(3, 2, 'Fred', 1),
(4, 2, 'Mary', 1);
SELECT pOuter.Name
FROM Person pOuter
WHERE pOuter.IsHappy = 1
AND EXISTS
(SELECT 1
FROM Person pInner
WHERE pInner.HouseId = pOuter.HouseId
AND pInner.PersonId != pOuter.PersonId
AND pInner.IsHappy = 1);
Returns
Mary
Fred
(There are obviously other ways to get the same result, e.g. finding the HouseIds where there exist two or more happy people, etc.)
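One such grouping-based alternative, as a sketch against the same House/Person tables:
-- Find the houses containing two or more happy people,
-- then list the happy people living in them.
SELECT p.Name
FROM Person p
JOIN (SELECT HouseId
      FROM Person
      WHERE IsHappy = 1
      GROUP BY HouseId
      HAVING COUNT(*) >= 2) happyHouses
  ON p.HouseId = happyHouses.HouseId
WHERE p.IsHappy = 1;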

The EXISTS clause specifies a subquery and tests whether it returns any rows.
So EXISTS (SELECT 1 FROM test WHERE CONCAT(C1, C2, C3) LIKE '%Yellow%') returns rows, and is therefore true for every outer row, whenever any row in the table contains "Yellow".
If you want to use EXISTS, you need to correlate the inner query's CONCAT(t.C1, t.C2, t.C3) with the outer table:
SELECT *
FROM test t
where exists (SELECT 1 FROM test WHERE CONCAT(t.C1, t.C2, t.C3) LIKE '%Yellow%')
You don't need EXISTS at all, though; just put the condition in the WHERE clause:
SELECT *
FROM test
where CONCAT(C1, C2, C3) LIKE '%Yellow%'
sqlfiddle

I would use cross apply:
SELECT 1
FROM test t CROSS APPLY
(SELECT COUNT(*) as cnt
FROM (VALUES (t.C1), (t.C2), (t.C3)) V(C)
WHERE c = 'Yellow'
) v
WHERE cnt > 0;
You can readily adapt this to a subquery:
SELECT . . .
FROM test t
WHERE EXISTS (SELECT 1
FROM (VALUES (t.C1), (t.C2), (t.C3)) V(C)
WHERE c = 'Yellow'
) ;
Personally, I much prefer the direct comparison of each value to 'Yellow' rather than using LIKE. For instance, the direct comparison will not match "Yellow-Green" or any other value where "Yellow" is only part of the text.
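As a quick illustration, using a hypothetical extra row that is not part of the original fiddle:
-- Hypothetical row added only to show the difference:
INSERT INTO test (C1, C2, C3) VALUES ('Yellow-Green', NULL, 'Red');

-- The LIKE version matches it, because the concatenated text contains 'Yellow':
SELECT * FROM test WHERE CONCAT(C1, C2, C3) LIKE '%Yellow%';

-- The equality version does not, because no single column equals 'Yellow':
SELECT *
FROM test t
WHERE EXISTS (SELECT 1
              FROM (VALUES (t.C1), (t.C2), (t.C3)) V(C)
              WHERE c = 'Yellow');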
And, just for the record, you can still use boolean logic, even if you don't use OR and IN:
where not (coalesce(c1, '') <> 'Yellow' and
coalesce(c2, '') <> 'Yellow' and
coalesce(c3, '') <> 'Yellow'
)
Technically, this is probably the "simplest" solution to your problem. However, I still prefer the apply method, because the intent is clearer.

Related

Unique combination of multiple columns, order doesn't matter

Suppose a table with 3 columns, where each row represents a unique combination of the values:
a a a
a a b
a b a
b b a
b b c
c c a
...
However, what I want is:
aab = baa = aba
cca = cac = acc
...
Finally, I want to get these values in CSV format, one combination per value, like the image that I attached.
Thanks for your help!
Below is the query to generate my problem, please take a look!
--=======================================
--populate test data
--=======================================
drop table if exists #t0
;
with
cte_tally as
(
select row_number() over (order by (select 1)) as n
from sys.all_columns
)
select
char(n) as alpha
into #t0
from
cte_tally
where
(n > 64 and n < 91) or
(n > 96 and n < 123);
drop table if exists #t1
select distinct upper(alpha) alpha into #t1 from #t0
drop table if exists #t2
select
a.alpha c1
, b.alpha c2
, c.alpha c3
, row_number()over(order by (select 1)) row_num
into #t2
from #t1 a
join #t1 b on 1=1
join #t1 c on 1=1
drop table if exists #t3
select *
into #t3
from (
select *
from #t2
) p
unpivot
(cvalue for c in (c1,c2,c3)
) unpvt
select
row_num
, c
, cvalue
from #t3
order by 1,2
--=======================================
--these three rows should be treated equally
--=======================================
select *
from #t2
where concat(c1,c2,c3) in ('ABA','AAB', 'BAA')
--=======================================
--what i've tried...
--row count is actually correct, but the problem is that it omits rows where any letter is duplicated.
--=======================================
select
distinct
stuff((
select
distinct
'.' + cvalue
from #t3 a
where a.row_num = h.row_num
for xml path('')
),1,1,'') as comb
from #t3 h
As pointed out in the comments, you can unpivot the values, sort them in the right order and reaggregate them into a single row. Then you can group the original rows by those new values.
SELECT v.a, v.b, v.c, COUNT(*) AS cnt
FROM #t2
CROSS APPLY (
SELECT a = MIN(val), b = MIN(CASE WHEN rn = 2 THEN val END), c = MAX(val)
FROM (
SELECT *, rn = ROW_NUMBER() OVER (ORDER BY val)
FROM (VALUES (c1),(c2),(c3) ) v3(val)
) v2
) v
GROUP BY v.a, v.b, v.c;
Really, what you should perhaps do is ensure that the values are in the correct order in the first place:
ALTER TABLE #t2
ADD CONSTRAINT t2_ValuesOrder
CHECK (c1 <= c2 AND c2 <= c3);
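If the table already holds rows in arbitrary order, one hedged way to normalize them first (reusing the same sort-and-pick trick as the query above) before adding that constraint:
-- Sketch (SQL Server): rewrite each row of #t2 with its three values sorted,
-- so the CHECK constraint above can then be added without violations.
UPDATE t
SET c1 = s.c1, c2 = s.c2, c3 = s.c3
FROM #t2 t
CROSS APPLY (
    SELECT c1 = MIN(val),
           c2 = MIN(CASE WHEN rn = 2 THEN val END),
           c3 = MAX(val)
    FROM (SELECT val, rn = ROW_NUMBER() OVER (ORDER BY val)
          FROM (VALUES (t.c1), (t.c2), (t.c3)) v(val)) ordered
) s;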
I'd be curious why, though I'm sure you have a reason. I might suggest a lookup table holding all associated keys, a "Mapping Table"; you can optimize some of this as you implement it. First create one table to hold the "Next/New Key" (this is where the 1, 2, 3... come from); you take a new "New Key" after each batch of records you bulk insert into your "Mapping Table". The "Mapping Table" holds the combinations of the key values, one row for each ordering of a combination, along with your "New Key". You should end up with a table looking something like:
A, B, C, 1
A, C, B, 1
B, A, C, 1
...
X, Y, Z, 2
X, Z, Y, 2
If you can add a column to your source table for your "Mapping Key" (the 1, 2, 3...), then you just look it up from the mapping table where (c1='a', c2='a', c3='b'); the order shouldn't matter for this look-up. One suggestion would be to create a composite unique key on (c1, c2, c3) in your mapping table. Then, to get your records, look up the mapping key value from the mapping table and query for records matching that key. Or, if you don't do a pre-lookup to get the mapping key, you should be able to do a self-join using the mapping key value...
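A minimal sketch of what that mapping table might look like (the table and column names here are my own assumptions):
-- Sketch only: one possible shape for the "Mapping Table" described above.
CREATE TABLE MappingTable
(
    c1 CHAR(1) NOT NULL,
    c2 CHAR(1) NOT NULL,
    c3 CHAR(1) NOT NULL,
    mapping_key INT NOT NULL,
    CONSTRAINT UQ_MappingTable_Combo UNIQUE (c1, c2, c3)
);

-- Every ordering of one combination carries the same key:
INSERT INTO MappingTable (c1, c2, c3, mapping_key) VALUES
('A', 'A', 'B', 1),
('A', 'B', 'A', 1),
('B', 'A', 'A', 1);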
If you want them in a CSV format:
select distinct v.cs
from #t2 t2 cross apply
(select string_agg(c, ',') within group (order by c desc) as cs
from (values (t2.c1), (t2.c2), (t2.c3)
) v(c)
) v;
It seems to me that what you need is some form of masking*. Take this fiddle:
http://sqlfiddle.com/#!18/fc67f/8
where I have created a mapping table that contains all of the possible values, each paired with an increasing power of 10. Doing a cross join on that map table, concatenating the values, adding the masks and grouping on the total will yield all the unique combinations.
Here is the code from the fiddle:
CREATE TABLE maps (
val varchar(1),
num int
);
INSERT INTO maps (val, num) VALUES ('a', 1), ('b', 10), ('c', 100);
SELECT mask, max(vals) as val
FROM (
SELECT concat(m1.val, m2.val, m3.val) as vals,
m1.num + m2.num + m3.num as mask
FROM maps m1
CROSS JOIN maps m2
CROSS JOIN maps m3
) q GROUP BY mask
Using powers of 10 ensures that the mask holds the count of each value, one decimal place per value, and you can then group on it to get the unique(ish) strings.
I don't know what your data looks like; each distinct value needs its own power of 10, so with many possible values the mask numbers grow large quickly, and if a value could ever appear ten or more times in a row you would need a larger base. I didn't write code to extract the columns from the value table into the mapping table, but I'm sure you can do that.
*actually, I think the term I was looking for was flag.
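As for extracting the values into the mapping table, a sketch (assuming SQL Server and the #t1 temp table of distinct letters from the question's setup; the #maps name is mine):
-- Build the mask mapping from the distinct values actually present,
-- assigning each one its own power of 10. The powers are built as strings
-- because with 26 letters they overflow INT (and POWER() goes through float,
-- losing exactness), so the column is DECIMAL(38,0).
DROP TABLE IF EXISTS #maps;

SELECT alpha AS val,
       CAST('1' + REPLICATE('0', ROW_NUMBER() OVER (ORDER BY alpha) - 1) AS DECIMAL(38,0)) AS num
INTO #maps
FROM #t1;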

Alternate approach to WITH CTE and large UNION query

I'd like to rework a script I've been given.
The way it currently works is via a WITH CTE using a large number of UNIONs.
Current setup
We're taking one record from a source table, inserting it into a destination table once with [Name] A then inserting it again with [Name] B. Essentially creating multiple rows in the destination, albeit with different [Name].
An example of one transaction would be to take this row from [Source]:
ID [123] Name [Red and Green]
The results of my current set up in the [Destination] is:
ID [123] Name [Red]
ID [123] Name [Green]
Current logic
Here's a simplified version of the current logic:
WITH CTE
AS
(SELECT ID,
'Red' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green'
UNION ALL
SELECT ID,
'Green' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green')
INSERT INTO [Destination_Table]
(ID,
[Name])
SELECT ID,
[Name]
FROM CTE;
The reason I'd like to rework this is that when we get a new [Name], we have to manually add another portion of code to our (ever-increasing) UNION to make sure it gets picked up.
What I've considered
What I was considering was setting up a WHILE LOOP (or CURSOR) running off a control table where we could store all of the [Name]s. However, I'm not sure this would be the best approach, and I'm not too familiar yet with loops/cursors. I also wouldn't be too sure how to stop the loop once all [Name]s had been processed.
Any help much appreciated.
You can use cross apply to duplicate the rows:
insert into [destination_table] (id, name)
select x.*
from source_table s
cross apply (values (s.id, 'Red'), (s.id, 'Green')) x(id, name)
where s.name = 'Red and Green'
Introduce a new table called Color_List which just contains one row for each possible color. Then do this:
with cte as
(
select
s.ID,
c.colorname
from
Source_Table s
inner join
Color_List c
on
CHARINDEX(c.colorname, s.[Name]) > 0
)
insert into Destination_Table
(
ID,
[Name]
)
select
ID,
colorname
from
cte
The benefit of this method is that you aren't hard-coding any color names in the query. All the color names (and presumably there can be many more than two) get maintained in the Color_List table.
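A sketch of what that Color_List table could look like (only colorname is actually needed by the join above):
-- Sketch only: one possible shape for the Color_List lookup table.
CREATE TABLE Color_List
(
    colorname VARCHAR(50) NOT NULL PRIMARY KEY
);

INSERT INTO Color_List (colorname) VALUES ('Red'), ('Green');
-- Adding a new [Name] later is a single insert here instead of another UNION branch.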
You could use string_split to split the values apart. First replace the ' and ' with a pipe '|', then split the string on the pipe.
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '[123]', 'Name', '[Red and Green]')) V(ID, testCol, nameCol, stringCol);
select ID, testCol, nameCol,
case when left([value], 1)!='[' then concat('[',[value]) else
case when right([value], 1)!=']' then concat([value], ']') else [value] end end valCol
from #tTEST t
cross apply string_split(replace(t.stringCol, ' and ', '|'), '|');
Results
ID testCol nameCol valCol
1 [123] Name [Red]
1 [123] Name [Green]

Nested query that requires the first result to be returned

I have 2 tables as such
Table ErrorCodes:
type_code desc
01 Error101
02 Error99
03 Error120
Table ErrorXML:
row_index typeCode
1 87
2 02
3 01
The output should be the description (column desc) of the first matching type_code between the 2 tables.
Expected output : Error99
I have gotten so far.
select isnull(descript, 'unknown') as DESCRIPTION
from (select top 1 a.[desc] as descript
      from ErrorCodes a, ErrorXML b
      where a.type_code = b.typeCode
      order by b.row_index) t
But this query doesn't return the string UNKNOWN when there is no common typecode (join condition) between the 2 tables. In this case, I'm getting null.
How can I resolve this?
This is an interesting question. I believe the following can be an intuitive and beautiful solution (I used desc_ as column name rather than desc which is a reserved word):
select (select desc_ from ErrorCodes x where x.type_code = a.typeCode) desc_
from ErrorXML a
where (select desc_ from ErrorCodes x where x.type_code = a.typeCode) is not null
order by row_index
limit 1;
If you also need to handle the case where the query returns no row at all, then for MySQL the following syntax should suffice. For other databases you can use similar encapsulation with isnull, nvl, etc:
select ifnull((select (select desc_ from ErrorCodes x where x.type_code = a.typeCode) desc_ from ErrorXML a where (select desc_ from ErrorCodes x where x.type_code = a.typeCode) is not null order by row_index limit 1), 'UNKNOWN');
To test I used following scripts and seems to work properly:
create database if not exists stackoverflow;
use stackoverflow;
drop table if exists ErrorCodes;
create table ErrorCodes
(
type_code varchar(2),
desc_ varchar(10)
);
insert into ErrorCodes(type_code, desc_) values
('01', 'Error101'),
('02', 'Error99'),
('03', 'Error120');
drop table if exists ErrorXML;
create table ErrorXML
(
row_index integer,
typeCode varchar(2)
);
insert into ErrorXML(row_index, typeCode) values
('1', '87'),
('2', '02'),
('3', '01');
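Since the original query uses ISNULL, which suggests SQL Server, a hedged equivalent of the same idea there (TOP 1 instead of LIMIT, against the same test tables) would be:
select isnull(
         (select top 1 x.desc_
          from ErrorXML a
          join ErrorCodes x on x.type_code = a.typeCode
          order by a.row_index),
         'UNKNOWN') as description;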
A penultimate note: when generating your tables, try to use the same column names as much as possible; e.g. I'd suggest ErrorXML use type_code rather than typeCode.
A final note: I choose to write SQL in lowercase, reserving capital letters for emphasizing an important point; I'd suggest that style as well.
What about this: Do a subquery to bring back the first row_index for each type_code.
Do a LEFT OUTER Join on the ErrorCodes table so that you get NULLs as well.
SELECT
    ISNULL(ErrorCodes.[desc], 'unknown') AS description,
    ErrorXML.row_index
FROM ErrorCodes
LEFT OUTER JOIN (
    SELECT typeCode, MIN(row_index) AS row_index
    FROM ErrorXML
    GROUP BY typeCode
) AS ErrorXML ON ErrorCodes.type_code = ErrorXML.typeCode

Count(*) with 0 for boolean field

Let's say I have a boolean field in a database table and I want to get a tally of how many are 1 and how many are 0. Currently I am doing:
SELECT 'yes' AS result, COUNT( * ) AS num
FROM `table`
WHERE field = 1
UNION
SELECT 'no' AS result, COUNT( * ) AS num
FROM `table`
WHERE field = 0;
Is there an easier way to get the result so that even if there are no false values I will still get:
----------
|yes | 3 |
|no | 0 |
----------
One way would be to outer join onto a lookup table. So, create a lookup table that maps field values to names:
create table field_lookup (
field int,
description varchar(3)
)
and populate it
insert into field_lookup values (0, 'no')
insert into field_lookup values (1, 'yes')
Now the next bit depends on your SQL vendor; the following has some Sybase (or older SQL Server) specific bits (the =* outer-join syntax, and isnull to convert nulls to zero):
select description, isnull(num,0)
from (select field, count(*) num from `table` group by field) d, field_lookup fl
where d.field =* fl.field
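The =* outer-join syntax is deprecated (and removed from modern SQL Server), so here is a sketch of the same query with ANSI join syntax, keeping every lookup row:
select fl.description, isnull(d.num, 0) as num
from field_lookup fl
left join (select field, count(*) as num from `table` group by field) d
  on d.field = fl.field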
You are on the right track, but the first answer will not be correct. Here is a solution that will give you Yes and No even if there are no "No" rows in the table:
SELECT 'Yes', (SELECT COUNT(*) FROM Tablename WHERE Field <> 0)
UNION ALL
SELECT 'No', (SELECT COUNT(*) FROM tablename WHERE Field = 0)
Be aware that I've checked Yes as <> 0 because some front-end systems that use SQL Server as the backend use -1 and 1 as yes.
Regards
Arild
This will result in two columns:
SELECT SUM(field) AS yes, COUNT(*) - SUM(field) AS no FROM table
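If you specifically want the two-row yes/no layout from the question rather than two columns, one sketch (assuming field is a 0/1 numeric column with no NULLs) is to cross join those totals against a two-row label list:
SELECT labels.result,
       CASE labels.result WHEN 'yes' THEN t.yes_count ELSE t.no_count END AS num
FROM (SELECT SUM(field) AS yes_count,
             COUNT(*) - SUM(field) AS no_count
      FROM `table`) t
CROSS JOIN (SELECT 'yes' AS result UNION ALL SELECT 'no') labels;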
Because there aren't any existing rows with a false value, if you want to see a summary row for it you need to LEFT JOIN to a table or derived table/inline view that does have one. Assuming there's no TYPE_CODES table to look up the values, use:
SELECT x.desc_value AS result,
COALESCE(COUNT(t.field), 0) AS num
FROM (SELECT 1 AS value, 'yes' AS desc_value
UNION ALL
SELECT 0, 'no') x
LEFT JOIN TABLE t ON t.field = x.value
GROUP BY x.desc_value
SELECT COUNT(*) count, field FROM table GROUP BY field;
Not exactly the same output format, but it's the same data you get back.
If one of the values has no rows, you won't get a row back for it, but that should be easy enough to check for in your code.

How to select only one full row per group in a "group by" query?

In SQL Server, I have a table where a column A stores some data. This data can contain duplicates (ie. two or more rows will have the same value for the column A).
I can easily find the duplicates by doing:
select A, count(A) as CountDuplicates
from TableName
group by A having (count(A) > 1)
Now, I want to retrieve the values of other columns, let's say B and C. Of course, those B and C values can be different even for the rows sharing the same A value, but it doesn't matter for me. I just want any B value and any C one, the first, the last or the random one.
If I had a small table and one or two columns to retrieve, I would do something like:
select A, count(A) as CountDuplicates,
       (select top 1 child.B from TableName as child where child.A = base.A) as B
from TableName as base group by A having (count(A) > 1)
The problem is that I have many more rows to get, and the table is quite big, so having several child selects would have a high performance cost.
So, is there a less ugly pure SQL solution to do this?
Not sure if my question is clear enough, so I give an example based on AdventureWorks database. Let's say I want to list available States, and for each State, get its code, a city (any city) and an address (any address). The easiest, and the most inefficient way to do it would be:
var q = from c in data.StateProvinces select new { c.StateProvinceCode, c.Addresses.First().City, c.Addresses.First().AddressLine1 };
in LINQ-to-SQL, and it will do two selects for each of the 181 states, so 363 selects. In my case, I am searching for a way to have a maximum of 182 selects.
The ROW_NUMBER function in a CTE is the way to do this. For example:
DECLARE #mytab TABLE (A INT, B INT, C INT)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 1, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 1, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 2, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 3, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (2, 2, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 3)
;WITH numbered AS
(
SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C)
FROM #mytab AS m
)
SELECT *
FROM numbered
WHERE rn=1
As I mentioned in my comment to HLGEM and Philip Kelley, their simple use of an aggregate function does not necessarily return one "solid" record for each A group; instead, it may return column values from many separate rows, all stitched together as if they were a single record. For example, if this were a PERSON table, with PersonID being the "A" column, and distinct contact records (say, Home and Work), you might wind up returning the person's home city but their office ZIP code -- and that's clearly asking for trouble.
The use of the ROW_NUMBER, in conjunction with a CTE here, is a little difficult to get used to at first because the syntax is awkward. But it's becoming a pretty common pattern, so it's good to get to know it.
In my sample I've defined a CTE that tacks an extra column rn (standing for "row number") onto the table, partitioned by the A column. A SELECT on that result, filtered to only those rows having a row number of 1 (i.e., the first record found for that value of A), returns one "solid" record for each A group -- in my example above, you'd be certain to get either the Work or Home address, but not elements of both mixed together.
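To also restrict the output to the duplicated groups, matching the HAVING (COUNT(A) > 1) from the question, a windowed count can ride along in the same CTE; a sketch against the #mytab example above:
;WITH numbered AS
(
    SELECT *,
           rn  = ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C),
           cnt = COUNT(*) OVER (PARTITION BY A)   -- rows in this A group
    FROM #mytab AS m
)
SELECT A, B, C, cnt AS CountDuplicates
FROM numbered
WHERE rn = 1
  AND cnt > 1   -- keep only groups that actually contain duplicates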
It concerns me that you want any old value for fields b and c. If they are to be meaningless why are you returning them?
If it truly doesn't matter (and I honestly can't imagine a case where I would ever want this, but it's what you said) and the values for B and C don't even have to be from the same record, GROUP BY with the use of MIN or MAX is the way to go. It's more complicated if you want the values from one particular record for all fields.
select A, count(A) as CountDuplicates, min(B) as B , min(C) as C
from TableName as base
group by A
having (count(A) > 1)
You can do something like this if you have id as a primary key in your table:
select tablename.id, b, c
from tablename
inner join
(
    select min(id) as id, count(A) as CountDuplicates
    from TableName as base
    group by A
    having count(A) > 1
) d on tablename.id = d.id