How do I report the results for a test that matches IDs across different sources? - sql

An ID can exist in multiple SOURCEs. I need to test whether the VALUE for the same ID matches across different SOURCEs. If the values do not all match across all sources, it should return FALSE.
CREATE TABLE example_table
(
SOURCE varchar(255),
ID varchar(255),
VALUE varchar(255)
);
INSERT INTO example_table (SOURCE, ID, VALUE)
VALUES ('A', 1, 55), ('A', 2, 36), ('B', 1, 55), ('B', 2, 34);
With the code above, I would like the query to return the following:
ID MATCH
1 TRUE
2 FALSE
This is a bit of a "big data" problem, as there are millions of IDs and around 50 or so sources. The query is being written for Vertica 9.2.

You can use aggregation:
select et.id,
(case when min(value) = max(value) then 'true' else 'false' end) as match
from example_table et
group by et.id;
You can simplify this to:
select et.id,
(min(value) = max(value)) as match
from example_table et
group by et.id;
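To see the aggregation at work on the sample rows, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite is only a stand-in for Vertica here, and the column alias is_match and the variable names are assumptions made for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for Vertica; the table mirrors the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (source TEXT, id TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO example_table (source, id, value) VALUES (?, ?, ?)",
    [("A", "1", "55"), ("A", "2", "36"), ("B", "1", "55"), ("B", "2", "34")],
)

# One row per id: all values agree exactly when the smallest equals the largest.
rows = conn.execute(
    """
    SELECT id, MIN(value) = MAX(value) AS is_match
    FROM example_table
    GROUP BY id
    ORDER BY id
    """
).fetchall()

# SQLite returns 1/0 for the comparison, so convert to Python booleans.
matches = {row_id: bool(flag) for row_id, flag in rows}
print(matches)
```

Note that MIN and MAX ignore NULLs, so an ID whose VALUE is NULL in one source and identical everywhere else would still be reported as a match.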

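Using a self join and GROUP BY, you can write the query as below. It returns one row per (id, value) pair rather than one row per id, and MIN over the flag yields 'false' as soon as any other source disagrees:
SELECT t1.id, t1.value,
MIN(CASE WHEN t1.value = t2.value THEN 'true' ELSE 'false' END) AS "match"
FROM example_table t1 INNER JOIN example_table t2
ON t1.id = t2.id
WHERE t1.source != t2.source
GROUP BY t1.id, t1.value;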
Output:
id | value | match
1 55 true
2 34 false
2 36 false

Related

SQL CASE statement returns duplicate values

Here is how my data looks
title value
------------
t1 v1
t2 v2
t3 v3
Now I want t1 and t2 to be inferred as the same value t12. So, I do:
SELECT
CASE
WHEN title = 't1' OR title = 't2'
THEN 't12'
ELSE title
END AS inferred_title,
COUNT(value)
FROM
my_table
GROUP BY
inferred_title;
I expected the output to be:
inferred title values
-----------------------
t12 2
t3 1
But what I end up getting is:
inferred title values
--------------------------
t12 1
t12 1
t3 1
How do I make it behave the way I want it to? I don't want the duplicated rows.
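The problem is scoping. Your table must already have a column named inferred_title, so GROUP BY resolves to that column instead of the alias in the SELECT list. Either give the expression a new column alias or repeat the expression: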
SELECT (CASE WHEN title IN ('t1', 't2') THEN 't12'
ELSE title
END) AS inferred_title,
COUNT(value)
FROM my_table
GROUP BY (CASE WHEN title IN ('t1', 't2') THEN 't12'
ELSE title
END);
Do the "merge" case in a derived table (sub-query), group by its result:
SELECT inferred_title, COUNT(value)
FROM
(
SELECT CASE WHEN title = 't1' OR title = 't2' THEN 't12'
ELSE title
END AS inferred_title,
value
FROM my_table
) dt
GROUP BY inferred_title;
This saves you some typing, is less error prone and easier to maintain - and is
ANSI SQL compliant!
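As a quick check of the derived-table approach, the following sketch runs it against the three sample rows using Python's sqlite3 module. SQLite is an assumption made for the demonstration, as are the column alias n and the variable names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (title TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO my_table (title, value) VALUES (?, ?)",
    [("t1", "v1"), ("t2", "v2"), ("t3", "v3")],
)

# The CASE expression is evaluated once in the derived table, so the outer
# GROUP BY can only see the computed inferred_title, never a base column.
rows = conn.execute(
    """
    SELECT inferred_title, COUNT(value) AS n
    FROM (
        SELECT CASE WHEN title IN ('t1', 't2') THEN 't12' ELSE title END
                   AS inferred_title,
               value
        FROM my_table
    ) AS dt
    GROUP BY inferred_title
    ORDER BY inferred_title
    """
).fetchall()
print(rows)
```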
Select Title, COUNT(Title) AS Totals
From my_table
Group By Title
Having COUNT(Title)>1
Order By 2 desc

SQL query to get repeating column value that have other columns in a certain condition

Let's say we have a table with the schema below.
create table result
(
id int,
task_id int,
test_name string,
test_result string
);
The table is populated with a dataset like this.
insert into result
values (1, 1, 'test_a', 'pass'),
(2, 1, 'test_b', 'fail'),
(3, 1, 'test_c', 'pass'),
(4, 1, 'test_d', 'pass'),
(5, 2, 'test_a', 'pass'),
(6, 2, 'test_b', 'pass'),
(7, 2, 'test_c', 'pass'),
(8, 2, 'test_d', 'pass');
Basically, a single task has multiple test result entries. I want to retrieve the task_id for which test_b failed but all the other tests passed. So in this example it should return only task_id: 1.
I've tried EXISTS and HAVING, but they don't seem to work in this case. I'm new to SQL. How can I implement it?
I would just use aggregation with a having clause:
select task_id
from result
group by task_id
having sum(case when test_name = 'test_b' and test_result = 'fail' then 1 else 0 end) = 1 and
sum(case when test_result = 'pass' then 1 else 0 end) = count(*) - 1;
The first condition validates that test_b failed. The second counts the number of passes, which should be one less than the number of rows for the task.
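The two HAVING conditions can be exercised with a small sketch in Python's sqlite3 module. SQLite and the TEXT column types are assumptions for the demonstration, and task 3 is a hypothetical extra task (not in the question) in which test_b and another test both fail, to show that such a task is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result "
    "(id INTEGER, task_id INTEGER, test_name TEXT, test_result TEXT)"
)
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    [
        (1, 1, "test_a", "pass"), (2, 1, "test_b", "fail"),
        (3, 1, "test_c", "pass"), (4, 1, "test_d", "pass"),
        (5, 2, "test_a", "pass"), (6, 2, "test_b", "pass"),
        (7, 2, "test_c", "pass"), (8, 2, "test_d", "pass"),
        # Hypothetical task: test_b fails, but so does test_a.
        (9, 3, "test_a", "fail"), (10, 3, "test_b", "fail"),
    ],
)

task_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT task_id
        FROM result
        GROUP BY task_id
        HAVING SUM(CASE WHEN test_name = 'test_b' AND test_result = 'fail'
                        THEN 1 ELSE 0 END) = 1
           AND SUM(CASE WHEN test_result = 'pass' THEN 1 ELSE 0 END)
               = COUNT(*) - 1
        ORDER BY task_id
        """
    )
]
print(task_ids)
```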
If your database supports except (or minus), you can use set-based operations:
select task_id
from result
where test_name = 'test_b' and test_result = 'fail'
except
select task_id
from result
where test_name <> 'test_b' and test_result = 'fail'
Maybe selecting distinct task IDs that have a fail result:
select distinct [task_id], [test_result]
from [result]
where [test_result] = 'fail'
Note that this query will scan the entire table unless there is an index on test_result.
The following code first counts the test results per task and counts whether 'test_b' failed. The outer select ensures that 'test_b' failed and all other tests passed.
select task_id from (
select
task_id,
count(test_result) numberoftakers,
sum(case when test_result<>'pass' AND test_name='test_b' then 1 else 0 end) numberoffailb,
sum(case when test_result='pass' then 1 else 0 end) numberofallpasses
from result
group by task_id) a
where numberoftakers=numberoffailb+numberofallpasses and numberoffailb=1
Assuming that (task_id, test_name) is a unique key of your table, you can indeed use (not) exists, along with a correlated subquery which ensures that no other record with the same task_id failed.
select task_id
from result r
where
test_name = 'test_b'
and test_result = 'fail'
and not exists (
select 1
from result r1
where
r1.task_id = r.task_id
and r1.id != r.id
and r1.test_result = 'fail'
)
A left join used as an anti-join also comes to mind:
select r.task_id
from result r
left join result r1
on r1.task_id = r.task_id
and r1.id != r.id
and r1.test_result = 'fail'
where
r.test_name = 'test_b'
and r.test_result = 'fail'
and r1.id is null
Both queries return:
| task_id |
| :------ |
| 1 |

SQL query to get count based on filtered status

I have a table which has two columns, CustomerId & Status (A, B, C).
A customer can have multiple status in different rows.
I need to get the count of different status based on following rules:
If the status of a customer is A & B, he should be counted in Status A.
If status is both B & C, it should be counted in Status B.
If status is all three, it will fall in status A.
What I need is a table with status and count.
Could someone please help?
I know that someone will ask me to write my query first, but I couldn't work out how to implement this logic in a query.
You could play with different variations of this:
select customerId,
case when HasA+HasB+HasC = 3 then 'A'
when HasA+HasB = 2 then 'A'
when HasB+HasC = 2 then 'B'
when HasA+HasC = 2 then 'A'
when HasA is null and HasB is null and HasC is not null then 'C'
when HasB is null and HasC is null and HasA is not null then 'A'
when HasC is null and HasA is null and HasB is not null then 'B'
end as overallStatus
from
(
select customerId,
max(case when Status = 'A' then 1 end) HasA,
max(case when Status = 'B' then 1 end) HasB,
max(case when Status = 'C' then 1 end) HasC
from tableName
group by customerId
) as t;
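The question asks for a count per status, while the query above returns one status per customer. Because the stated rules amount to a priority order A, then B, then C, one possible shortcut is to take MIN(status) per customer and count the results. The sketch below uses Python's sqlite3 module; the table name customer_status, the sample rows and the assumption that a customer with only A and C counts as A are all made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_status (customer_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO customer_status (customer_id, status) VALUES (?, ?)",
    [
        (1, "A"), (1, "B"),            # A and B   -> counted under A
        (2, "B"), (2, "C"),            # B and C   -> counted under B
        (3, "A"), (3, "B"), (3, "C"),  # all three -> counted under A
        (4, "C"),                      # single status stays as it is
    ],
)

# 'A' < 'B' < 'C' alphabetically, so MIN(status) picks the winning status.
status_counts = conn.execute(
    """
    SELECT overall_status, COUNT(*) AS customers
    FROM (
        SELECT customer_id, MIN(status) AS overall_status
        FROM customer_status
        GROUP BY customer_id
    ) AS per_customer
    GROUP BY overall_status
    ORDER BY overall_status
    """
).fetchall()
print(status_counts)
```

This only works while the priority order happens to coincide with the alphabetical order of the status codes; with other codes, the explicit CASE logic is the safer route.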
I like to use Cross Apply for this type of query as it allows for use of the calculated status in the Group By clause.
Here's my solution with some sample data.
DECLARE @Table TABLE (Customerid int, Stat varchar(1))
INSERT INTO @Table (Customerid, Stat)
VALUES
(1, 'a'),
(1, 'b'),
(2, 'b'),
(2, 'c'),
(3, 'a'),
(3, 'b'),
(3, 'c')
SELECT
ca.StatusGroup
, COUNT(DISTINCT Customerid) as Total
FROM
@Table t
CROSS APPLY
(VALUES
(
CASE WHEN
EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'a')
AND EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'b')
THEN 'A'
WHEN
EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'b')
AND EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'c')
THEN 'B'
ELSE t.stat
END
)
) ca (StatusGroup)
GROUP BY ca.StatusGroup
I edited this to deal with customers who have only one status, in which case it returns A, B or C depending on the customer's status.

Joining two select queries from the same table

The table contains an ID column, valueHeading column and a value column. I want to separate the value column into two new columns called valueHeading1 and valueHeading2 depending on which type of valueHeading the value has.
So I want to join this select:
Edit: Full join
SELECT ID
,valueHeading
,value as 'valueHeading1'
FROM table1
WHERE valueHeading = 'valueHeading1'
With This select:
SELECT ID
,value as 'valueHeading2'
FROM table1
WHERE valueHeading = 'valueHeading2'
on their respective ID's. How do I do this?
Edit to illustrate what I want to do:
Original table:
ID valueHeading value
0 valueHeading1 a
0 valueHeading2 a
1 valueHeading1 ab
1 valueHeading2 NULL
2 valueHeading1 abcd
2 valueHeading2 abc
New Table:
ID valueHeading1 valueHeading2
0 a a
1 ab NULL
2 abcd abc
If you only need a join, use this. Using CASE WHEN is an elegant alternative if you don't need a join.
SELECT * FROM
(SELECT ID
,valueHeading
,value as 'valueHeading1'
FROM table1
WHERE valueHeading = 'valueHeading1') AS TAB_1,
(SELECT ID
,value as 'valueHeading2'
FROM table1
WHERE valueHeading = 'valueHeading2') AS TAB_2
WHERE TAB_1.ID = TAB_2.ID
Try something like :
SELECT ID
, CASE WHEN valueHeading = 'valueHeading1' THEN value ELSE NULL END AS valueHeading1
, CASE WHEN valueHeading = 'valueHeading2' THEN value ELSE NULL END AS valueHeading2
FROM table1
WHERE valueHeading IN ('valueHeading1', 'valueHeading2')
If you want to regroup all values on one row for each ID, you can try :
SELECT ID
, MAX(CASE WHEN valueHeading = 'valueHeading1' THEN value ELSE NULL END) AS valueHeading1
, MAX(CASE WHEN valueHeading = 'valueHeading2' THEN value ELSE NULL END) AS valueHeading2
FROM table1
WHERE valueHeading IN ('valueHeading1', 'valueHeading2')
GROUP BY ID
HAVING MAX(CASE WHEN valueHeading = 'valueHeading1' THEN value ELSE NULL END) IS NOT NULL
OR MAX(CASE WHEN valueHeading = 'valueHeading2' THEN value ELSE NULL END) IS NOT NULL
See SQLFiddle. I also tried on Oracle 11g and MSSQL 2012, and it works each time.
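The conditional-aggregation pivot can be tried on the table from the question with Python's sqlite3 module. This is a minimal sketch: SQLite is an assumption for the demonstration, and it omits the HAVING filter, so an ID with no non-NULL value would still be listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, valueHeading TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO table1 (ID, valueHeading, value) VALUES (?, ?, ?)",
    [
        (0, "valueHeading1", "a"), (0, "valueHeading2", "a"),
        (1, "valueHeading1", "ab"), (1, "valueHeading2", None),
        (2, "valueHeading1", "abcd"), (2, "valueHeading2", "abc"),
    ],
)

# Each CASE keeps the value only for its own heading; MAX then collapses
# the rows of one ID into a single row, ignoring the NULLs.
pivoted = conn.execute(
    """
    SELECT ID,
           MAX(CASE WHEN valueHeading = 'valueHeading1' THEN value END)
               AS valueHeading1,
           MAX(CASE WHEN valueHeading = 'valueHeading2' THEN value END)
               AS valueHeading2
    FROM table1
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()
print(pivoted)
```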
In SQL Server 2005+ you can use PIVOT:
SELECT ID, valueHeading1, valueHeading2
FROM
(
SELECT *
FROM dbo.test28
WHERE valueHeading IN ('valueHeading1', 'valueHeading2')
) x
PIVOT
(
MAX(value)
FOR valueHeading IN ([valueHeading1], [valueHeading2])
) p
A self join could be a simple solution:
SELECT DISTINCT t1.ID, t1.value as valueHeading1, t2.value as valueHeading2
FROM table1 t1
INNER JOIN table1 t2 ON t1.ID = t2.ID
WHERE t1.valueHeading = 'valueHeading1'
AND t2.valueHeading = 'valueHeading2'

SQL statement for maximum common element in a set

I have a table like
id contact value
1 A 2
2 A 3
3 B 2
4 B 3
5 B 4
6 C 2
Now I would like to get the common maximum value for a given set of contacts.
For example:
if my contact set was {A,B} it would return 3;
for the set {A,C} it would return 2
for the set {B} it would return 4
What SQL statement(s) can do this?
Try this:
SELECT value, count(distinct contact) as cnt
FROM my_table
WHERE contact IN ('A', 'C')
GROUP BY value
HAVING cnt = 2
ORDER BY value DESC
LIMIT 1
This is MySQL syntax, may differ for your database. The number (2) in HAVING clause is the number of elements in set.
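To check the three examples from the question, here is a hedged sketch that parameterises the same query for any contact set, using Python's sqlite3 module. The helper name max_common_value is hypothetical, SQLite is an assumption, and the alias in HAVING is replaced by the full COUNT(DISTINCT contact) expression, which is more portable.

```python
import sqlite3


def max_common_value(conn, contacts):
    """Return the largest value shared by every contact, or None if none is."""
    wanted = sorted(set(contacts))
    if not wanted:
        return None
    placeholders = ", ".join("?" for _ in wanted)
    row = conn.execute(
        f"""
        SELECT value
        FROM my_table
        WHERE contact IN ({placeholders})
        GROUP BY value
        HAVING COUNT(DISTINCT contact) = ?
        ORDER BY value DESC
        LIMIT 1
        """,
        (*wanted, len(wanted)),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, contact TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO my_table (id, contact, value) VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "A", 3), (3, "B", 2), (4, "B", 3), (5, "B", 4), (6, "C", 2)],
)

print(max_common_value(conn, ["A", "B"]))
print(max_common_value(conn, ["A", "C"]))
print(max_common_value(conn, ["B"]))
```

Only the placeholders are interpolated into the SQL text; the contact names themselves are passed as bound parameters.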
SELECT max(value) FROM table WHERE contact IN ('A', 'C')
Edit: max common
declare @contacts table ( contact nchar(10) )
insert into @contacts values ('a')
insert into @contacts values ('b')
select MAX(value)
from MyTable
where (select COUNT(*) from @contacts) =
(select COUNT(*)
from MyTable t
join @contacts c on c.contact = t.contact
where t.value = MyTable.value)
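Most will tell you to use GROUP BY with HAVING, wrapped in an outer query so that a single maximum is returned:
SELECT MAX(x.value)
FROM (SELECT t.value
FROM my_table t
WHERE t.contact IN ('A', 'C')
GROUP BY t.value
HAVING COUNT(DISTINCT t.contact) = 2) x
Couple of caveats:
The DISTINCT is key, otherwise two rows with the same contact and value (for example two rows of t.contact = 'A') would be counted twice.
The number compared against COUNT(DISTINCT t.contact) has to equal the number of values specified in the IN clause.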
My preference is to use JOINs:
SELECT MAX(t.value)
FROM my_table t
JOIN my_table t2 ON t2.value = t.value AND t2.contact = 'C'
WHERE t.contact = 'A'
The downside to this is that you have to do a self join (join to the same table) for every criterion (contact value in this case).