Identify columns with varying values for each set of columns - sql

Given a table, t:
a b c d e
1 2 3 4 7
1 2 3 5 7
3 2 4 6 7
3 2 4 6 8
What SQL query can identify the columns that have one or more instances of varying values associated with each tuple from columns a and b?
In table t above, columns d and e would satisfy this criterion but not column c.
For tuples <1,2> and <3,2> that come from columns a and b, column c doesn't have varying values for each tuple.
Column d has one instance of varying values for tuple <1,2> -- values 4 and 5.
Column e also has one instance of varying values for tuple <3,2> -- values 7 and 8.

Something like this should work for you using CASE, COUNT and GROUP BY:
select
a, b,
case when count(distinct c) > 1 then 'yes' else 'no' end colc,
case when count(distinct d) > 1 then 'yes' else 'no' end cold,
case when count(distinct e) > 1 then 'yes' else 'no' end cole
from t
group by a, b

Slightly indirectly:
SELECT a, b,
COUNT(DISTINCT c) AS num_c,
COUNT(DISTINCT d) AS num_d,
COUNT(DISTINCT e) AS num_e
FROM t
GROUP BY a, b;
This yields:
1 2 1 2 1
3 2 1 1 2
If the num_c, num_d, or num_e column has a value greater than 1, then that column has varying values for that (a, b). You can vary the query to list whether the column is varying for a given value of (a, b) by using a CASE expression like this:
-- v for varying, n for non-varying
SELECT a, b,
CASE WHEN COUNT(DISTINCT C) > 1 THEN 'v' ELSE 'n' END AS num_c,
CASE WHEN COUNT(DISTINCT d) > 1 THEN 'v' ELSE 'n' END AS num_d,
CASE WHEN COUNT(DISTINCT e) > 1 THEN 'v' ELSE 'n' END AS num_e
FROM t
GROUP BY a, b;
This yields:
1 2 n v n
3 2 n n v
If you really want just to know whether any set of values in the given column varies for any values of (a, b) — and not which values of (a, b) it varies for — you can use the query above as a sub-query in the FROM clause and organize things as you want.
SELECT MAX(num_c) AS num_c,
MAX(num_d) AS num_d,
MAX(num_e) AS num_e
FROM (SELECT a, b,
CASE WHEN COUNT(DISTINCT C) > 1 THEN 'v' ELSE 'n' END AS num_c,
CASE WHEN COUNT(DISTINCT d) > 1 THEN 'v' ELSE 'n' END AS num_d,
CASE WHEN COUNT(DISTINCT e) > 1 THEN 'v' ELSE 'n' END AS num_e
FROM t
GROUP BY a, b
) s;
This relies on v being larger than n; it is easy enough (and convenient enough) for this binary decision, but not necessarily convenient or easy if there are, say, 4 states to map.
This yields:
n v v
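To try the queries above without a database server, here is a minimal sketch that runs them against an in-memory SQLite database through Python's standard sqlite3 module. The choice of SQLite and the variable names are assumptions for illustration; the SQL is the same as in the answer.

```python
import sqlite3

# In-memory SQLite database holding the sample table t from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, d INT, e INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 3, 4, 7), (1, 2, 3, 5, 7), (3, 2, 4, 6, 7), (3, 2, 4, 6, 8)],
)

# Per-(a, b) flags: 'v' if the column takes more than one distinct value.
per_group_sql = """
    SELECT a, b,
           CASE WHEN COUNT(DISTINCT c) > 1 THEN 'v' ELSE 'n' END AS num_c,
           CASE WHEN COUNT(DISTINCT d) > 1 THEN 'v' ELSE 'n' END AS num_d,
           CASE WHEN COUNT(DISTINCT e) > 1 THEN 'v' ELSE 'n' END AS num_e
    FROM t
    GROUP BY a, b
"""
per_group = conn.execute(per_group_sql + " ORDER BY a, b").fetchall()

# Collapse to a single row; MAX works here only because 'v' sorts after 'n'.
overall = conn.execute(
    f"SELECT MAX(num_c), MAX(num_d), MAX(num_e) FROM ({per_group_sql}) s"
).fetchone()

print(per_group)  # one row per (a, b)
print(overall)    # one flag per column
```

The per-group rows match the "n v n" / "n n v" output shown above, and the collapsed row matches "n v v".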

Related

How do I aggregate/sort the results of one column based on data in another?

I have a table displaying three columns, A, B, C. Column A has duplicate values. How do I sort the results based on column C?
For example:
A B C
Amanda healthy
Amanda healthy
Brian healthy
Brian sick
Brian healthy
Colleen [null]
Colleen sick
Tyler healthy
Tyler [null]
Tyler fever
Daniel [null]
Daniel [null]
Daniel [null]
So that's just an example. I've left column B blank because it doesn't really matter here. What I'm trying to do is aggregate the duplicates in A based on the results in C. If all the results are null, then that should show me value 0. If the results are all healthy, or a mixture of healthy and null, then I want value 1. If there is any mention of being sick in the results, I want that to be value 2.
So for example, in the above, I want Amanda to give me value 1, Brian 2, Colleen 2, Tyler 2, Daniel 0. Any thoughts on how I may go about doing that? Thank you!
From your data and results, it looks like you want count(distinct):
select a, count(distinct c) cnt
from mytable
group by a
From the description of your question, that's a bit different. You can use a case expression:
select
a,
case
when max(case when c = 'sick' then 1 else 0 end) = 1 then 2
when max(case when c = 'healthy' then 1 else 0 end) = 1 then 1
when count(c) = 0 then 0
end as res
from mytable
group by a
First step: register which of the possible outcomes in c occur for each a, as a bitmap with one bit per outcome:
SELECT a, BIT_OR(CASE WHEN c IS NULL THEN 1 WHEN c = 'healthy' THEN 2 ELSE 4 END) FROM mytable GROUP BY a;
(Here I've assumed that "sick" and "fever" are to be treated the same way, since you didn't specify what, e.g., all "fever" values should yield.)
In that query the outputs are:
1 if all values of c are NULL.
2 if all are 'healthy'.
3 if there is a mix of 'healthy' (at least one) and NULL (at least one).
4 or more (binary 1xx) if there is at least one "sick"/"fever".
To map those outputs to your desired values (0/1/1/2 for the respective cases), the query can be adapted to, e.g.:
SELECT a, IF(bitmap = 1, 0, IF(bitmap & 4, 2, 1)) FROM
(SELECT a, BIT_OR(CASE WHEN c IS NULL THEN 1 WHEN c = 'healthy' THEN 2 ELSE 4 END) bitmap FROM mytable GROUP BY a) tmp;
PS: if you have allergic reactions to this nested query (e.g. because the SQL server might not optimize it by itself), just rewrite it into a single query by repeating the inner expression, or simply translate the values of the original query at the receiving end.
This script is for Oracle, but the SELECT is SQL-92. I hope it is useful for you.
create table anamnesis (
a varchar(20),
c varchar(20)
);
INSERT ALL
INTO anamnesis (a, c) VALUES ('Amanda','healthy')
INTO anamnesis (a, c) VALUES ('Amanda','healthy')
INTO anamnesis (a, c) VALUES ('Brian','healthy')
INTO anamnesis (a, c) VALUES ('Brian','sick')
INTO anamnesis (a, c) VALUES ('Brian','healthy')
INTO anamnesis (a, c) VALUES ('Colleen',null)
INTO anamnesis (a, c) VALUES ('Colleen','sick')
INTO anamnesis (a, c) VALUES ('Tyler','healthy')
INTO anamnesis (a, c) VALUES ('Tyler',null)
INTO anamnesis (a, c) VALUES ('Tyler','fever')
INTO anamnesis (a, c) VALUES ('Daniel',null)
INTO anamnesis (a, c) VALUES ('Daniel',null)
INTO anamnesis (a, c) VALUES ('Daniel',null)
SELECT * FROM dual;
select t.a, (case when sick>0 then 2 else
case when healthy>0 then 1 else
0
end
end
) res
from
(select
t.a,
sum(case when t.c is null then 1 else 0 end) nullable,
sum(case when t.c='healthy' then 1 else 0 end) healthy,
sum(case when coalesce(t.c,'-') not in ('-','healthy') then 1 else 0 end) sick
from
anamnesis t
group by t.a) t;
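Since INSERT ALL is Oracle-specific, here is a portable sketch of the same idea using Python's standard sqlite3 module. It assumes, as the answers above do, that any non-null value other than 'healthy' (for example 'fever') counts as sick; the single-level CASE is a rewrite of the nested one, not the original author's text.

```python
import sqlite3

# Sample data from the question; None becomes SQL NULL.
rows = [
    ("Amanda", "healthy"), ("Amanda", "healthy"),
    ("Brian", "healthy"), ("Brian", "sick"), ("Brian", "healthy"),
    ("Colleen", None), ("Colleen", "sick"),
    ("Tyler", "healthy"), ("Tyler", None), ("Tyler", "fever"),
    ("Daniel", None), ("Daniel", None), ("Daniel", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anamnesis (a VARCHAR(20), c VARCHAR(20))")
conn.executemany("INSERT INTO anamnesis (a, c) VALUES (?, ?)", rows)

# 2 if any non-null, non-'healthy' value exists; 1 if any 'healthy'; else 0.
query = """
    SELECT a,
           CASE
               WHEN SUM(CASE WHEN c IS NOT NULL AND c <> 'healthy'
                             THEN 1 ELSE 0 END) > 0 THEN 2
               WHEN SUM(CASE WHEN c = 'healthy' THEN 1 ELSE 0 END) > 0 THEN 1
               ELSE 0
           END AS res
    FROM anamnesis
    GROUP BY a
"""
result = dict(conn.execute(query).fetchall())
print(result)
```

With the sample data this gives Amanda 1, Brian 2, Colleen 2, Tyler 2 and Daniel 0, which is what the question asks for.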

Order by -- different sequence for different criteria

I want to do something like this:
select a
from table
order by
case when a='A' then b,c,d
else d,c,b
a, b, c, d are all columns of the table.
Your question is not that clear about the result you really expect, but I suspect it is:
order by
case when a = 'A' then b else d end,
c,
case when a = 'A' then d else b end
Or if you want records where a = 'A' first (with the specified order), and then the rest of the records (with the other sequence), then:
order by
case when a = 'A' then 0 else 1 end,
case when a = 'A' then b else d end,
c,
case when a = 'A' then d else b end
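A small sketch of the second variant, run against SQLite from Python. The table name items and the sample rows are made up for illustration; only the ORDER BY clause comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, chosen so the two orderings are easy to tell apart.
conn.execute("CREATE TABLE items (a TEXT, b INT, c INT, d INT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [("B", 9, 1, 1), ("A", 2, 1, 1), ("B", 1, 1, 5), ("A", 1, 1, 2)],
)

ordered = conn.execute("""
    SELECT a, b, c, d
    FROM items
    ORDER BY CASE WHEN a = 'A' THEN 0 ELSE 1 END,  -- rows with a = 'A' first
             CASE WHEN a = 'A' THEN b ELSE d END,  -- then b (for 'A') or d
             c,
             CASE WHEN a = 'A' THEN d ELSE b END   -- then d (for 'A') or b
""").fetchall()
print(ordered)
```

The 'A' rows come out sorted by b, c, d and the remaining rows by d, c, b. Note that each CASE must return compatible types in both branches; here b and d are both integers.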

Suggested sequence of responses in query

I have such values in the letter column:
A, B, C, D, E, **X**.
I would like the SELECT to return them in this order:
A, B, **X**, C, D, E.
I tried with ORDER BY, but I don't know if it's a good way, or it should be SELECT Top 2 and next...
If it's only one character:
order by case when MyColumn < 'C' then 1
when MyColumn = 'X' then 2
else 3
end,
MyColumn
In this case you should assign a numeric value to each of the possible values in order to get them in the desired way. It could be something like
order by case when column = 'A' then 1
when column = 'B' then 2
when column = 'C' then 3
when column = 'X' then 4
...
else 99999999
end
This can be done using a case expression to re-position the 'X' between 'B' and 'C' as follows.
order by case when MyColumn = 'X' then 'BB' else MyColumn end
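To compare the approaches, here is a sketch that runs the first and the last ORDER BY against SQLite from Python. The table name letters and column name letter are assumptions; the 'BB' trick relies on the default string comparison placing 'BB' between 'B' and 'C'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column table with the values from the question.
conn.execute("CREATE TABLE letters (letter TEXT)")
conn.executemany(
    "INSERT INTO letters VALUES (?)", [(ch,) for ch in "XEDCBA"]
)

# Approach 1: bucket the rows (before 'C', then 'X', then the rest).
by_bucket = [r[0] for r in conn.execute("""
    SELECT letter FROM letters
    ORDER BY CASE WHEN letter < 'C' THEN 1
                  WHEN letter = 'X' THEN 2
                  ELSE 3
             END,
             letter
""")]

# Approach 3: sort 'X' as if it were 'BB', which falls between 'B' and 'C'.
by_alias = [r[0] for r in conn.execute("""
    SELECT letter FROM letters
    ORDER BY CASE WHEN letter = 'X' THEN 'BB' ELSE letter END
""")]

print(by_bucket)
print(by_alias)
```

Both queries return A, B, X, C, D, E for this data.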

SQL : Group by and check if all, some or none are set

Let's say I have the following table:
FKEY A B C D E F
'A' 1 0 1 0 1 0
'A' 0 1 1 1 0 0
Now I want to group by FKEY, but I just want to know whether the A-F columns have a 1 in some, all, or none of the grouped rows. The result on the above table would be:
FKEY A B C D E F
'A' S S A S S N
..where S is "some", A is "all" and N is "none".
What would be the best approach to write this query? I could do some nested queries, but isn't there a smarter way?
In my real-life data, the 1s and 0s are actually DATETIME values and NULLs.
You can use case and aggregation:
select fkey,
(case when sum(a) = 0 then 'N'
when sum(a) = count(*) then 'A'
else 'S'
end) as a,
(case when sum(b) = 0 then 'N'
when sum(b) = count(*) then 'A'
else 'S'
end) as b,
. . .
from t
group by fkey;
The above assumes that the values are only 0 and 1. If that is the case, you can actually phrase this as:
(case when max(a) = 0 then 'N'
when min(a) = 1 then 'A'
else 'S'
end) as a,
You mentioned that your 0 and 1 are actually null or non null dates. Here's a modified version of Gordon's query that caters for that:
select fkey,
(case when count(datecol) = 0 then 'all dates are null'
when count(datecol) = count(*) then 'all dates are filled'
else 'some are null, some filled'
end) as a,
...
from t
group by fkey;
COUNT(NULL) is 0, COUNT('2001-01-01') is 1, and COUNT(*) is the row count regardless of what the columns contain. Hence, if our count of the dates is 0, all must be null. If the count of the dates is equal to the count of the rows, then all must be filled with some value; otherwise it's a mix.
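The COUNT-based version can be checked with a short sketch in Python and SQLite. The table name events, the three date columns (standing in for A, C and F of the question) and the date values are hypothetical; the N/A/S letters follow the question's notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: dates stored as text, NULL where the question has 0.
conn.execute("CREATE TABLE events (fkey TEXT, a TEXT, c TEXT, f TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("A", "2020-01-01", "2020-01-02", None),
        ("A", None, "2020-02-02", None),
    ],
)

# COUNT(col) skips NULLs, COUNT(*) counts every row in the group.
flags = conn.execute("""
    SELECT fkey,
           CASE WHEN COUNT(a) = 0 THEN 'N'
                WHEN COUNT(a) = COUNT(*) THEN 'A'
                ELSE 'S' END AS a,
           CASE WHEN COUNT(c) = 0 THEN 'N'
                WHEN COUNT(c) = COUNT(*) THEN 'A'
                ELSE 'S' END AS c,
           CASE WHEN COUNT(f) = 0 THEN 'N'
                WHEN COUNT(f) = COUNT(*) THEN 'A'
                ELSE 'S' END AS f
    FROM events
    GROUP BY fkey
""").fetchall()
print(flags)
```

Column a is filled in one of two rows (some), c in both (all) and f in neither (none).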

sql: select rows where group of elements occurs several times in the table

I am searching for an implementation of the following pseudo-code:
SELECT A, B, C
FROM X
HAVING COUNT(A,B) > 1
Here is an example of what the code should do:
Assume table X looks as follows:
A B C D
--------------
1 1 0 2
1 1 1 1
2 1 1 0
The first and the second row have the same entries in columns A and B; the third row is identical in column B but different in column A. The desired output is columns A, B, and C of rows 1 and 2:
1 1 0
1 1 1
How could this be implemented? The problem with my pseudo-code is that COUNT accepts either a single column or all columns (*), but it can't take two out of 4 columns. GROUP BY has the same property.
You can do this with an exists clause. This should work in all databases:
select a, b, c
from x
where exists (select 1
from x x2
where x.a = x2.a and x.b = x2.b and x.c <> x2.c
);
This assumes that the rows have different c values.
This will perform best with an index on x(a, b).
For an RDBMS that supports analytic functions, you can do
SELECT a,b,c
FROM
(
SELECT a, b, c, count(1) OVER(PARTITION BY a,b) cnt
FROM X
)t1
WHERE t1.cnt >1
If analytic/window functions are not available, a join should do the job
SELECT t1.a, t1.b, t1.c
FROM X t1
INNER JOIN
(
SELECT a,b
FROM X
GROUP BY a,b
HAVING COUNT(1) >1
)t2 ON (t2.a=t1.a AND t2.b=t1.b)
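As a quick check, here is a sketch that runs the join variant and the EXISTS variant against the sample table in SQLite from Python. The ORDER BY clauses are added only to make the output deterministic. Keep in mind that the EXISTS variant returns a row only when another row with the same (a, b) has a different c, so exact duplicates in (a, b, c) would be missed by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample table X from the question.
conn.execute("CREATE TABLE x (a INT, b INT, c INT, d INT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 2), (1, 1, 1, 1), (2, 1, 1, 0)],
)

# Join against the (a, b) pairs that occur more than once.
via_join = conn.execute("""
    SELECT t1.a, t1.b, t1.c
    FROM x t1
    INNER JOIN (SELECT a, b
                FROM x
                GROUP BY a, b
                HAVING COUNT(1) > 1) t2
            ON t2.a = t1.a AND t2.b = t1.b
    ORDER BY t1.a, t1.b, t1.c
""").fetchall()

# EXISTS variant: needs a partner row with the same (a, b) but a different c.
via_exists = conn.execute("""
    SELECT a, b, c
    FROM x
    WHERE EXISTS (SELECT 1
                  FROM x x2
                  WHERE x.a = x2.a AND x.b = x2.b AND x.c <> x2.c)
    ORDER BY a, b, c
""").fetchall()

print(via_join)
print(via_exists)
```

For the sample data both queries return the first two rows, (1, 1, 0) and (1, 1, 1).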