Break out nested data within SQL, criteria across multiple rows (similar to dcast in R) - sql

I'm trying to write a simple query to take a data set that looks like this:
ID | Col2
X | B
X | C
Y | B
Y | D
and return this:
ID | Col2 | Col3
X | B | C
Y | B | D
Essentially, I have an ID column that can have either B, C, or D in Col2. I am trying to identify which IDs only have B and D. I have a query to find both, but not only that combination. Query:
select ID, Col2
from Table1
where ID in (
select ID from Table1
group by ID
having count(distinct Col2) = 2)
order by ID
Alternatively, I could use help in finding a way to filter that query on B and D and leave off B and C. I have seen perhaps a self join, but am not sure how to implement that.
Thanks!
EDIT: Most of the data set has, for a given ID, all three of B, C, and D. The goal here is to isolate the IDs that are missing one, namely missing C.

I am trying to identify which IDs only have B and D. I have a query to find both
If this is what you want, you don't need multiple columns:
select id
from table1
where col2 in ('B', 'D')
group by id
having count(distinct col2) = 2;
If you want only 'B' and 'D' and no others, then:
select id
from table1
group by id
having sum(case when col2 = 'B' then 1 else 0 end) > 0 AND
sum(case when col2 = 'D' then 1 else 0 end) > 0 AND
sum(case when col2 not in ('B', 'D') then 1 else 0 end) = 0;
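As a quick check of the "only B and D" condition, here is a minimal sketch that runs the same HAVING logic through Python's sqlite3 module. The sample rows (including the extra ID Z that has all three values) are assumptions added for illustration, and SQLite stands in for whatever database you are using.

```python
import sqlite3

# Hypothetical sample data: X has B and C, Y has B and D, Z has all three.
rows = [("X", "B"), ("X", "C"), ("Y", "B"), ("Y", "D"),
        ("Z", "B"), ("Z", "C"), ("Z", "D")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, col2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

# Keep IDs that have a B, have a D, and have nothing else.
only_b_and_d = [r[0] for r in conn.execute("""
    SELECT id
    FROM table1
    GROUP BY id
    HAVING SUM(CASE WHEN col2 = 'B' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN col2 = 'D' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN col2 NOT IN ('B', 'D') THEN 1 ELSE 0 END) = 0
    ORDER BY id
""")]
print(only_b_and_d)
```

With this data only Y qualifies: X lacks a D, and Z is rejected by the third condition because it also has a C.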
If each ID has at most two distinct values, you can also easily pivot them into two columns using aggregation:
select id, min(col2), nullif(max(col2), min(col2))
from table1
group by id;
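To see what the min/max pivot returns, the sketch below runs it in SQLite via Python's sqlite3 module. The table contents come from the question, plus one hypothetical ID W with a single value to show what nullif does in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    # W is a hypothetical ID with only one value.
    [("X", "B"), ("X", "C"), ("Y", "B"), ("Y", "D"), ("W", "B")],
)

# min() gives the first value; max() gives the second, and nullif()
# blanks it out when it is the same as the first (only one value present).
pivoted = conn.execute("""
    SELECT id, MIN(col2), NULLIF(MAX(col2), MIN(col2))
    FROM table1
    GROUP BY id
    ORDER BY id
""").fetchall()
for row in pivoted:
    print(row)
```

IDs with exactly two values get both columns filled, while W gets NULL (None in Python) in the second column.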

Related

SQL Server Weird Grouping Scenario by multiple columns and OR

I have a weird grouping scenario and have some troubles finding out what would be the best way for grouping in SQL.
Imagine we have the following one table
CREATE TABLE Item
(
KeyId VARCHAR(1) NOT NULL,
Col1 INT NULL,
Col2 INT NULL,
Col3 INT NULL
)
GO
INSERT INTO Item (KeyId, Col1, Col2, Col3)
VALUES
('a',1,2,3),
('b',5,4,3),
('c',5,7,6),
('d',8,7,9),
('e',11,10,9),
('f',11,12,13),
('g',20,22,21),
('h',23,22,24)
I need to group records in this table so that if Col1 OR Col2 OR Col3 is the same for two records, then these two records should be in the same group, and there should be chaining.
In other words, with the data as above, record 'a' (first record) has Col3 = 3 and record 'b' (second record) also has Col3 = 3, so these two should be in one group. But then record 'b' has the same Col1 as record 'c', so record 'c' should be in the same group as 'a' and 'b'. And then record 'd' has the same Col2 as 'c', so it should also be in the same group. Similarly, 'e' and 'f' have the same values in Col3 and Col1 respectively.
On the other hand records 'g' and 'h' will be in one group (because they have the same Col2 = 22), but this group will be different from the group for records 'a','b','c','d','e','f'.
The result of the query should be something like
KeyId GroupId
'a' 1
'b' 1
'c' 1
'd' 1
'e' 1
'f' 1
'g' 2
'h' 2
There is probably a way of doing this with some loops/cursors, but I started thinking about a cleaner way and this seems quite difficult.
Here you go:
with g (rootid, previd, level, keyid, col1, col2, col3) as (
select keyid, '-', 1, keyid, col1, col2, col3 from item
union all
select g.rootid, g.keyid, g.level + 1, i.keyid, i.col1, i.col2, i.col3
from g
join item i on i.col1 = g.col1 or i.col2 = g.col2 or i.col3 = g.col3
where i.keyid > g.keyid
),
m (keyid, rootid) as (
select keyid, min(rootid) from g group by keyid
)
select * from m;
Result:
keyid rootid
----- ------
a a
b a
c a
d a
e a
f a
g g
h g
Note: Keep in mind that SQL Server by default limits recursive CTEs to 100 levels of recursion (here, the length of the longest chain of rows within a group). In other words, the approach shown above works, but only up to that limit. If you reach it, you'll get the message:
The maximum recursion 100 has been exhausted before statement completion.
If this happens, consider adding the clause option (maxrecursion 32767) at the end of the query.
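The recursive CTE can be tried outside SQL Server as well. The sketch below is a port to SQLite through Python's sqlite3 module, which is an assumption made only to keep the example self-contained: SQLite needs the RECURSIVE keyword, and the previd and level columns are dropped because the final result does not use them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (keyid TEXT, col1 INT, col2 INT, col3 INT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?)",
    [("a", 1, 2, 3), ("b", 5, 4, 3), ("c", 5, 7, 6), ("d", 8, 7, 9),
     ("e", 11, 10, 9), ("f", 11, 12, 13), ("g", 20, 22, 21), ("h", 23, 22, 24)],
)

# Start a chain at every row, then keep extending it to rows with a larger
# key that share col1, col2 or col3 with the current end of the chain.
groups = dict(conn.execute("""
    WITH RECURSIVE g (rootid, keyid, col1, col2, col3) AS (
        SELECT keyid, keyid, col1, col2, col3 FROM item
        UNION ALL
        SELECT g.rootid, i.keyid, i.col1, i.col2, i.col3
        FROM g
        JOIN item i
          ON i.col1 = g.col1 OR i.col2 = g.col2 OR i.col3 = g.col3
        WHERE i.keyid > g.keyid
    )
    SELECT keyid, MIN(rootid) FROM g GROUP BY keyid
"""))
print(groups)
```

The smallest root that reaches a row serves as its group label. Because of the i.keyid > g.keyid condition, each chain only moves to larger keys; that guarantees termination, and the sample data happens to chain in key order. Data where two rows are linked only through a row with a larger key would likely need a more general approach.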

How do I check if a certain value exists?

I have a historization table called CUR_VALID. This table looks something like this:
ID CUR_VALID
1 N
1 N
1 Y
2 N
2 Y
3 Y
For every ID there needs to be exactly one Y. If there is no Y, or more than one Y, something is wrong. I already have the statement for checking whether there are multiple Y values. Now I only need to check, for every ID, that a Y exists, and I'm not sure how to do that. This is what I have so far. So how do I check if the value 'Y' exists?
SELECT Count(1) [Number of N]
,MAX(CUR_VALID = 'N')
,[BILL_ID]
,[BILL_MONTH]
,[BILL_SRC_ID]
FROM db.dbo.table
GROUP BY [BILL_ID]
,[BILL_MONTH]
,[BILL_SRC_ID]
Having MAX(CUR_VALID = 'N') > 1
Why are you fiddling with 'N' when you are interested in 'Y'?
Use conditional aggregation to get the count of the value you are interested in.
SELECT
COUNT(*) AS number_of_all,
COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) AS number_of_y,
COUNT(CASE WHEN cur_valid = 'N' THEN 1 END) AS number_of_n,
bill_id,
bill_month,
bill_src_id
FROM db.dbo.table
GROUP BY bill_id, bill_month, bill_src_id;
Add a HAVING clause in order to get only valid
HAVING COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) = 1
or invalid
HAVING COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) <> 1
bills.
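The following sketch applies the conditional count to the ID / CUR_VALID sample from the question, using SQLite through Python's sqlite3 module. The table name cur_valid_hist and the two broken IDs (4 with no Y, 5 with two) are assumptions added so that the invalid case returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# cur_valid_hist is a hypothetical name for the historization table.
conn.execute("CREATE TABLE cur_valid_hist (id INT, cur_valid TEXT)")
conn.executemany(
    "INSERT INTO cur_valid_hist VALUES (?, ?)",
    [(1, "N"), (1, "N"), (1, "Y"), (2, "N"), (2, "Y"), (3, "Y"),
     (4, "N"), (4, "N"),    # hypothetical: no Y at all
     (5, "Y"), (5, "Y")],   # hypothetical: more than one Y
)

# COUNT ignores NULLs, so the CASE without ELSE counts only the 'Y' rows.
invalid = conn.execute("""
    SELECT id, COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) AS number_of_y
    FROM cur_valid_hist
    GROUP BY id
    HAVING COUNT(CASE WHEN cur_valid = 'Y' THEN 1 END) <> 1
    ORDER BY id
""").fetchall()
print(invalid)
```

Both failure modes show up in one pass: ID 4 is reported with a count of 0 and ID 5 with a count of 2.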
The following query gives you the list of IDs for which your integrity condition is not met, that is, IDs that do not have exactly one Y:
select T1.id from table T1 where (select count(*) from table T2 where T2.id=T1.id and T2.CUR_VALID='Y')!=1
This query returns the IDs that have no 'Y' value as well as the IDs that have more than one.
First, count the Y values for each ID, then select the IDs whose count is not 1.
select * from (
select ID, SUM(case when CUR_VALID = 'Y' then 1 else 0 end) as CNT
from table
group by ID
) b where b.CNT <> 1
As I understand it, you want all the IDs that pass your integrity check, where passing means there is exactly one row with CUR_VALID equal to Y for that ID in the CUR_VALID table.
This can be achieved by a group by clause:
select id from CUR_VALID
where CUR_VALID.CUR_VALID = 'Y'
group by id
having count(CUR_VALID.CUR_VALID) = 1;

How can I group by two rows in SQL?

In the result of an SQL Select command I have two rows:
A | B
B | A
A|B and B|A mean the same to me. I want only one of them to be returned by the SQL command.
How can I do that?
I have a select command that joins the table with itself (natural join), like this:
SELECT a.coloumn ,b.coloumn
FROM table a,table b
where .... (not important)
and b.coloumn IN (
SELECT coloumn
FROM table
where ... (the same like above)
)
and b.coloumn != a.coloumn ;
And after that I have multiple columns.
You neither told us your column names nor your table name, but assuming you have two columns A and B in a table named the_table then the following will do:
select distinct least(a,b), greatest(a,b)
from the_table;
If you want to group by them using standard SQL:
select (case when a < b then a else b end) as a,
(case when a < b then b else a end) as b,
count(*) as cnt
from the_table t
group by (case when a < b then a else b end),
(case when a < b then b else a end);
Oracle supports the greatest() and least() functions, but not all databases do.
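For databases without least() and greatest(), the CASE version can be checked with a small sketch like the one below, run in SQLite through Python's sqlite3 module. The three sample pairs are assumptions, and the output columns are named lo and hi here (not a and b) only to avoid any ambiguity with the table's own column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [("A", "B"), ("B", "A"), ("A", "C")],  # hypothetical sample pairs
)

# Put the smaller value first so that A|B and B|A collapse into one row.
pairs = conn.execute("""
    SELECT (CASE WHEN a < b THEN a ELSE b END) AS lo,
           (CASE WHEN a < b THEN b ELSE a END) AS hi,
           COUNT(*) AS cnt
    FROM the_table
    GROUP BY (CASE WHEN a < b THEN a ELSE b END),
             (CASE WHEN a < b THEN b ELSE a END)
    ORDER BY 1, 2
""").fetchall()
print(pairs)
```

The A|B and B|A rows end up in a single group with a count of 2, and A|C stays on its own.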
Another possible solution is:
select a, b from the_table
union
select b, a from the_table
This would work fine even if there are NULL values.

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I am hoping to leverage some other commands in SQL to simplify it. Is there a simpler way to perform a set-comparison in this context, or to select the group ID that contains the exact input values for the query?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the having clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible, for various exclusion and inclusion conditions on the values you are looking for.
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
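The list-driven version can be exercised against the three lookups from the question with a sketch along these lines. It uses SQLite through Python's sqlite3 module, and it fills a real table named thelist instead of an inline CTE so that the list can be changed per call; the helper find_groups is a hypothetical name introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INT, value TEXT)")
conn.execute("CREATE TABLE thelist (value TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)

def find_groups(wanted):
    """Return the group ids whose set of values equals `wanted` exactly."""
    conn.execute("DELETE FROM thelist")
    conn.executemany("INSERT INTO thelist VALUES (?)",
                     [(v,) for v in set(wanted)])
    cursor = conn.execute("""
        SELECT gv.groupid
        FROM GroupValues gv
        LEFT OUTER JOIN thelist ON gv.value = thelist.value
        GROUP BY gv.groupid
        HAVING COUNT(DISTINCT gv.value) = (SELECT COUNT(*) FROM thelist)
           AND COUNT(DISTINCT CASE WHEN gv.value = thelist.value
                                   THEN gv.value END)
               = COUNT(DISTINCT gv.value)
        ORDER BY gv.groupid
    """)
    return [row[0] for row in cursor]

print(find_groups(["a", "b", "c"]))
print(find_groups(["b", "a"]))
print(find_groups(["a", "b", "c", "e"]))
```

The last lookup is the interesting one: group 3 has four distinct values, the same number as the list, but only three of them match, so the second condition rejects it.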
EDIT:
The query originally failed to compile because a table alias was missing. It has been updated with the right table alias.
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))
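The double NOT EXISTS query can be read as two checks: no wanted value is missing from the group, and the group has no value outside the wanted list. A minimal sketch of that reading, ported to SQLite through Python's sqlite3 module, follows; ordinary tables replace the temp tables, and the helper matching_groups is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (GroupID INT, Val TEXT)")
conn.execute("CREATE TABLE FindValues (Val TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)

def matching_groups(wanted):
    """Return group ids whose values are exactly the ones in `wanted`."""
    conn.execute("DELETE FROM FindValues")
    conn.executemany("INSERT INTO FindValues VALUES (?)",
                     [(v,) for v in wanted])
    cursor = conn.execute("""
        SELECT DISTINCT gv.GroupID
        FROM GroupValues gv
        -- Check 1: there is no wanted value that the group lacks.
        WHERE NOT EXISTS (SELECT 1 FROM FindValues fv
                          WHERE NOT EXISTS (SELECT 1 FROM GroupValues gv2
                                            WHERE gv2.GroupID = gv.GroupID
                                              AND gv2.Val = fv.Val))
        -- Check 2: the group has no value that is not wanted.
          AND NOT EXISTS (SELECT 1 FROM GroupValues gv3
                          WHERE gv3.GroupID = gv.GroupID
                            AND NOT EXISTS (SELECT 1 FROM FindValues fv3
                                            WHERE fv3.Val = gv3.Val))
        ORDER BY gv.GroupID
    """)
    return [row[0] for row in cursor]

print(matching_groups(["a", "b", "c"]))
print(matching_groups(["a", "b"]))
```

Group 3 passes the first check for both lookups but fails the second one because of its extra values.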

SQL (TSQL) - Select values in a column where another column is not null?

I will keep this simple: I would like to know if there is a good way to select all the values in a column when it never has a null in another column. For example:
A B
----- -----
1 7
2 7
NULL 7
4 9
1 9
2 9
From the above set I would just want 9 from B and not 7 because 7 has a NULL in A. Obviously I could wrap this as a subquery and USE the IN clause etc. but this is already part of a pretty unique set and am looking to keep this efficient.
I should note that for my purposes this would only be a one-way comparison... I would only be returning values in B and examining A.
I imagine there is an easy way to do this that I am missing, but being in the thick of things I don't see it right now.
You can do something like this:
select *
from t
where t.b not in (select b from t where a is null);
If you want only distinct b values, then you can do:
select b
from t
group by b
having sum(case when a is null then 1 else 0 end) = 0;
And, finally, you could use window functions:
select a, b
from (select t.*,
sum(case when a is null then 1 else 0 end) over (partition by b) as NullCnt
from t
) t
where NullCnt = 0;
The query below outputs only one column in the final result. The records are grouped by column B, and the CASE expression tests whether A is null in each record. Each null adds 1 to the sum for the group, and the HAVING clause keeps only the groups whose sum is 0.
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
If you want all the rows for those groups, you can join back to the table.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
) b ON a.b = b.b
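Both queries can be checked against the sample data from the question with a short sketch such as this one, which runs them in SQLite through Python's sqlite3 module (an assumption made for the sake of a self-contained example; the question is about T-SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (A INT, B INT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?)",
    [(1, 7), (2, 7), (None, 7), (4, 9), (1, 9), (2, 9)],
)

# Values of B whose group contains no NULL in A.
clean_b = [row[0] for row in conn.execute("""
    SELECT B
    FROM TableName
    GROUP BY B
    HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
""")]

# All rows belonging to those groups, via a join back to the table.
clean_rows = conn.execute("""
    SELECT a.A, a.B
    FROM TableName a
    INNER JOIN (
        SELECT B
        FROM TableName
        GROUP BY B
        HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
    ) b ON a.B = b.B
    ORDER BY a.A
""").fetchall()

print(clean_b)
print(clean_rows)
```

Only 9 survives, because the group for 7 contains a NULL in A, and the join returns the three rows that belong to 9.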