SQL using CASE in SELECT with GROUP BY. Need CASE-value but get row-value - sql

so basicially there is 1 question and 1 problem:
1. question - when I have like 100 columns in a table(and no key or uindex is set) and I want to join or subselect that table with itself, do I really have to write out every column name?
2. problem - the example below shows the 1. question and my actual SQL-statement problem
Example:
A.FIELD1,
(SELECT CASE WHEN B.FIELD2 = 1 THEN B.FIELD3 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD1
(SELECT CASE WHEN B.FIELD2 = 2 THEN B.FIELD4 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD2
FROM TABLE A
GROUP BY A.FIELD1
The story is: if I don't put the CASE into its own select statement then I have to put the actual rowname into the GROUP BY and the GROUP BY doesn't group the NULL-value from the CASE but the actual value from the row. And because of that I would have to either join or subselect with all columns, since there is no key and no uindex, or somehow find another solution.
DBServer is DB2.
So now to describing it just with words and no SQL:
I have "order items" which can be divided into "ZD" and "EK" (1 = ZD, 2 = EK) and can be grouped by "distributor". Even though "order items" can have one of two different "departements"(ZD, EK), the fields/rows for "ZD" and "EK" are always both filled. I need the grouping to consider the "departement" and only if the designated "departement" (ZD or EK) is changing, then I want a new group to be created.
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING,
FROM TABLE
GROUP BY
ZD
EK
TABLE.DISTRIBUTOR
TABLE.DEPARTEMENT
This here worked in the SELECT and ZD, EK in the GROUP BY. Only problem was, even if EK was not the designated DEPARTEMENT, it still opened a new group if it changed, because he was using the real EK value and not the NULL from the CASE, as I was already explaining up top.

And here ladies and gentleman is the solution to the problem:
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING,
FROM TABLE
GROUP BY
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END),
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END),
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
#t-clausen.dk: Thank you!
#others: ...

Actually there is a wildcard equality test.
I am not sure why you would group by field1, that would seem impossible in your example. I tried to fit it into your question:
SELECT FIELD1,
CASE WHEN FIELD2 = 1 THEN FIELD3 END AS CASEFIELD1,
CASE WHEN FIELD2 = 2 THEN FIELD4 END AS CASEFIELD2
FROM
(
SELECT * FROM A
INTERSECT
SELECT * FROM B
) C
UNION -- results in a distinct
SELECT
A.FIELD1,
null,
null
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
) C
This will fail for datatypes that are not comparable

No, there's no wildcard equality test. You'd have to list every field you want tested individually. If you don't want to test each individual field, you could use a hack such as concatenating all the fields, e.g.
WHERE (a.foo + a.bar + a.baz) = (b.foo + b.bar + b.az)
but either way, you're listing all of the fields.

I might tend to solve it something like this
WITH q as
(SELECT
Department
, (CASE WHEN DEPARTEMENT = 1 THEN ZD
WHEN DEPARTEMENT = 2 THEN EK
ELSE null
END) AS GRP
, DISTRIBUTOR
, SOMETHING
FROM mytable
)
SELECT
Department
, Grp
, Distributor
, sum(SOMETHING) AS SumTHING
FROM q
GROUP BY
DEPARTEMENT
, GRP
, DISTRIBUTOR

If you need to find all rows in TableA that match in TableB, how about INTERSECT or INTERSECT DISTINCT?
select * from A
INTERSECT DISTINCT
select * from B
However, if you only want rows from A where the entire row matches the values in a row from B, then why does your sample code take some values from A and others from B? If the row matches on all columns, then that would seem pointless. (Perhaps your question could be explained a bit more fully?)

Related

Counting Booleans for Distinct and Non Distinct ID numbers

I have a simple table that looks like the following PNG file from the following join:
SELECT *
FROM tableA A
JOIN tableB B ON B.Main_SPACE_ID = A.Main_SPACE_ID
Table A contains Guest_ON and User_Controls (last 2 columns) and Table B contains Trigger_ON and DOCX_ON.
Issue:
What I am trying to do is count all the True's for each tableB.Subspace_ID and the DISTINCT trues for tableA.Main_SPACE_ID.
The problem is that subspace_ID from table B lives within the main_space_id from table A and therefore creates a situation where I am double counting.
I only want to count the trues for a distinct Main_space ID
Current Data Model
Desired Output:
From the above screenshot, I am trying to get a count of true values without double counting in the case for tableA_MAIN_SPACE_ID.
As you can see, each row is counted for true values as it relates to the subspace_ID (table B) for totals of 12 and 8 (1 if True, 0 if False) and for tableA, I am only counting distinct values so we only count Trues for a single MainspaceID and avoid recounting them.
If someone can advise on how to get this output from my current data model that would be very helpful!
My attempt as follows double counts trues for the Main space ID column..
SELECT
count(CASE WHEN B.TRIGGER_ON THEN 1 END) as TRIGGER_ON,
count(CASE WHEN B.DOCX_ON THEN 1 END) as DOCX_ON,
count(CASE WHEN A.GUEST_ON THEN 1 END) as SPRINTS,
count(CASE WHEN A.USER_CONTROLS THEN 1 END) as SPRINTS
FROM DataModel
What I am trying to do is count all the True's for each tableB.Subspace_ID and the DISTINCT trues for tableA.Main_SPACE_ID.
You can use conditional aggregation. In Snowflake, you can use the convenient COUNT_IF() for the first two columns. However, for the second two, you need COUNT(DISTINCT) with conditional logic:
SELECT COUNT_IF( B.Trigger_on ) as Trigger_On,
COUNT_IF( B. DOCX_ON ) as DOCX_ON,
COUNT(DISTINCT CASE WHEN A.GUEST_ON THEN A.Main_SPACE_ID END) as GUEST_ON,
COUNT(DISTINCT CASE WHEN A. USER_CONTROLS THEN A.Main_SPACE_ID END) as USER_CONTROLS
FROM tableA A JOIN
tableB B
ON B.Main_SPACE_ID = A.Main_SPACE_ID;
Mabye:
SELECT
COUNT(CASE WHEN B.TRIGGER_ON THEN 1 END) AS TRIGGER_ON,
COUNT(CASE WHEN B.DOCX_ON THEN 1 END) AS DOCX_ON,
(SELECT COUNT(*) FROM (SELECT DISTINCT A.MAIN_SPACE_ID, A.GUEST_ON FROM DataModel WHERE A.GUEST_ON = TRUE) A) AS GUEST_ON
(SELECT COUNT(*) FROM (SELECT DISTINCT A.USER_CONTROLS, A.GUEST_ON FROM DataModel WHERE A.USER_CONTROLS = TRUE) A) AS USER_CONTROLS
FROM DataModel

Use of CASE with criteria from multiple tables

I have to do a select query to create a view with specific criteria.
I have multiple tables which contains many many columns and lines.
However, I have extracted a value to use as my key (e.g.: id). I have 7000+ of those unique keys that I extracted from all my tables with the function UNION to avoid duplicates.
Now, I want to add a column INDICATOR_1 which will affect the value YES or NO based on criteria.
This is where I struggle.
I need to find the line in those tables that contain the id. After that, I'd like to check, always in that line, if the field XYZ contains the value 'N' (example). If yes, affect the value 'YES' to INDICATOR_1, else it's no.
In a matter of pseudo-code, what I want to do looks like this :
CASE
WHEN id = (id from table_1) AND (if table_1.xyz = 'N')
THEN 'YES'
ELSE 'NO'
END AS INDICATOR_1
I don't know if I'm clear enough, but your help will be greatly appreciated.
If I understand correctly, you want a separate indicator for each table. Something like this:
select i.*,
(case when exists (select 1
from table1 t1
where t1.id = i.id and t1.xyz = 'N'
)
then 'YES' else 'NO'
end) as indicator_1,
(case when exists (select 1
from table2 t2
where t2.id = i.id and t2.xyz = 'N'
)
then 'YES' else 'NO'
end) as indicator_2,
. . .
from (<your id list here>) i
I think you should fix this in the union, where you have all the data you need. You probably have something like:
SELECT Id
FROM table_1
UNION
SELECT Id
FROM table_2
How about selecting the information you want as well (I use distinct here to clarify):
SELECT DISTINCT Id
, CASE WHEN table_1.xyz = 'N' THEN 'N'
ELSE 'Y'
END INDICATOR_1
FROM table_1
This can lead to more records than you had, if id's can have records of both flavours exist. We can fix that with a row number in an outer query. You end up with something like:
SELECT Id
, INDICATOR_1
FROM (
SELECT Id
, INDICATOR_1
, ROW_NUMBER()OVER(PARTITION BY ID ORDER BY CASE WHEN INDICATOR_1 ='N' THEN 0 ELSE 1 END) RN
FROM (
SELECT Id
, CASE WHEN table_1.xyz = 'N' THEN 'N'
ELSE 'Y'
END INDICATOR_1
FROM table_1
UNION
...
) T
) S
WHERE S.RN = 1
You can in fact shorten that by using the inner most case expression in the ROW_NUMBER expression.

Select rows having the same features than others

I've the following table with 3 columns: Id, FeatureName and Value:
Id FeatureName Value
-- ----------- -----
1 AAA 10
1 ABB 12
1 BBB 12
2 AAA 15
2 ABB 12
2 ACD 7
3 AAA 10
3 ABB 12
3 CCC 12
.............
Each Id has different features and each Feature has a value for that Id.
I need to write a query which gives me the Ids that have exactly the same features and values than a given one, but only taking into account those whose name starts with 'A'. For example, in the top table, I can use that query to search for all the Ids that have the same features. For example, features with values where Id=1 would result Id=3 with same features starting with 'A' and same values for these features.
I found a couple of different ways to do this, but all of them go very slow when the table has lots of rows (more than hundred of thousands)
The way I obtain the best performance is using the next query:
select a2.Id
from (select a.FeatureName, a.Value
from Table1 a
where a.Id = 1) a1,
(select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
and a1.value = a2.value
group by a2.Id
having count(*) = 2
intersect
select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*)= 2
where #nFeatures is the number of features starting by 'A' in Id=1. I counted them before calling this query. I make the intersection to avoid results that have the same parameters than Id=1 but also some others whose name starts with 'A'.
I think that the slowest part is the second subquery:
select a.Id, a.FeaureName, a.Value
from MyTable a
where a.FeatureName = 'A%'
but I don't know how to make it faster. Maybe I will have to play with the indexes.
Any idea of how could I write a fast query for this purpose?
So you want all rows where the combination of FeatureName and Value is not unique? You can use EXISTS:
SELECT t.*
FROM dbo.Table1 t
WHERE t.FeatureName LIKE 'A%'
AND EXISTS(SELECT 1 FROM dbo.Table1 t2
WHERE t.Id <> t2.ID
AND t.FeatureName = t2.FeatureName
AND t.Value = t2.Value)
Demo
how could I write a fast query for this purpose?
If it's not fast enough create an index on FeatureName + Value.
I tried to eliminate the join with MyTable again to select the data for the ID's that have matching FeatureName and Value values. Here's the query:
with joined_set as
(
SELECT
mt1.*, mt2.id as mt2_id, mt2.featurename as mt2_FeatureName, mt2.value as mt2_value
from
(
select *
from mytable
where featurename like 'A%'
) mt1
left join
(
select *
from mytable
where featurename like 'A%'
) mt2
on mt2.id <> mt1.id and mt2.FeatureName = mt1.featurename and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in
(select id
from joined_set
group by id
having SUM(
CASE
WHEN mt2_id is null THEN 1
ELSE 0
END
) <> 0
);
Here is the SQL Fiddle demo. It has an extra condition in the inline view mt2, to perform this search only for id = 1.
I'm a little dense this morning, I'm not sure if you wanted just the ID's or...
Here's my take on it...
You could probably move the where FeatureName like 'A%' into the inner query to filter the data on the initial table scan.
with dupFeatures (FeatureName, Value, dupCount)
as
(
select FeatureName, Value, count(*) as dupCount from MyTable
group by FeatureName, Value
having count(*) > 1
)
select MyTable.Id, dupFeatures.FeatureName,dupFeatures.Value
from dupFeatures
join MyTable on (MyTable.FeatureName = dupFeatures.FeatureName and
MyTable.Value = dupFeatures.Value )
where dupFeatures.FeatureName like 'A%'
order by FeatureName, Value, Id
A general solution is
With Rows As (
select id
, FeatureName
, Value
, rows = Count(id) OVER (PARTITION BY id)
FROM test
WHERE FeatureName LIKE 'A%')
SELECT a.id aID, b.id bID
FROM Rows a
INNER JOIN Rows b ON a.id < b.id and a.FeatureName = b.FeatureName
and a.rows = b.rows
GROUP BY a.id, b.id
ORDER BY a.id, b.id
to limit the solution to a group just add a WHERE condition on the main query for a.ID. The CTE is needed to get the correct number of rows for each id
SQLFiddle demo, in the demo I changed little the test data to have a another couple of ID with only one of the FeatureName of 1 and 3

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I hoping to leverage some other commands in SQL to simplify. Is there a simpler way to perform a set-comparison in this context, or to select the group ID that contains the exact input values for the query?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else - end) = 0;
The first three conditions in the having clause check that each elements exists. The last condition checks that there are no other values. This method is quite flexible, for various exclusions and inclusion conditions on the values you are looking for.
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
EDIT:
query compile failed because missing the table alias. updated with right table alias.
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))

SQL (TSQL) - Select values in a column where another column is not null?

I will keep this simple- I would like to know if there is a good way to select all the values in a column when it never has a null in another column. For example.
A B
----- -----
1 7
2 7
NULL 7
4 9
1 9
2 9
From the above set I would just want 9 from B and not 7 because 7 has a NULL in A. Obviously I could wrap this as a subquery and USE the IN clause etc. but this is already part of a pretty unique set and am looking to keep this efficient.
I should note that for my purposes this would only be a one-way comparison... I would only be returning values in B and examining A.
I imagine there is an easy way to do this that I am missing, but being in the thick of things I don't see it right now.
You can do something like this:
select *
from t
where t.b not in (select b from t where a is null);
If you want only distinct b values, then you can do:
select b
from t
group by b
having sum(case when a is null then 1 else 0 end) = 0;
And, finally, you could use window functions:
select a, b
from (select t.*,
sum(case when a is null then 1 else 0 end) over (partition by b) as NullCnt
from t
) t
where NullCnt = 0;
The query below will only output one column in the final result. The records are grouped by column B and test if the record is null or not. When the record is null, the value for the group will increment each time by 1. The HAVING clause filters only the group which has a value of 0.
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
If you want to get all the rows from the records, you can use join.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
) b ON a.b = b.b