Counting Booleans for Distinct and Non Distinct ID numbers - sql

I have a simple table (shown in the screenshot below) that is produced by the following join:
SELECT *
FROM tableA A
JOIN tableB B ON B.Main_SPACE_ID = A.Main_SPACE_ID
Table A contains Guest_ON and User_Controls (last 2 columns) and Table B contains Trigger_ON and DOCX_ON.
Issue:
What I am trying to do is count all the True's for each tableB.Subspace_ID and the DISTINCT trues for tableA.Main_SPACE_ID.
The problem is that subspace_ID from table B lives within the main_space_id from table A and therefore creates a situation where I am double counting.
I only want to count the trues for a distinct Main_space ID
Current Data Model
Desired Output:
From the above screenshot, I am trying to get a count of true values without double counting in the case for tableA_MAIN_SPACE_ID.
As you can see, each row is counted for true values as it relates to the subspace_ID (table B) for totals of 12 and 8 (1 if True, 0 if False) and for tableA, I am only counting distinct values so we only count Trues for a single MainspaceID and avoid recounting them.
If someone can advise on how to get this output from my current data model that would be very helpful!
My attempt below double-counts the trues for the Main_SPACE_ID columns:
SELECT
count(CASE WHEN B.TRIGGER_ON THEN 1 END) as TRIGGER_ON,
count(CASE WHEN B.DOCX_ON THEN 1 END) as DOCX_ON,
count(CASE WHEN A.GUEST_ON THEN 1 END) as SPRINTS,
count(CASE WHEN A.USER_CONTROLS THEN 1 END) as SPRINTS
FROM DataModel

What I am trying to do is count all the True's for each tableB.Subspace_ID and the DISTINCT trues for tableA.Main_SPACE_ID.
You can use conditional aggregation. In Snowflake, you can use the convenient COUNT_IF() for the first two columns. However, for the second two, you need COUNT(DISTINCT) with conditional logic:
SELECT COUNT_IF(B.TRIGGER_ON) AS TRIGGER_ON,
COUNT_IF(B.DOCX_ON) AS DOCX_ON,
COUNT(DISTINCT CASE WHEN A.GUEST_ON THEN A.Main_SPACE_ID END) AS GUEST_ON,
COUNT(DISTINCT CASE WHEN A.USER_CONTROLS THEN A.Main_SPACE_ID END) AS USER_CONTROLS
FROM tableA A JOIN
tableB B
ON B.Main_SPACE_ID = A.Main_SPACE_ID;
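To see the difference between the plain conditional count and the COUNT(DISTINCT ...) form, here is a minimal sketch using Python's sqlite3 module. SQLite has no COUNT_IF(), so COUNT(CASE WHEN ... THEN 1 END) stands in for it; the sample rows and the Subspace_ID values are invented for illustration and are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (Main_SPACE_ID INTEGER, GUEST_ON INTEGER, USER_CONTROLS INTEGER);
    CREATE TABLE tableB (Subspace_ID INTEGER, Main_SPACE_ID INTEGER,
                         TRIGGER_ON INTEGER, DOCX_ON INTEGER);
    -- Hypothetical data: main space 1 has three subspaces, main space 2 has one.
    INSERT INTO tableA VALUES (1, 1, 0), (2, 1, 1);
    INSERT INTO tableB VALUES (10, 1, 1, 1), (11, 1, 1, 0), (12, 1, 0, 1), (20, 2, 1, 0);
""")

row = conn.execute("""
    SELECT COUNT(CASE WHEN B.TRIGGER_ON THEN 1 END) AS TRIGGER_ON,
           COUNT(CASE WHEN B.DOCX_ON THEN 1 END) AS DOCX_ON,
           -- One count per distinct main space, no matter how many subspaces it has.
           COUNT(DISTINCT CASE WHEN A.GUEST_ON THEN A.Main_SPACE_ID END) AS GUEST_ON,
           COUNT(DISTINCT CASE WHEN A.USER_CONTROLS THEN A.Main_SPACE_ID END) AS USER_CONTROLS,
           -- The naive form, which counts GUEST_ON once per joined row.
           COUNT(CASE WHEN A.GUEST_ON THEN 1 END) AS GUEST_ON_DOUBLE_COUNTED
    FROM tableA A
    JOIN tableB B ON B.Main_SPACE_ID = A.Main_SPACE_ID
""").fetchone()

counts = tuple(row)
print(counts)  # (3, 2, 2, 1, 4)
```

The last column shows the double counting from the question: GUEST_ON is true for two main spaces, but the join repeats main space 1 three times, so the naive count is 4.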

Maybe:
SELECT
COUNT(CASE WHEN TRIGGER_ON THEN 1 END) AS TRIGGER_ON,
COUNT(CASE WHEN DOCX_ON THEN 1 END) AS DOCX_ON,
(SELECT COUNT(*) FROM (SELECT DISTINCT MAIN_SPACE_ID FROM DataModel WHERE GUEST_ON = TRUE) G) AS GUEST_ON,
(SELECT COUNT(*) FROM (SELECT DISTINCT MAIN_SPACE_ID FROM DataModel WHERE USER_CONTROLS = TRUE) U) AS USER_CONTROLS
FROM DataModel

Related

check and compare the count from two tables without relation

I have the tables below:
Table1: "Demo"
Columns: SSN, sales, Create_DT,Update_Dt
Table2: "Agent"
Columns: SSN,sales, Agent_Name, Create_Dt, Update_DT
Scenario 1 and desired result set:
I want the output to be 0 if the count of SSN in the Demo table matches the count of SSN in the Agent table.
If the counts do not match, I want the result to be 1.
Scenario 2 and desired result set:
I want the output to be 0 if the sum of sales in the Demo table matches the sum of sales in the Agent table.
If the sums do not match, I want the result to be 1.
Please help with this query.
Thanks
You can write two separate subqueries to take the counts and compare them in the outer query:
SELECT (SELECT count(Demo.SSN) as SSN1 from Demo)!=(SELECT count(Agent.SSN) as SSN2 from Agent) AS Result;
The outer query compares the two counts: the comparison yields 1 (true) when they differ and 0 (false) when they match. Since you asked for 1 when the counts do not match, I used the '!=' operator.
You can use the same approach for scenario 2.
For scenario 1
select (Case when (select count(ssn) from Demo)=(select count(ssn) from Agent) then 0 else 1 end) as desired_result
If you want to count unique ssn then:
select (Case when (select count(distinct ssn) from Demo)=(select count(distinct ssn) from Agent) then 0 else 1 end) as desired_result
For scenario 2:
select (Case when (select sum(sales) from Demo)=(select sum(sales) from Agent) then 0 else 1 end) as desired_result
I would suggest one query with both sets of information:
select (d.num_ssn <> a.num_ssn) as have_different_ssn_count,
(d.sales <> a.sales) as have_different_sales
from (select count(distinct ssn) as num_ssn,
coalesce(sum(sales), 0) as sales
from demo
) d cross join
(select count(distinct ssn) as num_ssn,
coalesce(sum(sales), 0) as sales
from agent
) a;
Note: This returns boolean values -- true/false rather than 1/0. If you really want 0/1, then use case:
select (case when d.num_ssn <> a.num_ssn then 1 else 0 end) as have_different_ssn_count,
(case when d.sales <> a.sales then 1 else 0 end) as have_different_sales
It would not surprise me if you were not only interested in the total counts but also that the agent/sales combinations are the same in both tables. If that is the case, please ask a new question with a clear explanation. Sample data and desired results help.
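As a quick check of the single-query approach, the following sketch runs the CASE variant against two small tables with Python's sqlite3 module. The sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Demo  (SSN TEXT, sales INTEGER);
    CREATE TABLE Agent (SSN TEXT, sales INTEGER);
    -- Hypothetical data: same number of distinct SSNs, different sales totals.
    INSERT INTO Demo  VALUES ('111', 100), ('222', 50);
    INSERT INTO Agent VALUES ('111', 100), ('333', 70);
""")

flags = conn.execute("""
    SELECT CASE WHEN d.num_ssn <> a.num_ssn THEN 1 ELSE 0 END AS have_different_ssn_count,
           CASE WHEN d.sales   <> a.sales   THEN 1 ELSE 0 END AS have_different_sales
    FROM (SELECT COUNT(DISTINCT SSN) AS num_ssn,
                 COALESCE(SUM(sales), 0) AS sales
          FROM Demo) d
    CROSS JOIN
         (SELECT COUNT(DISTINCT SSN) AS num_ssn,
                 COALESCE(SUM(sales), 0) AS sales
          FROM Agent) a
""").fetchone()

print(tuple(flags))  # (0, 1)
```

Each subquery returns exactly one row, so the cross join also returns exactly one row. The COALESCE keeps the sales comparison well defined when one of the tables is empty.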

Proportion request sql

I have a table of accidents and need to output the share of accidents with reason 2 among all accidents. I wrote this code, but I cannot make it work:
select ((select count("ID") from "DTP" where "REASON"=2)/count("REASON"))
from "DTP"
group by "ID"
Something like this (not tested):
select id, count(case reason when 2 then 1 end)/count(*) as proportion
from your_table
-- where ... (if you need to filter, for example by date)
group by id
;
count(*) counts all the rows in a group (that is, all the rows for each separate id). The case expression returns 1 when the reason is 2 and it returns null otherwise; count counts only non-null values, so it will count the rows where the reason is 2.
You can use avg():
select "ID",
avg(case when "REASON" = 2 then 1.0 else 0 end)
from "DTP"
group by "ID"
This produces the ratio for each id -- based on your sample query. If you only want one row for all the data, then:
select avg(case when "REASON" = 2 then 1.0 else 0 end)
from "DTP";
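One practical difference between the two answers shows up on engines that use integer division. The sketch below uses Python's sqlite3 module, where dividing two integer counts truncates; whether the asker's database behaves this way is not stated in the question, so treat this as an assumption. The four sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DTP (ID INTEGER, REASON INTEGER);
    -- Hypothetical data: two of the four accidents have reason 2.
    INSERT INTO DTP VALUES (1, 2), (2, 1), (3, 2), (4, 3);
""")

int_ratio, avg_ratio = conn.execute("""
    SELECT COUNT(CASE WHEN REASON = 2 THEN 1 END) / COUNT(*),   -- integer / integer
           AVG(CASE WHEN REASON = 2 THEN 1.0 ELSE 0 END)        -- average of 1.0 and 0
    FROM DTP
""").fetchone()

print(int_ratio, avg_ratio)  # 0 0.5
```

On such engines, multiplying one side by 1.0 or using the avg() form avoids the truncation.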

Sum distinct records in a table with duplicates in Teradata

I have a table that has some duplicates. I can count the distinct records to get the Total Volume. When I try to sum the rows where the CompTIA code is B92 and add DISTINCT, it still counts the dupes.
Here is the query:
select
a.repair_week_period,
count(distinct a.notif_id) as Total_Volume,
sum(distinct case when a.header_comptia_cd = 'B92' then 1 else 0 end) as B92_Sum
FROM artemis_biz_app.aca_service_event a
where a.Sales_Org_Cd = '8210'
and a.notif_creation_dt >= current_date - 180
group by 1
order by 1
;
Is there a way to sum only the distinct records for B92?
I also tried inner joining the table on itself by selecting the distinct notification id and joining on that notification id, but still getting wrong sum counts.
Thanks!
Your B92_Sum currently returns only 0 or 1, because DISTINCT collapses the CASE results to at most the two values 0 and 1 before they are summed; this is definitely not a sum.
To sum distinct values you need something like
sum(distinct case when a.header_comptia_cd = 'B92' then column_to_sum else 0 end)
If this column_to_sum is actually the notif_id you get a conditional count but not a sum.
Otherwise the distinct might remove too many values, and then you probably need a Derived Table where you remove duplicates before aggregation:
select
repair_week_period,
--no more distinct needed
count(a.notif_id) as Total_Volume,
sum(case when a.header_comptia_cd = 'B92' then column_to_sum else 0 end) as B92_Sum
FROM
(
select repair_week_period,
notif_id,
header_comptia_cd,
column_to_sum
from artemis_biz_app.aca_service_event
where Sales_Org_Cd = '8210'
and notif_creation_dt >= current_date - 180
-- only one row per notif_id
qualify row_number() over (partition by notif_id order by ???) = 1
) a
group by 1
order by 1
;
@dnoeth It seems the solution to my problem was not to SUM the data, but to count distinct it.
This is how I resolved my problem:
count(distinct case when a.header_comptia_cd = 'B92' then a.notif_id else NULL end) as B92_Sum
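The three variants discussed in this thread can be compared side by side. This is a minimal sketch with Python's sqlite3 module rather than Teradata; the table name service_event and its rows are invented, with notif_id 1 deliberately duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service_event (repair_week_period TEXT, notif_id INTEGER,
                                header_comptia_cd TEXT);
    -- Hypothetical data: notif_id 1 appears twice (a duplicate row).
    INSERT INTO service_event VALUES
        ('2024W01', 1, 'B92'),
        ('2024W01', 1, 'B92'),
        ('2024W01', 2, 'B92'),
        ('2024W01', 3, 'A10');
""")

row = conn.execute("""
    SELECT repair_week_period,
           COUNT(DISTINCT notif_id) AS Total_Volume,
           -- DISTINCT applies to the CASE results (0 and 1), not to the rows.
           SUM(DISTINCT CASE WHEN header_comptia_cd = 'B92' THEN 1 ELSE 0 END) AS sum_distinct,
           -- Without DISTINCT the duplicate row is counted.
           SUM(CASE WHEN header_comptia_cd = 'B92' THEN 1 ELSE 0 END) AS plain_sum,
           -- Counts each B92 notification once.
           COUNT(DISTINCT CASE WHEN header_comptia_cd = 'B92' THEN notif_id END) AS B92_Count
    FROM service_event
    GROUP BY repair_week_period
""").fetchone()

print(tuple(row))  # ('2024W01', 3, 1, 3, 2)
```

Only the last expression gives the number of distinct B92 notifications, because the CASE returns the notif_id itself and NULL otherwise, and COUNT(DISTINCT ...) ignores NULLs.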

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I am hoping to leverage some other commands in SQL to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains exactly the input values for the query?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the having clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible for various exclusion and inclusion conditions on the values you are looking for.
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
EDIT:
The query failed to compile because a table alias was missing. Updated with the right table alias.
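When the list of values comes from application code, the same counting idea can be parameterized. The sketch below uses Python's sqlite3 module; the helper name find_groups is hypothetical, and the query is a simplified variant of the HAVING approach (it compares two distinct counts against the list size) rather than the exact query above.

```python
import sqlite3

def find_groups(conn, wanted):
    """Return the GroupIds whose set of values equals `wanted` exactly."""
    wanted = sorted(set(wanted))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT GroupId
        FROM GroupValues
        GROUP BY GroupId
        -- The group has as many distinct values as the list ...
        HAVING COUNT(DISTINCT Value) = ?
        -- ... and all of them are in the list.
           AND COUNT(DISTINCT CASE WHEN Value IN ({placeholders}) THEN Value END) = ?
        ORDER BY GroupId
    """
    params = [len(wanted), *wanted, len(wanted)]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (GroupId INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)

print(find_groups(conn, ["a", "b", "c"]))       # [1]
print(find_groups(conn, ["b", "a"]))            # [2]
print(find_groups(conn, ["a", "b", "c", "e"]))  # []
```

Only the placeholders are interpolated into the SQL text; the values themselves are passed as bound parameters.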
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))

SQL using CASE in SELECT with GROUP BY. Need CASE-value but get row-value

So basically there is one question and one problem:
1. Question: when I have around 100 columns in a table (and no key or unique index is set) and I want to join or subselect that table with itself, do I really have to write out every column name?
2. Problem: the example below shows the question above and my actual SQL statement problem.
Example:
SELECT
A.FIELD1,
(SELECT CASE WHEN B.FIELD2 = 1 THEN B.FIELD3 ELSE null END FROM TABLE B WHERE A.* = B.*) AS CASEFIELD1,
(SELECT CASE WHEN B.FIELD2 = 2 THEN B.FIELD4 ELSE null END FROM TABLE B WHERE A.* = B.*) AS CASEFIELD2
FROM TABLE A
GROUP BY A.FIELD1
The story is: if I don't put the CASE into its own SELECT statement, I have to put the actual column name into the GROUP BY, and the GROUP BY then groups by the actual value from the row instead of the NULL produced by the CASE. Because of that I would have to either join or subselect on all columns, since there is no key and no unique index, or find some other solution.
DBServer is DB2.
So now to describing it just with words and no SQL:
I have "order items" which can be divided into "ZD" and "EK" (1 = ZD, 2 = EK) and can be grouped by "distributor". Even though an order item belongs to only one of the two departments (ZD or EK), the ZD and EK columns are always both filled. I need the grouping to consider the department: a new group should be created only when the value for the designated department (ZD or EK) changes.
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
ZD,
EK,
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
This worked with the CASE expressions in the SELECT and ZD, EK in the GROUP BY. The only problem was that even when EK was not the designated DEPARTEMENT, a change in EK still opened a new group, because the GROUP BY used the real EK value and not the NULL from the CASE, as explained above.
And here, ladies and gentlemen, is the solution to the problem:
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END),
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END),
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
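The effect of grouping by the CASE expressions instead of the raw columns can be reproduced on a tiny data set. This is a sketch with Python's sqlite3 module, not DB2; the table name order_items, the output aliases ZD_GRP and EK_GRP, and the three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (DEPARTEMENT INTEGER, ZD TEXT, EK TEXT,
                              DISTRIBUTOR TEXT, SOMETHING INTEGER);
    -- Hypothetical data: the first two rows differ only in EK,
    -- which should not matter because DEPARTEMENT = 1 designates ZD.
    INSERT INTO order_items VALUES
        (1, 'z1', 'e1', 'd1', 10),
        (1, 'z1', 'e2', 'd1', 20),
        (2, 'z1', 'e2', 'd1', 5);
""")

# Grouping by the raw columns: the EK change opens an unwanted extra group.
groups_by_column = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT 1 FROM order_items
        GROUP BY ZD, EK, DISTRIBUTOR, DEPARTEMENT
    )
""").fetchone()[0]

# Grouping by the CASE expressions: the non-designated column becomes NULL.
by_case = conn.execute("""
    SELECT CASE WHEN DEPARTEMENT = 1 THEN ZD END AS ZD_GRP,
           CASE WHEN DEPARTEMENT = 2 THEN EK END AS EK_GRP,
           DISTRIBUTOR,
           SUM(SOMETHING) AS SOMETHING
    FROM order_items
    GROUP BY CASE WHEN DEPARTEMENT = 1 THEN ZD END,
             CASE WHEN DEPARTEMENT = 2 THEN EK END,
             DISTRIBUTOR,
             DEPARTEMENT
    ORDER BY DEPARTEMENT
""").fetchall()

print(groups_by_column)  # 3
print(by_case)           # [('z1', None, 'd1', 30), (None, 'e2', 'd1', 5)]
```

The output aliases differ from the column names on purpose, so that the GROUP BY cannot be resolved against the raw ZD and EK columns by accident.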
@t-clausen.dk: Thank you!
@others: ...
Actually there is a wildcard equality test.
I am not sure why you would group by field1, that would seem impossible in your example. I tried to fit it into your question:
SELECT FIELD1,
CASE WHEN FIELD2 = 1 THEN FIELD3 END AS CASEFIELD1,
CASE WHEN FIELD2 = 2 THEN FIELD4 END AS CASEFIELD2
FROM
(
SELECT * FROM A
INTERSECT
SELECT * FROM B
) C
UNION -- results in a distinct
SELECT
FIELD1,
null,
null
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
) C
This will fail for datatypes that are not comparable.
No, there's no wildcard equality test. You'd have to list every field you want tested individually. If you don't want to test each individual field, you could use a hack such as concatenating all the fields, e.g.
WHERE (a.foo + a.bar + a.baz) = (b.foo + b.bar + b.baz)
but either way, you're listing all of the fields.
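One caveat with the concatenation hack, which is not raised in the answer itself: different rows can concatenate to the same string, so it may report matches that a column-by-column comparison would not. The sketch below shows this with Python's sqlite3 module, using SQLite's || operator in place of +; the table names t_a and t_b and their rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_a (foo TEXT, bar TEXT);
    CREATE TABLE t_b (foo TEXT, bar TEXT);
    -- Hypothetical data: ('ab', 'c') and ('a', 'bc') both concatenate to 'abc'.
    INSERT INTO t_a VALUES ('ab', 'c'), ('x', 'y');
    INSERT INTO t_b VALUES ('a', 'bc'), ('x', 'y');
""")

# Concatenation-based comparison: reports a false match for ('ab', 'c').
concat_matches = conn.execute("""
    SELECT x.foo, x.bar
    FROM t_a x
    JOIN t_b y ON (x.foo || x.bar) = (y.foo || y.bar)
    ORDER BY x.foo
""").fetchall()

# Whole-row comparison with INTERSECT: only the genuinely identical row.
intersect_matches = conn.execute("""
    SELECT * FROM t_a
    INTERSECT
    SELECT * FROM t_b
""").fetchall()

print(concat_matches)     # [('ab', 'c'), ('x', 'y')]
print(intersect_matches)  # [('x', 'y')]
```

A separator that cannot occur in the data reduces the risk, but a set operator such as INTERSECT compares the columns individually and avoids the problem.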
I might tend to solve it something like this
WITH q as
(SELECT
DEPARTEMENT
, (CASE WHEN DEPARTEMENT = 1 THEN ZD
WHEN DEPARTEMENT = 2 THEN EK
ELSE null
END) AS GRP
, DISTRIBUTOR
, SOMETHING
FROM mytable
)
SELECT
DEPARTEMENT
, GRP
, DISTRIBUTOR
, sum(SOMETHING) AS SumTHING
FROM q
GROUP BY
DEPARTEMENT
, GRP
, DISTRIBUTOR
If you need to find all rows in TableA that match in TableB, how about INTERSECT or INTERSECT DISTINCT?
select * from A
INTERSECT DISTINCT
select * from B
However, if you only want rows from A where the entire row matches the values in a row from B, then why does your sample code take some values from A and others from B? If the row matches on all columns, then that would seem pointless. (Perhaps your question could be explained a bit more fully?)