Find missing values within groups of data using SQL - sql

I recently needed to generate a report based on the following:
TableA has two columns, UserID and DocumentType.
I have been given a list of 'mandatory' document types (Type1, Type2, Type3), and I need to return every UserID that doesn't have all three of these types, along with the types each one is missing.
For example if TableA contains the following rows
12 Type1
12 Type2
12 Type4
13 Type1
13 Type2
13 Type3
14 Type1
15 Type6
15 Type7
15 Type8
Then ideally the output would be something like:
12 Type3
14 Type2, Type3
15 Type1, Type2, Type3
Ideally, the query should handle up to tens of millions of records.
We recently implemented a solution to a similar (slightly more complicated) problem on SQL Server 2012. It takes three and a half minutes to produce the full report across multiple tables holding around 4 million records in total. We wonder whether there is a faster approach.
Please feel free to share any ideas that solve this problem.
Thank you! :)

WITH
base ([DocumentType]) as (
SELECT 'Type1' UNION ALL
SELECT 'Type2' UNION ALL
SELECT 'Type3'
),
users as (
SELECT DISTINCT [userID]
FROM Table1 t
),
pairs as (
SELECT *
FROM users, base
)
SELECT p.userID, p.[DocumentType], t.[DocumentType]
FROM pairs p
LEFT JOIN Table1 t
ON p.[DocumentType] = t.[DocumentType]
AND p.[userID] = t.[userID]
WHERE t.[DocumentType] IS NULL

I think you need something like this:
select x2.*
from (select *
from (select distinct UserID from [table])x
cross join
(select 'type1' DocumentType union
select 'type2' union
select 'type3' ) y
) x2
left join [table] y2
on y2.UserID = x2.UserID
and y2.DocumentType = x2.DocumentType
where y2.DocumentType is null
order by x2.UserID
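The cross join / anti-join idea above can be tried against the sample rows from the question. The following is only a sketch in T-SQL: the table variable @TableA and the alias MissingType are made up for the demo, and NOT EXISTS is used in place of LEFT JOIN ... IS NULL (the two usually produce similar plans, but that is worth checking on real data).

```sql
-- Sample rows from the question, held in a table variable for the demo.
DECLARE @TableA TABLE (UserID INT, DocumentType VARCHAR(10));
INSERT INTO @TableA (UserID, DocumentType) VALUES
 (12,'Type1'),(12,'Type2'),(12,'Type4'),
 (13,'Type1'),(13,'Type2'),(13,'Type3'),
 (14,'Type1'),
 (15,'Type6'),(15,'Type7'),(15,'Type8');

-- Every (user, mandatory type) pair for which no matching row exists.
SELECT u.UserID, r.DocumentType AS MissingType
FROM (SELECT DISTINCT UserID FROM @TableA) AS u
CROSS JOIN (VALUES ('Type1'),('Type2'),('Type3')) AS r(DocumentType)
WHERE NOT EXISTS (
        SELECT 1
        FROM @TableA AS a
        WHERE a.UserID = u.UserID
          AND a.DocumentType = r.DocumentType)
ORDER BY u.UserID, r.DocumentType;
-- Rows returned: (12,Type3), (14,Type2), (14,Type3),
--                (15,Type1), (15,Type2), (15,Type3)
```

On a table with millions of rows, an index on (UserID, DocumentType) would probably help the existence check, though that is an assumption to verify against the actual execution plan.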

here is the FOR XML path concatenation method:
CREATE TABLE TableA (ID INT, TypeCol CHAR(5));
INSERT INTO TableA (ID,TypeCol) VALUES (12,'Type1')
,(12,'Type2')
,(12,'Type4')
,(13,'Type1')
,(13,'Type2')
,(13,'Type3')
,(14,'Type1')
,(15,'Type6')
,(15,'Type7')
,(15,'Type8')
;WITH cteRequiredTypes AS (
SELECT 'type1' as TypeCol
UNION ALL
SELECT 'type2'
UNION ALL
SELECT 'type3'
)
, cteTableAIds AS (
SELECT DISTINCT Id
FROM
TableA
)
, cteMissingTypes AS (
SELECT
i.ID
,r.TypeCol
FROM
cteRequiredTypes r
CROSS JOIN cteTableAIds i
LEFT JOIN TableA a
ON r.TypeCol = a.TypeCol
AND i.ID = a.ID
WHERE
a.ID IS NULL
)
SELECT
DISTINCT a.ID
,STUFF(
(SELECT ',' + TypeCol
FROM
cteMissingTypes t
WHERE t.ID = a.ID
FOR XML PATH(''))
,1,1,'') AS MissingTypeList
FROM
cteMissingTypes a
I believe that for a large dataset, a conditional aggregation query will probably perform better.
Conditional Aggregation
SELECT
ID
,CASE WHEN SUM(IIF(TypeCol = 'type1',1,0)) = 0 THEN 'type1' ELSE '' END as Type1
,CASE WHEN SUM(IIF(TypeCol = 'type2',1,0)) = 0 THEN 'type2' ELSE '' END as Type2
,CASE WHEN SUM(IIF(TypeCol = 'type3',1,0)) = 0 THEN 'type3' ELSE '' END as Type3
,STUFF(
-- each missing type contributes ',typeN'; STUFF then strips the leading comma,
-- so no stray commas are left when only the first or second type is missing
CASE WHEN SUM(IIF(TypeCol = 'type1',1,0)) = 0 THEN ',type1' ELSE '' END
+ CASE WHEN SUM(IIF(TypeCol = 'type2',1,0)) = 0 THEN ',type2' ELSE '' END
+ CASE WHEN SUM(IIF(TypeCol = 'type3',1,0)) = 0 THEN ',type3' ELSE '' END
,1,1,'') as MissingTypeList
FROM
TableA
GROUP BY
ID
HAVING
SUM(IIF(TypeCol = 'type1',1,0)) = 0
OR SUM(IIF(TypeCol = 'type2',1,0)) = 0
OR SUM(IIF(TypeCol = 'type3',1,0)) = 0
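The question targets SQL Server 2012, where STUFF with FOR XML PATH is the usual way to build the comma-separated list. On SQL Server 2017 or later, STRING_AGG could likely replace that step. A sketch under that assumption, using a demo table variable @TableA that is not part of the original answer:

```sql
DECLARE @TableA TABLE (ID INT, TypeCol VARCHAR(10));
INSERT INTO @TableA (ID, TypeCol) VALUES
 (12,'Type1'),(12,'Type2'),(12,'Type4'),
 (13,'Type1'),(13,'Type2'),(13,'Type3'),
 (14,'Type1'),
 (15,'Type6'),(15,'Type7'),(15,'Type8');

WITH cteMissingTypes AS (
    -- one row per (ID, required type) that has no matching row in the table
    SELECT i.ID, r.TypeCol
    FROM (VALUES ('Type1'),('Type2'),('Type3')) AS r(TypeCol)
    CROSS JOIN (SELECT DISTINCT ID FROM @TableA) AS i
    WHERE NOT EXISTS (SELECT 1
                      FROM @TableA AS a
                      WHERE a.ID = i.ID AND a.TypeCol = r.TypeCol)
)
SELECT ID,
       STRING_AGG(TypeCol, ', ') WITHIN GROUP (ORDER BY TypeCol) AS MissingTypeList
FROM cteMissingTypes
GROUP BY ID
ORDER BY ID;
-- Rows returned: 12 -> 'Type3'; 14 -> 'Type2, Type3'; 15 -> 'Type1, Type2, Type3'
```

Unlike FOR XML PATH(''), STRING_AGG does not XML-escape characters such as & or <, which matters if the type names can contain them.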

The easy way
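SELECT DISTINCT UserID
FROM your_table
EXCEPT
(
SELECT UserID
FROM your_table
WHERE DocumentType = ('Type1')
INTERSECT
SELECT UserID
FROM your_table
WHERE DocumentType = ('Type2')
INTERSECT
SELECT UserID
FROM your_table
WHERE DocumentType = ('Type3')
)
This states the rule directly (all users, minus the users who have all three types), so the query optimizer can look at the available indexes and choose a plan. INTERSECT is needed inside the parentheses: with UNION, the inner query would return users who have any one of the types, and the result would list only users who have none of them. Note that this returns just the UserIDs, not the missing types.
Unless there are parts of the problem you are not telling us.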
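If only the list of users lacking at least one mandatory type is needed, a single grouped pass over the table is another option. This is a sketch, not a benchmarked recommendation; @TableA is a demo table variable holding the sample rows from the question.

```sql
DECLARE @TableA TABLE (UserID INT, DocumentType VARCHAR(10));
INSERT INTO @TableA (UserID, DocumentType) VALUES
 (12,'Type1'),(12,'Type2'),(12,'Type4'),
 (13,'Type1'),(13,'Type2'),(13,'Type3'),
 (14,'Type1'),
 (15,'Type6'),(15,'Type7'),(15,'Type8');

SELECT UserID
FROM @TableA
GROUP BY UserID
-- count how many of the three mandatory types each user has;
-- non-mandatory types become NULL and are ignored by COUNT
HAVING COUNT(DISTINCT CASE WHEN DocumentType IN ('Type1','Type2','Type3')
                           THEN DocumentType END) < 3
ORDER BY UserID;
-- Rows returned: 12, 14, 15
```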

Related

Want to Return Empty Rows With a Case When Statement

Let's say I'm using two CASE WHEN expressions to group my data, as in the example below:
select case
when group1 = 'A' then 'Large'
when group1 = 'B' then 'Medium'
else 'Small'
end as 'Order Size'
,case
when method = 'Delivery' then 'Delivery'
else 'Pick-up'
end as 'Distribution Method'
,count(distinct(OrderIDs))
from OrderTable
GROUP BY
case
when group1 = 'A' then 'Large'
when group1 = 'B' then 'Medium'
else 'Small'
end
,case
when method = 'Delivery' then 'Delivery'
else 'Pick-up'
end
Let's also say that there were no "Large" orders that were "Pick-up". Currently, this query will not return a row for the Large, Pick-up category.
Is there a way to have a row returned with 0’s if there is nothing that meets the multiple case when criteria?
Use a cross join to generate the rows and left join to bring in the data:
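select os.OrderSize, d.DistributionMethod,
count(distinct ot.OrderIDs) as OrderCount -- 0 for combinations with no matching orders
from (select 'Large' as OrderSize union all
select 'Medium' as OrderSize union all
select 'Small' as OrderSize
) os cross join
(select 'Delivery' as DistributionMethod union all
select 'Pick-Up' as DistributionMethod
) d left join
OrderTable ot
on ( (ot.group1 = 'A' and os.OrderSize = 'Large') or
(ot.group1 = 'B' and os.OrderSize = 'Medium') or
(ot.group1 not in ('A', 'B') and os.OrderSize = 'Small')
) and
-- mirror the second CASE: anything other than 'Delivery' counts as pick-up
( (ot.method = 'Delivery' and d.DistributionMethod = 'Delivery') or
(ot.method <> 'Delivery' and d.DistributionMethod = 'Pick-Up')
)
-- rows with a NULL group1 or method would need explicit IS NULL checks to land in the "else" buckets
group by os.OrderSize, d.DistributionMethod;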
Not all databases support the creation of a table of constants using this syntax, but there is generally some syntax that does this.
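As one example of such alternative syntax, SQL Server 2008 and later (and PostgreSQL) accept a table value constructor, which tends to be more compact than chained UNION ALL selects. A minimal sketch that only generates the grid of combinations; the aliases os and d follow the answer above:

```sql
-- Table value constructor in place of SELECT ... UNION ALL SELECT ...
SELECT os.OrderSize, d.DistributionMethod
FROM (VALUES ('Large'), ('Medium'), ('Small')) AS os(OrderSize)
CROSS JOIN (VALUES ('Delivery'), ('Pick-Up')) AS d(DistributionMethod)
ORDER BY os.OrderSize, d.DistributionMethod;
-- Returns all 6 size/method combinations, whether or not any orders exist for them.
```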
You could select a recordset that contains the required values and then left join your grouped recordset from there. Following is an example for SQL Server where you would join your results to [Groupings].[OrderSize] and [Groupings].[DistributionMethod]:
SELECT *
FROM (
SELECT *
FROM (
SELECT 'Large' AS [OrderSize]
UNION
SELECT 'Medium' AS [OrderSize]
UNION
SELECT 'Small' AS [OrderSize]
) AS [OrderSizes]
CROSS JOIN (
SELECT 'Delivery' AS [DistributionMethod]
UNION
SELECT 'Pick-up' AS [DistributionMethod]
) AS [DistributionMethods]
) AS [Groupings]
LEFT JOIN ...

Sql server aggregate function and GROUP BY Clause error

I have the query below, which counts the StagingCabinCrew and StagingCockpitCrew values from the staging schema and compares them to their data schema equivalents, DataCabinCrew and DataCockpitCrew.
Here is the query:
WITH CTE AS
(SELECT cd.*,
c.*,
DataFlight,
l.ScheduledDepartureDate,
l.ScheduledDepartureAirport
FROM
(SELECT *,
ROW_NUMBER() OVER(PARTITION BY LegKey
ORDER BY UpdateID DESC) AS RowNumber
FROM Data.Crew) c
INNER JOIN Data.CrewDetail cd ON c.UpdateID = cd.CrewUpdateID
AND cd.IsPassive = 1
AND RowNumber = 1
INNER JOIN
(SELECT *,
Carrier + CAST(FlightNumber AS VARCHAR) + Suffix AS DataFlight
FROM Data.Leg) l ON c.LegKey = l.LegKey )
SELECT StagingFlight,
sac.DepartureDate,
sac.DepartureAirport,
cte.DataFlight,
cte.ScheduledDepartureDate,
cte.ScheduledDepartureAirport,
SUM(CASE
WHEN sac.CREWTYPE = 'F' THEN 1
ELSE 0
END) AS StagingCabinCrew,
SUM(CASE
WHEN sac.CREWTYPE = 'C' THEN 1
ELSE 0
END) AS StagingCockpitCrew,
SUM(CASE
WHEN cte.CrewType = 'F' THEN 1
ELSE 0
END) AS DataCabinCrew,
SUM(CASE
WHEN cte.CrewType = 'C' THEN 1
ELSE 0
END) AS DataCockpitCrew
FROM
(SELECT *,
Airline + CAST(FlightNumber AS VARCHAR) + Suffix AS StagingFlight,
ROW_NUMBER() OVER(PARTITION BY Airline + CAST(FlightNumber AS VARCHAR) + Suffix
ORDER BY UpdateId DESC) AS StageRowNumber
FROM Staging.SabreAssignedCrew) sac
LEFT JOIN CTE cte ON StagingFlight = DataFlight
AND sac.DepartureDate = cte.ScheduledDepartureDate
AND sac.DepartureAirport = cte.ScheduledDepartureAirport
AND sac.CREWTYPE = cte.CrewType
WHERE MONTH(sac.DepartureDate) + YEAR(sac.DepartureDate) = MONTH(GETDATE()) + YEAR(GETDATE())
AND StageRowNumber = 1 --AND cte.ScheduledDepartureDate IS NOT NULL
--AND cte.ScheduledDepartureAirport IS NOT NULL
GROUP BY StagingFlight,
sac.DepartureDate,
sac.DepartureAirport,
cte.DataFlight,
cte.ScheduledDepartureDate,
cte.ScheduledDepartureAirport
The results are correct. All I need to do is add a condition so that only rows where StagingCabinCrew <> DataCabinCrew AND StagingCockpitCrew <> DataCockpitCrew are returned.
If a row appears, then we have found an error in the data. I need help adding this condition, because the columns it refers to are computed with SUM and CASE, so I cannot simply reference them in the WHERE clause.
I'm guessing you are trying to use a column alias in the same query.
You can't do this, because the alias isn't recognized in the WHERE clause:
SELECT field1 + field2 as myField
FROM yourTable
WHERE myField > 3
You need to wrap it in a subquery or CTE:
with cte2 as (
SELECT field1 + field2 as myField
FROM yourTable
)
SELECT *
FROM cte2
WHERE myField > 3
or repeat the expression:
SELECT field1 + field2 as myField
FROM yourTable
WHERE field1 + field2 > 3
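Because the columns in the question are aggregates (SUM over CASE), there is a third option that the examples above do not show: a HAVING clause, which is evaluated after GROUP BY and therefore may contain aggregate expressions. The sketch below uses a made-up table variable @crew and made-up flight codes purely to illustrate the shape; it is not the asker's schema.

```sql
-- Hypothetical sample data: one row per crew member per source.
DECLARE @crew TABLE (Flight VARCHAR(10), Source VARCHAR(10), CrewType CHAR(1));
INSERT INTO @crew (Flight, Source, CrewType) VALUES
 ('AB100', 'Staging', 'F'), ('AB100', 'Staging', 'C'),
 ('AB100', 'Data',    'F'), ('AB100', 'Data',    'C'),
 ('AB200', 'Staging', 'F'), ('AB200', 'Staging', 'F'),
 ('AB200', 'Data',    'F');

SELECT Flight,
       SUM(CASE WHEN Source = 'Staging' AND CrewType = 'F' THEN 1 ELSE 0 END) AS StagingCabinCrew,
       SUM(CASE WHEN Source = 'Data'    AND CrewType = 'F' THEN 1 ELSE 0 END) AS DataCabinCrew
FROM @crew
GROUP BY Flight
-- Aggregates are allowed in HAVING; the SELECT aliases still are not,
-- so the expressions are repeated here.
HAVING SUM(CASE WHEN Source = 'Staging' AND CrewType = 'F' THEN 1 ELSE 0 END)
    <> SUM(CASE WHEN Source = 'Data'    AND CrewType = 'F' THEN 1 ELSE 0 END);
-- Only AB200 is returned (2 staging cabin crew vs 1 in data).
```

Wrapping the grouped query in a CTE and filtering on the aliases in an outer WHERE should give the same result and avoids repeating the expressions.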

SQL Server case when or enum

I have a table something like:
stuff type price
first_stuff 1 43
second_stuff 2 46
third_stuff 3 24
fourth_stuff 2 12
fifth_stuff NULL 90
Each type has a description that is not stored in the DB:
1 = Bad
2 = Good
3 = Excellent
NULL = Not_Assigned
All I want is to return a table that counts each type separately, something like:
Description Count
Bad 1
Good 2
Excellent 1
Not_Assigned 1
DECLARE @t TABLE ([type] INT)
INSERT INTO @t ([type])
VALUES (1),(2),(3),(2),(NULL)
SELECT
[Description] =
CASE t.[type]
WHEN 1 THEN 'Bad'
WHEN 2 THEN 'Good'
WHEN 3 THEN 'Excellent'
ELSE 'Not_Assigned'
END, t.[Count]
FROM (
SELECT [type], [Count] = COUNT(*)
FROM @t
GROUP BY [type]
) t
ORDER BY ISNULL(t.[type], 999)
output -
Description Count
------------ -----------
Bad 1
Good 2
Excellent 1
Not_Assigned 1
;WITH CTE_TYPE
AS (SELECT DESCRIPTION,
VALUE
FROM (VALUES ('BAD',
1),
('GOOD',
2),
('EXCELLENT',
3))V( DESCRIPTION, VALUE )),
CTE_COUNT
AS (SELECT C.DESCRIPTION,
Count(T.TYPE) TYPE_COUNT
FROM YOUR_TABLE T
JOIN CTE_TYPE C
ON T.TYPE = C.VALUE
GROUP BY TYPE,
DESCRIPTION
UNION ALL
SELECT 'NOT_ASSIGNED' AS DESCRIPTION,
Count(*) TYPE_COUNT
FROM YOUR_TABLE
WHERE TYPE IS NULL)
SELECT *
FROM CTE_COUNT
Hope, this helps.
SELECT ISNULL(D.descr, 'Not_Assigned'),
T2.qty
FROM
(SELECT T.type,
COUNT(*) as qty
FROM [Table] AS T -- TABLE is a reserved keyword, so the placeholder name must be bracketed
GROUP BY type) AS T2
LEFT JOIN (SELECT 1 as type, 'Bad' AS descr
UNION ALL
SELECT 2, 'Good'
UNION ALL
SELECT 3, 'Excellent') AS D ON D.type = T2.type
If you are using SQL Server 2012 or later, you can use CHOOSE:
SELECT
[Description] = coalesce(choose (t.[type],'Bad','Good' ,'Excellent'), 'Not_Assigned'),
t.[Count]
FROM (
SELECT [type], [Count] = COUNT(*)
FROM yourtable
GROUP BY [type]
) t

Looping in select query

I want to do something like this:
select id,
count(*) as total,
FOR temp IN SELECT DISTINCT somerow FROM mytable ORDER BY somerow LOOP
sum(case when somerow = temp then 1 else 0 end) temp,
END LOOP;
from mytable
group by id
order by id
I wrote a working query:
select id,
count(*) as total,
sum(case when somerow = 'a' then 1 else 0 end) somerow_a,
sum(case when somerow = 'b' then 1 else 0 end) somerow_b,
sum(case when somerow = 'c' then 1 else 0 end) somerow_c,
sum(case when somerow = 'd' then 1 else 0 end) somerow_d,
sum(case when somerow = 'e' then 1 else 0 end) somerow_e,
sum(case when somerow = 'f' then 1 else 0 end) somerow_f,
sum(case when somerow = 'g' then 1 else 0 end) somerow_g,
sum(case when somerow = 'h' then 1 else 0 end) somerow_h,
sum(case when somerow = 'i' then 1 else 0 end) somerow_i,
sum(case when somerow = 'j' then 1 else 0 end) somerow_j,
sum(case when somerow = 'k' then 1 else 0 end) somerow_k
from mytable
group by id
order by id
This works, but it is static: if a new value is added to 'somerow', I will have to change the SQL manually to cover all the values in the somerow column. That is why I'm wondering whether it is possible to do something with a FOR loop.
So what I want to get is this:
id somerow_a somerow_b ....
0 3 2 ....
1 2 10 ....
2 19 3 ....
. ... ...
. ... ...
. ... ...
So what I'd like to do is count all the rows that contain a specific letter and group them by id (this id isn't a primary key; it repeats, and there are about 80 different values possible).
http://sqlfiddle.com/#!15/18feb/2
Are arrays good for you? (SQL Fiddle)
select
id,
sum(totalcol) as total,
array_agg(somecol) as somecol,
array_agg(totalcol) as totalcol
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
;
id | total | somecol | totalcol
----+-------+---------+----------
1 | 6 | {b,a,c} | {2,1,3}
2 | 5 | {d,f} | {2,3}
In 9.2 it is possible to have a set of JSON objects (Fiddle)
select row_to_json(s)
from (
select
id,
sum(totalcol) as total,
array_agg(somecol) as somecol,
array_agg(totalcol) as totalcol
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
) s
;
row_to_json
---------------------------------------------------------------
{"id":1,"total":6,"somecol":["b","a","c"],"totalcol":[2,1,3]}
{"id":2,"total":5,"somecol":["d","f"],"totalcol":[2,3]}
In 9.3, with the addition of lateral, a single object (Fiddle)
select to_json(format('{%s}', (string_agg(j, ','))))
from (
select format('%s:%s', to_json(id), to_json(c)) as j
from
(
select
id,
sum(totalcol) as total_sum,
array_agg(somecol) as somecol_array,
array_agg(totalcol) as totalcol_array
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
) s
cross join lateral
(
select
total_sum as total,
somecol_array as somecol,
totalcol_array as totalcol
) c
) s
;
to_json
---------------------------------------------------------------------------------------------------------------------------------------
"{1:{\"total\":6,\"somecol\":[\"b\",\"a\",\"c\"],\"totalcol\":[2,1,3]},2:{\"total\":5,\"somecol\":[\"d\",\"f\"],\"totalcol\":[2,3]}}"
In 9.2 it is also possible to have a single object, in a more convoluted way, using subqueries instead of lateral.
SQL is very rigid about the return type. It demands to know what to return beforehand.
For a completely dynamic number of resulting values, you can only use arrays like @Clodoaldo posted. That is effectively a static return type: you do not get individual columns for each value.
If you know the number of columns at call time ("semi-dynamic"), you can create a function taking (and returning) polymorphic parameters. Closely related answer with lots of details:
Dynamic alternative to pivot with CASE and GROUP BY
(You'll also find a related answer with arrays from @Clodoaldo there.)
Your remaining option is to use two round trips to the server: the first to determine the actual query with the actual return type, the second to execute the query built by the first call.
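One way the two-round-trip approach might look in PostgreSQL is sketched below. The first statement only generates the text of a static query from the values currently present in somerow; the client then sends that text back as a second statement. The table and column names come from the question, while generated_query is a made-up alias, and the sketch assumes somerow is a text-like column.

```sql
-- Round trip 1: build the query text from the distinct values of somerow.
-- %L quotes a value as a literal, %I quotes a name as an identifier.
SELECT format(
         'SELECT id, count(*) AS total, %s FROM mytable GROUP BY id ORDER BY id',
         string_agg(
           format('count(somerow = %L OR NULL) AS %I',
                  somerow, 'somerow_' || somerow),
           ', ' ORDER BY somerow)
       ) AS generated_query
FROM  (SELECT DISTINCT somerow
       FROM   mytable
       WHERE  somerow IS NOT NULL) s;

-- Round trip 2: the client executes the returned generated_query text
-- as a new statement; its column list now matches the current data.
```

If mytable is empty, string_agg returns NULL and the generated text has an empty column list, so a real client would need to handle that case before executing it.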
Else, you have to go with a static query. While doing that, I see two nicer options for what you have right now:
1. Simpler expression
select id
, count(*) AS total
, count(somecol = 'a' OR NULL) AS somerow_a
, count(somecol = 'b' OR NULL) AS somerow_b
, ...
from mytable
group by id
order by id;
How does it work?
Compute percents from SUM() in the same SELECT sql query
SQL Fiddle.
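In short, the expression relies on two PostgreSQL behaviours: `false OR NULL` evaluates to NULL, and count(expression) counts only non-NULL values. A small sketch with inline sample values (the column name x and the aliases are made up for the demo):

```sql
-- How the boolean expression evaluates per row
SELECT x,
       (x = 'a')         AS is_a,          -- true or false
       (x = 'a' OR NULL) AS is_a_or_null   -- true or NULL (false OR NULL is NULL)
FROM  (VALUES ('a'), ('b'), ('a')) v(x);

-- count(expr) skips NULLs, so only the rows where x = 'a' are counted
SELECT count(x = 'a' OR NULL) AS a_count   -- 2 for the sample rows
FROM  (VALUES ('a'), ('b'), ('a')) v(x);

-- PostgreSQL 9.4 and later offer the FILTER clause for the same purpose
SELECT count(*) FILTER (WHERE x = 'a') AS a_count
FROM  (VALUES ('a'), ('b'), ('a')) v(x);
```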
2. crosstab()
crosstab() is more complex at first, but written in C, optimized for the task and shorter for long lists. You need the additional module tablefunc installed. Read the basics here if you are not familiar:
PostgreSQL Crosstab Query
SELECT * FROM crosstab(
$$
SELECT id
, sum(count(*)) OVER (PARTITION BY id)::int AS total -- window sum over the per-group counts = total rows per id
, somecol
, count(*)::int AS ct -- casting to int, don't think you need bigint?
FROM mytable
GROUP BY 1,3
ORDER BY 1,3
$$
,
$$SELECT unnest('{a,b,c,d}'::text[])$$
) AS f (id int, total int, a int, b int, c int, d int);

counting records on the same table with different values possibly none sql server 2008

I have an inventory table with a condition column (new, used, other). I am querying a small subset of this data, and the result set may contain only one of the conditions or all of them. I tried using a CASE expression, but if one of the conditions isn't found, nothing is returned for that condition, and I need it to return 0.
This is what I've tried so far:
select(
case
when new_used = 'N' then 'new'
when new_used = 'U' then 'used'
when new_used = 'O' then 'other'
end
)as conditions,
count(*) as count
from myDB
where something = something
group by(
case
when New_Used = 'N' then 'new'
when New_Used = 'U' then 'used'
when New_Used = 'O' then 'other'
end
)
This returns the data like:
conditions | count
------------------
new 10
used 45
I am trying to get the data to return like the following:
conditions | count
------------------
new | 10
used | 45
other | 0
Thanks in advance
;WITH constants(letter,word) AS
(
SELECT l,w FROM (VALUES('N','new'),('U','used'),('O','other')) AS x(l,w)
)
SELECT
conditions = c.word,
[count] = COUNT(x.new_used)
FROM constants AS c
LEFT OUTER JOIN dbo.myDB AS x
ON c.letter = x.new_used
AND something = something
GROUP BY c.word;
try this -
DECLARE @t TABLE (new_used CHAR(1))
INSERT INTO @t (new_used)
SELECT t = 'N'
UNION ALL
SELECT 'N'
UNION ALL
SELECT 'U'
SELECT conditions, ISNULL(r.cnt, 0) AS [count]
FROM (
VALUES('U', 'used'), ('N', 'new'), ('O', 'other')
) t(c, conditions)
LEFT JOIN (
SELECT new_used, COUNT(1) AS cnt
FROM @t
--WHERE something = something
GROUP BY new_used
) r ON r.new_used = t.c
in output -
new 2
used 1
other 0
You can do it as a cross-tab:
select
sum(case when new_used = 'N' then 1 else 0 end) as N,
sum(case when new_used = 'U' then 1 else 0 end) as U,
sum(case when new_used = 'O' then 1 else 0 end) as Other
from myDB
where something = something