Selecting COUNT from different criteria on a table - sql

I have a table named 'jobs'. For a particular user a job can be active, archived, overdue, pending, or closed. Right now every page request generates 5 COUNT queries, and in an attempt at optimization I'm trying to reduce this to a single query. This is what I have so far, but it is barely faster than the 5 individual queries. Note that I've simplified the conditions for each subquery to make it easier to understand; the full query behaves the same way, however.
Is there a way to get these 5 counts in the same query without using the inefficient subqueries?
SELECT
(SELECT count(*)
FROM "jobs"
WHERE
jobs.creator_id = 5 AND
jobs.status_id NOT IN (8,3,11) /* 8,3,11 being 'inactive' related statuses */
) AS active_count,
(SELECT count(*)
FROM "jobs"
WHERE
jobs.creator_id = 5 AND
jobs.due_date < '2011-06-14' AND
jobs.status_id NOT IN(8,11,5,3) /* Grabs the overdue active jobs
('5' means completed successfully) */
) AS overdue_count,
(SELECT count(*)
FROM "jobs"
WHERE
jobs.creator_id = 5 AND
jobs.due_date BETWEEN '2011-06-14' AND '2011-06-15 06:00:00.000000'
) AS due_today_count
This goes on for 2 more subqueries but I think you get the idea.
Is there an easier way to collect this data, since it's basically 5 different COUNTs off of the same subset of data from the jobs table?
The subset of data is 'creator_id = 5'; after that, each count is basically just 1-2 additional conditions. Note that right now we're using Postgres but may be moving to MySQL in the near future. So if you can provide an ANSI-compatible solution I'd be grateful :)

This is the typical solution. Use a CASE expression to break out the different conditions: a record that meets a condition gets a 1, otherwise a 0. Then do a SUM on the values.
SELECT
SUM(active_count) active_count,
SUM(overdue_count) overdue_count,
SUM(due_today_count) due_today_count
FROM
(
SELECT
CASE WHEN jobs.status_id NOT IN (8,3,11) THEN 1 ELSE 0 END active_count,
CASE WHEN jobs.due_date < '2011-06-14' AND jobs.status_id NOT IN(8,11,5,3) THEN 1 ELSE 0 END overdue_count,
CASE WHEN jobs.due_date BETWEEN '2011-06-14' AND '2011-06-15 06:00:00.000000' THEN 1 ELSE 0 END due_today_count
FROM "jobs"
WHERE
jobs.creator_id = 5 ) t
UPDATE
As noted, when the derived table t returns 0 records, the result is a single row with NULL in every column. You have three options:
1) Add a HAVING clause so that no records are returned rather than a row of all NULLs
HAVING SUM(active_count) IS NOT NULL
2) If you want all zeros returned, then add COALESCE to each of your sums
For example
SELECT
COALESCE(SUM(active_count), 0) active_count,
COALESCE(SUM(overdue_count), 0) overdue_count,
COALESCE(SUM(due_today_count), 0) due_today_count
3) Take advantage of the fact that COUNT(NULL) = 0, as rsbarro demonstrated. Note that the non-NULL value can be anything; it doesn't have to be a 1.
For example
SELECT
COUNT(CASE WHEN
jobs.status_id NOT IN (8,3,11) THEN 'Manticores Rock' ELSE NULL
END) as [active_count]
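
To see how the three options behave when no rows match, here is a minimal sketch. It assumes SQLite driven from Python's sqlite3 module purely so the example can be run; the two-column jobs table and the creator ids 5 and 99 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the jobs table.
conn.execute("CREATE TABLE jobs (creator_id INTEGER, status_id INTEGER)")
conn.execute("INSERT INTO jobs VALUES (5, 1)")  # one active job for creator 5

query = """
SELECT
    SUM(CASE WHEN status_id NOT IN (8, 3, 11) THEN 1 ELSE 0 END),
    COALESCE(SUM(CASE WHEN status_id NOT IN (8, 3, 11) THEN 1 ELSE 0 END), 0),
    COUNT(CASE WHEN status_id NOT IN (8, 3, 11) THEN 1 END)
FROM jobs
WHERE creator_id = ?
"""

no_rows = conn.execute(query, (99,)).fetchone()  # creator 99 has no jobs
one_row = conn.execute(query, (5,)).fetchone()

print(no_rows)  # plain SUM is NULL (None); COALESCE and COUNT give 0
print(one_row)
```

With no matching rows the plain SUM comes back as NULL, while the COALESCE and COUNT variants return 0, so only the first form needs special handling in the caller.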

I would use this approach, use COUNT in combination with CASE WHEN.
SELECT
COUNT(CASE WHEN
jobs.status_id NOT IN (8,3,11) THEN 1
END) as [Count1],
COUNT(CASE WHEN
jobs.due_date < '2011-06-14'
AND jobs.status_id NOT IN(8,11,5,3) THEN 1
END) as [COUNT2],
COUNT(CASE WHEN
jobs.due_date BETWEEN '2011-06-14' AND '2011-06-15 06:00:00.000000' THEN 1
END) as [COUNT3]
FROM
"jobs"
WHERE
jobs.creator_id = 5
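
A runnable sketch of the same single-pass idea, assuming SQLite via Python's sqlite3 module. The sample rows are invented, the aliases are written without square brackets, and due_date is stored as ISO-formatted text, which compares correctly as a string in SQLite; on Postgres or MySQL you would use a real date or timestamp column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (creator_id INTEGER, status_id INTEGER, due_date TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        (5, 1, "2011-06-10"),           # active and overdue
        (5, 1, "2011-06-14 09:00:00"),  # active and due today
        (5, 5, "2011-06-01"),           # completed: counts as active, not as overdue
        (5, 8, "2011-06-20"),           # inactive status
        (7, 1, "2011-06-10"),           # different creator, filtered out by WHERE
    ],
)

counts = conn.execute("""
SELECT
    COUNT(CASE WHEN status_id NOT IN (8, 3, 11) THEN 1 END) AS active_count,
    COUNT(CASE WHEN due_date < '2011-06-14'
                AND status_id NOT IN (8, 11, 5, 3) THEN 1 END) AS overdue_count,
    COUNT(CASE WHEN due_date BETWEEN '2011-06-14'
                AND '2011-06-15 06:00:00.000000' THEN 1 END) AS due_today_count
FROM jobs
WHERE creator_id = 5
""").fetchone()

print(counts)
```

The table is scanned once for creator 5, and each COUNT only sees the rows where its CASE produced a non-NULL value.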

Brief
SQL Server 2012 introduced the IIF logical function. Using SQL Server 2012 or greater you can now use this new function instead of a CASE expression. The IIF function also works with Azure SQL Database (but at the moment it does not work with Azure SQL Data Warehouse or Parallel Data Warehouse). It's shorthand for the CASE expression.
I find myself using the IIF function rather than the CASE expression when there is only one case. This alleviates the pain of having to write CASE WHEN condition THEN x ELSE y END and instead writing it as IIF(condition, x, y). If multiple conditions may be met (multiple WHENs), you should instead consider using the regular CASE expression rather than nested IIF functions.
Returns one of two values, depending on whether the Boolean expression
evaluates to true or false in SQL Server.
Syntax
IIF ( boolean_expression, true_value, false_value )
Arguments
boolean_expression A valid Boolean expression.
If this argument is not a Boolean expression, then a syntax error is
raised.
true_value Value to return if boolean_expression evaluates to
true.
false_value Value to return if boolean_expression evaluates
to false.
Remarks
IIF is a shorthand way for writing a CASE expression. It evaluates
the Boolean expression passed as the first argument, and then returns
either of the other two arguments based on the result of the
evaluation. That is, the true_value is returned if the Boolean
expression is true, and the false_value is returned if the Boolean
expression is false or unknown. true_value and false_value can be
of any type. The same rules that apply to the CASE expression for
Boolean expressions, null handling, and return types also apply to
IIF. For more information, see CASE (Transact-SQL).
The fact that IIF is translated into CASE also has an impact on
other aspects of the behavior of this function. Since CASE
expressions can be nested only up to the level of 10, IIF statements
can also be nested only up to the maximum level of 10. Also, IIF is
remoted to other servers as a semantically equivalent CASE
expression, with all the behaviors of a remoted CASE expression.
Code
Implementation of the IIF function in SQL would resemble the following (using the same logic presented by @rsbarro in his answer). The false branch is NULL rather than 0 because COUNT counts every non-NULL value:
SELECT
COUNT(
IIF(jobs.status_id NOT IN (8,3,11), 1, NULL)
) as active_count,
COUNT(
IIF(jobs.due_date < '2011-06-14' AND jobs.status_id NOT IN(8,11,5,3), 1, NULL)
) as overdue_count,
COUNT(
IIF(jobs.due_date BETWEEN '2011-06-14' AND '2011-06-15 06:00:00.000000', 1, NULL)
) as due_today_count
FROM
"jobs"
WHERE
jobs.creator_id = 5
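
One thing to watch with COUNT and a two-valued expression: COUNT counts every non-NULL value, so a 0 in the false branch is still counted. The sketch below illustrates this under some assumptions: it runs on SQLite through Python's sqlite3 module, it uses CASE in place of IIF (the quoted documentation describes IIF as shorthand for CASE), and the four sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (creator_id INTEGER, status_id INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [(5, 1), (5, 2), (5, 8), (5, 3)],  # two active rows, two inactive rows
)

result = conn.execute("""
SELECT
    -- false branch is 0: every row is non-NULL, so every row is counted
    COUNT(CASE WHEN status_id NOT IN (8, 3, 11) THEN 1 ELSE 0 END)    AS count_with_zero,
    -- false branch is NULL: only matching rows are counted
    COUNT(CASE WHEN status_id NOT IN (8, 3, 11) THEN 1 ELSE NULL END) AS count_with_null,
    -- SUM works with 0 in the false branch
    SUM(CASE WHEN status_id NOT IN (8, 3, 11) THEN 1 ELSE 0 END)      AS sum_with_zero
FROM jobs
WHERE creator_id = 5
""").fetchone()

print(result)
```

So with COUNT the false value has to be NULL, and with SUM it has to be 0.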

Related

SQL CASE WHEN- can I do a function within a function? New to SQL

SELECT
SP.SITE,
SYS.COMPANY,
SYS.ADDRESS,
SP.CUSTOMER,
SP.STATUS,
DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) AS MONTH_COUNT
CASE WHEN(MONTH_COUNT = 0 THEN MONTH_COUNT = DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES) AS DAY_COUNT)
ELSE NULL
END
FROM SALEPASSES AS SP
INNER JOIN SYSTEM AS SYS ON SYS.SITE = SP.SITE
WHERE STATUS IN (7,27,29);
I am still trying to understand SQL. Is this the right order to have everything? I'm assuming my datediff() is unable to work because it's inside case when. What I am trying to do, is get the day count if month_count is less than 1 (meaning it's less than one month and we need to count the days between the dates instead). I need month_count to run first to see if doing the day_count would even be necessary. Please give me feedback, I'm new and trying to learn!
CASE is an expression: it returns a value. It looks like you should be doing this:
DAY_COUNT =
CASE WHEN DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) = 0
THEN DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES)
ELSE NULL END
You shouldn't actually need ELSE NULL, as NULL is the default.
Note also that you [usually] cannot refer to a derived column in the same SELECT.
It appears that what you are trying to do is define the MonthCount column's value, and then reuse that value in another column's definition. (The Don't Repeat Yourself principle.)
In most dialects of SQL, you can't do that. Including MS SQL Server.
That's because SQL is a "declarative" language. This means that SQL Server is free to calculate the column values in any order that it likes. In turn, that means you're not allowed to do anything that would rely on one column being calculated before another.
There are two basic ways around that...
First, use CTEs or sub-queries to create two different "scopes", allowing you to define MonthCount before DayCount, and so reuse the value without retyping the definition.
SELECT
*,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
(
SELECT
*,
bar AS MonthCount
FROM
x
)
AS derive_month
The second main way is to derive the value before the SELECT block is evaluated. In this case, that means using APPLY to 'join' a single value on to each input row...
SELECT
x.*,
MonthCount,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
x
CROSS APPLY
(
SELECT
bar AS MonthCount
)
AS derive_month
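
For a concrete version of the sub-query approach, here is a small sketch. It is written for SQLite (through Python's sqlite3 module) only so it can be run as-is, which means DATEDIFF is replaced by strftime/julianday arithmetic; the cut-down salepasses table and its two rows are invented. CROSS APPLY is not available in SQLite, so only the derived-table form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salepasses (customer TEXT, membersince TEXT, expires TEXT)")
conn.executemany(
    "INSERT INTO salepasses VALUES (?, ?, ?)",
    [
        ("short", "2020-01-05", "2020-01-20"),  # same calendar month
        ("long", "2020-01-05", "2020-04-05"),   # three calendar months apart
    ],
)

rows = conn.execute("""
SELECT customer,
       month_count,
       -- the outer scope can refer to month_count by name
       CASE WHEN month_count = 0 THEN day_count END AS day_count
FROM (
    SELECT customer,
           -- month boundaries crossed, similar in spirit to DATEDIFF(MONTH, ...)
           (CAST(strftime('%Y', expires) AS INTEGER)
              - CAST(strftime('%Y', membersince) AS INTEGER)) * 12
           + CAST(strftime('%m', expires) AS INTEGER)
           - CAST(strftime('%m', membersince) AS INTEGER) AS month_count,
           CAST(julianday(expires) - julianday(membersince) AS INTEGER) AS day_count
    FROM salepasses
) AS derive_month
ORDER BY customer
""").fetchall()

print(rows)
```

The inner query defines month_count once; the outer query reuses it by name, which is what the same-level SELECT in the question could not do.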

Compute a boolean result of multiple value/operator from a database table

I have a table returning a series of booleans (0/1), each paired with an operator (and/or), as follows:
ID  Operator  Value
1   and       0
2   and       0
3   or        1
I am trying to find an efficient way to parse these rows into a single result that would be equivalent to
(false and false or true) //returning true
I think this would be possible with CLR, but it appear to be something I could do in regular T-SQL. I've thought about scalar function receiving a table variable. Would this be my best course of action at this stage?
The table will never be very complex and always start with an AND operator which can be ignored.
As pointed out in comment, I was going to do this with a simple AND and if I have any OR it would shortcut the entire thing, but I am doubting my own self as when I tried this in JS it doesn't appear to be the case: https://jsfiddle.net/3ctshxae/
Edit: I incorrectly stated I would process them in as a stack since that is what made sense to me initially. I never thought about orders of operators without brackets as one would simply never write them that way when programming. I indeed would like to respect precedence rules of Boolean logic of ANDs taking precedence.
I know I could do this in a CLR and simply throw the logic in it and get the returned value, but I was hoping to avoid stepping outside of T-SQL.
Normally one would process this using precedence rules: and comes before or, and this would complicate things immensely.
But if you want to process this in a very simple stack manner, where each item is processed in order, you can rearrange this in the following manner.
Since (A AND B) OR C is the same as C OR (A AND B), we can simply look for all ands separately from or
We can ignore any or condition which is false because any other condition will override it
We need all and conditions to be true
SELECT CASE WHEN
(COUNT(CASE WHEN Operator = 'and' AND value = 1 THEN 1 END) > 0
AND COUNT(CASE WHEN Operator = 'and' AND value = 0 THEN 1 END) = 0)
OR COUNT(CASE WHEN Operator = 'or' AND value = 1 THEN 1 END) > 0
THEN 1 ELSE 0 END
FROM YourTable;
You say that actually you do want to use precedence rules. You have no brackets, so that makes things easier.
So we can formulate it like this:
Any or which is false can be ignored
Any or which is true overrides anything else
Group all ands together in islands, and evaluate each island separately
Any island of all trues is true, all others are false
SELECT CASE WHEN
EXISTS (SELECT 1
FROM YourTable t
WHERE Operator = 'or' AND value = 1)
OR EXISTS (SELECT 1
FROM (
SELECT *,
ROW_NUMBER() OVER (ORDER BY ID) -
ROW_NUMBER() OVER (PARTITION BY Operator ORDER BY ID) grouping
FROM YourTable t
) t
WHERE Operator = 'and'
GROUP BY grouping
HAVING COUNT(CASE WHEN value = 0 THEN 1 END) = 0
)
THEN 1 ELSE 0 END;
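
If you want to try the islands query on a few inputs, the sketch below runs it on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The evaluate helper and the sample rows are made up for illustration, and the alias is renamed to grp. It only exercises the query as posted on three inputs; it is not a proof that every operator sequence is handled.

```python
import sqlite3

def evaluate(rows):
    """Load (ID, Operator, Value) rows into a scratch table and run the islands query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (ID INTEGER, Operator TEXT, Value INTEGER)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    return conn.execute("""
    SELECT CASE WHEN
        EXISTS (SELECT 1 FROM YourTable WHERE Operator = 'or' AND Value = 1)
        OR EXISTS (SELECT 1
                   FROM (SELECT Operator, Value,
                                -- consecutive rows with the same operator share one grp value
                                ROW_NUMBER() OVER (ORDER BY ID)
                              - ROW_NUMBER() OVER (PARTITION BY Operator ORDER BY ID) AS grp
                         FROM YourTable) t
                   WHERE Operator = 'and'
                   GROUP BY grp
                   HAVING COUNT(CASE WHEN Value = 0 THEN 1 END) = 0)
    THEN 1 ELSE 0 END
    """).fetchone()[0]

question_example = evaluate([(1, "and", 0), (2, "and", 0), (3, "or", 1)])
true_island = evaluate([(1, "and", 1), (2, "and", 1), (3, "or", 0), (4, "and", 0)])
all_false = evaluate([(1, "and", 1), (2, "and", 0)])

print(question_example, true_island, all_false)
```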

Using Case to sum NULL instances gives missing expression error

I'm attempting to generate a list of vehicles that don't have a price or mileage listed using the below query. When I attempt to run the query, I get an error "ORA-00936: missing expression", but can't seem to find out why. From other posts here, I can see that using IS NULL should be the appropriate term for the WHEN portion, but I am not seeing anything wrong with the query itself. Any help would be appreciated!
Select
SUM(CASE vehicles.mileage WHEN IS NULL THEN 1 ELSE 0 END) NO_MILEAGE,
SUM(CASE vehicles.price WHEN IS NULL THEN 1 ELSE 0 END) NO_PRICE
From
[data]
Simple syntax error:
Select
SUM(CASE WHEN vehicles.mileage IS NULL THEN 1 ELSE 0 END) NO_MILEAGE,
SUM(CASE WHEN vehicles.price IS NULL THEN 1 ELSE 0 END) NO_PRICE
From
[data];
This assumes a table named vehicles in your FROM clause, or a column with an object or nested table type named vehicles in [data]. Otherwise the qualification vehicles. would not make sense.
Use a "searched" CASE for a decision between two alternatives.
Details about "simple" and "searched" CASE in the Oracle online reference.
You can also use COUNT for your particular case. The online reference again:
If you specify expr, then COUNT returns the number of rows where expr is not null.
If you specify the asterisk (*), then this function returns all rows,
including duplicates and nulls. COUNT never returns null.
So you need the difference:
Select
COUNT(*) - COUNT(vehicles.mileage) AS NO_MILEAGE,
COUNT(*) - COUNT(vehicles.price) AS NO_PRICE
From
[data];
You could also use Oracle's NVL2 function:
Select
SUM(NVL2(vehicles.mileage, 0, 1)) NO_MILEAGE,
SUM(NVL2(vehicles.price, 0, 1)) NO_PRICE
From
[data]
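
A quick way to check that the searched CASE and the COUNT difference agree. This is a sketch on SQLite via Python's sqlite3 module rather than Oracle, so NVL2 is left out; the vehicles rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (mileage INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?)",
    [
        (10000, 5000.0),
        (None, 7000.0),   # mileage missing
        (None, None),     # both missing
        (20000, 9000.0),
    ],
)

result = conn.execute("""
SELECT
    SUM(CASE WHEN mileage IS NULL THEN 1 ELSE 0 END) AS no_mileage_case,
    COUNT(*) - COUNT(mileage)                        AS no_mileage_count,
    SUM(CASE WHEN price IS NULL THEN 1 ELSE 0 END)   AS no_price_case,
    COUNT(*) - COUNT(price)                          AS no_price_count
FROM vehicles
""").fetchone()

print(result)
```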

Case Statement Performance (is it possible to reference another case statement from within another one?)

I have a question about sub-queries and case statements
I have two case statements in the same query:
One has a sub-query that is used to determine if a column has a match.
I'd like the other to check if there's a match [among other checks], then tag a value.
However, T-SQL will not let me reference my first field (generated from the case statement) within my second case statement.
This forces me to add the subquery into my second case statement and do away with the first case statement
When I do this, my query goes from 13 seconds to 2.5 minutes
When I remove the subquery altogether from my query, it takes 8 seconds to run
Question 1: Can case-statement-generated fields be referenced in subsequent case statements in the same query?
Question 2: Why does my query take only 5 seconds longer when I have the subquery in an isolated case statement but 2 minutes longer when that subquery is in a case statement that has 4-5 other checks?
1st Case Statement
CASE WHEN (SELECT xxx.xxx from xxx) THEN 'Y'
END AS "Match_Ind",
Second Case Statement
CASE WHEN condition 1 = true THEN 'cond1'
WHEN condition 2 = true THEN 'cond2'
WHEN Match_Ind = 'Y' THEN 'matched'
END AS "Match Detail"
You should consider posting your full query but if you want to reference the result of the first CASE inside of another CASE statement, then you can wrap it in a SELECT similar to this:
select
CASE
WHEN condition 1 = true THEN 'cond1'
WHEN condition 2 = true THEN 'cond2'
WHEN Match_Ind = 'Y' THEN 'matched'
END AS "Match Detail"
from
(
SELECT CASE
WHEN conditionHere -- (SELECT xxx.xxx from xxx)
THEN 'Y'
END AS Match_Ind,
othercols
from yourtable
) x
Have you tried something along the lines of:
select case Bar1 ... end as Bar2, ...
from ( select case Foo1 ... end as Bar1, ... from ... )
It can be JOINed with other tables as needed.
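
Here is a small runnable sketch of that wrapping pattern, assuming SQLite via Python's sqlite3 module. The orders and flagged tables, their columns, and the 'negative' check are hypothetical stand-ins for the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE flagged (order_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, -5), (2, 10), (3, 20)])
conn.executemany("INSERT INTO flagged VALUES (?)", [(1,), (2,)])

rows = conn.execute("""
SELECT id,
       CASE
           WHEN amount < 0      THEN 'negative'
           WHEN match_ind = 'Y' THEN 'matched'   -- alias from the inner query
       END AS match_detail
FROM (
    SELECT o.id,
           o.amount,
           -- the sub-query is written once, in the inner scope
           CASE WHEN EXISTS (SELECT 1 FROM flagged f WHERE f.order_id = o.id)
                THEN 'Y'
           END AS match_ind
    FROM orders o
) x
ORDER BY id
""").fetchall()

print(rows)
```

Order 1 is flagged as well, but the first WHEN that matches wins, so it is tagged 'negative'.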

Sql Server equivalent of a COUNTIF aggregate function

I'm building a query with a GROUP BY clause that needs the ability to count records based only on a certain condition (e.g. count only records where a certain column value is equal to 1).
SELECT UID,
COUNT(UID) AS TotalRecords,
SUM(ContractDollars) AS ContractDollars,
(COUNTIF(MyColumn, 1) / COUNT(UID) * 100) -- Get the average of all records that are 1
FROM dbo.AD_CurrentView
GROUP BY UID
HAVING SUM(ContractDollars) >= 500000
The COUNTIF() line obviously fails since there is no native SQL function called COUNTIF, but the idea here is to determine the percentage of all rows that have the value '1' for MyColumn.
Any thoughts on how to properly implement this in a MS SQL 2005 environment?
You could use a SUM (not COUNT!) combined with a CASE statement, like this:
SELECT SUM(CASE WHEN myColumn=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
Note: in my own test NULLs were not an issue, though this can be environment dependent. You could handle nulls such as:
SELECT SUM(CASE WHEN ISNULL(myColumn,0)=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
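
Putting this back into the shape of the original query, a sketch could look like the following. It assumes SQLite through Python's sqlite3 module and a handful of invented rows. Note the 100.0 multiplier: with integer operands, SUM(...) / COUNT(...) * 100 is integer division and truncates to 0 (SQL Server also divides integers this way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AD_CurrentView (UID INTEGER, ContractDollars INTEGER, MyColumn INTEGER)"
)
conn.executemany(
    "INSERT INTO AD_CurrentView VALUES (?, ?, ?)",
    [
        (1, 300000, 1),
        (1, 250000, 0),
        (1, 100000, 1),
        (1, 50000, 1),
        (2, 100000, 1),  # UID 2 stays below the HAVING threshold
    ],
)

rows = conn.execute("""
SELECT UID,
       COUNT(UID)           AS TotalRecords,
       SUM(ContractDollars) AS ContractDollars,
       100.0 * SUM(CASE WHEN MyColumn = 1 THEN 1 ELSE 0 END) / COUNT(UID) AS PctOnes,
       SUM(CASE WHEN MyColumn = 1 THEN 1 ELSE 0 END) / COUNT(UID) * 100   AS TruncatedPct
FROM AD_CurrentView
GROUP BY UID
HAVING SUM(ContractDollars) >= 500000
""").fetchall()

print(rows)
```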
I usually do what Josh recommended, but brainstormed and tested a slightly hokey alternative that I felt like sharing.
You can take advantage of the fact that COUNT(ColumnName) doesn't count NULLs, and use something like this:
SELECT COUNT(NULLIF(0, myColumn))
FROM AD_CurrentView
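NULLIF returns NULL if the two passed-in values are the same, so rows where myColumn is 0 are not counted. This relies on myColumn only ever holding 0 or 1; any other value, including NULL, would be counted too.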
Advantage: Expresses your intent to COUNT rows instead of having the SUM() notation.
Disadvantage: Not as clear how it is working ("magic" is usually bad).
I would use this syntax. It achieves the same as Josh's and Chris's suggestions, but with the advantage that it is ANSI compliant and not tied to a particular database vendor.
select count(case when myColumn = 1 then 1 else null end)
from AD_CurrentView
How about
SELECT id, COUNT(IF status=42 THEN 1 ENDIF) AS cnt
FROM table
GROUP BY id
Shorter than CASE :)
Works because COUNT() doesn't count null values, and IF/CASE return null when condition is not met and there is no ELSE.
I think it's better than using SUM().
Adding on to Josh's answer,
SELECT COUNT(CASE WHEN myColumn=1 THEN AD_CurrentView.PrimaryKeyColumn ELSE NULL END)
FROM AD_CurrentView
Worked well for me (in SQL Server 2012) without changing the 'count' to a 'sum' and the same logic is portable to other 'conditional aggregates'. E.g., summing based on a condition:
SELECT SUM(CASE WHEN myColumn=1 THEN AD_CurrentView.NumberColumn ELSE 0 END)
FROM AD_CurrentView
It's 2022 and latest SQL Server still doesn't have COUNTIF (along with regex!). Here's what I use:
-- Count if MyColumn = 42
SELECT SUM(IIF(MyColumn = 42, 1, 0))
FROM MyTable
IIF is a shortcut for CASE WHEN MyColumn = 42 THEN 1 ELSE 0 END.
Not product-specific, but the SQL standard provides
SELECT COUNT(*) FILTER (WHERE <condition-1>),
COUNT(*) FILTER (WHERE <condition-2>), ...
FROM ...
for this purpose. Or something that closely resembles it; I don't know off the top of my head.
And of course vendors will prefer to stick with their proprietary solutions.
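
For what it's worth, some engines do implement the FILTER clause: PostgreSQL has had it since 9.4, and SQLite accepts it on aggregate functions from 3.30 onwards. A minimal sketch, assuming a recent SQLite driven from Python's sqlite3 module and a made-up one-column table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AD_CurrentView (myColumn INTEGER)")
conn.executemany(
    "INSERT INTO AD_CurrentView VALUES (?)",
    [(1,), (1,), (0,), (None,)],
)

result = conn.execute("""
SELECT COUNT(*) FILTER (WHERE myColumn = 1) AS ones,
       COUNT(*) FILTER (WHERE myColumn = 0) AS zeros,
       COUNT(*)                             AS total
FROM AD_CurrentView
""").fetchone()

print(result)
```

The row with a NULL in myColumn matches neither filter but is still included in the plain COUNT(*).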
Why not like this?
SELECT count(1)
FROM AD_CurrentView
WHERE myColumn=1
I had to use COUNTIF() in my case as part of my SELECT columns AND to mimic a % of the number of times each item appeared in my results.
So I used this...
SELECT COL1, COL2, ... ETC,
(1.0 / (SELECT a.vcount
FROM (SELECT vm2.visit_id, count(*) AS vcount
FROM dbo.visitmanifests AS vm2
WHERE vm2.inactive = 0 AND vm2.visit_id = vm.Visit_ID
GROUP BY vm2.visit_id) AS a)) AS [No of Visits],
COL xyz
FROM etc etc
Of course you will need to format the result according to your display requirements.
SELECT COALESCE(IF(myColumn = 1,COUNT(DISTINCT NumberColumn),NULL),0) column1,
COALESCE(CASE WHEN myColumn = 1 THEN COUNT(DISTINCT NumberColumn) ELSE NULL END,0) AS column2
FROM AD_CurrentView