I have a query that produces the following:
Team | Member | Cancelled | Rate
-----------------------------------
1    | John   | FALSE     | 150
1    | Bill   | TRUE      | 10
2    | Sarah  | FALSE     | 145
2    | James  | FALSE     | 110
2    | Ashley | TRUE      | 0
What I need is to select the count of members for a team where cancelled is false and the sum of the rate regardless of cancelled status...something like this:
SELECT
Team,
COUNT(Member), --WHERE Cancelled = FALSE
SUM(Rate) --All Rows
FROM
[QUERY]
GROUP BY
Team
So the result would look like this:
Team | CountOfMember | SumOfRate
----------------------------------
1    | 1             | 160
2    | 2             | 255
This is just an example. The real query has multiple complex joins. I know I could do one query for the sum of the rate and then another for the count and then join the results of those two together, but is there a simpler way that would be less taxing and not cause me to copy and paste an already complex query?
You want a conditional sum, something like this:
sum(case when cancelled = 'false' then 1 else 0 end)
The reason for using sum(): it processes every record and adds a value, either 0 or 1, depending on the value of cancelled. When cancelled is false, the sum() increments by 1, which counts the number of such rows.
You can do something similar with count(), like this:
count(case when cancelled = 'false' then cancelled end)
The trick here is that count() counts the number of non-NULL values. The then clause can be anything that is not NULL -- cancelled, the constant 1, or some other field. Without an else, any other value is turned into NULL and not counted.
I have always preferred the sum() version over the count() version, because I think it is more explicit. In other dialects of SQL, you can sometimes shorten it to:
sum(cancelled = 'false')
which, once you get used to it, makes a lot of sense.
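As a quick check of both forms, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. The table name roster and the 0/1 encoding of cancelled are assumptions made for the demo, not part of the original query.

```python
import sqlite3

# Hypothetical table holding the question's sample rows; cancelled is stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roster (team INTEGER, member TEXT, cancelled INTEGER, rate INTEGER)"
)
conn.executemany(
    "INSERT INTO roster VALUES (?, ?, ?, ?)",
    [
        (1, "John", 0, 150),
        (1, "Bill", 1, 10),
        (2, "Sarah", 0, 145),
        (2, "James", 0, 110),
        (2, "Ashley", 1, 0),
    ],
)

rows = conn.execute(
    """
    SELECT team,
           -- sum() form: add 1 for every non-cancelled row, 0 otherwise
           SUM(CASE WHEN cancelled = 0 THEN 1 ELSE 0 END) AS count_via_sum,
           -- count() form: cancelled rows become NULL and are not counted
           COUNT(CASE WHEN cancelled = 0 THEN 1 END)      AS count_via_count,
           -- plain aggregate over all rows, cancelled or not
           SUM(rate)                                      AS sum_of_rate
    FROM roster
    GROUP BY team
    ORDER BY team
    """
).fetchall()

print(rows)  # [(1, 1, 1, 160), (2, 2, 2, 255)]
```

Both conditional forms sit in the same SELECT as the unconditional SUM(rate), so the complex source query only has to be written once.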
I have the following table, which is basically a simplified version of my real table. I need to aggregate a few columns. I will explain what I am trying to do and what I have written so far.
tableName
food.id STRING NULLABLE
food.basket.id STRING NULLABLE
food.foodType STRING NULLABLE
food.price INTEGER NULLABLE
food.printed BOOLEAN NULLABLE
food.variations RECORD REPEATED
food.variations.id INTEGER REPEATED
food.variations.amount INTEGER NULLABLE
Sample data
id | basket.id | foodType | price | printed | variations.id | variations.amount
1  | abbcd     | JUNK     | 100   | TRUE    | NULL          | NULL
2  | cdefg     | PIZZA    | 200   | TRUE    | 1234          | 10
   |           |          |       |         | 2345          | 20
   |           |          |       |         | 5678          | 20
3  | abbcd     | JUNK     | 200   | FALSE   | 1234          | 10
4  | uiwka     | TOAST    | 500   | FALSE   | NULL          | NULL
Variations are like pizza toppings. Each variation has an amount; say, for simplicity, veggie toppings cost 10 cents and meat toppings cost 20 cents.
Now I am trying to aggregate some data for this table.
I am trying to get:
number of items printed (items where printed = TRUE)
number of items unprinted (items where printed = FALSE)
total cost of all items
total price of all variations
total number of unique baskets for a specific foodType
This is the query I have:
select SUM(CASE When item.printed = TRUE Then 1 Else 0 End ) as printed,
SUM(CASE When item.printed = FALSE Then 1 Else 0 End) as nonPrinted,
SUM(item.price) as price,
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
from tableName;
printed | unprinted | price | baskets | variations
2       | 2         | 1000  | 1       | 60
Now I get the result that I expect. I am trying to understand whether we can do this without subqueries, using only joins.
Below is for BigQuery Standard SQL and assumes that your query really works (I say this because your sample data does not exactly match the query you provided).
So, the two subqueries below
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
can be replaced with
COUNT(DISTINCT IF(itemType = "JUNK", item.basket.id, NULL)) AS baskets,
SUM((SELECT SUM(amount) FROM item.variations)) AS variations
Believe it or not, the result will be the same:
Row printed nonPrinted price baskets variations
1 2 2 1000 1 60
So, as you can see, you don't need subqueries, and you don't need joins here either.
Note: in the second line, (SELECT SUM(amount) FROM item.variations) is not really the same type of subquery as in your original query. Here, for each row, you query that row's own array to find the sum of amount in that row, and that per-row sum is then aggregated into the total.
Hope you get it
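The conditional COUNT(DISTINCT ...) part of this answer is not specific to BigQuery. The following is a rough sketch of the same idea in SQLite via Python's sqlite3 module, using CASE in place of IF. The flat table food and its snake_case columns are assumptions for the demo, and the variations array is left out because SQLite has no repeated fields.

```python
import sqlite3

# Hypothetical flat version of the sample data (no variations array in SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE food (id INTEGER, basket_id TEXT, food_type TEXT, "
    "price INTEGER, printed INTEGER)"
)
conn.executemany(
    "INSERT INTO food VALUES (?, ?, ?, ?, ?)",
    [
        (1, "abbcd", "JUNK", 100, 1),
        (2, "cdefg", "PIZZA", 200, 1),
        (3, "abbcd", "JUNK", 200, 0),
        (4, "uiwka", "TOAST", 500, 0),
    ],
)

row = conn.execute(
    """
    SELECT SUM(CASE WHEN printed = 1 THEN 1 ELSE 0 END) AS printed,
           SUM(CASE WHEN printed = 0 THEN 1 ELSE 0 END) AS non_printed,
           SUM(price)                                   AS price,
           -- non-JUNK rows yield NULL, which COUNT(DISTINCT ...) ignores
           COUNT(DISTINCT CASE WHEN food_type = 'JUNK' THEN basket_id END) AS baskets
    FROM food
    """
).fetchone()

print(row)  # (2, 2, 1000, 1)
```

The table is read once and every figure comes out of the same aggregate pass, with no correlated or scalar subquery against the table itself.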
Looking for some Oracle SQL theoretical help on the best way to handle a grouped result set. I understand why it groups the way it does, but I'm trying to figure out if there's a way to
I have a table that lists the activity of some cost centers. It looks like this:
Company  Object  Sub  July  August
A        1            20    50
A        1            10    0
A        1       10   0     20
B        1            0     0
I then need to flag whether or not there was activity in August. So I'm writing a CASE statement where if August = 0 THEN 'FALSE' ELSE 'TRUE'. Then I need to group all records by Company, Object, and Sub. The Cumulative column is a SUM of both July and August. However, my output looks like this:
Company  Object  Sub  SUM  ActivityFlag
A        1            70   TRUE
A        1            10   FALSE
A        1       10   20   TRUE
B        1            0    FALSE
What I need is this:
Company  Object  Sub  August  ActivityFlag
A        1            80      TRUE
A        1       10   20      TRUE
B        1            0       FALSE
Obviously, this is a simplified example of a much larger issue, but I'm trying to think through this problem theoretically so I can apply similar logic to my actual issue.
Is there a good SQL method for adding the August amount for rows 1 and 2, and then selecting TRUE so that this appears on a single row? I hope this makes sense.
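Use aggregation:
select company, object, sub, sum(july + august),
-- 'TRUE' sorts after 'FALSE', so max() yields 'TRUE' if any row in the group had August activity
max(case when august > 0 then 'TRUE' else 'FALSE' end)
from table_name group by company, object, sub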
If you are flagging your detail rows with the CASE statement, you can put the CASE inside an aggregate, similar to:
MAX(CASE WHEN August = 0 THEN 0 ELSE 1 END)
Another way is to aggregate the flag upward from an inner query:
SELECT MAX(IsAugust) AS IsAugust FROM
(
...
CASE WHEN August = 0 THEN 0 ELSE 1 END AS IsAugust
...
) X
GROUP BY...
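To see the flag collapse onto one row per group, here is a small sketch that runs the MAX-over-CASE idea against the question's sample rows. It uses SQLite through Python's sqlite3 module rather than Oracle, the table name activity is made up, and the blank Sub values are assumed to be NULL.

```python
import sqlite3

# Hypothetical table with the question's rows; a blank Sub is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (company TEXT, object INTEGER, sub INTEGER, "
    "july INTEGER, august INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?)",
    [
        ("A", 1, None, 20, 50),
        ("A", 1, None, 10, 0),
        ("A", 1, 10, 0, 20),
        ("B", 1, None, 0, 0),
    ],
)

rows = conn.execute(
    """
    SELECT company, object, sub,
           SUM(july + august) AS total,
           -- 1 if at least one row in the group has August activity, else 0
           MAX(CASE WHEN august <> 0 THEN 1 ELSE 0 END) AS activity_flag
    FROM activity
    GROUP BY company, object, sub
    ORDER BY company, object, sub
    """
).fetchall()

print(rows)  # [('A', 1, None, 80, 1), ('A', 1, 10, 20, 1), ('B', 1, None, 0, 0)]
```

Because the flag is computed inside an aggregate instead of being part of the GROUP BY key, the two A/1 rows with a blank Sub merge into a single row with total 80. A numeric 1/0 flag also avoids any dependence on how the strings 'TRUE' and 'FALSE' sort.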
I am using SQL Server Management Studio 2012 and have to write a query that shows which subject students failed the most on the first attempt (the condition for failing is Point < 5.0), from this table:
StudentID | SubjectID | First/Second_Time | Point.
1 | 02 | 1 | 5.0
2 | 04 | 2 | 7.0
3 | 03 | 2 | 9
... etc
Here is my teacher's query:
SELECT SubjectID
FROM Result -- Result is the name of the table
WHERE [First/Second_Time] = 1 AND Point < 5
GROUP BY SubjectID
HAVING count(point) >= ALL
(
SELECT count(point)
FROM Result
WHERE [First/Second_Time] = 1 AND point < 5
GROUP BY SubjectID
)
I don't understand the reason for the HAVING clause. Isn't count(point) always >= ALL (select count(point)
from Result
where [First/Second_Time] = 1 and point < 5
group by SubjectID)?
Also, it doesn't seem to show the subject that the most students failed on the first attempt. Thanks in advance, and sorry for my bad English.
The subquery is returning a list of the number of times a subject was failed (on the first attempt). It might be easier for you to see what it's doing if you run it like this:
SELECT SubjectID, count(point)
FROM Result
WHERE [First/Second_Time] = 1 AND point < 5
GROUP BY SubjectID
So if math was failed twice and science once (on first attempts), the subquery would return:
2
1
You want to know which subject was failed the most (in this case, which subject was failed 2 or more times, since that is the highest number of failures in your subquery). So you count again (also grouping by subject), and use having to return only subjects with 2 or more failures (greater than or equal to the highest value in your subquery).
SELECT SubjectID
FROM Result
WHERE [First/Second_Time] = 1 AND Point < 5
GROUP BY SubjectID
HAVING count(point)...
See https://msdn.microsoft.com/en-us/library/ms178543.aspx for more examples.
Sounds like you are working on a project for a class, so I'm not even sure I should answer this, but here goes. The question is why the having clause. Have you read the descriptions for having and all ?
All "Compares a scalar value with a single-column set of values".
The scalar value in this case is count(point) or the number of occurrences of a subject id with point less than 5. The single-column set in this case is a list of the number of occurrences of every subject that has less than 5 points.
The net result of the comparison is in the ">=". "All" will only evaluate to true if it is true for every value in the subquery. The subquery returns a set of counts of all subjects meeting the <5 and 1st time requirement. If you have three subjects that meet the <5 and 1st time criteria, and they have a frequency of 1,2,3 times respectively, then the main query will have three "having" results; 1,2,3. Each of the main query results has to be >= each of the subquery results for that main value to evaluate true. So going through step by step, First main value 1 is >= 1, but isn't >= 2 so 1 drops because the "having" is false. Second main value 2 is >=1, is >= 2, but is not >= 3 so it drops. Third value, 3, evaluates true as >= 1, 2, and 3, so you end up returning the subject with the highest frequency.
This is fairly clear in the "remarks" section of the MSDN discussion of the ALL keyword, though not as it relates to your specific application.
Remember, MSDN is our friend!
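The step-by-step comparison described above can be reproduced outside SQL Server. The sketch below uses SQLite through Python's sqlite3 module with invented rows and simplified column names (attempt stands in for [First/Second_Time]). SQLite does not support the ALL quantifier, so the ALL check is done in Python, and the SQL side uses a comparison against MAX of the per-subject counts, which is an equivalent formulation rather than the teacher's exact query.

```python
import sqlite3

# Hypothetical data: subject 02 has 3 first-attempt failures, 03 has 2, 04 has 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result (student_id INTEGER, subject_id TEXT, attempt INTEGER, point REAL)"
)
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    [
        (1, "02", 1, 4.0), (2, "02", 1, 3.5), (3, "02", 1, 2.0),
        (4, "03", 1, 4.5), (5, "03", 1, 1.0),
        (6, "04", 1, 3.0),
        (7, "04", 1, 7.0),  # passed, filtered out by point < 5
        (8, "03", 2, 2.0),  # second attempt, filtered out by attempt = 1
    ],
)

FAILED_FIRST = "FROM result WHERE attempt = 1 AND point < 5 GROUP BY subject_id"

# The subquery on its own: one failure count per subject.
counts = dict(conn.execute(f"SELECT subject_id, COUNT(point) {FAILED_FIRST}").fetchall())

# ">= ALL" by hand: keep a subject only if its count is >= every count in the list.
winners_all = sorted(s for s, c in counts.items() if all(c >= o for o in counts.values()))

# Equivalent SQL that SQLite accepts: compare against the largest count.
winners_sql = [
    r[0]
    for r in conn.execute(
        f"SELECT subject_id {FAILED_FIRST} "
        f"HAVING COUNT(point) >= (SELECT MAX(cnt) FROM (SELECT COUNT(point) AS cnt {FAILED_FIRST}))"
    )
]

print(counts)       # {'02': 3, '03': 2, '04': 1}
print(winners_all)  # ['02']
print(winners_sql)  # ['02']
```

Only the group whose count is at least as large as every other group's count survives the HAVING filter, which is why the outer count is not trivially >= ALL of the inner counts.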
Let's say I've got a Customer Detail Records table which has these columns:
UserAId
UserBId
Duration
Impulses
For example:
UserAId UserBId Duration Impulses
1 2 30 5
1 2 20 3
2 1 10 2
2 3 5 1
OK, now I would like to write a query that aggregates the total Duration, total Impulses, and count of calls between users regardless of direction, so that the result looks like:
UserAId UserBId TotalDuration TotalImpulses TotalCallsCount
1 2 60 10 3
2 3 5 1 1
Is it possible? If so, how can I do this? Thanks for the help.
Of course, if you execute a query like this:
SELECT
UserAId,
UserBId,
SUM(Duration) AS TotalDuration,
SUM(Impulses) AS TotalImpulses,
COUNT(*) AS TotalCallsCount
FROM CustomerDetail
GROUP BY UserAId, UserBId
... you will not get what you want. That is because this query does not aggregate and combine the rows that have UserAId=1 and UserBId=2 with those that have UserAId=2 and UserBId=1.
To do what you want you need a little trick. What you call UserAId and UserBId in the result set are not actually always what you read on the input table. This query will do what you ask:
SELECT
CASE WHEN UserAId<UserBId THEN UserAId ELSE UserBId END AS User_AId,
CASE WHEN UserAId<UserBId THEN UserBId ELSE UserAId END AS User_BId,
SUM(Duration) AS TotalDuration,
SUM(Impulses) AS TotalImpulses,
COUNT(*) AS TotalCallsCount
FROM CustomerDetail
GROUP BY
CASE WHEN UserAId<UserBId THEN UserAId ELSE UserBId END,
CASE WHEN UserAId<UserBId THEN UserBId ELSE UserAId END
... it works even if UserAId=UserBId (you did not state whether those two values can be the same). You will always get the lesser of the 2 Ids as User_AId and the greater of the 2 Ids as User_BId, even if that combination never appears as (UserAId, UserBId) in the table and appears only as (UserBId, UserAId).
I have tested this on SQLFiddle here.
I am no SQL-Server expert. Some engines do allow the GROUP BY clause to reference calculated columns defined in the SELECT expression list without having to redefine them explicitly. This is non standard SQL, but it does make the SQL much more readable. Not sure if SQL-Server supports some sort of syntax for this.
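For what it is worth, SQLite is one engine that accepts SELECT aliases in GROUP BY. Here is a minimal sketch, run through Python's sqlite3 module against the question's sample rows, that normalizes each pair to (lesser Id, greater Id) and groups by the aliases. This shows SQLite's behavior only and says nothing about SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerDetail (UserAId INTEGER, UserBId INTEGER, "
    "Duration INTEGER, Impulses INTEGER)"
)
conn.executemany(
    "INSERT INTO CustomerDetail VALUES (?, ?, ?, ?)",
    [(1, 2, 30, 5), (1, 2, 20, 3), (2, 1, 10, 2), (2, 3, 5, 1)],
)

rows = conn.execute(
    """
    SELECT CASE WHEN UserAId < UserBId THEN UserAId ELSE UserBId END AS User_AId,
           CASE WHEN UserAId < UserBId THEN UserBId ELSE UserAId END AS User_BId,
           SUM(Duration) AS TotalDuration,
           SUM(Impulses) AS TotalImpulses,
           COUNT(*)      AS TotalCallsCount
    FROM CustomerDetail
    -- SQLite resolves these names to the SELECT aliases above
    GROUP BY User_AId, User_BId
    ORDER BY User_AId, User_BId
    """
).fetchall()

print(rows)  # [(1, 2, 60, 10, 3), (2, 3, 5, 1, 1)]
```

The calls 1 to 2 and 2 to 1 land in the same group because both normalize to the pair (1, 2).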
In short I have 2 tables:
USERS:
------------------------
UserID | Name
------------------------
0 a
1 b
2 c
CALLS:
------------------------
ToUser | Result
------------------------
0 ANSWERED
1 ENGAGED
1 ANSWERED
0 ANSWERED
Etc. (in reality I use a numerical reference for the result)
I have over 2 million records, each detailing a call to a specific client. Currently I'm using CASE expressions to count each occurrence of a particular result AFTER I have already done the quick total count:
COUNT(DISTINCT l_call_log.line_id),
COALESCE (SUM(CASE WHEN l_call_log.line_result = 1 THEN 1 ELSE NULL END), 0) AS [Answered],
COALESCE (SUM(CASE WHEN l_call_log.line_result = 2 THEN 1 ELSE NULL END), 0) AS [Engaged],
COALESCE (SUM(CASE WHEN l_call_log.line_result = 4 THEN 1 ELSE NULL END), 0) AS [Unanswered]
Am I doing 3 scans of the data after my initial total count? If so, is there a way to do one sweep and count the calls per result in one go?
Thanks.
This would take one full table scan.
EDIT: There's not enough information to answer. Because of the duplicate removal (DISTINCT) that I missed earlier, we can't tell which strategy would be used, especially without knowing the database engine.
In just about every major query engine, each aggregate function is evaluated per column for each row, and it may use a cached result (for COUNT(*), for example).
Is line_result indexed? If so, you could use a better query (GROUP BY + COUNT(*)) to take advantage of index statistics, though I'm not sure whether that's worthwhile, depending on the other tables in your query.
SQL has the GROUP BY construct. Try:
SELECT l_call_log.line_result, COUNT(DISTINCT l_call_log.line_id)
FROM l_call_log
GROUP BY l_call_log.line_result
I would guess it's a table scan, since you don't have any depending subqueries. Run explain on the query to be sure.
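One way to check this on a small scale is sketched below with SQLite through Python's sqlite3 module. The l_call_log rows are invented, the result codes (1 answered, 2 engaged, 4 unanswered) follow the question, and the plan SQLite prints tells you nothing definite about what another engine would do.

```python
import sqlite3

# Hypothetical miniature of the call log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE l_call_log (line_id INTEGER, line_result INTEGER)")
conn.executemany(
    "INSERT INTO l_call_log VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 1), (5, 4), (6, 2)],
)

pivot_sql = """
    SELECT COUNT(DISTINCT line_id),
           COALESCE(SUM(CASE WHEN line_result = 1 THEN 1 END), 0) AS answered,
           COALESCE(SUM(CASE WHEN line_result = 2 THEN 1 END), 0) AS engaged,
           COALESCE(SUM(CASE WHEN line_result = 4 THEN 1 END), 0) AS unanswered
    FROM l_call_log
"""
pivot = conn.execute(pivot_sql).fetchone()

# The GROUP BY alternative returns one row per result code instead of one column.
grouped = conn.execute(
    "SELECT line_result, COUNT(*) FROM l_call_log "
    "GROUP BY line_result ORDER BY line_result"
).fetchall()

# Ask SQLite how it intends to run the pivot query; the detail text is in column 3.
plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + pivot_sql)]

print(pivot)    # (6, 3, 2, 1)
print(grouped)  # [(1, 3), (2, 2), (4, 1)]
print(plan)
```

The conditional sums and the GROUP BY version agree on the per-result counts. On your own engine, the equivalent of EXPLAIN on the real query is the reliable way to see how many passes it makes.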