Two aggregates in one cross apply possible? - sql

I'm very much new to SQL and I'm trying to use CROSS APPLY, something I know very little about.
I'm trying to pull two SUMs of items sorted by an ID from two different tables. One SUM of all items dispensed by a cartridge, one SUM of all items refilled into a cartridge. The dispenses and refills are in separate tables. In Sample 1 you can see a piece of code that works for one of these two SUMs, currently its for the Dispensed SUM, but it also works if I change everything for the refilled SUM. Point being I can only do one SUM in this CROSS APPLY, regardless which one of the two.
So it goes wrong when I try to pull both SUMs in this one CROSS APPLY, probably cause I don't really know what I'm doing. I try to do this with the code seen in Sample 2 (which is pretty much the same code).
Some extra context:
There are two ID's here that are important:
The CartridgeRefill.FK_CartridgeRegistration_Id (or ID) is the ID for a cartridge itself. The FK_CartridgeRefill_Id is the ID for a refill, a cartridge can go through multiple refills and dispenses are registered by what refill they were dispensed from. That's why you can see the same ID multiple times in the output.
Sample 1:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
Sample 2:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed, Sums.Refilled
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
,SUM(CartridgeRefillItem.Amount) AS Refilled
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
When I run sample 1 I get this output:
ID Dispensed
10 95
8 143
6 143
11 70
11 312
11 354
8 19
8 24
8 3
8 33
This output is correct, it displays the number of dispensed items next to the ID it belongs to.
This is the error I get when I run sample 2:
Msg 4101, Level 15, State 1, Line 15
Aggregates on the right side of an APPLY cannot reference columns from the left side.
But what I want to see is:
ID Dispensed Refilled (example)
10 95 143
8 143 12
6 143 etc...
11 70
11 312
11 354
8 19
8 24
8 3
8 33
I think it has something to do with CROSS APPLY running line by line? But again, I still don't exactly know what I'm doing yet. Any help would be really appreciated and please ask whatever you need to know :)

Error is quite self explanatory, you cannot run an aggregate using a reference that's outside of CROSS APPLY. You'll need to rewrite your query by adding a additional subquery to calculate SUM or use a GROUP BY clause. I've quickly scraped this:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed, SUM(CartridgeRefillMedication.Amount) AS Refilled
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillMedication.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
GROUP BY CartridgeRefill.FK_CartridgeRegistration_Id;
Hopefully this works.

You may not want aggregation at all. The number of rows is not being reduced, so this may be what you want:
SELECT cr.FK_CartridgeRegistration_Id AS ID,
d.Dispensed, cr.Amount AS Refilled
FROM CartridgeRefillItem cr CROSS APPLY
(SELECT SUM(cd.Amount) AS Dispensed
FROM CartridgeDispenseAttempt c
WHERE cr.FK_CartridgeRefill_Id = cd.FK_CartridgeRefill_Id
) d;
I would expect that you want separate totals for each id. If so, then your sample results are not sensible because ids are repeated. But this would seem to do something useful:
select id, sum(refill_amount) as refill_amount,
sum(dispensed_amount) as dispensed_amount
from ((select cr.FK_CartridgeRegistration_Id as id,
cr.Amount as refill_amount,
0 as dispensed_amount
from CartridgeRefillItem cr
) union all
(select cd.FK_CartridgeRegistration_Id as id,
0, cd.Amount
from CartridgeDispenseAttempt cd
)
) c
group by id

Related

fetch aggregate value along with data

I have a table with the following fields
ID,Content,QuestionMarks,TypeofQuestion
350, What is the symbol used to represent Bromine?,2,MCQ
758,What is the symbol used to represent Bromine? ,2,MCQ
2425,What is the symbol used to represent Bromine?,3,Essay
2080,A quadrilateral has four sides, four angles ,1,MCQ
2614,A circular cone has a curved surface area of ,2,MCQ
2520,Two triangles have sides 5 cm, 11 cm, 2 cm . ,2,MCQ
2196,Life supporting process mediated by water? ,2,Essay
I would like to get random questions where total marks is an input number.
For example if I say 25, the result should be all the random questions whose Sum(QuestionMarks) is 25(+/-1)
Is this really possible using a SQL
select content,id,questionmarks,sum(questionmarks) from quiz_question
group by content,id,questionmarks;
Expected Input 25
Expected Result (Sum of Question Marks =25)
Update:
How do I ensure I get atleast 2 Essay Type Questions (this is just an example) I would extend this for other conditions. Thank you for all the help
S-Man's cumulative sum is the right approach. For your logic, though, I think you want to get up to the first row that is 24 or more. That logic is:
where total - questionmark < 24
If you have enough questions, then you could get exactly 25 using:
with q25 as (
select *
from (select t.*,
sum(questionmark) over (order by random()) as running_questionmark
from t
) t
where running_questionmark < 25
)
select q.ID, q.Content, q.QuestionMarks, q.TypeofQuestion
from q25 q
union all
(select t.ID, t.Content, t.QuestionMarks, t.TypeofQuestion
from t cross join
(select sum(questionmark) as questionmark_25 from q25) x
where not exists (select 1 from q25 where q25.id = t.id)
order by abs(questionmark - (25 - questionmark_25))
limit 1
)
This selects questions up to 25 but not at 25. It then tries to find one more to make the total 25.
Supposing, questionmark is of type integer. Then you want to get some records in random order whose questionmark sum is not more than 25:
You can use the consecutive SUM() window function. The order is random. The consecutive SUM() adds every current value to the previous sum. So, you could filter where SUM() <= <your value>:
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*,
SUM(questionmark) OVER (ORDER BY random()) as total
FROM
t
)s
WHERE total <= 25
Note:
This returns a records list with no more than 25, but as close as possible to it with an random order.
To find an exact match of your value is some sort of combinatorical problem which shouldn't be solved in a database. Especially when there's a random factor. What if your current SUM is 22 and the next randomly chosen value is 4. Would you retry maybe until infinity to randomly find a value = 3? Or are you trying to remove an already counted record with value = 1?

SQL Server: Two COUNTs in one query multiplying with one another in output

I have a query is used to display information in a queue and part of that information is showing the amount of child entities (packages and labs) that belong to the parent entity (change). However instead of showing the individual counts of each type of child, they multiply with one another.
In the below case, there are supposed to be 3 labs and 18 packages, however the the multiply with one another and the output is 54 of each.
Below is the offending portion of the query.
SELECT cef.ChangeId, COUNT(pac.PackageId) AS 'Packages', COUNT(lab.LabRequestId) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN dbo.Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN dbo.Package pac
ON (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
WHERE cef.ChangeId = 255
GROUP BY cef.ChangeId
I feel like this is obvious but it's not occurring to me how to fix it so the two counts are independent of one another like to me they should be. There doesn't seem to be a scenario like this in any of my research either. Can anyone guide me in the right direction?
Because you do multiply source rows by each left join. So sometimes you have more likely cross join here.
SELECT cef.ChangeId, p.Packages, l.Labs
FROM dbo.ChangeEvaluationForm cef
OUTER APPLY(
SELECT COUNT(*) as Labs
FROM dbo.Lab
WHERE cef.ChangeId = Lab.ChangeId
) l
OUTER APPLY(
SELECT COUNT(*) AS Packages
FROM dbo.Package pac
WHERE (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
) p
WHERE cef.ChangeId = 255
GROUP BY cef.ChangeId
perhaps GROUP BY is not needed now.
From you question its difficult to derive what result do you expect from your query. So I presume you want following result:
+----------+----------+------+
| ChangeId | Packages | Labs |
+----------+----------+------+
| 255 | 18 | 3 |
+----------+----------+------+
Try below query if you are looking for above mentioned result.
SELECT cef.ChangeId, ISNULL(pac.PacCount, 0) AS 'Packages', ISNULL(Lab.LabCount, 0) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN (SELECT Lab.ChangeId, COUNT(*) LabCount FROM dbo.Lab GROUP BY) Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN (SELECT pac.ChangeId, COUNT(*) PacCount FROM dbo.Package pac WHERE pac.PackageStatus != 6 AND pac.PackageStatus !=7 GROUP BY pac.ChangeId) pac
ON cef.ChangeId = pac.ChangeId
WHERE cef.ChangeId = 255
Query Explanation:
In your query you didn't use group by, so it ended up giving you 54 as count which is Cartesian product.
In this query I tried to group by 'ChangeId' and find aggregate before joining tables. So 3 labs and 18 packages will be counted before join.
Your will also notice that I have moved PackageStatus filter before group by in pac table. So unwanted record won't mess with our count.
You start with a particular ChangeId from the dbo.ChangeEvaluationForm table (ChangeId = 255 from your example), then join to the dbo.Lab table. This join makes your result go from 1 row to 3, considering there are 3 Labs with ChangeId = 255. Your problem is on the next join, you are joining all 3 resulting rows from the previous join with the dbo.Package table, which has 18 rows for ChangeId = 255. The resulting count for columns pac.PackageId and lab.LabRequestId will then be 3 x 18 = 54.
To get what you want, there are 2 easy solutions:
Use COUNT DISTINCT instead of COUNT. This will just count the different values of pac.PackageId and lab.LabRequestId and not the repeated ones.
Split the joins into 2 subqueries and join their result (by ChangeId)

SQL to return records that do not have a complete set according to a second table

I have two tables. I want to find the erroneous records in the first table based on the fact that they aren't complete set as determined by the second table. eg:
custID service transID
1 20 1
1 20 2
1 50 2
2 49 1
2 138 1
3 80 1
3 140 1
comboID combinations
1 Y00020Y00050
2 Y00049Y00138
3 Y00020Y00049
4 Y00020Y00080Y00140
So in this example I would want a query to return the first row of the first table because it does not have a matching 49 or 50 or (80 and 140), and the last two rows as well (because there is no 20). The second transaction is fine, and the second customer is fine.
I couldn't figure this out with a query, so I wound up writing a program that loads the services per customer and transid into an array, iterates over them, and ensures that there is at least one matching combination record where all the services in the combination are present in the initially loaded array. Even that came off as hamfisted, but it was less of a nightmare than the awkward outer joining of multiple joins I was trying to accomplish with SQL.
Taking a step back, I think I need to restructure the combinations table into something more accommodating, but I still can't think of what the approach would be.
I do not have DB2 so I have tested on Oracle. However listagg function should be there as well. The table service is the first table and comb the second one. I assume the service numbers to be sorted as in the combinations column.
select service.*
from service
join
(
select S.custid, S.transid
from
(
select custid, transid, listagg(concat('Y000',service)) within group(order by service) as agg
from service
group by custid, transid
) S
where not exists
(
select *
from comb
where S.agg = comb.combinations
)
) NOT_F on NOT_F.custid = service.custid and NOT_F.transid = service.transid
I dare to say that your database design does not conform to the first normal form since the combinations column is not atomic. Think about it.

SQL: Select Top 2 Query is Excluding Records with more than 2 Records

I just joined after having a problem writing a query in MS Access. I am trying to write a query that will pull out the first two valid samples in from a list of replicated sample results and then would like to average the sample values. I have written a query that does pull samples with only two valid samples and averages these values. However, my query doesn't pull samples where there are more than two valid sample results. Here's my query:
SELECT temp_platevalid_table.samp_name AS samp_name, avg (temp_platevalid_table.mean_conc) AS fin_avg, count(temp_platevalid_table.samp_valid) AS sample_count
FROM Temp_PlateValid_table
WHERE (Temp_PlateValid_table.id In (SELECT TOP 2 S.id
FROM Temp_PlateValid_table as S
WHERE S.samp_name = S.samp_name and s.samp_valid=1 and S.samp_valid=1
ORDER BY ID))
GROUP BY Temp_PlateValid_table.samp_name
HAVING ((Count(Temp_PlateValid_table.samp_valid))=2)
ORDER BY Temp_PlateValid_table.samp_name;
Here's an example of what I'm trying to do:
ID Samp_Name Samp_Valid Mean_Conc
1 54d2d2 1 15
2 54d2d2 1 20
3 54d2d2 1 25
The average mean_conc should be 17.5, however, with my current query, I wouldn't receive a value at all for 54d2d2. Is there a way to tweak my query so that I get a value for samples that have more than two valid values? Please note that I'm using MS Access, so I don't think I can use fancier SQL code (partition by, etc.).
Thanks in advance for your help!
Is this what you want?
select pv.samp_name, avg(pv.value_conc)
from Temp_PlateValid_table pv
where pv.samp_valid = 1 and
pv.id in (select top 2 id
from Temp_PlateValid_table as pv2
where pv2.samp_name = pv.samp_name and pv2.samp_valid = 1
)
group by pv.samp_name;
You might need avg(pv.value_conc * 1.0).

MySQL: Getting highest score for a user

I have the following table (highscores),
id gameid userid name score date
1 38 2345 A 100 2009-07-23 16:45:01
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
5 38 2345 A 50 2009-07-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
7 32 2345 A 100 2009-07-20 16:45:01
Now in the above structure, a user can play a game multiple times but I want to display the "Games Played" by a specific user. So in games played section I can't display multiple games. So the concept should be like if a user played a game 3 times then the game with highest score should be displayed out of all.
I want result data like:
id gameid userid name score date
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
I tried following query but its not giving me the correct result:
SELECT id,
gameid,
userid,
date,
MAX(score) AS score
FROM highscores
WHERE userid='2345'
GROUP BY gameid
Please tell me what will be the query for this?
Thanks
Requirement is a bit vague/confusing but would something like this satisfy the need ?
(purposely added various aggregates that may be of interest).
SELECT gameid,
MIN(date) AS FirstTime,
MAX(date) AS LastTime,
MAX(score) AS TOPscore.
COUNT(*) AS NbOfTimesPlayed
FROM highscores
WHERE userid='2345'
GROUP BY gameid
-- ORDER BY COUNT(*) DESC -- for ex. to have games played most at top
Edit: New question about adding the id column to the the SELECT list
The short answer is: "No, id cannot be added, not within this particular construct". (Read further to see why) However, if the intent is to have the id of the game with the highest score, the query can be modified, using a sub-query, to achieve that.
As explained by Alex M on this page, all the column names referenced in the SELECT list and which are not used in the context of an aggregate function (MAX, MIN, AVG, COUNT and the like), MUST be included in the ORDER BY clause. The reason for this rule of the SQL language is simply that in gathering the info for the results list, SQL may encounter multiple values for such an column (listed in SELECT but not GROUP BY) and would then not know how to deal with it; rather than doing anything -possibly useful but possibly silly as well- with these extra rows/values, SQL standard dictates a error message, so that the user can modify the query and express explicitly his/her goals.
In our specific case, we could add the id in the SELECT and also add it in the GROUP BY list, but in doing so the grouping upon which the aggregation takes place would be different: the results list would include as many rows as we have id + gameid combinations the aggregate values for each of this row would be based on only the records from the table where the id and the gameid have the corresponding values (assuming id is the PK in table, we'd get a single row per aggregation, making the MAX() and such quite meaningless).
The way to include the id (and possibly other columns) corresponding to the game with the top score, is with a sub-query. The idea is that the subquery selects the game with TOP score (within a given group by), and the main query's SELECTs any column of this rows, even when the fieds wasn't (couldn't be) in the sub-query's group-by construct. BTW, do give credit on this page to rexem for showing this type of query first.
SELECT H.id,
H.gameid,
H.userid,
H.name,
H.score,
H.date
FROM highscores H
JOIN (
SELECT M.gameid, hs.userid, MAX(hs.score) MaxScoreByGameUser
FROM highscores H2
GROUP BY H2.gameid, H2.userid
) AS M
ON M.gameid = H.gameid
AND M.userid = H.userid
AND M.MaxScoreByGameUser = H.score
WHERE H.userid='2345'
A few important remarks about the query above
Duplicates: if there the user played several games that reached the same hi-score, the query will produce that many rows.
GROUP BY of the sub-query may need to change for different uses of the query. If rather than searching for the game's hi-score on a per user basis, we wanted the absolute hi-score, we would need to exclude userid from the GROUP BY (that's why I named the alias of the MAX with a long, explicit name)
The userid = '2345' may be added in the [now absent] WHERE clause of the sub-query, for efficiency purposes (unless MySQL's optimizer is very smart, currently all hi-scores for all game+user combinations get calculated, whereby we only need these for user '2345'); down side duplication; solution; variables.
There are several ways to deal with the issues mentioned above, but these seem to be out of scope for a [now rather lenghty] explanation about the GROUP BY constructs.
Every field you have in your SELECT (when a GROUP BY clause is present) must be either one of the fields in the GROUP BY clause, or else a group function such as MAX, SUM, AVG, etc. In your code, userid is technically violating that but in a pretty harmless fashion (you could make your code technically SQL standard compliant with a GROUP BY gameid, userid); fields id and date are in more serious violation - there will be many ids and dates within one GROUP BY set, and you're not telling how to make a single value out of that set (MySQL picks a more-or-less random ones, stricter SQL engines might more helpfully give you an error).
I know you want the id and date corresponding to the maximum score for a given grouping, but that's not explicit in your code. You'll need a subselect or a self-join to make it explicit!
Use:
SELECT t.id,
t.gameid,
t.userid,
t.name,
t.score,
t.date
FROM HIGHSCORES t
JOIN (SELECT hs.gameid,
hs.userid,
MAX(hs.score) 'max_score'
FROM HIGHSCORES hs
GROUP BY hs.gameid, hs.userid) mhs ON mhs.gameid = t.gameid
AND mhs.userid = t.userid
AND mhs.max_score = t.score
WHERE t.userid = '2345'