SQL rollup - prevent summing records multiple times - sql

Firstly, I could not think of a better question title. Apologies for that.
So, I am writing a query, and here is something (I think) it would return without aggregate functions and GROUP BY. I am using this as an example; the actual query contains a lot more fields:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
B 2 50
C 3 60
A 2 50
A 1 25 <-- not actually a duplicate
Now you would say there are duplicate records. But in fact they are not duplicates: there are some extra fields (not shown here) that have different values for those seemingly duplicate records.
What I want:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
2 50
TOTAL 75
B 2 50
TOTAL 50
C 3 60
TOTAL 60
//EDIT - Apparently the following line is causing too much confusion. Ignore it. How can I get the rest of the table correctly?
TOTAL 135 //It seems it's quite difficult to get 135 here. It's OK if this total is messed up
What I am trying:
SELECT
SOME_FIELDS,
SUBJ,
CLASSROOM,
SUM(CLASSROOM_CAPACITY)
FROM
MYTABLE
WHERE .....
GROUP BY SOME_FIELDS, ROLLUP(SUBJ,CLASSROOM)
The problem:
Thanks to those "seemingly duplicate" records, classroom capacities are being summed up multiple times. How do I prevent that? Am I doing this the wrong way?
The actual query is a lot more complicated, but I think if I can get this right, I can apply it to the bigger query.
PS: I know how to get the text "Total" instead of a blank entry with ROLLUP using GROUPING, so you can skip that part.

The cardinality you're introducing is a little off, and once you sort that out, ROLLUP starts to work. You're saying that:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
is equal to:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
But the SOME_FIELDS could vary per row. When you aggregate up to just the columns above, what do you expect to happen to SOME_FIELDS?
If these can be ignored for the purposes of this query, your best bet is to first find the DISTINCT records (i.e. records that contain a unique tuple of subj, classroom and classroom_capacity) and then do the ROLLUP on this data set. The following query achieves this:
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT
subj
, classroom
, SUM(classroom_capacity)
FROM distinct_subj_classrm_capacity
GROUP BY ROLLUP(subj, classroom)
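A minimal sketch of the double counting and the DISTINCT fix, using Python's standard sqlite3 module as a stand-in for Oracle (an assumption made only so the example runs anywhere). SQLite has no ROLLUP, so this computes only the per-subject subtotal level, and extra_field is a hypothetical column standing in for SOME_FIELDS.

```python
import sqlite3

# Rows from the question plus a hypothetical extra_field column standing in for
# SOME_FIELDS: it is what makes the two (A, 1, 25) rows different records.
ROWS = [
    ("A", 1, 25, "x"),
    ("B", 2, 50, "x"),
    ("C", 3, 60, "x"),
    ("A", 2, 50, "x"),
    ("A", 1, 25, "y"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (subj TEXT, classroom INTEGER,"
        " classroom_capacity INTEGER, extra_field TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    return conn


def subject_totals(conn, dedupe):
    # Optionally collapse (subj, classroom, capacity) tuples first, like the CTE above.
    source = (
        "(SELECT DISTINCT subj, classroom, classroom_capacity FROM mytable) AS d"
        if dedupe
        else "mytable"
    )
    # SQLite has no ROLLUP, so only the per-subject subtotal level is computed.
    sql = f"SELECT subj, SUM(classroom_capacity) FROM {source} GROUP BY subj"
    return dict(conn.execute(sql).fetchall())


conn = make_db()
naive = subject_totals(conn, dedupe=False)
distinct = subject_totals(conn, dedupe=True)
print(naive)     # A is counted as 25 + 50 + 25
print(distinct)  # A is counted as 25 + 50
```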
If you're not interested in the break report results that ROLLUP gives you and simply want the raw totals, then you can use the analytic version of SUM (see here for more on Oracle analytic functions: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions004.htm):
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT DISTINCT
subj
, SUM(classroom_capacity) OVER (PARTITION BY subj) classroom_capacity_per_subj
FROM distinct_subj_classrm_capacity
This gives results in the format:
SUBJ CLASSROOM_CAPACITY_PER_SUBJ
A 75
B 50
C 60
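A sketch of the analytic SUM over the de-duplicated rows, again run through Python's sqlite3 as a stand-in for Oracle. It assumes the bundled SQLite is 3.25 or newer (required for window functions) and that the table holds only the three columns shown in the question, so the repeated (A, 1, 25) row is an exact duplicate here.

```python
import sqlite3

ROWS = [("A", 1, 25), ("B", 2, 50), ("C", 3, 60), ("A", 2, 50), ("A", 1, 25)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (subj TEXT, classroom INTEGER, classroom_capacity INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# DISTINCT in the CTE removes the repeated tuple; the window SUM then totals
# capacity per subject, and the outer DISTINCT keeps one row per subject.
SQL = """
WITH distinct_subj_classrm_capacity AS (
    SELECT DISTINCT subj, classroom, classroom_capacity FROM mytable
)
SELECT DISTINCT
    subj,
    SUM(classroom_capacity) OVER (PARTITION BY subj) AS classroom_capacity_per_subj
FROM distinct_subj_classrm_capacity
ORDER BY subj
"""
per_subj = conn.execute(SQL).fetchall()
print(per_subj)  # [('A', 75), ('B', 50), ('C', 60)]
```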

Related

grouping in sql in Access

I have a table that looks like this with three columns From, To, and Symbol:
From To Symbol
0 2 dog
2 5 dog
5 9 cat
9 15 cat
15 20 dog
20 40 dog
40 45 dog
I was trying to write an SQL query that groups records in a way that produces the following result:
From To Symbol
0 5 dog
5 15 cat
15 45 dog
That is, if the From and To values are continuous for the same Symbol, one result record is created with the smallest From value, the largest To value, and the Symbol. In the example table above, the second record has a value of 5 in the To column, which is not the same as the From value of the next record with the same Symbol (15, 20, dog), so two result records are created for the same Symbol (dog).
I have tried to join the table to itself, then group by. But I could not figure out how exactly that can be done. I have to do this in Microsoft Access. Any help would be greatly appreciated. Thanks!
Assuming the ranges have no overlaps and no gaps, you can do this in MS Access with a trick. You need to identify the runs of adjacent rows that have the same symbol. You can identify them by counting the number of previous rows with a different symbol (using a subquery): that count stays the same within a run and changes when a new run starts. Once you have this information, the rest is aggregation:
select symbol, min(from) as from, max(to) as to
from (select t.*,
(select count(*)
from t as t2
where t2.from < t.from and t2.symbol <> t.symbol
) as grp
from t
) t
group by symbol, grp;
Gaps would make this problem much harder in MS Access.
Note: Don't use reserved words or keywords for column names. This code uses the names supplied in the question, but doesn't bother to escape them. I think that just makes it harder to understand the query.
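A sketch that runs the same counting trick through Python's sqlite3 rather than MS Access (an assumption made only so the example is runnable). SQLite accepts the same [bracketed] identifiers as Access, so From and To are escaped that way here. The table name t comes from the answer, while the output aliases range_from and range_to are illustrative choices.

```python
import sqlite3

ROWS = [(0, 2, "dog"), (2, 5, "dog"), (5, 9, "cat"), (9, 15, "cat"),
        (15, 20, "dog"), (20, 40, "dog"), (40, 45, "dog")]

conn = sqlite3.connect(":memory:")
# [from] and [to] are bracket-escaped because FROM and TO are keywords.
conn.execute("CREATE TABLE t ([from] INTEGER, [to] INTEGER, symbol TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# grp = number of earlier rows (by [from]) with a different symbol; it is
# constant inside a run of equal symbols, so (symbol, grp) identifies a run.
SQL = """
SELECT symbol, MIN([from]) AS range_from, MAX([to]) AS range_to
FROM (
    SELECT t.*,
           (SELECT COUNT(*) FROM t AS t2
            WHERE t2.[from] < t.[from] AND t2.symbol <> t.symbol) AS grp
    FROM t
) AS x
GROUP BY symbol, grp
ORDER BY range_from
"""
merged = conn.execute(SQL).fetchall()
print(merged)  # [('dog', 0, 5), ('cat', 5, 15), ('dog', 15, 45)]
```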

Selecting filtered values from Oracle using ROWNUM

I have a requirement where I need to find the record number of the records that are returned in the result set. I know that I can use ROWNUM to get the record number from the result set, but my issue is slightly different. Below are the details:
Table : ProcessSummary
Columns:
PS_PK ProcessId StepId AsscoiateId ProcessName AssetAmount
145 25 50 Process1 3,500.00
267 26 45 Process2 4,400.00
356 27 70 Process3 2,400.00
456 28 80 90 Process4 780.00
556 29 56 67 Process5 4,500.00
656 45 70 Process6 6,000.00
789 31 75 Process7 8,000.00
Now what I need to do is fetch all the records from the ProcessSummary table where any of ProcessId, StepId or AssociateId is NULL. I wrote the query below:
select * from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
As expected, I got the 1st, 2nd, 3rd, 6th and 7th records in the result set.
Now what I need is to get the record numbers 1, 2, 3, 6, 7. I tried to use ROWNUM as below, but I got the values 1, 2, 3, 4, 5 and not 1, 2, 3, 6, 7.
select ROWNUM from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
Is it possible to get the ROWNUM values in the sequence that I want, and if so, how can I do this? Also, if ROWNUM cannot be used, what other option can I use to get the result in the form that I want?
Any help would be greatly appreciated, as I could not find much on the net or SO regarding this sort of requirement.
Thanks
Vikeng21
rownum is an internal numbering that gives you a row number based on the current query results only, so that numbering is not tied to a specific record, and it will change when you change the data or the query.
But the numbering you ask for is already in your table. It looks like you just need to SELECT PS_PK .. instead. PS_PK is the field in your table that contains the actual number you want.
You can generate a numbering using an analytic function, and then filter that query. You need some fields to order by, though. In this case I've chosen PS_PK, but it can be another field, like ProcessName, or a combination of other fields as well.
select
*
from
(select
dense_rank() over (order by PS_PK) as RANKING,
p.*
from
ProcessSummary p)
where
ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
So, in this query, a numbering is first calculated for each row returned by the inner query. The numbering is returned as the field RANKING. The outer query then filters the rows, but still returns the field RANKING with the original numbering.
Instead of dense_rank there are also rank and row_number. The differences are subtle (they only show up when the ORDER BY values tie), but you can experiment and read the Oracle documentation on these functions to see which one fits you best.
Note that this might slow down your query, because the inner query first generates a number for each row in the table (there is no filtering on that level now).
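A sketch contrasting the two orders of operations, with Python's sqlite3 standing in for Oracle (SQLite 3.25 or newer is assumed for DENSE_RANK). Which ID column is NULL in each row is a guess, because the question's flattened table does not show it; the column name AsscoiateId is kept as spelled in the question's query.

```python
import sqlite3

# Hypothetical NULL positions: rows 1, 2, 3, 6 and 7 each have one NULL ID column.
ROWS = [
    (145, 25, 50, None, "Process1", 3500.00),
    (267, 26, 45, None, "Process2", 4400.00),
    (356, 27, 70, None, "Process3", 2400.00),
    (456, 28, 80, 90, "Process4", 780.00),
    (556, 29, 56, 67, "Process5", 4500.00),
    (656, None, 45, 70, "Process6", 6000.00),
    (789, 31, None, 75, "Process7", 8000.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProcessSummary (PS_PK INTEGER, ProcessId INTEGER, StepId INTEGER,"
    " AsscoiateId INTEGER, ProcessName TEXT, AssetAmount REAL)"
)
conn.executemany("INSERT INTO ProcessSummary VALUES (?, ?, ?, ?, ?, ?)", ROWS)

NULL_FILTER = "ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL"

# Numbering computed after filtering: behaves like the ROWNUM attempt.
filtered_first = [
    r[0]
    for r in conn.execute(
        "SELECT DENSE_RANK() OVER (ORDER BY PS_PK) FROM ProcessSummary"
        f" WHERE {NULL_FILTER} ORDER BY PS_PK"
    )
]

# Numbering computed over the whole table first, then filtered by the outer query.
numbered_first = [
    r[0]
    for r in conn.execute(
        "SELECT ranking FROM ("
        "  SELECT DENSE_RANK() OVER (ORDER BY PS_PK) AS ranking, p.*"
        "  FROM ProcessSummary p"
        f") AS numbered WHERE {NULL_FILTER} ORDER BY ranking"
    )
]

print(filtered_first)  # [1, 2, 3, 4, 5]
print(numbered_first)  # [1, 2, 3, 6, 7]
```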

SQL query for child table summary and generalization

I have 4 tables, shown in the diagram below.
I want a summary query for the Institution table, where I want to get a result with only:
InstitutionType ProductName Quantity
For example, sample data of the Institution table:
Id Name Address InstitutionTypeId
1 aaa ny132 1001
2 bbb dx23 1001
3 ccc bn33 1002
And the InstitutionProduct table looks like this:
Id ProductId Quantity InstitutionId
1 1000 120 1
2 1000 100 2
3 1000 50 3
Then I want a query result that outputs the total quantity of a given product by institution type. The sample output will look like this:
InstitutionTypeId productId quantity
1001 1000 220
1002 1000 50
So I want to group the institutions by type and aggregate the product quantity for each institution type group.
I tried to use the GROUP BY clause, but with the product quantity not as a grouping element, it results in an error.
SELECT
Institution.InstitutionTypeID,
InstitutionProduct.ProductID,
SUM(InstitutionProduct.Quantity)
FROM
Institution
LEFT JOIN
InstitutionProduct
ON InstitutionProduct.InstitutionID = Institution.ID
GROUP BY
Institution.InstitutionTypeID,
InstitutionProduct.ProductID
If you are querying with GROUP BY, you need to either use aggregate functions or group by all included fields. The reason is that GROUP BY returns exactly one row per group, so if you introduce an ungrouped field, this would conflict if the field has more than one value per group. Even though this might not be the case for your dataset, the query engine cannot know this, and raises an error.
The solution is to introduce aggregates for all non-grouping fields, with aggregates being (among others): average (avg), sum (sum), minimum (min) and maximum (max). This would lead to something like:
SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity)
FROM Institution i LEFT JOIN InstitutionProduct ip
ON ip.InstitutionID = i.ID
GROUP BY i.InstitutionTypeID, ip.ProductID
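A sketch that checks grouping by institution type and product against the question's sample data, with Python's sqlite3 standing in for the database (the question does not name its engine). Grouping by InstitutionTypeID and ProductID is what matches the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Institution (
        ID INTEGER, Name TEXT, Address TEXT, InstitutionTypeID INTEGER);
    CREATE TABLE InstitutionProduct (
        ID INTEGER, ProductID INTEGER, Quantity INTEGER, InstitutionID INTEGER);
    INSERT INTO Institution VALUES
        (1, 'aaa', 'ny132', 1001), (2, 'bbb', 'dx23', 1001), (3, 'ccc', 'bn33', 1002);
    INSERT INTO InstitutionProduct VALUES
        (1, 1000, 120, 1), (2, 1000, 100, 2), (3, 1000, 50, 3);
    """
)

# Every SELECT column is either in GROUP BY or wrapped in an aggregate (SUM).
totals = conn.execute(
    """
    SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity) AS Quantity
    FROM Institution i
    LEFT JOIN InstitutionProduct ip ON ip.InstitutionID = i.ID
    GROUP BY i.InstitutionTypeID, ip.ProductID
    ORDER BY i.InstitutionTypeID
    """
).fetchall()
print(totals)  # [(1001, 1000, 220), (1002, 1000, 50)]
```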

Query to find and display unique records in ms-access?

I am using MS Access. I have a table with a field named Receipt_No. In this field there are many repeated values. I just want to display each repeated value only once rather than several times.
Here is my table:
Registration_No Payment_Date Charges Receipt_No
T-11 8/7/2011 200 105
T-12 8/7/2011 200 106
T-13 7/12/11 200 107
T-14 12/7/2011 200 108
T-15 12/7/2011 400 108
Here, in the Receipt_No field, 108 appears 2 times. I want to display it only once, as below (charges can be either 200 or 400, but Receipt_No should be displayed once). Please help me.
Registration_No Payment_Date Charges Receipt_No
T-11 8/7/2011 200 105
T-12 8/7/2011 200 106
T-13 7/12/11 200 107
T-14 12/7/2011 200 108
If you want to display only the records in your table with a receipt number that appears exactly once, use this query:
select * from Demand
where receipt_no in (
select receipt_no
from Demand
group by receipt_no
having count(*) = 1
)
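A sketch of the HAVING COUNT(*) = 1 filter on the question's sample rows, using Python's sqlite3 in place of MS Access and keying on Receipt_No as the prose describes. Receipt 108 disappears entirely because it occurs twice, which is not quite what the question asks for.

```python
import sqlite3

ROWS = [
    ("T-11", "8/7/2011", 200, 105),
    ("T-12", "8/7/2011", 200, 106),
    ("T-13", "7/12/11", 200, 107),
    ("T-14", "12/7/2011", 200, 108),
    ("T-15", "12/7/2011", 400, 108),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Demand (Registration_No TEXT, Payment_Date TEXT,"
    " Charges INTEGER, Receipt_No INTEGER)"
)
conn.executemany("INSERT INTO Demand VALUES (?, ?, ?, ?)", ROWS)

# Keep only rows whose receipt number occurs exactly once in the table.
singles = conn.execute(
    """
    SELECT * FROM Demand
    WHERE Receipt_No IN (
        SELECT Receipt_No FROM Demand GROUP BY Receipt_No HAVING COUNT(*) = 1
    )
    ORDER BY Receipt_No
    """
).fetchall()
print([row[3] for row in singles])  # [105, 106, 107]
```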
With the clarifications you've provided, it looks like what you want is more like in this question, where you want to return all fields, but only one record per receipt number. Here is a variation on the accepted answer:
select * from demand
inner join
(
select
receipt_no,
min(charges) AS min_charges
from
demand
group by
receipt_no
) sq
on demand.receipt_no = sq.receipt_no
and demand.charges = sq.min_charges
Note that this is still not exactly what you want: if there are two or more records with the same values for receipt_no and charges, this query will return them all.
Part of the problem is that your table is not well-defined: it does not appear to have a field that is unique for every record. With such a field, you can modify the query above to return a single row for each receipt_no. (Another part of the problem is that there seems to be something missing from the business requirement: usually, we would want to report the total charges from a receipt, or each charge from a receipt.)
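A sketch of the unique-field idea, with Python's sqlite3 standing in for MS Access. The ID column is hypothetical (something like an AutoNumber field), as is the extra T-16 row that creates a tie on (Receipt_No, Charges). Correlated subqueries take the place of the join above; this is one possible variation, not the answer's own query.

```python
import sqlite3

# ID is a hypothetical unique column; T-16 is a hypothetical row that ties T-14.
ROWS = [
    (1, "T-11", "8/7/2011", 200, 105),
    (2, "T-12", "8/7/2011", 200, 106),
    (3, "T-13", "7/12/11", 200, 107),
    (4, "T-14", "12/7/2011", 200, 108),
    (5, "T-15", "12/7/2011", 400, 108),
    (6, "T-16", "12/7/2011", 200, 108),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Demand (ID INTEGER PRIMARY KEY, Registration_No TEXT,"
    " Payment_Date TEXT, Charges INTEGER, Receipt_No INTEGER)"
)
conn.executemany("INSERT INTO Demand VALUES (?, ?, ?, ?, ?)", ROWS)

# For each receipt: take the minimum charge, then break ties on the smallest ID,
# so exactly one row per Receipt_No survives.
one_per_receipt = conn.execute(
    """
    SELECT d.Registration_No, d.Charges, d.Receipt_No
    FROM Demand AS d
    WHERE d.ID = (
        SELECT MIN(d2.ID) FROM Demand AS d2
        WHERE d2.Receipt_No = d.Receipt_No
          AND d2.Charges = (SELECT MIN(d3.Charges) FROM Demand AS d3
                            WHERE d3.Receipt_No = d.Receipt_No)
    )
    ORDER BY d.Receipt_No
    """
).fetchall()
print(one_per_receipt)
```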
Not sure exactly what you need in your query since you didn't provide many details, but using SELECT DISTINCT omits records that contain duplicate data in the selected fields. To be included in the results of the query, the combination of values for the fields listed in the SELECT statement must be unique.
See the MS Access docs for more detail.
But as an example the following query would select all LastNames but it would remove duplicate values.
SELECT DISTINCT LastName
FROM Employees;
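A small sketch with Python's sqlite3 showing that DISTINCT applies to the combination of all selected columns, using the question's receipt data (Payment_Date omitted for brevity). Selecting Receipt_No alone collapses 108 to one row, while adding Charges brings the second 108 row back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Demand (Registration_No TEXT, Charges INTEGER, Receipt_No INTEGER)"
)
conn.executemany(
    "INSERT INTO Demand VALUES (?, ?, ?)",
    [("T-11", 200, 105), ("T-12", 200, 106), ("T-13", 200, 107),
     ("T-14", 200, 108), ("T-15", 400, 108)],
)

# DISTINCT on one column: 108 appears once.
receipts = [
    r[0] for r in conn.execute("SELECT DISTINCT Receipt_No FROM Demand ORDER BY Receipt_No")
]

# DISTINCT on two columns: (108, 200) and (108, 400) are different combinations.
receipt_charges = conn.execute(
    "SELECT DISTINCT Receipt_No, Charges FROM Demand ORDER BY Receipt_No, Charges"
).fetchall()

print(receipts)         # [105, 106, 107, 108]
print(receipt_charges)  # [(105, 200), (106, 200), (107, 200), (108, 200), (108, 400)]
```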

MySQL: Getting highest score for a user

I have the following table (highscores),
id gameid userid name score date
1 38 2345 A 100 2009-07-23 16:45:01
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
5 38 2345 A 50 2009-07-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
7 32 2345 A 100 2009-07-20 16:45:01
Now in the above structure, a user can play a game multiple times, but I want to display the "Games Played" by a specific user. So in the games played section I can't display the same game multiple times. The idea is that if a user played a game 3 times, the play with the highest score should be displayed.
I want result data like:
id gameid userid name score date
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
I tried the following query, but it's not giving me the correct result:
SELECT id,
gameid,
userid,
date,
MAX(score) AS score
FROM highscores
WHERE userid='2345'
GROUP BY gameid
Please tell me what will be the query for this?
Thanks
The requirement is a bit vague/confusing, but would something like this satisfy the need?
(purposely added various aggregates that may be of interest).
SELECT gameid,
MIN(date) AS FirstTime,
MAX(date) AS LastTime,
MAX(score) AS TOPscore,
COUNT(*) AS NbOfTimesPlayed
FROM highscores
WHERE userid='2345'
GROUP BY gameid
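A sketch running this aggregate query on the question's sample rows via Python's sqlite3 instead of MySQL. Dates are stored as ISO-formatted text, so MIN and MAX compare them correctly as strings, and userid is passed as a bound integer parameter rather than the quoted literal.

```python
import sqlite3

ROWS = [
    (1, 38, 2345, "A", 100, "2009-07-23 16:45:01"),
    (2, 39, 2345, "A", 500, "2009-07-20 16:45:01"),
    (3, 31, 2345, "A", 100, "2009-07-20 16:45:01"),
    (4, 38, 2345, "A", 200, "2009-10-20 16:45:01"),
    (5, 38, 2345, "A", 50, "2009-07-20 16:45:01"),
    (6, 32, 2345, "A", 120, "2009-07-20 16:45:01"),
    (7, 32, 2345, "A", 100, "2009-07-20 16:45:01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE highscores (id INTEGER, gameid INTEGER, userid INTEGER,"
    " name TEXT, score INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO highscores VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# One row per game; every non-grouped column is wrapped in an aggregate.
rows = conn.execute(
    """
    SELECT gameid,
           MIN(date) AS FirstTime,
           MAX(date) AS LastTime,
           MAX(score) AS TOPscore,
           COUNT(*) AS NbOfTimesPlayed
    FROM highscores
    WHERE userid = ?
    GROUP BY gameid
    ORDER BY gameid
    """,
    (2345,),
).fetchall()
summary = {row[0]: row[1:] for row in rows}
print(summary[38])  # ('2009-07-20 16:45:01', '2009-10-20 16:45:01', 200, 3)
```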
-- ORDER BY COUNT(*) DESC -- for ex. to have games played most at top
Edit: New question about adding the id column to the SELECT list
The short answer is: "No, id cannot be added, not within this particular construct". (Read further to see why) However, if the intent is to have the id of the game with the highest score, the query can be modified, using a sub-query, to achieve that.
As explained by Alex M on this page, all the column names referenced in the SELECT list which are not used in the context of an aggregate function (MAX, MIN, AVG, COUNT and the like) MUST be included in the GROUP BY clause. The reason for this rule of the SQL language is simply that, in gathering the info for the results list, SQL may encounter multiple values for such a column (listed in SELECT but not GROUP BY) and would then not know how to deal with it; rather than doing anything (possibly useful but possibly silly as well) with these extra rows/values, the SQL standard dictates an error message, so that the user can modify the query and express his/her goals explicitly.
In our specific case, we could add the id to the SELECT and also to the GROUP BY list, but in doing so the grouping upon which the aggregation takes place would be different: the results list would include as many rows as we have id + gameid combinations, and the aggregate values for each of these rows would be based only on the records from the table where the id and the gameid have the corresponding values (assuming id is the PK of the table, we'd get a single row per aggregation, making MAX() and such quite meaningless).
The way to include the id (and possibly other columns) corresponding to the game with the top score is with a sub-query. The idea is that the sub-query selects the TOP score (within a given GROUP BY), and the main query SELECTs any column of the matching rows, even when those fields weren't (couldn't be) in the sub-query's GROUP BY construct. BTW, do give credit on this page to rexem for showing this type of query first.
SELECT H.id,
H.gameid,
H.userid,
H.name,
H.score,
H.date
FROM highscores H
JOIN (
SELECT H2.gameid, H2.userid, MAX(H2.score) AS MaxScoreByGameUser
FROM highscores H2
GROUP BY H2.gameid, H2.userid
) AS M
ON M.gameid = H.gameid
AND M.userid = H.userid
AND M.MaxScoreByGameUser = H.score
WHERE H.userid='2345'
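A sketch that runs this join (with the sub-query's aliases consistently set to H2) on the question's sample rows through Python's sqlite3, used here as a stand-in for MySQL. It returns ids 2, 3, 4 and 6, the rows the question asked for.

```python
import sqlite3

ROWS = [
    (1, 38, 2345, "A", 100, "2009-07-23 16:45:01"),
    (2, 39, 2345, "A", 500, "2009-07-20 16:45:01"),
    (3, 31, 2345, "A", 100, "2009-07-20 16:45:01"),
    (4, 38, 2345, "A", 200, "2009-10-20 16:45:01"),
    (5, 38, 2345, "A", 50, "2009-07-20 16:45:01"),
    (6, 32, 2345, "A", 120, "2009-07-20 16:45:01"),
    (7, 32, 2345, "A", 100, "2009-07-20 16:45:01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE highscores (id INTEGER, gameid INTEGER, userid INTEGER,"
    " name TEXT, score INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO highscores VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# The derived table M holds the top score per (game, user); joining back on the
# score recovers the full row (id, date, ...) that achieved it.
best = conn.execute(
    """
    SELECT H.id, H.gameid, H.score
    FROM highscores H
    JOIN (
        SELECT H2.gameid, H2.userid, MAX(H2.score) AS MaxScoreByGameUser
        FROM highscores H2
        GROUP BY H2.gameid, H2.userid
    ) AS M
      ON M.gameid = H.gameid
     AND M.userid = H.userid
     AND M.MaxScoreByGameUser = H.score
    WHERE H.userid = ?
    ORDER BY H.id
    """,
    (2345,),
).fetchall()
print(best)  # [(2, 39, 500), (3, 31, 100), (4, 38, 200), (6, 32, 120)]
```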
A few important remarks about the query above
Duplicates: if the user played the same game several times and reached the same hi-score, the query will produce that many rows.
GROUP BY of the sub-query may need to change for different uses of the query. If rather than searching for the game's hi-score on a per user basis, we wanted the absolute hi-score, we would need to exclude userid from the GROUP BY (that's why I named the alias of the MAX with a long, explicit name)
The userid = '2345' condition may be added to the [now absent] WHERE clause of the sub-query, for efficiency purposes (unless MySQL's optimizer is very smart, currently all hi-scores for all game+user combinations get calculated, whereas we only need these for user '2345'); the downside is that the value is then duplicated in two places; a solution is to use a variable.
There are several ways to deal with the issues mentioned above, but these seem to be out of scope for a [now rather lengthy] explanation about the GROUP BY constructs.
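A sketch of two of the remarks above, with Python's sqlite3 standing in for MySQL. The user filter is pushed into the sub-query and bound once as a named parameter (:uid), which is one reading of the "variables" suggestion. The rows with id 8 (a tie on game 39) and id 9 (another user) are hypothetical additions that show the duplicate rows and the filter at work.

```python
import sqlite3

ROWS = [
    (1, 38, 2345, "A", 100, "2009-07-23 16:45:01"),
    (2, 39, 2345, "A", 500, "2009-07-20 16:45:01"),
    (3, 31, 2345, "A", 100, "2009-07-20 16:45:01"),
    (4, 38, 2345, "A", 200, "2009-10-20 16:45:01"),
    (5, 38, 2345, "A", 50, "2009-07-20 16:45:01"),
    (6, 32, 2345, "A", 120, "2009-07-20 16:45:01"),
    (7, 32, 2345, "A", 100, "2009-07-20 16:45:01"),
    (8, 39, 2345, "A", 500, "2009-11-01 10:00:00"),  # hypothetical tie on game 39
    (9, 39, 9999, "B", 900, "2009-11-01 10:00:00"),  # hypothetical other user
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE highscores (id INTEGER, gameid INTEGER, userid INTEGER,"
    " name TEXT, score INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO highscores VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# The same :uid value filters both the sub-query and the outer query, so the
# sub-query only aggregates this user's rows and the literal is written once.
SQL = """
SELECT H.id
FROM highscores H
JOIN (
    SELECT H2.gameid, H2.userid, MAX(H2.score) AS MaxScoreByGameUser
    FROM highscores H2
    WHERE H2.userid = :uid
    GROUP BY H2.gameid, H2.userid
) AS M
  ON M.gameid = H.gameid AND M.userid = H.userid AND M.MaxScoreByGameUser = H.score
WHERE H.userid = :uid
ORDER BY H.id
"""
ids = [r[0] for r in conn.execute(SQL, {"uid": 2345})]
print(ids)  # [2, 3, 4, 6, 8]: the tie on game 39 yields two rows
```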
Every field you have in your SELECT (when a GROUP BY clause is present) must be either one of the fields in the GROUP BY clause, or else a group function such as MAX, SUM, AVG, etc. In your code, userid is technically violating that, but in a pretty harmless fashion (you could make your code technically SQL-standard compliant with GROUP BY gameid, userid); fields id and date are in more serious violation: there will be many ids and dates within one GROUP BY set, and you're not telling the engine how to make a single value out of that set (MySQL picks more-or-less random ones; stricter SQL engines might more helpfully give you an error).
I know you want the id and date corresponding to the maximum score for a given grouping, but that's not explicit in your code. You'll need a subselect or a self-join to make it explicit!
Use:
SELECT t.id,
t.gameid,
t.userid,
t.name,
t.score,
t.date
FROM HIGHSCORES t
JOIN (SELECT hs.gameid,
hs.userid,
MAX(hs.score) AS max_score
FROM HIGHSCORES hs
GROUP BY hs.gameid, hs.userid) mhs ON mhs.gameid = t.gameid
AND mhs.userid = t.userid
AND mhs.max_score = t.score
WHERE t.userid = '2345'
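As an alternative to the sub-select, here is a sketch of the self-join route mentioned above, written as a LEFT JOIN anti-join (a swapped-in technique, not this answer's query) and run through Python's sqlite3 on the question's sample rows. A row is kept only when no other row for the same user and game has a strictly higher score, so ties would still produce several rows.

```python
import sqlite3

ROWS = [
    (1, 38, 2345, "A", 100, "2009-07-23 16:45:01"),
    (2, 39, 2345, "A", 500, "2009-07-20 16:45:01"),
    (3, 31, 2345, "A", 100, "2009-07-20 16:45:01"),
    (4, 38, 2345, "A", 200, "2009-10-20 16:45:01"),
    (5, 38, 2345, "A", 50, "2009-07-20 16:45:01"),
    (6, 32, 2345, "A", 120, "2009-07-20 16:45:01"),
    (7, 32, 2345, "A", 100, "2009-07-20 16:45:01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE highscores (id INTEGER, gameid INTEGER, userid INTEGER,"
    " name TEXT, score INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO highscores VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Anti-join: a row survives when the LEFT JOIN finds no better-scoring row.
best_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT t.id
        FROM highscores t
        LEFT JOIN highscores better
          ON better.gameid = t.gameid
         AND better.userid = t.userid
         AND better.score > t.score
        WHERE better.id IS NULL
          AND t.userid = ?
        ORDER BY t.id
        """,
        (2345,),
    )
]
print(best_ids)  # [2, 3, 4, 6]
```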