I am using a local Access Database connected to Visual Basic. My query is
SELECT RebateReceived, DatePart('yyyy',[RebateMailedDate]) AS MailedDate, Sum(RebateValue) as MoneyReceived
FROM RebateInfoStorage
where RebateReceived='Received'
group by RebateReceived
having DatePart('yyyy',[RebateMailedDate])
I am trying to get the columns that have the same year and the word(s) that have 'received' to identify the records that need to be summed (Added) together. I am not very familiar with the Group By and Having keywords or the Sum() and DatePart() functions.
So the DBMS will go into the RebateInfoStorage and grab all the rows where RebateReceived = 'Received'. Then, it'll group those records, where each group contains records where the expression DatePart('yyyy', RebateMailedDate) evaluates to the same value (i.e. they have the same year). Then for each group, it'll return a single result row with the year, and the sum of all the RebateValues in that group. Operations happen in that order.
HAVING is like WHERE, but happens after the GROUP BY and is a condition placed on a group of records, whereas WHERE is a condition on a record.
SELECT
YEAR(RebateMailedDate) AS MailedDate,
SUM(RebateValue) as MoneyReceived
FROM
RebateInfoStorage
WHERE
RebateReceived = 'Received'
GROUP BY
YEAR(RebateMailedDate);
EDIT: It would appear that YEAR(x) is a more appropriate function!
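To make the WHERE / GROUP BY / HAVING order concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The sample rows are made up, and SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RebateInfoStorage ("
    "RebateReceived TEXT, RebateMailedDate TEXT, RebateValue REAL)"
)
conn.executemany(
    "INSERT INTO RebateInfoStorage VALUES (?, ?, ?)",
    [
        ("Received", "2013-03-01", 10.0),
        ("Received", "2013-09-15", 5.0),
        ("Received", "2014-01-20", 20.0),
        ("Pending", "2014-02-02", 99.0),  # dropped by WHERE before any grouping
    ],
)

# WHERE filters rows, then GROUP BY builds one group per year.
all_years = conn.execute(
    """
    SELECT strftime('%Y', RebateMailedDate) AS MailedYear,
           SUM(RebateValue) AS MoneyReceived
    FROM RebateInfoStorage
    WHERE RebateReceived = 'Received'
    GROUP BY strftime('%Y', RebateMailedDate)
    ORDER BY MailedYear
    """
).fetchall()

# HAVING runs after grouping and filters whole groups, not single rows.
big_years = conn.execute(
    """
    SELECT strftime('%Y', RebateMailedDate) AS MailedYear,
           SUM(RebateValue) AS MoneyReceived
    FROM RebateInfoStorage
    WHERE RebateReceived = 'Received'
    GROUP BY strftime('%Y', RebateMailedDate)
    HAVING SUM(RebateValue) > 15
    ORDER BY MailedYear
    """
).fetchall()

print(all_years)  # [('2013', 15.0), ('2014', 20.0)]
print(big_years)  # [('2014', 20.0)]
```

The 'Pending' row never reaches the 2014 group, and the 2013 group is removed by HAVING only after its sum has been computed.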
You should group by DatePart having RebateReceived='Received'. For more information about the syntax of Having you may refer to http://www.w3schools.com/sql/sql_having.asp
Group by means your output table will be grouped according to the unique elements in that column. For example, if there are multiple entries with 2014 as the year, they will all be grouped together, and their RebateValue will be added up. If you group by RebateReceived, all the entries will be added together and you will end up with a single sum.
I know it wasn't allowed in SQL-92. But since then it may have changed, particularly when there's a window applied. Can you explain the changes and give the version (or versions if there were more) in which they were introduced?
Examples
Is SUM(COUNT(votes.option_id)) OVER() valid syntax per standard SQL:2016 (or earlier)?
This is my comment (unanswered, and probably unlikely to be answered on such an old question) in Why can you nest aggregate functions when using a window function in PostgreSQL?.
The Calculating Running Total (SQL) kata at Codewars has as its most upvoted solution (using PostgreSQL 13.0, a highly standard compliant engine, so the code is likely to be standard) this one:
SELECT
CREATED_AT::DATE AS DATE,
COUNT(CREATED_AT) AS COUNT,
SUM(COUNT(CREATED_AT)) OVER (ORDER BY CREATED_AT::DATE ROWS UNBOUNDED PRECEDING)::INT AS TOTAL
FROM
POSTS
GROUP BY
CREATED_AT::DATE
(Which could be simplified to:
SELECT
created_at::DATE date,
COUNT(*) COUNT,
SUM(COUNT(*)) OVER (ORDER BY created_at::DATE)::INT total
FROM posts
GROUP BY created_at::DATE
I assume the ::s are a new syntax for casting I didn't know of. And that casting from TIMESTAMP to DATE is now allowed (in SQL-92 it wasn't).)
As this SO answer explains, Oracle Database allows it even without a window, pulling in the GROUP BY from context. I don't know if the standard allows it.
You already noticed the difference yourself: It's all about the window. COUNT(*) without an OVER clause for instance is an aggregation function. COUNT(*) with an OVER clause is a window function.
By using aggregation functions you condense the original rows you get after the FROM clause and WHERE clause are applied to either the specified group in GROUP BY or to one row in the absence of a GROUP BY clause.
Window functions, aka analytic functions, are applied afterwards. They don't change the number of result rows, but merely add information by looking at all or some rows (the window) of the selected data.
In
SELECT
options.id,
options.option_text,
COUNT(votes.option_id) as vote_count,
COUNT(votes.option_id) / SUM(COUNT(votes.option_id)) OVER() * 100.0 as vote_percentage
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id;
we first join votes to options and then count the votes per option by aggregating the joined rows down to one result row per option (GROUP BY options.id). We count on a non-nullable column in the votes table (COUNT(votes.option_id)), so we get a zero count in case there are no votes, because in an outer joined row this column is set to null.
After aggregating all rows and thus getting one row per option, we apply a window function (SUM() OVER) to this result set. We apply the analytic SUM to the vote count (SUM(COUNT(votes.option_id))) by looking at the whole result set (empty OVER clause), thus getting the same total vote count in every row. We use this value for a calculation: the option's vote count divided by the total vote count times 100, which is the option's percentage of total votes.
The PostgreSQL query is very similar. We select the number of posts per date (COUNT(created_at) is nothing else than a mere COUNT(*)) along with a running total of these counts (by using a window that looks at all rows up to the current row).
So, while this looks like we are nesting two aggregate functions, this is not really the case, because SUM OVER is not considered an aggregation function but an analytic/window function.
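The running-total query can be tried without a PostgreSQL server. The sketch below uses Python's sqlite3 module, which assumes the bundled SQLite is 3.25 or newer (the first release with window functions). The posts rows are invented, and date(created_at) replaces the PostgreSQL-specific created_at::DATE cast.

```python
import sqlite3

# Made-up posts; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (created_at TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?)",
    [
        ("2021-01-01 08:00:00",),
        ("2021-01-01 17:30:00",),
        ("2021-01-02 09:15:00",),
        ("2021-01-03 10:00:00",),
        ("2021-01-03 11:00:00",),
        ("2021-01-03 12:00:00",),
    ],
)

# COUNT(*) aggregates each day down to one row; SUM(...) OVER then runs
# as a window function across those already-grouped rows.
running = conn.execute(
    """
    SELECT date(created_at) AS day,
           COUNT(*) AS cnt,
           SUM(COUNT(*)) OVER (ORDER BY date(created_at)) AS total
    FROM posts
    GROUP BY date(created_at)
    ORDER BY day
    """
).fetchall()

for row in running:
    print(row)
# ('2021-01-01', 2, 2)
# ('2021-01-02', 1, 3)
# ('2021-01-03', 3, 6)
```

Six input rows become three grouped rows, and the window function adds a column to those three rows without collapsing them any further.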
Oracle does allow applying an aggregate function directly on another, thus invoking a final aggregation on a previous grouped by aggregation. This allows us to get one result row of, say, the average of sums without having to write a subquery for this. This is not compliant with the SQL standard, however, and very unpopular even among Oracle developers at that.
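For comparison, the portable way to get something like an average of sums is a derived table. This is a sketch with a hypothetical sales table, run through Python's sqlite3 module; the inner query does the first aggregation and the outer query aggregates its result.

```python
import sqlite3

# Hypothetical table and rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 30), ("south", 20)],
)

# Inner query: one SUM per region (north = 40, south = 20).
# Outer query: AVG over those per-region sums.
avg_of_sums = conn.execute(
    """
    SELECT AVG(region_total)
    FROM (
        SELECT SUM(amount) AS region_total
        FROM sales
        GROUP BY region
    )
    """
).fetchone()[0]

print(avg_of_sums)  # 30.0
```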
I use queries in MS Access to create graphs (shown in forms) to represent monthly spend data on a supplier. I want the x axis to be the months in chronological order, and this is where I'm having issues.
The picture above shows that the x axis starts with April 2016, although the earliest date is August 2015.
The query code that creates the graph is the following:
SELECT (Format([DateStamp],"mmm"" '""yy")) AS Expr1, Sum([Item Master].SpendPerMaterial) AS Expr2
FROM [Item Master]
WHERE ((([Item Master].SupplierName)=[Forms]![Supplier History]![List0]))
GROUP BY (Format([DateStamp],"mmm"" '""yy")), (Year([DateStamp])*12+Month([DateStamp])-1);
[Item Master] is the table where all data is retrieved from. DateStamp refers to the column with months, SpendPerMaterial is the spend of a certain material in that month (which is aggregated since we look at the supplier level, not the material level), and List0 is a list where users can select a supplier from a list of suppliers.
You should never rely on the ordering of results from a query unless you include an explicit order by. In your case, the results are ordered by the columns alphabetically (because of the group by).
You can fix this by adding:
order by max([DateStamp])
to the query.
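Here is a small sketch of why the labels come out in the wrong order and how ordering by an aggregate of the real date fixes it. It uses Python's sqlite3 module instead of Access, so strftime('%m/%Y', ...) stands in for Format(), and the table name item_master, the column Spend, and the rows are all made up.

```python
import sqlite3

# Made-up monthly spend rows; SQLite's strftime replaces Access's Format().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_master (DateStamp TEXT, Spend INTEGER)")
conn.executemany(
    "INSERT INTO item_master VALUES (?, ?)",
    [
        ("2015-08-10", 100),
        ("2015-08-20", 50),
        ("2016-04-05", 70),
        ("2015-12-01", 30),
    ],
)

# Sorting on the text label gives alphabetical, not chronological, order.
alphabetical = conn.execute(
    """
    SELECT strftime('%m/%Y', DateStamp) AS label, SUM(Spend) AS total
    FROM item_master
    GROUP BY strftime('%m/%Y', DateStamp)
    ORDER BY label
    """
).fetchall()

# Sorting on an aggregate of the underlying date keeps the months in sequence.
chronological = conn.execute(
    """
    SELECT strftime('%m/%Y', DateStamp) AS label, SUM(Spend) AS total
    FROM item_master
    GROUP BY strftime('%m/%Y', DateStamp)
    ORDER BY MIN(DateStamp)
    """
).fetchall()

print(alphabetical)   # [('04/2016', 70), ('08/2015', 150), ('12/2015', 30)]
print(chronological)  # [('08/2015', 150), ('12/2015', 30), ('04/2016', 70)]
```

MIN is used here; MAX works just as well, because all dates inside one group fall in the same month.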
I would add the following to your query, after your GROUP BY clause:
ORDER BY (Year([DateStamp])*12+Month([DateStamp])-1) ASC;
I tried the other suggestions on an aggregate-totals-by-month report with no luck. The only way I could get the actual month labels was by putting labels directly beneath the chart, which means altering it every month!
I am doing a Union Query to add together the results of two separate queries that give me data from two different fiscal periods, to get a rolling 12 months number.
I get the message "Your query does not include the specified expression "Report_Header" as part of an aggregate function". I have read that the field needs to be included in a GROUP BY statement at the end, but when I add the field from either query, or from both queries as shown below, I still get the message. Help? I'm not a programmer, I'm an Access user, so I need it to be simple, please :).
SELECT [JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].Report_Header,
Sum([JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].SumOfCASES) AS CASES,
Sum([JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].SumOfPurchases) AS PURCHASES
FROM [JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB]
UNION ALL
SELECT [JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].Report_Header,
Sum([JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].SumOfCASES) AS CASES,
Sum([JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].SumOfPurchases) AS PURCHASES
FROM [JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2]
GROUP BY [JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].Report_Header,
[JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].Report_Header
Thanks!
You can aggregate both subqueries:
SELECT [JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].Report_Header,
Sum([JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].SumOfCASES) AS CASES,
Sum([JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].SumOfPurchases) AS PURCHASES
FROM [JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB]
GROUP BY [JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB].Report_Header
UNION ALL
SELECT [JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].Report_Header,
Sum([JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].SumOfCASES) AS CASES,
Sum([JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].SumOfPurchases) AS PURCHASES
FROM [JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2]
GROUP BY [JOIN_IB_FREIGHT&PURCHASES_Rolling12_SUB2].Report_Header;
This may be what you want. But, it will not combine information under the same header from both tables. For that, the simplest method is probably a view.
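The difference between grouping each branch and grouping the combined rows can be seen in a small sketch. It runs on Python's sqlite3 module rather than Access, and the table names sub1 and sub2 plus their rows are invented stand-ins for the two rolling-12 subqueries.

```python
import sqlite3

# sub1 and sub2 are hypothetical stand-ins for the two saved queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sub1 (Report_Header TEXT, SumOfCASES INTEGER)")
conn.execute("CREATE TABLE sub2 (Report_Header TEXT, SumOfCASES INTEGER)")
conn.executemany(
    "INSERT INTO sub1 VALUES (?, ?)",
    [("Dairy", 10), ("Dairy", 5), ("Produce", 7)],
)
conn.executemany("INSERT INTO sub2 VALUES (?, ?)", [("Dairy", 3), ("Meat", 4)])

# Each branch carries its own GROUP BY, so a header found in both
# tables shows up twice in the result.
per_branch = sorted(
    conn.execute(
        """
        SELECT Report_Header, SUM(SumOfCASES) AS CASES
        FROM sub1
        GROUP BY Report_Header
        UNION ALL
        SELECT Report_Header, SUM(SumOfCASES) AS CASES
        FROM sub2
        GROUP BY Report_Header
        """
    ).fetchall()
)

# Union the raw rows first, then group once over the combined set.
combined = conn.execute(
    """
    SELECT Report_Header, SUM(CASES) AS CASES
    FROM (
        SELECT Report_Header, SumOfCASES AS CASES FROM sub1
        UNION ALL
        SELECT Report_Header, SumOfCASES AS CASES FROM sub2
    )
    GROUP BY Report_Header
    ORDER BY Report_Header
    """
).fetchall()

print(per_branch)  # [('Dairy', 3), ('Dairy', 15), ('Meat', 4), ('Produce', 7)]
print(combined)    # [('Dairy', 18), ('Meat', 4), ('Produce', 7)]
```

In Access, the inner UNION ALL could be kept as its own saved query with the outer SELECT built on top of it; that part is a suggestion, not something tested here.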
Place GROUP BY [JOIN_IB_FREIGHT&PURCHASES_ROLLING12_SUB].Report_Header under the first query instead of the second.
I do not understand the following (it returns the number of comments per article, along with the date of the newest one):
SELECT `id_comment`,COUNT(*) AS `number`, MAX(`date`) AS `newest`
FROM `page_comments`
WHERE TO_DAYS( NOW() )-TO_DAYS(`date`) < 90
GROUP BY `id_comment`
ORDER BY `number` DESC,`newest` DESC
I don't understand how come the MAX function does not return the MAX value of the whole page_comments table, and why it automatically takes only the max for the given group. When using MAX, I would expect it to return the highest value of the column. I don't understand how it works together with grouping.
You described the behavior yourself quite correctly already: it automatically takes only the max for the given group.
If you group, you do it (as usual) on every column in the result set that is not aggregated (i.e. not using COUNT, SUM, MIN, MAX...).
That way you get distinct values for all non aggregated columns and the aggregated ones will yield a result that only takes the 'current' group into account.
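A quick way to see this is to run the same aggregates with and without GROUP BY. The sketch below uses Python's sqlite3 module with a few invented page_comments rows; only the table and column names are taken from the question.

```python
import sqlite3

# Invented rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE page_comments (id_comment INTEGER, "date" TEXT)')
conn.executemany(
    "INSERT INTO page_comments VALUES (?, ?)",
    [(1, "2024-01-05"), (1, "2024-02-01"), (2, "2024-01-20")],
)

# No GROUP BY: the whole table is one group, so there is one result row.
whole_table = conn.execute(
    'SELECT COUNT(*), MAX("date") FROM page_comments'
).fetchall()

# With GROUP BY: COUNT and MAX are evaluated once per id_comment.
per_group = conn.execute(
    """
    SELECT id_comment, COUNT(*) AS number, MAX("date") AS newest
    FROM page_comments
    GROUP BY id_comment
    ORDER BY id_comment
    """
).fetchall()

print(whole_table)  # [(3, '2024-02-01')]
print(per_group)    # [(1, 2, '2024-02-01'), (2, 1, '2024-01-20')]
```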
Let me explain it from the ground up.
MAX() is an aggregate function (it works over a group of data).
If a GROUP BY clause is NOT specified, the database implicitly treats the entire result set as a single group.
If it is specified, the data is grouped according to the logic given there.
It all boils down to analysis order:
1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
So you first have the FROM clause.
Then you cut down to the relevant rows via WHERE (so your sentence "I don't understand how come that the MAX function will not return the MAX value of all the page_comments" fails here).
Then you group the remaining rows.
Then you select from them.
MAX and the other aggregate functions are applied to data that has already been filtered!
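The effect of that ordering can be checked with a sketch like the one below, again using Python's sqlite3 module and invented rows. A fixed cutoff date replaces the NOW()-based 90-day condition so the result is reproducible.

```python
import sqlite3

# Invented rows; a fixed cutoff stands in for the 90-day NOW() condition.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE page_comments (id_comment INTEGER, "date" TEXT)')
conn.executemany(
    "INSERT INTO page_comments VALUES (?, ?)",
    [
        (1, "2024-01-05"),
        (1, "2023-06-01"),  # too old: removed by WHERE
        (2, "2023-02-01"),  # too old: group 2 ends up with no rows at all
        (3, "2024-02-10"),
        (3, "2024-03-01"),
    ],
)

# WHERE runs first, so COUNT and MAX only ever see the surviving rows.
recent = conn.execute(
    """
    SELECT id_comment, COUNT(*) AS number, MAX("date") AS newest
    FROM page_comments
    WHERE "date" >= '2024-01-01'
    GROUP BY id_comment
    ORDER BY number DESC, newest DESC
    """
).fetchall()

print(recent)  # [(3, 2, '2024-03-01'), (1, 1, '2024-01-05')]
```

Group 1 is counted as 1, not 2, and group 2 does not appear at all, because its only row was filtered out before grouping.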
By default, SUM adds up all the column values. But in my case, I have a report that is grouped by Name. A name can have a single offer with multiple start dates, so the report has to display one entry for each start date, i.e. the same name, offer, and players; the only difference is the date. So, for example, when you sum up the players, only one entry per name needs to be taken into account, because even though there are multiple start dates, the other values are the same and duplicated.
The expected result should be like,
The offer cost of $10 refers to the same $10, so it should be added only once. Similarly for players, etc. But I need the display as shown above: each entry should be shown.
How to solve this?
If all you want to do is avoid aggregating the value in the group total row, as in your example, just remove the aggregation from the expression, i.e. change:
=Sum(Fields!Players.Value)
to:
=Fields!Players.Value
This just returns the first Players value in the Scope - since it's the same value for every row this should be fine.
If you need to further aggregate this value to something like a grand total row, you have a couple of options.
For 2008R2 and above, you can use nested aggregates as an expression in the report - something like:
=Sum(Max(Fields!Players.Value,"MyGroup"))
For 2008 and below, you will need to add the aggregate value to each row in the Dataset and use this without aggregation in the report as required.
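If the aggregation is pushed into the dataset query instead, the same "sum of one value per group" idea can be written as a derived table. This is only a SQL analogue of the nested SSRS expression, sketched with Python's sqlite3 module; the offers table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: Players repeats on every start date of the same name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offers (Name TEXT, StartDate TEXT, Players INTEGER)")
conn.executemany(
    "INSERT INTO offers VALUES (?, ?, ?)",
    [
        ("A", "2014-01-01", 10),
        ("A", "2014-02-01", 10),  # duplicate of A's players value
        ("B", "2014-01-01", 4),
    ],
)

# A plain SUM counts A's players twice.
naive_total = conn.execute("SELECT SUM(Players) FROM offers").fetchone()[0]

# Take one value per name first (MAX), then sum those values.
deduped_total = conn.execute(
    """
    SELECT SUM(players_per_name)
    FROM (
        SELECT MAX(Players) AS players_per_name
        FROM offers
        GROUP BY Name
    )
    """
).fetchone()[0]

print(naive_total)    # 24
print(deduped_total)  # 14
```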
I haven’t worked with SSRS much but if this was a regular SQL query you would have to group by date range.
Try adding start date column and check if you can add another group by on top of what you already have.
It would be useful if you can provide more details here like table schema you use for retrieving the data.