Optimizing GROUP BY performance - sql

Is there a way to GROUP BY a column that is defined by an alias or is the result of a calculation? I think the following code does double work by calculating MyMonth in the SELECT list and then again in the GROUP BY clause, which may be unnecessary waste. A simple GROUP BY MyMonth is not allowed. Is it possible to force only one calculation of month([MyDate])?
Update of code. Aggregate function is added.
SELECT month([MyDate]) AS MyMonth, count([MyDate]) AS HowMany
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY month([MyDate])
ORDER BY MyMonth

Your real problem likely stems from calling MONTH(...) on every row. This prevents the optimizer from using an index to fulfill the count (it can use it for the WHERE clause, but this will still be many rows).
Instead, you should turn this into a range query, that the optimizer could use for comparisons against an index. First we build a simple range table:
WITH Months as (SELECT MONTH(d) AS month,
d AS monthStart, DATEADD(month, 1, d) AS monthEnd
FROM (VALUES(CAST('20140101' AS DATE))) t(d)
UNION ALL
SELECT MONTH(monthEnd),
monthEnd, DATEADD(month, 1, monthEnd)
FROM Months
WHERE monthEnd < CAST('20150101' AS DATE))
(if you have an existing calendar table, you can base your query on that, but sometimes a simple ad-hoc one works best)
Once we have the range-table, you can then use it to constrain and bucket your data, like so:
SELECT Months.month, COUNT(*)
FROM TableA
JOIN Months
ON TableA.MyDate >= Months.monthStart
AND TableA.MyDate < Months.monthEnd
GROUP BY Months.month
Note: The start of the date range was changed to 2014-01-01, as it seems strange that you'd only include one day from January, when aggregating months...
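The range-table idea can be tried out end to end with a small sketch. This one uses Python's built-in sqlite3 module as a stand-in for SQL Server, so `date(x, '+1 month')` replaces DATEADD and `strftime('%m', ...)` replaces MONTH(); the sample rows are invented for illustration. It only checks that the range join produces the same buckets as the per-row function call and says nothing about relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ISO-8601 date strings compare correctly as text, so TEXT is enough here.
conn.execute("CREATE TABLE TableA (MyDate TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?)",
    [("2014-01-15",), ("2014-01-31",), ("2014-02-01",),
     ("2014-12-31",), ("2015-01-01",)],  # hypothetical sample data
)

# Range table: one row per month with a half-open [monthStart, monthEnd) interval.
range_join = """
WITH RECURSIVE Months(month, monthStart, monthEnd) AS (
    SELECT 1, '2014-01-01', date('2014-01-01', '+1 month')
    UNION ALL
    SELECT month + 1, monthEnd, date(monthEnd, '+1 month')
    FROM Months
    WHERE monthEnd < '2015-01-01'
)
SELECT Months.month, COUNT(*)
FROM TableA
JOIN Months
  ON TableA.MyDate >= Months.monthStart
 AND TableA.MyDate <  Months.monthEnd
GROUP BY Months.month
ORDER BY Months.month
"""

# Baseline: compute the month from every row, as in the question.
per_row_function = """
SELECT CAST(strftime('%m', MyDate) AS INTEGER) AS MyMonth, COUNT(*)
FROM TableA
WHERE MyDate >= '2014-01-01' AND MyDate < '2015-01-01'
GROUP BY MyMonth
ORDER BY MyMonth
"""

range_counts = conn.execute(range_join).fetchall()
function_counts = conn.execute(per_row_function).fetchall()
print(range_counts)
print(function_counts)
```

Both queries bucket the four 2014 rows identically and leave out the 2015 row, because the upper bound of each range is exclusive.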

No, you can't use a column alias directly in the GROUP BY clause. Instead, compute the value in a derived table in the FROM clause and group by the resulting column in the outer query.
select MyMonth, MAX(someothercolumn)
from
(
SELECT month([MyDate]) AS MyMonth,
someothercolumn
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-31' AND '2014-12-31'
) AS t
GROUP BY MyMonth
ORDER BY MyMonth

Related

Selecting group of years from date field

I'm trying to get a list of years from a date field that's stored as an nvarchar. I think the best approach is a subquery that converts the string to a date and then selects the year, but I'm having a hard time setting it up.
select datepart(yyyy,
(
SELECT convert(date,'21-02-12 6:10:00 PM',5) datenum
)
) as [year]
from SalesReport_AllDBs
group by datepart(yyyy, [datenum])
Any advice would be helpful to get this set up correctly
The subquery should go in your FROM clause:
SELECT datepart(yyyy, mydate) as datenum
FROM (SELECT convert(date, yourdatestringfield ,5) as myDate FROM SalesReport_AllDBs) as years
GROUP BY datepart(yyyy,mydate);
Or in one query without a subquery, which is a lot nicer looking:
SELECT datepart(yyyy, convert(date, yourdatestringfield, 5)) as datenum
FROM SalesReport_AllDBs
GROUP BY datepart(yyyy, convert(date, yourdatestringfield, 5))
You should really just fix the table to hold dates instead of strings though. This is just going to lead to some nightmare scenarios and a slow slow query.
select distinct year(cast([datenum] as date)) year
from SalesReport_AllDBs
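To see what the conversion step in these answers is doing, here is a minimal Python sketch of the same idea: parse each string with an explicit day-month-year pattern (which is what CONVERT style 5 means), then collect the distinct years. The helper name `year_of` and the sample strings other than the one from the question are made up for illustration.

```python
from datetime import datetime

def year_of(text):
    # dd-mm-yy followed by a 12-hour clock time, matching the sample in the question.
    return datetime.strptime(text, "%d-%m-%y %I:%M:%S %p").year

# Hypothetical column values; only the first comes from the question.
raw_values = ["21-02-12 6:10:00 PM", "03-07-12 9:00:00 AM", "15-01-13 11:45:30 PM"]

distinct_years = sorted({year_of(value) for value in raw_values})
print(distinct_years)
```

Spelling out the pattern matters because a string such as 21-02-12 is ambiguous; without it the result depends on the server's language and date-format settings.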

Select "YYYY" component only from DateTime column

Using SQLCe, I have a column of DateTime type. I would like to filter just by year. Is it possible or should I store year separately, which seems to me redundant?
E.g. get distinct results of 2010,2011,2013.
Thanks
I think you have the DATEPART function (but not the YEAR function)
so
select DatePart(yyyy, <yourDateTime>)
or if that's for ordering, of course
order by DatePart(yyyy, <yourDatetime>)
EDIT
select max(InvoiceID)
from yourTable
where DatePart(yyyy, IssuedDate) = 2013
You can use the DATEPART function to return the year for that column:
SELECT DATEPART(yyyy, datetimecolumn) FROM YourTable
You can then filter on that expression in a WHERE clause:
WHERE DATEPART(yyyy, datetimecolumn) = 2014
The usual way to do this is to use a range filter:
select *
from table
where datecolumn >= '2012/01/01' and datecolumn < '2013/01/01'
This has the benefit that any index you may have on datecolumn can be used.
Since the answer you accepted shows that you only care about one single year, your objection to this answer doesn't really apply.
select max(InvoiceID)
from table
where IssuedDate >= '2012/01/01' and IssuedDate < '2013/01/01'
will work just fine.
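A small sketch can confirm that the range filter selects the same rows as the year-extraction filter. It uses Python's sqlite3 module rather than SQL CE, so `strftime('%Y', ...)` stands in for DATEPART, and the `Invoices` table, its index name, and its rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY, IssuedDate TEXT)")
conn.execute("CREATE INDEX ix_issued ON Invoices (IssuedDate)")
conn.executemany(
    "INSERT INTO Invoices VALUES (?, ?)",
    [(1, "2011-12-31 23:59:59"), (2, "2012-01-01 00:00:00"),
     (3, "2012-06-15 10:30:00"), (4, "2012-12-31 23:59:59"),
     (5, "2013-01-01 00:00:00")],  # hypothetical rows around the year boundaries
)

# Filter by extracting the year from every row.
by_function = """
SELECT max(InvoiceID) FROM Invoices
WHERE CAST(strftime('%Y', IssuedDate) AS INTEGER) = 2012
"""
# Filter with a half-open range on the raw column.
by_range = """
SELECT max(InvoiceID) FROM Invoices
WHERE IssuedDate >= '2012-01-01' AND IssuedDate < '2013-01-01'
"""

function_result = conn.execute(by_function).fetchone()[0]
range_result = conn.execute(by_range).fetchone()[0]

# The planner's description of each query; the exact wording varies by SQLite version.
plan_function = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + by_function)]
plan_range = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + by_range)]
print(function_result, range_result)
print(plan_function, plan_range)
```

Comparing the two plans is a quick way to check on your own engine whether the range form can seek on the index while the function form has to examine every row.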

Can we do partition date wise?

I am new to the concept of partitioning. I know horizontal partitioning, but can we partition by date?
In my project, I want a new partition to be created whenever a new year begins. Can anyone explain how to do this? I am working on ERP software that holds data from past years, and I need to partition it by year (for example, APR-2011 to MAR-2012 is one year).
I think what you are looking for is something like this:
DECLARE @referenceDate datetime = '3/1/2011'
SELECT
[sub].[Period] + 1 AS [PeriodNr],
YEAR(DATEADD(YEAR, [sub].[Period], @referenceDate)) AS [PeriodStartedIn],
COUNT(*) AS [NumberOfRecords],
SUM([sub].[Value]) AS [TotalValue]
FROM
(
SELECT
*,
FLOOR(DATEDIFF(MONTH, @referenceDate, [timestamp]) / 12.0) AS [Period]
FROM [erp]
WHERE [timestamp] >= @referenceDate
) AS [sub]
GROUP BY [sub].[Period]
ORDER BY [sub].[Period] ASC
When you say partitioning, I am curious whether you mean a windowed function or just grouping data in general. Let me show you two ways to aggregate data, grouping and windowed partitioning, in a self-demonstrating example:
declare @Orders table( id int identity, dt date, counts int)
insert into @Orders values ('1-1-12', 2),('1-1-12', 3),('1-18-12', 1),('2-11-12', 5),('3-1-12', 2),('6-1-12', 8),('10-1-12', 2),('1-13-13', 8)
-- To do days I need to do a group by
select
dt as DayDate
, SUM(counts) as sums
from @Orders
group by dt
-- To do months I need to group differently
select
DATEADD(month, datediff(month, 0, dt), 0) as MonthDate
-- above is a grouping trick: count the number of months between 1/1/1900 and my date field, then add them back to 1/1/1900.
-- This will always yield the first day of the month of the date field
, SUM(counts) as sums
from @Orders
group by DATEADD(month, datediff(month, 0, dt), 0)
-- well that is great but what if I want to group different ways all at once?
-- why windowed functions rock:
select
dt
, counts
, SUM(counts) over(partition by DATEADD(year, datediff(year, 0, dt), 0)) as YearPartitioning
, SUM(counts) over(partition by DATEADD(month, datediff(month, 0, dt), 0)) as MonthPartitioning
-- expression above will work for year, month, day, minute, etc. You just need to use the same unit in both the dateadd and datediff functions
, SUM(counts) over(partition by dt) as DayPartitioning
from @Orders
The important concept in grouping is the traditional GROUP BY clause, in which you MUST LIST every selected column that is NOT inside an aggregate function; those columns are the pivot that the aggregates work on. So in my first select I just chose the date and then said sum(counts). It saw that 1-1-12 had two values, so it added them together, and every other date was summed on its own. In the second select I perform a trick on the date field to transform it to the first day of its month. Now this is great, but I may want all of this at once.
Windowed functions do groupings inline, meaning they don't need a grouping clause, as that is what the over() portion is doing. They may, however, repeat values, since you are not reducing your dataset. If you look at the third column of the third select, 'YearPartitioning', it repeats the number 23 seven times. Why? Because you never told the statement to do any grouping outside the function, so it shows every row. The number 23 appears on every row whose year is 2012. Just remember this when selecting from a windowed expression.
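The difference between the two styles can also be sketched outside SQL. The following Python snippet is an illustration only, not what SQL Server does internally: `month_start` mimics the DATEADD/DATEDIFF month trick, `group_sum` behaves like GROUP BY (one result per key), and `window_sum` behaves like SUM() OVER (PARTITION BY ...), keeping every input row. The three function names are invented; the data is the sample from the answer.

```python
from collections import defaultdict
from datetime import date

orders = [
    (date(2012, 1, 1), 2), (date(2012, 1, 1), 3), (date(2012, 1, 18), 1),
    (date(2012, 2, 11), 5), (date(2012, 3, 1), 2), (date(2012, 6, 1), 8),
    (date(2012, 10, 1), 2), (date(2013, 1, 13), 8),
]

def month_start(d):
    # Count whole months since 1900-01-01, then add them back to 1900-01-01.
    months = (d.year - 1900) * 12 + (d.month - 1)
    return date(1900 + months // 12, months % 12 + 1, 1)

def group_sum(rows, key):
    # GROUP BY: collapses the rows to one total per key.
    totals = defaultdict(int)
    for d, counts in rows:
        totals[key(d)] += counts
    return dict(totals)

def window_sum(rows, key):
    # Windowed SUM: every row survives and carries its partition's total.
    totals = group_sum(rows, key)
    return [(d, counts, totals[key(d)]) for d, counts in rows]

monthly = group_sum(orders, month_start)
yearly_window = window_sum(orders, lambda d: d.year)
print(monthly)
print(yearly_window)
```

The grouped result has six entries, one per month present in the data, while the windowed result still has all eight rows, with the 2012 total repeated on each of the seven 2012 rows.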

What is the fastest way to group a DateTime column by Date in T-SQL

I have an older sql 2005 box, and I need to do some summaries of a table with ~500m rows.
I have a datetime column in the table and I want to get just the date out of it for output and group by. I know there are a few ways to do this, but what is the absolute fastest?
Thanks
I suspect the fastest would be to:
SELECT
the_day = DATEADD(DAY, the_day, '19000101'),
the_count
FROM
(
SELECT
the_day = DATEDIFF(DAY, '19000101', [the_datetime_column]),
the_count = COUNT(*)
FROM dbo.the_table
WHERE ...
GROUP BY DATEDIFF(DAY, '19000101', [the_datetime_column])
) AS x;
But "fastest" is relative here, and it will depend largely on the indexes on the table, how you're filtering out rows, etc. You will want to test this against other typical date truncation methods, such as CONVERT(CHAR(8), [the_datetime_column], 112).
What you could consider - depending on whether this query is more important than write performance - is adding a persisted computed column with an index, or an indexed view, that would help this aggregation for you at write time instead of query time.
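The shape of the first query (group on an integer day number, convert back to a date only once per group) can be sketched in Python as follows. This illustrates the logic only and makes no claim about performance on SQL Server; `day_number`, `count_by_day`, and the sample timestamps are invented for the example.

```python
from collections import Counter
from datetime import date, datetime, timedelta

BASE = date(1900, 1, 1)

def day_number(ts):
    # Equivalent in spirit to DATEDIFF(DAY, '19000101', ts): whole days since the base date.
    return (ts.date() - BASE).days

def count_by_day(timestamps):
    # Group on the cheap integer key first ...
    counts = Counter(day_number(ts) for ts in timestamps)
    # ... then convert each key back to a date once per group, like the outer DATEADD.
    return {BASE + timedelta(days=n): c for n, c in sorted(counts.items())}

# Hypothetical sample timestamps.
stamps = [
    datetime(2009, 1, 1, 0, 0, 0),
    datetime(2009, 1, 1, 23, 59, 59),
    datetime(2009, 1, 2, 12, 0, 0),
]
per_day = count_by_day(stamps)
print(per_day)
```

The conversion back to a date runs once per distinct day rather than once per row, which is the reason the answer wraps the aggregation in a subquery.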
I imagine you can get slightly better performance this way.
SELECT cast(floor(cast([<yourdate>] as float)) as datetime) as [yourdate], count(*) as count
FROM <yourtable>
GROUP BY cast(floor(cast([<yourdate>] as float)) as datetime)
You can improve this once you upgrade to SQL Server 2008.
SELECT cast([<yourdate>] as date) as [yourdate], count(*) as count
FROM <yourtable>
GROUP BY cast([<yourdate>] as date)

Execute count(*) on a group-by result-set

I am trying to do a nice SQL statement inside a stored procedure.
I looked at the issue of seeing the number of days that events happened between two dates.
My example is sales orders: for this month, how many days did we have sales orders?
Suppose this setup:
CREATE TABLE `sandbox`.`orders` (
`year` int,
`month` int,
`day` int,
`desc` varchar(255)
)
INSERT INTO orders (`year`, `month`, `day`, `desc`)
VALUES (2009,1,1, 'New Years Resolution 1')
,(2009,1,1, 'Promise lose weight')
,(2009,1,2, 'Bagel')
,(2009,1,12, 'Coffee to go')
For this input data the result should be 3, since there have been three days with sales.
The best solution I found is below.
However, making a temporary table, counting it, and then dropping it seems excessive. It "should" be possible in one statement.
Does anyone have a "nicer" solution than mine?
/L
SELECT [Year], [Month], [Day]
INTO #Some_Days
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
SELECT count(*) from #Some_Days
Apologies if I'm misunderstanding the question, but perhaps you could do something like this, as an option:
SELECT COUNT(*) FROM
(SELECT DISTINCT(SomeColumn)
FROM MyTable
WHERE Something BETWEEN 100 AND 500
GROUP BY SomeColumn) MyTable
... to get around the temp-table creation and disposal?
There are two basic options which I can see. One is to group everything up in a sub query, then count those distinct rows (Christian Nunciato's answer). The second is to combine the multiple fields and count distinct values of that combined value.
In this case, the following formula converts the three fields into a single datetime.
DATEADD(DAY, [Quarter].[Day] - 1, DATEADD(MONTH, [Quarter].[Month] - 1, DATEADD(YEAR, [Quarter].[Year] - 1900, 0)))
Thus, COUNT(DISTINCT [formula]) will give the answer you need.
SELECT
COUNT(DISTINCT DATEADD(DAY, [Quarter].[Day] - 1, DATEADD(MONTH, [Quarter].[Month] - 1, DATEADD(YEAR, [Quarter].[Year] - 1900, 0))))
FROM
Quarter
WHERE
[Quarter].Start >= '2009-01-01'
AND [Quarter].[End] < '2009-01-16'
I usually use the subquery route, but depending on what you're doing, the indexes, the size of the table, the simplicity of the formula, etc., this can be faster...
Dems.
How about:
SELECT COUNT(DISTINCT day) FROM orders
WHERE (year, month) = (2009, 1);
Actually, I don't know if TSQL supports tuple comparisons, but you get the idea.
COUNT(DISTINCT expr) is standard SQL and should work everywhere.
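Both single-statement approaches from the answers above can be checked against the question's sample data with a short sketch. It uses Python's sqlite3 module, so it shows that the SQL shapes agree with each other, not how T-SQL or MySQL would plan them; the column `desc` is quoted because it is a reserved word, and the tuple comparison is written as two plain conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE orders (year INT, month INT, day INT, "desc" TEXT)')
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(2009, 1, 1, "New Years Resolution 1"),
     (2009, 1, 1, "Promise lose weight"),
     (2009, 1, 2, "Bagel"),
     (2009, 1, 12, "Coffee to go")],
)

# Option 1: count the distinct days directly.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT day) FROM orders WHERE year = 2009 AND month = 1"
).fetchone()[0]

# Option 2: group in a derived table, then count the grouped rows.
count_grouped = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT year, month, day
        FROM orders
        WHERE year = 2009 AND month = 1
        GROUP BY year, month, day
    ) AS days
    """
).fetchone()[0]

print(count_distinct, count_grouped)
```

Counting distinct `day` values alone is only safe here because the WHERE clause pins a single year and month; across several months, the derived-table form (or a combined date expression) is needed.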
You should use nested SELECT statements. The inner one should contain the GROUP BY clause, and the outer one should count its rows. I think "Christian Nunciato" has helped you already.
Select Count(1) As Quantity
From
(
SELECT [Year], [Month], [Day]
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
) AS InnerResultSet
SELECT [Year], [Month], [Day]
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
COMPUTE COUNT(*)