Can we do partition date wise? - sql

I am new to the concept of partitioning. I know about horizontal partitioning, but can we partition date-wise?
In my project I want a new partition to be created whenever we enter a new year. Can anyone explain how to do this? I am working on ERP software that holds data from past years, and I need year-wise partitions (for example, APR-2011 to MAR-2012 is one year).

I think what you are looking for is something like this:
DECLARE @referenceDate datetime = '3/1/2011'
SELECT
    [sub].[Period] + 1 AS [PeriodNr],
    YEAR(DATEADD(YEAR, [sub].[Period], @referenceDate)) AS [PeriodStartedIn],
    COUNT(*) AS [NumberOfRecords],
    SUM([sub].[Value]) AS [TotalValue]
FROM
(
    SELECT
        *,
        -- zero-based index of the 12-month period the row falls in
        FLOOR(DATEDIFF(MONTH, @referenceDate, [timestamp]) / 12.0) AS [Period]
    FROM [erp]
    WHERE [timestamp] >= @referenceDate
) AS [sub]
GROUP BY [sub].[Period]
ORDER BY [sub].[Period] ASC
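The period arithmetic in the query can be checked outside the database. Below is a minimal Python sketch, assuming a fiscal year that starts on 1 April 2011 (taken from the question's APR-2011 to MAR-2012 example, not from the answer's reference date). The helper names `month_diff` and `fiscal_period` are made up for illustration; `month_diff` mimics how `DATEDIFF(MONTH, ...)` counts month boundaries.

```python
from datetime import date

def month_diff(start: date, end: date) -> int:
    """Count month boundaries crossed, like T-SQL DATEDIFF(MONTH, start, end)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def fiscal_period(reference: date, ts: date) -> int:
    """Zero-based 12-month period index; floor division mirrors FLOOR(x / 12.0)."""
    return month_diff(reference, ts) // 12

# Assumed fiscal-year start (APR-2011); it should be the first day of a month,
# because the month difference ignores the day component.
reference = date(2011, 4, 1)
samples = [date(2011, 4, 1), date(2012, 3, 31), date(2012, 4, 1), date(2013, 6, 15)]
periods = [fiscal_period(reference, d) for d in samples]
print(periods)
```

Dates from 1 April 2011 through 31 March 2012 land in period 0, and 1 April 2012 opens period 1, which is the year boundary the question describes.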

When you say partitioning, I am curious whether you mean partitioning in a windowed function or just grouping data in general. Let me show you two ways to aggregate data, GROUP BY and windowed functions, in a self-demonstrating example:
declare @Orders table (id int identity, dt date, counts int)
insert into @Orders values ('1-1-12', 2),('1-1-12', 3),('1-18-12', 1),('2-11-12', 5),('3-1-12', 2),('6-1-12', 8),('10-1-12', 2),('1-13-13', 8)

-- To do days I need to do a group by
select
    dt as DayDate
,   SUM(counts) as sums
from @Orders
group by dt

-- To do months I need to group differently
select
    DATEADD(month, datediff(month, 0, dt), 0) as MonthDate
    -- above is a grouping trick: count the number of months between 1/1/1900 and my date field,
    -- then add that many months back to 1/1/1900.
    -- This always yields the first day of the month of the date field
,   SUM(counts) as sums
from @Orders
group by DATEADD(month, datediff(month, 0, dt), 0)

-- well that is great but what if I want to group different ways all at once?
-- why windowed functions rock:
select
    dt
,   counts
,   SUM(counts) over(partition by DATEADD(year, datediff(year, 0, dt), 0)) as YearPartitioning
,   SUM(counts) over(partition by DATEADD(month, datediff(month, 0, dt), 0)) as MonthPartitioning
    -- the expression above works for year, month, day, minute, etc. Just use the same unit in both dateadd and datediff
,   SUM(counts) over(partition by dt) as DayPartitioning
from @Orders
The important concept in grouping is that the traditional GROUP BY clause MUST LIST every selected column that is not inside an aggregate; those columns are the keys the aggregation works over. So in my first select I just chose the date and then said sum(counts). It saw that 1-1-12 had two values, so it added them together; every other date has a single row, so its sum is just that row's value. In the second select I perform a trick on the date field to transform it to the first day of the month. Now this is great, but I may want all of this at once.
Windowed functions do groupings inline, meaning they don't need a GROUP BY clause, as that is what the over() portion is doing. They may, however, repeat values, since you are not reducing your dataset. If you look at the third column of the third select, 'YearPartitioning', it repeats the number 23 seven times. Why? Because you never told the statement to do any grouping outside the function, so it shows every row. The number 23 appears on every row whose date falls in the same year (2012). Just remember this when selecting from a windowed expression.
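To see the difference between the two styles without a SQL Server instance, here is a small sketch using Python's built-in `sqlite3` module. This is an assumption-laden stand-in: SQLite's `strftime('%Y', dt)` replaces the DATEADD/DATEDIFF trick, the dates are rewritten in ISO format, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (dt TEXT, counts INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2012-01-01", 2), ("2012-01-01", 3), ("2012-01-18", 1), ("2012-02-11", 5),
     ("2012-03-01", 2), ("2012-06-01", 8), ("2012-10-01", 2), ("2013-01-13", 8)],
)

# GROUP BY collapses the rows: one output row per year.
grouped = conn.execute(
    "SELECT strftime('%Y', dt) AS yr, SUM(counts) FROM orders GROUP BY yr ORDER BY yr"
).fetchall()

# A window function keeps every row and repeats the partition total on each one.
windowed = conn.execute(
    "SELECT dt, counts, SUM(counts) OVER (PARTITION BY strftime('%Y', dt)) AS year_total "
    "FROM orders ORDER BY dt"
).fetchall()

print(grouped)
for row in windowed:
    print(row)
```

The grouped query returns two rows, while the windowed query returns all eight, with the 2012 total repeated on each of the seven 2012 rows.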

Related

Grouping all dates as one field and showing the sum of sales

I have converted all dates within my table to reflect as YYYY/MM/01 but I am left with 25 or so of these dates that are all the same and I just want to group them together and I can't figure out how to do it. I'm newish to SQL and was hoping someone could point me in the right direction for this.
Much appreciated!
SELECT
DATEFROMPARTS(YEAR(ReportedDate), MONTH(ReportedDate), 1) AS Date, SUM(Sales) Sales
FROM
dbo.Sales
WHERE
YEAR(ReportedDate) = 2018 AND MONTH(ReportedDate) = 01
GROUP BY
ReportedDate
Because you are grouping by ReportedDate, for every ReportedDate you will get a record, even though you didn't select ReportedDate in your SELECT clause. Think of it as a hidden column in your data. Instead, try grouping by the functions in your select statement.
SELECT
DATEFROMPARTS(YEAR(ReportedDate), MONTH(ReportedDate), 1) AS Date, SUM(Sales) Sales
FROM
dbo.Sales
WHERE
YEAR(ReportedDate) = 2018 AND MONTH(ReportedDate) = 01
GROUP BY
DATEFROMPARTS(YEAR(ReportedDate), MONTH(ReportedDate), 1)
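The effect of grouping by the raw column versus the month expression can be reproduced on a tiny dataset. This is a sketch in Python with the built-in `sqlite3` module, assuming SQLite's `date(x, 'start of month')` as a stand-in for `DATEFROMPARTS(YEAR(x), MONTH(x), 1)`; the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (reported_date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2018-01-03", 10), ("2018-01-03", 5), ("2018-01-17", 20), ("2018-01-29", 15)],
)

# Grouping by the raw date: one row per distinct day, all showing the same month start.
by_column = conn.execute(
    "SELECT date(reported_date, 'start of month'), SUM(sales) "
    "FROM sales GROUP BY reported_date ORDER BY reported_date"
).fetchall()

# Grouping by the same expression that is selected: one row per month.
by_expression = conn.execute(
    "SELECT date(reported_date, 'start of month'), SUM(sales) "
    "FROM sales GROUP BY date(reported_date, 'start of month')"
).fetchall()

print(by_column)
print(by_expression)
```

The first query yields three rows that all display 2018-01-01, which is the symptom described in the question; the second yields the single monthly total.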
As an alternative to your query, I suggest using the EOMONTH function, so you don't need the extra date functions. I also think it's better to show the last day of the month rather than the first day when showing totals per month.
SELECT
EOMONTH(ReportedDate) AS Date, SUM(Sales) Sales
FROM
dbo.Sales
WHERE
EOMONTH(ReportedDate) = EOMONTH(GETDATE(), -1)
GROUP BY
EOMONTH(ReportedDate)
Notes:
EOMONTH(GETDATE(), -1) gets last day of previous month
Use DATEADD(DD, 1, EOMONTH(ReportedDate, -1)) to get first day of month

SSIS - Sorted Aggregating

I have source data at the day granularity and I need to aggregate it to week granularity. Most fields are easy sum aggregations. But, I have one field that I need to take Sunday's value (kinda like a "first" aggregation) and another field that I need to take Saturday's value.
The road I'm going down using SSIS is to Multicast my source data three times, doing a regular Aggregate for the easy fields, and then using lookup joins to a calendar table to match the other two to Saturday and Sunday respectively to grab those values.... then merge joining everything back together.
Is there a better way to do this?
example source data:
What the output should look like:
Is there a better way to do this? Yes. Don't use a complicated SSIS solution for something that is a simple SQL statement
SELECT
    DATEPART(week, [Day]) AS [Week],
    SUM(Sales) AS Sales,
    -- Sunday's value
    MAX(CASE WHEN DATEPART(dw, [Day]) = 1 THEN BOP ELSE NULL END) AS BOP,
    -- Saturday's value
    MAX(CASE WHEN DATEPART(dw, [Day]) = 7 THEN EOP ELSE NULL END) AS EOP
FROM [Table]
GROUP BY DATEPART(week, [Day])
You might need to tweak the 1 and 7 depending on your server settings (DATEPART(dw, ...) depends on the session's DATEFIRST setting), but hopefully you get the idea.
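The conditional-aggregate idea can be tried on sample data. The following is a sketch in Python using the built-in `sqlite3` module, not SQL Server: it assumes SQLite's `strftime('%w', day)`, where 0 is Sunday and 6 is Saturday regardless of settings, and it derives a Sunday-based week start instead of a week number. The table name `daily` and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day TEXT, sales INTEGER, bop INTEGER, eop INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?, ?)",
    [("2024-01-07", 10, 100, 110),  # Sunday
     ("2024-01-08", 10, 110, 120),
     ("2024-01-09", 10, 120, 130),
     ("2024-01-10", 10, 130, 140),
     ("2024-01-11", 10, 140, 150),
     ("2024-01-12", 10, 150, 160),
     ("2024-01-13", 10, 160, 170),  # Saturday
     ("2024-01-14", 5, 170, 175)],  # Sunday of the next, incomplete week
)

weekly = conn.execute(
    """
    SELECT date(day, '-' || strftime('%w', day) || ' days') AS week_start,
           SUM(sales),
           MAX(CASE WHEN strftime('%w', day) = '0' THEN bop END),  -- Sunday's BOP
           MAX(CASE WHEN strftime('%w', day) = '6' THEN eop END)   -- Saturday's EOP
    FROM daily
    GROUP BY week_start
    ORDER BY week_start
    """
).fetchall()

print(weekly)
```

A week without a Saturday row comes back with NULL (None in Python) for EOP, which is worth deciding how to handle before loading the result anywhere.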
You can use First_value and Last_Value for this as below:
select top 1 with ties datepart(week, [day]) as [Week],
sum(sales) over(partition by datepart(week, [day])) as Sales,
FIRST_VALUE(BOP) over(partition by datepart(week, [day]) order by [day]) as BOP
, EOP = LAST_VALUE(EOP) over(partition by datepart(week, [day]) order by [day] RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING )
from #youraggregate
Order by Row_number() over(partition by datepart(week, [day]) order by [day])
Use Derived column transformation to get the week first
DATEPART("wk", Day)
After that use Aggregate using Week Column

Optimizing GROUP BY performance

Is there some trick to GROUP BY a value that has been defined by an alias or that is the result of a calculation? I think the following code double-dips by calculating MyMonth in the SELECT clause and then again in the GROUP BY clause, which may be unnecessary work. A simple GROUP BY MyMonth is not allowed. Is it possible to force month([MyDate]) to be calculated only once?
Update of code. Aggregate function is added.
SELECT month([MyDate]) AS MyMonth, count([MyDate]) AS HowMany
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY month([MyDate])
ORDER BY MyMonth
Your real problem likely stems from calling MONTH(...) on every row. This prevents the optimizer from using an index to fulfill the count (it can use it for the WHERE clause, but this will still be many rows).
Instead, you should turn this into a range query, that the optimizer could use for comparisons against an index. First we build a simple range table:
WITH Months as (SELECT MONTH(d) AS month,
d AS monthStart, DATEADD(month, 1, d) AS monthEnd
FROM (VALUES(CAST('20140101' AS DATE))) t(d)
UNION ALL
SELECT MONTH(monthEnd),
monthEnd, DATEADD(month, 1, monthEnd)
FROM Months
WHERE monthEnd < CAST('20150101' AS DATE))
(if you have an existing calendar table, you can base your query on that, but sometimes a simple ad-hoc one works best)
Once we have the range-table, you can then use it to constrain and bucket your data, like so:
SELECT Months.month, COUNT(*)
FROM TableA
JOIN Months
ON TableA.MyDate >= Months.monthStart
AND TableA.MyDate < Months.monthEnd
GROUP BY Months.month
Note: The start of the date range was changed to 2014-01-01, as it seems strange that you'd only include one day from January, when aggregating months...
No, you can't use column alias directly in the GROUP BY clause. Instead do a select in the from list, and use the result column in your group by.
select MyMonth, MAX(someothercolumn)
from
(
    SELECT month([MyDate]) AS MyMonth,
           someothercolumn
    FROM tableA
    WHERE [MyDate] BETWEEN '2014-01-31' AND '2014-12-31'
) AS t -- a derived table needs an alias
GROUP BY MyMonth
ORDER BY MyMonth

Execute count(*) on a group-by result-set

I am trying to do a nice SQL statement inside a stored procedure.
I looked at the issue of seeing the number of days that events happened between two dates.
My example is sales orders: for this month, how many days did we have sales orders?
Suppose this setup:
CREATE TABLE `sandbox`.`orders` (
`year` int,
`month` int,
`day` int,
`desc` varchar(255)
)
INSERT INTO orders (`year`, `month`, `day`, `desc`)
VALUES (2009,1,1, 'New Years Resolution 1')
,(2009,1,1, 'Promise lose weight')
,(2009,1,2, 'Bagel')
,(2009,1,12, 'Coffee to go')
For this input data the result should be 3, since there have been three days with sales.
The best solution I found is below.
However, creating a temporary table, counting its rows and then dropping it seems excessive. It "should" be possible in one statement.
Does anyone have a "nicer" solution than mine?
/L
SELECT [Year], [Month], [Day]
INTO #Some_Days
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
SELECT count(*) from #Some_Days
Apologies if I'm misunderstanding the question, but perhaps you could do something like this, as an option:
SELECT COUNT(*) FROM
(SELECT DISTINCT(SomeColumn)
FROM MyTable
WHERE Something BETWEEN 100 AND 500
GROUP BY SomeColumn) MyTable
... to get around the temp-table creation and disposal?
There are two basic options which I can see. One is to group everything up in a sub query, then count those distinct rows (Christian Nunciato's answer). The second is to combine the multiple fields and count distinct values of that combined value.
In this case, the following formula converts the three fields into a single datetime.
DATEADD(YEAR, [Quarter].[Year], DATEADD(MONTH, [Quarter].[Month], DATEADD(DAY, [Quarter].[Day], 0)))
Thus, COUNT(DISTINCT [formula]) will give the answer you need.
SELECT
    COUNT(DISTINCT DATEADD(YEAR, [Quarter].[Year], DATEADD(MONTH, [Quarter].[Month], DATEADD(DAY, [Quarter].[Day], 0))))
FROM
    Quarter
WHERE
    [Quarter].Start >= '2009-01-01'
    AND [Quarter].[End] < '2009-01-16'
I usually use the sub query route, but depending on what you're doing, the indexes, the size of the table, the simplicity of the formula, etc., this can be faster...
Dems.
How about:
SELECT COUNT(DISTINCT day) FROM orders
WHERE (year, month) = (2009, 1);
Actually, I don't know if TSQL supports tuple comparisons, but you get the idea.
COUNT(DISTINCT expr) is standard SQL and should work everywhere.
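Both one-statement approaches from the answers (COUNT(DISTINCT ...) and counting the rows of a grouped subquery) can be checked against the sample orders from the question. This sketch uses Python's built-in `sqlite3` module as a stand-in database; the column `description` replaces the reserved word `desc`, which is a renaming chosen here for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (year INTEGER, month INTEGER, day INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(2009, 1, 1, "New Years Resolution 1"),
     (2009, 1, 1, "Promise lose weight"),
     (2009, 1, 2, "Bagel"),
     (2009, 1, 12, "Coffee to go")],
)

# Option 1: COUNT(DISTINCT ...) on the day, restricted to a single month.
distinct_days = conn.execute(
    "SELECT COUNT(DISTINCT day) FROM orders WHERE year = 2009 AND month = 1"
).fetchone()[0]

# Option 2: group in a subquery, then count the grouped rows.
grouped_days = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT year, month, day FROM orders GROUP BY year, month, day) AS d"
).fetchone()[0]

print(distinct_days, grouped_days)
```

Counting distinct `day` values alone is only safe while the filter covers a single month; across months the grouped subquery (or a combined date value) avoids merging, say, 1 January with 1 February.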
You should use nested Select statements. Inner one should contain group by clause, and the outer one should count it. I think "Christian Nunciato" helped you already.
Select Count(1) As Quantity
From
(
    SELECT [Year], [Month], [Day]
    FROM Quarter
    WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
    GROUP BY [Year], [Month], [Day]
) AS InnerResultSet
SELECT [Year], [Month], [Day]
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
COMPUTE COUNT(*)

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
1. Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
2. Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using e.g. your DB's date functions and a derived table.
I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a couple of million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever -- placeholder: supply your own start date
WHILE (@StartDate <= GETDATE())
BEGIN
    INSERT INTO @DaysTable ( [Year], [Day] )
    SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)
    SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END
-- This gives me a table of all days since whenever
-- (you could set @StartDate to the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM @DaysTable AS days
LEFT JOIN (
    SELECT
        COUNT(*) AS NumEvents,
        DATEPART(YEAR, LogDate) AS [Year],
        DATEPART(DAYOFYEAR, LogDate) AS [Day]
    FROM LogData
    GROUP BY
        DATEPART(YEAR, LogDate),
        DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks`
GROUP BY course
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex but the same principle should apply. You may indeed need a table of dates for your second query
Option 1
You can create a temp table and insert dates with the range and do a left outer join with the usagelog
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' AS DATETIME) + ndate as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, which covers roughly 27 years.
SQL Server limits a recursive CTE to 100 recursion levels by default (the limit can be changed with OPTION (MAXRECURSION n)); that's why each inner query returns at most 100 rows.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.
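The generate-dates-then-outer-join pattern can be tried end to end on a few rows. This is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server: it assumes SQLite's `WITH RECURSIVE` and `date(d, '+1 day')` in place of the numbers CTEs, and the sample logins and the four-day range are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usagelog (userid INTEGER, entryts TEXT)")
conn.executemany(
    "INSERT INTO usagelog VALUES (?, ?)",
    [(1, "2024-01-01 09:00:00"), (1, "2024-01-01 17:30:00"),
     (2, "2024-01-01 10:00:00"), (3, "2024-01-03 08:15:00")],
)

per_day = conn.execute(
    """
    WITH RECURSIVE dates(d) AS (
        SELECT '2024-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2024-01-04'
    )
    SELECT d,
           COUNT(u.userid)          AS numlogins,  -- counts 0 when no row matched
           COUNT(DISTINCT u.userid) AS numusers
    FROM dates
    LEFT JOIN usagelog AS u
           ON u.entryts >= d AND u.entryts < date(d, '+1 day')
    GROUP BY d
    ORDER BY d
    """
).fetchall()

for row in per_day:
    print(row)
```

Days without logins still appear, with zeros in both count columns, because `COUNT(u.userid)` ignores the NULLs produced by the outer join. Keeping the join condition as a half-open range on `entryts` also leaves the timestamp column free of function calls.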