Execute count(*) on a group-by result-set - sql

I am trying to do a nice SQL statement inside a stored procedure.
I looked at the issue of seeing the number of days that events happened between two dates.
My example is sales orders: for this month, how many days did we have sales orders?
Suppose this setup:
CREATE TABLE `sandbox`.`orders` (
`year` int,
`month` int,
`day` int,
`desc` varchar(255)
)
INSERT INTO orders (year, month, day, desc)
VALUES (2009,1,1, 'New Years Resolution 1')
,(2009,1,1, 'Promise lose weight')
,(2009,1,2, 'Bagel')
,(2009,1,12, 'Coffee to go')
For this in-data the result should be 3, since there has been three days with sale.
The best solution I found is as below.
However, making a temporary table, counting that then dropping it seemes excess. It "should" be possible in one statement.
Anyone who got a "nicer" solution then me?
/L
SELECT [Year], [Month], [Day]
INTO #Some_Days
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
SELECT count(*) from #Some_Days

Apologies if I'm misunderstanding the question, but perhaps you could do something like this, as an option:
SELECT COUNT(*) FROM
(SELECT DISTINCT(SomeColumn)
FROM MyTable
WHERE Something BETWEEN 100 AND 500
GROUP BY SomeColumn) MyTable
... to get around the temp-table creation and disposal?

There are two basic options which I can see. One is to group everything up in a sub query, then count those distinct rows (Christian Nunciato's answer). The second is to combine the multiple fields and count distinct values of that combined value.
In this case, the following formula coverts the three fields into a single datetime.
DATEADD(YEAR, [Quarter].Year, DATEADD(MONTH, [Quarter].Month, DATEADD(DAY, [Quarter].DAY, 0), 0), 0)
Thus, COUNT(DISTINCT [formula]) will give the answer you need.
SELECT
COUNT(DISTINCT DATEADD(YEAR, [Quarter].Year, DATEADD(MONTH, [Quarter].Month, DATEADD(DAY, [Quarter].DAY, 0), 0), 0))
FROM
Quarter
WHERE
[Quarter].Start >= '2009-01-01'
AND [Quarter].End < '2009-01-16'
I usually use the sub query route, but depending on what you're doing, indexes, size of table, simplicity of the formula, etc, this Can be faster...
Dems.

How about:
SELECT COUNT(DISTINCT day) FROM orders
WHERE (year, month) = (2009, 1);
Actually, I don't know if TSQL supports tuple comparisons, but you get the idea.
COUNT(DISTINCT expr) is standard SQL and should work everywhere.

You should use nested Select statements. Inner one should contain group by clause, and the outer one should count it. I think "Christian Nunciato" helped you already.
Select Count(1) As Quantity
From
(
SELECT [Year], [Month], [Day]
INTO #Some_Days
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
) AS InnerResultSet

SELECT [Year], [Month], [Day]
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
COMPUTE COUNT(*)

Related

SQL How can I get a count of messages going out by month

I have a table that sends out messages, I would like to get a total count of the messages that have been going out month by month over the last year . I am new to SQL so I am having trouble with it . I am using MSSQL 2012 this is my sql
SELECT sentDateTime, MessageID, status AS total, CONVERT(NVARCHAR(10), sentDateTime, 120) AS Month
FROM MessageTable
WHERE CAST(sentDateTime AS DATE) > '2017-04-01'
GROUP BY CONVERT(NVARCHAR(10), sentDateTime, 120), sentDateTime, MessageID, status
ORDER BY Month;
I think the month() and year() functions are more convenient than datepart() for this purpose.
I would go for:
select year(sentDateTime) as yr, month(sentDateTime) as mon, count(*)
from MessageTable
where sentDateTime > '2017-04-01'
group by year(sentDateTime), month(sentDateTime)
order by min(sentDateTime);
Additional notes:
Only include the columns in the select that you care about. This would be the ones that define the month and the count.
Only include the columns in the group by that you care about. Every combination of the expressions in the group by found in the data define a column.
There is no need to convert sentDateTime to a date explicitly for the comparison.
The order by orders the results by time. Using the min() is a nice convenience.
Including the year() makes sure you don't make a mistake -- say by including data from 2018-04 with 2017-04.
-- this selects the part of the date you are looking for, replace this with the date format you are using, this should give you what you are looking for
SELECT DATEPART(mm, GETDATE())
SELECT COUNT(DATEPART(mm, sentDateTime)), MessageID, status
From MessageTable where Cast(sentDateTime as date) > '2017-04-01'
group by DATEPART(mm, sentDateTime), MessageID, status
order by DATEPART(mm, sentDateTime)
You can group by the month number of the sentDateTime with the function DATEPART(MONTH, sentDateTime). The next select will also yield results if no message was sent for a particular month (total = 0).
;WITH PossibleMonths AS
(
SELECT
M.PossibleMonth
FROM
(VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) M(PossibleMonth)
),
MonthTotals AS
(
select
COUNT(1) AS Total,
DATEPART(MONTH, sentDateTime) [Month]
From
MessageTable
where
Cast(sentDateTime as date) > '2017-04-01'
group by
DATEPART(MONTH, sentDateTime)
)
SELECT
P.PossibleMonth,
Total = ISNULL(T.Total, 0)
FROM
PossibleMonths AS P
LEFT JOIN MonthTotals AS T ON P.PossibleMonth = T.Month

How To Select Records in a Status Between Timestamps? T-SQL

I have a T-SQL Quotes table and need to be able to count how many quotes were in an open status during past months.
The dates I have to work with are an 'Add_Date' timestamp and an 'Update_Date' timestamp. Once a quote is put into a 'Closed_Status' of '1' it can no longer be updated. Therefore, the 'Update_Date' effectively becomes the Closed_Status timestamp.
I'm stuck because I can't figure out how to select all open quotes that were open in a particular month.
Here's a few example records:
Quote_No Add_Date Update_Date Open_Status Closed_Status
001 01-01-2016 NULL 1 0
002 01-01-2016 3-1-2016 0 1
003 01-01-2016 4-1-2016 0 1
The desired result would be:
Year Month Open_Quote_Count
2016 01 3
2016 02 3
2016 03 2
2016 04 1
I've hit a mental wall on this one, I've tried to do some case when filtering but I just can't seem to figure this puzzle out. Ideally I wouldn't be hard-coding in dates because this spans years and I don't want to maintain this once written.
Thank you in advance for your help.
You are doing this by month. So, three options come to mind:
A list of all months using left join.
A recursive CTE.
A number table.
Let me show the last:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master..spt_values
)
select format(dateadd(month, n.n, q.add_date), 'yyyy-MM') as yyyymm,
count(*) as Open_Quote_Count
from quotes q join
n
on (closed_status = 1 and dateadd(month, n.n, q.add_date) <= q.update_date) or
(closed_status = 0 and dateadd(month, n.n, q.add_date) <= getdate())
group by format(dateadd(month, n.n, q.add_date), 'yyyy-MM')
order by yyyymm;
This does assume that each month has at least one open record. That seems reasonable for this purpose.
You can use datepart to extract parts of a date, so something like:
select datepart(year, add_date) as 'year',
datepart(month, date_date) as 'month',
count(1)
from theTable
where open_status = 1
group by datepart(year, add_date), datepart(month, date_date)
Note: this counts for the starting month and primarily to show the use of datepart.
Updated as misunderstood the initial request.
Consider following test data:
DECLARE #test TABLE
(
Quote_No VARCHAR(3),
Add_Date DATE,
Update_Date DATE,
Open_Status INT,
Closed_Status INT
)
INSERT INTO #test (Quote_No, Add_Date, Update_Date, Open_Status, Closed_Status)
VALUES ('001', '20160101', NULL, 1, 0)
, ('002', '20160101', '20160301', 0, 1)
, ('003', '20160101', '20160401', 0, 1)
Here is a recursive solution, that doesn't rely on system tables BUT also performs poorer. As we are talking about months and year combinations, the number of recursions will not get overhand.
;WITH YearMonths AS
(
SELECT YEAR(MIN(Add_Date)) AS [Year]
, MONTH(MIN(Add_Date)) AS [Month]
, MIN(Add_Date) AS YMDate
FROM #test
UNION ALL
SELECT YEAR(DATEADD(MONTH,1,YMDate))
, MONTH(DATEADD(MONTH,1,YMDate))
, DATEADD(MONTH,1,YMDate)
FROM YearMonths
WHERE YMDate <= SYSDATETIME()
)
SELECT [Year]
, [Month]
, COUNT(*) AS Open_Quote_Count
FROM YearMonths ym
INNER JOIN #test t
ON (
[Year] * 100 + [Month] <= CAST(FORMAT(t.Update_Date, 'yyyyMM') AS INT)
AND t.Closed_Status = 1
)
OR (
[Year] * 100 + [Month] <= CAST(FORMAT(SYSDATETIME(), 'yyyyMM') AS INT)
AND t.Closed_Status = 0
)
GROUP BY [Year], [Month]
ORDER BY [Year], [Month]
Statement is longer, also more readable and lists all year/month combinations to date.
Take a look at Date and Time Data Types and Functions for SQL-Server 2008+
and Recursive Queries Using Common Table Expressions

Can we do partition date wise?

I am new to the concept of Partition. I know horizontal partition but can we do partition date - wise?
in my project I want that whenever we enter in new-year, partition should be created. Can anyone explain how to do this? I am working on ERP sw and it has data of past year and I need partition on year wise.(for example APR-2011 to MAR-2012 is a year)
I think what you are looking for is something like this:
DECLARE #referenceDate datetime = '3/1/2011'
SELECT
[sub].[Period] + 1 AS [PeriodNr],
YEAR(DATEADD(YEAR, [sub].[Period], #referenceDate)) AS [PeriodStartedIn],
COUNT(*) AS [NumberOfRecords],
SUM([sub].[Value]) AS [TotalValue]
FROM
(
SELECT
*,
FLOOR(DATEDIFF(MONTH, #referenceDate, [timestamp]) / 12.0) AS [Period]
FROM [erp]
WHERE [timestamp] >= #referenceDate
) AS [sub]
GROUP BY [sub].[Period]
ORDER BY [sub].[Period] ASC
The fiddle is found here.
When you say partitioning I am curious if you mean in a windowed function or just grouping data in general. Let me show you two ways to aggregate data with partitioning it in a self demonstrating example:
declare #Orders table( id int identity, dt date, counts int)
insert into #Orders values ('1-1-12', 2),('1-1-12', 3),('1-18-12', 1),('2-11-12', 5),('3-1-12', 2),('6-1-12', 8),('10-1-12', 2),('1-13-13', 8)
-- To do days I need to do a group by
select
dt as DayDate
, SUM(counts) as sums
from #Orders
group by dt
-- To do months I need to group differently
select
DATEADD(month, datediff(month, 0, dt), 0) as MonthDate
-- above is a grouping trick basically stating count from 1/1/1900 the number of months of difference from my date field.
--This will always yield the current first day of the month of a date field
, SUM(counts) as sums
from #Orders
group by DATEADD(month, datediff(month, 0, dt), 0)
-- well that is great but what if I want to group different ways all at once?
-- why windowed functions rock:
select
dt
, counts
, SUM(counts) over(partition by DATEADD(year, datediff(year, 0, dt), 0)) as YearPartitioning
, SUM(counts) over(partition by DATEADD(month, datediff(month, 0, dt), 0)) as MonthPartitioning
-- expression above will work for year, month, day, minute, etc. You just need to add it to both the dateadd and datediff functions
, SUM(counts) over(partition by dt) as DayPartitioning
from #Orders
The important concepts on grouping is the traditional group by clause which you MUST LIST that which is NOT performing a math operation on as a pivot to do work on. So in my first select I just chose date and then said sum(counts). It then saw on 1-1-12 it had two values so it added both of them and on everything else it added them individually. On the second select method I perform a trick on the date field to make it transform to the first day of the month. Now this is great but I may want all of this at once.
Windowed functions do groupings inline, meaning they don't need a grouping clause as that is what the over() portion is doing. It however may repeat the values since you are not limiting your dataset. This means that if you look at the third column of the third select 'YearPartitioning' it repeats the number 23 seven times. WHy? Well because you never told the statement to do any grouping outside the function so it is showing every row. The number 23 will occur as long as the expression is true that the year is the same for all values. Just remember this when selecting from a windowed expression.

Grouping by Week in SQL, but displaying full datetime?

the following statement:
group by datepart(wk, createdon)
Groups the selected rows according to the week in which they fall. My select statement shows the following:
SELECT
datepart(wk, createdon) week,
How do I display the actual datetime of the week("12-10-2012"), instead of the number ("12")?
If I understand you correctly, you're wanting to group by week but also return the full datetime. You must include the selected fields in aggregate columns.
Thusly should work:
SELECT DATEPART(wk, createdon) week, createdon
FROM TableName
GROUP BY DATEPART(wk, createdon), createdon
Works on MSSQL2008R2
Edit:
As the OP seems to want the starting date of the week, SO has the answer.
One option is using a cte with ROW_NUMBER instead:
WITH CTE AS
(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY datepart(wk, createdon) ORDER BY createdon)
, DATEADD(ww, DATEDIFF(ww,0,createdon), 0) As Week
, *
FROM dbo.Table
)
SELECT * FROM CTE
WHERE RN = 1
You can't group by a week, and then show constituent parts of it, i.e. days. You will need group by the date instead.

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using i.e. your DB's date functions and a derived table.
I had to do exactly the same thing recently. This is how I did it in T-SQL (
YMMV on speed, but I've found it performant enough over a coupla million rows of event data):
DECLARE #DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE #StartDate DATETIME
SET #StartDate = whatever
WHILE (#StartDate <= GETDATE())
BEGIN
INSERT INTO #DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, #StartDate), DATEPART(DAYOFYEAR, #StartDate)
SELECT #StartDate = DATEADD(DAY, 1, #StartDate)
END
-- This gives me a table of all days since whenever
-- you could select #StartDate as the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM #DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks`
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex but the same principle should apply. You may indeed need a table of dates for your second query
Option 1
You can create a temp table and insert dates with the range and do a left outer join with the usagelog
Option 2
You can programmetically insert the missing dates while evaluating the result set to produce the final output
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' + ndate AS DATETIME) as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, that should be enough for 30 years.
SQL Server has a limitation of 100 recursions per CTE, that's why the inner queries can return up to 100 rows each.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.