SQL group by day, with count

I've got a log table in SQL Server that looks like this:
CREATE TABLE [dbo].[RefundProcessLog](
[LogId] [bigint] IDENTITY(1,1) NOT NULL,
[LogDate] [datetime] NOT NULL,
[LogType] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[RefundId] [int] NULL,
[RefundTypeId] [smallint] NULL,
[LogMessage] [varchar](1000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[LoggedBy] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_RefundProcessLog] PRIMARY KEY CLUSTERED
(
[LogId] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
What I want is a list of results that represents how many different RefundIds were processed each day, throwing out any NULLs.
What SQL would I need to write to produce these results?

I like this approach (MS SQL):
SELECT
Convert(char(8), LogDate, 112) AS LogDay,
count(distinct RefundId) AS RefundCount
FROM RefundProcessLog
GROUP BY Convert(char(8), LogDate, 112)
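For reference, style 112 is the ISO yyyymmdd format, so this approach groups on an 8-character string rather than on a date value. A quick check with a hypothetical timestamp (assumes SQL Server 2008 or later for the date type used for comparison):

```sql
DECLARE @LogDate datetime = '2013-05-01T14:30:00';  -- hypothetical log timestamp

SELECT CONVERT(char(8), @LogDate, 112) AS StyleKey,  -- '20130501': yyyymmdd text, time dropped
       CAST(@LogDate AS date)           AS DateKey;   -- the same day as a real date value
```

Because yyyymmdd text sorts in the same order as the dates it represents, ordering by this key still gives chronological order; the difference is that the client receives a string rather than a date.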

select cast(LogDate as date) as LogDate, count(refundId) as refundCount
from yourTable
group by cast(LogDate as date)
Depending on the dialect of SQL you're using, you may have to change the CAST to something else. The expression should convert the LogDate to a date-only value.
Also, if by "different refundids" you mean that a RefundId can appear several times in a day and should only be counted once, use count(DISTINCT refundId).
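A small self-contained sketch of the difference between count(refundId) and count(DISTINCT refundId), using a temp copy of the question's table trimmed to the relevant columns and a few made-up rows (assumes SQL Server 2016 or later for DROP TABLE IF EXISTS):

```sql
-- Temp copy of the log table with hypothetical rows
DROP TABLE IF EXISTS #RefundProcessLog;
CREATE TABLE #RefundProcessLog (
    LogId    bigint IDENTITY(1,1) PRIMARY KEY,
    LogDate  datetime NOT NULL,
    RefundId int NULL
);

INSERT INTO #RefundProcessLog (LogDate, RefundId) VALUES
    ('2013-05-01T09:15:00', 101),
    ('2013-05-01T14:30:00', 101),   -- same refund logged twice on the same day
    ('2013-05-01T16:45:00', 102),
    ('2013-05-01T17:00:00', NULL),  -- NULL RefundId: ignored by COUNT(column)
    ('2013-05-02T08:00:00', 103);

SELECT CAST(LogDate AS date)    AS LogDay,
       COUNT(RefundId)          AS RefundCount,          -- non-NULL rows per day
       COUNT(DISTINCT RefundId) AS DistinctRefundCount   -- each RefundId once per day
FROM #RefundProcessLog
GROUP BY CAST(LogDate AS date)
ORDER BY LogDay;
-- Expected: 2013-05-01 | 3 | 2 and 2013-05-02 | 1 | 1
```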

What database vendor are you using? Whichever it is, replace "DateOnly(LogDate)" in the following with the appropriate construct to extract the date portion (strip off the time) from the LogDate column value, and then try this:
Select [DateOnly(LogDate)], Count(Distinct RefundId)
From RefundProcessLog
Group By [DateOnly(LogDate)]
In SQL Server, for example, the appropriate construct would be:
Select DateAdd(day, 0, DateDiff(day, 0, LogDate)), Count(Distinct RefundId)
From RefundProcessLog
Group By DateAdd(day, 0, DateDiff(day, 0, LogDate))
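How the DateAdd/DateDiff idiom works: the literal 0 converts to the datetime 1900-01-01, DateDiff(day, 0, LogDate) counts the day boundaries between that base date and LogDate, and converting that whole-day count back to a datetime (DateAdd with 0 days does this) lands on midnight of the same day. A quick check with a hypothetical timestamp:

```sql
DECLARE @LogDate datetime = '2013-05-01T14:30:00';  -- hypothetical log timestamp

SELECT DATEDIFF(day, 0, @LogDate)                  AS DaysSinceBase,  -- day boundaries since 1900-01-01 (0 as datetime)
       DATEADD(day, 0, DATEDIFF(day, 0, @LogDate)) AS DayStart;       -- count converted back to datetime: 2013-05-01 00:00:00.000
```

Grouping on this expression keeps the result a datetime, which is useful on versions before SQL Server 2008, where the date type is not available.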

SELECT COUNT(RefundId), DateOnly(LogDate) LoggingDate
FROM RefundProcessLog
GROUP BY DateOnly(LogDate)
"DateOnly" stands for whatever expression your SQL database uses to strip the time from a datetime, and you haven't specified which database you use.
For SQL Server you could use DateAdd(dd, 0, DateDiff(dd, 0, LogDate)) for "DateOnly".

SQL Server 2008 introduced the date data type, which makes the following possible:
select convert(date, LogDate) AS LogDay,
count(refundid) AS 'refunds'
from RefundProcessLog
group by convert(date, LogDate)
order by convert(date, LogDate)

In SQL Server, it would be something like:
select datepart(YEAR, [LogDate]), datepart(MONTH, [LogDate]), datepart(DAY, [LogDate]), count(refundid) as [Count]
from [RefundProcessLog]
group by datepart(YEAR, [LogDate]), datepart(MONTH, [LogDate]), datepart(DAY, [LogDate])

Select count(*), LogDate, refundid from RefundProcessLog
where refundid is not null
group by LogDate, refundid
Edit:
Or drop RefundId if you don't want the counts broken down by refund.


SQL order by needs to check if DATETIME2 is not null then return them first and after order by id

I have two tables and I have trouble figuring out how to do the order by statement to fit my needs.
Basically, if the FeaturedUntil column is greater than now, these rows should be returned first, ordered by the PurchasedAt column with the most recent purchases first. After these, everything should be ordered by the item Id column, descending.
Create Table Script
create table Items(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] nvarchar(200) null,
)
create table Feature(
[Id] [int] IDENTITY(1,1) NOT NULL,
[PurchasedAt] [datetime2](7) NOT NULL,
[FeaturedUntil] [datetime2](7) NOT NULL,
[ItemId] [int] NOT NULL,
)
Insert Script
insert into Items(Name) values ('test1')
insert into Feature(PurchasedAt, FeaturedUntil, ItemId) values (dateadd(day, -3, getdate()), dateadd(month, 1, getdate()), @@IDENTITY)
insert into Items(Name) values ('test2')
insert into Feature(PurchasedAt, FeaturedUntil, ItemId) values (dateadd(day, -2, getdate()), dateadd(month, 1, getdate()), @@IDENTITY)
insert into Items(Name) values ('test3')
insert into Feature(PurchasedAt, FeaturedUntil, ItemId) values (dateadd(day, -1, getdate()), dateadd(month, -1, getdate()), @@IDENTITY)
insert into Items(Name) values ('test4')
Select Script
select *
from Items i
left join Feature f on i.Id = f.ItemId
order by
case when f.FeaturedUntil is not null THEN f.PurchasedAt
else i.Id
end
The select should return test2 first, as its FeaturedUntil is greater than now and it is the most recently purchased; the second row should be test1, as it was bought before test2. After these should come test4 and, last, test3, because these have no joined Feature row or their FeaturedUntil is not greater than now, and these are ordered by their Item.Id descending.
SELECT *
FROM items i
LEFT JOIN feature f
ON i.id = f.itemid
ORDER BY CASE
WHEN f.featureduntil > getdate() THEN purchasedat
ELSE '19000101'
END DESC,
i.id DESC
You need to sort descending to get the most recent purchase first. The ID sort still applies as a tiebreaker: if two rows have the same PurchasedAt, they are sorted by ID, and since every row that is not currently featured gets the same '19000101' placeholder, those rows all end up sorted by ID.
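A runnable sketch of this answer's ORDER BY against the question's own sample rows, using temp tables so it doesn't touch real data. It assumes SQL Server 2016 or later (DROP TABLE IF EXISTS) and swaps the question's @@IDENTITY for SCOPE_IDENTITY(), which ignores identity values generated by triggers in other scopes:

```sql
DROP TABLE IF EXISTS #Feature;
DROP TABLE IF EXISTS #Items;

CREATE TABLE #Items (
    Id   int IDENTITY(1,1) NOT NULL,
    Name nvarchar(200) NULL
);
CREATE TABLE #Feature (
    Id            int IDENTITY(1,1) NOT NULL,
    PurchasedAt   datetime2(7) NOT NULL,
    FeaturedUntil datetime2(7) NOT NULL,
    ItemId        int NOT NULL
);

-- Same rows as the question; SCOPE_IDENTITY() is the Id just generated in #Items
INSERT INTO #Items (Name) VALUES (N'test1');
INSERT INTO #Feature (PurchasedAt, FeaturedUntil, ItemId)
VALUES (DATEADD(day, -3, GETDATE()), DATEADD(month, 1, GETDATE()), SCOPE_IDENTITY());
INSERT INTO #Items (Name) VALUES (N'test2');
INSERT INTO #Feature (PurchasedAt, FeaturedUntil, ItemId)
VALUES (DATEADD(day, -2, GETDATE()), DATEADD(month, 1, GETDATE()), SCOPE_IDENTITY());
INSERT INTO #Items (Name) VALUES (N'test3');
INSERT INTO #Feature (PurchasedAt, FeaturedUntil, ItemId)
VALUES (DATEADD(day, -1, GETDATE()), DATEADD(month, -1, GETDATE()), SCOPE_IDENTITY());  -- feature already expired
INSERT INTO #Items (Name) VALUES (N'test4');  -- never featured

SELECT i.Id, i.Name, f.PurchasedAt, f.FeaturedUntil
FROM #Items i
LEFT JOIN #Feature f ON i.Id = f.ItemId
ORDER BY
    -- active features keep their purchase date; everything else gets the same sentinel
    CASE WHEN f.FeaturedUntil > GETDATE() THEN f.PurchasedAt ELSE '19000101' END DESC,
    i.Id DESC;  -- tiebreaker: rows sharing the sentinel are ordered by Id
```

With the question's data this returns test2, test1, test4, test3: the two active features by newest purchase, then the rest by Id descending, because they all share the '19000101' sentinel.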
Based on what you've told us, I think this might be what you're after:
ORDER BY CASE WHEN FeaturedUntil > GETDATE() THEN PurchasedAt ELSE '99991231' END ASC, --Active features first, in date order
--(past features get a dummy far-future date, so they always sort last)
i.Id; --Then ID
Try the following.
select *, case when f.FeaturedUntil > getdate() THEN f.PurchasedAt else NULL end AS PurchasedAtNew
from Items i
left join Feature f on i.Id = f.ItemId
order by PurchasedAtNew desc, i.Id desc

SQL - condition on a timestampdiff

I would like to get the average resolution time for tickets,
from state 'billet ouvert' (ticket opened) to state 'résolu' (resolved).
Table Sample
The queries I tried:
Query 1:
SELECT
title AS 'Etat', ticket_id, user_id,
AVG(TIMESTAMPDIFF(HOUR,
helpdesk_followup.date having title in ('billet ouvert'),
helpdesk_followup.date having title in ('résolu'))
) AS 'moyenne'
FROM helpdesk_followup
GROUP BY user_id;
Query 2:
SELECT
title AS 'Etat', ticket_id, user_id,
AVG(TIMESTAMPDIFF(HOUR,
helpdesk_followup.date as date1,
helpdesk_followup.date as date2)
) AS 'moyenne'
FROM helpdesk_followup
WHERE date1 having title IN 'résolu'
AND date2 having title IN 'billet ouvert'
GROUP BY user_id;
But these queries don't give the result I need. How can I add a condition to a TIMESTAMPDIFF?
The first column is the starting event and the second column is the end event.
I have calculated the average in minutes. This SQL pairs each row with the next row in date order and works off their titles, which you may wish to tweak to something more distinct, such as ticket_id.
select a.title, b.title, avg(DateDiff(MINUTE, '00:00:00', b.[date] ) - DateDiff(MINUTE, '00:00:00', a.[date] ) ) from
(select *, row_number() over (order by [date]) hf from helpdesk_followup) a
join (select *, row_number() over (order by [date]) hf from helpdesk_followup) b on (a.hf=b.hf-1)
group by
a.title, b.title
I have left out the user_id from the query as I'm unsure if you wish to break it down using that field.
Hopefully it's a start for you to amend into what you need.
EDIT: Here is the test data I used for the query
CREATE TABLE [dbo].[helpdesk_followup](
[title] [varchar](50) NULL,
[ticket_id] [int] NULL,
[user_id] [int] NULL,
[date] [datetime] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[helpdesk_followup] ([title], [ticket_id], [user_id], [date]) VALUES (N'billet ouvert', 133, NULL, CAST(N'2015-07-22 15:36:00.000' AS DateTime))
GO
INSERT [dbo].[helpdesk_followup] ([title], [ticket_id], [user_id], [date]) VALUES (N'résolu', 133, 19, CAST(N'2015-07-23 15:36:00.000' AS DateTime))
GO
INSERT [dbo].[helpdesk_followup] ([title], [ticket_id], [user_id], [date]) VALUES (N'billet ouvert', 134, 15, CAST(N'2015-07-23 15:36:00.000' AS DateTime))
GO
INSERT [dbo].[helpdesk_followup] ([title], [ticket_id], [user_id], [date]) VALUES (N'résolu', 134, 21, CAST(N'2015-07-27 15:36:00.000' AS DateTime))
GO
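A different technique from the row_number pairing above, sketched against the same four test rows: conditional aggregation per ticket_id picks out each ticket's open and resolve times, so events from different tickets are never paired. It assumes each ticket has at most one 'billet ouvert' and one 'résolu' row (MIN picks the earliest otherwise), uses a temp copy with an nvarchar title so 'é' survives regardless of the database code page, and assumes SQL Server 2016 or later for DROP TABLE IF EXISTS:

```sql
DROP TABLE IF EXISTS #helpdesk_followup;
CREATE TABLE #helpdesk_followup (
    title     nvarchar(50) NULL,  -- nvarchar so N'résolu' round-trips in any code page
    ticket_id int NULL,
    user_id   int NULL,
    [date]    datetime NULL
);
INSERT INTO #helpdesk_followup (title, ticket_id, user_id, [date]) VALUES
    (N'billet ouvert', 133, NULL, '2015-07-22T15:36:00'),
    (N'résolu',        133, 19,   '2015-07-23T15:36:00'),
    (N'billet ouvert', 134, 15,   '2015-07-23T15:36:00'),
    (N'résolu',        134, 21,   '2015-07-27T15:36:00');

-- One row per ticket: when it was opened and when it was resolved
WITH Tickets AS (
    SELECT ticket_id,
           MIN(CASE WHEN title = N'billet ouvert' THEN [date] END) AS OpenedAt,
           MIN(CASE WHEN title = N'résolu'        THEN [date] END) AS ResolvedAt
    FROM #helpdesk_followup
    GROUP BY ticket_id
)
SELECT AVG(DATEDIFF(MINUTE, OpenedAt, ResolvedAt) * 1.0) AS AvgMinutes  -- * 1.0 avoids integer averaging
FROM Tickets
WHERE OpenedAt IS NOT NULL AND ResolvedAt IS NOT NULL;  -- only tickets that have both events
-- Expected: 3600 (ticket 133 took 1440 minutes, ticket 134 took 5760)
```

Adding user_id or another column to the GROUP BY inside the CTE would break the average down further, if that turns out to be needed.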

Query optimization for convert VARBINARY to VARCHAR and charindex on it

I have a repository table which has around 18.7 million rows, and every month around 500 thousand to 100 thousand rows are added. The table structure is as follows:
CREATE TABLE [dbo].[my_table](
[id] [bigint] NULL,
[a_timestamp] [datetime] NULL,
[eventId] [bigint] NULL,
[userId] [varchar](255) NULL,
[customerid] [varchar](128) NULL,
[messageType] [varchar](100) NULL,
[message] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I have written the following query to get various counts for each month. The query currently takes around 10 minutes to execute. I need help to optimize this query and, if possible, bring the time down to a couple of minutes.
SELECT DATEADD(month, DATEDIFF(month, 0,a_timestamp), 0) AS MonthYear,
COUNT(*) AS [Count],
COUNT(DISTINCT customerid) AS [Unique Customers],
COUNT(DISTINCT userId) AS [Unique Users]
FROM [my_table]
WHERE messageType = 'Outbound'
AND userId NOT IN ('master', 'admin')
AND CHARINDEX('Retrieve Document',CONVERT(VARCHAR(MAX),[message])) > 1
GROUP BY DATEADD(month, DATEDIFF(month, 0,a_timestamp), 0)
ORDER BY MonthYear
I think the key reasons for the long execution time are as follows:
CHARINDEX('Retrieve Document',CONVERT(VARCHAR(MAX),[message])) > 1: converting from VARBINARY to VARCHAR and searching for 'Retrieve Document'
userId NOT IN ('master', 'admin'): filtering out the users in the list (the actual list has around 10 strings, not just 2)
18.7 million rows in the table
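Separately from speed, the `> 1` in that condition may be worth checking: CHARINDEX returns the 1-based position of the first match, or 0 when there is none, so `> 1` also drops messages that start with 'Retrieve Document'. A small sketch with hypothetical payloads (assumes SQL Server 2016 or later for DROP TABLE IF EXISTS, and single-byte text in the varbinary, which is what CONVERT(VARCHAR(MAX), ...) assumes as well):

```sql
-- Hypothetical payloads stored as varbinary, like the [message] column
DROP TABLE IF EXISTS #Messages;
CREATE TABLE #Messages (Id int NOT NULL, [message] varbinary(max) NULL);
INSERT INTO #Messages (Id, [message]) VALUES
    (1, CAST('Retrieve Document 42' AS varbinary(max))),          -- phrase at the very start
    (2, CAST('Request: Retrieve Document 7' AS varbinary(max))),  -- phrase further in
    (3, CAST('Send Document 9' AS varbinary(max)));               -- no match

SELECT Id,
       CHARINDEX('Retrieve Document', CONVERT(varchar(max), [message])) AS Position,  -- 0 means not found
       CASE WHEN CHARINDEX('Retrieve Document', CONVERT(varchar(max), [message])) > 1
            THEN 'kept' ELSE 'skipped' END AS WithGreaterThan1,
       CASE WHEN CHARINDEX('Retrieve Document', CONVERT(varchar(max), [message])) > 0
            THEN 'kept' ELSE 'skipped' END AS WithGreaterThan0
FROM #Messages
ORDER BY Id;
-- Positions are 1, 10 and 0; only the > 0 test keeps row 1
```

If messages beginning with the phrase should be counted, `> 0` is the usual "contains" test.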
A couple of points to note:
I didn't create this table and I can't change it
I don't have SHOWPLAN permission
I need to use this query in Excel data connections and have the user run it from excel. The user will have only select privileges.
Given that you cannot change the existing table, it may be better to change your strategy.
Instead of running your query and rebuilding the complete result set every time, why not insert new results into another table (let's call it AccumulatedResults) on a monthly basis?
That way you are only handling the 500K new records each time, which will be much faster than rebuilding the entire result set every time. The query will look a little like this:
INSERT INTO AccumulatedResults
(
MonthYear,
[COUNT],
UniqueCustomers,
UniqueUsers
)
SELECT
DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0) AS MonthYear,
COUNT(*) AS [Count],
COUNT(DISTINCT customerid) AS [Unique Customers],
COUNT(DISTINCT userId) AS [Unique Users]
FROM
[my_table]
WHERE
messageType = 'Outbound' AND
userId NOT IN ('master', 'admin') AND
CHARINDEX('Retrieve Document', CONVERT(VARCHAR(MAX), [message])) > 1
-- New condition: only months not yet accumulated (ISNULL covers the first run, when the table is empty)
AND DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0)
> ISNULL((SELECT MAX(MonthYear) FROM AccumulatedResults), '19000101')
-- Skip the current, still-incomplete month so it isn't stored with partial counts and then skipped next time
AND a_timestamp < DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)
GROUP BY
DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0)
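The answer takes an existing AccumulatedResults table for granted. A minimal sketch of one (hypothetical: the column types are assumptions chosen to match the INSERT column list above), plus the read the Excel data connection could then run. Since the Excel users only have SELECT privileges, the monthly INSERT would presumably have to run under another account, for example as a scheduled job:

```sql
-- Hypothetical summary table; column names follow the INSERT above
IF OBJECT_ID(N'dbo.AccumulatedResults', N'U') IS NULL
    CREATE TABLE dbo.AccumulatedResults (
        MonthYear       datetime NOT NULL PRIMARY KEY,  -- first day of the month
        [COUNT]         int      NOT NULL,
        UniqueCustomers int      NOT NULL,
        UniqueUsers     int      NOT NULL
    );

-- What the Excel connection would run: a cheap read of the pre-computed rows
SELECT MonthYear,
       [COUNT]         AS [Count],
       UniqueCustomers AS [Unique Customers],
       UniqueUsers     AS [Unique Users]
FROM dbo.AccumulatedResults
ORDER BY MonthYear;
```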

SQL Aggregate Function Query not Producing Expected Results

The following is my query to go through about a million rows to calculate MTBUR (Mean Time Before Unscheduled Repair):
DECLARE @BeginDate date = '01-01-2013',
@EndDate date = '12-31-2013'
BEGIN
SELECT H.AutoType,
COALESCE(((SUM(H.Hours))/(CASE WHEN R.ReceivedDate BETWEEN @BeginDate AND @EndDate THEN COUNT(R.Confirmed) END)), SUM(H.Hours)) AS 'MTBUR'
FROM Hours H
INNER JOIN Repair R
ON H.SN = R.SN
WHERE (R.Confirmed NOT LIKE 'C%' AND R.Confirmed NOT LIKE 'O%')
AND (H.Date BETWEEN @BeginDate AND @EndDate)
GROUP BY H.AutoType,
R.ReceivedDate
END
The following are example results for 2 types:
Type | MTBUR
------------
a | value
a | value
a | value
b | value
b | value
b | value
I want my results to look like this:
Type | MTBUR
------------
a | value
b | value
Why is it grouping the same type several times? I want only one value for each type.
Also, why is the DBMS making me also group by ReceivedDate? I get the feeling that is screwing up my results. Any suggestions?
The following are my CREATE TABLE statements:
CREATE TABLE [dbo].[acss_hours](
[hoursId] [uniqueidentifier] NOT NULL,
[name] [nvarchar](100) NULL,
[Type] [nvarchar](100) NULL,
[SN] [nvarchar](100) NULL,
[Reg] [nvarchar](100) NULL,
[Hours] [float] NULL,
[Date] [datetime] NULL)
CREATE TABLE [dbo].[repair](
[repairId] [uniqueidentifier] NOT NULL,
[Part] [nvarchar](100) NULL,
[Customer] [nvarchar](100) NULL,
[AutoType] [nvarchar](100) NULL,
[ReceivedDate] [datetime] NULL,
[Confirmed] [nvarchar](100) NULL,
[Company] [nvarchar](100) NULL,
[Reg] [nvarchar](100) NULL,
[Manu] [nvarchar](100) NULL,
[SN] [nvarchar](100) NULL)
You are correct, adding ReceivedDate is screwing up your results. You are getting one row for each combination of type and ReceivedDate.
SQL Server is forcing you to add ReceivedDate to the GROUP BY because you are using it in the SELECT clause outside an aggregate. When SQL Server processes each AutoType, which ReceivedDate should it use? There are multiple ReceivedDates per AutoType. Either it needs to use each separate ReceivedDate, by adding it to the GROUP BY, or it can use an aggregate function like MIN or MAX to select one of the ReceivedDates.
How do you want your query to handle it?
I think you should wrap your CASE in the COUNT. COUNT ignores NULLs, so leave out the ELSE branch (an ELSE 0 would be counted too):
COUNT(CASE WHEN R.ReceivedDate BETWEEN @BeginDate AND @EndDate
THEN R.Confirmed END)
You need to include R.ReceivedDate in the GROUP BY because your calculation evaluates that column with BETWEEN outside of any aggregate; it's the same as including the column in the SELECT. Basically, any column in the SELECT list that isn't inside an aggregate function needs to be in the GROUP BY.
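A small sketch, with hypothetical rows, of how wrapping the CASE inside the aggregate behaves: COUNT skips NULLs, so a CASE without an ELSE counts only matching rows while ELSE 0 counts every row, and once ReceivedDate is used only inside an aggregate it no longer has to appear in the GROUP BY. Assumes SQL Server 2016 or later for DROP TABLE IF EXISTS:

```sql
DECLARE @BeginDate date = '20130101',
        @EndDate   date = '20131231';

-- Hypothetical repair rows for two types
DROP TABLE IF EXISTS #Repair;
CREATE TABLE #Repair (AutoType nvarchar(100) NOT NULL, ReceivedDate datetime NOT NULL);
INSERT INTO #Repair (AutoType, ReceivedDate) VALUES
    (N'a', '20130315'),
    (N'a', '20130620'),
    (N'a', '20140105'),  -- outside the 2013 range
    (N'b', '20131001');

SELECT AutoType,
       -- no ELSE: non-matching rows give NULL, which COUNT skips
       COUNT(CASE WHEN ReceivedDate BETWEEN @BeginDate AND @EndDate THEN 1 END)        AS InRange,
       -- ELSE 0 is not NULL, so COUNT counts every row
       COUNT(CASE WHEN ReceivedDate BETWEEN @BeginDate AND @EndDate THEN 1 ELSE 0 END) AS EveryRow
FROM #Repair
GROUP BY AutoType;  -- ReceivedDate only appears inside aggregates, so it is not grouped
-- Expected: a | 2 | 3 and b | 1 | 1
```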
You will have to make use of the keyword DISTINCT.
So effectively, you can query as such:
DECLARE @BeginDate date = '01-01-2013',
@EndDate date = '12-31-2013'
BEGIN
SELECT DISTINCT H.AutoType,
COALESCE(((SUM(H.Hours))/(CASE WHEN R.ReceivedDate BETWEEN @BeginDate AND @EndDate THEN COUNT(R.Confirmed) END)), SUM(H.Hours)) AS 'MTBUR'
FROM Hours H
INNER JOIN Repair R ON H.SN = R.SN
WHERE (R.Confirmed NOT LIKE 'C%' AND R.Confirmed NOT LIKE 'O%')
AND (H.Date BETWEEN @BeginDate AND @EndDate)
GROUP BY H.AutoType, R.ReceivedDate
END
Hope this helps!
This was too long to post as a comment. Try an inner query, aggregating each side separately so the join doesn't repeat rows:
SELECT h.AutoType, COALESCE(h.TotalHours / NULLIF(x.CountConfirmed, 0), h.TotalHours) AS 'MTBUR'
FROM
(SELECT AutoType, SUM(Hours) AS TotalHours
FROM Hours
WHERE ([Date] BETWEEN @BeginDate AND @EndDate)
GROUP BY AutoType) h
JOIN
(SELECT H.AutoType, SUM(CASE WHEN R.ReceivedDate BETWEEN @BeginDate AND @EndDate THEN 1 ELSE 0 END) AS CountConfirmed
FROM Hours H
INNER JOIN Repair R
ON H.SN = R.SN
WHERE (R.Confirmed NOT LIKE 'C%' AND R.Confirmed NOT LIKE 'O%')
AND (H.Date BETWEEN @BeginDate AND @EndDate)
GROUP BY H.AutoType) x
ON h.AutoType = x.AutoType
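One thing to watch with an inner query like this: when a row-level table such as Hours is joined to row-level results on AutoType, each row is repeated once per match, which inflates both SUMs. A small sketch with hypothetical data (#X stands in for the inner query's rows; assumes SQL Server 2016 or later for DROP TABLE IF EXISTS), followed by the usual remedy of aggregating each side before joining:

```sql
DROP TABLE IF EXISTS #Hours;
DROP TABLE IF EXISTS #X;
CREATE TABLE #Hours (AutoType nvarchar(100) NOT NULL, Hours float NOT NULL);
CREATE TABLE #X (AutoType nvarchar(100) NOT NULL, CountConfirmed int NOT NULL);  -- stands in for the inner query
INSERT INTO #Hours (AutoType, Hours) VALUES (N'a', 10), (N'a', 20);
INSERT INTO #X (AutoType, CountConfirmed) VALUES (N'a', 1), (N'a', 0), (N'a', 1);

-- Row-level join: each #Hours row appears once per matching #X row (2 x 3 = 6 rows)
SELECT H.AutoType, SUM(H.Hours) AS TotalHours, SUM(x.CountConfirmed) AS Confirmed
FROM #Hours H
JOIN #X x ON H.AutoType = x.AutoType
GROUP BY H.AutoType;  -- TotalHours 90 (30 x 3), Confirmed 4 (2 x 2)

-- Aggregate first, so each side has one row per AutoType before the join
SELECT h.AutoType, h.TotalHours, x.Confirmed,
       h.TotalHours / NULLIF(x.Confirmed, 0) AS MTBUR  -- NULLIF turns divide-by-zero into NULL
FROM (SELECT AutoType, SUM(Hours) AS TotalHours FROM #Hours GROUP BY AutoType) h
JOIN (SELECT AutoType, SUM(CountConfirmed) AS Confirmed FROM #X GROUP BY AutoType) x
  ON h.AutoType = x.AutoType;  -- TotalHours 30, Confirmed 2, MTBUR 15
```

NULLIF matters because SQL Server raises a divide-by-zero error rather than returning NULL, so without it a surrounding COALESCE fallback would never be reached.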

How can I group by time in SQL [duplicate]

Possible Duplicate:
SQL Query Group By Datetime problem?
I am working on an application with 2 steps.
Scan logs and persist data from them in a database.
Read data from database and visualize the data.
The first step is more or less finished. I'll try to explain the background and my requirement for the second step.
Each row in the database consists of some info like logdate, logfilename, LogType, logMessage etc. So, for example, I want to write SQL that summarizes a given LogType per day.
These are the columns:
[LogDate] [datetime] NOT NULL,
[Computer] [varchar](50) NOT NULL,
[Type] [varchar](50) NOT NULL,
[FileName] [varchar](100) NOT NULL,
[LineNo] [int] NOT NULL,
[UserName] [varchar](50) NOT NULL,
[Message] [varchar](max) NOT NULL,
I imagine the output could be like this if I want to show all rows with Type=TDBError:
Date Sum
2012-10-01 3
2012-10-02 12
2012-10-03 40
2012-10-05 24
2012-10-06 18
So on 2012-10-01 there were 3 rows in the DB where Type=TDBError, on 2012-10-02 there were 12, etc.
How should I write the SQL for this?
Assuming SQL Server 2008 or newer:
SELECT
[Date] = CONVERT(DATE, LogDate),
[Sum] = COUNT(*)
FROM dbo.Log_Table_Name
WHERE [Type] = 'TDBError'
GROUP BY CONVERT(DATE, LogDate)
ORDER BY [Date];
GROUP BY DATEPART(day, LogDate), DATEPART(month, LogDate), DATEPART(year, LogDate)
You can group by the parts of the date:
GROUP BY YEAR(LogDate), MONTH(LogDate), DAY(LogDate)
Select Cast(FLOOR(CAST(LogDate as float)) as DateTime) as Date, COUNT(*) as [SUM]
from Log_Table_Name
Group by Cast(FLOOR(CAST(LogDate as float)) as DateTime)
order by Cast(FLOOR(CAST(LogDate as float)) as DateTime)
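To see that the FLOOR-of-float trick lands on the same value as the other approaches in this thread, a quick check on one hypothetical timestamp. The trick relies on datetime's float representation (days since 1900-01-01 plus a fraction of a day); SQL Server does not allow converting datetime2 to float, so it only works on datetime columns such as LogDate here:

```sql
DECLARE @LogDate datetime = '2012-10-02T13:45:10';  -- hypothetical log timestamp

SELECT CAST(FLOOR(CAST(@LogDate AS float)) AS datetime) AS FloorTrick,     -- drop the day fraction, convert back
       DATEADD(day, DATEDIFF(day, 0, @LogDate), 0)     AS DateDiffIdiom,  -- whole days since 1900-01-01
       CAST(CAST(@LogDate AS date) AS datetime)        AS ViaDateType;    -- SQL Server 2008+ date type
-- All three columns hold 2012-10-02 00:00:00.000
```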