Query optimization for converting VARBINARY to VARCHAR and running CHARINDEX on it - sql

I have a repository table which has around 18.7 million rows, and every month around 100 thousand to 500 thousand rows are added. The table structure is as follows:
CREATE TABLE [dbo].[my_table](
[id] [bigint] NULL,
[a_timestamp] [datetime] NULL,
[eventId] [bigint] NULL,
[userId] [varchar](255) NULL,
[customerid] [varchar](128) NULL,
[messageType] [varchar](100) NULL,
[message] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I have written the following query to get various counts for each month. The query currently takes around 10 minutes to execute. I need help optimizing this query and, if possible, bringing the time down to a couple of minutes.
SELECT DATEADD(month, DATEDIFF(month, 0,a_timestamp), 0) AS MonthYear,
COUNT(*) AS [Count],
COUNT(DISTINCT customerid) AS [Unique Customers],
COUNT(DISTINCT userId) AS [Unique Users]
FROM [my_table]
WHERE messageType = 'Outbound'
AND userId NOT IN ('master', 'admin')
AND CHARINDEX('Retrieve Document',CONVERT(VARCHAR(MAX),[message])) > 1
GROUP BY DATEADD(month, DATEDIFF(month, 0,a_timestamp), 0)
ORDER BY MonthYear
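(For reference: DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0) is the classic pre-SQL-Server-2008 idiom for truncating a datetime to the first day of its month. DATEDIFF(month, 0, d) counts whole months since 1900-01-01, and adding that many months back onto 0 lands on the month's first day. A quick sanity check, with an arbitrary example date:)

```sql
-- Truncate a datetime to the start of its month
SELECT DATEADD(month, DATEDIFF(month, 0, '2013-07-19 14:25:00'), 0);
-- -> 2013-07-01 00:00:00.000
```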
I think the key reasons for the long execution time are as follows:
CHARINDEX('Retrieve Document', CONVERT(VARCHAR(MAX), [message])) > 1 converts every row from VARBINARY to VARCHAR and then searches it for 'Retrieve Document'
userId NOT IN ('master', 'admin') filters out the users in the list (the actual list is longer than 2 strings, around 10 strings)
18.7 million rows in the table
A couple of points to note
I didn't create this table and I can't change it
I don't have SHOWPLAN permission
I need to use this query in Excel data connections and have the user run it from excel. The user will have only select privileges.

Given that you cannot change the existing table, it may be better to change your strategy.
Instead of running your query and rebuilding the complete result set every time, why not insert new results into another table (let's call it AccumulatedResults) on a monthly basis?
That way you are only handling the ~500K new records each time, which will be much faster than rebuilding the entire result set every time. The query will look a little like:
INSERT INTO AccumulatedResults
(
MonthYear,
[COUNT],
UniqueCustomers,
UniqueUsers
)
SELECT
DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0) AS MonthYear,
COUNT(*) AS [Count],
COUNT(DISTINCT customerid) AS [Unique Customers],
COUNT(DISTINCT userId) AS [Unique Users]
FROM
[my_table]
WHERE
messageType = 'Outbound' AND
userId NOT IN ('master', 'admin') AND
CHARINDEX('Retrieve Document', CONVERT(VARCHAR(MAX), [message])) > 1
-- This is a new condition
AND DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0)
> (SELECT MAX(MonthYear) FROM AccumulatedResults)
GROUP BY
DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0)
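The INSERT above assumes an AccumulatedResults table already exists. A minimal sketch of its DDL (the column names follow the INSERT's column list; the types are assumptions, not from the original post):

```sql
CREATE TABLE dbo.AccumulatedResults (
    MonthYear       datetime NOT NULL,  -- first day of the month being summarized
    [Count]         int      NOT NULL,  -- total matching rows for the month
    UniqueCustomers int      NOT NULL,  -- COUNT(DISTINCT customerid)
    UniqueUsers     int      NOT NULL   -- COUNT(DISTINCT userId)
);
```

One caveat: while the table is still empty, SELECT MAX(MonthYear) returns NULL and the > comparison filters out every row, so the very first load needs to run without that condition.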


SQL ORDER BY: return rows whose DATETIME2 is not null first, then order by id

I have two tables and I am having trouble figuring out how to write the ORDER BY clause to fit my needs.
Basically, if the FeaturedUntil column is greater than now, those rows should be returned first, ordered by the PurchasedAt column with the most recent purchases first. After these, everything should be ordered by the item Id column descending.
Create Table Script
create table Items(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] nvarchar(200) null,
)
create table Feature(
[Id] [int] IDENTITY(1,1) NOT NULL,
[PurchasedAt] [datetime2](7) NOT NULL,
[FeaturedUntil] [datetime2](7) NOT NULL,
[ItemId] [int] NOT NULL,
)
Insert Script
insert into Items(Name) values ('test1')
insert into Feature(PurchasedAt, FeaturedUntil, ItemId) values (dateadd(day, -3, getdate()), dateadd(month, 1, getdate()), @@IDENTITY)
insert into Items(Name) values ('test2')
insert into Feature(PurchasedAt, FeaturedUntil, ItemId) values (dateadd(day, -2, getdate()), dateadd(month, 1, getdate()), @@IDENTITY)
insert into Items(Name) values ('test3')
insert into Feature(PurchasedAt, FeaturedUntil, ItemId) values (dateadd(day, -1, getdate()), dateadd(month, -1, getdate()), @@IDENTITY)
insert into Items(Name) values ('test4')
Select Script
select *
from Items i
left join Feature f on i.Id = f.ItemId
order by
case when f.FeaturedUntil is not null THEN f.PurchasedAt
else i.Id
end
The select should return test2 first as its FeaturedUntil is greater than now and it is the most recently purchased; the second row should be test1 as it was bought before test2. After these should come test4 and last test3, because these either have no joined Feature row or their FeaturedUntil is not greater than now, and they are ordered by their Item.Id descending.
SELECT *
FROM items i
LEFT JOIN feature f
ON i.id = f.itemid
ORDER BY CASE
WHEN f.featureduntil > getdate() THEN purchasedat
ELSE '19000101'
END DESC,
id DESC
You need to sort this descending in order to get the most recent purchase first; the ID sort will still occur, so if you have two PurchasedAt values that are the same, it would sort those two by ID.
Based on what you've told us, I think this might be what you're after:
ORDER BY CASE WHEN FeaturedUntil > GETDATE() THEN PurchasedAt ELSE '99991231' END ASC, --Future features first, in date order
--(past rows get a silly far-future date, so they always sort last)
Id; --Then ID
Try the following.
select *, case when f.FeaturedUntil is not null THEN f.PurchasedAt else NULL end AS PurchasedAtNew
from Items i
left join Feature f on i.Id = f.ItemId
order by PurchasedAtNew desc, i.Id desc

Assign session number to a series of transactions

CREATE TABLE [Transaction](
[TransactionID] [bigint] IDENTITY(1,1) NOT NULL,
[LocationID] [int] NOT NULL,
[KioskID] [int] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionType] [varchar](7) NOT NULL,
[Credits] [int] NOT NULL,
[StartingBalance] [int] NULL,
[EndingBalance] [int] NULL,
[SessionID] [int] NULL
);
Please refer to this fiddle for the sample data:
Link to SQL Fiddle
I'm trying to figure out if there is a way to assign a session number to a sequence of transactions in a single update.
A "Session" is defined as a number of deposits and purchases ending with a withdrawal. A Session has sequential transactions consisting of:
1 to n deposits (TransactionType = 'D'),
0 to n purchases (TransactionType = 'P') and
0 or 1 withdrawals (TransactionType = 'W')
all with the same LocationID and KioskID. A session can end with a zero balance or a withdrawal. The first deposit with no open session starts one. Only 'P' transactions have balances; for 'D' and 'W' they are NULL.
The combination of LocationID, KioskID and SessionID must be unique.
I'm really hoping that there is a SQL way of doing this. I'd hate to have to loop through hundreds of millions of transactions to set sessions procedurally.
This should do it:
;WITH markSessions as
(
SELECT *,
CASE
WHEN TransactionType='W' THEN 1
WHEN TransactionType='P' And EndingBalance=0 THEN 1
ELSE 0 END As SessionEnd
FROM [Transaction]
)
SELECT *,
SUM(SessionEnd) OVER(PARTITION BY LocationID, KioskID ORDER BY TransactionID)
+ 1 - SessionEnd As SessionID
FROM markSessions
No triggers, cursors or client code needed.
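To see how the running SUM assigns numbers, here is a self-contained sketch with a handful of invented rows (the data and the TxSample name are hypothetical, for illustration only):

```sql
WITH TxSample (TransactionID, LocationID, KioskID, TransactionType, EndingBalance) AS (
    SELECT * FROM (VALUES
        (1, 1, 1, 'D', NULL), -- deposit                -> session 1
        (2, 1, 1, 'P',   50), -- purchase, balance left -> session 1
        (3, 1, 1, 'W', NULL), -- withdrawal ends session 1
        (4, 1, 1, 'D', NULL), -- deposit opens session 2
        (5, 1, 1, 'P',    0)  -- zero balance ends session 2
    ) AS v (TransactionID, LocationID, KioskID, TransactionType, EndingBalance)
), markSessions AS (
    SELECT *,
        CASE WHEN TransactionType = 'W' THEN 1
             WHEN TransactionType = 'P' AND EndingBalance = 0 THEN 1
             ELSE 0 END AS SessionEnd
    FROM TxSample
)
SELECT TransactionID,
       SUM(SessionEnd) OVER (PARTITION BY LocationID, KioskID ORDER BY TransactionID)
           + 1 - SessionEnd AS SessionID
FROM markSessions;
-- TransactionIDs 1-3 get SessionID 1, TransactionIDs 4-5 get SessionID 2
```

Subtracting SessionEnd keeps the session-ending row itself inside the session it closes, rather than starting the next one.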
If you actually want to set the SessionID in the table, compute the value inside the CTE (window functions are not allowed directly in an UPDATE's SET clause) and update through it:
;WITH markSessions as
(
SELECT *,
CASE
WHEN TransactionType='W' THEN 1
WHEN TransactionType='P' And EndingBalance=0 THEN 1
ELSE 0 END As SessionEnd
FROM [Transaction]
), numbered as
(
SELECT *,
SUM(SessionEnd) OVER(PARTITION BY LocationID, KioskID ORDER BY TransactionID)
+ 1 - SessionEnd As NewSessionID
FROM markSessions
)
UPDATE numbered
SET SessionID = NewSessionID
I am unable to test it, but the following should take pre-existing SessionIDs into account:
;WITH markSessions as
(
SELECT *,
CASE
WHEN TransactionType='W' THEN 1
WHEN TransactionType='P' And EndingBalance=0 THEN 1
ELSE 0 END As SessionEnd
FROM [Transaction]
), numbered as
(
SELECT *,
SUM(SessionEnd) OVER(PARTITION BY LocationID, KioskID ORDER BY TransactionID)
+ 1 - SessionEnd
+ COALESCE(MAX(SessionID) OVER (PARTITION BY LocationID, KioskID), 0) As NewSessionID
FROM markSessions
)
UPDATE numbered
SET SessionID = NewSessionID
WHERE SessionID Is NULL
Note that this will only work if all new rows (those without SessionIDs) have higher TransactionIDs than the pre-existing rows (those that already have SessionIDs). It will definitely NOT work if new rows were added with TransactionIDs lower than the highest TransactionID already assigned a SessionID.
If you may have that situation, then you likely will have to reassign the old TransactionIDs.

How can I group by time in SQL [duplicate]

Possible Duplicate:
SQL Query Group By Datetime problem?
I am working on an application with 2 steps.
Scan logs and persist data from them in a database.
Read data from database and visualize the data.
The first step is more or less finished. Let me explain the background and my requirement for the second step.
Each row in the database consists of some info like logdate, logfilename, LogType, logMessage, etc. So I want, for example, to write SQL that summarizes a given LogType per day.
This is the columns:
[LogDate] [datetime] NOT NULL,
[Computer] [varchar](50) NOT NULL,
[Type] [varchar](50) NOT NULL,
[FileName] [varchar](100) NOT NULL,
[LineNo] [int] NOT NULL,
[UserName] [varchar](50) NOT NULL,
[Message] [varchar](max) NOT NULL,
I imagine the output could be like this if I want to show all rows with Type=TDBError:
Date Sum
2012-10-01 3
2012-10-02 12
2012-10-03 40
2012-10-05 24
2012-10-06 18
So on 2012-10-01 there were 3 rows in the DB where Type=TDBError, on 2012-10-02 there were 12, etc.
How should I write the SQL for this ?
Assuming SQL Server 2008 or newer:
SELECT
[Date] = CONVERT(DATE, LogDate),
[Sum] = COUNT(*)
FROM dbo.Log_Table_Name
WHERE [Type] = 'TDBError'
GROUP BY CONVERT(DATE, LogDate)
ORDER BY [Date];
GROUP BY DATEPART(day, date), DATEPART(month, date), DATEPART(year, date)
You can group by the individual parts of the date:
GROUP BY year(log_date), month(log_date), day(log_date)
Select Cast(FLOOR(CAST(LogDate as float)) as DateTime) as [Date], COUNT(*) as [SUM]
from Log_Table_Name
Group by Cast(FLOOR(CAST(LogDate as float)) as DateTime)
order by Cast(FLOOR(CAST(LogDate as float)) as DateTime)

SQL Select within a select

I'm creating a dataset that will be displayed in an SSRS report.
I have a query in a job that puts a count into a table [dbo].[CountMetersDue] on a rolling basis on the 1st of every month; the value changes throughout the month, so I need to take a snapshot at the beginning.
I have the report set up with a custom expression that produces a cumulative trend graph. Basically it takes one value and divides it by another to work out a percentage. Therefore I have two queries that need combining... It took me ages to get my head round all this!
I just need help with the last bit.
SELECT (SELECT [Count]
FROM [MXPTransferDev].[dbo].[CountMetersDue]
WHERE [MXPTransferDev].[dbo].[CountMetersDue].[DateTime] =
[MXPTransferDev].[dbo].[Readings].[dateRead]) AS [MetersDue],
COUNT(readingid) AS [TotalReadings],
CONVERT(DATE, dateread) AS [dateRead]
FROM [MXPTransferDev].[dbo].[Readings]
WHERE ( [MXPTransferDev].[dbo].[Readings].[dateRead] BETWEEN
'01-may-11' AND '31-may-11' )
AND ( webcontactid IS NOT NULL )
AND ( meter = 1 )
GROUP BY CONVERT(DATE, [MXPTransferDev].[dbo].[Readings].[dateRead])
CREATE TABLE [dbo].[CountMetersDue](
[Count] [int] NULL,
[DateTime] [datetime] NULL
) ON [USER]
GO
ALTER TABLE [dbo].[CountMetersDue]
ADD CONSTRAINT [DF_CountMetersDue_DateTime] DEFAULT (getdate()) FOR [DateTime]
GO
CREATE TABLE [dbo].[Readings](
[readingId] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[dateRead] [datetime] NOT NULL,
[meter] [int] NOT NULL,
[reading] [int] NULL,
[webcontactid] [bigint] NULL
)
Readings
readingId meter reading dateRead webcontactid
583089 4 3662 2011-05-25 15:00:33.040 479
583207 3 682 2011-05-25 15:00:33.027 479
583088 2 98064 2011-05-25 15:00:33.007 479
CountMetersDue
Count DateTime
2793 2011-12-01 00:00:00.000
1057 2011-05-01 14:08:20.437
610 2011-03-01 00:00:00.000
Second stab at answering your question (will probably need some clarification from yourself before the answer is correct):
/* DDL: 2 tables [CountMetersDue] & [Readings]
[CountMetersDue]
([DateTime] datetime,
[Count] int)
[Readings]
([ReadingId] bigint,
[dateRead] datetime,
[webcontactid] bigint,
[meter] int)
[CountMetersDue] - contains 1 record on the first of every month, with count of the number of readings at that date
[Readings] - contains all the individual readings
ie:
[CountMetersDue]
01-Jan-2011 1000
01-Feb-2011 2357
01-Mar-2011 3000
[Readings]
1 01-Jan-2011 11 1
2 02-Jan-2011 12 1
3 03-Jan-2011 13 1
...
*/
SELECT
CONVERT(DATE, [dbo].[Readings].[dateRead]) AS dateRead,
COUNT([dbo].[Readings].[readingId]) AS TotalReadings,
[dbo].[CountMetersDue].[Count] AS MetersDue
FROM
[CountMetersDue] /* get all count meters due */
left join [Readings] /* get any corresponding Reading records
where the dateRead in the same month as
the CountMetersDue */
on DATEPART(year, Readings.dateRead) = DATEPART(year, [CountMetersDue].[DateTime]) /* reading in same year as CountMetersDue */
and DATEPART(month, Readings.dateRead) = DATEPART(month, [CountMetersDue].[DateTime]) /* reading in same month as CountMetersDue */
WHERE [dbo].[Readings].[dateRead] BETWEEN
@StartDate AND @EndDate
AND ( webcontactid IS NOT NULL )
AND ( meter = 1 )
GROUP BY
[dbo].[CountMetersDue].[Count],CONVERT(DATE, [dbo].[Readings].[dateRead])
Would this be the query you are looking for?
Subqueries, as they are called, can be included by enclosing them in parentheses '()'.
SELECT (SELECT [Count] FROM [xxxxx].[dbo].[CountMetersDue] AS tabA WHERE tabA.[datefield] = tabB.dateRead) AS [MetersDue], COUNT(readingId) AS [TotalReadings], CONVERT(DATE, dateRead) AS [dateRead]
FROM [xxxxx] AS tabB
WHERE (dateRead BETWEEN @StartDate AND @EndDate) AND (webcontactid IS NOT NULL) AND (meter = 1)
GROUP BY CONVERT(DATE, dateRead)

SQL group by day, with count

I've got a log table in SQL Server that looks like this:
CREATE TABLE [dbo].[RefundProcessLog](
[LogId] [bigint] IDENTITY(1,1) NOT NULL,
[LogDate] [datetime] NOT NULL,
[LogType] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[RefundId] [int] NULL,
[RefundTypeId] [smallint] NULL,
[LogMessage] [varchar](1000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[LoggedBy] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_RefundProcessLog] PRIMARY KEY CLUSTERED
(
[LogId] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
What I want is a list of results that represents how many different refundids were processed each day, throwing out any NULLs.
What SQL would I need to write to produce these results?
I like this approach in (MS SQL):
SELECT
Convert(char(8), LogDate, 112),
count(distinct RefundId)
FROM RefundProcessing
GROUP BY Convert(char(8), LogDate, 112)
select cast(LogDate as date) as LogDate, count(refundId) as refundCount
from yourTable
group by cast(LogDate as date)
Depending on the dialect of SQL you're using, you may have to change the CAST to something else. The expression should convert the LogDate to a date-only value.
Also, if you say "different refundId" because there could be repeated values of refundId that you only want to count once, use count(DISTINCT refundId)
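The difference matters whenever the same refund is logged more than once in a day. A quick illustration with invented inline rows (the data and the LogSample name are hypothetical):

```sql
WITH LogSample (LogDate, RefundId) AS (
    SELECT * FROM (VALUES
        (CAST('2012-10-01 09:00' AS datetime), 7),
        (CAST('2012-10-01 11:30' AS datetime), 7),    -- same refund, same day
        (CAST('2012-10-01 15:00' AS datetime), 8),
        (CAST('2012-10-01 16:00' AS datetime), NULL)  -- NULLs are skipped by COUNT(col)
    ) AS v (LogDate, RefundId)
)
SELECT CAST(LogDate AS date)    AS LogDay,
       COUNT(RefundId)          AS RowsWithRefund,  -- 3
       COUNT(DISTINCT RefundId) AS UniqueRefunds    -- 2
FROM LogSample
GROUP BY CAST(LogDate AS date);
```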
What database vendor are you using? Whichever it is, replace the "DateOnly(LogDate)" in the following with the appropriate construct to extract the date portion (strip off the time) from the LogDate column value, and then try this:
Select [DateOnly(LogDate)], Count(Distinct RefundId)
From RefundProcessLog
Group By [DateOnly(LogDate)]
In Sql server, for e.g., the appropriate construct would be:
Select DateAdd(day, 0, DateDiff(day, 0, LogDate)), Count(Distinct RefundId)
From RefundProcessLog
Group By DateAdd(day, 0, DateDiff(day, 0, LogDate))
SELECT COUNT(RefundId), DateOnly(LogDate) LoggingDate
FROM RefundProcessLog
GROUP BY DateOnly(LogDate)
"DateOnly" is specific to your SQL database, which you haven't specified.
For SQL Server you could use DateAdd(dd,0, DateDiff(dd,0,LogDate)) for "DateOnly"
SQL Server 2008 introduced the date datatype which makes the following possible:
select convert(date, LogDate),
count(refundid) AS 'refunds'
from RefundProcessing
group by convert(date,LogDate)
order by convert(date,LogDate)
In SqlServer, it would be something like:
select datepart(YEAR, [LogDate]), datepart(MONTH, [LogDate]), datepart(DAY, [LogDate]), count(refundid) as [Count]
from [RefundProcessing]
group by datepart(YEAR, [LogDate]), datepart(MONTH, [LogDate]), datepart(DAY, [LogDate])
Select count(*), LogDate, refundid from RefundProcessLog
where refundid is not null
group by LogDate, refundid
Edit:
Or drop refundid from the SELECT and GROUP BY if you don't want the counts broken down by refund.