COUNT(DISTINCT column) slows the query 20x

I have this query, which works fine and is fast (about 1 second execution time):
SELECT COUNT(ticket) AS times_appears
,COUNT(LOGIN) AS number_of_accounts
,comment
FROM mt4_trades
WHERE COMMENT != ''
AND CLOSE_TIME != '1970-01-01 00:00:00.000'
GROUP BY comment
ORDER BY times_appears DESC
but as soon as I change the second line to:
,COUNT(DISTINCT LOGIN) AS number_of_accounts
the query becomes 20 times slower.
Is DISTINCT so slow that it affects the whole query, or am I missing something here?
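Part of the answer is that the two versions do not compute the same thing. COUNT(LOGIN) only counts the non-NULL logins in each group, so it always equals the number of trades that have a login. COUNT(DISTINCT LOGIN) has to de-duplicate the logins per comment, which usually costs the engine an extra sort or hash step. A small sketch using Python's built-in sqlite3 module with made-up rows (the cut-down mt4_trades layout is an assumption, not the real schema) shows the difference in the results:

```python
import sqlite3

# Hypothetical, cut-down mt4_trades table; the real schema is not shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mt4_trades (ticket INTEGER, login INTEGER, comment TEXT, close_time TEXT)")
conn.executemany(
    "INSERT INTO mt4_trades VALUES (?, ?, ?, ?)",
    [
        (1, 100, "promo", "2020-01-02 10:00:00.000"),
        (2, 100, "promo", "2020-01-03 10:00:00.000"),  # same account trades twice
        (3, 200, "promo", "2020-01-04 10:00:00.000"),
        (4, 300, "hedge", "2020-01-05 10:00:00.000"),
        (5, 300, "hedge", "1970-01-01 00:00:00.000"),  # not closed: filtered out
        (6, 400, "", "2020-01-06 10:00:00.000"),       # empty comment: filtered out
    ],
)

rows = conn.execute(
    """
    SELECT comment,
           COUNT(ticket)         AS times_appears,
           COUNT(login)          AS login_rows,         -- counts rows, not accounts
           COUNT(DISTINCT login) AS number_of_accounts  -- counts unique accounts
    FROM mt4_trades
    WHERE comment != '' AND close_time != '1970-01-01 00:00:00.000'
    GROUP BY comment
    ORDER BY times_appears DESC
    """
).fetchall()

for row in rows:
    print(row)
```

For "promo", COUNT(login) gives 3 (one per trade) while COUNT(DISTINCT login) gives 2, because account 100 traded twice.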

After some research, I found out that it is sometimes better to use a subquery than COUNT(DISTINCT column).
So this is my query, which is about 20 times faster than the one in my question:
SELECT COUNT(mtt.ticket) as times_appears
--,COUNT(DISTINCT login) as number_of_accounts
,(SELECT COUNT(LOGIN) FROM (SELECT DISTINCT login FROM mt4_trades WHERE COMMENT=mtt.COMMENT AND CLOSE_TIME != '1970-01-01 00:00:00.000' ) AS temp)AS number_of_accounts
,comment
FROM mt4_trades mtt
WHERE mtt.COMMENT != ''
AND mtt.CLOSE_TIME != '1970-01-01 00:00:00.000'
GROUP BY mtt.comment
ORDER BY times_appears DESC
@Raphaël Althau, thanks for the helpful URL hint.

---- ticket count, irrespective of login
Select mtt.comment
,t.number_of_accounts
,Count(mtt.ticket) As times_appears
From mt4_trades As mtt With (Nolock)
Join
(
Select t.comment
,Count(t.login) As number_of_accounts
From (
Select Distinct
mtt.login
,mtt.comment
From mt4_trades As mtt With (Nolock)
Where mtt.comment <> ''
And mtt.CLOSE_TIME <> '1970-01-01 00:00:00.000'
) As t
Group By t.comment
) As t On mtt.comment = t.comment
Where mtt.comment <> ''
And mtt.CLOSE_TIME <> '1970-01-01 00:00:00.000'
Group By mtt.comment
,t.number_of_accounts
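The idea behind this query is to de-duplicate the (login, comment) pairs once, count them per comment in a derived table, and join that back to the per-comment ticket count. A rough check with Python's sqlite3 module and made-up rows (table layout assumed; WITH (NOLOCK) is T-SQL only and is dropped) that this two-stage form returns the same numbers as a single COUNT(DISTINCT login):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mt4_trades (ticket INTEGER, login INTEGER, comment TEXT, close_time TEXT)")
conn.executemany(
    "INSERT INTO mt4_trades VALUES (?, ?, ?, ?)",
    [
        (1, 100, "promo", "2020-01-02 10:00:00.000"),
        (2, 100, "promo", "2020-01-03 10:00:00.000"),
        (3, 200, "promo", "2020-01-04 10:00:00.000"),
        (4, 300, "hedge", "2020-01-05 10:00:00.000"),
        (5, 300, "hedge", "1970-01-01 00:00:00.000"),
    ],
)

# Single-pass version using COUNT(DISTINCT ...).
direct = conn.execute("""
    SELECT comment, COUNT(ticket), COUNT(DISTINCT login)
    FROM mt4_trades
    WHERE comment <> '' AND close_time <> '1970-01-01 00:00:00.000'
    GROUP BY comment
    ORDER BY comment
""").fetchall()

# Two-stage version: distinct (login, comment) pairs are counted in a derived table first.
two_stage = conn.execute("""
    SELECT mtt.comment, COUNT(mtt.ticket), mt.number_of_accounts
    FROM mt4_trades AS mtt
    JOIN (
        SELECT t.comment, COUNT(t.login) AS number_of_accounts
        FROM (
            SELECT DISTINCT login, comment
            FROM mt4_trades
            WHERE comment <> '' AND close_time <> '1970-01-01 00:00:00.000'
        ) AS t
        GROUP BY t.comment
    ) AS mt ON mtt.comment = mt.comment
    WHERE mtt.comment <> '' AND mtt.close_time <> '1970-01-01 00:00:00.000'
    GROUP BY mtt.comment, mt.number_of_accounts
    ORDER BY mtt.comment
""").fetchall()

print(direct)
print(two_stage)
```

This only checks that the results agree; whether the two-stage form is actually faster depends on the optimizer and on the available indexes.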
---- ticket count with respect to login
Select t.comment
,Count(t.ticket) As times_appears
,Count(t.login) As number_of_accounts
From (
Select Distinct
mtt.ticket
,mtt.login
,mtt.comment
From mt4_trades As mtt With (Nolock)
Where mtt.comment <> ''
And mtt.CLOSE_TIME <> '1970-01-01 00:00:00.000'
) As t
Group By t.comment

Related

How to optimize selection of pairs from one column of the table?

I'm using PostgreSQL 9.5.19, DBeaver 6.3.4
I have a table where each row holds a user's name, a place the user attended, and the time they were there.
I need to select all pairs of places any user has visited (if a user was at place a and place b, I need a row like this: user, place a, place b, time at place a, time at place b).
The table:
CREATE TABLE example.example (
tm timestamp NOT NULL,
place_name varchar NOT NULL,
user_name varchar NOT NULL
);
Some sample data:
INSERT INTO example.example (tm, place_name, user_name)
values
('2020-02-25 00:00:19.000', 'place_1', 'user_1'),
('2020-03-25 00:00:19.000', 'place_2', 'user_1'),
('2020-02-25 00:00:19.000', 'place_1', 'user_2'),
('2020-03-25 00:00:19.000', 'place_1', 'user_3'),
('2020-02-25 00:00:19.000', 'place_2', 'user_3');
I'm trying this script:
select
t.user_name
,t.place_name as r1_place
,max(t.tm) as r1_tm
,t2.place_name as r2_place
,min(t2.tm) as r2_tm
from example.example as t
join example.example as t2 on t.user_name = t2.user_name
and t.tm < t2.tm
and t.place_name <> t2.place_name
where t.tm between '2020-02-25 00:00:00' and '2020-03-25 15:00:00'
and t2.tm between '2020-02-25 00:00:00' and '2020-03-25 15:00:00'
group by t.user_name
, t.place_name
, t2.place_name
It seems to give me the right result, but it runs really slowly.
Can I optimize it somehow?
I would suggest trying indexes. For this query:
select t.user_name, t.place_name as r1_place, max(t.tm) as r1_tm,
t2.place_name as r2_place, min(t2.tm) as r2_tm
from schema.table t join
schema.table t2
on t.user_name = t2.user_name and
t.tm < t2.tm and
t.place_name <> t2.place_name
where t.tm between '2020-03-25 00:00:00' and '2020-03-25 15:00:00' and
t2.tm between '2020-03-25 00:00:00' and '2020-03-25 15:00:00'
group by t.user_name, t.place_name, t2.place_name
I would suggest an index on (tm, user_name, place_name) and on (user_name, tm, place_name) -- yes, both, one for each reference.
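To try the two suggested indexes against the sample data from the question, here is a sketch with Python's sqlite3 module standing in for PostgreSQL. This is an assumption for illustration: the CREATE INDEX statements are written the same way in both systems, but SQLite's planner will not necessarily use them the way PostgreSQL does, and the table is not schema-qualified here. The index names are hypothetical, and the date range is the one from the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (tm TEXT NOT NULL, place_name TEXT NOT NULL, user_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO example (tm, place_name, user_name) VALUES (?, ?, ?)",
    [
        ("2020-02-25 00:00:19.000", "place_1", "user_1"),
        ("2020-03-25 00:00:19.000", "place_2", "user_1"),
        ("2020-02-25 00:00:19.000", "place_1", "user_2"),
        ("2020-03-25 00:00:19.000", "place_1", "user_3"),
        ("2020-02-25 00:00:19.000", "place_2", "user_3"),
    ],
)

# One index per table reference: presumably (tm, ...) serves the range filter on t,
# and (user_name, tm, ...) serves the join lookup on t2.
conn.execute("CREATE INDEX example_tm_user_place ON example (tm, user_name, place_name)")
conn.execute("CREATE INDEX example_user_tm_place ON example (user_name, tm, place_name)")

pairs = conn.execute("""
    SELECT t.user_name, t.place_name AS r1_place, MAX(t.tm) AS r1_tm,
           t2.place_name AS r2_place, MIN(t2.tm) AS r2_tm
    FROM example AS t
    JOIN example AS t2
      ON t.user_name = t2.user_name
     AND t.tm < t2.tm
     AND t.place_name <> t2.place_name
    WHERE t.tm BETWEEN '2020-02-25 00:00:00' AND '2020-03-25 15:00:00'
      AND t2.tm BETWEEN '2020-02-25 00:00:00' AND '2020-03-25 15:00:00'
    GROUP BY t.user_name, t.place_name, t2.place_name
    ORDER BY t.user_name
""").fetchall()

for row in pairs:
    print(row)
```

With five rows the indexes make no measurable difference; the point is the shape of the two indexes. user_2 has only one visit, so it produces no pair.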
A colleague helped me write a version with a window function:
select
subq.*
,EXTRACT(EPOCH FROM (subq.next_tm - subq.tm)) as seconds_diff
from (
select
t1.user_name,
t1.place_name,
t1.tm,
lead(t1.place_name) over w as next_place_name,
lead(t1.tm) over w as next_tm
from example.example as t1
window w as (partition by t1.user_name order by tm asc)
)subq
where
next_place_name is not null
and next_tm is not null
and place_name <> next_place_name
;
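The window-function version pairs each visit only with the same user's next visit, so it returns consecutive pairs rather than every pair, and it reads the table once instead of self-joining it. A sketch of the same query in SQLite via Python's sqlite3 module, on the question's sample data. It assumes SQLite 3.25 or later for window functions, inlines the named window, and replaces EXTRACT(EPOCH FROM ...) with strftime('%s', ...), which ignores the milliseconds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (tm TEXT NOT NULL, place_name TEXT NOT NULL, user_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO example (tm, place_name, user_name) VALUES (?, ?, ?)",
    [
        ("2020-02-25 00:00:19.000", "place_1", "user_1"),
        ("2020-03-25 00:00:19.000", "place_2", "user_1"),
        ("2020-02-25 00:00:19.000", "place_1", "user_2"),
        ("2020-03-25 00:00:19.000", "place_1", "user_3"),
        ("2020-02-25 00:00:19.000", "place_2", "user_3"),
    ],
)

moves = conn.execute("""
    SELECT subq.*,
           CAST(strftime('%s', subq.next_tm) AS INTEGER)
             - CAST(strftime('%s', subq.tm) AS INTEGER) AS seconds_diff
    FROM (
        SELECT user_name, place_name, tm,
               -- the next visit of the same user, in time order
               LEAD(place_name) OVER (PARTITION BY user_name ORDER BY tm) AS next_place_name,
               LEAD(tm)         OVER (PARTITION BY user_name ORDER BY tm) AS next_tm
        FROM example
    ) AS subq
    WHERE subq.next_place_name IS NOT NULL
      AND subq.next_tm IS NOT NULL
      AND subq.place_name <> subq.next_place_name
    ORDER BY subq.user_name
""").fetchall()

for row in moves:
    print(row)
```

Both pairs are 29 days apart (2020 is a leap year), i.e. 2,505,600 seconds.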

How to get the difference in dates in SQL Server

I'm having trouble writing a query to get the difference between the UpdateDate and the CreationDate of two records if the ID is the lowest, and the difference between the most recent and second most recent UpdateDate.
Here's my Query:
SELECT
a.ID, a.RequestID, b.KrStatus, b.CrDate , b.UpdateDate,
DATEDIFF (HOUR, b.CrDate, b.UpdateDate) AS TimeDifference,
CASE WHEN a.ID = (SELECT MAX(a.ID) FROM [dbo].[Krdocs_hist] a WHERE a.RequestID = 1)
THEN 'YES'
ELSE 'NO'
END AS isMax,
CASE WHEN a.ID = (SELECT MIN(a.ID) FROM [dbo].[Krdocs_hist] a WHERE a.RequestID = 1)
THEN 'YES'
ELSE 'NO'
END AS isMin
FROM [dbo].[Krdocs_hist] a, [dbo].Krdocs_Details_hist b
WHERE
a.RequestId = b.RequestId
and a.ID = b.ID
and a.RequestId = 1
ORDER BY b.RequestID
Here's my current result:
What I'd like to do is get the last record and check whether there was an existing one before it. If there wasn't, compare the UpdateDate and CrDate (UpdateDate minus CrDate). If there was a record before it, I want UpdateDate minus the previous UpdateDate.
Using this query:
SELECT b.Id, b.RequestId, b.UpdateDate, b.KrStatus
FROM [dbo].[Krdocs_Details_hist] b
WHERE b.RequestId = 1
I get this result:
And using this query:
SELECT a.*
FROM [dbo].[Krdocs_hist] a
WHERE RequestId = 1
I get this result:
UPDATE
Since LAG is available from SQL Server 2012 onward, you can use it like this:
SELECT
ID,
RequestID,
CrDate,
UpdateDate,
KrStatus,
DATEDIFF(HOUR, PreviousUpdateDate, UpdateDate) as TimeDifference
FROM
(SELECT
ID,
RequestID,
CrDate,
UpdateDate,
KrStatus,
LAG(UpdateDate, 1, CrDate) OVER (PARTITION BY RequestID ORDER BY ID) AS PreviousUpdateDate
FROM [dbo].Krdocs_Details_hist) as tmp
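A sketch of the LAG idea in SQLite via Python's sqlite3 module, on made-up rows (only the column names come from the question; the values and statuses are hypothetical). It partitions by RequestID and orders by ID, so the first row of each request falls back to its own CrDate. DATEDIFF(HOUR, ...) is approximated by counting hour boundaries from Unix timestamps, which assumes post-1970 values. The second query keeps only the last row per request, which is what the question asks for:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Krdocs_Details_hist (
    ID INTEGER, RequestID INTEGER, CrDate TEXT, UpdateDate TEXT, KrStatus TEXT)""")
conn.executemany(
    "INSERT INTO Krdocs_Details_hist VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2020-01-01 08:00:00", "2020-01-01 10:00:00", "New"),
        (2, 1, "2020-01-01 08:00:00", "2020-01-01 15:30:00", "InReview"),
        (3, 1, "2020-01-01 08:00:00", "2020-01-02 09:00:00", "Done"),
        (4, 2, "2020-02-01 00:00:00", "2020-02-01 03:00:00", "New"),
    ],
)

DIFFS = """
    WITH tmp AS (
        SELECT ID, RequestID, UpdateDate,
               -- previous UpdateDate of the same request, or this row's CrDate if it is the first
               LAG(UpdateDate, 1, CrDate) OVER (PARTITION BY RequestID ORDER BY ID) AS PreviousUpdateDate,
               -- rn = 1 marks the last (highest ID) row of each request
               ROW_NUMBER() OVER (PARTITION BY RequestID ORDER BY ID DESC) AS rn
        FROM Krdocs_Details_hist
    )
    SELECT ID, RequestID,
           -- hour boundaries crossed, similar to DATEDIFF(HOUR, PreviousUpdateDate, UpdateDate)
           CAST(strftime('%s', UpdateDate) AS INTEGER) / 3600
             - CAST(strftime('%s', PreviousUpdateDate) AS INTEGER) / 3600 AS TimeDifference
    FROM tmp
"""

per_row = conn.execute(DIFFS + " ORDER BY RequestID, ID").fetchall()
last = conn.execute(DIFFS + " WHERE rn = 1 ORDER BY RequestID").fetchall()

print(per_row)
print(last)
```

Request 2 has a single row, so its difference is UpdateDate minus CrDate (3 hours); request 1's last row is compared with the previous UpdateDate (18 hours).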
I think you can try something like this:
SELECT
CASE
WHEN COUNT(*) <= 1 THEN DATEDIFF(HOUR,
(SELECT CrDate FROM [dbo].Krdocs_Details_hist),
(SELECT UpdateDate FROM [dbo].Krdocs_Details_hist))
WHEN COUNT(*) > 1 THEN DATEDIFF(HOUR,
(SELECT MAX(UpdateDate) FROM [dbo].Krdocs_Details_hist WHERE UpdateDate < ( SELECT MAX(UpdateDate) FROM [dbo].Krdocs_Details_hist)),
(SELECT MAX(UpdateDate) FROM [dbo].Krdocs_Details_hist))
END AS TimeDifference
FROM [dbo].Krdocs_Details_hist

SQL Server: view is slow

I have a SQL Server view that shows an overview of account statements. First, we calculate the latest closing balance of each user account, to know the most recent balance for that account. This is the LATEST_CB_DATES part.
Then we calculate the next business days, meaning the next two days on which we expect to receive a balance in the database. This happens in NEXT_B_DAYS.
Finally, we determine whether the account is expecting a closing balance, has received one, or received one too late. Note that we use the end of a reception window for this.
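The NEXT_B_DAYS step can be run on its own to see what it produces. A sketch with Python's sqlite3 module, where a plain table stands in for the VIEW_BUSINESS_DATES view (its column names come from the query below; the calendars and dates are made up). The last two business dates of each calendar get NULL in one or both look-ahead columns, which the view's CASE expression then reports as 'Late':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for VIEW_BUSINESS_DATES.
conn.execute("CREATE TABLE VIEW_BUSINESS_DATES (CAL_ID INTEGER, CAL_CODE TEXT, BUSINESS_DATE TEXT)")
conn.executemany(
    "INSERT INTO VIEW_BUSINESS_DATES VALUES (?, ?, ?)",
    [
        (1, "CAL_A", "2024-01-05"),  # Friday
        (1, "CAL_A", "2024-01-08"),  # Monday: the weekend is not a business date
        (1, "CAL_A", "2024-01-09"),
        (1, "CAL_A", "2024-01-10"),
        (2, "CAL_B", "2024-01-05"),
        (2, "CAL_B", "2024-01-06"),
    ],
)

next_b_days = conn.execute("""
    SELECT CAL_ID, BUSINESS_DATE,
           -- next and second-next business date within the same calendar
           LEAD(BUSINESS_DATE, 1) OVER (PARTITION BY CAL_CODE ORDER BY BUSINESS_DATE) AS NEXT_BUSINESS_DATE,
           LEAD(BUSINESS_DATE, 2) OVER (PARTITION BY CAL_CODE ORDER BY BUSINESS_DATE) AS SECOND_BUSINESS_DATE
    FROM VIEW_BUSINESS_DATES
    ORDER BY CAL_ID, BUSINESS_DATE
""").fetchall()

for row in next_b_days:
    print(row)
```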
IF EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = 'VIEW_AS_AS_ACCT_STAT')
DROP VIEW VIEW_AS_AS_ACCT_STAT
GO
CREATE VIEW VIEW_AS_AS_ACCT_STAT AS
WITH LATEST_CB_DATES AS (
SELECT * FROM (
SELECT row_number() over (partition by SD_ACCT.ID order by (AS_ACCT_STAT.CBAL_BAL_DATE) DESC) RN,SD_ACCT.ID, SD_ACCT.ACCT_NBR, AS_ACCT_STAT.CBAL_BAL_DATE AS BAL_DATE, SD_ACCT.CODE, SD_ACCT.CCY, SD_ACCT_GRP.ID AS GRP_ID, SD_ACCT_GRP.CODE AS ACCT_GRP_CODE, SD_ACCT.DATA_OWNER_ID, AS_ACCT_STAT.STATIC_DATA_BNK AS BANK_CODE, AS_ACCT_STAT.STATIC_DATA_HLD AS HOLDER_CODE
FROM SD_ACCT
LEFT JOIN AS_ACCT on SD_ACCT.ID = AS_ACCT.STATIC_DATA_ACCT_ID
LEFT JOIN AS_ACCT_STAT on AS_ACCT.ID = AS_ACCT_STAT.ACCT_ID
JOIN SD_ACCT_GRP_MEMBER ON SD_ACCT.ID = SD_ACCT_GRP_MEMBER.ACCT_ID
JOIN SD_ACCT_GRP on SD_ACCT_GRP_MEMBER.GRP_ID = SD_ACCT_GRP.ID
JOIN SD_ACCT_GRP_ROLE on SD_ACCT_GRP_ROLE.ID = SD_ACCT_GRP.ROLE_ID
WHERE SD_ACCT_GRP_ROLE.CODE = 'AccountStatementsToReceive' AND (AS_ACCT_STAT.VALID = 1 OR AS_ACCT_STAT.VALID IS NULL)
) LST_STMT
WHERE RN = 1
),
NEXT_B_DAYS AS (
SELECT VIEW_BUSINESS_DATES.CAL_ID, VIEW_BUSINESS_DATES.BUSINESS_DATE,
LEAD(VIEW_BUSINESS_DATES.BUSINESS_DATE, 1) OVER (PARTITION BY VIEW_BUSINESS_DATES.CAL_CODE ORDER BY VIEW_BUSINESS_DATES.BUSINESS_DATE) AS NEXT_BUSINESS_DATE,
LEAD(VIEW_BUSINESS_DATES.BUSINESS_DATE, 2) OVER (PARTITION BY VIEW_BUSINESS_DATES.CAL_CODE ORDER BY VIEW_BUSINESS_DATES.BUSINESS_DATE) AS SECOND_BUSINESS_DATE
FROM VIEW_BUSINESS_DATES
)
SELECT LATEST_CB_DATES.ID AS ACCT_ID,
LATEST_CB_DATES.CODE AS ACCT_CODE,
LATEST_CB_DATES.ACCT_NBR,
LATEST_CB_DATES.CCY AS ACCT_CCY,
LATEST_CB_DATES.BAL_DATE AS LATEST_CLOSING_BAL_DATE,
LATEST_CB_DATES.DATA_OWNER_ID,
LATEST_CB_DATES.BANK_CODE,
LATEST_CB_DATES.HOLDER_CODE,
LATEST_CB_DATES.ACCT_GRP_CODE,
CASE
WHEN LATEST_CB_DATES.BAL_DATE IS NULL THEN 'Expecting'
WHEN NEXT_B_DAYS.NEXT_BUSINESS_DATE IS NULL OR NEXT_B_DAYS.SECOND_BUSINESS_DATE IS NULL THEN 'Late'
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NOT NULL AND GETDATE() >= TODATETIMEOFFSET(CAST(NEXT_B_DAYS.SECOND_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) AS DATETIME), SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET) THEN 'Late'
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NULL AND GETDATE() >= TODATETIMEOFFSET(CAST(NEXT_B_DAYS.SECOND_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) AS DATETIME), SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET) AND CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) >= CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) THEN 'Expecting'
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NULL AND GETDATE() >= TODATETIMEOFFSET(CAST(NEXT_B_DAYS.NEXT_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) AS DATETIME), SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET) AND CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) < CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) THEN 'Expecting' -- overnight
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NULL AND CAST (GETDATE() AS DATE) > NEXT_B_DAYS.SECOND_BUSINESS_DATE THEN 'Expecting'
ELSE 'Received'
END AS STAT,
CASE
WHEN LATEST_CB_DATES.BAL_DATE IS NULL THEN NULL
WHEN NEXT_B_DAYS.NEXT_BUSINESS_DATE IS NULL OR NEXT_B_DAYS.SECOND_BUSINESS_DATE IS NULL THEN NULL
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NOT NULL THEN CAST(NEXT_B_DAYS.SECOND_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) AS DATETIME)
ELSE NULL
END AS DEADLINE,
SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET AS TIME_ZONE
FROM AS_AS_RECEPTION_CONF
JOIN LATEST_CB_DATES ON AS_AS_RECEPTION_CONF.ACCT_GRP_ID = LATEST_CB_DATES.GRP_ID
JOIN SEC_TIMEZONE ON SEC_TIMEZONE.ID = AS_AS_RECEPTION_CONF.TIME_ZONE_ID
LEFT JOIN NEXT_B_DAYS ON AS_AS_RECEPTION_CONF.CALENDAR_ID = NEXT_B_DAYS.CAL_ID AND LATEST_CB_DATES.BAL_DATE = NEXT_B_DAYS.BUSINESS_DATE
GO
SELECT * FROM VIEW_AS_AS_ACCT_STAT
What is the issue? Nothing; this works fine, but it's slow. We created a graphical report to display the data for our customers, but it takes 1 minute 30 seconds to load this SQL when there are 5000 accounts, which is too slow.
I guess the reason is the last line, but I didn't manage to refactor it well:
LEFT JOIN NEXT_B_DAYS ON AS_AS_RECEPTION_CONF.CALENDAR_ID = NEXT_B_DAYS.CAL_ID AND LATEST_CB_DATES.BAL_DATE = NEXT_B_DAYS.BUSINESS_DATE
The execution plan of my SQL can be found here.
How can I refactor this to make my view still work but much more performant?

Display Only One Record That May Or May Not Have Children

I've been stuck on this issue for a while now. I'm really close I think, but there's something I'm missing.
A Transaction can have zero or many TransactionErrors. I am trying to display all Transactions only once, and I'm also trying to display only the latest error message if there is one.
SELECT [Transaction].[TransactionID]
,[FileName]
,[DestinationSystem]
,[CreatedOn]
,LEFT([TransactionError].[ErrorMessage], 300) AS LatestErrorMessage --Gets only the first 300 characters of the error message
FROM [WM01DB].[dbo].[Transaction]
INNER JOIN SourceSystem ON SourceSystem.SourceSystemId = Transaction.SourceSystemId
LEFT JOIN TransactionError ON TransactionError.TransactionId = Transaction.TransactionId
WHERE Transaction.CreatedOn >= '2014-08-01 00:00:00.000'
AND Transaction.CreatedOn < '2014-09-02 00:00:00.000'
ORDER BY [CreatedOn], [Transaction].[TransactionID]
When I run this query, I get most of the results I want, but I get duplicate transactions because these transactions have multiple TransactionErrors. It looks like this...
TransactionID | FileName | DestinationSystem | CreatedOn | LatestErrorMessage
18124 | 201408131541517937_DC_TEST_3339376-4.1.xml | TEST | 2014-08-18 18:31:19.993 | U_BOL and Tracking Number are blank
18124 | 201408131541517937_DC_TEST_3339376-4.1.xml | TEST | 2014-08-18 18:31:19.993 | FRT_CHG_TYPE is blank
18125 | 201408111521484448_DC_TEST_3339375-2.1.xml | TEST | 2014-08-19 16:04:58.467 | NULL
18126 | 201408111521484448_DC_TEST_3339375-2.1.xml | TEST | 2014-08-19 16:09:00.467 | NULL
As you can see, there are duplicate TransactionIDs, as demonstrated by 18124. I would like 18124 to be displayed only once, with the latest error message. The only way to get the latest error message would be to use the latest TransactionErrorID for a particular TransactionID...
Please help! :(
I have a similar solution to Krishnraj Rana's. However, I think that you need to avoid putting the row-number filter in the WHERE clause, because that will make it behave like an inner join:
; with Errors as
(SELECT TransactionId
, [ErrorMessage]
, Row_Number() over (Partition By TransactionId order by TransactionErrorId Desc) as id
FROM TransactionError
)
SELECT [Transaction].[TransactionID]
,[FileName]
,[DestinationSystem]
,[CreatedOn]
,LEFT([ErrorMessage], 300) AS LatestErrorMessage --Gets only the first 300 characters of the error message
FROM [WM01DB].[dbo].[Transaction]
INNER JOIN SourceSystem ON SourceSystem.SourceSystemId = [Transaction].SourceSystemId
LEFT JOIN Errors ON Errors.TransactionId = [Transaction].TransactionId
and Errors.id = 1
WHERE [Transaction].CreatedOn >= '2014-08-01 00:00:00.000'
AND [Transaction].CreatedOn < '2014-09-02 00:00:00.000'
ORDER BY [CreatedOn], [Transaction].[TransactionID]
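One way to check the point about the ON clause is to run the same shape of query in SQLite through Python's sqlite3 module, once with the id = 1 filter in the ON clause and once in WHERE. The table contents are made up from the question's sample (in particular, which of 18124's two errors has the higher TransactionErrorId is an assumption), LEFT(..., 300) becomes substr(..., 1, 300), and the SourceSystem join is left out because it does not cause the duplicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Transaction] (TransactionId INTEGER PRIMARY KEY, FileName TEXT, CreatedOn TEXT);
    CREATE TABLE TransactionError (TransactionErrorId INTEGER PRIMARY KEY, TransactionId INTEGER, ErrorMessage TEXT);
    INSERT INTO [Transaction] VALUES
        (18124, '201408131541517937_DC_TEST_3339376-4.1.xml', '2014-08-18 18:31:19.993'),
        (18125, '201408111521484448_DC_TEST_3339375-2.1.xml', '2014-08-19 16:04:58.467'),
        (18126, '201408111521484448_DC_TEST_3339375-2.1.xml', '2014-08-19 16:09:00.467');
    INSERT INTO TransactionError VALUES
        (1, 18124, 'U_BOL and Tracking Number are blank'),
        (2, 18124, 'FRT_CHG_TYPE is blank');
""")

QUERY = """
    WITH Errors AS (
        SELECT TransactionId, ErrorMessage,
               -- 1 = highest TransactionErrorId, i.e. the latest error of each transaction
               ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY TransactionErrorId DESC) AS id
        FROM TransactionError
    )
    SELECT [Transaction].TransactionId, substr(Errors.ErrorMessage, 1, 300) AS LatestErrorMessage
    FROM [Transaction]
    LEFT JOIN Errors
      ON Errors.TransactionId = [Transaction].TransactionId {on_extra}
    WHERE [Transaction].CreatedOn >= '2014-08-01' AND [Transaction].CreatedOn < '2014-09-02' {where_extra}
    ORDER BY [Transaction].CreatedOn, [Transaction].TransactionId
"""

# Filter in ON: transactions without errors keep a row with a NULL message.
in_on = conn.execute(QUERY.format(on_extra="AND Errors.id = 1", where_extra="")).fetchall()
# Filter in WHERE: the NULL rows fail the test, so the LEFT JOIN acts like an inner join.
in_where = conn.execute(QUERY.format(on_extra="", where_extra="AND Errors.id = 1")).fetchall()

print(in_on)
print(in_where)
```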
You can achieve it by using ROW_NUMBER() with a PARTITION BY clause like this:
SELECT [Transaction].[TransactionID]
,[FileName]
,[DestinationSystem]
,[CreatedOn]
,LEFT([TransactionError].[ErrorMessage], 300) AS LatestErrorMessage --Gets only the first 300 characters of the error message
,ROW_NUMBER() OVER (
PARTITION BY [Transaction].[TransactionID] ORDER BY [CreatedOn]
,[Transaction].[TransactionID] DESC
) AS SrNo
FROM [WM01DB].[dbo].[Transaction]
INNER JOIN SourceSystem ON SourceSystem.SourceSystemId = TRANSACTION.SourceSystemId
LEFT JOIN TransactionError ON TransactionError.TransactionId = TRANSACTION.TransactionId
WHERE TRANSACTION.CreatedOn >= '2014-08-01 00:00:00.000'
AND TRANSACTION.CreatedOn < '2014-09-02 00:00:00.000'
AND SrNo = 1
ORDER BY [CreatedOn]
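,[Transaction].[TransactionID]
Because SrNo is an alias defined in the same SELECT, SQL Server cannot use it in that query's WHERE clause; the filter has to be applied in an outer query, for example:
SELECT A.[TransactionID]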
,A.[FileName]
,A.[DestinationSystem]
,A.[CreatedOn]
,A.LatestErrorMessage
FROM (
SELECT [Transaction].[TransactionID]
,[FileName]
,[DestinationSystem]
,[CreatedOn]
,LEFT([TransactionError].[ErrorMessage], 300) AS LatestErrorMessage --Gets only the first 300 characters of the error message
,ROW_NUMBER() OVER (PARTITION BY [Transaction].[TransactionID] ORDER BY [CreatedOn] DESC) rn
FROM [WM01DB].[dbo].[Transaction]
INNER JOIN SourceSystem ON SourceSystem.SourceSystemId = [Transaction].SourceSystemId
AND [Transaction].CreatedOn >= '2014-08-01 00:00:00.000'
AND [Transaction].CreatedOn < '2014-09-02 00:00:00.000'
LEFT JOIN TransactionError ON TransactionError.TransactionId = [Transaction].TransactionId
)A
WHERE A.rn = 1
ORDER BY A.[CreatedOn], A.[TransactionID]
Also using row_number() but picking last TransactionErrorId as requested (and assuming at least SQL Server 2005):
with x as (
select
t.[TransactionId],
[FileName],
[DestinationSystem],
[CreatedOn],
e.[ErrorMessage],
row_number() over (
partition by t.[TransactionId]
order by e.[TransactionErrorId] desc
) rn
from
[wm01db].[dbo].[Transaction] t
inner join
[dbo].[SourceSystem] s
on t.SourceSystemId = s.SourceSystemId
left outer join
[dbo].[TransactionError] e
on e.TransactionId = t.TransactionId
where
t.CreatedOn >= '2014-08-01 00:00:00.000' and
t.CreatedOn < '2014-09-02 00:00:00.000'
) select
[TransactionId],
[FileName],
[DestinationSystem],
[CreatedOn],
left([ErrorMessage], 300) as LastErrorMessage
from
x
where
rn = 1
order by
[CreatedOn],
[TransactionId] ;

No column name was specified for column 2 of 't' (T-SQL error)

Please, I want to return only debtorid from this T-SQL. It gives the error "No column name was specified for column 2 of 't'." My query is below.
with t as
(SELECT debtorid,MAX(dbo.FollowUp.followupdate) FROM dbo.FollowUp WITH (NOLOCK) WHERE
( dbo.FollowUp.FollowUpDate >= '01-01-2011 00:00:00:000' and dbo.FollowUp.FollowUpDate <= '01-13-2014 23:59:59.000'
and Status = 'PTP') GROUP BY [Status], DebtorId)
SELECT t.debtorid FROM t;
With my limited knowledge, I thought the above would work fine for me. However, it didn't. Any help would be appreciated.
You need to add a name to your second column, e.g.:
with t as
(SELECT debtorid, MAX(dbo.FollowUp.followupdate) maxFollowup
FROM dbo.FollowUp WITH (NOLOCK)
WHERE dbo.FollowUp.FollowUpDate >= '01-01-2011 00:00:00:000'
and dbo.FollowUp.FollowUpDate <= '01-13-2014 23:59:59.000'
and Status = 'PTP'
GROUP BY [Status], DebtorId)
SELECT t.debtorid FROM t;
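Another way to satisfy the rule is to name the columns in a list after the CTE name, e.g. WITH t (debtorid, maxFollowup) AS (...), which T-SQL also accepts. Below is a sketch run in SQLite through Python's sqlite3 module, with made-up rows. ISO date strings are used because SQLite compares these values as text. SQLite is more lenient than SQL Server and would not raise this error for an unnamed column, so the sketch only demonstrates the syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FollowUp (DebtorId INTEGER, FollowUpDate TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO FollowUp VALUES (?, ?, ?)",
    [
        (1, "2012-05-01 10:00:00", "PTP"),
        (1, "2013-01-01 09:00:00", "PTP"),
        (2, "2012-03-03 12:00:00", "PTP"),
        (3, "2012-06-06 08:00:00", "CLOSED"),  # other status: filtered out
        (4, "2015-01-01 08:00:00", "PTP"),     # outside the date range: filtered out
    ],
)

rows = conn.execute("""
    WITH t (debtorid, maxFollowup) AS (   -- column names given in the CTE header
        SELECT DebtorId, MAX(FollowUpDate)
        FROM FollowUp
        WHERE FollowUpDate >= '2011-01-01 00:00:00'
          AND FollowUpDate <= '2014-01-13 23:59:59'
          AND Status = 'PTP'
        GROUP BY Status, DebtorId
    )
    SELECT t.debtorid, t.maxFollowup FROM t ORDER BY t.debtorid
""").fetchall()

print(rows)
```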
You are using a common table expression (CTE), and per MSDN every column of a CTE must have a name. You can also refer to these pages for CTE best practices:
http://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx
http://blog.sqlauthority.com/2011/05/10/sql-server-common-table-expression-cte-and-few-observation/
;with t as
(
SELECT FWP.debtorid
,MAX(FWP.followupdate) followupdate
FROM dbo.FollowUp FWP WITH (NOLOCK)
WHERE
(
FWP.FollowUpDate >= '01-01-2011 00:00:00:000'
and FWP.FollowUpDate <= '01-13-2014 23:59:59.000'
and FWP.Status = 'PTP'
) GROUP BY FWP.[Status], FWP.DebtorId
)
SELECT t.debtorid FROM t;