T-SQL Aggregation of Overlapping Date Times From Large View


I have an application that works much like a time card. However, any employee may have one or more claim entries that overlap with another. The aggregation is currently done in VB.NET, but that approach has serious performance problems, so my goal is to have T-SQL do it instead if possible. Each claim entry has a notes field, and the notes should be combined when entries overlap. It works something like this:
Claim-1: ClaimID-123 Start-"9:00" End-"10:00" Notes-"Testing 1"
Claim-2: ClaimID-456 Start-"9:30" End-"10:30" Notes-"Testing 2"
Desired Result: Start-"9:00", End-"10:30", concatenating the notes column to include notes from both claim entries.
Current SQL code:
SELECT s1.StartTime,
MIN(t1.EndTime) As EndTime
FROM vw_ClaimLine s1
INNER JOIN vw_ClaimLine t1 ON s1.StartTime <= t1.EndTime
AND NOT EXISTS(SELECT * FROM vw_ClaimLine t2
WHERE t1.EndTime >= t2.StartTime AND t1.EndTime < t2.EndTime)
WHERE NOT EXISTS(SELECT * FROM vw_ClaimLine s2
WHERE s1.StartTime > s2.StartTime AND s1.StartTime <= s2.EndTime)
AND
s1.RecDate BETWEEN '4-01-2018' AND '4-1-2018' AND s1.ProvidedBy = 233
GROUP BY s1.StartTime
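To make the desired result concrete, here is a minimal sketch of the merge logic itself, written in Python rather than T-SQL: sort the entries by start time, then sweep through them and extend the current block whenever the next entry starts before it ends. The "; " separator, the third claim, and the choice to treat entries that merely touch as overlapping are assumptions made for illustration; in the real view the same sweep would run once per employee (ProvidedBy) and date.

```python
from datetime import datetime


def merge_claims(claims):
    """Merge overlapping (start, end, notes) entries, joining their notes."""
    merged = []
    for start, end, notes in sorted(claims, key=lambda c: c[0]):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it and append the notes.
            prev_start, prev_end, prev_notes = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_notes + "; " + notes)
        else:
            # No overlap: start a new block.
            merged.append((start, end, notes))
    return merged


def t(hhmm):
    # Helper for the example only: a time of day on an arbitrary date.
    return datetime.strptime("2018-04-01 " + hhmm, "%Y-%m-%d %H:%M")


claims = [
    (t("09:30"), t("10:30"), "Testing 2"),
    (t("09:00"), t("10:00"), "Testing 1"),
    (t("13:00"), t("14:00"), "Testing 3"),  # hypothetical non-overlapping entry
]
result = merge_claims(claims)
for start, end, notes in result:
    print(start.strftime("%H:%M"), end.strftime("%H:%M"), notes)
```

The first two claims collapse into a single 9:00 to 10:30 block with both notes, and the third stays on its own. Any set-based T-SQL version has to reproduce this grouping before concatenating the notes.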

Related

Amazon SQL job interview question: customers who made 2+ purchases -- is it doable in DAX?

You have a simple table that has only two fields: CustomerID, DateOfPurchase. List all customers that made at least 2 purchases in any period of six months. You may assume the table has the data for the last 10 years. Also, there is no PK or unique value.
One possible solution for this question is as follows:
SELECT DISTINCT CustomerID
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.CustomerID = t1.CustomerID AND
t2.DateOfPurchase > t1.DateOfPurchase AND
t2.DateOfPurchase <= DATEADD(month, 6, t1.DateOfPurchase));
I was wondering if we can do something similar in DAX. For the sake of simplicity, let's assume everything is in one table and there is no relationship.
Thanks

You got me at "For the sake of simplicity".
Maybe this(?)
Table =
SUMMARIZE(
FILTER(
yourTable,
VAR CurrentCustomerID = yourTable[CustomerID]
VAR CurrentDateOfPurchase = yourTable[DateOfPurchase]
RETURN
NOT ISEMPTY(
CALCULATETABLE(
VALUES(yourTable[CustomerID]),
ALL(yourTable),
yourTable[CustomerID] = CurrentCustomerID,
yourTable[DateOfPurchase] < CurrentDateOfPurchase,
yourTable[DateOfPurchase] >= EDATE(CurrentDateOfPurchase,-6)
)
)
),
yourTable[CustomerID]
)
Update
Since there are no more answers, I am sharing the simulated data so that someone else will be encouraged to respond and we can all learn something from this. Around 50,000 customers, 100,000 transactions, 10 years. Date format: MDY
https://drive.google.com/file/d/1JPS4XHfpGSTXuNWIMdoPlp5Z1BqV9WGO/view?usp=sharing

SQL Query to Find if (Count of date > X) = 0 For Group of ID

I apologize if the title is not correct; I'm not sure what I need to ask for, since I don't know how to build the query.
I have the following query built to return a list of chemicals and other related fields.
SELECT DISTINCT
RDB.Chemical_Record.[Chemical_ID],
RDB.Chemical_Record.[Expires_Date],
RDB.Assay_Group.[Assay_Group_Name] AS [Assay Group],
RDB.Chemical.[Chemical_Name],
RDB.Chemical.[Product_Number],
RDB.Chemical_Record.[Lot_Number],
RDB.Storage_Location.[Location_Name]
FROM RDB.Chemical_Record
LEFT JOIN RDB.Chemical ON Chemical_Record.[Chemical_ID] = Chemical.[ID_Chemical]
LEFT JOIN RDB.Storage_Location ON Storage_Location.[ID_Storage_Location] = Chemical_Record.[Storage_Location_ID]
LEFT JOIN RDB.Chemical_To_AGroup ON Chemical_To_AGroup.[Chemical_ID] = Chemical_Record.[Chemical_ID]
LEFT JOIN RDB.Assay_Group ON Assay_Group.[ID_Assay_Group] = Chemical_To_AGroup.[Assay_Group_ID]
WHERE RDB.Chemical_Record.[Expires_Date] >= DATEADD(day,-60, GETDATE())
ORDER BY RDB.Chemical_Record.[Chemical_ID], RDB.Chemical_Record.[Expires_Date], RDB.Assay_Group.[Assay_Group_Name]
I am using this query in a VB.Net application where it exports the results to an Excel worksheet and then performs additional actions to delete the rows I don't need. The process to query is quick, but working with Excel from .Net is painful and slow.
Instead I'd like to build the query to return the exact results I want, which I think is possible, I just can't figure out how. I have tried using a combination of Count, Group and Having, but since I've never worked with those I can't get them to work for me.
Example:
SELECT
COUNT(RDB.Chemical_Record.[Chemical_ID]) Count_ID,
RDB.Chemical_Record.[Chemical_ID],
RDB.Chemical_Record.[Expires_Date]
FROM RDB.Chemical_Record
WHERE RDB.Chemical_Record.[Expires_Date] > DATEADD(day,30,GETDATE())
GROUP BY RDB.Chemical_Record.[Chemical_ID], RDB.Chemical_Record.[Expires_Date]
ORDER BY RDB.Chemical_Record.[Chemical_ID]
As you can see from this example, it doesn't return the count of IDs where the expiration date is > DATEADD(day,30,GETDATE()), nor does it return the IDs that I actually wanted.
What I need to return is all chemicals (ID) that DO NOT have an expiration date > Today + 30 for that specific ID. The screenshot below shows an example of the data that gets pulled. The yellow highlighted rows are the only two in that set that should get returned, as there are no other records for those two IDs with an expiration date > Today + 30. All the other IDs should not show up, since for them COUNT(Expiration Date > Today + 30) > 0.
If someone could help me build the query using the appropriate Aggregate functions, it would be MUCH appreciated.

What I need to return is all chemicals (ID) that DO NOT have an expiration date > Today + 30 for that specific ID.
For this question, you can use a HAVING clause. No WHERE is needed:
SELECT COUNT(*) as Count_ID, cr.[Chemical_ID]
FROM RDB.Chemical_Record cr
GROUP BY cr.[Chemical_ID]
HAVING MAX(cr.Expires_Date) <= DATEADD(day, 30, GETDATE())
ORDER BY cr.[Chemical_ID]
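A small sketch of why HAVING MAX works here, run against an in-memory SQLite database from Python rather than SQL Server. The sample rows and the fixed cutoff string (standing in for DATEADD(day, 30, GETDATE())) are made up for illustration; dates are stored as ISO text so that plain comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Chemical_Record (Chemical_ID INTEGER, Expires_Date TEXT)")
conn.executemany(
    "INSERT INTO Chemical_Record VALUES (?, ?)",
    [
        (1, "2024-01-10"),  # ID 1: every lot expires on or before the cutoff
        (1, "2024-01-20"),
        (2, "2024-01-15"),  # ID 2: one lot expires after the cutoff
        (2, "2024-06-01"),
    ],
)

cutoff = "2024-02-01"  # assumed stand-in for DATEADD(day, 30, GETDATE())
rows = conn.execute(
    """
    SELECT Chemical_ID, COUNT(*) AS Count_ID
    FROM Chemical_Record
    GROUP BY Chemical_ID
    HAVING MAX(Expires_Date) <= ?
    ORDER BY Chemical_ID
    """,
    (cutoff,),
).fetchall()
print(rows)
```

Only ID 1 comes back: if the latest expiration date in a group is on or before the cutoff, then no row in that group can be after it. A WHERE clause would filter individual rows before grouping and could not express that.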

Using the HAVING MAX solved my problem and I was then able to work out exactly what I needed. I had to do some more research to figure out how to bring all my columns back, but that wasn't as difficult.
Here is my final solution:
WITH CHEM AS (
SELECT RDB.Chemical_Record.[Chemical_ID]
FROM RDB.Chemical_Record
GROUP BY RDB.Chemical_Record.[Chemical_ID]
HAVING MAX(RDB.Chemical_Record.Expires_Date) <= DATEADD(day, 60, GETDATE())
)
SELECT DISTINCT
RDB.Chemical_Record.[Chemical_ID],
RDB.Chemical_Record.[Expires_Date],
RDB.Assay_Group.[Assay_Group_Name] AS [Assay Group],
RDB.Chemical.[Chemical_Name],
RDB.Chemical.[Product_Number],
RDB.Chemical_Record.[Lot_Number],
RDB.Storage_Location.[Location_Name]
FROM RDB.Chemical_Record
INNER JOIN CHEM ON CHEM.Chemical_ID = RDB.Chemical_Record.Chemical_ID
LEFT JOIN RDB.Chemical ON Chemical_Record.[Chemical_ID] = Chemical.[ID_Chemical]
LEFT JOIN RDB.Storage_Location ON Storage_Location.[ID_Storage_Location] = Chemical_Record.[Storage_Location_ID]
LEFT JOIN RDB.Chemical_To_AGroup ON Chemical_To_AGroup.[Chemical_ID] = Chemical_Record.[Chemical_ID]
LEFT JOIN RDB.Assay_Group ON Assay_Group.[ID_Assay_Group] = Chemical_To_AGroup.[Assay_Group_ID]
WHERE Expires_Date >= DATEADD(day, -60, GETDATE())
ORDER BY RDB.Chemical_Record.[Chemical_ID], RDB.Chemical_Record.Expires_Date
And a screenshot showing the resulting search:

Access 2013 Query, DateDiff from Consecutive Rows TimeStamps

I'm having trouble running a query against a single table in Access 2013 that uses SQL to compute a DateDiff between consecutive rows of timestamps, which track status changes of tickets going through our ticketing system.
The table, dbo_Master3_FieldHistory, has a field that records a timestamp each time a ticket's status changes. Unfortunately, it only stores one timestamp per change, so it has no second timestamp for when the status changes again, which I need in order to run a DateDiff and calculate the age of tickets by status.
I found a plausible solution on StackOverflow, linked below. When I tried to implement it as-is with minor adjustments, including filters for old data and particular fields, Access froze and never timed out (I have to force close Access).
Date Difference between consecutive rows
'This is the basic code, translated from the linked StackOverflow solution to fit this table's fields (I believe)
SELECT T.mrID, T.mrSEQUENCE, T.mrUSERID, T.mrFIELDNAME, T.mrNEWFIELDVALUE, T.mrOLDFIELDVALUE, T.mrTIMESTAMP, T.mrNextTIMESTAMP, DateDiff("s",T.mrTIMESTAMP, T.mrNextTIMESTAMP) AS STATUSTIME
FROM (
SELECT T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP,
(SELECT MIN(mrTIMESTAMP)
FROM dbo_MASTER3_FIELDHISTORY AS T2
WHERE T2.mrID = T1.mrID
AND T2.mrTIMESTAMP > T1.mrTIMESTAMP
) As mrNextTIMESTAMP
FROM dbo_MASTER3_FIELDHISTORY AS T1
) AS T
'This is the code that I wanted to use to account for filtering out two particular fields, limiting the data to tickets (mrID) newer than 1/1/2018 and only those where the mrFIELDNAME is mrSTATUS
SELECT T.mrID, T.mrSEQUENCE, T.mrUSERID, T.mrFIELDNAME, T.mrNEWFIELDVALUE, T.mrOLDFIELDVALUE, T.mrTIMESTAMP, T.mrNextTIMESTAMP, DateDiff("s",T.mrTIMESTAMP, T.mrNextTIMESTAMP) AS STATUSTIME
FROM (
SELECT T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP,
(SELECT MIN(mrTIMESTAMP)
FROM dbo_MASTER3_FIELDHISTORY AS T2
WHERE mrFIELDNAME = "mrSTATUS"
AND T2.mrID = T1.mrID
AND T2.mrTIMESTAMP > T1.mrTIMESTAMP
) As T1.mrNextTIMESTAMP
FROM dbo_MASTER3_FIELDHISTORY AS T1
WHERE mrFIELDNAME = "mrSTATUS"
AND mrTIMESTAMP >= #1/1/2018#
) AS T;
Access freezes when I try to run these queries. I've tried several ways but can't get it to work.

I was able to figure it out. Thank you to those who took the time to read through this interesting challenge. Instead of using the second code set in the link provided, I used the first, and it worked beautifully. With some additions to the code to account for other filters/criteria, I have the results I need.
SELECT T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP, MIN(T2.mrTIMESTAMP) AS mrNextTIMESTAMP, DATEDIFF("s", T1.mrTIMESTAMP, MIN(T2.mrTIMESTAMP)) AS TimeInStatus
FROM ((dbo_MASTER3_FIELDHISTORY AS T1 LEFT JOIN dbo_MASTER3_FIELDHISTORY AS T2 ON (T2.mrTIMESTAMP > T1.mrTIMESTAMP) AND (T1.mrID = T2.mrID)) INNER JOIN dbo_MASTER3 AS T4 ON (T4.mrID = T1.mrID))
WHERE T4.mrSUBMITDATE >= #1/1/2018#
AND t1.mrFIELDNAME = "mrSTATUS"
AND NOT T4.mrSTATUS="_Deleted_"
AND NOT T4.mrSTATUS="_SOLVED_"
AND NOT T4.mrSTATUS="_PENDING_SOLUTION_"
GROUP BY T1.mrID, T1.mrSEQUENCE, T1.mrUSERID, T1.mrFIELDNAME, T1.mrNEWFIELDVALUE, T1.mrOLDFIELDVALUE, T1.mrTIMESTAMP
ORDER BY T1.mrID, T1.mrTIMESTAMP;
Sincerely,
Kristopher
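The self-join pattern in that final query can be tried on a toy table. The following is only a sketch: it uses SQLite from Python instead of Access, a cut-down hypothetical table named history with invented rows, and SQLite's strftime('%s', ...) in place of DateDiff("s", ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (mrID INTEGER, mrNEWFIELDVALUE TEXT, mrTIMESTAMP TEXT);
    INSERT INTO history VALUES
        (1, 'Open',     '2018-01-02 09:00:00'),
        (1, 'Assigned', '2018-01-02 09:30:00'),
        (1, 'Closed',   '2018-01-02 11:00:00'),
        (2, 'Open',     '2018-01-03 08:00:00');
""")

# For each row, join every later row of the same ticket and keep the earliest one.
rows = conn.execute("""
    SELECT T1.mrID, T1.mrNEWFIELDVALUE, T1.mrTIMESTAMP,
           MIN(T2.mrTIMESTAMP) AS mrNextTIMESTAMP,
           CAST(strftime('%s', MIN(T2.mrTIMESTAMP)) AS INTEGER)
             - CAST(strftime('%s', T1.mrTIMESTAMP) AS INTEGER) AS TimeInStatus
    FROM history AS T1
    LEFT JOIN history AS T2
      ON T2.mrID = T1.mrID AND T2.mrTIMESTAMP > T1.mrTIMESTAMP
    GROUP BY T1.mrID, T1.mrNEWFIELDVALUE, T1.mrTIMESTAMP
    ORDER BY T1.mrID, T1.mrTIMESTAMP
""").fetchall()

for row in rows:
    print(row)
```

Because of the LEFT JOIN, the most recent status of each ticket is kept with a NULL next timestamp and a NULL duration, which is usually what you want for a status that is still current.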

Simplify SQL query to Interbase DB

I have a WPF application whose task is to pull data from an InterBase DB. Note that this DB is located on a remote network device. The Firebird ADO.NET data provider is used.
One of my queries looks like this:
SELECT
T1.ind_st,
T2.ttt,
T2.tdtdtd,
sumr
FROM ((SELECT ind_st,
Sum(r) AS sumR
FROM (SELECT ind_st,
rrr AS r
FROM srok_tel
WHERE date_ch = '23.07.2018 0:00:00'
AND srok_ch = '18'
AND ind_st >= 33049
AND ind_st <= 34717
UNION
SELECT ind_st,
-rrr AS r
FROM srok_tel
WHERE date_ch = '23.07.2018 0:00:00'
AND srok_ch = '12'
AND ind_st >= 33049
AND ind_st <= 34717
UNION
SELECT ind_st,
rrr AS r
FROM srok_tel
WHERE date_ch = '24.07.2018 0:00:00'
AND srok_ch IN ( 6, 12 )
AND ind_st >= 33049
AND ind_st <= 34717)
GROUP BY ind_st) T1
JOIN (SELECT ind_st,
ttt,
tdtdtd
FROM srok_tel
WHERE date_ch = '24.07.2018 0:00:00'
AND srok_ch = '12'
AND ind_st >= 33049
AND ind_st <= 34717) T2
ON T1.ind_st = T2.ind_st)
Yes, it is heavy, hard to read at first glance, and probably written the wrong way, but my task is to pull all the data with one query and I am NOT an SQL pro.
The target table (SROK_TEL), from which the data is selected, contains approximately 10^7 rows. The query takes about 90 seconds to run, which is significantly longer than I would like.
Any suggestions on how to make this query run faster?
UPDATE1: On luisarcher's request I've added a query plan (hope that's exactly what he asked for)
PLAN JOIN (SORT ((T1 SROK_TEL NATURAL)
PLAN (T1 SROK_TEL NATURAL)
PLAN (T1 SROK_TEL NATURAL)), T2 SROK_TEL INDEX (PK_SROK_TEL))

I've had an issue like yours not long ago, so I'll share some tips that apply to your situation:
1) If you don't mind having duplicates, you can use UNION ALL instead of UNION, which skips the extra work UNION does to eliminate duplicate rows.
2) Restrict the data you use. This one is important; I cut execution time by about 90% by correctly removing data I didn't need from the query (more specific WHERE clauses, not selecting useless columns).
3) Check whether you can add an index on your table srok_tel.
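One thing worth checking before applying tip 1 to a query that feeds SUM: UNION and UNION ALL can give different totals, because UNION collapses identical (ind_st, r) rows coming from different branches. The sketch below shows this with SQLite from Python instead of InterBase/Firebird; the table is reduced to three columns and the two rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE srok_tel (ind_st INTEGER, srok_ch INTEGER, rrr REAL)")
conn.executemany(
    "INSERT INTO srok_tel VALUES (?, ?, ?)",
    [(33049, 6, 1.5), (33049, 12, 1.5)],  # same station, same value, different srok_ch
)


def total(set_op):
    # set_op is either "UNION" or "UNION ALL"; everything else is identical.
    sql = f"""
        SELECT ind_st, SUM(r) FROM (
            SELECT ind_st, rrr AS r FROM srok_tel WHERE srok_ch = 6
            {set_op}
            SELECT ind_st, rrr AS r FROM srok_tel WHERE srok_ch = 12
        ) GROUP BY ind_st
    """
    return conn.execute(sql).fetchall()


union_rows = total("UNION")
union_all_rows = total("UNION ALL")
print(union_rows)
print(union_all_rows)
```

With UNION the two identical rows are merged and the sum is 1.5; with UNION ALL both rows survive and the sum is 3.0. So switching is not only a speed-up: it is worth confirming which of the two totals the report is supposed to show.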

Check all associated records before returning a value

So, I'm trying to create a collection in SCCM that gives me a list of assets (Name0) that don't have an .ide file linked to them that is newer than 21 days old. Once identified, I can go off and investigate why these assets are not updating.
So far I have written the following query in SSMS before I set it up in SCCM, but it's become evident that this isn't the correct approach.
SELECT DISTINCT v_GS_SYSTEM.Name0
FROM v_GS_SYSTEM inner join v_GS_SoftwareFile
ON v_GS_SoftwareFile.ResourceID = v_GS_SYSTEM.ResourceID
WHERE (DATEDIFF(day, v_GS_SoftwareFile.ModifiedDate, getdate()) >=21)
AND NOT
(DATEDIFF(day, v_GS_SoftwareFile.ModifiedDate, getdate()) <=21)
AND
v_GS_SoftwareFile.FileName like '/%.ide/'
ORDER BY v_GS_SYSTEM.Name0;
This code returns the "correct" values but doesn't consider the fact that an asset may still have newer .ide files related to it, which defeats the purpose of this exercise.
So (I think!) my question is: is there a way to check whether Name0 has any associated ModifiedDate records newer than 21 days, and only return a value if this check returns true/false?
EDIT: edited @MatBailie's answer with output:

To join all '*.ide' Files to their Resource, but only for resources that have not had any '*.ide' files modified in the last 21 days...
SELECT
s.Name0,
f.FilePath,
f.FileName
FROM
(
SELECT
*,
MAX(ModifiedDate) OVER (PARTITION BY ResourceID) AS ResourceMaxModifiedDate
FROM
v_GS_SoftwareFile
WHERE
FileName LIKE '%.ide'
)
AS f
INNER JOIN
v_GS_SYSTEM AS s
ON s.ResourceID = f.ResourceID
WHERE
f.ResourceMaxModifiedDate <= DATEADD(DAY, -21, GETDATE())
ORDER BY
s.Name0,
f.FilePath,
f.FileName
To get all Resources that have had no '*.ide' files modified in the last 21 days...
SELECT
s.Name0
FROM
v_GS_SYSTEM AS s
WHERE
NOT EXISTS (
SELECT *
FROM v_GS_SoftwareFile AS f
WHERE f.FileName LIKE '%.ide'
AND f.ResourceID = s.ResourceID
AND f.ModifiedDate >= DATEADD(DAY, -21, GETDATE())
)
ORDER BY
s.name0
Consider your indexes on these tables depending on which query you end up with. A covering index over (ResourceID, ModifiedDate) would be useful. A flag for the file type would be useful too (LIKE '%.ide' has a leading wildcard, so it requires scanning the rows to find the matches; it can't be solved with a typical index).
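The two queries above do not return quite the same set of systems. As a rough sketch of the NOT EXISTS version, here it is against SQLite from Python, with the two views replaced by small hypothetical tables, invented rows, and a fixed cutoff string standing in for DATEADD(DAY, -21, GETDATE()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v_GS_SYSTEM (ResourceID INTEGER, Name0 TEXT);
    CREATE TABLE v_GS_SoftwareFile (ResourceID INTEGER, FileName TEXT, ModifiedDate TEXT);
    INSERT INTO v_GS_SYSTEM VALUES (1, 'PC-OLD'), (2, 'PC-FRESH'), (3, 'PC-NOFILES');
    INSERT INTO v_GS_SoftwareFile VALUES
        (1, 'a.ide', '2024-01-01'),  -- only an old .ide file
        (2, 'a.ide', '2024-01-01'),  -- an old .ide file ...
        (2, 'b.ide', '2024-03-01'),  -- ... and a recent one
        (2, 'c.txt', '2024-03-02');  -- not an .ide file, ignored
""")

cutoff = "2024-02-20"  # assumed stand-in for DATEADD(DAY, -21, GETDATE())
stale = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.Name0
        FROM v_GS_SYSTEM AS s
        WHERE NOT EXISTS (
            SELECT 1
            FROM v_GS_SoftwareFile AS f
            WHERE f.FileName LIKE '%.ide'
              AND f.ResourceID = s.ResourceID
              AND f.ModifiedDate >= ?
        )
        ORDER BY s.Name0
        """,
        (cutoff,),
    )
]
print(stale)
```

PC-FRESH is excluded because one of its .ide files is recent, even though it also has an old one. PC-NOFILES is included because it has no .ide files at all; the first query, which starts from the file view and inner-joins to the systems, would not list such a machine.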

You can simply add an additional EXISTS clause where you check this.
I believe the query you're trying to write is:
SELECT DISTINCT vs.Name0
, vsf.FilePath
, vsf.FileName
FROM v_GS_SYSTEM vs
INNER JOIN v_GS_SoftwareFile vsf
ON vsf.ResourceID = vs.ResourceID
WHERE (DATEDIFF(day, vsf.ModifiedDate, getdate()) >= 21)
AND NOT (DATEDIFF(day, vsf.ModifiedDate, getdate()) <= 21) -- this seems a bit redundant and might even exclude some rows where the result is exactly (21 * 24) hours
AND vsf.FileName LIKE '/%.ide/'
AND NOT EXISTS (
SELECT 1
FROM v_GS_SoftwareFile vsf2
WHERE vsf2.ModifiedDate > GETDATE() - 21
AND vsf2.ResourceId = vsf.ResourceId
)
ORDER BY vs.Name0;
You can basically flip the check between TRUE and FALSE by keeping or removing the NOT in the AND NOT EXISTS.
Edit:
Since you mentioned performance problems, check whether you have non-clustered indexes:
on column ResourceId in the v_GS_SoftwareFile table (I really hope this is not a view, but the v_ at the start kind of makes me think it is).
on column ResourceID in the v_GS_SYSTEM table (same concern here)
