How to select rows with max date older then some value - sql

I have Microsoft SQL Server 2008 and a table with data like this:
id | file_date [datatime] | file_path [varchar(255)]
____________________________________________________
1 | 01-01-1999 | C:\f1.txt
2 | 01-01-2020 | C:\f2.txt
3 | 05-05-1999 | C:\f3.txt
4 | 05-05-2020 | C:\f3.txt
5 | 05-05-1999 | C:\f4.txt
6 | 06-05-1999 | C:\f4.txt
I need to select all file_paths, where file_date is old and no other rows with this file_path with newer file_date exists
For example, if I have to fetch rows with dates older then 2019, my result should be like this:
file_path
C:\f1.txt
C:\f4.txt
I have a solution:
SELECT rslt.file_path
FROM mytable rslt
GROUP BY rslt.file_path
HAVING MAX(rslt.file_date) < '2019-01-01'
The problem is that this script takes ~2 minutes to returns ~62k of rows in a table, where I have 44.6 millions of rows, and simple script to take all rows older than the date (see below) takes 2-3 seconds
SELECT * FROM mytable WHERE file_date < '2019-01-01'
So, is there any way to optimize my solution?

How long does this take?
SELECT t.file_path
FROM mytable t
WHERE NOT EXISTS (SELECT 1
FROM mytable t2
WHERE t2.file_path = t.file_path AND t2.file_date >= '2019-01-01'
);
You want an index on (file_path, file_date) for best performance.

Could you do a negation of your second faster query and do a NOT IN?
SELECT rslt.file_path
FROM mytable rslt
WHERE rslt.file_path NOT IN
(SELECT rslt2.file_path
FROM mytable rslt2
WHERE rslt2.file_path IS NOT NULL
AND rslt2.file_date >= '2019-01-01')
GROUP BY rslt.file_path;
NOT IN appears to get a bit funky if the selection pulls back nulls, so I put a IS NOT NULL in the where of the inner query as well, but it may not be necessary for you.

DECLARE #TargetDate date = '01-01-2019'
DECLARE #PathList TABLE (id int, file_date datetime, file_path varchar(255))
INSERT INTO #PathList VALUES
(1, '01-01-1999', 'C:\f1.txt')
, (2, '01-01-2020', 'C:\f2.txt')
, (3, '05-05-1999', 'C:\f3.txt')
, (4, '05-05-2020', 'C:\f3.txt')
, (5, '05-05-1999', 'C:\f4.txt')
, (6, '06-05-1999', 'C:\f4.txt')
;
SELECT DISTINCT
PL.file_path
FROM #PathList PL
LEFT JOIN #PathList PH ON PH.file_path = PL.file_path
AND PH.file_date >= #TargetDate
WHERE
PL.file_date < #TargetDate
AND PH.id IS NULL

Check this
SELECT rslt.file_path, MAX(rslt.file_date) as Max_file_date
into #t
FROM mytable rslt
GROUP BY rslt.file_path
Select file_path
From #t
Where Max_file_date < '2019-01-01'
or try
SELECT rslt.file_path
into #t
FROM mytable rslt
WHERE file_date < '2019-01-01'
GROUP BY rslt.file_path

Related

SQL Server and difficulty generating column with unique values

I'm using Microsoft SQL Server Management Studio, and I want a new column that calculates the following:
If it has an ‘Exec’ value for category, it takes the ‘enddate’.
If it has an ‘Scop’ value for category, it takes the ‘start date’.
This new column calculates the number of months between these two.
I want SQL to do the calculation for a given id, so each id will have different values calculated.
At the moment it takes the minimum enddate and minimum 'startdate' for the entire table.
SELECT
id, category, startdate, enddate,
CASE
WHEN id = id
THEN DATEDIFF(month,
(SELECT MIN(enddate) from [A].[PP] where category = 'Exec'),
(SELECT MIN(startdate) from [A].[PP] where category = 'Scop')) --AS datemodify
ELSE NULL
END
FROM
[A].[PP]
WHERE
startdate IS NOT NULL
AND (category = 'Exec' OR category = 'Scop')
ORDER BY
id ASC
Results it produces at the moment:
id
category
startdate
enddate
NewCOlumn
1
Scop
2022-11-1
2022-10-1
11
1
Exec
2023-11-1
2023-10-1
11
2
Scop
2022-11-1
2022-10-1
11
2
Exec
2023-11-1
2023-09-1
11
The results I want:
id
category
startdate
enddate
NewCOlumn
1
Scop
2021-11-1
2022-10-1
24
1
Exec
2023-11-1
2023-11-1
24
2
Scop
2022-11-1
2022-10-1
11
2
Exec
2023-11-1
2023-09-1
11
Based on comments I'm not sure you still know you want as your output so I've come up with two different versions.
Here's how I'm created a version of your data set:
INSERT INTO #TempTable (ID, Category, StartDate, EndDate)
VALUES (1, 'Scop', '2021-11-01', '2022-10-01'),
(1, 'Exec', '2023-11-01', '2023-10-01'),
(2, 'Scop', '2022-11-01', '2022-10-01'),
(2, 'Exec', '2023-11-01', '2023-10-01');
This is the first version, this created your two lines per ID but hacks the StartDate and EndDate from different rows. This works by selecting all of the data straight out of the temp table, it then goes on to say if the row is Category = Scop then do a DateDiff between the StartDate and then fetches the EndDate from a subquery where the IDs match and the Category = Exec (it also has the same logic applied but the other way around for where the initial Category = Exec):
SELECT TT.ID,
TT.Category,
TT.StartDate,
TT.EndDate,
CASE
WHEN TT.Category = 'Scop' THEN DATEDIFF(M, TT.StartDate, (SELECT EndDate FROM #TempTable WHERE Category = 'Exec' AND ID = TT.ID))
ELSE CASE
WHEN TT.Category = 'Exec' THEN DATEDIFF(M, (SELECT StartDate FROM #TempTable WHERE Category = 'Scop' AND ID = TT.ID), TT.EndDate)
END
END AS DateDiffCalc
FROM #TempTable AS TT;
This version compresses the IDs to a single row, it initially only fetches Scop data, but then joins back to itself using ID and specifices now to get the Exec data only. Now you can DateDiff between the Scop StartDate and the Exec EndDate
SELECT DISTINCT t1.ID,
t1.Category,
t1.StartDate,
T2.Category,
T2.EndDate,
DATEDIFF(M, t1.StartDate, T2.EndDate) AS DateDiffCalc
FROM #TempTable AS t1
INNER JOIN #TempTable AS T2 ON T2.ID = T2.ID AND T2.Category = 'Exec'
WHERE t1.Category = 'Scop'
ORDER BY t1.ID;

Exclude rows where dates exist in another table

I have 2 tables, one is working pattern, another is absences.
1) Work pattern
ID | Shift Start | Shift End
123| 01-03-2017 | 02-03-2017
2) Absences
ID| Absence Start | Absence End
123| 01-03-2017 | 04-03-2017
What would be the best way, when selecting rows from work pattern, to exclude any that have a date marked as an absence in the absence table?
For example, I have a report that uses the work pattern table to count how may days a week an employee has worked, however I don't want it to include the days that have been marked as an absence on the absence table if that makes sense? Also don't want it to include any days that fall between the absence start and absence end date?
If the span of the absence should always encompass the shift to be excluded you can use not exists():
select *
from WorkPatterns w
where not exists (
select 1
from Absences a
where a.Id = w.Id
and a.AbsenceStart <= w.ShiftStart
and a.AbsenceEnd >= w.ShiftEnd
)
rextester demo: http://rextester.com/DCODC76816
returns:
+-----+------------+------------+
| id | ShiftStart | ShiftEnd |
+-----+------------+------------+
| 123 | 2017-02-27 | 2017-02-28 |
| 123 | 2017-03-05 | 2017-03-06 |
+-----+------------+------------+
given this test setup:
create table WorkPatterns ([id] int, [ShiftStart] datetime, [ShiftEnd] datetime) ;
insert into WorkPatterns ([id], [ShiftStart], [ShiftEnd]) values
(123, '20170227', '20170228')
,(123, '20170301', '20170302')
,(123, '20170303', '20170304')
,(123, '20170305', '20170306')
;
create table Absences ([id] int, [AbsenceStart] datetime, [AbsenceEnd] datetime) ;
insert into Absences ([id], [AbsenceStart], [AbsenceEnd]) values
(123, '20170301', '20170304');
What would be the best way, when selecting rows from work pattern
If you dealing only whit dates (no time) and have control over db schema,
One approach will be to create calendar table ,
Where you going to put all dates since company started and some years in future
Fill that table once.
After it is easy to join other tables whit dates and do math.
If you have trouble whit constructing TSQL query please edit question whit more details about columns and values of tables, relations and needed results.
How about this:
SELECT WP_START.[id], WP_START.[shift_start], WP_START.[shift_end]
FROM work_pattern AS WP_START
INNER JOIN absences AS A ON WP_START.id = A.id
WHERE WP_START.[shift_start] NOT BETWEEN A.[absence_start] AND A.[absence_end]
UNION
SELECT WP_END.[id], WP_END.[shift_start], WP_END.[shift_end]
FROM work_pattern AS WP_END
INNER JOIN absences AS A ON WP_END.id = A.id
WHERE WP_END.[shift_end] NOT BETWEEN A.[absence_start] AND A.[absence_end]
See it on SQL Fiddle: http://sqlfiddle.com/#!6/49ae6/6
Here is my example that includes a Date Dimension table. If your DBAs won't add it, you can create #dateDim as a temp table, like I've done with SQLFiddle (didn't know I could do that). A typical date dimension would have a lot more details you need about the days, but if the table can't be added, just use what you need. You'll have to populate the other Holidays you need. The DateDim I use often is at https://github.com/shawnoden/SQL_Stuff/blob/master/sql_CreateDateDimension.sql
SQL Fiddle
MS SQL Server 2014 Schema Setup:
/* Tables for your test data. */
CREATE TABLE WorkPatterns ( id int, ShiftStart date, ShiftEnd date ) ;
INSERT INTO WorkPatterns ( id, ShiftStart, ShiftEnd )
VALUES
(123, '20170101', '20171031')
, (124, '20170601', '20170831')
;
CREATE TABLE Absences ( id int, AbsenceStart date, AbsenceEnd date ) ;
INSERT INTO Absences ( id, AbsenceStart, AbsenceEnd )
VALUES
( 123, '20170123', '20170127' )
, ( 123, '20170710', '20170831' )
, ( 124, '20170801', '20170820' )
;
/* ******** MAKE SIMPLE CALENDAR TABLE ******** */
CREATE TABLE dateDim (
theDate DATE NOT NULL
, IsWeekend BIT DEFAULT 0
, IsHoliday BIT DEFAULT 0
, IsWorkDay BIT DEFAULT 0
);
/* Populate basic details of dates. */
INSERT dateDim(theDate, IsWeekend, IsHoliday)
SELECT d
, CONVERT(BIT, CASE WHEN DATEPART(dw,d) IN (1,7) THEN 1 ELSE 0 END)
, CONVERT(BIT, CASE WHEN d = '20170704' THEN 1 ELSE 0 END) /* 4th of July. */
FROM (
SELECT d = DATEADD(DAY, rn - 1, '20170101')
FROM
(
SELECT TOP (DATEDIFF(DAY, '20170101', '20171231'))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y ;
/* If not a weekend or holiday, it's a WorkDay. */
UPDATE dateDim
SET IsWorkDay = CASE WHEN IsWeekend = 0 AND IsHoliday = 0 THEN 1 ELSE 0 END
;
Query For Calculation:
SELECT wp.ID, COUNT(d.theDate) AS workDayCount
FROM WorkPatterns wp
INNER JOIN dateDim d ON d.theDate BETWEEN wp.ShiftStart AND wp.ShiftEnd
AND d.IsWorkDay = 1
LEFT OUTER JOIN Absences a ON d.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
AND wp.ID = a.ID
WHERE a.ID IS NULL
GROUP BY wp.ID
ORDER BY wp.ID
Results:
| ID | workDayCount |
|-----|--------------|
| 123 | 172 | << 216 total days, 44 non-working
| 124 | 51 | << 65 total days, 14 non-working

Highlight multiple records in a date range

Working with SQL Server 2008.
fromdate todate ID name
--------------------------------
1-Aug-16 7-Aug-16 x jack
3-Aug-16 4-Aug-16 x jack
5-Aug-16 6-Aug-16 x tom
1-Aug-16 2-Aug-16 x john
3-Aug-16 4-Aug-16 x harry
5-Aug-16 6-Aug-16 x mac
Is there a way to script this so that I know if there are multiple names tagged to an ID in the same date range?
For example above, I want to flag that ID x has Name Jack and Tom tagged in the same date range.
ID multiple_flag
------------------------------------------------
x yes
y no
If there is a unique index in your table (in my example it is column i but you could also generate one by means of using ROW_NUMBER()) then you can do the following query based on an INNER JOIN to find overlapping date ranges:
CREATE TABLE #tmp (i int identity primary key,fromdate date,todate date,ID int,name varchar(32));
insert into #tmp (fromdate,todate,ID ,name) values
('1-Aug-16','7-Aug-16',3,'jack'),
('3-Aug-16','4-Aug-16',3,'tom'),
('5-Aug-16','6-Aug-16',3,'jack');
select a.*,b.name bname,b.i i2 from #tmp a
INNER join #tmp b on b.id=a.id AND b.i<>a.i
AND ( b.fromdate between a.fromdate and a.todate
OR b.todate between a.fromdate and a.todate)
(My id column is int). This will give you:
i fromdate todate ID name bname i2
- ---------- ---------- - ---- ----- --
1 2016-08-01 2016-08-07 3 jack tom 2
1 2016-08-01 2016-08-07 3 jack jack 3
Implement further filtering or grouping as required. I left a little demo here.
Please check the below sql, but it might not be the optimal one..
SELECT formdate,todate,id,tab1.name,
case when tab2.#Of >1 then 'yes' else 'no' end as multiple_flag
FROM tab1
inner join (SELECT Name, COUNT(*) as #Of
FROM tab1
GROUP BY Name) as tab2 on tab1.name=tab2.name
order by tab1.id ;
add your where condition, before the order by, if you need to add some date range on your sql.
change formdate to fromdate before run this sql, as I have used formdate in my machine.
The result looks like
One way to do it is using EXISTS CASE:
Please note this part of the query:
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
for an explanation on testing for overlapping ranges, read the overlap wiki.
Create and populate sample data (Please save us this step in your future questions)
DECLARE #T as table
(
fromdate date,
todate date,
ID char(1),
name varchar(10)
)
INSERT INTO #T VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-05', '2016-08-06', 'x', 'jack'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry'),
('2016-08-05', '2016-08-06', 'y', 'mac')
The query:
SELECT DISTINCT id,
CASE WHEN EXISTS
(
SELECT 1
FROM #T t2
WHERE t1.Id = t2.Id
-- make sure it's not the same record
AND t1.fromdate <> t2.fromdate
AND t1.todate <> t2.todate
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
)
THEN 'Yes'
ELSE 'No'
END As multiple_flag
FROM #T t1
Results:
id multiple_flag
---- -------------
x Yes
y No

Select between dates and with initial values

I need to make a graph from a log. The log entries are not in regular intervals.
I would like to select rows between dates along with what the values were immediately before the start date (that is, from whenever the immediatly preceeding log was entered).
So, let's say:
table Foo has id and value columns,
table Bar has id, foo_id, and value columns, and
table BarLog has id, foo_id, bar_id, bar_value and timestamp.
So there can be many Bars for one Foo.
I need all rows from BarLog for all Bars given some foo_id between, say, 07/01/2012 and 07/31/2012 and the value (row) for each Bar as it was on 07/01/2012.
Hope that made sense, if not, I'll try to clarify.
EDIT (above left for context):
Let's simplify this down another step. If I have a table with two foreign keys, fk_a and fk_b, and a timestamp, how can I get the most recent rows with a given fk_a and a distict fk_b.
As suggested, here's an example.
+----+------+------+-------------+
| id | fk_a | fk_b | timestamp |
+----+------+------+-------------+
| 1 | 1 | 1 | 01-JUL-2012 |
| 2 | 2 | 2 | 02-JUL-2012 |
| 3 | 1 | 1 | 04-JUL-2012 |
| 4 | 2 | 2 | 05-JUL-2012 |
| 5 | 1 | 3 | 07-JUL-2012 |
+----+------+------+-------------+
Given a fk_a of 1, I would want rows 3 and 5. So looking only at rows 1, 3, and 5 (those with fk_a of 1), get the most recent of each fk_b (where row 3 is more recent than row 1 for fk_b=1).
Thanks again.
Are you looking for something like this?
SELECT bl.bar_value, timestamp
FROM foo f, bar b, barlog bl
WHERE f.id = b.id
AND b.foo_id = bl.foo_id
AND timestamp BETWEEN '01-JUL-2012' AND '31-JUL-2012'
AND b.foo_id = :enter_value_here
ORDER BY timestamp DESC
Use the :enter_value_here to add the foo_id you need the data for...
What plotting tool are you using? You can take the data-set and push it into excel for plotting..in any case, hopefully the query above can get you closer to what you're trying to do.
For a dense set, create a date table and run the following query:
DECLARE #StartDate datetime
SET #StartDate = '2012-01-01'
SELECT f.ID as foo_id, b.bar_id, f.Value, GetDate() as DateStamp
FROM Foo f
inner join Bar b on f.id = b.foo_id
WHERE /*enter criteria for bar selection*/
UNION ALL
SELECT f.ID as foo_id, b.bar_id, f.Value, GetDate() as DateStamp
FROM (
SELECT MAX(bl.timestamp) as bl_timestamp, bl.bar_id as bar_id
FROM Dates d
INNER JOIN BarLog bl on bl.timestamp < d.Date
WHERE /*enter criteria for bar selection*/
GROUP BY bl.bar_id
) as pi
INNER JOIN BarLog bl on pi.bar_id = bl.bar_id and bl.timestamp = pi.bl_timestamp
WHERE d.Day_Of_Month = 1 and d.Date between #StartDate and getDate()
AND /*enter criteria for bar selection*/
The date table can be something like http://it.toolbox.com/wiki/index.php/Create_a_Time_Dimension_/_Date_Table or could be created temporarily each query by:
CREATE TABLE #Dates ([Date] datetime, Day_Of_Month int)
DECLARE #cDate datetime
SET #cDate = #StartDate
WHILE #cDate < getdate()
BEGIN
INSERT INTO #Dates (Date, Day_Of_Month)
SELECT #cDate, Datepart(d, #cdate)
SET #cDate = DATEADD(m, 1 + DATEDIFF(m, 0, #cdate), 0)
END
with a DROP TABLE #Dates sitting after the select.
This query will return:
Foo_ID, Bar_ID, Value at datestamp, Datestamp
with the datestamps incrementing by 1 month at a time.
Finally found this question which had what I was looking for. Basically just joining with a grouped select. So the answer for my edit would be something like
SELECT * FROM SomeTable a
JOIN (
SELECT fk_b, MAX(timestamp) as latest
FROM SomeTable
GROUP BY fk_b
) b
ON a.id = b.id
WHERE a.fk_a = #someIdA
Which would return the latest of each distinct fk_b with a specified fk_a
The original question would just be a union of this with a simple get between dates

Ungrouping effect?

I have a dynamic set of data X of the form:
----------------------------------
x.id | x.allocated | x.unallocated
----------------------------------
foo | 2 | 0
bar | 1 | 2
----------------------------------
And I need to get to a result of Y (order is unimportant):
----------------------------------
y.id | y.state
----------------------------------
foo | allocated
foo | allocated
bar | allocated
bar | unallocated
bar | unallocated
----------------------------------
I have a UTF based solution, but I'm looking for hyper-efficiency so I'm idly wondering if there's a statement based, non-procedural way to get this kind of "ungroup by" effect?
It feels like an unpivot, but my brain can't get there right now.
If you have a numbers table in your database, you could use that to help get your results. In my database, I have a table named Numbers with a Num column.
Declare #Temp Table(id VarChar(10), Allocated Int, UnAllocated Int)
Insert Into #Temp Values('foo', 2, 0)
Insert Into #Temp Values('bar',1, 2)
Select T.id,'Allocated'
From #Temp T
Inner Join Numbers
On T.Allocated >= Numbers.Num
Union All
Select T.id,'Unallocated'
From #Temp T
Inner Join Numbers
On T.unAllocated >= Numbers.Num
Using Sql Server 2005, UNPIVOT, and CTE you can try something like
DECLARE #Table TABLE(
id VARCHAR(20),
allocated INT,
unallocated INT
)
INSERT INTO #Table SELECT 'foo', 2, 0
INSERT INTO #Table SELECT 'bar', 1, 2
;WITH vals AS (
SELECT *
FROM
(
SELECT id,
allocated,
unallocated
FROM #Table
) p
UNPIVOT (Cnt FOR Action IN (allocated, unallocated)) unpvt
WHERE Cnt > 0
)
, Recurs AS (
SELECT id,
Action,
Cnt - 1 Cnt
FROM vals
UNION ALL
SELECT id,
Action,
Cnt - 1 Cnt
FROM Recurs
WHERE Cnt > 0
)
SELECT id,
Action
FROM Recurs
ORDER BY id, action
This answer is just to ping back to G Mastros and doesn't need any upvotes. I thought he would appreciate a performance boost to his already superior query.
SELECT
T.id,
CASE X.Which WHEN 1 THEN 'Allocated' ELSE 'Unallocated' END
FROM
#Temp T
INNER JOIN Numbers N
On N.Num <= CASE X.Which WHEN 1 THEN T.Allocated ELSE T.Unallocated END
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) X (Which)