Select between dates and with initial values - sql

I need to make a graph from a log. The log entries are not in regular intervals.
I would like to select rows between dates, along with the values as they were immediately before the start date (that is, from the immediately preceding log entry).
So, let's say:
table Foo has id and value columns,
table Bar has id, foo_id, and value columns, and
table BarLog has id, foo_id, bar_id, bar_value and timestamp.
So there can be many Bars for one Foo.
I need all rows from BarLog for all Bars given some foo_id between, say, 07/01/2012 and 07/31/2012 and the value (row) for each Bar as it was on 07/01/2012.
Hope that made sense, if not, I'll try to clarify.
EDIT (above left for context):
Let's simplify this down another step. If I have a table with two foreign keys, fk_a and fk_b, and a timestamp, how can I get the most recent rows with a given fk_a and a distinct fk_b?
As suggested, here's an example.
+----+------+------+-------------+
| id | fk_a | fk_b | timestamp |
+----+------+------+-------------+
| 1 | 1 | 1 | 01-JUL-2012 |
| 2 | 2 | 2 | 02-JUL-2012 |
| 3 | 1 | 1 | 04-JUL-2012 |
| 4 | 2 | 2 | 05-JUL-2012 |
| 5 | 1 | 3 | 07-JUL-2012 |
+----+------+------+-------------+
Given a fk_a of 1, I would want rows 3 and 5. So looking only at rows 1, 3, and 5 (those with fk_a of 1), get the most recent of each fk_b (where row 3 is more recent than row 1 for fk_b=1).
Thanks again.

Are you looking for something like this?
SELECT bl.bar_value, bl.timestamp
FROM foo f
JOIN bar b ON b.foo_id = f.id
JOIN barlog bl ON bl.bar_id = b.id
WHERE f.id = :enter_value_here
AND bl.timestamp BETWEEN '01-JUL-2012' AND '31-JUL-2012'
ORDER BY bl.timestamp DESC
Use :enter_value_here to supply the foo_id you need the data for.
What plotting tool are you using? You can take the data set and push it into Excel for plotting. In any case, hopefully the query above gets you closer to what you're trying to do.

For a dense set, create a date table and run the following query:
DECLARE @StartDate datetime
SET @StartDate = '2012-01-01'
SELECT f.ID as foo_id, b.bar_id, f.Value, GetDate() as DateStamp
FROM Foo f
inner join Bar b on f.id = b.foo_id
WHERE /*enter criteria for bar selection*/
UNION ALL
SELECT f.ID as foo_id, b.bar_id, f.Value, GetDate() as DateStamp
FROM (
SELECT MAX(bl.timestamp) as bl_timestamp, bl.bar_id as bar_id
FROM Dates d
INNER JOIN BarLog bl on bl.timestamp < d.Date
WHERE /*enter criteria for bar selection*/
GROUP BY bl.bar_id
) as pi
INNER JOIN BarLog bl on pi.bar_id = bl.bar_id and bl.timestamp = pi.bl_timestamp
WHERE d.Day_Of_Month = 1 and d.Date between @StartDate and getDate()
AND /*enter criteria for bar selection*/
The date table can be something like http://it.toolbox.com/wiki/index.php/Create_a_Time_Dimension_/_Date_Table or could be created temporarily each query by:
CREATE TABLE #Dates ([Date] datetime, Day_Of_Month int)
DECLARE @cDate datetime
SET @cDate = @StartDate
WHILE @cDate < getdate()
BEGIN
INSERT INTO #Dates (Date, Day_Of_Month)
SELECT @cDate, Datepart(d, @cDate)
SET @cDate = DATEADD(m, 1 + DATEDIFF(m, 0, @cDate), 0)
END
with a DROP TABLE #Dates sitting after the select.
This query will return:
Foo_ID, Bar_ID, Value at datestamp, Datestamp
with the datestamps incrementing by 1 month at a time.

Finally found this question which had what I was looking for. Basically just joining with a grouped select. So the answer for my edit would be something like
SELECT a.* FROM SomeTable a
JOIN (
SELECT fk_b, MAX(timestamp) as latest
FROM SomeTable
WHERE fk_a = @someIdA
GROUP BY fk_b
) b
ON a.fk_b = b.fk_b AND a.timestamp = b.latest
WHERE a.fk_a = @someIdA
This returns the latest row for each distinct fk_b with the specified fk_a.
The original question would then just be a union of this (with the grouped select restricted to timestamps before the start date) with a simple select between dates.
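To sanity-check the join-with-grouped-select idea against the example table from the edit, here is a minimal sketch that runs the SQL in SQLite through Python's sqlite3 module. SQLite, the ISO date strings, and the named parameter :fk_a are my assumptions for the sake of a runnable demo; the table name SomeTable and the columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SomeTable (id INTEGER PRIMARY KEY, fk_a INTEGER, fk_b INTEGER, timestamp TEXT)"
)
# Example rows from the question, with dates rewritten as ISO strings
# so that plain string comparison orders them chronologically.
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2012-07-01"),
        (2, 2, 2, "2012-07-02"),
        (3, 1, 1, "2012-07-04"),
        (4, 2, 2, "2012-07-05"),
        (5, 1, 3, "2012-07-07"),
    ],
)

LATEST_PER_FK_B = """
SELECT a.id, a.fk_a, a.fk_b, a.timestamp
FROM SomeTable a
JOIN (
    -- latest timestamp per fk_b, considering only the requested fk_a
    SELECT fk_b, MAX(timestamp) AS latest
    FROM SomeTable
    WHERE fk_a = :fk_a
    GROUP BY fk_b
) b ON a.fk_b = b.fk_b AND a.timestamp = b.latest
WHERE a.fk_a = :fk_a
ORDER BY a.id
"""

latest_rows = conn.execute(LATEST_PER_FK_B, {"fk_a": 1}).fetchall()
print(latest_rows)  # rows 3 and 5
```

The join has to match on both fk_b and the maximum timestamp, because the grouped select carries no id column to join on.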

Related

How to select rows with max date older than some value

I have Microsoft SQL Server 2008 and a table with data like this:
id | file_date [datetime] | file_path [varchar(255)]
____________________________________________________
1 | 01-01-1999 | C:\f1.txt
2 | 01-01-2020 | C:\f2.txt
3 | 05-05-1999 | C:\f3.txt
4 | 05-05-2020 | C:\f3.txt
5 | 05-05-1999 | C:\f4.txt
6 | 06-05-1999 | C:\f4.txt
I need to select all file_paths where file_date is old and no other row with the same file_path has a newer file_date.
For example, if I have to fetch rows with dates older than 2019, my result should look like this:
file_path
C:\f1.txt
C:\f4.txt
I have a solution:
SELECT rslt.file_path
FROM mytable rslt
GROUP BY rslt.file_path
HAVING MAX(rslt.file_date) < '2019-01-01'
The problem is that this script takes ~2 minutes to return ~62k rows from a table with 44.6 million rows, whereas a simple script that takes all rows older than the date (see below) takes 2-3 seconds:
SELECT * FROM mytable WHERE file_date < '2019-01-01'
So, is there any way to optimize my solution?
How long does this take?
SELECT t.file_path
FROM mytable t
WHERE NOT EXISTS (SELECT 1
FROM mytable t2
WHERE t2.file_path = t.file_path AND t2.file_date >= '2019-01-01'
);
You want an index on (file_path, file_date) for best performance.
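For what it's worth, here is a small sketch of the NOT EXISTS approach on the sample rows from the question, run in SQLite via Python's sqlite3 rather than SQL Server (that swap, the ISO date strings, the index name ix_path_date, and the :cutoff parameter are my assumptions). I added DISTINCT because the bare NOT EXISTS query returns one row per qualifying table row, so C:\f4.txt would otherwise appear twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, file_date TEXT, file_path TEXT)"
)
# Sample rows from the question; dates rewritten as ISO strings
# (the day/month order of the originals is assumed, it does not affect the result).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "1999-01-01", r"C:\f1.txt"),
        (2, "2020-01-01", r"C:\f2.txt"),
        (3, "1999-05-05", r"C:\f3.txt"),
        (4, "2020-05-05", r"C:\f3.txt"),
        (5, "1999-05-05", r"C:\f4.txt"),
        (6, "1999-05-06", r"C:\f4.txt"),
    ],
)
# Composite index suggested in the answer (the index name is made up).
conn.execute("CREATE INDEX ix_path_date ON mytable (file_path, file_date)")

query = """
SELECT DISTINCT t.file_path
FROM mytable t
WHERE NOT EXISTS (
    SELECT 1
    FROM mytable t2
    WHERE t2.file_path = t.file_path
      AND t2.file_date >= :cutoff
)
ORDER BY t.file_path
"""
old_paths = [row[0] for row in conn.execute(query, {"cutoff": "2019-01-01"})]
print(old_paths)  # only f1 and f4 have no row dated 2019 or later
```

Timings on a six-row table say nothing about the 44.6-million-row case, so this only checks that the logic returns the expected paths.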
Could you do a negation of your second faster query and do a NOT IN?
SELECT rslt.file_path
FROM mytable rslt
WHERE rslt.file_path NOT IN
(SELECT rslt2.file_path
FROM mytable rslt2
WHERE rslt2.file_path IS NOT NULL
AND rslt2.file_date >= '2019-01-01')
GROUP BY rslt.file_path;
NOT IN returns no rows at all if the subquery yields a NULL, so I put an IS NOT NULL in the WHERE of the inner query as well; it may not be necessary for you if file_path is never NULL.
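The NULL behaviour of NOT IN can be seen in a tiny sketch like the following, again using SQLite through Python's sqlite3 as a stand-in for SQL Server (an assumption on my part, though both follow the standard three-valued logic here). The row with a NULL file_path is invented purely to trigger the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (file_date TEXT, file_path TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("1999-01-01", r"C:\f1.txt"),
        ("2020-01-01", r"C:\f2.txt"),
        ("2020-06-01", None),  # hypothetical recent row whose path is NULL
    ],
)

# Without the guard the subquery returns a NULL, so NOT IN evaluates to
# NULL (not TRUE) for every non-matching row and nothing comes back.
unguarded = conn.execute("""
SELECT file_path FROM mytable
WHERE file_path NOT IN (
    SELECT file_path FROM mytable WHERE file_date >= '2019-01-01'
)
""").fetchall()

# With IS NOT NULL in the inner WHERE, the old path is returned as expected.
guarded = conn.execute("""
SELECT file_path FROM mytable
WHERE file_path NOT IN (
    SELECT file_path FROM mytable
    WHERE file_path IS NOT NULL AND file_date >= '2019-01-01'
)
""").fetchall()

print(unguarded)
print(guarded)
```

NOT EXISTS does not have this problem, which is one reason it is often preferred for anti-joins.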
DECLARE @TargetDate date = '01-01-2019'
DECLARE @PathList TABLE (id int, file_date datetime, file_path varchar(255))
INSERT INTO @PathList VALUES
(1, '01-01-1999', 'C:\f1.txt')
, (2, '01-01-2020', 'C:\f2.txt')
, (3, '05-05-1999', 'C:\f3.txt')
, (4, '05-05-2020', 'C:\f3.txt')
, (5, '05-05-1999', 'C:\f4.txt')
, (6, '06-05-1999', 'C:\f4.txt')
;
SELECT DISTINCT
PL.file_path
FROM @PathList PL
LEFT JOIN @PathList PH ON PH.file_path = PL.file_path
AND PH.file_date >= @TargetDate
WHERE
PL.file_date < @TargetDate
AND PH.id IS NULL
Check this
SELECT rslt.file_path, MAX(rslt.file_date) as Max_file_date
into #t
FROM mytable rslt
GROUP BY rslt.file_path
Select file_path
From #t
Where Max_file_date < '2019-01-01'
or try
SELECT rslt.file_path
into #t
FROM mytable rslt
WHERE file_date < '2019-01-01'
GROUP BY rslt.file_path

Exclude rows where dates exist in another table

I have 2 tables, one is working pattern, another is absences.
1) Work pattern
ID | Shift Start | Shift End
123| 01-03-2017 | 02-03-2017
2) Absences
ID| Absence Start | Absence End
123| 01-03-2017 | 04-03-2017
What would be the best way, when selecting rows from work pattern, to exclude any that have a date marked as an absence in the absence table?
For example, I have a report that uses the work pattern table to count how many days a week an employee has worked, but I don't want it to include days that have been marked as an absence in the absence table, nor any days that fall between the absence start and absence end dates.
If the span of the absence should always encompass the shift to be excluded you can use not exists():
select *
from WorkPatterns w
where not exists (
select 1
from Absences a
where a.Id = w.Id
and a.AbsenceStart <= w.ShiftStart
and a.AbsenceEnd >= w.ShiftEnd
)
rextester demo: http://rextester.com/DCODC76816
returns:
+-----+------------+------------+
| id | ShiftStart | ShiftEnd |
+-----+------------+------------+
| 123 | 2017-02-27 | 2017-02-28 |
| 123 | 2017-03-05 | 2017-03-06 |
+-----+------------+------------+
given this test setup:
create table WorkPatterns ([id] int, [ShiftStart] datetime, [ShiftEnd] datetime) ;
insert into WorkPatterns ([id], [ShiftStart], [ShiftEnd]) values
(123, '20170227', '20170228')
,(123, '20170301', '20170302')
,(123, '20170303', '20170304')
,(123, '20170305', '20170306')
;
create table Absences ([id] int, [AbsenceStart] datetime, [AbsenceEnd] datetime) ;
insert into Absences ([id], [AbsenceStart], [AbsenceEnd]) values
(123, '20170301', '20170304');
What would be the best way, when selecting rows from work pattern
If you are dealing only with dates (no time) and have control over the DB schema,
one approach is to create a calendar table
containing every date since the company started plus some years into the future.
Fill that table once.
After that, it is easy to join other tables with dates and do the math.
If you have trouble constructing the T-SQL query, please edit the question with more details about the columns and values of the tables, their relations, and the needed results.
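To make the calendar-table idea concrete, here is a rough sketch using SQLite through Python's sqlite3 instead of T-SQL. The table name Calendar, its theDate column, the 2017 date range, and the recursive CTE used to fill it are all assumptions of mine; the shift and absence rows are borrowed from the test setup posted in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WorkPatterns (id INTEGER, ShiftStart TEXT, ShiftEnd TEXT);
INSERT INTO WorkPatterns VALUES
    (123, '2017-02-27', '2017-02-28'),
    (123, '2017-03-01', '2017-03-02'),
    (123, '2017-03-03', '2017-03-04'),
    (123, '2017-03-05', '2017-03-06');

CREATE TABLE Absences (id INTEGER, AbsenceStart TEXT, AbsenceEnd TEXT);
INSERT INTO Absences VALUES (123, '2017-03-01', '2017-03-04');

-- Calendar table: one row per day of 2017, filled once.
CREATE TABLE Calendar (theDate TEXT PRIMARY KEY);
WITH RECURSIVE days(d) AS (
    SELECT '2017-01-01'
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < '2017-12-31'
)
INSERT INTO Calendar SELECT d FROM days;
""")

# Expand each shift into its calendar days, drop the days covered by an
# absence for the same employee, and count what is left.
days_worked = conn.execute("""
SELECT w.id, COUNT(DISTINCT c.theDate) AS days_worked
FROM WorkPatterns w
JOIN Calendar c ON c.theDate BETWEEN w.ShiftStart AND w.ShiftEnd
WHERE NOT EXISTS (
    SELECT 1 FROM Absences a
    WHERE a.id = w.id
      AND c.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
)
GROUP BY w.id
""").fetchall()
print(days_worked)  # 27-28 Feb and 5-6 Mar remain
```

Because the comparison happens per day rather than per shift, a shift that is only partly covered by an absence still contributes its uncovered days.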
How about this:
SELECT WP_START.[id], WP_START.[shift_start], WP_START.[shift_end]
FROM work_pattern AS WP_START
INNER JOIN absences AS A ON WP_START.id = A.id
WHERE WP_START.[shift_start] NOT BETWEEN A.[absence_start] AND A.[absence_end]
UNION
SELECT WP_END.[id], WP_END.[shift_start], WP_END.[shift_end]
FROM work_pattern AS WP_END
INNER JOIN absences AS A ON WP_END.id = A.id
WHERE WP_END.[shift_end] NOT BETWEEN A.[absence_start] AND A.[absence_end]
See it on SQL Fiddle: http://sqlfiddle.com/#!6/49ae6/6
Here is my example that includes a Date Dimension table. If your DBAs won't add it, you can create #dateDim as a temp table, like I've done with SQLFiddle (didn't know I could do that). A typical date dimension would have a lot more details you need about the days, but if the table can't be added, just use what you need. You'll have to populate the other Holidays you need. The DateDim I use often is at https://github.com/shawnoden/SQL_Stuff/blob/master/sql_CreateDateDimension.sql
SQL Fiddle
MS SQL Server 2014 Schema Setup:
/* Tables for your test data. */
CREATE TABLE WorkPatterns ( id int, ShiftStart date, ShiftEnd date ) ;
INSERT INTO WorkPatterns ( id, ShiftStart, ShiftEnd )
VALUES
(123, '20170101', '20171031')
, (124, '20170601', '20170831')
;
CREATE TABLE Absences ( id int, AbsenceStart date, AbsenceEnd date ) ;
INSERT INTO Absences ( id, AbsenceStart, AbsenceEnd )
VALUES
( 123, '20170123', '20170127' )
, ( 123, '20170710', '20170831' )
, ( 124, '20170801', '20170820' )
;
/* ******** MAKE SIMPLE CALENDAR TABLE ******** */
CREATE TABLE dateDim (
theDate DATE NOT NULL
, IsWeekend BIT DEFAULT 0
, IsHoliday BIT DEFAULT 0
, IsWorkDay BIT DEFAULT 0
);
/* Populate basic details of dates. */
INSERT dateDim(theDate, IsWeekend, IsHoliday)
SELECT d
, CONVERT(BIT, CASE WHEN DATEPART(dw,d) IN (1,7) THEN 1 ELSE 0 END)
, CONVERT(BIT, CASE WHEN d = '20170704' THEN 1 ELSE 0 END) /* 4th of July. */
FROM (
SELECT d = DATEADD(DAY, rn - 1, '20170101')
FROM
(
SELECT TOP (DATEDIFF(DAY, '20170101', '20171231'))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y ;
/* If not a weekend or holiday, it's a WorkDay. */
UPDATE dateDim
SET IsWorkDay = CASE WHEN IsWeekend = 0 AND IsHoliday = 0 THEN 1 ELSE 0 END
;
Query For Calculation:
SELECT wp.ID, COUNT(d.theDate) AS workDayCount
FROM WorkPatterns wp
INNER JOIN dateDim d ON d.theDate BETWEEN wp.ShiftStart AND wp.ShiftEnd
AND d.IsWorkDay = 1
LEFT OUTER JOIN Absences a ON d.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
AND wp.ID = a.ID
WHERE a.ID IS NULL
GROUP BY wp.ID
ORDER BY wp.ID
Results:
| ID | workDayCount |
|-----|--------------|
| 123 | 172 | << 216 total days, 44 non-working
| 124 | 51 | << 65 total days, 14 non-working

How do I get a correct average number of appointments per day?

I want to see the average number of appointments per day for each appointment type. Basically I have the following tables and columns:
Table 1 - Dates
-----------
Date date (primary key)
Table 2 - Appointments
-----------
AppointmentStart Datetime
ApptId Numeric
FacilityId Numeric
ApptKind Numeric
Appointmentid Numeric
Table 3 AppointmentType
-----------
ApptTypeId Numeric
Name Varchar
Sample Data
============
Table 1 Date
---------------
date
1/1/2017
1/2/2017
...
Table 2 Appointment
----------------
ApptStart | ApptTypeId | FacilityId | ApptKind | ApptId
2017-1-1 9:00:00 1 2 1 2385525
2017-1-1 9:15:00 3 2 1 2385526
2017-1-1 9:30:00 2 2 1 2385527
...
Table 3 ApptType
-----------------
ApptTypeId | Name
1 Walk-in
2 MAT
3 Acute
...
There are about 30 different appointment types and not all of them occur every day. So far I have created a table that lists every date in the time range that I want then I do a left join with the count of appointments (nulls equal 0). I also remove Saturdays and Sundays. This works really well for one appointment type but when I do this with multiple appointment types zeroes only show up for the days where there are no appointments.
My solution:
Somehow insert each appointment type next to each day then do the left join with the NULL = 0 part although I don't know how to get the list to repeat for each day in the table.
Example:
At the end I want
EndResult
----------
Average(Count(appts)) | ApptType.Name
OR
EndResult
---------
Count(apptid) | ApptType.Name | Date
5 Acute 1/1/2017
0 MAT 1/1/2017
4 Walk-in 1/1/2017
0 Other 1/1/2017
Then repeat for the next day with the same appointment type names
This is how I would write a query that gets you to
End Result #2:
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1, Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
This assumes that ApptTypeID is indeed part of Table2. You can wrap this result up further to get your End Result #1:
SELECT Avg(D.ApptCount * 1.0), D.ApptTypeName /* * 1.0 avoids integer averaging */
FROM (
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1, Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
) AS D
GROUP BY D.ApptTypeName
First we declare and populate table variables for example data.
DECLARE @Dates TABLE (
Date DATE
)
INSERT @Dates
VALUES
('2017-01-01')
,('2017-01-02')
DECLARE @Appointments TABLE (
AppointmentStart DATETIME
,ApptId INT
,FacilityId INT
,ApptKind INT
,Appointmentid INT
)
INSERT @Appointments
VALUES
('2017-01-01 09:00:00.000', 1, 2, 1, 2385525)
,('2017-01-01 09:15:00.000', 3, 2, 1, 2385526)
,('2017-01-01 09:30:00.000', 2, 2, 1, 2385527)
DECLARE @ApptType TABLE (
ApptTypeId INT
,Name VARCHAR(32)
)
INSERT @ApptType
VALUES
(1, 'Walk-in')
,(2, 'MAT')
,(3, 'Acute')
This shows us the cartesian product of a full outer join of Dates and ApptType.
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1
We can use the cartesian product as our left data set, and count the number of items in our right data set (@Appointments). By doing this with a left join, we ensure that every date/appointment type combination is included, even if there were no appointments of that type on that date.
SELECT
A.[Date]
,A.[Name]
,COUNT(B.Appointmentid)
FROM (
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1) AS A
LEFT JOIN @Appointments AS B
ON A.[ApptTypeId] = B.[ApptId]
AND A.[Date] = CAST(B.[AppointmentStart] AS DATE)
GROUP BY
A.[Date]
,A.[Name]
ORDER BY
A.[Date]
,A.[Name]
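As a cross-check of the cartesian-product-plus-left-join approach, here is a minimal sketch in SQLite via Python's sqlite3 (not SQL Server, so CROSS JOIN and date() stand in for the T-SQL constructs). The table and column names follow the sample data in the question, where the appointment table carries ApptTypeId; treat that layout as an assumption. The second query wraps the daily counts to get the per-type average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dates (Date TEXT PRIMARY KEY);
INSERT INTO Dates VALUES ('2017-01-01'), ('2017-01-02');

CREATE TABLE Appointments (
    ApptStart TEXT, ApptTypeId INTEGER, FacilityId INTEGER,
    ApptKind INTEGER, ApptId INTEGER
);
INSERT INTO Appointments VALUES
    ('2017-01-01 09:00:00', 1, 2, 1, 2385525),
    ('2017-01-01 09:15:00', 3, 2, 1, 2385526),
    ('2017-01-01 09:30:00', 2, 2, 1, 2385527);

CREATE TABLE ApptType (ApptTypeId INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO ApptType VALUES (1, 'Walk-in'), (2, 'MAT'), (3, 'Acute');
""")

# Every date paired with every appointment type, then appointments are
# left-joined on, so combinations without appointments count as 0.
DAILY_COUNTS = """
SELECT d.Date, t.Name, COUNT(a.ApptId) AS ApptCount
FROM Dates d
CROSS JOIN ApptType t
LEFT JOIN Appointments a
       ON a.ApptTypeId = t.ApptTypeId
      AND date(a.ApptStart) = d.Date
GROUP BY d.Date, t.ApptTypeId, t.Name
"""

daily = conn.execute(DAILY_COUNTS + " ORDER BY d.Date, t.Name").fetchall()
averages = conn.execute(
    "SELECT Name, AVG(ApptCount) FROM (" + DAILY_COUNTS + ") GROUP BY Name ORDER BY Name"
).fetchall()

for row in daily:
    print(row)
print(averages)
```

With two dates and one appointment of each type on the first day only, each type averages 0.5 per day, which is the number the zero rows are there to protect.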

Ingres SQL, find the max value of one column based on the value of another column

I'm working on an Ingres DB with a script I've inherited from someone else. I need to change the script to pull out the action_times of the latest start_time and end_time event, and also the difference between the two. A sample of the DB is listed below
id_num | version | action_id | action_time
----------------------------------------------------------------------------
1 2 start_time 2014-05-26 14:58:14
1 2 end_time 2014-05-26 14:58:16
1 4 start_time 2014-05-27 10:10:57
1 4 end_time 2014-05-27 10:10:11
So far what I've come up with is:
SELECT max(a.action_time) as BIG, max(b.action_time) as SMALL, max(a.action_time) - max(b.action_time) as DIFF
FROM table1 as a, table1 as b
WHERE a.id_num = '1' AND a.action_id = 'end_time' AND b.id_num = '1' AND b.action_id = 'start_time'
but the results are coming out as follows:
BIG SMALL DIFF
----------------------------------------------------------------------------
2014-05-27 10:10:11 2014-05-27 10:10:57 null
Apologies if a question like this has already been answered (I'm sure it probably has) but I've spent a couple of days looking over various forums and I can't find a similar example, probably how I'm phrasing the search terms. Any help would be much appreciated, I'm pretty sure I would have covered something like this in college but that was a few years ago and my SQL is a bit rusty these days. Thanks in advance!
Edit: So after some research I have come up with the following which will work in the DB GUI:
SELECT ingresdate(varchar(max(a.action_time))) as BIG, ingresdate(varchar(max(b.action_time))) as SMALL, date_part('secs',ingresdate(varchar(max(a.action_time))) - ingresdate(varchar(max(b.action_time)))) as DIFF
FROM table1 as a, table1 as b
WHERE a.id_num = '1' AND a.action_id = 'end_time' AND b.id_num = '1' AND b.action_id = 'start_time'
If you want to calculate the difference between max(a.action_time) and max(b.action_time), you could use the following script:
SELECT max(a.action_time) as BIG, max(b.action_time) as SMALL, DATEDIFF(s, max(a.action_time), max(b.action_time)) as DIFF
FROM table1 as a, table1 as b
WHERE a.id_num = '1' AND a.action_id = 'end_time' AND b.id_num = '1' AND b.action_id = 'start_time'
If you are not familiar with the DATEDIFF() function, I can explain it for you.
P.S.: where is the primary key in your table1?
I would use sub-selects for this. Try :-
select a.action_time as max_end_time, b.action_time as max_start_time,
a.action_time - b.action_time as diff
from table a, table b
where a.action_time = (select max(action_time)
from table where action_id = 'end_time')
and b.action_time = (select max(action_time)
from table where action_id = 'start_time')
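Here is a small sketch of the sub-select idea on the sample rows, run in SQLite via Python's sqlite3 because I don't have Ingres to hand. That swap is an assumption, and strftime('%s', ...) is SQLite's way of getting epoch seconds, not an Ingres function. The table name table1 and the id_num filter are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id_num INTEGER, version INTEGER, action_id TEXT, action_time TEXT);
INSERT INTO table1 VALUES
    (1, 2, 'start_time', '2014-05-26 14:58:14'),
    (1, 2, 'end_time',   '2014-05-26 14:58:16'),
    (1, 4, 'start_time', '2014-05-27 10:10:57'),
    (1, 4, 'end_time',   '2014-05-27 10:10:11');
""")

# One sub-select per event type picks its latest action_time; the outer
# query converts both to epoch seconds and subtracts (end minus start).
row = conn.execute("""
SELECT e.max_end, s.max_start,
       CAST(strftime('%s', e.max_end) AS INTEGER)
     - CAST(strftime('%s', s.max_start) AS INTEGER) AS diff_secs
FROM (SELECT MAX(action_time) AS max_end
      FROM table1
      WHERE id_num = 1 AND action_id = 'end_time') e
CROSS JOIN
     (SELECT MAX(action_time) AS max_start
      FROM table1
      WHERE id_num = 1 AND action_id = 'start_time') s
""").fetchone()
print(row)
```

The difference is negative for this sample because the latest end_time in the data is earlier than the latest start_time.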
Here is my attempt:
SELECT start.action_time, end.action_time,
interval('seconds', end.action_time - start.action_time ) as diff_secs
FROM
(
SELECT action_time
FROM table a
INNER JOIN
( SELECT max(id_num) as max_id_num, max(version) as max_version FROM table
) b on ( id_num = max_id_num and version = max_version )
WHERE a.action_id = 'start_time'
) start
CROSS JOIN
(
SELECT action_time
FROM table a
INNER JOIN
( SELECT max(id_num) as max_id_num, max(version) as max_version FROM table
) b on ( id_num = max_id_num and version = max_version )
WHERE a.action_id = 'end_time'
) end
Using your data I get the following output:
+----------------------+----------------------+-----------+
| action_time | action_time | diff_secs |
+----------------------+----------------------+-----------+
| 27-May-2014 10:10:57 | 27-May-2014 10:10:11 | -46 |
+----------------------+----------------------+-----------+
For reference, here is the script I used to create and populate the test table
CREATE TABLE table
(
id_num integer,
version integer,
action_id char(10),
action_time timestamp
)
INSERT INTO table VALUES (1,2,'start_time', '2014-05-26 14:58:14');
INSERT INTO table VALUES (1,2,'end_time', '2014-05-26 14:58:16');
INSERT INTO table VALUES (1,4,'start_time', '2014-05-27 10:10:57');
INSERT INTO table VALUES (1,4,'end_time', '2014-05-27 10:10:11');

Greatest Date group by TCP address

What I want: I'm having problems with a greatest-n-per-group problem. My group is a set of TCP Addresses and the n is the date at which the table row was inserted into the database.
The problem: I'm currently getting all rows with TCP addresses that match my WHERE clause, rather than the one with the largest date per TCP address.
I'm trying to follow this example and failing: SQL Select only rows with Max Value on a Column.
Here's what my table looks like.
CREATE TABLE IF NOT EXISTS `xactions` (
`id` int(15) NOT NULL AUTO_INCREMENT,
`tcpAddress` varchar(40) NOT NULL,
-- a whole lot of other stuff in between
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=150 ;
Example rows are
ID | tcpAddress | ... | date
1 | 192.168.1.161 | ... | 2012-09-12 14:19:39
2 | 192.168.1.162 | ... | 2012-09-12 14:19:40
3 | 192.168.1.162 | ... | 2012-09-12 14:19:41
4 | 192.168.1.162 | ... | 2012-09-12 14:19:42
SQL statement I'm trying to use
select yt.id, yt.tcpAddress, yt.analog, yt.discrete, yt.counter, yt.date
from xactions yt
inner join(
select id, tcpAddress, analog, discrete, counter, max(date) date
from xactions
WHERE tcpAddress='192.168.1.161' OR tcpAddress='192.168.1.162'
group by date
) ss on yt.id = ss.id and yt.date= ss.date
You need to group by the tcpAddress, not by the date.
And join by the tcpAddress, not the id.
select yt.id, yt.tcpAddress, yt.analog, yt.discrete, yt.counter, yt.date
from xactions yt
inner join (
select tcpAddress, max(date) date
from xactions
where tcpAddress in ('192.168.1.161', '192.168.1.162')
group by tcpAddress
) ss using (tcpAddress, date);
Also, you don't need to select any extra columns in the derived table -- only the tcpAddress and the max(date).
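A quick way to convince yourself that grouping and joining by tcpAddress works is a sketch like this one, which runs the same query shape in SQLite via Python's sqlite3 (MySQL is the real target, so the engine is an assumption, and the elided columns such as analog, discrete and counter are simply left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xactions (
    id INTEGER PRIMARY KEY,
    tcpAddress TEXT NOT NULL,
    date TEXT NOT NULL
);
INSERT INTO xactions VALUES
    (1, '192.168.1.161', '2012-09-12 14:19:39'),
    (2, '192.168.1.162', '2012-09-12 14:19:40'),
    (3, '192.168.1.162', '2012-09-12 14:19:41'),
    (4, '192.168.1.162', '2012-09-12 14:19:42');
""")

# The derived table holds one (tcpAddress, max date) pair per address;
# joining back on both columns recovers the full row for each pair.
latest = conn.execute("""
SELECT yt.id, yt.tcpAddress, yt.date
FROM xactions yt
INNER JOIN (
    SELECT tcpAddress, MAX(date) AS date
    FROM xactions
    WHERE tcpAddress IN ('192.168.1.161', '192.168.1.162')
    GROUP BY tcpAddress
) ss USING (tcpAddress, date)
ORDER BY yt.id
""").fetchall()
print(latest)  # one row per address: ids 1 and 4
```

If two rows for the same address share the same maximum date, both come back, so add a tiebreaker (for example the highest id) if exactly one row per address is required.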
You can also use an option with EXISTS(). Inside the EXISTS() subquery, find MAX(date) for each tcpAddress group
and compare it with the date of the outer row:
SELECT id, tcpAddress, analog, discrete, counter, date
FROM xactions x1
WHERE EXISTS (
SELECT 1
FROM xactions x2
WHERE x1.tcpAddress = x2.tcpAddress
HAVING MAX(x2.date) = x1.date
) AND (tcpAddress='192.168.1.161' OR tcpAddress='192.168.1.162')