Update: I changed one of the times to show that the times per shipment may not always be in sequential order.
Here is my input:
create table test
(
shipment_id int,
stop_seq tinyint,
time datetime
)
insert into test values (1,1,'2009-8-10 8:00:00')
insert into test values (1,2,'2009-8-10 9:00:00')
insert into test values (1,3,'2009-8-10 10:00:00')
insert into test values (2,1,'2009-8-10 13:00:00')
insert into test values (2,2,'2009-8-10 14:00:00')
insert into test values (2,3,'2009-8-10 20:00:00')
insert into test values (2,4,'2009-8-10 18:00:00')
The output that I want is below:
shipment_id start end
----------- ----- ---
1 8:00 10:00
2 13:00 18:00
I need to take the time from the MIN(stop_seq) row for each shipment and the time from the MAX(stop_seq) row, and place them in start and end respectively. I know this can be done rather easily with multiple queries, but I am looking to see if a single SELECT query can do it.
Thanks!
I think the only way you'll be able to do it is with sub-queries.
SELECT shipment_id
, (SELECT TOP 1 time
FROM test AS [b]
WHERE b.shipment_id = a.shipment_id
AND b.stop_seq = MIN(a.stop_seq)) AS [start]
, (SELECT TOP 1 time
FROM test AS [b]
WHERE b.shipment_id = a.shipment_id
AND b.stop_seq = MAX(a.stop_seq)) AS [end]
FROM test AS [a]
GROUP BY shipment_id
You'll need to use the DATEPART function to chop up the time column to get your exact output.
Use a Common Table Expression (CTE) - this works (at least on my SQL Server 2008 test system):
WITH SeqMinMax(SeqID, MinID, MaxID) AS
(
SELECT Shipment_ID, MIN(stop_seq), MAX(stop_seq)
FROM test
GROUP BY Shipment_ID
)
SELECT
SeqID 'Shipment_ID',
(SELECT TIME FROM test
WHERE shipment_id = smm.seqid AND stop_seq = smm.minid) 'Start',
(SELECT TIME FROM test
WHERE shipment_id = smm.seqid AND stop_seq = smm.maxid) 'End'
FROM seqminmax smm
The SeqMinMax CTE selects the min and max "stop_seq" values for each "shipment_id", and the rest of the query then builds on those values to retrieve the associated times from the table "test".
CTEs are supported from SQL Server 2005 onward (and are a SQL:1999 standard feature, not really a Microsoft "invention").
Marc
Am I correct in thinking that you want the first time rather than the 'min' time, and the last time in the sequence rather than the 'max' time?
SELECT C.shipment_id, C.start, B2.time AS stop FROM
(
SELECT A.shipment_id, B1.time AS start, A.max_stop_seq FROM
(
SELECT shipment_id, MIN(stop_seq) as min_stop_seq, MAX(stop_seq) as max_stop_seq
FROM test
GROUP BY shipment_id
) AS A
INNER JOIN
(
SELECT shipment_id, stop_seq, time FROM test
) AS B1
ON A.shipment_id = B1.shipment_id AND A.min_stop_seq = B1.stop_seq
) AS C
INNER JOIN
(
SELECT shipment_id, stop_seq, time FROM test
) AS B2
ON C.shipment_id = B2.shipment_id AND C.max_stop_seq = B2.stop_seq
select t1.shipment_id, t1.time start, t2.time [end]
from (
select shipment_id, min(stop_seq) min, max(stop_seq) max
from test
group by shipment_id
) a
inner join test t1 on a.shipment_id = t1.shipment_id and a.min = t1.stop_seq
inner join test t2 on a.shipment_id = t2.shipment_id and a.max = t2.stop_seq
I suggest you take advantage of row_number and PIVOT. This may look messy, but I think it will perform well, and it's more adaptable to various assumptions. For example, it doesn't assume that the latest datetime value corresponds to the largest stop_seq value for a given shipment.
with test_ranked(shipment_id,stop_seq,time,rankup,rankdown) as (
select
shipment_id, stop_seq, time,
row_number() over (
partition by shipment_id
order by stop_seq
),
row_number() over (
partition by shipment_id
order by stop_seq desc
)
from test
), test_extreme_times(shipment_id,tag,time) as (
select
shipment_id, 'start', time
from test_ranked where rankup = 1
union all
select
shipment_id, 'end', time
from test_ranked where rankdown = 1
)
select
shipment_id, [start], [end]
from test_extreme_times
pivot (max(time) for tag in ([start],[end])) P
order by shipment_id;
go
The PIVOT isn't really needed, but it's handy. However, do note that the MAX inside the PIVOT expression doesn't do anything useful. There's only one [time] value for each tag, so MIN would work just as well. The syntax requires an aggregate function in this position.
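To try the ROW_NUMBER idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions). SQLite has no PIVOT, so conditional aggregation (MAX(CASE ...)) stands in for it, and strftime trims each datetime to HH:MM. The helper name first_and_last_stop_times is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (shipment_id, stop_seq, time).
# Shipment 2's latest time (20:00) is on stop 3, but its last stop is stop 4.
ROWS = [
    (1, 1, "2009-08-10 08:00:00"),
    (1, 2, "2009-08-10 09:00:00"),
    (1, 3, "2009-08-10 10:00:00"),
    (2, 1, "2009-08-10 13:00:00"),
    (2, 2, "2009-08-10 14:00:00"),
    (2, 3, "2009-08-10 20:00:00"),
    (2, 4, "2009-08-10 18:00:00"),
]

QUERY = """
WITH test_ranked AS (
    SELECT shipment_id, time,
           ROW_NUMBER() OVER (PARTITION BY shipment_id ORDER BY stop_seq)      AS rankup,
           ROW_NUMBER() OVER (PARTITION BY shipment_id ORDER BY stop_seq DESC) AS rankdown
    FROM test
)
SELECT shipment_id,
       -- Conditional aggregation replaces PIVOT: each CASE keeps one row per shipment.
       MAX(CASE WHEN rankup = 1 THEN strftime('%H:%M', time) END)   AS start_time,
       MAX(CASE WHEN rankdown = 1 THEN strftime('%H:%M', time) END) AS end_time
FROM test_ranked
GROUP BY shipment_id
ORDER BY shipment_id
"""


def first_and_last_stop_times(rows):
    """Return (shipment_id, start, end) taken from the lowest and highest stop_seq."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (shipment_id INTEGER, stop_seq INTEGER, time TEXT)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(first_and_last_stop_times(ROWS))
```

Shipment 2 comes back with an end time of 18:00, the time of its highest stop_seq, even though 20:00 is its latest time.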
Addendum: Here's an adaptation of CptSkippy's solution that may be more efficient than using MIN and MAX if you have a shipments table:
SELECT shipment_id
, (SELECT TOP 1 time
FROM test AS [b]
WHERE b.shipment_id = a.shipment_id
ORDER BY stop_seq ASC) AS [start]
, (SELECT TOP 1 time
FROM test AS [b]
WHERE b.shipment_id = a.shipment_id
ORDER BY stop_seq DESC) AS [end]
FROM shipments_table AS [a];
Related
The problem: a simplified membership table containing membership id, starting date for each membership and membership level description:
CREATE TABLE cover
(
[membership_id] int,
[cover_from_date] date,
[description] varchar(57)
);
INSERT INTO cover ([membership_id], [cover_from_date], [description])
VALUES (1, '1/1/2011', 'AA'),
(1, '1/2/2011', 'BB'),
(1, '1/3/2011', 'CC'),
(1, '1/4/2011', 'CC');
The task: to list the current membership and the immediately previous membership that differs from the current one. So from the above table I would like to see something like:
1, 1/4/2011, CC, 1/2/2011, BB
The attempted solution: I have managed to come up with a solution, but it takes an enormous amount of time to run on a large database and I'm sure there are better ways of solving this problem. My no-doubt overcomplicated query is as follows:
with cte as
(
select
cover.membership_id, cover.cover_from_date,
cover.description,
row_number() over (partition by cover.membership_id order by cover.cover_from_date desc) AS version_no
from
cover
)
select
cte.membership_id,
cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
from
cte
left outer join
cte cover_now on cte.membership_id = cover_now.membership_id
and cover_now.version_no = 1
left outer join
cte cover_prev on cte.membership_id = cover_prev.membership_id
and cover_prev.version_no = (select min(x.version_no)
from cte x
where x.version_no >= 2
and x.membership_id = cover_now.membership_id
and x.description <> cover_now.description)
group by
cte.membership_id, cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
The entire fiddle is located here. Any tips on how to optimise the query would be appreciated.
First create an index on membership_id and cover_from_date in descending order. It will be heavily used by this query.
create index cover_by_date on cover (membership_id asc, cover_from_date desc)
Then:
select
membership.membership_id,
membership.cover_from_date,
membership.description,
previous_membership.cover_from_date,
previous_membership.description
from
(
select membership_id, description, cover_from_date, row_number() over (partition by membership_id order by cover_from_date desc) as rank
from cover
) as membership
left join (
select previous.membership_id, previous.description, previous.cover_from_date, row_number() over (partition by previous.membership_id order by previous.cover_from_date desc) as rank
from cover
join cover as previous on
cover.membership_id = previous.membership_id and
cover.description <> previous.description and
cover.cover_from_date > previous.cover_from_date
) as previous_membership on
previous_membership.membership_id = membership.membership_id and
previous_membership.rank = 1
where
membership.rank = 1
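A minimal sketch that runs the same query shape against SQLite through Python's sqlite3 module, assuming SQLite 3.25+ for ROW_NUMBER (the alias rank is renamed rnk). The ISO dates are an assumption: the question's '1/1/2011'-style values are ambiguous, but either day/month reading gives the same order. Membership 2 is a made-up extra row that shows the LEFT JOIN returning NULLs when there is no earlier, different membership; current_and_previous is a hypothetical helper name.

```python
import sqlite3

# (membership_id, cover_from_date, description); dates rewritten as ISO strings.
# Membership 2 is a hypothetical extra row with no earlier, different membership.
ROWS = [
    (1, "2011-01-01", "AA"),
    (1, "2011-02-01", "BB"),
    (1, "2011-03-01", "CC"),
    (1, "2011-04-01", "CC"),
    (2, "2011-01-01", "AA"),
]

QUERY = """
SELECT m.membership_id, m.cover_from_date, m.description,
       p.cover_from_date, p.description
FROM (
    -- rnk = 1 marks each membership's most recent row
    SELECT membership_id, description, cover_from_date,
           ROW_NUMBER() OVER (PARTITION BY membership_id
                              ORDER BY cover_from_date DESC) AS rnk
    FROM cover
) AS m
LEFT JOIN (
    -- rows that are older than, and differ from, some later row;
    -- rnk = 1 picks the most recent such row per membership
    SELECT prev.membership_id, prev.description, prev.cover_from_date,
           ROW_NUMBER() OVER (PARTITION BY prev.membership_id
                              ORDER BY prev.cover_from_date DESC) AS rnk
    FROM cover
    JOIN cover AS prev
      ON cover.membership_id = prev.membership_id
     AND cover.description <> prev.description
     AND cover.cover_from_date > prev.cover_from_date
) AS p
  ON p.membership_id = m.membership_id AND p.rnk = 1
WHERE m.rnk = 1
ORDER BY m.membership_id
"""


def current_and_previous(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cover (membership_id INTEGER, cover_from_date TEXT, description TEXT)")
        # Same index as suggested above: membership_id ascending, date descending.
        conn.execute("CREATE INDEX cover_by_date ON cover (membership_id ASC, cover_from_date DESC)")
        conn.executemany("INSERT INTO cover VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(current_and_previous(ROWS))
```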
We have a number of bookings, and one of the requirements is that we display the Final Destination for a booking based on its segments. Our business has defined the Final Destination as the one where we have the longest stay, and the Origin as the first departure point.
Please note that this is not the segment with the longest travel time, i.e. DATEDIFF(minute, DepartDate, ArrivalDate). This is requesting the one with the longest gap between segments.
This is a simplified version of the tables:
Create Table Segments
(
BookingID int,
SegNum int,
DepartureCity varchar(100),
DepartDate datetime,
ArrivalCity varchar(100),
ArrivalDate datetime
);
Create Table Bookings
(
BookingID int identity(1,1),
Locator varchar(10)
);
Insert into Segments values (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00')
Insert into Segments values (1,4,'FIH','2010-03-13 21:50:00','BRU', '2010-03-14 07:25:00')
Insert into Segments values (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00')
Insert into Segments values (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00')
Insert into Segments values (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00')
Insert into Segments values (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00')
insert into Segments values (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00')
Insert into Segments Values (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00')
insert into segments values (4,2,'BRU','2010-07-27','ADD','2010-07-28')
insert into segments values (4,4,'ADD','2010-07-28','LUN','2010-07-28')
insert into segments values (4,5,'LUN','2010-08-23','ADD','2010-08-23')
insert into segments values (4,6,'ADD','2010-08-23','BRU','2010-08-24')
Insert into Bookings values('5MVL7J')
Insert into Bookings values ('Y2IMXQ')
insert into bookings values ('YCBL5C')
Insert into bookings values ('X7THJ6')
I have created a SQL Fiddle with real data here:
SQL Fiddle Example
I have tried to do the following, however this doesn't appear to be correct.
SELECT Locator, fd.*
FROM Bookings ob
OUTER APPLY
(
SELECT Top 1 DepartureCity, ArrivalCity
from
(
SELECT DISTINCT
seg.segnum ,
seg.DepartureCity ,
seg.DepartDate ,
seg.ArrivalCity ,
seg.ArrivalDate,
(SELECT
DISTINCT
DATEDIFF(MINUTE , seg.ArrivalDate , s2.DepartDate)
FROM Segments s2
WHERE s2.BookingID = seg.BookingID AND s2.segnum = seg.segnum + 1) 'LengthOfStay'
FROM Bookings b(NOLOCK)
INNER JOIN Segments seg (NOLOCK) ON seg.bookingid = b.bookingid
WHERE b.Locator = ob.locator
) a
Order by a.lengthofstay desc
)
FD
The results I expect are:
Locator Origin Destination
5MVL7J BRU FIH
Y2IMXQ BOD EBB
YCBL5C BRU IAD
X7THJ6 BRU LUN
I get the feeling that a CTE would be the best approach; however, my attempts to do this have so far failed miserably. Any help would be greatly appreciated.
I have managed to get the following query working, but it only works for one booking at a time because of the TOP 1, and I'm not sure how to tweak it:
WITH CTE AS
(
SELECT distinct s.DepartureCity, s.DepartDate, s.ArrivalCity, s.ArrivalDate, b.Locator , ROW_NUMBER() OVER (PARTITION BY b.Locator ORDER BY SegNum ASC) RN
FROM Segments s
JOIN bookings b ON s.bookingid = b.BookingID
)
SELECT C.Locator, c.DepartureCity, a.ArrivalCity
FROM
(
SELECT TOP 1 C.Locator, c.ArrivalCity, c1.DepartureCity, DATEDIFF(MINUTE,c.ArrivalDate, c1.DepartDate) 'ddiff'
FROM CTE c
JOIN cte c1 ON c1.Locator = C.Locator AND c1.rn = c.rn + 1
ORDER BY ddiff DESC
) a
JOIN CTE c ON C.Locator = a.Locator
WHERE c.rn = 1
You can try something like this:
;WITH CTE_Start AS
(
--Ordering of segments to eliminate gaps
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY SegNum) RN
FROM dbo.Segments
)
, RCTE_Stay AS
(
--recursive CTE to calculate stay between segments
SELECT *, 0 AS Stay FROM CTE_Start s WHERE RN = 1
UNION ALL
SELECT sNext.*, DATEDIFF(Mi, s.ArrivalDate, sNext.DepartDate)
FROM CTE_Start sNext
INNER JOIN RCTE_Stay s ON s.RN + 1 = sNext.RN AND s.BookingID = sNext.BookingID
)
, CTE_Final AS
(
--Search for max(stay) for each bookingID
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS RN_Stay
FROM RCTE_Stay
)
--join Start and Final on RN = 1 to find origin and destination
SELECT b.Locator, s.DepartureCity AS Origin, f.DepartureCity AS Destination
FROM CTE_Final f
INNER JOIN CTE_Start s ON f.BookingID = s.BookingID
INNER JOIN dbo.Bookings b ON b.BookingID = f.BookingID
WHERE s.RN = 1 AND f.RN_Stay = 1
SQLFiddle DEMO
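On engines that support LEAD (SQL Server 2012+, or SQLite 3.25+), the gap to the next segment can be computed without recursion. This is a swapped-in technique, not the recursive CTE above. The sketch below runs it against the question's sample data with Python's sqlite3 module, measuring stays as julianday differences (in days) instead of DATEDIFF, and takes the Destination as the ArrivalCity of the segment that precedes the longest gap. The helper name final_destinations is hypothetical.

```python
import sqlite3

SEGMENTS = [
    (1, 2, "BRU", "2010-03-06 10:40", "FIH", "2010-03-06 20:20:00"),
    (1, 4, "FIH", "2010-03-13 21:50:00", "BRU", "2010-03-14 07:25:00"),
    (2, 2, "BOD", "2010-02-10 06:50:00", "AMS", "2010-02-10 08:50:00"),
    (2, 3, "AMS", "2010-02-10 10:40:00", "EBB", "2010-02-10 20:40:00"),
    (2, 4, "EBB", "2010-02-28 22:55:00", "AMS", "2010-03-01 05:35:00"),
    (2, 5, "AMS", "2010-03-01 10:25:00", "BOD", "2010-03-01 12:15:00"),
    (3, 2, "BRU", "2010-03-09 12:10:00", "IAD", "2010-03-09 14:46:00"),
    (3, 3, "IAD", "2010-03-13 17:57:00", "BRU", "2010-03-14 07:15:00"),
    (4, 2, "BRU", "2010-07-27", "ADD", "2010-07-28"),
    (4, 4, "ADD", "2010-07-28", "LUN", "2010-07-28"),
    (4, 5, "LUN", "2010-08-23", "ADD", "2010-08-23"),
    (4, 6, "ADD", "2010-08-23", "BRU", "2010-08-24"),
]
BOOKINGS = [(1, "5MVL7J"), (2, "Y2IMXQ"), (3, "YCBL5C"), (4, "X7THJ6")]

QUERY = """
WITH ordered AS (
    SELECT BookingID, ArrivalCity, ArrivalDate,
           FIRST_VALUE(DepartureCity) OVER (PARTITION BY BookingID ORDER BY SegNum) AS Origin,
           LEAD(DepartDate) OVER (PARTITION BY BookingID ORDER BY SegNum) AS NextDepart
    FROM Segments
), ranked AS (
    SELECT BookingID, Origin, ArrivalCity,
           -- stay = next departure minus this arrival; NULL for the last segment,
           -- and SQLite sorts NULLs first, so they end up last under DESC
           ROW_NUMBER() OVER (PARTITION BY BookingID
                              ORDER BY julianday(NextDepart) - julianday(ArrivalDate) DESC) AS rn
    FROM ordered
)
SELECT b.Locator, r.Origin, r.ArrivalCity AS Destination
FROM ranked AS r
JOIN Bookings AS b ON b.BookingID = r.BookingID
WHERE r.rn = 1
ORDER BY b.BookingID
"""


def final_destinations(segments, bookings):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Segments (BookingID INTEGER, SegNum INTEGER, DepartureCity TEXT,"
            " DepartDate TEXT, ArrivalCity TEXT, ArrivalDate TEXT)"
        )
        conn.execute("CREATE TABLE Bookings (BookingID INTEGER PRIMARY KEY, Locator TEXT)")
        conn.executemany("INSERT INTO Segments VALUES (?, ?, ?, ?, ?, ?)", segments)
        conn.executemany("INSERT INTO Bookings VALUES (?, ?)", bookings)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for locator, origin, destination in final_destinations(SEGMENTS, BOOKINGS):
    print(locator, origin, destination)
```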
You can use OUTER APPLY with TOP to find the values of the next segment (by SegNum). Once the gap between segments is known, the MIN/MAX aggregate functions with an OVER clause are used as conditions in the CASE expressions:
;WITH cte AS
(
SELECT seg.BookingID,
CASE WHEN MIN(seg.segNum) OVER(PARTITION BY seg.BookingID) = seg.segNum
THEN seg.DepartureCity END AS Origin,
CASE WHEN MAX(DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)) OVER(PARTITION BY seg.BookingID)
= DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)
THEN o.DepartureCity END AS Destination
FROM Segments seg (NOLOCK)
OUTER APPLY (
SELECT TOP 1 seg2.DepartDate, seg2.DepartureCity
FROM Segments seg2
WHERE seg.BookingID = seg2.BookingID
AND seg.SegNum < seg2.SegNum
ORDER BY seg2.SegNum ASC
) o
)
SELECT b.Locator, MAX(c.Origin) AS Origin, MAX(c.Destination) AS Destination
FROM cte c JOIN Bookings b ON c.BookingID = b.BookingID
GROUP BY b.Locator
See demo on SQLFiddle
The statement below:
;WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY BookingID ORDER BY DATEDIFF(SS,DepartDate,ArrivalDate) DESC) AS Row
,Segments.BookingID
,Segments.SegNum
,Segments.DepartureCity
,Segments.DepartDate
,Segments.ArrivalCity
,Segments.ArrivalDate
,DATEDIFF(SS,DepartDate,ArrivalDate) AS DiffInSeconds
FROM Segments
)
SELECT *
FROM DataSource DS
INNER JOIN Bookings B
ON DS.[BookingID] = B.[BookingID]
Will give the following output:
So, adding the following clause to the above statement:
WHERE Row = 1
will give you what you need.
A few important things:
As you can see from the screenshot below, there are two records with the same difference in seconds. If you want to show both of them (or all tied records, if there are more), use the RANK function instead of ROW_NUMBER.
The return type of DATEDIFF is INT, so there is a limit on the maximum difference in seconds. From the documentation:
If the return value is out of range for int (-2,147,483,648 to +2,147,483,647), an error is returned. For millisecond, the maximum difference between startdate and enddate is 24 days, 20 hours, 31 minutes and 23.647 seconds. For second, the maximum difference is 68 years.
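The quoted limits follow directly from the int range. A quick Python check of the arithmetic, assuming a 365.25-day year for the "years" figure:

```python
from datetime import timedelta

INT_MAX = 2**31 - 1  # largest value of a 32-bit signed int, DATEDIFF's return type

# Longest span DATEDIFF(millisecond, ...) can report
max_ms_span = timedelta(milliseconds=INT_MAX)

# Longest span DATEDIFF(second, ...) can report, in (approximate) years
max_s_span_years = INT_MAX / (365.25 * 24 * 60 * 60)

print(max_ms_span)                 # 24 days, 20:31:23.647000
print(round(max_s_span_years, 1))  # 68.0
```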
I have these two tables:
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
The TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table, but when an item gets queued more than once, how do I calculate that?
Assuming TransIDs are removed in the same order they were added, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a row number based on the transaction ID and the date it was entered. You can then join on both the row number and TransID to avoid any cross joining.
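A minimal sketch of the same row-number pairing, run through Python's sqlite3 module (SQLite 3.25+ assumed for ROW_NUMBER). The dates are the question's values rewritten as ISO strings, assuming day/month/year, and julianday subtraction replaces DATEDIFF. Unlike the T-SQL version, an item still in the queue gets NULL rather than a duration up to CURRENT_TIMESTAMP, which keeps the output deterministic. The helper name queue_durations is hypothetical.

```python
import sqlite3

ADDS = [(10, "2012-10-10"), (11, "2012-10-14"), (11, "2012-11-18"),
        (11, "2012-12-25"), (12, "2013-01-01")]
REMOVES = [(10, "2013-01-15"), (11, "2012-12-12"), (11, "2013-01-13"), (11, "2013-01-20")]

QUERY = """
WITH OrderedAdds AS (
    SELECT TransID, AddDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY AddDate) AS RowNumber
    FROM AddTable
), OrderedRemoves AS (
    SELECT TransID, RemovedDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY RemovedDate) AS RowNumber
    FROM RemoveTable
)
SELECT a.TransID, a.AddDate, r.RemovedDate,
       -- NULL while the item is still queued (no matching remove yet)
       CAST(julianday(r.RemovedDate) - julianday(a.AddDate) AS INTEGER) AS DaysInQueue
FROM OrderedAdds AS a
LEFT JOIN OrderedRemoves AS r
       ON a.TransID = r.TransID AND a.RowNumber = r.RowNumber
ORDER BY a.TransID, a.AddDate
"""


def queue_durations(adds, removes):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT)")
        conn.execute("CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT)")
        conn.executemany("INSERT INTO AddTable VALUES (?, ?)", adds)
        conn.executemany("INSERT INTO RemoveTable VALUES (?, ?)", removes)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in queue_durations(ADDS, REMOVES):
    print(row)
```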
Example on SQL Fiddle
DISCLAIMER: There is probably a problem with this, but I hope to send you in one possible direction. Expect problems.
You can try something in the following direction (which might work in some form depending on your system, version, etc.):
SELECT transId, (SUM(remove_date_sum) - SUM(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS add_date_sum, 0 AS remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 AS add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) AS remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS sums
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check whether UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum in each table, build a UNION that conveniently leaves the other column as zero, and subtract in the outer query.
As for the division at the end of the first SELECT: UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days, or by whatever gives the unit you want.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
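The summing trick works because a total of differences equals a difference of totals, so it only needs every add to have a matching remove, not any particular pairing. A small Python check with TransID 11's dates (read as day/month/year), using date.toordinal() as a whole-day stand-in for UNIX_TIMESTAMP. The function names are illustrative only.

```python
from datetime import date

adds = [date(2012, 10, 14), date(2012, 11, 18), date(2012, 12, 25)]
removes = [date(2012, 12, 12), date(2013, 1, 13), date(2013, 1, 20)]


def total_days_by_sums(adds, removes):
    # Sum of removal "timestamps" minus sum of add "timestamps", in days.
    if len(adds) != len(removes):
        raise ValueError("the trick only holds when every add has a matching remove")
    return sum(d.toordinal() for d in removes) - sum(d.toordinal() for d in adds)


def total_days_pairwise(adds, removes):
    # Pair the n-th add with the n-th remove and add up the individual stays.
    return sum((r - a).days for a, r in zip(sorted(adds), sorted(removes)))


print(total_days_by_sums(adds, removes))   # 141
print(total_days_pairwise(adds, removes))  # 141
```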
Answer 2: after your comments. (As a side note, some of your dates, such as 15/1/2013 and 13/1/2013, are not in a proper date format.)
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments
This assumes that a new record won't be added unless the previous one has been removed. Also note that the following query returns numberOfDays as zero for records that haven't been removed:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate
I have a bit of a weird question, given to me by a client.
He has a list of data, with a date between parentheses like so:
Foo (14/08/2012)
Bar (15/08/2012)
Bar (16/09/2012)
Xyz (20/10/2012)
However, he wants the list to be displayed as follows:
Foo (14/08/2012)
Bar (16/09/2012)
Bar (15/08/2012)
Xyz (20/10/2012)
(notice that the second Bar has moved up one position)
So, the logic behind it is that the list has to be sorted by date ascending, EXCEPT when two rows have the same name ('Bar'). If they have the same name, they must be sorted with the LATEST date at the top, while staying within the overall sort order.
Is this even remotely possible? I've experimented with a lot of ORDER BY clauses, but couldn't find the right one. Does anyone have an idea?
I should have specified that this data comes from a table in a SQL Server database (the name and the date are in two different columns). So I'm looking for a SQL query that can do the sorting I want.
(I've dumbed this example down quite a bit, so if you need more context, don't hesitate to ask)
This works, I think
declare @t table (data varchar(50), date datetime)
insert @t
values
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
select t.*
from @t t
inner join (select data, COUNT(*) cg, MAX(date) as mg from @t group by data) tc
on t.data = tc.data
order by case when cg>1 then mg else date end, date desc
produces
data date
---------- -----------------------
Foo 2012-08-14 00:00:00.000
Bar 2012-09-16 00:00:00.000
Bar 2012-08-15 00:00:00.000
Xyz 2012-10-20 00:00:00.000
A way with better performance than any of the other posted answers is to do it entirely in the ORDER BY, without a JOIN or a CTE:
DECLARE @t TABLE (myData varchar(50), myDate datetime)
INSERT INTO @t VALUES
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
SELECT *
FROM @t t1
ORDER BY (SELECT MIN(t2.myDate) FROM @t t2 WHERE t2.myData = t1.myData), t1.myDate DESC
This does exactly what you request and will work with any indexes and much better with larger amounts of data than any of the other answers.
Additionally it's much more clear what you're actually trying to do here, rather than masking the real logic with the complexity of a join and checking the count of joined items.
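The ORDER BY above sorts on two keys: the earliest date recorded for the row's name, then the row's own date descending. A small Python sketch of that sort key on the question's sample data, with a dict standing in for the correlated MIN subquery:

```python
from datetime import date

rows = [
    ("Foo", date(2012, 8, 14)),
    ("Bar", date(2012, 8, 15)),
    ("Bar", date(2012, 9, 16)),
    ("Xyz", date(2012, 10, 20)),
]

# Earliest date per name: the SELECT MIN(t2.myDate) ... WHERE t2.myData = t1.myData part.
first_seen = {}
for name, d in rows:
    first_seen[name] = min(d, first_seen.get(name, d))

# Primary key: the name's earliest date; secondary key: the row's own date, descending.
ordered = sorted(rows, key=lambda r: (first_seen[r[0]], -r[1].toordinal()))

for name, d in ordered:
    print(name, d)
```

With this key, every row sharing a name lands in one block positioned at that name's earliest date, even if rows for other names fall between them chronologically.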
This one uses analytic functions to perform the sort, and it only requires one SELECT from your table.
The inner query finds gaps, where the name changes. These gaps are used to identify groups in the next query, and the outer query does the final sorting by these groups.
I have tried it here (SQL Fiddle) with extended test-data.
SELECT name, dat
FROM (
SELECT name, dat, SUM(gap) over(ORDER BY dat, name) AS grp
FROM (
SELECT name, dat,
CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
FROM t
) x
) y
ORDER BY grp, dat DESC
Extended test-data
('Bar','2012-08-12'),
('Bar','2012-08-11'),
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-08-16'),
('Bar','2012-09-17'),
('Xyz','2012-10-20')
Result
Bar 2012-08-12
Bar 2012-08-11
Foo 2012-08-14
Bar 2012-09-17
Bar 2012-08-16
Bar 2012-08-15
Xyz 2012-10-20
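A runnable version of the gaps-and-islands query above, using Python's sqlite3 module with the extended test data. SQLite 3.25+ is assumed for LAG and the windowed SUM; the table t and its columns name and dat come from the answer, and island_sort is a hypothetical helper name.

```python
import sqlite3

EXTENDED = [
    ("Bar", "2012-08-12"),
    ("Bar", "2012-08-11"),
    ("Foo", "2012-08-14"),
    ("Bar", "2012-08-15"),
    ("Bar", "2012-08-16"),
    ("Bar", "2012-09-17"),
    ("Xyz", "2012-10-20"),
]

QUERY = """
SELECT name, dat
FROM (
    -- running total of the gap flags: one group number per run of the same name
    SELECT name, dat, SUM(gap) OVER (ORDER BY dat, name) AS grp
    FROM (
        -- gap = 1 whenever the name differs from the previous row's name
        SELECT name, dat,
               CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
        FROM t
    ) AS x
) AS y
ORDER BY grp, dat DESC
"""


def island_sort(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (name TEXT, dat TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for name, dat in island_sort(EXTENDED):
    print(name, dat)
```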
I think that this works, including the case I asked about in the comments:
declare @t table (data varchar(50), [date] datetime)
insert @t
values
('Foo','20120814'),
('Bar','20120815'),
('Bar','20120916'),
('Xyz','20121020')
; With OuterSort as (
select *,ROW_NUMBER() OVER (ORDER BY [date] asc) as rn from @t
)
--Now we need to find contiguous ranges of the same data value, and the min and max row number for such a range
, Islands as (
select data,rn as rnMin,rn as rnMax from OuterSort os where not exists (select * from OuterSort os2 where os2.data = os.data and os2.rn = os.rn - 1)
union all
select i.data,rnMin,os.rn
from
Islands i
inner join
OuterSort os
on
i.data = os.data and
i.rnMax = os.rn-1
), FullIslands as (
select
data,rnMin,MAX(rnMax) as rnMax
from Islands
group by data,rnMin
)
select
*
from
OuterSort os
inner join
FullIslands fi
on
os.rn between fi.rnMin and fi.rnMax
order by
fi.rnMin asc,os.rn desc
It works by first computing the initial ordering in the OuterSort CTE. Then, using two CTEs (Islands and FullIslands), we compute the parts of that ordering in which the same data value appears in adjacent rows. Having done that, we can compute the final ordering by any value that all adjacent values will have (such as the lowest row number of the "island" that they belong to), and then within an "island", we use the reverse of the originally computed sort order.
Note, though, that this may not be very efficient for large data sets. On the sample data, the plan shows 4 table scans of the base table, as well as a spool.
Try something like...
ORDER BY CASE date
WHEN '14/08/2012' THEN 1
WHEN '16/09/2012' THEN 2
WHEN '15/08/2012' THEN 3
WHEN '20/10/2012' THEN 4
END
In MySQL, you can do:
ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
In Postgres, you can create a function FIELD and do:
CREATE OR REPLACE FUNCTION field(anyelement, anyarray) RETURNS integer AS $$
SELECT
COALESCE((SELECT i
FROM generate_series(1, array_upper($2, 1)) gs(i)
WHERE $2[i] = $1),
0);
$$ LANGUAGE SQL STABLE;
If you do not want to use CASE, you can try to find an implementation of the FIELD function for SQL Server.
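For reference, MySQL's FIELD(str, str1, str2, ...) returns the 1-based position of str in the list, or 0 when it is absent, so ordering by it puts rows in the listed order. A plain Python sketch of that behaviour; the function field here is illustrative, not a library call:

```python
def field(value, *candidates):
    """Return the 1-based position of value among candidates, or 0 if absent."""
    for position, candidate in enumerate(candidates, start=1):
        if candidate == value:
            return position
    return 0


order = ("14/08/2012", "16/09/2012", "15/08/2012", "20/10/2012")
dates = ["14/08/2012", "15/08/2012", "16/09/2012", "20/10/2012"]

# Equivalent of ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
print(sorted(dates, key=lambda d: field(d, *order)))
```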
I am new to SQL queries.
The table is defined as:
( symbol varchar,
high int,
low int,
today date,
Primary key (symbol, today)
)
For each symbol in a given date range, I need to find max(high) and min(low), along with the corresponding dates for max(high) and min(low).
It's okay to get just the first max date and min date when a value occurs on more than one date.
In a given date range some dates may be missing. If the start date is not present, the next available date should be used, and if the end date is not present, the previous available date should be used.
Data is for one year and around 5000 symbols.
I tried something like this:
SELECT a.symbol,
a.maxValue,
a.maxdate,
b.minValue,
b.mindate
FROM (
SELECT table1.symbol, max_a.maxValue, max_a.maxdate
FROM table1
INNER JOIN (
SELECT table1.symbol,
max(table1.high) AS maxValue,
table1.TODAY AS maxdate
FROM table1
GROUP BY table1.symbol
) AS max_a
ON max_a.symbol = table1.symbol
AND table1.today = max_a.maxdate
) AS a
INNER JOIN (
SELECT symbol,
min_b.minValue,
min_b.mindate
FROM table1
INNER JOIN (
SELECT symbol,
min(low) AS minValue,
table1.TODAY AS mindate
FROM table1
GROUP BY testnsebav.symbol
) AS min_b
ON min_b.symbol = table1.symbol
AND table1.today = min_b.mindate
) AS b
ON a.symbol = b.symbol
The first INNER query pre-qualifies, for each symbol, what the low and high values are within the date range provided. After that, it joins back to the original table (with the same date range criteria), but also adds the qualifier that EITHER the low OR the high matches the MIN() or MAX() from the PreQuery. If so, the row is included in the result set.
Now, the result columns. Not knowing which SQL version you were using, I have made the first 3 columns the "final" values. The 3 columns after that come from the record that qualified by EITHER of the qualifiers. As stocks go up and down all the time, it's possible for the high and/or low values to occur more than once within the same time period. This will include ALL the entries that qualify for the MIN() / MAX() criteria.
select
PreQuery.Symbol,
PreQuery.LowForSymbol,
PreQuery.HighForSymbol,
tFinal.Today as DateOfMatch,
tFinal.Low as DateMatchLow,
tFinal.High as DateMatchHigh
from
( select
t1.symbol,
min( t1.low ) as LowForSymbol,
max( t1.high ) as HighForSymbol
from
table1 t1
where
t1.today between YourFromDateParameter and YourToDateParameter
group by
t1.symbol ) PreQuery
JOIN table1 tFinal
on PreQuery.Symbol = tFinal.Symbol
AND tFinal.today between YourFromDateParameter and YourToDateParameter
AND ( tFinal.Low = LowForSymbol
OR tFinal.High = HighForSymbol )
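A minimal sketch of the PreQuery-and-join-back pattern, run through Python's sqlite3 module. The rows for symbols AAA and BBB are made up purely for illustration, named parameters (:from_date, :to_date) stand in for YourFromDateParameter and YourToDateParameter, and high_low_dates is a hypothetical helper name.

```python
import sqlite3

# Hypothetical rows: (symbol, high, low, today)
ROWS = [
    ("AAA", 10, 5, "2013-01-02"),
    ("AAA", 12, 6, "2013-01-03"),
    ("AAA", 11, 4, "2013-01-04"),
    ("BBB", 20, 15, "2013-01-02"),
    ("BBB", 22, 15, "2013-01-03"),
]

QUERY = """
SELECT PreQuery.symbol, PreQuery.LowForSymbol, PreQuery.HighForSymbol,
       tFinal.today AS DateOfMatch, tFinal.low AS DateMatchLow, tFinal.high AS DateMatchHigh
FROM (
    -- per-symbol extremes within the date range
    SELECT symbol, MIN(low) AS LowForSymbol, MAX(high) AS HighForSymbol
    FROM table1
    WHERE today BETWEEN :from_date AND :to_date
    GROUP BY symbol
) AS PreQuery
JOIN table1 AS tFinal
  ON PreQuery.symbol = tFinal.symbol
 AND tFinal.today BETWEEN :from_date AND :to_date
 -- keep every row whose low or high hits the extreme (ties included)
 AND (tFinal.low = PreQuery.LowForSymbol OR tFinal.high = PreQuery.HighForSymbol)
ORDER BY tFinal.symbol, tFinal.today
"""


def high_low_dates(rows, from_date, to_date):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE table1 (symbol TEXT, high INTEGER, low INTEGER, today TEXT,"
            " PRIMARY KEY (symbol, today))"
        )
        conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
        params = {"from_date": from_date, "to_date": to_date}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


for row in high_low_dates(ROWS, "2013-01-01", "2013-01-31"):
    print(row)
```

BBB's low of 15 occurs on two dates, so both rows are returned, as the answer describes.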