Finding nearest dates in SQL

I know there are some threads on this subject, but my query is slightly different from what I've seen, and the solutions presented before don't seem to work for me.
I have two tables, X and Y, simplified here to one ID (in reality I have multiple IDs). Each period lasts from its Date until the start of the next period.
ID Date Period
A 12/01/2010 1
A 12/03/2010 2
A 15/06/2010 3
A 17/08/2010 4
A 20/10/2010 5
and
ID SampleDate
A 20/01/2010
A 25/01/2010
A 21/11/2010
What I need to get is:
ID SampleDate Period
A 20/01/2010 1
A 25/01/2010 1
A 21/11/2010 5
I've tried this:
with cte as
(
select
Y.ID,
Y.sampleDate,
X.Period,
ROW_NUMBER() over (PARTITION by Y.ID, Y.sampleDate order by DATEDIFF(day,X.Date, Y.sampleDate)) as DaysSince
from X
left join Y
on X.ID=Y.ID
)
select ID,
sampleDate,
Period
from cte
where DaysSince=1
This produces a table of the correct size, but instead of giving the respective periods for the samples, it just returns the highest period number for all of them (for a given ID).
Any idea where I'm making a mistake?

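Nothing in your query removes rows with a negative DATEDIFF, i.e. periods that start after the sample date. Sorted ascending, the most negative difference comes first, so every sample picks the latest period. Add that condition to the join (and start from Y, so that every sample appears exactly once):
with cte as
(
select
Y.ID,
Y.sampleDate,
X.Period,
ROW_NUMBER() over (PARTITION by Y.ID, Y.sampleDate order by DATEDIFF(day,X.Date, Y.sampleDate)) as DaysSince
from Y /* one group of candidate periods per sample */
left join X
on X.ID=Y.ID and X.Date <= Y.sampleDate /* skip periods that start after the sample */
)
select ID,
sampleDate,
Period
from cte
where DaysSince=1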
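To check the idea end to end, here is a sketch that runs the corrected query against the question's sample data in an in-memory SQLite database via Python's sqlite3 module. This is an assumption about the environment, not the asker's SQL Server setup: SQLite has no DATEDIFF, so the difference of julianday() values stands in for it, and dates are stored as ISO strings so that they compare correctly.

```python
import sqlite3

# In-memory copy of the question's sample data (ID "A" only). Dates are ISO
# strings because SQLite has no DATE type; ISO strings compare chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE X (ID TEXT, Date TEXT, Period INTEGER);
CREATE TABLE Y (ID TEXT, SampleDate TEXT);
INSERT INTO X VALUES
  ('A','2010-01-12',1), ('A','2010-03-12',2), ('A','2010-06-15',3),
  ('A','2010-08-17',4), ('A','2010-10-20',5);
INSERT INTO Y VALUES
  ('A','2010-01-20'), ('A','2010-01-25'), ('A','2010-11-21');
""")

# Rank each sample's candidate periods by distance; only periods starting on or
# before the sample are joined, so rank 1 is the period the sample falls in.
# julianday(a) - julianday(b) plays the role of T-SQL's DATEDIFF(day, b, a).
query = """
WITH cte AS (
  SELECT Y.ID, Y.SampleDate, X.Period,
         ROW_NUMBER() OVER (
           PARTITION BY Y.ID, Y.SampleDate
           ORDER BY julianday(Y.SampleDate) - julianday(X.Date)
         ) AS DaysSince
  FROM Y
  LEFT JOIN X
    ON X.ID = Y.ID AND X.Date <= Y.SampleDate
)
SELECT ID, SampleDate, Period
FROM cte
WHERE DaysSince = 1
ORDER BY ID, SampleDate
"""
rows = conn.execute(query).fetchall()
print(rows)
```

A sample dated before the first period would come back with a NULL Period rather than disappearing, because of the LEFT JOIN from Y.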

Related

How to select k-th record per field in a single SQL query

Please help me with the following problem. I have already spent a week trying to put all the logic into one SQL query, but I still have no elegant result. I hope the SQL experts can give me a hint.
I have a table with 4 fields: date, expire_month, expire_year and value. The primary key is defined on the first 3 fields, so for a given date there are several values with different expire_month and expire_year. I need to choose one of these values for every date present in the table.
For example, when I execute a query:
SELECT date, expire_month, expire_year, value FROM futures
WHERE date = '1989-12-01' ORDER BY expire_year, expire_month;
I get a list of values for the same date, sorted by expiry (months are coded with letters):
1989-12-01 Z 1989 408.25
1989-12-01 H 1990 408.25
1989-12-01 K 1990 389
1989-12-01 N 1990 359.75
1989-12-01 U 1990 364.5
1989-12-01 Z 1990 375
The correct single value for that date is the k-th record from the top. For example, if k is 2, the «correct single» record would be:
1989-12-01 H 1990 408.25
How can I select these «correct single» values for every date in my table?
You can do it with rank():
select t.date, t.expire_month, t.expire_year, t.value from (
select *,
rank() over(partition by date order by expire_year, expire_month) rn
from futures
) t
where t.rn = 2
The column rn in the subquery is the rank of the row within its date. Change 2 to the rank you want.
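A minimal sketch of the same pattern in SQLite (3.25 or later), run from Python's sqlite3 module. It uses row_number() instead of rank(); with the primary key on (date, expire_month, expire_year) there can be no ties within a date, so both give the same result here. The first six rows are the question's data; the two rows for 1989-12-04 are invented purely to show that one row comes back per date. Sorting by expire_month works, assuming the standard futures month codes (F, G, H, J, K, M, N, Q, U, V, X, Z), because those letters happen to sort alphabetically in calendar order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE futures (
  date TEXT, expire_month TEXT, expire_year INTEGER, value REAL,
  PRIMARY KEY (date, expire_month, expire_year))""")
conn.executemany("INSERT INTO futures VALUES (?, ?, ?, ?)", [
    # Rows from the question for 1989-12-01.
    ("1989-12-01", "Z", 1989, 408.25),
    ("1989-12-01", "H", 1990, 408.25),
    ("1989-12-01", "K", 1990, 389),
    ("1989-12-01", "N", 1990, 359.75),
    ("1989-12-01", "U", 1990, 364.5),
    ("1989-12-01", "Z", 1990, 375),
    # Hypothetical rows for a second date, for illustration only.
    ("1989-12-04", "Z", 1989, 410.0),
    ("1989-12-04", "H", 1990, 401.5),
])

def kth_per_date(k):
    # row_number() numbers the contracts 1, 2, 3, ... within each date,
    # nearest expiry first; keeping rn = k leaves one row per date.
    return conn.execute("""
        SELECT date, expire_month, expire_year, value
        FROM (
          SELECT *, row_number() OVER (
                      PARTITION BY date ORDER BY expire_year, expire_month
                    ) AS rn
          FROM futures
        )
        WHERE rn = ?
        ORDER BY date""", (k,)).fetchall()

print(kth_per_date(2))
```

Passing k as a bound parameter avoids editing the query text when you want a different rank.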
While forpas's answer is the better one (though I think I'd use row_number() instead of rank() here), window functions are fairly recent additions to SQLite (added in 3.25). If you're stuck on an old version and can't upgrade, here's an alternative:
SELECT date, expire_month, expire_year, value
FROM futures AS f
WHERE (date, expire_month, expire_year) =
(SELECT f2.date, f2.expire_month, f2.expire_year
FROM futures AS f2
WHERE f.date = f2.date
ORDER BY f2.expire_year, f2.expire_month
LIMIT 1 OFFSET 1)
ORDER BY date;
The OFFSET value is 1 less than the Kth row - so 1 for the second row, 2 for the third row, etc.
It executes a correlated subquery for every row in the table, though, which isn't ideal. Hopefully your composite primary key columns are in the order date, expire_year, expire_month, which will help a lot by eliminating the need for additional sorting in it.
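Here is a quick sketch of the correlated-subquery version running in SQLite from Python, assuming SQLite 3.15 or later for the row-value comparison. OFFSET is bound to k - 1, and the primary key is declared in the order date, expire_year, expire_month, as suggested above. The 1989-12-05 row is hypothetical; it has only one contract, so it drops out when k = 2 because the subquery returns no row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE futures (
  date TEXT, expire_month TEXT, expire_year INTEGER, value REAL,
  PRIMARY KEY (date, expire_year, expire_month))""")
conn.executemany("INSERT INTO futures VALUES (?, ?, ?, ?)", [
    ("1989-12-01", "Z", 1989, 408.25),
    ("1989-12-01", "H", 1990, 408.25),
    ("1989-12-01", "K", 1990, 389),
    ("1989-12-01", "N", 1990, 359.75),
    # Hypothetical date with a single contract (no 2nd row exists for it).
    ("1989-12-05", "Z", 1989, 411.0),
])

def kth_per_date(k):
    # For each outer row, the subquery finds the key of the k-th contract on
    # the same date (OFFSET k - 1); the outer row is kept only if it matches.
    return conn.execute("""
        SELECT date, expire_month, expire_year, value
        FROM futures AS f
        WHERE (date, expire_month, expire_year) =
              (SELECT f2.date, f2.expire_month, f2.expire_year
               FROM futures AS f2
               WHERE f.date = f2.date
               ORDER BY f2.expire_year, f2.expire_month
               LIMIT 1 OFFSET ?)
        ORDER BY date""", (k - 1,)).fetchall()

print(kth_per_date(2))
```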
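You can try the following query (Oracle syntax; it returns the k-th row for a single date). ROWNUM is assigned before ORDER BY in the same query block, so the rows have to be sorted in an inner query first:
select * from
(
-- number the rows only after they have been sorted
SELECT rownum seq, t.* FROM
(
SELECT date1, expire_month, expire_year, value FROM testtable
WHERE date1 = to_date('1989-12-01','yyyy-mm-dd')
ORDER BY expire_year, expire_month
) t
)
where seq=2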

Can we modify the previous row and use it in current row in a SQL query for a list?

I've looked around and found a few posts with LAG() and running-total type queries, but none seem to fit what I'm looking for. Maybe I'm not using the correct search terms, or maybe I'm overcomplicating the situation. I hope someone can help me out.
What I'm looking to do is take the previous result and multiply it by the current row's value for a range of dates. The starting value is always some base number; let's use 10 to keep it simple. The values will be floats, but I kept them as round numbers here to better explain my question.
The first table below shows the calculation, and the second shows what the result should look like in the end.
date val1 calc_result
20120930 null 10
20121031 2 10*2=20
20121130 3 20*3=60
20121231 1 60*1=60
20130131 2 60*2=120
20130228 1 120*1=120
The query would return
20120930 10
20121031 20
20121130 60
20121231 60
20130131 120
20130228 120
I'm trying to see if this can be done in a query type solution or would a PL/SQL table/cursors need to be used?
Any help would be appreciated.
You can do this with a recursive CTE:
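-- PostgreSQL and MySQL need WITH RECURSIVE here
with dates as (
select t.*, row_number() over (order by date) as seqnum
from t
),
cte as (
/* anchor: the first date starts from the base value */
select t.date, t.val1, 10 as calc_result, t.seqnum
from dates t
where t.seqnum = 1
union all
/* recursive step: previous result times the next row's val1 */
select t.date, t.val1, cte.calc_result * t.val1, t.seqnum
from cte join
dates t
on t.seqnum = cte.seqnum + 1
)
select cte.date, cte.calc_result
from cte
order by cte.date;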
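The recursive CTE can be tried out in SQLite from Python. The sketch below assumes a table named t with columns date and val1, as in the question, and uses WITH RECURSIVE, which SQLite accepts. seqnum has to be carried through both branches of the UNION ALL, because the recursive step joins on cte.seqnum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, val1 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("20120930", None), ("20121031", 2), ("20121130", 3),
    ("20121231", 1), ("20130131", 2), ("20130228", 1),
])

query = """
WITH RECURSIVE dates AS (
  -- number the rows so the recursion can walk them in date order
  SELECT t.*, ROW_NUMBER() OVER (ORDER BY date) AS seqnum
  FROM t
),
cte AS (
  -- anchor: the first row starts from the base value 10
  SELECT d.date, d.val1, 10 AS calc_result, d.seqnum
  FROM dates d
  WHERE d.seqnum = 1
  UNION ALL
  -- recursive step: previous result times the next row's val1
  SELECT d.date, d.val1, cte.calc_result * d.val1, d.seqnum
  FROM cte
  JOIN dates d ON d.seqnum = cte.seqnum + 1
)
SELECT date, calc_result FROM cte ORDER BY date
"""
result = conn.execute(query).fetchall()
print(result)
```

Each recursion step adds one row, so the recursion depth grows with the number of dates. That is fine for monthly data, but some databases cap it (SQL Server's default MAXRECURSION is 100).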
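This is a cumulative product, which you can compute with logarithms: exp(sum(ln(x))) equals the product of the x values, so a running SUM of ln(val1) gives a running product. This only works when every val1 is positive. Replace 10 in the query with the desired start value.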
select date,val1
,case when row_number() over(order by date) = 1 then 10 --set start value for first row
else 10*exp(sum(ln(val1)) over(order by date)) end as res
from tbl
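A sketch of the logarithm trick in SQLite from Python. It registers NULL-safe ln and exp functions with Connection.create_function, so the example does not depend on SQLite having been compiled with its optional math functions. The results are floating-point approximations (for example 59.99999999999999 instead of 60), so the sketch rounds them.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# NULL-safe ln/exp: the first row's val1 is NULL, and SQL NULL arrives as None.
conn.create_function("ln", 1, lambda x: None if x is None else math.log(x))
conn.create_function("exp", 1, lambda x: None if x is None else math.exp(x))
conn.execute("CREATE TABLE tbl (date TEXT, val1 REAL)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    ("20120930", None), ("20121031", 2), ("20121130", 3),
    ("20121231", 1), ("20130131", 2), ("20130228", 1),
])

query = """
SELECT date, val1,
       CASE WHEN ROW_NUMBER() OVER (ORDER BY date) = 1 THEN 10
            -- running sum of logs = log of running product
            ELSE 10 * exp(SUM(ln(val1)) OVER (ORDER BY date)) END AS res
FROM tbl
ORDER BY date
"""
# Round away the floating-point error introduced by ln/exp.
res = [round(row[2], 9) for row in conn.execute(query)]
print(res)
```

Zero or negative values would make ln() fail, so the recursive CTE is the safer choice if val1 can ever drop to zero or below.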

Get sum of previous 6 values including the group

I need to sum up the values for the last 7 days, so each row should contain the current value plus the previous 6.
(Note: I will calculate the hours by summing up the seconds.)
I tried the query below:
select SUM([drivingTime]) OVER(PARTITION BY driverid ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from [f.DriverHseCan]
The problem is that I have to group by driver and asset for each date.
In that case, the driving time should be summed up first, and then the previous 6 rows should be taken.
I can't do this using rank() because I need these rows as well; I have to show them in the report.
I tried doing this in both SSRS and SQL.
In short, it should add up the total driving time for the current day plus the 6 previous days.
Try the following query
SELECT
s.date
, s.driverid
, s.assetid
, s.drivingtime
, SUM(s2.drivingtime) AS total_drivingtime
FROM f.DriverHseCan s
JOIN (
SELECT date,driverid, SUM(drivingtime) drivingtime
FROM f.DriverHseCan
GROUP BY date,driverid
) AS s2
ON s.driverid = s2.driverid AND s2.date BETWEEN DATEADD(d,-6,s.date) AND s.date
GROUP BY
s.date
, s.driverid
, s.assetid
, s.drivingtime
If you have week start/end dates, there may be better-performing alternatives, e.g. using the week number in SSRS expressions rather than doing the self join on SQL Server.
I think aggregation does what you want:
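select driverid, date,
-- inner SUM totals each day; outer SUM adds up the last 7 daily rows
sum(sum([drivingTime])) over (partition by driverid
order by date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as total_drivingtime
from [f.DriverHseCan]
group by driverid, date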
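A sketch of the nested-aggregate window in SQLite from Python. The table name DriverHseCan and its rows are hypothetical: driver 1 has two assets on the first day and one row per day after that. GROUP BY collapses each driver-day to one total first, and the window then adds the current day and up to 6 preceding daily rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DriverHseCan (
  driverid INTEGER, assetid TEXT, date TEXT, drivingTime INTEGER)""")
rows = [(1, "A", "2020-01-01", 100), (1, "B", "2020-01-01", 50)]
rows += [(1, "A", f"2020-01-0{d}", 100) for d in range(2, 9)]  # 01-02 .. 01-08
rows.append((2, "C", "2020-01-03", 30))
conn.executemany("INSERT INTO DriverHseCan VALUES (?, ?, ?, ?)", rows)

query = """
SELECT driverid, date,
       SUM(SUM(drivingTime)) OVER (
         PARTITION BY driverid ORDER BY date
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS total_drivingtime
FROM DriverHseCan
GROUP BY driverid, date
ORDER BY driverid, date
"""
result = conn.execute(query).fetchall()
print(result)
```

ROWS counts rows, not calendar days, so this matches "the last 7 days" only if each driver has a row for every day; on 2020-01-08 the window covers 01-02 to 01-08, which here happens to be 7 consecutive days.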
I guess you need to use CROSS APPLY.
Something like the following:
SELECT YT.driverID,
YT.date,
CA.Last6DayDrivingTime
FROM YourTable YT
CROSS APPLY
(
/* total for the current day plus the 6 days before it */
SELECT SUM(T.drivingTime) AS Last6DayDrivingTime
FROM YourTable T
WHERE T.driverID = YT.driverID
AND T.date BETWEEN DATEADD(DAY,-6,YT.date) AND YT.date
) CA
Edit:
Since you commented that CROSS APPLY slows down performance, another option is to precalculate the weekly values in a temp table or a CTE and then use them in your main query.
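The date-range answers (the self join and CROSS APPLY) and the ROWS-based window can give different results when days are missing. The sketch below compares them in SQLite from Python. SQLite has no CROSS APPLY, so a correlated scalar subquery stands in for it, with date(x, '-6 days') in place of DATEADD. The table and its rows are hypothetical, with a deliberate gap between 01-02 and 01-10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (driverID INTEGER, date TEXT, drivingTime INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", [
    (1, "2020-01-01", 100),
    (1, "2020-01-02", 200),
    (1, "2020-01-02", 50),   # second row on the same day (e.g. another asset)
    (1, "2020-01-10", 300),  # gap: no rows for 01-03 .. 01-09
])

# Calendar-day window: one row per driver-day, summing the current day and
# the 6 days before it, like the CROSS APPLY / self-join answers.
range_totals = conn.execute("""
    SELECT d.driverID, d.date,
           (SELECT SUM(t.drivingTime)
            FROM YourTable t
            WHERE t.driverID = d.driverID
              AND t.date BETWEEN date(d.date, '-6 days') AND d.date) AS total
    FROM (SELECT DISTINCT driverID, date FROM YourTable) AS d
    ORDER BY d.driverID, d.date""").fetchall()

# Row-count window for comparison: the last 7 daily rows, however far apart.
rows_totals = conn.execute("""
    SELECT driverID, date,
           SUM(SUM(drivingTime)) OVER (
             PARTITION BY driverID ORDER BY date
             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS total
    FROM YourTable
    GROUP BY driverID, date
    ORDER BY driverID, date""").fetchall()

print(range_totals)
print(rows_totals)
```

On 2020-01-10, the calendar-day version sums only 01-04 to 01-10 (300), while the ROWS version also includes the two older days (650).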

How to select the first row from group by date [duplicate]

I am writing a program for amateur radio. Some callsigns will appear more than once in the data but the qsodate will be different. I only want the first occurrence of a call sign after a given date.
The query
select distinct
a.callsign,
a.SKCC_Number,
a.qsodate,
b.name,
a.SPC,
a.Band
from qso a, skccdata b
where SKCC_Number like '%[CTS]%'
AND QSODate >= '2014-08-01'
and b.callsign = a.callsign
order by a.QSODate
The problem:
Because contacts occur on different dates, I get all of the contacts. I have tried adding min(a.qsodate) to get only the first one, but then I run into all sorts of grouping issues.
This query will be in a stored procedure, so creating temp tables or cursors will not be a problem.
You can use ROW_NUMBER() to get the row with the earliest date for each callsign, like this:
WITH CTE
AS
(
select
a.callsign,
a.SKCC_Number,
a.qsodate,
b.name,
a.SPC,
a.Band,
ROW_NUMBER() OVER(PARTITION BY a.callsign ORDER BY a.QSODate) AS RN
from qso a
join skccdata b on b.callsign = a.callsign
where SKCC_Number like '%[CTS]%'
AND QSODate >= '2014-08-01'
)
SELECT *
FROM CTE
WHERE RN = 1;
ROW_NUMBER() OVER(PARTITION BY a.callsign ORDER BY a.QSODate) numbers the rows within each callsign in QSODate order, and WHERE RN = 1 then discards every row except the first one, i.e. the one with the earliest QSODate.
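A sketch of the same query in SQLite from Python, with hypothetical callsigns, names and dates. SQLite's LIKE has no [CTS] character class, so GLOB '*[CTS]*' stands in for the T-SQL pattern. N2CALL is dropped by that filter, and N1CALL's July contact is dropped by the date filter before the rows are numbered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE qso (callsign TEXT, SKCC_Number TEXT, qsodate TEXT, SPC TEXT, Band TEXT);
CREATE TABLE skccdata (callsign TEXT, name TEXT);
INSERT INTO qso VALUES
  ('N0CALL', '123C', '2014-08-05', 'MA', '40M'),
  ('N0CALL', '123C', '2014-08-02', 'MA', '20M'),
  ('N1CALL', '456T', '2014-08-03', 'NY', '40M'),
  ('N1CALL', '456T', '2014-07-30', 'NY', '80M'),
  ('N2CALL', '789',  '2014-08-04', 'CT', '40M');
INSERT INTO skccdata VALUES ('N0CALL', 'Alice'), ('N1CALL', 'Bob'), ('N2CALL', 'Carol');
""")

query = """
WITH CTE AS (
  SELECT a.callsign, a.SKCC_Number, a.qsodate, b.name, a.SPC, a.Band,
         -- 1 = earliest qualifying contact for each callsign
         ROW_NUMBER() OVER (PARTITION BY a.callsign ORDER BY a.qsodate) AS RN
  FROM qso a
  JOIN skccdata b ON b.callsign = a.callsign
  WHERE a.SKCC_Number GLOB '*[CTS]*'   -- stands in for LIKE '%[CTS]%'
    AND a.qsodate >= '2014-08-01'
)
SELECT callsign, qsodate, name, Band
FROM CTE
WHERE RN = 1
ORDER BY callsign
"""
first_contacts = conn.execute(query).fetchall()
print(first_contacts)
```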
Have you tried starting your query with SELECT TOP 1 ... (fields)? Then you will only get one row. You can use TOP x for x rows, TOP 50 PERCENT for the top half of the rows, etc., and in this case you can drop DISTINCT.
EDIT: I misunderstood the question. How about this?
select
a.callsign,
a.SKCC_Number,
a.qsodate,
(SELECT TOP 1 b.name FROM skccdata b WHERE b.callsign = a.callsign) as NAME,
a.SPC,
a.Band
from qso a
where SKCC_Number like '%[CTS]%'
AND QSODate >= '2014-08-01'
GROUP BY a.QSODate, a.callsign, a.SKCC_Number, a.SPC, a.Band
order by a.QSODate
Then add callsign to your WHERE clause to isolate individual callsigns.

Datediff between two tables

I have these two tables:
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table, but when an item gets queued more than once, how do I calculate that?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a row number based on the transaction ID and the date it was entered. You can then join on both the row number and TransID, which stops each add from being paired with every removal for the same transaction.
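The pairing can be reproduced in SQLite from Python using the question's data, converted to ISO dates. The differences from the T-SQL answer are assumptions made for the sketch: julianday() differences replace DATEDIFF(DAY, ...), and a fixed "as of" date of 2013-02-01 replaces CURRENT_TIMESTAMP for items that are still queued, so that the output is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT);
CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT);
INSERT INTO AddTable VALUES
  (10,'2012-10-10'), (11,'2012-10-14'), (11,'2012-11-18'),
  (11,'2012-12-25'), (12,'2013-01-01');
INSERT INTO RemoveTable VALUES
  (10,'2013-01-15'), (11,'2012-12-12'), (11,'2013-01-13'), (11,'2013-01-20');
""")

query = """
WITH OrderedAdds AS (
  SELECT TransID, AddDate,
         ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY AddDate) AS RowNumber
  FROM AddTable
), OrderedRemoves AS (
  SELECT TransID, RemovedDate,
         ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY RemovedDate) AS RowNumber
  FROM RemoveTable
)
-- n-th add of a TransID is paired with its n-th removal; unmatched adds
-- are measured up to the :as_of date instead of the current time
SELECT a.TransID, a.AddDate, r.RemovedDate,
       CAST(julianday(COALESCE(r.RemovedDate, :as_of)) - julianday(a.AddDate)
            AS INTEGER) AS DaysInQueue
FROM OrderedAdds a
LEFT JOIN OrderedRemoves r
  ON a.TransID = r.TransID AND a.RowNumber = r.RowNumber
ORDER BY a.TransID, a.AddDate
"""
rows = conn.execute(query, {"as_of": "2013-02-01"}).fetchall()
for row in rows:
    print(row)
```

Summing DaysInQueue per TransID on top of this gives the total queue time per transaction.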
DISCLAIMER: There is probably a problem with this, but I hope it sends you in one possible direction. Expect to run into issues.
You can try something along the following lines (which may or may not work, depending on your system, version, etc.):
-- UNIX_TIMESTAMP returns seconds, so divide by 60*60*24 to get days
-- only correct if every add has a matching removal
SELECT transId, (sum(remove_date_sum) - sum(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) as add_date_sum, 0 as remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 as add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) as remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS q
GROUP BY transId;
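A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to numeric timestamps. Check whether UNIX_TIMESTAMP (a MySQL function) works for you, or find an equivalent. Then sum the timestamps in each table, combine the two results with UNION ALL (leaving the other table's column as zero), and subtract the add total from the removal total. Since the sum of the removal times minus the sum of the add times equals the sum of the individual (removal - add) durations, this only gives the right answer when every add has a matching removal.
The division at the end of the outer SELECT converts the result to days: UNIX_TIMESTAMP returns seconds, not milliseconds, so divide by 60*60*24 (86400). Use a different divisor if you want other units.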
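A sketch of the sum-of-timestamps idea in SQLite from Python. SQLite has no UNIX_TIMESTAMP, so strftime('%s', ...) is used to get Unix seconds instead. Only TransIDs 10 and 11 are included, because TransID 12 has no removal and would violate the "every add has a matching removal" assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT);
CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT);
INSERT INTO AddTable VALUES
  (10,'2012-10-10'), (11,'2012-10-14'), (11,'2012-11-18'), (11,'2012-12-25');
INSERT INTO RemoveTable VALUES
  (10,'2013-01-15'), (11,'2012-12-12'), (11,'2013-01-13'), (11,'2013-01-20');
""")

query = """
SELECT TransID, (SUM(remove_secs) - SUM(add_secs)) / 86400 AS days_in_queue
FROM (
  -- total of all add times per TransID, in Unix seconds
  SELECT TransID, SUM(CAST(strftime('%s', AddDate) AS INTEGER)) AS add_secs,
         0 AS remove_secs
  FROM AddTable GROUP BY TransID
  UNION ALL
  -- total of all removal times per TransID, in Unix seconds
  SELECT TransID, 0, SUM(CAST(strftime('%s', RemovedDate) AS INTEGER))
  FROM RemoveTable GROUP BY TransID
) AS q
GROUP BY TransID
ORDER BY TransID
"""
totals = conn.execute(query).fetchall()
print(totals)
```

For TransID 11 this gives 141 days, the same as adding the three paired stays (59 + 56 + 26) from the ROW_NUMBER answer, without needing to know which add goes with which removal.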
All that said, I would probably solve this with a stored procedure or a client-side script. SQL is not a weapon for every battle, and two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates, such as 15/1/2013 and 13/1/2013, are not valid in month/day/year format.)
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
order by transId
Answer 1: before your comments
This assumes that a transaction is not added again until it has been removed. Also note that the following query returns numberOfDays = 0 for records that have not been removed:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate