Finding data closest to date? - sql

I have a table
CREATE TABLE `symbol_details` (
`symbol_header_id` int(11) DEFAULT NULL,
`DATE` datetime DEFAULT NULL,
`ADJ_NAV` double DEFAULT NULL
)
with ~20,000,000 entries. Now I want to find the ADJ_NAV value closest to the end of the quarter for just one symbol_header_id:
SET @quarterend = '2009-3-31';
SELECT symbol_header_id AS she, ADJ_NAV AS aend FROM symbol_details
WHERE
symbol_header_id = 18546
AND DATE= (
# date closest after quarter end
SELECT DATE FROM symbol_details
WHERE ABS(DATEDIFF(DATE, @quarterend)) < 10
AND DATE<=@quarterend
AND symbol_header_id = 18546
ORDER BY ABS(DATEDIFF(DATE, @quarterend)) ASC LIMIT 1)
When I run the inner "select date" query it returns quickly. Just running the outer query with the correct date filled in instead of the subquery also finishes very quickly. But when I run the whole thing it takes forever. Is something wrong?

It seems the optimizer is having trouble evaluating the statement and finding the most efficient plan. (In Oracle, I'd ask you to update the statistics, but I'm not sure how the optimizer works in MySQL.)
I'd try some other ways of expressing your statement to see what makes the most sense to the optimizer:
explicitly connect the two symbol_header_ids of the inner and the outer query
try a SELECT max(date) .. instead of the 'Order By Limit 1'
try a self join of symbol_details
Hope there is a useful idea in here.

You can probably do without the subquery. Just grab the first row:
SELECT *
FROM symbol_details
WHERE DATE <= @quarterend
AND symbol_header_id = 18546
ORDER BY DATE DESC
LIMIT 1
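The "just grab the first row" idea can be tried out quickly. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL, with made-up rows; the composite index on (symbol_header_id, date) is an assumption that is not in the question, added because it usually lets the engine read the wanted row directly instead of sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbol_details (symbol_header_id INTEGER, date TEXT, adj_nav REAL)"
)
# Made-up sample rows; dates are stored as ISO-8601 text so they sort correctly.
conn.executemany(
    "INSERT INTO symbol_details VALUES (?, ?, ?)",
    [
        (18546, "2009-03-27 00:00:00", 10.1),
        (18546, "2009-03-30 00:00:00", 10.4),
        (18546, "2009-04-01 00:00:00", 10.9),
        (99999, "2009-03-31 00:00:00", 55.0),
    ],
)
# Assumption: a composite index like this one is not mentioned in the question.
conn.execute(
    "CREATE INDEX ix_symbol_date ON symbol_details (symbol_header_id, date)"
)

quarter_end = "2009-03-31 00:00:00"
# Latest row on or before the quarter end for a single symbol.
row = conn.execute(
    """
    SELECT symbol_header_id, date, adj_nav
    FROM symbol_details
    WHERE symbol_header_id = ? AND date <= ?
    ORDER BY date DESC
    LIMIT 1
    """,
    (18546, quarter_end),
).fetchone()
print(row)
```

The row dated after the quarter end and the row belonging to the other symbol are both ignored.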

Try:
SELECT t.symbol_header_id,
COALESCE(t.adj_nav, 0.0) AS adj_nav
FROM symbol_details t
JOIN (SELECT sh.symbol_header_id,
MAX(sh.date) AS max_date
FROM symbol_details sh
WHERE ABS(DATEDIFF(sh.date, @quarterend)) < 10
AND sh.date <= @quarterend
GROUP BY sh.symbol_header_id) x ON x.symbol_header_id = t.symbol_header_id
AND x.max_date = t.date
WHERE t.symbol_header_id = 18546

Related

How to use DISTINCT in one column when SELECTING from many columns

I'm trying to select columns from two different views but I only want to use the DISTINCT statement on one specific column. I thought using the GROUP BY statement would work but it's throwing an error.
SELECT DISTINCT
[Act].[ClientId]
, [Ref].[Agency]
, [Act].[FundCode]
, [Act].[VService]
, [Act].[Service]
, [Act].[Attended]
, [Act].[StartDate]
FROM [dbo].[FS_v_CrossReference_ALL] AS [Ref]
INNER JOIN [dbo].[FS_v_Activities] AS [Act] ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] BETWEEN '1/1/2015' AND '12/31/2015'
GROUP BY [Act].[ClientId]
I want to use the DISTINCT statement on [Act].[ClientId]. Is there any way to do this?
Presumably, you want row_number():
SELECT ar.*
FROM (SELECT Act.*, Ref.Agency,
ROW_NUMBER() OVER (PARTITION BY Act.ClientId ORDER BY ACT.StartDate DESC) as seqnum
FROM [dbo].[FS_v_CrossReference_ALL] [Ref] JOIN
[dbo].[FS_v_Activities] Act
ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] >= '2015-01-01' AND
[Act].[StartDate] < '2016-01-01'
) ar
WHERE seqnum = 1;
Particularly note the changes to the date comparisons:
The dates are in standard format (YYYY-MM-DD or YYYYMMDD).
BETWEEN is replaced by two inequalities. This makes the code robust if the date is really a date/time with a time component.
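To see why the two inequalities matter, here is a small sketch using Python's sqlite3 module with invented rows. The table name activities and its columns are hypothetical, and SQLite compares these timestamps as ISO-8601 text, so the bounds are written as full timestamps to mimic how a datetime column behaves in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; timestamps stored as ISO-8601 text.
conn.execute("CREATE TABLE activities (client_id INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [
        (1, "2015-01-01 00:00:00"),
        (2, "2015-12-31 00:00:00"),
        (3, "2015-12-31 14:30:00"),  # has a time component on the last day
        (4, "2016-01-01 00:00:00"),
    ],
)

# BETWEEN with a midnight upper bound silently drops the afternoon row.
between_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT client_id FROM activities
        WHERE start_date BETWEEN '2015-01-01 00:00:00' AND '2015-12-31 00:00:00'
        ORDER BY client_id
        """
    )
]

# The half-open range keeps everything in 2015 and nothing from 2016.
half_open_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT client_id FROM activities
        WHERE start_date >= '2015-01-01 00:00:00'
          AND start_date < '2016-01-01 00:00:00'
        ORDER BY client_id
        """
    )
]
print(between_ids, half_open_ids)
```

Client 3 appears only in the half-open result, which is the row BETWEEN would have lost.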

Oracle query to get the top 20 agencies by passes issued

I have a query that shows the passes issued by each agency per date. I want to get the top 20 agencies that have issued the most passes. Here is my query:
Nothing in your data identifies "agency". If I assume you mean "agent", you can get the top 20 by aggregating and then limiting the result. In Oracle 12C+, you can use:
SELECT gp.agent_id, a.agent_name, COUNT(*)
FROM eofficeuat.gatepass gp INNER JOIN
eofficeuat.cnf_agents a
ON gp.agent_id = a.agent_id INNER JOIN
eofficeuat.cardprintlog_user u
ON gp.agent_id = u.agent_id
WHERE gp.issuedatetime BETWEEN DATE '2019-09-28' AND DATE '2019-09-29'
GROUP BY gp.agent_id, a.agent_name
ORDER BY COUNT(*) DESC
FETCH FIRST 20 ROWS ONLY;
In earlier versions, a subquery is needed:
SELECT *
FROM (SELECT gp.agent_id, a.agent_name, COUNT(*)
FROM eofficeuat.gatepass gp INNER JOIN
eofficeuat.cnf_agents a
ON gp.agent_id = a.agent_id INNER JOIN
eofficeuat.cardprintlog_user u
ON gp.agent_id = u.agent_id
WHERE gp.issuedatetime BETWEEN DATE '2019-09-28' AND DATE '2019-09-29'
GROUP BY gp.agent_id, a.agent_name
ORDER BY COUNT(*) DESC
) a
WHERE rownum <= 20;
Obviously, if you do mean "agency" and that is identified by different columns, you would just adjust the SELECT and GROUP BY clauses.
Also, I would advise you never to use BETWEEN on dates in Oracle. There is a time component that might cause issues.
If you intend only times on '2019-09-28', then:
gp.issuedatetime >= DATE '2019-09-28' AND
gp.issuedatetime < DATE '2019-09-29'
If you intend both the 28th and 29th:
gp.issuedatetime >= DATE '2019-09-28' AND
gp.issuedatetime < DATE '2019-09-30'
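The aggregate-then-limit pattern combined with the half-open date filter can be sketched as follows. This is only an illustration using Python's sqlite3 module, so it uses LIMIT where Oracle would use FETCH FIRST or rownum; the rows are invented, and the extra agent_id sort key is an assumption added to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gatepass (agent_id INTEGER, issuedatetime TEXT)")
# Invented sample data; timestamps stored as ISO-8601 text.
conn.executemany(
    "INSERT INTO gatepass VALUES (?, ?)",
    [
        (1, "2019-09-28 08:00:00"),
        (1, "2019-09-28 17:45:00"),
        (1, "2019-09-29 23:10:00"),  # late on the 29th, still inside the range
        (2, "2019-09-28 09:30:00"),
        (2, "2019-09-29 10:00:00"),
        (3, "2019-09-29 12:00:00"),
        (3, "2019-09-30 00:00:00"),  # outside the half-open range
    ],
)

# Count passes per agent on the 28th and 29th, then keep the top 2.
top_agents = conn.execute(
    """
    SELECT agent_id, COUNT(*) AS cnt
    FROM gatepass
    WHERE issuedatetime >= '2019-09-28 00:00:00'
      AND issuedatetime < '2019-09-30 00:00:00'
    GROUP BY agent_id
    ORDER BY cnt DESC, agent_id
    LIMIT 2
    """
).fetchall()
print(top_agents)
```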
You can use the row-limiting clause (FETCH FIRST, available in Oracle 12c and later) to get the top 20 records as follows:
SELECT eofficeuat.gatepass.agent_id, eofficeuat.cnf_agents.agent_name, COUNT(1) as cnt
FROM eofficeuat.gatepass INNER JOIN
eofficeuat.cnf_agents
ON eofficeuat.gatepass.agent_id = eofficeuat.cnf_agents.agent_id INNER JOIN
eofficeuat.cardprintlog_user
ON eofficeuat.gatepass.agent_id = eofficeuat.cardprintlog_user.agent_id
WHERE eofficeuat.gatepass.issuedatetime BETWEEN DATE '2019-09-28' AND DATE '2019-09-29'
GROUP BY eofficeuat.gatepass.agent_id, eofficeuat.cnf_agents.agent_name
ORDER BY cnt DESC
FETCH FIRST 20 ROWS ONLY; -- this will fetch top 20 agents
Cheers!!

Multiple query date range

I have multiple queries that I run each day. Is there a way to set a date range once for multiple selects rather than entering the date in each select? I use Oracle SQL Developer to run these.
Here is an example of just three, but there are around a dozen queries:
select count(*)"Tall ord"
from cr_ordpar
inner join cr_palhis on cr_palhis.pikref = cr_ordpar.pikref
inner join CR_LODHED_DESP on cr_lodhed_desp.ilodno = cr_ordpar.ILODNO
where cr_palhis.hstdat between '24-nov-15 07:00' and '25-nov-15 07:00:00'
and hststs_str = 'Pallet Output From Racking'
and cr_palhis.palhgt = '2700'
order by cr_lodhed_desp.lodref;
select count(*)"Tall del"
from cr_palhis
where cr_palhis.hstdat between '24-nov-15 07:00' and '25-nov-15 07:00:00'
and hststs_str = 'Pallet Deleted'
and cr_palhis.palhgt = 2700
and cr_palhis.rakblk <> 510;
select count(*)"Tall ind"
from cr_palhis
where cr_palhis.hstdat between '24-nov-15 07:00' and '25-nov-15 07:00:00'
and hststs_str = 'Pallet Indexed'
and cr_palhis.PALHGT = '2700';
There are multiple issues with your DATE.
Firstly, '24-nov-15 07:00' is not a DATE, it is a string. You must always use TO_DATE to explicitly convert a literal into a DATE. Also, remember that TO_DATE is NLS dependent if you do not specify NLS_DATE_LANGUAGE. NOV is not the same in other languages.
The YY format was recognised long ago as the Y2K bug, and the world has already seen the effort it took to get rid of it. Either specify the complete YYYY or use the RR format, but understand it before using it.
TO_DATE('24-nov-2015 07:00','dd-mon-yyyy hh24:mi','nls_date_language=ENGLISH')
You could use a GROUP BY clause to get the different counts in a single query, provided all the other conditions are the same.
SELECT hststs_str,
COUNT(*) cnt
FROM cr_ordpar
INNER JOIN cr_palhis
ON cr_palhis.pikref = cr_ordpar.pikref
INNER JOIN CR_LODHED_DESP
ON cr_lodhed_desp.ilodno = cr_ordpar.ILODNO
WHERE cr_palhis.hstdat
BETWEEN TO_DATE('24-nov-2015 07:00','dd-mon-yyyy hh24:mi','nls_date_language=ENGLISH')
AND TO_DATE('25-nov-2015 07:00','dd-mon-yyyy hh24:mi','nls_date_language=ENGLISH')
AND hststs_str IN( 'Pallet Output From Racking',
'Pallet Deleted',
'Pallet Indexed'
)
AND cr_palhis.palhgt = '2700'
AND cr_palhis.rakblk <> 510
GROUP BY hststs_str
ORDER BY cnt;
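As an illustration of both ideas (one date range defined once, and one grouped query instead of several counts), here is a minimal sketch using Python's sqlite3 module with invented rows. It keeps only the cr_palhis table, and the named parameters :start_ts and :end_ts are hypothetical names chosen for this example; it is not Oracle syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cr_palhis (hstdat TEXT, hststs_str TEXT, palhgt INTEGER)"
)
# Invented sample rows; timestamps stored as ISO-8601 text.
conn.executemany(
    "INSERT INTO cr_palhis VALUES (?, ?, ?)",
    [
        ("2015-11-24 08:00:00", "Pallet Deleted", 2700),
        ("2015-11-24 09:00:00", "Pallet Indexed", 2700),
        ("2015-11-24 10:00:00", "Pallet Indexed", 2700),
        ("2015-11-25 08:00:00", "Pallet Indexed", 2700),  # after the window
        ("2015-11-24 11:00:00", "Pallet Indexed", 1800),  # wrong pallet height
    ],
)

# The date range is defined once and reused by every query that needs it.
window = {"start_ts": "2015-11-24 07:00:00", "end_ts": "2015-11-25 07:00:00"}

counts = dict(
    conn.execute(
        """
        SELECT hststs_str, COUNT(*) AS cnt
        FROM cr_palhis
        WHERE hstdat >= :start_ts AND hstdat < :end_ts
          AND palhgt = 2700
          AND hststs_str IN ('Pallet Deleted', 'Pallet Indexed')
        GROUP BY hststs_str
        """,
        window,
    ).fetchall()
)
print(counts)
```

In SQL Developer itself the same effect would come from defining the two bounds once as variables and referencing them in every query; the sketch only shows the shape of the approach.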

How can I include in schedules today's departures after midnight using GTFS?

I have just started with GTFS and immediately ran into a big problem with my SQL query:
SELECT *, ( some columns AS shortcuts )
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN( $incodes )
AND trips.service_id IN ( $service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t,l,sm
ORDER BY t ASC, l DESC
LIMIT 14
This should show the departures from a stop in the next 3 hours.
It works, but as midnight approaches (e.g. at 23:50) it catches only today's departures. After midnight it catches only the new day's departures, and departures belonging to the previous service day are missing because they have a departure_time such as "24:05", which is compared against a $time of 00:05 that belongs to the new day.
Is it possible to use something lighter than a UNION of the same query for the next day?
If UNION is used, how can I ORDER the departures so that LIMIT trims them correctly?
trips.start_time and trips.end_time are auxiliary columns I added to speed up the query. They hold the arrival_time at sequence 1 and the departure_time at the last sequence of each trip.
Using UNION to link together a query for each day is going to be your best bet, unless perhaps you want to issue two completely separate queries and then merge the results together in your application. The contortionism required to do all this with a single SELECT statement (assuming it's even possible) would not be worth the effort.
Part of the complexity here is that the set of active service IDs can vary between consecutive days, so a distinct set must be used for each one. (For a suggestion of how to build this set in SQL using a subquery and table join, see my answer to "How do I use calendar exceptions to generate accurate schedules using GTFS?".)
More complexity arises from the fact the results for each day must be treated differently: For the result set to be ordered correctly, we need to subtract twenty-four hours from all of (and only) yesterday's times.
Try a query like this, following the "pseudo-SQL" in your question and assuming you are using MySQL/MariaDB:
SELECT *, SUBTIME(departure_time, '24:00:00') AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $yesterdays_service_ids )
AND ( departure_time >= ADDTIME($time, '24:00:00') )
AND ( trips.end_time >= ADDTIME($time, '24:00:00') )
AND ( trips.start_time <= ADDTIME($time_plus_3hrs, '24:00:00') )
UNION
SELECT *, departure_time AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $todays_service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t, l, sm
ORDER BY t ASC, l DESC
LIMIT 14
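The time arithmetic behind that UNION can be sketched outside SQL. The following Python example is only an illustration with invented departure lists; the function names gtfs_to_seconds and upcoming_departures are hypothetical. It shifts yesterday's over-24:00 times back by a day so that both service days sort on one clock.

```python
DAY_SECONDS = 24 * 3600


def gtfs_to_seconds(hms: str) -> int:
    """Convert a GTFS time such as '24:05:00' to seconds since service-day start."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def upcoming_departures(now, window_seconds, yesterday, today):
    """Merge departures from two service days onto today's clock and sort them."""
    start = gtfs_to_seconds(now)
    end = start + window_seconds
    merged = []
    for dep in yesterday:
        # Yesterday's times past 24:00:00 land in today's early hours.
        t = gtfs_to_seconds(dep) - DAY_SECONDS
        if start <= t <= end:
            merged.append((t, dep, "yesterday"))
    for dep in today:
        t = gtfs_to_seconds(dep)
        if start <= t <= end:
            merged.append((t, dep, "today"))
    return sorted(merged)


# Invented sample data: it is 00:05 and we look three hours ahead.
result = upcoming_departures(
    "00:05:00",
    3 * 3600,
    yesterday=["23:50:00", "24:10:00", "25:30:00"],
    today=["00:20:00", "04:00:00"],
)
print(result)
```

Yesterday's 23:50 has already left and today's 04:00 is outside the window, so three departures remain, interleaved in clock order.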

How to self-join table in a way that every record is joined with the "previous" record?

I have a MS SQL table that contains stock data with the following columns: Id, Symbol, Date, Open, High, Low, Close.
I would like to self-join the table, so I can get a day-to-day % change for Close.
I must create a query that will join the table with itself in a way that every record contains also the data from the previous session (be aware, that I cannot use yesterday's date).
My idea is to do something like this:
select * from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and
t2.date = (select max(date) from quotes where symbol = t1.symbol and date < t1.date)
However I do not know if that's the correct/fastest way. What should I take into account when thinking about performance? (E.g. will putting UNIQUE index on a (Symbol, Date) pair improve performance?)
There will be around 100,000 new records every year in this table. I am using MS SQL Server 2008
One option is to use a recursive cte (if I'm understanding your requirements correctly):
WITH RNCTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) rn
FROM quotes
),
CTE AS (
SELECT symbol, date, rn, cast(0 as decimal(10,2)) perc, closed
FROM RNCTE
WHERE rn = 1
UNION ALL
SELECT r.symbol, r.date, r.rn, cast(c.closed/r.closed as decimal(10,2)) perc, r.closed
FROM CTE c
JOIN RNCTE r on c.symbol = r.symbol AND c.rn+1 = r.rn
)
SELECT * FROM CTE
ORDER BY symbol, date
If you need a running total for each symbol to use as the percentage change, it's easy enough to add an additional column for that amount. I wasn't completely sure what your intentions were, so the above just divides the previous closed amount by the current closed amount.
Something like this would work in SQLite:
SELECT ..
FROM quotes t1, quotes t2
WHERE t1.symbol = t2.symbol
AND t1.date < t2.date
GROUP BY t2.ID
HAVING t2.date = MIN(t2.date)
Given that SQLite is about as simple as SQL engines get, this may also work in MSSQL with minimal changes.
With an index on (symbol, date):
SELECT *
FROM quotes q_curr
CROSS APPLY (
SELECT TOP(1) *
FROM quotes
WHERE symbol = q_curr.symbol
AND date < q_curr.date
ORDER BY date DESC
) q_prev
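For readers who want to try the "previous session" lookup without a SQL Server instance, here is a minimal sketch using Python's sqlite3 module. SQLite has no CROSS APPLY, so the sketch swaps in a correlated MAX(date) subquery like the one in the question; the rows, the reduced column set, and the unique index name are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT, date TEXT, close REAL)")
# Invented sample data with a gap in trading days for XYZ.
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [
        ("ABC", "2013-01-04", 100.0),
        ("ABC", "2013-01-07", 110.0),
        ("ABC", "2013-01-08", 99.0),
        ("XYZ", "2013-01-04", 50.0),
        ("XYZ", "2013-01-08", 55.0),
    ],
)
# Assumption: a unique index on (symbol, date) supports the previous-row lookup.
conn.execute("CREATE UNIQUE INDEX ix_quotes_symbol_date ON quotes (symbol, date)")

rows = conn.execute(
    """
    SELECT cur.symbol, cur.date, cur.close, prev.close AS prev_close
    FROM quotes AS cur
    LEFT JOIN quotes AS prev
      ON prev.symbol = cur.symbol
     AND prev.date = (SELECT MAX(q.date)
                      FROM quotes AS q
                      WHERE q.symbol = cur.symbol AND q.date < cur.date)
    ORDER BY cur.symbol, cur.date
    """
).fetchall()

# Day-to-day percentage change; None for the first session of each symbol.
changes = [
    (symbol, date, None if prev is None else round((close - prev) * 100 / prev, 2))
    for symbol, date, close, prev in rows
]
print(changes)
```

The LEFT JOIN keeps the first session of each symbol, where no previous close exists.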
You can do something like this:
with OrderedQuotes as
(
select
row_number() over(order by Symbol, Date) RowNum,
ID,
Symbol,
Date,
[Open],
High,
Low,
[Close]
from Quotes
)
select
a.Symbol,
a.Date,
a.[Open],
a.High,
a.Low,
a.[Close],
b.Date PrevDate,
b.[Open] PrevOpen,
b.High PrevHigh,
b.Low PrevLow,
b.[Close] PrevClose,
(a.[Close] - b.[Close]) / b.[Close] PctChange
from OrderedQuotes a
join OrderedQuotes b on a.Symbol = b.Symbol and a.RowNum = b.RowNum + 1
If you change the last join to a left join you get a row for the first date for each symbol, not sure if you need that.
You can use a CTE with the ROW_NUMBER ranking function:
;WITH cte AS
(
SELECT symbol, date, [Open], [High], [Low], [Close],
ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY date) AS Id
FROM quotes
)
SELECT c1.Id, c1.symbol, c1.date, c1.[Open], c1.[High], c1.[Low], c1.[Close],
ISNULL(c2.[Close] / c1.[Close], 0) AS perc
FROM cte c1 LEFT JOIN cte c2 ON c1.symbol = c2.symbol AND c1.Id = c2.Id + 1
ORDER BY c1.symbol, c1.date
To improve performance (avoiding a sort and RID lookups), use this index:
CREATE INDEX ix_symbol$date_quotes ON quotes(symbol, date) INCLUDE([Open], [High], [Low], [Close])
What you had is fine. I don't know if translating the sub-query into the join will help. However, you asked for it, so the way to do it might be to join the table to itself once more.
select *
from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and t1.date > t2.date
left outer join quotes t3
on t1.symbol = t3.symbol and t3.date > t2.date and t3.date < t1.date
where t3.date is null
You could do something like this:
DECLARE @Today DATETIME
SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP))
;WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE],
DATEADD(DAY, -1, Date) AS yesterday
FROM quotes
WHERE date = @Today
)
SELECT *
FROM today
LEFT JOIN quotes yesterday ON today.Symbol = yesterday.Symbol
AND today.yesterday = yesterday.Date
That way you limit your "today" results, if that's an option.
EDIT: The CTEs listed in the other answers may work well, but I tend to be hesitant to use ROW_NUMBER when dealing with 100K rows or more. If the previous day may not always be yesterday, I prefer to pull the check for the previous day out into its own query and then use it for reference:
DECLARE @Today DATETIME, @PreviousDay DATETIME
SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP));
SELECT @PreviousDay = MAX(Date) FROM quotes WHERE Date < @Today;
WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE]
FROM quotes
WHERE date = @Today
)
SELECT *
FROM today
LEFT JOIN quotes AS previousday
ON today.Symbol = previousday.Symbol
AND previousday.Date = @PreviousDay