Quickly querying rows with effective dates - SQL

Suppose I have a table called measurement. This table's purpose is to record a numeric "value" (itself calculated from other data) for a "series_id" at a particular "date".
Now let's add effective dating to this table with "effective_start" (inclusive) and "effective_end" (inclusive) fields.
DDL:
CREATE TABLE public.measurement
(
date date NOT NULL,
effective_end date NOT NULL,
effective_start date NOT NULL,
series_id character varying(255) COLLATE pg_catalog."default" NOT NULL,
value numeric,
CONSTRAINT measurement_pkey PRIMARY KEY (date, effective_end, effective_start, series_id)
)
My challenge is to construct a query that is fast and SQL-only (I have Java code and a partial query that solve this) and that returns the following:
For all series, at a particular date in time (a query parameter), return the most recent measurement (maximum "date") that was effective at the date being queried.
My current "all-SQL" solution is a view, combined with a query over the view:
DDL for the view:
CREATE OR REPLACE VIEW public.known_at AS
SELECT o.date,
o.effective_end,
o.effective_start,
o.series_id,
o.value
FROM measurement o
JOIN ( SELECT o_1.series_id,
min(o_1.effective_start) AS effective_start,
o_1.date
FROM measurement o_1
GROUP BY o_1.series_id, o_1.date) x ON o.series_id::text = x.series_id::text AND o.effective_start = x.effective_start AND o.date = x.date
JOIN ( SELECT o_1.series_id,
o_1.effective_start,
max(o_1.date) AS date
FROM measurement o_1
GROUP BY o_1.series_id, o_1.effective_start) y ON x.series_id::text = y.series_id::text AND x.effective_start = y.effective_start AND x.date = y.date
WHERE o.date <= o.effective_start
ORDER BY o.date DESC, o.series_id DESC;
Query:
select k.* from known_at k
inner join (
select
k.series_id,
max(k.date) as date
from known_at k
-- the passed in date here is a parameter as described above
where k.date <= '2020-03-26'
group by k.series_id) as mx
on k.series_id = mx.series_id and k.date = mx.date
order by k.series_id;
Unfortunately, the combination of the view and the query is slow (~400 ms) despite btree indices on series_id, date, effective_end, and effective_start. How can I do better?

I think this query should give you the results you want, though without having your dataset it's hard to say what its performance would be like. For this query, I'd recommend a multi-column index on (effective_start, effective_end, series_id, date DESC).
SELECT DISTINCT ON (series_id) *
FROM measurement
WHERE effective_start <= '2020-03-26' -- the passed-in date
AND effective_end >= '2020-03-26' -- the passed-in date
ORDER BY series_id, date DESC;
Explanation: the query filters for rows whose effective period contains the passed-in date; then, for each series_id among the filtered rows, the row with the maximum date is taken.
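For reference, that recommendation as DDL (a sketch; the index name is made up):
-- Column order follows the recommendation above: the two range predicates
-- first, then series_id and date DESC to serve the DISTINCT ON / ORDER BY.
CREATE INDEX measurement_effective_idx
ON public.measurement (effective_start, effective_end, series_id, date DESC);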
Also, you may want to consider using a daterange type for the effective dates. Range types come with some useful range operators.
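For example, with a single daterange column (a hypothetical "effective" column replacing the two date columns), the filter collapses into one containment test:
-- Sketch of a schema variant: one daterange column instead of two dates.
-- '@>' tests containment; a GiST index on "effective" supports this operator.
SELECT DISTINCT ON (series_id) *
FROM measurement
WHERE effective @> DATE '2020-03-26'
ORDER BY series_id, date DESC;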

Related

Query to return the matching or nearest previous record by a list of dates

I have a table of records ordered by date. There is a maximum of 1 record per day, but some days there is no record (weekends and bank holidays).
When I query a record by date, if no record exists for that day I am interested in the previous record by date. Eg:
SELECT * FROM rates WHERE date <= $mydate ORDER BY date DESC LIMIT 1;
Given a list of dates, how would I construct a query to return multiple records matching the exact or closest previous record for each date? Is this possible to achieve in a single query?
The array of dates may be spread over a large time frame, but I wouldn't necessarily want every record in the entire time span (e.g. querying 20 dates spread over a year-long span).
You can construct the dates as a derived table and then use SQL logic. A lateral join is convenient:
select v.dte, r.*
from (values ($date1), ($date2), ($date3)
) v(dte) left join lateral
(select r.*
from rates r
where r.date <= v.dte
order by r.date desc
limit 1
) r
on 1=1;
You might find it useful to pass the dates in as an array and use unnest() on that array, as sketched below.
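A sketch of the array variant (assuming the dates arrive as a single date[] parameter; literal values are shown here):
-- unnest() expands the array into a derived table of dates,
-- then the same lateral join picks the nearest previous record.
select v.dte, r.*
from unnest(array['2020-01-06', '2020-03-26']::date[]) v(dte)
left join lateral
(select r.*
from rates r
where r.date <= v.dte
order by r.date desc
limit 1
) r on true;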

Postgres: Unable to determine percent of successful events ending in a completed trip

SQL Gurus,
I'm trying to solve this challenging problem while practicing my SQL skills; however, I'm stuck and would appreciate it if someone could help.
A signup is defined as an event labelled 'sign_up_success' within the events table. For each city ('A' and 'B') and each day of the week, determine the percentage of signups in the first week of 2016 that resulted in a completed trip within 10 hours of the sign-up date.
Table Name: trips
Column Name     Datatype
id              integer
client_id       integer (foreign keyed to events.rider_id)
driver_id       integer
city_id         integer (foreign keyed to cities.city_id)
client_rating   integer
driver_rating   integer
request_at      timestamp with time zone
predicted_eta   integer
actual_eta      integer
status          enum('completed', 'cancelled_by_driver', 'cancelled_by_client')
Table Name: cities
Column Name     Datatype
city_id         integer
city_name       string
Table Name: events
Column Name     Datatype
device_id       integer
rider_id        integer
city_id         integer
event_name      enum('sign_up_success', 'attempted_sign_up', 'sign_up_failure')
_ts             timestamp with time zone
I tried something along these lines, however it's nowhere near the expected answer:
SELECT *
FROM trips AS trips
LEFT JOIN cities AS cities ON trips.city_id = cities.city_id
LEFT JOIN events AS events ON events.client_id = events.rider_id
WHERE events.event_name = "sign_up_success"
AND Convert(datetime, trips.request_at) <= Convert(datetime, '2016-01-07')
AND DATEDIFF(d, Convert(datetime, events._ts), Convert(datetime, trips.request_at)) < 7 days
AND events.status = "completed"
Desired results look like this:
Monday  A x%
Monday  B y%
Tuesday A z%
Tuesday B p%
Can someone please help?
First of all, I assume that "trips"."city_id" is mandatory, so I use INNER JOIN instead of LEFT JOIN when joining with cities.
Then, to specify string constants, you need to use single quotes.
There are some other changes in the query -- hope you'll notice them yourself.
Also, the query might fail, since I didn't actually run it (you didn't provide boilerplate SQL, unfortunately).
The date_trunc() function with 'week' as its first parameter converts your timestamp to the first day of the corresponding week, time 00:00:00, based on your current timezone settings (see https://www.postgresql.org/docs/current/static/functions-datetime.html).
I used GROUP BY on that value, and the second "layer" of grouping was the city ID.
Also, I used "filter (where ...)" next to count() -- it allows counting only the desired rows.
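A quick standalone illustration of those two building blocks, using made-up inline data:
-- date_trunc('week', ...) snaps a timestamp to the Monday of its week;
-- count(*) filter (where ...) counts only the rows matching the condition.
select
date_trunc('week', timestamp '2016-01-06 15:30:00') as week_start, -- 2016-01-04 00:00:00
count(*) as total_rows, -- 3
count(*) filter (where v > 0) as positive_rows -- 2
from (values (1), (0), (2)) as t(v);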
Finally, I used CTE to improve the query's structure and readability.
Let me know if it fails and I'll fix it. In general, this approach should work.
with data as (
select
left(date_trunc('week', t.request_at)::text, 10) as period,
c.city_id,
count(distinct t.id) as trips_count,
count(*) filter (
where
e.event_name = 'sign_up_success'
and t.status = 'completed' -- only signups that ended in a completed trip
and t.request_at < e._ts + interval '10 hour' -- the trip was requested within 10 hours of the sign-up
) as successes_count
from trips as t
join cities as c on t.city_id = c.city_id
left join events as e on t.client_id = e.rider_id and e._ts <= t.request_at -- completes the dangling condition: the sign-up precedes the trip
where
t.request_at between '2016-01-01' and '2016-01-08'
group by 1, 2
)
select
*,
round(100 * successes_count::numeric / trips_count, 2)::text || '%' as ratio_percent
from data
order by period, city_id
;

Creating an Oracle index on a date column

SELECT FILE_SUB_RET_DATE_TIME
FROM
(SELECT Y.FILE_SUB_RET_DATE_TIME,
ROW_NUMBER() OVER (partition by Y.WR_FILE_TRANS_INFO_ID order by Y.FILE_SUB_RET_DATE_TIME DESC) rowByID
FROM DPDBA.WORK_REQUEST_FILE_TRANS_AUDIT Y
WHERE Y.FILE_EVENT_TYPE = 'SUBMISSION'
AND Y.FILE_SUBMT_RETRL_STATUS = 'LEVEL1 POSTED'
AND Y.FILE_SUB_RET_DATE_TIME BETWEEN '11-DEC-2015' AND '03-FEB-2017')
WHERE rowByID = 1;
I have a performance issue and we need to add an index for this date column. I'm looking for help on whether a plain index will do or whether anything more is needed.
You should not use strings when comparing with DATE values, because the implicit conversion depends on the current session's NLS settings. Use DATE literals or TO_DATE() (respectively TIMESTAMP literals and TO_TIMESTAMP()).
Whether Oracle will use an index on the FILE_SUB_RET_DATE_TIME column depends on your data; post the execution plan.
I don't think the subquery is required in your case; this query should return the same result.
SELECT Max(FILE_SUB_RET_DATE_TIME)
FROM DPDBA.WORK_REQUEST_FILE_TRANS_AUDIT Y
WHERE Y.FILE_EVENT_TYPE = 'SUBMISSION'
AND Y.FILE_SUBMT_RETRL_STATUS = 'LEVEL1 POSTED'
AND Y.FILE_SUB_RET_DATE_TIME BETWEEN DATE '2015-12-11' AND DATE '2017-02-03'
GROUP BY WR_FILE_TRANS_INFO_ID;
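If the plan shows a full table scan, a composite index covering the two equality filters plus the date column is a reasonable starting point. A minimal sketch (the index name is made up; verify the column order against your actual execution plan):
CREATE INDEX WRFTA_EVT_STATUS_DATE_IX
ON DPDBA.WORK_REQUEST_FILE_TRANS_AUDIT
(FILE_EVENT_TYPE, FILE_SUBMT_RETRL_STATUS, FILE_SUB_RET_DATE_TIME);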

Find closest date in SQL Server

I have a table dbo.X with DateTime column Y which may have hundreds of records.
My stored procedure has a parameter @CurrentDate; I want to find the date in column Y of table dbo.X above which is less than and closest to @CurrentDate.
How to find it?
The WHERE clause matches all rows with a date less than @CurrentDate and, since they are ordered descending, TOP 1 returns the date closest to the current date.
SELECT TOP 1 *
FROM x
WHERE x.date < @CurrentDate
ORDER BY x.date DESC
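If this runs frequently, an index on the date column lets SQL Server satisfy the TOP 1 with an index seek rather than a scan. A minimal sketch (hypothetical index name; table and column as in the query above):
CREATE INDEX IX_x_date ON x ([date] DESC);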
Use DATEDIFF and order your results by how many days or seconds lie between each date and the input.
Something like this:
select top 1 rowId, dateCol, datediff(second, @CurrentDate, dateCol) as SecondsBetweenDates
from myTable
where dateCol < @CurrentDate
order by datediff(second, @CurrentDate, dateCol) desc -- least negative difference = closest earlier date
I think I have a better solution for this problem.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed, and as a result some rates have zero values. I want to fill in the missing data with rates for the same currency that are closest in time to the missing rate. Basically, I want to get the RateId of the nearest non-zero rate, which I will then substitute (the substitution itself is not shown here).
1) To start off, let's identify the missing rates:
Query showing the missing rates, i.e. rows with a rate value of zero
2) Next, let's identify the rates that are not missing.
Query showing the rates that are not missing
3) This query is where the magic happens. I have made an assumption here, which can be removed but was added to improve the efficiency/performance of the query: the join on the cast dates expects to find a substitute transaction on the same day as the missing/zero transaction.
The magic is the ROW_NUMBER() call: it assigns 1 to the non-missing transaction with the shortest time difference from the missing one; the next closest transaction gets a RowNum of 2, and so on.
Please note that I must also join on the currency so that I do not mismatch currency types. That is, I don't want to substitute an AUD rate with CHF values; I want the closest match within the same currency.
Combining the two data sets with a row_number to identify nearest transaction
4) Finally, let's get the data where RowNum is 1.
The final query
The full query is as follows:
; with cte_zero_rates as
(
Select *
from fxrates
where spot_exp = 0
),
cte_non_zero_rates as
(
Select *
from fxrates
where spot_exp > 0
)
,cte_Nearest_Transaction as
(
select z.FXRatesID as Zero_FXRatesID
,z.importDate as Zero_importDate
,z.currency as Zero_Currency
,nz.currency as NonZero_Currency
,nz.FXRatesID as NonZero_FXRatesID
,nz.spot_imp
,nz.importDate as NonZero_importDate
,DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifference
,ROW_NUMBER() Over(partition by z.FXRatesID order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
from cte_zero_rates z
left join cte_non_zero_rates nz on nz.currency = z.currency
and cast(nz.importDate as date) = cast(z.importDate as date)
--order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID
,n.Zero_Currency
,n.Zero_importDate
,n.NonZero_importDate
,DATEDIFF(s, n.NonZero_importDate,n.Zero_importDate) as Delay_In_Seconds
,n.NonZero_Currency
,n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate

Query aggregate faster than MAX

I have a fairly large table in which one of the columns is a date column. The query I execute is as follows.
select max(date) from tbl where date < to_date('10/01/2010','MM/DD/YYYY')
That is, I want to find the value closest to, and less than, a particular date. This takes considerable time because of the MAX over the large table. Is there a faster way to do this, maybe using LAST_VALUE?
Put an index on the date column and the query should be plenty fast.
1) Add an index to the date column. Simply put, an index lets the database engine find the rows matching a clause without scanning the whole table, which speeds up most queries that filter or sort on that column (a sketch follows after this list). Info here: http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm
2) Consider adding a second clause to the query. You have where date < to_date('10/01/2010','MM/DD/YYYY') now, why not change it to:
where date < to_date('10/01/2010','MM/DD/YYYY') and date > to_date('09/30/2010', 'MM/DD/YYYY')
since this will reduce the number of scanned rows.
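A minimal sketch of the index from point 1 (hypothetical index name; tbl and its date column are taken from the question):
-- A plain b-tree index; Oracle can then answer MAX(date) with a
-- MIN/MAX index range scan instead of a full table scan.
CREATE INDEX tbl_date_idx ON tbl (date);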
Try
select date from (
select date from tbl where date < to_date('10/01/2010','MM/DD/YYYY') order by date desc
) where rownum = 1