Fast way to query latest record?

I have a table of the sort:
USER | PLAN | START_DATE | END_DATE
1 | A | 20110101 | NULL
1 | B | 20100101 | 20101231
2 | A | 20100101 | 20100505
If END_DATE is null, that plan is currently active for the user.
What I want to query is:
(a) the plan the user currently has active, or (b) the latest plan the user was on. I need only one row returned for each given user.
I managed to do that using unions and subqueries, but the table is massive and these are not efficient enough.
Does anyone have a quicker way to query that?
Thanks,
[EDIT]
Most answers here return a single value. That was my bad. What I meant was to return a single value per user but all users at once. I've adapted the answers I could (and corrected the question) but just making it clear for future reference.

This question is a little hard to answer without further information about the data and the table. When you say in your comment that you have all the indexes that you need, what are these indexes?
Also, are the time periods abutting and non-overlapping? Can you just get the period with the latest START_DATE?
The problem with looking at END_DATE is that a normal B-tree index in Oracle doesn't store rows whose indexed columns are all NULL. So a predicate of the form where end_date is null is unlikely to use a single-column index on END_DATE. You could use a bitmap index on the column, as those do index NULLs, but that might not be ideal because of the other drawbacks of bitmap indexes.
For the reasons given above, I would probably use a query similar to the one below:
select user, plan, start_date, end_date
from (
select
user,
plan,
start_date,
end_date,
row_number() over (partition by user order by start_date desc) as row_num_1,
row_number() over (partition by user order by end_date desc nulls first) as row_num_2
from user_table
where user = :userid
)
where row_num_1 = 1
You could probably use either the row_num_1 or the row_num_2 column here depending on the exact requirements.
OR
select user, plan, start_date, end_date
from (
select
user,
plan,
start_date,
end_date
from user_table
where user = :userid
order by start_date desc
)
where rownum = 1
The first query should work whether you are trying to get all the users back or just one (drop the where user = :userid filter to get every user). The second query will only work with one user.
If you can augment the question with more details of the schema (indexes, meaning of the start/end date) you are likely to get better answers.
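To make the row_number() idea concrete, here is a minimal sketch that runs the same pattern against the three sample rows from the question. It is not Oracle: it uses Python's sqlite3 module (assuming SQLite 3.25 or later for window functions), the column is renamed user_id to avoid the reserved word, and `(end_date IS NULL) DESC` is my stand-in for Oracle's `nulls first`.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table (user_id INTEGER, plan TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_table VALUES (?, ?, ?, ?)",
    [
        (1, "A", "20110101", None),        # active plan: END_DATE is NULL
        (1, "B", "20100101", "20101231"),
        (2, "A", "20100101", "20100505"),
    ],
)

# Rank each user's rows so an open-ended plan (NULL end_date) comes first,
# followed by the most recently ended plan; keep only the top row per user.
LATEST_PLAN_SQL = """
SELECT user_id, plan, start_date, end_date
FROM (
    SELECT user_id, plan, start_date, end_date,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY (end_date IS NULL) DESC, end_date DESC
           ) AS rn
    FROM user_table
)
WHERE rn = 1
ORDER BY user_id
"""

latest_plans = conn.execute(LATEST_PLAN_SQL).fetchall()
for row in latest_plans:
    print(row)
```

User 1 comes back with the active plan A, and user 2 with the last plan that ended. Because there is no per-user filter, the whole table is read once, which is the point of the analytic approach over unions.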

CREATE TABLE XY
( USERID INTEGER NOT NULL
, PLAN VARCHAR2(8) NOT NULL
, START_DATE DATE NOT NULL
, END_DATE DATE )
TABLESPACE USERS;
INSERT INTO XY ( USERID, PLAN, START_DATE, END_DATE )
VALUES ( 1, 'A', To_Date('22-05-2011 00:00:00', 'DD-MM-YYYY HH24:MI:SS'), To_Date('22-05-2011 00:00:00', 'DD-MM-YYYY HH24:MI:SS') );
INSERT INTO XY ( USERID, PLAN, START_DATE, END_DATE )
VALUES ( 1, 'B', To_Date('01-04-2011 00:00:00', 'DD-MM-YYYY HH24:MI:SS'), NULL );
INSERT INTO XY ( USERID, PLAN, START_DATE, END_DATE )
VALUES ( 2, 'A', To_Date('03-05-2011 00:00:00', 'DD-MM-YYYY HH24:MI:SS'), To_Date('04-05-2011 00:00:00', 'DD-MM-YYYY HH24:MI:SS') );
INSERT INTO XY ( USERID, PLAN, START_DATE, END_DATE )
VALUES ( 2, 'B', To_Date('15-05-2011 00:00:00', 'DD-MM-YYYY HH24:MI:SS'), To_Date('20-05-2011 00:00:00', 'DD-MM-YYYY HH24:MI:SS') );
COMMIT WORK;
SELECT USERID, PLAN, END_DATE, START_DATE
FROM (SELECT USERID,
PLAN,
END_DATE,
START_DATE,
ROW_NUMBER() OVER(PARTITION BY USERID ORDER BY END_DATE DESC) SEQUEN
FROM XY)
WHERE SEQUEN < 2

This may help:
SELECT user,plan,end_date,start_date
FROM ( SELECT user,plan,end_date,start_date, DENSE_RANK() OVER ( PARTITION BY user
ORDER BY end_date DESC) sequen
FROM table_name
)
WHERE sequen = 1

Have you tried to limit the resultset with rownum?
select plan
from (
select plan
from YourTable
where User = 1
order by
case when end_date is null then '99991231' else end_date end desc
)
where rownum < 2

AFAIK, using CASE and subqueries can make your query very slow, so use them with care. How about:
SELECT User, Plan, start_Date, MAX(End_Date) FROM Plans WHERE User NOT IN
(SELECT User FROM Plans WHERE End_Date IS NULL)
GROUP BY Start_Date, Plan, User
UNION
SELECT User,Plan,Start_Date,End_Date FROM Plans WHERE End_Date IS NULL
I'm not an SQL guru, so consider this just a suggestion.
Hope this helps.

Does this work?
SELECT U.user
,(SELECT Plan FROM t WHERE t.user=u.user AND end_date IS NULL LIMIT 1) AS Current_Plan
,(SELECT Plan FROM t WHERE t.user=u.user AND end_date IS NOT NULL ORDER BY end_date DESC LIMIT 1) AS Last_Plan
FROM
( SELECT DISTINCT USER FROM t ) AS U
If it is slow, please send us the EXPLAIN output for the query.

How about this?
select PLAN
from USER_TABLE
where USER = 1
and (END_DATE is null or END_DATE = (
select max(END_DATE)
from USER_TABLE
where USER = 1 and END_DATE is not null))

I suggest the following :
with t as
(select 1 as col_id, 1 as USER_id, 'A' as PLAN , 20110101 as START_DATE, NULL as END_DATE from dual union all
select 2,1,'B', 20100101,20101231 from dual union all
select 3,2,'A', 20100102,20100505 from dual union all
select 4,2,'C', 20100101,20100102 from dual)
--
SELECT user_id, plan
FROM (SELECT user_id,
plan,
MAX(nvl(END_DATE, 99999999)) over(PARTITION BY user_id) max_date,
nvl(END_DATE, 99999999) END_DATE
FROM t)
WHERE max_date = end_date

Related

SQL - How to select latest available record for each date in a given range

I have a table (DATA_RECORDS) in a database which contains multiple records for the same date, but at different times, running from 2015-2018. What I am trying to do is select all records within a given date range and then select the latest record available for each date. The current code I have in SQL is:
SELECT NAME, DATE_LOADED, R_ID
FROM DATA_RECORDS
WHERE ((DATE_LOADED>=to_date('01/12/2018 00:00:00', 'dd/mm/yyyy HH24:MI:SS'))
AND (DATE_LOADED<=to_date('31/12/2018 23:59:59', 'dd/mm/yyyy HH24:MI:SS')))
ORDER BY DATE_LOADED DESC;
Where the column names are 'NAME','DATE_LOADED' and 'R_ID'.
The above gives the following results:
NAME |DATE_LOADED |R_ID
-------------------------------------
RECORD_1 |31/12/2018 17:36:38 |1234
RECORD_2 |31/12/2018 10:15:11 |1235
RECORD_3 |30/12/2018 16:45:23 |1236
RECORD_4 |30/12/2018 09:06:54 |1237
RECORD_5 |30/12/2018 07:53:30 |1238
etc... As you can see, there is also not a consistent number of uploads per day.
What I want is to select
NAME |DATE_LOADED |R_ID
-------------------------------------
RECORD_1 |31/12/2018 17:36:38 |1234
RECORD_3 |30/12/2018 16:45:23 |1236
I'm very new to SQL so any help would be appreciated.
N.B: I'm using Oracle SQL Developer and I only have read-only access to the database so I cannot create any new tables or modify the current table.

I would write this logic as:
SELECT NAME, DATE_LOADED, R_ID
FROM DATA_RECORDS
WHERE DATE_LOADED >= DATE '2018-12-01' AND
DATE_LOADED < DATE '2019-01-01'
ORDER BY DATE_LOADED DESC;
Then a simple method is ROW_NUMBER() -- along with extracting only the date from the date/time value:
SELECT NAME, DATE_LOADED, R_ID
FROM (SELECT NAME, DATE_LOADED, R_ID ,
ROW_NUMBER() OVER (PARTITION BY TRUNC(DATE_LOADED) ORDER BY DATE_LOADED DESC) as seqnum
FROM DATA_RECORDS
WHERE DATE_LOADED >= DATE '2018-12-01' AND
DATE_LOADED < DATE '2019-01-01'
) dr
WHERE seqnum = 1
ORDER BY DATE_LOADED DESC;
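As a quick check of the ROW_NUMBER() approach, here is a small sketch that loads the five sample rows from the question and keeps the latest record per day. It is only an illustration: it uses Python's sqlite3 (assuming SQLite 3.25 or later) rather than Oracle, stores the timestamps as ISO text, and uses SQLite's date() where Oracle would use TRUNC().

```python
import sqlite3

# In-memory stand-in for DATA_RECORDS, filled with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_records (name TEXT, date_loaded TEXT, r_id INTEGER)")
conn.executemany(
    "INSERT INTO data_records VALUES (?, ?, ?)",
    [
        ("RECORD_1", "2018-12-31 17:36:38", 1234),
        ("RECORD_2", "2018-12-31 10:15:11", 1235),
        ("RECORD_3", "2018-12-30 16:45:23", 1236),
        ("RECORD_4", "2018-12-30 09:06:54", 1237),
        ("RECORD_5", "2018-12-30 07:53:30", 1238),
    ],
)

# Number the rows within each calendar day, newest first, and keep row 1 of each day.
# The half-open range covers all of December 2018, including the last second of the 31st.
LATEST_PER_DAY_SQL = """
SELECT name, date_loaded, r_id
FROM (
    SELECT name, date_loaded, r_id,
           ROW_NUMBER() OVER (
               PARTITION BY date(date_loaded)
               ORDER BY date_loaded DESC
           ) AS seqnum
    FROM data_records
    WHERE date_loaded >= '2018-12-01' AND date_loaded < '2019-01-01'
)
WHERE seqnum = 1
ORDER BY date_loaded DESC
"""

latest_per_day = conn.execute(LATEST_PER_DAY_SQL).fetchall()
for row in latest_per_day:
    print(row)
```

This returns RECORD_1 and RECORD_3, matching the output the question asks for. It only reads the table, so it should be fine with read-only access.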

You can use a correlated subquery:
select * from tablename a where a.DATE_LOADED in
(select max(b.DATE_LOADED) from tablename b where trunc(a.DATE_LOADED)=trunc(b.DATE_LOADED)) and
((DATE_LOADED>=to_date('01/12/2018 00:00:00', 'dd/mm/yyyy HH24:MI:SS'))
AND (DATE_LOADED<=to_date('31/12/2018 23:59:59', 'dd/mm/yyyy HH24:MI:SS')))

union all in SQL (Postgres) mess the order

I have a query that is ordered by date. I have simplified it a bit, but it is basically:
select * from
(select start_date, to_char(end_date,'YYYY-mm-dd') as end_date from date_table
order by start_date ,end_date )
where start_date is null or end_date is null
It shows perfect order,
but when I add
union all
select start_date, 'single missing day' as end_date from
calendar_dates
where db_date>'2017-12-12' and db_date<'2018-05-13'
Then the whole order is messed up. Why does that happen? UNION or UNION ALL should just append the dataset from the second query to the first, right? It should not change the order of the first query, right?
I know this query doesn't make any sense, but I have simplified it to
show the syntax.

You can't predict what would be the order outcome by just assuming that UNION ALL will append queries in the order you write them.
The query planner will execute your queries in whatever order it sees it fit. That's why you have the ORDER BY clause. Use it !
For example, if you want to force the order of the first query, then the second, do :
select * from
(select 1, start_date, to_char(end_date,'YYYY-mm-dd') as end_date from date_table
order by start_date ,end_date )
where start_date is null or end_date is null
union all
select 2, start_date, 'single missing day' as end_date from
calendar_dates
where db_date>'2017-12-12' and db_date<'2018-05-13'
ORDER BY 1

You are mistaken. This query:
select d.*
from (select start_date, to_char(end_date,'YYYY-mm-dd') as end_date
from date_table
order by start_date, end_date
) d
where start_date is null or end_date is null
does not "show perfect order". It might just happen to produce the ordering that you want, but that is a coincidence. The only way to get results in a particular order is to use ORDER BY in the outermost SELECT. Period.
So, if you want results in a particular order, then use order by:
select d.*
from ((select d.start_date, to_char(end_date, 'YYYY-mm-dd') as end_date, 1 as ord
from date_table d
where d.start_date is null or d.end_date is null
order by start_date, end_date
) union all
(select cd.start_date, 'single missing day' as end_date, 2 as ord
from calendar_dates cd
where cd.db_date > '2017-12-12' and cd.db_date < '2018-05-13'
)
) d
order by ord, start_date;
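Here is a small runnable sketch of the "tag each branch, then sort in the outermost SELECT" idea. Everything in it is an assumption for illustration: the rows are made up, the tables only have the columns the query touches, and it runs on SQLite through Python's sqlite3 rather than on Postgres.

```python
import sqlite3

# Made-up data; only the columns used by the query are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_table (start_date TEXT, end_date TEXT)")
conn.execute("CREATE TABLE calendar_dates (start_date TEXT, db_date TEXT)")
conn.executemany(
    "INSERT INTO date_table VALUES (?, ?)",
    [
        ("2018-01-05", None),
        ("2018-01-02", None),
        ("2018-01-03", "2018-01-04"),  # filtered out: neither date is NULL
    ],
)
conn.executemany(
    "INSERT INTO calendar_dates VALUES (?, ?)",
    [
        ("2018-01-04", "2018-01-04"),
        ("2018-01-01", "2018-01-01"),
        ("2017-11-01", "2017-11-01"),  # filtered out: before the db_date window
    ],
)

# Each branch carries a constant 'ord' column; the only ORDER BY that matters
# is the one on the outermost SELECT.
ORDERED_UNION_SQL = """
SELECT start_date, end_date
FROM (
    SELECT start_date, end_date, 1 AS ord
    FROM date_table
    WHERE start_date IS NULL OR end_date IS NULL
    UNION ALL
    SELECT start_date, 'single missing day', 2
    FROM calendar_dates
    WHERE db_date > '2017-12-12' AND db_date < '2018-05-13'
) d
ORDER BY ord, start_date
"""

ordered_rows = conn.execute(ORDERED_UNION_SQL).fetchall()
for row in ordered_rows:
    print(row)
```

The rows from date_table always come first, sorted by start_date, followed by the calendar_dates rows, regardless of the order in which the rows were inserted or the branches are evaluated.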

UNION or UNION ALL does not preserve the order of the first SELECT. As a workaround, re-order the rows in the outer SELECT, as below:
SELECT * FROM
(
select colA, colB
From TableA
-- ORDER BY colA, colB --
UNION ALL
select colC, colD
FROM TableB
-- ORDER BY colC, colD --
) tb
ORDER BY colA, colB

SQL- how to retrieve by similar dates

Okay, so I have a table with a user_id column and a submitted_dtm column.
I want to find instances where users submitted multiple records within 1 day of each other, and count how many times that has happened.
I've tried something like
select * from table_t t where
(select count(*) from table_t t2 where
t.user_id = t2.user_id and
t.pk!=t2.pk and
t.submitted_dtm between t2.submitted_dtm-.5 and t2.submitted_dtm+.5)>0;
The problem is that this query returns a result for each record in a date group. Instead, I just want a result per date group. Ideally, I'd just get the count in that group.
That is, if I have 6 records:
user_id submitted_dtm
--------------------------
1 12/04/2017 1:15
1 12/04/2017 5:50
2 11/25/2017 2:00
2 11/25/2017 3:25
2 11/25/2017 6:05
2 10/06/2017 4:00
I want 2 results, a count of 2 and a count of 3.
Is it possible to do this in sql?

Following up on Dessma's answer.
select user_id, trunc(submitted_dtm), count(1)
from table_t
group by user_id, trunc(submitted_dtm)
having count(1) > 1;
Sqlfiddle
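For anyone who wants to try the GROUP BY / HAVING version without a database at hand, here is a minimal sketch using the six sample rows from the question. It runs on SQLite via Python's sqlite3, stores the timestamps as ISO text, and uses date() in place of Oracle's trunc(); those substitutions are mine. Note that it groups by calendar day, so two submissions either side of midnight would not be counted together.

```python
import sqlite3

# In-memory stand-in for table_t with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_t (user_id INTEGER, submitted_dtm TEXT)")
conn.executemany(
    "INSERT INTO table_t VALUES (?, ?)",
    [
        (1, "2017-12-04 01:15"),
        (1, "2017-12-04 05:50"),
        (2, "2017-11-25 02:00"),
        (2, "2017-11-25 03:25"),
        (2, "2017-11-25 06:05"),
        (2, "2017-10-06 04:00"),
    ],
)

# One output row per (user, calendar day) that has more than one submission.
SAME_DAY_SQL = """
SELECT user_id, date(submitted_dtm) AS submitted_day, COUNT(*) AS cnt
FROM table_t
GROUP BY user_id, date(submitted_dtm)
HAVING COUNT(*) > 1
ORDER BY user_id, submitted_day
"""

same_day_counts = conn.execute(SAME_DAY_SQL).fetchall()
for row in same_day_counts:
    print(row)
```

This gives the two results asked for: a count of 2 for user 1 and a count of 3 for user 2. The lone submission on 10/06/2017 is dropped by the HAVING clause.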

In Oracle 12.1 and higher, you can solve such problems easily with the match_recognize clause. Link to documentation (with examples) below; my only note about the solution below is that I left the date in DATE data type, especially important if the output is used in further computations. If it isn't, you can wrap within TO_CHAR() with whatever format model is appropriate for your users.
https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956
with
inputs ( user_id, submitted_dtm ) as (
select 1, to_date('12/04/2017 1:15', 'mm/dd/yyyy hh24:mi') from dual union all
select 1, to_date('12/04/2017 5:50', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 2:00', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 3:25', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 6:05', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('10/06/2017 4:00', 'mm/dd/yyyy hh24:mi') from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins below this line. Use your actual table and column names.
select user_id, submitted_dtm, cnt
from inputs
match_recognize(
partition by user_id
order by submitted_dtm
measures trunc(a.submitted_dtm) as submitted_dtm,
count(*) as cnt
pattern ( a b+ )
define b as trunc(submitted_dtm) = trunc(a.submitted_dtm)
);
USER_ID SUBMITTED_DTM CNT
---------- ------------------- ----------
1 2017-12-04 00:00:00 2
2 2017-11-25 00:00:00 3

I don't have data to test it, but I suspect something like this would do the trick:
SELECT t.user_id,To_char(t.submitted_dtm, 'dd/mm/yyyy'), COUNT(*)
FROM table_t t
INNER JOIN table_t t2
ON t.user_id = t2.user_id
AND t.pk != t2.pk
AND t.submitted_dtm BETWEEN t2.submitted_dtm - .5 AND
t2.submitted_dtm + .5
GROUP BY t.user_id,To_char(t.submitted_dtm, 'dd/mm/yyyy')
HAVING COUNT(*) > 1

This is a general idea of how to get the instances.
select user_id, t1.submitted_dtm t1submitted, t2.submitted_dtm t2submitted
from table_t t1 join table_t t2 using (user_id)
where t2.submitted_dtm > t1.submitted_dtm
and t2.submitted_dtm - t1.submitted_dtm <= 1;
The last line could be modified somehow depending on what you mean by within a day.
To count the instances, create a derived table from the above and select count(*) from it.

Is it possible to combine these two queries together other than using temp tables?

Query 1:
SELECT MAX(START_DATE) AS HIGHEST_DT
FROM T;
Query 2:
SELECT
START_DATE AS LOWER_DT
FROM T
WHERE END_DATE = HIGHEST_DT;
I'm hoping to get something like
START_DATE HIGHEST_DT

So, it seems you have a table t with two columns, start_date and end_date (and maybe more columns); you want to find the most recent (max) start_date, and then to find all the rows where the end_date is equal to this max(start_date), right?
One way is (not tested since you didn't provide test data):
select start_date as lower_dt, highest_dt
from (select start_date, end_date, max(start_date) over () as highest_dt
from t)
where end_date = highest_dt;
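Since no test data was provided, here is a tiny sketch with made-up rows that shows what the max(...) over () version returns. The three rows and their dates are purely hypothetical, and it runs on SQLite through Python's sqlite3 (assuming SQLite 3.25 or later) instead of Oracle.

```python
import sqlite3

# Hypothetical rows for table T; the real data was not given in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2020-01-01", "2020-03-01"),
        ("2020-02-01", "2020-03-01"),
        ("2020-03-01", "2020-04-01"),  # holds the highest start_date
    ],
)

# MAX(start_date) OVER () attaches the table-wide maximum to every row,
# so the outer query can compare end_date against it in a single pass.
COMBINED_SQL = """
SELECT start_date AS lower_dt, highest_dt
FROM (
    SELECT start_date, end_date, MAX(start_date) OVER () AS highest_dt
    FROM t
)
WHERE end_date = highest_dt
ORDER BY lower_dt
"""

combined = conn.execute(COMBINED_SQL).fetchall()
for row in combined:
    print(row)
```

With these rows the highest start_date is 2020-03-01, and the two rows ending on that date are returned alongside it, so no temp table is needed.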

Here is one way using Sub-Query
SELECT
START_DATE AS LOWER_DT,END_DATE as HIGHEST_DT
FROM T
WHERE END_DATE = (SELECT MAX(START_DATE) FROM T)

I need a little help to optimize

I want to optimize this query, but only using index, hints, clusters and pctfree and pctused. Thanks.
WITH
A AS (SELECT SSN from contracts where (end_date is null or end_date>sysdate)),
B AS (SELECT SSN,start_date, NVL(end_date,sysdate) finish,
(NVL(end_date,sysdate)-start_date) length
FROM CONTRACTS NATURAL JOIN A)
SELECT SSN
FROM B
GROUP BY SSN HAVING (Max(finish)-MIN(start_date)) > SUM(length)

You should be able to get rid of the join by using an analytic query:
SELECT SSN
FROM (
SELECT SSN,
start_date,
NVL( end_date, SYSDATE ) finish,
COUNT( CASE WHEN end_date IS NULL OR end_date > SYSDATE THEN 1 END )
OVER ( PARTITION BY SSN ) AS has_invalid_end_date
FROM contracts
)
WHERE has_invalid_end_date > 0
GROUP BY SSN
HAVING MAX( finish ) - MIN( start_date ) > SUM( finish - start_date );

I think you could just rewrite this as:
with b as (select ssn,
start_date,
end_date,
nvl(end_date, sysdate) finish_date,
nvl(end_date, sysdate) - start_date duration
from contracts)
select ssn
from b
where end_date is null
or end_date > sysdate
group by ssn
having max(finish_date) - min(start_date) > sum(duration);
You might also benefit from having an index on (ssn, start_date, end_date).
