How to select within the closest thing to Y in SAS as a starting point - sql

I have a dataset from which I want to select the records closest to point X for my output.
What I have is:
PROC SQL ;
create table Check_vs_Excel2 as
SELECT PROPERTY, START_DATE, END_DATE, DAY_OF_WEEK, MARKET_CODE_PREFIX, RATE_PGM, ROOM_POOL, QUOTE_SERIES_NO, QUOTE_POSITION
FROM Sbtddraf.Vssmauditdraftfull
group by Property, RATE_PGM
having START_DATE = MAX(START_DATE);
quit;
I want to take the condition START_DATE = MAX(START_DATE) and change it to something which is (effectively)
having START_DATE = close to(TODAY())
Advice would be much appreciated

In standard SQL, your query would use a correlated subquery:
SELECT PROPERTY, START_DATE, END_DATE, DAY_OF_WEEK, MARKET_CODE_PREFIX, RATE_PGM, ROOM_POOL, QUOTE_SERIES_NO, QUOTE_POSITION
FROM Sbtddraf.Vssmauditdraftfull AS t1
-- group by Property, RATE_PGM
WHERE START_DATE =
( select MAX(START_DATE)
FROM Sbtddraf.Vssmauditdraftfull AS t2
where t1.Property = t2.Property
and t1.RATE_PGM = t2.RATE_PGM
)

How close is 'close to'? This might be what you are looking for (SAS dates are stored as day counts, so adding 30 means 30 days):
where START_DATE between TODAY() and TODAY() + 30

Assuming I understand that you want the row that has the minimum absolute difference between start_date and today() (so, MIN(ABS(START_DATE-TODAY()))), you can do a somewhat messy query using the having clause this way:
data have;
do id = 2 to 9;
do start_date = '02MAR2016'd to '31MAR2016'd by id;
output;
end;
end;
run;
proc sql;
select id, start_date format=date9.
from have
group by id
having abs(start_date-today()) = min(abs(start_date-today()));
quit;
I don't like this, in part because it's non-standard SQL and produces a note about remerging data (the HAVING clause uses a row-level value that isn't really available after GROUP BY, so SAS has to remerge the summary statistic back onto the detail rows), and in part because it returns multiple rows when two dates are tied (see id=4 if you run this on 3/16/2016).
A correlated subquery version, which at least avoids the remerging note (but actually does effectively the same thing):
proc sql;
select id, start_date format=date9.
from have H
where abs(start_date-today()) = (
select min(abs(start_date-today()))
from have V
where H.id=V.id
);
quit;
This still returns two rows for id=4 (on 3/16/2016), so you need a tie-breaking rule if two dates can be equally close (or perhaps you want only dates strictly before today?). The subquery determines the smallest difference for each id, and the outer query returns the rows that match it.
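One possible way to settle ties is to keep the correlated subquery and then take the earliest of the tied dates per id. The sketch below is an illustration only: it uses Python's sqlite3 module rather than SAS, stores dates as day numbers (similar in spirit to SAS numeric dates), and fixes "today" at 16MAR2016 so the result is reproducible. The choice of "earliest date wins" is an assumption, not something the question specifies.

```python
import sqlite3
from datetime import date

# Fixed "today" so the example is reproducible; SAS would call today().
TODAY = date(2016, 3, 16).toordinal()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, start_date INTEGER)")

# Same shape as the HAVE data step: for each id from 2 to 9,
# dates from 02MAR2016 to 31MAR2016 in steps of id days.
first, last = date(2016, 3, 2).toordinal(), date(2016, 3, 31).toordinal()
rows = [(i, d) for i in range(2, 10) for d in range(first, last + 1, i)]
conn.executemany("INSERT INTO have VALUES (?, ?)", rows)

query = """
SELECT h.id, MIN(h.start_date) AS start_date   -- tie-break: earliest date wins
FROM have AS h
WHERE ABS(h.start_date - :today) = (
    SELECT MIN(ABS(v.start_date - :today))
    FROM have AS v
    WHERE v.id = h.id
)
GROUP BY h.id
ORDER BY h.id
"""
closest = {i: date.fromordinal(d) for i, d in conn.execute(query, {"today": TODAY})}
print(closest[4])  # one row for id=4 instead of two
```

Swapping MIN for MAX in the outer select would prefer the later of two tied dates instead.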

Related

tdate issue I'm facing in SQL query

I am fetching a count from a table using the following queries:
Select count(*)
from tab
where tdate = '17-05-19' ---> output 0
or
Select count(*)
from tab
where trunc(tdate) = '17-05-19' ---->output 0
If I use:
Select count(*)
from tab
where tdate >sysdate - 1 ---> it returns some count(yesterday+some of the today txn)
But here I want only yesterday txn whenever I fire this query.
You may use this.
Select count (*) from tab where
tdate >= TRUNC(SYSDATE) - 1
AND tdate < TRUNC(SYSDATE)
The advantage of this over applying TRUNC to the date column is that it can use an index on tdate, if one exists.
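The half-open range above (from the start of yesterday up to, but not including, the start of today) can be tried out with a small sketch. This is an illustration in Python's sqlite3, not Oracle: timestamps are stored as ISO text, the bounds are computed in Python instead of with TRUNC(SYSDATE), and the helper name count_yesterday and the sample rows are made up for the example.

```python
import sqlite3
from datetime import datetime, timedelta

def count_yesterday(conn, now):
    """Count rows whose tdate falls on the calendar day before `now`."""
    today_start = datetime(now.year, now.month, now.day)  # plays the role of TRUNC(SYSDATE)
    yesterday_start = today_start - timedelta(days=1)     # plays the role of TRUNC(SYSDATE) - 1
    sql = "SELECT COUNT(*) FROM tab WHERE tdate >= ? AND tdate < ?"
    params = (yesterday_start.isoformat(sep=" "), today_start.isoformat(sep=" "))
    return conn.execute(sql, params).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (tdate TEXT)")
conn.executemany("INSERT INTO tab VALUES (?)", [
    ("2019-05-16 23:59:59",),  # two days back: excluded
    ("2019-05-17 00:00:00",),  # first instant of yesterday: included
    ("2019-05-17 13:45:10",),  # yesterday: included
    ("2019-05-18 00:00:00",),  # first instant of today: excluded
    ("2019-05-18 09:30:00",),  # today: excluded
])

yesterday_count = count_yesterday(conn, datetime(2019, 5, 18, 10, 0, 0))
print(yesterday_count)
```

Because the column itself is never wrapped in a function, the same predicate shape stays index-friendly in databases that have an ordinary index on the column.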
If you had tried
Select count(*) from tab where trunc(tdate) = date'2019-05-17'
you would get some results, provided you have data for 17 May. Alternatively, you could format the column with the to_char function:
Select count(*) from tab where to_char(tdate,'dd-mm-yy') = '17-05-19'
or, to get only the data for the previous day:
Select count(*) from tab where trunc(tdate) = trunc(sysdate)-1
So you need to write your literal as date'2019-05-17' (known as a date literal), especially in Oracle; in MySQL, for example, it can be written as '2019-05-17' without the date keyword.
By the way, the trunc function removes the time part of a date column value, leaving only the date portion.
If the table is large enough that performance matters, you can also create a function-based index on trunc(tdate).

sql query to get today new records compared with yesterday

I have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new IDs from today compared with yesterday (the IDs from today that are not present yesterday).
This needs to be done with just one query and with maximum efficiency, because the table will have 4-5 million records.
As a Java developer I am able to do this with two queries, but doing it with just one is beyond my knowledge, so any help would be much appreciated.
EDIT: the date format is dd/mm/yyyy, and each ID may appear 0 or 1 times per day.
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
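To see the GROUP BY / HAVING MIN idea on concrete rows, here is a small sketch. It runs in Python's sqlite3 rather than Oracle, stores DT as ISO date text, and passes fixed "today" and "yesterday" values as parameters instead of trunc(sysdate); the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (cod INTEGER PRIMARY KEY, id TEXT, dt TEXT)")
conn.executemany("INSERT INTO your_table (id, dt) VALUES (?, ?)", [
    ("A", "2024-01-09"),  # two days ago: outside the window, ignored
    ("A", "2024-01-10"),  # yesterday
    ("A", "2024-01-11"),  # today, but also seen yesterday -> not new
    ("B", "2024-01-11"),  # today only -> new
    ("C", "2024-01-10"),  # yesterday only -> not in today's set
    ("D", "2024-01-09"),  # two days ago
    ("D", "2024-01-11"),  # today, absent yesterday -> new
])

sql = """
SELECT id
FROM your_table
WHERE dt IN (:today, :yesterday)   -- one pass over the two days of interest
GROUP BY id
HAVING MIN(dt) = :today            -- earliest row in the window is today's
ORDER BY id
"""
new_ids = [row[0] for row in conn.execute(sql, {"today": "2024-01-11",
                                                "yesterday": "2024-01-10"})]
print(new_ids)
```

Note that an ID such as D, which appeared two days ago but not yesterday, counts as new under this definition, which matches the "not present yesterday" wording of the question.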
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function-based index. Without it, the query would have to full scan the table, which would probably be quite slow.
create index idx on mytable( trunc(date) , id );

Creation of Oracle index date column for Oracle

SELECT FILE_SUB_RET_DATE_TIME
FROM
(SELECT Y.FILE_SUB_RET_DATE_TIME,
ROW_NUMBER() OVER (partition by Y.WR_FILE_TRANS_INFO_ID order by Y.FILE_SUB_RET_DATE_TIME DESC) rowByID
FROM DPDBA.WORK_REQUEST_FILE_TRANS_AUDIT Y
WHERE Y.FILE_EVENT_TYPE = 'SUBMISSION'
AND Y.FILE_SUBMT_RETRL_STATUS = 'LEVEL1 POSTED'
AND Y.FILE_SUB_RET_DATE_TIME BETWEEN '11-DEC-2015' AND '03-FEB-2017')
WHERE rowByID = 1;
I have a performance issue with this query, and we need to add an index on this date column. I am looking for help on whether a plain index will do or whether something more is needed.
You should not compare DATE values with strings, because the result depends on the current session's NLS settings. Use a DATE literal or TO_DATE() instead (or a TIMESTAMP literal or TO_TIMESTAMP(), respectively, for timestamp values).
Whether Oracle will use an index on the FILE_SUB_RET_DATE_TIME column depends on your data; please post the execution plan.
I don't think the subquery is required in your case; this query should return the same result:
SELECT Max(FILE_SUB_RET_DATE_TIME)
FROM DPDBA.WORK_REQUEST_FILE_TRANS_AUDIT Y
WHERE Y.FILE_EVENT_TYPE = 'SUBMISSION'
AND Y.FILE_SUBMT_RETRL_STATUS = 'LEVEL1 POSTED'
AND Y.FILE_SUB_RET_DATE_TIME BETWEEN DATE '2015-12-11' AND DATE '2017-02-03'
GROUP BY WR_FILE_TRANS_INFO_ID;

Remove overlapping days And arrange dates in SQL

I have a table with the following data
Start End
===== ===
12/21/2011 12/20/2012
05/05/2012 10/20/2013
12/21/2012 12/20/2013
12/21/2013 12/20/2014
12/21/2014 12/20/2015
And want to get the following results
Start End
===== ===
12/21/2011 05/04/2012
05/05/2012 10/20/2013
10/21/2013 12/20/2013
12/21/2013 12/20/2014
12/21/2014 12/20/2015
Any ideas on where to start? A lot of the reading I've done suggests I need to create an entry for each single day, remove the overlapping days, and then rebuild the date ranges accordingly. Is this the only way?
I think that this kind of problem is better solved with some sort of procedural approach, also for the sake of readability. Nevertheless, for fun, I figured out an SQL statement that does the trick with the aid of the ROWNUM pseudocolumn (in Oracle syntax, as I had no SQL Server database at hand):
Let's assume your table is called DATE_TABLE with columns START_DATE and END_DATE. Then the statement is as follows:
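-- ROWNUM is assigned before ORDER BY within the same query block, so sort in a nested subquery first
select start_date,
coalesce(
(select case when tbl_inner.start_date < tbl_outer.end_date
then tbl_inner.start_date - 1
else tbl_outer.end_date end
from (select rownum row_num, start_date, end_date
from (select start_date, end_date from date_table order by start_date)) tbl_inner
where tbl_inner.row_num = tbl_outer.row_num + 1),
tbl_outer.end_date) end_date
from (select rownum row_num, start_date, end_date
from (select start_date, end_date from date_table order by start_date)) tbl_outer;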
The inner select provides the rows of the table DATE_TABLE with row numbers that can be referenced by the outer select. Without the COALESCE clause, the statement would not work for the last row in the DATE_TABLE.
I presume that the statement does not scale too well.
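For comparison, the procedural approach mentioned above could be sketched as follows. This is only an illustration in Python, under the assumption that the rule is the same one the SQL statement encodes: sort by start date, and when a period's successor starts before the period ends, cut the period's end back to the day before the successor's start. The function name trim_overlaps is made up for the example.

```python
from datetime import date, timedelta

def trim_overlaps(periods):
    """Sort (start, end) periods by start and shorten any period
    that runs into the start of the next one."""
    ordered = sorted(periods)
    result = []
    for i, (start, end) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][0]
            if next_start < end:
                # Overlap: end this period the day before the next begins.
                end = next_start - timedelta(days=1)
        result.append((start, end))
    return result

periods = [
    (date(2011, 12, 21), date(2012, 12, 20)),
    (date(2012, 5, 5), date(2013, 10, 20)),
    (date(2012, 12, 21), date(2013, 12, 20)),
    (date(2013, 12, 21), date(2014, 12, 20)),
    (date(2014, 12, 21), date(2015, 12, 20)),
]
trimmed = trim_overlaps(periods)
for start, end in trimmed:
    print(start, end)
```

This applies the rule literally, so the earlier of two overlapping periods is always the one that gets shortened. Whether that matches the desired output depends on which period should win when two overlap, which would need a different rule than the one sketched here.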

SQL merging result sets on a unique column value

I have 2 similar queries which both work on the same table, and I essentially want to combine their results such that the second query supplies default values for what the first query doesn't return. I've simplified the problem as much as possible here. I'm using Oracle btw.
The table has account information in it for a number of accounts, and there are multiple entries for each account with a commit_date to tell when the account information was inserted. I need get the account info which was current for a certain date.
The queries take a list of account ids and a date.
Here is the query:
-- Select the row which was current for the accounts for the given date. (won't return anything for an account which didn't exist for the given date)
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
AND actr.commit_date =
(
SELECT MAX(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
AND actrInner.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
)
This looks a little ugly, but it returns a single row for each account which was current for the given date. The problem is that it doesn't return anything if the account didn't exist until after the given date.
Selecting the earliest account info for each account is trivial - I don't need to supply a date for this one:
-- Select the earliest row for the accounts.
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date =
(
SELECT MAX(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
)
But I want to merge the result sets in such a way that:
For each account, if there is account info for it in the first result set - use that.
Otherwise, use the account info from the second result set.
I've researched all of the joins I can use without success. Unions almost do it but they will only merge for unique rows. I want to merge based on the account id in each row.
Sql Merging two result sets - my case is obviously more complicated than that
SQL to return a merged set of results - I might be able to adapt that technique? I'm a programmer being forced to write SQL and I can't quite follow that example well enough to see how I could modify it for what I need.
The standard way to do this is with a left outer join and coalesce. That is, your overall query will look like this:
SELECT ...
FROM defaultQuery
LEFT OUTER JOIN currentQuery ON ...
If you did a SELECT *, each row would correspond to the current account data plus your defaults. With me so far?
Now, instead of SELECT *, for each column you want to return, you do a COALESCE() on matched pairs of columns:
SELECT COALESCE(currentQuery.columnA, defaultQuery.columnA) ...
This will choose the current account data if present, otherwise it will choose the default data.
You can do this more directly using analytic functions:
select *
from (SELECT actr.*, max(commit_date) over (partition by account_id) as maxCommitDate,
max(case when commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ') then commit_date end) over
(partition by account_id) as MaxCommitDate2
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
) t
where (MaxCommitDate2 is not null and Commit_date = MaxCommitDate2) or
(MaxCommitDate2 is null and Commit_Date = MaxCommitDate)
The subquery calculates two values, the two possibilities of commit dates. The where clause then chooses the appropriate row, using the logic that you want.
I've combined the other answers. Tried it out at apex.oracle.com. Here's some explanation.
MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END) will give us the latest date on or before Dec 30th, or NULL if there isn't one. Combining that with a COALESCE, we get
COALESCE(MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END), MAX(commit_date)).
Now we take the account id and commit date we have and join them with the original table to get all the other fields. Here's the whole query that I came up with:
SELECT *
FROM Account_Information
JOIN (SELECT account_id,
COALESCE(MAX(CASE WHEN commit_date <=
to_date('2010-DEC-30', 'YYYY-MON-DD')
THEN commit_date END),
MAX(commit_date)) AS commit_date
FROM Account_Information
WHERE account_id in (30000316, 30000350, 30000351)
GROUP BY account_id)
USING (account_id, commit_date);
Note that if you do use USING, you have to use * instead of actr.*.
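The COALESCE(MAX(CASE ...), MAX(...)) pattern can be checked on a couple of invented accounts. The sketch below uses Python's sqlite3 rather than Oracle, so dates are ISO text compared as strings and the cutoff is a bound parameter instead of a to_date call; the info column and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account_Information (account_id INTEGER, commit_date TEXT, info TEXT)"
)
conn.executemany("INSERT INTO Account_Information VALUES (?, ?, ?)", [
    (30000316, "2010-11-01", "old"),
    (30000316, "2010-12-15", "current on cutoff"),
    (30000316, "2011-02-01", "after cutoff"),
    (30000350, "2011-01-10", "created after cutoff"),
    (30000350, "2011-03-05", "latest"),
])

sql = """
SELECT account_id, commit_date, info
FROM Account_Information
JOIN (SELECT account_id,
             -- latest date on or before the cutoff, else fall back to MAX(commit_date)
             COALESCE(MAX(CASE WHEN commit_date <= :cutoff THEN commit_date END),
                      MAX(commit_date)) AS commit_date
      FROM Account_Information
      GROUP BY account_id) AS picked
USING (account_id, commit_date)
ORDER BY account_id
"""
rows = conn.execute(sql, {"cutoff": "2010-12-30"}).fetchall()
for row in rows:
    print(row)
```

Account 30000316 has a row on or before the cutoff, so that row is chosen; account 30000350 has none, so the fallback MAX(commit_date) row is returned, as in the query above. If the fallback should instead be the earliest row, the second argument of COALESCE would be MIN(commit_date).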