Table left join with itself returning NULL - sql

I have a table like
Tdate Symbol new_close
20100110 xxx 1.2
20100111 xxx 1.3
...
20100110 yyy 1.1
20100111 yyy 1.5
Tdate is stored as an integer, and there is one row per day for each symbol. I want to generate a new table by subtracting the previous day's new_close from each new_close value, so that it looks like this:
Tdate Symbol delta
20100110 xxx =1.2-1.2
20100111 xxx =1.3-1.2
...
20100110 yyy =1.1-1.1
20100111 yyy =1.5-1.1
Here is my code
with delta as
( select a.Tdate as TDate, a.Symbol as Symbol,
a.new_close-b.new_close as Pdelta, b.new_close as oldPrice
from ctsWithSplit a left join ctsWithSplit b
on a.TDate-b.TDate=1 and a.Symbol=b.Symbol)
However, in the generated table some delta values are NULL. How can I fix this?

Why is TDATE being stored as a number? Your query will fail to find the previous day when the date is the first of the month, e.g. 20150201 - 20150131 = 70, whereas there is only one day between 31st Jan and 1st Feb.
Store dates as DATE or TIMESTAMP datatype, and then you give Oracle a chance at getting the date arithmetic correct.
Perhaps you're after something like:
with sample_data as (select to_date('10/01/2010', 'dd/mm/yyyy hh24:mi:ss') tdate, 'xxx' symbol, 1.2 new_close from dual union all
select to_date('11/01/2010', 'dd/mm/yyyy hh24:mi:ss') tdate, 'xxx' symbol, 1.3 new_close from dual union all
select to_date('10/01/2010', 'dd/mm/yyyy hh24:mi:ss') tdate, 'yyy' symbol, 1.1 new_close from dual union all
select to_date('11/01/2010', 'dd/mm/yyyy hh24:mi:ss') tdate, 'yyy' symbol, 1.5 new_close from dual)
select tdate,
symbol,
new_close - lag(new_close, 1, new_close) over (partition by symbol order by tdate) delta
from sample_data;
TDATE SYMBOL DELTA
---------- ------ -----
10/01/2010 xxx 0.0
11/01/2010 xxx 0.1
10/01/2010 yyy 0.0
11/01/2010 yyy 0.4
If you've never worked with analytic functions, then I suggest you look them up - they're incredibly useful and very powerful.
N.B. If you can't convert your TDATE column to be DATE datatype, then you need to convert the column into a date in any queries you run by using to_date().
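As a rough illustration of both points (integer dates break at month boundaries, and LAG pairs each row with the previous row of the same symbol), here is a minimal Python sketch. It is not Oracle code: the helper names parse_tdate and deltas_with_lag are made up for this example, and prices are held as Decimal only to keep the differences exact.

```python
from datetime import date, datetime
from decimal import Decimal
from itertools import groupby


def parse_tdate(n: int) -> date:
    """Turn an integer such as 20100110 into a real date (similar in spirit to to_date)."""
    return datetime.strptime(str(n), "%Y%m%d").date()


# Integer subtraction is not date arithmetic: 31 Jan to 1 Feb is one day, not 70.
int_gap = 20150201 - 20150131
day_gap = (parse_tdate(20150201) - parse_tdate(20150131)).days


def deltas_with_lag(rows):
    """Mimic new_close - lag(new_close, 1, new_close) partitioned by symbol, ordered by date."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for symbol, group in groupby(ordered, key=lambda r: r[1]):
        prev = None
        for tdate, _, close in group:
            # The first row of a symbol has no predecessor, so it is compared with itself.
            out.append((tdate, symbol, close - (close if prev is None else prev)))
            prev = close
    return out


rows = [
    (parse_tdate(20100110), "xxx", Decimal("1.2")),
    (parse_tdate(20100111), "xxx", Decimal("1.3")),
    (parse_tdate(20100110), "yyy", Decimal("1.1")),
    (parse_tdate(20100111), "yyy", Decimal("1.5")),
]
result = deltas_with_lag(rows)

print(int_gap, day_gap)
for tdate, symbol, delta in result:
    print(tdate, symbol, delta)
```

The per-symbol deltas come out as 0.0, 0.1, 0.0 and 0.4, matching the query output above.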

When your left join doesn't find a matching row in b (i.e. on the first Tdate, or where there isn't a Tdate for the preceding day), then b.new_close will be NULL, as will the result of the subtraction.
Try:
select
a.Tdate as TDate,
a.Symbol as Symbol,
CASE WHEN b.new_close IS NULL THEN 0 ELSE a.new_close-b.new_close END as Pdelta,
b.new_close as oldPrice
from ctsWithSplit a
left join ctsWithSplit b
on a.TDate-b.TDate=1
and a.Symbol=b.Symbol
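To see where the NULL comes from, the following sketch runs the same self-join against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Oracle here purely for illustration, and the table name cts is an assumption; the NULL behaviour of LEFT JOIN and of arithmetic on NULL is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cts (tdate INTEGER, symbol TEXT, new_close REAL)")
conn.executemany(
    "INSERT INTO cts VALUES (?, ?, ?)",
    [
        (20100110, "xxx", 1.2),
        (20100111, "xxx", 1.3),
        (20100110, "yyy", 1.1),
        (20100111, "yyy", 1.5),
    ],
)

rows = conn.execute(
    """
    SELECT a.tdate,
           a.symbol,
           b.new_close               AS old_price,   -- NULL when no previous row matched
           a.new_close - b.new_close AS raw_delta,   -- NULL propagates through subtraction
           CASE WHEN b.new_close IS NULL THEN 0
                ELSE a.new_close - b.new_close END AS pdelta
    FROM cts a
    LEFT JOIN cts b
      ON a.tdate - b.tdate = 1
     AND a.symbol = b.symbol
    ORDER BY a.symbol, a.tdate
    """
).fetchall()

for row in rows:
    print(row)
```

The first row of each symbol has no match in b, so old_price and raw_delta are None (NULL), while the CASE expression turns that into 0.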

Related

Why group by date is returning multiple rows for the same date?

I have a query like the following.
select some_date_col, count(*) as cnt
from <the table>
group by some_date_col
I get output like this:
13-12-2021, 6
13-12-2021, 8
13-12-2021, 9
....
How is that possible? Here some_date_col is of type Date.
A DATE is a binary data-type that is composed of 7 bytes (century, year-of-century, month, day, hour, minute and second) and will always have those components.
The user interface you use to access the database can choose to display some or all of those components of the binary representation of the DATE; however, regardless of whether or not they are displayed by the UI, all the components are always stored in the database and used in comparisons in queries.
When you GROUP BY a DATE data-type, you aggregate values that are identical down to the second (regardless of the accuracy the user interface displays).
So, if you have the data:
CREATE TABLE the_table (some_date_col) AS
SELECT DATE '2021-12-13' FROM DUAL CONNECT BY LEVEL <= 6 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' SECOND FROM DUAL CONNECT BY LEVEL <= 8 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' MINUTE FROM DUAL CONNECT BY LEVEL <= 9;
Then the query:
SELECT TO_CHAR(some_date_col, 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY some_date_col;
Will output:
SOME_DATE_COL       CNT
------------------- ---
2021-12-13 00:01:00   9
2021-12-13 00:00:01   8
2021-12-13 00:00:00   6
The values are grouped according to equal values (down to the maximum precision stored in the date).
If you want to GROUP BY dates with the same date component but any time component, then use the TRUNC function (which returns a value with the same date component but the time component set to midnight):
SELECT TRUNC(some_date_col) AS some_date_col,
count(*) as cnt
FROM <the table>
GROUP BY TRUNC(some_date_col)
Which, for the same data outputs:
SOME_DATE_COL CNT
------------- ---
13-DEC-21      23
And:
SELECT TO_CHAR(TRUNC(some_date_col), 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY TRUNC(some_date_col)
Outputs:
SOME_DATE_COL       CNT
------------------- ---
2021-12-13 00:00:00  23
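The same grouping behaviour can be sketched outside the database. The Python snippet below is only an analogy, not Oracle code: counting full datetime values plays the role of GROUP BY some_date_col, and counting their date part plays the role of GROUP BY TRUNC(some_date_col).

```python
from collections import Counter
from datetime import date, datetime, timedelta

midnight = datetime(2021, 12, 13)
values = (
    [midnight] * 6
    + [midnight + timedelta(seconds=1)] * 8
    + [midnight + timedelta(minutes=1)] * 9
)

# Grouping on the full value keeps rows apart if they differ by even one second.
by_full_value = Counter(values)

# Dropping the time component first merges everything from the same calendar day.
by_day = Counter(v.date() for v in values)

for stamp, cnt in sorted(by_full_value.items(), reverse=True):
    print(stamp, cnt)
print(by_day[date(2021, 12, 13)])
```

Three groups (9, 8 and 6 rows) appear for the full values, and a single group of 23 once the time is dropped.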
Oracle date type holds a date and time component. If the time components do not match, grouping by that value will place the same date (with different times) in different groups:
CREATE TABLE test ( xdate date );
INSERT INTO test VALUES (current_date);
INSERT INTO test VALUES (current_date + INTERVAL '1' MINUTE);
With the default display format:
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
Result:
XDATE     COUNT(*)
--------- --------
13-DEC-21        1
13-DEC-21        1
Now alter the format and rerun:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MON-DD HH24:MI:SS';
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
The result
XDATE                COUNT(*)
-------------------- --------
2021-DEC-13 23:29:36        1
2021-DEC-13 23:30:36        1
Also try this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted FROM test;
Result:
FORMATTED
2021-DEC-13 23:29:36
2021-DEC-13 23:30:36
and this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted, COUNT(*) FROM test GROUP BY xdate;
Result:
FORMATTED            COUNT(*)
-------------------- --------
2021-DEC-13 23:29:36        1
2021-DEC-13 23:30:36        1

Date difference not matching

I have the below query which gives the below output. The '06/01/20' is some corrupted data I need to deal with. Converting it to 'DD/MM/YY' is not an option, I just would like to understand what's happening here.
WITH aux (
d1,
d2
) AS (
SELECT
'06/01/20',
'15/01/2021'
FROM
dual
)
SELECT
nvl(to_date(d2, 'DD/MM/YYYY'), sysdate) - to_date(d1, 'DD/MM/YYYY') diff
FROM
aux;
Output:
DIFF
----
730862
However, if I run the queries below, the results do not match what I would expect the difference between those dates to be:
SELECT
TO_DATE('06/01/20', 'DD/MM/YYYY') d1,
TO_DATE('15/01/2021', 'DD/MM/YYYY') d2
FROM
dual
Output:
D1 D2
---------------------
06-JAN-20 15-JAN-21
--
SELECT
DATE '2021-01-15' - DATE '2020-01-06' d
FROM
dual
Output:
D
---
375
I suggest you set a default date format that lets you see full dates. Many Oracle clients have such a setting, and you can also change it for the current session:
ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD';
When you do so you'll realise that TO_DATE('06/01/20', 'DD/MM/YYYY') produces 0020-01-06 rather than 2020-01-06.
An average year has 365.25 days, and 730862 / 365.25 is roughly 2001 ;-)
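That back-of-the-envelope check can be written out as a few lines of Python. It only estimates how many years the reported difference spans, using the 365.25-day average year; it does not reproduce Oracle's own calendar arithmetic.

```python
DAYS_PER_YEAR = 365.25   # average year length used for the estimate
diff_days = 730862       # the DIFF value reported by the query

years = diff_days / DAYS_PER_YEAR
# d2 is in 2021, so d1 must lie about this many years earlier.
implied_year = 2021 - round(years)

print(f"{years:.2f} years, so d1 is in year {implied_year}")
# prints: 2000.99 years, so d1 is in year 20
```

A gap of about 2001 years is what you would expect if '06/01/20' was read as year 20 instead of 2020.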

BETWEEN Two Dates is Returning Rows of the Same Two Dates instead of the Ones Between

I'm trying to select rows that are between two specific dates, but I'm getting the rows that are of the same dates specified in BETWEEN instead. I tried using operators > and <, but nothing seems to work. Does it have to do with the date format?
SELECT r.RESERVATION_ID, a.AGENT_ID, a.AGENT_FNAME AS AGENT_NAME, t.TRIP_ID,
s.RESERVATION_STATUS
FROM RESERVATION r
INNER JOIN AGENT a
ON
a.AGENT_ID=r.AGENT_ID
INNER JOIN TOURTRIP_RESERVATION t
ON
r.RESERVATION_ID=t.RESERVATION_ID
INNER JOIN RESERVATION_STATUS s
ON
r.RESERVATION_STATUSID=s.RESERVATION_STATUSID
WHERE r.AGENT_ID IS NOT NULL
AND r.RESERVATION_DATE BETWEEN '15-MAR-20' AND '26-MAY-20'
AND r.RESERVATION_STATUSID=100;
I used to_date('03.06.2020','DD.MM.YYYY') format to update the data in the reservation_id column. However, when I use
and RESERVATION_DATE > to_date('15.03.2020','DD.MM.YYYY')
and RESERVATION_DATE < to_date('26.05.2020','DD.MM.YYYY')
it's returning nothing
This is the reservation table
The BETWEEN condition includes both the start and the end values.
You can use the operators > and < to exclude them.
Your example simplified:
create table RESERVATION (
RESERVATION_ID number
,RESERVATION_DATE date)
Data:
insert into RESERVATION values (1, to_date('01.06.2020','DD.MM.YYYY'));
insert into RESERVATION values (2, to_date('02.06.2020','DD.MM.YYYY'));
insert into RESERVATION values (3, to_date('03.06.2020','DD.MM.YYYY'));
Query A)
select RESERVATION_ID
from RESERVATION
where RESERVATION_DATE between to_date('01.06.2020','DD.MM.YYYY')
and to_date('03.06.2020','DD.MM.YYYY')
will return:
1
2
3
Query B)
select RESERVATION_ID
from RESERVATION
where RESERVATION_DATE > to_date('01.06.2020','DD.MM.YYYY')
and RESERVATION_DATE < to_date('03.06.2020','DD.MM.YYYY')
will return
2
I assume that RESERVATION_DATE is a date. Always use explicit datatype conversion.
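The difference between the two queries comes down to inclusive versus strict comparisons. A minimal Python sketch of the same three rows (plain date objects, no database involved) shows it:

```python
from datetime import date

# reservation_id -> reservation_date, mirroring the three inserted rows
reservations = {
    1: date(2020, 6, 1),
    2: date(2020, 6, 2),
    3: date(2020, 6, 3),
}
low, high = date(2020, 6, 1), date(2020, 6, 3)

# BETWEEN low AND high behaves like low <= d <= high (both limits included).
inclusive = [rid for rid, d in reservations.items() if low <= d <= high]

# Using > and < drops rows that sit exactly on either limit.
exclusive = [rid for rid, d in reservations.items() if low < d < high]

print(inclusive)  # [1, 2, 3]
print(exclusive)  # [2]
```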
I changed the date format; try this and tell me whether you get a different result.
SELECT r.RESERVATION_ID, a.AGENT_ID, a.AGENT_FNAME AS AGENT_NAME, t.TRIP_ID,
s.RESERVATION_STATUS
FROM RESERVATION r
INNER JOIN AGENT a
ON
a.AGENT_ID=r.AGENT_ID
INNER JOIN TOURTRIP_RESERVATION t
ON
r.RESERVATION_ID=t.RESERVATION_ID
INNER JOIN RESERVATION_STATUS s
ON
r.RESERVATION_STATUSID=s.RESERVATION_STATUSID
WHERE r.AGENT_ID IS NOT NULL
AND r.RESERVATION_DATE BETWEEN DATE '2020-03-15' AND DATE '2020-05-26'
AND r.RESERVATION_STATUSID=100;
If the RESERVATION_DATE column's datatype is DATE, don't compare it to strings, and '15-MAR-20' is a string. Oracle will try to implicitly convert it to a valid date value; sometimes it'll succeed, sometimes it'll return a wrong value (because its NLS settings differ from the format you provided), and sometimes it'll fail and raise an error.
These two:
date '2020-03-15'
to_date('15.03.2020', 'dd.mm.yyyy')
on the other hand, are dates.
BETWEEN is inclusive and will return both limits, if they exist:
AND r.RESERVATION_DATE BETWEEN date '2020-03-15' and date '2020-05-26'
If you want to exclude those limits, then
AND r.RESERVATION_DATE > date '2020-03-15'
AND r.RESERVATION_DATE < date '2020-05-26'
But, if RESERVATION_DATE is a VARCHAR2 column, you're making a big mistake. Never store dates as strings. If you can't afford to change the datatype, then you'll have to convert it to a date:
and to_date(r.reservation_date, 'dd-mon-yy') between date '2020-03-15'
and date '2020-05-26'
This will work as long as there aren't any invalid values in that column. Because it is a string, you can put something like 15-AA-2F into it, which certainly isn't a date but can be stored in such a column. In that case, the query will fail and you'll have to fix the data.
The comment you posted says that you tried TO_DATE('26-MAY-20'). That's not enough: you should provide a format mask, because that string can mean
26th of May 2020
20th of May 2026
depending on NLS settings. Furthermore, it fails in my database because my NLS settings are different:
SQL> select to_date('26-may-20') from dual;
select to_date('26-may-20') from dual
*
ERROR at line 1:
ORA-01858: a non-numeric character was found where a numeric was expected
Because we don't have "may" in Croatia:
SQL> select to_date('26-may-20', 'dd-mon-yy') from dual;
select to_date('26-may-20', 'dd-mon-yy') from dual
*
ERROR at line 1:
ORA-01843: not a valid month
But this works, as I told Oracle what I want:
SQL> select to_date('26-may-20', 'dd-mon-yy', 'nls_date_language = english') from dual;
TO_DATE('26-MAY-20'
-------------------
26.05.2020 00:00:00
Even better, use digits only or a date literal (which is always written as date 'yyyy-mm-dd'):
SQL> select to_date('26.05.2020', 'dd.mm.yyyy') d1,
2 date '2020-05-26' d2
3 from dual;
D1 D2
------------------- -------------------
26.05.2020 00:00:00 26.05.2020 00:00:00
SQL>
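The ambiguity of a string such as '26-may-20' without a format mask can also be sketched in Python, where the format passed to strptime plays a role similar to Oracle's format mask. This is an analogy only; it assumes the default C/English locale for month abbreviations, which is the same kind of language dependency as nls_date_language.

```python
from datetime import datetime

text = "26-may-20"

# Same characters, two different masks, two different dates.
day_first = datetime.strptime(text, "%d-%b-%y").date()   # day-month-year
year_first = datetime.strptime(text, "%y-%b-%d").date()  # year-month-day

print(day_first)   # 2020-05-26
print(year_first)  # 2026-05-20
```

Stating the mask explicitly removes the guesswork; digits-only formats additionally avoid the dependency on month names.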
Based on sample data you provided, presuming that reservation_date column's datatype is date:
setting environment first
sample data is from line #1 - 9
query you need begins at line #10
Here you go:
SQL> alter session set nls_date_language = 'english';
Session altered.
SQL> alter session set nls_Date_format = 'dd-mon-yy';
Session altered.
SQL> with reservation (reservation_id, reservation_date, agent_id) as
2 (select 8576, date '2020-03-15', 222 from dual union all
3 select 7325, date '2020-05-26', 333 from dual union all
4 select 3186, date '2020-04-23', 111 from dual union all
5 select 8000, date '2020-04-05', 555 from dual union all
6 select 4120, date '2020-01-03', null from dual union all
7 select 1546, date '2020-02-15', null from dual union all
8 select 1007, date '2020-05-06', null from dual
9 )
10 select *
11 from reservation
12 where reservation_date between date '2020-03-15'
13 and date '2020-05-26'
14 order by reservation_date;
RESERVATION_ID RESERVATI AGENT_ID
-------------- --------- ----------
8576 15-mar-20 222
8000 05-apr-20 555
3186 23-apr-20 111
1007 06-may-20
7325 26-may-20 333
SQL>

How to grab the hour of the MAX value of a (Day's worth) set of data?

Let's say that I have already calculated the maximum number of fruit sold in a day (stored in the value column.) I need to have the time_of_day (i.e. 3:00 PM) that this maximum value occurs. How would I be able to do that without having to also group by time_of_day (since that would throw the grouping off). Below is an example of what I would start out with:
Value Data_date Name Hour
7 7/17/2018 A 2:00 AM
15 7/17/2018 A 4:00 AM
25 7/17/2018 A 7:00 PM
55 7/18/2018 B 1:00 AM
17 7/18/2018 B 4:00 AM
Below is what I want:
MAX(Value) Data_date Name Hour
25 7/17/2018 A 7:00 PM
55 7/18/2018 B 1:00 AM
Below is what I tried:
select max(value)
, data_date
, name
, hour
from table
group by value, grouping sets(data_date, name), grouping sets(hour, name);
Based off what I've dug up online, I think I will have to group by sets but not exactly sure which sets I need to group on ...
Thanks in advance!
You can do this with:
select max(value) as value,
data_date,
max(name) keep (dense_rank last order by value) as name,
max(hour) keep (dense_rank last order by value) as hour
from your_table
group by data_date;
For each data_date, it gets the highest value using a simple aggregate, and the corresponding name and hour using last().
With your sample data as a CTE, and using the data types you said you are using:
-- cte for sample data
with your_table (value, data_date, name, hour) as (
select 7, '7/17/2018', 'A', timestamp '2018-07-17 02:00:00' from dual
union all select 15, '7/17/2018', 'A', timestamp '2018-07-17 04:00:00' from dual
union all select 25, '7/17/2018', 'A', timestamp '2018-07-17 19:00:00' from dual
union all select 55, '7/18/2018', 'B', timestamp '2018-07-18 01:00:00' from dual
union all select 17, '7/18/2018', 'B', timestamp '2018-07-18 04:00:00' from dual
)
-- actual query
select max(value) as value,
data_date,
max(name) keep (dense_rank last order by value) as name,
to_char(max(hour) keep (dense_rank last order by value), 'HH:MI AM') as hour
from your_table
group by data_date;
VALUE DATA_DATE N HOUR
---------- --------- - --------
25 7/17/2018 A 07:00 PM
55 7/18/2018 B 01:00 AM
If the date part of hour always matches the string value you have for data_date you could use that instead:
select max(value) as value,
to_char(trunc(hour), 'MM/DD/YYYY') as data_date,
max(name) keep (dense_rank last order by value) as name,
to_char(max(hour) keep (dense_rank last order by value), 'HH:MI AM') as hour
from your_table
group by trunc(hour);
VALUE DATA_DATE N HOUR
---------- ---------- - --------
25 07/17/2018 A 07:00 PM
55 07/18/2018 B 01:00 AM
You should also consider what you want to show if the same value appears for more than one hour on a day, which seems feasible. These will show arbitrary matching values, but you could add something to the order by clauses to pick, say, the earliest or latest matching hour. If you want to show all matches then you'd need a different approach, such as either of @Yogesh's...
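The idea of "take the row that holds the maximum, with an explicit tie-break" can be sketched in plain Python. The function name best_per_day is made up for this illustration, and choosing the earliest hour on a tie is just one possible rule.

```python
from datetime import datetime

rows = [
    (7, "7/17/2018", "A", datetime(2018, 7, 17, 2)),
    (15, "7/17/2018", "A", datetime(2018, 7, 17, 4)),
    (25, "7/17/2018", "A", datetime(2018, 7, 17, 19)),
    (55, "7/18/2018", "B", datetime(2018, 7, 18, 1)),
    (17, "7/18/2018", "B", datetime(2018, 7, 18, 4)),
]


def best_per_day(rows):
    """Return, for each data_date, the whole row that carries the highest value."""
    by_day = {}
    for row in rows:
        by_day.setdefault(row[1], []).append(row)
    # Highest value first; among equal values the earliest hour wins (explicit tie-break).
    return [min(group, key=lambda r: (-r[0], r[3])) for _, group in sorted(by_day.items())]


best = best_per_day(rows)
for value, data_date, name, hour in best:
    print(value, data_date, name, hour.strftime("%I:%M %p"))
```

Because the whole row is selected, name and hour are guaranteed to come from the same row as the maximum value.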
You can use a correlated subquery:
select t.*
from table t
where value = (select max(t1.value)
from table t1
where t1.name = t.name and
t1.data_date = t.data_date
);
Alternatively, you can use the row_number() function:
select t.*
from (select t1.*, row_number() over (partition by name, data_date order by value desc) as seq
from table t1
) t
where seq = 1;

PostgreSQL Price for multiple date ranges

I have the following table of rates for given date range.
I want to write a SQL query (PostgreSQL) that returns the sum of prices for a given period, but only if the rates cover that period continuously. For example:
if I specify 2011-05-02 to 2011-05-09 on the first set, the sum of the 6 rows should be returned,
but
if I specify 2011-05-02 to 2011-05-11 on the second set, nothing should be returned.
My problem is that I don't know how to determine whether a date range is continuous. Can you please help? Thanks a lot.
case 1: sum expected
price from_date to_date
------ ------------ ------------
1.0 "2011-05-02" "2011-05-02"
2.0 "2011-05-03" "2011-05-03"
3.0 "2011-05-04" "2011-05-05"
4.0 "2011-05-05" "2011-05-06"
5.0 "2011-05-06" "2011-05-07"
4.0 "2011-05-08" "2011-05-09"
case 2: no results expected
price from_date to_date
------ ------------ ------------
1.0 "2011-05-02" "2011-05-02"
2.0 "2011-05-03" "2011-05-03"
3.0 "2011-05-07" "2011-05-09"
4.0 "2011-05-09" "2011-05-11"
I do not have overlapping rates date ranges.
Not sure I understood the question completely, but what about this:
select *
from prices
where not exists (
select 1 from (
select from_date - lag(to_date) over (partition by null order by from_date asc) as days_diff
from prices
where from_date >= DATE '2011-05-01'
and to_date < DATE '2011-07-01'
) t
where coalesce(days_diff, 0) > 1
)
order by from_date
Here's a rather funky way to solve it:
WITH RECURSIVE t AS (
SELECT * FROM d WHERE '2011-05-02' BETWEEN start_date AND end_date
UNION ALL
SELECT d.* FROM t JOIN d ON (d.key=t.key AND d.start_date=t.end_date+'1 DAY'::INTERVAL)
WHERE d.start_date <= '2011-05-09')
SELECT sum(price), min(start_date), max(end_date)
FROM t
HAVING min(start_date) <= '2011-05-02' AND max(end_date)>= '2011-05-09';
I think you need to combine window functions and CTEs:
WITH
raw_rows AS (
SELECT your_table.*,
lag(to_date) OVER w as prev_date,
lead(from_date) OVER w as next_date
FROM your_table
WHERE ...
WINDOW w as (ORDER by from_date, to_date)
)
SELECT sum(stuff)
FROM raw_rows
HAVING bool_and(prev_date >= from_date - interval '1 day' AND
next_date <= to_date + interval '1 day');
http://www.postgresql.org/docs/9.0/static/tutorial-window.html
http://www.postgresql.org/docs/9.0/static/queries-with.html
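For comparison, the continuity rule itself can be sketched in Python, independent of any of the SQL answers. The function name sum_if_continuous is hypothetical, and the sketch assumes that ranges count as continuous when each one starts no later than the day after the previous one ends, and that the ranges must cover the whole requested period.

```python
from datetime import date, timedelta


def sum_if_continuous(rates, start, end):
    """rates: (price, from_date, to_date) tuples. Return the total price, or None on a gap."""
    selected = sorted(
        (r for r in rates if r[2] >= start and r[1] <= end),
        key=lambda r: r[1],
    )
    if not selected or selected[0][1] > start:
        return None  # nothing covers the first requested day
    covered_until = selected[0][2]
    for _, from_date, to_date in selected[1:]:
        if from_date > covered_until + timedelta(days=1):
            return None  # at least one day is not covered by any rate
        covered_until = max(covered_until, to_date)
    if covered_until < end:
        return None  # coverage stops before the requested end
    return sum(price for price, _, _ in selected)


case1 = [
    (1.0, date(2011, 5, 2), date(2011, 5, 2)),
    (2.0, date(2011, 5, 3), date(2011, 5, 3)),
    (3.0, date(2011, 5, 4), date(2011, 5, 5)),
    (4.0, date(2011, 5, 5), date(2011, 5, 6)),
    (5.0, date(2011, 5, 6), date(2011, 5, 7)),
    (4.0, date(2011, 5, 8), date(2011, 5, 9)),
]
case2 = [
    (1.0, date(2011, 5, 2), date(2011, 5, 2)),
    (2.0, date(2011, 5, 3), date(2011, 5, 3)),
    (3.0, date(2011, 5, 7), date(2011, 5, 9)),
    (4.0, date(2011, 5, 9), date(2011, 5, 11)),
]

total1 = sum_if_continuous(case1, date(2011, 5, 2), date(2011, 5, 9))
total2 = sum_if_continuous(case2, date(2011, 5, 2), date(2011, 5, 11))
print(total1, total2)
```

Case 1 yields the sum of all six rows, while case 2 yields nothing because 2011-05-04 to 2011-05-06 is not covered.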