Oracle query to fill in the missing data in the same table - sql

I have a table in oracle which has missing data for a given id. I am trying to figure out the sql to fill in the data from start date: 01/01/2019 to end_dt: 10/1/2020. see the input data below. for status key the data can be filled based on its previous status key. see input:
expected output:

You can use a recursive query to generate the dates, then cross join that with the list of distinct ids available in the table. Then, use window functions to bring the missing key values:
with recursive cte (mon) as (
select date '2019-01-01' mon from dual
union all select add_months(mon, 1) from cte where mon < date '2020-10-01'
)
select i.id,
coalesce(
t.status_key,
lead(t.previous_status_key ignore nulls) over(partition by id order by c.mon)
) as status_key,
coalesce(
t.status_key,
lag(t.status_key ignore nulls, 1, -1) over(partition by id order by c.mon)
) previous_status_key,
c.mon
from cte c
cross join (select distinct id from mytable) i
left join mytable t on t.mon = c.mon and t.id = i.id
You did not give a lot of details on how to bring the missing status_keys and previous_status_keys. Here is what the query does:
status_key is taken from the next non-null previous_status_key
previous_status_key is taken from the last non-null status_key, with a default of -1

You can generate the dates and then use cross join and some additional logic to get the information you want:
with dates (mon) as (
select date '2019-01-01' as mon
from dual
union all
select mon + interval '1' month
from dates
where mon < date '2021-01-01'
)
select d.mon, i.id,
coalesce(t.status_key,
lag(t.status_key ignore nulls) over (partition by i.id order by d.mon)
) as status_key,
coalesce(t.previous_status_key,
lag(t.previous_status_key ignore nulls) over (partition by i.id order by d.mon)
) as previous_status_key
from dates d cross join
(select distinct id from t) i left join
t
on d.mon = t.mon and i.id = i.id;

Related

Windows functions orderen by date when some dates doesn't exist

Suppose this example query:
select
id
, date
, sum(var) over (partition by id order by date rows 30 preceding) as roll_sum
from tab
When some dates are not present on date column the window will not consider the unexistent dates. How could i make this windowns aggregation including these unexistent dates?
Many thanks!
You can join a sequence containing all dates from a desired interval.
select
*
from (
select
d.date,
q.id,
q.roll_sum
from unnest(sequence(date '2000-01-01', date '2030-12-31')) d
left join ( your_query ) q on q.date = d.date
) v
where v.date > (select min(my_date) from tab2)
and v.date < (select max(my_date) from tab2)
In standard SQL, you would typically use a window range specification, like:
select
id,
date,
sum(var) over (
partition by id
order by date
range interval '30' day preceding
) as roll_sum
from tab
However I am unsure that Presto supports this syntax. You can resort a correlated subquery instead:
select
id,
date,
(
select sum(var)
from tab t1
where
t1.id = t.id
and t1.date >= t.date - interval '30' day
and t1.date <= t.date
) roll_sum
from tab t
I don't think Presto support window functions with interval ranges. Alas. There is an old fashioned way to doing this, by counting "ins" and "outs" of values:
with t as (
select id, date, var, 1 as is_orig
from t
union all
select id, date + interval '30 day', -var, 0
from t
)
select id.*
from (select id, date, sum(var) over (partition by id order by date) as running_30,
sum(is_org) as is_orig
from t
group by id, date
) id
where is_orig > 0

How to fill missing dates between empty records?

I am trying to fill dates between empty records but without success. Tried to do multiple selects method, tried to join, but it seems like I am missing the point. I would like to generate records with missing dates, to generate chart from this block of code. Firstly I would like to have dates filled "manually", later I will reorganise this code and swap that method for an argument.
Can someone help me with that expression?
SELECT
LOG_LAST AS "data",
SUM(run_cnt) AS "Number of runs"
FROM
dual l
LEFT OUTER JOIN "LOG_STAT" stat ON
stat."LOG_LAST" = l."CLASS"
WHERE
new_class = '$arg[klasa]'
--SELECT to_date(TRUNC (SYSDATE - ROWNUM), 'DD-MM-YYYY'),
--0
--FROM dual CONNECT BY ROWNUM < 366
GROUP BY
LOG_LAST
ORDER BY
LOG_LAST
//Edit:
LOG_LAST is just a column with date (for example: 25.04.2018 15:44:21), run_cnt is a column with just a simple number, LOG_STAT is a table that contains LOG_LAST and run_cnt, new_class is a column with name of the record I would like to list records even when they are no existing. For example: I have a records with date 24-09-2018, 23-09-2018, 20-09-2018, 18-09-2018, and I would like to list records even without names and run_cnt, but to generate missing dates in some period
try to fill with isnull:
SELECT
case when trim(LOG_LAST) is null then '01-01-2018'
else isnull(LOG_LAST,'01-01-2018')end AS data,
SUM(isnull(run_cnt,0)) AS "Number of runs"
FROM
dual l
LEFT OUTER JOIN "LOG_STAT" stat ON
stat."LOG_LAST" = l."CLASS"
WHERE
new_class = '$arg[klasa]'
--SELECT to_date(TRUNC (SYSDATE - ROWNUM), 'DD-MM-YYYY'),
--0
--FROM dual CONNECT BY ROWNUM < 366
GROUP BY
LOG_LAST
ORDER BY
LOG_LAST
What you want is more or less:
select d.day, sum(ls.run_cnt)
from all_dates d
left join log_stat ls on trunc(ls.log_last) = d.day
where ls.new_class = :klasa
group by d.day
order by d.day;
The all_dates table in above query is supposed to contain all dates beginning with the minimum klasa log_last date and ending with the maximum klasa log_last date. You get these dates with a recursive query.
with ls as
(
select trunc(log_last) as day, sum(run_cnt) as total
from log_stat
where new_class = :klasa
group by trunc(log_last)
)
, all_dates(day) as
(
select min(day) from ls
union all
select day + 1 from all_dates where day < (select max(day) from ls)
)
select d.day, ls.total
from all_dates d
left join ls on ls.day = d.day
order by d.day;
It's called data densification. From oracle doc Data Densification for Reporting, An example data densification
with ls as
(
select trunc(created) as day,object_type new_class, sum(1) as total
from user_objects
group by trunc(created),object_type
)
, all_dates(day) as
(
select min(day) from ls
union all
select day + 1 from all_dates where day < (select max(day) from ls)
)
select d.day, nvl(ls.total,0),new_class
from all_dates d
left join ls partition by (ls.new_class) on ls.day = d.day
order by d.day;

Want a SQL statement to count number

I have a table have 3 columns id, open_time, close_time, the data looks like this:
then I want a SQL to get result like this:
the rule is : if the date equals to open time then New, if the date > open_time and date < close_time then Open, if the date equals close_time then Closed
how can I write the SQL in Oracle?
First build a table on-the-fly containing all dates from the minimum date in the table until today. You need a recursive query for this.
Then build a table on-the-fly for the three statuses.
Now cross join the two to get all combinations. These are the rows you want.
The rest is counting per day and status, which can be achieved with a join and grouping or with one or more subqueries. I'm showing the join:
with days(day) as
(
select min(open_time) as day from opentimes
union all
select day + 1 from days where day < trunc(sysdate)
)
, statuses as
(
select 'New' as status, 1 as sortkey from dual
union all
select 'Open' as status, 2 as sortkey from dual
union all
select 'Close' as status, 3 as sortkey from dual
)
select
d.day,
s.status,
count(case when (s.status = 'New' and d.day = o.open_time)
or (s.status = 'Open' and d.day = o.close_time)
or (s.status = 'Close' and d.day > cls.open_time and d.day < cls.close_time)
then 1 end) as cnt
from days d
cross join statuses s
join opentimes o on d.day between o.open_time and o.close_time
group by d.day, s.status
order by d.day, max(s.sortkey);

Calculating per day in SQL

I have an sql table like that:
Id Date Price
1 21.09.09 25
2 31.08.09 16
1 23.09.09 21
2 03.09.09 12
So what I need is to get min and max date for each id and dif in days between them. It is kind of easy. Using SQLlite syntax:
SELECT id,
min(date),
max(date),
julianday(max(date)) - julianday(min(date)) as dif
from table group by id
Then the tricky one: how can I receive the price per day during this difference period. I mean something like this:
ID Date PricePerDay
1 21.09.09 25
1 22.09.09 0
1 23.09.09 21
2 31.08.09 16
2 01.09.09 0
2 02.09.09 0
2 03.09.09 12
I create a cte as you mentioned with calendar but dont know how to get the desired result:
WITH RECURSIVE
cnt(x) AS (
SELECT 0
UNION ALL
SELECT x+1 FROM cnt
LIMIT (SELECT ((julianday('2015-12-31') - julianday('2015-01-01')) + 1)))
SELECT date(julianday('2015-01-01'), '+' || x || ' days') as date FROM cnt
p.s. If it will be in sqllite syntax-would be awesome!
You can use a recursive CTE to calculate all the days between the min date and max date. The rest is just a left join and some logic:
with recursive cte as (
select t.id, min(date) as thedate, max(date) as maxdate
from t
group by id
union all
select cte.id, date(thedate, '+1 day') as thedate, cte.maxdate
from cte
where cte.thedate < cte.maxdate
)
select cte.id, cte.date,
coalesce(t.price, 0) as PricePerDay
from cte left join
t
on cte.id = t.id and cte.thedate = t.date;
One method is using a tally table.
To build a list of dates and join that with the table.
The date stamps in the DD.MM.YY format are first changed to the YYYY-MM-DD date format.
To make it possible to actually use them as a date in the SQL.
At the final select they are formatted back to the DD.MM.YY format.
First some test data:
create table testtable (Id int, [Date] varchar(8), Price int);
insert into testtable (Id,[Date],Price) values (1,'21.09.09',25);
insert into testtable (Id,[Date],Price) values (1,'23.09.09',21);
insert into testtable (Id,[Date],Price) values (2,'31.08.09',16);
insert into testtable (Id,[Date],Price) values (2,'03.09.09',12);
The SQL:
with Digits as (
select 0 as n
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
),
t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
Dates as (
select Id, date(MinDate,'+'||(d2.n*10+d1.n)||' days') as [Date]
from (
select Id, min([Date]) as MinDate, max([Date]) as MaxDate
from t
group by Id
) q
join Digits d1
join Digits d2
where date(MinDate,'+'||(d2.n*10+d1.n)||' days') <= MaxDate
)
select d.Id,
(substr(d.[Date],9,2)||'.'||substr(d.[Date],6,2)||'.'||substr(d.[Date],3,2)) as [Date],
coalesce(t.Price,0) as Price
from Dates d
left join t on (d.Id = t.Id and d.[Date] = t.[Date])
order by d.Id, d.[Date];
The recursive SQL below was totally inspired by the excellent answer from Gordon Linoff.
And a recursive SQL is probably more performant for this anyway.
(He should get the 15 points for the accepted answer).
The difference in this version is that the datestamps are first formatted to YYYY-MM-DD.
with t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
cte as (
select Id, min([Date]) as [Date], max([Date]) as MaxDate from t
group by Id
union all
select Id, date([Date], '+1 day'), MaxDate from cte
where [Date] < MaxDate
)
select cte.Id,
(substr(cte.[Date],9,2)||'.'||substr(cte.[Date],6,2)||'.'||substr(cte.[Date],3,2)) as [Date],
coalesce(t.Price, 0) as PricePerDay
from cte
left join t
on (cte.Id = t.Id and cte.[Date] = t.[Date])
order by cte.Id, cte.[Date];

Hits per day in Google Big Query

I am using Google Big Query to find hits per day. Here is my query,
SELECT COUNT(*) AS Key,
DATE(EventDateUtc) AS Value
FROM [myDataSet.myTable]
WHERE .....
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
This is working fine but it ignores the date with 0 hits. I wanna include this. I cannot create temp table in Google Big Query. How to fix this.
Tested getting error Field 'day' not found.
SELECT COUNT(*) AS Key,
DATE(t.day) AS Value from (
select date(date_add(day, i, "DAY")) day
from (select '2015-05-01 00:00' day) a
cross join
(select
position(
split(
rpad('', datediff(CURRENT_TIMESTAMP(),'2015-05-01 00:00')*2, 'a,'))) i
from (select NULL)) b
) d
left join [sample_data.requests] t on d.day = t.day
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
You can query data that exists in your tables, the query cannot guess which dates are missing from your table. This problem you need to handle either in your programming language, or you could join with a numbers table and generates the dates on the fly.
If you know the date range you have in your query, you can generate the days:
select date(date_add(day, i, "DAY")) day
from (select '2015-01-01' day) a
cross join
(select
position(
split(
rpad('', datediff('2015-01-15','2015-01-01')*2, 'a,'))) i
from (select NULL)) b;
Then you can join this result with your query table:
SELECT COUNT(*) AS Key,
DATE(t.day) AS Value from (...the.above.query.pasted.here...) d
left join [myDataSet.myTable] t on d.day = t.day
WHERE .....
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;