Converting monthly to daily data - sql

I have monthly data that I would like to transform to daily data. The data looks like this. The extraction_dt is in date format.
---------------------------------
isin | extraction_dt | yield
---------------------------------
001 | 2013-01-31 | 100
001 | 2013-02-28 | 110
001 | 2013-03-31 | 105
... | ... | ...
002 | 2013-01-31 | 200
... | ... | ...
And I would like to get something like this:
---------------------------------
isin | extraction_dt | yield
---------------------------------
001 | 2013-01-01 | 100
001 | 2013-01-02 | 100
001 | 2013-01-03 | 100
... | ... | ...
001 | 2013-02-01 | 110
... | ... | ...
I tried the following code but it does not work. I get the error message AnalysisException: Could not resolve table reference: 'cte'. How would you convert monthly to daily data?
with cte as
(select isin, extraction_dt, yield
from datashop
union all
select isin, extraction_dt, dateadd(d, 1, extraction_dt) AS date_dt, yield
from cte
where datediff(m,date_dt,dateadd(d, 1, date_dt))=0
)
select isin, date_dt,
1.0*isin / count(*) over (partition by isin, date_dt) AS daily_yield
from cte
order by 1,2

I can suggest an easy solution:
Generate a date series.
Match it with your data so that each monthly value is repeated for every day of its month.
Here is the SQL you can use for Impala:
select isin, extraction_dt, a.dt AS date_dt, yield
from
datashop d,
(
select now() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as dt
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
WHERE
from_timestamp(a.dt,'yyyy/MM') =from_timestamp(d.extraction_dt,'yyyy/MM')
order by 1,2,3
The derived table aliased a generates a series of dates: the 10,000 days counting back from now().
WHERE restricts the series to the month of extraction_dt, so you get every day of that month.
ORDER BY sorts the output.
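The same generate-and-match idea can be tried without an Impala cluster. The sketch below assumes SQLite, driven from Python's built-in sqlite3 module, as a stand-in engine: SQLite's date() and strftime() replace Impala's INTERVAL arithmetic and from_timestamp(), and the series counts forward 1,000 days from a fixed start date instead of back from now(). The helper name monthly_to_daily and the start date are illustrative choices; the window must cover all of your extraction months.

```python
import sqlite3

# Sample monthly rows from the question: (isin, extraction_dt, yield).
MONTHLY = [("001", "2013-01-31", 100), ("001", "2013-02-28", 110),
           ("001", "2013-03-31", 105), ("002", "2013-01-31", 200)]

# Three cross-joined digit tables give offsets 0..999. Each offset is added to
# a start date, and the resulting days are matched to the data by year-month.
DAILY_SQL = """
WITH digits(n) AS (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)),
     days(dt) AS (
       SELECT date(:start, '+' || (a.n + 10 * b.n + 100 * c.n) || ' days')
       FROM digits a CROSS JOIN digits b CROSS JOIN digits c
     )
SELECT d.isin, days.dt AS date_dt, d.yield
FROM datashop d
JOIN days ON strftime('%Y-%m', days.dt) = strftime('%Y-%m', d.extraction_dt)
ORDER BY d.isin, days.dt
"""


def monthly_to_daily(rows, start="2013-01-01"):
    """Repeat each monthly value for every day of its month (1,000-day window)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE datashop (isin TEXT, extraction_dt TEXT, yield INTEGER)")
    con.executemany("INSERT INTO datashop VALUES (?, ?, ?)", rows)
    result = con.execute(DAILY_SQL, {"start": start}).fetchall()
    con.close()
    return result


daily = monthly_to_daily(MONTHLY)
print(daily[:2])  # [('001', '2013-01-01', 100), ('001', '2013-01-02', 100)]
```

Because the join key is the year-month string, any extraction_dt within a month (not only its last day) maps to every day of that month.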

Your WITH clause contains a recursive (self-referencing) query. In many SQL dialects, such as PostgreSQL and MySQL 8.0, this requires WITH RECURSIVE rather than plain WITH. According to the Impala SQL reference, Impala does not support recursive common table expressions at all:
The Impala WITH clause does not support recursive queries in the WITH, which is supported in some other database systems.
In other words, you cannot do this in Impala.
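For comparison, this is roughly what the recursive approach looks like in an engine that does support WITH RECURSIVE. The sketch assumes SQLite through Python's sqlite3 module, not Impala: the anchor member starts each row at the first day of its month, and the recursive member keeps adding one day while the next day is still in the same month, which is what the datediff(m, ...) = 0 condition in the question was aiming for. Both members return the same three columns, as the UNION ALL requires.

```python
import sqlite3

# Two of the question's monthly rows: (isin, extraction_dt, yield).
ROWS = [("001", "2013-01-31", 100), ("001", "2013-02-28", 110)]

RECURSIVE_SQL = """
WITH RECURSIVE cte(isin, date_dt, yield) AS (
  -- Anchor member: the first day of the month of each extraction_dt.
  SELECT isin, date(extraction_dt, 'start of month'), yield FROM datashop
  UNION ALL
  -- Recursive member: the next day, as long as it is still in the same month.
  SELECT isin, date(date_dt, '+1 day'), yield FROM cte
  WHERE strftime('%Y-%m', date(date_dt, '+1 day')) = strftime('%Y-%m', date_dt)
)
SELECT isin, date_dt, yield FROM cte ORDER BY isin, date_dt
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datashop (isin TEXT, extraction_dt TEXT, yield INTEGER)")
con.executemany("INSERT INTO datashop VALUES (?, ?, ?)", ROWS)
daily = con.execute(RECURSIVE_SQL).fetchall()
con.close()
print(len(daily))  # 59 = 31 days of January + 28 days of February 2013
```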

Related

Is there any alternative to use MYSQL's ADDDATE() in ORACLE?

I have this query that needs to be executed in Oracle SQL instead of MySQL, where it originally came from. It uses the ADDDATE() function, and the only alternative I can find is DateAdd, which needs more parameters than I really need.
Apart from that, if I try to execute it, Oracle also reports an error in the
SELECT 0 i UNION ...
part, saying the following: ORA-00923: FROM keyword not found where expected
Maybe Oracle does not allow SELECT 0 UNION SELECT 1 UNION ...?
Any suggestions or help are appreciated, thanks.
SELECT
ADDDATE('1970-01-01', t4.i * 10000 + t3.i * 1000 + t2.i * 100 + t1.i * 10 + t0.i) selected_date
FROM
(
SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) t0,
(
SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) t1,
(
SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) t2,
(
SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) t3,
(
SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) t4
In Oracle (before Oracle Database 23ai) you cannot SELECT without a FROM clause; to select a single row of constants, you select from the one-row table DUAL.
If you want to generate dates, you'll write a standard SQL recursive CTE. (And this is the typical approach now in MySQL, too, since version 8.0.)
Here is an example selecting all days for 1970:
with dates (dt) as
(
select date '1970-01-01' from dual
union all
select dt + interval '1' day from dates where dt < date '1970-12-31'
)
select dt from dates;
Here is another way to SELECT a list of dates for the year 1970. Adjust the starting and ending dates if you want different years or the INTERVAL if you want different periods like seconds, minutes, hours…
ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';
with dt (dt, interv) as (
select date '1970-01-01', numtodsinterval(1,'DAY') from dual
union all
select dt.dt + interv, interv from dt
where dt.dt + interv <= date '1970-12-31')
select dt from dt
/
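To experiment with the recursive date generator without an Oracle instance, here is the first query adapted to SQLite via Python's sqlite3 module (an assumption, not Oracle): SQLite needs no DUAL table for a one-row SELECT, and date(dt, '+1 day') plays the role of dt + interval '1' day. The helper name days_between and its parameter names are illustrative.

```python
import sqlite3


def days_between(start_day, end_day):
    """Return every ISO date from start_day to end_day inclusive, via a recursive CTE."""
    sql = """
    WITH RECURSIVE dates(dt) AS (
      SELECT date(:start_day)             -- anchor row; no DUAL needed in SQLite
      UNION ALL
      SELECT date(dt, '+1 day') FROM dates WHERE dt < date(:end_day)
    )
    SELECT dt FROM dates ORDER BY dt
    """
    con = sqlite3.connect(":memory:")
    days = [row[0] for row in con.execute(sql, {"start_day": start_day, "end_day": end_day})]
    con.close()
    return days


days_1970 = days_between("1970-01-01", "1970-12-31")
print(len(days_1970), days_1970[0], days_1970[-1])  # 365 1970-01-01 1970-12-31
```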

Impala list all dates between 2 dates

Can Impala (in HUE) create a column that shows all dates between specified start and end dates?
I want to list a column with date values.
You can use this SQL (date1 and date2 stand for your start and end dates):
select a.Date_Range
from (
select date1 + INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Date_Range
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
where a.Date_Range <= date2
Explanation: you first create a range of numbers (0 to 9999) and add it to date1 to get a range of dates. Then you keep only the dates up to date2.
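Here is the same number-series technique as a small runnable sketch, assuming SQLite through Python's sqlite3 module in place of Impala. Four cross-joined digit tables give offsets 0 to 9999, each offset is added to date1 as a number of days, and only dates up to date2 are kept, so a single call covers at most 10,000 days. The helper name list_dates is hypothetical.

```python
import sqlite3

# Four cross-joined digit tables give offsets 0..9999; each offset is added to
# date1 as days, and only the dates up to date2 are kept.
DATE_RANGE_SQL = """
WITH digits(n) AS (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9))
SELECT date_range
FROM (
  SELECT date(:date1, '+' || (a.n + 10 * b.n + 100 * c.n + 1000 * d.n) || ' days')
         AS date_range
  FROM digits a CROSS JOIN digits b CROSS JOIN digits c CROSS JOIN digits d
) AS r
WHERE date_range <= :date2
ORDER BY date_range
"""


def list_dates(date1, date2):
    """All ISO dates from date1 to date2 inclusive (spans up to 10,000 days)."""
    con = sqlite3.connect(":memory:")
    dates = [row[0] for row in con.execute(DATE_RANGE_SQL, {"date1": date1, "date2": date2})]
    con.close()
    return dates


print(list_dates("2020-02-27", "2020-03-01"))
# ['2020-02-27', '2020-02-28', '2020-02-29', '2020-03-01']
```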

Explode and Count all items from 2 dates column

I would like to get every date (in this case event_day) and the number of events that happen on that date between start_date and end_date. Please look at the table below:
---------------------------------
start_date | end_date | event
---------------------------------
2019-01-01 | 2019-01-04 | A
2019-01-02 | 2019-01-03 | B
2019-01-01 | 2019-01-06 | C
I want a query that returns the event_count for every date. Please see the expected result below:
----------------------------
event_day | event_count
----------------------------
2019-01-01 | 2
2019-01-02 | 3
2019-01-03 | 3
2019-01-04 | 2
2019-01-05 | 1
2019-01-06 | 1
I have read other sources but can only find how to explode the dates between two dates. Any help here? Thanks
You can use a calendar table to solve this:
SELECT date_value AS event_day, COUNT(*) AS event_count
FROM (
SELECT ADDDATE('1970-01-01', t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0) AS date_value
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) calendar INNER JOIN events ON calendar.date_value BETWEEN events.start_date AND events.end_date
WHERE calendar.date_value BETWEEN '2019-01-01' AND '2019-01-04' -- to filter for a specific date range.
GROUP BY date_value
demo on dbfiddle.uk
If you are using PostgreSQL, you can generate a calendar table with generate_series. Basically, you need a calendar table to be able to explode the dates.
WITH a AS(
Select '2019-01-01'::date as start_date ,'2019-01-04'::date as end_date union all
Select '2019-01-02'::date , '2019-01-03'::date union all
Select '2019-01-01'::date, '2019-01-06'::date
)
Select t.date_generated,count(*) as event
from a
JOIN(Select date_generated
from generate_series(date '2019-01-01',
date '2019-12-31',
interval '1 day') as t(date_generated)
) t
ON t.date_generated between a.start_date and a.end_date
group by t.date_generated
order by t.date_generated
select Calendar.Calndr_date , count(Calendar.Calndr_date) count_events
from event_table
join Calendar on
Calendar.Calndr_date between event_table.start_date and event_table.end_date
group by Calendar.Calndr_date
Please create a Calendar table and insert calendar data into it first. Let me know if there are any problems.
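A self-contained version of the calendar-table approach, sketched with SQLite via Python's sqlite3 module (an assumption; the answers above target MySQL, PostgreSQL, or an existing Calendar table). Instead of a permanent calendar table, a recursive CTE builds one day per row from the earliest start_date to the latest end_date, and each day is joined to the events whose range contains it. Dates are stored as ISO 'YYYY-MM-DD' strings, so BETWEEN compares them correctly.

```python
import sqlite3

# Events from the question: (start_date, end_date, event).
EVENTS = [("2019-01-01", "2019-01-04", "A"),
          ("2019-01-02", "2019-01-03", "B"),
          ("2019-01-01", "2019-01-06", "C")]

COUNT_SQL = """
WITH RECURSIVE
  -- Overall date span covered by the events.
  bounds(lo, hi) AS (SELECT MIN(start_date), MAX(end_date) FROM events),
  -- One row per day from lo to hi.
  calendar(date_value) AS (
    SELECT lo FROM bounds
    UNION ALL
    SELECT date(date_value, '+1 day') FROM calendar, bounds
    WHERE date_value < bounds.hi
  )
SELECT c.date_value AS event_day, COUNT(*) AS event_count
FROM calendar c
JOIN events e ON c.date_value BETWEEN e.start_date AND e.end_date
GROUP BY c.date_value
ORDER BY c.date_value
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (start_date TEXT, end_date TEXT, event TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)
event_counts = con.execute(COUNT_SQL).fetchall()
con.close()
print(event_counts[0])  # ('2019-01-01', 2)
```

The result is one row per day from 2019-01-01 to 2019-01-06 with counts 2, 3, 3, 2, 1, 1, matching the expected output in the question.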

Running count distinct over a column - Oracle SQL

I want to aggregate the DAYS column based on a running distinct count of CLIENT_ID, with the catch that a CLIENT_ID already seen on a previous day should not be counted again. How can I do this in Oracle SQL?
Based on the table below (let's call this table DAY_CLIENT):
DAY CLIENT_ID
1 10
1 11
1 12
2 10
2 11
3 10
3 11
3 12
3 13
4 10
I want to get (let's call this table DAY_AGG):
DAYS CNT_CLIENT_ID
1 3
2 3
3 4
4 4
So, on day 1 there are 3 distinct client IDs.
On day 2 there are still 3, because CLIENT_IDs 10 and 11 were already found on day 1. On day 3 the count becomes 4, because CLIENT_ID 13 was not seen on any previous day.
Here's an alternative solution that may or may not be more performant than the other solutions:
WITH your_table AS (SELECT 1 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 13 CLIENT_ID FROM dual UNION ALL
SELECT 4 DAY, 10 CLIENT_ID FROM dual)
SELECT DISTINCT DAY,
COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY DAY) num_distinct_client_ids
FROM (SELECT DAY,
client_id,
row_number() OVER (PARTITION BY client_id ORDER BY DAY) rn
FROM your_table);
DAY NUM_DISTINCT_CLIENT_IDS
---------- -----------------------
1 3
2 3
3 4
4 4
I recommend you test all the solutions against your data to see which one works best for you.
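The first-appearance idea can be checked quickly with SQLite via Python's sqlite3 module (an assumption: window functions need SQLite 3.25 or later). ROW_NUMBER() marks the first day on which each client appears, and a running COUNT of those first appearances, ordered by day, gives the cumulative distinct count.

```python
import sqlite3

# DAY_CLIENT rows from the question: (day, client_id).
DAY_CLIENT = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11),
              (3, 10), (3, 11), (3, 12), (3, 13), (4, 10)]

# rn = 1 marks each client's first day; the running COUNT over days of those
# first appearances is the number of distinct clients seen so far.
RUNNING_SQL = """
SELECT DISTINCT day,
       COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY day) AS cnt_client_id
FROM (SELECT day, client_id,
             ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY day) AS rn
      FROM day_client) AS firsts
ORDER BY day
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE day_client (day INTEGER, client_id INTEGER)")
con.executemany("INSERT INTO day_client VALUES (?, ?)", DAY_CLIENT)
running = con.execute(RUNNING_SQL).fetchall()
con.close()
print(running)  # [(1, 3), (2, 3), (3, 4), (4, 4)]
```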
One approach uses a correlated subquery:
SELECT DISTINCT
d1.DAYS,
(SELECT COUNT(DISTINCT d2.CLIENT_ID) FROM yourTable d2
WHERE d2.DAYS <= d1.DAYS) AS CNT_CLIENT_ID
FROM yourTable d1
I tested this on SQL Server, but it should also run on Oracle.
You could also use the APPLY operator if your Oracle version supports it (CROSS APPLY is available from Oracle 12c onward):
select day, CNT_CLIENT_ID
from DAY_CLIENT t cross apply (
select count(distinct CLIENT_ID) as CNT_CLIENT_ID
from DAY_CLIENT
where day <= t.day) tt
group by day, CNT_CLIENT_ID;
Alternatively, use a correlated subquery:
select day, (select count(distinct CLIENT_ID)
from DAY_CLIENT
where day <= t.day) as CNT_CLIENT_ID
from DAY_CLIENT t
group by day;
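The correlated-subquery form can be run the same way. This is a sketch assuming SQLite via Python's sqlite3 module rather than Oracle; because the subquery is evaluated for each day, it may be slower than a window-function approach on large tables.

```python
import sqlite3

# DAY_CLIENT rows from the question: (day, client_id).
DAY_CLIENT = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11),
              (3, 10), (3, 11), (3, 12), (3, 13), (4, 10)]

# For each day, the scalar subquery counts the distinct clients seen on that
# day or any earlier day.
CORRELATED_SQL = """
SELECT t.day,
       (SELECT COUNT(DISTINCT d.client_id)
        FROM day_client d
        WHERE d.day <= t.day) AS cnt_client_id
FROM day_client t
GROUP BY t.day
ORDER BY t.day
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE day_client (day INTEGER, client_id INTEGER)")
con.executemany("INSERT INTO day_client VALUES (?, ?)", DAY_CLIENT)
correlated = con.execute(CORRELATED_SQL).fetchall()
con.close()
print(correlated)  # [(1, 3), (2, 3), (3, 4), (4, 4)]
```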
Try to keep it simple, always. All the other answers are also good if you want to learn other approaches, but in this case there is no need to be fancy at all.
SELECT days
, COUNT(DISTINCT client_id) cnt
FROM
(
SELECT 1 days, 10 client_id FROM dual --1
UNION ALL
SELECT 1, 11 FROM dual --2
UNION ALL
SELECT 1, 12 FROM dual --3
UNION ALL
SELECT 1, 11 FROM dual --4
UNION ALL
SELECT 2, 10 FROM dual
UNION ALL
SELECT 2, 11 FROM dual
UNION ALL
SELECT 2, 12 FROM dual
UNION ALL
SELECT 3, 10 FROM dual
UNION ALL
SELECT 3, 11 FROM dual
UNION ALL
SELECT 3, 12 FROM dual
UNION ALL
SELECT 3, 13 FROM dual
UNION ALL
SELECT 4, 10 FROM dual
)
GROUP BY days
ORDER BY 1
/
DAYS | CNT
----------------
1 3
2 3
3 4
4 1
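To see how a per-day COUNT(DISTINCT ...) differs from the running count the question asks for, here is a small comparison using the question's DAY_CLIENT rows in SQLite via Python's sqlite3 module (an assumption). The per-day query counts each day independently, so clients seen on earlier days are counted again and day 4 drops back to 1.

```python
import sqlite3

# DAY_CLIENT rows from the question: (day, client_id).
DAY_CLIENT = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11),
              (3, 10), (3, 11), (3, 12), (3, 13), (4, 10)]

# Distinct clients counted separately within each day.
PER_DAY_SQL = """
SELECT day, COUNT(DISTINCT client_id) FROM day_client GROUP BY day ORDER BY day
"""

# Distinct clients seen on that day or any earlier day.
RUNNING_SQL = """
SELECT t.day,
       (SELECT COUNT(DISTINCT d.client_id) FROM day_client d WHERE d.day <= t.day)
FROM day_client t
GROUP BY t.day
ORDER BY t.day
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE day_client (day INTEGER, client_id INTEGER)")
con.executemany("INSERT INTO day_client VALUES (?, ?)", DAY_CLIENT)
per_day = con.execute(PER_DAY_SQL).fetchall()
running = con.execute(RUNNING_SQL).fetchall()
con.close()
print(per_day)  # [(1, 3), (2, 2), (3, 4), (4, 1)]
print(running)  # [(1, 3), (2, 3), (3, 4), (4, 4)]
```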

SQL Monthly Summary

I have a table that contains a startdate for each item
for example:
ID - Startdate
1 - 2011-01-01
2 - 2011-02-01
3 - 2011-04-01
...
I need a query that gives me the count of items within each month, as a full 12-month report. I tried simply grouping by MONTH(StartDate), but this doesn't give me a zero for months with no values (in the case above, March).
So I would like the output to be along the lines of:
Month - Count
1 20
2 14
3 0
...
Any ideas?
Thanks.
SELECT A.Month, ISNULL(B.countvalue,0) Count
FROM (SELECT 1 AS MONTH
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
UNION
SELECT 8
UNION
SELECT 9
UNION
SELECT 10
UNION
SELECT 11
UNION
SELECT 12 ) A LEFT JOIN (SELECT datepart(month,Startdate) AS Month, Count(ID) as countvalue FROM yourTable GROUP BY datepart(month,Startdate))B
ON A.month = B.month
Hope this helps
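A runnable sketch of the same month-list LEFT JOIN, assuming SQLite via Python's sqlite3 module: the twelve months come from a VALUES list instead of a chain of UNIONs, strftime('%m', ...) stands in for datepart(month, ...), and IFNULL plays the role of ISNULL. The table name items is illustrative.

```python
import sqlite3

# Sample rows from the question: (id, startdate).
ITEMS = [(1, "2011-01-01"), (2, "2011-02-01"), (3, "2011-04-01")]

# All twelve months LEFT JOINed to the per-month counts; months with no
# items get a NULL count, which IFNULL turns into 0.
MONTHLY_SQL = """
WITH months(month) AS (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))
SELECT m.month, IFNULL(b.countvalue, 0) AS cnt
FROM months m
LEFT JOIN (SELECT CAST(strftime('%m', startdate) AS INTEGER) AS month,
                  COUNT(id) AS countvalue
           FROM items
           GROUP BY CAST(strftime('%m', startdate) AS INTEGER)) b
  ON m.month = b.month
ORDER BY m.month
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER, startdate TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?)", ITEMS)
month_counts = con.execute(MONTHLY_SQL).fetchall()
con.close()
print(month_counts[:4])  # [(1, 1), (2, 1), (3, 0), (4, 1)]
```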
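Another way to do this uses a recursive CTE, which works in SQL Server 2005+. Oracle (11g Release 2 and later) also supports recursive CTEs, but there you would select the anchor row FROM dual and use EXTRACT(MONTH FROM ...) instead of MONTH().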
SQL Statement
;WITH q (Month) AS (
SELECT 1
UNION ALL
SELECT Month + 1
FROM q
WHERE q.Month < 12
)
SELECT q.Month
, COUNT(i.ID)
FROM q
LEFT OUTER JOIN Input i ON MONTH(i.StartDate) = q.Month
GROUP BY
q.Month
Test script
;WITH Input (ID, StartDate) AS (
SELECT 1, '2011-01-01'
UNION ALL SELECT 2, '2011-02-01'
UNION ALL SELECT 3, '2011-04-01'
)
, q (Month) AS (
SELECT 1
UNION ALL
SELECT Month + 1
FROM q
WHERE q.Month < 12
)
SELECT q.Month
, COUNT(i.ID)
FROM q
LEFT OUTER JOIN Input i ON MONTH(i.StartDate) = q.Month
GROUP BY
q.Month
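The recursive-CTE version can also be tried in SQLite via Python's sqlite3 module (an assumption; SQLite uses strftime('%m', ...) where SQL Server uses MONTH()). The table name items is illustrative.

```python
import sqlite3

# Sample rows from the question: (id, startdate).
ITEMS = [(1, "2011-01-01"), (2, "2011-02-01"), (3, "2011-04-01")]

# q generates months 1..12 recursively; COUNT(i.id) ignores the NULLs that the
# LEFT JOIN produces for months without items, so those months get 0.
RECURSIVE_MONTHS_SQL = """
WITH RECURSIVE q(month) AS (
  SELECT 1
  UNION ALL
  SELECT month + 1 FROM q WHERE month < 12
)
SELECT q.month, COUNT(i.id) AS cnt
FROM q
LEFT JOIN items i ON CAST(strftime('%m', i.startdate) AS INTEGER) = q.month
GROUP BY q.month
ORDER BY q.month
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER, startdate TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?)", ITEMS)
month_counts = con.execute(RECURSIVE_MONTHS_SQL).fetchall()
con.close()
print(month_counts[2])  # (3, 0)
```

Counting i.id rather than COUNT(*) is what makes the empty months show 0: COUNT(*) would count the single NULL-extended row that the LEFT JOIN produces for each such month and report 1.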