Explode and count all items from two date columns - SQL

I would like to get every possible date (in this case: event_day) and the number of events that happen between start_date and end_date. Please see the table below:
---------------------------------
start_date | end_date | event
---------------------------------
2019-01-01 | 2019-01-04 | A
2019-01-02 | 2019-01-03 | B
2019-01-01 | 2019-01-06 | C
and I want a query that returns the number of events (event_count) for every date. Please see the following result:
----------------------------
event_day | event_count
----------------------------
2019-01-01 | 2
2019-01-02 | 3
2019-01-03 | 3
2019-01-04 | 2
2019-01-05 | 1
2019-01-06 | 1
I have read other sources, but could only find how to explode the dates between two date columns. Any help here? Thanks.

You can use a calendar table to solve this:
SELECT date_value AS event_day, COUNT(*) AS event_count
FROM (
SELECT ADDDATE('1970-01-01', t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0) AS date_value
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) calendar INNER JOIN events ON calendar.date_value BETWEEN events.start_date AND events.end_date
WHERE calendar.date_value BETWEEN '2019-01-01' AND '2019-01-06' -- optional: filter to the date range of interest
GROUP BY date_value
demo on dbfiddle.uk

If you are using Postgres, you can generate a calendar table with generate_series; essentially, you need a calendar table to be able to explode the dates.
WITH a AS (
    SELECT '2019-01-01'::date AS start_date, '2019-01-04'::date AS end_date UNION ALL
    SELECT '2019-01-02'::date, '2019-01-03'::date UNION ALL
    SELECT '2019-01-01'::date, '2019-01-06'::date
)
SELECT t.date_generated, COUNT(*) AS event
FROM a
JOIN (SELECT date_generated
      FROM generate_series(date '2019-01-01',
                           date '2019-12-31',
                           interval '1 day') AS t(date_generated)
     ) t
  ON t.date_generated BETWEEN a.start_date AND a.end_date
GROUP BY t.date_generated
ORDER BY t.date_generated
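The CTE a above only simulates the events table from the question. With a real table (here assumed to be called events, with the same start_date/end_date columns), the same idea would look roughly like this:
SELECT t.date_generated, COUNT(*) AS event_count
FROM events a
JOIN (SELECT date_generated
      FROM generate_series(date '2019-01-01',
                           date '2019-12-31',
                           interval '1 day') AS t(date_generated)
     ) t
  ON t.date_generated BETWEEN a.start_date AND a.end_date
GROUP BY t.date_generated
ORDER BY t.date_generated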

select Calendar.Calndr_date , count(Calendar.Calndr_date) count_events
from event_table
join Calendar on
Calendar.Calndr_date between event_table.start_date and event_table.end_date
group by Calendar.Calndr_date
You will need to create a calendar table and populate it with dates first; please let me know if you run into any problems.
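A minimal sketch of creating and populating such a table (MySQL-style syntax, reusing the digit-table trick from the first answer; the Calendar/Calndr_date names are chosen to match the query above, and the start date and number of days are only examples):
CREATE TABLE Calendar (Calndr_date DATE PRIMARY KEY);

INSERT INTO Calendar (Calndr_date)
SELECT ADDDATE('2019-01-01', t1 * 10 + t0)  -- 100 consecutive days starting 2019-01-01
FROM (SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
     (SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1;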

Related

Converting monthly to daily data

I have monthly data that I would like to transform into daily data. The data looks like this (the extraction_dt column is a date):
-----------------------------------
isin | extraction_dt | yield
-----------------------------------
001  | 2013-01-31    | 100
001  | 2013-02-28    | 110
001  | 2013-03-31    | 105
...  | ...           | ...
002  | 2013-01-31    | 200
...  | ...           | ...
And I would like to have something like this
-----------------------------------
isin | extraction_dt | yield
-----------------------------------
001  | 2013-01-01    | 100
001  | 2013-01-02    | 100
001  | 2013-01-03    | 100
...  | ...           | ...
001  | 2013-02-01    | 110
...  | ...           | ...
I tried the following code but it does not work. I get the error message AnalysisException: Could not resolve table reference: 'cte'. How would you convert monthly to daily data?
with cte as
(select isin, extraction_dt, yield
from datashop
union all
select isin, extraction_dt, dateadd(d, 1, extraction_dt) AS date_dt, yield
from cte
where datediff(m,date_dt,dateadd(d, 1, date_dt))=0
)
select isin, date_dt,
1.0*isin / count(*) over (partition by isin, date_dt) AS daily_yield
from cte
order by 1,2
I can suggest an easy solution:
generate a date series, then
match it with your data so each monthly row gets repeated for every day of its month.
Here is the SQL you can use in Impala.
select isin, extraction_dt, a.dt AS date_dt, yield
from
datashop d,
(
select now() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as dt
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
WHERE
from_timestamp(a.dt,'yyyy/MM') =from_timestamp(d.extraction_dt,'yyyy/MM')
order by 1,2,3
The derived table aliased a generates a series of dates.
The WHERE clause restricts them to the month of extraction_dt, so you get every date within that month.
The ORDER BY just makes the output easier to read.
Your WITH clause has a recursive (self-referencing) query. In most SQL dialects, this requires using WITH RECURSIVE, not plain WITH. According to the Impala SQL reference, Impala does not support recursive common table expressions:
The Impala WITH clause does not support recursive queries in the
WITH, which is supported in some other database systems.
In other words, you cannot do this in Impala.
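For reference, in a dialect that does support recursion (PostgreSQL syntax below; the datashop table and column names are taken from the question, and this sketch is untested), the intent could be written roughly as:
WITH RECURSIVE cte AS (
    -- anchor: start each monthly row at the first day of its month
    SELECT isin, date_trunc('month', extraction_dt)::date AS date_dt, extraction_dt, yield
    FROM datashop
    UNION ALL
    -- recursive step: add one day until the original month-end date is reached
    SELECT isin, (date_dt + INTERVAL '1 day')::date, extraction_dt, yield
    FROM cte
    WHERE date_dt < extraction_dt
)
SELECT isin, date_dt, yield
FROM cte
ORDER BY isin, date_dt;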

Impala list all dates between 2 dates

Can HUE Impala create a column which shows all dates between specified start and end dates?
I want to list a column with date values.
You can use this SQL:
select a.Date_Range
from (
select date1 + INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Date_Range
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
where a.Date_Range <= date2
Explanation -
You first create a range of numbers, add it to date1 (the start date) to get a range of dates, and then keep only the dates up to date2 (the end date).
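A usage sketch with hypothetical literals substituted for date1 and date2 (the dates and the number of digit tables here are only examples; untested):
select a.Date_Range
from (
  select cast('2019-01-01' as timestamp) + INTERVAL (a.a + (10 * b.a)) DAY as Date_Range
  from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
  cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
) a
where a.Date_Range <= cast('2019-01-10' as timestamp)
order by a.Date_Range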

Calendar in SparkSQL

I would like to create a calendar table using this query (it works in common SQL):
SELECT DATEADD(day,t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0,'1970-01-01') AS date_value
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
but when I migrate this to SparkSQL and make some modifications (the date_add function), it always fails with the syntax error: missing ')' at 'SELECT'.
Any help? Thanks.
This works:
SELECT 10*t1.t1+t0.t0 id, DATE_ADD('1970-01-01', 10*t1.t1+t0.t0) AS date_value
FROM
(SELECT 0 t0 UNION SELECT 1 t0 UNION SELECT 2 t0 UNION SELECT 3 t0 UNION SELECT 4 t0 UNION SELECT 5 t0 UNION SELECT 6 t0 UNION SELECT 7 t0 UNION SELECT 8 t0 UNION SELECT 9 t0) t0,
(SELECT 0 t1 UNION SELECT 1 t1 UNION SELECT 2 t1 UNION SELECT 3 t1 UNION SELECT 4 t1 UNION SELECT 5 t1 UNION SELECT 6 t1 UNION SELECT 7 t1 UNION SELECT 8 t1 UNION SELECT 9 t1) t1
Result:
+-----+-------------+--+
| id | date_value |
+-----+-------------+--+
| 11 | 1970-01-12 |
| 61 | 1970-03-03 |
| 31 | 1970-02-01 |
| 51 | 1970-02-21 |
| 41 | 1970-02-11 |
...
| 80 | 1970-03-22 |
| 90 | 1970-04-01 |
| 70 | 1970-03-12 |
| 0 | 1970-01-01 |
+-----+-------------+--+
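The working query above differs from the original mainly in using Spark's DATE_ADD(start_date, num_days) and giving every UNION branch an explicit column alias. A sketch extending the same pattern to the full five-digit calendar from the question (untested against any specific Spark version):
SELECT DATE_ADD('1970-01-01', 10000*t4.t4 + 1000*t3.t3 + 100*t2.t2 + 10*t1.t1 + t0.t0) AS date_value
FROM
(SELECT 0 t0 UNION SELECT 1 t0 UNION SELECT 2 t0 UNION SELECT 3 t0 UNION SELECT 4 t0 UNION SELECT 5 t0 UNION SELECT 6 t0 UNION SELECT 7 t0 UNION SELECT 8 t0 UNION SELECT 9 t0) t0,
(SELECT 0 t1 UNION SELECT 1 t1 UNION SELECT 2 t1 UNION SELECT 3 t1 UNION SELECT 4 t1 UNION SELECT 5 t1 UNION SELECT 6 t1 UNION SELECT 7 t1 UNION SELECT 8 t1 UNION SELECT 9 t1) t1,
(SELECT 0 t2 UNION SELECT 1 t2 UNION SELECT 2 t2 UNION SELECT 3 t2 UNION SELECT 4 t2 UNION SELECT 5 t2 UNION SELECT 6 t2 UNION SELECT 7 t2 UNION SELECT 8 t2 UNION SELECT 9 t2) t2,
(SELECT 0 t3 UNION SELECT 1 t3 UNION SELECT 2 t3 UNION SELECT 3 t3 UNION SELECT 4 t3 UNION SELECT 5 t3 UNION SELECT 6 t3 UNION SELECT 7 t3 UNION SELECT 8 t3 UNION SELECT 9 t3) t3,
(SELECT 0 t4 UNION SELECT 1 t4 UNION SELECT 2 t4 UNION SELECT 3 t4 UNION SELECT 4 t4 UNION SELECT 5 t4 UNION SELECT 6 t4 UNION SELECT 7 t4 UNION SELECT 8 t4 UNION SELECT 9 t4) t4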

Running count distinct over a column - Oracle SQL

I want to aggregate the DAYS column based on a running distinct count of CLIENT_ID, but the catch is that CLIENT_IDs already seen on previous DAYS should not be counted again. How can I do this in Oracle SQL?
Based on the table below (let's call this table DAY_CLIENT):
DAY CLIENT_ID
1 10
1 11
1 12
2 10
2 11
3 10
3 11
3 12
3 13
4 10
I want to get (let's call this table DAY_AGG):
DAYS CNT_CLIENT_ID
1 3
2 3
3 4
4 4
So, on day 1 there are 3 distinct client IDs.
On day 2 there are still 3, because CLIENT_IDs 10 and 11 were already seen on day 1. On day 3 the distinct count becomes 4, because CLIENT_ID 13 had not appeared on any previous day.
Here's an alternative solution that may or may not be more performant than the other solutions:
WITH your_table AS (SELECT 1 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 13 CLIENT_ID FROM dual UNION ALL
SELECT 4 DAY, 10 CLIENT_ID FROM dual)
SELECT DISTINCT DAY,
COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY DAY) num_distinct_client_ids
FROM (SELECT DAY,
client_id,
row_number() OVER (PARTITION BY client_id ORDER BY DAY) rn
FROM your_table);
DAY NUM_DISTINCT_CLIENT_IDS
---------- -----------------------
1 3
2 3
3 4
4 4
I recommend you test all the solutions against your data to see which one works best for you.
One approach uses a correlated subquery:
SELECT DISTINCT
d1.DAYS,
(SELECT COUNT(DISTINCT d2.CLIENT_ID) FROM yourTable d2
WHERE d2.DAYS <= d1.DAYS) AS CNT_CLIENT_ID
FROM yourTable d1
Here is a demo below for SQL Server, but it should also run on Oracle; I always struggle with setting up Oracle demos.
Demo
You could also use the APPLY operator, if your Oracle version supports it:
select day, CNT_CLIENT_ID
from DAY_CLIENT t cross apply (
select count(distinct CLIENT_ID) as CNT_CLIENT_ID
from DAY_CLIENT
where day <= t.day) tt
group by day, CNT_CLIENT_ID;
Another way is the correlated subquery approach:
select day, (select count(distinct CLIENT_ID)
from DAY_CLIENT
where day <= t.day) as DAY_CLIENT
from DAY_CLIENT t
group by day;
Try to keep it simple, always. The other answers are also good if you want to learn other approaches, but in this case there is no need to be fancy at all.
SELECT days
, COUNT(DISTINCT client_id) cnt
FROM
(
SELECT 1 days, 10 client_id FROM dual --1
UNION ALL
SELECT 1, 11 FROM dual --2
UNION ALL
SELECT 1, 12 FROM dual --3
UNION ALL
SELECT 1, 11 FROM dual --4
UNION ALL
SELECT 2, 10 FROM dual
UNION ALL
SELECT 2, 11 FROM dual
UNION ALL
SELECT 2, 12 FROM dual
UNION ALL
SELECT 3, 10 FROM dual
UNION ALL
SELECT 3, 11 FROM dual
UNION ALL
SELECT 3, 12 FROM dual
UNION ALL
SELECT 3, 13 FROM dual
UNION ALL
SELECT 4, 10 FROM dual
)
GROUP BY days
ORDER BY 1
/
DAYS | CNT
----------------
1 3
2 3
3 4
4 1

How do I select records with the max id column value when two of three other fields are identical

I have a table that stores costs for consumables.
consumable_cost_id consumable_type_id from_date cost
1 1 01/01/2000 £10.95
2 2 01/01/2000 £5.95
3 3 01/01/2000 £1.98
24 3 01/11/2013 £2.98
27 3 22/11/2013 £3.98
33 3 22/11/2013 £4.98
34 3 22/11/2013 £5.98
35 3 22/11/2013 £6.98
If the same consumable is updated more than once on the same day, I would like to select only the row with the biggest consumable_cost_id for that day. The desired output would be:
consumable_cost_id consumable_type_id from_date cost
1 1 01/01/2000 £10.95
2 2 01/01/2000 £5.95
3 3 01/01/2000 £1.98
24 3 01/11/2013 £2.98
35 3 22/11/2013 £6.98
Edit:
Here is my attempt (adapted from another post I found on here):
SELECT cc.*
FROM
consumable_costs cc
INNER JOIN
(
SELECT
from_date,
MAX(consumable_cost_id) AS MaxCcId
FROM consumable_costs
GROUP BY from_date
) groupedcc
ON cc.from_date = groupedcc.from_date
AND cc.consumable_cost_id = groupedcc.MaxCcId
You were very close. This seems to work for me:
SELECT cc.*
FROM
consumable_cost AS cc
INNER JOIN
(
SELECT
Max(consumable_cost_id) AS max_id,
consumable_type_id,
from_date
FROM consumable_cost
GROUP BY consumable_type_id, from_date
) AS m
ON cc.consumable_cost_id = m.max_id
SELECT * FROM consumable_cost
GROUP by consumable_type_id, from_date
ORDER BY cost DESC;
Assuming consumable_cost_id is unique.
SELECT * FROM T t1
WHERE EXISTS(
SELECT t2.consumable_type_id, t2.from_date FROM T t2
GROUP by t2.consumable_type_id, t2.from_date
HAVING MAX(t2.consumable_cost_id) = t1.consumable_cost_id);
Because of a comment that this was returning an incorrect result, I created a test query for Oracle that shows the query works. As I said, it is for Oracle, but there is really no reason why this should not work in MS Access; the only Oracle-specific part is FROM DUAL, used to generate the virtual data.
WITH T AS
(
SELECT 1 AS consumable_cost_id,1 AS consumable_type_id, TO_DATE('01/01/2000','DD/MM/YYYY') AS FROM_DATE, '£10.95' AS COST FROM DUAL
UNION ALL
SELECT 2,2,TO_DATE('01/01/2000','DD/MM/YYYY'),'£5.95' FROM DUAL
UNION ALL
SELECT 3,3,TO_DATE('01/01/2000','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 24,3,TO_DATE('01/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 27,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 33,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 34,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 35,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
)
SELECT * FROM T t1
WHERE EXISTS(
SELECT t2.consumable_type_id, t2.from_date FROM T t2
GROUP by t2.consumable_type_id, t2.from_date
HAVING MAX(t2.consumable_cost_id) = t1.consumable_cost_id);
Result:
1 1 01-JAN-00 £10.95
2 2 01-JAN-00 £5.95
3 3 01-JAN-00 £1.98
24 3 01-NOV-13 £1.98
35 3 22-NOV-13 £1.98