Get the last 7 days of data with Hive - hive

I have one table with different dates:
2014-11-20
2014-12-12
2014-11-10
2014-12-13
2014-10-12
2016-01-15
2016-01-14
2016-01-16
2016-01-18
I want the last 7 days of data (max date - 7):
2016-01-15
2016-01-14
2016-01-16
2016-01-18
I have tried the query below, but it did not work:
select * from date_txt1 where DATEDIFF((select max(purchase_date) from date_txt1),(select min(purchase_date) from date_txt1)) <= 7;

In "datediff" as second parameter use just column name:
select * from date_txt1
where DATEDIFF((select max(purchase_date) from date_txt1), purchase_date) <= 7;
Updated: "max" extracted:
with maxDate as (
  select max(purchase_date) as end_date
  from date_txt1
)
select *
from date_txt1, maxDate
where DATEDIFF(maxDate.end_date, purchase_date) <= 7;

select a.purchase_date
from date_txt1 a
cross join ( select max( purchase_date ) as maxdate from date_txt1 ) b
where DATEDIFF( b.maxdate, a.purchase_date ) <= 7
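
A hedged alternative sketch, not from the answers above: assuming Hive 0.11+ window function support, the overall max can be computed per row with an analytic function, avoiding the extra join.
select purchase_date
from (
  select purchase_date,
         max(purchase_date) over () as max_date  -- overall max, computed per row
  from date_txt1
) t
where datediff(max_date, purchase_date) <= 7;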

Related

SQL query to check if there are records in the database of 6 consecutive 'Sundays'

I need to build a query to check if there are records in the database of 6 consecutive 'Sundays'
SELECT DISTINCT ST1.DATAPU, ST1.NUMCAD, TO_CHAR(ST1.DATAPU, 'DAY') AS DIA
FROM SENIOR.R066SIT ST1
WHERE ST1.DATAPU BETWEEN '01/01/22' AND '23/11/22'
AND ST1.NUMCAD = 10
AND TO_CHAR(ST1.DATAPU, 'FMDAY') = 'DOMINGO' -- 'DOMINGO' is SUNDAY in English
ORDER BY ST1.DATAPU ASC
With the query above, I get the matching records (one row per Sunday in the period).
From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern analysis:
SELECT *
FROM (
  SELECT DISTINCT
         TRUNC(DATAPU) AS datapu,
         NUMCAD,
         TO_CHAR(DATAPU,'DAY') AS DIA
  FROM   SENIOR.R066SIT
  WHERE  DATAPU BETWEEN DATE '2022-01-01' AND DATE '2022-11-23'
  AND    NUMCAD = 10
  AND    TRUNC(DATAPU) - TRUNC(DATAPU, 'IW') = 6 -- Sunday
)
MATCH_RECOGNIZE(
  ORDER BY datapu
  ALL ROWS PER MATCH
  PATTERN (first_week consecutive_week{5,})
  DEFINE
    consecutive_week AS PREV(datapu) + INTERVAL '7' DAY = datapu
)
Which, for the sample data:
CREATE TABLE senior.r066sit(numcad, datapu) AS
SELECT 10, DATE '2022-01-01' + LEVEL - 1 FROM DUAL CONNECT BY LEVEL <= 5*7
UNION ALL
SELECT 10, DATE '2022-04-01' + LEVEL - 1 FROM DUAL CONNECT BY LEVEL <= 7*7
UNION ALL
SELECT 10, DATE '2022-08-01' + LEVEL - 1 FROM DUAL CONNECT BY LEVEL <= 7*7;
Outputs:
DATAPU              | NUMCAD | DIA
--------------------+--------+-------
2022-04-03 00:00:00 | 10     | SUNDAY
2022-04-10 00:00:00 | 10     | SUNDAY
2022-04-17 00:00:00 | 10     | SUNDAY
2022-04-24 00:00:00 | 10     | SUNDAY
2022-05-01 00:00:00 | 10     | SUNDAY
2022-05-08 00:00:00 | 10     | SUNDAY
2022-05-15 00:00:00 | 10     | SUNDAY
2022-08-07 00:00:00 | 10     | SUNDAY
2022-08-14 00:00:00 | 10     | SUNDAY
2022-08-21 00:00:00 | 10     | SUNDAY
2022-08-28 00:00:00 | 10     | SUNDAY
2022-09-04 00:00:00 | 10     | SUNDAY
2022-09-11 00:00:00 | 10     | SUNDAY
2022-09-18 00:00:00 | 10     | SUNDAY
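If you only need one summary row per run of 6+ consecutive Sundays rather than every matching row, here is a hedged variant sketch of the MATCH_RECOGNIZE query above (ONE ROW PER MATCH with MEASURES; the measure names are made up and this variant is not part of the original answer):
SELECT *
FROM (
  SELECT DISTINCT
         TRUNC(DATAPU) AS datapu,
         NUMCAD
  FROM   SENIOR.R066SIT
  WHERE  DATAPU BETWEEN DATE '2022-01-01' AND DATE '2022-11-23'
  AND    NUMCAD = 10
  AND    TRUNC(DATAPU) - TRUNC(DATAPU, 'IW') = 6 -- Sunday
)
MATCH_RECOGNIZE(
  ORDER BY datapu
  MEASURES FIRST(datapu) AS first_sunday,   -- first Sunday of the run
           LAST(datapu)  AS last_sunday,    -- last Sunday of the run
           COUNT(*)      AS sundays_in_run  -- length of the run
  ONE ROW PER MATCH
  PATTERN (first_week consecutive_week{5,})
  DEFINE
    consecutive_week AS PREV(datapu) + INTERVAL '7' DAY = datapu
);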
Before Oracle 12, you can use multiple analytic functions in nested sub-queries:
SELECT datapu, numcad,
       TO_CHAR(datapu, 'fmDAY') AS dia
FROM (
  SELECT datapu, numcad,
         COUNT(*) OVER (PARTITION BY grp) AS grp_size
  FROM (
    SELECT datapu, numcad,
           SUM(consecutive) OVER (ORDER BY datapu) AS grp
    FROM (
      SELECT datapu, numcad,
             CASE datapu - LAG(datapu) OVER (ORDER BY datapu)
               WHEN 7
               THEN 0
               ELSE 1
             END AS consecutive
      FROM (
        SELECT DISTINCT
               TRUNC(DATAPU) AS datapu,
               NUMCAD
        FROM   SENIOR.R066SIT
        WHERE  DATAPU BETWEEN DATE '2022-01-01' AND DATE '2022-11-23'
        AND    NUMCAD = 10
        AND    TRUNC(DATAPU) - TRUNC(DATAPU, 'IW') = 6 -- Sunday
      )
    )
  )
)
WHERE grp_size >= 6;
fiddle

How to fill the time gap after grouping date records by month in Postgres

I have table records as follows:
date n_count
2020-02-19 00:00:00 4
2020-07-14 00:00:00 1
2020-07-17 00:00:00 1
2020-07-30 00:00:00 2
2020-08-03 00:00:00 1
2020-08-04 00:00:00 2
2020-08-25 00:00:00 2
2020-09-23 00:00:00 2
2020-09-30 00:00:00 3
2020-10-01 00:00:00 11
2020-10-05 00:00:00 12
2020-10-19 00:00:00 1
2020-10-20 00:00:00 1
2020-10-22 00:00:00 1
2020-11-02 00:00:00 376
2020-11-04 00:00:00 72
2020-11-11 00:00:00 1
I want to group all the records by month to get a monthly total count, which is working, but some months are missing. How do I fill this gap?
time month_count
"2020-02-01" 4
"2020-07-01" 4
"2020-08-01" 5
"2020-09-01" 5
"2020-10-01" 26
"2020-11-01" 449
This is what I have tried.
SELECT (date_trunc('month', date))::date AS time,
sum(n_count) as month_count
FROM table1
group by time
order by time asc
You can use generate_series() to generate all month starts between the earliest and latest dates available in the table, then bring in the table with a left join:
select d.dt, coalesce(sum(t.n_count), 0) as month_count
from (
select generate_series(date_trunc('month', min(date)), date_trunc('month', max(date)), '1 month') as dt
from table1
) as d(dt)
left join table1 t on t.date >= d.dt and t.date < d.dt + interval '1 month'
group by d.dt
order by d.dt
I would simply UNION a date series, generated from MIN and MAX date:
demo:db<>fiddle
WITH cte AS ( -- 1
SELECT
*,
date_trunc('month', date)::date AS time
FROM
t
)
SELECT
time,
SUM(n_count) as month_count --3
FROM (
SELECT
time,
n_count
FROM cte
UNION
SELECT -- 2
generate_series(
(SELECT MIN(time) FROM cte),
(SELECT MAX(time) FROM cte),
interval '1 month'
)::date,
0
) s
GROUP BY time
ORDER BY time
1. Use a CTE to calculate date_trunc only once. It could be left out if you prefer to reference your table twice in the UNION below.
2. Generate a monthly date series from the MIN to the MAX date, with n_count = 0, and add it to the table via UNION.
3. Do your aggregation.

SQL Select only missing months

Notice the 2017-04-01, 2018-02-01, 2018-07-01, and 2019-01-01 months are missing in the output. I want to show only those months which are missing. Does anyone know how to go about this?
Query:
SELECT TO_DATE("Month", 'mon''yy') as dates FROM sample_sheet
group by dates
order by dates asc;
Output:
2017-01-01
2017-02-01
2017-03-01
2017-05-01
2017-06-01
2017-07-01
2017-08-01
2017-09-01
2017-10-01
2017-11-01
2017-12-01
2018-01-01
2018-03-01
2018-04-01
2018-05-01
2018-06-01
2018-08-01
2018-09-01
2018-10-01
2018-11-01
2018-12-01
2019-02-01
2019-03-01
2019-04-01
I don't know Vertica, so I wrote a working proof of concept in Microsoft SQL Server and tried to convert it to Vertica syntax based on the online documentation.
It should look like this:
with
months as (
select 2017 as date_year, 1 as date_month, to_date('2017-01-01', 'YYYY-MM-DD') as first_date, to_date('2017-01-31', 'yyyy-mm-dd') as last_date
union all
select
year(add_months(first_date, 1)) as date_year,
month(add_months(first_date, 1)) as date_month,
add_months(first_date, 1) as first_date,
last_day(add_months(first_date, 1)) as last_date
from months
where first_date < current_date
),
sample_dates (a_date) as (
select to_date('2017-01-15', 'YYYY-MM-DD') union all
select to_date('2017-01-22', 'YYYY-MM-DD') union all
select to_date('2017-02-01', 'YYYY-MM-DD') union all
select to_date('2017-04-15', 'YYYY-MM-DD') union all
select to_date('2017-06-15', 'YYYY-MM-DD')
)
select *
from sample_dates right join months on sample_dates.a_date between first_date and last_date
where sample_dates.a_date is null
months is a recursive CTE that holds all months since 2017-01, with the first and last day of each month. sample_dates is just a list of dates to test the logic - you should replace it with your own table.
Once you build that monthly calendar table, all you need to do is check your dates against it with an outer join to see which months have no date falling between the first_date and last_date columns.
You can build a TIMESERIES of all dates between the first and the last input date (the highest granularity of a TIMESERIES is the day), filter out only the first days of each month, then left join that generated sequence of month firsts with your input and keep the rows where the join fails, i.e. where the input side of the join is NULL:
WITH
-- your input
input(mth1st) AS (
SELECT DATE '2017-01-01'
UNION ALL SELECT DATE '2017-02-01'
UNION ALL SELECT DATE '2017-03-01'
UNION ALL SELECT DATE '2017-05-01'
UNION ALL SELECT DATE '2017-06-01'
UNION ALL SELECT DATE '2017-07-01'
UNION ALL SELECT DATE '2017-08-01'
UNION ALL SELECT DATE '2017-09-01'
UNION ALL SELECT DATE '2017-10-01'
UNION ALL SELECT DATE '2017-11-01'
UNION ALL SELECT DATE '2017-12-01'
UNION ALL SELECT DATE '2018-01-01'
UNION ALL SELECT DATE '2018-03-01'
UNION ALL SELECT DATE '2018-04-01'
UNION ALL SELECT DATE '2018-05-01'
UNION ALL SELECT DATE '2018-06-01'
UNION ALL SELECT DATE '2018-08-01'
UNION ALL SELECT DATE '2018-09-01'
UNION ALL SELECT DATE '2018-10-01'
UNION ALL SELECT DATE '2018-11-01'
UNION ALL SELECT DATE '2018-12-01'
UNION ALL SELECT DATE '2019-02-01'
UNION ALL SELECT DATE '2019-03-01'
UNION ALL SELECT DATE '2019-04-01'
)
,
-- need a series of month's firsts
-- TIMESERIES works for INTERVAL DAY TO SECOND
-- so build that timeseries, and filter out
-- the month's firsts
limits(mth1st) AS (
SELECT MIN(mth1st) FROM input
UNION ALL SELECT MAX(mth1st) FROM input
)
,
alldates AS (
SELECT dt::DATE FROM limits
TIMESERIES dt AS '1 day' OVER(ORDER BY mth1st::TIMESTAMP)
)
,
allfirsts(mth1st) AS (
SELECT dt FROM alldates WHERE DAY(dt)=1
)
SELECT
allfirsts.mth1st
FROM allfirsts
LEFT JOIN input USING(mth1st)
WHERE input.mth1st IS NULL;
-- out mth1st
-- out ------------
-- out 2017-04-01
-- out 2018-02-01
-- out 2018-07-01
-- out 2019-01-01

Pull out most non overlapping date range

Sorry, going to start over and try to explain from the start:
I have a small list of dates:
date mark
08-16-2016 1
08-17-2016 1
01-03-2017 1
02-16-2018 1
02-17-2018 1
From here I need to find out whether, within a 3-year period, there are 2 continuous years with fewer than 3 marks. I'm looking over the date range 2016-08-01 to 2019-08-01.
So I set up the following query:
with initData as(
select date('2016-08-16') stamp, 1 mark from sysibm.sysdummy1
union select date('2016-08-17') stamp, 1 mark from sysibm.sysdummy1
union select date('2017-01-03') stamp, 1 mark from sysibm.sysdummy1
union select date('2018-02-16') stamp, 1 mark from sysibm.sysdummy1
union select date('2018-02-17') stamp, 1 mark from sysibm.sysdummy1
)
select * from(
select
a.startDate, a.endDate, coalesce(sum(b.mark),0) as mark
from(
select startDate, endDate from(
select stamp startDate, stamp+1 YEAR endDate
from(
select stamp + ym YEAR stamp
from(
select date('2016-08-01') stamp from sysibm.sysdummy1
union
select stamp from initData
union
select stamp+1 DAY from initData
),
(
select 0 as ym from sysibm.sysdummy1
union select 1 as ym from sysibm.sysdummy1
union select 2 as ym from sysibm.sysdummy1
)
)
)
where endDate <= date('2019-08-01')
) a
left outer join(
select stamp, mark from initData
) b
on b.stamp >= a.startDate
and b.stamp < a.endDate
group by a.startDate, a.endDate
)
where mark < 3
order by startDate, endDate
This gives me the list of ranges I'm looking for which have fewer than 3 marks. Now I need to find full years that don't overlap with each other.
2016-08-17 2017-08-17 2
2016-08-18 2017-08-18 1
2017-01-03 2018-01-03 1
2017-01-04 2018-01-04 0
2017-08-01 2018-08-01 2
2017-08-16 2018-08-16 2
2017-08-17 2018-08-17 2
2017-08-18 2018-08-18 2
2018-01-03 2019-01-03 2
2018-01-04 2019-01-04 2
2018-02-16 2019-02-16 2
2018-02-17 2019-02-17 1
2018-02-18 2019-02-18 0
2018-08-01 2019-08-01 0
I have finally come up with a solution, but it seems a bit slow, and it feels like there should be a better way to do it:
with initData as(
select date('2016-08-16') stamp, 1 mark from sysibm.sysdummy1
union select date('2016-08-17') stamp, 1 mark from sysibm.sysdummy1
union select date('2017-01-03') stamp, 1 mark from sysibm.sysdummy1
union select date('2018-02-16') stamp, 1 mark from sysibm.sysdummy1
union select date('2018-02-17') stamp, 1 mark from sysibm.sysdummy1
), dateRanges as(
select startDate, endDate, mark, row_number() over (order by startDate, endDate) rn from(
select
a.startDate, a.endDate, coalesce(sum(b.mark),0) as mark
from(
select startDate, endDate from(
select stamp startDate, stamp+1 YEAR endDate
from(
select stamp + ym YEAR stamp
from(
select date('2016-08-01') stamp from sysibm.sysdummy1
union
select stamp from initData
union
select stamp+1 DAY from initData
),
(
select 0 as ym from sysibm.sysdummy1
union select 1 as ym from sysibm.sysdummy1
union select 2 as ym from sysibm.sysdummy1
)
)
)
where endDate <= date('2019-08-01')
) a
left outer join(
select stamp, mark from initData
) b
on b.stamp >= a.startDate
and b.stamp < a.endDate
group by a.startDate, a.endDate
)
where mark < 3
), dateRangeLimit1 as(
select
a.startDate, a.endDate, a.mark, row_number() over (order by a.startDate, a.endDate) rn
from dateRanges a
left outer join dateRanges b
on a.startDate < b.endDate
and b.rn = 1
and a.rn != b.rn
where b.rn is null
)
select a.* from dateRangeLimit1 a
left outer join dateRangeLimit1 b
on a.startDate < b.endDate
and b.rn = 2 and a.rn <> b.rn and a.rn != 1
where b.rn is null
This gives me back my expected date ranges that don't overlap with each other:
2016-08-17 2017-08-17 2 1
2017-08-17 2018-08-17 2 2
I hope this makes a bit more sense.
I'm not sure your data is quite right, but nonetheless does this help?
WITH D(F,T) AS (VALUES
('2016-08-09','2017-08-09')
,('2016-08-16','2017-08-16')
,('2016-08-17','2017-08-17')
,('2016-08-18','2017-08-18')
,('2017-08-09','2018-08-09')
,('2017-08-16','2018-08-16')
,('2017-08-17','2018-08-17')
,('2017-08-18','2018-08-18')
,('2018-02-16','2019-02-16')
,('2018-02-17','2019-02-17')
,('2018-02-18','2019-02-18')
,('2018-08-09','2019-08-09')
)
SELECT F,T FROM
(
SELECT F,T
, LEAD(F,1) OVER(ORDER BY F ASC) AS NEXT_F
, LAG( T,1) OVER(ORDER BY F ASC) AS PREV_T
FROM D
)
WHERE T >= NEXT_F
OR F <= PREV_T
from dual apparently points to ORACLE.
Find the longest path of non-overlapping (end = start considered non-overlapping) intervals
select level, sys_connect_by_path (startDate || ' .. ' || endDate, '/') path
from blah a
connect by (prior startDate < startDate) and not(prior startDate < endDate and startDate < prior endDate)
order by level desc
-- fetch is 12c+ feature
fetch next 1 rows only;
Using the sample data, this returns:
3 /09-AUG-16 .. 09-AUG-17/09-AUG-17 .. 09-AUG-18/09-AUG-18 .. 09-AUG-19
Fiddle

How to combine multiple SELECTs into a single SELECT by a common column in (BigQuery) SQL?

Given that I have multiple tables in BigQuery, I have multiple SQL statements that give me "the number of X per day". For example:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as installs
FROM database.table1
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | installs |
-------------------------
| 2017-01-01 | 11 |
| 2017-01-02 | 22 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as uninstalls
FROM database.table2
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | uninstalls |
---------------------------
| 2017-01-02 | 22 |
| 2017-01-03 | 33 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as cases
FROM database.table3
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | cases |
----------------------
| 2017-01-01 | 11 |
| 2017-01-03 | 33 |
etc
etc
Now I need to combine all these into a single SELECT statement that gives the following results:
| day | installs | uninstalls | cases |
----------------------------------------------
| 2017-01-01 | 11 | 0 | 11 |
| 2017-01-02 | 22 | 22 | 0 |
| 2017-01-03 | 0 | 33 | 33 |
etc
Is this even possible?
Or what's the closest SQL-statement I can write that would give me a similar result?
Any feedback is appreciated!
Here is a self-contained example that might help to get you started. It uses two dummy tables, InstallEvents and UninstallEvents, which contain timestamps for the respective actions. It creates a common table expression called StartAndEnd that computes the minimum and maximum dates for these events in order to decide which dates to aggregate over, then unions the contents of the InstallEvents and UninstallEvents, counting the events for each day.
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
),
StartAndEnd AS (
SELECT MIN(DATE(timestamp)) AS min_date, MAX(DATE(timestamp)) AS max_date
FROM (
SELECT * FROM InstallEvents UNION ALL
SELECT * FROM UninstallEvents
)
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(
(SELECT min_date FROM StartAndEnd),
(SELECT max_date FROM StartAndEnd)
)) AS day
GROUP BY day
ORDER BY day;
If you know what the start and end dates are in advance, you can hard-code them in the query instead and then omit the StartAndEnd CTE:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-04')) AS day
GROUP BY day
ORDER BY day;
To see the events in the sample data, use a query that unions the contents:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT timestamp, true AS is_install
FROM InstallEvents UNION ALL
SELECT timestamp, false
FROM UninstallEvents;
Below is for BigQuery Standard SQL
#standardSQL
WITH calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
-- ORDER BY day
Please note: you are using timestamp as a column name, which is not best practice since it is a keyword. In my example I keep your naming, but consider changing it!
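As a purely hypothetical illustration of that point (the alias name is assumed, not from the question), one way to contain the naming issue is to quote the column once and alias it:
SELECT `timestamp` AS event_ts,      -- back-tick quote the keyword-like name once
       DATE(`timestamp`) AS day      -- then work with the alias as usual
FROM `database.table1`;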
You can test / play with this solution using the dummy data below:
#standardSQL
WITH `database.table1` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs
UNION ALL SELECT TIMESTAMP '2017-01-01', 22
),
`database.table2` AS (
SELECT TIMESTAMP '2016-12-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
`database.table3` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-10', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
ORDER BY day
I am not very familiar with BigQuery, so this is probably not going to be a copy-paste answer.
You'll first have to build a calendar table to make sure you have all dates. Here's an example for SQL Server; there are probably examples for BigQuery available as well. The following assumes a Calendar table with a Date attribute stored as a timestamp.
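For BigQuery specifically, here is a hedged sketch of how such a calendar could be generated inline with GENERATE_DATE_ARRAY (the date bounds are hard-coded assumptions, and the Calendar name simply mirrors the query below):
-- Hypothetical sketch: a Calendar CTE covering a fixed date range.
WITH Calendar AS (
  SELECT TIMESTAMP(d) AS Date   -- stored as TIMESTAMP to match the join below
  FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-12-31')) AS d
)
SELECT Date
FROM Calendar
ORDER BY Date;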
Once you have your calendar table, you can join all your tables to it:
SELECT FORMAT_TIMESTAMP("%F",C.Date) AS day
, COUNT(T1.DATE(T1.TIMESTAMP)) AS installs --Here you could also use your FORMAT_TIMESTAMP
, COUNT(T1.DATE(T2.TIMESTAMP)) AS uninstalls
FROM Calander C
LEFT JOIN database.table1 T1
ON DATE(T1.TIMESTAMP) = DATE(C.Date) --Convert to date to remove times, you could also use your FORMAT_TIMESTAMP
LEFT JOIN database.table2 T2
ON DATE(T2.TIMESTAMP) = DATE(C.Date)
GROUP BY day
ORDER BY day ASC