Outer query column cannot be used in ClickHouse SELECT

I'm trying to select the rows whose accept_date is the maximum accept_date for that row's metric, i.e. each row should be compared against the max of its own metric. The following SQL does this correctly in PostgreSQL:
DROP TABLE IF EXISTS test_max_date;
CREATE TABLE IF NOT EXISTS test_max_date
(
timestamp date,
value double precision,
metric text,
accept_date date
);
INSERT INTO test_max_date (timestamp, value, metric, accept_date)
VALUES ('1990-01-01', 1.1, 'foo', '1999-02-01'), -- Set 1
('1990-02-01', 2.2, 'foo', '1999-02-01'),
('1990-03-01', 3.3, 'foo', '1999-02-01'),
('1990-01-01', 1.2, 'bar', '2021-02-01'), -- Set 2
('1990-02-01', 2.3, 'bar', '2021-02-01'),
('1990-03-01', 3.4, 'bar', '2021-02-01'),
('1990-01-01', 1.1, 'foo', '1999-04-01'), -- Set 3
('1990-02-01', 2.2, 'foo', '1999-04-01'),
('1990-03-01', 3.3, 'foo', '1999-04-01'),
('1990-01-01', 1.2, 'bar', '2022-02-01'), -- Set 4
('1990-02-01', 2.3, 'bar', '2022-02-01'),
('1990-03-01', 3.4, 'bar', '2022-02-01')
;
SELECT timestamp, value, metric, accept_date
FROM test_max_date tmd1
WHERE
accept_date = (SELECT max(accept_date) FROM test_max_date tmd2 WHERE tmd2.metric = tmd1.metric)
But the outer query's column cannot be used in the inner query's WHERE clause in ClickHouse. The same SELECT (with a ClickHouse table definition) does not run:
DROP TABLE IF EXISTS test_max_date;
CREATE TABLE IF NOT EXISTS test_max_date
(
timestamp date,
value double precision,
metric text,
accept_date date
)
ENGINE = ReplacingMergeTree
ORDER BY (metric, accept_date, timestamp);
INSERT INTO test_max_date (timestamp, value, metric, accept_date)
VALUES ('1990-01-01', 1.1, 'foo', '1999-02-01'),
('1990-02-01', 2.2, 'foo', '1999-02-01'),
('1990-03-01', 3.3, 'foo', '1999-02-01'),
('1990-01-01', 1.2, 'bar', '2021-02-01'),
('1990-02-01', 2.3, 'bar', '2021-02-01'),
('1990-03-01', 3.4, 'bar', '2021-02-01'),
('1990-01-01', 1.1, 'foo', '1999-04-01'),
('1990-02-01', 2.2, 'foo', '1999-04-01'),
('1990-03-01', 3.3, 'foo', '1999-04-01'),
('1990-01-01', 1.2, 'bar', '2022-02-01'),
('1990-02-01', 2.3, 'bar', '2022-02-01'),
('1990-03-01', 3.4, 'bar', '2022-02-01')
;
SELECT timestamp, value, metric, accept_date
FROM test_max_date tmd1
WHERE
accept_date = (SELECT max(accept_date) FROM test_max_date tmd2 WHERE tmd2.metric = tmd1.metric)
The error is:
Code: 47. DB::Exception: Missing columns: 'tmd1.metric' while processing query: 'SELECT max(accept_date) FROM test_max_date AS tmd2 WHERE metric = tmd1.metric', required columns: 'metric' 'tmd1.metric' 'accept_date', maybe you meant: ['metric','accept_date']: While processing (SELECT max(accept_date) FROM test_max_date AS tmd2 WHERE tmd2.metric = tmd1.metric) AS _subquery805: While processing accept_date = ((SELECT max(accept_date) FROM test_max_date AS tmd2 WHERE tmd2.metric = tmd1.metric) AS _subquery805). (UNKNOWN_IDENTIFIER) (version 22.12.1.1752 (official build)) , server ClickHouseNode [uri=http://localhost:8123/default, options={custom_http_params=session_id=DataGrip_d050598f-af8a-4c56-b9d4-e88fbec47962}]#-1760860067
I assume the two databases differ here, but I was unable to find anything about accessing outer query columns (correlated subqueries) in the ClickHouse documentation.

1)
SELECT timestamp, value, metric, accept_date
FROM test_max_date tmd1
WHERE (accept_date,metric) in (SELECT max(accept_date),metric FROM test_max_date group by metric)
order by 1,2
┌──timestamp─┬─value─┬─metric─┬─accept_date─┐
│ 1990-01-01 │ 1.1 │ foo │ 1999-04-01 │
│ 1990-01-01 │ 1.2 │ bar │ 2022-02-01 │
│ 1990-02-01 │ 2.2 │ foo │ 1999-04-01 │
│ 1990-02-01 │ 2.3 │ bar │ 2022-02-01 │
│ 1990-03-01 │ 3.3 │ foo │ 1999-04-01 │
│ 1990-03-01 │ 3.4 │ bar │ 2022-02-01 │
└────────────┴───────┴────────┴─────────────┘
2)
SELECT timestamp, value, metric, accept_date
from ( SELECT timestamp, value, metric, accept_date,
rank() over (partition by metric order by accept_date desc) r
FROM test_max_date tmd1 ) t
WHERE r=1
order by 1,2
┌──timestamp─┬─value─┬─metric─┬─accept_date─┐
│ 1990-01-01 │ 1.1 │ foo │ 1999-04-01 │
│ 1990-01-01 │ 1.2 │ bar │ 2022-02-01 │
│ 1990-02-01 │ 2.2 │ foo │ 1999-04-01 │
│ 1990-02-01 │ 2.3 │ bar │ 2022-02-01 │
│ 1990-03-01 │ 3.3 │ foo │ 1999-04-01 │
│ 1990-03-01 │ 3.4 │ bar │ 2022-02-01 │
└────────────┴───────┴────────┴─────────────┘
3)
select timestamp, value, metric, accept_date
from (
SELECT metric, accept_date, groupArray( (timestamp, value) ) ga
FROM test_max_date
group by metric, accept_date
order by metric, accept_date desc
limit 1 by metric
) array join ga.1 as timestamp, ga.2 as value
order by 1, 2
┌──timestamp─┬─value─┬─metric─┬─accept_date─┐
│ 1990-01-01 │ 1.1 │ foo │ 1999-04-01 │
│ 1990-01-01 │ 1.2 │ bar │ 2022-02-01 │
│ 1990-02-01 │ 2.2 │ foo │ 1999-04-01 │
│ 1990-02-01 │ 2.3 │ bar │ 2022-02-01 │
│ 1990-03-01 │ 3.3 │ foo │ 1999-04-01 │
│ 1990-03-01 │ 3.4 │ bar │ 2022-02-01 │
└────────────┴───────┴────────┴─────────────┘
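ClickHouse does not support correlated subqueries, so the per-metric maximum has to be computed separately. Besides the three variants above, a fourth option is to derive it in a subquery and join on it; this is a sketch against the same test_max_date table and should return the same six rows:
SELECT tmd1.timestamp, tmd1.value, tmd1.metric, tmd1.accept_date
FROM test_max_date tmd1
INNER JOIN
(
    -- one row per metric with its latest accept_date
    SELECT metric, max(accept_date) AS max_accept_date
    FROM test_max_date
    GROUP BY metric
) AS tmd2 ON tmd1.metric = tmd2.metric AND tmd1.accept_date = tmd2.max_accept_date
ORDER BY 1, 2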

Related

How can I fill empty values while summarizing over a frame?

I have a query that calculates a moving sum over a frame:
SELECT "Дата",
"Износ",
SUM("Сумма") OVER (partition by "Износ" order by "Дата"
rows between unbounded preceding and current row) AS "Продажи"
FROM (
SELECT date_trunc('week', period) AS "Дата",
multiIf(wear_and_tear BETWEEN 1 AND 3, '1-3',
wear_and_tear BETWEEN 4 AND 10, '4-10',
wear_and_tear BETWEEN 11 AND 20, '11-20',
wear_and_tear BETWEEN 21 AND 30, '21-30',
wear_and_tear BETWEEN 31 AND 45, '31-45',
wear_and_tear BETWEEN 46 AND 100, '46-100',
'Новые') AS "Износ",
SUM(quantity) AS "Сумма"
FROM shinsale_prod.sale_1c sc
LEFT JOIN product_1c pc ON sc.product_id = pc.id
WHERE 1=1
-- AND partner != 'Наше предприятие'
-- AND wear_and_tear = 0
-- AND stock IN ('ShinSale Щитниково', 'ShinSale Строгино', 'ShinSale Кунцево', 'ShinSale Санкт-Петербург', 'Шиномонтаж Подольск')
AND seasonality = 'з'
-- AND (quantity IN {{quant}} OR quantity IN -{{quant}})
-- AND stock in {{Склад}}
GROUP BY "Дата", "Износ"
HAVING "Дата" BETWEEN '2021-06-01' AND '2022-01-08'
ORDER BY 'Дата'
The thing is that in some groups I have no rows dated between 2021-12-20 and 2022-01-03.
Therefore the line that represents this group has a gap on my chart.
Is there a way I can fill this gap with average values or something similar?
I tried to RIGHT JOIN my subquery to an empty range of dates, but then I get empty rows, my filters in the WHERE section kill them, and I end up with an empty or nearly empty result.
You can generate mockup dates and construct a proper outer join like this:
SELECT
a.the_date,
sum(your_query.value) OVER (PARTITION BY 1 ORDER BY a.the_date ASC)
FROM
(
SELECT
number AS value,
toDate('2021-01-01') + value AS the_date
FROM numbers(10)
) AS your_query
RIGHT JOIN
(
WITH
toStartOfDay(toDate('2021-01-01')) AS start,
toStartOfDay(toDate('2021-01-14')) AS end
SELECT arrayJoin(arrayMap(x -> toDate(x), range(toUInt32(start), toUInt32(end), 24 * 3600))) AS the_date
) AS a ON a.the_date = your_query.the_date
Then the results will have no gaps:
┌─a.the_date─┬─sum(value) OVER (PARTITION BY 1 ORDER BY a.the_date ASC)─┐
│ 2021-01-01 │ 0 │
│ 2021-01-02 │ 1 │
│ 2021-01-03 │ 3 │
│ 2021-01-04 │ 6 │
│ 2021-01-05 │ 10 │
│ 2021-01-06 │ 15 │
│ 2021-01-07 │ 21 │
│ 2021-01-08 │ 28 │
│ 2021-01-09 │ 36 │
│ 2021-01-10 │ 45 │
│ 2021-01-11 │ 45 │
│ 2021-01-12 │ 45 │
│ 2021-01-13 │ 45 │
└────────────┴──────────────────────────────────────────────────────────┘
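On recent ClickHouse versions the same result can be sketched without the RIGHT JOIN by letting ORDER BY ... WITH FILL generate the missing dates (filled rows get value 0) and taking the running sum over the gap-free inner result; this uses the same numbers(10) mock data as above:
SELECT
    the_date,
    sum(value) OVER (ORDER BY the_date ASC) AS running_total
FROM
(
    SELECT
        toDate('2021-01-01') + number AS the_date,
        number AS value
    FROM numbers(10)
    -- WITH FILL inserts the missing dates 2021-01-11 .. 2021-01-13 with value = 0
    ORDER BY the_date ASC WITH FILL FROM toDate('2021-01-01') TO toDate('2021-01-14')
)
ORDER BY the_date ASC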

Display COUNT(*) for every week instead of every day

Let us say that I have a table with user_id of Int32 type and login_time as DateTime in UTC format. user_id is not unique, so SELECT user_id, login_time FROM some_table; gives the following result:
┌─user_id─┬──login_time─┐
│ 1 │ 2021-03-01 │
│ 1 │ 2021-03-01 │
│ 1 │ 2021-03-02 │
│ 2 │ 2021-03-02 │
│ 2 │ 2021-03-03 │
└─────────┴─────────────┘
If I run SELECT COUNT(*) as count, toDate(login_time) as l FROM some_table GROUP BY l, I get the following result:
┌─count───┬──login_time─┐
│ 2 │ 2021-03-01 │
│ 2 │ 2021-03-02 │
│ 1 │ 2021-03-03 │
└─────────┴─────────────┘
I would like to reformat the result to show COUNT on a weekly level, instead of every day, as I currently do.
My result for the above example could look something like this:
┌──count──┬──year─┬──month──┬─week ordinal┐
│ 5 │ 2021 │ 03 │ 1 │
│ 0 │ 2021 │ 03 │ 2 │
│ 0 │ 2021 │ 03 │ 3 │
│ 0 │ 2021 │ 03 │ 4 │
└─────────┴───────┴─────────┴─────────────┘
I have gone through the documentation and found some interesting functions, but did not manage to make them solve my problem.
I have never worked with ClickHouse before and am not very experienced with SQL, which is why I am asking here for help.
Try this query:
select count() count, toYear(start_of_month) year, toMonth(start_of_month) month,
toWeek(start_of_week) - toWeek(start_of_month) + 1 AS "week ordinal"
from (
select *, toStartOfMonth(login_time) start_of_month,
toStartOfWeek(login_time) start_of_week
from (
/* emulate test dataset */
select data.1 user_id, toDate(data.2) login_time
from (
select arrayJoin([
(1, '2021-02-27'),
(1, '2021-02-28'),
(1, '2021-03-01'),
(1, '2021-03-01'),
(1, '2021-03-02'),
(2, '2021-03-02'),
(2, '2021-03-03'),
(2, '2021-03-08'),
(2, '2021-03-16'),
(2, '2021-04-01')]) data)
)
)
group by start_of_month, start_of_week
order by start_of_month, start_of_week
/*
┌─count─┬─year─┬─month─┬─week ordinal─┐
│ 1 │ 2021 │ 2 │ 4 │
│ 1 │ 2021 │ 2 │ 5 │
│ 5 │ 2021 │ 3 │ 1 │
│ 1 │ 2021 │ 3 │ 2 │
│ 1 │ 2021 │ 3 │ 3 │
│ 1 │ 2021 │ 4 │ 1 │
└───────┴──────┴───────┴──────────────┘
*/
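If a week start date is acceptable instead of a per-month week ordinal, a simpler sketch (assuming the question's some_table with a DateTime login_time column) is to group by toStartOfWeek directly; the empty weeks from the desired output could then be filled in with ORDER BY ... WITH FILL, as in the other answers here:
SELECT
    count() AS count,
    toStartOfWeek(login_time) AS week_start -- Sunday-based by default; pass a mode argument for Monday-based weeks
FROM some_table
GROUP BY week_start
ORDER BY week_start ASC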

SQL Query (ClickHouse): group by where timediff between values less then X

I need a little help with an SQL query. I'm using ClickHouse, but maybe standard SQL syntax is enough for this task.
I've got the following table:
event_time; Text; ID
2021-03-16 09:00:48; Example_1; 1
2021-03-16 09:00:49; Example_2; 1
2021-03-16 09:00:50; Example_3; 1
2021-03-16 09:15:48; Example_1_1; 1
2021-03-16 09:15:49; Example_2_2; 1
2021-03-16 09:15:50; Example_3_3; 1
What I want to have at the end for this example is 2 rows:
Example_1Example_2Example_3
Example_1_1Example_2_2Example_3_3
That is, concatenation of the Text field based on ID. The problem is that this ID is not unique over time; it is only unique within roughly a minute. So I want to concatenate only strings where the difference between the first and last row is less than a minute.
Right now I've got a query like:
SELECT arrayStringConcat(groupArray(Text))
FROM (SELECT event_time, Text, ID
FROM Test_Table
ORDER by event_time asc)
GROUP BY ID;
What kind of condition should I add here?
Here is an example
create table X(event_time DateTime, Text String, ID Int64) Engine=Memory;
insert into X values ('2021-03-16 09:00:48','Example_1', 1), ('2021-03-16 09:00:49','Example_2', 1), ('2021-03-16 09:00:50','Example_3', 1), ('2021-03-16 09:01:48','Example_4', 1), ('2021-03-16 09:01:49','Example_5', 1), ('2021-03-16 09:15:48','Example_1_1', 1), ('2021-03-16 09:15:49','Example_2_2', 1),('2021-03-16 09:15:50','Example_3_3', 1);
SELECT * FROM X
┌──────────event_time─┬─Text────────┬─ID─┐
│ 2021-03-16 09:00:48 │ Example_1 │ 1 │
│ 2021-03-16 09:00:49 │ Example_2 │ 1 │
│ 2021-03-16 09:00:50 │ Example_3 │ 1 │
│ 2021-03-16 09:01:48 │ Example_4 │ 1 │
│ 2021-03-16 09:01:49 │ Example_5 │ 1 │
│ 2021-03-16 09:15:48 │ Example_1_1 │ 1 │
│ 2021-03-16 09:15:49 │ Example_2_2 │ 1 │
│ 2021-03-16 09:15:50 │ Example_3_3 │ 1 │
└─────────────────────┴─────────────┴────┘
What result is expected in this case?
CH 21.3
set allow_experimental_window_functions = 1;
SELECT
ID,
y,
groupArray(event_time),
groupArray(Text)
FROM
(
SELECT
ID,
event_time,
Text,
max(event_time) OVER (PARTITION BY ID ORDER BY event_time ASC RANGE BETWEEN CURRENT ROW AND 60 FOLLOWING) AS y
FROM X
)
GROUP BY
ID,
y
ORDER BY
ID ASC,
y ASC
Query id: 9219a1f2-8c96-425f-9301-745fa7b88b40
┌─ID─┬───────────────────y─┬─groupArray(event_time)────────────────────────────────────────────────────────────────────┬─groupArray(Text)──────────────────────────────────┐
│ 1 │ 2021-03-16 09:01:48 │ ['2021-03-16 09:00:48'] │ ['Example_1'] │
│ 1 │ 2021-03-16 09:01:49 │ ['2021-03-16 09:00:49','2021-03-16 09:00:50','2021-03-16 09:01:48','2021-03-16 09:01:49'] │ ['Example_2','Example_3','Example_4','Example_5'] │
│ 1 │ 2021-03-16 09:15:50 │ ['2021-03-16 09:15:48','2021-03-16 09:15:49','2021-03-16 09:15:50'] │ ['Example_1_1','Example_2_2','Example_3_3'] │
└────┴─────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────┘
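To get the single concatenated string per group that the question asks for, the same inner window query can be wrapped with arrayStringConcat; a sketch against the X table defined above (note that groupArray does not guarantee ordering, so if the order matters the (event_time, Text) tuples can be collected and sorted with arraySort first):
SELECT
    ID,
    y,
    arrayStringConcat(groupArray(Text)) AS concatenated
FROM
(
    SELECT
        ID,
        event_time,
        Text,
        max(event_time) OVER (PARTITION BY ID ORDER BY event_time ASC RANGE BETWEEN CURRENT ROW AND 60 FOLLOWING) AS y
    FROM X
)
GROUP BY ID, y
ORDER BY ID ASC, y ASC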

Extract and sum values with subfields inside string using ClickHouse

I have a ClickHouse database with a simple table with two fields (pagePath string, pageviews int).
I want to sum visits for each filter + value, to know the pageviews per filter (the most used filters). Filters are separated by commas.
Example data:

pagePath                                              pageviews
/url1/filter1.value1                                  1
/url1/filter1.value2                                  2
/url1/filter1.value3,filter1.value2                   3
/url1/filter1.value4,filter3.value2                   4
/url1/filter1.value5,filter3.value2,filter1.value2    5
/url1/filter2.value1,filter3.value2,filter1.value2    6
/url1/filter2.value2                                  7
/url-2/filter2.value3                                 8
/url-2/filter2.value4                                 9
/url-11/filter3.value1                                10
/url-21/filter3.value1                                11
/url1/filter3.value2                                  12
/url1/filter3.value3                                  13
/url1/filter3.value4                                  14
create table T (pagePath String , pageviews Int64) Engine=Memory;
insert into T values ('/url1/filter1.value1',1);
insert into T values ('/url1/filter1.value2',2);
insert into T values ('/url1/filter1.value3,filter1.value2',3);
insert into T values ('/url1/filter1.value4,filter3.value2',4);
insert into T values ('/url1/filter1.value5,filter3.value2,filter1.value2',5);
insert into T values ('/url1/filter2.value1,filter3.value2,filter1.value2',6);
insert into T values ('/url1/filter2.value2',7);
insert into T values ('/url-2/filter2.value3',8);
insert into T values ('/url-2/filter2.value4',9);
insert into T values ('/url-11/filter3.value1',10);
insert into T values ('/url-21/filter3.value1',11);
insert into T values ('/url1/filter3.value2',12);
insert into T values ('/url1/filter3.value3',13);
insert into T values ('/url1/filter3.value4',14);
And I want to get the sum of pageviews for each filter + value:

Filters           pageviews
filter1.value1    1
filter1.value2    16 (2 + 3 + 5 + 6)
filter1.value3    3
filter1.value4    4
filter1.value5    5
filter2.value1    6
filter2.value2    7
filter2.value3    8
filter2.value4    9
filter3.value1    10
filter3.value1    11
filter3.value2    12 (4+5+6+12)
filter3.value3    13
filter3.value4    14
And I also want to get the sum of pageviews for each filter (without values):

Filters   pageviews
filter1   15 (1 + 2 + 3 + 4 + 5)
filter2   30 (6 + 7 + 8 + 9)
filter3   60 (10 + 11 + 12 + 13 + 14)
I tried:
select
arrJoinFilters,
sum (PV) as totales
from
(
SELECT
arrJoinFilters,
splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1')) arrFilter,
pageviews as PV
FROM
Table ARRAY JOIN arrFilter AS arrJoinFilters
GROUP by arrFilter,PV,arrJoinFilters
)
group by
arrJoinFilters
order by arrJoinFilters
But I think something is wrong, and I don't get the second desired result.
Thanks!
Note that 4+5+6+12 = 27, not 12 as shown in the desired output above.
SELECT
arrayJoin(splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1'))) f,
sum(pageviews) as PV
FROM T
GROUP by f
order by f
┌─f──────────────┬─PV─┐
│ filter1.value1 │ 1 │
│ filter1.value2 │ 16 │
│ filter1.value3 │ 3 │
│ filter1.value4 │ 4 │
│ filter1.value5 │ 5 │
│ filter2.value1 │ 6 │
│ filter2.value2 │ 7 │
│ filter2.value3 │ 8 │
│ filter2.value4 │ 9 │
│ filter3.value1 │ 21 │
│ filter3.value2 │ 27 │
│ filter3.value3 │ 13 │
│ filter3.value4 │ 14 │
└────────────────┴────┘
select splitByChar('.',f)[1] x, sum(PV), groupArray(PV), groupArray(f)
from (
SELECT
arrayJoin(splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1'))) f,
sum(pageviews) as PV
FROM T
GROUP by f
order by f) group by x
┌─x───────┬─sum(PV)─┬─groupArray(PV)─┬─groupArray(f)──────────────────────────────────────────────────────────────────────────┐
│ filter2 │ 30 │ [6,7,8,9] │ ['filter2.value1','filter2.value2','filter2.value3','filter2.value4'] │
│ filter3 │ 75 │ [21,27,13,14] │ ['filter3.value1','filter3.value2','filter3.value3','filter3.value4'] │
│ filter1 │ 29 │ [1,16,3,4,5] │ ['filter1.value1','filter1.value2','filter1.value3','filter1.value4','filter1.value5'] │
└─────────┴─────────┴────────────────┴────────────────────────────────────────────────────────────────────────────────────────┘
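The second desired result (totals per filter, without values) can also be computed in one pass by taking the part before the dot; a sketch against the same T table that yields the same totals (29, 30, 75) as the nested query above:
SELECT
    -- split the filter list, expand it to one row per filter.value, keep the part before the dot
    splitByChar('.', arrayJoin(splitByChar(',', replaceRegexpAll(pagePath, '^/url.*/(.*\..*)$', '\\1'))))[1] AS filter,
    sum(pageviews) AS PV
FROM T
GROUP BY filter
ORDER BY filter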

Join timeseries with distinct values in ClickHouse

I've got the following issue that I can't solve. The main purpose is to show graphs in Grafana. The first SQL request gives me:
SELECT toStartOfMinute(date_time) as t, COUNT(1) as count, service_name
FROM SB_STAT.SBCommonJournal
WHERE t BETWEEN toDateTime('2019-06-04 00:00:00') AND toDateTime('2019-06-05 00:00:00')
GROUP BY t, service_name
t;count;service_name
2019-06-04 15:43:00;1;test3
2019-06-04 15:35:00;1;test3
2019-06-04 15:12:00;1;test
2019-06-04 14:57:00;1;test
2019-06-04 15:32:00;1;test3
2019-06-04 16:36:00;1;test3
2019-06-04 15:21:00;1;test
And the second one:
SELECT arrayJoin(
arrayMap(
x -> toStartOfMinute(addMinutes(toDateTime('2019-06-04 00:00:00'), x)),
range(toUInt64(dateDiff('minute', toDateTime('2019-06-04 00:00:00'), toDateTime('2019-06-05 00:00:00')) + 1)))) AS t,
0 AS count;
t;count
2019-06-04 00:00:00;0
2019-06-04 00:01:00;0
2019-06-04 00:02:00;0
2019-06-04 00:03:00;0
2019-06-04 00:04:00;0
2019-06-04 00:05:00;0
2019-06-04 00:06:00;0
2019-06-04 00:07:00;0
2019-06-04 00:08:00;0
2019-06-04 00:09:00;0
2019-06-04 00:10:00;0
etc..
How can I join these two requests to get a counter for each service_name per minute? So I would have something like this:
t;count;service_name
2019-06-04 15:12:00;1;test
2019-06-04 15:12:00;0;test3
2019-06-04 15:13:00;0;test
2019-06-04 15:13:00;0;test3
etc...
Grafana actually has a zero fill option. The only thing you should have to do with ClickHouse is perhaps use groupArray on a tuple of key/value pairs per timestamp. Grafana normally pulls the returned JSON data apart and will use the first element in the tuple as a series name.
SELECT
t,
groupArray((service_name, cnt)) AS series
FROM (
SELECT
service_name,
toStartOfMinute(date_time) AS t,
count() AS cnt
FROM SBCommonJournal
WHERE (date_time >= toDateTime('2019-06-04 00:00:00')) AND (date_time <= toDateTime('2019-06-05 00:00:00'))
GROUP BY
service_name,
t
)
GROUP BY t
ORDER BY t
Failing that, use WITH FILL:
SELECT
t,
groupArray((service_name, cnt)) AS series
FROM (
SELECT
service_name,
toStartOfMinute(date_time) AS t,
count() AS cnt
FROM SBCommonJournal
WHERE (date_time >= toDateTime('2019-06-04 00:00:00')) AND (date_time <= toDateTime('2019-06-05 00:00:00'))
GROUP BY
service_name,
t
)
GROUP BY t
ORDER BY t WITH FILL STEP 60
If that still doesn't work for you, the following should work (use Grafana's $to and $from).
Create some sample data with some generated service_names and metrics:
DROP TABLE IF EXISTS SBCommonJournal;
CREATE TEMPORARY TABLE SBCommonJournal AS
WITH
(
SELECT arrayMap(x -> arrayStringConcat(arrayMap(i -> char(65 + (rand((i + x) + 1000) % 26)), range(16))), range(10))
) AS service_names
SELECT
service_names[1 + (rand() % length(service_names))] AS service_name,
toDateTime('2019-06-04 00:00:00') + toIntervalSecond(rand() % 86400) AS date_time
FROM numbers_mt(1000000)
Query:
SELECT
service_name,
t,
sum(cnt) AS cnt
FROM
(
SELECT
arrayJoin(groupUniqArray(service_name)) AS service_name,
arrayJoin(
(
SELECT groupArray(d)
FROM
(
SELECT arrayJoin([toDateTime('2019-06-04 00:00:00'), toDateTime('2019-06-05 00:00:00')]) AS d
GROUP BY d
ORDER BY d ASC WITH FILL STEP 60
)
)) AS t,
0 AS cnt
FROM SBCommonJournal
WHERE (date_time >= toDateTime('2019-06-04 00:00:00')) AND (date_time <= toDateTime('2019-06-05 00:00:00'))
UNION ALL
SELECT
service_name,
toStartOfMinute(date_time) AS t,
count() AS cnt
FROM SBCommonJournal
WHERE (date_time >= toDateTime('2019-06-04 00:00:00')) AND (date_time <= toDateTime('2019-06-05 00:00:00'))
GROUP BY
service_name,
t
)
GROUP BY
service_name,
t
ORDER BY
t ASC,
service_name ASC
Try this query:
SELECT stub_data.time_tick tick, stub_data.service_name service_name, source_data.count > stub_data.count ? source_data.count : stub_data.count AS count
FROM (
SELECT toStartOfMinute(date_time) as time_tick, COUNT() as count, service_name
FROM (
/* test data */
SELECT test_data.1 date_time, test_data.3 service_name, test_data.2 count
FROM (
SELECT arrayJoin([
(toDateTime('2019-06-04 15:43:01'), 1, 'test3'),
(toDateTime('2019-06-04 15:43:51'), 1, 'test4'),
(toDateTime('2019-06-04 15:43:52'), 1, 'test4'),
(toDateTime('2019-06-04 15:43:53'), 1, 'test4'),
(toDateTime('2019-06-04 15:35:02'), 1, 'test3'),
(toDateTime('2019-06-04 15:30:03'), 1, 'test'),
(toDateTime('2019-06-04 15:31:04'), 1, 'test'),
(toDateTime('2019-06-04 15:32:05'), 1, 'test3'),
(toDateTime('2019-06-04 15:36:06'), 1, 'test3'),
(toDateTime('2019-06-04 15:36:07'), 1, 'test3'),
(toDateTime('2019-06-04 15:36:46'), 1, 'test4'),
(toDateTime('2019-06-04 15:38:07'), 1, 'test')
]) test_data)
)
WHERE time_tick BETWEEN toDateTime('2019-06-04 00:00:00') AND toDateTime('2019-06-05 00:00:00')
GROUP BY time_tick, service_name) source_data
RIGHT JOIN (
/* Cartesian product: [ticks * service_names] */
SELECT time_tick, service_name, 0 as count
FROM (
SELECT arrayJoin(
arrayMap(
x -> addMinutes(toDateTime('2019-06-04 15:30:00'), x),
range(toUInt64(dateDiff('minute', toDateTime('2019-06-04 15:30:00'), toDateTime('2019-06-04 15:43:00')) + 1)))) AS time_tick)
CROSS JOIN (
SELECT arrayJoin(groupUniqArray(test_data.3)) service_name
FROM (
/* test data */
SELECT arrayJoin([
(toDateTime('2019-06-04 15:43:01'), 1, 'test3'),
(toDateTime('2019-06-04 15:43:51'), 1, 'test4'),
(toDateTime('2019-06-04 15:43:52'), 1, 'test4'),
(toDateTime('2019-06-04 15:43:53'), 1, 'test4'),
(toDateTime('2019-06-04 15:35:02'), 1, 'test3'),
(toDateTime('2019-06-04 15:30:03'), 1, 'test'),
(toDateTime('2019-06-04 15:31:04'), 1, 'test'),
(toDateTime('2019-06-04 15:32:05'), 1, 'test3'),
(toDateTime('2019-06-04 15:36:06'), 1, 'test3'),
(toDateTime('2019-06-04 15:36:07'), 1, 'test3'),
(toDateTime('2019-06-04 15:36:46'), 1, 'test4'),
(toDateTime('2019-06-04 15:38:07'), 1, 'test')
]) test_data))) stub_data
ON source_data.time_tick = stub_data.time_tick AND source_data.service_name = stub_data.service_name
ORDER BY tick, service_name;
/* Result:
┌────────────────tick─┬─service_name─┬─count─┐
│ 2019-06-04 15:30:00 │ test │ 1 │
│ 2019-06-04 15:30:00 │ test3 │ 0 │
│ 2019-06-04 15:30:00 │ test4 │ 0 │
│ 2019-06-04 15:31:00 │ test │ 1 │
│ 2019-06-04 15:31:00 │ test3 │ 0 │
│ 2019-06-04 15:31:00 │ test4 │ 0 │
│ 2019-06-04 15:32:00 │ test │ 0 │
│ 2019-06-04 15:32:00 │ test3 │ 1 │
│ 2019-06-04 15:32:00 │ test4 │ 0 │
│ 2019-06-04 15:33:00 │ test │ 0 │
│ 2019-06-04 15:33:00 │ test3 │ 0 │
│ 2019-06-04 15:33:00 │ test4 │ 0 │
│ 2019-06-04 15:34:00 │ test │ 0 │
│ 2019-06-04 15:34:00 │ test3 │ 0 │
│ 2019-06-04 15:34:00 │ test4 │ 0 │
│ 2019-06-04 15:35:00 │ test │ 0 │
│ 2019-06-04 15:35:00 │ test3 │ 1 │
│ 2019-06-04 15:35:00 │ test4 │ 0 │
│ 2019-06-04 15:36:00 │ test │ 0 │
│ 2019-06-04 15:36:00 │ test3 │ 2 │
│ 2019-06-04 15:36:00 │ test4 │ 1 │
│ 2019-06-04 15:37:00 │ test │ 0 │
│ 2019-06-04 15:37:00 │ test3 │ 0 │
│ 2019-06-04 15:37:00 │ test4 │ 0 │
│ 2019-06-04 15:38:00 │ test │ 1 │
│ 2019-06-04 15:38:00 │ test3 │ 0 │
│ 2019-06-04 15:38:00 │ test4 │ 0 │
│ 2019-06-04 15:39:00 │ test │ 0 │
│ 2019-06-04 15:39:00 │ test3 │ 0 │
│ 2019-06-04 15:39:00 │ test4 │ 0 │
│ 2019-06-04 15:40:00 │ test │ 0 │
│ 2019-06-04 15:40:00 │ test3 │ 0 │
│ 2019-06-04 15:40:00 │ test4 │ 0 │
│ 2019-06-04 15:41:00 │ test │ 0 │
│ 2019-06-04 15:41:00 │ test3 │ 0 │
│ 2019-06-04 15:41:00 │ test4 │ 0 │
│ 2019-06-04 15:42:00 │ test │ 0 │
│ 2019-06-04 15:42:00 │ test3 │ 0 │
│ 2019-06-04 15:42:00 │ test4 │ 0 │
│ 2019-06-04 15:43:00 │ test │ 0 │
│ 2019-06-04 15:43:00 │ test3 │ 1 │
│ 2019-06-04 15:43:00 │ test4 │ 3 │
└─────────────────────┴──────────────┴───────┘
*/
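A more compact sketch of the join the question asks for, reusing the minute grid from the question's second query (this assumes the same SBCommonJournal table; with the default join_use_nulls = 0, minutes without data come out with count = 0):
SELECT grid.t AS t, grid.service_name AS service_name, counts.cnt AS count
FROM
(
    -- Cartesian product: every minute of the day x every known service_name
    SELECT t, service_name
    FROM
    (
        SELECT arrayJoin(
            arrayMap(
                x -> toStartOfMinute(addMinutes(toDateTime('2019-06-04 00:00:00'), x)),
                range(toUInt64(dateDiff('minute', toDateTime('2019-06-04 00:00:00'), toDateTime('2019-06-05 00:00:00')) + 1)))) AS t
    )
    CROSS JOIN
    (
        SELECT DISTINCT service_name
        FROM SBCommonJournal
    )
) AS grid
LEFT JOIN
(
    -- real per-minute counts
    SELECT toStartOfMinute(date_time) AS t, service_name, count() AS cnt
    FROM SBCommonJournal
    WHERE date_time BETWEEN toDateTime('2019-06-04 00:00:00') AND toDateTime('2019-06-05 00:00:00')
    GROUP BY t, service_name
) AS counts ON grid.t = counts.t AND grid.service_name = counts.service_name
ORDER BY t ASC, service_name ASC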