How to calculate the difference of two sums in SQL

I'm using Grafana to show some data from ClickHouse. The data comes from a table containing itime, count and some other columns:
id  method  count  itime
1   aaa     12     2021-07-20 00:07:06
2   bbb     9      2021-07-20 00:07:06
3   ccc     7      2021-07-20 00:07:07
...
Now I can execute the following SQL to get the sum of count between two itimes:
SELECT toUnixTimestamp(toStartOfMinute(itime)) * 1000 as t,
method,
sum(count) as c
FROM me.my_table
WHERE itime BETWEEN toDateTime(1631870605) AND toDateTime(1631874205)
and method like 'a%'
GROUP BY method, t
HAVING c > 500
ORDER BY t
It works as expected.
Now I want to filter on the difference between the current sum(count) and the sum(count) of the same minute 7 days earlier, something like SELECT ... FROM ... WHERE ... HAVING c - c_7_days_ago >= 100, but I don't know how to express this.

-- Self-contained example: arrayJoin duplicates every (sum, date) pair onto its own date
-- and onto the date 7 days later, so the current and the 7-day-old sums land in the same
-- (d1, Key) group and can be compared with sumIf.
create table test(D Date, Key Int64, Val Int64) Engine=Memory;
insert into test select today(), number, 100 from numbers(5);
insert into test select today()-7, number, 110 from numbers(5);

select sx.2 d1, Key, sumIf(sx.1, D = sx.2) s, sumIf(sx.1, D != sx.2) s1
from
(
    select D, Key, arrayJoin([(s, D), (s, D + interval 7 day)]) sx
    from (select D, Key, sum(Val) s from test group by D, Key)
)
group by d1, Key
order by d1, Key;
┌─────────d1─┬─Key─┬───s─┬──s1─┐
│ 2021-09-10 │   0 │ 110 │   0 │
│ 2021-09-10 │   1 │ 110 │   0 │
│ 2021-09-10 │   2 │ 110 │   0 │
│ 2021-09-10 │   3 │ 110 │   0 │
│ 2021-09-10 │   4 │ 110 │   0 │
│ 2021-09-17 │   0 │ 100 │ 110 │
│ 2021-09-17 │   1 │ 100 │ 110 │
│ 2021-09-17 │   2 │ 100 │ 110 │
│ 2021-09-17 │   3 │ 100 │ 110 │
│ 2021-09-17 │   4 │ 100 │ 110 │
│ 2021-09-24 │   0 │   0 │ 100 │
│ 2021-09-24 │   1 │   0 │ 100 │
│ 2021-09-24 │   2 │   0 │ 100 │
│ 2021-09-24 │   3 │   0 │ 100 │
│ 2021-09-24 │   4 │   0 │ 100 │
└────────────┴─────┴─────┴─────┘
-- Alternative on ClickHouse 21.3+ with window functions: read the value from the row
-- exactly 7 days earlier (the RANGE offset is in days because D is a Date).
SELECT
    D,
    Key,
    Val,
    any(Val) OVER (PARTITION BY Key ORDER BY D ASC RANGE BETWEEN 7 PRECEDING AND 7 PRECEDING) AS Val1
FROM test
┌──────────D─┬─Key─┬─Val─┬─Val1─┐
│ 2021-09-10 │   0 │ 110 │    0 │
│ 2021-09-17 │   0 │ 100 │  110 │
│ 2021-09-10 │   1 │ 110 │    0 │
│ 2021-09-17 │   1 │ 100 │  110 │
│ 2021-09-10 │   2 │ 110 │    0 │
│ 2021-09-17 │   2 │ 100 │  110 │
│ 2021-09-10 │   3 │ 110 │    0 │
│ 2021-09-17 │   3 │ 100 │  110 │
│ 2021-09-10 │   4 │ 110 │    0 │
│ 2021-09-17 │   4 │ 100 │  110 │
└────────────┴─────┴─────┴──────┘
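To apply the window-function variant above directly to the table from the question, here is a hedged sketch. It assumes a ClickHouse version with window functions enabled, and that RANGE offsets are counted in the native unit of the ORDER BY column (seconds for a DateTime, just as they are days for the Date column above, hence 604800 for 7 days); the threshold of 100 and the time range are illustrative.
-- The inner time filter must reach back 7 extra days, otherwise the window has nothing to read.
SELECT ts, method, c, c_7d_ago, c - c_7d_ago AS diff
FROM
(
    SELECT
        toUnixTimestamp(t) * 1000 AS ts,
        method,
        c,
        any(c) OVER (PARTITION BY method ORDER BY t ASC
                     RANGE BETWEEN 604800 PRECEDING AND 604800 PRECEDING) AS c_7d_ago
    FROM
    (
        SELECT toStartOfMinute(itime) AS t, method, sum(count) AS c
        FROM me.my_table
        WHERE method LIKE 'a%'
          AND itime BETWEEN toDateTime(1631870605) - INTERVAL 7 DAY AND toDateTime(1631874205)
        GROUP BY method, t
    )
)
WHERE diff >= 100
ORDER BY ts;
The sums are grouped in the innermost query and the comparison is filtered one level further out, because the difference depends on the window result and cannot be checked in the same level's HAVING.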

I had a similar problem a while ago; please check the SQLfiddle.
To see the result, press the buttons: first "Build Schema", then "Run SQL".
Naming
I assumed that for the period A you selected you want to compare against the same period B seven days earlier (you need to be more specific about what you are really looking for).
period A = your selected time period (between from and to)
period B = your selected time period one week in the past
Problem
This is a delicate question, if I understood it correctly.
Your example is grouped by minute inside period A. This means you really need data in period A for every minute that has data in period B; otherwise you silently ignore period B data inside your chosen period.
As you can see in the SQLfiddle, I wrote two queries. The first one works but ignores such B data. The second one does a RIGHT JOIN (sadly MySQL does not support FULL OUTER JOIN to show everything in one table) and reveals 2 ignored entries.
Grouping by method as well makes it even worse.
(In that case, for the fiddle, you have to change the last line of the join to:)
as b on a.unix_itime = b.unix_itime and a.method = b.method
This means you need minute-wise data for every selected method and period.
It would be better to group only by method and not by time, since you already use a time condition (period A) to keep the result small, or to use a coarser step, by hour or day.
This code should fit your environment (the fiddle uses MySQL, which does not support toUnixTimestamp, toStartOfMinute or toDateTime):
SELECT
    a.unix_itime * 1000 AS t,
    a.method,
    a.sum AS c,
    b.sum AS c2,
    ifNull(a.sum, 0) - ifNull(b.sum, 0) AS diff
FROM (SELECT method, sum(count) AS sum,
             toUnixTimestamp(toStartOfMinute(itime)) AS unix_itime
      FROM my_table
      WHERE method LIKE 'a%'
        AND itime BETWEEN toDateTime(1631870605)
                      AND toDateTime(1631874205)
      GROUP BY method, unix_itime) AS a
LEFT JOIN (SELECT method, sum(count) AS sum,
                  toUnixTimestamp(toStartOfMinute(itime + INTERVAL 7 DAY)) AS unix_itime
           FROM my_table
           WHERE method LIKE 'a%'
             AND itime BETWEEN toDateTime(1631870605) - INTERVAL 7 DAY
                           AND toDateTime(1631874205) - INTERVAL 7 DAY
           GROUP BY method, unix_itime) AS b
    ON a.unix_itime = b.unix_itime AND a.method = b.method
ORDER BY a.unix_itime;
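As a side note, ClickHouse itself, unlike MySQL, does support FULL OUTER JOIN, so minutes that only have period B data do not have to be dropped. A hedged sketch of that variant follows; with the default setting join_use_nulls = 0, the unmatched side comes back as 0 and '' rather than NULL, hence the if() coalescing of the join keys.
SELECT
    if(a.unix_itime = 0, b.unix_itime, a.unix_itime) * 1000 AS t,
    if(a.method = '', b.method, a.method) AS method,
    a.sum AS c,
    b.sum AS c2,
    a.sum - b.sum AS diff
FROM (SELECT method, sum(count) AS sum,
             toUnixTimestamp(toStartOfMinute(itime)) AS unix_itime
      FROM my_table
      WHERE method LIKE 'a%'
        AND itime BETWEEN toDateTime(1631870605) AND toDateTime(1631874205)
      GROUP BY method, unix_itime) AS a
FULL OUTER JOIN
     (SELECT method, sum(count) AS sum,
             toUnixTimestamp(toStartOfMinute(itime + INTERVAL 7 DAY)) AS unix_itime
      FROM my_table
      WHERE method LIKE 'a%'
        AND itime BETWEEN toDateTime(1631870605) - INTERVAL 7 DAY
                      AND toDateTime(1631874205) - INTERVAL 7 DAY
      GROUP BY method, unix_itime) AS b
    ON a.unix_itime = b.unix_itime AND a.method = b.method
ORDER BY t;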

The logic is slightly ambiguous, but the following captures one possible reading of it. If you still want to return the overall SUM(count), just add that to the select list.
SELECT toUnixTimestamp(toStartOfMinute(itime)) * 1000 AS t
, method
, SUM(count) AS c
, SUM(count) - SUM(CASE WHEN itime < current_date - INTERVAL 7 DAY THEN count END) AS c2
FROM me.my_table
WHERE method like 'a%'
GROUP BY method, t
HAVING c2 >= 100
ORDER BY t
;
Adjust as needed.
Maybe you didn't want to return the difference, just filter the groups returned. If so, try this:
SELECT toUnixTimestamp(toStartOfMinute(itime)) * 1000 AS t
, method
, SUM(count) AS c
FROM me.my_table
WHERE method like 'a%'
GROUP BY method, t
HAVING SUM(count) - SUM(CASE WHEN itime < current_date - INTERVAL 7 DAY THEN count END) >= 100
ORDER BY t
;

Related

How can I fill empty values while summarizing over a frame?

I have a query that calculates a moving sum over a frame:
SELECT "Дата",
"Износ",
SUM("Сумма") OVER (partition by "Износ" order by "Дата"
rows between unbounded preceding and current row) AS "Продажи"
FROM (
SELECT date_trunc('week', period) AS "Дата",
multiIf(wear_and_tear BETWEEN 1 AND 3, '1-3',
wear_and_tear BETWEEN 4 AND 10, '4-10',
wear_and_tear BETWEEN 11 AND 20, '11-20',
wear_and_tear BETWEEN 21 AND 30, '21-30',
wear_and_tear BETWEEN 31 AND 45, '31-45',
wear_and_tear BETWEEN 46 AND 100, '46-100',
'Новые') AS "Износ",
SUM(quantity) AS "Сумма"
FROM shinsale_prod.sale_1c sc
LEFT JOIN product_1c pc ON sc.product_id = pc.id
WHERE 1=1
-- AND partner != 'Наше предприятие'
-- AND wear_and_tear = 0
-- AND stock IN ('ShinSale Щитниково', 'ShinSale Строгино', 'ShinSale Кунцево', 'ShinSale Санкт-Петербург', 'Шиномонтаж Подольск')
AND seasonality = 'з'
-- AND (quantity IN {{quant}} OR quantity IN -{{quant}})
-- AND stock in {{Склад}}
GROUP BY "Дата", "Износ"
HAVING "Дата" BETWEEN '2021-06-01' AND '2022-01-08'
ORDER BY "Дата"
The thing is that in some groups I have no rows dated between 2021-12-20 and 2022-01-03.
Therefore the line that represents this group has a gap in my chart.
Is there a way I can fill this gap with average values or something similar?
I tried to RIGHT JOIN my subquery to an empty range of dates, but then I get empty rows, my filters in the WHERE section kill the query, and I end up with an empty or nearly empty result.
You can generate mockup dates and construct a proper outer join like this:
SELECT
a.the_date,
sum(your_query.value) OVER (PARTITION BY 1 ORDER BY a.the_date ASC)
FROM
(
SELECT
number AS value,
toDate('2021-01-01') + value AS the_date
FROM numbers(10)
) AS your_query
RIGHT JOIN
(
WITH
toStartOfDay(toDate('2021-01-01')) AS start,
toStartOfDay(toDate('2021-01-14')) AS end
SELECT arrayJoin(arrayMap(x -> toDate(x), range(toUInt32(start), toUInt32(end), 24 * 3600))) AS the_date
) AS a ON a.the_date = your_query.the_date
Then the results will have no gaps:
┌─a.the_date─┬─sum(value) OVER (PARTITION BY 1 ORDER BY a.the_date ASC)─┐
│ 2021-01-01 │ 0 │
│ 2021-01-02 │ 1 │
│ 2021-01-03 │ 3 │
│ 2021-01-04 │ 6 │
│ 2021-01-05 │ 10 │
│ 2021-01-06 │ 15 │
│ 2021-01-07 │ 21 │
│ 2021-01-08 │ 28 │
│ 2021-01-09 │ 36 │
│ 2021-01-10 │ 45 │
│ 2021-01-11 │ 45 │
│ 2021-01-12 │ 45 │
│ 2021-01-13 │ 45 │
└────────────┴──────────────────────────────────────────────────────────┘
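On reasonably recent ClickHouse versions, ORDER BY ... WITH FILL can generate the missing dates without hand-building the calendar subquery. A minimal sketch under that assumption follows; the date bounds and the 10-day toy data are illustrative, and filled rows get default values, so a cumulative sum would still need the window function on top of this.
-- Every date between the bounds appears; days with no data get a daily_sum of 0.
SELECT
    the_date,
    sum(value) AS daily_sum
FROM
(
    SELECT toDate('2021-01-01') + number AS the_date, number AS value
    FROM numbers(10)
)
GROUP BY the_date
ORDER BY the_date ASC WITH FILL FROM toDate('2021-01-01') TO toDate('2021-01-14') STEP 1;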

ClickHouse: How to create a column which preserves the last value from another column?

I'm trying to figure out how to build in ClickHouse a column like the one named "What I want" in the table below:
Category | Row Number | What I have | What I want
A        | 1          | 0           | 0
A        | 2          | 1           | 1
B        | 3          | 0           | 1
B        | 4          | 0           | 1
A        | 5          | 3           | 3
B        | 6          | 0           | 3
B        | 7          | 0           | 3
A        | 8          | 2           | 2
B        | 9          | 0           | 2
There are two categories, A and B, and I want the B category to 'remember' the latest value from the A category.
There is a column by which all records are ordered: Row Number.
I've found a function, arrayFill, which looks promising, but unfortunately it isn't supported by my version of the server (19.14.11.16) and there's no chance it will be updated soon.
I guess there should be some trick with ClickHouse arrays, but I didn't manage to find a way. Is there any ClickHouse ninja who could give me a hint on how to deal with it?
P.S. In fact the B category isn't zero-filled; I present it this way just to simplify the problem a little.
create table z(c String, rn Int64, hv Int64) Engine=Memory;
insert into z values ('A',1,0)('A',2,1)('B',3,0)('B',4,0)('A',5,3)('B',6,0)('B',7,0)('A',8,2)('B',9,2)('B',9,0);

-- groupArray collects the ordered rows, arraySplit cuts them into runs of equal category,
-- and every element of a 'B' run gets the last value of the preceding run.
select (arrayJoin(flatten(arrayMap(
           j -> arrayMap(m -> if(m.1 = 'B', (m.1, m.2, ga1[j - 1][-1].3), m), ga1[j]),
           arrayEnumerate(arraySplit((k, i) -> ga[i].1 <> ga[i - 1].1,
                                     (groupArray((c, rn, hv)) as ga),
                                     arrayEnumerate(ga)) as ga1)))) as r).1 _c,
       r.2 _rn, r.3 _n
from (select * from z order by rn);
┌─_c─┬─_rn─┬─_n─┐
│ A  │   1 │  0 │
│ A  │   2 │  1 │
│ B  │   3 │  1 │
│ B  │   4 │  1 │
│ A  │   5 │  3 │
│ B  │   6 │  3 │
│ B  │   7 │  3 │
│ A  │   8 │  2 │
│ B  │   9 │  2 │
└────┴─────┴────┘
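On a server new enough to have window functions (so not the asker's 19.14), the same result can be written far more compactly. A hedged sketch against the same table z: the max over (rn, hv) tuples picks the latest 'A' row seen so far, because tuples compare by rn first; any row before the first 'A' row would fall back to the sentinel value 0.
SELECT c, rn, hv, last_a.2 AS want
FROM
(
    SELECT
        c,
        rn,
        hv,
        -- latest (rn, hv) pair of an 'A' row at or before the current row
        max(if(c = 'A', (rn, hv), (toInt64(-1), toInt64(0))))
            OVER (ORDER BY rn ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_a
    FROM z
)
ORDER BY rn;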

Display COUNT(*) for every week instead of every day

Let us say that I have a table with user_id of Int32 type and login_time as a DateTime in UTC. user_id is not unique, so SELECT user_id, login_time FROM some_table; gives the following result:
┌─user_id─┬──login_time─┐
│       1 │  2021-03-01 │
│       1 │  2021-03-01 │
│       1 │  2021-03-02 │
│       2 │  2021-03-02 │
│       2 │  2021-03-03 │
└─────────┴─────────────┘
If I run SELECT COUNT(*) as count, toDate(login_time) as l FROM some_table GROUP BY l I get the following result:
┌─count───┬──login_time─┐
│       2 │  2021-03-01 │
│       2 │  2021-03-02 │
│       1 │  2021-03-03 │
└─────────┴─────────────┘
I would like to reformat the result to show COUNT on a weekly level, instead of every day, as I currently do.
My result for the above example could look something like this:
┌─count─┬─year─┬─month─┬─week ordinal─┐
│     5 │ 2021 │    03 │            1 │
│     0 │ 2021 │    03 │            2 │
│     0 │ 2021 │    03 │            3 │
│     0 │ 2021 │    03 │            4 │
└───────┴──────┴───────┴──────────────┘
I have gone through the documentation and found some interesting functions, but did not manage to make them solve my problem.
I have never worked with ClickHouse before and am not very experienced with SQL, which is why I am asking for help here.
Try this query:
select count() count, toYear(start_of_month) year, toMonth(start_of_month) month,
toWeek(start_of_week) - toWeek(start_of_month) + 1 AS "week ordinal"
from (
select *, toStartOfMonth(login_time) start_of_month,
toStartOfWeek(login_time) start_of_week
from (
/* emulate test dataset */
select data.1 user_id, toDate(data.2) login_time
from (
select arrayJoin([
(1, '2021-02-27'),
(1, '2021-02-28'),
(1, '2021-03-01'),
(1, '2021-03-01'),
(1, '2021-03-02'),
(2, '2021-03-02'),
(2, '2021-03-03'),
(2, '2021-03-08'),
(2, '2021-03-16'),
(2, '2021-04-01')]) data)
)
)
group by start_of_month, start_of_week
order by start_of_month, start_of_week
/*
┌─count─┬─year─┬─month─┬─week ordinal─┐
│ 1 │ 2021 │ 2 │ 4 │
│ 1 │ 2021 │ 2 │ 5 │
│ 5 │ 2021 │ 3 │ 1 │
│ 1 │ 2021 │ 3 │ 2 │
│ 1 │ 2021 │ 3 │ 3 │
│ 1 │ 2021 │ 4 │ 1 │
└───────┴──────┴───────┴──────────────┘
*/
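If the year/month/week-ordinal breakdown is not strictly required, grouping directly by the start of the week is simpler. A minimal sketch against some_table; the WITH FILL clause is an optional, hedged way to also emit weeks with zero logins and assumes a ClickHouse version that supports WITH FILL.
SELECT
    toStartOfWeek(login_time) AS week_start,
    count() AS count
FROM some_table
GROUP BY week_start
ORDER BY week_start ASC WITH FILL STEP 7  -- 7-day step, since every value is week-aligned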

SQL Query (ClickHouse): group by where timediff between values is less than X

I need a little help with an SQL query. I'm using ClickHouse, but maybe standard SQL syntax is enough for this task.
I've got the following table:
event_time; Text; ID
2021-03-16 09:00:48; Example_1; 1
2021-03-16 09:00:49; Example_2; 1
2021-03-16 09:00:50; Example_3; 1
2021-03-16 09:15:48; Example_1_1; 1
2021-03-16 09:15:49; Example_2_2; 1
2021-03-16 09:15:50; Example_3_3; 1
What I want to have at the end for this example is 2 rows:
Example_1Example_2Example_3
Example_1_1Example_2_2Example_3_3
That is, concatenation of the Text field based on ID. The problem is that this ID is not unique over time; it is only unique within roughly a minute, for example. So I want to concatenate only the strings where the difference between the first and the last row is less than a minute.
Right now I've got a query like:
SELECT arrayStringConcat(groupArray(Text))
FROM (SELECT event_time, Text, ID
FROM Test_Table
ORDER by event_time asc)
GROUP BY ID;
What kind of condition should I add here?
Here is an example
create table X(event_time DateTime, Text String, ID Int64) Engine=Memory;
insert into X values ('2021-03-16 09:00:48','Example_1', 1), ('2021-03-16 09:00:49','Example_2', 1), ('2021-03-16 09:00:50','Example_3', 1), ('2021-03-16 09:01:48','Example_4', 1), ('2021-03-16 09:01:49','Example_5', 1), ('2021-03-16 09:15:48','Example_1_1', 1), ('2021-03-16 09:15:49','Example_2_2', 1),('2021-03-16 09:15:50','Example_3_3', 1);
SELECT * FROM X
┌──────────event_time─┬─Text────────┬─ID─┐
│ 2021-03-16 09:00:48 │ Example_1   │  1 │
│ 2021-03-16 09:00:49 │ Example_2   │  1 │
│ 2021-03-16 09:00:50 │ Example_3   │  1 │
│ 2021-03-16 09:01:48 │ Example_4   │  1 │
│ 2021-03-16 09:01:49 │ Example_5   │  1 │
│ 2021-03-16 09:15:48 │ Example_1_1 │  1 │
│ 2021-03-16 09:15:49 │ Example_2_2 │  1 │
│ 2021-03-16 09:15:50 │ Example_3_3 │  1 │
└─────────────────────┴─────────────┴────┘
What result is expected in this case?
ClickHouse 21.3:
set allow_experimental_window_functions = 1;
SELECT
ID,
y,
groupArray(event_time),
groupArray(Text)
FROM
(
SELECT
ID,
event_time,
Text,
max(event_time) OVER (PARTITION BY ID ORDER BY event_time ASC RANGE BETWEEN CURRENT ROW AND 60 FOLLOWING) AS y
FROM X
)
GROUP BY
ID,
y
ORDER BY
ID ASC,
y ASC
┌─ID─┬───────────────────y─┬─groupArray(event_time)────────────────────────────────────────────────────────────────────┬─groupArray(Text)──────────────────────────────────┐
│ 1 │ 2021-03-16 09:01:48 │ ['2021-03-16 09:00:48'] │ ['Example_1'] │
│ 1 │ 2021-03-16 09:01:49 │ ['2021-03-16 09:00:49','2021-03-16 09:00:50','2021-03-16 09:01:48','2021-03-16 09:01:49'] │ ['Example_2','Example_3','Example_4','Example_5'] │
│ 1 │ 2021-03-16 09:15:50 │ ['2021-03-16 09:15:48','2021-03-16 09:15:49','2021-03-16 09:15:50'] │ ['Example_1_1','Example_2_2','Example_3_3'] │
└────┴─────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────┘
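To get from that grouping to the concatenated strings the question asks for, the same (ID, y) key can be combined with arrayStringConcat. A sketch built on the query above; whether a fixed 60-second window is the intended grouping is exactly the clarification requested.
SELECT
    ID,
    arrayStringConcat(groupArray(Text)) AS concatenated
FROM
(
    SELECT
        ID,
        event_time,
        Text,
        max(event_time) OVER (PARTITION BY ID ORDER BY event_time ASC
                              RANGE BETWEEN CURRENT ROW AND 60 FOLLOWING) AS y
    FROM X
    ORDER BY event_time ASC
)
GROUP BY ID, y
ORDER BY ID, y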

How to show results of postcodes within a radius of a point

Hi, back with another problem. I have a table with several columns, two of which are latitude and longitude, and another is the crime type. What I need to do is work out how many crimes were committed within a given number of metres of a certain point.
Specifically, I need the number of crimes that took place within 250 m, 500 m and 1 km of the point E: 307998 m, N: 188746 m.
Help would be appreciated, or even just a push in the right direction.
Thanks.
What an interesting question. The following may help.
You can use Pythagoras's theorem to calculate the distance between a point ([100, 100] in this case) and any incident, then count the rows where that distance is less than a threshold and the type is the right one.
# select * from test;
┌─────┬─────┬──────┐
│  x  │  y  │ type │
├─────┼─────┼──────┤
│ 100 │ 100 │    1 │
│ 104 │ 100 │    1 │
│ 110 │ 100 │    1 │
│ 110 │ 102 │    1 │
│  50 │ 102 │    2 │
│  50 │ 150 │    2 │
│  50 │ 152 │    3 │
│ 150 │ 152 │    1 │
│  40 │ 152 │    1 │
│ 150 │ 150 │    2 │
└─────┴─────┴──────┘
(10 rows)
select count(*) from test where sqrt((x-100)*(x-100)+(y-100)*(y-100))<30 and type = 1;
┌───────┐
│ count │
├───────┤
│     4 │
└───────┘
(1 row)
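Translated into ClickHouse syntax for the crime table from the question, a hedged sketch follows; the table name crimes and the columns easting, northing and crime_type are assumptions, as is the premise that the coordinates are already projected eastings/northings in metres.
-- Count crimes within 250 m, 500 m and 1 km of the point E 307998, N 188746.
SELECT
    countIf(dist <= 250)  AS within_250m,
    countIf(dist <= 500)  AS within_500m,
    countIf(dist <= 1000) AS within_1km
FROM
(
    SELECT sqrt(pow(easting - 307998, 2) + pow(northing - 188746, 2)) AS dist
    FROM crimes
    -- WHERE crime_type = 1   -- optional: restrict to one crime type, as in the answer above
);
If the stored columns are really latitude and longitude rather than projected coordinates, ClickHouse's geoDistance(lon1, lat1, lon2, lat2) returns the great-circle distance in metres and can replace the Pythagoras expression.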