Display COUNT(*) for every week instead of every day - sql

Let us say that I have a table with user_id of Int32 type and login_time as a DateTime in UTC. user_id is not unique, so SELECT user_id, login_time FROM some_table; gives the following result:
┌─user_id─┬─login_time─┐
│       1 │ 2021-03-01 │
│       1 │ 2021-03-01 │
│       1 │ 2021-03-02 │
│       2 │ 2021-03-02 │
│       2 │ 2021-03-03 │
└─────────┴────────────┘
If I run SELECT COUNT(*) as count, toDate(login_time) as l FROM some_table GROUP BY l, I get the following result:
┌─count─┬──────────l─┐
│     2 │ 2021-03-01 │
│     2 │ 2021-03-02 │
│     1 │ 2021-03-03 │
└───────┴────────────┘
I would like the result to show COUNT at the weekly level instead of the daily level I currently get.
The result for the above example could look something like this:
┌─count─┬─year─┬─month─┬─week ordinal─┐
│     5 │ 2021 │    03 │            1 │
│     0 │ 2021 │    03 │            2 │
│     0 │ 2021 │    03 │            3 │
│     0 │ 2021 │    03 │            4 │
└───────┴──────┴───────┴──────────────┘
I have gone through the documentation and found some interesting functions, but did not manage to make them solve my problem.
I have never worked with ClickHouse before and am not very experienced with SQL, which is why I am asking here for help.

Try this query:
select
    count() AS count,
    toYear(start_of_month) AS year,
    toMonth(start_of_month) AS month,
    toWeek(start_of_week) - toWeek(start_of_month) + 1 AS "week ordinal"
from (
    select
        *,
        toStartOfMonth(login_time) AS start_of_month,
        toStartOfWeek(login_time) AS start_of_week
    from (
        /* emulate test dataset */
        select data.1 AS user_id, toDate(data.2) AS login_time
        from (
            select arrayJoin([
                (1, '2021-02-27'),
                (1, '2021-02-28'),
                (1, '2021-03-01'),
                (1, '2021-03-01'),
                (1, '2021-03-02'),
                (2, '2021-03-02'),
                (2, '2021-03-03'),
                (2, '2021-03-08'),
                (2, '2021-03-16'),
                (2, '2021-04-01')]) AS data
        )
    )
)
group by start_of_month, start_of_week
order by start_of_month, start_of_week
/*
┌─count─┬─year─┬─month─┬─week ordinal─┐
│     1 │ 2021 │     2 │            4 │
│     1 │ 2021 │     2 │            5 │
│     5 │ 2021 │     3 │            1 │
│     1 │ 2021 │     3 │            2 │
│     1 │ 2021 │     3 │            3 │
│     1 │ 2021 │     4 │            1 │
└───────┴──────┴───────┴──────────────┘
*/
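
Applied directly to your own table (dropping the emulated dataset), the same approach reads as follows; this is a sketch assuming your table is named some_table as in the question:
select
    count() AS count,
    toYear(start_of_month) AS year,
    toMonth(start_of_month) AS month,
    toWeek(start_of_week) - toWeek(start_of_month) + 1 AS "week ordinal"
from (
    /* pre-compute the month and week anchors once per row */
    select
        user_id,
        login_time,
        toStartOfMonth(login_time) AS start_of_month,
        toStartOfWeek(login_time) AS start_of_week
    from some_table
)
group by start_of_month, start_of_week
order by start_of_month, start_of_week
Note that toWeek and toStartOfWeek default to mode 0, where weeks start on Sunday; if your weeks should start on Monday, pass mode 1, e.g. toStartOfWeek(login_time, 1) and toWeek(start_of_week, 1).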

Related

Calculate percentage group by outputs incorrect results

I'm trying to get percentages from a table in a ClickHouse DB. I'm having difficulty writing a query that will calculate the percentage of each type within each timestamp group.
SELECT
(intDiv(toUInt32(toDateTime(atime)), 120) * 120) * 1000 AS timestamp,
if(dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) <= 5, 'sec5',
    if((dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) > 5) AND (dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) <= 30), 'sec30',
        if((dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) > 30) AND (dateDiff('second', toDateTime(t1.atime), toDateTime(t2.unixdsn)) <= 60), 'sec60',
            'secgt60'))) AS type,
count() AS total_count,
(total_count * 100) /
(
SELECT count()
FROM sess_logs.logs_view
WHERE (status IN (0, 1)) AND (toDateTime(atime) >= toDateTime(1621410625)) AND (toDateTime(atime) <= toDateTime(1621421425))
) AS percentage_cnt
FROM sess_logs.logs_view AS t1
INNER JOIN
(
SELECT
trid,
atime,
unixdsn,
status
FROM sess_logs.logs_view
WHERE (status = 1) AND (toDate(date) >= toDate(1621410625)) AND if('all' = 'all', 1, userid =
(
SELECT userid
FROM sess_logs.user_details
WHERE (username != 'all') AND (username = 'all')
))
) AS t2 ON t1.trid = t2.trid
WHERE (t1.status = 0) AND (t2.status = 1) AND ((toDate(atime) >= toDate(1621410625)) AND (toDate(atime) <= toDate(1621421425))) AND (toDateTime(atime) >= toDateTime(1621410625)) AND (toDateTime(atime) <= toDateTime(1621421425)) AND if('all' = 'all', 1, userid =
(
SELECT userid
FROM sess_logs.user_details
WHERE (username != 'all') AND (username = 'all')
))
GROUP BY
timestamp,
type
ORDER BY timestamp ASC
Output
┌─────timestamp─┬─type────┬─total_count─┬─────────percentage_cnt─┐
│ 1621410600000 │ sec5    │       15190 │     0.9650982602181922 │
│ 1621410600000 │ sec30   │        1525 │    0.09689103665785011 │
│ 1621410600000 │ sec60   │          33 │   0.002096658498169871 │
│ 1621410600000 │ secgt60 │          61 │  0.0038756414663140043 │
│ 1621410720000 │ secgt60 │          67 │   0.004256852102344891 │
│ 1621410720000 │ sec30   │        2082 │    0.13228009070271735 │
│ 1621410720000 │ sec60   │          65 │   0.004129781890334595 │
│ 1621410720000 │ sec5    │       20101 │     1.2771191658094723 │
│ 1621410840000 │ sec30   │        4598 │    0.29213441741166873 │
│ 1621410840000 │ sec60   │          36 │   0.002287263816185314 │
│ 1621410840000 │ secgt60 │          61 │  0.0038756414663140043 │
│ 1621410840000 │ sec5    │       17709 │     1.1251431922451591 │
│ 1621410960000 │ sec60   │          17 │  0.0010800968020875095 │
│ 1621410960000 │ secgt60 │          81 │   0.005146343586416957 │
│ 1621410960000 │ sec30   │        2057 │    0.13069171305258864 │
│ 1621410960000 │ sec5    │       18989 │      1.206468127931748 │
│ 1621411080000 │ sec60   │           9 │  0.0005718159540463285 │
│ 1621411080000 │ sec30   │        3292 │    0.20915756896894594 │
│ 1621411080000 │ sec5    │       15276 │     0.9705622793346349 │
│ 1621411080000 │ secgt60 │          78 │   0.004955738268401514 │
└───────────────┴─────────┴─────────────┴────────────────────────┘
It returns the percentage for each row, but when I sum the percentage_cnt column, the total does not come to 100%; instead it comes to about 80%.
Please help me correct my query. I know the query is huge; you can give a simpler example for my use case. Thanks.
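
The mismatch most likely comes from the denominator: the scalar subquery counts every logs_view row with status IN (0, 1) in the time window, while the numerator only counts rows that survive the INNER JOIN and the status = 0 / status = 1 filters, so the shares necessarily add up to less than 100%. To make the percentages sum to 100%, the denominator must count exactly the same row set the outer query aggregates. A minimal sketch of the idea on a simplified, hypothetical table events(ts, type):
SELECT
    type,
    count() AS total_count,
    total_count * 100 / (SELECT count() FROM events) AS percentage_cnt
FROM events
GROUP BY type
/* the percentages sum to 100 because the numerator and the
   denominator see exactly the same set of rows */
In your query that means the subquery behind percentage_cnt should repeat the outer query's JOIN and WHERE conditions (or both should select from one shared subquery) rather than use a simpler filter.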

Clickhouse: Mapping BETWEEN filtering from an array

I understand that if I want to filter a column between two numbers I can use BETWEEN:
SELECT a
FROM table
WHERE a BETWEEN 1 AND 5
Is there a way of mapping the filtering to an array of values? For instance, if the array was [1, 10, ..., N], the filter should keep values that fall into any of the intervals:
SELECT a
FROM table
WHERE (a BETWEEN 1 AND 1+4) OR (a BETWEEN 10 AND 10+4) OR ... OR (a BETWEEN N AND N+4)
Try this query:
WITH
    [1, 10, 75] AS starts_from,
    4 AS step,
    arrayMap(x -> (x, x + step), starts_from) AS intervals
SELECT number
FROM numbers(100)
WHERE arrayFirstIndex(x -> number >= x.1 AND number <= x.2, intervals) != 0
/*
┌─number─┐
│      1 │
│      2 │
│      3 │
│      4 │
│      5 │
│     10 │
│     11 │
│     12 │
│     13 │
│     14 │
│     75 │
│     76 │
│     77 │
│     78 │
│     79 │
└────────┘
*/
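
An equivalent way to express the same filter is arrayExists, which directly states "number falls into at least one interval"; a sketch of the same query:
WITH
    [1, 10, 75] AS starts_from,
    4 AS step,
    arrayMap(x -> (x, x + step), starts_from) AS intervals
SELECT number
FROM numbers(100)
WHERE arrayExists(x -> number >= x.1 AND number <= x.2, intervals)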

SQL query returns product of results instead of sum

How can I make sure that with this join I only receive the sum of the results and not the product?
I have a project entity which contains two one-to-many relations: disposals and supplies.
If I query the disposals with the following query:
SELECT *
FROM projects
JOIN disposals disposal on projects.project_id = disposal.disposal_project_refer
WHERE (projects.project_name = 'Höngg')
I get the following result:
project_id,project_name,disposal_id,depository_refer,material_refer,disposal_date,disposal_measurement,disposal_project_refer
1,Test,1,1,1,2020-08-12 15:24:49.913248,123,1
1,Test,2,1,2,2020-08-12 15:24:49.913248,123,1
1,Test,7,2,1,2020-08-12 15:24:49.913248,123,1
1,Test,10,3,4,2020-08-12 15:24:49.913248,123,1
The same number of rows is returned by the same query for supplies.
type Project struct {
    ProjectID   uint       `gorm:"primary_key" json:"ProjectID"`
    ProjectName string     `json:"ProjectName"`
    Disposals   []Disposal `gorm:"ForeignKey:disposal_project_refer"`
    Supplies    []Supply   `gorm:"ForeignKey:supply_project_refer"`
}
If I query both tables I would like to receive the sum of both single queries. Currently I am receiving 16 results (4 supply results multiplied by 4 disposal results).
The combined query:
SELECT *
FROM projects
JOIN disposals disposal ON projects.project_id = disposal.disposal_project_refer
JOIN supplies supply ON projects.project_id = supply.supply_project_refer
WHERE (projects.project_name = 'Höngg');
I have tried achieving my goal with union queries but was not successful. What else should I try to achieve my goal?
Here is your case (simplified):
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222))
select * from a join b on (a.x=b.x) join c on (b.x=c.x);
┌───┬───┬───┬────┬───┬─────┐
│ x │ y │ x │ z  │ x │  t  │
├───┼───┼───┼────┼───┼─────┤
│ 1 │ 1 │ 1 │ 11 │ 1 │ 111 │
│ 1 │ 1 │ 1 │ 11 │ 1 │ 222 │
│ 1 │ 1 │ 1 │ 22 │ 1 │ 111 │
│ 1 │ 1 │ 1 │ 22 │ 1 │ 222 │
└───┴───┴───┴────┴───┴─────┘
It produces a Cartesian product because the join value is the same in all tables. You need some additional condition for joining your data. For example (tests for various cases):
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222))
select *
from a
cross join lateral (
    select *
    from (select row_number() over() as rn, * from b where b.x=a.x) as b
    full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬────┬───┬────┬────┬───┬─────┐
│ x │ y │ rn │ x │ z  │ rn │ x │  t  │
├───┼───┼────┼───┼────┼────┼───┼─────┤
│ 1 │ 1 │  1 │ 1 │ 11 │  1 │ 1 │ 111 │
│ 1 │ 1 │  2 │ 1 │ 22 │  2 │ 1 │ 222 │
└───┴───┴────┴───┴────┴────┴───┴─────┘
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22),(1,33)), c(x,t) as (values(1,111),(1,222))
select *
from a
cross join lateral (
    select *
    from (select row_number() over() as rn, * from b where b.x=a.x) as b
    full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬────┬───┬─────┬──────┬──────┬──────┐
│ x │ y │ rn │ x │  z  │  rn  │  x   │  t   │
├───┼───┼────┼───┼─────┼──────┼──────┼──────┤
│ 1 │ 1 │  1 │ 1 │  11 │    1 │    1 │  111 │
│ 1 │ 1 │  2 │ 1 │  22 │    2 │    1 │  222 │
│ 1 │ 1 │  3 │ 1 │  33 │ ░░░░ │ ░░░░ │ ░░░░ │
└───┴───┴────┴───┴─────┴──────┴──────┴──────┘
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222),(1,333))
select *
from a
cross join lateral (
    select *
    from (select row_number() over() as rn, * from b where b.x=a.x) as b
    full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬──────┬──────┬──────┬────┬───┬─────┐
│ x │ y │  rn  │  x   │  z   │ rn │ x │  t  │
├───┼───┼──────┼──────┼──────┼────┼───┼─────┤
│ 1 │ 1 │    1 │    1 │   11 │  1 │ 1 │ 111 │
│ 1 │ 1 │    2 │    1 │   22 │  2 │ 1 │ 222 │
│ 1 │ 1 │ ░░░░ │ ░░░░ │ ░░░░ │  3 │ 1 │ 333 │
└───┴───┴──────┴──────┴──────┴────┴───┴─────┘
db<>fiddle
Note that there is no obvious relation between disposals and supplies (b and c in my example), so the pairing of rows could be arbitrary. In my opinion, a better solution for this task is to aggregate the data from those tables using JSON, for example:
with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22),(1,33)), c(x,t) as (values(1,111),(1,222))
select
    *,
    (select json_agg(to_json(b.*)) from b where a.x=b.x) as b,
    (select json_agg(to_json(c.*)) from c where a.x=c.x) as c
from a;
┌───┬───┬──────────────────────────────────────────────────┬────────────────────────────────────┐
│ x │ y │                        b                         │                 c                  │
├───┼───┼──────────────────────────────────────────────────┼────────────────────────────────────┤
│ 1 │ 1 │ [{"x":1,"z":11}, {"x":1,"z":22}, {"x":1,"z":33}] │ [{"x":1,"t":111}, {"x":1,"t":222}] │
└───┴───┴──────────────────────────────────────────────────┴────────────────────────────────────┘
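
Since the question mentions union queries: the "sum instead of product" shape can also be produced with UNION ALL, stacking the two child tables instead of joining them side by side. A sketch, assuming a supply_id column analogous to disposal_id (the supply column name is hypothetical):
select p.project_id, p.project_name, 'disposal' as kind, d.disposal_id as item_id
from projects p
join disposals d on p.project_id = d.disposal_project_refer
where p.project_name = 'Höngg'
union all
select p.project_id, p.project_name, 'supply' as kind, s.supply_id as item_id
from projects p
join supplies s on p.project_id = s.supply_project_refer
where p.project_name = 'Höngg';
-- 4 disposal rows + 4 supply rows = 8 rows, not 4 x 4 = 16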

how to show results of postcodes within a radius of a point

Hi, back with another problem. I have a table with several columns, two of which are latitude and longitude, and another of which is the crime type. What I need to do is work out how many crimes were committed within x metres of a certain point.
Specifically, I need to find the number of crimes that took place within 250m, 500m and 1km of the point E:307998m, N:188746m.
Help would be appreciated, or even just a push in the right direction.
Thanks
What an interesting question. The following may help.
You can use Pythagoras's theorem to calculate the distance from a point ([100,100] in this case) to each incident, then count the rows where this distance is below a threshold and the type is the one of interest.
# select * from test;
┌─────┬─────┬──────┐
│  x  │  y  │ type │
├─────┼─────┼──────┤
│ 100 │ 100 │    1 │
│ 104 │ 100 │    1 │
│ 110 │ 100 │    1 │
│ 110 │ 102 │    1 │
│  50 │ 102 │    2 │
│  50 │ 150 │    2 │
│  50 │ 152 │    3 │
│ 150 │ 152 │    1 │
│  40 │ 152 │    1 │
│ 150 │ 150 │    2 │
└─────┴─────┴──────┘
(10 rows)
select count(*) from test where sqrt((x-100)*(x-100)+(y-100)*(y-100))<30 and type = 1;
┌───────┐
│ count │
├───────┤
│     4 │
└───────┘
(1 row)
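
Applied to the point from the question, the same idea gives all three radii in one pass using conditional counts. This is a sketch assuming the coordinates are stored as projected easting/northing metres in columns named easting and northing (hypothetical names):
select
    sum(case when d <= 250  then 1 else 0 end) as within_250m,
    sum(case when d <= 500  then 1 else 0 end) as within_500m,
    sum(case when d <= 1000 then 1 else 0 end) as within_1km
from (
    -- Pythagoras: straight-line distance from E:307998, N:188746
    select sqrt((easting - 307998.0) * (easting - 307998.0)
              + (northing - 188746.0) * (northing - 188746.0)) as d
    from crimes
) as distances;
Note this only works if the coordinates are in a metre-based projection; if the columns really hold latitude/longitude in degrees, the points need to be projected first (for example with PostGIS geography types, if available).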

operator does not exist: integer = integer[] plpgsql error

I have a problem where an "operator does not exist: integer = integer[]" error comes up when I try to run the query
select staff
from affiliations
where orgUnit = any (select unnest(*) from get_ou(661));
The function get_ou(661) returns an array of integers. I was wondering why I can't use = ANY to obtain the staff from any of the org units in the array.
Thank you for your help!
The ANY predicate used with a subselect compares a value against each value returned by the subselect, and is true if the comparison holds for at least one of them.
postgres=# SELECT * FROM foo_table;
┌────┬───┐
│ id │ x │
╞════╪═══╡
│  1 │ 9 │
│  2 │ 4 │
│  3 │ 1 │
│  4 │ 3 │
│  5 │ 7 │
│  6 │ 5 │
│  7 │ 3 │
│  8 │ 8 │
│  9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(10 rows)
CREATE OR REPLACE FUNCTION public.foo(VARIADIC integer[])
RETURNS integer[]
LANGUAGE sql
AS $function$ SELECT $1 $function$
It is strange: your example as posted is broken (it fails with a syntax error, since unnest(*) is not valid). When I fix it, it works:
postgres=# SELECT * FROM foo_table
WHERE x = ANY(SELECT unnest(v) FROM foo(3,8) g(v));
┌────┬───┐
│ id │ x │
╞════╪═══╡
│  4 │ 3 │
│  7 │ 3 │
│  8 │ 8 │
│  9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(5 rows)
You should change the syntax and move from a subselect to an array expression (this solution is preferable for this purpose):
postgres=# SELECT * FROM foo_table WHERE x = ANY(foo(3,8));
┌────┬───┐
│ id │ x │
╞════╪═══╡
│  4 │ 3 │
│  7 │ 3 │
│  8 │ 8 │
│  9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(5 rows)
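
Applied back to the original question, the preferred array-expression form would be (a sketch, assuming get_ou(661) returns integer[] as described):
SELECT staff
FROM affiliations
WHERE orgUnit = ANY (get_ou(661));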