ClickHouse: how to convert a date to a long integer? - sql

I have a date as a string field in the form '2018-01-02 12:12:22'. What is the right way to convert it to a long integer timestamp in ClickHouse SQL?

:) SELECT toUInt64(toDateTime('2018-01-02 12:12:22'));
SELECT toUInt64(toDateTime('2018-01-02 12:12:22'))
┌─toUInt64(toDateTime('2018-01-02 12:12:22'))─┐
│                                  1514884342 │
└─────────────────────────────────────────────┘
1 rows in set. Elapsed: 0.001 sec.

My query, which pins the time zone explicitly, returns a different result:
SELECT toUInt64(toDateTime('2018-01-02 12:12:22', 'UTC'))
┌─toUInt64(toDateTime('2018-01-02 12:12:22', 'UTC'))─┐
│                                         1514895142 │
└────────────────────────────────────────────────────┘
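The two results differ by exactly the server's UTC offset: toDateTime without a time-zone argument interprets the string in the server's time zone, while the 'UTC' variant pins it to UTC. A minimal sketch using toUnixTimestamp, the dedicated conversion function (the explicit zone argument is optional and shown here only to make the result reproducible):
SELECT toUnixTimestamp(toDateTime('2018-01-02 12:12:22', 'UTC')) AS ts
-- returns 1514895142 regardless of the server time zone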

Subtracting DateTime64 from DateTime64 in ClickHouse SQL

I'm trying to calculate the time difference in milliseconds between timestamps of two tables, like this:
SELECT value, (table1.time - table2.time) AS time_delta
but I get an error:
Illegal types DateTime64(9) and DateTime64(9) of arguments of function minus
so I can't subtract DateTime64 values directly in ClickHouse.
As a second approach I tried dateDiff, but that function is limited to whole seconds, and I need values in milliseconds.
This is supported, but I get zeros in the diff because the difference is too small (a few milliseconds):
SELECT value, dateDiff(SECOND, table1.time, table2.platform_time) AS time_delta
This is not supported:
SELECT value, dateDiff(MILLISECOND, table1.time, table2.time) AS time_delta
What's a better way to solve my problem?
P.S. I also tried converting the values to float; it works, but looks strange:
SELECT value, (toFloat64(table1.time) - toFloat64(table2.time)) AS time_delta
As a result I get something like this:
value     time_delta
51167477  -0.10901069641113281
#ditrauth Try casting to Float64, as the subsecond portion that you are looking for is stored as a decimal. Also, you want DateTime64(3) for milliseconds, see the docs. See below:
CREATE TABLE dt
(
    start DateTime64(3, 'Asia/Istanbul'),
    end DateTime64(3, 'Asia/Istanbul')
)
ENGINE = MergeTree ORDER BY end;

INSERT INTO dt VALUES (1546300800123, 1546300800125),
                      (1546300800123, 1546300800133);

SELECT
    start,
    CAST(start, 'Float64'),
    end,
    CAST(end, 'Float64'),
    CAST(end, 'Float64') - CAST(start, 'Float64') AS diff
FROM dt
┌───────────────────start─┬─CAST(start, 'Float64')─┬─────────────────────end─┬─CAST(end, 'Float64')─┬─────────────────diff─┐
│ 2019-01-01 03:00:00.123 │         1546300800.123 │ 2019-01-01 03:00:00.125 │       1546300800.125 │ 0.002000093460083008 │
│ 2019-01-01 03:00:00.123 │         1546300800.123 │ 2019-01-01 03:00:00.133 │       1546300800.133 │ 0.009999990463256836 │
└─────────────────────────┴────────────────────────┴─────────────────────────┴──────────────────────┴──────────────────────┘
2 rows in set. Elapsed: 0.001 sec.
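If exact integer milliseconds are preferred over floating-point subtraction, a sketch of an alternative (assuming a ClickHouse version that provides toUnixTimestamp64Milli, which converts a DateTime64 to a Unix timestamp in milliseconds):
SELECT
    start,
    end,
    toUnixTimestamp64Milli(end) - toUnixTimestamp64Milli(start) AS diff_ms
FROM dt
-- diff_ms is an exact Int64: 2 and 10 for the two rows above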

ClickHouse: querying the sum of Map values matching a list of keys

I have around 10,000 sets of data arriving each minute, containing the number of people observed in each of 4624 grid references (0-4623).
An example of the data:
[2022-07-11 19:45:00]
[646] => 1
[647] => 1
[648] => 1
[776] => 1
[777] => 1
[2465] => 2
[2466] => 1
[2467] => 2
...
...
I'm planning to store these in a table like this, although I'm open to suggestions (500 stores, each with ~20 IP addresses; one row of data per minute per IP):
CREATE TABLE grid_counters
(
    `storeID` UInt16,
    `camIP` IPv4,
    `timestamp` DateTime,
    `grid_counts` Map(UInt16, UInt8)
)
ENGINE = MergeTree
PARTITION BY (toMonday(timestamp), storeID)
ORDER BY (timestamp, storeID)
Example row:
storeID:     404
camIP:       192.168.2.156
timestamp:   2022-07-11 19:47:00
grid_counts: {646:1,647:1,648:1,776:1,777:1,2465:2,2466:1,2467:2}
My question is about querying these records. I need to find out how many people were observed within an arbitrary time period (up to a day) who were present in any of an arbitrary number of grid references.
I've tried sumMap, however that doesn't seem to be what I thought it was.
SELECT sumMap(grid_counts, [1, 2, 2535, 646, 647]) AS ct
FROM grid_counters
WHERE storeID = 404
    AND camIP = IPv4StringToNum('192.168.2.156')
    AND timestamp >= '2022-07-11 19:30:00'
    AND timestamp <= '2022-07-11 20:59:59'
    AND hasAny(mapKeys(grid_counts), [1, 2, 2535, 646, 647])
While I could just select grid_counts and parse it after the query, I would much prefer to return the total number of people directly from the query:
SELECT grid_counts
FROM grid_counters
WHERE storeID = 404
    AND camIP = IPv4StringToNum('192.168.2.156')
    AND timestamp >= '2022-07-11 19:30:00'
    AND timestamp <= '2022-07-11 20:59:59'
    AND hasAny(mapKeys(grid_counts), [1, 2, 2535, 646, 647])
...namely, the sum of the values of the matched keys across all rows.
For the row {646:1,647:1,648:1,776:1,777:1,2465:2,2466:1,2467:2} it would sum the totals for the matched keys [646, 647]: total (2).
For the row {2535:3,647:1,99:10} it would sum the totals for the matched keys [2535, 647]: (4).
I'm trying to find the sum of all totals for all rows: (2+4) = 6.
What should I use in place of sumMap to calculate the sum of all values with matching keys?
Help is much appreciated.
Edit
Thanks to Denny Crane's two-array advice and solution, I settled upon:
SELECT arraySum(toSum) AS total
FROM
(
    SELECT
        (sumMapFiltered([2057, 2058, 2124, 2125, 2126, 9])(grid_count_keys, grid_count_values) AS r).1,
        r.2 AS toSum
    FROM grid_counters
)
┌─total─┐
│   113 │
└───────┘
select map(646,1,647,1) union all select map(646,3);
┌─map(646, 1, 647, 1)─┐
│ {646:3}             │
│ {646:1,647:1}       │
└─────────────────────┘
select sumMap(x)
from (
select map(646,1,647,1) x union all select map(646,3)
);
┌─sumMap(x)─────┐
│ {646:4,647:1} │
└───────────────┘
select sumMapFiltered([646])((mapKeys(x), mapValues(x)))
from (
select map(646,1,647,1) x union all select map(646,3)
);
┌─sumMapFiltered([646])(tuple(mapKeys(x), mapValues(x)))─┐
│ ([646],[4])                                             │
└─────────────────────────────────────────────────────────┘
I think you should use two Arrays instead of a Map (Map is unstable and slower):
`grid_counts_keys` Array(UInt16),
`grid_counts_values` Array(UInt8)
select sumMapFiltered([646])(grid_counts_keys, grid_counts_values) from (
select [646,647] grid_counts_keys,[1,1] grid_counts_values
union all select [646],[3]
);
┌─sumMapFiltered([646])(grid_counts_keys, grid_counts_values)─┐
│ ([646],[4])                                                 │
└─────────────────────────────────────────────────────────────┘
select sumMap((arrayFilter(i->i.1=646, arrayZip(grid_counts_keys, grid_counts_values) ) as x).1, x.2) r
from (
select [646,647] grid_counts_keys,[1,1] grid_counts_values
union all select [646],[3]
);
┌─r───────────┐
│ ([646],[4]) │
└─────────────┘
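For completeness, if the data stays in a Map column, a sketch of a single-total query (assuming a ClickHouse version that provides mapFilter and mapValues; the key list is illustrative):
SELECT sum(arraySum(mapValues(mapFilter((k, v) -> has([646, 647, 2535], k), grid_counts)))) AS total
FROM grid_counters
-- mapFilter keeps only the wanted keys, mapValues extracts their counts,
-- arraySum totals them per row, and sum() totals across all rows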

Is this the correct way to use it via Grafana?

ClickHouse:
┌─name──────────┬─type──────────┐
│ FieldUUID     │ UUID          │
│ EventDate     │ Date          │
│ EventDateTime │ DateTime      │
│ Metric        │ String        │
│ LabelNames    │ Array(String) │
│ LabelValues   │ Array(String) │
│ Value         │ Float64       │
└───────────────┴───────────────┘
Row 1:
──────
FieldUUID: 499ca963-2bd4-4c94-bc60-e60757ccaf6b
EventDate: 2021-05-13
EventDateTime: 2021-05-13 09:24:18
Metric: cluster_cm_agent_physical_memory_used
LabelNames: ['host']
LabelValues: ['test01']
Value: 104189952
Grafana:
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND $timeFilter
ORDER BY
EventDateTime
I get no data points.
Question: is this the correct way to use it via Grafana?
Example:
cluster_cm_agent_physical_memory_used{host='test01'} 104189952
Grafana expects your SQL to return data in a time-series format for most visualizations:
one column of type DateTime, Date, DateTime64, or UInt32 that describes the timestamp;
one or several columns with numeric types (Float*, Int*, UInt*) holding the metric values (the column name is used as the time-series name);
optionally, one String column that describes multiple time-series names;
or the advanced "time series" format, where the first column is the timestamp and the second is Array(tuple(String, Numeric)), the String element being the time-series name (see the sketch at the end of this answer).
So, select metrics.shell as the table and EventDateTime as the timestamp field in the query editor drop-downs; your query can then stay as
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND $timeFilter
ORDER BY
EventDateTime
The SQL query from your post can be visualized without changes only with the Table plugin, and you should change "time series" to "table" format for proper data transformation on the Grafana side.
The analog of the PromQL query
cluster_cm_agent_physical_memory_used{host='test01'} 104189952
should look like:
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND LabelValues[indexOf(LabelNames,'host')] = 'test01'
AND $timeFilter
ORDER BY
EventDateTime
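For the advanced "time series" format mentioned above, a sketch that turns the host label into the series name (groupArray and indexOf are standard ClickHouse functions; the $table and $timeFilter macros come from the plugin, as in the queries above):
SELECT
    EventDateTime AS t,
    groupArray((LabelValues[indexOf(LabelNames, 'host')], Value)) AS series
FROM $table
WHERE Metric = 'cluster_cm_agent_physical_memory_used'
    AND $timeFilter
GROUP BY t
ORDER BY t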

Why does PostgreSQL ts_query remove the last char?

I've run the following query:
SELECT *
FROM
(
    SELECT
        ts_rank(document, to_tsquery('idis:*')) AS qrank,
        public.tbl_company.company_name AS name,
        public.tbl_company.document AS vector,
        to_tsquery('idis:*') AS query
    FROM public.tbl_company
    WHERE public.tbl_company.document @@ to_tsquery('idis:*')
    UNION
    SELECT
        ts_rank(document, to_tsquery('idis:*')) AS qrank,
        public.tbl_person.full_name AS name,
        public.tbl_person.document AS vector,
        to_tsquery('idis:*') AS query
    FROM public.tbl_person
    WHERE public.tbl_person.document @@ to_tsquery('idis:*')
) AS customers
ORDER BY qrank DESC
And I've received the following result:
I searched for the text 'idis', but ts_query removed the 's' char and searched for 'idi'. The results are ordered by rank, and the rank of 'idil' is greater than that of 'idis'.
Why did ts_query remove the last char?
How can I fix this problem?
You should set your default text search configuration to a language where the stemming rules are as you expect them to be:
SET default_text_search_config='english';
SELECT to_tsvector('İdil') @@ to_tsquery('idis:*');
┌──────────┐
│ ?column? │
├──────────┤
│ t │
└──────────┘
(1 row)
SET default_text_search_config='turkish';
SELECT to_tsvector('İdil') @@ to_tsquery('idis:*');
┌──────────┐
│ ?column? │
├──────────┤
│ f │
└──────────┘
(1 row)
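If no stemming at all is desired, another option (a sketch using the built-in 'simple' configuration, which tokenizes without applying any language stemming, so a prefix query matches only literal prefixes):
SET default_text_search_config='simple';
SELECT to_tsvector('İdil') @@ to_tsquery('idis:*');
-- f: without stemming, 'idis:*' does not match 'idil'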

How to count items by day

I need to make a report from our ticket database showing the number of tickets closed per day by each tech. My SQL query looks something like this:
select
    i.DT_CLOSED,
    rp.NAME
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > @StartDate
DT_CLOSED is the date and time in ISO format, and NAME is the rep name. I also have a calculated value in my dataset called TICKETSDAY that is calculated using =DateValue(Fields!DT_CLOSED.Value), giving me the day without the time.
Right now I have a table set up that is grouped by NAME, then by TICKETSDAY, and I would like the last column to be a count of how many tickets there are. But when I set the last column to =Count(DT_CLOSED) it lists a 1 on each row for each ticket instead of aggregating, so my table looks like this:
┌───────────┬───────────┬──────────────┐
│Name │Day │Tickets Closed│
├───────────┼───────────┼──────────────┤
│JOHN SMITH │11/01/2013 │ 1│
│ │ ├──────────────┤
│ │ │ 1│
│ │ ├──────────────┤
│ │ │ 1│
│ ├───────────┼──────────────┤
│ │11/02/2013 │ 1│
└───────────┴───────────┴──────────────┘
And I need it to be:
┌───────────┬───────────┬──────────────┐
│Name │Day │Tickets Closed│
├───────────┼───────────┼──────────────┤
│JOHN SMITH │11/01/2013 │ 3│
│ ├───────────┼──────────────┤
│ │11/02/2013 │ 1│
└───────────┴───────────┴──────────────┘
Any idea what I'm doing wrong? Any help would be greatly appreciated.
I believe that Marc B is correct. You need to group by the non-aggregate columns in your select statement. Try something along these lines:
select
    i.DT_CLOSED,
    rp.NAME,
    COUNT(i.ID)
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > @StartDate
GROUP BY rp.NAME, i.DT_CLOSED
Without a group by to aggregate your rows together, your query counts each row distinctly. I'm unfamiliar with how the report builder works, but try adding the group by clause manually and see what you get.
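One caveat: because DT_CLOSED includes the time of day, grouping by it directly can still yield one group per ticket. A sketch that groups by the date part instead (assuming SQL Server, where CAST(... AS DATE) strips the time; adjust for your RDBMS):
select
    rp.NAME,
    cast(i.DT_CLOSED as date) as TICKET_DAY,
    count(i.ID) as TICKETS_CLOSED
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > @StartDate
group by rp.NAME, cast(i.DT_CLOSED as date)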
Let me know if I can clarify anything.