I have a column "device" whose rows contain JSON values like this:
device
{"brand_name":'huawei,'brand_id':'1232',''country:'china'}
{"brand_name":'sony,'brand_id':'1ds232',''country:'japan'}
I want to create a column for every element inside the JSON, like this:
brand_name  brand_id  country
huawei      1232      china
sony        1ds232    japan
In standard SQL I have done it like this:
Select
device.brand_name,
device.brand_id,
device.country
From table
I want to do this in ClickHouse.
In this case the JSON has only three values (brand_name, brand_id, country), but what if the JSON has n values? Instead of accessing every value by device.brand_name, device.brand_id, etc., I want to loop over all the values inside it and turn each one into a column.
In standard SQL I have achieved this with:
Select
device.*
From table
Is there a way to do this in ClickHouse? Thank you.
{brand_name:'huawei,'brand_id':'1232',''country:'china'}
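This is not valid JSON: keys and string values must be wrapped in double quotes.
The new JSON type (experimental, available since 22.3) can handle this: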
https://github.com/ClickHouse/ClickHouse/issues/23516
set allow_experimental_object_type=1;
create table testj( A Int64, device JSON ) Engine=MergeTree order by A;
insert into testj (device) format TSV {"brand_name":"huawei","brand_id":"1232","country":"china"}
select A, device.brand_name, device.brand_id, device.country from testj;
┌─A─┬─device.brand_name─┬─device.brand_id─┬─device.country─┐
│ 0 │ huawei            │ 1232            │ china          │
└───┴───────────────────┴─────────────────┴────────────────┘
SELECT * FROM testj;
┌─A─┬─device────────────────────┐
│ 0 │ ('1232','huawei','china') │
└───┴───────────────────────────┘
SELECT toJSONString(device) FROM testj
┌─toJSONString(device)────────────────────────────────────────┐
│ {"brand_id":"1232","brand_name":"huawei","country":"china"} │
└─────────────────────────────────────────────────────────────┘
Alternatively, keep device as a String and extract the attributes with the JSONExtract* functions:
https://kb.altinity.com/altinity-kb-queries-and-syntax/jsonextract-to-parse-many-attributes-at-a-time/
https://kb.altinity.com/altinity-kb-schema-design/altinity-kb-jsonasstring-and-mat.-view-as-json-parser/
https://clickhouse.com/docs/en/sql-reference/functions/json-functions/#jsonextractjson-indices-or-keys-return-type
create table testj( A Int64, device String,
brand_name String default JSONExtractString(device,'brand_name'),
brand_id String default JSONExtractString(device,'brand_id'),
country String default JSONExtractString(device,'country') )
Engine=MergeTree order by A;
insert into testj (device) format TSV {"brand_name":"huawei","brand_id":"1232","country":"china"}
;
select A, brand_name, brand_id, country from testj;
┌─A─┬─brand_name─┬─brand_id─┬─country─┐
│ 0 │ huawei     │ 1232     │ china   │
└───┴────────────┴──────────┴─────────┘
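For the "n keys" case, where the key names are not known in advance, one possible sketch is to turn each key/value pair into a row with JSONExtractKeysAndValues and ARRAY JOIN. This assumes device is stored as a String holding valid JSON (for the experimental JSON type, toJSONString(device) could be passed instead), and the inline sample row is only for illustration. It produces key/value rows rather than one column per key, because the column list of a query is fixed before the data is read.

```sql
-- Sketch: one output row per JSON key, without naming the keys in the query.
SELECT
    kv.1 AS key,
    kv.2 AS value
FROM
(
    -- Illustrative sample row; in practice replace with: SELECT device FROM testj
    SELECT '{"brand_name":"huawei","brand_id":"1232","country":"china"}' AS device
)
ARRAY JOIN JSONExtractKeysAndValues(device, 'String') AS kv;
```

All values are read as String here; a different type could be passed as the second argument if every value in the document shares it.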
Related
I used to use Google BigQuery and selected multiple wildcard tables with a query like this:
SELECT *
FROM project.dataset.events_*
WHERE _TABLE_SUFFIX BETWEEN "20220704" AND "20220731"
and it selects all tables between these two dates.
Is it possible in ClickHouse to query multiple tables with _TABLE_SUFFIX or an analog, if I only have a bunch of tables like
1. events_20210501
2. events_20210502
3. events_20210503
...
with the ReplicatedMergeTree table engine?
Is it possible to create a wildcard analog in ClickHouse?
Use the Merge table engine or the merge table function:
https://clickhouse.com/docs/en/engines/table-engines/special/merge
https://clickhouse.com/docs/en/sql-reference/table-functions/merge
create table A1 Engine=Memory as select 1 a;
create table A2 Engine=Memory as select 2 a;
select * from merge(currentDatabase(), '^A.$');
┌─a─┐
│ 1 │
└───┘
┌─a─┐
│ 2 │
└───┘
select * from merge(currentDatabase(), 'A');
┌─a─┐
│ 1 │
└───┘
┌─a─┐
│ 2 │
└───┘
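To mimic the BigQuery _TABLE_SUFFIX filter from the question, the merge table function can be combined with the _table virtual column, which holds the name of the source table for each row. A minimal sketch, assuming the events_YYYYMMDD tables live in the current database and share the same structure:

```sql
-- Match only tables named events_ followed by eight digits,
-- then keep rows whose source table falls inside the date range.
SELECT *
FROM merge(currentDatabase(), '^events_[0-9]{8}$')
WHERE _table BETWEEN 'events_20220704' AND 'events_20220731';
```

Because the date suffix is zero-padded, plain string comparison on _table orders the tables chronologically.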
I am trying to calculate the time difference in milliseconds between timestamps of two tables,
like this:
SELECT value, (table1.time - table2.time) AS time_delta
but I get an error:
Illegal types DateTime64(9) and DateTime64(9) of arguments of function minus:
so I can't subtract DateTime64 values in ClickHouse.
The second way I tried was DATEDIFF, but this function is limited to "SECOND" and I need values in "MILLISECOND".
This is supported, but I get zeros in the diff, because the difference is too small (a few milliseconds):
SELECT value, dateDiff(SECOND , table1.time, table2.platform_time) AS time_delta
This is not supported:
SELECT value, dateDiff(MILLISECOND , table1.time, table2.time) AS time_delta
What's a better way to resolve my problem?
P.S. I also tried converting the values to float. It works, but looks strange:
SELECT value, (toFloat64(table1.time) - toFloat64(table2.time)) AS time_delta
As a result I get something like this:
value     time
51167477  -0.10901069641113281
@ditrauth Try casting to Float64, as the subsecond portion that you are looking for is stored as a decimal. Also, you want DateTime64(3) for milliseconds; see the docs and the example below:
CREATE TABLE dt(
start DateTime64(3, 'Asia/Istanbul'),
end DateTime64(3, 'Asia/Istanbul')
)
ENGINE = MergeTree ORDER BY end
insert into dt values (1546300800123, 1546300800125),
(1546300800123, 1546300800133)
SELECT
start,
CAST(start, 'Float64'),
end,
CAST(end, 'Float64'),
CAST(end, 'Float64') - CAST(start, 'Float64') AS diff
FROM dt
┌───────────────────start─┬─CAST(start, 'Float64')─┬─────────────────────end─┬─CAST(end, 'Float64')─┬─────────────────diff─┐
│ 2019-01-01 03:00:00.123 │         1546300800.123 │ 2019-01-01 03:00:00.125 │       1546300800.125 │ 0.002000093460083008 │
│ 2019-01-01 03:00:00.123 │         1546300800.123 │ 2019-01-01 03:00:00.133 │       1546300800.133 │ 0.009999990463256836 │
└─────────────────────────┴────────────────────────┴─────────────────────────┴──────────────────────┴──────────────────────┘
2 rows in set. Elapsed: 0.001 sec.
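The Float64 cast above yields seconds with floating-point noise in the last digits. If an exact integer number of milliseconds is preferred, one alternative sketch is to convert both values with toUnixTimestamp64Milli and subtract the resulting Int64 values. It reuses the dt table defined above; newer ClickHouse releases may also accept a millisecond unit in dateDiff, but that depends on the server version and is not assumed here.

```sql
-- Sketch: exact millisecond difference using integer arithmetic.
SELECT
    start,
    end,
    toUnixTimestamp64Milli(end) - toUnixTimestamp64Milli(start) AS diff_ms
FROM dt;
-- For the two rows inserted above, diff_ms is 2 and 10.
```

The same expression can be applied to the DateTime64(9) columns from the question, since the function scales the value to milliseconds.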
I have two columns as below:
names: Array(String)
['name_one','name_2','name3']
values:Array(Float64)
[1000,2000,3000]
For example, I am interested in getting the value of 'name_2'. I want to retrieve 2000.
My guess is that I should first identify the location of 'name_2' in names, and then use it to retrieve the value in column values?
Would you use JSON to get to the solution?
PS. I have just started to learn SQL and am only familiar with the basics at the moment. I have read some documentation, but I am struggling with this one (I always get errors).
I am using ClickHouse.
Thanks for the help!
If the name can occur multiple times:
SELECT arrayFilter((x, y) -> (y = 'name_2'), values, names)
FROM
(
SELECT
1 AS id,
['name_one', 'name_2', 'name3', 'name_2'] AS names,
[1000, 2000, 3000, 4000] AS values
)
┌─arrayFilter(lambda(tuple(x, y), equals(y, 'name_2')), values, names)─┐
│ [2000,4000]                                                          │
└──────────────────────────────────────────────────────────────────────┘
If it occurs only once:
SELECT values[indexOf(names, 'name_2')]
FROM
(
SELECT
1 AS id,
['name_one', 'name_2', 'name3'] AS names,
[1000, 2000, 3000] AS values
)
┌─arrayElement(values, indexOf(names, 'name_2'))─┐
│                                           2000 │
└────────────────────────────────────────────────┘
Consider using the arrayZip function:
SELECT
arrayZip(names, values) AS zipped,
zipped[2] AS second_pair,
second_pair.1 AS second_name,
second_pair.2 AS second_value
FROM
(
SELECT
['name_one', 'name_2', 'name3'] AS names,
[1000, 2000, 3000] AS values
)
/*
┌─zipped─────────────────────────────────────────────┬─second_pair─────┬─second_name─┬─second_value─┐
│ [('name_one',1000),('name_2',2000),('name3',3000)] │ ('name_2',2000) │ name_2      │         2000 │
└────────────────────────────────────────────────────┴─────────────────┴─────────────┴──────────────┘
*/
The ARRAY JOIN clause can be useful too:
SELECT *
FROM
(
SELECT
1 AS id,
['name_one', 'name_2', 'name3'] AS names,
[1000, 2000, 3000] AS values
)
ARRAY JOIN
names,
values
/*
┌─id─┬─names────┬─values─┐
│  1 │ name_one │   1000 │
│  1 │ name_2   │   2000 │
│  1 │ name3    │   3000 │
└────┴──────────┴────────┘
*/
Look at Nested Data Structures to store paired values.
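As a rough illustration of the Nested suggestion, the two parallel arrays could be declared as one Nested column. The table name metrics_nested and the column name attrs are hypothetical, chosen only for this sketch; with default settings a Nested column is stored as the arrays attrs.name and attrs.value, which must have equal length in every row.

```sql
-- Hypothetical table: name/value pairs kept together in one Nested column.
CREATE TABLE metrics_nested
(
    id UInt32,
    attrs Nested(name String, value Float64)
)
ENGINE = MergeTree ORDER BY id;

INSERT INTO metrics_nested VALUES (1, ['name_one', 'name_2', 'name3'], [1000, 2000, 3000]);

-- Look up the value paired with 'name_2' (same indexOf technique as above).
SELECT attrs.value[indexOf(attrs.name, 'name_2')] AS name_2_value
FROM metrics_nested;

-- Or unfold every pair into its own row.
SELECT id, attrs.name, attrs.value
FROM metrics_nested
ARRAY JOIN attrs;
```

Note that indexOf returns 0 when the name is not present, so a missing name should be handled separately.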
ClickHouse:
┌─name──────────┬─type──────────┬
│ FieldUUID     │ UUID          │
│ EventDate     │ Date          │
│ EventDateTime │ DateTime      │
│ Metric        │ String        │
│ LabelNames    │ Array(String) │
│ LabelValues   │ Array(String) │
│ Value         │ Float64       │
└───────────────┴───────────────┴
Row 1:
──────
FieldUUID: 499ca963-2bd4-4c94-bc60-e60757ccaf6b
EventDate: 2021-05-13
EventDateTime: 2021-05-13 09:24:18
Metric: cluster_cm_agent_physical_memory_used
LabelNames: ['host']
LabelValues: ['test01']
Value: 104189952
Grafana:
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND $timeFilter
ORDER BY
EventDateTime
The result is "no data points".
Question: is this the correct way to query it from Grafana?
Example:
cluster_cm_agent_physical_memory_used{host='test01'} 104189952
Grafana expects your SQL to return data in a time series format for most visualizations:
- one column of type DateTime, Date, DateTime64 or UInt32 that holds the timestamp
- one or several columns of numeric types (Float*, Int*, UInt*) with metric values (the column name is used as the time series name)
- optionally, one String column that can hold the names of multiple time series
or the advanced "time series" format, where the first column is the timestamp and the second column is Array(Tuple(String, Numeric)), where the String element is the time series name (usually it used with
So, select metrics.shell as the table and EventDateTime as the field in the drop-downs of the query editor; your query could then be changed to
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND $timeFilter
ORDER BY
EventDateTime
The SQL query from your post can be visualized without changes only with the Table plugin, and you should change the format from "time series" to "table" so that Grafana transforms the data properly.
The analog of the PromQL query
cluster_cm_agent_physical_memory_used{host='test01'} 104189952
should look like
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND LabelValues[indexOf(LabelNames,'host')] = 'test01'
AND $timeFilter
ORDER BY
EventDateTime
I've run the following query:
SELECT
*
FROM
(
SELECT
ts_rank(document, to_tsquery('idis:*')) AS qrank,
public.tbl_company.company_name as name,
public.tbl_company.document as vector,
to_tsquery('idis:*') as query
FROM
public.tbl_company
WHERE
public.tbl_company.document @@ to_tsquery('idis:*')
UNION
SELECT
ts_rank(document, to_tsquery('idis:*')) AS qrank,
public.tbl_person.full_name as name,
public.tbl_person.document as vector,
to_tsquery('idis:*') as query
FROM
public.tbl_person
WHERE
public.tbl_person.document @@ to_tsquery('idis:*')
) AS customers
ORDER BY qrank DESC
And I've received the following result:
I searched for the text 'idis', but to_tsquery removed the 's' character and searched for 'idi'. The results are ordered by rank, and the rank of 'idil' is greater than that of 'idis'.
Why did to_tsquery remove the last character?
How can I fix this problem?
You should set your default text search configuration to a language whose stemming rules are what you expect:
SET default_text_search_config='english';
SELECT to_tsvector('İdil') @@ to_tsquery('idis:*');
┌──────────┐
│ ?column? │
├──────────┤
│ t        │
└──────────┘
(1 row)
SET default_text_search_config='turkish';
SELECT to_tsvector('İdil') @@ to_tsquery('idis:*');
┌──────────┐
│ ?column? │
├──────────┤
│ f        │
└──────────┘
(1 row)
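To see what a given configuration does to the search term, without changing the session default, the configuration name can be passed as the first argument of to_tsquery and to_tsvector. The sketch below assumes that stemming is unwanted for names and uses the built-in simple configuration, which only lower-cases tokens. Building the vector from company_name at query time is an illustration only; in practice the stored document column would need to be built with the same configuration as the query.

```sql
-- Inspect how the term is normalized under each configuration.
SELECT to_tsquery('english', 'idis:*');  -- stemmed according to the English rules
SELECT to_tsquery('simple',  'idis:*');  -- 'idis':* (no stemming)

-- Sketch: match names by prefix without stemming.
SELECT company_name
FROM public.tbl_company
WHERE to_tsvector('simple', company_name) @@ to_tsquery('simple', 'idis:*');
```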