How to select multiple wildcard tables in ClickHouse? - google-bigquery

I used to use Google BigQuery and selected multiple wildcard tables with a query like this:
SELECT *
FROM project.dataset.events_*
WHERE _TABLE_SUFFIX BETWEEN "20220704" AND "20220731"
This selects all tables between these two dates.
Is it possible in ClickHouse to query multiple tables with _TABLE_SUFFIX or an analog, if I only have a bunch of tables like
1. events_20210501
2. events_20210502
3. events_20210503
...
with the table engine ReplicatedMergeTree?
Is it possible to create a wildcard analog in ClickHouse?

Use the Merge table engine or the merge table function:
https://clickhouse.com/docs/en/engines/table-engines/special/merge
https://clickhouse.com/docs/en/sql-reference/table-functions/merge
create table A1 Engine=Memory as select 1 a;
create table A2 Engine=Memory as select 2 a;
select * from merge(currentDatabase(), '^A.$');
┌─a─┐
│ 1 │
└───┘
┌─a─┐
│ 2 │
└───┘
select * from merge(currentDatabase(), 'A');
┌─a─┐
│ 1 │
└───┘
┌─a─┐
│ 2 │
└───┘
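To get closer to the BigQuery `_TABLE_SUFFIX BETWEEN ...` filter, one option is to combine merge() with a filter on the `_table` virtual column that Merge tables expose. The sketch below only builds such a query string and models the table-name matching in Python. The helper names (`matching_tables`, `build_range_query`) are made up for illustration, Python's `re.search` is a stand-in for ClickHouse's regex matching rather than the real engine, and the prefix is assumed to contain no regex metacharacters.

```python
import re


def matching_tables(table_names, pattern):
    """Return the names a merge(db, pattern) call would be expected to pick up.

    re.search (unanchored) is used because the pattern 'A' in the example
    above matches both A1 and A2.
    """
    regex = re.compile(pattern)
    return [name for name in table_names if regex.search(name)]


def build_range_query(prefix, first_suffix, last_suffix):
    """Build a ClickHouse query restricted to a table-suffix range.

    Hypothetical helper: assumes the prefix has no regex metacharacters
    and that the suffixes sort correctly as strings (e.g. YYYYMMDD).
    """
    return (
        f"SELECT * FROM merge(currentDatabase(), '^{prefix}') "
        f"WHERE _table BETWEEN '{prefix}{first_suffix}' AND '{prefix}{last_suffix}'"
    )


tables = ["A1", "A2", "B1", "events_20210501", "events_20210502"]
print(matching_tables(tables, "^A.$"))
print(matching_tables(tables, "^events_"))
print(build_range_query("events_", "20220704", "20220731"))
```

The string comparison on `_table` works here only because the date suffix has a fixed width, so lexical order and date order agree.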

Related

Access all the values inside the JSON in ClickHouse

I have a column "device" whose rows hold JSON values like this:
device
{"brand_name":'huawei,'brand_id':'1232',''country:'china'}
{"brand_name":'sony,'brand_id':'1ds232',''country:'japan'}
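I want to create a column for every element inside the JSON, like this:
┌────────────┬──────────┬─────────┐
│ brand_name │ brand_id │ country │
├────────────┼──────────┼─────────┤
│ huawei     │ 1232     │ china   │
│ sony       │ 1ds232   │ japan   │
└────────────┴──────────┴─────────┘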
In standard SQL I have done it like this:
Select
device.brand_name,
device.brand_id,
device.country
From table
I want to do this in ClickHouse. In this case the JSON has only three values (brand_name, brand_id, country), but what if the JSON has n values? Instead of accessing every value by device.brand_name, device.brand_id, etc., I want to loop over all the values inside it and make each one a column.
In standard SQL I have achieved this with:
Select
device.*
From table
Is there a way to do it in ClickHouse? Thank you.
{brand_name:'huawei,'brand_id':'1232',''country:'china'}
That is not valid JSON.
With the new JSON type (experimental, since 22.3):
https://github.com/ClickHouse/ClickHouse/issues/23516
set allow_experimental_object_type=1;
create table testj( A Int64, device JSON ) Engine=MergeTree order by A;
insert into testj (device) format TSV {"brand_name":"huawei","brand_id":"1232","country":"china"}
select A, device.brand_name, device.brand_id, device.country from testj;
┌─A─┬─device.brand_name─┬─device.brand_id─┬─device.country─┐
│ 0 │ huawei            │ 1232            │ china          │
└───┴───────────────────┴─────────────────┴────────────────┘
SELECT * FROM testj;
┌─A─┬─device────────────────────┐
│ 0 │ ('1232','huawei','china') │
└───┴───────────────────────────┘
SELECT toJSONString(device) FROM testj
┌─toJSONString(device)────────────────────────────────────────┐
│ {"brand_id":"1232","brand_name":"huawei","country":"china"} │
└─────────────────────────────────────────────────────────────┘
Alternatively, keep the column as a String and extract the attributes with the JSON functions:
https://kb.altinity.com/altinity-kb-queries-and-syntax/jsonextract-to-parse-many-attributes-at-a-time/
https://kb.altinity.com/altinity-kb-schema-design/altinity-kb-jsonasstring-and-mat.-view-as-json-parser/
https://clickhouse.com/docs/en/sql-reference/functions/json-functions/#jsonextractjson-indices-or-keys-return-type
create table testj( A Int64, device String,
brand_name String default JSONExtractString(device,'brand_name'),
brand_id String default JSONExtractString(device,'brand_id'),
country String default JSONExtractString(device,'country') )
Engine=MergeTree order by A;
insert into testj (device) format TSV {"brand_name":"huawei","brand_id":"1232","country":"china"}
;
select A, brand_name, brand_id, country from testj;
┌─A─┬─brand_name─┬─brand_id─┬─country─┐
│ 0 │ huawei     │ 1232     │ china   │
└───┴────────────┴──────────┴─────────┘
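For the "loop over every key" part of the question, it may help to see the idea outside the database. The following is a minimal Python sketch, not ClickHouse behavior: it parses each JSON string and turns every key it finds into a column. The function names are invented for this example, and filling a missing key with an empty string is an assumption chosen to resemble a String column default.

```python
import json


def extract_columns(raw_json, keys):
    """Pick the given keys from a JSON object string.

    Missing keys become '' in this sketch (an assumption that mimics a
    String column falling back to its default value).
    """
    parsed = json.loads(raw_json)
    return {key: str(parsed.get(key, "")) for key in keys}


def expand_all(raw_rows):
    """Turn every key seen in any row into a column (the 'device.*' idea)."""
    parsed_rows = [json.loads(raw) for raw in raw_rows]
    columns = sorted({key for row in parsed_rows for key in row})
    table = [[str(row.get(col, "")) for col in columns] for row in parsed_rows]
    return columns, table


rows = [
    '{"brand_name":"huawei","brand_id":"1232","country":"china"}',
    '{"brand_name":"sony","brand_id":"1ds232","country":"japan"}',
]
print(extract_columns(rows[0], ["brand_name", "brand_id", "country"]))
columns, table = expand_all(rows)
print(columns)
print(table)
```

Note that the set of columns is only known after reading the data, which is why a fixed table schema needs either explicit extracted columns or a dynamic type such as the JSON type shown earlier.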

How to access a specific value with two separate arrays in SQL (one with the names and the other with the values)

I have two columns as below:
names: Array(String)
['name_one','name_2','name3']
values:Array(Float64)
[1000,2000,3000]
For example, I am interested in getting the value of 'name_2'. I want to retrieve 2000.
My guess is that I should first identify the location of 'name_2' in names, and then use it to retrieve the value in column values?
Would you use JSON to get to the solution?
PS. I have just started to learn SQL and am only familiar with the basics at the moment. I have read some documentation, but I am struggling with this one (I always get errors).
I am using ClickHouse.
Thanks for the help!
If the name can occur multiple times:
SELECT arrayFilter((x, y) -> (y = 'name_2'), values, names)
FROM
(
SELECT
1 AS id,
['name_one', 'name_2', 'name3', 'name_2'] AS names,
[1000, 2000, 3000, 4000] AS values
)
┌─arrayFilter(lambda(tuple(x, y), equals(y, 'name_2')), values, names)─┐
│ [2000,4000] │
└──────────────────────────────────────────────────────────────────────┘
If it occurs only once:
SELECT values[indexOf(names, 'name_2')]
FROM
(
SELECT
1 AS id,
['name_one', 'name_2', 'name3'] AS names,
[1000, 2000, 3000] AS values
)
┌─arrayElement(values, indexOf(names, 'name_2'))─┐
│ 2000 │
└────────────────────────────────────────────────┘
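To make the two lookups above easier to follow, here is a small Python model of the same logic. It is only a sketch: the function names are invented, and the handling of a missing name is an assumption. As far as I know, indexOf returns 0 when nothing matches and ClickHouse then yields the type's default value (0 for numbers) for that out-of-range index, so a missing name and a stored 0 can look the same.

```python
def index_of(items, target):
    """1-based position of the first match, or 0 when absent (like indexOf)."""
    for position, item in enumerate(items, start=1):
        if item == target:
            return position
    return 0


def value_for(names, values, target, default=0):
    """Model of values[indexOf(names, target)].

    'default' stands in for the column type's default value when the
    name is missing (an assumption in this sketch).
    """
    position = index_of(names, target)
    return values[position - 1] if position else default


def values_for_all(names, values, target):
    """Model of arrayFilter((x, y) -> y = target, values, names)."""
    return [value for value, name in zip(values, names) if name == target]


names = ["name_one", "name_2", "name3", "name_2"]
values = [1000, 2000, 3000, 4000]
print(value_for(names, values, "name_2"))
print(values_for_all(names, values, "name_2"))
print(value_for(names, values, "no_such_name"))
```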
Consider using the arrayZip function:
SELECT
arrayZip(names, values) AS zipped,
zipped[2] AS second_pair,
second_pair.1 AS second_name,
second_pair.2 AS second_value
FROM
(
SELECT
['name_one', 'name_2', 'name3'] AS names,
[1000, 2000, 3000] AS values
)
/*
┌─zipped─────────────────────────────────────────────┬─second_pair─────┬─second_name─┬─second_value─┐
│ [('name_one',1000),('name_2',2000),('name3',3000)] │ ('name_2',2000) │ name_2 │ 2000 │
└────────────────────────────────────────────────────┴─────────────────┴─────────────┴──────────────┘
*/
The ARRAY JOIN clause may be useful too:
SELECT *
FROM
(
SELECT
1 AS id,
['name_one', 'name_2', 'name3'] AS names,
[1000, 2000, 3000] AS values
)
ARRAY JOIN
names,
values
/*
┌─id─┬─names────┬─values─┐
│  1 │ name_one │   1000 │
│  1 │ name_2   │   2000 │
│  1 │ name3    │   3000 │
└────┴──────────┴────────┘
*/
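What ARRAY JOIN does to a row can be pictured as follows. This is a rough Python model under the assumption that all joined arrays have the same length; `array_join` is a hypothetical helper written for this example, not a ClickHouse API.

```python
def array_join(row, array_columns):
    """Expand one row into one row per array position.

    Non-array columns (such as id) are repeated on every output row.
    """
    lengths = {len(row[column]) for column in array_columns}
    if len(lengths) > 1:
        raise ValueError("joined arrays must have the same length")
    size = lengths.pop() if lengths else 0
    expanded = []
    for i in range(size):
        new_row = dict(row)
        for column in array_columns:
            new_row[column] = row[column][i]
        expanded.append(new_row)
    return expanded


source = {
    "id": 1,
    "names": ["name_one", "name_2", "name3"],
    "values": [1000, 2000, 3000],
}
joined = array_join(source, ["names", "values"])
# After the expansion an ordinary filter finds the wanted value.
print([r["values"] for r in joined if r["names"] == "name_2"])
```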
Also look at Nested data structures for storing paired values.

Why does PostgreSQL to_tsquery remove the last character?

I've run the following query:
SELECT
*
FROM
(
SELECT
ts_rank(document, to_tsquery('idis:*')) AS qrank,
public.tbl_company.company_name as name,
public.tbl_company.document as vector,
to_tsquery('idis:*') as query
FROM
public.tbl_company
WHERE
public.tbl_company.document @@ to_tsquery('idis:*')
UNION
SELECT
ts_rank(document, to_tsquery('idis:*')) AS qrank,
public.tbl_person.full_name as name,
public.tbl_person.document as vector,
to_tsquery('idis:*') as query
FROM
public.tbl_person
WHERE
public.tbl_person.document @@ to_tsquery('idis:*')
)as customers
ORDER BY qrank DESC
And I received the following result:
I searched for the text 'idis', but to_tsquery removed the 's' and searched for 'idi'. The results are ordered by rank, and the rank of 'idil' is greater than the rank of 'idis'.
Why did to_tsquery remove the last character?
How can I fix this problem?
You should set your default text search configuration to a language whose stemming rules behave as you expect:
SET default_text_search_config='english';
SELECT to_tsvector('İdil') @@ to_tsquery('idis:*');
┌──────────┐
│ ?column? │
├──────────┤
│ t        │
└──────────┘
(1 row)
SET default_text_search_config='turkish';
SELECT to_tsvector('İdil') @@ to_tsquery('idis:*');
┌──────────┐
│ ?column? │
├──────────┤
│ f        │
└──────────┘
(1 row)

How to find leaf node in multiple branches of a tree in PostgreSQL

Here's the dataset, which represents branches of a tree. Nodes 3 and 5 are clearly the leaf nodes.
branch_of_tree
1
1/2
1/2/3
1/2/4
1/2/4/5
I intend to find all leaf nodes, so for the above example, it should be node 3 and node 5. Could anyone give me an idea how to solve it in PostgreSQL? Thanks!
Try this query:
SELECT Tb.* FROM T as Tb
WHERE NOT EXISTS (SELECT * FROM T WHERE
T.branch_of_tree LIKE Tb.branch_of_tree || '%'
AND T.branch_of_tree <> Tb.branch_of_tree )
Maybe something like the following:
SELECT t.branch_of_tree
FROM tree t
WHERE NOT EXISTS
(SELECT 1
FROM tree t2
WHERE t.branch_of_tree <> t2.branch_of_tree
AND position(t.branch_of_tree in t2.branch_of_tree) = 1);
┌────────────────┐
│ branch_of_tree │
├────────────────┤
│ 1/2/3 │
│ 1/2/4/5 │
└────────────────┘
(2 rows)
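The idea behind both queries is that a leaf is a path that no other path extends. A Python sketch of that check follows; `find_leaves` is a name invented here, and the path separator is assumed to be '/'. One detail worth noting: a plain prefix test would treat 1/2 as an ancestor of 1/20, so the sketch compares against the path plus a trailing '/'.

```python
def find_leaves(paths):
    """Return the paths that are not a proper ancestor of any other path."""
    leaves = []
    for path in paths:
        # A child of 'path' must start with 'path/' and not merely with 'path',
        # otherwise '1/2' would wrongly count as a parent of '1/20'.
        has_child = any(other.startswith(path + "/") for other in paths)
        if not has_child:
            leaves.append(path)
    return leaves


branches = ["1", "1/2", "1/2/3", "1/2/4", "1/2/4/5"]
print(find_leaves(branches))              # ['1/2/3', '1/2/4/5']
print(find_leaves(["1", "1/2", "1/20"]))  # ['1/2', '1/20']
```

In SQL the same refinement would amount to appending '/' to the parent path before the prefix comparison.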

How to count items by day

I need to make a report from our ticket database that shows the number of tickets closed per day by tech. My SQL query looks something like this:
select
i.DT_CLOSED,
rp.NAME
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > @StartDate
DT_CLOSED is the date and time in ISO format, and NAME is the rep name. I also have a calculated value in my dataset called TICKETSDAY that is calculated using =DateValue(Fields!DT_CLOSED.Value), giving me the day without the time.
Right now I have a table set up that is grouped by NAME, then by TICKETSDAY, and I would like the last column to be a count of how many tickets there are. But when I set the last column to =Count(DT_CLOSED) it lists a 1 on each row for each ticket instead of aggregating so that my table looks like this:
┌───────────┬───────────┬──────────────┐
│Name │Day │Tickets Closed│
├───────────┼───────────┼──────────────┤
│JOHN SMITH │11/01/2013 │ 1│
│ │ ├──────────────┤
│ │ │ 1│
│ │ ├──────────────┤
│ │ │ 1│
│ ├───────────┼──────────────┤
│ │11/02/2013 │ 1│
└───────────┴───────────┴──────────────┘
And I need it to be:
┌───────────┬───────────┬──────────────┐
│Name │Day │Tickets Closed│
├───────────┼───────────┼──────────────┤
│JOHN SMITH │11/01/2013 │ 3│
│ ├───────────┼──────────────┤
│ │11/02/2013 │ 1│
└───────────┴───────────┴──────────────┘
Any idea what I'm doing wrong? Any help would be greatly appreciated.
I believe that Marc B is correct. You need to group by the non-aggregate columns in your select statement. Try something along these lines.
select
i.DT_CLOSED,
rp.NAME,
COUNT(i.ID)
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > @StartDate
GROUP BY rp.NAME, i.DT_CLOSED
Without a GROUP BY to aggregate your rows together, your query counts each row separately. I'm unfamiliar with how the report builder works, but try adding the GROUP BY clause manually and see what you get.
Let me know if I can clarify anything.
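One thing to watch for: since DT_CLOSED holds a date and time, grouping on the raw value can still leave one group per ticket, so the grouping key probably needs to be the day only (as TICKETSDAY is). Below is a small Python sketch of the intended aggregation. The sample timestamps are made up for illustration and `tickets_per_day` is a hypothetical helper, not part of any reporting tool.

```python
from collections import Counter
from datetime import datetime


def tickets_per_day(tickets):
    """Count closed tickets per (rep name, calendar day).

    tickets: iterable of (name, ISO date-time string) pairs.
    The time of day is dropped before counting, so tickets closed at
    different times on the same day land in the same group.
    """
    counts = Counter()
    for name, closed_at in tickets:
        day = datetime.fromisoformat(closed_at).date()
        counts[(name, day.isoformat())] += 1
    return dict(counts)


# Hypothetical sample data shaped like the report in the question.
sample = [
    ("JOHN SMITH", "2013-11-01T09:15:00"),
    ("JOHN SMITH", "2013-11-01T11:40:00"),
    ("JOHN SMITH", "2013-11-01T16:05:00"),
    ("JOHN SMITH", "2013-11-02T10:30:00"),
]
print(tickets_per_day(sample))
```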