Can't use Date/DateTime as arg in argMinMerge/argMaxMerge?

In one of the Altinity webinars they give an example of using the argMin/argMax aggregate functions to find the first/last value in some table. They use the following table:
CREATE TABLE cpu_last_point_idle_agg (
created_date AggregateFunction(argMax, Date, DateTime),
...
)
Engine = AggregatingMergeTree
Then they create the materialized view:
CREATE MATERIALIZED VIEW cpu_last_point_idle_mw
TO cpu_last_point_idle_agg
AS SELECT
argMaxState(created_date, created_at) AS created_date,
...
And finally the view:
CREATE VIEW cpu_last_point_idle AS
SELECT
argMaxMerge(created_date) AS created_date,
...
However, when I try to replicate this approach, I am getting an error.
My table:
CREATE TABLE candles.ESM20_mthly_data (
ts DateTime Codec(Delta, LZ4),
open AggregateFunction(argMin, DateTime, Float64),
...
)
Engine = AggregatingMergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY ts
PRIMARY KEY(ts);
My Materialized View:
CREATE MATERIALIZED VIEW candles.ESM20_mthly_mw
TO candles.ESM20_mthly_data
AS SELECT
ts,
argMinState(ts, src.price) AS open,
...
FROM source_table as src
GROUP BY toStartOfInterval(src.ts, INTERVAL 1 month) as ts;
My view:
CREATE VIEW candles.ESM20_mthly
AS SELECT
ts,
argMinMerge(ts) as open,
...
FROM candles.ESM20_mthly_mw
GROUP BY ts;
I get an error:
Code: 43. DB::Exception: Received from localhost:9000. DB::Exception: Illegal type DateTime('UTC') of argument for aggregate function with Merge suffix must be AggregateFunction(...).
I tried using Date and DateTime, with the same result. If I flip the arg and value, it works but of course doesn't give me what I want. Are dates no longer supported by these aggregating functions? How do I make it work?
I am using ClickHouse server version 20.12.3 revision 54442.

First of all, argMin(a, b) returns the value of a from the row where b is minimal, so the value (price) must come first and the timestamp second:
--AggregateFunction(argMin, DateTime, Float64),
++AggregateFunction(argMin, Float64, DateTime),
--argMinState(ts, src.price) AS open,
++argMinState(src.price,ts) AS open,
The second issue is that argMinMerge must be applied to the AggregateFunction column (open), not to ts:
--argMinMerge(ts) as open,
++argMinMerge(open) as final_open,
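A minimal, self-contained sketch of the argument order and the State/Merge pairing described above. The three inline rows and the names open_state and close_state are made up for illustration; they are not from the question's tables.

```sql
-- Inner query: three hypothetical ticks (ts, price).
-- Middle query: argMinState/argMaxState take (value, key), i.e. (price, ts).
-- Outer query: the -Merge functions are applied to the state columns,
-- not to ts.
SELECT
    argMinMerge(open_state)  AS open,   -- price at the earliest ts: 10
    argMaxMerge(close_state) AS close   -- price at the latest ts: 11
FROM
(
    SELECT
        argMinState(price, ts) AS open_state,
        argMaxState(price, ts) AS close_state
    FROM
    (
        SELECT toDateTime('2020-06-01 00:00:00') AS ts, 10.0 AS price
        UNION ALL
        SELECT toDateTime('2020-06-01 00:01:00') AS ts, 12.5 AS price
        UNION ALL
        SELECT toDateTime('2020-06-01 00:02:00') AS ts, 11.0 AS price
    )
);
```

Passing ts to argMinMerge instead of a state column is what triggers the "must be AggregateFunction(...)" error from the question.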

Related

Compare 2 versions of a table in BigQuery

I want to compare 2 versions of a table: the data as it was before the last modification against the latest data.
Here is a sample SQL script that compares the two:
WITH
before_mod AS ( SELECT *
FROM `big-query-112.temp.tableB`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB({{ lastModification }}, INTERVAL 2 second)),
after_mod AS ( SELECT * FROM `big-query-112.temp.tableB` ),
row_changed AS (
SELECT *
FROM before_mod EXCEPT DISTINCT
SELECT *
FROM after_mod
)
SELECT * FROM row_changed
This SQL first creates two CTEs:
before_mod -> holds a snapshot of the table as it was at that specific point in time.
after_mod -> the current data in tableB.
Then the "row_changed" CTE is built by selecting all rows from "before_mod" that are not in "after_mod".
The problem is that BigQuery does not allow different timestamps in FOR SYSTEM_TIME AS OF for the same table within one query.
Exception: If a 'FOR SYSTEM_TIME AS OF' expression is used, all references of a table should use the same TIMESTAMP value.
I also tried putting before_mod in a view and then querying the view (SQL below):
CREATE OR REPLACE VIEW `big-query-112.temp.tableB_before_mod_temp` AS (
SELECT *
FROM `big-query-112.temp.tableB`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB('2023-02-04 13:12:35 UTC', INTERVAL 0 second)
);
WITH
before_mod AS ( SELECT * FROM `big-query-112.temp.tableB_before_mod_temp`),
after_mod AS ( SELECT * FROM `big-query-112.temp.tableB` ),
row_changed AS (
SELECT *
FROM before_mod EXCEPT DISTINCT
SELECT *
FROM after_mod
)
SELECT * FROM row_changed
The problem with this one is that it does not show the rows that differ; it seems the table is read as of only one specific point in time.
Also, I cannot use a materialized view. Exception: Invalid value: Materialized view query cannot reference historical versions of the table definition
Is there a way to compare 2 versions of the table without creating a copy?
NOTE: The table does not have an ID (given how the table is generated, it is hard to add an ID that always stays the same for a specific row).
Also, querying SELECT * FROM `big-query-112.temp.tableB_before_mod_temp` shows the expected results.

Create view which will do partition filtering in Synapse openrowset

I have a view that is defined as
create view [dbo].[darts] as
SELECT pricedate, [hour], node_id, dalmp, cast(result.filepath(1) as int) as [Year], cast(result.filepath(2) as int) as [Month]
FROM
OPENROWSET(
BULK 'year=*/month=*/*.parquet',
DATA_SOURCE='mysource',
FORMAT = 'parquet'
)
with (
pricedate date,
node_id bigint,
[hour] int,
dalmp float
)
as [result]
where cast(result.filepath(1) as int)=datepart(year, pricedate) and cast(result.filepath(2) as int)=datepart(month, pricedate)
What I want to be able to do is do a query on this view like:
select * from darts where pricedate='2022-11-01'
and have the where clause of the view definition force it to only look in year=2022/month=11 but it doesn't work unless I do it explicitly such as:
select * from darts where pricedate='2022-11-01' and Year=2022 and Month=11
For clarity, when I say the first query doesn't work, what I mean is that it isn't doing any partition pruning, that query searches all files/data. Whereas, my second query only scans the fraction that I expect.
Are there any extra modifiers/syntax/functional form I could use in my view definition that would force partition pruning in the case of my first query?

How to dynamically SELECT from manually partitioned table

Suppose I have a table of tenants like so:
CREATE TABLE tenants (
name varchar(50)
)
And for each tenant, I have a corresponding table called {tenants.name}_entities, so for example for tenant_a I would have the following table:
CREATE TABLE tenant_a_entities (
id uuid,
last_updated timestamp
)
Is there a way I can create a query with the following structure? (using create table syntax to show what I'm looking for)
CREATE TABLE all_tenant_entities (
tenant_name varchar(50),
id uuid,
last_updated timestamp
)
--
I do understand this is a strange DB layout, I'm playing around with foreign data in Postgres to federate foreign databases.
Did you consider declarative partitioning for your relational design? List partitioning for your case, with PARTITION BY LIST ...
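A rough sketch of what list partitioning could look like here, assuming Postgres 10 or later and reusing the table names from the question. The partition values 'a' and 'b' are placeholders, and this only applies if the per-tenant tables can live in one database as regular (or foreign) partitions.

```sql
-- Parent table, partitioned by tenant name.
CREATE TABLE all_tenant_entities (
    tenant_name  varchar(50) NOT NULL,
    id           uuid        NOT NULL,
    last_updated timestamp
) PARTITION BY LIST (tenant_name);

-- One partition per tenant (placeholder tenant names).
CREATE TABLE tenant_a_entities PARTITION OF all_tenant_entities
    FOR VALUES IN ('a');
CREATE TABLE tenant_b_entities PARTITION OF all_tenant_entities
    FOR VALUES IN ('b');

-- Querying the parent reads all partitions; a filter on tenant_name
-- lets the planner prune the partitions that cannot match.
SELECT tenant_name, id, last_updated
FROM   all_tenant_entities
WHERE  tenant_name = 'a';
```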
To answer the question at hand:
You don't need the table tenants for the query at all, just the detail tables. And one way or another you'll end up with UNION ALL to stitch them together.
SELECT 'a' AS tenant_name, id, last_updated FROM tenant_a_entities
UNION ALL SELECT 'b', id, last_updated FROM tenant_b_entities
...
You can add the name dynamically, like:
SELECT tableoid::regclass::text, id, last_updated FROM tenant_a_entities
UNION ALL SELECT tableoid::regclass::text, id, last_updated FROM tenant_b_entities
...
See:
Get the name of a row's source table when querying the parent it inherits from
But it's cheaper to add a constant name while building the query dynamically in your case (the first code example) - like this, for example:
SELECT string_agg(format('SELECT %L AS tenant_name, id, last_updated FROM %I'
, split_part(tablename, '_', 2)
, tablename)
, E'\nUNION ALL '
ORDER BY tablename) -- optional order
FROM pg_catalog.pg_tables
WHERE schemaname = 'public' -- actual schema name
AND tablename LIKE 'tenant\_%\_entities';
Tenant names cannot contain _, or you have to do more.
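One possible way to "do more", sketched under the assumption that every table name has exactly the form tenant_<name>_entities: extract everything between the fixed prefix and suffix with a regular expression instead of split_part(), so that underscores inside the tenant name survive.

```sql
-- substring(... FROM pattern) returns the part matched by the
-- parenthesized group, e.g. 'foo_bar' for 'tenant_foo_bar_entities'.
SELECT tablename
     , substring(tablename FROM '^tenant_(.*)_entities$') AS tenant_name
FROM   pg_catalog.pg_tables
WHERE  schemaname = 'public'  -- actual schema name
AND    tablename LIKE 'tenant\_%\_entities';
```

The same expression could replace split_part(tablename, '_', 2) inside the format() call.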
Related:
Table name as a PostgreSQL function parameter
How to check if a table exists in a given schema
You can wrap it in a custom function to make it completely dynamic:
CREATE OR REPLACE FUNCTION public.f_all_tenant_entities()
RETURNS TABLE(tenant_name text, id uuid, last_updated timestamp)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
(
SELECT string_agg(format('SELECT %L AS tn, id, last_updated FROM %I'
, split_part(tablename, '_', 2)
, tablename)
, E'\nUNION ALL '
ORDER BY tablename) -- optional order
FROM pg_tables
WHERE schemaname = 'public' -- your schema name here
AND tablename LIKE 'tenant\_%\_entities'
);
END
$func$;
Call:
SELECT * FROM public.f_all_tenant_entities();
You can use this set-returning function (a.k.a "table-function") just like a table in most contexts in SQL.
Related:
How to UNION a list of tables retrieved from another table with a single query?
Simulate CREATE DATABASE IF NOT EXISTS for PostgreSQL?
Function to loop through and select data from multiple tables
Note that RETURN QUERY does not allow parallel queries before Postgres 14. The release notes:
Allow plpgsql's RETURN QUERY to execute its query using parallelism (Tom Lane)

"Object not found" error when using multiple table expressions in WITH...AS inside of CREATE VIEW

I am trying to create a view based on complex query in HSQLDB (version 2.5.1).
The query looks like this (simplified for clarity), also includes DDL for the tables:
DROP VIEW TEST_VIEW IF EXISTS;
DROP TABLE TEST_1 IF EXISTS;
CREATE TABLE TEST_1 (
contentid VARCHAR(10),
contenttype VARCHAR(10),
downloaddate TIMESTAMP
);
DROP TABLE TEST_2 IF EXISTS;
CREATE TABLE TEST_2 (
dbid INTEGER,
contentid VARCHAR(10),
version VARCHAR(10)
);
CREATE VIEW TEST_VIEW AS
WITH a AS (
SELECT CONTENTID, count(*) AS amount
FROM TEST_2
GROUP BY CONTENTID
),
b AS (
SELECT CONTENTID, amount
FROM a
)
SELECT b.CONTENTID, b.amount, i.DOWNLOADDATE
FROM b /* error here */
JOIN TEST_1 i ON i.CONTENTID = b.CONTENTID
ORDER BY b.CONTENTID;
However, it fails with the following error:
[42501][-5501] user lacks privilege or object not found: JOIN in statement [CREATE VIEW TEST_VIEW AS......
The same query runs fine when used as a SELECT (without CREATE VIEW...AS).
Also, the view is created successfully if there is only one table expression in WITH...AS statement, like below:
CREATE VIEW TEST_VIEW AS
WITH a AS (
SELECT CONTENTID, count(*) AS amount
FROM TEST_2
GROUP BY CONTENTID
)
SELECT a.CONTENTID, a.amount, i.DOWNLOADDATE
FROM a
JOIN TEST_1 i ON i.CONTENTID = a.CONTENTID
ORDER BY a.CONTENTID;
It looks like in the first statement the DB engine tries to parse "JOIN" as a table alias for table "b".
Is there a syntax error I have not noticed, or does HSQLDB not support multiple table expressions in WITH...AS inside of CREATE VIEW?
Edit: Updated example query to include table DDL for completeness.
HSQLDB supports creating this type of view.
As you haven't provided table definitions, I tried with a similar query with the test tables that are generated by DatabaseManager and it was successful. Please report the tables.
CREATE VIEW REPORT_LINKED_IDS AS
WITH a AS (
SELECT PRODUCTID, count(*) AS amount
FROM ITEM
GROUP BY PRODUCTID
),
b AS (
SELECT PRODUCTID, amount
FROM a
)
SELECT b.PRODUCTID, b.amount, i.NAME, i.PRICE
FROM b
JOIN PRODUCT i ON i.ID = b.PRODUCTID
ORDER BY b.PRODUCTID;
Thanks to a suggestion by @fredt, I have confirmed that the issue is with trying to use this query in IntelliJ IDEA (2020.1). The query worked fine and the view was created successfully in the same DB when another tool was used (DbVisualizer in my case). Furthermore, once the view has been created in the DB, IntelliJ IDEA throws an exception on the same word "JOIN" when trying to connect to this DB. The error is similar to this one: The specified database user/password combination is rejected: org.hsqldb.HsqlException: unexpected token: NOT. As described in a comment on that question, I recovered from the error by manually editing the .script file.
There are at least 2 possible options to resolve the issue:
1st solution: Refactor SQL query to only have one table in WITH clause. In my case I just moved the first table to a select expression in FROM clause, like below:
CREATE VIEW TEST_VIEW AS
WITH b AS (
SELECT CONTENTID, amount
FROM (
SELECT CONTENTID, count(*) AS amount
FROM TEST_2
GROUP BY CONTENTID
)
)
SELECT b.CONTENTID, b.amount, i.DOWNLOADDATE
FROM b
JOIN TEST_1 i ON i.CONTENTID = b.CONTENTID
ORDER BY b.CONTENTID;
2nd solution: Use a different tool to work with the DB, or have the issue fixed in IDEA.

Cannot select _PARTITIONTIME in SELECT but can select it in INSERT

When running the SELECT below I get this error message:
Error: Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, _TABLE_, _FILE_ and _ROW_TIMESTAMP
SELECT
_PARTITIONTIME,
jobId
FROM
`project.dataset.audit`
WHERE
_PARTITIONTIME >= TIMESTAMP("2019-02-20")
However, when I use it in a DML statement the query works:
INSERT INTO
`project.dataset.audit_clustered`
(
_PARTITIONTIME,
jobId,
)
SELECT
_PARTITIONTIME,
jobId
FROM
`project.dataset.audit`
WHERE
_PARTITIONTIME >= TIMESTAMP("2019-02-20")
I have 2 questions:
Why does the SELECT alone not work?
Is it guaranteed that the INSERT will work properly and that the data will be inserted into the correct partition of the target table?
Replace
SELECT _PARTITIONTIME
With
SELECT _PARTITIONTIME AS something
This is because the result set can't contain a column whose name starts with a reserved prefix such as _PARTITION, but you can alias it to anything else.
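Applied to the query from the question, the change would look roughly like this. The alias partition_time is an arbitrary name chosen for illustration.

```sql
SELECT
  _PARTITIONTIME AS partition_time,  -- alias avoids the reserved prefix in the output
  jobId
FROM
  `project.dataset.audit`
WHERE
  _PARTITIONTIME >= TIMESTAMP("2019-02-20")
```

The WHERE clause can keep using _PARTITIONTIME directly; only the output column name needs the alias.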