How do I use TTL on a ClickHouse table?

Reading the documentation, I found the TTL feature, which would be very useful for me. However, I can't construct valid SQL to use it.
Here is what I'm trying:
CREATE TABLE t1 (
name String,
date DateTime default now(),
number UInt64 default 0 TTL date + INTERVAL 1 DAY
) Engine MergeTree() ORDER BY name;
which gives an error as follows:
Syntax error: failed at position 92 (line 4, col 27):
...[copy of my code here]
Expected one of: NOT, LIKE, AND, OR, IN, BETWEEN, COMMENT, CODEC, token, IS, NOT LIKE, NOT IN, GLOBAL IN, GLOBAL NOT IN, ClosingRoundBracket, Comma, QuestionMark
I've also tried to set a table-wide TTL:
CREATE TABLE t1 (
name String,
date DateTime default now(),
number UInt64 default 0
) Engine MergeTree() ORDER BY name TTL date + INTERVAL 1 DAY;
which led to an error as well.
As far as I can see, I'm doing everything according to the documentation (https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/#table_engine-mergetree-creating-a-table), but I still can't use this feature.
I'm using server version 19.5.3 revision 54417.
Please provide any examples or thoughts about how to use the TTL feature!

TTLs for tables and columns are not yet released; they will be available in 19.6.x. The documentation reflects the current state of 'master', not the latest release, which is certainly confusing. To see the documentation for a particular release, refer to the docs for that major version, like this: https://clickhouse.yandex/docs/v19.5/en/operations/table_engines/mergetree/
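For reference, once the server is on 19.6 or later, the syntax from the current documentation should work. A minimal sketch (table, column and interval values are illustrative), showing both a column-level and a table-level TTL:
CREATE TABLE t1 (
    name String,
    date DateTime DEFAULT now(),
    number UInt64 DEFAULT 0 TTL date + INTERVAL 1 DAY
) ENGINE = MergeTree()
ORDER BY name
TTL date + INTERVAL 1 MONTH;
Here the column TTL resets number to its default value a day after date, while the table TTL removes whole rows a month after date; expired data is actually dropped when parts are merged (e.g. when OPTIMIZE runs).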

Related

Timestamp query issue when using Google Data Studio community connector with BigQuery

I'm having an issue where Google Data Studio is sending bad timestamp data to my community connector, so that when I try to filter on a date, I get an error. I have two BigQuery TIMESTAMP fields (named timestamp and created_at), both passed through to my community connector without modification. Once I add a date filter to the report (for a time series or regular filtering), the queries from the connector (viewed in my BigQuery project history) begin to fail like this:
Could not cast literal "20200825" to type TIMESTAMP at [1:677]
The query in BigQuery looks something like this:
SELECT t0.created_at, SUM(t0.sum_metric) AS t0_qt_1z4br3iwbc FROM (SELECT field1, field2, field3 FROM data_name.table_name WHERE user_identifier IN (2)) AS t0 WHERE (t0.created_at >= '20200825' AND t0.created_at <= '20200831') GROUP BY t0.created_at ORDER BY t0.created_at ASC;
This really feels like a bug in the community connector's handling of BigQuery. Is there some way around this? Am I just doing something wrong that I'm not seeing?
Ok, I solved this. In your community connector you'll need to add this to your config:
config.setDateRangeRequired(true);
This will send startDate and endDate parameters with your getData request (the range defaults to the last 28 days). Access them in getData() like so:
var startDate = request.dateRange.startDate;
var endDate = request.dateRange.endDate;
and then use them in your query as necessary.
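For illustration, a query built from those two values might then use proper timestamp literals instead of the bare '20200825' strings that were failing before; a rough sketch (the table, column and metric names are placeholders, and the literal dates are examples):
SELECT created_at, SUM(some_metric) AS total
FROM dataset_name.table_name
WHERE created_at >= TIMESTAMP('2020-08-25')
  AND created_at <= TIMESTAMP('2020-08-31')
GROUP BY created_at
ORDER BY created_at;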
As a side note, if you're surfacing a timestamp field in Google Data Studio through a community connector, you'll need to set up a calculated field so that reports treat it appropriately (displaying a human-readable date instead of a raw timestamp). I first read the timestamp field out of the database as a string, like STRING(timestamp, 'UTC') AS timestamp, and then use this value to create a dimension. This is done in the schema like the following (use parsing appropriate to your field if it differs):
fields.newDimension()
  .setId('date_timestamp')
  .setName('date_timestamp')
  .setDescription('Timestamp as a date.')
  .setFormula("TODATE($timestamp, '%Y-%m-%d %H:%M:%S%z', '%Y%m%d')")
  .setType(types.YEAR_MONTH_DAY);

Airflow - BigQuery operator not working as intended

I'm new to Airflow and I'm currently stuck on an issue with the BigQuery operator.
I'm trying to execute a simple query on a table from a given dataset and copy the result to a new table in the same dataset. I'm using the BigQuery operator to do so, since according to the docs the 'destination_dataset_table' parameter is supposed to do exactly what I'm looking for (source: https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/bigquery_operator/index.html#airflow.contrib.operators.bigquery_operator.BigQueryOperator).
But instead of copying the data, all I get is a new empty table with the schema of the one I'm querying from.
Here's my code:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 1),
    'end_date': datetime(2019, 1, 3),
    'retries': 10,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    dag_id='my_dag',
    default_args=default_args,
    schedule_interval=timedelta(days=1)
)

copyData = BigQueryOperator(
    task_id='copyData',
    dag=dag,
    sql="SELECT some_columns,x,y,z FROM dataset_d.table_t WHERE some_columns=some_value",
    destination_dataset_table='dataset_d.table_u',
    bigquery_conn_id='something',
)
I don't get any warnings or errors; the code runs and the tasks are marked as successful. It does create the table I wanted, with the columns I specified, but it's totally empty.
Any idea what I'm doing wrong?
EDIT: I tried the same code on a much smaller table (from 10 GB down to a few KB), performing a query with a much smaller result (from 500 MB down to a few KB), and it did work this time. Do you think the size of the table or of the query result matters? Is it limited? Or does performing too large a query cause some sort of lag?
EDIT 2: After a few more tests I can confirm that this issue is not related to the size of the query or the table. It seems to have something to do with the date format. In my code the WHERE condition is actually checking whether a date_column = 'YYYY-MM-DD'. When I replace this condition with an int or string comparison it works perfectly. Do you know if BigQuery uses a particular date format or requires a particular syntax?
EDIT 3: Finally getting somewhere: when I cast my date_column as a date (CAST(date_column AS DATE)) to force its type to DATE, I get an error saying that my field is actually an int-32 (Argument type mismatch). But I'm SURE that this field is a date, so that implies that either BigQuery stores it as an int while displaying it as a date, or that the BigQuery operator does some kind of hidden type conversion while loading tables. Any idea how to fix this?
I had a similar issue when transferring data from data sources other than BigQuery.
I suggest casting the date_column as follows: to_char(date_column, 'YYYY-MM-DD') as date
In general, I have seen that BigQuery's schema auto-detection is often problematic. The safest way is to always specify the schema before executing the corresponding query, or to use operators that support schema definition.
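If the reformatting needs to happen in BigQuery itself rather than in the source database (BigQuery's standard SQL uses FORMAT_DATE rather than to_char), a rough sketch might look like the following, assuming date_column really is stored as a TIMESTAMP (column and table names taken from the question, the literal date is illustrative):
SELECT
  FORMAT_DATE('%Y-%m-%d', DATE(date_column)) AS date  -- render the value as a 'YYYY-MM-DD' string
FROM dataset_d.table_t
WHERE DATE(date_column) = DATE '2019-01-01';          -- compare against a DATE literal rather than a bare string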

BigQuery query doesn't work inside a view

When I select from a view I've created I receive the following error from time to time, but not always:
query: Timestamp evaluation: connection error. (error code: invalidQuery)
Job ID vex-custom:bquijob_59705b02_155966ddc5f
Start Time Jun 28, 2016, 11:53:50 AM
End Time Jun 28, 2016, 11:53:50 AM
Running the query by itself works perfectly well.
There are two special things about this query:
It uses TABLE_DATE_RANGE
It references tables from a different project than the one where the view resides. But we've done this many times without issues.
Can someone from Google perhaps check the job id?
I checked the internal details for your query. The view that your failed query references makes a few problematic calls to TIMESTAMP functions. Here's one example:
SELECT * FROM TABLE_DATE_RANGE([...], TIMESTAMP(DATE_ADD(UTC_USEC_TO_DAY(CURRENT_DATE()), -15, "day")), CURRENT_TIMESTAMP())
Specifically, the call to TIMESTAMP(DATE_ADD(UTC_USEC_TO_DAY(CURRENT_DATE()), -15, "day")) is erroring because:
UTC_USEC_TO_DAY returns an INTEGER, not a TIMESTAMP.
DATE_ADD expects an argument of type TIMESTAMP.
You can wrap the call to UTC_USEC_TO_DAY with USEC_TO_TIMESTAMP to convert the argument to type TIMESTAMP, like so:
TIMESTAMP(DATE_ADD(USEC_TO_TIMESTAMP(UTC_USEC_TO_DAY(CURRENT_DATE())), -15, "day"))
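For completeness, substituting that back into the example query above gives the following (the table reference is left elided, as in the original):
SELECT *
FROM TABLE_DATE_RANGE([...],
  TIMESTAMP(DATE_ADD(USEC_TO_TIMESTAMP(UTC_USEC_TO_DAY(CURRENT_DATE())), -15, "day")),
  CURRENT_TIMESTAMP())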
We're in the process of rolling out a release that more closely checks the expected input types of many timestamp functions, which is why you are currently seeing inconsistent behavior. These fixes prevent issues where some functions can return malformatted TIMESTAMPs, and they also bring our behavior more in line with our documentation on timestamp functions.
Separately, we need to work on making sure errors that occur within the evaluation of timestamps for TABLE_DATE_RANGE return more useful errors than "connection error".

Error: Schema changed for Timestamp field (additional)

I am getting an error message when I query a specific table in my dataset that has a nullable timestamp field. In the BigQuery web tool, I run a simple query, e.g.:
SELECT * FROM [reztrack.201401] LIMIT 100
The result I get is: Error: Schema changed for Timestamp field date
Example Job ID: esiteisthebomb:job_6WKi7ZhSi8D_Ewr8b5rKV-a5Eac
This is the exact same issue that was noted here: Error: Schema changed for Timestamp field.
I also logged this under https://code.google.com/p/google-bigquery/issues/detail?id=307, but I was unsure, since it said we should be logging everything on Stack Overflow.
Any information on how to fix this for this or other tables would be greatly appreciated.
Note: The original answer said to contact Google support, but Google support for BigQuery was moved to Stack Overflow. Therefore I assume that means opening it as a new question, in the hope that the engineers will respond.
BigQuery recently improved the representation of its internal timestamp format (there had previously been a lot of cases where timestamps broke in strange ways, and this change should fix that). Your table was still using the old timestamp format, and you hit a bug in the old format when the schema changed (in this case, the field went from REQUIRED to OPTIONAL).
We have an automated process that coalesces tables to make their storage more efficient. I scheduled this to run over your table, and have verified that it has rewritten your table using the new timestamp format.
You should now be able to query this field of your table without further problems.

Raven appears to generate an incorrect Lucene query for DateTime

I have some documents stored in Raven, which have a CreationDate property, of type "DateTimeOffset".
I have been attempting to get these documents returned in a query from C#, and they are never returned if I use the CreationDate in the query criteria.
After watching the Raven console, I saw the query being issued was:
Query: (FeedOwner:25eb541c\-b04a\-4f08\-b468\-65714f259ac2) AND ( CreationDate:[20120524120000000 TO NULL] AND CreationDate:{* TO 20120525120000000})
I ran this query directly against HTTP, and changed the date format to:
Query: (FeedOwner:25eb541c\-b04a\-4f08\-b468\-65714f259ac2) AND ( CreationDate:[2012-05-24120000000 TO NULL] AND CreationDate:{* TO 2012-05-25120000000})
And now it works - it returns my documents which DEFINITELY fall within the range.
Is Raven generating the incorrect date format for Lucene? If so, how do I fix this?
Notes:
Yes, I need time zone support
Yes, I need time as well as date in my index.
Thanks
[EDIT]
Err... I just changed my entities to use DateTime, just for giggles... and it still fails to return the data... what's going on? I'm using RavenDB.Server.1.2.2002-Unstable.
Adam,
You are using RavenDB Server 1.2 pre-release bits with the RavenDB Client 1.0 Stable.
Those are incompatible.