Timestamp query issue when using Google Data Studio community connector with BigQuery - google-bigquery

I'm having an issue where Google Data Studio is sending bad timestamp data to my community connector, so that when I try to filter on a date, I get an error. I have two BigQuery TIMESTAMP fields (named timestamp and created_at), both passed through to my community connector without modification. As soon as I add a date filter to the report (for time series or regular filtering), my queries from the connector (viewed in my BigQuery project history) begin to fail like this:
Could not cast literal "20200825" to type TIMESTAMP at [1:677]
The query in BigQuery looks something like this:
SELECT t0.created_at, SUM(t0.sum_metric) AS t0_qt_1z4br3iwbc FROM (SELECT field1, field2, field3 FROM data_name.table_name WHERE user_identifier IN (2)) AS t0 WHERE (t0.created_at >= '20200825' AND t0.created_at <= '20200831') GROUP BY t0.created_at ORDER BY t0.created_at ASC;
This really feels like a bug with the community connector regarding BigQuery. Is there some way around this? Am I just doing something wrong I'm not seeing?

Ok, I solved this. In your community connector you'll need to add this to your config:
config.setDateRangeRequired(true);
This will send startDate and endDate parameters with your getData() request (the range defaults to the last 28 days). Access them in getData() like so:
var startDate = request.dateRange.startDate;
var endDate = request.dateRange.endDate;
and then use them in your query as necessary.
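For example, in getData() you might build the query with those values (a minimal sketch; the project, dataset, table and column names are placeholders, and it assumes your connector assembles its own standard SQL string):
var startDate = request.dateRange.startDate; // e.g. '2020-08-25'
var endDate = request.dateRange.endDate;     // e.g. '2020-08-31'
// Filter on the TIMESTAMP column explicitly so the report never has to
// inject its own timestamp literals.
var sql = 'SELECT field1, field2, created_at ' +
    'FROM `my_project.data_name.table_name` ' +
    'WHERE created_at >= TIMESTAMP("' + startDate + '") ' +
    'AND created_at < TIMESTAMP_ADD(TIMESTAMP("' + endDate + '"), INTERVAL 1 DAY)';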
As a side note, if you're exposing a timestamp field in Google Data Studio through a community connector, you'll need to set up a calculated field so that reports treat it appropriately (displaying a human-readable date instead of a raw timestamp). I first read the timestamp field out of the database as a string, like STRING(timestamp, 'UTC') AS timestamp, and then use that value to create a dimension. This is done in the schema like the following (use parsing appropriate to your field if it's different):
fields.newDimension()
.setId('date_timestamp')
.setName('date_timestamp')
.setDescription('Timestamp as a date.')
.setFormula("TODATE($timestamp, '%Y-%m-%d %H:%M:%S%z', '%Y%m%d')")
.setType(types.YEAR_MONTH_DAY);

Related

Date/Timestamp column in DBeaver (SQL Client) but I only want the date (using a CSV file)

I'm fairly new to SQL, but recently I've been having to use it in its basic form to do very simple tasks, like recalling only the relevant columns from a table, etc.
I'm currently using DBeaver as my SQL client, and for this example I'm tapping straight into a CSV, no problems there. The data I'm working with is transaction data, with a row per site, metric, date/time and value.
My problem is that the data is in 15-minute intervals, whereas I need one value per day per store per metric (e.g. for the sample row it would return "Site" = 101, "Metric" = FOODSER3, "Date" = 2020-08-09, "Value" = 6.0000).
Firstly, is this possible?
Secondly, if so, could someone let me in on the secret of how, and maybe explain what the resolution is and why, so that I can really understand what's going on?
I'm fairly proficient in Javascript and VBA, but so far SQL defeats me at every hurdle.
The structure of such a query is aggregation. Date/time functions are notoriously database-dependent, but the idea is:
select cast(date as date), site, metric, sum(value)
from t
group by cast(date as date), site, metric;
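Applied to the table in the question, it would look something like this (a sketch; it assumes the columns are literally named site, metric, date and value and the table is called transactions, so adjust to your real names):
select cast(date as date) as day, site, metric, sum(value) as daily_value
from transactions
group by cast(date as date), site, metric
order by day, site, metric;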

Where in my query to place the CONVERT to convert DateTime to Date

I'm just learning SQL, and I've searched many options for converting a DateTime into a Date (I do not want the current date). It's a super simple query from this website: https://sqlzoo.net/wiki/Guest_House_Assessment_Easy
SELECT booking_date, nights
FROM booking
WHERE guest_id=1183
But the output includes the time and I just want the date. I've searched so many forums and tried all their suggestions, including this:
SELECT CONVERT(varchar(10), <col>, 101)
So I've done:
SELECT CONVERT(varchar(10), booking_date,101), nights
FROM booking
WHERE guest_id=1183
But I'm getting syntax errors. This is probably so simple and you'll all think me an idiot, but I'd greatly appreciate help. It's driving me nuts.
When I fiddled about at your sqlzoo link I got the error
execute command denied to user 'scott'@'localhost' for routine 'gisq.to_date'
When I googled gisq.to_date I got this link https://sqlzoo.net/wiki/Format_a_date_and_time
Which has examples of how this dialect represents dates. See if you can work it out. Something like this:
SELECT date_format(booking_date,'%d/%m/%Y')
FROM booking
You didn't post the error in your question, which is a big mistake: when you get an error message, you actually have something to work from.
It is also very important to note that the query above returns a string, not a date. It's only good for display, not for date arithmetic.
TBH that seems like a terrible site to learn on, as it gives no clues about the dialect. It looks like Oracle, but to_date and trunc don't work.
The use of convert() suggests that you think you are using SQL Server. If you only want the date component of a date/time data type, then you can use:
SELECT CONVERT(DATE, booking_date), nights
FROM booking
WHERE guest_id = 1183;
The syntax error suggests that you are not using SQL Server.
CONVERT() is bespoke syntax for SQL Server. Examples of similar functionality in other databases are:
DATE(booking_date)
TRUNC(booking_date)
DATE_TRUNC('day', booking_date)
In addition, what you see also depends on the user-interface.
In your case, the data is being stored as a date with no time component, but the UI is showing the time. For that, you want to convert to a string. That site uses MariaDB -- which is really a flavor of MySQL -- and you would use:
DATE_FORMAT(booking_date, '%Y-%m-%d')
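Putting that together with the original query (a sketch; the alias just keeps the column name readable):
SELECT DATE_FORMAT(booking_date, '%Y-%m-%d') AS booking_date, nights
FROM booking
WHERE guest_id = 1183;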

Airflow - Bigquery operator not working as intended

I'm new to Airflow and I'm currently stuck on an issue with the BigQuery operator.
I'm trying to execute a simple query on a table from a given dataset and copy the result to a new table in the same dataset. I'm using the BigQuery operator to do so, since according to the docs the 'destination_dataset_table' parameter is supposed to do exactly what I'm looking for (source: https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/bigquery_operator/index.html#airflow.contrib.operators.bigquery_operator.BigQueryOperator).
But instead of copying the data, all I get is a new empty table with the schema of the one I'm querying from.
Here's my code
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 1),
    'end_date': datetime(2019, 1, 3),
    'retries': 10,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    dag_id='my_dag',
    default_args=default_args,
    schedule_interval=timedelta(days=1)
)

copyData = BigQueryOperator(
    task_id='copyData',
    dag=dag,
    sql="SELECT some_columns,x,y,z FROM dataset_d.table_t WHERE some_columns=some_value",
    destination_dataset_table='dataset_d.table_u',
    bigquery_conn_id='something',
)
I don't get any warnings or errors, the code is running and the tasks are marked as success. It does create the table I wanted, with the columns I specified, but totally empty.
Any idea what I'm doing wrong?
EDIT: I tried the same code on a much smaller table (from 10Gb to a few Kb), performing a query with a much smaller result (from 500Mb to a few Kb), and it did work this time. Do you think the size of the table/the query result matters? Is it limited? Or does performing a too large query cause a lag of some sort?
EDIT2: After a few more tests I can confirm that this issue is not related to the size of the query or the table. It seems to have something to do with the Date format. In my code the WHERE condition is actually checking if a date_column = 'YYYY-MM-DD'. When I replace this condition with an int or string comparison it works perfectly. Do you guys know if Bigquery uses a particular date format or requires a particular syntax?
EDIT3: Finally getting somewhere: When I cast my date_column as a date (CAST(date_column AS DATE)) to force its type to DATE, I get an error that says that my field is actually an int-32 (Argument type mismatch). But I'm SURE that this field is a date, so that implies that either Bigquery stores it as an int while displaying it as a date, or that the Bigquery operator does some kind of hidden type conversion while loading tables. Any idea on how to fix this?
I had a similar issue when transferring data into BigQuery from other data sources.
I suggest formatting the date_column at the source as follows: to_char(date_column, 'YYYY-MM-DD') as date
In general, I have seen that BigQuery's schema auto-detection is often problematic. The safest way is to always specify the schema before executing the corresponding query, or to use operators that support schema definition.
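If the comparison has to happen on the BigQuery side instead, one option (not from the answer above, just a sketch reusing the names from the question) is to force standard SQL and compare against an explicit DATE literal; use_legacy_sql is a regular BigQueryOperator parameter:
copyData = BigQueryOperator(
    task_id='copyData',
    dag=dag,
    sql="SELECT some_columns, x, y, z "
        "FROM dataset_d.table_t "
        "WHERE date_column = DATE '2019-01-01'",
    destination_dataset_table='dataset_d.table_u',
    use_legacy_sql=False,  # standard SQL, so the DATE literal is interpreted as a DATE
    bigquery_conn_id='something',
)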

How do I use TTL on clickhouse table?

Reading the documentation, I've found the TTL feature, which is very useful for me. However, I can't construct valid SQL to engage it.
Here is what I'm trying to do:
CREATE TABLE t1 (
name String,
date DateTime default now(),
number UInt64 default 0 TTL date + INTERVAL 1 DAY
) Engine MergeTree() ORDER BY name;
which gives an error as follows:
Syntax error: failed at position 92 (line 4, col 27):
...[copy of my code here]
Expected one of: NOT, LIKE, AND, OR, IN, BETWEEN, COMMENT, CODEC, token, IS, NOT LIKE, NOT IN, GLOBAL IN, GLOBAL NOT IN, ClosingRoundBracket, Comma, QuestionMark
I've also tried to engage table-wide TTL:
CREATE TABLE t1 (
name String,
date DateTime default now(),
number UInt64 default 0
) Engine MergeTree() ORDER BY name TTL date + INTERVAL 1 DAY;
This led to an error as well.
As far as I can see, I'm doing everything according to the documentation (https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/#table_engine-mergetree-creating-a-table), but I still can't use this feature.
I'm using server version 19.5.3 revision 54417.
Please provide any examples or thoughts about how to use the TTL feature!
TTLs for tables and columns are not yet released; they will be available in 19.6.x. The documentation reflects the 'master' state of the code, not the latest release, which is certainly confusing. To see the documentation for a particular release, refer to the docs for that major version, like this: https://clickhouse.yandex/docs/v19.5/en/operations/table_engines/mergetree/
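Once you are on a release that actually ships TTL support, the column-level and table-level statements from the question should be accepted, and an existing table can also be given a TTL afterwards, roughly like this (a sketch for 19.6+, not something 19.5 will accept):
ALTER TABLE t1 MODIFY TTL date + INTERVAL 1 DAY;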

Bigtable-BigQuery Import via DataFlow: 2 questions on table partitioning and Timestamps

I have a job in Dataflow importing data from Bigtable into Bigquery by using built-in Dataflow APIs for both. I have two questions:
Question 1: If the source data is in one large table in Bigtable, how can I partition it into a set of sub- or smaller tables in BigQuery dynamically based on, say, the given Bigtable row-key known only at run-time?
The Java code in Dataflow looks like this:
p.apply(Read.from(CloudBigtableIO.read(config)))
.apply(ParDo.of(new SomeDoFNonBTSourceData()))
.apply(BigQueryIO.Write
.to(PROJ_ID + ":" + BQ_DataSet + "." + BQ_TableName)
.withSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
p.run();
So, since BQ_TableName has to be supplied at code-level, how can I provide it programmatically based on what is seen inside the SomeDoFNonBTSourceData, like a range of values of the current RowKey? If RowKey is 'a-c' then TableA, if 'd-f' then TableB, etc.
Question 2: What is the right way to export the Bigtable timestamp into BigQuery so as to eventually reconstruct it in human-readable format there?
The processElement function within the DoFn looks like this:
public void processElement(ProcessContext c)
{
    String valA = new String(c.element().getColumnLatestCell(COL_FAM, COL_NAME).getValueArray());
    Long timeStamp = c.element().getColumnLatestCell(COL_FAM, COL_NAME).getTimestamp();
    tr.put("ColA", valA);
    tr.put("TimeStamp", timeStamp);
    c.output(tr);
}
And during the Pipeline construction, the BQ schema setup for the timeStamp column looks like this:
List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("ColA").setType("STRING"));
fields.add(new TableFieldSchema().setName("TimeStamp").setType("TIMESTAMP"));
schema = new TableSchema().setFields(fields);
So the Bigtable timestamp seems to be of type Long, and I have tried both "TIMESTAMP" and "INTEGER" types for the destination TimeStamp column in BQ (there seems to be no Long in BQ as such). Ultimately, I need to use the TimeStamp column in BQ both for 'order by' clauses and to display the information in human-readable form (date and time). The 'order by' part seems to work OK, but I have not managed to CAST the end result into anything meaningful -- I either get cast errors or something still unreadable.
Incidentally, I'm here looking for an answer to an issue similar to Question 1 :).
For the second question, I think you first need to confirm that the Long timestamp is indeed a UNIX timestamp; I've always assumed BQ can ingest that as a TIMESTAMP without any conversion.
But you can try this...
// Example epoch value, assumed here to be in seconds
Long longTimeStamp = 1408452095L;
// java.util.Date expects milliseconds, so scale up
Date timeStamp = new Date();
timeStamp.setTime(longTimeStamp * 1000);
// Write an ISO-8601 string, which BigQuery accepts for TIMESTAMP columns
tr.put("TimeStamp", timeStamp.toInstant().toString());