How to convert String to Timestamp in Kafka Connect using transforms and insert into Postgres using the Confluent JDBC sink connector?

I am using Confluent 6.0.1. Below is my kafka-connect-sink.properties file:
name=enba-sink-postgres
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://IP:PORT/DB
connection.user=USERNAME
connection.password=PASSWORD
tasks.max=1
topics=postgresInsert
insert.mode=INSERT
table.name.format=schema."tableName"
auto.create=false
key.converter.schema.registry.url=http://localhost:8081
key.converter.schemas.enable=false
value.converter.schemas.enable=false
config.action.reload=restart
value.converter.schema.registry.url=http://localhost:8081
errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true
print.key=true
# Transforms
transforms=TimestampConverter
transforms.TimestampConverter.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.TimestampConverter.format=yyyy-MM-dd HH:mm:ss
transforms.TimestampConverter.target.type=Timestamp
transforms.TimestampConverter.target.field=DATE_TIME
I am using Avro data and the schema is:
{\"type\":\"record\",\"name\":\"log\",\"namespace\":\"transform.name.space\",\"fields\":[{\"name\":\"TRANSACTION_ID\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"MSISDN\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"TRIGGER_NAME\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"W_ID\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"STEP\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"REWARD_ID\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"CAM_ID\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"STATUS\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"COMMENTS\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"CCR_JSON\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"},{\"name\":\"DATE_TIME\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"avro.java.string\":\"String\"}]}
The DATE_TIME column in Postgres is of type timestamp, and from Avro I tried sending the date both as a string and as a long.
DATE_TIME = 2022-12-15 14:38:02
The issue is that if I don't use the transform, I get this error:
ERROR: column "DATE_TIME" is of type timestamp with time zone but expression is of type character varying
And if I use the transform as configured above, the error is:
[2021-02-06 21:47:41,897] ERROR Error encountered in task enba-sink-postgres-0. Executing stage 'TRANSFORMATION' with class 'org.apache.kafka.connect.transforms.TimestampConverter$Value', where consumed record is {topic='enba', partition=0, offset=69, timestamp=1612628261605, timestampType=CreateTime}. (org.apache.kafka.connect.runtime.errors.LogReporter:66)
org.apache.kafka.connect.errors.ConnectException: Schema Schema{com.package.kafkaconnect.Enbalog:STRUCT} does not correspond to a known timestamp type format

I got it working using:
# Transforms
transforms=timestamp
transforms.timestamp.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.timestamp.target.type=Timestamp
transforms.timestamp.field=DATE_TIME
transforms.timestamp.format=yyyy-MM-dd HH:mm:ss
For some reason transforms=TimestampConverter was not working. (The likely reason: the failing configuration sets transforms.TimestampConverter.target.field, but the transform's property is named field; with no field set, the SMT tries to convert the entire value STRUCT, which matches the error above.)
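For reference, a minimal sketch of the assumed Postgres side (hypothetical DDL; the actual table is not shown in the question, and the names are taken from table.name.format and the Avro schema). It illustrates why the untransformed string is rejected and why the converted value is accepted:

-- Hypothetical target table; only a few of the Avro fields are shown.
-- Without the transform, the sink binds DATE_TIME as character varying, which
-- Postgres rejects for a timestamptz column; with TimestampConverter the value
-- arrives as a Connect Timestamp and binds cleanly.
CREATE TABLE schema."tableName" (
    "TRANSACTION_ID" varchar,
    "MSISDN"         varchar,
    "STATUS"         varchar,
    "DATE_TIME"      timestamptz
);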


Prevent Oracle error ORA-01843 in SELECT statement

I'm storing the logs of an API in a table in my application. In one of the columns, I store the raw JSON sent in the HTTP request.
I've been tasked with creating a page in my application dedicated to easily exploring all the logged entries, with filters, sorting, etc.
One of the required sorts is on the date indicated in the JSON body of the HTTP call. I've managed to do so using Oracle's JSON API:
SELECT *
FROM FUNDING_REQUEST f
ORDER BY TO_TIMESTAMP_TZ(JSON_VALUE(
           f.REQUEST_CONTENTS,
           '$.leasing_information.consumer_request_date_time'
         ), 'YYYY/MM/DD"T"HH24:MI:SS.FFTZH:TZM') ASC
This works whether $.leasing_information.consumer_request_date_time is defined or not, but I have an issue when the value is wrongly formatted. In one of my tests, I sent this to my API:
{
  [...],
  "leasing_information": {
    "consumer_request_date_time": "2021-25-09T12:30:00.000+02:00",
    [...]
  }
}
There is no 25th month, and my SQL query now returns the following error:
ORA-01843: not a valid month.
I would like to handle this value as NULL rather than returning an error, but it seems like the TO_TIMESTAMP_TZ DEFAULT clause does not really work the way I want it to. Doing this also returns an error:
SELECT *
FROM FUNDING_REQUEST f
ORDER BY TO_TIMESTAMP_TZ(JSON_VALUE(
           f.REQUEST_CONTENTS,
           '$.leasing_information.consumer_request_date_time'
         ) DEFAULT NULL ON CONVERSION ERROR, 'YYYY/MM/DD"T"HH24:MI:SS.FFTZH:TZM') ASC
ORA-00932: inconsistent datatypes: expected - got TIMESTAMP WITH TIME ZONE
I would also like to avoid using a PL/SQL function if possible. How can I prevent this query from returning an error?
You don't need to extract a string and then convert that to a timestamp; you can do that within the json_value(), and return null if that errors - at least from version 12c Release 2 onwards:
SELECT *
FROM FUNDING_REQUEST f
ORDER BY JSON_VALUE(
           f.REQUEST_CONTENTS,
           '$.leasing_information.consumer_request_date_time'
           RETURNING TIMESTAMP WITH TIME ZONE
           NULL ON ERROR
         ) ASC
db<>fiddle, including showing what the string and timestamp extractions evaluate too (i.e. a real timestamp value, or null).

TIMESTAMP vs TIMESTAMP_NTZ in Snowflake SQL

I am using a SQL script to parse JSON into a Snowflake table using dbt.
One of the columns contains this datetime value: '2022-02-09T20:28:59+0000'.
What's the correct way to define the data type for an ISO datetime in Snowflake?
I tried DATE, TIMESTAMP and TIMESTAMP_NTZ like this in my dbt SQL script:
JSON_DATA:",my_date"::TIMESTAMP_NTZ AS MY_DATE
but clearly these aren't correct, because later on when I test it in Snowflake with SELECT *, I get this error:
SQL Error [100040] [22007]: Date '2022-02-09T20:28:59+0000' is not recognized
or
SQL Error [100035] [22007]: Timestamp '2022-02-13T03:32:55+0100' is not recognized
so I need to know which Snowflake date/time data type suits this value best.
EDIT:
This is what I am trying now.
SELECT
JSON_DATA:"date_transmission" AS DATE_TRANSMISSION
, TO_TIMESTAMP(DATE_TRANSMISSION:text, 'YYYY-MM-DDTHH24:MI:SS.FFTZH:TZM') AS DATE_TRANSMISSION_TS_UTC
, JSON_DATA:"authorizerClientId"::text AS AUTHORIZER_CLIENT_ID
, JSON_DATA:"apiPath"::text API_PATH
, MASTERCLIENT_ID
, META_FILENAME
, META_LOAD_TS_UTC
, META_FILE_TS_UTC
FROM {{ source('INGEST_DATA', 'TABLENAME') }}
I get this error:
000939 (22023): SQL compilation error: error line 6 at position 4
10:21:46 too many arguments for function [TO_TIMESTAMP(GET(DATE_TRANSMISSION, 'text'), 'YYYY-MM-DDTHH24:MI:SS.FFTZH:TZM')] expected 1, g
However, if I comment out the first two lines (related to the timestamp types), the other two work perfectly fine. What's the correct syntax for parsing JSON with TO_TIMESTAMP?
Note that JSON_DATA:"apiPath"::text API_PATH gives the correct value in my Snowflake tables.
Did some testing and it seems you have 2 options.
You can either get rid of the +0000 at the end: left(column_date, len(column_date)-5)
or use try_to_timestamp with a format:
try_to_timestamp('2022-02-09T20:28:59+0000','YYYY-MM-DD"T"HH24:MI:SS+TZHTZM')
TZH and TZM are TimeZone Offset Hours and Minutes
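Put together as runnable statements (a sketch using the literal value from the question; in the dbt model the literal would be replaced by the JSON-extracted column cast to text):

-- Option 1: strip the trailing "+0000" and rely on autodetection (the offset is discarded).
SELECT TO_TIMESTAMP_NTZ(LEFT('2022-02-09T20:28:59+0000', LEN('2022-02-09T20:28:59+0000') - 5)) AS opt1,
-- Option 2: keep the offset and parse it explicitly.
       TRY_TO_TIMESTAMP('2022-02-09T20:28:59+0000', 'YYYY-MM-DD"T"HH24:MI:SS+TZHTZM') AS opt2;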
So there are 2 main points here.
When getting data from JSON to pass to any of the timestamp functions, they want a ::TEXT value, but the values pulled from JSON are still ::VARIANT, so they need to be cast. This is the cause of the error you quote:
(22023): SQL compilation error: error line 6 at position 4
10:21:46 too many arguments for function [TO_TIMESTAMP(GET(DATE_TRANSMISSION, 'text'), 'YYYY-MM-DDTHH24:MI:SS.FFTZH:TZM')] expected 1, g
Also, your SQL there is wrong; it should have been:
TO_TIMESTAMP(DATE_TRANSMISSION::text,
How you handle the timezone format: as others have noted (and as I did on your last question), you have to decide whether you want to ignore the timezone values or read them. I forgot about the TZHTZM formatting. Given you have timezone data, you should use TO_TIMESTAMP_TZ / TRY_TO_TIMESTAMP_TZ to make sure the time zone data is kept, given your second example shows +0100.
Putting those together (assuming you didn't want an extra date_transmission as a VARIANT in your data):
SELECT
TO_TIMESTAMP_TZ(JSON_DATA:"date_transmission"::text, 'YYYY-MM-DDTHH24:MI:SS+TZHTZM') AS DATE_TRANSMISSION_TS_UTC
, JSON_DATA:"authorizerClientId"::text AS AUTHORIZER_CLIENT_ID
, JSON_DATA:"apiPath"::text AS API_PATH
, MASTERCLIENT_ID
, META_FILENAME
, META_LOAD_TS_UTC
, META_FILE_TS_UTC
FROM {{ source('INGEST_DATA', 'TABLENAME') }}
You should use TIMESTAMP (not DATE, which does not store the time information), but the format you are using is probably not autodetected. You can specify the input format as YYYY-MM-DD"T"HH24:MI:SSTZHTZM as shown here. The autodetected one has a : between TZH and TZM.
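For example (a sketch; TO_TIMESTAMP_TZ is used here so the +0000 offset is kept, and the literal stands in for the JSON-extracted column):

SELECT TO_TIMESTAMP_TZ('2022-02-09T20:28:59+0000', 'YYYY-MM-DD"T"HH24:MI:SSTZHTZM') AS my_ts;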

BigQuery: Load Avro files with date columns stored as long into timestamp

I am having trouble getting BigQuery to load timestamps from Avro files correctly.
The Avro files have date columns stored as long, with logical type timestamp-micros. Per the documentation, BigQuery should store this as the TIMESTAMP data type. I have also tried timestamp-millis as the logical type.
The data is stored in Avro like this:
{'id': '<masked>', '<masked>': '<masked>', 'tm': 1553990400000, '<masked>': <masked>, '<masked>': <masked>, 'created': 1597056958864}
The fields tm and created are longs, representing 2019-03-31T00:00:00Z and 2020-08-10T11:50:58.986816592Z respectively.
The Avro schema is:
{"type":"record","name":"SomeMessage","namespace":"com.df",
"fields":
[{"name":"id","type":"string"},
{"name":"<masked>","type":"string"},
{"name":"tm","type":"long","logicalType":"timestamp-micros"},
{"name":"<masked>","type":"int"},
{"name":"<masked>","type":"float"},
{"name":"created","type":"long","logicalType":"timestamp-micros"}]}";
When imported into BigQuery through bq load, the records end up like this:
<masked> <masked> tm <masked> <masked> created
________________________________________________________________________________________________________
<masked> | <masked> | 1970-01-18 23:39:50.400 UTC | <masked> | <masked> | 1970-01-19 11:37:36.958864 UTC
________________________________________________________________________________________________________
The import command used is:
bq load --source_format=AVRO --use_avro_logical_types some_dataset.some_table "gs://some-bucket/some.avro"
The timestamps in BigQuery are nowhere near the actual values provided in avro.
Anyone have any ideas on how to do this properly?
It turned out that the Avro schema was actually wrong.
The timestamp fields should be like this:
{"name":"created","type":{"type":"long", "logicalType":"timestamp-millis"}}

SparkSQL errors when using SQL DATE function

In Spark I am trying to execute SQL queries on a temporary table derived from a data frame that I manually built by reading a CSV file and converting the columns to the right data types.
Specifically, the table I'm talking about is the LINEITEM table from the TPC-H specification [1]. Unlike what the specification states, I am using TIMESTAMP rather than DATE because I've read that Spark does not support the DATE type.
In my single scala source file, after creating the data frame and registering a temporary table called "lineitem", I am trying to execute the following query:
val res = sqlContext.sql("SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01 00:00:00');")
When I submit the packaged jar using spark-submit, I get the following error:
Exception in thread "main" java.lang.RuntimeException: [1.75] failure: ``union'' expected but but `;' found
When I omit the semicolon and do the same thing, I get the following error:
Exception in thread "main" java.util.NoSuchElementException: key not found: date
Spark version is 1.4.0.
Does anyone have an idea what's the problem with these queries?
[1] http://www.tpc.org/TPC_Documents_Current_Versions/pdf/tpch2.17.1.pdf
SQL queries passed to SQLContext.sql shouldn't be delimited with a semicolon; this is the source of your first problem.
The DATE UDF expects a date in the YYYY-MM-DD form, so DATE('1998-12-01 00:00:00') evaluates to null. As long as the timestamp can be cast to DATE, the correct query string looks like this:
"SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01')"
DATE is a Hive UDF. It means you have to use a HiveContext, not a standard SQLContext; this is the source of your second problem.
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc) // where sc is a SparkContext
In Spark >= 1.5 it is also possible to use to_date function:
import org.apache.spark.sql.functions.{lit, to_date}
df.where(to_date($"shipdate") <= to_date(lit("1998-12-01")))
Please try the Hive function CAST(expression AS toDatatype).
It converts an expression from one datatype to another.
For example, CAST('2016-06-17 00.00.000' AS DATE) will convert a string to a date.
In your case
val res = sqlContext.sql("SELECT * FROM lineitem l WHERE CAST(l.shipdate as DATE) <= CAST('1998-12-01 00:00:00' AS DATE);")
Supported datatype conversions are as listed in Hive Casting Dates

How to fix error "Conversion failed when converting datetime from character string" in SQL Query?

The error that I found in the log is the one below.
'Illuminate\Database\QueryException' with message 'SQLSTATE[22007]:
[Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Conversion
failed when converting date and/or time from character string. (SQL:
SELECT COUNT(*) AS aggregate
FROM [mytable]
WHERE [mytable].[deleted_at] IS NULL
AND [created_at] BETWEEN '2015-09-30T00:00:00' AND '2015-09-30T23:59:59'
AND ((SELECT COUNT(*) FROM [mytable_translation]
WHERE [mytable_translation].[item_id] = [mytable].[id]) >= 1)
)'
in
wwwroot\myproject\vendor\laravel\framework\src\Illuminate\Database\Connection.php:625
In the database, the column's data type is datetime and it is NOT NULL.
Based on marc_s's answer, I tried to change the format I'm sending to the database. So I tried it without the T: and [created_at] between '2015-09-30 00:00:00' and '2015-09-30 23:59:59'.
Locally I'm using MySQL, and the code works just fine. If I test the query above in a SQL Server client, both versions (with and without the T) work too.
How can I fix this problem without making any changes to the database itself?
The PHP/Laravel code:
$items = $items->whereBetween($key, ["'".$value_aux."T00:00:00'", "'".$value_aux."T23:59:59'"]);
With #lad2025's help, I got it to work.
Based on his comments on my question, I changed the format I was passing on the code side (Laravel/PHP in this case). In reality, I "removed" the format itself and just assigned the values to variables before passing them to the query; this way, I let the database decide the format it wants.
Instead of
$itens = $itens->whereBetween($key, ["'".$value_aux."T00:00:00'", "'".$value_aux."T23:59:59'"]);
I changed the code to this:
$sta = $value_aux."T00:00:00";
$end = $value_aux."T23:59:59";
$itens = $itens->whereBetween($key, [$sta, $end]);
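The difference is the extra quotes: the original bindings sent the single quotes as part of the value, so SQL Server had to convert a string that still contained quote characters. A hedged T-SQL illustration of the two cases:

SELECT CONVERT(datetime, '2015-09-30T00:00:00');      -- succeeds: a valid ISO 8601 literal
SELECT CONVERT(datetime, '''2015-09-30T00:00:00''');  -- fails with the conversion error from the
                                                      -- question: the quotes are part of the string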