NiFi PutHiveStreaming with null timestamp values - hive

I have a target field of type timestamp in Hive, and the JSON I get from the source sometimes has a proper timestamp value and sometimes "" or null. I convert the source JSON to Avro before using the PutHiveStreaming processor. Records with a properly formatted timestamp get into my Hive target table successfully, but those with ""/null (empty string) values fail with the error: Illegal format. Timestamp format should be "YYYY-MM-DD HH:MM:SS[.fffffffff]". I know it works if I default the field to some date when it is null/empty, but I do not want that: I want the value stored as null in my target table when it is null. How can I achieve this?
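No answer is recorded for this question, but a common approach is to declare the Avro field as a nullable union and turn empty strings into real nulls before the data reaches PutHiveStreaming. The sketch below shows that cleanup step in Python; the field name `event_ts` and the record layout are assumptions for illustration, not taken from the original flow.

```python
import json

def nullify_empty_timestamps(record, field="event_ts"):
    """Turn '' or missing timestamp values into None, so that an Avro
    field declared as the union ["null", "string"] carries a true null
    instead of an empty string that Hive cannot parse."""
    value = record.get(field)
    if value is None or str(value).strip() == "":
        record[field] = None
    return record

raw = '{"id": 1, "event_ts": ""}'
print(json.dumps(nullify_empty_timestamps(json.loads(raw))))
# → {"id": 1, "event_ts": null}
```

In a NiFi flow this logic would typically live in an ExecuteScript or similar processor applied before the JSON-to-Avro conversion, with the Avro schema declaring the timestamp field as `["null", "string"]`.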

Related

Getting Error: "Invalid date: [19/8/2013]. The row will be skipped" in Informatica

We have a source as a flat file which consists of a date column in format 19/08/2013.
We have a target in oracle table which consists of a date column in format 11-AUG-13.
When we try to pass the source column value to the target using the TO_DATE and TO_CHAR expressions like this (A is the source column):
v1 = TO_CHAR(A)
O1 (output column) = TO_DATE(v1, 'DD-MON-YY')
we get the error below:
Invalid date: [19/8/2013]. The row will be skipped.
Can anyone please help where I'm going wrong.
Thank you
You need to convert the string to a date properly, and then Informatica will load it.
Change the mapping like this:
1. Change the data type in the source to a string, and read the data as a string.
2. Then use your expressions, but change the TO_DATE function as below:
v1 (variable column, data type string) = A
O1 (output column, data type date) = IIF(IS_DATE(v1, 'DD/MM/YYYY'), TO_DATE(v1, 'DD/MM/YYYY'), NULL)
IS_DATE checks whether the data is a valid date; only if it is will the expression try to convert it to a proper date, otherwise it puts NULL.
3. Then connect the O1 column to a date-type column in the Oracle target.
This will make sure all rows are loaded and none are skipped.
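The parse-or-null pattern in that IIF/IS_DATE expression can be illustrated outside Informatica. This Python sketch mimics the same behaviour, with the strptime format standing in for 'DD/MM/YYYY':

```python
from datetime import datetime

def to_date_or_none(value, fmt="%d/%m/%Y"):
    """Python analogue of IIF(IS_DATE(v1,'DD/MM/YYYY'),
    TO_DATE(v1,'DD/MM/YYYY'), NULL): parse if valid, else None."""
    try:
        return datetime.strptime(value, fmt).date()
    except (ValueError, TypeError):
        return None

print(to_date_or_none("19/8/2013"))   # parses despite the single-digit month
print(to_date_or_none("not-a-date"))  # None
```

Note that strptime, like Informatica's DD/MM mask here, accepts non-zero-padded day and month values, which is exactly what the rejected sample 19/8/2013 needs.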

Hive create table statement for timestamps that aren't in a 'yyyy-MM-dd HH:mm:ss' format

I have a JSON dataset in HDFS that contains a timestamp and a count. The raw data looks like this:
{"timestamp": "2015-03-01T00:00:00+00:00", "metric": 23}
{"timestamp": "2015-03-01T00:00:01+00:00", "metric": 17}
...
The format of the timestamp almost matches the Hive-friendly 'yyyy-MM-dd HH:mm:ss' format, but with a couple of differences: there's a 'T' between the date and time, and there's also a timezone offset. For example, a timestamp might be 2015-03-01T00:00:00+00:00 instead of 2015-03-01 00:00:00.
I'm able to create a table, providing that I treat the timestamp column as a string:
add jar hdfs:///apps/hive/jars/hive-json-serde-0.2.jar;
CREATE EXTERNAL TABLE `log`(
`timestamp` string,
`metric` bigint)
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde" WITH SERDEPROPERTIES ("timestamp"="$.timestamp", "metric"="$.metric")
LOCATION 'hdfs://path/to/my/data';
This isn't ideal since, by treating it as a string, we lose the ability to use timestamp functions (e.g. DATEDIFF, DATE_ADD, etc.) without casting from within the query. A possible workaround would be to CTAS and CAST the timestamp using a regular expression, but this entails copying the data into its new format. That seems inefficient and not in the spirit of 'schema-on-read'.
Is there a way to create a schema for this data without processing the data twice (i.e. once to load, once to convert the timestamp to a true timestamp)?
You will need to decide whether to:
1. do CTAS as you described, or
2. push the conversion work/logic into the consumers/clients of the table.
For the second option, this means including the string-to-timestamp conversion in the SQL statements executed against your external table.
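For illustration, the conversion those consumer queries would have to perform can be sketched in Python (a hypothetical helper, not part of Hive; assumes Python 3.7+ for parsing the colon in the %z offset):

```python
from datetime import datetime, timezone

def to_hive_timestamp(iso_ts):
    """Convert '2015-03-01T00:00:00+00:00' style strings to the
    Hive-friendly 'yyyy-MM-dd HH:mm:ss' form, normalised to UTC."""
    dt = datetime.strptime(iso_ts, "%Y-%m-%dT%H:%M:%S%z")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

print(to_hive_timestamp("2015-03-01T00:00:01+00:00"))  # 2015-03-01 00:00:01
```

In Hive itself the same effect is achieved in SQL, e.g. with string functions or a cast inside each query (or a view wrapping the external table), which is what "pushing the logic to the consumers" amounts to.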

Headache loading timestamp with Oracle Sqlldr

Hi all, I'm having an issue loading a file with timestamp values using sqlldr. Can anyone please help with an idea?
This is what a line of my file to load with the timestamp looks like :
..... 2014-09-02-00.00.00.
And my ctrl file :
...
...
Field POSITION(1085) TIMESTAMP )
Every record is rejected with the error:
ORA-26041: DATETIME/INTERVAL datatype conversion error
SQL*Loader will try to interpret the string as a timestamp using your default NLS session parameters, which by default will be something like:
select value from nls_session_parameters where parameter = 'NLS_TIMESTAMP_FORMAT';
VALUE
--------------------------------------------------------------------------------
DD-MON-RR HH24.MI.SSXFF
... although yours appears to be something else, as that setting gives ORA-01843: not a valid month with the example string. The log file from the load will also show the format it tried to apply:
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
"<field>" 1085 * , DATETIME DD-MON-RR HH24.MI.SSXFF
Whatever your default format is, it doesn't seem to match the format in the file. You could change the format for the database or with a logon trigger, but that won't help if you have other sources or outputs that expect the existing format; and you shouldn't rely on NLS settings anyway, as they could be different in another environment.
You can supply the timestamp format as part of the field definition:
... Field POSITION(1085) TIMESTAMP "YYYY-MM-DD-HH24.MI.SS")
If your column is timestamp with time zone - suggested by the tags but not the question - then it will be inserted in your session's time zone.
It's more common to specify the position range, though just providing the start position 1085 will work if this is the last field in the record.
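As a sanity check that the mask in that field definition matches the sample line, here is the equivalent parse in Python, with strptime directives standing in for the Oracle mask (illustrative only; SQL*Loader itself does not work this way):

```python
from datetime import datetime

# Oracle mask YYYY-MM-DD-HH24.MI.SS maps to strptime %Y-%m-%d-%H.%M.%S
def parse_field(raw):
    return datetime.strptime(raw, "%Y-%m-%d-%H.%M.%S")

print(parse_field("2014-09-02-00.00.00"))  # 2014-09-02 00:00:00
```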

db2 timestamp format not loading

I am trying to bulk-load data with timestamps into DB2 UDB version 9 and it is not working out.
My target table was created as follows:
CREATE TABLE SHORT
(
ACCOUNT_NUMBER INTEGER
, DATA_UPDATE_DATE TIMESTAMP DEFAULT current timestamp
, SCHEDULE_DATE TIMESTAMP DEFAULT current timestamp
...
0)
This is the code for my bulk load
load from /tmp/short.csv of del modified by COLDEL, savecount 500 messages /users/chris/DATA/short.msg INSERT INTO SHORT NONRECOVERABLE DATA BUFFER 500
No rows are loaded. I get an error
SQL0180N The syntax of the string representation of a datetime value is
incorrect. SQLSTATE=22007
I tried forcing the timestamp format when creating my table:
CREATE TABLE SHORT
(
ACCOUNT_NUMBER INTEGER
, DATA_UPDATE_DATE TIMESTAMP DEFAULT 'YYYY-MM-DD HH24:MI:SS'
, SCHEDULE_DATE TIMESTAMP DEFAULT 'YYYY-MM-DD HH24:MI:SS'
...
0)
but to no avail: SQL0574N DEFAULT value or IDENTITY attribute value is not valid for column. The timestamp format in my source data file looks like this:
2002/06/18 17:11:02.000
How can I format my timestamp columns?
Use the TIMESTAMPFORMAT file type modifier for the LOAD utility to specify the format present in your data files.
load from /tmp/short.csv
of del
modified by COLDEL, TIMESTAMPFORMAT="YYYY/MM/DD HH:MM:SS.UUUUUU"
savecount 500
messages /users/chris/DATA/short.msg
INSERT INTO SHORT
NONRECOVERABLE
It almost looks like ISO format to me.
If you are able to edit the data file and can safely change the slashes to hyphens (and maybe clip the decimal fraction), and DB2 accepts ISO, you're home.
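As a quick check that the TIMESTAMPFORMAT mask lines up with the sample value, here is a rough strptime equivalent in Python (illustrative; DB2's UUUUUU microseconds field corresponds loosely to %f):

```python
from datetime import datetime

sample = "2002/06/18 17:11:02.000"
# DB2 mask "YYYY/MM/DD HH:MM:SS.UUUUUU" ~ strptime "%Y/%m/%d %H:%M:%S.%f"
parsed = datetime.strptime(sample, "%Y/%m/%d %H:%M:%S.%f")
print(parsed)  # 2002-06-18 17:11:02
```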

mysql "datetime NOT NULL DEFAULT '1970-01-01' " gets turned to 0000-00-00 00:00:00

I have a column created as
`date_start` datetime NOT NULL DEFAULT '1970-01-01'
However, when I upload data from a CSV file with the LOAD DATA command and the entry for date_start is blank, the value saved is 0000-00-00 00:00:00.
NULL and a blank entry are not always treated the same in MySQL. In the case of the datetime data type, blank entries are automatically converted to 0000-00-00 00:00:00. You can think of this as a form of typecasting, where a blank gets typecast to zero.
In your data file, try replacing all blank dates with the DEFAULT keyword.
The MySQL SQL mode by default allows zero dates.
My belief is that the LOAD DATA INFILE command is reading the blank position intended to be a DATETIME, and automatically using a zero date before the insertion. This would explain why your default constraint isn't being applied - the value isn't null when being inserted.
I think you have two options:
Update all the blanks in your CSV from ", ," to ", NULL, " or ", DEFAULT, " so they'll be interpreted correctly
Change the SQL mode: SET SQL_MODE='NO_ZERO_DATE'
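The first option (rewriting the blanks in the CSV) is easy to automate. A minimal Python sketch that swaps blank date columns for \N, the marker LOAD DATA INFILE interprets as NULL (the column index and file layout here are assumptions):

```python
import csv
import io

def blanks_to_null(csv_text, date_cols=(1,)):
    """Rewrite blank values in the given column indexes as \\N, which
    LOAD DATA INFILE interprets as NULL rather than a zero date."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        for i in date_cols:
            if i < len(row) and row[i].strip() == "":
                row[i] = r"\N"
        writer.writerow(row)
    return out.getvalue()

print(blanks_to_null("1,,foo\n2,2020-01-01 00:00:00,bar\n"))
```

Whether NULL then falls back to the column DEFAULT depends on the column's nullability and the server's SQL mode, so test against your actual table definition.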
datetime means your column stores date and time in format Y-m-d H:i:s (0000-00-00 00:00:00).
date means your column stores just date in format Y-m-d (0000-00-00).
If you don't specify a value when inserting data, the default is used, but when you insert the data from the CSV file it does specify a value for the field, so the default is not used.
A blank entry is an invalid value for a datetime field, so the value 0000-00-00 00:00:00 is used instead, just as with any value that can't be parsed to a valid date (or a partial date).
MySQL has the annoying (in my opinion) behaviour that under some circumstances, if given an invalid value to store in a column, it will store a predefined "error value" instead. For example, in a date or datetime context, it will store the "zero date" you have noticed; for an ENUM column it will store '' (corresponds to 0 in a numeric context) as an error value.
Although it is inelegant, you might try doing the import and allowing the invalid zero dates to be inserted, and then issuing
UPDATE table_name
SET date_start = '1970-01-01 00:00:00'
WHERE date_start = '0000-00-00 00:00:00';