Import data into Hive with arbitrary timestamp format

I have a data file I wish to import into Hive which contains timestamps. The timestamps are of the format MM/dd/yyyy HH:mm:ss.
I would like to create a table which contains a timestamp type to hold this value, however I can't figure out how to directly import the data.
My workaround is to import the data into a temporary table with my date as a string, then read the data from this temporary table into my permanent table doing the time format conversion on the fly.
So, my entire two-step load function looks something like this:
create table tempTable(
timeField string
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
--load data local here!
create table finalTable(
timeField timestamp
) stored as RCFILE;
insert into table finalTable select
from_unixtime( unix_timestamp(timeField,'MM/dd/yyyy HH:mm:ss') )
from tempTable;
So finally my question :-)
Is this the 'right' or 'best' way to do it? Am I using an inefficient/stupid workaround?
thanks!

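Another workaround is to change the datetime format in your data file to yyyy-MM-dd HH:mm:ss, which is the text format Hive expects for the timestamp type.
Hive will then read the column directly as a timestamp, with no intermediate string table.
Hope this helps.

create table temptable(
timefield timestamp
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
With this table, a value rewritten as 2014-04-05 04:25:55 is read as a timestamp in Hive, whereas the original 04/05/2014 04:25:55 would come back as NULL.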
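If rewriting the file is not practical, the "timestamp.formats" SerDe property mentioned in the related question further down (HIVE-9298, Hive 1.2.0 or later) may let Hive parse the original format directly. The following is only a sketch under that assumption: directTable is a hypothetical table name, and the pattern is the one given in the question.

```sql
-- Sketch only: directTable is a hypothetical name; assumes Hive 1.2.0 or later.
CREATE TABLE directTable (
  timeField TIMESTAMP
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ',',
  'timestamp.formats' = 'MM/dd/yyyy HH:mm:ss'
)
STORED AS TEXTFILE;

-- load data local here, as in the question

-- Spot-check that the values parse instead of coming back as NULL.
SELECT timeField FROM directTable LIMIT 10;
```

If the final table still has to be stored as RCFILE, a plain insert ... select from this table would replace the string conversion step.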

Related

Trouble converting CSV date from non standard format SQL

I have dates in the format 01jan2020 (without a space or any separator) and need to convert this to a date type in SQL Server 2016 Management Studio.
The data was loaded from a .CSV file into a table (call it TestData, column is Fill_Date).
To join on a separate table to pull back data for another process, I need the TestData column Fill_Date to be in the correct format (MM-DD-YYYY) for my query to run correctly.
Fill_Date is currently in table TestData as datatype varchar(50).
I want to either see if it is possible to convert it with TestData table or directly insert the result into a 2nd table that is formatted.
Thanks (NEWB)
I ended up solving this by converting the data while loading it into a temp table, deleting the old values, and then inserting from that temp table back into the TestData table.
CONVERT(VARCHAR,CONVERT(date,[fill_date]),101) AS fill_date
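If the goal is a typed date column rather than a reformatted string, one option is to store the converted value as date and apply a display style only when it is needed. A minimal sketch, assuming the TestData table and Fill_Date column from the question; Fill_Date_Converted is a hypothetical new column, and whether every row parses depends on the data and the session language settings.

```sql
-- Hypothetical column to hold the typed value.
ALTER TABLE TestData ADD Fill_Date_Converted date NULL;
GO

-- TRY_CONVERT returns NULL instead of raising an error for values that do not parse.
UPDATE TestData
SET Fill_Date_Converted = TRY_CONVERT(date, Fill_Date);

-- Rows that failed to convert and need manual attention.
SELECT Fill_Date
FROM TestData
WHERE Fill_Date IS NOT NULL
  AND Fill_Date_Converted IS NULL;

-- Style 110 renders mm-dd-yyyy; style 101 (used above) renders mm/dd/yyyy.
SELECT CONVERT(varchar(10), Fill_Date_Converted, 110) AS fill_date_text
FROM TestData;
```

Joining on the date column itself avoids depending on either text format.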

Is there a way to specify Date/Timestamp format for the incoming data within the Hive CREATE TABLE statement itself?

I have CSV files which contain date and timestamp values in the formats below. E.g.:
Col1|col2
01JAN2019|01JAN2019:17:34:41
But when I define Col1 as Date and Col2 as Timestamp in my create statement, the Hive table simply returns NULL when I query it.
CREATE EXTERNAL TABLE IF NOT EXISTS my_schema.my_table
(Col1 date,
Col2 timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'my_path';
Instead, if I define the data types as simply string then it works. But that's not how I want my tables to be.
I want the table to be able to read the incoming data in correct type. How can I achieve this? Is it possible to define the expected data format of the incoming data with the CREATE statement itself?
Can someone please help?
As of Hive 1.2.0 it is possible to provide an additional SerDe property, "timestamp.formats". See this Jira for more details: HIVE-9298
ALTER TABLE timestamp_formats SET SERDEPROPERTIES ("timestamp.formats"="ddMMMyyyy:HH:mm:ss");
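The ALTER statement above changes an existing table. Since the question asks about the CREATE statement itself, here is a sketch of the same property supplied at creation time through WITH SERDEPROPERTIES. It assumes Hive 1.2.0 or later. Because the property name refers to timestamps, this sketch does not assume it also covers the Date column: Col1 stays a string and is converted in a view. The view name my_table_typed is hypothetical.

```sql
-- Col1 is kept as text; Col2 is parsed by the SerDe using timestamp.formats.
CREATE EXTERNAL TABLE IF NOT EXISTS my_schema.my_table (
  Col1 STRING,
  Col2 TIMESTAMP
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '|',
  'timestamp.formats' = 'ddMMMyyyy:HH:mm:ss'
)
STORED AS TEXTFILE
LOCATION 'my_path';

-- Hypothetical view that exposes Col1 as a date for downstream queries.
CREATE VIEW IF NOT EXISTS my_schema.my_table_typed AS
SELECT
  CAST(from_unixtime(unix_timestamp(Col1, 'ddMMMyyyy'), 'yyyy-MM-dd') AS DATE) AS Col1,
  Col2
FROM my_schema.my_table;
```

Querying the view then returns typed columns while the underlying files stay untouched.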

best data type for hh:mm in a Hive Table

What is the appropriate Hive data type to store "hh:mm" in a Hive table? I've been using VARCHAR(5), however I've seen that SMALLINT is used as well. This use case is for a data warehouse where users will be able to filter data by this field. For example:
SELECT * FROM data WHERE air_time > '10:00' and air_time < '14:00'
In SQL Server, for example, there is a TIME data type that was very convenient.
Any suggestions?
Varchar(5) is the most suitable data type. It looks like you don't need to do arithmetic on this data. Storing it as zero-padded, 24-hour hh:mm text lets plain string comparison order the values correctly.
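To make the trade-off concrete, here is a small sketch that assumes the data table and air_time column from the question. The string filter is only correct when every value is zero-padded (for example '09:30' rather than '9:30', which would sort after '10:00'). The second query derives minutes since midnight, which is roughly what a SMALLINT column would hold; air_minutes is a hypothetical name.

```sql
-- Works on VARCHAR(5) as long as values are zero-padded 24-hour text.
SELECT * FROM data WHERE air_time > '10:00' AND air_time < '14:00';

-- Alternative: compute minutes since midnight and filter numerically.
-- 600 = 10:00 and 840 = 14:00.
SELECT t.*
FROM (
  SELECT
    d.*,
    CAST(split(air_time, ':')[0] AS INT) * 60
      + CAST(split(air_time, ':')[1] AS INT) AS air_minutes
  FROM data d
) t
WHERE t.air_minutes > 600 AND t.air_minutes < 840;
```

The numeric form is mainly useful if durations or differences between times are needed later.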

Alter Column type in Teradata

I have an export job from Datameer going through to HIVE. The issue is that we were told that HIVE converts Date columns to string. I am feeding the data from HIVE to Tableau and the issue is that the Date column being converted to string is completely throwing off my data.
I am looking to convert/alter my existing column "Posting_Date" from String to Date... HIVE is based off a Teradata interface so I am trying to find a command which will let me convert this column back to Date format..
I tried the following:
ALTER table Database.Table1
ADD posting_date date(4)

kettle etl how to convert to a time data type

I have a Table input step that gets some data from a SQL Server table. One field has values of type time, e.g. 02:22:57.0000000. The destination (a Table output step) is a PostgreSQL table whose column for that field has data type time. But PDI seems to think the time from the source table is of type string and generates an error.
ERROR: column "contact_time" is of type time without time zone but expression is of type character varying
I tried using the Select values step, but there is no time type, only date and timestamp. What should I do?
You can use the Select values step: in the Meta-data tab, set Type to Timestamp and Format to HH:mm:ss.
This will convert your string input to a timestamp.
Hope this helps :)