How do I map Teradata data types to Hive data types? For example, in Teradata: DATE FORMAT 'YYYY-MM-DD' COMPRESS (DATE '2013-10-09', DATE '2014-02-25'). In Hive we can use DATE, but whenever we load data into the Hive table the column shows as NULL. Please help me in this regard. Thank you.
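Hive's DATE type only parses text in yyyy-MM-dd form; values exported in any other layout load as NULL. A minimal sketch of the usual fix (the table and column names below are hypothetical): stage the column as STRING, then cast while copying into the typed table.

```sql
-- Hypothetical staging table; load the raw Teradata export here first.
CREATE TABLE staging_events (event_dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE events (event_dt DATE);

-- CAST works when the text is already yyyy-MM-dd; for other layouts,
-- parse with unix_timestamp() and the matching pattern instead.
INSERT INTO TABLE events
SELECT CAST(event_dt AS DATE) FROM staging_events;
```

Teradata's COMPRESS clause has no column-level equivalent in Hive; compression there is handled at the file-format level, so the clause can simply be dropped from the DDL.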
I am sending JSON telemetry data from our IoT Hub to Azure Data Lake Gen2 in the form of .parquet files. From the data lake I've then created a view in my Azure Synapse Serverless SQL pool that I can connect to and query data for reports.
CREATE VIEW DeviceTelemetryView
AS SELECT * FROM
OPENROWSET(
BULK 'https://test123.dfs.core.windows.net/devicetelemetry/*/*/*/*/*/',
FORMAT = 'PARQUET'
) AS [result]
This is what my view data looks like:
Most of these reports are based on date time ranges. Therefore I want to be able to write SQL queries that use my date time stamp.
The Current Issue
When I look at the current data type for the dateTimeStamp column, it defaults to varchar(8000) even though I believe my JSON is in the correct datetime format: "2021-11-29T21:45:00". How can I transform this specific field to a datetime field in my view to run queries on it?
When I look at the current data type for the dateTimeStamp column, it defaults to varchar(8000)
I think you would have to look at the data type for that column in the parquet file; it's likely a string in your case, which SQL interprets as varchar(8000).
even though I believe my JSON is in the correct datetime format: "2021-11-29T21:45:00".
Even if the timestamp format is correct, I think you'd have to tell the system to cast that string to a datetime.
How can I transform this specific field to a datetime field in my view to run queries on it?
I'm not an expert in SQL, but I think you can convert a string to a timestamp using CAST or CONVERT:
CREATE VIEW DeviceTelemetryView
AS SELECT corporationid, deviceid, version, CONVERT(datetime, dateTimeStamp, 126) AS NewDateFormat, data FROM
OPENROWSET(
BULK 'https://test123.dfs.core.windows.net/devicetelemetry/*/*/*/*/*/',
FORMAT = 'PARQUET'
) AS [result]
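An alternative sketch: Synapse serverless SQL also lets you type columns at read time with an explicit WITH schema on OPENROWSET, so the view exposes a real datetime column without a CONVERT in the select list. The column list and lengths below are assumptions based on the columns named in the answer; ISO 8601 strings like "2021-11-29T21:45:00" convert implicitly to DATETIME2.

```sql
CREATE VIEW DeviceTelemetryView
AS SELECT * FROM
OPENROWSET(
    BULK 'https://test123.dfs.core.windows.net/devicetelemetry/*/*/*/*/*/',
    FORMAT = 'PARQUET'
) WITH (
    corporationid VARCHAR(100),
    deviceid      VARCHAR(100),
    version       VARCHAR(20),
    dateTimeStamp DATETIME2,     -- typed at read time, no CONVERT needed
    data          VARCHAR(MAX)
) AS [result]
```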
What is the appropriate Hive data type to store "hh:mm" in a Hive table? I've been using VARCHAR(5), but I've seen that SMALLINT is used as well. This use case is for a data warehouse where users will be able to filter data by this field. For example:
SELECT * FROM data WHERE air_time > '10:00' and air_time < '14:00'
For comparison, SQL Server has a TIME data type, which was very convenient.
Any suggestions?
VARCHAR(5) is the most suitable data type. It looks like you don't need to do arithmetic on this data, and storing it in zero-padded hh:mm varchar format allows the comparison to work.
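The reason this works is that zero-padded hh:mm strings sort lexicographically in the same order as the times they represent. A sketch against a hypothetical table:

```sql
-- '09:59' < '10:00' < '13:59' < '14:00' holds as plain string comparison,
-- provided hours are always two digits ('09:00', never '9:00').
SELECT *
FROM flights                                   -- hypothetical table
WHERE air_time > '10:00' AND air_time < '14:00'
ORDER BY air_time;                             -- also sorts chronologically
```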
Every time I try to SELECT a DATE-type field in Impala from a table created in Hive, I get AnalysisException: Unsupported type 'DATE'.
Are there any workarounds?
UPDATE: here is an example CREATE TABLE schema from Hive and an Impala query.
Schema:
CREATE TABLE myschema.mytable(
  day_dt date,
  event string)
PARTITIONED BY (day_id int)
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
Impala query
select b.day_dt
from myschema.mytable b;
Impala doesn't support the DATE data type, whereas Hive does, so you get AnalysisException: Unsupported type 'DATE' when you access such a column from Impala. A quick fix would be to store that date value in a string column in Hive and access it in whichever way you want from Impala.
If you're storing the data as text, it may work to create a new external Hive table that points to the same HDFS location as the existing table, but with a schema that declares day_dt as STRING instead of DATE.
This is a true workaround that may only suit some use cases, and you'd at least need to run MSCK REPAIR TABLE on the external Hive table whenever a new partition is added.
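A sketch of that workaround in HiveQL; the LOCATION path is an assumption and should point at the existing table's directory:

```sql
CREATE EXTERNAL TABLE myschema.mytable_str(
  day_dt string,   -- was DATE in the original table
  event string)
PARTITIONED BY (day_id int)
ROW FORMAT DELIMITED
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/myschema.db/mytable';

-- Register the existing partitions so queries can see them:
MSCK REPAIR TABLE myschema.mytable_str;
```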
I have an export job from Datameer going through to Hive. The issue is that we were told Hive converts date columns to string. I am feeding the data from Hive to Tableau, and the date column being converted to string is completely throwing off my data.
I am looking to convert/alter my existing column "Posting_Date" from string to date. Our Hive setup sits behind a Teradata-style interface, so I am trying to find a command that will let me convert this column back to date format.
I tried the following:
ALTER table Database.Table1
ADD posting_date date(4)
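Hive has no date(4) type, and ALTER TABLE ... ADD adds a new empty column rather than converting an existing one. Two sketches of how the conversion is usually done; whether the metadata-only route works depends on whether the stored text already parses as yyyy-MM-dd, and the MM/dd/yyyy pattern below is an assumption to adjust to the actual data:

```sql
-- Option 1: metadata-only retype; reads return NULL for any value
-- that isn't already in yyyy-MM-dd form.
ALTER TABLE Database.Table1 CHANGE Posting_Date Posting_Date DATE;

-- Option 2: rewrite the data, parsing an assumed MM/dd/yyyy layout.
CREATE TABLE Table1_converted AS
SELECT CAST(from_unixtime(unix_timestamp(Posting_Date, 'MM/dd/yyyy')) AS DATE)
         AS Posting_Date
FROM Database.Table1;
```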
I have a data file I wish to import into Hive which contains timestamps. The timestamps are of the format MM/dd/yyyy HH:mm:ss.
I would like to create a table which contains a timestamp type to hold this value, however I can't figure out how to directly import the data.
My workaround is to import the data into a temporary table with my date as a string, then read the data from this temporary table into my permanent table doing the time format conversion on the fly.
So, my entire two-step load function looks something like this:
create table tempTable(
  timeField string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

--load data local here!

create table finalTable(
  timeField timestamp
) stored as RCFILE;

insert into table finalTable
select from_unixtime( unix_timestamp(timeField, 'MM/dd/yyyy HH:mm:ss') )
from tempTable;
So finally my question :-)
Is this the 'right' or 'best' way to do it? Am I using an inefficient/stupid workaround?
thanks!
Another workaround is to change the datetime format in your data file to yyyy-MM-dd HH:mm:ss, which is Hive's default timestamp text format.
Then Hive will parse the data directly as the timestamp data type when you load it into the table.
Hope this helps.
create table tempTable(
  timeField timestamp
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

This will parse a string such as 2014-04-05 04:25:55 directly as a timestamp in Hive.
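If reformatting the file upstream isn't an option, the pattern-based parse from the two-step approach can be sanity-checked with a one-line query before wiring it into the load (the literal matches the example string above):

```sql
SELECT from_unixtime(unix_timestamp('04/05/2014 04:25:55', 'MM/dd/yyyy HH:mm:ss'));
-- returns 2014-04-05 04:25:55
```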