Hive create map from multiple columns


I have a table that contains an id and 12 columns, and I want to create a map out of all 12 columns.
Base table:
CREATE TABLE test_transposed(
id string,
jan double,
feb double,
mar double,
apr double,
may double,
jun double,
jul double,
aug double,
sep double,
oct double,
nov double,
dec double)
ROW FORMAT SERDE
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
Final table:
CREATE TABLE test_map(
id string,
trans map<String,Double>)
ROW FORMAT SERDE
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
What is the best way to achieve this?
I found a similar question here, but it didn't help. I don't want to write the test_transposed data to a file and then create an external table test_map on top of it.

create table test_map
stored as textfile
as
select id
,map
(
'jan',jan
,'feb',feb
,'mar',mar
,'apr',apr
,'may',may
,'jun',jun
,'jul',jul
,'aug',aug
,'sep',sep
,'oct',oct
,'nov',nov
,'dec',dec
) as trans
from test_transposed
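
Typing twelve key/value pairs by hand is error-prone, so the statement above could also be generated from a list of column names. The following is only a sketch in Python: the helper name build_map_ctas and its parameters are hypothetical, and it merely assembles the same HiveQL text as the answer without running it against Hive.

```python
def build_map_ctas(source, target, key_col, value_cols):
    """Return a HiveQL CTAS statement that packs value_cols into one map column."""
    # Each column contributes one pair: a string-literal key and the column value.
    pairs = ",\n    ".join(f"'{col}', {col}" for col in value_cols)
    return (
        f"create table {target}\n"
        "stored as textfile\n"
        "as\n"
        f"select {key_col},\n"
        f"  map(\n    {pairs}\n  ) as trans\n"
        f"from {source}"
    )


# Column names taken from the question's test_transposed table.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

sql = build_map_ctas("test_transposed", "test_map", "id", MONTHS)
print(sql)
```

The printed text can be pasted into the Hive CLI or Beeline. Because the keys are string literals and the values are double columns, the resulting column type matches map<string,double> from the question.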

Related

I am unable to upload a .csv file in postgresql database because I don't know how to put date structure in sql query

I downloaded a CSV file for practice in which the dates appear in two different formats, as shown in the linked picture.
I tried to change the format to yyyy-mm-dd in Excel, but that did not work.
I also can't load the file into my PostgreSQL database. I used the data type "date", but the error says I need a different datestyle.
The code I used:
create table sample(
region varchar,
country varchar,
item_type varchar,
sales_channel varchar,
order_priority varchar,
order_date date,
order_id bigint,
ship_date date,
unit_sold int,
unit_price decimal,
unit_cost decimal,
total_revenue decimal,
total_cost decimal,
total_profit decimal);
copy sample from 'E:\postgresql\bin\5m Sales Records.csv'
delimiter ',' csv header;
ERROR: date/time field value out of range: "3/26/2016"
HINT: Perhaps you need a different "datestyle" setting.
CONTEXT: COPY sample, line 2, column ship_date: "3/26/2016"
SQL state: 22008
Any guidance will be helpful, thanks.

To summarize comments into sort of an answer:
create table csv_test(id integer, date_fld date);
--CSV file (csv_test.csv)
1, 2021-07-11
2, 7/11/2021
3, 7/14/2021
show datestyle ;
DateStyle
-----------
ISO, MDY
\copy csv_test from '/home/aklaver/csv_test.csv' with csv;
COPY 3
select * from csv_test ;
id | date_fld
----+------------
1 | 2021-07-11
2 | 2021-07-11
3 | 2021-07-14
(3 rows)
set datestyle = 'iso, dmy';
SET
\copy csv_test from '/home/aklaver/csv_test.csv' with csv;
ERROR: date/time field value out of range: " '7/14/2021'"
HINT: Perhaps you need a different "datestyle" setting.
CONTEXT: COPY csv_test, line 3, column date_fld: " '7/14/2021'"
CSV values are text, so each date only needs to be written in a form that the current DateStyle setting can parse. The second copy failed because the date order was set to 'dmy': in 7/14/2021 the second field is then read as the month, and 14 is not a valid month number. This is why there is a day/month order setting: 7/11/2021 could be either 'July 11 2021' or 'November 7 2021', and Postgres needs the user to tell it which ordering it is looking at. The error on 3/26/2016 in the question is consistent with the same cause.
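
The ambiguity can be reproduced outside Postgres. The sketch below uses Python's datetime.strptime as a stand-in for the DateStyle field order; the ORDER_FORMATS table and the parse_slash_date helper are hypothetical names for illustration and are not part of PostgreSQL.

```python
from datetime import date, datetime

# Hypothetical mapping from a DateStyle-like field order to a strptime pattern.
ORDER_FORMATS = {"mdy": "%m/%d/%Y", "dmy": "%d/%m/%Y"}


def parse_slash_date(text, order):
    """Parse a slash-separated date under the given field order; None if invalid."""
    try:
        return datetime.strptime(text.strip(), ORDER_FORMATS[order]).date()
    except ValueError:
        # For example, 14 read as a month number under 'dmy'.
        return None


print(parse_slash_date("7/11/2021", "mdy"))  # read as July 11
print(parse_slash_date("7/11/2021", "dmy"))  # read as November 7
print(parse_slash_date("7/14/2021", "dmy"))  # None: there is no month 14
```

The same string yields two different valid dates depending on the order, and the third call fails in the same way as the second \copy above.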

BigQuery return numeric types

In the following query:
SELECT 1.2400 AS dec, 2 AS int, 1.2e-2 AS fl
Why does BigQuery always string-encode these numbers?
I understand returning the decimal/numeric type as a string, but why does it do that for integers and floats?

Load csv with timestamp column to athena table

I have started using the Athena query engine on top of my S3 files.
Some of them have columns in timestamp format.
I have created a simple table with 2 columns:
CREATE EXTERNAL TABLE `test`(
`date_x` timestamp,
`clicks` int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://aws-athena-query-results-123-us-east-1/test'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1525003090')
I tried to load a file and query it with Athena.
The file looks like this:
"2018-08-09 06:00:00.000",12
"2018-08-09 06:00:00.000",42
"2018-08-09 06:00:00.000",22
I have tried different timestamp formats such as DD/MM/YYYY and YYYY-MM-DD..., and tried setting the time zone for each row, but none of them worked.
Every value I have tried shows up in Athena like this:
date_x clicks
1 12
2 42
3 22
I have tried using a CSV file with and without headers,
and with and without quotation marks,
but all of them show a broken timestamp.
My column in Athena must be a timestamp, preferably without a time zone.
Please don't suggest using a STRING or DATE column; that is not what I need.
What should the CSV file look like so that Athena recognizes the timestamp column?

Try the format yyyy-MM-dd HH:mm:ss.SSSSSS
The Amazon Redshift documentation for CREATE EXTERNAL TABLE (https://docs.amazonaws.cn/en_us/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html) states:
"Timestamp values in text files must be in the format yyyy-MM-dd HH:mm:ss.SSSSSS, as the following timestamp value shows: 2017-05-01 11:30:59.000000 . "

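If the CSV has to be rewritten into the suggested yyyy-MM-dd HH:mm:ss.SSSSSS layout, a small preprocessing step could do it. This is a sketch only: the helper name to_six_digit_fraction is hypothetical, stripping the surrounding double quotes is an assumption of this example rather than something the answer states, and the thread does not confirm that this format resolves the Athena problem.

```python
from datetime import datetime


def to_six_digit_fraction(text):
    """Rewrite 'yyyy-MM-dd HH:mm:ss.SSS' with a six-digit fractional second."""
    # Assumption: surrounding double quotes are removed before parsing.
    cleaned = text.strip().strip('"')
    parsed = datetime.strptime(cleaned, "%Y-%m-%d %H:%M:%S.%f")
    # %f always prints six digits (microseconds).
    return parsed.strftime("%Y-%m-%d %H:%M:%S.%f")


# Sample rows from the question, split into timestamp and clicks.
rows = [('"2018-08-09 06:00:00.000"', 12), ('"2018-08-09 06:00:00.000"', 42)]
converted = [f"{to_six_digit_fraction(ts)},{clicks}" for ts, clicks in rows]
print("\n".join(converted))
```
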
how to process text timestamp in hive

I have a column in a Hive table that is stored as text. The text looks like this:
2007-01-01T00:00:00+00:00
I am trying to find the difference in time between two timestamp values stored as text in the above format.

Suppose we have a Hive table dateTest with two columns, date1 string and date2 string,
and suppose that table contains a row with these values:
2007-01-01T00:00:00+00:00,2007-02-01T00:00:00+00:00
The dates are in ISO 8601 UTC format, so if you run this query:
select datediff(from_unixtime(unix_timestamp(date2, "yyyy-MM-dd'T'HH:mm:ss")),from_unixtime(unix_timestamp(date1, "yyyy-MM-dd'T'HH:mm:ss"))) as days
from datetest;
the result is 31, the number of days between 2007-01-01 and 2007-02-01.
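
As a cross-check of that arithmetic outside Hive, the same day difference can be computed in Python. This is a sketch, not a Hive feature: days_between is a hypothetical helper, and it assumes both strings carry the same UTC offset, as in the sample row.

```python
from datetime import datetime


def days_between(start_text, end_text):
    """Whole-day difference between two ISO 8601 timestamps, like Hive's datediff."""
    # fromisoformat accepts the '+00:00' offset form used in the question.
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    # Compare calendar dates only, ignoring the time of day.
    return (end.date() - start.date()).days


print(days_between("2007-01-01T00:00:00+00:00", "2007-02-01T00:00:00+00:00"))
```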

Error in semantic analysis: DATE, DATETIME, and TIMESTAMP types aren't supported yet. Please use STRING instead

Is this exception caused by the Hive version or by some other issue?
Please suggest a correct solution.
CREATE TABLE data_types_table (
our_tinyint TINYINT,
our_smallint SMALLINT ,
our_int INT ,
our_bigint BIGINT,
our_float FLOAT,
our_double DOUBLE,
our_timestamp TIMESTAMP ,
our_boolean BOOLEAN,
our_string STRING,
our_array ARRAY<TINYINT>,
our_map MAP<STRING,INT>,
our_struct STRUCT<f : SMALLINT, second : FLOAT, third : STRING>,
our_union UNIONTYPE<INT,FLOAT,STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '^'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
TBLPROPERTIES ('creator'='avi', 'created_at'='Mon May 18 20:46:32 EDT 2015');

The TIMESTAMP data type is supported in Hive 0.8 and later.
The DATE data type is supported in Hive 0.12 and later.
Hive does not have a DATETIME data type, so on an older version the error means the TIMESTAMP column has to be declared as STRING or Hive has to be upgraded.
