I am very new to partitioning.
Suppose I have the following table:
create table mytable(mytime timestamp, myname string)
where the column mytime has the format year-month-day hour:min:sec.msec (for example, 2014-12-05 08:55:59.3131).
I want to partition mytable based on the year-month-day part of mytime.
For example, I want to make a partition for 2014-12-05.
A record whose mytime is 2014-12-05 08:55:59.3131 will be in this partition.
So a query like select * from mytable where mytime like '2014-12-05%' will only search that
partition.
How can I do that in Hive?
I already have data in mytable. Do I need to recreate mytable and reload all the data?
Thank you
Input (input.txt):
1997-12-31 23:59:59.999,kishore
2014-12-31 23:59:59.999999,manish
create table mytable_tmp(mytime string,myname string)
row format delimited
fields terminated by ',';
load data local inpath 'input.txt'
overwrite into table mytable_tmp;
create table mytable(myname string,mytimestamp string)
PARTITIONED BY (mydate string)
row format delimited
fields terminated by ',';
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE mytable PARTITION(mydate)
SELECT myname,mytime,to_date(mytime) from mytable_tmp;
select * from mytable where mydate='2014-12-31';
Output:
manish 2014-12-31 23:59:59.999999 2014-12-31
This creates a partition per value of mydate, and each partition holds the myname and mytime values for that date, as your problem requires.
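To see what the dynamic-partition INSERT does with the two sample rows, here is a small Python simulation (an illustration only, not Hive code). The helper to_date mimics Hive's to_date() by keeping the yyyy-MM-dd prefix of each timestamp string, and partition_rows groups rows by that key, so each key stands for one mydate partition. Both helper names are hypothetical.

```python
from collections import defaultdict


def to_date(ts: str) -> str:
    """Mimic Hive's to_date() for 'yyyy-MM-dd HH:mm:ss[.fff]' strings:
    keep only the date part before the first space."""
    return ts.split(" ", 1)[0]


def partition_rows(rows):
    """Group (mytime, myname) rows by the derived mydate key, the way a
    dynamic-partition INSERT routes each row to exactly one partition."""
    partitions = defaultdict(list)
    for mytime, myname in rows:
        # Stored columns mirror the SELECT: myname, mytime; the key is to_date(mytime)
        partitions[to_date(mytime)].append((myname, mytime))
    return dict(partitions)


rows = [
    ("1997-12-31 23:59:59.999", "kishore"),
    ("2014-12-31 23:59:59.999999", "manish"),
]
parts = partition_rows(rows)
for mydate, members in sorted(parts.items()):
    print(f"mydate={mydate}: {members}")
```

A query that filters on mydate='2014-12-31' then only needs to read the one matching group, which is the partition pruning the question is after.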
Related
I have a CSV file containing about 30 million rows, with a column that holds dates. The problem is the format: PSQL supports '-' delimiters for timestamps, but my dates use '/'. For example, the date should be '2021-02-01 00:00:00', but mine is '2021/02/01 00:00:00'. I also cannot open the CSV file and change it manually because of its size. I am importing the data into a temporary table, replacing '/' with '-', and then inserting the rows into a new table, using the following commands (this is not the real table, just an example):
CREATE TABLE TMP(
dt VARCHAR
)
CREATE TABLE other_tmp(
dt TIMESTAMP
)
INSERT INTO TMP VALUES('2020/01/02 22:33:11');
INSERT INTO other_tmp(dt)
SELECT dt,
REPLACE(dt, '/', '-')
FROM TMP
I get an error from the REPLACE function when I run it.
Does anybody know how I can solve this problem? Or is it even possible to manipulate the column in the original table?
Try this:
In this case, your column "dt" is of type timestamp without time zone, but the expression is of type text. So the text must be cast to timestamp, as below.
CREATE TABLE IF NOT EXISTS TMP
(
dt VARCHAR
);
CREATE TABLE IF NOT EXISTS other_tmp
(
dt TIMESTAMP
);
INSERT INTO TMP
VALUES ('2020/01/02 22:33:11');
INSERT INTO other_tmp(dt)
SELECT REPLACE(dt, '/', '-')::TIMESTAMP AS dt
FROM TMP;
Your first error is that you have two columns in the SELECT list, but only one column in the INSERT target.
To convert a string to a timestamp, use to_timestamp():
insert into other_tmp(dt)
select to_timestamp(dt, 'yyyy/mm/dd hh24:mi:ss')
from tmp;
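Both answers rely on the same idea: the slash-separated text has to be parsed into a timestamp before it reaches the TIMESTAMP column. The Python sketch below mirrors the two approaches as an analogy only (it is not PostgreSQL code): normalizing the separators first and then parsing, versus parsing directly with an explicit format, where the strptime pattern '%Y/%m/%d %H:%M:%S' plays the role of 'yyyy/mm/dd hh24:mi:ss'.

```python
from datetime import datetime

RAW = "2020/01/02 22:33:11"


def parse_after_replace(text: str) -> datetime:
    # Approach 1, like REPLACE(dt, '/', '-')::TIMESTAMP:
    # rewrite the separators, then parse the 'YYYY-MM-DD HH:MM:SS' string.
    return datetime.fromisoformat(text.replace("/", "-"))


def parse_with_format(text: str) -> datetime:
    # Approach 2, like to_timestamp(dt, 'yyyy/mm/dd hh24:mi:ss'):
    # an explicit format reads the slashes directly, no rewrite needed.
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S")


a = parse_after_replace(RAW)
b = parse_with_format(RAW)
print(a, a == b)
```

The explicit-format version skips the intermediate string rewrite. One difference worth knowing on the PostgreSQL side: to_timestamp(text, text) returns timestamp with time zone, so the value is converted using the session's TimeZone setting when it is stored in a timestamp column.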
I'm trying to create a table with one column which is supposed to have a timestamp as the datatype. An example of the value I want to insert into it is this:
2018-12-01T00:00:00Z.
How can I accomplish this? Right now I have it stored as a varchar, so when I join this table I need to convert it to a timestamp like this:
to_timestamp(dateColumn, 'YYYY-MM-DDT00:00:00Z')
I'd like to avoid doing this every time I join, so is there an option, perhaps at insert time, to have the value 2018-12-01T00:00:00Z stored as a date/time type? It would be fine if the inserted value appeared as 2018-12-01 00:00:00.
EDIT:
Here is my insert statement:
INSERT INTO TEST
SELECT * FROM
EXTERNAL 'C:\...\Desktop\testTable.csv'
USING
(
DELIMITER ','
DATESTYLE 'YMD'
Y2BASE 2000
REMOTESOURCE 'ODBC'
SKIPROWS 1
MAXERRORS 1
LOGDIR 'C:\...\Desktop'
ENCODING 'internal'
DATEDELIM '-'
);
I get the error for the date column: Expected field delimiter or end of record, "2018-12-01" [T]
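The error suggests the loader's date parsing stops at the 'T' separator. I am not certain whether the external-table options can be set to accept that separator directly, so this sketch takes a different route: if rewriting the file before loading is acceptable (an assumption; the question does not say), a streaming pass converts each ISO-8601 value such as 2018-12-01T00:00:00Z into 2018-12-01 00:00:00 without holding the whole file in memory. The helper names and the date_col index are hypothetical and must be adapted to the real file layout.

```python
import csv
import io
from datetime import datetime

ISO_Z = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2018-12-01T00:00:00Z
PLAIN = "%Y-%m-%d %H:%M:%S"   # e.g. 2018-12-01 00:00:00


def normalize(value: str) -> str:
    """Rewrite one ISO-8601 UTC string into the plain 'YYYY-MM-DD HH:MM:SS' form."""
    return datetime.strptime(value, ISO_Z).strftime(PLAIN)


def rewrite_csv(src, dst, date_col: int, has_header: bool = True) -> None:
    """Stream rows from src to dst, normalizing column date_col.
    date_col is a hypothetical index; set it to match your file."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    if has_header:
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)  # copy the header unchanged
    for row in reader:
        row[date_col] = normalize(row[date_col])
        writer.writerow(row)


# Usage with in-memory data; for a real file, open() both paths instead.
src = io.StringIO("id,dateColumn\n1,2018-12-01T00:00:00Z\n")
dst = io.StringIO()
rewrite_csv(src, dst, date_col=1)
print(dst.getvalue())
```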
I am trying to load data from a normal table into a partitioned Hive table.
Here is my normal table syntax:
create table x(name string, date1 string);
Here is my new partitioned table syntax:
create table y (name string, date1 string) partitioned by (timestamp1 string);
Here is how I am trying to load data into y:
insert into table y PARTITION(SUBSTR(date1,0,2)) select name, date1 from x;
Here is my Exception:
FAILED: ParseException line 1:39 missing ) at '(' near ',' in column name
line 1:51 cannot recognize input near '0' ',' '2' in column name
Use dynamic partitioning:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table y PARTITION(timestamp1)
select name, date1, SUBSTR(date1,0,2) as timestamp1 from x;
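In this answer the partition value is no longer computed inside PARTITION(...); it is the last expression of the SELECT, and Hive routes each row to the partition named by that value. The Python simulation below illustrates that routing (it is not Hive code). The helper hive_substr is a hypothetical, rough imitation of Hive's substr for non-negative positions: positions are 1-based, and as far as I know Hive treats a start of 0 like 1, so SUBSTR(date1,0,2) returns the first two characters; verify this on your Hive version. The sample rows are made up.

```python
from collections import defaultdict


def hive_substr(s: str, pos: int, length: int) -> str:
    """Rough imitation of Hive's substr for non-negative positions:
    1-based, with a start of 0 treated like 1 (hedged assumption)."""
    if pos < 0:
        raise ValueError("negative positions are not handled in this sketch")
    start = max(pos, 1) - 1
    return s[start:start + length]


def dynamic_insert(rows):
    """Mimic: INSERT INTO TABLE y PARTITION(timestamp1)
              SELECT name, date1, SUBSTR(date1,0,2) AS timestamp1 FROM x
    The trailing SELECT column supplies the partition value."""
    table_y = defaultdict(list)
    for name, date1 in rows:
        *data_cols, timestamp1 = (name, date1, hive_substr(date1, 0, 2))
        table_y[timestamp1].append(tuple(data_cols))
    return dict(table_y)


x = [("alice", "20160216"), ("bob", "19991231")]  # hypothetical rows of table x
print(dynamic_insert(x))
```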
I am trying to insert data into a Hive table through dynamic partitioning. The table is:
CREATE EXTERNAL TABLE target_tbl_wth_partition(
booking_id string,
code string,
txn_date timestamp,
logger string
)
PARTITIONED BY (txn_date date, txn_hour int)
Example partition values:
txn_date=20160216
txn_hour=12
The staging table is:
CREATE EXTERNAL TABLE stg_target_tbl_wth_partition(
booking_id string,
code string,
txn_date timestamp,
logger string
)
My insert statement is:
insert overwrite table target_tbl_wth_partition partition(txn_date,hour(txn_date))
select booking_id,code,txn_date,logger from stg_target_tbl_wth_partition;
I am not able to insert with derived columns in a dynamic partition. Any help on how to proceed in such a case would be appreciated.
Regards,
Rakesh
I suggest you start from something like this...
CREATE TABLE blahblah (...)
PARTITIONED BY (aaa STRING, bbb STRING)
;
SET hive.exec.dynamic.partition = true
;
SET hive.exec.dynamic.partition.mode = nonstrict
;
INSERT INTO TABLE blahblah PARTITION (aaa, bbb)
SELECT ...,
SUBSTRING(aaabbb,1,5) as aaa,
SUBSTRING(aaabbb,7,2) as bbb
FROM sthg
;
...and make it work; then you can start experimenting with weird and unsupported syntax to see what works and what does not.
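Applied to the question above, the same pattern would put the derived values at the end of the SELECT, in the same order as the PARTITION clause, roughly: INSERT OVERWRITE TABLE target_tbl_wth_partition PARTITION (txn_dt, txn_hour) SELECT booking_id, code, txn_date, logger, to_date(txn_date), hour(txn_date) FROM stg_target_tbl_wth_partition; (untested). As far as I know, Hive also rejects a partition column that shares its name with a data column, hence the hypothetical rename of the date partition column to txn_dt. The Python sketch below only illustrates the per-row derivation; it assumes txn_date arrives as 'yyyy-MM-dd HH:mm:ss' text, and the sample row is made up.

```python
from datetime import datetime
from typing import Tuple


def partition_values(txn_ts: str) -> Tuple[str, int]:
    """Derive the partition values from a staging timestamp string,
    as to_date(txn_date) and hour(txn_date) would in the SELECT.
    Assumes the 'yyyy-MM-dd HH:mm:ss' text form."""
    ts = datetime.strptime(txn_ts, "%Y-%m-%d %H:%M:%S")
    return ts.date().isoformat(), ts.hour


def staged_to_target(row):
    """row = (booking_id, code, txn_date, logger). The derived values are
    appended last, in the same order as PARTITION (txn_dt, txn_hour)."""
    booking_id, code, txn_date, logger = row
    return (booking_id, code, txn_date, logger, *partition_values(txn_date))


# Hypothetical sample row from the staging table
print(staged_to_target(("B1", "X", "2016-02-16 12:34:56", "web")))
```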
I have data in the following format:
6856437950 11/16/2008 22:36:38 8204208990 1001004006044273
6715281120 11/16/2008 15:29:42 8132862237 1001004005059895
The Hive table I have created is the following:
CREATE TABLE t2 (session_id STRING, date_time STRING, customer_id STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
When I load the data into the table and display the contents, it shows them in the following format:
6856437950 11/16/2008 22:36:38 8204208990 1001004006044273 NULL NULL
6715281120 11/16/2008 15:29:42 8132862237 1001004005059895 NULL NULL
It appears that all the elements of each row are assigned to session_id, while date_time and customer_id are NULL.
I believe I made a mistake in the FIELDS TERMINATED BY clause, but I am not sure what value to assign to it.
hive (default)> CREATE TABLE t2 (session_id STRING, date_time STRING, customer_id STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
OK
Time taken: 9.343 seconds
hive (default)> desc t2;
OK
col_name data_type comment
session_id string
date_time string
customer_id string
Time taken: 0.319 seconds
hive (default)> LOAD DATA LOCAL INPATH '/tmp/input.txt' INTO table t2;
Copying data from file:/tmp/input.txt
Copying file: file:/tmp/input.txt
Loading data to table default.t2
OK
Time taken: 0.766 seconds
hive (default)> select * from t2;
OK
session_id date_time customer_id
6856437950 11/16/2008 22:36:38 8204208990 1001004006044273
6715281120 11/16/2008 15:29:42 8132862237 1001004005059895
Time taken: 0.494 seconds
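The transcript shows the same DDL loading correctly, which points at the data file rather than the clause: FIELDS TERMINATED BY '\t' splits only on real tab characters, so a line separated by spaces stays one field, lands entirely in session_id, and leaves the other columns NULL. A quick check is to count how many fields each line splits into. The helper field_counts is hypothetical, the tab-separated sample line is a made-up layout, and /tmp/input.txt is the path from the transcript.

```python
def field_counts(lines, delimiter="\t"):
    """Return how many fields each line splits into on delimiter.
    With FIELDS TERMINATED BY '\\t', a count of 1 means the whole line
    would land in the first column and the remaining columns would be NULL."""
    return [len(line.rstrip("\r\n").split(delimiter)) for line in lines]


space_separated = ["6856437950 11/16/2008 22:36:38 8204208990 1001004006044273\n"]
tab_separated = ["6856437950\t11/16/2008 22:36:38\t8204208990\n"]  # hypothetical layout
print(field_counts(space_separated))  # [1]
print(field_counts(tab_separated))    # [3]

# For the real file:
# with open("/tmp/input.txt") as f:
#     print(field_counts(f))
```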