In the process of executing my HQL script, I have to store data in a temporary table before inserting it into the main table.
In that scenario, I tried to create a temporary table whose name starts with an underscore.
Note: without backquotes, a table name starting with an underscore does not work at all.
Working Create Statement:
create table dbo.`_temp_table` (
emp_id int,
emp_name string)
stored as ORC
tblproperties ('orc.compress' = 'ZLIB');
Working Insert Statement:
insert into table dbo.`_temp_table` values (123, 'ABC');
But the select statement on the temp table is not working: it returns no rows even though the record was inserted with the insert statement above.
select * from dbo.`_temp_table`;
Everything else works fine, but the select statement to view the rows does not.
I am still not sure whether we can create a temp table in the above way.
Hadoop treats filenames starting with an underscore as hidden files and ignores them when reading. An example is the "_$folder$" file that is created when you execute mkdir to create an empty folder in an S3 bucket.
See HIVE-6431 - Hive table name start with underscore
By default, FileInputFormat (the superclass of the various input formats) in Hadoop ignores file names starting with "_" or ".", and this is hard to work around in the Hive codebase.
You can try to create an external table and specify a table location without an underscore while still keeping the underscore in the table name. Also consider using TEMPORARY tables; both options are sketched below.
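A minimal sketch of both workarounds (the location path below is a placeholder, not from the original question):
-- External table: the name keeps the underscore, but the storage path does not,
-- so the input format never has to read a path component starting with "_".
create external table dbo.`_temp_table` (
emp_id int,
emp_name string)
stored as ORC
location '/warehouse/dbo/temp_table_data';
-- Or a session-scoped temporary table (Hive 0.14+), dropped automatically when the session ends.
create temporary table temp_table (
emp_id int,
emp_name string)
stored as ORC;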
I have data stored in S3 like:
/bucket/date=20140701/file1
/bucket/date=20140701/file2
...
/bucket/date=20140701/fileN
/bucket/date=20140702/file1
/bucket/date=20140702/file2
...
/bucket/date=20140702/fileN
...
My understanding is that if I pull in that data via Hive, it will automatically interpret date as a partition. My table creation looks like:
CREATE EXTERNAL TABLE search_input(
col1 STRING,
col2 STRING,
...
)
PARTITIONED BY(date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3n://bucket/';
However Hive doesn't recognize any data: any queries I run return 0 results. If I instead just grab one of the dates via:
CREATE EXTERNAL TABLE search_input_20140701(
col1 STRING,
col2 STRING,
...
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3n://bucket/date=20140701';
I can query data just fine.
Why doesn't Hive recognize the nested directories with the "date=date_str" partition?
Is there a better way to have Hive run a query over multiple sub-directories and slice it based on a datetime string?
In order to get this to work I had to do 2 things:
1. Enable recursive directory support:
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
2. For some reason it would still not recognize my partitions, so I had to recover them via:
ALTER TABLE search_input RECOVER PARTITIONS;
You can use:
SHOW PARTITIONS table;
to check and see that they've been recovered.
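Note that ALTER TABLE ... RECOVER PARTITIONS is, as far as I know, specific to Hive on Amazon EMR; on a stock Hive installation the equivalent command should be:
MSCK REPAIR TABLE search_input;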
I had faced the same issue and realized that Hive does not have the partition metadata for those directories, so we need to add it using an ALTER TABLE ... ADD PARTITION query. It becomes tedious if you have a few hundred partitions, since you have to create the same query with different values.
ALTER TABLE <table name> ADD PARTITION(<partitioned column name>=<partition value>);
Once you have run the above query for all available partitions, you should see the results in Hive queries; a concrete example is shown below.
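For the table in the question, the first two partitions from the sample layout would be added like this (the LOCATION clause can be omitted when the directory already follows the default date=value naming under the table location):
ALTER TABLE search_input ADD PARTITION (date='20140701') LOCATION 's3n://bucket/date=20140701';
ALTER TABLE search_input ADD PARTITION (date='20140702') LOCATION 's3n://bucket/date=20140702';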
I would like to import data into my PostgreSQL table.
I have a .csv file that is formatted like this:
1; John Blake
2; Roberto Young
3;Mark Palmer
Is there any way to strip the leading whitespace where it exists?
I used the following command:
\copy users from 'users.csv' using delimiters E';'
and it keeps the whitespace.
COPY to a temporary staging table and INSERT into the target table from there, trimming the text column.
CREATE TEMP TABLE tmp_x AS
SELECT * FROM users LIMIT 0; -- empty temp table with structure of target
\copy tmp_x FROM '/absolute/path/to/file' delimiters E';'; -- psql command (!)
INSERT INTO users
(usr_id, usr, ...) -- list columns
SELECT usr_id, ltrim(usr), ...
FROM tmp_x;
DROP TABLE tmp_x; -- optional; is destroyed at end of session automatically
ltrim() only trims space from the left of the string.
This sequence of actions performs better than updating rows in the table after COPY, which takes longer and produces dead rows. Also, only newly imported rows are manipulated this way.
Related answer:
Delete rows of a table specified in a text file in Postgres
You won't be able to use COPY alone to do that.
You can use an UPDATE coupled with trim:
UPDATE table SET column = trim(from column)
Or use a script to clean the data before bulk-inserting it into the DB.
When I run a script in PostgreSQL I usually do the following from psql:
my_database> \i my_script.sql
Where in my_script.sql I may have code like the following:
select a.run_uid, s.object_uid into temp_table from dt.table_run_group as a
inner join dt.table_segment as s on a.group_uid = s.object_uid;
In this particular case, I am only interested in creating temp_table with the results of the query.
Are these results in disk on the server? In memory? Is the table stored permanently?
Temporary tables are stored in RAM until the available memory is used up, at which time they spill onto disk. The relevant setting here is temp_buffers.
Either way, they live for the duration of a session and are dropped at the end automatically.
You can also drop them at the end of a transaction automatically (ON COMMIT DROP) or manually any time.
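A minimal sketch of both points, reusing the query from the question (the temp_buffers value is only an illustration; it can only be changed before the first use of temporary tables in the session):
SET temp_buffers = '256MB';   -- optional: raise the per-session memory available for temp tables
BEGIN;
CREATE TEMP TABLE temp_table ON COMMIT DROP AS
SELECT a.run_uid, s.object_uid
FROM dt.table_run_group a
JOIN dt.table_segment s ON a.group_uid = s.object_uid;
-- ... work with temp_table ...
COMMIT;                       -- temp_table is dropped here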
Temporary tables are only visible within the same session. Other sessions cannot access them - and also cannot conflict with them.
Always use CREATE TABLE tbl AS .... The alternative form SELECT ... INTO tbl is discouraged since it conflicts with the INTO clause in plpgsql.
Your query could look like:
CREATE TEMP TABLE tbl AS
SELECT a.run_uid, s.object_uid
FROM dt.table_run_group a
JOIN dt.table_segment s ON a.group_uid = s.object_uid;
SELECT INTO table ... is the same as CREATE TABLE table AS ..., which creates a normal, permanent table.
I am new to Hive. I have successfully set up a single-node Hadoop cluster for development purposes and, on top of it, I have installed Hive and Pig.
I created a dummy table in hive:
create table foo (id int, name string);
Now, I want to insert data into this table. Can I add data one record at a time, just like in SQL? Kindly help me with a command analogous to:
insert into foo (id, name) VALUES (12, 'xyz');
Also, I have a csv file which contains data in the format:
1,name1
2,name2
..
..
..
1000,name1000
How can I load this data into the dummy table?
I think the best way is:
a) Copy data into HDFS (if it is not already there)
b) Create an external table over your CSV like this:
CREATE EXTERNAL TABLE TableName (id int, name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'place in HDFS';
c) You can start using TableName right away by issuing queries against it.
d) If you want to insert the data into another Hive table:
insert overwrite table finalTable select * from TableName;
There's no direct way to insert one record at a time from the terminal; however, here's an easy, straightforward workaround that I usually use when I want to test something:
Assume t is a table with at least one record. It doesn't matter what type or how many columns it has.
INSERT INTO TABLE foo
SELECT '12', 'xyz'
FROM t
LIMIT 1;
Hive apparently supports INSERT...VALUES starting in Hive 0.14.
Please see the section 'Inserting into tables from SQL' at: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
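Assuming Hive 0.14 or later, the statement from the question then works almost as written (string literals are quoted):
insert into table foo values (12, 'xyz');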
Whatever data you have in a text file or log file, you can put it at a path in HDFS and then write a query like the following in Hive:
hive> load data inpath <<specify input path>> into table <<table name>>;
EXAMPLE:
hive> create table foo (id int, name string)
row format delimited
fields terminated by ','   -- or '\t' or '|', whichever matches your file
stored as textfile;
Table created.
DATA INSERTION:
hive>load data inpath '/home/hive/foodata.log' into table foo;
To insert an ad-hoc value like (12, 'xyz'), do this:
insert into table foo select * from (select 12, 'xyz') a;
This is supported from Hive version 0.14:
INSERT INTO TABLE pd_temp(dept,make,cost,id,asmb_city,asmb_ct,retail) VALUES('production','thailand',10,99202,'northcarolina','usa',20)
It's a limitation of Hive:
1. You cannot update data after it is inserted.
2. There is no "insert into table values ..." statement.
3. You can only load data using bulk load.
4. There is no "delete from" command.
5. You can only do bulk delete.
But if you still want to insert a record from the Hive console, you can use the SELECT-based workaround shown above.
You may try this: I have developed a tool to generate Hive scripts from a CSV file. Following are a few examples of how the files are generated.
Tool -- https://sourceforge.net/projects/csvtohive/?source=directory
Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/
The tool generates a Hadoop script covering all the CSV files; the following is a sample of the
generated Hadoop script that loads the CSVs into Hadoop:
#!/bin/bash -v
hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f ./AllstarFull.hive
hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
hive -f ./Appearances.hive
hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f ./AwardsManagers.hive
Sample of the generated Hive scripts:
CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;
You can use the following lines of code to insert values into an already existing table. Here the table is db_name.table_name, which has two columns, and I am inserting 'ALL', 'Done' as a row into the table.
insert into table db_name.table_name
select 'ALL','Done';
Hope this was helpful.
The Hadoop file system does not support appending data to existing files. However, you can load your CSV file into HDFS and tell Hive to treat it as an external table.
Use this -
create table dummy_table_name as select * from source_table_name;
This will create the new table with the existing data from source_table_name.
LOAD DATA [LOCAL] INPATH '' [OVERWRITE] INTO TABLE <table_name>;
Use this command; it loads the data all at once - just specify the file path.
If the file is on the local filesystem, use LOCAL; if the file is already in HDFS, there is no need to use LOCAL.
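For example (the paths here are only illustrative):
LOAD DATA LOCAL INPATH '/home/user/foodata.csv' INTO TABLE foo;   -- file on the local filesystem
LOAD DATA INPATH '/user/hive/foodata.csv' INTO TABLE foo;         -- file already in HDFS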