I am using SQL*Loader to load multiple CSV files into one table.
The process I found is very easy, for example:
LOAD DATA
INFILE '/path/file1.csv'
INFILE '/path/file2.csv'
INFILE '/path/file3.csv'
INFILE '/path/file4.csv'
APPEND INTO TABLE TBL_DATA_FILE
EVALUATE CHECK_CONSTRAINTS
REENABLE DISABLED_CONSTRAINTS
EXCEPTIONS EXCEPTION_TABLE
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COL0,
COL1,
COL2,
COL3,
COL4
)
But I don't want to use INFILE multiple times, because if I have more than 1000 files I would have to mention INFILE 1000 times in the control file.
So my question is: is there any other way (a loop, or a wildcard like *.csv) to load multiple files without using multiple INFILE clauses?
Thanks,
Bithun
Solution 1: Concatenate the 1000 files into one big file, which is then loaded by SQL*Loader. On Unix, I'd use something like
cd path
cat file*.csv > all_files.csv
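If the files happen to carry header rows, a small variation of the same idea (an untested sketch, assuming exactly one header line per file) strips them while concatenating:
cd path
# keep only the data rows of every file (assumes one header line per file)
for f in file*.csv; do tail -n +2 "$f"; done > all_files.csv
The control file then needs only a single INFILE '/path/all_files.csv' clause.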
Solution 2: Use external tables and load the data using a PL/SQL procedure:
CREATE PROCEDURE myload AS
BEGIN
  FOR i IN 1 .. 1000 LOOP
    -- point the external table at the next file: 1.csv, 2.csv, ...
    EXECUTE IMMEDIATE 'ALTER TABLE xtable LOCATION ('''||to_char(i,'FM9999')||'.csv'')';
    INSERT INTO mytable SELECT * FROM xtable;
  END LOOP;
END;
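For this to work, xtable has to be an external table that reads CSV files from an Oracle directory object. A minimal sketch of what it might look like; the directory name data_dir and the VARCHAR2 column types are assumptions based on the question's control file:
CREATE TABLE xtable (
  col0 VARCHAR2(100),
  col1 VARCHAR2(100),
  col2 VARCHAR2(100),
  col3 VARCHAR2(100),
  col4 VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir   -- assumed directory object pointing at /path
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('1.csv')           -- swapped out by the ALTER TABLE in the loop
);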
You can use wildcards (? for a single character, * for any number of characters) like this:
infile 'file?.csv'
;)
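Applied to the control file from the question, that would look something like the sketch below (untested; wildcard support in INFILE depends on your SQL*Loader version):
LOAD DATA
INFILE '/path/file*.csv'
APPEND INTO TABLE TBL_DATA_FILE
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(COL0, COL1, COL2, COL3, COL4)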
Loop over the files from the shell:
#!/bin/bash
for csvFile in file*.csv
do
    ln -s "$csvFile" tmpFile.csv
    sqlldr control=file_pointing_at_tmpFile.ctl
    rm tmpFile.csv
done
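Note that sqlldr also needs connect credentials; a sketch of a fuller invocation, where scott/tiger is just a placeholder account:
sqlldr userid=scott/tiger control=file_pointing_at_tmpFile.ctl log=tmpFile.log
The control file itself would simply name tmpFile.csv in its INFILE clause, so every pass of the loop loads whichever file the symlink currently points at.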
OPTIONS (skip=1)
LOAD DATA
INFILE '/export/home/applmgr1/chalam/Upload/*.csv'
REPLACE INTO TABLE XX_TEST_FTP_UP
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(FULL_NAME,EMPLOYEE_NUMBER)
Will this pick up all the CSV files in that directory and load their data, or not?
I am trying to load a local file with "|"-delimited values into a Hive table. We usually create the table with the option ROW FORMAT DELIMITED FIELDS TERMINATED BY "|", but I want to create a normal table and then load the data. What is the right syntax I need to use? Please suggest.
Working Code
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE (
  column1 string,
  column2 bigint
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE;
But I want to do:
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE (
  column1 string,
  column2 bigint
);
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE FIELDS TERMINATED BY '|';
Reason being: if I create the table that way, HDFS will store the data in the table with the "|" delimiter.
With the second DDL you have provided, Hive will create a table in its default format (TextFile, ORC, Parquet, etc., depending on your configuration), and a default text table is written as a Ctrl-A-delimited file (the default delimiter in Hive).
If you want the HDFS file to be stored pipe-delimited, then you need to create the Hive table as a text table with the | delimiter.
Alternatively, you can write the result of a SELECT query to a local (or HDFS) path with a pipe delimiter, as shown below.
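A sketch of that approach, assuming a local output directory /tmp/pipe_out and the table from the question:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/pipe_out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM testdb.TEST_DATA_TABLE;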
In PostgreSQL I previously created a table like so:
CREATE TABLE IF NOT EXISTS stock_data (
code varchar,
date date,
open decimal,
high decimal,
low decimal,
close decimal,
volume decimal,
UNIQUE (code, date)
);
The idea is to import multiple CSV files into this table. My approach is to use COPY ... FROM STDIN instead of COPY ... FROM '/path/to/file', as I want to be able to cat multiple CSV files from the shell and pipe them into the SQL script. The SQL script to accomplish this currently looks like this:
CREATE TEMPORARY TABLE IF NOT EXISTS stock_data_tmp (
code varchar,
ddate varchar,
open decimal,
high decimal,
low decimal,
close decimal,
volume decimal,
UNIQUE (code, ddate)
);
\copy stock_data_tmp FROM STDIN WITH CSV;
INSERT INTO stock_data
SELECT code, to_date(ddate, 'YYYYMMDD'), open, high, low, close, volume
FROM stock_data_tmp;
DROP TABLE stock_data_tmp;
An example csv file looks like this
AAA,20140102,21.195,21.24,1.16,1.215,607639
BBB,20140102,23.29,2.29,2.17,2.26,1863
CCC,20140102,61.34,0.345,0.34,0.34,112700
DDD,20140102,509.1,50.11,50.09,50.11,409863
From the shell I try:
cat /path/to/20140102.txt | psql -d my_db_name -f ~/path/to/script/update_stock_data.sql
But it gives me this error:
psql:/path/to/script/update_stock_data.sql:22: ERROR: missing data for column "date"
CONTEXT: COPY stock_data_tmp, line 1: ""
However, if in my script I change the COPY command to:
\copy stock_data_tmp FROM '/path/to/20140102.txt' WITH csv;
... and simply call
psql -d my_db_name -f ~/path/to/script/update_stock_data.sql
it succeeds.
Why am I getting this error when using cat and STDIN, and not when using the file PATH?
Because when you run the script with -f, stdin for \copy means the command's source, i.e. the script file itself, not the data you piped into psql.
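One way around it, if I recall psql's behaviour correctly, is pstdin, which always refers to psql's own standard input even when the commands come from a -f script. The only change in the script would be:
\copy stock_data_tmp FROM pstdin WITH CSV;
With that, the original pipeline (cat /path/to/20140102.txt | psql -d my_db_name -f update_stock_data.sql) should work as intended.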
I am using the copy command to dump the data of a table in PostgreSQL to a txt file. I run the following command in PSQL:
\copy (select * from TableName) to 'C:\Database\bb.txt' with delimiter E'\t' null as '';
Now, in the bb.txt file I see some special characters that are not in the table itself. The database has been configured with UTF8 encoding.
For example: when I run the above copy query, suppose the special character shows up in the row with ID=5. If I run the same copy query restricted to that row, the special character is not there:
\copy (select * from TableName where ID=5) to 'C:\Database\bb.txt' with delimiter E'\t' null as '';
This happens on a Windows machine. Can someone tell me where these special characters are coming from?
Hi, I'm new to Hive and would definitely appreciate some tips.
I'm trying to export Hive query results as a CSV from the CLI.
I can export them as text using:
hive -e 'set hive.cli.print.header=true; SELECT * FROM TABLE_NAME LIMIT 0;' > /file_path/file_name.txt
Can anyone suggest what I need to add in order to get the columns delimited by ','?
This is how you can do it directly from Hive, instead of going through the sed route.
SET hive.exec.compress.output=FALSE;
SET hive.cli.print.header=TRUE;
INSERT overwrite local directory '/file_path/file_name.txt' row format delimited fields terminated by ',' SELECT * FROM TABLE_NAME LIMIT 1;
You can use the concat_ws() function in your query like this:
For SELECT *
select concat_ws(',',*) from <table-name>;
Or if you want particular columns:
select concat_ws(',', col_1, col_2, col_3...) from <table-name>;
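Combined with the hive -e invocation from the question, that might look like the following sketch (the column names are placeholders):
hive -e "select concat_ws(',', col_1, col_2, col_3) from table_name;" > /file_path/file_name.csv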
hive -e 'set hive.cli.print.header=true; SELECT * FROM TABLE_NAME LIMIT 0;' > /file_path/file_name.txt && cat /file_path/file_name.txt | sed -e 's/\s/,/g' > /file_path/file_name.formatted.txt
Once your query creates the output file, use sed to replace the whitespace with "," as shown above.
I have a bunch of emails, each separated by a comma (,) (not a CSV file). I want to upload them to a database table (with a single field, email) such that each email goes into a separate record. What would be the easiest way to do that? I have an idea of using grep to replace the commas with my SQL syntax, but I'm looking for any other workaround. Any ideas?
Perhaps something like:
LOAD DATA INFILE '/where/the/file/is'
INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES STARTING BY ''
(email);
Syntax docs here: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
I'd use shell tools like sed or awk to convert the input format to something that mysqlimport can handle.
Convert the current ','-separated email list to a one-line-per-email list:
tr ',' '\n' < inputfilename > outputfilename
Use LOAD DATA INFILE after logging into MySQL; make sure your table only has one column in this case:
load data infile 'outputfilename' into table tablename;
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
MySQL supports multiple inserts in a single statement:
INSERT INTO [Table] ([col1], [col2], ... [colN] )
VALUES ([value1], [value2], ... [valueN] )
, ([value1], [value2], ... [valueN] )
, ([value1], [value2], ... [valueN] )
;
You could pretty quickly format a comma-separated file into this format.
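For instance, a sed sketch along those lines (assuming the emails sit on one line in emails.txt, contain no quotes or embedded commas, and the table/column names emails/email are placeholders):
sed "s/,/'),('/g; s/^/INSERT INTO emails (email) VALUES ('/; s/$/');/" emails.txt > insert.sql
The resulting insert.sql can then be fed to mysql from the command line.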