Loading 1500 CSV files with sqlldr

I have more than 1500 CSV files to load into Oracle 11gR2. I'm using sqlldr in a Windows environment. I know I can load the files as follows, but it's a really bad way for many reasons:
load data
infile 'FILE_1.csv'
infile 'FILE_2.csv'
infile 'FILE_3.csv'
infile 'FILE_4.csv'
infile 'FILE_5.csv'
.
.
.
infile 'FILE_1500.csv'
append
into table MyTable
fields terminated by ' '
trailing nullcols
(
A,
B,
C,
D,
E,
F,
G
)
I'm looking for an automatic way to load a whole folder of files into the DB, file by file (I don't want to merge the files, since they are huge).
Any idea?

Use an EXTERNAL TABLE and pass the file names to it. On 11gR2 you can also use the PREPROCESSOR directive.
You could even pass the file names dynamically. Have a look at this AskTom thread for more details: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3015912000346648463
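If you would rather stay with sqlldr, another option (not the external-table approach above, just a rough sketch) is a small driver script: keep one control file with a single generic INFILE and invoke sqlldr once per file, overriding the data file via the DATA command-line parameter. The directory, control file name and connect string below are placeholders.
import glob
import subprocess

CTL = "mytable.ctl"               # control file with a single INFILE clause
CONNECT = "user/password@ORCL"    # placeholder connect string

# Load every CSV in the folder, one sqlldr run per file.
for path in sorted(glob.glob(r"C:\loads\FILE_*.csv")):
    subprocess.run(
        [
            "sqlldr",
            "userid={}".format(CONNECT),
            "control={}".format(CTL),
            "data={}".format(path),      # DATA overrides the INFILE in the control file
            "log={}.log".format(path),
            "errors=99999",
        ],
        check=True,                      # stop at the first failed load
    )
This keeps the files separate (no merging) and gives you one log per input file.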

Related

How to execute pig script and save the result in another file?

I have a "solution.pig" file which contain all load, join and dump queries. I need to run them by typing "solution.pig" in grunt> and save all the result in other file. How can I do that?
You can run the file directly with pig -f solution.pig; don't open the grunt REPL.
In the file, you can use as many STORE commands as you want to save results into files, rather than DUMP.

Apache pig load multiple files

I have the following folder structure containing my content, all adhering to the same schema:
/project/20160101/part-v121
/project/20160105/part-v121
/project/20160102/part-v121
/project/20170104/part-v121
I have implemented a Pig script which uses JsonLoader to load and process individual files. However, I need to make it generic so it reads all the files under the dated folders.
Right now I have managed to extract the file paths using the following:
hdfs -ls hdfs://local:8080/project/20* > /tmp/ei.txt
cat /tmp/ei.txt | awk '{print $NF}' | grep part > /tmp/res.txt
Now I need to know how to pass this list to the Pig script so that my program runs on all the files.
You can use a glob path in the LOAD statement.
In your case the statement below should help; let me know if you face any issues.
A = LOAD 'hdfs://local:8080/project/20*/*' USING JsonLoader();
This assumes a .pig_schema file (produced by JsonStorage) is present in the input directories.
Ref : https://pig.apache.org/docs/r0.10.0/func.html#jsonloadstore

Hive Reading external table from compressed bz2 file

This is my scenario.
I have a bz2 file in Amazon S3. Within the bz2 file there are files with .dat, .met and .sta extensions. I am only interested in the files with the *.dat extension. You can download this sample file to take a look at the bz2 file.
create external table cdr (
anum string,
bnum string,
numOfTimes int
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://mybucket/dir'; -- the bz2 archive is inside here
The problem is that when I execute the above statement and query the table, some of the records/rows have issues:
1) All the data from the *.sta and *.met files is also included.
2) The metadata of the file names is also included.
The only idea I had was to show INPUT_FILE_NAME, but all the records/rows had the same INPUT_FILE_NAME, which was the filename.tar.bz2.
Any suggestions are welcome. I am currently completely lost.
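Hive only decompresses the bz2 stream; it does not unpack the tar container, which is why the tar headers (the file names) and the non-.dat members all end up as rows. One possible workaround, not from this thread, is to repack the data before Hive sees it: extract only the *.dat members and upload them to a separate S3 prefix that the external table's LOCATION points at. A rough sketch; the local paths, bucket, prefix and the use of boto3 are assumptions for illustration.
import os
import tarfile

import boto3  # assumed available for the S3 upload

ARCHIVE = "filename.tar.bz2"                   # local copy of the archive
OUT_DIR = "dat_only"                           # holds only the .dat files
BUCKET, PREFIX = "mybucket", "dir_dat_only/"   # hypothetical target location

os.makedirs(OUT_DIR, exist_ok=True)
with tarfile.open(ARCHIVE, "r:bz2") as tar:
    for member in tar.getmembers():
        if member.isfile() and member.name.endswith(".dat"):
            member.name = os.path.basename(member.name)  # flatten paths
            tar.extract(member, OUT_DIR)

s3 = boto3.client("s3")
for name in os.listdir(OUT_DIR):
    s3.upload_file(os.path.join(OUT_DIR, name), BUCKET, PREFIX + name)
Then point the external table's LOCATION at 's3://mybucket/dir_dat_only' instead of the directory containing the archive.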

Using Pentaho Kettle, how can I convert a csv using commas to a csv with pipe delimiters?

I have a CSV input file with commas. I need to change the delimiter to a pipe. Which step should I use in Pentaho Kettle? Please do suggest.
Thanks!
Don't use a big gun to shoot at a small target. You can use sed or awk. Or, if you want to integrate with Kettle, you can use a step that runs a shell script and use sed inside that script, for example.
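If you go the script route, here is a rough equivalent in Python instead of sed; the csv module keeps quoted fields that contain commas intact, which a blanket sed 's/,/|/g' would not. File names are placeholders.
import csv

# Rewrite a comma-separated file as a pipe-separated one.
with open("input.csv", newline="") as src, \
        open("output.csv", "w", newline="") as dst:
    writer = csv.writer(dst, delimiter="|")
    for row in csv.reader(src):
        writer.writerow(row)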
If your goal is to output a pipe separated CSV file from data within a transform and you're already running Kettle, just use a Text File output step.
If the goal is to do something unusual with CSV data within the transform itself, you might look into the Concat Fields step.
If the goal is simply to take a CSV file and write out another CSV with different separators, use the solution #martinnovoty suggests.
You can achieve this easily:
Add a JavaScript step after the step that loads your CSV into a variable "foo", and add this code to the JS step:
var newFoo = replace(foo, ",", "|");
Now your CSV data is in the newFoo variable with pipes.

SQLLDR control file: Loading multiple files

I am trying to load several data files into a single table. The files themselves have the following naming format:
file_uniqueidentifier.dat_date
My control file looks like this
LOAD DATA
INFILE '/home/user/file*.dat_*'
into TABLE NEWFILES
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
FIRSTNAME CHAR NULLIF (FIRSTNAME=BLANKS)
,LASTNAME CHAR NULLIF (LASTNAME=BLANKS)
)
My sqlldr invocation, on the other hand, looks like this:
sqlldr control=loader.ctl, userid=user/pass@oracle, errors=99999, direct=true
The errors produced are SQL*Loader-500: unable to open file (/home/user/file*.dat_*) and SQL*Loader-553: file not found.
Does anyone have an idea as to how I can deal with this issue?
SQLLDR does not recognize the wildcard. The only way to have it use multiple files is to list them explicitly. You could probably do this using a shell script.
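For example, a throwaway script (Python here; a shell loop works just as well) can expand the wildcard itself and write a control file with one INFILE line per match. The table and columns come from the question; the generated file name is made up.
import glob

files = sorted(glob.glob("/home/user/file*.dat_*"))

lines = ["LOAD DATA"]
lines += ["INFILE '{}'".format(f) for f in files]   # one INFILE per data file
lines += [
    "INTO TABLE NEWFILES",
    "FIELDS TERMINATED BY ','",
    "TRAILING NULLCOLS",
    "(",
    " FIRSTNAME CHAR NULLIF (FIRSTNAME=BLANKS)",
    ",LASTNAME CHAR NULLIF (LASTNAME=BLANKS)",
    ")",
]

with open("loader_generated.ctl", "w") as ctl:
    ctl.write("\n".join(lines) + "\n")

# Then run sqlldr against the generated control file, e.g.:
#   sqlldr control=loader_generated.ctl userid=user/pass@oracle errors=99999 direct=true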
Your file naming convention looks like you could combine those files into one and have that single file used by the sqlldr control file. I don't know how you would combine those files into one file in Unix, but in Windows I can issue this command:
copy file*.dat* file.dat
This command reads the contents of all the files whose names start with file and have a dat extension and puts them into the file.dat file.
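The same "combine them first" step can be scripted on Unix as well, for example with cat file*.dat_* > file.dat, or in Python (the output name is arbitrary):
import glob
import shutil

# Concatenate every matching data file into a single file.dat.
with open("file.dat", "wb") as combined:
    for path in sorted(glob.glob("file*.dat_*")):
        with open(path, "rb") as part:
            shutil.copyfileobj(part, combined)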
I have used this option and it works fine for loading multiple files into a single table:
-- SQL-Loader Basic Control File
options ( skip=1 )
load data
infile 'F:\oracle\dbHome\BIN\sqlloader\multi_file_insert\dept1.csv'
infile 'F:\oracle\dbHome\BIN\sqlloader\multi_file_insert\dept2.csv'
truncate into table scott.dept2
fields terminated by ","
optionally enclosed by '"'
( DEPTNO
, DNAME
, LOC
, entdate
)