SQLLDR control file: Loading multiple files

I am trying to load several data files into a single table. The files themselves have the following naming format:
file_uniqueidentifier.dat_date
My control file looks like this:
LOAD DATA
INFILE '/home/user/file*.dat_*'
into TABLE NEWFILES
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
FIRSTNAME CHAR NULLIF (FIRSTNAME=BLANKS)
,LASTNAME CHAR NULLIF (LASTNAME=BLANKS)
)
My sqlldr command line, on the other hand, looks like this:
sqlldr control=loader.ctl, userid=user/pass#oracle, errors=99999,direct=true
The errors produced are:
SQL*Loader-500: Unable to open file (/home/user/file*.dat_*)
SQL*Loader-553: file not found
Does anyone have an idea as to how I can deal with this issue?

SQLLDR does not recognize the wildcard. The only way to have it use multiple files is to list them explicitly. You could probably do this using a shell script, along the lines of the sketch below.
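For example, a minimal sh sketch (not from the original answer; the credentials are placeholders, and it relies on the command-line DATA parameter overriding the control file's INFILE):

# Run one sqlldr pass per matching data file; DATA= overrides the INFILE
# clause in loader.ctl for each run.
for f in /home/user/file*.dat_*; do
  sqlldr control=loader.ctl userid=user/pass data="$f" errors=99999 direct=true
done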

Your file naming convention suggests that you could combine those files into one and have the sqlldr control file use the combined file. I don't know how to combine the files into one on Unix, but on Windows I can issue this command:
copy file*.dat* file.dat
This command reads the contents of every file whose name starts with file and whose extension starts with dat, and concatenates them into file.dat.
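On Unix, the equivalent concatenation can be done with cat (the output name is a placeholder):

cat /home/user/file*.dat_* > /home/user/combined.dat

The control file's INFILE would then point at the single combined file.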

I have used the option below, and it works fine for uploading multiple files into a single table.
-- SQL-Loader Basic Control File
options ( skip=1 )
load data
infile 'F:\oracle\dbHome\BIN\sqlloader\multi_file_insert\dept1.csv'
infile 'F:\oracle\dbHome\BIN\sqlloader\multi_file_insert\dept2.csv'
truncate into table scott.dept2
fields terminated by ","
optionally enclosed by '"'
( DEPTNO
, DNAME
, LOC
, entdate
)
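For reference, a matching invocation would be something along these lines (the credentials, control file name, and log name are placeholders):

sqlldr userid=scott/tiger control=multi_file.ctl log=multi_file.log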

Related

command line bq load that pulls variable out of load file name for table name

I have a Google Storage directory that contains several files with the same file layout and the same file naming convention, for different states. I want to run one command-line bq load that creates a separate BigQuery table for each file and that also pulls the 2-letter state abbreviation out of each load file name to include it in the created table name.
The file naming convention is:
PUF[STATE]Plan2021.csv (e.g., PUFCAPlan2021.csv, PUFMDPlan2021.csv, PUFMNPlan2021.csv, etc.)
I want a table created for each file like:
HIOS_STATE_PLAN_ATTR_CA_RAW_2021, HIOS_STATE_PLAN_ATTR_MD_RAW_2021, HIOS_STATE_PLAN_ATTR_MN_RAW_2021
So, in the below example, I'd want to read the "CA" out of the "PUFCAPlan2021.csv" file name and use it for the [STATE] in the table name, for each of the files in the directory. I have no idea if this is possible.
REM create a raw file
ECHO Creating raw file
call bq load --skip_leading_rows=1 ^
meritagedata:PAYER.HIOS_STATE_PLAN_ATTR_[STATE]_RAW_2021 gs://payer_raw_files/HIOS_STATE_PLAN_ATTR/2021/PUFCAPlan2021.csv ^
I know I can do something like below, but that still requires running a separate command line for each file in the directory. I'm wondering if it's possible to create one load statement that will create a separate table for each file and insert the state abbreviation into the table name; see the sketch after this example.
REM create a raw file
SET STATE=CA
ECHO Creating raw file
call bq load --skip_leading_rows=1 ^
meritagedata:PAYER.HIOS_STATE_PLAN_ATTR_%STATE%_RAW_2021 gs://payer_raw_files/HIOS_STATE_PLAN_ATTR/2021/PUF%STATE%Plan2021.csv ^
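As far as I know, a single bq load invocation always targets one table, so there is no one-statement form of this; the usual workaround is a batch loop. A sketch (not from the original thread; the gsutil listing, the two-letter state assumption, and the omitted schema/format flags are all assumptions):

REM Hypothetical sketch: list the bucket, strip everything up to "PUF" from
REM each URL, and take the next two characters as the state abbreviation.
SETLOCAL ENABLEDELAYEDEXPANSION
FOR /F "delims=" %%F IN ('gsutil ls "gs://payer_raw_files/HIOS_STATE_PLAN_ATTR/2021/PUF*Plan2021.csv"') DO (
    SET "URL=%%F"
    SET "NAME=!URL:*PUF=!"
    SET "STATE=!NAME:~0,2!"
    ECHO Creating raw table for state !STATE!
    REM append your schema/format flags here, as in the original command
    call bq load --skip_leading_rows=1 meritagedata:PAYER.HIOS_STATE_PLAN_ATTR_!STATE!_RAW_2021 %%F
)
ENDLOCAL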

Mass extract part of a text file using Windows batch

I have thousands of txt files that are actually in JSON format.
Each file has the same string, with different values, namely:
"Name":"xxx","Email":"yyy#zzz.com"
I want to extract the values of these two strings from all the txt files that I put in the same folder.
I've found these lines of code:
Extract part of a text file using Windows batch
but they only apply to one txt file, whereas I need something that can process every file in one folder.
You can use the FORFILES command to loop through each file.
Syntax
FORFILES [/p Path] [/m SrchMask] [/s] [/c Command] [/d [+ | -] {date | dd}]
For details, see the following webpage:
https://ss64.com/nt/forfiles.html
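A hedged, concrete example (the folder, search string, and output file are placeholders, not from the original answer), printing every line that contains Name from each .txt file into one results file:

REM For each .txt file in C:\data, append lines containing "Name" to a results
REM file; the output uses a different extension so FORFILES does not pick it up.
FORFILES /p C:\data /m *.txt /c "cmd /c findstr Name @path >> C:\data\results.out"

FORFILES expands @path to the quoted full path of each matching file before running the command.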

Get file name from SAP Data service

I'm unable to read a file name from Data Services when it contains a date_time component: I can compute the date part, but the time part is variable. I've tried *.csv in the flat file's file name(s) property, but that only works for a static file name.
Example: File_20180520_200003.csv, File_20180519_192503.csv, etc.
My script:
$Filename= 'File_'|| to_char(sysdate()-1, 'YYYYMMDD')|| '_'|| '*.csv';
I want to find a solution to read the 6 time digits (any number), which is what the * stands for.
Finally, I've found a solution by using
$Csv = word(exec('cmd','dir /b [$Filename]*.csv',8),2) ;
Then I set $Csv as the flat file's file name property, and it works fine.

Loading 1500 CSV files with sqlldr

I have more than 1500 CSV files to load into Oracle 11gR2. I'm using sqlldr in a Windows environment. I know I can load the files as follows, but it's a really bad way for many reasons.
load data
infile 'FILE_1.csv'
infile 'FILE_2.csv'
infile 'FILE_3.csv'
infile 'FILE_4.csv'
infile 'FILE_5.csv'
.
.
.
infile 'FILE_1500.csv'
append
into table MyTable
fields terminated by ' '
trailing nullcols
(
A,
B,
C,
D,
E,
F,
G
)
I'm looking for an automatic way to load a whole folder of files into the DB, file by file (I don't want to merge the files, since they are huge).
Any idea?
Use an EXTERNAL TABLE and pass the file names to it. On 11gR2, you could use the PREPROCESSOR directive.
You could even pass the file names dynamically. Have a look at this AskTom thread for more details: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3015912000346648463
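Alternatively (this is not part of the answer above, just a common scripting workaround), a batch loop can run sqlldr once per file, using the DATA command-line parameter to override the control file's INFILE; a sketch for a .bat file, with placeholder folder, credentials, and control file name:

REM One sqlldr run per CSV; DATA= overrides the INFILE named in loader.ctl
FOR %%F IN (C:\data\*.csv) DO (
    sqlldr userid=user/pass control=loader.ctl data=%%F log=%%F.log
)

Since the control file uses append, each run adds to the same table.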

Hive Reading external table from compressed bz2 file

This is my scenario.
I have a bz2 file in Amazon S3. Within the bz2 file there are files with .dat, .met, and .sta extensions. I am only interested in the files with a .dat extension. You can download this sample file to take a look at the bz2 file.
create external table cdr (
anum string,
bnum string,
numOfTimes int
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://mybucket/dir'; -- the compressed file is inside here
The problem is that when I execute the above command, some of the records/rows have issues:
1) All the data from the *.sta and *.met files is also included.
2) The metadata of the file names is also included.
The only idea I had was to check INPUT_FILE_NAME, but all the records/rows had the same INPUT_FILE_NAME, which was the name of the .tar.bz2 archive itself.
Any suggestions are welcome. I am currently completely lost.