import csv error using SQL Loader and perl - sql

Hello all, I have a question I hope you guys can help me with; I've tried to include all the relevant info. I'm building a Perl script that will eventually loop through different SQL*Loader control files and import their respective CSV data into Oracle database tables. I'm testing multiple control-file loads before looping over them.
The problem is that I get an error even though the script connects to the DB and uploads all the CSV data without any problems that I can see. All the rows are accounted for, and the log doesn't really help:
================================================================================
[root@sanasr06 scripts]# perl db_upload.pl
connection made! Starting database upload...
Error: Can't open import control_general to SQL DB : at db_upload.pl line 44
================================================================================
Line 44 is this system() call:
system ("sqlldr $userid\@$sid/$passwd control=$control_pools log=$log silent=all ") or $logger->logdie("Error: Can't open import control data to SQL DB :$!");
I'm including the SQL*Loader log output, the Perl script and the control file (the skip mentioned is for the CSV header row):
SQL*Loader: Release 11.2.0.1.0 - Production on Tue Aug 14 12:32:36 2012
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Control File: /despliegue/san/project/sql_ctrl/general.ctl
Character Set UTF8 specified for all input.
Data File: /despliegue/san/project/csv/Pools.csv
Bad File: /despliegue/san/project/logs/sql_error.bad
Discard File: /despliegue/san/project/logs/sql_discard.dsc
(Allow all discards)
Number to load: ALL
Number to skip: 1
Errors allowed: 50
Bind array: 64 rows, maximum of 256000 bytes
Continuation: none specified
Path used: Conventional
Silent options: FEEDBACK, ERRORS and DISCARDS
Table I_GENERAL, loaded from every logical record.
Insert option in effect for this table: TRUNCATE
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
OBJECTID (FILLER FIELD) FIRST * , O(") CHARACTER
DESCRIPTION (FILLER FIELD) NEXT * , O(") CHARACTER
SERIALNUMBER NEXT * , O(") CHARACTER
PRODUCT_NAME NEXT * , O(") CHARACTER
CONTROLLER_VERSION NEXT * , O(") CHARACTER
NUMBER_OF_CONTROLLERS NEXT * , O(") CHARACTER
CAPACITY_GB NEXT * , O(") CHARACTER
PRODUCT_CODE NEXT * , O(") CHARACTER
value used for ROWS parameter changed from 64 to 15
Table I_GENERAL:
2512 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 247680 bytes(15 rows)
Read buffer bytes: 1048576
Total logical records skipped: 1
Total logical records read: 2512
Total logical records rejected: 0
Total logical records discarded: 0
Run began on Tue Aug 14 12:32:36 2012
Run ended on Tue Aug 14 12:32:38 2012
==================================================================================
The above log is of course shortened, but it includes all the relevant information.
Here's the Perl script.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Log::Log4perl;
#this script loads multiple saved csv files into the database using the control files
################ Initialization #############################################
my $homepath = "/despliegue/san/project";
my $log_conf = "$homepath/logs/log.conf";
Log::Log4perl->init($log_conf) or die("Error: Can't open log.config Does it exist? $!");
my $logger = Log::Log4perl->get_logger();
################ database connection variables####
my ($serial, $model);
my $host="me.notyou33.safety";
my $port="1426";
my $userid="user";
my $passwd="pass";
my $sid="sid";
my $log="$homepath/logs/sql_import.log";
#Control file location
my $control_pools = "$homepath/sql_ctrl/pools.ctl";
my $control_general = "$homepath/sql_ctrl/general.ctl";
my $control_ports = "$homepath/sql_ctrl/ports.ctl";
my $control_replication = "$homepath/sql_ctrl/replication.ctl";
#######################Database connection and data upload #################
my $dbh = DBI->connect( "dbi:Oracle:host=$host;sid=$sid;port=$port", "$userid", "$passwd",
{ RaiseError => 1}) or $logger->logdie ("Database connection not made: $DBI::errstr");
print " connection made! Starting database upload...\n";
system ("sqlldr $userid\#$sid/$passwd control=#control_general log=$log silent=all") or $logger->logdie("Error: Can't open import control_general to SQL DB :$!");
print "one done moving to next one\n";
system ("sqlldr $userid\#$sid/$passwd control=#control_pools log=$log silent=all ") or $logger->logdie("Error: Can't open import control data to SQL DB :$!");
system ("sqlldr $userid\#$sid/$passwd control=#control_ports log=$log ") or $logger->logdie("Error: Can't open import control data to SQL DB :$!");
print "three done moving to last one\n";
system ("sqlldr $userid\#$sid/$passwd control=#control_replication log=$log silent=feedback ") or $logger->logdie("Error: Can't open import control data to SQL DB :$!");
print "................Done\n";
############################################################################
$dbh->disconnect;
==================================================================================
the control file:
OPTIONS (SKIP=1)
LOAD DATA
CHARACTERSET UTF8
INFILE '/despliegue/san/project/csv/Pools.csv'
BADFILE '/despliegue/san/project/logs/sql_error.bad'
DISCARDFILE '/despliegue/san/project/logs/sql_discard.dsc'
TRUNCATE INTO TABLE I_GENERAL
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY "\""
TRAILING NULLCOLS
(
OBJECTID FILLER,
DESCRIPTION FILLER,
SERIALNUMBER,
PRODUCT_NAME,
CONTROLLER_VERSION,
NUMBER_OF_CONTROLLERS,
CAPACITY_GB,
PRODUCT_CODE
)

system() returns the return value of the wait call, which includes the return value of the program you executed. If everything goes right, this will be 0. This is different from almost all other functions in Perl, where you expect a return value that evaluates to true in boolean context. Therefore the commonly used error handling with the or operator does not work properly here. You might want to try something like this instead:
system ("sqlldr $userid\#$sid/$passwd control=#control_pools log=$log silent=all") == 0
or $logger->logdie("Error: Can't open import control data to SQL DB :$?");
You can read more about handling the return value of system() in the documentation under perldoc -f system
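Since the script is eventually meant to loop through the different control files, a minimal sketch of that loop with the corrected error check might look like this (it simply reuses the variables already defined in the script above):
my @control_files = ($control_general, $control_pools, $control_ports, $control_replication);
for my $ctl (@control_files) {
    system("sqlldr $userid\@$sid/$passwd control=$ctl log=$log silent=all") == 0
        or $logger->logdie("Error: sqlldr failed for $ctl (exit status " . ($? >> 8) . ")");
}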

There is Logdie which should be logdie, AFAIU

The problem is that the system call expects a return value of 0 to mean "successful". Your sqlldr job, if it skips or discards a record, will not return 0 (I've seen it return 2; check the docs to be sure). So, unless you load all records successfully, your Perl script (as written) will exit out.
perl system
sqlldr return codes
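Building on that, a minimal sketch (reusing the variables from the question's script, and the Unix exit codes documented in the links above) that treats the warning exit code as non-fatal instead of dying:
system("sqlldr $userid\@$sid/$passwd control=$control_pools log=$log silent=all");
my $rc = $? >> 8;    # sqlldr on Unix: 0 = success, 1 = error, 2 = warning (rows skipped/rejected/discarded), 3 = fatal
if ($rc == 2) {
    $logger->warn("sqlldr finished with skipped/rejected/discarded rows, check $log");
}
elsif ($rc != 0) {
    $logger->logdie("sqlldr failed with exit code $rc");
}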

In my case I execute sqlldr with backticks (similar to system()); that lets me capture any feedback in a variable.
my $sqlldr = "sqlldr userid=usr/pss\@TNS control=\'$controlfile\' log=\'$logfile\' silent=header,feedback";
$execution = `$sqlldr 2>&1`;
The trick is that the value Perl gives you back in $? is the raw wait status, with the exit code in the high byte, so you have to shift it right 8 bits to get sqlldr's actual exit code. In my case I do it as follows:
# Get the returned code from the last execution
my $ret = $? >> 8;
if ($ret == 0) {
    $logger->info("Class DLA_Upload: All rows were successfully loaded");
}
elsif ($ret == 1) {
    die("Class DLA_Upload: Executing sqlldr returned the following error:\n$execution");
}
elsif ($ret == 2) {
    $logger->info("Class DLA_Upload: SQL*Loader was executed but some or all rows were rejected or discarded, please check $logfile for further information");
}
else {
    die("Class DLA_Upload: FATAL ERROR: sqlldr corrupted or not found");
}
Here you have a link from PerlMonks that explains it properly.

Related

How to Rollback DB2 Ingest statement for malformed data

I have a Bash shell script that runs a DB2 SQL file. The job of this SQL file is to completely replace the contents of a database table with whatever the contents of this SQL file are.
However, I also need that database table to have its contents preserved if errors are discovered in the ingested file. For example, supposing my table currently looks like this:
MY_TABLE
        C1    C2
row0    15    27
row1    19    20
And supposing I have an input file that looks like this:
15,28
34,90
"a string that's obviously not supposed to be here"
54,23
If I run the script with this input file, the table should stay exactly the same as it was before, not using the contents of the file at all.
However, when I run my script, this isn't the behavior I observe: instead, the contents of MY_TABLE do get replaced with all of the valid rows of the input file so the new contents of the table become:
MY_TABLE
        C1    C2
row0    15    28
row1    34    90
row2    54    23
In my script logic, I explicitly disable autocommit for the part of the script that ingests the file, and I only call commit after I've checked that the sql execution returned no errors; if it did cause errors, I call rollback instead. Nonetheless, the contents of the table get replaced when errors occur, as though the rollback command wasn't called at all, and a commit was called instead.
Where is the problem in my script?
script.ksh
SQL_FILE=/app/scripts/script.db2
LOG=/app/logs/script.log
# ...
# Boilerplate to setup the connection to the database server
# ...
# +c: autocommit off
# -v: echo commands
# -s: Stop if errors occur
# -p: Show prompt for interactivity (for debugging)
# -td#: use '#' as the statement delimiter in the file
db2 +c -s -v -td# -p < $SQL_FILE >> $LOG
if [ $? -gt 2 ]; then
    echo "An Error occurred; rolling back the data" >> $LOG
    db2 "ROLLBACK" >> $LOG
    exit 1
fi
# No errors, commit the changes
db2 "COMMIT" >> $LOG
script.db2
ingest from file '/app/temp/values.csv'
format delimited by ','
(
$C1 INTEGER EXTERNAL,
$C2 INTEGER EXTERNAL
)
restart new 'SCRIPT_JOB'
replace into DATA.MY_TABLE
(
C1,
C2
)
values
(
$C1,
$C2
)#
Adding as answer per OP's suggestion:
Per the db2 documentation for the INGEST command, it appears that the +c (autocommit off) option will not function:
Updates from the INGEST command are committed at the end of an ingest
operation. The INGEST command issues commits based on the commit_period
and commit_count configuration parameters. As a result of this, the
following do not affect the INGEST command: the CLP -c or +c options, which
normally affect whether the CLP automatically commits; the NOT LOGGED
INITIALLY option on the CREATE TABLE statement.
You probably want to set the warningcount 1 option, which will cause the command to terminate after the first error or warning. The default behaviour is to continue processing while ignoring all errors (warningcount 0).

Can't export sqlite to json because the json option disappeared?

I'm trying to export specific columns of my sqlite database into json.
I found a question relevant to my issue here: Impossible to export SQLite query results to JSON?
Where they used the sqlite3 -json option to solve their problem and export their sqlite database to json.
However, when I try to do the same terminal command I get this in return:
sqlite3: Error: unknown option: -json
and sqlite3 -help returns the following list of options:
-append append the database to the end of the file
-ascii set output mode to 'ascii'
-bail stop after hitting an error
-batch force batch I/O
-column set output mode to 'column'
-cmd COMMAND run "COMMAND" before reading stdin
-csv set output mode to 'csv'
-deserialize open the database using sqlite3_deserialize()
-echo print commands before execution
-init FILENAME read/process named file
-[no]header turn headers on or off
-help show this message
-html set output mode to HTML
-interactive force interactive I/O
-line set output mode to 'line'
-list set output mode to 'list'
-lookaside SIZE N use N entries of SZ bytes for lookaside memory
-maxsize N maximum size for a --deserialize database
-memtrace trace all memory allocations and deallocations
-newline SEP set output row separator. Default: '\n'
-nullvalue TEXT set text string for NULL values. Default ''
-pagecache SIZE N use N slots of SZ bytes each for page cache memory
-quote set output mode to 'quote'
-readonly open the database read-only
-separator SEP set output column separator. Default: '|'
-stats print memory stats before each finalize
-version show SQLite version
-vfs NAME use NAME as the default VFS
json is nowhere to be found. Very strange, because the documentation shows the following:
$ sqlite3 --help
Usage: ./sqlite3 [OPTIONS] FILENAME [SQL]
FILENAME is the name of an SQLite database. A new database is created
if the file does not previously exist.
OPTIONS include:
-A ARGS... run ".archive ARGS" and exit
-append append the database to the end of the file
-ascii set output mode to 'ascii'
-bail stop after hitting an error
-batch force batch I/O
-box set output mode to 'box'
-column set output mode to 'column'
-cmd COMMAND run "COMMAND" before reading stdin
-csv set output mode to 'csv'
-deserialize open the database using sqlite3_deserialize()
-echo print commands before execution
-init FILENAME read/process named file
-[no]header turn headers on or off
-help show this message
-html set output mode to HTML
-interactive force interactive I/O
-json set output mode to 'json'
-line set output mode to 'line'
-list set output mode to 'list'
-lookaside SIZE N use N entries of SZ bytes for lookaside memory
-markdown set output mode to 'markdown'
-maxsize N maximum size for a --deserialize database
-memtrace trace all memory allocations and deallocations
-mmap N default mmap size set to N
-newline SEP set output row separator. Default: '\n'
-nofollow refuse to open symbolic links to database files
-nonce STRING set the safe-mode escape nonce
-nullvalue TEXT set text string for NULL values. Default ''
-pagecache SIZE N use N slots of SZ bytes each for page cache memory
-quote set output mode to 'quote'
-readonly open the database read-only
-safe enable safe-mode
-separator SEP set output column separator. Default: '|'
-stats print memory stats before each finalize
-table set output mode to 'table'
-tabs set output mode to 'tabs'
-version show SQLite version
-vfs NAME use NAME as the default VFS
-zip open the file as a ZIP Archive
As you can see, -json is clearly listed as an option (as well as many others that don't show up for me).
What's going on?

How to read the contents of an .sql file into an R script to run a query?

I have tried the readLines and the read.csv functions but they don't work.
Here is the contents of the my_script.sql file:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'
and it is saved on my Desktop.
Now I want to run this query from my R script. Here is what I have:
conn = connectDb()
fileName <- "C:\\Users\\me\\Desktop\\my_script.sql"
query <- readChar(fileName, file.info(fileName)$size)
query <- gsub("\r", " ", query)
query <- gsub("\n", " ", query)
query <- gsub("", " ", query)
recordSet <- dbSendQuery(conn, query)
rate <- fetch(recordSet, n = -1)
print(rate)
disconnectDb(conn)
And I am not getting anything back in this case. What can I try?
I've had trouble with reading SQL files myself, and have found that oftentimes the syntax gets broken if there are any single-line comments in the SQL. Since in R you store the SQL statement as a single-line string, any double dash in the SQL will essentially comment out all code after it.
This is a function that I typically use whenever I am reading in a .sql file to be used in R.
getSQL <- function(filepath){
  con = file(filepath, "r")
  sql.string <- ""
  while (TRUE){
    line <- readLines(con, n = 1)
    if ( length(line) == 0 ){
      break
    }
    line <- gsub("\\t", " ", line)
    if(grepl("--",line) == TRUE){
      line <- paste(sub("--","/*",line),"*/")
    }
    sql.string <- paste(sql.string, line)
  }
  close(con)
  return(sql.string)
}
I've found for queries with multiple lines, the read_file() function from the readr package works well. The only thing you have to be mindful of is to avoid single quotes (double quotes are fine). You can even add comments this way.
Example query, saved as query.sql
SELECT
COUNT(1) as "my_count"
-- comment goes here
FROM -- tabs work too
my_table
I can then store the results in a data frame with
df <- dbGetQuery(con, statement = read_file('query.sql'))
You can use the read_file() function from the readr package.
fileName = read_file("C:/Users/me/Desktop/my_script.sql")
You will get a string variable fileName with the desired text.
Note: Use / instead of \\
The answer by Matt Jewett is quite useful, but I wanted to add that I sometimes encounter the following warning when trying to read .sql files generated by sql server using that answer:
Warning message: In readLines(con, n = 1) : line 1 appears to contain
an embedded nul
The first line returned by readLines is often "ÿþ" in these cases (i.e. the UTF-16 byte order mark) and subsequent lines are not read properly. I solved this by opening the sql file in Microsoft SQL Server Management Studio and selecting
File -> Save As ...
then on the small downarrow next to the save button selecting
Save with Encoding ...
and choosing
Unicode (UTF-8 without signature) - Codepage 65001
from the Encoding dropdown menu.
If you do not have Microsoft SQL Server Management Studio and are using a Windows machine, you could also try opening the file with the default text editor and then selecting
File -> Save As ...
Encoding: UTF-8
to save with a .txt file extension.
Interestingly, changing the file within Microsoft SQL Server Management Studio removes the BOM (byte order mark) altogether, whereas changing the file within the text editor converts it to the UTF-8 BOM, but either way the query is then read properly using the referenced answer.
The combination of readr and textclean works well without having to create any new functions. read_file() reads the file into a character vector and replace_white() ensures all escape-sequence characters are removed from your .sql file. Note: this does cause problems if you have comments in your SQL string!
library(readr)
library(textclean)
SQL <- replace_white(read_file("file_path"))

unable to specify db2 import parameters on bluemix?

I subscribed to a free SQLDB service on Bluemix and tried to import data from a CSV file into this database instance.
For certain columns I have pure "space" as data, and some columns are to be filled with their default values. I can import this data with the following command on my local DB2:
db2 'import from MY_DATA.csv of del modified by usedefaults keepblanks timestampformat="MM/DD/YYYY HH:MM:SS" skipcount 1 insert into MY_TABLE'
On Bluemix, I can only assign the date / time / timestamp format and skip the first row. How can I add the "modified by usedefaults keepblanks" part on Bluemix to complete the import?
Also, when the import fails, I only receive the following message:
BaseException message: [Routine "SYSPROC.ADMIN_CMD" execution has completed, but at least one error, "SQL0911", was encountered during the execution. More information is available.. SQLCODE=20397, SQLSTATE=01H52, DRIVER=3.66.46]
Where can I get the detail error log that I can see on my local DB such as:
SQL3125W The character data in row "2" and column "32" was truncated because
the data is longer than the target database column.
SQL3148W A row from the input file was not inserted into the table. SQLCODE
"-181" was returned.
SQL0181N The string representation of a datetime value is out of range.
SQLSTATE=22007
SQL3185W The previous error occurred while processing data from row "2" of
the input file.
SQL3110N The utility has completed processing. "2" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "2".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "2" rows were processed from the input file. "0" rows were
successfully inserted into the table. "1" rows were rejected.
Number of rows read = 2
Number of rows skipped = 1
Number of rows inserted = 0
Number of rows updated = 0
Number of rows rejected = 1
Number of rows committed = 2
In the same quick load page (load complete in step 4), there should be a link to view the logs for this load. Hopefully it'll reveal more details about the error message.
Also note that keepblanks is applicable to the DEL file format (delimited ASCII) only. It is not applicable to the ASC file format (non-delimited ASCII).
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.rtn.doc/doc/r0023577.html?cp=SSEPGG_10.5.0%2F3-6-1-3-0-0-12&lang=en

sybase update getting slower and slower

I have a big text file, about 4 GB and more than 8 million lines. I'm writing a Perl script to read this file line by line, do some processing and update the info to Sybase. I do this in a batch way, 1000 lines per batch for the update commit, but here comes the problem: at first a batch only costs 10 to 20 seconds, but as the processing goes on, updating a batch becomes slower and slower, until a batch costs 3 to 4 minutes. I definitely have no idea why this is happening! Can anybody help me analyse what the cause may be? Thanks in advance, on my knees...
==> "I'm writing a perl script to read this file line by line, do some processing and update the info to sybase"
Please do the entire processing in one go: process your source file in a single pass, prepare the data structures (hashes, arrays) as per your requirements, and then start inserting the data into the database.
Please keep the points below in mind while inserting large amounts of data into a database.
1- If each column's data is not too large, you can insert the entire data set in one go (you may need a good amount of RAM; I'm not sure about the size because it depends on the data set you need to process).
2- You should use execute_array of Perl DBI so that you can insert the data in one go.
3- If you do not have sufficient RAM to insert the data in one go, then please divide your data (maybe into 8 parts, 1 million lines each time); a sketch of this, combined with point 5, appears after the sample code below.
4- Also make sure that you prepare the statement only once; on every run you just execute it with a new data set.
5- Set auto_commit off.
Here is sample code using execute_array of Perl DBI; I have used this to insert around 10 million rows into MySQL.
Please keep your data in arrays, like the ones below:
@column1_data, @column2_data, @column3_data
print {$logfile_handle} "Total records to insert--" . scalar(@column1_data);
print {$logfile_handle} "Inserting data into database";
my $sth = $$dbh_ref->prepare("INSERT INTO $tablename (column1,column2,column3) VALUES (?,?,?)")
    or do { print {$logfile_handle} "ERROR- Couldn't prepare statement: " . $$dbh_ref->errstr; exit };
my $tuples = $sth->execute_array(
    { ArrayTupleStatus => \my @tuple_status },
    \@column1_data,
    \@column2_data,
    \@column3_data
);
$$dbh_ref->do("commit");
print ($logfile_handle,"Data Insertion Completed.");
if ($tuples) {
    print {$logfile_handle} "Successfully inserted $tuples records\n";
}
else {
    ## print an error log entry for the rows that were not inserted
    for my $tuple (0 .. $#column1_data) {
        my $status = $tuple_status[$tuple];
        $status = [0, "Skipped"] unless defined $status;
        next unless ref $status;
        printf {$logfile_handle} "ERROR- Failed to insert (%s,%s,%s): %s\n",
            $column1_data[$tuple], $column2_data[$tuple], $column3_data[$tuple], $status->[1];
    }
}
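For points 3 and 5 above, here is a minimal sketch (not the original poster's code) of running the same execute_array insert in chunks with autocommit turned off. It assumes the same $dbh_ref, $tablename, $logfile_handle and @columnN_data variables as the sample above; the chunk size is only illustrative.
my $chunk_size = 1_000_000;                      # point 3: split the ~8 million rows into batches
$$dbh_ref->{AutoCommit} = 0;                     # point 5: autocommit off, commit once per batch
my $sth = $$dbh_ref->prepare(                    # point 4: prepare once, execute many times
    "INSERT INTO $tablename (column1,column2,column3) VALUES (?,?,?)");
for (my $start = 0; $start <= $#column1_data; $start += $chunk_size) {
    my $end = $start + $chunk_size - 1;
    $end = $#column1_data if $end > $#column1_data;
    $sth->execute_array(
        { ArrayTupleStatus => \my @tuple_status },
        [ @column1_data[$start .. $end] ],
        [ @column2_data[$start .. $end] ],
        [ @column3_data[$start .. $end] ],
    );
    $$dbh_ref->commit;                           # one commit per batch keeps each transaction small
    print {$logfile_handle} "Committed rows $start to $end\n";
}
Committing per batch keeps each transaction small, which is the usual motivation for point 5.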