Loading data to Netezza using nzload utility

I am trying to load data into Netezza database using "nzload" utility. The control file is as below and it works without any issues.
Is there a way to provide multiple data files as the input in a single control file?
DATAFILE C:\Karthick\data.txt
{
Database test1
TableName test
Delimiter '%'
maxErrors 20
Logfile C:\Karthick\importload.log
Badfile C:\Karthick\inventory.bad
}

$ cat my_control_file
datafile my_file1 {}
datafile my_file2 {}
datafile my_file3 {}
datafile my_file4 {}
# Below, I specify many of the options
# on the command line itself ... so I don't have
# to repeat them in the control file.
$ nzload -db system -t my_table -delim "|" -maxerrors 10 -cf my_control_file
Load session of table 'MY_TABLE' completed successfully
Load session of table 'MY_TABLE' completed successfully
Load session of table 'MY_TABLE' completed successfully
Load session of table 'MY_TABLE' completed successfully

Yes, you can specify multiple data files in a single control file. Those data files can be loaded into the same table or into different tables. See an example at https://www.ibm.com/docs/en/psfa/7.2.1?topic=command-nzload-control-file
The following two data files, "/tmp/try1.dat" and "/tmp/try2.dat", are to be loaded into a table "test" in the "system" database:
[nz@nps ]$ cat /tmp/try1.dat
1
2
[nz@nps ]$ cat /tmp/try2.dat
3
4
The following control file defines two "DATAFILE" blocks, one for each data file.
[nz@nps ]$ cat /tmp/try.cf
DATAFILE /tmp/try1.dat
{
Database system
TableName test
Delimiter '|'
Logfile /tmp/try1.log
Badfile /tmp/try1.bad
}
DATAFILE /tmp/try2.dat
{
Database system
TableName test
Delimiter '|'
Logfile /tmp/try2.log
Badfile /tmp/try2.bad
}
Load the data using the "nzload -cf" option and verify that the data is loaded.
[nz@nps ]$ nzload -cf /tmp/try.cf
Load session of table 'TEST' completed successfully
Load session of table 'TEST' completed successfully
[nz@nps ]$ nzsql -c "select * from test"
A1
----
2
3
4
1
(4 rows)
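As noted above, the DATAFILE blocks do not have to target the same table: each block can name its own database and table. A minimal sketch of a control file loading two files into two different tables (the file and table names here are hypothetical):
DATAFILE /tmp/orders.dat
{
Database system
TableName orders
Delimiter '|'
}
DATAFILE /tmp/customers.dat
{
Database system
TableName customers
Delimiter '|'
}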

Related

How to roll back a DB2 Ingest statement for malformed data

I have a Bash shell script that runs a DB2 SQL file. The job of this SQL file is to completely replace the contents of a database table with whatever the ingested input file contains.
However, I also need that database table to have its contents preserved if errors are discovered in the ingested file. For example, supposing my table currently looks like this:
MY_TABLE
        C1    C2
row0    15    27
row1    19    20
And supposing I have an input file that looks like this:
15,28
34,90
"a string that's obviously not supposed to be here"
54,23
If I run the script with this input file, the table should stay exactly the same as it was before, not using the contents of the file at all.
However, when I run my script, this isn't the behavior I observe: instead, the contents of MY_TABLE do get replaced with all of the valid rows of the input file, so the new contents of the table become:
MY_TABLE
        C1    C2
row0    15    28
row1    34    90
row2    54    23
In my script logic, I explicitly disable autocommit for the part of the script that ingests the file, and I only call commit after I've checked that the SQL execution returned no errors; if it did cause errors, I call rollback instead. Nonetheless, the contents of the table get replaced when errors occur, as though the rollback command was never called and a commit was issued instead.
Where is the problem in my script?
script.ksh
SQL_FILE=/app/scripts/script.db2
LOG=/app/logs/script.log
# ...
# Boilerplate to setup the connection to the database server
# ...
# +c: autocommit off
# -v: echo commands
# -s: Stop if errors occur
# -p: Show prompt for interactivity (for debugging)
# -td#: use '#' as the statement delimiter in the file
db2 +c -s -v -td# -p < $SQL_FILE >> $LOG
if [ $? -gt 2 ];
then echo "An Error occurred; rolling back the data" >> $LOG
db2 "ROLLBACK" >> $LOG
exit 1
fi
# No errors, commit the changes
db2 "COMMIT" >> $LOG
script.db2
ingest from file '/app/temp/values.csv'
format delimited by ','
(
$C1 INTEGER EXTERNAL,
$C2 INTEGER EXTERNAL
)
restart new 'SCRIPT_JOB'
replace into DATA.MY_TABLE
(
C1,
C2
)
values
(
$C1,
$C2
)#
Adding as answer per OP's suggestion:
Per the DB2 documentation for the INGEST command, it appears that +c (autocommit off) will not function:
Updates from the INGEST command are committed at the end of an ingest
operation. The INGEST command issues commits based on the commit_period
and commit_count configuration parameters. As a result of this, the
following do not affect the INGEST command: the CLP -c or +c options, which
normally affect whether the CLP automatically commits; the NOT LOGGED
INITIALLY option on the CREATE TABLE statement.
You probably want to set the warningcount 1 option, which will cause the command to terminate after the first error or warning. The default behaviour (warningcount 0) is to continue processing, ignoring all errors.
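For illustration, here is how the original script.db2 might look with that option added; the placement of warningcount follows my reading of the INGEST syntax, so verify it against the syntax diagram for your DB2 version:
ingest from file '/app/temp/values.csv'
format delimited by ','
(
$C1 INTEGER EXTERNAL,
$C2 INTEGER EXTERNAL
)
warningcount 1
restart new 'SCRIPT_JOB'
replace into DATA.MY_TABLE
(
C1,
C2
)
values
(
$C1,
$C2
)#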

Bash script to import multiple CSV files into a MySQL database using the load data local infile command

I have multiple CSV files stored in a folder, and I need to loop over that folder to fetch the CSV files and load them into a database table.
This script needs to be prepared in Bash with parameterized fields like InputFolderPath (to loop over the CSV files), DatabaseConnection, SchemaName, and TableName, and then pass these fields to the
LOAD DATA LOCAL INFILE command.
This worked for me,
for f in /var/www/path_to_your_folder/*.csv
do
mysql -e "use database_name" -e "
load data local infile '"$f"' into table your_table_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (column_name1, @date_time_variable1, column_name3)
SET some_column_name_which_contains_date = STR_TO_DATE(@date_time_variable1, '%d-%m-%Y');" -u your_mysql_username_here -p --local-infile=1
echo "Done: '"$f"' at $(date)"
done
This script will prompt for the MySQL password.
I am using this script on EC2 + Ubuntu.
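Since the question asked for parameterized fields, below is a minimal hedged sketch of the same loop with the folder, connection, and table passed as arguments; all names and the CSV dialect options are placeholders to adapt:
#!/bin/bash
# Usage: ./load_csvs.sh <input_folder> <db_host> <db_name> <table_name> <db_user>
INPUT_FOLDER="$1"
DB_HOST="$2"
DB_NAME="$3"
TABLE_NAME="$4"
DB_USER="$5"

for f in "$INPUT_FOLDER"/*.csv
do
  # -p prompts for the password on every file; for batch runs consider
  # putting the credentials in ~/.my.cnf instead.
  mysql --host="$DB_HOST" --user="$DB_USER" -p --local-infile=1 "$DB_NAME" -e "
    LOAD DATA LOCAL INFILE '$f' INTO TABLE $TABLE_NAME
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;"
  echo "Done: '$f' at $(date)"
done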

Export Amazon MySQL database to an Excel sheet

I have an EC2 instance on which a MySQL database is running, and multiple tables now hold huge numbers of values that I want to export into an Excel sheet on my local system (or even some place at S3 will also work). How can I achieve this?
Given that you installed your own MySQL instance on an EC2 node, you should have full access to MySQL's abilities. I don't see any reason why you can't just do a SELECT ... INTO OUTFILE here:
SELECT *
FROM yourTable
INTO OUTFILE 'output.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Once you have the CSV file, you may transfer it to a box running Excel, and use the Excel import wizard to bring in the data.
Edit:
Based on your comments below, it might be the case that you need to carefully select an output path and location to which MySQL and your user have permissions to write.
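One concrete thing worth checking (an assumption about the likely cause, not something confirmed in the comments) is the server's secure_file_priv setting, which restricts where MySQL may write OUTFILEs:
-- A directory value means INTO OUTFILE only writes under that directory;
-- an empty value means no restriction; NULL disables INTO OUTFILE entirely.
SHOW VARIABLES LIKE 'secure_file_priv';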
Another way to export CSV files from RDS MySQL, without getting Access denied for user '<databasename>'@'%' (using password: YES), is the following command:
mysql -u username -p --database=dbname --host=rdshostname --port=rdsport --batch -e "select * from yourtable" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > yourlocalfilename.csv
The secret is in this part:
--batch -e "select * from yourtable" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > yourlocalfilename.csv
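To see what that sed program does, here is a tiny illustration with a hypothetical row: --batch makes mysql emit tab-separated values, and the sed replaces each tab with "," and wraps the whole line in double quotes:
$ printf '1\tAlice\tParis\n' | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g'
"1","Alice","Paris"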

sqlldr - load completion not reflected

I have a bash script (load_data.sh) that invokes the sqlldr command to load data from a .csv file into a table (data_import). On a recent invocation, I noticed that even though the command execution had completed, the table didn't contain the data from the .csv file. I say this because the subsequent step (process_data.sh) in the bash script tried to run a stored procedure that threw the error
ORA-01403: no data found.
I learned that the commit happens right after the file load. So, I'm wondering what's causing this error and how I can avoid it in the future.
Here are my scripts:
load_data.sh
#!/usr/bin/bash -p
# code here #
if [[ -f .st_running ]]
then
echo "Exiting as looks like another instance of script is running"
exit
fi
touch .st_running
# ... #
# deletes existing data in the table
./clean.sh
sqlldr user/pwd@host skip=1 control=$CUR_CTL.final data=$fpath log=${DATA}_data.log rows=10000 direct=true errors=999
# accesses the newly loaded data in the table and processes it
./process_data.sh
rm -f .st_running
clean.sh/process_data.sh
# code here #
# ... #
sqlplus user/pwd@host <<EOF
set serveroutput on
begin
schema.STORED_PROC;
commit;
end;
/
exit;
EOF
# code here #
# ... #
STORED_PROC run by process_data.sh:
SELECT count(*) INTO l_num_to_import FROM data_import;
IF (l_num_to_import = 0) THEN RETURN;
END IF;
/* the error (`ORA-01403: no data found`) happens at this statement: */
SELECT upper(name) INTO name FROM data_import WHERE ROWNUM = 1;
Control file
LOAD DATA
APPEND
INTO TABLE DATA_IMPORT
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
...
...
)
Edits
The input file had 8 rows and the logs from both runs stated that 8 rows were successfully inserted.
Interesting behavior: The script ran fine (without complaining about the error) the 2nd time I ran it on the same file. So, during the first run, the sqlldr command doesn't seem to complete before the next sqlplus command is executed.
If you capture the PID of the sqlldr command and wait for it to complete, then you can be sure it's complete. You can add a datestamp or timestamp to the log file if it's run multiple times a day, then do a while loop with a sleep, checking the log until it prints its last line of completion. Then run the next step.
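A minimal sketch of that suggestion applied to load_data.sh, assuming the standard SQL*Loader log footer ("Run ended on ...") as the completion marker:
# run sqlldr in the background and capture its PID
sqlldr user/pwd@host skip=1 control=$CUR_CTL.final data=$fpath log=${DATA}_data.log rows=10000 direct=true errors=999 &
SQLLDR_PID=$!
wait "$SQLLDR_PID"   # block until sqlldr exits
# belt and braces: also wait for the log's completion line
until grep -q "Run ended on" "${DATA}_data.log" 2>/dev/null
do
  sleep 5
done
./process_data.sh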

How to load an XML data file into a Hive table?

While loading an XML data file into a Hive table, I got the following error message:
FAILED: SemanticException 7:9 Input format must implement InputFormat. Error encountered near token 'StoresXml'.
The way I am loading the XML file is as follows:
Create a table StoresXml:
CREATE EXTERNAL TABLE StoresXml (storexml string)
STORED AS INPUTFORMAT 'org.apache.mahout.classifier.bayes.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/stores';
The location /user/hive/warehouse/stores is in HDFS. Then I load the data:
load data inpath <local path where the xml file is stored> into table StoresXml;
Now, the problem is that when I select any column from table StoresXml, the above-mentioned error comes up.
Please help me with it. Where am I going wrong?
1) First you need to create a single-column table, like:
CREATE TABLE xmlsample(xml string);
2) After that you need to load the data from local/HDFS into the Hive table, like:
LOAD DATA INPATH '---------' INTO TABLE XMLSAMPLE;
3) Next, query it using XPath UDFs such as xpath, xpath_array, and xpath_string, as in the sample XML queries below.
I have just loaded this transactions.xml file into a Hive table using xpath.
For the XML file:
Bring the records of the XML file onto one line each:
terminal> cat /home/cloudera/Desktop/Test/Transactions_xml.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</record>|</record>\n|g' | grep -v '^\s*$' > /home/cloudera/Desktop/trx_xml.xml
terminal> hadoop fs -put /home/cloudera/Desktop/trx_xml.xml /user/cloudera/DataTest/Transactions_xml
hive>create table Transactions_xml1(xmldata string);
hive>load data inpath '/user/cloudera/DataTest/Transactions_xml' overwrite into table Transactions_xml1;
hive>create table Transactions_xml(trx_id int,account int,amount int);
hive>insert overwrite table Transactions_xml select xpath_int(xmldata,'record/Tid'),
xpath_int(xmldata,'record/AccounID'),
xpath_int(xmldata,'record/Amount') from Transactions_xml1;
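As a quick sanity check of the xpath_int UDF before running the big INSERT, it can be tried on a literal one-record string (the record below is hypothetical):
hive> select xpath_int('<record><Tid>7</Tid></record>', 'record/Tid');
7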
I hope this will help you. Let me know the result.
I have developed a tool to generate Hive scripts from a CSV file. Following are a few examples of how the files are generated.
Tool -- https://sourceforge.net/projects/csvtohive/?source=directory
Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/
The tool generates a Hadoop script for all the CSV files; the following is a sample of the
generated Hadoop script that inserts the CSVs into Hadoop:
#!/bin/bash -v
hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f ./AllstarFull.hive
hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
hive -f ./Appearances.hive
hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f ./AwardsManagers.hive
A sample of the generated Hive scripts:
CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;