sqoop import command issue with partition key - hive

I am trying to import data into a Hive table using a sqoop command. The Hive table is partitioned by date2, and the dates are in the format "9/6/2017 00:00:00". Sqoop throws an error when I import data using the date column.
Teradata table :
column1, date2, column3
1,9/6/2017 00:00:00, qwe
2,9/20/2017 00:00:00, wer
Sqoop command:
sqoop import \
--connect jdbc:teradata://<server>/database=<db_name> \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username un \
--password 'pwd' \
--table <tbl_name> \
--where "cast(date2 as Date) > date '2017-09-07' and cast(date2 as Date) < date '2017-09-20'" \
--hive-import --hive-table <db_name>.<tbl_name> \
--hive-partition-key date2 \
-m1
Error
ERROR teradata.TeradataSqoopImportHelper: Exception running Teradata import job
java.lang.IllegalArgumentException:Wrong FS: /usr/tarun/date2=1900-01-01 00%3A00%3A00

When I lay your command out one option per line, it looks like you missed a \ continuation character, and that is probably why it is complaining: --hive-import does not end with "\", so the options after it are dropped. The Hive table name is also missing from the command:
sqoop import \
--connect jdbc:teradata://<server>/database=<db_name> \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username un \
--password 'pwd' \
--table <tbl_name> \
--where "cast(date2 as Date) > date '2017-09-07' and cast(date2 as Date) < date '2017-09-20'" \
--hive-import \
--hive-table tarun121 \
--hive-partition-key date2 \
-m1
An alternative is to try the create-hive-table command:
sqoop create-hive-table \
--connect jdbc:teradata://localhost:port/schema \
--table hive_tble_name \
--fields-terminated-by ',';
Let me know if this solves the issue.
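Judging by the %3A (a URL-encoded ":") in the error path, another possibility is that the raw timestamp values in date2 are being used verbatim as partition directory names, which HDFS/Hive reject. A sketch of a workaround, importing one day at a time into a static, date-only partition (this is an assumption, not from the original answer: it presumes the Teradata connection manager supports --query, and date_d plus the target dir are made-up names):

```shell
sqoop import \
  --connect jdbc:teradata://<server>/database=<db_name> \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username un --password 'pwd' \
  --query "SELECT column1, column3 FROM <tbl_name> WHERE CAST(date2 AS DATE) = DATE '2017-09-08' AND \$CONDITIONS" \
  --target-dir /tmp/sqoop_import_staging \
  --hive-import --hive-table <db_name>.<tbl_name> \
  --hive-partition-key date_d \
  --hive-partition-value '2017-09-08' \
  -m 1
```

Running one such import per day keeps every partition value free of ":" characters.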

Related

Read and print the first 1000 rows from a CSV using awk, then the next 1000, and so on

I have a CSV with around 25k rows. I have to pick 1000 rows from column #1 and column #2 at a time, then the next 1000, and so on.
I am using the command below, and it works fine for picking up all the values from column #1 and column #2 (i.e., 25k fields from both columns). What I want instead is to pick values 1-1000, put them in my SQL export query, then 1001-2000, then 2001-3000, and so on, placing each batch in the WHERE ... IN clause of the export query and appending the results to the dbData.csv file.
My code is below:
awk -F ',' 'NR > 2 {print $1}' $INPUT > column1.txt
i=$(cat column1.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')
awk -F ',' 'NR > 2 {print $2}' $INPUT > column2.txt
j=$(cat column2.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')
echo "Please wait - connecting to database..."
db2 connect to $sourceDBStr user user123 using pas123
db2 "export to dbData.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in ($i) and cd.language_id in ($j)"
Let's assume the first two fields of your input CSV are "simple" (no spaces, no commas...) and do not need any kind of quoting. You can generate the tricky part of your query string with an awk script:
# foo.awk
NR >= first && NR <= last {
c1[n+0] = $1
c2[n++] = $2
}
END {
for(i = 0; i < n-1; i++) printf("%s,", c1[i])
printf("%s) %s (%s", c1[n-1], midstr, c2[0])
for(i = 1; i < n; i++) printf(",%s", c2[i])
}
Then use it in a bash loop that processes 1000 records per iteration, storing the result of each query in a temporary file (e.g., tmp.csv in the following bash script) that you concatenate onto your dbData.csv file. The script uses the same parameters as yours (INPUT, sourceDBStr) and the same constants (dbData.csv, 1000, user123, pas123); adapt it if you need more flexibility. Error management (input file not found, DB connection error, DB query error...) is left as a bash exercise (but should be done).
prefix="export to tmp.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in"
midstr="and cd.language_id in"
rm -f dbData.csv
len=$(wc -l < "$INPUT")
for (( first = 2; first <= len; first += 1000 )); do
(( last = len < first + 999 ? len : first + 999 ))
query=$(awk -F ',' -f foo.awk -v midstr="$midstr" -v first="$first" \
-v last="$last" "$INPUT")
echo "Please wait - connecting to database..."
db2 connect to $sourceDBStr user user123 using pas123
db2 "$prefix ($query)"
cat tmp.csv >> dbData.csv
done
rm -f tmp.csv
But there are other ways, using split, bash arrays, and simpler awk or sed scripts. Example:
declare -a arr=()
prefix="export to tmp.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in"
midstr="and cd.language_id in"
awk -F, 'NR>1 {print $1, $2}' "$INPUT" | split -l 1000 - foobar
rm -f dbData.csv
for f in foobar*; do
arr=($(awk '{print $1 ","}' "$f"))
i="${arr[*]}"
arr=($(awk '{print $2 ","}' "$f"))
j="${arr[*]}"
echo "Please wait - connecting to database..."
db2 connect to $sourceDBStr user user123 using pas123
db2 "$prefix (${i%,}) $midstr (${j%,})"
cat tmp.csv >> dbData.csv
rm -f "$f"
done
rm -f tmp.csv
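A minimal, self-contained demo of the split/array technique, with five hypothetical records and two per chunk (the real script uses 1000):

```shell
# Feed five "partnumber language_id" records to split, 2 lines per chunk file.
printf 'p1 l1\np2 l2\np3 l3\np4 l4\np5 l5\n' | split -l 2 - /tmp/sqlchunk_

# Build one parenthesised IN-list per chunk, exactly as in the loop above.
out=$(for f in /tmp/sqlchunk_*; do
    arr=($(awk '{print $1 ","}' "$f"))   # e.g. "p1," "p2,"
    i="${arr[*]}"                        # joined: "p1, p2,"
    echo "(${i%,})"                      # trailing comma stripped: "(p1, p2)"
done)
rm -f /tmp/sqlchunk_*

echo "$out"
```

Each printed line is what gets substituted into the `in (...)` clause for that chunk.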

How to count the number of emails sent per account in Exim?

My application needs to count, in real time, the number of emails sent by Exim and the time each was sent. Is that possible?
The connection is made over SMTP.
There are three ways to do that:
1. Parsing logs (worst approach).
2. An rsyslog implementation plus Exim configuration.
3. Exim with MySQL.
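To illustrate approach 1 (and why it is the worst: you re-parse text on every query), here is a sketch that counts messages per sender from the "<=" (message received) lines, using made-up sample lines in main.log's format:

```shell
# Hypothetical sample of exim main.log lines; "<=" marks a received message.
cat > /tmp/exim_sample.log <<'EOF'
2017-09-06 10:00:01 1MnT3J-0005PH-2y <= alice@example.com U=alice P=local S=794
2017-09-06 10:00:05 1MnT3K-0005PI-3z <= bob@example.com H=mx [192.0.2.1] P=esmtp S=1024
2017-09-06 10:01:00 1MnT3L-0005PJ-4a <= alice@example.com U=alice P=local S=200
2017-09-06 10:01:02 1MnT3L-0005PJ-4a => bob@example.com R=dnslookup T=remote_smtp
EOF

# The field right after "<=" is the sender; tally per address.
counts=$(awk '{ for (i = 1; i < NF; i++) if ($i == "<=") cnt[$(i+1)]++ }
              END { for (s in cnt) print s, cnt[s] }' /tmp/exim_sample.log | sort)
echo "$counts"
```

This works, but the DB-backed approaches below give you timestamps and indexed queries for free.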
Rsyslog
Install rsyslog and rsyslog-mysql:
[root@web ~]# yum install rsyslog rsyslog-mysql
Basic configuration
[root@web ~]# mysql
mysql> CREATE DATABASE Syslog;
mysql> USE Syslog;
mysql> CREATE TABLE `SmtpMailLog` (
`Id` bigint(11) unsigned NOT NULL AUTO_INCREMENT,
`Hostname` varchar(255) NOT NULL,
`EximID` varchar(16) NOT NULL,
`DateIn` datetime DEFAULT NULL,
`DateLastProcessed` datetime DEFAULT NULL,
`DateCompleted` datetime DEFAULT NULL,
`FromAddr` varchar(100) DEFAULT NULL,
`FromAddrHost` varchar(100) DEFAULT NULL,
`FirstToAddr` varchar(100) DEFAULT NULL,
`AdditionalToAddr` text,
`HostFrom` varchar(100) DEFAULT NULL,
`FirstHostTo` varchar(100) DEFAULT NULL,
`Size` int(11) DEFAULT NULL,
`Subject` varchar(255) DEFAULT NULL,
`Notes` varchar(255) DEFAULT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `EximID` (`EximID`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='--';
mysql> exit
[root@web ~]# echo "USE mysql; CREATE USER rsyslog; FLUSH PRIVILEGES;" | mysql
[root@web ~]# echo "USE mysql; GRANT ALL PRIVILEGES ON Syslog.* TO 'rsyslog'@'127.0.0.1' IDENTIFIED BY 'rsysl0g'; FLUSH PRIVILEGES;" | mysql
[root@web ~]# echo "USE mysql; SET PASSWORD FOR 'rsyslog'@'127.0.0.1' = PASSWORD('rsysl0g'); FLUSH PRIVILEGES;" | mysql
[root@web ~]# /bin/cat << 'EOF' > /etc/rsyslog.conf
# Modules --------------------------------------------------------------------
# Input
$ModLoad imuxsock.so # Unix sockets
# Output
$ModLoad ommysql.so # Log to MySQL
# Globals --------------------------------------------------------------------
# There are many more - see docs
# Files and dirs are created as needed (dirs only for "dynamic" files)
$umask 0000
$DirCreateMode 0640
$FileCreateMode 0640
#$FileOwner rsyslog
#$FileGroup rsyslog
#$DirOwner rsyslog
#$DirGroup rsyslog
$RepeatedMsgReduction on
# Include package specific logs (including rsyslog itself)
$IncludeConfig /etc/rsyslog.d/*.conf
# Log to the console
*.* -/var/log/exim/main.log
& ~
EOF
Parser configuration
[root@web ~]# /bin/cat << 'EOF' > /etc/rsyslog.d/20-mail.conf
# ###############################################################
# Mail system logging
# Exim, Spam Assassin, SA-Exim, ClamAV
# /etc/rsyslog.d/20-mail.conf
# ###############################################################
# NOTES
# Careful with quotes in if clauses
# seems to need ' and not " (JG 11 Jun 2009)
# Multi line logging from Exim "detector":
# :msg, regex, " \[[0-9]{1,3}[\\/][0-9]{1,3}\]" ~
# email address finder:
# %msg:R,ERE,0,ZERO:[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}--end%
# Exim ID finder:
# %msg:R,ERE,0,ZERO:[A-Za-z0-9]{6}-[A-Za-z0-9]{6}-[A-Za-z0-9]{2}--end%
# Easier to read log format:
# $template Mail-Exim-File-Format,"%timegenerated:1:10:date-rfc3339% %timegenerated:12:19:date-rfc3339% %hostname% %syslogtag%%msg%\n"
#########################################################
# Syslog style to support OSSEC (JG 26 AUg 2009)
$template Mail-Exim-File-Format,"%timegenerated% %HOSTNAME% %syslogtag%%msg%\n"
#########################################################
# Amalgamated logging templates
# The log entry is built up from an initial ClamAV entry followed by successive updates from the various components, in the order
# of the templates here. The EximID is used to look up the entry except for SA-Exim (which uses the msgid).
# <= - In
# Local:
# Sep 15 09:06:17 loghost exim[20787]: 1MnT3J-0005PH-2y <= nagios@example.com U=nagios P=local S=794 T="** PROBLEM Service Alert: host-name/NTP-peer is CRITICAL **"
# Sep 22 10:40:59 portal exim[12557]: 1Mq1rn-0003GX-MZ <= root@blueloop.net U=root P=local S=516 T="test message"
# Relayed:
# Sep 15 09:03:38 loghost exim[20078]:
# 1MnT0g-0005Dq-BC <= user@example.com H=host.example.com [192.168.100.100] P=esmtp S=8690192 id=4AAF585B020000AA0004ED5B@port.blueloop.net T="Subject line from message"
# If an arg to CONCAT is NULL then the whole output is NULL
$template Mail-Exim-In-Amalgamated,"REPLACE INTO SmtpMailLog \
( \
Hostname, \
EximID, \
DateIn, \
DateLastProcessed, \
FirstToAddr, \
FromAddr, \
FromAddrHost, \
AdditionalToAddr, \
HostFrom, \
Size, \
Subject, \
FirstHostTo \
) \
VALUES \
( \
'%hostname%', \
'%msg:R,ERE,0,ZERO:[A-Za-z0-9]{6}-[A-Za-z0-9]{6}-[A-Za-z0-9]{2}--end%', \
'%timereported:::date-mysql%', \
'%timereported:::date-mysql%', \
'%msg:R,ERE,0,ZERO:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$)--end%', \
'%msg:R,ERE,0,ZERO:[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}--end%', \
substring_index('%msg:R,ERE,0,ZERO:[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}--end%', '@', -1), \
'', \
SUBSTRING('%msg:R,ERE,0,ZERO:H=.*\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}]--end%' FROM 3), \
SUBSTRING('%msg:R,ERE,0,ZERO:S=[0-9]{1,}--end%' FROM 3), \
SUBSTRING('%msg:R,ERE,0,ZERO:T=.*--end%' FROM 3), \
'pending' \
) \
",SQL
# ** - Failed
$template Mail-Exim-Fail-Amalgamated,"UPDATE SmtpMailLog \
SET \
DateLastProcessed = '%timereported:::date-mysql%', \
FirstToAddr = 'Failed - see notes', \
FirstHostTo = 'Failed - see notes', \
Notes = '%msg%' \
WHERE EximID = '%msg:R,ERE,0,ZERO:[A-Za-z0-9]{6}-[A-Za-z0-9]{6}-[A-Za-z0-9]{2}--end%' \
",SQL
# => - Out
$template Mail-Exim-Out-Amalgamated, "UPDATE SmtpMailLog \
SET \
FirstToAddr = '%msg:R,ERE,0,ZERO:[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}--end%', \
FirstHostTo = SUBSTRING('%msg:R,ERE,0,ZERO:H=.*]--end%' FROM 3), \
DateLastProcessed = '%timereported:::date-mysql%', \
Notes = 'Out' \
WHERE EximID = '%msg:R,ERE,0,ZERO:[A-Za-z0-9]{6}-[A-Za-z0-9]{6}-[A-Za-z0-9]{2}--end%' \
",SQL
# -> - additional deliveries
$template Mail-Exim-Add-Amalgamated, "UPDATE SmtpMailLog \
SET \
AdditionalToAddr = CONCAT_WS(' ',AdditionalToAddr,'%msg:R,ERE,0,ZERO:[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}--end%'), \
DateLastProcessed = '%timereported:::date-mysql%', \
Notes = 'Additional delivery' \
WHERE EximID = '%msg:R,ERE,0,ZERO:[A-Za-z0-9]{6}-[A-Za-z0-9]{6}-[A-Za-z0-9]{2}--end%' \
",SQL
# Completed
$template Mail-Exim-Completed-Amalgamated,"UPDATE SmtpMailLog \
SET \
DateCompleted = '%timereported:::date-mysql%', \
DateLastProcessed = '%timereported:::date-mysql%', \
Notes = 'Completed' \
WHERE EximID = '%msg:R,ERE,0,ZERO:[A-Za-z0-9]{6}-[A-Za-z0-9]{6}-[A-Za-z0-9]{2}--end%' \
",SQL
#########################################################
# Full Exim log (bar the bits that are filtered out above) - file
if $programname == 'exim' then /var/log/exim/main.log;Mail-Exim-File-Format
###################################
# Amalgamated Mail log - single line per mail, some details lost - DB
#if $programname == 'exim' \
# and $msg contains 'dovecot_login' \
#then :ommysql:127.0.0.1,Syslog,rsyslog,rsysl0g;Mail-Exim-New-Amalgamated
if $programname == 'exim' \
and $msg contains '<=' \
then :ommysql:127.0.0.1,Syslog,rsyslog,rsysl0g;Mail-Exim-In-Amalgamated
if $programname == 'exim' \
and $msg contains '=>' \
then :ommysql:127.0.0.1,Syslog,rsyslog,rsysl0g;Mail-Exim-Out-Amalgamated
if $programname == 'exim' \
and $msg contains '->' \
then :ommysql:127.0.0.1,Syslog,rsyslog,rsysl0g;Mail-Exim-Add-Amalgamated
if $programname == 'exim' \
and $msg contains '**' \
then :ommysql:127.0.0.1,Syslog,rsyslog,rsysl0g;Mail-Exim-Fail-Amalgamated
if $programname == 'exim' \
and $msg contains 'Completed' \
then :ommysql:127.0.0.1,Syslog,rsyslog,rsysl0g;Mail-Exim-Completed-Amalgamated
##################################
# Dump Exim messages
if $programname == 'exim' then ~
EOF
Adjust exim log selector:
[root@web ~]# vi /etc/exim/exim.conf
log_selector = +incoming_port +smtp_connection +all_parents +retry_defer +subject +arguments +received_recipients
--
Exim Mysql
Install dependencies.
[root@web ~]# yum install exim-mysql
Add exim mysql connection.
[root@web ~]# vi /etc/exim/exim.conf
hide mysql_servers = 127.0.0.1/{DATABASE}/{USER}/{PASSWORD}
It is possible to use the same table structure as in the rsyslog installation above.
In the acl_smtp_data section, add something like this:
acl_smtp_data:
warn
continue = ${lookup mysql{INSERT INTO SmtpMailLog \
(\
AdditionalToAddr \
)\
values \
(\
'${quote_mysql:$recipients}' \
)}}
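Once either pipeline is feeding the SmtpMailLog table, the real-time per-account count the question asks for becomes a plain query. A sketch, using the database and credentials created above (the column aliases are made up):

```shell
mysql -h 127.0.0.1 -u rsyslog -prsysl0g Syslog -e "
  SELECT FromAddr, COUNT(*) AS mails_sent, MAX(DateIn) AS last_sent
  FROM SmtpMailLog
  GROUP BY FromAddr;"
```

Your application can run this (or a WHERE-filtered variant for a single account) as often as it likes without touching the log files.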

Snakemake: how to escape a literal tab

I have a rule like :
rule merge_fastq_by_lane:
input:
r1 = get_fastq_r1,
r2 = get_fastq_r2
output:
r1_o = "{sample}/fastq/lanes/{sample}_{unit}_R1.fastq",
r2_o = "{sample}/fastq/lanes/{sample}_{unit}_R2.fastq",
bam = "{sample}/bam/lanes/{sample}_{unit}.bam"
threads:
1
message:
"Merge fastq from the same sample and lane and align using bwa"
shell:
"""
cat {input.r1} > {output.r1_o}
cat {input.r2} > {output.r2_o}
{bwa} mem -M -t {threads} -R "@RG\tID:{wildcards.sample}_{wildcards.unit}\tSM:{wildcards.sample}" {bwa_index} {output.r1_o} {output.r2_o} | {samtools} view -bS - | {samtools} sort - > {output.bam}
"""
And I get this error message, due to tab issues in the -R parameter of bwa:
bwa mem -M -t 1 -R "@RG ID:P1_L001 SM:P1" Homo_sapiens.GRCh37.dna.primary_assembly P1/fastq/lanes/P1_L001_R1.fastq P1/fastq/lanes/P1_L001_R2.fastq | samtools view -bS - | samtools sort - > P1/bam/lanes/P1_L001.bam
[E::bwa_set_rg] the read group line contained literal <tab> characters -- replace with escaped tabs: \t
You just have to escape the tab character so that Snakemake (i.e., Python) does not turn it into a real tab before the command reaches the shell:
{bwa} mem -M -t {threads} -R "@RG\\tID:{wildcards.sample}_{wildcards.unit}\\tSM:{wildcards.sample}" {bwa_index} {output.r1_o} {output.r2_o} | {samtools} view -bS - | {samtools} sort - > {output.bam}
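To see why the double backslash works, a minimal sketch independent of Snakemake: Python's string handling turns \\t into the two characters backslash + t; inside double quotes the shell passes those through unchanged, and bwa itself expands \t into a real tab.

```shell
# What the shell sees after Snakemake/Python has formatted "\\t":
rg="@RG\tID:P1_L001\tSM:P1"    # backslash-t stays literal inside double quotes
printf '%s\n' "$rg"            # bwa receives these literal \t sequences intact
```

With a single \t in the Snakefile, Python would already have replaced it with a tab character, which is exactly what bwa's error message complains about.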

SQL trigger syntax issue in DB2

Question:
My code works until it reaches the last line, then it throws a syntax error.
Error :
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0104N An unexpected token "temp_dept" was found following "(case when ".
Expected tokens may include: "JOIN". SQLSTATE=42601
I am trying to do this:
After each insert on RD_EMP
for each row
insert into RD_Supervisor
check for cases, if temp_dept.RD_E_ID <= 0 THEN RD_Supervisor.RD_E_SUP 0
Code:
create trigger RD_total_dep_emp \
after insert on RD_Emp \
referencing new table as temp_dept
for each statement \
insert into RD_Supervisor(RD_E_SUP, RD_E_EMP, RD_QUOT) \
select temp_dept.RD_E_ID,
(case \
when temp_dept.RD_E_ID <= 0 then 0 \
when temp_dept.RD_E_ID > 0 AND temp_dept.RD_E_ID <= 15 then 15 \
when temp_dept.RD_E_ID > 15 AND temp_dept.RD_E_ID <= 25 then 25 \
when temp_dept.RD_E_ID > 25 AND temp_dept.RD_E_ID <= 35 then 35 \
when temp_dept.RD_E_ID > 35 then 100 \
end) as RD_E_SUP \
from temp_dept
You have three columns in the insert, but only two in the select -- and they appear to be in the wrong order. The following is probably more your intention:
create trigger RD_total_dep_emp \
after insert on RD_Emp \
referencing new table as temp_dept
for each statement \
insert into RD_Supervisor(RD_E_EMP, RD_E_SUP) \
select temp_dept.RD_E_ID,
(case \
when temp_dept.RD_E_ID <= 0 then 0 \
when temp_dept.RD_E_ID > 0 AND temp_dept.RD_E_ID <= 15 then 15 \
when temp_dept.RD_E_ID > 15 AND temp_dept.RD_E_ID <= 25 then 25 \
when temp_dept.RD_E_ID > 25 AND temp_dept.RD_E_ID <= 35 then 35 \
when temp_dept.RD_E_ID > 35 then 100 \
end) as RD_E_SUP \
from temp_dept
If there is a value you want to set for RD_QUOT, then you can specify that as well -- both in the insert and the select.
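For instance, populating RD_QUOT with a constant could look like the following sketch (the value 1 is hypothetical, and the CASE expression is shortened to one branch here; keep your full CASE in practice):

```sql
create trigger RD_total_dep_emp \
after insert on RD_Emp \
referencing new table as temp_dept \
for each statement \
insert into RD_Supervisor(RD_E_EMP, RD_E_SUP, RD_QUOT) \
select temp_dept.RD_E_ID, \
(case when temp_dept.RD_E_ID <= 0 then 0 else 100 end), \
1 \
from temp_dept
```

The rule is simply that the insert column list and the select list must have the same number of columns, in the same order.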
You have an opening parenthesis before CASE, but you don't have a closing parenthesis after END.
https://xkcd.com/859/

Creating an SQL query using awk, bash, grep

I have been trying to parse a PayPal email and insert the resulting info into a database of mine. I have most of the code working, but I cannot get a variable into my awk code to create the SQL insert query.
if [ -f email-data.txt ]; then {
grep -e "Transaction ID:" -e "Receipt No: " email-data.txt \
>> ../temp
cat ../temp \
| awk 'NR == 1 {printf("%s\t",$NF)} NR == 2 {printf("%s\n",$NF)}' \
>> ../temp1
awk '{print $1}' $email-data.txt \
| grep @ \
| grep -v \( \
| grep -v href \
>> ../address
email_addr=$(cat ../address)
echo $email_addr
cat ../temp1 \
| awk '{print "INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES"; print "(\x27"($email_addr)"\x27,'1',\x27"$2"\x27,\x27"$3"\x27);"}' \
> /home/linux014/opt/post-new-member.sql
The output looks like the following
INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES('9MU013922L4775929 9MU013922L4775929',1,'9MU013922L4775929','');
Should look like
INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES('dogcat@gmail.com',1,'9MU013922L4775929','1234-2345-3456-4567');
(Names changed to protect the innocent)
The trial data I am using is set out below
Apr 18, 2014 10:46:17 GMT-04:00 | Transaction ID: 9MU013922L4775929
You received a payment of $50.00 USD from Dog Cat (dogcat@gmail.com)
Buyer:
Dog Cat
dogcat@gmail.com
Purchase Details
Receipt No: 1234-2345-3456-4567
I cannot figure out why the email address is not being inserted properly.
You are referencing a shell variable inside awk. The right way to do that is to create an awk variable with the -v option.
For example, say $email is your shell variable, then
... | awk -v awkvar="$email" '{do something with awkvar}' ...
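For example (dogcat@gmail.com here is just the address from your sample data):

```shell
# Pass a shell variable into awk via -v rather than embedding it in the program.
email="dogcat@gmail.com"
line=$(awk -v awkvar="$email" 'BEGIN { printf("email is %s\n", awkvar) }')
echo "$line"
```

Inside the single-quoted awk program, $email_addr is never expanded by the shell; awkvar is the bridge.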
However, having said that, here is how I would try and parse the text file:
awk '
/Transaction ID:/ { tran = $NF }
/Receipt No:/ { receipt = $NF }
$1 ~ /@/ { email = $1 }
END {
print "INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES";
print "("q email q","1","q tran q","q receipt q");"
}' q="'" data.txt
Output:
INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES
('dogcat@gmail.com',1,'9MU013922L4775929','1234-2345-3456-4567');