Generate SQL queries out of a CSV table

Good morning,
I have a CSV file whose first line contains the column names of my table, and the rest is data. Something like this:
FIELD1,FIELD2,FIELD3
data1,data2,data3
data1,data2,data3
Now I have been trying to write a script that produces the following output and can be reused:
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
This is what I have so far, but it does not return the correct output:
firstline=$(printf '%s\n' 1p d wq | ed -s file.csv)
cat file.csv | while read line
do
    field1=$(echo "$line" | cut -d "," -f1)
    field2=$(echo "$line" | cut -d "," -f2)
    field3=$(echo "$line" | cut -d "," -f3)
    echo "INSERT INTO tablename ($firstline) VALUES ($fields1 $field2 $field3) ">prova.csv
done
Instead, it prints garbled output like this:
) VALUES ( 15blename (data1,1,1
I am not sure whether I can use the variable $firstline inside the while loop... but I don't understand why it doesn't print the INSERT INTO part and the parentheses correctly.
Thanks in advance.
EDIT:
I have a new problem: SQL Assistant does not allow me to insert values that are not enclosed in single quotes, so my question is: how do I edit the script to make the output look like this?
INSERT INTO tablename (columns) VALUES ('data1','data2','data3')
Thanks.

Using awk:
awk 'NR==1{x=$0;next} {printf "INSERT INTO tablename (%s) VALUES (%s)\n",x,$0}' file
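To also satisfy the EDIT (every value wrapped in single quotes), here is a variation that quotes each field; a minimal sketch, assuming the data itself contains no commas or single quotes:
awk 'BEGIN{FS=OFS=","}
     NR==1{x=$0; next}
     {for(i=1;i<=NF;i++) $i="\047" $i "\047"    # \047 is a single quote
      printf "INSERT INTO tablename (%s) VALUES (%s)\n", x, $0}' file
This prints, for example:
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES ('data1','data2','data3')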

Related

Xargs, sqlplus and quote nightmare?

I have one big file containing data, for example:
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I have simplified here.
I want to process each line and run some sqlplus statements with it.
Let's say I have one table, with two columns, like this:
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let's say I want to update the CONTENT value of the two rows to get this:
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality I have a lot of data and complex requests to execute, so I have to use sqlplus, not tools like SQL*Loader.
So I process the input file with 5 parallel workers, one line at a time, and define "\n" as the separator to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$sim_id" "'"$content"'"
And in the updateRequest.sql file (just containing a test):
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
And as a result, I get:
ORA-01740: missing double quote in identifier
If I set the "verify" parameter to on in the SQL script, I can see:
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems like one of the two single quotes (used to escape the second quote) is missing...
I tried everything, but each time I get an error with a quote or double quote, either on the bash side or on the SQL side... it's endless :/
I need the double quotes for the "descr" part, and I need to handle the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat input_file.txt |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
But only if you do not have ; in the values. So given this input it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"
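For completeness, here is a sketch of my_script.sh that sidesteps the quoting issue, assuming the input file carries the raw, unescaped value (no pre-doubled quotes): it splits only on the first ';' so semicolons inside the value survive, and doubles the single quotes in the script itself, since updateRequest.sql already wraps &2 in quotes:
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2-)            # -f2- keeps any ';' inside the value
content=$(printf '%s' "$content" | sed "s/'/''/g")  # double each single quote for SQL
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$sim_id" "$content"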

Print variable with each line of while-read command

I'm trying to set up a monitoring script that would take all the databases we have, show their tables, and do some arithmetic on them.
I have this command:
impala-shell -i impalad -q " show databases;" -B | while read a; do impala-shell -q "show tables in ${a}" -B -i impalad; done
That produces following output:
Query: show tables in database1
table1
table2
How should I format the output to display the database name ($a) with each table? I tried echoing it (and using ||), but this only prints the database name after displaying all the tables. Or is there a way to pass the variable to awk?
Desired output would look like this:
database1.table1
database1.table2
It looks like the output of the show tables ... command will have a 1-line header, followed by the list of table names.
You could skip the first line by piping to tail -n +2,
and then use another while loop to echo the database name and table name pairs in the desired format:
impala-shell -i impalad -q "show databases;" -B | while read a; do
    impala-shell -q "show tables in ${a}" -B -i impalad | tail -n +2 | while read table; do
        echo "$a.$table"
    done
done
You could also do
impala-shell -q ... | awk -v db="$a" 'NR > 1 {print db "." $0}'
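Putting it together with the outer loop (still assuming the 1-line header on the show tables output):
impala-shell -i impalad -q "show databases;" -B | while read a; do
    impala-shell -q "show tables in ${a}" -B -i impalad | awk -v db="$a" 'NR > 1 {print db "." $0}'
done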

executing HIVE query in background

How do I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to run it from a script like the one below:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but I am getting an error like:
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Please let me know exactly where my mistake is.
Create a file named example:
vi example
Enter the query in the file and save it:
create table sample as
Select count(1) from table1 where column1='value1';
Now run the file using the following command:
hive -f example 1>example.error 2>example.output &
You will get the job number as the result:
[1]
Now disown the process:
disown
Now the process will run in the background. If you want to check the status of the output, you may use
tail -f example.output
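An alternative sketch, if the job should also survive logout without the separate disown step, is to start it under nohup (same redirections as above):
nohup hive -f example 1>example.error 2>example.output &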
True @Koushik! Glad that you found the issue.
Bash was unable to form the Hive query due to the ambiguous single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in Hive,
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';'
is not valid.
The best solution would be to use double quotes around Value1, as in:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or use a quick and dirty solution by including the single quotes within double quotes:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "'"Value1"'";'
This makes sure that the Hive query is properly formed and then executed accordingly. I would not suggest this approach unless you desperately need a single quote ;)
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
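Another option, assuming a Hive version with variable substitution (0.8+), is to keep the literal out of the shell quoting entirely and pass it as a hivevar; a minimal sketch:
hive --hivevar val=Value1 -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "${hivevar:val}";'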

Separate text and pass it to a SQL

I'm using the latest Debian version.
I have this file:
2301,XT_ARTICLES
2101,XT_HOUSE_PHOTOS
301,XT_PDF
101611,XT_FIJOS
I want to split this text so I can put the ID and the name into a single SQL statement. The statement must be repeated for each line of the file, but I don't know how to do it.
Can anybody help me, please?
Does this fit your needs?
awk -F',' '{print "INSERT INTO foobar VALUES("$1,",\047"$2"\047);"}' file.txt
INSERT INTO foobar VALUES(2301, 'XT_ARTICLES');
INSERT INTO foobar VALUES(2101, 'XT_HOUSE_PHOTOS');
INSERT INTO foobar VALUES(301, 'XT_PDF');
INSERT INTO foobar VALUES(101611, 'XT_FIJOS');
If it's OK, just pipe that into MySQL:
awk -F',' '
BEGIN{
print "USE qux;"
}
{
print "INSERT INTO foobar VALUES("$1,",\047"$2"\047);"
}' file.txt | mysql
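One caveat (my addition, not part of the answer above): if a name can itself contain a single quote, doubling it first keeps the generated SQL valid:
awk -F',' '{gsub("\047", "\047\047", $2); print "INSERT INTO foobar VALUES("$1,",\047"$2"\047);"}' file.txt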

regex to split name=value,* into csv of name,* and value,*

I would like to split a line such as:
name1=value1,name2=value2, .....,namen=valuen
to produce two lines as follows:
name1,name2, .....,namen
value1,value2, .....,valuen
The goal is to construct an SQL insert along the lines of:
input="name1=value1,name2=value2, .....,namen=valuen"
namescsv=$( echo $input | sed 's/=[^,]*//g' )
valuescsv=$( echo $input | ?????? )
INSERT INTO table_name ( $namescsv ) VALUES ( $valuescsv )
I'd like to do this as simply as possible; Perl, awk, or multiple pipes through tr, cut, etc. seem too complicated. Given that the names part seems simple enough, I figure there must be something similar for the values, but I can't work it out.
You can just invert your character match:
echo $input | sed 's/[^,]*=//g'
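Putting both substitutions together with the snippet from the question:
input="name1=value1,name2=value2,namen=valuen"
namescsv=$(echo "$input" | sed 's/=[^,]*//g')
valuescsv=$(echo "$input" | sed 's/[^,]*=//g')
echo "INSERT INTO table_name ( $namescsv ) VALUES ( $valuescsv )"
which prints:
INSERT INTO table_name ( name1,name2,namen ) VALUES ( value1,value2,valuen )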
I think your best bet is still sed -re 's/[^=,]*=([^,]*)/\1/g', though I guess the input would have to match your table exactly.
Note that in some RDBMS you can use the following syntax:
INSERT INTO table_name SET name=value, name2=value2, ...;
http://dev.mysql.com/doc/refman/5.5/en/insert.html
The following shell script does what you are asking for and takes care of escaping (not only because of injection, but you may want to insert values with quotes in them):
_IFS="$IFS"; IFS=","
line="name1=value1,name2=value2,namen=valuen"
for pair in $line; do
    names="$names,${pair%=*}"
    values="$values,'$(escape_sql "${pair#*=}")'"
done
IFS="$_IFS"
echo "INSERT INTO table_name ( ${names#,} ) VALUES ( ${values#,} )"
Output:
INSERT INTO table_name ( name1,name2,namen ) VALUES ( 'value1','value2','valuen' )
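Note that escape_sql is not defined above; a minimal sketch that doubles single quotes (the usual SQL escaping) would be:
escape_sql() {
    printf '%s' "$1" | sed "s/'/''/g"
}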