regex to split name=value,* into csv of name,* and value,* - sql

I would like to split a line such as:
name1=value1,name2=value2, .....,namen=valuen
to produce two lines as follows:
name1,name2, .....,namen
value1,value2, .....,valuen
the goal being to construct an sql insert along the lines of:
input="name1=value1,name2=value2, .....,namen=valuen"
namescsv=$( echo $input | sed 's/=[^,]*//g' )
valuescsv=$( echo $input | ?????? )
INSERT INTO table_name ( $namescsv ) VALUES ( $valuescsv )
I'd like to do this as simply as possible; perl, awk, or multiple pipes through tr, cut, etc. seem too complicated. Given that the names part is simple enough, I figure there must be something similar for the values, but I can't work it out.

You can just invert your character match:
echo $input | sed 's/[^,]*=//g'
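Putting the two sed expressions side by side, a minimal sketch (the three-pair sample input is made up for illustration, and it assumes no spaces around the commas and no "=" or "," inside a value):

```shell
input="name1=value1,name2=value2,name3=value3"

# Drop every "=value" run to keep only the names.
namescsv=$(printf '%s\n' "$input" | sed 's/=[^,]*//g')
# Drop every "name=" run to keep only the values.
valuescsv=$(printf '%s\n' "$input" | sed 's/[^,]*=//g')

echo "INSERT INTO table_name ( $namescsv ) VALUES ( $valuescsv )"
```

Note that the values come out unquoted, so this only produces valid SQL for numeric values or values that are already quoted in the input.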

I think your best bet is still sed -re 's/[^=,]*=([^,]*)/\1/g', though I guess the input would have to match your table exactly.

Note that in some RDBMS you can use the following syntax:
INSERT INTO table_name SET name=value, name2=value2, ...;
http://dev.mysql.com/doc/refman/5.5/en/insert.html
The following shell script does what you are asking for and takes care of escaping (not only because of injection; you may also want to insert values that contain quotes). It assumes that an escape_sql helper is defined elsewhere:
_IFS="$IFS"; IFS=","
line="name1=value1,name2=value2,namen=valuen";
for pair in $line; do
names="$names,${pair%=*}"
values="$values,'$(escape_sql "${pair#*=}")'"
done
IFS="$_IFS"
echo "INSERT INTO table_name ( ${names#,} ) VALUES ( ${values#,} )"
Output:
INSERT INTO table_name ( name1,name2,namen ) VALUES ( 'value1','value2','valuen' )
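The script calls escape_sql without showing its definition. One possible definition, as a sketch only: it is a hypothetical helper, not part of the original answer, and it assumes standard SQL string literals in which a single quote is escaped by doubling it. Databases that also treat the backslash as an escape character would need more than this.

```shell
# Hypothetical helper (not from the original answer): doubles every
# single quote so the value can sit inside a '...' SQL literal.
escape_sql() {
    printf '%s' "$1" | sed "s/'/''/g"
}

escaped=$(escape_sql "O'Brien")
echo "'$escaped'"
```

Define the function before the loop so that the command substitution in the values line can find it.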

Related

generate SQL queries out of CSV table

Good morning,
I have a CSV file that contains the column names of my table in the first line, and the rest is data. Something like this:
FIELD1,FIELD2,FIELD3
data1,data2,data3
data1,data2,data3
Now I have been trying to write a reusable script that returns the following output:
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
That's what I have so far but it does not return the correct output.
firstline=$(printf '%s\n' 1p d wq | ed -s file.csv )
cat file.csv | while read line
do
field1=$(echo "$line" | cut -d "," -f1)
field2=$(echo "$line" | cut -d "," -f2)
field3=$(echo "$line" | cut -d "," -f3)
echo "INSERT INTO tablename ($firstline) VALUES ($fields1 $field2 $field3) ">prova.csv
done
) VALUES ( 15blename (data1,1,1
I am not sure I can use the variable $firstline inside the while loop... but I don't understand why it doesn't print me the insert into and the correct parenthesis.
Thanks in advance.
EDIT:
I have a new problem: SQL assistant does not allow me to insert values that are not enclosed with "'" so my question is how do I edit the script to make it look like this :
INSERT INTO tablename (columns) VALUES ('data1','data2','data3') ">prova.csv
thanks
Using awk:
awk 'NR==1{x=$0;next} {printf "INSERT INTO tablename (%s) VALUES (%s)\n",x,$0}' file
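For the quoted form requested in the EDIT, a variant of the same idea could look like the sketch below. It assumes a plain CSV with no quoted fields or embedded commas, and it strips a trailing carriage return from each line, since the scrambled output in the question looks like what Windows line endings would produce. The sample file is created only for the demonstration.

```shell
# Sample file with CRLF line endings, created only for this sketch.
csv=$(mktemp)
printf "FIELD1,FIELD2,FIELD3\r\ndata1,data2,data3\r\nit's,x,y\r\n" > "$csv"

out=$(awk -F',' -v q="'" '
    { sub(/\r$/, "") }              # drop a trailing carriage return
    NR == 1 { cols = $0; next }     # header row holds the column names
    {
        vals = ""
        for (i = 1; i <= NF; i++) {
            v = $i
            gsub(q, q q, v)         # double any embedded single quote
            sep = (i > 1 ? "," : "")
            vals = vals sep q v q   # wrap the value in single quotes
        }
        printf "INSERT INTO tablename (%s) VALUES (%s)\n", cols, vals
    }
' "$csv")

echo "$out"
rm -f "$csv"
```

Passing the quote character in with -v q="'" avoids having to escape a single quote inside the single-quoted awk program.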

Arguments mismatch using where IN clause in query

I have column in hive table like below
testing_time
2018-12-31 14:45:55
2018-12-31 15:50:58
Now I want to get the distinct values into a variable so I can use it in another query.
I have done the following:
abc=`hive -e "select collect_set(testing_time) from db.tbl";`
echo $abc
["2018-12-31 14:45:55","2018-12-31 15:50:58"]
xyz=${abc:1:-1}
when I do
hive -e "select * from db.tbl where testing_time in ($xyz)"
I get below error
Arguments for IN should be the same type! Types are {timestamp IN (string, string)
What mistake am I making?
What is the correct way of achieving my result?
Note: I know I can use subquery for this scenario but I would like to use variable to achieve my result
The problem is that you're comparing a timestamp (column testing_time) with a string (i.e. "2018-12-31 14:45:55"), so you need to convert the string to a timestamp, which you can do via TIMESTAMP(string).
Here's a bash script that adds the conversion:
RES="" # here we will save the resulting SQL
IFS=","
read -ra ITEMS <<< "$xyz" # split timestamps into array
for ITEM in "${ITEMS[@]}"; do
RES="${RES}TIMESTAMP($ITEM)," # add the timestamp to RES variable,
# surrounded by TIMESTAMP(x)
done
unset IFS
RES="${RES%?}" # delete the extra comma
Then you can run the constructed SQL query:
hive -e "select * from db.tbl where testing_time in ($RES)"
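If arrays or the negative length in ${abc:1:-1} are not available (both are bash features), the same transformation can be sketched with POSIX parameter expansion only. The value of abc below is the sample output shown in the question, hard-coded so the sketch runs without Hive, and it assumes the timestamps themselves contain no commas.

```shell
abc='["2018-12-31 14:45:55","2018-12-31 15:50:58"]'

# Strip the surrounding brackets.
xyz=${abc#\[}
xyz=${xyz%\]}

res=""
old_ifs=$IFS
IFS=','                      # split on commas only, so spaces survive
for item in $xyz; do
    res="${res}TIMESTAMP(${item}),"
done
IFS=$old_ifs
res=${res%,}                 # delete the trailing comma

echo "$res"
```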

Xargs, sqlplus and quote nightmare?

I have one big file containing data, for example :
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I have simplified it here.
I want to process each line and run some sqlplus statements with it.
Let's say that I have one table with two columns, containing this:
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let's say I want to update the CONTENT value of both rows to get this:
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
I have a lot of data and complex requests to execute in reality, so I have to use sqlplus rather than tools like SQL*Loader.
So I process the input file with 5 parallel processes, one line at a time, and define "\n" as the separator to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$id" "'"$content"'"
And in the updateRequest.sql file (just containing a test) :
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
As a result, I get:
01740: missing double quote in identifier
If I put “verify” parameter to on in the sql script, I can see :
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems that one of the two single quotes (used to escape the second quote) is missing...
I tried everything, but each time I get an error with a single or double quote, either on the bash side or on the SQL side... it's endless :/
I need the double quotes for the "descr" part, and I need to handle the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat big |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
But only if you do not have ; in the values. So given this input it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"
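If the ID is always the first column and everything after the first ";" belongs to the content, the line can be split on the first separator only, so later semicolons stay in the value. A minimal sketch using POSIX parameter expansion, under that two-column assumption (the question mentions that the real file has more columns, which would need a different split):

```shell
line='456;test/x/COD_ACT_008510/descr="semicolon;in;value"'

sim_id=${line%%;*}     # everything before the first ";"
content=${line#*;}     # everything after the first ";"

printf 'id=%s\ncontent=%s\n' "$sim_id" "$content"
```

This only addresses the splitting; the value still has to be quoted correctly before it reaches the SQL statement.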

executing HIVE query in background

How do I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to run it from a script like the one below:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but I get an error like
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Please let me know exactly where I am making a mistake.
Create a document named example
vi example
Enter the query in the document and save it.
create table sample as
Select count(1) from table1 where column1='value1';
Now run the document using the following command:
hive -f example 1>example.error 2>example.output &
You will get the result as
[1]
Now disown the process :
disown
Now the process will run in the background. If you want to know the status of the output, you may use
tail -f example.output
True, @Koushik! Glad that you found the issue.
Bash could not pass the Hive query through intact because the inner single quotes terminate the single-quoted string early.
Although SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in Hive,
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not a valid way to pass it.
The best solution would be to use double quotes for the Value1 as
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or, if you really need single quotes in the SQL, use a quick and dirty solution: close the single-quoted string, add a single quote wrapped in double quotes, and reopen the string.
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"'Value1'"'"';'
This makes sure that the Hive query is properly formed before it is executed. I would not suggest this approach unless you desperately need a single quote ;)
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like this:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
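To see what the shell actually hands over in each case, the query string can be printed instead of executed. In the sketch below, show is a hypothetical stand-in for hive -e that simply prints its argument, so no Hive installation is needed.

```shell
# Stand-in for "hive -e": prints the argument exactly as the shell passes it.
show() { printf '%s\n' "$1"; }

# Adjacent single quotes only end and restart the string, so they vanish.
broken=$(show 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';')

# Double quotes inside a single-quoted string are passed through literally.
double=$(show 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = "value1";')

# Closing the string, adding an escaped quote, and reopening keeps a real
# single quote in the argument.
single=$(show 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = '\''value1'\'';')

echo "$broken"
echo "$double"
echo "$single"
```

The first line prints value1 with no quotes at all, which matches the error in the question: Hive sees a bare identifier and tries to resolve it as a column.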

how to assign a query result to a shell variable

I have an sql query that returns a date.
I call this query from a shell script and would like to assign this value to the variable called datestart (and use it later). Here is my code. Without the datestart assignment the query works fine.
#!/bin/sh
firstname="-Upgsql"
dbname="statcoll"
portname="-p5438"
datestart=(psql $firstname $portname $dbname<< EOF
SELECT MIN(latestrefdate) FROM (SELECT MAX(referencedate) AS latestrefdate FROM statistics WHERE transactionname IN(SELECT DISTINCT transactionname FROM statistics WHERE platform = 'Smarties')GROUP BY transactionname) as earliest;
EOF
)
echo $datestart
but the result is this :
Syntax error: word unexpected (expecting ")").
I have no idea where should I insert that closing bracket. Any hint is appreciated.
Instead of bare parentheses in the variable assignment, you need command substitution: $(...) or `...`. Both forms work in POSIX sh and in bash.
Try this:
#!/bin/sh
firstname="pgsql"
dbname="statcoll"
portname="5438"
datestart=`psql -t --pset="footer=off" --username="$firstname" --port="$portname" -d "$dbname"<<EOF
SELECT MIN(latestrefdate) FROM (SELECT MAX(referencedate) AS latestrefdate FROM statistics WHERE transactionname IN (SELECT DISTINCT transactionname FROM statistics WHERE platform = 'Smarties') GROUP BY transactionname) as earliest;
EOF
`
echo "$datestart"
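The same capture written with $(...) and a here-document, as a sketch. To keep it runnable without a database, run_query is a hypothetical stand-in for the psql call: it discards the SQL on stdin and prints a made-up date.

```shell
# Hypothetical stand-in for psql: ignores the SQL it reads from stdin and
# prints a placeholder date (not a real query result).
run_query() {
    cat >/dev/null
    echo "2024-01-31"
}

# $(...) captures everything the command writes to stdout; the closing
# parenthesis goes on its own line after the here-document terminator.
datestart=$(run_query <<EOF
SELECT 1;
EOF
)

echo "$datestart"
```

With the real psql, combining -A (unaligned output) with -t avoids the leading padding that the default aligned format puts in front of the value.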