Bash script to create SQL statement ignores last column

I am trying to create a bash script that will generate an SQL CREATE TABLE statement from a CSV file.
#!/bin/bash
# Check if the user provided a CSV file
if [ $# -eq 0 ]
then
    echo "No CSV file provided."
    exit 1
fi
# Check if the CSV file exists
if [ ! -f "$1" ]
then
    echo "CSV file does not exist."
    exit 1
fi
# Get the table name from the CSV file name
table_name=$(basename "$1" .csv)
# Extract the header row from the CSV file
header=$(head -n 1 "$1")
# Split the header row into column names
IFS=',' read -r -a columns <<< "$header"
# Generate the PostgreSQL `CREATE TABLE` statement
echo "CREATE TABLE $table_name ("
for column in "${columns[@]}"
do
    echo "    $column TEXT,"
done
echo ");"
If I have a CSV file with three columns (aa,bb,cc), the generated statement does not have the last column for some reason.
Any idea what could be wrong?
If I do:
for a in "${array[#]}"
do
echo "$a"
done
I am getting:
aaa
bbb
ccc
But when I add something to the string:
for a in "${array[#]}"
do
echo "$a SOMETHING"
done
I get:
aaa SOMETHING
bbb SOMETHING
SOMETHING
Thanks.

Your CSV file has a `\r` (carriage return) at the end of each line.
Try the next block to reproduce the problem:
printf -v header "%s,%s,%s\r\n" "aaa" "bbb" "ccc"
IFS=',' read -r -a columns <<< "$header"
echo "Show array"
for a in "${columns[#]}"; do echo "$a"; done
echo "Now with something extra"
for a in "${columns[#]}"; do echo "$a SOMETHING"; done
You should remove the `\r`, which can be done with:
IFS=',' read -r -a columns < <(tr -d '\r' <<< "${header}")
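For completeness, here is a minimal sketch of the same fix folded back into the original script (it also drops the trailing comma that PostgreSQL would reject just before the closing parenthesis):
# Strip the carriage return before splitting the header
header=$(head -n 1 "$1" | tr -d '\r')
IFS=',' read -r -a columns <<< "$header"
echo "CREATE TABLE $table_name ("
last=$(( ${#columns[@]} - 1 ))
for i in "${!columns[@]}"
do
    sep=","
    [ "$i" -eq "$last" ] && sep=""    # no comma after the last column
    echo "    ${columns[$i]} TEXT$sep"
done
echo ");"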

Related

Bash purge script

I'm trying to create a script that removes my images that are not in the DB.
Here is my code (updated):
I have one problem:
a problem with the LIKE syntax, like '%$f%'
#!/bin/bash
db="intranet_carc_development"
user="benjamin"
for f in public/uploads/files/*
do
    if [[ -f "$f" ]]
    then
        psql $db $user -t -v "ON_ERROR_STOP=1" \
            -c 'select * from public.articles where content like "%'"$(basename "$f")"'%"' | grep . \
            && echo "exist" \
            || echo "doesn't exist"
    fi
done
And I have the following error:
ERROR: column "%1YOLV3M4-VFb2Hydb0VFMw.png%" does not exist
LINE 1: select * from public.articles where content like "%1YOLV3M4-...
^
doesn't exist
ERROR: column "%wnj8EEd8wuJp4TdUwqrJtA.png%" does not exist
LINE 1: select * from public.articles where content like "%wnj8EEd8w...
EDIT: if I use \'%$f%\' for the LIKE:
/purge_files.sh: line 12: unexpected EOF while looking for matching `"'
./purge_files.sh: line 16: syntax error: unexpected end of file
There are several issues with your code:
$f is public/uploads/files/FILENAME and you want only the FILENAME.
You can use basename to circumvent that, by writing:
f="$(basename "$f")"
psql $db $user -c "select * from public.articles where content like '%$f%'"...
(The extra quotes are here to prevent issues if you have spaces and special characters in your file name)
Your psql request will always return true, even if no rows are found.
Your psql command will return true even if the request fails, unless you set the variable 'ON_ERROR_STOP' to 1.
As shown in the linked questions, you can use the following syntax:
#!/bin/bash
set -o pipefail    # needed because of the pipe to grep later on
db="intranet_carc_development"
user="benjamin"
for f in public/uploads/files/*
do
    if [[ -f "$f" ]]
    then
        f="$(basename "$f")"
        psql $db $user -t -v "ON_ERROR_STOP=1" \
            -c "select * from public.articles where content like '%$f%'" | grep . \
            && echo "exist" \
            || echo "doesn't exist"
    fi
done
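As a side note, if you only care about whether any row matched, a quieter variant of the same check (just a sketch under the same assumptions, not a drop-in replacement) can use grep -q in an explicit if:
if psql "$db" "$user" -t -v "ON_ERROR_STOP=1" \
    -c "select 1 from public.articles where content like '%$f%' limit 1" | grep -q .
then
    echo "exist"
else
    echo "doesn't exist"
fi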

Load Data Transfer files v2 into BigQuery

I am currently trying to insert all our DT files v2 into BQ.
I already did it with the click files and didn't spot any trouble.
But it's not the same story with the activity and impression files.
I wrote a quick script to help me build the schema for the insertion:
import csv,json
import glob
data = []
for i in glob.glob('*.csv'):
    print i
    b = i.split("_")
    print b[2]
    with open(i, 'rb') as f:
        reader = csv.reader(f)
        row1 = next(reader)
        title = [w.replace(' ', '_').replace('/', '_').replace(':', '_').replace('(', '_').replace(')', '').replace("-", "_") for w in row1]
        print title
        for a in title:
            j={"name":"{0}".format(a),"type":"string","mode":"nullable"}
            print j
            if j not in data:
                data.append(j)
    with open('schema_' + b[2] + '.json', 'w') as outfile:
        json.dump(data, outfile)
After that, I use this small bash script to insert all our data from our GCS bucket:
#!/bin/bash
prep_files() {
    date=$(echo "$f" | cut -d'_' -f4 | cut -c1-8)
    echo "$n"
    table_name=$(echo "$f" | cut -d'_' -f1-3)
    bq --nosync load --field_delimiter=',' DCM_V2."$table_name""_""$date" "$var" ./schema/v2/schema_"$n".json
}
num=1
for var in $(gsutil ls gs://import-log/01_v2/*.csv.gz)
do
    if test $num -lt 10000
    then
        echo "$var"
        f=$(echo "$var" | cut -d'/' -f5)
        n=$(echo "$f" | cut -d'_' -f3)
        echo "$n"
        prep_files
        num=$(($num+1))
    else
        echo -e "Wait the next day"
        echo "$num"
        sleep $(( $(date -d 'tomorrow 0100' +%s) - $(date +%s) ))
        num=0
    fi
done
echo 'Import done'
But I get this kind of error:
Errors:
Too many errors encountered. (error code: invalid)
/gzip/subrange//bigstore/import-log/01_v2/dcm_accountXXX_impression_2016101220_20161013_073847_299112066.csv.gz: CSV table references column position 101, but line starting at position:0 contains only 101 columns. (error code: invalid)
So I check the number of columns in my schema with:
$awk -F',' '{print NF}'
But I have the right number of columns...
So I thought that was because we had commas in the values (some publishers are using a .NET framework that allows commas in URLs). But these values are enclosed in double quotes.
So I made a test with a small file:
id,url
1,http://www.google.com
2,"http://www.google.com/test1,test2,test3"
And this loading works...
If someone has a clue to help me, that would be really great. :)
EDIT: I did another test by running the load with an already decompressed file.
Too many errors encountered. (error code: invalid)
file-00000000: CSV table references column position 104, but line starting at position:2006877004 contains only 104 columns. (error code: invalid)
I used this command to find the line: $tail -c 2006877004 dcm_accountXXXX_activity_20161012_20161013_040343_299059260.csv | head -n 1
I get:
3079,10435077,311776195,75045433,1,2626849,139520233,IT,,28,0.0,22,,4003208,,dc_pre=CLHihcPW1M8CFTEC0woddTEPSQ;~oref=http://imasdk.googleapis.com/js/core/bridge3.146.2_en.html,1979747802,1476255005253094,2,,4233079,CONVERSION,POSTVIEW,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
After that: $head -n1 dcm_account8897_activity_20161012_20161013_040343_299059260.csv | awk -F',' '{print NF}'
Response: 102
So, I have 104 columns in the first row and 102 on this one...
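For reference, one quick way to see how many lines disagree with the header is to compare the raw per-line field counts on a decompressed copy (keeping in mind this ignores the fact that quoted fields may legitimately contain commas):
$awk -F',' '{print NF}' dcm_accountXXXX_activity_20161012_20161013_040343_299059260.csv | sort -n | uniq -c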
Anyone else having trouble with the DT files v2?
I had a similar issue and found the problem was due to a few records being split across 2 lines by carriage returns. Removing the \r characters solved the problem.
The line affected is usually not the line reflected in the error log.
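If that is the cause here too, a minimal sketch of cleaning one object before loading it again (FILE.csv.gz below is just a placeholder for one of the objects under gs://import-log/01_v2/) could be:
# download, strip carriage returns, re-compress and upload a cleaned copy
gsutil cp gs://import-log/01_v2/FILE.csv.gz .
zcat FILE.csv.gz | tr -d '\r' | gzip > FILE_clean.csv.gz
gsutil cp FILE_clean.csv.gz gs://import-log/01_v2/clean/FILE_clean.csv.gz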
I would open the CSV file in Google Sheets and compare the columns with the schema you generated.
Most probably you will find a bug in the schema.

Script to copy one column data to another column

I am writing a script to copy one column's data to another column.
I tried with the following logic but it didn't work out.
Output: "number of parameters is 0".
My logic:
• I got the keys from the admin table and then copied the data to an updateupdateStatement file.
• Using an awk command, I copied the specific column's data to a temp file.
• Then I prepared an update statement and executed it.
#!/bin/ksh
#
# Script to Populate cross_refs based on what is in cross_references
#
#
echo "number of parameters is $#"
if [ $# != 1 ]; then
    USAGE="USAGE: $0 cassPassword"
    echo ${USAGE}
    exit 1
fi
cassPassword=$1
#Add column to admin table
#echo "alter table to add column..."
#echo "ALTER TABLE admin.product ADD cross_refs Map<String,String>;" > updateTable.cql
#cqlsh -u dgadmin -p ${cassPassword} -f updateTable.cql
echo "get keys from cassandra"
echo "copy admin.product (cross_references) to 'updateupdateProductStatement.cql';" > copyInputs.cql
cqlsh -u dgadmin -p ${cassPassword} -f copyInputs.cql
#Convert file that Cassandra created from DOS to Unix
echo "DOS to Unix conversion..."
tr -d '\015' <updateupdateProductStatement.cql >updateupdateProductStatement2.cql
cat updateupdateProductStatement2.cql >tempFile
sed -i "s/^/update admin.product set cross_refs = '/" tempFile
#execute the updated .cql file to run all the update statements
echo "executing updateupdateProductStatement.cql..."
cqlsh -u dgadmin -p ${cassPassword} -f tempFile
I'm not absolutely certain I understand the intent of your script, but I can pick out one line that looks suspect...
cat updateFlatFileInputStatements2.cql |awk -F'\t' '{ 19 1 2}' >tempFile
I think you want to print columns 19, 1 and 2 to your output...
awk -F'\t' 'BEGIN { OFS=" " }{ print $19, $1, $2 }' updateFlatFileInputStatements2.cql > tempFile
Better is to do all the manipulation of tempFile in awk:
awk -F'\t' "{ print \"update admin.product set my_refs = \" \$19 \" where id = \" \$1 \" and effective_date = \" \$2 \"';\" }" updateFlatFileInputStatements2.cql > tempFile
Then again, I don't see in your file where tempFile is used... or where updateFlatFileInputStatements2.cql is generated. Looks like this piece of code is doing nothing?
updateupdateStatement.cql ... don't know where that comes from. This then is stripped to form updateupdateStatement2.cql ... which then is manipulated to become tempFile but... you don't use tempFile -- instead you send updateupdateStatement2.cql to cqlsh. The bug may be that you intended to send tempFile instead to your final cqlsh.

How can I error out an entry if it already exists?

I'm writing a simple program to add a contact into a file called "phonebook", but if the contact already exists, I want it to return an echo saying "(first name last name) already exists" and not add it to the file. So far, I've gotten the program to add the user, but it won't return that echo and adds the duplicate entry anyway. How can I fix this?
#!/bin/bash
# Check that 5 arguments are passed
#
if [ "$#" -ne 5 ]
then
echo
echo "Usage: first_name last_name phone_no room_no building"
echo
exit 1
fi
first=$1
last=$2
phone=$3
room=$4
building=$5
# Count the number of times the input name is in add_phonebook
count=$( grep -i "^$last:$first:" add_phonebook | wc -l )
#echo $count
# Check that the name is in the phonebook
if [ "$count" -eq 1 ]
then
echo
echo "$first $last is already in the phonebook."
echo
exit 1
fi
# Add someone to the phone book
#
echo "$1 $2 $3 $4 $5" >> add_phonebook
# Exit Successfully
exit 0
Couple of things:
You should check whether the add_phonebook file exists before attempting to grep it; otherwise you get the "grep: add_phonebook: No such file or directory" output.
Your grep expression doesn't match the format of the file.
You are saving the file with spaces between the fields, but searching with a colon (:) between the names. You can either update the file format to use a colon to separate the fields, or update the grep expression to search on spaces. In addition, you save first_name, last_name but search on last_name, first_name.
With space format:
count=$( grep -i "^$last[[:space:]]\+$first[[:space:]]" add_phonebook | wc -l )
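Putting the file-exists check and the space format together, a minimal sketch of the whole guard (first name before last name, matching the order the echo writes) might look like:
if [ -f add_phonebook ] && grep -qi "^$first[[:space:]]\+$last[[:space:]]" add_phonebook
then
    echo
    echo "$first $last is already in the phonebook."
    echo
    exit 1
fi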
Removed my tab separators from the echo line, used spaces, and now it can count properly

How to check the return value of a find statement in a shell script?

How can I check the return value of a "find" statement in a shell script?
I am using find in my script; if the find statement doesn't find any file, the script should execute exit.
I want to check the return value of "find" to see whether it found any files or not.
You can redirect the output of the find command to a file called, say, output.txt, and then check whether the size of that file is 0 by using the -s test:
if [[ -s "output.txt" ]]
then
echo "File is not empty!"
else
echo "File is empty!"
fi
You can count the number of files found by find using the wc -l command:
result=$(find . -name '*.txt' | wc -l)
You can now check result to see how many files were found:
if [ "$result" -eq 0 ]; then echo "zero found"; fi
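As a side note, find itself exits 0 whether or not it matched anything, so you do have to look at its output rather than its return code. A minimal sketch of the "did it print anything at all" variant (assuming GNU find for -print -quit) is:
if find . -name '*.txt' -print -quit | grep -q .
then
    echo "found at least one file"
else
    echo "no files found"
fi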