SQL: how to fix these errors?

So I have to loop through a folder of .dat files, extract the data and use INSERT INTO to insert the data into a database.
Here is a pastebin of one of the files to see the data I am working with:
http://pastebin.com/dn4wQjjE
To run the script I just call:
populate_database.sh directoryWithDatFiles
And the contents of the populate_database.sh script:
rm test.sql;
sqlite3 test.sql "CREATE TABLE HotelReviews (HotelID SMALLINT, ReviewID SMALLINT, Author CHAR, Content CHAR, Date CHAR, Readers SMALLINT, HelpfulReviews SMALLINT, Over$
IFS=$'\n'
for file in $1/*;
do
author=($(grep "<Author>" $file | sed 's/<Author>//g'));
content=($(grep "<Content>" $file | sed 's/<Content>//g'));
date=($(grep "<Date>" $file | sed 's/<Date>//g'));
readers=($(grep "<No. Reader>" $file | sed 's/<No. Reader>//g'));
helpful=($(grep "<No. Helpful>" $file | sed 's/<No. Helpful>//g'));
overall=($(grep "<Overall>" $file | sed 's/<Overall>//g'));
value=($(grep "<Values>" $file | sed 's/<Value>//g'));
rooms=($(grep "<Room>" $file | sed 's/<Room>//g'));
location=($(grep "<Location>" $file | sed 's/<Location>//g'));
cleanliness=($(grep "<Cleanliness>" $file | sed 's/<Cleanliness>//g'));
receptionarea=($(grep "<Check in / front desk>" $file | sed 's/<Check in \/ front desk>//g'));
service=($(grep "<Service>" $file | sed 's/<Service>//g'));
businessservice=($(grep "<Business service>" $file | sed 's/<Business service>//g'));
length=${#author[@]}
hotelID="$(echo $file | sed 's/.dat//g' | sed 's/[^0-9]*//g')";
for((i = 0; i < length; i++)); do
sqlite3 test.sql "INSERT INTO HotelReviews VALUES($hotelID, $i, 'author', 'content', 'date', ${readers[i]}, ${helpful[i]}, ${overall[i]}, 9, 10, ${location[i]}, ${cleanliness[i]}, ${receptionarea[i]}, ${service[i]}, ${businessservice[i]})";
done
done
sqlite3 test.sql "SELECT * FROM HotelReviews;"
The problem, though, is that although much of the script works, there are still 5 of the 15 columns that I can't get working. I'll just screenshot the errors I get when trying to change the code from:
'author' --> ${author[i]}: http://i.imgur.com/zKQLSqT.jpg
'content' --> ${content[i]}: http://i.imgur.com/pnirIo3.jpg
'date' --> ${date[i]}: http://i.imgur.com/urF5DTa.jpg
9 --> ${value[i]}: http://i.imgur.com/AnBFSWp.jpg
10 --> ${rooms[i]}: same errors as above
Anyway, if anyone could help me out on this, I'd be massively grateful.
Cheers!

If you deal with a lot of XML, I recommend getting to know a SAX parser, such as the one in the Python standard library. Anyone willing to write a shell script like that has the chops to learn it, and the result will be easier to read and at least have a prayer of being correct.
If you want to stick with regex hacking, turn to awk. Using ">" as your field separator (awk -F'>'), your script could be simplified with awk lines like
/<Author>/ { gsub(/'/, "''", $2); author=$2 }
/<Content>/ { gsub(/'/, "''", $2); content=$2 }
...
END { print author, content, ... }
The gsub takes care of your SQL quoting problem by doubling any single quotes in the data.
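As a concrete sketch of that idea (the sample file, the two columns, and the `q` variable are illustrative, not the full 15-column script):

```shell
# Build a tiny sample review file (hypothetical contents).
cat > sample.dat <<'EOF'
<Author>O'Brien
<Content>Great stay, wasn't noisy
EOF

# '>' as field separator puts the tag in $1 and the value in $2.
# q holds a single quote so the awk program itself needs no escaping;
# gsub(q, q q, $2) doubles quotes, making the value a safe SQL literal.
awk -F'>' -v q="'" '
/<Author>/  { gsub(q, q q, $2); author = $2 }
/<Content>/ { gsub(q, q q, $2); content = $2 }
END { printf "INSERT INTO HotelReviews VALUES(%s%s%s, %s%s%s);\n", q, author, q, q, content, q }
' sample.dat
```

which prints `INSERT INTO HotelReviews VALUES('O''Brien', 'Great stay, wasn''t noisy');` — note the doubled quotes in both values.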

Related

Timestamp issues with Powershell

I have a small PowerShell script that pulls the last hour of punch data from a SQL DB, then outputs that data to a .csv file. The script works, but the timestamp looks like
hh:mm:ss.xxx, and I need it to be only hh:mm. Any help would be greatly appreciated!
Below is the script and a snippet of the output:
sqlcmd -h-1 -S ZARIRIS\IRIS -d IA3000SDB -Q "SET NOCOUNT ON; Select Distinct TTransactionLog_1.DecisionTimeInterval,
TTransactionLog_1.UserID, TTransactionLog_1.OccurDateTime, TTransactionLog_1.StableTimeInterval
From TTransactionLog_1
Inner join TSystemLog1 On TTransactionLog_1.NodeID=TSystemLog1.NodeID
Inner join TUser On TTransactionLog_1.UserID=Tuser.UserID
where TSystemLog1.NodeID = 3 and TTransactionLog_1.OccurDateTime >= dateadd(HOUR, -1, getdate())" -s "," -W -o "C:\atr\karen\adminreport3.csv"
Get-Content "C:\ATR\Karen\adminreport3.csv" | ForEach-Object {$_ -replace "44444444","IN PUNCH"} | ForEach-Object {$_ -replace "11111111","OUT PUNCH"} | Set-Content "C:\ATR\Karen\punchreport1.csv" -Force
Output (where I need the hh:mm format — it needs to read 12:08, not 12:08:19.000):
112213,2022-10-31 12:08:19.000,OUT PUNCH
It would probably be best if your script wrote out the date formatted the way you want in the first place,
but if that's not an option, you should really consider using Import-Csv and Export-Csv to manipulate the data instead.
If the standard quoted CSV output is something you don't want, please see this code to safely remove the quotes where possible.
Having said that, here's one way of doing it in a line-by-line fashion:
Get-Content "C:\ATR\Karen\adminreport3.csv" | ForEach-Object {
$line = $_ -replace "44444444","IN PUNCH" -replace "11111111","OUT PUNCH"
$fields = $line -split ','
# reformat the date by first parsing it out as DateTime object
$fields[1] = '{0:yyyy-MM-dd HH:mm}' -f [datetime]::ParseExact($fields[1], 'yyyy-MM-dd HH:mm:ss.fff',$null)
# or use regex on the date and time string as alternative
# $fields[1] = $fields[1] -replace '^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}).*', '$1'
# rejoin the fields with a comma
$fields -join ','
} | Set-Content "C:\ATR\Karen\punchreport1.csv" -Force
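As an aside, the commented regex alternative above translates directly to a portable sed one-liner (the sample line is taken from the question); the PowerShell version above is still the one that fits the script:

```shell
# Trim seconds and milliseconds from the timestamp field:
# capture hh:mm, drop the trailing :ss.fff.
echo '112213,2022-10-31 12:08:19.000,OUT PUNCH' |
  sed -E 's/([0-9]{2}:[0-9]{2}):[0-9]{2}\.[0-9]{3}/\1/'
```

which prints `112213,2022-10-31 12:08,OUT PUNCH`.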

How to get policy tag if bigquery attribute from information schema

We have used policy tags in BigQuery for column-level security ( https://cloud.google.com/bigquery/docs/best-practices-policy-tags ). Now we want to check the list of tables and attributes which have a policy tag. Is there any way in BigQuery to get it using INFORMATION_SCHEMA? Or any other way to get the attributes tagged with some policy tag programmatically?
You can try this solution, which will give you a list of tables and columns where a PolicyTag is used:
Write Table-List into a File:
DATASET="dataset-name"
bq ls --max_results=10000 ${DATASET} | awk '{ print $1 }' | sed '1,2d' > table_list.txt
Shell Script:
#!/bin/bash
DATASET="dataset-name"
echo "------------------------------"
echo "TableName ColumnName"
echo "------------------------------"
while IFS= read -r TABLE; do
TAG_COUNT="`bq show --schema ${DATASET}.${TABLE} | grep "policyTags" | wc -l`"
if [ "${TAG_COUNT}" -ge 1 ]
then
COLUMN="`bq show --format=prettyjson ${DATASET}.${TABLE} | jq '.schema.fields[] | select(.policyTags | length>=1)' | jq '.name'`"
echo "${TABLE} ${COLUMN}"
fi
done < table_list.txt

Xargs, sqlplus and quote nightmare?

I have one big file containing data, for example :
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I've simplified here.
I want to process each line and run some sqlplus statements with it.
Let say that I have one table, with two column, with this :
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let say I want to update the two lines content value to have that :
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
I have a lot of data and complex statements to execute in reality, so I have to use sqlplus, not tools like SQL*Loader.
So I process the input file on 5 threads, one line at a time, and define "\n" as the separator to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$sim_id" "'"$content"'"
And in the updateRequest.sql file (just containing a test) :
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
And in result, I have :
ORA-01740: missing double quote in identifier
If I set the "verify" parameter to on in the SQL script, I can see:
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems like one of the two single quotes (used to escape the second quote) is missing...
I've tried everything, but each time I get an error with a quote or double quote, either on the bash side or the SQL side... it's endless :/
I need the double quotes for the "descr" part, and I need to preserve the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat big |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
But only if you do not have ; in the values. So given this input it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"
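Both approaches rest on the same SQL rule: a single quote inside a single-quoted literal is escaped by doubling it. A minimal standalone sketch of that step (the table and value are from the question; pre-escaping with sed before calling sqlplus is one way you might do it, not the asker's exact script):

```shell
# Double every single quote so the value is safe inside a
# single-quoted SQL literal, then build the UPDATE statement.
content="test/x/COD_ACT_008510/descr=\"R08-Ballon d'eau\""
escaped=$(printf '%s' "$content" | sed "s/'/''/g")
printf "update T_TABLE SET CONTENT = '%s' where ID = '%s';\n" "$escaped" 123
```

which prints `update T_TABLE SET CONTENT = 'test/x/COD_ACT_008510/descr="R08-Ballon d''eau"' where ID = '123';` — the double quotes need no escaping at all, only the single quotes do.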

Print variable with each line of while-read command

I'm trying to set up a monitoring script that would take all the databases we have, show their tables, and do some arithmetic on them.
I have this command:
impala-shell -i impalad -q " show databases;" -B | while read a; do impala-shell -q "show tables in ${a}" -B -i impalad; done
That produces following output:
Query: show tables in database1
table1
table2
How should I format the output to display the database name ($a) with each table? I tried echoing it, but this only prints the database name after displaying all the tables. Or is there a way to pass the variable to awk?
Desired output would look like this:
database1.table1
database1.table2
It looks like the output of the show tables ... command will have a 1-line header, followed by the list of table names.
You could skip the first line by piping to tail -n +2,
and then use another while loop to echo the database name and table name pairs in the desired format:
impala-shell -i impalad -q " show databases;" -B | while read a; do
impala-shell -q "show tables in ${a}" -B -i impalad | tail -n +2 | while read table; do
echo "$a.$table"
done
done
You could also do
impala-shell -q ... | awk -v db="$a" 'NR > 1 {print db "." $0}'
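You can check the awk variant without a cluster by faking the `show tables` output (the header line and table names below are made up):

```shell
# Simulate `show tables in database1`: one header line, then
# table names. NR > 1 skips the header; each remaining line
# is printed as database.table.
printf 'Query: show tables in database1\ntable1\ntable2\n' |
  awk -v db="database1" 'NR > 1 {print db "." $0}'
```

which prints `database1.table1` and `database1.table2`, matching the desired output.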

Separate text and pass it to a SQL

I'm using the latest Debian version.
I have this file:
2301,XT_ARTICLES
2101,XT_HOUSE_PHOTOS
301,XT_PDF
101611,XT_FIJOS
I want to split this text so I can put the ID and the name into one SQL statement. The statement must be repeated according to the number of lines in the file, but I don't know how to do it.
Can anybody help me, please?
Does this fit your needs?
awk -F',' '{print "INSERT INTO foobar VALUES(" $1 ", \047" $2 "\047);"}' file.txt
INSERT INTO foobar VALUES(2301, 'XT_ARTICLES');
INSERT INTO foobar VALUES(2101, 'XT_HOUSE_PHOTOS');
INSERT INTO foobar VALUES(301, 'XT_PDF');
INSERT INTO foobar VALUES(101611, 'XT_FIJOS');
If that's OK, just pipe it into MySQL:
awk -F',' '
BEGIN{
print "USE qux;"
}
{
print "INSERT INTO foobar VALUES(" $1 ", \047" $2 "\047);"
}' file.txt | mysql
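To see it end to end without a MySQL server, rebuild the input file from the question and run just the awk part (\047 is the octal escape for a single quote, which keeps the quoting readable inside a single-quoted awk program):

```shell
# Rebuild the first two lines of the input file from the question.
cat > file.txt <<'EOF'
2301,XT_ARTICLES
2101,XT_HOUSE_PHOTOS
EOF

# One INSERT per input line: $1 is the numeric ID, $2 the name,
# wrapped in single quotes via \047.
awk -F',' '{print "INSERT INTO foobar VALUES(" $1 ", \047" $2 "\047);"}' file.txt
```

which prints `INSERT INTO foobar VALUES(2301, 'XT_ARTICLES');` and `INSERT INTO foobar VALUES(2101, 'XT_HOUSE_PHOTOS');`.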