Escape newline/special characters in sqlite output [duplicate] - sql

Quick overview: I have an sqlite3 DB which contains the following structure and data:
Id|Name|Value
1|SomeName1|SomeValue1
2|SomeName2|SomeValue2
3|SomeName3|SomeValue3
(continuation of SomeValue3 in here, after ENTER)
The problem is with iterating through the "Value" column. I'm using this code:
records=(`sqlite3 database.db "SELECT Value FROM Values"`)
for record in "${records[@]}"; do
echo $record
done
The problem is that there should be three values in that iteration, but it shows four.
As a result I received:
1 step of loop - SomeValue1
2 step of loop - SomeValue2
3 step of loop - SomeValue3
4 step of loop - (continuation of SomeValue3 in here, after ENTER)
It should end at the third step and show the value with its line break, something like this:
3 step of loop - SomeValue3
(continuation of SomeValue3 in here, after ENTER)
Any suggestions for how I can handle this in bash?
Thank you in advance!

Instead of relying on word splitting to populate an array with the result of a command, it's much more robust to use the readarray builtin, or to read one result at a time in a loop. Examples of both follow, using sqlite3's ascii output mode, where rows are separated by the byte 0x1E and columns within rows by 0x1F. This allows the literal newlines in your data to pass through untouched.
#!/usr/bin/env bash
# The -d argument to readarray and read changes the end-of-line character
# from newline to, in this case, ASCII Record Separator
# Uses the `%q` format specifier to avoid printing the newline
# literally for demonstration purposes.
echo "Example 1"
readarray -d $'\x1E' -t rows < <(sqlite3 -batch -noheader -ascii database.db 'SELECT value FROM "Values"')
for row in "${rows[@]}"; do
printf "Value: %q\n" "$row"
done
echo "Example 2 - multiple columns"
while IFS=$'\x1F' read -d $'\x1E' -ra row; do
printf "Rowid: %d Value: %q\n" "${row[0]}" "${row[1]}"
done < <(sqlite3 -batch -noheader -ascii database.db 'SELECT rowid, value FROM "Values"')
outputs
Example 1
Value: SomeValue1
Value: SomeValue2
Value: $'SomeValue3\nand more'
Example 2 - multiple columns
Rowid: 1 Value: SomeValue1
Rowid: 2 Value: SomeValue2
Rowid: 3 Value: $'SomeValue3\nand more'
See "Don't Read Lines With for" for more on why your approach is fragile.
Since VALUES is an SQL keyword, using it as a table name (don't do that!) requires escaping it with double quotes.
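For example (a hypothetical run; the exact error text varies by sqlite3 version):
$ sqlite3 database.db 'SELECT Value FROM Values'
Error: near "Values": syntax error
$ sqlite3 database.db 'SELECT Value FROM "Values"'
SomeValue1
SomeValue2
SomeValue3
(continuation of SomeValue3 in here, after ENTER)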

Your problem here is the IFS (internal field separator) in Bash: the embedded newline is treated as just another separator, so the word-split array gains an extra element.
Your best option is to remove the linefeed in the SELECT statement on the sqlite side. Note that SQLite string literals have no \n escape, so use char(10), e.g.:
records=($(sqlite3 database.db "SELECT replace(Value, char(10), '') FROM \"Values\""))
for record in "${records[@]}"; do
echo "$record"
done
Alternatively, you could change the IFS in Bash, but then you are relying on the linefeed as the separator between records.
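For illustration, a minimal sketch of the IFS approach (assuming the same database.db and quoted "Values" table as above). It still mis-splits values that contain embedded newlines, which is exactly the problem here, so the replace() or ascii-mode approaches are safer:
#!/usr/bin/env bash
# Split the query output on newlines only (not spaces or tabs),
# then restore IFS so the rest of the script is unaffected.
# Caveat: the unquoted $(...) also undergoes glob expansion,
# so values containing * or ? would need `set -f` as well.
OLDIFS=$IFS
IFS=$'\n'
records=($(sqlite3 database.db 'SELECT Value FROM "Values"'))
IFS=$OLDIFS
for record in "${records[@]}"; do
printf '%s\n' "$record"
done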

Related

awk/sed replace multiple newlines in the record except end of record

I have a file where:
field delimiter is \x01
the record delimiter is \n
Some lines contain embedded newlines that I need to remove; however, I don't want to remove the legitimate newlines at the end of each line. I have tried this with awk:
awk -F '\x01' 'NF < 87 {getline s; $0 = $0 s} 1' infile > outfile
But this only works when the record contains a single embedded newline (besides the end-of-record newline); it does not work for multiple newlines.
Note: the record contains 87 fields.
What am I doing wrong here?
Example of file:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000
test^A00000000
Test^A^A^A^A
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000
test^A00000000
Test^A^A^A^A
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000
test^A12102120^A00000^A00^A^A
NOTE: This sample contains 11 delimiters (12 fields) per record; the field separator is \x01; the record separator is \n.
Expected result:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000test^A12102120^A00000^A00^A^A
Note: I need to preserve the field delimiter (\x01) and record delimiter (\n)
Thank you very much in advance for looking into this.
The file always contains 87 fields;
The field delimiter is '\x01', but when viewed in Linux it is represented as '^A'.
Some lines contain newlines - I need to remove them, but I don't want to remove the legitimate newlines at the end of each line.
The newline appears twice in the first and second records and once in the third record; these are the newlines I want to remove.
In the examples/expected results there are 11 delimiters "\x01", represented as "^A";
I expect to have 3 records and not 6, i.e.:
First record:
test^A00000000 should be joined to the previous line
Test^A^A^A^A should be joined to the first line as well
forming one record:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
Second record
test^A00000000 should be joined to the previous line
Test^A^A^A^A should be joined to that previous line as well
forming one record:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
Third record:
test^A12102120^A00000^A00^A^A should be joined to the previous line
forming one record:
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000test^A12102120^A00000^A00^A^A
Note:
The awk example provided works when there is one unwanted newline in the record, but not when there are multiple newlines.
Thank you so very much. It works perfectly. Thank you for explaining it so well to me too.
This might work for you (GNU sed):
sed ':a;N;s/\x01/&/87;Ta;s/\n//g' file
Gather up lines until there are 87 separators, remove any newlines and print the result.
What's wrong with your attempt is that you concatenate two lines, print the result, and move on to the next line. NF is then reset to the next line's field count. As all your lines have fewer than 87 fields, the NF < 87 condition is useless; your script would work the same without it.
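A minimal corrected sketch along those lines (assuming, as stated, that every complete record has exactly 87 fields; like the sed answer above, it joins the pieces by simply dropping the stray newlines):
awk -F'\x01' '
# Keep appending continuation lines until the record has all 87 fields.
# Assigning to $0 re-splits the record, so NF updates automatically.
{ while (NF < 87 && (getline line) > 0) $0 = $0 line }
1' infile > outfile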
Try this awk script:
$ awk -F'\x01' -vn=87 -vi=0 '
{printf("%s", $0); i+=NF; if(i==n) {i=0; print "";} else i-=1;}' file
Here, we use the real \x01 field separator and the NF fields count. Variable i counts the number of already printed fields. We first print the current line without the trailing newline (printf("%s", $0)). Then we update our i fields counter. If it is equal to n we reset it and print a newline. Else we decrement it such that we do not count the last field of this line and the first of the next as 2 separate fields.
Demo with n=12 instead of 87 and your own input file (with \x01 field separators):
$ awk -F'\x01' -vn=12 -vi=0 '
{printf("%s", $0); i+=NF; if(i==n) {i=0; print "";} else i-=1;}' file |
sed 's/\x01/|/g'
PL|Nov-21|29-11-2021|0|00|00|0000000test|00000000 Test||||
PL|Nov-21|29-11-2021|0|00|00|0000000test|00000000 Test||||
SL|Nov-21|30-11-2021|B|0000|1234567|00000test|12102120|00000|00||
The sed command shows the result with the \x01 replaced by | for easier viewing.

Bash command/script to split line on a certain character

I would like to split the below data into the expected output:
Raw Data:
931096|376601|1|ART|AT-2151780724|2151780724|2|102809198|I|CGM44I|MIL3VF03|52576377.3600|PENDING|MO|PEND-INFO|Pend ACS4R|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|52576377.3600|1317720|system|2020-02-13 02:00:42|0
931097|375789|1|AYT|AT-2151509210|2151509210|7|102614605|A|CTHGMI|OZF19|444006.6400|APPROVED|NULL|APPROVED|Approved|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|kg17718|NULL|NULL|0.0000|1317722|system|2020-02-13 02:00:43|0931098|375979|1|AHT|AT-2151780726|2151780726|2|102809199|I|CGMI|MILaesLF11|26312.0000|PENDING|MO|PEND-INFO|Pend ACRES|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|26312.0000|1317721|system|2020-02-13 02:00:43|0
931099|376572|1|AT|AT-2151399812|2151399812|5|102673999|I|CG2rMI|WEL44LF15|60991.6956|PENDING|MO|PEND-INFO|Pend ACERS|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|0.0000|1317723|system|2020-02-13 02:00:45|0
Expected Output:
931096|376601|1|ART|AT-2151780724|2151780724|2|102809198|I|CGM44I|MIL3VF03|52576377.3600|PENDING|MO|PEND-INFO|Pend ACS4R|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|52576377.3600|1317720|system|2020-02-13 02:00:42|0
931097|375789|1|AYT|AT-2151509210|2151509210|7|102614605|A|CTHGMI|OZF19|444006.6400|APPROVED|NULL|APPROVED|Approved|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|kg17718|NULL|NULL|0.0000|1317722|system|2020-02-13 02:00:43|0
931098|375979|1|AHT|AT-2151780726|2151780726|2|102809199|I|CGMI|MILaesLF11|26312.0000|PENDING|MO|PEND-INFO|Pend ACRES|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|26312.0000|1317721|system|2020-02-13 02:00:43|0
931099|376572|1|AT|AT-2151399812|2151399812|5|102673999|I|CG2rMI|WEL44LF15|60991.6956|PENDING|MO|PEND-INFO|Pend ACERS|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|0.0000|1317723|system|2020-02-13 02:00:45|0
Basically the \n character sometimes gets lost in the data and lines get merged. Sometimes more than one line gets merged as well (the opposite also happens, but we can get to that later).
The data always has 43 |-separated columns. The last-but-one column (42nd) is always a timestamp, and the last column is usually 0 or 1.
Trying for the below approach:
If cols > 43
Split 44th column to add \n and print the remaining.
Repeat process until cols=43
echo "${curr}" | awk -F\| ' { if(NF > 43) {for(i=43;i<NF;i++) "sed '${NR}s/\(^0\)/\1\n/p' $i" }}' filename
less complex
awk 'BEGIN {FS=OFS="|"}
NF>43 {for(i=43;i<=NF;i+=42) {t=$i; $i=substr(t,1,1) ORS substr(t,2)}}1' file
931096|376601|1|ART|AT-2151780724|2151780724|2|102809198|I|CGM44I|MIL3VF03|52576377.3600|PENDING|MO|PEND-INFO|Pend ACS4R|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|52576377.3600|1317720|system|2020-02-13 02:00:42|0
931097|375789|1|AYT|AT-2151509210|2151509210|7|102614605|A|CTHGMI|OZF19|444006.6400|APPROVED|NULL|APPROVED|Approved|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|kg17718|NULL|NULL|0.0000|1317722|system|2020-02-13 02:00:43|0
931098|375979|1|AHT|AT-2151780726|2151780726|2|102809199|I|CGMI|MILaesLF11|26312.0000|PENDING|MO|PEND-INFO|Pend ACRES|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|26312.0000|1317721|system|2020-02-13 02:00:43|0
931099|376572|1|AT|AT-2151399812|2151399812|5|102673999|I|CG2rMI|WEL44LF15|60991.6956|PENDING|MO|PEND-INFO|Pend ACERS|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|0.0000|1317723|system|2020-02-13 02:00:45|0
following your spec (corrected):
If cols > 43, split the 43rd (not 44th) column to add \n and print the remaining. Repeat the process to the end (not until cols=43).
The usual way with sed: write a regex that matches a complete 43-field record, i.e. 42 | characters with anything in between, ending in a digit. Then insert a newline after the matched string.
sed 's/[0-9]\{6\}\(|[^|]*\)\{41\}|[0-9]/&\n/g ; s/\n$//'
Reading the pattern left to right: [0-9]\{6\} matches the six leading digits of the first field, \(|[^|]*\)\{41\} matches the next 41 |-delimited fields, and |[0-9] matches the 42nd pipe plus the digit of the last field. The replacement &\n appends a newline after each matched record, and the final s/\n$// removes the leftover newline at the end.
tested on repl
Or maybe match 42 pipes with anything in front and a digit:
sed 's/\([^|]*|\)\{42\}[0-9]/&\n/g ; s/\n$//'
Or match a character after 42 pipes and a digit and insert a newline in between:
sed 's/\(\([^|]*|\)\{42\}[0-9]\)\(.\)/\1\n\3/g'
Could you please try the following, written and tested with the shown samples. This solution takes care of inserting newlines even when more than one record is packed into a single line.
awk '
match($0,/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|0/){
val=substr($0,RSTART+RLENGTH)
if(val){
num=gsub(/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|0/,"&")
while(++count<num){
sub(/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|0/,"&\n")
}
}
val=count=num=""
}
1
' Input_file
You don't fully trust the source of the data: maybe it adds another | somewhere and the column count is wrong.
Another approach is to assume that you can at least trust the timestamp field.
So split the line whenever the field after the timestamp has more than one character (and split after the first character).
sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|.)(.)/\1\n\2/g' file
This might work for you (GNU sed):
sed 's/[^|]*/\n&/44;s/\(|.\)\([^|]*|\)\n/\1\n\2/;P;D' file
If there is a 44th field, insert a newline before it. Then remove that newline and insert it following the first character of the 43rd field. Print the first line, delete the first line and repeat.

Awk pattern matching on rows that have a value at specific column. No delimiter

I would like to search a file, using awk, to output rows that have a value commencing at a specific column number. E.g., I am looking for 979719 starting at column number 10:
moobaaraa979719
moobaaraa123456
moo979719123456
moobaaraa979719
moobaaraa123456
As you can see, there are no delimiters; it is a raw text data file. I would like to output rows 1 and 4, but not row 3, which contains the pattern just not at the desired column number.
awk '/979719$/' file
moobaaraa979719
moobaaraa979719
This works for the sample because a six-character match starting at column 10 necessarily ends these 15-character lines; the substr answer below anchors to the column number directly.
A simple sed approach.
$ cat file
moobaaraa979719
moobaaraa123456
moo979719123456
moobaaraa979719
moobaaraa123456
Just search for a pattern that ends with 979719 and print the line:
$ sed -n '/^.*979719$/p' file
moobaaraa979719
moobaaraa979719
This code works:
awk 'length($1) == 9' FS="979719" raw-text-file
This code sets 979719 as the field separator, and checks whether the first field has a length of 9 characters. Then prints the line (as default action).
awk 'substr($0,10,6) == 979719' file
You can drop the ,6 if you want to search from the 10th char to the end of each line.
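For completeness, a quick run against the sample file above (quoting the constant makes the string comparison explicit; unquoted 979719 also works because awk converts the substring to a number):
$ awk 'substr($0,10,6) == "979719"' file
moobaaraa979719
moobaaraa979719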

How do I check if input is one word or 2 separated by delimiter "-"

I need help with the following ksh script:
ExpResult=`echo "$LoadString" | awk -F"-" '{print NF}'=2`
MinExp=`echo "$ExpResult" | tr -s " " | sed 's/^[ ]//g'| cut -d"-" -f1`
MaxExp=`echo "$ExpResult" | tr -s " " | sed 's/^[ ]//g'| cut -d"-" -f2`
The input can come in two forms: "50-100" or "50" (for example).
I have two questions:
How do I check if the input is "one word" or two words separated by delimiter "-"?
If the input is two words, how can I separate them?
Rather than calling an external program to parse your input, you can use the shell's built-in case statement to validate the input and parameter expansion to split it, i.e.:
# set a copy/paste value for $1
set -- 50-100
case "$1" in
*-* )
range="$1"
min="${range%-*}"
max="${range#*-}"
;;
* )
singleNum="$1"
;;
esac
echo min=$min ... max=$max
output
min=50 ... max=100
Trying a non-pair value:
unset min max
set -- other values
case ...
echo min= ... max= ... singleNum=$singleNum
output
min= ... max= ... singleNum=other
Hopefully the case processing is self-explanatory, but the parameter expansion may require a little explanation.
The statement
min=${range%-*}
says remove from the right side of the expanded value (50-100) anything starting at the last - until the end of the string. This leaves the value 50 remaining.
The reverse happens with
max=${range#*-}
which says remove from the left side of the expanded value anything up to and including the first - char. This leaves the 100.
As there is only one - char in this string, you don't need to worry about the other versions: ${var##*-}, which removes everything from the left up to the last -, and the reverse ${var%%-*}, which removes everything from the first - to the end of the string.
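A quick interactive check of all four variants, using a hypothetical value with two dashes so the shortest/longest difference is visible:
$ v=10-50-100
$ echo "${v%-*} / ${v%%-*} / ${v#*-} / ${v##*-}"
10-50 / 10 / 50-100 / 100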
The fanatical minimalists will remind us that this can be done without a temporary variable, i.e.
min=${1%-*} ; max=${1#*-}
And the one-line fantasists can be satisfied with
case "$1" in *-* ) range="$1";min="${range%-*}";max="${range#*-}";;* ) singleNum="$1";;esac; echo min=$min ... max=$max .,, singleNum=$singleNum
:-)
IHTH
You can try this:
LoadString=$1
MinExp=`echo "$LoadString" | awk -F"-" '{if (NF==2) print $1}'`
MaxExp=`echo "$LoadString" | awk -F"-" '{if (NF==2) print $2}'`
echo $MinExp
echo $MaxExp
eg:
user#host:/tmp/test$ ksh test.ksh 50-100
50
100

Please explain this awk script

echo "45" | awk 'BEGIN{FS=""}{for (i=1;i<=NF;i++)x+=$i}END{print x}'
I want to know how this works. What specifically do awk's FS and NF do here?
FS is the field separator. Setting it to "" (the empty string) means that every single character becomes a separate field. So in your case you've got two fields: 4 and 5.
NF is the number of fields in a given record. In your case, that's 2. So i ranges from 1 to 2, which means that $i takes the values 4 and 5.
So this AWK script iterates over the characters and prints their sum — in this case 9.
These are built-in variables: FS is the Field Separator, and setting it blank splits out each character; NF is the Number of Fields produced by FS, so in this case the number of characters, 2. The script splits the input into characters ("4", "5"), iterates over them while adding their values up, and prints the result.
http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/
FS is the field separator. Normally fields are separated by whitespace, but when you set FS to the null string, each character of the input line is a separate field.
NF is the number of fields in the current input line. Since each character is a field, in this case it's the number of characters.
The for loop then iterates over each character on the line, adding it to x. So this is adding the value of each digit in input; for 45 it adds 4+5 and prints 9.
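For example, the same one-liner with a longer input sums every digit:
$ echo "1234" | awk 'BEGIN{FS=""}{for (i=1;i<=NF;i++)x+=$i}END{print x}'
10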