How to replace character from middle of a spool file? - sql

I have a table with the attributes below:
NAME COUNTRY CONTINENT ADD1 ADD2 ADD3 ADD4 ADD5 PINCODE
-----------------------------------------------------------
Adam USA NA NYC NY xxxxxx
Rakesh INDIA ASIA MUMBAI MH yyyyyy
Paul UK EU LONDON ENG zzzzzz
From this I have created a spool file, file.txt, on Linux, which holds the values below:
file.txt
Adam|USA|NA|NYC|NY||||xxxxxx
Rakesh|INDIA|ASIA|MUMBAI||MH|||yyyyyy
Paul|UK|EU|LONDON|ENG||||zzzzzz
This spool file is processed in a loop, line by line.
For every line I want to store the required output in a variable l_addresses.
Thus, echo "$l_addresses" should give the required output for every line.
Required Output
NYC NY "" "" ""
MUMBAI "" MH "" ""
LONDON ENG "" "" ""

Using awk:
$ awk -F\| '{                  # set field separator
      for(i=4;i<=8;i++)        # loop wanted fields
          printf "%s%s",($i==""?"\"\"":$i),(i==8?ORS:OFS)  # replace nulls and delims
  }' file
Output:
NYC NY "" "" ""
MUMBAI "" MH "" ""
LONDON ENG "" "" ""
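If the goal is to capture each line's result in a shell variable, as the question describes, a minimal sketch (assuming bash and the file.txt layout above, with the five address fields in columns 4-8) could look like this:

```shell
#!/usr/bin/env bash
# Read file.txt record by record and build l_addresses for each line.
while IFS= read -r line; do
    l_addresses=$(printf '%s\n' "$line" | awk -F'|' '{
        for (i = 4; i <= 8; i++)    # ADD1..ADD5 are fields 4-8
            printf "%s%s", ($i == "" ? "\"\"" : $i), (i == 8 ? ORS : OFS)
    }')
    echo "$l_addresses"
done < file.txt
```

Running one awk per line is slower than the single-pass awk above, but it makes the per-line value available to the rest of the shell loop.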


Using AWK for best match replace

I have two files:
operators.txt # includes Country_code and Country_name
49 Germany
43 Austria
32 Belgium
33 France
traffic.txt # MSISDN and VLR_address (includes Country_code prefix)
123456789 491234567
123456788 432569874
123456787 333256987
123456789 431238523
I need to replace the VLR_address in traffic.txt file with Country_name from the first file.
The following awk command does that:
awk 'NR==FNR{a[$1]=$2;next} {print $1,a[$2]}' <(cat operators.txt) <(cat traffic.txt|awk '{print $1,substr($2,1,2)}')
123456789 Germany
123456788 Austria
123456787 France
123456789 Austria
but how do I do it when the operators file is:
49 Germany
43 Austria
32 Belgium
33 France
355 Albania
1246 Barbados
1 USA
where Country_code is not a fixed length, so in some cases a longest (best) match must apply, e.g.
124612345 shall be Barbados
122018523 shall be USA
The sample input/output you provided isn't adequate to test with, as it doesn't include the cases you later described as problematic. But if we modify it to include a representation of those later statements:
$ head operators.txt traffic.txt
==> operators.txt <==
49 Germany
43 Austria
32 Belgium
33 France
1 USA
355 Albania
1246 Barbados
==> traffic.txt <==
123456789 491234567
123456788 432569874
123456787 333256987
123456789 431238523
foo 124612345
bar 122018523
then this may be what you want:
$ cat tst.sh
#!/usr/bin/env bash
awk '
NR==FNR {
    keys[++numKeys] = $1
    map[$1] = $2
    next
}
{
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        if ( index($2,key) == 1 ) {
            $2 = map[key]
            break
        }
    }
    print
}
' <(sort -k1,1rn operators.txt) traffic.txt
$ ./tst.sh
123456789 Germany
123456788 Austria
123456787 France
123456789 Austria
foo Barbados
bar USA
You obviously need to try a substring of the correct length. (Beware, though, that for (key in array) visits keys in an unspecified order, so with overlapping prefixes like 1 and 1246 this approach does not guarantee the longest match.)
awk 'NR==FNR { a[$2] = $1; next }
     { for (name in a) {
           code = a[name]
           if ($2 ~ "^" code) { $2 = name; break } } }1' operators.txt traffic.txt
Notice how Awk itself is perfectly capable of reading files without the help of cat. You also nearly never need to pipe one Awk script into another; just refactor to put all the logic in one script.
I swapped the key and the value in the NR==FNR block, but that is more of a stylistic change.
And, as always, the final 1 is a shorthand idiom for printing all lines.
Perhaps as an optimization, pull the prefixes into a regular expression so that you can simply match on them all in one go, instead of looping over them.
awk 'NR==FNR { a[$1] = $2; regex = regex "|" $1; next }
     FNR == 1 { regex = "^(" substr(regex, 2) ")" }   # trim first "|"
     match($2, regex) { $2 = a[substr($2, 1, RLENGTH)] } 1' operators.txt traffic.txt
The use of match() to pull out the length of the matched substring is arguably a complication; I wish Awk would provide this information for a normal regex match without the use of a separate dedicated function.
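The reason the regex version does find the best match is that POSIX ERE matching is leftmost-longest: among the alternatives that match at the start of the string, the longest one wins. A quick sketch to verify this with a POSIX-compliant awk (the prefixes are taken from the operators file above):

```shell
# POSIX ERE alternation picks the longest matching alternative, so for
# 124612345 the prefix 1246 wins over 1 regardless of their order.
echo '124612345' | awk '{
    if (match($0, /^(49|43|32|33|355|1246|1)/))
        print RLENGTH, substr($0, 1, RLENGTH)
}'
```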

count the fields without empty line using awk command

safwanpaloli@hello:~/linx$ cat 1.txt
Name age address email
safwan 26 india safwanp@gmail.com
rashi 24 India rashi@gmail.com
shanif 25 India shanif@gmail.com
pradeep 25 India pradeep@gmil.com
safwanpaloli@hello:~/linx$
Display the line number with all file content
awk '{print NR,$0}'
output is
1
2 Name age address email
3 safwan 26 india safwanp@gmail.com
4 rashi 24 India rashi@gmail.com
5 shanif 25 India shanif@gmail.com
6 pradeep 25 India pradeep@gmil.com
expected result is
1 Name age address email
2 safwan 26 india safwanp@gmail.com
3 rashi 24 India rashi@gmail.com
4 shanif 25 India shanif@gmail.com
5 pradeep 25 India pradeep@gmil.com
You could check whether NF is greater than zero and use a counter variable:
$ awk 'NF{print ++c,$0}' file
Output:
1 Name age address email
2 safwan 26 india safwanp@gmail.com
...
If the first line is truly empty (i.e. no space in it) you could use nl file. It will print the empty line but not number it.
Above nl functionality with awk (empty lines output but not numbered):
$ awk '{print (NF?++c:""),$0}' file
Output:
1 Name age address email
2 safwan 26 india safwanp@gmail.com
3 rashi 24 India rashi@gmail.com
...
This prints non-empty lines, with their count. To be counted, a line must contain at least one non-whitespace character.
awk '$1!="" {print ++c,$0}'
This is similar, but only completely empty lines are skipped. E.g. a line containing nothing but a single space would still get counted.
awk '/./ {print ++c,$0}'
You can also remove empty lines with one of these greps:
grep '[^[:space:]]'
grep .
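If you only need the number of non-empty lines rather than the numbered lines themselves, the same tests work with grep -c or with a counter in awk; a small sketch:

```shell
# Count lines that contain at least one non-whitespace character.
grep -c '[^[:space:]]' file
# awk equivalent: NF is zero for empty and whitespace-only lines.
awk 'NF { c++ } END { print c+0 }' file
```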

Sed replace nth column of multiple tsv files without header

Here are multiple tsv files, where I want to add 'XX' characters only in the second column (everywhere except in the header) and save it to this same file.
Input:
$ ls
file1.tsv file2.tsv file3.tsv
$ head -n 4 file1.tsv
a b c
James England 25
Brian France 41
Maria France 18
Output wanted:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I tried this, but the result is not kept in the file, and a simple redirection won't work:
# this works, but doesn't save the changes
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed "s|^|X${i}_|"
i=$((i+1))
done
# adding '-i' option to sed: this throws an error but would be perfect (sed no input files error)
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed -i "s|^|T${i}_|"
i=$((i+1))
done
Some help would be appreciated.
The second column is particularly easy because you simply replace the first occurrence of the separator.
for file in *.tsv; do
    sed -i '2,$s/\t/\tX1_/' "$file"
done
If your sed doesn't recognize the symbol \t, use a literal tab (in many shells, you type it with Ctrl-V followed by Tab). On *BSD (and hence macOS) you need -i ''.
AWK solution (GNU Awk 4.1+, whose -i inplace extension edits the file in place):
awk -i inplace 'BEGIN { FS=OFS="\t" } NR!=1 { $2 = "X1_" $2 } 1' file1.tsv
Input:
a b c
James England 25
Brian France 41
Maria France 18
Output:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
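The answers above hardcode X1_; to get the per-file X1_, X2_, ... prefixes the question's loop was aiming for, one portable sketch (plain POSIX awk writing through a temp file, so no GNU -i inplace is needed) is:

```shell
i=1
for f in *.tsv; do
    # Prefix column 2 of every non-header row with X<i>_ for the i-th file.
    awk -v p="X${i}_" 'BEGIN { FS = OFS = "\t" } NR != 1 { $2 = p $2 } 1' \
        "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    i=$((i+1))
done
```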

Awk Scripting printf ignoring my sort command

I am trying to run a script I have set up, but when I try to sort the contents and display the text, the content is printed while the sort command is ignored. I used the command below with awk, and the sort is ignored, but I am not sure why.
Command I tried:
sort -t, -k4 -k3 | awk -F, '{printf "%-18s %-27s %-15s %s\n", $1, $2, $3, $4 }' c_list.txt
The output I am getting is:
Jim Girv 199 pathway rd Orlando FL
Megan Rios 205 highwind dr Sacremento CA
Tyler Scott 303 cross st Saint James NY
Tim Harding 1150 Washton ave Pasadena CA
The output I need is:
Tim Harding 1150 Washton ave Pasadena CA
Megan Rios 205 highwind dr Sacremento CA
Jim Girv 199 pathway rd Orlando FL
Tyler Scott 303 cross st Saint James NY
It just ignores the sort command but still prints the info I need in the format from the file.
I need it to sort first on the fourth field (the state), then on the third field (the town), and then display the information.
An example where each field is separated by a comma.
Field 1 Field 2 Field 3 Field 4
Jim Girv, 199 pathway rd, Orlando, FL
The problem is that you're doing sort | awk 'script' file instead of sort file | awk 'script', so sort is sorting nothing and consequently producing no output, while awk operates on your original file and produces output from that. You should also have noticed that your sort command was hanging for lack of input; that would have been worth mentioning in your question.
To demonstrate:
$ cat file
c
b
a
$ sort | awk '1' file
c
b
a
$ sort file | awk '1'
a
b
c
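Applied to the question's command, the only change needed is giving the file to sort instead of awk:

```shell
sort -t, -k4 -k3 c_list.txt |
    awk -F, '{ printf "%-18s %-27s %-15s %s\n", $1, $2, $3, $4 }'
```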

How to replace a string in awk containing a linefeed?

I want to process a text file using awk to replace "\n1-0" with " 1-0".
This is wrong:
awk '{gsub("\n1-0", " 1-0"); print}' temp.txt >$TARGET
How can this be done? Would sed be a better choice?
You could set the record separator to nothing (paragraph mode). With the default RS, awk reads one line per record, so $0 never contains a newline and the gsub above can never match; with RS= each block of non-blank lines (here, the whole file) becomes one record:
$ cat file
tbname id department
xyz 20 cic
1-0 xyz 21 csp
xyz 22 cpz
abc 25 cis
abc 26 cta
abc 27 tec
$ awk -v RS= '{gsub("\n1-0", " 1-0")}1' file
tbname id department
xyz 20 cic 1-0 xyz 21 csp
xyz 22 cpz
abc 25 cis
abc 26 cta
abc 27 tec
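As for the second part of the question: sed is also line-oriented by default, so it has the same problem; you would need to pull the following lines into the pattern space before the newline can be matched. A GNU sed sketch:

```shell
# N appends the next line to the pattern space; the :a / $!ba loop slurps
# the whole file so embedded newlines become matchable.
sed ':a; N; $!ba; s/\n1-0/ 1-0/g' temp.txt
```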