Sed and count with awk

I need to split a text file into sliding windows and count occurrences of "0/0" in each segment.
For example, if I have a file of 20 lines and the window size is 10, the commands are as follows:
sed -n '1,11p' input.txt | grep -c "0/0"
sed -n '2,12p' input.txt | grep -c "0/0"
sed -n '3,13p' input.txt | grep -c "0/0"
.
.
.
sed -n '8,18p' input.txt | grep -c "0/0"
sed -n '9,19p' input.txt | grep -c "0/0"
But if I have a large file, this method won't help me do the same. Is there any way to automate this?

awk -v k=11 -v str="0/0" '{
    # record whether this line contains str and add it to the running count
    cnt += found[NR%k] = index($0, str) >= 1
}
NR >= k {
    # a full window is available: print its line range and match count
    print 1+NR-k "-" NR, cnt+0
    # subtract the line that falls out of the window before the next row
    cnt -= found[(NR+1)%k]
}' file
Here k is the window size; k=11 matches the 11-line ranges in the question (e.g. 1,11p). Each output line prints a line-number range and how many of those lines contained the string str (matched with index() to avoid regex interpretation).
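For instance, with a window of k=3 over five made-up sample lines (this data is not from the question, just a sanity check):

$ printf '%s\n' '0/0' '0/1' '0/0' '1/1' '0/0' |
  awk -v k=3 -v str="0/0" '{cnt += found[NR%k] = index($0,str)>=1}
  NR>=k {print 1+NR-k "-" NR, cnt+0; cnt -= found[(NR+1)%k]}'
1-3 2
2-4 1
3-5 2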


AWK print between two characters

When I try this command:
/usr/bin/curl -s sketch*.zip "https://www.sketch.com/downloads/mac/" |\
grep 'download.sketchapp.com/sketch-' | awk 'NR==1{print $3}'
The output is:
content="0;URL='https://download.sketchapp.com/sketch-68.2-102594.zip
what I am looking to get is:
68.2
Any help would be appreciated.
It seems you want to extract the number after your pattern, only for the first matching row. You can use one grep command:
... | grep -oPm1 '(?<=download.sketchapp.com/sketch-)[^-]+'
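For example, against the output line shown in the question:

$ echo "content=\"0;URL='https://download.sketchapp.com/sketch-68.2-102594.zip" | \
  grep -oPm1 '(?<=download.sketchapp.com/sketch-)[^-]+'
68.2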
or, as what you want is in the 3rd field of the first matching row of the curl output, you can use one awk command (split the field on hyphens into an array and print the middle element; exit stops after the first match):
awk '/download\.sketchapp\.com\/sketch-/{split($3,a,"-"); print a[2]; exit}'
Using sed:
/usr/bin/curl -s sketch*.zip "https://www.sketch.com/downloads/mac/" | \
sed -n 's!.*download.sketchapp.com/sketch-\([^-]*\).*!\1!p;' | \
head -1
head gets rid of all but the first match; the sed command extracts the run of non-hyphen characters after download.sketchapp.com/sketch-.
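A quick check of the sed extraction on that same line from the question:

$ echo "content=\"0;URL='https://download.sketchapp.com/sketch-68.2-102594.zip" | \
  sed -n 's!.*download.sketchapp.com/sketch-\([^-]*\).*!\1!p' | head -1
68.2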

sed replace text between commas

I have csv files in which f needs to be changed to 0 and t to 1, but only between commas, in every csv that matches. From:
,t,t,f,f,a,t,f,t,f,f,t,f,
tftf
to:
,1,1,0,0,a,1,0,1,0,0,1,0,
tftf
It works this way, but I want to know a better way that would reduce the time the replacement takes:
for i in 1 2 3 4 5 6
do
echo "converting tables for mariaDB"
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,t\,/\,1\,/g'
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,f\,/\,0\,/g'
echo "$i time(s) changed "
done
I expect a single command to change the line.
Could you please try the following. It is not a perfect solution, but it is the simplest one to use in case you don't have a recent gawk with the -i inplace edit option.
for file in *.csv
do
    awk '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' "$file" > temp && mv temp "$file"
done
OR
for file in *.csv
do
    awk -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' "$file" > temp && mv temp "$file"
done
2nd solution: using a recent gawk, where we can save the edit into the input file itself.
gawk -i inplace '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' *.csv
OR
gawk -i inplace -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' Input_file
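For example, on the sample lines from the question, the field-loop variant only touches fields strictly between the first and last comma, so the bare tftf line is left alone:

$ printf '%s\n' ',t,t,f,f,a,t,f,t,f,f,t,f,' 'tftf' | \
  awk -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1'
,1,1,0,0,a,1,0,1,0,0,1,0,
tftf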
The main problem, in this case, is that regular-expression matches do not overlap when substituting with sed 's/ere/str/g' or awk '{gsub(ere,str,$0)}'. This comment nicely explains how you can circumvent this in sed using the t<label> command, which means: if a change happened to the pattern space, branch to <label>. The comment shows a generic way of doing it. The awk alternative to this rule would be:
$ awk '{while (match($0, ere)) gsub(ere, str)} 1'
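For the OP's data, that t<label> loop might be sketched in sed as follows (the ta branch repeats both substitutions until a pass changes nothing):

$ echo ',t,t,f,f,a,t,f,t,f,f,t,f,' | \
  sed -e ':a' -e 's/,t,/,1,/g' -e 's/,f,/,0,/g' -e 'ta'
,1,1,0,0,a,1,0,1,0,0,1,0,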
An alternative sed solution in the case of the OP's example could use the following idea:
duplicate all commas. Since we are searching for strings of the form ",t,", this duplication avoids overlapping matches in s///g.
since no overlap is now possible, replace all ",f," with ",0," and all ",t," with ",1,".
revert the duplicated commas. As matches again do not overlap, a sequence like ,,,, is nicely converted to ,, and not to ,.
In POSIX sed this looks like:
$ sed -e 's/,/,,/g' -e 's/,f,/,0,/g' \
-e 's/,t,/,1,/g' -e 's/,,/,/g' file > file.tmp
$ mv file.tmp file
With GNU sed we can do it in one go:
$ sed -i 's/,/,,/g;s/,f,/,0,/g;s/,t,/,1,/g;s/,,/,/g' file
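A quick check on the sample line from the question:

$ echo ',t,t,f,f,a,t,f,t,f,f,t,f,' | sed 's/,/,,/g;s/,f,/,0,/g;s/,t,/,1,/g;s/,,/,/g'
,1,1,0,0,a,1,0,1,0,0,1,0,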
With awk, this would look like:
$ awk 'BEGIN{FS=",";OFS=FS FS}
{$1=$1;gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(OFS,FS)}1' file > file.tmp
$ mv file.tmp file

Using variable in sed as an output of previous command

Below is the output of my 'COMMAND' command. The output format is FILE:LINENO:PATTERN. I want to
capture the values from the output below in different variables and use them in the 'sed' command mentioned at the bottom.
<COMMAND>
./core.pkglist:16:package linux-release 6Server 9.0.3
./core.pkglist:18:package release-server 6Server 6.9.0.4.0.1.el6
./core.pkglist:32:package upstart 0.6.5 16.el6
./core.pkglist:33:package libnih 1.0.1 7.el6
I want to run sed with inputs taken from the output of the above command, like:
sed "${var1}s/$var2/$c/" "$var3"
so that it effectively runs the command below:
sed "16s/9.0.3/$c/" core.pkglist
1) The value 16 above should come from a variable like:
var1=$(COMMAND | awk -F':' '{print $2}')
2) 9.0.3 should come from a variable:
var2=$(COMMAND | awk '{print $4}')
3) core.pkglist should come from a variable:
var3=$(COMMAND | awk -F':' '{print $1}')
4) $c is the output of another command.
v3=$(echo Hello world)  # silly examples of variables set from command output
v1=$(wc -l <<< "$v3")
v2=$(awk '{print $1}' <<< "$v3")
c=$(echo Good morning)
sed "${v1}s/$v2/$c/" <<< "$v3"  # use double quotes instead of single quotes
# echo "$v3" | sed "${v1}s/$v2/$c/" works as well
Returns:
Good morning world
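Wired up to the question's FILE:LINENO:PATTERN output, the same idea could look like this sketch (COMMAND is the question's placeholder, the $c value here is hypothetical, the version is assumed to be the last space-separated word of the matched text, and -i is GNU sed's in-place option):

c="6.9.0.5"   # hypothetical replacement value from some other command
COMMAND | while IFS=: read -r file lineno match; do
    ver=${match##* }                      # last word, e.g. 9.0.3
    # note: the dots in $ver are regex metacharacters, so this also
    # matches slight variants of the version string
    sed -i "${lineno}s/$ver/$c/" "$file"  # GNU sed in-place edit
done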

How do I correctly retrieve, using bash's cut, the first field from a line with only 1 field in a text file?

In a text file (accounts.txt) with (financial) accounts, the sub-accounts are, and need to be, separated by underscores, like this:
assets
assets_hh
assets_hh_reimbursements
assets_hh_reimbursements_ff
... etc.
Now I want to get specific sub-accounts from specific line numbers, e.g.:
field 3 from line 4:
$ lnr=4; fnr=3
$ cut -d $'\n' -f "$lnr" < accounts.txt | cut -d _ -f "$fnr"
reimbursements
$
But both fnr=1 and fnr=2 return the same thing for the first line, which has only 1 field:
$ cut -d $'\n' -f 1 < accounts.txt | cut -d _ -f "$fnr"
assets
$
which is undesired behaviour.
Now I can get around this by prefixing an underscore to each account and add 1 to each required field number, but this is not an elegant solution.
Am I doing something wrong and/or can this be changed by issuing a different retrieval command?
Using cut -d $'\n' -f "$lnr" to get the lnr-th line from the file is somewhat strange. A more common approach is sed, like:
sed -n "${lnr}p" file | cmd ...
However, awk is better here: a single invocation can handle both lnr and fnr.
file=accounts.txt
lnr=1
fnr=2
awk -F_ -v l=$lnr -v f=$fnr 'NR==l{print $f}' "$file"
The above, for all lnr/fnr combinations, produces:
line                          field1  field2  field3          field4
--------------------------------------------------------------------
assets                        assets
assets_hh                     assets  hh
assets_hh_reimbursements      assets  hh      reimbursements
assets_hh_reimbursements_ff   assets  hh      reimbursements  ff
Check the solution below:
cat f
assets
assets_hh
assets_hh_reimbursements
assets_hh_reimbursements_ff
Based on your comment, try the commands below:
$ lnr=1; fnr=2
$ echo $lnr $fnr
1 2
$ awk -v lnr=$lnr -v fnr=$fnr -F'_' 'NR==lnr {print $fnr}' f
### Output is nothing, as line 1 column 2 is blank when FS="_"
$ lnr=4;fnr=1
$ echo $lnr $fnr
4 1
$ awk -v lnr=$lnr -v fnr=$fnr -F'_' 'NR==lnr {print $fnr}' f
assets
$ lnr=4;fnr=3
$ echo $lnr $fnr
4 3
$ awk -v lnr=$lnr -v fnr=$fnr -F'_' 'NR==lnr {print $fnr}' f
reimbursements
One solution is to head|tail and read into an array so it's easier to work with the items:
lnr=4
fnr=2
IFS=_ read -r -a arr < <(head -n "$lnr" accounts.txt | tail -n 1)
# note that the array is 0-indexed, so the field number has to fit that
echo "${arr[$fnr]}"
Then you could expand the idea into a more usable function:
get_field_from_file() {
local fname="$1"
local lnr="$2"
local fnr="$3"
IFS=_ read -r -a arr < <(head -n "$lnr" "$fname" | tail -n 1)
if (( fnr >= ${#arr[@]} )); then
return 1
else
echo "${arr[$fnr]}"
fi
}
field=$(get_field_from_file "accounts.txt" "4" "2") || echo "no such line or field"
[[ -n $field ]] && echo "field: $field"

awk capability / cut capability

I am using the following ssh command to get a list of ids. Now I want to
get only the ids greater than a given number; let's say "231219" in this case. How can I incorporate that?
I also have a local file "ids_ignore.txt"; any id we put in this list should be ignored by the command.
Can awk or cut do the above?
ssh -p 29418 company.com gerrit query --commit-message --files --current-patch-set \
status:open project:platform/code branch:master |
grep refs | cut -f4 -d'/'
OUTPUT:
231222
231221
231220
231219
230084
229092
228673
228635
227877
227759
226138
226118
225817
225815
225246
223554
223527
223452
223447
226137
... | awk '$1 > max' max=231219 | grep -v -F -f ids_ignore.txt
Or, if you want to do it all with awk:
... | awk 'NR==FNR{ no[$1]++ }
NR!=FNR && $1 > max && ! no[$1]' max=NNN ids_ignore.txt -
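For instance, assuming ids_ignore.txt contains 231220, feeding the first few ids through the all-awk version with max=231219 prints only the ids that pass both tests:

$ printf '%s\n' 231222 231221 231220 231219 | \
  awk 'NR==FNR{ no[$1]++ }
       NR!=FNR && $1 > max && ! no[$1]' max=231219 ids_ignore.txt -
231222
231221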
cut cannot do numeric comparisons on input fields; it is just a simple field-extraction tool. awk, however, can do the work of both grep and cut:
ssh -p 29418 company.com gerrit ... |
awk -F/ -v min=231219 '
NR == FNR {ignore[$1]; next}
/refs/ && $4>min && !($4 in ignore) {print $4}
' ids_ignore.txt -
The trailing - is important at the end of the awk command: it tells awk to read from stdin after it reads the ids_ignore file.
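As a self-contained check, assuming refs.txt holds lines shaped like gerrit's refs/changes/... refs and ids_ignore.txt again contains 231220:

$ cat refs.txt
  ref: refs/changes/22/231222/1
  ref: refs/changes/20/231220/2
  ref: refs/changes/19/231219/3
$ cat ids_ignore.txt
231220
$ awk -F/ -v min=231219 '
  NR == FNR {ignore[$1]; next}
  /refs/ && $4>min && !($4 in ignore) {print $4}
' ids_ignore.txt refs.txt
231222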