AWK print between two characters - awk

When I try this command:
/usr/bin/curl -s sketch*.zip "https://www.sketch.com/downloads/mac/" |\
grep 'download.sketchapp.com/sketch-' | awk 'NR==1{print $3}'
The output is:
content="0;URL='https://download.sketchapp.com/sketch-68.2-102594.zip
what I am looking to get is:
68.2
Any help would be appreciated.

It seems you want to extract the number after your pattern, only for the first matching row. You can use a single grep command:
... | grep -oPm1 '(?<=download\.sketchapp\.com/sketch-)[^-]+'
or, since the value you want is in the 3rd field of the first matching row, you can use a single awk command (split that field on hyphens into an array and print the middle element):
awk '/download\.sketchapp\.com\/sketch-/ {split($3,a,"-"); print a[2]; exit}'

Using sed:
/usr/bin/curl -s sketch*.zip "https://www.sketch.com/downloads/mac/" | \
sed -n 's!.*download.sketchapp.com/sketch-\([^-]*\).*!\1!p;' | \
head -1
head -1 keeps only the first match. The sed command extracts the run of non-hyphen characters that follows download.sketchapp.com/sketch-.
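To try the extraction without hitting the network, here is a minimal sketch that feeds the sample output line from the question through the same sed expression. The function name extract_version is made up for illustration, and the dots are escaped so they match literally.

```shell
# Sample line copied from the question's output (no network access needed).
line="content=\"0;URL='https://download.sketchapp.com/sketch-68.2-102594.zip"

# extract_version is a hypothetical helper name: it prints the text between
# "sketch-" and the next hyphen, for the first matching input line only.
extract_version() {
  sed -n 's!.*download\.sketchapp\.com/sketch-\([^-]*\)-.*!\1!p' | head -n 1
}

printf '%s\n' "$line" | extract_version
```

In real use the function would sit at the end of the curl pipeline in place of the grep and awk stages.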

Related

Is using awk at least 'awk -F' always will be fine?

What is the difference on Ubuntu between awk and awk -F? For example to display the frequency of the cpu core 0 we use the command
cat /proc/cpuinfo | grep -i "^ cpu MHz" | awk -F ":" '{print $ 2}' | head -1
But why does it use awk -F? We could use awk without the -F and it would still run, of course (already tested).
Because -F tells awk which field separator to use when it splits each line, and without it $2 would not be the value we want. When -F is omitted, awk falls back to its default separator, which is any run of spaces or tabs. For example, in ps | grep xeyes | awk '{print $1}' awk splits on whitespace and prints the first field, the PID of the xeyes process. I found this at https://www.shellunix.com/awk.html. Thanks to all.
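A small sketch of the difference, assuming a made-up line in the style of /proc/cpuinfo (the frequency value is invented):

```shell
# Hypothetical sample line; the real file's spacing and value will differ.
line='cpu MHz   : 2400.000'

# Default separator: runs of blanks, so field 2 is the word "MHz".
default_field=$(printf '%s\n' "$line" | awk '{print $2}')

# -F ':' splits on the colon, so field 2 is everything after it
# (including the leading space).
colon_field=$(printf '%s\n' "$line" | awk -F ':' '{print $2}')

printf 'default: [%s]\n' "$default_field"
printf 'colon:   [%s]\n' "$colon_field"
```

So both forms run without error, but only the -F ':' form returns the frequency.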

I need to extract the ID from Google Drive URLs with sed, gawk or grep

URLs:
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
I need a single regex for all of these URLs.
This is what I tried to use but didn't get expected results.
sed -E 's/.*\(folders\)?\(id\)?=?\/?(.*)&?.*/\1/'
Expected results:
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
With your own code updated:
$ cat file
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
$ sed -E 's#.*(folders/|id=)([^?&]+).*#\2#' file
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
$ sed -E 's#.*(folders/|id=)([^?&]+).*#\2#' file | uniq
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
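Your own attempt can be adapted to sed -E 's#.*(folders/|id=)(.*)(\?|&|$).*#\2#' on GNU sed, but the greedy (.*) runs to the end of the line, so it keeps trailing parameters such as &export=download. The negated class [^?&]+ used above avoids that.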
You are using -E, so there is no need to escape the grouping parentheses (), and | means OR.
When matching a literal ?, you need to escape it.
The sed delimiter can be changed to another character, which is # here.
Note that uniq only removes adjacent duplicates. If duplicates appear in different places, use sort -u instead.
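The uniq caveat can be seen with a short sketch. The ID values idA and idB are placeholders, not real Drive IDs:

```shell
# Hypothetical list with a duplicate that is NOT adjacent.
ids=$(printf '%s\n' idA idB idA)

# uniq compares only neighbouring lines, so all three lines survive.
adjacent_only=$(printf '%s\n' "$ids" | uniq)

# sort -u sorts first, then drops every duplicate.
all_removed=$(printf '%s\n' "$ids" | sort -u)

printf 'uniq:\n%s\n' "$adjacent_only"
printf 'sort -u:\n%s\n' "$all_removed"
```

Keep in mind that sort -u also reorders the output, which matters if the original order of the URLs is significant.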
A GNU grep solution :
$ grep -Poi '(id=|folders/)\K[a-z0-9_-]*' file
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
These two also give the same results, but are stricter than the shorter sed command above because they only accept characters that can appear in an ID:
sed -E 's#.*(folders/|id=)([A-Za-z0-9_-]*).*#\2#'
sed -E 's#.*(folders/|id=)([[:alnum:]_-]*).*#\2#'
Btw, + means one or more occurrences, * means zero or more.
A GNU awk version (removes duplicates at the same time):
awk 'match($0,".*(folders/|id=)([A-Za-z0-9_-]+)",m){if(!a[m[2]]++)print m[2]}' file
Could you please try the following.
awk 'match($0,/id=[^&]*|folders\/[^?]*/){value=substr($0,RSTART,RLENGTH);gsub(/.*=|.*\//,"",value);print value}' Input_file
Try this:
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file
Explanations:
.*(id=|folders\/): after any characters(.*) followed by id= or folders/
([^&?/]*): search and capture any characters except &, ? and /
\2: using backreference, matching string is replaced with the second captured text([^&?/]*)
Edit:
To remove duplicate IDs, pipe the command to sort and then to uniq (uniq only removes adjacent duplicate lines, so the list has to be sorted first):
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file | sort | uniq
As @Tiw suggests in an edit, you can also pipe to a single command by using sort with the -u flag:
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file | sort -u
Using Perl
$ cat rohit.txt
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
$ perl -lne ' s/.*\/.*..\/(.*)$/$1/g; s/(.*id=)//g; /(.+?)(&|\?|$)/ and print $1 ' rohit.txt
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
$

Trying to print awk variable

I am not much of an awk user, but after some Googling, determined it would work best for what I am trying to do...only problem is, I can't get it to work. I'm trying to print out the contents of sudoers while inserting the server name ($i) and a comma before the sudoers entry as I'm directing it to a .csv file.
egrep '^[aA-zZ]|^[%]' //$i/etc/sudoers | awk -v var="$i" '{print "$var," $0}' | tee -a $LOG
This is the output that I get:
$var,unixpvfn ALL = (root)NOPASSWD:/usr/bin/passwd
awk: no program given
Thanks in advance
egrep is superfluous here. Just awk:
awk -v var="$i" '/^[[:alpha:]%]/{print var","$0}' //"$i"/etc/sudoers | tee -a "$LOG"
Btw, you may also use sed:
sed "/^[[:alpha:]%]/s/^/${i},/" //"$i"/etc/sudoers | tee -a "$LOG"
You can save the grep and let awk do all the work:
awk -v svr="$i" '/^[[:alpha:]%]/{print svr "," $0}' //"$i"/etc/sudoers | tee -a "$LOG"
Anything between double quotes "..." is a literal string in awk, so a variable name inside it is not expanded. Also, don't put $ before a variable name: in awk, $ selects a field (column) by number, so $var means "the field whose number is stored in var", not the variable itself.
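A minimal sketch of both forms, using the sudoers line from the question and a made-up server name web01 in place of $i:

```shell
host='web01'   # hypothetical server name standing in for $i
line='unixpvfn ALL = (root)NOPASSWD:/usr/bin/passwd'

# Inside an awk string, "$var," is literal text, as in the question.
literal=$(printf '%s\n' "$line" | awk -v var="$host" '{print "$var," $0}')

# Outside the quotes, var is the awk variable set with -v.
expanded=$(printf '%s\n' "$line" | awk -v var="$host" '{print var "," $0}')

printf '%s\n' "$literal" "$expanded"
```

The first line printed reproduces the unwanted output from the question, and the second carries the server name.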

awk to transpose lines of a text file

A .csv file that has lines like this:
20111205 010016287,1.236220,1.236440
It needs to read like this:
20111205 01:00:16.287,1.236220,1.236440
How do I do this in awk? Experimenting, I got this far. I need to do it in two passes I think. One sub to read the date&time field, and the next to change it.
awk -F, '{print;x=$1;sub(/.*=/,"",$1);}' data.csv
Use this awk command:
echo "20111205 010016287,1.236220,1.236440" | \
awk -F'[ ,]' '{printf "%s %s:%s:%s.%s,%s,%s\n",
$1,substr($2,1,2),substr($2,3,2),substr($2,5,2),substr($2,7,3),$3,$4}'
Explanation:
-F'[ ,]': sets the field delimiter to either a space or a comma
printf "%s %s:%s:%s.%s,%s,%s\n": formats the output
substr($2,1,2), substr($2,3,2), ...: cut the second field ($2) into the desired pieces
Or use this sed command:
echo "20111205 010016287,1.236220,1.236440" | \
sed 's/\([0-9]\{8\}\) \([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1 \2:\3:\4.\5/g'
Explanation:
[0-9]\{8\}: first match an 8-digit pattern and save it as \1
[0-9]\{2\}...: after a space, match a 2-digit pattern three times and save the matches as \2, \3 and \4
[0-9]\{3\}: finally, match a 3-digit pattern and save it as \5
\1 \2:\3:\4.\5: format the output
sed is better suited to this job since it's a simple substitution on single lines:
$ sed -r 's/( ..)(..)(..)/\1:\2:\3./' file
20111205 01:00:16.287,1.236220,1.236440
but if you prefer here's GNU awk with gensub():
$ awk '{print gensub(/( ..)(..)(..)/,"\\1:\\2:\\3.","")}' file
20111205 01:00:16.287,1.236220,1.236440
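gensub() and sed -r are GNU extensions. If those are not available, a rough sketch along the same lines with plain awk could look like this. The function name fix_ts is invented for illustration, and it assumes the first comma-separated field always has the form "YYYYMMDD HHMMSSmmm".

```shell
# fix_ts is a hypothetical helper: it rewrites the time part of field 1
# from HHMMSSmmm to HH:MM:SS.mmm and leaves the other fields untouched.
fix_ts() {
  awk -F, -v OFS=, '{
    split($1, p, " ")                      # p[1] = date, p[2] = time digits
    t = p[2]
    $1 = p[1] " " substr(t,1,2) ":" substr(t,3,2) ":" substr(t,5,2) "." substr(t,7)
    print
  }' "$@"
}

printf '%s\n' '20111205 010016287,1.236220,1.236440' | fix_ts
```

Called as fix_ts data.csv it reads the file instead of standard input.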

Convert bash line to use in perl

How would I go about converting the following bash line into perl? Could I run the system() command, or is there a better way? I'm looking for perl to print out access per day from my apache access_log file.
In bash:
awk '{print $4}' /etc/httpd/logs/access_log | cut -d: -f1 | uniq -c
Prints the following:
632 [27/Apr/2014
156 [28/Apr/2014
awk '{print $4}' /etc/httpd/logs/access_log | cut -d: -f1 | uniq -c
perl -lane'
($val) = split /:/, $F[3]; # First colon-separated elem of the 4th field
++$c{$val}; # Increment number of occurrences of val
END { print for map { "$c{$_} $_" } keys %c } # Print results in no order
' access.log
Switches:
-l automatically appends a newline to the print statement.
-l also removes the newlines from lines read by -n (and -p).
-a splits the line on whitespace into the array @F.
-n loops over the lines of the input but does not print each line.
-e executes the given script body.
Your original command translated to a Perl one-liner:
perl -lane '($k) = $F[3] =~ /^(.*?):/; $h{$k}++ }{ print "$h{$_}\t$_" for keys %h' /etc/httpd/logs/access_log
You can change all your commands to one from:
awk '{print $4}' /etc/httpd/logs/access_log | cut -d: -f1 | uniq -c
to
awk '{split($4,a,":");b[a[1]]++} END {for (i in b) print b[i],i}' /etc/httpd/logs/access_log
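To check the single-awk version without a real log, here is a sketch run on three invented log lines. The addresses, times and request paths are made up, and count_per_day is a hypothetical wrapper name. The trailing sort is only there to make the output order predictable, since for (... in ...) visits array keys in no particular order.

```shell
# Hypothetical access-log lines in Apache common log format (made-up data).
sample_log='10.0.0.1 - - [27/Apr/2014:10:00:01 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [27/Apr/2014:11:30:45 +0000] "GET /a HTTP/1.1" 200 128
10.0.0.1 - - [28/Apr/2014:09:15:00 +0000] "GET / HTTP/1.1" 304 0'

# Count requests per day: field 4 is "[dd/Mon/yyyy:hh:mm:ss", and the part
# before the first colon is the day key.
count_per_day() {
  awk '{split($4, a, ":"); count[a[1]]++}
       END {for (day in count) print count[day], day}' "$@" | LC_ALL=C sort -k2
}

printf '%s\n' "$sample_log" | count_per_day
```

Unlike uniq -c, which counts only adjacent repeats, the array counts each day across the whole file, so the two give the same numbers only when the log is in chronological order.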