I want to clean up a list of names and email addresses using sed [closed]

I am trying to replace the email prefix with the full first name using sed: drop the initial before the dot and replace it with the lowercased first name taken from the beginning of the line.
Antony Clark a.clark#zzz123.co ZZZ
Caroline Foster c.foster#zzz123.co ZZZ
E.g. a.clark#zzz123.co will become antony.clark#zzz123.co
Thanks!

It's not glamorous, but awk with printf to preserve your column formatting can do it:
$ awk '{sub(/^[^.]+/,tolower($1),$3); printf "%-16s%-16s%-28s%s\n",$1,$2,$3,$4}' emails
Antony          Clark           antony.clark#zzz123.co      ZZZ
Caroline        Foster          caroline.foster#zzz123.co   ZZZ
If you simply want single space separated output, then that reduces to:
$ awk '{sub(/^[^.]+/,tolower($1),$3)}1' emails
Antony Clark antony.clark#zzz123.co ZZZ
Caroline Foster caroline.foster#zzz123.co ZZZ
(you can change the output field-separator to anything you desire)
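For example, a tab-separated variant might look like this (a sketch: assigning $1=$1 forces awk to rebuild the record so the new OFS takes effect):
$ awk 'BEGIN{OFS="\t"}{sub(/^[^.]+/,tolower($1),$3); $1=$1}1' emails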
Look things over and let me know if one of those will work.
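If you do want to stay with sed as the question asks, a GNU sed sketch could be (it assumes the layout shown above of first name, surname, email and a trailing tag, plus GNU sed's \L...\E lowercase conversion in the replacement):
$ sed -E 's/^([[:alpha:]]+)([[:space:]]+[[:alpha:]]+[[:space:]]+)[^.]+\./\1\2\L\1\E./' emails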

Related

Is there any way to read the next 2 lines into the pattern space when a pattern is matched? [closed]

I have the following data file:
Name: ABC
Address: 123, 4th Street,
My County,
10009 Country.
Age: 34
Gender: Male
Name: DEF
Address: 456, Orange Street,
North State,
45678 Country.
Age: 68
Gender: Female
Basically, each record is separated by 2 empty lines.
The problem is that some "Address" fields contain 2 empty lines too (see the example above).
How can I replace the valid "2 empty lines" separator with say "#####", but leave the 2 empty lines in the "Address" field untouched (as is)?
Thanks.
I tried various "sed" commands with N and various "awk" commands, but nothing has worked so far :(
Assuming that each record ends with a Gender: line, you can do something like:
awk '{if(NR<=n)next;print $0}/Gender/{print"######";n=NR+2}' yourfile.txt
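The same logic spelled out with comments, as a sketch that again assumes every record ends with a Gender: line:
awk '
  NR <= n    { next }              # still inside the two blank separator lines: skip them
             { print }             # print every other line unchanged
  /^Gender:/ { print "######"      # after the Gender: line, emit the separator marker
               n = NR + 2 }        # and remember to skip the next two (blank) lines
' yourfile.txt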

Quantitatively replace digit (as counter) with string in sed [closed]

Let's say I have the following file:
balloons:
- 2
- 3
Each number above represents how many times I want to print the string. So, for example, I would like to process this into the following output:
balloons:
- red
- red
- blue
- blue
- blue
I only have red and blue balloons. The digits will vary from one file to another, so my search would be a simple regex such as sed -e "/[[:digit:]]\+/ perform_my_action".
Try:
awk 'BEGIN{idx[2]="red"; idx[3]="blue"}
/^-[ \t]+[0-9]+/{for(i=1;i<=$2;i++) print "-", idx[$2]; next}
1
' file
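The mapping above keys the colour off the digit itself (2 is red, 3 is blue). If the intent is instead that the first count line is always red and the second always blue, which is an assumption on my part, here is a sketch that keys off the order of the count lines:
awk '/^-[[:space:]]+[0-9]+$/ {                 # a count line such as "- 2"
       color = (++seen == 1 ? "red" : "blue")  # assumption: first count is red, second is blue
       for (i = 0; i < $2; i++) print "-", color
       next
     }
     1                                         # copy every other line through unchanged
' file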

Return not so similar codes from a single group [closed]

I have a list of product codes grouped in blocks of 2 or 3 lines. I need to return the groups where the codes are not the same (or consecutive).
9003103
9003103

9003978
9003979

9003763
9003728

9003543
9003543
9003543
In this case, only the third group should be returned:
9003763
9003728
I would harness GNU AWK for this task in the following way. Let file.txt content be
9003103
9003103

9003978
9003979

9003763
9003728

9003543
9003543
9003543
then
awk 'BEGIN{RS=""}{diff=$NF-$1;diff=diff>0?diff:-diff}diff>NF' file.txt
gives output
9003763
9003728
Explanation: I set RS to the empty string to enable paragraph mode, so every blank-line-separated block is treated as a single record. For each block I compute the absolute difference between the first and last fields; if that difference is bigger than the number of fields, the block is printed.
(tested in GNU Awk 5.0.1)
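If "not the same (or consecutive)" should instead be checked pairwise, assuming "consecutive" means adjacent codes differ by at most 1 (my reading), here is a sketch along the same paragraph-mode lines:
awk 'BEGIN{RS=""}                  # paragraph mode: each blank-line-separated block is one record
{
  ok = 1
  for (i = 2; i <= NF; i++)        # compare every code with its predecessor
    if ($i != $(i-1) && $i+0 != $(i-1)+1 && $i+0 != $(i-1)-1)
      ok = 0
  if (!ok) print                   # print blocks that are neither identical nor consecutive
}' file.txt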

Regex to match a group and ignore everything else after a pattern for Google's re2 [closed]

I am trying to do the following in my BigQuery SQL:
input: myword1 myword my 3433123 other stuff
output: myword1 myword my
input: myword 23498780000123 more stuff
output: myword
I want the output shown above based on the above input.
I need everything before the standalone number.
I tried using ^([\s\w\s]+)(?=[^\d\r\n]+\d+[^\d\r\n]+$) but re2 doesn't like it.
RE2 doesn't support the lookahead operator ?=. Hope that helps.
It seems like you want everything up to the first whitespace-separated number (note that myword1 keeps its digit). If so, you can use regexp_replace() with a raw string for the pattern:
regexp_replace(mycol, r'\s+\d.*$', '')
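The same pattern can be sanity-checked at a shell prompt with sed before wiring it into BigQuery (a sketch; sed wants POSIX character classes in place of \s and \d):
$ printf '%s\n' 'myword1 myword my 3433123 other stuff' 'myword 23498780000123 more stuff' | sed -E 's/[[:space:]]+[0-9].*$//'
myword1 myword my
myword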

Removing spaces from a URL string inside a text file [closed]

I have a very big text file (1 GB) and I see that there are a few places where the http url field contains a space.
For example, in the lines below there is a space in "brad pitt" and in "[30 wet=]". They should be changed to "bradpitt" and "[30wet=]", but such spaces can occur in any url or trim_url field. I am currently finding these places with my program and then fixing them manually in vim. Is there a way to do it with awk/sed?
0.0 q:hello url:http://sapient.com/bapper/30/brad pitt/C345/surf trim_url:http://sapient.com/bapper/30/brad pitt/C345 rating:good
0.0 q:hello url:http://sick.com/bright/[30 wet=]/sound trim_url:http://sick.com/bright/[30 wet=]rating:good
What I tried with sed was:
sed -i -e 's/*http*[:space:]*/*http*/g' test.txt
Using perl and a proper module to URI encode the URL:
perl -MURI::Escape -pe 's!(https?://)(.*)!$1 . uri_escape($2)!e' file
You can even edit the file in place with the -i switch (just like sed): perl -MURI::Escape -i -pe [...]
Output
0.0 q:hello url:http://sapient.com%2Fbapper%2F30%2Fbrad%20pitt%2FC345%2Fsurf%20trim_url%3Ahttp%3A%2F%2Fsapient.com%2Fbapper%2F30%2Fbrad%20pitt%2FC345%20rating%3Agood
0.0 q:hello url:http://sick.com%2Fbright%2F%5B30%20wet%3D%5D%2Fsound%20trim_url%3Ahttp%3A%2F%2Fsick.com%2Fbright%2F%5B30%20wet%3D%5Drating%3Agood
URI::Escape - Percent-encode and percent-decode unsafe characters
Note
As msanford said in the comments, spaces in a URL are meaningful. You can't simply cut them out without breaking the link and turning it into something that is no longer reachable.
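If, despite that caveat, you really do want to delete the spaces exactly as described in the question, here is a perl sketch. It assumes the only field labels are q:, url:, trim_url: and rating:, that everything between a URL and the next label belongs to that URL, and Perl 5.14+ for the /r substitution modifier:
perl -pe 's{(https?://\S*(?: \S*)*?)(?= (?:q|url|trim_url|rating):|$)}{$1 =~ s/ //gr}ge' test.txt
which should turn the sample lines into:
0.0 q:hello url:http://sapient.com/bapper/30/bradpitt/C345/surf trim_url:http://sapient.com/bapper/30/bradpitt/C345 rating:good
0.0 q:hello url:http://sick.com/bright/[30wet=]/sound trim_url:http://sick.com/bright/[30wet=]rating:good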