how to remove part of the string if the condition exists [closed] - awk

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 2 years ago.
I have a file similar to this:
A*01:03:05
B1*02:06:08
F2*03:01:06
R5*02:01
S1*02:08
And I would like to remove the last two numbers and the colon, but only when there are two colon separators, so it becomes:
A*01:03
B1*02:06
F2*03:01
R5*02:01
S1*02:08
The last two lines remain unchanged because they do not have two colon separators after the *.
I used sed and gsub to remove everything after the last colon, but I was not sure how to add a condition that skips lines without two colons after the *.

This might work for you (GNU sed):
sed 's/:..//2' file
This removes the second occurrence of a : followed by 2 characters.
If this is too lax, use:
sed -E 's/^([^:]*:[^:]*):[0-9]{2}/\1/' file
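The first command can be sanity-checked against the sample input, fed here via printf instead of a file:

```shell
# The second occurrence of ":" plus two characters is deleted; a line with
# a single colon has no second occurrence, so it passes through unchanged.
printf 'A*01:03:05\nR5*02:01\n' | sed 's/:..//2'
# prints:
# A*01:03
# R5*02:01
```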

With cut, you can set : as the delimiter and print only up to the first two fields:
cut -d: -f-2 ip.txt
Similar logic can be done with awk, assuming the implementation supports manipulating NF
awk 'BEGIN{FS=OFS=":"} NF==3{NF=2} 1' ip.txt
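As a quick check, the awk one-liner can be run against the sample data inline (GNU awk, mawk, and busybox awk all rebuild $0 with OFS when NF is assigned):

```shell
# NF==3 means the line has two colons; truncating NF to 2 drops the last field.
printf 'A*01:03:05\nB1*02:06:08\nR5*02:01\n' |
awk 'BEGIN{FS=OFS=":"} NF==3{NF=2} 1'
# prints:
# A*01:03
# B1*02:06
# R5*02:01
```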

This works:
$ sed -E 's/^([^:]*:[^:]*):[0-9][0-9]$/\1/' file
The [^:] means 'any character other than a :', so the deletion at the end happens only when the line has exactly two colons before the trailing digits.
This awk works too:
$ awk 'gsub(/:/,":")==2 {sub(/:[0-9][0-9]$/,"")} 1' file
Here, gsub returns the number of replacements made, so the trailing :NN is deleted only when the line contains exactly two colons.
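A quick run over two of the sample lines shows the counting at work:

```shell
# gsub() replaces ":" with itself, leaving the line unchanged while its
# return value counts the colons; only two-colon lines get trimmed.
printf 'A*01:03:05\nS1*02:08\n' |
awk 'gsub(/:/,":")==2 {sub(/:[0-9][0-9]$/,"")} 1'
# prints:
# A*01:03
# S1*02:08
```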
You can also use GNU grep (with PCRE) to only match the template of what you are looking for:
$ grep -oP '^\w+\*\d\d:\d\d' file
Or perl same way:
$ perl -lnE 'say "$1" if /(^\w+\*\d\d:\d\d)/' file

Related

Delete lines that contain only spaces [closed]

Consider following file.
Dalot
# Really empty line
Eriksen
# Line containing space
Malacia
# Line containing tab
# Really empty line
Varane
How do I remove lines that ONLY contain whitespace (spaces or tabs), while leaving truly empty lines intact? The other answers here mostly remove all blank lines, including the empty ones.
The desired output follows:
Dalot
# Really empty line
Eriksen
Malacia
# Really empty line
Varane
Using awk:
awk '/^$/ || NF' file
sed -E '/^[\t ]+$/d' file
i.e. "if the line contains only spaces and tabs, delete it".
This might work for you (GNU sed):
sed '/^\s\+$/d' file
Delete all lines that consist of one or more whitespace characters and nothing else; truly empty lines are untouched.
Alternative (extended regex):
sed -E '/^\s+$/d' file
$ awk '/^(|[^\t ]+)$/' file
or
$ sed -En '/^(|[^\t ]+)$/p' file
Print the line if it is completely empty or consists only of non-whitespace characters. (Note this also drops lines that mix whitespace with other text, which is fine for the sample data, where the names contain no spaces.)
Note that #choroba did the reverse of this logic to delete lines, which is actually smarter.
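A reproducible check of the sed approach, building the sample input inline (the space-only and tab-only lines are written explicitly with printf):

```shell
# Lines: name, empty, name, space-only, name, tab-only, empty, name.
printf 'Dalot\n\nEriksen\n \nMalacia\n\t\n\nVarane\n' |
sed -E '/^[\t ]+$/d'
# prints the names with the two truly empty lines preserved:
# Dalot
#
# Eriksen
# Malacia
#
# Varane
```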
tr alone can't do this: tr ' ' is missing its second argument, and tr only substitutes or deletes characters, it cannot drop whole lines. The same overwrite-the-file pattern works with grep, though:
grep -v '^[[:blank:]]\{1,\}$' file > f2
mv f2 file

what operations does the given code perform [closed]

I am having a shell script code like below:
awk 'NR==FNR{a[$0]; next} !($0 in a){print "Fail: "$0 " is not found"}' <(cat file3 <(grep -r names file2)) <(grep -r present file1)
Can someone explain what the awk command in the code above is doing?
This is the kind of question where you can take it apart piece by piece:
run grep -r present file1 on its own and see what it outputs
although if "file1" is truly a file and not a directory, then the -r option is useless
<(...) is a Process Substitution -- it takes the output of the script and lets you handle that as a file
Similarly, <(cat file3 <(grep -r names file2)) concatenates the contents of "file3" and the output of the grep command.
now, the awk script
awk 'NR==FNR {do something; next} some more awk code' fileA fileB is a very common awk idiom
NR == FNR means "the current record number (out of all files processed so far) is equal to the record number of the current file being processed" -- this can only happen for the first file in the list
so, do something only for the first file, because next won't allow the "some more awk code" to be reached.
Without showing us the contents of the files, there's not much more to say. If you were to show the inputs and output, we can help you understand exactly why you see the results you see.
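As a minimal, self-contained version of the idiom (the file names and contents below are made-up stand-ins for the originals):

```shell
# First file: values to remember. Second file: values to check against them.
printf 'alpha\nbeta\n'  > /tmp/have
printf 'beta\ngamma\n' > /tmp/want
awk 'NR==FNR{a[$0]; next} !($0 in a){print "Fail: "$0" is not found"}' /tmp/have /tmp/want
# prints:
# Fail: gamma is not found
```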

Search for a string which is stretched over 2 lines [closed]

I'm searching for the string ABCDEFGH in a very large file like the one below, and I don't know at which positions the line breaks fall. My first thought was to remove all \n, but the file is over 3 GB ... I think there is a smarter way to do this (sed, awk, ...?)
efhashivia
hjasjdjasd
oqkfoflABC
DEFGHqpclq
pasdpapsda
Assuming that your search string cannot span more than 2 lines, you can use this awk:
awk -v p="ABCDEFGH" 's $0 ~ p {print NR,s $0} {s=$0}' file
Or you can paste each line with its next one, and grep the result. This way you have to create a file with double the size of your large input. (Note -d '\0' is the portable way to request an empty delimiter; plain -d '' is unspecified by POSIX.)
tail -n +2 file | paste -d '\0' file - > output.txt
> cat output.txt
efhashiviahjasjdjasd
hjasjdjasdoqkfoflABC
oqkfoflABCDEFGHqpclq
DEFGHqpclqpasdpapsda
pasdpapsda
> grep -n ABCDEFGH output.txt
3:oqkfoflABCDEFGHqpclq
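The awk sliding-window version finds the same hit without the intermediate file; the NR it reports is the line where the match ends:

```shell
# s holds the previous line; s $0 concatenates it with the current line,
# so a pattern split across two lines is still matched.
printf 'efhashivia\nhjasjdjasd\noqkfoflABC\nDEFGHqpclq\npasdpapsda\n' |
awk -v p="ABCDEFGH" 's $0 ~ p {print NR, s $0} {s=$0}'
# prints:
# 4 oqkfoflABCDEFGHqpclq
```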

Combine 2 Conditions of AWK in 1 set of forward slashes [closed]

I have a file where I need only the 18th column, and that 18th column must not contain any of some 30 words like
AAA, BBB, CCC, etc.
Sample file
$ cat a.csv
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,Aaa
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,BBB
awk -F, '!($18 ~ /AAA/) && !($18 ~ /BBB/) {print $18 }'
Is it possible to write something like
awk -F, '!($18 ~ /AAA, BBB /) {print $18 }'
EDIT
If I use
i=$("AAA|BBB")
awk -F, '!($18 ~ /$i/) {print $18 }'
it produces a command not found error (the $( ) tries to run "AAA|BBB" as a command).
You could use the alternation operator | and use something like
awk -F',' '$18 !~ /AAA|BBB|CCC/{print $18}' a.csv
If you want to simply strip lines where a field is one of a set of blacklist entries, you can create the blacklist once in the BEGIN section, then simply use ~ to see if that blacklist contains your field.
Perhaps the easiest way to do this is to construct the blacklist using the input field separator (so you know it won't be part of the field). With an input.csv file of:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,AAA
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,BBB
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,CCC
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,DDD
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,EEE
Let's say you don't want lines where field 18 is AAA, BBB or DDD:
pax> awk -F, 'BEGIN{ss=",AAA,BBB,DDD,"}ss!~","$18","{print $18}' input.csv
CCC
EEE
Below, we'll break down how it works:
BEGIN {
ss=",AAA,BBB,DDD," # This is the blacklist, note IFS separator and start/end.
}
ss !~ ","$18"," { # If ",<column 18>," not in blacklist, print.
print $18
}
The trick is to create a string which is the column we're checking surrounded by the delimiters (which cannot be in the column). If we find that in the blacklist (which is every unwanted item surrounded by the delimiter), we can discard it.
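The wrapping is what prevents substring false positives; as an illustrative sketch (the field value AA below is made up, not from the question):

```shell
# "AA" is not blacklisted, but it IS a substring of the entry "AAA";
# wrapping the field in delimiters before the lookup avoids the false match.
awk 'BEGIN{
  ss  = ",AAA,BBB,DDD,"
  fld = "AA"
  print ((ss ~ fld)       ? "bare: match"    : "bare: no match")
  print ((ss ~ ","fld",") ? "wrapped: match" : "wrapped: no match")
}'
# prints:
# bare: match
# wrapped: no match
```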
Note that you're not restricted to a fixed blacklist (either in your string or if you decide to use a regex solution), you can if you wish read the entries from a file and dynamically construct the list. For example, consider the file blacklist.txt:
AAA
BBB
DDD
and the input.csv file as shown above. The following awk command dynamically creates the blacklist from that file:
pax> awk -F, 'BEGIN{ss=","}NR==FNR{ss=ss""$1",";next}ss!~","$18","{print}' blacklist.txt input.csv
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,CCC
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,EEE
Again, breaking it down:
BEGIN {
ss = "," # Start blacklist.
}
NR==FNR { # Only true for first file in list (blacklist).
ss = ss""$1"," # Extend blacklist.
next # Go get next line.
}
ss !~ ","$18"," { # Only get here for second file (input).
print
}
Here, we process the first file to construct the blacklist (rather than having a fixed one). Lines in the second file are treated as per my original script above.

Using AWK or SED to prepend a single_quote to each line in a file [duplicate]

This question already has answers here:
How can I prepend a string to the beginning of each line in a file?
(9 answers)
Closed 3 years ago.
I am working with ffmpeg and want to automate making one big video from many smaller ones. I can use a list file, but each line must be file 'name.ext'. I can't figure out how to get sed or awk NOT to treat the ' as a control character. Is there any way to do that?
I have tried using a variable instead of the string file ', and tried a two-statement script where I set file # and then use another command to change the # to ', but it fails every time:
awk '{print "line #" $0}' uselessInfo.txt >goofy2.txt
sed '/#/\'/g' goofy2.txt >goofy3.txt
I also tried the sed line with double quotes around the '.
Neither sed nor awk are seeing ' as a control character. In fact they aren't seeing the ' at all in the code you posted - the shell doesn't allow you to use single quotes inside a single quote delimited script.
Your question isn't clear. Is this what you're trying to do?
$ echo 'foo' | awk '{print "file \047" $0 "\047"}'
file 'foo'
$ echo 'foo' | sed 's/.*/file '\''&'\''/'
file 'foo'
$ echo 'foo' | sed "s/.*/file '&'/"
file 'foo'
If not then edit your question to clarify and provide a concrete example we can test against.
If you just want to add a single quote at the front of each line:
> cat test.txt
a
b
c
> sed "s/^/'/" test.txt
'a
'b
'c
You can then output this to whatever file you wish as in your example.
The solution to your problem, I believe, lies in the fact that characters within single quotes on the command line are not interpreted. This means that when you try to escape a quote inside a single-quoted string, it does not work and you just get a backslash in your string. Compare this to a string bound by double quotes, where the backslash is interpreted before being passed to echo as an argument:
> echo '\'
\
> echo "\""
"
Another useful trick is defining the single quote as a variable:
$ echo "part 1" | awk -v q="'" '{print "line " q $0 q}'
line 'part 1'
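Putting it together for the original ffmpeg concat-list use case (the clip names here are invented):

```shell
# q holds a literal single quote, so no escaping is needed inside the script.
printf 'intro.mp4\nmain.mp4\n' |
awk -v q="'" '{print "file " q $0 q}'
# prints:
# file 'intro.mp4'
# file 'main.mp4'
```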