Formatting specific output from dmidecode - awk

I was searching for a means to format output from dmidecode a specific way, and I found the following article which just about does what I need
http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_27770556.html
I changed the field list in the answer's code to the fields I need. The command below uses awk to produce quoted CSV output from dmidecode:
dmidecode -t 17 | awk -F: '/Size|Locator|Speed|Manufacturer|Serial Number|Part Number/{sub(/^ */,"",$2);s=sprintf("%s,\"%s\"",s,$2)}/^Memory/{print s;s=""}END{print s}' |sed -e 's/,//' | grep -iv "no module" | tr -d ' '
"4096MB","CPU0","DIMM01","1066MHz","Samsung","754C2C33","M393B5273CH0-YH9"
I need tab-separated output without quotes:
4096MB CPU0 DIMM01 1066MHz Samsung 754C2C33 M393B5273CH0-YH9
I am still trying to get my head around awk and would appreciate anyone showing me the appropriate modifications
Fixed my code above, previously pasted non-working syntax

From the link you posted, I saved the data in a file called file.txt. I noticed that records are blank line separated. I used the following awk code:
awk 'BEGIN { FS=":"; OFS="\t" } /Size|Locator|Speed|Manufacturer|Serial Number|Part Number/ { gsub(/^[ \t]+/,"",$2); line = (line ? line OFS : "") $2 } /^$/ { print line; line="" }' file.txt
Results:
2048 MB XMM1 Not Specified 1333 MHz JEDEC ID 8106812F HMT125U6BFR8C-H9
No Module Installed XMM2 Not Specified Unknown JEDEC ID
2048 MB XMM3 Not Specified 1333 MHz JEDEC ID 7006C12F HMT125U6BFR8C-H9
No Module Installed XMM4 Not Specified Unknown JEDEC ID
4096 kB SYSTEM ROM Not Specified Unknown Not Specified Not Specified Not Specified
Your command line would now look like this:
dmidecode -t 17 | awk 'BEGIN { FS=":"; OFS="\t" } /Size|Locator|Speed|Manufacturer|Serial Number|Part Number/ { gsub(/^[ \t]+/,"",$2); line = (line ? line OFS : "") $2 } /^$/ { print line; line="" }' | grep -iv "no module"
EDIT: To also remove the space inside values that end in MB or MHz (so "2048 MB" becomes "2048MB"):
dmidecode -t 17 | awk 'BEGIN { FS=":"; OFS="\t" } /Size|Locator|Speed|Manufacturer|Serial Number|Part Number/ { if ($2 ~ /MB$|MHz$/) { gsub(/[ \t]+/,"",$2) } gsub(/^[ \t]+/,"",$2); line = (line ? line OFS : "") $2 } /^$/ { print line; line="" }' | grep -iv "no module"
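If the one-liner becomes hard to maintain, the same idea can be wrapped in a shell function. This is only a sketch under some assumptions: the name fmt_dimms is made up, the field labels are anchored to whole labels (Bank Locator is listed explicitly because the unanchored /Locator/ above matches it too), and records are assumed to be separated by blank lines as in the output above. An END rule prints the last record even when no blank line follows it, and empty slots are dropped inside awk instead of with grep.

```shell
# Hypothetical helper: reads "dmidecode -t 17" output on stdin and prints
# one tab-separated line per populated memory device.
fmt_dimms() {
    awk -F: '
        # Print the collected record unless the slot is empty, then reset.
        function emit() {
            if (line != "" && line !~ /No Module Installed/) print line
            line = ""
        }
        /^[ \t]*(Size|Locator|Bank Locator|Speed|Manufacturer|Serial Number|Part Number):/ {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)                 # trim both ends
            if (val ~ /(MB|MHz)$/) gsub(/[ \t]+/, "", val)   # "4096 MB" -> "4096MB"
            line = (line == "" ? val : line "\t" val)
        }
        /^[ \t]*$/ { emit() }   # a blank line ends a record
        END        { emit() }   # flush a final record that has no trailing blank line
    '
}
```

It would be called as: dmidecode -t 17 | fmt_dimms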

Related

Why double quote does not work in echo statement inside cmd in awk script?

gawk 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n "$2" | base64 -w 0";cmd1 | getline d1;close(cmd1); print $1,d1 }' dummy2.txt
input:
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
Expected output:
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
2|c3ViaGE6Mjt1c2VyPXBobgo=
output produced by script:
id|dummy
1|subhashree:1
2|subha:2
I understand that the quoting around $2 is causing the issue: the string is not encoded properly, and everything after the semicolon is stripped off. The same string does work inside double quotes in the terminal and gives the proper output:
echo "subhashree:1;user=phn" | base64
c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
[root@DERATVIV04 encode]# echo "subha:2;user=phn" | base64
c3ViaGE6Mjt1c2VyPXBobgo=
I have tried different variations of single and double quotes inside awk, but none of them work. Any help will be highly appreciated.
Thanks a lot in advance.
Your existing cmd1 expands to
echo -n subhashree:1;user=phn | base64 -w 0
Note the unquoted semicolon after subhashree:1.
Executing that command in a shell produces
$ echo -n subhashree:1;user=phn | base64 -w 0
subhashree:1
With quotes
$ echo -n 'subhashree:1;user=phn' | base64 -w 0
c3ViaGFzaHJlZToxO3VzZXI9cGhu
The solution is simply to quote the string: echo -n '<your-string>' | base64 -w 0
$ cat file
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
$ gawk -v q="'" 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n " q $2 q" | base64 -w 0"; cmd1 | getline d1;close(cmd1); print $1,d1 }' file
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhu
2|c3ViaGE6Mjt1c2VyPXBobg==
It can be simplified as below
gawk -v q="'" 'BEGIN {
FS=OFS="|"
}
NR==1{
print;
next
}
{
cmd1="echo -n " q $2 q" | base64 -w 0";
print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
close(cmd1);
}
' file
The getline test follows Ed Morton's recommendation (http://awk.freeshell.org/AllAboutGetline) to always check the return value:
if/while ( (getline var < file) > 0)
if/while ( (command | getline var) > 0)
if/while ( (command |& getline var) > 0)
The problem is the lack of quotes when the echo command runs in the shell. The command string you build is effectively
echo -n subhashree:1;user=phn | base64 -w 0
The shell runs this as two commands separated by ;. The first, echo -n subhashree:1, just prints subhashree:1, which is what ends up in your getline variable d1. The second, user=phn | base64 -w 0, is a variable assignment piped into base64; the assignment writes nothing to standard output, so base64 has nothing to encode.
The right fix is to quote the string:
echo -n "subhashree:1;user=phn" | base64 -w 0
When you said you were putting quotes around $2, that is not quite right: those quotes are awk syntax and only delimit the string pieces being concatenated into cmd, i.e. "echo -n ", $2 and " | base64 -w 0" are simply joined together. The double quotes need to end up in the command that the shell sees.
So with that and a few other fixes, your awk command should look like the one below. I added gsub() to remove the trailing spaces that were present in the input you showed (note that it removes all whitespace in the field), and used printf instead of echo.
awk -v FS="|" '
BEGIN {
OFS = FS
}
NR == 1 {
print
}
NR >= 2 {
gsub(/[[:space:]]+/, "", $2)
cmd = "printf \"%s\" \"" $2 "\" | base64 -w 0"
if ((cmd | getline result) > 0) {
$2 = result
}
close(cmd)
print
}
' file
With the script above, the command that actually gets executed is the following, which produces the right result.
printf "%s" "subhashree:1;user=phn" | base64 -w 0
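Double quotes in the generated command still let the shell expand $ and backticks that occur in the data, and the single-quote variant breaks on a value that itself contains a single quote. One way around both, sketched here with the made-up names encode_field2 and shq, is to wrap the value in single quotes and rewrite every embedded ' as '\''. The encoder command is assumed to be a trusted argument supplied by the caller, not data from the file.

```shell
# Hypothetical helper: replace field 2 of each data line with the output of
# an encoder command, quoting the value safely for the shell.
# $1 = encoder command (trusted, e.g. "base64 -w 0"), $2 = input file
encode_field2() {
    awk -v q="'" -v enc="$1" '
        # Wrap s in single quotes; each embedded quote becomes quote-backslash-quote-quote.
        function shq(s,    n, parts, i, out) {
            n = split(s, parts, q)
            out = parts[1]
            for (i = 2; i <= n; i++) out = out q "\\" q q parts[i]
            return q out q
        }
        BEGIN { FS = OFS = "|" }
        NR == 1 { print; next }               # header line passes through
        {
            cmd = "printf %s " shq($2) " | " enc
            if ((cmd | getline out) > 0) $2 = out
            close(cmd)
            print
        }
    ' "$2"
}
```

For the question's data it would be called as: encode_field2 'base64 -w 0' dummy2.txt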
You already got answers explaining how to use awk for this, but you should also consider not using awk for this. The tool for sequencing calls to other commands (e.g. base64) is a shell, not awk. What you're trying to do in terms of calls is:
shell { awk { loop_on_input { shell { base64 } } } }
whereas if you call base64 directly from shell it'd just be:
shell { loop_on_input { base64 } }
Note that the awk command is spawning a new subshell once per line of input while the direct call from shell isn't.
For example:
#!/usr/bin/env bash
file='dummy2.txt'
head -n 1 "$file"
while IFS='|' read -r id dummy; do
printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
done < <(tail -n +2 "$file")
Here's the difference in execution speed for an input file that has each of your data lines duplicated 100 times created by awk -v n=100 'NR==1{print; next} {for (i=1;i<=n;i++) print}' dummy2.txt > file100
$ ./tst.sh file100
Awk:
real 0m23.247s
user 0m3.755s
sys 0m10.966s
Shell:
real 0m14.512s
user 0m1.530s
sys 0m4.776s
The above timing was produced by running this script (both awk scripts posted in the answers will have about the same timing, so I just picked one at random):
#!/usr/bin/env bash
doawk() {
local file="$1"
gawk -v q="'" 'BEGIN {
FS=OFS="|"
}
NR==1{
print;
next
}
{
cmd1="echo -n " q $2 q" | base64 -w 0";
print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
close(cmd1);
}
' "$file"
}
doshell() {
local file="$1"
head -n 1 "$file"
while IFS='|' read -r id dummy; do
printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
done < <(tail -n +2 "$file")
}
# Use 3rd-run timing to eliminate caching as a factor
doawk "$1" >/dev/null
doawk "$1" >/dev/null
echo "Awk:"
time doawk "$1" >/dev/null
echo ""
doshell "$1" >/dev/null
doshell "$1" >/dev/null
echo "Shell:"
time doshell "$1" >/dev/null

Check for multi-line content in a file

I'm trying to check if a multi-line string exists in a file using common bash commands (grep, awk, ...).
I want to have a file with a few lines (plain lines, not patterns) that should exist in another file, and to create a command (or sequence of commands) that checks whether they do. If grep could accept arbitrary multiline patterns, I'd do it with something similar to
grep "`cat contentfile`" targetfile
As with grep I'd like to be able to check the exit code from the command. I'm not really interested in the output. Actually no output would be preferred since then I don't have to pipe to /dev/null.
I've searched for hints, but can't come up with a search that gives any good hits. There's How can I search for a multiline pattern in a file?, but that is about pattern matching.
I've found pcre2grep, but need to use "standard" *nix tools.
Example:
contentfile:
line 3
line 4
line 5
targetfile:
line 1
line 2
line 3
line 4
line 5
line 6
This should match and return 0 since the sequence of lines in the content file is found (in the exact same order) in the target file.
EDIT: Sorry for not being clear about the "pattern" vs. "string" comparison and the "output" vs. "exit code" in the previous versions of this question.
You didn't say whether you want a regexp match or a string match, and we can't tell: you named your search file "patternfile", and a "pattern" could mean anything. At one point you imply a string match (check if a multi-line _string_ exists), but then you use grep and pcre2grep with no options that would select a string rather than a regexp match.
In any case, these will do whatever it is you want using any awk (which includes POSIX standard awk and you said you wanted to use standard UNIX tools) in any shell on every UNIX box:
For a regexp match:
$ cat tst.awk
NR==FNR { pat = pat $0 ORS; next }
{ tgt = tgt $0 ORS }
END {
while ( match(tgt,pat) ) {
printf "%s", substr(tgt,RSTART,RLENGTH)
tgt = substr(tgt,RSTART+RLENGTH)
}
}
$ awk -f tst.awk patternfile targetfile
line 3
line 4
line 5
For a string match:
$ cat tst.awk
NR==FNR { pat = pat $0 ORS; next }
{ tgt = tgt $0 ORS }
END {
lgth = length(pat)
while ( beg = index(tgt,pat) ) {
printf "%s", substr(tgt,beg,lgth)
tgt = substr(tgt,beg+lgth)
}
}
$ awk -f tst.awk patternfile targetfile
line 3
line 4
line 5
Having said that, with GNU awk you could do the following if you're OK with a regexp match and backslash interpretation of the patternfile contents (so \t is treated as a literal tab):
$ awk -v RS="$(cat patternfile)" 'RT!=""{print RT}' targetfile
line 3
line 4
line 5
or with GNU grep:
$ grep -zo "$(cat patternfile)" targetfile | tr '\0' '\n'
line 3
line 4
line 5
There are many other options depending on what kind of match you're really trying to do and which tools versions you have available.
EDIT: Since the OP needs the outcome of the command as true or false (yes or no), here is a version that prints a message on a match (created and tested in GNU awk).
awk -v message="yes" 'FNR==NR{a[$0];next} ($0 in a){if((FNR-1)==prev){b[++k]=$0} else {delete b;k=""}} {prev=FNR}; END{if(length(b)>0){print message}}' patternfile targetfile
Could you please try the following? Tested with the given samples, it prints all consecutive lines from the pattern file that appear in the same order in the target file (this code needs a run of at least 2 consecutive lines).
awk '
FNR==NR{
a[$0]
next
}
($0 in a){
if((FNR-1)==prev){
b[++k]=$0
}
else{
delete b
k=""
}
}
{
prev=FNR
}
END{
for(j=1;j<=k;j++){
print b[j]
}
}' patternfile targetfile
Explanation: Adding explanation for above code here.
awk ' ##Starting awk program here.
FNR==NR{ ##FNR==NR will be TRUE when first Input_file is being read.
a[$0] ##Creating an array a with index $0.
next ##next will skip all further statements from here.
}
($0 in a){ ##Statements from here will be executed when 2nd Input_file is being read, checking if current line is present in array a.
if((FNR-1)==prev){ ##Checking condition if prev variable is equal to FNR-1 value then do following.
b[++k]=$0 ##Creating an array named b whose index is variable k whose value is increment by 1 each time it comes here.
}
else{ ##Mentioning else condition here.
delete b ##Deleting array b here.
k="" ##Nullifying k here.
}
}
{
prev=FNR ##Setting prev value as FNR value here.
}
END{ ##Starting END section of this awk program here.
for(j=1;j<=k;j++){ ##Starting a for loop here.
print b[j] ##Printing value of array b whose index is variable j here.
}
}' patternfile targetfile ##mentioning Input_file names here.
Another solution in awk:
echo $(awk 'FNR==NR{ a[$0]; next}{ x=($0 in a)?x+1:0 }x==length(a){ print "OK" }' patternfile targetfile )
This returns "OK" if there is a match.
A one-liner:
$ if [ $(diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | wc -l) == $(cat patternfile | wc -l) ]; then echo "ok"; else echo "error"; fi
Explanation:
The first step is to compare the two files using diff:
diff --left-column -y patternfile targetfile
> line 1
> line 2
line 3 (
line 4 (
line 5 (
> line 6
Then filter to show only the interesting lines, which are the lines marked with '(', plus one extra line before and after the match, to check that the lines in patternfile match without a break.
diff --left-column -y patternfile targetfile | grep '(' -A1 -B1
> line 2
line 3 (
line 4 (
line 5 (
> line 6
Then leave out the first, and last line:
diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1
line 3 (
line 4 (
line 5 (
Add some code to check whether the number of matching lines equals the number of lines in the patternfile:
if [ $(diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | grep '(' | wc -l) == $(cat patternfile | wc -l) ]; then echo "ok"; else echo "error"; fi
ok
To use this with a return code, a script could be created like this:
#!/bin/bash
patternfile=$1
targetfile=$2
if [ $(diff --left-column -y $patternfile $targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | grep '(' | wc -l) == $(cat $patternfile | wc -l) ];
then
exit 0;
else
exit 1;
fi
The test (when the above script is named comparepatterns):
$ comparepatterns patternfile targetfile
$ echo $?
0
The easiest way to do this is to use a sliding window. First you read the pattern file, followed by file to search.
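(FNR==NR) { a[FNR]=$0; n=FNR; next }   # first file: store the pattern lines
{ b[FNR]=$0 }                          # second file: buffer the current line
(FNR < n) { next }                     # window not full yet, nothing to compare
{ for(i=1; i<=n; ++i) if (a[i] != b[FNR-n+i]) { delete b[FNR-n+1]; next } }
{ print "match at", FNR-n+1; r=1; delete b[FNR-n+1] }
END { exit !r }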
which you call as
awk -f script.awk patternFile searchFile
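Since the question asks only for an exit code and no output, here is a minimal sketch of a plain string comparison in POSIX awk wrapped in a shell function. The name contains_block is made up, and the approach holds both files in memory, so it assumes files of modest size.

```shell
# Hypothetical helper: succeed (exit 0) if the lines of file $1 occur in
# file $2 as one contiguous block of whole lines; print nothing.
contains_block() {
    awk '
        NR == FNR { pat = pat $0 "\n"; next }   # first file: the block to find
        { tgt = tgt $0 "\n" }                   # second file: text to search
        END {
            # The leading newline anchors the block at the start of a line;
            # an empty block is treated as not found.
            exit !(pat != "" && index("\n" tgt, "\n" pat) > 0)
        }
    ' "$1" "$2"
}
```

Usage would be along the lines of: if contains_block contentfile targetfile; then echo found; fi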
Following up on a comment from Cyrus, who pointed to How to know if a text file is a subset of another, the following Python one-liner does the trick
python -c "content=open('content').read(); target=open('target').read(); exit(0 if content in target else 1);"
Unless you're talking about 10 GB+, here's an awk-based solution that's fast and clean :
mawk '{ exit NF==NR }' RS='^$' FS="${multiline_pattern}"
In the tests below, the pattern exists only in the file "${m2p}", which is part of the multi-file pipeline in the 1st test but not in the 2nd one.
This solution, for now, doesn't auto handle instances where regex meta-character escaping is needed. Alter it as you see fit.
Unless the pattern occurs far too often, it might even save time to do it all at once instead of having to check line-by-line, including saving lines along the way in some temp pattern space.
NR is always 1 there since RS is forced to the tail end of the input. NF is larger than 1 only when the pattern is found. exit NF == NR therefore returns 1 when the pattern is absent (NF == NR == 1) and 0 when it is found, which matches the POSIX convention that an exit code of 0 means success.
% echo; ( time ( \
\
echo "\n\n multi-line-pattern :: \n\n " \
"-------------\n${multiline_pattern}\n" \
" -----------\n\n " \
"$( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m2p}" \
"${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 '{ exit NF == NR
}' RS='^$' \
FS="${multiline_pattern}" \
\
) exit code : ${?} " ) ) | ecp
in0: 3.10GiB 0:00:01 [2.89GiB/s] [2.89GiB/s] [ <=> ]
( echo ; ) 0.77s user 1.74s system 110% cpu 2.281 total
multi-line-pattern ::
-------------
77138=1159=M
77138=1196=M
77138=1251=M
77138=1252=M
77138=4951=M
77138=16740=M
77138=71501=M
-----------
exit code : 0
% echo; ( time ( \
\
echo "\n\n multi-line-pattern :: \n\n " \
"-------------\n${multiline_pattern}\n" \
" -----------\n\n " \
"$( nice gcat "${m2m}" "${m3m}" "${m3l}" \
"${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 '{ exit NF == NR
}' RS='^$' \
FS="${multiline_pattern}" \
\
) exit code : ${?} " ) ) | ecp
in0: 2.95GiB 0:00:01 [2.92GiB/s] [2.92GiB/s] [ <=> ]
( echo ; ) 0.64s user 1.65s system 110% cpu 2.074 total
multi-line-pattern ::
-------------
77138=1159=M
77138=1196=M
77138=1251=M
77138=1252=M
77138=4951=M
77138=16740=M
77138=71501=M
-----------
exit code : 1
If your pattern is the full contents of a file, something like the following works. Even with the whole file as a single gigantic 153 MB pattern, it finished in less than 2.4 seconds against ~3 GB of input.
echo
( time ( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 -v pattern_file="${m2p}" '
BEGIN {
RS = "^$"
getline FS < pattern_file
close(pattern_file)
} END {
exit NF == NR }' ; echo "\n\n exit code :: $?\n\n" ))|ecp;
du -csh "${m2p}" ;
( time ( nice gcat "${m2m}" "${m3m}" "${m3l}" \
"${m2p}" "${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 -v pattern_file="${m2p}" '
BEGIN {
RS = "^$"
getline FS < pattern_file
close(pattern_file)
} END {
exit NF == NR }' ; echo "\n\n exit code :: $?\n\n" ))|ecp;
in0: 2.95GiB 0:00:01 [2.58GiB/s] [2.58GiB/s] [ <=> ]
( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m3r}" "${m3supp}" "${m3t}" | pvE 0.)
0.82s user 1.71s system 111% cpu 2.260 total
exit code :: 1
153M /Users/************/m2map_main.txt
153M total
in0: 3.10GiB 0:00:01 [2.56GiB/s] [2.56GiB/s] [ <=> ]
( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m2p}" "${m3r}" "${m3supp}" "${m3t}")
0.83s user 1.79s system 112% cpu 2.339 total
exit code :: 0
I found a portable solution using the patch command. The idea is to create a diff/patch in the remove direction and check whether it can be applied to the source file. Sadly there is no dry-run option (in my old patch version), so we have to apply the patch and remove the temporary files afterwards.
The shell part around is optimized for my ksh usage:
file_in_file() {
typeset -r vtmp=/tmp/${0%.sh}.$$.tmp
typeset -r vbasefile=$1
typeset -r vcheckfile=$2
typeset -ir vlines=$(wc -l < "$vcheckfile")
{ echo "1,${vlines}d0"; sed 's/^/< /' "$vcheckfile"; } |
patch -fns -F0 -o "$vtmp" "$vbasefile" >/dev/null 2>&1
typeset -ir vrc=$?
rm -f "$vtmp"*
return $vrc
}
Explanation:
set variables for local usage (on newer bash you should use declare instead)
count lines of input file
create a patch/diff file in-memory (the line with the curly brackets)
use patch with strict settings patch -F0
cleanup (this also removes any reject files that may have been created: rm -f "$vtmp"*)
return RC of patch

How to merge lines using awk command so that there should be specific fields in a line

I want to merge some rows in a file so that each line contains 22 fields separated by ~.
Input file looks like this.
200269~7414~0027001~VALTD~OM3500~963~~~~716~423~2523~Y~UN~~2423~223~~~~A~200423
2269~744~2701~VALD~3500~93~~~~76~423~223~Y~
UN~~243~223~~~~A~200123
209~7414~7001~VALD~OM30~963~~~
~76~23~2523~Y~UN~~223~223~~~~A~123
and so on.
The first line looks fine. The 2nd and 3rd lines need to be merged so that they become one line with 22 fields. The 4th, 5th and 6th lines should be merged, and so on.
Expected output:
200269~7414~0027001~VALTD~OM3500~963~~~~716~423~2523~Y~UN~~2423~223~~~~A~200423
2269~744~2701~VALD~3500~93~~~~76~423~223~Y~UN~~243~223~~~~A~200123
209~7414~7001~VALD~OM30~963~~~~76~23~2523~Y~UN~~223~223~~~~A~123
The file has 10 GB data but the code I wrote (used while loop) is taking too much time to execute . How to solve this problem using awk/sed command?
Code Used:
IFS=$'\n'
set -f
while read line
do
count_tild=`echo $line | grep -o '~' | wc -l`
if [ $count_tild == 21 ]
then
echo $line
else
checkLine
fi
done < file.txt
function checkLine
{
current_line=$line
read line1
next_line=$line1
new_line=`echo "$current_line$next_line"`
count_tild_mod=`echo $new_line | grep -o '~' | wc -l`
if [ $count_tild_mod == 21 ]
then
echo "$new_line"
else
line=$new_line
checkLine
fi
}
Using only the shell for this is slow, error-prone, and frustrating. Try Awk instead.
awk -F '~' 'NF==1 { next } # Hack; see below
NF<22 {
i=1
if(a) f[a]=f[a] $(i++) # a continuation line first completes the pending field
for(; i<=NF; i++) f[++a]=$i }
a==22 {
for(i=1; i<=a; ++i) printf "%s%s", f[i], (i==22 ? "\n" : "~")
a=0 }
NF==22
END {
if(a) for(i=1; i<=a; i++) printf "%s%s", f[i], (i==a ? "\n" : "~") }' file.txt>file.new
This assumes that consecutive lines with too few fields will always add up to exactly 22 when you merge them. You might want to check this assumption (or perhaps accept this answer and ask a new question with more and better details). Or maybe just add something like
a>22 {
print FILENAME ":" FNR ": Too many fields " a >"/dev/stderr"
exit 1 }
The NF==1 block is a hack to bypass the weirdness of the completely empty line 5 in your sample.
Your attempt contained multiple errors and inefficiencies; for a start, try http://shellcheck.net/ to diagnose many of them.
$ cat tst.awk
BEGIN { FS="~" }
{
sub(/^[0-9]+\./,"")
gsub(/[[:space:]]+/,"")
$0 = prev $0
if ( NF == 22 ) {
print ++cnt "." $0
prev = ""
}
else {
prev = $0
}
}
$ awk -f tst.awk file
1.200269~7414~0027001~VALTD~OM3500~963~~~~716~423~2523~Y~UN~~2423~223~~~~A~200423
2.2269~744~2701~VALD~3500~93~~~~76~423~223~Y~UN~~243~223~~~~A~200123
3.209~7414~7001~VALD~OM30~963~~~~76~23~2523~Y~UN~~223~223~~~~A~123
The assumption above is that you never have more than 22 fields on 1 line nor do you exceed 22 in any concatenation of the contiguous lines that are each less than 22 fields, just like you show in your sample input.
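To make that assumption checkable, here is a minimal sketch that concatenates broken lines in the same way but fails loudly when a record overshoots or the input ends in the middle of a record. The function name merge_records and the optional second argument for the field count are made up for illustration; the sketch does not handle the numbered-line prefix or strip whitespace.

```shell
# Hypothetical helper: join broken lines of file $1 until each record has
# the wanted number of "~"-separated fields ($2, default 22).
merge_records() {
    awk -F '~' -v want="${2:-22}" '
        { $0 = prev $0 }                          # glue the pending fragment on
        NF == want { print; prev = ""; next }     # record complete
        NF >  want { bad = 1; exit }              # overshoot: give up
        { prev = $0 }                             # still incomplete, keep reading
        END {
            if (bad || prev != "") {
                print "merge_records: bad record ending near input line " NR > "/dev/stderr"
                exit 1
            }
        }
    ' "$1"
}
```

For the sample input it would be called as: merge_records file.txt > file.new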
You can try this awk
awk '
BEGIN {
FS=OFS="~"
}
{
while(NF<22) {
if(NF==0)
break
a=$0
getline
$0=a$0
}
if(NF!=0)
print
}
' infile
or this sed
sed -E '
:A
s/((.*~){21})([^~]*)/\1\3/
tB
N
bA
:B
s/\n//g
' infile

Split file into good and bad data

I have a file file1.txt the data is like below
HDR|2016-10-24
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE|1
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME
TRL|11
Now I want to create two files, good and bad. Good should contain the lines that have exactly 29 separators. Lines with fewer or more than 29 separators (the separator is a pipe) should go into the bad file.
IN_FILE=$1
FNAME=`echo $IN_FILE | cut -d"." -f1 | awk '{$1 = substr($1, 1, 26)} 1'`
DFNAME=$FNAME"_Data.txt"
DGFNAME=$FNAME"_Good.txt"
DBFNAME=$FNAME"_Bad.txt"
TFNAME=$FNAME"_Trl.txt"
cat $IN_FILE | awk -v DGFNM="$DGFNAME" -v DBFNM="$DBFNAME" '
{ {FS="|"}
split($0, chars, "|")
if(chars[1]=="DTL")
{
NSEP=`awk -F\| '{print NF}'`
if [ "$NSEP" = "29" ]
then
print substr($0,5) >> DGFNM
else
print $0 >> DBFNM
fi
}
}'
But I am getting the following error:
awk: cmd. line:5: NSEP=`awk -F\| {print
awk: cmd. line:5: ^ invalid char '`' in expression
Looks like you want:
awk -F'|' -v DGFNM="$DGFNAME" -v DBFNM="$DBFNAME" '
$1 == "DTL" {
if (NF == 29) {
print substr($0, 5) > DGFNM
} else {
print > DBFNM
}
}
' "$IN_FILE"
Your code has two main problems:
it uses shell syntax (such as `....` and [ ... ]) inside an awk script, which is not supported.
it performs operations explicitly that awk performs implicitly by default.
Also:
it is best to avoid all-uppercase variable names - both in the shell and in awk scripts - because they can conflict with reserved variables.
As @tripleee points out in a comment, you can pass filenames directly to Awk (as in the above code); there is no need for cat and a pipeline.
In essence:
$ awk -F\| 'NF==30 {print > "good.txt"; next}{print > "bad.txt"}' file1.txt
29 separators means 30 fields, just check the NF.
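The two answers use different thresholds (NF == 29 and NF == 30); which one is right depends on whether the "29" in the question counts fields or separators, and the complete DTL lines in the sample appear to have 29 fields. The sketch below sidesteps the question by taking the expected separator count as an argument and comparing it with NF - 1. The function name split_dtl and its argument order are made up for illustration.

```shell
# Hypothetical helper: route DTL lines of file $1 to file $3 when they have
# exactly $2 pipe separators, and to file $4 otherwise.
split_dtl() {
    awk -F '|' -v want="$2" -v good="$3" -v bad="$4" '
        $1 != "DTL" { next }                    # skip header (HDR) and trailer (TRL) lines
        NF - 1 == want { print > good; next }   # separators = fields - 1
        { print > bad }
    ' "$1"
}
```

For example: split_dtl file1.txt 28 good.txt bad.txt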

grep between multiple pattern

Here is a (real-world) text:
<tr>
randomtext
ip_(45.54.58.85)
randomtext..
port(randomtext45)
randomtext random...
</tr>
<tr>
randomtext ran
ip_(5.55.45.8)
randomtext4
port(other$_text_other_length444)
</tr>
<tr>
randomtext
random
port(other$text52)
</tr>
output should be:
45.54.58.85 45
5.55.45.8 444
I know how to grep 45.54.58.85 and 5.55.45.8
awk 'BEGIN{ RS="<tr>"}1' file | grep -oP '(?<=ip_\()[^)]*'
how to grep port taking into account that we have random text/length after port( ?
I put a third record that should not appear in the output as there is no ip
Using GNU Awk:
gawk 'BEGIN { RS = "<tr>" } match($0, /.*^ip_[(]([^)]+).*^port[(].*[^0-9]+([0-9]+)[)].*/, a) { print a[1], a[2] }' your_file
And another that's compatible with any Awk:
awk -F '[()]' '$1 == "<tr>" { i = 0 } $1 == "ip_" { i = $2 } $1 == "port" && i { sub(/.*[^0-9]/, "", $2); if (length($2)) print i, $2 }' your_file
Output:
45.54.58.85 45
5.55.45.8 444
Through GNU awk , grep and paste.
$ awk 'BEGIN{ RS="<tr>"}/ip_/{print;}' file | grep -oP 'ip_\(\K[^)]*|port\(\D*\K\d+' | paste - -
45.54.58.85 45
5.55.45.8 444
Explanation:
awk 'BEGIN{ RS="<tr>"}/ip_/{print;}' file with the Record Separator value as <tr>, this awk command prints only the record which contains the string ip_
ip_\(\K[^)]* prints only the text that follows ip_( up to the next ) symbol. \K in the pattern discards the previously matched characters.
| Logical OR symbol.
port\(\D*\K\d+ prints only the number inside the port() string.
paste - - combine every two lines.
Here is another awk
awk -F"[()]" '/^ip/ {ip=$2;f=NR} f && NR==f+2 {n=split($2,a,"[a-z]+");print ip,a[n]}' file
45.54.58.85 45
5.55.45.8 444
How it works:
awk -F"[()]" ' # Set field separator to "(" or ")"
/^ip/ { # If line starts with "ip" do
ip=$2 # Set "ip" to field $2
f=NR} # Set "f" to line number
f && NR==f+2 { # Go two line down and
n=split($2,a,"[a-z]+") # Split second part to get port
print ip,a[n] # Print "ip" and "port"
}' file # Read the file
With any modern awk:
$ awk -F'[()]' '
$1=="ip_" { ip=$2 }
$1=="port" { sub(/.*[^[:digit:]]/,"",$2); port=$2 }
$1=="</tr>" { if (ip) print ip, port; ip="" }
' file
45.54.58.85 45
5.55.45.8 444
Couldn't be much simpler and clearer IMHO.
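A variant of the same idea, sketched as a shell function with the made-up name extract_ip_port: it resets both values at every <tr>, so a record without an ip or without a numeric port suffix prints nothing. Like the scripts above, it assumes the tags and the ip_( / port( lines start at the beginning of a line.

```shell
# Hypothetical helper: print "ip port" for every <tr> record that has both.
# Reads the named files, or stdin when no file is given.
extract_ip_port() {
    awk -F '[()]' '
        $1 == "<tr>"  { ip = port = "" }                        # new record
        $1 == "ip_"   { ip = $2 }
        $1 == "port"  { port = $2; sub(/.*[^0-9]/, "", port) }  # keep trailing digits only
        $1 == "</tr>" { if (ip != "" && port != "") print ip, port }
    ' "$@"
}
```

It would be called as: extract_ip_port file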