awk keep only the first and last value on comma-separated field - awk

Hi, I am trying to keep only the first and last value of a comma-separated field in my data. This is what my input data looks like:
a 1 y 1,2,4,3,6,2,1
b 2 y 3,56,3,2,1
c 3 n 4,3,2,1,4
I just want to keep the first and last value in the 4th column of my data, so that my data would look like this:
a 1 y 1,1
b 2 y 3,1
c 3 n 4,4
Can anyone help me with how to do this? Thank you

Try this:
awk -F, -vOFS=, '{print $1,$NF}' input.txt
-F, input field separator
-vOFS=, output field separator
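$1 the 1st field (with -F, this is everything before the first comma, e.g. "a 1 y 1")
$NF the last field (the text after the last comma)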

Try this awk command:
awk '{size = split($4,numbers,",")} {print $1" "$2" "$3" "numbers[1]","numbers[size]}'
This splits the fourth field into an array, saves the size as size, prints the first 3 fields, then the first and last elements of the numbers array.
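A minimal sketch of the same split idea, written as a shell function around POSIX awk. The function name first_last is hypothetical, and the sketch assumes the comma-separated list is always the 4th whitespace-separated field. It rewrites $4 in place, so a field holding a single value simply yields that value twice.

```shell
# Sketch: keep only the first and last comma-separated value of field 4.
# first_last is a hypothetical helper; it reads stdin or the files passed to it.
first_last() {
  awk '{
    n = split($4, v, ",")   # n = number of values in the 4th field
    $4 = v[1] "," v[n]      # with a single value, first and last coincide
    print                   # assigning to $4 rebuilds the record with single spaces
  }' "$@"
}

printf 'a 1 y 1,2,4,3,6,2,1\nb 2 y 3,56,3,2,1\nc 3 n 4,3,2,1,4\n' | first_last
```

Because the record is rebuilt after the assignment, runs of spaces or tabs between the original fields are replaced by single spaces.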

awk -F, '{ printf "%s,%s\n", $1, $NF}' should do the job!

If your other fields can contain commas:
$ awk '{sub(/,.*,/,",",$NF)}1' file
a 1 y 1,1
b 2 y 3,1
c 3 n 4,4
If not:
$ awk '{sub(/,.*,/,",")}1' file
a 1 y 1,1
b 2 y 3,1
c 3 n 4,4

Related

how to keep newline(s) when selecting a given column with awk

Suppose I have a file like this (disclaimer: the size is not fixed; I can have more than 7 rows and more than 4 columns)
R H A 23
S E A 45
T E A 34
U   A 35
Y T A 35
O E A 353
J G B 23
I want the output to contain the second column wherever the third column is A, while keeping the whitespace character when the second column is empty.
output should be:
HEE TE
I tried this:
awk '{if ($3=="A") print $2}' file | awk 'BEGIN{ORS = ""}{print $1}'
But this gives:
HEETE%
Which has a weird % and is missing the space.
You may use this gnu-awk solution using FIELDWIDTHS:
awk 'BEGIN{ FIELDWIDTHS = "1 1 1 1 1 1 *" } $5 == "A" {s = s $3}
END {print s}' file
HEE TE
awk splits each record using the width values provided in the FIELDWIDTHS variable.
1 1 1 1 1 1 * means each of the first 6 columns is a single character wide and the remaining text goes into the 7th column. Since there is a space after each value, $2, $4 and $6 will each hold a single space, and $1, $3 and $5 will hold the values from the input.
$5 == "A" {s = s $3}: here we check whether $5 is A and, if so, append the value of $3 to a variable s. In the END block we just print s.
Without fixed-width parsing, awk would treat the A in the 4th row as $2.
Alternatively, if we let the trailing space be part of each column value, use:
awk '
BEGIN{ FIELDWIDTHS = "2 2 2 *" }
$3 == "A " {s = s substr($2,1,1)}
END {print s}
' file
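FIELDWIDTHS is a GNU awk extension. As a hedged alternative for other awks, the same fixed-width idea can be sketched with substr, assuming (as in the sample) that every column is one character wide and followed by one space, so column 2 sits at character 3 and column 3 at character 5. The function name second_if_a is hypothetical.

```shell
# Sketch: collect column 2 (character 3) of every line whose column 3 (character 5) is "A".
# second_if_a is a hypothetical helper; the character offsets are assumptions about the layout.
second_if_a() {
  awk 'substr($0, 5, 1) == "A" { s = s substr($0, 3, 1) }   # an empty column contributes a space
       END { print s }' "$@"
}

printf 'R H A 23\nS E A 45\nT E A 34\nU   A 35\nY T A 35\nO E A 353\nJ G B 23\n' | second_if_a
```

With the sample input (and the empty second column on the U line stored as spaces) this prints HEE TE.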

Breakline after matching pattern with awk

I have this kind of file
A 1,2,3,4
B 1
C 1,2
I would like to get with awk this output :
A 1
A 2
A 3
A 4
B 1
C 1
C 2
C 3
tried code:
sed 's/,/\n&/g' file
Any idea with awk?
Could you please try the following, which uses multiple field separators. Written and tested with the shown samples in GNU awk.
awk 'BEGIN{FS="[ ,]"} {for(i=2;i<=NF;i++){print $1,$i}}' Input_file
2nd solution: split the value of the 2nd field.
awk '{num=split($2,arr,",");for(i=1;i<=num;i++){print $1,arr[i]}}' Input_file
Hmm:
$ awk '{gsub(/,/,ORS $1 OFS)}1' file
Output:
A 1
A 2
A 3
A 4
B 1
C 1
C 2
And if you really want THAT output from THAT input, you need to add END{print "C 3"} in the end...
Edit: Please see @EdMorton's comment for a pitfall.
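One pitfall of the gsub one-liner (whether or not it is the one the comment refers to): in the replacement string of sub and gsub, an & stands for the matched text, so a first field containing & would be altered. A split-based sketch avoids building a replacement string at all. The function name expand_rows is hypothetical, and the input line A&B 1,2 is made up for illustration.

```shell
# Sketch: print one row per comma-separated value of field 2, keeping field 1 untouched.
# expand_rows is a hypothetical helper; it reads stdin or the files passed to it.
expand_rows() {
  awk '{
    n = split($2, v, ",")                     # values of the comma-separated 2nd field
    for (i = 1; i <= n; i++) print $1, v[i]   # one output row per value
  }' "$@"
}

printf 'A&B 1,2\n' | expand_rows
```

For this input the gsub version would turn the second row into A,B 2, because the & in the key is replaced by the matched comma.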

awk with empty field in columns

Here is my file.dat
1   A 1 4
2     2 4
3   4 4
3     7 B
1   U 2
Running awk '{print $2}' file.dat gives:
A
2
4
7
U
But I would like to keep the empty field:
A
4
U
How to do it?
I must add that:
between columns 1 and 2 the field separator is 3 whitespace characters
between columns 2 and 3, and between columns 3 and 4, the field separator is one whitespace character
So in column 2 there are 2 fields missing (lines 2 and 4) and in column 4
there are also 2 fields missing (lines 3 and 5)
If this isn't all you need:
$ awk -F'[ ]' '{print $4}' file
A
4
U
then edit your question to provide a more truly representative example and clearer requirements.
If the input is fixed-width columns, you can use substr to extract the slice you want. I have assumed that you want a single character at index 5:
awk '{ print(substr($0,5,1)) }' file
Your awk code is missing field separators.
Your example file doesn't clearly show what that field separator is.
From observation your file appears to have 5 columns.
You need to determine what your field separator is first.
This example code expects \t which means <TAB> as the field separator.
awk -F'\t' '{print $3}' OFS='\t' file.dat
This outputs the 3rd column from the file. -F'\t' sets the input field separator and OFS='\t' sets the output field separator.
A
4
U
For GNU awk. It processes the file twice. On the first pass it examines all records to find the character positions that contain only spaces, and treats each continuous run of such positions as a separator, building up the FIELDWIDTHS variable. On the second pass it uses that variable for fixed-width processing of the data.
The a[i] entries get the values 0/1, and h (header) for this input will be 100010101, which leads to FIELDWIDTHS="4 2 2 1":
1   A 1 4
2     2 4
3   4 4
3     7 B
1   U 2
|   | | |
100010101 - while(match(h,/10*/))
\  /|/|/|
 4   2 2 1
Script:
$ awk '
NR==FNR {
    for(i=1;i<=length;i++)                               # all record chars
        a[i]=((a[i]!~/^(0|)$/) || substr($0,i,1)!=" ")   # keep track of all space places
    if(--i>m)
        m=i                                              # max record length...
    next
}
BEGINFILE {
    if(NR!=0) {                                          # only do this once
        for(i=1;i<=m;i++)                                # ... used here
            h=h a[i]                                     # h=100010101
        while(match(h,/10*/)) {                          # build FIELDWIDTHS
            FIELDWIDTHS=FIELDWIDTHS " " RLENGTH          # qnd
            h=substr(h,RSTART+RLENGTH)
        }
    }
}
{
    print $2                                             # and output
}' file file
And output:
A
4
U
You need to trim off the space from the fields, though.
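A hedged sketch of that trimming step, using substr in POSIX awk rather than FIELDWIDTHS. It assumes the 4 2 2 1 layout derived above, so the second column is the 2 characters starting at position 5. The function name col2_trimmed is hypothetical.

```shell
# Sketch: print the second fixed-width column with surrounding spaces removed.
# col2_trimmed is a hypothetical helper; offset 5 and width 2 are assumptions about the layout.
col2_trimmed() {
  awk '{
    f = substr($0, 5, 2)     # second column: 2 characters starting at position 5
    gsub(/^ +| +$/, "", f)   # strip leading and trailing spaces
    print f                  # an empty column prints as an empty line
  }' "$@"
}

printf '1   A 1 4\n2     2 4\n3   4 4\n3     7 B\n1   U 2\n' | col2_trimmed
```

Lines whose second column is empty produce an empty output line, which keeps the output aligned row for row with the input.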

How to print all columns after matching on key field

How can I join all fields of each row from both files after matching on a key field? How to generalize this one-liner if the number of fields is unknown in f2?
f2:
a 1 2
b 3 4
c 5 6
f3:
10 a x y z
11 g x y z
12 j x y z
observed:
a 10 x y z
a1 10 x y z
Desired:
a 1 2 10 x y z
These are my best attempts but are incorrect:
awk 'FNR==NR{a[$1]=$2;next} ($2 in a) {print a[$2],$0}' f2.txt f3.txt > f4.txt
awk 'FNR==NR{a[$1]=$2$3;next} ($2 in a) {print a[$2],$0}' f2.txt f3.txt > f4.txt
awk 'NR==FNR{a[$1]=$0;next} ($2 in a){print a[$2],$1,$3,$4,$5}' f2.txt f3.txt > f4.txt
This saves the whole line as the value with column 1 as the key. When reading the 2nd file, it checks whether column 2 is in array a; if it is, it prints a[$2] and the remaining columns.
A shorter way (the disadvantage of this command is that there is one extra space between 10 and x):
awk 'NR==FNR{a[$1]=$0;next} ($2 in a){second=$2; $2="";print a[second],$0}' f2.txt f3.txt > f4.txt
This replaces $2 of the 2nd file with an empty string and prints the whole line $0.
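To handle an unknown number of fields without the extra space, one option is to rebuild the second file's line field by field and skip the key. This is a sketch under the same assumptions as the commands above (key in column 1 of the first file and column 2 of the second); the function name join_on_key is hypothetical.

```shell
# Sketch: join two files on a key, printing all fields of both without repeating the key.
# join_on_key is a hypothetical helper; usage: join_on_key f2.txt f3.txt
join_on_key() {
  awk 'NR == FNR { a[$1] = $0; next }     # first file: store the whole line, keyed by column 1
       ($2 in a) {
         out = a[$2]
         for (i = 1; i <= NF; i++)         # append every field of the second file
           if (i != 2) out = out OFS $i    # except the key in column 2
         print out
       }' "$@"
}
```

With the sample f2 and f3 this gives a 1 2 10 x y z, with single spaces throughout.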
If your files are sorted on the keys, as in your example, join is the tool for this task:
join -11 -22 f2.txt f3.txt
@mxttgen31: try:
awk 'FNR==NR{Q=$2;$2="";A[Q]=$0;next} ($1 in A){print $0,A[$1]}' f3 f2
Explanation of the above command:
awk 'FNR==NR{ ##### Check the condition FNR==NR. FNR and NR both count lines;
 ##### the only difference is that, when awk reads multiple files,
 ##### FNR is reset at the start of each file, while NR keeps increasing until
 ##### all files have been read. So this condition is TRUE only while the first Input_file (f3 here) is being read.
Q=$2; ##### Assign the second field's value to variable Q.
$2=""; ##### Set the second field to an empty string.
A[Q]=$0; ##### Store the current line in array A, indexed by Q (the original second field).
next} ##### next (an awk built-in keyword) skips all further statements and moves on to the next line.
($1 in A) ##### Reached only while the second Input_file is being read; checks whether $1 (first field) is an index of array A.
{print $0,A[$1]} ##### Print the current line ($0) of f2, followed by the value of array A at index $1.
' f3 f2 ##### The Input_files.

AWK Retrieve text after a certain pattern where the 1st and 2nd columns match the values in the 1st and 2nd columns in an input file

My input file (file1) looks like this:
part position col3 col4 info
part1 34 1 1 NAME=Mark;AGE=23;HEIGHT=189
part2 55 1 1 NAME=Alice;AGE=43;HEIGHT=167
part2 19 1 1 NAME=Emily;AGE=16;HEIGHT=164
part3 23 1 1 NAME=Owen;AGE=55;HEIGHT=181
part3 99 1 1 NAME=Rachel;AGE=76;HEIGHT=162
I need to retrieve the text after "NAME=" in the info column, but only if the values in the first two columns match another file (file2).
part position
part2 55
part3 23
Then only the 2nd and 4th rows will be considered and text after "NAME=" in those rows are put into the output file:
Alice
Owen
I don't need to preserve the order of the original rows, so the following output is equally valid:
Owen
Alice
My (not very good) attempt:
awk -F, 'FNR==NR {a[$1]=$5; next}; $1 in a {print a[$1]}' file1 file2
Something like,
awk -F"[ =;]" 'FNR==NR{found[$1" "$2]=$6; next} $1" "$2 in found{print found[$1" "$2]}'
Example
$ awk -F"[ =;]" 'FNR==NR{found[$1" "$2]=$6; next} $1" "$2 in found{print found[$1" "$2]}' file1 file2
Alice
Owen
What does it do?
-F"[ =;]" -F sets the field separator. Here we set it to a space, = or ;. This makes it easier to get the name from the first file without using the split function.
found[$1" "$2]=$6 This block runs only for file1. Here we save the name $6 in the associative array found, indexed by part and position.
$1" "$2 in found{print found[$1" "$2]} This is executed for the second file. It checks whether the part and position are in the array and, if so, prints the name from the array.
With GNU awk, the following does the same:
awk 'NR>1 && NR==FNR{found[$1","$2];next}\
$1","$2 in found{print gensub(/^NAME=([^;]*).*/,"\\1","1",$NF);}' file2 file1
Output
Alice
Owen
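gensub is specific to GNU awk. A hedged POSIX awk sketch of the same lookup uses match with RSTART and RLENGTH, and skips the header line of each file so the two headers are never compared. The function name names_for_keys is hypothetical, and the sketch assumes, as in the sample, that the info column is the last field and contains NAME=...; somewhere in it.

```shell
# Sketch: print the NAME value from file1 rows whose first two columns appear in file2.
# names_for_keys is a hypothetical helper; usage: names_for_keys file2 file1
names_for_keys() {
  awk 'FNR == 1 { next }                   # skip the header line of each file
       NR == FNR { want[$1, $2]; next }    # first file (file2): remember part/position pairs
       ($1, $2) in want && match($NF, /NAME=[^;]*/) {
         print substr($NF, RSTART + 5, RLENGTH - 5)   # drop the 5 characters of "NAME="
       }' "$@"
}
```

Rows are printed in file1 order, which for the sample data gives Alice and then Owen.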