script to read a file with many columns into a file with one column [duplicate] - awk

For example, if I have a file as follows:
1 2 3 4 5 6
7 8 9 10 11 12
And I want to reorganize this file as:
1
2
3
4
5
6
7
8
9
10
11
12
Can I use the awk command for that or not?

There are multiple ways to achieve this.
With grep:
grep -oE "[0-9]+" file
The -o flag prints only the matched text (the numbers, in this case), one match per line.
-E enables extended regular expressions.
With awk:
awk 'OFS="\n"{$1=$1}1' file
OFS defines the output field separator.
$1=$1: because we changed OFS, the line has to be rebuilt before the new separator takes effect. Assigning the first field to itself forces that rebuild.
1: a condition that is always true, so the default action (print the line) runs for every record.
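The same idea can be written with the separator set once in a BEGIN block, which some people find easier to read. This is only a sketch; the function name one_per_line is made up for illustration.

```shell
# Hypothetical helper: print every whitespace-separated field on its own line.
one_per_line() {
    awk 'BEGIN { OFS = "\n" }   # set the output separator once, before any input
         { $1 = $1 }            # touch a field so awk rebuilds $0 using OFS
         1                      # always-true pattern: print the rebuilt record
    ' "$@"
}

printf '1 2 3 4 5 6\n7 8 9 10 11 12\n' | one_per_line
```

Note that an empty input line still comes out as an empty line, since there are no fields to rebuild.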

With sed:
TMP$ sed -r 's/ +/\n/g' File
1
2
3
4
5
6
7
8
9
10
11
12
This replaces each run of consecutive spaces with a newline.
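The -r flag and \n in the replacement are GNU sed features. If GNU sed is not available, something along these lines with tr should give the same result. It is a sketch, and fields_to_lines is just a name picked for the example.

```shell
# Hypothetical helper: turn spaces and tabs into newlines, then squeeze
# repeated newlines so runs of blanks do not produce empty lines.
fields_to_lines() {
    tr -s ' \t' '\n\n'
}

printf '1 2  3\n4\t5\n' | fields_to_lines
```

One difference from the sed version: a line that starts with blanks produces an empty line at the very top of the output, and empty lines in the middle of the input are squeezed away.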

The naive AWK approach:
#!/usr/bin/awk -f
{ for (i = 1; i <= NF; i++) print $i; }
Chaos's approach is probably more efficient.

Related

read columns from several file and print them in individual columns

I have several text files, each containing several columns of numbers, e.g.:
5 10 6
6 20 1
7 30 4
8 40 3
9 23 1
4 13 6
I want to collect the second column of every file, each in its own column. I used this code; it works, but it prints all the second columns in a single column.
{awk '{print $3}' > outfile}
How can I print each column in an individual one?
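$ awk '{a[FNR]=(FNR in a)?a[FNR] OFS $2:$2}
END {for(i=1;i<=FNR;i++) print a[i]}' file1 file2 ... > outfile
The END loop runs up to FNR (the line count of the last file), not NR, which is the total across all files and would print extra empty lines. This assumes all files have the same number of lines; otherwise the alignment will be off.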
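If the files can differ in length, one option is to remember which file each value came from and pad the gaps. A rough sketch follows; the function name collect_col2 and the "-" placeholder are arbitrary choices, not part of the answer above.

```shell
# Hypothetical helper: print column 2 of each file side by side,
# padding missing rows with "-" so the columns stay aligned.
collect_col2() {
    awk '
        FNR == 1 { nfiles++ }                   # first line of a new input file
        {
            cell[FNR, nfiles] = $2              # store by (row, file number)
            if (FNR > maxrow) maxrow = FNR      # track the longest file
        }
        END {
            for (r = 1; r <= maxrow; r++) {
                line = ""
                for (f = 1; f <= nfiles; f++) {
                    v = ((r, f) in cell) ? cell[r, f] : "-"
                    line = line (f > 1 ? OFS : "") v
                }
                print line
            }
        }
    ' "$@"
}

# Usage: collect_col2 file1 file2 ... > outfile
```

Like the one-liner, this keeps every second-column value in memory, and a completely empty file is skipped rather than given a column of its own.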

Apply a sed command to every column of a specific row

I have a tab separated file:
samplename1/filename1 anotherthing/anotherfile asdfgh/hjklñ
2 3 4
5 6 7
I am trying to remove everything after the / just in the header of the file using sed:
sed 's/[/].*//' samplenames.txt
How can I do this for each column of the file? Right now I am removing everything after the first /, but I want to remove only the part after the / within each column.
Actual output:
samplename1
2 3 4
5 6 7
Desired output:
samplename1 anotherthing asdfgh
2 3 4
5 6 7
With GNU sed, you may use
sed -i '1 s,/[^[:space:]]*,,g' samplenames.txt
With FreeBSD sed, you need to add '' after -i.
The -i option makes sed edit the file in place. The 1 means only the first line of the file is modified.
The s,/[^[:space:]]*,,g command removes every occurrence of / together with the 0 or more non-whitespace characters that follow it.
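To try the substitution without touching the file, drop -i and let sed write to standard output. A small sketch, where strip_header_suffixes is a made-up name:

```shell
# Hypothetical helper: apply the substitution to line 1 only and print
# the result, leaving the input untouched (no -i).
strip_header_suffixes() {
    sed '1s,/[^[:space:]]*,,g' "$@"
}

printf 'a/x\tb/y\tc/z\n2\t3\t4\n' | strip_header_suffixes
```

Because the address 1 restricts the command to the header, a slash on any later line is left alone.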
Given:
printf "samplename1/filename1\tanotherthing/anotherfile\tasdfgh/hjklñ
2\t3\t4
5\t6\t7" >file # ie, note only one tab between fields...
Here is a POSIX awk way to do this:
awk -F '\t' 'NR==1{gsub("/[^\t]*",""); print; next} 1' file
Prints:
samplename1 anotherthing asdfgh
2 3 4
5 6 7
You can get those to line up with the column command:
awk -F '\t' 'NR==1{gsub("/[^\t]*",""); print; next} 1' file | column -t
samplename1 anotherthing asdfgh
2 3 4
5 6 7

Lookup and Replace with two files in awk

I am trying to correct one file with another using a single line of AWK code. I want to take $1 from FILE2, look it up in FILE1, and get the corresponding $2 and $4. After I set them as variables I want the program to stop evaluating FILE1, change $10 and $11 in FILE2 to the values of the variables, and print the result.
I am having trouble getting awk to switch from FILE1 to FILE2 after I have extracted the variables. I've tried nextfile, but this resets the program and it tries to extract variables from FILE2. I also set NR to the last record, but it did not switch.
I am also running a loop to get each line out of FILE1, but if that can be part of the script I am sure it would speed things up, since awk would not have to be reopened over and over again.
Here are the parts I have figured out.
for file in `cut -f 1 FILE2`; do
awk -v a=$file '$1=a{s=$2;q=$4; ---GO TO FILE1---}{if ($1==a) {$10=s; $11=q; print 0;exit}' FILE1 FILE2 >> FILEOUT
done
A quick example set. NOTE: Despite how I have written this, the two files are not in the same order and are on the order of 8GB in size, so they are a little unwieldy to sort.
FILE1
A 12345 + AJD$JD
B 12504 + DKFJ#%
C 52042 + DSJTJE
FILE2
A 2 3 4 5 6 7 8 9 345 D$J
B 2 3 4 5 6 7 8 9 250 KFJ
C 2 3 4 5 6 7 8 9 204 SJT
OUTFILE
A 2 3 4 5 6 7 8 9 12345 AJD$JD
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE
This is the code I got to work based on Kent's answer below.
awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{$9=$9" "a[$1]}{$10="";$11=""}2' f1 f2
Try this one-liner:
kent$ awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{NF-=2;$0=$0" "a[$1]}7' f1 f2
A 2 3 4 5 6 7 8 9 12345 AJD$JD
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE
No need to loop over the files repeatedly - just read one file and store the relevant fields in arrays keyed on $1, then go through the other file and use those arrays to look up the values you want to insert.
awk '(FILENAME=="FILE1"){y[$1]=$2;z[$1]=$4}; (FILENAME=="FILE2" && $1 in y){$10=y[$1];$11=z[$1];print $0}' FILE1 FILE2
That said, it sounds like you might have a use for the join command here rather than messing about with awk (the above script assumes all your $1/$2/$4 values will fit in memory).
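For readers unfamiliar with the NR==FNR idiom, here is the lookup spelled out in longer form. Treat it as a sketch: the function name fix_fields and the array names are invented, and unlike the script above it also prints FILE2 lines that have no match in FILE1, unchanged.

```shell
# Hypothetical helper: fix_fields LOOKUP_FILE DATA_FILE
fix_fields() {
    awk '
        # NR == FNR holds only while the first file is being read.
        NR == FNR { pos[$1] = $2; seq[$1] = $4; next }

        # Second file: overwrite fields 10 and 11 when the key is known.
        $1 in pos { $10 = pos[$1]; $11 = seq[$1] }

        { print }    # matched or not, print the (possibly modified) line
    ' "$1" "$2"
}

# Usage: fix_fields FILE1 FILE2 > FILEOUT
```

The NR==FNR test assumes the first file is not empty; if it could be, comparing FILENAME as in the answer above is the safer check.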

Multiply every nth field...elegantly

I have a text file with a series of numbers:
1 2 4 2 2 6 3 4 7 4 4 8 2 4 6 5 5 8
I need to have every third field multiplied by 3, so output would be:
1 2 12 2 2 18 3 4 21 4 4 24 2 4 18 5 5 24
Now, I've hammered out a solution already, but I know there's a quicker, more elegant one out there. Here's what I've gotten to work:
xargs -n1 < input.txt | awk '{printf NR%3 ? "%d " : $0*3" ", $1}' > output.txt
I feel that there must be an awk one-liner that can do this?? How can I make awk look at each field (instead of each record), thus not needing the call to xargs to put every field on a different line? Or maybe sed can do it?
Try:
awk '{for (i=3;i<=NF;i+=3)$i*=3; print}' input.txt > output.txt
I have not tested this yet (posted on my iPod). The print command without parameters should print out the whole (partially modified) line. You might have to set OFS=" " in the BEGIN section to get the blank as the separator in the output.
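For what it's worth, awk's default OFS is already a single space, so no BEGIN block should be needed here. A slightly generalized sketch, with the step and the multiplier passed in as arguments (the name scale_every_nth and the variables n and k are made up for the example, and n is assumed to be at least 1):

```shell
# Hypothetical helper: scale_every_nth N K
# Multiplies every Nth field of each input line by K.
scale_every_nth() {
    awk -v n="$1" -v k="$2" '
        {
            for (i = n; i <= NF; i += n)   # fields n, 2n, 3n, ...
                $i *= k                    # assigning a field rebuilds $0 with OFS
            print
        }'
}

printf '1 2 4 2 2 6 3 4 7\n' | scale_every_nth 3 3
# prints: 1 2 12 2 2 18 3 4 21
```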
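This line would work too (it relies on a regular-expression RS, which gawk and mawk support but POSIX awk does not guarantee):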
awk -v RS="\\n| " -v ORS=" " '!(NR%3){$0*=3}7' file

rearrange columns using awk or cut command

I have a large file with 1000 columns. I want to rearrange it so that the last column becomes the 3rd column. For this I have used:
cut -f1-2,1000,3- file > out.txt
But this does not change the order.
Could anyone help me do this using cut or awk?
Also, I want to rearrange columns 10 and 11 as shown below:
Example:
1 10 11 2 3 4 5 6 7 8 9 12 13 14 15 16 17 18 19 20
Try this awk one-liner:
awk '{$3=$NF OFS $3;$NF=""}7' file
This moves the last column to the 3rd position. If you have 1000 columns, it does so with the 1000th column.
EDIT
if the file is tab-delimited, you could try:
awk -F'\t' -v OFS="\t" '{$3=$NF OFS $3;$NF=""}7' file
EDIT2
add an example:
kent$ seq 20|paste -s -d'\t'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
kent$ seq 20|paste -s -d'\t'|awk -F'\t' -v OFS="\t" '{$3=$NF OFS $3;$NF=""}7'
1 2 20 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
EDIT3
You didn't give any input example, so I assume you don't have empty columns in the original file (no runs of consecutive tabs):
kent$ seq 20|paste -s -d'\t'|awk -F'\t' -v OFS="\t" '{$3=$10 FS $11 FS $3;$10=$11="";gsub(/\t+/,"\t")}7'
1 2 10 11 3 4 5 6 7 8 9 12 13 14 15 16 17 18 19 20
Alternatively, we could simply print those fields in a loop.
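A sketch of that loop-based alternative, which builds the output line field by field and so leaves no trailing separator to clean up. The function name move_last_to_third is invented for this example, and tab-delimited input is assumed.

```shell
# Hypothetical helper: move the last tab-separated column to position 3.
move_last_to_third() {
    awk 'BEGIN { FS = OFS = "\t" }
        NF < 3 { print; next }             # too few columns: leave line as is
        {
            line = $1 OFS $2 OFS $NF       # columns 1, 2, then the last one
            for (i = 3; i < NF; i++)       # then the old columns 3 .. NF-1
                line = line OFS $i
            print line
        }' "$@"
}

printf '1\t2\t3\t4\t5\n' | move_last_to_third
```

Since no field is ever blanked out, empty columns in the input survive as they are.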
I THINK what you want is:
awk 'BEGIN{FS=OFS="\t"} {$3=$NF OFS $3; sub(OFS "[^" OFS "]*$","")}1' file
This might also work for you depending on your awk version:
awk 'BEGIN{FS=OFS="\t"} {$3=$NF OFS $3; NF--}1' file
Without the part after the semi-colon you'll have trailing tabs in your output.
Since many people are searching for this, and even the best awk solution is not really pretty or easy to use, I wanted to post my solution (mycut), written in Python:
#!/usr/bin/env python3
import sys
from signal import signal, SIGPIPE, SIG_DFL

# Exit quietly when the reader (e.g. head) closes the pipe early.
signal(SIGPIPE, SIG_DFL)

# Example usage: cat file | mycut 3 2 1
# Note: indices are zero-based, so 0 selects the first column.
columns = [int(x) for x in sys.argv[1:]]
delimiter = "\t"
for line in sys.stdin:
    # Drop the newline so it does not end up inside the last field.
    parts = line.rstrip("\n").split(delimiter)
    print(delimiter.join(parts[col] for col in columns))
I am thinking about adding the other features of cut, like changing the delimiter, and a way to use a * to print the remaining columns. But then it will get its own page.
A shell wrapper function for awk that uses simpler syntax:
# Usage: rearrange int_n [int_o int_p ... ] < file
rearrange ()
{
    unset n;
    n="{ print ";
    while [ "$1" ]; do
        n="$n\$$1\" \" ";
        shift;
    done;
    n="$n }";
    awk "$n" | grep '\w'
}
Examples...
echo foo bar baz | rearrange 2 3 1
bar baz foo
Using bash brace expansion, rearrange first and last 5 items in descending order:
echo {1..1000}a | tr '\n' ' ' | rearrange {1000..995} {5..1}
1000a 999a 998a 997a 996a 995a 5a 4a 3a 2a 1a
Sorted 3-letter shells in /bin:
ls -lLSr /bin/?sh | rearrange 5 9
150792 /bin/csh
154072 /bin/ash
771552 /bin/zsh
1554072 /bin/ksh
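A possible variation on the wrapper: instead of pasting the arguments into the awk program text, pass the column list in with -v and loop over it. This is only a sketch (rearrange_v is a made-up name); it avoids the trailing space and the extra grep, and skips blank input lines.

```shell
# Hypothetical variant. Usage: rearrange_v int_n [int_o int_p ... ] < file
rearrange_v() {
    awk -v cols="$*" '
        BEGIN { n = split(cols, c, " ") }    # c[1..n] holds the column numbers
        NF {                                 # skip lines with no fields
            line = ""
            for (i = 1; i <= n; i++)
                line = line (i > 1 ? OFS : "") $(c[i])
            print line
        }'
}

echo foo bar baz | rearrange_v 2 3 1
# prints: bar baz foo
```

Arguments that are not numbers are not validated here; awk treats them as 0, which selects the whole line.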