I'm trying to extract a particular string variable (i.e. symbol) from a Linux program's ELF file, or even from the .o file it comes from.
It's in the .rodata section, and obviously I know the symbol name.
Is there a sequence of objdump-style commands and options I can use to dump out the string?
Update:
For example, the .map file includes:
.rodata.default_environment 0x000000001013f763 0x615 common/built-in.o
0x000000001013f763 default_environment
The variable itself - default_environment - is a standard null-terminated text string.
Is there a sequence of objdump-style commands and options I can use to dump out the string?
Sure. Let's construct an example:
const char foo[] = "Some text";
const char bar[] = "Other text";
const void *fn1() { return foo; }
const void *fn2() { return bar; }
$ gcc -c t.c
Suppose we want to extract contents of bar[].
$ readelf -Ws t.o | grep bar
10: 000000000000000a 11 OBJECT GLOBAL DEFAULT 5 bar
This tells us that the contents of the bar variable are in section 5, at offset 0xa, and are 11 bytes long.
We can extract the entire section 5:
$ readelf -x5 t.o
Hex dump of section '.rodata':
0x00000000 536f6d65 20746578 74004f74 68657220 Some text.Other
0x00000010 74657874 00 text.
and indeed find the string we are looking for. If you really want to extract just the contents of bar (e.g. because the .rodata is really large, and/or because bar contains embedded NULs):
$ objcopy -j.rodata -O binary t.o t.rodata # extract just .rodata section
$ dd if=t.rodata of=bar bs=1 skip=10 count=11 # extract just bar
11+0 records in
11+0 records out
11 bytes (11 B) copied, 0.000214501 s, 51.3 kB/s
Look at the result:
$ xd bar
000000 O t h e r t e x t nul O t h e r t e x t .
QED.
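The objcopy/dd extraction boils down to slicing the section bytes at the symbol's offset. Below is a minimal Python sketch of that step, assuming you already have the raw section bytes (e.g. from `objcopy -j.rodata -O binary`) and the symbol's value and size from `readelf -Ws`; the helper names `symbol_bytes` and `as_c_string` are hypothetical. In a relocatable .o the symbol value is already an offset into its section, as in the example above. In a linked executable (like the .map excerpt in the question) the value is normally a virtual address, so the section's start address (the Address column of `readelf -S`) has to be subtracted first.

```python
def symbol_bytes(section_data: bytes, sym_value: int, sym_size: int,
                 section_addr: int = 0) -> bytes:
    """Return the sym_size bytes of a symbol from a dumped section.

    For a relocatable .o, leave section_addr at 0 (the symbol value is
    already a section offset). For a linked executable, pass the section's
    start address so the virtual address is turned into an offset.
    """
    offset = sym_value - section_addr
    if offset < 0 or offset + sym_size > len(section_data):
        raise ValueError("symbol lies outside the dumped section")
    return section_data[offset:offset + sym_size]


def as_c_string(raw: bytes) -> str:
    """Decode bytes up to (not including) the first NUL terminator."""
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")


if __name__ == "__main__":
    # The .rodata contents shown by `readelf -x5 t.o` above.
    rodata = b"Some text\0Other text\0"
    # bar: value 0xa, size 11 (from `readelf -Ws t.o`).
    raw = symbol_bytes(rodata, 0xA, 11)
    print(raw)               # b'Other text\x00'
    print(as_c_string(raw))  # Other text
```

For the question's case you would pass the address 0x1013f763, the size 0x615, and the start address of the output section that contains it (not shown in the .map excerpt).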
One of the most commonly used file formats in bioinformatics is the FASTA format.
FASTA files are simple: they contain a "header" record that starts with a ">", followed by the "sequence" record, which is everything after the header but before the next record separator (i.e., ">").
>ENSP00000488314.1 pep chromosome:GRCh38:X:143884071:143885255:1 gene:ENSG00000276380.2 transcript:ENST00000618570.1 gene_biotype:polymorphic_pseudogene transcript_biotype:polymorphic_pseudogene gene_symbol:UBE2NL description:ubiquitin conjugating enzyme E2 N like (gene/pseudogene) [Source:HGNC Symbol;Acc:HGNC:31710]
MAELPHRIIKETQRLLAEPVPGIKAEPDESNARYFHVVIAGESKDSPFEGGTFKRELLLA
EEYPMAAPKVRFMTKIYHPNVDKLERIS*DILKDKWSPALQIRTVLLSIQALLNAPNPDD
PLANDVVEQWKTNEAQAIETARAWTRLYAMNSI
> next record...
> another one...
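To make the record structure concrete, here is a small Python sketch (the function name `parse_fasta` is hypothetical) that groups lines into (header, sequence) pairs exactly as described: a line starting with ">" opens a new record, and every following line up to the next ">" belongs to its sequence.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without its leading '>', and the sequence
    lines of each record are joined into one string.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # tolerate blank lines
        if line.startswith(">"):
            if header is not None:        # flush the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # lines before the first header are ignored
    if header is not None:                # flush the last record
        yield header, "".join(chunks)


if __name__ == "__main__":
    text = ">seq1 first\nMAEL\nPHRI\n>seq2 second\nDPAQ\n"
    for hdr, seq in parse_fasta(text.splitlines()):
        print(hdr, seq)
```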
The header can be very simple (e.g., ">ENSP00000488314.1") or complex.
Complex headers contain important but variable information.
In the case of the example sequence above (coming from ENSEMBL), the header record is composed of:
Field 01: ENSP00000488314.1 <=Protein ID
Field 02: pep <=Peptide record
Field 03: chromosome:GRCh38:X:143884071:143885255:1 <=Chromosome and chromosomal coordinates
Field 04: gene:ENSG00000276380.2 <=Gene ID
Field 05: transcript:ENST00000618570.1 <=Transcript ID
Field 06: gene_biotype:polymorphic_pseudogene <=Gene Biotype
Field 07: transcript_biotype:polymorphic_pseudogene <=Transcript Biotype
Field 08: gene_symbol:UBE2NL <=Gene Symbol
Up to here the fields are all nicely separated by spaces, and then...Field 09 (Variable)
Field 09: description:ubiquitin conjugating enzyme E2 N like (gene/pseudogene)
Field 10: [Source:HGNC Symbol;Acc:HGNC:31710] <=Predictable
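The field layout above can be captured by keying each field on the text before its first ":", i.e. keeping the fields in an array indexed by name rather than by position. A minimal Python sketch, assuming every header follows this Ensembl layout; the function `parse_header` and the key names "protein", "peptide" and "source" are hypothetical labels chosen for Fields 01, 02 and 10. Fields 09 and 10 are peeled off the end with a regular expression first, because the description contains spaces.

```python
import re
from typing import Dict

# Matches the trailing "description:... [Source:...]" part of the header.
_TAIL = re.compile(r"\s+(description:.*?)\s*\[([^\]]+)\]\s*$")


def parse_header(header: str) -> Dict[str, str]:
    """Split an Ensembl-style FASTA header (without '>') into named fields.

    Fields 01 and 02 are stored under 'protein' and 'peptide'; fields 03-08
    are keyed by the text before their first ':'; field 09 goes under
    'description' and the bracketed field 10 under 'source'.
    """
    fields: Dict[str, str] = {}
    m = _TAIL.search(header)
    if m:
        fields["description"] = m.group(1)
        fields["source"] = m.group(2)
        header = header[:m.start()]       # keep only the space-separated part
    words = header.split()
    if words:
        fields["protein"] = words[0]
    if len(words) > 1:
        fields["peptide"] = words[1]
    for word in words[2:]:
        tag = word.split(":", 1)[0]       # e.g. "gene" from "gene:ENSG..."
        fields[tag] = word
    return fields
```

For example, `" ".join(parse_header(h)[k] for k in ("protein", "gene", "transcript"))` rebuilds the Field 01 04 05 header (without the leading ">").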
Long headers are often not handled well by other bioinformatics applications, so they need to be shortened.
It would be nice to do that in a smart way. Therefore, using AWK and the example sequences below, I would like to:
First: Control the printing of the header records as follows:
Always retain the first field:
>ENSP00000488314.1
But then be able to omit and/or include fields. Examples:
>ENSP00000488314.1 gene:ENSG00000276380.2 transcript:ENST00000618570.1
Field: 01 04 05
>ENSP00000488314.1 pep chromosome:GRCh38:X:143884071:143885255:1 [Source:HGNC Symbol;Acc:HGNC:31710]
Field: 01 02 03 10
For simplicity, ignoring Field 09 entirely would be acceptable, but being able to use Field 10 would be nice.
Then be able to "fold" the sequence to a user-specified width. For example, these records have their sequences folded every 60 characters:
>ENSP00000441696.1 pep chromosome:GRCh38:14:21868839:21869365:1 gene:ENSG00000211788.2 transcript:ENST00000390436.2 gene_biotype:TR_V_gene transcript_biotype:TR_V_gene gene_symbol:TRAV13-1 description:T cell receptor alpha variable 13-1 [Source:HGNC Symbol;Acc:HGNC:12108]
MTSIRAVFIFLWLQLDLVNGENVEQHPSTLSVQEGDSAVIKCTYSDSASNYFPWYKQELG
KGPQLIIDIRSNVGEKKDQRIAVTLNKTAKHFSLHITETQPEDSAVYFCAAS
>ENSP00000488314.1 pep chromosome:GRCh38:X:143884071:143885255:1 gene:ENSG00000276380.2 transcript:ENST00000618570.1 gene_biotype:polymorphic_pseudogene transcript_biotype:polymorphic_pseudogene gene_symbol:UBE2NL description:ubiquitin conjugating enzyme E2 N like (gene/pseudogene) [Source:HGNC Symbol;Acc:HGNC:31710]
MAELPHRIIKETQRLLAEPVPGIKAEPDESNARYFHVVIAGESKDSPFEGGTFKRELLLA
EEYPMAAPKVRFMTKIYHPNVDKLERIS*DILKDKWSPALQIRTVLLSIQALLNAPNPDD
PLANDVVEQWKTNEAQAIETARAWTRLYAMNSI
>ENSP00000437680.2 pep chromosome:GRCh38:22:42140203:42141924:-1 gene:ENSG00000205702.11 transcript:ENST00000435101.1 gene_biotype:polymorphic_pseudogene transcript_biotype:nonsense_mediated_decay gene_symbol:CYP2D7 description:cytochrome P450 family 2 subfamily D member 7 (gene/pseudogene) [Source:HGNC Symbol;Acc:HGNC:2624]
DPAQPPRDLTEAFLAKKEKAKGSPESSFNDENLRIVSVSNRRSTT
Could become (sequence folded every 120 characters):
>ENSP00000441696.1 gene:ENSG00000211788.2 transcript:ENST00000390436.2
MTSIRAVFIFLWLQLDLVNGENVEQHPSTLSVQEGDSAVIKCTYSDSASNYFPWYKQELGKGPQLIIDIRSNVGEKKDQRIAVTLNKTAKHFSLHITETQPEDSAVYFCAAS
>ENSP00000488314.1 gene:ENSG00000276380.2 transcript:ENST00000618570.1
MAELPHRIIKETQRLLAEPVPGIKAEPDESNARYFHVVIAGESKDSPFEGGTFKRELLLAEEYPMAAPKVRFMTKIYHPNVDKLERIS*DILKDKWSPALQIRTVLLSIQALLNAPNPDD
PLANDVVEQWKTNEAQAIETARAWTRLYAMNSI
>ENSP00000437680.2 gene:ENSG00000205702.11 transcript:ENST00000435101.1
DPAQPPRDLTEAFLAKKEKAKGSPESSFNDENLRIVSVSNRRSTT
So far, the best I was able to do is to call a script containing the following code:
awk -v w=60 -f script.awk fasta_file.fa
#!/usr/bin/env gawk
## Script.awk
/^>/ {
if (seq != "") print seq; print $1,$4,$5; seq = ""; next
}
{
seq = seq $1
while (length(seq) > w) {
print substr(seq, 1,w)
seq = substr(seq, 1+w)
}
}
END { if (seq != "") print seq }
The problem with the code above is that the fields $1, $4, and $5 are hard coded.
An elegant solution to a similar problem was proposed by Ed Morton, but it requires me to understand the \s/\S gawk extensions and AWK arrays, which I am struggling with.
Any ideas on how to improve the code above using AWK (not Perl/Python) will be greatly appreciated.
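The hard-coding problem comes down to taking the field numbers as data instead of writing them into the print statement. A minimal Python sketch of that selection logic (the name `select_fields` is hypothetical); in awk the same idea would be a `-v` list of numbers split into an array with split() and looped over. It always keeps Field 01 first, and it only works for Fields 01 to 08, because the description in Field 09 contains spaces and spans several words.

```python
from typing import Sequence


def select_fields(header: str, numbers: Sequence[int]) -> str:
    """Keep the requested 1-based, whitespace-separated header fields.

    Field 1 (the ID) is always kept first; numbers outside the range of
    available fields are silently skipped.
    """
    words = header.lstrip(">").split()
    keep = [1] + [n for n in numbers if n != 1]
    return ">" + " ".join(words[n - 1] for n in keep if 1 <= n <= len(words))
```

With the first example header, `select_fields(header, [4, 5])` yields the Field 01 04 05 line shown in the question.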
This shows not only how to do what you want with awk but also how to structure a shell script properly so it can parse arguments before calling awk (which you can't do if you invoke awk via a shebang, so don't do that). It uses GNU awk for gensub(), the 3rd arg to match(), and the \s/\S regexp operators:
$ cat tst.sh
#!/usr/bin/env bash
while getopts ":w:f:" opt; do
case "$opt" in
w) wid=${OPTARG}
;;
f) flds=${OPTARG}
;;
*) printf 'bad argument "%s"\n' "$opt" >&2
exit 1
;;
esac
done
shift "$((OPTIND-1))"
awk -v wid="$wid" -v flds="$flds" '
BEGIN {
wid=(wid ? wid : 120)
flds=(flds ? flds : "protein gene transcript")
numTags = split(flds,tags)
}
sub(/^>/,"") {
if (NR > 1) {
prt()
}
match($0,/(description:.*\S)\s+\[([^]]+)/,a)
$0 = substr($0,1,RSTART-1)
f["description"] = a[1]
f["predictable"] = a[2]
f["protein"] = $1
f["peptide"] = $2
for (i=3; i<=NF; i++) {
tag = gensub(/:.*/,"",1,$i)
f[tag] = $i
}
next
}
{ f["sequence"] = f["sequence"] $0 }
END { prt() }
function prt( tagNr, tag) {
printf ">"
for (tagNr=1; tagNr<=numTags; tagNr++) {
tag = tags[tagNr]
printf "%s%s", f[tag], (tagNr<numTags ? OFS : ORS)
}
print gensub(".{"wid"}","&"RS,"g",f["sequence"])
delete f
}
' "${@:--}"
.
$ ./tst.sh file
>ENSP00000441696.1 gene:ENSG00000211788.2 transcript:ENST00000390436.2
MTSIRAVFIFLWLQLDLVNGENVEQHPSTLSVQEGDSAVIKCTYSDSASNYFPWYKQELGKGPQLIIDIRSNVGEKKDQRIAVTLNKTAKHFSLHITETQPEDSAVYFCAAS
>ENSP00000488314.1 gene:ENSG00000276380.2 transcript:ENST00000618570.1
MAELPHRIIKETQRLLAEPVPGIKAEPDESNARYFHVVIAGESKDSPFEGGTFKRELLLAEEYPMAAPKVRFMTKIYHPNVDKLERIS*DILKDKWSPALQIRTVLLSIQALLNAPNPDD
PLANDVVEQWKTNEAQAIETARAWTRLYAMNSI
>ENSP00000437680.2 gene:ENSG00000205702.11 transcript:ENST00000435101.1
DPAQPPRDLTEAFLAKKEKAKGSPESSFNDENLRIVSVSNRRSTT
.
$ ./tst.sh -w 60 -f 'gene_symbol chromosome' file
>gene_symbol:TRAV13-1 chromosome:GRCh38:14:21868839:21869365:1
MTSIRAVFIFLWLQLDLVNGENVEQHPSTLSVQEGDSAVIKCTYSDSASNYFPWYKQELG
KGPQLIIDIRSNVGEKKDQRIAVTLNKTAKHFSLHITETQPEDSAVYFCAAS
>gene_symbol:UBE2NL chromosome:GRCh38:X:143884071:143885255:1
MAELPHRIIKETQRLLAEPVPGIKAEPDESNARYFHVVIAGESKDSPFEGGTFKRELLLA
EEYPMAAPKVRFMTKIYHPNVDKLERIS*DILKDKWSPALQIRTVLLSIQALLNAPNPDD
PLANDVVEQWKTNEAQAIETARAWTRLYAMNSI
>gene_symbol:CYP2D7 chromosome:GRCh38:22:42140203:42141924:-1
DPAQPPRDLTEAFLAKKEKAKGSPESSFNDENLRIVSVSNRRSTT
.
$ ./tst.sh -w 10000 -f 'description' file
>description:T cell receptor alpha variable 13-1
MTSIRAVFIFLWLQLDLVNGENVEQHPSTLSVQEGDSAVIKCTYSDSASNYFPWYKQELGKGPQLIIDIRSNVGEKKDQRIAVTLNKTAKHFSLHITETQPEDSAVYFCAAS
>description:ubiquitin conjugating enzyme E2 N like (gene/pseudogene)
MAELPHRIIKETQRLLAEPVPGIKAEPDESNARYFHVVIAGESKDSPFEGGTFKRELLLAEEYPMAAPKVRFMTKIYHPNVDKLERIS*DILKDKWSPALQIRTVLLSIQALLNAPNPDDPLANDVVEQWKTNEAQAIETARAWTRLYAMNSI
>description:cytochrome P450 family 2 subfamily D member 7 (gene/pseudogene)
DPAQPPRDLTEAFLAKKEKAKGSPESSFNDENLRIVSVSNRRSTT
.
$ ./tst.sh -w 10000 -f 'predictable' file
>Source:HGNC Symbol;Acc:HGNC:12108
MTSIRAVFIFLWLQLDLVNGENVEQHPSTLSVQEGDSAVIKCTYSDSASNYFPWYKQELGKGPQLIIDIRSNVGEKKDQRIAVTLNKTAKHFSLHITETQPEDSAVYFCAAS
>Source:HGNC Symbol;Acc:HGNC:31710
MAELPHRIIKETQRLLAEPVPGIKAEPDESNARYFHVVIAGESKDSPFEGGTFKRELLLAEEYPMAAPKVRFMTKIYHPNVDKLERIS*DILKDKWSPALQIRTVLLSIQALLNAPNPDDPLANDVVEQWKTNEAQAIETARAWTRLYAMNSI
>Source:HGNC Symbol;Acc:HGNC:2624
DPAQPPRDLTEAFLAKKEKAKGSPESSFNDENLRIVSVSNRRSTT
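The folding in prt() works by having gensub() replace every run of exactly wid characters with itself followed by RS (a newline by default). The Python sketch below mirrors that regex approach and, for comparison, the slice-based chunking that the question's while loop performs; both function names are hypothetical. One consequence of the regex approach: when the sequence length is an exact multiple of wid, the gensub() result already ends in a newline and print appends ORS, so an empty line follows that record. The chunking approach never produces an empty chunk.

```python
import re
from typing import List


def fold_like_gensub(seq: str, width: int) -> str:
    """Append a newline after every run of exactly `width` characters,
    like gensub(".{"wid"}", "&" RS, "g", seq) with the default RS."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return re.sub(".{%d}" % width, lambda m: m.group(0) + "\n", seq)


def fold_chunks(seq: str, width: int) -> List[str]:
    """Split seq into pieces of at most `width` characters, like the
    question's while/substr loop; never yields an empty piece."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return [seq[i:i + width] for i in range(0, len(seq), width)]
```

For instance, `fold_like_gensub("abcdef", 3)` returns `"abc\ndef\n"`, whereas `fold_chunks("abcdef", 3)` returns `["abc", "def"]`.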
I want to change [%a/b] to [%a/c].
Basically, the same as the question "Change path or refinement", but with a file! instead:
I want to change the a/b inside a block to a/c
test: [a/b]
In this case, either change next test/1 'c or test/1/2: 'c works.
But not when test is a file!:
>> test: [%a/b]
== [%a/b]
>> test/1
== %a/b
>> test/1/2 ; can't access 2nd value
== %a/b/2
>> next first test ; not quite what you expect
== %/b
Trying to change it gives a result you wouldn't expect:
>> change next test/1 'c
== %b
>> test
== [%acb]
You are confusing path! and file! series: they can look similar, but they are very different in nature.
A path! is a collection of values (often word! values) separated by a slash symbol, a file! is a collection of char! values. Slash characters in file! series are just characters, so file! has no knowledge about any sub-structures. It has (mostly) the semantics of string! series, while path! has the semantics of a block! series.
Now that this is cleared up, about the test/1/2 result: path notation on a file! series behaves differently than on a string!. Instead of acting as an accessor, it does a smart concatenation. It's called smart because it nicely handles extra slash characters present in the left and right parts. For example:
>> file: %/index.html
== %/index.html
>> path: %www/
== %www/
>> path/file
== %www/file
>> path/:file
== %www/index.html
Same path notation rule applies to url! series too:
>> url: http://red-lang.org
== http://red-lang.org
>> url/index.html
== http://red-lang.org/index.html
>> file: %/index.html
== %/index.html
>> url/:file
== http://red-lang.org/index.html
So for changing the nested content of test: [%a/b], as file! behaves basically as string!, you can use any available method for strings to modify it. For example:
>> test: [%a/b]
== [%a/b]
>> change skip test/1 2 %c
== %""
>> test
== [%a/c]
>> change next find test/1 slash "d"
== %""
>> test
== [%a/d]
>> parse test/1 [thru slash change skip "e"]
== true
>> test
== [%a/e]
Files are string types and can be manipulated in the same way you'd modify a string. For example:
test: [%a/b]
replace test/1 %/b %/c
This is because file! is an any-string!, not any-array!
>> any-string? %a/c
== true
>> any-array? 'a/c
== true
So the directory separator '/' in a file! has no special meaning for the CHANGE action: 'a', '/', and 'b' in %a/b are all treated the same way, and the interpreter doesn't try to parse it into a two-segment path [a b].
For a path!, on the other hand, because it's an array, each component is a Rebol value, and the interpreter knows that. For instance, 'bcd' in a/bcd is seen as a whole (a word!), instead of as the three characters 'b', 'c' and 'd'.
I agree that the file! being an any-string! is not convenient.
Here is a perhaps cumbersome solution, which also works for directories by treating them as files:
test/1: to-file head change skip split-path test/1 1 %c
I have the following text...
BIOS Information
Manufacturer : Dell Inc.
Version : 2.5.2
Release Date : 01/28/2015
Firmware Information
Name : iDRAC7
Version : 2.21.21 (Build 12)
Firmware Information
Name : Lifecycle Controller 2
Version : 2.21.21.21
... which is piped into the following awk statement...
awk '{ if ($1" "$2 == "BIOS Information") var=$1} END { print var }'
This will output 'BIOS' in this case.
I want to look for 'BIOS Information' and then set the third field, two lines down, so in this case 'var' would equal '2.5.2'. Is there a way to do this with awk?
EDIT:
I tried the following:
awk ' BEGIN {
FS="[ \t]*:[ \t]*";
}
NF==1 {
sectname=$0;
}
NF==2 && $1 == "Version" && sectname="BIOS Information" {
bios_version=$2;
}
END {
print bios_version;
}'
Which gives me "2.21.21.21" with the above text. Can this be modified to give me the first "Version" following "BIOS Information"?
The following script may be overkill, but it is robust if you have multiple section names and/or the order of the fields changes.
BEGIN {
FS="[ \t]*:[ \t]*";
}
NF==1 {
sectname=$0;
}
NF==2 && $1 == "Version" && sectname=="BIOS Information" {
bios_version=$2;
}
END {
print bios_version;
}
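First, we set the input field separator to a colon with optional surrounding blanks, so that multi-word keys and values (such as "Release Date" or "2.21.21 (Build 12)") are not split into different fields. Next, we check whether the current line is a section name (one field) or a key-value pair (two fields). If it is a section name, we store it in sectname. If it is a key-value pair, the current section name is "BIOS Information", and the key is "Version", we set bios_version. Note the == in the section test: your version used sectname="BIOS Information", which is an assignment rather than a comparison, so it was always true and every Version line overwrote bios_version, leaving the last one (2.21.21.21).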
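The same section-tracking idea in a minimal Python sketch, for illustration only (the function name `first_value_in_section` is hypothetical). Like the awk script, it treats a line without a colon as a section name and a line with one as a key/value pair; unlike the awk script, it returns as soon as it finds the first match, and it splits only at the first colon, so values containing colons stay intact. Blank lines are skipped so they do not reset the current section.

```python
from typing import Iterable, Optional


def first_value_in_section(lines: Iterable[str], section: str,
                           key: str) -> Optional[str]:
    """Return the value of the first `key : value` line inside `section`."""
    current = None
    for line in lines:
        if not line.strip():
            continue                      # blank line: keep current section
        if ":" not in line:
            current = line.strip()        # like NF==1: a new section name
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        if current == section and name == key:
            return value                  # first match wins
    return None
```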
To answer the question as asked:
awk -v RS= '
/^BIOS Information\n/ {
for(i=1;i<=NF;++i) { if ($i=="Version") { var=$(i+2); exit } }
}
END { print var }
' file
-v RS= puts awk in paragraph mode, so that each run of non-empty lines becomes a single record.
/^BIOS Information\n/ then only matches a record (paragraph) whose first line equals "BIOS Information".
Each paragraph is still split into fields by runs of whitespace (awk's default behavior; in paragraph mode, newlines also separate fields), so the for loop walks through the fields until it finds the literal Version, assigns the 2nd field after it to var (the : in between is parsed as a separate field), and exits; exit still runs the END block, which prints the value.
Note: A more robust and complete way to extract the version number can be found in the update below (the field-looping approach here could yield false positives and also only ever reports the first (whitespace-separated) token of the version field).
Update, based on requirements that emerged later:
To act on each paragraph's version number and create individual variables:
awk -v RS= '
# Helper function that returns the value of the specified field.
function getFieldValue(name) {
# Split the record into everything before and after "...\n<name> : "
# and the following \n; the 2nd element of the array thus created
# then contains the desired value.
split($0, tokens, "^.*\n" name "[[:blank:]]+:[[:blank:]]+|\n")
return tokens[2]
}
/^BIOS Information\n/ {
biosVer=getFieldValue("Version")
print "BIOS version = " biosVer
}
/^Firmware Information\n/ {
firmVer=getFieldValue("Version")
print "Firmware version (" getFieldValue("Name") ") = " firmVer
}
' file
With the sample input, this yields:
BIOS version = 2.5.2
Firmware version (iDRAC7) = 2.21.21 (Build 12)
Firmware version (Lifecycle Controller 2) = 2.21.21.21
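A rough Python equivalent of the paragraph-mode approach, assuming (as `-v RS=` does) that sections are separated by blank lines in the real input; the function name `parse_paragraphs` is hypothetical. Each paragraph becomes a title plus a dictionary of its `key : value` lines, split at the first colon, so a repeated key keeps its last value.

```python
import re
from typing import Dict, List, Tuple


def parse_paragraphs(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split text into blank-line-separated paragraphs (like awk -v RS=)
    and parse each into (title, {key: value})."""
    result = []
    for para in re.split(r"\n[ \t]*\n", text.strip()):
        lines = [ln for ln in para.splitlines() if ln.strip()]
        if not lines:
            continue                      # empty input or extra blank lines
        fields: Dict[str, str] = {}
        for line in lines[1:]:
            key, sep, value = line.partition(":")
            if sep:                       # ignore lines without a colon
                fields[key.strip()] = value.strip()
        result.append((lines[0].strip(), fields))
    return result
```

With the sample input (blank lines between sections restored), the first entry is `("BIOS Information", {"Manufacturer": "Dell Inc.", "Version": "2.5.2", "Release Date": "01/28/2015"})`.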
Given:
$ echo "$txt"
BIOS Information
Manufacturer : Dell Inc.
Version : 2.5.2
Release Date : 01/28/2015
Firmware Information
Name : iDRAC7
Version : 2.21.21 (Build 12)
Firmware Information
Name : Lifecycle Controller 2
Version : 2.21.21.21
You can do:
$ echo "$txt" | awk '/^BIOS Information/{f=1; printf "%s", $0} /^Version/ && f{f=0; printf(":%s\n", $3)}'
BIOS Information:2.5.2
I have myfile.ps with a vector image included.
But when I run
ps2pdf myfile.ps
it seems that the output page size is A4: the vector image is too large and gets cut off, so about one inch is lost.
The following pseudo-header is printed in the output PDF file, in addition to the original vector image:
PLOT SIZE:8.02x8.62Inches
Magnification:7354.21X
Is there any option or any way to convert the PS file to a PDF preserving the original paper size?
If the input postscript has an EPS BoundingBox, this should preserve the page size:
ps2pdf -dEPSCrop <input.ps> <output.pdf>
I doubt the two lines you quoted are really inside the PS file as quoted... Aren't they preceded by % comment characters?
If they weren't preceded by such characters, no PS interpreter would work, because they are not known PostScript operators.
If they are preceded by such characters, the PS interpreter simply ignores them, because... they are only comments! :-)
If you want to convert this PS file to PDF, it is better to run Ghostscript directly (ps2pdf is only a thin shell script wrapper around a Ghostscript command anyway):
gs -o myfile.pdf \
-sDEVICE=pdfwrite \
-g5775x6207 \
-dPDFFitPage \
myfile.ps
Explanation:
-g... gives the medium size in pixels.
An A4 page has a dimension of 595x842pt (PostScript points).
1 Inch is the same as 72 PostScript points.
Ghostscript internally by default computes with a resolution of 720 pixels per inch when it comes to PDF output.
Hence for PDF output 595x842pt == 5950x8420px.
Hence for your case in question 8.02x8.62 inches ≈ 5775x6207px (8.02 × 720 = 5774.4 and 8.62 × 720 = 6206.4, rounded up).
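The unit conversion in this explanation is plain arithmetic; here is a minimal Python sketch (function names hypothetical) that reproduces it, assuming the 720 pixels-per-inch pdfwrite default stated above. It rounds up, which is what turns 5774.4 and 6206.4 into 5775 and 6207.

```python
import math


def gs_geometry(width_in: float, height_in: float, dpi: int = 720) -> str:
    """Build a Ghostscript -g<width>x<height> option from a size in inches.

    Rounds up so the page is never smaller than the plot; rounding to
    6 decimals first guards against float noise such as 5760.0000001.
    """
    w = math.ceil(round(width_in * dpi, 6))
    h = math.ceil(round(height_in * dpi, 6))
    return f"-g{w}x{h}"


def points_to_pixels(points: float, dpi: int = 720) -> int:
    """Convert PostScript points (1/72 inch) to pixels at `dpi`."""
    return round(points * dpi / 72)
```

`gs_geometry(8.02, 8.62)` gives `-g5775x6207`, and `points_to_pixels(595)` gives 5950 for the A4 width.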
I am not allowed to comment, but I must warn everyone that all the current answers are vulnerable to malicious PostScript files.
Using gs like this is VERY dangerous. ps2pdf internally uses the -dSAFER option, which would, for example, prevent an untrusted PostScript file from encrypting your files and rendering a PDF that demands a ransom payment from you for the decryption key! ALWAYS use -dSAFER!
While -o outputFile.pdf is nice, it is also undocumented (via man page or gs -h) as of version 9.23.
The command below works without the top being cut off, as happens with the other solutions:
gs -sOutputFile=file.pdf -dNOPAUSE -dBATCH -sPAPERSIZE=a4 -sDEVICE=pdfwrite -dSAFER file.ps
-sPAPERSIZE=a4 is how the a4 paper size is specified.
To get the page size, you can look for a line like the following:
%%PageBoundingBox: 12 12 583 830
and then use
gs -sOutputFile=file.pdf -dNOPAUSE -dBATCH -g583x830 -r72 -sDEVICE=pdfwrite -dSAFER file.ps
and it works perfectly.
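Finding that line and building the command can be scripted. A minimal Python sketch, assuming the file has a numeric %%PageBoundingBox or %%BoundingBox comment (an "(atend)" value is not handled); `find_bbox` and `gs_command` are hypothetical names, and the gs options are exactly the ones in the command above. Returning an argument list rather than a shell string avoids quoting problems with file names containing spaces.

```python
import re
from typing import List, Optional, Tuple

# Numeric %%BoundingBox / %%PageBoundingBox DSC comment: llx lly urx ury.
_BBOX = re.compile(
    r"^%%(?:Page)?BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def find_bbox(ps_text: str) -> Optional[Tuple[int, ...]]:
    """Return (llx, lly, urx, ury) from the first numeric bounding box comment."""
    for line in ps_text.splitlines():
        m = _BBOX.match(line)
        if m:
            return tuple(int(v) for v in m.groups())
    return None


def gs_command(ps_file: str, pdf_file: str,
               bbox: Tuple[int, ...]) -> List[str]:
    """Build the gs argument list: page size urx x ury pixels at 72 dpi."""
    _, _, urx, ury = bbox
    return ["gs", f"-sOutputFile={pdf_file}", "-dNOPAUSE", "-dBATCH",
            f"-g{urx}x{ury}", "-r72", "-sDEVICE=pdfwrite", "-dSAFER", ps_file]
```

Reading the file with `open("file.ps", encoding="latin-1").read()` avoids decode errors on binary data, and `subprocess.run(gs_command("file.ps", "file.pdf", bbox), check=True)` would then run the conversion.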
Based on Kurt Pfeifle's answer I wrote this Perl script to do the task:
#! /usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use List::Util qw(all);
sub ps2pdf;
sub get_ps_header;
sub get_media_size;
sub main;
# Run the program
main();
# Function: main
#
# Program's entry point.
#
sub main {
for (@ARGV) {
# check input file
if(not -r) {
print "WARN: Cannot read input file: $_\n";
next;
}
# build PDF file name
my $pdf = $_;
$pdf =~ s/(\.e?ps)?$/.pdf/i;
ps2pdf($_, $pdf);
}
}
# Function: ps2pdf
#
# Converts a PostScript file to PDF format using GhostScript,
# keeping the medium size.
#
# Params:
#
# $ps_file - (string) Input [E]PS file name
# $pdf_file - (string) Output PDF file name
#
sub ps2pdf {
my ($ps_file, $pdf_file) = @_;
my $cmd = "gs -q -sDEVICE=pdfwrite -dPDFFitPage ";
# try to find the media size
my ($width, $height) = get_media_size(get_ps_header($ps_file));
# keep media size
if(defined $height) {
$cmd .= "-g${width}x${height} ";
}
# set input/output
$cmd .= "-o $pdf_file $ps_file";
print "Running: $cmd\n";
system($cmd);
}
# Function: get_media_size
#
# Computes the size of a PostScript document in pixels,
# from the headers in the PS file.
#
# Params:
#
# $hdr - (hash ref) Parsed PS header values
#
# Returns:
#
# On success: Two-element array holding the document's width and height
# On failure: undef
#
sub get_media_size {
my ($hdr) = @_;
# we need the DocumentMedia header
return undef if not defined $hdr->{DocumentMedia};
# look for valid values
my @values = split(/\s+/, $hdr->{DocumentMedia});
return undef if scalar @values < 3;
my ($width, $height) = @values[1, 2];
return undef if not all { looks_like_number($_) } ($width, $height);
# Ghostscript uses a default resolution of 720 pixels/inch,
# there are 72 PostScript points/inch.
return ($width*10, $height*10);
}
# Function: get_ps_header
#
# Parses a PostScript file looking for headers.
#
# Params:
#
# $ps_file - (string) Path of the input file
#
# Returns:
#
# (hash ref) - As expected, keys are header names,
# values are corresponding header values. A special key
# named `version' is included for headers of the type
# `PS-Adobe-3.0'
#
sub get_ps_header {
my ($ps_file) = @_;
my %head;
open my $fh, "<$ps_file" or die "Failed to open $ps_file\n";
while(<$fh>) {
# look for end of header
last if /^%%EndComments\b/;
# look for PS version
if(/^%!(\w+)/) {
$head{version} = $1;
}
# look for any other field
# Ex: %%BoundingBox: 0 0 1008 612
elsif(/^%%(\w+)\s*:\s*(.*\S)/) {
$head{$1} = $2;
}
# discard regular comments and blank lines
elsif(/^\s*(%.*)?$/) {
next;
}
# any other thing will finish the header
else {
last;
}
}
return \%head;
}
Say I have some C code which I compile like this:
$ gcc code.c -o f.out
$ ./f.out inputfile outputfile
Then the code asks for input
$ enter mass:
Now suppose I need to run this code 200 times, the input files are named 0c.txt, 1c.txt, ..., 199c.txt, and I want to use the same mass value every time (e.g. mass=6). How do I write an "awk" command for that? Thanks for your help.
You don't specify your outputfile name. I'll assume 0c.out, 1c.out, ...
I'm also assuming that the f.out program reads the mass from stdin instead of anything more complicated.
#!/usr/bin/gawk -f
BEGIN {
mass = 6
for (i=0; i<200; i++) {
cmd = sprintf("./f.out %dc.txt %dc.out", i, i)
print mass |& cmd
close(cmd, "to")
while ((cmd |& getline out) > 0) {
print out  # do something with each line of output from ./f.out
}
close(cmd)
}
}
ref http://www.gnu.org/software/gawk/manual/html_node/Two_002dway-I_002fO.html
In bash, you'd write:
for i in $(seq 0 199); do
echo 6 | ./f.out ${i}c.txt ${i}c.out
done
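If neither awk nor a shell loop is a requirement, the same loop can be sketched in Python with subprocess. It makes the same assumptions as the answers above: ./f.out reads the mass from standard input, and outputs are named 0c.out, 1c.out, and so on. `build_argv` and `run_all` are hypothetical names. `check=True` stops at the first run that exits with a non-zero status.

```python
import subprocess
from typing import List


def build_argv(i: int, program: str = "./f.out") -> List[str]:
    """Arguments for run i: input '<i>c.txt', output '<i>c.out' (assumed naming)."""
    return [program, f"{i}c.txt", f"{i}c.out"]


def run_all(count: int = 200, mass: float = 6,
            program: str = "./f.out") -> None:
    """Run the program once per input file, answering the 'enter mass:'
    prompt by writing the mass to its stdin."""
    for i in range(count):
        subprocess.run(build_argv(i, program), input=f"{mass}\n",
                       text=True, check=True)
```

Calling `run_all()` from the directory containing f.out and the input files performs all 200 runs.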