How to print a double quote symbol in Red language - rebol

How can I print a double quote symbol (")? I tried the different methods mentioned at http://r.789695.n4.nabble.com/How-to-print-a-variable-with-in-double-quotes-td905186.html but they do not work:
print "\""
print as.character(x) ; x should come in quotes
print dQuote(x)
How can this be done?

There are two string formats... {This is a string} and "This is too". Escaping is done with a caret, ^.
If you use the braces you worry less about escaping. You can use single quotes, double quotes, and even braces if they are matched naturally. {This is {a valid} string}.
So you can say print {"} or if you insist print "^"". The former is usually preferable.

Related

String matching with gensub in Awk

I tried to answer the question asked here
How to replace a string like "[1.0 - 4.0]" with a numeric value using awk or sed?
I tried for
awk '{ print gensub(/[([0-9]+.[0-9]+) - ([0-9]+.[0-9]+)]/,"\\1","g")}'
but it didn't work, and I can't understand why. Please advise.
Input given :
10368,"Verizon DSL",DSL,NY,NORTHEAST,-5,-4,"[1.1 - 3.0]","[0.384 - 0.768]"
desired output :
10368,"Verizon DSL",DSL,NY,NORTHEAST,-5,-4,1.1,0.384
You're already using bracket expressions with [0-9] so obviously you know what [...] means in a regexp. Now take a look at the regexp you wrote:
[([0-9]+.[0-9]+) - ([0-9]+.[0-9]+)]
and note where opening [ and closing ] characters occur to define the bracket expressions, in particular the first matching pair (the 2nd [ in the regexp is just a literal [ character inside the first bracket expression):
[([0-9]
+.
[0-9]
+) - (
[0-9]
+.
[0-9]
+)]
and note that the last ] is not terminating a bracket expression so it's already just a literal ] character and wouldn't need to be escaped.
Also note that the .s are regexp metacharacters that match any single character, when you really wanted them to be treated literally. According to your expected output, you also don't want the double quotes retained, so your code should have been:
$ awk '{ print gensub(/"\[([0-9]+\.[0-9]+) - ([0-9]+\.[0-9]+)]"/,"\\1","g")}' file
10368,"Verizon DSL",DSL,NY,NORTHEAST,-5,-4,1.1,0.384
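The effect of the unescaped dots can be checked directly. The following is only a sketch: matches_loose and matches_strict are hypothetical helper names, and the test string "[1x1 - 3y0]" is made up for illustration. Both helpers use plain awk regex matching (no gensub), so they should run in any POSIX awk.

```shell
# Hypothetical helpers: each one exits with status 0 when its pattern
# matches the string passed as "$1", and with status 1 otherwise.

# Dots left unescaped: "." matches any single character.
matches_loose() {
  printf '%s\n' "$1" |
    awk '/"\[[0-9]+.[0-9]+ - [0-9]+.[0-9]+\]"/ { found = 1 } END { exit !found }'
}

# Dots escaped: "\." matches only a literal dot.
matches_strict() {
  printf '%s\n' "$1" |
    awk '/"\[[0-9]+\.[0-9]+ - [0-9]+\.[0-9]+\]"/ { found = 1 } END { exit !found }'
}

matches_loose  '"[1x1 - 3y0]"' && echo "loose pattern accepts 1x1"
matches_strict '"[1x1 - 3y0]"' || echo "strict pattern rejects 1x1"
```

Both patterns accept the real data ("[1.1 - 3.0]"); only the loose one also accepts input where something other than a dot sits between the digits.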
The square brackets are metacharacters. If you want to match them literally, you need to escape them.
Look at Harvery's solution in the linked question and at how the square brackets are escaped there. Your awk code, in contrast, does not contain escaped brackets.
This should work, although it keeps the surrounding double quotes and its unescaped dots match any character:
awk '{ print gensub(/\[([0-9]+.[0-9]+) - ([0-9]+.[0-9]+)\]/,"\\1","g")}'
this is a sed version
$ sed -r '{ s#"\[([0-9.]*)[^"]*"#\1#g }'
10368,"Verizon DSL",DSL,NY,NORTHEAST,-5,-4,1.1,0.384

How to parse this string into kv in awk in a simple way

Now I have a str in awk like this:
str = "a='abc',b=1,c='http://xxxx,http://yyyy,http://zzz'"
How can I parse it to get this result:
(a abc)(b 1)(c http://xxxx,http://yyyy,http://zzz)
So far I have implemented it in this ugly way:
result = ""
while (match(str, /[^=]*=('[^']*'|[^,]*),/) != 0) {
    subs = substr(str, RSTART, RLENGTH)
    str = substr(str, RSTART + RLENGTH, length(str) - RSTART - RLENGTH + 1)
    split(subs, vec, "=")
    gsub(/'/, "", vec[1])
    gsub(/'/, "", vec[2])
    if (substr(vec[2], length(vec[2]), 1) == ",") {
        vec[2] = substr(vec[2], 0, length(vec[2]) - 1)
    }
    result = result"("vec[1]" "vec[2]")"
}
I wonder if there is a more elegant way.
Using awk
The trick here is that we need to treat quoted commas differently from unquoted commas. That can be done as follows:
$ echo "$str" | awk -F"'" -v OFS="" '{for (i=1;i<=NF;i+=2) gsub(",", ")(", $i)} {gsub("=", " "); print "("$0")"}'
(a abc)(b 1)(c http://xxxx,http://yyyy,http://zzz)
How it works
-F"'" -v OFS=""
This sets the input field separator to a single quote and the output separator to an empty string.
{for (i=1;i<=NF;i+=2) gsub(",", ")(", $i)}
This replaces unquoted commas (odd fields) with )(.
Even numbered fields represent the quoted strings and they are left unchanged here.
gsub("=", " ")
This replaces equal signs with spaces.
print "("$0")"
This adds parens to the beginning and end and prints the line.
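To see why only the odd-numbered fields are touched, it can help to print how awk splits the string when the single quote is the field separator. This is just an illustrative sketch, not part of the solution itself; show_fields is a hypothetical helper name.

```shell
# show_fields is a hypothetical helper: it prints every field awk sees when
# the single quote is the field separator, tagged by the parity of its index.
show_fields() {
  printf '%s\n' "$1" |
    awk -F"'" '{
      for (i = 1; i <= NF; i++)
        printf "%d %s [%s]\n", i, (i % 2 ? "unquoted" : "quoted"), $i
    }'
}

str="a='abc',b=1,c='http://xxxx,http://yyyy,http://zzz'"
show_fields "$str"
```

The commas that separate the key/value pairs all land in field 3, an odd field, while the commas inside the URL list stay in field 4, an even field that the gsub loop skips.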
Using sed
$ echo "$str" | sed -r ":a; s/^(([^']*'[^']*')*[^']*'[^,']*),/\1\n/; ta; s/,/)(/g; s/^/(/; s/$/)/; s/\n/,/g; s/'//g; s/=/ /g"
(a abc)(b 1)(c http://xxxx,http://yyyy,http://zzz)
How it works
First, remember that sed processes input line-by-line. That means that, unless we put one in it, no line in sed's pattern space will contain a newline character.
This command works by replacing all quoted commas with newline characters. It then replaces the remaining commas with )(, adds ( to the beginning of the line, and adds ) to the end of the line. The newline characters are changed back to commas. Next the single-quotes are removed. Finally, the = signs are replaced with spaces and we are done.
We can tell whether a comma is quoted or unquoted by whether it is preceded by an odd or an even number of single-quotes.
In more detail:
sed -r
This starts sed with extended regular expressions.
:a; s/^(([^']*'[^']*')*[^']*'[^,']*),/\1\n/; ta
This converts all quoted commas into newline characters. The regex ^(([^']*'[^']*')*[^']*'[^,']*) matches, starting at the beginning of the line, an odd number of single-quotes and the text surrounding them up to the first comma afterward. The substitution command s/^(([^']*'[^']*')*[^']*'[^,']*),/\1\n/ consequently replaces the first quoted comma found with a newline, \n.
:a is a label. ta is a test: it branches back to label a if a substitution was made. Thus, as many substitutions are made as needed to replace all the quoted commas with newline characters.
s/,/)(/g; s/^/(/; s/$/)/
These three substitution commands put parens everywhere that we want one.
s/\n/,/g
Now that we have parens where we need them, this converts the newline characters that we added back to commas.
s/'//g
This removes all the single quotes.
s/=/ /g
This replaces the equal signs with spaces.
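The odd/even quote-counting idea can also be written as an explicit character scan, which avoids the GNU-specific \n in the sed replacement. The following is a sketch under assumptions, not the method used above: parse_kv is a hypothetical function name, and unlike the commands above it only converts = signs that are outside quotes.

```shell
# parse_kv is a hypothetical helper. It walks the string one character at a
# time and keeps a flag (inq) that flips on every single quote, so inq is 1
# exactly when an odd number of single quotes has been seen so far.
parse_kv() {
  printf '%s\n' "$1" |
    awk '{
      out = "("; inq = 0; n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "\047")                  # \047 is the single quote; drop it
          inq = !inq
        else if (c == "," && !inq)        # unquoted comma separates two pairs
          out = out ")("
        else if (c == "=" && !inq)        # unquoted = separates key and value
          out = out " "
        else                              # everything else is copied as is
          out = out c
      }
      print out ")"
    }'
}

parse_kv "a='abc',b=1,c='http://xxxx,http://yyyy,http://zzz'"
```

For the sample string this should print the same line as the awk and sed commands above.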

With sed or awk, how to replace all occurrences of string between quotes?

Given a file that looks like:
some text
no replace "text in quotes" no replace
more text
no replace "more text in quotes" no replace
even more text
no replace "even more text in quotes" no replace
etc
what sed or awk script would replace all the es that are between quotes and only the es between quotes such that something like the following is produced:
some text
no replace "t##$xt in quot##$s" no replace
more text
no replace "mor##$ t##$xt in quot##$s" no replace
even more text
no replace "##$v##$n mor##$ t##$xt in quot##$s" no replace
etc
There can be any number of es between the quotes.
$ awk 'BEGIN{FS=OFS="\""} {gsub(/e/,"##$",$2)} 1' file
some text
no replace "t##$xt in quot##$s" no replace
more text
no replace "mor##$ t##$xt in quot##$s" no replace
even more text
no replace "##$v##$n mor##$ t##$xt in quot##$s" no replace
etc
Also consider multiple pairs of quotes on a line:
$ echo 'aebec"edeee"fegeh"eieje"kelem' |
awk 'BEGIN{FS=OFS="\""} {gsub(/e/,"##$",$2)} 1'
aebec"##$d##$##$##$"fegeh"eieje"kelem
$ echo 'aebec"edeee"fegeh"eieje"kelem' |
awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/e/,"##$",$i)} 1'
aebec"##$d##$##$##$"fegeh"##$i##$j##$"kelem
This might work for you (GNU sed):
sed -r ':a;s/^([^"]*("[^"e]*"[^"]*)*"[^"e]*)e/\1##$/;ta' file
This regex looks from the start of the line for a series of non-double-quote characters, followed by zero or more pairs of double quotes with no e's within them (each followed by another possible series of non-double-quote characters), followed by a double quote and a possible series of characters that are neither double quotes nor e. If the next character is an e, it replaces the whole match with \1 (which is everything up to the e) and ##$. If the substitution is successful, ta branches back to :a, and the process is repeated until no further substitutions occur.
N.B. This caters for lines with multiple pairs of double quoted strings.
sed ':cycle
s/^\(\([^"]*\("[^"]*"\)*\)*"[^"]*\)e/\1##$/
t cycle' YourFile
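This is a POSIX version.
It works from the last e back to the first: on each pass the longest match picks the last e that is inside quotes.
It also changes any e inside an unclosed quoted string at the end of a line, and would then get the next line wrong if a quoted string could continue there (that case is not present in your sample).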

Using awk how do I reprint a found pattern with a new line character?

I have a text file in the format of:
aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;
Where "bcd" can be any length of any characters, excluding ; or :
What I want to do is print the text file in the format of:
aaa: bcd;bcd;bcddd;
aaa: bcd;bcd;bcd;
-etc-
My method of approach to this problem was to isolate a pattern of ";...:" and then reprint this pattern without the initial ;
I concluded I would have to use awk's 'gsub' to do this, but have no idea how to replicate the pattern nor how to print the pattern again with this added new line character 1 character into my pattern.
Is this possible?
If not, can you please direct me in a way of tackling it?
We can't quite be sure of the variability in the aaa or bcd parts; presumably, each one could be almost anything.
You should probably be looking for:
a series of one or more non-colon, non-semicolon characters followed by colon,
with one or more repeats of:
a series of one or more non-colon, non-semicolon characters followed by a semi-colon
That makes up the unit you want to match.
/[^:;]+:([^:;]+;)+/
With that, you can substitute what was found by the same followed by a newline, and then print the result. The only trick is avoiding superfluous newlines.
Example script:
{
echo "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;"
echo "aaz: xcd;ycd;bczdd;baa:bed;bid;bud;"
} |
awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); sub(/\n+$/, ""); print }'
Example output
aaa: bcd;bcd;bcddd;
aaa:bcd;bcd;bcd;
aaz: xcd;ycd;bczdd;
baa:bed;bid;bud;
Paraphrasing the question in a comment:
Why does the regular expression not include the characters before a colon (which is what it's intended to do, but I don't understand why)? I don't understand what "breaks" or ends the regex.
As I tried to explain at the top, you're looking for what we can call 'words', meaning sequences of characters that are neither a colon nor a semicolon. In the regex, that is [^:;]+, meaning one or more (+) of the negated character class — one or more non-colon, non-semicolon characters.
Let's pretend that spaces in a regex are not significant. We can space out the regex like this:
/ [^:;]+ : ( [^:;]+ ; ) + /
The slashes simply mark the ends, of course. The first cluster is a word; then there's a colon. Then there is a group enclosed in parentheses, tagged with a + at the end. That means that the contents of the group must occur at least once and may occur any number of times more than that. What's inside the group? Well, a word followed by a semicolon. It doesn't have to be the same word each time, but there does have to be a word there. If something can occur zero or more times, then you use a * in place of the +, of course.
The key to the regex stopping is that the aaa: in the middle of the first line does not consist of a word followed by a semicolon; it is a word followed by a colon. So, the regex has to stop before that because the aaa: doesn't match the group. The gsub() therefore finds the first sequence, and replaces that text with the same material and a newline (that's the "&\n", of course). It (gsub()) then resumes its search directly after the end of the replacement material, and — lo and behold — there is a word followed by a colon and some words followed by semicolons, so there's a second match to be replaced with its original material plus a newline.
awk strips the record separator, so $0 does not contain the newline at the end of the input line. The trailing newline comes from the gsub() itself: the last match ends at the end of the line, so "&\n" leaves a newline there. Therefore, without the sub() to remove trailing newlines, the print (implicitly of $0 followed by a newline) generated a blank line I didn't want in the output, so I removed the extraneous newline(s).
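To watch where the regex stops, the matches can be listed one at a time with match() instead of being rewritten in place with gsub(). This is only a sketch for experimenting; list_units is a hypothetical helper name.

```shell
# list_units is a hypothetical helper: it prints each piece of the line that
# the regex matches, one per output line, by repeatedly calling match() on
# whatever is left after the previous match.
list_units() {
  printf '%s\n' "$1" |
    awk '{
      s = $0
      while (match(s, /[^:;]+:([^:;]+;)+/)) {
        print substr(s, RSTART, RLENGTH)   # the matched unit
        s = substr(s, RSTART + RLENGTH)    # continue after the match
      }
    }'
}

list_units "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;"
```

On the sample line the first match should end right before the second aaa:, because that word is followed by a colon rather than a semicolon.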
This might work for you:
awk '{gsub(/[^;:]*:/,"\n&");sub(/^\n/,"");gsub(/: */,": ")}1' file
Prepend a newline (\n) to any run of characters other than ; and : that is followed by a :
Remove any newline prepended to the start of line.
Replace any : followed by none or many spaces with a : followed by a single space.
Print all lines.
Or this:
sed 's/;\([^;:]*: *\)/;\n\1 /g' file
Not sure how to do it in awk, but with sed this does what I think you want:
$ nl='
'
$ sed "s/\([^;]*:\)/\\${nl}\1/g" input
The first command sets the shell variable $nl to the string containing a single new line. Some versions of sed allow you to use \n inside the replacement string, but not all allow that. This keeps any whitespace that appears after the final ; and puts it at the start of the line. To get rid of that, you can do
$ sed "s/\([^;]*:\)/\\${nl}\1/g; s/\n */\\$nl/g" input
Ordinary awk gsub() and sub() don't allow you to refer to components of the match in the replacement string. GNU awk ("gawk") supplies gensub(), which would allow something like gensub(/(;) (.+:)/, "\\1\n\\2", "g").
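Without gensub(), a similar effect is possible in POSIX awk by combining match(), which sets RSTART and RLENGTH, with substr(). The following is a sketch under assumptions: break_records is a hypothetical name, and the pattern /;[^;:]*:/ (a semicolon, then a word, then a colon) is my own choice for locating the split point.

```shell
# break_records is a hypothetical helper. It looks for ";word:" and rebuilds
# the line with a newline inserted right after the semicolon, using only
# match() and substr() (no gensub).
break_records() {
  printf '%s\n' "$1" |
    awk '{
      out = ""; s = $0
      while (match(s, /;[^;:]*:/)) {
        # keep everything up to and including the ";", add a newline,
        # then keep the rest of the match ("word:")
        out = out substr(s, 1, RSTART) "\n" substr(s, RSTART + 1, RLENGTH - 1)
        s = substr(s, RSTART + RLENGTH)
      }
      print out s
    }'
}

break_records "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;"
```

Unlike some of the other answers, this sketch leaves the spacing after each colon untouched.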

Escaping separator within double quotes, in awk

I am using awk to parse my data with "," as the separator, since the input is a CSV file. However, there are "," characters within the data, which are escaped by double quotes ("...").
Example
filed1,filed2,field3,"field4,FOO,BAR",field5
How can I ignore the comma "," within the double quotes so that I can parse the output correctly using awk? I know we can do this in Excel, but how do we do it in awk?
It's easy, with GNU awk 4:
zsh-4.3.12[t]% awk '{
for (i = 0; ++i <= NF;)
printf "field %d => %s\n", i, $i
}' FPAT='([^,]+)|("[^"]+")' infile
field 1 => filed1
field 2 => filed2
field 3 => field3
field 4 => "field4,FOO,BAR"
field 5 => field5
Adding some comments as per OP requirement.
From the GNU awk manual on "Defining fields by content":
The value of FPAT should be a string that provides a regular
expression. This regular expression describes the contents of each
field. In the case of CSV data as presented above, each field is
either “anything that is not a comma,” or “a double quote, anything
that is not a double quote, and a closing double quote.” If written as
a regular expression constant, we would have /([^,]+)|("[^"]+")/. Writing this as a string
requires us to escape the double quotes, leading to:
FPAT = "([^,]+)|(\"[^\"]+\")"
Using + twice, this does not work properly for empty fields, but it can be fixed as well:
As written, the regexp used for FPAT requires that each field contain at least one character. A straightforward modification (changing the first ‘+’ to ‘*’) allows fields to be empty:
FPAT = "([^,]*)|(\"[^\"]+\")"
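FPAT is specific to GNU awk. For other awks, the same "define the field by its content" idea can be approximated by peeling one field at a time off the front of the line. This is only a sketch with known limits: csv_fields is a hypothetical name, doubled quotes ("") inside a quoted field are not handled, and an empty field at the very end of the line is not reported.

```shell
# csv_fields is a hypothetical helper: it prints each CSV field on its own
# line, treating a leading "..." run as one field even if it contains commas.
csv_fields() {
  printf '%s\n' "$1" |
    awk '{
      s = $0; n = 0
      while (s != "") {
        # prefer a quoted field; otherwise take everything up to the next comma
        if (!match(s, /^"[^"]*"/))
          match(s, /^[^,]*/)
        printf "field %d => %s\n", ++n, substr(s, RSTART, RLENGTH)
        s = substr(s, RSTART + RLENGTH)
        sub(/^,/, "", s)                 # drop the separator after the field
      }
    }'
}

csv_fields 'filed1,filed2,field3,"field4,FOO,BAR",field5'
```

For the sample line this should list the same five fields as the FPAT version above.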
FPAT works when there are newlines and commas inside the quoted fields, but not when there are double quotes, like this:
field1,"field,2","but this field has ""escaped"" quotes"
You can use a simple wrapper program I wrote called csvquote to make data easy for awk to interpret, and then restore the problematic special characters, like this:
csvquote inputfile.csv | awk -F, '{print $4}' | csvquote -u
See https://github.com/dbro/csvquote for code and docs
Fully fledged CSV parsers such as Perl's Text::CSV_XS are purpose-built to handle that kind of weirdness.
Suppose you only want to print the 4th field:
perl -MText::CSV_XS -lne 'BEGIN{$csv=Text::CSV_XS->new()} if($csv->parse($_)){ @f=$csv->fields(); print "\"$f[3]\"" }' file
The input line is split into array @f
Field 4 is $f[3] since Perl starts indexing at 0
I provided more explanation of Text::CSV_XS within my answer here: parse csv file using gawk