MIME "From:" header with national characters - header

What is the correct format of "From:" header when From Name contains national characters and dot (.) character?
We generate (using C# Chilkat lib) this:
From: =?utf-8?Q?Micha=C5=82_from_domain.com?= <abcdef#domain.com>
(where From Name = Michał from domain.com)
This works OK in most cases. However, we encountered an email provider which marks this header as invalid and uses Return-Path header instead (which is machine-readable only).
The error is:
Illegal-Object: Syntax error in From: address found on ps11.m5r2.onet:
From: =?utf-8?Q?Micha=C5=82_from_domain.com?=<abcdef#domain.com>
^-missing end of mailbox
The provider insists the the problem is the lack of space between name and email. This is not the case on our end (see previous code example).

That email provider has a broken MTA. Unfortunately, you have to deal with it.
You're already formatting your non-ASCII "From" personal-part as an RFC 2047 encoded-word. Since you're using Q as the encoding, you can take advantage of the flexibility in the quoted-printable encoding and encode the . as well:
From: =?utf-8?Q?Micha=C5=82_from_domain=2Ecom?= <abcdef#domain.com>
(Note that the . has been replaced by its quoted-printable encoding, =2E.)

Related

Where are named pdf characters defined like "f_f", "uni00D0" and "a204"?

I'm trying to read the official pdf specification "Document management — Portable document format — Part 1: PDF 1.7" (PDF32000_2008.pdf) as bytes and then interpret them according to that specification.
In Annex D, Character Sets and Encodings, there is a list of all named characters, like:
or
When I parse PDF32000_2008.pdf, there are also named characters like "f_f", "uni00D0" and "a204", which are missing in that specification.
My guess is that "f_f" is a symbol for two 'f' characters, which might get printed with a special glyph. There is a unicode "Latin Small Ligature Ff" for 'ff'.
For example, there is also "f_i" in that file, which I expect to mean 'fi', one glyph showing the 2 characters 'f' and 'i'. However, the pdf specification has 'fi' as named character "fi" and what is the point for having 2 named characters pointing to the same symbol ?
I can imagine that "uni00D0" means the unicode character 'Ð'. However, pdf defines it already as named character "Eth"
What could be "a204" ? Maybe Ansi 204 'Ì', which also has already a named character "Igrave" ?
Why do they use also "a62", which would be just a '<' ?
However, my main question is: Where can I find a specification for these additional named characters ?
Of course, Adobe Acrobat understands them, but also Gmail seems not to have a problem with them. So I guess, their meaning must be specified somewhere.

AspNet core Url decoding

I am using AspNetCore 2.1.
I encountered an issue to deserialize a portion of URL:
http://localhost:55381/api/Umbrellas/cc1892b0-b790-4698-ae3e-07bee39fd29b/ModeOperationnelWithAppliedEvents?dateDeValeur=2018-09-01T02:00:00.000+02:00
the part "2018-09-01T02:00:00.000+02:00" is expected to be deserialized as DateTimeOffset. But it failed to do it. A default(DateTimeOffset) is returned.
If I encode to this format "2018-09-01T02%3A00%3A00.000%2B02%3A00" => correctly deserialized.
When it is enclosed in URL, that does not work.
In the contrarily, when the same format is enclosed in the body of message, it is correctly deserialized.
{"lastKnownAggregateVersion":4,"validFrom":"2017-09-03T00:00:00.000+02:00","commandId":"0cfa7da0-7895-4917-89ac-24ffa3abb87c","newDateDeValeur":"2017-09-03T00:00:00.000+02:00","eventUniqueIdentifier":{"streamName":"umbrella-54576b92-0234-4ec1-8eee-142375c53325","eventVersion":0},"aggregateId":"54576b92-0234-4ec1-8eee-142375c53325"}
According to RFC3986 both colon ':' and '+' is legal char in a URL. Does anyone have an idea on this?
Ok it turns out URL and URI have different standard
the URL standard is here RFC1738: Uniform Resource Locators (URL). So according to the doc, ':' is reserved for scheme.
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "#", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.
and when it goes to +:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

How do I encode a : in a url?

I need to send a get request where the last part of the url is a json value. I have encoded the following {"period":"600s"} to use on multiple different sites, however they all come up with the same result where the : is not decoded.
The encoded url: stickiness=%7B%22period%22%3A%22600s%22%7D.
Its result when I enter it into my browser:
So how do I encode a :?
%3A is the encoding of :. : is reserved in URIs for designating the port number (e.g. google.com:443 manually specifies to use port 443, the default HTTPS port). If you want to include a : in a URI, it must be precent-sign-encoded, which is what the %3A is. It can't be decoded in the URL bar because it would violate the reserved purpose of the : character.
The colon character is not decoded in the browser as it belong to the reserved characters that already have an explicit meaning in URLs elsewhere - in this case separating the protocol from the hostname and the port after the hostname.
The relevant standard is RFC 1738, page 3:
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "#", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.
Usually a URL has the same interpretation when an octet is
represented by a character and when it encoded. However, this is not
true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

Add TM symbol in the subject of the email

I need to show TM symbol in the subject of an email. I am adding it in the body of the mail using ™ as we can send the email as text/html encoding. But, I need to add this in the subject line also.
I guess this is not possible since we cannot use HTML in subject line of the email. But I want to make sure of that or maybe some other way of doing it. I am using Rails 3.
i don't know ruby, but generally, you would encode the subject like this :
=?utf-8?q?hello_world=E2=84=A2?=
=> this produces
"hello world™"
=E2=84=A2 is the quoted printable representation of the trademark symbol in utf-8
see: http://en.wikipedia.org/wiki/MIME#Encoded-Word
Add this Unicode Character trademark symbol \u2122 in from address for ex: "MakeMyShop\u2122 Offers" and in email subject like "Welcome to MakeMyShop.IN\u2122!"

Special characters in Amazon S3 file name

Users are uploading files with names like "abc #1", "abc #2". I am uploading these files to S3. When I try to download these files I get an error like this
InvalidArgument
Header value contained an open quoted span.
I am creating the link by wrapping the file name using "Uri.EscapeUriString".
Any suggestions?
From AWS documentation:
The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.
So the "abc #1" and "abc #2" are valid key names, the problem is then probably in your client code, check the documentation of your Http client.
AWS also warn about using special characters:
You can use any UTF-8 character in an object key name. However, using certain characters in key names may cause problems with some applications and protocols. The following guidelines help you maximize compliance with DNS, web-safe characters, XML parsers, and other APIs.
Alphanumeric characters: 0-9, a-z, A-Z
Special characters: !, -, _, ., *, ', (, )
So either restrict the set of available characters in your app to only allow the recommended ones, or fix the issue at your client level.
You should use Uri.EscapeDataString instead of Uri.EscapeUriString for 3 reasons:
Uri.EscapeUriString has been deprecated as of .NET 6 - https://learn.microsoft.com/en-us/dotnet/api/system.uri.escapeuristring?view=net-6.0
Uri.EscapeUriString can corrupt the Uri string in some cases
Uri.EscapeUriString only escapes the spaces - not the #
Uri.EscapeUriString("abc #1") returns "abc%20#1" whereas Uri.EscapeDataString("abc #1") returns "abc%20%231" which is preferable.