How do I encode a : in a url? - urlencode

I need to send a get request where the last part of the url is a json value. I have encoded the following {"period":"600s"} to use on multiple different sites, however they all come up with the same result where the : is not decoded.
The encoded url: stickiness=%7B%22period%22%3A%22600s%22%7D.
Its result when I enter it into my browser:
So how do I encode a :?

%3A is the encoding of :. : is reserved in URIs for designating the port number (e.g. google.com:443 manually specifies to use port 443, the default HTTPS port). If you want to include a : in a URI, it must be precent-sign-encoded, which is what the %3A is. It can't be decoded in the URL bar because it would violate the reserved purpose of the : character.

The colon character is not decoded in the browser as it belong to the reserved characters that already have an explicit meaning in URLs elsewhere - in this case separating the protocol from the hostname and the port after the hostname.
The relevant standard is RFC 1738, page 3:
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "#", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.
Usually a URL has the same interpretation when an octet is
represented by a character and when it encoded. However, this is not
true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

Related

Validating URLs with non traditional characters

I have a URL with special characters, example:
$url = 'https://example.com/c/ファンタシ.jpg';
I can't validate it with:
var_dump( (filter_var($url, FILTER_VALIDATE_URL)) );
Because it isn't a valid URL according to the RFC:
"Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'(),"
[not including the quotes - ed], and reserved characters used for
their reserved purposes may be used unencoded within a URL"
And I can't do:
urlencode($url);
Because that will encode the entire string with encoded slashes, colon etc.
So how do I encode the $url properly so it passed the validation?

AspNet core Url decoding

I am using AspNetCore 2.1.
I encountered an issue to deserialize a portion of URL:
http://localhost:55381/api/Umbrellas/cc1892b0-b790-4698-ae3e-07bee39fd29b/ModeOperationnelWithAppliedEvents?dateDeValeur=2018-09-01T02:00:00.000+02:00
the part "2018-09-01T02:00:00.000+02:00" is expected to be deserialized as DateTimeOffset. But it failed to do it. A default(DateTimeOffset) is returned.
If I encode to this format "2018-09-01T02%3A00%3A00.000%2B02%3A00" => correctly deserialized.
When it is enclosed in URL, that does not work.
In the contrarily, when the same format is enclosed in the body of message, it is correctly deserialized.
{"lastKnownAggregateVersion":4,"validFrom":"2017-09-03T00:00:00.000+02:00","commandId":"0cfa7da0-7895-4917-89ac-24ffa3abb87c","newDateDeValeur":"2017-09-03T00:00:00.000+02:00","eventUniqueIdentifier":{"streamName":"umbrella-54576b92-0234-4ec1-8eee-142375c53325","eventVersion":0},"aggregateId":"54576b92-0234-4ec1-8eee-142375c53325"}
According to RFC3986 both colon ':' and '+' is legal char in a URL. Does anyone have an idea on this?
Ok it turns out URL and URI have different standard
the URL standard is here RFC1738: Uniform Resource Locators (URL). So according to the doc, ':' is reserved for scheme.
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "#", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.
and when it goes to +:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

Whitespace encoding using stringByAddingPercentEscapesUsingEncoding

I am encoding white spaces in a string using
[#"iPhone Content.doc" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding]
in SKPSMTP message sending. But while receiving mail at attachments place I am getting the name iPhone%20Content.doc - instead of a space it shows %20. How can this be avoided / correctly encoded?
If you're doing stringByAddingPercentEscapesUsingEncoding then you're going to get percent signs in your result string... You can either use something different, or go back through and remove the percent signs later.
From the doc:
stringByAddingPercentEscapesUsingEncoding: Returns a representation of
the receiver using a given encoding to determine the percent escapes
necessary to convert the receiver into a legal URL string.
aka, "this method adds percent signs". If you want to reverse this process, use stringByReplacingPercentEscapesUsingEncoding
Just a side note, %20 is there because the hex representation of the space character is 20 and the % sign is an escape. You only need to do this for URLs, as they disallow the use of whitespace characters.
I got solution for my question. Actually am missed to set the "" to a string.
Of course the remote receiver can not accept the url with whitespace, so we must convert the URL address using the stringByAddingPercentEscapesUsingEncoding function.
This function replaces spaces in the URL expression with %20. It is especially useful when the URL contains non-ascii characters - you have use the function to percent-escape the URL so that the remote server can accept your request.

Special characters in Amazon S3 file name

Users are uploading files with names like "abc #1", "abc #2". I am uploading these files to S3. When I try to download these files I get an error like this
InvalidArgument
Header value contained an open quoted span.
I am creating the link by wrapping the file name using "Uri.EscapeUriString".
Any suggestions?
From AWS documentation:
The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.
So the "abc #1" and "abc #2" are valid key names, the problem is then probably in your client code, check the documentation of your Http client.
AWS also warn about using special characters:
You can use any UTF-8 character in an object key name. However, using certain characters in key names may cause problems with some applications and protocols. The following guidelines help you maximize compliance with DNS, web-safe characters, XML parsers, and other APIs.
Alphanumeric characters: 0-9, a-z, A-Z
Special characters: !, -, _, ., *, ', (, )
So either restrict the set of available characters in your app to only allow the recommended ones, or fix the issue at your client level.
You should use Uri.EscapeDataString instead of Uri.EscapeUriString for 3 reasons:
Uri.EscapeUriString has been deprecated as of .NET 6 - https://learn.microsoft.com/en-us/dotnet/api/system.uri.escapeuristring?view=net-6.0
Uri.EscapeUriString can corrupt the Uri string in some cases
Uri.EscapeUriString only escapes the spaces - not the #
Uri.EscapeUriString("abc #1") returns "abc%20#1" whereas Uri.EscapeDataString("abc #1") returns "abc%20%231" which is preferable.

Are square brackets permitted in URLs?

Are square brackets in URLs allowed?
I noticed that Apache commons HttpClient (3.0.1) throws an IOException, wget and Firefox however accept square brackets.
URL example:
http://example.com/path/to/file[3].html
My HTTP client encounters such URLs but I'm not sure whether to patch the code or to throw an exception (as it actually should be).
RFC 3986 states
A host identified by an Internet
Protocol literal address, version 6
[RFC3513] or later, is distinguished
by enclosing the IP literal within
square brackets ("[" and "]"). This
is the only place where square bracket
characters are allowed in the URI
syntax.
So you should not be seeing such URI's in the wild in theory, as they should arrive encoded.
Square brackets [ and ] in URLs are not often supported.
Replace them by %5B and %5D:
Using a command line, the following example is based on bash and sed:
url='http://example.com?day=[0-3][0-9]'
encoded_url="$( sed 's/\[/%5B/g;s/]/%5D/g' <<< "$url")"
Using Java URLEncoder.encode(String s, String enc)
Using PHP rawurlencode() or urlencode()
<?php
echo '<a href="http://example.com/day/',
rawurlencode('[0-3][0-9]'), '">';
?>
output:
<a href="http://example.com/day/%5B0-3%5D%5B0-9%5D">
or:
<?php
$query_string = 'day=' . urlencode('[0-3][0-9]') .
'&month=' . urlencode('[0-1][0-9]');
echo '<a href="http://example.com?',
htmlentities($query_string), '">';
?>
Using your favorite programming language... Please extend this answer by posting a comment or editing directly this answer to add the function you use from your programming language ;-)
For more details, see the RFC 3986 specifying the URL syntax. The Appendix A is about %-encoding in the query string (brackets as belonging to “gen-delims” to be %-encoded).
I know this question is a bit old, but I just wanted to note that PHP uses brackets to pass arrays in a URL.
http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3
In this case $_GET['bar'] will contain array(1, 2, 3).
Pretty much the only characters not allowed in pathnames are # and ? as they signify the end of the path.
The uri rfc will have the definative answer:
http://www.ietf.org/rfc/rfc1738.txt
Unsafe:
Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".
All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.
The answer is that they should be hex encoded, but knowing postel's law, most things will accept them verbatim.
Any browser or web-enabled software that accepts URLs and is not throwing an exception when special characters are introduced is almost guaranteed to be encoding the special characters behind the scenes. Curly brackets, square brackets, spaces, etc all have special encoded ways of representing them so as not to produce conflicts. As per the previous answers, the safest way to deal with these is to URL-encode them before handing them off to something that will try to resolve the URL.
For using the HttpClient commons class, you want to look into the org.apache.commons.httpclient.util.URIUtil class, specifically the encode() method. Use it to URI-encode the URL before trying to fetch it.
StackOverflow seems to not encode them:
https://stackoverflow.com/search?q=square+brackets+[url]
Best to URL encode those, as they are clearly not supported in all web servers. Sometimes, even when there is a standard, not everyone follows it.
According to the URL specification, the square brackets are not valid URL characters.
Here's the relevant snippets:
The "national" and "punctuation" characters do not appear in any
productions and therefore may not appear in URLs.
national { | } | vline | [ | ] | \ | ^ | ~
punctuation < | >
Square brackets are considered unsafe, but majority of browsers will parse those correctly. Having said that it is better to replace square brackets with some other characters.