Does DKIM header canonicalization insert a space after each comma in the To: header? - zend-mail

Using Zend Framework 1.12, PHP 5.3.1. When specifying multiple mail recipients to Zend_Mail and using the SMTP transport, the "To:" header contains a list of recipients separated by commas, e.g.:
To: a@example.com,b@example.com
This appears to be correct syntax according to RFC 2822.
I've added DKIM signing (using https://github.com/louisameline/php-mail-signature), which works fine, producing passing signatures accepted by Gmail and other verifiers with a single To: recipient. But for multiple recipients, the signature fails.
Sending signed mail to check-auth@verifier.port25.com returns an email to the Return-Path containing, among the results of various checks, the canonicalized version of the headers and body that it computed. I can see that it has added a space after each comma when canonicalizing the To: header (I'm specifying relaxed canonicalization for both headers and body). And indeed, when I hacked my signature generation to add that space when canonicalizing the To: header, that solved the problem: the port25.com verifier, Gmail, and others now pass the signature.
But in any description I've seen of DKIM relaxed canonicalization, including a pretty careful reading of section 3.4 of RFC 6376, as well as the code in the signature class I downloaded from github referenced above, there is nothing that adds whitespace where none was present in the original; the specified canonicalizations are all about changing existing whitespace.
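For reference, the relaxed header canonicalization of RFC 6376 section 3.4.2 boils down to roughly the following (a simplified PHP sketch of my reading of the spec; it glosses over some folding edge cases, and note that nothing in it inserts whitespace that wasn't already there):

function dkim_relaxed_header($name, $value) {
    // Lowercase the header field name (RFC 6376 s3.4.2).
    $name = strtolower(trim($name));
    // Unfold the value and collapse each run of WSP to a single space;
    // trim leading/trailing WSP. Nothing here ADDS whitespace.
    $value = preg_replace('/\s+/', ' ', trim($value));
    // No whitespace surrounds the colon in the canonical form.
    return $name . ':' . $value;
}

// e.g. dkim_relaxed_header('To', 'a@example.com,b@example.com')
//      => 'to:a@example.com,b@example.com'  (no space inserted)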
So it seems to me that the canonicalization being done by the port25 verifier is at odds with RFC 6376 - except that other DKIM verifiers like Gmail, autorespond+dkim-relaxed@dk.elandsys.com, and http://www.appmaildev.com/en/dkim/ all agree on the final result (they don't show the canonicalizations they performed like port25 does, but if I don't canonicalize with the space after the commas, they reject the signature).
Does anyone here have any insight into this? Since practice in the field appears to differ consistently with the RFC dated 2011, shouldn't the RFC be updated? And of course there is also the question of whether all commas in the To: header value should be canonicalized to comma-space before WSP is replaced by a single space, and whether any other headers are affected.
Summary of final resolution
Evan's answer was exactly right regarding the OP: my MTA was adding the space, not the validator's canonicalization. I just had not looked carefully enough at the To: header that actually had been received.
But knowing that doesn't yield a completely trivial solution to the problem of generating correct signatures. The "root" problem is that Zend_Mail generates the address-list values in headers using a comma without a following space as the separator - although that's perfectly valid syntax, it's also perfectly valid for an MTA to introduce a space after each comma, and the relaxed canonicalization specified by RFC 6376 and widely implemented does not accommodate such an MTA rewrite. Changing the Zend_Mail code to use comma-space as the separator would be trivial, except that it is done in the middle of a fairly long and complex method that would involve major copy-paste to override, which felt wrong. So what I ended up doing was to write a pre-filter that puts the space in the headers before signing them. Although it's not 100% perfect with respect to all possible syntax variations (including comments), applying the following preg_replace to the values of the To:, Cc:, and Reply-To: headers does the trick for me:
preg_replace('/(@[a-z0-9]+(\.[a-z0-9-]+)*>?,)([^ ])/i', '\1 \3', $header_value)
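In context, the pre-filter amounts to something like this (a sketch only; the $headers array and the point at which the signing class is invoked are assumptions about the surrounding setup):

function normalize_address_list($header_value) {
    // Same pattern as above: after each address (bare or in angle
    // brackets) followed by a comma, ensure the next character is a space.
    return preg_replace(
        '/(@[a-z0-9]+(\.[a-z0-9-]+)*>?,)([^ ])/i',
        '\1 \3',
        $header_value
    );
}

// Hypothetical usage: normalize the address-list headers before they
// are handed to the DKIM signing class.
foreach (array('To', 'Cc', 'Reply-To') as $name) {
    if (isset($headers[$name])) {
        $headers[$name] = normalize_address_list($headers[$name]);
    }
}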

What is your MTA, and under which platform?
I believe it may be the one modifying your headers, not the recipient applications.
Note: I am the maintainer of the library you mentioned.


With REST, do you use a body or query params when creating a Resource?

I'm working on a digital media library where users can create entries for a Media resource.
The media resource is made up of tons of properties, e.g.:
Media: {
    id,
    name,
    type,
    private,
}
The URL users use to create a resource is:
POST api/media
On the backend, we are creating the resource with a UID generated for them while defaulting name, type and private values. However, users can pass in name, type, private if they choose to.
Section 4.3.3 of RFC 7231 doesn't seem to have an opinion on whether or not to use query params or the POST body for these data.
So is it better to do this
api/media?type="audio"&name="Hopkins County Collective"&private=false
or with a body instead?
api/media
body {
    name:
    type:
    private:
}
Although, after reading section 4.3.3 for POST here (https://www.rfc-editor.org/rfc/rfc7231#section-4.3.3), I see this piece:
Providing a block of data, such as the fields entered into an HTML
form, to a data-handling process;
I'm leaning toward putting the fields in the body, but I'm still unsure.
Thanks
do you use a body or query params when creating a Resource?
The Body. But it can be more complicated than that.
HTTP gives us standardized message semantics - we all agree, by adopting the common standard, what a given message means. That doesn't necessarily constrain what we do with the message when we get it.
For example.
PUT /id=1 HTTP/?.?
Content-Type: text/plain
id=2
That message means that we want the resource identified by /id=1 to have the representation id=2. In other words, this is the future behavior intended by the client:
GET /id=1 HTTP/?.?
200 OK
Content-Location: /id=1
Content-Type: text/plain
id=2
So the body describes what we want the representation to be, and the effective-uri identifies which document we are talking about.
The same basic pattern holds for POST and PATCH - the effective-uri tells us which resource we want to change, the body describes that change.
BUT...
You, the server, aren't actually required to do what the request asks you to do. You can reject the request (4xx), or you can do something similar to the request and tell the client about that.
So you might, as part of the implementation hidden behind your REST facade, copy information from the effective-uri in addition to, or instead of, exactly applying the instructions provided by the client in the body of the request. (You have to be a little bit careful with the response metadata to ensure there's no ambiguity about what you did do.)
Anecdotally, "just about everyone" seems to be using the body to represent what they want the created resource to look like, be, or contain.
Parameters are often not used at all, and when they are, it's only for controlling aspects of how that resource is to be created, not anything to do with what the resource is to look like, be, or contain.
I say anecdotally, because I'm sure there are exceptions to this -- you're even contemplating one. That said, REST does not specifically say anything about parameters vs. body.
For the sake of conformity, and for the sake of "doing it like everyone else", go with body.
There are other considerations pointing away from parameters: 1) they are part of the URI, and URIs are used for identification purposes, 2) the query string length is highly constrained, so would prevent creating large objects, and 3) it would be a diagnostics/debugging nightmare parsing the query string in your head trying to make sense of it.
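To make the recommendation concrete, here is a minimal PHP sketch (using the curl extension; the hostname is made up, and the endpoint and field names come from the question) of creating the resource with a JSON body:

$payload = json_encode(array(
    'name'    => 'Hopkins County Collective',
    'type'    => 'audio',
    'private' => false,
));

$ch = curl_init('https://example.com/api/media'); // hypothetical host
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch); // expect 201 Created with the new resource
curl_close($ch);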

Which of the following are valid URIs (Uniform Resource Identifiers) as per the REST API specifications?

How do I identify which of the following Uniform Resource Identifiers (URIs) are valid as per REST API specifications?
Choose one or more options
1. POST https://api.example.com/whales/create/9xf3df
2. PUT https://api.example.com/whales/9xf3df
3. GET https://api.example.com/whales/9xf3df?sort=name&valid=true
4. DELETE https://api.example.com/whales
REST doesn't care what spelling conventions you use for your resource identifiers; anything that conforms to the production rules defined by RFC 3986 is fine.
/whales/create/9xf3df
/whales/9xf3df
/whales/9xf3df?sort=name&valid=true
/whales
/0cc846bb-678d-45d8-9c06-d9cf94cee0a5
/9xf3df/whales
These are all fine identifiers.
Identifiers for a "REST API" are exactly like identifiers for web pages - you can use any spelling you want, and the browsers, caches, web crawlers, and so on will work with them quite happily; these general purpose components treat identifiers like identifiers - they don't try to extract any meaning from them.
By way of demonstration, please observe that all of the following work exactly the way you would expect them to:
https://www.merriam-webster.com/dictionary/create
https://www.merriam-webster.com/dictionary/get
https://www.merriam-webster.com/dictionary/put
https://www.merriam-webster.com/dictionary/post
https://www.merriam-webster.com/dictionary/patch
https://www.merriam-webster.com/dictionary/delete
Does REST care about the POST, PUT, GET, and DELETE for the above options?
Hard to be sure which question you are asking here.
PUT /dictionary/delete HTTP/1.1
That's a perfectly satisfactory request-line, and there is no ambiguity about what it means. In this example, PUT is the method-token; that tells the server that we are requesting that the representation of the target resource (identified by /dictionary/delete) be replaced by the representation included in the message-body of the request.
For this specific resource, that probably means that the message-body is an HTML document (we'd see Content-Type: text/html in the headers, to ensure that the server knows how to correctly interpret the bytes provided).
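Spelled out in full, such a request might look something like this (the Host, Content-Length, and body here are purely illustrative):
PUT /dictionary/delete HTTP/1.1
Host: www.merriam-webster.com
Content-Type: text/html
Content-Length: 50

<!DOCTYPE html><html><body>delete...</body></html>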
PATCH /dictionary/delete HTTP/1.1
This is also a satisfactory request line; we are again requesting a change to the representation of the /dictionary/delete resource, but we're going about it in a slightly different way - instead of including a replacement representation in the message body, we're providing a representation of a list of changes to make (aka a "patch document").
Uniform interface means that we should expect the folks at www.merriam-webster.com to understand these messages exactly as we've described them here.
Now, for these specific resources, they probably don't want random Stack Overflow members making changes to their website, so they are likely to respond 403 Forbidden or 405 Method Not Allowed.
All of the general purpose components will understand what that means, again because the standardized response meta-data is common to all resources.

Modsecurity create config file with rules for specific URL

I'm starting to learn about ModSecurity and rule creation. So, say I know a page in a web app is vulnerable to cross-site scripting. For argument's sake, let's say page /blah.html is prone to XSS.
Would the rule in question look something like this?:
SecRule ARGS|REQUEST_URI "blah" REQUEST_HEADERS "@rx <script>" id:101,msg:'XSS Attack',severity:ERROR,deny,status:404
Is it possible to create a config file for that particular page (or even wise to do so?) or better said is it possible to create rules for particular URL's?
Not quite right - there are a few things wrong with this rule as it's written now, but I think you get the general concept.
To explain what's wrong with the rule as it currently stands takes a fair bit of explanation:
First up, ModSecurity syntax for defining rules is made up of several parts:
SecRule
field or fields to check
values to check those fields for
actions to take
You have two sets of parts 2 and 3, which is not valid. If you want to check two things, you need to chain two rules together; I'll show an example of that later.
Next up you are checking the Request Headers for the script tag. This is not where most XSS attacks exist and you should be checking the arguments instead - though see below for further discussion on XSS.
Also, checking for <script> is not very thorough. It would easily be defeated by <script type="text/javascript">, for example. Or by <SCRIPT> (you should add a t:lowercase transformation to ignore case). Or by escaping characters that might be processed by parts of your application.
Moving on, there is no need to specify the @rx operator, as that's implied if no other operator is given. While there's no harm in being explicit, it's a bit odd that you didn't give it for blah but did for the <script> bit.
It's also advisable to specify a phase rather than use the default (usually phase 2). In this case you'd want phase 2, which is when all the Request Headers and Request Body pieces are available for processing, so the default is probably correct, but it's best to be explicit.
And finally, 404 is the "page not found" response code. A 500 or 503 might be a better response here.
So your rule would be better rewritten as:
SecRule REQUEST_URI "/blah.html" "id:101,phase:2,msg:'XSS Attack',severity:ERROR,deny,status:500,chain"
    SecRule ARGS "<script" "t:lowercase"
Though this still doesn't address all the ways that the basic check you are doing for a script tag can be worked around, as mentioned above. The OWASP Core Rule Set is a free set of ModSecurity rules that has been built up over time and has some XSS rules in it that you can check out. Though be warned, some of their regexes get quite complex to follow!
ModSecurity also works better as a more generic check, so it's unusual to want to protect just one page like this (though not completely unheard of). If you know one page is vulnerable to a particular attack, then it's often better to fix that page, or how it's processed, rather than using ModSecurity to handle it. "Fix the problem at source rather than patching around it" is always a good mantra to follow where possible. You would do that by sanitising and escaping HTML in user inputs, for example.
That said, it is often a good short-term solution to use a ModSecurity rule to get a quick fix in while you work on the more correct longer-term fix - this is called "virtual patching". Though sometimes these have a tendency to become the long-term solution, as no one gets time to fix the core problem.
If however you wanted a more generic check for "script" in any argument on any page, then that's what ModSecurity is more often used for. This adds extra protection on top of what should already be there in a properly coded app, as a back-up and extra line of defence - for example, in case someone forgets to protect one page out of many by sanitising user inputs.
So it might be best to drop the first part of this rule and just have the following to check all pages:
SecRule ARGS "<script" id:101,phase:2,msg: ‘XSS Attack’,severity:ERROR,deny,status:500,"t:lowercase"
Finally, XSS is quite a complex subject. Here I've assumed you want to check parameters sent when requesting a page. So if the page uses user input to construct itself and displays that input, then you want to protect that - known as "reflected XSS". There are other XSS vulnerabilities though. For example:
If bad data is stored in a database and used to construct a page, that's known as "stored XSS". To address this you might want to check the results returned from the page in the RESPONSE_BODY variable in phase 4, rather than the inputs sent to the page in the ARGS variable in phase 2 (see the sketch after this list). Though processing response pages is typically slower and more resource-intensive than processing requests, which are usually much smaller.
You might be able to execute an XSS without going through your server, e.g. if loading external content like a third-party commenting system, or if the page is served over HTTP and manipulated between server and client. This is known as "DOM-based XSS", and as ModSecurity sits on your server, it cannot protect against these types of XSS attacks.
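Here is the sketch promised above for the stored-XSS case - a rule along these lines inspects the response body instead (a sketch only: the rule id is arbitrary, and enabling response body access has a performance cost):
SecResponseBodyAccess On
SecRule RESPONSE_BODY "<script" "id:102,phase:4,t:lowercase,msg:'Possible stored XSS in response',severity:ERROR,deny,status:500"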
I've gone into quite a lot of detail there, but I hope that helps explain things a little more. I found the ModSecurity Handbook the best resource for teaching yourself ModSecurity, despite its age.

Content-Range header - allowed units?

This is related to:
How should I implement a COUNT verb in my RESTful web service?, Paging in a Rest Collection,
and Using the HTTP Range Header with a range specifier other than bytes?
Actually I think the -1 rated answer here is correct: https://stackoverflow.com/a/1434701/1237617
Generally, answers say that you can use custom units, citing sec 3.12:
range-unit = bytes-unit | other-range-unit
bytes-unit = "bytes"
other-range-unit = token
However, when you read the HTTP spec, please notice that the production rules are thus:
Content-Range = "Content-Range" ":" content-range-spec
content-range-spec = byte-content-range-spec
byte-content-range-spec = bytes-unit SP
byte-range-resp-spec "/"
( instance-length | "*" )
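For concreteness, a header matching these productions looks like this (the first 500 bytes of a 1234-byte representation):
Content-Range: bytes 0-499/1234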
The header spec only references bytes-unit from sec 3.12, not range-unit, so I think it's actually against the spec to use custom units here.
Am I missing something, or is the popular answer wrong?
EDIT: Since this probably isn't clear, the gist of my question is:
RFC 2616, section 14.16, only references bytes-unit. It never mentions range-unit, so the range-unit production is not relevant for Content-Range, and thus only bytes-unit can be used.
I think this addresses my concerns best, although I needed some time to understand it (plus I wanted to make sure that there was something wrong with the wording):
This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests
thanks to elgaton
The spec, as it is being revised, allows custom range units. See HTTPbis Part 5, Section 2.
If you read the HTTP/1.1 RFC, section 3.12, you will see that:
The only range unit defined by HTTP/1.1 is "bytes". HTTP/1.1 implementations MAY ignore ranges specified using other units.
So, the other-range-unit token has been introduced only to make servers more "liberal" when accepting. This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests, so that servers could accept even invalid requests (they will simply be ignored) and clients would use only the universally-accepted bytes unit.
Therefore, I personally recommend to:
use only the bytes unit when acting as a client, and
accept other units (discarding the Content-Range header if they are invalid) when acting as a server.
This is a purely personal opinion, but I think it is fairly consistent with how other HTTP extensions (custom methods or headers) are used. Here is how I read it: Yes I can use custom range units and no, I shouldn't submit a bug report when it gets ignored when passing through firewalls, web proxies, and other intermediaries. I conform to the HTTP spec when I'm sending it and they conform to HTTP when they ignore it. WebDAV uses HTTP extensions correctly, IMO, but rarely works over the Internet for exactly this reason. As I said, a personal opinion only.
Apparently it's OK to use custom units, because:
This reflects the fact that, apparently, the first set of grammar rules has been specifically made for parsing and the second one for producing HTTP requests

Why would Apache be URL decoding my query string?

My Web host has refused to help me with this, so I'm coming to the wise folks here for some help "black-box debugging". Here's an edited version of what I sent to them:
I have two (among other) domains at dreamhost:
1) thefigtrees.net
2) shouldivoteformccain.com
I noticed today that when I host a CGI script on #1, that by the time the
CGI script runs, the HTTP GET query string passed to it as the QUERY_STRING
environment variable has already been URL decoded. This is a problem because
it then means that a standard CGI library (such as perl's CGI.pm) will try to
split on ampersands and then decode the string itself. There are two
potential problems with this:
1) the string is doubly-decoded, so if a value is submitted to the script
such as "%2525", it will end up being treated as just "%" (decoded twice)
rather than "%25" (decoded once)
2) (more common) if there is an ampersand in a value submitted, then it
will get (properly) submitted as %26, but the QUERY_STRING env. variable will
have it already decoded into an "&" and then the CGI library will improperly
split the query string at that ampersand. This is a big problem!
The script at http://thefigtrees.net/test.cgi demonstrates this. It echoes back the
environment variables it is called with. Navigating in a browser to:
http://thefigtrees.net/lee/test.cgi?x=y%26z
You can see that REQUEST_URI properly contains x=y%26z (still encoded) but that
QUERY_STRING already has it decoded to x=y&z.
If I repeat the test at domain #2 (
http://www.shouldivoteformccain.com/test.cgi?x=y%26z ) I see that the
QUERY_STRING remains undecoded, so that CGI.pm then splits and decodes
correctly.
I tried disabling my .htaccess files on both to make sure that was not the
problem, and saw no difference.
Could anyone speculate on potential causes of this, since my Web host seems unwilling to help me?
thanks,
Lee
I have the same behavior in Apache.
I believe mod_rewrite will automatically decode the URL if it is installed; however, I have seen the auto-decode behavior even without it. I haven't tracked down the other culprit.
A common workaround is to double encode the input parameter (taking advantage of URL decoding being safe when called on an unencoded URL).
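A minimal PHP sketch of that double-encoding workaround (PHP here just for illustration; the same idea applies in Perl):

$value = 'y&z';
$qs = 'x=' . rawurlencode(rawurlencode($value)); // "x=y%2526z"
// The server's unwanted extra decode yields x=y%26z, which the CGI
// library then splits correctly and decodes back to y&z.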
Curious. Nothing I can see from here would give us a clue why this would happen... I can only confirm that it is an environment bug and suspect maybe configuration differences like maybe rewrite rules.
Per CGI 1.1, this decoding should only happen to SCRIPT_NAME and PATH_INFO, not QUERY_STRING. It's pointless and annoying that it happens at all, but that's the spec. Using REQUEST_URI instead of those variables where available (i.e. Apache) is a common workaround for places where you want to put out-of-bounds and Unicode characters in path parts, so it might be reasonable to do the same for query strings until some sort of resolution is available from the host.
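As a sketch of that REQUEST_URI workaround (in PHP for illustration, though the scripts in question are Perl CGI): recover the raw, still-encoded query string and split it yourself before decoding:

$raw = parse_url($_SERVER['REQUEST_URI'], PHP_URL_QUERY); // e.g. "x=y%26z"
$params = array();
foreach (explode('&', (string) $raw) as $pair) {
    // Split on the first '=' only; pad so bare keys still work.
    list($k, $v) = array_pad(explode('=', $pair, 2), 2, '');
    $params[rawurldecode($k)] = rawurldecode($v);
}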
VPSs are cheap these days...