Safari shows an error when IPB redirects it to an https:// URL with language accents - ssl

My https switch is almost complete. Everything works flawlessly on every browser except Safari (both iOS and OSX).
Some URLs contain Polish characters, such as żźćąłóń. Apparently in some cases Invision Power Board encodes them, so for example ó becomes %F3.
Example 1
HTTP version works: http://net4game.com/topic/256549-rekrutujemy-gamemaster%F3w/
HTTPS with a special UTF8 character works as well: https://net4game.com/topic/256549-rekrutujemy-gamemasterów/
Does not work when ó is encoded: https://net4game.com/topic/256549-rekrutujemy-gamemaster%F3w/
[29/Aug/2015:17:19:56 +0200] "GET /topic/256549-rekrutujemy-gamemaster%F3w/ HTTP/1.1" 301 31 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9"
Example 2
HTTP works: http://net4game.com/topic/256786-bug-ślub/
HTTPS works: https://net4game.com/topic/256786-bug-ślub/
HTTPS, UTF8, with a redirect, does not work: https://net4game.com/topic/256786-bug-ślub/?view=getlastpost
I'm using nginx behind CloudFlare. Every request goes to index.php.
What could be wrong? Why does this problem occur only on HTTPS? It seems Safari doesn't rewrite %F3 to ó over encrypted connections, but I'm not sure it's relevant, as the second example seems to stick to the unencoded ś.
Cheers.
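For reference, a small Python sketch of what the two URL forms contain. It assumes the board produces %F3 by percent-encoding ó in a single-byte legacy charset such as ISO-8859-2 (my assumption; the byte is the same in a few related charsets), while the working URL is UTF-8. It does not explain the Safari behaviour by itself, it only shows that the %F3 form is not valid UTF-8.

```python
from urllib.parse import quote, unquote

slug = "gamemasterów"

# Assumption: the legacy form comes from a single-byte charset such as ISO-8859-2.
legacy = quote(slug, encoding="iso-8859-2")
# The form that works over HTTPS corresponds to UTF-8 percent-encoding.
utf8 = quote(slug, encoding="utf-8")

print(legacy)  # gamemaster%F3w
print(utf8)    # gamemaster%C3%B3w

# Decoding the legacy form as UTF-8 cannot recover the letter:
# the lone byte 0xF3 is replaced with U+FFFD.
print(unquote(legacy))
# Decoding it with the charset it was written in does recover it.
print(unquote(legacy, encoding="iso-8859-2"))
```

A client that insists on UTF-8 when handling the Location header of the 301 would have no valid way to interpret %F3, which may be worth checking.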

Related

AEM infinite redirect loop after Author login via https

I have an AEM Author instance that sits behind a Dispatcher, and my Apache server is configured to use a cert/key for SSL encryption. When I go to my project's https URL, which reaches the Author instance via the Dispatcher, I am prompted with the usual Author login page. If I type in my credentials and click submit, I get stuck in an infinite redirect loop.
The first thing I did to troubleshoot this issue was to analyse my Apache access logs to see if any 301s or 302s were occurring there, but I found nothing of the sort; all I see are 200s.
"GET /libs/granite/csrf/token.json HTTP/1.1" 200 123 "https://bla.bla.bla/libs/granite/core/content/login.html?resource=%2F&$$login$$=%24%24login%24%24&j_reason=unknown&j_reason_code=unknown" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
I then went back to my browser, tried again, and dug into the browser developer tools. From this it seems that after authentication AEM Author uses JavaScript to redirect you: / => /index.html => /aem/start.html.
I then replaced the correct cert/key in my Apache configuration with one that did not correspond to my domain (which renders it invalid). After restarting Apache I visited my https URL (accepting the browser's https exception), and this time, when prompted with the Author login page, I was able to enter my credentials and log in without an infinite redirect loop; normal Author behavior occurs.
From this troubleshooting I believe something involving the cert/key setup, perhaps HSTS, is creating this infinite redirect loop, but I can't figure out what exactly is happening.
Any input on this issue is welcomed.
Thanks
Looking at the headers set on that authentication request, I noticed Cache-Control: max-age=3600. Digging further into my Apache configuration, I found that I was setting expiry headers with ExpiresDefault "access plus 1 hour". Once I removed that from the Author's vhost configuration, the infinite redirect loop went away.

Random chars appearing in Apache access logs

We are seeing random letters appear in request paths in our access logs. The requests return 404 since the content does not exist. They are made by a variety of users, and other requests from the same IP usually look genuine. There is no way to request these URLs from the site. Some of these requests even come from internal traffic on our network.
Example:
157.203.177.191 - - [04/Feb/2018:23:51:20 +0000] "GET /VLTRP/content/dam/example/dotcom/images/ABtest/existing-customer-thumb.jpg HTTP/1.1" 404 60294 39082 "http://www.example.com/shop.html" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0" 2
Without the /VLTRP this is a genuine request. Has anyone seen something similar before?
For info we are running Apache/2.2.15 (Unix) with ModSec enabled. We do see similar behaviour on another site where we do not have ModSec configured. We see similar requests for internal, external and bot traffic.

Finding 301 Redirects which are working on my site

I have accumulated a fair number of redirects in the .htaccess of my web site (around 700) due to software upgrades. I think about half of these have now been indexed by Google. How can I find the list of redirects which are currently being used?
My idea is to find all "301" in the Apache Logs, such as this:
1.235.117.180 - - [01/Aug/2014:06:41:59 +0200] "GET /components/com_acesearch/assets/css/acesearch.css HTTP/1.1" 301 626 "http://example.com/link1/link2/page-2" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
Is it safe to assume that all redirects which are not listed like the one above are not being used (so I can remove them)?
Thanks
No, that is not safe. Do not rely only on the Apache logs: some old links might still be in the index and can be crawled later on.
Can't you optimize your redirects? Can you give an example of some of them? Isn't there a pattern? With regular expressions you can rewrite your rules quite effectively if you can find a pattern (or a couple of them).
There are more search engines than Google alone. If it is important that everything stays indexed, I would keep the redirects, but find the pattern and shrink the number of rules to 10 at most, or something like that.
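If you still want to see which redirects are being hit, the log-scanning idea from the question can be sketched roughly as below. This is only a sketch: the regex assumes Apache's combined log format, `redirected_paths` is a name I made up, and the second sample line is invented for contrast. As said above, a path missing from the result is not proof that its redirect is unused.

```python
import re
from collections import Counter

# Captures the request path and the status code from a combined-format line.
REQUEST_AND_STATUS = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')

def redirected_paths(lines):
    """Count how often each requested path was answered with a 301."""
    hits = Counter()
    for line in lines:
        m = REQUEST_AND_STATUS.search(line)
        if m and m.group(2) == "301":
            hits[m.group(1)] += 1
    return hits

sample = [
    # Line from the question (user agent shortened).
    '1.235.117.180 - - [01/Aug/2014:06:41:59 +0200] "GET /components/com_acesearch/assets/css/acesearch.css HTTP/1.1" 301 626 "http://example.com/link1/link2/page-2" "Mozilla/5.0"',
    # Hypothetical non-redirect line, invented for this example.
    '1.235.117.180 - - [01/Aug/2014:06:42:03 +0200] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for path, count in redirected_paths(sample).most_common():
    print(count, path)
```

On a real log you would pass an open file object instead of `sample`, ideally covering several months of traffic.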

Fixing mistakes reading logs

I have a huge 1 GB log file. As far as I know, it shows errors on my site, but I absolutely don't understand it.
I have lots of rows like this:
8x.xxx.45.10x (my ip) - - [04/Feb/2011:09:59:48 -0500] "GET /post?slaps=bbrfd HTTP/1.1" 404 278 "http://mywebsite.com/" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.86 Safari/534.13"
What does it mean?
Thank you very much.
That entry indicates that a request for /post?slaps=bbrfd on your site returned 404 (not found). The request came from your IP, and the response was 278 bytes (the 404 error page's contents). The referer shows that the link was clicked on mywebsite.com, and the rest is how the browser identified itself. The two dashes stand for the remote username (from identd) and the username as logged into the site (HTTP authentication). The remote username is very rarely present, as it requires the client machine to run identd, and looking it up would slow down your site massively.
Looks like an access log file from Apache. Nothing to do with PHP or MySQL. It looks like the user got a 404 page when trying to access /post?slaps=bbrfd
This would suggest the URL does not exist.
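To make the field breakdown above concrete, here is a rough Python sketch that splits such a line into named parts. It assumes Apache's combined log format; the group names are my own labels, and 203.0.113.10 is a placeholder for the masked IP in the question.

```python
import re

# One named group per field of the combined log format (names are illustrative).
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Placeholder IP; the rest is the line from the question.
line = (
    '203.0.113.10 - - [04/Feb/2011:09:59:48 -0500] '
    '"GET /post?slaps=bbrfd HTTP/1.1" 404 278 "http://mywebsite.com/" '
    '"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/534.13 '
    '(KHTML, like Gecko) Chrome/9.0.597.86 Safari/534.13"'
)

fields = COMBINED.match(line).groupdict()
for name, value in fields.items():
    print(f"{name:8} {value}")
```

The `ident` and `user` fields are the two dashes, `status` is the 404, and `size` is the 278 bytes sent back.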

jboss url decoding

We have a servlet hosted on jboss which works on HttpServletRequest. But sometimes we receive requests that do not get decoded by jboss, and when we do getQueryParam on HttpServletRequest, we get null. The jboss access log shows the URL in encoded form. Normally, when everything works smoothly, the URL is shown decoded in the access log.
e.g.:
This was a problematic request:
127.0.0.1 [13/Apr/2009:14:18:53 +0000] GET /redirectService//%3Fclient_id=3&redirect_url=http%253A%252F%252Fwww.amazon.de%252Fgp%252Fsearch%253Fie%253DUTF8%2526keywords%253DMicrosoft+Office+2007%2526search-alias%253Dsoftware%2526 HTTP/1.1 'null' 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12)'
This was a proper request:
127.0.0.1 [13/Apr/2009:14:19:37 +0000] GET /redirectService//?client_id=3&redirect_url=http%3A%2F%2Fwww.amazon.de%2Fgp%2Fsearch%3Fie%3DUTF8%26keywords%3DMAGIX+Video+deluxe+2008%26search-alias%3Dsoftware%26 HTTP/1.1 'http://www.google.de/search?hl=de&q=magix+video+deluxe+2008&meta=&aq=3&oq=%22magix%22' 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; .NET CLR 1.1.4322)'
Could we be missing some jboss decode settings, or is it just a case of a malicious user?
Hard to tell, really.
The client seems to be encoding the question mark into "%3F" but not the ampersand. Suspicious, isn't it? This looks like a buggy client IMO. Maybe nonportable javascript, maybe some URL-rewriting bug on the web server side, or a more esoteric cause, such as a malfunctioning browser plugin.
To rule out nonportable javascript, log the user-agent and compare results. To rule out a URL-rewriting bug, log the referer.
AFAIK, the URL decoder behavior is hardcoded. The string encoding can change if URIs get written in non-ASCII or non-ISO-8859-1, but that's not what you're after. What encodes question marks but fails to encode ampersands escapes me.
We logged the user-agent, it is some suspicious "XXXagentXXX" in most cases, but a genuine Mozilla (as above) in others. Referrer is "-" for all these requests. However, there is one curious thing I noticed today. We redirect our requests from apache (80) to jboss. Apache access log shows above request as completely encoded:
GET /r/%3Fclient_id%3D3%26redirect_url%3Dhttp%253A%252F%252Fwww.amazon.de%252Fgp%252Fsearch%253Fie%253DUTF8%2526keywords%253DCyberlink%2BPower%2BDirector%2526search-alias%253Dsoftware HTTP/1.0" 400 965 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.10)"
while the jboss access log has everything except %3F decoded. Now this makes me think apache is going wrong somewhere in the decoding?
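For what it's worth, the symptom can be reproduced outside jboss. The sketch below uses Python's urllib only as an illustration (it is not what jboss runs) on a shortened copy of the problematic path: with "?" arriving as %3F there is no query string to parse, and the redirect_url value carries one extra layer of percent-encoding.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Shortened copy of the problematic request path from the access log.
raw = ("/redirectService//%3Fclient_id=3"
       "&redirect_url=http%253A%252F%252Fwww.amazon.de%252Fgp%252Fsearch")

# No literal "?" in the path, so a parser sees an empty query string.
print(repr(urlsplit(raw).query))

# One decoding pass turns %3F into "?" and %25 into "%".
once = unquote(raw)
print(once)

# Only now can the parameters be read; parse_qs decodes the remaining layer.
params = parse_qs(urlsplit(once).query)
print(params)
```

So whatever produced these requests encoded the whole URL one time too many, or something in between decoded it one time too few.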
I had a problem decoding URLs too, with JBoss 13.
I added the last line below to the JBoss configuration and it works now.
/subsystem=undertow/servlet-container=default:write-attribute(name=default-encoding,value="ISO-8859-15")
/subsystem=undertow/server=default-server/http-listener=default:write-attribute(name=url-charset,value="ISO-8859-15")
The docs are here if you need more: https://wildscribe.github.io/WildFly/13.0/subsystem/undertow/server/http-listener/index.html