nginx - proxy facebook/bots to a different server without changing canonical URL - vue.js

TLDR;
How can I make it so that all scraper/bot requests reaching my frontend https://frontend.example.test/any/path/here are fed the data from https://backend.example.test/prerender/any/path/here without changing the canonical URL?
I have a complex situation where I have a Vue app that pulls data from a PHP API to render data. These are hosted in China, so niceties like Netlify prerendering and prerender.io are not an option.
Initially I tried:
if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
rewrite ^/(.*)$ https://backend.example.test/prerender/$1 redirect;
}
which worked, but Facebook used backend.example.test as the canonical URL instead of frontend.example.test.
Setting the og:url to the frontend app caused problems due to a redirect loop. I tried then setting the og:url to the frontend with a query param that skipped the nginx forward, but for some reason this wasn't working properly on the live server and I imagine facebook would still end up pulling the data from the final url anyhow.
Thus I imagine the only solution is to use proxy_pass but it is not permitted with a URI inside an if statement (and I have read the if is evil article).
I feel like all I need is something like a functioning version of:
location / {
if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
proxy_pass https://backend.example.test/prerender;
}
...
}
(I am of course aware of the contradiction of having to have Facebook sharing work in China, but the client is requesting this for their international users as well).

Here is the solution for your problem:
https://www.claudiokuenzler.com/blog/934/nginx-proper-way-different-upstream-user-agent-based-reverse-proxying-rule-without-if
I'm copying here the main parts in case the link breaks:
Create a dynamic target upstream with the map directive:
map "$http_user_agent" $targetupstream {
default http://127.0.0.1:8080;
"~^mybot" http://127.0.0.1:8090;
}
Here "~^mybot" is a regular expression; if the user agent matches that expression, Nginx will use that upstream server.
If the user agent does not match any entry, Nginx will use the "default" entry (saving http://127.0.0.1:8080 as the $targetupstream variable).
Then you just have to use that upstream in a proxy_pass setting:
location / {
include /etc/nginx/proxy.conf;
proxy_set_header X-Forwarded-Proto https;
proxy_pass $targetupstream;
}
Now you could use one upstream pointing to localhost on a port where nginx serves the static files (client-side rendering only) and another port for the server renderer.
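Applied to the setup in the question, the same map idea could look roughly like the sketch below. It is only a sketch under assumptions: the $is_bot variable, the /__prerender/ location name and the /var/www/frontend root are made up for illustration, and the user-agent list is shortened. Bots are rewritten at server level into an internal location that proxies to the backend's /prerender/ path, so the response is served under the frontend URL and no redirect is sent to the scraper.

```nginx
# Goes in the http {} context. $is_bot is a hypothetical variable name.
map $http_user_agent $is_bot {
    default 0;
    # Shortened list; extend with the user agents from the question.
    "~*googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot|slackbot|whatsapp" 1;
}

server {
    listen 80;
    server_name frontend.example.test;

    # A server-level "if" that only does a rewrite is one of the safe uses of "if".
    if ($is_bot) {
        rewrite ^/(.*)$ /__prerender/$1 last;
    }

    # Regular visitors get the Vue app (root path is an assumption).
    location / {
        root /var/www/frontend;
        try_files $uri $uri/ /index.html;
    }

    # Hypothetical internal location; not reachable directly from outside.
    location /__prerender/ {
        internal;
        # Send SNI to the HTTPS backend.
        proxy_ssl_server_name on;
        # The /__prerender/ prefix is replaced by /prerender/ on the backend.
        proxy_pass https://backend.example.test/prerender/;
    }
}
```

With this in place the og:url of the prerendered page can point at the frontend URL: when Facebook requests it again, it is simply proxied once more rather than redirected.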

Related

How to fix incorrect nginx s3 reverse proxy paths?

I'm working on an nginx s3 reverse proxy container image to proxy frontend files (Angular apps) from s3 behind an Application Load Balancer. The frontend files are located in the specific folder of the given app name in the s3 bucket. These are angular apps which are built using standard angular commands. The dist contents are uploaded to s3 and then the ALB route paths, along with the nginx locations map to those app folders in s3. For example, here is my nginx conf file:
server {
listen 80;
listen 443 ssl;
ssl_certificate /etc/ssl/nginx-server.crt;
ssl_certificate_key /etc/ssl/nginx-server.key;
server_name timemachine.com;
sendfile on;
default_type application/octet-stream;
resolver 8.8.8.8;
server_tokens off;
location ~ ^/app1/(.*) {
proxy_http_version 1.1;
proxy_buffering off;
proxy_ignore_headers "Set-Cookie";
proxy_hide_header x-amz-id-2;
proxy_hide_header x-amz-request-id;
proxy_hide_header x-amz-meta-s3cmd-attrs;
proxy_hide_header Set-Cookie;
proxy_set_header Authorization "";
proxy_intercept_errors on;
rewrite ^/app1/?$ /app1/index.html;
proxy_pass https://<s3 bucket name here>;
break;
}
}
So there is a corresponding bucket folder /app1 in s3 which has the dist contents and is serving up the index.html. And on the ALB, there are two route paths. The first is /app1 which redirects to https:{port}//app1/ and then the second route path /app1/* which just forwards to the nginx reverse proxy container deployed via ECS Fargate.
This is not using cloudfront. The bucket is proxied internally on https and specific permissions are set on the bucket to be accessible w/in the given VPC.
The Angular apps have specific modules, but the issue is that since I'm not saving any of this content in the container, I can't just do a try_files or set an index to make this work: all of this is proxied from s3 and the content is accessed differently.
I can access the app with the given proxy configuration above, but for other paths, say when I navigate to a part of the app such as /app1/account and then do a refresh, the page throws an access denied on the bucket and I just get the standard XML page in the browser.
How do I get this to work with all of those other relative paths without having to add each of those paths to nginx or the ALB routes? In other words, I dont want to have to add
location /app1/account {
}
and so on, or something like that. Yes, I'm sort of new to nginx, so I'm still figuring things out.
I was expecting the above proxy to work with all paths on /app1, but I'm unsure what other route paths need to be added to the ALB, whether the regex is off, or what else needs to be added to the nginx conf file.
All that to say, when I enter this
https://timemachine.com/app1
or this,
https://timemachine.com/app1/
both work and just rewrite to the index.html which is good.
After this, when I click on another icon in the UI that directs to another path on /app1/, I get directed to the page correctly at...
https://timemachine.com/app1/news
but on a refresh of this path, instead of showing the data I saw when I navigated there through the UI, the URL stays at https://timemachine.com/app1/news and the page shows the s3 bucket access denied response for that route (XML).
The goal is just to be able to reload on the pages I can already access without the UI blowing up and defaulting to the access denied message. So I would like to be able to just enter https://timemachine.com/app1/news, which will display the content, then do a refresh and see the content again.
There are various modules within the Angular apps, so these are relative paths, which may be part of the problem.
NOTE: All files, aside from assets folder, are in the base app1 bucket folder. So https://<s3_bucket_name>/app1 (with app1 being the folder).
Angular's docs indicate to use the Frontend Controller pattern for static files like so:
Use try_files, as described in Front Controller Pattern Web Apps,
modified to serve index.html:
try_files $uri $uri/ /index.html;
Obviously, that won't work here (since the files aren't local to nginx) so my understanding is we're looking for equivalent logic to that for when the files are hosted elsewhere.
Route not-assets to index.html
All assets are in the /assets/ folder - so the simplest solution is to look for anything starting with not-that and proxy those requests to the html file for the response:
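server {
# Prefix location: nginx rejects proxy_pass with a URI part inside a regex location.
location /app1/ {
# Anything that is not under assets/ is answered with the app's index.html.
rewrite ^/app1/(?!assets/) /bucket/app1/index.html break;
# Assets keep their path below the bucket folder.
rewrite ^/app1/(.*)$ /bucket/app1/$1 break;
# No URI part here, so the rewritten URI is sent upstream as is.
proxy_pass https://domain;
}
}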
That regex means that:
/app1/assets/some.css gets proxied to https://domain/bucket/app1/assets/some.css
/app1/ gets proxied to https://domain/bucket/app1/index.html
/app1/something/else gets proxied to https://domain/bucket/app1/index.html
etc.
Do note that this is going to make your app respond HTTP 200 OK with html to almost any url - which may be confusing.
If there are any problems setting this up, enable the nginx debug log to see to what url requests are being proxied, and determine the difference from what's desired.

Nginx - proxy_pass to google storage bucket static page VueJS sub paths cause 404 error, VueJS router not kicking in

I'm hosting a VueJS app in a Google Cloud Storage bucket. The app works only when using the domain name without a subpath (www.domain.com). When using a URL like www.domain.com/sub/path I'm getting a 404 error, as it seems that NGINX is looking for this path in the bucket instead of letting the VueJS router take over.
I tried to follow older thread but in my case would not help.
Any ideas how to fix this?
location = / {
proxy_pass https://gcs/mygoogle-cloud-bucket/main.html;
proxy_set_header Host storage.googleapis.com;
}
location / {
rewrite /(.*) /$1 break;
proxy_pass https://gcs/mygoogle-cloud-bucket/$1$is_args$args;
proxy_redirect off;
index main.html;
proxy_set_header Host storage.googleapis.com;
}
It seems like what you need to do is to create a Static Website using Cloud Storage and VueJS.
With this being the case, there are a few things that need to be clarified:
Cloud Storage on its own doesn't serve custom domains over HTTPS, so you need a Load Balancer.
Make sure the objects in your bucket are public.
Build the Vue project With Relative Path.
It is also recommended to set the special pages, but this is not necessary.
Set up your load balancer and the SSL certificate as it is mentioned here.
Configure routing rules.
Make sure you have connected your custom domain to your load balancer
This should get you going with your site. If you would like to check a working example, you can take a look at this one.
Your code should look something like:
location / {
rewrite /$ $uri$index_name;
proxy_set_header Host storage.googleapis.com;
proxy_pass https://gs/$bucket_name$uri;
proxy_http_version 1.1;
proxy_set_header Connection "";
}
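If you would rather stay with plain nginx in front of the bucket, one commonly used pattern is to let nginx intercept the bucket's 404 and answer with the app's entry page instead, so that the VueJS router can take over in the browser. This is a minimal sketch, assuming the "gcs" upstream, the bucket name and main.html from the question; it has not been tested against a real bucket, and a private bucket may answer 403 rather than 404.

```nginx
location / {
    proxy_set_header Host storage.googleapis.com;
    # The leading "/" of the request is replaced by "/mygoogle-cloud-bucket/".
    proxy_pass https://gcs/mygoogle-cloud-bucket/;
    # Let nginx handle upstream error responses instead of relaying them.
    proxy_intercept_errors on;
    # Unknown paths fall back to the SPA entry point with a 200 status.
    error_page 404 =200 /main.html;
}
```

Existing objects such as JS and CSS files are still proxied as they are; only paths that do not exist in the bucket, for example /sub/path, are answered with main.html.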

How to rewrite nginx path from "http://domain/api/path" to "http://domain"

I'm configuring nginx to redirect all incoming URIs which start with /api to the backend, that is, port 1000. The issue is that the frontend path /api/path/ becomes the backend path /, but I need it to stay the same. For example, when I navigate to the backend http://domain/api/path I have to actually navigate to http://domain/api/path/api/path. Is there a possibility to rewrite the expression in order to keep the same path? The actual configuration looks like this:
location /api/ui {
include proxy_params;
proxy_pass http://localhost:1000/;
}
Thank you!
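A possible fix, sketched under the assumption that the backend on port 1000 expects the full /api/... path: when proxy_pass is given without a URI part (nothing after the port, not even a trailing slash), nginx forwards the request URI unchanged instead of replacing the matched location prefix. The /api/ prefix below is taken from the description rather than from the posted location /api/ui.

```nginx
location /api/ {
    include proxy_params;
    # No URI part after the port, so /api/path is forwarded as /api/path.
    proxy_pass http://localhost:1000;
}
```

With the trailing slash from the original configuration, the part of the URI that matches the location is replaced by "/", which is why the path had to be repeated.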

Nginx reverse proxy didnt load site correctly

I have an nginx configuration which listens to any subdomain *.mydomain.com,
and I want to use subdomain as variable, to proxy request to other site.
Here is my nginx configuration
server {
listen 80;
server_name "~^(?<subdomain>.*).mydomain.com";
location / {
resolver 1.1.1.1 1.0.0.1 ipv6=off;
proxy_pass http://hosting.mydomain.com/$subdomain/;
proxy_redirect off;
access_log /var/log/nginx/proxy.log;
}
}
When I request the site directly, it loads perfectly.
The site is placed on AWS S3, and the bucket's static website address is CNAMEd to mydomain.
However, when I try to access it via user1.mydomain.com, the page doesn't load images and CSS.
This is the same site
And in browser network panel shows
Difference between direct and proxy access
This issue arises because I have many sites stored in the S3 bucket, located in different folders (the folder name is used as the subdomain).
And I want to use a single domain to access all of them via subdomains.
Thanks in advance
You forgot to pass the URI to proxy_pass, so you're serving user1/index.html for every request, including the JS and CSS requests. That's why all of the responses are the same size (2kb, the size of user1/index.html). It's also why you're getting Uncaught SyntaxError: Unexpected token < in the first line of Enterprise_skeleton.bundle.js: the response is an HTML document that starts with <!doctype html> instead of the actual JS bundle.
Change
location / {
proxy_pass http://hosting.mydomain.com/$subdomain/;
}
to
location / {
proxy_pass http://hosting.mydomain.com/$subdomain$uri;
}

redirect the entire php url with params (apache/nginx)

1) How to make Apache to redirect the whole url with parameters, and make it visible to a client, for example:
when client comes to :
https://domain1.com/app/index.php?device_id=WeWeWe&ordna_ver=5.0&num=+1234567890
it redirects him to:
https://domain2.com/app/index.php?device_id=WeWeWe&ordna_ver=5.0&num=+1234567890
2) Also, how to make the same redirect but NOT visible to a client (they still see the URL from domain1.com while the content comes from domain2.com)?
3) And the third, how to make the same two things (redirects) with nginx ?
Thank you very much for your help.
In nginx, visible to the client:
server_name domain1.com;
return https://domain2.com$request_uri;
In nginx, hiding the redirect from being visible to the client:
server_name domain1.com;
location / {
proxy_pass https://domain2.com;
}
You might also want to use the optional module http://nginx.org/docs/http/ngx_http_sub_module.html#sub_filter (requires recompilation of nginx), if you want to make sure to replace any mention of domain2.com from the proxied web-page with domain1.com.
sub_filter "https://domain2.com" "https://domain1.com";
sub_filter_once off;