What I am trying to achieve is to have Apache's mod_proxy_balancer use a Memcache store to check whether a request has already been made.
Basically:
Streaming media request comes in.
Check Memcache to see whether this streaming media has already been served.
If so, check whether that streaming media server can handle another request.
If it can, send the request to that streaming media server.
If not, send the request to the next streaming media server in line.
Store key:value pair in Memcache.
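The steps above could be sketched roughly as follows. This is only an illustration in Python, not an Apache module: the class name, the per-server `capacity` limit, and the plain dict standing in for the Memcache store are all assumptions made for the example.

```python
from typing import Dict, List, Optional


class ContentAwareBalancer:
    """Toy content-aware balancer; a dict stands in for Memcache (assumption)."""

    def __init__(self, servers: List[str], capacity: int) -> None:
        self.servers = servers
        self.capacity = capacity  # hypothetical max concurrent requests per server
        self.active: Dict[str, int] = {s: 0 for s in servers}
        self.cache: Dict[str, str] = {}  # media key -> server that last served it
        self._next = 0  # round-robin position

    def _can_handle(self, server: str) -> bool:
        return self.active[server] < self.capacity

    def _next_in_line(self) -> Optional[str]:
        # Try each server once, starting at the round-robin position.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self._can_handle(server):
                return server
        return None

    def route(self, media_key: str) -> Optional[str]:
        # 1. Has this media already been served, and can that server take more?
        server = self.cache.get(media_key)
        if server is None or not self._can_handle(server):
            # 2. Otherwise fall back to the next server in line.
            server = self._next_in_line()
            if server is None:
                return None  # every server is at capacity
        # 3. Store the key:value pair and count the request.
        self.cache[media_key] = server
        self.active[server] += 1
        return server

    def finish(self, server: str) -> None:
        # Call when a stream ends so the server frees a slot.
        self.active[server] -= 1
```

A real deployment would replace the dict with a Memcache client and would need a reliable way to learn when a stream has ended.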
My questions are:
Does mod_proxy_balancer already do this in some way?
Is there any way to make Apache a content-aware load balancer?
Any other suggestions would be greatly appreciated too, other software, other approach, etc.
Cheers.
Looking at 'mod_proxy_balancer.c'; one could, as suggested in the comments in the file, add additional lbmethods. Something along the lines of "bymemcached_t" or "bymemcached_r" where the t and r endings denote the "bytraffic" and "byrequests" methods respectively. We would do our pseudo code above and if not found proceed to the other methods and save the result in the memcached store.
In my research I came across HAProxy which, according to its documentation, does exactly what I want with the 'uri' balance algorithm, just without Memcached, which is fine for my purposes.
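To illustrate why URI-based balancing gives the same effect without a cache store, here is a minimal Python sketch of the idea: hash the part of the URI before the question mark and use the result to pick a server. The function name and the choice of SHA-256 are assumptions for the example; HAProxy uses its own hash function and also takes server weights into account.

```python
import hashlib
from typing import List


def pick_server(uri: str, servers: List[str]) -> str:
    """Map a URI to a server so the same content always lands on the same host."""
    # Only the part before the query string is hashed in this sketch.
    path = uri.split("?", 1)[0]
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Because the choice depends only on the URI and the server list, no shared state is needed, but adding or removing a server changes the mapping for many URIs.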
Related
I am currently hosting the contents of a site with ProviderA. I have a domain registered with ProviderB. I want users to access the contents (www.providerA.com/sub/content) by visiting www.providerB.com. A domain forward is easy enough and works as intended, however, unless I embed the site in a frame (which is a big no-no), the actual URL reads www.providerA.com/sub/content despite the user inputting www.providerB.com.
I really need a solution for this. A domain masking without the use of a frame. I'm sure this has been done before. An .htaccess domain rewrite?
Your help would be hugely appreciated! I'm going nuts trying to find a solution.
For Apache
Usual way: set up mod_proxy. Apache on providerB becomes a client of providerA's Apache. It gets the content and sends it back to the client.
But it looks like you only have .htaccess available. So no proxy: you need full configuration access for that.
So you cannot, see: How to set up proxy in .htaccess
If you have PHP on providerB
Set up a proxy written in PHP. All requests to providerB are intercepted by that PHP proxy. It gets the content from providerA and sends it back. So it does the same thing as the Apache module. However, depending on the quality of the implementation, it might fail on some requests, types, sizes, timeouts, ...
Search for "php proxy" on the web; you will see a couple available on GitHub and elsewhere. YMMV as to how difficult it is to set up, and how reliable it is.
No PHP but some other server side language
Obviously that could be done in another language; I only checked PHP because that is what I use the most.
The best solution would be to transfer the content to providerB :-)
I am using IBM HTTP Server 6.1 / Apache 2.0.47. I would like to pull a specific piece of data out of all requests coming through the HTTP server and if it exists log the found data along with the target URL. It needs to be as efficient as possible.
Is a filter appropriate or a handler?
Does a filter/handler exist that I can configure and use as is or do I need to write something? How do I configure, or write this?
Thanks.
You could use the mod_security Apache module, which has a good audit logging tool, SecAuditLog (it logs all headers), that you can trigger by HTTP status. You will also find fine-grained filters there that may fit your needs.
And do not hesitate to ask the Server Fault gurus about that.
Is there a way to detect if a site is on a Content Delivery Network and, if so, can we tell which service they are using?
A method that is achievable from the command line is using the 'host' command, with the -a flag set to see the DNS record e.g.
host -a www.visitbritain.com
Returns:
www.visitbritain.com. 0 IN CNAME d18sjq5nyxcof4.cloudfront.net.
Here you can see that the CNAME entry tells us that the site is using CloudFront as the CDN.
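If you want to automate reading the CNAME target, a small Python sketch like the one below could map it to a provider. The suffix table is an illustrative assumption, deliberately short and far from complete, and the function name is made up for the example.

```python
from typing import Optional

# Illustrative, incomplete table of CNAME suffixes (assumption for this sketch).
CDN_SUFFIXES = {
    "cloudfront.net": "Amazon CloudFront",
    "edgekey.net": "Akamai",
    "akamaiedge.net": "Akamai",
    "fastly.net": "Fastly",
}


def cdn_from_cname(cname_target: str) -> Optional[str]:
    """Return a CDN name if the CNAME target ends in a known suffix, else None."""
    host = cname_target.rstrip(".").lower()  # drop the trailing dot of an FQDN
    for suffix, name in CDN_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return name
    return None


print(cdn_from_cname("d18sjq5nyxcof4.cloudfront.net."))
```

A result of None only means the target is not in the table, not that no CDN is in use.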
Just take a look at the urls of the images (and other media) of the site.
Do a reverse lookup on the IPs of the hostnames you see there and you will see who owns them.
I built this little tool to identify the CDN used by a site or a domain, feel free to try it.
The URL: http://www.whatsmycdn.com/
You might also be able to tell from the HTTP headers of the media if the URL doesn't give it away. For example, media served by SimpleCDN has Server: SimpleCDN 5.6a4 in its headers.
CDN Planet now has its CDN Finder tool on GitHub:
http://www.cdnplanet.com/blog/better-cdn-finder/ The tool installs on the command line and allows you to feed in host names and check whether they use a CDN.
If the website uses the GCP CDN, you can simply check it using curl:
curl -I <https://site url>
In the response you will find headers like the following:
x-goog-metageneration: 2
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 17393
x-goog-meta-object-id: 11602
x-goog-meta-source-id: 013dea516b21eedfd422a05b96e2c3e4
x-goog-meta-file-hash: cf3690283997e18819b224c6c094f26c
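To check for these headers in a script rather than by eye, a minimal Python sketch could filter the text printed by `curl -I`. The function name is an assumption, and the sketch only follows the heuristic described above: it reports `x-goog-` headers and does not prove which Google service sits in front of the site.

```python
from typing import Dict


def goog_headers(raw_response: str) -> Dict[str, str]:
    """Collect the x-goog-* headers from raw `curl -I` output."""
    found: Dict[str, str] = {}
    for line in raw_response.splitlines():
        name, sep, value = line.partition(":")
        # The status line has no "name: value" shape, so sep may be empty.
        if sep and name.strip().lower().startswith("x-goog-"):
            found[name.strip().lower()] = value.strip()
    return found
```

An empty result means none of these headers were present in the response that was passed in.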
Yes, you can find out with:
host -a www.website.com
Apart from the excellent answers already posted here, which include some direct methods that may or may not work for every website out there, there is also an indirect way to see if a CDN is in place. This is especially useful if it's your own website and you want to know whether you are getting what you are paying for!
The promise of a CDN is that connections from your users are terminated closer to them, so that they incur less TCP/TLS connection establishment overhead, and that static content is cached closer to them, so that it loads faster and puts less strain on your origin servers.
To verify this, you can measure site load times from across the globe and see whether all users get similar load times. No, you don't have to get a machine everywhere in the world to do that! Someone has already done that for you.
Head to https://prober.tech/ and enter the URL you wish to test for load times.
Because this site itself is on Cloudflare's CDN, you can put that link itself in the test box and use it as a baseline!
More information on using the tool can be found here
With Amazon S3, can I stop a query-string-authorized download that is in progress?
Are there other file download services that provide such a feature?
I'm not aware of a built-in way to do this. If I understand your goal, you want to potentially stop an HTTP response mid-stream based on some custom rules you have. Is that right?
If so, perhaps you could write a very thin proxy to S3 that encapsulates this logic. If you ran the proxy on EC2 you wouldn't incur any additional bandwidth fees.
The downside is that you would have to manage scaling the proxy (i.e. add more EC2 nodes based on traffic), so depending on your scaling requirements, this could require a bit of work. But the proxy script itself would probably be fairly trivial. Something like:
Make streaming HTTP request to S3 for object
for each x byte chunk in response from S3:
    Check auth condition. Continue if valid. Break if not.
    Send chunk to caller
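The loop above could look roughly like this in Python. It is only a sketch of the relay step: `chunks` stands for whatever iterable of byte chunks your HTTP client yields for the S3 response, and `still_authorized` is a hypothetical callback holding your own rules; neither is part of any S3 API.

```python
from typing import Callable, Iterable, Iterator


def relay(chunks: Iterable[bytes],
          still_authorized: Callable[[], bool]) -> Iterator[bytes]:
    """Yield chunks to the caller until the auth condition stops holding."""
    for chunk in chunks:
        # Re-check the auth condition before every chunk is sent on.
        if not still_authorized():
            break  # stop the download mid-stream
        yield chunk
```

The smaller the chunk size, the sooner a revoked authorization takes effect, at the cost of more checks per download.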
I'm not aware of anyone that allows this. In general, the authentication is only checked once, when you begin downloading, but not thereafter.
Can you describe what you're trying to do more broadly?
Does anyone know of any reverse proxy solutions that allow the content/data of an HTTP response to be directly modified before being relayed to the requesting client?
As an example:
The proxy relays a client request for a PDF document to another server, receives the response, adds a watermark to the pages of the PDF, and returns the watermarked PDF to the client.
Regards,
Mike
Apache has mod_proxy and mod_proxy_html, which is used to rewrite links, headers, etc. I've only ever seen HTML or XML filters, but you should be able to write your own binary one for your PDF needs. The possible difficulty I could see is that Apache treats webpages as a stream, rather than a file. I'm not sure how to watermark a PDF doc, but if you need access to the entire file to do it, it might get complicated quickly.
Note that it would seem far easier to me to do the watermarking on the server, where you have access to the file, rather than a proxy. If server load is a concern, either a batch process, or a separate server could be an alternative solution.
I found a description of Deliverance over on the python tags, and it may be useful for what you're looking for. I have no experience with it myself, so grain of salt and all that.
http://www.openplans.org/projects/deliverance/introduction
I've had success with Pound.
I think I might go down the Squid/ICAP route.
This is for an enterprise-level system. Does anyone have any experience with either of these in this context?
http://wiki.squid-cache.org/Features/ICAP