I'm building a queue-based system to scale user-uploaded images.
Users will upload images which will get transferred to a storage server. The web server will then add a message to a queue which will be listened to by image scaling workers that will retrieve the image files, scale them and add them to the storage server.
I was planning on using celery over rabbitmq for this, but my web tier will be running PHP so for convenience I'd rather find a PHP way of doing this.
What suggestions do people have?
If it came to it (although I don't want to complicate the web tier by having python and PHP) how easy would it be to control celery from PHP, and how would I do that? Some kind of RPC protocol (like thrift?) or something simpler since celery needn't be on a different server?
I made the Celery-PHP library and it works smoothly for a few months now.
I'll just use thrift to allow me to invoke python from php, and use python with celery.
Related
I am NOT talking about Cloudera or Yarn system level logs. I am talking about applications running on Cloudera/Yarn infrastructure.
We have tens of Java and Python applications running on our Cloudera Infra, and all of them generate application logs. I am looking for the best way to monitor these logs for any errors and warnings. If it is a pure stand alone Java application, traditionally we can use one of these log scraper tools that send emails based on an expression matching (to detect error/warning/any other special situation). I am looking for something similar, that can monitor our application logs and emails us in real time for better production application support.
If thinking about this like a traditional application log monitoring is not the right way, then I am happy to know if there are any better industry standard approaches. Thanks!
I guess the ElasticStack (https://www.elastic.co/de/) could be one approach to solve this. You could use FileBeats to send your application logs to Logstash which forwards it to ElasticSearch. You could then create a Watcher in Kibana which sends i.e. Emails based on some triggering condition (we use a webhook to send notifications into a MS Teams channel).
This solution should work at least in near-realtime (~1-2 minutes delay, but this also depends on your watcher configuration).
I'm setting up the infrastructure for a Ray project and would like to use an external redis (i.e one not started by ray --head. However that currently does not seem possible, giving me:
If --head is passed in, a Redis server will be started, so a Redis address should not be provided.
Has anyone managed to use an external Redis not managed by Ray?
Regards,
Niklas
There's an ongoing project to improve this limitation. https://github.com/ray-project/ray/pull/6763
The idea is to make GCS as a service that you can use various external backends.
I am building a project which requires constant connection with the server.
There are two major ways to achieve this:
Ajax pull
Ajax push
I have to decide between pinging a server (expensive) and maintaining keep-alive connections (firewalls block that.)
I was thinking about the live video streams. They are not keep-alive connections, nor frequent pings.
Is it possible, to send data, like JSON strings through rtmp?
It would be theoretically possible to implement RTMP's AMF3 and AMF0 Message types to carry the data. RTMP [Wikipedia]
The problem is that using a protocol typically used for streaming video might get your connection blocked or throttled by some service providers that limit such protocols to conserve bandwidth (and prevent employees from watching internet videos at work).
Maybe this article may be of some use to you. It explains how to set up an RTMP server with nginx.
From the article:
nginx is an extremely lightweight web server, but someone wrote a RTMP module for it, so it can host RTMP streams too. However, to add the RTMP module, we have to compile nginx from source rather than use the apt package. Don't worry, it's really easy. Just follow these instructions. :)
One comment on this article by a user named 'stefaniuk' linked to a github respitory for this that I think you should look in to. Check it out here.
I am working on a project which has following requirements:
Perform sticky based load balancing(based on SOAP session ID) onto multiple backend servers.
Possibility to plugin my own custom based load balancer.
Easy to write and deploy.
A central configuration file(Possibly an XML), to take care of all the backend servers.
Easy extraction of a node from this configuration file(Possibly with xpath).
I tried working with camel for a while but, wasn't able to do perform certain task with it.
So thought of giving a try to Akka.
Will akka be possibly able to satisfy the above requirements?
If so is there a load balancing example in akka or proxy example?
Would really appreciate some feedbacks.
You can do everything you've described with Akka.
You don't mention what language you're working with, Scala or Java. I've included links to the Scala documentation.
Before you do anything with Akka you HAVE TO read the documentation and understand how Akka works.
http://doc.akka.io/docs/akka/2.0.3/
Doing so, you'll find Akka is perfect for the project you've described with some minor caveats.
Once you read the documentation the following answers should make a lot of sense.
Perform sticky based load balancing(based on SOAP session ID) onto multiple backend servers.
Load balancing is already part of the framework (it's called Routing in Akka http://doc.akka.io/docs/akka/2.0.3/scala/routing.html) and Remoting (http://doc.akka.io/docs/akka/2.0.3/scala/remoting.html) will take care of the backend servers. You can easily combine the two.
To my knowledge the idea of sticky load balancing is not a part of Akka but I can envision this being accomplished with a Map using the session ID as the key and the Actor name (or path) as the value. A quick actorFor will take care of the rest. Not well thought out but should give you a good idea of where to start.
Possibility to plugin my own custom based load balancer.
Refer to the Routing documentation.
Easy to write and deploy.
This depends on your aptitude and effort but after you read certain parts of the documentation you should be build a proof of concept in a couple of hours.
Deployment can be a bit frustrating mostly because the documentation isn't really great with respect to deploying Akka networks with remote components. However, there are enough examples on the web that you can figure out how to get it done...eventually. Once you do it once it's no big deal.
A central configuration file(Possibly an XML), to take care of all the backend servers.
Akka uses Typesafe Config (https://github.com/typesafehub/config) which is a lot easier to work with than XML (but I hate XML so take that with a grain of salt). As far as a central configuration, I'm not sure what you're trying to accomplish but it sounds like something that can be solved using remote actor creation. Again, see the Remoting documentation.
Easy extraction of a node from this configuration file(Possibly with xpath).
Akka provides a lookup method .actorFor. There's no need to go to the configuration file once the system is up and running.
If so is there a load balancing example in akka or proxy example?
Google is your friend.
How is Apache in respect to handling the c10k problem under normal conditions ?
Say while running very small scripts with little data, or do I need to scale out if I use Apache?
In the background heavy lifting is done by a few servers running specialized software that processes the requests but I'd like to use Apache as a front. Is this a viable plan?
I consider Apache to be more of an origin server - running something like mod_php or mod_perl to generate the content and being smart about routing to the appropriate system.
If you are getting thousands of concurrent hits to the front of your site, with a mix of types of data (static and dynamic) being returned, you may find it useful to put a more optimised system in front of it though.
The classic post-optimisation problem with Apache isn't generating the dynamic content (or at least, that can be optimised for early in the process), but simply waiting for a slow client to be able to receive the bytes that are being sent. It can therefore be a significant advantage to put a reverse proxy, in the form of Squid or Nginx, in front of the servers to take over the 'spoon-feeding' of the slow network clients, while allowing the content production to happen at full speed, and at local network speeds - 100Mb/sec or even gigabit speeds - if it even has to traverse a network at all.
I'm assuming you've probably seen this data, but if not, it might give you some idea.
Guys, imagine that you are running web server with 10K connections (simultaneous). How could it be?
You've got many many connections per second
Dynamic content
Are you sure that your CPU can handle that many PHP sessions for example? I guess no, so why are you thinking about C10K problem? :D
Static content - small files
And still soo many connections? On single server? Probably you've got problems with networking/throughput too or you are future competitor of Google. Use lighttpd which addresses C10K problem and is stable - fly light. Using Apache for only static files for large sites is obvious.
Your clients are downloading large files for a large time - static content
ISO images, archives etc
If you are doing it via web server - FTP may be more appropriate.
Video streaming
Use lighttpd or specialized software. And still... What about other resources?
I am using Linux Virtual Server as load balancer in front of apache servers (with specific patches for LVS-NAT) and I am happy :) This string is an answer you want to hear.