Updating Redis Click-To-Deploy Configuration on Compute Engine

I've deployed a single micro-instance Redis on Compute Engine using the (very convenient) click-to-deploy feature.
I would now like to update this configuration to have a couple of instances, so that I can benchmark how this increases performance.
Is it possible to modify the config while it's running?
The other option would be to add a whole new Redis deployment, bleed traffic onto it over time and eventually shut down the old one. Not only does this sound like a pain in the butt, but I also can't see any way in the web UI to click-to-deploy multiple clusters.
I've got my learner's license with all this, so I would also appreciate any general 'good-to-knows'.

I'm on the Google Cloud team working on this feature and wanted to chime in. Sorry no one replied to this for so long.
We are working on some of the features you describe that would surely make the service more useful and powerful. Stay tuned on that.
I admit that, to date, there really is not a good solution for modifying an existing deployment, short of launching a new cluster and migrating your data over / redirecting reads and writes to the new cluster. This is a limitation we are working to fix.
As a workaround for creating two deployments using Click to Deploy with Redis, you could create a separate project.
Also, if you wanted to migrate to your own template using the Deployment Manager API (https://cloud.google.com/deployment-manager/overview), keep in mind that Deployment Manager does not have this limitation: you can create multiple deployments from the same template in the same project.
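For illustration, here is a minimal sketch of what that looks like against the Deployment Manager v2 API from Python (google-api-python-client). The project ID, config file and deployment names below are placeholders; the same config is simply inserted twice under different names.

    # Hypothetical sketch: create several deployments from one config
    # in a single project via the Deployment Manager v2 API.
    # Names and file paths are placeholders; error handling is omitted.
    from googleapiclient import discovery

    PROJECT = 'my-project'          # placeholder project ID
    CONFIG_FILE = 'redis.yaml'      # placeholder Deployment Manager config

    dm = discovery.build('deploymentmanager', 'v2')

    with open(CONFIG_FILE) as f:
        config_content = f.read()

    for name in ('redis-bench-a', 'redis-bench-b'):   # two deployments, same template
        body = {
            'name': name,
            'target': {'config': {'content': config_content}},
        }
        op = dm.deployments().insert(project=PROJECT, body=body).execute()
        print(name, op['status'])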
Chris

Related

Upcoming deprecation of legacy GCE and GKE metadata server endpoints on Legacy Boxes

I have two legacy servers in GCE, both of which have been flagged as using the deprecated metadata server endpoints. At this moment in time, they have hundreds of GBs of MySQL and MongoDB data between them, and risking an upgrade on these boxes that has an adverse effect is not an option.
We are currently in the process of migrating away from the data stored here, but for now, we need to keep them running.
Is anyone aware of any implications of either
a) doing nothing, or
b) just setting the disable-legacy-endpoints metadata flag to true?
i.e. will these instances stop working altogether if we leave them as they currently are?
After some more digging into what was actually using the metadata API to start with, we found that the calls were being sent by stackdriver_agent, which was installed an extremely long time ago while it was free and just never removed.
Stopping this agent removes all of the legacy calls made by these two servers.
If you are considering disabling the endpoints with the disable-legacy-endpoints metadata flag, make sure to test it in a contained environment first, i.e. a new VM created from a snapshot of the affected instance, before applying it to production services.
For help identifying the instances making the calls, refer to this article
For help identifying the processes within the instances, refer to this article
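For reference, setting the flag on a single instance can also be scripted against the Compute Engine API. The sketch below uses the Python API client with placeholder project/zone/instance names, and should of course be tried on a throwaway VM first, as suggested above.

    # Hedged sketch: set disable-legacy-endpoints=true on one instance
    # via the Compute Engine API. Placeholders throughout.
    from googleapiclient import discovery

    PROJECT, ZONE, INSTANCE = 'my-project', 'us-central1-a', 'legacy-box-1'

    compute = discovery.build('compute', 'v1')
    inst = compute.instances().get(project=PROJECT, zone=ZONE, instance=INSTANCE).execute()

    metadata = inst['metadata']                      # keep the current fingerprint
    items = [i for i in metadata.get('items', [])
             if i['key'] != 'disable-legacy-endpoints']
    items.append({'key': 'disable-legacy-endpoints', 'value': 'true'})

    compute.instances().setMetadata(
        project=PROJECT, zone=ZONE, instance=INSTANCE,
        body={'fingerprint': metadata['fingerprint'], 'items': items},
    ).execute()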

Openstack - hardware requirements

I've been needing a new VM host for some time now, and from working with/on AWS at work, "The Cloud" seems to be a good idea.
I've done some math, and no matter how I count, it's going to be cheaper to do it myself than colo or something else. Plus, I really like lots of blinking lights :D
A year or so ago I heard about OpenStack, and I've been looking at it cursorily since then. It seems big and complex (and scary!), and some friends who have been trying to do it at work for a year and still haven't quite finished/succeeded indicate that it is what it seems :)
However, I like tormenting myself, so I've decided I'm going to give it a try. It does provide all the functionality, and then some, that I need. Theoretically, I could go with Vagrant, but that's not quite half-way to what I want/need.
So, I've been looking at https://en.wikipedia.org/wiki/OpenStack#Components and from that came to the following conclusion:
Required: (Nova, Glance, Horizon, Cinder)
These seem to be the "core" services. I need all of them.
Nova
Compute fabric controller
Glance
Image service (for templates)
Horizon
Dashboard
Cinder
Block storage devices (can work with ZoL w/ 3rd party driver)
Less important: (Barbican, Trove, Designate)
I really don't need any of these; they're more "could be nice to have at some point".
Barbican
REST API designed for the secure storage, provisioning and management of secrets
Trove
Database-as-a-Service provisioning relational and non-relational database engines
Designate
DNS as a Service
Possibly not needed: (Neutron, Keystone)
These ones I don't know if I need. I have DHCP, VLAN, VPN, DNS, LDAP, Kerberos services on the network that work just fine, and I'm not replacing them!
Neutron (previously Quantum)
Network management (DHCP, VLAN)
Keystone
Identity service (can work with existing LDAP servers)
Not needed: (Swift, Ceilometer, Ironic, Zaqar, Searchlight, Sahara, Heat, Manila)
Meh! I'm doing this for me, for my basement and for my own development and enjoyment, so I don't need these. It would be nice to go with fully object-based storage, but that's not feasible for me at this time.
Swift
Object storage system
Ceilometer
Telemetry Service (billing)
Ironic
Bare metal provisioning instead of virtual machines
Zaqar
Multi-tenant cloud messaging service for web developers (~ SQS)
Searchlight
Advanced and consistent search capabilities across various OpenStack cloud services
Sahara
Easily and rapidly provision Hadoop clusters (storing and managing vast amounts of data cheaply and efficiently)
Heat
Orchestration layer (store the requirements of a cloud application in a file that defines what resources are necessary for that application)
Manila
Shared File System Service (manage shares in a vendor agnostic framework)
If we don't count storage (I already have my own block storage, which I can use with Cinder and some 3rd party plugins/modules) and compute nodes (everything that's left over will become compute nodes), can I run all this on one machine? With a hot standby/failover?
Everything is going to be connected to the same power jack, same rack, same [outgoing] network cable, so more redundancy than that is overkill. I don't even need that, but "why not" :)
The basic recommendation I've heard is four to six machines, and after a lot of pestering of the ones who said that, it turns out that means "two storage, two controller, two compute". Which is what I was thinking as well: running this on two machines should be enough. They're basically only going to run Glance, Horizon and Cinder. And possibly Neutron and Keystone.
Neither of them seems to be very resource-heavy.
Is there something I'm missing?
Oh, and nothing of this is going to face the 'Net! It's all just for me.
Though it is theoretically possible to bring up OpenStack without Keystone, in practice it is almost impossible and makes the system pretty inconvenient to use.
You can definitely run full OpenStack on a single machine (or even in a VM). Check out DevStack (http://docs.openstack.org/developer/devstack/) -- you just run a shell script to bring up a full working OpenStack setup.
As long as you are not worried about availability and your workload is minimal, a single-node deployment is a pretty good way to get your feet wet.
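As a quick sanity check once DevStack is up, something like the following openstacksdk snippet (assuming a clouds.yaml entry named 'devstack', which is an assumption on my part) confirms that Glance, Nova and Cinder are all answering on the single node:

    # Minimal sketch: list images, flavors and volumes from a
    # single-node DevStack cloud using openstacksdk.
    import openstack

    conn = openstack.connect(cloud='devstack')   # placeholder cloud name

    print([img.name for img in conn.image.images()])            # Glance
    print([flv.name for flv in conn.compute.flavors()])         # Nova
    print([vol.name for vol in conn.block_storage.volumes()])   # Cinder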

Using IronWorker while reusing my existing code

My website is hosted on AWS Elastic Beanstalk (PHP). I use Yii Framework as an MVC.
A while ago I wanted to run a SQL query every day. I looked up how to run crons on Beanstalk, and it seemed complicated to merge the concepts of cloud and cron. I ran into IronWorker (http://www.iron.io/worker) and managed to create a worker that is currently doing its job fine.
Today I want to run a more complex cron: look for notifications in my database, decide whether to send an email, build an email template, and send the email (via AWS SES).
From what I understand, worker files are supposed to be self-contained items, with everything they need to work.
However, I have invested a lot of time and effort in building my MVC. I have complex models, verifications, an email templating engine, etc...
It seems very difficult to use the work I've done to create an Iron Worker. Even if I managed to port all of my code to a worker (which seems like a great deal of work), it means anytime I make changes to my main code I need to make sure the worker also has those changes. It means I would have a "branch" of my code. Even more so if I want to create more workers in the future.
What is the correct approach?
Short-term, you could likely just use the scheduling capabilities in IronWorker and have the worker hit an endpoint in your application. The endpoint will then trigger the operations to run within your app environment.
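As a rough illustration of that short-term approach, the scheduled worker can be nothing more than an HTTP call to a cron route in your app. The URL and token below are hypothetical, and the worker can be written in any language IronWorker supports; Python is just used here for the sketch.

    # Sketch of a scheduled worker that only pings a cron endpoint in the
    # main app, so all models and templating stay in the Yii codebase.
    import requests

    CRON_URL = 'https://example.com/cron/send-notifications'  # hypothetical route
    SECRET = 'shared-secret'                                   # hypothetical token

    resp = requests.post(CRON_URL, data={'token': SECRET}, timeout=300)
    resp.raise_for_status()
    print('cron endpoint returned', resp.status_code)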
Longer-term, we do suggest you look at more of a service-oriented approach, whereby you break your application up to be more loosely coupled and distributed. Here's a post on the subject. The advantages are many, especially around scalability and development agility.
https://blog.heroku.com/archives/2013/12/3/end_monolithic_app
You can also take a look at this Yii extension.
http://www.yiiframework.com/extension/yiiron/
We certainly don't want you to rewrite your app unnecessarily, but there are likely areas where you can look to decouple. We suggest creating a worker directory and making an effort to write the workers to be self-contained. That way, you could run them in a different environment and just pass payloads to the workers. (Push queues can also be used to push to these workers.) Once you get used to distributed async processing, it's a pretty easy process to manage.
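To give a feel for the self-contained-worker-plus-payload idea, here is a hedged Python sketch. It assumes the payload arrives as a JSON file whose path is passed on the command line; check the current IronWorker docs for the exact delivery mechanism.

    # Hedged sketch of a self-contained worker that receives its work as a
    # JSON payload instead of importing the whole Yii codebase.
    import argparse
    import json

    parser = argparse.ArgumentParser()
    parser.add_argument('-payload')          # path to the payload file (assumed)
    parser.add_argument('-id')               # task id (unused here)
    parser.add_argument('-d')                # working directory (unused here)
    args, _ = parser.parse_known_args()

    with open(args.payload) as f:
        payload = json.load(f)               # e.g. {"user_id": 42, "template": "digest"}

    print('would send', payload.get('template'), 'email to user', payload.get('user_id'))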
(Note: I work at Iron.io)

Deploying on EC2

This question is for anyone who has actually used Amazon EC2. I'm looking into what it would take to deploy a server there.
It looks like I can start in VirtualBox, set up my server and then export the image using the provided ec2-tools.
What gets tricky is that if I actually want to make configuration changes to my running server, they will not be persistent.
I have some PHP code that I need to be able to deploy (and redeploy) to the system, so I was thinking that EBS would be a good choice there.
I have a massive amount of data that I need stored, but it just so happens that latency is not an issue, so I was thinking something like s3fs might work.
So my question is... What would you do? What does your configuration look like? What have been particular challenges that perhaps you didn't see coming?
We have deployed a large-scale commercial app in the AWS environment.
There are three basic approaches to keeping your changes under control once the server is running, all of which we use in different situations:
Keep the changes in source control. Have a script that is part of your original image that can pull down the latest and greatest. You can pull down PHP code, Apache settings, whatever you need. If you need to restart your instance from your AMI (Amazon Machine Image), just run your script to get the latest code and configuration, and you're good to go.
Use EBS (Elastic Block Storage). EBS is like a big external hard drive that you can attach to your instance. Even if your instance goes away, EBS survives. If you later need two (or more) identical instances, you can give each one of them access to what you save in EBS. See https://stackoverflow.com/a/3630707/141172
Burn a new AMI after each change. There's a tool to create a new AMI from a running instance. If EBS is like having an external hard drive, creating a new AMI is like having a DVD-R. You can save the current state of your machine to it. Next time you have to start a new instance, base it on that new AMI. Good to go.
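To make the EBS and AMI approaches above concrete, here is an illustrative sketch using the boto3 SDK (which postdates this thread); the volume, instance and image names are placeholders.

    # Illustrative boto3 sketch: attach a persistent EBS volume to a
    # running instance, then burn a new AMI from that instance.
    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # EBS approach: attach a volume that survives the instance.
    ec2.attach_volume(VolumeId='vol-0123456789abcdef0',
                      InstanceId='i-0123456789abcdef0',
                      Device='/dev/sdf')

    # AMI approach: snapshot the instance's current state as a new image.
    image = ec2.create_image(InstanceId='i-0123456789abcdef0',
                             Name='my-app-baseline',
                             NoReboot=True)
    print('new AMI:', image['ImageId'])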
I recommend storing your PHP code in a repository such as SVN, and writing a script that checks the latest code out of the repository and redeploys it when you want to upgrade. You could also have this script run on instance startup so that you get the latest code whenever you spin up a new instance; saves on having to create a new AMI every time.
The main challenge that I didn't see coming with EC2 is instance startup time - especially with Windows. Linux instances take 5 to 10 minutes to launch, but I've seen Windows instances take up to 40 minutes; this can be an issue if you want to do dynamic load balancing and start up new instances when your load increases.
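A rough sketch of that checkout-on-startup script, with placeholder repository URL, docroot and service name, might look like this:

    # Sketch: refresh the code checkout and restart the web server at boot.
    # Paths, repository and service name are placeholders.
    import subprocess

    REPO = 'https://svn.example.com/myapp/trunk'   # placeholder repository
    DOCROOT = '/var/www/myapp'                     # placeholder deployment path

    subprocess.run(['svn', 'checkout', REPO, DOCROOT], check=True)
    subprocess.run(['service', 'apache2', 'restart'], check=True)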
I'd suggest the best bet is to simply 'try it'. The charges to run a small instance are not high, and data transfer charges are very low - I have moved quite a few GB and my data fees are still less than a dollar(!) in my first month. You will likely end up paying mostly for system time rather than data, I suspect.
I haven't deployed yet, but I have spun up an instance, migrated it from Ubuntu 8.04 to 8.10, tried different port security settings, seen what sort of access attempts unknown people have made (mostly looking for phpadmin), run some testing against it and generally experimented with the config and restart of the components I'm deploying. It has been a good prelude to my eventual deployment. I won't be starting with a big DB, so I'll initially stick with the standard EC2 instance space.
The only negative I have heard is that some spammers have made some of the IP ranges subject to spam-blocking - but I have not yet confirmed that.
As for your VirtualBox approach, I would suggest taking it only after you are more familiar with the EC2 infrastructure. I suggest that you go to EC2, open an account and follow Amazon's EC2 getting-started guide. This guide will give you enough of an overview of everything (EBS, IPs, connections, and the rest) to get you started. We are currently using EC2 for production, and the way we started was as I am explaining here.
I hope you become a cloud expert soon.
Per timbo's concern, I was able to nab an IP that, so far, hasn't legitimately shown up on any spam lists. You will have a few hiccups, since many blacklists are technically whitelists and will have every IP on their list until otherwise notified that a mail server is running on that IP. It's really easy to get removed; most of them have automated removal request forms, and every one that doesn't has been very cooperative in removing me from their lists. Just be professional, ask if they can give a time and reason for the block and what steps you should take to remove your IP. None of the services I emailed asked me to jump through any hoops; within two or three business days they all informed me my IP had been removed.
Still, if you plan on running a mail server, I would recommend reserving IPs now. They're 1 cent for every hour they are not bound to an instance, so it works out to about $7 a month. I went ahead and reserved an extra one, as I plan on starting up another instance soon.
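For anyone scripting this today, reserving and binding an Elastic IP is a couple of calls with boto3; the instance ID below is a placeholder, and the pricing quoted above may have changed.

    # Sketch: allocate an Elastic IP and bind it to an instance.
    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    alloc = ec2.allocate_address(Domain='vpc')
    ec2.associate_address(AllocationId=alloc['AllocationId'],
                          InstanceId='i-0123456789abcdef0')
    print('reserved', alloc['PublicIp'])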
I have deployed some simple stuff to EC2 Win2k3 instances. Here's my advice:
Find a tutorial. Sign up for the service. Just spend an afternoon setting up your first server. It's pretty darned easy, though there will be obstacles to overcome. It's not too tough.
When I was fooling with EC2 I think I spent like $2.00 setting up a server and playing with it for a while.
Some of your data will be persistent, but you can connect S3 to EC2 as well.
Just go for it!
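As a small illustration of connecting S3 to EC2, here is a boto3 sketch with placeholder bucket and key names; on EC2 the credentials can come from an instance role.

    # Sketch: persist a file to S3 and read it back from any instance.
    import boto3

    s3 = boto3.client('s3')

    s3.upload_file('/tmp/report.csv', 'my-placeholder-bucket', 'reports/report.csv')
    s3.download_file('my-placeholder-bucket', 'reports/report.csv', '/tmp/report-copy.csv')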
With regards to the concerns about blacklisting of mail servers, you can also use Amazon's Simple Email Service (SES), which obviates the need to run the mail server on the EC2 instances.
I had trouble with this as well, but posted a note here in their forums - https://forums.aws.amazon.com/thread.jspa?threadID=80158&tstart=0
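For completeness, here is a minimal boto3 sketch of sending through SES; the addresses are placeholders and need to be verified in SES (or the account moved out of the sandbox) before this works.

    # Sketch: send mail via SES instead of running a mail server on EC2.
    import boto3

    ses = boto3.client('ses', region_name='us-east-1')

    ses.send_email(
        Source='noreply@example.com',
        Destination={'ToAddresses': ['user@example.com']},
        Message={
            'Subject': {'Data': 'Hello from SES'},
            'Body': {'Text': {'Data': 'Mail sent without an EC2 mail server.'}},
        },
    )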

Index replication and Load balancing

I am using the Lucene API in my web portal, which is going to have thousands of concurrent users.
Our web server will call the Lucene API, which will be sitting on an app server. We plan to use 2 app servers for load balancing.
Given this, what should our strategy be for replicating the Lucene indexes on the 2nd app server? Any tips, please?
You could use Solr, which contains built-in replication. This is possibly the best and easiest solution, since it would probably take quite a lot of work to implement your own replication scheme.
That said, I'm about to do exactly that myself, for a project I'm working on. The difference is that since we're using PHP for the frontend, we've implemented Lucene in a socket server that accepts queries and returns a list of DB primary keys. My plan is to push changes to the server and store them in a queue, where I'll first store them in the in-memory index and then flush the memory index to disk when the load is low enough.
Still, it's a complex thing to do, and I expect to do quite a lot of work before we have a stable final solution that's reliable enough.
From experience, Lucene should have no problem scaling to thousands of users. That said, if you're only using your second app server for load balancing and not for failover, you should be fine hosting Lucene on only one of those servers and accessing it via NFS (in a Unix environment) or a shared directory (in a Windows environment) from the second server.
Again, this is dependent on your specific situation. If you're talking about having millions (5 or more) of documents in your index and needing your Lucene index to support failover, you may want to look into Solr or Katta.
We are working on a similar implementation to what you are describing as a proof of concept. What we see as an end-product for us consists of three separate servers to accomplish this.
There is a "publication" server that is responsible for generating the indices that will be used. There is a service implementation that handles the workflows used to build these indices, as well as being able to signal completion (via a custom management API exposed through WCF web services).
There are two "site-facing" Lucene.NET servers. Access to the API is provided to the site via WCF services. They sit behind a physical load balancer and periodically "ping" the publication server to see if there is a more current set of indices than what is currently running. If there is, the server requests a lock from the publication server and updates the local indices by initiating a transfer to a local "incoming" folder. Once there, it is just a matter of suspending the searcher while the index is attached. It then releases its lock, and the other server is available to do the same.
Like I said, we are only approaching the proof of concept stage with this, as a replacement for our current solution, which is a load balanced Endeca cluster. The size of the indices and the amount of time it will take to actually complete the tasks required are the larger questions that have yet to be proved out.
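To illustrate the update cycle described above, here is a rough Python sketch of the polling loop. The HTTP endpoints on the publication server (/version, /lock, /unlock) are hypothetical stand-ins for the real WCF services, and the transfer/searcher-swap step is a stub.

    # Hypothetical sketch of a site-facing node polling the publication
    # server, taking a lock, swapping in the new index, and releasing.
    import time
    import requests

    PUB = 'http://publication-server:8080'     # hypothetical publication server
    current_version = 0

    def swap_in(version):
        # Stub: copy the new index into the local "incoming" folder, pause
        # the searcher, point it at the new index, then resume serving.
        print('swapped to index version', version)

    while True:
        latest = requests.get(PUB + '/version').json()['version']
        if latest > current_version:
            requests.post(PUB + '/lock')       # only one node updates at a time
            try:
                swap_in(latest)
                current_version = latest
            finally:
                requests.post(PUB + '/unlock') # let the other node update
        time.sleep(60)                         # "periodically ping" the publisher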
Just some random things that we are considering:
The downtime of a given server could be reduced if two local folders are used on each machine receiving data to achieve a "round-robin" approach.
We are looking to see if the load balancer allows programmatic access to have a node remove and add itself from the cluster. This would lessen the chance that a user experiences a hang if he/she accesses during an update.
We are looking at "request forwarding" in the event that cluster manipulation is not possible.
We looked at solr, too. While a lot of it just works out of the box, we have some bench time to explore this path as a learning exercise - learning things like Lucene.NET, improving our WF and WCF skills, and implementing ASP.NET MVC for a management front-end. Worst case scenario, we go with something like solr, but have gained experience in some skills we are looking to improve on.
I'm creating the indices on the publishing back-end machines in the filesystem and replicating them over to the marketing machines.
That way every single load- and fail-balanced node has its own index, with no network latency.
The only drawback is that you shouldn't try to recreate the index within the replicated folder, as you'll have the lock file lying around on every node, blocking the IndexReader until your reindex has finished.