JVM tuning for Atlassian applications on a large EC2 instance

Can anyone help me understand what JVM memory sizes I can use on a large EC2 instance?
I am working with five applications: the Atlassian apps (Bamboo, Bitbucket, Confluence, Jira) and Artifactory.
For Jira I found information online, but not for the other apps.

"Large EC2 instance" is slightly ambiguous, so I'll assume you're referring to the general purpose category, which has 2 vCPUs and 8 GiB of RAM.
The documentation advises you to reserve at least 512MB for JFrog Artifactory, with the recommended minimal JVM parameters being:
-server -Xms512m -Xmx2g -Xss256k -XX:+UseG1GC
Based on the number of developers you're planning to serve with JFrog Artifactory, you'll need to adjust the -Xmx parameter. The "Recommended Hardware" table in the documentation will give you an overview of the suggested amount of RAM for a given number of developers.
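For instance, on an 8 GiB box you might give Artifactory a 4 GB heap while leaving headroom for the OS and off-heap memory. A minimal sketch, assuming a Linux install where the JVM options are set via JAVA_OPTIONS in $ARTIFACTORY_HOME/bin/artifactory.default (the exact file and variable vary by install method, so check your own setup):
export JAVA_OPTIONS="-server -Xms2g -Xmx4g -Xss256k -XX:+UseG1GC"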

Related

Redis localhost limitations and costs

I downloaded the Redis server and CLI to my local machine and they are working well.
I just wanted to know whether I can also use Redis on a production server:
Are there any critical limitations? For example, can I use 100 GB for free? (It will be on my own computer.)
I know that Redis Labs costs money per month, but if I run Redis on my own machine without using Redis Labs, is it free? (So the only cost would be the machine it runs on.)
Redis is open source software, licensed under BSD. That basically means you can do anything you want with it, without owing anyone anything.
Redis Labs, the home of open source Redis and the provider of commercial products that build on it, offers a wide spectrum of solutions: hosted, as-a-service, downloadable, remotely managed and so forth. You can (and sometimes should) use them, but that's definitely not a requirement.
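To make that concrete: the only memory ceiling on a self-hosted instance is the one you configure, or your hardware. A minimal sketch with the stock open source build (the 100 GB cap mirrors your example):
redis-server --maxmemory 100gb --maxmemory-policy noeviction
redis-cli info memory
The INFO memory output (used_memory_human and friends) shows how much of that budget you're actually using.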
Disclaimer: I work at Redis Labs and with the open source project.

To virtualize or not to virtualize a bare metal server for a Kubernetes deployment

I'd like to deploy Kubernetes on a large physical server (24 cores) and I'm uncertain about a number of things.
What are the pros and cons of creating virtual machines for the k8s cluster versus running on bare metal?
I have the following considerations:
Creating VMs will allow for workload isolation. New VMs for experiments can be created and assigned to devs.
On the other hand, with k8s running on bare metal, a new namespace can be created for each developer to experiment in, and they can run their code there. After all, their code should be running in Docker containers anyway.
Security:
Having VMs would limit the amount of access given to future maintainers, limiting the amount of damage that could be done. On the other hand, the primary task for any future maintainers would be adding/deleting nodes, and they would require bare metal access to do that.
Authentication:
At the moment devs would only touch the server when their code runs through the CI pipeline and their deployments are rolled out. But what about viewing logs? Could we set up tiered kubectl authentication to allow devs to access only the namespaces that have been assigned to them? (I believe this should be possible with the k8s namespace authorization plugin.)
A number of VMs already exist on the server. Would this be an issue?
First, the doubts: that is a lot of cores for a single server.
For Kubernetes, however, this is not a problem:
Kubernetes can use servers of different sizes and utilize them fully. However, if you combine the master processes and the node/worker processes on a single server, you might create unwanted resource contention. You can manage that with namespaces, as you already mention.
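As a sketch of the namespace-per-developer setup (this also covers the tiered kubectl access asked about above; all names below are made up, and the commands assume a reasonably recent cluster with RBAC enabled):
kubectl create namespace dev-alice
kubectl create rolebinding alice-edit --clusterrole=edit --user=alice --namespace=dev-alice
kubectl create quota alice-quota --hard=cpu=4,memory=8Gi,pods=20 --namespace=dev-alice
The built-in edit cluster role lets a developer deploy and read logs in their own namespace without touching anyone else's, and the quota keeps a runaway experiment from starving the rest of the box.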
What we do is use continuous integration with namespaces in a single dev/qa Kubernetes environment: every change gets its own namespace (so we run many, many namespaces) and we deploy a full environment into each one, managed by a bunch of shell scripts. This works on a large server like yours just as well as on smaller (or virtual) boxes. The main benefit of virtualization for you would be splitting the large box into smaller ones, so you can also use it for purposes other than Kubernetes (there are workloads Kubernetes can't host: MS Windows, desktops, kernel modules for VPN purposes, etc.).
I would separate dev and prod as different VMs. I once had a web app inside Docker that spawned too many threads, and the Docker daemon on the host crashed; luckily it was limited to one host. You can protect against this by setting limits, but it's a risk: one mistake in dev could bring down prod as well.
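To illustrate the kind of limit that would have contained that failure (image name and values are placeholders):
docker run --memory=512m --pids-limit=256 my-webapp
The --pids-limit flag caps the number of processes/threads a container can create, so a thread leak hits the container's ceiling instead of exhausting the host.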
I think the answer is "it depends!", which is not really an answer. Personally, I would split up the machine using VMs and deploy that way. You get better flexibility in how much of the server's resources you carve out, and you can create new environments easily and destroy them just as easily.
Even if these VMs are really big, I think it's still easier to manage, especially given that you already have existing VMs on the machine.
That said, there's no technical reason you can't run a single-node cluster, but you may run into problems with downtime during upgrades (if that's an issue), and if that server needs to be patched or rebooted, your entire cluster is down.
I would look at your environment's needs for HA and uptime, as well as how you would deploy VMs (if you go that route), and decide what works best for you.

Apache Mesos vs. Apache CloudStack

Managing infrastructure (private cloud or public cloud) at scale and with ease is addressed by Apache Mesos, Apache CloudStack and OpenStack.
I have a few questions in this regard and hope someone can shed some light:
Are there any articles that compare and contrast the above?
Why run one over the other at all? (I see from tutorials that one can run on top of the other.)
It seems like CloudStack is centered around VMs (hypervisors) and Mesos is centered around scheduling and allocating resources inside VMs for different co-existing software systems. Am I right in that conclusion?
If so, why does Mesos claim that it can manage bare metal boxes without even needing a hypervisor? Is this aimed at workloads that do not go well with VMs (especially LSM-based systems such as Solr/Lucene and HBase)?
Does it come down to a choice of Linux VMs vs. Linux containers for resource allocation? In other words, is Mesos a Linux-container-based framework while CloudStack is a Linux-VM-based framework?

Obtain useful data from WebSphere JVM

I would like to attach to a WebSphere JVM and obtain useful data like garbage collectors' names and their collection counts, thread counts, heap/non-heap memory usage, JVM uptime, etc. However, this link gives the list of MBeans available with the WebSphere JVM:
http://pic.dhe.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=%2Fcom.ibm.websphere.javadoc.wsfep.doc%2Fweb%2FmbeanDocs%2Findex.html
These MBeans don't seem to offer the data I require. Is there any other way to obtain it? I shall be using JMX to gather it.
If you're a corporate with bucks to spend, I would suggest a product like Wily Introscope, which runs an agent alongside your JVM to collect all the metrics you are after. I have used it with WebSphere servers. Searching for an open source option, I came across GlassBox, which may provide a low-cost alternative for you.
I'm not aware of any default MBeans that provide the coverage you're after. It's typically the big Java vendors that offer this type of functionality.
[Update]
Having recently used VisualVM with WebSphere 7 for real-time monitoring/troubleshooting, I thought I would share my knowledge. VisualVM comes with the standard Sun JDK and you will find it installed at JAVA_HOME\bin\jvisualvm.exe.
To allow VisualVM to connect to the JRE in WebSphere, you must add the following JVM parameters using the WebSphere Admin Console.
Go to: Application Servers > [server_name] > Java and Process Management > Process definition > Java Virtual Machine > Generic JVM arguments
-Djavax.management.builder.initial=
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=1099
-Dcom.sun.management.jmxremote.local.only=false
Make sure that the port number you have chosen above is not already in use:
netstat -ap | grep 1099
Restart the server and you will be able to connect using VisualVM to see uptime, threads, heap and GC profiles.
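Recent VisualVM builds can also open the JMX connection straight from the command line (host and port here match the example flags above):
jvisualvm --openjmx localhost:1099
jconsole accepts the same host:port pair if VisualVM isn't available.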
I see that Sun has also documented how you can write your own Java JMX client to read these values.
You could go with the suggestions provided by Brad and Andreas.
I would like to give you some insight into a couple of tools that are worth exploring:
(1) Tivoli Performance Viewer. This should provide some information about the JVM.
(2) IBM Health Center -> http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/
Both of these should provide a lot of the info you require.
Try them out.
The JVM statistics are provided by the platform MXBeans. If you only need to collect this data over a short period of time, you could use a tool such as VisualVM. It's a bit tricky to configure it to connect to a WebSphere instance, but it is possible. One way to do that (there are other options) is described here:
http://code.google.com/p/xm4was/wiki/VisualVMHowTo
If you want to collect the data over a longer period of time, then you need a monitoring system. At work, I wrote a plugin for the open source RHQ enterprise management system that adds support for WebSphere. I'm in the process of releasing this plugin as an open source project, but at the time of writing I have not yet published the documentation, and there is no downloadable release yet; only the source code is available. I will try to complete that in the next few weeks. If you are interested in this project, please let me know.

EC2 automation tools / strategies?

What tools or strategies are you using for automation of EC2 activities?
I need to be able to bring up a number of EC2 instances, provision various software on them (primarily Python packages), interact with S3 (primarily downloading data), and run various jobs. I'll be doing this both on demand and on a schedule.
I'm trying to decide if I should:
Create an AMI with all my software loaded on it
or
Launch a plain vanilla Linux AMI instance and scp my software to it
For the provisioning and automation, Boto looks pretty good. Or I could write something with Paramiko. Would you recommend either, or anything else I should be looking at?
Basically I'm looking for advice / success stories; let me know what's working for you.
To answer your bullets about selecting AMIs: it depends on how much software you're installing.
I have been successful with a hybrid approach, where I build an AMI loaded with my heavyweight and more stable software. This is the stuff that needs to run an installer or takes considerable time to install (remember that if you re-install a package every time as part of your startup process, you're paying for that install every time). Then I upload the small and volatile software at provisioning/startup time; into this bucket goes most of the application code, data, etc. That way, I can change my app without having to touch the AMI.
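A sketch of the startup half of that approach as an EC2 user-data script (bucket name and paths are placeholders, and it assumes the instance has S3 access, e.g. via an instance role; boto would work just as well as the AWS CLI for the download):
#!/bin/bash
# Fetch the small, frequently changing application bundle at boot
aws s3 cp s3://my-app-bucket/releases/app-latest.tar.gz /tmp/app.tar.gz
mkdir -p /opt/app && tar -xzf /tmp/app.tar.gz -C /opt/app
/opt/app/bin/start.sh
The heavyweight installs stay baked into the AMI; only this thin layer changes between releases.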
The benefits of this approach:
Don't have to pay for running the same software install thousands of times.
AMI can stay fairly stable over time.
Can use software that requires intervention or GUI interaction to install.
Major drawbacks:
Your AMI's OS version will become stale over time.
Your AMI may not be flexible as to the instance type/architecture it will run on. For instance, you may create it on a 32-bit OS and thereby prevent it from running on the High CPU instance types, or vice versa. So you may lock yourself into a pricing scheme.
I don't use Python, so I can't comment on either of the APIs you referenced.
AWS just released the Systems Manager suite, which includes an Automation service that will (among other things) handle your use cases around AMIs.
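For example, the managed AWS-UpdateLinuxAmi automation document boots a source AMI, patches it, and bakes a fresh AMI; the IDs and role below are placeholders, so check the document's parameter list before running it:
aws ssm start-automation-execution --document-name "AWS-UpdateLinuxAmi" --parameters "SourceAmiId=ami-0123456789abcdef0,AutomationAssumeRole=arn:aws:iam::123456789012:role/AutomationServiceRole"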
This question was asked some time ago, but I believe my answer could be useful to other users. I believe the best automation tools available on the market are provided by cloud management platforms. For example, they offer auto-scaling, configuration management integration (Chef/Puppet), database replication, DNS management...
The most popular cloud management platforms are Scalr (disclaimer: I work there), RightScale and enStratus. Scalr is open source and released under the Apache 2 license.
Regarding your specific question on AMIs, cloud management platforms usually provide pre-configured AMIs (at Scalr, we call them roles). If you want to create your own AMI from an existing instance, you can create snapshots and use them as a template for future instances.
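Platform aside, the underlying EC2 operation is simply baking an image from a configured instance (instance ID and name are placeholders):
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "app-server-template-v1"
The resulting AMI then serves as the template for future instances, whether launched by hand or through a platform role.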