Is it safe to run Elastic APM on production servers? - elastic-apm

Elastic APM uses a Java agent to collect application performance metrics and sends it to the Elastic APM server, which means it will use Java instrumentation for underlying metrics in JVM.
My question is:
Is there any vulnerability issue or security risk for using it?

currently,we run apm-agent for java application, but after 1 day, server cpu has over 100% and java application killed by system

Related

Is AWS Fargate useful for deploying a "web" service stack?

I see Fargate as a good service for deploying a Docker Compose based stack, but I was wondering if it is any good for "long-running" web services, not just ones where you need dynamic scaling and undeterminate workloads (e.g. containers that are created and die on demand).
That depends on your use-case. ECS lets you quickly deploy containerized applications. With Fargate we don't need to manage the underlying infrastructure (say server-less approach for containers!). Fargate is suitable for long-running apps, microservices, and batch jobs.
Few of my observations on Fargate are:
Fargate storage is ephemeral - We cannot store container data in disk such as volumes. (although Fargate provides 10 GB of volume mounts that is nonpersistent empty storage.)
Logs can be sent to Cloudwatch using awslogs driver. Recently AWS announced the support for Splunk log driver.
Fargate uses only awavpc network mode.
Fargate supports environment variables. Environment variables are the only way to pass parameters to the container.

Google Cloud Manage Tomcat Service

Does google cloud or aws provide manage Apache tomcat which just take war file and do auto-scaling based on load increase and decrease ? not compute engine. I dont want to create VM. this should be manage by manage service.
Google App Engine can directly take and run a WAR file - just use the appcfg deployment method.
You will have more options if you package with docker, as this then provides an image type that can be run in many places (Multilpe GCP, AWS and Azure options, on-prem Kubernetes, etc). This can even be as simple as building a dockerfile that just copies the WAR into a jetty image:
FROM jetty:latest
COPY YOUR_WAR.war /var/lib/jetty/webapps
It might be better to explode the war though - see discussion in this question
AWS provide ** AWS Elastic Beanstalk **
The AWS Elastic Beanstalk Tomcat platform is a set of environment configurations for Java web applications that can run in a Tomcat web container. Each configuration corresponds to a major version of Tomcat, like Java 8 with Tomcat 8.
Platform-specific configuration options are available in the AWS Management Console for modifying the configuration of a running environment. To avoid losing your environment's configuration when you terminate it, you can use saved configurations to save your settings and later apply them to another environment.
To save settings in your source code, you can include configuration files. Settings in configuration files are applied every time you create an environment or deploy your application. You can also use configuration files to install packages, run scripts, and perform other instance customization operations during deployments.
It also provide autoscaling
The Auto Scaling group in your Elastic Beanstalk environment uses two Amazon CloudWatch alarms to trigger scaling operations. The default triggers scale when the average outbound network traffic from each instance is higher than 6 MB or lower than 2 MB over a period of five minutes. To use Amazon EC2 Auto Scaling effectively, configure triggers that are appropriate for your application, instance type, and service requirements. You can scale based on several statistics including latency, disk I/O, CPU utilization, and request count.

JMeter gives “The target server failed to respond ” Error

We use Jmeter to do performance testing. I gave 200 threads(200 users). and we have two servers. like sever A, Server B. i tested indivisibly for 200 Users, it works. and we load balancing server Like server C. So request goes to ether server A Or Server B. But if configure my same jmx script(200 thread) with Server C. it gives error below (but it works for 50 users-- no error).
org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at org.apache.jmeter.protocol.http.sampler.MeasuringConnectionManager$MeasuredConnection.receiveResponseHeader(MeasuringConnectionManager.java:201)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
If the issue can be reproduced only on higher loads - it's definitely a server (or load balancer) issue so congratulations on finding the first bottleneck.
Now you can investigate the reason and suggest the fixes, the next steps could be:
Inspect application under test / load balancer logs - you can find a clue there
Inspect application under test / load balancer / database / any other middleware configuration. in the majority of cases default configuration is good for development and debugging but you will need to perform some performance tuning before running a prod-like load test
Collect main health metrics on the application under test side (CPU, RAM, Network, Disk, Swap usage, etc.). It might be the case your application simply lacks hardware resources. You can use built-in tools of the operating system(s) or an APM tool or JMeter PerfMon Plugin
Re-run your test with a profiling tool telemetry enabled on the application under test side. This will give you an overview with regards to where the application spends the most time, which are the "heaviest" functions or functions called most frequently so you would know what to optimise.
Make sure that the load balancer equally (or according to the other algorithm) distributes the requests between the backend servers. It might be the case you're hitting only one server, if this is - consider adding the DNS Cache Manager to your Test Plan and re-run your test to see if it helps.

Bamboo remote agent pool

In Jenkins, we can define labels and group number of build slaves under the label. This label then can be mapped to job so jenkins will automatically pick the available build slaves in the pool and execute the jobs. Is something similar available in bamboo to create remote agent pool?
I hope I understood your question correctly but anyway... There's similar concept in Bamboo. There are two types of agents:
Local ones which operate as a thread in Bamboo server. Generally, not recommended for bigger Bamboo instances due to performance and security reasons.
Remote ones which are basically separate processes running the builds, ideally on a different machine so Bamboo server doesn't suffer from higher hardware load.
The match between job and agents bases on job requirement and agent capabilities, e.g:
Agent define a capability, effectively states what it can build, what tools are installed, e.g. .NET or JDK
Job/deployment environment define a requirement which is need to successfully accomplish the task, e.g. Git and Maven.
In the end Bamboo tries to find an agent which provides full set of capabilities a job/deployment environment requires.
The special rules applies if an agent is dedicated to an job or environment or agent is elastic agent (runs in EC2).
More reading:
https://confluence.atlassian.com/bamkb/difference-between-local-agents-and-remote-agents-457703602.html
https://confluence.atlassian.com/bamboo/configuring-a-job-s-requirements-289277064.html
https://confluence.atlassian.com/bamboo/configuring-a-job-s-requirements-289277064.html
https://confluence.atlassian.com/bamboo/requirements-for-deployment-environments-838427584.html
https://confluence.atlassian.com/bamboo/dedicating-an-agent-629015108.html
https://confluence.atlassian.com/bamboo/managing-your-elastic-image-configurations-289277147.html

What is yarn-client mode in Spark?

Apache Spark has recently updated the version to 0.8.1, in which yarn-client mode is available. My question is, what does yarn-client mode really mean? In the documentation it says:
With yarn-client mode, the application will be launched locally. Just like running application or spark-shell on Local / Mesos / Standalone mode. The launch method is also the similar with them, just make sure that when you need to specify a master url, use “yarn-client” instead
What does it mean "launched locally"? Locally where? On the Spark cluster?
What is the specific difference from the yarn-standalone mode?
So in spark you have two different components. There is the driver and the workers. In yarn-cluster mode the driver is running remotely on a data node and the workers are running on separate data nodes. In yarn-client mode the driver is on the machine that started the job and the workers are on the data nodes. In local mode the driver and workers are on the machine that started the job.
When you run .collect() the data from the worker nodes get pulled into the driver. It's basically where the final bit of processing happens.
For my self i have found yarn-cluster mode to be better when i'm at home on the vpn, but yarn-client mode is better when i'm running code from within the data center.
Yarn-client mode also means you tie up one less worker node for the driver.
A Spark application consists of a driver and one or many executors. The driver program is the main program (where you instantiate SparkContext), which coordinates the executors to run the Spark application. The executors run tasks assigned by the driver.
A YARN application has the following roles: yarn client, yarn application master and list of containers running on the node managers.
When Spark application runs on YARN, it has its own implementation of yarn client and yarn application master.
With those background, the major difference is where the driver program runs.
Yarn Standalone Mode: your driver program is running as a thread of the yarn application master, which itself runs on one of the node managers in the cluster. The Yarn client just pulls status from the application master. This mode is same as a mapreduce job, where the MR application master coordinates the containers to run the map/reduce tasks.
Yarn client mode: your driver program is running on the yarn client where you type the command to submit the spark application (may not be a machine in the yarn cluster). In this mode, although the drive program is running on the client machine, the tasks are executed on the executors in the node managers of the YARN cluster.
Reference: http://spark.incubator.apache.org/docs/latest/cluster-overview.html
A Spark application running in
yarn-client mode:
driver program runs in client machine or local machine where the application has been launched.
Resource allocation is done by YARN resource manager based on data locality on data nodes and driver program from local machine will control the executors on spark cluster (Node managers).
Please refer this cloudera article for more info.
The difference between standalone mode and yarn deployment mode,
Resource optimization won't be efficient in standalone mode.
In standalone mode, driver program launch an executor in every node of a cluster irrespective of data locality.
standalone is good for use case, where only your spark application is being executed and the cluster do not need to allocate resources for other jobs in efficient manner.
Both spark and yarn are distributed framework , but their roles are different:
Yarn is a resource management framework, for each application, it has following roles:
ApplicationMaster: resource management of a single application, including ask for/release resource from Yarn for the application and monitor.
Attempt: an attempt is just a normal process which does part of the whole job of the application. For example , a mapreduce job which consists of multiple mappers and reducers , each mapper and reducer is an Attempt.
A common process of summiting a application to yarn is:
The client submit the application request to yarn. In the
request, Yarn should know the ApplicationMaster class; For
SparkApplication, it is
org.apache.spark.deploy.yarn.ApplicationMaster,for MapReduce job ,
it is org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
Yarn allocate some resource for the ApplicationMaster process and
start the ApplicationMaster process in one of the cluster nodes;
After ApplicationMaster starts, ApplicationMaster will request resource from Yarn for this Application and start up worker;
For Spark, the distributed computing framework, a computing job is divided into many small tasks and each Executor will be responsible for each task, the Driver will collect the result of all Executor tasks and get a global result. A spark application has only one driver with multiple executors.
So, then ,the problem comes when Spark is using Yarn as a resource management tool in a cluster:
In Yarn Cluster Mode, Spark client will submit spark application to
yarn, both Spark Driver and Spark Executor are under the supervision
of yarn. In yarn's perspective, Spark Driver and Spark Executor have
no difference, but normal java processes, namely an application
worker process. So, when the client process is gone , e.g. the client
process is terminated or killed, the Spark Application on yarn is
still running.
In yarn client mode, only the Spark Executor are under the
supervision of yarn. The Yarn ApplicationMaster will request resource
for just spark executor. The driver program is running in the client
process which have nothing to do with yarn, just a process submitting
application to yarn.So ,when the client leave, e.g. the client
process exits, the Driver is down and the computing terminated.
First of all, let's make clear what's the difference between running Spark in standalone mode and running Spark on a cluster manager (Mesos or YARN).
When running Spark in standalone mode, you have:
a Spark master node
some Spark slaves nodes, which have been "registered" with the Spark master
So:
the master node will execute the Spark driver sending tasks to the executors & will also perform any resource negotiation, which is quite basic. For example, by default each job will consume all the existing resources.
the slave nodes will run the Spark executors, running the tasks submitted to them from the driver.
When using a cluster manager (I will describe for YARN which is the most common case), you have :
A YARN Resource Manager (running constantly), which accepts requests for new applications and new resources (YARN containers)
Multiple YARN Node Managers (running constantly), which consist the pool of workers, where the Resource manager will allocate containers.
An Application Master (running for the duration of a YARN application), which is responsible for requesting containers from the Resource Manager and sending commands to the allocated containers.
Note that there are 2 modes in that case: cluster-mode and client-mode. In the client mode, which is the one you mentioned:
the Spark driver will be run in the machine, where the command is executed.
The Application Master will be run in an allocated Container in the cluster.
The Spark executors will be run in allocated containers.
The Spark driver will be responsible for instructing the Application Master to request resources & sending commands to the allocated containers, receiving their results and providing the results.
So, back to your questions:
What does it mean "launched locally"? Locally where? On the Spark
cluster?
Locally means in the server in which you are executing the command (which could be a spark-submit or a spark-shell). That means that you could possibly run it in the cluster's master node or you could also run it in a server outside the cluster (e.g. your laptop) as long as the appropriate configuration is in place, so that this server can communicate with the cluster and vice-versa.
What is the specific difference from the yarn-standalone mode?
As described above, the difference is that in the standalone mode, there is no cluster manager at all. A more elaborate analysis and categorisation of all the differences concretely for each mode is available in this article.
With yarn-client mode, your spark application is running in your local machine. With yarn-standalone mode, your spark application would be submitted to YARN's ResourceManager as yarn ApplicationMaster, and your application is running in a yarn node where ApplicationMaster is running.
In both case, yarn serve as spark's cluster manager. Your application(SparkContext) send tasks to yarn.