WebLogic Admin Console resiliency

I have a WebLogic cluster whose nodes run on 2 VMs, to provide resiliency if any node fails. I use WLST scripts to start and stop the deployed components, because some components are brought down during specific time frames.
If the VM on which the admin console runs is down, is there any way to start or stop my deployed components when I am not able to bring up the admin console?

Related

Dynatrace OneAgent in ECS Fargate containers stops but application container is running

I am trying to install OneAgent in my ECS Fargate task. Along with the application container, I have added another container definition for OneAgent with the image alpine:latest, and I used runtime injection.
When the task runs, the OneAgent container is initially in the running state. After a minute it goes to the stopped state, while the application container stays running.
In Dynatrace the same host is available and keeps being recreated every 5-10 minutes.
The actual issue was that the task was in draining status because of an application issue, which is why the host kept being recreated in Dynatrace. In addition, because I used runtime injection for ECS Fargate, the OneAgent container stops once the binaries are downloaded and injected into the volume, while the application container keeps running and sending logs to Dynatrace.
I have the same problem. After connecting to the cluster via SSH, I saw that the agent needs to be privileged. The only thing that worked for me was sending traces and metrics through OpenTelemetry.
https://aws-otel.github.io/docs/components/otlp-exporter
Alternative: use sleep infinity in the command field of your OneAgent container.

WebSphere 9 ND node agent stopped and the applications are still working. How/why?

This is WebSphere 9 ND. I've stopped the node agent, and the serverStatus.sh script reports that it is down: ADMU0509I: The Node Agent "nodeagent" cannot be reached. Why are the applications still authenticating and appearing to work?
See this article explaining the basic concepts of IBM WebSphere Application Server Network Deployment.
node agent
A node agent manages all managed processes on a WebSphere Application Server on a node by communicating with the Network Deployment Manager to coordinate and synchronize the configuration. A node agent performs management operations on behalf of the Network Deployment Manager. The node agent represents the node in the management cell. Node agents are installed with WebSphere Application Server base, but are not required until the node is added to a cell in a Network Deployment environment.
application server
The application server is the primary component of WebSphere. The server runs a Java™ virtual machine, providing the runtime environment for the application's code. The application server provides containers that specialize in enabling the execution of specific Java application components.
Apps are deployed to the application server, not to the node agent. The role of the node agent is to perform management operations on behalf of the Deployment Manager.
So, if the node agent is stopped, you only lose the ability to manage the servers running under that node. It does not stop application servers that are already running, or the applications deployed to servers on that node.
You can validate this by grepping for the server name (e.g. server1) in the list of all running processes:
ps -ef | grep java | grep servername
Sample output (for an app server) is given below:
wasadmin 12345 98765 2 13:18 pts/0 00:04:57 /opt/ibm/WebSphere/AppServer/java/8.0/bin/java -Dosgi.install.area=/opt/ibm/WebSphere/AppServer <collapsed text> cellname nodename servername
where:
wasadmin - the OS username running the application server on that node
12345 - the PID of the application server running on that node
98765 - the PID of the parent process (nodeagent). This will be "1" if the nodeagent is stopped
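The same check can be sketched in Python. This is a minimal illustration, not a WebSphere tool: it assumes the usual Linux `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD), and the function names are hypothetical. It follows the rule stated above that a parent PID of 1 means the nodeagent is no longer the parent.

```python
def parse_ps_line(line):
    """Split one `ps -ef` output line into its user, pid, ppid and command."""
    # Assumed columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
    fields = line.split(None, 7)
    if len(fields) < 8:
        raise ValueError("not a complete ps -ef line: %r" % line)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "ppid": int(fields[2]),
        "cmd": fields[7],
    }


def nodeagent_stopped(proc):
    """Return True when the parent PID is 1, i.e. the process was re-parented."""
    return proc["ppid"] == 1
```

For example, feeding it the lines printed by the ps command above shows, per application server, whether the process is still running and whether its parent is still the nodeagent.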

Ambari cluster + poor connection between ambari-agent and Ambari server

We have an Ambari cluster with 872 DataNode machines, running Ambari 2.6.x.
We currently have some network problems.
After a long investigation we found that the Ambari agent on some machines does not communicate well with the Ambari server.
As a result we see strange behavior, such as 5 dead DataNodes in the Ambari dashboard, even though the DataNode machines are definitely healthy.
Is it possible to set a more tolerant value in the Ambari agent configuration, so that the acknowledgement between the Ambari agent and the Ambari server is allowed to take a little longer and brief network problems are ignored?
Something like a timeout or connection time between the Ambari agent and the Ambari server.
First of all, you need to find the root cause of why the DataNode is showing as dead.
The Ambari agent runs on every node. It is responsible for sending metrics and heartbeats to the Ambari server, which then publishes them to your Ambari web UI.
The NameNode waits about 10 minutes before it declares a DataNode dead and copies its blocks to other DataNodes.
If a DataNode is shown as dead, check the Ambari agent status on that node by running: service ambari-agent status. In parallel, check ambari-agent.log on the worker node to see why the Ambari agent stopped working.
You can configure the HTTP timeouts that ambari-agent uses for service tasks:
https://github.com/apache/ambari/blob/trunk/ambari-agent/conf/unix/ambari-agent.ini
There is an HTTP timeout section that you can configure based on your network throughput.
On an installed agent, the file is normally /etc/ambari-agent/conf/ambari-agent.ini.
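Before changing anything, it can help to see which timeout-related keys your installed agent configuration actually defines. The following is a minimal Python sketch using the standard configparser module. The function name is hypothetical, and the section and key names in SAMPLE_INI are placeholders invented for illustration, not real Ambari settings.

```python
import configparser


def find_settings(ini_text, needle="timeout"):
    """Return {(section, key): value} for every key whose name contains needle."""
    # interpolation=None keeps literal '%' characters in values from raising errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    matches = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if needle in key.lower():
                matches[(section, key)] = value
    return matches


# Hypothetical sample; the section and key names are placeholders, not Ambari's.
SAMPLE_INI = """
[server]
hostname = ambari.example.internal

[example_section]
example_timeout_seconds = 30
"""

if __name__ == "__main__":
    for (section, key), value in find_settings(SAMPLE_INI).items():
        print("[%s] %s = %s" % (section, key, value))
```

To run it against a real agent, read the text of your own ambari-agent.ini and pass it to find_settings in place of SAMPLE_INI.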

How to restart a Service Fabric Application

I have a gMSA service account running a stateless Service Fabric application. The account has recently been added as a member of a new security group. We don't see the application working, and I think it's because the user claims were loaded at application startup. For Windows services, I've seen that we need to restart the service to get this to work (mmc -> Services, right-click, Restart). I would like to do something similar in Service Fabric.
I see the option of restarting the node, but that is a more heavy-handed approach than I want to use. This is in production and I want to scope the solution to the problem. The other applications on the node do not have an issue, so I would prefer not to bring them down.
Service Fabric Deactivate (pause) vs Deactivate (restart)?
Thanks in advance,
Greg
What you are looking for is the Restart-ServiceFabricDeployedCodePackage command.
The Restart-ServiceFabricDeployedCodePackage cmdlet ends the code package process, which restarts all of the user service replicas hosted in that process. This restart simulates code package process failures in the cluster, which tests the failover recovery paths of your service.
You can specify a code package, or you can specify a ReplicaSelector to restart the node and code package combination where the replica is hosted. This simplifies tests on the primary host node by not having to determine which Service Fabric node is the primary node before restarting that node.

How to force Weblogic to start deployments in active state (i.e. not just prepared)

When I start a WebLogic instance with a deployed application, the deployment is sometimes left in the prepared state instead of the active state. I have to go to the WebLogic console and start the deployment manually, which is slow, annoying, repetitive work. This is done on a development machine (sometimes 50 times a day), so there are no security implications, as the server is only visible on the local network. Is there some way to have the deployment always start in the active state?
Note that I'm not redeploying the application. Instead, I keep it "constantly deployed" and stop/start the WebLogic instance using the scripts in the bin directory.
If you are running WebLogic in development mode, you can use the autodeploy folder for your app. See details here: http://download.oracle.com/docs/cd/E11035_01/wls100/deployment/autodeploy.html#wp1021620
I think this should solve your problem.
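As a rough illustration of the autodeploy approach, here is a minimal Python sketch that copies an application archive into a domain's autodeploy directory. It assumes the domain runs in development mode and that the autodeploy directory sits directly under the domain directory. The function name and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def copy_to_autodeploy(archive, domain_dir):
    """Copy an application archive into <domain_dir>/autodeploy and return its new path."""
    target_dir = Path(domain_dir) / "autodeploy"
    if not target_dir.is_dir():
        # Fail loudly rather than creating the directory in the wrong place.
        raise FileNotFoundError("no autodeploy directory under %s" % domain_dir)
    return Path(shutil.copy2(str(archive), str(target_dir)))


# Hypothetical usage; both paths are placeholders for your own environment:
# copy_to_autodeploy("build/myapp.war", "/path/to/user_projects/domains/mydomain")
```

The copy replaces a manual drag-and-drop into the folder. Whether the application then comes up active on every server start still depends on the server being in development mode, as the answer above notes.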