How do I make a free VLM from kemp boot on a physical LoadMaster - load-balancing

I know this isn't the right place to ask this but this site has the most users on it. I recently bought a Kemp LoadMaster LM-2600 Load Balancer for my webservers. However, this unit didn't include an SSD because the previous owner decided to erase it. So, I downloaded the VirtualBox version of free VLM from kemp's website. Then, I used VBoxManage clonehd LMOS.vmdk LMOS.img --format RAW to turn the disk into a raw img file. Then, I used dd if=LMOS.img of=/dev/sdb to flash a USB with the os. Then, I booted my loadmaster with the USB.
The boot process went like normal until it finished booting and then the machine switched to runlevel 0 (Shutdown)
This is the logs I got when I plugged the USB into my computer (The log file was so big that stack overflow won't allow me to paste it here):
https://pastebin.com/5PbKzRi6
I noticed that it said something about eth0 being down so I plugged in an ethernet cable and booted it again. The same thing happened but I got a different error (The log was shorter so I labeled it):
-- BOOT --
2022-08-07T19:50:06+00:00 lb100 syslog-ng: syslog-ng starting up; version='3.25.1'
-- ERROR --
2022-08-07T19:50:07+00:00 lb100 raid_events_handler: RAID controller not detected yet (check # 0)
-- LOGIN --
2022-08-07T19:50:11+00:00 lb100 login: pam_unix(login:session): session opened for user bal by LOGIN(uid=0)
-- ERROR --
2022-08-07T19:50:14+00:00 lb100 raid_events_handler: RAID controller not detected yet (check # 1)
-- SHUTDOWN --
2022-08-07T19:50:15+00:00 lb100 init: Switching to runlevel: 0
2022-08-07T19:50:15+00:00 lb100 kernel: S99final (938): drop_caches: 1
2022-08-07T19:50:17+00:00 lb100 syslog-ng: syslog-ng shutting down; version='3.25.1'
2022-08-07T19:50:17+00:00 lb100 kernel: Kernel logging (proc) stopped.
2022-08-07T19:50:17+00:00 lb100 kernel: Kernel log daemon terminating.
2022-08-07T19:50:17+00:00 lb100 sslproxy: (815) caught signal 15
2022-08-07T19:50:17+00:00 lb100 raid_events_handler: stop
I have no idea what to do right now. I already tried everything I knew. What should I do?
Any help would be great,
Thanks!

Related

Unable to ssh VM after hardware configuration change

I followed the recommandation to reduce the size of my VM (number of CPU from 4 to 2 and memory from 16GO to 8 Go). After updating the configuration and restarting the VM i was not able to access the VM via ssh.
The VM has an external IP.
The troublshoot diagnostic using gcloud does not show any error or issue in the log. Everything is fine regarding the firewall configuration.
I tried to create a new VM under my project (same project as the original VM). I cannot access it with ssh. If i create a new project and a new VM instance under this new project then I can ssh it. --> The problem seems to be related to the project itself.
I tried to access vie serial port and I am getting these errors:
Mar 8 20:31:11 myvm systemd[1]: Started Google OSConfig Agent.
Mar 8 20:32:11 myvm OSConfigAgent[1173]: 2022-03-08T20:32:11.5643Z OSConfigAgent Critical main.go:100: Error parsing metadata, agent cannot start: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=0&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable
Mar 8 20:32:11 myvm systemd[1]: google-osconfig-agent.service: Main process exited, code=exited, status=1/FAILURE
Mar 8 20:32:11 myvm systemd[1]: google-osconfig-agent.service: Failed with result 'exit-code'.
Mar 8 20:32:12 myvm systemd[1]: google-osconfig-agent.service: Service hold-off time over, scheduling restart.
Mar 8 20:32:12 myvm systemd[1]: google-osconfig-agent.service: Scheduled restart job, restart counter is at 4.
I am blocked... I am asking for your support. Any idea or suggestion?

running ARM- fullsystem on gem5

0
I am trying to run ARM-full_system on gem5.when I enter this command for Performance : {root#farideh-S551LN:/home/farideh/gem5# ./build/ARM/gem5.opt configs/example/arm/starter_fs.py --cpu="minor" --num-cores=1 --disk-image=/home/farideh/Downloads/fullsystem/disks/linaro-minimal-aarch64.img --kernel=/home/farideh/Downloads/fullsystem/binaries/vmlinux.vexpress_gem5_v1_64.20170616}
the terminal shows:
{gem5 Simulator System. http://gem5.org gem5 is copyrighted software; use the --copyright option for details. gem5 compiled May 1 2019 08:46:14 gem5 started Jun 2 2019 11:49:47 gem5 executing on farideh-S551LN, pid 7162 command line: ./build/ARM/gem5.opt configs/example/arm/starter_fs.py --cpu=minor --num-cores=1 --disk-image=/home/farideh/Downloads/fullsystem/disks/linaro-minimal-aarch64.img --kernel=/home/farideh/Downloads/fullsystem/binaries/vmlinux.vexpress_gem5_v1_64.20170616 Global frequency set at 1000000000000 ticks per second warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (2048 Mbytes) info: kernel located at: /home/farideh/Downloads/fullsystem/binaries/vmlinux.vexpress_gem5_v1_64.20170616 warn: Bootloader entry point 0x10 overriding reset address 0 warn: Highest ARM exception-level set to AArch32 but bootloader is for AArch64. Assuming you wanted these to match. system.vncserver: Listening for connections on port 5900 system.terminal: Listening for connections on port 3456 0: system.remote_gdb: listening for remote gdb on port 7000 info: Using bootloader at address 0x10 info: Using kernel entry physical address at 0x80080000 info: Loading DTB file: /home/farideh/gem5/m5out/system.dtb at address 0x88000000 warn: Existing EnergyCtrl, but no enabled DVFSHandler found. info: Entering event queue # 0. Starting simulation... warn: ClockedObject: Already in the requested power state, request ignored warn: SCReg: Access to unknown device dcc0:site0:pos0:fn7:dev0}
then I enter telnet 127.0.0.1 3456 in an other terminal and it shows:Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. ==== m5 slave terminal: Terminal 0 ==== }
after several times I confront by this error about telnet:Connection closed by foreign host. and another terminal , I see this:
root#farideh-S551LN:/home/farideh/gem5# ./build/ARM/gem5.opt configs/example/arm/starter_fs.py --cpu="minor" --num-cores=1 --disk-image=/home/farideh/Downloads/fullsystem/disks/linaro-minimal-aarch64.img --kernel=/home/farideh/Downloads/fullsystem/binaries/vmlinux.vexpress_gem5_v1_64.20170616 gem5 Simulator System. http://gem5.org gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled May 1 2019 08:46:14 gem5 started Jun 2 2019 11:49:47 gem5 executing on farideh-S551LN, pid 7162 command line: ./build/ARM/gem5.opt configs/example/arm/starter_fs.py --cpu=minor --num-cores=1 --disk-image=/home/farideh/Downloads/fullsystem/disks/linaro-minimal-aarch64.img --kernel=/home/farideh/Downloads/fullsystem/binaries/vmlinux.vexpress_gem5_v1_64.20170616
Global frequency set at 1000000000000 ticks per second warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (2048 Mbytes) info: kernel located at: /home/farideh/Downloads/fullsystem/binaries/vmlinux.vexpress_gem5_v1_64.20170616 warn: Bootloader entry point 0x10 overriding reset address 0 warn: Highest ARM exception-level set to AArch32 but bootloader is for AArch64. Assuming you wanted these to match. system.vncserver: Listening for connections on port 5900 system.terminal: Listening for connections on port 3456 0: system.remote_gdb: listening for remote gdb on port 7000 info: Using bootloader at address 0x10 info: Using kernel entry physical address at 0x80080000 info: Loading DTB file: /home/farideh/gem5/m5out/system.dtb at address 0x88000000 warn: Existing EnergyCtrl, but no enabled DVFSHandler found. info: Entering event queue # 0. Starting simulation... warn: ClockedObject: Already in the requested power state, request ignored warn: SCReg: Access to unknown device dcc0:site0:pos0:fn7:dev0 33342160250: system.terminal: attach terminal 0 49510353000: system.terminal: detach terminal 0 62922704750: system.terminal: attach terminal 0 warn: Tried to read RealView I/O at offset 0x60 that doesn't exist warn: Tried to read RealView I/O at offset 0x48 that doesn't exist warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist simulate() limit reached # 18446744073709551615 root#farideh-S551LN:/home/farideh/gem5#
I do not know that Arm-fullsystem runs or not and why is telnet shows Connection closed by foreign host?

How to solve: UDP send of xxx bytes failed with error 11 in Ubuntu?

UDP send of XXXX bytes failed with error 11
I am running a WebRTC streaming app on Ubuntu 16.04.
It streams video and audio from Logitec HD Webcam c930e within an Electronjs Desktop App.
It all works fine and smooth running on my other machine Macbook Pro. But on my Ubuntu machine I receive errors after 10-20 seconds when the peer connection is established:
[2743:0513/193817.691636:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1019 bytes failed with error 11
[2743:0513/193817.691775:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.696615:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.696777:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.712369:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1029 bytes failed with error 11
[2743:0513/193817.712952:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
[2743:0513/193817.713086:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
[2743:0513/193817.717713:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
==> Btw, if I do NOT stream audio, but video only. I got the same error but only with the "video" between the Log lines...
somewhere in between the lines I also got one line that says:
[3441:0513/195919.377887:ERROR:stunport.cc(506)] sendto: [0x0000000b] Resource temporarily unavailable
I also looked into sysctl.conf and increased the values there. My currenct sysctl.conf looks like this:
fs.file-max=1048576
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
fs.nr_open=1048576
net.core.netdev_max_backlog=1048576
net.core.rmem_max=16777216
net.core.somaxconn=65535
net.core.wmem_max=16777216
net.ipv4.tcp_congestion_control=htcp
net.ipv4.ip_local_port_range=1024 65535
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_orphans=1048576
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_max_tw_buckets=400000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_wmem=4096 65535 16777216
vm.max_map_count=1048576
vm.min_free_kbytes=65535
vm.overcommit_memory=1
vm.swappiness=0
vm.vfs_cache_pressure=50
Like suggested here: https://gist.github.com/cdgraff/7920db287988463aafd7ea09eef6f9f0
It does not seem to help. I am still getting these errors and I experience lagging on the other side.
Additional info: on Ubuntu the Electronjs App connects to Heroku Server (Nodejs) and the other side of the peer connection (Chrome Browser) also connects to it. Heroku Server acts as Handshaking Server to establish WebRTC connection. Both have as configuration:
{'urls': 'stun:stun1.l.google.com:19302'},
{'urls': 'stun:stun2.l.google.com:19302'},
and also an additional Turn Server from numb.viagenie.ca
Connection is established and within the first 10 seconds the quality is very high and there is no lagging at all. But then after 10-20 seconds there is lagging and on the Ubuntu console I am getting these UDP errors.
The PC that Ubuntu is running on:
PROCESSOR / CHIPSET:
CPU Intel Core i3 (2nd Gen) 2310M / 2.1 GHz
Number of Cores: Dual-Core
Cache: 3 MB
64-bit Computing: Yes
Chipset Type: Mobile Intel HM65 Express
RAM:
Memory Speed: 1333 MHz
Memory Specification Compliance: PC3-10600
Technology: DDR3 SDRAM
Installed Size: 4 GB
Rated Memory Speed: 1333 MHz
Graphics
Graphics Processor Intel HD Graphics 3000
Could please anyone give me some hints or anything that could solve this problem?
Thank you
==============EDIT=============
I found in my very large strace log somewhere these two lines:
7671 sendmsg(17, {msg_name(0)=NULL, msg_iov(1)=[{"CHILD_PING\0", 11}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 11
7661 <... recvmsg resumed> {msg_name(0)=NULL, msg_iov(1)=[{"CHILD_PING\0", 12}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, {pid=7671, uid=0, gid=0}}], msg_flags=0}, 0) = 11
On top of that, somewhere near when the error happens (at the end of the log file, just before I quit the application) I see in the log file the following:
https://gist.github.com/Mcdane/2342d26923e554483237faf02cc7cfad
First, to get an impression of what is happening in the first place, I'd look with strace. Start your application with
strace -e network -o log.strace -f YOUR_APPLICATION
If your application looks for another running process to turn the work too, start it with parameters so it doesn't do that. For instance, for Chrome, pass in a --user-data-dir value that is different from your default.
Look for = 11 in the output file log.strace afterwards, and look what happened before and after. This will give you a rough picture of what is happening, and you can exclude silly mistakes like sendtos to 0.0.0.0 or so (For this reason, this is also very important information to include in a stackoverflow question, for instance by uploading the output to gist).
It may also be helpful to use Wireshark or another packet capture program to get a rough overview of what is being sent.
Assuming you can confirm with strace that a valid send call is taken place, you can then further analyze the error conditions.
Error 11 is EAGAIN. The documentation of send says when this error is supposed to happen:
EAGAIN (...) The socket is marked nonblocking and the requested operation would block. (...)
EAGAIN (Internet domain datagram sockets) The socket referred to by
sockfd had not previously been bound to an address and, upon
attempting to bind it to an ephemeral port, it was determined that all
port numbers in the ephemeral port range are currently in use. See
the discussion of /proc/sys/net/ipv4/ip_local_port_range in
ip(7).
Both conditions could apply.
The first will be obvious by the strace log if you trace the creation of the socket involved.
To exclude the second, you can run netstat -una (or, if you want to know the programs involved, sudo netstat -unap) to see which ports are open (if you want Stack Overflow users to look into it, post the output on gist or similar and link to it here). Your port range net.ipv4.ip_local_port_range=1024 65535 is not the standard 32768 60999; this looks like you attempted to do something about lacking port numbers already. It would help to trace back to the reason of why you changed that parameter, and the conditions that convinced you to do so.

Error running topology in production cluster with Apache Storm 1.0.0, topology does not start

I have a topology that runs well on a Local cluster.
But when I try to run it on a production cluster the following things happens:
The nimbus is up
The storm UI is up
The two workers I use are up
Zookeper is up
I run storm with
storm jar myjar.jar MyClass
Nimbus submits the topology
The topologies and the workers appears in the storm UI
BUT:
The topology does not start despite the fact that its status is ACTIVE
The log file of the topology does not appear in the workers.
I have the following log in the worker on the supervisor.log:
2016-04-15 13:18:19.831 o.a.s.d.supervisor [WARN] There was a connection problem with nimbus. #error {
:cause jobs-rec-storm-nimbus
:via
[{:type java.lang.RuntimeException
:message org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: jobs-rec-storm-nimbus
:at [org.apache.storm.security.auth.TBackoffConnect retryNext TBackoffConnect.java 64]}
{:type org.apache.storm.thrift.transport.TTransportException
:message java.net.UnknownHostException: jobs-rec-storm-nimbus
:at [org.apache.storm.thrift.transport.TSocket open TSocket.java 226]}
{:type java.net.UnknownHostException
:message jobs-rec-storm-nimbus
:at [java.net.AbstractPlainSocketImpl connect AbstractPlainSocketImpl.java 184]}]
:trace
[[java.net.AbstractPlainSocketImpl connect AbstractPlainSocketImpl.java 184]
[java.net.SocksSocketImpl connect SocksSocketImpl.java 392]
[java.net.Socket connect Socket.java 589]
[org.apache.storm.thrift.transport.TSocket open TSocket.java 221]
[org.apache.storm.thrift.transport.TFramedTransport open TFramedTransport.java 81]
[org.apache.storm.security.auth.SimpleTransportPlugin connect SimpleTransportPlugin.java 103]
[org.apache.storm.security.auth.TBackoffConnect doConnectWithRetry TBackoffConnect.java 53]
[org.apache.storm.security.auth.ThriftClient reconnect ThriftClient.java 99]
[org.apache.storm.security.auth.ThriftClient <init> ThriftClient.java 69]
[org.apache.storm.utils.NimbusClient <init> NimbusClient.java 106]
[org.apache.storm.utils.NimbusClient getConfiguredClientAs NimbusClient.java 78]
[org.apache.storm.utils.NimbusClient getConfiguredClient NimbusClient.java 41]
[org.apache.storm.blobstore.NimbusBlobStore prepare NimbusBlobStore.java 268]
[org.apache.storm.utils.Utils getClientBlobStoreForSupervisor Utils.java 462]
[org.apache.storm.daemon.supervisor$fn__9590 invoke supervisor.clj 942]
[clojure.lang.MultiFn invoke MultiFn.java 243]
[org.apache.storm.daemon.supervisor$mk_synchronize_supervisor$this__9351$fn__9369 invoke supervisor.clj 582]
[org.apache.storm.daemon.supervisor$mk_synchronize_supervisor$this__9351 invoke supervisor.clj 581]
[org.apache.storm.event$event_manager$fn__8903 invoke event.clj 40]
[clojure.lang.AFn run AFn.java 22]
[java.lang.Thread run Thread.java 745]]}
2016-04-15 13:18:19.831 o.a.s.d.supervisor [INFO] Finished downloading code for storm id jobs-KafkaMigration-topology-3-1460740616
2016-04-15 13:18:19.850 o.a.s.d.supervisor [INFO] Missing topology storm code, so can't launch worker with assignment ...(some more numbers)
So I asume that I have a connection problem with nimbus, but the properties file in the worker is:
storm.zookeeper.servers:
- "192.168.22.209"
- "192.168.22.216"
- "192.168.22.217"
storm.local.dir: "/app/home/storm"
storm.zookeeper.root: "/storm-prod"
#
nimbus.seeds: ["192.168.120.96"]
And if I make a ping to the nimbus ip from the workers, it returns OK
Where is the error, How can I fix it?
Thanks!
Whats appears to happen in this context is that Storm supervisor resolves nimbus from whatever is configured in storm.yaml seeds/host the first time and from then on uses nimbus host name to download the topology artifacts.
If that is correct, DNS is mandatory for a cluster setup. This is far from ideal, specially when using containers in an orchestrated environment like kubernetes.
Current workaround i'm using is adding
storm.local.hostname: "<local.ip.value>"
to the storm.yaml
Thanks to #bastien who provided the tip on storm user mailing list
I ran into the similar issue. Turns out my firewall rules were blocking the supervisor ports. Make sure the supervisor and nimbus are able to talk to each other.
I found that I need to have the hostnames of the boxes match what I was calling them in the /etc/hosts file
in host file i had
xxx.xxx.xxx.xxx nimbus
but the host name on the box was different and it was pulling the hostname from the os
changing the host name on the os of the nimbus server resolved my issue.

RabbitMQ consumes memory and shuts

I just installed OpenStack Juno using devstack, and observed that RabbitMQ (package rabbitmq-server-3.1.5-10 installed by yum) is not stable, i.e. it quickly eats up the memory and shuts down; there is 2G of RAM. Below is the messages from logs and 'systemctl status' before the daemon died:
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
vm_memory_high_watermark clear. Memory used:835116352 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:40 ===
memory resource limit alarm cleared on node rabbit#node
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
accepting AMQP connection <0.27011.5> (10.0.0.11:55198 -> 10.0.0.11:5672)
=INFO REPORT==== 18-Dec-2014::01:25:41 ===
vm_memory_high_watermark set. Memory used:850213192 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:41 ===
memory resource limit alarm set on node rabbit#node.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
rabbitmqctl[770]: ===========
rabbitmqctl[770]: nodes in question: [rabbit#node]
rabbitmqctl[770]: hosts, their running nodes and ports:
rabbitmqctl[770]: - node: [{rabbitmqctl770,40089}]
rabbitmqctl[770]: current node details:
rabbitmqctl[770]: - node name: rabbitmqctl770#node
rabbitmqctl[770]: - home dir: /var/lib/rabbitmq
rabbitmqctl[770]: - cookie hash: FftrRFUESg4RKWsyb1cPqw==
systemd[1]: rabbitmq-server.service: control process exited, code=exited status=2
systemd[1]: Unit rabbitmq-server.service entered failed state.
I know about set_vm_memory_high_watermark, but it doesn't solve the issue. I want to ensure that the daemon doesn't shut down abruptly. I wonder if someone saw this before and could advise?
Thanks.
UPDATE
Upgraded to version 3.4.2 taken directly from www.rabbitmq.com/download.html The new version doesn't consume RAM that fast and tends to work longer then previous version, but eventually still eats out all the memory and shuts.
I think the number of connections in the servers are increasing and they are being held like that without closing that's why it is consuming more memory. When the usage of RAM increases beyond the watermark rabbitmq server won't accept any network request. Either you have to close the connections which all are opened or you have to increase the RAM of the system. But increasing the RAM will only reduce the problem for some time but you'll face the problem again it is better to close the connections.
try to use CloudAMQP instead of installing locally. This will be fixed then. firstly create a rabbitMQ account here. https://customer.cloudamqp.com/signup.
then create your queue there and connect with your application.