I am running Nessus on a Red Hat 7 machine with 10 GB RAM and 2 sockets. A scaled-down scan of specific machines (VMs in vCenter) finishes fine, but a full scan of all assets never finishes; I ran it over the weekend and it still hadn't completed. What I need to know is which logs I can check to troubleshoot the issue. I am not the originator of the scan jobs or scans; I inherited this setup and couldn't tell you the particulars about plugins and such. I have looked through nessusd.messages a little, but found nothing useful other than a bunch of machines reported as 'dead'. I looked for those IPs in vCenter and they don't exist.
There has to be a log I can interrogate.
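For reference, here is where I have been looking so far, a sketch assuming a default Linux install under /opt/nessus (log file names can differ between Nessus versions):

# List the scanner's own logs and watch/grep the main one
ls -lh /opt/nessus/var/nessus/logs/
tail -f /opt/nessus/var/nessus/logs/nessusd.messages
grep -iE "dead|timeout|error" /opt/nessus/var/nessus/logs/nessusd.messages
# Backend errors and crashes tend to land in the dump log
tail -n 200 /opt/nessus/var/nessus/logs/nessusd.dump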
We are currently upgrading a TYPO3 installation with about 60,000 pages to v9.
The upgrade wizard "Introduce URL parts ("slugs") to all existing pages" does not finish. In the browser (Install Tool) I get a timeout.
Calling it via
./vendor/bin/typo3cms upgrade:wizard pagesSlugs
results in the following error:
[ Symfony\Component\Process\Exception\ProcessSignaledException ]
The process has been signaled with signal "9".
After consulting my favourite internet search engine, I think that most likely means "out of memory".
Sadly, the database doesn't seem to be touched at all, so no pages get a slug; simply re-running the process several times will not help. Watching the process, PHP takes all the memory it can get and then fills the swap; when the swap is full, the process crashes.
Tested so far in a local Docker setup on a host with 16 GB RAM, and on a server with 8 cores but 8 GB RAM (the DB is on an external machine).
Any ideas to fix that?
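In case it is simply PHP's own CLI limit, a sketch of how to check it and pin it explicitly for a single run (the 4G value is an arbitrary example, and this assumes the Composer-based install shown above):

# What memory is CLI PHP currently allowed?
php -r 'echo ini_get("memory_limit"), PHP_EOL;'
# Run the wizard once with an explicit limit
php -d memory_limit=4G ./vendor/bin/typo3cms upgrade:wizard pagesSlugs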
After debugging I found out that the reason for this is broken relations in the database: there are non-deleted pages that point to non-existent parents. This was mainly caused by a heavy clean-up of the database beforehand. While the wizard could be improved to check for this, the main problem in this case is my database.
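For anyone hitting the same thing, a quick way to spot such orphaned pages might be the following sketch (it assumes the standard TYPO3 pages schema; database name and credentials are placeholders):

# Non-deleted pages whose parent (pid) no longer exists
mysql -u typo3 -p typo3_db -e "
  SELECT p.uid, p.pid, p.title
  FROM pages p
  LEFT JOIN pages parent ON parent.uid = p.pid
  WHERE p.deleted = 0 AND p.pid <> 0 AND parent.uid IS NULL;"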
On a Compute Engine VM in us-west1-b, I run 16 vCPUs at close to 99% usage. After a few hours the VM crashes. This is not a one-time incident, and I have to restart the VM manually.
There are a few instances of CPU usage suddenly dropping to around 30%, then bouncing back to 99%.
There are no logs for the VM at the time of the crash. Is there any other way to get the error logs?
How do I prevent VMs from crashing?
[CPU usage graph]
This could be your process manager saying that your processes are out of resources. You might want to look into kernel tuning, where you can increase the limits on the number of active processes on your VM/OS and the resources they may use, or you can simply try a bigger machine with more physical resources. In short, your machine is falling short on resources, so to keep the OS up the process manager shuts processes down. SSH is one of those processes, which is why everything only comes back to normal once you reset the machine.
How the process manager/kernel decides to kill a process varies. It could simply be that a process has stayed up long enough to consume too many resources. Also note that the OS images you use to create a VM on GCP are custom-hardened by Google to limit the malicious capabilities of processes running on such machines.
One of the best ways to tackle this is:
increase the resources of your VM
then go back to the code and find out whether something is leaking processes or memory
if all else fails, you might try some kernel tuning to give your processes higher priority than other system processes, though this is risky since you could end up creating a zombie VM.
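To confirm whether the kernel's OOM killer (or a panic) is actually what takes the VM down, and to recover logs that never reach Cloud Logging, something like this may help (a sketch; the instance name is a placeholder):

# Serial console output often captures kernel OOM/panic messages
gcloud compute instances get-serial-port-output my-vm --zone us-west1-b
# After the VM is back up, check the previous boot's kernel messages
journalctl -k -b -1 | grep -iE "out of memory|oom|killed process"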
I've been using 3 identical VMs on Azure for a month or more without problem.
Today I couldn't Remote Desktop to one of them, and restarted it from the Azure Portal. That took a long time. It eventually came back up, and the Event log has numerous entries such as:
"The IO operation at logical block address 70 for Disk 0 ..... was retried"
"Windows cannot access the file C:\windows\Microsoft.Net\v4.0.30319\clrjit.dll for one of the following reasons, network, disk etc.
There are lots of errors like this. To me they seem symptomatic that the underlying disk system is having serious problems. Given the VHD is stored in a triple replicated Azure blob, I would have thought there was some immunity to this kind of thing?
Many hours later it's still doing the same thing. It works fine for a few hours, then slows to a crawl with the Event log containing lots of disk problems. I can upload screen shots of the event log if people are interested.
This is a pretty vanilla VM, I'm only using the one OS disk it came with.
The other two identical VMs in the same region are fine.
Just wondering if anybody has seen this before with Azure VMs and how to safeguard against it, or recover from it.
Thanks.
Thank you for providing all the details and we apologize for the inconvenience. We have investigated the failures and determined that they were caused by a platform issue. Your virtual machine’s disk does not have any problems and therefore you should be able to continue using it as is.
THE PROBLEM
My hosting provider gave me an ultimatum (3 business days):
"We regret to say That database is currently consuming excessive resources on our servers Which causes our servers to degrade performance Affecting ITS customers to other database driven sites are hosted on this server That. The database / tables / queries statistical information's are provided below:
Avg. queries / logged / killed: 79,500 / 0 / 0
There are several reasons why the number of queries can increase. Unused plugins will increase the number of queries. If plugins are not causing the issue, you can block the IP addresses of the spammers, which will reduce the queries. You can also look for any spam content in the database and clean it up.
You need to check the top hitters on the Stats page. Depending on the bandwidth used, top hits, and IPs, you need to take specific action on them to reduce database queries. You should block the unknown robots (identified by 'bot *'), since these bots are scraping content from your website, spamming your blog comments, harvesting email addresses, sniffing for security holes in your scripts, and trying to use your mail form scripts as relays to send spam email. The .htaccess editor tool is available to block IP addresses."
THE BACKGROUND
The site was built 100% by us in VB.NET with MySQL on a Windows platform (except the Snitz Forum). The only place we received spam was a comment form, which now has a CAPTCHA. We are talking about more than 4,000 files (tools, articles, forums, etc.) for a total of 19 GB of space; just uploading it takes me two weeks.
STATISTICS OF ROBOTS
Awstats tells us for the month of February 2012:
ROBOTS AND SPIDERS
Googlebot: 2,572,945 (+303) accesses, 5.35 GB
Unknown robot (identified by 'bot *'): 772,520 (+2,740) accesses, 259.55 MB
BaiDuSpider: 96,639 (+95) accesses, 320.02 MB
Google AdSense: 35,907 accesses, 486.16 MB
MJ12bot: 33,567 (+1,208) accesses, 844.52 MB
Yandex bot: 18,876 (+104) accesses, 433.84 MB
[...]
STATISTICS OF IP
41.82.76.159: 11,681 pages, 12,078 accesses, 581.68 MB
87.1.153.254: 9,807 pages, 10,734 accesses, 788.55 MB
[...]
Other: 249,561 pages, 4,055,612 accesses, 59.29 GB
THE SITUATION
Help!!! I don't know how to block IPs with .htaccess, and I'm not even sure which IPs to block! On top of that, the Awstats report is missing the past 4 days!
I have already tried changing the FTP and account passwords in the past, with no effect. I don't think these are targeted attacks; they look like generic attempts aimed at obtaining backlinks and redirects (which often don't even work)!
This isn't really an .htaccess issue. Look at your own stats: you've had ~4M hits generating some 12 KB per hit in the last 4 days. I ran the OpenOffice.org user forums for 5 years, and this sort of access rate can be typical for a busy forum. I used to run on a dedicated quad-core box, but migrated to a modern single-core VM, and when tuned it handled this sort of load.
The relative bot volumetrics are also not surprising as a percentage of these volumes, nor are the ~75K D/B queries.
I think what your hosting provider is pointing out is that you are using an unacceptable amount of system (D/B) resources for your type of account. You either need to upgrade your hosting plan or examine how you can optimise your database use: are your tables properly indexed, and do you routinely do a check/analyze/optimize of all tables? If not, then you should!
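If you have shell access, a sketch of how to do that across the whole database with mysqlcheck (database name and credentials are placeholders):

mysqlcheck --check    -u db_user -p your_db
mysqlcheck --analyze  -u db_user -p your_db
mysqlcheck --optimize -u db_user -p your_db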
It may well be that spammers are exploiting your forum for SPAM link posts, but you need to look at the content in the first instance to see if this is the case.
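If it does turn out that a handful of IPs are hammering the site, a minimal .htaccess block might look like the following sketch. The two IPs are taken from the Awstats report above purely as examples, and the directives use the classic Apache 2.2 style, so adjust if your host runs something else:

# Append deny rules for specific IPs to the site's .htaccess
cat >> .htaccess <<'EOF'
Order Allow,Deny
Allow from all
Deny from 41.82.76.159
Deny from 87.1.153.254
EOF

If you only have the host's .htaccess editor tool rather than shell access, paste the four directives between the EOF markers into it instead.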
I have an active/passive W2K8 (64) cluster pair, running SQL05 Standard. Shared storage is on a HP EVA SAN (FC).
I recently expanded the filesystem on the active node for a database, adding a drive designation. The shared storage drives are designated as F:, I:, J:, L: and X:, with SQL filesystems on the first 4 and X: used for a backup destination.
Last night, as part of a validation process (the passive node had been offline for maintenance), I moved the SQL instance to the other cluster node. The database in question immediately moved to Suspect status.
Review of the system logs showed that the database would not load because the file "K:\SQLDATA\whatever.ndf" could not be found. (Note that we do not have a K: drive designation.)
A review of the J: storage drive showed zero contents -- nothing -- this is where "whatever.ndf" should have been.
Hmm, I thought. Problem with the server. I'll just move SQL back to the other server and figure out what's wrong.
Still no database. Suspect. Uh-oh. "Whatever.ndf" had gone into the bit bucket.
I finally decided to just restore from the backup (which had been taken immediately before the validation test), so nothing was lost but a few hours of sleep.
The question: (1) Why did the passive node think the whatever.ndf files were supposed to go to drive "K:", when this drive didn't exist as a resource on the active node?
(2) How can I get the cluster nodes "re-syncd" so that failover can be accomplished?
I don't know that there wasn't a "K:" drive as a cluster resource at some time in the past, but I do know that this drive did not exist on the cluster at the time of the resource move.
Random thought based on what happened to me a few months ago... it sounds quite similar.
Do you have NTFS mount points? I forget exactly what it was (I'm a code monkey and relied on the DBAs), but the mount points were either "double booked", not part of the cluster resource, or the SAN volumes were not configured correctly.
We had "zero size" drives (I used xp_fixeddrives) for our log files, but we could still write to them.
Assorted reboots and failovers were unsuccessful; in the end it took a thorough review of all the settings in the SAN management tool.
A possibility for your K: drive...
The other thing I've seen is mounted drives that have drive letters as well as being mounted in folders. I used to use mounted folders for SQL Server, but the backup system used a direct drive letter.
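Either way, it is worth comparing the paths SQL Server has recorded for the database files against the drives the node can actually see. A sketch (server, instance, and database names are placeholders):

# Where SQL Server believes each file lives
sqlcmd -S YOURCLUSTER\INSTANCE -E -Q "SELECT name, physical_name, state_desc FROM sys.master_files WHERE database_id = DB_ID('YourDatabase');"
# Drives visible to the node currently hosting the instance
sqlcmd -S YOURCLUSTER\INSTANCE -E -Q "EXEC master..xp_fixeddrives;"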