shell script that displays a CPU utilization percentage on AIX servers - aix

I want to get the consolidated percentage: at the moment I execute the script, it should display output like "50% cpu utilization overall", something like that.
Is there a way I can achieve this in AIX?

# On AIX, columns 14 and 15 of vmstat's summary line are the us (user)
# and sy (system) CPU percentages.
CPU=$(vmstat | tail -1 | awk '{print $14 + $15}')
echo "$CPU% cpu utilization overall"
This just adds the user and sys CPU percentages from vmstat.
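Note that without an interval argument, the figures vmstat reports are averages since boot, not the utilization at this moment. A minimal variant that takes a fresh sample (assuming your vmstat accepts interval and count arguments, as AIX's does):
# "vmstat 2 2" prints the since-boot line first, then a fresh 2-second
# sample; tail -1 keeps only the fresh sample.
CPU=$(vmstat 2 2 | tail -1 | awk '{print $14 + $15}')
echo "$CPU% cpu utilization overall"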

Related

Awk slower than lz4 decompression

I have half a million files with 290 MB each that mostly consist of numbers.
I'd like to (routinely) filter through this data, but find that awk is slower than decompression.
For example,
/usr/bin/time unlz4 bigfile.lz4 --stdout > /dev/null
0.20user 0.05system 0:00.44elapsed 57%CPU
/usr/bin/time unlz4 bigfile.lz4 --stdout | awk '{if ($26>120.) print}' > /dev/null
0.25user 0.25system 0:01.35elapsed 37%CPU
Notes:
Before each timing, I cleared the page cache.
The size of the output data is small and not relevant. In this exercise the output is discarded altogether.
awk here is gawk 5.0.1 on Ubuntu.
Tried mawk instead of awk. It didn't make a difference.
I wrote a C program that reads the data with fscanf. It was significantly slower than awk.
I tried reading from HDD and SSD. awk is slower than unlz4 for both.
Each lz4 file has about 66 MB (compressed from 290 MB).
Using uncompressed files is even slower. cat bigfile | awk '{if ($26>120.) print}' > /dev/null
I conclude that it does not help to use a fast decompression format like lz4, instead of stronger and slower compression formats, because even the simplest filtering with awk will be the bottleneck.
Does anybody have any insight or bright ideas about this? Is there a way to speed this up or have I hit the physical limit?
It looks like your files are small enough that startup time is a significant contributor to your runtime. Simply put, unlz4 | anything > /dev/null is always going to take a little longer than unlz4 > /dev/null, because both ends of the pipeline need to start before processing can happen. So measuring a larger time for the pipeline case doesn't necessarily mean that the consumer is slower than the producer, or that your choice of compression algorithm is irrelevant. If you want to measure the impact of changing your compression algorithm, change your compression algorithm, and measure it!
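One way to separate the two costs is to time the consumer with no pipeline involved at all. A sketch, assuming the uncompressed data is available as bigfile:
# Cost of awk alone, reading the already-decompressed data directly
/usr/bin/time awk '{if ($26>120.) print}' bigfile > /dev/null
# Pipeline plus startup overhead with a trivial consumer, for comparison
/usr/bin/time sh -c 'unlz4 bigfile.lz4 --stdout | cat > /dev/null'
If awk on its own is still slower than unlz4 on its own, the filtering really is the bottleneck; if not, you were mostly measuring pipeline startup.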

How does piping handle multiple files in Linux?

So naive me wanted to parse 50 files using awk, and did the following:
zcat dir_with_50files/* > huge_file
cat huge_file | awk '{parsing}'
Of course, this was terrible because it would spend time creating a file, then consume a whole bunch of memory to pass along to awk.
Then a coworker showed me that I could do this.
zcat dir_with_50files/filename{0..50} | awk '{parsing}'
I was amazed that I would get the same results without the memory consumption.
ps aux also showed that the two commands ran in parallel. I was confused about what was happening and this SO answer partially answered my question.
https://stackoverflow.com/a/1072251/6719378
But if piping knows to initiate the second command after a certain amount of data has been buffered, why does my naive approach consume so much more memory than the second approach?
Is it because I am using cat on a single file compared to loading multiple files?
You can reduce maximum memory usage by running zcat file by file,
ex:
for f in dir_with_50files/*
do
    zcat "$f" | awk '{parsing}' >> Result.File
done
# or
find dir_with_50files/ -exec zcat {} \; | awk '{parsing}' >> Result.File
but it depends on your parsing:
ok for modifying, deleting, or copying lines when there is no relation to previous records (ex: sub(/foo/, "bar"))
bad, when awk runs once per file as in the loop above, for counters (ex: List[$2]++) or anything that relates records across files (ex: NR != FNR {...}; !List[$2]++ {...}); see the sketch below
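For instance (a sketch reusing the List[$2]++ example), deduplicating on the second column behaves differently in the two approaches:
# awk runs once per file, so its state resets between files: a value of
# $2 that appears in several files is printed once per file.
for f in dir_with_50files/*
do
    zcat "$f" | awk '!List[$2]++' >> Result.File
done
# A single awk over one stream keeps one List array for all the input,
# so each value is printed exactly once.
zcat dir_with_50files/* | awk '!List[$2]++' > Result.File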

Parallel processing in awk?

Awk processes the files line by line. Assuming each line operation has no dependency on other lines, is there any way to make awk process multiple lines at a time in parallel?
Is there any other text processing tool which automatically exploits parallelism and processes the data quicker ?
The only awk implementation that attempted to provide a parallel implementation of awk was parallel-awk, but it looks like that project is dead now.
Otherwise, one way to parallelize awk is to split your input into chunks and process them in parallel. However, splitting the input data is itself single-threaded, which might defeat the performance goal; the main issue is that the standard split command is unable to split at line boundaries without reading each and every line.
If you have GNU split available, or a version that supports the -n l/* option, here is one optimized way to process your file in parallel, assuming here that you have 8 vCPUs:
inputfile=input.txt
outputfile=output.txt
script=script.awk
count=8
split -n l/$count "$inputfile" /tmp/_pawk$$
for file in /tmp/_pawk$$*; do
    awk -f "$script" "$file" > "$file.out" &
done
wait
cat /tmp/_pawk$$*.out > "$outputfile"
rm /tmp/_pawk$$*
You can use GNU Parallel for this purpose.
Suppose you are summing the numbers in a big file:
cat rands20M.txt | awk '{s+=$1} END {print s}'
With GNU Parallel you can do it across multiple parallel jobs:
cat rands20M.txt | parallel --pipe awk \'{s+=\$1} END {print s}\' | awk '{s+=$1} END {print s}'
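By default, --pipe hands each awk job blocks of roughly 1 MB of complete lines; GNU Parallel's --block option changes the block size, which can cut down on per-job awk startup overhead (the 10M here is just an illustrative value):
cat rands20M.txt | parallel --pipe --block 10M awk \'{s+=\$1} END {print s}\' | awk '{s+=$1} END {print s}'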

Scripts Linux - awk

I have a small problem. I need to explain what this awk does.
I need to write a script that monitors whether the system is overloaded (CPU, RAM) and writes a message.
I have this:
if [[ $(bc <<< "$(top -b -n1 | grep ^Cpu | awk -F': ' '{print $2}' | awk -F% '{print $1}') >= 100") -eq 1 ]]; then
    echo '...'
fi
This is for CPU. Can anybody explain to me what the awk does in this example? And what would the awk look like for RAM?
The first awk invocation prints the second field of each line, where the field separator is the two-character string ": " (a colon followed by a space).
The second prints the first field of each line, where fields are separated by a percent sign (%).
To get the used memory on a Linux system:
free | awk '/Mem:/ {print $3;}'
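If you would rather have memory usage as a percentage, to mirror the CPU check, the used and total columns of the same Mem: line can be combined (a sketch; in this free output $2 is total and $3 is used):
# Percentage of RAM in use: used ($3) over total ($2)
free | awk '/Mem:/ { printf "%.1f\n", $3 / $2 * 100 }'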
This is a very fragile script, based upon what looks like an old version of top. It is very easy to inspect this piece of script, though -- so let's go through it. We start with the following:
top -b -n1
Which (reading the manual for top) places top into batch mode (meaning that instead of interacting with top, we want to send its output to another command) and outputs a single iteration. That will get us output like the following:
$ top -b -n1
top - 10:48:33 up 1 day, 22:51, 3 users, load average: 1.21, 1.27, 1.03
Tasks: 262 total, 2 running, 260 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.5 us, 5.2 sy, 11.3 ni, 67.3 id, 1.6 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 8124692 total, 6722112 used, 1402580 free, 384188 buffers
KiB Swap: 4143100 total, 430656 used, 3712444 free. 2909664 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11012 user1 20 0 426412 14436 5740 R 97.1 0.2 19:27.98 dleyna-renderer
4579 root 20 0 286480 152924 31152 S 13.0 1.9 24:15.49 Xorg
1 root 20 0 185288 4892 3352 S 0.0 0.1 0:02.52 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:02.77 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 1:32.00 rcu_sched
When we pipe this to grep ^Cpu ... well, it looks like this is where we discover some breakage, which indicates that the version of top we are using in this answer differs in output from the version the original script expected. It looks like the intent is to match ^%Cpu instead. Here is the corrected piece:
$ top -b -n1 | grep ^%Cpu
%Cpu(s): 14.6 us, 5.2 sy, 11.2 ni, 67.3 id, 1.6 wa, 0.0 hi, 0.1 si, 0.0 st
The next piece of the pipe is to just get rid of the '%Cpu(s): ' piece:
$ top -b -n1 | grep ^%Cpu | awk -F': ' '{print $2}'
15.1 us, 5.0 sy, 10.8 ni, 67.4 id, 1.6 wa, 0.0 hi, 0.1 si, 0.0 st
And then the next piece... awk -F% '{print $1}' -- again, this doesn't make sense for this version of top: the script wants to print whatever is to the left of a % sign, and there is no % in our output. So we are left wondering where to go from here.
From the rest of the script... the result of the pipeline is compared to 100... so I assume the version of top that the script was meant to parse had the total CPU utilization percentage in the first column. In our version of top, the output is broken out with much more granularity. Here is the breakdown for the immediately preceding output:
15.1% -- spent in normal priority user/applications
5.0% -- spent in system/kernel
10.8% -- spent in low priority batch or daemon jobs
67.4% -- spent "idle"
1.6% -- spent waiting for I/O to complete
0.0% -- spent in servicing HW interrupts
0.1% -- spent in servicing software interrupts
0.0% -- spent stolen by another VM running on the HW
------------------------------------------------------
100.0% -- Total
... So, on modern Linux systems, top provides a lot more information, and maybe we need to look at the problem differently. In this case, we could look at (idle * 10) as the metric -- as in shell, we only have integer math and comparison available to us. So, we will adjust the script a little... and while we are at it, let's get rid of the grep in the pipeline as that can just as easily be done by awk as well:
$ top -b -n1 | awk -F, '/^%Cpu/ {print $4}'
67.8 id
Now let's adjust it so that it gives us just the idle value multiplied by 10:
$ top -b -n1 | awk -F, '/^%Cpu/ { sub(/id/,"",$4); print $4*10 }'
678
OK, the next part of the original script uses bc to see if we are 100% utilized. As we are now looking at idle rather than utilization, we want the opposite of the original script. Also, we don't need the complication of bc now that the output is scaled to an integer, so let's just use the shell's own comparison:
$ if [ $(top -b -n1 | awk -F, '/^%Cpu/ { sub(/id/,"",$4); print $4*10 }') -le 0 ]; then echo '...'; fi
And that is it.
This answer was all about showing how the code works -- how to interpret and parse the output of top through a pipeline, how to go about figuring out what a piece of script does, and how to go about repairing a fragile/broken script. However, the original script is not only fragile but pretty much broken by design. A better metric for detecting an overloaded system is the "load average" found on the first line of the output of the top command, or better yet parsed from the output of the uptime command.
One way to detect overload is to look at the load average divided by the number of CPUs. The number of CPUs can be found easily by parsing /proc/cpuinfo:
$ grep ^processor /proc/cpuinfo | wc -l
4
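As an aside (not part of the original script), on Linux systems with GNU coreutils the nproc command reports the same count directly:
$ nproc
4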
Here is one example where 400% load over 15 minutes is considered to be the continuous loading threshold:
load=$(uptime | awk -F, '{ print $(NF) * 1.0 }')
proc=$(grep ^processor /proc/cpuinfo | wc -l)
plod=$(awk "BEGIN { x = 100 * $load / $proc; print int(x) + int(x+x)%2 }")
if [ $plod -gt 400 ]; then echo '...'; fi
note: int(x) + int(x+x)%2 is a rounding function
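For example, with a 15-minute load average of 17.25 on 4 CPUs: x = 100 * 17.25 / 4 = 431.25, so int(x) = 431, and int(x+x) % 2 = 862 % 2 = 0, giving plod = 431 -- over the 400 threshold, so the message fires.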
For the amount of free memory on the system, I like schtever's answer -- except that I would use column 4 rather than column 3 and check that for low memory.

Getting the amount of memory processes use from 'ps aux' output with awk, in MB

I'm a newbie with Linux, so... I have an exercise to do: using only ps, grep, awk, and gawk, determine the total amount of memory used by SQL processes. How can I determine with ps how much memory is used by a process? Mem in ps is displayed in percent...
Thanks a lot.
# Sum the %MEM column ($4) over every ps line that matches "sql"; the
# [s]ql pattern keeps this awk's own command line from matching itself.
ps auwx | awk '/[s]ql/ {total += $4} END {print total}'
You might want to match $11 ~ /sql/ instead, to find actual (my)sql processes rather than any command with "sql" somewhere in its arguments.
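If the exercise really wants megabytes rather than a percentage, ps aux also carries an RSS column ($6, in kilobytes) that can be summed directly. A sketch along the same lines as the answer above:
# Sum resident set size ($6, KiB) of processes whose command is sql-ish,
# then convert to MB
ps aux | awk '$11 ~ /sql/ { rss += $6 } END { printf "%.1f MB\n", rss / 1024 }'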