Top high IO consuming process - sar

I would like to find the processes that consume the most I/O. I tried sar, but it does not display process PIDs. Please suggest an efficient way to do this.
From previously asked questions I understand there is a utility called iotop. Unfortunately, I can't install it on our systems without organizational approval. Please suggest alternatives.
sar > file_sar
cat file_sar|sort -nr -k7,7|head
12:10:01 AM all 15.06 0.00 6.59 0.53 0.10 77.72
01:50:02 AM all 16.67 0.00 6.30 0.20 0.08 76.74
12:50:01 AM all 11.09 0.00 2.64 0.18 0.08 86.01
12:30:02 AM all 12.44 0.00 1.68 0.17 0.08 85.63
01:10:02 AM all 13.43 0.00 1.83 0.16 0.11 84.47
01:30:02 AM all 13.21 0.00 5.86 0.13 0.07 80.73
Average: all 13.20 0.00 3.94 0.15 0.08 82.63
12:20:01 AM all 10.94 0.00 1.53 0.07 0.08 87.38
12:40:02 AM all 8.17 0.00 1.28 0.06 0.07 90.42
01:20:01 AM all 15.35 0.00 6.14 0.06 0.09 78.36
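sar only reports system-wide figures, so the %iowait column above cannot be tied to a PID. If the sysstat package that provides sar also ships pidstat on your systems, pidstat -d reports per-process disk I/O. Failing that, Linux exposes cumulative per-process counters in /proc/<pid>/io. Below is a minimal sketch that ranks processes by those counters; the function names are my own, it assumes a kernel built with I/O accounting, and reading other users' processes usually needs root. The counters are totals since each process started, not rates.

```python
import os


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def top_io_processes(proc_root="/proc", limit=10):
    """Return up to `limit` (total_bytes, pid, name) tuples, largest first."""
    entries = os.listdir(proc_root) if os.path.isdir(proc_root) else []
    rows = []
    for entry in entries:
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "io")) as fh:
                io = parse_proc_io(fh.read())
        except OSError:
            continue  # process exited, or permission denied
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            name = "?"
        # read_bytes/write_bytes count I/O that actually reached storage.
        total = io.get("read_bytes", 0) + io.get("write_bytes", 0)
        rows.append((total, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for total, pid, name in top_io_processes():
        print(f"{pid:>7} {total:>15} {name}")
```

To get a rate rather than a lifetime total, run it twice a few seconds apart and compare the totals per PID.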

Printing specific columns as a percentage

I have a MultiIndex DataFrame and I want to convert the values of two columns into percentages.
Capacity\nMWh Day-Ahead\nMWh Intraday\nMWh UEVM\nMWh ... Cost Per. MW\n(with Imp.)\n$/MWh Cost Per. MW\n(w/o Imp.)\n$/MWh Intraday\nMape Day-Ahead\nMape
Power Plants Date ...
powerplant1 2020 January 3.6 446.40 492.70 482.50 ... 0.05 0.32 0.04 0.10
2020 February 0.0 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00
2020 March 0.0 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00
2020 April 0.0 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00
I used apply('{:.0%}'.format):
nested_df[['Intraday\nMape', 'Day-Ahead\nMape']] = \
nested_df[['Intraday\nMape', 'Day-Ahead\nMape']].apply('{:.0%}'.format)
But I got this error:
TypeError: ('unsupported format string passed to Series.__format__', 'occurred at index Intraday\nMape')
How can I solve that?
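Use DataFrame.applymap, which applies the function to each element rather than to each column (pandas 2.1 and later call this DataFrame.map):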
nested_df[['Intraday\nMape', 'Day-Ahead\nMape']] = \
nested_df[['Intraday\nMape', 'Day-Ahead\nMape']].applymap('{:.0%}'.format)
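The error comes from what the formatter receives: DataFrame.apply hands it a whole column (a Series), and a Series does not understand the '.0%' format spec, while an element-wise call hands it one float at a time. Here is a small standard-library sketch of that difference. Column is a hypothetical stand-in for a Series, not a pandas class, and the values are made up for illustration.

```python
class Column:
    """Hypothetical stand-in for a pandas Series: a container of values
    that does not define its own __format__."""

    def __init__(self, values):
        self.values = list(values)


fmt = "{:.0%}".format
col = Column([0.04, 0.10, 0.0])

# Formatting the container as a whole fails, like apply() on a column.
try:
    fmt(col)
    whole_column_error = None
except TypeError as exc:
    whole_column_error = str(exc)

# Formatting each element works, like applymap() on a DataFrame.
per_element = [fmt(v) for v in col.values]

print(whole_column_error)
print(per_element)
```

Note that the formatted values are strings, so the two columns can no longer be used for arithmetic afterwards.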

Varnish not caching simple GET request to backend

I'm experimenting with Varnish 6.0.3 (community version) as a caching server, installed on CentOS 7.6.
I deploy Varnish behind an NGINX instance that handles SSL offloading and proxying. Varnish sends requests to another NGINX (a Kubernetes ingress controller), which proxies them on to a Java Spring Boot backend.
The NGINX in front of Varnish is configured as follows:
server {
listen 443 ssl;
server_name cache.mydomain.io;
ssl_certificate /opt/ssl/my.crt;
ssl_certificate_key /opt/ssl/my.key;
access_log /var/log/nginx/cache.mydomain.io-access.log;
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_pass http://10.100.0.7:6081;
}
}
I'm using Varnish with essentially the default configuration:
backend default {
.host = "10.100.16.128";
.port = "80";
}
sub vcl_recv {
# Happens before we check if we have this in cache already.
#
# Typically you clean up the request here, removing cookies you don't need,
# rewriting the request, etc.
}
sub vcl_backend_response {
# Happens after we have read the response headers from the backend.
#
# Here you clean the response headers, removing silly Set-Cookie headers
# and other mistakes your backend does.
}
sub vcl_deliver {
# Happens when we have all the pieces we need and are about to send the
# response to the client.
#
# You can do accounting or modifying the final object here.
}
After this, I started sending requests as follows:
curl 'https://cache.mydomain.io/health'
{"status":"UP"}
and I can see the request pass through the whole chain, from NGINX to Varnish to the Java backend.
But when I repeat the same GET request again and again, I never get a cache hit on the Varnish side. According to varnishstat, all the requests are misses served from the backend.
Uptime mgt: 0+00:22:30 Hitrate n: 10 100 133
Uptime child: 0+00:22:31 avg(n): 0.0000 0.0000 0.0000
NAME CURRENT CHANGE AVERAGE AVG_10 AVG_100 AVG_1000
MGT.uptime 0+00:22:30
MAIN.uptime 0+00:22:31
MAIN.sess_conn 37 0.00 0.03 0.00 0.23 0.24
MAIN.client_req 37 0.00 0.03 0.00 0.23 0.24
MAIN.cache_hitmiss 35 0.00 0.03 0.00 0.22 0.23
MAIN.cache_miss 37 0.00 0.03 0.00 0.23 0.24
MAIN.backend_conn 2 0.00 0.00 0.00 0.01 0.01
MAIN.backend_reuse 35 0.00 0.03 0.00 0.22 0.23
MAIN.backend_recycle 37 0.00 0.03 0.00 0.23 0.24
MAIN.fetch_chunked 37 0.00 0.03 0.00 0.23 0.24
MAIN.pools 2 0.00 . 2.00 2.00 2.00
MAIN.threads 200 0.00 . 200.00 200.00 200.00
MAIN.threads_created 200 0.00 0.15 0.00 0.00 0.00
MAIN.n_object 1 0.00 . 1.00 1.00 1.00
MAIN.n_objectcore 1 0.00 . 1.00 1.00 1.00
MAIN.n_objecthead 3 0.00 . 3.00 2.93 2.92
MAIN.n_backend 1 0.00 . 1.00 1.00 1.00
MAIN.n_expired 1 0.00 0.00 0.00 0.00 0.00
MAIN.s_sess 37 0.00 0.03 0.00 0.23 0.24
MAIN.s_fetch 37 0.00 0.03 0.00 0.23 0.24
MAIN.s_req_hdrbytes 7.37K 0.00 5.59 0.00 47.23 49.35
MAIN.s_resp_hdrbytes 31.26K 0.00 23.70 0.01 200.29 209.30
MAIN.s_resp_bodybytes 555 0.00 0.41 0.00 3.47 3.63
MAIN.sess_closed 37 0.00 0.03 0.00 0.23 0.24
MAIN.backend_req 37 0.00 0.03 0.00 0.23 0.24
MAIN.n_vcl 1 0.00 . 1.00 1.00 1.00
MAIN.bans 1 0.00 . 1.00 1.00 1.00
SMA.s0.g_space 256.00M 0.00 . 256.00M 256.00M 256.00M
SMA.Transient.c_req 111 0.00 0.08 0.00 0.69 0.73
SMA.Transient.c_bytes 624.05K 0.00 473.00 0.29 3.90K 4.08K
SMA.Transient.c_freed 623.20K 0.00 472.36 0.29 3.90K 4.07K
SMA.Transient.g_alloc 1 0.00 . 1.00 1.00 1.00
SMA.Transient.g_bytes 872 0.00 . 872.00 872.00 872.00
VBE.boot.default.bereq_hdrbytes 8.54K 0.00 6.47 0.00 54.75 57.22
VBE.boot.default.beresp_hdrbytes 29.16K 0.00 22.10 0.01 186.84 195.24
VBE.boot.default.beresp_bodybytes 555 0.00 0.41 0.00 3.47 3.63
VBE.boot.default.req 37 0.00 0.03 0.00 0.23 0.24
Can you help me understand what's going wrong with my actual configuration?
I eventually moved to caching with NGINX instead, as documented here: https://serversforhackers.com/c/nginx-caching
It works really well.
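For anyone who lands here with the same counters: MAIN.cache_hitmiss at 35 out of 37 requests suggests Varnish looked the object up and found a hit-for-miss marker, which it creates when an earlier backend response was judged uncacheable. With an empty vcl_backend_response, that judgement falls to the built-in VCL. The sketch below approximates those built-in checks as I understand them for Varnish 6.0 (the TTL check is left out); the function name and return strings are my own, so compare against the builtin.vcl shipped with your version before relying on it.

```python
import re


def looks_uncacheable(headers):
    """Return a reason string if the backend response headers would
    probably make the built-in VCL mark the object hit-for-miss,
    else None. Approximation only; the beresp.ttl <= 0 check is omitted."""
    h = {k.lower(): v for k, v in headers.items()}
    if "set-cookie" in h:
        return "Set-Cookie present"
    if re.search(r"no-store", h.get("surrogate-control", ""), re.I):
        return "Surrogate-Control: no-store"
    if "surrogate-control" not in h and re.search(
        r"no-cache|no-store|private", h.get("cache-control", ""), re.I
    ):
        return "Cache-Control forbids caching"
    if h.get("vary", "").strip() == "*":
        return "Vary: *"
    return None


if __name__ == "__main__":
    # Made-up example headers; substitute what the backend really sends.
    example = {"Cache-Control": "no-cache, no-store, max-age=0"}
    print(looks_uncacheable(example))
```

Feed it the headers the backend actually returns for /health (for example from curl -sI against 10.100.16.128). If a reason comes back, either the backend needs to send cacheable headers or vcl_backend_response has to override them.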

Awk: Removing duplicate lines without sorting after matching conditions

I've got a list of devices from which I need to remove duplicates (keeping only the first occurrence) while preserving order and matching a condition. In this case I'm looking for a specific string and then printing the field with the device name. Here is some example raw data from the sar application:
10:02:01 AM sdc 0.70 0.00 8.13 11.62 0.00 1.29 0.86 0.06
10:02:01 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:02:01 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sdc 1.31 3.73 99.44 78.46 0.02 17.92 0.92 0.12
Average: sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:05:01 AM sdc 2.70 0.00 39.92 14.79 0.02 5.95 0.31 0.08
10:05:01 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:05:01 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:06:01 AM sdc 0.83 0.00 10.00 12.00 0.00 0.78 0.56 0.05
11:04:01 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11:04:01 AM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sdc 0.70 2.55 8.62 15.91 0.00 1.31 0.78 0.05
Average: sda 0.12 0.95 0.00 7.99 0.00 0.60 0.60 0.01
Average: sdb 0.22 1.78 0.00 8.31 0.00 0.54 0.52 0.01
The following gives me the list of devices from lines containing the word "Average", but it sorts the output:
sar -dp | awk '/Average/ {devices[$2]} END {for (device in devices) {print device}}'
sda
sdb
sdc
The following gives me exactly what I want (command from here):
sar -dp | awk '/Average/ {print $2}' | awk '!devices[$0]++'
sdc
sda
sdb
Maybe I'm missing something painfully obvious but I can't figure out how to do the same in one awk command, that is without piping the output of the first awk into the second awk.
You can do:
sar -dp | awk '/Average/ && !devices[$2]++ {print $2}'
sdc
sda
sdb
The problem is the for (device in devices) loop. awk arrays are associative, and for (key in array) visits the keys in an unspecified, implementation-dependent order, so the output order need not match the input order.
awk '/Average/ && !devices[$2]++ {print $2}' sar.in
You just need to combine the two tests. The only caveat is that the second awk in the original sees only the device name as its whole line ($0), so in the combined command you need to replace $0 with $2.
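The !devices[$2]++ idiom is a "first time seen" test: the counter is zero (false) the first time a name appears, so the negation is true only once per name. For comparison, here is the same filter sketched in Python with an explicit set. The function name is my own, and unlike awk it skips matching lines that have fewer than two fields instead of printing an empty name.

```python
def first_seen_devices(lines, marker="Average"):
    """Return field 2 of every line containing `marker`,
    keeping only the first occurrence and preserving input order."""
    seen = set()
    ordered = []
    for line in lines:
        fields = line.split()
        if marker in line and len(fields) >= 2 and fields[1] not in seen:
            seen.add(fields[1])
            ordered.append(fields[1])
    return ordered


if __name__ == "__main__":
    sample = [
        "10:02:01 AM sdc 0.70 0.00 8.13 11.62 0.00 1.29 0.86 0.06",
        "Average: sdc 1.31 3.73 99.44 78.46 0.02 17.92 0.92 0.12",
        "Average: sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
        "Average: sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
        "Average: sdc 0.70 2.55 8.62 15.91 0.00 1.31 0.78 0.05",
    ]
    print("\n".join(first_seen_devices(sample)))
```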

Why does my SQLite3 query take so long?

Context: I developed a read-only VFS for SQLite using the C API. I need both speed and small file size. I solved the size problem with an LZ4-based VFS. However, I have some speed issues when I query my DB.
Specifications:
- I work on Linux (Ubuntu 12.10)
- DB files are 275MB compressed and about 700MB uncompressed
- I am doing queries on indexed fields.
- I evaluate the time taken for a given query after dropping caches (echo 3 | sudo tee /proc/sys/vm/drop_caches)
Problem:
When I query the DB with the command time, I get the following output:
real 0m5.933s
user 0m0.124s
sys 0m0.096s
What is surprising is the difference between user+sys and real. This is why I decided to profile, with gprof, the code I have written as well as its dependencies (sqlite3, lz4). Below are a few lines of the gprof flat profile and call graph. Beyond that, I have no idea what to look for to find a solution, mainly because I do not understand why (and where) all this time is spent. I hope you can help me.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
93.75 0.15 0.15 2948 0.05 0.05 LZ4_decompress_fast
6.25 0.16 0.01 26068 0.00 0.01 sqlite3VdbeCursorMoveto
0.00 0.16 0.00 54088 0.00 0.00 sqlite3GetVarint
0.00 0.16 0.00 28459 0.00 0.00 sqlite3VdbeSerialGet
0.00 0.16 0.00 23708 0.00 0.00 sqlite3VdbeMemNulTerminate
0.00 0.16 0.00 23699 0.00 0.00 sqlite3_transfer_bindings
0.00 0.16 0.00 11910 0.00 0.00 sqlite3DbMallocSize
0.00 0.16 0.00 11883 0.00 0.00 sqlite3VdbeMemGrow
0.00 0.16 0.00 11851 0.00 0.00 sqlite3VdbeMemMakeWriteable
0.00 0.16 0.00 9480 0.00 0.00 fetchPayload
0.00 0.16 0.00 9478 0.00 0.00 sqlite3VdbeMemRelease
0.00 0.16 0.00 7170 0.00 0.02 btreeGetPage
0.00 0.16 0.00 7170 0.00 0.00 pcache1Fetch
0.00 0.16 0.00 7170 0.00 0.00 releasePage
0.00 0.16 0.00 7170 0.00 0.02 sqlite3PagerAcquire
0.00 0.16 0.00 7170 0.00 0.00 sqlite3PagerUnrefNotNull
0.00 0.16 0.00 7170 0.00 0.00 sqlite3PcacheFetch
0.00 0.16 0.00 7170 0.00 0.00 sqlite3PcacheRelease
0.00 0.16 0.00 7169 0.00 0.00 pcache1Unpin
0.00 0.16 0.00 7169 0.00 0.00 pcacheUnpin
0.00 0.16 0.00 7168 0.00 0.02 getAndInitPage
0.00 0.16 0.00 7165 0.00 0.02 moveToChild
granularity: each sample hit covers 2 byte(s) for 6.25% of 0.16 seconds
index % time self children called name
<spontaneous>
[1] 100.0 0.00 0.16 main [1]
0.00 0.16 1/1 sqlite3_exec <cycle 1> [5]
0.00 0.00 1/1 openDatabase [18]
0.00 0.00 2/3 sqlite3_vfs_find [249]
0.00 0.00 1/1 sqlite3_crodvfs [358]
0.00 0.00 1/14 sqlite3_vfs_register <cycle 5> [204]
0.00 0.00 1/1 sqlite3_open_v2 [359]
-----------------------------------------------
[2] 99.9 0.00 0.16 1+198 <cycle 1 as a whole> [2]
0.00 0.16 2 sqlite3_exec <cycle 1> [5]
0.00 0.00 2 sqlite3InitOne <cycle 1> [27]
0.00 0.00 124 sqlite3Parser <cycle 1> [82]
0.00 0.00 6+4 sqlite3WalkSelect <cycle 1> [186]
0.00 0.00 9 sqlite3ReadSchema <cycle 1> [149]
0.00 0.00 7 sqlite3LockAndPrepare <cycle 1> [168]
0.00 0.00 7 sqlite3Prepare <cycle 1> [171]
0.00 0.00 7 sqlite3RunParser <cycle 1> [172]
0.00 0.00 5 sqlite3InitCallback <cycle 1> [196]
0.00 0.00 5 sqlite3_prepare <cycle 1> [203]
0.00 0.00 5 sqlite3SelectPrep <cycle 1> [198]
0.00 0.00 4 sqlite3LocateTable <cycle 1> [213]
0.00 0.00 3 sqlite3StartTable <cycle 1> [243]
0.00 0.00 2 selectExpander <cycle 1> [277]
0.00 0.00 2 sqlite3Select <cycle 1> [299]
0.00 0.00 2 sqlite3CreateIndex <cycle 1> [282]
0.00 0.00 2 sqlite3_prepare_v2 <cycle 1> [312]
0.00 0.00 2 resolveSelectStep <cycle 1> [275]
0.00 0.00 2 resolveOrderGroupBy <cycle 1> [274]
0.00 0.00 1 sqlite3Init <cycle 1> [347]
-----------------------------------------------
0.00 0.16 2374/2374 sqlite3_step [4]
[3] 99.9 0.00 0.16 2374 sqlite3VdbeExec [3]
0.01 0.15 26068/26068 sqlite3VdbeCursorMoveto [6]
0.00 0.00 53/54 moveToLeftmost [17]
0.00 0.00 2/3 sqlite3BtreeBeginTrans [20]
0.00 0.00 1/2370 sqlite3BtreeMovetoUnpacked [16]
0.00 0.00 2372/2372 sqlite3BtreeNext [28]
0.00 0.00 1/2371 moveToRoot [23]
0.00 0.00 26068/28459 sqlite3VdbeSerialGet [34]
0.00 0.00 23699/23708 sqlite3VdbeMemNulTerminate [35]
0.00 0.00 23699/23699 sqlite3_transfer_bindings [36]
0.00 0.00 11851/11851 sqlite3VdbeMemMakeWriteable [39]
0.00 0.00 7108/7109 sqlite3BtreeKeySize [49]
0.00 0.00 4741/9480 fetchPayload [40]
0.00 0.00 4739/4739 sqlite3VdbeMemFromBtree [55]
0.00 0.00 4739/9478 sqlite3VdbeMemRelease [41]
0.00 0.00 2374/2376 sqlite3VdbeMemShallowCopy [65]
0.00 0.00 2372/2376 sqlite3VdbeCheckFk [64]
0.00 0.00 2372/2372 sqlite3VdbeCloseStatement [66]
0.00 0.00 2372/4742 btreeParseCellPtr [54]
0.00 0.00 2372/4742 btreeParseCell [53]
0.00 0.00 2370/2391 sqlite3VdbeRecordCompare [63]
0.00 0.00 2369/2369 sqlite3VdbeIntValue [67]
0.00 0.00 893/893 sqlite3VdbeRealValue [74]
0.00 0.00 3/3 sqlite3VdbeFreeCursor [245]
0.00 0.00 3/3 allocateCursor [226]
0.00 0.00 3/3 sqlite3BtreeCursor [237]
0.00 0.00 2/9 sqlite3VdbeHalt [152]
0.00 0.00 1/36 sqlite3BtreeLeave [98]
0.00 0.00 1/6 sqlite3BtreeGetMeta [185]
0.00 0.00 1/1 sqlite3GetVarint32 [345]
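A note on reading these numbers: gprof samples the program only while it is running on a CPU, so its 0.16 s total lines up with user+sys, not with the 5.9 s of real time. Time the process spends blocked, for example waiting on disk reads after the caches were dropped, does not show up in the profile at all. The sketch below illustrates the same gap with Python's two clocks; the sleep is only a stand-in for a blocking read, and the helper names are my own.

```python
import time


def wall_and_cpu(fn):
    """Run fn() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return (
        time.perf_counter() - wall_start,
        time.process_time() - cpu_start,
    )


def blocked_wait():
    # Stand-in for a read that blocks on disk: the process is asleep
    # and accumulates wall-clock time but almost no CPU time.
    time.sleep(0.2)


wall, cpu = wall_and_cpu(blocked_wait)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s waiting={wall - cpu:.3f}s")
```

Timing the read calls of the VFS with a wall clock in the same way, rather than with a CPU profiler, would show whether the missing seconds are spent waiting on the disk.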

Deleting entire columns from a text file using the cut command or an awk program

I have a text file in the form below. Could someone show me how to delete columns 2, 3, 4, 5, 6 and 7? I want to keep only columns 1, 8 and 9.
37.55 6.00 24.98 0.00 -2.80 -3.90 26.675 './gold_soln_CB_FragLib_Controls_m1_9.mol2' 'ethyl'
38.45 1.39 27.36 0.00 -0.56 -2.48 22.724 './gold_soln_CB_FragLib_Controls_m2_6.mol2' 'pyridin-2-yl(pyridin-3-yl)methanone'
38.47 0.00 28.44 0.00 -0.64 -2.42 20.387 './gold_soln_CB_FragLib_Controls_m3_3.mol2' 'pyridin-2-yl(pyridin-4-yl)methanone'
42.49 0.07 30.87 0.00 -0.03 -3.24 22.903 './gold_soln_CB_FragLib_Controls_m4_5.mol2' '(3-chlorophenyl)(pyridin-3-yl)methanone'
38.20 1.47 27.53 0.00 -1.13 -3.28 22.858 './gold_soln_CB_FragLib_Controls_m5_2.mol2' 'dipyridin-4-ylmethanone'
41.87 0.57 30.53 0.00 -0.67 -3.16 22.829 './gold_soln_CB_FragLib_Controls_m6_9.mol2' '(3-chlorophenyl)(pyridin-4-yl)methanone'
38.18 1.49 27.09 0.00 -0.56 -1.63 7.782 './gold_soln_CB_FragLib_Controls_m7_1.mol2' '3-hydrazino-6-phenylpyridazine'
39.45 1.50 27.71 0.00 -0.15 -4.17 17.130 './gold_soln_CB_FragLib_Controls_m8_6.mol2' '3-hydrazino-6-phenylpyridazine'
41.54 4.10 27.71 0.00 -0.65 -4.44 9.702 './gold_soln_CB_FragLib_Controls_m9_4.mol2' '3-hydrazino-6-phenylpyridazine'
41.05 1.08 29.30 0.00 -0.31 -2.44 28.590 './gold_soln_CB_FragLib_Controls_m10_3.mol2' '3-hydrazino-6-(4-methylphenyl)pyridazine'
Try:
awk '{print $1"\t"$8"\t"$9}' yourfile.tsv > only189.tsv
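awk splits on runs of whitespace by default, which is why the one-liner works even if the columns are padded with several spaces. If you need the same selection inside a script, here is a rough Python equivalent. The function name is my own, and it assumes, as in your sample, that no field contains whitespace.

```python
def keep_columns(line, keep=(1, 8, 9), sep="\t"):
    """Return the 1-based columns in `keep`, joined by `sep`,
    or None if the line has too few whitespace-separated fields."""
    fields = line.split()
    if len(fields) < max(keep):
        return None
    return sep.join(fields[i - 1] for i in keep)


if __name__ == "__main__":
    sample = (
        "37.55 6.00 24.98 0.00 -2.80 -3.90 26.675 "
        "'./gold_soln_CB_FragLib_Controls_m1_9.mol2' 'ethyl'"
    )
    print(keep_columns(sample))
```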