Reactive Redis: only one thread is used in a WebFlux service

In a WebFlux server, the detect function first looks up two Redis hash values, then does a heavy CPU calculation on the returned values, saves the result back to Redis, and returns the response. The idea is like this:
@PostMapping("detect")
public Mono<RecognitionResponse> detect(@Valid @RequestBody ImageVo vo) {
    float[] vec = ai.calculateVec(vo.getImg());
    Mono<Matrix> aLib = redisOps.get("aLib", "field").switchIfEmpty(...a default matrix...);
    Mono<Matrix> bLib = redisOps.get("bLib", "field").switchIfEmpty(...a default matrix...);
    return aLib.zipWith(bLib)
        .flatMap(tuple -> {
            R r1 = ai.calculationSimilar(tuple.getT1(), vec);
            R r2 = ai.calculationSimilar(tuple.getT2(), vec);
            if (r1 or r2 is a new image) {
                return redisOps.put("aLib", "field", r1.getVec())
                        .map(b -> new RecognitionResponse(r1, r2));
            } else {
                return Mono.just(new RecognitionResponse(r1, r2));
            }
        });
}
In a performance test, the above works fine on a 4-CPU server: around 380% CPU is used. On a 32-CPU server, only a small part of the CPU is used, and one lettuce-epoll thread shows high CPU consumption, around 90%. I then checked the logs: the whole path from redisOps.get() returning through the heavy CPU calculation runs on a single lettuce-epoll thread. (I printed the thread name in every ai function, and they all report the same lettuce-epoll thread.)
I think querying Redis first and then calling a handler on the returned data is ordinary WebFlux usage. But if only one lettuce-epoll thread is used, performance will be very bad. I am starting to think that the above is a misuse of WebFlux reactive Redis.
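For reference, the usual Reactor pattern to keep the lettuce-epoll (Netty event-loop) thread free is to move the CPU-bound similarity calculation onto another scheduler with publishOn. This is only a minimal sketch: the stub types stand in for the question's real classes, and the choice of Schedulers.parallel() (versus Schedulers.boundedElastic()) is an assumption to be tuned, not part of the original service.

    import reactor.core.publisher.Mono;
    import reactor.core.scheduler.Schedulers;

    public class DetectOffloadSketch {

        // Sketch of the zip-then-calculate step with the heavy work moved off the event loop.
        Mono<RecognitionResponse> detect(Mono<Matrix> aLib, Mono<Matrix> bLib, float[] vec, Ai ai) {
            return aLib.zipWith(bLib)
                    // Everything downstream of publishOn runs on parallel-* threads,
                    // so the lettuce-epoll thread only decodes the Redis responses.
                    .publishOn(Schedulers.parallel())
                    .map(tuple -> {
                        R r1 = ai.calculationSimilar(tuple.getT1(), vec);
                        R r2 = ai.calculationSimilar(tuple.getT2(), vec);
                        return new RecognitionResponse(r1, r2);
                    });
        }

        // Minimal stand-ins for the question's types so the sketch compiles on its own.
        interface Ai { R calculationSimilar(Matrix m, float[] vec); }
        static class Matrix {}
        static class R {}
        static class RecognitionResponse { RecognitionResponse(R r1, R r2) {} }
    }

If the conditional write back to Redis is kept, the flatMap that calls redisOps.put() will still hop to a Lettuce thread for the I/O, which is fine; the point of the sketch is only that calculationSimilar no longer runs on the event loop.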
Supplement
In a test, thread-level top shows that the thread that eats the most CPU is thread 4807:
top - 11:33:00 up 486 days, 9 min, 2 users, load average: 1.55, 1.62, 1.17
Threads: 326 total, 10 running, 316 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.2 us, 1.1 sy, 0.0 ni, 90.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13183149+total, 63810604 free, 35841708 used, 32179184 buff/cache
KiB Swap: 20971516 total, 20971516 free, 0 used. 89526880 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4807 tina 20 0 78.933g 0.013t 44980 R 95.7 11.0 8:13.02 java
14946 tina 20 0 78.933g 0.013t 44980 R 6.0 11.0 0:35.34 java
14733 tina 20 0 78.933g 0.013t 44980 S 5.3 11.0 0:28.25 java
And thread 4807 (0x12c7 in hex) is lettuce-epollEventLoop-5-12:
"lettuce-epollEventLoop-5-12" #197 daemon prio=5 os_prio=0 tid=0x00007f9554002800 nid=0x12c7 runnable [0x00007f911a4ef000]
java.lang.Thread.State: RUNNABLE
at java.lang.StringBuffer.append(StringBuffer.java:388)
- locked <0x000000009b0a79d0> (a java.lang.StringBuffer)
at java.nio.ByteBuffer.toString(ByteBuffer.java:1088)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at java.util.AbstractMap.toString(AbstractMap.java:557)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at io.lettuce.core.output.CommandOutput.toString(CommandOutput.java:151)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at io.lettuce.core.protocol.CommandWrapper.toString(CommandWrapper.java:219)
at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:277)
at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:249)
at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:211)
at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:161)
at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:151)
at io.netty.util.internal.logging.LocationAwareSlf4JLogger.debug(LocationAwareSlf4JLogger.java:115)
at io.lettuce.core.protocol.RedisStateMachine.decode(RedisStateMachine.java:145)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:742)
at io.lettuce.core.protocol.CommandHandler.decode0(CommandHandler.java:706)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:701)
In the jvisualvm results for self time and thread CPU time, io.netty.util.concurrent.SingleThreadEventExecutor.takeTask() is the highest, followed by io.netty.channel.epoll.Native.epollWait.
(jvisualvm screenshots of self time and thread CPU time omitted.)

How to build a histogram of methods by time spent inside with Mono?

I have tried the following:
mono --profile=log myprog.exe
to collect profiler data. Then, to interpret the data, I invoke:
> mprof-report output.mlpd
Mono log profiler data
Profiler version: 2.0
Data version: 14
Arguments: log
Architecture: x86-64
Operating system: linux
Mean timer overhead: 51 nanoseconds
Program startup: Fri Jul 20 00:11:12 2018
Program ID: 19840
Server listening on: 59374
JIT summary
Compiled methods: 8349
Generated code size: 2621631
JIT helpers: 0
JIT helpers code size: 0
GC summary
GC resizes: 0
Max heap size: 0
Object moves: 0
Metadata summary
Loaded images: 16
Loaded assemblies: 16
Exception summary
Throws: 0
Thread summary
Thread: 0x7fb49c50a700, name: ""
Thread: 0x7fb49d27b700, name: "Threadpool worker"
Thread: 0x7fb49d07a700, name: "Threadpool worker"
Thread: 0x7fb49ce79700, name: "Threadpool worker"
Thread: 0x7fb49cc78700, name: "Threadpool worker"
Thread: 0x7fb49d6b9700, name: ""
Thread: 0x7fb4bbff1700, name: "Finalizer"
Thread: 0x7fb4bfe3f740, name: "Main"
Domain summary
Domain: (nil), friendly name: "myprog.exe"
Domain: 0x1d037f0, friendly name: "(null)"
Context summary
Context: (nil), domain: (nil)
However, there's no information about which methods were called often and took long to complete, which was the one thing I expected from profiling.
How do I use Mono profiling to gather and output information about the total run time of method calls, like hprof with cpu=times generates?
The Mono docs are "slightly" wrong, as method calls are not tracked by default. Enabling the calls option creates huge profile log output and massively slows down total execution time, and when combined with other options like alloc it affects the execution time of the methods and thus any timings that are being collected.
Personally I would recommend using calls profiling by itself, adjusting the calldepth to a level that matters to your profiling, i.e. do you need to profile into the framework calls or not? A smaller call depth also greatly decreases the size of the log produced.
Example:
mono --profile=log:calls,calldepth=10 Console_Ling.exe
Produces:
Method call summary
Total(ms) Self(ms) Calls Method name
53358 0 1 (wrapper runtime-invoke) <Module>:runtime_invoke_void_object (object,intptr,intptr,intptr)
53358 2 1 Console_Ling.MainClass:Main (string[])
53340 2 1 Console_Ling.MainClass:Stuff ()
53337 0 3 System.Linq.Enumerable:ToList<int> (System.Collections.Generic.IEnumerable`1<int>)
53194 13347 1 System.Linq.Enumerable/WhereListIterator`1<int>:ToList ()
33110 13181 20000000 Console_Ling.MainClass/<>c__DisplayClass0_0:<Stuff>b__0 (int)
19928 13243 20000000 System.Collections.Generic.List`1<int>:Contains (int)
6685 6685 20000000 System.Collections.Generic.GenericEqualityComparer`1<int>:Equals (int,int)
~~~~
Re: http://www.mono-project.com/docs/debug+profile/profile/profiler/#profiler-option-documentation

arules apriori command hanging or just taking long

I used the apriori command from the arules package on a transaction object, and one of the CPUs went up to about 97% for 20 or so minutes. Then it went down to cycling between 0.7% and 0.3% and has been doing that for about 24 hours, and I do not have the prompt back in RStudio; just a blinking cursor. I have 2,666 transactions and 376 items. There is probably a lot of similarity among some of the transactions, meaning that some transactions can share over 100 items.
This is the first time I have used this package, so I was wondering whether this behavior is normal and what I should do.
I am running CentOS 7 with 24 GB RAM and 16 CPUs, using RStudio Server.
My command:
rules <- apriori(adjacdmMtrans, parameter =list(support = 0.002, confidence=0.75))
Some info put out by arules after entering the above command:
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target ext
0.75 0.1 1 none FALSE TRUE 0.002 1 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 5
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[376 item(s), 2666 transaction(s)] done [0.03s].
sorting and recoding items ... [376 item(s)] done [0.01s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4
Your machine probably runs out of memory for the R process and starts swapping. In the worst case, apriori has to create, for 376 items, on the order of 10^12 candidates of length 4. Start with a higher support value.
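As a rough back-of-the-envelope for where a number of that magnitude can come from (a worst-case bound given the stated 376 items and 2,666 transactions, not what apriori actually enumerates after pruning):

    \binom{376}{4} = \frac{376 \cdot 375 \cdot 374 \cdot 373}{4!} \approx 8.2 \times 10^{8}
    8.2 \times 10^{8} \times 2666 \approx 2.2 \times 10^{12}

That is roughly 10^9 possible length-4 itemsets and on the order of 10^12 support checks against the transactions, which is why a low support threshold such as 0.002 can blow up memory and run time.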

Wrong balance between Aerospike instances in cluster

I have an application with a high load for batch read operations. My Aerospike cluster (v 3.7.2) has 14 servers, each one with 7GB RAM and 2 CPUs in Google Cloud.
By looking at Google Cloud Monitoring graphs, I noticed a very unbalanced load between servers: some servers have almost 100% CPU load, while others have less than 50% (image omitted). Even after hours of operation, the unbalanced pattern doesn't change.
Is there any configuration that I could change to make this cluster more homogeneous? How to optimize node balancing?
Edit 1
All servers in the cluster have identical aerospike.conf files:
# Aerospike database configuration file.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    paxos-recovery-policy auto-reset-master
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 32
    batch-index-threads 32
    proto-fd-max 15000
    batch-max-requests 200000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        #address any
        port 3000
    }
    heartbeat {
        mode mesh
        mesh-seed-address-port 10.240.0.6 3002
        mesh-seed-address-port 10.240.0.5 3002
        port 3002
        interval 150
        timeout 20
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace test {
    replication-factor 3
    memory-size 5G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true
    storage-engine device {
        file /data/aerospike.dat
        write-block-size 1M
        filesize 180G
    }
}
Edit 2:
$ asinfo
1 : node
BB90600F00A0142
2 : statistics
cluster_size=14;cluster_key=E3C3672DCDD7F51;cluster_integrity=true;objects=3739898;sub-records=0;total-bytes-disk=193273528320;used-bytes-disk=26018492544;free-pct-disk=86;total-bytes-memory=5368709120;used-bytes-memory=239353472;data-used-bytes-memory=0;index-used-bytes-memory=239353472;sindex-used-bytes-memory=0;free-pct-memory=95;stat_read_reqs=2881465329;stat_read_reqs_xdr=0;stat_read_success=2878457632;stat_read_errs_notfound=3007093;stat_read_errs_other=0;stat_write_reqs=551398;stat_write_reqs_xdr=0;stat_write_success=549522;stat_write_errs=90;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=4;stat_rw_timeout=1862;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=7021;stat_proxy_reqs_xdr=0;stat_proxy_success=2121;stat_proxy_errs=4739;stat_ldt_proxy=0;stat_cluster_key_err_ack_dup_trans_reenqueue=607;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;stat_compressed_pkts_received=0;err_tsvc_requests=110;err_tsvc_requests_timeout=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=17;err_rw_pending_limit=19;err_rw_cant_put_unique=0;geo_region_query_count=0;geo_region_query_cells=0;geo_region_query_points=0;geo_region_query_falsepos=0;fabric_msgs_sent=58002818;fabric_msgs_rcvd=57998870;paxos_principal=BB92B00F00A0142;migrate_msgs_sent=55749290;migrate_msgs_recv=55759692;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=7228;migrate_num_incoming_refused=0;queue=0;transactions=101978550;reaped_fds=6;scans_active=0;basic_scans_succeeded=0;basic_scans_failed=0;aggr_scans_succeeded=0;aggr_scans_failed=0;udf_bg_scans_succeeded=0;udf_bg_scans_failed=0;batch_index_initiate=40457778;batch_index_queue=0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0;batch_index_complete=40456708;batch_index_timeout=1037;batch_index_errors=33;batch_index_unused_buffers=256;batch_index_huge_buffers=217168717;batch_index_created_buffers=217583519;batch_index_destroyed_buffers=217583263;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=7021;proxy_action=5519;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=551089;write_prole=1055431;read_dup_prole=14232;rw_err_dup_internal=0;rw_err_dup_cluster_key=1814;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=1767;rw_err_ack_badnode=0;client_connections=366;waiting_transactions=0;tree_count=0;record_refs=3739898;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=296;partition_replica=572;partition_desync=0;partition_absent=3228;partition_zombie=0;partition_object_count=3739898;partition_ref_count=4096;system_free_mem_pct=61;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_r
eplica_non_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=90;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=53184;uptime=1001388;stat_write_errs_notfound=0;stat_write_errs_other=90;heartbeat_received_self=0;heartbeat_received_foreign=145137042;query_reqs=0;query_success=0;query_fail=0;query_abort=0;query_avg_rec_count=0;query_short_running=0;query_long_running=0;query_short_queue_full=0;query_long_queue_full=0;query_short_reqs=0;query_long_reqs=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0
3 : features
cdt-list;pipelining;geo;float;batch-index;replicas-all;replicas-master;replicas-prole;udf
4 : cluster-generation
61
5 : partition-generation
11811
6 : edition
Aerospike Community Edition
7 : version
Aerospike Community Edition build 3.7.2
8 : build
3.7.2
9 : services
10.0.3.1:3000;10.240.0.14:3000;10.0.3.1:3000;10.240.0.27:3000;10.0.3.1:3000;10.240.0.5:3000;10.0.3.1:3000;10.240.0.43:3000;10.0.3.1:3000;10.240.0.30:3000;10.0.3.1:3000;10.240.0.18:3000;10.0.3.1:3000;10.240.0.42:3000;10.0.3.1:3000;10.240.0.33:3000;10.0.3.1:3000;10.240.0.24:3000;10.0.3.1:3000;10.240.0.37:3000;10.0.3.1:3000;10.240.0.41:3000;10.0.3.1:3000;10.240.0.13:3000;10.0.3.1:3000;10.240.0.23:3000
10 : services-alumni
10.0.3.1:3000;10.240.0.42:3000;10.0.3.1:3000;10.240.0.5:3000;10.0.3.1:3000;10.240.0.13:3000;10.0.3.1:3000;10.240.0.14:3000;10.0.3.1:3000;10.240.0.18:3000;10.0.3.1:3000;10.240.0.23:3000;10.0.3.1:3000;10.240.0.24:3000;10.0.3.1:3000;10.240.0.27:3000;10.0.3.1:3000;10.240.0.30:3000;10.0.3.1:3000;10.240.0.37:3000;10.0.3.1:3000;10.240.0.43:3000;10.0.3.1:3000;10.240.0.33:3000;10.0.3.1:3000;10.240.0.41:3000
I have a few comments about your configuration. First, transaction-threads-per-queue should be set to 3 or 4 (don't set it to the number of cores).
The second has to do with your batch-read tuning. You're using the (default) batch-index protocol, and the config params you'll need to tune for batch-read performance are:
You have batch-max-requests set very high. This is probably affecting both your CPU load and your memory consumption. It's enough that there's a slight imbalance in the number of keys you're accessing per node, and that will show up in the graphs you've shown. At least, this is possibly the issue. It's better to iterate over smaller batches than to try to fetch 200K records per node at a time (see the sketch after this list of parameters).
batch-index-threads – by default its value is 4, and you set it to 32 (of a max of 64). You should adjust this incrementally, running the same test and benchmarking the performance each time. On each iteration adjust higher, then lower if performance has decreased. For example: test with 32, +8 = 40, +8 = 48, -4 = 44. There's no easy rule of thumb for this setting; you'll need to tune it through iterations on the hardware you'll be using and monitor the performance.
batch-max-buffer-per-queue – this is more directly linked to the number of concurrent batch-read operations the node can support. Each batch-read request will consume at least one buffer (more if the data cannot fit in 128K). If you do not have enough of these allocated to support the number of concurrent batch-read requests you will get exceptions with error code 152 BATCH_QUEUES_FULL . Track and log such events clearly, because it means you need to raise this value. Note that this is the number of buffers per-queue. Each batch response worker thread has its own queue, so you'll have batch-index-threads x batch-max-buffer-per-queue buffers, each taking 128K of RAM. The batch-max-unused-buffers caps the memory usage of all these buffers combined, destroying unused buffers until their number is reduced. There's an overhead to allocating and destroying these buffers, so you do not want to set it too low compared to the total. Your current cost is 32 x 256 x 128KB = 1GB.
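As referenced above, here is a minimal client-side sketch of iterating over smaller batches instead of issuing one huge request. The key set, set name, and chunk size are hypothetical placeholders; the calls used (AerospikeClient.get with a BatchPolicy and a Key[] array) are the standard Java client batch-read API, but tune the numbers against your own workload.

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Key;
    import com.aerospike.client.Record;
    import com.aerospike.client.policy.BatchPolicy;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class ChunkedBatchRead {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("10.240.0.5", 3000); // any cluster node
            BatchPolicy policy = new BatchPolicy();

            // Hypothetical key set; in practice this comes from your application.
            Key[] allKeys = new Key[100_000];
            for (int i = 0; i < allKeys.length; i++) {
                allKeys[i] = new Key("test", "demo", "user-" + i);
            }

            int chunkSize = 5_000; // well below batch-max-requests; a placeholder to tune
            List<Record> results = new ArrayList<>();
            for (int start = 0; start < allKeys.length; start += chunkSize) {
                int end = Math.min(start + chunkSize, allKeys.length);
                Key[] chunk = Arrays.copyOfRange(allKeys, start, end);
                // One batch-index request per chunk; missing records come back as null entries.
                results.addAll(Arrays.asList(client.get(policy, chunk)));
            }

            System.out.println("Fetched " + results.size() + " slots");
            client.close();
        }
    }

Keeping each chunk well below batch-max-requests bounds the per-node buffer usage discussed above, instead of asking every node for up to 200K records in a single call.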
Finally, you're storing your data on a filesystem. That's fine for development instances, but not recommended for production. In GCE you can provision either a SATA SSD or an NVMe SSD for your data storage, and those should be initialized, and used as block devices. Take a look at the GCE recommendations for more details. I suspect you have warnings in your log about the device not keeping up.
It's likely that one of your nodes is an outlier with regard to the number of partitions it has (and therefore the number of objects). You can confirm it with asadm -e 'asinfo -v "objects"'. If that's the case, you can terminate that node and bring up a new one. This will force the partitions to be redistributed. This does trigger a migration, which takes quite a bit longer in the CE server than in the EE one.
For anyone interested, Aerospike Enterprise 4.3 introduced 'uniform-balance', which homogeneously balances data partitions. Read more here: https://www.aerospike.com/blog/aerospike-4-3-all-flash-uniform-balance/

Aerospike cluster not cleaning available blocks

We use Aerospike in our projects and ran into a strange problem.
We have a 3-node cluster, and after some node restarts it stopped working.
So we set up a test to reproduce our problem.
We made a test cluster: 3 nodes, replication factor = 2.
Here is our namespace config
namespace test {
    replication-factor 2
    memory-size 100M
    high-water-memory-pct 90
    high-water-disk-pct 90
    stop-writes-pct 95
    single-bin true
    default-ttl 0
    storage-engine device {
        cold-start-empty true
        file /tmp/test.dat
        write-block-size 1M
    }
}
We wrote 100 MB of test data, and afterwards we had this situation:
available pct was about 66% and disk usage about 34%.
All good :)
Then we stopped one node. After migration we saw available pct = 49% and disk usage = 50%.
We returned the node to the cluster, and after migration disk usage went back to about 32%, but available pct on the old nodes stayed at 49%.
We stopped a node one more time:
available pct = 31%
Repeating one more time, we got this situation:
available pct = 0%
Our cluster crashed, and clients got AerospikeException: Error Code 8: Server memory error.
So how can we clean up available pct?
If your defrag-q is empty (and you can see whether it is by grepping the logs) then the issue is likely to be that your namespace is smaller than your post-write-queue. Blocks on the post-write-queue are not eligible for defragmentation, so you would see avail-pct trending down with no defragmentation to reclaim the space. By default the post-write-queue is 256 blocks, so in your case that equates to 256 MB. If your namespace is smaller than that you will see avail-pct continue to drop until you hit stop-writes. You can reduce the size of the post-write-queue dynamically (i.e. no restart needed) using the following command; here I suggest 8 blocks:
asinfo -v 'set-config:context=namespace;id=<NAMESPACE>;post-write-queue=8'
If you are happy with this value you should amend your aerospike.conf to include it so that it persists after a node restart.
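For concreteness, with the 1M write-block-size from the namespace config above, the default queue works out to (a rough reading of the numbers in this thread, not an exact accounting):

    256 \text{ blocks} \times 1\,\text{MB (write-block-size)} = 256\,\text{MB}

which is larger than the roughly 100 MB of test data written, so essentially every written block can sit on the post-write-queue and never become eligible for defragmentation.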

How to Configure the Web Connector from metrics.log Values

I am reviewing the ColdFusion Web Connector settings in workers.properties to hopefully address a sporadic response time issue.
I've been advised to inspect the output from the metrics.log file (CF Admin > Debugging & Logging > Debug Output Settings > Enable Metric Logging) and use this to inform the adjustments to the settings max_reuse_connections, connection_pool_size and connection_pool_timeout.
My question is: How do I interpret the metrics.log output to inform the choice of setting values? Is there any documentation that can guide me?
Examples from a period of over 120 hours:
95% of entries -
"Information","scheduler-2","06/16/14","08:09:04",,"Max threads: 150 Current thread count: 4 Current thread busy: 0 Max processing time: 83425 Request count: 9072 Error count: 72 Bytes received: 1649 Bytes sent: 22768583 Free memory: 124252584 Total memory: 1055326208 Active Sessions: 1396"
Occurred once -
"Information","scheduler-2","06/13/14","14:20:22",,"Max threads: 150 Current thread count: 10 Current thread busy: 5 Max processing time: 2338 Request count: 21 Error count: 4 Bytes received: 155 Bytes sent: 139798 Free memory: 114920208 Total memory: 1053097984 Active Sessions: 6899"
Environment:
3 x Windows 2008 R2 (hardware load balanced)
ColdFusion 10 (update 12)
Apache 2.2.21
Richard, I realize your question here is from 2014, and perhaps you have since resolved it, but I suspect your problem was that the port set in the CF admin (below the "metrics log" checkbox) was set to 8500, which is your internal web server (used by the CF admin only, typically, if at all). That's why the numbers are not changing. (And for those who don't enable the internal web server at installation of CF, or later, most values in the metrics log are null).
I address this problem in a blog post I happened to do just last week: http://www.carehart.org/blog/client/index.cfm/2016/3/2/cf_metrics_log_part1
Hope any of this helps.