rabbitmq-c consumer not receiving all messages - rabbitmq

I have acknowledgements (ACK) enabled on the consumer, and the producer sends 2000 messages to the server. What I see is that only around 1700 messages are received at the consumer. Can someone tell me what is wrong?
I am running the provided example code from the rabbitmq-c library:
./amqp_producer localhost 5672 1000
1000 ms: Sent 1000 - 1000 since last report (999 Hz)
PRODUCER - Message count: 2000
Total time, milliseconds: 2001
Overall messages-per-second: 999.083
root@ce-bras-mx240-e:/usr/sbin/rabbitmq_server-3.6.6 # sbin/rabbitmqctl list_connections send_cnt
Listing connections ...
2007
root@ce-bras-mx240-e:/usr/sbin/rabbitmq_server-3.6.6 # sbin/rabbitmqctl list_channels messages_unacknowledged
Listing channels ...
0
# ./amqp_consumer localhost 5672
3275 ms: Received 1 - 1 since last report (0 Hz)
3275 ms: Received 2 - 1 since last report (1919 Hz)
3277 ms: Received 3 - 1 since last report (656 Hz)
4001 ms: Received 727 - 724 since last report (999 Hz)
5000 ms: Received 1727 - 1000 since last report (1001 Hz)
Only 1727 out of 2000 messages are received at the consumer. The consumer has the no-ack flag set to 0, i.e. explicit acknowledgements are enabled.

It was a display issue only. There was a bug in the summary-printing code of amqp_consumer.c in the provided library, which incremented the timestamp for collecting the next summary incorrectly.
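For reference, the reporting logic looks roughly like the sketch below; the names (next_summary_time, SUMMARY_EVERY_US) follow the example's style, but this is an illustration from memory, not the verbatim upstream source. Every message is delivered and counted; the summary line is only printed once now passes the deadline, so if the deadline is advanced from a stale base instead of the current time, the final messages never show up in a printed total:
#include <stdint.h>
#include <stdio.h>
#define SUMMARY_EVERY_US 1000000 /* print a summary roughly once per second */
/* Illustrative sketch of the bug class, not the exact amqp_consumer.c code. */
void maybe_report(uint64_t now, uint64_t *next_summary_time,
                  int received, int *previous_received) {
  if (now > *next_summary_time) {
    printf("Received %d - %d since last report\n",
           received, received - *previous_received);
    *previous_received = received;
    /* buggy variant: *next_summary_time += SUMMARY_EVERY_US;
       the deadline drifts relative to the actual message arrivals */
    *next_summary_time = now + SUMMARY_EVERY_US; /* fixed: re-anchor on 'now' */
  }
}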

Related

Apache Intermittent Hang: Is it Network Lag?

I have an intermittent lag on the web applications I am serving from Apache on a Debian box. Apache and MySQL check out. I am far from fully utilizing the box's CPU and memory, yet there is still an intermittent lag. My theory is that there is a network rate limit that needs to be tweaked. Stats below.
Apache Server Status
Current Time: Tuesday, 02-Jun-2020 14:36:53 EDT
Restart Time: Monday, 01-Jun-2020 01:00:03 EDT
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 day 13 hours 36 minutes 50 seconds
Server load: 2.95 3.23 3.09
Total accesses: 1213060 - Total Traffic: 22.0 GB - Total Duration: 32311929295
CPU Usage: u396.94 s164.31 cu2065.15 cs789.27 - 2.52% CPU load
8.96 requests/sec - 170.5 kB/second - 19.0 kB/request - 26636.7 ms/request
296 requests currently being processed, 66 idle workers
WR.WWWW.KWW_W._W_KWWWWWWKWWWWW_WWWWK_WK_WWW_WW_RWWWWWKCWWWWWW._W
_WW_R_W_.__K_WWWW__WWWWWWKKWWWWWWKWWWW_W____WWWWWWWW_WWW_KWWWWWW
WWWWWWWW_.WWWWWK_WWW_WWKWWWWWWKWWKWK_WWWWWRKWWW.WW_KKWKWWWKW_WWW
WW.W_.K._WWWK_WW_K_K._WW..WWWWWWW_.W_WWWW_W_W.W_WWWW_.WWKWK_WKWW
_W_WWWW_W.WWWWWW.WWWW_K__..W.WW_WWWWWWWWKRW_WWW_C.W_KW_WWW_KW.._
..WWWWWWWCWWW.WWW_WKKWWWW_._WWW.....WWW.W_W.W._.KW...W...WWW.WWW
W..W..K..WW_.W._................W..._W.W.....K.W.K_...R..K...W.W
...W..W.............................................
top
top - 14:31:14 up 79 days, 21:39, 3 users, load average: 2.26, 2.57, 2.86
Tasks: 717 total, 1 running, 716 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.3 us, 0.7 sy, 0.2 ni, 95.7 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 64365.1 total, 539.8 free, 8847.0 used, 54978.4 buff/cache
MiB Swap: 65477.0 total, 63810.0 free, 1667.0 used. 54580.5 avail Mem
ss -s
Total: 1934
TCP: 2362 (estab 1233, closed 1105, orphaned 2, timewait 1104)
Transport Total IP IPv6
RAW 0 0 0
UDP 0 0 0
TCP 1257 430 827
INET 1257 430 827
FRAG 0 0 0
ulimit -n
1024
ss -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n
1 Local
6 192.XXX.XXX.XXX
100 127.0.0.1
340 10.0.0.XX
866 [
ss -ntu | awk '{print $6}' | cut -d: -f1 | sort | uniq -c | sort -n
..........
This lists the number of IP connections per address. Besides 127.0.0.1 and [ (truncated IPv6 addresses), there are 2 IPs with over 50 connections.
74 104.xxx.xxx.xxx
91 12.xxx.xxx.xxx
MySQL
No processes running more than a second. Number of processes well within limits.
I do not know what stats would be relevant beyond these in diagnosing network rate limiting issues. Any pointers would be appreciated.
EDITED
CPU
lscpu https://pastebin.com/Jha6F7J8
Apache Config
apachectl -t -D DUMP_RUN_CFG https://pastebin.com/i1L2hnjH
Mysql
SHOW GLOBAL STATUS https://pastebin.com/aQX4D01k
SHOW GLOBAL VARIABLES https://pastebin.com/L8EfmHfn
SHOW FULL PROCESSLIST https://pastebin.com/GtqK2tET
mysqltuner https://pastebin.com/GLhhKA9q
Optional Very Helpful Information
top -bn1 https://pastebin.com/r94vpXe6
iostat -xm 5 3 https://pastebin.com/R8YLK3QU
ulimit -a https://pastebin.com/KUC3wqxU
Dorothy, your system is very busy with activity. Not knowing the frequency and duration of the intermittent hangs puts us at a disadvantage. One possible cause is Com_drop_table, which had 3,318 uses in your 83 days of uptime. Another possible cause is the volume of data read and written: it appears Innodb_data_written was 484 TB in 83 days, yet MySQLTuner reports only 800K of data in 10 tables. Our General Log Analysis could likely identify the cause of this high activity. These suggestions are a starting effort; more analysis and changes should follow.
From your OS command prompt,
ulimit -n 96000 would enable many more open files (handles) than today's limit of 1024.
This is a dynamic operation on Linux and does not require an OS restart to take effect.
For this change to persist across OS stop/start, the following URL can be used as a guide.
Please use 96000, not 500000 as in their example documentation.
https://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-descriptors-limit/
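For persistence, the usual mechanism on Linux is an entry in /etc/security/limits.conf (the values below match the suggestion above; adjust the user scope to your needs):
* soft nofile 96000
* hard nofile 96000
Make sure pam_limits is enabled in your PAM configuration so the values are applied at login, and restart Apache afterwards so its workers inherit the new limit.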
Rate Per Second = RPS
Suggestions to consider for your my.cnf [mysqld] section
innodb_io_capacity=1900 # from 200; use 1900 for SSD or 900 for magnetic storage, to improve IOPS
net_buffer_length=32K # from 16K to reduce malloc operations
innodb_lru_scan_depth=100 # from 1024 to conserve 90% of CPU cycles used for function
key_cache_segments=16 # from 0 to reduce mutex contention with MyISAM opens
key_cache_division_limit=50 # from 100 for Hot/Warm storage to reduce key_page_reads RPS of 18
aria_pagecache_division_limit=50 # from 100 for Hot/Warm storage to reduce aria_pagecache_reads RPS of 5K
read_rnd_buffer_size=64K # from 256K to reduce handler_read_rnd_next RPS of 27,707
These changes should reduce elapsed time to complete most queries.
Additional areas to consider include the use of Slow Query Log analysis to find where an index could avoid a table scan. MySQLTuner reported more than 4 million joins performed without indexes. Our FAQ page includes information on how you could find the tables needing indexes to avoid scans. Let us know how these suggestions work for you.
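If you pursue the Slow Query Log analysis, a minimal my.cnf [mysqld] addition would look like this (the path and threshold are illustrative values, not settings read from your server):
slow_query_log=1
slow_query_log_file=/var/log/mysql/slow.log # illustrative path
long_query_time=1 # seconds; lower it to capture more queries
log_queries_not_using_indexes=1 # surfaces the joins-without-indexes MySQLTuner flagged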
Skype Talk works very well if you have the flexibility to use that form of communication.

RabbitMQ lager_error_logger_h dropped messages

Please help me solve this problem.
My setup:
RabbitMQ - 3.7.2
Erlang - 20.1
Connections: 527
Channels: 500
Exchanges: 49
Queues: 4437
Consumers: 131
Publish rate ~ 200/s
Ack rate ~ 200/s
Config:
disk_free_limit.absolute = 5GB
log.default.level = warning
log.file.level = warning
Messages like these constantly appear in the logs:
11:42:16.000 [warning] <0.32.0> lager_error_logger_h dropped 105 messages in the last second that exceeded the limit of 100 messages/sec
11:42:17.000 [warning] <0.32.0> lager_error_logger_h dropped 101 messages in the last second that exceeded the limit of 100 messages/sec
11:42:18.000 [warning] <0.32.0> lager_error_logger_h dropped 177 messages in the last second that exceeded the limit of 100 messages/sec
How do I get rid of them correctly? How do I remove these messages from the logs?
The RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
The message means that RabbitMQ is generating a very large number of error messages and that they are being dropped to avoid filling the log rapidly. If "dropped X messages in the last second" is the only message you are seeing in the logs, you need to determine what the messages are that are being dropped to find the root of the problem. You can do this by temporarily raising that limit by running the following command:
rabbitmqctl eval '[lager:set_loghwm(H, 250) || H <- gen_event:which_handlers(lager_event)].'
You should then see a much larger number of messages that will reveal the underlying issue. To revert to the previous setting, run this command:
rabbitmqctl eval '[lager:set_loghwm(H, 50) || H <- gen_event:which_handlers(lager_event)].'
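If you need the higher limit to survive broker restarts, the same high-water mark can, to my knowledge, be set in advanced.config through lager's error_logger_hwm application setting (this applies to lager-based releases such as your 3.7.2; treat the snippet as a sketch and verify against the docs for your version):
[
  {lager, [
    {error_logger_hwm, 250} %% messages/sec accepted from error_logger before dropping
  ]}
].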

Redis server console output clarification?

I'm looking at the Redis server console output and trying to understand the displayed info (I didn't find it in the quick start guide).
redis-server.exe outputs this:
/*1*/ [2476] 24 Apr 11:46:28 # Open data file dump.rdb: No such file or directory
/*2*/ [2476] 24 Apr 11:46:28 * The server is now ready to accept connections on port 6379
/*3*/ [2476] 24 Apr 11:42:35 - 1 clients connected (0 slaves), 1188312 bytes in use
/*4*/ [2476] 24 Apr 11:42:40 - DB 0: 1 keys (0 volatile) in 4 slots HT.
Regarding line #1: what is the dump.rdb file used for? Is it the data itself?
What is the [2476] number? It is not a port, since line #2 says the port is 6379.
What does (0 slaves) mean?
In line #3: 1188312 bytes in use, but what is the maximum value, so I'd know about overflows? Is it for all databases combined?
In line #4: what does (0 volatile) mean?
Line #4: why do I have 4 slots HT? I have no data yet.
[2476] - the process ID
dump.rdb - Redis can persist data by snapshotting; dump.rdb is the default file name http://redis.io/topics/persistence
0 slaves - Redis can work in master-slave mode; 0 slaves tells you that no slave servers are connected
1188312 bytes in use - the total number of bytes allocated by Redis using its allocator
0 volatile - Redis can set keys with an expiration time; this is the count of such keys
4 slots HT - the current hash table size; the initial table size is 4, and the hash table grows as you add more items
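On the "max value" question: used_memory (the "bytes in use" figure, which covers the whole server, not a single database) has no ceiling unless you set one; the optional maxmemory limit defaults to 0, meaning unlimited. You can inspect and change it at runtime (the 100mb cap below is purely illustrative):
redis-cli INFO memory # used_memory is the "bytes in use" figure
redis-cli CONFIG GET maxmemory # 0 means no explicit limit
redis-cli CONFIG SET maxmemory 100mb # illustrative cap; pair it with a maxmemory-policy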

size of ICMP type 11 packet payload

What's the size of the ICMP packet payload when the type is 11, i.e. time exceeded?
Since it contains an IP header and the first 8 bytes of the payload of the IP packet that generated the ICMP message, I thought its size was 20 + 8 = 28 bytes.
I'm replaying some common user traffic with TTL=1. In the ICMP messages I have dumped I noticed that:
all ICMP packets generated by UDP packets have a payload of 28 bytes
all those generated by TCP packets have a payload of 40 bytes
Since I need to match ICMP time-exceeded messages with the packets that triggered them by comparing those bytes, this piece of information is essential, but I can't figure out why this happens.
The problem is that you're quoting the 8-byte payload requirement from RFC 792, page 4, but the requirements were changed by RFC 1812...
Time Exceeded Message (in RFC 792)
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Code | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unused |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Internet Header + 64 bits of Original Data Datagram |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RFC 1812, Section 4.3.2.3 dramatically increases the allowable payload in an ICMP Error message (emphasis mine):
4.3.2.3 Original Message Header
Historically, every ICMP error message has included the Internet
header and at least the first 8 data bytes of the datagram that
triggered the error. This is no longer adequate, due to the use of
IP-in-IP tunneling and other technologies. Therefore, the ICMP
datagram SHOULD contain as much of the original datagram as possible
without the length of the ICMP datagram exceeding 576 bytes. The
returned IP header (and user data) MUST be identical to that which
was received, except that the router is not required to undo any
modifications to the IP header that are normally performed in
forwarding that were performed before the error was detected (e.g.,
decrementing the TTL, or updating options).
The ICMP Errors you're generating from Scapy packets should contain all the information from the IP and TCP layers of the original packet.
As you noted, the ICMP payload is the IP header plus 8 octets of the original packet's payload. IP headers, however, are not always 20 octets long; 20 is only the minimum. The IP header itself may contain options, and the header length is indicated by the value in the IHL field of the header. See sec 3.1 of RFC 791. So it looks like the TCP packets have 12 additional octets of options in their IP headers. RFC 791 defines some standard options such as source routing and timestamping. You'll have to decode the header to determine what options are being used.
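To make the matching concrete, here is a minimal C sketch (my own illustration, not code from the question) that uses the IHL field to find where the quoted transport-layer bytes begin inside an ICMP time-exceeded payload:
#include <stddef.h>
#include <stdint.h>
/* Given the payload of an ICMP time-exceeded message (which starts with
 * the original datagram's IP header), return a pointer to the quoted
 * transport-layer bytes and their length. The header is IHL * 4 octets
 * long, so the offset varies when IP options are present (per the answer
 * above, the 40-byte TCP case would be 20 + 12 options + 8 = 40). */
const uint8_t *quoted_transport_bytes(const uint8_t *icmp_payload,
                                      size_t payload_len,
                                      size_t *quoted_len) {
  if (payload_len < 20) return NULL; /* shorter than a minimal IP header */
  size_t ihl_bytes = (size_t)(icmp_payload[0] & 0x0F) * 4; /* IHL: 32-bit words */
  if (ihl_bytes < 20 || payload_len < ihl_bytes) return NULL;
  *quoted_len = payload_len - ihl_bytes; /* >= 8 per RFC 792, possibly more per RFC 1812 */
  return icmp_payload + ihl_bytes;
}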
I would like to add for future reference that not only do ICMP payloads vary in size, as Mike said; they might also be longer than 128 bytes in the case of ICMP extensions for MPLS. See this draft for more information.

Apache benchmark: what does the total mean milliseconds represent?

I am benchmarking a PHP application with ApacheBench (ab). I have the server on my local machine. I run the following:
ab -n 100 -c 10 http://my-domain.local/
And get this:
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 3 3.7 2 8
Processing: 311 734 276.1 756 1333
Waiting: 310 722 273.6 750 1330
Total: 311 737 278.9 764 1341
However, if I refresh the page http://my-domain.local/ in my browser, I find it takes a lot longer than the 737 ms mean that ab reports to load the page (around 3000-4000 ms). I can repeat this many times, and loading the page in the browser always takes at least 3000 ms.
I tested another, heavier page (its load in the browser takes 8-10 seconds) and used a concurrency of 1 to simulate a single user loading the page:
ab -n 100 -c 1 http://my-domain.local/heavy-page/
And the results are here:
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 17 20 4.7 18 46
Waiting: 16 20 4.6 18 46
Total: 17 20 4.7 18 46
So what does the Total line in the ab results actually tell me? Clearly it's not the number of milliseconds the browser takes to load the web page. Is the number of milliseconds it takes the browser to load the page (X) linearly dependent on the total mean milliseconds ab reports (Y)? So if I'm able to cut Y in half, have I also cut X in half?
(Also, I'm not really sure what Processing, Waiting and Total mean.)
I'll reopen this question since I'm facing the problem again.
Recently I installed Varnish.
I run ab like this:
ab -n 100 http://my-domain.local/
Apache bench reports very fast response times:
Requests per second: 462.92 [#/sec] (mean)
Time per request: 2.160 [ms] (mean)
Time per request: 2.160 [ms] (mean, across all concurrent requests)
Transfer rate: 6131.37 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 1 2 2.3 1 13
Waiting: 0 1 2.0 1 12
Total: 1 2 2.3 1 13
So the time per request is about 2.2 ms. When I browse the site (as an anonymous user) the page load time is about 1.5 seconds.
Here is a picture from Firebug's Net tab. As you can see, my browser waits 1.68 seconds for my site to respond. Why is this number so much bigger than the request times ab reports?
Are you running ab on the server? Don't forget that your browser is local to you, on a remote network link. An ab run on the webserver itself will have almost zero network overhead and report basically the time it takes for Apache to serve up the page. Your home browser link will have however many milliseconds of network transit time added in, on top of the basic page-serving overhead.
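On the column names: Connect is the TCP handshake, Waiting is the time from sending the request until the first byte of the response arrives, Processing is the whole request after the connection is up, and Total = Connect + Processing. One way to see a similar breakdown from any client machine is curl's timing variables (my suggestion, not something from the thread above):
curl -o /dev/null -s -w 'connect: %{time_connect}s ttfb: %{time_starttransfer}s total: %{time_total}s\n' http://my-domain.local/
Run it both on the server and on your desktop; the difference is roughly the network transit your browser pays on top of the server time ab measures.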
OK, I think I know what the problem is. While I have been measuring the page load time in the browser I have been logged in, so none of the caching is happening and every request does the heavy work. The page load times in the browser as an anonymous user are much closer to the ones ab is reporting.