How can I stop redis memory usage increasing with connection count - redis

We are running redis via elasticache on AWS and are seeing memory usage spike when running a large number of lambda functions which just read. Here is some example output from redis-cli --stat
------- data ------ --------------------- load -------------------- - child -
keys mem clients blocked requests connections
1002 28.11M 15 0 2751795 (+11) 53877
1002 28.07M 15 0 2751797 (+2) 53877
1002 28.07M 15 0 2751799 (+2) 53877
1002 28.11M 15 0 2751803 (+4) 53877
1002 28.07M 15 0 2751806 (+3) 53877
1001 28.11M 15 0 2751808 (+2) 53877
1007 28.08M 15 0 2751837 (+29) 53877
1007 28.08M 15 0 2751839 (+2) 53877
1005 28.10M 16 0 2751841 (+2) 53878
1007 171.68M 94 0 2752012 (+171) 53957
1006 545.93M 316 0 2752683 (+671) 54179
1006 1.07G 483 0 2753508 (+825) 54346
1006 1.54G 677 0 2754251 (+743) 54540
1006 1.98G 882 0 2755024 (+773) 54745
1006 2.35G 1010 0 2755776 (+752) 54873
1005 2.78G 1014 0 2756548 (+772) 54877
1005 2.80G 1014 0 2756649 (+101) 54877
1004 2.79G 1014 0 2756652 (+3) 54877
1008 2.79G 1014 0 2756682 (+30) 54877
1007 2.79G 1014 0 2756685 (+3) 54877
As you can see the number of keys is pretty much constant but as the number of clients increases the memory usage ramps up to 2.8GB. Is this memory pattern expected and if so is there a way to mitigate it other than increasing the amount of RAM available to the process?
The lambda clients are written in Java using lettuce 5.2.1.RELEASE and spring-data-redis 2.2.1.RELEASE
Unless there is some additional redis interaction within spring-data-redis the client code is basically as follows
public <T> T get(final String label, final RedisTemplate<String, ?> redisTemplate) {
final BoundHashOperations<String, String, T> cache = redisTemplate.boundHashOps(REDIS_KEY);
return cache.get(label);
}
There are no usages of RedisTemplate#keys in my codebase, the only interaction with redis is via RedisTemplate#boundHashOps
Here is the output from redis-cli info memory before and after the spike:
Before
# Memory
used_memory:31558400
used_memory_human:30.10M
used_memory_rss:50384896
used_memory_rss_human:48.05M
used_memory_peak:6498905008
used_memory_peak_human:6.05G
used_memory_peak_perc:0.49%
used_memory_overhead:4593040
used_memory_startup:4203584
used_memory_dataset:26965360
used_memory_dataset_perc:98.58%
allocator_allocated:32930040
allocator_active:34332672
allocator_resident:50593792
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:5140907060
maxmemory_human:4.79G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.04
allocator_frag_bytes:1402632
allocator_rss_ratio:1.47
allocator_rss_bytes:16261120
rss_overhead_ratio:1.00
rss_overhead_bytes:-208896
mem_fragmentation_ratio:1.60
mem_fragmentation_bytes:18826560
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:269952
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
After
# Memory
used_memory:4939687896
used_memory_human:4.60G
used_memory_rss:4754452480
used_memory_rss_human:4.43G
used_memory_peak:6498905008
used_memory_peak_human:6.05G
used_memory_peak_perc:76.01%
used_memory_overhead:4908463998
used_memory_startup:4203584
used_memory_dataset:31223898
used_memory_dataset_perc:0.63%
allocator_allocated:5017947040
allocator_active:5043314688
allocator_resident:5161398272
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:5140907060
maxmemory_human:4.79G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.01
allocator_frag_bytes:25367648
allocator_rss_ratio:1.02
allocator_rss_bytes:118083584
rss_overhead_ratio:0.92
rss_overhead_bytes:-406945792
mem_fragmentation_ratio:0.96
mem_fragmentation_bytes:-185235352
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:4904133550
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

Having discussed this with AWS support the cause of this memory spike is that each of the 1000 lambda clients is filling up a read buffer with ~5mb of data as the data we are storing in redis is large serialized json objects.
Their recommendations are to either:
Add 2-3 replicas in the cluster and use replica nodes for read requests. You can use reader endpoints for load balancing the requests
Or control client output buffers with the parameters but note the clients will be disconnect if they reach the buffer limit.
client-output-buffer-limit-normal-hard-limit >> If a client's output buffer reaches the specified number of bytes, the client will be disconnected. The default is zero (no hard limit). By default is 0 which means client can use as much memory they can.
client-output-buffer-limit-normal-soft-limit >> If a client's output buffer reaches the specified number of bytes, the client will be disconnected, but only if this condition persists for client-output-buffer-limit-normal-soft-seconds. The default is zero (no soft limit).
client-output-buffer-limit-normal-soft-seconds >> For Redis publish/subscribe clients: If a client's output buffer reaches the specified number of bytes, the client will be disconnected. Default value is 0
Given these constraints and our usage profile we're actually going to switch to use S3 in this instance.

Related

Media and Data Integrity Errors

I was wondering if anyone can tell me what these mean. From most people posting about them, there is no more than double digits. However, I have 1051556645921812989870080 Media and Data Integrity Errors on my SK hynix PC711 on my new HP dev one. Thanks!
Here's my entire smartctl output
`smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.0.7-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SK hynix PC711 HFS001TDE9X073N
Serial Number: KDB3N511010503A37
Firmware Version: HPS0
PCI Vendor/Subsystem ID: 0x1c5c
IEEE OUI Identifier: 0xace42e
Total NVM Capacity: 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: ace42e 00254f98f1
Local Time is: Wed Nov 9 13:58:37 2022 EST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x001f): Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 84 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Namespace 1 Features (0x02): NA_Fields
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.3000W - - 0 0 0 0 5 5
1 + 2.4000W - - 1 1 1 1 30 30
2 + 1.9000W - - 2 2 2 2 100 100
3 - 0.0500W - - 3 3 3 3 1000 1000
4 - 0.0040W - - 3 3 3 3 1000 9000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
1 - 4096 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 34 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 13,162,025 [6.73 TB]
Data Units Written: 3,846,954 [1.96 TB]
Host Read Commands: 156,458,059
Host Write Commands: 128,658,566
Controller Busy Time: 116
Power Cycles: 273
Power On Hours: 126
Unsafe Shutdowns: 15
Media and Data Integrity Errors: 1051556645921812989870080
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 34 Celsius
Temperature Sensor 2: 36 Celsius
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged`
Encountered a similar SMART reading from the same model.
I'm seeing a reported Media and Data Integrity Errors rate of a value that's over 2 ^ 84.
It could just be an error with its SMART implementation or the utility reading from it.
Converting your reported value of 1051556645921812989870080 to hex, we get 0xdead0000000000000000 big endian and 0x0000000000000000adde little endian.
Similarly, when I convert my value to hex, I get 0xffff0000000000000000 big endian and 0x0000000000000000ffff little endian, where f is just denotes a value other than 0.
I'm going to assume that the Media and Data Integrity Errors value has no actual meaning with regard to real errors. I doubt that both of us would have values that are padded with 16 0's when converted to hex. Something is sending/receiving/parsing bad data.
If you poke around the other reported SMART values in your post, and on my end, some of them don't seem to make much sense, either.

Redis occupies more memory

OS: RHEL 7.6
Cluster set up:
Single node - 6 instances of redis - 3 master and 3 slaves.
16384 slots shared between three masters.
Sample data:
smembers 201904138
1) "0"
2) "1"
3) "2"
4) "3"
5) "4"
Each set contains 5 ids. I have such 2.49Million keys.
Sample size occupied by each entry:
memory usage 201904138
(integer) 76
:7001> memory usage 201904132
(integer) 76
:7001> memory usage 201904134
(integer) 76
:7001> dbsize
(integer) 2489174
So logically it should occupy 2.49M * 76 = 189MB. I do understand that it stores some extra information as well.
But total memory occupied by this cluster is Memory = 367M, RSS=389M
Why it is double than the original data? how can i reduce it?
Please help.

TEZ mapper resource request

We recently migrated from MapReduce to TEZ for executing Hive queries on EMR. We are seeing cases where for the exact hive query launches very different number of mappers. See Map 3 phase below. On the first run it requested for 305 resources and on another run it requested for 4534 mappers. ( Please ignore the KILLED status because I manually killed the query.) Why does this happen ? How can we change it to be based on underlying data size instead ?
Run 1
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container KILLED 5 0 0 5 0 0
Map 3 container KILLED 305 0 0 305 0 0
Map 5 container KILLED 16 0 0 16 0 0
Map 6 container KILLED 1 0 0 1 0 0
Reducer 2 container KILLED 333 0 0 333 0 0
Reducer 4 container KILLED 796 0 0 796 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/06 [>>--------------------------] 0% ELAPSED TIME: 14.16 s
----------------------------------------------------------------------------------------------
Run 2
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 5 5 0 0 0 0
Map 3 container KILLED 4534 0 0 4534 0 0
Map 5 .......... container SUCCEEDED 325 325 0 0 0 0
Map 6 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 container KILLED 333 0 0 333 0 0
Reducer 4 container KILLED 796 0 0 796 0 0
----------------------------------------------------------------------------------------------
VERTICES: 03/06 [=>>-------------------------] 5% ELAPSED TIME: 527.16 s
----------------------------------------------------------------------------------------------
This article explains the process in which Tez allocates resources. https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
If Tez grouping is enabled for the splits, then a generic grouping
logic is run on these splits to group them into larger splits. The
idea is to strike a balance between how parallel the processing is and
how much work is being done in each parallel process.
First, Tez tries to find out the resource availability in the cluster for these tasks. For that, YARN provides a headroom value (and
in future other attributes may be used). Lets say this value is T.
Next, Tez divides T with the resource per task (say M) to find out how many tasks can run in parallel at one (ie in a single wave). W =
T/M.
Next W is multiplied by a wave factor (from configuration - tez.grouping.split-waves) to determine the number of tasks to be used.
Lets say this value is N.
If there are a total of X splits (input shards) and N tasks then this would group X/N splits per task. Tez then estimates the size of
data per task based on the number of splits per task.
If this value is between tez.grouping.max-size & tez.grouping.min-size then N is accepted as the number of tasks. If
not, then N is adjusted to bring the data per task in line with the
max/min depending on which threshold was crossed.
For experimental purposes tez.grouping.split-count can be set in configuration to specify the desired number of groups. If this config
is specified then the above logic is ignored and Tez tries to group
splits into the specified number of groups. This is best effort.
After this the grouping algorithm is executed. It groups splits by node locality, then rack locality, while respecting the group size
limits.

Dot Net Cisco Command Line Console Parser [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I'm Trying to write a Cisco Command Line Parser to have an automated Graphical User Interface replacement for the Cisco console output. I have been able to get the ping time using Regular Expressions from a ping output and graph it, but am now stuck with more detailed out put of other commands like "Show interfaces" command,
any ideas how I can parse the Show Interface command output and extract all the useful info which i need?
Here is a "Show Interfaces" out put example:
FastEthernet0/0 is up, line protocol is up
Hardware is MV96340 Ethernet, address is 0018.189d.1df0 (bia 0018.189d.1df0)
Description: IP+ connection
Internet address is 164.128.251.50/24
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, 100BaseTX/FX
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/3718/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 2000 bits/sec, 6 packets/sec
5 minute output rate 3000 bits/sec, 10 packets/sec
152817108 packets input, 1043050554 bytes
Received 77347880 broadcasts (67140888 IP multicasts)
0 runts, 0 giants, 3351 throttles
381823 input errors, 0 CRC, 0 frame, 0 overrun, 381823 ignored
0 watchdog
0 input packets with dribble condition detected
--More-- 99065802 packets output, 440637782 bytes, 0 underruns
0 output errors, 0 collisions, 2 interface resets
300246 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out
FastEthernet0/1 is administratively down, line protocol is down
Hardware is MV96340 Ethernet, address is 0018.189d.1df1 (bia 0018.189d.1df1)
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Auto-duplex, Auto Speed, 100BaseTX/FX
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes
Received 0 broadcasts (0 IP multicasts)
--More-- 0 runts, 0 giants, 0 throttles
--More-- 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog
0 input packets with dribble condition detected
0 packets output, 0 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out
Tunnel0 is up, line protocol is up
Hardware is Tunnel
Interface is unnumbered. Using address of FastEthernet0/0 (164.128.251.50)
MTU 17912 bytes, BW 100 Kbit/sec, DLY 50000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation TUNNEL, loopback not set
Keepalive not set
Tunnel source 164.128.251.50 (FastEthernet0/0), destination 164.128.32.1
Tunnel Subblocks:
src-track:
Tunnel0 source tracking subblock associated with FastEthernet0/0
Set of tunnels with source FastEthernet0/0, 1 member (includes iterators), on interface
Tunnel protocol/transport PIM/IPv4
--More-- Tunnel TOS/Traffic Class 0xC0, Tunnel TTL 255
--More-- Tunnel transport MTU 1472 bytes
Tunnel is transmit only
Tunnel transmit bandwidth 8000 (kbps)
Tunnel receive bandwidth 8000 (kbps)
Last input never, output 28w1d, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
44 packets output, 2464 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out
Virtual-Access1 is up, line protocol is up
Hardware is Virtual Access interface
Description: Internally created by SSLVPN context TEST
MTU 1406 bytes, BW 100000 Kbit/sec, DLY 100000 usec,
--More-- reliability 255/255, txload 1/255, rxload 1/255
--More-- Encapsulation SSL
Internal vaccess
Vaccess status 0x0, loopback not set
Keepalive set (10 sec)
DTR is pulsed for 5 seconds on reset
Last input never, output never, output hang never
Last clearing of "show interface" counters 29w5d
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 packets output, 0 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out
0 carrier transitions
Interface_Long_Split = Regex.Split(Result_Long, "(POS[0-9]/[0-9]/[0-9])|(POS[0-9]/[0-9])|(GigabitEthernet[0-9]/[0-9])|(FastEthernet[0-9]/[0-9])")
Dim count As Integer = 0
For i = 0 To Interface_Long_Split.Length
If Regex.IsMatch(Interface_Long_Split(i), "(POS[0-9]/[0-9]/[0-9])|(POS[0-9]/[0-9])|(GigabitEthernet[0-9]/[0-9])|(FastEthernet[0-9]/[0-9])") = True Then
ReDim Preserve Interfaces_List(count)
Interfaces_List(count) = Interface_Long_Split(i)
count = count + 1
End If
imho you are probably on a hiding to nothing.
you could try parsing those complex outputs a line at a time rather than as one big blob.

Pulling from a very very specific location imbedded in a text file

I finished every piece of code in my program save for one tid bit, how to pull two numbers from a text file. I know how to pull lines, I know how to pull search strings, but I cant figure out this one to save my life.
Anyways here is a sample of the automatically generated text that I need to pull from...
.......................................................................
Applications Memory Usage (kB):
Uptime: 6089044 Realtime: 6089040
** MEMINFO in pid 764 [com.lookout] **
native dalvik other total
size: 27908 8775 N/A 36683
allocated: 3240 4216 N/A 7456
free: 24115 4559 N/A 28674
(Pss): 1454 1142 6524 *9120*
(priv dirty): 1436 628 5588 *7652*
Objects
Views: 0 ViewRoots: 0
AppContexts: 0 Activities: 0
Assets: 3 AssetManagers: 3
Local Binders: 15 Proxy Binders: 41
Death Recipients: 3
OpenSSL Sockets: 0
SQL
heap: 98 MEMORY_USED: 98
PAGECACHE_OVERFLOW: 16 MALLOC_SIZE: 50
DATABASES
pgsz dbsz Lookaside(b) Dbname
1 14 120 google_analytics.db
Asset Allocations
zip:/system/app/com.lookout_6.0.1_r8234_Release.apk:/resources.arsc: 161K
.............................................................................
The two numbers that I need out of this are the two ones that I put in the **'s (the asterisks are not normally there). These numbers will be different every time this sheet is generated, and the number placement might be different as well as some of the numbers could have 4 digits, 5 digits, or 6 digits.
If anyone could shed any light on the subject it would be greatly appreciated
Thanks,
Zach
You just need to read in the last word of the line and convert it to a number. Use String.LastIndexOf to find the last space " " in the file and read the data from that point forwards.
Dim line as String = " (Pss): 1454 1142 6524 9120"
Dim value as Integer
If line.IndexOf("(Pss)") > 0 Then
value = CInt(line.Substring(line.LastIndexOf(" ") + 1))
End If