Memory utilization in Redis for each database

Redis allows storing data in 16 different 'databases' (0 to 15). Is there a way to get utilized memory and disk space per database? The INFO command only lists the number of keys per database.

No, you cannot control each database individually. These "databases" are just for logical partitioning of your data.
What you can do (depending on your specific requirements and setup) is spin up multiple Redis instances, each one doing a different task and each one having its own redis.conf file with a memory cap. Disk space can't be capped, though, at least not at the Redis level.
Side note: bear in mind that the number of databases (16) is not hardcoded - you can set it in redis.conf.
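For example (values here are illustrative; maxmemory, maxmemory-policy, and databases are all standard redis.conf directives):

    # redis.conf for one dedicated instance
    maxmemory 2gb                 # hard memory cap for this instance
    maxmemory-policy allkeys-lru  # what to evict once the cap is hit
    databases 32                  # the default of 16 is configurable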

I did it by calling DUMP on all the keys in a Redis DB and measuring the total number of bytes used. This will slow down your server and take a while. The size DUMP returns seems to be about four times smaller than the actual memory use, but the numbers will still give you an idea of which db is using the most space.
Here's my code:
https://gist.github.com/mathieulongtin/fa2efceb7b546cbb6626ee899e2cfa0b
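The gist is in Python; a rough equivalent in Java using the Jedis client might look like the sketch below (import paths assume Jedis 4.x, and the roughly 4x serialized-to-resident ratio mentioned above is only approximate):

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.params.ScanParams;
    import redis.clients.jedis.resps.ScanResult;

    public class DbSizeEstimate {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                for (int db = 0; db < 16; db++) {
                    jedis.select(db);  // switch logical database
                    long totalBytes = 0;
                    String cursor = ScanParams.SCAN_POINTER_START;
                    do {
                        ScanResult<String> page = jedis.scan(cursor);
                        for (String key : page.getResult()) {
                            byte[] dumped = jedis.dump(key);  // serialized form of the value
                            if (dumped != null) totalBytes += dumped.length;
                        }
                        cursor = page.getCursor();
                    } while (!"0".equals(cursor));  // SCAN is done when the cursor returns to 0
                    System.out.printf("db%d: ~%d serialized bytes%n", db, totalBytes);
                }
            }
        }
    }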

Related

Redis using too much memory for a small number of keys

I have a standalone Redis server with around 8000 keys at a given instant.
The used_memory is showing around 8.5 GB.
My individual key-value sizes are at most around 50 KB; by that calculation, used_memory should be less than 1 GB (50 KB * 8000).
I am using Spring's RedisTemplate with the default pool configuration to connect to Redis.
Any idea what I should look into to narrow down where the memory is being consumed?
A zset internally uses two data structures to hold the same elements, in order to get O(log(N)) insert and remove operations on a sorted data structure.
The two data structures, to be specific, are:
a hash table
a skip list
For typical cases, according to my research, storage cost is ordered as follows:
hset < set < zset
I would recommend switching to hashes (HSET) if you have hierarchical data to store. This will lower your memory consumption but might make lookups a tiny bit slower (only if one key has more than, say, a couple of hundred records). An empirical way to compare is shown below.
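On Redis 4.0+, you can measure this directly with the MEMORY USAGE command instead of estimating (key names here are illustrative):

    redis-cli ZADD scores:1 1 a 2 b 3 c
    redis-cli HSET user:1 a 1 b 2 c 3
    redis-cli MEMORY USAGE scores:1   # typically larger...
    redis-cli MEMORY USAGE user:1     # ...than the equivalent hash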

What happens when a SQL query runs out of memory?

I want to set up a Postgres server on AWS; the biggest table will be 10 GB. Do I have to select 10 GB of memory for this instance?
What happens when my query result is larger than 10GB?
Nothing bad will happen: the entire result set is not loaded into memory. The maximum available memory will be used and reused as needed while the result is prepared, spilling over to disk as necessary.
See PostgreSQL resource documentation for more info.
Specifically, look at work_mem:
work_mem (integer)
Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files.
As long as you don't run out of working memory on a single operation or set of parallel operations you are fine.
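You can watch the spill happen with EXPLAIN ANALYZE; with a small work_mem, a big sort will report an on-disk sort method (table and column names are placeholders):

    SET work_mem = '4MB';
    EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY some_col;
    -- look for "Sort Method: external merge  Disk: ...kB" in the output;
    -- with a larger work_mem the same query may show "Sort Method: quicksort  Memory: ...kB"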
Edit: The above was an answer to the question What happens when you query a 10GB table without 10GB of memory on the server/instance?
Here is an updated answer to the updated question:
Only server-side resources are used to produce the result set.
Assuming a JDBC driver is used, by default the entire result set is sent to your local machine, which could cause out-of-memory errors.
This behavior can be changed by altering the fetch size through the use of a cursor; see the PostgreSQL JDBC documentation, "Getting results based on a cursor".
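A minimal sketch of cursor-based fetching with the PostgreSQL JDBC driver (connection string, credentials, and table name are placeholders):

    import java.sql.*;

    public class StreamRows {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "pass")) {
                conn.setAutoCommit(false);   // required for the driver to use a cursor
                try (Statement st = conn.createStatement()) {
                    st.setFetchSize(1000);   // stream 1000 rows at a time instead of everything
                    try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                        while (rs.next()) {
                            // process one row at a time; client memory stays bounded
                        }
                    }
                }
            }
        }
    }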
On the server side, with a simple query like yours, it just keeps a "cursor" pointing to where it is as it spools the results to you, and uses very little memory. If there were some sorts in there that couldn't use an index, they might use up a lot of memory, though I'm not sure. On the client side, the PostgreSQL JDBC client by default loads the entire result set into memory before passing it back to you (which can be overcome by specifying a fetch count).
With more complex queries (for example, "give me all 100M rows, but order them by X" where X is not indexed) I don't know for certain, but internally it probably creates a temp table (so it won't run out of RAM) which, treated like a normal table, is backed by disk. If there's a matching index, it can just traverse that with a pointer and still use little RAM.

Postgres performance improvement and checklist

I'm studying a series of issues related to the performance of my application written in Java, which gets about 100,000 hits per day, with each visit averaging 5 to 10 reads/writes (split equally) on the 2 principal database tables, each of which holds between 1 and 3 million records (I access the DB via Hibernate).
My two main tables store user information (about 60 columns of type varchar, integer, and timestamptz) and the data to be displayed (about 30 columns, mostly varchar, integer, and timestamptz).
The main problem, which may have caused a drop in my site's performance (load times over 5 seconds, which obviously don't depend only on database performance), is the fillfactor, currently at the default value of 100 (which is only appropriate when data never changes).
The fillfactor is of course the same on the indexes (there are 10 btree indexes on each of the two tables).
Currently on my main tables I perform:
40% select operations
30% update operations
20% insert operations
10% delete operations
My database is also made up of 40 other tables of minor importance (just 3 others have the same cardinality as the user table).
My questions are:
How do you find the right fillfactor value to set?
What would be a checklist of things to check to improve the performance of a database like this?
The database is on a dedicated server (16 GB RAM, 8 cores) with SSD storage (data is backed up daily and moved to separate storage).
You have likely hit the "knee" of your memory usage where the entire index of the heavily used tables no longer fits in shared memory, so disk I/O is slowing it down. Confirm by checking if disk I/O is higher than normal. If so, try increasing shared memory (shared_buffers), or if that's already maxed, adjust the system shared memory size or add more system memory so you can bump it higher. You'll also probably have to start adjusting temp buffers, work memory and maintenance memory, and WAL parameters like checkpoint_segments, etc.
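As a rough starting point for a dedicated 16 GB box, the relevant postgresql.conf knobs look like this (values are illustrative rules of thumb, not tuned recommendations):

    shared_buffers = 4GB          # often ~25% of RAM as a first guess
    work_mem = 32MB               # per sort/hash operation, per backend
    maintenance_work_mem = 512MB  # VACUUM, CREATE INDEX, etc.
    checkpoint_segments = 32      # pre-9.5; replaced by max_wal_size in newer versions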
There are some perf tuning hints on PostgreSQL.org, and Google is your friend.
Edit: (to address the first comment) The first symptom of not-enough-memory is a big drop in performance, everything else being the same. Changing the table fill factor is not going to make a difference if you've hit a knee in memory usage; if anything it will make load times (by which I assume you mean "db reads") worse, because row data will be spread across more pages on disk with blank space in each page, so more disk I/O is needed for table scans. A fill factor below 100% can help with UPDATE operations, but I've found that adjusting WAL parameters can compensate most of the time when using indexes (unless you've already optimized those). Bottom line: you need to profile all the heavy queries using EXPLAIN to see what will actually help. At first glance, though, I'm fairly certain this is a memory issue even with the database on an SSD. We're talking about a lot of random reads and random writes, and many SSDs actually get worse than HDDs after a lot of small random writes.
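If you do experiment with fillfactor for the update-heavy tables, note that it is set per table and per index, and existing pages only pick the new value up after a rewrite (object names below are placeholders):

    ALTER TABLE users SET (fillfactor = 90);            -- leave 10% free per page for updates
    ALTER INDEX users_email_idx SET (fillfactor = 90);
    VACUUM FULL users;                                  -- rewrites the table using the new setting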

SSIS crash after few records

I have an SSIS package which is supposed to take 100,000 records, loop over them, and for each one save the details to a few tables.
It works fine until it reaches somewhere near 3,000 records, and then Visual Studio crashes. At that point devenv.exe was using about 500 MB and only about 3,000 rows had been processed.
I'm sure the problem is not with a specific record, because it always happens on a different 3K of records.
I have a good computer with 2 GB of RAM available.
I'm using SSIS 2008.
Any idea what might be the issue?
Thanks.
Try increasing the default buffer size on your data flow tasks.
Example given here: http://www.mssqltips.com/sqlservertip/1867/sql-server-integration-services-ssis-performance-best-practices/
Best Practice #7 - DefaultBufferMaxSize and DefaultBufferMaxRows
As I said in Best Practice #6, the execution tree creates buffers for storing incoming rows and performing transformations. So how many buffers does it create? How many rows fit into a single buffer? How does it impact performance?
The number of buffers created depends on how many rows fit into a buffer, and that in turn depends on a few other factors. The first consideration is the estimated row size, which is the sum of the maximum sizes of all the columns in the incoming records. The second is the DefaultBufferMaxSize property of the data flow task, which specifies the default maximum size of a buffer. Its default value is 10 MB, and its upper and lower bounds are constrained by two internal SSIS properties, MaxBufferSize (100 MB) and MinBufferSize (64 KB); this means a buffer can be as small as 64 KB and as large as 100 MB. The third factor is DefaultBufferMaxRows, again a property of the data flow task, which specifies the default number of rows in a buffer; its default value is 10,000.
Although SSIS does a good job of tuning these properties to create an optimum number of buffers, if the size exceeds DefaultBufferMaxSize it reduces the number of rows in the buffer. For better buffer performance you can do two things. First, remove unwanted columns from the source and set the data type of each column appropriately, especially if your source is a flat file; this lets you fit as many rows as possible into each buffer. Second, if your system has sufficient memory available, you can tune these properties to get a small number of large buffers, which can improve performance. Beware: if you change these properties to the point where page spooling (see Best Practice #8) begins, performance suffers. So before setting values for these properties, test thoroughly in your environment and set them appropriately.
You can enable logging of the BufferSizeTuning event to learn how many rows a buffer contains, and you can monitor the "Buffers spooled" performance counter to see whether SSIS has begun page spooling. I will talk more about event logging and performance counters in my next tips in this series.
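As a concrete starting point (illustrative values, not recommendations): on the data flow task you might set DefaultBufferMaxSize = 52428800 (50 MB) and DefaultBufferMaxRows = 50000, run the package while watching the "Buffers spooled" counter, and back the values off if it rises above zero.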

max memory per query

How can I configure the maximum memory that a (select) query can use in SQL Server 2008?
I know there is a way to set the minimum value, but how about the maximum? I would like to use this because I have many processes running in parallel. I know about the MAXDOP option, but that is for processors.
Update:
What I am actually trying to do is run a continuous data load in ETL (extract, transform, load) form. While the data is being loaded, I want to run some select queries, all of them expensive (containing GROUP BY). The most important process for me is the data load. I get an average speed of 10,000 rows/sec, and when I run the queries in parallel it drops to 4,000 rows/sec and even lower. I know more details would help, but this is a complex product I work on and I cannot describe it further. One thing I can guarantee is that the load speed does not drop due to lock contention, because I monitored the locks and removed them.
There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition you can use resource governor to set a maximum amount of memory that a particular workload group can consume which might help.
In SQL Server 2008 you can use Resource Governor to achieve this. There you can set request_max_memory_grant_percent to cap the memory (this is a percentage relative to the pool size specified by the pool's max_memory_percent value). This setting is not query specific, it is session specific.
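A minimal Resource Governor sketch (pool, group, function, and login names are placeholders; Enterprise Edition only):

    CREATE RESOURCE POOL ReportPool WITH (MAX_MEMORY_PERCENT = 40);
    CREATE WORKLOAD GROUP ReportGroup
        WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)
        USING ReportPool;
    GO
    CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
    WITH SCHEMABINDING
    AS
    BEGIN
        -- route a dedicated reporting login to the capped group
        IF SUSER_SNAME() = N'report_user' RETURN N'ReportGroup';
        RETURN N'default';
    END;
    GO
    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;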
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries, or they are not parametrised, then fix the code. A parametrised example is sketched below.
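For instance, in T-SQL a parametrised call via sp_executesql lets all the connections share a single plan (table and column names are illustrative):

    EXEC sp_executesql
        N'SELECT * FROM orders WHERE customer_id = @cid',
        N'@cid int',
        @cid = 42;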
Memory per query is something I've not had to think or care about since the last millennium.