Datastax enterprise graph schema design and cluster sizing - datastax

Please advise on following
1)Based on - https://www.datastax.com/dev/blog/dse-5-0-3-released-huge-performance-gains-for-graph-analytics
where bench-marking is done with each node having 256 GB disk and 256 GB of RAM. Does this mean that entire data is loaded into memory ?
What are node and cluster sizing recommendations?
2)We have a use-case where vertex properties are dynamic and added at runtime. Number of property keys, its name and value can be anything. Something like vertex "entity"(does not have any properties except default id) has edges to vertex "propertydef" and "propertyvalue". Is graph DB right choice for this use-case?
Thanks
Tilak

Related

How to lookup Azure drive type?

When creating/resizing OS and Data drives Azure programmatically picks a drive type for you. E.g. standard SSD:
e4
e10
e15
or magnetic s6, s20, ...
Those show up in the cost analysis the next day. E.g.
How to look up the drive type, without waiting 24 hours to see what you payed for?
UPDATE: March 23, here are all the disks I created - I only have two disks now
When defining managed disks you choose the type (Premium SSD, Standard SSD, and Standard HDD) and size. The type determines the class of storage (P, E, S) and then the size allocation sets the description. With that in mind, your configuration determines the value that will be used. For example, S10 = Standard SSD 128 Gb
Here is a reference: https://azure.microsoft.com/en-us/pricing/details/managed-disks/
Unfortunately, it is not explicitly displayed in the disk properties so it looks like the description is assigned as part of the reporting process as it evaluates the composite values.
Thanks for the feedback. If we put the Disks type such as P30, E, S in Disks configuration under Disks tab for the VM, would it satisfy your requirement? You should be able to see disk properties of a particular VM as soon as you have provisioned a VM.

How to calculate redis memory used percentage on ElastiCache

I want to monitor my redis cache cluster on ElastiCache. From AWS/Elasticache i am able to get metrics like FreeableMemory and BytesUsedForCache. If i am not wrong BytesUsedForCache is the memory used by cluster(assuming there is only one node in cluster). I want to calculate percentage uses of memory. Can any one help me to get percentage of Memory uses in Redis.
We had the same issue since we wanted to monitor the percentage of ElastiCache Redis memory that is consumed by our data.
As you wrote correctly, you need to look at BytesUsedForCache - that is the amount of memory (in bytes) consumed by the data you've stored in Redis.
The other two important numbers are
The available RAM of the AWS instance type you use for your ElastiCache node, see https://aws.amazon.com/elasticache/pricing/
Your value for parameter reserved-memory-percent (check your ElastiCache parameter group). That's the percentage of RAM that is reserved for "nondata purposes", i.e. for the OS and whatever AWS needs to run there to manage your ElastiCache node. By default this is 25 %. See https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/redis-memory-management.html#redis-memory-management-parameters
So the total available memory for your data in ElastiCache is
(100 - reserved-memory-percent) * instance-RAM-size
(In our case, we use instance type cache.r5.2xlarge with 52,82 GB RAM, and we have the default setting of reserved-memory-percent = 25%.
Checking with the info command in Redis I see that maxmemory_human = 39.61 GB, which is equal to 75 % of 52,82 GB.)
So the ratio of used memory to available memory is
BytesUsedForCache / ((100 - reserved-memory-percent) * instance-RAM-size)
By comparing the freeableMemory and bytesUsedForCache metrics, you will have the available memory for the Elasticache non-cluster mode (not sure if it applies to cluster-mode too).
Here is the NRQL we're using to monitor the cache:
SELECT Max(`provider.bytesUsedForCache.Sum`) / (Max(`provider.bytesUsedForCache.Sum`) + Min(`provider.freeableMemory.Sum`)) * 100 FROM DatastoreSample WHERE provider = 'ElastiCacheRedisNode'
This is based on the following:
FreeableMemory: The amount of free memory available on the host. This is derived from the RAM, buffers and cache that the OS reports as freeable.AWS CacheMetrics HostLevel
BytesUsedForCache: The total number of bytes allocated by Redis for all purposes, including the dataset, buffers, etc. This is derived from used_memory statistic at Redis INFO.AWS CacheMetrics Redis
So BytesUsedForCache (amount of memory used by Redis) + FreeableMemory (amount of data that Redis can have access to) = total memory that Redis can use.
With the release of the 18 additional CloudWatch metrics, you can now use DatabaseMemoryUsagePercentage and see the percentage of memory utilization in redis.
View more about the metric in the memory section here
You would have to calculate this based on the size of the node you have selected. See these 2 posts for more information.
Pricing doc gives you the size of your setup.
https://aws.amazon.com/elasticache/pricing/
https://forums.aws.amazon.com/thread.jspa?threadID=141154

What is the memory formula of Redis Sorted Set?

I need to calculate how much memory a Redis SortedSet takes assuming my average element of the Sorted Set is X bytes.
If you know the average size of an element before it's stored in redis, just do this:
Clear redis of all data: command flushall (dumps all databases)
Command info, check field used_memory_human (should be zero or close to it)
Add/store data in redis
info again, check used_memory_human, size indicates memory used by redis to store objects.
Hope it helps

How to get LBA(logical block addressing) of a file from MFT on NTFS file system?

I accessed the $MFT file and extracted file attributes.
Given the file attributes from MFT, how to get a LBA of file from the MFT record on NTFS file system?
To calculate LBA, I know that cluster number of file.
It that possible using cluster number to calculate?
I'm not entirely sure of your question-- But if you're simply trying to find the logical location on disk of a file, there are various IOCTLs that will achieve this.
For instance, MFT File records: FSCTL_GET_NTFS_FILE_RECORD
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364568(v=vs.85).aspx
Location on disk of a specific file via HANDLE: FSCTL_GET_RETRIEVAL_POINTERS
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364572(v=vs.85).aspx
If you're trying to parse NTFS on your own, you'll need to follow the $DATA attribute-- Which will always be non-resident data runs (unless it's a small file that can be resident within the MFT). Microsoft's data runs are fairly simply structures of data contained in the first two nibbles, which specify offset and length for the next run of data.
IMHO you should write the code by doing some basic arithmetic rather than using IOCTLs and FSCTLs for everything. You should know the size of your disk and the offset from which a volume starts (or every extent by using IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS) and store those values somewhere. Then just add the LCN times the size of a cluster to the offset of the extent on the disk.
Most of the time you just have to deal with one extent. When you have multiple extents you can figure out on which extent the cluster is by multiplying the LCN with the size of a cluster and then subtracting the size of each extent returned by the IOCTL in the order they are returned, if the next number to subtract is greater than your current number, that particular LCN is on that extent.
A file is a single virtually contiguous unit consisting of virtual clusters. These virtual clusters map onto extents (fragments) of logical clusters where LCN 0 is the boot sector of the volume. The logical clusters map onto different logical clusters if there are bad clusters. The actual logical cluster is then translated to a physical cluster, PCN, or LBA (the first sector of the physical cluster) by summing the number of hidden sectors (the sector number of the boot sector relative to the start of the disk) and then adding it to LCN*(sectors per cluster in the volume). PCN = hidden sectors / (sectors per cluster in the volume) + LCN. LBA = hidden sectors + LCN*(sectors per cluster in the volume)

What is the maximum value size you can store in redis?

Does anyone know what the maximum value size you can store in redis? I want to use redis as a message queue with celery to store some small documents that need to be processed by a worker on another server, and I want to make sure the documents aren't going to be too big.
I found one page with a reference to 1GB, but when I followed the link on the page for where they got that answer the link wasn't valid anymore. Here is the link:
http://news.ycombinator.com/item?id=1182005
All string values are limited to 512 MiB. This is the size limit you probably care most about.
EDIT: Because keys in Redis are strings, the maximum key size is 512 MiB. The maximum number of keys is 2^32 - 1 = 4,294,967,295.
Values, on the other hand, can vary in size depending on their type. For aggregate data types (i.e. hash, list, set, and sorted set), the maximum value size is 512 MiB for each element, although the data structure itself can have up to 2^32 - 1 elements.
https://redis.io/topics/data-types
https://redis.io/topics/faq#what-is-the-maximum-number-of-keys-a-single-redis-instance-can-hold-and-what-is-the-max-number-of-elements-in-a-hash-list-set-sorted-set
http://groups.google.com/group/redis-db/browse_thread/thread/1c7e33fbc98734b3?fwc=2
Article about Redis Memory Usage can help you to roughly determine how much memory your database would take.
It's in the order of the amount of RAM you have, at least, so unless you plan on puting multi-gigabyte objects in there I wouldn't worry. I've had sets that were hundreds of megabytes big without a problem, but I don't know the exact limits.
A String value can accommodate the size of max 512MB. But according to this link, the size can be increased.