How to pre-allocate volumes on specified volume servers to balance volumes? - seaweedfs

I have a master server M1 and three volume servers V1, V2, V3 in my cluster, and I want to add three more volume servers. V1, V2, V3 already have 30 volumes, while the newly added V4, V5, V6 have none.
Now I want new files to be stored on V4, V5, V6, but after some time I see that no new volumes appear under V4, V5, V6. Of course, volume.balance would rebalance things, but then the URLs to the files would change. The URLs look like V1address:V1port/somefid and may become V5address:V5port/somefid after balancing.
If I pre-allocate volumes, the pre-allocated ones are spread randomly across all volume servers. How do I pre-allocate volumes only on V4, V5, V6?

The volumes are expected to move around. Use the volume id to look up the location, and then resolve it to the volume server address.
To explicitly create volumes on specific servers, see https://github.com/chrislusf/seaweedfs/wiki/Optimization#increase-concurrent-writes
curl "http://localhost:9333/vol/grow?count=12&dataCenter=dc1&rack=rack1"
curl "http://localhost:9333/vol/grow?count=12&dataCenter=dc1&rack=rack1&dataNode=node1"
(Quote the URL so the shell does not interpret the & characters.)
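To avoid hard-coding V1address:V1port in the file URLs, you can resolve the fid through the master at request time. Here is a minimal sketch in Python, assuming the master runs at localhost:9333 and using a hypothetical fid:

```python
import requests

MASTER = "http://localhost:9333"  # assumed master address

def resolve_fid_url(fid):
    """Resolve a fid like '3,01637037d6' to a full URL via the master's /dir/lookup API."""
    volume_id = fid.split(",")[0]
    resp = requests.get(f"{MASTER}/dir/lookup", params={"volumeId": volume_id})
    locations = resp.json()["locations"]
    # Any returned location can serve the file; pick the first one here.
    return f"http://{locations[0]['url']}/{fid}"

print(resolve_fid_url("3,01637037d6"))  # hypothetical fid
```

This way the URL stays correct even after volume.balance moves the volume to another server.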

Related

How to lookup Azure drive type?

When creating/resizing OS and data drives, Azure programmatically picks a drive type for you, e.g. Standard SSD:
e4
e10
e15
or magnetic s6, s20, ...
Those show up in the cost analysis the next day.
How do I look up the drive type without waiting 24 hours to see what I paid for?
UPDATE: March 23, here are all the disks I created - I only have two disks now
When defining managed disks you choose the type (Premium SSD, Standard SSD, or Standard HDD) and the size. The type determines the class of storage (P, E, or S) and the size then sets the tier description. With that in mind, your configuration determines the value that will be used. For example, S10 = Standard HDD 128 GB and E10 = Standard SSD 128 GB.
Here is a reference: https://azure.microsoft.com/en-us/pricing/details/managed-disks/
Unfortunately, it is not explicitly displayed in the disk properties so it looks like the description is assigned as part of the reporting process as it evaluates the composite values.
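If you want to derive the tier name yourself along the lines of that answer, here is a rough Python sketch using the azure-mgmt-compute SDK. The resource names and the partial size table are assumptions for illustration only:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Partial size-to-tier-number table (GiB -> tier suffix); extend as needed.
TIER_NUMBER = {32: 4, 64: 6, 128: 10, 256: 15, 512: 20, 1024: 30, 2048: 40, 4096: 50}
TIER_LETTER = {"Premium": "P", "StandardSSD": "E", "Standard": "S"}

def disk_tier(subscription_id, resource_group, disk_name):
    """Derive the billing tier label (e.g. E10) from a managed disk's SKU and size."""
    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)
    disk = client.disks.get(resource_group, disk_name)
    letter = TIER_LETTER[disk.sku.name.split("_")[0]]   # e.g. "StandardSSD_LRS" -> "E"
    # Billing rounds the provisioned size up to the next defined tier size.
    size = min(s for s in TIER_NUMBER if s >= disk.disk_size_gb)
    return f"{letter}{TIER_NUMBER[size]}"

# Example (hypothetical names):
# print(disk_tier("<subscription-id>", "my-rg", "my-data-disk"))
```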
Thanks for the feedback. If we showed the disk type (such as P30, E, S) in the disk configuration under the Disks tab for the VM, would that satisfy your requirement? You should be able to see the disk properties of a particular VM as soon as you have provisioned it.

How to calculate redis memory used percentage on ElastiCache

I want to monitor my Redis cache cluster on ElastiCache. From AWS/ElastiCache I am able to get metrics like FreeableMemory and BytesUsedForCache. If I am not wrong, BytesUsedForCache is the memory used by the cluster (assuming there is only one node in the cluster). I want to calculate the percentage of memory used. Can anyone help me get the percentage of memory used in Redis?
We had the same issue since we wanted to monitor the percentage of ElastiCache Redis memory that is consumed by our data.
As you wrote correctly, you need to look at BytesUsedForCache - that is the amount of memory (in bytes) consumed by the data you've stored in Redis.
The other two important numbers are
The available RAM of the AWS instance type you use for your ElastiCache node, see https://aws.amazon.com/elasticache/pricing/
Your value for parameter reserved-memory-percent (check your ElastiCache parameter group). That's the percentage of RAM that is reserved for "nondata purposes", i.e. for the OS and whatever AWS needs to run there to manage your ElastiCache node. By default this is 25 %. See https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/redis-memory-management.html#redis-memory-management-parameters
So the total available memory for your data in ElastiCache is
(100 - reserved-memory-percent) / 100 * instance-RAM-size
(In our case, we use instance type cache.r5.2xlarge with 52.82 GB RAM, and we have the default setting of reserved-memory-percent = 25%.
Checking with the INFO command in Redis, I see that maxmemory_human = 39.61 GB, which is equal to 75% of 52.82 GB.)
So the ratio of used memory to available memory is
BytesUsedForCache / ((100 - reserved-memory-percent) / 100 * instance-RAM-size)
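Put as a small worked example in Python, using the cache.r5.2xlarge figures from our case; the BytesUsedForCache value is made up for illustration:

```python
# Worked example of the ratio above, assuming cache.r5.2xlarge (52.82 GB RAM)
# and the default reserved-memory-percent of 25.
instance_ram_bytes = 52.82 * 1024**3          # per the ElastiCache pricing page
reserved_memory_percent = 25                   # from the parameter group
bytes_used_for_cache = 30 * 1024**3            # hypothetical CloudWatch BytesUsedForCache value

available_for_data = (100 - reserved_memory_percent) / 100 * instance_ram_bytes
used_percentage = bytes_used_for_cache / available_for_data * 100
print(f"{used_percentage:.1f}% of the usable cache memory is in use")  # ~75.7%
```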
By combining the FreeableMemory and BytesUsedForCache metrics, you get the memory available to ElastiCache in non-cluster mode (not sure if this applies to cluster mode too).
Here is the NRQL we're using to monitor the cache:
SELECT Max(`provider.bytesUsedForCache.Sum`) / (Max(`provider.bytesUsedForCache.Sum`) + Min(`provider.freeableMemory.Sum`)) * 100 FROM DatastoreSample WHERE provider = 'ElastiCacheRedisNode'
This is based on the following:
FreeableMemory: The amount of free memory available on the host. This is derived from the RAM, buffers and cache that the OS reports as freeable. (AWS CacheMetrics, host-level)
BytesUsedForCache: The total number of bytes allocated by Redis for all purposes, including the dataset, buffers, etc. This is derived from the used_memory statistic in Redis INFO. (AWS CacheMetrics, Redis)
So BytesUsedForCache (the memory already used by Redis) + FreeableMemory (the memory Redis can still use) = the total memory Redis can use.
With the release of the 18 additional CloudWatch metrics, you can now use DatabaseMemoryUsagePercentage and see the percentage of memory utilization in redis.
View more about the metric in the memory section here
You would have to calculate this based on the size of the node you have selected. See these 2 posts for more information.
Pricing doc gives you the size of your setup.
https://aws.amazon.com/elasticache/pricing/
https://forums.aws.amazon.com/thread.jspa?threadID=141154

carbon-relay Replication across Datacenters

I recently "inherited" a Carbon/Graphite setup from a colleague which I have to redesign. The current setup is:
Datacenter 1 (DC1): 2 servers (server-DC1-1 and server-DC1-2) with 1 carbon-relay and 4 carbon caches
Datacenter 2 (DC2): 2 servers (server-DC2-1 and server-DC2-2) with 1 carbon-relay and 4 carbon caches
All 4 carbon-relays are configured with REPLICATION_FACTOR = 2, consistent hashing, and all 16 carbon-caches as destinations (2 DCs * 2 servers * 4 caches). This has the effect that some metrics exist on only 1 server (both replicas were probably hashed to different caches on the same server). With over 1 million metrics, this problem affects about 8% of all metrics.
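To see why this happens, here is a minimal consistent-hash-ring sketch in Python. It is not carbon's actual implementation, just an illustration: with replication 2 over a flat list of all 16 caches, nothing stops both replicas of a metric from landing on caches that live on the same physical server.

```python
import hashlib
from bisect import bisect

# Flat destination list as in the current setup: 2 DCs x 2 servers x 4 caches.
DESTS = [f"server-DC{dc}-{srv}:cache-{inst}"
         for dc in (1, 2) for srv in (1, 2) for inst in "abcd"]

def build_ring(dests, points=100):
    """Build a simple hash ring: many points per destination, sorted by hash."""
    return sorted((int(hashlib.md5(f"{d}:{i}".encode()).hexdigest(), 16), d)
                  for d in dests for i in range(points))

def pick(ring, metric, replicas=2):
    """Pick `replicas` distinct destinations clockwise from the metric's hash point."""
    h = int(hashlib.md5(metric.encode()).hexdigest(), 16)
    chosen, i = [], bisect(ring, (h,))
    while len(chosen) < replicas:
        dest = ring[i % len(ring)][1]
        if dest not in chosen:
            chosen.append(dest)
        i += 1
    return chosen

ring = build_ring(DESTS)
print(pick(ring, "collectd.host42.cpu.idle"))
# Both replicas may sit on the same server-DCx-y -> no cross-server redundancy.
```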
What I would like to do is a multi-tiered setup with redundancy, so that I mirror all metrics across the datacenters and inside the datacenter I use consistent hashing to distribute the metrics evenly across 2 servers.
For this I need help with the configuration (mainly) of the relays. Here is a picture of what I have in mind:
The clients would send their data to the tier1relays in their respective Datacenters ("loadbalancing" would occur on client side, so that for example all clients with an even number in the hostname would send to tier1relay-DC1-1 and clients with an odd number would send to tier1relay-DC1-2).
The tier2relays use consistent hashing to distribute the data in the datacenter evenly across the 2 servers. For example the "pseudo" configuration for tier2relay-DC1-1 would look like this:
RELAY_METHOD = consistent-hashing
DESTINATIONS = server-DC1-1:cache-DC1-1-a, server-DC1-1:cache-DC1-1-b, (...), server-DC1-2:cache-DC1-2-d
What I would like to know: how do I tell tier1relay-DC1-1 and tier1relay-DC1-2 that they should send all metrics to the tier2relays in DC1 and DC2 (replicate the metrics across the DCs) and do some kind of "loadbalancing" between tier2relay-DC1-1 and tier2relay-DC1-2.
On another note: I also would like to know what happens inside the carbon-relay if I use consistent hashing, but one or more of the destinations are unreachable (server down) - do the metrics get hashed again (against the reachable caches) or will they simply be dropped for the time? (Or to ask the same question from a different angle: when a relay receives a metric does it do the hashing of the metric based on the list of all configured destinations or based on the currently available destinations?)
https://github.com/grobian/carbon-c-relay
It does exactly what you need, and it also gives you a great boost in performance.

CloudFront: Cost estimate

Have to come up with a proposal to use Amazon S3 with CloudFront as CDN.
One of the important things is to do a cost estimate. I read the AWS website and forums and used their calculator, but couldn't arrive at a final (approximate) number that I am confident of. Honestly, I got confused between terms like "Data Transfer Out" and "GET and Other Requests", and whether I need to fill in the details for both Amazon S3 and Amazon CloudFront and then sum the totals.
So need help here to estimate my monthly bill.
I will be using S3 to store files (mostly images)
I will be configuring cloud front with my S3 bucket to deliver the content.
Most of the client base (almost 95%) is in US.
Average file size: 500KB
Average number of files stored on S3 monthly: 80000 (80K)
Approx number of total users requesting the files monthly, or approx number of total requests to fetch a file from CloudFront: 30 million monthly
There will be some invalidation requests per month (let's say 1000)
Would be great if I can get more understanding as to how my monthly bill will be calculated and what approximately it will be.
Also, with the above data and estimates, an approximation of the monthly bill if I use Akamai or Rackspace instead would be great.
I'll throw another number into the ring.
Using http://calculator.s3.amazonaws.com/calc5.html
CloudFront
data transfer out
0.5MB x 30 million = ~15,000GB
Average size 500kb
1000 invalidation requests
95% US
S3
Storage
80K x 0.5MB = ~40GB
requests
30million
My initial result is $1,413. As #user2240751 noted, a factor of safety of 2 isn't unreasonable, so that's in the $1,500 - $3,000/month range.
I'm used to working with smaller numbers, but the final amount is always more than you might expect because of extra requests and data transfer.
Corrections or suggestions for improvements welcome!
Good luck
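To make the arithmetic behind that estimate explicit, here is a back-of-the-envelope sketch in Python. The per-GB and per-request rates are placeholder assumptions for illustration only; plug in the current numbers from the AWS pricing pages:

```python
# Back-of-the-envelope CloudFront + S3 estimate. All rates below are ASSUMED
# placeholder prices (US region); check the current pricing pages before relying on them.
requests_per_month = 30_000_000
avg_object_mb = 0.5
data_out_gb = requests_per_month * avg_object_mb / 1000        # ~15,000 GB

cloudfront_per_gb = 0.085          # assumed $/GB out to US edge locations
cloudfront_per_10k_requests = 0.0075
s3_storage_gb = 80_000 * avg_object_mb / 1000                  # ~40 GB of new objects
s3_per_gb_month = 0.023
invalidations = 1000               # the first 1,000 paths/month are typically free

total = (data_out_gb * cloudfront_per_gb
         + requests_per_month / 10_000 * cloudfront_per_10k_requests
         + s3_storage_gb * s3_per_gb_month)
print(f"~${total:,.0f}/month before any safety factor")        # roughly $1,300
```

Data transfer out dominates the bill; the request and storage charges are comparatively small at these volumes.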
The S3 put and get request fields (in your case) should be restricted to the number of times you are likely to call / update the files in S3 from your application only.
To calculate the Cloudfront service costs, you should work out the rough outbound bandwidth of your page load (number of objects served from cloudfront per page - then double it - to give yourself some headroom), and fill in the rest of the fields.
Rough calc.
500GB data out (guess)
500k average object size
1000 invalidation requests
95% to US based edge location
5% to Europe based edge location
Comes in at $60.80 + your S3 costs.
I think the maths here is wrong: 0.5MB * 30,000,000 is 14,503GB, NOT 1,500GB - that's a factor of 10 out, unless I'm missing something.
Which means your monthly costs are going to be around $2000 not $200

How to get the LBA (logical block address) of a file from the MFT on an NTFS file system?

I accessed the $MFT file and extracted the file attributes.
Given the file attributes from the MFT, how do I get the LBA of a file from its MFT record on an NTFS file system?
To calculate the LBA, I know the cluster number of the file.
Is it possible to calculate it using the cluster number?
I'm not entirely sure of your question, but if you're simply trying to find the logical location of a file on disk, there are various IOCTLs that will achieve this.
For instance, MFT File records: FSCTL_GET_NTFS_FILE_RECORD
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364568(v=vs.85).aspx
Location on disk of a specific file via HANDLE: FSCTL_GET_RETRIEVAL_POINTERS
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364572(v=vs.85).aspx
If you're trying to parse NTFS on your own, you'll need to follow the $DATA attribute, which will always be non-resident data runs (unless it's a small file that can be resident within the MFT). Microsoft's data runs are fairly simple structures: the two nibbles of each run's header byte specify how many bytes encode the length and the offset of that run.
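As a rough illustration of that layout, here is a sketch of a data run parser in Python; it assumes you already have the raw run bytes from a non-resident $DATA attribute:

```python
def parse_data_runs(buf):
    """Parse NTFS data runs into (starting LCN, cluster count) pairs.

    The low nibble of each run's header byte is the size of the length field,
    the high nibble is the size of the (signed, relative) offset field.
    """
    runs, pos, lcn = [], 0, 0
    while buf[pos] != 0:                       # a zero header byte terminates the list
        header = buf[pos]
        len_size, off_size = header & 0x0F, header >> 4
        length = int.from_bytes(buf[pos + 1:pos + 1 + len_size], "little")
        offset = int.from_bytes(buf[pos + 1 + len_size:pos + 1 + len_size + off_size],
                                "little", signed=True)
        # Note: off_size == 0 marks a sparse (unallocated) run; not handled here.
        lcn += offset                          # offsets are relative to the previous run
        runs.append((lcn, length))
        pos += 1 + len_size + off_size
    return runs
```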
IMHO you should write the code by doing some basic arithmetic rather than using IOCTLs and FSCTLs for everything. You should know the size of your disk and the offset from which a volume starts (or every extent by using IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS) and store those values somewhere. Then just add the LCN times the size of a cluster to the offset of the extent on the disk.
Most of the time you just have to deal with one extent. When you have multiple extents, you can figure out which extent the cluster is on by multiplying the LCN by the cluster size and then subtracting the size of each extent returned by the IOCTL, in the order they are returned; when the next number to subtract is greater than your current number, that LCN is on that extent.
A file is a single virtually contiguous unit consisting of virtual clusters. These virtual clusters map onto extents (fragments) of logical clusters, where LCN 0 is the boot sector of the volume. Logical clusters are remapped to different logical clusters if there are bad clusters. The logical cluster is then translated to a physical cluster number (PCN), or to an LBA (the first sector of that physical cluster), using the number of hidden sectors (the sector number of the boot sector relative to the start of the disk): PCN = hidden sectors / (sectors per cluster in the volume) + LCN, and LBA = hidden sectors + LCN * (sectors per cluster in the volume).
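Putting that arithmetic into a short Python sketch, with made-up geometry values; replace them with the real values read from the volume's boot sector and the data runs you parsed:

```python
# Translate a file's logical cluster number into an LBA, per the formula above.
# Both geometry values are illustrative assumptions: read hidden_sectors and
# sectors_per_cluster from the NTFS boot sector of the actual volume.
hidden_sectors = 2048        # sector offset of the volume's boot sector on the disk
sectors_per_cluster = 8      # typical 4 KB clusters with 512-byte sectors

def lcn_to_lba(lcn):
    """LBA of the first sector of logical cluster `lcn`."""
    return hidden_sectors + lcn * sectors_per_cluster

# Example: a data run says the file's first fragment starts at LCN 786432.
print(lcn_to_lba(786432))    # first sector of that fragment
```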