How to get the LBA (logical block address) of a file from the MFT on an NTFS file system?

I accessed the $MFT file and extracted the file attributes.
Given the file attributes from the MFT, how do I get the LBA of a file from its MFT record on an NTFS file system?
To calculate the LBA, I already know the file's cluster number.
Is it possible to calculate it using the cluster number?

I'm not entirely sure of your question, but if you're simply trying to find the logical location of a file on disk, there are various IOCTLs that will achieve this.
For instance, MFT File records: FSCTL_GET_NTFS_FILE_RECORD
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364568(v=vs.85).aspx
Location on disk of a specific file via HANDLE: FSCTL_GET_RETRIEVAL_POINTERS
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364572(v=vs.85).aspx
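For illustration, here is a minimal sketch of calling FSCTL_GET_RETRIEVAL_POINTERS to dump a file's VCN-to-LCN extents (the path is a placeholder, error handling is minimal, and a heavily fragmented file would need an ERROR_MORE_DATA loop):

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    /* Placeholder path; open the file whose extents we want. */
    HANDLE hFile = CreateFileW(L"C:\\path\\to\\file.bin", GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE,
                               NULL, OPEN_EXISTING, 0, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    STARTING_VCN_INPUT_BUFFER in = { 0 };   /* start mapping at VCN 0 */
    BYTE buf[4096];
    RETRIEVAL_POINTERS_BUFFER *rp = (RETRIEVAL_POINTERS_BUFFER *)buf;
    DWORD bytes;

    if (DeviceIoControl(hFile, FSCTL_GET_RETRIEVAL_POINTERS,
                        &in, sizeof(in), rp, sizeof(buf), &bytes, NULL))
    {
        LONGLONG vcn = rp->StartingVcn.QuadPart;
        for (DWORD i = 0; i < rp->ExtentCount; i++) {
            /* Lcn == -1 marks a sparse hole with no disk location. */
            printf("VCN %lld -> LCN %lld\n", vcn, rp->Extents[i].Lcn.QuadPart);
            vcn = rp->Extents[i].NextVcn.QuadPart;
        }
    }
    CloseHandle(hFile);
    return 0;
}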
If you're trying to parse NTFS on your own, you'll need to follow the $DATA attribute, which will normally be non-resident data runs (unless the file is small enough to be resident within the MFT record). Microsoft's data runs are fairly simple structures: each run starts with a header byte whose two nibbles give the sizes of the run-length and run-offset fields that follow, and the offset is relative to the previous run.
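As a hedged sketch of that decoding (field layout per the standard NTFS run-list format; the names are my own):

#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t length;    /* run length in clusters */
    int64_t  lcn_delta; /* signed LCN offset, relative to the previous run */
    size_t   consumed;  /* bytes consumed from the run list */
} run_t;

static run_t decode_run(const uint8_t *p)
{
    run_t r = { 0, 0, 0 };
    size_t len_size = p[0] & 0x0F;        /* low nibble: size of length field */
    size_t off_size = (p[0] >> 4) & 0x0F; /* high nibble: size of offset field */
    if (p[0] == 0) return r;              /* 0x00 terminates the run list */

    for (size_t i = 0; i < len_size; i++)          /* little-endian length */
        r.length |= (uint64_t)p[1 + i] << (8 * i);
    for (size_t i = 0; i < off_size; i++)          /* little-endian offset */
        r.lcn_delta |= (int64_t)p[1 + len_size + i] << (8 * i);
    /* Sign-extend the offset: it is a signed delta from the previous LCN.
       A run with off_size == 0 is sparse and has no disk location. */
    if (off_size && off_size < 8 && (p[len_size + off_size] & 0x80))
        r.lcn_delta -= (int64_t)1 << (8 * off_size);

    r.consumed = 1 + len_size + off_size;
    return r;
}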

IMHO you should write the code by doing some basic arithmetic rather than using IOCTLs and FSCTLs for everything. You should know the size of your disk and the offset at which the volume starts (or every extent, via IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS) and store those values somewhere. Then just add the LCN times the cluster size to the offset of the extent on the disk.
Most of the time you only have to deal with one extent. When there are multiple extents, you can figure out which extent the cluster falls on: multiply the LCN by the cluster size, then subtract the size of each extent returned by the IOCTL, in the order they are returned; as soon as the next size to subtract is greater than your remaining number, that LCN lies on that extent.
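A minimal sketch of that arithmetic, with placeholder extent values (on a real system they would come from IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS):

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t disk_offset; uint64_t length; } extent_t; /* bytes */

/* Translate an LCN to an absolute byte offset on the physical disk. */
static int64_t lcn_to_disk_offset(uint64_t lcn, uint64_t cluster_size,
                                  const extent_t *ext, int count)
{
    uint64_t volume_offset = lcn * cluster_size; /* offset within the volume */
    for (int i = 0; i < count; i++) {
        if (volume_offset < ext[i].length)
            return (int64_t)(ext[i].disk_offset + volume_offset);
        volume_offset -= ext[i].length;          /* skip past this extent */
    }
    return -1; /* LCN lies beyond the volume's extents */
}

int main(void)
{
    /* Hypothetical single extent: volume starts 1 MiB into the disk. */
    extent_t ext[] = { { 1048576ULL, 53687091200ULL } };
    printf("%lld\n", (long long)lcn_to_disk_offset(1000, 4096, ext, 1));
    return 0;
}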

A file is a single, virtually contiguous unit consisting of virtual clusters (VCNs). The virtual clusters map onto extents (fragments) of logical clusters (LCNs), where LCN 0 is the boot sector of the volume. A logical cluster may in turn be remapped to a different logical cluster if it sits on bad clusters. The logical cluster is then translated to a physical cluster number (PCN), or to an LBA (the first sector of the physical cluster), using the number of hidden sectors (the sector number of the boot sector relative to the start of the disk):
LBA = hidden sectors + LCN * (sectors per cluster in the volume)
PCN = hidden sectors / (sectors per cluster in the volume) + LCN
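A worked example with hypothetical numbers: with 2048 hidden sectors and 8 sectors per cluster, a file starting at LCN 1000 begins at LBA = 2048 + 1000*8 = 10048, i.e. PCN = 2048/8 + 1000 = 1256.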

Related

How to look up the Azure drive type?

When creating/resizing OS and data drives, Azure programmatically picks a drive type for you, e.g. standard SSD:
e4
e10
e15
or magnetic s6, s20, ...
Those show up in the cost analysis the next day.
How can I look up the drive type without waiting 24 hours to see what I paid for?
UPDATE: March 23, here are all the disks I created - I only have two disks now
When defining managed disks you choose the type (Premium SSD, Standard SSD, or Standard HDD) and the size. The type determines the class of storage (P, E, S) and the size then sets the designation. With that in mind, your configuration determines the value that will be used. For example, S10 = Standard HDD 128 GB.
Here is a reference: https://azure.microsoft.com/en-us/pricing/details/managed-disks/
Unfortunately, it is not explicitly displayed in the disk properties, so it looks like the designation is assigned as part of the reporting process as it evaluates the composite values.
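If you have the Azure CLI available, one hedged way to recover the designation is to read the SKU and size from the disk resource and map them to the P/E/S table yourself (the resource group and disk name here are placeholders):
az disk show --resource-group <rg> --name <disk-name> --query "{sku:sku.name, sizeGb:diskSizeGb}" -o table
A 128 GB disk with SKU StandardSSD_LRS, for instance, corresponds to E10 on the pricing page.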
Thanks for the feedback. If we showed the disk type (such as P30, E, S) in the disk configuration under the Disks tab for the VM, would that satisfy your requirement? You should be able to see the disk properties of a particular VM as soon as you have provisioned it.

How to know the configured size of the Chronicle Map?

We use Chronicle Map as persisted storage. As new data arrives all the time, we keep putting new data into the map, so we cannot predict the correct value for net.openhft.chronicle.map.ChronicleMapBuilder#entries(long). Chronicle 3 will not break when we put in more data than expected, but performance will degrade. So we would like to recreate the map with a new configuration from time to time.
Now to the real question: given a Chronicle Map file, how can we know which configuration was used for that file? Then we can compare it with the actual amount of data (the source of that knowledge is irrelevant here) and recreate the map if needed.
entries() is a high-level config, that is not stored internally. What is stored internally is the number of segments, expected number of entries per segment, and the number of "chunks" allocated in the segment's entry space. They are configured via ChronicleMapBuilder.actualSegments(), entriesPerSegment() and actualChunksPerSegmentTier() respectively. However, there is no way at the moment to query the last two numbers from the created ChronicleMap, so it doesn't help much. (You can query the number of segments via ChronicleMap.segments().)
You can contribute to Chronicle-Map by adding getters to ChronicleMap to expose those configurations. Alternatively, you can store the number of entries separately, e.g. in a file alongside the persisted ChronicleMap file.
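A minimal Java sketch of that sidecar approach (the file naming and property key are my own invention, not a Chronicle API):

import java.io.*;
import java.util.Properties;

public class MapConfigSidecar {
    /* Write the entries() value used at creation time next to the map file. */
    public static void save(File mapFile, long entries) throws IOException {
        Properties p = new Properties();
        p.setProperty("entries", Long.toString(entries));
        try (OutputStream out = new FileOutputStream(mapFile.getPath() + ".config")) {
            p.store(out, "ChronicleMap creation config");
        }
    }

    /* Read it back later to decide whether the map should be recreated. */
    public static long load(File mapFile) throws IOException {
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(mapFile.getPath() + ".config")) {
            p.load(in);
        }
        return Long.parseLong(p.getProperty("entries", "-1"));
    }
}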

Get Total Hard Disk Space of Remote Machine Using WMIC

I need to determine the "TOTAL" hard disk space of a remote Windows host using a wmic call.
I have tried executing wmic /node:<IP-ADDRESS> diskdrive get Size on some systems. For most of the systems it worked well, but for a few of them it displayed multiple values, one for each disk present.
H:\>wmic /node:172.22.248.112 diskdrive get size
Size
36273484800
293621594112
In order to get a single value for the total hard disk space (the sum of all of those sizes), what should be done?
Try using:
C:\>wmic /node:<IP-ADDRESS> logicaldisk get Size
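If extra values from removable or network drives get in the way, a WQL filter on DriveType (3 = local fixed disk) may help; you still have to sum the returned values yourself, since wmic does not aggregate:
C:\>wmic /node:<IP-ADDRESS> logicaldisk where "DriveType=3" get Size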

When unloading a table from Amazon Redshift to S3, how do I make it generate only one file?

When I unload a table from Amazon Redshift to S3, it always splits the table into two parts, no matter how small the table. I have read the Redshift documentation on unloading, but found no answer other than that it sometimes splits the table (I've never seen it not do that). I have two questions:
Has anybody ever seen a case where only one file is created?
Is there a way to force Redshift to unload into a single file?
Amazon recently added support for unloading to a single file by using PARALLEL OFF in the UNLOAD statement. Note that you can still end up with more than one file if the data is bigger than 6.2 GB.
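A hedged example (bucket, table, and credentials are placeholders):
unload ('select * from my_table')
to 's3://my-bucket/my_table_'
credentials 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
parallel off;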
By default, each slice creates one file (explanation below). There is a known workaround: adding a LIMIT to the outermost query will force the leader node to process the whole response, so it will create only one file.
SELECT * FROM (YOUR_QUERY) LIMIT 2147483647;
This only works as long as your inner query returns fewer than 2147483647 (2^31 - 1) records, the largest value the LIMIT clause accepts.
How are files created? http://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html
Amazon Redshift splits the results of a select statement across a set of files, one or more files per node slice, to simplify parallel reloading of the data.
So now we know that at least one file per slice is created. But what is a slice? http://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html
The number of slices is equal to the number of processor cores on the node. For example, each XL compute node has two slices, and each 8XL compute node has 16 slices.
It seems that the minimal number of slices is 2, and it grows larger when more nodes, or more powerful nodes, are added.
As of May 6, 2014, UNLOAD queries support a new PARALLEL option. Passing PARALLEL OFF will output a single file if your data is less than 6.2 GB (otherwise the data is split into 6.2 GB chunks).

Difference between blocks and sectors

With reference to this article, there is a line that reads:
Because there are limits to the number of blocks, or drive addresses, that an operating system can address. By defining a block as several sectors, an OS can work with bigger hard drives without increasing the number of block addresses.
What does it mean? What is meant by "operating system can address"? And the subsequent maths isn't clear either. How can 64*512 be less than 64*4?
Look at it this way. Every block that's used in your operating system's file system to store data requires a certain amount of metadata to be stored along with the actual file data you're writing, e.g. timestamps (created, modified), filename, and ownership/permission bits. For files that span multiple blocks, you also have to store the IDs of each of those blocks and the order they're chained together, etc.
Determining block size in an OS is a case of tradeoffs. Every file must occupy at least one block, even if the file is 0 bytes long, so there's something for the file's metadata to be attached to. Unless you can guarantee that your files will ALWAYS be some multiple of the block size in size (e.g. in a 4k block OS, all files are 4k), there will be a certain amount of wastage for the files that don't exactly fit within that block.
Small block sizes are good when you need to store many small files. On the other hand, more blocks = more metadata, so you end up wasting a chunk of your storage system on overhead, tracking the location of all the files.
On the flip side, large blocks mean less metadata, but also mean greater wastage when you're storing small files. e.g. a 1 byte file stored in a 4k block wastes 3.99k of that block.
Each of those blocks must be given an ID number by the OS, so it can be uniquely identified. An OS which uses an 8 bit ID field can track only 256 blocks, and therefore, by extension, only 256 files. But if each of those blocks is actually 1 megabyte in size, then you can store up to 256 megabytes of data.
The article you link to has a typo/logical flaw: they meant 512 bytes, not 512k, so 64*512 bytes is smaller than 64*4k, i.e. 64*4096 bytes. Most hard drives shipped with 512-byte sectors/blocks.
However, as discussed earlier, small blocks mean more metadata. With drive sizes now in the 3+ terabyte range, with 512 byte blocks, you had to have metadata storage for 3TB/512 bytes = 6.44 billion blocks. That's one major waste of space. So now they ship drives with 4k blocks, 8 times larger, so you only need metadata storage for 805 million blocks. The total number of possible files has been cut by a factor of 8, but the reduced amount of metadata means you can actually store a larger amount of useable data.
Incidentally, 6.4 billion blocks is larger than what can be addressed directly by a 32bit system. 2^32 has an upper limit of ~4.2 billion, so older 32bit machines could not use the entirety of a 3TB drive. Hence switching to larger block sizes. 32bit boxes can easily handle 805 million blocks.
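To put approximate numbers on that: with 32-bit block addresses and 512-byte blocks the ceiling is 2^32 * 512 bytes, about 2.2 TB (2 TiB); with 4k blocks it becomes 2^32 * 4096 bytes, about 17.6 TB (16 TiB), comfortably above a 3 TB drive.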