D: How to get free disk space on media/partition? - size

I need to get total and free size on partition. How I can get it with D?
Which functions should I use for it?

On a Posix system (includes Linux, but typically not Windows), you can use the statvfs C function, which is available in D via import core.sys.posix.sys.statvfs;.
Generally speaking, any Posix C function is in import core.sys.posix.something where something is the same as the C #cinlude name just with / swapped out for ..
Anyway, this little program will print out some info about teh filesystem mounted at "/":
import core.sys.posix.sys.statvfs;
import std.stdio;
void main() {
statvfs_t buf;
if(statvfs("/", &buf))
throw new Exception("failed");
writeln(buf.f_bfree * buf.f_bsize, " free bytes");
writeln(buf.f_blocks * buf.f_bsize, " total size in bytes");
writeln((buf.f_bfree * buf.f_bsize) / 1024, " free KB");
writeln((buf.f_blocks * buf.f_bsize) / 1024, " total size in KB");
writeln(100 - (buf.f_bfree * 100 / buf.f_blocks), "% used");
}
It is a pretty easy function call. On Windows, it is different, but similarly easy:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364937%28v=vs.85%29.aspx
just call that function. I can write up an example for that too if you need one though I didn't now because the updated Windows headers that includes it won't be included until the next D release in a couple weeks, so you'd have to extern(Windows) it now... but in a couple weeks you won't have to.

It certainly is operating system specific. On Linux systems, you want the statfs(2) syscall (in C). You might need to make the glue code between C & D yourself.

Related

Megamorphic virtual call - trying to optimize it

we know that there are some techniques that make virtual calls not so expensive in JVM like Inline Cache or Polymorphic Inline Cache.
Let's consider the following situation:
Base is an interface.
public void f(Base[] b) {
for(int i = 0; i < b.length; i++) {
b[i].m();
}
}
I see from my profiler that calling virtual (interface) method m is relatively expensive.
f is on the hot path and it was compiled to machine code (C2) but I see that call to m is a real virtual call. It means that it was not optimised by JVM.
The question is, how to deal with a such situation? Obviously, I cannot make the method m not virtual here because it requires a serious redesign.
Can I do anything or I have to accept it? I was thinking how to "force" or "convince" a JVM to
use polymorphic inline cache here - the number of different types in b` is quite low - between 4-5 types.
to unroll this loop - length of b is also relatively small. After an unroll it is possible that Inline Cache will be helpful here.
Thanks in advance for any advices.
Regards,
HotSpot JVM can inline up to two different targets of a virtual call, for more receivers there will be a call via vtable/itable [1].
To force inlining of more receivers, you may try to devirtualize the call manually, e.g.
if (b.getClass() == X.class) {
((X) b).m();
} else if (b.getClass() == Y.class) {
((Y) b).m();
} ...
During execution of profiled code (in the interpreter or C1), JVM collects receiver type statistics per call site. This statistics is then used in the optimizing compiler (C2). There is just one call site in your example, so the statistics will be aggregated throughout the entire execution.
However, for example, if b[0] always has just two receivers X or Y, and b[1] always has another two receivers Z or W, JIT compiler may benefit from splitting the code into multiple call sites, i.e. manual unrolling:
int len = b.length;
if (len > 0) b[0].m();
if (len > 1) b[1].m();
if (len > 2) b[2].m();
...
This will split the type profile, so that b[0].m() and b[1].m() can be optimized individually.
These are low level tricks relying on the particular JVM implementation. In general, I would not recommend them for production code, since these optimizations are fragile, but they definitely make the source code harder to read. After all, megamorphic calls are not that bad [2].
[1] https://shipilev.net/blog/2015/black-magic-method-dispatch/
[2] https://shipilev.net/jvm/anatomy-quarks/16-megamorphic-virtual-calls/

Objective-c: convert array of uint8 to int32

I'm looking for function which can fast convert array of uint8's to int32's (keeping count of numbers).
There is already such a function to convert uint8 to double in vDSP library:
vDSP_vfltu8D
How can analogous function be implemented on Objective-c (iOS, amd arch)? Pure C solutions accepted too.
In that case, based on the comments above:
ARM's Neon SIMD/Vector library is what you're looking for, but I'm not 100% sure it's supported on iOS. Even if it was, I wouldn't recommend it. You've got a 64-bit architecture on iOS, meaning you would only be able to DOUBLE the speed of your process (because you're converting to int32s).
Now that is if there was a single command that could do this. There isn't. There are a few commands that would allow you to, when used in succession, load the uint8s into a 64-bit register, shift them and zero out the other bytes, and then store those as int32s. Those commands will have more overhead because it takes several operations to do it.
If you really want to look into the commands available, check them out here (again, not sure if they're supported on iOS): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489e/CJAJIIGG.html
The iOS architecture isn't really built for this kind of processing. Vector commands in most cases only become useful when a computer has 256-bit registers, allowing you to load in 32 bytes at a time and operate on them simultaneously. I would recommend you go with the normal approach of converting one at a time in a loop (or maybe unwrap the loop to remove a bit of overhead like so:
//not syntactically correct code
for (int i = 0; i < lengthOfArray; i+=4) {
int32Array[i] = (int32)int8Array[i];
int32Array[i + 1] = (int32)int8Array[i + 1];
int32Array[i + 2] = (int32)int8Array[i + 2];
int32Array[i + 3] = (int32)int8Array[i + 3];
}
While it's a small optimization, it removes 3/4s of the looping overhead. It won't do much, but hey, it's something.
Source: I worked on Intel's SIMD/Vector team, converting C functions to optimize on 256-bit registers. Some things just couldn't be done efficiently, unfortunately.

File reading and checksums in go. Difference between methods

Recently I'm into creating checksums for files in go. My code is working with small and big files. I tried two methods, the first uses ioutil.ReadFile("filename") and the second is working with os.Open("filename").
Examples:
The first function is working with the io/ioutil and works for small files. When I try to copy a big file my ram gets blastet and for a 1.5GB iso it uses 3GB of ram.
func byteCopy(fileToCopy string) {
file, err := ioutil.ReadFile(fileToCopy) //1.5GB file
omg(err) //error handling function
ioutil.WriteFile("2.iso", file, 0777)
os.Remove("2.iso")
}
Even worse when I want to create a checksum with crypto/sha512 and io/ioutil.
It will never finish and abort because it runs out of memory.
func ioutilHash() {
file, _ := ioutil.ReadFile(iso)
h := sha512.New()
fmt.Printf("%x", h.Sum(file))
}
When using the function below everything works fine.
func ioHash() {
f, err := os.Open(iso) //iso is a big ~ 1.5tb file
omg(err) //error handling function
defer f.Close()
h := sha512.New()
io.Copy(h, f)
fmt.Printf("%x", h.Sum(nil))
}
My Question:
Why is the ioutil.ReadFile() function not working right? The 1.5GB file should not fill my 16GB of ram. I don't know where to look right now.
Could somebody explain the differences between the methods? I don't get it with reading the go-doc and examples.
Having usable code is nice, but understanding why its working is way above that.
Thanks in advance!
The following code doesn't do what you think it does.
func ioutilHash() {
file, _ := ioutil.ReadFile(iso)
h := sha512.New()
fmt.Printf("%x", h.Sum(file))
}
This first reads your 1.5GB iso. As jnml pointed out, it continuously makes bigger and bigger buffers to fill it. In the end, And total buffer size is no less than 1.5GB and no greater than 1.875GB (by the current implementation).
However, after that you then make another buffer! h.Sum(file) doesn't hash file. It appends the current hash to file! This may or may not cause yet another allocation.
The real problem is that you are taking that file, now appended with the hash, and printing it with %x. Fmt actually pre-computes using the same type of method jnml pointed out that ioutil.ReadAll used. So it constantly allocated bigger and bigger buffers to store the hex of your file. Since each letter is 4 bits, that means we are talking about no less than a 3GB buffer for that and no greater than 3.75GB.
This means your active buffers may be as big 5.625GB. Combine that with the GC not being perfect and not removing all the intermediate buffers, and it could very easily fill your space.
The correct way to write that code would have been.
func ioutilHash() {
file, _ := ioutil.ReadFile(iso)
h := sha512.New()
h.Write(file)
fmt.Printf("%x", h.Sum(nil))
}
This doesn't do nearly the number the allocations.
The bottom line is that ReadFile is rarely what you want to use. IO streaming (using readers and writers) is always the best way when it is an option. Not only do you allocate much less when you use io.Copy, you also hash and read the disk concurrently. In your ReadFile example, the two resources are used synchronously when they don't depend on each other.
ioutil.ReadFile is working right. It's your fault to abuse the system resources by using that function for things you know are huge.
ioutil.ReadFile is a handy helper for files you're pretty sure in advance that they're going to be small. Like configuration files, most source code files etc. (Actually it's optimizing things for files <= 1e9 bytes, but that's an implementation detail and not part of the API contract. Your 1.5GB file forces it to use slice growing and thus allocating more than one big buffer for your data in the process of reading the file.)
Even your other approach using os.File is not okay. You definitely should be using the "bufio" package for sequential processing of large files, see bufio.NewReader.

ROL / ROR on variable using inline assembly only in Objective-C [duplicate]

This question already has answers here:
ROL / ROR on variable using inline assembly in Objective-C
(2 answers)
Closed 9 years ago.
A few days ago, I asked the question below. Because I was in need of a quick answer, I added:
The code does not need to use inline assembly. However, I haven't found a way to do this using Objective-C / C++ / C instructions.
Today, I would like to learn something. So I ask the question again, looking for an answer using inline assembly.
I would like to perform ROR and ROL operations on variables in an Objective-C program. However, I can't manage it – I am not an assembly expert.
Here is what I have done so far:
uint8_t v1 = ....;
uint8_t v2 = ....; // v2 is either 1, 2, 3, 4 or 5
asm("ROR v1, v2");
the error I get is:
Unknown use of instruction mnemonic with unknown size suffix
How can I fix this?
A rotate is just two shifts - some bits go left, the others right - once you see this rotating is easy without assembly. The pattern is recognised by some compilers and compiled using the rotate instructions. See wikipedia for the code.
Update: Xcode 4.6.2 (others not tested) on x86-64 compiles the double shift + or to a rotate for 32 & 64 bit operands, for 8 & 16 bit operands the double shift + or is kept. Why? Maybe the compiler understands something about the performance of these instructions, maybe the just didn't optimise - but in general if you can avoid assembler do so, the compiler invariably knows best! Also using static inline on the functions, or using macros defined in the same way as the standard macro MAX (a macro has the advantage of adapting to the type of its operands), can be used to inline the operations.
Addendum after OP comment
Here is the i86_64 assembler as an example, for full details of how to use the asm construct start here.
First the non-assembler version:
static inline uint32 rotl32_i64(uint32 value, unsigned shift)
{
// assume shift is in range 0..31 or subtraction would be wrong
// however we know the compiler will spot the pattern and replace
// the expression with a single roll and there will be no subtraction
// so if the compiler changes this may break without:
// shift &= 0x1f;
return (value << shift) | (value >> (32 - shift));
}
void test_rotl32(uint32 value, unsigned shift)
{
uint32 shifted = rotl32_i64(value, shift);
NSLog(#"%8x <<< %u -> %8x", value & 0xFFFFFFFF, shift, shifted & 0xFFFFFFFF);
}
If you look at the assembler output for profiling (so the optimiser kicks in) in Xcode (Product > Generate Output > Assembly File, then select Profiling in the pop-up menu as the bottom of the window) you will see that rotl32_i64 is inlined into test_rotl32 and compiles down to a rotate (roll) instruction.
Now producing the assembler directly yourself is a bit more involved than for the ARM code FrankH showed. This is because to take a variable shift value a specific register, cl, must be used, so we need to give the compiler enough information to do that. Here goes:
static inline uint32 rotl32_i64_asm(uint32 value, unsigned shift)
{
// i64 - shift must be in register cl so create a register local assigned to cl
// no need to mask as i64 will do that
register uint8 cl asm ( "cl" ) = shift;
uint32 shifted;
// emit the rotate left long
// %n values are replaced by args:
// 0: "=r" (shifted) - any register (r), result(=), store in var (shifted)
// 1: "0" (value) - *same* register as %0 (0), load from var (value)
// 2: "r" (cl) - any register (r), load from var (cl - which is the cl register so this one is used)
__asm__ ("roll %2,%0" : "=r" (shifted) : "0" (value), "r" (cl));
return shifted;
}
Change test_rotl32 to call rotl32_i64_asm and check the assembly output again - it should be the same, i.e. the compiler did as well as we did.
Further note that if the commented out masking line in rotl32_i64 is included it essentially becomes rotl32 - the compiler will do the right thing for any architecture all for the cost of a single and instruction in the i64 version.
So asm is there is you need it, using it can be somewhat involved, and the compiler will invariably do as well or better by itself...
HTH
The 32bit rotate in ARM would be:
__asm__("MOV %0, %1, ROR %2\n" : "=r"(out) : "r"(in), "M"(N));
where N is required to be a compile-time constant.
But the output of the barrel shifter, whether used on a register or an immediate operand, is always a full-register-width; you can shift a constant 8-bit quantity to any position within a 32bit word, or - as here - shift/rotate the value in a 32bit register any which way.
But you cannot rotate 16bit or 8bit values within a register using a single ARM instruction. None such exists.
That's why the compiler, on ARM targets, when you use the "normal" (portable [Objective-]C/C++) code (in << xx) | (in >> (w - xx)) will create you one assembler instruction for a 32bit rotate, but at least two (a normal shift followed by a shifted or) for 8/16bit ones.

Serialising and counting a list of values

I need to serialise a large list of values using a custom encoding function (which I have). I've done this and it works, but I'd also like to have it count how many values are being serialised and written to disk whilst still using a relatively constant amount of memory (i.e. it shouldn't need to keep the entire input list around, as it gets very large).
Without the requirement of keeping a count, binary, cereal and blaze-builder all work (using the equivalent of B.writeFile "foo" . runPut . mapM_ encodeValue); but no matter what I try to do with any of these libraries it seems that the resulting ByteString gets kept around in memory until it is finished rather than starting to be written to disk as soon as a chunk is available (even when using toByteStringIO from blaze-builder).
This is a minimal example demonstrating what I've been trying to do:
import Data.Binary
import Data.Binary.Put
import Control.Monad(foldM)
import qualified Data.ByteString.Lazy as B
main :: IO ()
main = do let ns = [1..10000000] :: [Int]
(count,b) = runPutM $ foldM (\ c n -> c `seq` (put n >> return (c+1))) (0 :: Int) ns
B.writeFile "testOut" b
print count
When compiled and run with +RTS -hy, the result is an almost triangular graph dominated by ByteString values.
The only solution I've found so far (that I'm not a big fan of) is to do the looping (either directly or with foldM) in IO using B.appendFile rather than within Put or directly constructing a Builder value, which to me doesn't seem very elegant. Is there a better way?
I'm a bit surprised that toByteStringIO doesn't work, hopefully someone more familiar with that library will provide an answer.
That being said, whenever I want to intermix stream processing with IO actions, I usually find iteratees to be the most elegant solution. This is because they allow for precise control over how much data is processed and retained, and for combining the streaming aspects with other arbitrary IO actions. There are several iteratee implementations on hackage; this example is with "iteratee" because it's the one I'm most familiar with.
import Data.Binary.Put
import Control.Monad
import Control.Monad.IO.Class
import qualified Data.ByteString.Lazy as B
import Data.ByteString.Lazy.Internal (defaultChunkSize)
import Data.Iteratee hiding (foldM)
import qualified Data.Iteratee as I
main :: IO ()
main = do
let ns = [1..80000000] :: [Int]
iter <- enumPureNChunk ns (defaultChunkSize `div` 8)
(joinI $ serializer $ writer "testOut")
count <- run iter
print count
serializer = mapChunks ((:[]) . runPutM . foldM
(\ !cnt n -> put n >> return (cnt+1)) 0)
writer fp = I.foldM
(\ !cnt (len,ck) -> liftIO (B.appendFile fp ck) >> return (cnt+len))
0
There are three parts to this. writer is the "iteratee", i.e. a data consumer. It writes each chunk of data as its received and keeps a running count of the length. serializer is a stream transformer a.k.a. "enumeratee". It takes an input chunk of type [Int] and serializes it to a stream with type [(Int, B.ByteString)] (number of elements, bytestring). Finally enumPureNChunk is the "enumerator", which produces a stream, in this case from the input list. It takes enough elements from the input to fill a single lazy bytestring chunk (I'm on 64bit, divide by 4 for 32bit systems), and then writes them to disk so they can be GC'd.