Why is io::stdin().read_line(&mut input) implemented this way? Why not just return a Result containing either the input or an appropriate error? Why pass &mut input? Does this approach have any advantages?
The huge advantage of this is that because you're passing a mutable reference to an existing String, you can reuse a buffer, or pre-allocate it as needed:
use std::io::{self, BufRead};

// 2 KiB is a good default buffer size, but definitely do
// analyze the situation and adjust its size accordingly
let mut buffer = String::with_capacity(2048);
// Lock our standard input to eliminate synchronization overhead (unlocks when dropped)
let mut stdin = io::stdin().lock();
// Read our first line.
stdin.read_line(&mut buffer)?;
// This is a stand-in for any function that takes an &str
process_line_1(&buffer);
// Discard the data we've read, but retain the buffer that we have
buffer.clear();
// Reading a second line will reuse the memory allocation:
stdin.read_line(&mut buffer)?;
process_line_2(&buffer);
Remember: allocating too much is a lot more efficient than allocating too often. Sharing a buffer across different functions can be a little unwieldy due to Rust's borrowing rules (my advice is to keep a "cache struct" of empty, pre-allocated buffers for a specific function or a collection of APIs). But if you're creating and destroying the buffer within one function, there's minimal work required to set up this kind of reuse, and caching allocations like this can yield anywhere from modest to substantial performance benefits.
What's great about this API is that it not only enables buffer reuse in a clean way, it also encourages it. If you have several .read_line() calls in a row, it immediately feels wrong to create a new buffer for every call. The API design teaches you how to use it efficiently without saying a word. The takeaway is that this tiny trick doesn't just improve performance of I/O code in Rust, it also attempts to guide beginners towards designing their own APIs in this manner, although allocation reuse is sadly often overlooked in third-party APIs. [citation needed]
I believe it has to do with where the line that was read will live: on the heap, on the caller's stack, or wherever else the caller wants it to be.
Note that a function has no way to return a reference to a value that lives on its own stack, as that value wouldn't live long enough. So the only other options would be to allocate it on the heap or copy the whole thing around, neither of which is desirable from the caller's point of view.
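For comparison, here's what that constraint looks like in C, where the broken version compiles but produces a dangling pointer (function names here are just illustrative):

#include <stdio.h>

/* Returning a pointer to a local: the classic dangling-pointer bug.
   Rust rejects the equivalent at compile time. */
const char *broken(void) {
    char buf[32] = "hello";
    return buf;                   /* buf's stack frame is gone after return */
}

/* The fix mirrors read_line's design: the caller owns the storage. */
void fine(char *out, size_t cap) {
    snprintf(out, cap, "hello");  /* writes into caller-owned memory */
}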
(Please take into account that I am a rust beginner myself, so this answer may be totally wrong. In which case I'm ready to delete it.)
I've got a chunk of memory in a Buf I want to pass in to a C library, but the library will be using the memory beyond the lifetime of a single call.
I understand that can be problematic since the Garbage Collector can move memory around.
For passing in a Str, the NativeCall docs say "If the C function requires the lifetime of a string to exceed the function call, the argument must be manually encoded and passed as CArray[uint8]" and have an example of doing that, essentially:
my $array = CArray[uint8].new($string.encode.list);
My question is: Must I do the same thing for a Buf, in case it gets moved by the GC? Or will the GC leave my Buf where it sits? For a short string, that isn't a big deal, but for a large memory buffer, it could potentially be an expensive operation. (See, for example, Archive::Libarchive, to which you can pass a Buf containing a tar file.) Is that code problematic?
multi method open(Buf $data!) {
my $res = archive_read_open_memory $!archive, $data, $data.bytes;
...
Is there (could there be? should there be?) some sort of trait on a Buf that tells the GC not to move it around? I know that could be trouble if I add more data to the Buf, but I promise not to do that. What about for a Blob that is immutable?
You'll get away with this on MoarVM, at least at the moment, provided that you keep a reference to the Blob or Buf alive in Perl 6 for as long as the native code needs it and (in the case of Buf) you don't do a write to it that could cause a resize.
MoarVM allocates the Blob/Buf object inside of the nursery, and will move it during GC runs. However, that object does not hold the data; rather, it holds the size and a pointer to a block of memory holding the values. That block of memory is not allocated using the GC, and so will not move.
+------------------------+
| GC-managed Blob object |
+------------------------+      +------------------------+
| Elements               |----->| Non-GC-managed memory  |
+------------------------+      | (this bit is passed to |
| Size                   |      |  native code)          |
+------------------------+      +------------------------+
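In C terms, the native side only ever sees the box on the right: a raw pointer and a length. Here's a minimal sketch (a made-up library, not libarchive's actual internals) of why a lifetime beyond a single call matters -- the library stashes the pointer and dereferences it on later calls:

#include <stddef.h>

static const unsigned char *stashed;
static size_t stashed_len;

void lib_open(const void *buf, size_t len) {
    stashed = buf;               /* retained after lib_open returns */
    stashed_len = len;
}

int lib_next_byte(size_t i) {
    /* ...and read through later; the memory must not have moved */
    return i < stashed_len ? stashed[i] : -1;
}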
Whether you should rely on this is a trickier question. Some considerations:
So far as I can tell, things could go rather less well if running on the JVM. I don't know about the JavaScript backend. You could legitimately decide that, due to adoption levels, you're only going to worry about running on MoarVM for now.
Depending on implementation details of MoarVM is OK if you just need the speed in your own code, but if working on a module you expect to be widely adopted, you might want to think if it's worth it. A lot of work is put in by both the Rakudo and MoarVM teams to not regress working code in the module ecosystem, even in cases where it can be well argued that it depended on bugs or undefined behavior. However, that can block improvements. Alternatively, on occasion, the breakage is considered worth it. Either way, it's time consuming, and falls on a team of volunteers. Of course, when module authors are responsive and can apply provided patches, it's somewhat less of a problem.
The problem with "put a trait on it" is that the decision - at least on the JVM - seems to need to be made up front at the time that the memory holding the data is allocated. In which case, a portable solution probably can't allow an existing Buf/Blob to be marked up as such. Perhaps a better way will be for I/O-ish things to be asked to give something CArray-like instead, so that zero-copy can be achieved by having the data in the "right kind of memory" in the first place. That's probably a reasonable feature request.
I have read Apple's memory management guide, and think I understand the practices that should be followed to ensure proper memory management in my application.
At present it looks like there are no memory leaks in my code. But as my code grows more complex, I wonder whether there is any particular pattern I should follow to keep track of allocations and deallocations of objects.
Does it make sense to create some kind of global object that is present throughout the execution of the application which contains a count of the number of active objects of a type? Each object could increment the count of their type in their init method, and decrement it in dealloc. The global object could verify at appropriate times whether the count of a particular type is zero or not.
EDIT: I am aware of how to use the leaks tool, as well as how to analyze the project using Xcode. The reason for this post is to keep track of cases which may not be easily detected through leaks or analyze.
EDIT: Also, it seems to make sense to have something like this so that leaks can be detected early in builds by running unit tests that check the global object. I guess that as an inexperienced Objective-C programmer I would benefit from the views of others on this.
Each object could increment the count of their type in their init
method, and decrement it in dealloc.
To do that right, you'll have to do one of the following: 1) override behavior at some common point, such as NSObject's -init, or 2) add the appropriate code to the designated initializer of every single class. Neither seems simple.
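For contrast, the counting itself is trivial in plain C, where you control every create/destroy path (all names here are illustrative); the hard part in Objective-C is hooking those points uniformly:

#include <stdatomic.h>
#include <stdlib.h>

static _Atomic long live_widgets = 0;     /* per-type live-instance count */

typedef struct { int id; } Widget;

Widget *widget_create(void) {
    Widget *w = calloc(1, sizeof *w);
    if (w)
        atomic_fetch_add(&live_widgets, 1);  /* the "init" side */
    return w;
}

void widget_destroy(Widget *w) {
    if (!w) return;
    atomic_fetch_sub(&live_widgets, 1);      /* the "dealloc" side */
    free(w);
}

long widget_live_count(void) {               /* check for zero at a quiet point */
    return atomic_load(&live_widgets);
}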
The global object could verify at appropriate times if the count of a
particular type is zero of not.
Sounds good, but can you elaborate a bit on "appropriate times"? How would you know at any given point in the life of your program which classes should have zero instances? You'd have a pretty good idea that there should be no objects at the end of the program, but Instruments could tell you the same thing in that case.
Objective-C has taken several steps to make memory management much simpler. Use properties and synthesized accessors where you can, as they essentially manage your objects for you. A more recent improvement is ARC, which goes even further toward automating most memory management tasks. You basically let the compiler figure out where to put the memory management calls -- it's like garbage collection without the garbage collector. Learn to use those tools well before you try to invent new ones.
Don't go that route... it's a pain in a single-inheritance language. Most importantly, there are excellent tools at your disposal which you should master before thinking you must create some global counter. The global counter already exists in a few tools -- learn them!
The way you combat it is to learn how to balance and manage everything correctly when it's written. It's really very simple in hindsight.
ARC is another option -- really that just postpones your understanding.
The first "design pattern" I recommend is to use release instead of autorelease where possible (although this is generally more useful for diagnosing over-releases).
Next, run the leaks instrument/util regularly and fix all leaks/zombies immediately.
Third, learn the existing tools as you go! These tools can do really crazy stuff, like record the backtrace of every allocation and every reference count. You can pause your program's execution and view what allocations exist, alloc counts, backtraces, and all sorts of other stats.
Out of pure curiosity, is there a way to free the memory used by a StringBuilder, other than the obvious MyBuilder = New StringBuilder and MyBuilder.Remove(0, Length) (although I guess the latter wouldn't free anything, would it)?
Thanks!
CFP.
The best way is to let it fall out of scope; the GC will take care of the memory.
You are right, using the Remove() method just resets an internal index; it doesn't do anything with the internal array. Not explicitly managing memory is a key property of the .NET framework; the garbage collector sorts it out automatically.
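A rough C sketch of that layout (illustrative, not the actual implementation): clearing just resets the length field, while the backing allocation keeps its capacity until the whole object is collected:

#include <stddef.h>

typedef struct {
    char  *chars;      /* backing array, sized by capacity */
    size_t capacity;   /* allocated size; untouched by remove */
    size_t length;     /* logical size */
} string_builder;

void sb_remove_all(string_builder *sb) {
    sb->length = 0;    /* the data is "gone", the allocation is not */
}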
Technically you can, by calling GC.Collect(). That is however almost always a Really Bad Idea. Yes, you'll make the StringBuilder instance and its internal array disappear, provided there are no references left to it. But it also promotes other objects in your program too early, which means they'll stick around longer than necessary and your program will eventually use more memory.
You can either let it fall out of scope or Remove(0, Length) and then set Capacity to 0 (I think it frees the extra memory if you set Capacity to 0, but I am not entirely sure).
You could always use MyBuilder = Nothing, but I usually just let them fall out of scope.
The way you "free memory" for any object in .NET is to let the GC take care of it.
In the case of a StringBuilder, just set the instance to Nothing* and let it be.
I think you might be confused about what it means to "dispose" of objects in .NET. The purpose of the IDisposable interface is to provide a mechanism by which objects can release access to some shared resource, e.g., file streams. Calling Dispose is not the same as releasing memory. Since a StringBuilder does not access any shared resources, it does not need to implement IDisposable.
If you must force memory to be freed, that's what GC.Collect is for. But honestly, I've never encountered a scenario where it made sense to call that yourself (as opposed to letting the GC decide when it makes sense to perform a collection).
*I am assuming it's a class-level variable. If it was only a local variable to begin with, there's no need to set it to Nothing as it will soon fall out of scope anyway.
One of the things that block objects, introduced in Snow Leopard, are good for is situations that would previously have been handled with callbacks. The syntax is much cleaner for passing context around. However, I haven't seen any information on the performance implications of using blocks in this manner. What, if any, performance pitfalls should I look out for when using blocks, particularly as a replacement for a C-style callback?
The blocks runtime looks pretty tight. Block descriptors and functions are statically allocated, so they could enlarge the working set of your program, but you only "pay" in storage for the variables you reference from the enclosing scope. Non-global block literals and __block variables are constructed on the stack without any branching, so you're unlikely to run into much of a slowdown from that. Calling a block is just result = (*b->__FuncPtr)(b, arg1, arg2); this is comparable to result = (*callback_func_ptr)(callback_ctx, arg1, arg2).
If you think of blocks as "callbacks that write their own context structure and handle the ugly packing, memory management, casting, and dereferencing for you," I think you'll realize that blocks are a small cost at runtime and a huge savings in programming time.
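To make the comparison concrete, here is a sketch of the hand-written C callback-plus-context pattern that a block replaces; all names here are illustrative:

#include <stddef.h>

typedef struct { int threshold; } filter_ctx;      /* hand-rolled "captures" */
typedef int (*filter_fn)(void *ctx, int value);

static int above_threshold(void *ctx, int value) {
    filter_ctx *c = ctx;                  /* the casting a block hides */
    return value > c->threshold;
}

static size_t count_matching(const int *xs, size_t n, filter_fn fn, void *ctx) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (fn(ctx, xs[i]))
            count++;
    return count;
}

/* usage: filter_ctx ctx = { .threshold = 10 };
          count_matching(values, n, above_threshold, &ctx);  */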
You might want to check out this blog post and this one. Blocks are implemented as Objective-C objects, except they can be put on the stack, so they don't necessarily have to be malloc'd (if you retain a reference to a block, it will be copied onto the heap, though). They will thus probably perform better than most Objective-C objects, but will have a slight performance hit compared to a simple callback--I'd guess it shouldn't be a problem 95% of the time.
To begin with, let me say that I understand how and why the problem I'm describing can happen. I was a Computer Science major, and I understand overflow/underflow and signed/unsigned arithmetic. (For those unfamiliar with the topic, Apple's Secure Coding Guide discusses integer overflow briefly.)
My question is about reporting and recovering from such an error once it has been detected, and more specifically in the case of an Objective-C framework. (I write and maintain CHDataStructures.) I have a few collections classes that allocate memory for storing objects and dynamically expand as necessary. I haven't yet seen any overflow-related crashes, probably because my test cases mostly use sane data. However, given unvalidated values, things could explode rather quickly, and I want to prevent that.
I have identified at least two common cases where this can occur:
The caller passes a very large unsigned value (or negative signed value) to -initWithCapacity:.
Enough objects have been added to cause the capacity to dynamically expand, and the capacity has grown large enough to cause overflow.
The easy part is detecting whether overflow will occur. (For example, before attempting to allocate length * sizeof(void*) bytes, I can check whether length <= UINT_MAX / sizeof(void*), since failing this test will mean that the product will overflow and potentially allocate a much smaller region of memory than desired. On platforms that support it, the checkint.h API is another alternative.) The harder part is determining how to deal with it gracefully. In the first scenario, the caller is perhaps better equipped (or at least in the mindset) to deal with a failure. The second scenario can happen anywhere in the code that an object is added to the collection, which may be quite non-deterministic.
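Concretely, that guard is just a couple of lines in C (a sketch; I use SIZE_MAX here since malloc takes a size_t, but the idea is the same as the UINT_MAX check above):

#include <stdint.h>
#include <stdlib.h>

/* Refuse the allocation when length * sizeof(void *) would wrap. */
void **alloc_pointer_array(size_t length) {
    if (length > SIZE_MAX / sizeof(void *))
        return NULL;                 /* would overflow; report upstream */
    return malloc(length * sizeof(void *));
}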
My question, then, is this: How is "good citizen" Objective-C code expected to act when integer overflow occurs in this type of situation? (Ideally, since my project is a framework in the same spirit as Foundation in Cocoa, I'd like to model off of the way it behaves for maximum "impedance matching". The Apple documentation I've found doesn't mention much at all about this.) I figure that in any case, reporting the error is a given. Since the APIs to add an object (which could cause scenario 2) don't accept an error parameter, what can I really do to help resolve the problem, if anything? What is really considered okay in such situations? I'm loath to knowingly write crash-prone code if I can do better...
Log and raise an exception.
You can only really be a good citizen to other programmers, not the end user, so pass the problem upstairs and do it in a way that clearly explains what is going on, what the problem is (give numbers) and where it is happening so the root cause can be removed.
There are two issues at hand:
(1) An allocation has failed and you are out of memory.
(2) You have detected an overflow or other erroneous condition that will lead to (1) if you continue.
In the case of (1), you are hosed (unless the failed allocation was both stupidly large and you know that only that one allocation failed). If this happens, the best thing you can do is crash as quickly as possible and leave behind as much evidence as you can. In particular, creating a function that calls abort(), with a name like IAmCrashingOnPurposeBecauseYourMemoryIsDepleted(), will leave evidence in the crash log.
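Such a function is only a few lines; the whole point is getting the name into the backtrace (a sketch):

#include <stdlib.h>

/* The function's name is its error message: it shows up in the crash
   log's backtrace, making the cause obvious at a glance. */
static void IAmCrashingOnPurposeBecauseYourMemoryIsDepleted(void) {
    abort();
}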
If it is really (2), then there are additional questions. Specifically, can you recover from the situation and, regardless, is the user's data still intact? If you can recover, then grand... do so and the user never has to know. If not, then you need to make absolutely sure that the user's data is not corrupt. If it isn't, then save and die. If the user's data is corrupt, then do your best to not persist the corrupted data and let the user know that something has gone horribly wrong. If the user's data is already persisted, but corrupt, then... well... ouch... you might want to consider creating a recovery tool of some kind.
With regards to dynamically growing, array-based storage, there's only so much that can be done. I'm a developer on the Moab scheduler for supercomputers, and we also deal with very large numbers on systems with thousands of processors, thousands of jobs, and massive amounts of job output. At some point, you can't declare a buffer to be any bigger, without creating a whole new data-type to deal with sizes larger than UINT_MAX, or LONG_LONG_MAX etc., at which point on most "normal" machines you'll be running out of stack/heap space anyway. So I'd say log a meaningful error-message, keep the collection from exploding, and if the user needs to add that many things to a CHDataStructures collection, they ought to know that there are issues dealing with very large numbers, and the caller ought to check whether the add was successful (keep track of the size of the collection, etc.).
Another possibility is to convert array-based storage to dynamically allocated, linked-list-based storage when you get to the point when you can't allocate a larger array with an unsigned int or unsigned long. This would be expensive, but would happen rarely enough that it shouldn't be terribly noticeable to users of the framework. Since the limit on the size of a dynamically allocated, linked-list-based collection is the size of the heap, any user that added enough items to a collection to "overflow" it then would have bigger problems than whether or not his item was successfully added.
I'd say the correct thing to do would be to do what the Cocoa collections do. For example, if I have the following code:
int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    NSMutableArray * a = [[NSMutableArray alloc] init];
    for (uint32_t i = 0; i < ULONG_MAX; ++i) {
        for (uint32_t j = 0; j < 10000000; ++j) {
            [a addObject:@"foo"];
        }
        NSLog(@"%lu rounds of 10,000,000 completed", (unsigned long)(i + 1));
    }
    [a release];
    [pool drain];
    return 0;
}
...and just let it run, it will eventually die with EXC_BAD_ACCESS. (I compiled and ran this as a 32-bit app so I could be sure to run out of space when I hit 2**32 objects.)
In other words, throwing an exception would be nice, but I don't think you really have to do anything.
Using assertions and a custom assertion handler may be the best available option for you.
With assertions, you can easily place many checkpoints in your code where you verify that things work as they should. If they don't, by default the assertion macro logs the error (a developer-defined string) and throws an exception. You can also override the default behavior using a custom assertion handler and implement a different way to handle error conditions (even avoiding exceptions entirely).
This approach allows for a greater degree of flexibility and you can easily modify your error handling strategy (throwing exceptions vs. dealing with errors internally) at any point.
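In C terms, the mechanism described above looks roughly like this (a hypothetical macro and handler, not Foundation's actual NSAssert machinery):

#include <stdio.h>
#include <stdlib.h>

typedef void (*assert_handler_fn)(const char *file, int line, const char *msg);

static void default_handler(const char *file, int line, const char *msg) {
    fprintf(stderr, "Assertion failed at %s:%d: %s\n", file, line, msg);
    abort();                      /* the default: log, then bail */
}

static assert_handler_fn current_handler = default_handler;

/* Swap in a custom handler to change the error-handling strategy
   without touching any of the checkpoints. */
void set_assert_handler(assert_handler_fn h) { current_handler = h; }

#define MY_ASSERT(cond, msg) \
    do { if (!(cond)) current_handler(__FILE__, __LINE__, (msg)); } while (0)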
The documentation is very concise: Assertions and Logging.