vkUpdateDescriptorSets updates the descriptors immediately, so it seems to me that it's doing a write across PCI-E into device-local memory. I think this is what's happening because there's no concept of the descriptor writes becoming available "by the next command"; the update is immediate, so I assume that's what must be taking place. My question is: if you have a case where you do:
vkUpdateDescriptorSets();
vkCmdDraw();
And then submit the command buffer to a queue, you're essentially doing the transfer/write to GPU twice. However, since you generally don't need the descriptors updated until you submit further commands, I'm wondering whether we can have greater control over this. I've seen there's an extension, vkCmdPushDescriptorSetKHR, which I think does what I want, but according to this answer AMD drivers don't support it.
Is my understanding of this right? I mean, for the descriptor writes to be made immediately, what must essentially be happening is an extra host-GPU synchronisation, right?
And then submit the command buffer to a queue, you're essentially doing the transfer/write to GPU twice.
The result of that depends on whether you're using descriptor indexing or not. If you're not using VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT on the binding(s) being modified, then you have undefined behavior. Outside of the protections provided by that flag, you are specifically disallowed from modifying any descriptor set which is bound to any command buffer until all command buffers in which the descriptor set is bound have completed execution.
So, does this mean that updating a descriptor set performs some kind of immediate memory transfer to GPU-accessible memory? Or does binding the descriptor set cause the transfer? Or does something else entirely happen?
It doesn't matter. Unless you're using descriptor indexing, you are not allowed to modify a bound descriptor set. And if you are using descriptor indexing, you still cannot modify a descriptor set that's being used. It simply changes the definition of "used" to "is part of a command buffer that was submitted." Exactly how those changes happen is up to the implementation.
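To illustrate the descriptor-indexing path, here is a minimal sketch (not from the original question; `device` is a placeholder, error handling is omitted, and it assumes the corresponding descriptorBinding*UpdateAfterBind feature was enabled at device creation) of opting a binding into update-after-bind with VK_EXT_descriptor_indexing / Vulkan 1.2:
/* Sketch only: one combined-image-sampler binding flagged as update-after-bind. */
VkDescriptorSetLayoutBinding binding = {
    .binding         = 0,
    .descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
    .descriptorCount = 1,
    .stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT,
};

VkDescriptorBindingFlags bindingFlags = VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT;

VkDescriptorSetLayoutBindingFlagsCreateInfo bindingFlagsInfo = {
    .sType         = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_BINDING_FLAGS_CREATE_INFO,
    .bindingCount  = 1,
    .pBindingFlags = &bindingFlags,
};

VkDescriptorSetLayoutCreateInfo layoutInfo = {
    .sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
    .pNext        = &bindingFlagsInfo,
    .flags        = VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT,
    .bindingCount = 1,
    .pBindings    = &binding,
};

VkDescriptorSetLayout setLayout;
vkCreateDescriptorSetLayout(device, &layoutInfo, NULL, &setLayout);

/* The descriptor pool the set is allocated from must also be created with
   VK_DESCRIPTOR_POOL_CREATE_UPDATE_AFTER_BIND_BIT. Descriptors in such a
   binding may then be updated after the set has been bound in a recorded
   (but not yet submitted) command buffer, within the rules described above. */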
Under the "Polling versus Run-Loop Scheduling" section of the Stream Programming Guide, the last paragraph says:
It should be pointed out that neither the polling nor run-loop
scheduling approaches are airtight defenses against blocking. If the
NSInputStream hasBytesAvailable method or the NSOutputStream
hasSpaceAvailable method returns NO, it means in both cases that the
stream definitely has no available bytes or space. However, if either
of these methods returns YES, it can mean that there is available
bytes or space or that the only way to find out is to attempt a read
or a write operation (which could lead to a momentary block). The
NSStreamEventHasBytesAvailable and NSStreamEventHasSpaceAvailable
stream events have identical semantics.
So, it seems neither hasBytesAvailable/hasSpaceAvailable nor the stream events provide a guarantee against blocking. Is there any way to get guaranteed non-blocking behaviour with streams? I could create a background thread to get guaranteed non-blocking behaviour, but I want to avoid doing that.
Also, I fail to understand why NSStream can't provide guaranteed non-blocking behaviour given that the low-level APIs (select, kqueue, etc.) can do so. Can someone explain why this is the case?
You either run your reading or writing in a different thread or you can't use NSStream. There are no other ways to get guaranteed non-blocking behavior.
For regular files and sockets you will most likely get non-blocking behavior if you schedule the stream on a run loop. But there are other types of stream that are not implemented on top of a file descriptor. By documenting the base class as not always non-blocking, Apple keeps the option open of implementing streams in a way where they can't guarantee the non-blocking property.
But since we can't check the source code we can only speculate on this. You might want to file a bug with Apple requesting them to update the docs with that information.
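For concreteness, here is a sketch of the run-loop scheduling approach mentioned above (delegate-based; the _inputStream ivar and the handleData: helper are invented for this example, and a read can still block briefly, as the documentation warns):
- (void)startReading:(NSInputStream *)inputStream {
    _inputStream = inputStream;                 // hypothetical ivar holding the stream
    _inputStream.delegate = self;
    [_inputStream scheduleInRunLoop:[NSRunLoop currentRunLoop]
                            forMode:NSDefaultRunLoopMode];
    [_inputStream open];
}

- (void)stream:(NSStream *)aStream handleEvent:(NSStreamEvent)eventCode {
    switch (eventCode) {
        case NSStreamEventHasBytesAvailable: {
            uint8_t buffer[4096];
            // "Has bytes available" is only a hint; this read may still
            // block momentarily for some stream types.
            NSInteger bytesRead = [(NSInputStream *)aStream read:buffer
                                                       maxLength:sizeof(buffer)];
            if (bytesRead > 0) {
                // handleData: is a hypothetical method on this class.
                [self handleData:[NSData dataWithBytes:buffer length:bytesRead]];
            }
            break;
        }
        case NSStreamEventEndEncountered:
        case NSStreamEventErrorOccurred:
            [aStream close];
            [aStream removeFromRunLoop:[NSRunLoop currentRunLoop]
                               forMode:NSDefaultRunLoopMode];
            break;
        default:
            break;
    }
}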
With regard to atomic properties, Apple's documentation says the following:
This means that the synthesized accessors ensure that a value is
always fully retrieved by the getter method or fully set via the
setter method, even if the accessors are called simultaneously from
different threads.
What does "fully retrieved" or "fully set" mean?
Why is "fully retrieved" or "fully set" not enough to guarantee thread safety?
Note: I am aware there are many posts regarding atomicity on SO; please don't tag this as a duplicate unless the ticket specifically addresses the question above. After reading those posts, I still do not fully understand atomic properties.
Atomic means that calls to the getter/setter are synchronized. That way if one thread is setting the property at the same time as another thread is getting it, the one getting the property is guaranteed to get a valid return value. Without it being atomic, it would be possible that the getter retrieves a garbage value, or a pointer to an object that is immediately deallocated. When it's atomic, it will also ensure that if two threads try to set it at the same time, one will wait for the other to finish. If it weren't atomic and two threads tried to set it at the same time, you could end up with a garbage value being written, or possibly objects being over/under retained or over/under released.
So basically, if the property is being set, any other call to set or get it will wait for that set to return. Likewise, if the property is being read, any other call to get or set it will wait until that read finishes.
This is sometimes sufficient for thread safety, depending on what the property is being used for, but often you want more than this level of synchronization. For example, if one block of code on a thread gets the value, makes some change to it, and wants to set it again without some other thread changing it in the meantime, you would have to do additional synchronization to make sure that you hold a lock from before the get until after the subsequent set, as sketched below. You would want to do the same if you wanted to get an object and make changes to that object without another thread modifying it at the same time.
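As a rough sketch (the Counter class and method names here are invented, not from the original post): an atomic property makes each get and each set safe on its own, but it does not make a get-modify-set sequence atomic; something like @synchronized around the whole sequence is one way to achieve that.
#import <Foundation/Foundation.h>

// Sketch only: Counter is a hypothetical class with an atomic property.
@interface Counter : NSObject
@property (atomic) NSInteger count;   // each get and each set is safe on its own
@end

@implementation Counter
// Racy despite the atomic property: another thread can change `count`
// between the get and the set here, and that update is then lost.
- (void)incrementUnsafely {
    self.count = self.count + 1;
}

// Holding one lock across the whole read-modify-write closes that gap,
// provided every writer takes the same lock.
- (void)incrementSafely {
    @synchronized (self) {
        self.count = self.count + 1;
    }
}
@end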
"Fully set" and "fully retrieved" means the code below will always print "0x11111111" or "0x22222222". It will never print things like "0x11112222" or "0x11221122". Without atomic or some other appropriate thread synchronization, those sorts of partial reads or partial updates are allowed for some data types on some CPU architectures.
// Shared variable written and read by all three threads (deliberately not synchronized)
unsigned int x;
// Thread 1
while (true) x = 0x11111111;
// Thread 2
while (true) x = 0x22222222;
// Thread 3
while (true) printf("0x%x\n", x);
It means the value will never be accessed when it's halfway through being written. It will always be one intended value or another, never an incompletely altered bit pattern.
It isn't enough to guarantee thread-safety because ensuring that the value of a variable is either fully written or not written at all is not enough to make all the code that uses that variable thread-safe. For example, there might be two variables that need to be updated together (the classical example is transferring credits from one account to another), but code in another thread could see one variable with the new value and the other with the old.
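To make the account example concrete, here is a sketch (the Bank class and property names are invented): each property is atomic on its own, yet a reader can still observe the transfer halfway done unless the pair of updates and the read share a higher-level lock.
#import <Foundation/Foundation.h>

@interface Bank : NSObject
@property (atomic) NSInteger checking; // individually atomic
@property (atomic) NSInteger savings;  // individually atomic
@end

@implementation Bank
- (void)transfer:(NSInteger)amount {
    @synchronized (self) {              // the pair of updates becomes one unit
        self.checking -= amount;
        self.savings  += amount;
    }
}

- (NSInteger)inconsistentTotal {
    // Each read is atomic, but without taking the same lock this can run
    // between the two updates above and see one new value and one old one.
    return self.checking + self.savings;
}

- (NSInteger)consistentTotal {
    @synchronized (self) {              // same lock as -transfer:
        return self.checking + self.savings;
    }
}
@end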
Very often you'll need to implement synchronization for whole systems, so the guarantees offered by atomic variables often end up not mattering much.
I have a situation where a session of background processing can finish by timing out, user asynchronously cancelling or the session completing. Any of those completion events can run a single shot completion method. The completion method must only be run once. Assume that the session is an instance of an object so any synchronisation must use instance constructs.
Currently I'm using an Atomic Compare and Swap operation on a completion state variable so that each event can test and set the completion state when it runs. The first completion event to fire gets to set the completed state and run the single shot method and the remaining events fail. This works nicely.
However, I can't help feeling that I should be able to do this in a higher-level way. I tried using a lock object (NSLock, as I'm writing this with Cocoa) but then got a warning that I was releasing a lock that was still in the locked state. That is what I want, of course: the lock gets locked once and never unlocked. But I was afraid that system resources representing the lock might get leaked.
Anyway, I'm just interested as to whether anyone knows of a more high level way to achieve a single shot method like this.
sample code for any of the completion events:
if (OSAtomicCompareAndSwapInt(0, 1, &completed))
{
    self.completionCallback();
}
Doing a CAS is almost certainly the right thing to do. Locks are not designed for what you need, they are likely to be much more expensive and are semantically a poor match anyway -- the completion is not "locked". It is "done". A boolean flag is the right representation, and doing a CAS ensures that it is manipulated safely in concurrent scenarios. In C++, I'd use std::atomic_flag for this, maybe check whether Cocoa has anything similar (this just wraps the CAS in a nicer interface, so that you never accidentally use a non-CAS test on the variable, which would be racy).
(edit: in pthreads, there's a function called pthread_once which does what you want, but I wouldn't know about Cocoa; the pthread_once interface is quite unwieldy anyway, in my opinion...)
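If you'd like something slightly higher level than the raw OSAtomic call but still CAS-based, one option is C11's <stdatomic.h>, which Objective-C can use directly. The sketch below (the Session class and ivar names are invented) plays the same role as std::atomic_flag:
#import <Foundation/Foundation.h>
#include <stdatomic.h>

@interface Session : NSObject
@property (nonatomic, copy) void (^completionCallback)(void);
- (void)completeOnce;
@end

@implementation Session {
    atomic_flag _completed;
}

- (instancetype)init {
    if ((self = [super init])) {
        atomic_flag_clear(&_completed);   // start in the "not completed" state
    }
    return self;
}

- (void)completeOnce {
    // test_and_set returns the previous value, so exactly one caller sees
    // "false" and runs the callback; any later caller falls through.
    if (!atomic_flag_test_and_set(&_completed)) {
        self.completionCallback();
    }
}
@end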
I've been using an NSCache to store items that should be cached for performance reasons, since they are rather expensive to recreate. Understanding NSCache behaviour is giving me some headaches though.
I initialize my NSCache as follows:
_cellCache = [[NSCache alloc] init];
[_cellCache setDelegate:self];
[_cellCache setEvictsObjectsWithDiscardedContent:NO]; // never evict cells
The objects held in the cache implement the NSDiscardableContent protocol. Now my problem is the NSCache class does not seem to be working correctly in a couple of instances.
1) First, the smaller issue. The documentation for NSCache's setCountLimit: states:
Setting the count limit to a number less than or equal to 0 will have no effect on the maximum size of the cache.
Can someone shed some light on what this means? I.e., does it mean the cache is effectively unbounded, and will issue discardContentIfPossible messages only when memory is required elsewhere? Or that the limit is effectively zero, and it will issue discardContentIfPossible messages immediately?
The former makes more sense, but testing seems to indicate that the latter is what is happening. If I log calls to the discardContentIfPossible method in my cached object, I see that it is being called almost immediately -- after only two or three dozen items have been added to the cache (each less than 0.5 MB).
Okay. So I then try to set a large count limit -- way more than I will ever need -- by adding the following line:
[_cellCache setCountLimit:10000000];
Then the discardContentIfPossible messages are no longer sent almost immediately. I have to load a lot more content into the cache and use it for a while before these messages start occurring, which makes more sense.
So what is the intended behaviour here?
2) The larger issue. The documentation states:
By default, NSDiscardableContent objects in the cache are automatically removed from the cache if their content is discarded, although this automatic removal policy can be changed. If an NSDiscardableContent object is put into the cache, the cache calls discardContentIfPossible on it upon its removal.
So I set the eviction policy to NO (as above) so objects are never evicted. Instead, when discardContentIfPossible is called, I clear and release the internal data of the cached object according to special criteria. That is, I may decide not to actually clear and discard the data under certain circumstances (for example, if the item has been very recently used). In such a scenario, I simply return from the discardContentIfPossible method not having discarded anything. The idea is that some other object that isn't recently used will get the message at some point, and it can discard its content instead.
Now, interestingly, all seems to work great for a while of heavy use. Loading lots of content, placing and accessing objects in the cache, etc. After some time though, when I try to access an arbitrary object in the NSCache it's literally not there! Somehow it appears it has been removed -- even though I specifically set the eviction policy to NO.
Okay. So implementing the delegate method cache:willEvictObject: shows it never gets called, which means the object is not actually being evicted. But it's mysteriously disappearing from the NSCache, since it can't be found on future lookups -- or somehow the key it was associated with is no longer mapped to the original object. But how can that happen?
BTW, the key object that I'm associating with my value objects (CALayers) is an NSURL container/wrapper, since I can't use NSURLs directly: NSCache doesn't copy keys, only retains them, and a later NSURL key used for lookup might not be the exact original object (only the same URL string) that was initially used to load the cache with that value. With an NSURL wrapper/container, I can ensure it is the exact original key object that was used to add the original value object to the NSCache.
Can anyone shed some light on this?
Your comment "never evict cells" is not what -setEvictsObjectsWithDiscardedContent: actually promises. It says the cache won't evict cells just because their content was discarded. That's not the same as saying it will never evict cells. You can still run past the maximum size or count, and cells can still be evicted. The advantage of discardable content is that you may be able to discard some of your content without removing yourself from the cache. You would then rebuild the discarded content on demand, but might not have to rebuild the entire object. In some cases this can be a win.
That brings us to your first question. Yes, NSCache starts evicting when it hits the maximum size. That's why it's called the "maximum size." You don't indicate whether this is Mac or iPhone, but in both cases you don't want the cache to grow until memory is exhausted. That's bad on a number of fronts. On the Mac, you're going to start swapping heavily long before memory is exhausted. On iOS, you don't want to start sending memory warnings to every other process just because one process went crazy with its cache. In either case, a cache that is too large provides poor performance. So objects put into an NSCache should always expect to be evicted at any time.
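To illustrate the "discard the content, keep the object" idea, here is a sketch of a cacheable object conforming to NSDiscardableContent (the ExpensiveCell class, its renderedContent property, and the discard policy are invented for the example; a real implementation would also need to make the access count thread-safe):
#import <Foundation/Foundation.h>

@interface ExpensiveCell : NSObject <NSDiscardableContent>
@property (nonatomic, strong) NSData *renderedContent; // the expensive part
@end

@implementation ExpensiveCell {
    NSInteger _accessCount;   // not synchronized in this sketch
}

- (BOOL)beginContentAccess {
    if (self.renderedContent == nil) {
        return NO;            // content was discarded; caller must rebuild it
    }
    _accessCount++;
    return YES;
}

- (void)endContentAccess {
    if (_accessCount > 0) {
        _accessCount--;
    }
}

- (void)discardContentIfPossible {
    if (_accessCount == 0) {
        self.renderedContent = nil;   // drop the expensive data, keep the wrapper
    }
}

- (BOOL)isContentDiscarded {
    return self.renderedContent == nil;
}
@end
The wrapper object stays in the cache (subject to the count and size limits described above); only its expensive payload is released and rebuilt on demand.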
Maybe my question is stupid, but I would like to get it cleared up. We know that functions are loaded into memory only once, and when you create new objects only instance variables get created; functions are never duplicated. My question is: suppose there is a server and all clients access a method named createCustomer(). Suppose all the clients do something that fires createCustomer on the server. If the method is in the middle of execution and a new client fires it, will the new request be put on hold, or will it also start executing the method? How does this all get managed when there is only one copy of the function in memory? No book seems to answer this type of question, so I am posting here, where I am bound to get answers :).
Functions are code, which is then executed in a memory context. The code can be run many times in parallel (literally in parallel on a multi-processor machine), but each of those calls executes in a different memory context (from the point of view of local variables and such). At a low level this works because functions reference their local variables as offsets into a region of memory called the "stack", which is pointed to by a processor register called the "stack pointer" (or, in some interpreted languages, a higher-level analogue of that register), and the value of this register is different for different calls to the function. So the x local variable in one call to function foo is in a different location in memory than the x local variable in another call to foo, regardless of whether those calls happen simultaneously.
Instance variables are different: they're referenced via a reference (pointer) to the memory allocated to the instance of an object. Two running copies of the same function might access the same instance variable at exactly the same time; similarly, two different functions might do so. This is why we get into "threading" or concurrency issues, synchronization, locks, race conditions, etc. But it's also one reason things can be highly efficient.
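As a sketch of that distinction (the CustomerService class and its ivar are invented here; createCustomer comes from the question): the local variable lives on each calling thread's stack, while the instance variable is a single shared location.
#import <Foundation/Foundation.h>

@interface CustomerService : NSObject {
    NSInteger _totalCustomersCreated;   // one shared location per instance
}
- (void)createCustomer:(NSString *)name;
@end

@implementation CustomerService
- (void)createCustomer:(NSString *)name {
    // Local variable: every call, on every thread, gets its own copy on that
    // thread's stack, so simultaneous calls never collide here.
    NSString *greeting = [NSString stringWithFormat:@"Welcome, %@", name];
    NSLog(@"%@", greeting);

    // Instance variable: shared by all simultaneous calls on the same object,
    // so this read-modify-write is where races can appear.
    _totalCustomersCreated += 1;
}
@end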
It's called "multi-threading". If each request has its own thread, and the object contains mutable data, each client will have the opportunity to modify the state of the object as they see fit. If the person who wrote the object isn't mindful of thread safety you could end up with an object that's in an inconsistent state between requests.
This is a basic threading issue; you can look it up at http://en.wikipedia.org/wiki/Thread_(computer_science).
Instead of thinking in terms of the code that is executed, try to think in terms of the memory context of the thread that executes it. It does not matter where the actual code happens to be or what it is, or whether it is the same code, a duplicate, or something else.
Basically, the function can be called while an earlier call to it is still in progress. The two calls are independent and may even run in parallel (on a multicore machine). This independence is achieved by giving each thread its own stack (and each process its own virtual address space).
There are ways to synchronize calls, so additional callers have to wait until the first call finishes. This is also explained in the above link.
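For example (a sketch in Objective-C, reusing the createCustomer: name from the question and a hypothetical shared counter ivar), a lock such as @synchronized makes additional callers wait until the first call leaves the protected section:
// Sketch: inside a class whose instances keep a shared _totalCustomersCreated ivar.
- (void)createCustomer:(NSString *)name {
    @synchronized (self) {
        // Only one thread at a time executes this block for a given object;
        // any other caller blocks here until the first call exits the block.
        _totalCustomersCreated += 1;
    }
}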