Is this Vulkan code to find appropriate memory types for allocation safe? [duplicate] - vulkan

Since memoryTypeBits in VkMemoryRequirements is a 32 bit uint, does it mean that there can be no more than 32 memory types?

Basically, yes. But you pretty much never see more than 12 on actual implementations. There just aren't that many combinations of heaps and memory allocation patterns.
At least, not yet. It's possible that extensions and later core features could balloon this past 32 (much like they had to extend the bits for stages to 64 when they added ray tracing). But thus far, they're pretty far from the limit.
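For reference, the standard way to pick a memory type is to fetch VkPhysicalDeviceMemoryProperties and walk its memoryTypes array (at most VK_MAX_MEMORY_TYPES = 32 entries), testing each index against memoryTypeBits. A minimal sketch, assuming the physical device handle and the required property flags come from elsewhere in your code:

#include <stdint.h>
#include <vulkan/vulkan.h>

/* Returns the index of a memory type that is allowed by memoryTypeBits
 * (from VkMemoryRequirements) and has all of the requested property flags,
 * or UINT32_MAX if no such type exists. */
uint32_t find_memory_type(VkPhysicalDevice physical_device,
                          uint32_t memory_type_bits,
                          VkMemoryPropertyFlags required)
{
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physical_device, &props);

    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        /* Bit i of memoryTypeBits is set if memory type i is allowed. */
        int allowed = (memory_type_bits & (1u << i)) != 0;
        int matches = (props.memoryTypes[i].propertyFlags & required) == required;
        if (allowed && matches)
            return i;
    }
    return UINT32_MAX; /* caller should treat this as an allocation error */
}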

Related

What are the consequences of switching from a 32-bit to a 64-bit VM?

I am wondering what the possible pros and cons of switching from a 32-bit to a 64-bit VM are. Most of the answers I have encountered suggest that with a VM you don't need to worry about the switch and you get extra addressable memory, but I am afraid there is more to the problem.
Here is what I came up with:
Pros:
substantially more addressable memory
Cons:
whatever uses pointers under the hood will roughly double its memory usage - I know VMs do optimisations to avoid this problem, but I think it's safe to assume that the same logic under a 64-bit VM will potentially take more memory
all deterministic outputs relying on memory hashes are likely to change
Anything more?
I am happy to take concrete example like JVM.
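To make the pointer-size point concrete, here is a small C sketch (not JVM-specific; struct node is just a hypothetical stand-in for any pointer-heavy data structure). On a typical 32-bit target it is 12 bytes; on a 64-bit target it grows to 24 bytes because of the wider pointers plus alignment padding:

#include <stdio.h>

/* A hypothetical pointer-heavy node type. */
struct node {
    struct node *next;   /* 4 bytes on 32-bit, 8 bytes on 64-bit */
    struct node *prev;   /* ditto */
    int          value;  /* 4 bytes on both */
};

int main(void)
{
    printf("sizeof(void *)      = %u\n", (unsigned)sizeof(void *));
    printf("sizeof(struct node) = %u\n", (unsigned)sizeof(struct node));
    return 0;
}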

Should NSInteger really be used everywhere?

Now there is an iPhone coming with a 64-bit architecture. long becomes 64 bits (while int remains 32-bit), and everywhere NSInteger has been used is now a long, and so 64 bits rather than 32. Twitter has quite a few people saying "I'm glad I've used NSInteger everywhere, not int".
If you need to store a value that doesn't exceed 32 bits (for example in a loop which only loops 25 times), why should a long be used, given that the upper 32 bits are going to be empty?
If the program has worked with 32-bit integers, then what benefit does using 64 bits for integers provide, when it uses up more memory?
Also there will be situations where using a 64-bit integer gives a different result than using a 32-bit integer. So if you use NSInteger then something may work on an iPhone 5S but not on an older device, whereas if int or long is explicitly used then the result will be the same on any device.
If you need to store a value that doesn't exceed 32 bits... why should a long be used?
If you can really make that guarantee, then there is absolutely no reason to value a 64 bit type over a 32 bit one. For simple operations like bounded loops, counters, and general arithmetic, 32-bit integers suffice. But for more complex operations, especially those required of high-performance applications - such as those that perform audio or image processing - the increase in the amount of data the processor can handle in 64 bit modes is significant.
If the program has worked on 32-bit integers, then what benefit does using 64-bits for integers provide, when it uses up more memory?
You make using more memory seem like a bad thing. By doubling the size of pointers, far more memory can be addressed, and the more memory that can be addressed, the less time the OS spends paging. In addition, doubling the width of the registers and data paths means the processor can move and operate on twice as much data in a single go. That does not translate into an automatic doubling of application speed, but for pointer-heavy code, 64-bit arithmetic, and high-throughput work such as audio or image processing it is a real win.
Also there will be situations where using a 64-bit integer gives a different result to using a 32-bit integer? ...
Yes, but not in the way you'd think. 32-bit data types and operations, as well as 64-bit operations (mostly simulated in software, or provided by special hardware or opcodes on 32-bit hosts), are "relatively stable" in terms of their sizes. You cannot make nearly as many guarantees on a 64-bit architecture because different compilers implement different 64-bit data models (see LP64, SILP64, and LLP64). Practically, this means casting a 64-bit type to a 32-bit type - say a pointer to an int - can lose information, whereas casting between two types that are guaranteed to be 64 bits - a pointer and a long on LP64 - is acceptable. 64-bit ARM is usually compiled using LP64 (all ints are 32-bit, all longs are 64-bit). Again, most developers should not be affected by the switch, but when you start dealing with arbitrarily large numbers that you try to store in integers, precision becomes an issue.
For that reason, I'd recommend using NSUInteger and NSInteger in public interfaces, and APIs where there is no inherent bounds checking or overflow guards. For example, a TableView requests an NSUInteger amount of data not because it's worried about 32 and 64 bit data structures, but because it can make no guarantees about the architecture upon which it's compiled. Apple's attempt to make architecture-independent data types is actually a bit of a luxury, considering how little work you have to do to get your code to compile and "just work" in both architectures.
The internal storage for NSInteger can be one of several different backing types; that is why you can use it everywhere and do not need to worry about it, which is the whole point of it.
Apple takes care of the 32-bit/64-bit difference for you: the proper backing type for your variable is selected at compile time via the __LP64__ macro.
#if __LP64__
typedef long NSInteger;
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;
typedef unsigned int NSUInteger;
#endif
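If you want to see which data model you are actually compiling against, here is a plain-C sketch keyed off the same macro (the printed text is only illustrative):

#include <stdio.h>

int main(void)
{
#if __LP64__
    puts("LP64: int is 32-bit, long and pointers are 64-bit");
#else
    puts("__LP64__ not defined: on Apple's 32-bit targets int, long and pointers are all 32-bit");
#endif
    printf("sizeof(int)=%u sizeof(long)=%u sizeof(void*)=%u\n",
           (unsigned)sizeof(int), (unsigned)sizeof(long),
           (unsigned)sizeof(void *));
    return 0;
}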

Why are C variable sizes implementation specific?

Wouldn't it make more sense (for example) for an int to always be 4 bytes?
How do I ensure my C programs are cross platform if variable sizes are implementation specific?
The types' sizes aren't fixed by C because C code needs to be able to compile on embedded systems as well as your average x86 processor and future processors.
You can include <stdint.h> and then use types like the following (see the sketch after this list):
int32_t (32-bit signed integer type)
and
uint32_t (32-bit unsigned integer type)
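A minimal sketch of that approach (the values are arbitrary; <inttypes.h> supplies the matching printf format macros):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t  counter = -42;                 /* exactly 32 bits, signed   */
    uint32_t flags   = 0xDEADBEEFu;         /* exactly 32 bits, unsigned */
    uint64_t big     = UINT64_C(1) << 40;   /* exactly 64 bits           */

    printf("counter = %" PRId32 "\n", counter);
    printf("flags   = 0x%" PRIX32 "\n", flags);
    printf("big     = %" PRIu64 "\n", big);
    return 0;
}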
C is often used to write low-level code that's specific to the CPU architecture it runs on. The size of an int, or of a pointer type, is supposed to map to the native types supported by the processor. On a 32-bit CPU, 32-bit ints make sense, but they won't fit on the smaller CPUs which were common in the early 1970s, or on the microcomputers which followed a decade later. Nowadays, if your processor has native support for 64-bit integer types, why would you want to be limited to 32-bit ints?
Of course, this makes writing truly portable programs more challenging, as you've suggested. The best way to ensure that you don't build accidental dependencies on a particular architecture's types into your programs is to port them to a variety of architectures early on, and to build and test them on all those architectures often. Over time, you'll become familiar with the portability issues, and you'll tend to write your code more carefully.
Becoming aware that not all processors have integer types the same width as their pointer types, or that not all computers use two's-complement arithmetic for signed integers, will help you recognize these assumptions in your own code.
You need to check the size of int in your implementation. Don't assume it is always 4 bytes. Use
sizeof(int)
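If you would rather have the build fail outright when a size assumption is violated, a compile-time check is another option. A sketch assuming a C11 compiler (older compilers need the usual negative-array-size trick instead of _Static_assert):

#include <limits.h>

/* Fail at compile time instead of misbehaving at run time. */
_Static_assert(sizeof(int) >= 4, "this code assumes int is at least 32 bits");
_Static_assert(CHAR_BIT == 8,    "this code assumes 8-bit bytes");

int main(void)
{
    return 0;
}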

Convert OMF16 to OBJ32

Is there a way to convert from OMF 16 bit object file format to COFF 32 bit object file format?
I seriously doubt one exists. Code designed to run in a 16-bit environment is binary-incompatible with 32-bit mode. For example, there is an instruction prefix (the operand-size override, 0x66) that tells the CPU to flip the operand size for the upcoming instruction: in 16-bit code it is needed to use 32-bit operands, and the very same prefix byte is needed to use 16-bit operands in 32-bit code.
Whether a sequence of opcodes is to be interpreted as 16-bit or 32-bit is specified in the segment descriptor.
Anyway, if you have 16-bit code that you'd like to use in 32-bit mode and it has no OS dependencies, you can use it by disassembling it with IDA and then reassembling it with a 32-bit assembler. Of course, only if that's permitted by its license (although this could be fair use, but IANAL).
If the code is also tied to the underlying OS, this could be a lot more difficult and would require rewriting perhaps significant portions of the code.
Presumably the OMF16 code targets 16-bit x86 real mode or 286 protected mode? That being the case, the object file format is not really your issue; the code itself is entirely incompatible, since it uses different register sizes and a different addressing scheme.
Moreover, if the code is targeted at DOS, Win16 or OS/2 (i.e. systems that used OMF16), then retargeting it to a 32-bit platform is not just a case of converting the object file format.
You need to rebuild from source, which, given the tags on the question, is either C or C++. Either that or you have a significant reverse-engineering task on your hands!
I've searched on the net, and found these links:
The first one is a collection of tools:
http://sourceware.org/binutils/
The second one is a tool I think you need:
http://sourceware.org/binutils/docs/binutils/objcopy.html
They may not work in all cases (see bazsi77 above), so just test it.

Optimizing for ARM: Why different CPUs affect different algorithms differently (and drastically)

I was doing some benchmarks of code performance on Windows Mobile devices and noticed that some algorithms were doing significantly better on some hosts and significantly worse on others, even after taking the difference in clock speeds into account.
The statistics for reference (all results are generated from the same binary, compiled by Visual Studio 2005 targeting ARMv4):
Intel XScale PXA270
Algorithm A: 22642 ms
Algorithm B: 29271 ms
ARM1136EJ-S core (embedded in a MSM7201A chip)
Algorithm A: 24874 ms
Algorithm B: 29504 ms
ARM926EJ-S core (embedded in an OMAP 850 chip)
Algorithm A: 70215 ms
Algorithm B: 31652 ms (!)
I checked out floating point as a possible cause, and while algorithm B does use floating-point code, it does not use it in the inner loop, and none of the cores seem to have an FPU.
So my question is: what mechanism may be causing this difference, preferably with suggestions on how to fix/avoid the bottleneck in question?
Thanks in advance.
One possible cause is that the 926 has a shorter pipeline (5 cycles vs. 8 cycles for the 1136, iirc), so branch mispredictions are less costly on the 926.
That said, there are a lot of architectural differences between those processors, too many to say for sure why you see this effect without knowing something about the instructions that you're actually executing.
Clock speed is only one factor. Bus width and latency are as big a factor, if not bigger. Cache is a factor. So is the speed of the media the program is run from, if it runs from media and not memory.
Is this test using any shared libraries at any point, or is it all internal code? Fetching shared libraries from media will vary from platform to platform (even if it is, say, the same SD card).
Is this the same algorithm compiled separately for each platform, or the same binary? You can and will see some compiler-induced variation as well. 50% faster or slower can easily come from the same compiler on the same platform just by varying compiler settings. If possible you want to execute the same binary, and ensure that no shared libraries are used in the loop under test. If it is not the same binary, disassemble the loop under test for each platform and ensure that there are no variations other than register selection.
From the data you have presented it's difficult to pinpoint the exact problem, but I can share some prior experience.
Cache settings: check whether all the processors have the same cache configuration. You need to check both the D-cache and the I-cache.
For analysis, break your code down further, not just per algorithm but at a block level, and try to understand which block causes the bottleneck. Once you find that block, disassemble it and check the assembly; it may help. A minimal block-timing sketch follows below.
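A sketch of that kind of block-level timing on Windows Mobile, using GetTickCount() for millisecond resolution (the block comments are placeholders for your own code):

#include <windows.h>
#include <stdio.h>

void profile_blocks(void)
{
    DWORD t0, t1, t2;

    t0 = GetTickCount();
    /* block 1: e.g. the setup / data-loading part of the algorithm */
    t1 = GetTickCount();
    /* block 2: e.g. the inner processing loop */
    t2 = GetTickCount();

    printf("block 1: %lu ms\n", (unsigned long)(t1 - t0));
    printf("block 2: %lu ms\n", (unsigned long)(t2 - t1));
}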
Looks like the problem is in cache settings or something memory-related (maybe I-Cache "overflow").
Pipeline stalls and branch mispredictions usually produce less significant differences.
You can try to count some basic operations, executed in each algorithm, for example:
number of "easy" arithmetical/bitwise ops (+-|^&) and shifts by constant
number of shifts by variable
number of multiplications
number of "hard" arithmetics operations (divides, floating point ops)
number of aligned memory reads (32bit)
number of byte memory reads (8bit) (it's slower than 32bit)
number of aligned memory writes (32bit)
number of byte memory writes (8bit)
number of branches
something else, don't remember more :)
That will tell you which kinds of operations the 926 handles much more slowly. After this you can check the suspicious blocks, making their use of those operations more or less intensive, and you'll get the answer. A rough counting sketch is shown below.
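Here is what such instrumentation can look like; the counters and the checksum() routine are hypothetical placeholders, and the idea is simply to bump a counter next to each operation class inside your own hot loops:

#include <stdio.h>

static unsigned long g_mul_count;   /* multiplications     */
static unsigned long g_div_count;   /* divisions           */
static unsigned long g_byte_loads;  /* 8-bit memory reads  */
static unsigned long g_word_loads;  /* 32-bit memory reads */

/* Hypothetical instrumented inner loop. */
static int checksum(const unsigned char *buf, int len)
{
    int sum = 0;
    int i;
    for (i = 0; i < len; i++) {
        sum += buf[i] * 31;  /* one byte load + one multiply per iteration */
        g_byte_loads++;
        g_mul_count++;
    }
    return sum;
}

int main(void)
{
    unsigned char data[256];
    int i;
    for (i = 0; i < 256; i++)
        data[i] = (unsigned char)i;

    checksum(data, 256);

    printf("multiplies: %lu  divides: %lu\n", g_mul_count, g_div_count);
    printf("byte loads: %lu  word loads: %lu\n", g_byte_loads, g_word_loads);
    return 0;
}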
Furthermore, it's much better to enable assembly listing generation in VS (the /FAs compiler switch) and use that, rather than your high-level source code, as the basis for this investigation.
P.S.: maybe the problem is in the OS/software/firmware? Did you test on a clean system? Is the OS the same on all devices?