Can I write a single bit to a block device? - block-device

The question is about block devices such as HDDs and SSDs, not about filesystems.
Is it possible to write a SINGLE BIT, or an exact amount of data (e.g. 7 bits), to a block device like an HDD? I read somewhere (I can't recall where) that I cannot, and that whatever I do, the whole block will be written.
Can someone explain it to me?
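For illustration, here is a minimal C sketch of the usual workaround, read-modify-write: read the whole block into RAM, change one bit there, and write the whole block back. The 512-byte block size, the in-memory `fake_disk` array and the `dev_*` function names are all assumptions made for this example; a real device reports its own block size and is accessed through the OS or a driver.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u   /* assumed block size; real devices report their own */
#define NUM_BLOCKS 8u

/* Hypothetical in-memory stand-in for a device that only moves whole blocks. */
static uint8_t fake_disk[NUM_BLOCKS][BLOCK_SIZE];
static unsigned blocks_written;   /* counts whole-block writes, for illustration */

static void dev_read_block(uint32_t lba, uint8_t *buf)
{
    memcpy(buf, fake_disk[lba], BLOCK_SIZE);
}

static void dev_write_block(uint32_t lba, const uint8_t *buf)
{
    memcpy(fake_disk[lba], buf, BLOCK_SIZE);
    blocks_written++;
}

/* Set or clear one bit, addressed by its absolute bit offset on the device. */
static void dev_write_bit(uint64_t bit_offset, int value)
{
    uint8_t buf[BLOCK_SIZE];
    uint32_t lba = (uint32_t)(bit_offset / (BLOCK_SIZE * 8u));
    uint32_t byte_in_block = (uint32_t)((bit_offset / 8u) % BLOCK_SIZE);
    uint8_t mask = (uint8_t)(1u << (bit_offset % 8u));

    dev_read_block(lba, buf);              /* read the whole block */
    if (value)
        buf[byte_in_block] |= mask;        /* change one bit in RAM */
    else
        buf[byte_in_block] &= (uint8_t)~mask;
    dev_write_block(lba, buf);             /* write the whole block back */
}
```

Even though the caller changes a single bit, `blocks_written` goes up by one full block per call, which matches what the question describes.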

Related

API for storing binary blobs

I'm doing some moderately low-level programming of an embedded device that has some NVRAM we plan to use for retaining values between runs of a program. We'd like to abstract the operations into an API over a driver or talking to a daemon. This is lower-level than the serialization semantics I've seen here and there. Basically we want a process or function to be able to reserve some space (with some name or other identifier), store a value (arbitrary byte sequence) in that reserved space, retrieve the value later, and surrender the reservation if it no longer needs to use it. This feels a lot like malloc, write, read, and free. I'm tempted to implement nvAlloc() (or something) and so on. Or am I missing something obvious? Maybe security: another process getting a handle and accessing or corrupting the value.
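To make the malloc/write/read/free shape concrete, here is a rough C sketch. Everything in it is an assumption for illustration: the names `nv_alloc`, `nv_write`, `nv_read` and `nv_free`, the sizes, the fixed-size slots, and the RAM array standing in for the real NVRAM.

```c
#include <stdint.h>
#include <string.h>

#define NV_SIZE       256u  /* assumed size of the NVRAM region */
#define NV_SLOTS      4u    /* assumed maximum number of reservations */
#define NV_NAME_LEN   8u    /* assumed name buffer size, including the NUL */
#define NV_SLOT_BYTES (NV_SIZE / NV_SLOTS)

/* RAM stand-in for the device; a real driver would map the NVRAM here and
   keep the reservation table in NVRAM too, so it survives a restart. */
static uint8_t nv_mem[NV_SIZE];

static struct {
    char     name[NV_NAME_LEN];  /* empty name marks a free slot */
    uint16_t size;               /* bytes reserved by the caller */
} nv_table[NV_SLOTS];

/* Reserve space under a name. Returns a handle, or -1 on failure.
   Asking for a name that already exists returns the existing handle. */
int nv_alloc(const char *name, uint16_t size)
{
    int free_slot = -1;
    size_t len = strlen(name);

    if (len == 0 || len >= NV_NAME_LEN || size > NV_SLOT_BYTES)
        return -1;
    for (unsigned i = 0; i < NV_SLOTS; i++) {
        if (strcmp(nv_table[i].name, name) == 0)
            return (int)i;
        if (free_slot < 0 && nv_table[i].name[0] == '\0')
            free_slot = (int)i;
    }
    if (free_slot < 0)
        return -1;
    memcpy(nv_table[free_slot].name, name, len + 1);
    nv_table[free_slot].size = size;
    return free_slot;
}

/* A handle is usable if it is in range, in use, and len fits the reservation. */
static int nv_valid(int h, uint16_t len)
{
    return h >= 0 && (unsigned)h < NV_SLOTS &&
           nv_table[h].name[0] != '\0' && len <= nv_table[h].size;
}

int nv_write(int h, const void *data, uint16_t len)
{
    if (!nv_valid(h, len)) return -1;
    memcpy(&nv_mem[(unsigned)h * NV_SLOT_BYTES], data, len);
    return 0;
}

int nv_read(int h, void *out, uint16_t len)
{
    if (!nv_valid(h, len)) return -1;
    memcpy(out, &nv_mem[(unsigned)h * NV_SLOT_BYTES], len);
    return 0;
}

void nv_free(int h)
{
    if (h >= 0 && (unsigned)h < NV_SLOTS)
        nv_table[h].name[0] = '\0';
}
```

Note that the handle here is just a slot index, so nothing stops another caller from guessing it. That is exactly the security concern raised above, and a sketch like this does not address it.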
It seems http://pramfs.sourceforge.net/ and normal file system access are the right answer.

How can I make my code "safe against concurrent invocation" when using "NSSortConcurrent" in sortedArrayWithOptions:usingComparator:?

I'm trying to sort an array of NSDates using the sortedArrayWithOptions:usingComparator: method of NSArray. So far all is well and my code works as expected.
However, seeing that I can specify options for the method, I went into the docs and tried to figure out what they mean.
There's NSSortStable, of course: objects that have the same value should be returned in the order they were in before the sort. That's easy enough, I guess.
But I'm somewhat stumped as to what NSSortConcurrent means. This is what the docs say:
Specifies that the Block sort operation should be concurrent.
This option is a hint and may be ignored by the implementation under some circumstances;
the code of the Block must be safe against concurrent invocation.
Available in Mac OS X v10.6 and later.
So I understand that I can allow the use of multiple threads for the sorting operation? That's great. In this case, is "safe against concurrent invocation" just fancy talk for "thread-safe"? And if it isn't, what does it mean? I'm sorry for this rather stupid question, but I'm not a native English speaker. Thanks.
Never mind, I figured it out. NSSortConcurrent will indeed allow the sorting operation to use multiple threads, and thus the only requirement is for the sorting block to be thread-safe. As long as you're not touching any data that is located outside the block (so don't use __block variables), you should be fine.

Organizing Data in EEPROM

I have a 64KB EEPROM, organized as 128-byte pages, on my board, which talks to an ATmega1281. The board also has an SD card slot and is capable of copying some configuration files onto the EEPROM (which acts as the internal memory). Due to the nature of the board, only two types of files are needed: one is known as the Circuit Data and the other is the Location Data. Both are binary files.
Up until now, I have just split the EEPROM into two 32K halves and written the Circuit Data in the top half and the Location Data in the bottom half. Both files also have a 25-byte header. I copy the header into the last page of the file's respective half, i.e. the page starting at address 0x7F80 holds the Circuit Data file's header and the page starting at 0xFF80 holds the other header. The data is always going to be of fixed width, so that makes random access quite easy.
My question is: is there a better, simpler way to organize data in an EEPROM? At the moment, I don't even store the length of the data as it's not really needed. But I'm thinking it might add another layer of safety if I include it in the header.
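As a sketch of the layout just described, the address arithmetic could be written in C like this. The 64KB size, 128-byte pages, 25-byte header and the two header addresses come from the description; the macro and function names, the `record_size` parameter and the `0xFFFFFFFF` error value are assumptions made for the example.

```c
#include <stdint.h>

#define EE_SIZE        0x10000UL        /* 64 KB EEPROM */
#define EE_PAGE_BYTES  128u             /* page size */
#define EE_HALF_SIZE   (EE_SIZE / 2)    /* 32 KB per file */
#define EE_HEADER_SIZE 25u              /* header length, fits in one page */

enum file_id { CIRCUIT_DATA = 0, LOCATION_DATA = 1 };

/* First byte of the half that belongs to a file. */
static uint32_t file_base(enum file_id f)
{
    return (uint32_t)(f * EE_HALF_SIZE);
}

/* The header lives in the last page of the file's half. */
static uint32_t header_addr(enum file_id f)
{
    return file_base(f) + (uint32_t)(EE_HALF_SIZE - EE_PAGE_BYTES);
}

/* Address of fixed-width record `index`. Returns 0xFFFFFFFF if the record
   would run into the header page (the sentinel value is an assumption). */
static uint32_t record_addr(enum file_id f, uint16_t index, uint16_t record_size)
{
    uint32_t off = (uint32_t)index * record_size;

    if (off + record_size > EE_HALF_SIZE - EE_PAGE_BYTES)
        return 0xFFFFFFFFu;
    return file_base(f) + off;
}
```

With these definitions `header_addr(CIRCUIT_DATA)` is 0x7F80 and `header_addr(LOCATION_DATA)` is 0xFF80, matching the addresses given above. The bounds check in `record_addr` gives some of the safety a stored length would provide, without changing the layout.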
Better? It depends. Simpler? Not really. It depends on how strong your "always" is. How firmly do you believe that the files will always be of fixed length? The fact that you are asking this question probably means you have some doubts. Keep the KISS principle in mind. Microcontroller development is still an area where unnecessary features are a direct threat to the stability of the solution. Having a data length in the header would be useful if you want to make your EEPROM access more generic. But then again, generalizing for two files is overkill.
Second thought: rather than introducing file lengths which you don't actually need, I would like to know why you store the file headers at the opposite end of the respective memory chunk. A "header" is to me something that needs to be read before the file itself. If the header came first, you could save one transfer of the read address to the EEPROM.
I believe that in any embedded project the simplest solution is the best. Your way of organizing storage is simple, and it looks like it meets all your requirements.
Any attempt to "improve" or "optimize" this solution will lead to more complicated code and will increase the probability of introducing a bug. So keep all your engineering solutions as simple as possible. If new requirements pop up, you can always find a new simple solution for them. Don't do any premature optimization.

Determining most register hungry part of kernel

When I get a kernel using too many registers, there are basically three things I can do:
1) leave the kernel as it is, which results in low occupancy
2) set the compiler to use a lower number of registers, spilling them and causing worse performance
3) rewrite the kernel
For option 3, I'd like to know which part of the kernel needs the maximum number of registers. Is there any tool or technique allowing me to identify this part? Reading through the PTX code (I develop on NVidia) is not helpful, the registers have various high numbers and to be honest, the best I can do is to identify which part of the assembly code maps to which part of the C code.
Just commenting out some code is not much of a way to go. For example, I noticed that if I just put the code into a loop, the number of registers rises dramatically, not only by one for the loop control variable. I personally suspect the NVidia compiler of imperfect variable liveness analysis, but of course I cannot do much about that :-)
If you're running on NVidia hardware, you can pass the -cl-nv-verbose compile option to clBuildProgram and then fetch the build log with clGetProgramBuildInfo (CL_PROGRAM_BUILD_LOG) to get human-readable text about the compile; clGetProgramInfo with CL_PROGRAM_BINARIES gives you the PTX itself. The log states the number of registers the kernel uses. Note that NVidia caches compiles and only produces that register info when the kernel source actually changes, so you may want to inject some superfluous change into the source code to force it to do the full analysis.
If you're running on AMD hardware, just set the environment variable GPU_DUMP_DEVICE_KERNEL=1. It will produce a text file of the IL during the compile. Not sure it explicitly says the number of registers used, but it's what is equivalent to the NVidia technique above.
When looking at that output (at least on NVidia), you'll find that it seems to use an infinite number of registers (if you go by the register numbers). In reality, the compiler does a flow analysis and reuses registers in a way that is not at all obvious when looking at the IL.
This is a tough question in any language, and there's probably not one correct answer. But here are some things to think about:
Look for the code in the "deepest" scope you can find, keeping in mind that most functions are probably inlined by your OpenCL compiler. Count the variables used in this scope, and walk up the containing scopes. In each containing scope, count variables that are used both before and after the inner scope. These are potentially live while the inner scope executes. This process could help you account for the live registers at a particular part of the program.
Try to take variables that "span" deep scopes and make them not span the scope if possible. For example, if you have something like this:
int i = func1();
int j = func2(); // perhaps lots of live registers here
int k = func3(i,j);
you could try to reorder the first two lines if func2 has lots of live registers. That would remove i from the set of live registers while func2 is running. This is a trivial pattern, of course, but hopefully it's illustrative.
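Spelled out in plain C, the reordering looks like the sketch below. The bodies of `func1`, `func2` and `func3` are hypothetical stand-ins, and the swap is only safe under the assumption that `func1` and `func2` do not depend on each other's side effects. A compiler is also free to reorder such calls itself, so the actual effect on register count has to be measured.

```c
/* Hypothetical stand-ins for the three functions in the text. */
static int func1(void) { return 3; }
static int func2(void) { return 4; }   /* imagine many live registers once inlined */
static int func3(int i, int j) { return i * 10 + j; }

/* Original order: i stays live for the whole of func2. */
static int original_order(void)
{
    int i = func1();
    int j = func2();
    return func3(i, j);
}

/* Reordered: i does not exist yet while func2 runs; j is live across func1 instead. */
static int reordered(void)
{
    int j = func2();
    int i = func1();
    return func3(i, j);
}
```

Both versions compute the same result; only the set of values that must be kept alive during `func2` differs.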
Think about getting rid of variables that just keep around the results of simple computations. You might be able to recompute these when you need them. For example, if you have something like int i = get_local_id(0) you might be able to just use get_local_id(0) wherever you would have used i.
Think about getting rid of variables that are keeping around values stored in memory.
Without good tools for this kind of thing, it ends up being more art than science. But hopefully some of this is helpful.

What techniques are available for memory optimizing in 8051 assembly language?

I need to optimize code to get room for some new code. I do not have the space for all the changes. I can not use code bank switching (80c31 with 64k).
You haven't really given a lot to go on here, but there are two main levels of optimizations you can consider:
Micro-Optimizations:
e.g. XOR A instead of MOV A,0 (in 8051 mnemonics: CLR A instead of MOV A,#0)
Adam has covered some of these nicely earlier.
Macro-Optimizations:
Look at the structure of your program, the data structures and algorithms used, the tasks performed, and think VERY hard about how these could be rearranged or even removed. Are there whole chunks of code that actually aren't used? Is your code full of debug output statements that the user never sees? Are there functions specific to a single customer that you could leave out of a general release?
To get a good handle on that, you'll need to work out WHERE your memory is being used up. The Linker map is a good place to start with this. Macro-optimizations are where the BIG wins can be made.
As an aside, you could - seriously- try rewriting parts of your code with a good optimizing C compiler. You may be amazed at how tight the code can be. A true assembler hotshot may be able to improve on it, but it can easily be better than most coders. I used the IAR one about 20 years ago, and it blew my socks off.
With assembly language, you'll have to optimize by hand. Here are a few techniques:
Note: IANA8051P (I am not an 8051 programmer, but I have done lots of assembly on other 8-bit chips).
Go through the code looking for any duplicated bits, no matter how small and make them functions.
Learn some of the more unusual instructions and see if you can use them to optimize. E.g. a nice trick is to use XOR A to clear the accumulator instead of MOV A,0; it saves a byte (in 8051 mnemonics, CLR A is one byte where MOV A,#0 is two).
Another neat trick is if you call a function before returning, just jump to it eg, instead of:
CALL otherfunc
RET
Just do:
JMP otherfunc
Always make sure you are doing relative jumps and branches wherever possible; they use less memory than absolute jumps.
That's all I can think of off the top of my head for the moment.
Sorry I am coming to this late, but I once had exactly the same problem, and it became a repeated problem that kept coming back to me. In my case the project was a telephone, on an 8051 family processor, and I had totally maxed out the ROM (code) memory. It kept coming back to me because management kept requesting new features, so each new feature became a two step process. 1) Optimize old stuff to make room 2) Implement the new feature, using up the room I just made.
There are two approaches to optimization: tactical and strategic. Tactical optimizations save a few bytes at a time with a micro-optimization idea. I think you need strategic optimizations, which involve a more radical rethinking of how you are doing things.
Something I remember worked for me and could work for you:
Look at the essence of what your code has to do and try to distill out some really strong, flexible primitive operations. Then rebuild your top-level code so that it does nothing low-level at all except call on the primitives. Ideally use a table-based approach, where your table contains things like input state, event, output state, primitives... In other words, when an event happens, look up the cell in the table for that event in the current state. That cell tells you what new state to change to (optionally) and what primitive(s) (if any) to execute. You might need multiple sets of states/events/tables/primitives for different layers/subsystems.
One of the many benefits of this approach is that you can think of it as building a custom language for your particular problem, in which you can very efficiently (i.e. with minimal extra code) create new functionality simply by modifying the table.
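The table-based approach can be sketched in C as follows. The states, events and primitives are invented for illustration (loosely modelled on the telephone mentioned above), and on a real 8051 the table would live in code memory with the primitives written in assembly or compiler-specific C.

```c
#include <stddef.h>

enum state { ST_IDLE, ST_DIALING, ST_COUNT };
enum event { EV_OFFHOOK, EV_DIGIT, EV_ONHOOK, EV_COUNT };

typedef void (*primitive_fn)(void);

/* Hypothetical primitives; real ones would drive hardware. */
static int tone_on, digits_seen;
static void start_dial_tone(void) { tone_on = 1; }
static void collect_digit(void)   { tone_on = 0; digits_seen++; }
static void hang_up(void)         { tone_on = 0; digits_seen = 0; }

struct cell {
    enum state   next;    /* state to move to */
    primitive_fn action;  /* NULL means "do nothing" */
};

/* One cell per (current state, event) pair. */
static const struct cell table[ST_COUNT][EV_COUNT] = {
    [ST_IDLE] = {
        [EV_OFFHOOK] = { ST_DIALING, start_dial_tone },
        [EV_DIGIT]   = { ST_IDLE,    NULL },
        [EV_ONHOOK]  = { ST_IDLE,    NULL },
    },
    [ST_DIALING] = {
        [EV_OFFHOOK] = { ST_DIALING, NULL },
        [EV_DIGIT]   = { ST_DIALING, collect_digit },
        [EV_ONHOOK]  = { ST_IDLE,    hang_up },
    },
};

/* The whole top-level logic: look up the cell, run its primitive, change state. */
static enum state dispatch(enum state cur, enum event ev)
{
    const struct cell *c = &table[cur][ev];

    if (c->action)
        c->action();
    return c->next;
}
```

Adding a feature then mostly means adding rows or cells to `table`, while `dispatch` stays unchanged.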
Sorry I am months late and you probably didn't have time to do something this radical anyway. For all I know you were already using a similar approach! But my answer might help someone else someday who knows.
In the whacked-out department, you could also consider compressing part of your code and only keeping the part that is actively used decompressed at any particular point in time. I have a hard time believing that the code required for the compress/decompress system would be a small enough portion of the tiny memory of the 8051 to make this worthwhile, but it has worked wonders on slightly larger systems.
Yet another approach is to turn to a byte-code format or the kind of table-driven code that some state machine tools output -- having a machine understand what your app is doing and generating a completely incomprehensible implementation can be a great way to save room :)
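To give a feel for the byte-code idea, here is a deliberately tiny stack-machine interpreter in C. The opcodes and the `run` function are made up for this sketch, and it does no bounds checking. The point is only that once the interpreter exists, each additional operation in the application costs a byte or two of byte code rather than a full instruction sequence.

```c
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };  /* hypothetical opcodes */

/* Runs a program and returns the value on top of the stack at OP_HALT.
   No stack or program bounds checks: a sketch, not production code. */
static int run(const uint8_t *code)
{
    int stack[16];
    int sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++;            break;  /* operand follows opcode */
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1;                        /* unknown opcode */
        }
    }
}
```

For example, the nine-byte program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` evaluates (2 + 3) * 4.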
Finally, if the code is indeed compiled in C, I would suggest compiling with a range of different options to see what happens. Also, I wrote a piece on compact C coding for the ESC back in 2001 that is still pretty current. See that text for other tricks for small machines.
1) Where possible, save your variables in idata, not in xdata
2) Look at your jump statements and make use of SJMP and AJMP
I assume you know it won't fit because you wrote/compiled and got the "out of memory" error. :) It appears the answers address your question pretty accurately, short of giving code examples.
I would, however, recommend a few additional thoughts;
1) Make sure all the code is really being used -- code coverage test? An unused sub is a big win -- this is a tough step -- if you're the original author, it may be easier -- (well, maybe) :)
2) Ensure the level of "verification" and initialization -- sometimes we have a tendency to be overzealous in ensuring we have initialized variables/memory, and rightly so; how many times have we been bitten by it. Not saying don't initialize (duh), but if we're doing a memory move, the destination doesn't need to be zeroed first -- this dovetails with 1 --
3) Eval the new features -- can an existing sub be enhanced to cover both functions, or perhaps an existing feature replaced?
4) Break up big code if a piece of the big code can save creating a new little code.
or perhaps there's an argument for hardware version 2.0 on the table now ... :)
regards
Besides the already mentioned (more or less) obvious optimizations, here is a really weird (and almost impossible to achieve) one: code reuse. And by code reuse I don't mean the normal kind of reuse, but a) reusing your code as data or b) reusing your code as other code. Maybe you can create a LUT (or whatever static data) that can be represented by the asm hex opcodes (here you have to consider Harvard vs von Neumann architecture).
The other would reuse code by giving code a different meaning when you address it differently. Here is an example to make clear what I mean. If the bytes for your code look like this: AABCCCDDEEFFGGHH at address X, where each letter stands for one opcode, imagine you now jump to X+1. Maybe you get completely different functionality, where the now space-separated bytes form the new opcodes: ABC CCD DE EF GH.
But beware: this is not only tricky to achieve (maybe it's impossible), it's also a horror to maintain. So if you are not a demo coder (or something similarly exotic), I would recommend using the other ways already mentioned to save memory.