What happens when you send an 8bit number to an output which is 4bit? C Language - hardware

I'm studying in high school, and we have an electronics project.
We have an output from our computer which is 4 bit, output address is 37Ah
and my teacher did this:
outportb(0x37A,0x80);
so what will appear in the output? 0h or 8h?

Unless this is a 4-bit CPU from the 70s then your output port will be 8 bits, but the connected hardware might only use 4. In that case it is common (but not necessary) to use the lower 4 bits so you would have 0x0 as value. But that makes using 0x80 a smokescreen, it would be the same as 0x00 and 0xF0. So from that alone I would guess that the upper 4 bits are used here, and the value sent is 0x8.
But a twisted hardware engineer could have used the middle 4 bits.

You need to explain your problem a little better. What microprocesser do you use etc. Is it a 4-port output you have?
But 0x80 is equal to:
0b1000000 and if you use the lower 4 bits: 0b1000xxxx, then they will be zero (not turned on). This will happen if 0x37A is 8bit.
Otherwise, explain your problem better :)
Can't you try and see what happens? or is it only theoretical until now?
EDIT:
I see it is a printer port. Check http://www.tinet.cat/~sag/gifs/ParallelPort.gif if you use port 2,3,4,5 then the upper 4 bits really doesn't matter :) as said in my comment.

Related

Accessing a combination of ports by adding both their offsets to a base address. How would this work?

Context: I am following an embedded systems course https://www.edx.org/course/embedded-systems-shape-the-world-microcontroller-i
In the lecture on bit specific addressing they present the following example on a "peanut butter and jelly port".
Given you a port PB which has a base address of 0x40005000 and you wanted to access both port 4 and port 6 from PB which would be PB6 and PB4 respectively. One could add the offset of port 4(0x40) and port 6(0x100) to the base address(0x40005000) and define that as their new address 0x40005140.
Here is where I am confused. If I wanted to define the address for PB6 it would be base(0x40005000) + offset(0x100) = 0x40005100 and the address for PB4 would be base(0x40005000) + offset(0x40) = 0x40005040. So how is it that to access both of them I could use base(0x40005000) + offset(0x40) + offset(0x100) = 0x40005140? Is this is not an entirely different location in memory for them individually?
Also why is bit 0 represented as 0x004. In binary that would be 0000 0100. I suppose it would represent bit 0 if you disregard the first two binary bits but why are we disregarding them?
Lecture notes on bit specific addressing:
Your interpretation of how memory-mapped registers are addressed is quite reasonable for any normal peripheral on an ARM based microcontroller.
However, if you read the GPIODATA register definition on page 662 of the TM4C123GH6PM datasheet then you will see that this "register" behaves very differently.
They map a huge block of the address space (1024 bytes) to a single 32-bit register. This means that bits[9:2] of the the address bus are not needed, and are in fact overloaded with data. They contain the mask of the bits to be updated. This is what the "offset" calculation you have copied is trying to describe.
Personally I think this hardware interface could be a very clever way to let you set only some of the outputs within a bank using a single atomic write, but it makes this a very bad choice of device to use for teaching, because this isn't the way things normally work.

Compact-u16 - what is the purpose of this?

I was doing some research over the weekend on some blockchain dev in the Solana blockchain and came across a construct called Compact-u16. The definition of this in the documentation says the following: "A compact-u16 is a multi-byte encoding of 16 bits. The first byte contains the lower 7 bits of the value in its lower 7 bits. If the value is above 0x7f, the high bit is set and the next 7 bits of the value are placed into the lower 7 bits of a second byte. If the value is above 0x3fff, the high bit is set and the remaining 2 bits of the value are placed into the lower 2 bits of a third byte.".
I have been coding for 30+ years. Maybe I'm just old school on this, but why is there a construct to store 16 bits of data in 3 bytes? This is just vastly inefficient from my standpoint. Is there a reason for this? On further research, I found a doc related to assembly instruction pointers, which referenced 7 instruction pointers that are useful for caching values when context switching in and out of the processor stack. But this construct is used for a web app platform. Like, literally, there is no reason that I have been able to find that justifies using 3 bytes to store 16 bits of data. If the developers wanted to use an elegant bit mapping solution to compress space, why not just use a semaphore? Why create a brand new construct that increases the storage requirements for the data by 33%.
What am I missing?
I had some similar confusion when reading the compact-u16 description. Based on the code for parsing them in the solana python module I believe they're doing something conceptually similar to UTF-8, and storing the number in 1-3 bytes depending on its size.
Basically instead of each byte having 8 bits of a number, it has 7 bits of the number and a flag (the most significant bit) that indicates whether the number continues in the next byte. For the largest numbers they need an extra byte, but for numbers less than 128 they need only one byte. Since Solana seems to use these for storing the length of arrays, if it's common that the length of the arrays is less than 128 then they will end up with fewer total bytes to transfer across all transactions.
Some examples I worked out for myself:
hex | compact-u16
--------+------------
0x0000 | [0x00]
0x0001 | [0x01]
0x007f | [0x7f]
0x0080 | [0x80 0x01]
0x3fff | [0xff 0x7f]
0x4000 | [0x80 0x80 0x01]
0xc000 | [0x80 0x80 0x03]
0xffff | [0xff 0xff 0x03])

how to drive a dotstar strip from C on a raspberry pi

I am trying to figure out how to drive a dotstart strip by calling write(handle, datap, len) to an SPI handle, from C, on a raspberry pi. I'm not quite clear on how to lay out the data.
Looking at https://cdn-shop.adafruit.com/datasheets/APA102.pdf#page=3 makes me think you start with 4 bytes of 0, a string of coded LED values (4 bytes per LED) and then 4 bytes of 1's. But that cannot be right; the final 4 bytes of 1's would be indistinguishable from a request to set an LED to full brightness white. So how could that terminate the data?
Insight welcome. Yes, I know there's a python library out there for this, but I'm coding in C++ or C.
After much digging, I found the answer here:
https://cpldcpu.wordpress.com/2014/11/30/understanding-the-apa102-superled/
The end frame is more complex than the spec suggests, but the spec is correct if your string has 32 LEDS, and you must always specify values for all LEDS in your string.

Advice for bit level manipulation

I'm currently working on a project that involves a lot of bit level manipulation of data such as comparison, masking and shifting. Essentially I need to search through chunks of bitstreams between 8kbytes - 32kbytes long for bit patterns between 20 - 40bytes long.
Does anyone know of general resources for optimizing for such operations in CUDA?
There has been a least a couple of questions on SO on how to do text searches with CUDA. That is, finding instances of short byte-strings in long byte-strings. That is similar to what you want to do. That is, a byte-string search is much like a bit-string search where the number of bits in the byte-string can only be a multiple of 8, and the algorithm only checks for matches every 8 bits. Search on SO for CUDA string searching or matching, and see if you can find them.
I don't know of any general resources for this, but I would try something like this:
Start by preparing 8 versions of each of the search bit-strings. Each bit-string shifted a different number of bits. Also prepare start and end masks:
start
01111111
00111111
...
00000001
end
10000000
11000000
...
11111110
Then, essentially, perform byte-string searches with the different bit-strings and masks.
If you're using a device with compute capability >= 2.0, store the shifted bit-strings in global memory. The start and end masks can probably just be constants in your program.
Then, for each byte position, launch 8 threads that each checks a different version of the 8 shifted bit-strings against the long bit-string (which you now treat like a byte-string). In each block, launch enough threads to check, for instance, 32 bytes, so that the total number of threads per block becomes 32 * 8 = 256. The L1 cache should be able to hold the shifted bit-strings for each block, so that you get good performance.

Where does the limitation of 10^15 in D.J. Bernstein's 'primegen' program come from?

At http://cr.yp.to/primegen.html you can find sources of program that uses Atkin's sieve to generate primes. As the author says that it may take few months to answer an e-mail sent to him (I understand that, he sure is an occupied man!) I'm posting this question.
The page states that 'primegen can generate primes up to 1000000000000000'. I am trying to understand why it is so. There is of course a limitation up to 2^64 ~ 2 * 10^19 (size of long unsigned int) because this is how the numbers are represented. I know for sure that if there would be a huge prime gap (> 2^31) then printing of numbers would fail. However in this range I think there is no such prime gap.
Either the author overestimated the bound (and really it is around 10^19) or there is a place in the source code where the arithmetic operation can overflow or something like that.
The funny thing is that you actually MAY run it for numbers > 10^15:
./primes 10000000000000000 10000000000000100
10000000000000061
10000000000000069
10000000000000079
10000000000000099
and if you believe Wolfram Alpha, it is correct.
Some facts I had "reverse-engineered":
numbers are sifted in batches of 1,920 * PRIMEGEN_WORDS = 3,932,160 numbers (see primegen_fill function in primegen_next.c)
PRIMEGEN_WORDS controls how big a single sifting is - you can adjust it in primegen_impl.h to fit your CPU cache,
the implementation of the sieve itself is in primegen.c file - I assume it is correct; what you get is a bitmask of primes in pg->buf (see primegen_fill function)
The bitmask is analyzed and primes are stored in pg->p array.
I see no point where the overflow may happen.
I wish I was on my computer to look, but I suspect you would have different success if you started at 1 as your lower bound.
Just from the algorithm, I would conclude that the upper bound comes from the 32 bit numbers.
The page mentiones Pentium-III as CPU so my guess it is very old and does not use 64 bit.
2^32 are approx 10^9. Sieve of Atkins (which the algorithm uses) requires N^(1/2) bits (it uses a big bitfield). Which means in 2^32 big memory you can make (conservativ) N approx 10^15. As this number is a rough conservative upper bound (you have system and other programs occupying memory, reserving address ranges for IO,...) the real upper bound is/might be higher.