I want to convert a binary code to a grey code using 16-to-1 MUX - optimization

The problem is that I didn't learned yet adders or VHDL (which a lot of people are telling me to use them) but all I have is 16-to-1 MUXs.
Should I link each MUX with the other from the select input? (Knowing that I have 4 inputs and 4 outputs obviously)
P.S: I am new to this kind of stuff and I am having a hard time to solve this.
Thank you in advance.

Converting binary to Gray code is pretty easy. In C it's just gray = bin ^ (bin>>1). You need on XOR gate for each bit except for the highest one.
There's a nice schematic over here: https://www.electrical4u.com/binary-to-gray-code-converter-and-grey-to-binary-code-converter/
You can make an XOR easily with a 4-to-1 MUX. You can, of course, make one with a 16-to-1 MUX too, by tying two of the inputs to ground, but it's a big waste of gates.

Related

Procedural generated 2D caves/dungeons for sidescroller like game

I would like to proceduraly generate 2D caves. I already tried out using some 1D simplex noise to determine the terrain of the floor, which is basically everything you can change in a sidescroller, but it turned out rather unimpressive.
I would like to have an interesting terrain for my cave/dungeon and if possible some alternative paths.
I couldn't come up with any ideas for this kind of terrain and I also could not find any promising ways to do this kind of stuff on the internet.
Assuming you've tried something like abs(simplex) > 0.1 ? nocave : cave, maybe the problem you're running into is that it's too connected, and that there are no dead-ends anywhere. You could extend this by doing the following simplex1*simplex1 > threshold(simplex2) where threshold(t)=a*t/(t+b) where a roughly controls the max threshold (cave opening size) and b roughly controls how much of the underground is caves vs not. The function looks like this, and the fact that it quickly rushes to zero is supposed to make dead ends look more rounded and less pointy. Working with the squared noise value is also for this purpose.
Have you tried Perlin Noise? Seems to be the standard way of doing this sort of thing.

How can I compare two NSImages for differences?

I'm attempting to gauge the percentage difference between two images.
Having done a lot of reading I seem to have a number of options but I'm not sure what the best method to follow for:
Ease of coding
Performance.
The methods I've seen are:
Non language specific - academic Image comparison - fast algorithm and Mac specific direct pixel access http://www.markj.net/iphone-uiimage-pixel-color/
Does anyone have any advice about what solutions make most sense for the above two cases and have code samples to show how to apply them?
I've had success calculating the difference between two images using the histogram technique mentioned here. redmoskito's answer in the SO question you linked to was actually my inspiration!
The following is an overview of the algorithm I used:
Convert the images to grayscale—compare one channel instead of three.
Divide each image into an n * n grid of "subimages". Then, for subimage pair:
Calculate their colour composition histograms.
Calculate the absolute difference between the two histograms.
The maximum difference found between two subimages is a measure of the two images' difference. Other metrics could also be used (e.g. the average difference betwen subimages).
As tskuzzy noted in his answer, if your ultimate goal is a binary "yes, these two images are (roughly) the same" or "no, they're not", you need some meaningful threshold value. You could produce such a value by passing images into the algorithm and tweaking the threshold based on its output and how similar you think the images are. A form of machine learning, I suppose.
I recently wrote a blog post on this very topic, albeit as part of a larger goal. I also created a simple iPhone app to demonstrate the algorithm. You can find the source on GitHub; perhaps it will help?
It is really difficult to suggest something when you don't tell us more about the images or the variations. Are they shapes? Are they the different objects and you want to know what class of objects? Are they the same object and you want to distinguish the object instance? Are they faces? Are they fingerprints? Are the objects in the same pose? Under the same illumination?
When you say performance, what exactly do you mean? How large are the images? All in all it really depends. With what you've said if it is only ease of coding and performance I would suggest to just find the absolute value of the difference of pixels. That is super easy to code and about as fast as it gets, but really unlikely to work for anything other than the most synthetic examples.
That being said I would like to point you to: DHOG, GLOH, SURF and SIFT.
You can use fairly basic subtraction technique that the lads above suggested. #carlosdc has hit the nail on the head with regard to the type of image this basic technique can be used for. I have attached an example so you can see the results for yourself.
The first shows a image from a simulation at some time t. A second image was subtracted away from the first which was taken some (simulation) time later t + dt. The subtracted image (in black and white for clarity) then shows how the simulation has changed in that time. This was done as described above and is very powerful and easy to code.
Hope this aids you in some way
This is some old nasty FORTRAN, but should give you the basic approach. It is not that difficult at all. Due to the fact that I am doing it on a two colour pallette you would do this operation for R, G and B. That is compute the intensities or values in each cell/pixal, store them in some array. Do the same for the other image, and subtract one array from the other, this will leave you with some coulorfull subtraction image. My advice would be to do as the lads suggest above, compute the magnitude of the sum of the R, G and B componants so you just get one value. Write that to array, do the same for the other image, then subtract. Then create a new range for either R, G or B and map the resulting subtracted array to this, the will enable a much clearer picture as a result.
* =============================================================
SUBROUTINE SUBTRACT(FNAME1,FNAME2,IOS)
* This routine writes a model to files
* =============================================================
* Common :
INCLUDE 'CONST.CMN'
INCLUDE 'IO.CMN'
INCLUDE 'SYNCH.CMN'
INCLUDE 'PGP.CMN'
* Input :
CHARACTER fname1*(sznam),fname2*(sznam)
* Output :
integer IOS
* Variables:
logical glue
character fullname*(szlin)
character dir*(szlin),ftype*(3)
integer i,j,nxy1,nxy2
real si1(2*maxc,2*maxc),si2(2*maxc,2*maxc)
* =================================================================
IOS = 1
nomap=.true.
ftype='map'
dir='./pictures'
! reading first image
if(.not.glue(dir,fname2,ftype,fullname))then
write(*,31) fullname
return
endif
OPEN(unit2,status='old',name=fullname,form='unformatted',err=10,iostat=ios)
read(unit2,err=11)nxy2
read(unit2,err=11)rad,dxy
do i=1,nxy2
do j=1,nxy2
read(unit2,err=11)si2(i,j)
enddo
enddo
CLOSE(unit2)
! reading second image
if(.not.glue(dir,fname1,ftype,fullname))then
write(*,31) fullname
return
endif
OPEN(unit2,status='old',name=fullname,form='unformatted',err=10,iostat=ios)
read(unit2,err=11)nxy1
read(unit2,err=11)rad,dxy
do i=1,nxy1
do j=1,nxy1
read(unit2,err=11)si1(i,j)
enddo
enddo
CLOSE(unit2)
! substracting images
if(nxy1.eq.nxy2)then
nxy=nxy1
do i=1,nxy1
do j=1,nxy1
si(i,j)=si2(i,j)-si1(i,j)
enddo
enddo
else
print *,'SUBSTRACT: Different sizes of image arrays'
IOS=0
return
endif
* normal finishing
IOS=0
nomap=.false.
return
* exceptional finishing
10 write (*,30) fullname
return
11 write (*,32) fullname
return
30 format('Cannot open file ',72A)
31 format('Improper filename ',72A)
32 format('Error reading from file ',72A)
end
! =============================================================
Hope this is of some use. All the best.
Out of the methods described in your first link, the histogram comparison method is by far the simplest to code and the fastest. However key point matching will provide far more accurate results since you want to know a precise number describing the difference between two images.
To implement the histogram method, I would do the following:
Compute the red, green, and blue histograms of each image
Add up the differences between each bucket
If the difference is above a certain threshold, then the percentage is 0%
Otherwise the colors found in the images are similar. So then do a pixel by pixel comparison and convert the difference into a percentage.
I don't know any precise algorithms for finding the key points of an image. However once you find them for each image you can do a pixel by pixel comparison for each of the key points.

How to create an "intercept missile" for a game?

I have a game I am working on that has homing missiles in it. At the moment they just turn towards their target, which produces a rather dumb looking result, with all the missiles following the target around.
I want to create a more deadly flavour of missile that will aim at the where the target "will be" by the time it gets there and I am getting a bit stuck and confused about how to do it.
I am guessing I will need to work out where my target will be at some point in the future (a guess anyway), but I can't get my head around how far ahead to look. It needs to be based on how far the missile is away from the target, but the target it also moving.
My missiles have a constant thrust, combined with a weak ability to turn. The hope is they will be fast and exciting, but steer like a cow (ie, badly, for the non HitchHiker fans out there).
Anyway, seemed like a kind of fun problem for Stack Overflow to help me solve, so any ideas, or suggestions on better or "more fun" missiles would all be gratefully received.
Next up will be AI for dodging them ...
What you are suggesting is called "Command Guidance" but there is an easier, and better way.
The way that real missiles generally do it (Not all are alike) is using a system called Proportional Navigation. This means the missile "turns" in the same direction as the line-of-sight (LOS) between the missile and the target is turning, at a turn rate "proportional" to the LOS rate... This will do what you are asking for as when the LOS rate is zero, you are on collision course.
You can calculate the LOS rate by just comparing the slopes of the line between misile and target from one second to the next. If that slope is not changing, you are on collision course. if it is changing, calculate the change and turn the missile by a proportionate angular rate... you can use any metrics that represent missile and target position.
For example, if you use a proportionality constant of 2, and the LOS is moving to the right at 2 deg/sec, turn the missile to the right at 4 deg/sec. LOS to the left at 6 deg/sec, missile to the left at 12 deg/sec...
In 3-d problem is identical except the "Change in LOS Rate", (and resultant missile turn rate) is itself a vector, i.e., it has not only a magnitude, but a direction (Do I turn the missile left, right or up or down or 30 deg above horizontal to the right, etc??... Imagine, as a missile pilot, where you would "set the wings" to apply the lift...
Radar guided missiles, which "know" the rate of closure. adjust the proportionality constant based on closure (the higher the closure the faster the missile attempts to turn), so that the missile will turn more aggressively in high closure scenarios, (when the time of flight is lower), and less aggressively in low closure (tail chases) when it needs to conserve energy.
Other missiles (like Sidewinders), which do not know the closure, use a constant pre-determined proportionality value). FWIW, Vietnam era AIM-9 sidewinders used a proportionality constant of 4.
I've used this CodeProject article before - it has some really nice animations to explain the math.
"The Mathematics of Targeting and Simulating a Missile: From Calculus to the Quartic Formula":
http://www.codeproject.com/KB/recipes/Missile_Guidance_System.aspx
(also, hidden in the comments at the bottom of that article is a reference to some C++ code that accomplishes the same task from the Unreal wiki)
Take a look at OpenSteer. It has code to solve problems like this. Look at 'steerForSeek' or 'steerForPursuit'.
Have you considered negative feedback on the recent change of bearing over change of time?
Details left as an exercise.
The suggestions is completely serious: if the target does not maneuver this should obtain a near optimal intercept. And it should converge even if the target is actively dodging.
Need more detail?
Solving in a two dimensional space for ease of notation. Take \vec{m} to be the location of the missile and vector \vec{t} To be the location of the target.
The current heading in the direction of motion over last time unit: \vec{h} = \bar{\vec{m}_i - \vec{m}_i-1}}. Let r be the normlized vector between the missile and the target: \vec{r} = \bar{\vec{t} - \vec{m}}. The bearing is b = \vec{r} \dot \vec{h} Compute the bearing at each time tick, and the change thereof, and change heading to minimize that quantity.
The math is harrier in 3d because of the need to find the plane of action at each step, but the process is the same.
You'll want to interpolate the trajectory of both the target and the missile as a function of time. Then look for the times in which the coordinates of the objects are within some acceptable error.

Quick divisibility check in ZX81 BASIC

Since many of the Project Euler problems require you to do a divisibility check for quite a number of times, I've been trying to figure out the fastest way to perform this task in ZX81 BASIC.
So far I've compared (N/D) to INT(N/D) to check, whether N is dividable by D or not.
I have been thinking about doing the test in Z80 machine code, I haven't yet figured out how to use the variables in the BASIC in the machine code.
How can it be achieved?
You can do this very fast in machine code by subtracting repeatedly. Basically you have a procedure like:
set accumulator to N
subtract D
if carry flag is set then it is not divisible
if zero flag is set then it is divisible
otherwise repeat subtraction until one of the above occurs
The 8 bit version would be something like:
DIVISIBLE_TEST:
LD B,10
LD A,100
DIVISIBLE_TEST_LOOP:
SUB B
JR C, $END_DIVISIBLE_TEST
JR Z, $END_DIVISIBLE_TEST
JR $DIVISIBLE_TEST_LOOP
END_DIVISIBLE_TEST:
LD B,A
LD C,0
RET
Now, you can call from basic using USR. What USR returns is whatever's in the BC register pair, so you would probably want to do something like:
REM poke the memory addresses with the operands to load the registers
POKE X+1, D
POKE X+3, N
LET r = USR X
IF r = 0 THEN GOTO isdivisible
IF r <> 0 THEN GOTO isnotdivisible
This is an introduction I wrote to Z80 which should help you figure this out. This will explain the flags if you're not familiar with them.
There's a load more links to good Z80 stuff from the main site although it is Spectrum rather than ZX81 focused.
A 16 bit version would be quite similar but using register pair operations. If you need to go beyond 16 bits it would get a bit more convoluted.
How you load this is up to you - but the traditional method is using DATA statements and POKEs. You may prefer to have an assembler figure out the machine code for you though!
Your existing solution may be good enough. Only replace it with something faster if you find it to be a bottleneck in profiling.
(Said with a straight face, of course.)
And anyway, on the ZX81 you can just switch to FAST mode.
Don't know if RANDOMIZE USR is available in ZX81 but I think it can be used to call routines in assembly. To pass arguments you might need to use POKE to set some fixed memory locations before executing RANDOMIZE USR.
I remember to find a list of routines implemented in the ROM to support the ZX Basic. I'm sure there are a few to perform floating operation.
An alternative to floating point is to use fixed point math. It's a lot faster in these kind of situations where there is no math coprocessor.
You also might find more information in Sinclair User issues. They published some articles related to programming in the ZX Spectrum
You should place the values in some pre-known memory locations, first. Then use the same locations from within Z80 assembler. There is no parameter passing between the two.
This is based on what I (still) remember of ZX Spectrum 48. Good luck, but you might consider upgrading your hw. ;/
The problem with Z80 machine code is that it has no floating point ops (and no integer divide or multiply, for that matter). Implementing your own FP library in Z80 assembler is not trivial. Of course, you can use the built-in BASIC routines, but then you may as well just stick with BASIC.

What are the most hardcore optimisations you've seen?

I'm not talking about algorithmic stuff (eg use quicksort instead of bubblesort), and I'm not talking about simple things like loop unrolling.
I'm talking about the hardcore stuff. Like Tiny Teensy ELF, The Story of Mel; practically everything in the demoscene, and so on.
I once wrote a brute force RC5 key search that processed two keys at a time, the first key used the integer pipeline, the second key used the SSE pipelines and the two were interleaved at the instruction level. This was then coupled with a supervisor program that ran an instance of the code on each core in the system. In total, the code ran about 25 times faster than a naive C version.
In one (here unnamed) video game engine I worked with, they had rewritten the model-export tool (the thing that turns a Maya mesh into something the game loads) so that instead of just emitting data, it would actually emit the exact stream of microinstructions that would be necessary to render that particular model. It used a genetic algorithm to find the one that would run in the minimum number of cycles. That is to say, the data format for a given model was actually a perfectly-optimized subroutine for rendering just that model. So, drawing a mesh to the screen meant loading it into memory and branching into it.
(This wasn't for a PC, but for a console that had a vector unit separate and parallel to the CPU.)
In the early days of DOS when we used floppy discs for all data transport there were viruses as well. One common way for viruses to infect different computers was to copy a virus bootloader into the bootsector of an inserted floppydisc. When the user inserted the floppydisc into another computer and rebooted without remembering to remove the floppy, the virus was run and infected the harddrive bootsector, thus permanently infecting the host PC. A particulary annoying virus I was infected by was called "Form", to battle this I wrote a custom floppy bootsector that had the following features:
Validate the bootsector of the host harddrive and make sure it was not infected.
Validate the floppy bootsector and
make sure that it was not infected.
Code to remove the virus from the
harddrive if it was infected.
Code to duplicate the antivirus
bootsector to another floppy if a
special key was pressed.
Code to boot the harddrive if all was
well, and no infections was found.
This was done in the program space of a bootsector, about 440 bytes :)
The biggest problem for my mates was the very cryptic messages displayed because I needed all the space for code. It was like "FFVD RM?", which meant "FindForm Virus Detected, Remove?"
I was quite happy with that piece of code. The optimization was program size, not speed. Two quite different optimizations in assembly.
My favorite is the floating point inverse square root via integer operations. This is a cool little hack on how floating point values are stored and can execute faster (even doing a 1/result is faster than the stock-standard square root function) or produce more accurate results than the standard methods.
In c/c++ the code is: (sourced from Wikipedia)
float InvSqrt (float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1); // Now this is what you call a real magic number
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
A Very Biological Optimisation
Quick background: Triplets of DNA nucleotides (A, C, G and T) encode amino acids, which are joined into proteins, which are what make up most of most living things.
Ordinarily, each different protein requires a separate sequence of DNA triplets (its "gene") to encode its amino acids -- so e.g. 3 proteins of lengths 30, 40, and 50 would require 90 + 120 + 150 = 360 nucleotides in total. However, in viruses, space is at a premium -- so some viruses overlap the DNA sequences for different genes, using the fact that there are 6 possible "reading frames" to use for DNA-to-protein translation (namely starting from a position that is divisible by 3; from a position that divides 3 with remainder 1; or from a position that divides 3 with remainder 2; and the same again, but reading the sequence in reverse.)
For comparison: Try writing an x86 assembly language program where the 300-byte function doFoo() begins at offset 0x1000... and another 200-byte function doBar() starts at offset 0x1001! (I propose a name for this competition: Are you smarter than Hepatitis B?)
That's hardcore space optimisation!
UPDATE: Links to further info:
Reading Frames on Wikipedia suggests Hepatitis B and "Barley Yellow Dwarf" virus (a plant virus) both overlap reading frames.
Hepatitis B genome info on Wikipedia. Seems that different reading-frame subunits produce different variations of a surface protein.
Or you could google for "overlapping reading frames"
Seems this can even happen in mammals! Extensively overlapping reading frames in a second mammalian gene is a 2001 scientific paper by Marilyn Kozak that talks about a "second" gene in rat with "extensive overlapping reading frames". (This is quite surprising as mammals have a genome structure that provides ample room for separate genes for separate proteins.) Haven't read beyond the abstract myself.
I wrote a tile-based game engine for the Apple IIgs in 65816 assembly language a few years ago. This was a fairly slow machine and programming "on the metal" is a virtual requirement for coaxing out acceptable performance.
In order to quickly update the graphics screen one has to map the stack to the screen in order to use some special instructions that allow one to update 4 screen pixels in only 5 machine cycles. This is nothing particularly fantastic and is described in detail in IIgs Tech Note #70. The hard-core bit was how I had to organize the code to make it flexible enough to be a general-purpose library while still maintaining maximum speed.
I decomposed the graphics screen into scan lines and created a 246 byte code buffer to insert the specialized 65816 opcodes. The 246 bytes are needed because each scan line of the graphics screen is 80 words wide and 1 additional word is required on each end for smooth scrolling. The Push Effective Address (PEA) instruction takes up 3 bytes, so 3 * (80 + 1 + 1) = 246 bytes.
The graphics screen is rendered by jumping to an address within the 246 byte code buffer that corresponds to the right edge of the screen and patching in a BRanch Always (BRA) instruction into the code at the word immediately following the left-most word. The BRA instruction takes a signed 8-bit offset as its argument, so it just barely has the range to jump out of the code buffer.
Even this isn't too terribly difficult, but the real hard-core optimization comes in here. My graphics engine actually supported two independent background layers and animated tiles by using different 3-byte code sequences depending on the mode:
Background 1 uses a Push Effective Address (PEA) instruction
Background 2 uses a Load Indirect Indexed (LDA ($00),y) instruction followed by a push (PHA)
Animated tiles use a Load Direct Page Indexed (LDA $00,x) instruction followed by a push (PHA)
The critical restriction is that both of the 65816 registers (X and Y) are used to reference data and cannot be modified. Further the direct page register (D) is set based on the origin of the second background and cannot be changed; the data bank register is set to the data bank that holds pixel data for the second background and cannot be changed; the stack pointer (S) is mapped to graphics screen, so there is no possibility of jumping to a subroutine and returning.
Given these restrictions, I had the need to quickly handle cases where a word that is about to be pushed onto the stack is mixed, i.e. half comes from Background 1 and half from Background 2. My solution was to trade memory for speed. Because all of the normal registers were in use, I only had the Program Counter (PC) register to work with. My solution was the following:
Define a code fragment to do the blend in the same 64K program bank as the code buffer
Create a copy of this code for each of the 82 words
There is a 1-1 correspondence, so the return from the code fragment can be a hard-coded address
Done! We have a hard-coded subroutine that does not affect the CPU registers.
Here is the actual code fragments
code_buff: PEA $0000 ; rightmost word (16-bits = 4 pixels)
PEA $0000 ; background 1
PEA $0000 ; background 1
PEA $0000 ; background 1
LDA (72),y ; background 2
PHA
LDA (70),y ; background 2
PHA
JMP word_68 ; mix the data
word_68_rtn: PEA $0000 ; more background 1
...
PEA $0000
BRA *+40 ; patched exit code
...
word_68: LDA (68),y ; load data for background 2
AND #$00FF ; mask
ORA #$AB00 ; blend with data from background 1
PHA
JMP word_68_rtn ; jump back
word_66: LDA (66),y
...
The end result was a near-optimal blitter that has minimal overhead and cranks out more than 15 frames per second at 320x200 on a 2.5 MHz CPU with a 1 MB/s memory bus.
Michael Abrash's "Zen of Assembly Language" had some nifty stuff, though I admit I don't recall specifics off the top of my head.
Actually it seems like everything Abrash wrote had some nifty optimization stuff in it.
The Stalin Scheme compiler is pretty crazy in that aspect.
I once saw a switch statement with a lot of empty cases, a comment at the head of the switch said something along the lines of:
Added case statements that are never hit because the compiler only turns the switch into a jump-table if there are more than N cases
I forget what N was. This was in the source code for Windows that was leaked in 2004.
I've gone to the Intel (or AMD) architecture references to see what instructions there are. movsx - move with sign extension is awesome for moving little signed values into big spaces, for example, in one instruction.
Likewise, if you know you only use 16-bit values, but you can access all of EAX, EBX, ECX, EDX , etc- then you have 8 very fast locations for values - just rotate the registers by 16 bits to access the other values.
The EFF DES cracker, which used custom-built hardware to generate candidate keys (the hardware they made could prove a key isn't the solution, but could not prove a key was the solution) which were then tested with a more conventional code.
The FSG 2.0 packer made by a Polish team, specifically made for packing executables made with assembly. If packing assembly isn't impressive enough (what's supposed to be almost as low as possible) the loader it comes with is 158 bytes and fully functional. If you try packing any assembly made .exe with something like UPX, it will throw a NotCompressableException at you ;)