How many nand gates does a computer actually need to operate?


At first I was thinking that logic gates were much smaller than they actually are:
https://www.google.com/search?q=nand+gates#q=nand+gates&tbm=shop
So my question is, how many logic gates (similar to the one above) does a computer actually need to operate? Since this number must be somewhat small due to size limitations (there clearly cannot be millions of these in a computer), how is it that the computer can work with such a small number of these gates?

You can't just shop for the NAND gates a computer uses. The ones you found are discrete parts sold for hobbyists and other markets. The NAND gates in your computer are inside the processor: they are not individual components but are patterned directly onto the die by lithography. Their features are only a few nanometers in size, and there are billions of them on a modern processor.

Related

Is it possible to model the Universe in an object oriented manner from the subatomic level upwards?

While I'm certain this must have been tried before, I can't seem to find any examples of this concept being done.
What I'm describing goes off the idea that you could effectively model all "things" as objects. From there you can make objects which use other objects. An example would be starting with the fundamental particles in physics and combining them to get particles like protons, neutrons and electrons, then atoms, then working your way up to the rest of chemistry, etc.
Has this been attempted before and is it possible? How would I even go about it?

If what you mean by "the Universe," is the entire actual universe, the answer to "Is it possible?" is a resounding "Hell no!!!"
Consider a single mole of H2O, good old water. By definition a mole contains ~6×10^23 atoms, and knowing the atomic weights involved yields the mass. The density of water is well known. Pulling all the pieces together, we end up with 1 mole is about 18 mL of water. To put that in perspective, the cough syrup dose cup in my medicine cabinet is 20 mL. If you could represent the state of each atom using a single byte (I doubt it!) you'd require 10^11 terabytes of storage just to represent a snapshot of that mass, and you'd need to update that volume of data every delta-t for the duration you wish to simulate. Additionally, the number of 2-way interactions between N entities grows as O(N^2), i.e., on the order of 10^46 calculations would be involved, again at every delta-t. To put that into perspective, if you had access to the world's fastest current distributed computer with exaflop capability, it would take you O(10^28) seconds (on the order of 10^20 years) to perform the calculations for a single simulated delta-t update! You might be able to improve that by playing games with locality, but given the speed of light and the small distances involved you'd have to make a convincing case that heat transfer via thermal radiation couldn't cause state-altering interactions between any pair of atoms within the volume. To sum it up, the storage and calculation requirements are both infeasible for as little as a single mole of mass.
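For anyone who wants to redo the arithmetic, here is a rough Python sketch of the estimate above. The constant names are mine, and "one pair interaction per floating-point operation" is a generous assumption. Counting every unordered pair exactly gives about 1.8×10^47 interactions per step, roughly an order of magnitude above the rounded figure quoted, which only makes the conclusion stronger.

```python
# Back-of-envelope check of the one-mole estimate (a sketch, not a physics model).
AVOGADRO = 6.022e23                     # entities per mole
BYTES_PER_ENTITY = 1                    # assumption from the text: one byte of state each
OPS_PER_SECOND = 1e18                   # exaflop machine; assumes 1 pair interaction per op
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mole_simulation_cost(n=AVOGADRO):
    """Return (storage in TB, pair interactions, seconds per step, years per step)."""
    storage_tb = n * BYTES_PER_ENTITY / 1e12
    pair_interactions = n * (n - 1) / 2          # every unordered pair, once per delta-t
    seconds_per_step = pair_interactions / OPS_PER_SECOND
    years_per_step = seconds_per_step / SECONDS_PER_YEAR
    return storage_tb, pair_interactions, seconds_per_step, years_per_step


if __name__ == "__main__":
    tb, pairs, secs, years = mole_simulation_cost()
    print(f"storage: {tb:.1e} TB, pairs: {pairs:.1e}, "
          f"time per step: {secs:.1e} s ({years:.1e} years)")
```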
I know from a conversation at a conference a couple of years ago that there are some advanced physics labs that have worked on this approach to get an idea of what happens with a few thousand atoms. However, I can't give specific references since I haven't seen the papers and only heard about it over a beer.

Relationship between number of logic cells on an FPGA and performance

Hey, so I have a question about FPGAs. If you look at the current lineup of Xilinx products, specifically the 7 series, there is a massive price differential between the models. What I don't understand is this: I could buy an Artix-7 with ~200k logic cells for $300, whereas a Virtex-7 with ~2000k logic cells costs in excess of $20,000. So could I just buy 10 Artix-7s and get the same performance? Furthermore, is performance linearly related to the number of logic cells, and if not, how are they related? Is there any advantage to having more logic cells per core? I'm sure it depends on what you are doing, but my interest in the matter, although theoretical, lies in cryptographic applications, so my question relates specifically to implementations of MD5, SHA-0/1/2/3, and similar hash algorithms.

An FPGA doesn't have "performance" like a processor. It just has a bunch of logic elements (LEs) that you can use. If a high-end part has 2MLEs and a low-end part has 200kLEs, but you only need 20kLEs for your processing core, it makes little difference which one you use, all else being equal. Of course, if you have a problem that can easily be parallelized, then you can turn those extra LEs into extra performance by building more processing cores. But that's up to you to do.
Now, all else is not always equal, because there's a lot more to an FPGA than simply the number of logic cells. I can't speak for Xilinx parts (I work for another major FPGA vendor) but typically the high-end families will have things like very high-speed transceivers that the midrange and low-end families do not. In addition, sometimes they have different mixes of embedded RAM, DSP, etc.
So, can you use a bunch of small FPGAs instead of a large one? Remember that an FPGA will only have about 1000-2000 IOs, whereas there will be more like 100Ks of internal wires between the corresponding parts of the higher-end part. So not only will you have to build a pretty complicated board, you might find yourself IO-limited in getting signals off of one chip and onto another.
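To make the "extra LEs become extra cores" point concrete, here is a small Python sketch. The function name, the 80% usable-fabric margin and the 20k-LE core size are illustrative assumptions, not vendor figures. It only models independent cores that never talk to each other, which is the one case where several small parts can stand in for a large one without hitting the IO limit described above.

```python
def cores_that_fit(total_les, les_per_core, usable_percent=80):
    """How many identical, independent cores fit on one part.

    usable_percent is an assumed margin for routing and glue logic;
    real utilisation depends on the design and the tools.
    """
    usable_les = total_les * usable_percent // 100
    return usable_les // les_per_core


if __name__ == "__main__":
    core = 20_000                                   # hypothetical hash core size in LEs
    small = cores_that_fit(200_000, core)           # low-end part
    large = cores_that_fit(2_000_000, core)         # high-end part
    print(small, large, 10 * small)                 # ten small parts vs one large part
```

If the cores had to exchange data every cycle, the comparison would no longer be this simple, because the traffic between chips would have to squeeze through the package IOs.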

Small embedded synthesized speech libraries/suggestions

Are there any easy-to-use free or cheap speech synthesis libraries for PIC and/or ARM embedded systems where code size is more important than speech quality? Nowadays it seems that a 1 meg package is considered "compact", but a lot of microcontrollers are smaller than that. Back in the 1980's Apple hired a contractor to produce Macintalk, which offered reasonable-quality speech in a 26K package which ran on a 7.16MHz 68000, and a program called SAM could produce speech that wasn't quite as good, but still serviceable, with a 16K package that ran on a 1MHz 6502. The SpeakJet runs a speech-synthesis algorithm on some type of PIC.
I probably wouldn't particularly need to produce speech, but would want to be able to speak messages formed from a number of pre-set words. Obviously it would be possible to simply prerecord all the messages, but with a vocabulary of e.g. 100 words, I would think that storing 16K worth of code plus maybe 1K worth of phonetic strings would be more compact than storing audio for 100 words.
Alternatively, if I wanted to store audio for 100 words, what would be the best way of generating a set of words that would flow naturally together? On older-style speech synthesizers, any given word could be spoken three ways: neutral inflection, falling inflection (as if followed by a period), or rising inflection (followed by a question mark). Words with neutral inflection could be spliced together in any order and sound fine. The text-to-wave tools I've found, though, seem to like to add finer details of inflection which sound "off" if words are cut apart and resequenced. Are there any tools which are designed for producing waves that can be concatenated and spliced nicely? If I do use such a tool, what audio format would be best for storing the waves so as to allow efficient decoding on a small microcontroller?
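For a rough sense of the storage trade-off, here is a Python sketch of the arithmetic. The half-second average word length, the 8 kHz sample rate and the bit depths are my assumptions; only the 16K + 1K synthesizer figure comes from the numbers above.

```python
def stored_audio_bytes(words, seconds_per_word, sample_rate_hz, bits_per_sample):
    """Size of uncompressed (or fixed-rate compressed) audio for a word list."""
    return int(words * seconds_per_word * sample_rate_hz * bits_per_sample / 8)


if __name__ == "__main__":
    synth = 16 * 1024 + 1 * 1024                       # code + phonetic strings
    pcm8 = stored_audio_bytes(100, 0.5, 8000, 8)       # 8-bit PCM (assumed format)
    adpcm4 = stored_audio_bytes(100, 0.5, 8000, 4)     # 4 bits/sample, ADPCM-style rate
    print(synth, pcm8, adpcm4)
```

Under these assumptions even 4-bit audio is roughly ten times larger than the synthesizer, so the prerecorded route mostly makes sense when flash is cheap or quality matters.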

Last time I did this I was able to add hardware like http://www.sparkfun.com/products/9578. There may be patent liabilities in your environment, like I ran into, that force a commercial software stack or an off-the-shelf chip.
Otherwise, I've used http://www.speech.cs.cmu.edu/flite/ for more lenient projects, and it worked well.

Can you program a pure GPU game?

I'm a CS master's student, and next semester I will have to start working on my thesis. I've had trouble coming up with a thesis idea, but I decided it will be related to Computer Graphics as I'm passionate about game development and wish to work as a professional game programmer one day.
Unfortunately I'm kinda new to the field of 3D Computer Graphics. I took an undergraduate course on the subject and hope to take an advanced course next semester, and I'm already reading a variety of books and articles to learn more. Still, my supervisor thinks it's better if I come up with a general thesis idea now and then spend time learning about it in preparation for doing my thesis proposal. My supervisor has supplied me with some good ideas but I'd rather do something more interesting on my own, which hopefully has to do with games and gives me more opportunities to learn more about the field. I don't care if it's already been done, for me the thesis is more of an opportunity to learn about things in depth and to do substantial work on my own.
I don't know much about GPU programming and I'm still learning about shaders and languages like CUDA. One idea I had is to program an entire game (or as much as possible) on the GPU, including all the game logic, AI, and tests. This is inspired by reading papers on GPGPU and questions like this one. I don't know how feasible that is with my knowledge, and my supervisor doesn't know a lot about recent GPUs. I'm sure with time I will be able to answer this question on my own, but it'd be handy if I could know the answer in advance so I could also consider other ideas.
So, if you've got this far, my question: Using only shaders or something like CUDA, can you make a full, simple 3D game that exploits the raw power and parallelism of GPUs? Or am I missing some limitation or difference between GPUs and CPUs that will always make a large portion of my code bound to CPU? I've read about physics engines running on the GPU, so why not everything else?

DISCLAIMER: I've done a PhD, but have never supervised a student of my own, so take all of what I'm about to say with a grain of salt!
I think trying to force as much of a game as possible onto a GPU is a great way to start off your project, but eventually the point of your work should be: "There's this thing that's an important part of many games, but in its present state doesn't fit well on a GPU: here is how I modified it so it would fit well".
For instance, fortran mentioned that AI algorithms are a problem because they tend to rely on recursion. True, but, this is not necessarily a deal-breaker: the art of converting recursive algorithms into an iterative form is looked upon favorably by the academic community, and would form a nice center-piece for your thesis.
However, as a master's student, you haven't got much time, so you would really need to identify the kernel of interest very quickly. I would not bother trying to get the whole game to actually fit onto the GPU as part of the outcome of your master's: I would treat it as an exercise just to see which part won't fit, and then focus on that part alone.
But be careful with your choice of supervisor. If your supervisor doesn't have any relevant experience, you should pick someone else who does.

I'm still waiting for a Gameboy Emulator that runs entirely on the GPU, which is just fed the game ROM itself and current user input and results in a texture displaying the game - maybe a second texture for sound output :)
The main problem is that you can't access persistent storage, user input or audio output from a GPU. These parts have to be on the CPU, by definition (cards with HDMI do have audio output, but I think you can't control it from the GPU). Apart from that, you can already push large parts of the game code into the GPU, but I think it's not enough for a 3D game, since someone has to feed the 3D data into the GPU and tell it which shaders should apply to which part. You can't really randomly access data on the GPU or run arbitrary code; someone has to do the setup.
Some time ago, you would just set up a texture with the source data, a render target for the result data, and a pixel shader that would do the transformation. Then you rendered a quad with the shader to the render target, which would perform the calculations, and then read the texture back (or used it for further rendering). Today, things have been made simpler by the fourth and fifth generation of shaders (Shader Model 4.0 and whatever is in DirectX 11), so you can have larger shaders and access memory more easily. But they still have to be set up from the outside, and I don't know how things are today regarding keeping data between frames. In the worst case, the CPU has to read back from the GPU and push again to retain game data, which is always a slow thing to do. But if you can really get to a point where a single generic setup/rendering cycle would be sufficient for your game to run, you could say that the game runs on the GPU. The code would be quite different from normal game code, though. Most of the performance of GPUs comes from the fact that they execute the same program in hundreds or even thousands of parallel shading units, and you can't just write a shader that can draw an image to a certain position. A pixel shader always runs, by definition, on one pixel, and the other shaders can do things on arbitrary coordinates, but they don't deal with pixels. It won't be easy, I guess.
I'd suggest just trying out the points I mentioned. The most important one is retaining state between frames, in my opinion, because if you can't retain all the data, the rest is impossible.
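To illustrate the "source texture, render target, per-pixel shader" cycle and how state can survive between frames, here is a CPU-side Python sketch that only simulates the pattern. The names are made up, Conway's Game of Life stands in for game state, and nothing here touches a real graphics API. The idea is that each "shader" call sees only the previous frame's buffer and writes exactly one output pixel, and the two buffers swap roles after every frame.

```python
def step_pixel(prev, x, y):
    """Per-pixel 'shader': reads the previous frame only, returns this pixel's new value."""
    h, w = len(prev), len(prev[0])
    neighbours = sum(
        prev[(y + dy) % h][(x + dx) % w]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    alive = prev[y][x]
    return 1 if neighbours == 3 or (alive and neighbours == 2) else 0


def run_frames(state, frames):
    """Ping-pong between two buffers so state is retained from frame to frame."""
    h, w = len(state), len(state[0])
    front = [row[:] for row in state]          # plays the role of the source texture
    back = [[0] * w for _ in range(h)]         # plays the role of the render target
    for _ in range(frames):
        for y in range(h):                     # on a GPU these loops run in parallel
            for x in range(w):
                back[y][x] = step_pixel(front, x, y)
        front, back = back, front              # swap: last output becomes next input
    return front


if __name__ == "__main__":
    blinker = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in run_frames(blinker, 1):
        print(row)
```

As long as the swap happens on the device, no read-back to the CPU is needed to keep the state alive.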

First, I'm not a computer engineer, so my assumptions aren't even worth a grain of salt, maybe something nano scale.
Artificial intelligence? No problem. There are countless examples of neural networks running in parallel to be found on Google. Example: http://www.heatonresearch.com/encog
Pathfinding? You just try some parallel pathfinding algorithms that are already on internet. Just one of them: https://graphics.tudelft.nl/Publications-new/2012/BB12a/BB12a.pdf
Drawing? Use the interoperability of DirectX or OpenGL with CUDA or OpenCL so drawing data doesn't cross the PCIe bus. You can even do raytracing at corners so there is no z-fighting anymore; even a purely raytraced screen is doable on a mainstream GPU using a low depth limit.
Physics? The easiest part: just iterate a simple Euler or Verlet integration and run frequent stability checks if the order of error is large.
Map/terrain generation? You just need a Mersenne-twister and a triangulator.
Save game? Sure, you can compress the data in parallel before writing to a buffer. Then a scheduler writes that data piece by piece to the HDD through DMA so there is no lag.
Recursion? Write your own stack algorithm using main vram, not local memory so other kernels can run in wavefronts and GPU occupation is better.
Too much integer math needed? You can cast to float, do 50-100 calculations using all cores, then cast the result back to integer.
Too much branching? Compute both cases if they are simple, so every core stays in step and finishes in sync. If not, you can write a branch predictor of your own so that the next time it predicts better than the hardware (could it?) with your own algorithm.
Too much memory needed? You can add another GPU to system and open DMA channel or a CF/SLI for faster communication.
The hardest part, in my opinion, is the object-oriented design, since building pseudo-objects on the GPU is very weird and hardware dependent. Objects should be represented in host (CPU) memory, but they must be separated over many arrays on the GPU to be efficient. Example objects in host memory: orc1xy_orc2xy_orc3xy. Example objects in GPU memory: orc1_x__orc2_x__ ... orc1_y__orc2_y__ ...
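The orc layout above is usually called "array of structures" versus "structure of arrays". Here is a small host-side Python sketch of the conversion. The function names and the dict-based objects are mine, and a real engine would upload the resulting arrays to GPU buffers rather than keep Python lists.

```python
def to_struct_of_arrays(objects, fields):
    """Turn [{x, y}, {x, y}, ...] into {"x": [...], "y": [...]} (one array per field)."""
    return {field: [obj[field] for obj in objects] for field in fields}


def move_all(soa, dx, dy):
    """Apply the same operation to every element, the access pattern GPUs like."""
    return {
        "x": [x + dx for x in soa["x"]],
        "y": [y + dy for y in soa["y"]],
    }


if __name__ == "__main__":
    orcs = [{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 5, "y": 6}]   # host-side objects
    soa = to_struct_of_arrays(orcs, ("x", "y"))
    print(soa)
    print(move_all(soa, 10, 0))
```

With one array per field, neighbouring threads read neighbouring memory locations, which is the reason for splitting the objects up.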

An answer was already accepted 6 years ago, but for those interested in the actual question: Shadertoy, a live-coding WebGL platform, recently added a "multipass" feature that allows state to be preserved.
Here's a live demo of the Bricks game running on the GPU.

"I don't care if it's already been done, for me the thesis is more of an opportunity to learn about things in depth and to do substantial work on my own."
Then your idea of what a thesis is is completely wrong. A thesis must be original research. (Edit: I was thinking about a PhD thesis, not a master's thesis ^_^)
About your question, the GPU's instruction sets and capabilities are very specific to vector floating point operations. The game logic usually does little floating point, and much logic (branches and decision trees).
If you take a look at the CUDA Wikipedia page you will see:
"It uses a recursion-free, function-pointer-free subset of the C language"
So forget about implementing any AI algorithms there that are essentially recursive (like A* for pathfinding). Maybe you could simulate the recursion with stacks, but if it's not allowed explicitly there is probably a reason. Not having function pointers also somewhat limits the ability to use dispatch tables for handling the different actions depending on the state of the game (you could again use chained if-else constructions, but something smells bad there).
Those limitations in the language reflect that the underlying hardware is mostly designed for stream-processing tasks. Of course there are workarounds (stacks, chained if-else), and you could theoretically implement almost any algorithm there, but they will probably make the performance suck a lot.
The other point is about handling the IO, as already mentioned up there, this is a task for the main CPU (because it is the one that executes the OS).
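For reference, the "simulate the recursion with stacks" workaround looks roughly like this. It is a plain Python sketch of the transformation, not GPU code: the grid, the function name and the 4-neighbour rule are illustrative assumptions, and on a GPU the stack would have to live in a fixed-size buffer.

```python
def reachable_cells(grid, start):
    """Count open cells (value 0) reachable from start, without any recursion.

    grid is a list of rows, start is an (x, y) tuple on an open cell.
    The explicit stack replaces the call stack of a recursive flood fill.
    """
    h, w = len(grid), len(grid[0])
    stack = [start]
    seen = {start}
    while stack:
        x, y = stack.pop()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < w and 0 <= ny < h
            if inside and grid[ny][nx] == 0 and (nx, ny) not in seen:
                seen.add((nx, ny))
                stack.append((nx, ny))
    return len(seen)


if __name__ == "__main__":
    level = [
        [0, 0, 1],
        [1, 0, 1],
        [0, 1, 0],
    ]
    print(reachable_cells(level, (0, 0)))
```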

It is viable to do a master's thesis on a subject and with tools that you are unfamiliar with when you begin. However, it's a big chance to take!
Of course a master's thesis should be fun. But ultimately, it's imperative that you pass with distinction, and that might mean tackling a difficult subject that you have already mastered.
Equally important is your supervisor. It's imperative that you tackle some problem they show an interest in, one they are themselves familiar with, so that they can become interested in helping you get a great grade.
You've had lots of hobby time for scratching itches, and you'll no doubt have lots more hobby time in the future too. But master's thesis time is unfortunately not the time for hobbies.

Whilst GPUs today have some immense computational power, they are, regardless of things like CUDA and OpenCL, limited to a restricted set of uses, whereas the CPU is more suited towards computing general things, with extensions like SSE to speed up specific common tasks. If I'm not mistaken, some GPUs are unable to divide two floating-point numbers in hardware. Certainly things have improved greatly compared to 5 years ago.
It'd be impossible to develop a game to run entirely on a GPU; it would need the CPU at some stage to execute something. However, making a GPU perform more than just the graphics (and even physics) of a game would certainly be interesting. The catch is that PC game developers have to contend with a wide variety of machine specifications, and thus have to build in backwards compatibility, which complicates things. The architecture of a system will be a crucial issue. For example, the Playstation 3 can do multiple gigabytes a second of throughput between the CPU and RAM, and between the GPU and video RAM, but the CPU accessing GPU memory peaks out just past 12MiB/s.

The approach you may be looking for is called "GPGPU" for "General Purpose GPU". Good starting points may be:
http://en.wikipedia.org/wiki/GPGPU
http://gpgpu.org/
Rumors about spectacular successes in this approach have been around for a few years now, but I suspect that this will become everyday practice in a few years (unless CPU architectures change a lot, and make it obsolete).
The key here is parallelism: the approach pays off if you have a problem that can use a large number of parallel processing units. Thus, neural networks or genetic algorithms may be a good range of problems to attack with the power of a GPU. Maybe also looking for vulnerabilities in cryptographic hashes (cracking DES on a GPU would make a nice thesis, I imagine :)). But problems requiring high-speed serial processing don't seem so well suited to the GPU. So emulating a GameBoy may be out of scope. (But emulating a cluster of low-power machines might be considered.)

I would think a project dealing with a game architecture that targets multi-core CPUs and GPUs would be interesting. I think this is still an area where a lot of work is being done. In order to take advantage of current and future computer hardware, new game architectures are going to be needed. I went to GDC 2008 and there were some talks related to this. Gamebryo had an interesting approach where they create threads for processing computations. You can designate the number of cores you want to use so that you don't starve out other libraries that might be multi-core. I imagine the computations could be targeted to GPUs as well.
Other approaches included targeting different systems for different cores so that computations could be done in parallel. For instance, the first split a talk suggested was to put the renderer on its own core and the rest of the game on another. There are other more complex techniques but it all basically boils down to how do you get the data around to the different cores.

GPS signal cleaning & road network matching

I'm using GPS units and mobile computers to track individual pedestrians' travels. I'd like to "clean" the incoming GPS signal in real time to improve its accuracy. Also, after the fact, not necessarily in real time, I would like to "lock" individuals' GPS fixes to positions along a road network. Do you have any techniques, resources, algorithms, or existing software to suggest on either front?
A few things I am already considering in terms of signal cleaning:
- drop fixes for which num. of satellites = 0
- drop fixes for which speed is unnaturally high (say, 600 mph)
And in terms of "locking" to the street network (which I hear is called "map matching"):
- lock to the nearest network edge based on root mean squared error
- when fixes are far away from road network, highlight those points and allow user to use a GUI (OpenLayers in a Web browser, say) to drag, snap, and drop on to the road network
Thanks for your ideas!

I assume you want to "clean" your data to remove erroneous spikes caused by dodgy readings. This is a basic DSP process. There are several approaches you could take to this; it depends how clever you want it to be.
At a basic level, yes, you can just look for really large figures, but what is a really large figure? Yeah, 600mph is fast, but not if you're in Concorde. Whilst you are looking for a value which is "out of the ordinary", you are effectively hard-coding "ordinary". A better approach is to examine past data to determine what "ordinary" is, and then look for deviations. You might want to consider calculating the variance of the data over a small local window and then seeing if the z-score of your current data is greater than some threshold, and if so, excluding it.
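A minimal Python sketch of that windowed z-score idea follows. The window size, the threshold of 3 and the `min_sigma` floor are assumptions you would tune on real data. The floor is my addition, so that a perfectly steady window does not reject every small change.

```python
from statistics import mean, pstdev


def is_outlier(window, value, threshold=3.0, min_sigma=0.5):
    """True if value lies more than `threshold` standard deviations from the window mean."""
    if len(window) < 2:
        return False                      # not enough history to define "ordinary"
    sigma = max(pstdev(window), min_sigma)
    return abs(value - mean(window)) / sigma > threshold


def filter_speeds(speeds, window_size=5, threshold=3.0, min_sigma=0.5):
    """Keep readings that look ordinary relative to the last few accepted readings."""
    kept = []
    for value in speeds:
        if not is_outlier(kept[-window_size:], value, threshold, min_sigma):
            kept.append(value)
    return kept


if __name__ == "__main__":
    walking = [1.2, 1.4, 1.3, 1.5, 1.3, 60.0, 1.4]   # made-up speeds in m/s, one spike
    print(filter_speeds(walking))
```

Because the window only contains accepted readings, a spike cannot drag the mean towards itself, but a bad reading among the very first samples would go unnoticed.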

One note: you should use 3 as the minimum satellites, not 0. A GPS needs at least three sources to calculate a horizontal location. Every GPS I have used includes a status flag in the data stream; less than 3 satellites is reported as "bad" data in some way.
You should also consider "stationary" data. How will you handle the pedestrian standing still for some period of time? Perhaps waiting at a crosswalk or interacting with a street vendor?
Depending on what you plan to do with the data, you may need to suppress those extra data points or average them into a single point or location.

You mention this is for pedestrian tracking, but you also mention a road network. Pedestrians can travel a lot of places where a car cannot, and, indeed, which probably are not going to be on any map you find of a "road network". Most road maps don't have things like walking paths in parks, hiking trails, and so forth. Don't assume that "off the road network" means the GPS isn't getting an accurate fix.

In addition to Andrew's comments, you may also want to consider interference factors such as multipath, and how they show up in your incoming GPS data stream, e.g. the HDOP values in the GSA line of NMEA 0183. In my own GPS controller software, I allow user-specified rejection criteria against a range of QA-related parameters.
I also tend to work on a moving window principle in this regard, where you can consider rejecting data that represents a spike based on surrounding data in the same window.

Read the position fix indicator to see if the signal is valid (it's the fix quality field in the $GPGGA sentence if you parse raw NMEA strings). If it's 0, ignore the message.
Besides that you could look at the combination of HDOP and the number of satellites if you really need to be sure that the signal is very accurate, but in normal situations that shouldn't be necessary.
Of course it doesn't hurt to do some sanity checks on GPS signals:
latitude between -90..90;
longitude between -180..180 (or E..W, N..S, 0..90 and 0..180 if you're reading raw NMEA strings);
speed between 0 and 255 (for normal cars);
distance to previous measurement (based on lat/lon) matches roughly with the indicated speed;
time difference with system time not larger than x (unless the system clock cannot be trusted or relies on GPS synchronisation :-) );
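The checks in the list above could be sketched in Python like this. The dict layout of a fix, the km/h unit, the tolerance factor and the 5 km/h floor are all assumptions for illustration, and the system-clock check is left out because it depends on how much you trust the clock.

```python
import math

EARTH_RADIUS_M = 6371000.0   # mean Earth radius, good enough for short distances


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fix_is_plausible(fix, prev=None, max_speed_kmh=255.0, tolerance=2.0):
    """fix/prev: dicts with quality, lat, lon, speed_kmh and t in seconds (assumed layout)."""
    if fix["quality"] == 0:                       # receiver reports no valid fix
        return False
    if not -90 <= fix["lat"] <= 90 or not -180 <= fix["lon"] <= 180:
        return False
    if not 0 <= fix["speed_kmh"] <= max_speed_kmh:
        return False
    if prev is not None:
        dt = fix["t"] - prev["t"]
        if dt <= 0:
            return False
        implied_kmh = haversine_m(prev["lat"], prev["lon"], fix["lat"], fix["lon"]) / dt * 3.6
        # The distance travelled should roughly agree with the reported speed.
        if implied_kmh > tolerance * max(fix["speed_kmh"], 5.0):
            return False
    return True
```

For pedestrians you would pass a much lower `max_speed_kmh` than the car-oriented default.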
To do map matching, you basically iterate through your road segments, and check which segment is the most likely for your current position, direction, speed and possibly previous gps measurements and matches.
If you're not doing a realtime application, or if a delay in feedback is acceptable, you can even look into the 'future' to see which segment is the most likely.
Doing all that properly is an art by itself, and this space here is too short to go into it deeply.
It's often difficult to decide with 100% confidence on which road segment somebody resides. For example, if there are 2 parallel roads that are equally close to the current position it's a matter of creative heuristics.
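As a starting point, the simplest version of "iterate through your road segments" is a nearest-segment snap. The Python sketch below uses distance only and ignores direction, speed and history. It assumes coordinates already projected to a planar system in metres, and all names are illustrative. Returning `None` beyond `max_distance` leaves room for flagging far-off fixes instead of forcing them onto a road.

```python
import math


def project_to_segment(p, a, b):
    """Closest point on segment a-b to p, plus the distance. Points are (x, y) tuples."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0                                   # degenerate segment: a == b
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))                 # clamp to the segment's ends
    cx, cy = ax + t * dx, ay + t * dy
    return (cx, cy), math.hypot(px - cx, py - cy)


def snap_to_network(p, segments, max_distance=None):
    """Return (segment index, snapped point, distance), or None if nothing is close enough."""
    best = None
    for index, (a, b) in enumerate(segments):
        point, dist = project_to_segment(p, a, b)
        if best is None or dist < best[2]:
            best = (index, point, dist)
    if best is None or (max_distance is not None and best[2] > max_distance):
        return None
    return best
```

With two parallel roads at equal distance this just picks the first one it sees, which is exactly the case where heading, speed and previous matches have to break the tie.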
