Number of instructions in JVM

I was asked the following question in an exam today. I still don't know the answer.
Java uses a stack for bytecode in the JVM. Each instruction is one byte, so how many such instructions (per bytecode) are possible in an operating system?
All I know is that the stack is 32 bits wide. Can anybody help me (I am a beginner with the JVM)?

The expected answer was almost certainly 256, because there are 256 possible values of a byte.
This of course has nothing to do with the actual JVM instruction set. The number of possible instructions can vary anywhere from a couple dozen to an exponentially large number depending on how you count.
The actual JVM instruction set has many unused opcodes, one opcode that conceptually represents more than one instruction, and many instructions that can be encoded in tons of different ways with multiple different opcodes. Many instructions are more than one byte, with a couple that can be up to 64 KB long. And that isn't even getting into stuff like how you count the wide prefix.
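As a rough sketch of what the expected answer amounts to, here is a tiny C program (the opcode values for bipush, istore_0 and return are taken from the JVM specification's opcode table): an opcode is a single unsigned byte, so at most 256 opcode values are encodable, and operand bytes make many instructions longer than one byte.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* A tiny hand-assembled method body. */
    uint8_t code[] = { 0x10, 0x2A,   /* bipush 42  -- opcode + 1 operand byte */
                       0x3B,         /* istore_0   -- 1 byte                  */
                       0xB1 };       /* return     -- 1 byte                  */

    /* Opcodes are unsigned bytes, so they can only take 2^8 = 256 values. */
    for (size_t i = 0; i < sizeof code; i++)
        printf("offset %zu: byte value %3u (0x%02X)\n", i, code[i], code[i]);
    return 0;
}
```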

Related

How to concatenate the low halves of two SSE registers?

I have two SSE registers and I want to replace the high half of one with the low half of the other. As usual, in the fastest way.
I guess it is doable by shifting one of the registers by 8 bytes and then using palignr to concatenate.
Is there any single-instruction solution?
You can use punpcklqdq to combine the low halves of two registers into hi:lo in a single register. This is identical to what the movlhps FP instruction does, and also unpcklpd, but operates in the integer domain on CPUs that care about FP vs. integer shuffles for bypass delays.
Bonus reading: combining different parts of two registers
palignr would only be good for combining hi:xxx with xxx:lo, to produce lo:hi (i.e. reversed). You can use an FP shuffle (the register-register form of movsd) to get hi:lo (by moving the low half of xxx:lo to replace the low garbage in hi:xxx). Without that, you'd want to use punpckhqdq to bring the high half of one register to the low half, then use punpcklqdq to combine the low halves of two registers.
On most CPUs other than Intel Nehalem, floating-point shuffles on integer data are generally fine (little or no extra latency when used between vector-integer ALU instructions). On Nehalem, you might get two cycles of extra latency each way, into and out of the floating-point shuffle (4 extra cycles in total), but that's only a big problem for throughput if it's part of a loop-carried dependency chain. See Agner Fog's guides for more info.
Agner's Optimizing Assembly guide also has a whole section of tables of SSE/AVX instructions that are useful for various kinds of data movement within or between registers. See the sse tag wiki for a link, download the PDF, read section 13.7 "Permuting data" on page 130.
To use FP shuffles with intrinsics, you have to clutter up your code with _mm_castsi128_ps and _mm_castps_si128, which are reinterpret-casts that emit no instructions.
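As a concrete sketch of the above with intrinsics (the function names are mine; I use the _pd casts here because movsd is a double-precision move, but they are the same kind of zero-instruction reinterpret-cast as _mm_castsi128_ps):

```c
#include <emmintrin.h>  /* SSE2 */

/* punpcklqdq: the low qword of `a` stays in the low half, the low qword of
   `b` lands in the high half, i.e. the result is b_lo:a_lo (hi:lo). */
static inline __m128i concat_low_halves(__m128i a, __m128i b) {
    return _mm_unpacklo_epi64(a, b);
}

/* The register-register movsd trick: keep the high half of `hi_keep` and
   replace its low half with the low half of `lo_src`. */
static inline __m128i replace_low_half(__m128i hi_keep, __m128i lo_src) {
    return _mm_castpd_si128(
        _mm_move_sd(_mm_castsi128_pd(hi_keep), _mm_castsi128_pd(lo_src)));
}
```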

Is it possible to use one variable to represent numbers of unlimited length in programming? [duplicate]

I've been using C# for three years to make games and I've played with various simulations where numbers sometimes get big and Int32 is not enough to store the value. Eventually even Int64 became insufficient for my experiments; it took several such fields (actually an array of variable length) and a special property to handle such big numbers correctly. And so I wondered: is there a way to declare a numeric variable with unlimited (unknown beforehand) length so I can relax and let the computer do the math?
We can write any kind of number we like on paper without needing any special kind of paper. We can also type a lot of words in a text file without needing special file-system alterations to make it save and load correctly. Isn't there a variable for declaring a who-knows-how-long-it-will-be number in any programming language?
Starting with .NET 4, the .NET framework contains a BigInteger structure, which can handle integers of arbitrary size.
Since your question is language-agnostic, it might be worth mentioning that internally BigInteger stores the value in an array of unsigned integers; see the following SO question for details:
How does the BigInteger store values internally?
BigInteger is immutable, so there is no need to "resize" the array. Arithmetic operations create new instances of BigInteger, with appropriately sized arrays.
Most modern dynamic languages such as Perl6, Tcl8 and Ruby go one step further by allowing you to store arbitrarily large (up to available RAM) numbers in their number types.
Most of these languages don't have separate integer and floating-point types but rather a single "number" type that automatically gets converted to whatever it needs to be to be stored in RAM. Some, like Perl6, even include complex numbers in their "number" type.
How it's implemented at the machine level is that by default numbers are assumed to be integers - so int32 or int64. Numbers are converted to floats or doubles if the result of a calculation or assignment isn't an integer. If an integer grows too large, the interpreter/runtime environment silently converts it to a bigint object/struct (which is simply a big, growable array or linked list of ints).
How it appears to the programmer is that numbers have unlimited size (again, up to available RAM).
Still, there are gotchas with this system (kind of like the 0.1+0.2!=0.3 issue with floats) so you'd still need to be aware of the underlying implementation even if you can ignore it 99.99% of the time.
For example, if at any point your super-large number gets converted to a floating-point number (most likely a double in hardware) you'll lose precision, because that's just how floating-point numbers work. Sometimes you can do it accidentally: in some languages, for example, the power function (like pow() in C) returns a floating-point result, so raising an integer to the power of another integer may silently lose precision in the result if it's too large.
For the most part, it works. And I personally feel that this is the sane way of dealing with numbers. Lots of language designers have apparently come up with this solution independently.
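To make the pow() gotcha concrete outside of a language with a built-in big-number type, here is a small C sketch using the GMP library (GMP is my choice for illustration; any arbitrary-precision integer library behaves the same way). The exact power is preserved, while the double-precision result has already lost its low digits:

```c
#include <stdio.h>
#include <math.h>
#include <gmp.h>   /* GNU MP; link with -lgmp */

int main(void) {
    /* Exact: 3^40 computed with arbitrary-precision integers. */
    mpz_t exact;
    mpz_init(exact);
    mpz_ui_pow_ui(exact, 3, 40);
    gmp_printf("exact : %Zd\n", exact);

    /* Lossy: the same value through pow() -- 3^40 is an odd 64-bit number,
       so it cannot fit in the 53 significant bits of a double. */
    printf("double: %.0f\n", pow(3.0, 40.0));

    mpz_clear(exact);
    return 0;
}
```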
Is it possible to [...] represent numbers of unlimited length [...]?
No.
On existing computers it is not possible to represent unlimited numbers because the machines are finite. Even when using all existing storage it is not possible to store unlimited numbers.
It is possible, though, to store very large numbers. Wikipedia has information on the concept of arbitrary precision integers.
"Unlimited" - no, as Nikolai Ruhe soundly pointed out. "Unknown" - yes, qualified by the first point. :}
A BigInteger type is available in .NET 4.0 and in Java as others point out.
For .NET 2.0+, take a look at IntX.
More generally, languages (or at least a de facto standard library used with them) tend to have some support for arbitrarily long integers, which provides a means of dealing with the "unknown" length you describe.
A discussion on the Ubuntu forums somewhat addresses this question more generally and touches on specifics in more languages - some of which provide simpler means of leveraging arbitrarily large integers (e.g. Python and Common Lisp). Personally, the "relax and let the computer do the math" factor was highest for me in Common Lisp years ago: so it may pay to look around broadly for perspective as you seem inclined to do.

gfortran change/find out write buffer size

I have this molecular dynamics program that writes atom positions and velocities to a file every n steps of the simulation. The actual writing is taking something like 90% of the running time! (checked by eliminating the writes) So I desperately need to optimize that.
I see that some Fortran compilers have an extension to change the write buffer size (called I/O block size) and the "number of blocks" at the OPEN statement, but it appears that gfortran doesn't. Also, I read somewhere that gfortran uses an 8192-byte write buffer.
I even tried an FSTAT (right after opening; is that right?) to see what block size and number of blocks it is using, but it returns -1 for both. (compiling for 64-bit Windows)
Isn't there a way to enlarge the write buffer for a file in gfortran? Will it be different compiling for Linux than for Windows?
I'd really, really rather stay in Fortran, but as a desperate measure, isn't there a way to do so by adding some C routine?
thanks!
IanH's question is key. Unformatted I/O is MUCH faster than formatted; the conversion from base 2 to base 10 is very CPU intensive. If you don't need the values to be human readable, then use unformatted I/O. If you want to be able to read the values in another language, then use access='stream'.
Another approach would be to add your own buffering. Replace the write statement with a call to a subroutine. Have that subroutine store values and write only when it has received M values. You'll also need a "flush" call to the subroutine to make it write the last values if they are fewer than M (a C sketch of this appears below).
If gcc C is faster at IO, you could mix Fortran and C with Fortran's ISO_C_Binding: https://stackoverflow.com/questions/tagged/fortran-iso-c-binding. There are examples of the use of the ISO C Binding in the gfortran manual under "Mixed Language Programming".
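A sketch of the "buffer it yourself" idea on the C side of such a mixed-language setup (all names and the chunk size are made up; the Fortran code would bind to these routines with ISO_C_BINDING and call traj_flush once at the end of the run):

```c
#include <stdio.h>
#include <stddef.h>

#define BUF_CAP 65536                   /* flush every 64k doubles (~512 KiB) */

static double buf[BUF_CAP];
static size_t fill = 0;
static FILE  *out  = NULL;

void traj_open(const char *path) { out = fopen(path, "wb"); }

/* Append n values; write one large binary chunk whenever the buffer fills. */
void traj_write(const double *v, size_t n) {
    for (size_t i = 0; i < n; i++) {
        buf[fill++] = v[i];
        if (fill == BUF_CAP) {
            fwrite(buf, sizeof(double), fill, out);
            fill = 0;
        }
    }
}

/* Call once at the end of the run to push out the remaining values. */
void traj_flush(void) {
    if (fill) fwrite(buf, sizeof(double), fill, out);
    fill = 0;
    fflush(out);
}
```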
If you spend 90% of your runtime writing coords/vels every n timesteps, the obvious quick fix would be to write data less often, say every 100*n timesteps. But I'm sure you already thought of that yourself.
But yes, gfortran has a fixed 8k buffer, whose size cannot be changed except by modifying the libgfortran source and rebuilding it. The reason for the buffering is to amortize the syscall overhead; (simplistic) tests on Linux showed that 8k is sufficient and more than that goes far into diminishing returns territory. That being said, if you have some substantiated claims that bigger buffers are useful on some I/O patterns and/or OS, there's no reason why the buffer can't be made larger in a future release.
As for your performance issues: as already mentioned, unformatted is a lot faster than formatted I/O. Additionally, gfortran has rather high per-I/O-statement overhead. You can amortize that by writing arrays (or array sections) rather than individual elements (this matters mostly for unformatted I/O; for formatted I/O there is so much else to do that it doesn't help that much).
I am thinking that if the cost of I/O is comparable to or even larger than the effort of the simulation, then it probably isn't such a good idea to store all these data to disk in the first place. It is better to do whatever processing you intend to do directly during the simulation, instead of saving lots of intermediate data and then reading them in again later to do the processing.
Moreover, MD is an inherently highly parallelizable problem, and with IO you will severely cripple the efficiency of parallelization! I would avoid IO whenever possible.
For individual trajectories, normally you just need to store the initial condition of each trajectory, along with its key statistics, or important snapshots at a small number of time values. When you need one specific trajectory plotted, you can regenerate the exact same trajectory or section of trajectory from the initial condition or the closest snapshot, at a cost similar to reading it from disk.

What would be a good (de)compression routine for this scenario

I need a FAST decompression routine, optimized for a restricted-resource environment like an embedded system, for binary (hex) data that has the following characteristics:
Data is 8-bit (byte) oriented (the data bus is 8 bits wide).
Byte values do NOT range uniformly from 0 to 0xFF, but have a Poisson distribution (bell curve) in each DataSet.
The dataset is fixed in advance (to be burnt into Flash) and each set is rarely > 1-2 MB.
Compression can take as much time as required, but decompression of a byte should take 23 µs in the worst case with a minimal memory footprint, as it will be done on a restricted-resource environment like an embedded system (3 MHz-12 MHz core, 2 KB RAM).
What would be a good decompression routine?
Basic run-length encoding seems too wasteful - I can immediately see that adding a header section to the compressed data, to put unused byte values to use representing oft-repeated patterns, would give phenomenal performance!
And that's just me after a few minutes; surely there must already exist much better algorithms from people who love this stuff?
I would like to have some "ready to go" examples to try out on a PC so that I can compare the performance vis-a-vis a basic RLE.
The two solutions I use when performance is the only concern:
LZO - has a GPL license.
liblzf - has a BSD license.
miniLZO.tar.gz - this is LZO, just repacked into a 'minified' version that is better suited to embedded development.
Both are extremely fast when decompressing. I've found that LZO will create slightly smaller compressed data than liblzf in most cases. You'll need to do your own benchmarks for speeds, but I consider them to be "essentially equal". Both are light-years faster than zlib, though neither compresses as well (as you would expect).
LZO, in particular miniLZO, and liblzf are both excellent for embedded targets.
If you have a preset distribution of values, meaning the probability of each value is fixed over all datasets, you can create a Huffman encoding with fixed codes (the code tree does not have to be embedded in the data).
Depending on the data, I'd try Huffman with fixed codes or LZ77 (see Brian's links).
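A toy illustration of fixed-code Huffman decoding in C (the four-symbol code table and the bit order are invented for the example): because the codes are agreed on ahead of time from the known value distribution, the table lives in the decoder, not in the compressed data.

```c
#include <stdint.h>
#include <stddef.h>

/* Example prefix-free codes for 4 symbols: A=0, B=10, C=110, D=111,
   read MSB-first from the input bitstream. */
static const struct { uint8_t bits, len; char sym; } kCodes[] = {
    { 0x0, 1, 'A' }, { 0x2, 2, 'B' }, { 0x6, 3, 'C' }, { 0x7, 3, 'D' },
};

/* Decode n_syms symbols; the fixed code table ships with the firmware,
   so nothing has to be stored alongside the data. */
void huff_decode(const uint8_t *in, size_t n_syms, char *out)
{
    size_t bitpos = 0;
    for (size_t s = 0; s < n_syms; s++) {
        uint8_t acc = 0, len = 0;
        for (;;) {
            acc = (uint8_t)((acc << 1) |
                            ((in[bitpos >> 3] >> (7 - (bitpos & 7))) & 1u));
            bitpos++; len++;
            int matched = 0;
            for (size_t c = 0; c < sizeof kCodes / sizeof kCodes[0]; c++)
                if (kCodes[c].len == len && kCodes[c].bits == acc) {
                    out[s] = kCodes[c].sym;
                    matched = 1;
                    break;
                }
            if (matched) break;
        }
    }
}
```

A real decoder for byte-oriented data would use a table-driven canonical Huffman decode over 256 symbols, but the principle is the same.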
Well, the main two algorithms that come to mind are Huffman and LZ.
The first basically just creates a dictionary. If you restrict the dictionary's size sufficiently, it should be pretty fast... but don't expect very good compression.
The latter works by adding back-references to repeating portions of the output file. It would probably take very little memory to run, except that you would need to either use file I/O to read the back-references or store a chunk of the recently read data in RAM.
I suspect LZ is your best option, if the repeated sections tend to be close to one another. Huffman works by having a dictionary of often-repeated elements, as you mentioned.
Since this seems to be audio, I'd look at either differential PCM or ADPCM, or something similar, which will reduce it to 4 bits/sample without much loss in quality.
With the most basic differential PCM implementation, you just store a 4-bit signed difference between the current sample and an accumulator, add that difference to the accumulator, and move to the next sample. If the difference is outside [-8, 7], you have to clamp the value, and it may take several samples for the accumulator to catch up. Decoding is very fast and uses almost no memory: just add each value to the accumulator and output the accumulator as the next sample.
A small improvement over basic DPCM, to help the accumulator catch up faster when the signal gets louder and higher-pitched, is to use a lookup table to decode the 4-bit values to a larger non-linear range, where they're still 1 apart near zero but increase in larger increments toward the limits. And/or you could reserve one of the values to toggle a multiplier; deciding when to use it is up to the encoder. With these improvements, you can either achieve better quality or get away with 3 bits per sample instead of 4.
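A minimal C sketch of the plain (table-free) decoder described above, assuming two samples per byte, high nibble first, and signed 8-bit output samples:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode 4-bit DPCM: each nibble is a signed difference in [-8, 7] that is
   added to a running accumulator; the accumulator is the next output sample. */
void dpcm_decode(const uint8_t *in, size_t n_bytes, int8_t *out)
{
    int acc = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        int nib[2] = { in[i] >> 4, in[i] & 0x0F };
        for (int k = 0; k < 2; k++) {
            int d = nib[k] < 8 ? nib[k] : nib[k] - 16;  /* sign-extend 4 bits */
            acc += d;
            if (acc >  127) acc =  127;   /* defensive clamp; a correct */
            if (acc < -128) acc = -128;   /* encoder keeps acc in range */
            out[2 * i + k] = (int8_t)acc;
        }
    }
}
```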
If your device has a non-linear μ-law or A-law ADC, you can get quality comparable to 11-12 bit with 8 bit samples. Or you can probably do it yourself in your decoder. http://en.wikipedia.org/wiki/M-law_algorithm
There might be inexpensive chips out there that already do all this for you, depending on what you're making. I haven't looked into any.
You should try different compression algorithms with either a compression software tool with command line switches or a compression library where you can try out different algorithms.
Use typical data for your application.
Then you know which algorithm is best-fitting for your needs.
I have used zlib in embedded systems for a bootloader that decompresses the application image to RAM on start-up. The licence is nicely permissive, no GPL nonsense. It does make a single malloc call, but in my case I simply replaced this with a stub that returned a pointer to a static block, and a corresponding free() stub. I did this by monitoring its memory allocation usage to get the size right. If your system can support dynamic memory allocation, then it is much simpler.
http://www.zlib.net/
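A sketch of that static-block idea on the inflate side (decompress_to_ram and the 48 KB pool size are my own; as described above, size the pool by watching what inflateInit() actually requests for your windowBits):

```c
#include <string.h>
#include "zlib.h"

/* 8-byte-aligned static pool standing in for the heap. */
static long long pool[(48 * 1024) / sizeof(long long)];
static size_t pool_used = 0;                        /* bytes handed out */

static voidpf static_alloc(voidpf opaque, uInt items, uInt size)
{
    size_t need = ((size_t)items * size + 7u) & ~(size_t)7;  /* keep alignment */
    (void)opaque;
    if (pool_used + need > sizeof pool) return Z_NULL;
    voidpf p = (unsigned char *)pool + pool_used;
    pool_used += need;
    return p;
}

static void static_free(voidpf opaque, voidpf address)
{
    (void)opaque; (void)address;   /* whole pool is reclaimed by resetting pool_used */
}

/* Decompress a zlib stream from src into dst; returns 0 on success. */
int decompress_to_ram(unsigned char *src, uLong src_len,
                      unsigned char *dst, uLong dst_len)
{
    z_stream zs;
    memset(&zs, 0, sizeof zs);
    zs.zalloc    = static_alloc;
    zs.zfree     = static_free;
    zs.next_in   = src;
    zs.avail_in  = (uInt)src_len;
    zs.next_out  = dst;
    zs.avail_out = (uInt)dst_len;

    pool_used = 0;
    if (inflateInit(&zs) != Z_OK) return -1;
    int ret = inflate(&zs, Z_FINISH);
    inflateEnd(&zs);
    return ret == Z_STREAM_END ? 0 : -1;
}
```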

On-disk substring index

I have a file (fasta file to be specific) that I would like to index, so that I can quickly locate any substring within the file and then find the location within the original fasta file.
This would be easy to do in many cases using a Trie or substring array; unfortunately the strings I need to index are 800+ MB, which means that doing them in memory is unacceptable, so I'm looking for a reasonable way to create this index on disk, with minimal memory usage.
(edit for clarification)
I am only interested in the headers of proteins, so for the largest database I'm interested in, this is about 800 MBs of text.
I would like to be able to find an exact substring in O(N) time based on the input string. This must be usable on 32-bit machines, as it will be shipped to random people who are not expected to have 64-bit machines.
I want to be able to index against any word break within a line, to the end of the line (though lines can be several MBs long).
Hopefully this clarifies what is needed and why the current solutions given are not illuminating.
I should also add that this needs to be done from within Java, and must work on client computers on various operating systems, so I can't use any OS-specific solution, and it must be a programmatic solution.
In some languages programmers have access to "direct byte arrays" or "memory maps", which are provided by the OS. In java we have java.nio.MappedByteBuffer. This allows one to work with the data as if it were a byte array in memory, when in fact it is on the disk. The size of the file one can work with is only limited by the OS's virtual memory capabilities, and is typically ~<4GB for 32-bit computers. 64-bit? In theory 16 exabytes (17.2 billion GBs), but I think modern CPUs are limited to a 40-bit (1TB) or 48-bit (128TB) address space.
This would let you easily work with the one big file.
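For reference, the underlying OS facility the answer describes looks like this in C on POSIX systems (mmap; Windows uses CreateFileMapping/MapViewOfFile). Since the question requires Java, MappedByteBuffer is the way to get the same behaviour without leaving the JVM; this sketch only shows what happens underneath:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file; the OS pages it in on demand, so an 800 MB file
       never has to be read into a heap-allocated buffer. */
    const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* The mapping now behaves like an in-memory byte array. */
    if (st.st_size > 0)
        printf("first byte: %c, size: %lld bytes\n",
               data[0], (long long)st.st_size);

    munmap((void *)data, st.st_size);
    close(fd);
    return 0;
}
```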
The FASTA file format is very sparse. The first thing I would do is generate a compact binary format, and index that - it should be maybe 20-30% the size of your current file, and the process for coding/decoding the data should be fast enough (even with 4GB) that it won't be an issue.
At that point, your file should fit within memory, even on a 32 bit machine. Let the OS page it, or make a ramdisk if you want to be certain it's all in memory.
Keep in mind that memory is only around $30 a GB (and getting cheaper) so if you have a 64 bit OS then you can even deal with the complete file in memory without encoding it into a more compact format.
Good luck!
-Adam
I talked to a few co-workers and they just use VIM/Grep to search when they need to. Most of the time I wouldn't expect someone to search for a substring like this though.
But I don't see why MS Desktop search or spotlight or google's equivalent can't help you here.
My recommendation is splitting the file up, by gene or species; hopefully the input sequences aren't interleaved.
I don't imagine that the original poster still has this problem, but anyone needing FASTA file indexing and subsequence extraction should check out fastahack: http://github.com/ekg/fastahack
It uses an index file to count newlines and sequence start offsets. Once the index is generated you can rapidly extract subsequences; the extraction is driven by fseek64.
It will work very, very well in the case that your sequences are as long as the poster's. However, if you have many thousands or millions of sequences in your FASTA file (as is the case with the outputs from short-read sequencing or some de novo assemblies) you will want to use another solution, such as a disk-backed key-value store.
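For anyone rolling their own, the offset arithmetic such an index relies on is simple. Assuming a samtools-faidx-style record (the field names below are mine: file offset of the sequence's first base, bases per line, and bytes per line including the newline), the file position of a given base is:

```c
#include <stdint.h>

/* Hypothetical index record in the spirit of fastahack / samtools faidx. */
struct fasta_index_entry {
    int64_t seq_offset;   /* file offset of the first base of the sequence */
    int64_t line_bases;   /* bases per line in the original FASTA          */
    int64_t line_bytes;   /* bytes per line, i.e. line_bases + newline(s)  */
};

/* File offset of zero-based base position `pos`: whole lines are skipped
   at line_bytes each, plus the remainder within the current line. */
static int64_t fasta_base_offset(const struct fasta_index_entry *e, int64_t pos)
{
    return e->seq_offset + (pos / e->line_bases) * e->line_bytes
                         + (pos % e->line_bases);
}
```

Extracting a subsequence is then a 64-bit seek to fasta_base_offset(e, start) followed by a read that skips the newline bytes it encounters.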