Data encoding to create a handle to a population collection - serialization

A fictional Broadway production has 3 shows every Saturday. Tickets are valid for a particular show and an enumerated seat. The process of encoding the showtime and the serially enumerated seat number defines a unique ticket. Tickets are encoded with a barcode comprising this data in order to measure attendance.
Is the process of encoding the notion of a time and a seat an example of serialization? Of enumeration? If neither, what should it be called?

Serialization has, to me, two meanings:
1. Some operation is serialized, i.e. executed sequentially, in contrast to concurrently.
2. Some data structure is transformed into a format that can be stored, e.g. in a file.
Enumeration means to me:
3. a data type for variables that can accept a finite number of values.
Encoding the notion of a time and a seat is thus, for me, an example of 2., since one only has to store the date, the show, and the seat, which together are unique.
But it is also an example of 3., assuming the following:
An enum is encoded as 64 bits, i.e. the maximum is 2^64, i.e. roughly 2 * 10^19. The question is now how long the Earth will live.
One assumes that our sun will become a red giant in about 5 billion years and extinguish all life on Earth. Thus, 5 billion years = 5 * 10^9 years = 52 weeks/year * 5 * 10^9 years, which is about 260 * 10^9 Saturdays.
With 3 shows per Saturday, this is about 10^12 shows until the end of the Earth.
But the enum could encode about 2 * 10^19 shows, so no problem for encoding...
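A minimal sketch of what such an encoding could look like, packing the Saturday index, the show slot, and the seat number into a single 64-bit value (the field widths here are arbitrary assumptions, not something given in the question):

    #include <cstdint>

    // Pack a ticket into 64 bits: 40 bits for the Saturday index,
    // 2 bits for the show slot (0..2), 22 bits for the seat number.
    // 2^40 is about 10^12, more than the ~260 * 10^9 Saturdays estimated above.
    uint64_t encodeTicket(uint64_t saturdayIndex, uint64_t show, uint64_t seat) {
        return (saturdayIndex << 24) | (show << 22) | seat;
    }

    void decodeTicket(uint64_t ticket, uint64_t& saturdayIndex,
                      uint64_t& show, uint64_t& seat) {
        saturdayIndex = ticket >> 24;
        show = (ticket >> 22) & 0x3;
        seat = ticket & 0x3FFFFF;
    }

Decoding is just the reverse shift-and-mask, which is why a fixed-width layout like this round-trips losslessly.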

Related

How to calculate the drainage density of a basin using ArcGIS?

In theory classes, we all learnt about drainage density (DD), which is the ratio of the 'total length of the stream network in a basin (L)' to the 'basin area (A)'. However, when we try to determine the DD of some Indian river basins using ArcGIS, we run into some confusion. For instance:
We have to define some particular break value in ArcGIS to get streams of different orders. The lesser the break value, the higher the stream order, and therefore the higher the resulting value of L, and vice versa. For example, when we set the break value to 500 for a particular basin, we obtain stream orders up to 7, but if we increase the break value to 1500, the maximum stream order reduces to 5, and the L value automatically reduces with it. Thus, the same basin may yield two different DD values under the two aforesaid choices of break value.
We also tried the theoretically least possible break value, i.e. >0, and obtained an extremely dense stream network with a high DD value.
So, my question is: what should the threshold break value be for a particular basin in order to get the DD?
From the literature, we found that there are five classes of DD with the following value ranges (km/km^2): very coarse (<2), coarse (2-4), moderate (4-6), fine (6-8), and very fine (>8). However, for 20 river basins across different parts of India, we obtained DD values ranging between 1.03 and 1.29, which makes all those basins fall under the very coarse category. But from our visual inspection (one sample basin attached below), that seems too low to us.
We want some justification/clarification/comment on it.
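For reference, the DD computation and the five-class scheme quoted above are straightforward to express in code; a minimal sketch (the thresholds are exactly the ranges quoted, everything else is hypothetical):

    #include <string>

    // Drainage density: total stream length (km) over basin area (km^2).
    double drainageDensity(double totalStreamLengthKm, double basinAreaKm2) {
        return totalStreamLengthKm / basinAreaKm2;
    }

    // Classify a DD value using the five classes from the literature.
    std::string classifyDD(double dd) {
        if (dd < 2.0) return "very coarse";
        if (dd < 4.0) return "coarse";
        if (dd < 6.0) return "moderate";
        if (dd < 8.0) return "fine";
        return "very fine";
    }

Note that the classification does nothing to resolve the break-value question: L itself depends on the chosen threshold, which is exactly the ambiguity being asked about.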

Solitaire: storing guaranteed wins cheaply

Given a list of deals of Klondike Solitaire that are known to be winnable, is there a way to store a reasonable number of deals (say 10,000+) in a reasonable amount of space (say 5 MB) to retrieve on command? (These numbers are arbitrary.)
I thought of using a pseudo-random generator where a given seed would generate a decimal string of digits, where each two digits represent a card and the index represents the position in the deal. In this case, you would only have to store the seed and the PRG code.
The only cons I can think of are that A) the number of possible deals is 52!, so the number of possible seeds would be at least 52!, which would be monstrous to store at the higher end of the range, and B) the generated number can't repeat a two-digit pair (though repeats could simply be skipped during deck construction).
Given no prior information, the theoretical limit on how compactly you can represent an ordered deck of cards is 226 bits (log2(52!) ≈ 225.58). Even the simple, naive 6-bits-per-card encoding is only 312 bits, so you probably won't gain much by being clever.
If you're willing to sacrifice a large part of the state space, you could use a 32- or 64-bit PRNG to generate the decks, and then reproduce each deck from its 32- or 64-bit initial PRNG state. But that limits you to 2^64 different decks out of the 2^225+ possible.
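A minimal sketch of that idea, using a xorshift64 generator (an arbitrary choice of PRNG) and a Fisher-Yates shuffle, so that each stored 64-bit seed reproduces exactly one deck:

    #include <cstdint>

    // Simple xorshift64 PRNG; any reproducible 64-bit generator would do.
    uint64_t xorshift64(uint64_t& state) {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    // Rebuild a deck (cards 0..51) from a seed via Fisher-Yates.
    // Storing 10,000 deals then costs 10,000 * 8 bytes = 80 KB of seeds.
    void dealFromSeed(uint64_t seed, int deck[52]) {
        for (int i = 0; i < 52; ++i) deck[i] = i;
        uint64_t state = seed ? seed : 1;  // xorshift state must be nonzero
        for (int i = 51; i > 0; --i) {
            int j = (int)(xorshift64(state) % (uint64_t)(i + 1));
            int tmp = deck[i]; deck[i] = deck[j]; deck[j] = tmp;
        }
    }

The catch is the one noted above: you can only ever reach the 2^64 deals this generator can produce, so a known-winnable deal from elsewhere generally won't have a corresponding seed.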
If you are asking hypothetically, I would say that you would need 3,120,000 bits to store 10,000 possible deals. You need 6 bits to represent each card (assuming you number them 1-52), and then you need to store them in order, so 6 * 52 = 312 bits per deal. Take that and multiply it by the number of deals, 312 * 10,000, and you get 3,120,000 bits, which is 390,000 bytes, or about 0.39 MB.
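If you go that route, the 6-bits-per-card layout can be packed like this (a sketch; no error handling):

    #include <cstdint>

    // Pack 52 cards (values 0..51, 6 bits each) into 39 bytes (312 bits).
    void packDeck(const int deck[52], uint8_t out[39]) {
        for (int i = 0; i < 39; ++i) out[i] = 0;
        for (int i = 0; i < 52; ++i) {
            for (int b = 0; b < 6; ++b) {
                int bit = i * 6 + b;
                if (deck[i] & (1 << b))
                    out[bit / 8] |= (uint8_t)(1 << (bit % 8));
            }
        }
    }

At 39 bytes per deal, 10,000 deals fit in 390,000 bytes, comfortably under the 5 MB budget.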

Print a number in decimal

Well, it is a low-level question.
Suppose I store a number (of course, the computer stores the number in binary format).
How can I print it in decimal format? It is obvious in a high-level program: just print it, and the library does it for you.
But what about a very low-level situation where I don't have this library?
I can only tell what 'character' to output. How do I convert the number into decimal characters?
I hope you understand my question. Thank you.
There are two ways of printing decimals: one for CPUs with division/remainder instructions (modern CPUs are like that), and one for CPUs where division is relatively slow (the 8-bit CPUs of 20+ years ago).
The first method is simple: integer-divide the number by ten and store the sequence of remainders in an array. Once you have divided the number all the way down to zero, start printing the remainders from the back, adding the ASCII code of zero ('0') to each remainder.
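A sketch of this first method (assuming an unsigned 32-bit input):

    #include <cstdio>

    // Print an unsigned integer in decimal using divide/remainder.
    void printDecimal(unsigned int n) {
        char digits[10];  // a 32-bit value has at most 10 decimal digits
        int count = 0;
        do {
            digits[count++] = (char)('0' + n % 10);  // remainder -> ASCII
            n /= 10;
        } while (n != 0);
        while (count > 0)
            putchar(digits[--count]);  // remainders come out in reverse
    }

The do-while is what makes the value 0 print as "0" rather than nothing.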
The second method relies on a lookup table of powers of ten. You define an array of numbers like this:
    int pow10[] = {10000, 1000, 100, 10, 1};
Then you start with the largest power and see if you can subtract it from the number at hand. If you can, keep subtracting it, and keep count. Once you cannot subtract it without going negative, print the count plus the ASCII code of zero, and move on to the next smaller power of ten.
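A sketch of this second method, sized for values up to 99,999 to match the table above:

    #include <cstdio>

    // Print a number (0..99999) without division, by repeated subtraction.
    void printBySubtraction(int n) {
        static const int pow10[] = {10000, 1000, 100, 10, 1};
        int started = 0;  // used to suppress leading zeros
        for (int p = 0; p < 5; ++p) {
            int count = 0;
            while (n >= pow10[p]) {  // keep subtracting the current power
                n -= pow10[p];
                ++count;
            }
            if (count != 0 || started || p == 4) {
                putchar('0' + count);
                started = 1;
            }
        }
    }

The started flag suppresses leading zeros, and the p == 4 case ensures the value 0 still prints one digit.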
If it is an integer, divide by ten, getting both the quotient and the remainder. Repeat the process on the quotient until it is zero. The remainders give you the decimal digits from right to left. Add 48 to each for the ASCII representation.
Basically, you want to transform a number (stored in some arbitrary internal representation) into its decimal representation. You can do this with a few simple mathematical operations. Let's assume that we have a positive number, say 1234.
number mod 10 gives you a value between 0 and 9 (4 in our example), which you can map to a character¹. This is the rightmost digit.
Divide by 10, discarding the remainder (an operation commonly called "integer division"): 1234 → 123.
number mod 10 now yields 3, the second-to-rightmost digit.
continue until number is zero.
Footnotes:
¹ This can be done with a simple switch statement with 10 cases. Of course, if your character set has the characters 0..9 in consecutive order (like ASCII), '0' + number suffices.
It doesn't matter what the number system is: decimal, binary, octal. Say I have the decimal value 123 on a decimal computer; I would still need to convert that value to three characters to display it. Let's assume ASCII format. By looking at an ASCII table we know the answer we are looking for: 0x31, 0x32, 0x33.
If you divide 123 by 10 using integer math you get 12. Multiply 12 * 10 and you get 120; the difference is 3, your least significant digit. We go back to the 12 and divide that by 10, giving 1. 1 times 10 is 10, and 12 - 10 is 2, our next digit. We take the 1 that is left over, divide it by 10, and get zero, so we know we are done. The digits we found, in order, are 3, 2, 1. Reverse the order: 1, 2, 3. Add or OR 0x30 to each to convert them from integers to ASCII.
Change that to use a variable instead of 123, and use any numbering system you like, so long as it has enough digits to do this kind of work.
You can go the other way too: divide by 100...000, whatever the largest power of ten you can store or intend to handle, and work your way down. In this case the first non-zero digit comes with the divide by 100, giving 1; save the 1. 1 times 100 = 100, and 123 - 100 = 23. Now divide by 10: this gives 2; save the 2. 2 times 10 is 20, and 23 - 20 = 3. When you get to dividing by 1 you are done; save that value as your ones digit.
Here is another example: given a number of seconds to convert to, say, hours, minutes, and seconds, you can divide by 60, save the result as a, and subtract (a * 60) from the original number, giving the remainder, which is the seconds; save that. Now take a, divide it by 60, and save the result as b; this is your number of hours. Subtract (b * 60) from a; this remainder is the minutes; save that. Done: hours, minutes, seconds. You can then divide the hours by 24 to get days if you want, and then divide the days by 7 if you want weeks.
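That last example, as a sketch:

    #include <cstdio>

    // Split a duration in seconds into hours, minutes, and seconds,
    // using the divide-and-subtract steps described above.
    void splitTime(int totalSeconds) {
        int a = totalSeconds / 60;            // total minutes
        int seconds = totalSeconds - a * 60;  // remainder: seconds
        int b = a / 60;                       // hours
        int minutes = a - b * 60;             // remainder: minutes
        printf("%d h %d m %d s\n", b, minutes, seconds);
    }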
A comment about divide instructions was brought up. Divides are very expensive, and most processors do not have one. Expensive in the sense that a single-clock divide costs you gates and power; if you do the divide over many clocks, you might as well just do a software divide and save the gates. It's the same reason most processors don't have an FPU: gates and power (gates mean larger chips, more expensive chips, lower yield, etc.). It is not a case of modern vs. old, or 64-bit vs. 8-bit, or anything like that; it is an engineering and business trade-off. The 8088/86 has a divide with a remainder, for example (it also has a BCD add). The gates/size, if used for something else, might serve better than a single instruction. Multiply falls into the same category; it is not as bad, but it can be. If the operand sizes are not done right, you can make either instruction (family) not as useful to a programmer. Which brings up another point: I can't find the link right now, but there is a way to avoid divides when converting a number to a string of decimal digits: you can multiply by 0.1 using fixed point. I also can't find the quote about real programmers not needing floating point, related to keeping track of the decimal point yourself; it's the slide rule vs. calculator thing. I believe the link to the article on dividing by 10 using a multiply is somewhere on Stack Overflow.
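One well-known form of that fixed-point multiply trick (not necessarily the exact article referred to above): for any 32-bit unsigned n, n / 10 equals (n * 0xCCCCCCCD) >> 35 when computed in 64-bit arithmetic, because 0xCCCCCCCD is 2^35 / 10 rounded up.

    #include <cstdint>

    // Divide by 10 without a divide instruction, via fixed-point multiply.
    // 0xCCCCCCCD = ceil(2^35 / 10); exact for all 32-bit unsigned n.
    uint32_t divideBy10(uint32_t n) {
        return (uint32_t)(((uint64_t)n * 0xCCCCCCCDull) >> 35);
    }

Combined with a multiply-and-subtract to recover the remainder, this replaces every division in the digit loop.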

Class Design (UML Class Diagram)

Could somebody please give their input on the following scenario?
I'm creating a math quiz system. I need to generate several math problems and show them on the screen. There are many kinds of them, like:
Times tables: 9 X 9 = ____
Addition, subtraction, multiplication and division of integers: 3901 + 22 = ____
Comparing Integers (<, >, =): 37 ____ -24
Convert Decimal to fraction: 0.75 = ____ (fraction)
Convert fraction to decimal: 3/4 (fraction) = ____ (decimal)
As you can see above, it will generate many problems, and there are many types of them. Since the questions will be different for each student (the random seed is always the same for a given student, so the same test can always be regenerated), I need to store the student's answers, but I don't know what type I can store the data in, because some answers are doubles, others are integers, for fractions I need to store two integers, and for a comparison it is a char.
And none of the problems may be repeated.
The student is going to answer the exam question by question, will have X time to answer each problem, and the problems are organized by subject (see the scanned image below).
The problem is how to model this: the problem types are all very different, and I don't know whether they must go in a collection class. Sorry, I'm a little lost.
First of all, you should create a domain model. All the required data is presented in your question.
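As an illustration of what that domain model might look like (all names here are hypothetical, not a prescribed design): a common Problem base class with one subclass per question type, each owning its own answer representation, sidesteps the "what type do I store?" issue.

    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical sketch: each problem type knows how to render itself
    // and how to check an answer given as text.
    class Problem {
    public:
        virtual ~Problem() {}
        virtual std::string text() const = 0;
        virtual bool check(const std::string& answer) const = 0;
    };

    class AdditionProblem : public Problem {
        int a, b;
    public:
        AdditionProblem(int a, int b) : a(a), b(b) {}
        std::string text() const override {
            return std::to_string(a) + " + " + std::to_string(b) + " = ____";
        }
        bool check(const std::string& answer) const override {
            return std::stoi(answer) == a + b;  // no input validation here
        }
    };

    class ComparisonProblem : public Problem {
        int a, b;
    public:
        ComparisonProblem(int a, int b) : a(a), b(b) {}
        std::string text() const override {
            return std::to_string(a) + " ____ " + std::to_string(b);
        }
        bool check(const std::string& answer) const override {
            char expected = (a < b) ? '<' : (a > b) ? '>' : '=';
            return !answer.empty() && answer[0] == expected;
        }
    };

    // A quiz is then just a collection of problems, grouped by subject.
    using Quiz = std::vector<std::unique_ptr<Problem>>;

Because each subclass validates its own answer from a plain string, the quiz code never needs to know whether the underlying answer is a double, a pair of integers, or a char.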

How can one compute the optimal parameters to a start-step-stop coding scheme?

A start-step-stop code is a data compression technique used to compress numbers that are relatively small.
The code works as follows: it has three parameters, start, step and stop. Start determines the number of bits used to encode the first few numbers. Step determines how many bits to add to the encoding when we run out, and stop determines the maximum number of bits used to encode a number.
So the length of an encoding is given by l = start + step * i.
The "i" value of a particular code is encoded in unary, that is, as a number of 1 bits followed by a terminating 0 bit. If we have reached stop, we can drop the terminating 0 bit. If i is zero, we only write out the 0 bit.
So a (1, 2, 5) start-step-stop code would work as follows:
Value 0, encoded as: 0 0
Value 1, encoded as: 0 1
Value 2, encoded as: 10 000
Value 9, encoded as: 10 111
Value 10, encoded as: 11 00000
Value 41, encoded as: 11 11111
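A sketch of an encoder for the scheme as described (returning the bits as a string of '0'/'1' characters for clarity; it assumes the value is representable within the stop group):

    #include <string>

    // Encode a non-negative value with a (start, step, stop) code.
    std::string encodeSSS(unsigned value, int start, int step, int stop) {
        int i = 0;
        int len = start;
        unsigned first = 0;  // smallest value in the current group
        while (value - first >= (1u << len)) {
            first += 1u << len;
            ++i;
            len += step;
        }
        std::string out(i, '1');          // unary prefix: i one-bits
        if (len < stop) out += '0';       // terminator, dropped at stop
        unsigned offset = value - first;  // position within the group
        for (int b = len - 1; b >= 0; --b)
            out += ((offset >> b) & 1) ? '1' : '0';
        return out;
    }

With (start, step, stop) = (1, 2, 5), this reproduces the table above, e.g. encodeSSS(9, 1, 2, 5) yields "10111".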
So, given a file containing several numbers, how can we compute the optimal start-step-stop code for that file? The optimal parameters are defined as those that will result in the greatest compression ratio.
These "start-step-stop" codes look like a different way of describing Huffman codes. See the basic technique for an outline of the pseudo-code for calculating them.
Essentially this is what the algorithm does:
Before you start the Huffman encoding, you need to gather the statistics of each symbol you'll be compressing (their total frequency in the file to compress).
After you have that, you create a binary tree using that information, such that the most frequently used symbols are at the top of the tree (and thus use fewer bits) and such that no encoding is a prefix of another encoding, since a shared prefix would make decompression ambiguous.
At the end of the Huffman encoding, your start value will be the depth of the shallowest leaf node, your step will always be 1 (logically this makes sense: why would you force more bits than you need? Just add one at a time), and your stop value will be the depth of the deepest leaf node.
If the frequency stats aren't sorted, this takes O(n log n); if they are sorted by frequency, it can be done in O(n).
Huffman codes are guaranteed to have the best average compression for this type of encoding:
"Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code."
This should help you implement the ideal solution to your problem.
Edit: Though similar, this isn't what the OP was looking for.
This academic paper by the creator of these codes describes a generalization of start-step-stop codes, called start-stop codes. However, the author briefly describes how to get the optimal start-step-stop parameters near the end of section 2. It involves either using a statistical random variable or brute-forcing the best combination. Without any prior knowledge of the file, the algorithm is O((log n)^3).
Hope this helps.
The approach I used was a simple brute-force solution. The algorithm followed these basic steps:
1. Count the frequency of each number in the file. In the same pass, compute the total count of numbers in the file and determine the greatest number, maxNumber.
2. Compute the probability of each number as its frequency divided by the total count of numbers in the file.
3. Determine "optimalStop" as log2(maxNumber), rounded up. This is the ideal number of bits needed to represent maxNumber per Shannon information theory, and therefore a reasonable estimate of the optimal maximum number of bits used in the encoding of a particular number.
4. For every "start" value from 1 to "optimalStop", repeat steps 5-7:
5. For every "step" value from 1 to ("optimalStop" - "start") / 2, repeat steps 6 and 7:
6. Calculate the "stop" value closest to "optimalStop" that satisfies stop = start + step * i for some integer i.
7. Compute the average number of bits that would be used by this encoding. This can be calculated as the sum, over all numbers, of each number's probability multiplied by its bit length in the given encoding.
8. Pick the encoding with the lowest average number of bits.
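A sketch of that brute-force search (codeLength mirrors the scheme described in the question, including the terminating 0 bit and its removal at stop; all names are made up for illustration):

    #include <cmath>
    #include <map>

    // Bit length of one value under a (start, step, stop) code,
    // or -1 if the value does not fit within the stop group.
    static int codeLength(unsigned value, int start, int step, int stop) {
        int i = 0, len = start;
        unsigned first = 0;
        while ((unsigned long long)(value - first) >= (1ull << len)) {
            first += (unsigned)(1u << len);
            ++i;
            len += step;
            if (len > stop) return -1;
        }
        return i + (len < stop ? 1 : 0) + len;  // unary + terminator + payload
    }

    // Steps 3-8 above: try every (start, step), derive stop, keep the best.
    void findBestParameters(const std::map<unsigned, long>& freq, unsigned maxNumber,
                            int& bestStart, int& bestStep, int& bestStop) {
        int optimalStop = (int)std::ceil(std::log2((double)maxNumber + 1));
        long total = 0;
        for (const auto& kv : freq) total += kv.second;
        double bestAvg = 1e300;
        for (int start = 1; start <= optimalStop; ++start) {
            int stepMax = (optimalStop - start) / 2;
            if (stepMax < 1) stepMax = 1;
            for (int step = 1; step <= stepMax; ++step) {
                int stop = start;                         // step 6: smallest stop
                while (stop < optimalStop) stop += step;  // >= optimalStop
                double bits = 0;
                bool ok = true;
                for (const auto& kv : freq) {             // step 7: average bits
                    int len = codeLength(kv.first, start, step, stop);
                    if (len < 0) { ok = false; break; }
                    bits += (double)kv.second * len;
                }
                if (ok && bits / total < bestAvg) {       // step 8: keep the best
                    bestAvg = bits / total;
                    bestStart = start; bestStep = step; bestStop = stop;
                }
            }
        }
    }

Weighting each length by raw frequency and dividing by the total once is equivalent to summing probability times length, and avoids recomputing the probabilities.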