NSNumber how to get smallest common denominator? Like 3/8 for 0.375? - objective-c

Say I have an NSNumber that is somewhere between 0 and 1, and it can be represented as X/Y. How do I calculate X and Y in this case? I don't want to compare:
if (number.doubleValue == 0.125)
{
    X = 1;
    Y = 8;
}
so I get 1/8 for 0.125

That's relatively straightforward. For example, 0.375 is equivalent to 0.375/1.
The first step is to multiply numerator and denominator by 10 until the numerator is an integral value (a), giving you 375/1000.
Then find the greatest common divisor and divide both numerator and denominator by that.
A (recursive) function for GCD is:
int gcd (int a, int b) {
    return (b == 0) ? a : gcd (b, a % b);
}
If you call that with 375 and 1000, it will spit out 125 so that, when you divide the numerator and denominator by that, you get 3/8.
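Putting those steps together, here is a minimal C sketch. The 1e-9 epsilon and the 10^9 denominator cap are arbitrary choices standing in for the "close-enough" strategy discussed in the footnote below:

#include <math.h>
#include <stdio.h>

static long long gcd(long long a, long long b) {
    return (b == 0) ? a : gcd(b, a % b);
}

int main(void) {
    double value = 0.375;
    long long den = 1;

    /* Scale by 10 until the numerator is (close enough to) an integer. */
    while (fabs(value * den - (double)llround(value * den)) > 1e-9
           && den < 1000000000LL)
        den *= 10;

    long long num = llround(value * den);
    long long g = gcd(num, den);
    printf("%lld/%lld\n", num / g, den / g);   /* prints 3/8 */
    return 0;
}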
(a) As pointed out in the comments, there may be problems with numbers that have more precision bits than your integer types (such as IEEE754 doubles with 32-bit integers). You can solve this by choosing integers with a larger range (longs, or a bignum library like MPIR) or choosing a "close-enough" strategy (consider it an integer when the fractional part is relatively insignificant compared to the integral part).
Another issue is the fact that some numbers don't even exist in IEEE754, such as the infamous 0.1 and 0.3.
Unless a number can be represented as the sum of 2^-n values, where n is limited by the available precision (such as 0.375 being 1/4 + 1/8), the best you can hope for is an approximation.
As an example, consider 1/3 in single precision (you'll see why below; I'm too lazy to do the whole 64 bits). As a single-precision value, it is stored as:
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
0 01111101 01010101010101010101010
In this example, the sign is 0 hence it's a positive number.
The exponent bits give 125 which, when you subtract the 127 bias, gives you -2. Hence the multiplier will be 2^-2, or 0.25.
The mantissa bits are a little trickier. They form the sum of an implicit 1 along with all the 2^-n values for the 1 bits, where n runs from 1 through 23 (left to right). So the mantissa is calculated thus:
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
0 01111101 01010101010101010101010
            | | | | | | | | | | |
            | | | | | | | | | | +-- 0.0000002384185791015625
            | | | | | | | | | +---- 0.00000095367431640625
            | | | | | | | | +------ 0.000003814697265625
            | | | | | | | +-------- 0.0000152587890625
            | | | | | | +---------- 0.00006103515625
            | | | | | +------------ 0.000244140625
            | | | | +-------------- 0.0009765625
            | | | +---------------- 0.00390625
            | | +------------------ 0.015625
            | +-------------------- 0.0625
            +---------------------- 0.25
                                    1 (implicit)
                                    ========================
                                    1.3333332538604736328125
When you multiply that by 0.25 (see exponent earlier), you get:
0.333333313465118408203125
Now that's why they say you only get about 7 decimal digits of precision (15 for IEEE754 double precision).
Were you to pass that actual number through my algorithm above, you would not get 1/3, you would instead get:
 5,592,405
----------   (or 0.333333313465118408203125)
16,777,216
But that's not a problem with the algorithm per se, more a limitation of the numbers you can represent.
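If you want to verify the decoding yourself, here is a short C sketch that unpacks the exact bit pattern shown above:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* The bit pattern from the walkthrough:
       0 01111101 01010101010101010101010 */
    uint32_t bits = 0x3EAAAAAA;

    uint32_t sign     = bits >> 31;                       /* 0 -> positive  */
    int      exponent = (int)((bits >> 23) & 0xFF) - 127; /* 125 - 127 = -2 */
    uint32_t mantissa = bits & 0x7FFFFF;                  /* 0x2AAAAA       */

    float f;
    memcpy(&f, &bits, sizeof f);

    printf("sign=%u exponent=%d mantissa=0x%06X\n", sign, exponent, mantissa);
    printf("value: %.24f\n", f);   /* 0.333333313465118408203125 */
    return 0;
}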
Thanks to Wolfram Alpha for helping out with the calculations. If you ever need to do any math that stresses out your calculator, that's one of the best tools for the job.
As an aside, you'll no doubt notice the mantissa bits follow a certain pattern: 0101010101.... This is because 1/3 is an infinitely recurring binary value as well as an infinitely recurring decimal one. You would need an infinite number of 01 bits at the end to represent 1/3 exactly.

You can try this:
- (CGPoint)yourXAndYValuesWithANumber:(NSNumber *)number
{
    float x = 1.0f;
    float y = x / number.doubleValue;
    for (int i = 1; TRUE; i++)
    {
        // Alternatively floor(y * i) instead of (float)(int)(y * i)
        if ((float)(int)(y * i) == y * i)
        {
            x *= i;
            y *= i;
            break;
        }
    }
    /* Also alternatively:
    int coefficient = 1;
    while (floor(y * coefficient) != y * coefficient) coefficient++;
    x *= coefficient, y *= coefficient; */
    return CGPointMake(x, y);
}
This will not work if you have invalid input: X and Y will have to exist and be valid natural numbers (1 to infinity). A good example that will break it is 1/pi. If you have known limits on the input, you can work out checks to enforce them.

The approach outlined by paxdiablo is spot-on.
I just wanted to provide an efficient GCD function (implemented iteratively):
int gcd (int a, int b) {
    int c;
    while (a != 0) {
        c = a;
        a = b % a;
        b = c;
    }
    return b;
}
Source.

Related

How to select half precision (BFLOAT16 vs FLOAT16) for your trained model?

How do you decide which precision works best for your inference model? Both BF16 and FP16 take two bytes, but they use a different number of bits for the fraction and the exponent.
The ranges will be different, but I am trying to understand why one would choose one over the other.
Thank you
|--------+------+----------+----------|
| Format | Bits | Exponent | Fraction |
|--------+------+----------+----------|
| FP32 | 32 | 8 | 23 |
| FP16 | 16 | 5 | 10 |
| BF16 | 16 | 8 | 7 |
|--------+------+----------+----------|
Range
bfloat16: ~1.18e-38 … ~3.40e38, with 3 significant decimal digits.
float16: ~5.96e-8 (smallest subnormal; smallest normal is ~6.10e-5) … 65504, with 4 significant decimal digits of precision.
bfloat16 is generally easier to use, because it works as a drop-in replacement for float32. If your code doesn't create nan/inf numbers or turn a non-0 into a 0 with float32, then it shouldn't do it with bfloat16 either, roughly speaking. So, if your hardware supports it, I'd pick that.
Check out AMP (automatic mixed precision) if you choose float16.
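For intuition: bfloat16 is essentially float32 with the low 16 fraction bits dropped (same sign and 8-bit exponent, only 7 fraction bits), which you can emulate in C. A rough sketch that truncates where real converters round:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* bfloat16 = the top 16 bits of a float32. */
static uint16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void) {
    float big = 1e20f;   /* would overflow to inf in float16 (max 65504) */
    printf("%g -> %g in bfloat16 (still finite, ~3 digits kept)\n",
           (double)big, (double)bf16_to_float(float_to_bf16(big)));
    return 0;
}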

Why does two's-complement multiplication need to do sign extension?

In the book Computer Systems A Programmer's Perspective (2.3.5), the method to calculate two's-complement multiplication is described as follows:
Signed multiplication in C generally is performed by truncating the 2w-bit product to w bits.
Truncating a two’s-complement number to w bits is equivalent to first computing its value modulo 2^w and then converting from unsigned to two’s-complement.
Thus, for similar bit-level operands, why is unsigned multiplication different from two’s-complement multiplication? Why does two's-complement multiplication need to do sign extension?
To calculate the same bit-level representation for unsigned and two’s-complement addition, we can convert the two’s-complement arguments to unsigned, perform unsigned addition, and finally convert back to two’s-complement.
Since multiplication consists of multiple additions, why are the full representations of unsigned and two’s-complement multiplication different?
Figure 2.27 demonstrates the example below:
+------------------+----------+---------+-------------+-----------------+
| Mode             | x        | y       | x · y       | Truncated x · y |
+------------------+----------+---------+-------------+-----------------+
| Unsigned         | 5 [101]  | 3 [011] | 15 [001111] | 7 [111]         |
| Two's complement | -3 [101] | 3 [011] | -9 [110111] | -1 [111]        |
+------------------+----------+---------+-------------+-----------------+
If you multiply 101 by 011, you will get 1111 (which is equal to 001111). How did they get 110111 for two's complement case then?
The catch here is that to get a correct 6-bit two's-complement product you need to multiply 6-bit two's-complement numbers. Thus, you first need to convert -3 and 3 to their 6-bit two's-complement representations: -3 = 111101, 3 = 000011, and only then multiply them: 111101 * 000011 = 10110111. You also need to truncate the result to 6 bits to eventually get the 110111 from the table above.
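In C the sign extension happens automatically when the 3-bit values are stored in wider ints, so the same effect is easy to reproduce. A small sketch of that Figure 2.27 row (my own demo, not from the book):

#include <stdio.h>

int main(void) {
    unsigned ux = 5, uy = 3;   /* bit patterns [101] and [011], unsigned */
    int      sx = -3, sy = 3;  /* the same patterns read as 3-bit signed,
                                  already sign-extended by the int type  */

    unsigned uprod = (ux * uy) & 0x3F;             /* 001111 = 15        */
    unsigned sprod = (unsigned)(sx * sy) & 0x3F;   /* -9 mod 64 = 110111 */

    printf("unsigned:         %u * %u -> %u (0b001111)\n", ux, uy, uprod);
    printf("two's complement: %d * %d -> %u (0b110111, i.e. -9)\n", sx, sy, sprod);
    return 0;
}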

Custom Rolling Computation

Assume I have a model that has A(t) and B(t) governed by the following equations:
A(t) = {
    WHEN B(t-1) < 10  : B(t-1)
    WHEN B(t-1) >= 10 : B(t-1) / 6
}
B(t) = A(t) * 2
The following table is provided as input.
SELECT * FROM model ORDER BY t;
| t | A | B |
|---|------|------|
| 0 | 0 | 9 |
| 1 | null | null |
| 2 | null | null |
| 3 | null | null |
| 4 | null | null |
I.e. we know the values of A(t=0) and B(t=0).
For each row, we want to calculate the value of A & B using the equations above.
The final table should be:
| t | A | B |
|---|---|----|
| 0 | 0 | 9 |
| 1 | 9 | 18 |
| 2 | 3 | 6 |
| 3 | 6 | 12 |
| 4 | 2 | 4 |
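(For reference, the recurrence itself is trivial in procedural code; this quick C sketch reproduces the table above:)

#include <stdio.h>

int main(void) {
    double A = 0, B = 9;                /* known values at t = 0 */
    printf("t=0 A=%g B=%g\n", A, B);
    for (int t = 1; t <= 4; t++) {
        A = (B < 10) ? B : B / 6;       /* A(t) from B(t-1)      */
        B = A * 2;                      /* B(t) = A(t) * 2       */
        printf("t=%d A=%g B=%g\n", t, A, B);
    }
    return 0;
}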
We've tried using LAG, but because of the model's recursive nature, we end up only getting A and B at t=1:
CREATE TEMPORARY FUNCTION A_fn(b_prev FLOAT64) AS (
  CASE
    WHEN b_prev < 10 THEN b_prev
    ELSE b_prev / 6.0
  END
);
SELECT
  t,
  CASE WHEN t = 0 THEN A ELSE A_fn(LAG(B) OVER (ORDER BY t)) END AS A,
  CASE WHEN t = 0 THEN B ELSE A_fn(LAG(B) OVER (ORDER BY t)) * 2 END AS B
FROM model
ORDER BY t;
Produces:
| t | A | B |
|---|------|------|
| 0 | 0 | 9 |
| 1 | 9 | 18 |
| 2 | null | null |
| 3 | null | null |
| 4 | null | null |
Each row is dependent on the row above it. It seems it should be possible to compute a single row at a time, while iterating through the rows? Or does BigQuery not support this type of windowing?
If it is not possible, what do you recommend?
Round #1 - starting point
Below is for BigQuery Standard SQL and works (for me) with up to 3M rows
#standardSQL
CREATE TEMP FUNCTION x(v FLOAT64, t INT64)
RETURNS ARRAY<STRUCT<t INT64, v FLOAT64>>
LANGUAGE js AS """
  var i, result = [];
  for (i = 1; i <= t; i++) {
    if (v < 10) {v = 2 * v} else {v = v / 3};
    result.push({t:i, v});
  };
  return result
""";
SELECT 0 AS t, 0 AS A, 9 AS B UNION ALL
SELECT line.t, line.v / 2, line.v FROM UNNEST(x(9, 3000000)) line
Going above 3M rows produces "Resources exceeded during query execution: UDF out of memory".
To overcome this, I think you should just implement it on the client side, so no JS UDF limits apply. I think that is a reasonable workaround because it looks like you have no real data in BQ anyway, just one starting value (9 in this example). But even if you do have other valuable columns in the table, you can JOIN the produced result back to the table ON the t value - so it should be OK!
Round #2 - It could be billions ... - so let's take care of scale, parallelization
Below is a little trick to avoid the JS UDF resource and/or memory errors.
So, I was able to run it for 2B rows in one shot!
#standardSQL
CREATE TEMP FUNCTION anchor(seed FLOAT64, len INT64, batch INT64)
RETURNS ARRAY<STRUCT<t INT64, v FLOAT64>> LANGUAGE js AS """
  var i, result = [], v = seed;
  for (i = 0; i <= len; i++) {
    if (v < 10) {v = 2 * v} else {v = v / 3};
    if (i % batch == 0) {result.push({t:i + 1, v})};
  };
  return result
""";
CREATE TEMP FUNCTION x(value FLOAT64, start INT64, len INT64)
RETURNS ARRAY<STRUCT<t INT64, v FLOAT64>>
LANGUAGE js AS """
  var i, result = [];
  result.push({t:0, v:value});
  for (i = 1; i < len; i++) {
    if (value < 10) {value = 2 * value} else {value = value / 3};
    result.push({t:i, v:value});
  };
  return result
""";
CREATE OR REPLACE TABLE `project.dataset.result` AS
WITH settings AS (SELECT 9 init, 2000000000 len, 1000 batch),
anchors AS (SELECT line.* FROM settings, UNNEST(anchor(init, len, batch)) line)
SELECT 0 AS t, 0 AS A, init AS B FROM settings UNION ALL
SELECT a.t + line.t, line.v / 2, line.v
FROM settings, anchors a, UNNEST(x(v, t, batch)) line
In the above query, you control the initial values in the line below:
WITH settings AS (SELECT 9 init, 2000000000 len, 1000 batch),
In the above example, 9 is the initial value, 2,000,000,000 is the number of rows to be calculated, and 1000 is the batch size to process with. The batch size is the important one for keeping the BQ engine from throwing resource and/or memory errors; you cannot make it too big or too small. I feel I got some sense of what it needs to be, but not enough to try to formulate it.
Some stats (settings - execution time):
1M: SELECT 9 init, 1000000 len, 1000 batch - 0 min 9 sec
10M: SELECT 9 init, 10000000 len, 1000 batch - 0 min 50 sec
100M: SELECT 9 init, 100000000 len, 600 batch - 3 min 4 sec
100M: SELECT 9 init, 100000000 len, 40 batch - 2 min 56 sec
1B: SELECT 9 init, 1000000000 len, 10000 batch - 29 min 39 sec
1B: SELECT 9 init, 1000000000 len, 1000 batch - 27 min 50 sec
2B: SELECT 9 init, 2000000000 len, 1000 batch - 48 min 27 sec
Round #3 - some thoughts and comments
Obviously, as I mentioned in #1 above, this type of calculation is better suited to being implemented on a client of your choice, so it is hard for me to judge the practical value of the above - but I really had fun playing with it! In reality I had a few more cool ideas in mind, and implemented and played with them as well, but the above (in #2) was the most practical/scalable one.
Note: the most interesting part of the above solution is the anchors table. It is very cheap to generate and lets you set anchors at batch-size intervals, so with it you can, for example, calculate the value of row 2,000,035 or 1,123,456,789 without actually processing all the previous rows - and this takes a fraction of a second. Or you can parallelize the calculation of all rows by starting several threads/calculations from the respective anchors, etc. Quite a number of opportunities.
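To illustrate the anchor idea outside SQL, here is a hypothetical C sketch under the same recurrence: precompute every batch-th value of B, then roll forward from the nearest anchor for random access (the BATCH size and target row are arbitrary):

#include <stdio.h>

/* One step of the recurrence on B alone:
   B(t) = 2*B(t-1) if B(t-1) < 10, else B(t-1)/3; and A(t) = B(t)/2. */
static double step(double v) { return (v < 10) ? 2 * v : v / 3; }

int main(void) {
    enum { BATCH = 1000, ANCHORS = 2001 };
    static double anchor[ANCHORS];

    /* Build the anchors: remember B at every BATCH-th row. */
    double v = 9;                       /* seed: B(0) */
    anchor[0] = v;
    for (long t = 1; t < (long)ANCHORS * BATCH; t++) {
        v = step(v);
        if (t % BATCH == 0) anchor[t / BATCH] = v;
    }

    /* Random access: B(1234567) without replaying all prior rows. */
    long target = 1234567;
    double b = anchor[target / BATCH];
    for (long t = (target / BATCH) * BATCH; t < target; t++) b = step(b);
    printf("t=%ld A=%g B=%g\n", target, b / 2, b);
    return 0;
}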
Finally, it really depends on your specific use case which way to go from here, so I am leaving that up to you.
It seems it should be possible to compute a single row at a time, while iterating through the rows
Support for Scripting and Stored Procedures is now in beta (as of October 2019)
You can submit multiple statements separated with semi-colons and BigQuery is able to run them now.
So, conceptually, your process could look like the script below:
DECLARE b_prev FLOAT64 DEFAULT NULL;
DECLARE t INT64 DEFAULT 0;
DECLARE arr ARRAY<STRUCT<t INT64, a FLOAT64, b FLOAT64>> DEFAULT [STRUCT(0, 0.0, 9.0)];
SET b_prev = 9.0 / 2;
LOOP
  SET (t, b_prev) = (t + 1, 2 * b_prev);
  IF t >= 100 THEN LEAVE;
  ELSE
    SET b_prev = CASE WHEN b_prev < 10 THEN b_prev ELSE b_prev / 6.0 END;
    SET arr = (SELECT ARRAY_CONCAT(arr, [(t, b_prev, 2 * b_prev)]));
  END IF;
END LOOP;
SELECT * FROM UNNEST(arr);
Even though the above script is simpler, represents the logic more directly for non-technical personnel, and is easier to manage, it does not fit scenarios where you need to loop through 100 or more iterations. For example, the above script took close to 2 minutes, while my original solution took just 2 seconds for the same 100 rows.
But it is still great for simple/smaller cases.

Clarifying what is meant by “complete path coverage”

In class, we were given this static method which we are asked to test. The method is supposed to (but won’t always) return the same integer value that was given as input.
static int identity(int x) {
    if (20 <= x && x <= 30) {
        x /= 2;
    }
    if (5 <= x && x <= 15) {
        x *= 2;
    }
    return x;
}
The question asks us to create the minimum set of tests that has “complete path coverage”. Since there are two conditional statements, you would expect to generate 2^n tests, which is 4 in this instance. However, it is impossible to create a test where the first condition is true and the second condition is false. Does this mean that the minimum number of tests that has “complete path coverage” is 3?
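For what it's worth, here is one such minimal three-test set, one per feasible path, as a quick runnable sketch (the method compiles unchanged as C):

#include <assert.h>

static int identity(int x) {
    if (20 <= x && x <= 30) { x /= 2; }
    if (5 <= x && x <= 15) { x *= 2; }
    return x;
}

int main(void) {
    assert(identity(24) == 24);  /* (T,T): 24 -> 12 -> 24 */
    assert(identity(10) == 20);  /* (F,T): 10 -> 20       */
    assert(identity(40) == 40);  /* (F,F): unchanged      */
    /* (T,F) is infeasible: halving any x in [20,30] lands in
       [10,15], which always triggers the second condition. */
    return 0;
}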
From the POV of a tester, I would look at the edges of your ranges to determine your test coverage. You don't want to just make sure that the if statements are executed - you want to check the boundaries of the ranges as well as one input within the range.
Given this, I would test the following inputs and expect the following outputs:
| input | output |
|-------|--------|
|    19 |     19 |  (just outside first boundary minimum)
|    20 |     20 |  (just inside first boundary minimum)
|    24 |     24 |  (value within the range)
|    30 |     30 |  (just inside first boundary maximum)
|    31 |     31 |  (just outside first boundary maximum)
|     4 |      4 |  (just outside second boundary minimum)
|     5 |     10 |  (just inside second boundary minimum)
|    10 |     20 |  (value within the range)
|    15 |     30 |  (just inside second boundary maximum)
|    16 |     16 |  (just outside second boundary maximum)
If you want to read more, google boundary testing.

Give the triple representation of a statement x:= y[i]

Give the triple representation of the statement x := y[i]. I'm having a problem with this one.
You should probably be more specific about the context in which you need the representation; this book has some good information about compiler design. Here is what it would look like using its semantics:
|    | operator | operand1 | operand2 |
|----|----------|----------|----------|
| 1. | []       | y        | i        |
| 2. | :=       | x        | (1.)     |
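If it helps to see it concretely, here is a hypothetical in-memory encoding of those two triples in C (the Triple struct and string operands are illustrative choices, not from the book):

#include <stdio.h>

/* A triple is an operator plus two operands; an operand may be a name
   or a reference to an earlier triple, written here as "(1.)". */
typedef struct {
    const char *op, *arg1, *arg2;
} Triple;

int main(void) {
    Triple code[] = {
        { "[]", "y", "i"    },   /* (1.) index y by i       */
        { ":=", "x", "(1.)" },   /* (2.) assign result to x */
    };
    for (int n = 0; n < 2; n++)
        printf("%d. | %-2s | %s | %s\n",
               n + 1, code[n].op, code[n].arg1, code[n].arg2);
    return 0;
}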