.NET SQL Computed column [duplicate] - sql

I'm using the CHECKSUM function in SQL Server 2008 R2 and I would like to get the same int values in a C# app.
Is there an equivalent method in C# that returns the same values as the SQL CHECKSUM function?
Thanks

On SQL Server Forum, at this page, it's stated:
The built-in CHECKSUM function in SQL Server is built on a series of 4-bit left rotational XOR operations. See this post for more explanation.
I was able to port BINARY_CHECKSUM to C# and it seems to be working... I'll be looking at the plain CHECKSUM later...
private int SQLBinaryChecksum(string text)
{
    long sum = 0;
    byte overflow;
    for (int i = 0; i < text.Length; i++)
    {
        // 4-bit left shift, then XOR in the next character
        sum = (long)((16 * sum) ^ Convert.ToUInt32(text[i]));
        // fold anything that overflowed past 32 bits back into the sum
        overflow = (byte)(sum / 4294967296);
        sum = sum - overflow * 4294967296;
        sum = sum ^ overflow;
    }
    // mimic SQL Server's sign extension on the result
    if (sum > 2147483647)
        sum = sum - 4294967296;
    else if (sum >= 32768 && sum <= 65535)
        sum = sum - 65536;
    else if (sum >= 128 && sum <= 255)
        sum = sum - 256;
    return (int)sum;
}

The CHECKSUM docs don't disclose how it computes the hash. If you want a hash you can use in both T-SQL and C#, pick from the algorithms supported by HASHBYTES.
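For example, here's a minimal sketch of matching hashes on both sides, assuming an NVARCHAR input (UTF-16LE, hence Encoding.Unicode) and SHA-1, which HASHBYTES supports on 2008 R2; the method name is mine:
using System.Security.Cryptography;
using System.Text;

static byte[] Sha1OfNVarChar(string text)
{
    // N'...' strings are UTF-16LE, which is what Encoding.Unicode produces,
    // so this should line up with: SELECT HASHBYTES('SHA1', N'...')
    using (var sha = SHA1.Create())
        return sha.ComputeHash(Encoding.Unicode.GetBytes(text));
}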

The T-SQL documentation does not specify what algorithm is used by checksum() outside of this:
CHECKSUM computes a hash value, called the checksum, over its list of arguments. The hash value is intended for use in building hash indexes. If the arguments to CHECKSUM are columns, and an index is built over the computed CHECKSUM value, the result is a hash index. This can be used for equality searches over the columns.
It's unlikely to compute an MD5 hash, since its return value (the computed hash) is a 32-bit integer; an MD5 hash is 128 bits in length.
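As a quick illustration (a hedged sketch; the sample string and encoding are arbitrary), MD5 in C# produces 16 bytes, four times wider than CHECKSUM's 32-bit int:
using System;
using System.Security.Cryptography;
using System.Text;

byte[] md5 = MD5.Create().ComputeHash(Encoding.Unicode.GetBytes("abc"));
Console.WriteLine(md5.Length * 8);  // prints 128, vs. CHECKSUM's 32 bits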

In case you need to do a checksum on a GUID, change dna2's answer to this:
private int SQLBinaryChecksum(byte[] text)
With a byte array, the value from SQL will match the value from C#. To test:
var a = Guid.Parse("DEAA5789-6B51-4EED-B370-36F347A0E8E4").ToByteArray();
Console.WriteLine(SQLBinaryChecksum(a));
vs SQL:
select BINARY_CHECKSUM(CONVERT(uniqueidentifier,'DEAA5789-6B51-4EED-B370-36F347A0E8E4'))
both answers will be -1897092103.
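For reference, here is a sketch of the full byte-array variant; it assumes the body of the string version above carries over unchanged, with only the parameter type swapped:
private int SQLBinaryChecksum(byte[] data)
{
    long sum = 0;
    byte overflow;
    for (int i = 0; i < data.Length; i++)
    {
        // same 4-bit shift and XOR, but over raw bytes instead of chars
        sum = (16 * sum) ^ data[i];
        overflow = (byte)(sum / 4294967296);
        sum = sum - overflow * 4294967296;
        sum = sum ^ overflow;
    }
    if (sum > 2147483647)
        sum = sum - 4294967296;
    else if (sum >= 32768 && sum <= 65535)
        sum = sum - 65536;
    else if (sum >= 128 && sum <= 255)
        sum = sum - 256;
    return (int)sum;
}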

Dan's implementation of BinaryChecksum can be greatly simplified in C# to:
int SqlBinaryChecksum(string text)
{
    uint accumulator = 0;
    for (int i = 0; i < text.Length; i++)
    {
        // rotate the 32-bit accumulator left by 4 bits...
        var leftRotate4bit = (accumulator << 4) | (accumulator >> 28);
        // ...then XOR in the next character
        accumulator = leftRotate4bit ^ text[i];
    }
    return (int)accumulator;
}
This also makes it clearer what the algorithm is doing: for each character, a 4-bit circular shift, then an XOR with the character's value.

Related

Given no modulus or if even/odd function, how would one check for an odd or even number?

I recently sat a computing exam at university in which we had not been taught about the modulus function or any other odd/even check beforehand, and we had no access to external documentation except our previous lecture notes. Is it possible to do this without these, and how?
Bitwise AND (&)
Extract the last bit of the number using the bitwise AND operator. If the last bit is 1, then it's odd, else it's even. This is the simplest and most efficient way of testing it. Examples in some languages:
C / C++ / C#
bool is_even(int value) {
    return (value & 1) == 0;
}
Java
public static boolean is_even(int value) {
    return (value & 1) == 0;
}
Python
def is_even(value):
    return (value & 1) == 0
I assume this is only for integer numbers as the concept of odd/even eludes me for floating point values.
For these integer numbers, the check of the Least Significant Bit (LSB) as proposed by Rotem is the most straightforward method, but there are many other ways to accomplish that.
For example, you could use the integer division operation as a test. This is one of the most basic operations, implemented on virtually every platform. The result of an integer division is always another integer. For example:
>> x = int64( 13 ) ;
>> x / 2
ans =
7
Here I cast the value 13 to an int64 to make sure MATLAB treats the number as an integer instead of the double data type.
Also, here the result is actually rounded towards infinity to the next integral value. This is a MATLAB-specific implementation; other platforms might round down, but it does not matter for us, as the only behavior we look for is the rounding, whichever way it goes. The rounding allows us to define the following behavior:
If a number is even: Dividing it by 2 will produce an exact result, such that if we multiply this result by 2, we obtain the original number.
If a number is odd: Dividing it by 2 will result in a rounded result, such that multiplying it by 2 will yield a different number than the original input.
Now that you have the logic worked out, the code is pretty straightforward:
%% sample input
x = int64(42) ;
y = int64(43) ;
%% define the checking function
% uses only multiplication and division operator, no high level function
is_even = @(x) int64(x) == (int64(x)/2)*2 ;
And obviously, this will yield:
>> is_even(x)
ans =
1
>> is_even(y)
ans =
0
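For what it's worth, the same divide-then-multiply check ports directly to C#, where integer division truncates rather than rounds; the comparison works either way (a sketch, the method name is mine):
// Even numbers survive the divide-then-multiply round trip intact;
// odd numbers lose their remainder and come back changed.
bool IsEven(int value) => (value / 2) * 2 == value;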
I found out from a fellow student how to solve this simplistically with maths instead of functions.
Using (-1)^n :
If n is odd then the outcome is -1
If n is even then the outcome is 1
This is some pretty out-of-the-box thinking, but it would be the only way to solve this without previous knowledge of complex functions including mod.
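A quick sketch of that trick in C# (Math.Pow is used purely for illustration; a power of -1 is exact in a double, and the method name is mine):
using System;

// (-1)^n is 1 for even n and -1 for odd n
bool IsEvenByPower(int n) => Math.Pow(-1.0, n) == 1.0;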

BigQuery UDF using BYTES datatype

I am currently trying to calculate the Hamming distance between two binary strings in BigQuery using user-defined functions in JavaScript. My schema is quite simple:
row_id STRING
descriptors BYTES REPEATED
phash BYTES
What I find a bit confusing is the fact that you apparently deal with BYTES in BigQuery as a Base64 string, so I imported the functions atob() and btoa() in order to work with the binary form of the byte strings instead of the Base64 representation.
My Query currently looks like this:
CREATE TEMP FUNCTION f_PHASH_distance(ph1 BYTES, ph2 BYTES)
RETURNS INT64
LANGUAGE js AS
"""
return HammingDistance(ph1, ph2);
"""
OPTIONS (
  library=["gs://test.appspot.com/HammingDistance.js",
           "gs://test.appspot.com/btoa_atob.js"]
);
SELECT f_PHASH_distance(phash, CAST("9Slp3g9OgVI=" AS BYTES))
FROM ims.images WHERE row_id = "2333USX"
And the phash of the row with id = "2333USX" is equal to "9Slp3g9OgVI=" in base64, which means that the Hamming distance should be 0. But instead of 0, I am currently getting 35 on BigQuery.
HammingDistance.js has the following content:
function HammingDistance(a, b){
  var count = 0;
  for(var i = 0; i < a.length; i++){
    // calculate XOR between the two chars
    var xor = a.charCodeAt(i) ^ b.charCodeAt(i);
    // count number of 1's on the result
    for(var j = 0; j < 16; j++){
      //add if LSB is 1
      count += xor % 2;
      //right shift the variable
      xor = xor >> 1;
    }
  }
  return count;
}
/**
 * Calculates the distance between two Perceptual hashes of two images encoded
 * in base 64
 */
function PHASHDistance(a, b){
  return HammingDistance(atob(a), atob(b));
}
And testing it in the JS console of my browser I do get the expected result. So I assume that I am doing something wrong with the casts but the documentation is very scarce on UDFs with BYTE parameters.
Any help would be much appreciated.
It looks like the problem is that you are casting "9Slp3g9OgVI=" to bytes rather than converting it to bytes from base64. I think you want this instead:
SELECT f_PHASH_distance(phash, FROM_BASE64("9Slp3g9OgVI="))
FROM ims.images WHERE row_id = "2333USX"
You might be better off using SQL functions rather than JavaScript functions, though, since JavaScript normally isn't as fast. Here's a Hamming distance implementation in SQL, assuming that the bytes have equal lengths:
#standardSQL
CREATE TEMP FUNCTION HammingDistance(b1 BYTES, b2 BYTES) AS (
  BIT_COUNT(b1 ^ b2)
);
WITH Input AS (
  SELECT b'defdef' AS bytes UNION ALL
  SELECT b'123de4' UNION ALL
  SELECT b'abc123'
)
SELECT HammingDistance(b'abcdef', bytes)
FROM Input;
It takes the bitwise XOR of the two byte values, then checks how many bits are not the same.
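If you need to double-check results outside BigQuery, the same XOR-and-popcount idea can be sketched in C# (assuming .NET Core 3.0+ for BitOperations.PopCount and equal-length inputs; the method name is mine):
using System;
using System.Numerics;

static int HammingDistance(byte[] a, byte[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("inputs must have equal length");
    int count = 0;
    for (int i = 0; i < a.Length; i++)
        count += BitOperations.PopCount((uint)(a[i] ^ b[i]));  // differing bits per byte
    return count;
}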
In case someone is looking for a solution for comparing regular strings (not binary ones as in this question), look at my answer here.

Get the most occurring number amongst several integers without using arrays

DISCLAIMER: Rather a theoretical question here, not looking for a correct answer, just asking for some inspiration!
Consider this:
A function is called repetitively and returns integers based on seeds (the same seed returns the same integer). Your task is to find out which integer is returned most often. Easy enough, right?
But: You are not allowed to use arrays or fields to store return values of said function!
Example:
int mostFrequentNumber = 0;
int occurencesOfMostFrequentNumber = 0;
int iterations = 10000000;
for(int i = 0; i < iterations; i++)
{
    int result = getNumberFromSeed(i);
    int occurencesOfResult = magic();
    if(occurencesOfResult > occurencesOfMostFrequentNumber)
    {
        mostFrequentNumber = result;
        occurencesOfMostFrequentNumber = occurencesOfResult;
    }
}
If getNumberFromSeed() returns 2,1,5,18,5,6 and 5 then mostFrequentNumber should be 5 and occurencesOfMostFrequentNumber should be 3 because 5 is returned 3 times.
I know this could easily be solved using a two-dimensional list to store results and occurrences. But imagine for a minute that you cannot use any kind of arrays, lists, dictionaries, etc. (Maybe because the system running the code has such limited memory that you cannot store enough integers at once, or because your prehistoric programming language has no concept of collections.)
How would you find mostFrequentNumber and occurencesOfMostFrequentNumber? What does magic() do? (Of course you do not have to stick to the example code. Any ideas are welcome!)
EDIT: I should add that the integers returned by getNumber() should be calculated using a seed, so the same seed returns the same integer (i.e. int result = getNumber(5); this would always assign the same value to result)
Make a hypothesis: assume that the distribution of integers is, e.g., Normal.
Start simple. Keep two variables:
N: the number of elements read so far
M1: the average of said elements
Initialize both variables to 0.
Every time you read a new value x update N to be N + 1 and M1 to be M1 + (x - M1)/N.
At the end M1 will equal the average of all values. If the distribution was Normal this value will have a high frequency.
Now improve the above. Add a third variable:
M2: the average of (x - M1)^2 over all values of x read so far.
Initialize M2 to 0. Now get a small memory of, say, 10 elements or so. For every new value x that you read, update N and M1 as above, and M2 as:
M2 := M2 + (x - M1)^2 * (N - 1) / N
At every step M2 is the variance of the distribution and sqrt(M2) its standard deviation.
As you proceed remember the frequencies of only the values read so far whose distances to M1 are less than sqrt(M2). This requires the use of some additional array, however, the array will be very short compared to the high number of iterations you will run. This modification will allow you to guess better the most frequent value instead of simply answering the mean (or average) as above.
UPDATE
Given that this is about insights for inspiration there is plenty of room for considering and adapting the approach I've proposed to any particular situation. Here are some thoughts
When I say assume that the distribution is Normal you should think of it as: Given that the problem has no solution, let's see if there is some qualitative information I can use to decide what kind of distribution would the data have. Given that the algorithm is intended to find the most frequent number, it should be fine to assume that the distribution is not uniform. Let's try with Normal, LogNormal, etc. to see what can be found out (more on this below.)
If the game completely disallows the use of any array, then fine, keep track of only, say 10 numbers. This would allow you to count the occurrences of the 10 best candidates, which will give more confidence to your answer. In doing this choose your candidates around the theoretical most likely value according to the distribution of your hypothesis.
You cannot use arrays, but perhaps you can read the sequence of numbers two or three times, not just once. In that case you can read it once to check whether your hypothesis about its distribution is good or bad. For instance, if you compute not just the variance but also the skewness and the kurtosis, you will have more elements to check your hypothesis. For instance, if the first reading indicates that there is some bias, you could use a LogNormal distribution instead, etc.
Finally, in addition to providing the approximate answer you would be able to use the information collected during the reading to estimate an interval of confidence around your answer.
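As a sketch of the bookkeeping described above, here it is in C# using the standard Welford formulation (whose M2 update differs slightly from the formula quoted, but tracks the same quantities); iterations and getNumberFromSeed come from the question's code:
long n = 0;
double m1 = 0.0;  // running mean
double m2 = 0.0;  // running sum of squared deviations from the mean
for (int i = 0; i < iterations; i++)
{
    double x = getNumberFromSeed(i);
    n++;
    double delta = x - m1;
    m1 += delta / n;         // update the mean incrementally
    m2 += delta * (x - m1);  // uses both the old and the new mean
}
double variance = n > 0 ? m2 / n : 0.0;  // population variance
double stdDev = Math.Sqrt(variance);     // standard deviation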
Alright, I found a decent solution myself:
int mostFrequentNumber = 0;
int occurencesOfMostFrequentNumber = 0;
int iterations = 10000000;
int maxNumber = -2147483647;
int minNumber = 2147483647;
//Step 1: Find the largest and smallest number that _can_ occur
for(int i = 0; i < iterations; i++)
{
    int result = getNumberFromSeed(i);
    if(result > maxNumber)
    {
        maxNumber = result;
    }
    if(result < minNumber)
    {
        minNumber = result;
    }
}
//Step 2: for each possible number between minNumber and maxNumber, count occurences
for(int thisNumber = minNumber; thisNumber <= maxNumber; thisNumber++)
{
    int occurenceOfThisNumber = 0;
    for(int i = 0; i < iterations; i++)
    {
        int result = getNumberFromSeed(i);
        if(result == thisNumber)
        {
            occurenceOfThisNumber++;
        }
    }
    if(occurenceOfThisNumber > occurencesOfMostFrequentNumber)
    {
        occurencesOfMostFrequentNumber = occurenceOfThisNumber;
        mostFrequentNumber = thisNumber;
    }
}
I must admit this may take a long time, depending on the smallest and largest possible values, but it will work without using arrays.

How to generate random number

I am working on a calculator application. All operations are clear to me except "rand". Could anyone tell me how to generate a random number when rand is selected?
For example
initially I select one (1), then rand... it should then display a random number based on one (1).
In Objective-C you should use (for a value between 0 and 1):
int r = arc4random() % 10;
float r2 = r / 10.0f;
Imagine that you want a number between 0 and 50 with decimals; then you should do:
int r = arc4random() % (50 * 100);
float r2 = r / 100.0f;
You will get something like 40.123
NSInteger num = (arc4random() % maxnumber) + 1;
Check the synopsis...
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <stdlib.h>
u_int32_t
arc4random(void);
void
arc4random_stir(void);
void
arc4random_addrandom(unsigned char *dat, int datlen);
DESCRIPTION
The arc4random() function uses the key stream generator employed by the arc4 cipher, which uses 8*8 8-bit S-Boxes. The S-Boxes can be in about (2**1700) states. The arc4random() function returns pseudo-random numbers in the range of 0 to (2**32)-1, and therefore has twice the range of rand(3) and random(3).
The arc4random_stir() function reads data from /dev/urandom and uses it to permute the S-Boxes via
arc4random_addrandom().
There is no need to call arc4random_stir() before using arc4random(), since arc4random() automatically
initializes itself.
EXAMPLES
The following produces a drop-in replacement for the traditional rand() and random() functions using
arc4random():
#define foo4random() (arc4random() % ((unsigned)RAND_MAX + 1))

Objective C - Random results is either 1 or -1

I am trying to randomly generate a positive or negative number, and rather than worry about the bigger range I am hoping to randomly generate either 1 or -1 to multiply by my other random number.
I know this can be done with a longer rule of generating 0 or 1, checking the return value, and using that to multiply by either 1 or -1.
Hoping someone knows of an easier way to just randomly set the sign on a number. Trying to keep my code as clean as possible.
I like to use arc4random() because it doesn't require you to seed the random number generator. It also conveniently returns a u_int32_t, so you don't have to worry about the result being between 0 and 1, etc. It'll just give you a random integer.
int myRandom() {
    return (arc4random() % 2 ? 1 : -1);
}
If I understand the question correctly, you want a pseudorandom sequence of 1 and -1:
int f(void)
{
    return random() & 1 ? 1 : -1;
    // or...
    // return 2 * (random() & 1) - 1;
    // or...
    // return ((random() & 1) << 1) - 1;
    // or...
    // return (random() & 2) - 1; // This one from Chris Lutz
}
Update: Ok, something has been bothering me since I wrote this. One of the frequent weaknesses of common RNGs is that the low order bits can go through short cycles. It's probably best to test a higher-order bit: random() & 0x80000 ? 1 : -1
To generate either 1 or -1 directly, you could do:
int PlusOrMinusOne() {
    return (rand() % 2) * 2 - 1;
}
But why are you worried about the broader range?
return ((arc4random() & 2) - 1);
This extra step won't give you any additional "randomness". Just generate your number straight away in the range that you need (e.g. -10..10).
Standard rand() will return a value from this range: 0..1
You can multiply it by a constant to increase the span of the range or you can add a constant to push it left/right on the X-Axis.
E.g. to generate random values from the (-5..10) range you will have:
rand()*15-5
rand will give you a number from 0 to RAND_MAX which will cover every bit in an int except for the sign. By shifting that result left 1 bit you turn the signed MSB into the sign, but have zeroed-out the 0th bit, which you can repopulate with a random bit from another call to rand. The code will look something like:
int my_rand()
{
    return (rand() << 1) + (rand() & 1);
}