Abstract Binary Search - binary-search

In the article http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binarySearch,
the author says:
Careful readers may note that binary search can also be used when a
predicate yields a series of yes answers followed by a series of no
answers. This is true and complementing that predicate will satisfy
the original condition. For simplicity we'll deal only with predicates
described in the theorem.
I couldn't work out what he meant. Could someone please explain?
Thanks

Imagine you're doing a binary search on a set of numbers: for the search to work, you need to put the numbers in order, so that the question "is this number less than the number I'm searching for" gives yeses followed by nos.
Example: searching for number 8 in the sequence [1,1,2,3,5,8,13,21]
is 1 less than 8 ? "yes"
is 1 less than 8 ? "yes"
is 2 less than 8 ? "yes"
is 3 less than 8 ? "yes"
is 5 less than 8 ? "yes"
is 8 less than 8 ? "no"
is 13 less than 8 ? "no"
is 21 less than 8 ? "no"
This means if you looked at, say the middle number in the sequence, you could tell instantly whether your target number was before or after this mid-point (if you get a 'no' look before, if you get a 'yes' look after). You can then exclude the unwanted half of the series and repeat the process with the remaining half...
This way of halving the search field at each step is the key to binary search, and guarantees you will find the target in O(log n) time.
Looking at the second part of your paragraph:
complementing that predicate will satisfy the original condition
To complement the predicate means to swap 'yes' and 'no', which would give us 'a series of no answers followed by a series of yes answers', which is referred to in the previous paragraph (the original condition).
So in summary, your quote is saying that 'yes followed by no' will work just as well as 'no followed by yes'.
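To make that concrete, here is a minimal sketch in Python. The helper name `first_true` is my own (it is not from the article); it assumes a predicate that answers no, ..., no, yes, ..., yes. The "is x less than 8" question from the example answers yes first and no afterwards, so it is complemented before being passed in:

```python
def first_true(items, pred):
    """Return the first index i where pred(items[i]) is True.

    Assumes pred answers no, ..., no, yes, ..., yes over items.
    Returns len(items) if pred is never True.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(items[mid]):
            hi = mid          # the first yes is at mid or before it
        else:
            lo = mid + 1      # the first yes is after mid
    return lo


seq = [1, 1, 2, 3, 5, 8, 13, 21]


def less_than_8(x):
    # yes, ..., yes, no, ..., no over seq
    return x < 8


# Complementing the predicate gives no, ..., no, yes, ..., yes.
index = first_true(seq, lambda x: not less_than_8(x))
```

Here `index` ends up pointing at the 8 in the sequence, which is the first element for which the complemented question answers yes.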

He is talking about formal logic, and using terms from formal logic.
Directly from the article:
"Behind the cryptic mathematics I am really stating that if you had a yes or no question (the predicate), getting a yes answer for some potential solution x means that you'd also get a yes answer for any element after x. Similarly, if you got a no answer, you'd get a no answer for any element before x. As a consequence, if you were to ask the question for each element in the search space (in order), you would get a series of no answers followed by a series of yes answers."
I think you're going to need to use some elbow grease and buff up on those terms. I'm not sure which part you're having trouble with.

Related

Could you explain how to convert from lz77 to huffman?

Could you explain how to convert from lz77 to huffman on the example in the below picture?
Easy:
In the first step your output is essentially 3 numbers:
(1) prev index
(2) number of characters to repeat
(3) next character (be it ascii or unicode)
The algorithm demands that you specify a sliding window up front. That means you know how big (1) and (2) can be at most.
In other words, you know how many bits (1) and (2) will take up.
Since (3) is essentially also a character from a fixed length alphabet, you also know the bit-length of (3)
That means it's safe to simply concatenate them.
So, the output of the first algorithm can be thought of as outputting a bit-sequence, where every item in the sequence has a fixed length.
That's ideal for applying huffman.
Of course the specifics are not mentioned, and you can choose from a lot of options.
normalized huffman table
1 on left-branch vs 0 on left-branch
priorities when merging items of similar count
etc
So I cannot readily explain the exact output values you are showing.
But I hope I can at least explain how to get from A to B.
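As a rough sketch of the two steps in Python: the field widths (4 bits for the index, 4 bits for the length, 8 bits for the character, so ASCII only), the function names `pack` and `huffman_code`, and the sample triples are all assumptions of mine, not values taken from the picture. The tie-breaking and the 0/1 branch convention are just one of the many possible choices mentioned above.

```python
import heapq
from collections import Counter
from itertools import count


def pack(triple, offset_bits=4, length_bits=4, char_bits=8):
    """Concatenate (offset, length, next_char) into one fixed-width bit string."""
    offset, length, char = triple
    return (format(offset, f"0{offset_bits}b")
            + format(length, f"0{length_bits}b")
            + format(ord(char), f"0{char_bits}b"))


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    tie = count()                             # tie-breaker so dicts are never compared
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)     # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


# Hypothetical LZ77 output, not the triples from the picture.
triples = [(0, 0, "a"), (0, 0, "b"), (2, 2, "a"), (0, 0, "b"), (0, 0, "a")]
symbols = [pack(t) for t in triples]          # fixed-length items
code = huffman_code(symbols)
encoded = "".join(code[s] for s in symbols)
```

Each packed triple is treated as one symbol of the Huffman alphabet. A decoder would need the code table (or the frequencies) as well as the field widths.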
You can't. The coding shown is, well, figurative. Not literal. The symbols A, B, and C are all coded to the single bit 0. Obviously that's not going to be very helpful on the decoding end.

Text questions with multiple choices randomisation

I have a text file that contains over 11 thousand multiple choice and matching questions. The questions vary in length and in the number of choices given. The following block is a sample matching question with five choices, taken from that text file:
Type: MT
1) Can you match each of these cities to their location? Drag the cities on the right to match them with the locations on the left.
~ Correct. You got all these matches correct.
# Incorrect. You got some of these wrong.
a. North = Turin
b. Center = Rome
c. South = Naples
d. Sicily = Palermo
e. Sardinia = Cagliari
Before processing this file into an HTML generating engine, I need to shuffle all those questions, i.e. to randomly change the position of each question as a block in the file, so the final product will be extremely unpredictable. Each question number (as mentioned under Type:) is insignificant.
I found some Word VBA code at this link, but it would need a lot of expert alteration to accommodate questions of varying sizes.
Expert assistance in this matter is deeply appreciated. Thanks in advance.
First, I agree with Tim Williams in the comments above that this is not exactly the level of specificity that is expected in a StackOverflow posting.
That said, if I were you, I would break this question down into two components.
First - figure out if there is a text string that can be used to identify the blocks that constitute the "question." For example, if each question starts with "Type:", then you can find the first instance of this in the file, then find the second, and everything between them constitutes a "question". Then, you can place that question in an array.
Second - randomize the array. There are probably a ton of ways to do this. One might be to call a randbetween function twice to pick two random positions in the array of questions, and switch the questions at those two positions. Then, repeat that a number of times relative to the total number of items in the array (for example, if you have 100 questions, perform the "switch" 125 times) to sufficiently randomize the output. Then print the array back to the original file.
For the approach above, you need some delimiter in your file (I assumed the delimiter was "Type:") to break the questions above. If a delimiter like this doesn't exist, you may need some more complicated logic.
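Here is a minimal sketch of both steps in Python rather than VBA. It assumes, as above, that every question starts on a line beginning with "Type:". The function names are my own, and instead of the repeated random swaps it uses the standard library's `random.shuffle`, which does the randomizing in one call:

```python
import random


def split_questions(text, marker="Type:"):
    """Split text into blocks, each starting at a line that begins with marker."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.startswith(marker) and current:
            blocks.append("\n".join(current))   # close the previous block
            current = []
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def shuffle_questions(text, marker="Type:", seed=None):
    """Return text with its question blocks in random order."""
    blocks = split_questions(text, marker)
    random.Random(seed).shuffle(blocks)         # seed is only for repeatable runs
    return "\n".join(blocks)


sample = "Type: MT\n1) First question\na. x\nType: MC\n2) Second question\nType: MT\n3) Third question"
shuffled = shuffle_questions(sample)
```

Note that any text before the first "Type:" line would be treated as a block of its own and shuffled along with the questions, so a file header would need to be handled separately.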

what must the minimum number of bits in each word of the Little Man computer be?

I came across the following question while reading a CS book. Can someone please explain it to me? "The Little Man computer can have ten operation codes (0-9) and address 100 words of storage (0-99). If binary numbers are to replace decimal numbers, what must the minimum number of bits in each word of the LMC be?"
Since you need to be able to distinguish 10 codes for operation, the minimum word size would have to be 4 bits. Using 4 bits, you can represent up to 2^4 = 16 possible codes (since each bit can be 0 or 1). Anything less (2^3 = 8) will not allow a separate binary number for each code.
The Little Man Computer is an architecture where one instruction is held in one word, therefore a word has to contain both the op code and the address. That means you have to hold 000 to 999 so my answer would be 10 bits. You could assume that the question implies the op code and address in separate fields - in that case you need 4 bits for the op code and 7 bits for the address making 11 in total.
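The arithmetic behind those two figures can be checked with a few lines of Python (the helper name `bits_needed` is just for illustration):

```python
def bits_needed(n_values):
    """Smallest number of bits that can distinguish n_values different values."""
    return (n_values - 1).bit_length()


combined = bits_needed(1000)                    # one field holding 000..999
separate = bits_needed(10) + bits_needed(100)   # op code field + address field
```

With a single combined field, `combined` is 10 because 2^10 = 1024 covers 1000 values while 2^9 = 512 does not. With separate fields, `separate` is 4 + 7 = 11.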
Note that the LMC has a "jump if greater than or equal to zero" instruction and for this to mean anything you must be able to represent negative numbers - so that implies that memory has a sign bit. My own simulation allows -999 to +999 as numbers in memory.

Problem 98 - Project Euler

The problem is as follows:
By replacing each of the letters in the word CARE with 1, 2, 9, and 6 respectively, we form a square number: 1296 = 36². What is remarkable is that, by using the same digital substitutions, the anagram, RACE, also forms a square number: 9216 = 96². We shall call CARE (and RACE) a square anagram word pair and specify further that leading zeroes are not permitted, neither may a different letter have the same digital value as another letter.
Using words.txt (right click and 'Save Link/Target As...'), a 16K text file containing nearly two-thousand common English words, find all the square anagram word pairs (a palindromic word is NOT considered to be an anagram of itself).
What is the largest square number formed by any member of such a pair?
NOTE: All anagrams formed must be contained in the given text file.
I don't understand the mapping of CARE to 1296? How does that work? or are all permutation mappings meant to be tried i.e. all letters to 1-9?
All assignments of digits to letters are allowed. So C=1, A=2, R=3, E=4 would be a possible assignment ... except that 1234 is not a square, so that would be no good.
Maybe another example would help make it clear? If we assign A=6, E=5, T=2, then TEA = 256 = 16² and EAT = 625 = 25². So (TEA=256, EAT=625) is a square anagram word pair.
(Just because all assignments of digits to letters are allowed, does not mean that actually trying out all such assignments is the best way to solve the problem. There may be some other, cleverer, way to do it.)
In short: yes, all permutations need to be tried.
If you test all substitutions of letters for digits, then you are looking for pairs of squares with these properties:
have the same length
have the same digits, with the same number of occurrences as in the input string.
It is faster to find all these pairs of squares. There are 68 squares with length 4, 217 squares with length 5, ... Filtering all squares of the same length by the properties above will generate a 'small' number of pairs, which are the solutions you are looking for.
This data is 'static', and doesn't depend on the input strings. It can be calculated once and used for all input strings.
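A minimal Python sketch of that precomputation, with function names of my own choosing. It groups the squares of a given length by their sorted digits, so every pair inside a group uses the same digits with the same number of occurrences:

```python
from collections import defaultdict
from itertools import combinations
from math import isqrt


def squares_with_length(n_digits):
    """All perfect squares with exactly n_digits decimal digits."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    first = isqrt(lo - 1) + 1            # smallest root whose square is >= lo
    return [r * r for r in range(first, isqrt(hi) + 1)]


def square_anagram_pairs(n_digits):
    """Pairs of n_digits squares that are digit permutations of each other."""
    groups = defaultdict(list)
    for sq in squares_with_length(n_digits):
        groups["".join(sorted(str(sq)))].append(sq)
    return [pair for group in groups.values() for pair in combinations(group, 2)]


pairs4 = square_anagram_pairs(4)         # includes (1296, 9216) from CARE/RACE
```

This only produces candidate pairs of squares. Matching them against the word pairs (same letter in the same position maps to the same digit, distinct letters to distinct digits) is still left to do.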
Hmm. How to put this. The people who put together Project Euler promise that there is a solution that is under one minute for every problem, and there is only one problem that I think might fail this promise, but this is not it.
Yes, you could permute the digits, and try all permutations against all squares, but that would be a very large search space, not at all likely to be the (TM) right thing. In general, when you see that your "look" at the problem is going to generate a search that will take too long, you need to search something else.
Like, suppose you were asked to determine what numbers would be the result of multiplying two primes between 1 and a zillion. You could factor every number between 1 and a zillion, but it might be faster to take all combinations of two primes and multiply them. Since you are looking at combinations, you can start with two and go until your results are too large, then do the same with three, etc. By comparison, this should be much faster - and you don't have to multiply all the numbers out, you could take logs of all the primes and then just add them and find the limit for every prime, giving you a list of numbers you could add up.
There are a bunch of innovative solutions, but the first one you think of - especially the one you think of when Project Euler describes the problem, is likely to be wrong.
So, how can you approach this problem? There are probably too many permutations to look at, but maybe you can figure out something with mappings and comparing mappings?
(Trying to avoid giving it all away.)

Storage algorithm question - verify sequential data with little memory

I found this on an "interview questions" site and have been pondering it for a couple of days. I will keep churning, but am interested in what you guys think.
"10 Gbytes of 32-bit numbers on a magnetic tape, all there from 0 to 10G in random order. You have 64 32 bit words of memory available: design an algorithm to check that each number from 0 to 10G occurs once and only once on the tape, with minimum passes of the tape by a read head connected to your algorithm."
32-bit numbers can take only 4G = 2^32 different values, but there are 2.5*2^32 numbers on the tape in total. So once more than 2^32 numbers have been read, at least one of them is guaranteed to be a repeat. Only if there were <= 2^32 numbers on the tape would both cases be possible: either all numbers are different, or at least one repeats.
It's a trick question, as Michael Anderson and I have figured out. You can't store 10G 32b numbers on a 10G tape. The interviewer (a) is messing with you and (b) is trying to find out how much you think about a problem before you start solving it.
The utterly naive algorithm, which takes as many passes as there are numbers to check, would be to walk through and verify that the lowest number is there. Then do it again checking that the next lowest is there. And so on.
This requires one word of storage to keep track of where you are - you could cut down the number of passes by a factor of 64 by using all 64 words to keep track of where you're up to in several different locations in the search space - checking all of your current ones on each pass. Still O(n) passes, of course.
You could probably cut it down even more by using portions of the words - given that your search space for each segment is smaller, you won't need to keep track of the full 32-bit range.
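One way to read "using portions of the words" is to treat the 64 words as a 2048-bit bitmap and verify a window of 2048 consecutive values on each pass. The Python sketch below is my own interpretation under that assumption, not a known answer to the interview question. `read_tape` is a hypothetical callable that returns a fresh iterator for each pass, and the sketch uses a few loop variables on top of the 64 words.

```python
WORDS, WORD_BITS = 64, 32
WINDOW = WORDS * WORD_BITS               # 2048 values tracked per pass


def check_tape(read_tape, n):
    """Return True if every value in 0..n-1 occurs exactly once on the tape."""
    for start in range(0, n, WINDOW):
        size = min(WINDOW, n - start)
        words = [0] * WORDS              # the 64-word bitmap for this window
        seen = 0
        for value in read_tape():        # one full pass of the tape
            if not 0 <= value < n:
                return False             # out-of-range number
            offset = value - start
            if 0 <= offset < size:
                word, bit = divmod(offset, WORD_BITS)
                if (words[word] >> bit) & 1:
                    return False         # duplicate inside this window
                words[word] |= 1 << bit
                seen += 1
        if seen != size:
            return False                 # some value in this window is missing
    return True


# Small stand-in for the tape: a permutation of 0..4999 (7 is coprime to 5000).
tape = [(i * 7) % 5000 for i in range(5000)]
ok = check_tape(lambda: iter(tape), 5000)
```

That is still O(n) passes, just with n/2048 of them instead of n/64.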
Perform an in-place mergesort or quicksort, using tape for storage? Then iterate through the numbers in sequence, tracking to see that each number = previous+1.
Requires cleverly implemented sort, and is fairly slow, but achieves the goal I believe.
Edit: oh bugger, it's never specified you can write.
Here's a second approach: scan through trying to build up to 30-ish ranges of contiguous numbers, i.e. 1,2,3,4,5 would be one range, 8,9,10,11,12 would be another, etc. If ranges overlap with existing ones, then they are merged. I think you only need to make a limited number of passes to either get the complete range or prove there are gaps... much less than just scanning through in blocks of a couple thousand to see if all numbers are present.
It'll take me a bit to prove or disprove the limits for this though.
Do 2 reduces on the numbers, a sum and a bitwise XOR.
The sum should be (10G + 1) * 10G / 2
The XOR should be ... something
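For what it's worth, the XOR of 0..n has a closed form that depends only on n mod 4, so both reductions can be compared against known targets in a single pass. A small Python sketch (function names are mine):

```python
def xor_upto(n):
    """XOR of all integers 0..n, using the period-4 pattern of the running XOR."""
    return [n, 1, n + 1, 0][n % 4]


def checksums_match(numbers, n):
    """Compare the sum and XOR of numbers against those of 0..n."""
    total, acc = 0, 0
    for x in numbers:
        total += x
        acc ^= x
    return total == n * (n + 1) // 2 and acc == xor_upto(n)


good = checksums_match([2, 0, 3, 1], 3)
fooled = checksums_match([0, 0, 3, 3], 3)   # not a permutation, yet both checks pass
```

Both calls return True, which shows the catch: matching sum and XOR are necessary but not sufficient, since 0, 0, 3, 3 has the same sum (6) and XOR (0) as 0, 1, 2, 3.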
It looks like there is a catch in the question that no one has talked about so far: the interviewer has only asked the interviewee to write a program that CHECKS
(i) whether each number that makes up the 10G is present once and only once. What should the interviewee do if numbers in the given list are present multiple times? Should he assume that he should stop executing the program and throw an exception, or should he assume that he should correct the mistake by removing the repeated number and replacing it with another (this may actually be a costly exercise, as it involves a complete reshuffle of the number set)? Correcting this is required to perform the second step in the question, i.e. to verify that the data is stored in the best possible way, so that it requires the fewest possible passes.
(ii) When the interviewee is asked only to check whether the 10G data set of numbers is stored in such a way that it requires the fewest passes to access any of those numbers,
what should the interviewee do? Should he stop and throw an exception the moment he finds an issue in the way they were stored, or correct the mistake and continue until all the elements are sorted in the order of fewest possible passes?
If the intention of the interviewer is to ask the interviewee to write an algorithm that finds the best combination of numbers that can be stored in 10GB, given 64 32-bit registers, and also to write an algorithm to save this chosen set of numbers in the best possible way, requiring the fewest passes to access each, he should have asked this directly, shouldn't he?
I suppose the intention of the interviewer may be only to see how the interviewee approaches the problem rather than to actually extract a working solution from the interviewee; would anyone buy this notion?
Regards,
Samba