Max occurrences of consecutive characters in a string (Ruby) - ruby-on-rails-3

How can I find the characters with the most consecutive occurrences in a string and return them as an array in sorted order?
Example:
input = "abcccdddeee"
output = ["c", "d", "e"]

This is crude and likely can be improved, but you're basically looking at a simple state machine, where the current state is the previous character, and the next state is either a reset or an incrementation of a counter.
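str = "abcccdddeee"
state = nil        # FSM state: the previous character
current_count = 0  # length of the current run
counts = {}        # char => longest run seen so far
str.each_char do |char|
  if state == char
    current_count += 1
    counts[char] ||= 0
    counts[char] = current_count if current_count > counts[char]
  else
    current_count = 1  # a new run starts with one character
  end
  state = char
end
p counts.sort_by { |char, n| [-n, char] }.map(&:first)  # longest runs first, ties alphabetical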
Since this only counts and stores counts when the current input causes the FSM to remain in the counting state, you don't get non-repeating characters in your output.
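To make the state-machine idea concrete, here is a minimal Python port of the same FSM. It assumes "max occurrences" means the characters whose longest run ties the overall longest run, returned alphabetically as in the question's example; the name max_run_chars is hypothetical.

```python
def max_run_chars(s):
    """Return, alphabetically, the characters whose longest consecutive run
    equals the longest run in s (assumed reading of "max occurrences")."""
    best = {}    # char -> longest run seen so far
    prev = None  # FSM state: the previous character
    run = 0
    for ch in s:
        # Stay in the counting state if the character repeats, else reset.
        run = run + 1 if ch == prev else 1
        if run > 1:  # only repeated characters are ever recorded
            best[ch] = max(best.get(ch, 0), run)
        prev = ch
    if not best:
        return []
    top = max(best.values())
    return sorted(ch for ch, n in best.items() if n == top)
```

For example, max_run_chars("abcccdddeee") gives ["c", "d", "e"], while max_run_chars("aabbbc") keeps only "b".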
However, since this is Ruby, we can cheat and use regexes:
"abcccdddeee".scan(/((.)\2{1,})/).map(&:first).sort_by(&:length).map {|s| s[0] }
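The same backreference trick is available in Python's re module. A minimal sketch, assuming you want one character per repeated run ordered from shortest run to longest like the Ruby one-liner; repeated_runs_by_length is a hypothetical name:

```python
import re


def repeated_runs_by_length(s):
    """Find every run of one repeated character with the backreference
    pattern (.)\\1+ and return each run's character, shortest run first."""
    runs = [m.group(0) for m in re.finditer(r"(.)\1+", s)]
    runs.sort(key=len)  # stable sort: equal-length runs keep string order
    return [run[0] for run in runs]
```

Unlike Ruby's sort_by, Python's list.sort is guaranteed stable, so runs of equal length stay in left-to-right order.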

Related

What would be the time complexity of a binary search that calls another helper function?

The helper retrieves the value to be compared in the search function. Here, mem is an object.
def get_val(mem, c):
    if c == "n":
        return mem.get_name()
    elif c == "z":
        return mem.get_zip()
In the function below, the helper function above is called in each iteration. Will this affect the time complexity of the binary search, or will it still be O(log n)?
def bin_search(array, c, s):
    first = 0
    last = len(array) - 1
    while first <= last:
        mid = (first + last) // 2
        val = get_val(array[mid], c)  # pass the criterion parameter c
        if val == s:
            return array[mid]
        elif s < val:
            last = mid - 1
        else:
            first = mid + 1
    return None
Since you are calling get_val() once per iteration of your binary search, the total time complexity should be
O(log n * f(x)),
where f(x) is the time complexity of get_val(). If this is constant (does not depend on the input, such as the contents of array), then indeed your total time complexity is still O(log n).
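One way to check the O(log n) claim is to count helper calls directly. The sketch below is a runnable version of the question's code; the Member class, its fields, and the CALLS counter are hypothetical scaffolding, and get_val is assumed to do a fixed amount of work per call:

```python
from dataclasses import dataclass

# Hypothetical call counter, used only to show how often the helper runs.
CALLS = {"get_val": 0}


@dataclass
class Member:
    """Hypothetical stand-in for the question's mem objects."""
    name: str
    zip_code: str

    def get_name(self):
        return self.name

    def get_zip(self):
        return self.zip_code


def get_val(mem, c):
    """Helper whose work does not depend on len(array)."""
    CALLS["get_val"] += 1
    if c == "n":
        return mem.get_name()
    if c == "z":
        return mem.get_zip()
    raise ValueError(f"unknown criterion {c!r}")


def bin_search(array, c, s):
    """Binary search on an array already sorted by the chosen key."""
    first, last = 0, len(array) - 1
    while first <= last:
        mid = (first + last) // 2
        val = get_val(array[mid], c)  # exactly one helper call per iteration
        if val == s:
            return array[mid]
        if s < val:
            last = mid - 1
        else:
            first = mid + 1
    return None
```

For 1,000 members sorted by name, each search runs the loop, and therefore get_val, at most floor(log2 1000) + 1 = 10 times. If get_val instead cost O(k), the same count gives O(k log n).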

Maintain a count of subarrays that satisfy a condition using recursion

I made a recursive function that gives me all the subarrays. I want to apply a condition to those subarrays and keep a count of the ones that satisfy it, but the initialization of the count variable has been bothering me. Please help!
Here is my code:
def printSubArrays(arr, start, end):
    if end == len(arr):
        return
    elif start > end:
        return printSubArrays(arr, 0, end + 1)
    else:
        dictio = arr[start:end + 1]
        print(dictio)
        if len(dictio) != 1:
            for i in range(len(dictio)):
                aand = 1
                aand = aand & dictio[i]
                if aand % 2 != 0:
                    count = count + 1
        return printSubArrays(arr, start + 1, end)

arr = [1, 2, 5, 11, 15]
dictio = []
count = 0
printSubArrays(arr, 0, 0)
print(count)
The most common technique to keep a count is to use a helper function. So you'd have your principal function call a helper like this:
def printSubArrays(arr, start, end):
    return _printSubArrays(arr, start, end, 0)
The 0 at the end is the count.
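Then pass the running count along with every recursive call, increment it only when a subarray satisfies the condition, and return it from the base case:
def _printSubArrays(arr, start, end, count):
    if end == len(arr):
        return count  # base case: hand the final count back up
    elif start > end:
        return _printSubArrays(arr, 0, end + 1, count)
    else:
        dictio = arr[start:end + 1]
        print(dictio)
        if len(dictio) != 1:
            for i in range(len(dictio)):
                aand = 1
                aand = aand & dictio[i]
                if aand % 2 != 0:
                    count = count + 1
        return _printSubArrays(arr, start + 1, end, count)

count = printSubArrays(arr, 0, 0)  # no global variable needed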
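Putting the accumulator pattern together, here is a self-contained sketch. It assumes the intended condition is "the bitwise AND of all elements of a subarray of length 2 or more is odd", which is one reading of the question's aand loop; count_subarrays is a hypothetical name:

```python
def count_subarrays(arr, start=0, end=0, count=0):
    """Recursively visit every subarray arr[start:end + 1] and count those
    of length >= 2 whose bitwise AND is odd (assumed condition)."""
    if end == len(arr):
        return count  # every subarray has been visited
    if start > end:
        # Finished all subarrays ending at `end`; move on to the next end.
        return count_subarrays(arr, 0, end + 1, count)
    sub = arr[start:end + 1]
    if len(sub) >= 2:
        acc = sub[0]
        for x in sub[1:]:
            acc &= x  # AND across the whole subarray
        if acc % 2 == 1:
            count += 1
    return count_subarrays(arr, start + 1, end, count)
```

count_subarrays([1, 2, 5, 11, 15]) returns 3 ([5, 11], [11, 15] and [5, 11, 15]). The recursion depth grows roughly with n²/2, so for long arrays Python's default recursion limit of 1000 is reached quickly and a plain double loop is the safer choice.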

Detect the "outliers"

In a column I have values like 0.7, 0.85, 0.45, etc., but sometimes there is a value such as 2.13 that differs from the majority of the values. How can I spot these "outliers"?
Thank you
Call scipy.stats.zscore(a) with a as a DataFrame to get a NumPy array containing the z-score of each value in a. Call numpy.abs(x) with x as the previous result to convert each element in x to its absolute value. Use the syntax (array < 3).all(axis=1) with array as the previous result to create a boolean array. Filter the original DataFrame with this result.
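import numpy as np
from scipy import stats

z_scores = stats.zscore(df)                         # z-score of every value, per column
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)   # True for rows where every |z| < 3
new_df = df[filtered_entries]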
Alternatively, compute the mean and standard deviation of the set and remove anything more than X (say 2) standard deviations from the mean.
The following C# extension method calculates the sample standard deviation:
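using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class; the name Statistics is arbitrary.
public static class Statistics
{
    // Sample standard deviation (divides by n - 1); returns 0 for fewer than two values.
    public static double StdDev(this IEnumerable<double> values)
    {
        double ret = 0;
        int count = values.Count();
        if (count > 1)
        {
            double avg = values.Average();
            double sum = values.Sum(d => Math.Pow(d - avg, 2));
            ret = Math.Sqrt(sum / (count - 1));
        }
        return ret;
    }
}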
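The mean-and-standard-deviation rule from this answer can be sketched in Python with only the standard library. This is a minimal sketch assuming a threshold of k = 2 sample standard deviations; find_outliers is a hypothetical name:

```python
from statistics import mean, stdev


def find_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations
    from the mean of `values`."""
    if len(values) < 2:
        return []  # stdev needs at least two data points
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1), like StdDev above
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - avg) > k * sd]
```

With a handful of values the outlier inflates the standard deviation itself: among just 0.7, 0.85, 0.45 and 2.13, the value 2.13 sits only about 1.46 standard deviations from the mean and is not flagged at k = 2. With eleven values such as 0.7, 0.85, 0.45, 0.6, 0.75, 0.8, 0.5, 0.65, 0.7, 0.55 and 2.13, it is.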

How to use while loop inside a function?

I decided to modify the following while loop and use it inside a function so that the loop can take any limit instead of 6.
i = 0
numbers = []
while i < 6:
    numbers.append(i)
    i += 1
I wrote the following script so that I can use a variable (more specifically, an argument) instead of 6.
def numbers(limit):
    i = 0
    numbers = []
    while i < limit:
        numbers.append(i)
        i = i + 1
    print numbers

user_limit = raw_input("Give me a limit ")
numbers(user_limit)
When I didn't use raw_input() and simply passed the argument in the script, it worked fine. But now, when I run it (in Microsoft PowerShell), the cursor just keeps blinking after the raw_input() prompt, and I have to press Ctrl+C to abort it. Maybe the function is not getting called after raw_input().
Now it gives a memory error.
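You need to convert user_limit to an int: raw_input() returns a str, while i is an int. In Python 2, comparing an int with a str does not raise an error; a number always compares as less than a string, so i < limit is always True, the loop never ends, and the list grows until memory runs out.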
def numbers(limit):
    i = 0
    numbers = []
    while i < limit:
        numbers.append(i)
        i = i + 1
    print numbers

user_limit = int(raw_input("Give me a limit "))
numbers(user_limit)
Output:
Give me a limit 8
[0, 1, 2, 3, 4, 5, 6, 7]
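In Python 3, raw_input() was renamed input(), and comparing an int with a str raises TypeError instead of silently looping forever. A minimal Python 3 sketch of the fixed function; parse_limit is a hypothetical helper:

```python
def numbers(limit):
    """Return [0, 1, ..., limit - 1] built with a while loop."""
    i = 0
    result = []  # a name distinct from the function avoids shadowing it
    while i < limit:
        result.append(i)
        i += 1
    return result


def parse_limit(text):
    """Convert the str returned by input() to an int before comparing."""
    return int(text.strip())
```

Call it as print(numbers(parse_limit(input("Give me a limit ")))). Returning the list instead of printing it also lets the caller reuse the result.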

Help in optimizing a for loop in matlab

I have a 1 by N double array consisting of 1s and 0s. I would like to map the 1s to the symbols -3 and 3 and the 0s to the symbols -1 and 1, in equal numbers, alternating between the two symbols of each pair. Below is my code. As my array is approximately 1 by 8 million, it is taking a very long time. How can I speed things up?
[row, ll] = size(Data);
sym_zero = -1;
sym_one = -3;
for loop = 1 : row
    if Data(loop,1) == 0
        Data2(loop,1) = sym_zero;
        if sym_zero == -1
            sym_zero = 1;
        else
            sym_zero = -1;
        end
    else
        Data2(loop,1) = sym_one;
        if sym_one == -3
            sym_one = 3;
        else
            sym_one = -3;
        end
    end
end
Here's a very important MATLAB optimization tip.
Preallocate!
Your code is much faster with a simple preallocation. Just add the zeros line before your for loop:
Data2 = zeros(size(Data));
for loop = 1 : row
    ...
end
On my computer your code with preallocation terminated in 0.322s, and your original code is still running. I removed my original solution since yours is pretty fast with this optimization :).
Also, since we're talking about MATLAB, it's faster to work on column vectors.
I hope you can follow this, and that I have understood your code correctly:
nOnes = sum(Data);
nZeroes = numel(Data) - nOnes;
Data2 = zeros(size(Data));
Data2(Data == 1) = repmat([-3 3], 1, nOnes/2);
Data2(Data == 0) = repmat([-1 1], 1, nZeroes/2);
I'll leave it to you to deal with the odd 1s and 0s.
So, disregarding the sign, each output item is Data2(loop,1) = Data(loop,1)*2 + 1 (0 maps to 1 and 1 maps to 3). Why not compute that first with a simple multiplication? It is fast because it can be vectorized. Then create an array the length of the original that is half 1s and half -1s, shuffle it (for example by indexing it with randperm), and multiply by it. Everything is vectorized and should be much faster, although the signs are then randomly balanced rather than strictly alternating.
[row, ll] = size(Data);
Data2 = zeros(size(Data));     % preallocate
sym_zero = -1;
sym_one = -3;
for loop = 1 : row
    if Data(loop,1)            % is 1
        Data2(loop,1) = sym_one;
        sym_one = -sym_one;    % flip the sign
    else
        Data2(loop,1) = sym_zero;
        sym_zero = -sym_zero;  % flip the sign
    end
end
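The strict alternation can also be vectorized without any loop, using a cumulative count: for each element, count how many earlier elements carry the same symbol and take the sign from the parity of that count. This also handles odd numbers of 1s and 0s. The sketch below is written in Python with NumPy rather than MATLAB and assumes the intended behavior is the alternation of the loops above; map_symbols is a hypothetical name. The same idea should carry over to MATLAB's cumsum, though only the Python version is shown:

```python
import numpy as np


def map_symbols(data):
    """Vectorized sketch: 1s alternate -3, 3, -3, ... and 0s alternate
    -1, 1, -1, ..., each in its own order of appearance."""
    data = np.asarray(data)
    ones = data == 1
    # 0-based occurrence index of each element within its own symbol class.
    rank = np.where(ones, np.cumsum(ones), np.cumsum(~ones)) - 1
    sign = np.where(rank % 2 == 0, -1, 1)  # even occurrence -> negative
    magnitude = 2 * data + 1               # 0 -> 1, 1 -> 3
    return sign * magnitude
```

map_symbols([1, 0, 0, 1, 1, 0]) returns [-3, -1, 1, 3, -3, -1], the same sequence the sign-flipping loop produces.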