How to use a look up table in MATLAB - optimization

I need to apply an exponential function of two parameters (one is set, t, and the other comes from the arrays) to a set of 2D arrays (a 3D matrix, if you like).
f(t,x) = exp(t-x)
Then I need to sum the results along the 3rd dimension. Because the entire operation takes too much time with bsxfun, I was thinking of using a lookup table.
I can create the table as a matrix LUT (two-dimensional because of the two parameters) and then retrieve the values using LUT(par1,par2). But accessing the 3rd dimension in a loop is expensive too.
My question is: is there a way to implement such a mechanism (a lookup table) so that I have predefined values and simply look them up using the matrix elements as indices, without loops? Or, how can I create a lookup table that MATLAB handles automatically to speed up the exponential operation?
EDIT:
I actually used similar methods to create the LUT. Now, my problem actually is how to access it in an efficient way.
Let's say I have a 2-dimensional array M, and I want to apply the function f(t,M(i,j)) to its values for a fixed value of t. I can use a loop to go through all the values (i,j) of M. But I want a faster way of doing it, because I have a set of M's, and I need to apply this procedure to all of them.
My function is a little more complex than the example I gave:
pr = mean(exp(-bsxfun(@rdivide,bsxfun(@minus,color_vals,double(I)).^2,m)./2),3);
That is my actual function. As you can see, it is more complex than the example I presented, but the idea is the same: it averages, along the third dimension of the set of M's, the exponential of the difference of two arrays.
Hope that helps.

I agree that the question is not very clear, and that showing some code would help. I'll try anyway.
In order to have a LUT make sense at all, the set of values attained by t-x has to be limited, for example to integers.
Assuming that the exponent can be any integer from -1000 to 1000, you could create a LUT like this:
LUT = exp(-1000:1000);
Then you create your indices (assuming t is a 1D array, and x is a 2D array)
indexArray = bsxfun(@minus,reshape(t,1,1,[]), x) + 1001; %# -1000 turns into 1
Finally, you create your result
output = LUT(indexArray);
%# sum along third dimension (i.e. sum over all `t`)
output = sum(output,3);
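For comparison, the same lookup-table idea can be sketched in NumPy. This is only an illustration under the answer's assumption that t - x takes integer values in a known range; the names `lut`, `diff`, `LO` and `HI`, the narrower range of -50 to 50, and the random test data are all made up for the example.

```python
import numpy as np

# Assumption: t - x only takes integer values in [LO, HI].
LO, HI = -50, 50
lut = np.exp(np.arange(LO, HI + 1, dtype=float))  # lut[k] == exp(LO + k)

rng = np.random.default_rng(0)
x = rng.integers(0, 20, size=(4, 5))   # 2D integer array (hypothetical data)
t = np.array([3, 7, 11])               # 1D array of integer parameters

# Broadcast t along a third axis: diff[i, j, k] = t[k] - x[i, j]
diff = t.reshape(1, 1, -1) - x[:, :, None]

# Shift so the smallest exponent LO maps to index 0, look up, then sum over t.
output = lut[diff - LO].sum(axis=2)

# Reference computed without the table, for comparison.
direct = np.exp(diff.astype(float)).sum(axis=2)
```

The table only pays off when the set of possible exponents is small compared with the number of lookups; for arbitrary real-valued exponents there is nothing to index with.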

I am not sure I understand your question, but I think this is the answer.
x = 0:3
y = 0:2
z = 0:6
[X,Y,Z] = meshgrid(x,y,z)
LUT = (X+Y).^Z


Searching for groups of objects given a reduction function

I have a few questions about a type of search.
First, is there a name and if so what is the name of the following type of search? I want to search for subsets of objects from some collection such that a reduction and filter function applied to the subset is true. For example, say I have the following objects, each of which contains an id and a value.
[A,10]
[B,10]
[C,10]
[D,9]
[E,11]
I want to search for "all the sets of objects whose summed values equal 30" and I would expect the output to be, {{A,B,C}, {A,D,E}, {B,D,E}, {C,D,E}}.
Second, is the only strategy to perform this search brute-force? Is there some type of general-purpose algorithm for this? Or are search optimizations dependent on the reduction function?
Third, if you came across this problem, what tools would you use to solve it in a general way? Assume the reduction and filter functions could be anything and are not necessarily the sum function. Does SQL provide a good API for this type of search? What about Prolog? Any interesting tips and tricks would be appreciated.
Thanks.
I cannot comment on the problem in general, but a brute-force search is easy to write in Prolog.
w(a,10).
w(b,10).
w(c,10).
w(d,9).
w(e,11).
solve(0, [], _).
solve(N, [X], [X|_]) :- w(X, N).
solve(N, [X|Xs], [X|Bs]) :-
    w(X, W),
    W < N,
    N1 is N - W,
    solve(N1, Xs, Bs).
solve(N, [X|Xs], [_|Bs]) :- % skip element if previous clause fails
    solve(N, [X|Xs], Bs).
Which gives
| ?- solve(30, X, [a, b, c, d, e]).
X = [a,b,c] ? ;
X = [a,d,e] ? ;
X = [b,d,e] ? ;
X = [c,d,e] ? ;
(1 ms) no
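A rough Python counterpart of the same brute-force search, for readers who do not use Prolog. It is a sketch, not a translation of the clauses above: it simply enumerates every subset with `itertools.combinations` and keeps those that pass the filter. The names `weights` and `subsets_with_sum` are invented for the example.

```python
from itertools import combinations

# Same facts as the w/2 predicate above.
weights = {"a": 10, "b": 10, "c": 10, "d": 9, "e": 11}

def subsets_with_sum(items, target):
    """Yield every subset (as a tuple of ids) whose values sum to target."""
    ids = sorted(items)
    for size in range(1, len(ids) + 1):
        for combo in combinations(ids, size):
            if sum(items[i] for i in combo) == target:
                yield combo

solutions = list(subsets_with_sum(weights, 30))
```

Replacing the `sum(...) == target` test with any other predicate over the subset gives the general "reduce and filter" search from the question, at the cost of visiting all 2^N subsets.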
SQL is TERRIBLE at this kind of problem. Until recently there was no way to get 'all combinations' of row elements. Now you can do so with recursive common table expressions, but their limitations force you to retain all partial results alongside the final results, and you then have to filter the partial ones out. About the only benefit of SQL's recursive procedure is that you can stop evaluating possible combinations once a sub-path exceeds your target total (30 here). That makes it slightly less ugly than an 'evaluate all 2^N combinations' brute-force solution (unless every combination sums to less than the target total, in which case nothing is pruned).
To solve this with SQL you would be running an algorithm that can be described as:
1) Seed your result set with all table entries less than your target total and their value as a running sum.
2) Iteratively join your prior result with all combinations of table rows that were not already used in the result set and whose value added to the running sum is less than or equal to the target total. The running sum becomes the old running sum plus the value, and the ID is appended to the ID list. Union this new result to the old results. Iterate until no more records qualify.
3) Make a final pass over the result set to filter out the partial sums that do not total to your target.
Oh, and unless you make special provisions, solutions {A,B,C}, {C,B,A}, and {A,C,B} all look like different solutions (order is significant).
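The steps above can be sketched outside SQL to make the control flow concrete. This is a Python illustration of the described algorithm, not a recursive CTE: it assumes all values are non-negative (otherwise pruning on the running sum is not valid), seeds with rows at or below the target, and handles the ordering problem by only appending IDs greater than the last one used. The names `table` and `combos_by_extension` are hypothetical.

```python
def combos_by_extension(items, target):
    """Grow partial combinations step by step, pruning sums above target."""
    rows = sorted(items.items())            # [(id, value), ...] in id order
    # Seed: every single row whose value does not exceed the target.
    frontier = [((i,), v) for i, v in rows if v <= target]
    results = list(frontier)
    while frontier:
        extended = []
        for ids, running in frontier:
            for i, v in rows:
                # Only append ids greater than the last one, so {A,B,C} is
                # produced once rather than once per ordering.
                if i > ids[-1] and running + v <= target:
                    extended.append((ids + (i,), running + v))
        results.extend(extended)            # partial results are retained
        frontier = extended
    # Final pass: drop the partial sums that never reached the target.
    return [ids for ids, running in results if running == target]

table = {"A": 10, "B": 10, "C": 10, "D": 9, "E": 11}
found = combos_by_extension(table, 30)
```

Note that `results` keeps every partial combination until the last line, which mirrors the cost the answer attributes to the SQL approach.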

How to histogram a numeric variable?

I want to produce a simple histogram of a numeric variable X.
I'm having trouble finding a clear example.
Since it's important that the histogram be meaningful more than beautiful, I would prefer to specify the bin-size rather than letting the tool decide. See: Data Scientists: STOP Randomly Binning Histograms
Histograms are a primary tool for understanding the distribution of data. As such, Splunk automatically creates a histogram by default for raw event queries. So it stands to reason that Splunk should provide tools for you to create histograms of your own variables extracted from query results.
It may be that the reason this is hard to find is that the basic answer is very simple:
(your query) |rename (your value) as X
|chart count by X span=1.0
Select "Visualization" and set chart type to "Column Chart" for a traditional vertical-bar histogram.
There is an example of this in the docs described as "Chart the number of transactions by duration".
The span value is used to control binning of the data. Adjust this value to optimize your visualization.
Warning: It is legal to omit span, but if you do so the X-axis will be compacted non-linearly to eliminate empty bins -- this could result in confusion if you aren't careful about observing the bin labels (assuming they're even drawn).
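To make the effect of span and of empty bins concrete, here is a small Python sketch of fixed-width binning. It is not how Splunk implements `chart`; it only illustrates the idea that each value falls into the bin starting at floor(X / span) * span, and that keeping the empty bins is what keeps the axis linear. The function name `histogram` and the sample data are made up.

```python
import math
from collections import Counter

def histogram(values, span):
    """Count values per fixed-width bin; returns {bin_start: count}."""
    counts = Counter(math.floor(v / span) for v in values)
    lo, hi = min(counts), max(counts)
    # Emit every bin between the extremes, including empty ones,
    # so that the horizontal axis stays linear.
    return {k * span: counts.get(k, 0) for k in range(lo, hi + 1)}

data = [0.2, 0.7, 1.1, 4.9, 4.0]   # hypothetical sample
hist = histogram(data, span=1.0)
```

Dropping the zero-count entries from `hist` would place the bins starting at 1.0 and 4.0 next to each other, which is the kind of compacted axis the warning describes.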
If you have a long-tail distribution, it may be useful to partition the results to focus on the range of interest. This can be done using where:
(your query) |rename (your value) as X
|where X>=0 and X<=100
|chart count by X span=1.0
Alternatively, use a clamping function to preserve the out-of-range counts:
(your query) |rename (your value) as X
|eval X=max(0,min(X,100))
|chart count by X span=1.0
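The difference between filtering and clamping can be shown with a few lines of Python. This is only an illustration of the two queries above on invented data: filtering drops the out-of-range values, while clamping piles them into the edge bins so the total count is preserved.

```python
from collections import Counter

def clamp(v, lo, hi):
    """Same idea as eval X=max(0,min(X,100)): pull outliers to the edges."""
    return max(lo, min(v, hi))

data = [-5, 3, 3, 42, 250, 1000]   # hypothetical sample with a long tail

# Equivalent of the where clause: out-of-range values disappear.
filtered = Counter(v for v in data if 0 <= v <= 100)

# Equivalent of the eval clause: out-of-range values land in bins 0 and 100.
clamped = Counter(clamp(v, 0, 100) for v in data)
```

With clamping, a tall bar at 100 is a reminder that there is a tail beyond the plotted range.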
Another way to deal with long-tails is to use a logarithmic span mode -- special values for span include log2 and log10 (documented as log-span).
If you would like to have both a non-default span and a compressed X-axis, there's probably a parameter for that -- but the documentation is cryptic.
I found that this 2-stage approach made that happen:
(your query) |rename (your value) as X
|bin X span=10.0 as X
|chart count by X
Again, this type of chart can be dangerously misleading if you don't pay careful attention to the labels.

What is a use case of `SSWAP`?

In doing some stuff with BLAS operations I see the level 1 operation SSWAP.
I can't come up with a programming use case for this.
My thinking is, if you were passing y to a function but wanted it with the values of x, why not simply pass x? Swapping the values seems rather convoluted.
This is just a question out of curiosity.
Sometimes swapping the content of two (strided) vectors is exactly what you need. For instance when doing row or column interchanges in pivoting during LU factorization -- the reference LAPACK uses xSWAP in xGBTRF. The pivoting algorithm for LU decomposition requires swapping the content of two rows (or columns). These two rows (or columns) can be thought of as two vectors (possibly with non-unit stride between the elements). One needs to do many such interchanges along the way, and the rows gradually change as the factorization proceeds, so there is no option to "just send some other line to a function" at the end of the algorithm.
To sum up, as a basic building block of more complex algorithms, a (potentially) optimized routine for interchanging columns or rows of a matrix seems useful.
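A minimal NumPy sketch of LU factorization with partial pivoting shows where the interchange sits. It is a textbook-style illustration, not the LAPACK code, and it assumes a square, nonsingular matrix; the function name `lu_partial_pivot` and the test matrix are made up. The marked line is the step that a BLAS-based implementation would delegate to xSWAP.

```python
import numpy as np

def lu_partial_pivot(a):
    """Return (perm, lu) with L and U packed in lu, so that a[perm] == L @ U."""
    lu = np.array(a, dtype=float)
    n = lu.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        # Pick the row with the largest pivot candidate in column k.
        p = k + int(np.argmax(np.abs(lu[k:, k])))
        if p != k:
            # Row interchange: this is the job xSWAP does in a BLAS-based code.
            lu[[k, p], :] = lu[[p, k], :]
            perm[[k, p]] = perm[[p, k]]
        # Store the multipliers below the pivot, then update the trailing block.
        lu[k + 1:, k] /= lu[k, k]
        lu[k + 1:, k + 1:] -= np.outer(lu[k + 1:, k], lu[k, k + 1:])
    return perm, lu

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
perm, lu = lu_partial_pivot(a)
lower = np.tril(lu, -1) + np.eye(3)   # unit lower-triangular factor
upper = np.triu(lu)                   # upper-triangular factor
```

The rows being swapped already contain partially eliminated values and stored multipliers, which is why the swap has to move the actual contents rather than just relabel an argument.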

LeavePGroupsOut For multidimensional array

I am working on a research problem and due to a small sized dataset with subjects I am trying to implement Leave N Out style analyses.
Currently I am doing this ad-hoc and I stumbled upon scikit-learn LeavePGroupsOut function.
I read the docs but I am unable to understand how to use it in multidimensional array.
My data are the following: I have 50 subjects, around 20 entries per subject (not fixed) and 20 features per entry with ground-truth value (0 or 1) for every entry.
Well the documentation is actually pretty clear:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeavePGroupsOut.html#sklearn.model_selection.LeavePGroupsOut
In your case you need to concatenate your per-subject arrays so that every entry (row) comes with the index of the group, i.e. the subject, it belongs to. Your feature array will then have roughly 50*20 = 1000 datapoints times 20 features, i.e. shape (1000, 20), and your group array needs to have one label per entry, i.e. shape (1000,).
Then you need to define the cross validation via
lpgo = LeavePGroupsOut(n_groups=n_groups)
It's important to notice that this will result in all possible combinations of left out test groups.
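A small end-to-end sketch of that setup, assuming synthetic data and only 5 subjects so the number of splits stays readable; the variable names and the random data are invented for the example. With 50 subjects and n_groups=2 the same code would produce C(50, 2) = 1225 splits.

```python
import numpy as np
from sklearn.model_selection import LeavePGroupsOut

rng = np.random.default_rng(0)
n_subjects, n_features = 5, 20          # small hypothetical stand-in for 50 subjects
entries_per_subject = rng.integers(3, 6, size=n_subjects)  # not fixed per subject

# One group label per entry: subject 0 repeated for its entries, and so on.
groups = np.repeat(np.arange(n_subjects), entries_per_subject)
X = rng.normal(size=(groups.size, n_features))   # stacked entries, shape (n_entries, 20)
y = rng.integers(0, 2, size=groups.size)         # ground truth 0 or 1 per entry

lpgo = LeavePGroupsOut(n_groups=2)
n_splits = lpgo.get_n_splits(X, y, groups)       # C(5, 2) combinations of subjects

for train_idx, test_idx in lpgo.split(X, y, groups):
    # Every test fold holds exactly two subjects, none of which is in training.
    held_out = set(groups[test_idx])
    assert len(held_out) == 2
    assert held_out.isdisjoint(groups[train_idx])
```

The index arrays returned by `split` refer to rows of the stacked `X`, so `X[train_idx]` and `y[train_idx]` can be passed directly to an estimator.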

Why do we use multiple dimensional arrays?

I understand how multidimensional arrays work and how to use them, except for one thing: in what situations would we need to use them, and why?
Basically multi dimension arrays are used if you want to put arrays inside an array.
Say you got 10 students and each writes 3 tests. You can create an array like: arr_name[10][3]
So, calling arr_name[0][0] gives you the result of student 1 on test 1.
Calling arr_name[5][2] gives you the result of student 6 on test 3.
You can do this with a 30 position array, but the multi dimension is:
1) easier to understand
2) easier to debug.
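The student and test example can be written out as a short Python sketch. The names `scores` and `flat` and the mark 87 are made up; the point is only to compare the two-index access with the index arithmetic a single 30-position array requires.

```python
N_STUDENTS, N_TESTS = 10, 3

# scores[s][t] is the result of student s+1 on test t+1 (zero-based indices).
scores = [[0] * N_TESTS for _ in range(N_STUDENTS)]
scores[5][2] = 87          # student 6, test 3 (hypothetical mark)

# The same data in a single 30-element array needs manual index arithmetic.
flat = [0] * (N_STUDENTS * N_TESTS)
flat[5 * N_TESTS + 2] = 87
```

Both layouts hold the same information; the nested form just keeps the "which student, which test" structure visible at the point of use.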
Here are a couple examples of arrays in familiar situations.
You might imagine a 2-dimensional array as a grid. So naturally it is useful when you're dealing with graphics. You might get a pixel from the screen by saying
pixel = screen[20][5] // get the pixel at the 20th row, 5th column
That could also be done with a 3 dimensional array to represent 3d space.
An array could act like a spreadsheet. Here the rows are customers, and the columns are name, email, and date of birth.
name = customers[0][0]
email = customers[0][1]
dateofbirth = customers[0][2]
Really there is a more fundamental pattern underlying this: things have things, which have things, and so on. And in a sense you're right to wonder whether you need multidimensional arrays, because there are other ways to represent that same pattern. They are just there for convenience. You could alternatively:
Have a single dimensional array and do some math to make it act multidimensional. If you indexed pixels one by one left to right top to bottom you would end up with a million or so elements. Divide by the width of the screen to get the row. The remainder is the column.
Use objects. Instead of using a multidimensional array in example 2 you could have a single dimensional array of Customer objects. Each Customer object would have the attributes name, email and dob.
So there's rarely one way to do something. Just choose the most clear way. With arrays you're accessing by number, with objects you're accessing by name.
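Both alternatives can be sketched briefly in Python. The screen size, the helper names `to_row_col` and `to_index`, and the sample customer are assumptions made for the illustration.

```python
from dataclasses import dataclass

WIDTH, HEIGHT = 1280, 800                    # hypothetical screen size
pixels = [0] * (WIDTH * HEIGHT)              # about a million elements

def to_row_col(index, width=WIDTH):
    """Divide by the width: the quotient is the row, the remainder the column."""
    return divmod(index, width)

def to_index(row, col, width=WIDTH):
    """Inverse mapping: position of (row, col) in the flat array."""
    return row * width + col

@dataclass
class Customer:
    name: str
    email: str
    dob: str

# A one-dimensional array of objects instead of customers[i][0..2].
customers = [Customer("Ada", "ada@example.com", "1815-12-10")]
```

Here `customers[0].email` replaces `customers[0][1]`, which is the "access by name instead of by number" trade-off mentioned above.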
Such a solution comes naturally when you need to access a data element identified by a multidimensional vector, that is, when "which element" is defined by more than one "dimension".
Good uses for 2D arrays might be:
Matrix math, e.g. rotating things in a plane or in space, and more.
Maps like game maps, top or side views for either actual graphics or descriptive data.
Spread Sheet like storage.
Multi Columns of display table data.
Kinds of Graphics work.
I know there could be much more, so maybe someone else can add to this list in their answers.