How to multiply matrix with its transpose using Oracle database and utl_nla - sql

I'm going nuts with this issue. I can't get the result from the following multiplication:
X^t * X
X is an m * n matrix with m = 36 rows and n = 3 columns, represented by a utl_nla_array_dbl datatype. The data originates from a table and gets copied by simple PL/SQL code.
To solve my problem, I chose the method utl_nla.blas_gemm. It's a matrix-matrix method, in contrast to utl_nla.blas_gemv as a matrix-vector method (I got that one working. I was able to multiply that very matrix X by a vector y and received the right result).
Here is the relevant code, which outputs a matrix with the right dimensions (3 x 3) but containing only zeros. To make it clearer I hard-coded most parameters:
utl_nla.blas_gemm(
transa => 'T',
transb => 'N',
m => 3,
n => 3,
k => 36,
alpha => 1.0,
a => X,
lda => 3,
b => X,
ldb => 3,
beta => 0.0,
c => XtX,
ldc => 3);
The variable XtX is also of type utl_nla_array_dbl and is to hold the result.
Any idea what I'm doing wrong? I'll appreciate every contribution since I'm totally stuck and can't find any help elsewhere on the web.

I had the same problem, and after a few days I'm sure that the UTL_NLA.BLAS_GEMM procedure is broken.
It was broken in version 10.2g, and the same error still occurs in version 11.2g.
The problem is in the wrapper procedure written in PL/SQL.
It does not handle the parameters
M, N, K, LDA, LDB, LDC correctly
when one or both of the parameters TRANSA, TRANSB are set to 'T'.
Not surprisingly, it works when the matrix is a square matrix,
for example when the matrix A is 100x100 and the relevant parameter is TRANSA = 'T'.
The procedure UTL_NLA.BLAS_GEMM mishandles the parameters in this case too,
but they are equal, so it has no effect.
The workaround I use is simple: before I call the procedure, I transpose the relevant matrix,
and I always use BLAS_GEMM with the settings TRANSA = 'N' and TRANSB = 'N'.
Unfortunately there is no transpose procedure in the UTL_NLA package (btw. BLAS has one),
but writing one is not a big deal:
PROCEDURE MatTranspose (nRows IN NUMBER, /* number of rows in A */
nCols IN NUMBER, /* number of columns in A */
mat_A IN utl_nla_array_dbl, /* supposed it is stored column-wise i.e. 'C' */
mat_At IN OUT utl_nla_array_dbl) IS
/* the array can be larger than nRows * nCols; the remaining part is not touched in either matrix */
nIii NUMBER;
nJjj NUMBER;
BEGIN
FOR nIii IN 1 .. nRows LOOP
FOR nJjj IN 1 .. nCols LOOP
mat_At (nJjj + nCols * (nIii - 1)) := mat_A (nIii + nRows * (nJjj - 1));
END LOOP;
END LOOP;
END MatTranspose;
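To make the index arithmetic of this workaround easier to check outside the database, here is a minimal Python sketch of the same idea: transpose a column-wise flat array explicitly, then multiply with no transpose flags. The helper names transpose_colmajor and gemm_nn are made up for illustration and are not part of UTL_NLA; the 4 x 2 matrix is just a small stand-in for the 36 x 3 one.

```python
def transpose_colmajor(n_rows, n_cols, a):
    """Return the transpose of an n_rows x n_cols matrix stored column-wise in a flat list."""
    at = [0.0] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            # A(i, j) sits at i + n_rows * j; At(j, i) sits at j + n_cols * i (0-based).
            at[j + n_cols * i] = a[i + n_rows * j]
    return at


def gemm_nn(m, n, k, a, b):
    """C = A * B for column-wise flat lists; A is m x k, B is k x n, no transposes."""
    c = [0.0] * (m * n)
    for col in range(n):
        for row in range(m):
            c[row + m * col] = sum(a[row + m * p] * b[p + k * col] for p in range(k))
    return c


# Small stand-in for the 36 x 3 matrix: X is 4 x 2, stored column by column.
X = [1.0, 2.0, 3.0, 4.0,   # first column
     5.0, 6.0, 7.0, 8.0]   # second column
Xt = transpose_colmajor(4, 2, X)   # 2 x 4
XtX = gemm_nn(2, 2, 4, Xt, X)      # 2 x 2, column-wise
print(XtX)
```

With both matrices passed as 'N', the leading dimension of each operand is simply its own row count (2 for Xt and 4 for X here), which leaves less room for mismatched parameters.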
For me the real pain was the documentation, e.g. e40758.pdf.
It is full of mistakes too (see for instance p. 232-26), and it misled me into thinking I was passing the wrong parameters.
I spent a couple of hours searching the web for a working example but, of course, in vain.
It is probably a simple error in the BLAS_GEMM procedure which would take half an hour to fix,
and yet developers have been waiting more than 6 years for a correct version.

After looking at the spec for UTL_NLA and reading the description of BLAS_GEMM, it looks to me like LDA and LDB should be 36. Try changing those and see if it helps.
Share and enjoy.


Scilab for Cutting Stock Algorithm

I'm new to Scilab (and programming in general). I'm trying to implement a Scilab code to solve the cutting stock problem aka 'bin packing'.
The problem: given 'n' items of sizes[] (a vector from s1 to sn), and same capacity for all bins (c=1000), I need to minimize the number of bins required to fit all items.
I'm trying the 'next item algorithm', i.e., pick the first item from the vector, put it in the bin, then pick the next item and try to put it in the same bin; if there is not enough space, create another bin.
Actually I don't need help improving the algorithm, but rather implementing the code for this specific one.
Here is what I've tried so far:
// 'n' is the number of items to be packed
// 'c' is the capacity (how much items fit in the bin)
// 'sizes' a vector containing the size of n items
// 'nbins' number of bins used so far
// 'bin_rem' space left in current bin
sizes=[400,401,402,403,404,405,406,408,409,411,428,450,482]
c=1000
n=length(sizes)
nbins = 0
bin_rem = c
function y = bins(sizes,c,n)
for i=0; i<n; i=i+1
if sizes[i] > bin_rem
nbins=nbins+1
bin_rem = c - sizes(i)
bin_rem = bin_rem - sizes(i)
end
endfunction
disp ("Number of bins needed "+string(bins([sizes,c,n])))
end
I'm stuck with this error below and have no idea on how to solve it.
at line 20 of executed file
endfunction
^~~~~~~~~~~^
Error: syntax error, unexpected endfunction, expecting end
Any help?
First, seems like you still don't quite understand Scilab's syntax, since I see you using sizes[i], instead of sizes(i), and calling bins([sizes,c,n]). So, for now, try not to use functions. As for the error you get, it happens because you forgot one end. The way you wrote your code, only the if statement is closed, and for loop is still open.
Secondly, when you correct that, you'll notice that your program does not work properly, and that is because you defined the loop wrong. In Scilab, the for loop is actually a "for each" loop, therefore, you need to provide a full range of values for each iteration, instead of starting value (i=0), condition (i<n) and increment function (i=i+1).
Thirdly, seems like you understand the algorithm you're trying to use, but you implemented it wrong: the last line inside the loop should be the else statement.
Solving all those, you have the following piece of code:
sizes=[400,401,402,403,404,405,406,408,409,411,428,450,482]
c=1000
n=length(sizes)
nbins = 0 //should start as 0
bin_rem = 0 //should start as 0
for i = 1:n
if sizes(i) > bin_rem
nbins = nbins + 1;
bin_rem = c - sizes(i);
else
bin_rem = bin_rem - sizes(i);
end
end
disp ("Number of bins needed "+string(nbins))
Just to clarify, 1:n means a vector from 1 to n with a step of 1. For n = 10, for example, you could have also written for i = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
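For comparison, the same next-item logic can be sketched in Python. This is only an illustration of the corrected loop above, not a better packing algorithm; the function name next_fit is made up here, and the sketch assumes that no single item is larger than the bin capacity.

```python
def next_fit(sizes, capacity):
    """Count bins used when each item goes into the current bin or opens a new one."""
    bins_used = 0
    remaining = 0  # no bin is open yet, so nothing fits until the first one is created
    for size in sizes:
        if size > remaining:
            # The item does not fit: open a new bin and place the item in it.
            bins_used += 1
            remaining = capacity - size
        else:
            remaining -= size
    return bins_used


sizes = [400, 401, 402, 403, 404, 405, 406, 408, 409, 411, 428, 450, 482]
print("Number of bins needed", next_fit(sizes, 1000))  # prints 7 for this data
```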

Are calculations involving a large matrix using arrays in VBA faster than doing the same calculation manually in Excel?

I am trying to do calculations as part of regression model in Excel.
I need to calculate ((X^T)WX)^(-1)(X^T)WY. Where X, W, Y are matrices and ^T and ^-1 are denoting the matrix transpose and inverting operation.
Now when X, W, Y are of small dimensions I simply run my macro which calculates these values very very fast.
However sometimes I am dealing with the case when say, the dimensions of X, W, Y are 5000 X 5, 5000 X 1 and 5000 X 1 respectively, then the macro can take a lot longer to run.
I have two questions:
Instead of using my macro, which generates the matrices on Excel sheets and then uses Excel formulas like MMULT and MINVERSE to calculate the output, would it be faster for larger matrices if I used arrays in VBA to do all the calculations? (I am not too sure how arrays work in VBA, so I don't actually know whether it would touch the worksheet at all, and hence whether it would be any quicker or less computationally intensive.)
If the answer to the above question is no, does anybody have an idea how to speed such calculations up? Or do I simply need to put up with it and wait?
Thanks for your time.
Considering that the algorithm of the code is the same, the speed ranking is the following:
Dll custom library with C#, C++, C, Java or anything similar
VBA
Excel
I have compared a VBA vs C++ function here, in the long term the result is really bad for VBA.
So, the following Fibonacci with recursion in C++:
int __stdcall FibWithRecursion(int & x)
{
int k = 0;
int p = 0;
if (x == 0)
return 0;
if (x == 1)
return 1;
k = x - 1;
p = x - 2;
return FibWithRecursion(k) + FibWithRecursion(p);
}
runs far faster, when called from Excel, than a function of the same complexity in VBA:
Public Function FibWithRecursionVBA(ByRef x As Long) As Long
Dim k As Long: k = 0
Dim p As Long: p = 0
If (x = 0) Then FibWithRecursionVBA = 0: Exit Function
If (x = 1) Then FibWithRecursionVBA = 1: Exit Function
k = x - 1
p = x - 2
FibWithRecursionVBA = FibWithRecursionVBA(k) + FibWithRecursionVBA(p)
End Function
Better late than never:
I use matrices that are bigger, 3 or 4 dimensions, sized like 16k x 26 x 5.
I run through them to find data, apply one or two formulas or make combos with other matrices.
Number one: after starting the macro, open another application like Notepad; you might get a nice speed increase ☺!
Then, I guess you switched off screen updating etc., and turned off automatic calculation.
Last: don't put the data in cells, but in arrays.
Just something like:
Dim Matrix1() As String ' put it in the declarations section if you want to use it in other macros as well. Remember you cannot do "blabla=activecell.value2" etc. anymore!!
In the "Sub()" code, use ReDim Matrix1(1 to a_value, 1 to 2nd_value, ... , 1 to last_value)
Matrix1(45,32,63)="what you want to put there"
After running, just drop the
Matrix1(1 to a_value, 1 to 2nd_value,1) at 1st sheet,
Matrix1(1 to a_value, 1 to 2nd_value,2) at 2nd sheet, etc
Switch on screen updating again, etc
In this way my calculation went from 45 minutes to just one, by avoiding the intermediate screen updates.
Good luck, I hope it is useful for somebody.
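To illustrate what "doing everything in arrays" can look like for the regression in the question, here is a rough Python sketch of ((X^T)WX)^(-1)(X^T)WY. It assumes that W is a column of per-row weights (the question gives it as 5000 X 1), so the diagonal matrix is never built and only a small p x p system has to be solved. The function name weighted_least_squares is made up, and the sketch assumes X has full column rank.

```python
def weighted_least_squares(X, w, y):
    """Solve (X^T W X) beta = X^T W y with W = diag(w), without building W."""
    p = len(X[0])
    # Accumulate the small p x p matrix and the p-vector one data row at a time.
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for row, wi, yi in zip(X, w, y):
        for r in range(p):
            b[r] += wi * row[r] * yi
            for c in range(p):
                A[r][c] += wi * row[r] * row[c]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (b[r] - s) / A[r][r]
    return beta


# Toy data lying exactly on y = 2 + 3 * t, with arbitrary positive weights.
X = [[1.0, float(t)] for t in range(6)]
y = [2.0 + 3.0 * t for t in range(6)]
w = [1.0, 2.0, 1.0, 0.5, 1.0, 3.0]
beta = weighted_least_squares(X, w, y)
print(beta)
```

The same accumulate-then-solve structure carries over to VBA arrays: one pass over the 5000 rows builds a 5 x 5 system, and nothing is written to a worksheet until the final coefficients are known.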

How do I read numbers from a file in Fortran and immediately do calculations with each numbers?

I'm totally new to Fortran, and I'm trying to learn the language here:
http://www.fortrantutorial.com/files-precision/index.php. I have some basic experience with C and Python, but not much, like introduction class and such.
So in exercises 4.1, they ask me to input some numbers from a file, and check if these numbers are even or odd. Here is the code to input the numbers:
program readdata
implicit none
!reads data from a file called mydata.txt
real :: x,y,z
open(10,file='mydata.txt')
read(10,*) x,y,z
print *,x,y,z
end program readdata
The file mydata.txt contains some random numbers. And they can check if the number is even or odd by:
if (mod(num,2)>0) then……
My question is this: if the file has, say, 10 or 1000 numbers, do I have to manually assign every single one of them? Is there any other way for me to do quick calculations with a large amount of numbers like that?
Every read also moves the read pointer forwards. So with every new read, a new line is read in from the file.
The easiest thing to do is to keep reading until the READ statement returns an error. Of course, you have to pass a variable for the READ to write its error into. Something like this:
program readdata
implicit none
real :: x, y, z
integer :: iounit, ios
open(newunit=iounit, file='mydata.txt', iostat=ios, action='READ')
if (ios /= 0) STOP 1
do
read(iounit, *, iostat=ios) x, y, z
if (ios /= 0) exit
print *, x, y, z
end do
close(iounit)
end program readdata
Update: If you're limited to Fortran 95, as OP suggested in his comments, here's what to change: Instead of
integer :: iounit, ios
open(newunit=iounit, ...)
you use
integer :: ios
integer, parameter :: iounit = 100
open(unit=iounit, ...)
All that's important is that iounit is a number, greater than 10, which is not used as a unit for any other read/write operation.
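Since the question mentions some Python experience, the same "keep reading until the input runs out, and process each value as it arrives" pattern might look like this in Python. It is only a sketch: it assumes the file holds whitespace-separated integers, and it uses an in-memory stream in place of mydata.txt so that it runs on its own. The function name classify_numbers is made up.

```python
import io


def classify_numbers(stream):
    """Read every number from a text stream and label it even or odd."""
    results = []
    for line in stream:  # the loop ends by itself at end of file
        for token in line.split():
            value = int(token)
            results.append((value, "even" if value % 2 == 0 else "odd"))
    return results


# Stand-in for open('mydata.txt'); any number of lines works the same way.
sample = io.StringIO("3 8 15\n22 7 40\n")
for value, kind in classify_numbers(sample):
    print(value, kind)
```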
Well, a mass of numbers means that X, Y, and Z each need to hold more than a single number, so something like this should get you vectored towards a solution:
program readdata
implicit none
!reads data from a file called mydata.txt
INTEGER, PARAMETER :: max2process = 10000
INTEGER :: I
INTEGER :: K = 0
real , DIMENSION(max2process) :: x,y,z
LOGICAL, DIMENSION(max2process) :: ODD = .FALSE.
open(10,file='mydata.txt')
10 CONTINUE
X(:) = 0
Y(:) = 0
Z(:) = 0
DO I = 1, max2process
K = K + 1
read(10,*, END=99) x(I), y(I), z(I)
print *,x(I), y(I), z(I)
ENDDO
! after the loop I equals max2process + 1, so use max2process for the bounds
WRITE(*,22) max2process, MINVAL(X(1:max2process)), MAXVAL(X(1:max2process))
22 FORMAT(' MIN(X(1:',I5,')=',0PE12.5,' Max=',0PE12.5)
odd = .FALSE.
WHERE (MODULO(X, 2.0) /= 0)
ODD = .TRUE.
ENDWHERE
DO I = 1, 10
WRITE(*,88) I, X(I), odd(I)
88 FORMAT('x(', I2,')=',0PE12.5,' odd="',L1,'"')
ENDDO
WRITE(*,*) ' we did not hit the end of file... Go back and read some more'
GOTO 10
99 WRITE(*,*) 'END of file reached at ', (K-1)
end program readdata
Basically something like that, but I probably have a typo (LTR). If MODULO is ELEMENTAL then you can do it easily... And MODULO is ELEMENTAL, so you are in luck (which is common).
There is an example at https://rosettacode.org/wiki.Even_or_Odd#Fortran that looks nice. The functions could be further enhanced using ELEMENTAL FUNCTION.
That should give you enough to work it out.
Or you could read the file to find the size, close and reopen it, and then allocate X, Y, Z and Odd... The example I gave was more of a streaming example. One should generally use a DO WHILE rather than a GOTO, but when starting out a GOTO can be conceptually easier. (You need an EXIT or some DO WHILE (IOSTAT /= ) in order to break out.)

Smooth Coloring Mandelbrot Set Without Complex Number Library

I've coded a basic Mandelbrot explorer in C#, but I have those horrible bands of color, and it's all greyscale.
I have the equation for smooth coloring:
mu = N + 1 - log (log |Z(N)|) / log 2
Where N is the escape count, and |Z(N)| is the modulus of the complex number after the value has escaped, it's this value which I'm unsure of.
My code is based off the pseudo code given on the wikipedia page: http://en.wikipedia.org/wiki/Mandelbrot_set#For_programmers
The complex number is represented by the real values x and y, using this method, how would I calculate the value of |Z(N)| ?
|Z(N)| means the distance to the origin, so you can calculate it via sqrt(x*x + y*y).
If you run into an error with the logarithm, check the iteration count first. If the point is part of the Mandelbrot set (iteration = max_iteration), |Z(N)| never exceeded the escape radius, so the inner logarithm can be zero or negative and the outer logarithm is then undefined.
So just use this snippet instead of your old return code:
if (i < iterations)
{
return i + 1 - Math.Log(Math.Log(Math.Sqrt(x * x + y * y))) / Math.Log(2);
}
return i;
Later, you should divide i by the max_iterations and multiply it with 255. This will give you a nice rgb-value.
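Put together with the escape loop, the whole calculation can be sketched as follows. This is a Python illustration rather than the asker's C# code; the function name smooth_iteration and its default parameters are assumptions. It keeps the complex number as two plain floats x and y, and it relies on an escape radius of at least 2 so that log |Z(N)| is positive whenever the point escapes.

```python
import math


def smooth_iteration(cx, cy, max_iter=100, escape_radius=2.0):
    """Return the smooth escape count mu for c = cx + i*cy, or max_iter if c never escapes."""
    x = y = 0.0
    i = 0
    while x * x + y * y <= escape_radius * escape_radius and i < max_iter:
        # z = z^2 + c written out in real and imaginary parts
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
        i += 1
    if i < max_iter:
        modulus = math.sqrt(x * x + y * y)  # |Z(N)|, greater than escape_radius here
        return i + 1 - math.log(math.log(modulus)) / math.log(2)
    return float(i)


print(smooth_iteration(0.0, 0.0))  # inside the set: returns max_iter
print(smooth_iteration(2.0, 2.0))  # escapes after one step: a fractional value
```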

CUDAFunctionLoad in Mathematica - Indexing problem

I am trying to debug an index problem I am having on my CUDA machine
Cuda Machine Info:
{1->{Name->Tesla C2050,Clock Rate->1147000,Compute Capabilities->2.,GPU Overlap->1,Maximum Block Dimensions->{1024,1024,64},Maximum Grid Dimensions->{65535,65535,65535},Maximum Threads Per Block->1024,Maximum Shared Memory Per Block->49152,Total Constant Memory->65536,Warp Size->32,Maximum Pitch->2147483647,Maximum Registers Per Block->32768,Texture Alignment->512,Multiprocessor Count->14,Core Count->448,Execution Timeout->0,Integrated->False,Can Map Host Memory->True,Compute Mode->Default,Texture1D Width->65536,Texture2D Width->65536,Texture2D Height->65535,Texture3D Width->2048,Texture3D Height->2048,Texture3D Depth->2048,Texture2D Array Width->16384,Texture2D Array Height->16384,Texture2D Array Slices->2048,Surface Alignment->512,Concurrent Kernels->True,ECC Enabled->True,Total Memory->2817982462},
All this code does is set the values of a 3D array equal to the index that CUDA is using:
__global__ void cudaMatExp(
float *matrix1, float *matrixStore, int lengthx, int lengthy, int lengthz){
long UniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;
long index = UniqueBlockIndex * blockDim.z * blockDim.y * blockDim.x +
threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x +
threadIdx.x;
if (index < lengthx*lengthy*lengthz) {
matrixStore[index] = index;
}
}
For some reason, once the dimension of my 3D array becomes too large, the indexing stops.
I have tried different block dimensions (blockDim.x by blockDim.y by blockDim.z):
8x8x8 only gives correct indexing up to array dimension 12x12x12
9x9x9 only gives correct indexing up to array dimension 14x14x14
10x10x10 only gives correct indexing up to array dimension 15x15x15
For dimensions larger than these all of the different block sizes eventually start to increase again, but they never reach a value of dim^3-1 (which is the maximum index that the cuda thread should reach)
Here are some plots that illustrate this behavior:
For example, this plots the dimension x of the 3D array (which is x by x by x) on the x axis, and the maximum index number processed during the CUDA execution on the y axis. This particular plot is for block dimensions of 10x10x10.
Here is the (Mathematica) code to generate that plot, but when I ran this one, I used block dimensions of 1024x1x1:
CUDAExp = CUDAFunctionLoad[codeexp, "cudaMatExp",
{{"Float", _,"Input"}, {"Float", _,"Output"},
_Integer, _Integer, _Integer},
{1024, 1, 1}]; (*These last three numbers are the block dimensions*)
max = 100; (* the maximum dimension of the 3D array *)
hold = Table[1, {i, 1, max}];
compare = Table[i^3, {i, 1, max}];
Do[
dim = ii;
AA = CUDAMemoryLoad[ConstantArray[1.0, {dim, dim, dim}], Real,
"TargetPrecision" -> "Single"];
BB = CUDAMemoryLoad[ConstantArray[1.0, {dim, dim, dim}], Real,
"TargetPrecision" -> "Single"];
hold[[ii]] = Max[Flatten[
CUDAMemoryGet[CUDAExp[AA, BB, dim, dim, dim][[1]]]]];
, {ii, 1, max}]
ListLinePlot[{compare, Flatten[hold]}, PlotRange -> All]
This is the same plot, but now plotting x^3 to compare to where it should be. Notice that it diverges after the dimension of the array is >32
I test the dimensions of the 3D array and look at how far the indexing goes and compare it with dim^3-1. E.g. for dim=32, the cuda max index is 32767 (which is 32^3 -1), but for dim=33 the cuda output is 33791 when it should be 35936 (33^3 -1). Notice that 33791-32767 = 1024 = blockDim.x
Question:
Is there a way to correctly index an array with dimensions larger than the block dimensions in Mathematica?
Now, I know that some people use __mul24(threadIdx.y,blockDim.x) in their index equation to prevent errors in bit multiplication, but it doesn't seem to help in my case.
Also, I have seen someone mention that you should compile your code with -arch=sm_11 because by default it's compiled for compute capability 1.0. I don't know if this is the case in Mathematica though. I would assume that CUDAFunctionLoad[] knows to compile with 2.0 capability. Any one know?
Any suggestions would be extremely helpful!
So, Mathematica has a somewhat hidden way of dealing with grid dimensions. To fix the grid dimension to something that will work, you have to add another number to the end of the arguments of the function you are calling.
The argument denotes the number of threads to launch (grid dimension times block dimension).
For example, in my code above:
CUDAExp =
CUDAFunctionLoad[codeexp,
"cudaMatExp", {
{"Float", _, "Input"}, {"Float", _,"Output"},
_Integer, _Integer, _Integer},
{8, 8, 8}, "ShellOutputFunction" -> Print];
{8, 8, 8} denotes the dimensions of the block.
When you call CUDAExp[] in Mathematica, you can add an argument that denotes the number of threads to launch.
In this example I finally got it to work with the following:
(* AA and BB are 3D arrays of 0 with dimensions dim x dim x dim *)
dim = 64;
CUDAExp[AA, BB, dim, dim, dim, 4096];
Note that when you compile with CUDAFunctionLoad[], the function only expects 5 inputs: the first is the array you pass it (of dimensions dim x dim x dim) and the second is where the result is stored. The third, fourth, and fifth are the dimensions.
When you pass it a 6th, Mathematica interprets it as gridDim.x * blockDim.x. Since I know I need gridDim.x = 512 for every element in the array to be dealt with (64^3 = 262144 elements divided by 8 * 8 * 8 = 512 threads per block), I set this number to 512 * 8 = 4096.
I hope this is clear and useful to someone in the future that comes across this issue.
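The arithmetic behind that extra argument can be sketched in a few lines of Python. This only reproduces the reasoning in this answer (the extra argument being read as gridDim.x * blockDim.x is the answer's description, not something verified here), and the helper name launch_size is made up.

```python
def launch_size(dims, block_dims):
    """Return (blocks needed, gridDim.x * blockDim.x) so every array element gets a thread."""
    total_elements = dims[0] * dims[1] * dims[2]
    threads_per_block = block_dims[0] * block_dims[1] * block_dims[2]
    # Round up so a partially filled last block is still launched.
    blocks_needed = -(-total_elements // threads_per_block)
    return blocks_needed, blocks_needed * block_dims[0]


print(launch_size((64, 64, 64), (8, 8, 8)))      # (512, 4096)
print(launch_size((33, 33, 33), (1024, 1, 1)))   # (36, 36864)
```

The second call corresponds to the dim = 33 case from the question with 1024x1x1 blocks: 35937 elements need 36 blocks of 1024 threads.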