I have a vector
[2 3 4]
That I need to multiply with a matrix
1 1 1
2 2 2
3 3 3
to get
2 3 4
4 6 8
6 9 12
Now, I can make the vector into a matrix and do an element-wise multiplication, but is there also an efficient way to do this in MKL / CBLAS?
Yes, oneMKL has a function called cblas_?gemv, which computes a matrix-vector product.
You can refer to the link below for more details on how to use the function.
https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-routines/blas-level-2-routines/cblas-gemv.html
If you have installed oneMKL on your system, take a look at the bundled examples, which help you better understand how to use the functions available in the library.
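To pin down the requested operation: the target matrix is M·diag(v), where column j of the matrix is scaled by v[j]. The following NumPy sketch is only an illustration (NumPy is not MKL) and gives a reference result for checking an MKL implementation. It also shows that a plain matrix-vector product, which is what a gemv-style routine computes, returns a length-3 vector rather than the 3x3 matrix asked for.

```python
import numpy as np

v = np.array([2, 3, 4])
M = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])

# Element-wise product with broadcasting: column j of M is scaled by v[j].
scaled = M * v
# Same result written as a matrix product with a diagonal matrix.
via_diag = M @ np.diag(v)
# A plain matrix-vector product (what gemv computes) yields a vector instead.
mv = M @ v

print(scaled)
print(mv)
```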
(Dyalog) APL learner question
If I have a matrix Y:
Y
4 9 2
3 5 7
8 1 6
I can get two of its members like this:
Y[(1 1) (2 2)]
4 5
I can use the same technique using dfn syntax:
{⍵[(1 1) (2 2)]}Y
4 5
I, however, can't work out how to do the equivalent in a tacit function. In particular it seems that bracket indexing doesn't work in a tacit function, and I can't find a way of using squad indexing with list of indexes.
Is there a way of doing this, or is this a limitation of tacit functions?
Note that in my real example the list of indexes is generated, so I can't simply do (((1 1)⌷⊢),(2 2)⌷⊢)Y or anything similar.
(1 1)(2 2)⌷¨⊂Y
works, and so does
(1 1)(2 2)⊃⍤0 99⊢Y
The first thing one might try is
Y ← 3 3⍴⍳9
Y
1 2 3
4 5 6
7 8 9
Y[(1 1)(2 2)]
1 5
1 1⌷Y
1
(1 1)(2 2)⌷Y
2 2
2 2
But we see that (1 1)(2 2)⌷Y doesn't work. What is happening is that ⌷ treats each vector on its left as the indices along one axis and builds all combinations of them. Here (1 1) selects rows and (2 2) selects columns, so we get the index pairs (1 2), (1 2), (1 2) and (1 2) again, which is just a 2 by 2 matrix of 2s.
It might be easier to see it like this:
(1 2)3⌷Y
3 6
(1 2)3⌷ means "from the first and second rows, give me the element in the 3rd column".
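For readers coming from Python, a NumPy analogy (not APL) may help: np.ix_ builds the same "all combinations" selection that ⌷ does, while indexing with two parallel index lists picks one element per (row, column) pair, like bracket indexing with pairs or ⌷¨. This is a sketch only; note that NumPy indices start at 0, so APL index 1 becomes 0.

```python
import numpy as np

# NumPy analogue of the APL matrix Y ← 3 3⍴⍳9.
Y = np.arange(1, 10).reshape(3, 3)

# Like (1 2)3⌷Y: every combination of rows {0, 1} with column {2}.
outer = Y[np.ix_([0, 1], [2])]
# Like (1 1)(2 2)⌷Y: rows (0, 0) crossed with columns (1, 1), a 2x2 block of Y[0, 1].
repeated = Y[np.ix_([0, 0], [1, 1])]
# Like Y[(1 1)(2 2)] or (1 1)(2 2)⌷¨⊂Y: one element per (row, col) pair.
pairs = Y[[0, 1], [0, 1]]

print(outer, repeated, pairs, sep="\n")
```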
Therefore, if you want to give the indices as a list of pairs, you need to use the each operator ¨ with ⌷:
(1 1)(2 2)⌷¨⊂Y
1 5
If you really want that as a tacit function, you can use
I ← ⌷¨∘⊂
As other answer(s) have shown, there are more alternatives to indexing. I can also recommend the following webinar on indexing: https://dyalog.tv/Webinar/?v=AgYDvSF2FfU .
Take your time to go through the alternatives in the video, APL isn't like Python: in APL, there's generally more than one obvious way to do it :)
I have a dataframe of a few hundred rows that can be grouped by id as follows:
df = Val1 Val2 Val3 Id
2 2 8 b
1 2 3 a
5 7 8 z
5 1 4 a
0 9 0 c
3 1 3 b
2 7 5 z
7 2 8 c
6 5 5 d
...
5 1 8 a
4 9 0 z
1 8 2 z
I want to use GridSearchCV, but with a custom CV that ensures all the rows with the same Id are always in the same set.
So either all the rows of a are in the test set, or all of them are in the train set, and likewise for every other Id.
I want to have 5 folds, so 80% of the ids go to the train set and 20% to the test set.
I understand this can't guarantee that all folds have exactly the same number of rows, since one Id might have more rows than another.
What is the best way to do this?
As stated, you can pass an iterable of splits to cv. You can use GroupShuffleSplit(): once you use it to split your dataset, pass the resulting splits to GridSearchCV() as the cv parameter.
As mentioned in the sklearn documentation, there's a parameter called "cv" where you can provide "An iterable yielding (train, test) splits as arrays of indices."
Do check out the documentation in future first.
As mentioned previously, GroupShuffleSplit() splits data based on group labels. However, its test sets aren't necessarily disjoint (i.e. over multiple splits, an ID may appear in several test sets). If you want each ID to appear in exactly one test fold, you can use GroupKFold(). It is also available in sklearn.model_selection and extends KFold to take group labels into account.
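A minimal sketch of the GroupKFold route, assuming scikit-learn and pandas. The toy dataframe, the target y and the Ridge estimator with its alpha grid are hypothetical stand-ins for the real data and model. The splits are precomputed and passed to cv as an iterable of (train, test) index arrays, as the documentation quoted above describes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, GroupKFold

# Hypothetical toy data standing in for the question's dataframe:
# three feature columns plus an "Id" column that defines the groups.
rng = np.random.default_rng(0)
ids = np.repeat(list("abcdefghij"), [3, 5, 2, 4, 6, 3, 2, 5, 4, 6])
df = pd.DataFrame({
    "Val1": rng.integers(0, 10, len(ids)),
    "Val2": rng.integers(0, 10, len(ids)),
    "Val3": rng.integers(0, 10, len(ids)),
    "Id": ids,
})
# Hypothetical target; replace with the real label column.
y = df["Val1"] + 2 * df["Val2"] + rng.normal(0, 0.5, len(df))
X = df[["Val1", "Val2", "Val3"]]
groups = df["Id"]

# Precompute (train, test) index pairs: each Id lands in exactly one
# test fold and never appears on both sides of the same split.
splits = list(GroupKFold(n_splits=5).split(X, y, groups=groups))

search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, cv=splits)
search.fit(X, y)
print(search.best_params_)
```

Because whole groups are assigned to folds, the folds will contain roughly, not exactly, 20% of the rows each, as expected in the question.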
I want to fit a Poisson distribution to my data points and use a chi-square test to decide whether to accept or reject this proposed distribution. I only used 10 observations. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from scipy.stats import chisquare

# Fitting function:
def Poisson_fit(x, a):
    return a * np.exp(-x)

# Code
# x: the observed data points (not shown in the question)
hist, bins = np.histogram(x, bins=10, density=True)
print("hist: ", hist)
# hist: [5.62657158e-01, 5.14254073e-01, 2.03161280e-01, 5.84898068e-02,
#        1.35995217e-02, 2.67094169e-03, 4.39345778e-04, 6.59603327e-05,
#        1.01518320e-05, 1.06301906e-06]
XX = np.arange(len(hist))
print("XX: ", XX)
# XX: [0 1 2 3 4 5 6 7 8 9]
plt.scatter(XX, hist, marker='.', color='red')
popt, pcov = optimize.curve_fit(Poisson_fit, XX, hist)
# x_data: plotting grid, defined elsewhere in the original code
plt.plot(x_data, Poisson_fit(x_data, *popt), linestyle='--', color='red',
         label='Fit')
print("hist: ", hist)
plt.xlabel('s')
plt.ylabel('P(s)')

# Chi-square test:
f_obs = hist
# f_obs: [5.62657158e-01, 5.14254073e-01, 2.03161280e-01, 5.84898068e-02,
#         1.35995217e-02, 2.67094169e-03, 4.39345778e-04, 6.59603327e-05,
#         1.01518320e-05, 1.06301906e-06]
f_exp = Poisson_fit(XX, *popt)
# f_exp: [6.76613820e-01, 2.48912314e-01, 9.15697229e-02, 3.36866185e-02,
#         1.23926144e-02, 4.55898806e-03, 1.67715798e-03, 6.16991940e-04,
#         2.26978650e-04, 8.35007789e-05]
chi, p_value = chisquare(f_obs, f_exp)
print("chi: ", chi)
print("p_value: ", p_value)
# chi: 0.4588956658201067
# p_value: 0.9999789643475111
I am using 10 observations, so the degrees of freedom would be 9. For 9 degrees of freedom I can't find my p-value and chi-square value in a chi-square distribution table. Is there something wrong in my code, or are my input values so small that the test fails? If the p-value is > 0.05, the distribution is accepted. Although the p-value is large (0.999), I can't find the chi-square value 0.4588 in the table. I think there is something wrong in my code. How do I fix this?
Is the returned chi value the critical value of the tails? How do I check the proposed hypothesis?
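For reference, scipy.stats.chisquare returns the statistic sum((O - E)**2 / E) together with its upper-tail p-value, using k - 1 - ddof degrees of freedom for k categories. The number printed in a chi-square table is a different quantity: the critical value chi2.ppf(1 - alpha, df). The sketch below uses made-up counts (hypothetical, not the question's data) to check both by hand. It assumes raw bin counts are passed, since chisquare is defined for observed frequencies rather than densities, and that one parameter was fitted from the data, hence ddof=1.

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Hypothetical observed counts in k = 5 bins (raw counts, not densities).
f_obs = np.array([18, 30, 25, 17, 10])
# Hypothetical expected counts; chisquare expects them to sum to the same total.
f_exp = np.array([20.0, 27.0, 24.0, 19.0, 10.0])

# ddof: number of parameters estimated from the data (e.g. 1 for a fitted rate).
ddof = 1
df = len(f_obs) - 1 - ddof

stat, p_value = chisquare(f_obs, f_exp, ddof=ddof)

# The same statistic and p-value computed by hand.
manual_stat = np.sum((f_obs - f_exp) ** 2 / f_exp)
manual_p = chi2.sf(manual_stat, df)

# The table value to compare the statistic against, at alpha = 0.05.
critical = chi2.ppf(0.95, df)
print(stat, p_value, critical)
print("reject H0" if stat > critical else "do not reject H0")
```

Since the statistic grows linearly with the total count, feeding normalized densities instead of counts can shrink it toward 0 and push the p-value toward 1.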
What's the difference between summation and concatenation in neural networks like CNNs?
For example, GoogLeNet's Inception module uses concatenation, and ResNet's residual learning uses summation.
Please explain.
Concatenation means joining two blobs end to end, so after the concat we have a bigger blob that contains both previous blobs in contiguous memory. For example:
blob1:
1
2
3
blob2:
4
5
6
blob_res:
1
2
3
4
5
6
Summation means element-wise summation: blob1 and blob2 must have exactly the same shape, and the resulting blob has that same shape, with elements a1+b1, a2+b2, ..., ai+bi, ..., an+bn.
For the example above,
blob_res:
5 (1+4)
7 (2+5)
9 (3+6)
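The two operations can be sketched in NumPy, first on the 1-D blobs above and then on CNN-like feature maps in (channels, height, width) layout. The channel counts (64 and 32) and the 8x8 spatial size are illustrative assumptions, not taken from GoogLeNet or ResNet. Concatenation along the channel axis adds up the channel counts, while a residual sum needs identical shapes and keeps the shape unchanged.

```python
import numpy as np

blob1 = np.array([1, 2, 3])
blob2 = np.array([4, 5, 6])

concat = np.concatenate([blob1, blob2])  # [1 2 3 4 5 6], length 6
summed = blob1 + blob2                   # [5 7 9], same shape as the inputs

# Hypothetical feature maps in (channels, height, width) layout.
a = np.ones((64, 8, 8))
b = np.ones((32, 8, 8))
c = np.ones((64, 8, 8))

# Inception-style: concatenate branch outputs along channels, 64 + 32 = 96.
inception_out = np.concatenate([a, b], axis=0)
# ResNet-style: element-wise sum requires identical shapes, stays at 64 channels.
residual_out = a + c

print(concat, summed, inception_out.shape, residual_out.shape)
```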
I conducted feature selection using the lasso, as well as a covariance test using covTest::covTest to retrieve the p-values. I borrowed this example from covTest:
require(lars)
require(covTest)
set.seed(1234)
x=matrix(rnorm(100*10),ncol=10)
x=scale(x,TRUE,TRUE)/sqrt(99)
beta=c(4,rep(0,9))
y=x%*%beta+.4*rnorm(100)
a=lars(x,y)
covTest(a,x,y)
$results
Predictor_Number Drop_in_covariance P-value
1 105.7307 0.0000
6 0.9377 0.3953
10 0.2270 0.7974
3 0.0689 0.9334
7 0.1144 0.8921
2 0.0509 0.9504
9 0.0508 0.9505
8 0.0006 0.9994
4 0.1190 0.8880
5 0.0013 0.9987
$sigma
[1] 0.3705
$null.dist
[1] "F(2,90)"
The covTest results show the p-values of the top-ranked features. My question is how to retrieve the coefficients of these features, such as that of predictor 1, along with their standard errors and 95% CIs. I'd like to compare these estimates with their counterparts from glm.