numpy grid with specific values in each spot in python

I know how to make a grid of all 0s, but how do I put 1s in specific spots? I have tried writing if statements, but that feels like too much work, and now I'm not sure what I'm supposed to do.
returns: an np array of size size, whose values are all zero, except for positions
(2,0), (0,1), (2,1), (1,2), and (2,2), whose values are 1. Intended to be used
on a game with 2 states.
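Not part of the original thread, but here is a minimal sketch of one way to do it (the helper name make_grid and the (3, 3) size are assumptions, not from the question): integer index arrays let you set all the listed positions at once, with no if statements.

import numpy as np

def make_grid(size):
    # Hypothetical helper: zeros everywhere except the five listed positions.
    grid = np.zeros(size)
    rows = [2, 0, 2, 1, 2]  # row of each target position
    cols = [0, 1, 1, 2, 2]  # column of each target position
    grid[rows, cols] = 1    # integer (fancy) indexing sets all spots at once
    return grid

print(make_grid((3, 3)))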

How does PIL handle a numpy matrix with negative values?

I am trying to build a machine learning model, and as a first step I plan to convert my data matrix (consisting of real numbers, both positive and negative, all smaller than 255) into RGB images. I know we can do that with the PIL package, but I wonder whether the original negative values can be retained if we turn the matrix into images, or whether they will all be rounded to zero.
I went through many Google examples but am still confused, so I am asking to be certain.
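Not an answer from the thread, but a common workaround sketched under one assumption: standard PIL modes such as 'L' and 'RGB' store unsigned 8-bit channels, so negative values cannot be stored directly. Rescaling the matrix into [0, 255] first, and keeping the original min and max so the mapping can be inverted later, preserves the information; the variable names below are illustrative.

import numpy as np
from PIL import Image

data = np.array([[-4.2, 0.0], [3.1, 7.5]])  # toy matrix with negative values

# Rescale linearly into [0, 255]; keep lo and hi to invert the mapping later.
lo, hi = data.min(), data.max()
scaled = ((data - lo) / (hi - lo) * 255).astype(np.uint8)

img = Image.fromarray(scaled)  # a 2D uint8 array becomes a grayscale ('L') image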

Applying LSA on term document matrix when number of documents are very less

I have a term-document matrix (X) of shape (6, 25931). The first 5 documents are my source documents and the last document is my target document. The column represents counts for different words in the vocabulary set. I want to get the cosine similarity of the last document with each of the other documents.
But since SVD produces an S of size (min(6, 25931),), if I use S to reduce my X, I get a 6 × 6 matrix. In this case, I feel that I am losing too much information, since I am reducing a vector of size (25931,) to (6,).
And when you think about it, the number of documents will usually be less than the number of vocabulary words, so using SVD to reduce dimensionality will always produce vectors of size (number of documents,).
According to everything that I have read, when SVD is used like this on a term-document matrix, it's called LSA.
Am I implementing LSA correctly?
If this is correct, then is there any other way to reduce the dimensionality and get denser vectors where the size of the compressed vector is greater than (6,)?
P.S.: I also tried using fit_transform from sklearn.decomposition.TruncatedSVD which expects the vector to be of the form (n_samples, n_components) which is why the shape of my term-document matrix is (6, 25931) and not (25931, 6). I kept getting a (6, 6) matrix which initially confused me. But now it makes sense after I remembered the math behind SVD.
If the objective of the exercise is to find the cosine similarity, then the following approach can help. This answer only attempts to solve for that objective and does not comment on the definitions of Latent Semantic Analysis or Singular Value Decomposition raised by the questioner.
Let us first import all the required libraries. Please install them if they are not already on your machine.
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
Let us generate some sample data for this exercise.
df = {'sentence': ['one two three','two three four','four five','six seven eight nine ten']}
df = pd.DataFrame(df, columns = ['sentence'])
The first step is to get the exhaustive list of all the possible features, so collate all of the content in one place.
all_content = [' '.join(df['sentence'])]
Let us build a vectorizer and fit it now. Note that the vectorizer's arguments are not explained here, as the focus is on solving the problem.
vectorizer = TfidfVectorizer(encoding = 'latin-1',norm = 'l2', min_df = 0.03, ngram_range = (1,2), max_features = 5000)
vectorizer.fit(all_content)
We can inspect the vocabulary to see if it makes sense. If needed, one could add stop words to the vectorizer above and check that they are indeed suppressed.
print(vectorizer.vocabulary_)
Let us vectorize two of the sentences so we can compute their cosine similarity.
s1Tokens = vectorizer.transform([df['sentence'].iloc[1]])
s2Tokens = vectorizer.transform([df['sentence'].iloc[2]])
Finally, the cosine similarity can be computed as follows.
cosine_similarity(s1Tokens, s2Tokens)
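As a side note on the questioner's P.S.: the (6, 6) result is expected, because the rank of a (6, 25931) matrix is at most 6. A quick sketch with random data (shapes mirroring the question, not real documents) shows the cap:

import numpy as np
from sklearn.decomposition import TruncatedSVD

X = np.random.rand(6, 25931)        # 6 documents, 25931 vocabulary terms
svd = TruncatedSVD(n_components=5)  # useful components are capped by the rank
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)              # (6, 5)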

numpy concatenate over dimension

I find myself doing the following quite frequently and am wondering if there's a "canonical" way of doing it.
I have an ndarray, say of shape (100, 4, 6), and I want to reduce it to (100, 24) by concatenating the 4 vectors of length 6 into one vector.
I can use reshape to do this, but I've been manually computing the new shape,
i.e.
np.reshape(x, (x.shape[0], x.shape[1] * x.shape[2]))
ideally I'd simply supply the dimension I want to reduce on
np.concatenate(x,dim=-1)
but np.concatenate operates on an enumerable of ndarray. I've wondered if it's possible to supply an iterator over an ndarray axis but haven't looked further. What is the usual pattern here?
You can avoid calculating one dimension by passing -1, which tells NumPy to infer it:
x.reshape(x.shape[0], -1)
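For instance, a quick check with the shapes from the question:

import numpy as np

x = np.arange(100 * 4 * 6).reshape(100, 4, 6)
y = x.reshape(x.shape[0], -1)  # -1 infers the trailing size (4 * 6 = 24)
print(y.shape)                 # (100, 24)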

Interpreting the Y values of a normal distribution

I've written this code to generate a normal distribution of a set of values 1,2,3 :
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'col1':[1,2,3]})
print(df)
fig, ax = plt.subplots(1,1)
df.plot(kind='hist', normed=True, ax=ax)
This produces a histogram plot. The X values are the range of possible values, but how are the Y values interpreted?
Reading http://www.stat.yale.edu/Courses/1997-98/101/normal.htm, the Y value is calculated using:

A normal distribution has a bell-shaped density curve described by its mean μ and standard deviation σ. The density curve is symmetrical, centered about its mean, with its spread determined by its standard deviation. The height of a normal density curve at a given point x is given by

f(x) = 1/(σ√(2π)) · e^(−(x−μ)²/(2σ²))

What is the meaning of this formula?
I think you are confusing two concepts here. A histogram just plots how many times a certain value appears. So for your list of [1,2,3], the value 1 appears once, and the same for 2 and 3. If you had set normed=False, you would get the plot you have now, but with a height of 1.0.
However, when you set normed=True, you turn on normalization. Note that this does not have anything to do with a normal distribution. Have a look at the documentation for hist, which you can find here: http://matplotlib.org/api/pyplot_api.html?highlight=hist#matplotlib.pyplot.hist
There you can see what the normed option does:
If True, the first element of the return tuple will be the counts normalized to form a probability density, i.e., n/(len(x)*dbin), i.e., the integral of the histogram will sum to 1. If stacked is also True, the sum of the histograms is normalized to 1.
So it gives you the formula right there. In your case, you have three points, i.e. len(x)=3. If you look at your plot, you can see that your bins have a width of 0.2, so dbin=0.2. Each value appears only once, so for each of 1, 2, and 3 you have n=1. Thus the height of your bars should be 1/(3*0.2) ≈ 1.67, which is exactly what you see in your histogram.
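You can verify this arithmetic with numpy.histogram, which applies the same normalization when density=True:

import numpy as np

counts, edges = np.histogram([1, 2, 3], bins=10, density=True)
print(edges[1] - edges[0])  # bin width: 0.2
print(counts.max())         # 1.666... = 1 / (3 * 0.2)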
Now for the normal distribution, that is just a specific probability function that is defined as the formula you gave. It is useful in many fields as it relates to uncertainties. You'll see it a lot in statistics for example. The Wikipedia article on it has lots of info.
If you want to generate a list of values that conform to a normal distribution, I would suggest reading the documentation of numpy.random.normal, which will do this for you: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.normal.html
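A minimal sketch of that suggestion: draw samples and plot a normalized histogram, so the bar heights approximate the density f(x). Note that newer matplotlib versions use density= where older ones used normed=.

import numpy as np
import matplotlib.pyplot as plt

samples = np.random.normal(loc=0.0, scale=1.0, size=10000)  # mean 0, std 1
plt.hist(samples, bins=50, density=True)  # normalize so the area integrates to 1
plt.show()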

Maximum values along axis of Numpy ndarray?

I'm afraid I can't describe the problem well, so I drew a sketch of it. Anyway, what I need is to find the max values along the 0th axis of a numpy ndarray, e.g. one of shape (5,5,3), together with their corresponding "layer numbers", and then use the "layer numbers" to create a new array with shape (1,5,3). Hope I'm giving a clear description here. Thanks a lot.
If you check the documentation of np.max, you'll see it takes an axis argument:
a.max(axis=0)
But that won't help you yet. However, there's a function argmax that gives you the indices of the maxima along a given axis:
a.argmax(axis=...)
So, let's find your first (5,5) array: it's a[...,0]. You can find the position of the maxima per row (or column) with a[...,0].argmax(axis=1) (or axis=0), and use those indices to look up the values in the other slices.
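Putting it together for the shape in the question, a sketch with random data: argmax gives the layer numbers, and np.take_along_axis uses them to gather the values, yielding the (1, 5, 3) result.

import numpy as np

a = np.random.rand(5, 5, 3)
layers = a.argmax(axis=0)  # shape (5, 3): layer number of each maximum
# Gather the max values via the layer indices; the result has shape (1, 5, 3).
maxvals = np.take_along_axis(a, layers[np.newaxis, ...], axis=0)
assert np.allclose(maxvals[0], a.max(axis=0))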