Building block matrix with numpy

I'm trying to build up a block matrix from another one, but I can't figure out which numpy function I should use. Here is what I want:
Input:
[[1,2,3],
[4,5,6],
[7,8,9]]
and
[1,3,2]
Wanted output:
[[1,2,2,2,3,3],
[4,5,5,5,6,6],
[4,5,5,5,6,6],
[4,5,5,5,6,6],
[7,8,8,8,9,9],
[7,8,8,8,9,9]]
I've tried to use numpy.repeat and numpy.block, but I couldn't figure out how to do it.
Can you help me?
Thanks in advance!
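One possible approach, sketched here under the assumption that [1,3,2] gives how many times each row and each column should be repeated, is to call numpy.repeat once per axis. The variable names are only illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
counts = [1, 3, 2]  # assumed meaning: repeat count for row i and for column i

# Repeat each row counts[i] times, then each column counts[j] times.
rows_expanded = np.repeat(matrix, counts, axis=0)
blocks = np.repeat(rows_expanded, counts, axis=1)

print(blocks)
```

Because numpy.repeat accepts one count per element along the chosen axis, the same two calls should also work with different count lists for rows and columns.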

Related

Having problems to plot this equation, giving wrong plot

I'm trying to plot a radioisotope's power output by time. For this, I've found the following equation:
Without the exponential term at the end it works, but I need the change over time, so I need that term, and I can't make that part work.
This is my code:
import matplotlib.pyplot as plt
import numpy as np
import math
ln2=0.693
avon=6.02214076*(10**23)
mevj=1.6*(10**(-13))
t = np.array(range(50))
co60 = (((ln2)/(5.27*365*24*3600))*((avon)*mevj*2.505/60))/(2.718**((ln2)/(5.27*365*24*3600)*(t)))
plt.plot(t, co60, label='co60')
This gives the wrong output: I can't get an exponential plot. I used 2.718 for e because math.exp gives errors when I use it. The correct output should look like the black line.
I think I'm having trouble with how Python handles the math and haven't translated the equation correctly. Could you help?
Thanks.
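A minimal sketch of one likely cause, with two assumptions that are not confirmed by the post: the equation is taken to be P(t) = λ·N·E·exp(-λ·t), and the time axis is taken to be in years rather than seconds. math.exp only accepts scalars, so it fails on a NumPy array, while numpy.exp works elementwise. Also, with a half-life of 5.27 years, 50 seconds of decay is far too short to show any visible curve.

```python
import numpy as np

AVOGADRO = 6.02214076e23
MEV_TO_JOULE = 1.6e-13                    # rounded value used in the question
HALF_LIFE_S = 5.27 * 365 * 24 * 3600      # Co-60 half-life in seconds

decay_const = np.log(2) / HALF_LIFE_S     # lambda, in 1/s

# Assumption: t counts years, so convert to seconds before applying lambda.
years = np.arange(50)
t_seconds = years * 365 * 24 * 3600

# Power per gram at t = 0, same factors as in the question's expression.
initial_power = decay_const * AVOGADRO * MEV_TO_JOULE * 2.505 / 60

# np.exp is applied elementwise to the whole array.
co60 = initial_power * np.exp(-decay_const * t_seconds)
```

Plotting `years` against `co60` with plt.plot should then give a visibly decaying curve.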

Plots from excel with panda and seaborn 'ufunc 'isfinite' not supported for the input types'

I am trying to set up a template for creating plots of my test data. I am pretty new to Python, and I have already googled my question quite a lot, but nothing I found helped. I have an Excel table with data in two columns, which I want to plot against each other. My code looks as follows:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
file='C:/Documents/Test/test_file.xlsx'
df1=pd.read_excel(file,sheet_name='sheet1',header=0, engine="openpyxl")
plt.figure()
sns.lineplot(data=df1,x="eps",y="sigma",sort=False,linewidth=0.8)
As mentioned, the Excel file has a header row with eps and sigma as the x and y columns. The values below it are floats; when I check the data types with df1.dtypes, the result is 'float64'. Does anyone have an idea what is not working? I get the error 'ufunc 'isfinite' not supported for the input types'.
The goal is to plot the two Excel columns against each other with pandas and seaborn and save the image.
This might be a library issue. I've been running into the same problem with example datasets and even a very simple:
sns.lineplot(x=[1], y=[1])
I'll update if I find a solution.
Edit: There seems to be an issue with NumPy that causes this problem in Seaborn. The solution is to downgrade NumPy to 1.23 until 1.24.1 is released.
https://github.com/mwaskom/seaborn/issues/3192

Assertive programming with pandas

I am looking for a way to do assertive programming on pandas DataFrames, as assertr does in R.
Is there a convenient library for that?
Any advice is very welcome.
I don't know of analogous libraries that integrate specifically with Pandas, but assert is a built-in keyword in Python, which you could use to validate data at various points in your data pipeline.
The syntax is simply:
assert [condition]
If true, nothing happens. If false, an AssertionError is raised.
To validate Pandas data you could write a statement like this:
import pandas as pd
import seaborn as sns
iris = sns.load_dataset('iris')
# raises an AssertionError if any value in the sepal_length column is not strictly positive
assert (iris['sepal_length'] > 0).all()
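As a rough sketch of how such checks could be chained in a pipeline, the plain assert statements can be wrapped in small functions and passed to DataFrame.pipe. The helper names assert_no_missing and assert_all_positive are hypothetical, not part of pandas, and the small DataFrame is made-up illustration data.

```python
import pandas as pd

def assert_no_missing(df):
    """Hypothetical helper: return df unchanged, or fail if any value is missing."""
    assert not df.isna().any().any(), "missing values found"
    return df

def assert_all_positive(df, column):
    """Hypothetical helper: return df unchanged, or fail if column has values <= 0."""
    assert (df[column] > 0).all(), f"non-positive values found in {column!r}"
    return df

# Made-up example data standing in for a real dataset.
measurements = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
})

# Each check returns the DataFrame, so the checks chain like any other step.
checked = (
    measurements
    .pipe(assert_no_missing)
    .pipe(assert_all_positive, "sepal_length")
)
```

Returning the DataFrame from each check keeps the validation inline with the rest of the method chain. Note that assert statements are skipped when Python runs with the -O flag.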
I found an answer to my own question: engarde is exactly what I was looking for.

Save a numpy sparse matrix into file

I want to save the result of TfidfVectorizer in sklearn.feature_extraction.text into a text file for future use. As I found, it is a sparse matrix of type ''. However when I try to save it using the following code
np.savetxt('Feature_TfIdf.txt', X_Tfidf, fmt='%2.6f')
I get an error like this
IndexError: tuple index out of range
Use joblib.dump or sklearn.externals.joblib.dump for this. NumPy doesn't get SciPy sparse matrices.
Simple example:
import joblib
joblib.dump(tfidf, 'TfIdf.pkl')
I managed to solve the problem by converting the sparse matrix to a full (dense) matrix and then saving the result. However, this approach is not practical for large arrays, so it is better to save the matrix in .pkl format.
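As an alternative that is not mentioned in the answers above, SciPy has its own functions for storing sparse matrices, scipy.sparse.save_npz and scipy.sparse.load_npz. The sketch below uses a tiny hand-made CSR matrix as a stand-in for the TfidfVectorizer output, and the file name is arbitrary.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Stand-in for the TF-IDF result: a small sparse matrix in CSR format.
features = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                       [0.25, 0.0, 0.75]]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "features_tfidf.npz")
    sparse.save_npz(path, features)    # stores the matrix without densifying it
    restored = sparse.load_npz(path)   # comes back as a sparse matrix
```

This keeps the matrix sparse on disk, which avoids the memory cost of converting to a dense array first.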

Why the difference between octave's prctile and numpy's percentile?

I've been rewriting a matlab/octave program into numpy and ran across a difference in some resultant values.
This occurs with both the percentile/prctile and the standard-deviation functions.
In Numpy:
import matplotlib.mlab as ml
import numpy
>>> t = numpy.linspace(0,100, 100)
>>> numpy.percentile(t,95)
95.0
>>> numpy.std(t)
29.157646512850626
>>> ml.prctile(t,95)
95.000000000000014
In Octave:
octave:1> t = linspace(0,100,100)';
octave:2> prctile(t,95)
ans = 95.454545
octave:3> std(t)
ans = 29.304537
Although the array values of 't' are the same, the results differ more than I would expect.
In the numpy help(numpy.std) they specifically mention that the algorithm is:
std = sqrt(mean(abs(x - x.mean())**2))
So I implemented that in octave and got the exact answer numpy gives. So it seems the std-deviation function differs.
But why/how? And which is correct? (if there is such a thing)
And even prctile/percentile?
Just in case since I'm in Linux aptosid...
GNU Octave, version 3.6.2
numpy.version '1.6.2rc1'
Numpy simply uses a different algorithm when the percentile lies between two data points. Octave, Matlab and R always center it exactly between two points when needed (I believe); numpy does a bit more than that. If you check http://en.wikipedia.org/wiki/Percentile you will see there are a couple of ways to calculate percentiles.
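To illustrate the point about different algorithms, here is a small sketch that assumes NumPy 1.22 or newer, where numpy.percentile accepts a method argument (far newer than the 1.6.2rc1 in the question). The "hazen" method places the k-th sorted value at probability (k - 0.5)/n, which for this array should land close to the Octave result of 95.454545.

```python
import numpy as np

t = np.linspace(0, 100, 100)

# Default method ("linear"): interpolates at index q * (n - 1).
default_p95 = np.percentile(t, 95)

# Assumption: NumPy >= 1.22, which provides the method keyword.
hazen_p95 = np.percentile(t, 95, method="hazen")

print(default_p95, hazen_p95)
```

Neither result is wrong; they follow different, equally documented conventions for interpolating between data points.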
It seems like Octave assumes ddof=1, at least by default, and numpy uses 0 by default:
>>> numpy.std(t, ddof=0)
29.157646512850633
>>> numpy.std(t, ddof=1)
29.304537349375785
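The effect of ddof can be made explicit by writing the formula out by hand. In this sketch the helper name std_with_ddof is hypothetical; it divides the sum of squared deviations by n - ddof, so ddof=0 gives the population form that numpy uses by default and ddof=1 gives the sample form that matches the Octave value above.

```python
import numpy as np

def std_with_ddof(x, ddof):
    """Hypothetical helper: standard deviation with divisor n - ddof."""
    x = np.asarray(x, dtype=float)
    squared_dev = (x - x.mean()) ** 2
    return np.sqrt(squared_dev.sum() / (x.size - ddof))

t = np.linspace(0, 100, 100)
population_std = std_with_ddof(t, 0)  # divisor n, numpy's default
sample_std = std_with_ddof(t, 1)      # divisor n - 1

print(population_std, sample_std)
```

Which one is appropriate depends on whether the data are treated as the whole population or as a sample from it.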