I am using GeFolki to coregister different satellite datasets, and I get the following ValueError when I try to manipulate the data.
Could you explain what I am doing wrong?
from skimage.transform import resize
nx = int(round(dimx/fdecimation))
ny = int(round(dimy/fdecimation))
Mg = resize(Master,(nx, ny),1,'constant')
nsx = int(round(dimxn/fdecimation))
nsy = int(round(dimyn/fdecimation))
Sg = resize(Slave,(nsx, nsy),1,'constant')
# Rank computation and criterion on images after decimation
from rank import rank_sup as rank_filter_sup
from rank import rank_inf as rank_filter_inf
Mg_rank = rank_filter_inf(Mg, rank) # rank sup : high value pixels have low rank
Sg_rank = rank_filter_inf(Sg, rank)
R=np.zeros((nx-nsx-1,ny-nsy-1));
indices=np.nonzero(Sg_rank);
test2=Sg_rank[indices];
for k in range(0,nx-nsx-1):
    for p in range(0,ny-nsy-1):
        test1=Mg_rank[k:k+nsx,p:p+nsy];
        test1=test1[indices];
        test=(test1-test2)**2
        R[k,p]=test.mean();
ValueError Traceback (most recent call last)
<ipython-input-24-bebe7595123d> in <module>
17 Sg_rank = rank_filter_inf(Sg, rank)
18
---> 19 R=np.zeros((nx-nsx-1,ny-nsy-1));
20 indices=np.nonzero(Sg_rank);
21 test2=Sg_rank[indices];
ValueError: negative dimensions are not allowed
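Apparently at least one of nx-nsx-1 and ny-nsy-1 is negative, and NumPy cannot create an array of zeros with a negative number of rows or columns. This happens when the decimated slave image is at least as large as the decimated master image along that axis. Print those values to see which one goes negative and fix it there.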
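A minimal sketch of that check, using made-up image sizes (the variable names follow the question, the numbers and the helper name search_grid_shape are assumptions). It computes the shape of R first and fails with a readable message when the slave image is not smaller than the master:

```python
import numpy as np

def search_grid_shape(nx, ny, nsx, nsy):
    """Return the shape used for R in the question, or raise a clear error."""
    rows, cols = nx - nsx - 1, ny - nsy - 1
    # A zero-sized grid is legal for np.zeros but useless here, so reject it too.
    if rows <= 0 or cols <= 0:
        raise ValueError(
            f"slave ({nsx}, {nsy}) must be smaller than master ({nx}, {ny}); "
            f"search grid would be ({rows}, {cols})"
        )
    return rows, cols

# Hypothetical sizes: master 100 x 80, slave 40 x 30.
R = np.zeros(search_grid_shape(100, 80, 40, 30))
print(R.shape)

# Swapping master and slave reproduces the situation from the traceback.
try:
    search_grid_shape(40, 30, 100, 80)
except ValueError as err:
    print(err)
```

If the check fails for your data, the likely candidates are swapped Master/Slave inputs or different decimation of the two images, but that depends on how dimx, dimy, dimxn and dimyn were computed.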
Related
I am new to NumPy and I have been trying to get the average of an array that I derived from another array.
This is the code that has been giving me the error: "ufunc 'divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' "
import numpy as np
import pandas as pd
cars = pd.read_csv('data/co2_emissions_canada.csv')
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
#the median of the cars_engine_sizes
print(np.median(cars_engine_sizes))
#the average fuel consumption for regular gasoline (Fuel Type = X), #premium gasoline (Z), ethanol (E), and diesel (D)?
fuel_typesx=np.array(cars_fuel_types[cars_fuel_types=='X'])
print(np.average(fuel_typesx))
fuel_typesz=np.array(cars_fuel_types[cars_fuel_types=='Z'])
print(np.average(fuel_typesz))
fuel_typese=np.array(cars_fuel_types[cars_fuel_types=='E'])
print(np.average(fuel_typese))
Please, what am I missing?
I'm guessing the FULL error message looks something like this:
In [753]: np.average(np.array(['A','B','C','A'],dtype=object))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [753], in <cell line: 1>()
----> 1 np.average(np.array(['A','B','C','A'],dtype=object))
File <__array_function__ internals>:5, in average(*args, **kwargs)
File ~\anaconda3\lib\site-packages\numpy\lib\function_base.py:380, in average(a, axis, weights, returned)
377 a = np.asanyarray(a)
379 if weights is None:
--> 380 avg = a.mean(axis)
381 scl = avg.dtype.type(a.size/avg.size)
382 else:
File ~\anaconda3\lib\site-packages\numpy\core\_methods.py:191, in _mean(a, axis, dtype, out, keepdims, where)
189 ret = ret.dtype.type(ret / rcount)
190 else:
--> 191 ret = ret / rcount
193 return ret
TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
cars_fuel_types comes from a dataframe and evidently contains strings like 'E', so it has object dtype. Even if you select only the matching values, you can't take an 'average' of strings.
average takes the sum of the values and divides by the count. For Python strings, sum is concatenation, not arithmetic, and the resulting string cannot be divided by a number.
In [754]: np.sum(np.array(['A','B','C','A'],dtype=object))
Out[754]: 'ABCA'
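What you probably want is to use the fuel-type labels only as a mask and average the numeric fuel-consumption column. A small sketch with made-up arrays standing in for the CSV columns (the values are illustrative, not from the dataset):

```python
import numpy as np

# Made-up stand-ins for the CSV columns.
cars_fuel_types = np.array(['X', 'Z', 'X', 'E', 'Z', 'X'], dtype=object)
cars_fuel_consumption = np.array([8.0, 10.0, 9.0, 14.0, 11.0, 10.0])

averages = {}
for fuel in ['X', 'Z', 'E']:
    mask = cars_fuel_types == fuel                        # boolean mask from the labels
    averages[fuel] = cars_fuel_consumption[mask].mean()   # average the numbers, not the labels

print(averages)
```

If you keep the data in the dataframe, cars.groupby('Fuel Type')['Fuel Consumption Comb (L/100 km)'].mean() should give the same per-type averages in one call.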
The neg_ctl_df dataframe contains the negative controls and coding_gene_df contains my genes of interest.
I want to create a dataframe norm_df that stores the normalized output.
import pandas as pd
# Median of the NEGATIVE controls
neg_ctl_median = neg_ctl_df.iloc[:,-29:].median()
# Median of the POSITIVE controls
pos_ctl_median = pos_ctl_df.iloc[:,-29:].median()
# Mean of the PROBESET controls
probeset_norm = qnorm.quantile_normalize(probe_ctl_df.iloc[:,-29:], axis=1)
# Normalize the samples
i = []
for gene, sample in coding_gene_df.iloc[:,-29:].astype(float).iterrows():
    norm_val = sample - neg_ctl_median # Subtract the median of the NEGATIVE controls within the patient sample
    norm_val = norm_val / pos_ctl_median # Divide by the median of the POSITIVE controls within the patient sample (applied to the value already normalized against the negative control)
    norm_val = norm_val / probeset_norm # Probeset normalization (quantile normalization)
    i.append(norm_val)
pd.DataFrame(i)
Traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-63-85108013a3d8> in <module>()
15 norm_val = norm_val / probeset_norm # Probeset normalization (quantile normalization)
16 i.append(norm_val)
---> 17 pd.DataFrame(i)
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _prep_ndarray(values, copy)
553 values = values.reshape((values.shape[0], 1))
554 elif values.ndim != 2:
--> 555 raise ValueError(f"Must pass 2-d input. shape={values.shape}")
556
557 return values
ValueError: Must pass 2-d input. shape=(1, 102, 29)
Samples:
coding_gene_df.iloc[1:10,-29:-27].to_dict()
{'12h_P1_T4_TimeC2_PIDC4_Non-Survivor': {'CNTN2': '6.35',
'KCNA2': '5.29',
'LOC79160': '5.99',
'PTGIS': '5.66',
'TTTY11': '3.91',
'VPS4B': '9.68',
'XRCC1': '9.09',
'ZC3HC1': '7.19',
'ZFAS1': '8.68'},
'48h_P1_T6_TimeC3_PIDC1_Non-Survivor': {'CNTN2': '6.6',
'KCNA2': '5.36',
'LOC79160': '6.18',
'PTGIS': '5.54',
'TTTY11': '3.92',
'VPS4B': '9.51',
'XRCC1': '9.15',
'ZC3HC1': '7.05',
'ZFAS1': '8.46'}}
Negative controls:
neg_ctl_df.iloc[1:10,-29:-27].to_dict()
{'12h_P1_T4_TimeC2_PIDC4_Non-Survivor': {'---': '8.45'},
'48h_P1_T6_TimeC3_PIDC1_Non-Survivor': {'---': '8.16'}}
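The shape (1, 102, 29) in the error suggests that each element appended to i is itself two-dimensional, which would happen if probeset_norm is a dataframe rather than one value per sample; that is a guess from the traceback, not something the post confirms. Below is a sketch of the first two normalization steps done without a loop, on tiny made-up frames (sample names s1 and s2 and all values are assumptions). The probeset step is left out because it is not clear what shape it should have:

```python
import pandas as pd

# Tiny made-up frames: rows are genes/probes, columns are patient samples.
coding_gene_df = pd.DataFrame({'s1': ['6.0', '8.0'], 's2': ['7.0', '9.0']},
                              index=['CNTN2', 'VPS4B'])
neg_ctl_df = pd.DataFrame({'s1': ['2.0', '4.0'], 's2': ['1.0', '3.0']})
pos_ctl_df = pd.DataFrame({'s1': ['10.0', '10.0'], 's2': ['4.0', '4.0']})

# Per-sample medians are Series indexed by sample name.
neg_ctl_median = neg_ctl_df.astype(float).median()
pos_ctl_median = pos_ctl_df.astype(float).median()

# DataFrame (genes x samples) combined with a Series (samples) aligns on the
# column labels, so the result stays a 2-D genes x samples frame.
norm_df = (coding_gene_df.astype(float) - neg_ctl_median) / pos_ctl_median
print(norm_df)
```

Whatever the probeset step ends up being, dividing by something with one value per sample (a Series) keeps norm_df two-dimensional.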
I have a dataframe d, and one of its columns is price (numerical), with 109248 rows. I split the data into two parts, d_train and d_test; d_train has 73196 values and d_test has 36052 values. To normalize d_train['price'] and d_test['price'] I did this:
from sklearn.preprocessing import Normalizer
price_scalar = Normalizer()
X_train_price = price_scalar.fit_transform(d_train['price'].values.reshape(1, -1))
X_test_price = price_scalar.transform(d_test['price'].values.reshape(1, -1))
Now I'm having this issue
ValueError Traceback (most recent call last)
<ipython-input-20-ba623ca7bafa> in <module>()
3 X_train_price = price_scalar.fit_transform(X_train['price'].values.reshape(1, -1))
----> 4 X_test_price = price_scalar.transform(X_test['price'].values.reshape(1, -1))
/usr/local/lib/python3.7/dist-packages/sklearn/base.py in _check_n_features(self, X, reset)
394 if n_features != self.n_features_in_:
395 raise ValueError(
397 f"is expecting {self.n_features_in_} features as input."
398 )
ValueError: X has 36052 features, but Normalizer is expecting 73196 features as input.
Changing reshape(1, -1) to reshape(-1, 1) runs without error, but it turns every price value into 1.
reshape(-1, 1) is correct. Getting 1 for every value is the expected result when you use Normalizer from sklearn:
Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1, l2 or inf) equals one.
scikit-learn always assumes that the data is organized with shape (n_points, n_features) (i.e., each row is a data point). Also, from the documentation, Normalizer normalizes "samples individually to unit norm". This means that each data point (i.e., row) is normalized, rather than along the column (i.e., all price values).
To normalize the values to the [0, 1] range, you should use the MinMaxScaler with the data reshaped into a column. That is,
from sklearn.preprocessing import MinMaxScaler
price_scalar = MinMaxScaler()
X_train_price = price_scalar.fit_transform(d_train['price'].values.reshape(-1, 1))
X_test_price = price_scalar.transform(d_test['price'].values.reshape(-1, 1))
It is noteworthy that this does not guarantee that the price values in the test set all fall within the [0, 1] range, because the minimum and maximum are learned from the training set only. That is the way it should be when training an ML model, but keep it in mind.
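To make the difference concrete, here is a NumPy-only sketch of the arithmetic behind both approaches, with made-up prices. It mimics row-wise L2 normalization and column-wise min-max scaling by hand and does not call scikit-learn:

```python
import numpy as np

train = np.array([10.0, 20.0, 30.0]).reshape(-1, 1)   # made-up prices, one per row
test = np.array([15.0, 40.0]).reshape(-1, 1)

# Row-wise L2 normalization: every row has a single entry, so each value
# is divided by its own absolute value.
row_norms = np.linalg.norm(train, axis=1, keepdims=True)
train_unit = train / row_norms
print(train_unit.ravel())

# Column-wise min-max scaling, with min and max taken from train only.
col_min, col_max = train.min(axis=0), train.max(axis=0)
train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)
print(train_scaled.ravel())
print(test_scaled.ravel())   # a test price above the train maximum lands above 1
```

The first print shows why every price became 1, and the last one shows a test value falling outside [0, 1].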
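Here, you can call fit_transform() directly instead of calling fit() and transform() separately. This avoids the error because Normalizer is stateless, but note that with reshape(1, -1) the whole price vector is treated as a single sample and scaled to unit norm.
price_scalar = Normalizer()
X_train_price = price_scalar.fit_transform(d_train['price'].values.reshape(1, -1))
X_test_price = price_scalar.fit_transform(d_test['price'].values.reshape(1, -1))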
The following is my code for computing summary statistics. I keep getting this error:
list indices must be integers or slices, not str
It seems that the way I'm using the describe function I created is wrong.
from statistics import stdev,median,mean
def describe(key):
    a=[]
    for i in scripts:
        a.append(i[key])
    a=scripts[key]
    total = sum(script[key] for script in scripts)
    avg = total/len(a)
    avg=mean(a)
    s = stdev(a)
    q25 = min(a)+(max(a)-min(a))*25
    med = min(a)+(max(a)-min(a))*50
    med=median(a)
    q75 = min(a)+(max(a)-min(a))*75
    return (total, avg, s, q25, med, q75)
summary = [('items', describe('items')),
('quantity', describe('quantity')),
('nic', describe('nic')),
('act_cost', describe('act_cost'))]
I keep getting this error:
TypeError Traceback (most recent call last)
<ipython-input-8-ba78d5218ead> in <module>()
----> 1 summary = [('items', describe('items')),
2 ('quantity', describe('quantity')),
3 ('nic', describe('nic')),
4 ('act_cost', describe('act_cost'))]
<ipython-input-1-bcf37f98eb7d> in describe(key)
4 for i in scripts:
5 a.append(i[key])
----> 6 a=scripts[key]
7 total = sum(script[key] for script in scripts)
8 avg = total/len(a)
TypeError: list indices must be integers or slices, not str
It is hard to understand your problem, since we don't know what scripts looks like. It is a global variable that is not defined in the code you posted. The error states that scripts is a list, but your code treats it as if it were a dataframe (or a dictionary) by indexing it with a string. So please check the type of scripts.
Also, did you know that there is an easy way to calculate a five-number summary with NumPy, like this:
import numpy as np
# NumPy >= 1.22 uses method=; older versions spell this keyword interpolation=
minimum, q25, med, q75, maximum = np.percentile(a, [0, 25, 50, 75, 100], method='midpoint')
For a description, see:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html
Judging by your question, scripts is a list of dictionaries.
Indexing the list directly with a key does not work, because a list only accepts integer indices or slices.
Instead, collect the values from each dictionary:
getValues = lambda key,inputData: [subVal[key] for subVal in inputData if key in subVal]
In this case, for example,
getValues('items', scripts) returns the corresponding list, and it is then easy to compute the statistics of that list.
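Putting that together, a describe function could look like the sketch below. The question does not show scripts, so this assumes a list of dictionaries with numeric values; the sample data is made up. The quartiles come from np.percentile with its default linear interpolation, which is one of several common definitions:

```python
from statistics import mean, median, stdev
import numpy as np

# Hypothetical data standing in for the real scripts list.
scripts = [
    {'items': 2, 'quantity': 10},
    {'items': 4, 'quantity': 20},
    {'items': 6, 'quantity': 30},
    {'items': 8, 'quantity': 40},
]

def describe(key):
    # Index each dictionary with the key, never the list itself.
    values = [script[key] for script in scripts]
    total = sum(values)
    q25, q75 = np.percentile(values, [25, 75])
    return (total, mean(values), stdev(values), q25, median(values), q75)

summary = [(key, describe(key)) for key in ('items', 'quantity')]
print(summary)
```

The return order matches the tuple in the question: total, average, standard deviation, 25th percentile, median, 75th percentile.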
This python code:
import numpy,math
import numpy as np
import math as m
import scipy.optimize as optimization
from scipy.integrate import quad
import matplotlib.pyplot as plt
# Create toy data for curve_fit.
zo = numpy.array([0.0,1.0,2.0,3.0,4.0,5.0])
mu = numpy.array([0.1,0.9,2.2,2.8,3.9,5.1])
sig = numpy.array([1.0,1.0,1.0,1.0,1.0,1.0])
# Define hubble function.
def Hubble(x,a,b):
    return H0 * m.sqrt( a*(1+x)**2 + 1/2 * a * (1+b)**3 )
# Define the distance and magnitude functions.
def Distancez(x,a,b):
    return c * (1+x)* np.asarray(quad(lambda tmp: 1/Hubble(a,b,tmp),0,x))
def mag(x,a,b):
    return 5*np.log10(Distancez(x,a,b)) + 25
    #return a+b*x
# Compute chi-square manifold.
Steps = 101 # grid size
Chi2Manifold = numpy.zeros([Steps,Steps]) # allocate grid
amin = 0.2 # minimal value of a covered by grid
amax = 0.3 # maximal value of a covered by grid
bmin = 0.3 # minimal value of b covered by grid
bmax = 0.6 # maximal value of b covered by grid
for s1 in range(Steps):
    for s2 in range(Steps):
        # Current values of (a,b) at grid position (s1,s2).
        a = amin + (amax - amin)*float(s1)/(Steps-1)
        b = bmin + (bmax - bmin)*float(s2)/(Steps-1)
        # Evaluate chi-squared.
        chi2 = 0.0
        for n in range(len(xdata)):
            residual = (mu[n] - mag(zo[n], a, b))/sig[n]
            chi2 = chi2 + residual*residual
        Chi2Manifold[Steps-1-s2,s1] = chi2 # write result to grid.
Throws this error message:
ValueError Traceback (most recent call last)
<ipython-input-136-d0ef47a881a7> in <module>()
36 residual = (mu[n] - mag(zo[n], a, b))/sig[n]
37 chi2 = chi2 + residual*residual
---> 38 Chi2Manifold[Steps-1-s2,s1] = chi2 # write result to grid.
ValueError: setting an array element with a sequence.
Note: If I define a simple mag function such as (a+b*x), I do not get any error message.
In fact, all three functions Hubble, Distancez and mag have to be functions of the redshift z, which is an array.
Do you think I need to redefine all of these functions so that they return arrays? That is, should I first create an array of redshifts so that the outputs of the functions automatically become arrays?
I need the outputs of the Distancez() and mag() functions to be arrays. I managed this simply by changing the upper limit of the integral in Distancez from x to x.any(). Now I get an array, which is what I want. However, the output of, for example, Distancez(0.25, 0.5, 0.3) is now different from what I get when I use x as the upper limit of the integral. Any help would be appreciated.
Thanks for your reply.
The ValueError says that a single element of the array Chi2Manifold cannot be assigned a value that is a sequence. chi2 is probably a NumPy array because residual is a NumPy array, which is because your mag() function returns a NumPy array, which in turn is because your Distancez() function returns a NumPy array: you are telling it to do so with np.asarray().
If Distancez() returned a scalar floating-point value, you'd probably be set. Do you need np.asarray() in Distancez()? Is the result actually a 1-element array, or do you intend to reduce it to a scalar somehow? I don't know what your Hubble() function is supposed to do and I'm not an astronomer, but in my experience distances are often scalars ;).
If chi2 is meant to be a sequence or NumPy array, you probably want to assign chi2 to an appropriately sized slice of Chi2Manifold.
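One likely source of the array: scipy.integrate.quad returns a tuple (integral, error estimate), so wrapping it in np.asarray() gives a 2-element array. A sketch of a scalar version follows. The hubble function, the value of H0 and the parameter values are stand-ins chosen for illustration, not the model from the question:

```python
import numpy as np
from scipy.integrate import quad

H0 = 70.0        # assumed value, km/s/Mpc
c = 299792.458   # speed of light, km/s

def hubble(z, a, b):
    # Hypothetical stand-in for the question's Hubble(); any positive function works here.
    return H0 * np.sqrt(a * (1 + z) ** 3 + b)

def distance_z(z, a, b):
    # quad returns (integral, error_estimate); [0] keeps only the integral.
    integral = quad(lambda t: 1.0 / hubble(t, a, b), 0.0, z)[0]
    return c * (1 + z) * integral

def mag(z, a, b):
    return 5 * np.log10(distance_z(z, a, b)) + 25

raw = quad(lambda t: 1.0 / hubble(t, 0.3, 0.7), 0.0, 1.0)
print(len(raw))                       # the integral plus its error estimate

grid = np.zeros((2, 2))
grid[0, 0] = mag(1.0, 0.3, 0.7)       # a scalar fits into a single grid cell

# For an array of redshifts, call the scalar function once per element.
zs = np.array([0.5, 1.0, 2.0])
mags = np.array([mag(z, 0.3, 0.7) for z in zs])
print(mags.shape)
```

With scalar outputs, chi2 stays a float and the grid assignment works. Note also that x.any() returns a boolean, not the redshift, which would explain why the integral changed after that edit.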