how are you?.
I'm trying to create a function that calculates the z-transform of a transfer function using the residues method but for that, I need the factors of the characteristic equation and the powers of the factors, so, in order to do that I tried to factorize polynomials with non-integer coefficients but after trying everything that I read I couldn't factorize make maxima to factorize those polynomials the way I need it.
For giving an example, I have this characteristic equation: "s·(s^2+0.1·s)", the factors should be "s^2" and "s + 0.1" but maxima allways gives me "(s^2·(10·s + 1))/10".
Why I'm signalling this?, well, as I learned that maxima treates the outputs equation as list so I can have its dimension and separate the factos by its positions in the list to measure the powers of the factors and do what I need, but like maxima gives me the result that is shown above then the dimension of the list is different and it will make my function to work differently and possibly have errors.
The result that is shown is given by maxima no matter if I use factor, gfactor, or expand or whatever other function that I found and I know that result is because maxima are rationalizing the polynomial before working with it but I don't need that behavior, I only need the pure factors, so, how can I have the result that I want?.
Thanks in advance for the help.
My questions are specific to https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.
I don't understand why you square eigenvalues
https://github.com/scikit-learn/scikit-learn/blob/55bf5d9/sklearn/decomposition/pca.py#L444
here?
Also, explained_variance is not computed for new transformed data other than original data used to compute eigen-vectors. Is that not normally done?
pca = PCA(n_components=2, svd_solver='full')
pca.fit(X)
pca.transform(Y)
In this case, won't you separately calculate explained variance for data Y as well. For that purpose, I think we would have to use point 3 instead of using eigen-values.
Explained variance can be also computed by taking the variance of each axis in the transformed space and dividing by the total variance. Any reason that is not done here?
Answers to your questions:
1) The square roots of the eigenvalues of the scatter matrix (e.g. XX.T) are the singular values of X (see here: https://math.stackexchange.com/a/3871/536826). So you square them. Important: the initial matrix X should be centered (data has been preprocessed to have zero mean) in order for the above to hold.
2) Yes this is the way to go. explained_variance is computed based on the singular values. See point 1.
3) It's the same but in the case you describe you HAVE to project the data and then do additional computations. No need for that if you just compute it using the eigenvalues / singular values (see point 1 again for the connection between these two).
Finally, keep in mind that not everyone really wants to project the data. Someone can only get the eigenvalues and then immediately estimate the explained variance WITHOUT projecting the data. So that's the best gold standard way to do it.
EDIT 1:
Answer to edited Point 2
No. PCA is an unsupervised method. It only transforms the X data not the Y (labels).
Again, the explained variance can be computed fast, easily, and with half line of code using the eigenvalues/singular values OR as you said using the projected data e.g. estimating the covariance of the projected data, then variances of PCs will be in the diagonal.
what conclusion can be drawn from the resulting t-stats value When ttest_ind is applied on two independent series?
As you can read here, the scipy.stats.ttest_ind has two outputs
The calculated t-statistic.
The two-tailed p-value.
Very intuitively, you can read the t-statistic as a normalized difference of averages in both populations, considering their variances and sizes:
The larger are the samples, the more serious the difference of averages is because we have more evidence for that.
The larger are the variances, the less serious the difference of averages is because the absolute difference can be given by randomness only.
The higher is the value of the t-statistic, the more serious is the difference.
The p-value makes this intuition more explicit: it is the probability that the difference of averages can be considered as zero. If the p-value is bellow a threshold, e.g. 0.05, we say that the difference in not zero.
For a university course in numerical analysis we are transitioning from Maple to a combination of Numpy and Sympy for various illustrations of the course material. This is because the students already learn Python the semester before.
One of the difficulties we have is in emulating fixed precision in Python. Maple allows the user to specify a decimal precision (say 10 or 20 digits) and from then on every calculation is made with that precision so you can see the effect of rounding errors. In Python we tried some ways to achieve this:
Sympy has a rounding function to a specified number of digits.
Mpmath supports custom precision.
This is however not what we're looking for. These options calculate the exact result and round the exact result to the specified number of digits. We are looking for a solution that does every intermediate calculation in the specified precision. Something that can show, for example, the rounding errors that can happen when dividing two very small numbers.
The best solution so far seems to be the custom data types in Numpy. Using float16, float32 and float64 we were able to al least give an indication of what could go wrong. The problem here is that we always need to use arrays of one element and that we are limited to these three data types.
Does anything more flexible exist for our purpose? Or is the very thing we're looking for hidden somewhere in the mpmath documentation? Of course there are workarounds by wrapping every element of a calculation in a rounding function but this obscures the code to the students.
You can use decimal. There are several ways of usage, for example, localcontext or getcontext.
Example with getcontext from documentation:
>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
Example of localcontext usage:
>>> from decimal import Decimal, localcontext
>>> with localcontext() as ctx:
... ctx.prec = 4
... print Decimal(1) / Decimal(3)
...
0.3333
To reduce typing you can abbreviate the constructor (example from documentation):
>>> D = decimal.Decimal
>>> D('1.23') + D('3.45')
Decimal('4.68')
I searched the site but did not find exactly what I was looking for... I wanted to generate a discrete random number from normal distribution.
For example, if I have a range from a minimum of 4 and a maximum of 10 and an average of 7. What code or function call ( Objective C preferred ) would I need to return a number in that range. Naturally, due to normal distribution more numbers returned would center round the average of 7.
As a second example, can the bell curve/distribution be skewed toward one end of the other? Lets say I need to generate a random number with a range of minimum of 4 and maximum of 10, and I want the majority of the numbers returned to center around the number 8 with a natural fall of based on a skewed bell curve.
Any help is greatly appreciated....
Anthony
What do you need this for? Can you do it the craps player's way?
Generate two random integers in the range of 2 to 5 (inclusive, of course) and add them together. Or flip a coin (0,1) six times and add 4 to the result.
Summing multiple dice produces a normal distribution (a "bell curve"), while eliminating high or low throws can be used to skew the distribution in various ways.
The key is you are going for discrete numbers (and I hope you mean integers by that). Multiple dice throws famously generate a normal distribution. In fact, I think that's how we were first introduced to the Gaussian curve in school.
Of course the more throws, the more closely you approximate the bell curve. Rolling a single die gives a flat line. Rolling two dice just creates a ramp up and down that isn't terribly close to a bell. Six coin flips gets you closer.
So consider this...
If I understand your question correctly, you only have seven possible outcomes--the integers (4,5,6,7,8,9,10). You can set up an array of seven probabilities to approximate any distribution you like.
Many frameworks and libraries have this built-in.
Also, just like TokenMacGuy said a normal distribution isn't characterized by the interval it's defined on, but rather by two parameters: Mean μ and standard deviation σ. With both these parameters you can confine a certain quantile of the distribution to a certain interval, so that 95 % of all points fall in that interval. But resticting it completely to any interval other than (−∞, ∞) is impossible.
There are several methods to generate normal-distributed values from uniform random values (which is what most random or pseudorandom number generators are generating:
The Box-Muller transform is probably the easiest although not exactly fast to compute. Depending on the number of numbers you need, it should be sufficient, though and definitely very easy to write.
Another option is Marsaglia's Polar method which is usually faster1.
A third method is the Ziggurat algorithm which is considerably faster to compute but much more complex to program. In applications that really use a lot of random numbers it may be the best choice, though.
As a general advice, though: Don't write it yourself if you have access to a library that generates normal-distributed random numbers for you already.
For skewing your distribution I'd just use a regular normal distribution, choosing μ and σ appropriately for one side of your curve and then determine on which side of your wanted mean a point fell, stretching it appropriately to fit your desired distribution.
For generating only integers I'd suggest you just round towards the nearest integer when the random number happens to fall within your desired interval and reject it if it doesn't (drawing a new random number then). This way you won't artificially skew the distribution (such as you would if you were clamping the values at 4 or 10, respectively).
1 In testing with deliberately bad random number generators (yes, worse than RANDU) I've noticed that the polar method results in an endless loop, rejecting every sample. Won't happen with random numbers that fulfill the usual statistic expectations to them, though.
Yes, there are sophisticated mathematical solutions, but for "simple but practical" I'd go with Nosredna's comment. For a simple Java solution:
Random random=new Random();
public int bell7()
{
int n=4;
for (int x=0;x<6;++x)
n+=random.nextInt(2);
return n;
}
If you're not a Java person, Random.nextInt(n) returns a random integer between 0 and n-1. I think the rest should be similar to what you'd see in any programming language.
If the range was large, then instead of nextInt(2)'s I'd use a bigger number in there so there would be fewer iterations through the loop, depending on frequency of call and performance requirements.
Dan Dyer and Jay are exactly right. What you really want is a binomial distribution, not a normal distribution. The shape of a binomial distribution looks a lot like a normal distribution, but it is discrete and bounded whereas a normal distribution is continuous and unbounded.
Jay's code generates a binomial distribution with 6 trials and a 50% probability of success on each trial. If you want to "skew" your distribution, simply change the line that decides whether to add 1 to n so that the probability is something other than 50%.
The normal distribution is not described by its endpoints. Normally it's described by it's mean (which you have given to be 7) and its standard deviation. An important feature of this is that it is possible to get a value far outside the expected range from this distribution, although that will be vanishingly rare, the further you get from the mean.
The usual means for getting a value from a distribution is to generate a random value from a uniform distribution, which is quite easily done with, for example, rand(), and then use that as an argument to a cumulative distribution function, which maps probabilities to upper bounds. For the standard distribution, this function is
F(x) = 0.5 - 0.5*erf( (x-μ)/(σ * sqrt(2.0)))
where erf() is the error function which may be described by a taylor series:
erf(z) = 2.0/sqrt(2.0) * Σ∞n=0 ((-1)nz2n + 1)/(n!(2n + 1))
I'll leave it as an excercise to translate this into C.
If you prefer not to engage in the exercise, you might consider using the Gnu Scientific Library, which among many other features, has a technique to generate random numbers in one of many common distributions, of which the Gaussian Distribution (hint) is one.
Obviously, all of these functions return floating point values. You will have to use some rounding strategy to convert to a discrete value. A useful (but naive) approach is to simply downcast to integer.