test for normality - sql

What is the best way to fit test / test normality for each unique ilitm in the below dataset? Thanks

As you know (it is visible in the edit history), Oracle provides the Shapiro-Wilk
test of normality (I link to [R] here, as you will find many more references for that implementation).
The important thing to know is that the OUT parameter sig corresponds to what statisticians call the p-value.
Example
DECLARE
sig NUMBER;
mean NUMBER := 0;
stdev NUMBER := 1;
BEGIN
DBMS_STAT_FUNCS.normal_dist_fit (USER,
'DIST',
'DIST1',
'SHAPIRO_WILKS',
mean,
stdev,
sig);
DBMS_OUTPUT.put_line (sig);
END;
/
You get the following output:
W value : ,9997023261540432791888281834378157820514
,7136528702727722659486194469256296703232
For comparison, here is the same test in R on the same data:
> shapiro.test(df$DIST1)
Shapiro-Wilk normality test
data: df$DIST1
W = 0.9997, p-value = 0.7137
The rest is statistics :)
My interpretation: this test is useful when you need to rule out the coarsest deviations from the normal distribution.
If sig < .05 you may reject the data as not normally distributed, but a high value of sig does not mean the opposite. You only know that you cannot reject the hypothesis of normality.
In any case, a plot of the distribution can provide better insight than a simple true/false test. R is a good resource here as well.
There are some other useful discussions on this topic.
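To illustrate the point about plots, here is a minimal Python sketch that computes the points of a normal Q-Q plot per group and summarises each group by the correlation of those points. This is not the Shapiro-Wilk test and not what DBMS_STAT_FUNCS does; it is a related, simpler diagnostic. The function names and the (key, value) row layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import NormalDist, mean


def qq_points(values):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    observed = sorted(values)
    n = len(observed)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; close to 1 for normal-looking data."""
    theo, obs = qq_points(values)
    mt, mo = mean(theo), mean(obs)
    cov = sum((t - mt) * (o - mo) for t, o in zip(theo, obs))
    var_t = sum((t - mt) ** 2 for t in theo)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / (var_t * var_o) ** 0.5


def qq_correlation_by_group(rows, min_size=3):
    """rows: iterable of (group_key, value); returns {group_key: correlation}."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {k: qq_correlation(v) for k, v in groups.items() if len(v) >= min_size}
```

The pairs returned by qq_points can be passed to any plotting tool; a roughly straight line suggests the group is close to normal, while curvature points to skew or heavy tails.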

Related

Does anyone see any issues with my Fibonacci retracement function?

Pinescript doesn't have a built-in function that calculates Fibonacci retracements yet, so I tried to write my own. It works by taking data from a security() call and running it through a for loop which does the actual calculations. I was hoping that the function would be able to identify support and resistance levels and use that information to help identify when to buy or sell. However, I can't even get the syntax right: I have been looking for hours now and I can't seem to find the issue. Can anyone spot it? If you have any suggestions beyond syntax I'm all ears. I'm pretty new to Pinescript, so maybe there is a simpler way of doing this.
//#version=4
// Function for calculating Fibonacci Retracements
// Inputs: tickerid, 'D' (for daily chart), fibonacci_length
fibonacci(tickerid, chart_type, fibonacci_length) =>
    // Retrieve high and low prices of the given asset
    hl = security(tickerid, chart_type, high, low)
    high = hl[0]
    low = hl[1]
    // Calculate the retracement levels
    fibonacci_levels = fill(0.0,fibonacci_length+1)
    for i = 0 to fibonacci_length
        level = low + (high - low)*i/fibonacci_length
        fibonacci_levels[i] := level
    // Return the calculated retracement
    fibonacci_levels
This resulted in a syntax error on the line that says "fibonacci_levels[i] := level".
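Setting the Pine syntax aside, the arithmetic the loop is aiming for can be sketched in Python. This is only an illustration of the calculation, not a fix for the script. The function names are hypothetical, and the FIB_RATIOS list holds the conventionally quoted retracement ratios, which the post itself does not mention (the posted loop uses evenly spaced fractions instead).

```python
def retracement_levels(low, high, ratios):
    """Return price levels at the given fractions of the low-to-high range."""
    if high < low:
        raise ValueError("high must be >= low")
    return [low + (high - low) * r for r in ratios]


def evenly_spaced_ratios(n):
    """Fractions 0, 1/n, ..., 1, matching the loop in the posted function."""
    return [i / n for i in range(n + 1)]


# Conventionally quoted retracement ratios (an assumption, not from the post).
FIB_RATIOS = [0.0, 0.236, 0.382, 0.5, 0.618, 0.786, 1.0]

if __name__ == "__main__":
    print(retracement_levels(100.0, 200.0, evenly_spaced_ratios(4)))
    print(retracement_levels(100.0, 200.0, FIB_RATIOS))
```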

Blending problem - calculate prices of a product given an increase in profits

I'm new to AMPL.
In a blending problem, I have solved a model by maximizing profit.
Now I'm trying to calculate what the unit price of a given amount of product has to be in order to increase my profit by a certain percentage.
Is it possible to do this directly in the .run file?
By applying "let" I'm able to change the amount of product available, but I'm struggling to figure out how to set the price of the product as a variable. How can I do this?
Thank you.
There are a couple of things you're trying to do here.
One is to solve a modified version of the problem with a requirement that the profit be at least X% better than the previous version of the problem. This can be done as follows:
1. Solve the original version of the problem.
2. Set a param profit_0 equal to the profit from the old problem.
3. Modify the constraints on the problem as appropriate.
4. Change the objective function, so it's now "maximise Supplier 3's price".
5. Re-solve the problem with a new constraint that the profit in the new solution be at least as good as 1.05*profit_0 or whatever the requirement is.
The other is to treat some prices as fixed and others as variables. Probably the simplest way to do this is define them all as variables but then constrain some of them to fixed values.
Coding these in AMPL isn't too hard, and I'll give an example of the syntax below. Unfortunately, the fact that you're multiplying (variable price) by (variable quantity bought) to find your costs means you end up with a quadratic constraint, which many solvers will reject.
In this example I've used Gecode, which isn't ideal for this kind of problem (in particular, it requires all variables be integer) but does at least allow for quadratic constraints:
reset;
option solver gecode;
# all the "integer" constraints in this example are there because
# gecode won't accept non-integer variables; ideally they wouldn't
# be there.
# To keep the demo simple, we assume that we are simply buying
# a single ingredient from suppliers and reselling it, without
# blending considerations.
param ingredients_budget;
# the maximum we can spend on buying ingredients
set suppliers;
param max_supply{suppliers};
# maximum amount each supplier has available
var prices{suppliers} integer >= 0;
# the amount charged by each supplier - in fact we want to treat
# some of those as var and some as constant, which we'll do by
# fixing some of the values.
param fixed_prices{suppliers} default -1;
# A positive value will be interpreted as "fix price at this value";
# negative will be interpreted as variable price.
s.t. fix_prices{s in suppliers: fixed_prices[s] > 0}:
prices[s] = fixed_prices[s];
param selling_price;
var quantity_bought{s in suppliers} >= 0, <= max_supply[s] integer;
var quantity_sold integer;
s.t. max_sales: quantity_sold = sum{s in suppliers} quantity_bought[s];
var input_costs integer;
s.t. defineinputcosts: input_costs = sum{s in suppliers} quantity_bought[s]*prices[s];
s.t. enforcebudget: input_costs <= ingredients_budget;
var profit integer;
s.t. defineprofit: profit = quantity_sold*selling_price - input_costs;
maximize OF1: profit;
data;
param ingredients_budget := 1000;
set suppliers :=
S1
S2
;
param max_supply :=
S1 100
S2 0
;
param fixed_prices :=
S1 120
;
param selling_price := 150;
model;
print("Running first case with nothing available from S2");
solve;
display profit;
display quantity_bought;
display quantity_sold;
param profit_0;
let profit_0 := profit;
param increase_factor = 0.05;
let max_supply["S2"] := 100;
s.t. improveprofit: profit >= (1+increase_factor)*profit_0;
maximize OF2: prices["S2"];
objective OF2;
print("Now running to increase profit.");
solve;
display profit;
display quantity_bought;
display quantity_sold;
display prices["S2"];
Another option is to use AMPL's looping commands to run the same problem repeatedly but changing the relevant price value each time, to see which price values give an acceptable profit. If you do this, you don't need to declare price as a var, just make it a param and use "let" to change it between scenarios. This will avoid the quadratic constraint and allow you to use MIP solvers like CPLEX and Gurobi.
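The looping idea can be sketched outside AMPL as well. The Python below is a rough illustration using the numbers from the example data (S1 fixed at 120, budget 1000, selling price 150, 5% target); the function names are made up for the sketch, and the inner "solve" is a direct enumeration that only works because this toy has two suppliers and no blending.

```python
def best_profit(prices, max_supply, selling_price, budget):
    """Maximum profit for two suppliers with integer purchase quantities."""
    p1, p2 = prices
    m1, m2 = max_supply
    best = 0
    for q1 in range(m1 + 1):
        spent = q1 * p1
        if spent > budget:
            break
        # For a fixed q1, buying more from supplier 2 helps only if it earns a margin.
        q2 = min(m2, (budget - spent) // p2) if p2 < selling_price else 0
        profit = (q1 + q2) * selling_price - spent - q2 * p2
        best = max(best, profit)
    return best


def highest_acceptable_price(p1, max_supply, selling_price, budget, increase):
    """Scan candidate prices for supplier 2, treating price as a parameter."""
    # Baseline: nothing available from supplier 2 (its price is then irrelevant).
    baseline = best_profit((p1, 1), (max_supply[0], 0), selling_price, budget)
    target = (1 + increase) * baseline
    acceptable = [
        p2
        for p2 in range(1, selling_price + 1)
        if best_profit((p1, p2), max_supply, selling_price, budget) >= target
    ]
    return baseline, (max(acceptable) if acceptable else None)


if __name__ == "__main__":
    print(highest_acceptable_price(120, (100, 100), 150, 1000, 0.05))
```

Because each scenario has a fixed price, every individual solve stays linear, which is the point of the param-plus-"let" approach.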
You could also ask on the Operations Research SE, where you might get some better answers than I can give you.

AMPL Sum variables operator

I am trying to solve a set of problems using AMPL and add their objective values. However, the sum operator does not seem to work and only keeps getting updated to the most recent value.
set CASES := {1,2,3,4,5,6};
model modelFile.mod;
option solver cplex;
option eexit -123456789;
var total;
let total := 0;
for {j in CASES}
{
reset data;
data ("data" & j & ".dat")
solve;
display total_Cost;
let total := total + total_Cost;
display total;
}
Sample Output:
CPLEX 12.6.3.0: optimal solution; objective 4.236067977
2 dual simplex iterations (0 in phase I)
total_Cost = 4.23607
total = 4.23607
CPLEX 12.6.3.0: optimal solution; objective 5.656854249
5 dual simplex iterations (0 in phase I)
total_Cost = 5.65685
total = 5.65685
where total_Cost is the objective value from the optimization problem.
Since AMPL is an algebraic modeling language rather than a general-purpose programming language, variables in it denote optimization variables, which are determined during the solution process. So each time you call solve, the optimization variable total is reset. What you need here is a parameter, which, unlike a variable, is not changed during the optimization:
param total;
I finally realized that this happened due to the new keyword "reset data" that AMPL has. By changing the keyword to "update", the code works.

Can I run a GA to optimize wavelet transform?

I am running a wavelet transform (cmor) to estimate the damping and frequencies that exist in a signal. cmor has two parameters that I can change to get more accurate results: the center frequency (Fc) and the bandwidth frequency (Fb). If I construct a signal with a few frequencies and dampings, then I can measure the error of my estimation (fig 2). But in the actual case I have a signal whose frequencies and dampings I don't know, so I can't measure the error. A friend here suggested that I reconstruct the signal and find the error by measuring the difference between the original and reconstructed signals, e(t) = |x(t) − x̂(t)|.
So my questions are:
Does anyone know a better function for the error between the reconstructed and original signals than e(t) = |x(t) − x̂(t)|?
Can I use a GA to search for Fb and Fc, or do you know a better search method?
I hope this picture shows what I mean; the actual case is the last one, and the others are for explanation.
Thanks in advance.
You say you don't know the error until after running the wavelet transform, but that's fine. You just run a wavelet transform for every individual the GA produces. Those individuals with lower errors are considered fitter and survive with greater probability. This may be very slow, but conceptually at least, that's the idea.
Let's define a Chromosome datatype containing an encoded pair of values, one for the frequency and another for the damping parameter. Don't worry too much about how they're encoded for now; just assume it's an array of two doubles if you like. All that's important is that you have a way to get the values out of the chromosome. For now, I'll just refer to them by name, but you could represent them in binary, as an array of doubles, etc. The other member of the Chromosome type is a double storing its fitness.
We can obviously generate random frequency and damping values, so let's create say 100 random Chromosomes. We don't know how to set their fitness yet, but that's fine. Just set it to zero at first. To set the real fitness value, we're going to have to run the wavelet transform once for each of our 100 parameter settings.
for Chromosome chr in population
    chr.fitness = run_wavelet_transform(chr.frequency, chr.damping)
end
Now we have 100 possible wavelet transforms, each with a computed error, stored in our set called population. What's left is to select fitter members of the population, breed them, and allow the fitter members of the population and offspring to survive into the next generation.
while not done
    offspring = new_population()
    while count(offspring) < N
        parent1, parent2 = select_parents(population)
        child1, child2 = do_crossover(parent1, parent2)
        mutate(child1)
        mutate(child2)
        child1.fitness = run_wavelet_transform(child1.frequency, child1.damping)
        child2.fitness = run_wavelet_transform(child2.frequency, child2.damping)
        offspring.add(child1)
        offspring.add(child2)
    end while
    population = merge(population, offspring)
end while
There are a bunch of different ways to do the individual steps like select_parents, do_crossover, mutate, and merge here, but the basic structure of the GA stays pretty much the same. You just have to run a brand new wavelet decomposition for every new offspring.
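As one possible concrete version of that structure, here is a small Python sketch. The specific choices (size-2 tournament selection, arithmetic crossover, Gaussian mutation, elitist merge) are just one option among many, and estimation_error is a made-up stand-in for the real step of running the wavelet transform and measuring the reconstruction error.

```python
import random


def estimation_error(fc, fb):
    """Hypothetical stand-in for the wavelet run; lowest at fc=1.5, fb=3.0."""
    return (fc - 1.5) ** 2 + (fb - 3.0) ** 2


def run_ga(error_fn, bounds, pop_size=30, generations=40, seed=0):
    """Minimise error_fn over box bounds; returns ((error, genes), best-error history)."""
    rng = random.Random(seed)

    def evaluate(genes):
        # One (expensive) error evaluation per individual.
        return (error_fn(*genes), genes)

    def select(pop):
        # Tournament of size 2: the individual with the lower error wins.
        a, b = rng.sample(pop, 2)
        return a if a[0] <= b[0] else b

    def crossover(g1, g2):
        # Arithmetic crossover: two complementary blends of the parents.
        w = rng.random()
        c1 = [w * x + (1 - w) * y for x, y in zip(g1, g2)]
        c2 = [(1 - w) * x + w * y for x, y in zip(g1, g2)]
        return c1, c2

    def mutate(genes):
        # Gaussian perturbation with probability 0.2 per gene, clipped to bounds.
        out = []
        for g, (lo, hi) in zip(genes, bounds):
            if rng.random() < 0.2:
                g = min(hi, max(lo, g + rng.gauss(0, 0.1 * (hi - lo))))
            out.append(g)
        return out

    population = [
        evaluate([rng.uniform(lo, hi) for lo, hi in bounds]) for _ in range(pop_size)
    ]
    history = [min(population)[0]]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(population), select(population)
            for child in crossover(p1[1], p2[1]):
                offspring.append(evaluate(mutate(child)))
        # Elitist merge: keep the pop_size best of parents plus offspring.
        population = sorted(population + offspring)[:pop_size]
        history.append(population[0][0])
    return population[0], history


if __name__ == "__main__":
    best, history = run_ga(estimation_error, [(0.5, 5.0), (0.5, 5.0)])
    print(best)
```

With the elitist merge the best error never gets worse from one generation to the next, which makes the run easy to monitor when each evaluation is slow.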

Constrained Single-Objective Optimization

Introduction
I need to split an array filled with a certain type (let's take water buckets for example) with two values set (in this case weight and volume), while keeping the difference between the total weights of the two parts to a minimum (preferred) and the difference between the total volumes below 1000 (required). This doesn't need to be a full-fledged genetic algorithm or something similar, but it should be better than what I currently have...
Current Implementation
Because I didn't know how to do it better, I started by splitting the array into two arrays of the same length (the original array can hold an odd number of items), filling a possibly empty spot with an item whose values are both 0. The sides don't need to have the same number of items; I just didn't know how to handle it otherwise.
After having these distributed, I'm trying to optimize them like this:
func (main *Main) Optimize() {
	for {
		difference := main.Difference(WEIGHT)
		for i := 0; i < len(main.left); i++ {
			for j := 0; j < len(main.right); j++ {
				if main.DifferenceAfter(i, j, WEIGHT) < main.Difference(WEIGHT) {
					main.left[i], main.right[j] = main.right[j], main.left[i]
				}
			}
		}
		if difference == main.Difference(WEIGHT) {
			break
		}
	}
	for main.Difference(CAPACITY) > 1000 {
		leftIndex := 0
		rightIndex := 0
		liters := 0
		weight := 100
		for i := 0; i < len(main.left); i++ {
			for j := 0; j < len(main.right); j++ {
				if main.DifferenceAfter(i, j, CAPACITY) < main.Difference(CAPACITY) {
					newLiters := main.Difference(CAPACITY) - main.DifferenceAfter(i, j, CAPACITY)
					newWeight := main.Difference(WEIGHT) - main.DifferenceAfter(i, j, WEIGHT)
					if newLiters > liters && newWeight <= weight || newLiters == liters && newWeight < weight {
						leftIndex = i
						rightIndex = j
						liters = newLiters
						weight = newWeight
					}
				}
			}
		}
		main.left[leftIndex], main.right[rightIndex] = main.right[rightIndex], main.left[leftIndex]
	}
}
Functions:
main.Difference(const) calculates the absolute difference between the two sides, the constant taken as an argument decides the value to calculate the difference for
main.DifferenceAfter(i, j, const) simulates a swap between the two buckets, i being the left one and j being the right one, and calculates the resulting absolute difference then, the constant again determines the value to check
Explanation:
Basically this starts by optimizing the weight, which is what the first for-loop does. On every iteration, it tries every possible combination of buckets that can be switched and if the difference after that is less than the current difference (resulting in better distribution) it switches them. If the weight doesn't change anymore, it breaks out of the for-loop. While not perfect, this works quite well, and I consider this acceptable for what I'm trying to accomplish.
Then it's supposed to optimize the distribution based on the volume, so that the total difference is less than 1000. Here I tried to be more careful and search for the best combination in a pass before switching. It therefore searches for the bucket switch that results in the biggest capacity change, and it is also supposed to trade this off against the weight change. I do see a flaw, though: the first bucket combination tried sets the liters and weight variables, which greatly reduces the number of later combinations that can still be accepted.
Conclusion
I think I need to include some more math here, but I'm honestly stuck and don't know how to continue, so I'd like to get some help from you. Basically anything that can help me here is welcome.
As previously said, your problem is actually a constrained optimisation problem with a constraint on your difference of volumes.
Mathematically, this means minimising the difference of weights under the constraint that the difference of volumes is less than 1000. The simplest way to express it as a linear integer optimisation problem would be (the absolute values can be linearised with auxiliary variables):
min |weights . x|
subject to |volumes . x| < 1000.0
for all i, x[i] = +1 or -1
Where a . b is the vector dot product. Once this problem is solved, all indices where x = +1 correspond to your first array, all indices where x = -1 correspond to your second array.
Unfortunately, 0-1 integer programming is known to be NP-hard. The simplest way of solving it is an exhaustive brute-force exploration of the space, but that requires testing all 2^n possible vectors x (where n is the length of your original weights and volumes vectors), which can quickly get out of hand. There is a lot of literature on this topic, with more efficient algorithms, but they are often highly specific to a particular set of problems and/or constraints. You can google "linear integer programming" to see what has been done on this topic.
I think the simplest might be to perform a heuristic-based brute-force search, where you prune your search tree early when a branch would take you outside your volume constraint, and stay close to your constraint (as a general rule, the solutions of linear optimisation problems lie on the edge of the feasible space).
Here are a couple of articles you might want to read on this kind of optimisation:
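A pruned exhaustive search of this kind could look roughly like the following Python sketch. It assumes non-negative weights and volumes and uses the +1/-1 encoding described above; split_buckets and its return shape are names chosen for the example. It is still exponential in the worst case, so it is only meant for small n.

```python
def split_buckets(weights, volumes, volume_limit=1000):
    """Exact search over x[i] in {+1, -1}.

    Minimises |sum(weights[i] * x[i])| subject to
    |sum(volumes[i] * x[i])| < volume_limit.
    Returns (best_weight_difference, x), or (None, None) if no split is feasible.
    Assumes all weights and volumes are non-negative.
    """
    n = len(weights)
    # rem_w[i] / rem_v[i]: total weight / volume of the still unassigned items i..n-1.
    rem_w = [0] * (n + 1)
    rem_v = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        rem_w[i] = rem_w[i + 1] + weights[i]
        rem_v[i] = rem_v[i + 1] + volumes[i]

    best = {"diff": None, "x": None}
    x = [0] * n

    def search(i, w, v):
        # Prune: the remaining items cannot pull the volume difference under the limit.
        if abs(v) - rem_v[i] >= volume_limit:
            return
        # Prune: this branch cannot beat the best weight difference found so far.
        if best["diff"] is not None and abs(w) - rem_w[i] >= best["diff"]:
            return
        if i == n:
            best["diff"], best["x"] = abs(w), x[:]
            return
        for sign in (1, -1):
            x[i] = sign
            search(i + 1, w + sign * weights[i], v + sign * volumes[i])

    search(0, 0, 0)
    return best["diff"], best["x"]


if __name__ == "__main__":
    print(split_buckets([6, 3, 3], [100, 900, 900]))
```

Items with x[i] = +1 go to the first array and items with x[i] = -1 to the second. Sorting the items by decreasing weight before the search usually makes the bounds bite earlier.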
UCLA Linear integer programming
MIT course on Integer programming
Carleton course on Binary programming
Articles on combinatorial optimisation & linear integer programming
If you are not familiar with optimisation articles or math in general, the Wikipedia articles provide a good introduction, and most articles on this topic quickly show some (pseudo)code you can adapt right away.
If your n is large, I think at some point you will have to make a trade-off between how optimal your solution is and how fast it can be computed. Your current solution is probably suboptimal, but it is much faster than the exhaustive search. There might be a better trade-off, depending on the exact configuration of your problem.
It seems that in your case the difference of weight is the objective, while the difference of volume is just a constraint. That means you are looking for solutions that optimize the difference of weight (as small as possible) and satisfy the condition on the difference of volume (total < 1000). In this case, it's a single-objective constrained optimization problem.
If, on the other hand, you are interested in multi-objective optimization, you may want to look at the concept of the Pareto frontier: http://en.wikipedia.org/wiki/Pareto_efficiency . It's good for keeping multiple good solutions with advantages in different objectives, i.e., not losing diversity.
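For the multi-objective view, a minimal Python sketch of extracting the Pareto frontier from a list of candidate splits might look like this. Each candidate is assumed to be a tuple of objective values such as (weight difference, volume difference), with lower being better in every position; pareto_front is a name chosen for the example.

```python
def pareto_front(points):
    """Return the points that no other point dominates (lower is better everywhere)."""

    def dominates(a, b):
        # a dominates b if it is no worse in every objective and better in at least one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(1, 5), (2, 2), (3, 1), (3, 3), (5, 5)]
    print(pareto_front(candidates))
```

This pairwise check is quadratic in the number of candidates, which is fine for a handful of candidate splits.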