SAS Proc Optmodel Syntax and Strategy - optimization

I have a data set that looks like this (SAS 9.4):
data opt_test;
input ID GRP $ x1 MIN MAX y z;
cards;
2 F 10 9 11 1.5 100
3 F 10 9 11 1.2 50
4 F 11 9 11 .9 20
8 G 5 4 6 1.2 300
9 G 6 4 6 .9 200
;
run;
I want to create a new variable x2 that maximizes a function based on x1, x2, y, and z.
I am having two main problems:
The syntax on my proc optmodel has some errors that I have not been able to fix: "Subscript 1 may not be a set" and "constraint has incomplete declaration". UPDATE: I figured this part out.
I need for the value of x2 to be the same for all members of the same GRP. So, id 2,3,4 would have same x2. ID 8 and 9 would have same x2.
Below is my attempt. This will ultimately need to run with several different GRP values, each with a varying number of IDs.
Thanks in advance for any assistance.
proc optmodel;
set<num> ID;
var x2{ID} >= 0;
string GRP{ID};
number x1{ID};
number MIN{ID};
number MAX{ID};
number y{ID};
number z{ID};
max sales=sum{i in ID}(x2[i])*(1-(x2[i]-x1[i])*y[i]/x1[i])*z[i];
con floor_METRIC1{i in ID}: x2[i]>=MIN[i];
con ceiling_METRIC1{i in ID}: x2[i]<=MAX[i];
read data opt_test into
ID=[ID]
GRP
x1
MIN
MAX
y
z
;
solve;
print x2;
quit;

If you want the value of x2 to be the same for all ids in the same group, then you only need one variable x2 per group. To keep track of which ids are in which group you could use an array of sets indexed by group:
set<num> ID;
string GRP{ID};
set<str> GRPS = setof{i in ID} GRP[i];
set IDperGRP{gi in GRPS} = {i in ID: GRP[i] = gi};
When you use = (as opposed to init), you provide OPTMODEL with a function you don't need to update later. If you change any of the GRP or ID data, optmodel will recompute GRPS and IDperGRP as needed.
Now you can use the GRPS set and the IDperGRP array of sets to rewrite your objective to more closely match your business rule:
max sales = sum{gi in GRPS} sum{i in IDperGRP[gi]}
(x2[gi]) * (1-(x2[gi]-x1[i])*y[i]/x1[i]) * z[i];
Writing the expression this way makes it clearer (to me at least) that it can be simplified further as long as x1, y, and z are constants.
Adding the new sets also makes it clearer (to me at least) that the bounds from floor_METRIC1 and ceiling_METRIC1 can be combined to tighten the domain of x2. Since MIN and MAX are constants, you can move the constraints into direct variable bounds by adding >= and <= clauses to the declaration of x2. Since x2 will now depend on MIN and MAX, you will have to declare those before x2. IMHO that makes your intent clearer:
number MIN{ID};
number MAX{ID};
var x2{gi in GRPS} >= max{i in IDperGRP[gi]} MIN[i]
<= min{i in IDperGRP[gi]} MAX[i]
;
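As a rough cross-check outside SAS, the grouping and the combined bounds can be sketched in Python. With one x2 per group, each group's objective expands to A*x2 - B*x2^2, where A = sum of z*(1+y) and B = sum of z*y/x1 over the group's ids, so (assuming B > 0) the maximizer is A/(2B) clipped to the combined bounds. The function name and data layout are illustrative assumptions; this is not what OPTMODEL does internally.

```python
from collections import defaultdict

# Rows from opt_test: (ID, GRP, x1, MIN, MAX, y, z)
ROWS = [
    (2, "F", 10, 9, 11, 1.5, 100),
    (3, "F", 10, 9, 11, 1.2, 50),
    (4, "F", 11, 9, 11, 0.9, 20),
    (8, "G", 5, 4, 6, 1.2, 300),
    (9, "G", 6, 4, 6, 0.9, 200),
]

def solve_per_group(rows):
    """Return {group: x2} maximizing each group's quadratic within its bounds."""
    groups = defaultdict(list)               # plays the role of IDperGRP
    for row in rows:
        groups[row[1]].append(row)
    result = {}
    for grp, members in groups.items():
        lower = max(r[3] for r in members)   # tightest MIN in the group
        upper = min(r[4] for r in members)   # tightest MAX in the group
        a = sum(r[6] * (1 + r[5]) for r in members)
        b = sum(r[6] * r[5] / r[2] for r in members)
        vertex = a / (2 * b)                 # peak of a*x - b*x^2, assumes b > 0
        result[grp] = min(max(vertex, lower), upper)
    return result

print(solve_per_group(ROWS))
```

On the sample data this gives 9 for group F (the vertex, about 8.79, falls below the combined lower bound) and about 5.10 for group G.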


Combining many sort ranks into one master sort rank

Say I have some sorted result from a SQL query that looks like:
x y z
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 2 0
0 2 1
Where x, y and z are sort ranks. These sort ranks are always greater than 0, and smaller than 500mil.
Is there a way to combine the values from x, y and z into one "master" sort rank? Sorting the dataset using this "master" sort rank should result in the same ordering.
I'm thinking I can do something with bit shifting but I am not sure...
Assuming that every value in each of the three columns is between 1 and 500 million, you could use the following formula to generate a unique rank:
z + (500 x 10^6)*y + (500 x 10^6)*(500 x 10^6)*x
To generate this rank you could use the following query:
SELECT
x, y, z,
z + (500 * 1000000)*y + (500 * 1000000)*(500 * 1000000)*x AS master_rank
FROM yourTable;
The reason this works can be seen by examining, say, the z and y columns. Every value of z is below 500 million, while the y term changes in steps of 500 million, so a difference in y always outweighs any difference in z. The same logic applies between y and x. This approach is similar to packing fields into a bit mask, just with base 500 million instead of a power of two.
Note that I assume that your version of SQL can tolerate numbers this large (the x term can reach about 1.25 x 10^26, which is beyond a 64-bit integer). If it can't, then you might want to consider another approach here, possibly just ordering as @Gordon mentioned in his answer. Besides this, having 1 bil x 1 bil records would make for a very large table and would have other problems.
Do you mean something like this?
order by x * 10000 + y * 100 + z
(You would adjust the numbers for the width you need.)
I'm not sure why you would want to do that instead of:
order by x, y, z
If you do combine into a single value, be careful about integer overflow.
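A small Python sketch of both points: the packed value sorts the same way as the three columns, and the largest possible value does not fit in a signed 64-bit integer. The name master_rank and the constant BASE are made up for this illustration, and the rows are the sample rows from the question in scrambled order.

```python
BASE = 500_000_000  # every rank is assumed to be below this

def master_rank(x, y, z):
    """Pack three ranks into one number, most significant first."""
    return (x * BASE + y) * BASE + z

rows = [(0, 2, 1), (0, 0, 2), (0, 1, 0), (0, 0, 0), (0, 2, 0), (0, 0, 1), (0, 1, 1)]

by_columns = sorted(rows)                                 # ORDER BY x, y, z
by_master = sorted(rows, key=lambda r: master_rank(*r))   # ORDER BY master_rank

# Python integers never overflow, so the size can be checked directly.
largest = master_rank(BASE - 1, BASE - 1, BASE - 1)
fits_in_int64 = largest < 2**63

print(by_columns == by_master, fits_in_int64)
```

The two orderings agree, but fits_in_int64 is False, so a BIGINT column is not enough for the full range of three ranks.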

Shuffle data in a repeatable way (ability to get the same "random" order again)

This is the opposite of what most "random order" questions are about.
I want to select data from a database in random order. But I want to be able to repeat certain selects, getting the same order again.
Current (random) select:
SELECT custId, rand() as random from
(
SELECT DISTINCT custId FROM dummy
)
Using this, every key/row gets a random number. Ordering those ascending results in a random order.
But I want to repeat this select, getting the very same order again. My idea is to calculate a random number (r) once per session (e.g. "4") and use this number to shuffle the data in some way.
My first idea:
SELECT custId, custId * 4 as random from
(
SELECT DISTINCT custId FROM dummy
)
(in real life "4" would be something like 4005226664240702)
This results in a different number for each line but the same ones every run. By changing "r" to 5 all numbers will change.
The problem is: multiplication is not sufficient here. It just increases the numbers but keeps the order the same. Therefore I need some other kind of arithmetic function.
More abstract
Starting with my data (A-D). k is the key and r is the random number currently used:
k r
A = 1 4
B = 2 4
C = 3 4
D = 4 4
Doing some calculation using k and r in every line I want to get something like:
k r
A = 1 4 --> 12
B = 2 4 --> 13
C = 3 4 --> 11
D = 4 4 --> 10
The numbers can be whatever they want, but when I order them ascending I want to get a different order than the initial one. In this case D, C, A, B.
Setting r to 7 should result in a different order (C, A, B, D):
k r
A = 1 7 --> 56
B = 2 7 --> 78
C = 3 7 --> 23
D = 4 7 --> 80
Every time I use r = 7 should result in the same numbers => same order.
I'm looking for a mathematical function to do the calculation with k and r. Seeding the RAND() function is not suitable because some of the databases we support don't allow it.
Please note that r is already a randomly generated number.
Background
One Table - Two data consumers. One consumer will get a random 5% of the table, the other one the other 95%. They don't just get the data but a generated SQL statement. So there are two SQL statements which must not select the same data twice but must still be random.
You could try and implement the Multiply-With-Carry PseudoRandomNumberGenerator. The C version goes like this (source: Wikipedia):
m_w = <choose-initializer>; /* must not be zero, nor 0x464fffff */
m_z = <choose-initializer>; /* must not be zero, nor 0x9068ffff */
uint get_random()
{
m_z = 36969 * (m_z & 65535) + (m_z >> 16);
m_w = 18000 * (m_w & 65535) + (m_w >> 16);
return (m_z << 16) + m_w; /* 32-bit result */
}
In SQL, you could create a table Random, with two columns to contain w and z, and one ID column to identify each session. Perhaps your vendor supports variables and you need not bother with the table.
Nonetheless, even if we use a table, we immediately run into trouble because ANSI SQL doesn't support unsigned INTs. In SQL Server I could switch to BIGINT; I'm unsure whether your vendor supports that.
CREATE TABLE Random (ID INT, [w] BIGINT, [z] BIGINT)
Initialize a new session, say number 3, by inserting 1 into z and the seed into w:
INSERT INTO Random (ID, w, z) VALUES (3, 8921, 1);
Then each time you wish to generate a new random number, do the computations:
UPDATE Random
SET
z = (36969 * (z % 65536) + z / 65536) % 4294967296,
w = (18000 * (w % 65536) + w / 65536) % 4294967296
WHERE ID = 3
(Note how I have replaced the bitwise operators with div and mod operations and how, after computing, you need to mod 4294967296 to stay within the proper 32-bit unsigned int range.)
And select the new value:
SELECT (z * 65536 + w) % 4294967296
FROM Random
WHERE ID = 3
SQLFiddle demo
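To check that the div/mod rewrite matches the bitwise C version, here is a minimal Python sketch of both. The function names are invented for this illustration, and the seed values are the ones from the INSERT above. It also shows the property the question needs: the same starting state always yields the same sequence.

```python
def mwc_next(w, z):
    """One step using the same div/mod arithmetic as the UPDATE statement."""
    z = (36969 * (z % 65536) + z // 65536) % 4294967296
    w = (18000 * (w % 65536) + w // 65536) % 4294967296
    value = (z * 65536 + w) % 4294967296
    return w, z, value

def mwc_next_bitwise(w, z):
    """The same step written with the bitwise operators of the C version."""
    z = (36969 * (z & 65535) + (z >> 16)) & 0xFFFFFFFF
    w = (18000 * (w & 65535) + (w >> 16)) & 0xFFFFFFFF
    value = ((z << 16) + w) & 0xFFFFFFFF
    return w, z, value

def sequence(step, w, z, count):
    """Collect `count` values starting from the state (w, z)."""
    out = []
    for _ in range(count):
        w, z, value = step(w, z)
        out.append(value)
    return out

first = sequence(mwc_next, 8921, 1, 5)
again = sequence(mwc_next, 8921, 1, 5)
bitwise = sequence(mwc_next_bitwise, 8921, 1, 5)
print(first == again, first == bitwise)
```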
Not sure if this applies outside SQL Server, but typically when you use a RAND() function, you can specify a seed. Every time you specify the same seed, the randomization will be the same.
So, it sounds like you just need to store the seed number and use that each time to get the same set of random numbers.
MSDN Article on RAND
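The seed idea is easy to see outside the database. In this hedged Python sketch, a private generator is created from the stored seed, so repeating the call with the same seed reproduces the same order. The function name shuffled is an assumption for illustration, and the guarantee only holds for the same Python version and algorithm.

```python
import random

def shuffled(keys, seed):
    """Return the keys in a pseudo-random order that depends only on the seed."""
    rng = random.Random(seed)   # private generator, leaves global state untouched
    order = list(keys)
    rng.shuffle(order)
    return order

cust_ids = ["A", "B", "C", "D"]
print(shuffled(cust_ids, 4) == shuffled(cust_ids, 4))
```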
Each vendor has solved this in its own way. Creating your own implementation will be hard, since random number generation is difficult.
Oracle
dbms_random can be initialized with a seed: http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_random.htm#i998255
SQL Server
First call to RAND() can provide a seed: http://technet.microsoft.com/en-us/library/ms177610.aspx
MySql
First call to RAND() can provide a seed: http://dev.mysql.com/doc/refman/4.1/en/mathematical-functions.html#function_rand
Postgresql
Use SET SEED or SELECT setseed() : http://www.postgresql.org/docs/8.3/static/sql-set.html

How to choose a range for a loop based upon the answers of a previous loop?

I'm sorry the title is so confusingly worded, but it's hard to condense this problem down to a few words.
I'm trying to find the minimum value of a specific equation. At first I'm looping through the equation, which for our purposes here can be something like y = .245x^3-.67x^2+5x+12. I want to design a loop where the "steps" through the loop get smaller and smaller.
For example, the first time it loops through, it uses a step of 1. I will get about 30 values. What I need help with is how to use the three smallest values I receive from this first loop.
Here's an example of the values I might get from the first loop: (I should note this isn't supposed to be actual code at all. It's just a brief description of what's happening)
loop from x = 1 to 8 with step 1
results:
x = 1 -> y = 30
x = 2 -> y = 28
x = 3 -> y = 25
x = 4 -> y = 21
x = 5 -> y = 18
x = 6 -> y = 22
x = 7 -> y = 27
x = 8 -> y = 33
I want something that can detect the lowest three values and create a loop from them. From these results, the values of x that give the three smallest results for y are x = 4, 5, and 6.
So my "guess" at this point would be x = 5. To get a better "guess" I'd like a loop that now does:
loop from x = 4 to x = 6 with step .5
I could keep this pattern going until I get an absurdly accurate guess for the minimum value of x.
Does anybody know of a way I can do this? I know the values I'm going to get are going to be able to be modeled by a parabola opening up, so this format will definitely work. I was thinking that the values could be put into a column. It wouldn't be hard to make something that returns the smallest value for y in that column, and the corresponding x-value.
If I'm being too vague, just let me know, and I can answer any questions you might have.
Nice question. Here's at least a start for what I think you should do for this:
Sub findMin()
    ' x values that give the three smallest results found so far
    Dim lowest As Integer
    Dim middle As Integer
    Dim highest As Integer
    ' Sentinel: retVal(999) is far larger than anything in the loop range
    lowest = 999
    middle = 999
    highest = 999
    Dim i As Integer
    i = 1
    Do While i < 9
        If retVal(i) < retVal(lowest) Then
            highest = middle
            middle = lowest
            lowest = i
        ElseIf retVal(i) < retVal(middle) Then
            highest = middle
            middle = i
        ElseIf retVal(i) < retVal(highest) Then
            highest = i
        End If
        i = i + 1
    Loop
    ' Show the three x values in the Immediate window
    Debug.Print lowest, middle, highest
End Sub
Function retVal(num As Integer) As Double
    ' y = .245x^3 - .67x^2 + 5x + 12 (Sqr is the square root in VBA, so use ^)
    retVal = 0.245 * num ^ 3 - 0.67 * num ^ 2 + 5 * num + 12
End Function
What I've done here is set three Integers as your three Min values: lowest, middle, and highest. You loop through the values you're plugging into the formula (here, the retVal function) and compare the return value of retVal (hence the name) to the values of retVal(lowest), retVal(middle), and retVal(highest), replacing them as necessary. I'm just beginning with VBA so what I've done likely isn't very elegant, but it does at least identify the Integers that result in the lowest values of the function. You may have to play around with the initial values of lowest, middle, and highest a bit to make it work. I know this isn't EXACTLY what you're looking for, but it's something along the lines of what I think you should do.
There is no trivial way to approach this unless the problem domain is narrowed.
The example polynomial given in fact has no minimum, which is readily determined by observing y'>0 (hence, y is always increasing WRT x).
Given the wide interpretation of
[an] equation, which for our purposes here can be something like y =
.245x^3-.67x^2+5x+12
many conditions need to be checked, even assuming the domain is limited to polynomials.
The polynomial order is significant, and the order determines what conditions are necessary to check for how many solutions are possible, or whether any solution is possible at all.
Without taking this complexity into account, an iterative approach could yield an incorrect solution due to underflow error, or an unfortunate choice of iteration steps or bounds.
I'm not trying to be hard here, I think your idea is neat. In practice it is more complicated than you think.
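For the case the question describes (values shaped like a parabola opening up), the shrinking-step idea can be sketched in Python as below. This is a minimal sketch that assumes the function has a single dip on the interval; the name refine_minimum and the bowl function are hypothetical. Run on the example cubic, it simply returns the left end of the range, which matches the observation that the cubic has no interior minimum.

```python
def refine_minimum(f, lo, hi, step=1.0, rounds=20):
    """Grid-search [lo, hi], then repeatedly zoom in around the best point."""
    best = lo
    for _ in range(rounds):
        count = int(round((hi - lo) / step))
        xs = [lo + i * step for i in range(count + 1)]
        best = min(xs, key=f)
        # Keep one step on each side of the best point, without leaving [lo, hi].
        lo, hi = max(lo, best - step), min(hi, best + step)
        step /= 2
    return best

def bowl(x):
    """Hypothetical parabola opening up, with its minimum at x = 5.2."""
    return (x - 5.2) ** 2 + 18

def cubic(x):
    """The example from the question; it increases everywhere."""
    return 0.245 * x**3 - 0.67 * x**2 + 5 * x + 12

bowl_min = refine_minimum(bowl, 1, 8)
cubic_min = refine_minimum(cubic, 1, 8)
print(bowl_min, cubic_min)
```

Each round halves the step and keeps only the neighbours of the current best x, so the work per round stays small while the estimate tightens.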

Divide int into 2 other int

I need to divide one int into 2 other ints. The first int is not constant, so one problem is what to do with odd numbers, because I only want whole numbers. For example, if int = 5, then int(2) will = 2 and int(3) will = 3. Any help will be greatly appreciated.
Supposing you want to express x = a + b, where a and b are as close to x/2 as possible:
a = ceiling(x / 2.0);
b = floor(x / 2.0);
That's pseudo code, you have to find out the actual functions for floor and ceiling from your library. Make sure the division is performed as floating point numbers.
As pure integers:
a = x / 2 + (x % 2 == 0 ? 0 : 1);
b = x / 2;
(This may be a bit fishy for negative numbers, because it'll depend on the behaviour of division and modulo for negative numbers.)
You can try the ceil and floor functions from math to produce results like 2 and 3 for odd inputs. Divide by 2.0 so the division is not truncated before ceil is applied:
int(2)=ceil(int/2.0); //will produce 3 for input 5
int(3)=floor(int/2.0); //will produce 2 for input 5
Well, my answer is not in Objective-C but I guess you could translate this easily.
My idea is:
part1 = source_number div 2
part2 = source_number div 2 + (source_number mod 2)
This way the second number will be bigger if the starting number is an odd number.
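A minimal Python sketch of the same idea, computing the second part by subtraction so that the two parts always add up to the original number, including for negative inputs. The function name split_in_two is made up for this illustration.

```python
def split_in_two(x):
    """Split x into two whole numbers that differ by at most 1 and sum to x."""
    smaller = x // 2        # floor division
    larger = x - smaller    # whatever is left, so the sum is always x
    return smaller, larger

print(split_in_two(5), split_in_two(4), split_in_two(-5))
```

For 5 this gives (2, 3), for 4 it gives (2, 2), and for -5 it gives (-3, -2).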

How would I do this in a program? Math question

I'm trying to make a generic equation which converts a value. Here are some examples.
9,873,912 -> 9,900,000
125,930 -> 126,000
2,345 -> 2,400
280 -> 300
28 -> 30
In general, x -> n
Basically, I'm making a graph and I want to make values look nicer. If it's a 6 digit number or higher, there should be at least 3 zeros. If it's a 4 digit number or less, there should be at least 2 zeros, except if it's a 2 digit number, where 1 zero is fine.
(Ignore the commas. They are just there to help read the examples). Anyways, I want to convert a value x to this new value n. What is an equation g(x) which spits out n?
It is for an objective-c program (iPhone app).
Divide, truncate and multiply.
10**x * int(n / 10**x)
What is "x"? The number of trailing digits to zero out, int(log10(n)) + 1 - d. In your examples it's about int(log10(n))-1.
What is "d"? That's the number of significant digits. 2 or 3.
Ahhh rounding is a bit awkward in programming in general. What I would suggest is dividing by the power of ten, int cast and multiplying back. Not remarkably efficient but it will work. There may be a library that can do this in Objective-C but that I do not know.
if ( x > 99999 ) {
x = ((int)x / 1000) * 1000;
}
else if ( x > 999 ) {
x = ((int) x / 100) * 100;
}
else if ( x > 9 ) {
x = ((int) x / 10) * 10;
}
Use standard C functions like round() or roundf()... try man round at a command line, there are several different options depending on the data type. You'll probably want to scale the values first by dividing by an appropriate number and then multiplying the result by the same number, something like:
int roundedValue = round((double)someNumber / scalingFactor) * scalingFactor;
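Note that every example in the question rounds up (2,345 becomes 2,400, not 2,300), so truncating or rounding to nearest will not reproduce them. A hedged Python sketch using integer ceiling division is shown below. The function name round_up_sig is an assumption, and the number of significant digits is left to the caller because the examples use 1, 2, or 3 depending on the value.

```python
def round_up_sig(n, digits):
    """Round a positive integer up to the given number of significant digits."""
    zeros = max(len(str(n)) - digits, 0)   # trailing digits to zero out
    scale = 10 ** zeros
    return -(-n // scale) * scale          # integer ceiling division

print(round_up_sig(9873912, 2), round_up_sig(125930, 3), round_up_sig(2345, 2))
```

With these arguments the results are 9900000, 126000 and 2400; round_up_sig(280, 1) and round_up_sig(28, 1) give 300 and 30.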