GAMS: manipulating an expression within a loop

I have a matrix with i rows and j columns, a specific element of which is called x(i,j), where, say, i are plants and j are markets. In standard GAMS notation:
Sets
i canning plants / seattle, san-diego /
j markets / new-york, chicago, topeka / ;
Now, I also wish to create a loop over time, for 5 periods. Essentially, say I define:
Set t time period
/period1
period2
period3
period4
period5
/ ;
Parameters
time(t)
/ period1 1,
period2 2,
period3 3,
period4 4,
period5 5
/ ;
Basically, I want to re-run this loop, which contains a bunch of other commands, but from period 2 onwards I wish to redefine this matrix to look like this:
x("seattle",j)=x("seattle",j)+s("new-york",j)
x("new-york",j)=0'
Essentially, within the loop I want the matrix x(i,j) to look different after period 2: x("seattle",j) is replaced with the erstwhile x("seattle",j)+s("new-york",j), and x("new-york",j) is set to 0.
The loop would start like:
loop(t,
   ...
   option reslim = 20000;
   option nlp = conopt3;
   solve example using NLP maximizing VARIABLE;
);
I am not sure how to keep redefining this matrix within the loop for each period > 2.
Please note: after period 2, the matrix looks the same. The change only happens once (i.e., the matrix elements do not keep updating from the previous period; they switch once, at the end of period 2, and stay constant thereafter).
Any help on this is much appreciated!

You can use a $ condition to make this change in the loop for period2 only, like this:
x("seattle",j)$sameAs(t,'period2')=x("seattle",j)+s("new-york",j);

Related

How can I create a series of numbers increasing by a decimal in BigQuery?

I'm trying to create a series of numbers for multiple samples to use in a distribution plot. I want to have the numbers .01, .02, .03... etc. in a column by itself, not just a random number.
I have tried randbetween(0,1), but I cannot have it run a specified number of times, and it is also not in sequence. I also tried rand(), but that didn't work either.
The output should be something like this:
0.000001
0.000002
0.000003
0.000004
..... etc
Use generate_array() to generate a series of numbers. Then divide:
select cast(n / 1000000 as numeric)
from unnest(generate_array(1, 10)) n
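If you actually want the .01, .02, .03 steps mentioned at the top of the question, the same pattern works with a different range and divisor; a sketch:

select n / 100 as x
from unnest(generate_array(1, 100)) n
order by n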

Iteration in a Spark SQL dataframe: getting the 1st row's value in the first iteration, the second row's value in the next iteration, and so on

Below is the query that gives the date and distance where distance is <= 10 km:
var s = spark.sql("select date, distance from table_new where distance <= 10")
s.show()
This will give output like:
date       | distance
---------- | --------
12/05/2018 | 5
13/05/2018 | 8
14/05/2018 | 18
15/05/2018 | 15
16/05/2018 | 23
I want to use the first row of the dataframe s and store its date value in a variable v in the first iteration. In the next iteration it should pick the second row, and the corresponding date value should replace the old value of v, and likewise so on.
I think you should look at Spark "Window Functions". You may find what you need there.
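For the consecutive-row comparison the question describes, a window-function sketch could look like this (assuming date is the ordering column; lag() pulls the previous row's value alongside the current one):

from pyspark.sql import functions as F, Window

w = Window.orderBy("date")
s_with_prev = s.withColumn("prev_date", F.lag("date").over(w))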
The "bad" way to do this would be to collect the dataframe using df.collect() which would return a list of Rows which you can manually iterate over each using a loop.This is bad cause it brings all the data in your driver.
The better way would be to use foreach() :
df.foreach(lambda x: <<your code here>>)
foreach() takes a lambda function as an argument and applies it to each row of the dataframe without bringing all the data into the driver. But you can't use a simple local variable v inside the lambda function when overwriting is involved; you can use Spark accumulators for such a case.
E.g., if I want to sum all the values in the 2nd column:
counter = sc.accumulator(0)  # PySpark accumulator (the Scala API's equivalent is sc.longAccumulator)
df.foreach(lambda row: counter.add(row[1]))

How can I give Google OR-tools an array of values instead of a lower and upper bound?

In the documentation and all the examples I can find (in terms of nurse scheduling, at least), everyone just declares shift values within a search space of {1,4}, let's say, for shifts 1, 2, 3, 4:
solver = pywrapcp.Solver("schedule_shifts")
num_nurses = 4
num_shifts = 4  # Nurse assigned to shift 0 means not working that day.
num_days = 7
# [START]
# Create shift variables.
shifts = {}
for j in range(num_nurses):
    for i in range(num_days):
        shifts[(j, i)] = solver.IntVar(0, num_shifts - 1, "shifts(%i,%i)" % (j, i))
shifts_flat = [shifts[(j, i)] for j in range(num_nurses) for i in range(num_days)]
# Create nurse variables.
nurses = {}
for j in range(num_shifts):
    for i in range(num_days):
        nurses[(j, i)] = solver.IntVar(0, num_nurses - 1, "shift%d day%d" % (j, i))
I want to avoid the use of a range of values when I call solver.IntVar(lowerbound, upperbound, ...).
I want IntVar([available values that you can choose], ...).
I created a matrix of all shifts as columns flowing from the first day to the last. My row indexes don't matter, but in each day/shift column I have the index values of nurses in descending order of who bid the highest for that shift. I want to create a constraint where, if I choose a nurse, I choose the maximum bid allowed via the other constraints from that column; however, I don't know how to do that, given the limited documentation OR-tools has for the Python IntVar.
Can you try
solver.IntVar([values...], 'name')
It should work.
See https://github.com/google/or-tools/blob/master/examples/python/einav_puzzle2.py
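Assuming the value-list overload works as in that linked example, a minimal sketch for one day/shift cell could be (the nurse indices below are hypothetical placeholder data):

# ranked nurse indices allowed for this particular day/shift (hypothetical data)
allowed_nurses = [3, 0, 2]
nurse_var = solver.IntVar(allowed_nurses, "shift0 day0")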

Looping through variables in SPSS

I'm looking for a way to loop through variables (e.g. week01 to week52) and count the number of times the value changes across them. For example:
week01 to week18 may be coded as 1
week19 to week40 may be coded as 4
and week41 to week52 may be coded as 3
That would be 2 transitions within the data.
How could I go about writing code that finds this information? I'm rather new to this, and some help to get me in the right direction would be much appreciated.
You can use the DO REPEAT command to loop through variable lists. Below is an example of using this command to compare each "before" variable with the "after" variable that follows it, incrementing a count variable whenever the two are different.
data list fixed / observation (A1).
begin data
1
2
3
4
5
end data.
*making random data.
vector week(52).
do repeat week = week1 to week52.
compute week = RND(RV.UNIFORM(0.5,4.4)).
end repeat.
execute.
*initialize count to zero.
compute count = 0.
do repeat week_after = week2 to week52 / week_before = week1 to week51.
if week_after <> week_before count = count + 1.
end repeat.
execute.

Power-law distribution in T-SQL

I basically need the answer to this SO question that provides a power-law distribution, translated to T-SQL for me.
I want to pull a last name, one at a time, from a census provided table of names. I want to get roughly the same distribution as occurs in the population. The table has 88,799 names ranked by frequency. "Smith" is rank 1 with 1.006% frequency, "Alderink" is rank 88,799 with frequency of 1.7 x 10^-6. "Sanders" is rank 75 with a frequency of 0.100%.
The curve doesn't have to fit precisely at all. Just give me about 1% "Smith" and about 1 in a million "Alderink".
Here's what I have so far.
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank] = ROUND(88799 * RAND(), 0)
But this of course yields a uniform distribution.
I promise I'll still be trying to figure this out myself by the time a smarter person responds.
Why settle for a power-law distribution when you can draw from the actual distribution?
I suggest you alter the LastNames table to include a numeric column containing, for each name, the number of individuals with a more common name (a running total). You'll probably want a number on a smaller but proportional scale, say 10,000 for each percent of representation.
The list would then look something like:
(other than the 3 names mentioned in the question, I'm guessing about White, Johnson et al)
Smith 0
White 10,060
Johnson 19,123
Williams 28,456
...
Sanders 200,987
..
Alderink 999,997
And the name selection would be
SELECT TOP 1 [LastName]
FROM [LastNames] as LN
WHERE LN.[number_described_above] < ROUND(1000000 * RAND(), 0)
ORDER BY [number_described_above] DESC
That picks the first name whose number does not exceed the [uniformly distributed] random number. Note how the query uses less-than and orders in descending order; this guarantees that the very first entry (Smith) can get picked. The alternative would be to start the series with Smith at 10,060 rather than zero and to discard random draws smaller than this value.
Aside from the matter of boundary management (starting at zero rather than 10,060) mentioned above, this solution, along with the two other responses so far, is essentially the same as the one suggested in dmckee's answer to the question referenced in this question. The idea is to use the CDF (cumulative distribution function).
Edit:
If you insist on using a mathematical function rather than the actual distribution, the following should provide a power-law function which somehow conveys the "long tail" shape of the real distribution. You may want to tweak the @PwrCoef value (which, BTW, needn't be an integer); essentially, the bigger the coefficient, the more skewed toward the beginning of the list the function is.
DECLARE @PwrCoef INT
SET @PwrCoef = 2
SELECT 88799 - ROUND(POWER(POWER(88799.0, @PwrCoef) * RAND(), 1.0/@PwrCoef), 0)
Notes:
- the extra ".0" in the function above is important to force SQL to perform float operations rather than integer operations.
- the reason we subtract the power calculation from 88799 is that the calculation's distribution makes numbers closer to the end of our scale more likely to be drawn. Since the list of family names is sorted in reverse order (most likely names first), we need this subtraction.
Assuming a power of, say, 3, the query would then look something like:
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 88799 - ROUND(POWER(POWER(88799.0, 3) * RAND(), 1.0/3), 0)
Which is the query from the question except for the last line.
Re-Edit:
In looking at the actual distribution, as apparent in the Census data, the curve is extremely steep and would require a very big power coefficient, which in turn would cause overflows and/or extreme rounding errors in the naive formula shown above.
A more sensible approach may be to operate in several tiers, i.e., to perform an equal number of draws in each of, say, three thirds (or four quarters, or ...) of the cumulative distribution; within each of these parts of the list, we would draw using a power-law function, possibly with the same coefficient but with different ranges.
For example, assuming thirds, the list divides as follows:
First third = 425 names, from Smith to Alvarado
Second third = 6,277 names, from ... to Gainer
Last third = 82,097 names, from Frisby to the end
If we were to need, say, 1,000 names, we'd draw 334 from the top third of the list, 333 from the second third and 333 from the last third.
For each of the thirds we'd use a similar formula, maybe with a bigger power coefficient for the first third (where we are really interested in favoring the earlier names in the list, and also where the relative frequencies are more statistically relevant). The three selection queries could look like the following:
-- Random Drawing of a single Name in top third
-- Power Coef = 12
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 425 - ROUND(POWER(POWER(425.0, 12) * RAND(), 1.0/12), 0)
-- Second third; Power Coef = 7
...
WHERE LN.[Rank]
= (425 + 6277) - ROUND(POWER(POWER(6277.0, 7) * RAND(), 1.0/7), 0)
-- Bottom third; Power Coef = 4
...
WHERE LN.[Rank]
= (425 + 6277 + 82097) - ROUND(POWER(POWER(82097.0, 4) * RAND(), 1.0/4), 0)
Instead of storing the pdf as rank, store the CDF (the sum of all frequencies up to that name, starting from Alderink).
Then modify your select to retrieve the first LN with a CDF value greater than your formula result.
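A minimal sketch of that select, assuming a hypothetical [CumFreq] column holding the running sum of frequencies scaled to 0..1 (RAND() without a seed is evaluated once per query in T-SQL, so every row is compared against the same draw):

SELECT TOP 1 [LastName]
FROM [LastNames] as LN
WHERE LN.[CumFreq] >= RAND()
ORDER BY LN.[CumFreq] ASC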
I read the question as "I need to get a stream of names which will mirror the frequency of last names from the 1990 US Census"
I might have read the question a bit differently than the other suggestions, and although an answer has been accepted (and a very thorough answer it is), I will contribute my experience with the census last names.
I had downloaded the same data from the 1990 census. My goal was to produce a large number of names to be submitted for search testing during performance testing of a medical record app. I inserted the last names and the percentage frequency into a table. I added a column and filled it with an integer which was the product of total names required * frequency. The frequency data from the census did not add up to exactly 100%, so my total number of names was also a bit short of the requirement. I was able to correct the number by selecting random names from the list and increasing their count until I had exactly the required number; the randomly added count never amounted to more than .05% of the total of 10 million.
I generated 10 million random numbers in the range of 1 to 88,799. With each random number I would pick that name from the list and decrement the counter for that name. My approach was to simulate dealing a deck of cards, except my deck had many more distinct cards and a varying number of each card.
Do you store the actual frequencies with the ranks?
Converting the algebra from that accepted answer to T-SQL is no bother, if you know what values to use for n. y would be what you currently have, ROUND(88799 * RAND(), 0), and x0, x1 = 1, 88799, I think, though I might misunderstand it. The only non-standard maths operator involved from a T-SQL perspective is ^, which is just POWER(x,y) == x^y.
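For what it's worth, a sketch of that algebra in T-SQL, with y drawn uniformly from [0,1] and n left as a placeholder you would fit to the census data yourself (n = -0.5 below is only an assumption):

DECLARE @n FLOAT, @x0 FLOAT, @x1 FLOAT
SET @n = -0.5
SET @x0 = 1
SET @x1 = 88799
-- inverse-CDF draw for a pdf proportional to x^n on [x0, x1] (n <> -1)
SELECT ROUND(POWER((POWER(@x1, @n + 1) - POWER(@x0, @n + 1)) * RAND()
                   + POWER(@x0, @n + 1), 1.0 / (@n + 1)), 0) AS [Rank]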