uniformInt produces duplicate numbers - gams-math

I asked this question (Generate 100 data randomly, or select if it is possible) a few days ago. Now I want to pick 10 numbers at random from a set i /1*300/.
I use this code:
Set I /0*300/
    picks /p1*p10/;
Scalar pick;
Parameter MyParameter(I);
MyParameter(I) = 0;

loop(picks,
    pick = uniformInt(1, card(I));
* Make sure not to pick the same one twice
    while(sum(I$(pick=ord(I)), MyParameter(I)) = 1,
        pick = uniformInt(1, card(I));
        Display 'here';
    );
    MyParameter(I)$(pick=ord(I)) = 1;
);
Display MyParameter;
I want to run this code several times: choose 10 random numbers the first time, 20 the second time, 30 the third time, ..., up to 100 the tenth time.
In addition, I need to select new numbers every time; the numbers selected the second time must be different from the numbers selected the first time.
But uniformInt selects the same numbers every time.
For example, the results for selections of 10 and 20 are as follows.

For picks /p1*p10/:
21, 52, 68, 88, 91, 106, 151, 166, 254, 258 (each with value 1.000)

For picks /p1*p20/:
21, 40, 49, 52, 68, 76, 88, 91, 106, 132, 151, 166, 175, 193, 202, 230, 254, 258, 299 (each with value 1.000)

21, 52, 68, 88, 91, 106, 151, 166, 254, 258 are duplicated in the second run.
What should I do to avoid duplicate numbers across runs? Why doesn't this function produce different numbers each time?

I searched and found my answer at https://support.gams.com/gams:random_number_generator_in_gams
The code I wanted is:
Set I /0*300/
    picks /p1*p10/;
Scalar pick;
Parameter MyParameter(I);
MyParameter(I) = 0;
execseed = 1 + gmillisec(jnow);

loop(picks,
    pick = uniformInt(1, card(I));
* Make sure not to pick the same one twice
    while(sum(I$(pick=ord(I)), MyParameter(I)) = 1,
        pick = uniformInt(1, card(I));
    );
    MyParameter(I)$(pick=ord(I)) = 1;
);
Display MyParameter;
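The same idea as the fixed GAMS code above (seed once per run, then redraw on duplicates) can be sketched in Python for comparison. This is only an illustration; the function name and the sample data are mine, not part of GAMS:

```python
import random

def pick_without_duplicates(universe, n, already_picked):
    """Draw n distinct values from universe, rejecting anything in already_picked."""
    picked = set()
    while len(picked) < n:
        candidate = random.choice(universe)
        # Reject duplicates within this run and from earlier runs,
        # mirroring the while loop in the GAMS code
        if candidate not in picked and candidate not in already_picked:
            picked.add(candidate)
    return picked

random.seed()  # analogous to execseed = 1 + gmillisec(jnow)
universe = list(range(1, 301))
first = pick_without_duplicates(universe, 10, set())
second = pick_without_duplicates(universe, 20, first)
print(len(first), len(second), first & second)  # 10 20 set()
```

Note that Python's random.sample already does without-replacement sampling in one call; the explicit rejection loop is shown only to mirror the GAMS while loop.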


Excel UDF to Unpivot (Melt, Reverse pivot, Flatten, Normalize) blocks of data within Tables

This question seeks multiple approaches (LET/LAMBDA, VBA UDF, and Power Query function), so there will be no single right answer, but rather a solicitation of approaches to be used as references.
Scott raised a question here about unpivoting a complex table that contains blocks of data instead of individual data points. The basic idea is illustrated in this table:
                            Jan      Jan  Jan      Jan  Feb      Feb  Feb      Feb  Mar      Mar  Mar      Mar
State         City          Pressure Temp Humidity CO2  Pressure Temp Humidity CO2  Pressure Temp Humidity CO2
Georgia       Atlanta       1        2    3        4    5        6    7        8    9        10   11       12
Massachusetts Boston        49       50   51       52   53       54   55       56   57       58   59       60
Texas         Dallas        97       98   99       100  101      102  103      104  105      106  107      108
Louisiana     Jonesboro     145      146  147      148  149      150  151      152  153      154  155      156
California    San Francisco 193      194  195      196  197      198  199      200  201      202  203      204
The data for each city is in blocks of four columns containing Pressure, Temperature, Humidity and CO2 (or PTHC). We want to unpivot the PTHC blocks of values according to their month by the State and City. Here is the desired output:
State         City          month  Pressure  Temp  Humidity  CO2
Georgia       Atlanta       Jan    1         2     3         4
Georgia       Atlanta       Feb    5         6     7         8
Georgia       Atlanta       Mar    9         10    11        12
Massachusetts Boston        Jan    49        50    51        52
Massachusetts Boston        Feb    53        54    55        56
Massachusetts Boston        Mar    57        58    59        60
Texas         Dallas        Jan    97        98    99        100
Texas         Dallas        Feb    101       102   103       104
Texas         Dallas        Mar    105       106   107       108
Louisiana     Jonesboro     Jan    145       146   147       148
Louisiana     Jonesboro     Feb    149       150   151       152
Louisiana     Jonesboro     Mar    153       154   155       156
California    San Francisco Jan    193       194   195       196
California    San Francisco Feb    197       198   199       200
California    San Francisco Mar    201       202   203       204
The order of the rows is not important, so long as they are complete - i.e. the output could be sorted by month, city, state, ... it does not matter. The output does not need to be a dynamic array that spills - i.e. in the case of a Power Query function, it clearly would not be.
It can be assumed that the PTHC block is always consistent, i.e.
it never skips a field value, e.g. PTHC PTC PTHC...
it never changes order, e.g. PTHC PCHT
The months are always presented in groups that are equally sized to the block (in this example, 4, so there will be four Jan columns, Feb columns, etc.). e.g. if there are 7 months, there will be 7 PTHC blocks or 28 columns of data.
However, the pattern of months can also be interleaved such that the months increment and the PTHC block is grouped (i.e. PPP TTT HHH CCC), like this:
             Jan      Feb      Mar      Jan  Feb  Mar  Jan      Feb      Mar      Jan Feb Mar
State  City  Pressure Pressure Pressure Temp Temp Temp Humidity Humidity Humidity CO2 CO2 CO2
The UDF would also have to accommodate more or fewer than 4 fields inside the block. The use of months and PTHC is just an illustration; the attribute that represents months in this example will always be a single row (although a multi-row approach would be an interesting, but new and separate, question). The attribute that represents the field values PTHC will also be a single row.
I will propose a LET function based on Scott's question, but there certainly can be better approaches, and both VBA and Power Query have their own strengths. The objective is to create a collection of working approaches.
LET/LAMBDA Approach
This requires Excel 365. The formula is:
=LET( upValues, C3:N7, upHdr, C2:N2, upAttr, C1:N1,
byBody, A3:B7, byHdr, A2:B2,
attrTitle, "month",
upFields, UNIQUE( upHdr,1 ), blockSize, COUNTA( upFields ),
byC, COLUMNS( byBody ), upC, COLUMNS( upValues ),
dmxR, MIN( ROWS( upValues ), ROWS( byBody ) ),
upCells, dmxR * upC/blockSize,
tCSeq, SEQUENCE( 1, byC + 1 + blockSize ), tRSeq, SEQUENCE( upCells + 1,, 0 ), upSeq, SEQUENCE( upCells,, 0 ),
hdr, IF( tCSeq <= byC, INDEX( byHdr, , tCSeq ),
IF( tCSeq = byC + 1, attrTitle,
INDEX( upFields, 1, tCSeq - byC - 1 ) ) ),
muxBody, INDEX( byBody, SEQUENCE( upCells, byC, 0 )/byC/upC*blockSize + 1, SEQUENCE( 1, byC ) ),
muxAttr, INDEX( upAttr, MOD( SEQUENCE( upCells,, 0, blockSize ), upC ) + 1 ),
muxValues, INDEX( upValues, SEQUENCE( upCells, blockSize, 0 )/upC+1, MOD(SEQUENCE( upCells, blockSize, 0 ),upC)+1),
table, IF( tCSeq <= byC, muxBody,
IF( tCSeq = byC + 1, muxAttr,
INDEX( muxValues, upSeq + 1, tCSeq - byC - 1 ) ) ),
IF( tRSeq = 0, hdr, INDEX( table, tRSeq, tCSeq) ) )
This takes in 6 variables:
upValues - the data that will be unpivoted in blocks
upHdr - the header row that contains the PTHC values
upAttr - the attribute that will be unpivoted i.e. the months row
byBody - the body of values that will unpivot the values i.e. the State and City values
byHdr - the header of the byBody (the titles "State" and "City")
attrTitle - an optional title for the attribute that will be unpivoted
(The original post includes illustrations of these inputs, of the test data with the results, and of the output; the red text in those illustrations marks the internal variables used to construct the result.)
The formula has 5 parts as follows:
Taking Dimensions is straightforward: it simply parameterizes the variables that will be used repeatedly later. dmxR takes the MIN of the rows of upValues and byBody, just in case the user accidentally puts in malformed upValues and byBody that would otherwise result in nonsensical output.
Building Sequences creates three sequences that will be used for indexing the inputs and outputs:
tCSeq (table column sequence) is a column-wise sequence sized to the final output table that will have byBody + Attribute (month) + values (blocksize) columns.
tRSeq (table row sequence) is a row-wise sequence sized to the final output table that will have dmxR*upC/blocksize + 1 (hdr) rows.
upSeq (unpivot sequence) is a row-wise sequence sized to the final output table that will have dmxR*upC/blocksize rows (no header).
Create Array Components uses the dimensions and sequences above to construct the parts of the output table.
hdr (header) is the new header with the labels (State & City), the attribute title (month) and the field names (PTHC).
muxBody (multiplexed byBody) is the repetition of the byBody that is multiplexed across the dmxR rows.
muxAttr (multiplexed upAttr) is the repetition of the upAttr that is multiplexed across the dmxR rows.
muxValues (multiplexed upValues) is a block-wise repetition that will have dmxR*upC/blocksize rows.
The last two lines stitch parts together. First, table stitches muxBody, muxAttr and muxValues in a column-wise integration using tCSeq and a row-wise multiplex using upSeq.
Just because it is mentally easier (and easier to test), I separated the row-wise integration (using tRSeq) of the hdr onto the table in the last line.
An alternative to stitching with IF statements is to use IFERROR(INDEX(...)), which forces errors and then replaces the errors with the next part of the table, but that is very hard to test and debug even when it is only row-wise or column-wise. With a combination of row-wise and column-wise, it is a nightmare.
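For readers outside Excel, the block-multiplexing logic described above can be sketched in plain Python. This is a minimal sketch assuming the grouped month layout (Jan Jan Jan Jan Feb ...); the function and variable names are illustrative, not part of the formula:

```python
def unpivot_blocks(by_body, up_attr, up_values, block_size):
    """Unpivot rows whose value columns come in fixed-size blocks.

    by_body    -- list of [State, City] label rows
    up_attr    -- the month header row, one entry per value column
    up_values  -- list of value rows, same width as up_attr
    block_size -- number of fields per block (4 for PTHC)
    """
    out = []
    for labels, values in zip(by_body, up_values):
        for start in range(0, len(values), block_size):
            block = values[start:start + block_size]
            month = up_attr[start]  # grouped layout: month repeats within each block
            out.append(labels + [month] + block)
    return out

by_body = [["Georgia", "Atlanta"], ["Massachusetts", "Boston"]]
up_attr = ["Jan"] * 4 + ["Feb"] * 4 + ["Mar"] * 4
up_values = [list(range(1, 13)), list(range(49, 61))]
for row in unpivot_blocks(by_body, up_attr, up_values, 4):
    print(row)
```

Each output row is [State, City, month, P, T, H, C], matching the desired output table in the question.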
Power Query version. The code is a bit longer to accommodate the possibility of AAAABBBB ordering instead of ABABABAB:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // list of months
    #"Unpivoted Other Columns" = List.Repeat(Table.UnpivotOtherColumns(Table.FirstN(Source,1), {"Column1", "Column2"}, "Attribute", "Value")[Value], Table.RowCount(Source)-2),
    #"Converted to Table" = Table.AddIndexColumn(Table.FromList(#"Unpivoted Other Columns", Splitter.SplitByNothing(), null, null, ExtraValues.Error), "Index", 0, 1),
    // list of PTHC
    #"Unpivoted Other Columns2" = List.Repeat(Table.UnpivotOtherColumns(Table.FirstN(Table.Skip(Source,1), 1), {"Column1", "Column2"}, "Attribute", "Value")[Value], Table.RowCount(Source)-2),
    #"Converted to Table2" = Table.AddIndexColumn(Table.FromList(#"Unpivoted Other Columns2", Splitter.SplitByNothing(), null, null, ExtraValues.Error), "Index", 0, 1),
    // all other data
    #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(Table.Skip(Source,2), {"Column1", "Column2"}, "Attribute", "Value"),
    #"Added Index" = Table.AddIndexColumn(#"Unpivoted Other Columns1", "Index", 0, 1),
    // merge in months and PTHC
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, #"Converted to Table", {"Index"}, "X1", JoinKind.LeftOuter),
    #"Merged Queries2" = Table.NestedJoin(#"Merged Queries", {"Index"}, #"Converted to Table2", {"Index"}, "X2", JoinKind.LeftOuter),
    #"Expanded X1" = Table.ExpandTableColumn(#"Merged Queries2", "X1", {"Column1"}, {"Month"}),
    #"Expanded X2" = Table.ExpandTableColumn(#"Expanded X1", "X2", {"Column1"}, {"Type"}),
    // extra work to pivot in correct format
    #"Renamed Columns" = Table.RenameColumns(#"Expanded X2", {{"Column1", "State"}, {"Column2", "City"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns", {"Attribute", "Index"}),
    #"Sorted Rows" = Table.Sort(#"Removed Columns", {{"State", Order.Ascending}, {"City", Order.Ascending}, {"Month", Order.Ascending}, {"Type", Order.Ascending}}),
    #"Added Index1" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    TypeCount = List.Count(List.Distinct(#"Added Index1"[Type])),
    #"Integer-Divided Column" = Table.TransformColumns(#"Added Index1", {{"Index", each Number.IntegerDivide(_, TypeCount), Int64.Type}}),
    #"Pivoted Column" = Table.Pivot(#"Integer-Divided Column", List.Distinct(#"Integer-Divided Column"[Type]), "Type", "Value"),
    #"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column", {"Index"})
in
    #"Removed Columns1"
Not sure if it can be called an improvement on the existing LET solution, but this version is both shorter and a little more intuitive to me.
=LET( upValues, C3:N7, upHdr, C2:N2, upAttr, C1:N1,
byBody, A3:B7, byHdr, A2:B2,
attrTitle, "month",
attributes, UNIQUE(upAttr,1), attrcount, COUNTA(attributes),
vars, UNIQUE(upHdr,1), varcount, COUNTA(vars),
rowseq, SEQUENCE(ROWS(byBody)*attrcount),
colseq, SEQUENCE(1,varcount+3),
rept, CEILING(rowseq/attrcount,1),
rept1, IF(MOD(rowseq, attrcount)=0, attrcount, MOD(rowseq, attrcount)),
byC, COLUMNS(byBody),
header, IF(colseq<3, byHdr, IF(colseq=3, attrTitle, INDEX(vars, 1, colseq-byC-1))),
loc, INDEX(byBody,rept, SEQUENCE(1,byC)),
attrCol, INDEX(attributes, 1, rept1),
data, INDEX(upValues, rept, SEQUENCE(1,varcount)+(rept1*varcount)-varcount),
mydata, IF(colseq<(byC+1), loc, IF(colseq<4, attrCol, INDEX(data, rowseq, colseq-byC-1))),
final, IF(SEQUENCE(MAX(rowseq)+1)=1, header, INDEX(mydata, SEQUENCE(ROWS(byBody)*attrcount+1)-1, colseq)),
final )

Replacement of elements in list based on index of another list

This is my code:
time is a list with values from 60 to 900 with interval of 60.
PAR_2 is the list I want to modify. When time = 60 or a multiple of 60, the values in PAR_2 should be replaced by a fixed amount (50).
# here I try to go through two lists at the same time
for n, m in zip(time, PAR_2):
    if n[60] < n[90]:  # if time is between 60 and 90, then I want PAR_2 = 50
        m = 50
print(PAR_2)
Output example:
time PAR_2
60 50
61 50
62 50
.. 50
90 346
91 345
91 345
... 495
120 50
121 50
122 50
You could do this:
for i in range(len(PAR_2)):
    try:
        if time[i] % 60 == 0:
            PAR_2[i] = 50
    except IndexError:
        break
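It may also help to see why the original zip loop changes nothing: assigning to m only rebinds a local name and never writes back into PAR_2. A version that keeps the zip-style pairing but mutates the list through its index could look like this (a sketch; the data here is illustrative, not the asker's actual lists):

```python
# Illustrative data: times 60..120 and some placeholder PAR_2 values
time = list(range(60, 121))
PAR_2 = [300 + k for k in range(len(time))]

for i, t in enumerate(time):
    # Writing through the index mutates PAR_2 in place;
    # `for t, p in zip(time, PAR_2): p = 50` would only rebind p.
    if t % 60 == 0:
        PAR_2[i] = 50

print(PAR_2[0], PAR_2[60])  # 50 50  (positions where time is 60 and 120)
```

Using enumerate avoids the IndexError handling entirely, since the loop never runs past the shorter list.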

SQL: Counting occurrences of a certain value from its first appearance till the next five minutes, and repeating the same for the next occurrence

I need to find the number of times a value, say 34, occurred from its first occurrence until 5 minutes later.
Then, after those 5 minutes, do the same thing again: fetch the next record with value 34 and see how many times it occurred in the following 5 minutes, for each device.
Suppose say I have following table:
DevID value DateTime
--------------------------------------------------
99 20 18-12-2016 18:10
99 34 18-12-2016 18:11
99 34 18-12-2016 18:12
99 20 18-12-2016 18:15
23 15 18-12-2016 18:16
28 34 18-12-2016 18:17
23 15 18-12-2016 18:18
23 12 18-12-2016 18:19
99 20 18-12-2016 18:20
99 34 18-12-2016 18:21
99 34 18-12-2016 18:22
99 34 18-12-2016 18:23
99 34 18-12-2016 18:24
99 34 18-12-2016 18:25
I'm interested in the number 34. I want to find the first appearance of 34, get its time, and then count how many times 34 occurred in the next 5 minutes. Basically, fetch records from the time of first occurrence until occurrence + 5 minutes, count how many of them are 34, and if it is more than 3, list that device.
Repeat the same for the next record with 34 in the following 5 minutes. So in the case above, device 99 first had 34 at 18-12-2016 18:11, but we did not get more than 3 records of 34 in the next 5 minutes; however, we got 34 again at 18-12-2016 18:21 and then more than 3 entries of 34 in the next 5 minutes.
So the expected output for the above table would be device id 99.
Edited
I am interested in finding only value 34, so the extra complexity of finding all such repeated values in 5-minute gaps is not required.
I just want to know for which devices 34 is repeated more than 3 times (this should be changeable; I could hardcode it to 10 as well) within a 5-minute interval.
The most efficient method is to use lag()/lead():
select t.*
from (select t.*,
lead(datetime, 2) over (partition by devid order by datetime) as next2_dt
from t
where value = 34
) t
where next2_dt <= dateadd(minute, 5, datetime);
This peeks ahead to the 2nd following value and simply compares that row's datetime with the datetime on the current row.
This could be done as follows:
SELECT DevID
FROM t
WHERE Value = 34
AND 2 <= (
SELECT COUNT(*)
FROM t AS x
WHERE x.DevID = t.DevID
AND x.Value = t.Value
AND x.DateTime > t.DateTime
AND x.DateTime < DATEADD(MINUTE, 5, t.DateTime)
)
GROUP BY DevID
You might want to replace < with <= depending on how you count 5 minutes.
Please adjust to your RDBMS, but it should look something like this:
select b.*
from (
select value, min(DateTime) as md
from the_table
group by value
) as a
join the_table as b
on a.value = b.value
and b.DateTime between a.md and a.md + interval'5'minute
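The counting rule itself (from each occurrence of 34 per device, count 34s in the next 5 minutes, and flag the device when the count exceeds the threshold) can be sanity-checked in Python before committing to a query. This is a sketch using the question's sample rows; the function name and threshold parameter are illustrative:

```python
from datetime import datetime, timedelta

def devices_with_bursts(rows, target=34, threshold=3, window=timedelta(minutes=5)):
    """rows: iterable of (dev_id, value, ts). Returns the set of devices where
    `target` occurs more than `threshold` times within `window` of some
    occurrence of `target` on that device."""
    hits = {}  # dev_id -> chronologically ordered timestamps where value == target
    for dev, val, ts in sorted(rows, key=lambda r: r[2]):
        if val == target:
            hits.setdefault(dev, []).append(ts)
    flagged = set()
    for dev, times in hits.items():
        for i, start in enumerate(times):
            # Count occurrences from this one up to 5 minutes later
            if sum(1 for t in times[i:] if t <= start + window) > threshold:
                flagged.add(dev)
    return flagged

rows = [
    (99, 20, datetime(2016, 12, 18, 18, 10)),
    (99, 34, datetime(2016, 12, 18, 18, 11)),
    (99, 34, datetime(2016, 12, 18, 18, 12)),
    (28, 34, datetime(2016, 12, 18, 18, 17)),
    (99, 34, datetime(2016, 12, 18, 18, 21)),
    (99, 34, datetime(2016, 12, 18, 18, 22)),
    (99, 34, datetime(2016, 12, 18, 18, 23)),
    (99, 34, datetime(2016, 12, 18, 18, 24)),
    (99, 34, datetime(2016, 12, 18, 18, 25)),
]
print(devices_with_bursts(rows))  # {99}
```

Only device 99 is flagged: its burst starting at 18:21 contains five 34s within 5 minutes, while the 18:11 pair and device 28's lone 34 fall short.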

SQL selecting values between two columns with a list

I'm attempting to find rows given a list of values where one of the values is in a range between two of the columns, as an example:
id column1 column2
1 1 5
2 6 10
3 11 15
4 16 20
5 21 25
...
99 491 495
100 496 500
I'd like to give a list of values, e.g. (23, 83, 432, 334, 344) which would return the rows
id column1 column2
5 21 25
17 81 85
87 431 435
67 331 335
69 341 345
The only way I can think of doing this so far has been to split each value into its own call:
SELECT * FROM TableA WHERE (column1 < num1 AND num1 < column2)
However, this scales quite poorly when the list of numbers runs to several million.
Is there any better way of doing this?
Thanks for the help.
Putting millions of numbers into the SQL command itself would be unwieldy.
Obviously, you have to put the numbers into a (temporary) table.
Then you can just join the two tables:
SELECT *
FROM TableA JOIN TempTable
ON TempTable.Value BETWEEN TableA.column1 AND TableA.column2;
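If the ranges are sorted and non-overlapping, as in the example, the same lookup can also be done client-side with a binary search instead of a join. A sketch in Python (the ranges reproduce the question's 1-5, 6-10, ..., 496-500 pattern; all names are illustrative):

```python
import bisect

# Sorted, non-overlapping ranges as (column1, column2, id)
ranges = [(5 * k + 1, 5 * k + 5, k + 1) for k in range(100)]  # 1-5 ... 496-500
starts = [lo for lo, hi, _ in ranges]

def find_row(value):
    """Return the (column1, column2, id) range containing value, or None."""
    i = bisect.bisect_right(starts, value) - 1  # last range starting <= value
    if i >= 0 and ranges[i][0] <= value <= ranges[i][1]:
        return ranges[i]
    return None

print(find_row(23))   # (21, 25, 5)
print(find_row(432))  # (431, 435, 87)
```

Each lookup is O(log n), so millions of probe values stay cheap; the temporary-table join remains the better answer when the work should happen inside the database.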

Set certain values in column B by matching it to another pair of columns

In the first sheet I have many rows (but fewer than 10 thousand):
A B
198
198
198
197
197
225
…
…
…
119
229
In a second sheet I have the matching values (some will be empty, e.g. 8.6 has no pair). Values in A are not sequential, while B is sequential from 0.1 to 21.1 (0.1 interval):
A B
139 0.1
211 0.2
208 0.3
208 0.3
207 0.4
…
…
…
229 4.0
…
…
…
119 7.4
…
…
…
- 8.6
198 8.5
197 8.7
…
…
…
225 9.9
After the macro/VBA runs, I want the result in the first sheet, such as the following (please can someone give me some hints, thank you very much):
A B
198 8.5
198 8.5
198 8.5
197 8.7
197 8.7
225 9.9
…
…
…
119 7.4
229 4.0
In the first sheet, use a VLOOKUP function to find corresponding matches in the second sheet (I'll call that Sheet2), with IFERROR to catch non-matches. In the first sheet's B2 cell, use this formula:
=IFERROR(VLOOKUP($A2, 'Sheet2'!$A:$B, 2, FALSE), "")
VLOOKUP function
That will return the first value in column B that corresponds to a matching value in column A, which seems to be what you want. Other options would be SUMIF, AVERAGEIF, and/or COUNTIF.
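VLOOKUP's first-match behavior can be mimicked with a dictionary built from the second sheet, which is handy for checking expected results outside Excel. A Python sketch (the sample pairs come from the question; the variable names are illustrative):

```python
# Second sheet as (A, B) pairs; the FIRST occurrence of each A wins,
# matching VLOOKUP's first-match semantics with exact match (FALSE).
sheet2 = [(198, 8.5), (197, 8.7), (225, 9.9), (119, 7.4), (229, 4.0)]
lookup = {}
for a, b in sheet2:
    lookup.setdefault(a, b)  # keep the first value seen for each key

# First sheet, column A
sheet1_a = [198, 198, 198, 197, 197, 225, 119, 229]
# "" when there is no match, like the IFERROR wrapper in the formula
sheet1_b = [lookup.get(a, "") for a in sheet1_a]
print(sheet1_b)  # [8.5, 8.5, 8.5, 8.7, 8.7, 9.9, 7.4, 4.0]
```

dict.setdefault only stores a value if the key is new, which is what makes the first matching row win, exactly as the formula does.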