Fellow SO users,
I'm hoping someone can share ideas on how to approach this problem.
Let's say I have a table of values with two columns: each value in the first column (COL1) is paired with a value in the second column (COL2).
The values are all in hexadecimal:
COL1 COL2
0 11
1 90
2 52
3 C8
4 B7
Now, what I have to do is, compare the value in one of the registers and if it matches any value from COL1, I have to load another register with the corresponding value from COL2. For example, if I have a value, say, R2 = 1, I will have to load R3 with 90.
The approach I'm using completely avoids the lb instruction (which is what I'm aiming for):
and $r1, $r1, $r0 #Initialise r1 to 0
addi $r1, $r1, 1 #load r1 with 1
beq $r2, $r1, LOAD_1 #Check to see if r2 = 1
and $r1, $r1, $r0
addi $r1, $r1, 2
beq $r2, $r1, LOAD_2
LOAD_1:
and $r3, $r3, $zero
addi $r3, $r3, 0x90 #Load r3 with 0x90 as per the table (COL1 = 1)
LOAD_2:
Load value into r3 as before.
The issue with this is, it will get ridiculously long if I have a huge table. Could someone please suggest a shorter way, if there exists one (with using the lb operator)?
If the COL1 sequence is sorted you can perform a binary search to quickly find a given value.
If the sequence is both sorted in ascending order and doesn't have any gaps or duplicates (i.e. the x:th element always equals the x-1:th element plus 1) it becomes even easier:
if (R2 >= COL1[0] && R2 <= COL1[last_index]) {
R3 = COL2[R2 - COL1[0]];
}
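As a rough illustration of both suggestions, here is a minimal Python sketch using the table from the question. The function names are made up for this example, and the register values are modelled as plain integers; it is not MIPS code, only the lookup logic.

```python
from bisect import bisect_left

# Table from the question (hexadecimal values); COL1/COL2 follow the post's naming.
COL1 = [0x0, 0x1, 0x2, 0x3, 0x4]
COL2 = [0x11, 0x90, 0x52, 0xC8, 0xB7]


def lookup_contiguous(r2):
    """Direct indexing: valid only if COL1 is ascending with no gaps or duplicates."""
    if COL1[0] <= r2 <= COL1[-1]:
        return COL2[r2 - COL1[0]]
    return None  # r2 is not in the table


def lookup_sorted(r2):
    """Binary search: valid whenever COL1 is sorted ascending, gaps allowed."""
    i = bisect_left(COL1, r2)
    if i < len(COL1) and COL1[i] == r2:
        return COL2[i]
    return None  # r2 is not in the table
```

With R2 = 1, both functions give 0x90, matching the example in the question.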
Simplified version of the dataset I have is:
DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" "LMK" 16
3 "LMK" . 29
;RUN;
I am looking to compare match1 and match2, and if match1 does not equal match2 in any row for an ID, I would like to remove all of the rows with that ID. So for this example dataset I want to remove all of ID 2 (rows 3 and 4), since row 3 does not have a match between match1 and match2. All I can figure out how to do so far is to delete the individual rows where they don't match, which isn't terribly helpful for this application. I assume it would be easier to build a new data set with some WHERE conditions, but I am unsure how to begin there. Any ideas or advice?
EDIT:
Apologies, I dumbed down my dataset too much and forgot about an important exception. Note in my new dataset (I only added one row to the end). I do NOT want to delete group 3, since match2 is blank. I only want to delete a group where match2 is not blank and match1 does not equal match2.
Thanks
There are a few ways to do this. One would be to construct a dataset of IDs that have non-matching rows, then do a merge or a SQL join and remove anything that matches this list.
However, my preferred option (partly because of speed, but also it's more straightforward once you understand how it works) is the DoW loop.
data want;
id_nonmatch = 0;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if match1 ne match2 and not missing(match2) then id_nonmatch = 1; *set the flag to 1 if we find a nonmatch (a blank match2 does not count);
end;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if id_nonmatch = 0 then output;
end;
run;
There are two set statements on the data step, each of which runs through the same dataset separately. If it doesn't make sense, throw a put _all_; inside each of the do loops - that will show you what it's doing. The first loop goes over all of the rows for one ID, checks if any violate the constraint, and if none do, the flag variable (id_nonmatch) stays 0. If one does, it becomes a 1 (and stays that way). Then, when it hits an ID boundary, it stops pulling records from the first set statement, and goes onto the second - re-pulling those same rows. Now, it outputs only when the flag is a zero.
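For readers who do not use SAS, the two-pass idea can be sketched in Python. This is only an analogy, not how the data step is implemented: `itertools.groupby` stands in for BY-group processing, `None` stands in for a blank match2, and the blank-match2 exception from the question's EDIT is applied.

```python
from itertools import groupby
from operator import itemgetter

# Rows mirror the question's HAVE data set: (id, match1, match2, not_relevant).
have = [
    (1, "ABC", "ABC", 4),
    (1, "XYZ", "XYZ", 29),
    (2, "QQQ", "AAA", 5),
    (2, "ABC", "ABC", 9),
    (3, "EFG", "EFG", 7),
    (3, "DEF", "DEF", 12),
    (3, "LMK", "LMK", 16),
    (3, "LMK", None, 29),
]


def keep_matching_groups(rows):
    want = []
    # groupby needs the rows already sorted by id, just like the BY statement.
    for _, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        # First pass: flag the group if any non-blank match2 differs from match1.
        id_nonmatch = any(m2 is not None and m1 != m2 for _, m1, m2, _ in group)
        # Second pass: go over the same rows again and output only unflagged groups.
        if not id_nonmatch:
            want.extend(group)
    return want


want = keep_matching_groups(have)
```

Here `want` keeps every row of IDs 1 and 3 and drops both rows of ID 2.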
This is very efficient because of buffering - unless your id groups are very large, the data step may be able to use buffers to keep the same rows in memory and not have to reread them from disk. (This will depend on your disk and buffers - and seems to help much less on flash than on physical disks [since there is not the additional benefit of the disk head not having to move] - so your mileage may vary here.)
To show the difference, here is a log showing that the second read needs little additional time when the record is reasonably sized. The benefit is smaller when the record is very small; I imagine there is more overhead involved. Note that the second read adds only about 0.37 seconds (2.74 versus 2.37), roughly 1/6 of the time of the first read, to the total processing time!
69 data have;
70 call streaminit(7);
71 length strvar $1000;
72 do id = 1 to 100000;
73 do iter = 1 to 50;
74 x = rand('Uniform');
75 output;
76 end;
77 end;
78 run;
NOTE: Variable strvar is uninitialized.
NOTE: The data set WORK.HAVE has 5000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 5.20 seconds
cpu time 5.20 seconds
79
80
81 data _null_;
82 do _n_ = 1 by 1 until (last.id);
83 set have;
84 by id;
85 end;
86 run;
NOTE: There were 5000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 2.37 seconds
cpu time 2.37 seconds
87
88
89 data _null_;
90 do _n_ = 1 by 1 until (last.id);
91 set have;
92 by id;
93 end;
94 do _n_ = 1 by 1 until (last.id);
95 set have;
96 by id;
97 end;
98 run;
NOTE: There were 5000000 observations read from the data set WORK.HAVE.
NOTE: There were 5000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 2.74 seconds
cpu time 2.73 seconds
It is easy to do this with an SQL query with a GROUP BY and HAVING clause.
proc sql;
create table want as
select *
from have
group by id
having not max( (match1 ne match2) and not missing(match2))
;
quit;
SAS evaluates boolean expressions as 1/0 for TRUE/FALSE so the MAX() of a series of TRUE/FALSE values will be TRUE if ANY of them are TRUE.
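The MAX-of-booleans trick can be seen in a small Python sketch. The rows are a shortened, illustrative subset of the question's data, `None` stands in for a blank match2, and the function name is hypothetical.

```python
from collections import defaultdict

# (id, match1, match2) rows; a shortened subset of the question's data.
rows = [
    (1, "ABC", "ABC"),
    (2, "QQQ", "AAA"),
    (2, "ABC", "ABC"),
    (3, "LMK", None),
]


def mismatch_flag_per_id(rows):
    flag = defaultdict(int)
    for id_, m1, m2 in rows:
        # 1 if this row violates the rule, 0 otherwise (a blank match2 never violates).
        violated = int(m1 != m2 and m2 is not None)
        # The running maximum is 1 as soon as ANY row in the group violates.
        flag[id_] = max(flag[id_], violated)
    return dict(flag)


flags = mismatch_flag_per_id(rows)
```

A group whose maximum is 0 contains no violating row, so those are the groups to keep.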
I have a table called "latencies" indexed by two sets, a and b, and a variable y defined over the same two sets. I also have a parameter over a whose values must be respected:
table latencies(a, b)
     b1   b2   b3
a1    1    2    3
a2    4    5    6
a3    7    9    8;
parameter pam1(a) /"a1" 12, "a2" 13, "a3" 14/;
positive variable y(a,b);
I am trying to make the sum of each row from the latencies table at most each respective element in the parameter pam1.
equations maxime(a), ...;
maxime(a)..
sum(a, y(a,b)) =l= pam1(a);
So the sum of the first row in latencies should be less than or equal to 12, the sum of the 2nd row should be less than or equal to 13, etc. However, I am getting these errors: "Set is under control already" and "Uncontrolled set entered as constant" on the same equation above. How do I do this?
Here is the corrected solution (which works):
equations maxime(a), ...;
maxime(a)..
sum(b, y(a,b)) =l= pam1(a);
I was incorrectly setting the row index (a) as my controlling index before. I needed to set that index as b, the column index. That is how you would iterate over the sum of each row and put an upper bound on it.
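To make the indexing explicit, here is a small Python sketch of what the corrected equation says: a is fixed by the equation's domain and only b is summed over. The values assigned to y are hypothetical candidates chosen for illustration; in the real model y is a decision variable chosen by the solver.

```python
A = ["a1", "a2", "a3"]
B = ["b1", "b2", "b3"]
pam1 = {"a1": 12, "a2": 13, "a3": 14}

# Hypothetical candidate values for y(a,b); not taken from the post.
y = {
    ("a1", "b1"): 1, ("a1", "b2"): 2, ("a1", "b3"): 3,
    ("a2", "b1"): 4, ("a2", "b2"): 5, ("a2", "b3"): 6,
    ("a3", "b1"): 4, ("a3", "b2"): 5, ("a3", "b3"): 5,
}


def maxime_holds(a):
    # One constraint per a: sum over the column index b only.
    return sum(y[a, b] for b in B) <= pam1[a]


row_ok = {a: maxime_holds(a) for a in A}
```

Summing over a inside an equation already indexed by a would leave nothing free to fix the row, which is what the "Set is under control already" error points at.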
I am new to GAMS. I have a table with 3 rows and 6 columns. I want to pull out each row and split its 6 elements between two parameters (the first three elements for one parameter and the last three for the other) using a loop or for statement. I tried both: with loop my parameters come out as zero, which is incorrect, and with for I get errors.
This is my code for the first row. Both the 'for' and 'loop' versions are shown together here, but I ran them separately.
Please help me.
Thanks
scalars j;
sets
o /red,green,blue/
p /b1,b2,b3,p1,p2,p3/
k /1*3/;
Table sup(*,*)
        b1   b2   b3   p1   p2   p3
red     12   15   20  200   50   50
green   16   17    0  150   50    0
blue    13   18    0  100   50    0 ;
parameters Bid_Red(k),Pmax_Red(k),t;
*for statement***************
for(j= 1 to 3,
t=card(o)+j;
Bid_Red(k)$( ord(k) = j )=sup('red',j);
Pmax_Red(k)$( ord(k) = j )=sup('red',t);
);
*loop statement***************
t=card(o);
loop(k,
Bid_Red(k)=sup('red',k);
Pmax_Red(k)=sup('red',k+t);
);
display Bid_red, Pmax_Red
One of the core features of GAMS is how it deals with set structures and indexing. I'd recommend looking at the excellent documentation, for example on set definition https://www.gams.com/latest/docs/UG_SetDefinition.html, to really get a feel for how to get the best out of it.
In your case, you can proceed as follows. p is a set. Create some subsets of it p_ and b_, given by the syntax subset_name(set_name).
sets p_(p) / p1, p2, p3 /,
b_(p) / b1, b2, b3 /;
Create parameters over appropriate dimensions (i.e. the full set), and define them over the subset you are interested in:
parameters bid_red(o,p),pmax_red(o,p);
bid_red(o,b_) = sup(o,b_);
pmax_red(o,p_) = sup(o,p_);
Then display bid_red, pmax_red; gives:
---- 21 PARAMETER bid_red
               b1          b2          b3
red        12.000      15.000      20.000
green      16.000      17.000
blue       13.000      18.000
---- 21 PARAMETER pmax_red
               p1          p2          p3
red       200.000      50.000      50.000
green     150.000      50.000
blue      100.000      50.000
If you do want to select individual rows, you can use e.g. pmax_red('red',p_) in your code. This is essentially just a special case of subsetting in which the subset is of size 1.
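The same subsetting idea, sketched in Python with nested dictionaries purely for illustration. The helper name `restrict` is made up for this example. Note one difference: GAMS does not store or display zero entries, while this sketch keeps them.

```python
# The sup table from the question, as row -> {column: value}.
sup = {
    "red":   {"b1": 12, "b2": 15, "b3": 20, "p1": 200, "p2": 50, "p3": 50},
    "green": {"b1": 16, "b2": 17, "b3": 0,  "p1": 150, "p2": 50, "p3": 0},
    "blue":  {"b1": 13, "b2": 18, "b3": 0,  "p1": 100, "p2": 50, "p3": 0},
}

# Subsets of the column set p, named as in the answer.
b_ = ["b1", "b2", "b3"]
p_ = ["p1", "p2", "p3"]


def restrict(table, columns):
    """Keep only the given columns of every row."""
    return {row: {c: vals[c] for c in columns} for row, vals in table.items()}


bid_red = restrict(sup, b_)
pmax_red = restrict(sup, p_)

# Selecting a single row is the special case of a size-1 subset of rows.
red_pmax = pmax_red["red"]
```

No loop over element positions is needed; the subsets carry the information about which columns go where.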
I am interested in using Redis to check if an IP address (converted into an integer) falls within a range of IPs. It is very likely that the ranges will overlap.
I have found this question/answer, although I am not able to fully understand the logic behind it.
Thank you for your help!
EDIT - Since I got a downvote (a comment to explain why would be nice), I've removed some clutter from my answer.
@DidierSpezia's answer in your linked question is a good one, but it is neither trivial nor cheap to build and maintain, especially if you are adding and removing ranges.
I have an answer that is easier to maintain, but it could get slow and memory-expensive to compute with many ranges, as it requires cloning a set of all ranges.
You need to save all ranges twice, in two sorted sets. In one set the score of each range is its lower border; in the other it is its higher border.
Going with the ranges in @DidierSpezia's example:
A 2-8
B 4-6
C 2-9
D 7-10
Your two sets will be:
ZADD ranges:low 2 "2-8" 4 "4-6" 2 "2-9" 7 "7-10"
ZADD ranges:high 8 "2-8" 6 "4-6" 9 "2-9" 10 "7-10"
To find the ranges a value belongs to, you need to trim the ranges whose lower border is higher than the queried value, and trim the ranges whose higher border is lower than it.
The most efficient way I can think of is cloning one of the sets, trimming one of its sides by the rules given above, changing the scores of the ranges to reflect the other border, and then trimming the second side.
Here's how to find the ranges 5 belongs to:
ZUNIONSTORE tmp 1 ranges:low
ZREMRANGEBYSCORE tmp (5 +inf
ZINTERSTORE tmp 2 tmp ranges:high WEIGHTS 0 1
ZREMRANGEBYSCORE tmp -inf (5
ZRANGE tmp 0 -1
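To make it easier to follow what each command does, here is a plain-Python emulation of the five steps, with dictionaries standing in for the two sorted sets. It does not talk to Redis at all; it only mirrors the logic, and the function name is made up for this sketch.

```python
# member -> score, mirroring ranges:low and ranges:high above.
ranges_low = {"2-8": 2, "4-6": 4, "2-9": 2, "7-10": 7}
ranges_high = {"2-8": 8, "4-6": 6, "2-9": 9, "7-10": 10}


def ranges_containing(value):
    # ZUNIONSTORE tmp 1 ranges:low -> work on a copy of the low-border set.
    tmp = dict(ranges_low)
    # ZREMRANGEBYSCORE tmp (value +inf -> drop ranges whose low border is above value.
    tmp = {m: s for m, s in tmp.items() if s <= value}
    # ZINTERSTORE tmp 2 tmp ranges:high WEIGHTS 0 1 -> same members, score = high border.
    tmp = {m: ranges_high[m] for m in tmp if m in ranges_high}
    # ZREMRANGEBYSCORE tmp -inf (value -> drop ranges whose high border is below value.
    tmp = {m: s for m, s in tmp.items() if s >= value}
    # ZRANGE tmp 0 -1 -> members ordered by score, ties broken lexicographically.
    return sorted(tmp, key=lambda m: (tmp[m], m))
```

For the value 5 this yields "4-6", "2-8" and "2-9", ordered by their higher border.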
In this discussion, Dvir Volk and @antirez suggested using a sorted set in which each entry represents a range and has the following form:
Member = "min-max" range
Score = max value
For example:
ZADD z 10 "0-10"
ZADD z 20 "10-20"
ZADD z 100 "50-100"
And in order to check if a value falls within a range, you can use ZRANGEBYSCORE and parse the member returned.
For example, to check value 5:
ZRANGEBYSCORE z 5 +inf LIMIT 0 1
This will return the "0-10" member, and you only need to parse the string and validate that your value lies between its two bounds.
To check value 25:
ZRANGEBYSCORE z 25 +inf LIMIT 0 1
This will return "50-100", but 25 is not within that range, so the value does not fall in any stored range.
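The lookup-then-validate step can be sketched in plain Python, with a sorted list standing in for the sorted set and `bisect` standing in for ZRANGEBYSCORE with LIMIT 0 1. The function name is hypothetical, and the member parsing assumes non-negative integer bounds separated by a single hyphen.

```python
from bisect import bisect_left

# (score, member) pairs sorted by score, mirroring the sorted set z above.
z = [(10, "0-10"), (20, "10-20"), (100, "50-100")]
scores = [score for score, _ in z]


def find_range(value):
    # Equivalent of: ZRANGEBYSCORE z value +inf LIMIT 0 1
    i = bisect_left(scores, value)  # first entry whose score (max) is >= value
    if i == len(z):
        return None  # value is above every stored maximum
    member = z[i][1]
    # Parse "min-max" and validate that the value really lies inside.
    low, high = (int(part) for part in member.split("-"))
    return member if low <= value <= high else None
```

For 5 this returns "0-10"; for 25 the candidate "50-100" fails validation, so the result is None.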
I have over 3000 values in column A and a value x in column B. I want Excel to look through the values in column A and return "yes" if any value is bigger than x yet smaller than x+7 (that is, x < value < x+7). If no such value exists, it should display "no".
Here's an example:
Column A
2: 11.2
3: 11.3
4: 11.4
5: 13.5
6: 13.6
7: 20.5
8: 20.6
9: 30.5
Column B
2: 11.1
3: 20.7
In this case, since there are values in Column A that are bigger than B2 (11.1) and smaller than B2+7, I need Excel to return "yes". If possible, it would be ideal to also return the first value in column A after the specific value in column B.
Here's what I have tried so far but have had no success:
=IF(AND((B2+7)>A1:A3000>B2),"yes","no")
=IF(AND((B2+7)>$A$2:$A$3000,$A$2:$A$3000>B2),"yes","no")
How can I do this in Excel? Is there a way to do this other than using IF?
Forgive me if I am not understanding the question, but isn't the answer:
=IF(AND((B2+7)>$A$2:$A$10,$A$2:$A$10>B2),"yes","no")
Where that would be the equation in C2, testing B2 for entries in the list that spans A2 - A10. You'd copy that equation down the column for all the entries in B column.
Try this (for cell B2)
=IF(SUMPRODUCT((A:A>B2)*(A:A<B2+7)),"Yes","No")
and copy down as far as required.
For the second part, to return the next larger value, try this
=IF(SUMPRODUCT((A:A>B2)*(A:A<B2+7)),INDEX(A:A,MATCH(B2,A:A,1)+1),"No")
Note that this requires the data in column A to be sorted ascending (as your sample data is) and for cell A to contain '0'. If these conditions are not possible, then you might have to consider a VBA user defined function.
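To spell out what the two formulas are testing, here is the same logic as a small Python sketch using the sample data from the question. It assumes column A is sorted ascending, as the note above requires; the function name and the `width` parameter are made up for this example.

```python
from bisect import bisect_right

# Column A from the question's example, sorted ascending.
col_a = [11.2, 11.3, 11.4, 13.5, 13.6, 20.5, 20.6, 30.5]


def first_in_window(x, width=7):
    """Return the first value v in col_a with x < v < x + width, or None."""
    i = bisect_right(col_a, x)  # index of the first value strictly greater than x
    if i < len(col_a) and col_a[i] < x + width:
        return col_a[i]
    return None


def yes_no(x):
    return "yes" if first_in_window(x) is not None else "no"
```

For B2 = 11.1 the first value in the window is 11.2, so the answer is "yes"; for B3 = 20.7 the next value is 30.5, which is outside the window, so the answer is "no".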
I don't understand what you require but maybe =IF(AND(A2>B$2,A2<B$2+7),"yes","no") copied down would serve to test B2 and B2+7 against each of the Column A values.