How do I specify minutes range between 55 and 60 (both inclusive) in kusto - kql

So, I was storing the current time in a variable and extracting the minute for it.
For eg
let current_time=now():
let current_minute=datetime_part("minute", current_time);
RequiredTime=iff(current_minute between (55 .. 00), variable1, variable2)
So, I want to check range between 55th and the 60th minute. Should I refer to it as 00 (I doubt this since ideally it should get confused looking for a range between 55 and 00) or is there any other way to handle this case?

let current_time=now():
let current_minute=datetime_part("minute", current_time);
let current_minute60=iff(current_minute==00, 60, current_minute);
RequiredTime=iff(current_minute between (55 .. current_minute60), variable1, variable2);
This takes care of the edge case.

Related

Convert character variable to numeric variable in SAS

I'm trying to convert a character variable to a numeric variable, but unfortunately i'm really struggeling. Help would be appreciated!
I keep getting the following error: 'Invalid argument to function INPUT at line 3259 column 17'
Syntax:
Data want;
Set have;
Dosis_num = input(Dosis, best12.);
run;
I have also tried multiplying the variable by 1. This doesnt work either.
The variable looks like this:
Dosis
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
Want:
Dosis_num
155.0
201.0
2.1
0.8
123.8
12.0
333.4
0.6
Thanks alot!
The code will work with the data you show. So either the values in the character variable are not what you think or you are not using the right variable name for the variable.
The code is trying to only use the first 12 bytes of the character variable. Normally you don't need to restrict the number of characters you ask the INPUT() function to use. In fact the INPUT() function does not care if the width of the informat used is larger than the length of the string being read. So just use 32. as the informat since 32 is the maximum width that the normal numeric informat can read. Note that BEST is the name of a FORMAT, if you use it as the name of informat it is just an alias for the normal numeric informat.
If the variable has a length longer than 12 then perhaps there are leading spaces in the variable (note the ODS output displays do not properly display leading spaces) then use the LEFT() function to remove them.
Dosis_num = input(left(Dosis), 32.);
The typical thing to do here is to find out what's actually in the character variable. There is likely something in there that is causing the issue.
Try this:
data have;
input #1 Dosis $8.;
datalines;
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
;;;;
run;
data check;
set have;
put dosis hex32.;
run;
What I get is this:
83 data check;
84 set have;
85 put dosis hex32.;
86 run;
3135352020202020
3230312020202020
322E312020202020
302E382020202020
3132332E38302020
31322E3020202020
333333332E342020
30302E3620202020
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.CHECK has 8 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
All those 2020202020 are spaces, which should be there (all strings are space-padded to full length). Period/Decimal Point is 2E, Digits are 3x where x is the digit (because the ASCII for 0 is 30, not because of any other reason). So for example for the last one, 00.6, 30 means zero, 30 means zero, 2E means period, and 36 means 6.
Check to make sure that you don't have any other characters other than digits (3x) and period (2e) and space (20).
The other thing to verify is that your system is set to use . as the decimal separator and not , as many European systems are - otherwise this requires the commaw. informat. You can actually just try the commaw. informat (comma12. is sufficient if 12 is plenty - and don't include anything after the period) as anything that 12. can read in also can be read in by commaw..

Weighted Activity Selection Problem with allowing shifting starting time

I have some activities with weights, and I would like to select non overlapping activities by maximizing the total weight. This is known problem and solution exists.
In my case, I am allowed to shift the start time of activities in some extend while duration remains same. This will give me some flexibility and I might increase my utilization.
Example scenario is something like the following where all activities are supposed to be in interval (0-200):
(start, end, profit)
a1: 10 12 120
a2: 10 13 100
a3: 14 18 150
a4: 14 20 100
a5: 120 125 100
a6: 120 140 150
a7: 126 130 100
Without shifting flexibility, I would choose (a1, a3, a6) and that is it. On the other hand I have shifting flexibility to the left/right by at most t units for any task where t is given. In that case I might come up with this schedule and all tasks can be selected except a7 since conflict cannot be avoided by shift .
t: 5
a1: 8 10 120 (shifted -2 to left)
a2: 10 13 100
a3: 14 18 150
a4: 18 24 100 (shifted +4 to right)
a5: 115 120 100 (shifted -5 to left)
a6: 120 140 150
In my problem, total time I have is very big with respect to activity duration. While activities are like 10sec on average, total time I have would even be 10000sec. However that does not mean all of activities can be selected since shifting flexibility would not be enough for some activities to non-overlap.
Also in my problem, there are clusters of activities which overlaps and very big empty space where no activities and there comes another cluster of overlapping activities i.e a1, a2, a3 and a4 are let say cluster1 and a5, a6 and a7 is cluster2. Each cluster can be expanded in time by shifting some of them to left and right. By doing that, I can select more activities than the original activity selection problem. However, I do not know how to decide which tasks to be shifted to left or right.
My expectation is to find an near-optimal solution where total profit would be somehow local optima. I do not need global optimum value. Also I do not have any criteria about cluster utilization., i.e I do not have a guarantee about a minimum number of activity per cluster etc. Actually, these clusters something I visually describe. There is not defined cluster. However, in time domain, activities are separated as clusters somehow.
Also activity start and end times are all integers since I can dismiss fractions. I would have around 50 activities whose duration would be 10 on average. And time window is like 10000.
Are there any feasible solution to this problem?
You mentioned that you can partition the activities into clusters that don't overlap even if activities within them are shifted to the extent. Each of these clusters can be considered independently, and the optimal results computed for each cluster simply summed up for the final answer. So the first step of the algorithm could be a trial run that extends all activities in both directions, finds which ones form clusters, and process each cluster independently. In the worst case, all of the activities might form a single cluster.
Depending on the maximum size of the remaining clusters, there are several approaches. If it's under 20 (or even 30, depending on whether you want your program to run in seconds or minutes), you could combine search over all subsets of activities in the given cluster with a greedy approach. In other words: if you are processing a subset of N elements, try every one of its 2^N possible subsets (okay, 2^N-1 if we forget the empty subset), check whether the activities in this specific subset can be scheduled in non-overlapping manner, and pick the subset that is eligible and has maximum sum.
How do we check that a given subset of activities can be scheduled in non-overlapping manner? Let's sort them in ascending order of their end and process them from left to right. For every activity, we try to schedule it as early as possible, making sure it does no intersect with activities we already considered. So, the first activity in the cluster is always started time t earlier than originally planned, the second one is started either when the first one ends, or t earlier than originally planned, whichever is larger, and so on. If at any point we can't schedule the next activity in a way that it does not overlap with previous one, then there is no way to schedule the activities in this subset in a non-overlapping manner. This algorithm takes O(NlogN) time, and overall each cluster is processed in O(2^N * NlogN). Once again, note that this function grows very quickly, so if you are dealing with large enough clusters, this approach goes out the window.
===
Another approach is specific to the additional restrictions you provided. If the activities' starts and ends and parameter t are all measured in integer number of seconds, and t is about 2 minutes, then the problem for each cluster is set in a small discrete space. Even though you could position a task to start at a non-integer second value, there always is an optimal solution that uses only integers. (To prove it, consider an optimal solution that does not use integers - since t is integer, you can always shift tasks, starting from the leftmost, to the left a bit so that it starts at an integer value.)
Knowing that the start and end times are discrete, you can build a DP solution: process the activities in the ascending order of their end*, and memoize the maximum possible sum of weights you can obtain from the first 1, 2, ..., N activities for each x from activity_start - t to activity_start + t if a given activity ends at time x. If we denote this memoized function as f[activity][end_time], then the recurrence relation is f[a][e] = weight[a] + max(f[i][j] over all i < a, j <= e - (end[a] - start[a]), which roughly translates to "if activity a ended at time e, the previous activity must have ended at or before start of a - so let's pick the maximum total weight over previous activities and their ends, and add the current activity's weight".
*Again, we can prove that there is at least one optimal answer where this ordering is preserved, even though there might be other optimal answers which do not possess this property
We could go further and eliminate the iteration over previous activities, instead encoding this information in f. Its definition would then change to "f[a][e] is the maximum possible total weight of the first a activities if none of them ends after e", and recurrence relation would become f[a][e] = max(f[a-1][e], weight[a] + max(f[a-1][i] over i <= e - (end[a] - start[a])])), and its computational complexity would be O(X * N), where X is the total span of the discrete space where task starts/ends are placed.
I assume you need to compute not just the maximum possible weight, but also the activities you need to select to obtain it, and possibly even the exact time each of them needs to be started. Thankfully, we can derive all of this from the values of f, or compute it at the same time as we compute f. The latter is easier to reason about, so let's introduce a second function g[activity][end]. g[activity][end] returns a pair (last_activity, last_activity_end), essentially pointing us to the exact activity and its timing that the optimal weight in f[activity][end] uses.
Let's go through the example you provided to illustrate how this works:
(start, end, profit)
a1: 10 12 120
a2: 10 13 100
a3: 14 18 150
a4: 14 20 100
a5: 120 125 100
a6: 120 140 150
a7: 126 130 100
We order the activities by their end time, thereby swapping a7 and a6.
We initialize the values of f and g for the first activity:
f[1][7] = 120, f[1][8] = 120, ..., f[1][17] = 120, meaning that the first activity could end anywhere from 7 to 17, and costs 120. f[1][i] for all other i should be set to 0.
g[1][7] = (1, 7), g[1][8] = (1, 8), ..., g[1][17] = (1, 17), meaning that the last activity that was included in f[1][i] values was a1, and it ended at i. g[1][i] for all i outside [7, 17] is undefined/irrelevant.
That's where something interesting begins. For each i such that a2 cannot end at time i, let's assign f[2][i] = f[1][i], g[2][i] = g[1][i], which essentially means that we wouldn't be using activity a2 in those answers. For all other i, namely, in [8..18] interval, we have:
f[2][8] = max(f[1][8], 100 + max(f[1][0..5])) = f[1][8]
f[2][9] = max(f[1][9], 100 + max(f[1][0..6])) = f[1][9]
f[2][10] = max(f[1][10], 100 + max(f[1][0..7])). This is the first time when the second clause is not just plain 100, as f[1][7]>0. It is, in fact, 100+f[1][7]=220, meaning that we can take activity a2, shift it in a way that puts its end at time 10, and get a total weight of 220. We continue computing f[2][i] this way for all i <= 18.
The values of g are: g[2][8]=g[1][8]=(1, 8), g[2][9]=g[1][9]=(1, 9), g[2][10]=(2, 10), because it was optimal to take activity a2 and end it at time 10 in this case.
I hope the pattern of how this continues is visible - we compute all the values of f and g through the end, and then pick the maximum f[N][e] over all possible end times e of the last activity. Armed with the auxiliary function g, we can traverse the values backwards to figure out the exact activities and times. Namely, the last activity we use and its timing is in g[N][e]. Let's call them A and T. We know that A began at T-(end[A]-start[A]). Then, the previous activity must have ended at that point or before - so let's look at g[A-1][T-(end[A]-start[A]) for it, and so on.
Note that this approach works even if you don't partition anything into clusters, but with the partitioning, the size of the space in which tasks can be scheduled is reduced, and with it the runtime.
You might notice that neither of these solutions is polynomial in the size of input. I have a feeling that your problem doesn't have a general polynomial solution, but I was unable to prove it by reducing another NP-complete problem to it. Would be really curious to read a reduction / better general solution!

SAS - Advanced querying

I have one SQL table with the data in SAS. The first column is a datetime, and there is one row for each second. The set spans for about 20 minutes. The other columns contain integer values.
Here is what I need:
For example, Let's pick 50. How many times did the integer value go from below 50 to above 50 and stay above 50 for at least n seconds.
Is it possible to conduct such analysis with proc sql? If yes, how so, and if not, how else?
I am new to SAS, so any help is appreciated. Let me know if you need more info!
Thanks!
How many times did the integer value go from below/above 50
I think this could be solution to first part of the question. Resolution is maybe the best obtained by comparing current value with prior
data begin; /*Some test data...*/
input int_in_question;
datalines;
51
51
49
55
55
40
40
60
40
;
run;
data With_calc;
set begin;
if int_in_question < 50 and
lag(int_in_question)>=50
then Times_below_50+1;
run;

Pandas shifting uneven timeseries data

I have some irregularly stamped time series data, with timestamps and the observations at every timestamp, in pandas. Irregular basically means that the timestamps are uneven, for instance the gap between two successive timestamps is not even.
For instance the data may look like
Timestamp Property
0 100
1 200
4 300
6 400
6 401
7 500
14 506
24 550
.....
59 700
61 750
64 800
Here the timestamp is say seconds elapsed since a chose origin time. As you can see we could have data at the same timestamp, 6 secs in this case. Basically the timestamps are strictly different, just that second resolution cannot measure the change.
Now I need to shift the timeseries data ahead, say I want to shift the entire data by 60 secs, or a minute. So the target output is
Timestamp Property
0 750
1 800
So the 0 point got matched to the 61 point and the 1 point got matched to the 64 point.
Now I can do this by writing something dirty, but I am looking to use as much as possible any inbuilt pandas feature. If the timeseries were regular, or evenly gapped, I could've just used the shift() function. But the fact that the series is uneven makes it a bit tricky. Any ideas from Pandas experts would be welcome. I feel that this would be a commonly encountered problem. Many thanks!
Edit: added a second, more elegant, way to do it. I don't know what will happen if you had a timestamp at 1 and two timestamps of 61. I think it will choose the first 61 timestamp but not sure.
new_stamps = pd.Series(range(df['Timestamp'].max()+1))
shifted = pd.DataFrame(new_stamps)
shifted.columns = ['Timestamp']
merged = pd.merge(df,shifted,on='Timestamp',how='outer')
merged['Timestamp'] = merged['Timestamp'] - 60
merged = merged.sort(columns = 'Timestamp').bfill()
results = pd.merge(df,merged, on = 'Timestamp')
[Original Post]
I can't think of an inbuilt or elegant way to do this. Posting this in case it's more elegant than your "something dirty", which is I guess unlikely. How about:
lookup_dict = {}
def assigner(row):
lookup_dict[row['Timestamp']] = row['Property']
df.apply(assigner, axis=1)
sorted_keys = sorted(lookup_dict.keys)
df['Property_Shifted'] = None
def get_shifted_property(row,shift_amt):
for i in sorted_keys:
if i >= row['Timestamp'] + shift_amt:
row['Property_Shifted'] = lookup_dict[i]
return row
df = df.apply(get_shifted_property, shift_amt=60, axis=1)

How to set a minimum random number in REBOL?

I'm executing some code and then waiting somewhere between 1 second and 1 minute. I'm currently using random 0:01:00 /seed but what I really need is to be able to set a floor so that its waiting between 30 seconds and 1 minute.
If you want 0:0:30 to be the minimum and 0:1:0 to be the maximum, try the formula:
0:0:29 + random 0:0:31
This formula yields a "discretely distributed (pseudo)random value". If you want a "continuously distributed (pseudo) random value", you can use (just in R3) the formula:
0:0:30 + random 30.0
R2 does not have a native support for "continuously distributed (pseudo)random values".
Not my area of expertise, but:
00:00:30 + to time! (random 100% * (to integer! 00:00:30))
...appears to work, I think.
>>random/seed now/precise
>> t1: now wait 30 + random 30 difference now t1
== 0:00:39
How about the following:
0:00:30 + random 0:00:30
You could generate a whole number from 1 to 30 and subtract that number in seconds from 1 minute and 1 second.
(and about seeding, use that, but not constantly)