This is a follow-up to the question: How to optimize this moving average calculation, in F#
To summarize the original question: I need to make a moving average of a set of data I collect; each data point has a timestamp and I need to process data up to a certain timestamp.
This means that I have a list of variable size to average.
The original question implements this as a queue, where elements get added and are eventually removed once they are too old.
But, in the end, iterating through a queue to make the average is slow.
Originally the bulk of the CPU time was spent finding the data to average, but then once this problem was removed by only keeping the data needed in the first place, the Seq.average call proved to be very slow.
It looks like the original mechanism (based on Queue<>) is not appropriate and this question is about finding a new one.
I can think of two solutions:
Implement this as a circular buffer large enough to accommodate the worst-case scenario. This would allow using an array and computing the sum in at most two passes (one per contiguous segment of the buffer).
Quantize the data into buckets and pre-sum it, but I'm not sure whether the extra complexity will help performance.
Is there any implementation of a circular buffer that would behave similarly to a Queue<>?
The fastest code, so far, is:
module PriceMovingAverage =
    // moving average queue
    let private timeQueue = Queue<DateTime>()
    let private priceQueue = Queue<float>()
    // update moving average
    let updateMovingAverage (tradeData: TradeData) priceBasePeriod =
        // add the new price
        timeQueue.Enqueue(tradeData.Timestamp)
        priceQueue.Enqueue(float tradeData.Price)
        // remove the items older than the price base period
        let removeOlderThan = tradeData.Timestamp - priceBasePeriod
        let rec dequeueLoop () =
            let p = timeQueue.Peek()
            if p < removeOlderThan then
                timeQueue.Dequeue() |> ignore
                priceQueue.Dequeue() |> ignore
                dequeueLoop()
        dequeueLoop()
    // get the moving average
    let getPrice () =
        try
            Some (
                priceQueue
                |> Seq.average // <- all CPU time goes here
                |> decimal
            )
        with _ ->
            None
Based on a queue length of 10-15k I'd say there's definitely scope to consider batching trades into precomputed blocks of maybe around 100 trades.
Add a few types:
type TradeBlock = {
    data: TradeData array
    startTime: DateTime
    endTime: DateTime
    sum: float
    count: int
}
type AvgTradeData =
    | Trade of TradeData
    | Block of TradeBlock
I'd then make the moving average use a DList<AvgTradeData>. (https://fsprojects.github.io/FSharpx.Collections/reference/fsharpx-collections-dlist-1.html) The first element in the DList is summed manually if startTime is after the price period and removed from the list once the price period exceeds the endTime. The last elements in the list are kept as Trade tradeData until 100 are appended and then all removed from the tail and turned into a TradeBlock.
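As a rough illustration of the pre-summed block idea, here is a minimal Python sketch. It is not the DList-based F# design described above: the class and method names (`BlockedMovingAverage`, `add`, `average`) are made up for this example, timestamps are plain numbers, and each block keeps its own trades so that the oldest block can be trimmed one trade at a time.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    """A batch of trades with a pre-computed sum (hypothetical helper type)."""
    times: deque = field(default_factory=deque)
    prices: deque = field(default_factory=deque)
    total: float = 0.0


class BlockedMovingAverage:
    def __init__(self, period, block_size=100):
        self.period = period          # length of the averaging window
        self.block_size = block_size  # trades per block (100 as suggested above)
        self.blocks = deque()

    def add(self, timestamp, price):
        # Start a new tail block once the current one is full.
        if not self.blocks or len(self.blocks[-1].prices) >= self.block_size:
            self.blocks.append(Block())
        tail = self.blocks[-1]
        tail.times.append(timestamp)
        tail.prices.append(price)
        tail.total += price
        self._expire(timestamp - self.period)

    def _expire(self, cutoff):
        while self.blocks:
            head = self.blocks[0]
            if head.times[-1] < cutoff:
                # Every trade in the block is too old: drop it in one step.
                self.blocks.popleft()
                continue
            # Only part of the head block is too old: trim it trade by trade.
            while head.times[0] < cutoff:
                head.times.popleft()
                head.total -= head.prices.popleft()
            break

    def average(self):
        # Iterates over blocks (about n / block_size of them), not over trades.
        count = sum(len(b.prices) for b in self.blocks)
        if count == 0:
            return None
        return sum(b.total for b in self.blocks) / count
```

With 10-15k trades and blocks of 100, `average` touches roughly 100-150 block sums instead of every price. Note that repeatedly subtracting floats from `total` can accumulate rounding error over a long run, so recomputing a block's sum occasionally may be worthwhile.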
I have the following time scheduling optimisation problem:
There are n breaks to be scheduled. A break takes up k time grains of 15 minutes each. The total horizon I am looking at is of m time grains. Each time grain has a desired amount of breaks to optimise for. The range to start a break at is defined per break, you cannot freely pick the range.
To make it more general - there is a distribution of breaks over time as a goal. I need to output a result which would align with this desired distribution as much as possible. I am allowed to move each break within certain boundaries, e.g. 1 hour boundary.
I had a look at the TimeGrain pattern as a starting point which is described here: https://www.optaplanner.org/blog/2015/12/01/TimeSchedulingDesignPatterns.html and in this video: https://youtu.be/wLK2-4IGtWY. I am trying to use Constraint Streams for incremental optimisation.
My approach so far is as follows:
Break.scala:
case class Break(vehicleId: String, durationInGrains: Int)
TimeGrain.scala:
@PlanningEntity
case class TimeGrain(desiredBreaks: Int,
instant: Instant,
@CustomShadowVariable(...), // Dummy annotation, I want to use the entity in constraint stream
var breaks: Set[Break])
BreakAssignment:
@PlanningEntity
case class BreakAssignment(
break: Break,
@PlanningVariable(valueRangeProviderRefs = Array("timeGrainRange"))
var startingTimeGrain: TimeGrain,
@ValueRangeProvider(id = "timeGrainRange")
@ProblemFactCollectionProperty @field
timeGrainRange: java.util.List[TimeGrain],
@CustomShadowVariable(
variableListenerClass = classOf[StartingTimeGrainVariableListener],
sources = Array(new PlanningVariableReference(variableName = "startingTimeGrain"))
)
var timeGrains: util.Set[TimeGrain]
)
object BreakAssignment {
class StartingTimeGrainVariableListener extends VariableListener[Solution, BreakAssignment] {
override def afterVariableChanged(scoreDirector: ScoreDirector[Solution], entity: BreakAssignment): Unit = {
val end = entity.startingTimeGrain.instant
.plusSeconds((entity.break.durationInGrains * TimeGrain.grainLength).toSeconds)
scoreDirector.getWorkingSolution.timeGrains.asScala
.filter(
timeGrain =>
timeGrain.instant == entity.startingTimeGrain.instant ||
entity.startingTimeGrain.instant.isBefore(timeGrain.instant) && end
.isAfter(timeGrain.instant)
)
.foreach { timeGrain =>
scoreDirector.beforeVariableChanged(timeGrain, "breaks")
timeGrain.breaks = timeGrain.breaks + entity.break
scoreDirector.afterVariableChanged(timeGrain, "breaks")
}
}
}
}
Constraints.scala:
private def constraint(constraintFactory: ConstraintFactory) =
constraintFactory
.from(classOf[TimeGrain])
.filter(timeGrain => timeGrain.breaks.nonEmpty)
.penalize(
"Constraint",
HardSoftScore.ONE_SOFT,
(timeGrain: TimeGrain) => {
math.abs(timeGrain.desiredBreaks - timeGrain.breaks.size)
}
)
As you can see, I need to iterate over all the grains in order to find out which ones need to be updated to hold the break which was just moved in time. This somewhat negates the idea of Constraint Streams.
A different way to look at the issue I am facing is that I cannot come up with an approach to link the BreakAssignment planning entity with the respective TimeGrains via e.g. Shadow Variables. A break assignment spans multiple time grains. A time grain in turn contains multiple break assignments. For the soft constraint I need to group all the assignments within the same grain, accessing the desired target break count of a grain, while doing this for all the grains of my time horizon. My approach therefore is to have each grain as a planning entity, so I can store the information of all the breaks on each change of the starting time grain of the assignment itself. What I end up with is basically a many-to-many relationship between assignments and time grains. From my understanding, this does not fit into the inverse mechanism of a shadow variable, as that needs to be a one-to-many relationship.
Am I going in the wrong direction while trying to come up with the correct model?
If I understand properly what you mean, then conceptually, in the TimeGrain class, I would keep a (custom) shadow variable keeping (only) the count of Break instances that are overlapping that TimeGrain (instance). Let me call it breakCount for simplicity. Let me call x the number of TimeGrains a Break spans.
So, upon the solver assigning a Break instance to a TimeGrain instance, I would increment that TimeGrain instance's breakCount. Not only that TimeGrain instance's breakCount, but also the breakCount of the next few (x-1) TimeGrain instances. Be sure to wrap each of those increments in a "scoreDirector.beforeVariableChanged()"-"scoreDirector.afterVariableChanged()" bracket.
The score calculation would do the rest. But do note that I myself would also square the difference between a TimeGrain's ideal breakCount and its "real" breakCount (i.e. the shadow variable), as explained in OptaPlanner's documentation, in order to enforce more "fairness".
Edit: of course, also decrement a TimeGrain's breakCount upon removing a Break instance from a TimeGrain instance...
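To make the bookkeeping concrete, here is a small plain-Python sketch of the counting idea. It does not use the OptaPlanner API at all: `GrainCounts`, `assign`, `unassign` and `penalty` are hypothetical names, grains are addressed by index, and the `scoreDirector` notifications are left out. It only shows that moving a break touches x counters rather than the whole horizon, and how the squared difference would be scored.

```python
class GrainCounts:
    """Per-grain break counters, standing in for the breakCount shadow variable."""

    def __init__(self, desired):
        self.desired = list(desired)                 # desired breaks per time grain
        self.break_count = [0] * len(self.desired)   # breaks currently overlapping each grain

    def _shift(self, start, span, delta):
        # Touch only the grains the break overlaps: start .. start + span - 1.
        end = min(start + span, len(self.break_count))
        for i in range(start, end):
            self.break_count[i] += delta

    def assign(self, start, span):
        """A break of `span` grains now starts at grain index `start`."""
        self._shift(start, span, +1)

    def unassign(self, start, span):
        """The break that started at `start` is removed or about to be moved."""
        self._shift(start, span, -1)

    def penalty(self):
        # Squared difference, so one grain that is far off costs more
        # than several grains that are each slightly off.
        return sum((d - c) ** 2 for d, c in zip(self.desired, self.break_count))
```

Moving a break would then be an `unassign` at the old start followed by an `assign` at the new one. In a real solver the per-grain penalty would be computed by the constraint itself, not by a full sum as in `penalty` here.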
I am new to Kotlin (and Java). In order to pick up on the language I am trying to solve some problems from a website.
The problem is quite easy and straightforward: the function has to count how many times the biggest value occurs in an IntArray. My function works for smaller arrays but seems to exceed the allowed time limit for larger ones (error: Your code did not execute within the time limits).
fun problem(inputArray: Array<Int>): Int {
// Write your code here
val n: Int = inputArray.count{it == inputArray.max()}
return n
}
So as I am trying to improve I am not looking for a faster solution, but for some hints on topics I could look at in order to find a faster solution myself.
Thanks a lot!
In an unordered array you have to touch every element to calculate inputArray.max(). So inputArray.count() goes over all elements and, for each one, calls max(), which goes over all elements again.
So the runtime grows as n^2 for n elements.
Store inputArray.max() in an extra variable, and you have a linear runtime.
val max = inputArray.max()
val n: Int = inputArray.count{ it == max }
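For comparison, the count can also be obtained in a single pass by tracking the running maximum and how often it has been seen. This is a Python sketch of that variant, not the Kotlin fix above, and `count_max` is just a name picked for the example.

```python
def count_max(values):
    """Count occurrences of the largest element in one pass over `values`."""
    best = None   # largest value seen so far
    count = 0     # how many times `best` has been seen
    for v in values:
        if best is None or v > best:
            # New maximum: restart the count.
            best, count = v, 1
        elif v == best:
            count += 1
    return count
```

Both versions are linear. The single pass mostly matters when the input can only be read once, for example from a stream.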
I have this calculation method that calculates 6 fields and total.
It works.
The question is how I can optimize it, both performance-wise and code-quality-wise.
Just want to get some suggestions on how to write better code.
def _onchange_date(self):
    date = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    self.date = date
    self.drawer_potential = self.drawer_id.product_categ_price * self.drawer_qty
    self.flexible_potential = self.flexible_id.product_categ_price * self.flexible_qty
    self.runner_potential = self.runner_id.product_categ_price * self.runner_qty
    self.group_1_potential = self.group_1_id.product_categ_price * self.group_1_qty
    self.group_2_potential = self.group_2_id.product_categ_price * self.group_2_qty
    self.group_3_potential = self.group_3_id.product_categ_price * self.group_3_qty
    total = [self.drawer_potential, self.flexible_potential, self.runner_potential,
             self.group_1_potential, self.group_2_potential, self.group_3_potential]
    self.total_potential = sum(total)
First things first: you should worry about performance mostly on batch operations. Your case is an onchange method, which means:
it will be triggered manually by user interaction.
it will only affect a single record at a time.
it will not perform database writes.
So, basically, this one will not be a critical bottleneck in your module.
However, you're asking how that could get better, so here it goes. It's just an idea, in some points just different (not better), but this way you can maybe see a different approach in some place that you prefer:
def _onchange_date(self):
    # Use this alternative method, easier to maintain
    self.date = fields.Datetime.now()
    # Less code here, although almost equal
    # performance (possibly slightly lower)
    total = 0
    for field in ("drawer", "flexible", "runner",
                  "group_1", "group_2", "group_3"):
        potential = self["%s_id" % field].product_categ_price * self["%s_qty" % field]
        total += potential
        self["%s_potential" % field] = potential
    # We accumulate the total in the loop instead of building a list
    self.total_potential = total
I only see two things you can improve here:
Use Odoo's Datetime class to get "now", because it already takes Odoo's datetime format into consideration. In the end that's more maintainable: with a hard-coded format string, if Odoo decides to change the whole format system-wide, you have to change your method, too.
Try to avoid so many assignments and instead use methods which allow a combined update of some values. For onchange methods this would be update() and for other value changes it's obviously write().
def _onchange_date(self):
    self.update({
        'date': fields.Datetime.now(),
        'drawer_potential': self.drawer_id.product_categ_price * self.drawer_qty,
        'flexible_potential': self.flexible_id.product_categ_price * self.flexible_qty,
        # and so on
    })
I am streaming data from Kafka (batch interval 10 sec), converting the RDD to a PairRDD, and then storing the RDD in the state using mapWithState(). Below is the code:
JavaPairDStream<String, Object> transformedStream = stream
.mapToPair(record -> new Tuple2<>(record.getKey(), record))
.mapWithState(StateSpec.function(updateDataFuncGDM).numPartitions(32)).stateSnapshots();
transformedStream.foreachRDD(rdd -> {
// if flag is true, put the RDD into a SQL table, and run a query to do some aggregations like sum, avg, etc.
// if flag is false, return;
});
Now, I keep updating data in the state, and on a certain event I change the flag to true, put this data in the table, and do my calculations.
The problem here is that, since I am getting "stateSnapshots" in every batch, it's not efficient: mapWithState keeps a lot of data in memory, and as the state grows it will become even worse. Also, since mapWithState checkpoints data after every 10 iterations, it takes a lot of time because the data is very large.
I want to get a stateSnapshot of the state only on demand (i.e. only in the iteration of foreachRDD when the flag is true).
But I didn't find many ways to work with the state.
How can I prevent lag-related bugs in Flash games? For example, suppose a game has a 1-minute countdown timer and the player has to catch as many items as possible.
These are the lag-related issues:
If the items are moving (they don't have a static position), then the more lag the player has, the slower the items move;
The timer counts down more slowly when the player has lag (CPU usage 90-100%).
So, for example, if a player without lag can get 100 points, a player with a slow / bad computer can get 4-6x more, like 400-600.
I think that's because it all runs on the client side, but how do I move it to the server side? Should I insert (and update) the countdown time in a database? But how would I update it every millisecond?
And what about the item positions? If the player has a lot of lag, the items move very slowly, so they are easy to click. Do you have any ideas?
Moving the functionality to the server side doesn't solve the problem.
Now if there are many players connected to the server, the server will lag and give those players more time to react.
To make your logic independent of lag, do not base it on the screen update, because that assumes a constant time between screen updates (or frames).
Instead, make your logic based on the actual time that passed between frames.
Use getTimer to measure how much time passed between the current and the last frame.
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/utils/package.html
Of course, your logic should include calculations for what happens in between frames.
In order to mostly fix speed issues on the client, you need to base all your speed-related code on actual time, not frames.
Here is a fairly typical example of code used to move an object based on frames:
// speed = pixels per frame
var xSpeed:Number = 5;
var ySpeed:Number = 5;
addEventListener(Event.ENTER_FRAME, update);
function update(e:Event):void {
player.x += xSpeed;
player.y += ySpeed;
}
While this code is simple and good enough for a single client, it is very dependent on the frame rate, and as you know the frame rate is very "elastic" and actual frame rate is heavily influenced by the client CPU speed.
Instead, here is an example where the movement is based on actual elapsed time:
// speed = pixels per second
var xSpeed:Number = 5 * stage.frameRate;
var ySpeed:Number = 5 * stage.frameRate;
var lastTime:int = getTimer();
addEventListener(Event.ENTER_FRAME, update);
function update(e:Event):void {
var currentTime:int = getTimer();
var elapsedSeconds:Number = (currentTime - lastTime) / 1000;
player.x += xSpeed * elapsedSeconds;
player.y += ySpeed * elapsedSeconds;
lastTime = currentTime;
}
The crucial part here is that the current time is tracked using getTimer(), and each update moves the player based on the actual elapsed time, not a fixed amount. I set the xSpeed and ySpeed to 5 * stage.frameRate to illustrate how it can be equivalent to the other example, but you don't have to do it that way. The end result is that the second example would have a consistent speed of movement regardless of the actual frame rate.
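The countdown timer from the question can be handled the same way: rather than subtracting a fixed amount on every frame, store the deadline once and derive the remaining time from the clock whenever it is needed. Below is a minimal Python sketch of that idea, not ActionScript. The `Countdown` class and its injectable `clock` parameter are assumptions made for illustration, and `time.monotonic` plays the role of getTimer().

```python
import time


class Countdown:
    """Countdown driven by elapsed wall-clock time, not by frame count."""

    def __init__(self, duration_seconds, clock=time.monotonic):
        self._clock = clock
        # Fix the deadline once; slow frames cannot stretch it.
        self._deadline = clock() + duration_seconds

    def remaining(self):
        # Recomputed from the clock on every call, clamped at zero.
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        return self.remaining() == 0.0
```

A game loop would call `remaining()` once per frame to update the display. A slow client simply sees fewer updates, but the round still ends after the same amount of real time.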