Using a Skewed Associative Cache in Gem5 - gem5

I am trying to learn how to implement an L2 cache with a skewed associativity.
I see there is already implemented classes for skewed associativity (skewed_associative.cc/hh) under
/gem5/src/mem/cache/tags/indexing_policies/
and i would like to use this to start off with.
What I do not understand is how to access those files and specify a given cache to act as a skewed associative. Do I need to create an entire new cache class that inherits from the basecache? Or is there a way to do it through implementing a new tag?
I'm still trying to figure out how gem5 has anything structured/works, so any amount of help or links would be greatly appreciated.

The above comments helped lead to a solution.
When instantiating a given cache, assign its tags' indexing_policy as SkewedAssociative()
in configs/common/CacheConfig.py:
Instead of having:
system.l2 = l2_cache_class(clk_domain=system.cpu_clk_domain, **_get_cache_opts('l2', options))
you'd have
system.l2 = l2_cache_class(clk_domain=system.cpu_clk_domain, tags=BaseSetAssoc(indexing_policy=SkewedAssociative()), **_get_cache_opts('l2', options))

Related

Optimize "1D" bin packing/sheet cutting

Our use case could be described as a variant of 1D bin packing or sheet cutting.
Imagine a drywall with a beam framing.
We want to optimize the number and size of gypsum boards that would be needed to cover the wall.
Boards must start and end on a beam.
Boards must not overlap (hard constraint).
Less (i.e. bigger) boards, the better (soft constraint).
What we currently do:
Pre-generate all possible boards and pass them as problem facts.
Let the solver pick the best subset of those (nullable planning variable).
First Fit Decreasing + Simulated Annealing
Even relatively small walls (~6m, less than 20 possible boards to pick from) take sometimes minutes and while we mostly get a feasible solution, it's rarely optimal.
Is there a better way to model that?
EDIT
Our current domain model looks like the following. Please note that the planning entity only holds the selected/picked material but nothing else. I.e. currently our planning entities are all equal, which kind of prevents any optimization that depends on planning entity difficulty.
data class Assignment(
#PlanningId
private val id: Long? = null,
#PlanningVariable(
valueRangeProviderRefs = ["materials"],
strengthComparatorClass = MaterialStrengthComparator::class,
nullable = true
)
var material: Material? = null
)
data class Material(
val start: Double,
val stop: Double,
)
Active (sub)pillar change and swap move selectors. See optaplanner docs section about move selectors (move neighorhoods). The default moves (single swap and single change) are probably getting stuck in local optima (and even though SA helps them escape those, those escapes are probably not efficient enough).
That should help, but a custom move to swap two subpillars of the almost the same size, might improve efficiency further.
Also, as you're using SA (Simulated Annealing), know that SA is parameter sensitive. Use optaplanner-benchmark to try multiple SA starting temp parameters with different dataset set sizes. Also compare it to a plain LA (Late Acceptance) in benchmarks too. LA isn't fickle like SA can be. (With fickle I don't mean unstable. I mean potential dataset size sensitive parameter tweaking.)

wxDataViewListCtrl is slow with 100k items from another thread

The requirements:
100k lines
One of the columns is not text - its custom painted with wxDC*.
The items addition is coming from another thread using wxThreadEvent.
Up until now I used wxDataViewListCtrl, but it takes too long to AppendItem 100 thousand time.
wxListCtrl (in virtual mode) does not have the ability to use wxDC* - please correct me if I am wrong.
The only thing I can think of is using wxDataViewCtrl + wxDataViewModel. But I can't understand how to add items.
I looked at the samples (https://github.com/wxWidgets/wxWidgets/tree/WX_3_0_BRANCH/samples/dataview), too complex for me.
I cant understand them.
I looked at the wiki (https://wiki.wxwidgets.org/WxDataViewCtrl), also too complex for me.
Can somebody please provide a very simple example of a wxDataViewCtrl + wxDataViewModel with one string column and one wxDC* column.
Thanks in advance.
P.S.
Per #HajoKirchhoff's request in the comments, I am posting some code:
// This is called from Rust 100k times.
extern "C" void Add_line_to_data_view_list_control(unsigned int index,
const char* date,
const char* sha1) {
wxThreadEvent evt(wxEVT_THREAD, 44);
evt.SetPayload(ViewListLine{index, std::string(date), std::string(sha1)});
wxQueueEvent(g_this, evt.Clone());
}
void TreeWidget::Add_line_to_data_view_list_control(wxThreadEvent& event) {
ViewListLine view_list_line = event.GetPayload<ViewListLine>();
wxVector<wxVariant> item;
item.push_back(wxVariant(static_cast<int>(view_list_line.index)));
item.push_back(wxVariant(view_list_line.date));
item.push_back(wxVariant(view_list_line.sha1));
AppendItem(item);
}
Appending 100k items to a control will always be slow. That's because it requires moving 100k items from your storage to the controls storage. A much better way for this amount of data is to have a "virtual" list control or wxGrid. In both cases the data is not actually transferred to the control. Instead when painting occurs, a callback function will transfer only the data required to paint. So for a 100k list you will only have "activity" for the 20-30 lines that are visible.
With wxListCtrl see https://docs.wxwidgets.org/3.0/classwx_list_ctrl.html, specify the wxLC_VIRTUAL flag, call SetItemCount and then provide/override
OnGetItemText
OnGetItemImage
OnGetItemColumnImage
Downside: You can only draw items contained in a wxImageList, since the OnGetItemImage return indizes into the list. So you cannot draw arbitrary items using a wxDC. Since the human eye will be overwhelmed with 100k different images anyway, this is usually acceptable. You may have to provide 20/30 different images beforehand, but you'll have a fast, flexible list.
That said, it is possible to override the OnPaint function and use that wxDC to draw anything in the list. But that'll get difficult pretty soon.
So an alternative would be to use wxGrid, create a wxGridTableBase derived class that acts as a bridge between the grid and your actual 100k data and create wxGridCellRenderer derived classes to render the actual data onscreen. The wxGridCellRenderer class will get a wxDC. This will give you more flexibility but is also much more complex than using a virtual wxListCtrl.
The full example of doing what you want will inevitably be relatively complex. But if you decompose in simple parts, it's really not that difficult: you do need to define a custom model, but if your list is flat, this basically just means returning the value of the item at the N-th position, as you can trivially implement all model methods related to the tree structure. An example of such a model, although with multiple columns can be found in the sample, so you just need to simplify it to a one (or two) column version.
Next, you are going to need a custom renderer too, but this is not difficult neither and, again, there is an example of this in the sample too.
If you have any concrete questions, you should ask them, but it's going to be difficult to do much better than what the sample shows and it does already show exactly what you want to do.
Thank you every one who replied!
#Vz.'s words "If you have any concrete questions, you should ask them" got me thinking and I took another look at the samples of wxWidgets. The full code can be found here. Look at the following classes:
TreeDataViewModel
TreeWidget
TreeCustomRenderer

Does ordering of mesh element change from run to run for constrained triangulation under CGAL?

I iterate over finie_vertieces, finite_edges and finite_faces after generating constrained delauny triangulation with Loyd optimization. I am on VS2012 using CGAL 4.12 under release mode. I see for a given case finite_verices list is repeatable (so is the vertex list under finite_faces), however, the ordering of the edges in finite_edges seems to change from run to run
for(auto eit = cdtp.finite_edges_begin(); eit != cdtp.finite_edges_end(); ++eit)
{
const auto isConstrainedEdge = cdtp.is_constrained(*eit);
auto & cFace = *(eit->first);
auto cwVert = cFace.vertex(cFace.cw(eit->second));
auto ccwVert = cFace.vertex(cFace.ccw(eit->second));
I use the above code snippet to extract vertex list, and vertex list with a given edge changes from run to run.
Any help is appreciated resolving this, as I am looking for consistent behavior in the code. My triangulation involves many line constraints on a two dimensional domain.
I was told it's likely dependable behaviour, but there is no guarantee of order. IIRC the documentation says the traversal order is not guaranteed. I think it's best to assume the iterators' transversal is not deterministic and could change.
You could use any of the _info extensions to embed information into the face, edge, etc (a hash perhaps?) which you could then check against to detect a change.
In my use case, I wanted to traverse the mesh in parallel and OpenMP didn't support the iterators. So I hold a vector of the Face_handles in memory which I can then easily index over. In conjunction with the _info data, you could use this to build a vector of edges,faces, etc with a guaranteed order using unique information in the ->info() field.
Another _info example.

How to add a new syntax element in HM (HEVC test Model)

I've been working on the HM reference software for a while, to improve something in the intra prediction part. Now a new intra prediction algorithm is added to the code and I let the encoder choose between my algorithm and the default algorithm of HM (according to the RDCost of course).
What I need now, is to signal a flag for each PU, so that the decoder will be able to perform the same algorithm as the encoder decides in the rate distortion loop.
I want to know what exactly should I do to properly add this one bit flag to the stream, without breaking anything in the code.
Assuming that I want to use a CABAC context model to keep the track of my flag's statistics, what else should I do:
adding a new context model like ContextModel3DBuffer m_cCUIntraAlgorithmSCModel to the TEncSbac.h file.
properly initializing the model (both at encoder and decoder side) by looking at how the HM initialezes other context models.
calling the function m_pcBinIf->encodeBin(myFlag, cCUIntraAlgorithmSCModel) and m_pcTDecBinIfdecodeBin(myFlag, cCUIntraAlgorithmSCModel) at the encoder side and decoder side, respectively.
I take these three steps but apparently it breaks something.
PS: Even an equiprobable signaling (i.e. without using CABAC contexts) will be useful. I just want to send this flag peacefully!
Thanks in advance.
I could solve this problem finally. It was a bug in the CABAC context initialization.
But I want to share this experience as many people may want to do the same thing.
The three steps that I explained are essentially necessary to add a new syntax element, but one might be very careful with the followings:
In the beginning, you need to decide either you want to use a separate context model for your syntax element? Or you want to use an existing one? In case of CABAC separation, you should define a ContextModel3DBuffer and the best way to do that is: finding a similar syntax element in the code; then duplicating its ``ContextModel3DBuffer'' definition and ALL of its occurences in the code. This way assures that you are considering everything.
Encoding of each syntax elements happens in two different places: first, in the RDO loop to make a "decision", and second, during the actual encoding phase and when the decisions are being encoded (e.g. encodeCtu function).
The order of encoding/decoding syntaxt elements should be the same at the encoder/decoder sides. For example if your new syntax element is encoded after splitFlag and before predMode at the encoder side, you should decode it exactly between splitFlag and predMode at the decoder side.
The context model is implemented as a 3D matrix in order to let track the statistics of syntaxt elements separately for different block sizes, componenets etc. This means that when you want to call the function encodeBin, you may make sure that a correct index is being used. I've made stupid mistakes in this part!
Apart from the above remarks, I found a the function getState very useful for debugging. This function returns the state of your CABAC context model in an arbitrary place of the code when you have access to it. It is very useful to compare the state at the same place of the encoder and the decoder when you have a mismatch. For example, it happens a lot that you encode a 1 but you decode a 0. In this case, you need to check the state of your CABAC context before encoding and decoding. They should be the same. If they are not the same, track back the error to find the first place of mismatch.
I hope it was helpful.

Implementing Sticky Threads In Django

Note: I'm new to both django and databases, so please excuse my ignorance.
I'm trying to implement a forum in django and wish to have sticky threads. The naive way that I was thinking of do this was to define the Thread model like this:
class Thread(models.Model):
title = models.CharField(max_length=max_title_length)
author = models.ForeignKey(Player, related_name="nonsticky_threads")
post_date = models.DateField()
parent = models.ForeignKey(Subsection, related_name="nonsticky_threads")
closed = models.BooleanField()
sticky = models.BooleanField()
and then to get the sticky threads, do something like this:
sticky_threads = Thread.objects.all().filter(sticky=True)
The problem is that at least theoretically this has O(n) complexity, which sounds bad. (Since sticky threads are always displayed on the first page, this query will be run fairly frequently) However, I don't know how database/django cleverness will affect the final performance or if it will still be bad.
My current alternative is to also create distinct Thread and Sticky_Thread classes:
class Thread(models.Model):
title = models.CharField(max_length=max_title_length)
author = models.ForeignKey(Player, related_name="nonsticky_threads")
post_date = models.DateField()
parent = models.ForeignKey(Subsection, related_name="nonsticky_threads")
closed = models.BooleanField()
class Sticky_Thread(models.Model):
title = models.CharField(max_length=max_title_length)
author = models.ForeignKey(Player, related_name="sticky_threads")
post_date = models.DateField()
parent = models.ForeignKey(Subsection, related_name="sticky_threads")
closed = models.BooleanField()
letting me grab the sticky threads in O(1) time no matter what. What I don't like about this approach is that now if I want to just get all of a player's threads, I have to implement a special threads property like this:
class Player(models.Model):
[snip]
#property
def threads(self):
return self.sticky_threads | self.nonsticky_threads
and this approach feels ugly.
Is there an obviously best way to imeplement something like this? Do I just need to do timings to see if the naive way is acceptable? (I'm implementing this as a learning exercise, so I don't really have hard limits, making this check a little difficult) (If so, how would you recommend I do that? (IS something like timeit the bast way?) Is there a better alternative?
Thanks!
Your analysis of the complexity of those two operations is way off. It's simply not true to classify the filter operation as O(n) and the two separate classes as O(1) - I don't know what you're using to make that distinction. Databases are highly optimized for selecting on individual criteria: an index on the sticky column will make the filter query almost exactly the same as querying for everything from a separate table.
The first way is without question the right way to go about this, as long as you ensure that your sticky column is indexed.