The HashMap source code defines threshold as a package-private (default access) field. Why was it not designed as a private field?
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
Consider the following FlatBuffers schema (from this Stack Overflow question):
table Foo {
    ...
}

table Bar {
    value:[Foo];
}

root_type Bar;
Assume the number of Foos in a typical object is significant, so we want to avoid modifying the schema to make Foo the root_type.
Scenario:
A C++ client serializes a proper FlatBuffers object and posts it to another component (a Node.js backend) that partially deserializes the object and stores the binary representing each Foo in a database as separate documents:
const buf = new flatbuffers.ByteBuffer(req.body)
const bar = fbs.Bar.getRootAsBar(buf)
for (let i = 0; i < bar.valueLength(); i++) {
  const foo = bar.value(i)
  let item = {
    // <-- primary suspect: foo.bb is the ByteBuffer of the whole message,
    // so this stores the bytes of the entire Bar buffer, not a
    // self-contained Foo
    'raw': foo.bb.bytes_
  }
  // ... store `item` as an individual entity (mongodb doc)
}
Later, a third component fetches the binary data stored in the "raw" key of the MongoDB documents and tries to deserialize it into a Foo object:
auto mongoCol = db.collection("results");
auto mongoResult = mongoCol.find_one(
    bsoncxx::builder::stream::document{}
    << "_id" << oid << bsoncxx::builder::stream::finalize);
// ...check that mongoResult is not null
const auto result = mongoResult->view();
const auto& binary = result["raw"].get_binary();
std::string content((const char*)binary.bytes, binary.size);
const auto& foo = flatbuffers::GetRoot<fbs::Foo>(content.c_str());
The problem:
The pointer returned as foo does not point to the expected data, and any operation on foo potentially leads to a segfault or access violation.
Suspicions:
I speculate that the root cause is that the binary stored in the database uses offsets relative to the original message, so it is essentially invalid as a standalone buffer, and the offsets would have to be readjusted before inserting it into the database. But I do not see any FlatBuffers API function to readjust the offsets.
A less likely root cause may be that the final deserialization code is incomplete and we have to readjust the offsets there.
The reason I suspect offsets is that this same code works just fine if we make a compromise and post smaller FlatBuffers objects with one Foo element in every Bar vector (and change the backend code to store bar.bb.bytes in raw instead).
Question:
Is it even possible to grab part of a larger, properly constructed FlatBuffers binary that you know represents your desired table, and deserialize it on its own?
You can't simply copy a sub-table out of a larger FlatBuffer byte-wise, since this data is not necessarily contiguous. The best workaround is to instead make Bar store a [FooBuffer], where table FooBuffer { buf:[byte] (nested_flatbuffer: Foo) }. When you construct one of these, you construct each Foo into its own FlatBufferBuilder and then store the resulting bytes in the parent. When you then need to store Foos separately, this becomes an easy copy.
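A minimal sketch of that workaround, assuming generated C++ names such as CreateFoo, CreateFooBuffer and the buf_nested_root() accessor (check your generated header for the exact names; depending on the flatc version the attribute value may need quoting), with a placeholder id field standing in for Foo's real fields:

// Revised schema: each Foo travels as its own nested buffer.
table Foo { id:int; }
table FooBuffer { buf:[ubyte] (nested_flatbuffer: "Foo"); }
table Bar { value:[FooBuffer]; }
root_type Bar;

// C++ construction: build each Foo in its own builder, then embed its bytes.
flatbuffers::FlatBufferBuilder fooFbb;
fooFbb.Finish(fbs::CreateFoo(fooFbb, 42));

flatbuffers::FlatBufferBuilder barFbb;
auto bytes = barFbb.CreateVector(fooFbb.GetBufferPointer(), fooFbb.GetSize());
auto fooBuf = fbs::CreateFooBuffer(barFbb, bytes);
auto vec = barFbb.CreateVector(&fooBuf, 1);
barFbb.Finish(fbs::CreateBar(barFbb, vec));

Each FooBuffer's buf is now a self-contained FlatBuffer: its bytes can be stored in the database as-is and later read back with flatbuffers::GetRoot<fbs::Foo>(data) or the generated buf_nested_root() accessor.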
Using Microsoft Bond (the C# library in particular), I see that whenever a Bond struct is defined, it looks like this:
struct Name
{
    0: type name;
    5: type name;
    ...
}
What do these numbers (0, 5, ...) mean?
Do they require special treatment in inheritance? (Do I need to make sure that I do not override members with the same number defined in my ancestor?)
The field ordinals are the unique identity of each field. When serializing to tagged binary protocols, these numbers are used to indicate which fields are in the payload. The names of the fields are not used. (Renaming a field in the .bond file does not break serialized binary data compatibility [though, see caveat below about text protocols].) Numbers are smaller than strings, which helps reduce the payload size, but also ends up improving serialization/deserialization time.
You cannot re-use the same field ordinal within the same struct.
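For illustration (a hypothetical schema; the Bond compiler will reject it):

// Illegal: ordinal 0 is used twice within the same struct.
struct Clash
{
    0: string first;
    0: int32 second; // error: duplicate ordinal
}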
There's no special treatment needed when you inherit from a struct (or if you have a struct field inside your struct). Bond keeps the ordinals for the structs separate. Concretely, the following is legal and will work:
namespace inherit_use_same_ordinal;

struct Base {
    0: string field;
}

struct Derived : Base {
    0: bool field;
}
A caveat about text serialization protocols like Simple JSON and Simple XML: these protocols use the field name as the field identifier. So, in these protocols renaming a field breaks serialized data compatibility.
Also, Simple JSON and Simple XML flatten the inheritance hierarchy, so re-using names across Base and Derived will result in clashes. Both have ways to work around this. For Simple XML, the SimpleXml.Settings.UseNamespaces parameter can be set to true to emit fully qualified names.
For Simple JSON, the Bond attribute JsonName can be used to change the name used for Simple JSON serialization, to avoid the conflict:
struct Derived : Base {
    [JsonName("derived_field")]
    0: bool field;
}
I have an OOP design question.
Let's assume that I have a class that contains several numerical scalar properties like maximum, minimum, frequency etc. Since data are flowing in continuously I eventually end up with a list of such class instances. To obtain, say, the global minimum I loop over all classes in the list to find it.
Alternatively, I could instantiate one class (possibly a singleton) that contains lists instead of scalars for each property, and function members that loop over the lists. This approach however seems to generate code that looks more like procedural than object oriented programming.
The question is: what criteria determine which approach to choose? If efficiency is important, should I choose one class that contains lists for each property? If readability is key, should I choose a list of classes?
Thanks for suggestions.
Basically you're asking whether it's preferable to have an "Array of Structures" (AoS) or a "Structure of Arrays" (SoA).
The answer depends on what you need to do with this data. If you want to write more readable code, then go for an Array of Structures; if you want to use SSE or CUDA to optimize your computation-heavy code, then go for a Structure of Arrays.
If you search the literature for the terms "Array of Structures (AoS)" and "Structure of Arrays (SoA)" you will find many in-depth dissertations on this topic; I link just some discussions here (a small sketch of the two layouts follows the links):
Structure of arrays and array of structures - performance difference
http://hectorgon.blogspot.it/2006/08/array-of-structures-vs-structure-of.html
http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf
http://en.wikipedia.org/wiki/Stream_processing
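To make the two layouts concrete, here is a minimal C++ sketch; the property names are taken from the question and are purely illustrative:

#include <vector>

// Array of Structures (AoS): one object per data point; readable, and
// cache-friendly when you touch all properties of one point together.
struct Sample {
    float minimum;
    float maximum;
    float frequency;
};
std::vector<Sample> aos;

// Structure of Arrays (SoA): one contiguous array per property;
// friendlier to SSE/CUDA-style vectorization over a single property.
struct Samples {
    std::vector<float> minimum;
    std::vector<float> maximum;
    std::vector<float> frequency;
};
Samples soa;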
You were asking for decision criteria. Let me recommend one:
You should think about what constitutes a data point in your application. Let's assume you are measuring values, and one data point consists of several numerical properties. Then you would certainly want a list of classes, where the class represents all of the properties that go together (what I called 'data point' for lack of a better term).
If you must perform some aggregation of these 'data points', such as finding a global minimum over a longer time period, I would suggest designing an extra component for this. So you'd end up with a data gathering component which consists mainly of a 'list of classes', and an aggregation component which may utilize different data structures, but processes parts of your 'list of classes' (say, the part over which the global minimum is to be found).
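A rough sketch of that separation, with invented names (DataPoint, globalMinimum) and the aggregation kept outside the gathering component:

#include <algorithm>
#include <vector>

// Data gathering component: one 'data point' per measurement.
struct DataPoint {
    double minimum;
    double maximum;
    double frequency;
};

// Aggregation component: operates on a part of the gathered list
// (here, the half-open, non-empty index range [first, last)).
double globalMinimum(const std::vector<DataPoint>& points,
                     std::size_t first, std::size_t last) {
    double m = points[first].minimum;
    for (std::size_t i = first + 1; i < last; ++i)
        m = std::min(m, points[i].minimum);
    return m;
}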
Basically, OOP is not the solution to every question in programming; sometimes you have to see beyond it and concentrate on the problem itself. Efficiency should be preferred, but if your code takes too long to run, i.e. its time complexity is too high, you will be in trouble again, so you have to keep both ends in hand. I would prefer the list of classes, not the class of lists, though different people have different points of view and we should respect them. I choose the list of classes because each object then carries its own data: say one object has a higher frequency, one a lower one, one a medium one; it is easier to manage them all, and not much time is lost. In both cases a scan is O(n), where n is the number of elements (classes, in my case).
In your case, a list of data points could look like this:

#include <vector>

// One record per data set.
struct Statistic {
    int max;
    int min;
    std::vector<int> points; // the raw values
    int average;
};

int main() {
    std::vector<Statistic> stats; // the 'list of classes' (AoS)
    return 0;
}
You could also store the values and the statistics together in one class and do the calculations on the fly when adding a new value (example in Java):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class YourClass {
    private List<Integer> values = new ArrayList<Integer>();
    private long sum = 0;
    private int minimum = Integer.MAX_VALUE;
    private int maximum = Integer.MIN_VALUE;
    // add more stuff you need

    public synchronized void add(Integer value) {
        values.add(value);
        sum += value;
        if (value < minimum) {
            minimum = value;
        }
        if (value > maximum) {
            maximum = value;
        }
    }

    public List<Integer> getValues() {
        return Collections.unmodifiableList(values);
    }

    public long getSum() {
        return sum;
    }

    public long getAvg() {
        // integer average; adjust to double if fractional precision is needed
        return values.isEmpty() ? 0 : sum / values.size();
    }

    public int getMaximum() {
        return maximum;
    }

    public int getMinimum() {
        return minimum;
    }
}
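Hypothetical usage, to show the on-the-fly statistics:

YourClass stats = new YourClass();
stats.add(3);
stats.add(7);
// minimum = 3, maximum = 7, avg = 5
System.out.println(stats.getMinimum() + " " + stats.getMaximum() + " " + stats.getAvg());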
According to the standard definition, an object is an entity that contains both data and behaviour.
According to my understanding, the data is sent from outside. For example, we have a class that computes the square of a number. We create an instance and send a message, along with the number, to the object to compute the square.
Are we not sending the data from outside?
Why do all the definitions state that the object contains the data?
Thanks
Data, in this context, is state of the object. The definition says that the state/data of object should be internally stored. For example, consider the following class:
class Math {
    double square(double x) {
        return x * x;
    }
    // other similar functions
}
As a language construct, it is a class. But it is not a true class in the object-oriented sense, because it does not have state or data. It is just a function wrapped in a class construct. This is not necessarily wrong; in this case, it happens that you have operations that don't need any state.
What the definition is trying to emphasize is this: you have a real object when it (or its class) has both data and behavior. Not every use of the class construct represents a true object.
Therefore, you have an object if the class representing it satisfies the following three conditions.
The class has state/data. If not, then it is just a bunch of functions; that is not object-oriented, it is procedural.
The class has behavior. If not, then it is just a container, a bunch of variables (like structs in C).
Not only does the class have state/data and behavior/methods, but there is an intrinsic relation between the data and the behavior. Just throwing some variables and functions together does not make a true object: if the class has state and also has some method, but that method does not operate on any of the state, then it is questionable whether that method really belongs to the class.
Below is a simple example of what I think is a proper class (a representation of an object).

class Patient {
    // blood pressure
    double systolic;
    double diastolic;
    double weight;
    int age;

    public Patient(double systolic, double diastolic, double weight, int age) {
        this.systolic = systolic;
        this.diastolic = diastolic;
        this.weight = weight;
        this.age = age;
    }

    public boolean isHealthy() {
        // do some calculations on age, weight and blood pressure indicators
        // and return the result as true or false; the threshold below is
        // an illustrative placeholder
        return systolic < 120 && diastolic < 80;
    }
}
Here, we see that the class has both state and behavior, and that both really belong to this class: they are properties of the concept of a patient. We further see that the operation has an intrinsic relation to the data. You can't decide whether the patient is healthy or not without consulting/using its state.
I think the problem is with your example, which fits badly with an object-oriented design. Computing the square of a number is a memoryless function, so there is obviously no reason to store data inside the object's properties. However, when you have to deal with the management of stateful entities, you will more easily see the importance of classes and object orientation in general.
Your example is a special case where the object doesn't need to hold data (i.e. state). In this case it can be replaced with a function (just the behavior). Most objects need to store data. E.g., an object Person should contain the qualities describing the person, not just possible behavior.
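A minimal Java sketch of that idea (the fields and the adult-age threshold are invented for illustration):

class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Behavior that consults the object's own state.
    boolean isAdult() {
        return age >= 18;
    }
}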
An object is an instance of a class.
The class describes the computation (a, a*a); (2, 4) is an instance of it, an object. So yes, data is sent to the class, and that creates a new object.
I have a slightly peculiar program which deals with cases very similar to this (in C#-like pseudocode):
class CDataSet
{
    int m_nID;
    string m_sTag;
    float m_fValue;

    void PrintData()
    {
        //Blah Blah
    }
};

class CDataItem
{
    int m_nID;
    string m_sTag;
    CDataSet m_refData;
    CDataSet m_refParent;

    void Print()
    {
        if(null == m_refData)
        {
            m_refParent.PrintData();
        }
        else
        {
            m_refData.PrintData();
        }
    }
};
Members m_refData and m_refParent are initialized to null and used as follows:
m_refData -> Used when a new data set is added
m_refParent -> Used to point to an existing data set.
A new data set is added only if the field m_nID doesn't match an existing one.
Currently this code is managing around 500 objects with around 21 fields per object and the format of choice as of now is XML, which at 100k+ lines and 5MB+ is very unwieldy.
I am planning to modify the whole shebang to use ProtoBuf, but currently I'm not sure how I can handle the reference semantics. Any thoughts would be much appreciated.
Out of the box, protocol buffers does not have any reference semantics. You would need to cross-reference them manually, typically using an artificial key. Essentially, on the DTO layer you would add a key to CDataSet (that you simply invent, perhaps just an increasing integer), store the key instead of the item in m_refData/m_refParent, and run a fixup manually during serialization/deserialization. You could also just store the index into the set of CDataSet, but that may make insertion etc. more difficult. Up to you; since this is serialization, you could argue that you won't insert (etc.) outside of the initial population, and hence the raw index is fine and reliable.
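A minimal sketch of that manual approach using protobuf-net attributes; the DTO names and the fixup step are invented for illustration:

using ProtoBuf;
using System.Collections.Generic;
using System.Linq;

[ProtoContract]
class DataSetDto
{
    [ProtoMember(1)] public int Key;   // artificial identity, e.g. an increasing integer
    [ProtoMember(2)] public int Id;
    [ProtoMember(3)] public string Tag;
    [ProtoMember(4)] public float Value;
}

[ProtoContract]
class DataItemDto
{
    [ProtoMember(1)] public int Id;
    [ProtoMember(2)] public string Tag;
    [ProtoMember(3)] public int DataKey;    // stands in for m_refData
    [ProtoMember(4)] public int ParentKey;  // stands in for m_refParent
}

// After deserialization, resolve the keys back into object references:
//   var byKey = dataSets.ToDictionary(d => d.Key);
//   item.m_refData   = byKey[itemDto.DataKey];
//   item.m_refParent = byKey[itemDto.ParentKey];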
This is, however, a very common scenario - so as an implementation-specific feature I've added optional (opt-in) reference tracking to my implementation (protobuf-net), which essentially automates the above under the covers (so you don't need to change your objects or expose the key outside of the binary stream).
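For reference, in protobuf-net v2 that opt-in is configured per member via AsReference; a sketch, to be checked against the version in use:

[ProtoContract]
class CDataItem
{
    [ProtoMember(3, AsReference = true)]
    public CDataSet Data;

    [ProtoMember(4, AsReference = true)]
    public CDataSet Parent;
}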