I have two csv files A and B. A is the master repository. I need to read those files, map the records of B to A and save the mapped records to another file.
The class to hold records is, say Record. The class to hold the matched records is, say, RecordMatch.
class Record
{
string Id;
string Name;
string Address;
string City;
string State;
string Zipcode;
}
class RecordMatch
{
string Aid;
string AName;
string Bid;
string BName;
double NameMatchPercent;
}
The mapping works as follows: for each record of B, the records of A are first filtered by state, city, and then zipcode. The records of A that pass the filter are then compared with the record of B. The comparison is on the name field and uses a fuzzy string-matching algorithm.
The string-matching algorithm returns a match percentage, so the best result out of all the candidate matches has to be selected and saved.
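To make the scenario concrete, here is a minimal sketch of the filter-then-best-match step. All names here (CsvRecord, BestMatch, BestMatchFinder, findBest) are made up for illustration, and the similarity function is only a stand-in for whatever fuzzy string algorithm is actually used; it is assumed to return a percentage between 0 and 100.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical record type; the fields mirror the Record class in the question.
final class CsvRecord {
    final String id, name, city, state, zipcode;

    CsvRecord(String id, String name, String city, String state, String zipcode) {
        this.id = id;
        this.name = name;
        this.city = city;
        this.state = state;
        this.zipcode = zipcode;
    }
}

// Hypothetical result type: the chosen record of A, the record of B, and the score.
final class BestMatch {
    final CsvRecord a, b;
    final double percent;

    BestMatch(CsvRecord a, CsvRecord b, double percent) {
        this.a = a;
        this.b = b;
        this.percent = percent;
    }
}

final class BestMatchFinder {
    // similarity is an assumption: any function returning a match percentage.
    static BestMatch findBest(CsvRecord b, List<CsvRecord> masters,
                              ToDoubleBiFunction<String, String> similarity) {
        BestMatch best = null;
        for (CsvRecord a : masters) {
            // Filter first: only records of A in the same state, city and zipcode are candidates.
            boolean sameArea = a.state.equals(b.state)
                    && a.city.equals(b.city)
                    && a.zipcode.equals(b.zipcode);
            if (!sameArea) {
                continue;
            }
            double percent = similarity.applyAsDouble(a.name, b.name);
            // Keep only the highest-scoring candidate seen so far.
            if (best == null || percent > best.percent) {
                best = new BestMatch(a, b, percent);
            }
        }
        return best; // null when no record of A passes the filter
    }
}
```

Calling findBest once per record of B and collecting the non-null results gives the list that has to be saved.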
Now that I have tried my best to explain the scenario, I will come to the design issue. My initial design was to make a Mapper class, something like this:
class Mapper
{
List<Record> ReadFromFile(File);
List<Record> FilterData(FilterType);
void Save(List<Record>);
RecordMatch MatchRecord(Record A, Record B);
}
But looking at the design, it simply seems to be a class wrapper over some methods. I don't see any OO design in it. I also felt that the matching method (MatchRecord()) belongs more to the Record class than to the Mapper class.
But on another look, I saw the class as implementing something resembling the Repository pattern.
Another option I can think of is to keep the Mapper class and just move the matching method to the Record class, something like this:
class Mapper
{
List<Record> ReadFromFile(File);
List<Record> FilterData(FilterType);
void Save(List<Record>);
}
class Record
{
string id;
string name;
string address;
// other fields;
public RecordMatch Match (Record record)
{
// This record compares its name field with that of the passed Record.
// It returns a RecordMatch specifying the percentage of match.
}
}
Now I am totally confused in this simple scenario. What would ideally be a good OO design in this scenario?
Amusingly enough, I am working on a project almost exactly like this right now.
Easy Answer: OK, first off, it is not the end of the world if a method is in the wrong class for a while! If your classes are all covered with tests, where a function lives is still important, but it can be moved around fluidly as you, the king of your domain, see fit.
If you are not testing this, well, that would be my first suggestion. Many, many people smarter than me have remarked on how TDD and testing can help bring your classes to the best design naturally.
Longer Answer: Rather than looking for patterns to apply to a design, I like to think it through like this: what are the reasons each of your classes has to change? If you separate those reasons from each other (which is one thing TDD can help you do), then you will start to see design patterns naturally emerge from your code.
Here are some reasons to change I could think of in a few passes reading through your question:
The data file changes format/adds columns
You find a better matching algorithm, or: "now we want to filter on cell phone number too"
You are asked to make it match xml/yaml/etc files as well
You are asked to save it in a new format/location
OK, so, if implementing any of those would make you need to add an "if statement" somewhere, then perhaps that is a seam for subclasses implementing a common interface.
Also, let's say you want to save the created file in a new place. That is one reason to change, and should not overlap with you needing to change your merging strategy. If those two parts are in the same class, that class now has two responsibilities, and that violates the single responsibility principle.
So, that is a very brief example. To go further in depth with good OO design, check out the SOLID principles. You can't go wrong with learning those and seeking to apply them with prudence throughout your OO designs.
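As a rough sketch of what those seams could look like, each reason to change gets its own small interface, and the merging code only talks to the interfaces. The names NameSource, NameMatcher, MatchSink and MergeJob are invented for this example, and records are reduced to plain name strings to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Changes when the input format changes (CSV today, XML or YAML tomorrow).
interface NameSource {
    List<String> readNames();
}

// Changes when a better matching algorithm is found.
interface NameMatcher {
    double score(String a, String b);
}

// Changes when the output format or location changes.
interface MatchSink {
    void save(List<String> lines);
}

// Only changes when the merging strategy itself changes.
final class MergeJob {
    private final NameSource master;
    private final NameSource incoming;
    private final NameMatcher matcher;
    private final MatchSink sink;

    MergeJob(NameSource master, NameSource incoming, NameMatcher matcher, MatchSink sink) {
        this.master = master;
        this.incoming = incoming;
        this.matcher = matcher;
        this.sink = sink;
    }

    void run() {
        List<String> masters = master.readNames();
        List<String> out = new ArrayList<>();
        for (String b : incoming.readNames()) {
            String best = null;
            double bestScore = -1;
            for (String a : masters) {
                double s = matcher.score(a, b);
                if (s > bestScore) {
                    bestScore = s;
                    best = a;
                }
            }
            if (best != null) {
                out.add(b + "," + best + "," + bestScore);
            }
        }
        sink.save(out);
    }
}
```

Saving to a new place then means writing another MatchSink, without touching MergeJob or the matcher.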
I gave this a try. There's not so much you can do when it comes to OO principles or design patterns I think, except for maybe using composition for the MatchingAlgorithm (and perhaps Strategy and Template if needed). Here's what I've cooked up:
class Mapper {
    void map(String fileA, String fileB, String fileC) {
        RecordsList a = new RecordsList(fileA);
        RecordsList b = new RecordsList(fileB);
        MatchingRecordsList c = new MatchingRecordsList();
        for (Record rb : b) {
            int highestPerc = -1;
            MatchingRecords matchingRec = null;
            rb.setMatchingAlgorithm(someAlgorithmYouVeDefined);
            for (Record ra : a) {
                int perc = rb.match(ra);
                if (perc > highestPerc) {
                    // remember the best match seen so far
                    highestPerc = perc;
                    matchingRec = new MatchingRecords(ra, rb, perc);
                }
            }
            if (matchingRec != null) {
                c.add(matchingRec);
            }
        }
        c.saveToFile(fileC);
    }
}
class MatchingAlgorithm {
int match(Record b, Record a) {
int result;
// do your magic
return result;
}
}
class Record {
String Id;
String Name;
String Address;
String City;
String State;
String Zipcode;
MatchingAlgorithm alg;
void setMatchingAlgorithm(MatchingAlgorithm alg) {
this.alg = alg;
}
int match(Record r) {
int result; // percentage of match
// do the matching by making use of the algorithm
result = alg.match(this, r);
return result;
}
}
class RecordsList extends ArrayList<Record> {
RecordsList(String file) {
// create the list by reading from the CSV file
}
}
class MatchingRecords {
Record a;
Record b;
int matchingPerc;
MatchingRecords(Record a, Record b, int perc) {
this.a = a;
this.b = b;
this.matchingPerc = perc;
}
}
class MatchingRecordsList {
void add(MatchingRecords mr) {
// add
}
void saveToFile(String file) {
// save to file
}
}
(This is written in Notepad++ so there can be typos etc; also the proposed classes can surely benefit from a little more refactoring but I'll leave that to you if you choose to use this layout.)
Suppose there is a class A that holds many instances of class B, and A has some shared attributes that the B instances need to access. I can simply write it as below, but I want to know whether there is a pattern or some other good way to model this relationship in OOP.
My idea is straightforward:
class A {
protected int shared;
public List<B> bList;
int getShared ()
{
return shared;
}
}
class B {
protected A _a;
B (A a) {
this._a = a;
}
void hello () {
print (this._a.getShared());
}
}
As I am pretty much a novice in OOP, I think there may be some pattern that does this better. Looking forward to your ideas. Thanks.
Your code looks like the Mediator pattern, except that classically the Mediator (class A here) holds a set of different objects and lets them interact with one another without explicit references to each other.
I'm new to C++/CLI and I'm wondering what is "best practice" regarding managed type data members. Declaring as handle:
public ref class A {
public:
A() : myList(gcnew List<int>()) {}
private:
List<int>^ myList;
};
or as a value:
public ref class B {
private:
List<int> myList;
};
Can't seem to find definitive advice on this.
When writing managed C++ code, I'm in favor of following the conventions used by the other managed languages. Therefore, I'd go with handles for class-level data members, and only use values (stack semantics) where you'd use a using statement in C#.
If your class member is a value, then replacing the object entirely means that the object would need a copy constructor defined, and not many .NET classes do. Also, if you want to pass the object to another method, you'll need to use the % operator to convert from List<int> to List<int>^. (Not a big deal to type %, but easy to forget, and the compiler error just says it can't convert List<int> to List<int>^.)
//Example of the `%` operator
void CSharpMethodThatDoesSomethingWithAList(List<int>^ list) { }
List<int> valueList;
CSharpMethodThatDoesSomethingWithAList(%valueList);
List<int>^ handleList = gcnew List<int>();
CSharpMethodThatDoesSomethingWithAList(handleList);
It all depends on the lifetime. When you have a private member which lives exactly as long as the owning class, the second form is preferable.
Personally, I would use the second form. I say this because I use frameworks that are written by other teams of people, and they use this form.
I believe this is because it is cleaner, uses less space, and is easier for the non-author to read. I try to keep in mind that the most concise code that is still readable by someone with minimal knowledge of the project is best.
Also, I have not encountered any problems with the latter example in terms of readability across header files, methods, classes, or data files ...etc
Though I'm FAR from an expert in the matter, that is what I prefer. Makes more sense to me.
class AlgoCompSelector : public TSelector {
public :
AlgoCompSelector( TTree *tree = 0 );
virtual ~AlgoCompSelector(){ /* */ };
virtual void Init(TTree *tree);
virtual void SlaveBegin(TTree *tree);
virtual Bool_t Process(Long64_t entry);
virtual void Terminate();
virtual Int_t Version() const { return 1; }
void setAlgo( Int_t idx, const Char_t *name, TTree* part2, TTree* part3 );
void setPTthres( Float_t val );
void setEthres( Float_t val );
private:
std::string mAlgoName[2]; // use this for the axis labels and/or legend labels.
TTree *mPart1;
TTree *mPart2[2], *mPart3[2]; // pointers to TTrees of the various parts
TBranch *mPhotonBranch[2]; // Used branches
TClonesArray *mPhotonArray[2]; // To point to the array in the tree
for example
I often have a situation where I need to do:
function a1() {
a = getA;
b = getB;
b.doStuff();
.... // do some things
b.send()
return a - b;
}
function a2() {
a = getA;
b = getB;
b.doStuff();
.... // do some things, but different to above
b.send()
return a - b;
}
I feel like I am repeating myself, yet where I have ...., the methods are different, have different signatures, etc..
What do people normally do? Add an if (this type) do this stuff, else do the other stuff that is different? It doesn't seem like a very good solution either.
Polymorphism and possibly abstraction and encapsulation are your friends here.
You should specify better what kind of instructions you have on the .... // do some things part. If you're always using the same information, but doing different things with it, the solution is fairly easy using simple polymorphism. See my first revision of this answer. I'll assume you need different information to do the specific tasks in each case.
You also didn't specify if those functions are in the same class/module or not. If they are not, you can use inheritance to share the common parts and polymorphism to introduce different behavior in the specific part. If they are in the same class you don't need inheritance nor polymorphism.
In different classes
Taking into account you're stating in the question that you might need to make calls to functions with different signature depending on the implementation subclass (for instance, passing a or b as parameter depending on the case), and assuming you need to do something with the intermediate local variables (i.e. a and b) in the specific implementations:
Short version: Polymorphism+Encapsulation: Pass all the possible in & out parameters that every subclass might need to the abstract function. Might be less painful if you encapsulate them in an object.
Long Version
I'd store intermediate state in generic class' member, and pass it to the implementation methods. Alternatively you could grab the State from the implementation methods instead of passing it as an argument. Then, you can make two subclasses of it implementing the doSpecificStuff(State) method, and grabbing the needed parameters from the intermediate state in the superclass. If needed by the superclass, subclasses might also modify state.
(Java specifics next, sorry)
public abstract class Generic {
    private final State state = new State();
    // a and b become fields so that every step can see them
    // (A and B stand for whatever types getA() and getB() return)
    private A a;
    private B b;

    public Object a() {
        preProcess();
        prepareState();
        doSpecificStuff(state);
        clearState();
        return postProcess();
    }
    protected void preProcess() {
        a = getA();
        b = getB();
        b.doStuff();
    }
    protected Object postProcess() {
        b.send();
        return a - b; // as in the question: combine a and b into the result
    }
    protected void prepareState() {
        state.prepareState(a, b);
    }
    private void clearState() {
        state.clear();
    }
    protected abstract void doSpecificStuff(State state);
}
public class Specific extends Generic {
    @Override
    protected void doSpecificStuff(State state) {
        state.getA().doThings();
        state.setB(someCalculation);
    }
}
public class Specific2 extends Generic {
    @Override
    protected void doSpecificStuff(State state) {
        state.getB().doThings();
    }
}
In the same class
Another possibility would be to make the preProcess() method return a State variable, and use it in the implementations of a1() and a2().
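public class MyClass {
    protected State preProcess() {
        A a = getA(); // A and B stand for whatever types getA() and getB() return
        B b = getB();
        b.doStuff();
        return new State(a, b);
    }
    // takes the State, since a and b are no longer visible here otherwise
    protected Object postProcess(State st) {
        st.getB().send();
        Object result = st.getA() - st.getB(); // as in the question: combine a and b
        st.clear();
        return result;
    }
    public Object a1() {
        State st = preProcess();
        st.getA().doThings();
        return postProcess(st);
    }
    public Object a2() {
        State st = preProcess();
        st.getB().doThings();
        return postProcess(st);
    }
}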
Well, don't repeat yourself. My golden rule (which admittedly I break from time to time) is based on the ZOI rule: all code must live exactly zero, one or infinite times. If you see code repeated, you should refactor that into a common ancestor.
That said, it is not possible to give you a definite answer how to refactor your code; there are infinite ways to do this. For example, if a1() and a2() reside in different classes then you can use polymorphism. If they live in the same class, you can create a function that receives an anonymous function as parameter and then a1() and a2() are just wrappers to that function. Using a (shudder) parameter to change the function behavior can be used, too.
You can solve this in one of 2 ways. Both a1 and a2 will call a3. a3 will do the shared code, and:
1. call a function that it receives as a parameter, which does either the middle part of a1 or the middle part of a2 (and they will pass the correct parameter),
- or -
2. receive a flag (e.g. boolean), which will tell it which part it needs to do, and using an if statement will execute the correct code.
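Option 1 might look roughly like this in Java. It is only a sketch: SharedSteps, the log list and the hard-coded values of a and b are placeholders standing in for getA, getB, doStuff and send from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class SharedSteps {
    // Records what happened, in order, so the flow is easy to inspect.
    final List<String> log = new ArrayList<>();

    // a3 holds the shared code; the differing middle part is passed in as a function.
    int a3(Consumer<List<String>> middle) {
        int a = 10; // placeholder for getA
        int b = 4;  // placeholder for getB
        log.add("doStuff");
        middle.accept(log); // the part that differs between a1 and a2
        log.add("send");
        return a - b;
    }

    int a1() {
        return a3(l -> l.add("a1 work"));
    }

    int a2() {
        return a3(l -> l.add("a2 work"));
    }
}
```

If the middle parts need different inputs, each wrapper can capture them in its lambda, so a3 itself never needs a flag or an if statement.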
This screams out loud for the design pattern "Template Method"
The general part is in the super class:
package patterns.templatemethod;
public abstract class AbstractSuper {
public Integer doTheStuff(Integer a, Integer b) {
Integer x = b.intValue() + a.intValue();
Integer y = doSpecificStuff(x);
return b.intValue() * y;
}
protected abstract Integer doSpecificStuff(Integer x);
}
The specific part is in the subclass:
package patterns.templatemethod;
public class ConcreteA extends AbstractSuper {
@Override
protected Integer doSpecificStuff(Integer x) {
return x.intValue() * x.intValue();
}
}
For every specific solution you implement a subclass with the specific behavior.
If you put them all in a Collection, you can iterate over them and always call the common method, and every class does its magic. ;)
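Iterating over a collection of such subclasses could look like the sketch below. It restates the same arithmetic with primitive ints so that it stands alone; the class names AbstractStep, Squaring, Doubling and TemplateRunner are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

abstract class AbstractStep {
    // The template method: the fixed skeleton with one variable step in the middle.
    final int doTheStuff(int a, int b) {
        int x = a + b;
        int y = doSpecificStuff(x);
        return b * y;
    }

    protected abstract int doSpecificStuff(int x);
}

final class Squaring extends AbstractStep {
    @Override
    protected int doSpecificStuff(int x) {
        return x * x;
    }
}

final class Doubling extends AbstractStep {
    @Override
    protected int doSpecificStuff(int x) {
        return 2 * x;
    }
}

final class TemplateRunner {
    // Calls the common method on every step; each subclass supplies its own middle part.
    static List<Integer> runAll(List<AbstractStep> steps, int a, int b) {
        List<Integer> results = new ArrayList<>();
        for (AbstractStep step : steps) {
            results.add(step.doTheStuff(a, b));
        }
        return results;
    }
}
```

With a = 1 and b = 2, the squaring step yields 2 * (3 * 3) = 18 and the doubling step yields 2 * (2 * 3) = 12.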
hope this helps
I am trying to translate a poker game to a correct OOP model.
The basics :
class Hand
{
Card cards[];
}
class Game
{
Hand hands[];
}
I get games and hands from a text file. I parse the text file several times, for several reasons:
get some info (reason 1)
compute some stats (reason 2)
...
For reason 1 I need some attributes (a1, b1) in class Hand. For reason 2, I need some other attributes (a2, b2). I think the dirty way would be :
class Hand
{
Card cards[];
Int a1,b1;
Int a2,b2;
}
It would mean that some attributes are useless most of the time.
So, to be cleaner, we could do:
class Hand
{
Card cards[];
}
class HandForReason1 extends Hand
{
Int a1,b1;
}
But I feel like I'm using a hammer...
My question is: is there an intermediate way? Or is the hammer solution the right one? (In that case, what would be the correct semantics?)
PS : design patterns welcome :-)
PS2 : strategy pattern is the hammer, isn't it?
* EDIT *
Here is an application :
// Parse the file, read game infos (reason 1)
// Hand.a2 is not needed here !
class Parser_Infos
{
Game game;
function Parse()
{
game.hands[0].a1 = ...
}
}
// Later, parse the file and get some statistics (reason 2)
// Hand.a1 is not needed here !
class Parser_Stats
{
Game game;
function Parse()
{
game.hands[0].a2 = ...
}
}
Using a chain of responsibility to recognize a poker hand is what I would do. Since each hand has its own characteristics, you can't just have a generic hand.
Something like
abstract class Hand {
    protected Hand next;
    abstract protected boolean recognizeImpl(Card cards[]);
    public Hand setNext(Hand next) {
        this.next = next;
        return next;
    }
    // Returns the first hand in the chain that recognizes the cards, or null if none does.
    public Hand recognize(Card cards[]) {
        if (recognizeImpl(cards)) {
            return this;
        } else if (next != null) {
            return next.recognize(cards);
        } else {
            return null;
        }
    }
}
And then have your implementation
class FullHouse extends Hand {
protected boolean recognizeImpl(Card cards[]) {
//...
}
}
class Triplet extends Hand {
protected boolean recognizeImpl(Card cards[]) {
//...
}
}
Then build your chain
// chain start with "best" hand first, we want the best hand
// to be treated first, least hand last
Hand handChain = new FullHouse();
handChain
.setNext(new Triplet())
//.setNext(...) /* chain method */
;
//...
Hand bestHand = handChain.recognize(cards);
if (bestHand != null) {
// The given cards correspond best to bestHand
}
Also, with each hand in its own class, you can initialize them and have them hold and compute very specific things. But since you should manipulate Hand classes as much as you can (to stay as OO as possible), you should avoid having to cast your hands to a specific hand class.
** UPDATE **
Alright, so to answer your original question (sig) the class Hand is for manipulating and treating "hands". If you need to calculate other statistics or other needs, wrapping your Hand class might not be a good idea as you'll end up with a compound class, which is not desirable (for maintainability's sake and OOP paradigm).
For reason 1, it is all right to have different kinds of hands, as the chain of responsibility illustrates; you can read your file and create different kinds of hands with as many parameters as required.
For reason 2, you might look at other solutions. One would be to have your Hand classes fire events (ex: when it is recognized) and your application could register those hands into some other class to listen for events. That other class should also be responsible to collect the necessary data from the files you are reading. Since a hand is not (or should not be) responsible to collect statistical data, the bottom line is that you need to have something else handle that.
One package = coherent API and functionalities
One class = coherent functionalities (a hand is a hand, not a statistical container)
One method = a (single) functionality (if a method needs to handle more than one functionality, break those functionalities into separate private methods, and call them from the public method)
I'm giving you a generic answer here because reason 1 and reason 2 are not specific.
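To illustrate the event idea for reason 2, here is a small sketch in which the statistics live in a separate listener rather than in Hand. Everything in it is hypothetical: HandListener, HandParser and HandStats are invented names, and hands are reduced to their names as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Anything interested in recognized hands implements this.
interface HandListener {
    void handRecognized(String handName);
}

// Stand-in for the code that reads the file; it only fires events.
final class HandParser {
    private final List<HandListener> listeners = new ArrayList<>();

    void addListener(HandListener listener) {
        listeners.add(listener);
    }

    void parse(List<String> handNames) {
        for (String name : handNames) {
            for (HandListener listener : listeners) {
                listener.handRecognized(name);
            }
        }
    }
}

// The statistics are kept here, so Hand needs no a2/b2 attributes.
final class HandStats implements HandListener {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void handRecognized(String handName) {
        counts.merge(handName, 1, Integer::sum);
    }

    int countOf(String handName) {
        return counts.getOrDefault(handName, 0);
    }
}
```

A second listener could collect the reason 1 information in the same way, and neither would add fields to Hand.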
I am designing a class that stores (caches) a set of data. I want to lookup a value, if the class contains the value then use it and modify a property of the class. I am concerned about the design of the public interface.
Here is how the class is going to be used:
ClassItem *pClassItem = myClass.Lookup(value);
if (pClassItem)
{ // item is found in class so modify and use it
pClassItem->SetAttribute(something);
... // use myClass
}
else
{ // value doesn't exist in the class so add it
myClass.Add(value, something);
}
However I don't want to have to expose ClassItem to this client (ClassItem is an implementation detail of MyClass).
To get round that the following could be considered:
bool found = myClass.Lookup(value);
if (found)
{ // item is found in class so modify and use it
myClass.ModifyAttribute(value, something);
... // use myClass
}
else
{ // value doesn't exist in the class so add it
myClass.Add(value, something);
}
However this is inefficient as Modify will have to do the lookup again. This would suggest a lookupAndModify type of method:
bool found = myClass.LookupAndModify(value, something);
if (found)
{ // item is found in class
... // use myClass
}
else
{ // value doesn't exist in the class so add it
myClass.Add(value, something);
}
But rolling LookupAndModify into one method seems like very poor design. It also only modifies if value is found and so the name is not only cumbersome but misleading as well.
Is there another better design that gets round this issue? Any design patterns for this (I couldn't find anything through google)?
Actually std::set<>::insert() does precisely this. It returns a pair of an iterator and a bool. If the value already exists, the iterator points to the existing item and the bool is false. Otherwise, the iterator points to the newly inserted item and the bool is true.
It is likely that you are using a similar data structure for fast lookups anyway, so a clean public interface (calling site) will be:
myClass.SetAttribute(value, something)
which always does the right thing. MyClass handles the internal plumbing and clients don't worry about whether the value exists.
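A minimal sketch of such an interface, written in Java with a HashMap as the assumed internal structure (AttributeCache and its String key and attribute types are placeholders, not the asker's real types):

```java
import java.util.HashMap;
import java.util.Map;

final class AttributeCache {
    // Internal storage stays hidden; clients never see the stored items.
    private final Map<String, String> items = new HashMap<>();

    // Adds the value if it is missing, otherwise updates its attribute.
    // Returns true when the value was already present (attributes are assumed non-null).
    boolean setAttribute(String value, String attribute) {
        return items.put(value, attribute) != null;
    }

    String attributeOf(String value) {
        return items.get(value);
    }
}
```

The single put call does one lookup for both the add and the modify case, which avoids the double lookup the question worries about.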
Two things.
The first solution is close.
Don't, however, return ClassItem *. Return an "opaque object": an integer index or other hash code that's opaque (meaningless) to the client, but usable by the myClass instance.
Then lookup returns an index, which modify can subsequently use.
void *index = myClass.lookup( value );
if( index ) {
myClass.modify( index, value );
}
else {
myClass.add( value );
}
After writing the "primitive" Lookup, Modify and Add, then write your own composite operations built around these primitives.
Write a LookupAndModify, TryModify, AddIfNotExists and other methods built from your lower-level pieces.
This assumes that you're setting value to the same "something" in both the Modify and Add cases:
if (!myClass.AddIfNotExists(value, something)) {
// use myClass
}
Otherwise:
if (myClass.TryModify(value, something)) {
// use myClass
} else {
myClass.Add(value, otherSomething);
}
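For what it's worth, the two composite operations used above could be sketched like this in Java, assuming a HashMap as the backing store. ItemStore and the String types are placeholders, and stored values are assumed to be non-null.

```java
import java.util.HashMap;
import java.util.Map;

final class ItemStore {
    private final Map<String, String> items = new HashMap<>();

    // Modifies only when the value already exists; returns whether it did.
    boolean tryModify(String value, String something) {
        return items.replace(value, something) != null;
    }

    // Adds only when the value is missing; returns true if it was added.
    boolean addIfNotExists(String value, String something) {
        return items.putIfAbsent(value, something) == null;
    }

    String get(String value) {
        return items.get(value);
    }
}
```

Each method does a single lookup, and the boolean result tells the caller which branch was taken.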