Object condition in multiple places/repeated code (DRY) - oop

This is a fundamental application design question I’ve struggled with and flip-flopped on for years. We have a legacy webapp that doesn't really have a solid ORM, if that tidbit might influence your answer. To abstract my question, let’s say we have a class Car and a corresponding table in our database named car. Car has a few properties: color, weight, year, maxspeed. These properties directly correspond to columns in the db table.
In our application, we define the car as “classic old” if year is < 1960 and color = black. And in many places within our app knowing whether the car is "classic old" is extremely important (maybe we’re running a very illogical insurance agency which gives steep discounts and other perks to cars which are “classic old”).
All over our application, we do things like:
--list all classic old cars
--give the current user a discount if their car is classic old
--list all classic old cars with max speed > 100 miles per hour
--email the current user if their car is classic old and weighs more than 1000 pounds
What is the best way to go about this? We have a legacy application that does this in some places:
getOldClassicCars()
select * from car where year < 1960 and color = 'black'
and in other places:
cararray = getAllCars();
for each car in cararray
    if car.year < 1960 and car.color = black
        oldcararray.add(car)
The point being that this very important, fundamental piece of our application – is the car classic old – is “hardcoded” as year < 1960 and color = black in many places. Sometimes in SQL, sometimes in application code, etc. Obviously that is not good, but as we’ve refactored things I’m not sure we’re refactoring things the best way we can.

Well, you are stuck with a fundamental conflict between three things:
you can't run your application code on the database
you want to be able to use the database's selection functionality on this criterion
you want the calculation of "classic old" to be defined in a single place (preferably code)
Let's enumerate the solutions.
1: Put the calculation in a sproc and always use the sproc to retrieve cars.
The problem here is that if you create a new car in code, its classic status is undefined, so you haven't really solved the 'not in two places' problem.
2: Get the DB to run your calc via an assembly. For example, you can get MSSQL to run functions from a .NET assembly, which you can also use in your code base to perform the same calculation.
Problem: it's hard work. Plus, essentially it's still in two places: you have to keep the DB up to date and ensure that the table is accessed correctly.
3: Persist the calculated value on the DB, but perform the calc in the code
Problem, if the calculation changes the DB values will be incorrect and need updating.
3 seems to be the best option, as we will know when the calculation changes and be able to take some action to resolve the situation.
However, it might be best, given the fundamental nature of this calculation, to make that 'out of dateness' implicit in the way we structure the code.
Instead of simply persisting car.IsClassic we could add a CarStatusReport object with a datetime property. We then generate a CarStatusReport(2017) which evaluates all the cars at that point in time and saves that data in a separate table.
Our business logic is then no longer, "Is this car a classic?" but "What does the latest CarStatusReport say the status of this car is?"
Your business logic will then reside in a single CarStatusReportGenerator service, and any other logic accessing the IsClassic calculation will be forced to acknowledge the ephemeral nature of the stored info.
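A minimal sketch of the report idea in Python, under stated assumptions: the field names, the in-memory list of cars, and the says_classic helper are all hypothetical, and saving the report to its own table is left out. It only shows the rule living in one function and the result carrying the time it was computed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Car:
    car_id: int
    color: str
    year: int


def is_classic_old(car: Car) -> bool:
    """The single place where the 'classic old' rule is written down."""
    return car.year < 1960 and car.color == "black"


@dataclass(frozen=True)
class CarStatusReport:
    generated_at: datetime       # when the rule was evaluated
    classic_car_ids: frozenset   # ids of cars that were classic at that time

    def says_classic(self, car_id: int) -> bool:
        return car_id in self.classic_car_ids


class CarStatusReportGenerator:
    """Hypothetical service: evaluates every car and snapshots the result."""

    def generate(self, cars, now=None) -> CarStatusReport:
        now = now or datetime.now(timezone.utc)
        ids = frozenset(c.car_id for c in cars if is_classic_old(c))
        return CarStatusReport(generated_at=now, classic_car_ids=ids)


cars = [Car(1, "black", 1955), Car(2, "red", 1955), Car(3, "black", 1990)]
report = CarStatusReportGenerator().generate(cars)
print(report.generated_at, report.says_classic(1), report.says_classic(2))
```

Callers ask the report rather than the car, so the timestamp is always in view when the stored answer is used.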

There is no optimal solution here, but a good start is to move all the business logic into one place. If you can't, then when you write methods or functions that calculate some property (for example isOld()), hide all those inconsistencies under the hood, so that users of the implementation never (conceptually) notice the DRY violation from outside.
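One way to read "hide it under the hood" is to keep the SQL form and the in-memory form of the rule side by side in one module, fed by the same constants. The sketch below assumes a car table with the columns from the question and uses SQLite purely as a stand-in database; the function names are made up for illustration.

```python
import sqlite3

# Rule constants, defined once and reused by both code paths.
CLASSIC_MAX_YEAR = 1960
CLASSIC_COLOR = "black"


def is_classic_old(car: dict) -> bool:
    """In-memory form of the rule."""
    return car["year"] < CLASSIC_MAX_YEAR and car["color"] == CLASSIC_COLOR


def classic_old_where():
    """SQL form of the same rule: a WHERE fragment plus its parameters."""
    return "year < ? AND color = ?", (CLASSIC_MAX_YEAR, CLASSIC_COLOR)


def get_classic_old_cars(conn, extra_sql="", extra_params=()):
    """Callers add their own filters; they never spell out the rule."""
    where, params = classic_old_where()
    sql = f"SELECT id, color, year, maxspeed FROM car WHERE {where} {extra_sql}"
    return conn.execute(sql, params + tuple(extra_params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (id INTEGER PRIMARY KEY, color TEXT, year INTEGER, maxspeed INTEGER)"
)
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?, ?)",
    [(1, "black", 1955, 120), (2, "black", 1958, 80), (3, "red", 1950, 130)],
)

fast_classics = get_classic_old_cars(conn, "AND maxspeed > ?", (100,))
print(fast_classics)
```

The rule still exists in two forms, but they sit next to each other and change together.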

Related

Why does class Money extend Expression in Kent Beck's TDD by Example?

I'm studying TDD by Example and up so far I'm finding it a great book. But there's a point where he tells us to write:
// in class Money:
Expression plus(Money addend) {
return new Money(amount + addend.amount, currency);
}
Which won't build unless we declare:
class Money implements Expression {...
This doesn't really make sense to me. Author created Expression as an interface for Sum and Money has nothing in common with Sum. Later on he adds method reduce() in common to both classes, but reduce in Money merely returns this.
Making Money implement Expression is only the path of least effort for removing the error from the plus() method, but it fills the code with unnecessary information (it will have to implement reduce() because of this decision) and increases entropy.
I haven't given it much of a thought, but wouldn't it be cleaner to do something such as this?
class Money {
Money plus(Money addend) {
return new Money(amount + addend.amount, currency).reduce();
}
}
// edited this, it previously returned Expression
Edit:
In the following chapter, author implements a reduce() method in another class (named Bank), which converts Money in between currencies. I still find this to be a weird solution, Sum and Expression names imply we should have a conversion expression class for this task instead. What the author is likely intending to do is to use recursion to add money in different currencies. Either way, it seems to me that he did some far-ahead planning which seems incompatible with TDD as is presented in the book.
On planning ahead
TDD doesn't prohibit planning ahead. The process is about getting rapid feedback on your plans instead of wasting days (or weeks) creating elaborate plans, only to see them 'not survive contact with reality' (to paraphrase Helmuth von Moltke). It's okay to think ahead.
Still, Kent Beck reveals in chapter 17 that this isn't his first rodeo:
"I have programmed money in production at least three times that I can think of. I have used it as an example in print another half-dozen times. I have programmed it live on stage [...] another fifteen times. I coded another three or four times preparing for writing [...] Then, while I was writing this, I thought of using expression as a metaphor and the design went in a completely different direction than before."
So, if you think that he's cheating: yes, he is. He's open about it, though. I think that the motivation was to present a compelling example. He also writes that it was in part based on early reviews of the book.
On the API
That doesn't explain why the code looks like it does, but there's a reason for that. It's actually a nice API.
Why can't you write something like return new Expression(amount + addend.amount, currency).reduce()?
You can't because the reduce method isn't nullary. It takes arguments. You must supply both a bank (which holds the currency conversion rates) and a destination currency.
Keep in mind the problem being solved by the code. People get this wrong all the time, and I think that Kent Beck (inadvertently) fuelled the confusion by naming the example Money.
The problem isn't to model money, but to model investment portfolios in different currencies. If you have a portfolio of 25.000 USD and 10.000 CHF, reducing it to a single currency hides important details. With a portfolio in multiple currencies you diversify risk. Portfolio owners will want to see more than one view of their portfolios. Sometimes, they want to see the portfolio grouped by currency, and at other times they want to see 'the current total worth' of the portfolio presented in a single currency.
The API in the book enables both views.
The reason the underlying 'metaphor' is called an expression is that the API is just a specialised expression tree. It turns out fairly well, though, because it's lawful. Restricted to the sub-types presented in the book, it informally gives rise to a monoid.
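To make the expression-tree reading concrete, here is a rough Python transliteration of the shape described above. It is a sketch, not the book's code: the class and method names follow the discussion, while the exchange rate and the float arithmetic are illustrative assumptions.

```python
from dataclasses import dataclass


class Expression:
    """Anything that can be reduced to Money in one currency via a Bank."""

    def plus(self, addend: "Expression") -> "Expression":
        # Adding never converts; it only builds a bigger expression tree.
        return Sum(self, addend)

    def reduce(self, bank: "Bank", to: str) -> "Money":
        raise NotImplementedError


@dataclass(frozen=True)
class Money(Expression):
    amount: float
    currency: str

    def reduce(self, bank, to):
        return Money(self.amount / bank.rate(self.currency, to), to)


@dataclass(frozen=True)
class Sum(Expression):
    augend: Expression
    addend: Expression

    def reduce(self, bank, to):
        total = self.augend.reduce(bank, to).amount + self.addend.reduce(bank, to).amount
        return Money(total, to)


class Bank:
    """Holds conversion rates; rate(a, b) is how many units of a buy one unit of b."""

    def __init__(self):
        self._rates = {}

    def add_rate(self, source, to, rate):
        self._rates[(source, to)] = rate

    def rate(self, source, to):
        return 1 if source == to else self._rates[(source, to)]


bank = Bank()
bank.add_rate("CHF", "USD", 2)  # illustrative rate only

portfolio = Money(25000, "USD").plus(Money(10000, "CHF"))
total = portfolio.reduce(bank, "USD")
print(portfolio)  # the unreduced tree still shows each currency
print(total)      # the single-currency view
```

Because plus() returns an unreduced Sum, the per-currency view survives until someone explicitly asks for a reduction with a bank and a target currency.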

Sequence number field with concurrent writes

Imagine a rail track, and my goal is to store every railcar that exists on that track. Each railcar has a position. Say there are 100 railcars on the track, so each railcar object would have a TrackPosition from 1-100.
That is essentially what we are doing right now, with a Track having child Railcars, and each Railcar has an integer TrackPosition.
When a new railcar is added, we simply take the # of cars in the track + 1 to save the position of the new car.
We are running into issues in a few different areas:
We would like to add cars concurrently using AWS Lambda. This presents a problem as two functions could hit the line of logic that calculates "total cars on track + 1" at the same time. When they go to save, both cars would have the same position. Locking that bit of code is not possible within AWS Lambda (as far as I can tell from what I've read). We've resolved this for the time being by setting the Lambdas to fire synchronously (concurrency set to 1), obviously not ideal for performance.
We would like to add a car into the middle of a track. This would involve taking any car with a greater position and incrementing them all. Not difficult to write some code to do this, but..
I'm wondering if I'm missing something fundamental within SQL that can achieve what I am after in a less error-prone way. The way I'm doing it seems naive. I've looked into Sequences, but I'm not sure if they would solve my concurrency issue.
Any insight would be greatly appreciated. We are using Entity Framework Core 2 with SQL Server.
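For reference, the two operations described above can be sketched as single transactions, with the position computed by the database rather than by application code. This is an illustration only: it uses Python's sqlite3 as a stand-in for SQL Server, the table and column names are assumed, and it does not by itself settle the concurrency question, which depends on the isolation level and locking behaviour of the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE railcar ("
    "id INTEGER PRIMARY KEY, track_id INTEGER, name TEXT, track_position INTEGER)"
)


def append_car(conn, track_id, name):
    """Add a car at the end; the next position is computed inside the INSERT."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO railcar (track_id, name, track_position) "
            "SELECT ?, ?, COALESCE(MAX(track_position), 0) + 1 "
            "FROM railcar WHERE track_id = ?",
            (track_id, name, track_id),
        )


def insert_car_at(conn, track_id, name, position):
    """Shift later cars up by one and insert, all in one transaction."""
    with conn:
        conn.execute(
            "UPDATE railcar SET track_position = track_position + 1 "
            "WHERE track_id = ? AND track_position >= ?",
            (track_id, position),
        )
        conn.execute(
            "INSERT INTO railcar (track_id, name, track_position) VALUES (?, ?, ?)",
            (track_id, name, position),
        )


for name in ("A", "B", "C"):
    append_car(conn, 1, name)
insert_car_at(conn, 1, "X", 2)

positions = conn.execute(
    "SELECT name, track_position FROM railcar WHERE track_id = 1 ORDER BY track_position"
).fetchall()
print(positions)
```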

Designing a solution to retrieve and classify content based on given attributes

This is a design problem I am facing. Let's say I have a cars website. Cars have the following attributes with different possible values.
Color: red, green, blue
Size: small, big
Based on those attributes I want to classify between cars for young people, cars for middle aged people and cars for elder people, with the following criteria:
Cars_young: red or green
Cars_middle_age: blue and big
Cars_elder: blue and small
I'll call this criteria target
I have a table cars with columns: id, color and size.
I need to be able to:
a) when retrieving a car by id, tell its target (if it's young, middle age or elder people)
b) be able to query the database to know how many views had cars belonging to each target
Also, as a developer, I must implement it in a way that those criteria are easily changed.
Which is the best way to implement it? Is there a design pattern for it? I can explain two possible solutions I thought about but I don't really like:
1) create a new column in the database table called target, so it's easy to make both a) and b).
Drawbacks: Each time the criteria change I have to update the column target for all cars, and I also have to change the insertNewCar() function.
2) Implement it in the 'Cars' class.
Drawback: Each time criteria changes I have to change query in b) as well as code in 'getCarById' in a).
3) Use TRIGGERS in SQL, but I would like to avoid this solution if possible
I would like to be able have this criteria definition somewhere in the code which can be changed easily, and would also hopefully be used by 'Cars' class. I'm thinking about some singleton or global objects for 'target' which can be injected in some Cars methods.
Anyone can explain a nice solution or send documentation about some post that faces this problem, or a pattern design that solves it?
At first sight, the specification pattern might meet your expectations. Wikipedia gives a nice explanation of how it works; a small teaser below:
OverDueSpecification OverDue = new OverDueSpecification();
NoticeSentSpecification NoticeSent = new NoticeSentSpecification();
InCollectionSpecification InCollection = new InCollectionSpecification();
ISpecification SendToCollection = OverDue.And(NoticeSent).And(InCollection.Not());
InvoiceCollection = Service.GetInvoices();
foreach (Invoice currentInvoice in InvoiceCollection) {
    if (SendToCollection.IsSatisfiedBy(currentInvoice)) {
        currentInvoice.SendToCollection();
    }
}
You can consider combining the specification pattern with observers.
There are also a few other ideas:
extending the specification pattern to SQL generation, WHERE clauses in particular
storing the criteria configuration in the database
criteria versioning: storing the version of the rules used to assign a category together with the category itself
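Applied to the cars question, the specification idea might look roughly like the Python sketch below. The Spec class, the helper functions and the TARGETS table are all hypothetical names; cars are plain dicts with the color and size attributes from the question, and only the in-memory check is shown (no SQL generation).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Spec:
    """A composable predicate over a car (a dict with 'color' and 'size')."""
    test: Callable[[dict], bool]

    def is_satisfied_by(self, car: dict) -> bool:
        return self.test(car)

    def __and__(self, other: "Spec") -> "Spec":
        return Spec(lambda car: self.test(car) and other.test(car))

    def __or__(self, other: "Spec") -> "Spec":
        return Spec(lambda car: self.test(car) or other.test(car))

    def __invert__(self) -> "Spec":
        return Spec(lambda car: not self.test(car))


def color_is(value: str) -> Spec:
    return Spec(lambda car: car["color"] == value)


def size_is(value: str) -> Spec:
    return Spec(lambda car: car["size"] == value)


# The criteria live in exactly one table; changing a target means editing here.
TARGETS = {
    "young": color_is("red") | color_is("green"),
    "middle_age": color_is("blue") & size_is("big"),
    "elder": color_is("blue") & size_is("small"),
}


def target_of(car: dict) -> Optional[str]:
    """Return the first target whose specification the car satisfies."""
    for name, spec in TARGETS.items():
        if spec.is_satisfied_by(car):
            return name
    return None


print(target_of({"color": "blue", "size": "small"}))
```

A Cars class could receive TARGETS (or target_of) by injection instead of reaching for a global, which keeps the criteria swappable in tests.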

Complex derived attributes in Django models

What I want to do is implement submission scoring for a site with users voting on the content, much like in e.g. reddit (see the 'hot' function in http://code.reddit.com/browser/sql/functions.sql). Edit: Ultimately I want to be able to retrieve an arbitrarily filtered list of arbitrary length of submissions ranked according to their score.
My submission model currently keeps track of up and down vote totals. Currently, when a user votes I create and save a related Vote object and then use F() expressions to update the Submission object's voting totals. The problem is that I want to update the score for the submission at the same time, but F() expressions are limited to only simple operations (it's missing support for log(), date_part(), sign() etc.)
From my limited experience with Django I can see 5 options here:
1. extend F() somehow (haven't looked at the code yet) to support the missing SQL functions; this is my preferred option and seems to fit within the Django framework the best
2. define a scoring function (much like reddit's 'hot' function) in my database, and have Django use the value of that function for the value of the score field; as far as I can tell, #2 is not possible
3. wrap my two step voting process in a suitably isolated transaction so that I can calculate the voting totals in Python and then update the Submission's voting totals without fear that another vote against the submission could be added/changed in the meantime; I'm hesitant to take this route because it seems overly complex - what is a "suitably isolated transaction" in this case anyway?
4. use raw SQL; I would prefer to avoid this entirely -- what's the point of an ORM if I have to revert to SQL for such a common use case as this! (Note that this is coming from somebody who loves sprocs, but is using Django for ease of development.)
5. (edit: added this after further discussion) compute the score using an extra select parameter containing a call to my function; this would work but impose unnecessary load on the DB (would be forced to calculate the score for every submission ever made every time the query ran; caching could help here, but it still seems like a bit of a lame workaround)
Before I embark on this mission to extend F() (which I'm not sure is even possible), am I about to reinvent the wheel? Is there a more standard way to do this? It seems like such a common use case and yet in an hour of searching I have yet to find a common solution...
EDIT: There is another option: set the default value of the field in the database script to be an expression containing my function. This is not as flexible as #1, but probably the quickest and cleanest approach to solving the problem (although my initial investigation into extending F() looks promising).
Why can't you just denormalize the score and reconstruct it from the Vote objects every once in a while?
If you can't do that, it is very easy to make a 'property' function that acts as an object attribute for scoring.
@property
def score(self):
    # ... calculate score from Vote objects ...
    return score
I've never used F() on a property like this, but it's Python, so I bet it works.
If you are using django-voting (which I recommend), you can put #3 in the manager's record_vote function since that's how all vote transactions take place.
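As a rough illustration of the property approach, the sketch below computes a reddit-style score in plain Python. It is not a Django model: Submission here is an ordinary dataclass standing in for one, and the two constants are taken as assumptions modelled on reddit's published 'hot' function rather than anything this thread specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from math import log10

# Assumed constants, modelled on reddit's published 'hot' function.
EPOCH_OFFSET_SECONDS = 1134028003
SECONDS_PER_POINT = 45000


def hot(ups: int, downs: int, date: datetime) -> float:
    """Vote difference on a log scale, plus a bonus for newer submissions."""
    s = ups - downs
    order = log10(max(abs(s), 1))
    sign = (s > 0) - (s < 0)  # 1, 0 or -1
    seconds = date.timestamp() - EPOCH_OFFSET_SECONDS
    return round(sign * order + seconds / SECONDS_PER_POINT, 7)


@dataclass
class Submission:
    """Plain stand-in for the model: only the fields the score needs."""
    ups: int
    downs: int
    created: datetime

    @property
    def score(self) -> float:
        return hot(self.ups, self.downs, self.created)


posted = datetime(2010, 1, 1, tzinfo=timezone.utc)
quiet = Submission(ups=1, downs=0, created=posted)
popular = Submission(ups=10, downs=0, created=posted)
print(quiet.score, popular.score)
```

A property like this is evaluated in Python after the rows are loaded, so ranking by it means sorting in application code; ordering or filtering in the database still needs a stored or database-computed value.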

Should we put units of measurements in attribute names? [closed]

I think most of us agree that it's a good idea to use a descriptive name for variables, object attributes, and database columns. If you want to store something's name, you may as well call the attribute Name so people know what to put in it.
Where the unit of measurement isn't immediately apparent, I think you should go a step further and include the unit of measurement in the name. Length_mm, for example, should help remind developers that they'd better convert the length to mm if the user just entered it in inches.
My database administrator, however, just told me that including units of measurement in database column names is “frowned upon”. I think that's just nuts, but perhaps there's some risk DBAs know about that I don't.
Throw me a line, here: should we embed units of measurement in our attribute names? Why? Why not?
If you have a consistent UOM for things, then your DBA's policy is OK.
For example, if timespans are ALWAYS in minutes, etc.
If the UOM could change, then you should store it in another column, alongside the qty.
That said, I tend to side with you on this. Clarity trumps most things, including this. I'd rather see DurationMinutes than Duration and have to guess what the UOM is.
Yes. You should.
The key, as @Charles Bretana pointed out, is legibility: the other users of your table, and the developers who follow you, must know what you're using.
I would absolutely involve the units/measurement in a field name - in my business you can't guess what you'll find from the context or name: a field entitled MarketValue - is that in millions, thousands or units? US Dollars, Euros, pounds, $CURRENCY? Is that value a percentage, a ratio? Absolute or relative? Daily, monthly, calendar year, financial year? That timestamp, what time zone is it?
Your first, last and only task when providing data is to ensure that it isn't used incorrectly because the consumer wasn't able to find out enough about it. As developers, throwing "Metre", "USD", "GMT", "Percent" or whatever into a field name isn't the least bit smelly.
There are enormous smells that need resolving before the tiny whiff of field naming needs standardising.
This is why the Mars Climate Orbiter was lost: one piece of software reported thruster impulse in pound-force seconds while the software consuming it expected newton-seconds.
Although "Never say 'Never' or 'Always'" is, in general, a good rule of thumb, here I will bend my rule and say I think you should "always" make it clear what units a numeric value is in.
The convention of naming all my columns in the format:
{name}_in_{unit}
helped on one project; since I was using SI units, it actually let me infer the column data type and generally simplified my writing style.
length_in_m
speed_in_ms-1
color_in_nm
there were a few exceptions that I handled either with at_time or number_of:
started_at_time
updated_at_time
number_of_rotations
I think this is a good idea anywhere since there is always room for ambiguity.
For example, with the high-performance timer class we use, I keep having to check whether the GetElapsed() method returns seconds, milliseconds or something else. If it were called GetElapsedMilliseconds(), that would save the confusion.
The only downside being if you wanted to change your mind ... but in that case any clients would need to know about the change anyway.
F# has an interesting twist on this, allowing measurement units to be specified in the type system. See this blog post, and another Stack Overflow question, "Are units of measurement unique to F#?"
I've done a lot of database work, and I would not frown upon that at all, nor have I heard of frowning on it.
It's better than extended properties, which are not apparent to the casual developer. It's better than a separate document, because many developers won't read it, and certainly not in great detail. If the units are fixed, then having them in the name sounds like a good idea. If that changes, then when the unit field is added, change the name of the measurement field.
Where the unit of measurement isn't immediately apparent, I think you should go a step further and include the unit of measurement in the name. Length_mm, for example, should help remind developers that they'd better convert the length to mm if the user just entered it in inches.
You could go even a step further (in your code, not in the database) and have a Length type, which takes care of the measurement unit and of possible conversions. This is the approach of the "Quantity" pattern in Martin Fowler's "Analysis Patterns" book.
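A small Python sketch of what such a Length type could look like. It is only loosely in the spirit of the Quantity pattern, not a rendering of Fowler's design: the class layout, the unit table and the method names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Conversion factors to the canonical unit (metres).
_METRES_PER_UNIT = {"m": 1.0, "mm": 0.001, "in": 0.0254, "ft": 0.3048}


@dataclass(frozen=True)
class Length:
    metres: float  # the only representation stored internally

    @classmethod
    def of(cls, value: float, unit: str) -> "Length":
        """Build a Length from a value in any known unit."""
        return cls(value * _METRES_PER_UNIT[unit])

    def to(self, unit: str) -> float:
        """Read the length back out in the requested unit."""
        return self.metres / _METRES_PER_UNIT[unit]

    def __add__(self, other: "Length") -> "Length":
        return Length(self.metres + other.metres)


user_input = Length.of(2, "in")        # the user typed inches
print(round(user_input.to("mm"), 1))   # 50.8
```

The unit now has to be named at the two boundaries (construction and read-out), so a variable called simply length can no longer hide it.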
Do not put units of measurement (or column type) in your database column names.
Many databases have the ability to document or comment columns in some way (in SQL Server it is sp_addextendedproperty); I would suggest that is a more appropriate place.
For Python datetimes, consider using objects from the datetime package. Doing so captures the unit implicitly, to microsecond resolution. There is then no basis for including the unit in the variable name.
If you must use an int or float instead, it is strongly recommended to suffix the unit name abbreviation to the variable name. For example, instead of the variable name diff, use diff_secs for seconds, diff_ms for milliseconds, diff_µs for microseconds, or diff_ns for nanoseconds.
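A short illustration of both cases using the standard datetime module; the variable names and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta

started = datetime(2024, 1, 1, 12, 0, 0)
finished = datetime(2024, 1, 1, 12, 0, 1, 500000)  # 1.5 seconds later

# A timedelta carries its own unit, so the name does not need one.
elapsed = finished - started

# Once the value becomes a bare number, the unit moves into the name.
diff_secs = elapsed.total_seconds()
diff_ms = elapsed / timedelta(milliseconds=1)

print(elapsed, diff_secs, diff_ms)
```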
We don't put units of measurement in column names in our database. We do, however, have a data dictionary document where all of the columns and relationships are described.
The ideal approach is, if possible, to use a type that leaves no ambiguity as to the measurement. For example in .NET rather than saying int periodInSeconds you'd be much better off using TimeSpan period.
The F# language actually has units of measurement as part of the type system so you can declare types in units such as 10<m/s> and 5<s> and even perform calculations on them so something like 10<m/s> * 5<s> would result in 50<m>. See here for more info.
So I'd say if possible use a type that conveys your intention, but if that isn't possible then you should probably encode the measurement into the name. It's better and more obvious than a comment.
You definitely want units of measurement somewhere. I don't know if the column names are a good place or if the schema is better. Ask your database administrator:
Where is the information about units of measure stored?
How can I get access to the units programmatically?
If the answers are "it isn't" or "you can't", complain bitterly---they have no right to deny you your naming convention. Otherwise, all may be happier if you work within the system.
P.S. I really like the support for units of measure that they've put into F#.
I have to say, I hate "descriptive" variable names becoming "incredibly verbose" variable names.
My preferred alternative is to use nothing but the unit-of-measure names in short functions, e.g.:
function velocity(m, s) {
    return m / s;
}
You don't need to say "length_m" because in this context, it's obvious that only lengths are measurable in metres.
Having said that, if I were writing a system where unit-of-measure errors were really dangerous, I'd probably use the type system and define a Length class which always converted itself into a standard unit for any calculation. Maybe even different sub-classes for Feet, Metres etc.
No, the name of the attribute is separate from its unit of measurement.
If you call a variable length_mm then you are tied to mm.
What if you use a 32-bit int to store length_mm? Eventually the length in mm may get larger than 2,147,483,647 (the limit of a signed 32-bit int). You can't switch over to m, because you tied your length variable to length_mm.
I think putting units in your identifiers is a huge design smell. It almost surely means that you chose the wrong language: if units are so important to the project, you'd better be using a language whose type system is capable of representing them.