Designing a solution to retrieve and classify content based on given attributes - oop

This is a design problem I am facing. Let's say I have a cars website. Cars have the following attributes with different possible values.
Color: red, green, blue
Size: small, big
Based on those attributes I want to classify between cars for young people, cars for middle aged people and cars for elder people, with the following criteria:
Cars_young: red or green
Cars_middle_age: blue and big
Cars_elder: blue and small
I'll call these criteria the target.
I have a table cars with columns: id, color and size.
I need to be able to:
a) when retrieving a car by id, tell its target (if it's young, middle age or elder people)
b) query the database to find out how many views the cars belonging to each target have had
Also, as a developer, I must implement it in a way that those criteria are easily changed.
What is the best way to implement it? Is there a design pattern for it? I can describe three possible solutions I have thought about, but I don't really like any of them:
1) create a new column in the database table called target, so it's easy to make both a) and b).
Drawbacks: Each time the criteria change I have to update the target column for all cars, and I also have to change the insertNewCar() function.
2) Implement it in the 'Cars' class.
Drawback: Each time the criteria change I have to change the query in b) as well as the code in 'getCarById' in a).
3) Use TRIGGERS in SQL, but I would like to avoid this solution if possible
I would like to have this criteria definition somewhere in the code where it can be changed easily, and ideally it would also be used by the 'Cars' class. I'm thinking about some singleton or global objects for 'target' which can be injected into some Cars methods.
Can anyone explain a nice solution, point me to documentation or a post that addresses this problem, or name a design pattern that solves it?

At first sight, the specification pattern might meet your expectations. Wikipedia gives a nice explanation of how it works; a small teaser below:
OverDueSpecification OverDue = new OverDueSpecification();
NoticeSentSpecification NoticeSent = new NoticeSentSpecification();
InCollectionSpecification InCollection = new InCollectionSpecification();
ISpecification SendToCollection = OverDue.And(NoticeSent).And(InCollection.Not());
InvoiceCollection = Service.GetInvoices();
foreach (Invoice currentInvoice in InvoiceCollection) {
    if (SendToCollection.IsSatisfiedBy(currentInvoice)) {
        currentInvoice.SendToCollection();
    }
}
You can consider combining the specification pattern with observers.
There are also a few other ideas:
extending the specification pattern to SQL generation, WHERE clauses in particular
storing the criteria configuration in the database
criteria versioning: storing the version of the rules used to assign a category together with the category itself
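As a rough illustration of the specification idea applied to the cars example from the question, here is a minimal Python sketch. Every name in it (Spec, attr_eq, TARGETS, target_of, views_per_target) is hypothetical, the views column is an assumption (the question only lists id, color and size), and rendering each rule as a WHERE fragment is just one possible reading of the "SQL generation" idea above. The point is that each target is defined once and the same definition serves both requirement a) and requirement b).

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Spec:
    """A rule that can test a car dict and render itself as a SQL condition."""
    test: Callable[[dict], bool]
    sql: str
    params: Tuple = ()

    def is_satisfied_by(self, car: dict) -> bool:
        return self.test(car)

    def __and__(self, other: "Spec") -> "Spec":
        return Spec(lambda c: self.test(c) and other.test(c),
                    f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other: "Spec") -> "Spec":
        return Spec(lambda c: self.test(c) or other.test(c),
                    f"({self.sql} OR {other.sql})", self.params + other.params)


def attr_eq(column: str, value: str) -> Spec:
    # Column names come from this module, never from user input;
    # values are always passed as bind parameters.
    return Spec(lambda c: c[column] == value, f"{column} = ?", (value,))


# The only place where the targets are defined.
TARGETS = {
    "young": attr_eq("color", "red") | attr_eq("color", "green"),
    "middle_age": attr_eq("color", "blue") & attr_eq("size", "big"),
    "elder": attr_eq("color", "blue") & attr_eq("size", "small"),
}


def target_of(car: dict):
    """(a) Classify a single car that has already been loaded."""
    return next((name for name, spec in TARGETS.items()
                 if spec.is_satisfied_by(car)), None)


def views_per_target(conn: sqlite3.Connection) -> dict:
    """(b) Let the database aggregate, reusing the same definitions."""
    result = {}
    for name, spec in TARGETS.items():
        row = conn.execute(
            f"SELECT COALESCE(SUM(views), 0) FROM cars WHERE {spec.sql}",
            spec.params).fetchone()
        result[name] = row[0]
    return result
```

Changing a criterion then means editing one entry in TARGETS; no stored column has to be recomputed, at the cost of evaluating the rule on every read.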

Related

Object condition in multiple places/repeated code (DRY)

This is a fundamental application design question I’ve struggled with and flip-flopped on for years. We have a legacy webapp that doesn't really have a solid ORM, if that tidbit might influence your answer. To abstract my question, let’s say we have a class Car and a corresponding table in our database named car. Car has a few properties: color, weight, year, maxspeed. These properties correspond directly to columns in the db table.
In our application, we define the car as “classic old” if year is < 1960 and color = black. And in many places within our app knowing whether the car is "classic old" is extremely important (maybe we’re running a very illogical insurance agency which gives steep discounts and other perks to cars which are “classic old”).
All over our application, we do things like:
--list all classic old cars
--give the current user a discount if their car is classic old
--list all classic old cars with max speed > 100 miles per hour
--email the current user if their car is classic old and weighs more than 1000 pounds
What is the best way to go about this? We have a legacy application that does this in some places:
getOldClassicCars()
    select * from car where year < 1960 and color = 'black'
and in other places:
cararray = getAllCars();
for each car in cararray
    if car.year < 1960 and car.color = 'black'
        oldcararray.add(car)
The point being that this very important, fundamental piece of our application – is the car classic old – is “hardcoded” as year < 1960 and color = black in many places. Sometimes in SQL, sometimes in application code, etc. Obviously that is not good, but as we’ve refactored things I’m not sure we’re refactoring things the best way we can.
Well, you are stuck with the fundamental problem that:
you can't run your code on the database
you want to be able to use the database's selection functionality on these criteria
you want the calculation of "classic old" to be defined in a single place (preferably code)
Let's enumerate the solutions.
1: Put the calculation in a sproc and always use the sproc to retrieve cars.
The problem here is if you create a new car in code, its class status is undefined, so you haven't really solved the 'not in two places' problem.
2: Get the DB to run your calc via an assembly. For example, you can get MSSQL to run functions from a .NET assembly, which you can also use in your code base to perform the same calculation.
Problem: it's hard work. Plus, it's essentially still in two places: you have to keep the DB up to date and ensure that the table is accessed correctly.
3: Persist the calculated value in the DB, but perform the calc in the code.
Problem: if the calculation changes, the DB values will be incorrect and need updating.
3 seems to be the best option, as we will know when the calculation changes and be able to take some action to resolve the situation.
However, it might be best, given the fundamental nature of this calculation, to make that 'out of dateness' implicit in the way we structure the code.
Instead of simply persisting car.IsClassic we could add a CarStatusReport object with a datetime property. We then generate a CarStatusReport(2017) which evaluates all the cars at that point in time and saves that data in a separate table.
Our business logic is then no longer, "Is this car a classic?" but "What does the latest CarStatusReport say the status of this car is?"
Your business logic will then reside in a single CarStatusReportGenerator service, and any other logic accessing the IsClassic calculation will be forced to acknowledge the ephemeral nature of the stored info.
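To make the report idea a little more concrete, here is a minimal Python sketch. Only the names CarStatusReport and CarStatusReportGenerator come from the text above; the Car fields, is_classic_old, classic_car_ids and the in-memory dictionary are assumptions for illustration, and a real system would presumably write the report to its own table rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable


@dataclass(frozen=True)
class Car:
    id: int
    year: int
    color: str


def is_classic_old(car: Car) -> bool:
    """The single place where the rule from the question is written down."""
    return car.year < 1960 and car.color == "black"


@dataclass(frozen=True)
class CarStatusReport:
    """A snapshot: which cars were classic at the moment it was generated."""
    generated_at: datetime
    is_classic: Dict[int, bool] = field(default_factory=dict)


class CarStatusReportGenerator:
    """Evaluates every car once and records when the evaluation happened."""

    def generate(self, cars: Iterable[Car]) -> CarStatusReport:
        statuses = {car.id: is_classic_old(car) for car in cars}
        return CarStatusReport(datetime.now(timezone.utc), statuses)


def classic_car_ids(report: CarStatusReport) -> list:
    # Callers read the stored snapshot, not the live rule, so they have to
    # decide explicitly which report (and therefore which date) they trust.
    return sorted(cid for cid, classic in report.is_classic.items() if classic)
```

If the rule changes, a new report is generated and older reports remain as a record of what the rule said at the time.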
No optimal solution here. But, one good point will be to move all the business logic into the one place. If you can't (when you make methods or functions calculating some property, for example isOld()) then hide all those inconsistencies under the hood, so implementation users (conceptually) will never notice DRY violation from outside.

how to represent multichannel event sequences

I'm trying to use TraMineR but am open to feedback/references/links to more info as to how to represent multi-channel or hierarchical event sequences and algorithms that deal with it.
I have a complex event structure that I'm trying to figure out how to represent as a sequence. There are different types of events. Each event type may have a different set of fields (and different numbers of fields). For instance, age might be a field in one event type whereas height might be a field in another event type. My first instinct (and I believe a common approach) was to “flatten” everything, e.g. every possible combination of values for an event constitutes a unique event type. However, this may miss patterns in the generic event types.
For example, let's say I'm a dog breeder and drink a lot of coffee and I want to see if there are patterns in my coffee/dog buying habits (yes, silly example). I might have events like:
- Bought dog
- Breed: hound
- Sex: female
- Bought coffee
- Store: Starbucks
- Roast: dark
- Bought dog
- Breed: hound
- Sex: female
- Bought coffee
- Store: Starbucks
- Roast: light
- Bought dog
- Breed: Doberman pinscher
- Sex: male
To flatten the data I may say that every unique combination of store and roast is a unique coffee buying event. Also, every unique combination of breed and sex is a unique dog buying event. This approach would turn the example above into 4 different event types (rather than 2 event types with fields), since the hound/female purchase occurs twice. This representation could detect patterns such as the following: if I drink 2 dark roast coffees from Starbucks then I am more likely to buy a male Doberman pinscher.
However, this representation may miss more general patterns that don't depend on field values in the events. For instance, it may be the case that I simply buy a dog after having two coffees in general.
I'd like to be able to detect patterns at both "levels" and am unsure of how to represent the events to do so. Of course one approach would be to use both representations and then just combine the results of the two.
So, questions are:
1. Any links/citations to papers that deal with this?
2. Is this a common issue?
3. Any recommendations on how to represent these events?
4. Any recommendations on how to work with them in TraMineR
5. Any recommendations / links / references to algorithms that deal with this sort of thing?
6. Any ideas at all?
Thanks!!!
This is actually similar to the question asked here (although they did not know to reference "multi-channel" and the title was vague): Multiple events in traminer
TraMineR has support for dealing with multichannel sequences with functions like:
seqdistmc
The general approach, I believe, is to do exactly what I outlined as our "flatten" solution. In this case you combine the values for each channel into one event type, e.g. in my example dog.hound.female would be one event with one channel/field, replacing the first event in my example that has 3 separate fields/channels. You then use the typical functions for finding distances, subsequences, etc. You do have options for setting up substitution costs and finding distances, though, so it offers some extra control over this multi-channel approach. It also deals with missing values in case you have channels of different lengths or with gaps.
This is also similar to what's suggested in the answer to the topic linked above, using the native R function interaction.
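The flattening step, together with the "both levels" idea from the question, can be sketched in a few lines. The example below is plain Python rather than R and does not use TraMineR at all; flatten and bigrams are hypothetical helper names, and counting adjacent pairs merely stands in for a real subsequence-mining step.

```python
from collections import Counter

# The five events from the question, as (event type, fields) pairs.
events = [
    ("dog", {"breed": "hound", "sex": "female"}),
    ("coffee", {"store": "Starbucks", "roast": "dark"}),
    ("dog", {"breed": "hound", "sex": "female"}),
    ("coffee", {"store": "Starbucks", "roast": "light"}),
    ("dog", {"breed": "Doberman pinscher", "sex": "male"}),
]


def flatten(event_type, fields):
    """Join the event type and its field values into one state label."""
    values = [str(v).lower().replace(" ", "_") for v in fields.values()]
    return ".".join([event_type] + values)


# Detailed level: one label per unique combination of field values.
detailed = [flatten(t, f) for t, f in events]
# Generic level: the event type alone, ignoring the fields.
generic = [t for t, _ in events]


def bigrams(sequence):
    """Count adjacent pairs, the simplest possible subsequence pattern."""
    return Counter(zip(sequence, sequence[1:]))
```

On this data the generic sequence repeats the pair ("coffee", "dog") while every pair in the detailed sequence is unique, which is the kind of pattern the flattened representation alone would miss.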

In UML/ER diagramming, how to notate & make transaction requirements with entity that has different values depending on a certain attribute?

I am diagramming an art museum system, where there are Permanent_Art_Objects. Each Permanent_Art_Object has many attributes, and can also be either a 1) Sculpture/Statue, 2) Painting, or 3) Other. Depending on whether it's a sculpture/statue, painting, or other, it has sub-attributes unique to itself.
Here is an example of these sub-attributes.
What is the proper notation for showing these 'sub-attributes'?
For example, if Permanent_Art_Object is Other, it has as sub-attributes Type and Style.
Also, how would I make a query to INSERT INTO Permanent_Art_Object VALUES() for a new art object, if there's so much variety??
It all depends on what you are making. If this is purely for a database, I think ERDs are the cleanest way of modeling, but note that there are at least 4 types of notation. Below is how I would do it in UML and ERD with the limited context I have.
More info about ERDs:
Basics: http://web.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch2.xht
Specialisations: http://web.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch16.xht
Overview of different types: http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#Cardinalities
My example:

figuring out which field to look for a value in with SQL and perl

I'm not too good with SQL and I know there's probably a much more efficient way to accomplish what I'm doing here, so any help would be much appreciated. Thanks in advance for your input!
I'm writing a short program for the local high school. At this school, juniors and seniors who have driver's licenses and cars can opt to drive to school rather than ride the bus. Each driver is assigned exactly one space, and their DLN is used as the primary key of the drivers table. Makes, models, and colors of cars are stored in a separate cars table, related to the drivers table by the license plate number field.
My idea is to have a single search box on the main GUI of the program where the school secretary can type in who/what she's looking for and pull up a list of results. Thing is, she could be typing a license plate number, a car color, make, or model, some driver's name, some student driver's DLN, or a space number. As the programmer, I don't know what exactly she's looking for, so a couple of options come to mind to make certain I check everywhere for a match:
1) perform a couple of
SELECT * FROM [tablename]
SQL statements, one per table and cram the results into arrays in my program, then search across the arrays one element at a time with regex, looking for a matched pattern similar to the search term, and if I find one, add the entire record that had a match in it to a results array to display on screen at the end of the search.
2) take whatever she's looking for into the program as a scalar and prepare multiple select statements around it, such as
SELECT * FROM DRIVERS WHERE DLN = $Search_Variable
SELECT * FROM DRIVERS WHERE First_Name = $Search_Variable
SELECT * FROM CARS WHERE LICENSE = $Search_Variable
and so on for each attribute of each table, sticking the results into a results array to show on screen when the search is done.
Is there a cleaner way to go about this lookup without having to make her specify exactly what she's looking for? Possibly some kind of SQL statement I've never seen before?
This seems like a good application for the Sphinx full-text search engine. There's the Sphinx::Search module on CPAN, which can be used as a Perl client for Sphinx.
First of all, you should not use SELECT * and you should definitely use bind values.
Second, the easiest way to figure out what the user is searching for is to ask the user. Have a set of checkboxes like so:
Search among: [ ] Names
              [ ] License Plate Numbers
              [ ] Driver's License Numbers
Alternatively, you can note that names do not contain any digits, and I have not seen any driver's license number which does not contain digits. There are other heuristics you can apply to partially deduce what the user was trying to search for.
If you do an OK job of presenting the results, this might work out.
Finally, try to figure out what search possibilities are offered by the database you are using and leverage them so that most of the searching happens before the user interface touches the data.
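The advice above (bind values, no SELECT *, and a digit heuristic to narrow the search) might look roughly like the following. This is a Python sketch using the standard sqlite3 module rather than Perl and DBI; the table and column names (dln, space, license, first_name and so on) and the helper names candidate_columns and search are assumptions, since the question does not give the actual schema.

```python
import sqlite3


def candidate_columns(term: str):
    """Guess which (table, column) pairs are worth searching."""
    if any(ch.isdigit() for ch in term):
        # Assumed heuristic: digits suggest a DLN, a plate or a space number.
        return [("drivers", "dln"), ("drivers", "space"), ("cars", "license")]
    return [("drivers", "first_name"), ("drivers", "last_name"),
            ("cars", "color"), ("cars", "make"), ("cars", "model")]


def search(conn: sqlite3.Connection, term: str):
    """Run one parameterised query per candidate column and collect the hits."""
    hits = []
    for table, column in candidate_columns(term):
        # Table and column names come from the fixed list above; only the
        # user's text is passed to the database, and only as a bind value.
        sql = (f"SELECT '{table}.{column}', {column} "
               f"FROM {table} WHERE {column} LIKE ?")
        hits.extend(conn.execute(sql, (f"%{term}%",)).fetchall())
    return hits
```

Each hit is a pair of the matching column and the matching value, so the interface can show the secretary where a match was found. A plate made only of letters would be missed by the digit heuristic, which is one reason the checkbox approach may still be worth offering.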

HABTM finds with "AND" joins, NOT "OR"

I have two models, associated with a HABTM (actually using has_many :through on both ends, along with a join table). I need to retrieve all ModelAs that are associated with BOTH of two ModelBs. I do NOT want all ModelAs for ModelB_1 concatenated with all ModelAs for ModelB_2. I literally want all ModelAs that are associated with BOTH ModelB_1 and ModelB_2. It is not limited to only 2 ModelBs; it may be up to 50 ModelBs, so this must scale.
I can describe the problem using a variety of analogies, that I think better describes my problem than the previous paragraph:
* Find all books that were written by all 3 authors together.
* Find all movies that had the following 4 actors in them.
* Find all blog posts that belonged to BOTH the Rails and Ruby categories for each post.
* Find all users that had all 5 of the following tags: funny, thirsty, smart, thoughtful, and quick. (silly example!)
* Find all people that have worked in both San Francisco AND San Jose AND New York AND Paris in their lifetimes.
I've thought of a variety of ways to accomplish this, but they're grossly inefficient and very frowned upon.
Taking an analogy above, say the last one, you could do something like query for all the people in each city, then find the items that exist in every array. That's a minimum of 4 queries (one per city), with all the data from those queries transferred back to the app, and then the app has to intensively compare all the arrays to each other (loops galore!). That's nasty, right?
Another possible solution would be to chain the finds on top of each other, which would essentially do the same as above, but won't eliminate the multiple queries and processing. Also, how would you dynamicize the chain if you had user submitted checkboxes or values that could be as high as 50 options? Seems dirty. You'd need a loop. And again, that would intensify the search duration.
Obviously, if possible, we'd like to have the database perform this for us, so, people have suggested to me that I simply put multiple conditions in. Unfortunately, you can only do an OR with HABTM typically.
Another solution I've run across is to use a search engine, like sphinx or UltraSphinx. For my particular situation, I feel this is overkill, and I'd rather avoid it. I still feel there should be a solution that will let a user craft a query for an arbitrary number of ModelBs and find all ModelAs.
How would you solve this problem?
You may do this:
build a query from your ModelA, joining ModelB (through the join model), filtering the ModelBs that have one of the values that you are looking for, that is putting them in OR (i.e. where ModelB = 'ModelB_1' or ModelB = 'ModelB_2'). With this query the result set will have multiple 'ModelA' rows, exactly one row for each ModelB condition satisfied.
add a group by condition to the query on the ModelA columns you need (even all of them if you wish). The count(*) for each row is equal to the number of ModelB conditions satisfied.
add a 'having' condition selecting only the rows whose count(*) is equal to the number of ModelB conditions you need to have satisfied
example:
model_bs_to_find = [100, 200]
ModelA.all( :joins=>{:model_a_to_b=>:model_bs},
            :group=>"model_as.id",
            :select=>"model_as.*",
            :conditions=>["model_bs.id in (?)", model_bs_to_find],
            :having=>"count(*)=#{model_bs_to_find.size}")
N.B. the group and select parameters specified in that way will work in MySQL; the standard SQL way to do so would be to put the whole list of model_as columns in both the group and select parameters.
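For reference, the join / group by / having technique described above can be tried outside Rails. The following is a minimal Python sketch against SQLite; the join table name model_a_to_bs, its column names and the function model_as_with_all are assumptions. It uses COUNT(DISTINCT ...) instead of count(*), a small variation that keeps the result correct even if the join table happens to contain duplicate rows.

```python
import sqlite3


def model_as_with_all(conn: sqlite3.Connection, model_b_ids):
    """Return ids of model_as rows linked to every id in model_b_ids."""
    wanted = sorted(set(model_b_ids))
    if not wanted:
        return []
    # One placeholder per wanted id; the ids themselves are bind values.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT a.id
        FROM model_as AS a
        JOIN model_a_to_bs AS j ON j.model_a_id = a.id
        WHERE j.model_b_id IN ({placeholders})
        GROUP BY a.id
        HAVING COUNT(DISTINCT j.model_b_id) = ?
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

The whole lookup is a single query regardless of how many ModelBs are requested, so 50 checkboxes only make the IN list longer.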