In UML/ER diagramming, how to notate & make transaction requirements with entity that has different values depending on a certain attribute? - sql

I am diagramming an art museum system, where there are Permanent_Art_Objects. Each Permanent_Art_Object has many attributes, and can also be either a 1) Sculpture/Statue, 2) Painting, or 3) Other. Depending on whether it's a sculpture/statue, painting, or other, it has sub-attributes unique to itself.
Here is an example of these sub-attributes.
What is the proper notation for showing these 'sub-attributes'?
For example, if Permanent_Art_Object is Other, it has as sub-attributes Type and Style.
Also, how would I write a query to INSERT INTO Permanent_Art_Object VALUES() for a new art object, if there's so much variety?

It all depends on what you are making. If this is purely for a database, I think ERDs are the cleanest way to model it, but note that there are at least 4 types of notation. Below is how I would do it in UML and ERD with the limited context I have.
More info about ERDs:
Basics: http://web.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch2.xht
Specialisations: http://web.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch16.xht
Overview of different types: http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#Cardinalities
My example:
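As for the INSERT part of the question, here is a minimal sketch (separate from the diagram) of one common way to map a specialisation onto tables: a shared table for Permanent_Art_Object plus one table per subtype, filled in the same transaction. It assumes SQLite through Python's sqlite3 module. The table name Other_Art_Object, the title column and the sample values are hypothetical; only the Type and Style sub-attributes of Other come from the question. Sculpture and Painting would get their own tables in the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Permanent_Art_Object (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,  -- hypothetical shared attribute
    kind  TEXT NOT NULL CHECK (kind IN ('Sculpture', 'Painting', 'Other'))
);
-- Hypothetical subtype table; its primary key is also a foreign key
-- to the shared table, so each art object has at most one subtype row.
CREATE TABLE Other_Art_Object (
    id    INTEGER PRIMARY KEY REFERENCES Permanent_Art_Object(id),
    type  TEXT NOT NULL,
    style TEXT NOT NULL
);
""")


def insert_other(conn, title, type_, style):
    """Insert the shared row and the subtype row in one transaction."""
    with conn:  # commits on success, rolls back if either INSERT fails
        cur = conn.execute(
            "INSERT INTO Permanent_Art_Object (title, kind) VALUES (?, 'Other')",
            (title,),
        )
        conn.execute(
            "INSERT INTO Other_Art_Object (id, type, style) VALUES (?, ?, ?)",
            (cur.lastrowid, type_, style),
        )
        return cur.lastrowid


new_id = insert_other(conn, "Untitled print", "Print", "Modern")
row = conn.execute(
    "SELECT p.kind, o.type, o.style FROM Permanent_Art_Object p "
    "JOIN Other_Art_Object o ON o.id = p.id WHERE p.id = ?",
    (new_id,),
).fetchone()
```

The variety is handled by having one small insert routine per subtype rather than one INSERT with every possible column.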

Related

Object condition in multiple places/repeated code (DRY)

This is a fundamental application design question I’ve struggled with and flip-flopped on for years. We have a legacy webapp that doesn't really have a solid ORM, if that tidbit might influence your answer. To abstract my question, let’s say we have a class Car and a corresponding table in our database named car. Car has a few properties: color, weight, year, maxspeed. These properties directly correspond to columns in the db table.
In our application, we define the car as “classic old” if year is < 1960 and color = black. And in many places within our app knowing whether the car is "classic old" is extremely important (maybe we’re running a very illogical insurance agency which gives steep discounts and other perks to cars which are “classic old”).
All over our application, we do things like:
--list all classic old cars
--give the current user a discount if their car is classic old
--list all classic old cars with max speed > 100 miles per hour
--email the current user if their car is classic old and weighs more than 1000 pounds
What is the best way to go about this? We have a legacy application that does this in some places:
getOldClassicCars()
    select * from car where year < 1960 and color = 'black'
and in other places:
cararray = getAllCars();
for each car in cararray
    if car.year < 1960 and car.color = black
        oldcararray.add(car)
The point being that this very important, fundamental piece of our application – is the car classic old – is “hardcoded” as year < 1960 and color = black in many places. Sometimes in SQL, sometimes in application code, etc. Obviously that is not good, but as we’ve refactored things I’m not sure we’re refactoring things the best way we can.
Well, you are stuck with a fundamental problem:
you can't run your code on the database
you want to be able to use the database's selection functionality on this criteria.
you want the calculation of "classic old" to be defined in a single place (preferably code)
Let's enumerate the solutions.
1: Put the calculation in a sproc and always use the sproc to retrieve cars.
The problem here is that if you create a new car in code, its classic status is undefined, so you haven't really solved the 'not in two places' problem.
2: Get the DB to run your calc via an assembly. For example, you can get MSSQL to run functions from a .NET assembly, which you can also use in your code base to perform the same calculation.
Problem: it's hard work. Plus, it's essentially still in two places: you have to keep the DB up to date and ensure that the table is accessed correctly.
3: Persist the calculated value in the DB, but perform the calc in the code.
Problem: if the calculation changes, the DB values will be incorrect and need updating.
3 seems to be the best option, as we will know when the calculation changes and be able to take some action to resolve the situation.
However, it might be best, given the fundamental nature of this calculation, to make that 'out of dateness' implicit in the way we structure the code.
Instead of simply persisting car.IsClassic we could add a CarStatusReport object with a datetime property. We then generate a CarStatusReport(2017) which evaluates all the cars at that point in time and saves that data in a separate table.
Our business logic is then no longer, "Is this car a classic?" but "What does the latest CarStatusReport say the status of this car is?"
Your business logic will then reside in a single CarStatusReportGenerator service, and any other logic accessing the IsClassic calculation will be forced to acknowledge the ephemeral nature of the stored info.
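Option 3 above (calculate in code, persist the result) can be sketched roughly as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module. The is_classic_old column and the helper names save_car and refresh_flags are hypothetical, and the sketch stores a plain flag rather than a full CarStatusReport.

```python
import sqlite3

# The rule lives in exactly one place.
CLASSIC_YEAR_CUTOFF = 1960
CLASSIC_COLOR = "black"


def is_classic_old(year, color):
    return year < CLASSIC_YEAR_CUTOFF and color == CLASSIC_COLOR


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (id INTEGER PRIMARY KEY, color TEXT, weight INTEGER, "
    "year INTEGER, maxspeed INTEGER, is_classic_old INTEGER)"
)


def save_car(conn, color, weight, year, maxspeed):
    """Persist a car together with the flag computed by the single rule."""
    with conn:
        conn.execute(
            "INSERT INTO car (color, weight, year, maxspeed, is_classic_old) "
            "VALUES (?, ?, ?, ?, ?)",
            (color, weight, year, maxspeed, int(is_classic_old(year, color))),
        )


def refresh_flags(conn):
    """Recompute every stored flag; run this whenever the rule changes."""
    rows = conn.execute("SELECT id, year, color FROM car").fetchall()
    with conn:
        conn.executemany(
            "UPDATE car SET is_classic_old = ? WHERE id = ?",
            [(int(is_classic_old(year, color)), car_id) for car_id, year, color in rows],
        )


save_car(conn, "black", 1200, 1955, 110)
save_car(conn, "red", 900, 1955, 120)
save_car(conn, "black", 1100, 2001, 130)

# Queries filter on the stored flag instead of repeating the rule.
fast_classics = conn.execute(
    "SELECT id FROM car WHERE is_classic_old = 1 AND maxspeed > 100"
).fetchall()
```

Queries then never repeat "year < 1960 and color = black". The cost is that refresh_flags must be run after every rule change, which is the staleness problem described above.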
There is no optimal solution here, but a good step is to move all the business logic into one place. If you can't (for example, when you write methods or functions that calculate some property, such as isOld()), then hide those inconsistencies under the hood so that users of the implementation never notice the DRY violation from outside.

how to represent multichannel event sequences

I'm trying to use TraMineR but am open to feedback/references/links to more info as to how to represent multi-channel or hierarchical event sequences and algorithms that deal with it.
I have a complex event structure that I'm trying to figure out how to represent as a sequence. There are different types of events. Each event type may have a different set of fields (and different numbers of fields). For instance, age might be a field in one event type whereas height might be a field in another event type. My first instinct (and I believe a common approach) was to “flatten” everything, e.g. every possible combination of values for an event constitutes a unique event type. However, this may miss patterns in the generic event types.
For example, let's say I'm a dog breeder and drink a lot of coffee and I want to see if there are patterns in my coffee/dog buying habits (yes, silly example). I might have events like:
- Bought dog
- Breed: hound
- Sex: female
- Bought coffee
- Store: Starbucks
- Roast: dark
- Bought dog
- Breed: hound
- Sex: female
- Bought coffee
- Store: Starbucks
- Roast: light
- Bought dog
- Breed: Doberman pinscher
- Sex: male
To flatten the data I may say that every unique combination of store and roast is a unique coffee buying event. Also, every unique combination of breed and sex is a unique dog buying event. This approach would turn the example above into 4 different event types (rather than 2 event types with fields). This representation could detect patterns such as the following: if I drink 2 dark roast coffees from Starbucks then I am more likely to buy a male Doberman pinscher.
However, this representation may miss more general patterns that don't depend on field values in the events. For instance, it may be the case that I simply buy a dog after having two coffees in general.
I'd like to be able to detect patterns at both "levels" and am unsure of how to represent the events to do so. Of course one approach would be to use both representations and then just combine the results of the two.
So, questions are:
1. Any links/citations to papers that deal with this?
2. Is this a common issue?
3. Any recommendations on how to represent these events?
4. Any recommendations on how to work with them in TraMineR
5. Any recommendations / links / references to algorithms that deal with this sort of thing?
6. Any ideas at all?
Thanks!!!
This is actually similar to the question asked here (although they did not know to reference "multi-channel" and the title was vague): Multiple events in traminer
TraMineR has support for dealing with multichannel sequences with functions like seqdistmc.
The general approach, I believe, is to do exactly what I outlined as our "flatten" solution. In this case you combine the values for each channel into one event type, e.g. in my example dog.hound.female would be one event with one channel/field to replace the first event in my example that has 3 separate fields/channels. You then use the typical functions for finding distances, subsequences, etc. You can still set up substitution costs and distance measures, so there are some extra options for this multi-channel approach. It also deals with missing values in case you have channels of different lengths or with gaps.
This is also similar to what's suggested in the answer to the topic linked above, using the native R function interaction.
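The two "levels" discussed above can be illustrated outside R with a short Python sketch. It is not TraMineR code: the function names flatten and bigrams are hypothetical, and counting adjacent pairs merely stands in for whatever sequence analysis is applied afterwards. It builds both the flattened sequence (one label per combination of field values) and the generic sequence (event type only) from the same events.

```python
from collections import Counter

# The example events from the question: (event type, fields).
events = [
    ("dog", {"breed": "hound", "sex": "female"}),
    ("coffee", {"store": "Starbucks", "roast": "dark"}),
    ("dog", {"breed": "hound", "sex": "female"}),
    ("coffee", {"store": "Starbucks", "roast": "light"}),
    ("dog", {"breed": "Doberman pinscher", "sex": "male"}),
]


def flatten(event_type, fields):
    """Combine the event type and all field values into one label."""
    return ".".join([event_type, *fields.values()])


# Detailed level: every combination of values is its own state.
detailed = [flatten(event_type, fields) for event_type, fields in events]
# Generic level: only the event type is kept.
generic = [event_type for event_type, _ in events]


def bigrams(sequence):
    """Count adjacent pairs, a very crude stand-in for subsequence mining."""
    return Counter(zip(sequence, sequence[1:]))


detailed_pairs = bigrams(detailed)
generic_pairs = bigrams(generic)
```

Running the same analysis on both sequences is the "use both representations and combine the results" idea from the question: generic_pairs picks up the coffee-then-dog pattern even though the detailed labels differ.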

How to *really* write UML cardinalities?

I would like to know once and for all how to write UML cardinalities, since I very often had to debate about them (so proofs and sources are very welcome :)
If I want to explain that a Mother can have several Children but a Child has one and only one Mother, should I write:
Mother * ---------- 1 Child
Or
Mother 1 ---------- * Child
?
The second one:
Mother 1 ----------------- 1..* Child
You will find many examples in the UML specification, in all the figures related to the Abstract Syntax...
Of course Red Beard is right, the correct answer is the second one.
As a tip for remembering this, I advise thinking in English: you say "A child has ONE mother", and in this sentence, as in UML, ONE is written next to Mother. Fairly simple.
Many people have this question when they start using UML, especially when they come from another notation where the names are always read clockwise, regardless of which end of the line they're on. That's really confusing!
Red Beard is correct. Although the UML spec does not explicitly state where association-end information (i.e., name and multiplicity) is written, it implies it in several places. For example, Figures 7.11 (showing attributes) and 7.12 (showing unidirectional associations with association ends next to the arrowheads) are equivalent property notations; thus, the multiplicity does indeed go next to the property's type.
One way I learned to remember which end has which multiplicity is to imagine a unidirectional graph of instances and write the number next to the arrowheads that point at the target.
BTW, you should use descriptive association end names. These often turn into attribute names in Java, element names in XSD, and so on. For example, in Java, the Mother class might have a "children" attribute of type "Set<Child>". If you don't name them, you'll often get undesirable default names.
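To make the link between association ends and attributes concrete, here is a rough Python sketch of Mother 1 ---------- * Child. The end names "mother" and "children" follow the naming advice above; the dataclass layout and the sample names are just one possible rendering, not something taken from the UML spec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Mother:
    name: str
    # The "*" written next to Child: a mother holds any number of children.
    children: List["Child"] = field(default_factory=list)


@dataclass(eq=False)
class Child:
    name: str
    # The "1" written next to Mother: a child holds exactly one mother.
    mother: Mother

    def __post_init__(self):
        # Keep both ends of the association consistent.
        self.mother.children.append(self)


alice = Mother("Alice")
bob = Child("Bob", alice)
carol = Child("Carol", alice)
```

Each multiplicity ends up as the attribute whose type is the class it is written next to: the "1" becomes Child.mother (type Mother), and the "*" becomes Mother.children (a collection of Child).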

Designing a solution to retrieve and classify content based on given attributes

This is a design problem I am facing. Let's say I have a cars website. Cars have the following attributes with different possible values.
Color: red, green, blue
Size: small, big
Based on those attributes I want to classify between cars for young people, cars for middle aged people and cars for elder people, with the following criteria:
Cars_young: red or green
Cars_middle_age: blue and big
Cars_elder: blue and small
I'll call this criteria target
I have a table cars with columns: id, color and size.
I need to be able to:
a) when retrieving a car by id, tell its target (if it's young, middle age or elder people)
b) be able to query the database to know how many views had cars belonging to each target
Also, as a developer, I must implement it in a way that those criteria are easily changed.
Which is the best way to implement it? Is there a design pattern for it? I can explain two possible solutions I thought about but I don't really like:
1) create a new column in the database table called target, so it's easy to make both a) and b).
Drawbacks: Each time the criteria change I have to update the column target for all cars, and I also have to change the insertNewCar() function.
2) Implement it in the 'Cars' class.
Drawback: Each time criteria changes I have to change query in b) as well as code in 'getCarById' in a).
3) Use TRIGGERS in SQL, but I would like to avoid this solution if possible
I would like to be able have this criteria definition somewhere in the code which can be changed easily, and would also hopefully be used by 'Cars' class. I'm thinking about some singleton or global objects for 'target' which can be injected in some Cars methods.
Anyone can explain a nice solution or send documentation about some post that faces this problem, or a pattern design that solves it?
At first sight, the specification pattern might meet your expectations. Wikipedia gives a nice explanation of how it works; a small teaser is below:
OverDueSpecification OverDue = new OverDueSpecification();
NoticeSentSpecification NoticeSent = new NoticeSentSpecification();
InCollectionSpecification InCollection = new InCollectionSpecification();
ISpecification SendToCollection = OverDue.And(NoticeSent).And(InCollection.Not());
InvoiceCollection = Service.GetInvoices();
foreach (Invoice currentInvoice in InvoiceCollection) {
    if (SendToCollection.IsSatisfiedBy(currentInvoice)) {
        currentInvoice.SendToCollection();
    }
}
You can consider combining the specification pattern with observers.
There are also a few other ideas:
extending the specification pattern to SQL generation, WHERE clauses in particular
storing the criteria configuration in the database
criteria versioning: storing the version of the rules used to assign a category together with the category itself
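The idea of defining the criteria once and generating SQL from them can be sketched as follows. This is a simplified, data-driven variation rather than the full specification pattern with And/Or/Not objects. It assumes SQLite through Python's sqlite3 module. The names TARGET_RULES, target_of and target_case_sql are hypothetical, and so are the views column and the sample rows.

```python
import sqlite3

# Criteria defined once: target -> alternatives of (color, size),
# where None means "any value". Change the rules only here.
TARGET_RULES = {
    "young": [("red", None), ("green", None)],
    "middle_age": [("blue", "big")],
    "elder": [("blue", "small")],
}


def target_of(color, size):
    """Requirement a): classify a single car in application code."""
    for target, alternatives in TARGET_RULES.items():
        for rule_color, rule_size in alternatives:
            if rule_color in (None, color) and rule_size in (None, size):
                return target
    return None


def target_case_sql():
    """Requirement b): build a parameterised CASE expression from the same rules."""
    branches, params = [], []
    for target, alternatives in TARGET_RULES.items():
        options = []
        for rule_color, rule_size in alternatives:
            conditions = []
            if rule_color is not None:
                conditions.append("color = ?")
                params.append(rule_color)
            if rule_size is not None:
                conditions.append("size = ?")
                params.append(rule_size)
            options.append("(" + (" AND ".join(conditions) or "1 = 1") + ")")
        branches.append("WHEN " + " OR ".join(options) + " THEN ?")
        params.append(target)
    return "CASE " + " ".join(branches) + " END", params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, color TEXT, size TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO cars (color, size, views) VALUES (?, ?, ?)",
    [("red", "small", 10), ("green", "big", 5), ("blue", "big", 7), ("blue", "small", 3)],
)

case_sql, params = target_case_sql()
views_by_target = dict(
    conn.execute(
        f"SELECT {case_sql} AS target, SUM(views) FROM cars GROUP BY target", params
    ).fetchall()
)
```

Both the in-code classification and the aggregate query read TARGET_RULES, so a change of criteria touches one dictionary and no stored column has to be updated.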

Grammatically correct double-noun identifiers, plural versions

Consider compounds of two nouns, which in natural English would most often appear in the form "noun of noun", e.g. "direction of light", "output of a filter". When programming, we usually write "LightDirection" and "FilterOutput".
Now, I have a problem with plural nouns. There are two cases:
1) singular of plural
e.g. "union of (two) sets", "intersection of (two) segments"
Which is correct, SetUnion and SegmentIntersection or SetsUnion and SegmentsIntersection?
2) plural of plural
There are two subcases:
(a) Many elements, each having many related elements, e.g. "outputs of filters"
(b) Many elements, each having single related element, e.g. "directions of vectors"
Shall I use FilterOutputs and VectorDirections or FiltersOutputs and VectorsDirections?
I suspect the first version is correct (FilterOutputs, VectorDirections), but I think it may lead to ambiguities, e.g.
FilterOutputs - many outputs of a single filter or many outputs of many filters?
LineSegmentProjections - projections of many segments or many projections of a single segment?
What are the general rules I should follow?
There's a grammatical misunderstanding lying behind this question. When we turn a phrase of form:
1. X of Y
into
2. Y X
the Y changes grammatical role from a noun in the possessive (1) to an adjective in the attributive (2). So while one may pluralise both X and Y in (1), one may only pluralise X in (2), because Y in (2) is an adjective, and adjectives do not have grammatical number.
Hence, e.g., SetsUnion is not in accordance with English. You're free to use it if it suits you, but you are courting unreadability, and I advise against it.
Postscript
In particular, consider two other possessive constructions, first the old-fashioned construction using the possessive pronoun "its", singular:
3a. Y, its X
the equivalent plural:
4a. Ys, their X
and their contractions, with 4b much less common than 3b:
3b. Y's X
4b. Ys' X
Here, SetsUnion suggests it is a rendering of the singular possessive type (3) Set's Union (=Set, its Union), where you intended to communicate the plural possessive (4) Sets, their Union (contracted to the less common Sets' Union).
So it's actively misleading.
Unless you're getting hamstrung by a convention driven system (ruby on rails, cakePHP etc), why not use OutputsOfFilters, UnionOfSets etc? They may not be conventional but they may be clearer.
For example, it's pretty clear that ProjectionOfLineSegments and ProjectionsOfLineSegment are different things, or even ProjectionsOfLineSegments....
Using plural forms of nouns can make them more difficult to read.
When you have a number of things, they are usually stored in a data structure (an array, a list, a map, a set, etc.), generically called a collection or abstract data type. The interface to a collection of items is typically part of the programming environment (e.g. Collections in Java and .NET, STL in C++) and is well understood by developers to involve quantities of items.
You can avoid pluralizing your nouns, and make the fact that you are dealing with multiple quantities explicit, and indicate how they are accessed by incorporating the name of the collection. For example,
VectorDirectionList - the vectors and their directions are listed, e.g. some kind of Pair type. Works particularly well if you have a VectorDirection, combining a Vector and a Direction.
VectorDirectionMap - if the vector directions are mapped from vector.
Because it's a collection type, dealing with multiple objects is understood as it is endemic to a collection type. It then puts it in the same class as SetUnion - a union always involves at least 2 sets, and a VectorDirectionList makes it clear there can be more than one VectorDirection.
I agree about avoiding homonyms where the word has more than one word class, e.g. Filter (and actually Set, although to my mind Set would not really be used in a class name as a verb, so I interpret it as a noun). I originally wrote this using FilterOutput as an example, but it didn't read well. Using a compound for Filter may help disambiguate, e.g. ImageFilterOutputs (or, applying my own advice, ImageFilterOutputList).
Avoiding plural forms with class names seems natural when you consider that an instance of a class is itself always one item - "an instance". If we use a plural name, then we get a mismatch - an instance trying to imply that it is multiple things - it itself is just one thing, even if it references multiple other things. The collection naming above builds on this - you have an instance which is a list, a map etc so there is no mismatch.
I'm assuming you are talking about programming language constructs, although the same thinking applies to tables/views. These are understood to involve quantities of items, and table names are consequently often singular (Customer, Order, Item) even though they store multiple rows. Many-to-many mapping tables are usually compounds of the entities being related, e.g. relating orders to items: OrderItem. In my experience, using plurals for table names makes the SQL difficult to read.
To sum up, I would avoid plural forms as they make reading harder. There are sure to be cases where they are unavoidable, where using the plural form is more readable than creating a huge name of nested entities and collections, but these are the exception rather than the rule.
What are the general rules I should follow?
Make it Clear -- for both visual and aural thinkers.
Make it Specific but Accurate.
Make it pass the "crowded room" or "emergency phone call" test.
To illustrate with the SetsUnion example:
"SetsUnion" is right out; it's easily mistaken for a typo, and speaking it (even in your head) will confuse it with "Set's Union" (or worse).
The plural is also implied, so the 2nd 's' is redundant.
SetUnion is better but still ambiguous.
UnionOfSets is clearer and should be the bare minimum standard.
But all of these, so far, are uselessly vague (unless you are working with pure mathematical theory).
The term really should be specific. For example, "Red cars", "Programmers who spent too much time on esoterica", etc.
These are all unions of sets, but they tell you something useful. ;-)
Finally, Phil Factor had the right of it. To paraphrase:
Can you shout a (term) out across a crowded room and have it keyed in, and successfully (used), by a listener at the other side?
Try yelling, "SetsUnion," or even, "UnionOfSets," across a packed Irish bar. ;-)
1) I would use SetUnion and SegmentIntersection because I think in this case the plurality is implied anyway and it just looks nicer that way.
2) Again, I would use FilterOutputs and VectorDirections, for the same reason. You could always use MultipleFilterOutputs if you want to be more specific.
But ultimately it's entirely down to your personal preference.
I think that while general naming conventions and consistency are important, in a very tight or tricky algorithm clarity should trump convention. If it helps, use veryLongAndDescriptiveIdentifiers.
What's wrong with Union()?
Moreover, "union of sets" turns into "sets' union" (the two sets' union is ...); I'm sure I'm not the only person who's okay with CamelCase but not CamelsCaseMinusApostrophes. If it needs an apostrophe to make sense, don't use it. Set.Union() reads exactly like "union of set(s)".
Mathematicians will also say "the (set) union of A and B", or rarely "A and B's (set) union". "The sets' union of A and B" makes no sense!
Most people will also see Vector[] vectors and Directions[] vectorDirections and assume that vectors[i] corresponds to vectorDirections[i]. If things really get ambiguous, I use something like vector_by_index and vectorDirection_by_index. Then you can have Map<Filter,Output> output_by_filter or Map<Filter,Output[]> outputs_by_filter, which makes it very obvious what the key is (this is very important in Objective-C where it's completely non-obvious what type the keys or values are).
If you really want, you can add an s and get vectors_by_index, but then consistency gives you the silly outputss_by_filter.
The right thing is, of course, something like struct FilterState { Filter filter; Output[] outputs; }; FilterState[] filterStates;.
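For comparison, the same two naming ideas might look like this in Python. This is a sketch only: a plain string stands in for the Filter type, floats stand in for Output, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FilterState:
    """One filter together with its own outputs, so no parallel arrays are needed."""
    filter_name: str  # stands in for the Filter object
    outputs: List[float] = field(default_factory=list)


filter_states = [
    FilterState("blur", [0.1, 0.2]),
    FilterState("sharpen", [0.9]),
]

# The "_by_" naming style: the name says what the key is and that
# each value holds several outputs.
outputs_by_filter: Dict[str, List[float]] = {
    state.filter_name: state.outputs for state in filter_states
}
```

Either form removes the question of whether "FilterOutputs" means many outputs of one filter or the outputs of many filters.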
I'd suggest singular for the first word: SetUnion, VectorDirections, etc.
Do a quick class search in your IDE, for: Strings*, Sets*, Vectors*, Collections*
Anyway, whatever you choose, be consistent throughout the whole application.