Is this a correct schema design for what I need to accomplish? - schema

Pawnshop business model:
CLIENTES (customer table), LOTES (lot table), ARTICULOS (item table) and TRANSACCIONES (transaction table).
The reason I defined a lot table is because when customers pawn or sell items, the pawnshop groups all these items into one lot, calculates the total loan or purchase amount, stores these values under one transaction and prints the ticket with a description of all the items and total amount. So I want the ability to say, if customer defaults on interest payments or does not redeem pawn, then customer forfeits all items and pawnshop may choose to sell some items to gold refinery and/or transfer other non-gold items to inventory to sell to the public. In other words, I want the ability to do a break-out explosions of each item.
Would the above ER be adequate for this capability?

From the point of view of a logical model, you probably don't want store_id on the lot (as it comes from the customer) or the transactions or articles (as they get it through the lot and customer). At the physical level you might have those as attributes (called denormalisation), you have the risk of data showing, for example, LOT 1234 being on CUSTOMER C12 and at STORE S1, while the customer table has C12 being at store S2.
Of course it is possible that you allow Mr Smith to pawn an item at one store but make payments on it at another. Or perhaps an item might be pawned at one store but physically relocated to a different one for security or space reasons. If so, then it is appropriate to have distinct store ids on these entities.
However that doesn't sit comfortably with the 'store' being an attribute of the customer, since that implies they have a relationship with only one store.
Also consider what happens if MR P BROKER has three stores, but decides to close one and move the business to one of the others. You need to merge the stores but do you update the store id on all the transactions and articles and lots (including ones that are 'in progress' and those redeemed) or do you leave them with the original store id ?
Another common data modelling issue is identifying customers. Is Mr Smith one customer and Mrs Smith another, or can Mr and Mrs Smith be 'parts' of the same customer ? If Mr Smith pawns something, can Mrs Smith redeem it ? I'm thinking family squabbles, disputed heirlooms.... Perhaps she can't redeem it, but can make payments on it.
If an item (eg a watch) is included in one lot, then redeemed, then included in a different lot, does it get a different item_id ?

When a client buys an article offered to the general public, is that a transaction? Or does your database only track transactions about lots?
Can an item exist in your system without being part of any lot? You can't express that fact in the ER model you've presented.
Your ER model doesn't show any many to many relationships. That makes me suspicious. I've never worked in a pawnshop, so I can't say for sure. But every other enterprise database I've ever seen has at least one many-to-many relationship. Sometimes a relationship is treated as though it were an entity, and appears with a box of its own. But that box would be on the "infinity" end of more than one relationship, something I don't see in your diagram.
Buena suerte.

Related

Should I use a name as primary key if there is nothing else to use, or should i create ids for the entities that don't have anything useable as PK?

I have to specify that this is for a database assignment. I'm pretty good with SQL code but the diagram aspect of the assignment is killing me, I think that every step I take is wrong.
They have given us This scenario and requirements :
A research team has asked you to create a database for a project on movie production
companies; the project aims to use machine learning, neural networks and other
methods to extract information about the situation of movie production companies in
Europe and the health of this sector for a set of specific countries, including the UK.
The data analytics application resulting from this project – which you DO NOT have to
develop; your job is to develop the central, server-side database that underpins it – has been commissioned by a research institute (which shall remain nameless), and it is
intended to be open source, and therefore available to anyone.
Basically, it is a machine learning application that would run on a database with the aim
to identify the correlation between different aspects of the sector, including funding
opportunities and development of new production companies or studios.
The database records every production company in Europe, including the name of the
company, the address, ZIP code, city, country, type of the company (e.g., non-profit
organisation), number of employees and net worth (calculated as total assets minus
total liabilities). Every production company has its name registered with one and only
one local government authority (for example, Companies House in the UK) on a specific
date; each company can have many shareholders. The authority typically requires
information about all the shareholders, including town of birth, mother’s maiden name,
father’s first name, their personal telephone number (only one), national insurance
number (each country in Europe has a similar unique ID), and passport number. Also,
the registration procedure has a cost associated with it (e.g., 12£ in the UK).
The database also records the employees’ data for each company: each employee is
assumed to work for a single production company. Due to the complex structure of
movie production companies and the need for various skills and professions,
employees are categorised into crew and staff. The crew consists of three main groups:
the actors, the director(s) and those who work on other jobs relevant to the filming
(producers, editors, production designers, costume designers, composer, etc.). All
other employees belong to the staff group, including those responsible for HR,
advertising, etc. Employees are identified by an employee ID, first name, last name and
an optional middle name, date of birth and start date. Also, each employee has their
contact details recorded, whether it is a single phone number or multiple, with a
description associated with each of them. Each employee has a single email address,
too.
Members of the crew are paid hourly, and this is recorded in the database as well as a
bonus that depends on their contract. Actors get a bonus for each day of work and
another bonus for each scene completed; directors get a bonus at the end of the
shooting; crew members that work in other jobs relevant to the filming get a bonus at
the end of the shooting, and they have their role recorded as well (e.g., producer or
costume designer).
Staff members have the monthly salary and the working hours (e.g., full time 9-5).
Furthermore, each staff member belongs to a specific department (e.g., advertising),
which is located in a given building at a given address (both recorded in the database).
The database records all movies from each production company. More specifically, for
each movie the following information is recorded: a universal unique movie code(similar to the ISBN for books), the title of the movie, the year and the first release date
(different release dates are not important and should NOT be recorded).
Also, the database records each member of the crew that is part of the movie, and the
role they have in the movie: each crew member can play a single role or multiple roles
in the same movie, and each role has a description associated with it. For example, in
each movie there can be a single protagonist or more than one, the same actor can play
one or several roles, or even have a cameo.
One of the aims of the project is to provide insights on the impact of funding and grants
within the movie industry. To this end, the database should be able to record all the
funding that each production company receives. This must include the name of the
grant, the funding body (e.g., the government of a given country or European Union
grants such as the ERDF), the maximum amount for that grant and the deadline to
submit a proposal.
Then, for each company the database must record the date of the application to a given
grant, the amount requested, the outcome (successful/unsuccessful).
A grant can be given to a single production company or shared among several. Finally,
once the database is ready, the project will run a set of machine learning algorithms to
perform high level data analysis based on the different grants and their corresponding
impact with the aim to investigate the impacts of such funding against a list of criteria.
No additional information is provided at this stage from the project.
In the spec, the requirements are numerated from 1 to 5, as the scenario was not given
at that time. The details of each requirement are provided in the following:
Each production company may have received one or multiple grants, and grants
can be shared by more than one company.
It is possible for each employee to have more than one telephone number. Each
telephone number has a description associated with it (e.g., personal, or work).
Each production company is registered only once but can have many shareholders.
Each employee can either be a member of the crew OR a staff member. Each crew
member can be an actor OR a director OR have another role. Each staff member
belongs to a department. No duplication of data is allowed.
Each crew member may be part of one or more movies in a single role or many.
Based on that I have created THIS DIAGRAM.
I think I have all the entities,attributes and relationships down but I'm missing the keys. Keys can't be names right? I will use the company entity as an example. So, should I create new attributes like company_id to use as primary keys or just underline the name attributes and use it as Primary Key?
Also, please tell me if there's anything else wrong with the diagram.
Thanks a lot!
I created an er diagram but some entities don't attributes that can be used as primary keys because they are names. I tried using them but I don't think it's right.
The problem with names as primary keys
In your diagram, you have a couple of name used to identify entities: Grant, Production Company, Shareholder (full name), Employee, Movie (Title). You can in theory use them as primary key. However, this is a bad practice:
names can change (e.g. departments and companies can be renamed, movies can have a temporary working title);
names are often not sufficient to distinguish entities (e.g. there may be different people having the same name, e.g. Adam Smith);
names can be spelled differently across source of information , and are also easily misspelled;
although not really noticeable with modern RDBMS, names are more time consuming to search, and consume more memory when used as foreign keys.
How to chose a primary key?
You'd better use a primary key that guarantees uniqueness. You can then decide easily if a same name correspond to a different entity or not.
The next question that you'll then face in you design is surrogate key vs. natural key:
When there's no other unique information, you'll have not choice than using a surrogate key.
When there are other potential unique attributes, you may chose to use either a natural key (e.g. company registration number, national insurance number together with a country code, movie code?) or a surrogate one.
Keep in mind that both have advantages and inconveniences, but the surrogate key is in general more robust, as natural keys sometimes appear to be not as stable as expected.
Other remarks concerns about your ERD
By the way, here some issues and other remarks:
Works in relation does not relate Staff to anything else. From the name, it's obviously not a reflexive relation either. So this is a diagram error. department (name) and building should either be attached to a Department entity or be attributes of Staff.
In several cases you relate attributes to other attributes (actor-extra role, phone number-description) . This is also a diagramming error. Either add the extra attribute to the same entity, or there's a missing relationship with a missing entity.
In one case you relate two entities without a relation between the two (production company- application). This is an inconsistency that must be corrected also.
The following attributes are not real attributes but probably values of an unidentified attribute: producer, composer, actor, editor, xyzzy designer, advertising, HR, janitor.
Government authority is a misleading entity name: nowhere do you refer to data about the authority itself (name of the authority, e.g. "CNC", country of the government, ...). It's only information about the company's registration.
In your diagram you leave the hourly and monthly wage at the level of the Employee. This does not model accurately the requirements.
The link of the relation receive funding and the entity Application with the same attribute outcome seems very ambiguous.
In the name of the entities, stay consistent: either singular or plural. But mixing both will lead to lots of typos.
Better show cardinality in the link between the relation and the entity, than on the top of the relation: this avoids confusion about the direction of reading.
As a side remark, your question provides wealth of interesting details, but that are not really needed for answering the core of your question. Better limit yourself to only the information directly related to your issue in your next questions ;-)
Research or not research, keep in mind that GDPR may apply and that it requires inter alia privacy by design (some information about the shareholder and the employee may require some additional thoughts).

Database Design for Contacts

So I have 3 main entities. Airports, Customers, and Vendor.
Each of these will have multiple contacts I need to relate to each.
So they way I set it up currently.
I have the following tables..
Airport
Customer
Vendor
I then have one Contacts table and a xref for Airport, Customer, Vendor...
I am questioning that and was thinking a contacts table for each ..
Airport and AirportContacts
Customer and CustomerContacts
Vendor and VendorContacts
Any drawbacks to either of these designs?
To me, the deciding factor is duplication of entities vs "one version of the truth". If a single real-world person can be a contact for more than one of the other entities, then you don't want to store that single person in multiple contact tables, because then you have to maintain any changes to his/her properties in multiple places.
If you put the same "Joe Smith" in both AirportContacts and VendorContacts, then one day when you look and see his city is "Denver" in one table and "Boston" in another table, which one will you consider to be the truth?
But as someone mentioned in comments, if a contact can only be associated with one of the three other entities ("types" as you call them), then putting them in separate tables makes the most sense.
And there's yet a third scenario. Say "Joe Smith" can be a contact for both Airports and Vendors. But say that he has some properties, like his gender and age, which are the same regardless of which "type" he is being considered, but there might be some properties, like phone number, or position/job title, which could depend on the "type". Maybe he uses one phone in his capacity as an Airport Vendor, and a different phone as a Vendor Contact. Moreover, maybe there are some properties that apply to one type of contact that don't apply to the others. In these cases, I would look at some kind of hybrid approach where you keep common properties in a single Contact table, and "Type"-specific properties in their own type-related tables. These type-related tables would be bridge tables that have FKs back to the Contact table and to the main entity table of the "Type" they are related to (Vendor, Customer or Airport).
What I have so far ... Dont mind some of the data types.. just placed quick placeholders..

Database Design: Unique Billing Assocation

Our company is in the process of redesigning an old database. We've come across an association we need to show but not sure the best way to handle. Our business logic is that our customers (called distributors) will always have payment terms. However, some of these accounts payment terms will be controlled by a separate distributor account (ex: think chain stores and all billing goes through a corporate account).
We've come up with a couple solutions and would like some feedback on which one is most viable, or if we should be handling this billing association a different way.
Solution #1
Distributors
-------------
DistributorId
Name
PaymentTermsId (Points to separate PaymentTerms table)
Since each account has payment terms they're reflected in the Distributors table. Then the association is done in a separate table
DistributorBillingAssociations
------------------------------
DistributorId
BillingDistributorId (points to DistibutorId of Distributors table)
Now you're relied on business logic to make sure the Payment Terms match the same as the billing account. The biggest issue I have with this solution is that the design isn't intuitive. Hard for future users to know the payment terms is driven by the billing associations table.
Solution # 2
Distributors
------------
DistributorId
Name
PaymentTermsId (nullable)
BillingAccountId (references DistributorId, also nullable)
If a distributor doesn't have it's own Payment Terms then the column is set to null and through a self reference you'll know to use the Payment Terms from the Distributor set as the billing account.
Any feedback or suggestions is welcome.
I would go for Solution #2 because I think its most logic:
if (PaymentTermsId=null) PaymentTerms=(PaymentTermsFor(BillingAccountId))
else PaymenntTerms=PaymentTermsFor(PaymentTermsID)
I would set up a paymentTermsID for every payment entity be it an individual distributor or a conglomerate and then have a FK from the distributor table to the payment terms table.
if there are many distributors with the same payment terms, they simply refer to the same record in the payment terms table.
this avoids any potentially confusing ifnull logic in your data queries.
select * from Distributors
left join PaymentTerms
on Distributors.paymentTermsID = PaymentTerms.PaymentTermsID
I have made a rough sketch on sqlfiddle, take a look and play around
http://sqlfiddle.com/#!6/c4a30/1/0

Should I denormalize Loans, Purchases and Sales tables into one table?

Based on the information I have provided below, can you give me your opinion on whether its a good idea to denormalize separate tables into one table which holds different types of contracts?.. What are the pro's/con's?.. Has anyone attempted this before?.. Banking systems use a CIF (Customer Information File) [master] where customers may have different types of accounts, CD's, mortgages, etc. and use transaction codes[types] but do they store them in one table?
I have separate tables for Loans, Purchases & Sales transactions. Rows from each of these tables are joined to their corresponding customer in the following manner:
customer.pk_id SERIAL = loan.fk_id INTEGER;
= purchase.fk_id INTEGER;
= sale.fk_id INTEGER;
Since there are so many common properties among these tables, which revolves around the same merchandise being: pawned, bought and sold, I experimented by consolidating them into one table called "Contracts" and added the following column:
Contracts.Type char(1) {L=Loan, P=Purchase, S=Sale}
Scenario:
A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.
I designed a generic table where for example:
Contracts.Capital DECIMAL(7,2)
in a loan contract it holds the pawn principal amount, in a purchase holds the purchase price, in a sale holds sale price.
Is this design a good idea or should I keep them separate?
Your table second design is better, and, is "normalised".
You first design was the denormalised one!
You are basiclly following a database design/modelling pattern known as "subtype/supertype"
for dealing with things like transactions where there is a lot of common data and some data specific to each tranasaction type.
There are two accepted ways to model this. If the the variable data is minimal you hold everthing in a single table with the transaction type specfic attributes held in "nullable" columns. (Which is essentially your case and you have done the correct thing!).
The other case is where the "uncommon" data varies wildly depending on transaction type in which case you have a table which holds all the "common" attributes, and a table for each type which holds the "uncommon" attributes for that "type".
However while "LOAN", "PURCHASE" and "SALE" are valid as transactions. I think Inventory is a different entity and should have a table on its own. Essentially "LOAN" wll add to inventory transaction, "PURCHASE" wll change of inventory status to Saleable and "SALE" wll remove the item from inventory (or change status to sold). Once an item is added to inventory only its status should change (its still a widget, violin or whatever).
I would argue that it's not denormalized. I see no repeating groups; all attributes depending on a unique primary key. Sounds like a good design to me.
Is there a question here? Are you just looking for confirmation that it's acceptable?
Hypothetically, what would you do if the overwhelming consensus was that you should revert back to separate tables for each type? Would you ignore the evidence from your own experience (better performance, simpler programming) and follow the advice of Stackoverflow.com?

Circular database relationships. Good, Bad, Exceptions?

I have been putting off developing this part of my app for sometime purely because I want to do this in a circular way but get the feeling its a bad idea from what I remember my lecturers telling me back in school.
I have a design for an order system, ignoring the everything that doesn't pertain to this example I'm left with:
CreditCard
Customer
Order
I want it so that,
Customers can have credit cards (0-n)
Customers have orders (1-n)
Orders have one customer(1-1)
Orders have one credit card(1-1)
Credit cards can have one customer(1-1) (unique ids so we can ignore uniqueness of cc number, husband/wife may share cc instances ect)
Basically the last part is where the issue shows up, sometimes credit cards are declined and they wish to use a different one, this needs to update which their 'current' card is but this can only change the current card used for that order, not the other orders the customer may have on disk.
Effectively this creates a circular design between the three tables.
Possible solutions:
Either
Create the circular design, give references:
cc ref to order,
customer ref to cc
customer ref to order
or
customer ref to cc
customer ref to order
create new table that references all three table ids and put unique on the order so that only one cc may be current to that order at any time
Essentially both model the same design but translate differently, I am liking the latter option best at this point in time because it seems less circular and more central. (If that even makes sense)
My questions are,
What if any are the pros and cons of each?
What is the pitfalls of circular relationships/dependancies?
Is this a valid exception to the rule?
Is there any reason I should pick the former over the latter?
Thanks and let me know if there is anything you need clarified/explained.
--Update/Edit--
I have noticed an error in the requirements I stated. Basically dropped the ball when trying to simplify things for SO. There is another table there for Payments which adds another layer. The catch, Orders can have multiple payments, with the possibility of using different credit cards. (if you really want to know even other forms of payment).
Stating this here because I think the underlying issue is still the same and this only really adds another layer of complexity.
A customer can have 0 or more credit cards associated, but the association is dynamic - it can come and go. And as you point out a credit card can be associated with more than one customer. So this ends up being an n:m table, maybe with a flag column for "active".
An order has a static relationship to 0 or 1 credit card, and after a sale is complete, you can't mess with the cc value, no matter what happens to the relationship between the cc and the customer. The order table should independently store all the associated info about the cc at the time of the sale. There's no reason to associate the sale with any other credit card column in any other table (which might change - but it wouldn't affect the sale).
I think the problem is with the modeling of the Order. Instead of one Order has one credit card, an order should be able to be associated with more than one credit card of which only one is active at any time. Essentially, Order and Credit is many-to-many. In order to model this in DB, you need to introduce an association table, let's say PaymentHistory. Now when an order requires a new credit card, you can simply create a new credit card, and associate it with the order and mark the associating PaymentHistory as active.
Hmm?
A customer has several credit cards, but only a current one. An order has a single assigned card. When a customer puchases something, his default card is tried first, otherwise, he may change his main card?
I see no circular references here; when a user's credit card changes, his orders' stay the same. Your tables would end up as:
Customer: id, Current Card
Credit Cards: id, number, customer_id
Order: id, Card_id, Customer_id
Edit: Oops, forgot a field, thanks.
No matter the reason your data has circular relationships, you'll be a lot happier if you "forget" to declare one of them so that your tables have a bulk-load order.
That comes in handy when you least expect it.
This is a year old but there's some points worth making.
NB For on-line NON-ACCOUNT processes: The Customer would be better defined as Buyer and there would also probably be another type of customer - the Beneficiary/Recipient. You can buy/purchase airline tickets and flowers etc. for other people so these two roles need to be clearly separated as they involve different business processes (one to pay and the other to be sent the goods).
If it is a non-account process then you shouldn't be retaining credit card details. It's a security risk - and you're putting the buyer at risk by keeping this information. Credit cards are processed in real-time and then the information should be thrown away.
ACCOUNT CUSTOMERS: The only exception would be when someone has opened an account and provided their credit card information for use in subsequent purchases. In such a case changes to the credit card information would take place outside of the transaction - as part of the Account Management process.
The main point is to make sure that you fully understand the business processes before you start modelling and coding.