Relational database design structure for a specific query - SQL

I have a database problem where I am supposed to design a tour database. It keeps track of visitors, tickets, and the attractions (such as a palace or local shows) that they visit. We assume that each visitor has to buy a ticket to enter the tour.
Each ticket is valid for only one day, and there are no special tickets for children, families, etc. But there are several classes of tickets, in particular Gold, Silver, and Bronze tickets. A Gold ticket is more expensive, but then most of the attractions in the tour are free or at least fairly cheap. For Silver and Bronze tickets, there may be significant extra charges for most of the attractions. Thus, a person planning to spend the whole day and do as many rides as possible may want to buy a Gold ticket, while others are better off with a Silver or Bronze ticket.
Ticket prices, as well as the extra charges per attraction for each ticket class, may depend on the season, and may change over time. Thus, during Spring Break 2013 Season there may be one set of prices, while during the Peak Summer Season 2013 there may be another set of prices. Each attraction has a unique name (e.g., Glass museum or gold mine), and whenever a visitor visits an attraction, the database should store information about the ticket held by the visitor and about when exactly the visitor entered the attraction.
Visitors(v_id, visitor_name)
Ticket_purchase(v_id, t_id, date)
Tickets(t_id, class, price, season)
Attractions(t_id, attraction_name, goldextracost, silverextracost, bronzeextracost)
Attraction_visited(v_id, t_id, attraction_name, datetime)
Now I want to output the number of people who bought a Bronze ticket but would have saved money if they had bought a Gold ticket (because they visited many attractions where they had to pay extra that day).
Is this possible with the tables above, or do I need some structural changes?

I think you have the information you need. You're recording the ticket used when a patron visits an attraction, and with that data you can get the ticket class and calculate what the cost would have been had they visited the attraction with a different class of ticket.
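A minimal sketch of that calculation, using Python's standard sqlite3 module so the SQL can run end to end. It assumes a simplified reading of the tables above: Tickets holds one row per class and season (so the Gold price for the same season can be looked up), t_id is dropped from Attractions so each attraction has a single row of per-class extra costs, extra costs do not vary by season, and amounts are whole currency units. The function name and sample data are made up for illustration.

```python
import sqlite3

# Simplified version of the question's tables (assumption: no t_id on
# Attractions, one Tickets row per class and season).
SCHEMA = """
CREATE TABLE Visitors (v_id INTEGER PRIMARY KEY, visitor_name TEXT);
CREATE TABLE Tickets (t_id INTEGER PRIMARY KEY, class TEXT, price INTEGER, season TEXT);
CREATE TABLE Ticket_purchase (v_id INTEGER, t_id INTEGER, date TEXT);
CREATE TABLE Attractions (attraction_name TEXT PRIMARY KEY, goldextracost INTEGER,
                          silverextracost INTEGER, bronzeextracost INTEGER);
CREATE TABLE Attraction_visited (v_id INTEGER, t_id INTEGER, attraction_name TEXT,
                                 datetime TEXT);
"""

# For each Bronze purchase, compare what was paid that day with what the
# same visits would have cost on a Gold ticket of the same season.
QUERY = """
WITH bronze_days AS (
    SELECT p.v_id, p.date, t.price AS bronze_price, t.season
    FROM Ticket_purchase p JOIN Tickets t ON t.t_id = p.t_id
    WHERE t.class = 'Bronze'
),
day_costs AS (
    SELECT b.v_id,
           b.bronze_price + COALESCE(SUM(a.bronzeextracost), 0) AS paid,
           g.price + COALESCE(SUM(a.goldextracost), 0) AS gold_cost
    FROM bronze_days b
    JOIN Tickets g ON g.class = 'Gold' AND g.season = b.season
    LEFT JOIN Attraction_visited v ON v.v_id = b.v_id AND date(v.datetime) = b.date
    LEFT JOIN Attractions a ON a.attraction_name = v.attraction_name
    GROUP BY b.v_id, b.date, b.bronze_price, g.price
)
SELECT COUNT(DISTINCT v_id) FROM day_costs WHERE gold_cost < paid
"""

def count_bronze_overpayers(conn):
    """Number of Bronze buyers who would have paid less with Gold."""
    return conn.execute(QUERY).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Tickets VALUES (?, ?, ?, ?)",
                 [(1, "Gold", 40, "Spring 2013"), (2, "Bronze", 20, "Spring 2013")])
conn.executemany("INSERT INTO Attractions VALUES (?, ?, ?, ?)",
                 [("Glass museum", 0, 5, 10), ("Gold mine", 0, 5, 10), ("Palace", 0, 5, 10)])
conn.executemany("INSERT INTO Ticket_purchase VALUES (?, ?, ?)",
                 [(1, 2, "2013-03-15"), (2, 2, "2013-03-15"), (3, 1, "2013-03-15")])
# Visitor 1 visits three paid attractions, visitor 2 only one, visitor 3 holds Gold.
conn.executemany("INSERT INTO Attraction_visited VALUES (?, ?, ?, ?)",
                 [(1, 2, "Glass museum", "2013-03-15 10:00:00"),
                  (1, 2, "Gold mine", "2013-03-15 12:00:00"),
                  (1, 2, "Palace", "2013-03-15 15:00:00"),
                  (2, 2, "Palace", "2013-03-15 11:00:00"),
                  (3, 1, "Palace", "2013-03-15 09:00:00")])
print(count_bronze_overpayers(conn))  # 1
```

Visitor 1 paid 20 + 3 × 10 = 50 on Bronze but would have paid 40 on Gold, so only that visitor is counted.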

From what I can see, you overwrite the ticket price when you change it.
Because of this, you cannot look back at earlier ticket prices, and as I understand it, ticket prices can change.
You would have to add a field to Tickets for the date when the new price was created. Then you would not delete the old ticket price, just pull the newest one.
The same goes for anything else with a price: you can't overwrite it, just add a start date.
When you are ready to see how much a person spent, you can then see how much they spent on their tickets and attractions.
Once you do this, what you want will be possible.
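A minimal sketch of the "add a start date, never overwrite" idea from this answer, again in Python with sqlite3. The TicketPrice table and its valid_from column are hypothetical names. The lookup picks the newest price whose start date is on or before the date of interest, so older prices remain available for historical questions such as the Bronze-versus-Gold comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE TicketPrice (          -- hypothetical price-history table
    class      TEXT NOT NULL,       -- 'Gold', 'Silver' or 'Bronze'
    valid_from TEXT NOT NULL,       -- ISO date the price takes effect
    price      INTEGER NOT NULL,
    PRIMARY KEY (class, valid_from) -- one price per class per start date
)""")

# Price changes are appended; earlier rows are never updated or deleted.
conn.executemany("INSERT INTO TicketPrice VALUES (?, ?, ?)", [
    ("Bronze", "2013-01-01", 20),
    ("Bronze", "2013-06-01", 25),
    ("Gold",   "2013-01-01", 40),
    ("Gold",   "2013-06-01", 50),
])

def price_on(conn, ticket_class, day):
    """Return the price in force on `day`, or None if no price existed yet."""
    row = conn.execute(
        """SELECT price FROM TicketPrice
           WHERE class = ? AND valid_from <= ?
           ORDER BY valid_from DESC
           LIMIT 1""",
        (ticket_class, day),
    ).fetchone()
    return row[0] if row else None

print(price_on(conn, "Bronze", "2013-03-15"))  # 20
print(price_on(conn, "Bronze", "2013-07-01"))  # 25
```

The same pattern would apply to the per-attraction extra charges: give each charge row a start date and look it up the same way.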

Related

Should I use a name as primary key if there is nothing else to use, or should I create IDs for the entities that don't have anything usable as a PK?

I should specify that this is for a database assignment. I'm pretty good with SQL code, but the diagram aspect of the assignment is killing me; I think every step I take is wrong.
They have given us this scenario and these requirements:
A research team has asked you to create a database for a project on movie production
companies; the project aims to use machine learning, neural networks and other
methods to extract information about the situation of movie production companies in
Europe and the health of this sector for a set of specific countries, including the UK.
The data analytics application resulting from this project – which you DO NOT have to
develop; your job is to develop the central, server-side database that underpins it – has been commissioned by a research institute (which shall remain nameless), and it is
intended to be open source, and therefore available to anyone.
Basically, it is a machine learning application that would run on a database with the aim
to identify the correlation between different aspects of the sector, including funding
opportunities and development of new production companies or studios.
The database records every production company in Europe, including the name of the
company, the address, ZIP code, city, country, type of the company (e.g., non-profit
organisation), number of employees and net worth (calculated as total assets minus
total liabilities). Every production company has its name registered with one and only
one local government authority (for example, Companies House in the UK) on a specific
date; each company can have many shareholders. The authority typically requires
information about all the shareholders, including town of birth, mother’s maiden name,
father’s first name, their personal telephone number (only one), national insurance
number (each country in Europe has a similar unique ID), and passport number. Also,
the registration procedure has a cost associated with it (e.g., £12 in the UK).
The database also records the employees’ data for each company: each employee is
assumed to work for a single production company. Due to the complex structure of
movie production companies and the need for various skills and professions,
employees are categorised into crew and staff. The crew consists of three main groups:
the actors, the director(s) and those who work on other jobs relevant to the filming
(producers, editors, production designers, costume designers, composer, etc.). All
other employees belong to the staff group, including those responsible for HR,
advertising, etc. Employees are identified by an employee ID, first name, last name and
an optional middle name, date of birth and start date. Also, each employee has their
contact details recorded, whether it is a single phone number or multiple, with a
description associated with each of them. Each employee has a single email address,
too.
Members of the crew are paid hourly, and this is recorded in the database as well as a
bonus that depends on their contract. Actors get a bonus for each day of work and
another bonus for each scene completed; directors get a bonus at the end of the
shooting; crew members that work in other jobs relevant to the filming get a bonus at
the end of the shooting, and they have their role recorded as well (e.g., producer or
costume designer).
Staff members have the monthly salary and the working hours (e.g., full time 9-5).
Furthermore, each staff member belongs to a specific department (e.g., advertising),
which is located in a given building at a given address (both recorded in the database).
The database records all movies from each production company. More specifically, for
each movie the following information is recorded: a universal unique movie code (similar to the ISBN for books), the title of the movie, the year and the first release date
(different release dates are not important and should NOT be recorded).
Also, the database records each member of the crew that is part of the movie, and the
role they have in the movie: each crew member can play a single role or multiple roles
in the same movie, and each role has a description associated with it. For example, in
each movie there can be a single protagonist or more than one, the same actor can play
one or several roles, or even have a cameo.
One of the aims of the project is to provide insights on the impact of funding and grants
within the movie industry. To this end, the database should be able to record all the
funding that each production company receives. This must include the name of the
grant, the funding body (e.g., the government of a given country or European Union
grants such as the ERDF), the maximum amount for that grant and the deadline to
submit a proposal.
Then, for each company the database must record the date of the application to a given
grant, the amount requested, the outcome (successful/unsuccessful).
A grant can be given to a single production company or shared among several. Finally,
once the database is ready, the project will run a set of machine learning algorithms to
perform high level data analysis based on the different grants and their corresponding
impact with the aim to investigate the impacts of such funding against a list of criteria.
No additional information is provided at this stage from the project.
In the spec, the requirements are numbered from 1 to 5, as the scenario was not given at that time. The details of each requirement are as follows:
1. Each production company may have received one or multiple grants, and grants can be shared by more than one company.
2. It is possible for each employee to have more than one telephone number. Each telephone number has a description associated with it (e.g., personal, or work).
3. Each production company is registered only once but can have many shareholders.
4. Each employee can either be a member of the crew OR a staff member. Each crew member can be an actor OR a director OR have another role. Each staff member belongs to a department. No duplication of data is allowed.
5. Each crew member may be part of one or more movies in a single role or many.
Based on that, I have created this diagram.
I think I have all the entities, attributes and relationships down, but I'm missing the keys. Keys can't be names, right? I will use the company entity as an example: should I create new attributes like company_id to use as primary keys, or just underline the name attributes and use them as primary keys?
Also, please tell me if there's anything else wrong with the diagram.
Thanks a lot!
I created an ER diagram, but some entities don't have any attributes other than names that could be used as primary keys. I tried using the names, but I don't think it's right.
The problem with names as primary keys
In your diagram, you have a couple of names used to identify entities: Grant, Production Company, Shareholder (full name), Employee, Movie (Title). In theory you can use them as primary keys. However, this is bad practice:
names can change (e.g. departments and companies can be renamed, movies can have a temporary working title);
names are often not sufficient to distinguish entities (e.g. there may be different people having the same name, e.g. Adam Smith);
names can be spelled differently across sources of information, and are also easily misspelled;
although not really noticeable with modern RDBMSs, names are more time-consuming to search and consume more memory when used as foreign keys.
How to choose a primary key?
You'd better use a primary key that guarantees uniqueness. You can then easily decide whether the same name corresponds to a different entity or not.
The next question you'll face in your design is surrogate key vs. natural key:
When there's no other unique information, you'll have no choice but to use a surrogate key.
When there are other potentially unique attributes, you may choose to use either a natural key (e.g. company registration number, national insurance number together with a country code, movie code?) or a surrogate one.
Keep in mind that both have advantages and drawbacks, but the surrogate key is in general more robust, as natural keys sometimes turn out to be less stable than expected.
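A minimal sketch of the usual compromise, in Python with sqlite3: a surrogate company_id as primary key, with the natural key kept as a UNIQUE alternate key. The natural key here is assumed to be a country code plus a registration number, which the scenario does not spell out; table and column names are illustrative. Two companies may share a name, a duplicate registration is rejected, and renaming a company does not touch the rows that reference it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ProductionCompany (
    company_id          INTEGER PRIMARY KEY,   -- surrogate key, never changes
    name                TEXT NOT NULL,         -- not unique, may be renamed
    country_code        TEXT NOT NULL,
    registration_number TEXT NOT NULL,
    UNIQUE (country_code, registration_number) -- natural key as alternate key
);
CREATE TABLE Movie (
    movie_code TEXT PRIMARY KEY,               -- assumed universal movie code
    title      TEXT NOT NULL,
    company_id INTEGER NOT NULL REFERENCES ProductionCompany (company_id)
);
""")

# Two different companies with the same name are allowed.
conn.execute("INSERT INTO ProductionCompany VALUES (1, 'Nova Films', 'GB', '01234567')")
conn.execute("INSERT INTO ProductionCompany VALUES (2, 'Nova Films', 'FR', '987654321')")
conn.execute("INSERT INTO Movie VALUES ('M-0001', 'First Light', 1)")

# Registering the same company twice violates the alternate key.
try:
    conn.execute("INSERT INTO ProductionCompany VALUES (3, 'Nova Pictures', 'GB', '01234567')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A rename touches one row; Movie still points at company_id 1.
conn.execute("UPDATE ProductionCompany SET name = 'Nova Studios' WHERE company_id = 1")
owner = conn.execute("""
    SELECT c.name FROM Movie m JOIN ProductionCompany c ON c.company_id = m.company_id
    WHERE m.movie_code = 'M-0001'""").fetchone()[0]
print(duplicate_rejected, owner)  # True Nova Studios
```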
Other remarks about your ERD
By the way, here are some issues and other remarks:
The Works in relation does not relate Staff to anything else. From its name, it's obviously not a reflexive relation either, so this is a diagram error. Department (name) and building should either be attached to a Department entity or be attributes of Staff.
In several cases you relate attributes to other attributes (actor-extra role, phone number-description). This is also a diagramming error. Either add the extra attribute to the same entity, or there's a missing relationship with a missing entity.
In one case you relate two entities without a relation between the two (production company- application). This is an inconsistency that must be corrected also.
The following attributes are not real attributes but probably values of an unidentified attribute: producer, composer, actor, editor, xyzzy designer, advertising, HR, janitor.
Government authority is a misleading entity name: nowhere do you refer to data about the authority itself (name of the authority, e.g. "CNC", country of the government, ...). It's only information about the company's registration.
In your diagram you leave the hourly and monthly wages at the level of Employee. This does not accurately model the requirements, where only crew members are paid hourly and only staff members have a monthly salary.
The link between the receive funding relation and the Application entity, which both carry the same outcome attribute, seems very ambiguous.
In entity names, stay consistent: use either singular or plural. Mixing both will lead to lots of typos.
Better show cardinality in the link between the relation and the entity, than on the top of the relation: this avoids confusion about the direction of reading.
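Several of these points (a Department entity, the phone description living with the phone number, hourly and monthly pay on the subtypes) can be sketched as tables. This is one possible layout in Python with sqlite3, not the only correct one: all names are illustrative, most attributes from the scenario are left out, and repeating employee_type in each subtype table is an assumed technique for making Crew and Staff mutually exclusive through composite foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    building      TEXT NOT NULL,
    address       TEXT NOT NULL
);
CREATE TABLE Employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_type TEXT NOT NULL CHECK (employee_type IN ('crew', 'staff')),
    first_name    TEXT NOT NULL,
    middle_name   TEXT,                          -- optional
    last_name     TEXT NOT NULL,
    email         TEXT NOT NULL UNIQUE,          -- one address per employee
    UNIQUE (employee_id, employee_type)          -- target for subtype FKs
);
-- Many numbers per employee, each with its own description.
CREATE TABLE EmployeePhone (
    employee_id  INTEGER NOT NULL REFERENCES Employee (employee_id),
    phone_number TEXT NOT NULL,
    description  TEXT NOT NULL,                  -- e.g. 'personal', 'work'
    PRIMARY KEY (employee_id, phone_number)
);
-- Hourly pay belongs to crew only.
CREATE TABLE Crew (
    employee_id   INTEGER PRIMARY KEY,
    employee_type TEXT NOT NULL DEFAULT 'crew' CHECK (employee_type = 'crew'),
    hourly_rate   INTEGER NOT NULL,
    crew_group    TEXT NOT NULL CHECK (crew_group IN ('actor', 'director', 'other')),
    FOREIGN KEY (employee_id, employee_type) REFERENCES Employee (employee_id, employee_type)
);
-- Monthly salary and department belong to staff only.
CREATE TABLE Staff (
    employee_id    INTEGER PRIMARY KEY,
    employee_type  TEXT NOT NULL DEFAULT 'staff' CHECK (employee_type = 'staff'),
    monthly_salary INTEGER NOT NULL,
    working_hours  TEXT NOT NULL,                -- e.g. 'full time 9-5'
    department_id  INTEGER NOT NULL REFERENCES Department (department_id),
    FOREIGN KEY (employee_id, employee_type) REFERENCES Employee (employee_id, employee_type)
);
""")

conn.execute("INSERT INTO Department VALUES (1, 'Advertising', 'Building A', '1 Example Street')")
conn.execute("INSERT INTO Employee VALUES (1, 'crew', 'Ana', NULL, 'Lopez', 'ana@example.com')")
conn.execute("INSERT INTO Employee VALUES (2, 'staff', 'Ben', NULL, 'Ng', 'ben@example.com')")
conn.execute("INSERT INTO Crew (employee_id, hourly_rate, crew_group) VALUES (1, 45, 'actor')")
conn.execute("INSERT INTO Staff (employee_id, monthly_salary, working_hours, department_id) "
             "VALUES (2, 3000, 'full time 9-5', 1)")
conn.executemany("INSERT INTO EmployeePhone VALUES (?, ?, ?)",
                 [(1, "+44 20 0000 0001", "work"), (1, "+44 7000 000001", "personal")])

# A crew member cannot also get a Staff row: (1, 'staff') does not exist in Employee.
try:
    conn.execute("INSERT INTO Staff (employee_id, monthly_salary, working_hours, department_id) "
                 "VALUES (1, 2500, 'part time', 1)")
    overlap_rejected = False
except sqlite3.IntegrityError:
    overlap_rejected = True
print(overlap_rejected)  # True
```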
As a side remark, your question provides a wealth of interesting details that are not really needed to answer its core. In future questions, better limit yourself to the information directly related to your issue ;-)
Research or not, keep in mind that GDPR may apply and that it requires, inter alia, privacy by design (some of the information about shareholders and employees may require additional thought).

Is this database scheme correct? (ERM)

I'm in the process of designing a small database for a project and I'm not quite sure if everything is correct with my design or if there is something that needs to be addressed.
It is about managing events.
I'm especially unsure about the Participants table, because it contains user_id, which is also present on Order. The Participants table makes sense to me, though: on the one hand because ratings can only be created by participants, and on the other hand for future extensions (chat, etc.).
I appreciate any feedback.
Summing up our comment thread into principles.
Avoid having multiple paths to the same relationship.
For example, a Participant belongs to Event, but it also has many Tickets which belong to TicketCategories which can be for different Events.
Avoid redundancies.
For example, everything in OrderItem can be derived from the Order's Tickets.
Derive relationships through other relationships.
Users and Events are related through their Tickets and TicketCategories. Participant is redundant.
The schema would look something like this ("has many through" indicates a join).
User
has many Tickets
has many TicketCategories through Tickets
has many Events through TicketCategories
has many Orders
Event
has many TicketCategories
has many Tickets through TicketCategories
has many Users through Tickets
Ticket
belongs to User
belongs to Order
belongs to TicketCategory
has one Event through TicketCategory
TicketCategory
belongs to Event
has many Tickets
has many Users through Tickets
Order
belongs to User
has many Payments
has many Tickets
Payment
belongs to Order
has one User through Order
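A minimal sketch of how the "has many through" associations resolve to joins, in Python with sqlite3. Column names are assumptions, Payment is omitted, and table names are pluralised because USER and ORDER are reserved words in many SQL dialects. It shows a user's events (and an event's users) being derived from Tickets and TicketCategories, with no Participant table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Events (event_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE TicketCategories (                  -- belongs to Event
    ticket_category_id INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL REFERENCES Events (event_id),
    name     TEXT NOT NULL
);
CREATE TABLE Orders (                            -- belongs to User
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES Users (user_id)
);
CREATE TABLE Tickets (                           -- belongs to User, Order, TicketCategory
    ticket_id          INTEGER PRIMARY KEY,
    user_id            INTEGER NOT NULL REFERENCES Users (user_id),
    order_id           INTEGER NOT NULL REFERENCES Orders (order_id),
    ticket_category_id INTEGER NOT NULL REFERENCES TicketCategories (ticket_category_id)
);
""")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO Events VALUES (?, ?)", [(1, "Jazz Night"), (2, "Tech Talk")])
conn.executemany("INSERT INTO TicketCategories VALUES (?, ?, ?)",
                 [(1, 1, "Standard"), (2, 1, "VIP"), (3, 2, "Standard")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany("INSERT INTO Tickets VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 3), (4, 2, 2, 2)])

def events_for_user(conn, user_id):
    """User has many Events through Tickets and TicketCategories."""
    rows = conn.execute("""
        SELECT DISTINCT e.title
        FROM Tickets t
        JOIN TicketCategories c ON c.ticket_category_id = t.ticket_category_id
        JOIN Events e           ON e.event_id = c.event_id
        WHERE t.user_id = ?
        ORDER BY e.title""", (user_id,)).fetchall()
    return [r[0] for r in rows]

def users_for_event(conn, event_id):
    """Event has many Users through TicketCategories and Tickets."""
    rows = conn.execute("""
        SELECT DISTINCT u.name
        FROM Tickets t
        JOIN TicketCategories c ON c.ticket_category_id = t.ticket_category_id
        JOIN Users u            ON u.user_id = t.user_id
        WHERE c.event_id = ?
        ORDER BY u.name""", (event_id,)).fetchall()
    return [r[0] for r in rows]

print(events_for_user(conn, 1))   # ['Jazz Night', 'Tech Talk']
print(users_for_event(conn, 1))   # ['Alice', 'Bob']
```

Alice holds two Jazz Night tickets in different categories, which is why DISTINCT is needed.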
Participant which linked a User to an Event is gone. Now User and Event are linked by their Tickets through TicketCategory. If you want to have Event-specific User information, make a key/value table like so.
EventUserInfo
key
value (ideally JSON to store whatever)
belongs to User
belongs to Event
unique(key, User, Event)
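A minimal sketch of that key/value table in Python with sqlite3, assuming the Users and Events tables exist elsewhere (the foreign keys are left out to keep it short). The UNIQUE constraint lets an upsert overwrite a user's value for a given key and event instead of piling up duplicates; the ON CONFLICT ... DO UPDATE syntax needs SQLite 3.24 or later. The helper functions and the key name are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EventUserInfo (
    "key"    TEXT NOT NULL,
    "value"  TEXT NOT NULL,            -- JSON document
    user_id  INTEGER NOT NULL,         -- would reference Users
    event_id INTEGER NOT NULL,         -- would reference Events
    UNIQUE ("key", user_id, event_id)
)""")

UPSERT = """
INSERT INTO EventUserInfo ("key", "value", user_id, event_id)
VALUES (?, ?, ?, ?)
ON CONFLICT ("key", user_id, event_id) DO UPDATE SET "value" = excluded."value"
"""

def set_info(conn, user_id, event_id, key, value):
    """Insert or overwrite one piece of event-specific user data."""
    conn.execute(UPSERT, (key, json.dumps(value), user_id, event_id))

def get_info(conn, user_id, event_id, key):
    row = conn.execute(
        'SELECT "value" FROM EventUserInfo WHERE "key" = ? AND user_id = ? AND event_id = ?',
        (key, user_id, event_id),
    ).fetchone()
    return json.loads(row[0]) if row else None

set_info(conn, 1, 10, "seat_preference", {"seat": "aisle"})
set_info(conn, 1, 10, "seat_preference", {"seat": "window", "near_stage": True})  # overwrites
print(get_info(conn, 1, 10, "seat_preference"))  # {'seat': 'window', 'near_stage': True}
```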
OrderItem is also gone. Everything in it can be derived from an Order's Tickets. Because it belonged to TicketCategory, only Tickets could ever be ordered; without it, other things can potentially be ordered.
You may in the future wish to add something like OrderItem back to record the specific details about ordering that item (discount codes, special instructions, quantity, the actual sale price) but in that case a Ticket would belong to an OrderItem instead of an Order.
OrderItem
belongs to Order
has many Tickets
has one User through Order
price (the price at the time of sale)
discount code (specific to these items)
Ticket
belongs to OrderItem
belongs to User
belongs to TicketCategory
OrderItem.price is a good example of when we do want redundant data. This is a snapshot of the price at the time the order was placed. For example, if TicketCategory.price changes you don't want the price on existing orders to change.
Then you can order other things by having them also belong to OrderItem.
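A minimal sketch of the price snapshot, in Python with sqlite3. Table and column names are assumptions, the link from Ticket to User is omitted, prices are integer cents, and OrderItem.price is read as the unit price per ticket. The order copies the category's current price into OrderItem, so a later change to the category price leaves the order's total untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE TicketCategories (
    ticket_category_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price INTEGER NOT NULL                 -- current list price, in cents
);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE OrderItems (
    order_item_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL REFERENCES Orders (order_id),
    price         INTEGER NOT NULL,        -- unit price at the time of sale
    discount_code TEXT
);
CREATE TABLE Tickets (
    ticket_id          INTEGER PRIMARY KEY,
    order_item_id      INTEGER NOT NULL REFERENCES OrderItems (order_item_id),
    ticket_category_id INTEGER NOT NULL REFERENCES TicketCategories (ticket_category_id)
);
""")

def place_order(conn, user_id, category_id, quantity):
    """Create an order for `quantity` tickets, snapshotting today's price."""
    order_id = conn.execute("INSERT INTO Orders (user_id) VALUES (?)", (user_id,)).lastrowid
    (price,) = conn.execute(
        "SELECT price FROM TicketCategories WHERE ticket_category_id = ?", (category_id,)
    ).fetchone()
    item_id = conn.execute(
        "INSERT INTO OrderItems (order_id, price) VALUES (?, ?)", (order_id, price)
    ).lastrowid
    conn.executemany(
        "INSERT INTO Tickets (order_item_id, ticket_category_id) VALUES (?, ?)",
        [(item_id, category_id)] * quantity,
    )
    return order_id

def order_total(conn, order_id):
    """Sum the snapshotted unit price over every ticket in the order."""
    return conn.execute("""
        SELECT SUM(oi.price)
        FROM OrderItems oi JOIN Tickets t ON t.order_item_id = oi.order_item_id
        WHERE oi.order_id = ?""", (order_id,)).fetchone()[0]

conn.execute("INSERT INTO TicketCategories VALUES (1, 'Standard', 5000)")
order_id = place_order(conn, user_id=1, category_id=1, quantity=2)
conn.execute("UPDATE TicketCategories SET price = 6000 WHERE ticket_category_id = 1")
print(order_total(conn, order_id))  # 10000: the old price still applies
```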

Schema for storing payment attempt records in SQL

I looked at similar Stack Overflow posts, and it seems that questions asking for input on a schema are acceptable here. Also, I'm a software developer, not a DB expert by trade, so hopefully this is well received.
I'm using SQL Server, though I think this question is generic enough that it might be applicable to pretty much any SQL product as it pertains to what's the best schema for my scenario.
I'm writing a referral payment system whereby stores may credit and pay back individuals who refer customers. The entities are:
Referrer: the one to be paid for referring customers,
Referral: the customer that was referred
Referral Purchase: The amount and date of the referral's purchase.
Admin: the one doing the paying.
When determining what to pay the referrer I need to tally up all of the referral purchases that have not been credited. The sum at the time of the pay out attempt is what gets paid.
The confounding part of this whole thing is that when an Admin makes a payment, it may fail for any number of reasons (insufficient funds, the referrer gave bad PayPal information, etc.). All of this needs to be stored so that I can not only look back over past payment attempts and determine the failures and what referral purchases were involved in the failure, but also to determine which referral purchases have yet to be credited to the referrer.
The best schema I've been able to devise is the following:
The point here is that each PaymentAttempt holds the status of the payment attempt (success/failure) and each Referral Purchase that was credited in the payment attempt has a link table which associates it with the payment attempt. One referral purchase may, then, be involved in any number of attempts to credit the referrer, with the last one being the successful attempt.
Ultimately my question comes down to this: when I need to go back and then determine how much the referrer needs to be paid at a later date, is it going to be a pain in years to come if I need to query ALL of the ReferralPurchases associated with the referrer, then join ALL of the ReferralPurchase/PaymentAttempt link tables, then join the associated PaymentAttempt status tables to find out which of the referral purchases have yet to be credited? I could see myself needing to create pretty weird queries just to find those five purchases that have yet to be credited.
Alternatively I could update the ReferralPurchase itself with a status flag, but is this considered "asking for it" in terms of data integrity (I think I could see some saying this is poor design since the state could be queried in other ways, and perhaps a bug might result in the bit being set without proper records to warrant it)? Is that bad design?
Or is there some better way to lay things out?
I'll try my best to help; hopefully I understand your question correctly. If I were designing the system, two tables would stand out for me. The tables and their columns are:
ReferralPurchase
• ReferralPurchase_Id (PK)
• Referrer_Id (Pointing to a person table)
• Referral_Id (Pointing to a person table)
Payment
• Payment_Id (PK)
• ReferralPurchase_Id (FK)
• AmountToBePaid
• StatusOfPayment
• DateLogged
• DatePaymentMade (Null if status is not successful)
• Admin_Id (Pointing to a person table)
Ben, I'm not sure what you mean by a status field. I would steer away from lifecycle status fields, but would consider a boolean. For example:
An isPaid flag on ReferralPurchase would seem like a reasonable approach. It should only be updated on a confirmed payment, and if there is a query on why it has been set, the evidence will exist in the form of history from the PaymentAttempt and link tables. This would simplify queries of outstanding payments, and pending payments would just be incomplete PaymentAttempts. There is the theoretical possibility that the history could contradict the value of the flag.
Alternatively, you could have an isSuccessful flag on the link table, which is "closer to source", if I can put it that way: it cannot as easily conflict with the history, because it is the history itself (as long as the code does not allow more than one row to be marked isSuccessful for a given ReferralPurchase, for example). Outstanding payments are then just those ReferralPurchases for which no isSuccessful link record exists.
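A minimal sketch of the isSuccessful-on-the-link-table option, in Python with sqlite3. Table and column names are assumptions, and PaymentAttempt's other columns (admin, failure reason, etc.) are left out. A partial unique index, which SQLite supports and SQL Server offers as a filtered index, stops a purchase from being marked successful twice, and outstanding purchases are found with NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ReferralPurchase (
    referral_purchase_id INTEGER PRIMARY KEY,
    referrer_id          INTEGER NOT NULL,
    amount               INTEGER NOT NULL          -- in cents
);
CREATE TABLE PaymentAttempt (
    payment_attempt_id INTEGER PRIMARY KEY,
    attempted_at       TEXT NOT NULL
);
CREATE TABLE PaymentAttemptPurchase (                -- the link table
    payment_attempt_id   INTEGER NOT NULL REFERENCES PaymentAttempt (payment_attempt_id),
    referral_purchase_id INTEGER NOT NULL REFERENCES ReferralPurchase (referral_purchase_id),
    is_successful        INTEGER NOT NULL CHECK (is_successful IN (0, 1)),
    PRIMARY KEY (payment_attempt_id, referral_purchase_id)
);
-- At most one successful link row per purchase.
CREATE UNIQUE INDEX one_success_per_purchase
    ON PaymentAttemptPurchase (referral_purchase_id) WHERE is_successful = 1;
""")

def outstanding_purchases(conn, referrer_id):
    """Purchases for this referrer with no successful payment attempt yet."""
    return conn.execute("""
        SELECT rp.referral_purchase_id, rp.amount
        FROM ReferralPurchase rp
        WHERE rp.referrer_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM PaymentAttemptPurchase l
              WHERE l.referral_purchase_id = rp.referral_purchase_id
                AND l.is_successful = 1)
        ORDER BY rp.referral_purchase_id""", (referrer_id,)).fetchall()

conn.executemany("INSERT INTO ReferralPurchase VALUES (?, ?, ?)",
                 [(1, 7, 1000), (2, 7, 2500), (3, 7, 400)])
conn.executemany("INSERT INTO PaymentAttempt VALUES (?, ?)",
                 [(1, "2024-01-05"), (2, "2024-01-12"), (3, "2024-01-19")])
# Attempt 1 failed for purchases 1 and 2; attempt 2 paid purchase 1 only.
conn.executemany("INSERT INTO PaymentAttemptPurchase VALUES (?, ?, ?)",
                 [(1, 1, 0), (1, 2, 0), (2, 1, 1)])

# Marking purchase 1 as paid a second time is rejected by the partial index.
try:
    conn.execute("INSERT INTO PaymentAttemptPurchase VALUES (3, 1, 1)")
    second_success_rejected = False
except sqlite3.IntegrityError:
    second_success_rejected = True

outstanding = outstanding_purchases(conn, 7)
print(outstanding)                               # [(2, 2500), (3, 400)]
print(sum(amount for _, amount in outstanding))  # 2900
```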
Others will have different views on this. Let us know which way you go.

What is the recommended database schema for overlapping rules (such as price changes)?

Envision the rules that govern the price of a hotel room.
In general, $100 a night
On Fridays or Saturdays, $120
In the summer months, $150
For a special next week, $80
Etc.
Given a database of hotel rooms with varying rules like this, how would you model this in the database so that you can quickly and easily modify and query the price at a given time?
You need to define an order of priority. Then you store each rule with its priority and its criteria (for instance, a from/to date range plus a weekday bitmap), and you pick the matching rule with the highest priority.
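A minimal sketch of the priority approach in Python with sqlite3, using the rules from the question. The table layout is an assumption: each rule has a priority, an optional date range and an optional weekday bitmap in which bit n stands for strftime('%w') day n (0 = Sunday, so Friday and Saturday are bits 5 and 6). Rules are hotel-wide here; a room_id column could scope them to individual rooms. The dates for "summer" and "next week" are made up.

```python
import sqlite3

FRI_SAT = (1 << 5) | (1 << 6)   # bitmap for Friday and Saturday

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE PriceRule (
    rule_id      INTEGER PRIMARY KEY,
    description  TEXT NOT NULL,
    priority     INTEGER NOT NULL UNIQUE,   -- higher wins; unique avoids ties
    price        INTEGER NOT NULL,
    date_from    TEXT,                      -- NULL = no lower bound
    date_to      TEXT,                      -- NULL = no upper bound
    weekday_mask INTEGER                    -- NULL = any day of the week
)""")
conn.executemany(
    "INSERT INTO PriceRule (description, priority, price, date_from, date_to, weekday_mask) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [("general",  0, 100, None,         None,         None),
     ("Fri/Sat", 10, 120, None,         None,         FRI_SAT),
     ("summer",  20, 150, "2024-06-01", "2024-08-31", None),
     ("special", 30,  80, "2024-03-18", "2024-03-24", None)])

def nightly_price(conn, day):
    """Price of the highest-priority rule matching the ISO date `day`."""
    row = conn.execute("""
        SELECT price FROM PriceRule
        WHERE (date_from IS NULL OR :day >= date_from)
          AND (date_to   IS NULL OR :day <= date_to)
          AND (weekday_mask IS NULL
               OR (weekday_mask & (1 << CAST(strftime('%w', :day) AS INTEGER))) != 0)
        ORDER BY priority DESC
        LIMIT 1""", {"day": day}).fetchone()
    return row[0] if row else None

print(nightly_price(conn, "2024-03-13"))  # 100 (Wednesday)
print(nightly_price(conn, "2024-03-15"))  # 120 (Friday)
print(nightly_price(conn, "2024-03-22"))  # 80 (Friday inside the special week)
print(nightly_price(conn, "2024-07-12"))  # 150 (Friday in summer)
```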
I guess there are multiple ways you could do that, but the one I'm most familiar with is to store 'date-from' and 'date-to' attributes in the table along with the corresponding price for that period. Then, when querying, you can specify sysdate (or any other desired date) in the WHERE clause to retrieve the correct price.
Alternatively, if you had the same rules for all rooms in the hotel, you could create a separate table with the rules (date-from, date-to, price or % change in price). This would be a more normalized way of doing it, but it would mean having the same rules for all rooms.
It all depends on what the business rules are, really.
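A minimal sketch of the per-room date-range approach from this answer, in Python with sqlite3. RoomRate and its columns are hypothetical names. SQLite has no sysdate, so date('now') plays that role when no date is given. The sketch assumes the ranges for one room do not overlap, which the schema itself does not enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE RoomRate (
    room_id   INTEGER NOT NULL,
    date_from TEXT NOT NULL,     -- inclusive ISO date
    date_to   TEXT NOT NULL,     -- inclusive ISO date
    price     INTEGER NOT NULL,
    PRIMARY KEY (room_id, date_from)
)""")
conn.executemany("INSERT INTO RoomRate VALUES (?, ?, ?, ?)", [
    (101, "2024-01-01", "2024-05-31", 100),
    (101, "2024-06-01", "2024-08-31", 150),
    (101, "2024-09-01", "2024-12-31", 100),
])

def room_price(conn, room_id, day=None):
    """Price for `room_id` on ISO date `day` (today if omitted), or None."""
    row = conn.execute("""
        SELECT price FROM RoomRate
        WHERE room_id = ?
          AND COALESCE(?, date('now')) BETWEEN date_from AND date_to""",
        (room_id, day)).fetchone()
    return row[0] if row else None

print(room_price(conn, 101, "2024-07-04"))  # 150
print(room_price(conn, 101, "2025-01-15"))  # None: no rate defined
```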

Is this a correct schema design for what I need to accomplish?

Pawnshop business model:
CLIENTES (customer table), LOTES (lot table), ARTICULOS (item table) and TRANSACCIONES (transaction table).
The reason I defined a lot table is that when customers pawn or sell items, the pawnshop groups all these items into one lot, calculates the total loan or purchase amount, stores these values under one transaction, and prints the ticket with a description of all the items and the total amount. So I need to support this case: if the customer defaults on interest payments or does not redeem the pawn, the customer forfeits all items, and the pawnshop may choose to sell some items to a gold refinery and/or transfer other non-gold items to inventory to sell to the public. In other words, I want the ability to break a lot out into its individual items.
Would the ER model above be adequate for this capability?
From the point of view of a logical model, you probably don't want store_id on the lot (as it comes from the customer) or on the transactions or articles (as they get it through the lot and customer). At the physical level you might have those as attributes (this is called denormalisation), but then you run the risk of data showing, for example, LOT 1234 belonging to CUSTOMER C12 and at STORE S1, while the customer table has C12 at store S2.
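A minimal sketch of the logical-model point in Python with sqlite3: store_id is stored only on the customer, and an item's store is derived through its lot and customer, so the inconsistency described above cannot arise. The table names follow the question; the column names and data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CLIENTES (cliente_id TEXT PRIMARY KEY, store_id TEXT NOT NULL);
CREATE TABLE LOTES (
    lote_id    INTEGER PRIMARY KEY,
    cliente_id TEXT NOT NULL REFERENCES CLIENTES (cliente_id)
);
CREATE TABLE ARTICULOS (
    articulo_id INTEGER PRIMARY KEY,
    lote_id     INTEGER NOT NULL REFERENCES LOTES (lote_id),
    description TEXT NOT NULL
);
""")
conn.execute("INSERT INTO CLIENTES VALUES ('C12', 'S1')")
conn.execute("INSERT INTO LOTES VALUES (1234, 'C12')")
conn.execute("INSERT INTO ARTICULOS VALUES (1, 1234, 'gold ring')")

def store_of_article(conn, articulo_id):
    """Derive an item's store through its lot and customer; nothing is copied."""
    return conn.execute("""
        SELECT c.store_id
        FROM ARTICULOS a
        JOIN LOTES l    ON l.lote_id = a.lote_id
        JOIN CLIENTES c ON c.cliente_id = l.cliente_id
        WHERE a.articulo_id = ?""", (articulo_id,)).fetchone()[0]

before = store_of_article(conn, 1)
conn.execute("UPDATE CLIENTES SET store_id = 'S2' WHERE cliente_id = 'C12'")
after = store_of_article(conn, 1)
print(before, after)  # S1 S2
```

The trade-off is that the derived store follows the customer; if a lot should stay with the store where it was pawned, recording a separate store id on the lot becomes the appropriate choice.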
Of course it is possible that you allow Mr Smith to pawn an item at one store but make payments on it at another. Or perhaps an item might be pawned at one store but physically relocated to a different one for security or space reasons. If so, then it is appropriate to have distinct store ids on these entities.
However that doesn't sit comfortably with the 'store' being an attribute of the customer, since that implies they have a relationship with only one store.
Also consider what happens if MR P BROKER has three stores, but decides to close one and move the business to one of the others. You need to merge the stores but do you update the store id on all the transactions and articles and lots (including ones that are 'in progress' and those redeemed) or do you leave them with the original store id ?
Another common data modelling issue is identifying customers. Is Mr Smith one customer and Mrs Smith another, or can Mr and Mrs Smith be 'parts' of the same customer ? If Mr Smith pawns something, can Mrs Smith redeem it ? I'm thinking family squabbles, disputed heirlooms.... Perhaps she can't redeem it, but can make payments on it.
If an item (e.g. a watch) is included in one lot, then redeemed, then included in a different lot, does it get a different item_id?
When a client buys an article offered to the general public, is that a transaction? Or does your database only track transactions about lots?
Can an item exist in your system without being part of any lot? You can't express that fact in the ER model you've presented.
Your ER model doesn't show any many-to-many relationships. That makes me suspicious. I've never worked in a pawnshop, so I can't say for sure. But every other enterprise database I've ever seen has at least one many-to-many relationship. Sometimes a relationship is treated as though it were an entity, and appears with a box of its own. But that box would be on the "infinity" end of more than one relationship, something I don't see in your diagram.
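A minimal sketch of such an association box in Python with sqlite3, assuming an item keeps one articulo_id for life. ARTICULOS no longer carries a lot id, and a hypothetical LOTE_ARTICULOS table (on the "infinity" end of two relationships) records which lots each item has been in. This lets an item exist outside any lot and reappear in a later lot under the same id. Column names and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE LOTES (lote_id INTEGER PRIMARY KEY, created_on TEXT NOT NULL);
CREATE TABLE ARTICULOS (articulo_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE LOTE_ARTICULOS (                    -- association entity
    lote_id     INTEGER NOT NULL REFERENCES LOTES (lote_id),
    articulo_id INTEGER NOT NULL REFERENCES ARTICULOS (articulo_id),
    PRIMARY KEY (lote_id, articulo_id)
);
""")
conn.executemany("INSERT INTO LOTES VALUES (?, ?)", [(1, "2024-01-10"), (2, "2024-05-02")])
conn.executemany("INSERT INTO ARTICULOS VALUES (?, ?)",
                 [(10, "watch"), (11, "gold chain"), (12, "guitar")])
# The watch is pawned in lot 1, redeemed, then pawned again in lot 2.
conn.executemany("INSERT INTO LOTE_ARTICULOS VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

def lots_of_item(conn, articulo_id):
    """All lots an item has been part of, oldest first."""
    rows = conn.execute("""
        SELECT l.lote_id
        FROM LOTE_ARTICULOS la JOIN LOTES l ON l.lote_id = la.lote_id
        WHERE la.articulo_id = ?
        ORDER BY l.created_on""", (articulo_id,)).fetchall()
    return [r[0] for r in rows]

# Items that exist but have never been part of any lot.
unlotted = [r[0] for r in conn.execute("""
    SELECT a.articulo_id FROM ARTICULOS a
    WHERE NOT EXISTS (SELECT 1 FROM LOTE_ARTICULOS la
                      WHERE la.articulo_id = a.articulo_id)""")]

print(lots_of_item(conn, 10), unlotted)  # [1, 2] [12]
```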
Good luck.