How to prevent a clutter of DTO objects? - api

I am using the DTO pattern.
I am using an Auto Mapper library to help map domain objects to DTO objects, and it works well.
Now that my application is getting bigger, I find myself needing many different DTO objects to support different data needs.
For example, let's say my application displays a list of employees, and each employee has an age property and a salary property.
In my UI, one page shows just a list of all the employees; another shows a list of the department names the employees are in and the number of employees in each one; a third shows statistics for each department: average salary, average age, total employee salary, etc.
Consider that I have many departments and many employees (let's say millions of employees and thousands of departments, too many for the client to calculate the statistics on its own).
My question is: how would you build an API that serves the client without creating a great many DTOs, and without making unnecessary calculations?
For example, in one flow the API would count the employees in a department without calculating the average salary, because the average salary is not needed there and skipping it makes the API respond quicker; in another flow it would calculate both.
Are there any other patterns to make this more efficient?

GraphQL is one potential option. It's a tool developed by Facebook for retrieving specific parts of their domain without retrieving the whole object.
This works well for retrieving pre-existing domain data. It doesn't work as well for calculating data (you mention count of employees and avg salary). As long as these are defined attributes (of department) you're fine.
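The underlying idea, letting the client name the fields it needs so that only those are computed, can be sketched without GraphQL itself. The following is a minimal illustration in Python with SQLite, not a GraphQL implementation; the `employee` table, the field names in `AGGREGATES` and the `department_stats` function are all assumptions made for the example.

```python
import sqlite3

# Hypothetical whitelist: field name requested by the client -> SQL aggregate.
AGGREGATES = {
    "employee_count": "COUNT(*)",
    "avg_salary": "AVG(salary)",
    "avg_age": "AVG(age)",
    "total_salary": "SUM(salary)",
}

def department_stats(conn, fields):
    """Return one dict per department containing only the requested fields."""
    unknown = [f for f in fields if f not in AGGREGATES]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    # Only whitelisted SQL fragments are interpolated, never client text.
    columns = ["department"] + [f"{AGGREGATES[f]} AS {f}" for f in fields]
    sql = (
        f"SELECT {', '.join(columns)} FROM employee "
        "GROUP BY department ORDER BY department"
    )
    rows = conn.execute(sql).fetchall()
    return [dict(zip(["department", *fields], row)) for row in rows]

# Sample data, invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT, department TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("Ann", "HR", 30, 4000.0), ("Bob", "HR", 40, 6000.0), ("Cid", "IT", 50, 7000.0)],
)

counts_only = department_stats(conn, ["employee_count"])  # no average computed
full_stats = department_stats(conn, ["employee_count", "avg_salary"])
print(counts_only)
print(full_stats)
```

With this shape a single generic response (a list of dictionaries) replaces one DTO per page, and the database only evaluates the aggregates that were asked for.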


Should I use a name as primary key if there is nothing else to use, or should I create IDs for the entities that don't have anything usable as a PK?

I have to specify that this is for a database assignment. I'm pretty good with SQL code but the diagram aspect of the assignment is killing me, I think that every step I take is wrong.
They have given us this scenario and these requirements:
A research team has asked you to create a database for a project on movie production
companies; the project aims to use machine learning, neural networks and other
methods to extract information about the situation of movie production companies in
Europe and the health of this sector for a set of specific countries, including the UK.
The data analytics application resulting from this project – which you DO NOT have to
develop; your job is to develop the central, server-side database that underpins it – has been commissioned by a research institute (which shall remain nameless), and it is
intended to be open source, and therefore available to anyone.
Basically, it is a machine learning application that would run on a database with the aim
to identify the correlation between different aspects of the sector, including funding
opportunities and development of new production companies or studios.
The database records every production company in Europe, including the name of the
company, the address, ZIP code, city, country, type of the company (e.g., non-profit
organisation), number of employees and net worth (calculated as total assets minus
total liabilities). Every production company has its name registered with one and only
one local government authority (for example, Companies House in the UK) on a specific
date; each company can have many shareholders. The authority typically requires
information about all the shareholders, including town of birth, mother’s maiden name,
father’s first name, their personal telephone number (only one), national insurance
number (each country in Europe has a similar unique ID), and passport number. Also,
the registration procedure has a cost associated with it (e.g., 12£ in the UK).
The database also records the employees’ data for each company: each employee is
assumed to work for a single production company. Due to the complex structure of
movie production companies and the need for various skills and professions,
employees are categorised into crew and staff. The crew consists of three main groups:
the actors, the director(s) and those who work on other jobs relevant to the filming
(producers, editors, production designers, costume designers, composer, etc.). All
other employees belong to the staff group, including those responsible for HR,
advertising, etc. Employees are identified by an employee ID, first name, last name and
an optional middle name, date of birth and start date. Also, each employee has their
contact details recorded, whether it is a single phone number or multiple, with a
description associated with each of them. Each employee has a single email address.
Members of the crew are paid hourly, and this is recorded in the database as well as a
bonus that depends on their contract. Actors get a bonus for each day of work and
another bonus for each scene completed; directors get a bonus at the end of the
shooting; crew members that work in other jobs relevant to the filming get a bonus at
the end of the shooting, and they have their role recorded as well (e.g., producer or
costume designer).
Staff members have the monthly salary and the working hours (e.g., full time 9-5).
Furthermore, each staff member belongs to a specific department (e.g., advertising),
which is located in a given building at a given address (both recorded in the database).
The database records all movies from each production company. More specifically, for
each movie the following information is recorded: a universal unique movie code (similar to the ISBN for books), the title of the movie, the year and the first release date
(different release dates are not important and should NOT be recorded).
Also, the database records each member of the crew that is part of the movie, and the
role they have in the movie: each crew member can play a single role or multiple roles
in the same movie, and each role has a description associated with it. For example, in
each movie there can be a single protagonist or more than one, the same actor can play
one or several roles, or even have a cameo.
One of the aims of the project is to provide insights on the impact of funding and grants
within the movie industry. To this end, the database should be able to record all the
funding that each production company receives. This must include the name of the
grant, the funding body (e.g., the government of a given country or European Union
grants such as the ERDF), the maximum amount for that grant and the deadline to
submit a proposal.
Then, for each company the database must record the date of the application to a given
grant, the amount requested, the outcome (successful/unsuccessful).
A grant can be given to a single production company or shared among several. Finally,
once the database is ready, the project will run a set of machine learning algorithms to
perform high level data analysis based on the different grants and their corresponding
impact with the aim to investigate the impacts of such funding against a list of criteria.
No additional information is provided at this stage from the project.
In the spec, the requirements are numbered from 1 to 5, as the scenario was not given
at that time. The details of each requirement are provided in the following:
1. Each production company may have received one or multiple grants, and grants
can be shared by more than one company.
2. It is possible for each employee to have more than one telephone number. Each
telephone number has a description associated with it (e.g., personal, or work).
3. Each production company is registered only once but can have many shareholders.
4. Each employee can either be a member of the crew OR a staff member. Each crew
member can be an actor OR a director OR have another role. Each staff member
belongs to a department. No duplication of data is allowed.
5. Each crew member may be part of one or more movies in a single role or many.
Based on that I have created THIS DIAGRAM.
I think I have all the entities, attributes and relationships down, but I'm missing the keys. Keys can't be names, right? I will use the company entity as an example. Should I create new attributes like company_id to use as primary keys, or just underline the name attribute and use it as the primary key?
Also, please tell me if there's anything else wrong with the diagram.
Thanks a lot!
I created an ER diagram, but some entities don't have attributes that can be used as primary keys, because the only candidates are names. I tried using them but I don't think it's right.
The problem with names as primary keys
In your diagram, you have several names used to identify entities: Grant, Production Company, Shareholder (full name), Employee, Movie (Title). You can in theory use them as primary keys. However, this is bad practice:
names can change (e.g. departments and companies can be renamed, movies can have a temporary working title);
names are often not sufficient to distinguish entities (e.g. there may be different people with the same name, such as Adam Smith);
names can be spelled differently across sources of information, and are also easily misspelled;
although not really noticeable with modern RDBMSs, names are more time-consuming to search, and consume more memory when used as foreign keys.
How to choose a primary key?
You'd better use a primary key that guarantees uniqueness. You can then easily decide whether the same name corresponds to a different entity or not.
The next question that you'll face in your design is surrogate key vs. natural key:
When there's no other unique information, you'll have no choice but to use a surrogate key.
When there are other potentially unique attributes, you may choose to use either a natural key (e.g. company registration number, national insurance number together with a country code, movie code?) or a surrogate one.
Keep in mind that both have advantages and drawbacks, but the surrogate key is in general more robust, as natural keys sometimes turn out to be less stable than expected.
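As a rough illustration of the surrogate-key option, here is a minimal sketch in Python with SQLite. The table layout, the `registration_no` column and the sample companies are assumptions made for the example and are not taken from the assignment; the point is only that the name stays an ordinary attribute while uniqueness is enforced elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE production_company (
        company_id      INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
        name            TEXT NOT NULL,        -- not a key: may change or collide
        country_code    TEXT NOT NULL,
        registration_no TEXT NOT NULL,        -- hypothetical natural identifier
        UNIQUE (country_code, registration_no)
    )
""")

insert = (
    "INSERT INTO production_company (name, country_code, registration_no) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert, ("Sunrise Films", "GB", "0001"))
conn.execute(insert, ("Sunrise Films", "FR", "0001"))  # same name, different company

# A rename touches one row; foreign keys would reference company_id, not the name.
conn.execute(
    "UPDATE production_company SET name = 'Sunrise Pictures' WHERE company_id = 1"
)

# The natural identifier is still protected by the UNIQUE constraint.
try:
    conn.execute(insert, ("Other Name", "GB", "0001"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

companies = conn.execute(
    "SELECT company_id, name, country_code FROM production_company ORDER BY company_id"
).fetchall()
print(companies, duplicate_rejected)
```

Keeping a UNIQUE constraint on the natural identifier alongside the surrogate key is one common compromise: the key never changes, yet duplicates are still rejected.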
Other remarks about your ERD
By the way, here are some issues and other remarks:
The Works in relation does not relate Staff to anything else. From the name, it's obviously not a reflexive relation either, so this is a diagram error. Department (name) and building should either be attached to a Department entity or be attributes of Staff.
In several cases you relate attributes to other attributes (actor-extra role, phone number-description). This is also a diagramming error. Either add the extra attribute to the same entity, or there's a missing relationship with a missing entity.
In one case you link two entities directly, without a relation between the two (production company-application). This is an inconsistency that must also be corrected.
The following attributes are not real attributes but probably values of an unidentified attribute: producer, composer, actor, editor, xyzzy designer, advertising, HR, janitor.
Government authority is a misleading entity name: nowhere do you refer to data about the authority itself (name of the authority, e.g. "CNC", country of the government, ...). It's only information about the company's registration.
In your diagram you leave the hourly and monthly wage at the level of the Employee. This does not accurately model the requirements (crew members are paid hourly, staff members have a monthly salary).
The link between the relation receive funding and the entity Application, with the same attribute outcome, seems very ambiguous.
In the names of the entities, stay consistent: either singular or plural. Mixing both will lead to lots of typos.
Better to show the cardinality on the link between the relation and the entity than on top of the relation: this avoids confusion about the direction of reading.
As a side remark, your question provides a wealth of interesting details that are not really needed to answer the core of your question. Better to limit yourself to the information directly related to your issue in your next questions ;-)
Research or not, keep in mind that GDPR may apply and that it requires, inter alia, privacy by design (some information about the shareholders and the employees may require some additional thought).

SQL - MS Access Form design - add data of ISA relationships

I'm taking a DBMS course and I need to design and build my own DB. I have a database for a hospital where doctors, nurses, support staff etc. are in an ISA relationship to an Employee entity, which holds the name, address, salary and the rest of the employee data.
In designing a form, I want to be able to add an employee with all of their data in one form.
Is there a way to do a "conditional table" of sorts, where if I select "doctor" from a drop-down I get to add to the doctor table too, and the same for the rest of the entities under the ISA relationship?
As a general rule, when dealing with data, you do NOT flip or switch tables for a given form or relational database design.
So, for example, say I have a table of customers. Well, now if I want to mark some of the customers as plumbers, and others as doctors? I don't create two tables.
All I would do is add ONE column to that customers table, and it would simply allow me to set the type of customer. There are many reasons for this design, but some significant ones are:
For each new type of customer, you would not create a new table. Worse, all of the forms, the reports, the SQL, the code you write? Well, all of that code would have to be modified EACH time you create a new table. So, you SIMPLY cannot adopt a design in which the concept of changing a table is part of that process.
Forms are bound to ONE table. For related data, you will in most cases use a sub form.
So, think of even an accounting system. It can have huge numbers of customers, and as a result, you can "query" that table to give you all customers. Or you might ask how many accounting firms are in the customer list. Or make a report that summarizes, by customer type, a "count" of each type of customer.
So, building forms, or reports? They cannot on the fly "change" the tables they are using.
So, suppose that in place of ONE sales table you had a separate table of sales for each month. Well, now you can't query sales from Jan to Mar, because the data is in different tables.
So, what you do is have ONE table called "sales", and you add ONE column for the date. Now, at the start of each new month, you don't have to create a new table.
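To make the single "sales" table concrete, here is a small sketch. It uses SQLite through Python rather than MS Access, purely so that it can be run anywhere; the column names and the sample rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    " sale_id INTEGER PRIMARY KEY,"
    " sale_date TEXT NOT NULL,"   # ISO dates, so text comparison follows date order
    " amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO sales (sale_date, amount) VALUES (?, ?)",
    [
        ("2023-01-15", 100.0),
        ("2023-02-10", 250.0),
        ("2023-03-05", 50.0),
        ("2023-04-01", 75.0),
    ],
)

# January to March: one query against one table.
jan_to_mar = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE sale_date >= ? AND sale_date < ?",
    ("2023-01-01", "2023-04-01"),
).fetchone()[0]

# Per-month totals come from grouping on the same date column.
per_month = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
    "FROM sales GROUP BY month ORDER BY month"
).fetchall()

print(jan_to_mar)
print(per_month)
```

A new month needs no schema change at all: rows with later dates simply show up in the same queries.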
Now, of course in some cases it makes sense to create a separate table. For example, a table of customers and a table of employees in a database is just fine. It makes sense in this case to use two tables, since the information about a customer, and what they can do, is VERY different from how you would deal with employees.
So, with the above? Well, if I need to print mailing labels for all customers and all employees? That would require two different reports. And very likely the table structure for the two tables is different.
Bottom line:
If you are working on a design, form or report, and you need to try to change the table that the form/report/code etc. is going to operate on? This is a sign that your design approach has gone completely off the rails and is the wrong design.
So, in the case of doctors, nurses etc.? Well, they are all hospital staff, and MOST of the basic information about such employees will be common, much the same, and thus a SINGLE table of "employees" makes the most sense. You would only need a nice "employee type" combo box on that one form, and thus you can add/enter/edit/search any employee in that one table.
The fact that you "want to search" for an employee shows that all these people "are" employees and thus belong in one table. And the basic information about all employees is going to be the same anyway. If you find you are attempting to create a new table with near-identical structure over and over, just like a new table for each month's sales, or a new table for each new kind of employee? Simply add the "one" column that allows you to make that distinction, and not a whole new table.
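The single "employees" table with a type column might look like the following sketch. It again uses SQLite through Python instead of Access, and the column names, the allowed type values and the sample staff are assumptions for the example; in Access the `employee_type` column would be what the combo box is bound to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        salary        REAL NOT NULL,
        -- One column distinguishes the kinds of employee.
        employee_type TEXT NOT NULL
            CHECK (employee_type IN ('doctor', 'nurse', 'support'))
    )
""")
conn.executemany(
    "INSERT INTO employee (name, salary, employee_type) VALUES (?, ?, ?)",
    [
        ("Dana", 9000.0, "doctor"),
        ("Eli", 5000.0, "nurse"),
        ("Fay", 5200.0, "nurse"),
        ("Gus", 3000.0, "support"),
    ],
)

# A report "by type" is a plain GROUP BY on the one table.
count_by_type = dict(
    conn.execute("SELECT employee_type, COUNT(*) FROM employee GROUP BY employee_type")
)

# Searching within one type is just a filter on the same column.
nurses = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE employee_type = ? ORDER BY name", ("nurse",)
    )
]

print(count_by_type)
print(nurses)
```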
Now one COULD even attempt to put patients in the same table, but then again, dealing with patients as opposed to employees is a considerably different kind of "thing".
So employees are employees - even different kinds. (manager, cleaning staff etc.).
And patients are patients - even different kinds (long term care, emergency etc.).

design database model and related sql queries

Develop the RDBMS for the administrative structure of an organisation. Each employee belongs to a certain dept and is associated with multiple projects. Each manager is an employee who manages several projects as well as several employees. Each project executes for a certain duration. Employees stay in the organisation for a certain duration.
1. Find the number of employees who have worked in every project.
2. Find the max number of employees working at a time in project 'x'.
3. Find the unproductive managers who managed fewer than 5 projects in the last 1 year.
4. Find the dept whose employees handled the maximum number of projects in the last 1 year.
I am not able to decide how to deal with the time constraint in the last 3 queries.
I have made 3 tables:
EMPLOYEE with attributes emp_id, name, dept, manager_id, where emp_id is the primary key and manager_id is a self-referential foreign key
PROJECT with p_id, p_name, manager_id, where p_id is the primary key
ALLOTMENT with emp_id, p_id, where both attributes make a composite primary key
The above helps me answer the first query, but how do I add the time constraints to answer the rest of the queries? Do I need a date-time attribute, will a simple duration attribute work, or is something else required here? Please help.
ALLOTMENT is the intersection between EMPLOYEE and PROJECT. It records the employees working on a project.
However, employees join projects and leave projects. Projects grow and shrink in their resourcing. So clearly ALLOTMENT needs columns to indicate the time span of a particular assignment, say START_DATE and END_DATE.
Once you add those columns you will be able to answer the remaining questions. Some of them will remain tricky (especially 2) but at least you will have the information required.
Incidentally, you probably ought to have a DEPARTMENT table but you can write those queries without it. Also, in real life a PROJECT would have an initiation date and (we hope) a completion date. However they too are not required for the queries you have to write.
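As a rough sketch of how the dated ALLOTMENT table could be queried, the following uses SQLite through Python. The sample rows, the helper names `max_headcount` and `active_between`, the choice to treat `end_date` as exclusive, and the inclusion of `start_date` in the primary key (so that an employee can rejoin a project) are all assumptions for the illustration, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (p_id INTEGER PRIMARY KEY, p_name TEXT, manager_id INTEGER);
    CREATE TABLE allotment (
        emp_id     INTEGER NOT NULL,
        p_id       INTEGER NOT NULL REFERENCES project(p_id),
        start_date TEXT NOT NULL,  -- ISO dates, so text comparison follows date order
        end_date   TEXT,           -- NULL while the assignment is still open
        PRIMARY KEY (emp_id, p_id, start_date)
    );
    INSERT INTO project VALUES (1, 'x', NULL);
    INSERT INTO allotment VALUES
        (1, 1, '2023-01-01', '2023-03-01'),
        (2, 1, '2023-02-01', '2023-04-01'),
        (3, 1, '2023-02-15', NULL),
        (4, 1, '2023-05-01', '2023-06-01');
""")

def max_headcount(conn, project_name):
    """Largest number of simultaneous assignments on the project.

    The peak always occurs at the start of some assignment, so it is
    enough to count the assignments active at each start date.
    """
    sql = """
        SELECT MAX(headcount) FROM (
            SELECT (SELECT COUNT(*) FROM allotment b
                    WHERE b.p_id = a.p_id
                      AND b.start_date <= a.start_date
                      AND (b.end_date IS NULL OR b.end_date > a.start_date)
                   ) AS headcount
            FROM allotment a
            JOIN project p ON p.p_id = a.p_id
            WHERE p.p_name = ?
        )
    """
    return conn.execute(sql, (project_name,)).fetchone()[0]

def active_between(conn, project_name, window_start, window_end):
    """Employees whose assignment overlaps [window_start, window_end)."""
    sql = """
        SELECT DISTINCT a.emp_id
        FROM allotment a
        JOIN project p ON p.p_id = a.p_id
        WHERE p.p_name = ?
          AND a.start_date < ?
          AND (a.end_date IS NULL OR a.end_date > ?)
        ORDER BY a.emp_id
    """
    params = (project_name, window_end, window_start)
    return [row[0] for row in conn.execute(sql, params)]

peak = max_headcount(conn, "x")
in_window = active_between(conn, "x", "2023-03-15", "2023-04-15")
print(peak, in_window)
```

The same overlap test used in `active_between` is the building block for the "in the last 1 year" queries: the window is simply one year ago up to today.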
In the project table, consider adding two columns: start_date and either duration or end_date of the project. I think this will be sufficient for you to work through the last 3 queries.
You can consider having another column, No_of_Employees_under_project, in the project table; it is denormalized and would have to be updated as employees join or leave the project. This column should be added only after measuring the gain against the cost of denormalization.

Database design....storing relationship in one table, and the data in another table?

I'm looking at this company's database design, and would like to know the purpose of their design, i.e. storing the relationship in one table and the data in another. Why do this?
They have this,
Id (PK)
EmployeeId (PK)
First Name
Last Name
Rather than this...
Id (PK)
First Name
Last Name
...OR this...(employee can belong to many departments)
Id (PK)
First Name
Last Name
That's a link table, or join table, or cross table.. lots of different names.
How would you assign an employee to two different departments with your design? You can't. You can only assign them to one.
With their design, they can assign the same employee to multiple departments by creating multiple records with the same employee ID and different department IDs.
You need to be more specific about what you're asking. Your first question seemed to be asking what the purpose of mapping table was. Then you changed it, then you changed it again.. none of which makes much sense.
It seems now that you are asking what the better design is, which is a totally different question than what you originally asked. Please state specifically what question you want answered so we don't have to guess.
Upon re-reading, if this is the actual design, then no, it does not support multiple department IDs. Such a design makes little sense, except for one situation. If the original design did not include a department, this would allow them to add a department ID without modifying the original EMPLOYEE_DATA table.
They may not have wanted to update legacy code to support the Employee id, so they added it this way.
Purpose of design is determined by business rules.
Business rules dictate entity (logical model perspective) / table (physical model perspective) design. No design is "bad" if it is built according to the requirements that were determined based on business rules. Those rules can however change over time -- foreseeing such changes and building to accommodate/future-proof the data model can really save time, effort and ultimately money.
The first and third examples are the same; the third example has an extraneous column (ORMs don't like composite keys, so a single column is used in place of one).
The second example is fine if:
employees will never work for more than one department
there's no need for historical department tracking
The first/third example is more realistic for the majority of real-world situations, and can be easily customized to provide more value without major impact (re-writing entire table structure). It uses what is most commonly referred to as a many-to-many relationship to allow many employees to relate to many departments.
If an employee can be in more than one department, then you would need a mapping table but I'd do it like the following:
Id (PK)
First Name
Last Name
Id (PK)
EmployeeId_fk (PK)
DepartmentId_fk (PK)
This would allow for multiple positions in multiple departments.
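A minimal sketch of such a mapping table, in Python with SQLite, is shown below. The table names, the sample rows and the use of a composite primary key on the two foreign keys (instead of the separate Id column listed above) are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    -- Mapping table: one row per (employee, department) pair.
    CREATE TABLE employee_department (
        employee_id   INTEGER NOT NULL REFERENCES employee(id),
        department_id INTEGER NOT NULL REFERENCES department(id),
        PRIMARY KEY (employee_id, department_id)
    );
    INSERT INTO employee VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO department VALUES (10, 'Research'), (20, 'Engineering');
    INSERT INTO employee_department VALUES (1, 10), (1, 20), (2, 20);
""")

# All departments of one employee.
departments_of_ada = [
    row[0]
    for row in conn.execute(
        """
        SELECT d.name
        FROM department d
        JOIN employee_department ed ON ed.department_id = d.id
        WHERE ed.employee_id = ?
        ORDER BY d.name
        """,
        (1,),
    )
]

# All employees of one department.
engineering_staff = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.first_name
        FROM employee e
        JOIN employee_department ed ON ed.employee_id = e.id
        WHERE ed.department_id = ?
        ORDER BY e.first_name
        """,
        (20,),
    )
]

print(departments_of_ada, engineering_staff)
```

The composite key also rejects assigning the same employee to the same department twice; with a separate surrogate Id column, a UNIQUE constraint on the pair would be needed for the same guarantee.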
You would do this if an employee can be a member of multiple departments. With the latter table, each employee can only belong to one department.
The only remotely good reason for doing this is to implement an extension model where the master table identifying unique customers does not include all the data for customers that is not always necessary. Instead, you create one core table with the core employee data and an extension table with all the supplementary fields. I've seen people take this approach to avoid creating large tables with many columns that are rarely needed. However, in my experience it's typically premature optimization, and I wouldn't recommend it.
In contrast to many responses, the model included does not support multiple departments per employee - it is not a many to many mapping approach.

Database design: implement worker->superior relationship

I'm currently developing a website for a hotel, and one of the things I'm about to implement is a worker->superior relationship. What is the best way to do so in MySQL?
Here is what I mean: a chef's superior is a head chef, head chef's superior is shift manager, shift manager's superior is general manager. In the employee table, I could make a field superior with ID of superior employee but then I'm only able to get one superior/upper role; more importantly I wouldn't be able to retrieve list of all employees that manager manages at the particular hotel.
I'd need advice on this.
Well, with a simple adjacency graph (supervisor_id pointing back at the employees table), you could certainly do what you want, but it won't be very efficient, and will not scale to large numbers of people.
What you probably want is to implement a nested set model. This allows you to very easily grab everyone who reports to some arbitrary person in the organization.
If you're early enough in development, you might consider looking at the Doctrine ORM system, which provides a nestedset behavior for models, so you don't need to implement your own.
Edit: Richard Knop has a post about nested sets with PHP example code, which you might find more helpful than Celko's 100%-SQL examples.
You could get a list of all of the employees a manager manages by doing this on your table:
SELECT * FROM employees WHERE `superior`='id goes here'
But if you are thinking about having more than one superior, just make a new table with columns like this:
superior, person
Or, if you wanted to show it like a tree, just do it in a loop of queries.
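The single `superior` column and the tree walk mentioned above can be sketched as follows. This is an illustration in Python with SQLite, not MySQL, and it replaces the loop of queries with a recursive common table expression (MySQL supports `WITH RECURSIVE` from version 8.0); the sample staff and the helper names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        superior INTEGER REFERENCES employees(id)  -- NULL for the top of the tree
    );
    INSERT INTO employees VALUES
        (1, 'General manager', NULL),
        (2, 'Shift manager', 1),
        (3, 'Head chef', 2),
        (4, 'Chef', 3),
        (5, 'Receptionist', 2);
""")

def direct_reports(conn, manager_id):
    """Employees whose immediate superior is manager_id."""
    sql = "SELECT name FROM employees WHERE superior = ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (manager_id,))]

def all_reports(conn, manager_id):
    """Everyone below manager_id, at any depth, via a recursive CTE."""
    sql = """
        WITH RECURSIVE reports(id, name) AS (
            SELECT id, name FROM employees WHERE superior = ?
            UNION ALL
            SELECT e.id, e.name
            FROM employees e
            JOIN reports r ON e.superior = r.id
        )
        SELECT name FROM reports ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (manager_id,))]

shift_direct = direct_reports(conn, 2)
shift_all = all_reports(conn, 2)
print(shift_direct)
print(shift_all)
```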
Maybe you would be better off creating an additional table or list with roles / functions and assigning levels to each function, so a level 4 would be the direct boss of a level 3 etc.
I suggest a nested set model. However, as said, it will only allow 1 person to have 1 manager directly above them...
I wrote a blog post on this a couple of years ago. OK, it's in MSSQL, but it should convert to MySQL OK.
Link here
This shows how to insert/move/retrieve full lists etc.
If you are dealing with shifts, then the shift manager will vary separately from the staff working on the shift. That is, this week, a chef might be working the lunch-time shift under one shift manager; next week, he might be working the evening shift under another manager. Be sure you keep such complexities in mind.
You can do this with three fields: an "ID" and a "managed IDs" range (two values). These IDs are not based on anything real; they simply describe the hierarchy.
Then, to find all the managed employees, you select every ID in the manager's "managed IDs" range. The top manager's ID range covers everybody, the managers under him cover a smaller range, etc., down to the people who don't manage anybody and have a range containing just themselves.
One problem with this design is that when you shuffle people around or add new people, you sometimes have to renumber everybody in order to keep the hierarchy straight. If you use nice big spacing between IDs, then you don't have to renumber the entire thing too often. Or you can use floating-point ID fields.
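The range idea can be sketched with the classic nested set layout, where each row carries a left and a right bound and a manager's bounds enclose those of everyone below. This is a minimal illustration in Python with SQLite; the `lft`/`rgt` column names, the numbering and the sample staff are assumptions, and the two-bound layout is a variant of the three-field scheme described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        name TEXT NOT NULL,
        lft  INTEGER NOT NULL,  -- left bound of this person's range
        rgt  INTEGER NOT NULL   -- right bound; subordinates lie strictly inside
    );
    INSERT INTO employees VALUES
        ('General manager', 1, 10),
        ('Shift manager',   2, 9),
        ('Head chef',       3, 6),
        ('Chef',            4, 5),
        ('Receptionist',    7, 8);
""")

def managed_by(conn, manager_name):
    """Everyone whose range lies strictly inside the manager's range."""
    sql = """
        SELECT e.name
        FROM employees e
        JOIN employees m ON e.lft > m.lft AND e.rgt < m.rgt
        WHERE m.name = ?
        ORDER BY e.lft
    """
    return [row[0] for row in conn.execute(sql, (manager_name,))]

def chain_of_command(conn, employee_name):
    """All superiors of an employee, from the top of the hierarchy down."""
    sql = """
        SELECT m.name
        FROM employees m
        JOIN employees e ON m.lft < e.lft AND m.rgt > e.rgt
        WHERE e.name = ?
        ORDER BY m.lft
    """
    return [row[0] for row in conn.execute(sql, (employee_name,))]

under_shift_manager = managed_by(conn, "Shift manager")
above_chef = chain_of_command(conn, "Chef")
print(under_shift_manager)
print(above_chef)
```

Both directions are answered by a single query with no recursion, which is the main attraction of this layout; the cost is the renumbering on inserts and moves.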
Edit: After reading other responses I notice that this is the nested set model, or a variant of it.