I am working for a hospital building a patient recording system on Microsoft Access.
One of the Doctor's requests is that when they are seeing patients that are siblings, is there any way that they do not have to type in duplicate information for these relatives? (Address, phone number, town of residence, etc...)
I understand that the ID number will change for each patient. Say the form goes like this:
1) Does patient have other siblings at risk? (Y/N)
2) If so, how many?
Depending on the digit that answers question two, could I duplicate the primary patient's information so that it exists in 3 different files? The doctor is adamant that it would be easier to go back and alter a few fields than it would be to type them repetitively.
Is this possible? Are there any shortcuts to doing so?
It sounds like you need a table called RESIDENCES, which would have an id, address, phone number etc. A patient record would include a foreign key to this table's id. When a sibling is added, the sibling record would include the same foreign key. The tricky part of doing this would be in the user interface: when a new patient is about to be added, you will have to ask whether this patient is a sibling of someone. If yes, you will have to ask who the sibling is, then copy the 'residences' foreign key; if no, you will have to display a dialog which allows the 'residencies' data to be added.
But what happens if a sibling changes address or has a separate telephone number? In that case, the contact details of each person would be stored in the patient record, without a 'residences' table; your user interface could include a function which copies contact data from record to record.
I think that this approach is better because each person has his own contact data which is independent from everyone else. The problem which you stated is more an interface problem (how do I add this functionality to make things easier for the users?) than a data modelling problem (how should the data be stored?).
Related
I am designing a database for a system and I came up with the following three tables
My problem is that an Address can belong to either a Person or a Company (or other things in the future) So how do I model this?
I discarded putting the address information in both tables (Person
and Company) because of it would be repeated
I thought of adding two columns (PersonId and CompanyId) to the
Address table and keep one of them null, but then I will need to add
one column for every future relation like this that appears (for
example an asset can have an address where its located at)
The last option that occur to me was to create two columns, one
called Type and other Id, so a pair of values would represent a
single record in the target table, for example: Type=Person,Id=5 and
Type=Company,Id=9 this way I can Join the right table using the type
and it will only be two columns no matter how many tables relate to
this table. But I cannot have constraints which reduce data integrity
I don't know if I am designing this properly. I think this should be a common issue (I've faced it at least three times during this small design in objects like Contact information, etc...) But I could not find many information or examples that would resemble mine.
Thanks for any guidance that you can give me
There are several basic approaches you could take, depending on how much you want to future proof your system.
In general, Has-One relationships are modeled by a foreign key on the owning entity, pointing to the primary key on the owned entity. So you would have an AddressId on both Company and Person,which would be a foreign key to Address.Id. The complexity in your case is how to handle the fact that a person can have multiple addresses. If you are 100% sure that there will only ever be a home and work address, you could put two foreign key columns on Person, but this becomes a big problem if there's a third, fourth, fifth etc. address. The other option is to create a join table, PersonAddress, with three columns a PersonId an AddressId and a AddressType, to indicate whether its a home work or whatever address.
I have a database with millions of client contacts. However, a lot of them are duplicated and may I ask some hero from here to advise how to identify those duplicates using Oracle SQL, PL/SQL or Excel.
Following is the data structure:
Client_Header
id integer (Primary Key)
Client_First_Name (varchar2)
Client_Last_Name (varchar2)
Client_Date_Of_Birth (timestamp)
Client_Address
Client_Id (Foreign Key ref Client_header)
Address_Line1 (varchar2)
Address_Line2 (varhchar2)
Adderss_Line3 (varchar2)
Suburb (Varchar2)
State (varchar2)
Country (varchar2)
My challenge is other than Client_Date_Of_Birth and those key fields, all fields are free text only.
For example, we have a client like following
Surname : Jones
First name : David
Client_Date_Of_Birth: 10/05/1975
Address: Unit 10 Floor 1, 20 Railway Parade, St Peter, NSW 2044
However, as those fields are free text, I have a lot of data issues and following link (jpeg file only) illustrated some of those issues
Sample of data issues
Note:
Other than those issues, sometime we may miss the first name or last name of the client (but not both) too
Sometimes multiple problems can be find within the same record.
Also sometime, the address may simply be the name of a school,
shopping center etc.
The system does not store any other id that can uniquely identify the client.
I understand it is close to impossible to gather all duplicate records where the client address is a school or shopping center. However, for other cases, is there anyway to identify most of the duplication.
Thank you for your help!
Not a pretty sight, and I'm afraid I don't have good news for you.
This is a common problem in databases, especially if the data entry personnel are insufficiently trained. One of the main objectives in data entry training is to make the problem well understood and show ways to avoid it. Something to keep in mind in the future.
Unfortunately, there isn't any "magic wand" that will clean your data for you. I'm sorry, but you have before you one of the most tedious tasks in database maintenance. You're going to have to basically remove the duplicates by hand, and the job requires more of an editor than a database administrator.
If you have millions of records, of which perhaps a million are actually duplicates, I would estimate that it will take an expert working full time for at least two years -- and probably longer -- to clean up your problem: to do it in two years would require fixing 2000 records a day, with time off on weekends and two weeks of vacation.
In the end, the only sure way to remove all the duplicates is to compare all of them and remove them one at a time. But there are plenty of tricks you can use to get rid of blocks of them at once. Here are a few that I can think of with your data sample:
Change "Dave" to "David" in both first and last name fields. (Make sure that nobody actually has the last name "Dave.")
Change all instances of "Jones David" to "David Jones." (Make sure that there are no people named "Jones David".)
Change "1/F" to "Floor 1."
The idea is to focus on some of the fields, and in those fields get all of the duplicates to be exact duplicates. Once you have that done, you delete all the records with the target values in the fields, except the one with the primary key of the record that you want to keep (if your table isn't keyed, you'll have to find another way to do it, such as selecting the top record into a new table).
This technique speeds things up for records with a large number of duplicates. Where you have only a few duplicates, it's quicker to just identify them one by one. One way to do this quickly is to go into edit mode on a table, work with a particular field (for example, the postal code field in this case), and put a unique value in that field when you want to mark it for deletion (in this case, perhaps a single zero). Then you can periodically delete all the records with that value in the field.
You'll also need to sort the data in multiple ways to find the duplicates, which it appears you already know.
As for your notes, don't try to identify all the ways that the data is messed up. Once you identify one record as a duplicate of another, you don't care what's wrong with it, you just have to get rid of it. If you have two records and each contains data that you want to keep that the other one is missing, then you'll have to consolidate them and delete one of them. And then go on to the next, and the next, and the next...
Some years ago I had a similar task and I tooks about one years to clean the data.
What I did in short:
send the address to api.addressdoctor.com for validation and split into single fields (with maps.googleapis.com it is also possible)
use a first name and last name match list to check the names (we used namepedia.org). A lot depends on the quality of this list. This list should base on country of birth or of the first address. From the results we made a propability what kind of name it is (first/last/company).
with this improved date you should create some normalized and fuzzy attributes. Normalized fields from names and address...like upper and just with alpha-numeric
List item
at the end I would change the data model a little bit to improve the data quality by design. I recommend you adding pre-title, post-title, middle-name and post-name fields. You should also add the splitted address fields like street, streetno, zip, location, longitude, latitude, etc...
I would also change the relation between Client_Header and Client_Address with an extra address_Id as primary key...but this depends on the requirements. And at the end I would add some constraints to prevent duplicated entries.
after all that is the deduplication not hard. Group just all normalized or fuzzy data together and greate a dense_rank. (I group by person, household, ...) Make a ranking over the attributes (I used data quality, data fillrate and transaction history for a score value) Finally it is your choice if you just want to delete the duplicates and copy the corresponding data to the living client or virtually connect the data via Client_Id in an extra Field.
for insert and update processes you should create PL/SQL functions that check if fuzzy last-name (eg. first-name) + fuzzy address exist. Split the names and address fileds and check them with the address API's and match them with the names reference. If it is a single tuple data entry, show the best results to the user and let him decide.
I am essentially trying to create a glorified contacts list using LibreOffice Base. Many of our contacts have multiple addresses (office, mailing, home), and sometimes multiple people have the same address.
I've created a simple contacts table with the Contact ID, Last Name, and First Name. I've created an address table with an Address ID, City, State, etc. I've also created a junction table with Contact ID and Address ID, and connected the three tables using the relationships tool.
Now I want to add everything into a single form. I watched this youtube video, which was very helpful, but I want to be able to add new cities instead of only selecting from a pre-established list. So I followed along with the video, but set the columns to "combo box" instead of "list box." However, when I try that I get an error message:
Error inserting the new record
SQL Status: 23000
Error code: -177
Integrity constraint violation - no parent SYS_FK_94 table: Address Table in statement [INSERT INTO "Contact-Address Junction" ( "Address ID","Contact ID") VALUES ( ?,?)]
I assume there's something obvious that I'm missing, but at this point I'm pretty stuck.
E: I took more screenshots to show the relationships, tables, and how things connect in the form:
In the video you cite, this is kind of silly as there is no need for a many to many relationship. The author could have simply added a Movie ID field to the genre Table, (so multiple genre records can point to each movie record).
I have a complex many-to-many contacts structure that I'm in the process of porting from MS Access to LibreOffice. It has 4 primary data tables: Groups, Address, Phones, and People. And there are 6 link tables to connect those primary tables: GroupAddress, GroupPhone, GroupPerson; AddressPhone, AddressPerson; and finaly PhonePerson. Unlike the video you cite above with only 2 fields, in my link tables there are 3 fields. GroupAddress for example has GroupAddressID (the unique id for the link table itself), GroupID (which points to the Groups table), and AddressID (which points to the addresses table). This allows any number of addresses per each group, and at the same time, any number of groups per each address.
So the whole thing has maximum data flexibility: For example it allows for unlimited number of people per group, each person can have an unlimited number of phones or addresses, each address an unlimited number of people, etc.
Implementing it in LO: Because in LO you can't edit a query based on more than one table (you can view it, but not edit it), like you can in Access, (and therefore you also can't build a form with a table that can edit a query based on more than one table), in LO this is not as clean as it is in Access.
In Access it works very well and the link tables manage themselves! In LO you have to manually edit the link tables in a separate table. I'm starting to think about how I might use some macros to improve on this, but for the moment it's bare bones.
To give you a better idea, the first form to edit Groups, which also edits the group's addresses, people, and phones (not just those tables, but also the links to those tables) looks like this:
Group lookup pulldown (used to find a group record)
(this is a drop down box with custom code that helps me find group records)
Group editing fields, e.g. Group name, category, url, etc
Then below that
Group-Person links table
Then to the right of that
Persons table (plural) - to create new person, then..
And below that
Person table (singular)
[pointed to by Group-Person link]
to view person pointed to.
Group-Address links || (similar structure to above)
(for example, when a business has two or more addresses)
Group-Phone links || (similar structure to above)
(this is for phone#'s that the group owns directly, not personal phones of the group's members, and not phones tied to specific addresses).
----------
Then there are 3 other similar forms,
one for Addresses w/ Person, Group & Phone links;
one for Phones w/ Group, Address & Person links;
and one for People w/ Group, Address and phone links;
Here is a screen shot of the first of the 4 editing forms, to edit the groups and the links associated with the group:
Hope this helps. I would be interested to see what you develop. Thanks.
Please see link to FileMaker Pro 12 database I've created to illustrate my problem:
https://dl.dropboxusercontent.com/u/24821795/Example.fmp12
I want to count the number of times an Activity has been assigned to a Staff member, but there are a couple of things making it tricky (not impossible, I hope):
When the user performs a Find, the count should update to only include the found records.
The user can add to the list of activities.
In the example provided, SelfJoinCount and Activities::Count are not what I want - they both count Activity (e.g. Archery has been assigned to two staff members), but do not meet criteria 1. above.
Try performing a Find of Gender = M
The values of ReviewedCount (a summary field, counting Reviewed) change to 3, which is what I want.
The values of SelfJoinCount and Activities::Count do not change. In this case, I want them to change to 1 (i.e. One record with Ballooning, one record with Bird watching and one record with Archery in the found set).
I could create a calculation field with a 1 in it if the activity occurs and then a summary field counting that 1 for every single activity in the database, BUT this won't work because of criteria 2. above (also, there are a lot of activities).
Any ideas?
Ok, several problems you have here.
First, you need key values for your tables. This can easily be accomplished by creating a number field in each table. In the options, select auto-enter serial number, validation unique, not empty.
Now, you need a 3rd table for the join. This table will have the foreign key value for the Staff member as well as the foreign key for the activity.
You will want to have your Staff layout in form view, add a portal into the join table with they foreign key field for the activity. Create a popup or drop down list for activities (hint: if you want the name of the activity to show in the layout rather than its key, use a popup.) it needs to have they key value and display second field, all values, show only value from second field.
This will allow you to have a many to many relationship between the tables so that a single staff member can have many activities and an activity can have many staff members.
Now, if you want a count of each activity, you could of course create calculation fields in the staff table to count instances of each activity type, but I find that cumbersome and time consuming as well as requiring you to create TO's for each activity. What you really want to use is an ExecuteSQL() function. Some thing like this:
ExecuteSQL("
SELECT COUNT(J.FK_ActivityID)
FROM JoinTable J
WHERE J.FK_StaffID =?
Group by J.FK_ActivityID";"";"";Staff::PK_StaffID)
You can tweak that ExecuteSQL to include a specific activity, or leave it as it is an include them all. Up to you how you do it.
If it wasn't 7am and having had no sleep, I would mock up the file for you, but I think you would do better testing it and working on it on your own.
I am working on a project where there are several types of users (students and teachers). Currently to store the user's information, two tables are used. The users table stores the information that all users have in common. The teachers table stores information that only teachers have with a foreign key relating it to the users table.
users table
id
name
email
34 other fields
teachers table
id
user_id
subject
17 other fields
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user use users.id. Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
e.g.
users
id
name
email
subject
51 other fields
Is this too many fields for one table? Will this impede performance?
I think this design is fine, assuming that most of the time you only need the user data, and that you know when you need to show the teacher-specific fields.
In addition, you get only teachers just by doing a JOIN, which might come in handy.
Tomorrow you might have another kind of user who is not a teacher, and you'll be glad of the separation.
Edited to add: yes, this is an inheritance pattern, but since he didn't say what language he was using I didn't want to muddy the waters...
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user
use users.id.
I would expect relating to the teacher_id for classes/sections...
Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
Are you modelling a system for a high school, or post-secondary? Reason I ask is because in post-secondary, a user can be both a teacher and a student... in numerous subjects.
I would think it fine provided neither you or anyone else succumbs to the temptation to reuse 'empty' columns for other purposes.
By this I mean, there will in your new table be columns that are only populated for teachers. Someone may decide that there is another value they need to store for non-teachers, and use one of the teacher's columns to hold it, because after all it'll never be needed for this non-teacher, and that way we don't need to change the table, and pretty soon your code fills up with things testing row types to find what each column holds.
I've seen this done on several systems (for instance, when loaning a library book, if the loan is a long loan the due date holds the date the book is expected back. but if it's a short loan the due date holds the time it's expected back, and woe betide anyone who doesn't somehow know that).
It's not too many fields for one table (although without any details it does seem kind of suspicious). And worrying about performance at this stage is premature.
You're probably dealing with very few rows and a very small amount of data. You concerns should be 1) getting the job done 2) designing it correctly 3) performance, in that order.
It's really not that big of a deal (at this stage/scale).
I would not stuff all fields in one table. Student to teacher ratio is high, so for 100 teachers there may be 10000 students with NULLs in those 17 fields.
Usually, a model would look close to this:
I your case, there are no specific fields for students, so you can omit the Student table, so the model would look like this
Note that for inheritance modeling, the Teacher table has UserID, same as the User table; contrast that to your example which has an Id for the Teacher table and then a separate user_id.
it won't really hurt the performance, but the other programmers might hurt you if you won't redisign it :) (55 fielded tables ??)