Creating Table Relationships - vb.net

I am working on a VB.net (VS-2010, Win XP Pro 2 SP3), Employee Management Project. I need to keep track of Employee Leave Attendance and also each Equipment assigned to an Employee. How can I achieve this using SQLlite.
It will be very useful if you could provide me with examples as I am completely new to the field of SQL and VB.net
I think this can be done with two tables where one has the primary key while the other has a foreign key, but I am not sure. Also how many tables will I need for storing data in Leave and Equipment Form.
I went through other questions but I was unable to figure out a solution for my problem.
(Sorry, I cannot provide with images as this site prevents me from posting images without 10 reps)

Most problems are only as complex, and as simple as you make them. Out of habbit, nearly all tables end up with a unique ID field. There are exceptions, which I will call "link" tables, eg, ones that provide connection details between two data tables.
Now, in your senario
You would need a "holiday" table, where each row will contain the employee unique ID and either a start/finish date, eg, if they take half a day, it needs to be visible, or, just a year and value, eg in 2011, I booked, 2 lots of 35 hours, and 1 lot of 4 hours eg, Ive taken 2 weeks and half a day.
For the equipment, you would need a data table, since an item can only got to 1 employee, it depends if you're going to use this for booking or not, but if its just like a library, eg I currently have a loaner laptop, then you can just have an employee field in the equipment table. If you need a booking system, then you would require link tables and more complex.
Best way to work out your tables is to try and group your data, and then write the items on peices of paper and see how you as a human do it. After a while you end up able to do so in your head.

Related

Access 365 - No duplicates based on a forgein key

I have an issue/problem that I'm looking for a solution for. If I find a solution for this, it could fix two issues I'm facing in a database tool I'm creating for my school's admin office.
The gist of it is, I'm looking for a way in Access 365 (or Access 2019) either through a form/query/vba... to limit data to not have duplicates. Now, of course I know that you can set the field in your table to not have duplicates. But, here is the thing, I wouldn't mind a duplicate in that column in certain cases.
So, let me explain the issue/problem by giving an example.
Image that you have two tables. One table is filled with the name of a school bus and the yearly price that the parents have to pay for it. Another table has all the stops that the school busses take.
Now, here comes the problem. I would love to have a system where I can let the user decide the order of those stops.
What I'm currently doing is, I let the user fill in a number. In another column, I have a calculated field that adds the primary key of the schoolbus table "." the number of the order.
So, that would look like this for the first school bus and the first five stops.
1.1, 1.2, 1.3, 1.4, 1.5
And I have a duplicate control that column. Since that way, I can make sure that it doesn't block the number 1 for the next bus.
Something that might work as well, if that's possible in access is that for each school bus you create, you add a column in the stops table to check for duplicates. But how do you make sure then to filter which stops belongs to which schoolbus?
I hope my issue is a bit clear. If it's not, feel free to ask.
I have a similar style problem in another part of the tool with a book fair system.
Where you have an order table where the order number, the student number, the creation and pickup dates are stored. It also has a field to lock the form to avoid any edits for the bookkeeping folks. I also have an orderline form where all the actual bought books are stored. Now, I would love it if a system exists to avoid users adding the same book twice on an order in the orderline table since I do have an amount column for that. So, in a way, it's almost similar to the above described problem.

SQL - MS Access Form design - add data of ISA relationships

I'm taking a DBMS course and I need to design and build my own DB. I have a database for a hospital where doctors,nurses,support staff etc are in a ISA relationship to an Employee entity with the rest of the data like the name, address , salary and the rest of the employee data.
Designing a form, I want to be able to add an employee with all of their data in one form.
Is there a way to do a "conditional table" of sorts where if i select "doctor" from a drop-down i get to add to the doctor table too, and same for the rest of the entities under the ISA relationship?
Thx!
As a general rule, when dealing with data, you do NOT flip or switch tables for a given form or relatonal database design.
So, for example. If I have a table of customers. Well, now if I want to mark some of the customers as plumbers, and others as doctors? I don't create two tables.
All I would do is add ONE column to that customers table and it would simply allow me to set the type of customer. The reason for this design is "many" but some significant reasons are:
For each new type of customer, you would not create a new table. Worse, all of the forms, the reports, the SQL, the code you write? Well, all of that code would have to be modified EACH time you create a new table. So, you SIMPLY cannot adopt a design in which the concept of changing a table is part of that process.
Forms are bound to ONE table. For related data, you in most cases will use a sub form.
So, think of even a accounting system. They can have huge numbers of customers, and as a result, you can "query" that table to give you all customers. Or you might ask how many accounting firms are in the customer list. Or make a report that summeries by customer type a "count" of each type of customer.
So, buidling forms, or reports? They cannot on the fly "change" the tables they are using.
So, in place of a tables called:
SalesJan
SalesFeb
SalesMar
etc.
Well, now you can't query sales from Jan to mar, because the data is in different tables.
So, what you do is have ONE table called "sales", and you add ONE column of the date. Now, at the start of each new month, you don't have to create a new table.
Now, of course in some cases it makes sense to create a separate table. For example, a table of customers, and a table of employees in a database is just fine. It makes sense in this case to use two tables, since the information about a customer and what they can do and the kind of information is VERY different then how you would deal with employees.
So, with above? Well, if I need to print mailing labels for all customers and all employees? That would require two different reports. And very likely the table structure for the two tables is different.
Bottom line:
If you working on design or form or report? And you needing to try and change the table that the form/report/code etc is going to operate on? This is a sign that your design approach has gone complete off the rails and is the wrong design.
So, in the case of doctors, nurses etc.? Well, they are all hospital staff, and MOST of the basic information about such employees will be common, much the same, and thus a SINGLE table of "employees" makes the most sense. You would only need a nice "employee type" combo box on that one form, and thus you can add/enter/edit/search any employee in that one table.
The fact that you "want to search" for a employee show that all these people "are" employees and thus belong in one table. And the basic information about all employees is going to be the same anyway. If you find you are attempting to create a new table but with near identical structures over and over, then just like a new table for each month sales, or a new table for each new kind of employee? Simply add the "one" column that allows you to make that distinguish, and not a whole new table.
Now one COULD even attempt to put patients in the same table, but then again, dealing with patents as opposed employees is a considerable different kind of "thing".
So employees are employees - even different kinds. (manager, cleaning staff etc.).
And patients are patients - even different kinds (long term care, emergency etc.).

How to identify duplicate records using client name and address in SQL while both of them is in free text

I have a database with millions of client contacts. However, a lot of them are duplicated and may I ask some hero from here to advise how to identify those duplicates using Oracle SQL, PL/SQL or Excel.
Following is the data structure:
Client_Header
id integer (Primary Key)
Client_First_Name (varchar2)
Client_Last_Name (varchar2)
Client_Date_Of_Birth (timestamp)
Client_Address
Client_Id (Foreign Key ref Client_header)
Address_Line1 (varchar2)
Address_Line2 (varhchar2)
Adderss_Line3 (varchar2)
Suburb (Varchar2)
State (varchar2)
Country (varchar2)
My challenge is other than Client_Date_Of_Birth and those key fields, all fields are free text only.
For example, we have a client like following
Surname : Jones
First name : David
Client_Date_Of_Birth: 10/05/1975
Address: Unit 10 Floor 1, 20 Railway Parade, St Peter, NSW 2044
However, as those fields are free text, I have a lot of data issues and following link (jpeg file only) illustrated some of those issues
Sample of data issues
Note:
Other than those issues, sometime we may miss the first name or last name of the client (but not both) too
Sometimes multiple problems can be find within the same record.
Also sometime, the address may simply be the name of a school,
shopping center etc.
The system does not store any other id that can uniquely identify the client.
I understand it is close to impossible to gather all duplicate records where the client address is a school or shopping center. However, for other cases, is there anyway to identify most of the duplication.
Thank you for your help!
Not a pretty sight, and I'm afraid I don't have good news for you.
This is a common problem in databases, especially if the data entry personnel are insufficiently trained. One of the main objectives in data entry training is to make the problem well understood and show ways to avoid it. Something to keep in mind in the future.
Unfortunately, there isn't any "magic wand" that will clean your data for you. I'm sorry, but you have before you one of the most tedious tasks in database maintenance. You're going to have to basically remove the duplicates by hand, and the job requires more of an editor than a database administrator.
If you have millions of records, of which perhaps a million are actually duplicates, I would estimate that it will take an expert working full time for at least two years -- and probably longer -- to clean up your problem: to do it in two years would require fixing 2000 records a day, with time off on weekends and two weeks of vacation.
In the end, the only sure way to remove all the duplicates is to compare all of them and remove them one at a time. But there are plenty of tricks you can use to get rid of blocks of them at once. Here are a few that I can think of with your data sample:
Change "Dave" to "David" in both first and last name fields. (Make sure that nobody actually has the last name "Dave.")
Change all instances of "Jones David" to "David Jones." (Make sure that there are no people named "Jones David".)
Change "1/F" to "Floor 1."
The idea is to focus on some of the fields, and in those fields get all of the duplicates to be exact duplicates. Once you have that done, you delete all the records with the target values in the fields, except the one with the primary key of the record that you want to keep (if your table isn't keyed, you'll have to find another way to do it, such as selecting the top record into a new table).
This technique speeds things up for records with a large number of duplicates. Where you have only a few duplicates, it's quicker to just identify them one by one. One way to do this quickly is to go into edit mode on a table, work with a particular field (for example, the postal code field in this case), and put a unique value in that field when you want to mark it for deletion (in this case, perhaps a single zero). Then you can periodically delete all the records with that value in the field.
You'll also need to sort the data in multiple ways to find the duplicates, which it appears you already know.
As for your notes, don't try to identify all the ways that the data is messed up. Once you identify one record as a duplicate of another, you don't care what's wrong with it, you just have to get rid of it. If you have two records and each contains data that you want to keep that the other one is missing, then you'll have to consolidate them and delete one of them. And then go on to the next, and the next, and the next...
Some years ago I had a similar task and I tooks about one years to clean the data.
What I did in short:
send the address to api.addressdoctor.com for validation and split into single fields (with maps.googleapis.com it is also possible)
use a first name and last name match list to check the names (we used namepedia.org). A lot depends on the quality of this list. This list should base on country of birth or of the first address. From the results we made a propability what kind of name it is (first/last/company).
with this improved date you should create some normalized and fuzzy attributes. Normalized fields from names and address...like upper and just with alpha-numeric
List item
at the end I would change the data model a little bit to improve the data quality by design. I recommend you adding pre-title, post-title, middle-name and post-name fields. You should also add the splitted address fields like street, streetno, zip, location, longitude, latitude, etc...
I would also change the relation between Client_Header and Client_Address with an extra address_Id as primary key...but this depends on the requirements. And at the end I would add some constraints to prevent duplicated entries.
after all that is the deduplication not hard. Group just all normalized or fuzzy data together and greate a dense_rank. (I group by person, household, ...) Make a ranking over the attributes (I used data quality, data fillrate and transaction history for a score value) Finally it is your choice if you just want to delete the duplicates and copy the corresponding data to the living client or virtually connect the data via Client_Id in an extra Field.
for insert and update processes you should create PL/SQL functions that check if fuzzy last-name (eg. first-name) + fuzzy address exist. Split the names and address fileds and check them with the address API's and match them with the names reference. If it is a single tuple data entry, show the best results to the user and let him decide.

Re-assigning IDs in a non-IDENTITY type field in SQL Server database

WARNING: This tale of woe contains examples of code smells and poor design decisions, and technical debt.
If you are conversant with SOLID principles, practice TDD and unit test your work, DO NOT READ ON. Unless you want a good giggle at someone's misfortune and gloat in your own awesomeness knowing that you would never leave behind such a monumental pile of crap for your successors.
So, if you're sitting comfortably then I'll begin.
In this app that I have inherited and been supporting and bug fixing for the last 7 months I have been left with a DOOZY of a balls up by a developer that left 6 and a half months ago. Yes, 2 weeks after I started.
Anyway. In this app we have clients, employees and visits tables.
There is also a table called AppNewRef (or something similar) that ... wait for it ... contains the next record ID to use for each of the other tables. So, may contain data such as :-
TypeID Description NextRef
1 Employees 804
2 Clients 1708
3 Visits 56783
When the application creates new rows for Employees, it looks in the AppNewRef table, gets the value, uses that value for the ID, and then updates the NextRef column. Same thing for Clients, and Visits and all the other tables whose NextID to use is stored in here.
Yes, I know, no auto-numbering IDENTITY columns on this database. All under the excuse of "when it was an Access app". These ID's are held in the (VB6) code as longs. So, up to 2 billion 147 million records possible. OK, that seems to work fairly well. (apart from the fact that the app is updating and taking care of locking / updating, etc., and not the database)
So, our users are quite happily creating Employees, Clients, Visits etc. The Visits ID is steady increasing a few dozen at a time. Then the problems happen. Our clients are causing database corruptions while creating batches of visits because the server is beavering away nicely, and the app becomes unresponsive. So they kill the app using task manager instead of being patient and waiting. Granted the app does seem to lock up.
Roll on to earlier this year and developer Tim (real name. No protecting the guilty here) starts to modify the code to do the batch updates in stages, so that the UI remains 'responsive'. Then April comes along, and he's working his notice (you can picture the scene now, can't you ?) and he's beavering away to finish the updates.
End of April, and beginning of May we update some of our clients. Over the next few months we update more and more of them.
Unseen by Tim (real name, remember) and me (who started two weeks before Tim left) and the other new developer that started a week after, the ID's in the visit table start to take huge leaps upwards. By huge, I mean 10000, 20000, 30000 at a time. Sometimes a few hundred thousand.
Here's a graph that illustrates the rapid increase in IDs used.
Roll on November. Customer phones Tech Support and reports that he's getting an error. I look at the error message and ask for the database so I can debug the code. I find that the value is too large for a long. I do some queries, pull the information, drop it into Excel and graph it.
I don't think making the code handle anything longer than a long for the ID's is the right approach, as this app passes that ID into other DLL's and OCX's and breaking the interface on those just seems like a whole world of hurt that I don't want to encounter right now.
One potential idea that I'm investigating is try to modify the ID's so that I can get them down to a lower level. Essentially filling the gaps. Using the ROW_NUMBER function
What I'm thinking of doing is adding a new column to each of the tables that have a Foreign Key reference to these Visit ID's (not a proper foreign key mind, those constraints don't exist in this database). This new column could store the old (current) value of the Visit ID (oh, just to confuse things; on some tables it's called EventID, and on some it's called VisitID).
Then, for each of the other tables that refer to that VisitID, update to the new value.
Ideas ? Suggestions ? Snippets of T-SQL to help all gratefully received.
Option one:
Explicitly constrain all of your foreign key relationships, and set them to be ON UPDATE CASCADE.
This will mean that whenever you change the ID, the foreign keys will automatically be updated.
Then you just run something like this...
WITH
resequenced AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY id) AS newID,
*
FROM
yourTable
)
UPDATE
resequenced
SET
id = newID
I haven't done this in ages, so I forget if it causes problems mid-update by having two records with the same id value. If it does, you could do somethign like this first...
UPDATE yourTable SET id = -id
Option two:
Ensure that none of your foreign key relationships are explicitly defined. If they are, note them donw and remove them.
Then do something like...
CREATE TABLE temp AS
newID INT IDENTITY (1,1),
oldID INT
)
INSERT INTO temp (oldID) SELECT id FROM yourTable
/* Do this once for the table you are re-identifiering */
/* Repeat this for all fact tables holding that ID as a foreign key */
UPDATE
factTable
SET
foreignID = temp.newID
FROM
temp
WHERE
foreignID = temp.oldID
Then re-apply any existing foreign key relationships.
This is a pretty scary option. If you forget to update a table, you just borked your data. But, you can give that temp table a much nicer name and KEEP it.
Good luck. And may the lord have mercy on your soul. And Tim's if you ever meet him in a dark alley.
I would create a numbers table that has just a sequence from 1 to whatever max with an increment of 1 is for long and then change the logic of getting the maxid for visitid and maybe the others doing a right join between the numbers and the visits table. and then you can just look for te min of that number
select min(number) from visits right join numbers on visits.id = numbers.number
That way you get all the gaps filled in without having to change any of the other tables.
but I would just redo the whole database.

Adding new fields vs creating separate table

I am working on a project where there are several types of users (students and teachers). Currently to store the user's information, two tables are used. The users table stores the information that all users have in common. The teachers table stores information that only teachers have with a foreign key relating it to the users table.
users table
id
name
email
34 other fields
teachers table
id
user_id
subject
17 other fields
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user use users.id. Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
e.g.
users
id
name
email
subject
51 other fields
Is this too many fields for one table? Will this impede performance?
I think this design is fine, assuming that most of the time you only need the user data, and that you know when you need to show the teacher-specific fields.
In addition, you get only teachers just by doing a JOIN, which might come in handy.
Tomorrow you might have another kind of user who is not a teacher, and you'll be glad of the separation.
Edited to add: yes, this is an inheritance pattern, but since he didn't say what language he was using I didn't want to muddy the waters...
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user
use users.id.
I would expect relating to the teacher_id for classes/sections...
Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
Are you modelling a system for a high school, or post-secondary? Reason I ask is because in post-secondary, a user can be both a teacher and a student... in numerous subjects.
I would think it fine provided neither you or anyone else succumbs to the temptation to reuse 'empty' columns for other purposes.
By this I mean, there will in your new table be columns that are only populated for teachers. Someone may decide that there is another value they need to store for non-teachers, and use one of the teacher's columns to hold it, because after all it'll never be needed for this non-teacher, and that way we don't need to change the table, and pretty soon your code fills up with things testing row types to find what each column holds.
I've seen this done on several systems (for instance, when loaning a library book, if the loan is a long loan the due date holds the date the book is expected back. but if it's a short loan the due date holds the time it's expected back, and woe betide anyone who doesn't somehow know that).
It's not too many fields for one table (although without any details it does seem kind of suspicious). And worrying about performance at this stage is premature.
You're probably dealing with very few rows and a very small amount of data. You concerns should be 1) getting the job done 2) designing it correctly 3) performance, in that order.
It's really not that big of a deal (at this stage/scale).
I would not stuff all fields in one table. Student to teacher ratio is high, so for 100 teachers there may be 10000 students with NULLs in those 17 fields.
Usually, a model would look close to this:
I your case, there are no specific fields for students, so you can omit the Student table, so the model would look like this
Note that for inheritance modeling, the Teacher table has UserID, same as the User table; contrast that to your example which has an Id for the Teacher table and then a separate user_id.
it won't really hurt the performance, but the other programmers might hurt you if you won't redisign it :) (55 fielded tables ??)