Counting value list items in FileMaker - frequency

Please see link to FileMaker Pro 12 database I've created to illustrate my problem:
https://dl.dropboxusercontent.com/u/24821795/Example.fmp12
I want to count the number of times an Activity has been assigned to a Staff member, but there are a couple of things making it tricky (not impossible, I hope):
When the user performs a Find, the count should update to only include the found records.
The user can add to the list of activities.
In the example provided, SelfJoinCount and Activities::Count are not what I want - they both count Activity (e.g. Archery has been assigned to two staff members), but do not meet criterion 1 above.
Try performing a Find of Gender = M
The values of ReviewedCount (a summary field, counting Reviewed) change to 3, which is what I want.
The values of SelfJoinCount and Activities::Count do not change. In this case, I want them to change to 1 (i.e. One record with Ballooning, one record with Bird watching and one record with Archery in the found set).
I could create a calculation field with a 1 in it if the activity occurs, and then a summary field counting that 1 for every single activity in the database, BUT this won't work because of criterion 2 above (also, there are a lot of activities).
Any ideas?

OK, you have several problems here.
First, you need key values for your tables. This can easily be accomplished by creating a number field in each table. In the options, select auto-enter serial number, validation unique, not empty.
Now, you need a 3rd table for the join. This table will have the foreign key value for the Staff member as well as the foreign key for the activity.
You will want your Staff layout in form view; add a portal into the join table with the foreign key field for the activity. Create a pop-up or drop-down list for activities (hint: if you want the name of the activity to show in the layout rather than its key, use a pop-up). The value list needs to use the key value as the first field and display the second field, all values, showing values only from the second field.
This will allow you to have a many to many relationship between the tables so that a single staff member can have many activities and an activity can have many staff members.
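To make the structure concrete, here is a rough sketch of the three tables in generic SQL DDL. The table and field names simply mirror the ExecuteSQL example below and are only illustrative; in FileMaker you would define these through Manage Database with auto-enter serial keys rather than DDL.

CREATE TABLE Staff (
    PK_StaffID    INTEGER PRIMARY KEY,   -- auto-enter serial, unique, not empty
    StaffName     VARCHAR(100)
);

CREATE TABLE Activities (
    PK_ActivityID INTEGER PRIMARY KEY,   -- auto-enter serial, unique, not empty
    ActivityName  VARCHAR(100)
);

CREATE TABLE JoinTable (
    FK_StaffID    INTEGER,               -- which staff member
    FK_ActivityID INTEGER                -- which activity they are assigned
);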
Now, if you want a count of each activity, you could of course create calculation fields in the Staff table to count instances of each activity type, but I find that cumbersome and time-consuming, as well as requiring you to create TOs (table occurrences) for each activity. What you really want is an ExecuteSQL() function. Something like this:
ExecuteSQL("
SELECT COUNT(J.FK_ActivityID)
FROM JoinTable J
WHERE J.FK_StaffID =?
Group by J.FK_ActivityID";"";"";Staff::PK_StaffID)
You can tweak that ExecuteSQL to include a specific activity, or leave it as it is and include them all; it's up to you how you do it.
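For example, a variant limited to one named activity for the current staff member might look something like this (ActivityName and the table occurrence names are placeholders; adjust them to match your file):

ExecuteSQL("
SELECT COUNT(*)
FROM JoinTable J
JOIN Activities A ON A.PK_ActivityID = J.FK_ActivityID
WHERE J.FK_StaffID = ? AND A.ActivityName = ?" ; "" ; "" ; Staff::PK_StaffID ; "Archery")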
If it weren't 7am and I'd had some sleep, I would mock up the file for you, but I think you'll do better testing it and working on it on your own.

Related

Create simple database for chess tournaments

I am trying to make a simple app for chess tournaments, but I have a problem with the database. I have users that participate in a tournament (that's fine), but how do I assign users to a round and a match? Should I make additional relations, such as user_tournament-round-tournament and user_tournament-match-round?
Please take this answer as food for thought rather than a solution. Your question does not contain enough information to fully cover all use cases, so the answer below contains a lot of speculation.
In my oversimplified view, and picking up on your initial model, the tournament_competitors table (renamed from user_tournament, since we have competitors rather than users) would create a unique id for each enrolled competitor. This id would be used as a reference in a tournament_matches table, which would link twice to tournament_competitors because it connects two opponents (constraint warning). The table would also register the match type.
For the match type, I see two possibilities.
The matches table would list all possible match types (final, semi-final, quarter-final, elimination rounds, etc.) and these would be referred to in the tournament_matches table via id (a composite key in the form tournament_id-competitor_id-group_id). This approach, especially for the elimination-round matches, requires finding a way to link the number of competitors in each elimination group with the number of matches each competitor has to go through before they are considered eliminated or not - in effect, a round number. I see this as business logic, so it does not belong in the DB. The group_id also needs to be calculated, and that is best done in the application.
The alternative is to have the various match types in the tournament_matches table as a free field populated by the application. The tournament structure (number of groups, number of opponents in each group, etc.) would be defined as attributes in the tournaments table. In this view there is no need for the rounds table.
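A minimal sketch of that second approach in generic SQL, with invented column names, might look like this:

CREATE TABLE tournaments (
    tournament_id   INTEGER PRIMARY KEY,
    name            VARCHAR(100),
    group_count     INTEGER,   -- tournament structure attributes
    group_size      INTEGER
);

CREATE TABLE tournament_competitors (
    competitor_id   INTEGER PRIMARY KEY,                            -- unique per enrolment
    tournament_id   INTEGER REFERENCES tournaments(tournament_id),
    user_id         INTEGER                                         -- the person who enrolled
);

CREATE TABLE tournament_matches (
    match_id        INTEGER PRIMARY KEY,
    tournament_id   INTEGER REFERENCES tournaments(tournament_id),
    competitor_a_id INTEGER REFERENCES tournament_competitors(competitor_id),
    competitor_b_id INTEGER REFERENCES tournament_competitors(competitor_id),
    match_type      VARCHAR(30),   -- free field, populated by the application
    round_no        INTEGER,       -- calculated by the application
    CHECK (competitor_a_id <> competitor_b_id)   -- the "constraint warning" above
);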

How to identify duplicate records using client name and address in SQL when both are free text

I have a database with millions of client contacts. However, a lot of them are duplicated. Could someone advise how to identify those duplicates using Oracle SQL, PL/SQL or Excel?
Following is the data structure:
Client_Header
id integer (Primary Key)
Client_First_Name (varchar2)
Client_Last_Name (varchar2)
Client_Date_Of_Birth (timestamp)
Client_Address
Client_Id (Foreign Key ref Client_Header)
Address_Line1 (varchar2)
Address_Line2 (varchar2)
Address_Line3 (varchar2)
Suburb (varchar2)
State (varchar2)
Country (varchar2)
My challenge is other than Client_Date_Of_Birth and those key fields, all fields are free text only.
For example, we have a client like following
Surname : Jones
First name : David
Client_Date_Of_Birth: 10/05/1975
Address: Unit 10 Floor 1, 20 Railway Parade, St Peter, NSW 2044
However, as those fields are free text, I have a lot of data issues; the following link (jpeg file only) illustrates some of them:
Sample of data issues
Note:
Other than those issues, sometimes we may also be missing the first name or last name of the client (but not both).
Sometimes multiple problems can be found within the same record.
Also, sometimes the address may simply be the name of a school, shopping centre, etc.
The system does not store any other id that can uniquely identify the client.
I understand it is close to impossible to gather all duplicate records where the client address is a school or shopping centre. However, for the other cases, is there any way to identify most of the duplicates?
Thank you for your help!
Not a pretty sight, and I'm afraid I don't have good news for you.
This is a common problem in databases, especially if the data entry personnel are insufficiently trained. One of the main objectives in data entry training is to make the problem well understood and show ways to avoid it. Something to keep in mind in the future.
Unfortunately, there isn't any "magic wand" that will clean your data for you. I'm sorry, but you have before you one of the most tedious tasks in database maintenance. You're going to have to basically remove the duplicates by hand, and the job requires more of an editor than a database administrator.
If you have millions of records, of which perhaps a million are actually duplicates, I would estimate that it will take an expert working full time for at least two years -- and probably longer -- to clean up your problem: to do it in two years would require fixing 2000 records a day, with time off on weekends and two weeks of vacation.
In the end, the only sure way to remove all the duplicates is to compare all of them and remove them one at a time. But there are plenty of tricks you can use to get rid of blocks of them at once. Here are a few that I can think of with your data sample:
Change "Dave" to "David" in both first and last name fields. (Make sure that nobody actually has the last name "Dave.")
Change all instances of "Jones David" to "David Jones." (Make sure that there are no people named "Jones David".)
Change "1/F" to "Floor 1."
The idea is to focus on some of the fields, and in those fields get all of the duplicates to be exact duplicates. Once you have that done, you delete all the records with the target values in the fields, except the one with the primary key of the record that you want to keep (if your table isn't keyed, you'll have to find another way to do it, such as selecting the top record into a new table).
This technique speeds things up for records with a large number of duplicates. Where you have only a few duplicates, it's quicker to just identify them one by one. One way to do this quickly is to go into edit mode on a table, work with a particular field (for example, the postal code field in this case), and put a unique value in that field when you want to mark it for deletion (in this case, perhaps a single zero). Then you can periodically delete all the records with that value in the field.
You'll also need to sort the data in multiple ways to find the duplicates, which it appears you already know.
As for your notes, don't try to identify all the ways that the data is messed up. Once you identify one record as a duplicate of another, you don't care what's wrong with it, you just have to get rid of it. If you have two records and each contains data that you want to keep that the other one is missing, then you'll have to consolidate them and delete one of them. And then go on to the next, and the next, and the next...
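As a rough illustration of the "keep one, delete the rest" step, here is what it might look like in Oracle against the question's tables, assuming you have already decided that first name, last name and date of birth must match exactly (after the Dave/David-style fixes above) and have re-pointed or removed the child Client_Address rows. This is a sketch, not a drop-in script:

DELETE FROM Client_Header h
 WHERE h.id NOT IN (
       SELECT MIN(h2.id)                       -- the record to keep in each block
         FROM Client_Header h2
        GROUP BY UPPER(h2.Client_First_Name),
                 UPPER(h2.Client_Last_Name),
                 h2.Client_Date_Of_Birth
       );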
Some years ago I had a similar task, and it took me about a year to clean the data.
What I did in short:
Send the addresses to api.addressdoctor.com for validation and splitting into single fields (it is also possible with maps.googleapis.com).
Use a first-name and last-name match list to check the names (we used namepedia.org). A lot depends on the quality of this list; it should be based on the country of birth or of the first address. From the results we derived a probability of what kind of name it is (first/last/company).
With this improved data you should create some normalized and fuzzy attributes: normalized fields from the names and address, e.g. upper-cased and reduced to alphanumeric characters only.
At the end I would change the data model a little to improve data quality by design. I recommend adding pre-title, post-title, middle-name and post-name fields. You should also add the split address fields, like street, street number, zip, location, longitude, latitude, etc.
I would also change the relation between Client_Header and Client_Address to use an extra Address_Id as the primary key, but this depends on the requirements. Finally, I would add some constraints to prevent duplicated entries.
After all that, the deduplication is not hard. Just group all the normalized or fuzzy data together and create a dense_rank (I group by person, household, ...). Make a ranking over the attributes (I used data quality, data fill rate and transaction history for a score value). Finally it is your choice whether you just delete the duplicates and copy the corresponding data to the surviving client, or virtually connect the data via Client_Id in an extra field.
For insert and update processes you should create PL/SQL functions that check whether the fuzzy last name (or first name) plus the fuzzy address already exists. Split the name and address fields, check them against the address APIs and match them against the names reference. If it is a single-tuple data entry, show the best results to the user and let them decide.
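To sketch the dense_rank idea: column names like norm_last_name, norm_first_name, norm_address and fill_score are hypothetical attributes you would have built in the normalization and scoring steps above.

SELECT h.id,
       DENSE_RANK() OVER (ORDER BY h.norm_last_name, h.norm_first_name, a.norm_address) AS cluster_id,
       ROW_NUMBER() OVER (PARTITION BY h.norm_last_name, h.norm_first_name, a.norm_address
                          ORDER BY h.fill_score DESC) AS survivor_rank   -- 1 = record to keep
  FROM Client_Header  h
  JOIN Client_Address a ON a.Client_Id = h.id;

Everything with the same cluster_id is a candidate duplicate group; rows with survivor_rank greater than 1 are the ones to merge or delete.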

Creating Table Relationships

I am working on a VB.net (VS-2010, Win XP Pro 2 SP3) Employee Management project. I need to keep track of Employee Leave Attendance and also each piece of Equipment assigned to an Employee. How can I achieve this using SQLite?
It would be very useful if you could provide examples, as I am completely new to SQL and VB.net.
I think this can be done with two tables where one has the primary key while the other has a foreign key, but I am not sure. Also, how many tables will I need for storing data in the Leave and Equipment forms?
I went through other questions but I was unable to figure out a solution for my problem.
(Sorry, I cannot provide images as this site prevents me from posting images without 10 rep.)
Most problems are only as complex, or as simple, as you make them. Out of habit, nearly all tables end up with a unique ID field. There are exceptions, which I will call "link" tables, i.e. ones that provide connection details between two data tables.
Now, in your scenario:
You would need a "holiday" table, where each row contains the employee's unique ID and either a start/finish date (e.g. if they take half a day, it needs to be visible) or just a year and a value (e.g. in 2011 I booked 2 lots of 35 hours and 1 lot of 4 hours, i.e. I've taken two weeks and half a day).
For the equipment, you would need a data table. Since an item can only go to one employee, it depends whether you're going to use this for booking or not; if it's just like a library (e.g. I currently have a loaner laptop), then you can simply have an employee field in the equipment table. If you need a booking system, then you would need link tables and a more complex design.
The best way to work out your tables is to try to group your data, then write the items on pieces of paper and see how you, as a human, would organise them. After a while you'll be able to do it in your head.
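To give you a starting point, the tables might look roughly like this in SQLite (all names are just examples, not a prescription):

CREATE TABLE Employee (
    EmployeeId  INTEGER PRIMARY KEY,
    Name        TEXT
);

CREATE TABLE Leave (
    LeaveId     INTEGER PRIMARY KEY,
    EmployeeId  INTEGER REFERENCES Employee(EmployeeId),
    StartDate   TEXT,   -- SQLite stores dates as text, numbers or Julian days
    EndDate     TEXT,
    Hours       REAL    -- so a half day stays visible
);

CREATE TABLE Equipment (
    EquipmentId INTEGER PRIMARY KEY,
    Description TEXT,
    EmployeeId  INTEGER REFERENCES Employee(EmployeeId)   -- NULL = not currently assigned (library-style)
);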

Adding new fields vs creating separate table

I am working on a project where there are several types of users (students and teachers). Currently to store the user's information, two tables are used. The users table stores the information that all users have in common. The teachers table stores information that only teachers have with a foreign key relating it to the users table.
users table
id
name
email
34 other fields
teachers table
id
user_id
subject
17 other fields
In the rest of the database, there are no references to teachers.id. All other tables that need to relate to a user use users.id. Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
e.g.
users
id
name
email
subject
51 other fields
Is this too many fields for one table? Will this impede performance?
I think this design is fine, assuming that most of the time you only need the user data, and that you know when you need to show the teacher-specific fields.
In addition, you get only teachers just by doing a JOIN, which might come in handy.
Tomorrow you might have another kind of user who is not a teacher, and you'll be glad of the separation.
Edited to add: yes, this is an inheritance pattern, but since he didn't say what language he was using I didn't want to muddy the waters...
In the rest of the database, there are no references to teachers.id. All other tables that need to relate to a user use users.id.
I would expect relating to the teacher_id for classes/sections...
Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
Are you modelling a system for a high school, or post-secondary? Reason I ask is because in post-secondary, a user can be both a teacher and a student... in numerous subjects.
I would think it fine provided neither you nor anyone else succumbs to the temptation to reuse 'empty' columns for other purposes.
By this I mean that in your new table there will be columns that are only populated for teachers. Someone may decide that there is another value they need to store for non-teachers and use one of the teacher columns to hold it, because after all it will never be needed for this non-teacher, and that way we don't need to change the table; pretty soon your code fills up with tests on row types to work out what each column holds.
I've seen this done on several systems. For instance, when loaning a library book, if the loan is a long loan the due date holds the date the book is expected back, but if it's a short loan the due date holds the time it's expected back, and woe betide anyone who doesn't somehow know that.
It's not too many fields for one table (although without any details it does seem kind of suspicious). And worrying about performance at this stage is premature.
You're probably dealing with very few rows and a very small amount of data. Your concerns should be 1) getting the job done, 2) designing it correctly, 3) performance, in that order.
It's really not that big of a deal (at this stage/scale).
I would not stuff all fields in one table. Student to teacher ratio is high, so for 100 teachers there may be 10000 students with NULLs in those 17 fields.
Usually, a model would look close to this:
In your case, there are no specific fields for students, so you can omit the Student table, and the model would look like this:
Note that for inheritance modeling, the Teacher table has UserID, same as the User table; contrast that to your example which has an Id for the Teacher table and then a separate user_id.
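In SQL terms the inheritance version would be roughly the following (only the key columns shown; the other field names are placeholders):

CREATE TABLE users (
    id      INTEGER PRIMARY KEY,
    name    VARCHAR(100),
    email   VARCHAR(100)
    -- ...the other shared fields
);

CREATE TABLE teachers (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),   -- same value as users.id, no separate teacher id
    subject VARCHAR(100)
    -- ...the other teacher-only fields
);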
It won't really hurt performance, but the other programmers might hurt you if you don't redesign it :) (a 55-field table??)

SQL Design - Accounting for unknown values

Say I have a table called HoursCharged
ChrgNum(varchar(10))
CategoryID(uniqueidentifier)
Month(datetime)
Hours(int)
CategoryID is a foreign key reference to another table in my database, which is just a name/ID pairing. ChrgNum is guaranteed to be unique outside of this database, and so I only check to see if it exists in my db already.
You should also know that this tool supports several different groups from one database (hence the globally unique CategoryID; since different groups may name their Categories the same thing, I needed to differentiate them).
This table is populated from a CSV file. The idea is that every combination of ChrgNum, CategoryID, and Month is going to be unique. The report that is being run to create the import file can only see a certain range (i.e. a year). Therefore the algorithm looks something like this:
IF (ChrgNum exists in database, CategoryID exists in database,
    combo of ChrgNum/CategoryID/Month DOES NOT exist in table HoursCharged)
THEN add a new row for this entry
ELSE IF (ChrgNum exists in database, CategoryID exists in database,
    combo of ChrgNum/CategoryID/Month DOES exist in table HoursCharged)
THEN update the existing row with the new Hours value.
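In SQL terms that logic is essentially an upsert; if the database supports MERGE it could be written roughly like this (CsvStaging is a hypothetical staging table holding the parsed CSV rows, and the syntax shown is SQL Server style):

MERGE HoursCharged AS target
USING CsvStaging AS src
   ON target.ChrgNum = src.ChrgNum
  AND target.CategoryID = src.CategoryID
  AND target.[Month] = src.[Month]
WHEN MATCHED THEN
    UPDATE SET target.Hours = src.Hours
WHEN NOT MATCHED THEN
    INSERT (ChrgNum, CategoryID, [Month], Hours)
    VALUES (src.ChrgNum, src.CategoryID, src.[Month], src.Hours);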
This is all fine and good, except now my boss wants me to account for hours charged, in a particular month, for a known ChrgNum and an unknown Category.
My question is, how do I account for this? If I simply insert a NULL CategoryID, what happens if a totally separate group charges hours to the same number and category? My other idea was to create a new table for the unknown Categories, but if I do this, and the first import has two unknown categories while the next one has one of the two again (which can happen), what do I do?
My head has been swirling around this for hours. Any help is appreciated!
Your boss presented you with this problem, so why not ask them? This really sounds like a business problem. If you have groups that are reporting on categories you don't know about, then surely you should be attempting to synchronize your database with the systems that feed it?
Otherwise, what is wrong with having a single "Unknown" category? You are being asked to track the hours assigned to categories you don't currently track.
It seems to me like you should be adding the unrecognized categories to the existing categories table on the fly. If the problem is then distinguishing between categories with the same name from different groups, don't you already have that problem now?
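For instance, the import step could register an unseen category on the fly before inserting the hours row. A sketch in SQL Server style syntax, where the Categories table name, the GroupID column and the @variables are all assumptions about your schema:

IF NOT EXISTS (SELECT 1 FROM Categories
               WHERE GroupID = @GroupID AND Name = @CategoryName)
    INSERT INTO Categories (CategoryID, GroupID, Name)
    VALUES (NEWID(), @GroupID, @CategoryName);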