Check My Database Design / PHP/MySQL - sql

I'm currently working on improving my database to make room for growth. As it stands, different users have different 'permissions' to areas of the website. Some users have permissions to multiple areas of the website.
I'd like some feedback if I'm doing this in the most efficient way:
tblUsers:
usrID usrFirst usrLast phone //etc....
1 John Doe
2 Jane Smith
3 Bill Jones
tblAreas:
id name
1 Marketing
2 Support
3 Human Resources
4 Media Relations
tblPermissions:
id usrID areaID
1 1 2
2 1 4
3 2 1
4 3 3
Right now, for each "area", I have separate directories. However, I'd like to minimize all of these directories down to one main directory, and then redirect users on logging in to their appropriate 'area' based upon their permissions.
Does it sound like I'm doing this correctly? I've never created a multi-layered site with different permissions and different groups of people, thus, I'm certainly open to learning more on how to do this correctly.
Thanks very much!

The general design is OK. The issues that jump out at me relate to naming.
SQL doesn't need Hungarian notation -- generally considered unnecessary / bad (tblUsers -> users).
I wouldn't prefix table-names to column-names ...
... except for column "id" which should always include your table name (i.e. areaId)
Your "first" and "last" column don't make sense (hint: firstName)
I'd rename tblPermissions -> userAreas
Depending on your programming language and database, I'd also recommend using underscore instead of capitalization for your table/column-names.
As for using separate directories for different groups, I'd advise against it. Have the security-checks in your code instead of your directory layout.
Reasoning:
What happens when somebody decides that support is also allowed to do some marketing stuff? Should you change your code, or add a record into your database?
Or what if you have overlapping actions?
@brianpeiris: A couple of things come to mind:
No need for column aliases in JOINs
Makes it easier to search through code ("foo_id" gives fewer results than "id")
JOIN USING (foo_id) instead of JOIN ON (foo.id=bar.id).
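To make the naming advice concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PHP/MySQL, purely so it runs on its own. The snake_case table and column names (users, areas, user_areas, user_id, area_id) are my renaming of the question's tables, not anything the asker has today. Because the key columns carry the same name on both sides, the joins can use USING:

```python
import sqlite3

# In-memory database with the question's sample data, renamed per the advice above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE areas (area_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_areas (
        user_id INTEGER REFERENCES users(user_id),
        area_id INTEGER REFERENCES areas(area_id),
        PRIMARY KEY (user_id, area_id)
    );
    INSERT INTO users VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Smith'), (3, 'Bill', 'Jones');
    INSERT INTO areas VALUES (1, 'Marketing'), (2, 'Support'),
                             (3, 'Human Resources'), (4, 'Media Relations');
    INSERT INTO user_areas VALUES (1, 2), (1, 4), (2, 1), (3, 3);
""")

# Identically named key columns let both joins use USING, with no aliases needed.
rows = conn.execute("""
    SELECT first_name, name
    FROM users
    JOIN user_areas USING (user_id)
    JOIN areas USING (area_id)
    ORDER BY user_id, area_id
""").fetchall()

for first_name, area_name in rows:
    print(first_name, "->", area_name)
```

MySQL accepts the same JOIN ... USING syntax, so the query should carry over unchanged.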

The schema looks fine.
I would suggest that you put access control in the controller and base it on the URL path, so that you are not coding it into every section.

Yes, this seems like it is addressing your need perfectly from the database side.
The challenge will be using the data as simply and declaratively as possible. Where is the right place to declare what "area" you are in? Does each page do this, is there a function that calculates it, or can your controllers do it, as someone suggests? The second part is evaluating the current user against this. Ideally you end up with a single function like "security_check_for_area(4)" that does it all.
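A rough sketch of what such a single check could look like, again in Python with sqlite3 so it is self-contained. The user_areas table name and the extra conn/user_id arguments are assumptions on my part; in a PHP app the current user would more likely come from the session.

```python
import sqlite3

def security_check_for_area(conn, user_id, area_id):
    """Return True if a permission row links this user to this area."""
    row = conn.execute(
        "SELECT 1 FROM user_areas WHERE user_id = ? AND area_id = ?",
        (user_id, area_id),
    ).fetchone()
    return row is not None

def areas_for_user(conn, user_id):
    """List the area ids a user may enter, e.g. to pick a redirect after login."""
    rows = conn.execute(
        "SELECT area_id FROM user_areas WHERE user_id = ? ORDER BY area_id",
        (user_id,),
    )
    return [r[0] for r in rows]

# Demo data copied from the question's tblPermissions rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_areas (user_id INTEGER, area_id INTEGER,
                             PRIMARY KEY (user_id, area_id));
    INSERT INTO user_areas VALUES (1, 2), (1, 4), (2, 1), (3, 3);
""")

print(security_check_for_area(conn, 1, 4))  # John may enter Media Relations
print(areas_for_user(conn, 1))
```

Giving support access to marketing then becomes one INSERT into user_areas, with no code or directory changes.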

Related

Postgres: Is there a way to target specific tables based on your data?

I'm new to SQL and I'm currently thinking about an effective way to build out my database. It's a language learning application and I'm torn between two approaches:
Keeping all of my words, regardless of their language, in one giant words table
Splitting my words into separate tables based on their language, ie: words_french, words_italian, etc.
In the second scenario, are there approaches that I can use (perhaps within Postgres) that would allow me to target the words_french table in the event that I'm currently working through French lessons / content and need to look up associated French words?
I feel like there would be some sort of concat process like so: words_${language} and as of this moment I'd figure I'd have to resolve this within JS or something else on the frontend.
-- also, is breaking words and other content into their respective table_language even a valid approach?
Any ideas?
Use Option 1. Option 2 would be horribly difficult to work with.
Word table:
WordId Word Language
1 a English
2 un French
As Dimitar Spasovski suggests, if you have a need for additional attributes associated with the language, you should also have a Language table. Then replace the Language column in the Word with LanguageId to make the relationship.
Watching or reading some data modeling or data architecture classes online will help.
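For illustration, a minimal sketch of Option 1 with a separate language table. It uses Python's sqlite3 module only so that it is runnable as-is; the same DDL and query should work in Postgres with minor type changes. The names languages, words, language_id and words_for are assumptions of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (
        language_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE words (
        word_id INTEGER PRIMARY KEY,
        word TEXT NOT NULL,
        language_id INTEGER NOT NULL REFERENCES languages(language_id)
    );
    -- An index on (language_id, word) keeps per-language lookups cheap.
    CREATE INDEX words_by_language ON words (language_id, word);

    INSERT INTO languages VALUES (1, 'English'), (2, 'French');
    INSERT INTO words VALUES (1, 'a', 1), (2, 'un', 2);
""")

def words_for(conn, language_name):
    """Return all words of one language; the language is data, not a table name."""
    rows = conn.execute(
        """
        SELECT w.word
        FROM words w
        JOIN languages l USING (language_id)
        WHERE l.name = ?
        ORDER BY w.word
        """,
        (language_name,),
    )
    return [r[0] for r in rows]

print(words_for(conn, "French"))
```

The language is passed as an ordinary query parameter, so there is no need to build a table name like words_${language} on the frontend.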

SQL Server Text Searching

I have a business requirement where we need to do some crazy name matching against records stored in the database and I was wondering if there is any easy way to do it using SQL Server.
Name Stored in the DB : Austin K
Name to be Matched from UI : Austin Kierland
That's just a sample. In reality, there could be whole lot of different permutations and combinations.
If it were the other way round, I could've used a wildcard character, but in this case the name in the database is shorter than the search criteria.
Any suggestions?
Realistically - no. Databases were meant for comparing absolute values, not for messy comparisons. The way they store their data internally just isn't fit for really messy matching. Actually even a superpowerful dedicated search engine like Google, that has a LOT of messy matching features, wouldn't be able to pull off your example without prior knowledge.
I don't know how the requirement is precisely worded, but I'd either shoot the feature request down as "technically impossible", or implement a rule set for which messy matches are tried - for your example, you could easily 'hard code' that multiple searches are executed when capitalized words are entered, shortening them to a single letter. No idea if that's a solution to your problem though.
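As a rough sketch of that rule set (the function name and the exact rule are my own assumptions, not a known requirement), the application could expand the entered name into variants where each word is optionally cut down to its initial, then run an exact-match query per variant:

```python
from itertools import product

def name_variants(full_name):
    """Return the name with every combination of words shortened to its initial.

    The unmodified name comes first. The list doubles with each extra word,
    so this is only sensible for short inputs such as personal names.
    """
    words = full_name.split()
    # Each word may stay as typed or shrink to its first letter.
    options = [(w, w[0]) if len(w) > 1 else (w,) for w in words]
    return [" ".join(combo) for combo in product(*options)]

for candidate in name_variants("Austin Kierland"):
    print(candidate)
```

Each variant would then go into a parameterized WHERE name = ? lookup. This covers the "Austin K" case but nothing fuzzier, such as misspellings or swapped name order.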
You can do a normal search using the LIKE operator, which determines whether a character string matches a specified pattern. The problem you will run into is the likelihood of returning multiple records or the wrong people. I've had a similar requirement myself for a business app, and the best solution is to require other qualifying values rather than just a name. If you do a partial name search without other qualifying data, you are certain to come across false positives and/or multiple records. In my case I built a web service that checks eligibility, allowing a text search on first and last name but also requiring date of birth, primary person SSN, and gender, which ensured the matching person was indeed the person intended. If my situation were like yours, with name as the only search criterion, my recommendation to the business would be that we cannot perform the search until qualifying data is entered into the database; otherwise there is no accurate way to return the results they are looking for.

Database design problem

I have a problem creating a database schema for the following scenario:
(I’m not creating a dating site but just using this as an example)
A user logs on to a dating site and is given a multiple selection for the hair colour they’d like their date to have:
This is easy enough to model with the three tables below:
Tables:
User
{key}
HairColour
{key}
UserHairColour
{UserKey}
{HairColourKey}
However, the user also has the option to select ‘any’ which means they don’t care about hair colour and all hair colour should be included in the selection.
How do I give the user the ‘any’ option?
I could obviously select all hair colours and shove them into ’UserHairColour’ but what if I need to add a new hair colour in the future?
Absence of any records for this particular user in the UserHairColour table will indicate they do not care about the hair colour.
Absence of a decision indicates they have no preference. Obviously, it cannot mean they want their date to have no hair color at all.
I do not see here a need for a separate value or any extra table design. What you have allows you to achieve your goal in a simple way.
EDIT: As reaction to a proposed solution with ANY extra value.
The idea of "ANY" will conceptually interfere with the other selections. We are talking about presenting the user with a multitude of choices, ANY being one of them, and allowing them to select many. So the user can technically select ANY along with the other options, making it unclear what takes precedence - ANY or the specific options. I believe the approach of treating no records as the indicator of ANY is clearer - it can only be interpreted one way. No records - no preferred values. You obviously cannot interpret it the other way (no preferred value means the user does not want any value to be present), as that would mean transparent hair, which makes no sense. You could say it means no hair at all, but I would suggest having a separate option or a separate question for that.
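A minimal sketch of how the "no rows means any" rule could be queried, using Python's sqlite3 module so it runs standalone. The snake_case table and column names and the sample colours are assumptions of mine, based on the three tables in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hair_colour (hair_colour_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_hair_colour (user_key INTEGER, hair_colour_key INTEGER);
    INSERT INTO hair_colour VALUES (1, 'Black'), (2, 'Blonde'), (3, 'Red');
    -- User 10 picked Black and Red; user 20 has no rows, which means "any".
    INSERT INTO user_hair_colour VALUES (10, 1), (10, 3);
""")

def acceptable_colours(conn, user_key):
    """Colours the user accepts: their selections, or every colour if none."""
    rows = conn.execute(
        """
        SELECT name
        FROM hair_colour
        WHERE hair_colour_key IN (
                  SELECT hair_colour_key FROM user_hair_colour WHERE user_key = :u)
           OR NOT EXISTS (
                  SELECT 1 FROM user_hair_colour WHERE user_key = :u)
        ORDER BY hair_colour_key
        """,
        {"u": user_key},
    )
    return [r[0] for r in rows]

print(acceptable_colours(conn, 10))
print(acceptable_colours(conn, 20))
```

A colour added to hair_colour later is automatically accepted by users with no preference, which was the worry in the question.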
Given the example above, I would just add 'Any' or 'No Preference' as a selection and treat it as a specific hair color. This would work best because you could still add more specific hair colors later. Typically when I create new relational models I tend to add a -1 for the first key entry and keep the values for that row as my default go-to ones.
This would be better practice than just dummy'ing it out with a temp table or query in my opinion.
This should be simple to achieve. If the user chooses "Any", you simply handle it on the query:
select
*
from
User
left join
UserHairColour on UserHairColour.UserId=User.UserId
where
(@hairpreference = 'Any' OR UserHairColour.HairColourId=@hairpreference)
If you can set the input var @hairpreference to null instead of 'Any', then it gets easier:
where
(UserHairColour.HairColourId=COALESCE(@hairpreference, UserHairColour.HairColourId))
Declare a temp table, fill it with the color values and query like this:
SELECT *
FROM UserHairColor
JOIN User
ON User.id = UserHairColor.UserID
WHERE HairColorKey IN
(
SELECT ColorKey
FROM #mytable
)
UNION ALL
SELECT *
FROM UserHairColor
JOIN User
ON User.id = UserHairColor.UserID
AND NOT EXISTS
(
SELECT NULL
FROM #mytable
)
This will select all users with the requested hair colors, or all users if the table is empty.
If users can select any number of HairColours, I think, for consistency, it would be useful to shove a record in UserHairColours for every colour. If users can select only one, one of which is 'any', then I favour New in town's solution.
Put (PersonID, HairColorPreference) in a table of its own. If someone has no preference, just don't record a row in that table.
Use views to put together people with preference with just that preference, and people with no preference with all hair colors.
BTW, what are you going to do with people whose preference is "anything but purple"?
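One way the view idea above might look, as a sketch in Python with sqlite3. The table names, the view name effective_preference and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE hair_color (hair_color TEXT PRIMARY KEY);
    CREATE TABLE hair_color_preference (person_id INTEGER, hair_color TEXT);

    INSERT INTO person VALUES (1), (2);
    INSERT INTO hair_color VALUES ('black'), ('red');
    -- Person 1 wants red; person 2 recorded nothing, i.e. no preference.
    INSERT INTO hair_color_preference VALUES (1, 'red');

    CREATE VIEW effective_preference AS
        -- People with a preference keep exactly what they chose ...
        SELECT person_id, hair_color FROM hair_color_preference
        UNION ALL
        -- ... and people without one are paired with every colour.
        SELECT p.person_id, c.hair_color
        FROM person p CROSS JOIN hair_color c
        WHERE NOT EXISTS (
            SELECT 1 FROM hair_color_preference x
            WHERE x.person_id = p.person_id
        );
""")

rows = conn.execute(
    "SELECT person_id, hair_color FROM effective_preference "
    "ORDER BY person_id, hair_color"
).fetchall()
print(rows)
```

Matching queries can then join against the view and never need to know whether a preference was stored or implied.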
Since you are clearly not going to build a dating site, you might clarify whether the other answers here fulfill your need. My suggestion is to create another table that records whether a user has selected any hair color or no hair color at all (this sounds nonsensical in your example but may be meaningful in other situations).
By having following tables in your database you may accomplish this.
Users
HairColor
TypeOfColorSelection(1:Selected, 2:All, 3:Exclude, ...)
UserColorSelectionProfile(UserID, TypeOfColorSelection)
UserPreferredColor(UserID, HairColor)
If you want the hair color option to be mandatory then the no choice (empty set) option doesn't work.
This reminds me of the classic UK TV ads for Whiskas cat food. The strapline was originally,
Eight out of ten owners say their cat prefers it
Later, it was changed to
Eight out of ten owners who expressed a preference said their cat prefers it
[The italics are mine.]
Clearly, the results are skewed when failing to show the difference between implicitly and explicitly having no preference, otherwise why change a purrfectly good strapline for one that doesn't scan quite as well? QED ;)
My preference would be to use separate tables to model those who expressed a preference (along with the colour(s) they chose), those who expressed they had no preference and those who expressed no preference.
For a worked example, see How To Handle Missing Information Without Using NULL by Hugh Darwen.

First Name Variations in a Database

I am trying to determine what the best way is to find variations of a first name in a database. For example, I search for Bill Smith. I would like it return "Bill Smith", obviously, but I would also like it to return "William Smith", or "Billy Smith", or even "Willy Smith". My initial thought was to build a first name hierarchy, but I do not know where I could obtain such data, if it even exists.
Since users can search the directory, I thought this would be a key feature. For example, people I went to school with called me Joe, but I always go by Joseph now. So, I was looking at doing a phonetic search on the last name, either with NYSIIS or Double Metaphone, and then searching on the first name using this name hierarchy. Is there a better way to do this - maybe some sort of graded relevance using a full text search on the full name instead of a two part search on the first and last name? Part of me thinks that if I stored a name as a single value instead of multiple values, it might facilitate more search options at the expense of being able to address a user by the first name.
As far as platform, I am using SQL Server 2005 - however, I don't have a problem shifting some of the matching into the code; for example, pre-seeding the phonetic keys for a user, since they wouldn't change.
Any thoughts or guidance would be appreciated. Countless searches have pretty much turned up empty. Thanks!
Edit: It seems that there are two very distinct camps on the functionality and I am definitely sitting in the middle right now. I could see the argument of a full-text search - most likely done with a lack of data normalization, and a multi-part approach that uses different criteria for different parts of the name.
The problem ultimately comes down to user intent. The Bill / William example is a good one, because it shows the mutation of a first name based upon the formality of the usage. I think that building a name hierarchy is the more accurate (and extensible) solution, but is going to be far more complex. The fuzzy search approach is easier to implement at the expense of accuracy. Is this a fair comparison?
Resolution: Upon doing some tests, I have determined to go with an approach where the initial registration will take a full name and I will split it out into multiple fields (forename, surname, middle, suffix, etc.). Since I am sure that it won't be perfect, I will allow the user to edit the "parts", including adding a maiden or alternate name. As far as searching goes, with either solution I am going to need to maintain what variations exist, either in a database table, or as a thesaurus. Neither has an advantage over the other in this case. I think it is going to come down to performance, and I will have to actually run some benchmarks to determine which is best. Thank you, everyone, for your input!
In my opinion you should either do a feature right and make it complete, or you should leave it off to avoid building a half-assed intelligence into a computer program that still gets it wrong most of the time ("Looks like you're writing a letter", anyone?).
In case of human names, a computer will get it wrong most of the time, doing it right and complete is impossible, IMHO. Maybe you can hack something that does the most common English names. But actually, the intelligence to look for both "Bill" and "William" is built into almost any English speaking person - I would leave it to them to connect the dots.
The term you are looking for is Hypocorism:
http://en.wikipedia.org/wiki/Hypocorism
And Wikipedia lists many of them. You could bang out some Python or Perl to scrape that page and put it in a db.
I would go with a structure like this:
create table given_names (
id int primary key,
name text not null unique
);
create table hypocorisms (
id int references given_names(id),
name text not null,
primary key (id, name)
);
insert into given_names values (1, 'William');
insert into hypocorisms values (1, 'Bill');
insert into hypocorisms values (1, 'Billy');
Then you could write a function/sproc to normalize a name:
normalize_given_name('Bill'); --returns William
One issue you will face is that different names can have the same hypocorism (Albert -> Al, Alan -> Al)
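Here is a sketch of the normalize function on top of that schema, written in Python with sqlite3 so it can be run as-is. Unlike the one-line example above, it returns a list of candidates, which is my assumption about how to cope with the Albert/Alan -> Al ambiguity; the extra sample rows are mine too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE given_names (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE hypocorisms (
        id INTEGER REFERENCES given_names(id),
        name TEXT NOT NULL,
        PRIMARY KEY (id, name)
    );
    INSERT INTO given_names VALUES (1, 'William'), (2, 'Albert'), (3, 'Alan');
    INSERT INTO hypocorisms VALUES (1, 'Bill'), (1, 'Billy'), (2, 'Al'), (3, 'Al');
""")

def normalize_given_name(conn, name):
    """Return every formal given name the input may stand for.

    A formal name maps to itself, a hypocorism maps to each name it is
    registered under, and an unknown name yields an empty list.
    """
    rows = conn.execute(
        """
        SELECT g.name FROM given_names g WHERE g.name = :n
        UNION
        SELECT g.name
        FROM given_names g
        JOIN hypocorisms h ON h.id = g.id
        WHERE h.name = :n
        ORDER BY 1
        """,
        {"n": name},
    )
    return [r[0] for r in rows]

print(normalize_given_name(conn, "Bill"))
print(normalize_given_name(conn, "Al"))
```

A search for "Al Smith" would then look for any of the returned given names combined with the surname.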
I think your basic approach is solid. I don't think fulltext is going to help you. For seeding, behindthename.com seems to have large amount of the data you want.
Are you using SQL Server 2005 Express with Advanced Services? It sounds to me like you would benefit from Full Text indexing, and more specifically CONTAINS and CONTAINSTABLE. Here is a link covering the uses of CONTAINSTABLE:
http://msdn.microsoft.com/en-us/library/ms189760.aspx
and here is the download link for SQL Server 2005 With Advanced Services:
http://www.microsoft.com/downloads/details.aspx?familyid=4C6BA9FD-319A-4887-BC75-3B02B5E48A40&displaylang=en
Hope this helps,
Andrew
You can use the SQL Server Full Text Search and do an inflectional search.
Basically like:
SELECT ProductId, ProductName
FROM ProductModel
WHERE CONTAINS(CatalogDescription, ' FORMSOF(THESAURUS, metal) ')
Check out:
http://en.wikipedia.org/wiki/SQL_Server_Full_Text_Search#Inflectional_Searches
http://msdn.microsoft.com/en-us/library/ms345119.aspx
http://www.mssqltips.com/tip.asp?tip=1491
Not sure what your application is, but if your users know at the time of sign up that people from their past might be searching the database for them, you could offer them the chance in the user profile to define other names they might be known as (including last names, women change these all the time and makes finding them much harder!) and that they want people to be able to search on. Store these in a separate related table. Then search on that. Just make the structure such that you can define one name as the main name (the one you use for everything except the search.)
You'll find that you're dabbling in an area known as "Natural Language Processing" and you'll need to do several things, most of which can be found under the topic of stemming.
Simplistic stemming simply breaks the word apart, but more advanced algorithms associate words that mean the same thing - for instance Google might use stemming to convert "cat" and "kitten" to "feline" and search for all three, weighing the actual word provided by the user as slightly heavier so exact matches return before stemmed matches.
It's a known problem, and there are open source stemmers available.
-Adam
No, Full Text searches will not help to solve your problem.
I think you might want to take a look at some of the following links: (Funny, no one mentioned SoundEx till now)
SoundEx - MSDN
SoundEx - Google results
InformIT - Tolerant Search algorithms
Basically SoundEx allows you to evaluate the level of similarity in similar sounding words. The function is also available on SQL 2005.
As a side issue, instead of returning similar results, it might prove more intuitive to the user to use an AJAX based script to deliver similar sounding names before the user initiates his/her search. That way you can show the user "similar names" or "did you mean..." kind of data.
Here's an idea for automatically finding "name synonyms" like Bill/William. That problem has been studied in the broader context of synonyms in general: inducing them from statistics of which words commonly appear in the same contexts in a large text corpus like the Web. You could try combining that approach with a list of names like Moby Names; I don't know if it's been done before.
Here are some pointers.

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. Makes testing convenient. Note, when you create your sql queries in code, don't concatenate in user input. Use your @variables!
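Generating the search in code does not have to mean a wall of if statements. A minimal sketch in Python follows, assuming a hypothetical Users table with Name and Nickname columns and a whitelist that maps search fields to column names; both are mine, for illustration only. User values only ever travel as parameters; the column names come from the whitelist, never from user input.

```python
# Hypothetical whitelist: search field name -> real column name.
SEARCHABLE = {"name": "Name", "nickname": "Nickname"}

def build_search(criteria):
    """Build a parameterized query from a dict of optional criteria.

    Fields set to None are skipped. Unknown fields raise KeyError, so
    nothing user-supplied is ever spliced into the SQL text.
    """
    clauses, params = [], []
    for field, value in criteria.items():
        if value is None:
            continue
        clauses.append(f"{SEARCHABLE[field]} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT Name, Nickname FROM Users WHERE {where}", params

sql, params = build_search({"name": "brian", "nickname": None})
print(sql)
print(params)
```

The loop replaces one if block per field, and supporting another searchable column is a one-line change to the whitelist.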
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(@name, Name) AND
Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
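The COALESCE function turns the null parameter into an identity comparison (Nickname = Nickname), which is true for every non-NULL value and so doesn't restrict the results. Note that rows where the column itself is NULL are still filtered out, because NULL = NULL does not evaluate to true in SQL.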
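Since the asker mentions having NULLs all over the place, here is a small check of how the COALESCE form behaves on NULL columns, next to a variant that tests the parameter itself. It runs on sqlite3 for convenience; the comparison semantics are standard SQL, but the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (Name TEXT, Nickname TEXT);
    INSERT INTO Users VALUES ('brian', NULL), ('brian', 'bri'), ('adam', 'ad');
""")

# The form from the answer: a NULL parameter becomes "column = column".
COALESCE_FORM = """
    SELECT Name, Nickname FROM Users
    WHERE Name = COALESCE(:name, Name)
      AND Nickname = COALESCE(:nick, Nickname)
    ORDER BY Name, Nickname
"""

# Alternative: skip the comparison entirely when the parameter is NULL.
NULL_SAFE_FORM = """
    SELECT Name, Nickname FROM Users
    WHERE (:name IS NULL OR Name = :name)
      AND (:nick IS NULL OR Nickname = :nick)
    ORDER BY Name, Nickname
"""

args = {"name": "brian", "nick": None}
coalesce_rows = conn.execute(COALESCE_FORM, args).fetchall()
null_safe_rows = conn.execute(NULL_SAFE_FORM, args).fetchall()

print(coalesce_rows)   # the row with a NULL Nickname is missing
print(null_safe_rows)  # both 'brian' rows are returned
```

With nullable columns, the (:param IS NULL OR column = :param) form is probably the safer of the two.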
Search and normalization can be at odds with each other. So probably the first thing would be to get some kind of "view" that shows all the fields that can be searched as a single row, with a single key getting you the resume. Then you can put something like Lucene in front of that to give you a full-text index of those rows. The way that works is, you ask it for "x" in this view and it returns the key to you. It's a great solution and comes recommended by Joel himself on the podcast, within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume that will be composed of several fields:
Name,
Address,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINS.
Instead you could change your frame of reference and think of the whole resume as what it is, a single document, and you just want to search said document.
This is what tools like Sphinx Search do. They create a full-text index of your 'document', and then you can query Sphinx and it will tell you where in the database that record was found.
The search results are really good.
Don't worry about these tools not being part of your RDBMS; it will save you a lot of headaches to use the appropriate model ("documents") rather than the incorrect one ("tables") for this application.