This was asked in an interview, where I had to store multiple phone numbers for each employee. I answered that we could keep a comma-separated string of numbers. The follow-up was: what if the string becomes really long (say, hypothetically, 1000 numbers)? Come up with a better solution. I was clueless. Could someone suggest the correct approach to this problem?
EDIT: I did suggest we freeze the number of columns at some maximum and fill them in as needed, but that would lead to too many NULL values in most cases, so it would have been a bad design.
EDIT: I just wanted to know whether there exists some way of solving this problem other than adding a new table, as suggested in one of the comments below (which I did give as an answer).
BTW, is this some trick on the interviewer's part, or does another solution actually exist?
How about a simple 1:n relation? Create a separate table for the phone numbers, like this:
Phone_Numbers(id, employee_id, phone_number_type, phone_number)
This way you can add thousands of phone numbers for each employee and not have a problem.
In general: it is never a good idea to store a comma-separated anything in a database field. You should read up on database normalization. Usually 3NF is a good compromise.
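A minimal sketch of that design, following the layout above (table and column names are illustrative):

```sql
-- Each employee row can have any number of phone rows pointing at it.
CREATE TABLE employees (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE phone_numbers (
    id                INTEGER PRIMARY KEY,
    employee_id       INTEGER NOT NULL REFERENCES employees(id),
    phone_number_type VARCHAR(20),          -- e.g. 'home', 'mobile'
    phone_number      VARCHAR(30) NOT NULL
);

-- All numbers for one employee, however many there are:
SELECT phone_number
FROM phone_numbers
WHERE employee_id = 42;
```

Adding a thousandth number is then just one more INSERT, with no schema change and no NULL-padded columns.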
Here, phone number is a multi-valued attribute. You could use comma-separated values with an upper and lower bound on the number of entries to keep things sensible, but since your interviewer asked about 1000 numbers, it is better to make the table atomic and create a new row for every phone number. This increases the number of rows, and you can then normalize. Because this is a case of multi-valued dependency, you have to go up to 4NF to resolve it.
You said you wanted to store a long string in the DB. In that case it need not be a relational DB; a NoSQL store could work instead. If the string is very long, you can store the difference between consecutive numbers instead of storing each one in full, and this way you can save disk space.
e.g. to store 12345, 12346, 12347, 12358
you can store 12345, 1, 1, 11 (each value after the first is the difference from the previous number)
I have a survey that ask the question of availability via check boxes like so:
I am available (please check all that apply):
[] Early Mornings
[] Mid Mornings
[] Early Afternoons
[] Mid Afternoons
[] Evenings
[] Late Evenings
[] Overnight
That I need to translate into a SQL database. My question is: what is the best way to store this data in one column? I was thinking of a 7-digit bit string like 0010001 (indicating the candidate is only available during Early Afternoons and Overnight). Is there a better way? Thanks for any opinions!
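If it really must be one column, that bit idea can also be stored as an integer bitmask rather than a 7-character string, which lets you query it with bitwise operators. A sketch (column name and bit assignments are illustrative; `&` is e.g. SQL Server/MySQL syntax):

```sql
-- Bit values by list order: 1 = Early Mornings, 2 = Mid Mornings, 4 = Early Afternoons,
-- 8 = Mid Afternoons, 16 = Evenings, 32 = Late Evenings, 64 = Overnight.
-- Early Afternoons + Overnight is stored as 4 + 64 = 68.
SELECT *
FROM candidates
WHERE availability & 64 <> 0;   -- everyone available Overnight
```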
A separate table for the options and a "join table" of options to the candidate. The other solutions/suggestions will impede data integrity and performance in a relational database. If you've got another DB it might be different but don't do anything other than the relational table if you're using SQL.
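A sketch of that layout (names are illustrative, and an existing candidates table is assumed):

```sql
CREATE TABLE availability_options (
    id    INTEGER PRIMARY KEY,
    label VARCHAR(30) NOT NULL    -- 'Early Mornings', 'Mid Mornings', ...
);

-- The "join table": one row per checked box per candidate.
CREATE TABLE candidate_availability (
    candidate_id INTEGER NOT NULL REFERENCES candidates(id),
    option_id    INTEGER NOT NULL REFERENCES availability_options(id),
    PRIMARY KEY (candidate_id, option_id)
);

-- Candidates available Overnight:
SELECT ca.candidate_id
FROM candidate_availability ca
JOIN availability_options o ON o.id = ca.option_id
WHERE o.label = 'Overnight';
```

Unlike a bit string or delimited field, this stays queryable and indexable, and adding an eighth option is just a new row in availability_options.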
Pipe delimited flags.
Make the column a fairly wide text column, then store:
'Early Mornings|Evenings|Overnight'
if those 3 choices were selected.
(Note: I do agree with the other answer that it is likely better to use separate columns, but this is how I'd do it if there were a good reason to want just 1 column)
Is there any particular reason the results need to be stored in one column? If so, your solution is probably the best way. EDIT: If you are going to be querying this data, your solution is the best way; otherwise follow the other answer using "|" to separate the strings in one long varchar field, though anyone looking at that data will have no clue what it means unless they've taken the time to memorize each question in order.
If it doesn't need to be all in one column I'd recommend just creating a column for each question with a bit value similar to what you already want to do.
I'm working on a problem in which I have to mask/replace (I know they are different things) some data such as credit card numbers, account numbers, dates of birth, etc. with a particular pattern.
For example, if a credit card number is 123/456/789, it will show ###/###/### in the front end.
The solution I thought of is to use the regexp_replace function, and it works, but it takes too much time, the query is tedious, and it produces a new column for each pattern (I need to match more than 75 patterns for credit card and account numbers alone, and future patterns will also come).
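For reference, the per-pattern approach described here looks roughly like this (Oracle-style REGEXP_REPLACE; the table and column names are made up):

```sql
-- One REGEXP_REPLACE call per pattern; with 75+ patterns this is
-- exactly the tedious, slow query described above.
SELECT REGEXP_REPLACE(credit_card_no,
                      '[0-9]{3}/[0-9]{3}/[0-9]{3}',
                      '###/###/###') AS masked_cc
FROM customer_data;
```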
Secondly, is it possible to create a table in which we store all the patterns and refer to that table using a dynamic SQL query? (Assuming we get table-create access; I don't know how to do this.)
Thirdly, we could use a procedure to mask the data (not replace it with a pattern) by generating random numbers to protect the data. (I don't think the senior members will agree to this.)
If there is a better solution, please share. I also don't know whether all the credit card numbers, account numbers, etc. reside in one table or are spread across several; if the data is in more than one table, what would the solution be?
Detailed explanation needed....
From a design point of view these data points should have been stored in dedicated columns, for example a column for credit card numbers. Is that not the structure of this table? If it is, why would you even include that column in your query? If cc numbers, etc. are mixed in with other columns, you may want to take the time to restructure the table if you plan to keep using it going forward.
Continuing on the assumption that they are stored in the same column: you are really risking a breach of PII by relying on a replace function to remove sensitive information. Consider other options for accessing the data you need so that you don't expose confidential information due to a mistake in data entry.
We have a database where our customer has typed "Bob's" one time and "Bob’s" another time. (Note the slight difference between the single-quote and apostrophe.)
When someone searches for "Bob's" or "Bob’s", I want to find all cases regardless of what they used for the apostrophe.
The only thing I can come up with is looking at people's queries and replacing every occurrence of one or the other with (’|'') (Note the escaped single quote) and using SIMILAR TO.
SELECT * from users WHERE last_name SIMILAR TO 'O(’|'')Dell'
Is there a better way, ideally some kind of setting that allows these to be interchangeable?
You can use regexp matching:
with a_table(str) as (
values
('Bob''s'),
('Bob’s'),
('Bobs')
)
select *
from a_table
where str ~ 'Bob[''’]s';
str
-------
Bob's
Bob’s
(2 rows)
Personally I would replace all apostrophes in a table with one query (I had the same problem in one of my projects).
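A one-off cleanup of existing rows could look like this (assuming the users table and last_name column from the question):

```sql
-- Replace every typographic apostrophe with a plain one.
UPDATE users
SET last_name = replace(last_name, '’', '''')
WHERE last_name LIKE '%’%';
```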
If you find that both of the cases above are valid and present the same information, then you might actually consider taking care of your data before it arrives in the database. That means you could replace one sign with the other in your application code or in a before-insert trigger.
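A sketch of such a trigger in PostgreSQL (function and trigger names are illustrative; EXECUTE FUNCTION is PostgreSQL 11+ syntax, older versions use EXECUTE PROCEDURE):

```sql
CREATE FUNCTION normalize_apostrophe() RETURNS trigger AS $$
BEGIN
    -- Map the typographic apostrophe to the plain one before the row is stored.
    NEW.last_name := replace(NEW.last_name, '’', '''');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_normalize_apostrophe
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION normalize_apostrophe();
```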
If you have more cases like the one you've mentioned, then pattern-matching queries like these are, unfortunately, the way to go.
You could also consider giving hints to your customer: add logic that fetches records from the database and returns the closest matches, if there are any, to avoid such problems.
I'm afraid there is no setting that makes two of these symbols the same in DQL of Postgres. At least I'm not familiar with one.
Basically, I am the new IT guy; the old guy left a right mess for me! We have an MS Access DB which stores the answers to an online questionnaire. This particular DB has about 45,000 records, and each questionnaire has 220 questions. The old guy, in his wisdom, decided to store the answers to the questionnaire questions as text even though the answers are 0-5 integers!
Anyway, we now need to add a load of new questions, taking the questionnaire up to 240 questions. The 255-field limit in Access, plus the 30-odd columns of biographical data also stored in this database, means that I need to split the DB.
So, I have managed to get all the bio info quite happily into a new table with:
SELECT id,[all bio column names] INTO resultsBioData FROM results;
This didn't cause too much of a problem, as I am not casting anything, but for the question data I want to convert it all to integers. At the moment I have:
SELECT id,CInt(q1) AS nq1.......CInt(q220) AS nq220 INTO resultsItemData FROM results;
This seems to work fine for about 400 records but then just stops. I thought it might be hitting something it can't convert to an integer, so I wrote a little Java program that deleted any record where any of the 220 answers wasn't 0, 1, 2, 3, 4 or 5, and it still gives up at around 400 (never the same record, though!).
Anyone got any ideas? I am doing this on my test system at the moment and would really like something robust before I do it to our live system!
Sorry for the long-winded question, but it's doing my head in!
I'm unsure whether you're talking about doing the data transformation in Access or SQL Server. Either way, since you're redesigning the schema, now is the time to consider whether you really want resultsItemData table to include 200+ fields, from nq1 through nq220 (or ultimately nq240). And any future question additions would require changing the table structure again.
The rule of thumb is "columns are expensive; rows are cheap". That applies whether the table is in Access or SQL Server.
Consider one row per id/question combination.
id q_number answer
1 nq1 3
1 nq2 1
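In Access SQL that tall table, plus one append query per question, could look like this (a sketch; names are illustrative):

```sql
CREATE TABLE resultsItemData (
    id        LONG,
    q_number  TEXT(10),
    answer    INTEGER
);

-- One append query per question, converting the text answer to an integer:
INSERT INTO resultsItemData (id, q_number, answer)
SELECT id, 'nq1', CInt(q1) FROM results;
```

Adding question 241 later is then just one more append query, with no change to the table structure.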
I don't understand why your current approach crashes at 400 rows. I wouldn't even worry about that, though, until you are sure you have the optimal table design.
Edit: Since you're stuck with the approach you described, I wonder if it might work with an "append" query instead of a "make table" query. Create resultsItemData table structure and append to it with a query which transforms the qx values to numeric.
INSERT INTO resultsItemData (id, nq1, nq2, ... nq220)
SELECT id, CInt(q1), CInt(q2), ... CInt(q220) FROM results;
Try this solution:
select * into #tmp from bad_table
truncate table bad_table
alter table bad_table alter column silly_column int
insert bad_table
select cast(silly_column as int), other_columns
from #tmp
drop table #tmp
Reference: Change type of a column with numbers from varchar to int
Just wrote a small Java program in the end that created the new table and went through each record individually, casting the fields to integers. It takes about an hour and a half to do the whole thing, though, so I am still after a better solution for when I do this on the live system.
I need to combine data from multiple rows into one column.
For example, from this format:
ID Interest
1  Sports
1  Cooking
2  Movie
2  Reading
to this format:
ID Interest
1  Sports,Cooking
2  Movie,Reading
I wonder whether we can do that in MS Access SQL. If anybody knows, please help me out.
Take a look at Allen Browne's approach: Concatenate values from related records
As for the normalization argument, I'm not suggesting you store concatenated values. But if you want to join them together for display purposes (like a report or form), I don't think you're violating the rules of normalization.
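Usage would look something like this, assuming you've pasted Allen Browne's ConcatRelated() function into a standard module (table and field names here are made up, and ID is assumed numeric):

```sql
-- One row per ID, with the related Interest values joined by commas.
SELECT DISTINCT ID,
       ConcatRelated("Interest", "tblInterests", "ID = " & [ID]) AS Interests
FROM tblInterests;
```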
This is called de-normalizing data. It may be acceptable for final reporting. Apparently some experts believe it's good for something, as seen here.
(Mind you, kevchadder's question is right on.)
Have you looked into the SQL Pivot operation?
Take a look at this link:
http://technet.microsoft.com/en-us/library/ms177410.aspx
Just noticed you're using access. Take a look at this article:
http://www.blueclaw-db.com/accessquerysql/pivot_query.htm
This is nothing you should do in SQL and it's most likely not possible at all.
Merging the rows in your application code shouldn't be too hard.