DB Design: how to indicate if a column data is public or private - sql

I'm designing a database and I have a question about how to make some user data private.
I have a user table with name, city, birthday, biography, etc., and the user can mark some of that data as private (so that other users cannot see it).
First, I thought of adding columns to the table to indicate whether a column is private or not. For example:
User
-------
user_id | name | city | cityIsPrivate | birthday | birthdayIsPrivate
---------+------+------+---------------+----------+------------------
Or, another approach is to add a varchar column to indicate which columns are private:
User
-------
user_id | name | city | birthday | privateColumns
---------+------+------+----------+---------------
And this privateColumns column would contain something like: "city:NO; birthday:YES".
The user table will only have three columns that can be private or public, so I would only have to add three more columns to the table.
Any advice?

Do not move data into a separate table if you are going to have to join to it every time you query the main table.
Boolean (or equivalent) columns to indicate privacy for every column on which a privacy setting can be applied:
is very simple to add.
takes up practically no space.
is much quicker to query.
shows that the privacy settings are an intrinsic part of the user data.
removes unnecessary complexity from the schema.
The facts that these relate to other columns and that they represent a single kind of logical object are just a distraction. Go for simplicity, performance, and logical correctness.
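A sketch of this approach (column names taken from the question; the types, defaults, and the @viewed_user_id parameter are assumptions):

```sql
-- Privacy flags live next to the data they protect.
CREATE TABLE [User] (
    user_id            int IDENTITY PRIMARY KEY,
    name               nvarchar(100) NOT NULL,
    city               nvarchar(100) NULL,
    cityIsPrivate      bit NOT NULL DEFAULT 0,
    birthday           date NULL,
    birthdayIsPrivate  bit NOT NULL DEFAULT 0,
    biography          nvarchar(max) NULL,
    biographyIsPrivate bit NOT NULL DEFAULT 0
);

-- A viewer other than the owner only sees non-private values;
-- private columns simply come back as NULL.
SELECT user_id,
       name,
       CASE WHEN cityIsPrivate     = 0 THEN city     END AS city,
       CASE WHEN birthdayIsPrivate = 0 THEN birthday END AS birthday
FROM   [User]
WHERE  user_id = @viewed_user_id;
```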

Move the list of your private columns to a separate table with three fields, like:
UserId |ColumnName |IsPrivate
-----------------------------
Then you can join your queries with that table and get the proper result set for each user, while still being free to change the columns of your user table.
If your User table is not expected to change, it is better to move the privacy flags to a separate table with a fixed structure, like this:
UserId |cityIsPrivate |birthdayIsPrivate
----------------------------------------
Then you can join your user table with this table in a single query and get the result set you need.
But don't keep this information in the same table. Your first approach brings redundancy to your database structure, and in your second case you would not be able to filter SELECT queries by an IsPrivate criterion.

You can have a separate table (UserPrivateFields, for example) listing user ID's along with fields they have elected to make private, like so:
UserID | PrivateField
-------+-------------
1 | Name
1 | City
2 | Birthday
When you're running the procedure that grabs the user info for the person requesting it, you can build a query like this (assume the desired user's UserID is passed into the proc):
CREATE TABLE #PublicUserFields (PublicField varchar(50))

INSERT INTO #PublicUserFields (PublicField)
SELECT PublicField
FROM UserPublicFields
WHERE UserID = @userid

DECLARE @sql varchar(max) = 'SELECT '

WHILE EXISTS (SELECT 1 FROM #PublicUserFields)
BEGIN
    DECLARE @PublicField varchar(50) =
        (SELECT TOP 1 PublicField FROM #PublicUserFields)

    SET @sql = @sql + @PublicField

    -- add a separator unless this is the last remaining field
    IF (SELECT COUNT(*) FROM #PublicUserFields) > 1
    BEGIN
        SET @sql = @sql + ', '
    END

    DELETE FROM #PublicUserFields
    WHERE PublicField = @PublicField
END

SET @sql = @sql + ' FROM MainTable WHERE UserID = ' + CAST(@userid AS varchar(20))

EXEC (@sql)
The reason I'm bothering with dynamic SQL is that the names of your public fields can't be joined directly to the column names of the main table with this setup - they can only be selected or joined to other records with the same string value. You could perhaps get around this by joining to sys.columns and doing interesting things with the object_id of the columns, but that doesn't seem much easier than this approach.
This makes sense IF the users can all dynamically set which fields they want to be viewable by other people. If the private fields are known and static, you may just want to separate the two categories and tighten down the permissions on read-access on the private table.

Related

Add data behind string based on data in other column in SQL Server

I have the following challenge which I can't solve at the moment:
I have a column (LongTextfield1) in, let's say, table 'Data' that contains HTML data. Each row in table 'Data' is basically a document with text and pictures. On several occasions, multiple pictures are included in the same column (LongTextfield1). For each picture, the following code appears in the HTML, referring to the picture:
\Download.aspx?DocumentID={SomeGUID}
To make it a bit more challenging, this piece of code can be included in the same column multiple times, depending on how many pictures are included in the HTML (document). All pictures have their own GUID. So far, so good. This is already present in the column.
In the application in which I would like to show these pictures I also need a 'RecordID'. This is a unique ID per picture. So, the outcome should be:
\Download.aspx?DocumentID={SomeGUID}&RecordID=SomeID
I have a table 'DocumentData' that contains the unique GUID (DocumentID) and RecordID for each picture. So, I know which RecordID belongs to each GUID.
I need a stored procedure (or other automatic mechanism) that adds the RecordID behind the corresponding GUID. I have tried to use REPLACE to turn '{SomeGUID}' into '{SomeGUID}&RecordID=SomeID', but that didn't work. Below is the stored procedure, which is fed with data from the 'DocumentData' table. It keeps on replacing the same string with every possible GUID+RecordID combination (DocumentID={SomeGUID}&RecordID=1{SomeGUID2}&RecordID=2... etc.), and that's not what I want. @DocumentID and @Origin (RecordID) in the stored procedure refer to the 'DocumentData' table:
CREATE PROCEDURE [dbo].[Replace_Picture2] (
    @DocumentID uniqueidentifier,
    @Origin varchar(100)
) AS
BEGIN
    UPDATE Data
    SET LongTextField1 = REPLACE(
            CAST(LongTextField1 AS varchar(max)),
            LTRIM(CAST(@DocumentID AS varchar(100))) + '}',
            LTRIM(CAST(@DocumentID AS varchar(100))) + '}' + '&RecordID=' + @Origin + '"')
    WHERE pageid = 1
      AND groupid = 2
      AND subgroupid = 8
END
GO
How can I achieve to add the RecordID behind the DocumentID, based on the combinations present in table 'DocumentData'? Thanks for your help!
This was solved by matching on @DocumentID + '}"' instead of @DocumentID + '}' (note the '}"' instead of '}'). Matching the closing quote as well prevents the stored procedure from replacing the same, already-rewritten string over and over.
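For completeness, a cursor-driven sketch that feeds the existing procedure every GUID/RecordID pair from 'DocumentData', so each combination is applied exactly once (column names and the RecordID type are assumptions based on the question):

```sql
-- Loop over all GUID/RecordID pairs and let the procedure do one
-- targeted replace per pair.
DECLARE @DocumentID uniqueidentifier, @Origin varchar(100);

DECLARE pairs CURSOR LOCAL FAST_FORWARD FOR
    SELECT DocumentID, CAST(RecordID AS varchar(100))
    FROM DocumentData;

OPEN pairs;
FETCH NEXT FROM pairs INTO @DocumentID, @Origin;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.Replace_Picture2 @DocumentID = @DocumentID, @Origin = @Origin;
    FETCH NEXT FROM pairs INTO @DocumentID, @Origin;
END
CLOSE pairs;
DEALLOCATE pairs;
```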

Query performance optimization for dynamically joined columns

Current situation in SQL Server database
There is a table Entry with the following columns:
EntryID (int)
EntryName (nvarchar)
EntrySize (int)
EntryDate (datetime)
Further there should be the possibility to save additional metadata for an Entry. Names and values of these metadata should be free to choose and there should be the possibility to dynamically add those without changing the table structure of the database.
Each metadata key can be one of the following data types:
Text
Numeric value
DateTime
Boolean value (True/False)
Thus there is a table DataKey to represent the metadata names and datatypes with the following columns:
DataKeyID (int)
DataKeyName (nvarchar)
DataKeyType (smallint) 0: Text; 1: Numeric; 2: DateTime; 3: Bit
In table DataValue for each combination of Entry and DataKey values can be inserted depending on the data type of the metadata key. For each data type there is one nullable value column. This table has the following columns:
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
TextValue (nvarchar) Nullable
NumericValue (float) Nullable
DateValue (datetime) Nullable
BoolValue (bit) Nullable
Image of the database structure:
TARGET
Target is to retrieve a list of entries fulfilling the specifications like in a WHERE clause. Like the following example:
Assumption:
Meta data key KeyName1 is text
Meta data key KeyName2 is DateTime
Meta data key KeyName3 is numeric
Meta data key KeyName4 is Boolean
Query:
... WHERE (KeyName1 = 'Test12345' AND KeyName2 BETWEEN '01.09.2012 00:00:00' AND
'01.04.2013 23:59:00') OR (KeyName3 > 15.3 AND KeyName4 = True)
Target is to do these queries in a very efficient way, also with a large amount of data like
Number of entries > 2.000.000
Number of data keys between 50 and 100, or maybe > 100
Per entry at least a subset of values specified or maybe also a value for each key (2.000.000 * 100)
PROBLEM
The first problem arises when building the query. Normally a query runs against a set whose columns can be used directly in the WHERE clause. In this case the "columns" used in the queries are themselves rows in table DataKey, so that metadata can be added dynamically without having to change the database table structure.
During research a solution has been found using PIVOT table techniques at runtime. But it turned out that this solution is very slow when there is a large set of data in the database.
QUESTIONS
Is there a more efficient way or structure to save the data for this purpose?
How can the requirements listed above be fulfilled, also with regard to performance and time consumption when querying?
Here is a sql fiddle with the described database structure and some sample data: http://www.sqlfiddle.com/#!3/d1912/3
One of the fundamental flaws in an Entity Attribute Value design (which is what you have here) is the difficulty of efficient and performant querying.
The more efficient structure for storing data is to abandon EAV and use a normalised relational form. But that will necessarily involve changing the structure of the database when the data structures change (which should be self evident).
You could abandon your TextValue/NumericValue/DateValue/BoolValue fields and replace them with a single sql_variant column, which would reduce your query complexity slightly, but the fundamental problem will remain.
As a side note, storing all numerics as floats will cause problems if you ever have to deal with money.
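A minimal sketch of that single-column variant (table and column names from the question; the REFERENCES clauses are assumptions):

```sql
-- One sql_variant column replaces the four per-type value columns;
-- DataKeyType still records the intended type.
CREATE TABLE DataValue (
    DataValueID int IDENTITY PRIMARY KEY,
    EntryID     int NOT NULL REFERENCES Entry(EntryID),
    DataKeyID   int NOT NULL REFERENCES DataKey(DataKeyID),
    Value       sql_variant NULL
);

-- Filtering then needs a cast back to the comparable base type:
SELECT EntryID
FROM   DataValue
WHERE  DataKeyID = @numericKeyID
  AND  CAST(Value AS float) > 15.3;
```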
I don't feel qualified to comment on what is best, or on design approaches; in fact I'm inclined not to answer at all. However, you've taken the time to describe your problem clearly, I have thought about it, and this is how I would approach it.
I'd store each metadata datatype in its own table; So
Table MetaData_Text:
ID int identity
EntryID int
KeyName nvarchar(50)
KeyValue nvarchar(max)
MetaData_DateTime, MetaData_Boolean & MetaData_Numeric have the same structure as this, but with the appropriate different datatype of the KeyValue column in each case.
The relationship between an Entry & each of these tables is 0-Many; While every row in each of these tables belongs to one Entry.
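A sketch of one of the per-type tables (the other three differ only in the KeyValue datatype; the foreign key and index are assumptions):

```sql
-- One table per metadata datatype; KeyValue's type matches the datatype.
CREATE TABLE MetaData_Text (
    ID       int IDENTITY PRIMARY KEY,
    EntryID  int NOT NULL REFERENCES Entry(EntryID),
    KeyName  nvarchar(50) NOT NULL,
    KeyValue nvarchar(max) NULL
);

-- An index on (EntryID, KeyName) keeps the per-entry lookups cheap.
CREATE INDEX IX_MetaData_Text ON MetaData_Text (EntryID, KeyName);
```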
To add a new metadata item for an entry, I'd just use a stored procedure taking EntryID, keyname & having optional parameters of possible metadata datatype:
create procedure AddMetaData @entryid int, @keyname varchar(50), @textvalue varchar(max) = null, @datevalue datetime = null, @boolvalue bit = null, @numvalue float = null
as ...
For querying, I would define a set of functions to manage each type of (a) metadata datatype & (b) test needing to be performed on that datatype, for example:
create function MetaData_HasDate_EQ(@entryid int, @keyname varchar(50), @val datetime)
returns bit
as begin
    declare @rv bit
    select @rv = case when exists(
        select 1 from MetaData_DateTime
        where EntryID = @entryid and KeyName = @keyname and KeyValue = @val) then 1 else 0 end;
    return @rv
end
and incorporate function references into required query logic as per
SELECT ...
FROM entry e ...
WHERE (dbo.MetaData_HasText_EQ(e.EntryID, 'KeyName1', 'Test12345') <> 0
AND dbo.MetaData_HasDate_Btwn(e.EntryID, 'KeyName2', '01.09.2012 00:00:00', '01.04.2013 23:59:00') <> 0)
OR (dbo.MetaData_HasNum_GT(e.EntryID, 'KeyName3', 15.3) <> 0
AND dbo.MetaData_HasBool_EQ(e.EntryID, 'KeyName4', 1) <> 0)
I believe that performance issues with that kind of data structure may require the structure to be reworked.
However, I think this fairly simple dynamic sql allows you to query as desired, and appears to run reasonably fast in a quick test I did with over 100,000 rows in the Entry table and 500,000 in the DataValue table.
-- !! CHANGE WHERE CONDITION AS APPROPRIATE
--declare @where nvarchar(max)='where Key0=0'
declare @where nvarchar(max)='where Key1<550'
declare @sql nvarchar(max)='select * from Entry e';
select @sql=@sql
    +' outer apply (select '+DataKeyName+'='
    +case DataKeyType when 0 then 'TextValue' when 1 then 'NumericValue' when 2 then 'DateValue' when 3 then 'BoolValue' end
    +' from DataValue v where v.EntryID=e.EntryID and v.DataKeyID='+cast(DataKeyID as varchar)
    +') '+DataKeyName+' '
from DataKey;
set @sql+=@where;
exec(@sql);
You have not given any background information on how often the table is updated, how often new attributes are added, and so on...
Looking at your inputs, I think you could use a snapshot which flattens your normalised data. It is not ideal, as columns will need to be added manually, but it can be extremely fast. The snapshot could be updated regularly, at intervals depending on your users' needs.
First, to answer why people use EAV or KVP even though it is so inefficient query-wise: blogs and textbooks offer many plausible reasons, but in real life it is often to avoid dealing with an uncooperative DBA.
For a small organization with a small amount of data, it is OK to have a multi-use database (OLTP + DW), because the inefficiencies are not noticeable. When your database gets large, it's time to replicate your online data into a data warehouse. In addition, if the data is meant for analytics, it should be replicated further from your relational data warehouse into a dimensional model, or a flat-and-wide model for consumption.
These are the data models I would expect from a large organization:
OLTP
Relational Data Warehouse
Dimensional Model for Reporting
Datamarts for Analytics.
So to answer your question: you shouldn't query against your KVP tables, and creating a view on top of them doesn't make it better. The data should be flattened out (i.e. pivoted) into a physical table. What you have is a hybrid of #1 and #2. If there will be no users for #3, just build #4.
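A sketch of such a flattened snapshot, assuming the four example keys from the question; the SELECT ... INTO would be dropped and re-run on a schedule:

```sql
-- Pivot the KVP rows into one wide row per entry. Each output column
-- pulls from the value column matching that key's datatype.
SELECT e.EntryID, e.EntryName,
       MAX(CASE WHEN k.DataKeyName = 'KeyName1' THEN v.TextValue    END) AS KeyName1,
       MAX(CASE WHEN k.DataKeyName = 'KeyName2' THEN v.DateValue    END) AS KeyName2,
       MAX(CASE WHEN k.DataKeyName = 'KeyName3' THEN v.NumericValue END) AS KeyName3,
       -- bit cannot be aggregated directly, hence the cast
       MAX(CASE WHEN k.DataKeyName = 'KeyName4'
                THEN CAST(v.BoolValue AS tinyint) END)               AS KeyName4
INTO   EntrySnapshot
FROM   Entry e
LEFT JOIN DataValue v ON v.EntryID   = e.EntryID
LEFT JOIN DataKey   k ON k.DataKeyID = v.DataKeyID
GROUP BY e.EntryID, e.EntryName;
```

Queries like the one in the question then run against EntrySnapshot with ordinary indexes, at the cost of the snapshot being only as fresh as its last rebuild.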
Based on Dan Belandi's answer I think the easiest way to use this would be by having a stored procedure/trigger that looks at the meta-data table and builds a view on the Data-table accordingly.
Code would then look like this:
-- drop old view
IF object_id('EntryView') IS NOT NULL DROP VIEW [EntryView]
GO
-- create view based on current meta-information in [DataKey]
DECLARE @crlf char(2)
DECLARE @sql nvarchar(max)
SELECT @crlf = char(13) + char(10)
SELECT @sql = 'CREATE VIEW [EntryView]' + @crlf
            + 'AS' + @crlf
            + 'SELECT *' + @crlf
            + '  FROM [Entry] e'
SELECT @sql = @sql + @crlf
            + ' OUTER APPLY (SELECT ' + QuoteName(DataKeyName) + ' = ' + QuoteName((CASE DataKeyType WHEN 0 THEN 'TextValue'
                                                                                                     WHEN 1 THEN 'NumericValue'
                                                                                                     WHEN 2 THEN 'DateValue'
                                                                                                     WHEN 3 THEN 'BoolValue'
                                                                                                     ELSE '<Unknown>' END)) + @crlf
            + '   FROM [DataValue] v WHERE v.[EntryID] = e.[EntryID] AND v.[DataKeyID] = ' + CAST(DataKeyID as varchar) + ') AS ' + QuoteName(DataKeyName)
FROM DataKey
--PRINT @sql
EXEC (@sql)
-- Example usage:
SELECT *
FROM EntryView
WHERE (Key1 = 0 AND Key2 BETWEEN '01.09.2012 00:00:00' AND '01.04.2013 23:59:00')
OR (Key3 > 'Test15.3' AND Key4 LIKE '%1%')
I would use 4 tables, one for each data type:
MDat1
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
TextValue (nvarchar) Nullable
MDat2
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
NumericValue (float) Nullable
MDat3
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
DateValue (datetime) Nullable
MDat4
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
BoolValue (bit) Nullable
If I had partitioning available, I would use it on DataKeyID for all 4 tables.
Then I would use 4 views:
SELECT ... FROM Entry JOIN MDat1 ON ... EnMDat1
SELECT ... FROM Entry JOIN MDat2 ON ... EnMDat2
SELECT ... FROM Entry JOIN MDat3 ON ... EnMDat3
SELECT ... FROM Entry JOIN MDat4 ON ... EnMDat4
So this example:
WHERE (KeyName1 = 'Test12345' AND KeyName2 BETWEEN '01.09.2012 00:00:00' AND
'01.04.2013 23:59:00') OR (KeyName3 > 15.3 AND KeyName4 = True)
Goes like:
...EnMDat1 JOIN EnMDat3 ON ... AND EnMDat1.TextValue = 'Test12345' AND EnMDat3.DateValue BETWEEN '01.09.2012 00:00:00' AND
'01.04.2013 23:59:00'
...
UNION ALL
...
EnMDat2 JOIN EnMDat4 ON ... AND EnMDat2.NumericValue > 15.3 AND EnMDat4.BoolValue = True
This will work faster than one metadata table. However, you will need an engine to build the queries if you have many different WHERE-clause scenarios. You can also omit the views and write the statement from scratch each time.

Separating multiple values in one column in MS SQL

I have a field in an application that allows a user to select multiple values. When I query this field in the DB, if multiple values were selected the result gets displayed as one long word. There are no commas or space between the multiple selected values. Is there any way those values can be split by a comma?
Here’s my query:
SELECT HO.Value
FROM HAssessment ha
INNER JOIN HObservation HO
ON HO.AssessmentiD = ha.AssessmentID
AND HO.Patient_Oid = 2255231
WHERE Ho.FindingAbbr = 'A_R_CardHx'
------------------------------------------------
Result:
AnginaArrhythmiaCADCChest Pain
-------------------------
I would like to see:
Angina, Arrhythmia, CADC, Chest Pain
------------------------------------------
Help!
There's no easy solution to this.
The most expedient would be writing a string splitting function. From your sample data, it seems the values are all concatenated together without any separators. This means you'll have to come up with a list of all possible values (hopefully this is a query from some symptoms table...) and parse each one out from the string. This will be complex and painful.
A straightforward way to do this would be to test each valid symptom value to see whether it's contained within HObservation.Value, stuff all the found values together, and return the result. Note that this will perform very poorly.
Here's an example in TSQL. You'd be much better off doing this at the application layer, though, or better yet, normalizing your database (see below for more on that).
declare @symptoms table (symptom varchar(100))
insert into @symptoms (symptom)
values ('Angina'),('Arrhythmia'),('CADC'),('Chest Pain')

declare @value varchar(100)
set @value = 'AnginaArrhythmiaCADCChest Pain'

-- concatenate every symptom found in @value, then strip the leading ', '
declare @result varchar(100)
select @result = stuff((
    SELECT ', ' + s.symptom
    FROM @symptoms s
    WHERE patindex('%' + s.symptom + '%', @value) > 0
    FOR XML PATH('')), 1, 2, '')

select @result
The real answer is to restructure your database. Put each distinct item found in HObservation.Value (this means Angina, Arrhythmia, etc. as separate rows) in to some other table if such a table doesn't exist already. I'll call this table Symptom. Then create a lookup table to link HObservation with Symptom. Then drop the HObservation.Value column entirely. Do the splitting work in the application level, and make multiple inserts in to the lookup table.
Example, based on sample data from your question:
HObservation
------------
ID Value
1 AnginaArrhythmiaCADC
Becomes:
HObservation
------------
ID
1
Symptom
-------
ID Value
1 Angina
2 Arrhythmia
3 CADC
HObservationSymptom
-------------------
ID HObservationID SymptomID
1 1 1
1 1 2
1 1 3
Note that if this is a production system (or you want to preserve the existing data for some other reason), you'll still have to write code to do the string splitting.
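A sketch of the restructured schema described above (names taken from this answer; the exact types and keys are assumptions):

```sql
-- Each distinct symptom is stored once...
CREATE TABLE Symptom (
    ID    int IDENTITY PRIMARY KEY,
    Value varchar(100) NOT NULL UNIQUE
);

-- ...and linked to observations through a lookup table.
CREATE TABLE HObservationSymptom (
    HObservationID int NOT NULL,  -- references HObservation
    SymptomID      int NOT NULL REFERENCES Symptom(ID),
    PRIMARY KEY (HObservationID, SymptomID)
);

-- The per-observation symptom list then becomes a plain join:
SELECT hs.HObservationID, s.Value
FROM   HObservationSymptom hs
JOIN   Symptom s ON s.ID = hs.SymptomID;
```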

SQL query select from table and group on other column

I'm phrasing the question title poorly as I'm not sure what to call what I'm trying to do but it really should be simple.
I've a link / join table with two ID columns. I want to run a check before saving new rows to the table.
The user can save attributes through a webpage but I need to check that the same combination doesn't exist before saving it. With one record it's easy as obviously you just check if that attributeId is already in the table, if it is don't allow them to save it again.
However, if the user chooses a combination of that attribute and another one then they should be allowed to save it.
Here's an image of what I mean:
So if a user now tried to save an attribute with ID of 1 it will stop them, but I need it to also stop them if they tried IDs of 1 and 10, so long as both 1 and 10 had the same productAttributeId.
I'm confusing this in my explanation but I'm hoping the image will clarify what I need to do.
This should be simple so I presume I'm missing something.
If I understand the question properly, you want to prevent the combination of AttributeId and ProductAttributeId from being reused. If that's the case, simply make them a combined primary key, which is by nature UNIQUE.
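A minimal sketch of the combined key, assuming the join table is called MyJoinTable:

```sql
-- Any attempt to insert a duplicate (AttributeId, ProductAttributeId)
-- pair is then rejected by the engine itself.
ALTER TABLE MyJoinTable
ADD CONSTRAINT PK_MyJoinTable
    PRIMARY KEY (AttributeId, ProductAttributeId);
```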
If that's not feasible, create a stored procedure that runs a query against the join for instances of the AttributeId. If the query returns 0 instances, insert the row.
Here's some light code to present the idea (may need to be modified to work with your database):
IF NOT EXISTS (SELECT 1 FROM MyJoinTable WHERE AttributeId = @RequestedID)
BEGIN
    INSERT INTO MyJoinTable ...
END
You can control your inserts via a stored procedure. My understanding is that
users can select a combination of Attributes, such as
just 1
1 and 10 together
1,4,5,10 (4 attributes)
These need to enter the table as a single "batch" against a (new?) productAttributeId
So if (1,10) was chosen, this needs to be blocked because 1-2 and 10-2 already exist.
What I suggest
The stored procedure should take the attributes as a single list, e.g. '1,2,3' (comma separated, no spaces, just integers)
You can then use a string splitting UDF or an inline XML trick (as shown below) to break it into rows of a derived table.
Test table
create table attrib (attributeid int, productattributeid int)
insert attrib select 1,1
insert attrib select 1,2
insert attrib select 10,2
Here I use a variable, but you can incorporate as a SP input param
declare @t nvarchar(max)
set @t = '1,2,10'

select top(1)
    t.productattributeid,
    count(t.productattributeid) count_attrib,
    count(*) over () count_input
from (select convert(xml, '<a>' + replace(@t, ',', '</a><a>') + '</a>') x) x
cross apply x.x.nodes('a') n(c)
cross apply (select n.c.value('.', 'int')) a(attributeid)
left join attrib t on t.attributeid = a.attributeid
group by t.productattributeid
order by count_attrib desc
Output
productattributeid count_attrib count_input
2 2 3
The 1st column gives you the productattributeid that has the most matches
The 2nd column gives you how many attributes were matched using the same productattributeid
The 3rd column is how many attributes exist in the input
If you compare the last 2 columns and the counts
match - you can use the productattributeid to attach to the product which has all these attributes
don't match - then you need to do an insert to create a new combination

increase Ids in table

I would like to increase all ids in my table by 1000, because I need to insert data from another table with exactly the same ids there. What is the best way to do that?
update dbo.table set id = id + 1000
go
The best way to go is to not do that. You have to change all related records as well, and if you are using identities it gets even more complicated. If you do anything wrong you will seriously mess up your data integrity. I would suggest that the data you want to insert is the data that needs to have its values changed, and if you need to relate back to the data in another table, store the original ID in a new field in the table, called something like table2id or database2id. If you can't change the existing table, then you can use a lookup table that has both the old id value and the new one.
Under no circumstances should you attempt something of this nature without taking a backup first.
First, as HLGEM said, it seems to be a bad idea (think about your foreign keys: you must add 1000 to them too).
Second, dbo.table has become sys.tables in SQL Server 2008.
Finally, you'll need to find the foreign key columns with this request:
SELECT name,OBJECT_NAME(object_id)
FROM sys.columns
WHERE name like '%id' or name like 'id%'
--depends on where is 'id' in your columns names
name : the column name, OBJECT_NAME : the table name
And update the whole thing (with a tricky request that should look like this one, but I didn't test it with the "update" command):
CREATE TABLE #TablesWithIds (
    columnName sysname,
    tableName sysname
)

INSERT INTO #TablesWithIds
SELECT name as columnName, OBJECT_NAME(object_id) as tableName
FROM sys.columns
WHERE name like '%id%'

-- build one UPDATE statement per (table, column) pair and run them all
DECLARE @sql nvarchar(max) = N''
SELECT @sql = @sql + N'UPDATE ' + QUOTENAME(tableName)
            + N' SET ' + QUOTENAME(columnName) + N' = ' + QUOTENAME(columnName) + N' + 1000; '
FROM #TablesWithIds
EXEC (@sql)

DROP TABLE #TablesWithIds