Query performance optimization for dynamically joined columns - sql

Current situation in SQL Server database
There is a table Entry with the following columns:
EntryID (int)
EntryName (nvarchar)
EntrySize (int)
EntryDate (datetime)
Furthermore, it should be possible to save additional metadata for an Entry. The names and values of this metadata should be freely choosable, and it should be possible to add them dynamically without changing the table structure of the database.
Each metadata key can be one of the following data types:
Text
Numeric value
DateTime
Boolean value (True/False)
Thus there is a table DataKey to represent the metadata names and datatypes with the following columns:
DataKeyID (int)
DataKeyName (nvarchar)
DataKeyType (smallint) 0: Text; 1: Numeric; 2: DateTime; 3: Bit
In the table DataValue, a value can be inserted for each combination of Entry and DataKey, depending on the data type of the metadata key. For each data type there is one nullable value column. This table has the following columns:
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
TextValue (nvarchar) Nullable
NumericValue (float) Nullable
DateValue (datetime) Nullable
BoolValue (bit) Nullable
(Image of the database structure omitted.)
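In place of the image, here is a minimal DDL sketch of the structure just described (column lengths, keys and foreign keys below are assumptions, not part of the original schema):

CREATE TABLE Entry (
    EntryID   int IDENTITY PRIMARY KEY,
    EntryName nvarchar(255) NOT NULL,
    EntrySize int NOT NULL,
    EntryDate datetime NOT NULL
);

CREATE TABLE DataKey (
    DataKeyID   int IDENTITY PRIMARY KEY,
    DataKeyName nvarchar(100) NOT NULL,
    DataKeyType smallint NOT NULL    -- 0: Text; 1: Numeric; 2: DateTime; 3: Bit
);

CREATE TABLE DataValue (
    DataValueID  int IDENTITY PRIMARY KEY,
    EntryID      int NOT NULL REFERENCES Entry (EntryID),
    DataKeyID    int NOT NULL REFERENCES DataKey (DataKeyID),
    TextValue    nvarchar(4000) NULL,
    NumericValue float NULL,
    DateValue    datetime NULL,
    BoolValue    bit NULL
);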
TARGET
The target is to retrieve a list of entries fulfilling conditions specified as in a WHERE clause, like the following example:
Assumption:
Meta data key KeyName1 is text
Meta data key KeyName2 is DateTime
Meta data key KeyName3 is numeric
Meta data key KeyName4 is Boolean
Query:
... WHERE (KeyName1 = 'Test12345' AND KeyName2 BETWEEN '01.09.2012 00:00:00' AND
'01.04.2013 23:59:00') OR (KeyName3 > 15.3 AND KeyName4 = True)
The target is to run these queries very efficiently, even with a large amount of data, such as:
Number of entries > 2,000,000
Number of data keys between 50 and 100, or maybe > 100
Per entry, at least a subset of the values is specified, or maybe even a value for each key (2,000,000 * 100)
PROBLEM
The first problem arises when building the query. Normally a query operates on a set whose columns can be referenced in the WHERE clause. In this case, however, the "columns" used in the queries are themselves rows in the table DataKey, so that metadata can be added dynamically without changing the database table structure.
During research, a solution was found that uses PIVOT techniques at runtime, but it turned out to be very slow when there is a large set of data in the database.
QUESTIONS
Is there a more efficient way or structure to save the data for this purpose?
How can the requirements listed above be fulfilled, particularly with regard to performance and query time?
Here is a sql fiddle with the described database structure and some sample data: http://www.sqlfiddle.com/#!3/d1912/3

One of the fundamental flaws in an Entity Attribute Value design (which is what you have here) is the difficulty of efficient and performant querying.
The more efficient structure for storing data is to abandon EAV and use a normalised relational form. But that will necessarily involve changing the structure of the database when the data structures change (which should be self evident).
You could abandon your TextValue/NumericValue/DateValue/BoolValue fields and replace them with a single sql_variant column, which would reduce your query complexity slightly, but the fundamental problem will remain.
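A sketch of that variant, keeping the rest of the structure unchanged (the single column name Value is an assumption):

CREATE TABLE DataValue (
    DataValueID int IDENTITY PRIMARY KEY,
    EntryID     int NOT NULL REFERENCES Entry (EntryID),
    DataKeyID   int NOT NULL REFERENCES DataKey (DataKeyID),
    Value       sql_variant NULL    -- holds text, numeric, datetime or bit values
);

-- comparisons then need a cast to the expected base type, e.g.
-- WHERE CAST(Value AS float) > 15.3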
As a side note, storing all numerics as floats will cause problems if you ever have to deal with money.

I don't feel qualified to comment on what is best, or on design approaches; in fact I'm inclined not to answer at all. However, you've taken the time to describe the problem clearly, and I have thought about it, so this is how I would approach it.
I'd store each metadata datatype in its own table. So:
Table MetaData_Text:
ID int identity
EntryID int
KeyName nvarchar(50)
KeyValue nvarchar(max)
MetaData_DateTime, MetaData_Boolean & MetaData_Numeric have the same structure as this, but with the appropriate different datatype of the KeyValue column in each case.
The relationship between an Entry and each of these tables is 0-to-many, while every row in each of these tables belongs to exactly one Entry.
To add a new metadata item for an entry, I'd just use a stored procedure taking EntryID and key name, and having optional parameters for each possible metadata datatype:
create procedure AddMetaData @entryid int, @keyname varchar(50), @textvalue varchar(max) = null, @datevalue datetime = null, @boolvalue bit = null, @numvalue float = null
as ...
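The body is left open above ("as ..."); a minimal sketch of what it could do, assuming the four MetaData_* tables described earlier (purely illustrative, with no validation or upsert handling):

begin
    -- write the value into whichever typed table matches the supplied parameter
    if @textvalue is not null
        insert into MetaData_Text (EntryID, KeyName, KeyValue) values (@entryid, @keyname, @textvalue);
    if @numvalue is not null
        insert into MetaData_Numeric (EntryID, KeyName, KeyValue) values (@entryid, @keyname, @numvalue);
    if @datevalue is not null
        insert into MetaData_DateTime (EntryID, KeyName, KeyValue) values (@entryid, @keyname, @datevalue);
    if @boolvalue is not null
        insert into MetaData_Boolean (EntryID, KeyName, KeyValue) values (@entryid, @keyname, @boolvalue);
end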
For querying, I would define a set of functions to manage each type of (a) metadata datatype & (b) test needing to be performed on that datatype, for example:
create function MetaData_HasDate_EQ(@entryid int, @keyname varchar(50), @val datetime)
returns bit
as begin
declare @rv bit
select @rv = case when exists(
select 1 from MetaData_DateTime where EntryID = @entryid and KeyName = @keyname and KeyValue = @val) then 1 else 0 end;
return @rv
end
and incorporate function references into the required query logic, as in:
SELECT ...
FROM entry e ...
WHERE (dbo.MetaData_HasText_EQ(e.EntryID, 'KeyName1', 'Test12345') <> 0
AND dbo.MetaData_HasDate_Btwn(e.EntryID, 'KeyName2', '01.09.2012 00:00:00', '01.04.2013 23:59:00') <> 0)
OR (dbo.MetaData_HasNum_GT(e.EntryID, 'KeyName3', 15.3) <> 0
AND dbo.MetaData_HasBool_EQ(e.EntryID, 'KeyName4', 1) <> 0)
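At the stated data volumes these functions will be called once per row and predicate, so each MetaData_* table would presumably need an index matching that lookup pattern; a sketch (the index names and the INCLUDE choice are assumptions):

create index IX_MetaData_Text_Entry_Key
    on MetaData_Text (EntryID, KeyName) include (KeyValue);
create index IX_MetaData_DateTime_Entry_Key
    on MetaData_DateTime (EntryID, KeyName) include (KeyValue);
-- ...and likewise for MetaData_Numeric and MetaData_Boolean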

I believe that performance issues with that kind of data structure may require the structure to be reworked.
However, I think this fairly simple dynamic sql allows you to query as desired, and appears to run reasonably fast in a quick test I did with over 100,000 rows in the Entry table and 500,000 in the DataValue table.
-- !! CHANGE WHERE CONDITION AS APPROPRIATE
--declare @where nvarchar(max)='where Key0=0'
declare @where nvarchar(max)='where Key1<550'
declare @sql nvarchar(max)='select * from Entry e';
select @sql=@sql
+' outer apply (select '+DataKeyName+'='
+case DataKeyType when 0 then 'TextValue' when 1 then 'NumericValue' when 2 then 'DateValue' when 3 then 'BoolValue' end
+' from DataValue v where v.EntryID=e.EntryID and v.DataKeyID='+cast(DataKeyID as varchar)
+') '+DataKeyName+' '
from DataKey;
set @sql+=@where;
exec(@sql);
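To illustrate what this builds, with a DataKey table containing a text key Key0 (DataKeyID 1) and a numeric key Key1 (DataKeyID 2) the generated statement would come out roughly like this (reformatted for readability; the real output depends on the rows in DataKey):

select * from Entry e
 outer apply (select Key0=TextValue
              from DataValue v where v.EntryID=e.EntryID and v.DataKeyID=1) Key0
 outer apply (select Key1=NumericValue
              from DataValue v where v.EntryID=e.EntryID and v.DataKeyID=2) Key1
where Key1<550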

You have not specified any background information on how often the table is updated, how often new attributes are added and so on...
Looking at your inputs, I think you could use a snapshot which flattens your normalised data. It is not ideal, as columns will need to be added manually, but it can be extremely fast. The snapshot could be refreshed at regular intervals, depending on your users' needs.
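A sketch of that idea, assuming a flattened snapshot table rebuilt by a scheduled job (the table name, the DataKeyID values and the chosen index are illustrative):

-- rebuild the flattened snapshot (run from a SQL Agent job at the agreed interval)
IF object_id('EntrySnapshot') IS NOT NULL DROP TABLE EntrySnapshot;

SELECT e.EntryID, e.EntryName, e.EntrySize, e.EntryDate,
       k1.TextValue    AS KeyName1,
       k2.DateValue    AS KeyName2,
       k3.NumericValue AS KeyName3,
       k4.BoolValue    AS KeyName4
INTO   EntrySnapshot
FROM   Entry e
LEFT JOIN DataValue k1 ON k1.EntryID = e.EntryID AND k1.DataKeyID = 1
LEFT JOIN DataValue k2 ON k2.EntryID = e.EntryID AND k2.DataKeyID = 2
LEFT JOIN DataValue k3 ON k3.EntryID = e.EntryID AND k3.DataKeyID = 3
LEFT JOIN DataValue k4 ON k4.EntryID = e.EntryID AND k4.DataKeyID = 4;

CREATE INDEX IX_EntrySnapshot_KeyName3 ON EntrySnapshot (KeyName3);

Queries then run against EntrySnapshot with an ordinary WHERE clause, at the cost of the data being as stale as the refresh interval.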

First, to answer why people use EAV or KVP even though it is so inefficient query-wise: blogs and textbooks offer many plausible reasons, but in real life it is to avoid dealing with an uncooperative DBA.
For a small organization with a small amount of data, it is OK to have a multi-use database (OLTP + DW), because the inefficiencies are not noticeable. When your database gets large, it's time to replicate your online data into a data warehouse. In addition, if the data is meant for analytics, it should be replicated further from your relational data warehouse into a dimensional model or a flat-and-wide structure for consumption.
These are the data models I would expect from a large organization:
OLTP
Relational Data Warehouse
Dimensional Model for Reporting
Datamarts for Analytics.
So to answer your question: you shouldn't query against your KVP tables, and creating a view on top of them doesn't make it any better. The data should be flattened out (i.e. pivoted) into a physical table. What you have is a hybrid of #1 and #2. If there will be no users for #3, just build #4.

Based on Dan Belandi's answer, I think the easiest way to use this would be a stored procedure/trigger that looks at the metadata table and builds a view on the data table accordingly.
Code would then look like this:
-- drop old view
IF object_id('EntryView') IS NOT NULL DROP VIEW [EntryView]
GO
-- create view based on current meta-information in [DataKey]
DECLARE @crlf char(2)
DECLARE @sql nvarchar(max)
SELECT @crlf = char(13) + char(10)
SELECT @sql = 'CREATE VIEW [EntryView]' + @crlf
+ 'AS' + @crlf
+ 'SELECT *' + @crlf
+ ' FROM [Entry] e'
SELECT @sql = @sql + @crlf
+ ' OUTER APPLY (SELECT '+ QuoteName(DataKeyName) + ' = ' + QuoteName((CASE DataKeyType WHEN 0 THEN 'TextValue'
WHEN 1 THEN 'NumericValue'
WHEN 2 THEN 'DateValue'
WHEN 3 THEN 'BoolValue'
ELSE '<Unknown>' END)) + @crlf
+ ' FROM [DataValue] v WHERE v.[EntryID] = e.[EntryID] AND v.[DataKeyID] = ' + CAST(DataKeyID as varchar) + ') AS ' + QuoteName(DataKeyName)
FROM DataKey
--PRINT @sql
EXEC (@sql)
-- Example usage:
SELECT *
FROM EntryView
WHERE (Key1 = 0 AND Key2 BETWEEN '01.09.2012 00:00:00' AND '01.04.2013 23:59:00')
OR (Key3 > 'Test15.3' AND Key4 LIKE '%1%')

I would use 4 tables, one for each data type:
MDat1
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
TextValue (nvarchar) Nullable
MDat2
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
NumericValue (float) Nullable
MDat3
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
DateValue (datetime) Nullable
MDat4
DataValueID (int)
EntryID (int) Foreign-Key
DataKeyID (int) Foreign-Key
BoolValue (bit) Nullable
If I had partitioning available, I would use it on DataKeyID for all 4 tables.
Then I would use 4 views:
SELECT ... FROM Entry JOIN MDat1 ON ... EnMDat1
SELECT ... FROM Entry JOIN MDat2 ON ... EnMDat2
SELECT ... FROM Entry JOIN MDat3 ON ... EnMDat3
SELECT ... FROM Entry JOIN MDat4 ON ... EnMDat4
So this example:
WHERE (KeyName1 = 'Test12345' AND KeyName2 BETWEEN '01.09.2012 00:00:00' AND
'01.04.2013 23:59:00') OR (KeyName3 > 15.3 AND KeyName4 = True)
Goes like:
...EnMDat1 JOIN EnMDat3 ON ... AND EnMDat1.TextValue ='Test12345' AND EnMDat3.DateValue BETWEEN '01.09.2012 00:00:00' AND
'01.04.2013 23:59:00')
...
UNION ALL
...
EnMDat2 JOIN EnMDat4 ON ... AND EnMDat2.NumericValue > 15.3 AND EnMDat4.BoolValue = True
This will work faster than one metadata table. However, you will need an engine to build the queries if you have many different WHERE-clause scenarios. You can also omit the views and write the statements from scratch each time.
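A sketch of one of those views and of the example rewritten against them (the join conditions and column lists are abbreviated in this answer, so the ones below are assumptions; dates are written in unambiguous ISO format):

CREATE VIEW EnMDat1 AS
SELECT e.EntryID, k.DataKeyName, m.TextValue
FROM   Entry e
JOIN   MDat1   m ON m.EntryID   = e.EntryID
JOIN   DataKey k ON k.DataKeyID = m.DataKeyID;
GO
-- EnMDat2/3/4 follow the same pattern for NumericValue, DateValue and BoolValue.

SELECT a.EntryID
FROM   EnMDat1 a
JOIN   EnMDat3 b ON b.EntryID = a.EntryID
WHERE  a.DataKeyName = 'KeyName1' AND a.TextValue = 'Test12345'
AND    b.DataKeyName = 'KeyName2'
AND    b.DateValue BETWEEN '2012-09-01T00:00:00' AND '2013-04-01T23:59:00'
UNION ALL   -- an entry matching both branches appears twice; use UNION to deduplicate
SELECT c.EntryID
FROM   EnMDat2 c
JOIN   EnMDat4 d ON d.EntryID = c.EntryID
WHERE  c.DataKeyName = 'KeyName3' AND c.NumericValue > 15.3
AND    d.DataKeyName = 'KeyName4' AND d.BoolValue = 1;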

Related

Do not Update the Values in Merge statement if old values do not change while update in Merge

MERGE PFM_EventPerformance_MetaData AS TARGET
USING
(
SELECT
[InheritanceMeterID] = @InheritanceMeterPointID
,[SubHourlyScenarioResourceID] = @SubHourlyScenarioResourceID
,[MeterID] = @MeterID--internal ID
,[BaselineID] = @BaselineID--internal ID
,[UpdateUtc] = GETUTCDATE()
)
AS SOURCE ON
TARGET.[SubHourlyScenarioResourceID] = SOURCE.[SubHourlyScenarioResourceID]
AND TARGET.[MeterID] = SOURCE.[MeterID]--internal ID
AND TARGET.[BaselineID] = SOURCE.[BaselineID]--internal ID
WHEN MATCHED THEN UPDATE SET
@MetaDataID = TARGET.ID--get preexisting ID when exists (must populate one row at a time)
,InheritanceMeterID = SOURCE.InheritanceMeterID
,[UpdateUtc] = SOURCE.[UpdateUtc]
WHEN NOT MATCHED
THEN INSERT
(
[InheritanceMeterID]
,[SubHourlyScenarioResourceID]
,[MeterID]--internal ID
,[BaselineID]--internal ID
)
VALUES
(
SOURCE.[InheritanceMeterID]
,SOURCE.[SubHourlyScenarioResourceID]
,SOURCE.[MeterID]--internal ID
,SOURCE.[BaselineID]--internal ID
);
In the above query I do not want to update the values in the target table if there is no change in the old values. I am not sure how to achieve this, as I have rarely used the MERGE statement. Please help me with the solution. Thanks in advance.
This is done best in two stages.
Stage 1: Merge Update on condition
SO Answer from before (Thanks to @Laurence!)
Stage 2: hash key condition to compare
Limits: max 4000 characters, including column separator characters
A rather simple way to compare multiple columns in one condition is to use a computed column on both sides, generated with HASHBYTES(<algorithm>, <column(s)>).
This moves a lot of code out of the MERGE statement and into the table definition.
Quick example:
CREATE TABLE dbo.Test
(
id_column int NOT NULL,
dsc_name1 varchar(100),
dsc_name2 varchar(100),
num_age tinyint,
flg_hash AS HashBytes( 'SHA1',
Cast( dsc_name1 AS nvarchar(4000) )
+ N'•' + dsc_name2 + N'•' + Cast( num_age AS nvarchar(3) )
) PERSISTED
)
;
Comparing the flg_hash columns between source and destination makes the comparison quick, as it is just a comparison between two 20-byte varbinary values.
A couple of caveats for working with HashBytes:
The function only works for a total of 4000 nvarchar characters
The trade-off for the short comparison code is having to generate the columns in the same order everywhere the hash is built (tables and views)
There is a collision chance of around 2^50+ for SHA1; as a security mechanism it is now considered insecure, and a few years ago MS tried to drop SHA1 as an algorithm
Columns added to tables and views can be missed by the comparison if the HashBytes expression is not amended along with them
Overall I have found that comparing many columns directly can overload my server engines, but I have never had an issue with hash key comparisons
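Applied to a MERGE, the hash comparison would gate the update roughly like this. The sketch uses the dbo.Test table from above plus a hypothetical staging table dbo.Test_Staging with the same flg_hash computed column; it is not the question's tables:

MERGE dbo.Test AS TARGET
USING dbo.Test_Staging AS SOURCE
   ON TARGET.id_column = SOURCE.id_column
WHEN MATCHED AND TARGET.flg_hash <> SOURCE.flg_hash   -- only update when something actually changed
    THEN UPDATE SET
        dsc_name1 = SOURCE.dsc_name1,
        dsc_name2 = SOURCE.dsc_name2,
        num_age   = SOURCE.num_age
WHEN NOT MATCHED
    THEN INSERT (id_column, dsc_name1, dsc_name2, num_age)
         VALUES (SOURCE.id_column, SOURCE.dsc_name1, SOURCE.dsc_name2, SOURCE.num_age);
-- caveat: if any hashed input is NULL, HashBytes returns NULL and the <> test never fires for that row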

DB Design: how to indicate if a column data is public or private

I'm designing a database and I have a question about how to make some user data private.
I have a user table, with name, city, birthday, biography, etc. The user can make some of this data private (so other users cannot see it).
First, I thought of adding columns to the table to indicate whether a column is private or not. For example:
User
-------
user_id | name | city | cityIsPrivate | birthday | birthdayIsPrivate
---------+------+------+---------------+----------+------------------
Or, another approach is to add a varchar column to indicate which columns are private:
User
-------
user_id | name | city | birthday | privateColumns
---------+------+------+----------+---------------
This privateColumns column would contain something like: "city:NO; birthday:YES".
The user table will only have three columns that can be private or public, so I would only have to add three more columns to the table.
Any advice?
Do not move data into a separate table if you are going to have to join to it every time you query the main table.
Boolean (or equivalent) columns to indicate privacy for every column on which a privacy setting can be applied:
is very simple to add.
takes up practically no space.
is much quicker to query.
shows that the privacy settings are an intrinsic part of the user data.
removes unnecessary complexity from the schema.
The facts that these relate to other columns and that they represent a single kind of logical object are just a distraction. Go for simplicity, performance, and logical correctness.
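A sketch of that approach on the table from the question (the flag names follow the question; the viewer query and the @profileUserId parameter are illustrative):

ALTER TABLE [User] ADD
    cityIsPrivate     bit NOT NULL DEFAULT 0,   -- 0 = public, 1 = private
    birthdayIsPrivate bit NOT NULL DEFAULT 0;

-- profile as seen by another user: private values come back as NULL
SELECT user_id,
       name,
       CASE WHEN cityIsPrivate     = 0 THEN city     END AS city,
       CASE WHEN birthdayIsPrivate = 0 THEN birthday END AS birthday
FROM   [User]
WHERE  user_id = @profileUserId;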
Move the list of your private columns to a separate table with three fields, like:
UserId |ColumnName |IsPrivate
-----------------------------
Then you can join your queries with that table and get the proper result set for each user, and at the same time change the columns of your user table.
If your User table is not expected to change, it is better to move the list of your private columns to a separate table with a proper structure, like this:
UserId |cityIsPrivate |birthdayIsPrivate
----------------------------------------
Then you can join your user table with this table in a single query and get the result set you need.
But don't put it in the same table: your first approach brings redundancy into your database structure, and with your second approach you would not be able to run SELECT queries on IsPrivate criteria.
You can have a separate table (UserPrivateFields, for example) listing user ID's along with fields they have elected to make private, like so:
UserID | PrivateField
-------+-------------
1 | Name
1 | City
2 | Birthday
When you're running the procedure grabbing the user info to be pulled by the person requesting the info, you can then build a query like this (assume the desired user's UserID is passed into the proc):
CREATE TABLE #PublicUserFields (Publicfield varchar(50))
INSERT INTO #PublicUserFields (Publicfield)
SELECT Publicfield
FROM userPublicfields
WHERE userid = @userid
DECLARE @SQL VARCHAR(MAX) = 'SELECT '
WHILE EXISTS (SELECT 1 FROM #PublicUserFields)
BEGIN
DECLARE @Publicfield VARCHAR(50) =
(SELECT TOP 1 Publicfield FROM #PublicUserFields)
SET @SQL = @SQL + @Publicfield
IF (SELECT COUNT(*) FROM #PublicUserFields) > 1
BEGIN
SET @SQL = @SQL + ', '
END
DELETE FROM #PublicUserFields
WHERE Publicfield = @Publicfield
END
SET @SQL = @SQL + ' FROM MainTable WHERE userID = ' + CAST(@userid AS varchar(20))
EXEC(@SQL)
The reason I'm bothering with dynamic SQL is that the names of your public fields can't be joined directly to the column names of the main table with this setup - they can only be selected or joined to other records with the same string value. You could maybe get around this by joining to sys.columns and doing interesting things with the object_id of the columns, but that doesn't seem much easier than this approach.
This makes sense IF the users can all dynamically set which fields they want to be viewable by other people. If the private fields are known and static, you may just want to separate the two categories and tighten down the permissions on read-access on the private table.

Work around SQL Server maximum columns limit 1024 and 8kb record size

I am creating a table with 1000 columns. Most of the columns are nvarchar type. Table is created, but with a warning
Warning: The table "Test" has been created, but its maximum row size
exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this
table will fail if the resulting row exceeds the size limit.
Most of the columns of the table already have data in it (i.e. 99% of columns have data).
When I try to update any column after the 310th (with all of the first 309 columns already having some value) it gives the error:
Cannot create a row of size 8061 which is greater than the allowable
maximum row size of 8060.
I am inserting this data into each of the first 308 columns:
"Lorem ipsum dolor sit amet, consectetur adipisicing elit."
When I use the ntext data type it allows me to update about 450 columns, but beyond that ntext does not allow it either. I have to update at least 700 columns, which SQL Server is not letting me do. I am in a situation where I cannot move some columns of the table to another table.
Actually I am working on an existing Windows application. It's a very large Windows application.
The table into which I am trying to insert data for up to 700 nvarchar columns is created dynamically at runtime.
Only in some cases does it require inserting 400-600 columns; generally it needs 100-200 columns, which I am able to process easily.
The problem is that I cannot split this table into multiple tables, because a lot of tables are created with this structure and their names are maintained in another table, i.e. there are more than 100 tables with this structure and they are being created dynamically. For creating the tables and manipulating their data, 4-5 languages (C#, Java, ...) are used, and WCF, Windows services and web services are also involved.
So I don't think it would be easy to manipulate the table and its data after splitting it; splitting the table would require lots of structural changes.
So please suggest the best way to solve this issue.
I have also tried to use Sparse Column like:
Create table ABCD(Id int, Name varchar(100) Sparse, Age int);
I have also thought about a columnstore index, but it does not solve my problem.
Sparse columns allow me to create up to 30,000 columns in a table, but they also restrict the record size.
Is there any way to achieve this using a temporary table or any other type of SQL Server object?
SQL Server Maximum Columns Limit
Bytes per short string column 8,000
Bytes per GROUP BY, ORDER BY 8,060
Bytes per row 8,060
Columns per index key 16
Columns per foreign key 16
Columns per primary key 16
Columns per nonwide table 1,024
Columns per wide table 30,000
Columns per SELECT statement 4,096
Columns per INSERT statement 4,096
Columns per UPDATE statement (Wide Tables) 4,096
When you combine varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns that exceed 8,060 bytes per row, consider the following:
Surpassing the 8,060-byte row-size limit might affect performance because SQL Server still maintains a limit of 8 KB per page. When a combination of varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns exceeds this limit, the SQL Server Database Engine moves the record column with the largest width to another page in the ROW_OVERFLOW_DATA allocation unit, while maintaining a 24-byte pointer on the original page. Moving large records to another page occurs dynamically as records are lengthened based on update operations. Update operations that shorten records may cause records to be moved back to the original page in the IN_ROW_DATA allocation unit. Also, querying and performing other select operations, such as sorts or joins on large records that contain row-overflow data slows processing time, because these records are processed synchronously instead of asynchronously.
Therefore, when you design a table with multiple varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns, consider the percentage of rows that are likely to flow over and the frequency with which this overflow data is likely to be queried. If there are likely to be frequent queries on many rows of row-overflow data, consider normalizing the table so that some columns are moved to another table. This can then be queried in an asynchronous JOIN operation.
The length of individual columns must still fall within the limit of
8,000 bytes for varchar, nvarchar, varbinary, sql_variant, and CLR
user-defined type columns. Only their combined lengths can exceed the
8,060-byte row limit of a table.
The sum of other data type columns, including char and nchar data,
must fall within the 8,060-byte row limit. Large object data is also
exempt from the 8,060-byte row limit.
The index key of a clustered index cannot contain varchar columns
that have existing data in the ROW_OVERFLOW_DATA allocation unit. If
a clustered index is created on a varchar column and the existing
data is in the IN_ROW_DATA allocation unit, subsequent insert or
update actions on the column that would push the data off-row will
fail. For more information about allocation units, see Table and
Index Organization.
You can include columns that contain row-overflow data as key or
nonkey columns of a nonclustered index.
The record-size limit for tables that use sparse columns is 8,018
bytes. When the converted data plus existing record data exceeds
8,018 bytes, MSSQLSERVER ERROR 576 is returned. When columns are
converted between sparse and nonsparse types, Database Engine keeps a
copy of the current record data. This temporarily doubles the storage
that is required for the record.
To obtain information about tables or indexes that might contain
row-overflow data, use the sys.dm_db_index_physical_stats dynamic
management function.
Creating a table with n columns of datatype nvarchar:
CREATE Proc [dbo].[CreateMaxColTable_Nvarchar500]
(@TableName nvarchar(100),@NumofCols int)
AS
BEGIN
DECLARE @i INT
DECLARE @MAX INT
DECLARE @SQL VARCHAR(MAX)
DECLARE @j VARCHAR(10)
DECLARE @len int
SELECT @i=1
SELECT @MAX=@NumofCols
SET @SQL='CREATE TABLE ' + @TableName + '('
WHILE @i<=@MAX
BEGIN
select @j= cast(@i as varchar)
SELECT @SQL= @SQL+'X'+@j +' NVARCHAR(500) , '
SET @i = @i + 1
END
select @len=len(@SQL)
select @SQL = substring(@SQL,0,@len-1)
SELECT @SQL= @SQL+ ' )'
exec (@SQL)
END
For more information you can visit these links:
http://msdn.microsoft.com/en-us/library/ms186981%28SQL.105%29.aspx?PHPSESSID=tn8k5p1s508cop8gr43e1f34d2
http://technet.microsoft.com/en-us/library/ms143432.aspx
But could you please explain the scenario in which you need a table with so many columns?
I think you should consider redesigning the database.
This simply isn't possible. See Inside the Storage Engine: Anatomy of a record
Assuming your table is something like this.
CREATE TABLE T1(
col_1 varchar(8000) NULL,
col_2 varchar(8000) NULL,
/*....*/
col_999 varchar(8000) NULL,
col_1000 varchar(8000) NULL
)
Then even a row with all NULL values will use the following storage.
1 byte status bits A
1 byte status bits B
2 bytes column count offset
125 bytes NULL_BITMAP (1 bit per column for 1,000 columns)
So that is a guaranteed 129 bytes used up already (leaving 7,931).
If any of the columns have a value that is not either NULL or an empty string then you also need space for
2 bytes variable length column count (leaving 7,929).
Anywhere between 2 - 2000 bytes for the column offset array.
The data itself.
The column offset array consumes 2 bytes per variable length column except if that column and all later columns are also zero length. So updating col_1000 would force the entire 2000 bytes to be used whereas updating col_1 would just use 2 bytes.
So you could populate each column with 5 bytes of data and when taking into account the 2 bytes each in the column offset array that would add up to 7,000 bytes which is within the 7,929 remaining.
However the data you are storing is 102 bytes (51 nvarchar characters) so this can be stored off row with a 24 byte pointer to the actual data remaining in row.
FLOOR(7929/(24 + 2)) = 304
So the best case would be that you could store 304 columns of this length data and that is if you are updating from col_1, col_2, .... If col_1000 contains data then the calculation is
FLOOR(5929/24) = 247
For NTEXT the calculation is similar except it can use a 16 byte pointer which would allow you to squeeze data into a few extra columns
FLOOR(7929/(16 + 2)) = 440
The need to follow all these off row pointers for any SELECT against the table would likely be highly detrimental to performance.
Script to test this
DROP TABLE T1
/* Create table with 1000 columns*/
DECLARE @CreateTableScript nvarchar(max) = 'CREATE TABLE T1('
SELECT @CreateTableScript += 'col_' + LTRIM(number) + ' VARCHAR(8000),'
FROM master..spt_values
WHERE type='P' AND number BETWEEN 1 AND 1000
ORDER BY number
SELECT @CreateTableScript += ')'
EXEC(@CreateTableScript)
/* Insert single row with all NULL*/
INSERT INTO T1 DEFAULT VALUES
/*Updating first 304 cols succeed. Change to 305 and it fails*/
DECLARE @UpdateTableScript nvarchar(max) = 'UPDATE T1 SET '
SELECT @UpdateTableScript += 'col_' + LTRIM(number) + ' = REPLICATE(1,1000),'
FROM master..spt_values
WHERE type='P' AND number BETWEEN 1 AND 304
ORDER BY number
SET @UpdateTableScript = LEFT(@UpdateTableScript,LEN(@UpdateTableScript)-1)
EXEC(@UpdateTableScript)
Having a table with 1,000 columns tells you that there is something very wrong with the database design.
I have inherited a project in which one of the tables had more than 500 columns, and after more than a year I am still unable to significantly reduce it, because I would have to rework 90% of the application.
So redesign your DB before it is too late.
Max Columns per 'nonwide' table: 1,024
Max Columns per 'wide' table: 30,000
Although, what exactly is the use case in which you require this number of columns in a single table?
It's highly recommended to partition your table vertically several times to get better performance and easier development.
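A sketch of what vertical partitioning means here: split the columns across several tables sharing the same key, so that no single row has to hold everything (the table names and the split point are illustrative):

CREATE TABLE Test_Part1 (
    Id   int NOT NULL PRIMARY KEY,
    X1   nvarchar(500),
    -- ... columns X2 to X299 ...
    X300 nvarchar(500)
);

CREATE TABLE Test_Part2 (
    Id   int NOT NULL PRIMARY KEY REFERENCES Test_Part1 (Id),
    X301 nvarchar(500),
    -- ... columns X302 to X599 ...
    X600 nvarchar(500)
);

-- read back as one logical row when needed
SELECT p1.*, p2.*
FROM   Test_Part1 p1
JOIN   Test_Part2 p2 ON p2.Id = p1.Id;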
Creating a table with n columns of datatype nvarchar:
CREATE Proc [dbo].[CreateMaxColTable_Nvarchar500]
(@TableName nvarchar(100),@NumofCols int)
AS
BEGIN
DECLARE @i INT
DECLARE @MAX INT
DECLARE @SQL VARCHAR(MAX)
DECLARE @j VARCHAR(10)
DECLARE @len int
SELECT @i=1
SELECT @MAX=@NumofCols
SET @SQL='CREATE TABLE ' + @TableName + '('
WHILE @i<=@MAX
BEGIN
select @j= cast(@i as varchar)
SELECT @SQL= @SQL+'A'+@j +' NVARCHAR(500) , '
SET @i = @i + 1
END
select @len=len(@SQL)
select @SQL = substring(@SQL,0,@len-1)
SELECT @SQL= @SQL+ ' )'
exec (@SQL)
END
Please check whether you really need to use nvarchar for all columns; nvarchar takes twice as many bytes as varchar, so converting columns to varchar where possible can reduce the row data size and might help you overcome this error.
We had an application which captures 5,000 fields for a loan application. All fields are dependent on a single primary key, loanid. We could have split the table into multiple tables, but the fields are also dynamic; the admin has a feature to create more fields, so everything is dynamic. The only good thing was a one-to-one relationship between loanid and the fields.
So, in the end we went with an XML solution. The entire data set is stored in an XML document: maximum flexibility, but it makes it difficult to query and report on.
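A sketch of that XML approach, with one xml column per loan and XQuery for retrieval (the table, column and node names below are illustrative assumptions):

CREATE TABLE LoanApplication (
    LoanId int NOT NULL PRIMARY KEY,
    Fields xml NOT NULL    -- e.g. <fields><ApplicantName>...</ApplicantName><Income>...</Income></fields>
);

-- pull one dynamic field out of the document
SELECT LoanId,
       Fields.value('(/fields/ApplicantName)[1]', 'nvarchar(200)') AS ApplicantName
FROM   LoanApplication
WHERE  LoanId = @loanId;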

Separating multiple values in one column in MS SQL

I have a field in an application that allows a user to select multiple values. When I query this field in the DB, if multiple values were selected the result gets displayed as one long word. There are no commas or space between the multiple selected values. Is there any way those values can be split by a comma?
Here’s my query:
SELECT HO.Value
FROM HAssessment ha
INNER JOIN HObservation HO
ON HO.AssessmentiD = ha.AssessmentID
AND HO.Patient_Oid = 2255231
WHERE Ho.FindingAbbr = 'A_R_CardHx'
------------------------------------------------
Result:
AnginaArrhythmiaCADCChest Pain
-------------------------
I would like to see:
Angina, Arrhythmia, CADC, Chest Pain
------------------------------------------
Help!
There's no easy solution to this.
The most expedient would be writing a string splitting function. From your sample data, it seems the values are all concatenated together without any separators. This means you'll have to come up with a list of all possible values (hopefully this is a query from some symptoms table...) and parse each one out from the string. This will be complex and painful.
A straightforward way to do this would be to test each valid symptom value to see whether it's contained within HObservation.Value, stuff all the found values together, and return the result. Note that this will perform very poorly.
Here's an example in TSQL. You'd be much better off doing this at the application layer, though, or better yet, normalizing your database (see below for more on that).
declare @symptoms table (symptom varchar(100))
insert into @symptoms (symptom)
values ('Angina'),('Arrhythmia'),('CADC'),('Chest Pain')
declare @value varchar(100)
set @value = 'AnginaArrhythmiaCADCChest Pain'
declare @result varchar(100)
select @result = stuff((
SELECT ', ' + s.symptom
FROM @symptoms s
WHERE patindex('%' + s.symptom + '%',@value) > 0
FOR XML PATH('')),1,2,'')
select @result
The real answer is to restructure your database. Put each distinct item found in HObservation.Value (this means Angina, Arrhythmia, etc. as separate rows) into some other table if such a table doesn't exist already. I'll call this table Symptom. Then create a lookup table to link HObservation with Symptom. Then drop the HObservation.Value column entirely. Do the splitting work at the application level, and make multiple inserts into the lookup table.
Example, based on sample data from your question:
HObservation
------------
ID Value
1 AnginaArrhythmiaCADC
Becomes:
HObservation
------------
ID
1
Symptom
-------
ID Value
1 Angina
2 Arrhythmia
3 CADC
HObservationSymptom
-------------------
ID HObservationID SymptomID
1 1 1
2 1 2
3 1 3
Note that if this is a production system (or you want to preserve the existing data for some other reason), you'll still have to write code to do the string splitting.
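If the existing data is kept, a one-off conversion could reuse the same matching idea as the TSQL above, writing one lookup row per symptom found (a sketch, assuming Symptom is already populated and HObservationSymptom.ID is an identity column):

INSERT INTO HObservationSymptom (HObservationID, SymptomID)
SELECT ho.ID, s.ID
FROM   HObservation ho
JOIN   Symptom s
  ON   PATINDEX('%' + s.Value + '%', ho.Value) > 0;
-- note: substring matching can over-match if one symptom name is contained in another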

Enumerated text columns in SQL

I have a number of tables that have text columns containing only a few distinct values. I often weigh the benefits (primarily reduced row size) of extracting the possible values into a lookup table and storing a small key in the main table against the amount of work required to do so.
For the columns that have a fixed set of values known in advance (enumerated values), this isn't so bad, but the more painful case is when I know I have a small set of unique values, but I don't know in advance what they will be.
For example, if I have a table that stores log information on different URLs in a web application:
CREATE TABLE [LogData]
(
ResourcePath varchar(1024) NOT NULL,
EventTime datetime NOT NULL,
ExtraData varchar(MAX) NOT NULL
)
I waste a lot of space by repeating the resource path for every request. There will be a very large number of duplicate entries in this table. I usually end up with something like this:
CREATE TABLE [LogData]
(
ResourcePathId smallint NOT NULL,
EventTime datetime NOT NULL,
ExtraData varchar(MAX) NOT NULL
)
CREATE TABLE [ResourcePaths]
(
ResourcePathId smallint NOT NULL,
ResourceName varchar(1024) NOT NULL
)
In this case however, I no longer have a simple way to append data to the LogData table. I have to do a lookup on the resource paths table to get the Id, add it if it is missing, and only then can I perform the actual insert. This makes the code much more complicated and changes my write-only logging function into one that requires some sort of transaction against the lookup table.
Am I missing something obvious?
If you have a unique index on ResourceName, the lookup should be very fast even on a big table. However, it has disadvantages. For instance, if you log a lot of data and have to archive it off periodically, wanting to archive the previous month or year of LogData, you are forced to keep all of ResourcePaths. You can come up with solutions for all of that.
Yes: insert using the existing data, doing the lookup as part of the insert.
Given @resource, @time and @data as inputs:
insert into LogData (ResourcePathId, EventTime, ExtraData)
select ResourcePathId, @time, @data
from ResourcePaths
where ResourceName = @resource
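If the resource path might not exist yet (the "add it if it is missing" case from the question), the same statement can be preceded by a conditional insert; a sketch that ignores concurrency (an IDENTITY column on ResourcePathId would simplify this further):

if not exists (select 1 from ResourcePaths where ResourceName = @resource)
    insert into ResourcePaths (ResourcePathId, ResourceName)
    select isnull(max(ResourcePathId), 0) + 1, @resource
    from ResourcePaths;

insert into LogData (ResourcePathId, EventTime, ExtraData)
select ResourcePathId, @time, @data
from ResourcePaths
where ResourceName = @resource;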