Hash Table Data Structure in SQL Server - sql

For the last few days I've been reading an ebook on data structures and, frankly, much of it had already slipped my mind, so I'm reviewing the material to get it clear again. I was going through hash tables to refamiliarize myself with them. I've heard that SQL Server uses hash tables internally, and many threads on stackoverflow.com and forums.asp.net ask about creating "hash tables" in SQL Server to store temporary data. Here is an example from a stored procedure of mine that uses temporary tables (it is long and not something to imitate; it is only an example):
1st:
CREATE PROCEDURE [dbo].[Orders]
@OrderLine int
AS
BEGIN
DECLARE @t1 TABLE(Date1 date,
OrderID VARCHAR(MAX),
EmployeeName VARCHAR(MAX),
DeliveryDate date,
StoreName VARCHAR(MAX),
DeliveryAddress VARCHAR(MAX),
ItemName VARCHAR(MAX),
Quantity FLOAT)
INSERT INTO @t1(Date1, OrderID, EmployeeName, DeliveryDate, StoreName, DeliveryAddress, ItemName, Quantity)
(SELECT DISTINCT
CONVERT(VARCHAR(11), DemandOrder.POCreationDate, 6) AS DemandOrderDate,
DemandOrder.OrderID, EmployeeDetails.EmployeeName,
CONVERT(DATE, DemandOrder.DeliveryDate) AS ExpectedDeliveryDate,
StoreDetails.StoreName,
DemandOrder.DeliveryAddress, Item.ItemName,
DemandOrderLine.Quantity
FROM
DemandOrder
INNER JOIN
DemandOrderLine ON DemandOrder.OrderID = DemandOrderLine.OrderID
INNER JOIN
Item on DemandOrderLine.ItemID=Item.ItemID
INNER JOIN
EmployeeDetails ON EmployeeDetails.EmployeeID = DemandOrder.EmployeeID
INNER JOIN
StoreDetails ON DemandOrderLine.StoreID = StoreDetails.StoreID
WHERE
DemandOrderLine.OrderLine = @OrderLine)
DECLARE @t2 TABLE(Approvedby VARCHAR(MAX))
INSERT INTO @t2(Approvedby)
(SELECT EmployeeDetails.EmployeeName
FROM EmployeeDetails
INNER JOIN DemandOrderLine ON DemandOrderLine.ApprovedBy = EmployeeDetails.EmployeeID)
SELECT DISTINCT
CONVERT(VARCHAR(11), Date1, 6) AS Date,
OrderID, EmployeeName,
CONVERT(VARCHAR(11), DeliveryDate, 6) AS ExpectedDeliveryDate,
StoreName, Approvedby, DeliveryAddress,
ItemName, Quantity
FROM
@t1
CROSS JOIN
@t2
END
Here is another one, taken from an example which claims that hash tables can't be used in a stored procedure:
2nd:
CREATE PROCEDURE TempTable AS ---- It's actually not possible in SP
CREATE table #Color
(
Color varchar(10) PRIMARY key
)
INSERT INTO #color
SELECT 'Red'
UNION
SELECT 'White'
UNION
SELECT 'green'
UNION
SELECT 'Yellow'
UNION
SELECT 'blue'
DROP TABLE #color
CREATE table #Color
(
Color varchar(10) PRIMARY key
)
INSERT INTO #color
SELECT 'Red'
UNION
SELECT 'White'
UNION
SELECT 'green'
UNION
SELECT 'Yellow'
UNION
SELECT 'blue'
DROP TABLE #color
GO
So my question is: can I say the 1st one is an example of a hash table, since it uses temp tables? If not, why can't we use one in a stored procedure? Also, if SQL Server already creates hash tables internally, why would we need to create one ourselves for working purposes? (It may have performance issues; I'm just curious whether the above examples serve the purpose.) Thanks.
Note: I had an interview last month where this was discussed, which is why I want to make sure my views were correct.

Hash-based algorithms are important for any powerful database. These are used for aggregation and join operations. Hash-based joins have been there since version 7.0 -- which is really old (thanks to Martin Smith). You can read more about them in the documentation.
SQL Server 2014 introduced hash-based indexes for memory optimized tables (see here). These are an explicit use of hash tables. In general, though, the tree-based indexes are more powerful because they can be used in more situations:
For range lookups (including like).
For partial key matches.
For order by.
A hash index can only be used for an exact equality match (and group by).
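As a rough illustration of that trade-off, the sketch below declares a memory-optimized table with a hash index (SQL Server 2014 or later). The table name dbo.SessionCache, its columns, and the bucket count are invented for the example, and the database is assumed to already have a memory-optimized filegroup.

```sql
-- Hypothetical table; assumes the database has a MEMORY_OPTIMIZED_DATA filegroup.
CREATE TABLE dbo.SessionCache
(
    SessionID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    Payload   NVARCHAR(200) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- Exact equality on the whole key: the hash index can go straight to the bucket.
SELECT Payload FROM dbo.SessionCache WHERE SessionID = 42;

-- Range predicate: the hash index cannot help here, so the rows are scanned.
-- A tree-based (range) index would be the better fit for this query.
SELECT Payload FROM dbo.SessionCache WHERE SessionID BETWEEN 40 AND 50;
```

Comparing the execution plans of the two SELECT statements should show a seek for the first and a scan for the second.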

I know I'm a little late to the party, but I don't think anyone has directly answered your original question.
The first is an example of a table variable and the second is an example of a local temporary table; both are created in tempdb.
Contrary to a common assumption, a table variable is not held purely in memory. One real difference is indexing: you cannot run CREATE INDEX against a table variable, so it only gets the indexes declared inline through constraints such as PRIMARY KEY or UNIQUE.
Also, a local (single hash) temp table created at the top level of a connection will stick around until that connection ends (one created inside a stored procedure is dropped when the procedure finishes), while a table variable is only available for the batch it's declared in.
A global temp table (with a double hash before its name) is visible to all connections and persists until the session that created it ends and all other connections have stopped using it.
One final thing: the only reason you can't use that local temp table in a stored procedure is that the procedure creates the same name twice. Even though you've used DROP TABLE, the batch is compiled as a whole first, so both CREATE statements are seen together; nothing executes and SQL Server complains that the object already exists.
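To make the three kinds of object concrete, here is a minimal sketch. The procedure name dbo.TempTableOnce is made up for illustration; it shows one possible way around the duplicate-name error (reusing the table instead of re-creating it), not the only one.

```sql
-- Table variable: scoped to the batch, procedure, or function that declares it.
DECLARE @Color TABLE (Color VARCHAR(10) PRIMARY KEY);

-- Local temporary table (single #): visible only to the current session.
CREATE TABLE #Color (Color VARCHAR(10) PRIMARY KEY);

-- Global temporary table (double ##): visible to every session.
CREATE TABLE ##Color (Color VARCHAR(10) PRIMARY KEY);

DROP TABLE #Color;
DROP TABLE ##Color;
GO

-- Hypothetical procedure: create the temp table once and empty it between uses,
-- so the batch never contains two CREATE TABLE statements for the same name.
CREATE PROCEDURE dbo.TempTableOnce
AS
BEGIN
    CREATE TABLE #Color (Color VARCHAR(10) PRIMARY KEY);

    INSERT INTO #Color (Color)
    SELECT 'Red' UNION SELECT 'White';

    TRUNCATE TABLE #Color;

    INSERT INTO #Color (Color)
    SELECT 'green' UNION SELECT 'blue';

    DROP TABLE #Color;
END
GO
```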

-- @Code is assumed to be a comma-separated VARCHAR parameter of the enclosing procedure
DECLARE @SEPERATOR as VARCHAR(1)
DECLARE @SP INT
DECLARE @VALUE VARCHAR(MAX)
SET @SEPERATOR = ','
CREATE TABLE #TempCode (id int NOT NULL)
/**this Region For Storing SiteCode**/
WHILE PATINDEX('%' + @SEPERATOR + '%', @Code ) <> 0
BEGIN
SELECT @SP = PATINDEX('%' + @SEPERATOR + '%' ,@Code)
SELECT @VALUE = LEFT(@Code , @SP - 1)
SELECT @Code = STUFF(@Code, 1, @SP, '')
INSERT INTO #TempCode (id) VALUES (@VALUE)
END
-- Store whatever remains after the last separator
IF LEN(@Code) > 0 INSERT INTO #TempCode (id) VALUES (@Code)

Related

Optimizing SQL query to return Record with tags

I was looking for help to optimize a query I am writing for SQL Server. Given this database schema:
TradeLead object, a record in this table is a small article.
CREATE TABLE [dbo].[TradeLeads]
(
[TradeLeadID] INT NOT NULL PRIMARY KEY IDENTITY(1,1),
Title nvarchar(250),
Body nvarchar(max),
CreateDate datetime,
EditDate datetime,
CreateUser nvarchar(250),
EditUser nvarchar(250),
[Views] INT NOT NULL DEFAULT(0)
)
Here's the cross reference table to link a TradeLead article to an Industry record.
CREATE TABLE [dbo].[TradeLeads_Industries]
(
[ID] INT NOT NULL PRIMARY KEY IDENTITY(1,1),
[TradeLeadID] INT NOT NULL,
[IndustryID] INT NOT NULL
)
Finally, the schema for the Industry object. These are essentially just tags, but a user is unable to enter these. The database will have a specific amount.
CREATE TABLE [dbo].[Industries]
(
IndustryID INT NOT NULL PRIMARY KEY identity(1,1),
Name nvarchar(200)
)
The procedure I'm writing is used to search for specific TradeLead records. The user would be able to search for keywords in the title of the TradeLead object, search using a date range, and search for a TradeLead with specific Industry Tags.
The database will most likely be holding around 1,000,000 TradeLead articles and about 30 industry tags.
This is the query I have come up with:
DECLARE @Title nvarchar(50);
SET @Title = 'Testing';
-- User defined table type containing a list of IndustryIDs. Would prob have around 5 selections max.
DECLARE @Selectedindustryids IndustryIdentifierTable_UDT;
DECLARE @Start DATETIME;
SET @Start = NULL;
DECLARE @End DATETIME;
SET @End = NULL;
SELECT *
FROM(
-- Subquery to return all the tradeleads that match a user's criteria.
-- These fields can be null.
SELECT TradeLeadID,
Title,
Body,
CreateDate,
CreateUser,
Views
FROM TradeLeads
WHERE(@Title IS NULL OR Title LIKE '%' + @Title + '%') AND (@Start IS NULL OR CreateDate >= @Start) AND (@End IS NULL OR CreateDate <= @End)) AS FTL
INNER JOIN
-- Subquery to return the TradeLeadID for each TradeLead record with related IndustryIDs
(SELECT TI.TradeLeadID
FROM TradeLeads_Industries TI
-- Left join the selected IndustryIDs to the Cross reference table to get the TradeLeadIDs that are associated with a specific industry.
LEFT JOIN @SelectedindustryIDs SIDS
ON SIDS.IndustryID = TI.IndustryID
-- It's possible the user has not selected any IndustryIDs to search for.
WHERE (NOT EXISTS(SELECT 1 FROM @SelectedIndustryIDs) OR SIDS.IndustryID IS NOT NULL)
-- Group by to reduce the amount of records.
GROUP BY TI.TradeLeadID) AS SelectedIndustries ON SelectedIndustries.TradeLeadID = FTL.TradeLeadID
With about 600,000 TradeLead records and with an average of 4 IndustryIDs attached to each one, the query takes around 8 seconds to finish on a local machine. I would like to get it as fast as possible. Any tips or insight would be appreciated.
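There are a few points here.
Using catch-all constructs like (@Start IS NULL OR CreateDate >= @Start) can cause problems with parameter sniffing, because a single cached plan has to serve every combination of supplied and NULL parameters. Two ways of working around it are: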
Add Option (Recompile) to the end of the query
Use dynamic SQL to only include the criteria that the user has asked for.
I would favour the second method for this data.
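A minimal sketch of the dynamic SQL approach, assuming SQL Server 2008 or later and covering only the three scalar filters from the question (the industry filter is left out to keep it short). Only the criteria that were actually supplied end up in the WHERE clause, and the values are still passed as parameters rather than concatenated into the string.

```sql
DECLARE @Title NVARCHAR(50) = N'Testing';
DECLARE @Start DATETIME = NULL;
DECLARE @End   DATETIME = NULL;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT TradeLeadID, Title, Body, CreateDate, CreateUser, [Views]
FROM dbo.TradeLeads
WHERE 1 = 1';

-- Append a predicate only when the user supplied the matching criterion.
IF @Title IS NOT NULL
    SET @sql += N' AND Title LIKE N''%'' + @Title + N''%''';
IF @Start IS NOT NULL
    SET @sql += N' AND CreateDate >= @Start';
IF @End IS NOT NULL
    SET @sql += N' AND CreateDate <= @End';

-- Values travel as parameters, so each distinct query shape gets its own plan.
EXEC sys.sp_executesql
    @sql,
    N'@Title NVARCHAR(50), @Start DATETIME, @End DATETIME',
    @Title = @Title, @Start = @Start, @End = @End;
```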
Next, the query can be rewritten to be more efficient by using exists (assuming the user has entered industry ids)
select
TradeLeadID,
Title,
Body,
CreateDate,
CreateUser,
[Views]
from
dbo.TradeLeads t
where
Title LIKE '%' + @Title + '%' and
CreateDate >= @Start and
CreateDate <= @End and
exists (
select
'x'
from
dbo.TradeLeads_Industries ti
inner join
@Selectedindustryids sids
on ti.IndustryID = sids.IndustryID
where
t.TradeLeadID = ti.TradeLeadID
);
Finally you will want at least one index on the dbo.TradeLeads_Industries table. The following are candidates.
(TradeLeadID, IndustryID)
(IndustryID, TradeLeadID)
Testing will tell you whether one or both is useful.
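For reference, the two candidates could be created like this (the index names are my own invention). Comparing the query with SET STATISTICS IO ON and SET STATISTICS TIME ON before and after is one simple way to run that test.

```sql
-- Candidate 1: supports looking up the industries of a given trade lead.
CREATE NONCLUSTERED INDEX IX_TradeLeads_Industries_Lead_Industry
    ON dbo.TradeLeads_Industries (TradeLeadID, IndustryID);

-- Candidate 2: supports looking up the trade leads of a given industry.
CREATE NONCLUSTERED INDEX IX_TradeLeads_Industries_Industry_Lead
    ON dbo.TradeLeads_Industries (IndustryID, TradeLeadID);
```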

SQL How to Split One Column into Multiple Variable Columns

I am working on MSSQL, trying to split one string column into multiple columns. The string column has numbers separated by semicolons, like:
190230943204;190234443204;
However, some rows have more numbers than others, so in the database you can have
190230943204;190234443204;
121340944534;340212343204;134530943204
I've seen some solutions for splitting one column into a specific number of columns, but not into a variable number of columns. The rows that have less data (2 series of digits separated by semicolons instead of 3) will have nulls in the third place.
Ideas? Let me know if I must clarify anything.
Splitting this data into separate columns is a very good start (comma-separated values are a heresy). However, a "variable number of properties" should typically be modeled as a one-to-many relationship.
CREATE TABLE main_entity (
id INT PRIMARY KEY,
other_fields INT
);
CREATE TABLE entity_properties (
main_entity_id INT NOT NULL,
property_value INT NOT NULL,
PRIMARY KEY (main_entity_id, property_value),
FOREIGN KEY (main_entity_id) REFERENCES main_entity(id)
);
entity_properties.main_entity_id is a foreign key to main_entity.id.
Congratulations, you are on the right path; this is called normalisation. You are about to reach the First Normal Form.
Beware, however: these properties should have a sensibly similar nature (i.e. all phone numbers, or all addresses, etc.). Do not fall to the dark side (a.k.a. the Entity-Attribute-Value anti-pattern) and be tempted to throw all properties into the same table. If you can identify several types of attributes, store each type in a separate table.
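If you still need the values side by side for display, they can be pivoted back out of the one-to-many table at query time. The sketch below is self-contained and uses a table variable standing in for entity_properties (with BIGINT values so the sample numbers from the question fit); the three-column limit is an assumption for the example.

```sql
DECLARE @entity_properties TABLE (
    main_entity_id INT    NOT NULL,
    property_value BIGINT NOT NULL,
    PRIMARY KEY (main_entity_id, property_value)
);

INSERT INTO @entity_properties (main_entity_id, property_value)
VALUES (1, 190230943204), (1, 190234443204),
       (2, 121340944534), (2, 340212343204), (2, 134530943204);

-- Number the values per entity, then spread them over up to three columns.
-- Entities with fewer than three values get NULL in the remaining columns.
SELECT main_entity_id,
       MAX(CASE WHEN rn = 1 THEN property_value END) AS value1,
       MAX(CASE WHEN rn = 2 THEN property_value END) AS value2,
       MAX(CASE WHEN rn = 3 THEN property_value END) AS value3
FROM (
    SELECT main_entity_id,
           property_value,
           ROW_NUMBER() OVER (PARTITION BY main_entity_id
                              ORDER BY property_value) AS rn
    FROM @entity_properties
) AS numbered
GROUP BY main_entity_id;
```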
If these are all fixed length strings (as in the question), then you can do the work fairly simply (at least relative to other solutions):
select substring(col, 1+13*(n-1), 12) as val
from t join
(select 1 as n union all select 2 union all select 3
) n
on len(t.col) >= 13*n.n - 1
This is a useful hack if all the entries are the same size (not so easy if they are of different sizes). Do, however, think about the data structure because semi-colon (or comma) separated list is not a very good data structure.
If I were you, I would create a simple function that splits values separated by ';', like this:
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N'fn_Split_List') AND xtype IN (N'FN', N'IF', N'TF'))
BEGIN
DROP FUNCTION [dbo].[fn_Split_List]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fn_Split_List](@List NVARCHAR(512))
RETURNS @ResultRowset TABLE ( [Value] NVARCHAR(128) PRIMARY KEY)
AS
BEGIN
DECLARE @XML xml = N'<r><![CDATA[' + REPLACE(@List, ';', ']]></r><r><![CDATA[') + ']]></r>'
INSERT INTO @ResultRowset ([Value])
SELECT DISTINCT RTRIM(LTRIM(Tbl.Col.value('.', 'NVARCHAR(128)')))
FROM @XML.nodes('//r') Tbl(Col)
RETURN
END
GO
Then simply call it this way:
SET NOCOUNT ON
GO
DECLARE @RawData TABLE( [Value] NVARCHAR(256))
INSERT INTO @RawData ([Value] )
VALUES ('1111111;22222222')
,('3333333;113113131')
,('776767676')
,('89332131;313131312;54545353')
SELECT SL.[Value]
FROM @RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
SET NOCOUNT OFF
GO
The result is as follows:
Value
1111111
22222222
113113131
3333333
776767676
313131312
54545353
89332131
Anyway, the logic in the function is not complicated, so you can easily put it anywhere you need.
Note: There is no limit on how many values you can have separated with ';', but there are length limits in the function, which you can raise to NVARCHAR(MAX) if you need.
EDIT:
As I can see, there are some rows in your example that will cause the function to return empty strings. For example:
number;number;
will return:
number
number
'' (empty string)
To clear them, just add the following where clause to the statement above like this:
SELECT SL.[Value]
FROM @RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
WHERE LEN(SL.[Value]) > 0

SQL separate string being passed

I have a product table with a tag column, each product has multiple tags stored in this format: "|technology|mobile|acer|laptop|" ...second product's tags could look like this "|computer|laptop|toshiba|"
I am using MS SQL Server 2008 and a stored procedure. I would like to know how I could pass a string like "|computer|laptop|" and get both records returned, as they both have the tag laptop in them; if I passed "|computer|", only the second record would be returned, as it is the only one containing that tag.
What is the best way of doing this in a stored procedure without performance penalties?
I have so far had no luck with the various code samples I have found on the internet. I really hope you can help me with this, thank you.
I agree with the other posters that storing data in a column like that is going to cause headaches. You really want to store those tags in a child table so you can easily and efficiently join them. If it's an inherited system or something you can't refactor right away you can write a split function.
The typical sql split implementation uses a while loop and a table variable in a multi-statement TVF. Every iteration incurs more I/O and CPU overhead. Performance testing on SQL 2005 SP1 showed that this overhead is hidden from the I/O Stats and query plan. Profiling the code will reveal the true cost.
Rewriting that function into a inline TVF is much more efficient. The primary difference between an inline and multi-statement TVF is the Query Optimizer will merge the inline function into the query before processing; this eliminates the overhead from the function call. Also, since there is no table variable required, the additional I/O cost is eliminated. Finally, you avoid the costly iterative processing.
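To show the difference in shape between the two kinds of function, here is a small sketch. The function names dbo.FirstN_Multi and dbo.FirstN_Inline are made up, and the inline version assumes a dbo.Numbers table with an integer column Num, like the one this answer creates.

```sql
-- Multi-statement TVF: rows are first materialised in the table variable @Result,
-- one iteration at a time, before the caller sees any of them.
CREATE FUNCTION dbo.FirstN_Multi (@n INT)
RETURNS @Result TABLE (Num INT PRIMARY KEY)
AS
BEGIN
    DECLARE @i INT;
    SET @i = 1;
    WHILE @i <= @n
    BEGIN
        INSERT INTO @Result (Num) VALUES (@i);
        SET @i = @i + 1;
    END
    RETURN;
END
GO

-- Inline TVF: a single SELECT that the optimizer expands into the calling query.
CREATE FUNCTION dbo.FirstN_Inline (@n INT)
RETURNS TABLE
AS
RETURN
(
    SELECT Num FROM dbo.Numbers WHERE Num <= @n
);
GO

SELECT Num FROM dbo.FirstN_Multi(5);
SELECT Num FROM dbo.FirstN_Inline(5);
```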
Here is the fastest, most scalable split function I could come up with including unit tests and summary.
This function requires a numbers table:
CREATE TABLE dbo.Numbers
(
NUM INT PRIMARY KEY CLUSTERED
)
;WITH Nbrs ( n ) AS
(
SELECT 1 UNION ALL
SELECT 1 + n FROM Nbrs WHERE n < 10000
)
INSERT INTO dbo.Numbers
SELECT n FROM Nbrs
OPTION ( MAXRECURSION 10000 )
The source of the function is here:
IF EXISTS (
SELECT 1
FROM dbo.sysobjects
WHERE id = object_id(N'[dbo].[ParseString]')
AND xtype in (N'FN', N'IF', N'TF'))
BEGIN
DROP FUNCTION [dbo].[ParseString]
END
GO
CREATE FUNCTION dbo.ParseString (@String VARCHAR(8000), @Delimiter VARCHAR(10))
RETURNS TABLE
AS
/*******************************************************************************************************
* dbo.ParseString
*
* Creator: MagicMike
* Date: 9/12/2006
*
*
* Outline: A set-based string tokenizer
* Takes a string that is delimited by another string (of one or more characters),
* parses it out into tokens and returns the tokens in table format. Leading
* and trailing spaces in each token are removed, and empty tokens are thrown
* away.
*
*
* Usage examples/test cases:
Single-byte delimiter:
select * from dbo.ParseString('|HDI|TR|YUM|||', '|')
select * from dbo.ParseString('HDI| || TR |YUM', '|')
select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', '|')
select * from dbo.ParseString('HDI|||TR|YUM', '|')
select * from dbo.ParseString('', '|')
select * from dbo.ParseString('YUM', '|')
select * from dbo.ParseString('||||', '|')
select * from dbo.ParseString('HDI TR YUM', ' ')
select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', ' ') order by Ident
select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', ' ') order by StringValue
Multi-byte delimiter:
select * from dbo.ParseString('HDI and TR', 'and')
select * from dbo.ParseString('Pebbles and Bamm Bamm', 'and')
select * from dbo.ParseString('Pebbles and sandbars', 'and')
select * from dbo.ParseString('Pebbles and sandbars', ' and ')
select * from dbo.ParseString('Pebbles and sand', 'and')
select * from dbo.ParseString('Pebbles and sand', ' and ')
*
*
* Notes:
1. A delimiter is optional. If a blank delimiter is given, each byte is returned in its own row (including spaces).
select * from dbo.ParseString('|HDI|TR|YUM|||', '')
2. In order to maintain compatibility with SQL 2000, ident is not sequential but can still be used in an order clause
If you are running on SQL2005 or later
SELECT Ident, StringValue FROM
with
SELECT Ident = ROW_NUMBER() OVER (ORDER BY ident), StringValue FROM
*
*
* Modifications
*
*
********************************************************************************************************/
RETURN (
SELECT Ident, StringValue FROM
(
SELECT Num as Ident,
CASE
WHEN DATALENGTH(@delimiter) = 0 or @delimiter IS NULL
THEN LTRIM(SUBSTRING(@string, num, 1)) --replace this line with '' if you prefer it to return nothing when no delimiter is supplied. Remove LTRIM if you want to return spaces when no delimiter is supplied
ELSE
LTRIM(RTRIM(SUBSTRING(@String,
CASE
WHEN (Num = 1 AND SUBSTRING(@String,num ,DATALENGTH(@delimiter)) <> @delimiter) THEN 1
ELSE Num + DATALENGTH(@delimiter)
END,
CASE CHARINDEX(@Delimiter, @String, Num + DATALENGTH(@delimiter))
WHEN 0 THEN LEN(@String) - Num + DATALENGTH(@delimiter)
ELSE CHARINDEX(@Delimiter, @String, Num + DATALENGTH(@delimiter)) - Num -
CASE
WHEN Num > 1 OR (Num = 1 AND SUBSTRING(@String,num ,DATALENGTH(@delimiter)) = @delimiter)
THEN DATALENGTH(@delimiter)
ELSE 0
END
END
)))
End AS StringValue
FROM dbo.Numbers
WHERE Num <= LEN(@String)
AND (
SUBSTRING(@String, Num, DATALENGTH(ISNULL(@delimiter,''))) = @Delimiter
OR Num = 1
OR DATALENGTH(ISNULL(@delimiter,'')) = 0
)
) R WHERE StringValue <> ''
)
For your case, you could use it like this:
--SAMPLE DATA
CREATE TABLE #products
(
productid INT IDENTITY PRIMARY KEY CLUSTERED ,
prodname VARCHAR(200),
tags VARCHAR(200)
)
INSERT INTO #products (prodname, tags)
SELECT 'toshiba laptop', '|laptop|toshiba|notebook|'
UNION ALL
SELECT 'toshiba netbook', '|netbook|toshiba|'
UNION ALL
SELECT 'Apple macbook', '|laptop|apple|notebook|'
UNION ALL
SELECT 'Apple mouse', '|apple|mouse'
--Actual solution
DECLARE @searchTags VARCHAR(200)
SET @searchTags = '|apple|laptop|' --This would be the string that would get passed in if it were a stored procedure
--First we convert the supplied tags into a table for use later
--My (2005) dev box raised a severe error attempting to do the search in 1 step
--hence the temp table
CREATE TABLE #tags
(
tag VARCHAR(200) PRIMARY KEY CLUSTERED
)
INSERT INTO #tags --The function splits the string up into one record for each value
SELECT stringValue
FROM dbo.parsestring(@searchTags,'|') --SQL 2005 has a real problem joining to a TVF twice, apparently
SELECT DISTINCT p.*
FROM #products P --we join the products table with the function to get a row for each tag so we can compare with the temp table
CROSS APPLY (SELECT stringValue FROM dbo.parsestring(P.tags,'|')) T
WHERE EXISTS(SELECT * FROM #tags WHERE tag = T.stringValue) --we compare the rows with our temp table and if we get matches, the products are returned
/*This will return the Apple Macbook and the Toshiba Laptop because they both contain
the 'laptop' tag and the Apple mouse because it contains the 'apple' tag. The
toshiba netbook contains neither tag so it won't be returned.*/
But, with your tags in a separate table as suggested (1-many for a simplified example) It would look like this:
SELECT * FROM Products P
WHERE EXISTS (SELECT *
FROM tags T
INNER JOIN dbo.parsestring(@tags,'|') Q
ON T.tag = Q.StringValue
WHERE T.productid = P.productid )
You have a many-to-many relationship between products and tags. The best way of doing this is to redesign your database. Create a table of tags and a junction table that links products with tags.
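A minimal sketch of what that redesign could look like. All table and column names here are assumptions made for illustration, not taken from the asker's schema.

```sql
CREATE TABLE dbo.Products
(
    ProductID INT IDENTITY(1,1) PRIMARY KEY,
    ProdName  VARCHAR(200) NOT NULL
);

CREATE TABLE dbo.Tags
(
    TagID INT IDENTITY(1,1) PRIMARY KEY,
    Tag   VARCHAR(50) NOT NULL UNIQUE
);

-- Junction table: one row per (product, tag) pair.
CREATE TABLE dbo.ProductTags
(
    ProductID INT NOT NULL REFERENCES dbo.Products (ProductID),
    TagID     INT NOT NULL REFERENCES dbo.Tags (TagID),
    PRIMARY KEY (ProductID, TagID)
);
GO

-- Products carrying at least one of the requested tags.
SELECT DISTINCT p.ProductID, p.ProdName
FROM dbo.Products AS p
JOIN dbo.ProductTags AS pt ON pt.ProductID = p.ProductID
JOIN dbo.Tags AS t ON t.TagID = pt.TagID
WHERE t.Tag IN ('computer', 'laptop');
```

With this layout the search becomes an ordinary indexed join instead of string parsing on every row.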
That's not a very good design. Combining like terms into one field and separating them with a delimiter such as a vertical bar does not scale well and it is very limiting.
I recommend you read up on how to design databases. The best book I ever bought regarding database design was Database Design for Mere Mortals by Michael Hernandez, ISBN: 0-201-69471-9. I noticed he has a second edition.
He walks you through the entire process, from start to finish, of designing a database. I recommend you start with this book.
You have to learn to look at things in groups or chunks. Database design has simple building blocks just like programming does. If you gain a thorough understanding of these simple building blocks you can tackle any database design.
In programming you have:
If Constructs
If Else Constructs
Do While Loops
Do Until Loops
Case Constructs
With databases you have:
Data Tables
Lookup Tables
One to One relationships
One to Many Relationships
Many to Many relationships
Primary keys
Foreign keys
The simpler you make things the better. A database is nothing more than a place where you put data into cubbie holes. Start by identifying what these cubbie holes are and what kind of stuff you want in them.
You are never going to create the perfect database design the first time you try. This is a fact. Your design will go through several refinements during the process. Sometimes things won't seem apparent until you start entering data, and then you have an ahh ha moment.
The web brings its own set of challenges. Bandwidth issues. Statelessness. Erroneous data from processes that start but never get finished.
Make the split with a CLR function that returns a table of values, or pass the values as XML, load them into a table variable, and join against it:
create procedure search
(
@data xml
)
AS
BEGIN
--declare @data xml
declare @LoadData table
(
dataToFind varchar(max)
)
--set @data= cast(
--'<data>
-- <item>computer</item>
-- <item>television</item>
--</data>' as xml)
insert into @LoadData
SELECT T2.Loc.value('.','varchar(max)')
FROM (select @data as data )T
CROSS APPLY data.nodes('/data/item') as T2(Loc)
select * from @LoadData--use for join
END
I would suggest you add an extra couple of tables with a "proper" design.
Populate those tables from the existing, poorly designed column. This way your search will work properly, but others using the old | pipe approach won't notice until you have time to refactor.

Dynamic table design (common lookup table), need a nice query to get the values

sql2005
This is my simplified example:
(in reality there are 40+ tables in here, I only showed 2)
I got a table called tb_modules, with 3 columns (id, description, tablename as varchar):
1, UserType, tb_usertype
2, Religion, tb_religion
(Last column is actually the name of a different table)
I got an other table that looks like this:
tb_value (columns:id, tb_modules_ID, usertype_OR_religion_ID)
values:
1111, 1, 45
1112, 1, 55
1113, 2, 123
1114, 2, 234
so, I mean 45, 55, 123, 234 are usertype OR religion ID's
(45, 55 usertype, 123, 234 religion ID`s)
Don't judge, I didn't design the database
Question
How can I make a select, showing * from tb_value, plus one column
That one column would be TITLE from the tb_usertype or RELIGIONNAME from the tb_religion table
I would like to make a general thing.
Was thinking initially about maybe a SQL function that returns a string, but I think I would need dynamic SQL, which is not ok in a function.
Anyone a better idea ?
At the beginning we have this -- which is quite messy.
To clean-up a bit I add two views and a synonym:
create view v_Value as
select
ID as ValueID
, tb_modules_ID as ModuleID
, usertype_OR_religion_ID as RemoteID
from tb_value ;
go
create view v_Religion as
select
ID
, ReligionName as Title
from tb_religion ;
go
create synonym v_UserType for tb_UserType ;
go
And now the model looks like
It is easier now to write the query
;
with
q_mod as (
select
m.ID as ModuleID
, coalesce(x1.ID , x2.ID) as RemoteID
, coalesce(x1.Title , x2.Title) as Title
, m.Description as ModuleType
from tb_Modules as m
left join v_UserType as x1 on m.TableName = 'tb_UserType'
left join v_Religion as x2 on m.TableName = 'tb_Religion'
)
select
a.ModuleID
, v.ValueID
, a.RemoteID
, a.ModuleType
, a.Title
from q_mod as a
join v_Value as v on (v.ModuleID = a.ModuleID and v.RemoteID = a.RemoteID) ;
There is an obvious pattern in this query, so it can be created as dynamic sql if you have to add another module-type table. When adding another table, use ID and Title to avoid having to use a view.
EDIT
To build dynamic sql (or query on application level)
Modify lines 6 and 7, the x-index is tb_modules.id
coalesce(x1. , x2. , x3. ..)
Add lines to the left join (below line 11)
left join v_SomeName as x3 on m.TableName = 'tb_SomeName'
The SomeName is tb_modules.description and x-index is matching tb_modules.id
EDIT 2
The simplest would probably be to package the above query into a view and then, each time the schema changes, dynamically create and run ALTER VIEW. This way the query would not change from the application's point of view.
Since we're all agreed the design is flaky, I'll skip any comments on that. The pattern of the query is this:
-- Query 1
select tb_value.*,tb_religion.religion_name as ANY_DESCRIPTION
from tb_value
JOIN tb_religion on tb_value.ANY_KIND_OF_ID = tb_religion.id
WHERE tb_value.module_id = 2
-- combine it with...
UNION ALL
-- ...Query 2
select tb_value.*,tb_userType.title as ANY_DESCRIPTION
from tb_value
JOIN tb_userType on tb_value.ANY_KIND_OF_ID = tb_userType.id
WHERE tb_value.module_id = 1
-- combine it with...
UNION ALL
-- ...Query 3
select lather, rinse, repeat for 40 tables!
You can actually define a view that hardcodes all 40 cases, and then put filters onto queries for the particular modules you want.
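For the two tables shown in the question, such a view might look like the sketch below. The view name v_value_described is made up, the column names TITLE and RELIGIONNAME come from the question, and the lookup tables are assumed to have an id column of compatible type.

```sql
CREATE VIEW dbo.v_value_described
AS
SELECT v.id, v.tb_modules_ID, v.usertype_OR_religion_ID,
       u.TITLE AS ANY_DESCRIPTION
FROM tb_value AS v
JOIN tb_usertype AS u ON u.id = v.usertype_OR_religion_ID
WHERE v.tb_modules_ID = 1
UNION ALL
SELECT v.id, v.tb_modules_ID, v.usertype_OR_religion_ID,
       r.RELIGIONNAME
FROM tb_value AS v
JOIN tb_religion AS r ON r.id = v.usertype_OR_religion_ID
WHERE v.tb_modules_ID = 2;
-- ...one more UNION ALL branch per remaining module table
GO

-- Callers then filter on the module they are interested in.
SELECT * FROM dbo.v_value_described WHERE tb_modules_ID = 2;
```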
To do this dynamically you need to be able to create a sql statement that looks like this
select tb_value.*, tb_usertype.title as Descr
from tb_value
inner join tb_usertype
on tb_value.extid = tb_usertype.id
where tb_value.tb_module_id = 1
union all
select tb_value.*, tb_religion.religionname as Descr
from tb_value
inner join tb_religion
on tb_value.extid = tb_religion.id
where tb_value.tb_module_id = 2
-- union 40 other tables
Currently you can not do that because you do not have any information in the db telling you which column to use from tb_religion and tb_usertype etc. You can add that as a new field in tb_module.
If you have fieldname to use in tb_module you can build a view that does what you want.
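Adding that field could look roughly like this. The column name fieldname matches the one used in the rest of this answer; its type and the values stored in it are assumptions based on the column names given in the question.

```sql
-- New column holding the name of the description column in each lookup table.
ALTER TABLE tb_modules ADD fieldname VARCHAR(50) NULL;
GO

-- Separate batch, so the new column is visible when these statements compile.
UPDATE tb_modules SET fieldname = 'title'        WHERE tablename = 'tb_usertype';
UPDATE tb_modules SET fieldname = 'religionname' WHERE tablename = 'tb_religion';
```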
And you could add a trigger to table tb_modules that alters the view whenever tb_modules is modified. That way you do not need to use dynamic sql from the client when doing queries. The only thing you need to worry about is that the table needs to be created in the db before you add a new row to tb_modules
Edit 1
Of course the code in the trigger needs to dynamically build the alter view statement.
Edit 2 You also need to have a field with information about what column in tb_usertype and tb_religion etc. to join against tb_value.extid (usertype_OR_religion_ID). Or you can assume that the field will always be called id
Edit 3 Here is how you could build the trigger on tb_module that alters the view v_values. I have added fieldname as a column in tb_modules and I assume that the id field in the related tables is called id.
create trigger tb_modules_change on tb_modules after insert, delete, update
as
declare @sql nvarchar(max)
declare @moduleid int
declare @tablename varchar(50)
declare @fieldname varchar(50)
set @sql = 'alter view v_value as '
declare mcur cursor for
select id, tablename, fieldname
from tb_modules
open mcur
fetch next from mcur into @moduleid, @tablename, @fieldname
while @@FETCH_STATUS = 0
begin
-- one select per module table; the description column is always exposed as Descr
set @sql = @sql + 'select tb_value.*, '+@tablename+'.'+@fieldname+' as Descr '+
'from tb_value '+
'inner join '+@tablename+' '+
'on tb_value.extid = '+@tablename+'.id '+
'where tb_value.tb_module_id = '+cast(@moduleid as varchar(10))
fetch next from mcur into @moduleid, @tablename, @fieldname
if @@FETCH_STATUS = 0
begin
set @sql = @sql + ' union all '
end
end
close mcur
deallocate mcur
exec sp_executesql @sql
Hm..there are probably better solutions available but here's my five cents:
SELECT
id,tb_modules_ID,usertype_OR_religion_ID,
COALESCE(
(SELECT TITLE FROM tb_usertype WHERE Id = usertype_OR_religion_ID),
(SELECT RELIGIONNAME FROM tb_religion WHERE Id = usertype_OR_religion_ID),
'N/A'
) AS SourceTable
FROM tb_value
Note that I don't have the possibility to check the statement right now so I'm reserving myself for any syntax errors...
First, using your current design the only reasonable solution is dynamic SQL. You should write a module in your middle-tier that queries for the appropriate table names and builds the queries on the fly. Trying to accomplish that in T-SQL will be a nightmare. T-SQL was not designed for string construction.
The right solution is to build a new database designed properly, migrate the data and scrap the existing design. The problems you will encounter with your current design will simply grow. It will be harder for new developers to learn the new system. It will be prone to errors. There will be no data integrity (e.g. forcing the attribute "Start Date" to be parsable as a date). Custom queries will be a chore to write and so on. Eventually, you will hit the day when the types of information desired from the system are simply too difficult to extract given the current design.
First take the undesigner out the back and put them out of their misery. They are hurting people.
Due to their incompetence, every time you add a row to Module, you have to modify every query that uses it. Good for www.dailywtf.com.
You do not have Referential Integrity either, because you cannot define an FK on the this_or_that column. Your data is exposed, probably to "code" written by the same undesigner. No doubt you are aware that this is where the deadlocks are created.
That it is a "judgement", that is so that you understand the gravity of the undesign, and you can justify replacing it, to your managers.
SQL was designed for Relational Databases, that means Normalised. It is not good for mangled files. Sure, some queries may be better than others (just look at the answers), but there is no way to get around the undesign, any SQL query will be hamstrung, and need change whenever a Module row is added.
"Dynamic" is reserved for Databases, not possible for flat files.
Two answers. One to stop the continuing idiocy of changing the existing queries every time a Module row is added (you're welcome); the second to answer your question.
Safe Future Queries
CREATE VIEW UserType_vw AS
SELECT [XxxxId] = id, -- replace Xxxx
[UserTypeId] = usertype_OR_religion_ID
FROM tb_value
WHERE tb_modules_ID = 1
GO
CREATE VIEW UserReligion_vw AS
SELECT [XxxxId] = id,
[ReligionId] = usertype_OR_religion_ID
FROM tb_value
WHERE tb_modules_ID = 2
From now on, make sure that all queries currently using the undesign are modified to use the correct View instead. Do not use the Views for Update/Delete/Insert.
Answer
Ok, now for the main question. I can think of other approaches, but this one is the best. You have stated, you want the third column to also be an unnormalised piece of chicken excreta and the supply Title for [EITHER_Religion_OR_UserType_OR_This_OR_That]. Right, so you are teaching the user to be confused as well; when the no of modules grow, they will have great fun figuring out what the column contains. Yes a problem does always compound itself.
SELECT [XxxxId] = id,
[Whatever] = CASE tb_modules_ID
WHEN 1 THEN ( SELECT name -- title, whatever
FROM tb_religion
WHERE id = V.usertype_OR_religion_ID
)
WHEN 2 THEN ( SELECT name -- title, whatever
FROM tb_usertype
WHERE id = V.usertype_OR_religion_ID
)
ELSE '(UnknownModule)' -- do not remove the brackets
END
FROM tb_value V
WHERE conditions... -- you need something here
This is called a Correlated Scalar Subquery.
It works on any version of Sybase since 4.9.2 with no limitations. And SQL 2005 (last time I looked, anyway, Aug 2009). But on MS you will get a StackTrace if the volume of tb_value is large, so make sure the WHERE clause has some conditions on it.
But MS have broken the server with their "new" 2008 codeline, so it does not work in all circumstances (the worse your mangled files, the less likely it will work; the better your database design, the more likely it will work). That is why some MS people pray every day for the next Service pack, and others never attend church.
I guess you want something like this:
Adding tables, and one row per table in tb_modules, is straightforward.
SET NOCOUNT ON
if OBJECT_ID('tb_modules') > 0 drop table tb_modules;
if OBJECT_ID('tb_value') > 0 drop table tb_value;
if OBJECT_ID('tb_usertype') > 0 drop table tb_usertype;
if OBJECT_ID('tb_religion') > 0 drop table tb_religion;
go
create table dbo.tb_modules (
id int,
description varchar(20),
tablename varchar(255)
);
insert into tb_modules values ( 1, 'UserType', 'tb_usertype');
insert into tb_modules values ( 2, 'Religion', 'tb_religion');
create table dbo.tb_value(
id int,
tb_modules_ID int,
usertype_OR_religion_ID int
);
insert into tb_value values ( 1111, 1, 45);
insert into tb_value values ( 1112, 1, 55);
insert into tb_value values ( 1113, 2, 123);
insert into tb_value values ( 1114, 2, 234);
create table dbo.tb_usertype(
id int,
UserType varchar(30)
);
insert into tb_usertype values ( 45, 'User_type_45');
insert into tb_usertype values ( 55, 'User_type_55');
create table dbo.tb_religion(
id int,
Religion varchar(30)
);
insert into tb_religion values ( 123, 'Religion_123');
insert into tb_religion values ( 234, 'Religion_234');
-- start of query
declare @sql varchar(max) = null
-- build one "Select <id> type, id, <description> description from <tablename>" per tb_modules row, joined by union all
Select @sql = case when @sql is null then ' ' else @sql + char(10) + 'union all ' end
+ 'Select ' + str(id) + ' type, id, ' + description + ' description from ' + tablename from tb_modules
-- wrap the union in a derived table and join it back to tb_value
set @sql = 'select v.id, tb_modules_ID , usertype_OR_religion_ID , t.description
from tb_value v
join ( ' + @sql + ') as t
on v.tb_modules_ID = t.type and v.usertype_OR_religion_ID = t.id
'
Print @sql
exec( @sql)
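For the two sample rows in tb_modules, the statement shown by the Print should expand to roughly the following. Whitespace is tidied here (str(id) pads the number with leading spaces), and the order of the union branches depends on the order in which the rows of tb_modules happen to be read:

```sql
-- Approximate expansion of the generated statement for the sample data above.
select v.id, tb_modules_ID, usertype_OR_religion_ID, t.description
from tb_value v
join (
    Select 1 type, id, UserType description from tb_usertype
    union all
    Select 2 type, id, Religion description from tb_religion
) as t
on v.tb_modules_ID = t.type and v.usertype_OR_religion_ID = t.id
```

Note that the description value stored in tb_modules is used as the column name in each lookup table, so every new module table needs a column named after its description.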
I think it's intended to be used with dynamic sql.
Maybe break out each tb_value.tb_modules_ID row into its own temp table, named with the tb_modules.tablename.
Then have an sp iterate through the temp tables matching your naming convention (by prefix or suffix) building the sql and doing your join.

SQL Inline or Scalar Function?

So I need an SQL function that will concatenate a bunch of row values into one varchar.
I have the functions written but right now I'm focused on what is the better choice for performance.
The Scalar Function is
CREATE FUNCTION fn_GetPatients_ByRecipient (@recipient int)
RETURNS varchar(max)
AS
BEGIN
DECLARE @patients varchar(max)
SET @patients = ''
SELECT @patients = @patients + convert(varchar, Patient) + ';' FROM RecipientsPatients WHERE Recipient = @recipient
RETURN @patients
END
The Inline Function just returns a table of all the values instead of concatenating them.
CREATE FUNCTION fn_GetPatients_ByRecipient (@recipient int)
RETURNS TABLE
AS
RETURN
(
SELECT Patient FROM RecipientsPatients WHERE Recipient = @recipient
)
I would then take this table in a separate function and concatenate them together. I was thinking the second choice is best since I will be going row by row through a smaller data set. Any opinions on what I'm doing right/wrong would be appreciated.
Thanks
This problem of string concatenation in SQL Server has several solutions, and the pros and cons are discussed in Concatenating Row Values in Transact-SQL and other similar articles on the web.
My favourite solution is the FOR XML PATH('') trick. The chained assignment method you use works fine, although it is not officially supported and hence may break in the future. Your method should be among the fastest possible, if not the fastest, as long as the table-valued function does not perform a full scan, i.e. you have an index on Recipient that covers Patient (use INCLUDE).
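A covering index of that kind could be sketched as follows. The index name is hypothetical, and the table is assumed to live in the dbo schema:

```sql
-- Sketch: seek on Recipient, with Patient carried in the index leaf level
-- so the lookup never has to touch the base table.
CREATE NONCLUSTERED INDEX IX_RecipientsPatients_Recipient
    ON dbo.RecipientsPatients (Recipient)
    INCLUDE (Patient);
```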
The only thing I would add is to declare both functions WITH SCHEMABINDING, this has side effects that improve performance.
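For illustration, the scalar function from the question with that option added might look like this. Schema binding requires two-part object names, so the dbo schema is assumed here:

```sql
-- Sketch: same logic as the question's scalar function, plus WITH SCHEMABINDING.
-- Assumes both the function and the table belong to the dbo schema.
CREATE FUNCTION dbo.fn_GetPatients_ByRecipient (@recipient int)
RETURNS varchar(max)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @patients varchar(max);
    SET @patients = '';
    SELECT @patients = @patients + convert(varchar, Patient) + ';'
    FROM dbo.RecipientsPatients
    WHERE Recipient = @recipient;
    RETURN @patients;
END
```

While the function is schema-bound, the columns it references cannot be altered or dropped without first dropping or altering the function.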
See here for an example of using the FOR XML PATH trick
set nocount on;
declare @t table (id int, name varchar(20), x char(1))
insert into @t (id, name, x)
select 1,'test1', 'a' union
select 1,'test1', 'b' union
select 1,'test1', 'c' union
select 2,'test2', 'a' union
select 2,'test2', 'c' union
select 3,'test3', 'b' union
select 3,'test3', 'c'
SELECT p1.id, p1.name,
stuff((SELECT ', ' + x
FROM @t p2
WHERE p2.id = p1.id
ORDER BY name, x
FOR XML PATH('') ), 1,2, '') AS p3
FROM @t p1
GROUP BY
id, name
it returns
1 test1 a, b, c
2 test2 a, c
3 test3 b, c
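Applied to the table from the question, the same pattern might look like the sketch below. The dbo schema and the sample value for the variable are assumptions, and the column types are guessed from the question's use of convert(varchar, Patient):

```sql
-- Sketch: concatenate all Patient values for one Recipient, separated by ';'.
-- @recipient = 1 is a made-up value for illustration.
DECLARE @recipient int = 1;

SELECT STUFF((
           SELECT ';' + CONVERT(varchar(20), rp.Patient)
           FROM dbo.RecipientsPatients AS rp
           WHERE rp.Recipient = @recipient
           ORDER BY rp.Patient
           FOR XML PATH('')
       ), 1, 1, '') AS Patients;
```

Unlike the function in the question, this puts the separator between values instead of after each one, and it returns NULL rather than an empty string when the recipient has no rows.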
Have a look at Adam Machanic's results from his Grouped String Concatenation Contest:
http://web.archive.org/web/20150328021904/http://sqlblog.com/blogs/adam_machanic/archive/2009/05/31/grouped-string-concatenation-the-winner-is.aspx
It has the code to show you the most efficient way to do this. Peter Larsson, who won the contest, used a combination of tricks including XML PATH to accomplish the task. There was some debate later about whether it was the most efficient solution based on subsequent tests of other submissions. Make sure you check the comments to know what scripts to look at in the zip file you can download there. Generally FOR XML PATH('') is the fastest though.