Joining sql column with comma delimited column - sql

I have three tables that look like
table1
ID Title Description
1 Title1 Desc1
2 Title2 Desc2
table2
ID HistoryID
1 1001
2 1002
2 1003
2 1004
table3
HistoryID Value
1001 val1
1002 val2
1003 val3
1004 val4
Now I am planning to do it using "only" two tables:
table1
ID Title Description HistoryIDList
1 Title1 Desc1 1001
2 Title2 Desc2 1002,1003,1004
table3
HistoryID Value
1001 val1
1002 val2
1003 val3
1004 val4
I have created a SQL table-valued function that will return the indexed values 1002, 1003, 1004 so that they can be joined with HistoryID from table3.
Since I am losing normalization, and do not have a FK for HistoryIDList, my questions are:
should there be a significant performance issue running a query that joins on HistoryIDList?
would indexing the SQL function do the trick, or not, since there is no relation between the two columns?
In that case, is it possible to add a FK on the table created in the SQL function?

Why would you change a good data structure to a bogus data structure? The two table version is a bad choice.
Yes, there is a significant performance difference when joining from a single id to a list, compared to a simple equi-join. And, as bad as that normally is, the situation is even worse here because the type of the id is presumably an int in the original table and a character string in the other table.
There is no way to enforce foreign key relationships with your proposed structure without using triggers.
The only thing that you could possibly do to improve performance would be to have a full text index on the HistoryIdList column. This might speed the processing. Once again, this is complicated by the fact that you are using numeric ids.
My recommendation: just don't do it. Leave the table structure as it is, because that is the best structure for a relational database.
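To make the performance difference concrete, here is a hedged sketch of the two query shapes. The first is the original normalized design; the second is what the denormalized design forces on every query. STRING_SPLIT exists only on SQL Server 2016+; on 2008 you would substitute your own table-valued split function, and no index on HistoryIDList can be used either way:

```sql
-- Normalized (original) version: a simple, index-friendly equi-join
SELECT t1.Title, t3.Value
FROM table1 t1
JOIN table2 t2 ON t2.ID = t1.ID
JOIN table3 t3 ON t3.HistoryID = t2.HistoryID;

-- Denormalized version: the comma list must be split on every query,
-- and each fragment cast back to int before the join
SELECT t1.Title, t3.Value
FROM table1 t1
CROSS APPLY STRING_SPLIT(t1.HistoryIDList, ',') s
JOIN table3 t3 ON t3.HistoryID = CAST(s.value AS int);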

Related

How will SQL Server query a table partitioned using a calculated field if the predicate col is a factor in the field?

Imagine you have table t where statusID is a calculated field based on status: 'Live'=1, 'Deleted'=2, and so on. t is partitioned by statusID, meaning that all 1s go into one partition, all 2s into another, etc. Now let's query t. If I want all live records, can I use WHERE status='live', or must I use WHERE statusID=1 to take advantage of the partition?
ID  val1  val2  status    statusID
1   abc   ABC   live      1
2   xyz   XYZ   deleted   2
3   foo   BAR   archived  3
We have some existing tables which are getting quite large, and code that leverages the status col, which is indexed. For most loads this is fine, but with the rows getting into the millions we are starting to see issues with joins.
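For reference, a minimal sketch of the kind of setup being described. The partition scheme name ps_status is an assumption, as is the exact CASE mapping; in general, SQL Server only guarantees partition elimination when the predicate references the partitioning column itself, so WHERE statusID = 1 is the safe form:

```sql
-- Hypothetical sketch: statusID as a persisted computed column
-- used as the partitioning key (ps_status is an assumed partition scheme)
CREATE TABLE t (
    ID       int          NOT NULL,
    val1     varchar(10)  NULL,
    val2     varchar(10)  NULL,
    status   varchar(10)  NOT NULL,
    statusID AS (CASE status
                     WHEN 'live'     THEN 1
                     WHEN 'deleted'  THEN 2
                     WHEN 'archived' THEN 3
                 END) PERSISTED
) ON ps_status (statusID);

-- Predicate on the partitioning column: partition elimination applies
SELECT * FROM t WHERE statusID = 1;

-- Predicate on status alone: elimination is not guaranteed
SELECT * FROM t WHERE status = 'live';
```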

SQL Server "pseudo/synthetic" composite Id(key)

Sorry, but I don't know what to call what I need in the title.
I want to create a unique key where each two digits of the number identify another table's PK. Let's say I have the PKs below in these 3 tables:
Id  Company      Id  Area        Id  Role
1   Abc          1   HR          1   Assistant
2   Xyz          2   Financial   2   Manager
3   Qwe          3   Sales       3   VP
Now I need to insert values into another table. I know that I could use 3 columns and create a composite key to get integrity and uniqueness, as below:
Id_Company  Id_Area  Id_Role  ...Other_Columns...
1           2        1
1           1        2
2           2        2
3           3        3
But I was thinking of creating a single column where each X digits identify each FK. So the first 3 columns of the table above would become like below (supposing each FK is one digit):
Id ...Other_Columns.....
121
112
222
333
I don't know what to call this, and it may even be a stupid idea, but it makes sense to me: I can select on a single column, and if I need a join I just split the number every X digits, by my own definition.
It's called a "smart", "intelligent" or "concatenated" key. It's a bad idea. It is fragile, leads to update problems and impedes the DBMS. The DBMS and query language are designed for you to describe your application via base tables in a straightforward way. Use them as they were intended.
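For comparison, a sketch of the straightforward design the answer recommends instead of the concatenated key. The table name Assignment is an assumption (the question never names the target table); the point is that three plain FK columns plus a composite primary key give you the integrity the smart key cannot:

```sql
-- Assumed table name; three real FK columns instead of one smart key
CREATE TABLE Assignment (
    Id_Company int NOT NULL REFERENCES Company(Id),
    Id_Area    int NOT NULL REFERENCES Area(Id),
    Id_Role    int NOT NULL REFERENCES Role(Id),
    -- ...Other_Columns...
    PRIMARY KEY (Id_Company, Id_Area, Id_Role)
);
```

With this shape the DBMS enforces every reference, and no splitting is ever needed in queries.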

Using a list with a selected item as a value

Let us consider the following table structures:
Table1
Table1_ID A
1 A1
2 A1;B1
and
Table2
Table2_ID Table1_ID B C
1 1 foobar barfoo
2 2 foofoo barbar
The view I'm using is defined by the following query:
SELECT Table1.A, B, C
FROM Table2
INNER JOIN Table1 ON Table1.Table1_ID = Table2.Table1_ID;
95% of A's data consists of a 2-character string. In this case, it works fine. However, 5% of it is actually a list (using a semicolon as a separator) of possible values for this field.
This means my users would like to choose between these values when it is appropriate, and keep using the single value automatically the rest of the time. Of course, this is not possible with a single INNER JOIN, since there cannot be a constant selected value.
Table2 is very large, while Table1 is quite small. Manually filling a local A field in each row within Table2 would be a huge waste of time.
Is there an efficient way for SQL (or, more specifically, SQL Server 2008) to handle this? Such as a list with a selected item within a field?
I was planning to add a "A_ChosenValue" field that would store the chosen value when there's a list in A, and remain empty when A only stores a single value. It would only require users to fill it 5% of the time, which is okay. But I was thinking there could be a better way than using two columns to store a single value.
Ideally you would just alter your schema and add a new entity to support the many-to-many relationship between Table1 and Table2 such as the following with a compound key of all three columns.
Table3
| Table1_ID | Table2_ID | A  |
|-----------|-----------|----|
| 1         | 1         | A1 |
| 2         | 2         | A1 |
| 2         | 2         | B1 |
You could then do a select and join on this table, and provided it is indexed you won't lose any performance.
Without altering the table structure or normalizing the data, it is possible to use a conditional select statement like the one shown in this SO post, but that query wouldn't perform as well, because you would have to use a function to split the values containing a semicolon.
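A hedged sketch of that suggested Table3 and the join against it (the type of A is an assumption; the compound key covers all three columns as the answer describes):

```sql
-- Junction table resolving the many-to-many between Table1 and Table2
CREATE TABLE Table3 (
    Table1_ID int         NOT NULL REFERENCES Table1(Table1_ID),
    Table2_ID int         NOT NULL REFERENCES Table2(Table2_ID),
    A         varchar(50) NOT NULL,               -- assumed type
    PRIMARY KEY (Table1_ID, Table2_ID, A)
);

-- The view then joins through Table3 instead of splitting strings
SELECT t3.A, t2.B, t2.C
FROM Table2 t2
JOIN Table3 t3 ON t3.Table2_ID = t2.Table2_ID;
```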
Answering my own question:
I added a LocalA column in Table1, so that my view actually selects ISNULL(LocalA, Table1.A). The displayed value therefore equals A by default, and users can manually override it to select a specific value when A stores a list.
I am not sure whether this is the most efficient solution or not, but at least it works without requiring two columns in the view.
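A minimal sketch of that view, assuming a new nullable LocalA column on Table1 and the original join from the question:

```sql
-- LocalA overrides A when set; otherwise the original A shows through
ALTER TABLE Table1 ADD LocalA varchar(50) NULL;  -- assumed type

CREATE VIEW vTable AS
SELECT ISNULL(Table1.LocalA, Table1.A) AS A, B, C
FROM Table2
INNER JOIN Table1 ON Table1.Table1_ID = Table2.Table1_ID;
```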

DB Design: Same Column used for 2 different foreign keys

I'm developing a method of joining 2 sources of Data (e.g. Queries).
I have a table Named QueryField with the following structure:
QueryID
FieldID
FieldName
....
If I have 2 records on QueryField
QueryID   FieldID   FieldName
-------   -------   -------------
1         1         CustomerID
1         2         CustAddress
2         3         CustNo
2         4         CustomerPhone
I want to have a new table QueryFieldJoin which defines which fields in the 2 queries to use to join on. My idea was to have the following structure
LeftJoinFieldID (FK from FieldID of QueryField)
RightJoinFieldID (also FK from FieldID of QueryField)
JoinType (intersect, outer join).
PrimaryKey is a combination of LeftJoinFieldID and RightJoinFieldID
LeftJoinFieldID RightJoinFieldId JoinType
-------------- ---------------- --------
1 3 Intersect
This will work; however, I feel that it isn't the best DB design to have the same field serve as a foreign key for two different columns in another table. Can anybody suggest a better approach?
The DB design also depends on your needs:
1) Which queries do you need to answer?
2) How fast do you need to access the data?
From an expressive point of view, your design can be correct, but it may not be the best solution depending on which queries you need to run.
For instance, you might consider having three different tables: one for the Fields, one for the Queries, and one for the Operations.
Or even one big table with everything in it, if you do not want to perform any joins.
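For what it's worth, two foreign keys referencing the same column are perfectly legal and quite common (think "manager" and "mentor" both pointing at an employees table). A sketch of the proposed QueryFieldJoin, assuming FieldID is the primary key (or at least unique) in QueryField, which REFERENCES requires:

```sql
CREATE TABLE QueryFieldJoin (
    LeftJoinFieldID  int         NOT NULL REFERENCES QueryField(FieldID),
    RightJoinFieldID int         NOT NULL REFERENCES QueryField(FieldID),
    JoinType         varchar(20) NOT NULL,
    PRIMARY KEY (LeftJoinFieldID, RightJoinFieldID)
);
```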

Table-level diff and sync procedure for T-SQL

I'm interested in T-SQL source code for synchronizing a table (or perhaps a subset of it) with data from another similar table. The two tables could contain any variables, for example I could have
base table      source table
==========      ============
id  val         id  val
----------      ------------
0   1           0   3
1   2           1   2
2   3           3   4
or
base table           source table
===================  ==================
key  val1  val2      key  val1  val2
-------------------  ------------------
A    1     0         A    1     1
B    2     1         C    2     2
C    3     3         E    4     0
or any two tables containing similar columns with similar names. I'd like to be able to:
- check that the two tables have matching columns: the source table has exactly the same columns as the base table, and the datatypes match
- make a diff from the base table to the source table
- do the necessary updates, deletes and inserts to change the data in the base table to correspond to the source table
- optionally limit the diff to a subset of the base table
preferably with a stored procedure. Has anyone written a stored proc for this, or could you point me to a source?
SQL Server 2008 features the new MERGE statement. It's very flexible, if a bit complex to write out.
As an example, the following query would synchronize the #base and #source tables, limited to a subset of #base where id <> 2. Note that the ON clause matches on both id and val, so a changed row is handled as a delete plus an insert rather than an update, and that MERGE must be terminated with a semicolon:
MERGE #base AS tgt
USING #source AS src
    ON tgt.id = src.id AND tgt.val = src.val
WHEN NOT MATCHED BY TARGET
    THEN INSERT (id, val) VALUES (src.id, src.val)
WHEN NOT MATCHED BY SOURCE AND tgt.id <> 2
    THEN DELETE;
Interesting question.
You could start from EXCEPT / INTERSECT:
http://msdn.microsoft.com/en-us/library/ms188055.aspx
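As a sketch of that starting point, running EXCEPT in both directions yields a full diff (assuming the #base/#source temp tables from the question):

```sql
-- Rows in #source that are missing or different in #base
-- (candidates for INSERT or UPDATE):
SELECT id, val FROM #source
EXCEPT
SELECT id, val FROM #base;

-- Rows in #base that are missing or different in #source
-- (candidates for DELETE or UPDATE):
SELECT id, val FROM #base
EXCEPT
SELECT id, val FROM #source;
```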
Here is a ready-made solution that may help you:
http://www.sqlservercentral.com/scripts/Miscellaneous/30596/
Not sure if it's of any use to your specific situation, but this kind of operation is usually, and relatively easily, done using external tools (SQL Workbench diff, SQL Compare, etc.).
It can even be scripted, though probably not invoked from a T-SQL procedure.