Attempt at database localization using table-valued functions

Attempt at database localization using table-valued functions - sql

I'm looking for opinions on the following localization technique:
We start with 2 tables:
tblProducts : ProductID, Name,Description,SomeAttribute
tblProductsLocalization : ProductID,Language,Name,Description
and a table-valued function:
CREATE FUNCTION [dbo].[LocalizedProducts](#locale nvarchar(50))
RETURNS TABLE
AS (SELECT a.ProductID,COALESCE(b.Name,a.Name)as [Name],COALESCE(b.Description,a.Description)as [Description],a.SomeAttribute
from tblProducts a
left outer join tblProductsLocalization_Locale b
on a.ProductID= b.ProductID and b.[Language]=#locale)
What I plan to do is include the the function whenever i need localized-data returned:
select * from LocalizedProducts('en-US') where ID=1
instead of
select * from tblProducts where ID=1
I'm interested if there are major performance concerns arround this or any showstoppers. Any reasons I shouldn't adopt this?
Edit: I've tagged this SQL2005 , altough I develop this using 2008, I think the deployment target only has SQL2005. I could upgrade to 2008 if the need arises though.
Later edit:
I have created a view, with identical content, but without the parameter:
CREATE VIEW [dbo].[LocalizedProductsView]
AS
SELECT b.Language,a.ProductID,COALESCE(b.Name,a.Name)as [Name],
COALESCE(b.Description,a.Description)as [Description],a.SomeAttributefrom tblProducts a
left outer join tblProductsLocalization_Locale b on a.ProductID= b.ProductID
I then proceeded to run some tests:
Estimated execution plan looks identical to both queries:
select * from LocalizedProducts('us-US') where SomeNonIndexedParameter=2
select * from LocalizedProductsView where (Language='us-US' or Language is null) and SomeNonIndexedPramaters=2
Final Question that arrises is: Should I understand that the TVF is computing the translations on ALL the products, regardless of the WHERE parameters? is the View doing the same thing ?

Short answer: As a general rule, there is nothing wrong with using a TVF for this sort of thing, but I would suggest making the ID be a parameter, also:
CREATE FUNCTION [dbo].[LocalizedProducts](#ID int, #locale nvarchar(50))
RETURNS TABLE
AS (SELECT a.ProductID,COALESCE(b.Name,a.Name)as [Name],COALESCE(b.Description,a.Description)as [Description],a.SomeAttribute
from tblProducts a
left outer join tblProductsLocalization _Locale b
on a.ProductID= b.ProductID and b.[Language]=#locale)
where a.ProductId = #ID
Used like so:
select * from LocalizedProducts(1, 'en-US')
Longer explanation:
I've never tried something like this in SQL 2008 yet, so it's possible that SQL Server can optimized this issue away.
My experience in earlier versions, though, seems to suggest that SQL Server tends to handle
User-Defined Functions in a more procedural than declarative fashion, so it doesn't interpret what you want and then figure out the best way to get you what you want, but actually performs in order the instructions you've written. So it appears to me that this method would:
select all English-language text, placing it into a table variable.
take the results of step #1 and select any records with the given ID.
This would mean a lot of wasted cycles, putting mostly-unused English text into the table variable, before applying the ID filter to that result set. On the other hand, putting all of the filters into the UDF would let SQL Server determine whether it's easiest to filter by ID first (more likely, assuming a standard indexing scheme), and then apply the locale filter, or vice versa. Either way, you should be having less data being moved around in the background, and thus have better performance, if you put all your filters in one spot. Again, this all assumes that SQL Server is not now making giant leaps in optimization. But if so, that's even more reason to say, yes, there is no problem using the TVF.

It's a safe bet that you'll have to translate more than product names. So I'd design the translation solution to handle any kind of string.
For example, you could have a localization table like:
Id, TranslatableStringId, Language, Translation
Then each product could have a translatable string associated with it. But also the explanatory text on top of the product list.
For products, you'd query like:
SELECT *
FROM Products p
INNER JOIN Translations t
ON p.DescriptionId = t.TranslatableStringId
AND t.language = 'en-US'
For an explanatory text, you'd get a simple:
SELECT t.Translation
FROM Translations t
WHERE t.TranslatableStringId = 123 -- ID of string
AND t.language = 'en-US'
P.S. For a real program, I'd use a more shorthand description than TranslatableStringId, like tsid, because translations tend to pop up everywhere.

I wanted to come back with an answer to this after doing a lot more testing.
It appears to me that SQL2008 is actually looking inside the TVF when performing the query plan and optimizing accordingly:
For instance:
select pr.* from LocalizedProducts('en-US') pr inner join LocalizedPhotos('en-US') ph on
ph.ProductId=pr.Id where pr.SomeUnindexProperty= 5
This query needs to touch 4 tables:
Products
Products_Localization
Photos
Photos_Localization
The way the query plan looks is that (let me see if I can format this):
Product gets a Clustered Index Seek
-- >> Products gets nested loop with Photos
-->> nested loop Products_Localization -
->> nested loop Photos_Localization.
Which is not what you would expect if the TVF would be a black box. The simple fact that Product gets an index SEEK would suggest to me that the query will not interpret blindly the entire TVF.
I ran a lot of performance tests, and on average the "localization" TVF are between 50% - 100% slower than using direct table-queries, but that would be expected as twice as many tables are involved in the TVFs than in the normal queries.

Related

How can this change be making my query slow (OR vs UNION) and can I fix it?

I've just been debugging a slow SQL query.
It's a join between 2 tables, with a WHERE clause conditioning on either a property of 1 table OR the other.
If I re-write it as a UNION then it's suddenly 2 orders of magnitude faster, even though those 2 queries produce identical outputs:
DECLARE #UserId UNIQUEIDENTIFIER = '0019813D-4379-400D-9423-56E1B98002CB'
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (#UserId) OR Bookings.MixedDealBroker in (#UserId))
--Execution time: ~4000ms
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (#UserId))
UNION
SELECT *
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (Bookings.MixedDealBroker in (#UserId))
--Execution time: ~70ms
This seems rather surprising to me! I would have expected the SQL compiler to be entirely capable of identifying that the 2nd form was equivalent and would have used that compilation approach if it were available.
Some context notes:
I've checked and IN (#UserId) vs = #UserId makes no difference.
Nor does JOIN vs LEFT JOIN.
Those tables each have 100,000s records, and the filter cuts it down to ~100.
In the slow version it seems to be reading every row of both tables.
So:
Does anyone have any ideas for how this comes about.
What (if anything) can I do to fix the performance without just re-writing the query as a series of UNIONs (not viable for a variety of reasons.)
=-=-=-=-=-=-=
Execution Plans:

This is a common limitation of SQL engines, not just in SQL Server, but also other database systems as well. The OR complicates the predicate enough that the execution plan selected isn't always ideal. This probably relates to the fact that only one index can be seeked into per instance of a table object at a time (for the most part), or in your specific case, your OR predicate is across two different tables, and other factors with how SQL engines are designed.
By using a UNION clause, you now have two instances of the Bookings table referenced, which can individually be seeked on separately in the most efficient way possible. That allows the SQL Engine to pick a better execution plan to serve you query.
This is pretty much just one of those things that are the way they are because that's just the way it is, and you need to remember the UNION clause workaround for future encounters of this kind of performance issue.
Also, in response to your comment:
I don't understand how the difference can affect the EP, given that the 2 different "phrasings" of the query are identical?
A new execution plan is generated every time one doesn't exist in the plan cache for a given query, essentially. The way the Engine determines if a plan for a query is already cached is based on the exact hashing of that query statement, so even an extra space character at the end of the query can result in a new plan being generated. Theoretically that plan can be different. So a different written query (despite being logically the same) can surely result in a different execution plan.
There are other reasons a plan can change on re-generation too, such as different data and statistics of that data, in the tables referenced in the query between executions. But these reasons don't really apply to your question above.

As already stated, the OR condition prevents the database engine from efficiently using the indexes in a single query. Because the OR condition spans tables, I doubt that the Tuning Advisor will come up with anything useful.
If you have a case where the query you have posted is part of a larger query, or the results are complex and you do not want to repeat code, you can wrap your initial query in a Common Table Expression (CTE) or a subquery and then feed the combined results into the remainder of your query. Sometimes just selecting one or more PKs in your initial query will be sufficient.
Something like:
SELECT <complex select list>
FROM (
SELECT Bookings.ID AS BookingsID, BookingPricings.ID AS BookingPricingsID
FROM Bookings
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (BookingPricings.[Owner] in (#UserId))
UNION
SELECT Bookings.ID AS BookingsID, BookingPricings.ID AS BookingPricingsID
FROM Bookings B
LEFT JOIN BookingPricings ON Booking = Bookings.ID
WHERE (Bookings.MixedDealBroker in (#UserId))
) PRE
JOIN Bookings B ON B.ID = PRE.BookingsID
JOIN BookingPricings BP ON BP.ID = PRE.BookingPricingsID
<more joins>
WHERE <more conditions>
Having just the IDs in your initial select make the UNION more efficient. The UNION can also be changed to a yet more-efficient UNION ALL with careful use of additional conditions, such as AND Bookings.MixedDealBroker <> #UserId in the second part, to avoid overlapping results.

SQL Server stored procedure takes 1' 18" to run... seems long

Sure could use some optimization help here. I've got a stored procedure which takes approximately 1 minute, 18 seconds to run and it gets even worse when I run the asp.net page which hits it.
Some stats:
tbl_Allocation typically has approximately 55K records
CS_Ready has ~300
Redate_Orders has ~2000
Here is the code:
ALTER PROCEDURE [dbo].[sp_Order_Display]
/*
(
#parameter1 int = 5,
#parameter2 datatype OUTPUT
)
*/
AS
/* SET NOCOUNT ON */
BEGIN
WTIH CS_Ready AS
(
SELECT
tbl_Order_Notes.txt_Order_Only As CS_Ready_Order
FROM
tbl_Order_Notes
INNER JOIN
tbl_Order_Notes_by_line ON tbl_Order_Notes.txt_Order_Only = SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1, CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
WHERE
(tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL)
),
Redate_Orders AS
(
SELECT DISTINCT
SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
FROM
tbl_Order_Notes_by_line
WHERE
(bin_Redate_Request_by_line = 'True')
)
SELECT DISTINCT
tbl_Allocation.*, tbl_Order_Notes.*,
tbl_Order_Notes_by_line.*,
tbl_Max_Promised_Date_1.Max_Promised_Ship,
tbl_Max_Promised_Date_1.Max_Scheduled_Pick,
Redate_Orders.Redate_Order_Number, CS_Ready.CS_Ready_Order,
tbl_Most_Recent_Comments.Abbr_Comment,
MRC_Line.Abbr_Comment as Abbr_Comment_Line
FROM
tbl_Allocation
INNER JOIN
tbl_Max_Promised_Date AS tbl_Max_Promised_Date_1 ON tbl_Allocation.num_Order_Num = tbl_Max_Promised_Date_1.num_Order_Num
LEFT OUTER JOIN
CS_Ready ON tbl_Allocation.num_Order_Num = CS_Ready.CS_Ready_Order
LEFT OUTER JOIN
Redate_Orders ON tbl_Allocation.num_Order_Num = Redate_Orders.Redate_Order_Number
LEFT OUTER JOIN
tbl_Order_Notes ON Hidden_Order_Only = tbl_Order_Notes.txt_Order_Only
LEFT OUTER JOIN
tbl_Order_Notes_by_line ON Hidden_Order_Key = tbl_Order_Notes_by_line.txt_Order_Key_by_line
LEFT OUTER JOIN
tbl_Most_Recent_Comments ON Cast(tbl_Allocation.Hidden_Order_Only as varchar) = tbl_Most_Recent_Comments.Com_ID_Parent_Key
LEFT OUTER JOIN
tbl_Most_Recent_Comments as MRC_Line ON Cast(tbl_Allocation.Hidden_Order_Key as varchar) = MRC_Line.Com_ID_Parent_Key
ORDER BY
num_Order_Num, num_Line_Num
End
RETURN
What suggestions do you have to make this execute within five seconds or less?
Thanks,
Rob

Assuming you have appropriate indices defined, you still have several things that suggest problems.
1) You have 2 select distinct clauses in this query -- in a good design, distinct clauses are are rarely needed
2) The first inner join uses
tbl_Order_Notes_by_line
ON tbl_Order_Notes.txt_Order_Only
= SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1,
CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
This looks like a horrible join criteria -- function calls during the join that prevent any decent query optimization. My guess is that your are using data the has internal meaning and that you are parsing the internal meaning during the join, e.g.,
PartNumber = AAA-BBBB_NNNNNNNN
where AAA is the Country product line and BBBB is the year & month of the design
If you must have coded fields like these AND you need to manipulate them, put the codes into separate database fields and created a computer column -- or even a plan copy of the full part number field if the combined field is unusually complex.
This point is not a performance issue, but you have a long sub-query using multiple AND & OR clauses. I know the rules for operator precedence, you may know the rules for operator precedence, but will the next guy? Will you remember them an 1:00 when stuff is broken.
ADDED
You are using 2 common table expressions. I know others say it does not happen, but I don't really trust the query optimizer for CTE's -- I have had to recode CTE based joins for performance issues on several occasions -- creating an actual view equivalent to the CTE and using that instead can be a significant speedup. May well depend on the version of SQL server, but if you are running an older version I would definitely wonder about CTR optimization. -- This is not as important as the first 2 things I've mentioned, try to fix those first.
ADDED
I'm going to harsh on CTEs again, as I did not really explain why they are bad for performance, and it was bothering me. If you don't have performance issues, and you like the syntax, they can be useful in at least limited usage, personally I don't normally recommend them for anything more than that -- and given that it is MS specific syntactical sugar, I really can't recommend them much at all.
I think the primary reason that CTEs don't get optimized well is that there are no statistics for the opimizer to use. If you are pulling a lot of rows into a CTE, you are probably better off creating #temptable and populating it. You can even add an index or two to your #temptable and the optimizer can figure out how to use them too. A #temp table is similar, but at least through sql 2012, the were no faster than #temp that I could tell -- supposedly new goodness in server 2014 help this.
A CTE is really just a temporary view in disguise, which I why I suggested you can replace with a real view to better better performance (and you often can), or you can populate a temp table and sometime get even better performance.

Avoiding a problematic nested query in MySQL

I have this SQL query which due to my own lack of knowledge and problem with mysql handling nested queries, is really slow to process. The query is...
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE Printers.PrinterGroup
IN (
SELECT DISTINCT Printers.PrinterGroup
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE PrintJobs.UserName='<username/>'
);
I would like to avoid splitting this into two queries and inserting the values of the subquery into the main query progamatically.

This is probably not exactly what you are looking for however, i will contribute my 2 cents. First off you should show us your schema and exactly what you are trying to accomplish with that query. However from the looks of it you are not using numeric IDs in the table and are instead using varchar fields to join tables, this is not really a good idea performance wise. Also i am not sure why you are doing:
(select PrinterName, UserName
from PrintJobs) AS Table1
instead of just joining on PrintJobs? Similar stuff for this one:
(select
PrinterName,
PrinterGroup
from Printers) as Table1
Maybe i am just not seeing it right. I would recommend that you simplify the query as much as possible and try it. Also tell us what exactly you are hoping to accomplish with the query and give us some schema to work with.
Removed the bad query from the answer.

This query you have is pretty messed up, not sure if this will handle everything you need but simplifying like this kills all the nested queries and it way faster. You can also use the EXPLAIN command to know how mysql will fetch your query.
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers ON PrintJobs.PrinterName = Printers.PrinterName
AND Printers.Username = '<username/>'
;

Speeding up a SQL query with generic data information

Due to a variety of design decisions, we have a table, 'CustomerVariable'. CustomerVariable has three bits of information--its own id, an id to Variable (a list of possible settings the customer can have), and the value for that variable. The Variable table, on the other hand, has the information on a default--in case the CustomerVariable is not set.
This works well in most situations, allowing us not to have to create an insanely long list of information--especially in a case where there are 16 similar variables that need to be handled for a customer.
The problem comes in trying to get this information into a select. So far, our 'best' solution involves far too many joins to be efficient--we get a list of the 16 VariableIds we need information on, setting them into variables, first. Later on, however, we have to do this:
CROSS JOIN dbo.Variable v01
LEFT JOIN dbo.CustomerVariable cv01 ON cv01.customerId = c.id
AND cv01.variableId = v01.id
CROSS JOIN dbo.Variable v02
LEFT JOIN dbo.CustomerVariable cv02 ON cv02.customerId = c.id
AND cv02.variableId = v02.id
-- snip --
CROSS JOIN dbo.Variable v16
LEFT JOIN dbo.CustomerVariable cv16 ON cv16.customerId = c.id
AND cv16.variableId = v16.id
WHERE
v01.id = #cv01VariableId
v02.id = #cv02VariableId
-- snip --
v16.id = #cv16VariableId
I know there has to be a better way, but we can't seem to find it amidst crunch time. Any help would be greatly appreciated.

If your data set is relatively small and not too volatile, you may want to use materialized views (assuming your database supports them) to optimize the lookup.
If materialized views are not an option, consider writing a stored procedure that retrieves that data in two passes:
First retrieve all of the CustomerVariables available for a particular customer (or set of customers)
Next, retrieve all of the default values from the Variables table
Perform a non-distinct union on the results merging the defaults in wherever a CustomerVariable record is missing.
Essentially, this is the equivalent of:
SELECT variableId,
CASE WHEN CV.variableId = NULL THEN VR.defaultValue ELSE CV.value END
FROM Variable VR
LEFT JOIN CUstomerVariable CV on CV.variableId = VR.variableId
WHERE CV.customerId = c.id

The type of query you want is called a pivot table or crosstab query.
By far the easiest way of dealing with this is to create a view based off of a crosstab query. This will flip the columns from being vertical to being horizontal like a regular sql table. Once that is done, just query the view. Easy. ;)

SELECT with ORs including table joins

I've got a database with three tables: Books (with book details, PK is CopyID), Keywords (list of keywords, PK is ID) and KeywordsLink which is the many-many link table between Books and Keywords with the fields ID, BookID and KeywordID.
I'm trying to make an advanced search form in my app where you can search on various criteria. At the moment I have it working with Title, Author and Publisher (all from the Book table). It produces SQL like:
SELECT * FROM Books WHERE Title Like '%Software%' OR Author LIKE '%Spolsky%';
I want to extend this search to also search using tags - basically to add another OR clause to search the tags. I've tried to do this by doing the following
SELECT *
FROM Books, Keywords, Keywordslink
WHERE Title LIKE '%Joel%'
OR (Name LIKE '%good%' AND BookID=Books.CopyID AND KeywordID=Keywords.ID)
I thought using the brackets might separate the 2nd part into its own kinda clause, so the join was only evaluated in that part - but it doesn't seem to be so. All it gives me is a long list of multiple copies of the one book that satisfies the Title LIKE '%Joel%' bit.
Is there a way of doing this using pure SQL, or would I have to use two SQL statements and combine them in my app (removing duplicates in the process).
I'm using MySQL at the moment if that matters, but the app uses ODBC and I'm hoping to make it DB agnostic (might even use SQLite eventually or have it so the user can choose what DB to use).

You need to join the 3 tables together, which gives you a tablular resultset. You can then check any columns you like, and make sure you get distinct results (i.e. no duplicates).
Like this:
select distinct b.*
from books b
left join keywordslink kl on kl.bookid = b.bookid
left join keywords k on kl.keywordid = k.keywordid
where b.title like '%assd%'
or k.keyword like '%asdsad%'
You should also try to avoid starting your LIKE values with a percent sign (%), as this means SQL Server can't use an index on that column and has to perform a full (and slow) table scan. This starts to make your query into a "starts with" query.
Maybe consider the full-text search options in SQL Server, also.

What you've done here is made a cartesian result set by having the tables joined with the commas but not having any join criteria. Switch your statements to use outer join statements and that should allow you to reference the keywords. I don't know your schema, but maybe something like this would work:
SELECT
*
FROM
Books
LEFT OUTER JOIN KeywordsLink ON KeywordsLink.BookID = Books.CopyID
LEFT OUTER JOIN Keywords ON Keywords.ID = KeywordsLink.KeywordID
WHERE Books.Title LIKE '%JOEL%'
OR Keywords.Name LIKE '%GOOD%'

Use UNION.
(SELECT Books.* FROM <first kind of search>)
UNION
(SELECT Books.* FROM <second kind of search>)
The point is that you could write two (or more) simple and efficient queries instead of one complicated query that tries to do everything at once.
If number of resulting rows is low, then UNION will have very little overhead (and you can use faster UNION ALL if you don't have duplicates or don't care about them).

SELECT * FROM books WHERE title LIKE'%Joel%' OR bookid IN
(SELECT bookid FROM keywordslink WHERE keywordid IN
(SELECT id FROM keywords WHERE name LIKE '%good%'))
Beware that older versions of MySQL didn't like subselects. I think they've fixed that.

You must also limit the product of the join by specifying something like
Books.FK1 = Keywords.FK1 and
Books.FK2 = Keywordslink.FK2 and
Keywords.FK3 = Keywordslink.FK3
But i don't know your exact data model so your solution may be slightly different.

I'm not aware of any way to accomplish a "conditional join" in SQL. I think you'll be best served with executing the two statements separately and combining them in the application. This approach is also more likely to stay DB-agnostic.

It looks like Neil Barnwell has covered the answer that I would have given, but I'll add one thing...
Books can have more than one author. If your data model is really designed as your query implies you might want to consider changing it to accommodate that fact.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas