Not able to create index on schema binding view - sql-server-2012

Not able to create an index on the schema-bound view below. It is created from another view (v_prod_manu_sub). It is showing the below error message:
Cannot create index on view "dbo.V_PROD_MANU" because it references derived table "X" (defined by SELECT statement in FROM clause). Consider removing the reference to the derived table or not indexing the view.
How should I change the query below so the index can be created?
ALTER VIEW [dbo].[V_PROD_MANU] WITH SCHEMABINDING AS
SELECT X.PRODUCT,
       CAST(RIGHT(X.TEXT_CODE, LEN(X.TEXT_CODE) - 1) AS VARCHAR(30)) AS TEXT_CODE,
       CAST(SUBSTRING(RIGHT(X.PHRASE, LEN(X.PHRASE) - 1), 9, LEN(X.PHRASE) - 3) AS VARCHAR(700)) AS PHRASE
FROM (
    SELECT V1.PRODUCT,
           (SELECT ',' + V2.TEXT_CODE FROM dbo.V_PROD_MANU_SUB V2
            WHERE V1.PRODUCT = V2.PRODUCT ORDER BY V2.F_COUNTER FOR XML PATH('')) AS TEXT_CODE,
           (SELECT ' |par|par ' + V3.F_PHRASE FROM dbo.V_PROD_MANU_SUB V3
            WHERE V1.PRODUCT = V3.PRODUCT ORDER BY V3.F_COUNTER FOR XML PATH('')) AS PHRASE
    FROM dbo.V_PROD_MANU_SUB V1
    GROUP BY V1.PRODUCT
) X
OUTPUT:
Product  TEXT_CODE          PHRASE
00-021   MANU0043,MANU0050  Inc |par Pharmaceuticals Group |par 235 East 5nd Street |par usa |par 1-800-123-000

Typically people use STUFF() to remove a leading comma, instead of these messy converts and LEN() calculations. For example:
SELECT V1.PRODUCT,
       TEXT_CODE = STUFF(
           (SELECT ',' + V2.TEXT_CODE
            FROM dbo.V_PROD_MANU_SUB AS V2
            WHERE V1.PRODUCT = V2.PRODUCT
            ORDER BY V2.F_COUNTER
            FOR XML PATH(''), TYPE).value('./text()[1]', 'nvarchar(max)'),
           1, 1, N'')
FROM dbo.V_PROD_MANU_SUB AS V1
GROUP BY V1.PRODUCT;
-- much easier in SQL Server 2017 with STRING_AGG()
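On 2017+, a minimal sketch of that STRING_AGG version, assuming the same V_PROD_MANU_SUB columns as above:
SELECT PRODUCT,
       -- WITHIN GROUP replaces the ORDER BY inside FOR XML PATH; no leading comma to strip
       STRING_AGG(TEXT_CODE, ',') WITHIN GROUP (ORDER BY F_COUNTER) AS TEXT_CODE
FROM dbo.V_PROD_MANU_SUB
GROUP BY PRODUCT;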
But that doesn't seem to have anything to do with why you need to materialize the comma-separated list in the first place, whether it has a leading comma or not.
Indexed views are often a form of premature optimization. Essentially you're saying, "the cost of querying this data will be far greater than the cost of maintaining it." Do you know that? How? What is your workload balance (read:write)? How slow is the query now? How often does it run? How long do updates take?
If you do know that, you will have better luck materializing it to your own table, manually, through a trigger. An indexed view is quite likely going to be a dead end for a variety of reasons.
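For what it's worth, here is a rough sketch of that trigger-maintained table. Everything here except the V_PROD_MANU_SUB column names is hypothetical (the base table dbo.PROD_MANU_SUB_BASE, the cache table), it only covers TEXT_CODE, and PHRASE would need the same treatment:
CREATE TABLE dbo.ProdManuCache
(
    PRODUCT   VARCHAR(30)  NOT NULL PRIMARY KEY,
    TEXT_CODE VARCHAR(MAX) NULL
);
GO
CREATE TRIGGER dbo.trg_SyncProdManu
ON dbo.PROD_MANU_SUB_BASE   -- hypothetical base table behind V_PROD_MANU_SUB
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Recompute only the products touched by this statement
    DELETE c
    FROM dbo.ProdManuCache AS c
    WHERE c.PRODUCT IN (SELECT PRODUCT FROM inserted
                        UNION SELECT PRODUCT FROM deleted);

    INSERT dbo.ProdManuCache (PRODUCT, TEXT_CODE)
    SELECT s.PRODUCT,
           STUFF((SELECT ',' + s2.TEXT_CODE
                  FROM dbo.PROD_MANU_SUB_BASE AS s2
                  WHERE s2.PRODUCT = s.PRODUCT
                  ORDER BY s2.F_COUNTER
                  FOR XML PATH('')), 1, 1, '')
    FROM dbo.PROD_MANU_SUB_BASE AS s
    WHERE s.PRODUCT IN (SELECT PRODUCT FROM inserted
                        UNION SELECT PRODUCT FROM deleted)
    GROUP BY s.PRODUCT;
END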

Related

How to generate a hyperlink tag to each string in column, to create a tag cloud

I have a table in SQL Server that stores info from a blog, so each record is a post. One of the columns stores the tags related to each post.
I need to create an anchor tag for each tag stored in that column; they are separated with a comma ','.
For example: for record 1, the column "tags" stores "cars,planes,boats".
I need to generate a column upon SELECT that will contain this:
<a href="blog-tags.aspx?tag=cars">cars</a><a href="blog-tags.aspx?tag=planes">planes</a><a href="blog-tags.aspx?tag=boats">boats</a>
Any help would be highly appreciated! Thanks
If there's some kind of coding layer between the data and the user, then it's probably best to do this in the coding layer.
If you really must do it in SQL, you could first split the column into a separate temporary table, then join with the temporary table (and select accordingly):
INSERT INTO #tempTable (primaryKey, data)
SELECT yt.primaryKey, s.data
FROM YourTable yt
CROSS APPLY dbo.Split(yt.tags, ',') s
This relies on a split function; there are plenty of example implementations around (none of them very fast... but it will suffice).
Then....
SELECT yt.*,
       '<a href="blog-tags.aspx?tag=' + t.data + '">' + t.data + '</a>' AS Link
FROM YourTable yt
INNER JOIN #tempTable t ON yt.primaryKey = t.primaryKey
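On SQL Server 2016 or later you could skip both the temp table and the custom UDF with the built-in STRING_SPLIT; a sketch, still using the hypothetical YourTable/tags names from above:
SELECT yt.*,
       '<a href="blog-tags.aspx?tag=' + s.[value] + '">' + s.[value] + '</a>' AS Link
FROM YourTable yt
CROSS APPLY STRING_SPLIT(yt.tags, ',') s;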

SQL Server : GROUP CONCAT with DISTINCT is sorting natural data input

I have a similar situation. I start out with a table that has comma-delimited data input into a column from another source. I need to manipulate the data to remove a section at the end of each value, so I split the data and trim the ends with the code below. (I added the ID column later to be able to sort. I also added WITH SCHEMABINDING later to add an XML index, but nothing works; I can remove this and the ID column, but I do not see any difference one way or the other):
ALTER VIEW [dbo].[vw_Routing]
WITH SCHEMABINDING
AS
SELECT TOP 99.9999 PERCENT
       ROW_NUMBER() OVER (ORDER BY CableID) - 1 AS ID,
       CableID AS [CableID],
       SUBSTRING(m.n.value('.[1]', 'varchar(8000)'), 1, 13) AS Routing
FROM
    (SELECT CableID,
            CAST('<XMLRoot><RowData>' + REPLACE([RouteNodeList], ',', '</RowData><RowData>')
                 + '</RowData></XMLRoot>' AS xml) AS x
     FROM [dbo].[Cables]) t
CROSS APPLY x.nodes('/XMLRoot/RowData') m (n)
ORDER BY ID
Now I need to concatenate the data from the Routing column's rows into one row, grouped by another column. I have the code working except that it is reordering my data; I must keep the data in the order it was input into the table, as it is cable routing information. I must also remove duplicates. I use the following code: the SELECT DISTINCT removes the duplicates but reorders the data, while the SELECT without DISTINCT keeps the correct data order but does NOT remove the duplicates:
SUBSTRING(
    (
        SELECT DISTINCT ',' + x3.Routing AS [text()]  -- this DISTINCT reorders the routes once concatenated
        --SELECT ',' + x3.Routing AS [text()]         -- without DISTINCT the order is kept but duplicates remain
        FROM vw_Routing x3
        WHERE x3.CableID = c.CableId
        FOR XML PATH ('')
    ), 2, 1000) [Routing],
I tried the code you gave above and it provided the same results with the DISTINCT reordering the data but without DISTINCT not removing the duplicates.
Perhaps GROUP BY with ORDER BY will work:
stuff((select ',' + x3.Routing as [text()]
       from vw_Routing x3
       where x3.CableID = c.CableId
       group by x3.Routing      -- GROUP BY removes duplicates without DISTINCT
       order by min(x3.id)      -- ordering by the first occurrence preserves input order
       for xml path ('')
      ), 1, 1, '') as [Routing],
I also replaced the SUBSTRING() with STUFF(). The latter is more standard for this operation.
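A quick illustration of the two approaches on a made-up value:
SELECT STUFF(',A,B,C', 1, 1, '') AS with_stuff,        -- 'A,B,C' (deletes exactly one character at position 1)
       SUBSTRING(',A,B,C', 2, 1000) AS with_substring; -- 'A,B,C' (only safe while the string is under 1000 characters)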
To @Gordon Linoff (https://stackoverflow.com/users/1144035/gordon-linoff): unfortunately, that did not work. It gave me the same result as my SELECT statement; that is, no duplicates but reordered data.
HOWEVER, I found the correct answer earlier today:
I figured it out finally! I still have to implement it within the other code and add the new Cable Area code, but the hard part is over!
I am going to post the following to the forums so that they know not to keep working on it. I was writing this up to send to a friend for his help, but I figured it out myself before I sent it.
I started with raw, comma separated data in the records of a table … the data is from another source. I had to remove some of the information from each value, so I used the following code to split it up and manipulate it:
Code1
Once that was done, I had to put the manipulated data back into the same form, in the same order, and with no duplicates, so I needed a SELECT DISTINCT. When I used the commented-out SELECT DISTINCT below, it removed duplicates but changed the order of the data, which I could not have, as it is cable tray routing data. When I took out the SELECT DISTINCT, it kept the correct order but left duplicates.
Because I was using XML PATH, I had to change this code (Code2) to the GROUP BY / ORDER BY MIN() form (Code3) so that I could remove the duplicates without SELECT DISTINCT.


SQL mass string manipulation

I'm working with an Oracle DB and need to manipulate a string column within it. The column contains multiple email addresses in this format:
jgooooll@gmail.com;dhookep@gmail.com;amoore@outlook.com
What I want to do is take out anything that does not end in '@gmail.com' (in this example amoore@outlook.com would be removed). However, amoore@outlook.com may be the first email in the next row of the column, so there is no real fixed format; the only structure is that each address is separated by a semi-colon.
Is there any way of implementing this in one command that runs through every row in the column and removes anything that's not @gmail.com? I'm not really sure if this kind of processing is possible in SQL. Just looking for your thoughts!
Thanks a lot you guys. Look forward to hearing from you!
Applicable to Oracle 11g (11.2) onward only, because the listagg function is supported only from 11.2. If you are on 10.1 up to 11.1, you can write your own string aggregate function instead.
with t1 as (
  select 1 id, 'jhd@jk.com;jgooooll@gmail.com;dhookep@gmail.com;amoore@outlook.com' emails from dual union all
  select 2 id, 'jhd@jk.com;jgooooll@gmail.com;dhookep@gmail.com;amoore@outlook.com' emails from dual
)
select id
     , listagg(email, ';') within group (order by id) emails
  from (select id
             , regexp_substr(emails, '[^;]+', 1, rn) email
          from t1
         cross join (select rownum rn
                       from (select max(regexp_count(emails, '[^;]+')) ml
                               from t1
                            )
                    connect by level <= ml
                    )
       )
 where email like '%@gmail.com%'
 group by id
Id  Emails
--  ------------------------------------
1   dhookep@gmail.com;jgooooll@gmail.com
2   dhookep@gmail.com;jgooooll@gmail.com
Here is a Demo
This answer is actually for SQL Server, as that is what I know. That being said, perhaps having an example of how to do it in one system will give you an idea of how to do it in yours. Or maybe there is a way to convert the code into the same type of thing in Oracle.
First, the thought process: In SQL Server combining the FOR XML PATH and STUFF functionality allows you to make a comma separated list. I'm adding a WHERE Split.SplitValue LIKE ... clause into this to filter it to only gmail addresses. I'm cross applying this whole thing to the main table, and that turns it into a filtered email list. You could then further filter the main table to run this on a more targeted set of rows.
Second, the SQL Server implementation:
SELECT
    *
FROM #Table Base
CROSS APPLY
(
    SELECT
        STUFF(
            (SELECT
                 ';' + Split.SplitValue AS [text()]
             FROM dbo.fUtility_Split(Base.Emails, ';') Split
             WHERE Split.SplitValue LIKE '%@gmail.com'
             FOR XML PATH (''))
        , 1, 1, '') Emails
) FilteredEmails
EDIT: I forgot to mention that this answer requires you have some sort of function to split a string column based on a separator value. If you don't have that already, then google for it. There are tons of examples.
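For completeness, here is a minimal sketch of such a splitter, matching the fUtility_Split name and SplitValue column assumed in the query above; it reuses the REPLACE-to-XML trick from the vw_Routing view earlier (on 2016+, the built-in STRING_SPLIT is the easier option):
CREATE FUNCTION dbo.fUtility_Split (@s VARCHAR(MAX), @sep CHAR(1))
RETURNS TABLE
AS
RETURN
    -- Caveat: fails if the input contains XML special characters (& < >)
    SELECT SplitValue = x.n.value('.', 'varchar(8000)')
    FROM (SELECT CAST('<r>' + REPLACE(@s, @sep, '</r><r>') + '</r>' AS xml) AS doc) t
    CROSS APPLY t.doc.nodes('/r') x(n);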

Is there any way of improving the performance of this SQL Function?

I have a table which looks something like
Event ID  Date      Instructor
1         1/1/2000  Person 1
1         1/1/2000  Person 2
Now what I want to do is return this data so that each event is on one row and the Instructors are all in one column split with a <br> tag like 'Person 1 <br> Person 2'
Currently the way I have done this is to use a function
CREATE FUNCTION fnReturnInstructorNamesAsHTML
(
    @EventID INT
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Result VARCHAR(MAX)

    SELECT
        @Result = COALESCE(@Result + '<br>', '') + inst.InstructorName
    FROM
        [OpsInstructorEventsView] inst
    WHERE
        inst.EventID = @EventID

    RETURN @Result
END
Then my main stored procedure calls it like
SELECT
ev.[BGcolour],
ev.[Event] AS name,
ev.[eventid] AS ID,
ev.[eventstart],
ev.[CourseType],
ev.[Type],
ev.[OtherType],
ev.[OtherTypeDesc],
ev.[eventend],
ev.[CourseNo],
ev.[Confirmed],
ev.[Cancelled],
ev.[DeviceID] AS resource_id,
ev.Crew,
ev.CompanyName ,
ev.Notes,
dbo.fnReturnInstructorNamesAsHTML(ev.EventID) as Names
FROM
[OpsSimEventsView] ev
JOIN
[OpsInstructorEventsView] inst
ON
ev.EventID = inst.EventID
This is very slow; I'm looking at 4 seconds per call to the DB. Is there a way for me to improve the performance of the function? It's a fairly small function, so I'm not sure what I can do here, and I couldn't see a way to work the COALESCE into the SELECT of the main procedure.
Any help would be really appreciated, thanks.
You could try something like this.
SELECT
ev.[BGcolour],
ev.[Event] AS name,
ev.[eventid] AS ID,
ev.[eventstart],
ev.[CourseType],
ev.[Type],
ev.[OtherType],
ev.[OtherTypeDesc],
ev.[eventend],
ev.[CourseNo],
ev.[Confirmed],
ev.[Cancelled],
ev.[DeviceID] AS resource_id,
ev.Crew,
ev.CompanyName ,
ev.Notes,
STUFF((SELECT '<br>'+inst.InstructorName
FROM [OpsInstructorEventsView] inst
WHERE ev.EventID = inst.EventID
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 4, '') as Names
FROM
[OpsSimEventsView] ev
Not sure why you have joined OpsInstructorEventsView in the main query. I removed it here, but if you need it you can just add it back.
A few things to look at:
1) The overhead of functions makes them expensive to call, especially in the select statement of a query that could potentially be returning thousands of rows. It will have to execute that function for every one of them. Consider merging the behavior of the function into your main stored procedure, where the SQL Server can make better use of its optimizer.
2) Since you are joining on event id in both tables, make sure you have an index on those two columns (see the sketch after this list). I would expect that you do, given that those both appear to be primary key columns, but make sure. An index can make a huge difference.
3) Convert your coalesce call into its equivalent case statements to remove the overhead of calling that function.
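For point 2, a sketch of such an index; the base table name is hypothetical, since the index has to go on the table behind OpsInstructorEventsView, not the view itself:
-- Covers the concatenation: seek on EventID, with InstructorName carried in the leaf
CREATE NONCLUSTERED INDEX IX_OpsInstructorEvents_EventID
    ON dbo.OpsInstructorEvents (EventID)
    INCLUDE (InstructorName);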
Yes, make it an INLINE table-valued SQL function, and do the concatenation inside it so it returns a single row (as originally written it returned one row per instructor, which would break the scalar subquery below):
CREATE FUNCTION fnReturnInstructorNamesAsHTML
( @EventID INT )
RETURNS TABLE
AS
RETURN
    SELECT STUFF((SELECT '<br>' + InstructorName
                  FROM OpsInstructorEventsView
                  WHERE EventID = @EventID
                  FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 4, '') AS result
Go
Then, in your SQL Statement, use it like this
SELECT [Other stuff],
    (SELECT result FROM dbo.fnReturnInstructorNamesAsHTML(ev.EventID)) AS Names
FROM OpsSimEventsView ev
JOIN OpsInstructorEventsView inst
ON ev.EventID = inst.EventID
I'm not exactly clear how the query you show in your question concatenates data from multiple rows into one row of the result, but the problem is that ordinary scalar UDFs are compiled on use, on EVERY use: for each row in your output the query processor has to execute the UDF again. This is NOT true for an inline table-valued UDF, because its SQL is folded into the outer statement before it is passed to the optimizer (the subsystem that generates the cached plan), so the UDF is only compiled once.

sql server-query optimization with many columns

we have "Profile" table with over 60 columns like (Id, fname, lname, gender, profilestate, city, state, degree, ...).
users search other peopel on website. query is like :
WITH TempResult as (
    select ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum, profile.id from Profile
    where
        (@a is null or a = @a) and
        (@b is null or b = @b) and
        ...(over 60 columns)
)
SELECT profile.* FROM TempResult join profile on TempResult.id = profile.id
WHERE
    (RowNum >= @FirstRow)
    AND
    (RowNum <= @LastRow)
SQL Server by default uses the clustered index to execute the query, but total execution time is over 300. We tested another solution, a multi-column index on all the columns in the WHERE clause, but total execution time was over 400.
Do you have any solution to bring total execution time under 100?
We are using SQL Server 2008.
Unfortunately I don't think there is a pure SQL solution to your issue. Here are a couple of alternatives:
Dynamic SQL - build up a query that only includes WHERE clause statements for values that are actually provided (see the sketch after this list). Assuming the average search only fills in 2-3 fields, indexes could be added and utilized.
Full Text Search - go to something more like a Google keyword search. No individual options.
Lucene (or something else) - search outside of SQL; this is a fairly significant change though.
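A minimal sketch of the dynamic SQL option, using the question's parameter names (types assumed for illustration); sp_executesql keeps it parameterized:
DECLARE @sql NVARCHAR(MAX) = N'SELECT id FROM Profile WHERE 1 = 1';

-- Append a predicate only when the caller actually supplied a value
IF @a IS NOT NULL SET @sql += N' AND a = @a';
IF @b IS NOT NULL SET @sql += N' AND b = @b';
-- ... repeat for the remaining searchable columns ...

EXEC sp_executesql @sql, N'@a int, @b int', @a = @a, @b = @b;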
One other option that I just remembered implementing in a system once: create a vertical table that includes all of the data you are searching on and build up a query for it. This is easiest to do with dynamic SQL, but could be done using table-valued parameters or a temp table in a pinch.
The idea is to make a table that looks something like this:
Profile ID
Attribute Name
Attribute Value
The table should have a unique index on (Profile ID, Attribute Name) (unique to make the search work properly, index will make it perform well).
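As DDL, that might look like the following (column types are assumptions):
CREATE TABLE ProfileAttributes
(
    ProfileID      INT          NOT NULL,
    AttributeName  VARCHAR(50)  NOT NULL,
    AttributeValue VARCHAR(100) NOT NULL,
    CONSTRAINT UQ_ProfileAttributes UNIQUE (ProfileID, AttributeName)
);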
In this table you'd have rows of data like:
(1, 'city', 'grand rapids')
(1, 'state', 'MI')
(2, 'city', 'detroit')
(2, 'state', 'MI')
Then your SQL will be something like:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
WHERE (AttributeName = 'city' AND AttributeValue = 'grand rapids')
OR (AttributeName = 'state' AND AttributeValue = 'MI') -- OR, not AND: no single row matches both; HAVING COUNT(*) = 2 enforces the conjunction
GROUP BY ProfileID
HAVING COUNT(*) = 2
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
Like I said, you could use a temp table that has attribute name/values:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
JOIN PassedInAttributeTable ON ProfileAttributes.AttributeName = PassedInAttributeTable.AttributeName
AND ProfileAttributes.AttributeValue = PassedInAttributeTable.AttributeValue
GROUP BY ProfileID
HAVING COUNT(*) = CountOfRowsInPassedInAttributeTable -- calculate or pass in
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
As I recall, this ended up performing very well, even on fairly complicated queries (though I think we only had 12 or so columns).
As a single query, I can't think of a clever way of optimising this.
Provided that each column's check is highly selective, however, the following (very long-winded) code might prove faster, assuming each individual column has its own separate index...
WITH
filter AS (
    SELECT
        [a].*
    FROM
        (SELECT * FROM Profile WHERE @a IS NULL OR a = @a) AS [a]
    INNER JOIN
        (SELECT id FROM Profile WHERE b = @b UNION ALL SELECT NULL WHERE @b IS NULL) AS [b]
            ON ([a].id = [b].id) OR ([b].id IS NULL)
    INNER JOIN
        (SELECT id FROM Profile WHERE c = @c UNION ALL SELECT NULL WHERE @c IS NULL) AS [c]
            ON ([a].id = [c].id) OR ([c].id IS NULL)
    .
    .
    .
    INNER JOIN
        (SELECT id FROM Profile WHERE zz = @zz UNION ALL SELECT NULL WHERE @zz IS NULL) AS [zz]
            ON ([a].id = [zz].id) OR ([zz].id IS NULL)
)
, TempResult as (
    SELECT
        ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum,
        [filter].*
    FROM
        [filter]
)
SELECT
    *
FROM
    TempResult
WHERE
    (RowNum >= @FirstRow)
    AND (RowNum <= @LastRow)
EDIT
Also, thinking about it, you may even get the same result just by having the 60 individual indexes. SQL Server can do INDEX MERGING...
You've several issues imho. One is that you're going to end up with a seq scan no matter what you do.
But I think your more crucial issue here is that you have an unnecessary join:
SELECT TempResult.* FROM TempResult
WHERE
    (RowNum >= @FirstRow)
    AND
    (RowNum <= @LastRow)
(This assumes the TempResult CTE carries the profile columns, as the [filter] CTE above does.)
This is a classic "SQL filter" query problem. I've found that the typical approaches of "(@b is null or b = @b)" and its common derivatives all yield mediocre performance; the OR clause tends to be the cause.
Over the years I've done a lot of perf tuning and query optimisation. The approach I've found best is to generate dynamic SQL inside a stored proc. Most times you also need to add WITH RECOMPILE on the statement. The stored proc helps reduce the potential for SQL injection attacks; the recompile is needed to force the selection of indexes appropriate to the parameters you are searching on.
Generally it is at least an order of magnitude faster.
I agree you should also look at the points mentioned above, like:
If you commonly refer to only a small subset of the columns, you could create non-clustered "covering" indexes.
Highly selective columns (i.e. those with many unique values) will work best if they are the lead column in the index.
If many columns have a very small number of values, consider using the BIT datatype, or create your own bitmasked BIGINT to represent many columns, i.e. a form of "enumerated datatype". But be careful, as any function in the WHERE clause (like MOD or bitwise AND/OR) will prevent the optimiser from choosing an index. It works best if you know the value for each and can combine them to use an equality or range query.
While it is often good to find RowIDs with a small query and then join to get all the other columns you want to retrieve (as you are doing above), this approach can sometimes backfire. If the first part of the query does a clustered index scan, then it is often faster to get the other columns you need in the select list and save the second table scan.
So it is always good to try it both ways and see what works best.
Remember to run SET STATISTICS IO ON and SET STATISTICS TIME ON before running your tests. Then you can see where the IO is, and it may help you with index selection for the most frequent combinations of parameters.
I hope this makes sense without long code samples. (it is on my other machine)