Showing error in syntax on Azure synapse SQL script


My code is:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[RLS_LOGS]
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS
(
[USER_IDENTITY] [nvarchar](4000) NOT NULL,
[DESCRIPTION] [nvarchar](4000) NOT NULL,
[CREATED_ON] [datetime2](7) NOT NULL DEFAULT (GETDATE())
)
It shows a syntax error when I run the script on Azure Synapse (dedicated SQL pool).

You have some of the clauses a bit out of order. This should work:
CREATE TABLE [dbo].[RLS_LOGS]
(
[USER_IDENTITY] [nvarchar](4000) NOT NULL,
[DESCRIPTION] [nvarchar](4000) NOT NULL,
[CREATED_ON] [datetime2](7) NOT NULL
)
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
I had to remove the GETDATE() default because, as the documentation states:
In Azure Synapse Analytics, only constants can be used for a default
constraint. An expression cannot be used with a default constraint.
You will just have to make sure each INSERT statement supplies GETDATE() for that column.
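For example, a load into the RLS_LOGS table above would pass the timestamp itself. This is a minimal sketch; the two string values are placeholders, not from the original post.

```sql
-- The table no longer has DEFAULT (GETDATE()), so the timestamp is supplied explicitly.
-- N'example_user' and N'example description' are placeholder values.
INSERT INTO [dbo].[RLS_LOGS] ([USER_IDENTITY], [DESCRIPTION], [CREATED_ON])
SELECT N'example_user', N'example description', GETDATE();
```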

Related

Insert statement running very slowly on SQL Server

I have been getting data from production and loading it into a staging DWH, and then later loading it from staging into a usable format in another DWH.
Both staging and the final DWH are on the same server. This process didn't use to take long, but now it takes ages to load the data from staging. Loading data from production into staging takes a few minutes, but loading it further takes hours, and I am not sure why.
FYI: I have been testing the loads, so I have truncated/deleted the tables a few times and reloaded them.
Also, I had a nonclustered index on one of the columns in the actual DWH, which I removed:
CONSTRAINT [PK_EncounterTB_Encounter_id]
PRIMARY KEY CLUSTERED ([Encounter_id] ASC),
CONSTRAINT [Uniq_EncounterTB_Encounter_table_id]
UNIQUE NONCLUSTERED ([Encounter_Table_id] ASC)
Below is the table structure for staging (I have removed a few of the columns):
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Stg_Encounter]
(
[encntr_id] [float] NOT NULL,
[person_id] [float] NOT NULL,
[visit_id_stay_number] [varchar](1000) NULL,
[mrn] [varchar](1000) NULL,
[encntr_type_cd] [float] NULL,
[reg_dt_tm] [datetime2](7) NULL,
[disch_dt_tm] [datetime2](7) NULL,
[admit_cd] [float] NULL,
[visit_cd] [float] NULL,
[source_cd] [float] NULL,
[sepearation_cd] [float] NULL,
[medical_service_cd] [float] NULL,
[reason_problem] [varchar](1000) NULL,
) ON [PRIMARY]
GO
For the actual DWH, the table structure is as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Encounter]
(
[Encounter_Table_id] [int] NOT NULL,
[Encounter_id] [int] NOT NULL,
[Person_id] [int] NOT NULL,
[Visit_ID] [varchar](1000) NULL,
[MRN] [varchar](1000) NULL,
[Encounter_Type] [varchar](1000) NULL,
[Arrival_Dt_Tm] [datetime2](7) NULL,
[Departure_Dt_Tm] [datetime2](7) NULL,
[Mode_of_Arrival] [varchar](1000) NULL,
[Visit_Type] [varchar](1000) NULL,
[Admit_Source] [varchar](1000) NULL,
[Mode_of_Separation] [varchar](1000) NULL,
[Medical_Service] [varchar](1000) NULL,
[Presenting_Problem] [varchar](1000) NULL,
[LOAD_Dt_Tm] [datetime] NOT NULL,
[Data_Source] [varchar](1000) NOT NULL,
CONSTRAINT [PK_EncounterTB_Encounter_id]
PRIMARY KEY CLUSTERED ([Encounter_id] ASC)
) ON [PRIMARY]
GO
The following INSERT is used to load the data:
INSERT INTO [ACTUAL_DWH].[dbo].[Encounter]
(
[Encounter_Table_id]
,[Encounter_id]
,[Person_id]
,[Visit_ID]
,[MRN]
,[Encounter_Type]
,[Arrival_Dt_Tm]
,[Departure_Dt_Tm]
,[Mode_of_Arrival]
,[Visit_Type]
,[Admit_Source]
,[Mode_of_Separation]
,[Medical_Service]
,[Presenting_Problem]
,[MSAU_LOAD_Dt_Tm]
,[Data_Source]
)
SELECT
[Encounter_Table_id]= CONVERT(INT,Stg_e.[encntr_id])
, [Encounter_id] = CONVERT(INT,Stg_e.[encntr_id])
,[Person_id] = CONVERT(INT,Stg_e.[person_id])
,[Visit_ID] = Stg_e.[visit_id_stay_number]
,[MRN] = Stg_e.[mrn]
,[Encounter_Type] = [ACTUAL_DWH].[dbo].[emr_get_code_Description](Stg_e.encntr_type_cd)
,[Arrival_Dt_Tm] = CONVERT(DATETIME,Stg_e.reg_dt_tm)
,[Departure_Dt_Tm] = CONVERT(DATETIME,Stg_e.disch_dt_tm)
,[Mode_of_Arrival] = [ACTUAL_DWH].[dbo].[Description](Stg_e.admit_cd)
,[Visit_Type] = [ACTUAL_DWH].[dbo].[Description](Stg_e.visit_cd)
,[Admit_Source] = [ACTUAL_DWH].[dbo].[Description](Stg_e.source_cd)
,[Mode_of_Separation] = [ACTUAL_DWH].[dbo].[Description](Stg_e.sepearation_cd)
,[Medical_Service] = [ACTUAL_DWH].[dbo].[Description](Stg_e.medical_service_cd)
,[Presenting_Problem] = Stg_e.reason_problem
,[MSAU_LOAD_Dt_Tm] = getdate()
,[Data_Source] = 'SourceName'
FROM [dbo].Stg_Encounter Stg_e
where NOT EXISTS ( SELECT 1 FROM [ACTUAL_DWH].[dbo].Encounter e
WHERE stg_e.encntr_id = e.encounter_id)
The function used is below:
USE [ACTUAL_DWH]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER function [dbo].[Description](@cv int)
returns varchar(80)
as begin
declare @ret varchar(80)
select @ret = cv.DESCRIPTION
from ACTUAL_DWH.DBO.CODE_VALUE cv
where cv.code_value = @cv
and cv.active_ind = 1
return isnull(@ret, 0)
end;
I am just confused about what I have missed! And what can I change? The table has around 6 million rows, and it used to load them in a minute.
After the suggestions provided, I found out that the issue is the function I am using. I have read about CROSS APPLY, but is it a good idea to use CROSS APPLY for 15 columns?

You can use the CREATE INDEX statement to make retrieving data from the database much faster:
CREATE INDEX IX_Encounter
ON [ACTUAL_DWH].[dbo].[Encounter](Encounter_Table_id) ON [PRIMARY]
INSERT INTO [ACTUAL_DWH].[dbo].[Encounter]
(
[Encounter_Table_id]
,[Encounter_id]
,[Person_id]
,[Visit_ID]
,[MRN]
,[Encounter_Type]
,[Arrival_Dt_Tm]
,[Departure_Dt_Tm]
,[Mode_of_Arrival]
,[Visit_Type]
,[Admit_Source]
,[Mode_of_Separation]
,[Medical_Service]
,[Presenting_Problem]
,[MSAU_LOAD_Dt_Tm]
,[Data_Source]
)
SELECT
[Encounter_Table_id]= CONVERT(INT,Stg_e.[encntr_id])
, [Encounter_id] = CONVERT(INT,Stg_e.[encntr_id])
,[Person_id] = CONVERT(INT,Stg_e.[person_id])
,[Visit_ID] = Stg_e.[visit_id_stay_number]
,[MRN] = Stg_e.[mrn]
,[Encounter_Type] = [ACTUAL_DWH].[dbo].[emr_get_code_Description](Stg_e.encntr_type_cd)
,[Arrival_Dt_Tm] = CONVERT(DATETIME,Stg_e.reg_dt_tm)
,[Departure_Dt_Tm] = CONVERT(DATETIME,Stg_e.disch_dt_tm)
,[Mode_of_Arrival] = [ACTUAL_DWH].[dbo].[Description](Stg_e.admit_cd)
,[Visit_Type] = [ACTUAL_DWH].[dbo].[Description](Stg_e.visit_cd)
,[Admit_Source] = [ACTUAL_DWH].[dbo].[Description](Stg_e.source_cd)
,[Mode_of_Separation] = [ACTUAL_DWH].[dbo].[Description](Stg_e.sepearation_cd)
,[Medical_Service] = [ACTUAL_DWH].[dbo].[Description](Stg_e.medical_service_cd)
,[Presenting_Problem] = Stg_e.reason_problem
,[MSAU_LOAD_Dt_Tm] = getdate()
,[Data_Source] = 'SourceName'
FROM [dbo].Stg_Encounter Stg_e
where NOT EXISTS ( SELECT 1 FROM [ACTUAL_DWH].[dbo].Encounter e
WHERE stg_e.encntr_id = e.encounter_id)
You can find more information about indexes in the CREATE INDEX documentation.

Just to close this post: as suggested, I broke down the query and found that the function was the culprit. I am exploring it further for a resolution.
The function takes a parameter and runs a query against another table on every call, which slows the insert down. If I do the insert without the function, it takes only a few seconds to load 6 million rows.
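Since the function turned out to be the bottleneck, one common alternative is to replace the scalar function calls with LEFT JOINs to the lookup table, so the lookups become set-based joins instead of a query per row per column. This is a sketch under assumptions: CODE_VALUE holds at most one active row per code_value, the [MSAU_LOAD_Dt_Tm] column name is taken from the INSERT above, and emr_get_code_Description is left as a call because its body isn't shown. It also bears on the CROSS APPLY question: one join per code column against the same lookup table is typically much cheaper than millions of function calls.

```sql
-- Sketch: each [dbo].[Description](code) call becomes a LEFT JOIN to CODE_VALUE.
-- Assumes at most one active CODE_VALUE row per code_value (otherwise rows multiply).
INSERT INTO [ACTUAL_DWH].[dbo].[Encounter]
(
    [Encounter_Table_id], [Encounter_id], [Person_id], [Visit_ID], [MRN],
    [Encounter_Type], [Arrival_Dt_Tm], [Departure_Dt_Tm], [Mode_of_Arrival],
    [Visit_Type], [Admit_Source], [Mode_of_Separation], [Medical_Service],
    [Presenting_Problem], [MSAU_LOAD_Dt_Tm], [Data_Source]
)
SELECT
    CONVERT(INT, Stg_e.[encntr_id]),
    CONVERT(INT, Stg_e.[encntr_id]),
    CONVERT(INT, Stg_e.[person_id]),
    Stg_e.[visit_id_stay_number],
    Stg_e.[mrn],
    -- Body not shown in the question, so this one stays a function call
    [ACTUAL_DWH].[dbo].[emr_get_code_Description](Stg_e.encntr_type_cd),
    CONVERT(DATETIME, Stg_e.reg_dt_tm),
    CONVERT(DATETIME, Stg_e.disch_dt_tm),
    -- ISNULL(..., '0') reproduces the function's isnull(@ret, 0) fallback
    ISNULL(cv_admit.[DESCRIPTION], '0'),
    ISNULL(cv_visit.[DESCRIPTION], '0'),
    ISNULL(cv_source.[DESCRIPTION], '0'),
    ISNULL(cv_sep.[DESCRIPTION], '0'),
    ISNULL(cv_med.[DESCRIPTION], '0'),
    Stg_e.reason_problem,
    GETDATE(),
    'SourceName'
FROM [dbo].Stg_Encounter AS Stg_e
-- CONVERT(INT, ...) mirrors the function's int parameter
LEFT JOIN [ACTUAL_DWH].[dbo].[CODE_VALUE] AS cv_admit
    ON cv_admit.code_value = CONVERT(INT, Stg_e.admit_cd) AND cv_admit.active_ind = 1
LEFT JOIN [ACTUAL_DWH].[dbo].[CODE_VALUE] AS cv_visit
    ON cv_visit.code_value = CONVERT(INT, Stg_e.visit_cd) AND cv_visit.active_ind = 1
LEFT JOIN [ACTUAL_DWH].[dbo].[CODE_VALUE] AS cv_source
    ON cv_source.code_value = CONVERT(INT, Stg_e.source_cd) AND cv_source.active_ind = 1
LEFT JOIN [ACTUAL_DWH].[dbo].[CODE_VALUE] AS cv_sep
    ON cv_sep.code_value = CONVERT(INT, Stg_e.sepearation_cd) AND cv_sep.active_ind = 1
LEFT JOIN [ACTUAL_DWH].[dbo].[CODE_VALUE] AS cv_med
    ON cv_med.code_value = CONVERT(INT, Stg_e.medical_service_cd) AND cv_med.active_ind = 1
WHERE NOT EXISTS (SELECT 1 FROM [ACTUAL_DWH].[dbo].[Encounter] AS e
                  WHERE e.[Encounter_id] = Stg_e.[encntr_id]);
```

If CODE_VALUE can hold several active rows for one code, the joins would duplicate encounters; in that case an OUTER APPLY with TOP (1) per column is the safer variant.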

SQL Server : How to combine multiple database into one database?

Starting from my original database, I made changes to some tables (adding columns), and I want to merge the databases into a single database. The new database only adds some tables, and some of the old tables get extra columns.
How can I merge multiple databases into one database?
SQL example:
CREATE TABLE [dbo].[Item]
(
[ItemID] [nchar](10) NOT NULL,
[Money] [bigint] NOT NULL,
[ItemName] [bigint] NOT NULL,
[MoneyType] [bigint] NOT NULL,
CONSTRAINT [PK_Item]
PRIMARY KEY CLUSTERED ([ItemID] ASC) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Account]
(
[Index] [int] IDENTITY(1,1) NOT NULL,
[AccountID] [nchar](10) NOT NULL,
[AccountName] [int] NOT NULL,
[ItemList] [int] NOT NULL,
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Money]
(
[AccountID] [nchar](10) NOT NULL,
[Money] [bigint] NOT NULL,
[MoneyType] [bigint] NOT NULL,
CONSTRAINT [PK_Money]
PRIMARY KEY CLUSTERED ([AccountID] ASC) ON [PRIMARY]
) ON [PRIMARY]
GO
Comment (Nick.McDermaid): use the Schema Compare tool in Visual Studio (available in various free editions), which will create a change script!

This will combine them into DBCombined.Account if the Account table does NOT exist yet: SELECT INTO creates the target table. You would then need to add any indexes from the original tables. Also, SELECT * should really be broken out into an explicit column list, because if you have an ID column it will contain duplicates. It is better to leave the ID off during the insert and then go back and add an identity column.
USE DBCombined
GO
SELECT *
INTO Account
FROM (
SELECT *
FROM DB1.dbo.Account
UNION ALL
SELECT *
FROM DB2.dbo.Account
) Acct
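A sketch of the explicit-column version described above, assuming the Account schema from the question, where [Index] is the identity column. Because rows from both databases are combined, the IDs are regenerated instead of copied.

```sql
USE DBCombined;
GO
-- List the columns explicitly and leave the identity column ([Index]) out,
-- so the overlapping IDs from DB1 and DB2 are not copied.
SELECT AccountID, AccountName, ItemList
INTO dbo.Account
FROM (
    SELECT AccountID, AccountName, ItemList FROM DB1.dbo.Account
    UNION ALL
    SELECT AccountID, AccountName, ItemList FROM DB2.dbo.Account
) AS Acct;
GO
-- Add a fresh identity column afterwards; existing rows are numbered automatically.
ALTER TABLE dbo.Account ADD [Index] int IDENTITY(1,1) NOT NULL;
GO
```

The order in which existing rows receive their new [Index] values is not guaranteed, so don't rely on it matching the source databases.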

Dynamic SQL table creation - invalid object after execution?

I am trying to create a table whose starting IDENTITY value is pulled dynamically from another table. The SQL executes successfully, but afterwards I am unable to find my temporary table: DBCC CHECKIDENT returns Invalid object name '#address_temp'.
IF OBJECT_ID('tempdb.dbo.#address_temp', 'U') IS NOT NULL
DROP TABLE #address_temp
DECLARE @address_temp_ID VARCHAR(MAX)
SET @address_temp_ID = (SELECT MAX(ID) FROM [PRIMARYDB].[dbo].[ADDRESS])
DECLARE @SQLBULK VARCHAR(MAX)
SET @SQLBULK = 'CREATE TABLE #address_temp(
[ID] [int] IDENTITY(' + @address_temp_ID + ',1) NOT NULL,
[NAME] [varchar](128) NOT NULL,
[ADDRESS1] [varchar](128) NOT NULL,
[ADDRESS2] [varchar](128) NULL,
[CITY] [varchar](128) NULL,
[STATE_ID] [smallint] NULL,
[ZIP] [varchar](10) NOT NULL,
[COUNTY] [varchar](50) NULL,
[COUNTRY] [varchar](50) NULL
)
CREATE CLUSTERED INDEX pk_add ON #address_temp ([NAME])'
EXEC (@SQLBULK)
DBCC CHECKIDENT('#address_temp')

Table names that start with # are temporary tables, and SQL Server treats them differently. First of all, they are only available to the session that created them (strictly, they live in tempdb under a unique system-generated name). On top of that, a temp table created inside dynamic SQL run with EXEC is scoped to that dynamic batch and is dropped as soon as the EXEC finishes, which is why the DBCC CHECKIDENT that follows reports an invalid object name.
In any case they won't persist, so you don't need to drop them (that happens automatically), and you certainly can't look at them after your session ends; they are gone.
Don't use a temp table: take the # out of the name, and things will suddenly start working.
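If you would rather keep a temporary table, another option (a different technique from the answer above, sketched under assumptions) is to create it in the outer batch, where it stays visible, and move the identity seed afterwards with DBCC CHECKIDENT ... RESEED. This assumes new IDs should start right after the current MAX(ID) in ADDRESS; the original passed MAX(ID) itself as the seed.

```sql
-- Create the temp table in the outer batch (not inside EXEC) so it is still visible afterwards.
IF OBJECT_ID('tempdb.dbo.#address_temp', 'U') IS NOT NULL
    DROP TABLE #address_temp;

CREATE TABLE #address_temp (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [NAME] [varchar](128) NOT NULL,
    [ADDRESS1] [varchar](128) NOT NULL,
    [ADDRESS2] [varchar](128) NULL,
    [CITY] [varchar](128) NULL,
    [STATE_ID] [smallint] NULL,
    [ZIP] [varchar](10) NOT NULL,
    [COUNTY] [varchar](50) NULL,
    [COUNTRY] [varchar](50) NULL
);
CREATE CLUSTERED INDEX pk_add ON #address_temp ([NAME]);

-- Assumption: new IDs should follow the current maximum in ADDRESS (1 if ADDRESS is empty).
DECLARE @seed int = (SELECT ISNULL(MAX(ID), 0) + 1 FROM [PRIMARYDB].[dbo].[ADDRESS]);

-- On a table that has never held rows, the first insert after RESEED uses @seed itself.
DBCC CHECKIDENT ('#address_temp', RESEED, @seed);
```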

Quick SELECT sometimes time out

I have a stored procedure which executes a simple SELECT. Any time I run it manually, it runs in under a second. But in production (SQL Azure S2 database) it runs inside a scheduled task every 12 hours, so I think it is reasonable to expect it to run "cold" every time, with no cached data. And the performance is very unpredictable: sometimes it takes 5 seconds, sometimes 30, and sometimes even 100.
The SELECT is optimized to the maximum (to my knowledge, anyway): I created a filtered index including all the columns returned by the SELECT, so the only operation in the execution plan is an index scan. There is a huge difference between the estimated and actual number of rows.
But overall the query seems pretty lightweight. I do not blame the environment (SQL Azure), because A LOT of queries execute all the time, and this one is the only one with this performance problem.
Here is the XML execution plan for SQL ninjas willing to help: http://pastebin.com/u5GCz0vW
EDIT:
Table structure:
CREATE TABLE [myproject].[Purchase](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProductId] [nvarchar](50) NOT NULL,
[DeviceId] [nvarchar](255) NOT NULL,
[UserId] [nvarchar](255) NOT NULL,
[Receipt] [nvarchar](max) NULL,
[AppVersion] [nvarchar](50) NOT NULL,
[OSType] [tinyint] NOT NULL,
[IP] [nchar](15) NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ValidationState] [smallint] NOT NULL,
[ValidationInfo] [nvarchar](max) NULL,
[ValidationError] [nvarchar](max) NULL,
[ValidatedOn] [datetime] NULL,
[PurchaseId] [nvarchar](255) NULL,
[PurchaseDate] [datetime] NULL,
[ExpirationDate] [datetime] NULL,
CONSTRAINT [PK_Purchase] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
Index definition:
CREATE NONCLUSTERED INDEX [IX_AndroidRevalidationTargets3] ON [myproject].[Purchase]
(
[ExpirationDate] ASC,
[ValidatedOn] ASC
)
INCLUDE ( [ProductId],
[DeviceId],
[UserId],
[Receipt],
[AppVersion],
[OSType],
[IP],
[CreatedOn],
[ValidationState],
[ValidationInfo],
[ValidationError],
[PurchaseId],
[PurchaseDate])
WHERE ([OSType]=(1) AND [ProductId] IS NOT NULL AND [ProductId]<>'trial' AND ([ValidationState] IN ((1), (0), (-2))))
The data can be considered sensitive, so I can't provide a sample.

Since your query returns only 1 match, I think you should trim down your index to a bare minimum. You can get the remaining columns via a Key Lookup from the clustered index:
CREATE NONCLUSTERED INDEX [IX_AndroidRevalidationTargets3] ON [myproject].[Purchase]
(
[ExpirationDate] ASC,
[ValidatedOn] ASC
)
WHERE ([OSType]=(1) AND [ProductId] IS NOT NULL AND [ProductId]<>'trial' AND ([ValidationState] IN ((1), (0), (-2))))
This doesn't eliminate the scan, but it makes the index much leaner for a fast read.
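Edit: OP stated that the slimmed-down index was ignored by SQL Server. You can force SQL Server to use the filtered index:
SELECT *
FROM [myproject].[Purchase] WITH (INDEX(IX_AndroidRevalidationTargets3))
WHERE [OSType] = 1 AND [ProductId] IS NOT NULL AND [ProductId] <> 'trial' AND [ValidationState] IN (1, 0, -2)
-- plus the query's own ExpirationDate/ValidatedOn conditions; a filtered index can only be forced when the WHERE clause implies its filter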

How can I optimise this SQL query

I'm writing a piece of software that is meant to identify files that have been put onto the web server (CMS) but are no longer needed and should/could be deleted.
To start with I'm trying to reproduce all required steps manually.
I'm using a batch script executed in the webroot to identify all (relevant) files on the server. Then, I'm importing the list to SQL Server and the table looks like this:
id filename
1 filename1.docx
2 files/file.pdf
3 files/filename2.docx
4 files/filename3.docx
5 files/file1.pdf
6 file2.pdf
7 file4.pdf
I also have a CMS database (Alterian/Immediacy CMC 6.X) which has 2 tables storing page content: page_data and PageXMLArchive.
I would like to scan the database to see if the files from the first table are referenced anywhere in the content of the site - p_content column from page_data table and PageXML column from PageXMLArchive table.
So I have a loop which gets each filename and checks whether it's referenced in either of those tables: if it is, it skips it; if it isn't, it adds it to a temporary table.
At the end of the query the temporary table is displayed.
Query below:
DECLARE @t as table (_fileName nvarchar(255))
DECLARE @row as int
DECLARE @result as nvarchar(255)
SET @row = 1
WHILE(@row <= (SELECT COUNT(*) FROM ListFileReport))
BEGIN
SET @result = (SELECT [FileName] FROM ListFileReport WHERE id = @row)
-- AND: add the file only when neither table references it
IF ((SELECT TOP(1) p_content FROM page_data WHERE p_content LIKE '%' + LTRIM(RTRIM(@result)) + '%') IS NULL) AND ((SELECT TOP(1) PageXML FROM PageXMLArchive WHERE PageXML LIKE '%' + LTRIM(RTRIM(@result)) + '%') IS NULL)
BEGIN
INSERT INTO @t (_fileName) VALUES(@result)
END
SET @row = @row + 1
END
select * from @t
Unfortunately, due to my poor SQL skills, the query takes over 2 hours to execute and times out.
How can I improve that query, or change it to achieve a similar result without having to run thousands of WHERE x LIKE statements on ntext fields? I can't make any changes to the database; it has to stay untouched (or it won't be supported, which is a big deal for our customers).
Thanks
EDIT:
Currently I'm working around the issue by batching the files a few hundred at a time. It works, but it takes forever.
EDIT:
Can I possibly utilise Full-Text search to achieve this? I am willing to take a snapshot of the database and work on the copy if there is a way of changing the schema to achieve the desired results.
page_data table:
USE [TD-VMB-01-STG]
GO
/****** Object: Table [dbo].[page_data] Script Date: 12/13/2011 13:19:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[page_data](
[p_page_id] [int] NOT NULL,
[p_title] [nvarchar](120) NULL,
[p_link] [nvarchar](250) NULL,
[p_content] [ntext] NULL,
[p_parent_id] [int] NULL,
[p_top_id] [int] NULL,
[p_stylesheet] [nvarchar](50) NULL,
[p_author] [nvarchar](50) NULL,
[p_last_update] [datetime] NULL,
[p_order] [smallint] NULL,
[p_window] [nvarchar](10) NULL,
[p_meta_keywords] [nvarchar](1000) NULL,
[p_meta_desc] [nvarchar](2000) NULL,
[p_type] [nvarchar](1) NULL,
[p_confirmed] [int] NOT NULL,
[p_changed] [int] NOT NULL,
[p_access] [int] NULL,
[p_errorlink] [nvarchar](255) NULL,
[p_noshow] [int] NOT NULL,
[p_edit_parent] [int] NULL,
[p_hidemenu] [int] NOT NULL,
[p_subscribe] [int] NOT NULL,
[p_StartDate] [datetime] NULL,
[p_EndDate] [datetime] NULL,
[p_pageEnSDate] [int] NOT NULL,
[p_pageEnEDate] [int] NOT NULL,
[p_hideexpiredPage] [int] NOT NULL,
[p_version] [float] NULL,
[p_edit_order] [float] NULL,
[p_order_change] [datetime] NOT NULL,
[p_created_date] [datetime] NOT NULL,
[p_short_title] [nvarchar](30) NULL,
[p_authentication] [tinyint] NOT NULL,
CONSTRAINT [aaaaapage_data_PK] PRIMARY KEY NONCLUSTERED
(
[p_page_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_order] DEFAULT (0) FOR [p_order]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_con__1CF15040] DEFAULT (0) FOR [p_confirmed]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_cha__1DE57479] DEFAULT (0) FOR [p_changed]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_acc__1ED998B2] DEFAULT (1) FOR [p_access]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_nos__1FCDBCEB] DEFAULT (0) FOR [p_noshow]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_edi__20C1E124] DEFAULT (0) FOR [p_edit_parent]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_hid__21B6055D] DEFAULT (0) FOR [p_hidemenu]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_subscribe] DEFAULT (0) FOR [p_subscribe]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_pageEnSDate] DEFAULT (0) FOR [p_pageEnSDate]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_pageEnEDate] DEFAULT (0) FOR [p_pageEnEDate]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_hideexpiredPage] DEFAULT (1) FOR [p_hideexpiredPage]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_version] DEFAULT (0) FOR [p_version]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_edit_order] DEFAULT (0) FOR [p_edit_order]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_order_change] DEFAULT (getdate()) FOR [p_order_change]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_created_date] DEFAULT (getdate()) FOR [p_created_date]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_authentication] DEFAULT ((0)) FOR [p_authentication]
GO
PageXMLArchive table:
USE [TD-VMB-01-STG]
GO
/****** Object: Table [dbo].[PageXMLArchive] Script Date: 12/13/2011 13:20:00 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PageXMLArchive](
[ArchiveID] [bigint] IDENTITY(1,1) NOT NULL,
[P_Page_ID] [int] NOT NULL,
[p_author] [nvarchar](100) NULL,
[p_title] [nvarchar](400) NULL,
[Version] [int] NOT NULL,
[PageXML] [ntext] NULL,
[ArchiveDate] [datetime] NOT NULL,
CONSTRAINT [PK_PageXMLArchive] PRIMARY KEY CLUSTERED
(
[ArchiveID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[PageXMLArchive] ADD CONSTRAINT [DF_PageXMLArchive_ArchiveDate] DEFAULT (getdate()) FOR [ArchiveDate]
GO

You can avoid the loop in many ways, and here is an example...
SELECT
*
FROM
ListFileReport
WHERE
NOT EXISTS (
SELECT *
FROM page_data
WHERE p_content LIKE '%' + LTRIM(RTRIM(ListFileReport.FileName)) + '%'
)
AND
NOT EXISTS (
SELECT *
FROM PageXMLArchive
WHERE PageXML LIKE '%' + LTRIM(RTRIM(ListFileReport.FileName)) + '%'
)
Note: This removes the loop, and will yield a massive improvement because of that. But it still has to scan the whole of both lookup tables for every entry in ListFileReport: a LIKE pattern with a leading wildcard can't use an index, so there's no clever algorithm or useful indexing to help. So it will still be slow as a dog; it'll just have one broken leg instead of two.
The only way to avoid using LIKE is to parse all of the fields in the page_data and PageXMLArchive tables and create a list of referenced files. As HTML and XML are very structured, this can be done, but I'd look for a library or something to do it for you.
Then you can create another table with all of the referenced files, without duplicates and with an appropriate index. Querying against that instead of using LIKE will be massively faster; I have no doubt at all. But writing or finding the parsing code will be a chore.
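A sketch of the lookup-table approach described above, assuming you work on a snapshot or in a separate database, since the original schema must stay untouched. dbo.ReferencedFiles is a hypothetical name, and populating it (parsing the HTML/XML content) is not shown.

```sql
-- Hypothetical table holding each file reference extracted from page_data.p_content
-- and PageXMLArchive.PageXML (the extraction step itself is not shown).
CREATE TABLE dbo.ReferencedFiles (
    [FileName] nvarchar(255) NOT NULL PRIMARY KEY  -- unique and indexed
);

-- Files on disk that no page references: an indexed equality lookup instead of LIKE '%...%'.
SELECT l.[FileName]
FROM ListFileReport AS l
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.ReferencedFiles AS r
    WHERE r.[FileName] = LTRIM(RTRIM(l.[FileName]))
);
```

The extracted names must be normalized the same way as in ListFileReport (for example files/file.pdf rather than a full URL), otherwise equality matches will be missed.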

A stored procedure, especially one with a loop that mixes SELECT and INSERT, will definitely slow down the query.
Ideally, if you can write INSERT INTO #table SELECT a, b FROM table, it will be far faster than inserting each row separately.
For your example, you could do something like:
insert into @t (_fileName) select ... from p_content join ...on .. where sth like %sth
Let me know if it is not applicable.
