Converting a Cursor-Based Duplication Procedure to Set-Based SQL

I am trying to convert a stored procedure that copies parent/child/grandchild rows into the same tables with new unique identifiers. The purpose is to produce a duplicate 'Order' with its 'Order Lines' and 'Order Line Attributes'. The procedure currently in place uses cursors, and I'd like to try to create a set-based version.
One issue I've hit early on is that automatic numbering in a human-friendly format is done in a stored procedure:
DECLARE @sales_order_id nvarchar(50)
EXEC GetAutoNumber 'Order', @sales_order_id output
The execution happens within the cursor as it loops through the order lines of a single order. Is there any way to call this on the fly? I thought of using a table-valued function, but I can't, because the stored procedure updates the autonumber table to create the new value.
Ideally I would craft an insert statement that automatically retrieves/updates the autonumber and can be applied across multiple rows simultaneously, for example:
INSERT INTO [ORDER] (
    OrderId,     -- Guid
    OrderNumber, -- Human-friendly value that needs autoincrementing
    ...
)
SELECT
    NEWID(),
    ???
FROM [ORDER]
WHERE OrderId = @OrderToBeCopied
I'm using SQL Server 2008. Any suggestions?
EDIT: One reason an identity column would not work is that the table these autonumbers are stored in serves multiple entities, each with its own prefix. For instance, here is the DDL for the autonumber table:
CREATE TABLE [dbo].[pt_autonumber_settings](
[autonumber_id] [uniqueidentifier] NULL,
[autonumber_prefix] [nvarchar](10) NULL,
[autonumber_type] [nvarchar](50) NULL,
[changed_by] [nvarchar](30) NOT NULL,
[change_date] [datetime] NOT NULL,
[origin_by] [nvarchar](30) NOT NULL,
[origin_date] [datetime] NOT NULL,
[autonumber_currentvalue] [nvarchar](50) NULL
) ON [PRIMARY]
So the end result from the stored procedure is the newest autonumber_id for a certain autonumber_type, and it also retrieves the autonumber_prefix and concatenates the two together.

Is there some reason you can't use an IDENTITY column?

Please read the edit below, as my original answer isn't satisfactory.
I'm not entirely sure how your OrderNumber is incremented, but you could certainly use ROW_NUMBER() for this. Check out the MSDN documentation on it.
Assuming you just want a number allocated to each OrderId, you'd have something like:
SELECT NEWID(),
       ROW_NUMBER() OVER (ORDER BY <whatever column you need to order by to make the row number meaningful>) AS MyFriendlyId
FROM [ORDER]
WHERE OrderId = @OrderToBeCopied
If it needs to have some sort of initial seed value, then you can always use
ROW_NUMBER() OVER (ORDER BY <whatever column you need to order by to make the row number meaningful>) + 1000 As MyFriendlyId -- or whatever your seed value should be
Edit:
I just re-read your question, and I suspect you want OrderNumber to be unique across all records. I initially misread it as something like an incremental line number for the detail line items of the OrderId.
My solution won't be any good in that case, and I'd be more inclined to go with the other answer's suggestion of an identity column.
You could potentially select MAX(OrderNumber) at the beginning and then use that in conjunction with ROW_NUMBER(), but this is dangerous: it is likely to be a dirty read and won't guarantee uniqueness if someone performs a concurrent insert. And if you do have a unique constraint and two sessions read the same MAX(OrderNumber), the concurrent inserts are likely to cause unique constraint violations... so yeah... why can't you use an identity column? :)
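That said, on SQL Server 2008 you can make the numbering set-based by reserving a whole block of autonumbers atomically with UPDATE ... OUTPUT, then spreading the block across the copied rows with ROW_NUMBER(). A minimal sketch, assuming autonumber_currentvalue stores just the numeric part as a string (your real format may differ):
DECLARE @needed int =
    (SELECT COUNT(*) FROM [ORDER] WHERE OrderId = @OrderToBeCopied);
DECLARE @reserved TABLE (prefix nvarchar(10), last_value int);

-- atomically advance the counter and capture the post-update value
UPDATE dbo.pt_autonumber_settings
SET autonumber_currentvalue = CONVERT(int, autonumber_currentvalue) + @needed
OUTPUT inserted.autonumber_prefix,
       CONVERT(int, inserted.autonumber_currentvalue)
INTO @reserved (prefix, last_value)
WHERE autonumber_type = 'Order';

-- hand the reserved numbers out, one per copied row
INSERT INTO [ORDER] (OrderId, OrderNumber /* , ... */)
SELECT NEWID(),
       r.prefix + CONVERT(nvarchar(20),
           r.last_value - @needed + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
FROM [ORDER] o
CROSS JOIN @reserved r
WHERE o.OrderId = @OrderToBeCopied;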


How can I create columns like these in Transact-SQL?

How can I create columns like these in Transact-SQL? This code is all I have.
I could do it directly in SSMS, but I don't understand some things in this code, so I prefer to do it directly in Transact-SQL to be safer.
For example, I can make an Id column with int, but I don't understand the "IDENTITY" and the "(1,1)", and I don't know where the getdate() has to go... so here it is.
Thanks
[Id] INT IDENTITY (1, 1) NOT NULL,
[DateCreated] DATETIMEOFFSET NOT NULL DEFAULT (getdate()),
These two fields (or columns) contain auto-generated data. So, let's say you have 3 fields: ID, DateCreated and Username. You will only ever enter data for Username. ID will auto-generate sequential numbers (the "(1,1)" means "begin with the number 1, and add 1 to the previous number for each new record"), and DateCreated will automatically fill with the date you add the new record.
IDENTITY(1,1) creates a column that automatically increases based on its arguments. With (1,1), the value of the column starts at 1 (first argument) and increases by 1 (second argument) for each new record (with a caveat or two).
As for the rest of the question: what are you trying to replace? The DateCreated column looks fine. Both Id and DateCreated are tagged NOT NULL, but with the IDENTITY and DEFAULT constraints the columns are populated automatically, so you don't have to provide data for either of them when doing an INSERT. You'll probably want to add another column that describes the thing you are inserting (i.e. Name, Description, etc.).
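To see both columns fill themselves in, here is a small sketch (the table and the Name column are hypothetical):
CREATE TABLE [dbo].[Example]
(
    [Id] INT IDENTITY (1, 1) NOT NULL,
    [DateCreated] DATETIMEOFFSET NOT NULL DEFAULT (getdate()),
    [Name] NVARCHAR(100) NOT NULL
);

-- only Name is supplied; Id and DateCreated are generated automatically
INSERT INTO [dbo].[Example] ([Name]) VALUES (N'First');  -- gets Id = 1
INSERT INTO [dbo].[Example] ([Name]) VALUES (N'Second'); -- gets Id = 2
SELECT * FROM [dbo].[Example];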

Optimize SQL query (if possible) using CONVERT(INT, SUBSTRING(...)) and the LEN function

My situation is this:
I have these tables:
CREATE TABLE [dbo].[HeaderResultPulser]
(
[Id] BIGINT IDENTITY (1, 1) NOT NULL,
[ReportNumber] CHAR(255) NOT NULL,
[ReportDescription] CHAR(255) NOT NULL,
[CatalogNumber] NCHAR(255) NOT NULL,
[WorkerName] NCHAR(255) DEFAULT ('') NOT NULL,
[LastCalibrationDate] DATETIME NOT NULL,
[NextCalibrationDate] DATETIME NOT NULL,
[MachineNumber] INT NOT NULL,
[EditTime] DATETIME NOT NULL,
[Age] NCHAR(255) DEFAULT ((1)) NOT NULL,
[Current] INT DEFAULT ((-1)) NOT NULL,
[Time] BIGINT DEFAULT ((-1)) NOT NULL,
[MachineName] NVARCHAR(MAX) DEFAULT ('') NOT NULL,
[BatchNumber] NVARCHAR(MAX) DEFAULT ('') NOT NULL,
CONSTRAINT [PK_HeaderResultPulser]
PRIMARY KEY CLUSTERED ([Id] ASC)
);
CREATE TABLE [dbo].[ResultPulser]
(
[Id] BIGINT IDENTITY (1, 1) NOT NULL,
[ReportNumber] CHAR(255) NOT NULL,
[BatchNumber] CHAR(255) NOT NULL,
[DateTime] DATETIME NOT NULL,
[Ocv] FLOAT(53) NOT NULL,
[OcvMin] FLOAT(53) NOT NULL,
[OcvMax] FLOAT(53) NOT NULL,
[Ccv] FLOAT(53) NOT NULL,
[CcvMin] FLOAT(53) NOT NULL,
[CcvMax] FLOAT(53) NOT NULL,
[Delta] BIGINT NOT NULL,
[DeltaMin] BIGINT NOT NULL,
[DeltaMax] BIGINT NOT NULL,
[CurrentFail] BIT DEFAULT ((0)) NOT NULL,
[NumberInTest] INT NOT NULL
);
For every row in HeaderResultPulser I have multiple rows in ResultPulser.
The key is [HeaderResultPulser].[ReportNumber]: for each value there are many rows in ResultPulser with the same [ResultPulser].[ReportNumber], each with a different [ResultPulser].[NumberInTest] value.
For example: in the ResultPulser table the data can look like this:
ReportNumber | NumberInTest
-------------+-------------
0000006211 | 1
0000006211 | 2
0000006211 | 3
0000006211 | 4
0000006211 | 5
0000006211 | 6
0000006212 | 1
0000006212 | 2
0000006212 | 3
0000006212 | 4
0000006212 | 5
NumberInTest can be 200, 500, 10000 and sometimes even more.
The ReportNumber column contains two parts: the first 7 characters are a machine number and the rest is an incrementing number.
For example, 0000006212 is [0000006][212] == [the machine number][the incrementing number]
My query, for example:
select
[HeaderResultPulser].[ReportNumber],
max(NumberInTest) as TotalCells
from
ResultPulser, HeaderResultPulser
where
((([ResultPulser].[ReportNumber] like '0000006%' and
CONVERT(INT, SUBSTRING([ResultPulser].[ReportNumber], 8, LEN([ResultPulser].[ReportNumber]))) BETWEEN '211' AND '815')
and ([HeaderResultPulser].[ReportNumber] = [ResultPulser].[ReportNumber])))
group by
[HeaderResultPulser].[ReportNumber]
Actually I want to get all the rows for machine number 0000006 where the incrementing number is 211 to 815 (inclusive).
This query takes about 6-7 seconds.
There is a lot of data (hundreds of millions of rows, heading toward billions, and the ResultPulser table will only keep growing), while the HeaderResultPulser table holds tens of thousands of rows.
The final select returns only a few hundred rows (a thousand or two in the worst case), but to compute max(NumberInTest) from ResultPulser it can touch a few million rows.
Is there any way to optimize my query? Or with this much data does it simply have to take this long?
The way you are doing joins is no longer standard. It's also hard to read, and dangerous if you ever need to use left joins. Instead of joining this way:
select *
from T1, T2
where T1.column = T2.column
Use ANSI-92 join syntax instead:
select *
from T1
join T2 on T1.column = T2.column
You said that your "key" was ReportNumber. Why isn't that declared in your schema? It sounds like you want a unique constraint on HeaderResultPulser.ReportNumber, and a foreign key on the ResultPulser table such that ReportNumber references HeaderResultPulser (ReportNumber).
Since your report number column seems to contain two different values, your table is not in First Normal Form. This is making things difficult for you. Why not split the two parts of the "report number" into two different columns when the data is entered? This will significantly improve your query performance, because you no longer need to perform an expression against the data in the table at query time to separate the ReportNumber into atomic values.
Your comment says that the first 7 characters of the ReportNumber are the MachineNumber. But you already have MachineNumber in the HeaderResultPulser table. So why not just add a separate column for Increment? If you still need ReportNumber to exist as a column, you can make it a computed column: the concatenation of MachineNumber and Increment.
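A sketch of that computed column, assuming Increment is stored separately and the machine number is zero-padded to 7 digits (the column name is illustrative):
alter table HeaderResultPulser
add ReportNumberComputed as
    right('0000000' + convert(varchar(7), MachineNumber), 7) + Increment;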
If you don't want to touch the "existing" schema, we can do a similar thing in reverse. Your query will not be completely sargable unless you can do something to the schema, because you have to perform some kind of expression on the data in the ReportNumber column. But maybe you have the option to use a calculated column to do this up front:
-- rtrim inside right(): ReportNumber is char(255), so right() on the raw value would return trailing blanks
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(ReportNumber) - 7);
Now we have the increment as a column in its own right. But it's still being calculated at query time, because it's not persisted. We can make it persisted:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(ReportNumber) - 7) persisted;
We can also index a computed column. Since your required expression is deterministic and precise (see Indexes on Computed Columns), we don't actually have to mark it as persisted:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(ReportNumber) - 7);
create index ix_headerresultpulser_increment on HeaderResultPulser(Increment);
You could do a similar set of operations to create Increment and MachineNumber on the ResultPulser table. If you always want to use both values, create an index on the combination (MachineNumber, Increment).
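For example (a sketch; the column and index names are just suggestions):
alter table ResultPulser
add MachineNumber as left(ReportNumber, 7),
    Increment as right(rtrim(ReportNumber), len(ReportNumber) - 7);
create index ix_resultpulser_machine_increment on ResultPulser(MachineNumber, Increment);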
The biggest performance gain might be eliminating the outer group by, by using a correlated subquery or lateral join:
select hrp.[ReportNumber],
(select max(rp.NumberInTest)
from ResultPulser rp
where rp.ReportNumber = hrp.ReportNumber and
right(rtrim(rp.ReportNumber), 3) between '211' and '815'
) as TotalCells
from HeaderResultPulser hrp
where hrp.ReportNumber like '0000006%';
Your logic looks like it only wants the last three characters of the ReportNumber, so I simplified the logic. I'm not 100% sure that is the case -- it just seems reasonable. Regardless, there is no need to convert the values to integers and then compare them as strings. Similar logic can be used even for longer report numbers.
You also want an index on ResultPulser(ReportNumber, NumberInTest):
create index idx_resultpulser_reportnumber_numberintest on ResultPulser(ReportNumber, NumberInTest)
EDIT:
Actually, I notice that the report number matches between the two tables. So this seems simplest:
select hrp.[ReportNumber],
(select max(rp.NumberInTest)
from ResultPulser rp
where rp.ReportNumber = hrp.ReportNumber
) as TotalCells
from HeaderResultPulser hrp
where hrp.ReportNumber >= '0000006211' and
hrp.ReportNumber <= '0000006815';
You still want to be sure you have the above index on ResultPulser.
If the ReportNumber is not a fixed 10 digits, then you can use:
where hrp.ReportNumber >= '0000006211' and
hrp.ReportNumber <= '0000006815' and
len(hrp.ReportNumber) = 10
This should also use the index and return exactly what you want.
Performance optimization of any query depends on many factors, including the environment where you host and run it. Hardware and software play an important part in optimizing heavy database queries. In your case, look into the following things:
Use ANSI-92 join syntax instead of the old comma-separated (cross join) style, e.g.
select *
from T1
join T2 on T1.column = T2.column
Put indexes on columns like:
[ReportNumber]
[NumberInTest]
Note: you may need an index on each join column that is not a primary key.
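For example (a sketch using the tables above; index names are just suggestions):
create index ix_resultpulser_reportnumber
on ResultPulser (ReportNumber) include (NumberInTest);
create index ix_headerresultpulser_reportnumber
on HeaderResultPulser (ReportNumber);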
Remember that MAX is always heavy, and it could be the main problem in your query.
Finally, you can look further into optimizing your query syntax using the following online tool, where you can specify your actual query and environment:
https://www.eversql.com/
Hope it helps.
If you really want to optimize performance, I propose adding a bit of logic beyond SQL structures.
Is it possible that a particular value of ReportNumber is present in table ResultPulser but not in table HeaderResultPulser? If not, and I suppose that is so, there is no reason to join table HeaderResultPulser at all.
Then, I propose taking advantage of the fact that the condition on ReportNumber can be expressed equivalently without splitting into substrings. For your example, the condition
([ResultPulser].[ReportNumber] like '0000006%' and
CONVERT(INT, SUBSTRING([ResultPulser].[ReportNumber], 8,
LEN([ResultPulser].[ReportNumber]))) BETWEEN '211' AND '815')
is equivalent to:
([ResultPulser].[ReportNumber] BETWEEN '0000006211' and '0000006815')
So the proposal is:
Create index on table ResultPulser(ReportNumber, NumberInTest)
Use selections similar to this:
select ReportNumber, max(NumberInTest) as TotalCells
from ResultPulser
where
ReportNumber BETWEEN '0000006211' and '0000006815'
group by
ReportNumber
(Please add brackets or double quotes and capitalization as necessary for MS SQL Server and your taste.)
I would expect a good database to execute this query with index-only access, which should be optimal from an execution point of view.
Performance depends not only on the execution path, but also on setup and hardware. Please make sure that your database has enough cache and fast disk access. Concurrent load also matters a great deal.
Simply splitting the field ReportNumber into [the machine number] and [the incrementing number] will probably not improve performance of the query in the form I proposed, but it may be very convenient for other forms of access (other WHERE clauses), and it would reflect the structure of the data. Even more important, it would free you from the imposed limits: currently you have 3 digits for [the incrementing number]. Are you sure you will never need more than 999 of them for a single [the machine number]?
Why does the field ReportNumber have type char(255) when only 10 characters are used? char(255) has a fixed length, so this is a terrible waste of space, and only database compression can help. Used space has a strong influence on performance; see the remark above about the database cache.
If both of these fields, [the machine number] and [the incrementing number], are integers, why not split ReportNumber and store them as integer types?
Side remark: the field names suggest that you are after the total number of rows in table ResultPulser that belong to a single entry in table HeaderResultPulser. The proposed query delivers that only if the numbers in NumberInTest are consecutive, without gaps. If that is not guaranteed, you have to count the rows rather than seek the maximum.
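If the numbers can have gaps, the counting variant of the same query (still served by the proposed index) is:
select ReportNumber, count(*) as TotalCells
from ResultPulser
where ReportNumber BETWEEN '0000006211' and '0000006815'
group by ReportNumber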

Adding Row in existing table (SQL Server 2005)

I want to add another row to my existing table, and I'm a bit hesitant because if I do the wrong thing it might skew the database. I have my script below and would like to hear your thoughts about it.
I want to add another row for 'Jane' in the table, which will have 'SKATING' in the ACT column.
Table: [Emp_table].[ACT].[LIST_EMP]
My script is:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
VALUES
('REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE')
Will this do the trick?
Your statement looks ok. If the database has a problem with it (for example, due to a foreign key constraint violation), it will reject the statement.
If any of the fields in your table are numeric (and not varchar or char), just remove the quotes around the corresponding field. For example, if emp_cod and line_no are int, insert the following values instead:
('REG','EMP',45233,'2016-06-20 00:00:00:00',2,'SKATING','JANE')
Inserting records into a database has always been the most common reason why I've lost a lot of my hairs on my head!
SQL is great when it comes to SELECTs or even UPDATEs, but when it comes to INSERTs it's like someone from another planet walked into the SQL standards committee and managed to get their way of doing it into the final SQL standard!
If your table does not have an automatic primary key that gets generated on every insert, then you have to code the duplicate-avoidance yourself.
Start by writing a normal SELECT to see whether the record(s) you're going to add already exist. But as Robert implied, your table may not have a primary key, because it looks like a LOG table to me. So insert away!
If it does need a unique record every time, then I strongly suggest you create a primary key for the table, either an auto-generated one or a combination of your existing columns.
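For example, a sketch assuming the five columns below really are unique in combination and non-nullable:
ALTER TABLE [Emp_table].[ACT].[LIST_EMP]
ADD CONSTRAINT PK_LIST_EMP
PRIMARY KEY ([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO]);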
Assuming the first five columns combined make a unique key, this select will determine whether the data you're inserting already exists...
SELECT COUNT(*) AS FoundRec FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND [EMP_COD] = wsEmpCod AND [DATE] = wsDate AND [LINE_NO] = wsLineno
You will have to replace the wsXXX placeholders with literal values, or DECLARE them earlier in your script.
If you run this alone and receive a value of 1 or more, then the data already exists in your table, at least in those first 5 columns. A true duplicate test would require testing EVERY column in your table, but this should give you an idea.
For the INSERT, to do it all as one statement, use INSERT ... SELECT (a plain VALUES clause cannot take a WHERE):
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
SELECT 'REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE'
WHERE (SELECT COUNT(*) FROM [Emp_table].[ACT].[LIST_EMP]
       WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND
             [EMP_COD] = wsEmpCod AND [DATE] = wsDate AND
             [LINE_NO] = wsLineno) = 0
Just replace the wsXXX variables with the values you want to insert.
I hope that made sense.

Why am I getting "String or Binary Data would be Truncated Error" when there's no INSERT statement?

I have this query:
USE [SomeDatabase];
GO
DECLARE @percentageValue decimal(15,4) = 1.50;
SELECT a.ID, a.Amount, a.Status
FROM [dbo].ATable as a
INNER JOIN [dbo].BTable as b
ON a.LinkToB = b.ID
INNER JOIN [OtherDatabase].[dbo].CTable as value
ON value.[Key] = CONCAT(N'APrefixAboutThisLongThatsNecessaryBecauseDontAsk',b.AltID)
WHERE a.Status = N'SomeStatus'
AND a.Amount > (COALESCE(TRY_CONVERT(DECIMAL(15,2), value.Value), 0)*@percentageValue);
GO
(Actual column names redacted for confidentiality)
And I'm getting the traditional:
"Msg 8152, Level 16, State 10, Line 3
String or binary data would be truncated."
error. Google tells me that I'm trying to insert something into a column that's too small, which makes sense.
However, this isn't an INSERT operation (and this is literally all of the SQL for my query), so I can't for the life of me detect where the truncation is happening or why. I assume this is something that's in the bowels of Transact-SQL, but the weirdest issue is that I'm getting results from the query despite the error.
On request, here are the relevant parts of the table schema.
USE [SomeDatabase]
CREATE TABLE [dbo].[ATable](
[ID] [uniqueidentifier] NOT NULL,
[Amount] [decimal](15, 2) NOT NULL,
[Status] [nvarchar](32) NOT NULL,
[LinkToB] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
GO
CREATE TABLE [dbo].[BTable](
[ID] [uniqueidentifier] NOT NULL,
[AltID] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_TableB] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
GO
USE [OtherDatabase]
GO
CREATE TABLE [dbo].[CTable](
[ID] [uniqueidentifier] NOT NULL,
[Key] [nvarchar](100) NOT NULL,
[Value] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_TableC] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
Okay, this one turns out to be interesting but obvious once you know the secret.
As stated in the comments (and in the schema), value.Value is an nvarchar(max). It's the traditional key-value antipattern for storing data that you don't want to model somewhere specific. Now, we're running try_convert against this nvarchar(max) column, but that's fine, because the
ON Key = N'SomethingSpecific'
clause will mean that it only runs try_convert on the Key=N'SomethingSpecific' row, right?
Right?
Nope.
Depending on the data in the table, the execution plan can choose to run try_convert on every value in the column, and one of the rows in the nvarchar(max) column holds a value beyond the capacity of try_convert's parameter. Hence crashy. This also explains why I get all the results I expect back: it evaluates those results and then hits the crash on data rows I expected it to be ignoring.
Even better, unrelated data or structure changes can change the behaviour of the execution plan, so this bug can go "Heisenbug" really easily and vanish given more or less data, minor changes to the structure, and, on one debugging pass, I'm pretty sure, the whitespace.
So what's the simplest solution? Get the config value into a variable separately from the try_convert, and use that instead.
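A minimal sketch of that fix, assuming the lookup key resolves to a single config row (the key literal below is a stand-in):
DECLARE @configValue decimal(15,2);
-- do the conversion on exactly one row, outside the join
SELECT @configValue = COALESCE(TRY_CONVERT(decimal(15,2), [Value]), 0)
FROM [OtherDatabase].[dbo].CTable
WHERE [Key] = N'TheOneKeyYouActuallyNeed'; -- hypothetical key
-- then compare against the variable in the main query:
-- ... AND a.Amount > @configValue * @percentageValue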
And to anyone designing a database or writing database code:
Do NOT manipulate data types inside an SQL where clause
Do NOT use one column to store wildly different datatypes and write complex conversion routines to handle the issues
DO use the correct datatype for your data and treat additional tables with the correct datatypes as good
Remember that SQL functions often do INSERT-like operations under the hood
For grins, give this a try:
USE [SomeDatabase];
GO
DECLARE @percentageValue decimal(15,4) = 1.50;
DECLARE @prefix nvarchar(max) = N'APrefixAboutThisLongThatsNecessaryBecauseDontAsk';
DECLARE @status nvarchar(32) = N'SomeStatus'; -- declared so @status below resolves
SELECT a.ID, a.Amount, a.Status
FROM [dbo].ATable as a
INNER JOIN [dbo].BTable as b
ON a.LinkToB = b.ID
INNER JOIN [OtherDatabase].[dbo].CTable as value
ON value.[Key] = CONCAT(@prefix, b.AltID)
WHERE a.Status = @status
AND a.Amount > (COALESCE(TRY_CONVERT(DECIMAL(15,2), value.Value), 0)*@percentageValue);
GO

Enumerated text columns in SQL

I have a number of tables with text columns that contain only a few distinct values. I often weigh the tradeoff between the benefits (primarily reduced row size) of extracting the possible values into a lookup table and storing a small key in the main table, against the amount of work required to do so.
For the columns that have a fixed set of values known in advance (enumerated values), this isn't so bad, but the more painful case is when I know I have a small set of unique values, but I don't know in advance what they will be.
For example, if I have a table that stores log information on different URLs in a web application:
CREATE TABLE [LogData]
(
ResourcePath varchar(1024) NOT NULL,
EventTime datetime NOT NULL,
ExtraData varchar(MAX) NOT NULL
)
I waste a lot of space by repeating the resource path for every request; there will be a very large number of duplicate entries in this table. I usually end up with something like this:
CREATE TABLE [LogData]
(
ResourcePathId smallint NOT NULL,
EventTime datetime NOT NULL,
ExtraData varchar(MAX) NOT NULL
)
CREATE TABLE [ResourcePaths]
(
ResourcePathId smallint NOT NULL,
ResourceName varchar(1024) NOT NULL
)
In this case, however, I no longer have a simple way to append data to the LogData table. I have to do a lookup on the ResourcePaths table to get the id, add the path if it is missing, and only then perform the actual insert. This makes the code much more complicated and turns my write-only logging function into something that has to transact against the lookup table.
Am I missing something obvious?
If you have a unique index on ResourceName, the lookup should be very fast even on a big table. However, it has disadvantages. For instance, if you log a lot of data and have to archive it off periodically, and want to archive the previous month or year of LogData, you are forced to keep all of ResourcePaths. You can come up with solutions for all of that.
Yes: insert from the existing data, doing the lookup as part of the insert.
Given @resource, @time and @data as inputs:
insert into LogData (ResourcePathId, EventTime, ExtraData)
select ResourcePathId, @time, @data
from ResourcePaths
where ResourceName = @resource
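If the path may not exist yet, a hedged sketch of the add-if-missing step (assuming ResourcePathId is changed to an IDENTITY column, which the DDL above does not show, and ignoring races under heavy concurrency):
-- add the path only when it is new
insert into ResourcePaths (ResourceName)
select @resource
where not exists (select 1 from ResourcePaths where ResourceName = @resource);

-- then log against whatever id it now has
insert into LogData (ResourcePathId, EventTime, ExtraData)
select ResourcePathId, @time, @data
from ResourcePaths
where ResourceName = @resource;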