Why upsert is not a fundamental SQL operation [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Why doesn't SQL support the upsert use case? I am not asking how to do it in a particular DB(*). What I want to know is why upsert is not a fundamental operation like INSERT and UPDATE. It is a pretty straightforward use case, right? I guess there must be some fundamental database first principle that gets broken when doing an upsert, or some technical challenge the engine faces when confronted with one.
*. I know about the MySQL syntax and the SQL MERGE syntax. BTW, even while using such DB-specific syntax you need to be careful about atomicity and locking. And with the MERGE syntax, it doesn't feel right having to create a pseudo table.
Edit: I am editing this to clarify that I am not asking for an opinion, so I don't think this question should be blocked.

Because it isn't easily handled, both ACID-wise and syntax-wise.
The conditions for "update if exists" aren't clear.
For example, replace "insert into" with "upsert" in the query below:
insert into t_something
select * from t_whatever
No foreign keys, no primary keys.
How do you want to update?
Would the WHERE condition apply to the SELECT, or to the UPDATE?
Ultimately, you have to write the condition yourself, and then you can just as well write an explicit "update if exists, else insert"...
Usually, when you're asking yourself the upsert question, you're handling inserting/updating wrong.
You're thinking in object terms instead of set terms.
You want to loop through an array of objects, and insert if count(*) on exists is 0 else update.
That's how object-oriented imperative programming works, but that's not how SQL works.
In SQL, you operate with a SET.
You can easily do an inner-join UPDATE on the set,
and a LEFT JOIN / WHERE ... IS NULL INSERT on the same set.
That's just as comfortable as a MERGE, a lot more readable, and simpler to debug.
And it might well be faster.
You can already ensure it is all atomic by putting the UPDATE and INSERT into a single transaction.
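To make that concrete, here is a minimal sketch of the set-based two-statement upsert (the tables t_target(id, val) and t_source(id, val) are hypothetical):

```sql
-- Hypothetical tables: t_target(id, val) and t_source(id, val).
BEGIN TRANSACTION;

-- Inner-join UPDATE: rewrite rows that already exist in the target.
UPDATE tgt
SET    tgt.val = src.val
FROM   t_target AS tgt
INNER JOIN t_source AS src ON src.id = tgt.id;

-- Left-join / WHERE NULL INSERT: add rows not in the target yet.
INSERT INTO t_target (id, val)
SELECT src.id, src.val
FROM   t_source AS src
LEFT JOIN t_target AS tgt ON tgt.id = src.id
WHERE  tgt.id IS NULL;

COMMIT TRANSACTION;
```

Note that the transaction alone does not stop a concurrent writer from inserting the same key between the two statements; under heavy concurrency you would additionally raise the isolation level or put a lock hint on the target.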
Thinking of upsert, which idiocy do you want next? "UpSertLeteTrunc"? "MerDel"?
Or perhaps "truncsert"?
There are more important things to do, by far.
This is how I do upsert with MERGE on SQL Server:
-- How to create the XML
/*
DECLARE @xml XML
SET @xml = ( SELECT (SELECT * FROM T_Benutzer FOR XML PATH('row'), ROOT('table'), ELEMENTS xsinil) AS outerXml )
-- SELECT @xml
*/
DECLARE @xml xml
SET @xml = '<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<PLK_UID>7CA68E6E-E998-FF92-BE70-126064765EAB</PLK_UID>
<PLK_Code>A2 Hoch</PLK_Code>
<PLK_PS_UID>6CF3B5AB-C6C8-4A12-8717-285F95A1084B</PLK_PS_UID>
<PLK_DAR_UID xsi:nil="true" />
<PLK_Name_DE>Mit Legende</PLK_Name_DE>
<PLK_Name_FR>Avec Légende</PLK_Name_FR>
<PLK_Name_IT>Con Leggenda</PLK_Name_IT>
<PLK_Name_EN>With Legend</PLK_Name_EN>
<PLK_IsDefault>0</PLK_IsDefault>
<PLK_Status>1</PLK_Status>
</row>
</table>'
DECLARE @handle INT
DECLARE @PrepareXmlStatus INT
EXEC @PrepareXmlStatus = sp_xml_preparedocument @handle OUTPUT, @xml
;WITH CTE AS
(
SELECT
PLK_UID
,PLK_Code
,PLK_PS_UID
,PLK_DAR_UID
,PLK_Name_DE
,PLK_Name_FR
,PLK_Name_IT
,PLK_Name_EN
,PLK_IsDefault
,PLK_Status
FROM OPENXML(@handle, '/table/row', 2) WITH
(
"PLK_UID" uniqueidentifier 'PLK_UID[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_Code" character varying(10) 'PLK_Code[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_PS_UID" uniqueidentifier 'PLK_PS_UID[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_DAR_UID" uniqueidentifier 'PLK_DAR_UID[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_Name_DE" national character varying(255) 'PLK_Name_DE[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_Name_FR" national character varying(255) 'PLK_Name_FR[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_Name_IT" national character varying(255) 'PLK_Name_IT[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_Name_EN" national character varying(255) 'PLK_Name_EN[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_IsDefault" bit 'PLK_IsDefault[not(@*[local-name()="nil" and . ="true"])]'
,"PLK_Status" int 'PLK_Status[not(@*[local-name()="nil" and . ="true"])]'
) AS tSource
WHERE (1=1)
-- AND NOT EXISTS
-- (
-- SELECT * FROM T_VWS_Ref_PdfLegendenKategorie
-- WHERE T_VWS_Ref_PdfLegendenKategorie.PLK_UID = tSource.PLK_UID
--)
)
-- SELECT * FROM CTE
MERGE INTO T_VWS_Ref_PdfLegendenKategorie AS A
USING CTE ON CTE.PLK_UID = A.PLK_UID
WHEN MATCHED
THEN UPDATE
SET A.PLK_Code = CTE.PLK_Code
,A.PLK_PS_UID = CTE.PLK_PS_UID
,A.PLK_DAR_UID = CTE.PLK_DAR_UID
,A.PLK_Name_DE = CTE.PLK_Name_DE
,A.PLK_Name_FR = CTE.PLK_Name_FR
,A.PLK_Name_IT = CTE.PLK_Name_IT
,A.PLK_Name_EN = CTE.PLK_Name_EN
,A.PLK_IsDefault = CTE.PLK_IsDefault
,A.PLK_Status = CTE.PLK_Status
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
PLK_UID
,PLK_Code
,PLK_PS_UID
,PLK_DAR_UID
,PLK_Name_DE
,PLK_Name_FR
,PLK_Name_IT
,PLK_Name_EN
,PLK_IsDefault
,PLK_Status
)
VALUES
(
CTE.PLK_UID
,CTE.PLK_Code
,CTE.PLK_PS_UID
,CTE.PLK_DAR_UID
,CTE.PLK_Name_DE
,CTE.PLK_Name_FR
,CTE.PLK_Name_IT
,CTE.PLK_Name_EN
,CTE.PLK_IsDefault
,CTE.PLK_Status
)
-- WHEN NOT MATCHED BY SOURCE THEN DELETE
;
EXEC sp_xml_removedocument @handle
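One caveat worth adding to this answer: MERGE by itself is not race-free when several sessions upsert the same keys concurrently; the usual mitigation on SQL Server is a HOLDLOCK (serializable) hint on the target. A minimal sketch, using hypothetical tables t_target(id, val) and t_source(id, val) rather than the schema above:

```sql
-- HOLDLOCK serializes the read/modify on matching key ranges,
-- preventing two sessions from both taking the INSERT branch.
MERGE INTO t_target WITH (HOLDLOCK) AS tgt
USING t_source AS src ON src.id = tgt.id
WHEN MATCHED THEN
    UPDATE SET tgt.val = src.val
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, val) VALUES (src.id, src.val);
```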


Converting a single XML bracket column to multiple columns in T-SQL SQL-SERVER

I've seen questions similar to mine, but the XML format has always been different, as the XML format I have does not follow the "standard" structure. My table looks like the following (a single column with an XML fragment as each row value):
|VAL|
|<person name="bob" age="22" city="new york" occupation="student"></person>|
|<person name="bob" age="22" city="new york" occupation="student"></person>|
And the outcome I'm looking for is:
|Name|age|city |occupation|
|bob |22 |new york|student |
|bob |22 |new york|student |
I can create a hardcoded script with these column names, but the problem is that I have over 20 tables that would then all require a custom script. My thinking is that there should be a way to do this dynamically: given a destination table and a source table (XML), a procedure could generate this data.
Your question is not fully clear...
As far as I understand it, you have a variety of different XMLs and you want to read them generically. If this is true, I'd suggest reflecting this in your sample data in your next question.
One general statement: there is no way around dynamically created statements in cases where you want to set the descriptive elements of a result set (in this case, the names of the columns) dynamically. T-SQL relies on some things you must know in advance.
Try this:
I set up a mockup scenario to simulate your issue (please try to do this yourself in your next question):
DECLARE @tbl TABLE(ID INT IDENTITY, Descr VARCHAR(100), VAL XML);
INSERT INTO @tbl VALUES
('One person',N'<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One more person','<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One country','<country name="Germany" capital="Berlin" continent="Europe"></country>');
--this query relies on all possible attributes known in advance.
--common attributes, like the name, are returned for a person and for a country
--differing attributes return as NULL.
--One advantage might be, that you can use a specific datatype if appropriate.
SELECT t.ID
,t.Descr
,t.VAL.value('(/*[1]/@name)[1]','nvarchar(max)') AS [name]
,t.VAL.value('(/*[1]/@age)[1]','nvarchar(max)') AS [age]
,t.VAL.value('(/*[1]/@city)[1]','nvarchar(max)') AS [city]
,t.VAL.value('(/*[1]/@occupation)[1]','nvarchar(max)') AS [occupation]
,t.VAL.value('(/*[1]/@capital)[1]','nvarchar(max)') AS [capital]
,t.VAL.value('(/*[1]/@continent)[1]','nvarchar(max)') AS [continent]
FROM @tbl t;
--This query returns as classical EAV (entity-attribute-value) list
--In this result you get each attribute on its own line
SELECT t.ID
,t.Descr
,A.attrs.value('local-name(.)','nvarchar(max)') AS AttrName
,A.attrs.value('.','nvarchar(max)') AS AttrValue
FROM @tbl t
CROSS APPLY t.VAL.nodes('/*[1]/@*') A(attrs);
Both approaches might be generated as a statement on string-level and then executed by EXEC() or sp_executesql.
Hint: One approach might be to insert the EAV list into a tolerant staging table and proceed from there with conditional aggregation, PIVOT or hardcoded VIEWs.
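To illustrate that hint, here is a minimal sketch of the conditional-aggregation step (the staging table and sample values are hypothetical), turning the EAV list back into one row per entity:

```sql
-- Hypothetical staging table filled from the EAV query above.
DECLARE @Staging TABLE (ID INT, AttrName NVARCHAR(100), AttrValue NVARCHAR(MAX));
INSERT INTO @Staging VALUES
 (1, N'name', N'bob'),     (1, N'age', N'22')
,(2, N'name', N'Germany'), (2, N'capital', N'Berlin');

-- Conditional aggregation: one row per entity, one column per known attribute.
SELECT ID
      ,MAX(CASE WHEN AttrName = N'name'    THEN AttrValue END) AS [name]
      ,MAX(CASE WHEN AttrName = N'age'     THEN AttrValue END) AS [age]
      ,MAX(CASE WHEN AttrName = N'capital' THEN AttrValue END) AS [capital]
FROM @Staging
GROUP BY ID;
```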
Dynamic approach
In order to read the <person> elements we would need this:
SELECT t.ID
,t.Descr
,t.VAL.value('(/*[1]/@name)[1]','nvarchar(max)') AS [name]
,t.VAL.value('(/*[1]/@age)[1]','nvarchar(max)') AS [age]
,t.VAL.value('(/*[1]/@city)[1]','nvarchar(max)') AS [city]
,t.VAL.value('(/*[1]/@occupation)[1]','nvarchar(max)') AS [occupation]
FROM @tbl t
WHERE VAL.value('local-name(/*[1])','varchar(100)')='person';
All we have to do is generate the changing part. Try this:
A new mockup with a real table
CREATE TABLE SimulateYourTable(ID INT IDENTITY, Descr VARCHAR(100), VAL XML);
INSERT INTO SimulateYourTable VALUES
('One person',N'<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One more person','<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One country','<country name="Germany" capital="Berlin" continent="Europe"></country>');
--Filter for <person> entities
DECLARE @entityName NVARCHAR(100)='person';
--This is a string representing the command
DECLARE @cmd NVARCHAR(MAX)=
'SELECT t.ID
,t.Descr
***columns here***
FROM SimulateYourTable t
WHERE VAL.value(''local-name(/*[1])'',''varchar(100)'')=''***name here***''';
--with this we can create all the columns
--Hint: With SQL Server 2017+ there is STRING_AGG() - much simpler!
DECLARE @columns NVARCHAR(MAX)=
(
SELECT CONCAT(',t.VAL.value(''(/*[1]/@',Attrib.[name],')[1]'',''nvarchar(max)'') AS ',QUOTENAME(Attrib.[name]))
FROM SimulateYourTable t
CROSS APPLY t.VAL.nodes('//@*') AllAttrs(a)
CROSS APPLY (SELECT a.value('local-name(.)','varchar(max)')) Attrib([name])
WHERE VAL.value('local-name(/*[1])','varchar(100)')=@entityName
GROUP BY Attrib.[name]
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)');
--Now we stuff this into our command
SET @cmd=REPLACE(@cmd,'***columns here***',@columns);
SET @cmd=REPLACE(@cmd,'***name here***',@entityName);
--This is the command.
--Hint: You might use this to create physical VIEWs without the need to type them in...
PRINT @cmd;
You can use EXEC(@cmd) to execute this dynamic SQL and check the result.

T-SQL Identity Seed Expression [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Is it possible to use an expression for identity seeds in T-SQL?
We're putting a bunch of tables together, using BIGINT for the ID columns, but want to stagger the seeds so that the tables are not overlapping their ID number spaces.
This is much easier to do using hex values, so we can mask out the highest two bytes or so, such as seed 1 * 0x1000000000000 for table1 and seed 2 * 0x1000000000000 for table2 etc. - thereby still leaving plenty of possible IDs available for each table.
The problem here is, SQL doesn't like seeing the hex values or multiplication in the IDENTITY statement, so we tried manually casting them to BIGINT but the error persisted: "incorrect syntax, expected INTEGER or NUMERIC."
It seems like T-SQL doesn't want to see anything other than a single literal value, in decimal (not hex), with no math operations.
We can deal with this, by doing the math ourselves and converting the numbers to decimal - but we'd like to avoid this if possible, since the numbers are more difficult to keep track of in decimal format - bug prone, etc.
(I should explain "bug-prone": we use these values to determine which table an object belongs to, based solely on its ID value being in the appropriate number space - those first two bytes being a sort of "table ID".)
However, is there another way to accomplish what I'm describing using hex values and multiplication, while using some weird syntax that T-SQL can accept?
I know this is an inconvenience, not a blocking issue, but I want to make sure there truly aren't any alternatives before we settle on this workaround.
Just blend bad ideas by using dynamic SQL:
declare @Bar as BigInt = 0x1000000000000;
declare @Foo as NVarChar(1000) = 'dbcc checkident(''Foo'', RESEED, ' + cast( 2 * @Bar as NVarChar(64) ) + ')';
exec sp_executesql @Foo;
I'm sure a RegEx would improve it.
Create a trigger (before insert) for this table, and disable identity.
Convert Hex to int: Convert integer to hex and hex to integer
Example:
CREATE TRIGGER TRIGGER_NAME
ON TABLE_NAME
INSTEAD OF INSERT
AS
BEGIN
IF (SELECT ID FROM INSERTED) IS NULL
BEGIN
DECLARE @INITIAL_ID BIGINT = (SELECT CONVERT(BIGINT, 0x1000000000000) * 2)
DECLARE @NEW_ID BIGINT =
ISNULL( (SELECT MAX(ID) FROM TABLE_NAME) + 1, @INITIAL_ID )
SELECT * INTO #INSERTED FROM INSERTED
UPDATE #INSERTED SET ID = @NEW_ID
INSERT INTO TABLE_NAME SELECT * FROM #INSERTED
END
ELSE
BEGIN
INSERT INTO TABLE_NAME SELECT * FROM INSERTED
END
END
GO
When you create the table you can set the initial seed value.
create table Table1(Id int Identity(10000000,1), Name varchar(255));
or you can use this statement on a table that is already created
DBCC CHECKIDENT ('dbo.Table1', RESEED, 20000000);
The next entry will be 20000001 assuming you have the step = 1
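Since IDENTITY only accepts a plain decimal literal, one low-tech workaround is to let SQL Server compute the decimal value of the hex expression once, then paste the printed number into the CREATE TABLE. A sketch (the table name Table2 is illustrative):

```sql
-- Compute the staggered seed in decimal once; 0x1000000000000 = 2^48.
SELECT CONVERT(BIGINT, 0x1000000000000) * 2 AS Table2Seed;
-- Then paste the resulting number as the literal seed, e.g.:
-- CREATE TABLE Table2 (Id BIGINT IDENTITY(562949953421312, 1), Name VARCHAR(255));
```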

How to escape ampersand in MS SQL

I have a table named tblCandy with an XML field named CandySpecs. When I try to add a value containing an ampersand (&) I get the error:
UPDATE tblCandy SET OrigOtherData.modify ('insert <BrandName>M&Ms</BrandName> as first into (CandySpecs/Table)[1]') WHERE RecordID = 1
Msg 2282, Level 16, State 1, Line 1
XQuery [tblCandy.CandySpecs.modify()]: Invalid entity reference
I’ve tried various escape sequences with no luck:
/&
\&
&&
There is a lot of guidance out there on this issue and I’m wondering if there is one best way to address this problem.
Here's a much better way to deal with this:
UPDATE tblCandy SET OrigOtherData.modify ('insert <BrandName><![CDATA[M&Ms]]></BrandName> as first into (CandySpecs/Table)[1]') WHERE RecordID = 1
Explanation: the CDATA tag tells the XML to ignore character markup for this block of data.
Related StackOverflow question (not strictly a dupe, but would be worth reading if you're not familiar with this): What does <![CDATA[]]> in XML mean?
This will bypass not only the &, but also other potentially breaking pieces of data such as < and > that could potentially exist within the data you're dealing with.
Special symbols in SQL Server are escaped with \.
In your example, the statement would look like the following:
UPDATE tblCandy SET OrigOtherData.modify ('insert <BrandName>M\&Ms</BrandName> as first into (CandySpecs/Table)[1]') WHERE RecordID = 1
Use &amp; instead of just &.
I found the answer on this article: http://www.techrepublic.com/article/beware-of-the-ampersand-when-using-xml/
SET NOCOUNT ON
GO
CREATE TABLE tblCandy ( Id INT, Brandname XML )
GO
INSERT INTO tblCandy VALUES ( 1, '<Brandname >test</Brandname >' )
GO
SELECT 'before', * FROM tblCandy
UPDATE tblCandy
SET Brandname.modify('replace value of (//Brandname/text())[1]
with string("as first into")')
WHERE Id = 1
SELECT 'After', * FROM tblCandy
GO
DROP TABLE tblCandy
GO

SQL SELECT ... IN(xQuery)

(My first XQuery - it may be a bit basic, but it's the only time I need to use it at the moment.)
DECLARE @IdList XML;
SET @IdList =
'<Ids>
<Id>6faf5db8-b434-437f-99c8-70299f82dab4</Id>
<Id>5b3ddaf1-3412-471a-a6cf-71f8e1c31168</Id>
<Id>1da6136d-2ff5-44cc-8510-4713451aac4d</Id>
</Ids>';
What I want to do is:
SELECT * FROM [MyTable] WHERE [Id] IN ( /* This list of Id's in the XML */ );
What is the desired way to do this?
Note: The format of the XML (passed into a Stored Procedure from C#) is also under my control, so if there is a better structure (for performance), then please include details.
Also, there could be 1000's of Id's in the real list if that makes a difference..
Thanks
Rob
If you're passing 1000's of Id's, I don't think this will be a stellar performer, but give it a try:
DECLARE @IdList XML;
SET @IdList =
'<Ids>
<Id>6faf5db8-b434-437f-99c8-70299f82dab4</Id>
<Id>5b3ddaf1-3412-471a-a6cf-71f8e1c31168</Id>
<Id>1da6136d-2ff5-44cc-8510-4713451aac4d</Id>
</Ids>';
select * from mytable where id in
(
select cast(T.c.query('text()') as varchar(36)) as result from @IdList.nodes('/Ids/Id') as T(c)
)
You might look at table valued parameters as a better way, unless you already have your data in XML.
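For completeness, a table-valued parameter version might look like the following sketch (the type name GuidList and procedure name GetByIds are hypothetical; TVPs require SQL Server 2008 or later):

```sql
-- One-time setup: a table type to carry the list of Ids.
CREATE TYPE dbo.GuidList AS TABLE (Id UNIQUEIDENTIFIER PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetByIds @Ids dbo.GuidList READONLY
AS
BEGIN
    -- A join instead of IN: the optimizer can use the PK on the TVP.
    SELECT t.*
    FROM   [MyTable] AS t
    INNER JOIN @Ids AS i ON i.Id = t.[Id];
END
GO
-- From C#, pass a DataTable (or IEnumerable<SqlDataRecord>) as the
-- structured parameter; no XML shredding is needed on the server.
```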

How to write a stored procedure for searching (CSV)?

How can I write a stored procedure for searching for a particular string in a table column, given a set of strings (a CSV string)?
Like: select * from xxx where tags like ('oscar','rahman','slumdog')
How can I write the procedure for that combination of tags?
To split a comma-separated string...
You could then apply this list to Oded's example to create the LIKE parts of the WHERE clause on the fly.
DECLARE @pos int, @currentLocation varchar(100), @input varchar(2048)
SELECT @pos=0
SELECT @input = 'oscar,rahman,slumdog'
SELECT @input = @input + ','
CREATE TABLE #tempTable (temp varchar(100) )
WHILE CHARINDEX(',',@input) > 0
BEGIN
SELECT @pos=CHARINDEX(',',@input)
SELECT @currentLocation = RTRIM(LTRIM(SUBSTRING(@input,1,@pos-1)))
INSERT INTO #tempTable (temp) VALUES (@currentLocation)
SELECT @input=SUBSTRING(@input,@pos+1,2048)
END
SELECT * FROM #tempTable
DROP TABLE #tempTable
First off, the use of like for exact matches is sub-optimal. Might as well use =, and if doing so, you can use the IN syntax:
select * from xxx
where tags IN ('oscar', 'rahman', 'slumdog')
I am guessing you are not looking for an exact match, but for any record where the tags field contains all of the tags.
This would be something like this:
select * from xxx
where tags like '%oscar%'
and tags like '%rahman%'
and tags like '%slumdog%'
This would not be very fast or performant, though.
Think about moving this kind of logic into your application, where it is faster and easier to do.
Edit:
Following the comments - there are lots of examples out there on how to parse delimited strings. You can put the pieces in a table and use dynamic SQL to generate your query.
But this will perform badly, and SQL Server will not be able to cache query plans for this kind of thing. As I said above - think about moving this kind of logic to the application level.
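As a side note, on SQL Server 2016 or later the parsing step disappears entirely: the built-in STRING_SPLIT function shreds the CSV, and a NOT EXISTS check expresses "the tags column must match every term". A sketch against the question's xxx/tags names:

```sql
DECLARE @csv varchar(200) = 'oscar,rahman,slumdog';

-- Keep only rows whose tags column contains every term in the CSV list:
-- a row fails as soon as one term is NOT found in its tags.
SELECT x.*
FROM   xxx AS x
WHERE  NOT EXISTS (
    SELECT 1
    FROM   STRING_SPLIT(@csv, ',') AS s
    WHERE  x.tags NOT LIKE '%' + LTRIM(RTRIM(s.value)) + '%'
);
```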