Can I have SQL incrementally count XML elements while parsing? - sql

So this is my first foray into parsing XML, and I'm trying to figure out how to get this to work how I want.
Given the following XML format:
<Tiles>
<TileRow>
<TileValue>2</TileValue>
<TileValue>3</TileValue>
<TileValue>4</TileValue>
</TileRow>
<TileRow>
<TileValue>2</TileValue>
<TileValue>7</TileValue>
</TileRow>
</Tiles>
I want to put it into a SQL table as follows:
| X | Y | Val |
|---|---|-----|
| 1 | 1 | 2 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 1 | 2 |
| 2 | 2 | 7 |
Basically, imagine a grid, and each "TileRow" starts a new Row in that grid. Each "TileValue" assigns the Column position in that grid, with the actual TileValue being what's in the 'cell' in that grid.
Is there a way to make SQL 'count' each time it passes over an element, or something to that effect?

Please try the following solution.
It is based on the Node Order Comparison operator in XQuery.
Node Order Comparison Operators
SQL
DECLARE @xml XML =
N'<Tiles>
<TileRow>
<TileValue>2</TileValue>
<TileValue>3</TileValue>
<TileValue>4</TileValue>
</TileRow>
<TileRow>
<TileValue>2</TileValue>
<TileValue>7</TileValue>
</TileRow>
</Tiles>';
SELECT c.value('for $i in . return count(/Tiles/TileRow[. << $i])', 'INT') AS [X]
, c.value('for $i in . return count(../*[. << $i]) + 1', 'INT') AS [Y]
, c.value('(./text())[1]', 'INT') as Value
FROM @xml.nodes('/Tiles/TileRow/TileValue') AS t(c);
Output
+---+---+-------+
| X | Y | Value |
+---+---+-------+
| 1 | 1 | 2 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 1 | 2 |
| 2 | 2 | 7 |
+---+---+-------+
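To land these rows in a table, as the question asks, the same SELECT can feed an INSERT. The following is a minimal T-SQL sketch rather than part of the original answer; the table variable @Tiles and its column names are assumptions chosen to match the desired output in the question.

```sql
-- Sketch only: @Tiles is a hypothetical target table (an assumption, not from the answer).
DECLARE @xml XML =
N'<Tiles>
<TileRow>
<TileValue>2</TileValue>
<TileValue>3</TileValue>
<TileValue>4</TileValue>
</TileRow>
<TileRow>
<TileValue>2</TileValue>
<TileValue>7</TileValue>
</TileRow>
</Tiles>';

DECLARE @Tiles TABLE (
    X   INT NOT NULL,   -- grid row: number of TileRow elements at or before the value
    Y   INT NOT NULL,   -- grid column: position of the TileValue within its TileRow
    Val INT NOT NULL,
    PRIMARY KEY (X, Y)
);

-- Same XQuery expressions as the answer above, used as the source of an INSERT.
INSERT INTO @Tiles (X, Y, Val)
SELECT c.value('for $i in . return count(/Tiles/TileRow[. << $i])', 'INT')
     , c.value('for $i in . return count(../*[. << $i]) + 1', 'INT')
     , c.value('(./text())[1]', 'INT')
FROM @xml.nodes('/Tiles/TileRow/TileValue') AS t(c);

SELECT X, Y, Val FROM @Tiles ORDER BY X, Y;
```

The primary key on (X, Y) is a design choice, not a requirement: it makes a duplicated grid position fail loudly instead of silently producing two values for one cell.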

Unfortunately, SQL Server does not support returning position() directly; it only allows it inside a predicate. If it did, you could simply query:
select
v.TileValue.value('position(parent::*)[1]','int'),
v.TileValue.value('./position()[1]','int'),
v.TileValue.value('text()[1]','int')
from @x.nodes('/Tiles/TileRow/TileValue') v(TileValue)
Instead, you can simulate it with ROW_NUMBER():
select
v1.rn,
row_number() over (partition by v1.rn order by (select 1)),
v2.TileValue.value('text()[1]','int')
from (
select v1.TileRow.query('.') TileRow,
row_number() over (order by (select 1)) rn
from @x.nodes('/Tiles/TileRow') v1(TileRow)
) v1
cross apply v1.TileRow.nodes('TileRow/TileValue') v2(TileValue)

Related

SELECT 1 ID and all belonging elements

I am trying to write a query (JSON or otherwise) that returns the result in the following way:
one row per main_message_id, together with all the messages that belong to it (like the second table below). JSON is not a requirement; if it works with another method, that is fine.
I store the data like this:
+-----------------+---------+----------------+
| main_message_id | message | sub_message_id |
+-----------------+---------+----------------+
| 1 | test 1 | 1 |
| 1 | test 2 | 2 |
| 1 | test 3 | 3 |
| 2 | test 4 | 4 |
| 2 | test 5 | 5 |
| 3 | test 6 | 6 |
+-----------------+---------+----------------+
I would like to create a query that returns the data like this:
+-----------------+-----------------------+--+
| main_message_id | message | |
+-----------------+-----------------------+--+
| 1 | {test1}{test2}{test3} | |
| 2 | {test4}{test5}{test6} | |
| 3 | {test7}{test8}{test9} | |
+-----------------+-----------------------+--+
You can use json_agg() for that:
select main_message_id, json_agg(message) as messages
from the_table
group by main_message_id;
Note that {test1}{test2}{test3} is invalid JSON; the above will return a valid JSON array, e.g. ["test1", "test2", "test3"].
If you just want a comma-separated list, use string_agg():
select main_message_id, string_agg(message, ', ') as messages
from the_table
group by main_message_id;
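Neither aggregate guarantees the order of the collected messages unless one is requested. A minimal PostgreSQL sketch, assuming sub_message_id defines the order you want (that choice is an assumption; the question does not say so), with the sample rows inlined so it runs on its own:

```sql
-- Sketch only: the CTE stands in for the real table and reuses the question's sample rows.
WITH the_table (main_message_id, message, sub_message_id) AS (
    VALUES (1, 'test 1'::text, 1),
           (1, 'test 2', 2),
           (1, 'test 3', 3),
           (2, 'test 4', 4),
           (2, 'test 5', 5),
           (3, 'test 6', 6)
)
SELECT main_message_id,
       -- ORDER BY inside the aggregate fixes the order of the collected values
       json_agg(message ORDER BY sub_message_id)         AS messages_json,
       string_agg(message, ', ' ORDER BY sub_message_id) AS messages_text
FROM the_table
GROUP BY main_message_id
ORDER BY main_message_id;
```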

How to split a string to rows in SQL without function

I have the following string
Technology|faa5d304-f2d1-42c3-8d21-e87697b42bdc;Application|56b19e9a-e58a-4c79-a518-b129fb5f499f;Database|d7425391-8f8c-4aec-be04-9caf2f55584a;Mobile/BYOD|8f0f30e7-d16d-48a3-ad82-cfdd39156760;Networking|3876dbd8-8cd8-4040-9c67-0633f8477f93;Operating System|10fc2ce4-53fd-4af2-8fd9-9df66a38715f;Reporting|00307182-43f4-4bbf-9a95-cd8dbf59754a;Security|014e8d4d-4fd9-404c-8db8-13e84c9042fe;User Interface|57d65a47-6ad2-4df7-8d36-acdf3e0a3145;Web Tech|1b9e82eb-5f70-4183-9093-5
Each word in bold (the label before each | separator) has to become a row. I have tried various recommendations, but I could only retrieve the first word, Technology. I need each of those words in its own row, and I need to do this without a function. I am using SQL Server 2012.
Try this:
DECLARE @Tabaldata TABLE ( data nvarchar(max))
INSERT INTO @Tabaldata
SELECT
'Technology|faa5d304-f2d1-42c3-8d21-e87697b42bdc;Application|56b19e9a-e58a-4c79-a518-b129fb5f499f;Database
|d7425391-8f8c-4aec-be04-9caf2f55584a;Mobile/BYOD|8f0f30e7-d16d-48a3-ad82-cfdd39156760;Networking
|3876dbd8-8cd8-4040-9c67-0633f8477f93;Operating System|10fc2ce4-53fd-4af2-8fd9-9df66a38715f;Reporting|
00307182-43f4-4bbf-9a95-cd8dbf59754a;Security|014e8d4d-4fd9-404c-8db8-13e84c9042fe;User Interface|57d65a47-6ad2-4df7-8d36-acdf3e0a3145;Web Tech|1b9e82eb-5f70-4183-9093-5'
SELECT data ActualData,
SUBSTRING(data,CHARINDEX(';' ,data)+1,LEN(data)) AS ExpectedData
FROM
(
SELECT Split.a.value('.','nvarchar(max)') data
FROM(
SELECT CAST('<S>'+REPLACE(data,'|','</S><S>')+'</S>' AS XML) data
FROM @Tabaldata
)AS A
CROSS APPLY data.nodes('S') AS Split(a)
)dt
WHERE PATINDEX('%[0-9]%',(SUBSTRING(data,CHARINDEX(';' ,data)+1,LEN(data))))=0
Demo Result :http://rextester.com/UXDT75928
Using the answer linked to by scsimon, you can use the following script to extract the bolded words in your question:
-- Test table
declare @t table (Id int identity(1,1), Col varchar(1000))
insert into @t(Col) values ('Technology|faa5d304-f2d1-42c3-8d21-e87697b42bdc;Application|56b19e9a-e58a-4c79-a518-b129fb5f499f;Database|d7425391-8f8c-4aec-be04-9caf2f55584a;Mobile/BYOD|8f0f30e7-d16d-48a3-ad82-cfdd39156760;Networking|3876dbd8-8cd8-4040-9c67-0633f8477f93;Operating System|10fc2ce4-53fd-4af2-8fd9-9df66a38715f;Reporting|00307182-43f4-4bbf-9a95-cd8dbf59754a;Security|014e8d4d-4fd9-404c-8db8-13e84c9042fe;User Interface|57d65a47-6ad2-4df7-8d36-acdf3e0a3145;Web Tech|1b9e82eb-5f70-4183-9093-5')
,('asd|a;dse|a;gggg|a')
select t.Id
,n.r.value('.', 'varchar(50)') as String
,left(n.r.value('.', 'varchar(50)'),charindex('|',n.r.value('.', 'varchar(50)'),1)-1) as Words
,substring(n.r.value('.', 'varchar(50)'),charindex('|',n.r.value('.', 'varchar(50)'),1)+1,999999) as GUIDs
from @t as t
cross apply (select cast('<r>'+replace(replace(Col,'&','&amp;'), ';', '</r><r>')+'</r>' as xml)) as S(XMLCol)
cross apply S.XMLCol.nodes('r') as n(r)
order by t.Id
,Words;
Output:
+----+----------------------------------------------------+------------------+--------------------------------------+
| Id | String | Words | GUIDs |
+----+----------------------------------------------------+------------------+--------------------------------------+
| 1 | Application|56b19e9a-e58a-4c79-a518-b129fb5f499f | Application | 56b19e9a-e58a-4c79-a518-b129fb5f499f |
| 1 | Database|d7425391-8f8c-4aec-be04-9caf2f55584a | Database | d7425391-8f8c-4aec-be04-9caf2f55584a |
| 1 | Mobile/BYOD|8f0f30e7-d16d-48a3-ad82-cfdd39156760 | Mobile/BYOD | 8f0f30e7-d16d-48a3-ad82-cfdd39156760 |
| 1 | Networking|3876dbd8-8cd8-4040-9c67-0633f8477f93 | Networking | 3876dbd8-8cd8-4040-9c67-0633f8477f93 |
| 1 | Operating System|10fc2ce4-53fd-4af2-8fd9-9df66a387 | Operating System | 10fc2ce4-53fd-4af2-8fd9-9df66a387 |
| 1 | Reporting|00307182-43f4-4bbf-9a95-cd8dbf59754a | Reporting | 00307182-43f4-4bbf-9a95-cd8dbf59754a |
| 1 | Security|014e8d4d-4fd9-404c-8db8-13e84c9042fe | Security | 014e8d4d-4fd9-404c-8db8-13e84c9042fe |
| 1 | Technology|faa5d304-f2d1-42c3-8d21-e87697b42bdc | Technology | faa5d304-f2d1-42c3-8d21-e87697b42bdc |
| 1 | User Interface|57d65a47-6ad2-4df7-8d36-acdf3e0a314 | User Interface | 57d65a47-6ad2-4df7-8d36-acdf3e0a314 |
| 1 | Web Tech|1b9e82eb-5f70-4183-9093-5 | Web Tech | 1b9e82eb-5f70-4183-9093-5 |
| 2 | asd|a | asd | a |
| 2 | dse|a | dse | a |
| 2 | gggg|a | gggg | a |
+----+----------------------------------------------------+------------------+--------------------------------------+

Preserving Array Member Order in Postgres Query

I would like to know how to preserve/utilize the order of array elements when issuing a select query in Postgres. (In case it's relevant, the array is multidimensional.)
For example, given the following data:
id | points
----+---------------------------------
1 | {{1,3},{7,11},{99,101},{0,1}}
2 | {{99,101},{7,11},{0,1},{77,22}}
I'd like to know how to write a query which finds rows whose points:
contain the subarray {{7, 11}, {99, 101}}
but not {{99, 101},{7, 11}}.
I've tried using various array operators (@>, &&), adding an index using the intarray module, etc., but have not found a workable solution.
To unnest the array by one dimension and use the result set for comparison, use the reduce_dim function suggested by Pavel Stěhule:
t=# with c(i,p) as (values(1,'{{1,3},{7,11},{99,101},{0,1}}'::int[][]),(2,'{{99,101},{7,11},{0,1},{77,22}}'))
, p as (select *,a,case when e = '{7, 11}' and lead(e) over (partition by i order by o) = '{99, 101}' and o = lead(o) over (partition by i order by o) -1 then true end from c, reduce_dim(p) with ordinality as a (e,o))
select * from p;
i | p | e | o | a | case
---+---------------------------------+----------+---+----------------+------
1 | {{1,3},{7,11},{99,101},{0,1}} | {1,3} | 1 | ("{1,3}",1) |
1 | {{1,3},{7,11},{99,101},{0,1}} | {7,11} | 2 | ("{7,11}",2) | t
1 | {{1,3},{7,11},{99,101},{0,1}} | {99,101} | 3 | ("{99,101}",3) |
1 | {{1,3},{7,11},{99,101},{0,1}} | {0,1} | 4 | ("{0,1}",4) |
2 | {{99,101},{7,11},{0,1},{77,22}} | {99,101} | 1 | ("{99,101}",1) |
2 | {{99,101},{7,11},{0,1},{77,22}} | {7,11} | 2 | ("{7,11}",2) |
2 | {{99,101},{7,11},{0,1},{77,22}} | {0,1} | 3 | ("{0,1}",3) |
2 | {{99,101},{7,11},{0,1},{77,22}} | {77,22} | 4 | ("{77,22}",4) |
(8 rows)
Now that you see the logic, complete the WHERE clause:
t=# with c(i,p) as (values(1,'{{1,3},{7,11},{99,101},{0,1}}'::int[][]),(2,'{{99,101},{7,11},{0,1},{77,22}}'))
, p as (select *,a,case when e = '{7, 11}' and lead(e) over (partition by i order by o) = '{99, 101}' and o = lead(o) over (partition by i order by o) -1 then true end from c, reduce_dim(p) with ordinality as a (e,o))
select i,p from p where "case";
i | p
---+-------------------------------
1 | {{1,3},{7,11},{99,101},{0,1}}
(1 row)
Also, when the pair you are looking for is sequential (adjacent), you can simply cast the array to text and use the LIKE operator:
t=# with c(i,p) as (values(1,'{{1,3},{7,11},{99,101},{0,1}}'::int[][]),(2,'{{99,101},{7,11},{0,1},{77,22}}'))
select * from c where p::text like '%{7,11},{99,101}%';
i | p
---+-------------------------------
1 | {{1,3},{7,11},{99,101},{0,1}}
(1 row)
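The queries above call reduce_dim without showing it. The definition below is an assumption about what that helper looks like: a small PL/pgSQL function that uses FOREACH ... SLICE 1 to return each one-dimensional sub-array as its own row. It is a sketch under that assumption, not necessarily the exact function the answer refers to.

```sql
-- Assumed definition of reduce_dim (not shown in the answer above).
CREATE OR REPLACE FUNCTION reduce_dim(anyarray)
RETURNS SETOF anyarray AS
$$
DECLARE
    s $1%TYPE;
BEGIN
    -- SLICE 1 iterates over one-dimensional sub-arrays instead of single elements.
    FOREACH s SLICE 1 IN ARRAY $1 LOOP
        RETURN NEXT s;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Usage: WITH ORDINALITY adds the position of each sub-array within the outer array.
SELECT e, o
FROM reduce_dim('{{1,3},{7,11},{99,101},{0,1}}'::int[]) WITH ORDINALITY AS a (e, o);
```

The ordinality column is what lets the lead() comparison in the answer check that {7,11} is immediately followed by {99,101}.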

How to maintain

I'm wondering if SQL Server (i.e. the T-SQL language) has a natural way of doing this or if I have to write fancy constraints/triggers.
Suppose I have a table
RepublicanCandidates
===================================
Id | Name | ByteIndex
===================================
1 | 'Marco Rubio' | 0
2 | 'Jeb Bush' | 1
3 | 'Donald Trump' | 2
4 | 'Ted Cruz' | 3
and I remove Jeb Bush:
DELETE FROM [RepublicanCandidates] WHERE [Id]=2
Then I want the table to be like
RepublicanCandidates
===================================
Id | Name | ByteIndex
===================================
1 | 'Marco Rubio' | 0
3 | 'Donald Trump' | 1
4 | 'Ted Cruz' | 2
Notice that the ByteIndex column has been renumbered so that it has no gaps.
And then if I insert a candidate
INSERT INTO [RepublicanCandidates] (Name) VALUES ('Carly Fiorina')
the table becomes
RepublicanCandidates
===================================
Id | Name | ByteIndex
===================================
1 | 'Marco Rubio' | 0
3 | 'Donald Trump' | 1
4 | 'Ted Cruz' | 2
5 | 'Carly Fiorina' | 3
If you created a VIEW based on the table, you could add a row_number() function, and drop the ByteIndex column from the base table.
CREATE VIEW vRepublicanCandidates
AS
SELECT id, name , ROW_NUMBER() OVER (ORDER BY id) - 1 AS ByteIndex
FROM RepublicanCandidates
T-SQL cannot do what you are asking natively. You will have to write some code yourself; alternatively, the suggestion in the other answer is a good one, IMHO.
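If ByteIndex has to be physically stored rather than computed in a view, one way to "write some code" is to renumber the column after each delete or insert, for example from a stored procedure or trigger. The following T-SQL is a minimal sketch of that renumbering step only; the table variable and the column types are assumptions made so that the example runs on its own.

```sql
-- Sketch only: a table variable stands in for the real table (types are assumptions).
DECLARE @RepublicanCandidates TABLE (
    Id        INT IDENTITY(1,1) PRIMARY KEY,
    Name      NVARCHAR(100) NOT NULL,
    ByteIndex TINYINT NULL
);

INSERT INTO @RepublicanCandidates (Name, ByteIndex)
VALUES (N'Marco Rubio', 0), (N'Jeb Bush', 1), (N'Donald Trump', 2), (N'Ted Cruz', 3);

-- The delete leaves a gap; the insert leaves ByteIndex NULL for the new row.
DELETE FROM @RepublicanCandidates WHERE Id = 2;
INSERT INTO @RepublicanCandidates (Name) VALUES (N'Carly Fiorina');

-- Renumber through an updatable CTE. ROW_NUMBER() is 1-based, so subtract 1.
WITH numbered AS (
    SELECT ByteIndex,
           ROW_NUMBER() OVER (ORDER BY Id) - 1 AS NewIndex
    FROM @RepublicanCandidates
)
UPDATE numbered
SET ByteIndex = NewIndex
WHERE ByteIndex IS NULL OR ByteIndex <> NewIndex;  -- touch only rows that need to change

SELECT Id, Name, ByteIndex FROM @RepublicanCandidates ORDER BY Id;
```

The view approach avoids this bookkeeping entirely, which is why it is usually the simpler option when nothing depends on the stored value.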

SQL Update with CTEs not updating records

I am attempting to update a set of records that are duplicates in three particular columns. The reason for this update is that there is a conflict when trying to insert this data into an updated database schema. The conflict is caused by a new constraint that has been added on DM_ID, DM_CONTENT_TYPE_ID, and DMC_TYPE. I need to adjust the DM_CONTENT_TYPE_ID column to either 1, 3, or 5 based on the row number to get around this. A sample of the duplicate data looks as such. Notice that the first three columns are the same.
+--------+--------------------+----------+--------------------------------------+
| DM_ID | DM_CONTENT_TYPE_ID | DMC_TYPE | DMC_PATH |
+--------+--------------------+----------+--------------------------------------+
| 314457 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7897-0.tif |
| 314457 | 1 | TIF | \\DOCIMG\DR\640\0001_640_0001.tif |
| 314458 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7898-0.tif |
| 314458 | 1 | TIF | \\DOCIMG\TD\640\0002_640_0001.tif |
| 314460 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7900-0.tif |
| 314460 | 1 | TIF | \\DOCIMG\ZZ\640\0003_640_0003.tif |
| 314461 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7901-0.tif |
| 314461 | 1 | TIF | \\DOCIMG\ED\6501\03_0001.tif |
| 314461 | 1 | TIF | \\DOCIMG\ZZ\640\0004_640_0004.tif |
+--------+--------------------+----------+--------------------------------------+
This is the desired output to get around the constraint issue:
+--------+--------------------+----------+--------------------------------------+
| DM_ID | DM_CONTENT_TYPE_ID | DMC_TYPE | DMC_PATH |
+--------+--------------------+----------+--------------------------------------+
| 314457 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7897-0.tif |
| 314457 | 3 | TIF | \\DOCIMG\DR\640\0001_640_0001.tif |
| 314458 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7898-0.tif |
| 314458 | 3 | TIF | \\DOCIMG\TD\640\0002_640_0001.tif |
| 314460 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7900-0.tif |
| 314460 | 3 | TIF | \\DOCIMG\ZZ\640\0003_640_0003.tif |
| 314461 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7901-0.tif |
| 314461 | 3 | TIF | \\DOCIMG\ED\6501\03_0001.tif |
| 314461 | 5 | TIF | \\DOCIMG\ZZ\640\0004_640_0004.tif |
+--------+--------------------+----------+--------------------------------------+
The script I have developed is as such:
;WITH CTE AS
(SELECT -- Grab the documents that have a duplicate.
DM_ID
,DM_CONTENT_TYPE_ID
,DMC_TYPE
,COUNT(*) 'COUNT'
FROM
[DM_CONTENT]
GROUP BY
DM_ID
,DM_CONTENT_TYPE_ID
,DMC_TYPE
HAVING
COUNT(*) > 1),
CTE2 AS
(SELECT -- Designate the row number for the duplicate documents.
DMC.*
,ROW_NUMBER() OVER(PARTITION BY DMC.DM_ID, DMC.DM_CONTENT_TYPE_ID, DMC.DMC_TYPE ORDER BY DMC.DMC_PATH) AS 'ROWNUM'
FROM
[DM_CONTENT] DMC
JOIN CTE
ON DMC.DM_ID = CTE.DM_ID),
CTE3 AS
(SELECT -- Set the new document type ID based on the row number.
*
,CASE
WHEN ROWNUM = 1
THEN 1
WHEN ROWNUM = 2
THEN 3
WHEN ROWNUM = 3
THEN 5
END AS 'DM_CONTENT_TYPE_ID_NEW'
FROM
CTE2)
UPDATE -- Update the records.
DMC
SET
DMC.DM_CONTENT_TYPE_ID = CTE3.DM_CONTENT_TYPE_ID_NEW
FROM
[DM_CONTENT] DMC
JOIN CTE3
ON DMC.DM_ID = CTE3.DM_ID
Now when I execute the script, it says that the appropriate rows have been affected. However, when I check the [DM_CONTENT] table, the DM_CONTENT_TYPE_ID actually hasn't been updated and still remains at a value of 1. If I SELECT from CTE3, the DM_CONTENT_TYPE_ID_NEW, is the appropriate new ID. My logic seems to be sound, but I cannot figure out what mistake I am making. Does anyone have any insight? Thanks in advance!
This seems much simpler to write as:
WITH toupdate AS (
SELECT DMC.*,
ROW_NUMBER() OVER (PARTITION BY DMC.DM_ID, DMC.DM_CONTENT_TYPE_ID, DMC.DMC_TYPE
ORDER BY DMC.DMC_PATH) AS ROWNUM
FROM DM_CONTENT DMC
)
UPDATE toupdate
SET DM_CONTENT_TYPE_ID = (CASE ROWNUM WHEN 2 THEN 3 WHEN 3 THEN 5 END)
WHERE ROWNUM > 1;
Now, I find it suspicious that your join conditions are only on DM_ID. I think the problem is that you are getting multiple matches between the CTE and your table. An arbitrary match is used for the update -- and that happens to be the first one encountered (hence a value of 1).
Try
UPDATE CTE3
SET DM_CONTENT_TYPE_ID = DM_CONTENT_TYPE_ID_NEW
instead of what you're currently doing.
Updating through a CTE works a little differently than updating with regular table joins.
This should work with any number of duplicates. Try it this way:
;WITH cte
AS (SELECT Row_number()
OVER(
partition BY dm_id, dm_content_type_id, dmc_type
ORDER BY DMC_PATH) AS Rn,
*
FROM dm_content)
UPDATE cte
SET dm_content_type_id = rn + (rn -1)
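To try the CTE update safely before running it against the real table, the same statement can be pointed at a small copy of the data. The following T-SQL is a sketch under that assumption: @dm_content is a hypothetical stand-in for DM_CONTENT, its column types are guesses, and the rows are a subset of the sample data in the question.

```sql
-- Sketch only: @dm_content is a hypothetical stand-in for the real DM_CONTENT table.
DECLARE @dm_content TABLE (
    DM_ID              INT,
    DM_CONTENT_TYPE_ID INT,
    DMC_TYPE           VARCHAR(10),
    DMC_PATH           VARCHAR(260)
);

INSERT INTO @dm_content (DM_ID, DM_CONTENT_TYPE_ID, DMC_TYPE, DMC_PATH)
VALUES (314457, 1, 'TIF', '\\DOCIMG\CD\1965\19651227\7897-0.tif'),
       (314457, 1, 'TIF', '\\DOCIMG\DR\640\0001_640_0001.tif'),
       (314461, 1, 'TIF', '\\DOCIMG\CD\1965\19651227\7901-0.tif'),
       (314461, 1, 'TIF', '\\DOCIMG\ED\6501\03_0001.tif'),
       (314461, 1, 'TIF', '\\DOCIMG\ZZ\640\0004_640_0004.tif');

-- Number the rows inside each duplicate group and update through the CTE,
-- so every base row is matched exactly once (no ambiguous join on DM_ID alone).
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY DM_ID, DM_CONTENT_TYPE_ID, DMC_TYPE
                              ORDER BY DMC_PATH) AS rn,
           DM_CONTENT_TYPE_ID
    FROM @dm_content
)
UPDATE cte
SET DM_CONTENT_TYPE_ID = 2 * rn - 1;  -- same as rn + (rn - 1): 1, 3, 5, ...

SELECT DM_ID, DM_CONTENT_TYPE_ID, DMC_TYPE, DMC_PATH
FROM @dm_content
ORDER BY DM_ID, DMC_PATH;
```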