Getting the absolute position of a node using SQL

I have an XML document of the Bible, structured as follows:
<bookcoll>
<book>
<bktshort>Matthew</bktshort>
<chapter><chtitle>Chapter 1</chtitle>
<v>The book of the generation of Jesus Christ, the son of David, the son of Abraham.
</v>
<v>Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren;
</v>
..
</chapter>
<chapter><chtitle>Chapter 2</chtitle>
<v>Now when Jesus was born in Bethlehem of Judaea in the days of Herod the king, behold, there came wise men from the east to Jerusalem,
</v>
I would like to keep a running row number across the total number of <v> nodes.
This statement resets the number for each chapter node:
select
Chapter.value('../../bktshort[1]', 'varchar(200)'),
Replace(Chapter.value('../chtitle[1]', 'varchar(200)'),'Chapter ', ''),
p.number,
Chapter.value('.','varchar(max)')
from
master..spt_values p
CROSS APPLY
@xml.nodes('/bookcoll/book/chapter/v[position()=sql:column("number")]') T(Chapter)
--where p.type = 'p'
order by
Chapter.value('../../bktshort[1]', 'varchar(200)'),
Replace(Chapter.value('../chtitle[1]', 'varchar(200)'),'Chapter ', ''),
p.number
So instead of:
Matthew 1 23 Behold, a virgin shall be with child, and shall bring forth a son, and they shall call his name Emmanuel, which being interpreted is, God with us.
Matthew 1 24 Then Joseph being raised from sleep did as the angel of the Lord had bidden him, and took unto him his wife:
Matthew 1 25 And knew her not till she had brought forth her firstborn son: and he called his name JESUS.
Matthew 2 1 Now when Jesus was born in Bethlehem of Judaea in the days of Herod the king, behold, there came wise men from the east to Jerusalem,
Matthew 2 2 Saying, Where is he that is born King of the Jews? for we have seen his star in the east, and are come to worship him.
I would want:
Matthew 1 23 Behold, a virgin shall be with child, and shall bring forth a son, and they shall call his name Emmanuel, which being interpreted is, God with us.
Matthew 1 24 Then Joseph being raised from sleep did as the angel of the Lord had bidden him, and took unto him his wife:
Matthew 1 25 And knew her not till she had brought forth her firstborn son: and he called his name JESUS.
Matthew 2 26 Now when Jesus was born in Bethlehem of Judaea in the days of Herod the king, behold, there came wise men from the east to Jerusalem,
Matthew 2 27 Saying, Where is he that is born King of the Jews? for we have seen his star in the east, and are come to worship him.
Side note: I know that the spt_values table is not big enough, and that I would have to use a local table instead.

Maybe this will spur some ideas?
DECLARE @xml XML = '<bookcoll>
<book>
<bktshort>Matthew</bktshort>
<chapter>
<chtitle>Chapter 1</chtitle>
<v>The book of the generation of Jesus Christ, the son of David, the son of Abraham. </v>
<v>Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren; </v>
</chapter>
<chapter>
<chtitle>Chapter 2</chtitle>
<v>Now when Jesus was born in Bethlehem of Judaea in the days of Herod the king, behold, there came wise men from the east to Jerusalem, </v>
</chapter>
</book>
<book>
<bktshort>Mark</bktshort>
<chapter>
<chtitle>Chapter 1</chtitle>
<v>The beginning of the good news about Jesus the Messiah,[a] the Son of God,</v>
</chapter>
</book>
</bookcoll>
'
SELECT BookTitle = Chapter.value('../../bktshort[1]', 'varchar(200)')
, Chapter = REPLACE(Chapter.value('../chtitle[1]', 'varchar(200)'), 'Chapter ', '')
, RunningVerse = ROW_NUMBER() OVER ( PARTITION BY Chapter.value('../../bktshort[1]', 'varchar(200)') ORDER BY Chapter.value('../../bktshort[1]', 'varchar(200)') )
, Scripture = Chapter.value('.', 'varchar(max)')
FROM @xml.nodes('/bookcoll/book/chapter/v') T ( chapter )

My first solution is:
SELECT
BQ.BookName
, BQ.BookNumber
, CQ.ChapterTitle
, CQ.ChapterNumber
, VQ.VerseText
, VQ.VerseNumber
, ROW_NUMBER() OVER (ORDER BY BQ.BookNumber, CQ.ChapterNumber,VQ.VerseNumber)
FROM
(SELECT Books.value('bktshort[1]', 'varchar(200)') as BookName,
p.number as BookNumber
FROM master..spt_values p
CROSS APPLY
@xml.nodes('/bookcoll/book[position()=sql:column("number")]') B(Books)
WHERE p.type='p'
) BQ
INNER JOIN
(SELECT C.Chapter.value('../bktshort[1]', 'varchar(200)') as BookName,
C.Chapter.value('chtitle[1]', 'varchar(200)') as ChapterTitle,
p.number as ChapterNumber
FROM master..spt_values p
CROSS APPLY
@xml.nodes('/bookcoll/book/chapter[position()=sql:column("number")]') C(Chapter)
WHERE p.type='p'
) CQ
on BQ.BookName = CQ.BookName
INNER JOIN
(SELECT V.Verses.value('../../bktshort[1]', 'varchar(200)') as BookName,
V.Verses.value('../chtitle[1]', 'varchar(200)') as ChapterTitle,
V.Verses.value('.', 'varchar(max)') as VerseText,
p.number as VerseNumber
FROM master..spt_values p
CROSS APPLY
@xml.nodes('/bookcoll/book/chapter/v[position()=sql:column("number")]') V(Verses)
WHERE p.type='p'
) VQ
on CQ.BookName = VQ.BookName
and CQ.ChapterTitle = VQ.ChapterTitle
But this takes two minutes to run, so I am still taking suggestions.
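For comparison, here is a rough sketch of the numbering being asked for, done outside SQL Server in Python. It walks the document in order and keeps one counter per book that is not reset at chapter boundaries. Restarting the counter per book follows the PARTITION BY in the answer above and is an assumption; the shortened verse texts and the `running_verses` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny sample in the question's layout (verse text shortened for illustration).
SAMPLE = """<bookcoll>
  <book>
    <bktshort>Matthew</bktshort>
    <chapter><chtitle>Chapter 1</chtitle><v>verse a</v><v>verse b</v></chapter>
    <chapter><chtitle>Chapter 2</chtitle><v>verse c</v></chapter>
  </book>
  <book>
    <bktshort>Mark</bktshort>
    <chapter><chtitle>Chapter 1</chtitle><v>verse d</v></chapter>
  </book>
</bookcoll>"""


def running_verses(xml_text):
    """Return (book, chapter, running_verse, text) rows in document order."""
    rows = []
    root = ET.fromstring(xml_text)
    for book in root.findall("book"):
        name = book.findtext("bktshort")
        running = 0  # assumption: restarts per book, but never per chapter
        for chapter in book.findall("chapter"):
            title = chapter.findtext("chtitle").replace("Chapter ", "")
            for verse in chapter.findall("v"):
                running += 1
                rows.append((name, title, running, (verse.text or "").strip()))
    return rows


for row in running_verses(SAMPLE):
    print(row)
```

The point is only that the counter lives at the book level while the loop still visits chapters and verses in document order.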

Related

Returning a substring within given HTML tags in a Row in SQL Server

I have the 3 rows below in a table:
1. <p><strong>By Dr. Mercola</strong></p> <blockquote> <p>In an interview with ElectromagneticHealth.org
2. <p><strong>By Barbara Loe Fisher</strong></p> <blockquote> <p>Here we are in the winter of 2015, and
3. <p><strong>By Gary Ruskin<br> Co-Founder and Executive Director, U.S. Right to Know</strong></p> <blockquote> <p>U.S. Right to Know
From the above rows I want to fetch a substring, with the result as below:
Dr. Mercola
Barbara Loe Fisher
Gary Ruskin
For this I have written the query below:
CASE WHEN CHARINDEX('<p><strong>By', FormattedBody, -1)=1 THEN LTRIM(REPLACE(LEFT(CAST(FormattedBody as nvarchar(max)),CHARINDEX('</strong>', FormattedBody)-1),'<p><strong>By',''))
ELSE 'Dr.Mercola' END as Name
The above query returns the expected output for the first 2 rows, but not for the third row, where it returns "Gary Ruskin<br> Co-Founder and Executive Director, U.S. Right to Know".
Please let me know what changes should be made to the query to get the expected results. Thanks in advance!
Your pattern matching is looking for </strong> to end the name.
However, the examples suggest that < is sufficient for this purpose. It is hard to imagine a person's name with this character, so it seems safe to use that.
So, you can try:
(CASE WHEN CHARINDEX('<p><strong>By', FormattedBody, -1) = 1
THEN LTRIM(REPLACE(LEFT(CAST(FormattedBody as nvarchar(max)), CHARINDEX('<', FormattedBody, 12) - 1), '<p><strong>By', ''))
ELSE 'Dr.Mercola'
END) as Name
SELECT SUBSTRING(REPLACE(col,'<p><strong>By ',''),0,CHARINDEX('<',REPLACE(col,'<p><strong>By ','')))
FROM (VALUES
('<p><strong>By Dr. Mercola</strong></p> <blockquote> <p>In an interview with ElectromagneticHealth.org'),
('<p><strong>By Barbara Loe Fisher</strong></p> <blockquote> <p>Here we are in the winter of 2015, and '),
('<p><strong>By Gary Ruskin<br> Co-Founder and Executive Director, U.S. Right to Know</strong></p> <blockquote> <p>U.S. Right to Know')
) as t (col)
Will give you:
Dr. Mercola
Barbara Loe Fisher
Gary Ruskin
I used the query below, and it works properly for each case:
CASE WHEN CHARINDEX('<p><strong>By', FormattedBody, -1)=1 THEN LTRIM(SUBSTRING(REPLACE(CAST(FormattedBody as varchar(max)),'<p><strong>By ',''),0,CHARINDEX('<',REPLACE(cast(FormattedBody as varchar(max)),'<p><strong>By ',''))))
ELSE 'Dr.Mercola' END as Name
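The rule the answers converge on (strip the "By" prefix, then cut at the next "<") is easy to check outside the database. Below is a small Python sketch of that same rule; the `author_name` function and the `PREFIX` constant are hypothetical names, and the fallback value simply copies the ELSE branch of the queries above.

```python
PREFIX = "<p><strong>By"


def author_name(body, default="Dr.Mercola"):
    """Return the text between the 'By' prefix and the next '<'."""
    if not body.startswith(PREFIX):
        return default  # mirrors the ELSE branch of the CASE expression
    rest = body[len(PREFIX):]
    end = rest.find("<")
    if end == -1:  # no further tag: keep everything after the prefix
        end = len(rest)
    return rest[:end].strip()


rows = [
    "<p><strong>By Dr. Mercola</strong></p> <blockquote> <p>In an interview",
    "<p><strong>By Barbara Loe Fisher</strong></p> <blockquote> <p>Here we are",
    "<p><strong>By Gary Ruskin<br> Co-Founder and Executive Director</strong></p>",
]
for body in rows:
    print(author_name(body))
```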

Is it possible to do a split to columns in SQL Server (T-SQL) using a regular expression or other means

I have rows of data, each comprising a row number and a string made up of a varying number of sentences. I would like to split the string and pivot (unpivot?) using T-SQL, displaying the row number alongside each individual sentence from that row's string. Note: each new sentence starts with capital letters and ends with a period.
My data looks something like this:
Row_num String
1 JOHN SMITH walked quickly to his car. MARY waited outside for a ride. BOB JOHNS called is fired to pick him up. TOM was not present.
2 SALLY SMITH arrived at work early on. Dave called in sick. BETTY DOE was on vacation.
I would like to be able to split the sentences in each row and end up with something like this:
1 JOHN SMITH walked quickly to his car.
1 MARY waited outside for a ride.
1 BOB JOHNS called is fired to pick him up.
1 TOM was not present.
2 SALLY SMITH arrived at work early on.
2 Dave called in sick.
2 BETTY DOE was on vacation.
I've written a regular expression (JavaScript flavor) that successfully splits the data, but I don't know how to achieve this in T-SQL.
Yes, you can achieve what you want using XML:
SQLFiddle
Data:
CREATE TABLE tab(row_num INT, String NVARCHAR(MAX));
INSERT INTO tab(Row_num, String)
VALUES
(1, N'JOHN SMITH walked quickly to his car. MARY waited outside for a ride. BOB JOHNS called is fired to pick him up. TOM was not present.'),
(2, N'SALLY SMITH arrived at work early on. Dave called in sick. BETTY DOE was on vacation.')
Main query:
SELECT
row_num
,[sentence] = Split.a.value('.', 'NVARCHAR(1000)')
FROM
(
SELECT
row_num,
[X] = CAST ('<M>' + REPLACE(String, '.', '.</M><M>') + '</M>' AS XML)
FROM tab
) AS A
CROSS APPLY X.nodes ('/M') AS Split(a)
WHERE Split.a.value('.', 'NVARCHAR(1000)') <> '';
To work with a custom regex, you should use a CLR table-valued function.
An additional variant without using XML:
Data sample
DECLARE @tab AS TABLE
(
row_num INT ,
String NVARCHAR(200) -- the sample strings are longer than 100 characters
);
INSERT INTO @tab
( row_num, String )
VALUES ( 1,
N'JOHN SMITH walked quickly to his car. MARY waited outside for a ride. BOB JOHNS called is fired to pick him up. TOM was not present.' ),
( 2,
N'SALLY SMITH arrived at work early on. Dave called in sick. BETTY DOE was on vacation.' )
Query
;
WITH cte
AS ( SELECT n = 1
UNION ALL
SELECT n + 1
FROM cte
WHERE n <= 200
)
SELECT S.row_num ,
LTRIM(SUBSTRING(S.String, T.n,
CHARINDEX('.',
SUBSTRING(S.String, T.n, LEN(S.String))))) AS String
FROM cte AS T
JOIN @tab AS S ON SUBSTRING('.' + S.String, T.n, 1) = '.'
AND LEN(S.String) > T.n
ORDER BY S.row_num ,
T.n
OPTION ( MAXRECURSION 200 )
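Both answers split on the period character rather than on a true sentence pattern. As a quick way to see what that rule produces, here is a minimal Python sketch of the same idea; `split_sentences` is a hypothetical helper, and like the SQL above it would also split on abbreviations such as "Dr.".

```python
def split_sentences(row_num, text):
    """Split on periods, keeping the period with each sentence."""
    pieces = [p.strip() for p in text.split(".")]
    # Drop the empty piece that follows the final period.
    return [(row_num, p + ".") for p in pieces if p]


data = [
    (1, "JOHN SMITH walked quickly to his car. MARY waited outside for a ride. "
        "BOB JOHNS called is fired to pick him up. TOM was not present."),
    (2, "SALLY SMITH arrived at work early on. Dave called in sick. "
        "BETTY DOE was on vacation."),
]
for row_num, text in data:
    for row in split_sentences(row_num, text):
        print(row)
```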

Remove partial duplicates in SQL?

The resulting table (CSV) looks like this:
NAME ,TITLE ,YEAR ,QNTY ,CLUB ,PRICE ,LOWEST_CLUB ,LOWEST
Andy Aardverk ,Avarice is Good ,1998,1,Basic ,218.95, CARP ,215.95
Andy Aardverk ,Avarice is Good ,1998,1,Basic ,218.95, YRB Bronze ,215.95
Andy Aardverk ,Yon-juu Hachi ,1948,1,Basic ,44.95, CARP ,41.95
Boswell Biddles ,Not That London! ,2003,1,Basic ,12.5, CAA ,10
Boswell Biddles ,Not That London! ,2003,1,Basic ,12.5, Readers Digest ,10
Cary Cizek ,Ringo to Nashi ,1997,1,Basic ,32.95, YRB Gold ,29.95
Cary Cizek ,Ringo to Nashi ,1997,1,Basic ,32.95, York Club ,29.95
Cary Cizek ,Toronto Underground ,2001,1,YRB Gold ,14.45, York Club ,12.95
Egbert Engles ,Capricia's Conundrum ,1993,1,CARP ,13.45, Guelph Club ,12.95
Egbert Engles ,Tande mou nai ,2002,1,Basic ,112.95, Oprah ,104.95
Egbert Engles ,Tande mou nai ,2002,1,Basic ,112.95, YRB Silver ,104.95
Ekksdwl Qjksynn ,I don't think so ,2001,1,YRB Gold ,12.5, CAA ,11.5
George Wolf ,Math is fun! ,1995,1,YRB Silver ,13.5, CAA ,12
Jack Daniels ,Eigen Eigen ,1980,1,York Club ,57.95, Oprah ,56.95
Jack Daniels ,Okay Why Not? ,2001,1,York Club ,18.45, Oprah ,17.45
Jackie Johassen ,Getting into Snork U. ,2004,1,YRB Silver ,21.95, Waterloo Club ,20.45
Jackie Johassen ,Not That London! ,2003,1,Basic ,12.5, CAA ,10
Klive Kittlehart ,Will Snoopy find Lucy? ,1990,1,YRB Bronze ,14.95, YRB Gold ,12.95
Lux Luthor ,Is Math is fun? ,1996,1,Basic ,72.95, Oprah ,69.95
Lux Luthor ,Tropical Windsor ,2004,1,Basic ,18.95, Oprah ,17.95
Nigel Nerd ,Are my feet too big? ,1993,1,Basic ,13.95, CAA ,11.45
Nigel Nerd ,Dogs are not Cats ,1995,1,Basic ,35.95, UofT Club ,32.95
Phil Regis ,Databases made Real Hard ,2002,1,Basic ,39.95, Oprah ,35.95
Pretence Parker ,Tchuss ,2002,1,Basic ,24.95, Guelph Club ,21.95
Qfwfq ,The Earth is not Enough ,2003,1,YRB Gold ,37.37, Oprah ,36.37
Qfwfq ,Under Your Bed ,2004,1,Oprah ,14.85, CAA ,13.85
Suzy Sedwick ,Are my feet too big? ,1993,1,YRB Silver ,12.95, Oprah ,11.45
Tracy Turnip ,Will Snoopy find Lucy? ,1990,1,Basic ,15.95, Readers Digest ,13.95
Tracy Turnip ,Will Snoopy find Lucy? ,1990,1,Basic ,15.95, YRB Silver ,13.95
Tracy Turnip ,Yon-juu Hachi ,1948,1,Readers Digest ,41, York Club ,40.95
Valerie Vixen ,Base de Donne ,2003,1,YRB Bronze ,23.95, Readers Digest ,20.95
Xia Xu ,Where art thou Bertha? ,2003,1,Basic ,30.95, CAA ,26.95
Yves Yonge ,Radiator Barbecuing ,2002,2,Basic ,14.2, Waterloo Club ,12.2
Zebulon Zilio ,Transmorgifacation ,2004,1,Basic ,288.73, CAA ,278.73
34 record(s) selected.
I want to be able to only show one of the lowest options. For example 'Andy Aardverk' has purchased 'Avarice is Good' and could have bought it from 'CARP' or 'YRB Bronze' for a lower price. I only want one to show so it could be 'CARP' or 'YRB Bronze' but not both.
I tried to use 'group by' on 'name, title, year, qnty, club, price' but was given this error:
'SQL0119N An expression starting with "LOWEST_CLUB" specified in a SELECT
clause, HAVING clause, or ORDER BY clause is not specified in the GROUP BY
clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause with a
column function and no GROUP BY clause is specified. SQLSTATE=42803'
It would have been easier with your actual query, but I'll give it a go anyway.
You can solve this problem by using a CTE like this:
;WITH CTE AS (
SELECT
ROW_NUMBER() OVER(PARTITION BY [NAME], [TITLE] ORDER BY LOWEST_CLUB ASC, some_fallback_if_two_prices_are_the_same ASC) AS RowNumber,
[NAME],
[TITLE],
col1,
col2,
lowest_price,
some_fallback_if_two_prices_are_the_same
FROM [Table]
)
SELECT * --or rewrite your columns if you want to avoid the RowNumber
FROM CTE
WHERE RowNumber = 1;
That SELECT inside the CTE should be your current query + the ROW_NUMBER() line.
Seeing as I don't have the query, I can't give you a final result. You'll have to fiddle with it until it works for you.
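To make the ROW_NUMBER() idea concrete, here is a small runnable sketch using SQLite through Python as a stand-in for the real database (it assumes a SQLite build with window functions, 3.25 or later). The `purchases` table, its columns, and the three sample rows are assumptions taken loosely from the CSV in the question, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (name TEXT, title TEXT, lowest_club TEXT, lowest REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("Andy Aardverk", "Avarice is Good", "CARP", 215.95),
        ("Andy Aardverk", "Avarice is Good", "YRB Bronze", 215.95),
        ("Andy Aardverk", "Yon-juu Hachi", "CARP", 41.95),
    ],
)

# Number the rows inside each (name, title) group and keep only the first one.
QUERY = """
WITH ranked AS (
    SELECT name, title, lowest_club, lowest,
           ROW_NUMBER() OVER (
               PARTITION BY name, title
               ORDER BY lowest_club
           ) AS rn
    FROM purchases
)
SELECT name, title, lowest_club, lowest
FROM ranked
WHERE rn = 1
ORDER BY name, title
"""
deduped = conn.execute(QUERY).fetchall()
for row in deduped:
    print(row)
```

Ordering by the club name inside the window is just a tie-breaker so that the surviving row is predictable; any stable ordering would do.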

Have query return column horizontally

I have a table in SQL Server which stores a list of questions and answers from a survey on our website. It's a pretty standard layout, here's how it stores the completed surveys:
Name Question Answer
James Smith What is your address? 23 Duck Ln.
James Smith How old are you? 48
James Smith Do you have a job? yes
Sarah Murphy What is your address? 44 West St.
Sarah Murphy How old are you? 23
Sarah Murphy Do you have a job? no
Jack Western What is your address? PO Box 17
Jack Western Do you have a job? yes
As you can see, it's hard to read the data once a few surveys are completed. I need to have the values returned horizontally, with each person having only one row: the first column contains the person's name, and each of the other columns has a question as its header with the answer under it. Here's how the query should return values:
Name What is your address? How old are you? Do you have a job?
James Smith 23 Duck Ln. 48 yes
Sarah Murphy 44 West St. 23 no
Jack Western PO Box 17 yes
Is this possible? By the way, I am only posting a few of the questions - it becomes much larger if there are 10+ questions asked on the site.
Thanks for your help!
Edit:
Please don't focus on whether or not the records should be parsed in the application layer. I ultimately want to use the output in R, which isn't even designed to handle large datasets.
If you want to perform this in SQL, then since you are using SQL Server, you can use the PIVOT function to transform the data from rows to columns:
select name,
[What is your address?],
[How old are you?],
[Do you have a job?]
from yourtable
pivot
(
max(answer)
for question in ([What is your address?], [How old are you?], [Do you have a job?])
) piv;
See SQL Fiddle with Demo.
If you have an unknown number of questions, then you can use dynamic SQL to get the solution:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(Question)
from yourtable
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT name, ' + @cols + '
from yourtable
pivot
(
max(answer)
for Question in (' + @cols + ')
) p '
execute sp_executesql @query;
See SQL Fiddle with Demo
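The dynamic version boils down to two steps: read the distinct questions, then generate one output column per question. Here is a rough sketch of those two steps using SQLite through Python, which has no PIVOT operator, so it swaps in MAX(CASE ...) conditional aggregation instead. The `survey` table name and the `pivot_survey` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (name TEXT, question TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [
        ("James Smith", "What is your address?", "23 Duck Ln."),
        ("James Smith", "How old are you?", "48"),
        ("Jack Western", "What is your address?", "PO Box 17"),
    ],
)


def pivot_survey(conn):
    """Build one column per distinct question and return (headers, rows)."""
    questions = [
        q for (q,) in conn.execute(
            "SELECT DISTINCT question FROM survey ORDER BY question"
        )
    ]
    # Question values are bound as parameters; only the quoted alias is
    # interpolated, with embedded double quotes doubled.
    cols = ", ".join(
        'MAX(CASE WHEN question = ? THEN answer END) AS "{}"'.format(
            q.replace('"', '""')
        )
        for q in questions
    )
    sql = "SELECT name, {} FROM survey GROUP BY name ORDER BY name".format(cols)
    cur = conn.execute(sql, questions)
    headers = [d[0] for d in cur.description]
    return headers, cur.fetchall()


headers, rows = pivot_survey(conn)
print(headers)
for row in rows:
    print(row)
```

A person who skipped a question simply gets an empty (NULL) cell in that column, which matches the Jack Western row in the desired output.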

Trying to combine multiple rows of result set into single row

select C.customerId,(C.lastName+', '+C.firstName) as CustomerName, C.companyName,
D.companyName+' ('+D.lastName+','+D.firstName+')'
as "Parent CompanyName(Last, First)",S.siteId, S.nickName as siteName,
dbo.GetSiteTelemetryBoxList(s.siteId) as "DeviceId's",
dbo.GetSiteTelemetryBoxSKUList(S.siteId,0) as SKU
from Site S
INNER JOIN Customer C ON S.customerId = C.customerId
INNER JOIN Customer D ON D.customerId = C.parentCustomerId
where S.createDate between DATEADD(DAY, -65, GETUTCDATE()) and GETUTCDATE()
order by C.customerId, S.siteId
The above query returns values that look like this:
CID CustomerName companyName Parent CompanyName(Last, First) SiteName DeviceId SKU
888296 DeYoung, Scott DeYoung Farms Mercier Valley Irrigation (Mercier,Ralph) H E east 200241 NETB12WR
890980 Rust, Marcus NULL Chester Inc. (Young,Scott) Byroad east 346370 NETB12WR
890980 Rust, Marcus NULL Chester Inc. (Young,Scott) Byroad west 345431 NETB12WR
891094 Pirani, Mark A Pirani Farm AMX Irrigation (Burroughs,Michael) hwy 64 south 333721 UNKNOWN
891094 Pirani, Mark A Pirani Farm AMX Irrigation (Burroughs,Michael) HWY 64 North 250162 NETB12WR
891094 Pirani, Mark A Pirani Farm AMX Irrigation (Burroughs,Michael) HWY 64 West 250164 NETB12WR
891094 Pirani, Mark A Pirani Farm AMX Irrigation (Burroughs,Michael) HWY 64 East 250157 NETB12WR
891430 Gammil, Bob Gammil FArms AMX Irrigation (Burroughs,Michael) angel 333677 UNKNOWN
891430 Gammil, Bob Gammil FArms AMX Irrigation (Burroughs,Michael) cemetery 333564 UNKNOWN
The problem I face now is that when a customerId/Name repeats in the result set, the SiteName, DeviceId, and SKU values should be concatenated so that each customer is represented by a single row.
For example, the Mark Pirani row would look like:
CID CustomerName ... SiteName DeviceId's ...
891094 Pirani, Mark ... hwy 64 south, HWY 64 North, HWY 64 West, HWY 64 East 333721,250162,250164,250157 ...
You can transform the rows into a concatenated string with something like this:
select
distinct
stuff((
select ',' + u.username
from users u
where u.username = o.username -- correlate with the outer row
order by u.username
for xml path('')
),1,1,'') as userlist
from users o
group by o.username
I believe this is more of a SQL query issue than a C# code issue, or more precisely, I believe it is more efficient to solve this problem at the query level than at the code level. Off the top of my head, you can use SELECT DISTINCT or GROUP BY clauses.
Here is another StackOverflow question addressing this issue - How do I (or can I) SELECT DISTINCT on multiple columns?
I did some digging and found a few ways to implement it. Basically, the simple solution for this is MySQL's group_concat function. These links discuss how group_concat can be implemented for SQL Server. You can choose one based on your requirements.
Simulating group_concat MySQL function in Microsoft SQL Server 2005? -- This thread discusses a few ways to implement it.
Flatten association table to multi-value column? -- This thread discusses the CLR implementation of it.
http://groupconcat.codeplex.com/ -- This was just perfect for me. Exactly what I was looking for. The project basically creates four aggregate functions that collectively offer similar functionality to the MySQL GROUP_CONCAT function.
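To show what a group_concat-style aggregate gives you, here is a small sketch using SQLite through Python, which ships a built-in group_concat. It is only a stand-in for the SQL Server options linked above; the `sites` table and its columns are assumptions based on the sample result set in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites (customer_id INTEGER, site_name TEXT, device_id TEXT)"
)
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?)",
    [
        (890980, "Byroad east", "346370"),
        (890980, "Byroad west", "345431"),
        (888296, "H E east", "200241"),
    ],
)

# One row per customer, with the site names and device ids concatenated.
rows = conn.execute(
    """
    SELECT customer_id,
           group_concat(site_name, ', ') AS site_names,
           group_concat(device_id, ',')  AS device_ids
    FROM sites
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
for row in rows:
    print(row)
```

SQLite does not promise any particular order of the items inside each concatenated value, so treat the lists as unordered unless you add your own ordering step.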