Field value query with special character and unfiltered search returning unexpected results?

Field value query is giving unexpected results when any special character (@, =, #, $, %, ^, *) is passed.
Please find below the 4 sample docs I have inserted into MarkLogic.
<root>
<journalTitle>Dinesh</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
<root>
<journalTitle>Gayari</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
<root>
<journalTitle>Dixit</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
<root>
<journalTitle>Singla</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
CTS query:
cts:search(
fn:doc(),
cts:field-value-query("Sample","######*()", ("unwildcarded")),
"unfiltered"
)
On running this query I am getting all the documents.
As per my understanding, it should return an empty sequence.
Please find below the field I have created.
Field (in XML format):
<field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://marklogic.com/xdmp/database">
<field-name>Sample</field-name>
<field-path>
<path>/root/journalTitle</path>
<weight>1.0</weight>
</field-path>
<word-lexicons/>
<included-elements/>
<excluded-elements/>
<tokenizer-overrides/>
</field>
Index settings: (screenshot not preserved)
If I add any letter(s) to the search string, I get the correct results, for example:
##$%F
=====S
df===$d
Please help me resolve this issue.
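A simplified model may help explain the observation. It assumes that a punctuation-insensitive value query keeps only runs of letters and digits as word tokens. That is an assumption for illustration, not MarkLogic's actual tokenizer. Under it, the all-punctuation search string produces no word tokens at all, while each string that does work produces at least one:

```python
import re

# Assumed model (not MarkLogic's real tokenizer): runs of letters/digits
# are words; everything else is punctuation that a punctuation-insensitive
# query ignores.
WORD = re.compile(r"[^\W_]+")


def word_tokens(text):
    """Return the lower-cased word tokens of text, dropping punctuation."""
    return [t.lower() for t in WORD.findall(text)]


if __name__ == "__main__":
    for s in ["######*()", "##$%F", "=====S", "df===$d"]:
        print(repr(s), word_tokens(s))
```

With no token left to look up, an unfiltered search has no index term that could restrict the candidate documents. That is consistent with every document coming back.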

Try passing "exact" as an option to cts:field-value-query:
cts:search(
fn:doc(),
cts:field-value-query("Sample","######*()", ("exact")),
"unfiltered"
)
MarkLogic has an index for exact values to help in cases like this. Note that it is only available when both fast case-sensitive and fast diacritic-sensitive searches are enabled (which you have). I know this works for cts:element-value-query, so I expect it will work for cts:field-value-query as well.

Use the 'exact' option in cts:field-value-query instead.
This requires the fast diacritic- and case-sensitive options, but you already have those enabled.
You can also try xdmp:plan before and after using 'exact' to see the effect on the query plan.
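The toy model below contrasts the two behaviours on the four sample journalTitle values. It rests on three assumptions:

- Token matching uses a tokenizer that keeps only letters and digits.
- An empty token list is treated as "no constraint", which mirrors the result reported in the question.
- 'exact' is modelled as whole-string equality.

It only illustrates the semantics; it is not how MarkLogic evaluates the query.

```python
import re

TITLES = ["Dinesh", "Gayari", "Dixit", "Singla"]  # journalTitle of the 4 sample docs
WORD = re.compile(r"[^\W_]+")


def tokens(text):
    """Lower-cased runs of letters/digits; punctuation is dropped."""
    return [t.lower() for t in WORD.findall(text)]


def token_matches(values, query):
    """Punctuation-insensitive model: compare token lists.
    A query with no tokens constrains nothing, so every value passes."""
    q = tokens(query)
    if not q:
        return list(values)
    return [v for v in values if tokens(v) == q]


def exact_matches(values, query):
    """'exact' modelled as whole-string equality: case, diacritics,
    punctuation and whitespace all count."""
    return [v for v in values if v == query]


if __name__ == "__main__":
    print(token_matches(TITLES, "######*()"))  # all four titles
    print(exact_matches(TITLES, "######*()"))  # []
```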

In the 'tokenizer overrides' option for your field, add these special characters (@, =, #, $, %, ^, *) as words (select 'word').
By default these special characters are treated as punctuation and are not considered for matching. You need to override the default tokenizer so that they are included as words.
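The effect of marking those characters as 'word' can be sketched with a small model tokenizer. The function and its rules are hypothetical and not MarkLogic's tokenizer. Letters and digits always form words, and any character named in an override set is promoted from punctuation to a word character. The search string then yields a token that actually has to match:

```python
import re


def tokenize(text, word_chars=""):
    """Split text into lower-cased word tokens.

    Letters and digits are always word characters; any character listed
    in word_chars is also treated as part of a word (a model of a 'word'
    tokenizer override). Everything else is punctuation and is dropped.
    """
    if word_chars:
        pattern = r"(?:[^\W_]|[" + re.escape(word_chars) + r"])+"
    else:
        pattern = r"[^\W_]+"
    return [t.lower() for t in re.findall(pattern, text)]


OVERRIDES = "=#$%^*"  # special characters from the answer (assumed set)

if __name__ == "__main__":
    print(tokenize("######*()"))             # []
    print(tokenize("######*()", OVERRIDES))  # ['######*']
```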

What output do you expect from cts:element-word-query(xs:QName("journalTitle"), "=====S") against the four sample documents above?

Setting "one character searches" to true in the database configuration resolves the issue for cts:element-word-query.

Related

Check if XML nodes are empty in SQL

Hi, I am new to XML manipulation. My question is whether there is a way to detect that an XML node is empty, like this: <gen:nodeName />
I am able to manipulate single nodes, but I would be interested in an approach like a loop or a recursive function that could save me the manual labor of looking through every single node. I have no idea how to approach this problem, though.
Thanks for the help.
You did not specify the dialect of SQL ([sql] is not enough; please always specify the RDBMS, including its version).
This is for SQL Server, but the semantics should be the same.
DECLARE @xml XML=
N'<root>
<SelfClosing />
<NoContent></NoContent>
<BlankContent> </BlankContent>
<HasContent>blah</HasContent>
<HasContent>other</HasContent>
</root>';
SELECT @xml.query(N'/root/*') AS AnyBelowRoot --All elements
,@xml.query(N'/root/*[text()]') AS AnyWithTextNode --blah and other
,@xml.query(N'/root/*[not(text())]') AS NoText --no text
,@xml.query(N'/root/*[text()="blah"]') AS BlahOnly --blah only
The <SelfClosing /> is semantically the same as <NoContent></NoContent>. There is no difference.
It might be a surprise, but blank (whitespace-only) content is treated as empty too, because by default SQL Server discards insignificant whitespace when it parses XML.
So checking for empty or not empty means checking for the existence of a text() node. You can negate this with not() to find all elements without a text() node.
Interesting: the result for NoText comes back like this (SQL Server):
<SelfClosing />
<NoContent />
<BlankContent />
The three elements are implicitly returned in the shortest format.
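Outside SQL Server, the same check can be written as the loop the question asks about. This is a minimal sketch with Python's standard xml.etree.ElementTree. It assumes "empty" means no child elements and no non-whitespace text, which matches the SQL Server behaviour described above. A prefixed name such as gen:nodeName needs its namespace declared in the document, and ElementTree reports it as {namespace-uri}nodeName.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
<SelfClosing />
<NoContent></NoContent>
<BlankContent> </BlankContent>
<HasContent>blah</HasContent>
<HasContent>other</HasContent>
</root>"""


def empty_elements(xml_text):
    """Return the tags of all descendants that have no child elements
    and no non-whitespace text. iter() walks the whole tree recursively."""
    root = ET.fromstring(xml_text)
    result = []
    for el in root.iter():
        if el is root:
            continue
        has_text = el.text is not None and el.text.strip() != ""
        if len(el) == 0 and not has_text:
            result.append(el.tag)
    return result


if __name__ == "__main__":
    print(empty_elements(SAMPLE))  # ['SelfClosing', 'NoContent', 'BlankContent']
```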

Export SQL XML field to grid

I have something like the following XML in a column of a table:
<?xml version="1.0" encoding="utf-8"?>
<container>
<param name="paramA" value="valueA" />
<param name="paramB" value="valueB" />
...
</container>
I am trying to get the valueB part out of the XML via T-SQL.
So far I am getting the right node, but I cannot figure out how to get the attribute.
select xmlCol.query('/container/param[@name="paramB"]') from LogTable
I figured I could just add /@value to the end, but then SQL Server tells me attributes have to be part of a node. I can find a lot of examples of selecting child nodes' attributes, but nothing on the sibling attributes (if that is the right term).
Any help would be appreciated.
Try using the .value function instead of .query:
SELECT
xmlCol.value('(/container/param[@name="paramB"]/@value)[1]', 'varchar(50)')
FROM
LogTable
The XPath expression could potentially return a list of nodes, so you need to add [1] to that potential list to tell SQL Server to use the first entry (and yes, that list is 1-based, not 0-based). As the second parameter, you need to specify the type the value should be converted to; varchar(50) is just a guess here.
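For comparison, here is the same "attribute of the first matching element" lookup in Python's standard xml.etree.ElementTree, as a minimal sketch outside SQL Server. Its limited XPath support accepts the [@name='...'] predicate. find() returns only the first match, which plays the role of the [1] above. param_value is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample from the question, without the "..." placeholder line.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<container>
<param name="paramA" value="valueA" />
<param name="paramB" value="valueB" />
</container>"""


def param_value(xml_bytes, name):
    """Return the value attribute of the first <param> whose name matches,
    or None if there is no such element."""
    root = ET.fromstring(xml_bytes)
    param = root.find(f"param[@name='{name}']")
    return None if param is None else param.get("value")


if __name__ == "__main__":
    print(param_value(SAMPLE, "paramB"))  # valueB
```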
Depending on the actual structure of your XML, it may be useful to put a view over it to make it easier to consume with 'regular' SQL, e.g.:
CREATE VIEW vwLogTable
AS
SELECT
c.p.value('@name', 'varchar(10)') name,
c.p.value('@value', 'varchar(10)') value
FROM
LogTable
CROSS APPLY xmlCol.nodes('/container/param') c(p)
GO
-- now you can get all values for paramB as...
SELECT value FROM vwLogTable WHERE name = 'paramB'
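The view's idea is to flatten every param into a (name, value) row once and then filter with ordinary conditions. That can be sketched in Python as a generator over the documents. LOG_TABLE is a hypothetical in-memory stand-in for the xmlCol column:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for LogTable.xmlCol: one XML document per row.
LOG_TABLE = [
    b'<container><param name="paramA" value="valueA"/>'
    b'<param name="paramB" value="valueB"/></container>',
    b'<container><param name="paramB" value="valueB2"/></container>',
]


def param_rows(docs):
    """Yield one (name, value) pair per <param>, like the view's CROSS APPLY."""
    for doc in docs:
        for p in ET.fromstring(doc).findall("param"):
            yield p.get("name"), p.get("value")


if __name__ == "__main__":
    # Equivalent of: SELECT value FROM vwLogTable WHERE name = 'paramB'
    print([value for name, value in param_rows(LOG_TABLE) if name == "paramB"])
```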

SQL Date column in xml

StudentID ExamID 09/05/2017 08/05/2017 07/05/2017 06/05/2017 05/05/2017
123 AS12 12
123 AS13 13 23
When converting the above with "FOR XML PATH, ELEMENTS" in a SQL statement, I got this error:
error:Column name '09/05/2017' contains an invalid XML identifier as
required by FOR XML; '2'(0x0032) is the first character at fault.
Is there any way I can get XML in this format:
<row>
<StudentID>123</StudentID>
<LessonID>AS13</LessonID>
<09/05/2017>13</09/05/2017>
<08/05/2017>23</08/05/2017>
<07/05/2017></07/05/2017>
<06/05/2017></06/05/2017>
<05/05/2017></05/05/2017>
</row>
It is a very bad design to store your date-based values in columns of the student table. Whenever you have to add a column in order to add more data, the design is bad. This data should be stored in a related side table, and a PIVOT query can construct this output format whenever you need it.
And: avoid culture-specific date formats!
How should anyone know whether 06/05/2017 is the 6th of May or the 5th of June? Use ISO 8601, like 2017-05-06, which makes it unambiguous that you mean the 6th of May.
About your question: no, this is impossible!
XML does not allow an element name like '05/05/2017'. A name must start with a letter or an underscore (not a digit), and several characters, such as /, are not allowed in names at all.
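The rule can be seen in action with a Python sketch. It combines a simplified, ASCII-only check of the XML name rule with a round trip through the standard-library parser, which rejects the date-named element outright. The real Name production in the XML 1.0 specification is broader than this check. Both helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only subset of the XML Name production: a letter or
# underscore first, then letters, digits, '.', '-' or '_'. (The full rule
# also allows ':' and many non-ASCII characters.)
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_simple_xml_name(name):
    """True if name passes the simplified ASCII name check."""
    return SIMPLE_NAME.match(name) is not None


def parses(xml_text):
    """True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_simple_xml_name("09/05/2017"), parses("<09/05/2017>13</09/05/2017>"))
```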
Try to create your XML similar to this:
<row>
<StudentID>123</StudentID>
<LessonID>AS13</LessonID>
<Marks>
<Mark date="2017-05-09">13</Mark>
<Mark date="2017-05-08">23</Mark>
[... more of them ...]
</Marks>
</row>
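The suggested shape can be generated from the pivoted row with a short Python sketch. It assumes the column headers are day/month/year, the reading the answer uses when it maps 09/05/2017 to 2017-05-09. It also chooses to skip dates that have no mark. build_row is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def to_iso(ddmmyyyy):
    """Convert an assumed dd/mm/yyyy header to ISO 8601 (yyyy-mm-dd)."""
    return datetime.strptime(ddmmyyyy, "%d/%m/%Y").date().isoformat()


def build_row(student_id, lesson_id, marks):
    """Build <row> with one <Mark date="..."> per non-empty date column."""
    row = ET.Element("row")
    ET.SubElement(row, "StudentID").text = str(student_id)
    ET.SubElement(row, "LessonID").text = lesson_id
    marks_el = ET.SubElement(row, "Marks")
    for date_text, mark in marks.items():
        if mark is None:
            continue  # design choice: omit empty dates instead of emitting blanks
        ET.SubElement(marks_el, "Mark", date=to_iso(date_text)).text = str(mark)
    return row


if __name__ == "__main__":
    row = build_row(123, "AS13", {"09/05/2017": 13, "08/05/2017": 23, "07/05/2017": None})
    print(ET.tostring(row, encoding="unicode"))
```

Keeping the date in an attribute rather than in the element name sidesteps the naming rule entirely. The element names stay fixed no matter how many dates there are.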
This error comes down to XML's rules for names. Inside the angle brackets (< >) the slash (/) is a special character that marks a closing or self-closing tag, and a name may not start with a digit. The XML processor therefore rejects the column name as an element name and throws the error.
Additionally, you may want to consider how you model your objects in XML. The top-level group is the class: the class has many students, the students take many lessons, and each lesson has a grade (or, in this case, it looks like a lesson has many grades, which is not shown here).
<CLASS>
<STUDENT>
<StudentID>123</StudentID>
<LESSON>
<LessonID>AS12</LessonID>
<DATE>09/05/2017</DATE>
<GRADE>93.00</GRADE>
</LESSON>
<LESSON>
<LessonID>AS12</LessonID>
<DATE>08/05/2017</DATE>
<GRADE>93.00</GRADE>
</LESSON>
</STUDENT>
<STUDENT>
...
</STUDENT>
</CLASS>

How to query database using saxon sql:query where db.id='value of xml attribute'

I have a requirement where I need to query a database using Saxon sql:query with a where clause, where database_table.ProductID should match the productId from the incoming XML input.
Here is what I tried so far:
<sql:query connection="$sql.conn" table="table_name" column="Product_ID" row-tag="row" column-tag="col" where="Product_ID="<xsl:value-of select="ProductItem/ProductItemId/text()"/>"" />
I am getting the following exception:
SXXP0003: Error reported by XML parser: Element type "sql:query" must be followed by either attribute specifications, ">" or "/>".
I am finding it difficult to format the where clause in XPath. Can anyone suggest the correct format? Thanks in advance.
Try using an attribute value template, where you put the XPath expression in curly braces {...}. Use single quotes around the SQL string literal, because double quotes inside a double-quoted attribute end the attribute early:
<sql:query connection="$sql.conn" table="table_name" column="Product_ID" row-tag="row" column-tag="col" where="Product_ID='{ProductItem/ProductItemId}'"/>
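Any XML parser will show why the quoting matters. This minimal Python sketch parses an element with double quotes nested inside the where attribute and one with single quotes. The sql namespace URI is a placeholder, not Saxon's real one; it is there only so that the prefix is declared. The {...} stays literal text because no XSLT processor evaluates it here.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption), only so the sql: prefix is declared.
NS = 'xmlns:sql="urn:example:sql"'

# Double quotes inside a double-quoted attribute end the attribute early.
BROKEN = '<sql:query ' + NS + ' table="table_name" where="Product_ID="{ProductItem/ProductItemId}""/>'
# Single quotes inside a double-quoted attribute are plain characters.
FIXED = '<sql:query ' + NS + ' table="table_name" where="Product_ID=\'{ProductItem/ProductItemId}\'"/>'


def where_attribute(xml_text):
    """Return the raw where attribute, or None if the text is not well-formed."""
    try:
        return ET.fromstring(xml_text).get("where")
    except ET.ParseError:
        return None


if __name__ == "__main__":
    print(where_attribute(BROKEN))  # None
    print(where_attribute(FIXED))   # Product_ID='{ProductItem/ProductItemId}'
```

At run time the XSLT processor replaces the braces with the string value of ProductItem/ProductItemId, so the database receives a where clause with the actual product ID inside the single quotes.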

SQL and escaped XML data

I have a table with a mix of escaped and non-escaped XML. Of course, the data I need is escaped. For example, I have:
<Root>
<InternalData>
<Node>
&lt;ArrayOfComment&gt;
&lt;Comment&gt
&lt;SequenceNo&gt;1&lt;/SequenceNo&gt;
&lt;IsDeleted&gt;false&lt;/IsDeleted&gt;
&lt;TakenByCode&gt;397&lt;/TakenByCode&gt;
&lt;/Comment&gt
&lt;/ArrayOfComment&gt;
</Node>
</InternalData>
</Root>
As you can see, the data in the Node tag is all escaped. I can use a query to obtain the Node data, but how can I convert it to XML in SQL so that it can be parsed and broken up? I'm pretty new to using XML in SQL, and I can't seem to find any examples of this.
Thanks
You have not given enough information about your end goal, but this will get you very close. FYI: your sample was missing two semicolons, both after Comment&gt.
declare @xml xml
set @xml = '
<Root>
<InternalData>
<Node>
&lt;ArrayOfComment&gt;
&lt;Comment&gt;
&lt;SequenceNo&gt;1&lt;/SequenceNo&gt;
&lt;IsDeleted&gt;false&lt;/IsDeleted&gt;
&lt;TakenByCode&gt;397&lt;/TakenByCode&gt;
&lt;/Comment&gt;
&lt;/ArrayOfComment&gt;
</Node>
</InternalData>
</Root>
'
select convert(xml, n.c.value('.', 'varchar(max)'))
from @xml.nodes('Root/InternalData/Node/text()') n(c)
Output
<ArrayOfComment>
<Comment>
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment>
</ArrayOfComment>
The result is an XML column that you can put into a variable, or CROSS APPLY into directly to get data from the XML fragment.
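The same two-step decode can be sketched in Python with the standard xml.etree.ElementTree, as an illustration outside SQL Server. It uses the question's sample data, written with proper semicolons. The first parse turns &lt; and &gt; inside Node back into < and >, so Node's text is itself an XML document that a second parse can break up. inner_fragment is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Root>
<InternalData>
<Node>
&lt;ArrayOfComment&gt;
&lt;Comment&gt;
&lt;SequenceNo&gt;1&lt;/SequenceNo&gt;
&lt;IsDeleted&gt;false&lt;/IsDeleted&gt;
&lt;TakenByCode&gt;397&lt;/TakenByCode&gt;
&lt;/Comment&gt;
&lt;/ArrayOfComment&gt;
</Node>
</InternalData>
</Root>"""


def inner_fragment(xml_text):
    """Parse the outer document, then parse Node's (already unescaped) text."""
    outer = ET.fromstring(xml_text)
    node = outer.find("InternalData/Node")
    # The first parse decoded the entities, so node.text is plain XML markup.
    return ET.fromstring(node.text.strip())


if __name__ == "__main__":
    print(inner_fragment(SAMPLE).find("Comment/TakenByCode").text)  # 397
```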
Your best bet might be to look into an HTML-decoding UDF. I did a quick search and found this one:
http://www.andreabertolotto.net/Articles/HTMLDecodeUDF.aspx
You may want to modify it so that it only decodes &gt; and &lt;. The one above seems to go above and beyond your needs.
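If you take the decoding route, the "only &lt; and &gt;" variant is small. Below is a Python sketch of that idea. It is not the linked UDF, and decode_lt_gt is a hypothetical name. It also tolerates the missing semicolons in the question's data and leaves every other entity untouched:

```python
import re
import xml.etree.ElementTree as ET

# &lt / &gt with or without the trailing semicolon; other entities untouched.
LT_GT = re.compile(r"&(lt|gt);?")


def decode_lt_gt(text):
    """Replace &lt; and &gt; (semicolon optional) with < and >."""
    return LT_GT.sub(lambda m: "<" if m.group(1) == "lt" else ">", text)


if __name__ == "__main__":
    raw = "&lt;Comment&gt&lt;SequenceNo&gt;1&lt;/SequenceNo&gt;&lt;/Comment&gt"
    decoded = decode_lt_gt(raw)
    print(decoded)
    print(ET.fromstring(decoded).findtext("SequenceNo"))
```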
UPDATE
@Cyberkiwi's solution seems to be a bit cleaner. I will leave this up in case the version of SQL Server you are running doesn't support his solution.