Can I use Oracle 11g's updatexml function to insert a value into an element without also adding the namespace to the tag that I'm inserting? - sql

Let's say that I have many XML documents similar to the following XML document:
<a xmlns="urn:www.someSite.com/myModel">
<b>
<c>my value</c>
</b>
<d>
<e>my other value</e>
<f>even more other values</f>
</d>
</a>
And I want the document instead to look like this:
<a xmlns="urn:www.someSite.com/myModel">
<b>
<e>my other value</e>
<f>even more other values</f>
</b>
<d>
<e>my other value</e>
<f>even more other values</f>
</d>
</a>
For the sake of argument, let's say these documents are in the table MY_TABLE, in column XML_DOCUMENT_COLUMN. For the sake of simplicity there's also a PRIMARY KEY column on the table called MY_TABLE_KEY.
Using Oracle 11g's UpdateXML function, I can write the following query:
update MY_TABLE mt
set mt.XML_DOCUMENT_COLUMN = (
select updatexml(
mt.XML_DOCUMENT_COLUMN,
'/a/b',
'<b>' || extract(
mt1.XML_DOCUMENT_COLUMN,
'/a/d/*',
'xmlns="urn:www.someSite.com/myModel"'
).getStringVal() || '</b>',
'xmlns="urn:www.someSite.com/myModel"')
from MY_TABLE mt1
where mt1.MY_TABLE_KEY = mt.MY_TABLE_KEY
)
Long story short, what this says is "update the column with this altered version of the column where the key is the same."
The problem is that this always returns a document that looks like this:
<a xmlns="urn:www.someSite.com/myModel">
<b xmlns="urn:www.someSite.com/myModel">
<e>my other value</e>
<f>even more other values</f>
</b>
<d>
<e>my other value</e>
<f>even more other values</f>
</d>
</a>
Now, for various reasons, I'm not sure that some other code won't break because of a new xmlns declaration in the middle of the document, so I would really like to remove that before running this on my database. I thought the answer would be to just drop the namespace argument in the UpdateXML call, but after running that as a query, I found that no documents in the database changed.
So finally, the question is - how do I update the document to look like the second document, can I do it with one query, and what does that query look like? Bonus round: what am I doing wrong, and what does the xmlns argument to the UpdateXML function do?
Note: The XML documents and XPaths have been changed to protect guilty parties, and to obscure the project that I'm working on. As such, some of this information may not be accurate, as I'm trying to cross-compile XPaths in my head. I'll change the question to provide any clarification that I can.

Related

Removing HTML Tags from Big Query

I have a column in my table which stores a paragraph like the one below:
<p><img src="https://mywebsite.com/medias/NH2xcoUOfANfFb6l4xNgOFch3dc4TvoX2XBnI6to.jpg" alt="" width="250" height="33"></p><p><span style="font-size: 16pt; font-family: Mali, cursive; font-weight: 500;">My beautiful text is here. Show me without tags, please.</span> </p>
I want to remove all the HTML tags and, if possible, replace the HTML image with the text (Image).
So my expected output would be like below:
(Image) My beautiful text is here. Show me without tags, please.
OR just
My beautiful text is here. Show me without tags, please.
Thank you so much.
Try the naive approach below:
select html,
regexp_replace(
regexp_replace(
regexp_replace(html,
r'<img [^<>]*>', r'(Image) '),
r'(&)([^&;]*)(;)', r'<\2>'
),r'\<[^<>]*\>', ''
) as text
from your_table
Applied to the sample data in your question, the output is:
(Image) My beautiful text is here. Show me without tags, please.
As you can see, the first step replaces the image tag with the text (Image). The second step handles HTML entities by enclosing their names in <...>, so that, for example, &nbsp; becomes <nbsp>. The final step removes everything between and including < and >.
Note: the above is a simplistic approach and might not work for more complex HTML.

How to do a full text search, when content has HTML tags?

I want to make a full-text search using text content in HTML format. E.g.:
... f<em>oo</em> ...
If the search term is 'foo', the document containing "foo" will not be found.
How can I make this work?
I'm using PostgreSQL.
I'm afraid you won't find anything out of the box, but you could extract the text from your html column, e.g. using regular expressions or xpath (designed for XML):
CREATE TABLE t (html text);
INSERT INTO t VALUES ('<html>
<h1>
<foo>test f<em>oo</em> bar</foo>
</h1>
<h1>
<foo>test bar</foo>
</h1>
</html>');
SELECT * FROM t
WHERE to_tsvector(regexp_replace(html,'<.+?>','','g')) @@ plainto_tsquery('test foo');
html
--------------------------------------
<html> +
<h1> +
<foo>test f<em>oo</em> bar</foo>+
</h1> +
<h1> +
<foo>test bar</foo> +
</h1> +
</html>
Keep in mind that creating the tsvector at query time will make things quite slow. So, if you decide to go this way, consider creating a new column for it and then creating a GIN index on that column, e.g.
CREATE INDEX idx_html_tsvector ON t USING gin(the_new_column);
Demo: db<>fiddle
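One way to set up the extra column, as a sketch only: assuming PostgreSQL 12 or later (for generated columns), the tsvector can be derived from the html column so that it stays in sync automatically. The column name html_tsv, the index name idx_t_html_tsv, and the 'english' text search configuration are assumptions for illustration, not part of the original answer.

```sql
-- Store the tag-stripped tsvector alongside the raw HTML (PostgreSQL 12+).
ALTER TABLE t
  ADD COLUMN html_tsv tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', regexp_replace(html, '<[^>]+>', '', 'g'))
  ) STORED;

-- Index the stored tsvector so searches do not re-parse the HTML.
CREATE INDEX idx_t_html_tsv ON t USING gin (html_tsv);

-- Query the indexed column instead of computing to_tsvector per row.
SELECT * FROM t
WHERE html_tsv @@ plainto_tsquery('english', 'test foo');
```

The configuration is passed explicitly because a generated column needs an immutable expression, and the one-argument form of to_tsvector depends on a session setting.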
The PostgreSQL text parser will recognize tags but considers them as ending words, so it will parse that as 'f' and 'oo'. This cannot be changed without hacking the C code. You could implement your own parser, but again in C, and doing so is not easy. Your best bet is probably to pre-process your text with something else to remove the tag, making sure that closes up the gap to give 'foo', not 'f oo'.

mysql delete text starting and ending with specific words

I have a WordPress site and I want to remove text from the post_content field of the wp_posts table.
Example of the text I want to remove:
<img class="aligncenter wp-image-41174 size-full" src="https://www.ritavpn.com/blog/wp-content/uploads/2019/12/Best-APK-Download-Sites-for-2020.png" alt="" width="214" height="57" data-wp-pid="41174" />
Note: http://www.mediafire.com and https://www.ritavpn.com/blog/wp-content/uploads/2019/12/Best-APK-Download-Sites-for-2020.png
are dynamic; they are not the same links in all the posts!
So what I'm looking for is a query to remove the text that starts with
<a href=
and ends with
</a>
Run this query in MySQL:
UPDATE wp_posts SET post_content = replace(post_content,'string or link','');
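Because the links differ from post to post, a literal replace() can only remove one known string at a time. A minimal sketch of a pattern-based alternative, assuming MySQL 8.0 or later (where REGEXP_REPLACE is available) and that the text lives in wp_posts.post_content as the question states. Back up the table before running the UPDATE.

```sql
-- Preview what the cleaned content would look like before changing anything.
SELECT ID,
       REGEXP_REPLACE(post_content, '<a href=.*?</a>', '', 1, 0, 'n') AS cleaned
FROM wp_posts
WHERE post_content REGEXP '<a href=';

-- Remove every span that starts with <a href= and ends at the nearest </a>.
-- 1 = start position, 0 = replace all occurrences, 'n' = let . match newlines.
UPDATE wp_posts
SET post_content = REGEXP_REPLACE(post_content, '<a href=.*?</a>', '', 1, 0, 'n')
WHERE post_content REGEXP '<a href=';
```

The lazy quantifier .*? stops at the first closing </a>, so two separate links in one post are removed individually rather than together with the text between them.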

Get attributes of a empty child node

I want to transform some old HTML4 code so that all tags are closed, in order to achieve compatibility with WCAG.
I have a lot of anchors without content, used as internal links, like
<h3> <a id="julio" name="julio"></a>Julio 2015</h3>
For reasons I do not understand, some browsers do not understand the self-closed tag, and interpret it as the start of an anchor (without an end):
<h3> <a id="julio" name="julio"/>Julio 2015</h3>
So I want to delete the empty anchors and move their attributes to the parent tag:
<h3 id="julio" name="julio">Julio 2015</h3>
The tool I have to use only supports XSLT 1.0.
How can I achieve this?

XPath to search for elements in a sequence

With xml like:
<a>
<b>
<c>1</c>
<c>2</c>
<c>3</c>
</b>
</a>
I'm trying to create an xpath expression (for a postgresql query) that will return if is a particular value and not that is all three values. What I currently have (which does not work) is:
select * from someTable where xpath ('//uim:a/text()', job, ARRAY[ ARRAY['uim','http://www.cmpy.com/uim'] ])::text[] IN (ARRAY['1','3']);
If I try with ARRAY['1'] this will not return any values but with ARRAY['1','2','3'] it will return all three.
How can I select based on a single element in a sequence?
Thanks.
If you're asking how to get the value of one or more XML elements within your XML segment, the easiest way is likely to use a custom SQL CLR library and do the XPath analysis inside it to assemble and return whatever information you need. At least that would be my approach.
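Staying within PostgreSQL, a likely explanation for the behavior in the question is that x IN (ARRAY['1','3']) compares the whole array returned by xpath() for equality, which is why only ARRAY['1','2','3'] matches. Testing individual elements needs = ANY or the array containment operator @>. The sketch below reuses someTable, job, and the uim namespace from the question; the path //uim:c/text() is an assumption based on the sample XML, and job is assumed to be of type xml.

```sql
-- Rows where at least one <c> equals '1'.
SELECT *
FROM someTable
WHERE '1' = ANY (
    xpath('//uim:c/text()', job,
          ARRAY[ARRAY['uim', 'http://www.cmpy.com/uim']])::text[]
);

-- Rows whose <c> values include both '1' and '3', whatever else is present.
SELECT *
FROM someTable
WHERE xpath('//uim:c/text()', job,
            ARRAY[ARRAY['uim', 'http://www.cmpy.com/uim']])::text[]
      @> ARRAY['1', '3'];
```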