Better way to remove HTML tags in Oracle SQL

I have a comments column, and the comments added to a release are stored in it as rich text. I'm now trying to process this data into human-readable output. Below are two sample comment values from my Oracle database that I'm trying to process.
Example 1 :
<html>
<body>
<div align="left"><font face="Arial Unicode MS"><span style="font-size:8pt">Display the frulog on the count values</span></font></div>
</body>
</html>
Example 2: <not implemented in this release>
I used the query below to strip the HTML tags:
SELECT REGEXP_REPLACE(comments, '<.+?>') FROM test_table;
Note: assume the values in Example 1 and Example 2 are stored in the comments column used in the query above.
The result for Example 1 was Display the frulog on the count values, which is what I expect. The result for Example 2 was ''. The value in Example 2 is not an HTML tag, but it was removed anyway. How can I make the replace smarter?
Suggestions are welcome.

SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE comments ( value ) AS
SELECT '<html>
<body>
<div align="left">
<font face="Arial Unicode MS">
<span style="font-size:8pt">
Display the frulog on the count values
</span>
</font>
</div>
</body>
</html>' FROM DUAL UNION ALL
SELECT '<not implemented in this release>' FROM DUAL UNION ALL
SELECT '<test a="1"
b=''2''
c = 3
d
e = ">" >test</test>' FROM DUAL;
Query 1:
SELECT value,
REGEXP_REPLACE(
value,
'\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*',
NULL,
1,
0,
'im'
) AS replaced
FROM comments
Results:
| VALUE | REPLACED |
|------------------------------------------|----------------------------------------|
| <html> | Display the frulog on the count values |
| <body> | |
| <div align="left"> | |
| <font face="Arial Unicode MS"> | |
| <span style="font-size:8pt"> | |
| Display the frulog on the count values | |
| </span> | |
| </font> | |
| </div> | |
| </body> | |
| </html> | |
|------------------------------------------|----------------------------------------|
| <not implemented in this release> | (null) |
|------------------------------------------|----------------------------------------|
| <test a="1" | test |
| b='2' | |
| c = 3 | |
| d | |
| e = ">" >test</test> | |
Note: <not implemented in this release> is syntactically a well-formed HTML start tag, with tag name not and attributes implemented, in, this and release, so a generic tag pattern cannot tell it apart from real markup.
If you only want to replace specific HTML elements, list them at the start of the regular expression:
\s*</?(a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|command|content|data|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|element|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h[1-6]|head|header|hgroup|hr|html|i|iframe|image|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|multicol|nav|nextid|nobr|noembed|noframes|noscript|object|ol|optgroup|option|output|p|param|picture|plaintext|pre|progress|q|rp|rt|rtc|ruby|s|samp|script|section|select|shadow|slot|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|template|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*
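The whitelist idea can be tried outside the database as well. The following is a minimal sketch in Python rather than Oracle SQL, using the standard re module; the shortened KNOWN_TAGS list and the strip_known_tags helper are assumptions made for illustration, not part of the answer above. It shows that known elements are removed while the bracketed plain-text comment is left alone.

```python
import re

# Hypothetical, shortened whitelist of tag names; extend it as needed.
KNOWN_TAGS = ["html", "body", "div", "font", "span", "p", "br", "b", "i"]

# Try longer names first so that "body" is attempted before "b".
NAMES = "|".join(sorted(KNOWN_TAGS, key=len, reverse=True))

# One attribute: a name, optionally followed by = and a quoted or bare value.
ATTRIBUTE = r"""\s+\w+(?:\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?"""

TAG_RE = re.compile(
    r"\s*</?(?:" + NAMES + r")(?:(?:" + ATTRIBUTE + r")+\s*|\s*)/?>\s*",
    re.IGNORECASE,
)


def strip_known_tags(text):
    """Remove whitelisted tags together with the whitespace around them."""
    return TAG_RE.sub("", text)


example_1 = (
    "<html>\n<body>\n"
    '<div align="left"><font face="Arial Unicode MS">'
    '<span style="font-size:8pt">Display the frulog on the count values'
    "</span></font></div>\n</body>\n</html>"
)
example_2 = "<not implemented in this release>"

print(strip_known_tags(example_1))
print(strip_known_tags(example_2))
```

As in the SQL version, whitespace next to a removed tag is swallowed too, so text separated only by tags and spaces will be joined together.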

Related

How can I select all duplicate rows without grouping them, so that each one is displayed individually?

I have a table in which each text item is stored in a text column, together with a unique identifier and a language.
SELECT Text, Language, COUNT(*)
FROM TableA
WHERE Language = 'English'
GROUP BY Text, Language
HAVING COUNT(*) > 1
This query gives me the data I need, but the results are grouped, so they display as:
| Text | Language | Amount Counted |
|------------|----------|-----------------|
| Hello Text | English | 5 |
I can count the duplicates by text, but I cannot figure out how to include the unique identifier and list every duplicated row individually. For example, the text 'Hello' could appear in the table 5 times, and it would be listed as above. However, each occurrence of 'Hello' has a different ID value: perhaps the first is ID 232 and the second is ID 546. How can I add the ID value, which is in the same table, and list all the duplicates with their IDs?
So I would get, as an example:
| Text | Language | ID |
|----------------|----------|------|
| Hello Text | English | 232 |
| Hello Text | English | 546 |
| Hello Text | English | 643 |
| Hello Text | English | 745 |
| Hello Text | English | 1353 |
| Other Text | English | 343 |
| Other Text | English | 433 |
| Different Text | English | 433 |
| Different Text | English | 437 |
| Different Text | English | 563 |
| Different Text | English | 898 |
Do you just want a window function?
SELECT text, language, id
FROM (SELECT a.*, COUNT(*) OVER (PARTITION BY Text) as cnt
FROM TableA a
WHERE Language = 'English'
) a
WHERE cnt > 1
ORDER BY id;
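To see the window-function approach run end to end, here is a small sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table layout and sample rows are assumptions modeled on the question, and the ordering is by text and then id so that duplicates appear next to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per text item with its own id and a language.
conn.execute(
    "CREATE TABLE TableA (id INTEGER PRIMARY KEY, text TEXT, language TEXT)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [
        (232, "Hello Text", "English"),
        (546, "Hello Text", "English"),
        (343, "Other Text", "English"),
        (433, "Other Text", "English"),
        (900, "Unique Text", "English"),  # appears once, so it is filtered out
        (901, "Hello Text", "German"),    # other language, excluded by WHERE
    ],
)

query = """
SELECT text, language, id
FROM (SELECT a.*, COUNT(*) OVER (PARTITION BY text) AS cnt
      FROM TableA a
      WHERE language = 'English') a
WHERE cnt > 1
ORDER BY text, id
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

The count is computed per text value without collapsing the rows, which is what lets each id survive into the output.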

If a value in the "IN" list has no matching row, show it with a count of 0

I have the following result set for my query (excluding some select statements, joins, and where clauses, because I just need the general method for accomplishing this):
Select *
From hsi.itemdata
Where hsi.itemdata.itemtypenum in ('965','502','530','336','513','506','507','514','515','516')
ItemTypeNum | ItemType | DocTypeCount
502 | Consultation Report | 4
506 | Discharge Summary | 10
336 | ED Nurse Notes | 2
513 | ED Provider Notes | 8
514 | History and Physical | 15
I want it to show every value from the IN list, even those with no matching row, with a count of 0. Like so...
ItemTypeNum | ItemType | DocTypeCount
502 | Consultation Report | 4
506 | Discharge Summary | 10
336 | ED Nurse Notes | 2
513 | ED Provider Notes | 8
514 | History and Physical | 15
515 | *Appropriate Value* | 0
516 | *Appropriate Value* | 0
530 | *Appropriate Value* | 0
507 | *Appropriate Value* | 0
965 | *Appropriate Value* | 0
In practice, you would use left join. The exact syntax depends on the database. Here is a version that works in SQL Server and Postgres, for instance:
Select v.itemtypenum,
coalesce(id.ItemType, '*Appropriate Value*') as ItemType,
coalesce(id.DocTypeCount, 0) as DocTypeCount
From (values ('965'), ('502'), ('530'), ('336'), ('513'), ('506'), ('507'), ('514'), ('515'), ('516')
) v(itemtypenum) left join
hsi.itemdata id
on id.itemtypenum = v.itemtypenum;
Note: although not all databases support values in the from clause, almost all have some mechanism for creating a table on the fly. The idea is the same, but the syntax would be a bit different.
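As an illustration of the left-join idea in yet another engine, here is a sketch using Python's sqlite3 module, where the list of wanted values is built on the fly with a CTE over VALUES. The simplified itemdata table (no hsi schema, integer item type numbers, a ready-made DocTypeCount column) and the wanted name are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified stand-in for hsi.itemdata.
conn.execute(
    "CREATE TABLE itemdata (itemtypenum INTEGER, ItemType TEXT, DocTypeCount INTEGER)"
)
conn.executemany(
    "INSERT INTO itemdata VALUES (?, ?, ?)",
    [
        (502, "Consultation Report", 4),
        (506, "Discharge Summary", 10),
        (336, "ED Nurse Notes", 2),
    ],
)

query = """
WITH wanted(itemtypenum) AS (VALUES (502), (506), (336), (515), (965))
SELECT w.itemtypenum,
       COALESCE(d.ItemType, '*Appropriate Value*') AS ItemType,
       COALESCE(d.DocTypeCount, 0) AS DocTypeCount
FROM wanted w
LEFT JOIN itemdata d ON d.itemtypenum = w.itemtypenum
ORDER BY w.itemtypenum
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every value in the list produces a row because the list is the left side of the join; unmatched values get the placeholder name and a count of 0.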
One way is by using CASE in your select for DocTypeCount:
SELECT ...
CASE
WHEN hsi.itemdata.itemtypenum IN ('965','502','530','336','513','506','507','514','515','516')
THEN DocTypeCount
ELSE 0
END AS DocTypeCount
FROM hsi.itemdata
Of course, this may get more complicated depending on your actual query and how you're getting these columns.

Change value from one column into another

I have got a table:
ID | Description
--------------------
1.13.1-3 | .1 Hello
1.13.1-3 | .2 World
1.13.1-3 | .3 Text
4.54.1-4 | sthg (.1) Ble
4.54.1-4 | sthg (.2) Bla
4.54.1-4 | aaaa (.3) Qwer
4.54.1-4 | bbbb (.4) Tyuio
I would like to change the ending of ID by taking the value from the second column, to get a result like:
ID | Description
--------------------
1.13.1 | Hello
1.13.2 | World
1.13.3 | Text
4.54.1 | Ble
4.54.2 | Bla
4.54.3 | Qwer
4.54.4 | Tyuio
Is there any quick way to do it in PostgreSQL?
Use regex to manipulate the strings into what you want:
update mytable set
ID = regexp_replace(ID, '\.[^.]*$', '') || substring(Description from '\.[0-9]+'),
Description = regexp_replace(Description, '.*\.[0-9]+\S* ', '')
See SQLFiddle showing this query working with your data.
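The two string manipulations can be checked in isolation. The sketch below mirrors them with Python's re module instead of PostgreSQL; the renumber helper is a hypothetical name, and the suffix pattern is written as \.[0-9]+ so that multi-digit suffixes are captured whole.

```python
import re


def renumber(id_value, description):
    """Return (new_id, new_description) for one row of the table."""
    # First ".<digits>" in the description, e.g. ".1" or ".4".
    suffix = re.search(r"\.[0-9]+", description).group(0)
    # Drop the last dot-separated part of the ID, then append the suffix.
    new_id = re.sub(r"\.[^.]*$", "", id_value) + suffix
    # Remove everything up to and including the suffix and the space after it.
    new_description = re.sub(r".*\.[0-9]+\S* ", "", description)
    return new_id, new_description


print(renumber("1.13.1-3", ".1 Hello"))
print(renumber("4.54.1-4", "bbbb (.4) Tyuio"))
```

Rows whose description contains no ".<digits>" part would raise an error here, so real data may need a guard for that case.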

Transforming XML in the XML data type in SQL Server

I have a table called tags, which looks like this:
--------------------
|TagId | TagName |
--------------------
| 1 | Travel |
--------------------
| 2 | Gadgets |
--------------------
| 3 | Hobbies |
--------------------
| 4 | Movies |
--------------------
And I have another table, which has an XML data type column called Tags.
-------------------------------------------------------------------------
|PostId | Title | Tags |
-------------------------------------------------------------------------
| 1 | Blog Post 1 | <xml><tags><tag>1</tag/><tag>2</tag></tags>|
--------------------------------------------------------------------------
| 2 | Blog Post 2 | <xml><tags><tag>2</tag/><tag>3</tag></tags>|
--------------------------------------------------------------------------
| 3 | Blog Post 3 | <xml><tags><tag>3</tag/><tag>4</tag></tags>|
--------------------------------------------------------------------------
I want to combine the data from these two tables to create a single view that looks like this. The number inside each tag node should act as a foreign key to the tags table.
---------------------------------------------------------------------------
| Title | Tags |
---------------------------------------------------------------------------
| Blog Post 1 | <xml><tags><tag>Travel</tag/><tag>Gadgets</tag></tags> |
----------------------------------------------------------------------------
| Blog Post 2 | <xml><tags><tag>Gadgets</tag/><tag>Hobbies</tag></tags> |
----------------------------------------------------------------------------
| Blog Post 3 | <xml><tags><tag>Hobbies</tag/><tag>Movies</tag></tags> |
----------------------------------------------------------------------------
Is it possible to create a view like this? How do I do that?
Something like this should do the trick:
/* SQL Follows */
select
postID, title,
tags,
(
select
/* 4. Retrieve the tag name */
y.tag.value('.', 'int') tagID,
t.tagName
from
/* 2. Shred the XML into nodes */
p.tags.nodes('/xml/tags/tag') as y(tag)
/* 3. Join the tag ID onto the tags table. */
inner join #tags t on t.tagID = y.tag.value('.', 'int')
for
/* 5. Convert it into XML */
xml path('tag'), type
)tags2
/* 1. For each post */
from #posts p
I've used temp tables #tags and #posts in this example. To get the exact output you are after you will need to tweak the XML a little.
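To make the intended transformation concrete, here is a sketch of the same shred, look up, and rebuild steps done outside SQL Server with Python's xml.etree.ElementTree. This is a different technique from the query above, shown only to illustrate the target output; the TAG_NAMES dictionary stands in for the tags table, resolve_tags is a hypothetical helper, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Stand-in for the tags table: TagId -> TagName.
TAG_NAMES = {1: "Travel", 2: "Gadgets", 3: "Hobbies", 4: "Movies"}


def resolve_tags(xml_text):
    """Replace the numeric ID inside each <tag> node with its tag name."""
    root = ET.fromstring(xml_text)
    for tag in root.iter("tag"):
        tag.text = TAG_NAMES[int(tag.text)]
    return ET.tostring(root, encoding="unicode")


post_1 = "<xml><tags><tag>1</tag><tag>2</tag></tags></xml>"
print(resolve_tags(post_1))
```

An ID that is missing from TAG_NAMES raises a KeyError in this sketch, which roughly corresponds to the inner join dropping unmatched tags in the SQL version.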
Well, I won't provide a full solution because I don't know what your data looks like. I'm sure it isn't as simple as what you showed, but I think I can point you in the right direction.
I would do a replace on the full string, looking for the whole tag element containing the ID (don't look for just 1, because you could also have 11, for example, and that would fail), and replace it with a select where TagId = ID:
select replace('<xml><tags><tag>1</tag/><tag>2</tag></tags>','<tag>1</tag/>',(select TagName from tags where TagId=1))
Result:
<xml><tags>Travel<tag>2</tag></tags>
Of course, you would need to do this for each tag, and it is up to you how. You could loop over the tags table or, if you have just a few tags, run the replace a few times.

How do I populate a drop down menu from a database without repeating the data?

I am trying to create a search field that is a dynamically populated drop down menu. The code to query the database and populate the menu is as follows:
<form action="<?=$_SERVER['PHP_SELF'];?>" method="post" name="nameSearch">
<label for="nameMenu">
<p>Choose a name from the drop down menu<br />
to retrieve all the data for that person.</p>
</label>
<select name="dateMenu" id="dateMenu">
<option value="0" selected="selected">Choose a name</option>
<?php
$sql_search = "SELECT id, name from $tbl_name"; // double quotes so $tbl_name is interpolated
$query_search = mysql_query($sql_search);
while($row = mysql_fetch_array($query_search))
{
echo '<option value="' . $row['id'] . '">' . htmlspecialchars($row['name']) . '</option>';
}
?>
</select>
<input type="submit" name="searchName" value="Find">
</form>
The problem is that the database stores the same name repeatedly, so I get the same name listed several times as an option in my drop down menu.
The DB looks like so:
----------------------------------------------------------
| id | name | uniqueImage | date | time |
----------------------------------------------------------
| 1 | John | uniqueImage001.png | 2011-03-11 | 14:21:20 |
| 2 | James| uniqueImage002.png | 2011-03-11 | 14:24:30 |
| 3 | Joe | uniqueImage003.png | 2011-03-11 | 14:26:10 |
| 4 | John | uniqueImage004.png | 2011-03-11 | 14:40:10 |
| 5 | Joe | uniqueImage005.png | 2011-03-11 | 14:56:32 |
| 6 | Joe | uniqueImage006.png | 2011-03-11 | 15:02:50 |
| 7 | James| uniqueImage007.png | 2011-03-11 | 15:21:25 |
| 8 | John | uniqueImage008.png | 2011-03-11 | 15:26:30 |
----------------------------------------------------------
and the menu options returned look like so:
<option value="1">John</option>
<option value="2">James</option>
<option value="3">Joe</option>
<option value="4">John</option>
<option value="5">Joe</option>
<option value="6">Joe</option>
<option value="7">James</option>
<option value="8">John</option>
What I need is for the drop down menu to be more like this:
<option value="1">John</option>
<option value="2">James</option>
<option value="3">Joe</option>
This way I can get all of the unique images associated with a particular user. I don't need every name in the DB returned in my query, simply each name only one time. I am thinking I may need to set up a relational DB scheme, but I am trying to avoid this. Is there any other way to set this up? Thanks.
Change your SQL to SELECT DISTINCT name from $tbl_name. Leave id out of the DISTINCT list: every id is unique, so including it would still return every row.
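Change your database call to group by name only and take the lowest id for each name (grouping by id as well would keep every row, because each id is unique):
SELECT MIN(id) AS id, name from $tbl_name group by name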
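Because every id in the sample table is unique, one way to get each name exactly once is to group by name alone and keep the smallest id per name. The sketch below tries this with Python's sqlite3 module rather than PHP and MySQL; the people table name and the first_id alias are assumptions for the example, and only the id and name columns from the question are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table name; only the two columns relevant to the menu are included.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        (1, "John"), (2, "James"), (3, "Joe"), (4, "John"),
        (5, "Joe"), (6, "Joe"), (7, "James"), (8, "John"),
    ],
)

query = """
SELECT MIN(id) AS first_id, name
FROM people
GROUP BY name
ORDER BY first_id
"""
options = conn.execute(query).fetchall()
for first_id, name in options:
    print('<option value="%d">%s</option>' % (first_id, name))
```

The search that runs after the form is submitted would then need to look rows up by name rather than by that single id, since the id only identifies the first row for each person.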