Modify a column, to get rid of html surrounding an ID - sql

I have a table and one of the columns contains html for an iFrame & within it an external video, specifically it's like
<iframe src="http://host.com/videos/ID" otherattributes...></iframe>.
I need to update the current column or create a new one (doesn't matter) so what I have is just the ID of that video, I know I could use a regex for it but I'm really weak with it.
perhaps so it find the content that is within literal characters: [videos/] and the upcoming ["] which comes right after the ID but I'm unsure how.

You can use CHARINDEX() function:
update T SET
VideoID=SUBSTRING(descr,
charindex('/videos/',descr)+LEN('/videos/'),
charindex('"',descr,charindex('/videos/',descr)+LEN('/videos/'))
-(charindex('/videos/',descr)+LEN('/videos/')))
SQLFiddle demo

This should work, assuming the text videos/ doesn't appear anywhere else in the html.
update htmltable
set id = SUBSTRING(SUBSTRING(html,
CHARINDEX('videos/', html) + 7,
LEN(html)
),
0,
CHARINDEX('"', SUBSTRING(html,
CHARINDEX('videos/', html) + 7,
LEN(html)
)
)
)
This updates a field named otherfield in table htmltable where the id in the url is '123'. It's pretty ugly code, but SQL Server has limited string functions.
If you have any control over the table structure, I would suggest you make some changes. The video ID should be stored in its own column, separate from the rest of the url. Then when you need to retrieve the url, you would concatenate the two parts to get the whole url. That would be much more maintainable.

Related

How to make LIKE in SQL look for specific string instead of just a wildcard

My SQL Query:
SELECT
[content_id] AS [LinkID]
, dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name'))) AS [Physician Name]
FROM
[DB].[dbo].[table1]
WHERE
[id] = '188'
AND
(content LIKE '%Urology%')
AND
(contentS = 'A')
ORDER BY
--[content_title]
dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name')))
The issue I am having is, if the content is Neurology or Urology it appears in the result.
Is there any way to make it so that if it's Urology, it will only give Urology result and if it's Neurology, it will only give Neurology result.
It can be Urology, Neurology, Internal Medicine, etc. etc... So the two above used are what is causing the issue.
The content is a ntext column with XML tag inside, for example:
<root><Location><location>Office</location>
<office>Office</office>
<Address><image><img src="Rd.jpg?n=7513" /></image>
<Address1>1 Road</Address1>
<Address2></Address2>
<City>Qns</City>
<State>NY</State>
<zip>14404</zip>
<phone>324-324-2342</phone>
<fax></fax>
<general></general>
<from_north></from_north>
<from_south></from_south>
<from_west></from_west>
<from_east></from_east>
<from_connecticut></from_connecticut>
<public_trans></public_trans>
</Address>
</Location>
</root>
With the update this content column has the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Physicians>
<name>Doctor #1</name>
<picture>
<img src="phys_lab coat_gradation2.jpg?n=7529" />
</picture>
<gender>M</gender>
<langF1>
English
</langF1>
<specialty>
<a title="Neurology" href="neu.aspx">Neurology</a>
</specialty>
</Physicians>
</root>
If I search for Lab the result appears because there is the text lab in the column.
This is what I would do if you're not into making a CLR proc to use Regexes (SQL Server doesn't have regex capabilities natively)
SELECT
[...]
WHERE
(content LIKE #strService OR
content LIKE '%[^a-z]' + #strService + '[^a-z]%' OR
content LIKE #strService + '[^a-z]%' OR
content LIKE '%[^a-z]' + #strService)
This way you check to see if content is equal to #strService OR if the word exists somewhere within content with non-letters around it OR if it's at the very beginning or very end of content with a non-letter either following or preceding respectively.
[^...] means "a character that is none of these". If there are other characters you don't want to accept before or after the search query, put them in every 4 of the square brackets (after the ^!). For instance [^a-zA-Z_].
As I see it, your options are to either:
Create a function that processes a string and finds a whole match inside it
Create a CLR extension that allows you to call .NET code and leverage the REGEX capabilities of .NET
Aaron's suggestion is a good one IF you can know up front all the terms that could be used for searching. The problem I could see is if someone searches for a specific word combination.
Databases are notoriously bad at semantics (i.e. they don't understand the concept of neurology or urology - everything is just a string of characters).
The best solution would be to create a table which defines the terms (two columns, PK and the name of the term).
The query is then a join:
join table1.term_id = terms.term_id and terms.term = 'Urology'
That way, you can avoid the LIKE and search for specific results.
If you can't do this, then SQL is probably the wrong tool. Use LIKE to get a set of results which match and then, in an imperative programming language, clean those results from unwanted ones.
Judging from your content, can you not leverage the fact that there are quotes in the string you're searching for?
SELECT
[...]
WHERE
(content LIKE '%""Urology""%')

SQL Query Needed for Joomla 1.5

I'm not sure if this is even possible but I have a Joomla 1.5 installation and I'm updating to 3.1.5 but the problem is the current site used a plugin which uses the Key Reference of each article to generate the Meta Page Title and I need to take this info from each article (220+) and put it under params in the menu item so I currently have the following;
Article
TABLE joscontent COLUMN id VALUE 39
TABLE joscontent COLUMN attribs VALUE:
show_title=
link_titles=
show_intro=
show_section=
link_section=
show_category=
link_category=
show_vote=
show_author=
show_create_date=
show_modify_date=
show_pdf_icon=
show_print_icon=
show_email_icon=
language=
keyref=Page Title Value is Currently Here
readmore=
MENU ITEM
table column value
jos_menu link index.php?option=com_content&view=article&id=39
jos_menu params
show_noauth=
show_title=
link_titles=
show_intro=
show_section=
link_section=
show_category=
link_category=
show_author=
show_create_date=
show_modify_date=
show_item_navigation=
show_readmore=
show_vote=
show_icons=
show_pdf_icon=
show_print_icon=
show_email_icon=
show_hits=
feed_summary=
page_title=
show_page_title=1
pageclass_sfx=
menu_image=-1
secure=0
So is there a way i can submit an SQL query that will take the keyref value from the attribs column and insert the page_title value of the params colum where the Article ID matches the id used in the menu link and this will do every article on the site.
Hope this makes sense!
EDIT THIS SOLUTION WORKED
update jos_menu jm
set params = (select concat('show_noauth=
show_title=
link_titles=
show_intro=
show_section=
link_section=
show_category=
link_category=
show_author=
show_create_date=
show_modify_date=
show_item_navigation=
show_readmore=
show_vote=
show_icons=
show_pdf_icon=
show_print_icon=
show_email_icon=
show_hits=
feed_summary=
page_title=',
replace(substr(attribs, locate('keyref=', attribs)+7), 'readmore=', ''),
'
show_page_title=1
pageclass_sfx=
menu_image=-1
secure=0 ')
from jos_content jc
where jm.link = concat('index.php?option=com_content&view=article&id=', jc.id))
where Instr(jm.link, 'index.php?option=com_content&view=article&id=')
You might find a single SQL statement that can do this by using some tricky regular expressions. But I would really suggest that you write a script in PHP which loops through all records and does this job. this will give you more control and the ability to make some error checks.

SQL Server : remove duplicated text within a string

I have a SQL Server 2008 table with a column containing lengthy HTML text. Near the top there is a link provided for an associated MP3 file which is unique to each record. The links are are all formatted as follows:
<div class="MediaSaveAs">Download Audio </div>
Unfortunately many records contain two or three sequential and identical instances of this link where there should be only one. Is there a relatively simple script I can run to find and eliminate the redundant links?
I'm not entirely sure - because your explanation wasn't very clear - but this appears to do what you want, although whether or not you consider this to be a "simple script", I don't know.
declare #Link nvarchar(200) = N'<div class="MediaSaveAs">Download Audio </div>'
declare #BadData nvarchar(max) = N'cbjahcgfhjasgfzhjaucv' + replicate(#Link, 3) + N'cabhjcsghagj',
#StartPattern nvarchar(34) = N'<div class="MediaSaveAs"><a href="',
#EndPattern nvarchar(27) = N'">Download Audio </a></div>'
select #BadData
select replace (
#BadData,
substring(#BadData, charindex(#StartPattern, #BadData), len(#BadData)-charindex(reverse(#EndPattern), reverse(#BadData))-charindex(#StartPattern, #BadData) + 2),
substring(#BadData, charindex(#StartPattern, #BadData), charindex(#EndPattern, #BadData) + len(#EndPattern) - charindex(#StartPattern, #BadData))
)
Personally I would not like to have to maintain this code; I would far rather use a script in another language that can actually parse HTML. You said this is "just a repeated text issue", but that doesn't mean it's an easy problem and especially not in a language like TSQL that has such limited support for string operations.
For future reference, please put all relevant information into the question - you can edit it if you need to - instead of leaving them in the comments where they are difficult to read and may be overlooked. And please post sample data and results instead of describing things in words.
First we need to identify the file names, which we can do with PATINDEX:
select
substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
from files
And then secondly identify and the duplicates, check it out:
delete
from files
where id not in (
select max(id)
from files
group by substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
)
http://www.sqlfiddle.com/#!3/887a3/5

MOSS 2007: What is the source of "Directories"?

I'm trying to generate a new SharePoint list item directly using SQL server. What's stopping me is damn tp_DirName column. I have no ideas how to create this value.
Just for instance, I have selected all tasks from AllUserData, and there are possible values for the column: 'MySite/Lists/Task', 'Lists/Task' and even 'MySite/Lists/List2'.
MySite is the FullUrl value from Webs table. I can obtain it. But what about 'Lists/Task' and '/Lists/List2'? Where they are stored?
If try to avoid SQL context, I can formulate it the following way: what is the object, that has such attribute as '/Lists/List2'? Where can I set it up in GUI?
Just a FYI. It is VERY not supported to try and write directly to SharePoint's SQL Tables. You should really try and write something that utilizes the SharePoint Object Model. Writing to the SharePoint database directly mean Microsoft will not support the environment.
I've discovered, that [AllDocs] table, in contrast to its title, contains information about "directories", that can be used to generate tp_DirName. At least, I've found "List2" and "Task" entries in [AllDocs].[tp_Leaf] column.
So the solution looks like this -- concatenate the following 2 components to get tp_DirName:
[Webs].[FullUrl] for the web, containing list, containing item.
[AllDocs].[tp_Leaf] for the list, containing item.
Concatenate the following 2 components to get tp_Leaf for an item:
(Item count in the list) + 1
'_.000'
Regards,
Well, my previous answer was not very useful, though it had a key to the magic. Now I have a really useful one.
Whatever they said, M$ is very liberal to the MOSS DB hackers. At least they provide the following documents:
http://msdn.microsoft.com/en-us/library/dd304112(PROT.13).aspx
http://msdn.microsoft.com/en-us/library/dd358577(v=PROT.13).aspx
Read? Then, you know that all folders are listed in the [AllDocs] table with '1' in the 'Type' column.
Now, let's look at 'tp_RootFolder' column in AllLists. It looks like a folder id, doesn't it? So, just SELECT the single row from the [AllDocs], where Id = tp_RootFolder and Type = 1. Then, concatenate DirName + LeafName, and you will know, what the 'tp_DirName' value for a newly generated item in the list should be. That looks like a solid rock solution.
Now about tp_LeafName for the new items. Before, I wrote that the answer is (Item count in the list) + 1 + '_.000', that corresponds to the following query:
DECLARE #itemscount int;
SELECT #itemscount = COUNT(*) FROM [dbo].[AllUserData] WHERE [tp_ListId] = '...my list id...';
INSERT INTO [AllUserData] (tp_LeafName, ...) VALUES(CAST(#itemscount + 1 AS NVARCHAR(255)) + '_.000', ...)
Thus, I have to say I'm not sure that it works always. For items - yes, but for docs... I'll inquire into the question. Leave a comment if you want to read a report.
Hehe, there is a stored procedure named proc_AddListItem. I was almost right. MS people do the same, but instead of (count + 1) they use just... tp_ID :)
Anyway, now I know THE SINGLE RIGHT answer: I have to call proc_AddListItem.
UPDATE: Don't forget to present the data from the [AllUserData] table as a new item in [AllDocs] (just insert id and leafname, see how SP does it itself).

SQL: Find and Replace with SQL?

I have a MySQL InnoDB database.
I have a column my in 'article' table called url that needs to be updated.
Stored in article.url =
/blog/2010/article-name
/blog/1998/the-article-name
/blog/...
I need to change /blog/ to /news/. (E.g. now article.url = '/news/...')
What is the SQL needed to replace "/blog/" with "/news/" in the article.url column?
update url
set article = replace(article, '/blog/', '/news/')
where article like '/blog/%'
If every url starts with "/blog/" and you don't want to change anything except the prefix, then you can just use substring() and concat() instead of replace():
update article
set url = concat('/news/',substring(url,7))
where url like '/blog/%';
I recently wanted to replace a string within MySQL on the fly, but the field could contain 2 items. So I wrapped a REPLACE() within a REPLACE(), such as:
REPLACE(REPLACE(field_name, “what we are looking for”, “replace first instance”),
“something else we are looking for”, “replace second instance”)
This is the syntax I used to detect a boolean value:
REPLACE(REPLACE(field, 1, “Yes”), 0, “No”)
Hope this helps!