Remove text from within a database text field - sql

I recently tried to import a bunch of blog posts from an old blog (SharePoint) to my current blog (WordPress). When the import completed, a lot of nasty <div> tags and other HTML made it into the content of the posts, which broke the way my site rendered.
I'm able to view the offending rows in the MySQL database and want to know if there's a way to selectively remove the HTML text that may be causing problems. I could probably hack this in C# by parsing through the text, but I'd like to figure out how I can do this using SQL if I can.
If you want to see a full text sample of what one of these files looks like as it exists in the database text field, I uploaded a full sample file to my web site.
Here's what I want to do:
Remove <![CDATA[<div><b>Body:</b> from the beginning of every file
Remove the meta information at the end of every file, which might look like this:
<div><b>Category:</b> SharePoint</div>
<div><b>Published:</b> 11/12/2007 11:26 AM</div>
]]>
Remove every <div> and closing </div> tag, which might have a class attribute like:
<div class=ExternalClass6BE1B643F13346DF8EFC6E53ECF9043A>
Note: The hex string at the end of the ExternalClass can be different
I haven't used an Update statement in MySQL before and I'm at a loss for where to begin to selectively replace text within a text field. Would I use regex from within a SQL statement to help? How would I execute a statement against the remote DB?

What about cleaning up the posts before you import them? Seems like working with a local file that you can treat as a text file would be far easier. Then you could use Perl or Python to bear down on the problem to your liking before importing.
This assumes that you still have access to the data that was over in SharePoint.
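As a rough illustration of the "clean it up in Python first" route, here is a minimal sketch using only the standard re module. The three patterns follow the three removals listed in the question; the function name clean_post and the sample text are made up for illustration, and the patterns assume the exported files look exactly like the fragments quoted above, so test them on a copy of the data first.

```python
import re

# Leading wrapper named in the question: <![CDATA[<div><b>Body:</b>
LEADING = re.compile(r"^\s*<!\[CDATA\[\s*<div>\s*<b>Body:</b>\s*")
# Trailing meta lines of the form <div><b>Label:</b> value</div>, then ]]>
TRAILING = re.compile(r"(?:\s*<div>\s*<b>[^<:]+:</b>[^<]*</div>)*\s*\]\]>\s*$")
# Any opening <div ...> (with or without a class attribute) or closing </div>
DIV_TAG = re.compile(r"</?div\b[^>]*>", re.IGNORECASE)


def clean_post(text: str) -> str:
    """Strip the CDATA/Body prefix, the meta trailer, and all div tags."""
    text = LEADING.sub("", text)
    text = TRAILING.sub("", text)
    text = DIV_TAG.sub("", text)
    return text.strip()


# Hypothetical sample assembled from the fragments quoted in the question.
sample = (
    "<![CDATA[<div><b>Body:</b>"
    "<div class=ExternalClass6BE1B643F13346DF8EFC6E53ECF9043A>"
    "<p>Hello</p></div></div>\n"
    "<div><b>Category:</b> SharePoint</div>\n"
    "<div><b>Published:</b> 11/12/2007 11:26 AM</div>\n"
    "]]>"
)
print(clean_post(sample))
```

The order matters: the trailer is removed before the generic div pattern runs, otherwise the Category and Published values would be left behind as bare text.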

There is no simple way of doing this without utilizing the back-end platform that serves your website or that you are most accustomed to. Myself, I would use PHP or Perl to clean the data up, which could be tricky at best. So the answer is: it can be done, but you must use some type of programming/processing language to do so. MySQL on its own won't be able to clean the data.

Assuming you are determined to use SQL, like you said in your question: if you have the skill to hack it with C#, you should be able to figure out how to create a stored procedure that uses a cursor in a repeat/fetch loop to select the rows, string functions to massage the data, and an update to write each row back. Check this out:
http://dev.mysql.com/doc/refman/5.0/en/cursors.html

Related

Newbie: What is the easiest way to bulk add data to a sqlite database?

I'm a total newbie to SQL, and I'd like to know whether anyone knows a means for easily "copy and pasting" hundreds of entries to a sqlite database. Again, I'm not a professional programmer, so software that could automate that process would be great. (I primarily code in JavaScript, but SQL code can be used as well if you could kindly explain the code.)
Essentially, the text I'd be adding would be delimited by a character (the '|' character in my case) for the columns, and line breaks for the rows. It would be added onto a table that's already being used in the database, with columns already set up.
Thanks a lot!! Any suggestions are most appreciated!
You can use DB Browser for SQLite: after creating a new database, choose File > Import > Table from CSV file...
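If a script is acceptable instead of a GUI, the same import can be sketched with Python's built-in csv and sqlite3 modules. The function name import_pipe_delimited, the table name entries, and its columns are hypothetical; the table and column names are interpolated into the SQL text, so they must come from trusted code and never from user input.

```python
import csv
import io
import sqlite3


def import_pipe_delimited(conn, table, columns, lines):
    """Insert '|'-delimited rows into an existing table; return the row count."""
    reader = csv.reader(lines, delimiter="|")
    rows = [row for row in reader if row]  # skip blank lines
    placeholders = ", ".join("?" for _ in columns)
    column_list = ", ".join(columns)
    # Table and column names cannot be bound as parameters; values can.
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, rows)
    return len(rows)


# Demo against an in-memory database; a real run would open the .db file
# and pass an open text file instead of the StringIO object.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, note TEXT)")
data = io.StringIO("apple|red fruit\nbanana|yellow fruit\n")
count = import_pipe_delimited(conn, "entries", ["name", "note"], data)
print(count, "rows inserted")
```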

Export data from SQL without the hard returns

I have a field in SQL called Comments, and I am trying to pull all comments with a "?" in them. Below is sample code like what I am using. The code works fine; the problem is that when I copy this data out and paste it into Word or Excel, the information comes out looking jumbled. I have figured out that the reason is that the input side of the application, where the comments are entered, allows the user to enter multi-line comments, so there are hard returns in the field. Is there a way I can export the data without the hard returns? I am using SSMS with a SQL 2012 database. For Office I have 2013, if that is needed.
Select C.Comment
From Patron.Comment as C
Where C.Comment like '%?%'
Yes, this is possible. You can use the REPLACE function to get rid of unwanted newline characters:
Replace a newline in TSQL
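A small sketch of the nested REPLACE idea, run here through Python's sqlite3 module only so that it is self-contained. It assumes the hard returns are carriage return (CHAR(13)) and line feed (CHAR(10)) characters; the table below is a stand-in for Patron.Comment, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comment (Comment TEXT)")  # stand-in for Patron.Comment
conn.execute("INSERT INTO Comment VALUES (?)", ("Is this right?\r\nSecond line",))
conn.execute("INSERT INTO Comment VALUES (?)", ("No question here",))

# Drop carriage returns and turn line feeds into spaces, so each comment
# comes back as a single line that pastes cleanly into Word or Excel.
query = """
SELECT REPLACE(REPLACE(C.Comment, CHAR(13), ''), CHAR(10), ' ') AS Comment
FROM Comment AS C
WHERE C.Comment LIKE '%?%'
"""
rows = [row[0] for row in conn.execute(query)]
print(rows)
```

The SELECT uses only REPLACE and CHAR, which T-SQL also provides, so the same expression should carry over to SSMS with the real table name substituted.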

Invalid characters in XML fails Datastage job

I am a new developer who just started using DataStage (coming from a bit of experience with SSIS). One of the first things that I am doing is working with an XML data flow into a database from MQ. I connect to the MQ, use an XML job to map the tags to each DB column, and then insert the data into the DB. However, I am having an issue with the incoming XML. One of the fields in each XML file that I process contains the same character sequence, which is something along the lines of "&$!0".
When I run my job I get an error saying that that is an illegal xml character and the job fails.
Is there a way within datastage to replace this value as it comes through the xml, or even just remove it? Is there a specific tool I should be using within my job for this?
Obviously the easiest solution would be to fix that data coming in, however in the mean-time while that is getting squared away, I want to be able to do some testing, so an alternate solution would be great for now.
Any advice would be greatly appreciated. I am a new developer so I apologize if this question is a bit ignorant/low level.
Use a text editor like Notepad++ to remove the characters yourself.
To automate this, sed on Linux will do the job, and sed for Windows will probably work on Windows too.
These are just characters in the incoming text. You need to remove them before you insert into the DB table.
Try the code below:
s = s.replace("&$!0", "");
NOTE: You need to find every such sequence and replace it with "" (an empty string).
You will get more information here
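The same pre-processing step can be sketched in Python as a stopgap outside DataStage. The sequence "&$!0" is only the approximate value given in the question, and the sample message, element names, and function name strip_sequence are hypothetical. The parse at the end just confirms that the cleaned text is well-formed.

```python
import xml.etree.ElementTree as ET

# Placeholder: the question only says the sequence is "something along the lines of" this.
BAD_SEQUENCE = "&$!0"


def strip_sequence(xml_text: str, bad: str = BAD_SEQUENCE) -> str:
    """Remove every literal occurrence of the offending sequence."""
    return xml_text.replace(bad, "")


raw = "<msg><field>value&$!0</field></msg>"  # hypothetical message
cleaned = strip_sequence(raw)
root = ET.fromstring(cleaned)  # the raw text would raise ParseError here
print(root.find("field").text)
```

This treats the sequence as plain text rather than as a pattern, which avoids having to escape "$" and other regex metacharacters.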

SQL injection in Symfony/Doctrine

Using parameters instead of placing values directly in the query string is done to prevent SQL injection attacks and should always be done:
... WHERE p.name > :name ...
->setParameter('name', 'edouardo')
Does this mean that if we use parameters like this, we will always be protected against SQL injections? While using a form (the FOS registration form), I entered <b>eduardo</b> instead, and this was persisted to the database with the tags. I don't really understand why using parameters protects against SQL injections...
Why are the tags persisted to the database like this? Is there a way to remove the tags by using Symfony's validation component?
Is there a general tip or method that we should be using before persisting data in the database in Symfony?
Start by reading up on what SQL injection is.
An SQL injection attack takes place when a value put into the SQL alters the query. As a result, the query does something other than what it was intended to do.
Example would be using edouardo' OR '1'='1 as a value which would result in:
WHERE p.name > 'edouardo' OR '1'='1'
(so the condition is always true).
"<b>eduardo</b>" is a completely valid value. In some cases you will want to save it as submitted (for example, in a content management system). Of course, it could break your HTML when you take it from the database and output it directly. This should be solved by your templating engine (Twig will automatically escape it).
If you want to process data before passing it from a form to your entity, use data transformers.
If you use parameters instead of concatenation when creating a request, the program is able to tell SQL keywords and values apart. It can therefore safely escape values that may contain malicious SQL code, so that the malicious code does not get executed but is stored in a field, like it should be.
HTML code injection is another problem, which has nothing to do with databases. It is solved when displaying the value, by using automatic output escaping, which will display the literal text <b>eduardo</b> instead of a bold "eduardo". This way, any malicious JS/HTML code won't be interpreted: it will be displayed as text.
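The difference between concatenation and parameters can be seen in a small experiment. This is a sketch using Python's sqlite3 and html modules rather than Doctrine and Twig, so the table and its rows are invented; only the named placeholder :name and the injected value mirror the example above.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT)")
conn.executemany("INSERT INTO person VALUES (?)", [("alice",), ("zoe",)])

malicious = "edouardo' OR '1'='1"

# Concatenation: the quote inside the value ends the string literal early,
# so the OR clause becomes part of the query and every row matches.
unsafe_sql = "SELECT name FROM person WHERE name > '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameter: the whole value is compared as one string, quotes included.
safe_rows = conn.execute(
    "SELECT name FROM person WHERE name > :name", {"name": malicious}
).fetchall()

# Output escaping is a separate step, applied when rendering HTML.
escaped = html.escape("<b>eduardo</b>")

print(len(unsafe_rows), len(safe_rows), escaped)
```

The parameterized query stores or compares the tags exactly as typed, which is why <b>eduardo</b> reached the database unchanged. The escaping step is what keeps it from being interpreted as markup on output.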

SQL find and replace text

I'm working on updating an existing WordPress database and everything is going smoothly. However, the links are still directing to the old site. Is there any way to use a loop or something to run through every record and update http://OLD_URL.com to say http://NEW_URL.com?
I might just be too lazy to manually do it but I will if it comes down to it. Thank you.
I usually run a couple of quick commands in phpmyadmin and I'm done. Here's a blog post that discusses this exact issue: http://www.barrywise.com/2009/02/global-find-and-replace-in-wordpress-using-mysql/ I would read this first: http://codex.wordpress.org/Changing_The_Site_URL to make sure all your bases are covered first.
If you want to update links in a particular table you can use the query like below:
UPDATE TableName
SET URL =
CASE
WHEN URL = 'http://OLD_URL.com'
THEN 'http://NEW_URL.com'
ELSE URL
END;
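The query above only changes rows whose column is exactly the old URL. When the URL is embedded inside longer text, a REPLACE-based update is a common alternative. Here is a minimal sketch, run through Python's sqlite3 module so it is self-contained; the posts table and its rows are hypothetical, and you should back up the real database before running anything like it.

```python
import sqlite3

OLD = "http://OLD_URL.com"
NEW = "http://NEW_URL.com"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO posts (content) VALUES (?)",
    [
        ('<a href="http://OLD_URL.com/about">About</a>',),
        ("No links in this post",),
    ],
)

with conn:  # commit on success, roll back on error
    cur = conn.execute(
        "UPDATE posts SET content = REPLACE(content, :old, :new) "
        "WHERE INSTR(content, :old) > 0",
        {"old": OLD, "new": NEW},
    )
changed = cur.rowcount  # number of rows that contained the old URL

print(changed, conn.execute("SELECT content FROM posts WHERE id = 1").fetchone()[0])
```

The WHERE clause keeps the update from touching rows that do not contain the old URL, so the reported row count reflects real changes.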