MySQL: Replace substring if string ends in jpg, gif or png - sql

I'm doing a favor for a friend, getting him off of Blogger and onto a hosted WordPress blog.
The big problem is, with over 1,800 posts, there are a lot of image links to deal with. WordPress has no mechanism to import these automatically, so I'm doing it manually.
I've used wget to download every single image that has ever been linked/embedded on the site. Now I need some help building a MySQL query to change all of the images in the blog to their new address.
For example:
http://www.externaldomain.com/some/link/to/an/image.jpg
Ought to become:
http://www.newbloghosting.com/wordpress/wp-content/uploads/legacy/www.externaldomain.com/some/link/to/an/image.jpg
So the condition is, if a string in post_content ends in jpeg, jpg, gif or png, replace:
http://
with
http://www.newbloghosting.com/wordpress/wp-content/uploads/legacy/
I know how to do a blanket replace with
UPDATE wp_posts SET post_content = replace(post_content, 'http://www.old-domain.com', 'http://www.new-domain.com');
But I'm having a hard time figuring out how to accomplish my more nuanced, conditional approach.
Thanks for any guidance you can offer. (Torn between posting here or ServerFault but SO looks like it has plenty of MySQL gurus, so here I am.)

MySQL has a great selection of string manipulation functions that you can plug into your query's WHERE section.
UPDATE wp_posts
SET post_content = REPLACE(post_content, 'http://www.old-domain.com', 'http://www.new-domain.com')
WHERE RIGHT(post_content, 4) = 'jpeg'
OR RIGHT(post_content, 3) IN ('jpg', 'gif', 'png');
If it were me, though, I'd do two additional things: convert it to lowercase to match e.g. '.JPG', and match the dot before jpg, gif, etc.:
WHERE LOWER(RIGHT(post_content, 5)) = '.jpeg'
OR LOWER(RIGHT(post_content, 4)) IN ('.jpg', '.gif', '.png');

REPLACE will only perform an alteration if the old substring is found - I don't see the concern.
REPLACE(post_content, 'http://www.old-domain.com', 'http://www.new-domain.com');
...will work. If you want to limit the updates to rows containing "jpeg", "jpg", "gif" and/or "png", add:
WHERE INSTR(post_content, 'jpeg') > 0
OR INSTR(post_content, 'jpg') > 0
OR INSTR(post_content, 'gif') > 0
OR INSTR(post_content, 'png') > 0
References:
REPLACE
INSTR

If everything fails, what about using the import feature? Then use a plugin to get your images as well (check for the comments in the plugin post since there are some relevant information).

I don't think it can be done in a simple query, but it's relatively easy to do with a simple php script. just have a simple loop in php to go over every single row with content and do a preg_replace on the content field, then update that single row.
It's not nearly as elegant as doing it in sql, but its sure to get the job done today as oposed to sometime this year.
P.S. this is assuming there is more content than just the URL, in which case normal mysql string functions would suffice.

Related

Moved site , all links got corrupted, need correct SQL query to fix

Wordpress. from litespeed to ngix server.
All of my links to plugins, images CSS and scripts added "%20" on the end. Example:
https://example.com%20/wp-content/plugins/wp-maintenance-mode/assets/css/style.min.css?
I assume the solution would be to change all "example.com%20" to "example.com" in the database. What would be the correct query to put in phpmyadmin?
I assume the solution would be to change all "example%20" to "example" in the database.
You can use replace():
update t
set link = replace(link, 'example%20', 'example')
where link like '%example$%20%' escape '$';
You have to be a little careful if 'example%20' were ever in the data before the change (i.e. is correct). However, that is unlikely in this case.
Found the solution, was a trailing space in wp-config.

google analytics API, how to extract pageviews for a specific page?

Google Analytics API: how to extract pageviews for a specific page?
I tried using something like
ga:pagePath=~page.php%3fid%3d44 (page.php?id=44)
but it doesn't seem to work... I get "no results found" where I have 20 pageviews for sure
UPDATE
I think I found the solution
ga:pagePath==/website/page.php?id=44
for some reason I had to include the complete path and ==
To use a partial path to match for a page in filters you should use
ga:pagePath=#page.php?id=44
=# tells ga to match a substring.
What you were originally using was incorrect for this.
I think your problem is that you put the hex version of the ? and = characters into your query, which doesn't match how Analytics stores the page paths. If you change these to the normal characters it should work:
ga:pagePath=~page.php?id=44
Your other solution should work as well but is a bit more inflexible in case you wanted to tweak the query to return other pages.

help to get rid of HTML special chars in database

I've migrated my site from interspire CMS to Joomla! CMS.
I've managed to migrate all the database of articles, but some of them have a weird issue - when I access the page from joomla, the title contains HTML entities like ’.
As you can guess from the CMS's I use, I rely on PHP as my server side, and MySql for my database.
I tried to go over the titles of the articles in the database with htmlspecialchars_decode AND html_entity_decode in order to get rid of those, but it had no effect.
if I just grab an example from the DB and echo it, it will look OK:
What’s Your Pleasure, Lasagna Or Pizza Manchester Style?
if I go to the article page in joomla it will look like this:
What’s Your Pleasure, Lasagna Or Pizza Manchester Style?
When I go to PhpMyAdmin to see directly what is in the database, this is the contents of the title:
What’s Your Pleasure, Lasagna Or Pizza Manchester Style?
I even tried to remove the symbol with:
str_replace("’","",$title);
or replace it like this
str_replace('’',"'",$title);
but nothing.
When I tried to encode it again instead of decoding it (just to see if i'm on the right DB) it worked and encoded it again...
Please, I would be glad to have any new ideas...
Thanks,
Yanipan
Try setting encoding to cp1252. This worked out for me:
$decoded = html_entity_decode($your_string, ENT_QUOTES, 'cp1252');
Probably your best bet is to do search and replace within the database itself vs trying to do it with php. Search and replace in mysql is done like this:
update TABLE_NAME set FIELD_NAME = replace(FIELD_NAME, ‘find this string’, ‘replace found string with this string’);
So yours should look something like:
update ARTICLES set TITLE = replace(TITLE, '’', '\'');
Give that a shot.
Need more info
What is the character encoding on your database? That & or ;, may be something other than the typical ASCII.
It's possible that PHP/Joomla is double-encoding your string. Look at the browser's page source and find the text in the produced HTML. Instead of What’s, it might just be one of the following:
What&rsquo&59;s
What&38;rsquo&59;s
What’s

MySQL: How can I remove trailing HTML from a field in the database?

I want to remove some rogue HTML from a DB field that is supposed to contain a simple filename. Example of ok field:
myfile.pdf
Example of not ok field:
myfile2.pdf<input type="hidden" id="gwProxy" />...
Does anyone know a query I can run that can remove the HTML part but leave the filename? i.e. remove everything from the first < character onwards.
Lets assume the field is called myattachment and is defined as a varchar(250) and the table is called mytable in a MySQL database.
Background info (not necessary to read):
The field in our database is supposed to contain filenames however, due to a issue (documented here) some of the fields now contain a filename and some rogue HTML. We have fixed the root issue and now need to fix the corrupt fields. In the past I have replaced text using this kind of query:
UPDATE mytable SET myattachment = replace(myattachment, 'JPG', 'jpg') WHERE myattachment LIKE '%JPG';
This query seems to work ok, can anyone see any issues with it?
UPDATE mytable
SET myattachment = SUBSTRING_INDEX(myattachment, '<', 1)
WHERE `myattachment` LIKE '%<%';
For docs on SUBSTRING_INDEX see the mysql manual page.

Need to add LF to thousands of MySQL MEDIUMTEXT fields

I need to add a \n or LF or ASCII(x'0A') or however you want to encode it to the end of several thousand MySQL MEDIUMTEXT fields. I tried
update wp_posts set post_content = concat(post_content,ASCII(x'0A'));
but nothing is modified in the field as far as I can see. I suspect this is a limitation of MEDIUMTEXT/LONGTEXT fields, but it would be nice if MySQL would throw an error instead of saying it did something and doing nothing.
If I can't do it with update/concat, what other options do I have? Does someone have a chunk of php code I can use to do it in a loop?
Thanks for any input! I'm trying to get my wordpress site to validate with W3C, and I need the newlines so wordpress does not add a <p> tag inappropriately. Strange, I know.
I don't know where my brain was. I was using HeidiSQL as the interface to MySQL, and wasn't seeing the change in the database. But it did change, which I saw once I refreshed the table data. Sorry for the false question!
Try:
update wp_posts set post_content = concat(post_content,'\n');
I tested with a mediumtext field here and worked..