Building SEO-friendly URLs for accented characters - optimization

We are making our site an SEO-friendly site by following the pattern below:
http://OurWebsite.com/MyArticle/Math/Spain/Glaño
As you can see, Glaño contains an accented character that search engines may not like. On the other hand, we cannot build that last URL segment without it!
Any suggestions on how to keep our current URL generation code and still handle Spanish or French entries, or do we need to change our approach?

Try these functions:
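function Slug($string, $slug = '-', $extra = null)
{
    // Replace every run of characters other than 0-9, a-z and $extra with $slug
    $pattern = '~[^0-9a-z' . preg_quote((string) $extra, '~') . ']+~i';
    return strtolower(trim(preg_replace($pattern, $slug, Unaccent($string)), $slug));
}
function Unaccent($string)
{
    // Encode accented characters as HTML entities (e.g. &ntilde;), keep only the base letter(s), then decode the rest
    $entities = htmlentities($string, ENT_QUOTES, 'UTF-8');
    $pattern = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    return html_entity_decode(preg_replace($pattern, '$1', $entities), ENT_QUOTES, 'UTF-8');
}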
And use it like this:
echo Slug('Iñtërnâtiônàlizætiøn of Glaño'); // internationalizaetion-of-glano
You can embed the Unaccent() code into the Slug() function if you wish to have only one function.

Perhaps replace accented characters with the closest matching non-accented latin character.
Unless "Glano" means something very rude, this is probably your best bet.
If you search Google for "Glaño", it returns pages containing "Glano" anyway, so SEO shouldn't be harmed.
To translate the characters from accented to unaccented, you could use this function (this is in PHP, but hopefully you'd be able to use it as a starting point for other languages):
function normalize ($string) {
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
    );
    return strtr($string, $table);
}
(Author credit goes to allixsenos at gmail http://php.net/manual/en/function.strtr.php)
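
Since the answer above suggests using the PHP table as a starting point for other languages, here is a minimal Python sketch of the same idea. It is not a port of either PHP function: instead of a lookup table it relies on Unicode decomposition (unicodedata.normalize), and the names unaccent and slugify are made up for this example. One known limitation of this approach: letters that have no decomposition, such as æ or ø, are simply dropped rather than transliterated.

```python
import re
import unicodedata


def unaccent(text):
    """Strip combining accents: 'ñ' becomes 'n', 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def slugify(text, separator="-"):
    """Build a lowercase ASCII slug (hypothetical helper, not a library API)."""
    # Anything still outside ASCII after unaccenting (e.g. 'æ') is dropped.
    ascii_text = unaccent(text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one separator.
    slug = re.sub(r"[^0-9a-zA-Z]+", separator, ascii_text)
    return slug.strip(separator).lower()


if __name__ == "__main__":
    print(slugify("Glaño"))
    print(slugify("Crème Brûlée à la carte"))
```

If æ, ø, ß and similar letters matter for your content, a small explicit table like the PHP one above can be applied before the decomposition step.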

I agree that unless "Glano" means something very rude, this is probably your best bet. I would add that, if you care about SEO, you should consider not having too many folders in the URL. Yours has one root, three sub-folders and then the file, and this may hurt more than the special character.

Related

Escaping special characters & Encoding unsafe and reserved characters Lucene query syntax Azure Search

I have the words "C&&K", "So`am`I", "Ant||Man", "A*B==AB" and "Ant+Man" in an Azure Search index.
According to the documentation on escaping special characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /), I need to prefix them with a backslash (\), and unsafe and reserved characters need to be encoded in the URL.
for "C&&K" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=C%5C%26%5C%26K~&queryType=full
for "So`am`I" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=So%5C%60am%5C%60I~&queryType=full
for "Ant||Man" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=Ant%5C%7C%5C%7CMan~&queryType=full
for "A*B==AB" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=A%5C*B%3D%3DAB~&queryType=full
for "Ant+Man" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=Ant%5C%2BMan~&queryType=full
For all of them I get no search results. I get "value": []
for "C&&K" I have also tried
url => /indexes/{index-name}/docs?api-version=2017-11-11&search=C%5C%26%26K~&queryType=full
url => /indexes/{index-name}/docs?api-version=2017-11-11&search=C%26%5C%26K~&queryType=full
for "So`am`I" I have also tried
url => /indexes/{index-name}/docs?api-version=2017-11-11&search=So%60am%60I~&queryType=full
It does not work. What am I doing wrong here?
With standard analysis, all of these would be indexed as multiple terms. Fuzzy queries, however, are not analyzed, so it will attempt to find it as a single term. That is, when you index "Ant||Man", after analysis, you end up with the terms "ant" and "man" in the index. When you search for Ant||Man, it will analyze it in much the same way as at index time, but when searching for Ant||Man~, the query won't be analyzed, and since no terms like that exist in the index, you won't get any matches. Similarly, for "A*B==AB" you get the terms "b" and "ab" ("a" is a stop word with default analysis).
So, try the queries without the ~.
In addition to femtoRgon's response, you may want to consider using a custom analyzer that does not index these as multiple terms if you would always like them to be searchable as they are. There is documentation on custom analyzers here, and you can use the Analyze API to test to make sure a given analyzer works as you expect.
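
As an illustration of the two steps discussed in this thread (backslash-escape the Lucene special characters, then URL-encode the result), here is a small Python sketch that builds the request path without the trailing ~. The function names and the index name in the demo are hypothetical, and whether the query actually matches still depends on how the analyzer tokenized the indexed text, as explained above.

```python
from urllib.parse import quote

# Characters listed in the question as needing a backslash in full Lucene syntax.
LUCENE_SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(term):
    """Prefix every Lucene special character with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


def build_search_path(index_name, term, api_version="2017-11-11"):
    """Escape first, then percent-encode the whole search value."""
    encoded = quote(escape_lucene(term), safe="")
    return (f"/indexes/{index_name}/docs?api-version={api_version}"
            f"&search={encoded}&queryType=full")


if __name__ == "__main__":
    for word in ["C&&K", "Ant||Man", "Ant+Man"]:
        print(build_search_path("my-index", word))
```

The order matters: escaping after URL-encoding would put backslashes in front of the percent-encoded bytes instead of the original characters.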

Parse user input for command arguments into array

I'm making a bot in PHP and I want a better way to parse user input into arguments for later operations.
An example would be a user saying "/addresponse -"test" -"works""
I want this to parse that string into:
$command = ["test", "works"];
I have found the PHP command parser but I want the user to be able to use human readable commands rather than typing something like /addresponse?p="test"&r="works"
Right now I have a regex working so the user can type "/addresponse "test" "works"", but there is an obvious problem: the user cannot create a response for '"test"', only for 'test'.
I'd appreciate any help. Right now I think I can make a regex to get all text between ' -', but I still don't think this is the best solution.
I just looked into using a regex to find text between ' -"' and while this is better than just between quotes, it doesn't solve the whole problem because it still will break if the input contains ' -"'. A string containing this isn't particularly common but I'd like a solution where almost any input will not break it.
Is this a stupid question? I don't think there is a built-in PHP function for this, and it got downvoted with no comment...
I found a partial solution:
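function parse_cmd($command) {
    // Split on the ' -"' marker that opens each argument
    $command = explode(' -"', $command);
    // Drop the first piece, which is the command name itself
    array_splice($command, 0, 1);
    foreach ($command as &$element) {
        // Remove the closing quote of each argument
        $element = substr($element, 0, -1);
    }
    unset($element); // break the reference left over from the loop
    return $command;
}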
This will split everything after ' -"' and return it as an array
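
One way to get past the remaining limitation (arguments that themselves contain a quote or ' -"') is to let the user escape characters with a backslash. The sketch below shows that idea in Python rather than PHP; the backslash-escape convention is an assumption added here, not something from the question, and the same regular expression should carry over to preg_match_all with little change.

```python
import re

# An argument is -"..." where the body may contain backslash-escaped characters.
ARG_PATTERN = re.compile(r'\s-"((?:[^"\\]|\\.)*)"')


def parse_cmd(command):
    """Return the list of -"..." arguments, with backslash escapes removed."""
    raw_args = ARG_PATTERN.findall(command)
    # Turn \" into " and \\ into \ inside each captured argument.
    return [re.sub(r"\\(.)", r"\1", arg) for arg in raw_args]


if __name__ == "__main__":
    print(parse_cmd('/addresponse -"test" -"works"'))
    print(parse_cmd(r'/addresponse -"\"test\"" -"works"'))
```

With this convention the user types \" for a literal quote, so an argument can contain ' -"' without ending early.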

Do I need to implement full text search in this case? alternatives?

I have two columns in a PostgreSQL table: first_name and last_name.
In the front end, I have an input that lets users search for people. It is an auto-complete field that calls a web service to search for people by first and/or last name.
Currently, I have made a query (using my query builder):
$searches = preg_split('/\s+/', $search);
if (!empty($search)) {
    $orX = $query->expr()->orX();
    $i = 0;
    foreach ($searches as $value) {
        $orX->add($query->expr()->eq('c.firstName', ':name'.$i));
        $orX->add($query->expr()->eq('c.lastName', ':name'.$i));
        $query->setParameter('name'.$i, $value);
        $i++;
    }
    $query->andWhere($orX);
}
But this query is not as precise as required: it uses OR for every word, so if I am looking for "Rasmus Lerdorf" it also gives me "Rasmus Adams" and "Adel Lerdorf". It works only if I enter a single word ("Rasmus", for example), in which case it gives me all people with "Rasmus" as first_name or last_name.
I read about MATCH AGAINST, but I am using PostgreSQL. I also heard about the full text search feature in PostgreSQL as the equivalent of MATCH AGAINST, but I am wondering whether implementing full text search would be overkill for this purpose (especially since the maximum number of words in both columns wouldn't exceed 4).
I would appreciate your advice. Thanks
You don't need fulltext search.
Just add the different search terms with AND instead of OR:
$i = 0;
foreach ($searches as $value) {
    $orX = $query->expr()->orX();
    $orX->add($query->expr()->eq('c.firstName', ':name'.$i));
    $orX->add($query->expr()->eq('c.lastName', ':name'.$i));
    $query->setParameter('name'.$i, $value);
    $i++;
    $query->andWhere($orX);
}
I would also suggest using LIKE instead of an equality comparison (add '%' to the start and end of the user's search term), and probably also making everything case insensitive by adding $query->expr()->lower() where appropriate.
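
To make the resulting condition concrete, here is a small Python sketch of the same logic: one (first_name OR last_name) group per search word, with the groups joined by AND, using case-insensitive LIKE. It runs against an in-memory SQLite table purely so that it is self-contained; the table name people and the function names are assumptions for this example, and it does not use the query builder from the question.

```python
import sqlite3


def build_name_filter(search):
    """Return (where_sql, params): one OR group per word, groups joined by AND."""
    clauses, params = [], []
    for term in search.split():
        clauses.append("(LOWER(first_name) LIKE ? OR LOWER(last_name) LIKE ?)")
        # Note: % and _ typed by the user are not escaped in this sketch.
        pattern = f"%{term.lower()}%"
        params.extend([pattern, pattern])
    # An empty search matches everything.
    return " AND ".join(clauses) or "1=1", params


def search_people(conn, search):
    where, params = build_name_filter(search)
    sql = ("SELECT first_name, last_name FROM people "
           f"WHERE {where} ORDER BY first_name, last_name")
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)",
                     [("Rasmus", "Lerdorf"), ("Rasmus", "Adams"), ("Adel", "Lerdorf")])
    print(search_people(conn, "rasmus lerdorf"))
    print(search_people(conn, "Rasmus"))
```

Because each word gets its own group, "Lerdorf Rasmus" matches the same row as "Rasmus Lerdorf", which suits an auto-complete field.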

SQL query to bring all results regardless of punctuation with JSF

So I have a database of articles, and the user should be able to enter a keyword and have the search find any articles containing that word.
For example, if someone were to search for the word Alzheimer's, I would want it to return articles with the word spelled either way, regardless of the apostrophe, so
Alzheimer's
Alzheimers
results should all be returned. At the moment it searches for the exact way the word is spelled and won't bring results back if the word has punctuation.
So what I have at the minute for the query is:
private static final String QUERY_FIND_BY_SEARCH_TEXT = "SELECT o FROM EmailArticle o where UPPER(o.headline) LIKE :headline OR UPPER(o.implication) LIKE :implication OR UPPER(o.summary) LIKE :summary";
And the user's input is called 'searchText' which comes from the input box.
public static List<EmailArticle> findAllEmailArticlesByHeadlineOrSummaryOrImplication(String searchText) {
    Query query = entityManager().createQuery(QUERY_FIND_BY_SEARCH_TEXT, EmailArticle.class);
    String searchTextUpperCase = "%" + searchText.toUpperCase() + "%";
    query.setParameter("headline", searchTextUpperCase);
    query.setParameter("implication", searchTextUpperCase);
    query.setParameter("summary", searchTextUpperCase);
    List<EmailArticle> emailArticles = query.getResultList();
    return emailArticles;
}
So I would like to bring back all results for alzheimer's regardless of whether there is an apostrophe or not. I think I have given enough information, but if you need more just say. I'm not really sure where to go with it or how to do it. Is it possible to just replace/remove all punctuation, or just apostrophes, from a user search?
In my view, you should change your query:
you should alter your table and add a FULLTEXT index on your columns (headline, implication, summary).
You should also use MATCH ... AGAINST rather than a LIKE query and, most importantly, read about the SOUNDEX() function; it is very handy.
All I can give you is a native query example:
SELECT o.* FROM email_article o WHERE MATCH(o.headline, o.implication, o.summary) AGAINST('your-text') OR SOUNDEX(o.headline) LIKE SOUNDEX('your-text') OR SOUNDEX(o.implication) LIKE SOUNDEX('your-text') OR SOUNDEX(o.summary) LIKE SOUNDEX('your-text') ;
It won't give you results like Google search, but it works to some extent. Let me know what you think.
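
On the asker's own idea of removing apostrophes: a simpler alternative to full-text search is to strip them from both the search text and the compared columns, so that "Alzheimer's" and "Alzheimers" look identical to LIKE. The sketch below shows this in Python against an in-memory SQLite table so that it is runnable on its own; it is not the JPA code from the question, the table and function names are assumptions, and wrapping columns in REPLACE() generally prevents the database from using an ordinary index on them.

```python
import sqlite3

# Straight and typographic apostrophes to ignore on both sides of the comparison.
APOSTROPHES = ("'", "\u2019")


def strip_apostrophes(text):
    for ch in APOSTROPHES:
        text = text.replace(ch, "")
    return text


def sql_without_apostrophes(column):
    """SQL expression for the upper-cased column with apostrophes removed."""
    expr = f"UPPER({column})"
    expr = f"REPLACE({expr}, '''', '')"       # '''' is one apostrophe in SQL
    expr = f"REPLACE({expr}, '\u2019', '')"
    return expr


def find_articles(conn, search_text):
    pattern = "%" + strip_apostrophes(search_text).upper() + "%"
    columns = ("headline", "implication", "summary")
    where = " OR ".join(f"{sql_without_apostrophes(c)} LIKE ?" for c in columns)
    sql = f"SELECT id FROM email_article WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, [pattern] * len(columns))]
```

The same REPLACE(UPPER(...)) expression could in principle be placed in the JPQL query if the JPA provider supports that function; that part is untested here.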

SQL Syntax and Rails: how to generate list of db records with trailing whitespace in console

I'm in the Rails console and I want to generate a list of user names that have trailing whitespace in them. I was thinking that the syntax would look like this, but it didn't work. Any chance a better programmer than me can point out what I'm doing wrong?
> User.name.where("% ")
Don't know if you're using MySQL, but an approach would be:
User.where("name LIKE '% '")
You may change this according to your database. This is kinda slow, though.
One way is
Job.all.select{|j| j =~ /^\d+$/}
but it may not be as efficient as the MySQL version.
Another possibility is to use a named scope to hide the ugly SQL:
named_scope :all_digits, lambda { |regex_str|
  { :conditions => ["invoice_number REGEXP ?", regex_str] }
}
Then you have Job.all_digits.
Answer taken from How to specify Ruby regex when using Active Record in Rails?
You can have
regex_str = '\w+\s+$'
Thanks
Here's what I went with:
User.all.select { |c| c.name.end_with?(" ") }
This got me the list I needed.
It's based on Paritosh's first answer. I'm making his the canonical answer because I think it's a better resource in general. My solution only helps me, but his has a lot of strategies that would be helpful.