How to compare columns in SQL when one has accented chars? - sql

I have two SQLite tables that I would like to join on a name column. The column contains accented characters, so I am wondering how to compare them for the join. I would like the accents to be ignored so that the comparison works.

You can influence how characters are compared (for example, ignoring case or ignoring accents) by using a collation. SQLite has only a few built-in collations, although you can add your own.
SQLite, Data types, Collating Sequences
SQLite, Define new Collating Sequence
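As a rough illustration of adding your own collation, here is a minimal sketch using Python's standard sqlite3 module. The collation name NOACCENTS, the helper strip_accents and the tables t1 and t2 are made up for this example, and the accent stripping relies on Unicode NFD decomposition, which is only one possible way to normalize names.

```python
import sqlite3
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def noaccents(a: str, b: str) -> int:
    """Collation callback: compare two strings with accents removed."""
    a, b = strip_accents(a), strip_accents(b)
    return (a > b) - (a < b)  # negative, zero or positive


conn = sqlite3.connect(":memory:")
# NOACCENTS is a hypothetical name chosen for this sketch.
conn.create_collation("NOACCENTS", noaccents)
conn.executescript("""
    CREATE TABLE t1 (name TEXT);
    CREATE TABLE t2 (name TEXT);
    INSERT INTO t1 VALUES ('José'), ('Ana');
    INSERT INTO t2 VALUES ('Jose'), ('Aná');
""")

# The explicit COLLATE makes the join comparison ignore accents.
rows = conn.execute(
    "SELECT t1.name, t2.name FROM t1 JOIN t2 "
    "ON t1.name = t2.name COLLATE NOACCENTS ORDER BY t1.name"
).fetchall()
print(rows)
```

Both rows of t1 find their partner in t2 even though the accents differ. Whether a custom collation can be registered at all depends on the SQLite binding you are using.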
EDIT:
Given that it seems doubtful if Android supports UDFs and computed columns, here's another approach:
Add another column to your table, normalizedName
When your app writes out rows to your table, it normalizes name itself, removing accents and performing other changes. It saves the result in normalizedName.
You use normalizedName in your join.
As the normalization function is now in Java, you should have few restrictions in coding it. Several examples for removing accents in Java are given here.
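The normalizedName approach can be sketched as follows. This is written in Python rather than Java purely for illustration; the table names people and orders and the helpers normalize_name and insert are assumptions made up for this example, and only the column name normalizedName comes from the answer above.

```python
import sqlite3
import unicodedata


def normalize_name(name: str) -> str:
    """Lower-case the name and strip combining accent marks."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, normalizedName TEXT);
    CREATE TABLE orders (name TEXT, normalizedName TEXT);
""")


def insert(table: str, name: str) -> None:
    # The application fills normalizedName itself on every write.
    conn.execute(
        f"INSERT INTO {table} (name, normalizedName) VALUES (?, ?)",
        (name, normalize_name(name)),
    )


insert("people", "Renée")
insert("orders", "renee")

# The join only ever looks at the pre-normalized column.
matches = conn.execute(
    "SELECT p.name, o.name FROM people p "
    "JOIN orders o ON p.normalizedName = o.normalizedName"
).fetchall()
print(matches)
```

Because the comparison is on a plain stored column, it can also be indexed like any other column.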

There is an easy, though not very elegant, solution.
Use the REPLACE function to remove the accents. Example:
SELECT YOUR_COLUMN FROM YOUR_TABLE WHERE replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace( lower(YOUR_COLUMN), 'á','a'), 'ã','a'), 'â','a'), 'é','e'), 'ê','e'), 'í','i'),
'ó','o') ,'õ','o') ,'ô','o'),'ú','u'), 'ç','c') LIKE 'SEARCH_KEY%'
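Here SEARCH_KEY is the keyword that you want to find in the column. Note that SQLite's built-in lower() converts only ASCII letters by default, so upper-case accented characters such as 'Á' need their own replace() calls.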

As mdma says, a possible solution would be a User-Defined-Function (UDF). There is a document here describing how to create such a function for SQLite in PHP. You could write a function called DROPACCENTS() which drops all the accents in the string. Then, you could join your column with the following code:
SELECT * FROM table1
LEFT JOIN table2
ON DROPACCENTS(table1.column1) = DROPACCENTS(table2.column1)
This is much like using the UCASE() function (UPPER() in SQLite) to perform a case-insensitive join.
Since you cannot use PHP on Android, you would have to find another way to create the UDF. Although it has been said that creating a UDF is not possible on Android, there is another Stack Overflow article claiming that a content provider could do the trick. The latter sounds slightly complicated, but promising.
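For comparison, this is roughly what such a UDF looks like when registered from Python's standard sqlite3 module. It is only a sketch of the idea and says nothing about what Android allows; the helper drop_accents and the sample data are made up, while DROPACCENTS, table1, table2 and column1 follow the query above.

```python
import sqlite3
import unicodedata


def drop_accents(value):
    """Return value without accent marks; NULL stays NULL."""
    if value is None:
        return None
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function.
conn.create_function("DROPACCENTS", 1, drop_accents)
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT);
    CREATE TABLE table2 (column1 TEXT);
    INSERT INTO table1 VALUES ('café'), ('tea');
    INSERT INTO table2 VALUES ('cafe');
""")

rows = conn.execute(
    "SELECT table1.column1, table2.column1 FROM table1 "
    "LEFT JOIN table2 "
    "ON DROPACCENTS(table1.column1) = DROPACCENTS(table2.column1) "
    "ORDER BY table1.column1"
).fetchall()
print(rows)
```

Since the function is applied to both sides on every comparison, SQLite cannot use an ordinary index for this join, which is one reason the stored-column approach may scale better.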

Store a special "neutral" column without accented characters and compare / search only this column.

Related

Pattern matching in Big query vs SSMS-Return strings which contain special characters or numerics

I'm a bit lost.
I've had a look at the documentation, but I'm not sure whether you can use LIKE pattern matching in BigQuery the same way as in SSMS.
The code shown here works in SSMS, but the results are not correct in BigQuery, so I was wondering if there is another way to do it.
WHERE column_name NOT LIKE '[a-Z]%'
I'm looking to return strings which contain special characters or numerics.
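Use REGEXP_CONTAINS instead:
where not regexp_contains(column_name, r'[a-zA-Z]')
LIKE is also supported as a comparison operator in BigQuery, but only with the % and _ wildcards; a bracket expression such as [a-Z] is not treated as a character class there.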
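The exact regular expression matters here, because "does not start with a letter", "contains no letters at all" and "contains at least one non-letter" select different rows. The sketch below uses Python's re module as a stand-in; BigQuery uses the RE2 engine, but simple character classes like these are assumed to behave the same way. The sample strings are made up.

```python
import re

samples = ["abc", "abc123", "123abc", "#!?", "42"]

# Equivalent of: not regexp_contains(column_name, r'[a-zA-Z]')
no_letters = [s for s in samples if not re.search(r"[a-zA-Z]", s)]

# Closest to the SSMS filter NOT LIKE '[a-Z]%' (first character only)
not_letter_first = [s for s in samples if not re.search(r"^[a-zA-Z]", s)]

# Strings containing at least one digit or special character
has_non_letter = [s for s in samples if re.search(r"[^a-zA-Z]", s)]

print(no_letters)        # ['#!?', '42']
print(not_letter_first)  # ['123abc', '#!?', '42']
print(has_non_letter)    # ['abc123', '123abc', '#!?', '42']
```

If the goal is "strings which contain special characters or numerics", the third form, regexp_contains(column_name, r'[^a-zA-Z]') in BigQuery terms, may be the closest match.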

SQLite function that works like the Oracle's "Translate" function?

Oracle has a function called translate that replaces individual characters of a string with others, pairing them up in the order in which they appear. It is different from the replace function, which replaces every occurrence of the entire second argument with the entire third argument.
translate('1tech23', '123', '456'); --would return '4tech56'
translate('222tech', '2ec', '3it'); --would return '333tith'
I need this to implement an accent-insensitive search (Brazilian Portuguese) on a SQLite database. The data in the table being queried may or may not have accents, so, depending on how the user types the query string, the results would be different.
Example:
Searching for "maçã", the user could type "maca", "maça", "macã" or "maçã", and the data in the table could also be in one of the four possibilities.
Using Oracle, I would only need this:
Select Name, Id
From Fruits
Where Translate(Name, 'ãç','ac') = Translate(:QueryString, 'ãç','ac')
... and these other character substitutions:
áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙãõÃÕäëïöüÄËÏÖÜâêîôûÂÊÎÔÛñÑçÇ
by:
aeiouAEIOUaeiouAEIOUaoAOaeiouAEIOUaeiouAEIOUnNcC
Of course I could nest several calls to Replace, but that wouldn't be a good choice.
Thanks in advance for any help.
Open-source Oracle functions for SQLite have been written at Kansas State University. They include translate() (full UTF-8 support, by the way) and can be found here.
I don't believe there is anything in SQLite that will translate text in a single pass as you describe.
This wouldn't be difficult to implement as a user-defined function, however. Here is a decent starting reference.
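A user-defined translate() could look roughly like this when registered through Python's standard sqlite3 module. This is a sketch, not the Kansas State implementation mentioned above; it tries to follow Oracle's documented behaviour (characters with no counterpart in the third argument are removed, and the first occurrence of a repeated character wins), and the sample rows in Fruits are made up.

```python
import sqlite3


def translate(text, from_chars, to_chars):
    """Replace each character of from_chars with its counterpart in to_chars."""
    if text is None:
        return None
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) not in table:  # first occurrence wins
            # No counterpart in to_chars means the character is removed.
            table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return text.translate(table)


conn = sqlite3.connect(":memory:")
conn.create_function("translate", 3, translate)

# The two examples from the question
print(conn.execute("SELECT translate('1tech23', '123', '456')").fetchone()[0])
print(conn.execute("SELECT translate('222tech', '2ec', '3it')").fetchone()[0])

conn.execute("CREATE TABLE Fruits (Id INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO Fruits VALUES (?, ?)",
    [(1, "maçã"), (2, "maca"), (3, "uva")],
)
ids = [
    row[0]
    for row in conn.execute(
        "SELECT Id FROM Fruits "
        "WHERE translate(Name, 'ãç', 'ac') = translate(?, 'ãç', 'ac') "
        "ORDER BY Id",
        ("maça",),
    )
]
print(ids)
```

The full accent mapping from the question can be passed as the second and third arguments in exactly the same way.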
I used replace
REPLACE(string,pattern,replacement)
https://www.sqlitetutorial.net/sqlite-replace-function/

Replace all occurrences of a substring in a database text field

I have a database that has around 10k records and some of them contain HTML characters which I would like to replace.
For example I can find all occurrences:
SELECT * FROM TABLE
WHERE TEXTFIELD LIKE '%&#47%'
the original string example:
this is the cool mega string that contains &#47
How can I replace all &#47 with / ?
The end result should be:
this is the cool mega string that contains /
If you want to replace a specific string with another string or a transformation of that string, you could use the "replace" function in PostgreSQL. For instance, to replace all occurrences of "cat" with "dog" in the column "myfield", you would do:
UPDATE tablename
SET myfield = replace(myfield, 'cat', 'dog')
You could add a WHERE clause or any other logic as you see fit.
Alternatively, if you are trying to convert HTML entities, ASCII characters, or between various encoding schemes, PostgreSQL has functions for that as well. PostgreSQL String Functions.
The answer given by @davesnitty will work, but you need to think very carefully about whether the text pattern you're replacing could appear embedded in a longer pattern you don't want to modify. Otherwise you'll find someone's nooking a fire, and that's just weird.
If possible, use a suitable dedicated tool for what you're un-escaping. Got URL-encoded text? Use a URL decoder. Got XML entities? Process them through an XSLT stylesheet in text output mode. And so on. These are usually safer for your data than hacking it with find-and-replace, which often has unfortunate side effects if not applied very carefully, as noted above.
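As one example of a dedicated tool, Python's standard html module can decode character references, including numeric ones written without the trailing semicolon as in the question. This is a sketch of the decoding step only; reading the rows from the database and writing them back is left out.

```python
import html

original = "this is the cool mega string that contains &#47"

# html.unescape decodes named and numeric character references.
decoded = html.unescape(original)
print(decoded)  # this is the cool mega string that contains /
```

Note that a decoder treats a longer run of digits such as &#471 as a single, different character reference rather than as &#47 followed by 1.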
It's possible you may want to use a regular expression. They are not a universal solution to all problems but are really handy for some jobs.
If you want to unconditionally replace all instances of "&#47" with "/", you don't need a regexp.
If you want to replace "&#47" but not "&#471", you might need a regexp, because you can do things like match only whole words, match various patterns, specify min/max runs of digits, etc.
In the PostgreSQL string functions and operators documentation you'll find the regexp_replace function, which will let you apply a regexp during an UPDATE statement.
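The "&#47 but not &#471" case can be illustrated with a negative lookahead. The sketch below uses Python's re module; PostgreSQL's regexp_replace uses its own regex engine, which also supports lookahead constraints, but the pattern should be tested there before running an UPDATE. The sample text is made up.

```python
import re

text = "path a&#47b, but leave &#471 alone, and &#47 at the end &#47"

# Match "&#47" only when it is not followed by another digit.
pattern = re.compile(r"&#47(?![0-9])")
result = pattern.sub("/", text)
print(result)  # path a/b, but leave &#471 alone, and / at the end /
```

In PostgreSQL the same idea would be written with regexp_replace and the 'g' flag so that every match in the field is replaced, not just the first.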
To be able to say much more I'd need to know what your real data is and what you're really trying to do.
If you don't have Postgres, you can export the whole database to a SQL file, replace the string with a text editor, delete the database on your host, and re-import the edited file.
PS: be careful

In Oracle, how do you select multiple values from a related table and store them in a single column?

I'm selecting columns from one table and would like to select all values of a column from a related table when the two tables have a matching value, separate them by commas, and display them in a single column with my results from table one.
I'm fairly new to this and apologize ahead of time if I'm not wording it correctly.
It sounds like what you're trying to do is to take multiple rows and aggregate them into a single row by concatenating string values from one or more columns. Yes?
If that's the case, I can tell you that it's a more difficult problem than it seems if you want to do it using portable SQL - especially if you don't know ahead of time how many items you may get.
The Oracle-specific solution often used in such cases is to implement a custom aggregate function, STRAGG(). Here's a link to an article that describes exactly how to do so and has examples of its usage.
If you're on Oracle 9i or later and are willing to live with using undocumented functions (that could change in the future), you can also look at the WM_CONCAT() function - which does much the same thing.
You want a row aggregation or concatenation function. Your choices are:
If you are using Oracle 11gR2, there is a built-in function to aggregate strings with a delimiter, called LISTAGG(column, delimiter), which is used together with a WITHIN GROUP (ORDER BY ...) clause.
If you are using an earlier release of Oracle Database, you can use the WM_CONCAT(column) function; however, you have no choice of delimiter and will have to use something like the TRANSLATE(string, string_to_replace, replacement_string) function to change the delimiter afterwards, which only works if your data does not contain commas.
As mentioned by LBushkin, you can create a custom function in your schema to perform row aggregation for you. Here is PL/SQL code example for one: http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php#user_defined_aggregate_function
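To show the shape of the result that these functions produce, here is a small sketch using SQLite's group_concat() from Python, purely as a stand-in because it runs without an Oracle server. The tables departments and employees and their data are made up; in Oracle 11gR2 the aggregate would be written as LISTAGG(e.name, ', ') WITHIN GROUP (ORDER BY e.name) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER, name TEXT);
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employees VALUES ('Ann', 1), ('Bob', 1), ('Cid', 2);
""")

# One output row per department, with related names joined by ', '.
rows = conn.execute(
    "SELECT d.name, group_concat(e.name, ', ') "
    "FROM departments d JOIN employees e ON e.dept_id = d.id "
    "GROUP BY d.id, d.name ORDER BY d.name"
).fetchall()

for dept, names in rows:
    print(dept, "->", names)
```

Unlike LISTAGG with its ORDER BY, this group_concat() call does not guarantee the order of the names inside each list.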

Any way to simplify this LIKE wildcard expression in T-SQL, without resorting to CLR?

I have a column of database names like so:
testdb_20091118_124925
testdb_20091119_144925
testdb_20091119_145925
etc.
Is there a more elegant way of returning only similar records than using this LIKE expression:
select * from sys.databases where name
LIKE 'testdb[_][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][_][0-9][0-9][0-9][0-9][0-9][0-9]'
No, no "elegant" solution, I'm afraid.
Furthermore, introducing functions, whether "native" or CLR, in the WHERE clause would prevent SQL Server from using indexes to resolve the predicate (it would have to scan the whole table, unless some other predicate helped narrow things down).
A few things to notice:
the use of a bare underscore may be acceptable here, since the targeted values seem to follow a very regular pattern. However, an underscore used with LIKE is itself a wildcard (matching exactly one character). If you truly want a literal underscore, "escape" it by putting it in brackets, i.e. 'abc[_]def' will match 'abc_def' precisely, but not 'abcXdef', for example.
the expression could be made a bit more selective and shorter with things like
'testdb_20[0-9][0-9][0-1][0-9][0-3][0-9][_][0-9][0-9][0-9][0-9][0-9][0-9]'
i.e. assuming dates fall in this century and restricting the first digit of the month to 0-1 and the first digit of the day to 0-3.
No, it is not possible.
By the way, you need to put your underscore inside brackets because it means any character.
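LIKE itself has no repetition count, but the pattern string does not have to be typed out by hand. The sketch below builds it in Python from repeated pieces; in T-SQL the same concatenation could be done with REPLICATE('[0-9]', 8). The names DIGIT and pattern are made up, and the check at the end relies on the fact that this particular pattern, having only bracket classes and no % or bare _, happens to read the same way as a regular expression.

```python
import re

DIGIT = "[0-9]"

# testdb, a literal underscore, 8 date digits, a literal underscore, 6 time digits
pattern = "testdb[_]" + DIGIT * 8 + "[_]" + DIGIT * 6
print(pattern)

# Sanity check against the sample names, reading the pattern as a regex.
names = ["testdb_20091118_124925", "testdb_20091119_144925", "testdbX20091119_145925"]
matching = [n for n in names if re.fullmatch(pattern, n)]
print(matching)  # ['testdb_20091118_124925', 'testdb_20091119_144925']
```

This changes only how the pattern is written, not what LIKE has to evaluate, so the remarks above about indexes still apply.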