Need to UPPER SQL statement with INNER JOIN SELECT

I'm using Pervasive SQL 10.3 (let's just call it MS SQL since almost everything is the same regarding syntax) and I have a query to find duplicate customers using their email address as the duplicate key:
SELECT arcus.idcust, arcus.email2
FROM arcus
INNER JOIN (
SELECT arcus.email2, COUNT(*)
FROM arcus WHERE RTRIM(arcus.email2) != ''
GROUP BY arcus.email2 HAVING COUNT(*)>1
) dt
ON arcus.email2=dt.email2
ORDER BY arcus.email2
My problem is that I need to do a case insensitive search on the email2 field. I'm required to have UPPER() for the conversion of those fields.
I'm a little stuck on how to do an UPPER() in this query. I've tried all sorts of combinations including one that I thought for sure would work:
... ON UPPER(arcus.email2)=UPPER(dt.email2) ...
... but that didn't work. It took it as a valid query, but it ran for so long I eventually gave up and stopped it.
Any idea of how to do the UPPER conversion on the email2 field?
Thanks!

If your database is set up to be case sensitive, then your inner query will have to take account of this to perform the grouping as you intended. If it is not case sensitive, then you won't require UPPER functions.
Assuming your database IS case sensitive, you could try the query below. Maybe this will run faster...
SELECT arcus.idcust, arcus.email2
FROM arcus
INNER JOIN (
SELECT UPPER(arcus.email2) as upperEmail2, COUNT(*) as cnt
FROM arcus WHERE RTRIM(arcus.email2) != ''
GROUP BY UPPER(arcus.email2) HAVING COUNT(*)>1
) dt
ON UPPER(arcus.email2) = dt.upperEmail2

The reason the query was so slow is most likely that none of the existing table indexes could be used, so the database engine had to perform a full table scan, likely multiple times.
An index on arcus.email2 is completely useless when comparing the uppercased versions (UPPER(arcus.email2)), because the index stores the original strings, not their uppercased forms, so the engine cannot look the values up.
To improve the performance, if your engine supports function-based indexes (Oracle, PostgreSQL, and SQLite do, for example), you can create an index specifically on the result of applying UPPER to the field.
CREATE INDEX IX_arcus_UPPER_email2
ON arcus (UPPER(email2));
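Note that support for expression-based indexes varies by engine: Oracle and PostgreSQL accept the form above, but SQL Server does not. A sketch of the SQL Server workaround, assuming it applies to your setup (the column and index names here are made up):
-- SQL Server: index a computed column instead of an expression
ALTER TABLE arcus ADD email2_upper AS UPPER(email2);
CREATE INDEX IX_arcus_email2_upper ON arcus (email2_upper);
The join condition can then compare against email2_upper directly, letting the engine seek the index.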

A character column's collation determines how SQL Server compares strings. With a case-insensitive collation, the strings "AAAA" and "aaaa" compare as equal. You can apply COLLATE Latin1_General_CI_AS to your email column directly in the WHERE clause.
Check the link below for how to implement collation in a SQL query.
How to do a case sensitive search in WHERE clause
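For instance, applied to the join from the question, a hedged sketch (the collation name assumes a SQL Server-style Latin1 setup; Pervasive's collation support may differ):
-- force a case-insensitive comparison regardless of the columns' collation
... ON arcus.email2 COLLATE Latin1_General_CI_AS = dt.email2 ...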

Related

Sub-Queries in Sybase SQL

We have an application which indexes data using user-written SQL statements. We place those statements within parentheses so we can limit the query to certain criteria. For example:
select * from (select F_Name from table_1)q where ID > 25
However, we have discovered that this format does not work on a Sybase database: it reports a syntax error around the parentheses. I've tried playing around on a test instance but haven't been able to find a way to achieve this result. I'm not directly involved in the development and my SQL knowledge is limited. I'm assuming the 'q' is there to give the subresult an alias for the application to use.
Does Sybase have a specific syntax? If so, how could this query be adapted for it?
Thanks in advance.
Sybase ASE is case sensitive with respect to all identifiers, so the query should work provided the identifier case matches.
As per @HannoBinder's query: select id from ... is not the same as select ID from ..., so make sure the case is right.
Also make sure that the column ID is returned by the q subquery so it can be used in the WHERE clause.
If the table and column names are in upper case, the following query should work:
select * from (select F_NAME, ID from TABLE_1) Q where ID > 25
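If in doubt about the identifiers' actual case, you can check the catalog; a hedged ASE-style sketch:
-- list the column names exactly as stored, to confirm their case
select c.name
from syscolumns c
where c.id = object_id('table_1')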

Joining varchar and nvarchar

I'm comparing account numbers in two different databases to make sure each account exists in both. The account field is nvarchar in one database and varchar in the other. I cast them both to varchar(12) and join them to see where there isn't a match. If an account number has fewer than 12 characters, it's treated as not a match. I'm assuming extra characters in each field are causing the issue?
table1 - accountnumber(nvarchar(255))
table2 - accountnumber(varchar(20))
select * from
table1
left outer join table2 on table2.accountnumber = table1.accountnumber
In this one example, both tables have an account with the number 12345678, but the join isn't working. I'm not sure if it's data type mismatch or white space or something else.
--Added--
I should add that the data in table2 actually originates from an Oracle database where it's stored as a varchar2(12 byte). I import it into a SQL Server database where it's stored as a varchar(20). I'm not sure if this makes a difference.
Not sure where you are having a problem. This query should return matching account numbers (no need to CAST):
SELECT *
FROM YourTable
JOIN YourOtherTable ON YourTable.AccountNumber = YourOtherTable.AccountNumber
If your data has spaces, you can TRIM your data depending on your RDBMS -- LTRIM and RTRIM for SQL Server.
SELECT *
FROM YourTable
JOIN YourOtherTable ON RTRIM(LTRIM(YourTable.AccountNumber)) = RTRIM(LTRIM(YourOtherTable.AccountNumber))
Good luck.
Your query works fine, so this is perhaps a character encoding issue. Try using COLLATE to force a common collation on both sides of the comparison.
I ran into exactly the same case; I even had two sibling queries (one created as a copy of the other) which both had this problem. Collation and types were not the issue here.
Finally, after a LOT of testing, one of the queries started to work without apparent changes, just re-written. When I retyped the IN part of the second query, it started to work too.
So the problem was a hidden character accidentally typed somewhere in the query.
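To hunt for such hidden characters or padding, a SQL Server-flavored diagnostic sketch (table and column names taken from the question):
-- LEN ignores trailing spaces while DATALENGTH counts bytes (2 per character
-- for nvarchar), so a mismatch below flags trailing padding
SELECT accountnumber,
LEN(accountnumber) AS char_len,
DATALENGTH(accountnumber) AS byte_len
FROM table1
WHERE DATALENGTH(accountnumber) <> 2 * LEN(accountnumber);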

Considering spaces in a SQL row as null

I was supposed to get all data from the table where the column "Address" is not null,
so I made a statement that looks like this:
Select * from Table where Address is not null
Unfortunately, some rows in the "Address" column contain only spaces, so SQL does not treat them as NULL.
How can I display rows where Address is neither null nor blank?
Thanks :)
Most database systems have a NULLIF() function. It was defined together with COALESCE() in the ANSI SQL-99 standard if not earlier. It is implemented in at least SQL Server, Oracle, PostgreSQL, MySQL, SQLite, DB2, Firebird.
Select * from Table where NULLIF(Address,'') is not null
But for me, I like this more
Select * from Table where Address > ''
It kills nulls and empty strings in one go, and on SQL Server, where trailing spaces are ignored in comparisons, it even excludes strings made up entirely of spaces (' ', '  ', etc.). It also retains SARGability.
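Whether > '' also filters space-only values depends on the engine's padding rules, so a more explicit, portable sketch trims first:
-- trim before the empty test so space-only addresses are excluded on any engine
Select * from Table where NULLIF(LTRIM(RTRIM(Address)), '') is not null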

Sqlite query optimisation needed

I'm using SQLite for a small validation application. I have a simple one-table database with 4 varchar columns and one integer primary key. There are close to 1 million rows in the table. I have optimised it and done a vacuum on it.
I am using the following query to retrieve a presence count from the table. I have changed the fields and names for privacy.
SELECT
count(*) as 'test'
FROM
my_table
WHERE
LOWER(surname) = LOWER('Oliver')
AND
UPPER(address_line2) = UPPER('Somewhere over the rainbow')
AND
house_number IN ('3','4','5');
This query takes about 1.5-1.9 seconds to run. I have tried indexes and they make no real difference. That may not sound bad, but I have to run this test about 40,000 times against a read-in CSV file, so as you may imagine it adds up quickly. Any ideas on how to reduce the execution time? I normally develop in MSSQL or MySQL, so if there are tricks I am missing in SQLite I would be happy to hear them.
All the best.
When you use a function over an indexed column, SQLite cannot use the index, because the function may not preserve ordering -- i.e. x < y does not guarantee F(x) < F(y). There are some ways to solve this, though:
If you want to use indexes to make your query faster, you must save the value in a fixed case (upper or lower) and then convert only the query parameter to the same case:
SELECT count(*) as 'test'
FROM my_table
WHERE surname = LOWER('Oliver')
You can use the case-insensitive LIKE operator (I don't know how indexes are affected!):
SELECT count(*) as 'test'
FROM my_table
WHERE surname LIKE 'Oliver';
Or you can create each column as text collate nocase and don't worry about case differences regarding this column anymore:
CREATE TABLE my_table (surname text collate nocase, <... other fields here ...>);
SELECT count(*) as 'test'
FROM my_table
WHERE surname = 'Oliver';
You can find more information about the = and LIKE operators in the SQLite documentation.
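With the first or third approach in place, a plain index becomes usable; a sketch for the collate nocase schema (the index name is made up):
-- surname is declared COLLATE NOCASE, so this index can serve the
-- case-insensitive equality test without a full table scan
CREATE INDEX idx_my_table_surname ON my_table(surname);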
Alternatively, apply COLLATE NOCASE to each text comparison directly:
SELECT count(1) AS test
FROM my_table
WHERE surname = 'Oliver' COLLATE NOCASE
AND address_line2 = 'Somewhere over the rainbow' COLLATE NOCASE
AND house_number IN ('3','4','5');
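Either way, you can ask SQLite whether an index is actually being used; a quick sketch:
-- "SEARCH ... USING INDEX" in the output means an index is used; "SCAN" means a full scan
EXPLAIN QUERY PLAN
SELECT count(*)
FROM my_table
WHERE surname = 'Oliver' COLLATE NOCASE;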

Make an SQL request more efficient and tidy?

I have the following SQL query:
SELECT Phrases.*
FROM Phrases
WHERE (((Phrases.phrase) Like "*ing aids*")
AND ((Phrases.phrase) Not Like "*getting*")
AND ((Phrases.phrase) Not Like "*contracting*"))
AND ((Phrases.phrase) Not Like "*preventing*"); //(etc.)
Now, if I were using RegEx, I might bunch all the Nots into one big (getting|contracting|preventing), but I'm not sure how to do this in SQL.
Is there a way to render this query more legibly/elegantly?
Just by removing redundant stuff and using a consistent naming convention your SQL looks way cooler:
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND phrase NOT LIKE '%getting%'
AND phrase NOT LIKE '%contracting%'
AND phrase NOT LIKE '%preventing%'
You talk about regular expressions. Some DBMS do have it: MySQL, Oracle... However, the choice of either syntax should take into account the execution plan of the query: "how quick it is" rather than "how nice it looks".
With MySQL, you're able to use regular expression where-clause parameters:
SELECT something FROM table WHERE column REGEXP 'regexp'
So if that's what you're using, you could write a regular expression string that is possibly a bit more compact than your 4 LIKE criteria. It may not be as easy for other people to see what the query is doing, however.
It looks like SQL Server offers a similar feature.
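For example, the alternation idea from the question as a MySQL-flavored sketch:
-- one NOT REGEXP with alternation replaces the three NOT LIKE tests
SELECT *
FROM phrases
WHERE phrase REGEXP 'ing aids'
AND phrase NOT REGEXP 'getting|contracting|preventing';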
Since it sounds like you're building this as you go to mine your data, here's something you could consider:
CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL)
CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL)
INSERT INTO Includes VALUES ('%ing aids%')
INSERT INTO Excludes VALUES ('%getting%')
INSERT INTO Excludes VALUES ('%contracting%')
INSERT INTO Excludes VALUES ('%preventing%')
SELECT
*
FROM
Phrases P
WHERE
EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase) AND
NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
You are then always just running the same query and you can simply change what's in the Includes and Excludes tables to refine your searches.
Depending on what SQL server you are using, it may support REGEX itself. For example, google searches show that SQL Server, Oracle, and mysql all support regex.
You could push all your negative criteria into a short-circuiting CASE expression (works on SQL Server; not sure about MS Access).
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND CASE
WHEN phrase LIKE '%getting%' THEN 2
WHEN phrase LIKE '%contracting%' THEN 2
WHEN phrase LIKE '%preventing%' THEN 2
ELSE 1
END = 1
On the "more efficient" side, you need to find criteria that allow the engine to avoid reading the entire phrase column. Double-sided wildcard criteria are bad; right-sided wildcard criteria are good.
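To make that concrete, a small sketch (assuming an index on phrase exists):
-- not sargable: the leading % defeats any index on phrase, forcing a scan
SELECT * FROM phrases WHERE phrase LIKE '%ing aids%';
-- sargable: a fixed prefix lets the engine seek the index instead
SELECT * FROM phrases WHERE phrase LIKE 'preventing%';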