Difference between % vs * in string comparison in Hive

When trying to list all table names in a database that match a specific name format, the following query works fine:
show tables like '*case*';
while the following does not:
show tables like '%case%';
On the other hand, when comparing the actual data inside string columns, it's the other way around.
Working query:
select column from database.table where column like '%ABC%' limit 5;
Not working query:
select column from database.table where column like '*ABC*' limit 5;
What's the difference between the two operators * and %?

This is the difference between regular expressions and like patterns.
LIKE is built into the SQL language. It has two wildcards:
% represents any number of characters including zero.
_ represents exactly one character.
Regular expressions are much more flexible for matching almost any pattern in a string.
When SQL was invented, I don't think regular expressions were in common use in computer systems. At the very least, the folks at IBM who worked on relational databases may not have been familiar with the folks at AT&T who were inventing Unix.
Regular expressions are much more powerful than LIKE patterns, of course. And Hive supports them via the RLIKE operator (and some other functions).
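The SHOW functionality is not standard SQL. So, the developers of Hive chose a regular-expression-style method for pattern matching there; according to the Hive language manual, the only wildcards SHOW TABLES accepts are * for any characters and | for a choice.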

HiveQL attempts to mimic SQL, but it does not strictly follow the standard.
The meaning of the wildcards is determined not by the LIKE clause but by the statement it appears in. SHOW statements interpret wildcards along the lines of Java regular expressions, whereas in SELECT statements Hive sticks to SQL's wildcard rules.
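To make the two conventions concrete, here is a minimal Python sketch (not Hive code) that translates each pattern style into a regular expression. The helper names like_to_regex and show_pattern_to_regex are made up for illustration, and the SHOW side assumes the behaviour described in the Hive language manual, where * stands for any characters and | separates alternatives.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern: % -> any run of characters, _ -> one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else, including '*', is literal
    return "".join(parts)

def show_pattern_to_regex(pattern: str) -> str:
    """Translate a SHOW-style pattern (assumed: * -> any characters, | -> choice)."""
    alternatives = []
    for alt in pattern.split("|"):
        # Escape the literal pieces and join them with '.*' where '*' appeared.
        alternatives.append(".*".join(re.escape(piece) for piece in alt.split("*")))
    return "(?:" + "|".join(alternatives) + ")"

def like_match(pattern: str, value: str) -> bool:
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None

def show_match(pattern: str, name: str) -> bool:
    return re.fullmatch(show_pattern_to_regex(pattern), name, re.DOTALL) is not None

if __name__ == "__main__":
    print(like_match("%case%", "test_case_1"))  # LIKE understands %
    print(like_match("*case*", "test_case_1"))  # '*' is an ordinary character to LIKE
    print(show_match("*case*", "test_case_1"))  # SHOW-style understands *
    print(show_match("%case%", "test_case_1"))  # '%' is an ordinary character here
```

Each pattern only behaves as a wildcard pattern under the convention it was written for; under the other convention its wildcard characters are matched literally, which is why the "wrong" query in each pair finds nothing.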

Related

Does array like function exist in SQL

Using SQL, I would like to know if it's possible to do the following:
If I have a variable into which the user enters multiple strings separated by commas, for example ('aa','bbb','c','dfd'), is it possible to use LIKE with a wildcard at the end of each string instead of having the user enter each variation in multiple macros?
So say the user was looking for employee numbers that start with ('F','E','C'): is it possible without using OR statements? That is the question I guess I am asking.
It would be similar to an array, I guess.
No, LIKE is its own operator, so the conditions need to be separated by OR.
You might prefer ILIKE to LIKE, as it is a case-insensitive comparison.
You can also try REGEXP_LIKE, which is similar to what you want, except you'll have to use regular expressions instead of 'FEC%'.
That depends on your SQL dialect; I don't know Impala at all, but other SQL engines have support for regular expressions in string matches, so that you can build a query string like
SELECT fld FROM tbl WHERE fld REGEXP '^[FEC].*$';
No matter what you do, you will need to build a query from your user's input. Passing user input unprocessed into your SQL processor is a big "nope" anyways, from a "don't accidentally delete a table" point of view.
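One way to follow that advice is to expand the user's list into one LIKE condition per prefix and pass the values as bound parameters rather than pasting them into the SQL text. The sketch below uses Python's sqlite3 module as a stand-in for the real database; the employees table, the emp_no column, and the helper names are assumptions made for the example, and the ESCAPE clause syntax may differ in other dialects.

```python
import sqlite3

def escape_like(text: str) -> str:
    """Escape LIKE wildcards in user input so they are matched literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def find_by_prefixes(conn: sqlite3.Connection, prefixes: list) -> list:
    """Return emp_no values starting with any of the given prefixes."""
    if not prefixes:
        return []
    # One "LIKE ?" per prefix, joined with OR; the values travel as parameters.
    condition = " OR ".join("emp_no LIKE ? ESCAPE '\\'" for _ in prefixes)
    sql = f"SELECT emp_no FROM employees WHERE {condition} ORDER BY emp_no"
    params = [escape_like(p) + "%" for p in prefixes]
    return [row[0] for row in conn.execute(sql, params)]

def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory table with invented employee numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_no TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?)",
        [("F100",), ("E200",), ("C300",), ("X400",), ("FE_1",)],
    )
    return conn

if __name__ == "__main__":
    print(find_by_prefixes(demo_connection(), ["F", "E", "C"]))
```

The OR conditions are still there, but they are generated, so the user only ever supplies the list of prefixes.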

Wildcard Search for a numeric range

I am trying to filter a column (plan_name) which holds the internet plans of customers, e.g. (200GB Fast Unlimited, Free Additional Mailbox, Unlimited VDSL...). I specifically want to select the plans which do not have any numbers in them. The column is of type varchar2 and I'm querying an Oracle database. I have written the following code:
SELECT *
FROM plans
WHERE plan_name NOT LIKE '%[0-9]%'
However, this code still returns plans like 200GB Fast that have numbers in them. Can anyone explain why this is?
Oracle's LIKE syntax does not support the kind of pattern matching you are attempting; that is SQL Server syntax. Oracle interprets the pattern as a literal '[0-9]' (which, obviously, something like '200GB' does not match).
However, unlike SQL Server, Oracle has proper support for regular expressions, through the regexp_* functions. If you want values that contain no digit, you can do:
where not regexp_like(plan_name, '\d')
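The difference can be reproduced in a small Python sketch. It uses SQLite through the sqlite3 module as a stand-in, since its LIKE also treats [0-9] as literal text (this is an illustration, not Oracle itself). SQLite has no built-in regular expression function, so the sketch registers its own function named regexp; the table contents are invented.

```python
import re
import sqlite3

def regexp(pattern, value):
    """Backs SQLite's "value REGEXP pattern" operator; 1 if the pattern is found."""
    return int(value is not None and re.search(pattern, value) is not None)

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE plans (plan_name TEXT)")
conn.executemany(
    "INSERT INTO plans VALUES (?)",
    [("200GB Fast Unlimited",), ("Free Additional Mailbox",), ("Unlimited VDSL",)],
)

# LIKE has no character classes: '[0-9]' is searched for as five literal characters.
like_rows = [r[0] for r in conn.execute(
    "SELECT plan_name FROM plans WHERE plan_name NOT LIKE '%[0-9]%' ORDER BY plan_name"
)]

# A regular expression does understand the class, so the plan with digits drops out.
regex_rows = [r[0] for r in conn.execute(
    "SELECT plan_name FROM plans WHERE NOT (plan_name REGEXP '[0-9]') ORDER BY plan_name"
)]

print(like_rows)
print(regex_rows)
```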

Problems with BETWEEN dates operator

I am practising and experimenting with different syntaxes of the SQL BETWEEN operator with regard to dates, based on https://www.w3schools.com/sql/sql_between.asp
The Orders table in my database is the one used on that page.
The query fetches the orders whose OrderDate falls between two given dates.
These are the two main syntax versions (according to w3schools):
SELECT *
FROM Orders
WHERE OrderDate BETWEEN #01/07/1996# AND #31/07/1996#;
and:
SELECT *
FROM Orders
WHERE OrderDate BETWEEN '1996-07-01' AND '1996-07-31';
Running either of the two queries above against the Orders table gives:
Number of Records: 22 (out of 196 records). Yes, this is correct.
Now I am experimenting with these syntax versions.
CASE #1:
SELECT *
FROM Orders
WHERE OrderDate BETWEEN #1996/07/01# AND #1996/07/31#;
Result of case #1: 22 (same as the above syntax)
In the SQL Tryit editor (https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_between_date&ss=-1) they state that this SQL statement is not supported in the WebSQL database. The example still works, because it uses a modified version of SQL.
WHY SO?
If you're using the W3Schools Tryit editor in Chrome, you're using WebSQL, which is basically SQLite.
SQLite doesn't have a dedicated date/time type, so it is probably storing the date values as strings in ISO-8601 format (see this answer for more information).
Other database systems (e.g. Oracle, Microsoft SQL Server, Postgres, MySQL) have built-in date types, and you generally write date values as strings (enclosed in single quotes). For example: '1997-07-01' (depending on the specific RDBMS, there might be more specific considerations).
The format that uses pound signs (e.g. #7/1/1997#) is unique to Microsoft Access (see this answer for more information).
Bottom line: Dates are generally enclosed in single quotes. You're best off sticking to the ISO-8601 standard (e.g. 1997-07-01).
If you're learning SQL, there are other resources out there besides W3Schools. I would recommend downloading an open-source RDBMS like Postgres or MySQL, setting up a sample database, and working on some queries. Challenge sites like codewars might also be helpful.
One more thing: Don't use BETWEEN for dates. Use >= and <, to make sure you're not excluding dates with a time portion. For more information, read this blog.
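The point about time portions can be seen in a short Python sketch using sqlite3, with dates stored as ISO-8601 text as described above. The table and rows are invented for the illustration; databases with real timestamp types show the same effect when a bare date is compared against a value that has a time of day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [
        (1, "1996-07-01 00:00:00"),
        (2, "1996-07-15 09:30:00"),
        (3, "1996-07-31 14:30:00"),  # last day of the month, but not at midnight
        (4, "1996-08-01 00:00:00"),
    ],
)

def order_ids(where_clause, params):
    """Run a filtered query and return the matching OrderID values in order."""
    sql = f"SELECT OrderID FROM Orders WHERE {where_clause} ORDER BY OrderID"
    return [row[0] for row in conn.execute(sql, params)]

# BETWEEN is inclusive on both ends, but '1996-07-31 14:30:00' sorts after '1996-07-31'.
between_ids = order_ids("OrderDate BETWEEN ? AND ?", ("1996-07-01", "1996-07-31"))

# Half-open range: from the first day (inclusive) up to the next month (exclusive).
half_open_ids = order_ids("OrderDate >= ? AND OrderDate < ?", ("1996-07-01", "1996-08-01"))

print(between_ids)
print(half_open_ids)
```

The BETWEEN version silently loses the order placed on the afternoon of 31 July, while the half-open range keeps it and still excludes 1 August.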

How to determine whether a varchar field DOES NOT contain characters in set

I need to determine whether any rows in a varchar column in a db contain characters outside of the particular set below:
abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|=''
I tried this but am not sure if it is correct:
select AccName
from Transactions
where AccName not like '%[!abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|='']%'
Should this work?
Any help appreciated.
You cannot use a regular expression inside an ordinary LIKE condition in a query. If you want to use regular expressions, you will have to use a special operator. In MySQL, you could try the following:
SELECT AccName
FROM Transactions
-- [^...] matches any single character outside the listed set; '-' goes last so it is literal
WHERE AccName REGEXP '[^a-zA-Z0-9.#,/\\\\&%();:+_*?|=''-]';
If this doesn't run right away, you may have to tidy up the regular expression. And as marc_s asked, the exact regular expression and query will depend on the DB system you are using.
Database management systems vary in their support for matching regular expressions. Examples below use PostgreSQL, which supports POSIX regular expressions, along with other flavors. Examples below also test for case-sensitive matches to avoid sentences like "'Mike' doesn't not match the regular expression".
AFAIK, no DBMS lets you mix the like operator with a regular expression.
A like expression in the form column_name like '%a%' will match 'a' if it appears anywhere in the column. But you need your regular expression to match on the whole value of the column. Anchor the regular expression at the start and end of each value (^ and $), and tell the dbms to match one or more instances (+) of the atom.
select 'Mike' ~ '^[a-zA-Z0-9]+$'; -- 'Mike' matches the regex
Write a failing test.
select 'Mike?' ~ '^[a-zA-Z0-9]+$'; -- 'Mike?' doesn't match the regex
Add the question mark to the regex, and verify the test succeeds.
select 'Mike?' ~ '^[a-zA-Z0-9?]+$'; -- 'Mike?' matches the regex
Repeat failing test and succeeding test for each character. When you've caught all the characters you want, invert the logic using the !~ operator in place of the ~ operator.
When your data is clean, move this into a CHECK constraint.
PostgreSQL pattern matching
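The same test-by-test approach can be tried outside the database. Here is a rough Python sketch using the re module rather than PostgreSQL's operators; the ALLOWED string is copied from the question's character set, and the function names and sample values are invented.

```python
import re
import string

# Allowed characters, taken from the set in the question.
ALLOWED = string.ascii_letters + string.digits + ".-#,/\\&%();:+_*?|='"

# Escape every character so '-', '\' and friends are literal inside the class.
ALLOWED_CLASS = "[" + "".join(re.escape(ch) for ch in ALLOWED) + "]"
WHOLE_VALUE = re.compile(ALLOWED_CLASS + "+")

def is_clean(value: str) -> bool:
    """True when the whole value consists of one or more allowed characters."""
    # fullmatch plays the role of the ^ and $ anchors.
    return WHOLE_VALUE.fullmatch(value) is not None

def offending_values(values):
    """Inverted logic: keep the values that fail the whole-value match."""
    return [v for v in values if not is_clean(v)]

if __name__ == "__main__":
    print(offending_values(["Mike", "Mike?", "Mike!", "A&B(Pty)Ltd", "Smith~Jones"]))
```

Note that an empty string also counts as offending here, because + requires at least one character; whether that is desirable depends on the data.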

Preg_replace solution for prepared statements

I have a command class that abstracts almost all database-specific functions (we have exactly the same application running on MSSQL 2005 (using ODBC and the native mssql library), MySQL and Oracle). But sometimes we have problems with our prepare method which, when executed, replaces all placeholders with their respective values. I am using the following:
if (is_array($Parameter['Value']))
{
    $Statement = str_ireplace(':'.$Name, implode(', ', $this->Adapter->QuoteValue($Parameter['Value'])), $Statement);
}
else
{
    $Statement = str_ireplace(':'.$Name, $this->Adapter->QuoteValue($Parameter['Value']), $Statement);
}
The problem arises when we have two or more similar parameter names, for example session_browser and session_browser_version... The first one will partially replace the last one.
Of course, we learned to work around this by specifying the parameters in a specific order, but now that I have some "free" time I want to make it better, so I am thinking of switching to preg_replace... and I am not good at regular expressions. Can anyone help with a regex to replace a string like ':parameter_name'?
Best Regards,
Bruno B B Magalhaes
You should use the \b metacharacter to match the word boundary, so you don't accidentally match a short parameter name within a longer parameter name.
Also, you don't need to special-case arrays if you coerce a scalar Value to an array of one entry:
$Statement = preg_replace(
    '/:' . preg_quote($Name, '/') . '\b/',  // \b stops :name matching inside :name_suffix
    implode(",", $this->Adapter->QuoteValue( (array) $Parameter['Value'] )),
    $Statement
);
Note, however, that this can make false positive matches when an identifier or a string literal contains a pattern that looks like a parameter placeholder:
SELECT * FROM ":Name";
SELECT * FROM Table WHERE column = ':Name';
This gets even more complicated when quoted identifiers and string literals can contain escaped quotes.
SELECT * FROM Table WHERE column = 'word\':Name';
You might want to reconsider interpolating variables into SQL strings during prepare, because you're defeating any benefits of prepared statements with respect to security or performance.
I understand why you're doing what you're doing, because not every RDBMS back end supports named parameters, and SQL parameters can't be used for lists of values in an IN() predicate. But you're creating an awfully leaky abstraction.
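To see what the word boundary buys, here is a small Python sketch of the same idea (a translation for illustration, not the PHP class from the question). The bind and quote_value helpers are hypothetical, and the quoting is deliberately naive; the sketch inherits every caveat above about placeholders inside literals and about interpolating values at all.

```python
import re

def quote_value(value) -> str:
    """Naive stand-in for the adapter's QuoteValue; not safe for real use."""
    items = value if isinstance(value, (list, tuple)) else [value]
    return ", ".join("'" + str(item).replace("'", "''") + "'" for item in items)

def bind(statement: str, params: dict) -> str:
    """Replace :name placeholders, matching each name only as a whole word."""
    for name, value in params.items():
        pattern = re.compile(":" + re.escape(name) + r"\b")
        replacement = quote_value(value)
        # A function replacement keeps backslashes in the value from being
        # interpreted as group references.
        statement = pattern.sub(lambda match, text=replacement: text, statement)
    return statement

SQL = "SELECT * FROM sessions WHERE b = :session_browser AND v = :session_browser_version"

if __name__ == "__main__":
    # Plain substring replacement clobbers the longer placeholder.
    print(SQL.replace(":session_browser", "'Firefox'"))
    # The boundary-aware version leaves it for its own parameter.
    print(bind(SQL, {"session_browser": "Firefox", "session_browser_version": "3.6"}))
    # Scalars and lists go through the same path, as with the (array) cast.
    print(bind("SELECT * FROM t WHERE id IN (:ids)", {"ids": [1, 2, 3]}))
```

With the boundary in place, the order in which parameters are supplied no longer matters for names that share a prefix.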