SQL - Where clause: comparing a column with multiple values - sql

Is there any generic command or syntax in SQL to allow one to have multiple values in a where statement without using OR?
OR just gets tedious when you have many values to choose from and you only want say half of them.
I want to return only columns that contain certain values. I am using Cache SQL, but as I said, a generic syntax might be helpful as well because most people are unfamiliar with Cache SQL. Thanks!

You should use IN:
... where column_name in ('val1', 'val2', ...);

Use the 'IN' clause.
SELECT * FROM product WHERE productid IN (1,2,3)

I believe you are looking for the IN clause.
Determines whether a specified value matches any value in a subquery or a list.
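To see that IN behaves like a chain of ORs, here is a minimal sketch run against an in-memory SQLite database from Python. SQLite is only a stand-in here (Cache SQL was not tested), and the product table, its name column, and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; only the table/column names from the answer above are reused.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (productid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "apple"), (2, "pear"), (3, "plum"), (4, "fig"), (5, "kiwi")],
)

# Build one "?" placeholder per wanted value so the list can be any length.
wanted = [1, 2, 3]
placeholders = ", ".join("?" for _ in wanted)
in_query = (
    f"SELECT productid FROM product "
    f"WHERE productid IN ({placeholders}) ORDER BY productid"
)
in_rows = [row[0] for row in conn.execute(in_query, wanted)]

# The equivalent query written with OR, for comparison.
or_query = (
    "SELECT productid FROM product "
    "WHERE productid = 1 OR productid = 2 OR productid = 3 "
    "ORDER BY productid"
)
or_rows = [row[0] for row in conn.execute(or_query)]

print(in_rows, or_rows)
```

Both queries should return the same rows; the IN form just scales better as the list grows.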

Related

Add column with substring of other column in SQL (Snowflake)

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
One obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could replace the CREATE TABLE table2 AS... to INSERT OVERWRITE INTO table1 ... and at least that wouldn't store an extra copy of the whole thing.
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.
I'm not sure of the exact syntax/functions in Snowflake, but generally speaking there are a few different ways of achieving this.
The general approach that should work almost everywhere is the SUBSTRING function (or a close equivalent), which is available in most databases.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
| Code  | Desc                                    |
+-------+-----------------------------------------+
| 0001  | 1First Character Will be Removed        |
| 0002  | xCharacter to be Removed                |
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the name of the "SUBSTRING" function varies between databases. In Oracle, for example, the function is "SUBSTR". You just need to find the Snowflake equivalent.
Another approach that would work at least in SQLServer and MySQL would be using the "RIGHT" function
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you actually want to update the actual data within the table. In that case you can use the same function above in an update statement.
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))
You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-
There is no need to specify the length, as is evidenced from the following simple test harness:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;
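As a quick check of the in-place UPDATE, the following sketch runs the same statement against an in-memory SQLite database from Python. This is an assumption-laden stand-in: SQLite's two-argument SUBSTR is used in place of Snowflake's, the sample values are borrowed from the test harness above, and nothing here says anything about performance on a billion rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (columnY TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?)",
    [("abcde",), ("e",), ("",)],
)

# Two-argument SUBSTR: start at position 2 and keep the rest of the string.
conn.execute("UPDATE tableX SET columnY = SUBSTR(columnY, 2)")

trimmed = [row[0] for row in conn.execute("SELECT columnY FROM tableX ORDER BY rowid")]
print(trimmed)
```

Note that one-character and empty strings both end up as empty strings rather than raising an error.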

check if column contains one of many values

I have an array of strings and a column which may contain one or more of those strings (separated by spaces). I want to get all rows where this column contains one of the strings. Since the values all have 3 letters and therefore can't contain each other, I know I could just write
SELECT * FROM table WHERE
column LIKE '%val1%' OR
column LIKE '%val2%' OR
column LIKE '%val3%' OR
column LIKE '%val4%'
But I'm wondering if there isn't an easier statement, like column IN ('val1', 'val2', 'val3', 'val4'). (This one seems to work only when the entry is equal to one of the values, not when it merely contains one of them.)
Try reading Is there a combination of "LIKE" and "IN" in SQL? and Combining "LIKE" and "IN" for SQL Server; these should answer your question.
Something like this from the first link.
SQL Server:
WHERE CONTAINS(t.something, '"bla*" OR "foo*" OR "batz*"')
In Oracle you could use regular expressions:
select *
from table
where regexp_like (column,'val(1|2|3|4)')
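If neither full-text search nor regular expressions are available, another option is to generate the chain of LIKE conditions from the array in application code. The sketch below does this in Python against an in-memory SQLite database; the items table, its tags column, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, tags TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "abc def"), (2, "ghi"), (3, "xyz abc"), (4, "jkl mno")],
)

values = ["abc", "ghi"]

# One "tags LIKE ?" term per value, joined with OR.
where = " OR ".join("tags LIKE ?" for _ in values)
# Wrap each value in % wildcards; the values are bound as parameters, not spliced into the SQL.
params = [f"%{v}%" for v in values]

query = f"SELECT id FROM items WHERE {where} ORDER BY id"
matched_ids = [row[0] for row in conn.execute(query, params)]
print(matched_ids)
```

This assumes the search values contain no % or _ characters; if they might, they would need escaping first.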

What kind of SQL clause is this? Any way to convert it to SQL?

what kind of SQL is this?
SELECT IFNULL(SUM(prenotazione.VALUTAZIONE),0) AS somma,
COUNT(*) AS numero
FROM `prenotazione`
WHERE prenotazione.USER_ID=18793 AND
prenotazione.PRENOTAZIONE_STATO_ID IN (10,11)
I'm using propel as my ORM.
Any way to convert that kind of SQL to Mysql SQL?
This query is valid in MySQL. It selects all rows from the prenotazione table where the user_id is 18793 and the prenotazione_stato_id is 10 or 11. The resulting rows are summarized: in the numero column you get the number of rows found, in the somma column you get the sum of the valutazione values. If no rows were selected, SUM() would return NULL. To prevent this, IFNULL([expr1],[expr2]) is applied, which returns [expr1] if it is not null, and [expr2] if it is null. This makes sure you always return a number.
There is no easy way to do this with Propel, since your result cannot be easily mapped to a Propel object. The best thing you can do is use the underlying database layer (PDO) to escape your parameters and handle the result set; that way you don't open an extra database connection or anything like that.
When considering portability, Standard SQL is your friend. This query can be very easily transformed into Standard SQL-92:
1. Terminate the statement with a semi-colon.
2. Replace IFNULL with COALESCE.
3. Remove the backticks (MySQL's identifier quotes) from the table name.
With better spacing it could look like this:
SELECT COALESCE(SUM(prenotazione.VALUTAZIONE), 0) AS somma,
COUNT(*) AS numero
FROM prenotazione
WHERE prenotazione.USER_ID = 18793
AND prenotazione.PRENOTAZIONE_STATO_ID IN (10,11);
That said, for MySQL you probably would need to undo step 3... which leads me to suspect it was MySQL syntax in the first place.
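The effect of wrapping SUM() in COALESCE can be checked with a small sketch. It uses an in-memory SQLite database from Python as a stand-in (SQLite happens to accept both IFNULL and COALESCE); the rows and the helper function summarize are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prenotazione "
    "(USER_ID INTEGER, PRENOTAZIONE_STATO_ID INTEGER, VALUTAZIONE INTEGER)"
)
conn.executemany(
    "INSERT INTO prenotazione VALUES (?, ?, ?)",
    [(18793, 10, 4), (18793, 11, 5), (18793, 12, 3), (1, 10, 2)],
)

def summarize(sum_expr, user_id):
    """Hypothetical helper: run the query with the given SUM expression."""
    sql = (
        f"SELECT {sum_expr} AS somma, COUNT(*) AS numero "
        "FROM prenotazione "
        "WHERE USER_ID = ? AND PRENOTAZIONE_STATO_ID IN (10, 11)"
    )
    return conn.execute(sql, (user_id,)).fetchone()

# No matching rows: a bare SUM() yields NULL (None in Python).
raw_empty = summarize("SUM(VALUTAZIONE)", 99999)
# COALESCE replaces that NULL with 0.
safe_empty = summarize("COALESCE(SUM(VALUTAZIONE), 0)", 99999)
# Matching rows: only states 10 and 11 are summed and counted.
found = summarize("COALESCE(SUM(VALUTAZIONE), 0)", 18793)

print(raw_empty, safe_empty, found)
```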
Using Babelfish to give a rough translation from Italian to English results in
SELECT IFNULL(SUM(reservation.APPRAISAL),0) AS sum,
COUNT(*) AS number
FROM `reservation`
WHERE reservation.USER_ID=18793 AND
reservation.RESERVATION_STATE_ID IN (10,11)
Share and enjoy.

COUNT() Function in conjunction with NOT IN clause not working properly with varchar field (T-SQL)

I came across a weird situation when trying to count the number of rows that DO NOT have varchar values specified by a select statement. Ok, that sounds confusing even to me, so let me give you an example:
Let's say I have a field "MyField" in "SomeTable" and I want to count in how many rows MyField values do not belong to a domain defined by the values of "MyOtherField" in "SomeOtherTable".
In other words, suppose that I have MyOtherField = {1, 2, 3}, I wanna count in how many rows MyField value are not 1, 2 or 3. For that, I'd use the following query:
SELECT COUNT(*) FROM SomeTable
WHERE ([MyField] NOT IN (SELECT MyOtherField FROM SomeOtherTable))
And it works like a charm. Notice however that MyField and MyOtherField are int typed. If I try to run the exact same query, except for varchar typed fields, it returns 0 even though I know that there are wrong values, I put them there! And if I, however, try to count the opposite (how many rows ARE in the domain, as opposed to what I want, which is how many rows are not) simply by removing the "NOT" from the query above... Well, THAT works! ¬¬
Yeah, there must be tons of workarounds to this but I'd like to know why it doesn't work the way it should. Furthermore, I can't simply alter the whole query as most of it is built inside a C# code and basically the only part I have freedom to change that won't have an impact in any other part of the software is the select statement that corresponds to the domain (whatever comes in the NOT IN clause). I hope I made myself clear and someone out there could help me out.
Thanks in advance.
NOT IN never evaluates to true if the subquery returns a NULL value, so no rows are counted. The accepted answer to this question elegantly describes why.
The NULLability of a column is independent of the datatype used, too: most likely your varchar column has NULL values.
To deal with this, use NOT EXISTS. For non-null values it works the same as NOT IN, so it is compatible.
SELECT COUNT(*) FROM SomeTable S1
WHERE NOT EXISTS (SELECT * FROM SomeOtherTable S2 WHERE S1.[MyField] = S2.MyOtherField)
gbn has a more complete answer, but I can't be bothered to remember all that. Instead I have the religious habit of filtering nulls out of my IN clauses:
SELECT COUNT(*)
FROM SomeTable
WHERE [MyField] NOT IN (
SELECT MyOtherField FROM SomeOtherTable
WHERE MyOtherField is not null
)
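The behaviour described in these answers can be reproduced with a small sketch. It uses an in-memory SQLite database from Python rather than SQL Server, on the assumption that both follow the same three-valued NULL logic for NOT IN; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (MyField TEXT)")
conn.execute("CREATE TABLE SomeOtherTable (MyOtherField TEXT)")
conn.executemany("INSERT INTO SomeTable VALUES (?)", [("a",), ("b",), ("x",), ("y",)])
# The domain contains a NULL, which is what breaks NOT IN.
conn.executemany("INSERT INTO SomeOtherTable VALUES (?)", [("a",), ("b",), (None,)])

def count(sql):
    return conn.execute(sql).fetchone()[0]

# Plain NOT IN: the NULL makes every comparison unknown or false.
not_in = count(
    "SELECT COUNT(*) FROM SomeTable "
    "WHERE MyField NOT IN (SELECT MyOtherField FROM SomeOtherTable)"
)
# NOT EXISTS is unaffected by the NULL.
not_exists = count(
    "SELECT COUNT(*) FROM SomeTable S1 "
    "WHERE NOT EXISTS (SELECT * FROM SomeOtherTable S2 "
    "WHERE S1.MyField = S2.MyOtherField)"
)
# NOT IN with NULLs filtered out of the subquery.
not_in_filtered = count(
    "SELECT COUNT(*) FROM SomeTable "
    "WHERE MyField NOT IN (SELECT MyOtherField FROM SomeOtherTable "
    "WHERE MyOtherField IS NOT NULL)"
)

print(not_in, not_exists, not_in_filtered)
```

The last variant only changes the subquery, which may matter when that is the only part of the statement that can be edited.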

Using an Alias column in the where clause in ms-sql 2000

I know you cannot use an alias column in the where clause for T-SQL; however, has Microsoft provided some kind of workaround for this?
Related Questions:
Unknown Column In Where Clause
Can you use an alias in the WHERE clause in mysql?
“Invalid column name” error on SQL statement from OpenQuery results
One workaround would be to use a derived table.
For example:
select *
from
(
select a + b as aliased_column
from table
) dt
where dt.aliased_column = something
I hope this helps.
Depending on what you are aliasing, you could turn it into a user defined function and reference that in both places. Otherwise you're copying the aliased code in several places, which tends to become very ugly and means updating 3+ spots if you are also ordering on that column.
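For reference, the derived-table workaround can be tried out with the sketch below, which runs against an in-memory SQLite database from Python. SQLite is only a stand-in and is more permissive about aliases than T-SQL, so this shows the shape of the query rather than proving anything about SQL Server 2000; the table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4), (5, 5)])

# The inner query defines the alias; the outer query can then filter and sort on it
# without repeating the "a + b" expression.
query = """
    SELECT dt.aliased_column
    FROM (
        SELECT a + b AS aliased_column
        FROM t
    ) dt
    WHERE dt.aliased_column > 5
    ORDER BY dt.aliased_column
"""
sums = [row[0] for row in conn.execute(query)]
print(sums)
```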