PL/SQL - Multiple column equality - sql

I'm trying to evaluate multiple columns for equality in a single expression to save myself a few keystrokes (granted, at this point, the time and effort of the search has long since negated any "benefit" I would ever receive) rather than writing several separate comparisons.
Basically, I have:
WHERE column1 = column2
AND column2 = column3
I want:
WHERE column1 = column2 = column3
I found this other question, which is tangentially related:
Oracle SQL Syntax - Check multiple columns for IS NOT NULL

Use:
x=all(y,z)
instead of
x=y and y=z
The above saves 1 keystroke (1/11 = 9% - not much).
If column names are longer, then it gives bigger savings:
This is 35 characters long:
column1=column2 AND column2=column3
while this one is only 28:
column1=ALL(column2,column3)
But for this one (95 characters):
column1=column2 AND column2=column3 AND column3=column4
AND column4=column5 AND column5=column6
you get a saving of 43/95 = almost 50%:
column1=all(column2,column3,column4,column5,column6)
The ALL operator is part of ANSI SQL and is supported by most databases (MySQL, PostgreSQL, SQL Server, etc.):
http://www.w3resource.com/sql/special-operators/sql_all.php
A simple test case that shows how it works:
create table t( x int, y int, z int );
insert all
into t values( 1,1,1)
into t values(1,2,2)
into t values(1,1,2)
into t values(1,2,1)
select 1 from dual;
select *
from t
where x = all(y,z);
X Y Z
---------- ---------- ----------
1 1 1

One possible trick is to use the LEAST and GREATEST functions: if the largest and the smallest values in a list are equal, then all the values must be equal:
LEAST(col1, col2, col3) = GREATEST(col1, col2, col3)
I'm not sure it saves any keystrokes on a three column list, but if you have many columns, it could save some characters. Note that this solution implicitly assumes that none of the values are null, but so does your original solution, so it should be OK.
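For instance, reusing the test table t from the ALL example above, here is a quick sketch of the LEAST/GREATEST approach against that same sample data:
select *
from t
where least(x, y, z) = greatest(x, y, z);
-- returns only the row where all three columns are equal:
-- X Y Z
-- 1 1 1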

Related

Oracle SQL: Adding Case When inside UTL Match

I am using UTL_MATCH to compare values from two different tables and see how similar they are. I am filtering for those that are at least 90 out of 100 similar so I can manually check whether those values actually are the same or not.
As the resulting data set is too big, I am working on some new queries to take out the values that are surely the same and do not need any manual checking, like those that are more than 90% similar with a size of 9 characters.
So far I have just been using the WHERE clause, but now I want to add a CASE expression to state that values containing the word "University" that do not have at least 95% similarity should not appear.
The code I am using seems to run, but it's taking a lot of time. Do you know if it can be improved (for time)? Thank you!
The code I am using looks like this:
with consolidate_table as (....)
select
    column1, column2,
    UTL_MATCH.jaro_winkler_similarity(column1, column2) as jws
from consolidate_table
where UTL_MATCH.jaro_winkler_similarity(column1, column2) >= 90
  AND UTL_MATCH.jaro_winkler_similarity(column1, column2) < 100
  AND LENGTH(column1) < 9
  AND column1 = (CASE
                   WHEN column1 LIKE '%University%'
                        AND UTL_MATCH.jaro_winkler_similarity(column1, column2) > 94
                   THEN column1
                   ELSE NULL
                 END)
;
It's hard to be sure without sample data and expected results, but at first glance it can be simplified.
Using BETWEEN, the function only needs to be called once in the WHERE clause.
And filtering out the rows where the values are equal could help speed it up.
...
WHERE column1 <> column2
AND UTL_MATCH.jaro_winkler_similarity(column1, column2) BETWEEN (CASE WHEN column1 LIKE '%University%' THEN 95 ELSE 91 END) AND 99
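For reference, here is a sketch of the full rewrite with that WHERE clause in place (assuming the same consolidate_table CTE as in your query, and keeping the original LENGTH filter):
with consolidate_table as (....)
select
    column1, column2,
    UTL_MATCH.jaro_winkler_similarity(column1, column2) as jws
from consolidate_table
where column1 <> column2
  and length(column1) < 9
  and UTL_MATCH.jaro_winkler_similarity(column1, column2)
      between (case when column1 like '%University%' then 95 else 91 end) and 99;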

Getting rows for values greater by 3 numbers or letters

I am trying to come up with a query that returns rows where the distance between the letters is one or more from the chosen letter.
For example:
I have two columns with letters, Column A and Column B. I want to return the rows where Column B is greater than Column A by one or more letters.
It's not clear to me, when you say "greater", whether you mean that the distance between the two letters is 2 or 3 in either direction (Column B can be alphabetically before or after Column A, by a distance of 2 or 3), or whether Column B has to be alphabetically after Column A, by a distance of 2 or 3.
Because I'm not certain which you mean, I present two options. Read the "if" rule, choose the one that applies to your situation, and use the query under it:
If columnA is D and columnB can be any of: A B F G
SELECT * FROM table WHERE ABS(ASCII(columna) - ASCII(columnb)) IN (2,3)
If columnA is D and columnB can be any of: F G
SELECT * FROM table WHERE ASCII(columnb) - ASCII(columna) IN (2,3)
Edit1: Per your later comment, you are now saying that the distance is not just 2 or 3 letters (the first line of your question states "2 or 3") but any distance equal to or greater than 2:
SELECT * FROM table WHERE ASCII(columnb) - ASCII(columna) >= 2
Overall the technique isn't much different from the above queries, and there are many ways to specify what you want:
SELECT * FROM table
WHERE
ASCII(columnb) - ASCII(columna)
BETWEEN <some_number_here> AND <other_number_here>
Ultimately, the most important thing to note is the use of the ASCII function, which gives us the ASCII character code of the first letter in a string:
ASCII('ABCD') => 65
We can then do maths on this to work out whether a letter's distance from 'A' is more than 1, and so on.
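For example, a quick check (using Oracle's DUAL dummy table; adjust for your database):
-- distance from 'A' to 'D' is 3
select ascii('D') - ascii('A') as letter_distance
from dual;
-- LETTER_DISTANCE
-- ---------------
--               3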
Probably also worth noting that ASCII() works on single-byte ASCII characters. If your data is multibyte (Unicode), you might need to use ORD() instead.
Edit2: Your latest edit to the question revises the limit to "B greater than A by one or more", which is equivalent to >= 1.
The question does not seem to have a clear spec, so please treat this answer as a guide to the general technique:
--for an open ended distance, ascii chars
SELECT * FROM table WHERE ASCII(columnb) - ASCII(columna) >= <some_distance>
--for an open ended distance, unicode
SELECT * FROM table
WHERE ORD(columnb) - ORD(columna) >= <some_distance>
--for a definite range of distances (replace … appropriately)
SELECT * FROM table
WHERE ... BETWEEN <some_distance> AND <some_other_distance>
This will work if you want the distance to be exactly 2:
select * from table_name where ascii(col_1)+2=ascii(col_2);
You can use something like this if you need ColumnB to be exactly 2 or 3 letters greater:
select ColumnA, ColumnB from table_name where ASCII(ColumnB) - ASCII(ColumnA) in (2,3)
If you want all the rows where the difference is 2 or more, then use this:
select ColumnA, ColumnB from table_name where ASCII(ColumnB) - ASCII(ColumnA) >= 2
This is how you can put ASCII into action:
select * from SampleTable where (ASCII(sampleTable.ColumnB) - ASCII(ColumnA)) >= 2;

SQL - how to check in WHERE clause if some sign (#) exists at least two times in string (i.e. AB#CDEF#)?

I have a column with string values.
I would like a SELECT statement that returns all rows where the sign # is present two or more times.
For example:
COL1 COL2
1 AB#CDE#
2 AB#
3 AB#CDE#FG#IJ#
If I do
SELECT * FROM TABLE WHERE COL2 LIKE '%#%'
it will return all three rows, but I need only the 1st and 3rd.
Thank you,
SELECT * FROM TABLE WHERE COL2 LIKE "%#%#%"
As long as there's at least 2 instances of "#" then this will catch it.
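A minimal check of that pattern against the sample rows (the table name sample_table is just a placeholder for illustration):
create table sample_table (col1 int, col2 varchar(20));
insert into sample_table values (1, 'AB#CDE#');
insert into sample_table values (2, 'AB#');
insert into sample_table values (3, 'AB#CDE#FG#IJ#');

select * from sample_table where col2 like '%#%#%';
-- returns rows 1 and 3 only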

Giving Range to the SQL Column

I have a SQL table with a value column and a Probability column. I want to select one row from it at random, but I want to give more chance to the rows with higher weighted probability. I can do this with
Order By abs(checksum(newid()))
But the differences between the probabilities are too big, so it gives far too much chance to the highest probability. For example, after picking that value around 74 times it picks another value once, then that value again around 74 times. I want to reduce this, say to picking it 3-4 times and then the others. I am thinking of giving a range to the probabilities, like
Row[i] = Row[i-1]+Row[i]
How can I do this? Do I need to create a function? Is there any other way to achieve this? I am a newbie. Any help will be appreciated. Thank you.
EDIT:
I have a solution to my problem, but I have one question.
If I have a table as follows:
Column1 Column2
1 50
2 30
3 20
can I get this?
Column1 Column2 Column3
1 50 50
2 30 80
3 20 100
Each time I want to add the value to the existing one. Is there any way?
UPDATE:
Finally got the solution after 3 hours. I just take repeated square roots of my probabilities; that way I can narrow the difference between them. It is like adding a column with
sqrt(sqrt(sqrt(Probability)))... :-)
I'd handle it with something like
ORDER BY rand()*pow(<probability-field-name>,<n>)
For different values of n you distort the linear probabilities into a simple polynomial. Small values of n (e.g. 0.5) will compress the weights toward 1 and thus make less probable choices more probable; big values of n (e.g. 2) will do the opposite and further reduce the probability of already improbable values.
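Since the question already uses abs(checksum(newid())), here is a sketch of the same idea in SQL Server syntax; the Items table name and the exponent 0.5 are placeholders for illustration:
-- NEWID() produces a fresh random value per row, unlike RAND(),
-- which SQL Server evaluates only once per query
SELECT TOP 1 *
FROM Items
ORDER BY (ABS(CHECKSUM(NEWID())) % 1000000) / 1000000.0
         * POWER(Probability, 0.5) DESC;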
Since the difference in probabilities is too great, you need to add a computed field with a revised weighting that has a more even probability distribution. How you do that depends on your data and preferred distribution. One way to do it is to "normalize" the weighting to an integer between 1 and 10 so that the lowest probability is never more than ten times smaller than the highest.
Answer to your recent question:
SELECT t.Column1,
       t.Column2,
       (SELECT SUM(Column2)
        FROM table t2
        WHERE t2.Column1 <= t.Column1) Column3
FROM table t
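If your database supports windowed aggregates (SQL Server 2012 or later, Oracle), the same running total can be written with an analytic SUM instead of the correlated subquery; a minimal sketch, using your_table as a placeholder name:
-- running total of Column2 in Column1 order, equivalent to the subquery above
SELECT Column1,
       Column2,
       SUM(Column2) OVER (ORDER BY Column1) AS Column3
FROM your_table;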
Here is a basic example of how to select one row from a table while taking into account the assigned row weights.
Suppose we have this table:
CREATE TABLE TableWithWeights(
Id int NOT NULL PRIMARY KEY,
DataColumn nvarchar(50) NOT NULL,
Weight decimal(18, 6) NOT NULL -- Weight column
)
Let's fill the table with sample data:
INSERT INTO TableWithWeights VALUES(1, 'Frequent', 50)
INSERT INTO TableWithWeights VALUES(2, 'Common', 30)
INSERT INTO TableWithWeights VALUES(3, 'Rare', 20)
This is the query that returns one random row, taking into account the given row weights:
SELECT * FROM
(SELECT tww1.*, -- Select original table data
-- Add column with the sum of all weights of previous rows
(SELECT SUM(tww2.Weight)- tww1.Weight
FROM TableWithWeights tww2
WHERE tww2.id <= tww1.id) as SumOfWeightsOfPreviousRows
FROM TableWithWeights tww1) as tww,
-- Add column with random number within the range [0, SumOfWeights)
(SELECT RAND()* sum(weight) as rnd
FROM TableWithWeights) r
WHERE
(tww.SumOfWeightsOfPreviousRows <= r.rnd)
and ( r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight)
To check the query results we can run it 100 times:
DECLARE @count as int;
SET @count = 0;
WHILE ( @count < 100)
BEGIN
-- This is the query that returns one random row with
-- taking into account given row weights
SELECT * FROM
(SELECT tww1.*, -- Select original table data
-- Add column with the sum of all weights of previous rows
(SELECT SUM(tww2.Weight)- tww1.Weight
FROM TableWithWeights tww2
WHERE tww2.id <= tww1.id) as SumOfWeightsOfPreviousRows
FROM TableWithWeights tww1) as tww,
-- Add column with random number within the range [0, SumOfWeights)
(SELECT RAND()* sum(weight) as rnd
FROM TableWithWeights) r
WHERE
(tww.SumOfWeightsOfPreviousRows <= r.rnd)
and ( r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight)
-- Increase counter
SET @count += 1
END
P.S. The query was tested on SQL Server 2008 R2. And of course the query can be optimized (it's easy to do once you get the idea).

Counting non-null columns in a rather strange way

I have an Oracle table which has 32 columns.
Two of these columns are identity columns; the rest hold values.
I would like to get the average of all the value columns, which is complicated by the null (identity) columns. Below is the pseudocode for what I am trying to achieve:
SELECT
((nvl(val0, 0) + nvl(val1, 0) + ... nvl(valn, 0))
/ nonZero_Column_Count_In_This_Row)
Such that: nonZero_Column_Count_In_This_Row = (ifNullThenZeroElse1(val0) + ifNullThenZeroElse1(val1) ... ifNullThenZeroElse1(valn))
The difficulty here is of course in getting 1 for any non-null column. It seems I need a function similar to NVL, but with an else clause. Something that will return 0 if the value is null, but 1 if not, rather than the value itself.
How should I go about getting the value for the denominator?
PS: I feel I must explain some motivation behind this design. Ideally this table would have been organized as the identity columns and one value per row, with some identifier for the row itself. This would have been more normalized, and the solution to this problem would have been pretty simple. The reasons for not doing it like this are throughput and saving space. This is a huge DB into which we insert 10 million values per minute. Making each of these values its own row would mean 10M rows per minute, which is definitely not attainable. Packing 30 of them into a single row reduces the number of rows inserted to something we can handle with a single DB, and makes the overhead data (the identity data) much smaller.
(Case When col is null then 0 else 1 end)
You could use NVL2(val0, 1, 0) + NVL2(val1, 1, 0) + ... since you are using Oracle.
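Putting NVL and NVL2 together, here is a sketch of the per-row average for just three value columns (extend the same pattern up to valn; the NULLIF is an extra guard, added here so a row where every value column is NULL yields NULL instead of a division-by-zero error):
SELECT (NVL(val0, 0) + NVL(val1, 0) + NVL(val2, 0))
       / NULLIF(NVL2(val0, 1, 0) + NVL2(val1, 1, 0) + NVL2(val2, 1, 0), 0)
         AS row_avg
FROM mytable;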
Another option is to use the AVG function, which ignores NULLs:
SELECT AVG(v) FROM (
WITH q AS (SELECT val0, val1, val2, val3 FROM mytable)
SELECT val0 AS v FROM q
UNION ALL SELECT val1 FROM q
UNION ALL SELECT val2 FROM q
UNION ALL SELECT val3 FROM q
);
If you're using Oracle 11g you can use the UNPIVOT syntax to make it even simpler.
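For example, a sketch of the UNPIVOT version (same mytable and val0..val3 names as above); note that UNPIVOT drops NULL cells by default, so AVG only sees the non-null values:
SELECT AVG(v) AS avg_v
FROM (SELECT val0, val1, val2, val3 FROM mytable)
UNPIVOT (v FOR which_col IN (val0, val1, val2, val3));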
I see this is a pretty old question, but I don't see a sufficient answer. I had a similar problem, and below is how I solved it. It's pretty clear a CASE expression is needed. This solution is a workaround for cases where
SELECT COUNT(column) WHERE column {IS | IS NOT} NULL
does not work for whatever reason, or where you need to run several
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL1 IS NOT NULL;
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL2 IS NOT NULL;
queries but want the results as a single data set when you run the script. See below; I use this for analysis and it's been working great for me so far:
SELECT SUM(CASE NVL(valn, 'X')
             WHEN 'X'
             THEN 0
             ELSE 1
           END) as COLUMN_NAME
FROM YOUR_TABLE;
Cheers!
Doug
Generically, you can do something like this:
SELECT (
(COALESCE(val0, 0) + COALESCE(val1, 0) + ...... COALESCE(valn, 0))
/
(SIGN(ABS(COALESCE(val0, 0))) + SIGN(ABS(COALESCE(val1, 0))) + .... )
) AS MyAverage
The top line returns the sum of the values (treating NULLs as zero), whereas the bottom line counts the values that are non-null and non-zero (a stored zero contributes 0 to the denominator, so this only matches a true non-null count if zeros cannot occur).
FYI, this is SQL Server syntax, but COALESCE works much like ISNULL for the most part. SIGN just returns -1 for a negative number, 0 for zero, and 1 for a positive number. ABS is "absolute value".
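A quick illustration of how that denominator term behaves (SQL Server syntax; the literal values are just for demonstration):
SELECT SIGN(ABS(COALESCE(NULL, 0))) AS null_value,    -- 0: NULLs are not counted
       SIGN(ABS(COALESCE(7, 0)))    AS nonzero_value, -- 1: non-zero values are counted
       SIGN(ABS(COALESCE(0, 0)))    AS zero_value;    -- 0: stored zeros are not counted either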