How to generate a dynamic ORDER BY clause in a PL/SQL procedure?

I am trying to write a PL/SQL procedure that runs a SQL query to get the results. The requirement is that the ORDER BY can be dynamic, mainly for sorting the columns on the screen. I am passing 2 parameters to this procedure - in_sort_column and in_sort_order.
The requirement is such that text columns sort in ASC order and numeric columns in DESC order.
My query looks something like this without adding the in_sort_order -
SELECT col1, col2, col3 from table1 where col1 > 1000
ORDER BY decode(in_sort_column,'col1', col1, 'col2', col2, 'col3', col3);
I am not able to figure out how to use the in_sort_order parameter in this case. Can someone who has done this before help out?
Thanks

When doing a dynamic sort, I recommend using separate clauses:
order by (case when in_sort_column = 'col1' then col1 end),
(case when in_sort_column = 'col2' then col2 end),
(case when in_sort_column = 'col3' then col3 end)
This guarantees that you will not have an unexpected problem with type conversion if the columns are of different types. Note that CASE returns NULL when there is no ELSE clause.

Since the requirement is based on data type, you could just negate the numeric columns in your decode; if col1 is numeric and the others are text then:
ORDER BY decode(in_sort_column, 'col1', -col1, 'col2', col2, 'col3', col3);
But this is going to attempt to convert the text columns to numbers. You can swap the decode around to avoid that, but then you do an implicit conversion of your numeric column to a string, and your numbers will be sorted alphabetically - so 2 comes after 10, for example.
So Gordon Linoff's use of case is better, and you can still negate the col1 value with that to make the numbers effectively sort descending.
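Putting the two answers together, a minimal sketch (the only assumption beyond the question is that col1 is the numeric column):
ORDER BY (case when in_sort_column = 'col1' then -col1 end),  -- numeric column: negated, so it effectively sorts DESC
         (case when in_sort_column = 'col2' then col2 end),   -- text column: natural ASC order
         (case when in_sort_column = 'col3' then col3 end)    -- text column: natural ASC order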

Related

Best way to compare three columns in Hive SQL

I need to do some comparisons across 3 columns containing string dates ('yyyy-mm-dd') in Hive SQL. Please take into consideration that the table has more than 2 million records.
Considering three columns (col1, col2, col3) from table T1, I must guarantee that:
col1 = col2, and both (or at least one) are different from col3.
My best regards,
Logically you have an issue.
col1 = col2
Therefore if col1 != col3 then col2 != col3.
So it's really enough to use:
select * from T1 where col1 = col2 and col1 != col3;
This can be done map-side, so a WHERE clause is likely good enough.
If you wanted to say that 2 out of the 3 columns need to match, you could use group by with having to reduce comparisons.
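As a hedged per-row sketch of that 2-of-3 idea (using only the question's T1 and col1..col3 names, and counting equal pairs with CASE instead of a GROUP BY):
select *
from T1
where (case when col1 = col2 then 1 else 0 end)
    + (case when col1 = col3 then 1 else 0 end)
    + (case when col2 = col3 then 1 else 0 end) >= 1;  -- at least one equal pair means at least 2 of the 3 columns match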

SQL query to find columns having at least one non null value

I am developing a data validation framework where I have a requirement to check that table fields have at least one non-null value, i.e. they shouldn't be completely empty with every value NULL.
For a particular column, I can easily check using
select count(distinct column_name) from table_name;
If it's greater than 0 I can tell that the column is not empty. I already have a list of columns. So, I can execute this query in the loop for every column but this would mean a lot of requests and it is not the ideal way.
What is the better way of doing this? I am using Microsoft SQL Server.
I would not recommend using count(distinct) because it incurs overhead for removing duplicate values. You can just use count().
You can construct the query for counts using a query like this:
select count(col1) as col1_cnt, count(col2) as col2_cnt, . . .
from t;
If you have a list of columns you can do this as dynamic SQL. Something like this:
declare @sql nvarchar(max);
select @sql = concat('select ',
                     string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ', '),
                     ' from t'
                    )
from string_split(@list, ',') s;
exec sp_executesql @sql;
This might not quite work if your columns have special characters in them, but it illustrates the idea.
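For example, if @list held the hypothetical value 'col1,col2', the generated statement would be:
select count([col1]) as cnt_col1, count([col2]) as cnt_col2 from t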
You should probably use exists, since you don't really need a count of anything.
You don't indicate how you want to consume the results of multiple counts, however one thing you could do is use concat to return a list of the columns meeting your criteria:
The following sample table has 5 columns, 3 of which have a value on at least 1 row.
create table t (col1 int, col2 int, col3 int, col4 int, col5 int)
insert into t select null,null,null,null,null
insert into t select null,2,null,null,null
insert into t select null,null,null,null,5
insert into t select null,null,null,null,6
insert into t select null,4,null,null,null
insert into t select null,6,7,null,null
You can name the result of each case expression and concatenate them; only the columns that have a non-null value are included, because concat_ws ignores the nulls returned by the case expressions.
select Concat_ws(', ',
case when exists (select * from t where col1 is not null) then 'col1' end,
case when exists (select * from t where col2 is not null) then 'col2' end,
case when exists (select * from t where col3 is not null) then 'col3' end,
case when exists (select * from t where col4 is not null) then 'col4' end,
case when exists (select * from t where col5 is not null) then 'col5' end)
Result:
col2, col3, col5
I asked a similar question about a decade ago. The best way of doing this in my opinion would meet the following criteria.
Combine the requests for multiple columns together so they can all be calculated in a single scan.
If the scan encounters a non-null value in every column under consideration, allow it to exit early without reading the rest of the table/index, as reading subsequent rows won't change the result.
This is quite a difficult combination to get in practice.
The following might give you the desired behaviour
SELECT DISTINCT TOP 2 ColumnWithoutNull
FROM YourTable
CROSS APPLY (VALUES(CASE WHEN b IS NOT NULL THEN 'b' END),
(CASE WHEN c IS NOT NULL THEN 'c' END)) V(ColumnWithoutNull)
WHERE ColumnWithoutNull IS NOT NULL
OPTION ( HASH GROUP, MAXDOP 1, FAST 1)
If it gives you a plan whose hash match aggregate runs in "flow distinct" mode, you may get the desired early exit.
Hash match usually reads all of its build input first, meaning that no short-circuiting of the scan will happen. If the optimiser gives you the operator in "flow distinct" mode, however, it won't do this, and query execution can potentially stop as soon as TOP receives its first two rows, signalling that a NOT NULL value has been found in both columns.
But there is no hint to request that mode for a hash aggregate, so you are dependent on the whims of the optimiser as to whether you will get this in practice. The various hints I have added to the query above are an attempt to point it in that direction, however.

How does CASE WHEN THEN treat NULL values

Assuming that I have the following piece of code in the SELECT clause which is being executed on Spark:
...
MEAN(CASE
WHEN (col1 = 'A'
AND (col3 = 'A' OR col4 = 'B')) THEN col2
END) AS testing,
...
What would be the output of this query when col2 is NULL? Will the rows containing col2 = NULL be ignored by the MEAN function?
Disclaimer - don't know Apache Spark!
I've created a SQL Fiddle - http://sqlfiddle.com/#!9/6f7d5e/3.
If col2 is null, it is not included in the average, unless all the matching records are null.
For those rows, the CASE expression evaluates to NULL, and it will have the type of col2 -- this might matter in some databases (or if you are saving the result to a table).
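A minimal, runnable Spark SQL sketch of this behaviour (the inline VALUES data and the simplified condition are assumptions for illustration):
SELECT MEAN(CASE WHEN col1 = 'A' THEN col2 END) AS testing
FROM VALUES ('A', 10), ('A', NULL), ('B', 99) AS t(col1, col2);
-- returns 10.0: the NULL from the second row is ignored by MEAN, and the 'B' row never matches the CASE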
What is the MEAN() function? The standard SQL function for calculating averages is AVG(); Spark SQL accepts MEAN() as an alias for it.

Inline If Statements in SQL

I wish to do something like this:
DECLARE @IgnoreNulls bit = 1;
SELECT Col1, Col2
FROM tblSimpleTable
IF @IgnoreNulls
BEGIN
WHERE Col2 IS NOT NULL
END
ORDER BY Col1 DESC;
The idea is to, in a very PHP/ASP.NET-ish kinda way, only filter NULLs if the user wishes to. Is this possible in T-SQL? Or do we need one large IF block like so:
IF @IgnoreNulls
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
WHERE Col2 IS NOT NULL
ORDER BY Col1 DESC;
END
ELSE
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
ORDER BY Col1 DESC;
END
You can do that this way:
SELECT Col1, Col2
FROM tblSimpleTable
WHERE ( @IgnoreNulls != 1 OR Col2 IS NOT NULL )
ORDER BY Col1 DESC
Dynamically changing searches based on the given parameters is a complicated subject, and doing it one way over another, even with only a very slight difference, can have massive performance implications. The key is not compact code, and not avoiding repeated code; the key is a good query execution plan that uses an index.
Read this and consider all the methods. Your best method will depend on your parameters, your data, your schema, and your actual usage:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
In general (unless the table is small) the best approach is to separate out the cases and do something like you have in your question.
IF (@IgnoreNulls = 1)
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
WHERE Col2 IS NOT NULL
ORDER BY Col1 DESC;
END
ELSE
BEGIN
SELECT Col1, Col2
FROM tblSimpleTable
ORDER BY Col1 DESC;
END
This is less likely to cause you problems with sub optimal query plans being cached.
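If you do prefer the single combined query, one method covered in that article is OPTION (RECOMPILE), which compiles a fresh plan for the actual runtime value of the parameter at the cost of a recompile on every execution (a sketch reusing the query above):
SELECT Col1, Col2
FROM tblSimpleTable
WHERE (@IgnoreNulls != 1 OR Col2 IS NOT NULL)
ORDER BY Col1 DESC
OPTION (RECOMPILE);  -- plan is built for the current @IgnoreNulls value, so the irrelevant branch can be optimised away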

SQL Server 2008 - Case / If statements in SELECT Clause [duplicate]

This question already has answers here:
How do I perform an IF...THEN in an SQL SELECT?
(29 answers)
Closed 9 years ago.
I have a Query that's supposed to run like this -
If(var = xyz)
SELECT col1, col2
ELSE IF(var = zyx)
SELECT col2, col3
ELSE
SELECT col7,col8
FROM
.
.
.
How do I achieve this in T-SQL without writing separate queries for each clause? Currently I'm running it as
IF (var = xyz) {
Query1
}
ELSE IF (var = zyx) {
Query2
}
ELSE {
Query3
}
That's just a lot of redundant code just to select different columns depending on a value.
Any alternatives?
You are looking for the CASE expression
http://msdn.microsoft.com/en-us/library/ms181765.aspx
Example copied from MSDN:
USE AdventureWorks;
GO
SELECT ProductNumber, Category =
CASE ProductLine
WHEN 'R' THEN 'Road'
WHEN 'M' THEN 'Mountain'
WHEN 'T' THEN 'Touring'
WHEN 'S' THEN 'Other sale items'
ELSE 'Not for sale'
END,
Name
FROM Production.Product
ORDER BY ProductNumber;
GO
Just a note here that you may actually be better off having 3 separate SELECTs, for reasons of optimization. If you have one single SELECT, the generated plan will have to project all the columns col1, col2, col3, col7, col8 etc., although, depending on the runtime value of @var, only some are needed. This may result in plans that do unnecessary clustered index lookups because the non-clustered index doesn't cover all the columns projected by the SELECT.
On the other hand, 3 separate SELECTs, each projecting only the needed columns, may benefit from non-clustered indexes that cover just the projected columns in each case.
Of course this depends on the actual schema of your data model and the exact queries, but this is just a heads up so you don't bring the imperative thinking mind frame of procedural programming to the declarative world of SQL.
Try something like
SELECT
CASE var
WHEN xyz THEN col1
WHEN zyx THEN col2
ELSE col7
END AS col1,
...
In other words, use a conditional expression to select the value, then rename the column.
Alternately, you could build up some sort of dynamic SQL hack to share the query tail; I've done this with iBatis before.
Simple CASE expression:
CASE input_expression
WHEN when_expression THEN result_expression [ ...n ]
[ ELSE else_result_expression ]
END
Searched CASE expression:
CASE
WHEN Boolean_expression THEN result_expression [ ...n ]
[ ELSE else_result_expression ]
END
Reference: http://msdn.microsoft.com/en-us/library/ms181765.aspx
CASE is the answer, but you will need to have a separate case statement for each column you want returned. As long as the WHERE clause is the same, there won't be much benefit separating it out into multiple queries.
Example:
SELECT
CASE @var
WHEN 'xyz' THEN col1
WHEN 'zyx' THEN col2
ELSE col7
END,
CASE @var
WHEN 'xyz' THEN col2
WHEN 'zyx' THEN col3
ELSE col8
END
FROM Table
...
The most obvious solutions are already listed. Depending on where the query sits (i.e. in application code), you can't always use IF statements, and the inline CASE expressions can get painful when lots of columns become conditional.
Assuming Col1, Col3 and Col7 are the same type, and likewise Col2, Col4 and Col8, you can do this:
SELECT Col1, Col2 FROM tbl WHERE @Var LIKE 'xyz'
UNION ALL
SELECT Col3, Col4 FROM tbl WHERE @Var LIKE 'zyx'
UNION ALL
SELECT Col7, Col8 FROM tbl WHERE @Var NOT LIKE 'xyz' AND @Var NOT LIKE 'zyx'
As this is a single command, there are several performance benefits with regard to plan caching. Also, the Query Optimiser will quickly eliminate those statements where @Var doesn't match the appropriate value, without touching the storage engine.