Faster way of doing multiple checks on one dataset - sql

Is there a better way to rewrite the following:
SELECT DISTINCT Column1, 'Testing #1'
FROM MyTable
WHERE Column2 IS NOT NULL && Column3='Value'
UNION ALL
SELECT DISTINCT Column1, 'Testing #2'
FROM MyTable
WHERE Column3 IS NULL && Column2='Test Value'
UNION ALL
SELECT DISTINCT Column1, 'Testing #3'
FROM MyTable
Where ....
I have about 35 UNION ALL branches that all query the same table. I was wondering if there's an easier/faster way to do this.

Yes, you can rewrite it with a CASE expression like this:
SELECT Column1,
CASE WHEN Column2 IS NOT NULL AND Column3='Value' THEN 'Testing #1'
WHEN Column3 IS NULL AND Column2='Test Value' THEN 'Testing #2'
ELSE 'Testing #3' END as customcol
FROM MyTable
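A quick way to sanity-check this single-pass rewrite is to run it against a toy table. This sketch uses SQLite via Python; the table layout matches the question, but the sample rows are invented for illustration:

```python
import sqlite3

# Hypothetical data; table and column names are taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    ("a", "x", "Value"),        # matches the 'Testing #1' criteria
    ("b", "Test Value", None),  # matches the 'Testing #2' criteria
    ("c", None, None),          # matches neither -> falls into ELSE
])

rows = conn.execute("""
    SELECT Column1,
           CASE WHEN Column2 IS NOT NULL AND Column3 = 'Value' THEN 'Testing #1'
                WHEN Column3 IS NULL AND Column2 = 'Test Value' THEN 'Testing #2'
                ELSE 'Testing #3'
           END AS customcol
    FROM MyTable
    ORDER BY Column1
""").fetchall()
print(rows)  # [('a', 'Testing #1'), ('b', 'Testing #2'), ('c', 'Testing #3')]
```

Note that this classifies each row exactly once, which is where it differs from the original UNION ALL version (see the edit below).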
EDIT: OK, I am making this edit because, according to your comment, there are two issues we need to address. (I am leaving the original answer as it is in case it might help somebody.)
1) Result set should be filtered and there should be no else part.
This is actually achievable with the solution above, since the ELSE is optional and the data can be filtered with a WHERE clause at the end.
2) Being able to select the same row multiple times with different Testing # values if it matches the criteria.
This, however, is not achievable with my previous solution, so I thought of a different one. Hope it fits your case. Here it is:
S1 - Create a new table with Testing # values(Testing #1, Testing #2, Testing #3 etc.). Let's say this table is named Testing.
S2 - JOIN your main table (MyTable) with Testing table which contains Testing # values. So now you have every possible combination of real-data and testing values.
S3 - Filter the results you don't want to appear with a where clause.
S4 - Filter the real-data <-> testing combinations with an addition to the WHERE clause.
The end query should look something like this:
SELECT M.Column1, T.TestingValue
FROM MyTable M
INNER JOIN Testing T ON 1=1
WHERE
(
(M.Column2 IS NOT NULL AND M.Column3='Value' AND T.TestingValue='Testing #1') OR
(M.Column3 IS NULL AND M.Column2='Test Value' AND T.TestingValue='Testing #2') OR
<conditions for other testing values>
)
AND
<other conditions>
I think this should work and produce the results you want. But since I don't have the data, I am not able to run any benchmarks against the union-based solution, so I have no hard evidence that this is faster; it is an option. You can test both and use the better one.
It might be a little late, but I hope this solves your problem.
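Steps S1-S4 can be sketched end-to-end with SQLite via Python. The Testing table and sample rows are invented; the two criteria are the ones from the question:

```python
import sqlite3

# Hypothetical data for the JOIN-against-a-Testing-table approach.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (Column1 TEXT, Column2 TEXT, Column3 TEXT);
    CREATE TABLE Testing (TestingValue TEXT);
    INSERT INTO Testing VALUES ('Testing #1'), ('Testing #2');
    INSERT INTO MyTable VALUES ('a', 'Test Value', NULL);  -- should match Testing #2
    INSERT INTO MyTable VALUES ('b', 'x', 'Value');        -- should match Testing #1
""")

rows = conn.execute("""
    SELECT M.Column1, T.TestingValue
    FROM MyTable M
    INNER JOIN Testing T ON 1=1  -- cross join: every row paired with every testing value
    WHERE (M.Column2 IS NOT NULL AND M.Column3 = 'Value' AND T.TestingValue = 'Testing #1')
       OR (M.Column3 IS NULL AND M.Column2 = 'Test Value' AND T.TestingValue = 'Testing #2')
    ORDER BY M.Column1
""").fetchall()
print(rows)  # [('a', 'Testing #2'), ('b', 'Testing #1')]
```

Because each row is paired with every testing value before filtering, a row that satisfied two different criteria would appear twice, once per matching TestingValue.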

You can do this in one statement, but you want a different column for each test:
select column1,
(case when column2 is not null and column3 = 'Value' then 1 else 0 end) as Test1,
(case when column3 is null and column2 = 'Test Value' then 1 else 0 end) as Test2,
. . .
from t;
Because you only want cases where things fail, you can put this in a subquery and test for any failure:
select *
from (select column1,
(case when column2 is not null and column3 = 'Value' then 1 else 0 end) as Test1,
(case when column3 is null and column2 = 'Test Value' then 1 else 0 end) as Test2,
. . .
from t
) t
where test1 + test2 + . . . > 0
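The subquery-plus-filter version can be tried the same way with SQLite via Python. The sample data is invented; note the Test2 flag checks column2 = 'Test Value', matching the question's second criterion:

```python
import sqlite3

# Hypothetical data for the one-flag-column-per-test approach.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", "x", "Value"),        # trips Test1
    ("b", "Test Value", None),  # trips Test2
    ("c", None, "other"),       # trips neither -> filtered out
])

rows = conn.execute("""
    SELECT *
    FROM (SELECT column1,
                 (CASE WHEN column2 IS NOT NULL AND column3 = 'Value' THEN 1 ELSE 0 END) AS test1,
                 (CASE WHEN column3 IS NULL AND column2 = 'Test Value' THEN 1 ELSE 0 END) AS test2
          FROM t) t
    WHERE test1 + test2 > 0
    ORDER BY column1
""").fetchall()
print(rows)  # [('a', 1, 0), ('b', 0, 1)]
```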

SELECT from CTEs which might be null/undefined

Inside a function/stored procedure in Postgres 9.6 I want to grab data from two different tables using one CTE for each table like so:
WITH "CTE_from_table1" AS (SELECT column1, column2 FROM table1 WHERE id = $1),
"CTE_from_table2" AS (SELECT column1, column2 FROM table2 WHERE id = $2)
SELECT
COALESCE(
"CTE_from_table1".column1,
"CTE_from_table2".column1,
CASE WHEN "CTE_from_table1".column2 = 42 OR "CTE_from_table2".column2 = 42
THEN 'Something is 42' ELSE 'something else!' END
)
FROM "CTE_from_table1","CTE_from_table2";
(Data type of column1 and column2 are resp. identical for both tables: column1 being a text, column2 an integer.)
That works as long as both CTEs are defined. The problem is: The parameters $1 and/or $2 could be null or could contain IDs which are simply not there. In this case, I expect the result something else!, because the first two COALESCE parameters evaluate to null and the third, being the CASE WHEN, should go to its ELSE which would return something else!.
That's my theory. However, in practice I get null as soon as one of the CTEs is undefined/null. What can I do about that?
Your problem is the dreaded comma in the FROM clause. Simple rule: never use commas in the FROM clause. In this case, you want an "outer cross join". The comma does an "inner cross join", so no rows are returned if either CTE has no rows.
Unfortunately, OUTER CROSS JOIN doesn't exist, so you can make do with FULL OUTER JOIN:
WITH "CTE_from_table1" AS (SELECT column1, column2 FROM table1 WHERE id = $1),
"CTE_from_table2" AS (SELECT column1, column2 FROM table2 WHERE id = $2)
SELECT COALESCE(ct1.column1, ct2.column1,
CASE WHEN 42 IN (ct1.column2, ct2.column2)
THEN 'Something is 42'
ELSE 'something else!'
END
)
FROM "CTE_from_table1" ct1 FULL OUTER JOIN
"CTE_from_table2" ct2
ON 1=1;
I'm not a big fan of mixing CASE and COALESCE(), so I'd be inclined to write:
SELECT (CASE WHEN ct1.column1 IS NOT NULL THEN ct1.column1
WHEN ct2.column1 IS NOT NULL THEN ct2.column1
WHEN 42 IN (ct1.column2, ct2.column2) THEN 'Something is 42'
ELSE 'something else!'
END)
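The failure mode the answer describes is easy to reproduce with SQLite via Python (toy tables, invented ids): when one CTE is empty, the comma-style cross join returns nothing, so COALESCE never even runs:

```python
import sqlite3

# Hypothetical tables; table2 is left empty to simulate an id that matches nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, column1 TEXT, column2 INTEGER);
    CREATE TABLE table2 (id INTEGER, column1 TEXT, column2 INTEGER);
    INSERT INTO table1 VALUES (1, 'hello', 42);
""")

rows = conn.execute("""
    WITH cte1 AS (SELECT column1, column2 FROM table1 WHERE id = 1),
         cte2 AS (SELECT column1, column2 FROM table2 WHERE id = 2)
    SELECT cte1.column1 FROM cte1, cte2
""").fetchall()
print(rows)  # [] -- the implicit inner cross join drops every row when one side is empty
```

The FULL OUTER JOIN rewrite in the answer keeps the row from the non-empty side, which is what lets the COALESCE/CASE fallback fire.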

Joining two datasets with subqueries

I am attempting to join two large datasets using BigQuery. They have a common field; however, the common field has a different name in each dataset.
I want to count number of rows and sum the results of my case logic for both table1 and table2.
I believe that I have errors resulting from subquery (subselect?) and syntax errors. I have tried to apply precedent from similar posts but I still seem to be missing something. Any assistance in getting this sorted is greatly appreciated.
SELECT
table1.field1,
table1.field2,
(
SELECT COUNT (*)
FROM table1) AS table1_total,
sum(case when table1.mutually_exclusive_metric1 = "Y" then 1 else 0 end) AS t1_pass_1,
sum(case when table1.mutually_exclusive_metric1 = "Y" AND table1.mutually_exclusive_metric2 IS null OR table1.mutually_exclusive_metric3 = 'Y' then 1 else 0 end) AS t1_pass_2,
sum(case when table1.mutually_exclusive_metric3 ="Y" AND table1.mutually_exclusive_metric2 ="Y" AND table1.mutually_exclusive_metric3 ="Y" then 1 else 0 end) AS t1_pass_3,
(
SELECT COUNT (*)
FROM table2) AS table2_total,
sum(case when table2.metric1 IS true then 1 else 0 end) AS t2_pass_1,
sum(case when table2.metric2 IS true then 1 else 0 end) AS t2_pass_2,
(
SELECT COUNT (*)
FROM dataset1.table1 JOIN EACH dataset2.table2 ON common_field_table1 = common_field_table2) AS overlap
FROM
dataset1.table1,
dataset2.table2
WHERE
XYZ
Thanks in advance!
Sho. Let's take this one step at a time:
1) Using * is not explicit, and being explicit is good. Additionally, mixing explicit selects with * will duplicate columns under auto-renames: table1.field will become table1_field. Unless you are just playing around, don't use *.
2) You never joined. A query with a join looks like this (note order of WHERE and GROUP statements, note naming of each):
SELECT
t1.field1 AS field1,
t2.field2 AS field2
FROM dataset1.table1 AS t1
JOIN dataset2.table2 AS t2
ON t1.field1 = t2.field1
WHERE t1.field1 = "some value"
GROUP BY field1, field2
where t1.field1 and t2.field1 contain corresponding values. You wouldn't repeat those in the select.
3) Use whitespace to make your code easier to read. It helps everyone involved, including you.
4) Your subselects are pretty useless. A subselect is used instead of creating a new table; you would use one to group or filter data from an existing table. For example:
SELECT
subselect.field1 AS ssf1,
subselect.max_f1 AS ss_max_f1
FROM (
SELECT
t1.field1 AS field1,
MAX(t1.field1) AS max_f1
FROM dataset1.table1 AS t1
GROUP BY field1
) AS subselect
The subselect is practically a new table that you select from. Treat it logically like it happens first, and you take the results from that and use it in your main select.
5) This was a terrible question. It didn't even look like you tried to figure things out one step at a time.

Case when statement in SQL

I am using the following query, and I want to apply the WHERE clause based on a passed parameter. The issue is that the condition is effectively 'Column1 IS NULL if parameterVal = 'I', else Column1 IS NOT NULL'.
I've built a query like this:
SELECT * FROM MASTER
WHERE
Column1 IS (CASE WHEN :Filter = 'I' THEN 'NULL' ELSE 'NOT NULL' END)
but it's not working. Help me solve this.
UPDATE
Updating question to elaborate question more clearly.
I've one table MASTER. Now I am passing one parameter in query that is Filter (indicated by :Filter in query).
Now, when the Filter parameter's value is 'I', it should return the following result:
SELECT * FROM MASTER WHERE Column1 IS NULL
but if the passed argument is not equal to 'I', then:
SELECT * FROM MASTER WHERE Column1 IS NOT NULL
SELECT * FROM MASTER
WHERE (Filter = 'I' AND Column1 IS NULL)
OR
(Filter <> 'I' AND Column1 IS NOT NULL)
If you really insist on using a CASE the SELECT could be rewritten as:
SELECT *
FROM MASTER
WHERE CASE
WHEN COLUMN1 IS NULL AND FILTER = 'I' THEN 1
WHEN COLUMN1 IS NOT NULL AND FILTER <> 'I' THEN 1
ELSE 0
END = 1
SQLFiddle here
Frankly, though, I think that this is very difficult to interpret, and I suggest that @MAli's version is better.
Your CASE has an assignment, not an equality check.
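The OR-based filter translates directly to a parameterized query. A sketch with SQLite via Python, using an invented two-row MASTER table:

```python
import sqlite3

# Hypothetical MASTER table: one non-NULL row, one NULL row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MASTER (Column1 TEXT)")
conn.executemany("INSERT INTO MASTER VALUES (?)", [("a",), (None,)])

def fetch(filter_value):
    # :Filter plays the role of the passed parameter in the question.
    return conn.execute("""
        SELECT * FROM MASTER
        WHERE (:Filter = 'I' AND Column1 IS NULL)
           OR (:Filter <> 'I' AND Column1 IS NOT NULL)
    """, {"Filter": filter_value}).fetchall()

print(fetch("I"))  # [(None,)] -- only the NULL row
print(fetch("X"))  # [('a',)]  -- only the non-NULL row
```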

Better use of redundant subquery in sql- case

I have a question about SQL statements: Is it possible to "define a subquery" for multiple uses in a CASE? It sounds a bit confusing, but with the following example I think it is clear what I have in mind:
select
Column1,
Column2,
Case
WHEN <BigSubquery> > 0 THEN <BigSubquery>
ELSE 0
END
from ...
How can I do this, or what can I use? I have such a query and it works wonderfully, but it is a huge amount of code and not maintainable.
If you are using a subquery, you should put the condition in the subquery. For instance, if you have:
(select sum(x) from . . . )
Then do:
(select (case when sum(x) > 0 then sum(x) else 0 end) from . . . )
If you rewrite your query as
select
Column1,
Column2,
Case
WHEN Column3 > 0 THEN Column3
ELSE 0
END
from
(
select
Column1,
Column2,
BigSubquery as Column3
from ...
)
t
Then you avoid duplicating "BigSubquery", but you do duplicate the select list.
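The derived-table pattern can be demonstrated with SQLite via Python; here a small correlated subquery stands in for BigSubquery, and every table/column name besides Column1/Column2/Column3 is invented:

```python
import sqlite3

# Hypothetical tables: stand-ins for the poster's main table and BigSubquery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_t (Column1 TEXT, Column2 TEXT, k INTEGER);
    CREATE TABLE big_t (k INTEGER, v INTEGER);
    INSERT INTO main_t VALUES ('a', 'x', 1), ('b', 'y', 2);
    INSERT INTO big_t VALUES (1, 5), (2, -3);
""")

rows = conn.execute("""
    SELECT Column1, Column2,
           CASE WHEN Column3 > 0 THEN Column3 ELSE 0 END AS Column3
    FROM (SELECT Column1, Column2,
                 (SELECT v FROM big_t WHERE big_t.k = main_t.k) AS Column3  -- "BigSubquery", written once
          FROM main_t) t
    ORDER BY Column1
""").fetchall()
print(rows)  # [('a', 'x', 5), ('b', 'y', 0)]
```

The inner select evaluates the expensive expression exactly once per row; the outer CASE then refers to it by alias.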

Best way to write union query when dealing with NULL and Empty String values

I have to write a query that performs a union between two tables with similar data. The results need to be distinct. The problem I have is that some fields that should be the same are not when it comes to empty values. Some are indicated as null, and some have empty string values. My question is, is there a better way to perform the following query? (without fixing the actual data to ensure proper defaults are set, etc) Will using the Case When be a big performance hit?
Select
Case When Column1 = '' Then NULL Else Column1 End as [Column1],
Case When Column2 = '' Then NULL Else Column2 End as [Column2]
From TableA
UNION ALL
Select
Case When Column1 = '' Then NULL Else Column1 End as [Column1],
Case When Column2 = '' Then NULL Else Column2 End as [Column2]
From TableB
I don't think it would make any difference in performance, but NULLIF is another way to write this and, IMHO, looks a little cleaner.
Select
NULLIF(Column1, '') as [Column1],
NULLIF(Column2, '') as [Column2]
From TableA
UNION
Select
NULLIF(Column1, '') as [Column1],
NULLIF(Column2, '') as [Column2]
From TableB
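A runnable sanity check of the NULLIF-plus-UNION approach with SQLite via Python. The sample rows are invented: one table stores the empty string, the other NULL, and UNION collapses them once normalized:

```python
import sqlite3

# Hypothetical data: the "same" row, stored inconsistently across the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Column1 TEXT, Column2 TEXT);
    CREATE TABLE TableB (Column1 TEXT, Column2 TEXT);
    INSERT INTO TableA VALUES ('a', '');    -- empty string here...
    INSERT INTO TableB VALUES ('a', NULL);  -- ...NULL here: logically the same row
""")

rows = conn.execute("""
    SELECT NULLIF(Column1, '') AS Column1, NULLIF(Column2, '') AS Column2 FROM TableA
    UNION
    SELECT NULLIF(Column1, '') AS Column1, NULLIF(Column2, '') AS Column2 FROM TableB
""").fetchall()
print(rows)  # [('a', None)] -- the two rows collapse into one
```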
Use UNION to remove duplicates - it's slower than UNION ALL, but removing duplicates is exactly what you need here:
SELECT CASE
WHEN LEN(LTRIM(RTRIM(column1))) = 0 THEN NULL
ELSE column1
END AS column1,
CASE
WHEN LEN(LTRIM(RTRIM(column2))) = 0 THEN NULL
ELSE column2
END AS column2
FROM TableA
UNION
SELECT CASE
WHEN LEN(LTRIM(RTRIM(column1))) = 0 THEN NULL
ELSE column1
END,
CASE
WHEN LEN(LTRIM(RTRIM(column2))) = 0 THEN NULL
ELSE column2
END
FROM TableB
I changed the logic to return NULL if the column value contains any number of spaces and no actual content.
CASE expressions are ANSI, and more customizable than NULLIF/etc syntax.
A Case should perform fine, but IsNull is more natural in this situation. And if you're searching for distinct rows, doing a union instead of a union all will accomplish that (thanks to Jeffrey L Whitledge for pointing this out):
select IsNull(col1, '')
, IsNull(col2, '')
from TableA
union
select IsNull(col1, '')
, IsNull(col2, '')
from TableB
You can keep your manipulation operations separate from the union if you do whatever manipulation you want (substituting NULL for the empty string) in a separate view per table, then union the views.
You shouldn't have to apply the same manipulation to both sets, though.
If that's the case, union them first, then apply the manipulation once to the resulting unioned set.
That's half as much manipulation code to support.