SELECT from CTEs which might be null/undefined - sql

Inside a function/stored procedure in Postgres 9.6 I want to grab data from two different tables using one CTE for each table like so:
WITH "CTE_from_table1" AS (SELECT column1, column2 FROM table1 WHERE id = $1),
"CTE_from_table2" AS (SELECT column1, column2 FROM table2 WHERE id = $2)
SELECT
COALESCE(
"CTE_from_table1".column1,
"CTE_from_table2".column1,
CASE WHEN "CTE_from_table1".column2 = 42 OR "CTE_from_table2".column2 = 42
THEN 'Something is 42' ELSE 'something else!' END
)
FROM "CTE_from_table1","CTE_from_table2";
(Data type of column1 and column2 are resp. identical for both tables: column1 being a text, column2 an integer.)
That works as long as both CTEs are defined. The problem is: The parameters $1 and/or $2 could be null or could contain IDs which are simply not there. In this case, I expect the result something else!, because the first two COALESCE parameters evaluate to null and the third, being the CASE WHEN, should go to its ELSE which would return something else!.
That's my theory. However, in practice I get null as soon as one of the CTEs is undefined/null. What can I do about that?

Your problem is the dreaded comma in the FROM clause. Simple rule . . . Never use commas in the FROM clause. In this case, you want an "outer cross join". The comma does an "inner cross join", so no rows are returned if either CTE has no rows.
Unfortunately, OUTER CROSS JOIN doesn't exist, so you can make do with FULL OUTER JOIN:
WITH "CTE_from_table1" AS (SELECT column1, column2 FROM table1 WHERE id = $1),
"CTE_from_table2" AS (SELECT column1, column2 FROM table2 WHERE id = $2)
SELECT COALESCE(ct1.column1, ct2.column1,
CASE WHEN 42 IN (ct1.column2, ct2.column2)
THEN 'Something is 42'
ELSE 'something else!'
END
)
FROM "CTE_from_table1" ct1 FULL OUTER JOIN
"CTE_from_table2" ct2
ON 1=1;
I'm not a big fan of mixing CASE and COALESCE(), so I'd be inclined to write:
SELECT (CASE WHEN ct1.column1 IS NOT NULL THEN ct1.column1
WHEN ct2.column1 IS NOT NULL THEN ct2.column1
WHEN 42 IN (ct1.column2, ct2.column2) THEN 'Something is 42'
ELSE 'something else!'
END)
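To make the failure mode concrete, here is a minimal sketch that runs against SQLite through Python's sqlite3 module rather than Postgres. The sample rows, the `run` helper and the one-row `anchor` subquery are assumptions for illustration only. The anchor with two LEFT JOINs is a swapped-in stand-in for the FULL OUTER JOIN above (older SQLite builds have no FULL OUTER JOIN), and it behaves differently in one case worth knowing about: a FULL OUTER JOIN of two empty CTEs still returns no rows, while the anchored form always returns exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, column1 TEXT, column2 INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, column1 TEXT, column2 INTEGER);
    INSERT INTO table1 VALUES (1, NULL, 42);
    INSERT INTO table2 VALUES (1, NULL, 7);
""")

# Shared CTEs and select list; only the FROM clause differs below.
CTES = """
    WITH ct1 AS (SELECT column1, column2 FROM table1 WHERE id = ?),
         ct2 AS (SELECT column1, column2 FROM table2 WHERE id = ?)
    SELECT COALESCE(ct1.column1, ct2.column1,
                    CASE WHEN 42 IN (ct1.column2, ct2.column2)
                         THEN 'Something is 42'
                         ELSE 'something else!'
                    END)
"""

# Comma join: an inner cross join, so an empty CTE wipes out the result.
COMMA_JOIN = CTES + " FROM ct1, ct2"

# Hypothetical workaround: a one-row anchor keeps a row alive even when
# one or both CTEs are empty.
ANCHORED = CTES + (
    " FROM (SELECT 1) AS anchor"
    " LEFT JOIN ct1 ON 1 = 1"
    " LEFT JOIN ct2 ON 1 = 1"
)

def run(sql, id1, id2):
    """Return the single output column as a list of values."""
    return [row[0] for row in conn.execute(sql, (id1, id2))]

print(run(COMMA_JOIN, 1, 1))    # both ids exist
print(run(COMMA_JOIN, 1, 999))  # second id missing: no rows at all
print(run(ANCHORED, 1, 999))    # still one row
print(run(ANCHORED, None, 999)) # both CTEs empty: falls through to ELSE
```

The "null" seen in the question is most likely the function returning nothing because the query produced zero rows, not COALESCE evaluating to null.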

Related

Select statement subquery, multiple conditions

I am trying to create a query to select a certain condition then within that condition select two other conditions.
Breaking it down.
SELECT condition 1 FROM column 2, if this condition is not met return nothing.
SELECT condition 2 FROM column 3, SELECT condition 3 FROM column 4, if either of these two conditions are met return the respective column value from that rows value.
My feeble attempt which gives an obvious syntax error,
SELECT Column_1
FROM Data_TBL
WHERE Column_2 = 'Condition_1'
GROUP BY(WHERE Column_3 = 'Condition_2' OR Column_4 = 'Condition_3')
ORDER BY Column_1 ASC
Still very new to SQL statements and I am struggling with the syntax.
I think you just need a where clause. For the filtering:
select t.*
from data_tbl t
where (column2 = 'Condition_1') and
(column3 = 'Condition_2' or column4 = 'Condition_3');
I'm not sure what you want to return when both column3 and column4 meet the respective conditions, but I think this is what you want:
select (case when column3 = 'Condition_2' then column3 else column4 end)
from data_tbl t
where (column2 = 'Condition_1') and
(column3 = 'Condition_2' or column4 = 'Condition_3');
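For anyone who wants to try the second query, here is a small sketch using Python's sqlite3 module. The sample rows and the extra `column1` in the select list are assumptions added so the output is easy to read; the WHERE clause and the CASE expression are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_tbl (column1 INTEGER, column2 TEXT, column3 TEXT, column4 TEXT);
    INSERT INTO data_tbl VALUES
        (1, 'Condition_1', 'Condition_2', 'other'),
        (2, 'Condition_1', 'other',       'Condition_3'),
        (3, 'Condition_1', 'other',       'other'),
        (4, 'something',   'Condition_2', 'Condition_3');
""")

# Row 3 fails the inner OR, row 4 fails the outer condition on column2.
rows = conn.execute("""
    SELECT column1,
           CASE WHEN column3 = 'Condition_2' THEN column3 ELSE column4 END
    FROM data_tbl
    WHERE column2 = 'Condition_1'
      AND (column3 = 'Condition_2' OR column4 = 'Condition_3')
    ORDER BY column1
""").fetchall()

print(rows)
```

When both column3 and column4 match, the CASE as written prefers column3, since its branch is tested first.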

Joining two datasets with subqueries

I am attempting to join two large datasets using BigQuery. they have a common field, however the common field has a different name in each dataset.
I want to count number of rows and sum the results of my case logic for both table1 and table2.
I believe that I have errors resulting from subquery (subselect?) and syntax errors. I have tried to apply precedent from similar posts but I still seem to be missing something. Any assistance in getting this sorted is greatly appreciated.
SELECT
table1.field1,
table1.field2,
(
SELECT COUNT (*)
FROM table1) AS table1_total,
sum(case when table1.mutually_exclusive_metric1 = "Y" then 1 else 0 end) AS t1_pass_1,
sum(case when table1.mutually_exclusive_metric1 = "Y" AND table1.mutually_exclusive_metric2 IS null OR table1.mutually_exclusive_metric3 = 'Y' then 1 else 0 end) AS t1_pass_2,
sum(case when table1.mutually_exclusive_metric3 ="Y" AND table1.mutually_exclusive_metric2 ="Y" AND table1.mutually_exclusive_metric3 ="Y" then 1 else 0 end) AS t1_pass_3,
(
SELECT COUNT (*)
FROM table2) AS table2_total,
sum(case when table2.metric1 IS true then 1 else 0 end) AS t2_pass_1,
sum(case when table2.metric2 IS true then 1 else 0 end) AS t2_pass_2,
(
SELECT COUNT (*)
FROM dataset1.table1 JOIN EACH dataset2.table2 ON common_field_table1 = common_field_table2) AS overlap
FROM
dataset1.table1,
dataset2.table2
WHERE
XYZ
Thanks in advance!
So, let's take this one step at a time:
1) Using * is not explicit, and being explicit is good. Additionally, stating explicit selects and * will duplicate selects with autorenames. table1.field will become table1_field. Unless you are just playing around, don't use *.
2) You never joined. A query with a join looks like this (note order of WHERE and GROUP statements, note naming of each):
SELECT
t1.field1 AS field1,
t2.field2 AS field2
FROM dataset1.table1 AS t1
JOIN dataset2.table2 AS t2
ON t1.field1 = t2.field1
WHERE t1.field1 = "some value"
GROUP BY field1, field2
Where t1.f1 = t2.f1 contain corresponding values. You wouldn't repeat those in the select.
3) Use whitespace to make your code easier to read. It helps everyone involved, including you.
4) Your subselects are pretty useless. A subselect is used instead of creating a new table. For example, you would use a subselect to group or filter out data from an existing table. For example:
SELECT
subselect.field1 AS ssf1,
subselect.max_f1 AS ss_max_f1
FROM (
SELECT
t1.field1 AS field1,
MAX(t1.field1) AS max_f1
FROM dataset1.table1 AS t1
GROUP BY field1
) AS subselect
The subselect is practically a new table that you select from. Treat it logically like it happens first, and you take the results from that and use it in your main select.
5) This was a terrible question. It didn't even look like you tried to figure things out one step at a time.
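Putting points 2 and 4 together, one way to get the totals and the overlap the question asks for is to compute each figure in its own subselect and then combine the one-row results. The sketch below runs on SQLite via Python's sqlite3 module, not BigQuery, and the sample rows and the simplified `metric1` columns are assumptions; only the differently named join fields come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (common_field_table1 INTEGER, metric1 TEXT);
    CREATE TABLE table2 (common_field_table2 INTEGER, metric1 INTEGER);
    INSERT INTO table1 VALUES (1, 'Y'), (2, 'N'), (3, 'Y');
    INSERT INTO table2 VALUES (2, 1), (3, 0), (4, 1), (5, 1);
""")

# Each subselect aggregates to exactly one row, so cross joining them
# yields a single summary row without multiplying any counts.
row = conn.execute("""
    SELECT s1.table1_total, s1.t1_pass_1,
           s2.table2_total, s2.t2_pass_1,
           ov.overlap
    FROM (SELECT COUNT(*) AS table1_total,
                 SUM(CASE WHEN metric1 = 'Y' THEN 1 ELSE 0 END) AS t1_pass_1
          FROM table1) AS s1
    CROSS JOIN
         (SELECT COUNT(*) AS table2_total,
                 SUM(CASE WHEN metric1 = 1 THEN 1 ELSE 0 END) AS t2_pass_1
          FROM table2) AS s2
    CROSS JOIN
         (SELECT COUNT(*) AS overlap
          FROM table1 AS t1
          JOIN table2 AS t2
            ON t1.common_field_table1 = t2.common_field_table2) AS ov
""").fetchone()

print(row)
```

Aggregating before combining matters here: summing over the comma-joined tables, as in the question, would count every row of one table once per row of the other.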

Faster way of doing multiple checks on one dataset

Is there a better way to rewrite the following:
SELECT DISTINCT Column1, 'Testing #1'
FROM MyTable
WHERE Column2 IS NOT NULL && Column3='Value'
UNION ALL
SELECT DISTINCT Column1, 'Testing #2'
FROM MyTable
WHERE Column3 IS NULL && Column2='Test Value'
UNION ALL
SELECT DISTINCT Column1, 'Testing #3'
FROM MyTable
Where ....
I have about 35 UNION ALL statements that all query the same table. I was wondering if there's an easier/faster way to do things.
Yes, you can rewrite it with case statements like this
SELECT Column1,
CASE WHEN Column2 IS NOT NULL AND Column3='Value' THEN 'Testing #1'
WHEN Column3 IS NULL AND Column2='Test Value' THEN 'Testing #2'
ELSE 'Testing #3' END as customcol
FROM MyTable
EDIT: OK, I am making this edit because, according to your comment, there are two issues we need to address. (I am leaving the original answer as it is in case it helps somebody.)
1) Result set should be filtered and there should be no else part.
This is actually achievable with this solution since else is optional and data can be filtered with a where clause at the end.
2) Being able to select the same row multiple times with different Testing # values if it matches the criteria.
This, however, is not achievable with my previous solution, so I thought of a different one. I hope it fits your case. Here it is:
S1 - Create a new table with Testing # values(Testing #1, Testing #2, Testing #3 etc.). Let's say this table is named Testing.
S2 - JOIN your main table (MyTable) with Testing table which contains Testing # values. So now you have every possible combination of real-data and testing values.
S3 - Filter the results you don't want to appear with a where clause.
S4 - Filter the real-data <-> testing combinations with an addition to where clause.
End query should look something like this :
SELECT M.Column1, T.TestingValue
FROM MyTable M
INNER JOIN Testing T ON 1=1
WHERE
(
(M.Column2 IS NOT NULL AND M.Column3='Value' AND T.TestingValue='Testing #1') OR
(M.Column3 IS NULL AND M.Column2='Test Value' AND T.TestingValue='Testing #2') OR
<conditions for other testing values>
)
AND
<other conditions>
I think this should work and produce the results you want. Since I don't have the data, I can't benchmark it against the union-based solution, so I have no evidence that it is faster, but it is an option. You can test both and use the better one.
It might be a little late but hope this solves your problem.
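Steps S1 to S4 can be sketched end to end like this, using Python's sqlite3 module. The sample rows are made up, and the third test ('Testing #3', meaning "Column2 is not null") is a hypothetical condition added only so that one row can match more than one test; the original conditions for the other tests were not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (Column1 TEXT, Column2 TEXT, Column3 TEXT);
    INSERT INTO MyTable VALUES
        ('a', 'x',          'Value'),
        ('b', 'Test Value', NULL),
        ('c', NULL,         'other');

    -- S1: one row per test label.
    CREATE TABLE Testing (TestingValue TEXT);
    INSERT INTO Testing VALUES ('Testing #1'), ('Testing #2'), ('Testing #3');
""")

# S2: pair every data row with every label.
# S3/S4: keep a pair only when the row satisfies that label's condition.
rows = conn.execute("""
    SELECT DISTINCT M.Column1, T.TestingValue
    FROM MyTable M
    CROSS JOIN Testing T
    WHERE (M.Column2 IS NOT NULL AND M.Column3 = 'Value'
           AND T.TestingValue = 'Testing #1')
       OR (M.Column3 IS NULL AND M.Column2 = 'Test Value'
           AND T.TestingValue = 'Testing #2')
       OR (M.Column2 IS NOT NULL            -- hypothetical third test
           AND T.TestingValue = 'Testing #3')
    ORDER BY M.Column1, T.TestingValue
""").fetchall()

for r in rows:
    print(r)
```

Rows 'a' and 'b' each appear twice, once per test they match, which is the behaviour the single CASE expression could not give.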
You can do this in one statement, but you want a different column for each test:
select column1,
(case when column2 is not null and column3 = 'Value' then 1 else 0
end) as Test1,
(case when column3 is null and column2 = 'Test Value' then 1 else 0
end) as Test2,
. . .
from t;
Because you only want cases where things fail, you can put this in a subquery and test for any failure:
select *
from (select column1,
(case when column2 is not null and column3 = 'Value' then 1 else 0
end) as Test1,
(case when column3 is null and column2 = 'Test Value' then 1 else 0
end) as Test2,
. . .
from t
) t
where test1 + test2 + . . . > 0
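Here is a runnable sketch of the flag-column version with just the two tests spelled out in the question, using Python's sqlite3 module. The table contents are assumed sample data, and `flagged` is simply an alias chosen for the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (column1 TEXT, column2 TEXT, column3 TEXT);
    INSERT INTO t VALUES
        ('a', 'x',          'Value'),
        ('b', 'Test Value', NULL),
        ('c', NULL,         'other');
""")

# One pass over the table: each test becomes a 0/1 column, and the outer
# query keeps only rows where at least one test fired.
rows = conn.execute("""
    SELECT *
    FROM (SELECT column1,
                 CASE WHEN column2 IS NOT NULL AND column3 = 'Value'
                      THEN 1 ELSE 0 END AS test1,
                 CASE WHEN column3 IS NULL AND column2 = 'Test Value'
                      THEN 1 ELSE 0 END AS test2
          FROM t) AS flagged
    WHERE test1 + test2 > 0
    ORDER BY column1
""").fetchall()

print(rows)
```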

Selecting filtered rows with SQL

I am constructing an SQL statement with some parameters. Finally, an SQL statement is created like
"select * from table where column1 = "xyz"".
But I also need the rows which are filtered with this statement. In this case they're rows which are not "xyz" valued in column1. More specifically, I am looking for something like INVERSE(select * from table where ...). Is it possible?
Edit: My bad, I know I can do it with the != or <> operator. The case here is that the select statement may be more complex (with some ANDs and equality or greater-than operators). Let's assume a table has rows A, B, C and my SQL statement returns only A. I need B and C, while all I have is the statement that returns A.
select * from table where column1 != 'xyz' or column1 is null;
If you want the other ones, do it like this:
select * from table where column1 <> "xyz"
column1 <> (differs from) "xyz"
To check whether something is not equal, you can use <> or even !=
SELECT *
FROM yourTable
WHERE column1 <> 'xyz'
OR
SELECT *
FROM yourTable
WHERE column1 != 'xyz'
Many database vendors support (see list) both versions of the syntax.
If you're retrieving both result sets at about the same time, and just want to process the xyz ones first, you could do:
select *,CASE WHEN column1 = "xyz" THEN 1 ELSE 0 END as xyz from table
order by CASE WHEN column1 = "xyz" THEN 1 ELSE 0 END desc
This will return all of the rows in one result set. The rows where the xyz column is 1 are the ones with column1 = 'xyz', and they sort first.
It was:
select * from table where rowId NOT IN (select rowId from table where column1 = "xyz")
I needed a unique rowId column to achieve this.
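The answers above differ in how they treat NULLs, which is easy to miss. The sketch below compares them on SQLite through Python's sqlite3 module. The table name `items`, its three rows and the `ids` helper are assumptions for illustration; the NOT IN form has the advantage that the original WHERE clause can be pasted into the subquery unchanged, however complex it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (rowId INTEGER PRIMARY KEY, column1 TEXT);
    INSERT INTO items VALUES (1, 'xyz'), (2, 'abc'), (3, NULL);
""")

def ids(sql):
    """Return the rowId values produced by a query, in order."""
    return [row[0] for row in conn.execute(sql)]

# Plain inequality: the NULL row is neither equal nor unequal, so it is lost.
not_equal = ids("SELECT rowId FROM items WHERE column1 <> 'xyz' ORDER BY rowId")

# Inequality plus an explicit NULL check.
with_null = ids("""
    SELECT rowId FROM items
    WHERE column1 <> 'xyz' OR column1 IS NULL
    ORDER BY rowId
""")

# Complement by key: everything the original statement did not return.
complement = ids("""
    SELECT rowId FROM items
    WHERE rowId NOT IN (SELECT rowId FROM items WHERE column1 = 'xyz')
    ORDER BY rowId
""")

print(not_equal, with_null, complement)
```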

What is the fastest/easiest way to tell if 2 records in the same SQL table are different?

I want to be able to compare 2 records in the same SQL table and tell if they are different. I do not need to tell what is different, just that they are different.
Also, I only need to compare 7 of 10 columns in the records. ie.) each record has 10 columns but I only care about 7 of these columns.
Can this be done through SQL or should I get the records in C# and hash them to see if they are different values?
You can write a group by query like this:
SELECT field1, field2, field3, .... field7, COUNT(*)
FROM table
[WHERE primary_key = key1 OR primary_key = key2]
GROUP BY field1, field2, field3, .... field7
HAVING COUNT(*) > 1
That way you get all records with same values for field 1 to 7, along with the number of occurrences.
Add the part between brackets to limit your search for duplicates, either with OR, or with IN (...).
IF EXISTS (SELECT Col1, Col2, ColEtc...
from MyTable
where condition1
EXCEPT SELECT Col1, Col2, ColEtc...
from MyTable
where condition2)
BEGIN
-- Query returns all rows from first set that are not column for column
-- also in the second (EXCEPT) set. So if there are any, there will be
-- rows returned, which meets the EXISTS criteria. Since you're only
-- checking EXISTS, SQL doesn't actually need to return columns.
END
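As a rough illustration of the EXCEPT idea, here is a sketch on SQLite via Python's sqlite3 module instead of T-SQL. The table layout, the `pk` column and the `records_differ` helper are hypothetical, and only two compared columns are shown instead of seven. EXCEPT treats two NULLs as the same value, which a plain `<>` comparison does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (pk INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER, note TEXT);
    INSERT INTO MyTable VALUES
        (1, 'a', NULL, 'first'),
        (2, 'a', NULL, 'second'),
        (3, 'b', 5,    'third');
""")

def records_differ(pk_one, pk_two):
    """True if col1/col2 of row pk_one do not both match row pk_two.

    The note column is deliberately left out of the comparison.
    Assumes both rows exist; a missing first row would report False.
    """
    sql = """
        SELECT EXISTS (
            SELECT col1, col2 FROM MyTable WHERE pk = ?
            EXCEPT
            SELECT col1, col2 FROM MyTable WHERE pk = ?
        )
    """
    return bool(conn.execute(sql, (pk_one, pk_two)).fetchone()[0])

print(records_differ(1, 2))  # same col1/col2, including the NULL
print(records_differ(1, 3))  # different values
```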
No hash is necessary. Normal equality comparison is enough:
select isEqual = case when t1.a <> t2.a or t1.b <> t2.b then 0 else 1 end
SELECT
CASE WHEN (a.column1, a.column2, ..., a.column7)
= (b.column1, b.column2, ..., b.column7)
THEN 'all 7 columns same'
ELSE 'one or more of the 7 columns differ'
END AS result
FROM tableX AS a
JOIN tableX AS b
ON a.PK = @PK_of_row_one
AND b.PK = @PK_of_row_two
Can't you just use the DISTINCT keyword? All duplicates will not be returned, so each row you receive is unique (and different from the others).
http://www.mysqlfaqs.net/mysql-faqs/SQL-Statements/Select-Statement/How-does-DISTINCT-work-in-MySQL
So you could make this query:
SELECT DISTINCT x,y,z FROM RandomTable WHERE x = something
Which will only return one row for each unique x,y,z combination.