Nested conditions in SQL

I have the following WHERE condition in my SQL:
WHERE
( Spectrum.access.dim_member.centene_ind = 0 )
AND
(
Spectrum.access.Client_List_Groups.Group_Name IN ( 'Centene Health Plan Book of Business' )
AND
Spectrum.access.dim_member.referral_route IN ( 'Claims Data' )
AND
***(
Spectrum.access.fact_task_metrics.task = 'Conduct IHA'
AND
Spectrum.access.fact_task_metrics.created_by_name <> 'BMU, BMU'
AND
Spectrum.access.fact_task_metrics.created_date BETWEEN '01/01/2015 00:0:0' AND '06/30/2015 00:0:0'
)***
AND
***(
Spectrum.access.fact_outreach_metrics.outreach_type IN ( 'Conduct IHA' )
AND
(
Spectrum.dbo.ufnTruncDate(Spectrum.access.fact_outreach_metrics.metric_date) >= Spectrum.access.fact_task_metrics.metric_date
OR
Spectrum.access.fact_outreach_metrics.metric_date >= Spectrum.access.fact_task_metrics.created_date
)
)***
AND
Spectrum.access.fact_outreach_metrics.episode_seq = 1
AND
Spectrum.access.dim_member.reinstated_date Is Null
)
I have marked two of the conditions in the above code.
The 1st condition has 2 AND operators.
The 2nd condition has an AND and an OR operator.
Question 1: Does removing the outer brackets "(" in the 1st condition impact the results?
Question 2: Does removing the outer brackets "(" in the 2nd condition impact the results?
After removing the outer brackets, the filters will look like:
Spectrum.access.dim_member.referral_route IN ( 'Claims Data' )
AND
Spectrum.access.fact_task_metrics.task = 'Conduct IHA'
AND
Spectrum.access.fact_task_metrics.created_by_name <> 'BMU, BMU'
AND
Spectrum.access.fact_task_metrics.created_date BETWEEN '01/01/2015 00:0:0' AND '06/30/2015 00:0:0'
AND
Spectrum.access.fact_outreach_metrics.outreach_type IN ( 'Conduct IHA' )
AND
(
Spectrum.dbo.ufnTruncDate(Spectrum.access.fact_outreach_metrics.metric_date) >= Spectrum.access.fact_task_metrics.metric_date
OR
Spectrum.access.fact_outreach_metrics.metric_date >= Spectrum.access.fact_task_metrics.created_date
)
AND
Spectrum.access.fact_outreach_metrics.episode_seq = 1
Appreciate your help.
Regards,
Jude

Operator precedence dictates that AND is evaluated before OR when both appear within the same set of parentheses.
WHERE (A AND B) OR (C AND D)
Is equivalent to:
WHERE A AND B OR C AND D
But the example below:
WHERE (A OR B) AND (C OR D)
Is not equivalent to:
WHERE A OR B AND C OR D
Which really becomes:
WHERE A OR (B AND C) OR D
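The equivalences above can be checked by brute force. This is a minimal sketch that assumes SQLite (via Python's standard sqlite3 module) as the engine; the helper names evaluate and differing_inputs are made up for the illustration, and only true/false inputs are tried, not NULL.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr, bits):
    """Evaluate a boolean SQL expression over columns a, b, c, d (1 = true, 0 = false)."""
    sql = f"SELECT {expr} FROM (SELECT ? AS a, ? AS b, ? AS c, ? AS d)"
    return conn.execute(sql, bits).fetchone()[0]

def differing_inputs(expr1, expr2):
    """Return every true/false assignment on which the two expressions disagree."""
    return [bits for bits in itertools.product((0, 1), repeat=4)
            if evaluate(expr1, bits) != evaluate(expr2, bits)]

same = differing_inputs("(a AND b) OR (c AND d)", "a AND b OR c AND d")
regrouped = differing_inputs("a OR b AND c OR d", "a OR (b AND c) OR d")
diff = differing_inputs("(a OR b) AND (c OR d)", "a OR b AND c OR d")

print(same)       # empty list: those parentheses were redundant
print(regrouped)  # empty list: this is how the unparenthesized form is grouped
print(diff)       # non-empty: removing these parentheses changes the result
```

For instance, A true with B, C and D false satisfies A OR B AND C OR D but not (A OR B) AND (C OR D).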

Technically, you should be able to safely remove the parentheses in question in both of your examples. Each marked group is joined to the conditions around it with AND, so with or without the outer parentheses all of the conditions are combined into one large AND condition. The inner parentheses around the OR in the 2nd condition must stay, as they do in your rewritten filter. When using OR, place the parentheses carefully so that the groups are properly segmented.
Take the following examples into consideration:
a) where y = 1 AND n = 2 AND x = 3 or x = 5
b) where y = 1 AND n = 2 AND (x = 3 or x = 5)
c) where (y = 1 AND n = 2 AND x = 3) or x = 5
In example A, the intended outcome is unclear to a reader. Because AND is evaluated before OR, the database treats it the same as example C.
In example B, the intended outcome states that all of the conditions must be met and X can be either 3 or 5.
In example C, the intended outcome states that either Y=1, N=2 and X=3 OR x=5. As long as X = 5, it doesn't matter what Y and N equal.
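To see which rows each of the three WHERE clauses keeps, here is a small sketch run against SQLite through Python's sqlite3 module. The table t, its four sample rows and the helper matching_ids are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, y INTEGER, n INTEGER, x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, 1, 2, 3),  # y and n match, x = 3
    (2, 1, 2, 5),  # y and n match, x = 5
    (3, 0, 0, 5),  # only x = 5 matches
    (4, 1, 2, 4),  # y and n match, x is neither 3 nor 5
])

def matching_ids(where):
    """Return the ids of the rows that satisfy the given WHERE clause."""
    sql = f"SELECT id FROM t WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

ids_a = matching_ids("y = 1 AND n = 2 AND x = 3 OR x = 5")
ids_b = matching_ids("y = 1 AND n = 2 AND (x = 3 OR x = 5)")
ids_c = matching_ids("(y = 1 AND n = 2 AND x = 3) OR x = 5")

print(ids_a)  # [1, 2, 3]
print(ids_b)  # [1, 2]
print(ids_c)  # [1, 2, 3]
```

Row 3 is the telling one: it gets through in a) and c) purely because x = 5, while b) rejects it.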

Related

Record type comparison with different numbers of columns isn't failing

Why does the following query not trigger a "cannot compare record types with different numbers of columns" error in PostgreSQL 11.6?
with
s AS (SELECT 1)
, main AS (
SELECT (a) = (b) , (a) = (a), (b) = (b), a, b -- I expect (a) = (b) fails
FROM s
, LATERAL (select 1 as x, 2 as y) AS a
, LATERAL (select 5 as x) AS b
)
select * from main;
While this one does:
with
x AS (SELECT 1)
, y AS (select 1, 2)
select (x) = (y) from x, y;
See the note in the docs on row comparison:
Errors related to the number or types of elements might not occur if the comparison is resolved using earlier columns.
In this case, because a.x=1 and b.x=5, it returns false without ever noticing that the number of columns doesn't match. Change them to match, and you will get the same exception (which is also why the 2nd query does have that exception).
testdb=# with
s AS (SELECT 1)
, main AS (
SELECT a = b , (a) = (a), (b) = (b), a, b -- I expect (a) = (b) fails
FROM s
, LATERAL (select 5 as x, 2 as y) AS a
, LATERAL (select 5 as x) AS b
)
select * from main;
ERROR: cannot compare record types with different numbers of columns
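The behaviour described in that note can be pictured with a toy model in Python. This is only an illustration of "resolved using earlier columns", not PostgreSQL's actual implementation, and the function row_equal is hypothetical.

```python
def row_equal(left, right):
    """Toy model of row equality: compare column by column, left to right."""
    for left_value, right_value in zip(left, right):
        if left_value != right_value:
            # Resolved by an earlier column; the column counts are never checked.
            return False
    if len(left) != len(right):
        raise ValueError("cannot compare record types with different numbers of columns")
    return True

# a = (1, 2), b = (5): the first column already differs, so no error is raised.
first = row_equal((1, 2), (5,))

# a = (5, 2), b = (5): the first column matches, so the mismatch is noticed.
try:
    row_equal((5, 2), (5,))
    second = "no error"
except ValueError as exc:
    second = str(exc)

print(first)   # False
print(second)  # cannot compare record types with different numbers of columns
```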

SQL where/or confusion

I'm running sql statements on a huge db for the first time and I have code as such.
Select x, sum(y), sum(z) from db
where n = 'xxx' or n = 'yyy' and m = int
group by x
Now if I do this
Select x, sum(y), sum(z) from db
where n = 'xxx' and m = int
group by x
Select x, sum(y), sum(z) from db
where n = 'yyy' and m = int
group by x
When I manually add the grouped values from the 2 result sets together, I get different results from the first query, with the separate queries being more accurate.
E.g. the result for row 1 in the first query will be 20 million, while adding the row 1 values together from the second block of code gives about 18 million. Not sure what the issue is.
Best to use parentheses when ORs are used with ANDs.
select x, sum(y), sum(z) from db
where (n = 'xxx' or n = 'yyy') and m = int
group by x
In SQL, an AND takes precedence over an OR.
So this:
where n = 'xxx' or n = 'yyy' and m = int
Is actually processed as:
where n = 'xxx' or (n = 'yyy' and m = int)
And that gets the n that are 'xxx' with any m.
Anyway, Gordon has a point.
Using an IN for this is better. Even if it's only 2.
Use in. Your code doesn't really make sense:
where n in ('xxx', 'yyy') and m = int
This query:
where n = 'xxx' or 'yyy' and m = int
should return an error in SQL Server, because of the dangling 'yyy'. MySQL accepts this syntax. In that database, it would be processed as:
where n = 'xxx' or 'yyy' and m = int
-- AND has higher precedence than `or`
where n = 'xxx' or ('yyy' and m = int)
-- `'yyy'` is converted to an integer
where n = 'xxx' or (0 and m = int)
-- which is treated as a boolean
where n = 'xxx' or (false and m = int)
-- which is grouped like this
where n = 'xxx' or (false and (m = int))
-- which is equivalent to
where n = 'xxx'
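The gap between the totals in the question can be reproduced on a tiny table. This is a sketch assuming SQLite through Python's sqlite3 module; the four rows are made up, m = 1 stands in for the question's "m = int" placeholder, and sum_y is a helper invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (x TEXT, n TEXT, m INTEGER, y INTEGER, z INTEGER)")
conn.executemany("INSERT INTO db VALUES (?, ?, ?, ?, ?)", [
    ("row1", "xxx", 1, 10, 1),
    ("row1", "xxx", 2, 5, 1),    # wrong m, should be filtered out
    ("row1", "yyy", 1, 7, 1),
    ("row1", "yyy", 2, 100, 1),  # wrong m, should be filtered out
])

def sum_y(where):
    """Return sum(y) for the single group produced by the given WHERE clause."""
    sql = f"SELECT sum(y) FROM db WHERE {where} GROUP BY x"
    return conn.execute(sql).fetchone()[0]

unparenthesized = sum_y("n = 'xxx' OR n = 'yyy' AND m = 1")
parenthesized = sum_y("(n = 'xxx' OR n = 'yyy') AND m = 1")
with_in = sum_y("n IN ('xxx', 'yyy') AND m = 1")
separate = sum_y("n = 'xxx' AND m = 1") + sum_y("n = 'yyy' AND m = 1")

print(unparenthesized)  # 22: every 'xxx' row is counted, whatever its m
print(parenthesized)    # 17
print(with_in)          # 17
print(separate)         # 17
```

The unparenthesized total is larger because it also picks up the 'xxx' row whose m does not match, which is the same pattern as the 20 million versus 18 million in the question.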

Performing sum per row based on conditions SQL

I wanted to know how I can perform a sum per row based on conditions for the columns using SQL (I'm new to SQL).
For example, I have this table:
ID Col_1 Col_2 Col_3 ...
1 L L L ...
2 L Q Q ...
3 L L Q ...
4 Q Q L ...
The result that I'm looking for is:
ID count_L count_Q
1 3 0
2 1 2
3 2 1
4 1 2
I'm not sure on how I should approach this. Doing this using Count function if my table was transposed would be easier (I think) but performing the query in the way my data is organized is tricky for me. I think I need nested SQL statements and join them using UNION but not sure how to do it.
I wasn't able to find similar questions/solutions elsewhere. Would appreciate some help!
You can use iif() and +:
select id,
(iif(col_1 = 'L' , 1, 0) + iif(col_2 = 'L' , 1, 0) + iif(col_3 = 'L' , 1, 0) ) as count_l,
(iif(col_1 = 'Q' , 1, 0) + iif(col_2 = 'Q' , 1, 0) + iif(col_3 = 'Q' , 1, 0) ) as count_q
from t;
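If iif() is not available in your database, the same idea works with CASE expressions. The sketch below runs it against SQLite from Python's sqlite3 module, using the sample table from the question; the COLUMNS list and the count_expr helper are assumptions made for the example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col_1 TEXT, col_2 TEXT, col_3 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "L", "L", "L"),
    (2, "L", "Q", "Q"),
    (3, "L", "L", "Q"),
    (4, "Q", "Q", "L"),
])

COLUMNS = ["col_1", "col_2", "col_3"]

def count_expr(value):
    """Build 'CASE WHEN col = value THEN 1 ELSE 0 END + ...' over all columns."""
    return " + ".join(
        f"CASE WHEN {col} = '{value}' THEN 1 ELSE 0 END" for col in COLUMNS
    )

query = (
    f"SELECT id, {count_expr('L')} AS count_l, {count_expr('Q')} AS count_q "
    "FROM t ORDER BY id"
)
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)  # (1, 3, 0), (2, 1, 2), (3, 2, 1), (4, 1, 2)
```

Generating the expression from a column list keeps the query manageable when there are many more columns than the three shown.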

Theta Notation for N to the Power of Log Manipulation

In asymptotic notation for order of growth, is the form
Theta(N ^ ( ( LOGb( a / b) + 1 ) ) )
equivalent to
Theta(N ^ (LOGb( a ) ) )?
Where LOGb(a) means LOG a to base b.
Since log(a/b) = log a - log b and LOGb(b) = 1, we have LOGb(a/b) + 1 = LOGb(a) - 1 + 1 = LOGb(a). No mention of asymptotics is necessary: this equality is exact for all a, b > 0 with b not equal to 1.
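The identity LOGb(a/b) + 1 = LOGb(a) can also be spot-checked numerically. This is a quick sketch in Python; the sample values of a and b are arbitrary, and the comparison uses a tolerance because of floating-point rounding.

```python
import math

def exponent_with_ratio(a, b):
    """LOGb(a / b) + 1, the exponent in the first form."""
    return math.log(a / b, b) + 1

def exponent_plain(a, b):
    """LOGb(a), the exponent in the second form."""
    return math.log(a, b)

samples = [(8, 2), (3, 7), (0.5, 10), (100, 1.5)]
all_close = all(
    math.isclose(exponent_with_ratio(a, b), exponent_plain(a, b),
                 rel_tol=1e-9, abs_tol=1e-12)
    for a, b in samples
)
print(all_close)  # True
```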

How to do a basic left outer join with data.table in R?

I have a data.table of a and b that I've partitioned into below with b < .5 and above with b > .5:
DT = data.table(a=as.integer(c(1,1,2,2,3,3)), b=c(0,0,0,1,1,1))
above = DT[DT$b > .5]
below = DT[DT$b < .5, list(a=a)]
I'd like to do a left outer join between above and below: for each a in above, count the number of rows in below. This is equivalent to the following in SQL:
with dt as (select 1 as a, 0 as b union select 1, 0 union select 2, 0 union select 2, 1 union select 3, 1 union select 3, 1),
above as (select a, b from dt where b > .5),
below as (select a, b from dt where b < .5)
select above.a, count(below.a) from above left outer join below on (above.a = below.a) group by above.a;
a | count
---+-------
3 | 0
2 | 1
(2 rows)
How do I accomplish the same thing with data.tables? This is what I tried so far:
> key(below) = 'a'
> below[above, list(count=length(b))]
a count
[1,] 2 1
[2,] 3 1
[3,] 3 1
> below[above, list(count=length(b)), by=a]
Error in eval(expr, envir, enclos) : object 'b' not found
> below[, list(count=length(a)), by=a][above]
a count b
[1,] 2 1 1
[2,] 3 NA 1
[3,] 3 NA 1
I should also be more specific in that I already tried merge but that blows through the memory on my system (and the dataset takes only about 20% of my memory).
See if this is giving you something useful. Your example is too sparse to let me know what you want, but it appears it might be a tabulation of values of above$a that are also in below$a
table(above$a[above$a %in% below$a])
If you also want the converse ... values not in below, then this would do it:
table(above$a[!above$a %in% below$a])
And you can concatenate them:
> c(table(above$a[above$a %in% below$a]),table(above$a[!above$a %in% below$a]) )
2 3
1 2
Generally table and %in% run in reasonably small footprints and are quick.
Since you appear to be using package data.table: check ?merge.data.table.
I haven't used it, but it appears this might do what you want:
merge(above, below, by="a", all.x=TRUE, all.y=FALSE)
I think this is easier:
setkey(above,a)
setkey(below,a)
Left outer join:
above[below, .N]
regular join:
above[below, .N, nomatch=0]
full outer join with counts:
merge(above,below, all=T)[,.N, by=a]
I eventually found a way to do this with data.table, which I felt is more natural for me to understand than DWin's table, though YMMV:
result = below[, list(count=length(b)), by=a]
key(result) = 'a'
result = result[J(unique(above$a))]
result$count[is.na(result$count)] = 0
I don't know if this could be more compact, though. I especially wanted to be able to do something like result = below[J(unique(above$a)), list(count=length(b))], but that doesn't work.
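As a cross-check of the expected output, independent of data.table, the left outer join count from the SQL in the question can be written in a few lines of plain Python. This is only a reference sketch: it multiplies the number of rows per key in above by the number in below, which is what count(below.a) yields after the join and GROUP BY. The variable names are invented for the illustration.

```python
from collections import Counter

a = [1, 1, 2, 2, 3, 3]
b = [0, 0, 0, 1, 1, 1]

above_a = [ai for ai, bi in zip(a, b) if bi > 0.5]  # [2, 3, 3]
below_a = [ai for ai, bi in zip(a, b) if bi < 0.5]  # [1, 1, 2]

above_counts = Counter(above_a)
below_counts = Counter(below_a)

# Each row of above pairs with every matching row of below; a key with no
# match in below contributes a count of 0, as in the left outer join.
result = {key: above_counts[key] * below_counts[key] for key in sorted(above_counts)}

print(result)  # {2: 1, 3: 0}
```

This matches the two rows returned by the SQL query (a = 2 with count 1, a = 3 with count 0).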