Implementation discussion - sql

I am in a position where I want multiple counts from a single table based on different combinations of conditions.
The table has 2 flags: A & B.
I want counts for the following criteria on the same page:
A is true (Don't care about B)
A is false (Don't care about B)
A is true AND B is true
A is false AND B is true
A is true AND B is false
A is false AND B is false
B is true (Don't care about A)
B is false (Don't care about A)
I want all of the above counts on the same page. Which of the following would be a good approach:
Query the table for a count for each condition. [That means firing 8 queries every time the user gives the command.]
Query the database for the list of rows and then count the values for each condition in the UI.
Which option should I choose? Do you know of any other alternative?

Your table essentially looks like this (The ID column is redundant, but I expect you have other data in your actual table anyway.):
CREATE TABLE `stuff` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`a` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
`b` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
)
Some sample data:
INSERT INTO `stuff` (`id`, `a`, `b`) VALUES (1, 0, 0);
INSERT INTO `stuff` (`id`, `a`, `b`) VALUES (2, 0, 1);
INSERT INTO `stuff` (`id`, `a`, `b`) VALUES (3, 1, 0);
INSERT INTO `stuff` (`id`, `a`, `b`) VALUES (4, 1, 1);
This query (in mysql, I'm not sure about other DBMS) should produce the results you want.
select
count(if (a = 1, 1, NULL)) as one,
count(if (a = 0, 1, NULL)) as two,
count(if (a = 1 && b = 1, 1, NULL)) as three,
count(if (a = 0 && b = 1, 1, NULL)) as four,
count(if (a = 1 && b = 0, 1, NULL)) as five,
count(if (a = 0 && b = 0, 1, NULL)) as six,
count(if (b = 1, 1, NULL)) as seven,
count(if (b = 0, 1, NULL)) as eight
from stuff
group by null
With the sample, simple data above, the query generates:
one, two, three, four, five, six, seven, eight
2, 2, 1, 1, 1, 1, 2, 2
Notes:
group by null
This just causes every row to be in the same group.
count(...)
This function counts all the NON null values in the group, which is why we use the if(...) to return null if the condition is not met.
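For DBMSs other than MySQL, the same idea can be written with a standard CASE expression instead of if(...) and &&. The following is a minimal sketch, assuming SQLite driven from Python's sqlite3 module purely so that it can be run as-is; the table and sample rows are the ones above, with the column types adapted to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stuff (
        id INTEGER PRIMARY KEY,
        a  INTEGER NOT NULL DEFAULT 0,
        b  INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO stuff (id, a, b) VALUES (1, 0, 0), (2, 0, 1), (3, 1, 0), (4, 1, 1);
""")

# A CASE without ELSE yields NULL when the condition fails, and COUNT skips NULLs.
# With no GROUP BY, the aggregates run over the whole table as a single group.
counts = conn.execute("""
    SELECT
        COUNT(CASE WHEN a = 1 THEN 1 END)           AS one,
        COUNT(CASE WHEN a = 0 THEN 1 END)           AS two,
        COUNT(CASE WHEN a = 1 AND b = 1 THEN 1 END) AS three,
        COUNT(CASE WHEN a = 0 AND b = 1 THEN 1 END) AS four,
        COUNT(CASE WHEN a = 1 AND b = 0 THEN 1 END) AS five,
        COUNT(CASE WHEN a = 0 AND b = 0 THEN 1 END) AS six,
        COUNT(CASE WHEN b = 1 THEN 1 END)           AS seven,
        COUNT(CASE WHEN b = 0 THEN 1 END)           AS eight
    FROM stuff
""").fetchone()

print(counts)  # (2, 2, 1, 1, 1, 1, 2, 2)
```

This is still one query and one pass over the table, so it keeps the benefit of letting the database do the counting.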

Create a query that already does the counting. At least with SQL this is not hard.

In my opinion the 2nd option is better, as you are querying only once. Firing 8 queries at the DB might impact performance later.

Databases are designed to give you the data you want. In almost all cases, asking for what you want is quicker than asking for everything and calculating or filtering it yourself. I'd say you should blindly go for option 1 (ask for what you need), and only if it really does not work consider option 2 (or something else).
If every flag is either true or false (no NULL values), you don't need 8 queries; 4 would be enough:
Get the total
A true (don't care about B)
B true (don't care about A)
A and B true
'A true and B false' is the second minus the fourth: (A true) - (A and B true). And 'A and B false' = total - (A true) - (B true) + (A and B true). Look up the inclusion-exclusion principle for more information.
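The arithmetic above can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module and the four-row sample table from the earlier answer; the helper name count and the dictionary keys are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stuff (id INTEGER PRIMARY KEY, a INTEGER NOT NULL, b INTEGER NOT NULL);
    INSERT INTO stuff (id, a, b) VALUES (1, 0, 0), (2, 0, 1), (3, 1, 0), (4, 1, 1);
""")

def count(where="1 = 1"):
    """Hypothetical helper: one COUNT(*) query per condition."""
    return conn.execute(f"SELECT COUNT(*) FROM stuff WHERE {where}").fetchone()[0]

# The four queries that are actually sent to the database.
total     = count()
a_true    = count("a = 1")
b_true    = count("b = 1")
both_true = count("a = 1 AND b = 1")

# The remaining counts follow by subtraction (valid only when a and b are never NULL).
derived = {
    "a_false":        total - a_true,
    "b_false":        total - b_true,
    "a_true_b_false": a_true - both_true,
    "a_false_b_true": b_true - both_true,
    "both_false":     total - a_true - b_true + both_true,
}
print(derived)
```

Each derived value can be compared against a direct query such as count("a = 0 AND b = 0") to confirm the identities on your own data.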

Related

Difference between combination of AND and OR when placed inside the parentheses vs when placed stand alone in a conditional statement

I am getting different number of results when I have the script like the following:
select count(distinct(t1.ticketid)),t1.TicketStatus from ticket as t1
inner join Timepoint as t2 on t1.TicketID=t2. ticketid
where
t2.BuilderAnalystID=10 and t1.SubmissionDT >='04-01-2018' AND
(t1.TicketBuildStatusID<>12 OR
t1.TicketBuildStatusID<>11 OR
t1.TicketBuildStatusID<>10
)
And when I use it like this:
select count(distinct(t1.ticketid)),t1.TicketStatus from ticket as t1
inner join Timepoint as t2 on t1.TicketID=t2. ticketid
where
t2.BuilderAnalystID=10 and t1.SubmissionDT >='04-01-2018' AND
t1.TicketBuildStatusID<>12 AND
t1.TicketBuildStatusID<>11 AND
t1.TicketBuildStatusID<>10
Can someone tell me why there is a difference, to me the logic is the same!
Thanks,
In your example, it won't matter because you have all AND clauses. That said, you need to be aware of precedence (i.e. order of operations): NOT comes before AND, AND comes before OR, and so on.
So just like 3 + 3 x 0 means 3 + (3 x 0), A or B and C means A or (B and C), even if that's not what you meant.
So in cases where you have mixed AND and OR clauses, it matters a lot.
Consider this example:
select *
from A, B
where A.id = B.id and A.family_code = 'ABC' or A.family_code = 'DEF'
It's horrible code, I admit, but for illustrative purposes, bear with me.
You may have meant this:
select *
from A, B
where A.id = B.id and (A.family_code = 'ABC' or A.family_code = 'DEF')
but you said this:
select *
from A, B
where (A.id = B.id and A.family_code = 'ABC') or A.family_code = 'DEF'
Which in the construct above completely blows away your join, resulting in a cartesian product for all cases where the family code is DEF.
So bottom line: when you mix clauses (AND, OR, NOT), it's best to use parentheses to be explicit about what you mean, even when it's not necessary.
Food for thought.
-- EDIT --
The question was changed after I wrote this so that the queries were NOT the same (ands were changed to ors).
Hopefully my explanation still helps.
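To make the precedence point concrete, here is a minimal sketch that counts the rows each form of the WHERE clause returns. It assumes SQLite via Python's sqlite3 module; the three sample rows per table and the family code 'XYZ' are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, family_code TEXT);
    CREATE TABLE B (id INTEGER);
    INSERT INTO A VALUES (1, 'ABC'), (2, 'DEF'), (3, 'XYZ');
    INSERT INTO B VALUES (1), (2), (3);
""")

def row_count(where):
    """Hypothetical helper: number of rows the old-style join returns for a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM A, B WHERE {where}").fetchone()[0]

# No parentheses: AND binds tighter than OR.
unparenthesized = row_count(
    "A.id = B.id AND A.family_code = 'ABC' OR A.family_code = 'DEF'")
# What was probably meant.
intended = row_count(
    "A.id = B.id AND (A.family_code = 'ABC' OR A.family_code = 'DEF')")
# What the unparenthesized form actually means.
as_parsed = row_count(
    "(A.id = B.id AND A.family_code = 'ABC') OR A.family_code = 'DEF'")

print(unparenthesized, intended, as_parsed)  # 4 2 4
```

The extra rows in the unparenthesized form come from the 'DEF' row of A being paired with every row of B, since the join condition no longer applies to it.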
After the edit to your question, there will now be a difference.
t2.BuilderAnalystID=10 and t1.SubmissionDT >='04-01-2018' AND
(t1.TicketBuildStatusID<>12 OR
t1.TicketBuildStatusID<>11 OR
t1.TicketBuildStatusID<>10
)
This query will return rows where t1.TicketBuildStatusID is 10, 11, or 12. It states that the value should not be 10 (which allows 11 and 12), or not be 11 (which allows 10 and 12), or not be 12 (which allows 10 and 11).
Yes, those queries will produce different results. In fact, the first query will return every value of TicketBuildStatusID unless it has a value of NULL.
When TicketBuildStatusID has a value of 12, it doesn't have a value of 11 or 10, so the expression (t1.TicketBuildStatusID<>12 OR t1.TicketBuildStatusID<>11 OR t1.TicketBuildStatusID<>10) is true. If it has a value of 11, the same applies again, and likewise for every other possible value apart from NULL (as {expression}<>NULL = NULL, which is not true).
When you do this:
AND
(t1.TicketBuildStatusID<>12 OR
t1.TicketBuildStatusID<>11 OR
t1.TicketBuildStatusID<>10)
you are effectively applying no filter, because any one condition evaluating to true makes the whole parenthesized condition true, i.e.
true AND (true or false or false) = true
When you do this, all conditions must match, i.e. the status must not be 12, 11, or 10:
AND
t1.TicketBuildStatusID<>12 AND
t1.TicketBuildStatusID<>11 AND
t1.TicketBuildStatusID<>10
OR isn't the logic that you want. If x = 12, then it is not 11, so every value satisfies x <> 12 OR x <> 11.
So, just simplify the logic and use not in:
select count(distinct t1.ticketid), t1.TicketStatus
from ticket t1 inner join
Timepoint t2
on t1.TicketID = t2.ticketid
where t2.BuilderAnalystID = 10 and
t1.SubmissionDT >= '2018-04-01' and
t1.TicketBuildStatusID not in (12, 11, 10)
Notes:
distinct is not a function, so there is no need to place the following expression in parentheses.
Use standard date formats. Either 'YYYYMMDD' or 'YYYY-MM-DD'.
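The difference between the OR chain, the AND chain, and NOT IN can be seen on a handful of rows. This is a minimal sketch, assuming SQLite via Python's sqlite3 module and a cut-down, hypothetical ticket table that holds only the status column, including one NULL status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (TicketID INTEGER PRIMARY KEY, TicketBuildStatusID INTEGER);
    INSERT INTO ticket VALUES (1, 9), (2, 10), (3, 11), (4, 12), (5, NULL);
""")

def statuses(where):
    """Hypothetical helper: status values of the rows that pass the filter."""
    sql = f"SELECT TicketBuildStatusID FROM ticket WHERE {where} ORDER BY TicketID"
    return [row[0] for row in conn.execute(sql)]

or_chain = statuses(
    "TicketBuildStatusID <> 12 OR TicketBuildStatusID <> 11 OR TicketBuildStatusID <> 10")
and_chain = statuses(
    "TicketBuildStatusID <> 12 AND TicketBuildStatusID <> 11 AND TicketBuildStatusID <> 10")
not_in = statuses("TicketBuildStatusID NOT IN (12, 11, 10)")

print(or_chain)   # [9, 10, 11, 12]  every non-NULL status passes
print(and_chain)  # [9]
print(not_in)     # [9]
```

The row with a NULL status is dropped by all three filters, because a comparison with NULL is never true.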

How to query the same set of columns with different set of values on the same query efficiently

I'm using SQL SERVER 2014 and I have this query which needs to be rebuilt to be more efficient in what it is trying to accomplish.
As an example, I created this schema and added data to it so we could replicate the problem. You can try it at rextester (http://rextester.com/AIYG36293)
create table Dogs
(
Name nvarchar(20),
Owner_ID int,
Shelter_ID int
);
insert into Dogs values
('alpha', 1, 1),
('beta', 2, 1),
('charlie', 3, 1),
('beta', 1, 2),
('alpha', 2, 2),
('charlie', 3, 2),
('charlie', 1, 3),
('beta', 2, 3),
('alpha', 3, 3);
I want to find out which Shelter has this set of owner and dog name combinations, and the match must be exact. This is the query I'm using right now (it is more or less the query Entity Framework generated, with some slight changes to make it simpler):
SELECT DISTINCT
Shelter_ID
FROM Dogs AS [Extent1]
WHERE ( EXISTS (SELECT
1 AS [C1]
FROM [Dogs] AS [Extent2]
WHERE [Extent1].[Shelter_ID] = [Extent2].[Shelter_ID] AND [Extent2].[Name] = 'charlie' AND [Extent2].[Owner_ID] = 1
)) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Dogs] AS [Extent3]
WHERE [Extent1].[Shelter_ID] = [Extent3].[Shelter_ID] AND [Extent3].[Name] = 'beta' AND [Extent3].[Owner_ID] = 2
)) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Dogs] AS [Extent4]
WHERE [Extent1].[Shelter_ID] = [Extent4].[Shelter_ID] AND [Extent4].[Name] = 'alpha' AND [Extent4].[Owner_ID] = 3
))
This query is able to get what I need but I want to know if there is any simpler way of querying it. Because in my actual use case, I have more than just 3 combinations to worry about, it could get up to some crazy combinations like 1000 or more. So just imagine having 1000 subqueries in there so, well, yeah you get the point. When I try querying with that many I get an error saying:
The query processor ran out of internal resources and could not
produce a query plan. This is a rare event and only expected for
extremely complex queries or queries that reference a very large
number of tables or partitions.
NOTE
One solution I tried was using a PIVOT to flatten the data. The query becomes simpler, since it is then just a WHERE clause with a number of AND conditions, but with a higher number of combinations I exceed the maximum allowable row size and get this error when creating the temporary table that stores the flattened data:
Cannot create a row of size 10514 which is greater than the allowable
maximum row size of 8060.
I appreciate any help or thoughts on this matter.
Thanks!
Count them.
WITH dogSet AS (
SELECT *
FROM (
VALUES ('charlie',1),('beta',2),('alpha',3)
) ts(Name,Owner_ID)
)
SELECT Shelter_ID
FROM Dogs AS [Extent1]
JOIN dogSet ts ON ts.Name= [Extent1].name and ts.Owner_ID = [Extent1].Owner_ID
GROUP BY Shelter_ID
HAVING count(*) = (SELECT count(*) n FROM dogSet)
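For a large number of combinations, one option is to load the wanted pairs into a temporary table so the query text stays the same size. The sketch below assumes SQLite via Python's sqlite3 module rather than SQL Server, and the function name shelters_matching and its exact flag are made up for illustration. The exact variant is one possible reading of "it must be exact": the shelter may hold no pairs outside the wanted set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dogs (Name TEXT, Owner_ID INTEGER, Shelter_ID INTEGER)")
conn.executemany("INSERT INTO Dogs VALUES (?, ?, ?)", [
    ("alpha", 1, 1), ("beta", 2, 1), ("charlie", 3, 1),
    ("beta", 1, 2), ("alpha", 2, 2), ("charlie", 3, 2),
    ("charlie", 1, 3), ("beta", 2, 3), ("alpha", 3, 3),
])

def shelters_matching(wanted, exact=False):
    """Return shelters holding every (Name, Owner_ID) pair in `wanted`."""
    # The pairs become rows, so 3 pairs and 1000 pairs use the same query.
    conn.execute("DROP TABLE IF EXISTS temp.dogSet")
    conn.execute("CREATE TEMP TABLE dogSet (Name TEXT, Owner_ID INTEGER, "
                 "PRIMARY KEY (Name, Owner_ID))")
    conn.executemany("INSERT INTO dogSet VALUES (?, ?)", wanted)
    sql = """
        SELECT d.Shelter_ID
        FROM (SELECT DISTINCT Name, Owner_ID, Shelter_ID FROM Dogs) AS d
        LEFT JOIN dogSet AS s ON s.Name = d.Name AND s.Owner_ID = d.Owner_ID
        GROUP BY d.Shelter_ID
        HAVING COUNT(s.Name) = (SELECT COUNT(*) FROM dogSet)
    """
    if exact:
        # Every pair in the shelter must also be a wanted pair.
        sql += " AND COUNT(*) = COUNT(s.Name)"
    sql += " ORDER BY d.Shelter_ID"
    return [row[0] for row in conn.execute(sql)]

wanted = [("charlie", 1), ("beta", 2), ("alpha", 3)]
print(shelters_matching(wanted))                    # [3]
print(shelters_matching(wanted[:2]))                # [3]
print(shelters_matching(wanted[:2], exact=True))    # []
```

The SELECT DISTINCT guards against duplicate rows in Dogs inflating the count; if the table is known to have no duplicates it can be dropped.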

Applying multiple conditions in SQL?

I haven't been able to find a similar question for an answer I'm looking for. What is the best way to apply multiple conditions to my query to exclude certain information. Case or Boolean?
Example code:
SELECT test,
testTypeID,
visitType,
submitted
FROM vReport
WHERE (vReport.submitted = 0 OR vReport.submitted IS NULL)
AND vReport.test IN ('Test 1','Test 2','Test 3')
How do I best code for it to return all the tests 1, 2, and 3 while excluding rows for certain visit types (i.e. exclude row ONLY if it is Test 3 AND it is visit Week 26 AND a certain testTypeID)?
I'm not sure what your column names and datatypes are for visitWeek (assuming this is an INT) and testTypeID, or what values you want to filter by, but here is the logic for it:
SELECT test,
testTypeID,
visitType,
submitted
FROM vReport
WHERE (vReport.submitted = 0 OR vReport.submitted IS NULL)
AND vReport.test IN ('Test 1','Test 2','Test 3')
AND NOT (vReport.test = 'Test 3' AND vReport.testTypeID IN (some value) AND vReport.visitWeek = 26)
If you can define your exclusions homogeneously, you can store them in another table. Something like:
ExcludedTest
excludedTestId
test
visitType
testTypeId
and your query can be done like this:
SELECT test,
testTypeID,
visitType,
submitted
FROM vReport VR
WHERE (VR.submitted = 0 OR VR.submitted IS NULL)
AND VR.test IN ('Test 1','Test 2','Test 3')
AND NOT EXISTS ( SELECT 1 FROM ExcludedTest ET
WHERE ET.testTypeID = VR.testTypeID
AND ET.visitType = VR.visitType
AND ET.test = VR.test)
Also, you should get better performance if you eliminate that OR. One way to do this is to declare submitted as NOT NULL with DEFAULT(0), so that the condition vReport.submitted = 0 alone is enough.
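The exclusion-table approach can be tried out on a few rows. This is a minimal sketch, assuming SQLite via Python's sqlite3 module, with vReport modelled as a plain table; all sample rows, the visit type strings, and the testTypeID values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vReport (test TEXT, testTypeID INTEGER, visitType TEXT, submitted INTEGER);
    INSERT INTO vReport VALUES
        ('Test 1', 1, 'Week 26', 0),
        ('Test 2', 5, 'Week 26', 1),      -- already submitted
        ('Test 3', 5, 'Week 12', 0),
        ('Test 3', 5, 'Week 26', NULL),   -- the one combination to exclude
        ('Test 3', 6, 'Week 26', 0),
        ('Test 4', 1, 'Week 1',  0);      -- not in the IN list

    CREATE TABLE ExcludedTest (test TEXT, visitType TEXT, testTypeID INTEGER);
    INSERT INTO ExcludedTest VALUES ('Test 3', 'Week 26', 5);
""")

rows = conn.execute("""
    SELECT VR.test, VR.testTypeID, VR.visitType, VR.submitted
    FROM vReport AS VR
    WHERE (VR.submitted = 0 OR VR.submitted IS NULL)
      AND VR.test IN ('Test 1', 'Test 2', 'Test 3')
      AND NOT EXISTS (SELECT 1
                      FROM ExcludedTest AS ET
                      WHERE ET.testTypeID = VR.testTypeID
                        AND ET.visitType  = VR.visitType
                        AND ET.test       = VR.test)
    ORDER BY VR.test, VR.testTypeID, VR.visitType
""").fetchall()

for row in rows:
    print(row)
```

Only the row that matches all three columns of an ExcludedTest entry is removed; the other 'Test 3' rows survive, and adding another exclusion is an INSERT rather than a query change.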

In SQL, how to query for rows based on the values in other rows at a certain relative position in the table

I have a database containing events which have a "time" (an integer) plus some other attributes.
E.g.
CREATE TABLE events (time, attr1, attr2);
INSERT INTO events VALUES (1, 'a', 'foo');
INSERT INTO events VALUES (2, 'b', 'bar');
INSERT INTO events VALUES (4, 'a', 'baz');
INSERT INTO events VALUES (9, 'b', 'quux');
INSERT INTO events VALUES (10, 'c', 'foobar');
Now I want to do a somewhat complicated query: I want to find all events which have the property that the next event in the table satisfies some condition. For instance, I might want to find all events that satisfy all these conditions:
attr1 == 'a'
the next event (as determined by the time field) has attr2 == 'bar'
This should return the event at time 1, but not the event at time 4. Or a more complicated example would be: find all events that satisfy
attr1 == 'a'
the next event for which attr1 == 'c' has attr2 == 'foobar'
This would return both the events at times 1 and 4.
It seems like this ought to be possible via some sort of complicated nested select, but I haven't managed to work out how.
Other notes:
I'm using sqlite.
Events are irregularly spaced, so strategies that involve computing the position of the 'next' event won't work.
I know these queries are going to be murder on the query optimizer, that's okay.
I know how to do this by doing multiple selects + non-SQL logic, but I'd much rather do it using pure SQL, because this is embedded in a larger query generation system. I need to be able to generate queries of this form in general, conjoined with other constraints, etc., it's not just a single query I'll write once and be done with.
You can find a record that is the next after some specific time by combining ORDER BY and LIMIT:
SELECT *
FROM events
WHERE time > 1
ORDER BY time
LIMIT 1
By using this in a subquery, you can look up values from the next record.
Your first query can be implemented like this:
SELECT *
FROM events AS e2
WHERE attr1 = 'a'
AND (SELECT attr2
FROM events
WHERE time > e2.time
ORDER BY time
LIMIT 1) = 'bar'
Your second query can be implemented like this (the additional condition belongs in the WHERE of the subquery):
SELECT *
FROM events AS e2
WHERE attr1 = 'a'
AND (SELECT attr2
FROM events
WHERE attr1 = 'c'
AND time > e2.time
ORDER BY time
LIMIT 1) = 'foobar'
The subquery lookups can be made faster with an index on the time column.
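Both queries can be run against the sample data from the question with Python's sqlite3 module. This is a minimal sketch; the index name events_time is an assumption, and the queries select only the time column to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (time, attr1, attr2);
    INSERT INTO events VALUES (1, 'a', 'foo');
    INSERT INTO events VALUES (2, 'b', 'bar');
    INSERT INTO events VALUES (4, 'a', 'baz');
    INSERT INTO events VALUES (9, 'b', 'quux');
    INSERT INTO events VALUES (10, 'c', 'foobar');
    CREATE INDEX events_time ON events (time);  -- index name is hypothetical
""")

# attr1 = 'a' and the very next event has attr2 = 'bar'
first = [row[0] for row in conn.execute("""
    SELECT e.time
    FROM events AS e
    WHERE e.attr1 = 'a'
      AND (SELECT n.attr2
           FROM events AS n
           WHERE n.time > e.time
           ORDER BY n.time
           LIMIT 1) = 'bar'
    ORDER BY e.time
""")]

# attr1 = 'a' and the next event with attr1 = 'c' has attr2 = 'foobar'
second = [row[0] for row in conn.execute("""
    SELECT e.time
    FROM events AS e
    WHERE e.attr1 = 'a'
      AND (SELECT n.attr2
           FROM events AS n
           WHERE n.attr1 = 'c'
             AND n.time > e.time
           ORDER BY n.time
           LIMIT 1) = 'foobar'
    ORDER BY e.time
""")]

print(first)   # [1]
print(second)  # [1, 4]
```

When there is no next event, the scalar subquery yields NULL, the comparison is not true, and the row is simply left out.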
select * from events a
where exists
(
select * from events c where c.time =
(select min(b.time) from events b where b.time > a.time)--next_event
and c.attr2 = 'bar'
)
and a.attr1 = 'a'
should be your first query. It returns time 1.
http://sqlfiddle.com/#!2/63baf/12
the second could be :
select * from events a
where exists
(
select * from events c where c.time =
(select min(b.time) from events b where b.time > a.time and attr1 = 'c')
and c.attr2 = 'foobar'
)
and a.attr1 = 'a'
and it returns times 1 and 4, as you expect, since both of these rows satisfy your conditions
http://sqlfiddle.com/#!2/63baf/15
hope this helps
Nicolas

Filter for NULL in list

Why does filtering for NULL in an IN list not work?
I hoped to get the correct result by adding NULL to the list of allowed values, for example:
SELECT ERP_ServiceProcess.fiStatusNew, RMA.IdRMA
FROM ERP_ServiceProcess RIGHT OUTER JOIN
RMA ON ERP_ServiceProcess.fiRMA = RMA.IdRMA
WHERE (ERP_ServiceProcess.fiStatusNew IN (NULL, 1, 7, 8))
order by ERP_ServiceProcess.fiStatusNew
This gives the incorrect result because all records in RMA that have no records in sub-table ERP_ServiceProcess(where ERP_ServiceProcess.fiStatusNew IS NULL) are dropped.
I must use this (slow) query to get the correct result:
SELECT ERP_ServiceProcess.fiStatusNew, RMA.IdRMA
FROM ERP_ServiceProcess RIGHT OUTER JOIN
RMA ON ERP_ServiceProcess.fiRMA = RMA.IdRMA
WHERE (ERP_ServiceProcess.fiStatusNew IS NULL)
OR (ERP_ServiceProcess.fiStatusNew IN (1, 7, 8))
order by ERP_ServiceProcess.fiStatusNew
Why do I have to use the second, slow query although I used RIGHT OUTER JOIN and I've added NULL to the list?
Thank you in advance.
It doesn't work as you expect as it gets expanded to a bunch of equals operations
fiStatusNew = NULL OR fiStatusNew = 1 OR fiStatusNew = 7 OR fiStatusNew = 8
and anything = NULL is unknown.
Given this expansion, there's no particular reason to think that adding an additional OR using IS NULL would make things slower on its own (though the additional predicate might change the query plan to use a different access path if the statistics lead the optimizer to believe that the number of matching rows warrants it).
You see the same behaviour in the CASE operation
SELECT CASE NULL WHEN NULL THEN 'Yes' ELSE 'No' END /*Returns "No"*/
This is one reason why you should take particular care with the inverse operation NOT IN. If the list contains any NULL values you will always get an empty result set.
fiStatusNew NOT IN (NULL, 1,2)
Would expand to
fiStatusNew<> NULL and fiStatusNew<> 1 and fiStatusNew<> 2
or
Unknown And True/False/Unknown And True/False/Unknown
Which evaluates to Unknown or False, but never to True, under three-valued logic.
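These rules are easy to check by evaluating the expressions directly. The sketch below assumes SQLite via Python's sqlite3 module rather than SQL Server; SQLite reports Unknown as NULL (None in Python), True as 1, and False as 0. The helper name scalar is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(expr):
    """Hypothetical helper: evaluate a single SQL expression and return its value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

checks = {
    "NULL = NULL":             scalar("NULL = NULL"),              # None (Unknown)
    "NULL IS NULL":            scalar("NULL IS NULL"),             # 1
    "NULL IN (NULL, 1, 7, 8)": scalar("NULL IN (NULL, 1, 7, 8)"),  # None
    "1 IN (NULL, 1, 7, 8)":    scalar("1 IN (NULL, 1, 7, 8)"),     # 1
    "3 NOT IN (NULL, 1, 2)":   scalar("3 NOT IN (NULL, 1, 2)"),    # None
    "1 NOT IN (NULL, 1, 2)":   scalar("1 NOT IN (NULL, 1, 2)"),    # 0
    "CASE NULL WHEN NULL":     scalar("CASE NULL WHEN NULL THEN 'Yes' ELSE 'No' END"),
}

for expr, value in checks.items():
    print(f"{expr:28} -> {value!r}")
```

A WHERE clause keeps a row only when the condition is true, so both None and 0 mean the row is filtered out. That is why a NOT IN list containing NULL can never let a row through.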
Could you try using
ISNULL(ERP_ServiceProcess.fiStatusNew,0) IN (0, 1, 7, 8)
Untested but might be quicker than the 2nd query.
'ERP_ServiceProcess.fiStatusNew IN (NULL)' evaluates to 'ERP_ServiceProcess.fiStatusNew = NULL', and that is never true. NULL is defined in SQL Server as 'unknown', not as 'no value'. That's why NULL = NULL or NULL = @var (*) never evaluates to true. If you have two unknowns, you cannot check whether they are equal. Only 'IS NULL' works.
(*) Well, for sql server, you can set ANSI_NULLS to off but that's not really recommended as it is not standard sql behaviour.