SQL query using aggregate function


Taking the following table EXAMPLE:
Name | List | FlagByList
---------------------------
A | 1 | Y
A | 2 | Y
B | 1 | Y
B | 2 | N
C | - | -
C | - | -
I want to return the Names which have 'Y' in all Lists AND the Names which are not present in any list.

A simple aggregate query can do this (it treats the '-' values in the sample as NULLs):
SELECT Name
FROM EXAMPLE
GROUP BY Name
HAVING COUNT(1) = COUNT(CASE FlagByList WHEN 'Y' THEN 1 END) --every row has 'Y' as its value
OR COUNT(1) = COUNT(CASE WHEN FlagByList IS NULL THEN 1 END); --every row has NULL as its value
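A minimal sketch that runs the conditional-count query above against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in for your DBMS, the function name `names_all_y_or_unlisted` is hypothetical, and the '-' placeholders from the sample table are assumed to be NULLs.

```python
import sqlite3


def names_all_y_or_unlisted(rows):
    """Return names whose rows are all 'Y', or whose rows all have a NULL flag."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE example (Name TEXT, List INTEGER, FlagByList TEXT)")
    con.executemany("INSERT INTO example VALUES (?, ?, ?)", rows)
    query = """
        SELECT Name
        FROM example
        GROUP BY Name
        HAVING COUNT(*) = COUNT(CASE FlagByList WHEN 'Y' THEN 1 END)     -- every row is 'Y'
            OR COUNT(*) = COUNT(CASE WHEN FlagByList IS NULL THEN 1 END) -- every row is NULL
        ORDER BY Name
    """
    result = [name for (name,) in con.execute(query)]
    con.close()
    return result


# The '-' placeholders from the question are modelled as NULL (None in Python).
sample = [
    ("A", 1, "Y"), ("A", 2, "Y"),
    ("B", 1, "Y"), ("B", 2, "N"),
    ("C", None, None), ("C", None, None),
]
print(names_all_y_or_unlisted(sample))  # ['A', 'C']
```

COUNT(expression) skips NULLs, so each CASE counts only the rows it matches; comparing that to COUNT(*) means "every row in the group matches".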

With DECODE (Oracle):
select name from example
group by Name
having sum(decode(FlagByList, 'Y', 1, 0)) = count(*)
OR sum(decode(List, NULL, 1, 0)) = count(*)

Please make use of the code below. It works with SQL Server 2012.
DECLARE @Table TABLE (Name char(2), List char(2), FlagByList char(2))
INSERT @Table
(Name, List, FlagByList)
VALUES
('A','1','Y'),
('A','2','Y'),
('B','1','Y'),
('B','2','N'),
('C','-','-'),
('C','-','-')
-- UNION and EXCEPT are evaluated left to right, and both already return distinct names
SELECT Name FROM @Table WHERE FlagByList = 'Y'
UNION
SELECT Name FROM @Table WHERE FlagByList = '-'
EXCEPT
SELECT Name FROM @Table WHERE FlagByList = 'N'
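To check how the set operators combine, here is a sketch using Python's `sqlite3` module with the same sample rows (SQLite stands in for SQL Server; the function name `names_via_set_ops` is hypothetical). SQLite groups compound SELECTs from left to right, and SQL Server evaluates UNION and EXCEPT left to right as well, so the query computes (names with 'Y' UNION names with '-') EXCEPT names with 'N'.

```python
import sqlite3


def names_via_set_ops(rows):
    """(Y names UNION '-' names) EXCEPT N names, evaluated left to right."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Name TEXT, List TEXT, FlagByList TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT Name FROM t WHERE FlagByList = 'Y'
        UNION
        SELECT Name FROM t WHERE FlagByList = '-'
        EXCEPT
        SELECT Name FROM t WHERE FlagByList = 'N'
    """
    # UNION and EXCEPT already return distinct rows, so no DISTINCT is needed.
    result = sorted(name for (name,) in con.execute(query))
    con.close()
    return result


sample = [
    ("A", "1", "Y"), ("A", "2", "Y"),
    ("B", "1", "Y"), ("B", "2", "N"),
    ("C", "-", "-"), ("C", "-", "-"),
]
print(names_via_set_ops(sample))  # ['A', 'C']
```

Note that this returns any name with at least one 'Y' and no 'N', which matches "all 'Y'" only as long as 'Y', 'N' and '-' are the only flag values.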

Related

CASE expression on multiple columns

I have a table with below mentioned columns and values
StudentId | Geography | History | Maths
_______________________________________________
1 | NULL | 25 | NULL
2 | 20 | 23 | NULL
3 | 20 | 22 | 21
I need the output like below:
StudentId | Subject
___________________________
1 | History
2 | Geography
2 | History
3 | Geography
3 | History
3 | Maths
Wherever the value in a subject column (Geography, History or Maths) is NOT NULL, I need the name of the respective column as the Subject value.
I know how to do it for one column using CASE, but not how to do it for multiple columns.
Here is what I tried:
SELECT StudentId, CASE WHEN IsNUll(Geography, '#NULL#') <> '#NULL#' THEN 'Geography'
CASE WHEN IsNUll(History, '#NULL#') <> '#NULL#' THEN 'History'
CASE WHEN IsNUll(Maths, '#NULL#') <> '#NULL#' THEN 'Maths' END Subject
FROM MyTable

You need to normalise your data. You can do this with CROSS APPLY and a VALUES table constructor:
--Create sample data
WITH YourTable AS(
SELECT V.StudentID,
V.[Geography],
V.History,
V.Maths
FROM (VALUES(1,NULL,25,NULL),
(2,20,23,NULL),
(3,20,22,21))V(StudentID,[Geography], History, Maths))
--Solution
SELECT YT.StudentID,
V.[Subject]
FROM YourTable YT
CROSS APPLY (VALUES('Geography',YT.[Geography]),
('History',YT.History),
('Maths',YT.Maths))V([Subject],SubjectMark)
WHERE V.SubjectMark IS NOT NULL
ORDER BY YT.StudentID;

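Use UNION ALL, with one branch per subject column that returns the column name only where the mark is NOT NULL:
select StudentId, 'Geography' as Subject from MyTable where Geography is not null
union all
select StudentId, 'History' from MyTable where History is not null
union all
select StudentId, 'Maths' from MyTable where Maths is not null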
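A runnable sketch of this UNION ALL approach, again using SQLite through Python's `sqlite3` module as a stand-in. The table and column names follow the question; the function name `subjects_taken` is hypothetical.

```python
import sqlite3


def subjects_taken(rows):
    """Unpivot the subject columns with UNION ALL, keeping only non-NULL marks."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable "
                "(StudentId INTEGER, Geography INTEGER, History INTEGER, Maths INTEGER)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT StudentId, 'Geography' AS Subject FROM MyTable WHERE Geography IS NOT NULL
        UNION ALL
        SELECT StudentId, 'History' FROM MyTable WHERE History IS NOT NULL
        UNION ALL
        SELECT StudentId, 'Maths' FROM MyTable WHERE Maths IS NOT NULL
        ORDER BY StudentId, Subject
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


marks = [(1, None, 25, None), (2, 20, 23, None), (3, 20, 22, 21)]
print(subjects_taken(marks))
```

Each branch reads the table once, so for many subject columns the CROSS APPLY or UNPIVOT forms are usually preferable in SQL Server; UNION ALL is the portable fallback.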

You can use UNPIVOT, which returns the grades row by row and leaves out NULL values. The following code works:
SELECT * FROM MyTable t
UNPIVOT
(
[Grade] FOR [Subject] IN ([Geography], [History], [Maths])
) AS u

bitwise comparison in bit columns

I have a database table with columns shaped as following:
| ID | name | A | B | C | D |
| 1 | foo | 1 | 0 | 0 | 1 |
| 2 | bar | 0 | 0 | 1 | 1 |
| 3 | foo | 1 | 1 | 0 | 0 |
| 4 | bar | 1 | 1 | 0 | 0 |
A, B, C and D are bit columns.
I need to get the name values that appear in at least two rows, where those rows have at least one identical bit column set to true. The result set I want for the given example is:
| name |
| foo |
I can do the following:
SELECT l.name
FROM dummy l
INNER JOIN dummy r ON l.name = r.name
WHERE (l.A = 1 AND r.A = 1)
OR (l.B = 1 AND r.B = 1)
OR (l.C = 1 AND r.C = 1)
OR (l.D = 1 AND r.D = 1)
GROUP BY l.name
HAVING COUNT(*) > 1
But this quickly becomes unreadable because the real table is massive. I was wondering if there was a bitwise solution to this.

I suspect that your data model is wrong. It feels like A-D represent the same "type" of thing and so the data ought to be represented using a single column that contains the data values A-D and (if necessary) one column to store the 1 or 0, with separate rows for each A-D value. (But then, of course, we can use the presence of a row to indicate a 1 and the absence of the row to represent a 0).
We can use UNPIVOT to get this "better" structure for the data and then the query becomes trivial:
declare @t table (ID int not null, name char(3) not null, A bit not null, B bit not null,
C bit not null, D bit not null)
insert into @t(ID,name,A,B,C,D) values
(1,'foo',1,0,0,1),
(2,'bar',0,0,1,1),
(3,'foo',1,1,0,0),
(4,'bar',1,1,0,0)
;With ProperLayout as (
select ID,Name,Property,Value
from @t t
unpivot (Value for Property in (A,B,C,D)) u
where Value = 1
)
select name,Property
from ProperLayout
group by name,Property
having COUNT(*) > 1
Result:
name Property
---- ---------
foo A
(Note also that the top of my script is not much different in size to the sample data in your question but has the massive benefit that it's runnable)
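SQLite has no UNPIVOT, so this sketch (Python's `sqlite3` module, hypothetical function name `shared_bits`) builds the same "proper layout" with UNION ALL, one branch per bit column, keeping only the rows where the bit is 1, and then applies the same GROUP BY ... HAVING COUNT(*) > 1.

```python
import sqlite3


def shared_bits(rows):
    """Return (name, property) pairs where at least two rows have that bit set."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID INTEGER, name TEXT, "
                "A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    query = """
        WITH ProperLayout AS (
            -- one row per (ID, property) whose bit is set; this plays the role of UNPIVOT
            SELECT ID, name, 'A' AS Property FROM t WHERE A = 1
            UNION ALL SELECT ID, name, 'B' FROM t WHERE B = 1
            UNION ALL SELECT ID, name, 'C' FROM t WHERE C = 1
            UNION ALL SELECT ID, name, 'D' FROM t WHERE D = 1
        )
        SELECT name, Property
        FROM ProperLayout
        GROUP BY name, Property
        HAVING COUNT(*) > 1
        ORDER BY name, Property
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


data = [(1, "foo", 1, 0, 0, 1), (2, "bar", 0, 0, 1, 1),
        (3, "foo", 1, 1, 0, 0), (4, "bar", 1, 1, 0, 0)]
print(shared_bits(data))  # [('foo', 'A')]
```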

In a similar way, you could also use the APPLY operator:
SELECT a.name FROM dummy t
CROSS APPLY (
VALUES (t.name, 'A', t.A), (t.name, 'B', t.B), (t.name, 'C', t.C), (t.name, 'D', t.D)
) a(name, names, value)
WHERE a.value = 1
GROUP BY a.name, a.names, a.value
HAVING COUNT(*) > 1

From your description, you seem to want:
SELECT l.name
FROM dummy l
GROUP BY l.name
HAVING SUM( CAST(A as int) ) >= 2 OR
SUM( CAST(B as int) ) >= 2 OR
SUM( CAST(C as int) ) >= 2 OR
SUM( CAST(D as int) ) >= 2 ;
This is based on the description. I don't know what the same result row has to do with the question.

It is not hard to read. It is just long.
This would be more efficient:
SELECT distinct l.name
FROM dummy l
INNER JOIN dummy r
ON l.name = r.name
and l.id < r.id
and ( (l.A = 1 AND r.A = 1)
OR (l.B = 1 AND r.B = 1)
OR (l.C = 1 AND r.C = 1)
OR (l.D = 1 AND r.D = 1)
)
order by l.name
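You could also generate the long condition dynamically by reading the column names from sys.columns.
T-SQL does have bitwise operators (&, |, ^ and ~), but they work on integer values rather than across separate bit columns, so you would still have to combine the columns yourself first.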
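If you do want a bitwise formulation, one option is to pack the bit columns into a single integer mask and test each pair of rows with the & operator. This is a sketch under assumptions: the weights A=8, B=4, C=2, D=1 are an arbitrary choice, SQLite via Python's `sqlite3` stands in for SQL Server (where you may need CAST(A AS int) before doing arithmetic on bit columns), and the function name `names_sharing_a_bit` is hypothetical.

```python
import sqlite3


def names_sharing_a_bit(rows):
    """Names with two different rows that share at least one set bit."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dummy (ID INTEGER, name TEXT, "
                "A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
    con.executemany("INSERT INTO dummy VALUES (?, ?, ?, ?, ?, ?)", rows)
    query = """
        WITH packed AS (
            -- pack the four bit columns into one integer: A=8, B=4, C=2, D=1
            SELECT ID, name, A * 8 + B * 4 + C * 2 + D AS mask
            FROM dummy
        )
        SELECT DISTINCT l.name
        FROM packed l
        JOIN packed r
          ON l.name = r.name
         AND l.ID < r.ID             -- each pair once, never a row with itself
         AND (l.mask & r.mask) <> 0  -- at least one bit set in both rows
        ORDER BY l.name
    """
    result = [name for (name,) in con.execute(query)]
    con.close()
    return result


data = [(1, "foo", 1, 0, 0, 1), (2, "bar", 0, 0, 1, 1),
        (3, "foo", 1, 1, 0, 0), (4, "bar", 1, 1, 0, 0)]
print(names_sharing_a_bit(data))  # ['foo']
```

The column list only has to be written once, in the mask expression, which could itself be generated from sys.columns.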

Faster SQL query with CASE in JOIN instead of CASE in SELECT statement of query?

I have a view of CommunityMembers where each member has a primary key, ID. Some also have old IDs from another system and some have a spouse ID. All IDs are unique.
e.g.:
ID | Name | OldID | SpouseID | SpouseName
1 | John.Smith | o71 | s99 | Jenna.Smith
2 | Jane.Doe | o72 | |
3 | Jessie.Jones | |
I also have a view of ActivityDates where each community member can have multiple activity dates. There are activity dates for old IDs and for spouse IDs. (Unfortunately I can't clean the data up by converting old IDs to new ones.)
e.g.:
ID | ActivityDate | ActiviyType | ActivityGroup
1 | 2017-12-31 | 1 | 1
1 | 2017-12-31 | 3 | 2
1 | 2017-12-31 | 7 | 1
2 | 2017-12-31 | 1 | 1
3 | 2017-12-31 | 1 | 1
o72 | 2010-12-31 | 1 | 2
o72 | 2010-12-31 | 3 | 1
s99 | 2017-12-31 | 1 | 1
s99 | 2017-12-31 | 2 | 1
I can select the data the way I need it with the following query, which uses CASE with subqueries to check the 3 possible IDs. It is very slow, though, because it runs several subqueries per record:
SELECT
C.ID,
C.Name,
C.OldID,
C.SpouseID,
C.SpouseName,
CASE
WHEN C.ID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType > 1 AND ActivityGroup > 1)
OR C.OldID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType > 1 AND ActivityGroup > 1)
OR C.SpouseID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType > 1 AND ActivityGroup > 1)
THEN 'Yes'
ELSE ''
END AS Result -- i.e. HasTheCommunityMemberOrTheirSpouseOnlyEverAttendedActivityTypeAndGroup1After2016?
FROM CommunityMembers C
So I would expect the following results, which I get, it is just slow:
ID | Name | OldID | SpouseID | SpouseName | Result
1 | John.Smith | o71 | s99 | Jenna.Smith |
2 | Jane.Doe | o72 | | | Yes
3 | Jessie.Jones | | | | Yes
I appreciate that there are better ways to do this, and I'm happy to hear suggestions, but I have limited flexibility to change this system. Setting that aside, all I am asking is how I can make this faster. Ideally I want to join to the table and apply the conditions there, but I can't work out how, e.g.:
SELECT
C.ID, C.Name,
C.OldID, C.SpouseID, C.SpouseName,
R.Result
FROM
CommunityMembers C
JOIN
CASE WHEN Date ... Type ... Group ... ELSE ... IN ... Not Exist ... THEN ... ActivityDates R
or
SELECT
C.ID, C.Name,
C.OldID, C.SpouseID, C.SpouseName,
CASE
WHEN R.Date ... R.Type ... R.Group ... ELSE ... THEN 'Yes' END AS Result
FROM
CommunityMembers C
JOIN
ActivityDates R
I suspect I need to make multiple joins though I don't know how to write it.
Thank you

An index is created like this:
CREATE INDEX index_name
ON table_name (column1, column2, ...);

You want information from table ActivityDates per ID. So group by ID and filter the desired IDs in HAVING:
SELECT ID
FROM ActivityDates
WHERE ActivityDate > '2016-12-31'
GROUP BY ID
HAVING COUNT(CASE WHEN ActiviyType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
AND COUNT(CASE WHEN ActiviyType > 1 OR ActivityGroup > 1 THEN 1 END) = 0
You can use this with an EXISTS clause:
select
c.*,
case when exists
(
SELECT a.ID
FROM ActivityDates a
WHERE a.ActivityDate > '2016-12-31'
AND a.ID in (c.id, c.oldid, c.spouseid)
GROUP BY a.ID
HAVING COUNT(CASE WHEN a.ActiviyType = 1 AND a.ActivityGroup = 1 THEN 1 END) > 0
AND COUNT(CASE WHEN a.ActiviyType > 1 OR a.ActivityGroup > 1 THEN 1 END) = 0
) then 'Yes' else '' end as result
from CommunityMembers c;
Appropriate indexes to speed this up may be:
create index idx1 on ActivityDates (ID, ActivityDate, ActiviyType, ActivityGroup);
create index idx2 on ActivityDates (ActivityDate, ID, ActiviyType, ActivityGroup);
Find out whether one of them gets used and drop the other (or both, in case neither gets used).
It is possible that a non-correlated subquery (which the main query then has to reference three times) performs better. Whether the optimizer even comes up with a different execution plan depends on the DBMS:
with good_ids as
(
select id
from activitydates
where activitydate > '2016-12-31'
group by id
having count(case when activiytype = 1 and activitygroup = 1 then 1 end) > 0
and count(case when activiytype > 1 or activitygroup > 1 then 1 end) = 0
)
select
c.*,
case when id in (select id from good_ids)
or oldid in (select id from good_ids)
or spouseid in (select id from good_ids)
then 'Yes' else ''
end as result
from CommunityMembers c;
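The non-correlated good_ids version can be checked against the question's sample data with this sketch (SQLite via Python's `sqlite3` as a stand-in; the function name `flag_members` is hypothetical). It assumes the rule is "at least one type-1/group-1 activity after 2016 and no other type or group", which is why it uses > 0 and OR; with that reading it reproduces the expected output from the question.

```python
import sqlite3

# Sample rows copied from the question; ActiviyType keeps the question's spelling.
MEMBERS = [
    ("1", "John.Smith", "o71", "s99", "Jenna.Smith"),
    ("2", "Jane.Doe", "o72", None, None),
    ("3", "Jessie.Jones", None, None, None),
]
ACTIVITIES = [
    ("1", "2017-12-31", 1, 1), ("1", "2017-12-31", 3, 2), ("1", "2017-12-31", 7, 1),
    ("2", "2017-12-31", 1, 1), ("3", "2017-12-31", 1, 1),
    ("o72", "2010-12-31", 1, 2), ("o72", "2010-12-31", 3, 1),
    ("s99", "2017-12-31", 1, 1), ("s99", "2017-12-31", 2, 1),
]


def flag_members(members, activities):
    """Return (ID, Result) pairs using a non-correlated good_ids CTE."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE CommunityMembers "
                "(ID TEXT, Name TEXT, OldID TEXT, SpouseID TEXT, SpouseName TEXT)")
    con.execute("CREATE TABLE ActivityDates "
                "(ID TEXT, ActivityDate TEXT, ActiviyType INTEGER, ActivityGroup INTEGER)")
    con.executemany("INSERT INTO CommunityMembers VALUES (?, ?, ?, ?, ?)", members)
    con.executemany("INSERT INTO ActivityDates VALUES (?, ?, ?, ?)", activities)
    query = """
        WITH good_ids AS (
            -- IDs with a type-1/group-1 activity after 2016 and nothing else after 2016
            SELECT ID
            FROM ActivityDates
            WHERE ActivityDate > '2016-12-31'  -- ISO date strings compare correctly as text
            GROUP BY ID
            HAVING COUNT(CASE WHEN ActiviyType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
               AND COUNT(CASE WHEN ActiviyType > 1 OR ActivityGroup > 1 THEN 1 END) = 0
        )
        SELECT c.ID,
               CASE WHEN c.ID IN (SELECT ID FROM good_ids)
                      OR c.OldID IN (SELECT ID FROM good_ids)
                      OR c.SpouseID IN (SELECT ID FROM good_ids)
                    THEN 'Yes' ELSE '' END AS Result
        FROM CommunityMembers c
        ORDER BY c.ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(flag_members(MEMBERS, ACTIVITIES))  # [('1', ''), ('2', 'Yes'), ('3', 'Yes')]
```

With COUNT(...) > 1 instead of > 0, members 2 and 3 (one qualifying row each) would not be flagged.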

You should try to explain the expected output. It is difficult to work out the correct business rule from a wrong query, and explaining it is how you will get the best query here. Please explain again why IDs 2 and 3 are 'Yes'; then I will rewrite my query.
The second biggest mistake you are about to make is creating an index before you understand your business rule and have written a correct query.
Try this,
declare @t table(ID varchar(20),Name varchar(40),OldID varchar(20), SpouseID varchar(20)
, SpouseName varchar(40))
insert into @t VALUES
('1','John.Smith','o71' ,'s99','Jenna.Smith')
,('2','Jane.Doe' ,'o72',null,null)
,('3','Jessie.Jones',null,null,null)
--select * from @t
declare @ActivityDates table(ID varchar(20), ActivityDate date
, ActiviyType int, ActivityGroup int)
insert into @ActivityDates VALUES
('1','2017-12-31',1, 1)
,('1','2017-12-31',3, 2)
,('1','2017-12-31',7, 1)
,('2','2017-12-31',1, 1)
,('3','2017-12-31',1, 1)
,('o72','2010-12-31',1, 2)
,('o72','2010-12-31',3, 1)
,('s99','2017-12-31',1, 1)
,('s99','2017-12-31',2, 1)
SELECT t.*
,case when tbl.id is not null then 'Yes' else null end Remarks
from @t t
left JOIN
(select * from @ActivityDates ad
WHERE(( ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1
AND NOT EXISTS (SELECT ID FROM @ActivityDates ad1 WHERE (ad.id=ad1.id) AND
ad1.ActivityDate > '2016-12-31' AND (ad1.ActiviyType > 1 or ad1.ActivityGroup > 1))
)
))tbl
on t.ID=tbl.ID

Here is another pattern for utilising 'optional joins' that may or may not perform better. It's not quite the same as your output - I'm not sure what you're after there.
SELECT A.*,
COALESCE(C1.Name, C2.Name, C3.Name) As Name
FROM ActivityDates A
LEFT OUTER JOIN CommunityMembers As C1
ON C1.ID = A.ID
LEFT OUTER JOIN CommunityMembers As C2
ON C2.OldID = CAST(A.ID AS VARCHAR(12))
LEFT OUTER JOIN CommunityMembers As C3
ON C3.SpouseID = CAST(A.ID AS VARCHAR(12))
There are cases where this will 'double count' but if you are certain that the entire collection of id's is unique you should be fine. If you only want to know if an activity record exists you can definitely speed this up by using exists but again I don't follow your logic.

SELECT First Group

Problem Definition
I have an SQL query that looks like:
SELECT *
FROM table
WHERE criteria = 1
ORDER BY group;
Result
I get:
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
B | 2 | 1
B | 3 | 1
Expected Result
However, I would like to limit the results to only the first group (in this instance, A), i.e.,
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
What I've tried
Group By
SELECT *
FROM table
WHERE criteria = 1
GROUP BY group;
I can aggregate the groups using a GROUP BY clause, but that would give me:
group | value
-------------
A | 0
B | 2
or some aggregate function of EACH group. However, I don't want to aggregate the rows!
Subquery
I can also specify the group by subquery:
SELECT *
FROM table
WHERE criteria = 1 AND
group = (
SELECT group
FROM table
WHERE criteria = 1
ORDER BY group ASC
LIMIT 1
);
This works, but as always, subqueries are messy. Particularly, this one requires specifying my WHERE clause for criteria twice. Surely there must be a cleaner way to do this.

You can try the following query:
SELECT *
FROM table
WHERE criteria = 1
AND group = (SELECT MIN(group) FROM table WHERE criteria = 1)
ORDER BY value;

If your database supports the WITH clause, try this. It's similar to using a subquery, but you only need to specify the criteria once, and it's easier to understand what's going on.
with main_query as (
select *
from table
where criteria = 1
),
min_group as (
select min(group) as first_group from main_query
)
select *
from main_query
where group = (select first_group from min_group)
order by group, value;
-- this where clause should be fast since there is only 1 row in min_group
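A runnable sketch of the WITH version using SQLite through Python's `sqlite3`. Because `table` and `group` are reserved words, the table is given the hypothetical name `items` and the column is double-quoted; the function name `first_group_rows` is also hypothetical.

```python
import sqlite3


def first_group_rows(rows):
    """Return all rows of the smallest group among rows with criteria = 1."""
    con = sqlite3.connect(":memory:")
    # "group" is a reserved word, so it is double-quoted throughout.
    con.execute('CREATE TABLE items ("group" TEXT, value INTEGER, criteria INTEGER)')
    con.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    query = """
        WITH main_query AS (
            SELECT * FROM items WHERE criteria = 1  -- criteria written only once
        )
        SELECT *
        FROM main_query
        WHERE "group" = (SELECT MIN("group") FROM main_query)
        ORDER BY "group", value
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


sample = [("A", 0, 1), ("A", 1, 1), ("B", 2, 1), ("B", 3, 1)]
print(first_group_rows(sample))  # [('A', 0, 1), ('A', 1, 1)]
```

Taking MIN over main_query (already filtered) matters: if group A had no rows with criteria = 1, the first group returned would correctly be B.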

Use DENSE_RANK()
DECLARE @yourTbl AS TABLE (
[group] NVARCHAR(50),
value INT,
criteria INT
)
INSERT INTO @yourTbl VALUES ( 'A', 0, 1 )
INSERT INTO @yourTbl VALUES ( 'A', 1, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 2, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 3, 1 )
;WITH cte AS
(
SELECT i.* ,
DENSE_RANK() OVER (ORDER BY i.[group]) AS gn
FROM @yourTbl AS i
WHERE i.criteria = 1
)
SELECT *
FROM cte
WHERE gn = 1
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1

SQL:Query to check if a column meets certain criteria, if it does perform one action if it doesn't perform another

I have found it quite hard to word what I want to do in the title so I will try my best to explain now!
I have two tables which I am using:
Master_Tab and Parts_Tab
Parts_Tab has the following information:
| Order_Number | Completed | Part_Number |
| 1 | Y | 64 |
| 2 | N | 32 |
| 3 | Y | 42 |
| 1 | N | 32 |
| 1 | N | 5 |
Master_Tab has the following information:
| Order_Number |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
I want to generate a query which will return ALL of the Order_Numbers listed in the Master_Tab on the following conditions...
For each Order_Number I want to check the Parts_Tab table to see if there are any parts which aren't complete (Completed = 'N'). For each Order_Number I then want to count the number of uncompleted parts the order has against it. If an Order_Number has no uncompleted parts, or it is not in Parts_Tab at all, I want the count value to be 0.
So the table that would be generated would look like this:
| Order_Number | Count_of_Non_Complete_Parts |
| 1 | 2 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
I was hoping that using a different kind of join on the tables would do this but I am clearly missing the trick!
Any help is much appreciated!
Thanks.

I have used COALESCE to convert NULL to zero where necessary. Depending on your database platform, you may need to use another method, e.g. ISNULL or CASE.
select mt.Order_Number,
coalesce(ptc.Count, 0) as Count_of_Non_Complete_Parts
from Master_Tab mt
left outer join (
select Order_Number, count(*) as Count
from Parts_Tab
where Completed = 'N'
group by Order_Number
) ptc on mt.Order_Number = ptc.Order_Number
order by mt.Order_Number
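The LEFT JOIN to a pre-aggregated subquery can be checked on the question's data with this sketch (SQLite via Python's `sqlite3` as a stand-in; COALESCE works the same way there, and the function name `count_incomplete_parts` is hypothetical).

```python
import sqlite3


def count_incomplete_parts(master, parts):
    """Per order: number of parts with Completed = 'N', or 0 if there are none."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Master_Tab (Order_Number INTEGER)")
    con.execute("CREATE TABLE Parts_Tab "
                "(Order_Number INTEGER, Completed TEXT, Part_Number INTEGER)")
    con.executemany("INSERT INTO Master_Tab VALUES (?)", [(n,) for n in master])
    con.executemany("INSERT INTO Parts_Tab VALUES (?, ?, ?)", parts)
    query = """
        SELECT mt.Order_Number,
               COALESCE(ptc.Count, 0) AS Count_of_Non_Complete_Parts
        FROM Master_Tab mt
        LEFT OUTER JOIN (
            SELECT Order_Number, COUNT(*) AS Count
            FROM Parts_Tab
            WHERE Completed = 'N'
            GROUP BY Order_Number
        ) ptc ON mt.Order_Number = ptc.Order_Number
        ORDER BY mt.Order_Number
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


PARTS = [(1, "Y", 64), (2, "N", 32), (3, "Y", 42), (1, "N", 32), (1, "N", 5)]
print(count_incomplete_parts([1, 2, 3, 4, 5], PARTS))
```

Orders with no 'N' parts (3) and orders absent from Parts_Tab (4, 5) get NULL from the outer join, which COALESCE turns into 0.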

You are looking for a LEFT JOIN.
SELECT mt.order_number, count(part_number) AS count_noncomplete_parts
FROM master_tab mt LEFT JOIN parts_tab pt
ON mt.order_number=pt.order_number AND pt.completed='N'
GROUP BY mt.order_number;
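It is also possible to put pt.completed='N' into a WHERE clause, but you have to be careful of NULLs and of orders whose parts are all completed. Instead of the AND you can have
WHERE pt.completed='N' OR pt.completed IS NULL
which keeps orders that have no parts at all, but still drops orders such as 3 whose only parts are completed.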
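To see why the placement of pt.completed = 'N' matters, this sketch (SQLite via Python's `sqlite3`; the function name `compare_filter_placement` is hypothetical) runs both variants on the question's data. With the filter in ON, every order survives the outer join; with the filter in WHERE, order 3 joins only to a completed ('Y') part, which the WHERE clause then removes, so order 3 disappears from the result.

```python
import sqlite3


def compare_filter_placement():
    """Return (rows with the filter in ON, rows with the filter in WHERE)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE master_tab (order_number INTEGER)")
    con.execute("CREATE TABLE parts_tab "
                "(order_number INTEGER, completed TEXT, part_number INTEGER)")
    con.executemany("INSERT INTO master_tab VALUES (?)", [(n,) for n in range(1, 6)])
    con.executemany("INSERT INTO parts_tab VALUES (?, ?, ?)",
                    [(1, "Y", 64), (2, "N", 32), (3, "Y", 42), (1, "N", 32), (1, "N", 5)])
    # Filter inside ON: unmatched orders still appear, with a NULL part_number.
    on_filter = con.execute("""
        SELECT mt.order_number, COUNT(pt.part_number)
        FROM master_tab mt
        LEFT JOIN parts_tab pt
          ON mt.order_number = pt.order_number AND pt.completed = 'N'
        GROUP BY mt.order_number
        ORDER BY mt.order_number
    """).fetchall()
    # Filter in WHERE: applied after the join, so order 3's only ('Y') row is discarded.
    where_filter = con.execute("""
        SELECT mt.order_number, COUNT(pt.part_number)
        FROM master_tab mt
        LEFT JOIN parts_tab pt
          ON mt.order_number = pt.order_number
        WHERE pt.completed = 'N' OR pt.completed IS NULL
        GROUP BY mt.order_number
        ORDER BY mt.order_number
    """).fetchall()
    con.close()
    return on_filter, where_filter


print(compare_filter_placement())
```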

SELECT mt.Order_Number, COALESCE(SUM(tbl.Incomplete), 0) AS Count_of_Non_Complete_Parts
FROM Master_Tab mt
LEFT JOIN (
SELECT Order_Number, CASE WHEN Completed = 'N' THEN 1 ELSE 0 END AS Incomplete
FROM Parts_Tab
) tbl ON mt.Order_Number = tbl.Order_Number
GROUP BY mt.Order_Number
Add a WHERE clause to the outer query if you need to filter for specific order numbers.

I think it's easiest to use a subquery here. I think this should be self-explanatory; if not, feel free to ask any questions.
CREATE TABLE #Parts
(
Order_Number int,
Completed char(1),
Part_Number int
)
CREATE TABLE #Master
(
Order_Number int
)
INSERT INTO #Parts
SELECT 1, 'Y', 64 UNION ALL
SELECT 2, 'N', 32 UNION ALL
SELECT 3, 'Y', 42 UNION ALL
SELECT 1, 'N', 32 UNION ALL
SELECT 1, 'N', 5
INSERT INTO #Master
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6
SELECT M.Order_Number, ISNULL(Totals.NonCompletedCount, 0) AS Count_of_Non_Complete_Parts FROM #Master M
LEFT JOIN (SELECT P.Order_Number, COUNT(*) AS NonCompletedCount FROM #Parts P
WHERE P.Completed = 'N'
GROUP BY P.Order_Number) Totals ON Totals.Order_Number = M.Order_Number
