Filter query with a GROUP BY based on column not in GROUP BY statement


Given the following table structure and sample data:
+------------+------+----------+
| EmployeeID | Name | WorkWeek |
+------------+------+----------+
| 1          | A    | 1        |
| 2          | B    | 1        |
| 2          | B    | 2        |
| 3          | C    | 1        |
| 3          | C    | 2        |
| 4          | D    | 2        |
+------------+------+----------+
I am looking to select all employees that only worked week 1 (so in this example, only EmployeeID = 1 would be returned). I am able to get the data with the following query:
SELECT EmployeeId, Name
FROM SomeTable
GROUP BY EmployeeId, Name
HAVING SUM ( WorkWeek ) = 1;
To me, the HAVING SUM( WorkWeek ) = 1 is a hack and this should be handled with some form of a GROUP BY and COUNT but I cannot wrap my head around how that query would be structured.
Any help would be useful and enlightening.

HAVING SUM( WorkWeek ) = 1 may work for week 1 or 2, but will fail for week 3 (since 1+2 = 3).
Use NOT EXISTS operator with a subquery instead:
SELECT EmployeeId, Name
FROM SomeTable t1
WHERE NOT EXISTS (
SELECT * FROM SomeTable t2
WHERE t1.EmployeeId = t2.EmployeeId
AND t2.WorkWeek <> 1
)
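To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. It loads the sample data plus one hypothetical extra row (employee 5, who worked only week 3, which is not in the question) and asks for employees who worked only week 3. The third query is one possible COUNT-based form of the same filter; it is an illustration, not something taken from the answers here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (EmployeeId INTEGER, Name TEXT, WorkWeek INTEGER)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?, ?)",
    [
        (1, "A", 1), (2, "B", 1), (2, "B", 2),
        (3, "C", 1), (3, "C", 2), (4, "D", 2),
        (5, "E", 3),  # hypothetical row, added only to show the week-3 case
    ],
)

TARGET_WEEK = 3

# SUM-based filter: also matches employees whose weeks merely add up to 3.
sum_hack = conn.execute(
    """
    SELECT EmployeeId FROM SomeTable
    GROUP BY EmployeeId, Name
    HAVING SUM(WorkWeek) = ?
    ORDER BY EmployeeId
    """,
    (TARGET_WEEK,),
).fetchall()

# NOT EXISTS: keeps employees that have no row for any other week.
not_exists = conn.execute(
    """
    SELECT EmployeeId FROM SomeTable t1
    WHERE NOT EXISTS (
        SELECT * FROM SomeTable t2
        WHERE t1.EmployeeId = t2.EmployeeId
          AND t2.WorkWeek <> ?
    )
    ORDER BY EmployeeId
    """,
    (TARGET_WEEK,),
).fetchall()

# GROUP BY + COUNT form: count the rows for other weeks and require zero.
count_based = conn.execute(
    """
    SELECT EmployeeId FROM SomeTable
    GROUP BY EmployeeId, Name
    HAVING COUNT(CASE WHEN WorkWeek <> ? THEN 1 END) = 0
    ORDER BY EmployeeId
    """,
    (TARGET_WEEK,),
).fetchall()

print(sum_hack)     # employees 2 and 3 slip through because 1 + 2 = 3
print(not_exists)
print(count_based)
```

Note that the NOT EXISTS form returns one row per matching source row, so if an employee could have several rows for the same week, a DISTINCT would be needed.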

Actually, that's exactly what the HAVING clause is for: filtering groups according to aggregated values.
From w3schools sql tutorial:
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.

Related

How to combine rows and columns?

I have two tables that I need to combine.
This is the first table.
----------------------------
| row_no | Part No |Qty_A |
----------------------------
| 1 | A | 100 |
| 2 | A | 300 |
----------------------------
Second table.
----------------------------
| row_no | Part No |Qty_B |
----------------------------
| 1 | A | 400 |
| 2 | B | 200 |
----------------------------
This is the result I want:
--------------------------------------
| row_no | Part No | Qty_A | Qty_B |
--------------------------------------
| 1 | A | 100 | 400 |
| 2 | A | 300 | - |
| 2 | B | - | 200 |
--------------------------------------
The two tables are joined on the "row_no" and "Part_no" columns.
I tried a "LEFT OUTER JOIN", but the results are not as expected.
SELECT t1.row_no ,t1.part_no ,t1.Qty_A ,t2.Qty_B
FROM
(SELECT 1 as row_no,'A' as part_no,100 as Qty_A) as t1
LEFT OUTER JOIN
(SELECT 1 as row_no, 'B' as part_no,200 as Qty_B) as t2
ON t1.row_no = t2.row_no and t1.part_no = t2.part_no
Sorry for my unclear example.
Update
This is example from a large transaction.
I need to group it by Part_no and re-arrange it by row number, as shown above.

Try below query with union all:
select row_no ,part_no ,Qty_A , '-' as Qty_B from tableA
union all
select row_no ,part_no ,'-' as Qty_A , Qty_B from tableb
or you can try with full outer join:
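-- COALESCE keeps the key columns populated for rows that exist only in t2
SELECT COALESCE(t1.row_no, t2.row_no) as row_no,
COALESCE(t1.part_no, t2.part_no) as part_no,
t1.Qty_A, t2.Qty_B
FROM
(SELECT 1 as row_no,'A' as part_no,100 as Qty_A) as t1
full OUTER JOIN
(SELECT 1 as row_no, 'B' as part_no,200 as Qty_B) as t2
ON t1.row_no = t2.row_no and t1.part_no = t2.part_no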

The UNION operator is used to combine the result sets of two or more SELECT statements.
- Each SELECT statement within a UNION must have the same number of columns
- The columns must also have similar data types
- The columns in each SELECT statement must also be in the same order
The first query in the UNION statement defines the column names.
So in your case you could use:
select row_no ,part_no ,Qty_A , null as Qty_B from table1
union all
select row_no ,part_no , null, Qty_B from table2
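On its own, the UNION ALL above still leaves row 1 / part A on two separate rows. One possible way to fold them into the single row shown in the question's desired result is to group the combined rows by row_no and part_no. The sketch below tries this with Python's sqlite3 module; the table names table1 and table2 follow the answer, while the column name part_no and the MAX-based folding are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (row_no INTEGER, part_no TEXT, Qty_A INTEGER)")
conn.execute("CREATE TABLE table2 (row_no INTEGER, part_no TEXT, Qty_B INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [(1, "A", 100), (2, "A", 300)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [(1, "A", 400), (2, "B", 200)])

# Stack both tables with NULL placeholders, then collapse rows that share
# the same (row_no, part_no) key. MAX ignores NULLs, so it picks whichever
# side actually supplied a quantity.
combined = conn.execute(
    """
    SELECT row_no, part_no, MAX(Qty_A) AS Qty_A, MAX(Qty_B) AS Qty_B
    FROM (
        SELECT row_no, part_no, Qty_A, NULL AS Qty_B FROM table1
        UNION ALL
        SELECT row_no, part_no, NULL, Qty_B FROM table2
    ) AS u
    GROUP BY row_no, part_no
    ORDER BY row_no, part_no
    """
).fetchall()

for row in combined:
    print(row)  # missing quantities come back as None (SQL NULL)
```

This assumes each (row_no, part_no) pair appears at most once per table; with repeated keys, MAX would silently drop quantities and SUM might be the better aggregate.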

Select non-unique id where no row meets criteria

Say I have this table, and I want to select the IDs for which every D is < 4. In this case it would only select ID 1, because ID 2 has a D > 4 and ID 3 also has a D > 4.
+----+---+------+
| ID | D | U-ID |
+----+---+------+
| 1 | 1 | a |
+----+---+------+
| 1 | 2 | b |
+----+---+------+
| 2 | 5 | c |
+----+---+------+
| 3 | 5 | d |
+----+---+------+
| 3 | 2 | e |
+----+---+------+
| 3 | 3 | f |
+----+---+------+
I really don't even know where to start making a query for this, and my sql isn't good enough yet to know what to google, so I'm sorry if this has been asked before.

I would simply do:
select id
from table
group by id
having max(d) < 4;
If you happened to want all the original rows, I would use a window function:
select t.*
from (select t.*, max(d) over (partition by id) as maxd
from t
) t
where maxd < 4;

Here's one option using conditional aggregation:
select id
from yourtable
group by id
having count(case when d >= 4 then 1 end) = 0
If you need all the data from the corresponding rows/columns, you can either join back to the table using the above, or alternatively you could use not exists:
select *
from yourtable t
where not exists (
select 1
from yourtable t2
where t.id = t2.id and
t2.d >= 4
)
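For comparison, here is a small sketch that runs both queries against the question's sample data using Python's sqlite3 module. It also runs a plain row-level filter (WHERE d < 4) to show why that alone is not enough: it keeps ID 3, which has some rows below 4 but not all. The column name u_id stands in for U-ID and is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, d INTEGER, u_id TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "b"), (2, 5, "c"), (3, 5, "d"), (3, 2, "e"), (3, 3, "f")],
)

# Conditional aggregation: an id qualifies when it has zero rows with d >= 4.
ids_all_below = conn.execute(
    """
    SELECT id FROM yourtable
    GROUP BY id
    HAVING COUNT(CASE WHEN d >= 4 THEN 1 END) = 0
    """
).fetchall()

# NOT EXISTS: returns the full rows of the qualifying ids.
rows_all_below = conn.execute(
    """
    SELECT * FROM yourtable t
    WHERE NOT EXISTS (
        SELECT 1 FROM yourtable t2
        WHERE t.id = t2.id AND t2.d >= 4
    )
    ORDER BY u_id
    """
).fetchall()

# Row-level filter only: tests each row on its own, not the id as a whole.
row_filter_ids = conn.execute(
    "SELECT DISTINCT id FROM yourtable WHERE d < 4 ORDER BY id"
).fetchall()

print(ids_all_below)
print(rows_all_below)
print(row_filter_ids)  # includes id 3 even though one of its rows has d = 5
```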

use this query.
select ID from yourtablename where D < 4;

SQL : Getting duplicate rows along with other variables

I am working on Teradata SQL. I would like to get the duplicate rows together with their count and the other columns as well. I can only find ways to get the count, but not the other columns along with it.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.

You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM Table_NAME
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM Table_NAME
QUALIFY duplicates > 1

As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
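As a quick check of the join-back approach, the sketch below runs it on the question's sample rows with Python's sqlite3 module rather than Teradata, so it only demonstrates the standard-SQL part (QUALIFY is not available there). The column name entry_date stands in for the question's Date column and is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_NAME (id INTEGER, name TEXT, entry_date TEXT)")
conn.executemany(
    "INSERT INTO Table_NAME VALUES (?, ?, ?)",
    [
        (1, "abc", "21.03.2015"),
        (1, "def", "22.04.2015"),
        (2, "ajk", "22.03.2015"),
        (3, "ghi", "23.03.2015"),
        (3, "ghi", "23.03.2015"),
    ],
)

# Count rows per id in a derived table, keep only ids that occur more
# than once, then join back to fetch every original row for those ids.
duplicates = conn.execute(
    """
    SELECT TN.id, TN.name, TN.entry_date, cnt.Count_Dups
    FROM Table_NAME TN
    INNER JOIN (
        SELECT id, COUNT(1) AS Count_Dups
        FROM Table_NAME
        GROUP BY id
        HAVING COUNT(1) > 1
    ) cnt ON cnt.id = TN.id
    ORDER BY TN.id, TN.name
    """
).fetchall()

for row in duplicates:
    print(row)
```

Dropping the HAVING line would return every row with its per-id count, including id 2 with a count of 1, which matches the question's expected output table.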

try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1

select top 1 with max 2 fields

I have this table :
+------+-------+------------------------------------+
| id | rev | class |
+------+-------+------------------------------------+
| 1 | 10 | 2 |
| 1 | 10 | 5 |
| 2 | 40 | 6 |
| 2 | 50 | 6 |
| 2 | 52 | 1 |
| 3 | 33 | 3 |
| 3 | 63 | 5 |
+------+-------+------------------------------------+
For each id, I only need the row with the highest rev and, among those, the highest class.
+------+-------+------------------------------------+
| id | rev | class |
+------+-------+------------------------------------+
| 1 | 10 | 5 |
| 2 | 52 | 1 |
| 3 | 63 | 5 |
+------+-------+------------------------------------+
Query cost is important for me.

Just the rows that satisfy the condition of having both max values?
SELECT h.id, h.rev, h.class
FROM ( SELECT id,
MAX( rev ) rev,
MAX( class ) class
FROM Herp
GROUP BY id ) derp
INNER JOIN Herp h
ON h.id = derp.id
AND h.rev = derp.rev
AND h.class = derp.class;

The fastest way might be to have an index on t(id, rev) and t(id, class) and then do:
select t.*
from table t
where not exists (select 1
from table t2
where t2.id = t.id and t2.rev > t.rev
) and
not exists (select 1
from table t2
where t2.id = t.id and t2.class > t.class
);
SQL Server is pretty smart in terms of optimization, so the aggregation approach might be just as good. However, in terms of performance, this is just a bunch of index lookups.

Here is a SQL Server 2012 example. It is very straightforward, using a derived table and ROW_NUMBER() with a PARTITION BY clause.
Basically, with each id as a partition/group, sort the rows by the other fields in descending order, assign each row an incrementing RowId, and then take only the first one.
select id, rev, [class]
from
(
SELECT id, rev, [class],
ROW_NUMBER() OVER(PARTITION BY id ORDER BY rev DESC, [class] desc) AS RowId
FROM sample
) t
where RowId = 1
Keep in mind, this works with the criteria in the example dataset, and not the MAX of two fields as stated in the question's title.
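The two readings of the question give different results on the sample data. The sketch below shows this with Python's sqlite3 module. To stay independent of window-function support, the "highest rev, then highest class" reading is written here as a join against the per-id MAX(rev) instead of ROW_NUMBER(); that is a swapped-in technique for illustration, not the query from the answer above. The table name sample follows that answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, rev INTEGER, class INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 10, 5), (2, 40, 6), (2, 50, 6), (2, 52, 1), (3, 33, 3), (3, 63, 5)],
)

# Reading 1: per id, take the highest rev, then the highest class
# among the rows that have that rev.
top_rev_then_class = conn.execute(
    """
    SELECT t.id, t.rev, MAX(t.class) AS class
    FROM sample t
    INNER JOIN (SELECT id, MAX(rev) AS rev FROM sample GROUP BY id) m
        ON m.id = t.id AND m.rev = t.rev
    GROUP BY t.id, t.rev
    ORDER BY t.id
    """
).fetchall()

# Reading 2: only rows that hold both the max rev and the max class
# of their id at the same time.
both_max = conn.execute(
    """
    SELECT h.id, h.rev, h.class
    FROM sample h
    INNER JOIN (
        SELECT id, MAX(rev) AS rev, MAX(class) AS class
        FROM sample GROUP BY id
    ) m ON m.id = h.id AND m.rev = h.rev AND m.class = h.class
    ORDER BY h.id
    """
).fetchall()

print(top_rev_then_class)  # one row per id, matching the question's expected output
print(both_max)            # id 2 is missing: no row has both rev 52 and class 6
```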

I guess you mean: the max of rev and the max of class. If not, please clarify what to do when there is no row where both fields have the highest value.
select id
, max(rev)
, max(class)
from table
group
by id
If you mean the row with the highest total of rev and class for each id, use this:
select t.id
, t.rev
, t.class
from table t
where t.rev + t.class =
( select max(t2.rev + t2.class)
from table t2
where t2.id = t.id
)

Finding records sets with GROUP BY and SUM

I'd like to do a query for every GroupID (which always come in pairs) in which both entries have a value of 1 for HasData.
|GroupID | HasData |
|--------|---------|
| 1 | 1 |
| 1 | 1 |
| 2 | 0 |
| 2 | 1 |
| 3 | 0 |
| 3 | 0 |
| 4 | 1 |
| 4 | 1 |
So the result would be:
1
4
Here's what I'm trying, but I can't seem to get it right. Whenever I do a GROUP BY on the GroupID, I only have access to that column in the SELECT list.
SELECT GroupID
FROM Table
GROUP BY GroupID, HasData
HAVING SUM(HasData) = 2
But I get the following error message because HasData is actually a bit:
Operand data type bit is invalid for sum operator.
Can I do a count of two where both records are true?

Just exclude the group IDs that have a record where HasData = 0.
select distinct a.groupID
from table1 a
where not exists(select * from table1 b where b.HasData = 0 and b.groupID = a.groupID)

You can use the having clause to check that all values are 1:
select GroupId
from table
group by GroupId
having sum(cast(HasData as int)) = 2
That is, simply remove the HasData column from the group by columns and then check on it.

One more option
SELECT GroupID
FROM table
WHERE HasData <> 0
GROUP BY GroupID
HAVING COUNT(*) > 1
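The three answers can be compared side by side. The sketch below runs each of them on the question's data with Python's sqlite3 module. SQLite has no bit type, so HasData is stored as an INTEGER here and the "Operand data type bit is invalid" error cannot be reproduced; the table name table1 is borrowed from the first answer, and everything else is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (GroupID INTEGER, HasData INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 0), (2, 1), (3, 0), (3, 0), (4, 1), (4, 1)],
)

# Answer 1: keep groups that have no row with HasData = 0.
via_not_exists = conn.execute(
    """
    SELECT DISTINCT a.GroupID FROM table1 a
    WHERE NOT EXISTS (
        SELECT * FROM table1 b
        WHERE b.HasData = 0 AND b.GroupID = a.GroupID
    )
    ORDER BY a.GroupID
    """
).fetchall()

# Answer 2: cast the flag to an integer so it can be summed per group.
via_sum = conn.execute(
    """
    SELECT GroupID FROM table1
    GROUP BY GroupID
    HAVING SUM(CAST(HasData AS INT)) = 2
    ORDER BY GroupID
    """
).fetchall()

# Answer 3: drop the 0 rows first, then require more than one survivor.
via_count = conn.execute(
    """
    SELECT GroupID FROM table1
    WHERE HasData <> 0
    GROUP BY GroupID
    HAVING COUNT(*) > 1
    ORDER BY GroupID
    """
).fetchall()

print(via_not_exists, via_sum, via_count)
```

The SUM and COUNT versions rely on every GroupID having exactly two rows, as the question states; the NOT EXISTS version does not depend on the group size.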
