SQL: Getting the full record with the highest count

SQL: Getting the full record with the highest count - sql

I'm trying to write sql that produces the desired result from the data below.
data:
IDNum Opt1 Opt2 Opt3 Count
1 A A E 1
1 A B J 4
2 A A E 9
3 B A F 1
3 B C K 14
4 A A M 3
5 B D G 5
6 C C E 13
6 C C M 1
desired result:
IDNum Opt1 Opt2 Opt3 Count
1 A B J 4
2 A A E 9
3 B C K 14
4 A A M 3
5 B D G 5
6 C C E 13
Essentially I want, for each ID Num, the full record with the highest count. I tried doing a group by, but if I group by Opt1, Opt2, Opt3, this doesn't work because it returns the highest count for each (ID Num, Opt2, Opt3, Opt4) combination which is not what I want. If I only group by ID Num, I can get the max for each ID Num but I lose the information as to which (Opt1, Opt2, Opt3) combination gives this count.
I feel like I've done this before, but I don't often work with sql and I can't remember how. Is there an easy way to do this?

Edit
Prior to op clarifying question for access this would have worked. I am not famillar with access to know if this query would be supported.
I think this will work on SQL Server.
select * from data
inner join (select idnum, max(count) from data
group by idNum )sub
on sub.IdNum=data.IdNum && sub.Count=data.Count
Of course if you have two id's with the same count it would return both rows...

Something like this:
SELECT * FROM table AS t1
JOIN ( SELECT id, max(count) as Id FROM table GROUP BY id ) AS t2
ON t1.id = t2.id AND t1.id = t2.id
This assumes that no idnum has the same max count or you'll get two idnums

Try this query:
SELECT * FROM my_table
GROUP BY IDNum
HAVING Count = MAX(Count)
It should work on Access, but I didn't test it.

Related

Query problem-getting wrong result when a condition or a set of data included

I have bug in my query, but I have no idea what happen.
So there is 3 tables.
table 1
name grade min max
a
b
c
d
e
table 2
fullname name min max
a a123 1 10
bbbb b 2 20
c cccc 3 30
d dd 1 10
E Ed 2 20
table 3
value grade
25 A
15 B
5 C
my goal is using name, show the grade of the name( the max > value in table 3).
for example, c has 30 in max, it should have A grade, instead of B and C.
Also, the name usually is the fullname in table 2, but sometime it is name in the table2(like b)(here is one of bugs). That's how the table look like, I can't change it.
if I am not include the checking table 1.name = table 2.name . no bug at all, but cannot get grade for b
if i include the table 1.name = table 2.name.then, it has problem
for the query of matching the grade, it is like(assume get the min and max from table 2 before)
update table1
set table1.grade = table3.grade
from table1 inner join table3
on table1.max > table3.value
All the cases are include the checking table 1.name = table 2.name
case 1:
the grade will equal = C for all data if there is some data inlcude.
for example, in table1, if I am not include E, then everything is fine.
but if I include E, will get C grade for all records.
case 2:
if I run the query for all data at the same time, the result goes wrong.
it work fine if I update the record one by one.,
for example, i add one more condition in update query
update table1
set table1.grade = table3.grade
from table1 inner join table3
on table1.max > table3.value and fullname='c'
after getting wrong result, i add the condition and run it again,
then c will get grade 'A' instead of 'C'. but if I remove the condition and run the query again.
c will get grade 'C' again.
case 3:
there is no problem when I only run the set of data that will cause case 1 problem independently.
but if I put the data together, It cases problem.
That is all cases. I don't know what cause the problem. Please help
The result should be:
table 1
name grade min max
a C 1 10
b B 2 20
c A 3 30
d C 1 10
e B 2 20
If I remove table1.name = table2.name, result will be
table 1
name grade min max
a C 1 10
b null null null
c A 3 30
d C 1 10
e B 2 20
with table1.name = table2.name, result will be
table 1
name grade min max
a C 1 10
b C 2 20
c C 3 30
d C 1 10
e C 2 20
with table1.name = table2.name but remove e , result will be
table 1
name grade min max
a C 1 10
b B 2 20
c A 3 30
d C 1 10
with table1.name = table2.name but only for e,result will be
name grade min max
e B 2 20
those situation happen when I run the update query for whole table.
there is no problem with table1.name = table2.name if I update each row one by one.

At the very least, you should avoid the way you update the table. You should be carefull with joins on update. If you happen to have more than one value for the same row, the result is not deterministic. Better use this form:
update table1
set table1.grade = (SELECT TOP 1 table3.grade FROM table3
WHERE table3.value < table1.max
ORDER BY table3.value DESC)

I am not sure about the expected results but you may have traced the problem already: incorrect criteria.
Before doing an update, just try a select with the same criteria eg:
select *
from table1 inner join table3
on table1.max > table3.value
and see what you get.

Attributed null values to each ID in Athena (Presto)

Here is what my initial dataset looks like
prof_id id title
1 5 A
1 5 B
1 5 C
1 5 D
2 5 C
2 5 D
2 5 E
NA 5 F
NA 5 G
Here is what the new table should look like:
prof_id id title
1 5 A
1 5 B
1 5 C
1 5 D
1 5 F
1 5 G
2 5 C
2 5 D
2 5 E
2 5 F
2 5 G
Any row with a null value for a prof_id should be attributed to all of the prof_id. I have provided an example where there are two '
prof_id but there are also instances where there are 1 or 0 prof_id.
For 1, all of the null should be attributed to that single prof_id
For 0, leave it as is
I'm new to SQL so I'm not sure how to start. Any guidance would be much appreciated.
Thanks

In this case, you will need to do cross join, where essentially it is going to multiply 2 tables together.
First to pick out all nulls:
select id, title from table where prof_id is null
Then pick out the prof_id you want to apply to all tables
select distinct prof_id from table where prof_is is not null
Do a cross join together, then union the rest of "good" data back
(select distinct prof_id from table where prof_is is not null)
CROSS JOIN
(select id, title from table where prof_id is null)
UNION ALL
(select prof_id, id, title from table where prof_id is not null)

You can generate all the rows using a cross join. Then use union all to combine this with the rest of the data.
The following syntax should work:
select p.prof_id, i.id, t.title
from (select distinct prof_id
from t
where prof_id <> 'NA' -- or do you mean is not null
) p cross join
(select distinct id from t) i cross join
(select distinct title
from t
where prof_id = 'NA' -- or is null
) t
union all
select prof_id, id, title
from t
where prof_id <> 'NA' -- or is not null

Taking Sample in SQL Query

I'm working on a problem which is something like this :
I have a table with many columns but major are DepartmentId and EmployeeIds
Employee Ids Department Ids
------------------------------
A 1
B 1
C 1
D 1
AA 2
BB 2
CC 2
A1 3
B1 3
C1 3
D1 3
I want to write a SQL query such that I take out 2 sample EmployeeIds for each DepartmentID.
like
Employee Id Dept Ids
B 1
C 1
AA 2
CC 2
D1 3
A1 3
Currently I am writing the query,
select
EmployeeId, DeptIds, count(*)
from
table_name
group by 1,2
sample 2
but it gives me total two rows.
Any help?

If the number of departments i know and small you could do a stratified sampling:
select *
from table_name
sample
when DeptIds = 1 then 2
when DeptIds = 2 then 2
when DeptIds = 3 then 2
end
Otherwise a combination of RANDOM and ROW_NUMBER:
select *
from
(
sel EmployeeId, DeptIds, random(1,10000000) as rand
from table_name
) as dt
qualify
row_number()
over (partition by DeptIds
order by rand) <= 2

SQL - only select results that do not have a specific status value in a corresponding table

I have two tables, and I only want to get the Student IDs where they have perfect attendance for all months (they do not have a PerfectAttendance value of N for any month). These tables will have hundreds of millions of rows, so I was trying to come up with an approach that doesn't require a full separate subquery. If anyone has any recommendations, please let me know:
Table Student:
ID Name
------------
1 A
2 B
Table Attendance:
ID Month PerfectAttendance
---------------------------------
1 1 Y
1 2 Y
1 3 Y
1 4 Y
1 5 Y
1 6 Y
1 7 Y
1 8 Y
1 9 Y
1 10 Y
1 11 Y
1 12 Y
2 1 Y
2 2 Y
2 3 Y
2 4 Y
2 5 Y
2 6 Y
2 7 Y
2 8 Y
2 9 Y
2 10 Y
2 11 Y
2 12 N

SELECT *
FROM dbo.Student S
WHERE NOT EXISTS(SELECT 1 FROM dbo.Attendance
WHERE PerfectAttendance = 'N'
AND ID = S.ID);

My suggestion for this would be to query the table and get the number of months that each student has perfect attendance. Once you've done that, you can filter on the count being 12 (since there are twelve months).
Try this:
SELECT s.id, s.name, COUNT(*) AS numPerfectMonths
FROM student s JOIN attendence a ON s.id = a.id
WHERE a.perfectAttendance = 'Y'
GROUP BY s.id
HAVING COUNT(*) = 12;
Here is the SQL Fiddle for you.
EDIT
I made the assumption you will have 12 rows for each student. However, let's say you ran this in October and you want to see which students have a perfect attendance up to that point. You can use a subquery to pull for students without perfect attendance, and filter them out using NOT IN like so:
SELECT id
FROM student
WHERE id
NOT IN(SELECT s.id
FROM student s JOIN attendance a ON s.id = a.id
WHERE a.perfectAttendance = 'N'
GROUP BY s.id
HAVING COUNT(*) > 0);
Have an updated SQL Fiddle. To test this one, try deleting one of the rows for id number 1, and you'll still see that they are returned with perfect attendance.

Assuming you have 12 records per student in attendance table based on your data , you can do it with GROUP BY and HAVING clause.
SELECT S.ID, S.NAME
FROM Student S
JOIN Attendance A
on S.ID = A.ID
AND A.PerfectAttendance = 'Y'
GROUP BY S.ID, S.NAME
HAVING COUNT(*) = 12

I think Lamak's answer is probably the clearest and best-performing, but here is another variation on the GROUP BY method suggested by others, when you don't specifically look for a total of 12 months:
;WITH PerfectAttendance AS (
SELECT a.id
FROM Attendance a
GROUP BY a.id
HAVING MIN(a.PerfectAttendance) = 'Y'
)
SELECT s.id, s.Name
FROM PerfectAttendance p
JOIN Student s ON p.id = s.id;

Select row values that are in repeated rows that have a colum with at least 3 different values

I have a table like this
Ma Mi
-----
A b
A c
A c
A d
B b
B a
B a
B a
B a
C a
C b
I want to make a select that outputs only the "Ma" values that have at least 3 distinct "Mi" values, i've tryed group by and distinct unsuccesfully since i want to group first "Mi" and then count "Ma" with distinct "Mi".
So the intermediate step would be :
Ma Mi
-----
A b
A c
A d
B b
B a
C a
C b
So then i can count the rows per each Ma
In this case, the result would be
Ma num
------
A 3
B 2
C 2
And I could select only A because is the unique equal or higher to 3.
RESULT:
Ma num
------
A 3
Thank you in advanced !
L.

To check the results of a GROUP BY operation, you must use HAVING:
SELECT Ma,
COUNT(DISTINCT Mi) AS Num
FROM ATableLikeThis
GROUP BY Ma
HAVING COUNT(DISTINCT Mi) >= 3

Kindly find query as per your requirement:
Select Distinct MyTable.Ma,
(Select Count(Distinct MyTableAlias.Mi) From MyTable As MyTableAlias Where MyTableAlias.Ma = MyTable.Ma) As Num from MyTable WHere (Select Count(Distinct MyTableAlias.Mi) From MyTable As MyTableAlias Where MyTableAlias.Ma = MyTable.Ma) >= 3

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL: Getting the full record with the highest count - sql

Something like this: SELECT * FROM table AS t1 JOIN ( SELECT id, max(count) as Id FROM table GROUP BY id ) AS t2 ON t1.id = t2.id AND t1.id = t2.id This assumes that no idnum has the same max count or you'll get two idnums

Try this query: SELECT * FROM my_table GROUP BY IDNum HAVING Count = MAX(Count) It should work on Access, but I didn't test it.

Related

Query problem-getting wrong result when a condition or a set of data included

Attributed null values to each ID in Athena (Presto)

Taking Sample in SQL Query

SQL - only select results that do not have a specific status value in a corresponding table

Select row values that are in repeated rows that have a colum with at least 3 different values

Categories

Resources