SQL: what aggregation logic produces the different results? - sql

SQL1 returns one row per name with the courses aggregated, while SQL2 returns the non-aggregated rows.
What is the difference in aggregation logic between the two queries? Thanks.
SQL1
SELECT
name,
CASE WHEN COUNT(CASE WHEN course = 'SQL' THEN 1 END) > 0 THEN 'o' END AS SQL,
CASE WHEN COUNT(CASE WHEN course = 'UNIX' THEN 1 END) > 0 THEN 'o' END AS UNIX,
CASE WHEN COUNT(CASE WHEN course = 'Java' THEN 1 END) > 0 THEN 'o' END AS Java
FROM Courses
GROUP BY name;
SQL2
SELECT name,
CASE WHEN course = 'SQL' THEN '○' ELSE NULL END s,
CASE WHEN course = 'UNIX' THEN '○' ELSE NULL END u,
CASE WHEN course = 'Java' THEN '○' ELSE NULL END j
FROM Courses
GROUP BY name,course;
Create Table
CREATE TABLE Courses
(name VARCHAR(32),
course VARCHAR(32),
PRIMARY KEY(name, course));
INSERT INTO Courses VALUES('Tom', 'SQL');
INSERT INTO Courses VALUES('Tom', 'UNIX');
INSERT INTO Courses VALUES('Jack', 'SQL');
INSERT INTO Courses VALUES('Mike', 'SQL');
INSERT INTO Courses VALUES('Mike', 'Java');
INSERT INTO Courses VALUES('Jane', 'UNIX');
INSERT INTO Courses VALUES('Mary', 'SQL');

The difference is in the GROUP BY clause. In the first query you group by name only.
GROUP BY name
That tells the database to collapse all rows with the same name into one row.
In the second query you group by name and course.
GROUP BY name,course
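That means only rows with the same name and the same course are collapsed into one row. Because (name, course) is the primary key of Courses, every row is already its own group, so nothing is aggregated.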
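To make the difference in grouping concrete, here is a minimal sketch that loads the sample Courses data into an in-memory SQLite database and runs both groupings. SQLite and the column aliases has_sql, has_unix and has_java are assumptions made for the illustration; the question does not say which database is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (name VARCHAR(32), course VARCHAR(32), PRIMARY KEY (name, course));
INSERT INTO Courses VALUES
    ('Tom', 'SQL'), ('Tom', 'UNIX'), ('Jack', 'SQL'), ('Mike', 'SQL'),
    ('Mike', 'Java'), ('Jane', 'UNIX'), ('Mary', 'SQL');
""")

# GROUP BY name: one group per student, so COUNT sees all of that student's courses.
by_name = conn.execute("""
    SELECT name,
           CASE WHEN COUNT(CASE WHEN course = 'SQL'  THEN 1 END) > 0 THEN 'o' END AS has_sql,
           CASE WHEN COUNT(CASE WHEN course = 'UNIX' THEN 1 END) > 0 THEN 'o' END AS has_unix,
           CASE WHEN COUNT(CASE WHEN course = 'Java' THEN 1 END) > 0 THEN 'o' END AS has_java
    FROM Courses
    GROUP BY name
    ORDER BY name
""").fetchall()

# GROUP BY name, course: one group per (name, course) pair, i.e. per original row.
by_name_course = conn.execute("""
    SELECT name,
           CASE WHEN course = 'SQL'  THEN 'o' END AS has_sql,
           CASE WHEN course = 'UNIX' THEN 'o' END AS has_unix,
           CASE WHEN course = 'Java' THEN 'o' END AS has_java
    FROM Courses
    GROUP BY name, course
    ORDER BY name, course
""").fetchall()

print(len(by_name), len(by_name_course))  # 5 groups versus 7 groups
```

Tom appears once in the first result with both SQL and UNIX marked, and twice in the second result with one mark per row.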

Related

Conditional SQL logic

I have a simple table of voting frequencies of registered voters:
create table public.campaign_202206 (
registrant_id INTEGER not null references votecal.voter_registration (registrant_id),
voting_frequency smallint
);
I want to insert values into this table with the count of elections that the voter has participated in among the past four elections:
insert into campaign_202206 (
select registrant_id, count(*)
from votecal.voter_participation_history
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by registrant_id
);
However, if the count is 1, then I want to look at the participation from five elections ago on '2018-06-05' and if there is no participation in that election, I want to store the voting_frequency as 0 instead of 1.
insert into campaign_202206 (
select
registrant_id,
case
when count(*) = 1 then --- what goes here?
else count(*)
end as voting_frequency
from votecal.voter_participation_history
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by registrant_id
);
What would go in this case-when-then to get the value for this special case?
Use a correlated subquery as follows:
insert into campaign_202206 (
select
registrant_id,
case when count(*) = 1 then
(
select count(*)
from votecal.voter_participation_history sqvph
where sqvph.election_date = '2018-06-05'
and sqvph.registrant_id = vph.registrant_id
)
else count(*)
end as voting_frequency
from votecal.voter_participation_history vph
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by registrant_id
);
The table in the outer query and the table in the subquery need aliases (vph and sqvph here) so that the subquery can refer to the registrant_id of the outer row.
Use a nested case:
insert into campaign_202206 (
select
v1.registrant_id,
case
when count(*) = 1 then
case
when (select count(*)
from votecal.voter_participation_history v2
where v2.election_date = '2018-06-05'
and v2.registrant_id = v1.registrant_id) > 0
then 1
else 0
end
else count(*)
end as voting_frequency
from votecal.voter_participation_history v1
where v1.election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by v1.registrant_id
);
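The correlated-subquery approach can be tried on a small data set. The following is only a sketch: it uses an in-memory SQLite database rather than PostgreSQL, drops the votecal schema prefix, runs just the SELECT part, and the three registrants are invented sample data. It also assumes at most one history row per registrant per election.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE voter_participation_history (registrant_id INTEGER, election_date TEXT);
INSERT INTO voter_participation_history VALUES
    (1, '2021-09-14'), (1, '2020-11-03'), (1, '2018-11-06'),  -- three of the last four
    (2, '2021-09-14'), (2, '2018-06-05'),                      -- one of four, plus the fifth
    (3, '2020-03-03');                                         -- one of four, not the fifth
""")

rows = conn.execute("""
    SELECT vph.registrant_id,
           CASE WHEN COUNT(*) = 1 THEN
                    -- look back at the fifth election for this registrant only
                    (SELECT COUNT(*)
                     FROM voter_participation_history sq
                     WHERE sq.election_date = '2018-06-05'
                       AND sq.registrant_id = vph.registrant_id)
                ELSE COUNT(*)
           END AS voting_frequency
    FROM voter_participation_history vph
    WHERE vph.election_date IN ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
    GROUP BY vph.registrant_id
    ORDER BY vph.registrant_id
""").fetchall()

print(rows)  # [(1, 3), (2, 1), (3, 0)]
```

Note that a registrant with no participation in any of the four listed elections produces no row at all, because the WHERE clause filters them out before grouping.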

How to combine values in table

The database has the schema students(name TEXT, score INTEGER), and there is a table called grades:
Grade  MIN_score  MAX_score
A      4          5
B      3          4
C      2          3
I want to select the names of all students and their grades according to the table, and turn A and B to 'pass' in the resulting table.
Below is my partial solution without turning A and B to 'pass' in the resulting table, and I wonder how to achieve that additional function.
SELECT name, grade
FROM students
LEFT JOIN grades
ON score BETWEEN MIN_score and MAX_score;
Don't use between. You'll get duplicates.
select s.name, s.score,
(case when g.grade in ('A', 'B') then 'Pass' end) as status
from students s join
grades g
on s.score > g.MIN_score and s.score <= g.MAX_score;
You need to be very careful about the join condition so a score of "4" is not treated as both an "A" and a "B" (as between would do).
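The duplicate problem is easy to reproduce. The sketch below uses an in-memory SQLite database and three invented students (Ann, Bob and Cid are assumptions, not part of the question) to compare the BETWEEN join with the half-open range join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (name TEXT, score INTEGER);
CREATE TABLE grades (grade TEXT, min_score INTEGER, max_score INTEGER);
INSERT INTO grades VALUES ('A', 4, 5), ('B', 3, 4), ('C', 2, 3);
INSERT INTO students VALUES ('Ann', 5), ('Bob', 4), ('Cid', 3);
""")

# BETWEEN is inclusive at both ends, so the boundary scores 4 and 3 match two grades each.
with_between = conn.execute("""
    SELECT s.name, g.grade
    FROM students s
    JOIN grades g ON s.score BETWEEN g.min_score AND g.max_score
    ORDER BY s.name, g.grade
""").fetchall()

# Half-open range (min, max]: every score matches at most one grade row.
half_open = conn.execute("""
    SELECT s.name,
           CASE WHEN g.grade IN ('A', 'B') THEN 'Pass' END AS status
    FROM students s
    JOIN grades g ON s.score > g.min_score AND s.score <= g.max_score
    ORDER BY s.name
""").fetchall()

print(with_between)  # Bob and Cid each appear twice
print(half_open)     # [('Ann', 'Pass'), ('Bob', 'Pass'), ('Cid', None)]
```

With the half-open range a score of 4 counts as a B, and a score of exactly 2 matches no grade row, so an inner join drops that student. Whether that is the intended reading of the grades table is a decision for the asker.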
You need to use a case when expression, e.g.:
select case when grade in ('A', 'B') then 'Pass' else '' end
I believe your query should be something like this:
select name, score, case when grade in ('A', 'B') then 'Pass' else '-' end
from students
join grades on score between MIN_score and MAX_score

Condition while aggregation in Spark

This question is about conditional aggregation in SQL. Normally we put conditions in the select clause using a 'case' expression, but that case condition checks only the row under consideration. Consider the data below:
BEGIN TRANSACTION;
/* Create a table called NAMES */
CREATE TABLE NAMES(M CHAR, D CHAR, A INTEGER);
/* Create few records in this table */
INSERT INTO NAMES VALUES('M1','Y',2);
INSERT INTO NAMES VALUES('M1','Y',3);
INSERT INTO NAMES VALUES('M2','Y',2);
INSERT INTO NAMES VALUES('M2',null,3);
INSERT INTO NAMES VALUES('M3',null,2);
INSERT INTO NAMES VALUES('M3',null,3);
COMMIT;
This query groups by column 'M', checks column 'D' separately for each record, and sums column 'A' over the records where 'D' is not 'Y'.
select M, sum(case when D = 'Y' then 0 else A end) from NAMES group by M;
Output for this query is:
M1|0
M2|3
M3|5
What we want instead is to check column 'D' across all records in the group: if any record in the group has 'Y', do not perform the 'sum' aggregation at all and return 0; otherwise sum column 'A' as usual.
In brief, the expected output for the above scenario is:
M1|0
M2|0
M3|5
Answers in Spark SQL are highly appreciated.
You can use another case expression that tests the whole group before summing:
select M,
(case when max(case when D = 'Y' then 1 else 0 end) = 1 -- at least one 'Y' in the group
then 0
else sum(A)
end)
from NAMES
group by M;
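The expected output from the question can be checked against a group-level flag. This is a sketch run on an in-memory SQLite database rather than Spark SQL, which is an assumption made to keep it self-contained. The idea is to aggregate a 0/1 flag with max so that a single 'Y' anywhere in the group zeroes the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NAMES (M CHAR, D CHAR, A INTEGER);
INSERT INTO NAMES VALUES
    ('M1', 'Y', 2), ('M1', 'Y', 3),
    ('M2', 'Y', 2), ('M2', NULL, 3),
    ('M3', NULL, 2), ('M3', NULL, 3);
""")

# Row-level condition: each record is tested on its own.
per_row = conn.execute("""
    SELECT M, SUM(CASE WHEN D = 'Y' THEN 0 ELSE A END)
    FROM NAMES GROUP BY M ORDER BY M
""").fetchall()

# Group-level condition: the flag is 1 if any record in the group has D = 'Y'.
per_group = conn.execute("""
    SELECT M,
           CASE WHEN MAX(CASE WHEN D = 'Y' THEN 1 ELSE 0 END) = 1
                THEN 0
                ELSE SUM(A)
           END
    FROM NAMES GROUP BY M ORDER BY M
""").fetchall()

print(per_row)    # [('M1', 0), ('M2', 3), ('M3', 5)]
print(per_group)  # [('M1', 0), ('M2', 0), ('M3', 5)]
```

The inner case turns NULL into 0 explicitly, so groups where 'D' is entirely NULL, such as M3, still fall through to the sum.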

SQL Server mismatched columns

I'm trying to write a query for SQL Server.
It must sum two separate columns.
If the column Debe is less than the column Haber, it should print "mismatch".
The explanation is:
There are two tables, ContD and Cont, with a common column ID.
The table ContD has two important columns, Debe and Haber.
The table Cont has one important column, Importe.
If the sum of Debe is not equal to the sum of Haber, the query should report it by printing a message in another column.
If the final totals of Debe and Haber are equal, they should be compared to the column Importe of the table Cont, and "coincidence" printed in another column.
http://www.grupoalta.com/wp-content/uploads/queryconciliacion.png
This should do the trick:
Note: I have commented out the script that you actually want to use. The uncommented script is a quick sample that I built from the screenshot you shared, but as @Horaciux said, please include sample data next time.
Note 2: I assumed that the id in your table Cont is the primary key or at least unique, while the id in the table ContD is a foreign key or at least has no uniqueness constraint.
DECLARE @t1 AS TABLE (id NUMERIC,debe DECIMAL(18,2),haber DECIMAL(18,2))
DECLARE @t2 AS TABLE (id NUMERIC,importe DECIMAL(18,2))
INSERT INTO @t1
SELECT 10887,NULL,603.2 UNION ALL
SELECT 10887,83.2,NULL UNION ALL
SELECT 10887,520,NULL UNION ALL
SELECT 10888,NULL,21.344 UNION ALL
SELECT 10888,18.40,NULL UNION ALL
SELECT 10888,2.944,NULL
INSERT INTO @t2
SELECT 10887,603.2 UNION ALL
SELECT 10888,150
-- Quick sample against the table variables above
SELECT id
,SUM(debe) 'Debe'
,SUM(haber) 'Haber'
,(SELECT importe FROM @t2 where id=t1.id) 'Importe'
,CASE WHEN SUM(debe)=SUM(haber) THEN '' ELSE 'not equal' END 'Debe=Haber'
,CASE WHEN (SUM(debe)=SUM(haber) AND (SELECT importe FROM @t2 WHERE id=t1.id)=SUM(debe)) THEN 'Coincidence' ELSE '' END 'Debe=Haber=Importe'
FROM @t1 t1
GROUP BY id
/*
-- Script against the real tables
SELECT id
,SUM(debe) 'Debe'
,SUM(haber) 'Haber'
,(SELECT importe FROM cont where id=t1.id) 'Importe'
,CASE WHEN SUM(debe)<>SUM(haber) THEN 'not equal' ELSE '' END 'Debe=Haber'
,CASE WHEN (SUM(debe)=SUM(haber) AND (SELECT importe FROM cont WHERE id=t1.id)=SUM(debe)) THEN 'Coincidence' ELSE '' END 'Debe=Haber=Importe'
FROM contd t1
GROUP BY id
*/
Here is an approach that does the comparison and returns a single value, depending on the nature of the match. It uses a case expression:
select cd.id,
(case when cd.sumdebe <> cd.sumhaber then 'not equals'
when cd.sumdebe = c.importe then 'all same'
else 'mismatch'
end)
from (select id, sum(debe) as sumdebe, sum(haber) as sumhaber
from contd
group by id
) cd left outer join
cont c
on cd.id = c.id;
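The single-value approach can be exercised with a small script. The sketch below runs on an in-memory SQLite database instead of SQL Server and stores the amounts as integer cents to avoid floating-point comparisons; both are assumptions made for the illustration. The rows for 10887 and 10888 follow the sample above, and 10889 is an invented unbalanced entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contd (id INTEGER, debe INTEGER, haber INTEGER);  -- amounts in cents
CREATE TABLE cont (id INTEGER PRIMARY KEY, importe INTEGER);
INSERT INTO contd VALUES
    (10887, NULL, 60320), (10887, 8320, NULL), (10887, 52000, NULL),
    (10888, NULL, 2134),  (10888, 1840, NULL), (10888, 294, NULL),
    (10889, 100, NULL),   (10889, NULL, 200);
INSERT INTO cont VALUES (10887, 60320), (10888, 15000), (10889, 100);
""")

rows = conn.execute("""
    SELECT cd.id,
           CASE WHEN cd.sumdebe <> cd.sumhaber THEN 'not equals'
                WHEN cd.sumdebe = c.importe   THEN 'all same'
                ELSE 'mismatch'
           END AS result
    FROM (SELECT id, SUM(debe) AS sumdebe, SUM(haber) AS sumhaber
          FROM contd
          GROUP BY id) cd
    LEFT OUTER JOIN cont c ON cd.id = c.id
    ORDER BY cd.id
""").fetchall()

print(rows)  # [(10887, 'all same'), (10888, 'mismatch'), (10889, 'not equals')]
```

SUM ignores the NULL entries, so each total only reflects the rows where that column is filled in. Aggregating in the derived table first also means the join to cont happens once per id rather than once per detail row.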

SQL using if statement in stored procedure to update a table

I have a table Students with two fields, StudentName and Grade.
I am trying to write a stored procedure to update the Grade. If the student has an A, I want to change it to B. If they have a B, I want to change it to A. If they have anything else I want to leave it alone. Here is my best attempt
create procedure sp_changegrades
if Grade = 'A' update Students set Grade = 'B'
else if Grade = 'B' update Students set Grade = 'A'
Just use CASE:
UPDATE Students
SET Grade =
(
CASE WHEN Grade = 'A' THEN 'B'
WHEN Grade = 'B' THEN 'A'
ELSE Grade -- "If they have anything else I want to leave it alone."
END
)
or
UPDATE Students
SET Grade =
(
CASE WHEN Grade = 'A'
THEN 'B'
ELSE 'A'
END
)
WHERE Grade IN ('A','B')
You can use a CASE expression, and add a WHERE clause so you only update the relevant rows.
UPDATE Students
SET Grade =
(
CASE WHEN Grade = 'A' THEN 'B'
WHEN Grade = 'B' THEN 'A'
ELSE Grade -- "Included for Completeness, should never be utilized."
END
)
WHERE Grade in ('A','B')
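The reason a single UPDATE with CASE works for a swap is that every row is evaluated against its value from before the statement ran. The sketch below, on an in-memory SQLite database with three invented students, contrasts that with two separate UPDATE statements, which is roughly what the IF/ELSE attempt would amount to. The helper names make_db and grades are hypothetical.

```python
import sqlite3

def make_db():
    """Create a fresh in-memory Students table with one A, one B and one C."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Students (StudentName TEXT, Grade TEXT);
    INSERT INTO Students VALUES ('Ann', 'A'), ('Bob', 'B'), ('Cid', 'C');
    """)
    return conn

def grades(conn):
    """Return (StudentName, Grade) pairs in a stable order."""
    return conn.execute(
        "SELECT StudentName, Grade FROM Students ORDER BY StudentName"
    ).fetchall()

# One statement: each row's new grade is computed from its old grade.
single = make_db()
single.execute("""
    UPDATE Students
    SET Grade = CASE WHEN Grade = 'A' THEN 'B'
                     WHEN Grade = 'B' THEN 'A'
                     ELSE Grade
                END
    WHERE Grade IN ('A', 'B')
""")
swapped = grades(single)

# Two statements: the second one also flips the rows the first one just changed.
naive = make_db()
naive.execute("UPDATE Students SET Grade = 'B' WHERE Grade = 'A'")
naive.execute("UPDATE Students SET Grade = 'A' WHERE Grade = 'B'")
collapsed = grades(naive)

print(swapped)    # [('Ann', 'B'), ('Bob', 'A'), ('Cid', 'C')]
print(collapsed)  # [('Ann', 'A'), ('Bob', 'A'), ('Cid', 'C')]
```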
You can write something like this. In this solution you define the update rules in the join and then apply them in the update.
create procedure sp_changegrades
as
begin
update Students set
Grade = G.Grade_New
from Students as S
inner join (values
('A', 'B'),
('B', 'A')
) as G(Grade_Old, Grade_New) on G.Grade_Old = S.Grade
end
Or you can use case:
create procedure sp_changegrades
as
begin
update Students set
Grade =
case Grade
when 'A' then 'B'
when 'B' then 'A'
else Grade
end
end