SQL for balance due without grouping all select - sql

I've provided a summary of my tables and I've gotten a good start on the SQL, but I'm stuck on figuring out how to limit how many items are returned. I should be able to choose one or more terms, and get back the balance due from just those terms.
A student should have one record. They can have several reservations over several terms, but a payment is tied to the student rather than to a specific reservation. That is the part that is throwing me off.
The table structure, data, and the start of my SQL follow. Can someone help me with it, please? With terms 1 and 2 selected, the result should not include the $500 payment from term 3 for Sue Smith.
I'm using PostgreSQL, but I think this is a pretty basic question that doesn't require anything specific to Postgres.
Current result set:
Student ID | Last     | First | Total Fees | Reservation Count | Amount Paid | Amount Due
123456     | Jones    | Amy   | 50         | 1                 | 50          | 0
412365     | Smith    | Sue   | 100        | 3                 | 545         | -445
741258     | Anderson | Jon   | 50         | 1                 | 0.00        | 50.00
963258     | Holmes   | Fred  | 100        | 2                 | 30          | 70
The schema:
SET search_path TO temp, public;
CREATE TABLE term
(term_id SERIAL PRIMARY KEY,
term_title VARCHAR(100));
CREATE TABLE student
(student_id SERIAL PRIMARY KEY,
student_sis_id VARCHAR(15),
student_first_name VARCHAR(30),
student_last_name VARCHAR(100));
CREATE TABLE reservation
(reservation_id SERIAL PRIMARY KEY,
student_id INTEGER REFERENCES student ON UPDATE CASCADE,
term_id INTEGER REFERENCES term ON UPDATE CASCADE,
reservation_fee_amount NUMERIC DEFAULT 0.00);
CREATE TABLE payment
(payment_id SERIAL PRIMARY KEY,
student_id INTEGER REFERENCES student ON UPDATE CASCADE,
term_id INTEGER REFERENCES term ON UPDATE CASCADE,
payment_cash_amount NUMERIC,
payment_credit_card_amount NUMERIC,
payment_check_amount NUMERIC);
INSERT INTO term VALUES (DEFAULT, 'SESSION 1');
INSERT INTO term VALUES (DEFAULT, 'SESSION 2');
INSERT INTO term VALUES (DEFAULT, 'SESSION 3');
INSERT INTO student VALUES (DEFAULT, 412365, 'Sue', 'Smith');
INSERT INTO student VALUES (DEFAULT, 123456, 'Amy', 'Jones');
INSERT INTO student VALUES (DEFAULT, 741258, 'Jon', 'Anderson');
INSERT INTO student VALUES (DEFAULT, 963258, 'Fred', 'Holmes');
INSERT INTO reservation VALUES (DEFAULT, 1, 1, 50);
INSERT INTO reservation VALUES (DEFAULT, 1, 2, 50);
INSERT INTO reservation VALUES (DEFAULT, 2, 1, 50);
INSERT INTO reservation VALUES (DEFAULT, 3, 2, 50);
INSERT INTO reservation VALUES (DEFAULT, 4, 1, 50);
INSERT INTO reservation VALUES (DEFAULT, 4, 2, 50);
INSERT INTO reservation VALUES (DEFAULT, 1, 3, 50);
INSERT INTO payment VALUES (DEFAULT, 1, 1, 25, 0, 0);
INSERT INTO payment VALUES (DEFAULT, 1, 1, 0, 20, 0);
INSERT INTO payment VALUES (DEFAULT, 2, 1, 25, 25, 0);
INSERT INTO payment VALUES (DEFAULT, 4, 1, 10, 10, 10);
INSERT INTO payment VALUES (DEFAULT, 1, 3, 500, 0, 0);
The query:
SELECT
student.student_sis_id AS "Student ID",
student.student_last_name AS Last,
student.student_first_name AS First,
SUM(reservation.reservation_fee_amount) AS "Total Fees",
(
SELECT COUNT(reservation.reservation_id)
FROM reservation
WHERE student.student_id = reservation.student_id
) AS "Reservation Count",
(
SELECT
COALESCE(SUM(
payment.payment_check_amount
+ payment.payment_cash_amount
+ payment.payment_credit_card_amount
), 0.00)
FROM payment
WHERE payment.student_id = student.student_id
) AS "Amount Paid",
SUM(reservation.reservation_fee_amount) - (
SELECT
COALESCE(SUM(
payment.payment_check_amount
+ payment.payment_cash_amount
+ payment.payment_credit_card_amount
), 0.00)
FROM payment WHERE payment.student_id = student.student_id
) AS "Amount Due"
FROM
student
INNER JOIN reservation ON student.student_id = reservation.student_id
WHERE reservation.term_id IN (1,2)
GROUP BY
student.student_id,
student.student_sis_id,
student.student_last_name,
student.student_first_name
ORDER BY
student.student_sis_id
;

Here is my updated version of the query:
SELECT
s.student_sis_id AS "Student ID",
s.student_last_name AS Last,
s.student_first_name AS First,
SUM(r.reservation_fee_amount) AS "Total Fees",
COUNT(r.reservation_id) AS "Reservation Count",
COALESCE(
SUM(
p.payment_check_amount
+ p.payment_cash_amount
+ p.payment_credit_card_amount
), 0.00
) AS "Amount Paid",
SUM(r.reservation_fee_amount) - (
COALESCE(
SUM(
p.payment_check_amount
+ p.payment_cash_amount
+ p.payment_credit_card_amount
), 0.00
)
) AS "Amount Due"
FROM
student s
INNER JOIN reservation r ON s.student_id = r.student_id
LEFT JOIN payment p ON p.student_id = r.student_id AND p.term_id = r.term_id
WHERE r.term_id IN (1,2)
GROUP BY
s.student_id,
s.student_sis_id,
s.student_last_name,
s.student_first_name
ORDER BY
s.student_sis_id
;
Things to watch:
- I included payments in the main (outer) query in order to avoid subqueries.
- The join type is LEFT [OUTER] JOIN, so the lack of any payment rows will not prevent other data from appearing in the result set.
- The join condition includes term_id (basically, this was the point where you were lost, I think).
- Finally, I used short table aliases to improve readability.
I hope this is what you are after.

I found the solution to the problem of two payment entries in the same term (which I didn't recognize in my original question). Here is the answer:
set search_path to temp, public;
SELECT
s.student_sis_id AS "Student ID",
s.student_last_name AS "Last Name",
s.student_first_name AS "First Name",
SUM(r.reservation_fee_amount) AS "Total Fees",
COALESCE(p.paid, 0.00) AS "Amount Paid",
COALESCE(SUM(r.reservation_fee_amount) - p.paid, 0.00) AS "Amount Due"
FROM
student s
INNER JOIN reservation r ON s.student_id = r.student_id
left outer join
(
select student_id, term_id,
SUM(
p.payment_check_amount
+ p.payment_cash_amount
+ p.payment_credit_card_amount
) AS "paid"
from payment p
group by student_id, term_id
) as p
ON p.student_id = r.student_id AND p.term_id = r.term_id
WHERE r.reservation_completed AND r.term_id IN (1,2)
GROUP BY
s.student_sis_id,
s.student_last_name,
s.student_first_name,
p.paid
ORDER BY
s.student_sis_id
Thank you dezso and davek
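One thing to check in the query above: because it groups by p.paid, a student whose selected terms have different paid totals (for example 45 in one term and nothing in another) can come back as more than one row. A minimal sketch of an alternative, assuming the schema exactly as posted in the question (so without the reservation_completed column), is to total fees and payments per student in two separate CTEs, each filtered by term, and join them afterwards. The CTE names fees and paid are only illustrative.

```sql
-- Sketch only: aggregate each side per student first, so payment rows
-- can never multiply reservation rows (and vice versa).
WITH fees AS (
    SELECT student_id,
           SUM(reservation_fee_amount) AS total_fees,
           COUNT(*)                    AS reservation_count
    FROM reservation
    WHERE term_id IN (1, 2)            -- the chosen terms
    GROUP BY student_id
),
paid AS (
    SELECT student_id,
           -- COALESCE each column so one NULL amount does not null the sum
           SUM(COALESCE(payment_cash_amount, 0)
             + COALESCE(payment_credit_card_amount, 0)
             + COALESCE(payment_check_amount, 0)) AS amount_paid
    FROM payment
    WHERE term_id IN (1, 2)            -- same term list as above
    GROUP BY student_id
)
SELECT s.student_sis_id                           AS "Student ID",
       s.student_last_name                        AS "Last Name",
       s.student_first_name                       AS "First Name",
       f.total_fees                               AS "Total Fees",
       f.reservation_count                        AS "Reservation Count",
       COALESCE(p.amount_paid, 0)                 AS "Amount Paid",
       f.total_fees - COALESCE(p.amount_paid, 0)  AS "Amount Due"
FROM student s
INNER JOIN fees f ON f.student_id = s.student_id
LEFT JOIN paid p  ON p.student_id = s.student_id  -- students with no payments still appear
ORDER BY s.student_sis_id;
```

With the sample data from the question, this should give Sue Smith total fees of 100, two reservations, 45 paid and 55 due, with the term 3 payment of $500 left out.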

sql add columns in group dynamically

I need to build a summary table from data about customers and their loans, grouped by gender, where each column is a combination of the sequential number of the contract (contract_number) and the year (year). The main requirement is that the contract_number and year columns are generated dynamically.
Test data:
CREATE TABLE loans
(
loan_id int,
client_id int,
loan_date date
);
CREATE TABLE clients
(
client_id int,
client_name varchar(20),
gender varchar(20)
);
INSERT INTO clients
VALUES (1, 'arnold', 'male'),
(2, 'lilly', 'female'),
(3, 'betty', 'female'),
(4, 'tom', 'male'),
(5, 'jim', 'male');
INSERT INTO loans
VALUES (1, 1, '20220522'),
(2, 2, '20220522'),
(3, 3, '20220525'),
(4, 4, '20220525'),
(5, 1, '20220527'),
(6, 2, '20220527'),
(7, 3, '20220601'),
(8, 1, '20220603'),
(9, 2, '20220603'),
(10, 1, '20220603');
The columns could be built with CASE WHEN expressions, but that option is not suitable because the query would need new lines every time new data adds another contract number or year.
My code:
with cte as
(
select
l.client_id,
loan_date,
extract(year from loan_date) as year,
client_name,
gender,
row_number() over (partition by l.client_id order by loan_date asc) as contract_number
from
loans l
inner join
clients c on l.client_id = c.client_id
)
select
gender
, year
, contract_number
, count(*)
from cte
group by gender, year, contract_number
order by year, contract_number
Expected output:
sex    | 1 contract, 2022 | 2 contract, 2022 | 3 contract, 2022
male   | 2                | 2                | 1
female | 4                | 1                | 1
RDBMS: PostgreSQL
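For reference, the static variant that the question wants to avoid could look roughly like the sketch below, written for PostgreSQL 9.4 or later with the FILTER clause in place of CASE WHEN. It assumes the loans and clients tables as posted; the CTE name numbered and the loan_id tie-breaker are my additions. Every output column is still hard-coded, which is exactly the limitation described above, so a truly dynamic column list would typically mean generating this statement from the distinct (contract_number, year) pairs.

```sql
-- Static pivot sketch: one FILTER-ed count per (contract_number, year) pair.
WITH numbered AS (
    SELECT
        c.gender,
        EXTRACT(YEAR FROM l.loan_date) AS year,
        ROW_NUMBER() OVER (
            PARTITION BY l.client_id
            ORDER BY l.loan_date, l.loan_id   -- loan_id breaks same-day ties
        ) AS contract_number
    FROM loans l
    INNER JOIN clients c ON c.client_id = l.client_id
)
SELECT
    gender,
    COUNT(*) FILTER (WHERE contract_number = 1 AND year = 2022) AS "1 contract, 2022",
    COUNT(*) FILTER (WHERE contract_number = 2 AND year = 2022) AS "2 contract, 2022",
    COUNT(*) FILTER (WHERE contract_number = 3 AND year = 2022) AS "3 contract, 2022"
FROM numbered
GROUP BY gender
ORDER BY gender;
```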

View table SQL Server

I want to create a view, displaying book titles and number of reviews made to the specific book.
What are the options when the values are not compatible?
Relevant columns in the Books table:
ISBN13 PK bigint
Title nvarchar(50)
Language nvarchar(30)
Author Id FK int
Category ID FK int
Sample data Books:
INSERT INTO Books VALUES (9783852913735, 'Ulysses', 'English', 100, 'January 06, 2002', 1, null);
INSERT INTO Books VALUES (9780195038637, 'Battle Cry of Freedom', 'English', 490, 'February 25, 1988', 99, null);
INSERT INTO Books VALUES (9789178615155, 'Surhörningen', 'Swedish', 195, '2019', 4, null);
INSERT INTO Books VALUES (9789178614577, 'Jag älskar regnbågsenhörningar', 'Swedish', 190, '2021', 2, null);
Relevant columns in the Reviews table:
ReviewId PK int
BookId FK bigint -- FK to ISBN13
CategoryID FK
WriterId FK
Date
Sample data Reviews:
insert into Reviews values(0020, '9783852913735', '120', 11, '2001-02-21');
insert into Reviews values(0021, '9789177836599', '140', 4, '2001-10-19');
insert into Reviews values(0022, '9789178130979', '110', 1, '2002-02-22');
insert into Reviews values(0023, '9789178130979', '90', 8, '2003-09-06');
insert into Reviews values(0024, '9789178614677', '50', 2, '2005-08-29');
insert into Reviews values(0025, '9789178615155', '10', 5, '2004-08-25');
insert into Reviews values(0026, '971019503872', '10', 9, '2009-06-11');
insert into Reviews values(0027, '9780195038637', '20', 2, '2010-11-10');
Sample data Categories:
insert into Categories (CategoryId, Name) values(10, 'Architecture');
insert into Categories values(20, 'Art');
insert into Categories values(30, 'Astrology');
insert into Categories values(40, 'Baking');
insert into Categories values(50, 'Business Management');
insert into Categories values(60, 'Biology');
insert into Categories values(70, 'Comics');
insert into Categories values(80, 'Computational Science');
SELECT Books.Title, Books.[Author Id]
FROM Books
INNER JOIN Reviews ON Reviews.BookId=Books.ISBN13;
Below is my code for the reviews part, as I want to show the number of reviews per book:
SELECT
BookId,
COUNT
(BookId) [Reviews]
FROM
Reviews
GROUP BY BookId
HAVING COUNT
(BookId)> 1
So expected results would be:
Title | Author | BookId | Category | Number of Reviews
Have a look at this query. I created the view, and because the category values are not compatible with the Books table, I used a LEFT JOIN to Categories; the INNER JOIN keeps only the records that exist in both Books and Reviews. Feel free to comment on the answer and let me know if any additions or alterations are required. I am happy to assist. Thanks for posting the insert scripts and table definitions, which let me implement and test this quickly.
CREATE view My_View AS
(
SELECT
[B].[ISBN13] AS [BookId]
,[B].[Title]
,[B].[AuthorId] AS [Author]
,[C].[Name] As [Category]
, COUNT([R].[ReviewId]) OVER ( PARTITION BY [B].[Title]) AS [Number of reviews]
FROM Reviews [R]
INNER JOIN Books [B]
ON [R].[BookId] = [B].[ISBN13]
LEFT JOIN Categories [C]
ON [B].[CategoryId] = [C].[CategoryId]
)
GO
SELECT * FROM My_View;
Assuming from your sample query you are after just a count of reviews, you would have something like this (guessing obviously for the other tables you need to join with). Several ways to correlate but a simple count only requires an inline correlated subquery:
create view MyView as
select
b.Title,
a.Name Author,
b.ISBN13 BookId,
c.Name Category,
(select Count(*) from Reviews r where r.BookId=b.ISBN13) Reviews
from Books b
join Categories c on c.Id=b.CategoryId
join Authors a on a.Id=b.AuthorId
Using a subset of the data you added, this query works fine
Title BookId Reviews
------------------------------ --------------- -----------
Ulysses 9783852913735 1
Battle Cry of Freedom 9780195038637 1
Surhörningen 9789178615155 1
Jag älskar regnbågsenhörningar 9789178614577 0

What is the correct order for recursive queries using two tables on SQL?

I have the following two tables:
CREATE TABLE empleados (
id INTEGER PRIMARY KEY,
nombre VARCHAR(255) NOT NULL,
gerenteId INTEGER,
FOREIGN KEY (gerenteId) REFERENCES empleados(id)
);
CREATE TABLE ventas (
id INTEGER PRIMARY KEY,
empleadoId INTEGER NOT NULL,
valorOrden INTEGER NOT NULL,
FOREIGN KEY (empleadoId) REFERENCES empleados(id)
);
With the following data:
INSERT INTO empleados(id, nombre, gerenteId) VALUES(1, 'Roberto', null);
INSERT INTO empleados(id, nombre, gerenteId) VALUES(2, 'Tomas', null);
INSERT INTO empleados(id, nombre, gerenteId) VALUES(3, 'Rogelio', 1);
INSERT INTO empleados(id, nombre, gerenteId) VALUES(4, 'Victor', 3);
INSERT INTO empleados(id, nombre, gerenteId) VALUES(5, 'Johnatan', 4);
INSERT INTO empleados(id, nombre, gerenteId) VALUES(6, 'Gustavo', 2);
INSERT INTO ventas(id, empleadoId, valorOrden) VALUES(1, 3, 400);
INSERT INTO ventas(id, empleadoId, valorOrden) VALUES(2, 4, 3000);
INSERT INTO ventas(id, empleadoId, valorOrden) VALUES(3, 5, 3500);
INSERT INTO ventas(id, empleadoId, valorOrden) VALUES(4, 2, 40000);
INSERT INTO ventas(id, empleadoId, valorOrden) VALUES(5, 6, 3000);
I'm trying to write a query that returns the sum of all the orders that belong, directly or indirectly, to the main managers. The main managers are the ones who don't report to anybody else. In this case, Roberto and Tomas are the main managers, but there could be others. The result must take into account not just the sales (ventas) made directly by each manager but also those made by any of their employees (direct employees or employees of their employees).
So in this case I'm expecting the following result:
-- Id TotalVentas
-- ----------------
-- 1 6900
-- 2 43000
Where the Id column refers to the employees' id which are "main" managers and TotalVentas column is the sum of all the ventas (valorOrden) made by them and their employees.
So Roberto has no orders of his own, but Rogelio (his employee) has one of 400, Victor (Rogelio's employee) has one of 3000, and Johnatan (Victor's employee) has another of 3500. The sum of all of them is 6900. It is the same with Tomas, who has one venta made directly by him plus another one made by Gustavo, who is his employee.
The query that I have so far is the following:
WITH cte_org AS (
SELECT
id,
nombre,
gerenteId,
0 as EmpLevel
FROM
dbo.empleados
WHERE gerenteId IS NULL
UNION ALL
SELECT
e.id,
e.nombre,
e.gerenteId,
o.EmpLevel + 1
FROM
dbo.empleados e
INNER JOIN cte_org o
ON o.id = e.gerenteId
WHERE e.gerenteId IS NOT NULL
)
SELECT cte.id, SUM(s.valorOrden)
FROM cte_org cte, dbo.ventas s
WHERE (cte.id = s.empleadoId AND cte.gerenteId is null)
OR
(cte.id = s.empleadoId AND cte.EmpLevel <> 0 AND
cte.gerenteId in (select ee.id from dbo.empleados ee where ee.gerenteId is null)
)
--AND
--(cte.gerenteId in (select ee.id from dbo.empleados ee where ee.gerenteId is null)
--OR
--cte.gerenteId is null)
--AND cte.gerenteId = NULL
group by cte.id
;
Could anybody help me with this?
This is traversing a hierarchy, starting with the highest level managers, and then joining in the sales:
with cte as (
select id, nombre, id as manager
from empleados e
where gerenteid is null
union all
select e.id, e.nombre, cte.manager
from cte join
empleados e
on cte.id = e.gerenteid
)
select cte.manager, sum(valororden)
from cte join
ventas v
on cte.id = v.empleadoid
group by cte.manager;
Here is a db<>fiddle. The Fiddle uses SQL Server, because that is consistent with the syntax that you are using.
In Oracle you can do this with the query below:
SQLFiddle
select manager, sum(amount) as total_amount from
(
select level, CONNECT_BY_ROOT employeeid Manager, a.* from
(
SELECT a.id as employeeid, a.nombre as name , a.GERENTEID as manager_id,
b.EMPLEADOID as sales_id, b.VALORORDEN amount
from empleados a left outer join ventas b
on (a.id = b.empleadoId)
) a
start with manager_id is null
connect by prior employeeid = manager_id) x
group by manager;
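For engines that require the RECURSIVE keyword (PostgreSQL, MySQL 8, and SQLite, for example), the CTE approach from the first answer can be sketched as below. This is an adaptation rather than either answerer's exact query: the CTE name jerarquia and the column main_manager are made up for illustration, and I swapped in a LEFT JOIN with COALESCE so that a main manager whose whole team has no ventas still appears with a total of 0.

```sql
-- Sketch: carry the id of the top-level manager down the hierarchy,
-- then sum the sales of everyone who shares that top-level manager.
WITH RECURSIVE jerarquia AS (
    -- anchor: main managers report to nobody and are their own root
    SELECT id, id AS main_manager
    FROM empleados
    WHERE gerenteId IS NULL
    UNION ALL
    -- recursive step: each employee inherits the root of their manager
    SELECT e.id, j.main_manager
    FROM empleados e
    INNER JOIN jerarquia j ON j.id = e.gerenteId
)
SELECT j.main_manager              AS Id,
       COALESCE(SUM(v.valorOrden), 0) AS TotalVentas
FROM jerarquia j
LEFT JOIN ventas v ON v.empleadoId = j.id   -- keep employees without sales
GROUP BY j.main_manager
ORDER BY j.main_manager;
```

With the sample rows from the question, this should return 6900 for Id 1 and 43000 for Id 2.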

PostgreSQL query not returning result as intended

I would like to generate a list of all days on which every sailor booked a boat.
The table schema is as follows:
CREATE TABLE SAILOR(
SID INTEGER NOT NULL,
NAME VARCHAR(50) NOT NULL,
RATING INTEGER NOT NULL,
AGE FLOAT NOT NULL,
PRIMARY KEY(SID)
);
CREATE TABLE BOAT(
BID INTEGER NOT NULL,
NAME VARCHAR(50) NOT NULL,
COLOR VARCHAR(50) NOT NULL,
PRIMARY KEY(BID)
);
CREATE TABLE RESERVE (
SID INTEGER NOT NULL REFERENCES SAILOR(SID),
BID INTEGER NOT NULL REFERENCES BOAT(BID),
DAY DATE NOT NULL,
PRIMARY KEY(SID, BID, DAY));
The data is as follows:
INSERT INTO SAILOR(SID, NAME, RATING, AGE)
VALUES
(64, 'Horatio', 7, 35.0),
(74, 'Horatio', 9, 35.0);
INSERT INTO BOAT(BID, NAME, COLOR)
VALUES
(101, 'Interlake', 'blue'),
(102, 'Interlake', 'red'),
(103, 'Clipper', 'green'),
(104, 'Marine', 'red');
INSERT INTO RESERVE(SID, BID, DAY)
VALUES
(64, 101, '09/05/98'),
(64, 102, '09/08/98'),
(74, 103, '09/08/98');
I have tried using this code:
SELECT DAY
FROM RESERVE R
WHERE NOT EXISTS (
SELECT SID
FROM SAILOR S
EXCEPT
SELECT S.SID
FROM SAILOR S, RESERVE R
WHERE S.SID = R.SID)
GROUP BY DAY;
but it returns a list of all days, no exception. The only day that it should return is "09/08/98". How do I solve this?
I would phrase your query as:
SELECT r.DAY
FROM RESERVE r
GROUP BY r.DAY
HAVING COUNT(DISTINCT r.SID) = (SELECT COUNT(*) FROM SAILOR);
Demo
The above query says to return any day in the RESERVE table whose distinct SID sailor count matches the count of every sailor.
This assumes that SID sailor entries in the RESERVE table would only be made with sailors that actually appear in the SAILOR table. This seems reasonable, and can be enforced using primary/foreign key relationships between the two tables.
Taking a slightly different approach of just counting unique sailors per day:
SELECT day FROM (
SELECT COUNT(DISTINCT sid), day FROM reserve GROUP BY day
) AS sailors_per_day
WHERE count = (SELECT COUNT(*) FROM sailor);
+------------+
| day |
|------------|
| 1998-09-08 |
+------------+
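As for why the original attempt returns every day: its NOT EXISTS subquery never refers to the day of the outer row, so it gives the same answer for every row of RESERVE. A minimal sketch of the same "no sailor is missing" idea with the missing correlation added, assuming the tables as posted (the alias r2 is mine), would be:

```sql
-- Relational division sketch: keep a day only if there is no sailor
-- who lacks a reservation on that same day.
SELECT DISTINCT r.day
FROM reserve r
WHERE NOT EXISTS (
    SELECT 1
    FROM sailor s
    WHERE NOT EXISTS (
        SELECT 1
        FROM reserve r2
        WHERE r2.sid = s.sid
          AND r2.day = r.day      -- the correlation the original query lacked
    )
);
```

Compared with the COUNT(DISTINCT ...) versions above, this form does not depend on every SID in RESERVE also existing in SAILOR.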

Write a SQL query to find the products which does not have sales at all? ( Which one is efficient and why )

As per the title, I would like to understand which of the solutions below is the most efficient, and why, assuming both tables hold large volumes of data: PRODUCTS (millions of rows) and SALES (billions of rows). The schema details are below. I am not looking for a solution to the question itself; I would like to learn which approach is optimal.
CREATE TABLE PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
Quantity INTEGER,
PRICE INTEGER
);
INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');
INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');
INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');
INSERT INTO PRODUCTS VALUES ( 400, 'LG');
INSERT INTO SALES VALUES ( 1, 100, 2010, 25, 5000);
INSERT INTO SALES VALUES ( 2, 100, 2011, 16, 5000);
INSERT INTO SALES VALUES ( 3, 100, 2012, 8, 5000);
INSERT INTO SALES VALUES ( 4, 200, 2010, 10, 9000);
INSERT INTO SALES VALUES ( 5, 200, 2011, 15, 9000);
INSERT INTO SALES VALUES ( 6, 200, 2012, 20, 9000);
INSERT INTO SALES VALUES ( 7, 300, 2010, 20, 7000);
INSERT INTO SALES VALUES ( 8, 300, 2011, 18, 7000);
INSERT INTO SALES VALUES ( 9, 300, 2012, 20, 7000);
COMMIT;
--Solution 1
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
LEFT OUTER JOIN
SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
WHERE S.QUANTITY IS NULL;
--Solution 2
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN
(SELECT DISTINCT PRODUCT_ID FROM SALES);
--Solution 3
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE NOT EXISTS
(SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID);
This is too long for a comment.
Your three queries are different semantically, as written with the provided table definitions.
The first query is logically different from the other two. Unless QUANTITY is declared NOT NULL, it can also return products that do have a matching sale whose quantity is NULL.
The second query uses a redundant select distinct. I am guessing that all decent databases would optimize it away. The normal syntax is:
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN (SELECT PRODUCT_ID FROM SALES);
And, this is not equivalent to the third. If SALES.PRODUCT_ID is ever NULL, then this query will return no rows at all.
Semantically, I prefer the third version with NOT EXISTS. However, depending on the database one of the first two (fixed?) versions might be an iota faster. As with any performance question, the answer depends on your data, data structure, database engine, and even potentially your hardware.
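The NULL behaviour of NOT IN described above is easy to reproduce with the sample tables. The sketch below adds one hypothetical sale row with a NULL PRODUCT_ID (the row is invented purely for the demonstration) and then runs the variants side by side:

```sql
-- Hypothetical extra row: a sale whose PRODUCT_ID is NULL.
INSERT INTO SALES VALUES (10, NULL, 2013, 1, 100);

-- NOT IN: returns no rows at all, because 400 <> NULL is UNKNOWN,
-- so the NOT IN predicate is never TRUE for any product.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN (SELECT PRODUCT_ID FROM SALES);

-- NOT EXISTS: still returns LG, since the NULL row simply never matches.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE NOT EXISTS
  (SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID);

-- NOT IN with the NULLs filtered out: returns LG again.
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN
  (SELECT PRODUCT_ID FROM SALES WHERE PRODUCT_ID IS NOT NULL);
```

Declaring SALES.PRODUCT_ID as NOT NULL removes the difference between the second and third solutions, though which one runs faster would still have to be checked against the execution plans on the real data.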