PostgreSQL JOIN TWO TABLES SYNTAX

I'm trying to find a solution to my problem, but hours of thinking and searching didn't help, so I thought I'd ask here.
I have two tables:
Table: Project
projectid | projectName | projectInformation | totalPrice | projectStatus
----------+-------------+--------------------+------------+--------------
        1 | "Education" | "Information..."   |       2000 | FALSE
        2 | "Hospital"  | "Information..."   |       3000 | TRUE
        3 | "Water"     | "Information..."   |       1000 | TRUE
Table: Donations
donationid | donationamount | date  | costumerid | projectid
-----------+----------------+-------+------------+----------
         1 |             10 | now() |          3 |         1
         2 |             20 | now() |          1 |         2
         3 |             50 | now() |          2 |         2
         4 |             15 | now() |          4 |         3
I want to achieve the following result:
projectid | projectname | projectinformation | totalprice | projectstatus | sum(amount)
----------+-------------+--------------------+------------+---------------+------------
        1 | "Education" | "information..."   |       2000 | FALSE         |         325
        2 | "Hospital"  | "information..."   |       3000 | TRUE          |
        3 | "Water"     | "information..."   |       1000 | TRUE          |         120
So customers can donate an amount x to a project.
I want to show the total amount of money collected for each project. With the following query I only get the projects that already have donations; I also want to show the projects that have no donations yet, with a default amount of 0 USD, for example.
This was my query, which returns only the projects with donations:
select projectid, projectname, projectinformation, totalprice, projectstatus, sum(donationamount)
from project
natural join donations
group by projectid, projectname, projectinformation, totalprice, projectstatus;
I guess I need to use left outer join but somehow I couldn't figure out how to write it correctly.

You would use a left join:
select p.projectid, p.projectname, p.projectinformation, p.totalprice,
       p.projectstatus, coalesce(sum(d.donationamount), 0)
from project p
left join donations d
       on p.projectid = d.projectid
group by p.projectid, p.projectname, p.projectinformation, p.totalprice, p.projectstatus;
I strongly discourage you from using natural join. Leaving out the join keys makes the query hard to follow and very difficult to debug. You can easily make mistakes with the wrong keys used for the join.
I actually consider natural join an abomination, because it relies on the names of columns rather than on formally declared foreign key relationships; "abomination" is my description for ignoring such intentionally declared relationships. If you want a shorthand, the using clause can be quite useful.
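For example, here is a sketch of the same left join written with using, assuming both tables share the projectid column name as in the sample data (total_donated is just an illustrative alias):
select projectid, p.projectname, p.projectinformation, p.totalprice,
       p.projectstatus, coalesce(sum(d.donationamount), 0) as total_donated
from project p
left join donations d using (projectid)
group by projectid, p.projectname, p.projectinformation, p.totalprice, p.projectstatus;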


Count values separately until certain amount of duplicates SQL

I need a statement that selects all patients and the number of their appointments; when there are 3 or more appointments taking place on the same date, they should be counted as one appointment.
This is what my statement looks like so far:
SELECT PATSuchname, Count(DISTINCT AKTDATUM) AS AKTAnz
FROM tblAktivitaeten
LEFT OUTER JOIN tblPatienten ON (tblPatienten.PATID=tblAktivitaeten.PATID)
WHERE (AKTDeleted<>'J' OR AKTDeleted IS Null)
GROUP BY PATSuchname
ORDER BY AKTAnz DESC
The result should look like this:
PATSuchname     Appointments
----------------------------
Joey Patner     13
Billy Jean      15
Example Name    13
As you can see, Joey Patner has 13 appointments; in the real table he has 15, but three of them have the same date and are therefore counted as only 1.
So how can I write a statement that does exactly that?
(I am new to Stack Overflow, sorry if the format I use is wrong; please tell me if it is.)
In the tables it looks like this.
tblPatienten
------------
PATSuchname     PATID
---------------------
Joey Patner     1
Billy Jean      2
Example Name    3
tblAktivitaeten
---------------
AKTDatum     PATID   AKTID
--------------------------
08.02.2021   1       1000   ----
08.02.2021   1       1001   ---- so these 3 should be counted as 1
08.02.2021   1       1002   ----
09.05.2021   1       1003
09.07.2021   2       1004   -- these 2 shouldn't be counted as 1
09.07.2021   2       1005   --
Two GROUP BY should do it:
SELECT x.PATID, PATSuchname, SUM(ApptCount)
FROM (
    SELECT PATID, AKTDatum,
           CASE WHEN COUNT(*) < 3 THEN COUNT(*) ELSE 1 END AS ApptCount
    FROM tblAktivitaeten
    GROUP BY PATID, AKTDatum
) AS x
LEFT JOIN tblPatienten ON tblPatienten.PATID = x.PATID
GROUP BY x.PATID, PATSuchname
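The query from the question also filtered out deleted rows. If that filter still matters, it can go inside the derived table; a sketch, assuming the same AKTDeleted column and values as in the question:
SELECT x.PATID, PATSuchname, SUM(ApptCount) AS Appointments
FROM (
    SELECT PATID, AKTDatum,
           CASE WHEN COUNT(*) < 3 THEN COUNT(*) ELSE 1 END AS ApptCount
    FROM tblAktivitaeten
    WHERE (AKTDeleted <> 'J' OR AKTDeleted IS NULL)   -- filter carried over from the question
    GROUP BY PATID, AKTDatum
) AS x
LEFT JOIN tblPatienten ON tblPatienten.PATID = x.PATID
GROUP BY x.PATID, PATSuchname
ORDER BY Appointments DESC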

Postgresql Joining tables without losing records

Let's say I have the following tables:
1 - StartingStock:
vendor | starting_stock
------------------------
adidas | 13
Reebok | 5
2 - Restock:
vendor | restocks
-----------------
adidas | 2
nike | 3
3 - Sales:
vendor | quantity_sold
----------------------
adidas | 10
nike | 1
I want my resulting table to be the sell-through grouped by vendor. In this scenario, sell-through is calculated as quantity_sold / (starting_stock + restocks). My only problem is that the starting stock and restock tables may not have the same vendors that are present in the sales table. In the scenario above, StartingStock does not have nike as a record, so the sell-through for nike would be just 1/3, i.e. 1/(3+0). Therefore, my resulting table would be:
vendor | sell_through
---------------------
adidas | 1.5
nike | 0.33
Reebok | 0
So I'd want all of the vendors present in the result table (if a vendor has no sales, the value is 0, like Reebok shown above).
I tried working with the different types of joins but I couldn't get it. Any help would be great. Thanks.
We can try a full outer join approach here:
SELECT
    COALESCE(ss.vendor, r.vendor, s.vendor) AS vendor,
    COALESCE(s.quantity_sold, 0) * 1.0 /   -- * 1.0 guards against integer division
        (COALESCE(ss.starting_stock, 0) + COALESCE(r.restocks, 0)) AS sell_through
FROM StartingStock ss
FULL OUTER JOIN Restock r ON ss.vendor = r.vendor
FULL OUTER JOIN Sales s ON s.vendor = COALESCE(ss.vendor, r.vendor)
Note that I am coming up with 2/3 for the sell through for Adidas, since the quantity sold is 10, and the sum of stocks is 15.
I would use union all and aggregation:
select vendor,
       sum(starting_stock), sum(restock), sum(quantity_sold),
       sum(quantity_sold) * 1.0 / (sum(starting_stock) + sum(restock)) as sell_through
from ((select vendor, starting_stock, 0 as restock, 0 as quantity_sold
       from startingstock
      ) union all
      (select vendor, 0 as starting_stock, restocks as restock, 0 as quantity_sold
       from restock
      ) union all
      (select vendor, 0 as starting_stock, 0 as restock, quantity_sold
       from sales
      )
     ) v
group by vendor;
In particular, this version includes each number in the calculation only once. A JOIN approach will produce inaccurate results if a vendor has multiple rows in any of the tables.
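A quick hypothetical illustration of that point (the duplicate adidas restock rows below are made up, not part of the question's data): joining first and then summing repeats the sales figure once per matched row.
select s.vendor, sum(s.quantity_sold) as quantity_sold_double_counted
from (values ('adidas', 10)) as s (vendor, quantity_sold)
join (values ('adidas', 2), ('adidas', 3)) as r (vendor, restocks)
     on r.vendor = s.vendor
group by s.vendor;
-- returns 20 rather than 10; the union all approach feeds each source row
-- into the aggregates exactly once, so it avoids this double counting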

SQL How can this happen? - Query which normally returns 1 result alone actually resulted in multiple results when put inside WHERE clause

Question brief
I'm doing this exercise on w3resource and I couldn't understand why the solution worked. I'm only 2 days into SQL, so I'd appreciate it very much if someone could help explain it.
I have 2 tables, COMPANY(com_id, com_name) and PRODUCT(pro_name, pro_price, com_id). Each company has several products with different prices. I need to write a query that displays each company's name together with its most expensive product.
The sample answer for the exercise looks like this:
SELECT c.com_name, p.pro_name, p.pro_price
FROM product p
INNER JOIN company c ON p.com_id = c.com_id
AND p.pro_price =
( SELECT MAX(p.pro_price)
FROM product p
WHERE p.com_id = c.com_id );
The query returned expected result.
com_name pro_name pro_price
--------- --------- -----------
Samsung Monitor 5000.00
iBall DVD drive 900.00
Epsion Printer 2600.00
Zebronics ZIP drive 250.00
Asus Mother Board 3200.00
Frontech Speaker 550.00
But I cannot understand how, especially the part inside the bottom sub-query. Isn't SELECT MAX(p.pro_price) supposed to return only the single highest price across all companies?
I also tried running this sub-query on its own, like this:
SELECT MAX(p.pro_price)
FROM product p
INNER JOIN company c ON p.com_id = c.com_id
WHERE p.com_id = c.com_id;
... and it only returned 1 maximum value.
max(p.pro_price)
-----
5000.00
So how does the final result of the whole query include more than 1 record? There's no GROUP BY or anything.
By the way, the query seems to use 2 conditions for the INNER JOIN. I also tried moving the 2nd condition into a WHERE clause and it still worked the same. This is one more thing I don't understand.
The tables involved
COMPANY table
COM_ID | COM_NAME
----------------
11 | Samsung
12 | iBall
13 | Epsion
14 | Zebronics
15 | Asus
16 | Frontech
PRODUCT table
PRO_NAME PRO_PRICE COM_ID
-------------------- ---------- ---------
Mother Board 3200 15
Key Board 450 16
ZIP drive 250 14
Speaker 550 16
Monitor 5000 11
DVD drive 900 12
CD drive 800 12
Printer 2600 13
Refill cartridge 350 13
Mouse 250 12
The sub-query is a correlated sub-query: it is executed once for each value of c.com_id in the outer query, because of this condition:
WHERE p.com_id = c.com_id
So instead of one global maximum, it yields each company's own maximum price, and only the product rows matching that per-company maximum survive the join.
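One way to see the same per-company behavior without the correlation (a sketch of an equivalent rewrite, not the w3resource sample answer) is to compute each company's maximum price once in a derived table and then join on it:
SELECT c.com_name, p.pro_name, p.pro_price
FROM product p
INNER JOIN company c ON p.com_id = c.com_id
INNER JOIN (SELECT com_id, MAX(pro_price) AS max_price
            FROM product
            GROUP BY com_id) m
        ON m.com_id = p.com_id
       AND m.max_price = p.pro_price;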

Concatenating data from one row into the results from another

I have a SQL Server database of orders I'm struggling with. For a normal order a single table provides the following results:
Orders:
ID    Customer    Shipdate      Order ID
-----------------------------------------
1     Tom         2015-01-01    100
2     Bob         2014-03-20    200
At some point they needed orders that were placed by more than one customer. So they created a row for each customer and split the record over multiple rows.
Orders:
ID    Customer    Shipdate      Order ID
-----------------------------------------
1     Tom         2015-01-01    100
2     Bob         2014-03-20    200
3     John
4     Dan
5                 2014-05-10    300
So there is another table I can join on to make sense of this, which relates the three rows that are actually one order.
Joint.Orders:
ID    Related ID
-----------------
5     3
5     4
I'm a little new to SQL. I can join on the other table and filter to only get the data relating to Order ID 300, but what I'd really like is to concatenate the customers, and after searching for a while I can't see how to do this. What I'd really like to achieve is this as an output:
ID    Customer     Shipdate      Order ID
-----------------------------------------
1     Tom          2015-01-01    100
2     Bob          2014-03-20    200
5     John, Dan    2014-05-10    300
You should consider changing the schema first. The below query might help you get a feel of how it can be done with your current design.
Select * From Orders Where IsNull(Customer, '') <> '' And OrderID Is Not Null  -- rows that already carry their own customer and order
Union All
Select O.ID,
       Customer = Stuff((Select ', ' + OI.Customer
                         From Orders OI
                         Where OI.ID In (Select RelatedID From JointOrders JO Where JO.ID = O.ID)
                         For Xml Path('')), 1, 2, ''),
       O.ShipDate, O.OrderID
From Orders O Where IsNull(O.Customer, '') = ''
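On SQL Server 2017 or later, STRING_AGG can replace the FOR XML PATH trick. A minimal sketch along the same lines, assuming the Orders and JointOrders names used above, with normal orders simply keeping their own customer:
Select O.ID,
       IsNull((Select String_Agg(OI.Customer, ', ')
               From Orders OI
               Where OI.ID In (Select RelatedID From JointOrders JO Where JO.ID = O.ID)),
              O.Customer) As Customer,
       O.ShipDate, O.OrderID
From Orders O
Where O.OrderID Is Not Null   -- skip the split-off customer-only rows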

How do you determine the average total of a column in Postgresql?

Consider the following Postgresql database table:
id | book_id | author_id
---+---------+----------
 1 |       1 |         1
 2 |       2 |         1
 3 |       3 |         2
 4 |       4 |         2
 5 |       5 |         2
 6 |       6 |         3
 7 |       7 |         2
In this example, Author 1 has written 2 books, Author 2 has written 4 books, and Author 3 has written 1 book. How would I determine the average number of books written by an author using SQL? In other words, I'm trying to get, "An author has written an average of 2.3 books".
Thus far, attempts with AVG and COUNT have failed me. Any thoughts?
select avg(totalbooks) from
(select count(1) totalbooks from books group by author_id) bookcount
I think your example data actually only has 3 books for author id 2, so this would not return 2.3
http://sqlfiddle.com/#!15/3e36e/1
With the 4th book:
http://sqlfiddle.com/#!15/67eac/1
You'll need a subquery. The inner query will count the books with GROUP BY author; the outer query will scan the results of the inner query and avg them.
You can use a subquery in the FROM clause for this, or you can use a CTE (WITH expression).
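For instance, a sketch of the CTE form, assuming the table is called books as in the answer above:
WITH per_author AS (
    SELECT author_id, COUNT(*) AS totalbooks
    FROM books
    GROUP BY author_id
)
SELECT AVG(totalbooks) FROM per_author;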
For an average number of books per author you can do simply:
SELECT 1.0*COUNT(DISTINCT book_id)/count(DISTINCT author_id) FROM tbl;
For number of books per author:
SELECT 1.0*COUNT(DISTINCT book_id)/count(DISTINCT author_id)
FROM tbl GROUP BY author_id;
We need the 1.0 factor so the result is not truncated to an integer.
You can remove DISTINCT depending on the result you want (it only matters if one book has many authors).
As Craig Ringer rightly pointed out, 2 DISTINCTs may be expensive. To test performance I generated 50,000 rows and got the following results:
My query with 2 DISTINCTS: ~70ms
My query with 1 DISTINCT: ~40ms
Martin Booth's approach: ~30ms
Then I added 1 million rows and tested again:
My query with 2 DISTINCTS: ~1520ms
My query with 1 DISTINCT: ~820ms
Martin Booth's approach: ~1060ms
Then I added another 9 million rows and tested again:
My query with 2 DISTINCTS: ~17s
My query with 1 DISTINCT: ~11s
Martin Booth's approach: ~19s
So there is no universal solution.
This should work:
SELECT AVG(cnt) FROM (
SELECT COUNT(*) cnt FROM t
GROUP BY author_id
) s