Excluding multiple results in specific column (SQL JOIN) - sql

I'm taking my first steps in terms of practical SQL use in real life.
I have a few tables with contractual and financial information and the query works exactly as I need - to a certain point. It looks more or less like that:
SELECT /some columns/ from CONTRACTS
Linked 3 extra tables with INNER JOIN to add things like department names, product information etc. This all works but they all have simplish one-to-one relationship (one contract related to single department in Department table, one product information entry in the corresponding table etc).
Now this is my challenge:
I also need to add contract invoicing information doing something like:
inner join INVOICES on CONTRACTS.contnoC = INVOICES.contnoI
(and selecting also the Invoice number linked to the Contract number, although that's partly optional)
The problem I'm facing is that unlike with other tables where there's always one-to-one relationship when joining tables, INVOICES table can have multiple (or none at all) entries that correspond to a single contract no. The result is that I will get multiple query results for a single contract no (with different invoice numbers presented), needlessly crowding the query results.
Essentially I'm looking to add the INVOICES table to the query just to identify whether the contract no is present in the INVOICES table (contract has been invoiced or not). The invoice number itself could be presented (it is with INNER JOIN), but it's not critical as long as the contract is somehow marked. The invoice number field should remain blank in the result when no match is found in the INVOICES table, i.e. the row must still be presented.
SELECT DISTINCT would look to do what I need, but I seemed to face the problem that I need to levy DISTINCT criteria only for column representing contract numbers, NOT any other column (there can be same values presented, but all those should be presented).
Unfortunately I'm not totally aware of what database system I am using.

Seems like the question is still getting some attention and in an effort to provide some explanation here are a few techniques.
If you just want any contract with details from the 1-to-1 tables, you can do it similarly to what you have described, the key being NOT to include any column from the Invoices table in the column list.
SELECT
DISTINCT Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
INNER JOIN Invoices i
ON c.contnoC = i.contnoI
Perhaps a little cleaner would be to use EXISTS or IN, like so:
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC )
Or with IN:
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
contnoC IN (SELECT contnoI FROM Invoices)
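Be careful if the SELECT ... list can return a NULL! IN itself still matches correctly, but the NOT IN form returns no rows at all when the list contains a NULL, so prefer EXISTS / NOT EXISTS in that case.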
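To see what a NULL in the subquery actually does, here is a minimal sketch using Python's built-in sqlite3 module. The two tiny tables and their rows are made up for illustration; only the column names contnoC and contnoI come from the question. Other databases follow the same three-valued logic, but treat this as a demonstration rather than a statement about your system.

```python
import sqlite3

# Hypothetical miniature of the Contracts/Invoices tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contracts (contnoC INTEGER);
    CREATE TABLE Invoices  (contnoI INTEGER);
    INSERT INTO Contracts VALUES (1), (2), (3);
    INSERT INTO Invoices  VALUES (1), (NULL);  -- one invoice row has no contract number
""")


def contract_numbers(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# IN still finds the invoiced contract even though the list contains a NULL.
in_rows = contract_numbers("""
    SELECT contnoC FROM Contracts
    WHERE contnoC IN (SELECT contnoI FROM Invoices)
    ORDER BY contnoC""")

# NOT IN returns nothing: "2 NOT IN (1, NULL)" evaluates to unknown, not true.
not_in_rows = contract_numbers("""
    SELECT contnoC FROM Contracts
    WHERE contnoC NOT IN (SELECT contnoI FROM Invoices)
    ORDER BY contnoC""")

# NOT EXISTS is unaffected by the NULL and returns the uninvoiced contracts.
not_exists_rows = contract_numbers("""
    SELECT contnoC FROM Contracts c
    WHERE NOT EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC)
    ORDER BY contnoC""")

print(in_rows, not_in_rows, not_exists_rows)
```

The surprising case is the middle one, which is why NOT EXISTS is usually the safer way to ask for "contracts without invoices".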
If you actually want all of the contracts and just need to know whether a contract has been invoiced, you can use aggregation and a CASE expression:
SELECT
Contract, Department, ProductId, CASE WHEN COUNT(i.contnoI) = 0 THEN 0 ELSE 1 END as Invoiced
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
LEFT JOIN Invoices i
ON c.contnoC = i.contnoI
GROUP BY
Contract, Department, ProductId
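As a quick way to try the LEFT JOIN plus COUNT idea, here is a small sketch with Python's sqlite3 module. The table layout is cut down to the bare minimum and the sample rows are invented; the point is only that a contract with two invoices and a contract with none each come back exactly once.

```python
import sqlite3

# Invented sample data: contract 1 has two invoices, contract 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contracts (contnoC INTEGER, Department TEXT);
    CREATE TABLE Invoices  (InvoiceNo INTEGER, contnoI INTEGER);
    INSERT INTO Contracts VALUES (1, 'Sales'), (2, 'IT'), (3, 'HR');
    INSERT INTO Invoices  VALUES (100, 1), (101, 1), (102, 3);
""")

# COUNT(i.contnoI) counts only matched invoice rows, so unmatched contracts get 0.
flag_rows = conn.execute("""
    SELECT c.contnoC, c.Department,
           CASE WHEN COUNT(i.contnoI) = 0 THEN 0 ELSE 1 END AS Invoiced
    FROM Contracts c
    LEFT JOIN Invoices i ON c.contnoC = i.contnoI
    GROUP BY c.contnoC, c.Department
    ORDER BY c.contnoC
""").fetchall()

for row in flag_rows:
    print(row)
```

Note that COUNT(*) would not work here, because the NULL-extended row of an uninvoiced contract would still be counted as one row.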
Then, if you actually want to return details about a particular invoice, you can use a technique similar to that of cybercentic87 if your RDBMS supports it, or you could use a calculated column with TOP or LIMIT, depending on your system.
SELECT
Contract, Department, ProductId, (SELECT TOP 1 InvoiceNo FROM invoices i WHERE c.contnoC = i.contnoI ORDER BY CreateDate DESC) as LatestInvoiceNo
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
-- no GROUP BY needed: Invoices is not joined, so each contract appears only once

I would do it this way:
with mainquery as (
<<here goes your main query>>
),
invoices_rn as (
select *,
ROW_NUMBER() OVER (PARTITION BY contnoI order by
<<some column to decide which invoice you want to take eg. date>>) as rn
from invoices
),
first_invoice as (
select * from invoices_rn where rn = 1
)
select * from mainquery
left join first_invoice i on contnoC = i.contnoI
This gives you an ability to get all of the invoice details to your query, also it gives you full control of which invoice you want see in your main query. Please read more about CTEs; they are pretty handy and much easier to understand / read than nested selects.
I still don't know what database you are using. If ROW_NUMBER is not available, I will figure out something else :)
Also with a left join you should use COALESCE function for example:
COALESCE(i.invoice_number,'0')
Of course this gives you some more possibilities, you could for example in your main select do:
CASE WHEN i.invoice_number is null then 'NOT INVOICED'
else 'INVOICED'
END as isInvoiced
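Putting the pieces of this answer together, here is a rough sketch that can be run with Python's sqlite3 module (window functions need SQLite 3.25 or newer). The `created` column used to pick the newest invoice and all sample rows are assumptions made for the example, not something taken from the question.

```python
import sqlite3

# Invented sample data: contract 1 has two invoices, contract 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contracts (contnoC INTEGER);
    CREATE TABLE Invoices  (invoice_number TEXT, contnoI INTEGER, created TEXT);
    INSERT INTO Contracts VALUES (1), (2);
    INSERT INTO Invoices  VALUES ('A-1', 1, '2020-01-01'), ('A-2', 1, '2020-02-01');
""")

# Number the invoices per contract (newest first) and join only row number 1,
# so every contract appears once whether it has zero, one or many invoices.
latest_rows = conn.execute("""
    WITH invoices_rn AS (
        SELECT i.*,
               ROW_NUMBER() OVER (PARTITION BY contnoI ORDER BY created DESC) AS rn
        FROM Invoices i
    )
    SELECT c.contnoC,
           COALESCE(i.invoice_number, '0') AS invoice_number,
           CASE WHEN i.invoice_number IS NULL THEN 'NOT INVOICED'
                ELSE 'INVOICED'
           END AS isInvoiced
    FROM Contracts c
    LEFT JOIN invoices_rn i ON c.contnoC = i.contnoI AND i.rn = 1
    ORDER BY c.contnoC
""").fetchall()

for row in latest_rows:
    print(row)
```

Here the `rn = 1` filter sits in the ON clause instead of a separate CTE; both forms should behave the same, the important part is that it is applied before the left join rather than in WHERE.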

You can use
SELECT ..., invoiced = 'YES' ... where exists ...
union
SELECT ..., invoiced = 'NO' ... where not exists ...
or you can use a column like "invoiced" with a subquery into invoices to set its value depending on whether you get a hit or not
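Both variants can be compared side by side. The sketch below uses Python's sqlite3 module with invented rows; the `CASE WHEN EXISTS` form is one possible way to write the "column with a subquery" idea, not necessarily what the answer had in mind, and `'YES' AS invoiced` is the portable spelling of `invoiced = 'YES'`.

```python
import sqlite3

# Invented sample data: contract 1 is invoiced twice, contract 2 not at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contracts (contnoC INTEGER);
    CREATE TABLE Invoices  (contnoI INTEGER);
    INSERT INTO Contracts VALUES (1), (2);
    INSERT INTO Invoices  VALUES (1), (1);
""")

# Variant 1: two SELECTs glued together with UNION.
union_rows = conn.execute("""
    SELECT c.contnoC, 'YES' AS invoiced FROM Contracts c
    WHERE EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC)
    UNION
    SELECT c.contnoC, 'NO' FROM Contracts c
    WHERE NOT EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC)
    ORDER BY 1
""").fetchall()

# Variant 2: a single SELECT with a computed "invoiced" column.
case_rows = conn.execute("""
    SELECT c.contnoC,
           CASE WHEN EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC)
                THEN 'YES' ELSE 'NO' END AS invoiced
    FROM Contracts c
    ORDER BY c.contnoC
""").fetchall()

print(union_rows)
print(case_rows)
```

The second form reads the Contracts table only once, which is usually the reason to prefer it.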

Related

Group By a Specific Column

I had this query that was working fine...
select a.action_id, a.request_date, a.customer_id from customer.actions a
inner join customer.customers c on c.id = a.customer_id
inner join customer.entities e on e.id = c.entity_id
where request_url like '%/test%'
order by a.request_date desc;
This returned about 500 rows. Now I want to inner join another table, but it returns around 4000 rows.
select a.action_id, a.customer_id, a.request_date, mp.payment_type from customer.actions a
inner join customer.customers c on c.id = a.customer_id
inner join customer.entities e on e.id = c.entity_id
inner join customer.mobile_payments mp on mp.customer_id = a.customer_id
where request_url like '%/test%'
order by a.request_date desc;
I get duplicate records for every record on the mobile_payments table.
Group by does not work
How do I remove the duplicate action_ids ? DISTINCT (a.action_id) doesn't work
EDIT:
I just want the records from the customer.actions filtered by where request_url like '%/test%' and I want to get the payment type from the mobile_payments table. 1 for each action_id. When I do that I get duplicate actions ids.
As accurately pointed out by J0eBl4ck in comment, the issue is that you are joining to the payments table on the sole condition of matching customer. Think of it like this. You have a credit card for purchases over 10 years. Each month you make a payment, so you have 120 payments made under your account.
Is there some other basis you are joining to the payment table? You have an action ID? Does that mean something? Do you intend to get a payment associated to that specific action? Is there such a relationship where it would be
customer -> payments joined on customer_id AND action_id?
A secondary option is to apply a sub-query from the payments table grouped by the customer id so it only returns a single row, this way it does not distort your overall record count. You may need to adjust the underlying criteria some, but this should help you look into final resolution to your needs.
select
a.action_id,
a.customer_id,
a.request_date,
mp.payment_type
from
customer.actions a
inner join customer.customers c
on a.customer_id = c.id
inner join customer.entities e
on c.entity_id = e.id
inner join
-- this "PQ" Pre-Query is by customer ID and example
-- to get the payment type based on whatever the most recent
-- transaction in first position
( select mp.customer_id,
mp.payment_type,
row_number() over (partition by mp.customer_id
order by mp.WhateverPrimaryKeyOrDateIs DESC ) as rowSeq
from
customer.mobile_payments mp ) PQ
-- so here, we are only joining on the FIRST (most recent due to descending)
-- payment type applicable for the customer ID in question
on a.customer_id = PQ.customer_id
AND PQ.rowSeq = 1
where
request_url like '%/test%'
order by
a.request_date desc;
Again, you WILL probably need to adjust the pre-query, as we don't have the context that associates a payment with a given action. Hopefully it will guide you toward your final goal, though.
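The row multiplication and the pre-query fix can be reproduced on a toy data set. This is a sketch using Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER); the `payment_id` column standing in for "WhateverPrimaryKeyOrDateIs" and all rows are made up for the example.

```python
import sqlite3

# Invented data: customer 10 has two actions and three payments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (action_id INTEGER, customer_id INTEGER);
    CREATE TABLE mobile_payments (payment_id INTEGER, customer_id INTEGER,
                                  payment_type TEXT);
    INSERT INTO actions VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO mobile_payments VALUES
        (1, 10, 'card'), (2, 10, 'sms'), (3, 10, 'wallet'), (4, 20, 'card');
""")

# Joining on customer_id alone repeats every action once per payment.
naive_rows = conn.execute("""
    SELECT a.action_id, mp.payment_type
    FROM actions a
    INNER JOIN mobile_payments mp ON mp.customer_id = a.customer_id
""").fetchall()

# The pre-query keeps one payment per customer (highest payment_id),
# so the number of action rows is no longer distorted.
deduped_rows = conn.execute("""
    SELECT a.action_id, pq.payment_type
    FROM actions a
    INNER JOIN (
        SELECT mp.customer_id, mp.payment_type,
               ROW_NUMBER() OVER (PARTITION BY mp.customer_id
                                  ORDER BY mp.payment_id DESC) AS rowSeq
        FROM mobile_payments mp
    ) pq ON pq.customer_id = a.customer_id AND pq.rowSeq = 1
    ORDER BY a.action_id
""").fetchall()

print(len(naive_rows), "rows without the pre-query")
print(deduped_rows)
```

Whether "most recent payment of the customer" is the right payment for an action is still a business question; the sketch only shows how to stop the duplication.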

How to put conditions on left joins

I have two tables, CustomerCost and Products that look like the following:
I am joining the two tables using the following SQL query:
SELECT custCost.ProductId,
custCost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod ON Cost.productId =prod.productId
WHERE prod.productId=4
AND (Cost.Customer_Id =2717
OR Cost.Customer_Id IS NULL)
The result of the join is:
joins result
What i want to do is when I pass customerId 2717 it should return only specific customer cost i.e. 258.93, and when customerId does not match then only it should take cost as 312.50
What am I doing wrong here?
You can get your expected output as follows:
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
INNER JOIN PRODUCTS prod ON Cost.productId = prod.productId
WHERE prod.productId=4
AND Cost.Customer_Id = 2717
However, if you want to allow customer ID to be passed as NULL, you will have to change the last line to AND Cost.Customer_Id IS NULL. To do so dynamically, you'll need to use variables and generate the query based on the input.
The problem in the original query that you have posted is that you have used an alias called custCost which is not present in the query.
EDIT: Actually, you don't even need a join. The CUSTOMERCOST table seems to have both Customer and Product IDs.
You can simply:
SELECT
Cost.ProductId, Cost.CustomerCost
FROM
CUSTOMERCOST Cost
WHERE
Cost.Customer_Id = 2717
AND Cost.productId = 4
You seem to want:
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id = 2717
UNION ALL
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id IS NULL AND
NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2 WHERE c2.productId = 4 AND c2.Customer_Id = 2717);
That is, take the matching cost, if it exists for the customer. Otherwise, take the default cost.
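The fallback logic can be checked with a small sketch in Python's sqlite3 module. The two costs are the ones quoted in the question; the column types, the helper name `cost_for` and the customer id 9999 are assumptions made for the example.

```python
import sqlite3

# Two rows for product 4: a customer-specific cost and a default (NULL customer).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERCOST (productId INTEGER, Customer_Id INTEGER,
                               CustomerCost REAL);
    INSERT INTO CUSTOMERCOST VALUES (4, 2717, 258.93), (4, NULL, 312.50);
""")

FALLBACK_SQL = """
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :pid AND c.Customer_Id = :cid
    UNION ALL
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :pid AND c.Customer_Id IS NULL
      AND NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2
                      WHERE c2.productId = :pid AND c2.Customer_Id = :cid)
"""


def cost_for(product_id, customer_id):
    """Return the customer's own cost if present, otherwise the default cost."""
    params = {"pid": product_id, "cid": customer_id}
    return [row[0] for row in conn.execute(FALLBACK_SQL, params)]


print(cost_for(4, 2717))  # customer-specific cost
print(cost_for(4, 9999))  # unknown customer falls back to the default row
```

Passing the ids as bound parameters also covers the "do it dynamically" part without building the SQL string by hand.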
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod
ON Cost.productId =prod.productId
AND (Cost.Customer_Id =2717 OR Cost.Customer_Id IS NULL)
WHERE prod.productId=4
WHERE applies to the joined row. ON controls the join condition.
Outer joins are why the explicit JOIN ... ON syntax was added in SQL-92. The old SQL-89
syntax had no support for them, and different vendors added different,
incompatible syntax to support them.
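The difference between filtering in ON and filtering in WHERE is easiest to see on a tiny example. This sketch uses Python's sqlite3 module; the second product (id 5) and the reduced column lists are invented, and PRODUCTS is put on the left side here so that the outer join has something to preserve.

```python
import sqlite3

# Product 4 has a cost for customer 2717, product 5 has no cost rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTS (productId INTEGER);
    CREATE TABLE CUSTOMERCOST (productId INTEGER, Customer_Id INTEGER,
                               CustomerCost REAL);
    INSERT INTO PRODUCTS VALUES (4), (5);
    INSERT INTO CUSTOMERCOST VALUES (4, 2717, 258.93);
""")

# Condition in ON: it only decides which cost rows match; every product survives.
on_rows = conn.execute("""
    SELECT p.productId, c.CustomerCost
    FROM PRODUCTS p
    LEFT JOIN CUSTOMERCOST c
           ON c.productId = p.productId AND c.Customer_Id = 2717
    ORDER BY p.productId
""").fetchall()

# Condition in WHERE: it is applied after the join, so the NULL-extended row
# for product 5 is thrown away and the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT p.productId, c.CustomerCost
    FROM PRODUCTS p
    LEFT JOIN CUSTOMERCOST c ON c.productId = p.productId
    WHERE c.Customer_Id = 2717
    ORDER BY p.productId
""").fetchall()

print(on_rows)
print(where_rows)
```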

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skills are close to none, it looks like it's doing its job. But now I would like to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder if I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.week = w.week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
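For anyone who wants to watch the CROSS JOIN plus LEFT JOIN approach fill in the zeros, here is a sketch with Python's sqlite3 module. The tables follow the column lists from the question (Customers.Id, Addresses.Id and so on); the customers, streets and totals are invented.

```python
import sqlite3

# Invented data: Ann has sales in weeks 1 and 2, Bob only in week 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER, Name TEXT);
    CREATE TABLE Addresses (Id INTEGER, Street TEXT, StreetNo TEXT,
                            CustomerId INTEGER);
    CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Addresses VALUES (1, 'Main', '5', 1), (2, 'Oak', '7', 2);
    INSERT INTO Sales VALUES (1, 1, 10), (1, 1, 5), (1, 2, 20), (2, 2, 7);
""")

# CROSS JOIN builds every customer/week combination first; the LEFT JOIN to
# Sales then attaches totals where they exist, and COALESCE turns the rest into 0.
weekly_rows = conn.execute("""
    SELECT c.Name, a.Street, w.Week, COALESCE(SUM(s.Total), 0) AS Total
    FROM Customers c
    CROSS JOIN (SELECT DISTINCT Week FROM Sales) w
    LEFT JOIN Addresses a ON a.CustomerId = c.Id
    LEFT JOIN Sales s ON s.AddressId = a.Id AND s.Week = w.Week
    GROUP BY c.Name, a.Street, a.StreetNo, w.Week
    ORDER BY c.Name, w.Week
""").fetchall()

for row in weekly_rows:
    print(row)
```

Bob's week 1 shows up with a total of 0 even though the Sales table has no row for it, which is the behaviour asked for.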
You should generate the week numbers rather than using DISTINCT. This is better in terms of performance and reliability, since a week in which nobody had any sales would otherwise be missing entirely. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

SQL Beginner: Getting items from 2 tables (+grouping+ordering)

I have an e-commerce website (using VirtueMart) and I sell products that consist of child products. When a product is a parent, it doesn't have a ParentID, while its children refer to it. I know, not the best logic, but I didn't create it.
My SQL is very basic and I believe I ask for something quite easy to achieve
Select products that have children.
Sort results by prices (ASC/DESC).
SELECT * FROM Products INNER JOIN Prices ON Products.ProductID = Prices.ProductID ORDER BY Prices.Price [ASC/DESC]
Explanation:
SELECT - Select (Get/Retrieve)
* - ALL columns
FROM Products - Get them from a DB Table named "Products".
INNER JOIN Prices - Join DB Table "Products" with DB Table "Prices", keeping only the rows that have a match in both tables.
ON - Like WHERE, this defines which rows will be checked for matches.
Products.ProductID = Prices.ProductID - Your match criteria. Get the rows where "ProductID" exists in both DB Tables "Products" and "Prices".
ORDER BY Prices.Price [ASC/DESC] - Sorting. Use ASC for Ascending, DESC for Descending.
This table design is subpar for a number of reasons. First, it appears that the value 0 is being used to indicate lack of a parent (as there's no 0 ID for products). Typically this will be a NULL value instead.
If it were a NULL value, the SQL statement to get everything without a parent would be as simple as this:
SELECT * FROM Products WHERE ParentID IS NULL
However, we can't do that. If we make the assumption that 0 = no parent, we can do this:
SELECT * FROM Products WHERE ParentID = 0
However, that's a dangerous assumption to make. Thus, the correct way to do this (given your schema above) is to check the table against itself and keep only the products whose ProductID appears as some other row's ParentID:
SELECT a.*
FROM Products AS a
WHERE EXISTS (SELECT * FROM Products AS b WHERE a.ProductID = b.ParentID)
Next, to get the pricing, we have to join those two tables together on a common ID. As the Prices table seems to reference a ProductID, we can use that like so:
SELECT p.ProductID, p.ProductName, pr.Price
FROM Products AS p INNER JOIN Prices AS pr ON p.ProductID = pr.ProductID
WHERE EXISTS (SELECT * FROM Products AS b WHERE p.ProductID = b.ParentID)
ORDER BY pr.Price
That might be sufficient per the data you've shown, but usually that type of table structure indicates that it's possible to have more than one price associated with a product (we're unable to tell whether this is true based on the quick snapshot).
That should get you close... if you need something more, we'll need more detail.
Use the script below if you are using SSMS.
SELECT pd.ProductId,ProductName,Price
FROM product pd
LEFT JOIN price pr ON pd.ProductId=pr.ProductID
WHERE EXISTS (SELECT 1 FROM product pd1 WHERE pd.productID=pd1.ParentID)
ORDER BY pr.Price ASC
Note: neither of your parent products has a price in the price table. If you want the sum of the prices of their child products, use the script below.
SELECT pd.ProductId,pd.ProductName,SUM(ISNULL(pr.Price,0)) SUM_ChildPrice
FROM product pd
LEFT JOIN product pd1 ON pd.productID=pd1.ParentID
LEFT JOIN price pr ON pd1.ProductId=pr.ProductID
GROUP BY pd.ProductId,pd.ProductName
ORDER BY SUM_ChildPrice ASC
You will have to use a self-join, for example:
SELECT * FROM products parent
JOIN products children ON parent.id = children.parent_id
JOIN prices ON prices.product_id = children.id
ORDER BY prices.price
Because we are using JOIN it will filter out all entries that don't have any children.
I haven't tested it, I hope it would work.
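For what it's worth, the self-join can be tried on a toy table with Python's sqlite3 module. The column names follow the query above (id, parent_id, product_id, price); the product names, the use of 0 for "no parent" and the prices are invented for the sketch.

```python
import sqlite3

# Invented data: "Parent A" has two priced children, "Lonely" has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER, parent_id INTEGER, name TEXT);
    CREATE TABLE prices (product_id INTEGER, price REAL);
    INSERT INTO products VALUES
        (1, 0, 'Parent A'), (2, 1, 'Child A1'), (3, 1, 'Child A2'), (4, 0, 'Lonely');
    INSERT INTO prices VALUES (2, 9.5), (3, 4.0);
""")

# The same table is joined to itself under two aliases: one row per
# (parent, child) pair, and the inner join drops products without children.
child_rows = conn.execute("""
    SELECT parent.name, children.name, prices.price
    FROM products parent
    JOIN products children ON parent.id = children.parent_id
    JOIN prices ON prices.product_id = children.id
    ORDER BY prices.price
""").fetchall()

for row in child_rows:
    print(row)
```

Note that a parent appears once per child here; if you want each parent only once you would still need DISTINCT or a GROUP BY on the parent columns.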

SQL Query to find MAX Date

I have some software that uses dBase4 for its database. I am attempting to construct a report using fields from 3 tables (Customer, Service & History).
In all of the tables the ACCOUNT field is the same. The 'Customer' and the 'Service' tables only have one record for each Customer. The 'History' table has multiple records for each Customer.
I need to write a query so that only the record with the MAX date in 'History.BILLTHRU' is returned for each Customer. The code below returns all of the records for each Customer in the History table:
SELECT Customer.ACCOUNT,
Customer.FIRSTNAME,
(more fields...),
History.ACCOUNT,
History.BILLTHRU,
Service.ACCOUNT,
Service.OFFERCODE
FROM "C:\Customer.dbf" Customer
INNER JOIN "C:\History.dbf" History
ON (Customer.ACCOUNT = History.ACCOUNT)
INNER JOIN "C:\Service.dbf" Service
ON (Customer.ACCOUNT = Service.ACCOUNT)
WHERE Customer.STATUS = "A"
ORDER BY Customer.LAST_BUS_NAME
Use a sub-query and a group by:
SELECT Customer.ACCOUNT,
Customer.FIRSTNAME,
(more fields...),
History.ACCOUNT,
History.BILLTHRU,
Service.ACCOUNT,
Service.OFFERCODE
FROM "C:\Customer.dbf" Customer
INNER JOIN (SELECT ACCOUNT, MAX(BILLTHRU) AS BILLTHRU
FROM "C:\History.dbf"
GROUP BY ACCOUNT) History
ON (Customer.ACCOUNT = History.ACCOUNT)
INNER JOIN "C:\Service.dbf" Service
ON (Customer.ACCOUNT = Service.ACCOUNT)
WHERE Customer.STATUS = "A"
ORDER BY Customer.LAST_BUS_NAME
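The sub-query idea can be tried outside dBase as well. The sketch below uses Python's sqlite3 module as a stand-in, so the .dbf paths are replaced by plain tables, the dates are stored as ISO strings (an assumption, so that MAX sorts them correctly), and the rows are invented.

```python
import sqlite3

# Invented data: account 1 has two history rows, account 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (ACCOUNT INTEGER, FIRSTNAME TEXT);
    CREATE TABLE History  (ACCOUNT INTEGER, BILLTHRU TEXT);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO History  VALUES
        (1, '2020-01-31'), (1, '2020-03-31'), (2, '2020-02-29');
""")

# The derived table collapses History to one row per account before the join,
# so the join can no longer multiply the customer rows.
max_rows = conn.execute("""
    SELECT Customer.ACCOUNT, Customer.FIRSTNAME, History.BILLTHRU
    FROM Customer
    INNER JOIN (SELECT ACCOUNT, MAX(BILLTHRU) AS BILLTHRU
                FROM History
                GROUP BY ACCOUNT) History
            ON Customer.ACCOUNT = History.ACCOUNT
    ORDER BY Customer.ACCOUNT
""").fetchall()

for row in max_rows:
    print(row)
```

This only returns the date itself; if other History columns are needed, the derived table has to be joined back to History on ACCOUNT and BILLTHRU.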
I like to use common table expressions (CTEs). Subqueries are good, but breaking it out like this sometimes makes it easier to keep separate.
with GetMaxDate as (
select account, max(billthru) as MaxBillThru
from "C:\History.dbf"
group by account
)
SELECT Customer.ACCOUNT,
Customer.FIRSTNAME,
(more fields...),
GetMaxDate.ACCOUNT,
GetMaxDate.MaxBillThru,
Service.ACCOUNT,
Service.OFFERCODE
.....
FROM "C:\Customer.dbf" Customer
INNER JOIN GetMaxDate on customer.ACCOUNT = GetMaxDate.Account
INNER JOIN "C:\Service.dbf" Service
ON (Customer.ACCOUNT = Service.ACCOUNT)
WHERE Customer.STATUS = "A"
ORDER BY Customer.LAST_BUS_NAME
EDIT: CTEs are a feature of SQL Server and other modern databases, so this may not work in dBase. I'm leaving it in case it can help you or someone else. I'll delete it if it just clouds the issue.