Simple Table Join with access to joined table columns - sql

I have a simple table of items. Call them "Parts".
Each part can have zero or more related entries in a seperate table, call them "SubParts".
A simple view of the tables is, as you would probably expect:
Parts
-----
PartID int (PK)
PartName varchar
SubParts
--------
SubPartID int (PK)
PartID int (FK_Parts)
SubPartName varchar
SubPartAdded datetime
I would like to return all parts from the primary table, but also have access to the LATEST (order by SubPartAdded DESC) related SubPart if it exists.
My confusion is that there are a 1M+ entries in the subparts table (for many different parts) and I only need the latest one for the current part, if it exists.
Earlier I wrote a statement that performed a left join between the Parts table and a derived table of related Subparts (which works) but the derived table seems to return ALL rows in the subparts table causing a performance hit. I essentially need to do a TOP 1 and order by DESC in the derived select statement, to prefilter the subparts by PartID (and some other columns). However as I cant seem to make reference to the Parts table (outer) columns in the derived select statement, I cant add a WHERE clause to the derived table.
I have also tried the following snippet which does execute, but doesnt return any related records:
SELECT p.PartName, sp.SubPartName, sp.SubPartAdded
FROM Parts p
LEFT JOIN (SELECT TOP 1 SubPartID, SubpartAdded, PartID FROM SubParts ORDER BY SubPartAdded) AS sp
ON sp.PartID = p.PartID
I imagine the "TOP 1" statement is executing against the whole SubParts table, before being filtered by the "ON" statement (?)
Ultimately I need to use some columns from the Subparts table in multiple locations thoughout the main stored proc, so I dont simply want a correlated subquery as this would need to be called multiple times.
(This proc will return multiple parts on each execution. ie. The proc will not be filtered by a single PartID)
I hope this is pretty clear?
It sounds like it should have a very simple solution, but I'm currently stumped!
(Compatibility with SQL Server 2K and above is required)
Regards
Nick

The following should work back to SQL Server 2000.
SELECT PartName, SubPartName, SubPartAdded
FROM Parts
LEFT JOIN
( SELECT SubParts.PartID, SubParts.SubPartName SubParts.SubPartAdded
FROM SubParts
INNER JOIN
( SELECT PartID, MAX(SubPartAdded) [SubPartAdded]
FROM SubParts
GROUP BY PartID
) MaxSubPart
ON MaxSubPart.PartID = SubParts.PartID
AND MaxSubPart.SubPartAdded = SubParts.SubPartAdded
) Subpart
ON SubPart.PartID = Parts.PartID
There are more efficient and elegent ways to do this in later versions (OUTER APPLY, or Window functions), but I am not certain how many of the methods are backwards compatitble to SQL Server 2000.

You need a derived table to extract last subparts. Then you can filter subparts joining it to derived table by id and SubPartAdded columns:
SELECT p.PartName, sp.SubPartName, sp.SubPartAdded
FROM Parts p
LEFT JOIN SubParts sp
ON p.PartID = sp.PartID
LEFT JOIN
(
SELECT PartID, max (SubPartAdded) MaxSubPartAdded
FROM SubParts
GROUP BY PartID
) AS MaxSP
ON sp.PartID = MaxSP.PartID
AND sp.SubPartAdded = MaxSP.MaxSubPartAdded
If Sql Server 2005 or newer:
SELECT p.PartName, sp.SubPartName, sp.SubPartAdded
FROM Parts p
OUTER APPLY
(
select top 1
SubPartName, SubPartAdded
from SubParts
where SubParts.PartID = p.PartID
order by SubPartAdded desc
) sp

Related

Complex SQL View with Joins & Where clause

My SQL skill level is pretty basic. I have certainly written some general queries and done some very generic views. But once we get into joins, I am choking to get the results that I want, in the view I am creating.
I feel like I am almost there. Just can't get the final piece
SELECT dbo.ics_supplies.supplies_id,
dbo.ics_supplies.old_itemid,
dbo.ics_supplies.itemdescription,
dbo.ics_supplies.onhand,
dbo.ics_supplies.reorderlevel,
dbo.ics_supplies.reorderamt,
dbo.ics_supplies.unitmeasure,
dbo.ics_supplies.supplylocation,
dbo.ics_supplies.invtype,
dbo.ics_supplies.discontinued,
dbo.ics_supplies.supply,
dbo.ics_transactions.requsitionnumber,
dbo.ics_transactions.openclosed,
dbo.ics_transactions.transtype,
dbo.ics_transactions.originaldate
FROM dbo.ics_supplies
LEFT OUTER JOIN dbo.ics_orders
ON dbo.ics_supplies.supplies_id = dbo.ics_orders.suppliesid
LEFT OUTER JOIN dbo.ics_transactions
ON dbo.ics_orders.requisitionnumber =
dbo.ics_transactions.requsitionnumber
WHERE ( dbo.ics_transactions.transtype = 'PO' )
When I don't include the WHERE clause, I get 17,000+ records in my view. That is not correct. It's doing this because we are matching on a 1 to many table. Supplies table is 12,000 records. There should always be 12,000 records. Never more. Never less.
The pieces that I am missing are:
I only need ONE matching record from the ICS_Transactions Table. Ideally, the one that I want is the most current 'ICS_Transactions.OriginalDate'.
I only want the ICS_Transactions Table fields to populate IF ICS_Transacions.Type = 'PO'. Otherwise, these fields should remain null.
Sample code or anything would help a lot. I have done a lot of research on joins and it's still very confusing to get what I need for results.
EDIT/Update
I feel as if I asked my question in the wrong way, or didn't give a good overall view of what I am asking. For that, I apologize. I am still very new to SQL, but trying hard.
ICS_Supplies Table has 12,810 records
ICS_Orders Table has 3,666 records
ICS_Transaction Table has 4,701 records
In short, I expect to see a result of 12,810 records. No more and no less. I am trying to create a View of ALL records from the ICS_Supplies table.
Not all records in Supply Table are in Orders and or Transaction Table. But still, I want to see all 12,810 records, regardless.
My users have requested that IF any of these supplies have an open PO (ICS_Transactions.OpenClosed = 'Open' and ICS_Transactions.InvType = 'PO') Then, I also want to see additional fields from ICS_Transactions (ICS_Transactions.OpenClosed, ICS_Transactions.InvType, ICS_Transactions.OriginalDate, ICS_Transactions.RequsitionNumber).
If there are no open PO's for supply record, then these additional fields should be blank/null (regardless to what data is in these added fields, they should display null if they don't meet the criteria).
The ICS_Orders Table is nly needed to hop from the ICS_Supplies to the ICS_Transactions (I first, need to obtain the Requisition Number from the Orders field, if there is one).
I am sorry if I am not doing a good job to explain this. Please ask if you need clarification.
Here's a simplified version of Ross Bush's answer (It removes a join from the CTE to keep things more focussed, speed things up, and cut down the code).
;WITH
ordered_ics_transactions AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY requisitionnumber
ORDER BY originaldate DESC
)
AS seq_id
FROM
dbo.ics_transactions
)
SELECT
s.supplies_id, s.old_itemid,
s.itemdescription, s.onhand,
s.reorderlevel, s.reorderamt,
s.unitmeasure, s.supplylocation,
s.invtype, s.discontinued,
s.supply,
t.requsitionnumber, t.openclosed,
t.transtype, t.originaldate
FROM
dbo.ics_supplies AS s
LEFT OUTER JOIN
dbo.ics_orders AS o
ON o.supplies_id = s.suppliesid
LEFT OUTER JOIN
ordered_ics_transactions AS t
ON t.requisitionnumber = o.requisitionnumber
AND t.transtype = 'PO'
AND t.seq_id = 1
This will only join the most recent transaction record for each requisitionnumber, and only if it has transtype = 'PO'
IF you want to reverse that (joining only transaction records that have transtype = 'PO', and of those only the most recent one), then move the transtype = 'PO' filter to be a WHERE clause inside the ordered_ics_transactions CTE.
You can possibly work with the query below to get what you need.
1. I only need ONE matching record from the ICS_Transactions Table. Ideally, the one that I want is the most current 'ICS_Transactions.OriginalDate'.
I would solve this by creating a CTE with all the ICS_Transaction fields needed in the query, rank-ordered by OPriginalDate, partitioned by suppliesid.
2. I only want the ICS_Transactions Table fields to populate IF ICS_Transacions.Type = 'PO'. Otherwise, these fields should remain null.
If you move the condition from the WHERE clause to the LEFT JOIN then ICS_Transactions not matching the criteria will be peeled and replaced with null values with the rest of the query records.
;
WITH ReqNumberRanked AS
(
SELECT
dbo.ICS_Orders.SuppliesID,
dbo.ICS_Transactions.RequisitionNumber,
dbo.ICS_Transactions.TransType,
dbo.ICS_Transactions.OriginalDate,
dbo.ICS_Transactions.OpenClosed,
RequisitionNumberRankReversed = RANK() OVER(PARTITION BY dbo.ICS_Orders.SuppliesID, dbo.ICS_Transactions.RequisitionNumber ORDER BY dbo.ICS_Transactions.OriginalDate DESC)
FROM
dbo.ICS_Orders
LEFT OUTER JOIN dbo.ICS_Transactions ON dbo.ICS_Orders.RequisitionNumber = dbo.ICS_Transactions.RequsitionNumber
)
SELECT
dbo.ICS_Supplies.Supplies_ID, dbo.ICS_Supplies.Old_ItemID,
dbo.ICS_Supplies.ItemDescription, dbo.ICS_Supplies.OnHand,
dbo.ICS_Supplies.ReorderLevel, dbo.ICS_Supplies.ReorderAmt,
dbo.ICS_Supplies.UnitMeasure,
dbo.ICS_Supplies.SupplyLocation, dbo.ICS_Supplies.InvType,
dbo.ICS_Supplies.Discontinued, dbo.ICS_Supplies.Supply,
ReqNumberRanked.RequsitionNumber,
ReqNumberRanked.OpenClosed,
ReqNumberRanked.TransType,
ReqNumberRanked.OriginalDate
FROM
dbo.ICS_Supplies
LEFT OUTER JOIN dbo.ICS_Orders ON dbo.ICS_Supplies.Supplies_ID = dbo.ICS_Orders.SuppliesID
LEFT OUTER JOIN ReqNumberRanked ON ReqNumberRanked.RequisitionNumber = dbo.ICS_Transactions.RequsitionNumber
AND (ReqNumberRanked.TransType = 'PO')
AND ReqNumberRanked.RequisitionNumberRankReversed = 1

How to use a SQL table instead of a long string for a WHERE in (string) condition

I have the following SQL statement:
select customer_id, prod_id, prod_start, prod_price
from prod_table
where prod_id in (PRODLIST)
Unfortunately, PRODLIST contains about 68K 6-digit numbers. When I try to run this query on my server, I get an error that SQL can't handle so many prod_id as presented in a string.
My next thought was to put all the 68K 6-digit numbers into a single column table included_prodlist with column heading included_prod_id. The resulting included_prodlist table would then be a single column table with 68K rows, and each column would be a unique 6-digit number.
I could then do an inner join of the original query with included_prodlist as follows:
select customer_id, prod_id, prod_start, prod_price
from prod_table
where prod_id in (select included_prod_id from included_prodlist)
Unfortunately, this doesn't seem to be working i.e. the query returns no entries.
Is this the proper way to deal with long conditions?
Should I be using an inner join instead?
select customer_id, prod_id, prod_start, prod_price
from prod_table
inner join included_prodlist on prod_table.prod_id = included_prodlist.included_prod_id
Putting the values in to a table is highly recommended. The one column should be the primary key.
Then, I would go for exists rather than not in:
select p.customer_id, p.prod_id, p.prod_start, p.prod_price
from prod_table p
where exists (select 1
from included_prodlist ip
where ip.included_prod_id = p.prod_id
);
Of course using INNER JOIN can be more helpful with a better performance. For best practices create the index which is recommended in query execution plan :)
INNER JOIN on a single column table is preferable on a nested query
In a nested query the internal query runs 1st and its results are placed in the outer query
Using join, there is only one query, preferably on indexed columns
You will be able to see the differences adding EXPLAIN before the SELECT command
I would generate the product list table as a temp table, with an indexed column, that way the query would run with join even faster

Specifying SELECT, then joining with another table

I just hit a wall with my SQL query fetching data from my MS SQL Server.
To simplify, say i have one table for sales, and one table for customers. They each have a corresponding userId which i can use to join the tables.
I wish to first SELECT from the sales table where say price is equal to 10, and then join it on the userId, in order to get access to the name and address etc. from the customer table.
In which order should i structure the query? Do i need some sort of subquery or what do i do?
I have tried something like this
SELECT *
FROM Sales
WHERE price = 10
INNER JOIN Customers
ON Sales.userId = Customers.userId;
Needless to say this is very simplified and not my database schema, yet it explains my problem simply.
Any suggestions ? I am at a loss here.
A SELECT has a certain order of its components
In the simple form this is:
What do I select: column list
From where: table name and joined tables
Are there filters: WHERE
How to sort: ORDER BY
So: most likely it was enough to change your statement to
SELECT *
FROM Sales
INNER JOIN Customers ON Sales.userId = Customers.userId
WHERE price = 10;
The WHERE clause must follow the joins:
SELECT * FROM Sales
INNER JOIN Customers
ON Sales.userId = Customers.userId
WHERE price = 10
This is simply the way SQL syntax works. You seem to be trying to put the clauses in the order that you think they should be applied, but SQL is a declarative languages, not a procedural one - you are defining what you want to occur, not how it will be done.
You could also write the same thing like this:
SELECT * FROM (
SELECT * FROM Sales WHERE price = 10
) AS filteredSales
INNER JOIN Customers
ON filteredSales.userId = Customers.userId
This may seem like it indicates a different order for the operations to occur, but it is logically identical to the first query, and in either case, the database engine may determine to do the join and filtering operations in either order, as long as the result is identical.
Sounds fine to me, did you run the query and check?
SELECT s.*, c.*
FROM Sales s
INNER JOIN Customers c
ON s.userId = c.userId;
WHERE s.price = 10

How do I write an SQL query to identify duplicate values in a specific field?

This is the table I'm working with:
I would like to identify only the ReviewIDs that have duplicate deduction IDs for different parameters.
For example, in the image above, ReviewID 114 has two different parameter IDs, but both records have the same deduction ID.
For my purposes, this record (ReviewID 114) has an error. There should not be two or more unique parameter IDs that have the same deduction ID for a single ReviewID.
I would like write a query to identify these types of records, but my SQL skills aren't there yet. Help?
Thanks!
Update 1: I'm using TSQL (SQL Server 2008) if that helps
Update 2: The output that I'm looking for would be the same as the image above, minus any records that do not match the criteria I've described.
Cheers!
SELECT * FROM table t1 INNER JOIN (
SELECT review_id, deduction_id FROM table
GROUP BY review_id, deduction_id
HAVING COUNT(parameter_id) > 1
) t2 ON t1.review_id = t2.review_id AND t1.deduction_id = t2.deduction_id;
http://www.sqlfiddle.com/#!3/d858f/3
If it is possible to have exact duplicates and that is ok, you can modify the HAVING clause to COUNT(DISTINCT parameter_id).
Select ReviewID, deduction_ID from Table
Group By ReviewID, deduction_ID
Having count(ReviewID) > 1
http://www.sqlfiddle.com/#!3/6e113/3 has an example
If I understand the criteria: For each combination of ReviewID and deduction_id you can have only one parameter_id and you want a query that produces a result without the ReviewIDs that break those rules (rather than identifying those rows that do). This will do that:
;WITH review_errors AS (
SELECT ReviewID
FROM test
GROUP BY ReviewID,deduction_ID
HAVING COUNT(DISTINCT parameter_id) > 1
)
SELECT t.*
FROM test t
LEFT JOIN review_errors r
ON t.ReviewID = r.ReviewID
WHERE r.ReviewID IS NULL
To explain: review_errors is a common table expression (think of it as a named sub-query that doesn't clutter up the main query). It selects the ReviewIDs that break the criteria. When you left join on it, it selects all rows from the left table regardless of whether they match the right table and only the rows from the right table that match the left table. Rows that do not match will have nulls in the columns for the right-hand table. By specifying WHERE r.ReviewID IS NULL you eliminate the rows from the left hand table that match the right hand table.
SQL Fiddle

How to select all attributes in sql Join query

The following sql query below produces the specified result.
select product.product_no,product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
However my intensions are to further produce a more detailed result which will include all the attributes in the PRODUCT table. my approach is to list all the attributes in the first line of the query.
select product.product_no,product.product_date,product.product_colour,product.product_style,
product.product_age product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
Is there another way it can be done instead of listing all the attributes of PRoduct table one by one?
You can use * to select all columns from all tables, or you can use [table/alias].* to select all columns from the specified table. In your case, you can use product.*:
select product.*,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
It is important to note that you should only do this if you are 100% sure you need every single column, and always will. There are performance implications associated with this; if you're selecting 100 columns from a table when you really only need 4 or 5 of them, you're adding a lot of overhead to the query. The DBMS has to work harder, and you're also sending more data across the wire (if your database is not on the same machine as your executing code).
If any columns are later added to the product table, those columns will also be returned by this query in the future.
select
product.*,
salesteam.rep_name,
salesteam.SUPERVISOR_NAME
from product inner join salesteam on
product.product_rep=salesteam.rep_id
ORDER BY
product.Product_No;
This should do.
You can write like this
select P.* --- all Product columns
,S.* --- all salesteam columns
from product P
inner join salesteam S
on P.product_rep=S.rep_id
ORDER BY P.Product_No;