Trouble with pulling distinct data - SQL

OK, this is hard to explain, partly because I'm bad at SQL, but this code isn't doing exactly what I want it to do. I'll try to explain what it's supposed to do as best I can, and hopefully someone can spot a glaring mistake. I'm sorry about the long-winded explanation, but there is a lot going on here and I could really use the help.
The point of this script is to search for parts which need to be obsoleted; in other words, parts that haven't been used in three years and are still active.
When we obsolete a part, "part.status" is set to 'O'. It is normally NULL. Also, the word 'OBSOLETE' is usually written into "part.description".
The "WORK_ORDER" table contains every scheduled work order. These are defined by base, lot, and sub IDs. It also contains many dates, such as the date when the work order was closed.
The "REQUIREMENT" table contains all the parts required for each job. Many jobs require multiple parts, some at different legs of the job. The way this is handled is that a given "REQUIREMENT.WORKORDER_BASE_ID" and "REQUIREMENT.WORKORDER_LOT_ID" may be listed on a dozen or so consecutive rows, with each row specifying a different "REQUIREMENT.PART_ID". The sub ID indicates which leg of the job the part is needed on. All of the parts I care about start with 'PCH'.
When I run this code it returns 14 rows; I happen to know it should be returning about 39 right now. I believe the screwy part starts at the first CROSS APPLY. I found that code on another forum, hoping it would help solve the original problem. Without that code, I get something like 27K rows, because the DB pulls every criteria-matching requirement from every criteria-matching work order; many of these parts are used on multiple jobs. I've also tried using DISTINCT on REQUIREMENT.PART_ID, which seems like it should solve the problem. Alas, it doesn't.
So despite all that information, I know I probably still didn't give nearly enough. Does anyone have any suggestions?
SELECT
PART.ID [Engr Master]
,PART.STATUS [Master Status]
,WO.CLOSE_DATE
,PT.ID [Die]
,PT.STATUS [Die Status]
FROM PART
CROSS APPLY(
SELECT
WORK_ORDER.BASE_ID
,WORK_ORDER.LOT_ID
,WORK_ORDER.SUB_ID
,WORK_ORDER.PART_ID
,WORK_ORDER.CLOSE_DATE
FROM WORK_ORDER
WHERE
GETDATE() - (360*3) > WORK_ORDER.CLOSE_DATE
AND PART.ID = WORK_ORDER.PART_ID
AND PART.STATUS ='O'
)WO
CROSS APPLY(
SELECT
REQUIREMENT.WORKORDER_BASE_ID
,REQUIREMENT.WORKORDER_LOT_ID
,REQUIREMENT.WORKORDER_SUB_ID
,REQUIREMENT.PART_ID
FROM REQUIREMENT
WHERE
WO.BASE_ID = REQUIREMENT.WORKORDER_BASE_ID
AND WO.LOT_ID = REQUIREMENT.WORKORDER_LOT_ID
AND WO.SUB_ID = REQUIREMENT.WORKORDER_SUB_ID
AND REQUIREMENT.PART_ID LIKE 'PCH%'
)REQ
CROSS APPLY(
SELECT
PART.ID
,PART.STATUS
FROM PART
WHERE
REQ.PART_ID = PART.ID
AND PART.STATUS IS NULL
)PT
ORDER BY PT.ID

This is difficult to understand without any sample data, but I took a stab at it anyway. I removed the second JOIN to PART (the one aliased PART1) as it seemed unnecessary. I also removed the subquery that was looking for parts HAVING COUNT(PART_ID) = 1.
The first JOIN to PART should be done on REQUIREMENT.PART_ID = PART.PART_ID: the relationship has already been defined from WORK_ORDER to REQUIREMENT, so you can JOIN PART directly to REQUIREMENT at this point, as sketched below.
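A minimal sketch of that join shape (it uses the PART_ID naming from my query below; the question's query exposes PART.ID instead, so adjust to whichever your schema actually has):
SELECT DISTINCT prt.PART_ID
, prt.STATUS
, wrd.CLOSE_DATE
FROM WORK_ORDER wrd
INNER JOIN REQUIREMENT req
ON wrd.BASE_ID = req.WORKORDER_BASE_ID
AND wrd.LOT_ID = req.WORKORDER_LOT_ID
AND wrd.SUB_ID = req.WORKORDER_SUB_ID
INNER JOIN PART prt
ON prt.PART_ID = req.PART_ID
WHERE req.PART_ID LIKE 'PCH%'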
EDIT 03/23/2015
If I understand this correctly, you just need a distinct list of PCH parts and their respective last (read: MAX) CLOSE_DATE. If that is the case, here is what I propose.
I broke the query up into a couple of CTEs. The first CTE simply goes through the PART table and pulls out a DISTINCT list of PCH parts, grouping by PART_ID and DESCRIPTION.
The second CTE goes through the REQUIREMENT table, joins to the WORK_ORDER table and, for each PART_ID (handled by the PARTITION), assigns each CLOSE_DATE a ROW_NUMBER in descending order. This ensures that every ROW_NUMBER with a value of "1" is the max CLOSE_DATE for its PART_ID.
The final SELECT statement simply JOINs the two CTEs on PART_ID, filtering where LastCloseDate = 1 (the ROW_NUMBER assigned in the second CTE).
If I understand the requirements correctly, this should give you the desired results.
Additionally, I removed the filter WHERE PART.DESCRIPTION NOT LIKE 'OB%' because we're already filtering by PART.STATUS IS NULL, and you stated above that an 'O' is placed in that field for obsolete parts. Also, [DIE] and [ENGR MASTER] have the same value in the 27 rows being pulled before, so I just used the same field and labeled it differently.
; WITH Parts AS(
SELECT prt.PART_ID
, prt.DESCRIPTION
FROM PART prt
WHERE prt.STATUS IS NULL
AND prt.PART_ID LIKE 'PCH%'
GROUP BY prt.PART_ID, prt.DESCRIPTION
)
, LastCloseDate AS(
SELECT req.PART_ID
, wrd.CLOSE_DATE
, ROW_NUMBER() OVER(PARTITION BY req.PART_ID ORDER BY wrd.CLOSE_DATE DESC) AS LastCloseDate
FROM REQUIREMENT req
INNER JOIN WORK_ORDER wrd
ON wrd.BASE_ID = req.WORKORDER_BASE_ID
AND wrd.LOT_ID = req.WORKORDER_LOT_ID
AND wrd.SUB_ID = req.WORKORDER_SUB_ID
WHERE wrd.CLOSE_DATE IS NOT NULL
AND GETDATE() - (365 * 3) > wrd.CLOSE_DATE
)
SELECT prt.PART_ID AS [DIE]
, prt.PART_ID AS [ENGR MASTER]
, prt.DESCRIPTION
, lst.CLOSE_DATE
FROM Parts prt
INNER JOIN LastCloseDate lst
ON prt.PART_ID = lst.PART_ID
WHERE lst.LastCloseDate = 1
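For comparison, since only the max CLOSE_DATE per part is needed here, the second CTE could be collapsed into a plain GROUP BY; a minimal sketch under the same schema assumptions:
SELECT req.PART_ID
, MAX(wrd.CLOSE_DATE) AS CLOSE_DATE
FROM REQUIREMENT req
INNER JOIN WORK_ORDER wrd
ON wrd.BASE_ID = req.WORKORDER_BASE_ID
AND wrd.LOT_ID = req.WORKORDER_LOT_ID
AND wrd.SUB_ID = req.WORKORDER_SUB_ID
WHERE wrd.CLOSE_DATE IS NOT NULL
AND GETDATE() - (365 * 3) > wrd.CLOSE_DATE
GROUP BY req.PART_ID
ROW_NUMBER is the better tool if you later need other columns from the winning WORK_ORDER row; MAX with GROUP BY can only return the date itself.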

My Joins in query not pulling through correctly

Good evening. Could someone please help me with the following? I am trying to join two tables. The first is wbr_global.gl_ap_details. This stores historic GL information. The second table, sandbox.utr_fixed_mapping, is where account mapping is stored. For example, an account number 60820 is mapped as Employee Relation. The first table needs the mapping from the second table linked on the account number. The output I am getting is not right and way too big. Any help would be appreciated!
select sandbox.utr_fixed_mapping_na.new_mapping_1,sum(wbr_global.gl_ap_details.amount)
from wbr_global.gl_ap_details
LEFT JOIN sandbox.utr_fixed_mapping_na ON wbr_global.gl_ap_details.account_number = sandbox.utr_fixed_mapping_na.account_number
Where gl_ap_details.cost_center = '1172'
and gl_ap_details.period_name = 'JUL-21'
and gl_ap_details.ledger_name = 'Amazon.com, Inc.'
Group by 1;
I tried adding the cast function, but after 5000 seconds of the query running, I canceled it.
The query itself appears OK, with only minor changes needed. Learn to use table aliases: that way you don't have to keep typing long database.table.column references all over, and the SQL is easier to read anyhow.
Notice the aliases "gl" and "fm" after the tables are declared; these aliases are then used to qualify the columns. Easier to read, wouldn't you agree?
I added the GL account number, as described below the query.
select
gl.account_number,
fm.new_mapping_1,
sum(gl.amount)
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
gl.account_number,
fm.new_mapping_1
Now, as for your query and getting NULL: this just means that there are records within the gl_ap_details table with an account number that is not found in the utr_fixed_mapping_na table. So, to see WHICH GL account numbers do NOT exist, I have added the account number to the query. It's possible there are MULTIPLE records in gl_ap_details that are not found in the mapping table. So, you may get:
GLAccount Description SumOfAmount
glaccount1 null $someAmount
glaccount37 null $someAmount
glaccount49 null $someAmount
glaccount72 Depreciation $someAmount
glaccount87 Real Estate $someAmount
glaccount92 Building $someAmount
glaccount99 Salaries $someAmount
I obviously made up the glaccounts just to show the purpose. You may have multiple missing accounts, where the NULL row's total amount is actually masking how many different GL account numbers were NOT found.
Once you find which are missing, you can check / confirm they SHOULD be in the mapping table.
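For example, a minimal anti-join sketch that lists only the unmatched account numbers (same tables and filters as your query):
select distinct gl.account_number
from wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
and fm.account_number is null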
FEEDBACK.
Since you do realize there are missing numbers, let's consider a Cartesian result. If there are multiple entries in the mapping table for the same G/L account number, you will get a Cartesian result, thus bloating your numbers. To clarify, let's say your mapping table has:
Mapping file.
GL Descr1 NewMapping
1 test Salaries
1 testView Buildings
1 Another Depreciation
And your GL_AP_Details has
GL Amount
1 $100
Your total for the query would come out to $300, because the query joins the AP Details GL #1 to EACH of the entries in the mapping file, thus bloating the amount. You could also add a COUNT(*) AS NumberOfEntries to the query to see how many transactions it THINKS it is processing. Is there some unique ID in the GL_AP_Details table? If so, you could also do a count of DISTINCT ID values. If they are different (the distinct count is lower than the number of entries), I think THAT is your culprit.
select
fm.new_mapping_1,
sum(gl.amount),
count(*) as NumberOfEntries,
count( distinct gl.UniqueIdField ) as DistinctTransactions
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
fm.new_mapping_1
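If duplicates in the mapping table do turn out to be the culprit, a quick hedged test is to collapse the mapping table to one row per account number before joining (MIN is an arbitrary pick here, purely for diagnosis):
select
fm.new_mapping_1,
sum(gl.amount)
from
wbr_global.gl_ap_details gl
LEFT JOIN (
select account_number, min(new_mapping_1) as new_mapping_1
from sandbox.utr_fixed_mapping_na
group by account_number
) fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
fm.new_mapping_1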
Might you also need to limit the mapping table to a specific prophecy or mec view?
If you "think" that the result of an aggregate is wrong, then the easiest way to verify this is to select the individual rows that correlate to 1 record in the aggregate output and inspect the records, looking for duplications.
For instance, pick 'Building Management':
SELECT fixed.new_mapping_1,details.amount,*
FROM wbr_global.gl_ap_details details
LEFT JOIN sandbox.utr_fixed_mapping_na fixed ON details.account_number = fixed.account_number
WHERE details.cost_center = '1172'
AND details.period_name = 'JUL-21'
AND details.ledger_name = 'Amazon.com, Inc.'
AND fixed.new_mapping_1 = 'Building Management'
Notice that we tack a ,* onto the end of the projection; this will show you everything the query has access to. Look for repeating sections of data that you were not expecting; then, depending on which table they originate from, you might add additional criteria to the JOIN or to the WHERE, or you might need to group by additional columns.
This type of issue is really hard to comment on in a forum like this because it is highly specific to your schema, and the data contained within it, making solutions highly subjective to criteria you are not likely to publish online.
Generally, if you think a calculation is wrong, you need to manually compute it to verify it. The advice above helps you inspect the data your query is using; you should either construct your own query or use other tools to build the data set that lets you manually compute the correct values, then work them back into, or replace, your original query.
The speed issues are out of scope here; we can comment on the poor schema design, but I suspect you don't have a choice. In the utr_fixed_mapping_na table you should make account_number the same column type as the source data, or add a new column that holds the data in the original type; then you can set up indexes on the columns to improve the speed of the join.
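A hypothetical sketch of that change (syntax varies by database; it assumes gl_ap_details stores the account number as an integer, and the new column name is made up):
-- add a copy of the account number in the source data's type
ALTER TABLE sandbox.utr_fixed_mapping_na ADD account_number_num INTEGER;
UPDATE sandbox.utr_fixed_mapping_na
SET account_number_num = CAST(account_number AS INTEGER);
-- index the join keys on both sides, if the engine supports secondary indexes
CREATE INDEX ix_fm_account ON sandbox.utr_fixed_mapping_na (account_number_num);
CREATE INDEX ix_gl_account ON wbr_global.gl_ap_details (account_number);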

MS Access 2013, How to add totals row within SQL

I'm in need of some assistance. I have searched and not found what I'm looking for. I have an assignment for school that requires me to use SQL. I have a query that pulls some columns from two tables:
SELECT Course.CourseNo, Course.CrHrs, Sections.Yr, Sections.Term, Sections.Location
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring";
I need to add a Totals row at the bottom to count the CourseNo and sum the CrHrs. It has to be done through the SQL query design, as I need to paste the code; I know it can be done with the datasheet view, but she will not accept that. Any advice?
To accomplish this, you can UNION your query together with an aggregation query. It's not clear from your question which columns you are trying to get "Totals" from, but here's an example of what I mean using your query and getting counts of each (kind of a useless example, but you should be able to apply it to what you are doing):
SELECT
[Course].[CourseNo]
, [Course].[CrHrs]
, [Sections].[Yr]
, [Sections].[Term]
, [Sections].[Location]
FROM
[Course]
INNER JOIN [Sections] ON [Course].[CourseNo] = [Sections].[CourseNo]
WHERE [Sections].[Term] = "spring"
UNION ALL
SELECT
"TOTALS"
, SUM([Course].[CrHrs])
, count([Sections].[Yr])
, Count([Sections].[Term])
, Count([Sections].[Location])
FROM
[Course]
INNER JOIN [Sections] ON [Course].[CourseNo] = [Sections].[CourseNo]
WHERE [Sections].[Term] = "spring"
You can prepare your "total" query separately, and then output both query results together with "UNION".
It might look like:
SELECT Course.CourseNo, Course.CrHrs, Sections.Yr, Sections.Term, Sections.Location
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring"
UNION
SELECT "Total", SUM(Course.CrHrs), SUM(Sections.Yr), SUM(Sections.Term), SUM(Sections.Location)
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring";
Whilst you can certainly union the aggregated totals query onto the end of your original query, in my opinion this would be really bad practice and undesirable for any real-world application.
Consider that the resulting query could no longer be used for any meaningful analysis of the data: if displayed in a datagrid, the user would not be able to sort the data without the totals row being interspersed amongst the rest; the user could no longer use the built-in Totals option to perform their own aggregate operation; and the insertion of a row identifiable only by the term "totals" could even conflict with other data within the set.
Instead, I would suggest displaying the totals within an entirely separate form control, using a separate query such as the following (based on your own example):
SELECT Count(Course.CourseNo) as Courses, Sum(Course.CrHrs) as Hours
FROM Course INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term = "spring";
However, since CrHrs is a field within your Course table and not within your Sections table, the above may yield multiples of the desired result, with the number of hours multiplied by the number of corresponding records in the Sections table.
If this is the case, the following may be more suitable:
SELECT Count(Course.CourseNo) as Courses, Sum(Course.CrHrs) as Hours
FROM
Course INNER JOIN
(SELECT DISTINCT s.CourseNo FROM Sections s WHERE s.Term = "spring") q
ON Course.CourseNo = q.CourseNo

SQL query seems to work for 'AND T1.email_address_ IN (subquery)', but returns 0 rows for 'AND T1.email_address_ NOT IN (subquery)'

Good morning. I'm working in Responsys Interact, which is an Oracle-based email campaign management SaaS product. I'm creating a query to filter a target list for an email campaign designed to target a specific subset of our master email contact list. Here's the query I created a few weeks ago that appears to work:
/*
Table Symbolic Name
CONTACTS_LIST $A$
Engaged $B$
TRANSACTIONS_RAW $C$
TRANSACTION_LINES_RAW $D$
-- A Responsys Filter (Engaged) will return only an RIID_, nothing else, according to John @ Responsys....so,....let's join on that to the contact list...
*/
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
;
This seems to work: if I run the query without the subquery, we get 720K rows, and if I add back the 'AND NOT IN...' subquery, we get about 700K rows, which appears correct based on what my user knows about her data. What I'm (supposedly) doing with the NOT IN subquery is filtering out any email addresses where the customer has purchased certain items from us in the last 60 days.
So now I need to add another constraint. We still don't want customers who made certain purchases in the last 60 days, as above, but now we also want to exclude customers who have purchased another particular item within the last 12 months. So I thought I would add another subquery, as shown below. This has introduced several problems:
Performance: the query, which took a couple of minutes to run before, now takes quite a few more minutes to run; in fact it seems to time out....
So I wondered if there's an issue with having two subqueries, but before thinking about alternatives, I decided to test my new subquery by temporarily deleting the first subquery, so that I had just one subquery similar to the one above, but with the new item = 11 and within-the-last-12-months logic. With this, the query finally returned after a few minutes, but with zero rows.
Trying to figure out why, I tried simply changing the AND NOT IN (subquery) to AND IN (subquery), and that worked, in that it returned a few thousand rows, as expected.
So why would the same SQL "work" when using AND IN (subquery), but the exact same SQL simply changed to AND NOT IN (subquery) return zero rows, instead of what I would expect: my 700-something thousand rows, less the couple of thousand encapsulated by the subquery result?
Also, what is the best (i.e. most performant) way to accomplish what I'm trying to do, which is to filter by some purchases made within one date range AND by other purchases made within a different date range?
Here's the modified version:
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
AND
$A$.EMAIL_ADDRESS_ NOT IN
(
/* get purchase transactions for another type of item within last year */
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$D$.ITEM_FAMILY_ID = 11 AND $C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -12)
)
;
Thanks for any ideas/insights. I may be missing or mis-remembering some basic SQL concept here - if so, please help me out! Also, Responsys Interact runs on top of Oracle - it's an Oracle product - but I don't know offhand what version/flavor. Thanks!
It looks like my problem with the new subquery was poor performance due to a lack of indexes. Thanks to Alex Poole's comments, I looked in Responsys and there is a facility to get an 'explain'-type analysis; it was throwing warnings and suggesting I build some indexes. I found the way to do that on the data sources, went back to the explain, and it said, "The query should run without placing an unnecessary burden on the system". And while it still ran for quite a few minutes, it did finally come back with close to the expected number of rows.
Now I'm on to tackling the other half of the issue, which is to incorporate this second subquery in addition to the first, original subquery....
OK, upon further testing/analysis and refining my Stack Overflow search criteria, the answer to the main part of my question, dealing with IN vs. NOT IN, can be found here: SQL "select where not in subquery" returns no results
My performance was helped by using Responsys's explain-like feature and adding some indexes, but when I did that I also happened to add a little extra SQL to my subquery's WHERE clause.... when I removed that, even after the indexes were built, I was back to zero rows returned. That's because, as it turned out, at least one of the transaction rows for the item family ID I was interested in for this additional subquery had a NULL value for email address. As further explained in the link above, when you use NOT IN and a NULL value is involved, SQL can't definitively say a value is NOT IN the list, since you can't really compare anything to NULL; the predicate evaluates to UNKNOWN rather than true, so you get zero rows. With IN, even though NULLs are present, a single positive match is enough for the predicate to be true, which is why you get rows with IN but not with NOT IN. I hadn't realized that some of our transaction data may have NULL email addresses - now I know, so I just added an IS NOT NULL condition on the email address to the subquery's WHERE clause, and now all is good.
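For anyone hitting the same thing, here are two hedged sketches of the fix, using the same symbolic table names as above. The first is essentially what I did; the second, NOT EXISTS, is NULL-safe by construction:
/* Option 1: keep NOT IN, but exclude NULL email addresses */
AND $A$.EMAIL_ADDRESS_ NOT IN
(
SELECT $C$.EMAIL_ADDRESS_
FROM $C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$D$.ITEM_FAMILY_ID = 11 AND
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -12) AND
$C$.EMAIL_ADDRESS_ IS NOT NULL
)
/* Option 2: rewrite as NOT EXISTS, which never trips over NULLs */
AND NOT EXISTS
(
SELECT 1
FROM $C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_ AND
$D$.ITEM_FAMILY_ID = 11 AND
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -12)
)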

Writing a query to include certain values but exclude others when looking for a latest time period

I am trying to write a query that looks for people who have a certain code with the latest period (year), but not if they have another code in that same latest period (year). I'll be explicit just so my example makes sense.
I want people who have code A1, A2, A3, A4, or A5, but not AG, AP, or AQ. There are people who have an A1 code for a period (like 2014) and an AG code for the same period; I'd like to exclude them. Not everyone has a code, so the field value could be NULL.
Is there a way to express this differently (i.e. in fewer characters) than the way I did?
SELECT
people.firstName
FROM
people
WHERE EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (SELECT MAX(period) FROM codes codes2 WHERE codes2.people_id = codes.people_id)
AND code LIKE 'A[1-5]'
)
AND NOT EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (
SELECT MAX(period)
FROM codes codes2
WHERE codes2.people_id = codes.people_id
)
AND code LIKE 'A[GPQ]'
)
Schema is as follows:
People
id (PK)
firstName
Codes
people_id (FK, many-to-one relation with the People table)
code (e.g. "A1", "A2", "AG")
period (e.g. "2013", "2014")
There are so many ways you could do that. I'm not an SQL expert, but I can't see your query being too bad; if you want to try to reduce the number of subqueries, you could consider using the GROUP BY clause along with a SUM aggregate function in a HAVING clause.
I started updating your code as follows:
SELECT
people.firstName
FROM
people
LEFT JOIN codes AS a15 ON a15.people_id = people.id AND a15.code LIKE 'A[1-5]'
LEFT JOIN codes AS agpq ON agpq.people_id = people.id AND agpq.code LIKE 'A[GPQ]'
GROUP BY
people.firstName
HAVING
SUM(CASE WHEN a15.code IS NULL THEN 0 ELSE 1 END) > 0
AND SUM(CASE WHEN agpq.code IS NULL THEN 0 ELSE 1 END) = 0
This, however, doesn't take into account any of the period-specific requirements described. You could add the period to the GROUP BY clause, or add it to a WHERE or one of the JOIN constraints, but I'm not quite sure from your description exactly what you're after (I don't believe this is through any fault of your own; I just can't personally align the code provided to the description).
I would also like to point out that the SUM functions above will not give an accurate count of the number of matching codes. This is because, if both A[GPQ] and A[1-5] return at least one row, the number returned by each constraint will be multiplied by the number returned for the other. They can, however, be used to determine whether there are "any" returned items, since if the criteria are matched you will have a SUM(...) > 0.
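If accurate counts were needed, one hedged tweak is to count distinct code values instead: COUNT(DISTINCT ...) ignores both the NULLs from the LEFT JOINs and the row multiplication, since duplicated rows contribute no new distinct values. A sketch of the adjusted HAVING clause:
HAVING
COUNT(DISTINCT a15.code) > 0
AND COUNT(DISTINCT agpq.code) = 0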
I'm sure a more experienced SQL developer / DBA will be able to poke many holes in my proposed query, but it might give them or someone else something to work from, and hopefully it gives you ideas for alternatives to using subqueries.

SQL conundrum, how to select latest date for part, but only 1 row per part (unique)

I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
single table inventoryReport
I have a distinct list of X parts I wish to display; the result must be X rows (1 row per part, showing the latest inventory entry).
the table is made up of dated entries of inventory changes (so I only need the LATEST dated entry per part).
all data is contained in this single table, so no joins are necessary.
Currently, for 1 single part, it is fairly simple, and I can accomplish this by doing the following SQL (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
That gets me my TOP 1 row, which is simple for a single part; however, I need to show all X (let's say 30) parts, so I need 30 rows with that result. Of course the simple solution would be to loop X number of SQL calls in my code, and that would suffice, but it would be costly; for this purpose I would love to work this SQL some more to reduce the X calls back to the DB down to just 1 query.
From what I can see, I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately do a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough. Let me know if you have an idea. I cannot do a DISTINCT, as the rows are not the same: the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
) t
WHERE r = 1
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
You need to join onto a subquery:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate
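One note on the two MAX-date join approaches above: if two inventory rows share the same latest ldDate for a part, the join will return both rows, whereas the ROW_NUMBER version always returns exactly one row per part. Worth keeping in mind if ldDate is not guaranteed unique per part.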