get most current record if* - sql

i have a set of data for service jobs and i want to identify customers that still have an old part installed (part x) but if that customer has the new replacement part (part y) then i dont want them to populate in my data. The best way i can describe it is think of a recall. Now every Job has a number, that number is always increasing with new jobs across the customer. So im looking for where (part x) has been installed (part y) has not. Customers all have a customer number that any jobs are associated to. In my example below Customers (12373,12369,12349) would all show up on my list but customer (12365,would not because they were upgraded to part y on a numerically higher job #.
Any help would be great, new to sql

My version :)
SELECT
t1.*
FROM
`table` AS t1
LEFT JOIN (
SELECT
`Customer Number`
FROM
`table`
WHERE
`Parts` > 'part x'
) AS t2
ON ( t1.`Customer Number` = t2.`Customer Number` )
WHERE
t1.`Parts` = 'part x'
AND t2.`Customer Number` IS NULL

General sql syntax. Can be a bit shorter in some modern DBMSes, but perfomance should be good enough when index on (customerNumber, jobNumber) exists.
select customerNumber, jobNumber, parts
from theTable t1
where parts='part x' and not exists (
select 1
from theTable t2
where t2.customerNumber = t1.customerNumber
and t2.jobNumber > t1.jobNumber
and t2.parts='part y')

Related

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes over 20+ years. And the data scientists guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts? The database is Postgres but any SQL logic should help.
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
select
* -- simplifying here to show the important parts
,row_number() over (
partition by master_quote_number, version_number
order by maintenance desc) as seqnum
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() and the chosen partition by and order by criteria guarantee that only ONE row for each combination of quote_number/version_number will get the value of 1, and it will be the one with the highest value in maintenance (if your colleagues are right, there would only be one with a value > 0 anyway).
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?

Google BigQuery asking for JOIN EACH but I'm already using it

I'm trying to run a query in BigQuery which has two sub selects and a join, but I can't get it to work. What I'm doing as a workaround is to run the subselects by themselves, then saving them as tables, then doing another query with a join, but I think I should be able to do this with one query.
I'm getting the error:
Table too large for JOIN. Consider using JOIN EACH. For more details, please see https://developers.google.com/bigquery/docs/query-reference#joins
but I'm already using a join each. I've tried using a cross join and using group by each but those give me different errors. The other questions on Stack Overflow about this topic don't help, one says it was a bug in BigQuery and the other was somebody using 'cross join each'...
Below is my sql, forgive me if it's full of errors but I think it should work:
select
t1.device_uuid,
t1.session_uuid,
t1.nth,
t1.Diamonds_Launch,
t2.Diamonds_Close
from (
select
device_uuid,
session_uuid,
nth,
sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Launch
from [project_id].[table_id]
where name = 'App Launch'
and attributes.Name = 'Inventory - Diamonds'
group by device_uuid, session_uuid, nth
) as t1
join each (
select
device_uuid,
session_uuid,
nth,
sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Close
from [project_id].[table_id]
where name = 'App Close'
and attributes.Name = 'Inventory - Diamonds'
group by device_uuid, session_uuid, nth
) as t2
on t1.device_uuid = t2.device_uuid
and t1.session_uuid = t2.session_uuid
You've got a GROUP BY inside a JOIN EACH. GROUP BY hits limits with cardinality (the number of distinct values) and the final grouping is not parallelizable. This limits BigQuery's ability to do the join.
If you change the GROUP BY to GROUP EACH BY, this will most likely work.
(yes, I realize that this is unpleasant and non-standard. The BigQuery team is currently working hard on making things like this 'just work'.)
This can be combined to one single query:
SELECT device_uuid,
session_uuid,
nth,
SUM(IF (name = 'App Launch', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Launch,
SUM(IF (name = 'App Close', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Close,
FROM [project_id].[table_id]
WHERE attributes.Name = 'Inventory - Diamonds'
GROUP BY device_uuid,
session_uuid,
nth
You also have to use GROUP EACH for large tables.

Joining multiple tables in Access and limiting to Top 1 result

I have three tables that need to be joined in order to get monthly inventory data in return.
Table 1: TargetInventory
Table 2: TargetValue
Table 3: TargetWeight
[TargetInventory] does not change after being added the first time.
[TargetValue] is just a small table that includes prices of various types of metal.
[TargetWeight] is updated monthly as part of our inventory process. We INSERT new data, we never UPDATE old data.
Below is the relationship between these tables. (Sorry, I don't have the reputation points to post an image. Brand new here, so hopefully this makes sense.)
(* = UniqueKey)
--TargetValue-- --TargetInventory-- --TargetWeight--
*MaterialID <===| *TargetID <=====| *ID
Material |===> MaterialID |===> TargetID
PricePerOunce Length RecordDate
Density Width Weight
Thickness
DateInInventory
The TargetWeight table contains multiple records for TargetID (since a new one is added every month at inventory). That's good for me to track historical usage, but for the current inventory value, I only need the most recent TargetWeight.Weight to be returned.
I don't know how to do a CROSS APPLY from within another INNER JOIN, so I'm at a loss for how to do this (without switching to mySQL and just doing a LIMIT 1...)
I think it needs to look something like what's below, but I'm not sure how to finish the query.
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
TargetWeight.ID,
TargetWeight.TargetID AS TargetWeight_TargetID,
TargetWeight.RecordDate,
TargetWeight.Weight
FROM
(TargetValue
INNER JOIN TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
CROSS APPLY (
SELECT TOP 1 *
FROM .....
)
The following query works for me in Access 2010. It uses an INNER JOIN on a subquery to take the place of the CROSS APPLY (which Access SQL doesn't support). It assumes that there will be no more than one [TargetWeight] record for a given (TargetID, RecordDate):
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
LatestWeight.ID,
LatestWeight.TargetID AS TargetWeight_TargetID,
LatestWeight.RecordDate,
LatestWeight.Weight
FROM
(
TargetValue
INNER JOIN
TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
INNER JOIN
(
SELECT tw.*
FROM
TargetWeight AS tw
INNER JOIN
(
SELECT TargetID, MAX(RecordDate) AS LatestDate
FROM TargetWeight
GROUP BY TargetID
) AS latest
ON latest.TargetID=tw.TargetID
AND latest.LatestDate=tw.RecordDate
) AS LatestWeight
ON LatestWeight.TargetID = TargetInventory.TargetID
Alternative approach specifically for Access 2010 or later
If the above query bogs down with a large number of rows in [TargetWeight] then another possible solution for Access 2010+ would be to add a Yes/No field named [Current] to the [TargetWeight] table and use the following After Insert data macro to ensure that only the latest record for each [TargetID] is flagged as [Current]:
Once that is done, the query would simply be
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
TargetWeight.ID,
TargetWeight.TargetID AS TargetWeight_TargetID,
TargetWeight.RecordDate,
TargetWeight.Weight
FROM
(
TargetValue
INNER JOIN
TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
INNER JOIN
TargetWeight
ON TargetInventory.TargetID = TargetWeight.TargetID
WHERE TargetWeight.Current = True;
To maximize performance, the [TargetWeight].[TargetID] and [TargetWeight].[Current] fields should be indexed.
SELECT TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density, Weight.ID,
Weight.TargetID AS TargetWeight_TargetID,
Weight.RecordDate,
Weight.Weight
FROM TargetInventory
INNER JOIN TargetValue ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
CROSS APPLY (
SELECT TOP 1 *
FROM TargetWeight
WHERE TargetID = TargetInventory.TargetID
ORDER BY RecordDate DESC
) AS Weight

SQL query on a condition

I'm writing a query to retrieve translated content. I want it so that if there isn't a translation for the given language id, it automatically returns the translation for the default language, with Id 1.
select Translation.Title
,Translation.Summary
from Translation
where Translation.FkLanguageId = 3
-- If there is no LanguageId of 3, select the record with LanguageId of 1.
I'm working in MS SQL but I think the theory is not DBMS-specific.
Thanks in advance.
This assumes one row per Translation only, based on how you phrased the question. If you have multiple rows per FkLanguageId and I've misunderstood, please let us know and the query becomes more complex of course
select TOP 1
Translation.Title
,Translation.Summary
from
Translation
where
Translation.FkLanguageId IN (1, 3)
ORDER BY
FkLanguageId DESC
You'd use LIMIT in another RDBMS
Assuming the table contains different phrases grouped by PhraseId
WITH Trans As
(
select Translation.Title
,Translation.Summary
,ROW_NUMBER() OVER (PARTITION BY PhraseId ORDER BY FkLanguageId DESC) RN
from Translation
where Translation.FkLanguageId IN (1,3)
)
SELECT *
FROM Trans WHERE RN=1
This assumes the existance of a TranslationKey that associates one "topic" with several different translation languages:
SELECT
isnull(tX.Title, t1.Title) Title
,isnull(tX.Summary, t1.Summary) Summary
from Translation t1
left outer join Translation tX
on tx.TranslationKey = t1.Translationkey
and tx.FkLanguageId = #TargetLanguageId
where t1.FkLanguageId = 1 -- "Default
Maybe this is a dirty solution, but it can help you
if not exists(select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3)
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 1
else
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3
Since your reference to pastie.org shows that you're looking up phrases or specific menu item names in a table I'm going to assume that there is a phrase ID to identify the phrases in question.
SELECT ISNULL(forn_lang.Title, default_lang.Title) Title,
ISNULL(forn_lang.Summary, default_lang.Summary) Summary
FROM Translation default_lang
LEFT OUTER JOIN Translation forn_lang ON default_lang.PhraseID = forn_lang.PhraseID AND forn_lang.FkLanguageId = 3
WHERE default_lang.FkLanguageId = 1

Selecting Maximum Version Number from Two Columns

This relates to another question I asked previously. You may have a better understanding of this if you quickly scan it.
Version Numbers float, decimal or double
I have two colums and a foreign in a database table. A [Version] column and a [Revision] column. These are in relation to version numbers. e.g. Version 1, Revision 2 = v1.2
What I need to do is grab the maximum version number for a particular foreign key.
Here's what I have so far:
SELECT f.[pkFileID]
,x.[fkDocumentHeaderID]
,f.[fkDocumentID]
,x.[Version]
,x.[Revision]
,f.[FileURL]
,f.[UploadedBy]
,f.[UploadedDate]
FROM
(
SELECT
docs.[fkDocumentHeaderID]
,MAX([Version]) AS Version
,MAX([Revision]) AS Revision
FROM
[ClinicalGuidanceV2].[dbo].[tbl_DocumentFiles]
INNER JOIN
dbo.tbl_Documents docs ON [fkDocumentID] = [pkDocumentID]
GROUP BY
docs.[fkDocumentHeaderID]
)
AS x
INNER JOIN
dbo.tbl_DocumentFiles f ON
f.[fkDocumentHeaderID] = x.[fkDocumentHeaderID] AND
f.[Version] = x.[Version] AND
f.[Revision] = x.[Revision]
Basically grabbing the maximum and joining back to itself. This obvisouly doesn't work because if I have version numbers 1.1, 1.2 and 2.0 the maximum value I'm returning from the above query is 2.2 (which doesn't exist).
What I need to do (I think) is select the maximum [Version] and then select the maximum [Revision] for that [Version] but I can't quite figure how to do this.
Any help, suggestions, questions are all welcome.
Thanks.
You could change it to
SELECT f.[pkFileID]
,x.[fkDocumentHeaderID]
,f.[fkDocumentID]
,x.[Version]
,x.[Revision]
,f.[FileURL]
,f.[UploadedBy]
,f.[UploadedDate]
FROM (
SELECT docs.[fkDocumentHeaderID]
,MAX([Version] * 100000 + [Revision]) AS [VersionRevision]
FROM [ClinicalGuidanceV2].[dbo].[tbl_DocumentFiles]
INNER JOIN dbo.tbl_Documents docs
ON [fkDocumentID] = [pkDocumentID]
GROUP BY
docs.[fkDocumentHeaderID]
)AS x
INNER JOIN dbo.tbl_DocumentFiles f
ON f.[fkDocumentHeaderID] = x.[fkDocumentHeaderID]
AND f.[Version] * 100000 + f.[Revision] = x.[VersionRevision]
The idea is to multiply the Version with a constant large enough so it never collides with revision (I have taken 100.000 but any value would do).
After that, your JOIN does the same to retrieve the record.
The below should work to extract the top revision.
SELECT TOP 1 f.[pkFileID]
,x.[fkDocumentHeaderID]
,f.[fkDocumentID]
,x.[Version]
,x.[Revision]
,f.[FileURL]
,f.[UploadedBy]
,f.[UploadedDate]
FROM
(
SELECT
docs.[fkDocumentHeaderID]
,MAX([Version]) AS Version
-- Comment this out ,MAX([Revision]) AS Revision
FROM
[ClinicalGuidanceV2].[dbo].[tbl_DocumentFiles]
INNER JOIN
dbo.tbl_Documents docs ON [fkDocumentID] = [pkDocumentID]
GROUP BY
docs.[fkDocumentHeaderID]
)
AS x
INNER JOIN
dbo.tbl_DocumentFiles f ON
f.[fkDocumentHeaderID] = x.[fkDocumentHeaderID] AND
f.[Version] = x.[Version]
ORDER BY x.Revision DESC
Namely, it extracts only the records using the max version into table x. Then it orders these records by revision in descending order, and extracts the topmost of the bunch.