Selecting Maximum Version Number from Two Columns - sql

This relates to another question I asked previously. You may have a better understanding of this if you quickly scan it.
Version Numbers float, decimal or double
I have two columns and a foreign key in a database table: a [Version] column and a [Revision] column. These relate to version numbers, e.g. Version 1, Revision 2 = v1.2.
What I need to do is grab the maximum version number for a particular foreign key.
Here's what I have so far:
SELECT f.[pkFileID]
,x.[fkDocumentHeaderID]
,f.[fkDocumentID]
,x.[Version]
,x.[Revision]
,f.[FileURL]
,f.[UploadedBy]
,f.[UploadedDate]
FROM
(
SELECT
docs.[fkDocumentHeaderID]
,MAX([Version]) AS Version
,MAX([Revision]) AS Revision
FROM
[ClinicalGuidanceV2].[dbo].[tbl_DocumentFiles]
INNER JOIN
dbo.tbl_Documents docs ON [fkDocumentID] = [pkDocumentID]
GROUP BY
docs.[fkDocumentHeaderID]
)
AS x
INNER JOIN
dbo.tbl_DocumentFiles f ON
f.[fkDocumentHeaderID] = x.[fkDocumentHeaderID] AND
f.[Version] = x.[Version] AND
f.[Revision] = x.[Revision]
Basically grabbing the maximum and joining back to itself. This obviously doesn't work because if I have version numbers 1.1, 1.2 and 2.0, the maximum value I'm returning from the above query is 2.2 (which doesn't exist).
What I need to do (I think) is select the maximum [Version] and then select the maximum [Revision] for that [Version], but I can't quite figure out how to do this.
Any help, suggestions, questions are all welcome.
Thanks.

You could change it to
SELECT f.[pkFileID]
,x.[fkDocumentHeaderID]
,f.[fkDocumentID]
,x.[Version]
,x.[Revision]
,f.[FileURL]
,f.[UploadedBy]
,f.[UploadedDate]
FROM (
SELECT docs.[fkDocumentHeaderID]
,MAX([Version] * 100000 + [Revision]) AS [VersionRevision]
FROM [ClinicalGuidanceV2].[dbo].[tbl_DocumentFiles]
INNER JOIN dbo.tbl_Documents docs
ON [fkDocumentID] = [pkDocumentID]
GROUP BY
docs.[fkDocumentHeaderID]
)AS x
INNER JOIN dbo.tbl_DocumentFiles f
ON f.[fkDocumentHeaderID] = x.[fkDocumentHeaderID]
AND f.[Version] * 100000 + f.[Revision] = x.[VersionRevision]
The idea is to multiply the Version by a constant large enough that it never collides with the Revision (I've used 100,000 here, but any value larger than the maximum possible Revision would do).
After that, your JOIN does the same to retrieve the record.
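For the example values in the question (versions 1.1, 1.2 and 2.0), and assuming [Revision] never reaches 100,000, the encoding works out as:
-- v1.1 -> 1 * 100000 + 1 = 100001
-- v1.2 -> 1 * 100000 + 2 = 100002
-- v2.0 -> 2 * 100000 + 0 = 200000
-- MAX(...) = 200000, which the join decodes back to the existing row with Version 2, Revision 0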

The below should work to extract the top revision.
SELECT TOP 1 f.[pkFileID]
,x.[fkDocumentHeaderID]
,f.[fkDocumentID]
,x.[Version]
,x.[Revision]
,f.[FileURL]
,f.[UploadedBy]
,f.[UploadedDate]
FROM
(
SELECT
docs.[fkDocumentHeaderID]
,MAX([Version]) AS Version
--,MAX([Revision]) AS Revision   -- commented out; no longer needed
FROM
[ClinicalGuidanceV2].[dbo].[tbl_DocumentFiles]
INNER JOIN
dbo.tbl_Documents docs ON [fkDocumentID] = [pkDocumentID]
GROUP BY
docs.[fkDocumentHeaderID]
)
AS x
INNER JOIN
dbo.tbl_DocumentFiles f ON
f.[fkDocumentHeaderID] = x.[fkDocumentHeaderID] AND
f.[Version] = x.[Version]
ORDER BY x.Revision DESC
Namely, it extracts only the records using the max version into table x. Then it orders these records by revision in descending order, and extracts the topmost of the bunch.
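Note that TOP 1 returns a single row overall. If instead you need the latest version/revision for every [fkDocumentHeaderID], a ROW_NUMBER() variant should also work - a sketch only, assuming SQL Server 2005 or later and the same tables and columns as above:
SELECT ranked.[pkFileID]
,ranked.[fkDocumentHeaderID]
,ranked.[fkDocumentID]
,ranked.[Version]
,ranked.[Revision]
,ranked.[FileURL]
,ranked.[UploadedBy]
,ranked.[UploadedDate]
FROM
(
SELECT f.[pkFileID]
,docs.[fkDocumentHeaderID]
,f.[fkDocumentID]
,f.[Version]
,f.[Revision]
,f.[FileURL]
,f.[UploadedBy]
,f.[UploadedDate]
,ROW_NUMBER() OVER (
PARTITION BY docs.[fkDocumentHeaderID]
ORDER BY f.[Version] DESC, f.[Revision] DESC) AS rn
FROM [ClinicalGuidanceV2].[dbo].[tbl_DocumentFiles] f
INNER JOIN dbo.tbl_Documents docs ON f.[fkDocumentID] = docs.[pkDocumentID]
)
AS ranked
WHERE ranked.rn = 1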

Related

SQL - Returning fields based on where clause then joining same table to return max value?

I have a table named Ticket Numbers, which (for this example) contains the columns:
Ticket_Number
Assigned_Group
Assigned_Group_Sequence_No
Reported_Date
Each ticket number could contain 4 rows, depending on how many times the ticket changed assigned groups. Some of these rows could contain an assigned group of "Desktop Support," but some may not. Here is an example:
[Image: example of raw data]
What I am trying to accomplish is to get an output that contains any ticket numbers that contain 'Desktop Support', but also the assigned group of the max sequence number. Here is what I am trying to accomplish with SQL:
[Image: queried data]
I'm trying to use SQL with the following query but have no clue what I'm doing wrong:
select ih.incident_number,ih.assigned_group, incident_history2.maxseq, incident_history2.assigned_group
from incident_history_public as ih
left join
(
select max(assigned_group_seq_no) maxseq, incident_number, assigned_group
from incident_history_public
group by incident_number, assigned_group
) incident_history2
on ih.incident_number = incident_history2.incident_number
and ih.assigned_group_seq_no = incident_history2.maxseq
where ih.ASSIGNED_GROUP LIKE '%DS%'
Does anyone know what I am doing wrong?
You might want to create a proper alias for incident_history. e.g.
from incident_history as incident_history1
and
on incident_history1.ticket_number = incident_history2.ticket_number
and incident_history1.assigned_group_seq_no = incident_history2.maxseq
In my humble opinion, a first error could be that I don't see any column named "incident_history2.assigned_group".
I would try to use a common table expression to get only the ticket numbers that contain "Desktop Support":
WITH desktop as (
SELECT distinct Ticket_Number
FROM incident_history
WHERE Assigned_Group = 'Desktop Support'
),
Then an inner join of that result with your inner table to get the ticket number and maxseq, so that afterwards you can also get the "MAX group":
WITH tmp AS (
SELECT i2.Ticket_Number, i2.maxseq
FROM desktop D inner join
(SELECT Ticket_number, max(assigned_group_seq_no) as maxseq
FROM incident_history
GROUP BY ticket_number) as i2
ON D.Ticket_Number = i2.Ticket_Number
)
SELECT i.Ticket_Number, i.Assigned_Group as MAX_Group, T.maxseq, i.Reported_Date
FROM tmp T inner join incident_history i
ON T.Ticket_Number = i.Ticket_Number and i.assigned_group_seq_no = T.maxseq
I think there are several different methods to resolve this question, but I really hope this one is helpful for you!
For more information about Common Table Expression: https://www.essentialsql.com/introduction-common-table-expressions-ctes/
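For reference, the two CTEs above combined into a single runnable statement might look like this (a sketch only, reusing the table and column names from the answer):
WITH desktop AS (
SELECT DISTINCT Ticket_Number
FROM incident_history
WHERE Assigned_Group = 'Desktop Support'
),
tmp AS (
SELECT i2.Ticket_Number, i2.maxseq
FROM desktop D
INNER JOIN (
SELECT Ticket_Number, MAX(assigned_group_seq_no) AS maxseq
FROM incident_history
GROUP BY Ticket_Number
) AS i2
ON D.Ticket_Number = i2.Ticket_Number
)
SELECT i.Ticket_Number, i.Assigned_Group AS MAX_Group, T.maxseq, i.Reported_Date
FROM tmp T
INNER JOIN incident_history i
ON T.Ticket_Number = i.Ticket_Number AND i.assigned_group_seq_no = T.maxseq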

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes spanning 20+ years. And the data science guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts? The database is Postgres but any SQL logic should help.
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
select
* -- simplifying here to show the important parts
,row_number() over (
partition by f.master_quote_number, f.quote_version_number, d.quote_line_number
order by f.maintenance_months desc) as seqnum
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() and the chosen partition by and order by criteria guarantee that only ONE row for each combination of quote number / version number / line number will get the value 1, and it will be the one with the highest value in maintenance_months (if your colleagues are right, there would only be one with a value > 0 anyway).
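Since this is Postgres, a DISTINCT ON variant of the same idea is also possible - a sketch only, assuming the maintenance column is f.maintenance_months as in your query:
select distinct on (f.master_quote_number, f.quote_version_number, d.quote_line_number)
d.*, f.*
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
order by f.master_quote_number, f.quote_version_number, d.quote_line_number,
f.maintenance_months desc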
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?
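A sketch of that idea, using the column names from the question (the grouping keys are assumed to be quote number, version number and line number; any other columns you keep would also have to go into the GROUP BY or an aggregate):
SELECT f.master_quote_number
,f.quote_version_number
,d.quote_line_number
,MAX(f.maintenance_months) AS maintenance_months
FROM w_quote_line_d d
INNER JOIN w_quote_f f
ON (f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
GROUP BY f.master_quote_number, f.quote_version_number, d.quote_line_number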

Joining multiple tables in Access and limiting to Top 1 result

I have three tables that need to be joined in order to get monthly inventory data in return.
Table 1: TargetInventory
Table 2: TargetValue
Table 3: TargetWeight
[TargetInventory] does not change after being added the first time.
[TargetValue] is just a small table that includes prices of various types of metal.
[TargetWeight] is updated monthly as part of our inventory process. We INSERT new data, we never UPDATE old data.
Below is the relationship between these tables. (Sorry, I don't have the reputation points to post an image. Brand new here, so hopefully this makes sense.)
(* = UniqueKey)
--TargetValue--        --TargetInventory--        --TargetWeight--
*MaterialID <===|      *TargetID <=====|          *ID
 Material        |===>  MaterialID       |===>     TargetID
 PricePerOunce          Length                     RecordDate
 Density                Width                      Weight
                        Thickness
                        DateInInventory
The TargetWeight table contains multiple records for TargetID (since a new one is added every month at inventory). That's good for me to track historical usage, but for the current inventory value, I only need the most recent TargetWeight.Weight to be returned.
I don't know how to do a CROSS APPLY from within another INNER JOIN, so I'm at a loss for how to do this (without switching to MySQL and just doing a LIMIT 1...)
I think it needs to look something like what's below, but I'm not sure how to finish the query.
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
TargetWeight.ID,
TargetWeight.TargetID AS TargetWeight_TargetID,
TargetWeight.RecordDate,
TargetWeight.Weight
FROM
(TargetValue
INNER JOIN TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
CROSS APPLY (
SELECT TOP 1 *
FROM .....
)
The following query works for me in Access 2010. It uses an INNER JOIN on a subquery to take the place of the CROSS APPLY (which Access SQL doesn't support). It assumes that there will be no more than one [TargetWeight] record for a given (TargetID, RecordDate):
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
LatestWeight.ID,
LatestWeight.TargetID AS TargetWeight_TargetID,
LatestWeight.RecordDate,
LatestWeight.Weight
FROM
(
TargetValue
INNER JOIN
TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
INNER JOIN
(
SELECT tw.*
FROM
TargetWeight AS tw
INNER JOIN
(
SELECT TargetID, MAX(RecordDate) AS LatestDate
FROM TargetWeight
GROUP BY TargetID
) AS latest
ON latest.TargetID=tw.TargetID
AND latest.LatestDate=tw.RecordDate
) AS LatestWeight
ON LatestWeight.TargetID = TargetInventory.TargetID
Alternative approach specifically for Access 2010 or later
If the above query bogs down with a large number of rows in [TargetWeight] then another possible solution for Access 2010+ would be to add a Yes/No field named [Current] to the [TargetWeight] table and use an After Insert data macro to ensure that only the latest record for each [TargetID] is flagged as [Current].
Once that is done, the query would simply be
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
TargetWeight.ID,
TargetWeight.TargetID AS TargetWeight_TargetID,
TargetWeight.RecordDate,
TargetWeight.Weight
FROM
(
TargetValue
INNER JOIN
TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
INNER JOIN
TargetWeight
ON TargetInventory.TargetID = TargetWeight.TargetID
WHERE TargetWeight.Current = True;
To maximize performance, the [TargetWeight].[TargetID] and [TargetWeight].[Current] fields should be indexed.
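For example, the indexes could be created with DDL along these lines (a sketch; the index names are arbitrary):
CREATE INDEX idxTargetWeightTargetID ON TargetWeight (TargetID);
CREATE INDEX idxTargetWeightCurrent ON TargetWeight ([Current]);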
SELECT TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density, Weight.ID,
Weight.TargetID AS TargetWeight_TargetID,
Weight.RecordDate,
Weight.Weight
FROM TargetInventory
INNER JOIN TargetValue ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
CROSS APPLY (
SELECT TOP 1 *
FROM TargetWeight
WHERE TargetID = TargetInventory.TargetID
ORDER BY RecordDate DESC
) AS Weight
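Note that CROSS APPLY is SQL Server syntax; as mentioned above, Access SQL doesn't support it, so this variant only helps if the data actually lives in SQL Server.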

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG (media_files. ID) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine; the goal was to extract album/artist combinations and, in the result set, have an array of media file ids for that particular combo.
The problem is that I have another column in media files, which is artwork.
artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).
Is that possible to do with only SQL/Window Functions (first_value over..)?
Yes, it's possible. First, let's tweak your query by adding aliases and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like a pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate, and it isn't really worth implementing one for this job. You could use min(artwork) or max(artwork), and you may find that this actually performs better than the solutions below.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can, at a performance cost, randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
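That is, the subquery above would become:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
ORDER BY random()
LIMIT 1) AS picked_artwork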
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
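For illustration, the in-line pick dropped into the aliased query from above would look roughly like this (again untested, for the same reason):
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids,
(array_agg(mf.artwork))[width_bucket(random(),0,1,count(mf.artwork)::integer)] AS picked_artwork
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC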
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
Arbitrary / random pick
If you are looking for a random selection, @Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For arbitrary selection - not truly random, the result will depend on the physical order of rows in the table and implementation-specifics:
WITH x AS (
SELECT m.album, m.artist, m.id, m.artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
)
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM (
SELECT album, artist, array_agg(id) AS media_file_ids
FROM x
GROUP BY album, artist
) a
JOIN (
SELECT DISTINCT ON (1,2) album, artist, artwork
FROM x
) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);

Handling negative values with sql

I have a data set that lists the date and quantity of future stock of products. Occasionally our demand outstrips our future supply and we wind up with a negative future quantity. I need to factor that future negative quantity into previous supply so we don't compound the problem by overselling our supply.
In the following data set, I need to prepare for demand on 10-19 by applying the negative quantity up the chain until I'm left with a positive quantity:
"ID","SKU","DATE","SEASON","QUANTITY"
"1","001","2012-06-22","S12","1656"
"2","001","2012-07-13","F12","1986"
"3","001","2012-07-27","F12","-283"
"4","001","2012-08-17","F12","2718"
"5","001","2012-08-31","F12","-4019"
"6","001","2012-09-14","F12","7212"
"7","001","2012-09-21","F12","782"
"8","001","2012-09-28","F12","2073"
"9","001","2012-10-12","F12","1842"
"10","001","2012-10-19","F12","-12159"
I need to get it to this:
"ID","SKU","DATE","SEASON","QUANTITY"
"1","001","2012-06-22","S12","1656"
"2","001","2012-07-13","F12","152"
I have looked at using a while loop as well as an outer apply but cannot seem to find a way to do this yet. Any help would be much appreciated. This would need to work for sql server 2008 R2.
Here's another example:
"1","002","2012-07-13","S12","1980"
"2","002","2012-08-10","F12","-306"
"3","002","2012-09-07","F12","826"
Would become:
"1","002","2012-07-13","S12","1674"
"3","002","2012-09-07","F12","826"
You don't seem to be getting a lot of answers, so here's something in case the right "how to do it in pure SQL" never turns up. Ignore this solution if anything SQL-based appears - it's just defensive coding, not elegant.
If you want the sum of all rows with the same season while removing the duplicate records, just pull the data out, loop over it, sum the quantities per season, update the table with the right values and delete the unnecessary entries. Here's one way to do it (pseudocode):
productsArray = SELECT * FROM products
processed = array (associative)
foreach product in productsArray:
    if product[season] not in processed:
        processed[season] = product[quantity]
        UPDATE products SET quantity = processed[season] WHERE id = product[id]
    else:
        processed[season] = processed[season] + product[quantity]
        DELETE FROM products WHERE id = product[id]
Here is a CROSS APPLY - tested
SELECT b.ID,SKU,b.DATE,SEASON,QUANTITY
FROM (
SELECT SKU,SEASON, SUM(QUANTITY) AS QUANTITY
FROM T1
GROUP BY SKU,SEASON
) a
CROSS APPLY (
SELECT TOP 1 b.ID,b.Date FROM T1 b
WHERE a.SKU = b.SKU AND a.SEASON = b.SEASON
ORDER BY b.ID ASC
) b
ORDER BY ID ASC