Access delete all except max value - sql

I am trying to prune an Access table (contam) full of duplicates and concatenated values.
The table looks like this
FederalSiteIdentifier Contam
001 1
001 1, 2
001 1, 2, 3
001 1, 2, 3, 4
002 1
003 1
003 1, 2
003 1, 2, 3
I only want to keep the last - longest - entry for each ID, but cannot figure out the proper way to do this in Access SQL.
After doing some reading I tried this simple code:
SELECT FederalSiteIdentifier, Max(Contam) as MaxCont
FROM contam
ORDER BY FederalSiteIdentifier
which produces an error.
Can anyone help?

I think you want the maximum length of the Contam values for each FederalSiteIdentifier. You will need a GROUP BY clause.
SELECT FederalSiteIdentifier, Max(Len(Contam)) as MaxCont
FROM contam
GROUP BY FederalSiteIdentifier
If that query identifies the rows you wish to keep, save it as qryRows2Keep, then try this DELETE query:
DELETE FROM contam
WHERE
Len(Contam) < DLookup(
"MaxCont",
"qryRows2Keep",
"FederalSiteIdentifier = '" & FederalSiteIdentifier & "'")
I assumed FederalSiteIdentifier is text data type. If it is numeric, discard the single quotes from the DLookup expression.
Please make sure you have backed up your data before attempting this untested suggestion. :-)
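To try the keep-the-longest idea on the sample rows without touching real data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for Access here: LENGTH replaces Len, and a correlated subquery replaces the DLookup call, so treat it as an illustration of the logic rather than Access syntax.

```python
import sqlite3

# In-memory SQLite database standing in for the Access table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contam (FederalSiteIdentifier TEXT, Contam TEXT)")
conn.executemany(
    "INSERT INTO contam VALUES (?, ?)",
    [
        ("001", "1"), ("001", "1, 2"), ("001", "1, 2, 3"), ("001", "1, 2, 3, 4"),
        ("002", "1"),
        ("003", "1"), ("003", "1, 2"), ("003", "1, 2, 3"),
    ],
)

# Delete every row that is shorter than the longest row for the same ID.
conn.execute(
    """
    DELETE FROM contam
    WHERE LENGTH(Contam) < (
        SELECT MAX(LENGTH(c2.Contam))
        FROM contam AS c2
        WHERE c2.FederalSiteIdentifier = contam.FederalSiteIdentifier
    )
    """
)

kept = conn.execute(
    "SELECT FederalSiteIdentifier, Contam FROM contam "
    "ORDER BY FederalSiteIdentifier"
).fetchall()
print(kept)
```

If two rows for the same ID tie for the longest length, both survive, which is also true of the DLookup version above.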

MAX is an aggregate function, so selecting it alongside a non-aggregated column such as FederalSiteIdentifier requires a GROUP BY clause. Access is also peculiar about how it deletes records: you can't LEFT JOIN and delete in one statement, so you have to identify the records you don't want first and then delete them.
Create a new column so we can choose the records to keep.
ALTER TABLE contam ADD Keep BIT NOT NULL DEFAULT 0;
Now identify the records and update the table
UPDATE c
SET keep = 1
FROM contam AS c
INNER JOIN (
SELECT FederalSiteIdentifier, Max(Len(Contam)) as MaxCont
FROM contam
GROUP BY FederalSiteIdentifier
ORDER BY FederalSiteIdentifier
) AS maxc
ON c.FederalSiteIdentifier = maxc.FederalSiteIdentifier
AND Len(c.Contam) = maxc.MaxCont;
Note the GROUP BY line: that is what was missing from your query and causing the error.
Finally perform the delete
DELETE FROM contam
WHERE keep = 0;
You can now remove the extra column
ALTER TABLE contam DROP COLUMN Keep;
Long-winded, but there you go.

Related

How can I use a row value to dynamically select a column name in Oracle SQL 11g?

I have two tables, one with a single row for each "batch_number" and another with defect details for each batch. The first table has a "defect_of_interest" column which I would like to link to one of the columns in the second table. I am trying to write a query that would then pick the maximum value in that dynamically linked column for any "unit_number" in the "batch_number".
Here is the SQLFiddle with example data for each table: http://sqlfiddle.com/#!9/a1c27d
For example, the maximum value in the DEFECT_DETAILS.SCRATCHES column for BATCH_NUMBER = A1 is 12.
Here is my desired output:
BATCH_NUMBER DEFECT_OF_INTEREST MAXIMUM_DEFECT_COUNT
------------ ------------------ --------------------
A1 SCRATCHES 12
B3 BUMPS 4
C2 STAINS 9
I have tried using the PIVOT function, but I can't get it to work. Not sure if it works in cases like this. Any help would be much appreciated.
If the number of columns is fixed (it seems to be) you can use CASE to select the specific value according to the related table. Then aggregating is simple.
For example:
select
batch_number,
max(defect_of_interest) as defect_of_interest,
max(defect_count) as maximum_defect_count
from (
select
d.batch_number,
b.defect_of_interest,
case when b.defect_of_interest = 'SCRATCHES' then d.scratches
when b.defect_of_interest = 'BUMPS' then d.bumps
when b.defect_of_interest = 'STAINS' then d.stains
end as defect_count
from defect_details d
join batches b on b.batch_number = d.batch_number
) x
group by batch_number
order by batch_number;
See Oracle example in db<>fiddle.
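The CASE approach can be checked with a small sketch in Python's sqlite3 module. SQLite stands in for Oracle here, and the rows below are invented for illustration (the fiddle's actual data is not reproduced in this thread); only the table and column names come from the query above, and unit_number is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE batches (batch_number TEXT, defect_of_interest TEXT);
    CREATE TABLE defect_details (
        batch_number TEXT, unit_number INTEGER,
        scratches INTEGER, bumps INTEGER, stains INTEGER
    );
    -- Made-up sample rows, chosen to match the desired output in the question.
    INSERT INTO batches VALUES ('A1', 'SCRATCHES'), ('B3', 'BUMPS'), ('C2', 'STAINS');
    INSERT INTO defect_details VALUES
        ('A1', 1, 12, 3, 0), ('A1', 2, 7, 20, 1),
        ('B3', 1, 5, 4, 2),  ('B3', 2, 30, 1, 0),
        ('C2', 1, 0, 0, 9),  ('C2', 2, 15, 8, 6);
    """
)

# CASE picks the column named by defect_of_interest; MAX then aggregates it.
query = """
    SELECT d.batch_number,
           b.defect_of_interest,
           MAX(CASE b.defect_of_interest
                   WHEN 'SCRATCHES' THEN d.scratches
                   WHEN 'BUMPS'     THEN d.bumps
                   WHEN 'STAINS'    THEN d.stains
               END) AS maximum_defect_count
    FROM defect_details d
    JOIN batches b ON b.batch_number = d.batch_number
    GROUP BY d.batch_number, b.defect_of_interest
    ORDER BY d.batch_number
"""
result = conn.execute(query).fetchall()
print(result)
```

Note that larger values in the other columns (for example 20 bumps in batch A1) are ignored, because CASE only ever exposes the column of interest to MAX.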

SQL list multiple Duplicates

I'm running a SQL query in Access that gives me matches where A = record 1 and B also = record 1, while C = record 2 and D, E and F also = record 2.
I want my results to display only the max value for each record (this is a matching query):
B = record 1
F = record 2
Basically, I want to eliminate duplicates, and SELECT DISTINCT does not seem to be working for me.
SELECT
FEED_2.ID AS FEED_2_ID,
FEED_3.field_ID,
FEED_3.ID AS FEED_3_ID
FROM FEED_2 INNER JOIN FEED_3 ON FEED_2.[field_ID] = FEED_3.[field_ID]
order by FEED_3.ID
I'm getting results where FEED_2 IDs 1, 3 and 5 all match FEED_3 ID 1.
I only want FEED_2 ID 5 = FEED_3 ID 1, with no dupes.
Sorry, hope that helps.
It's a shot in the dark, but is something like this what you are looking for?
SELECT max(Column_With_ABCDEF), Column_With_record from TABLE_NAME GROUP BY Column_With_record;
If this is not what you are asking for, please do edit your question with your table schema and/or the query you are using so we can help.
---------------- EDIT ----------------
Ok so you can try this:
SELECT Max(FEED_2_ID) AS Max_FEED_2_ID, field_ID, FEED_3_ID
FROM (
SELECT FEED_2.ID AS FEED_2_ID, FEED_3.field_ID As field_ID, FEED_3.ID AS FEED_3_ID
FROM FEED_2 INNER JOIN FEED_3
ON FEED_2.[field_ID] = FEED_3.[field_ID]
) AS matched
GROUP BY FEED_3_ID, field_ID
ORDER BY FEED_3_ID
The outer SELECT groups the result of the subquery, so you should not get duplicated values.
Hope this helps.

The MIN() Function Ms Access

This is a sample SQL query that I created in MS Access. I am trying to get only one row, the one with the MIN(date). However, when I run my query I get multiple rows. Any hints? Thanks.
SELECT tblWarehouseItem.whiItemName,
tblWarehouseItem.whiQty,
tblWarehouseItem.whiPrice,
Min(tblWarehouseItem.whiDateIn) AS MinOfwhiDateIn,
tblWarehouseItem.whiExpiryDate,
tblWarehouseItem.whiwrhID
FROM tblWarehouseItem
GROUP BY tblWarehouseItem.whiDateIn,
tblWarehouseItem.whiItemName,
tblWarehouseItem.whiQty,
tblWarehouseItem.whiPrice,
tblWarehouseItem.whiExpiryDate,
tblWarehouseItem.whiwrhID;
If I write my SQL like this, it works as it should:
SELECT MIN(tblWarehouseItem.whiDateIn) FROM tblWarehouseItem;
In the first query, you group by a number of columns. That means the minimum value will be calculated for each group, which in turn means you may have multiple rows. On the other hand, the second query will only get the minimum value for the specified column from all rows, so that there is only one row in the result set.
A simple example is shown below to illustrate the above.
Table:
Key Value
1 1
1 2
2 3
2 4
On Group By Key:
GroupKey MinValue
1 = min(1,2) = 1 -> Row 1
2 = min(3,4) = 3 -> Row 2
On Min (Value)
MinValue
=min(1,2,3,4) = 1 -> Row 1
For a table like above, if you want to select all rows and also show the minimum value from whole table rather than per group, you can do something like this:
select key, (select min(value) from table)
from table
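The three cases above (minimum per group, minimum over the whole table, and every row paired with the overall minimum) can be reproduced with a short sketch using Python's sqlite3 module. The table name kv and the column names k and v are hypothetical stand-ins for "table", "Key" and "Value", chosen to avoid reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# kv, k and v are illustrative names for the Key/Value table above.
conn.execute("CREATE TABLE kv (k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", [(1, 1), (1, 2), (2, 3), (2, 4)])

# GROUP BY: one minimum per key, so one row per group.
per_group = conn.execute(
    "SELECT k, MIN(v) FROM kv GROUP BY k ORDER BY k"
).fetchall()

# No GROUP BY: a single minimum over all rows.
overall = conn.execute("SELECT MIN(v) FROM kv").fetchall()

# Scalar subquery: every row, each paired with the overall minimum.
with_overall = conn.execute(
    "SELECT k, (SELECT MIN(v) FROM kv) FROM kv ORDER BY k"
).fetchall()

print(per_group)     # [(1, 1), (2, 3)]
print(overall)       # [(1,)]
print(with_overall)  # [(1, 1), (1, 1), (2, 1), (2, 1)]
```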
SELECT WI.*
FROM tblWarehouseItem AS WI
INNER JOIN (
SELECT whiimtID, MIN(tblWarehouseItem.whiDateIn) AS whiDateIn
FROM tblWarehouseItem
GROUP BY whiimtID
) AS MinWI
ON (WI.whiDateIn = MinWI.whiDateIn) AND (WI.whiimtID = MinWI.whiimtID);

SQL Get n last unique entries by date

I have an Access database that I'm well aware is quite poorly designed; unfortunately, it is what I must use. It looks a little like the following:
(Row# is not a column in the database, it's just there to help me describe what I'm after)
Row# ID Date Misc
1 001 01/8/2013 A
2 001 01/8/2013 B
3 001 01/8/2013 C
4 002 02/8/2013 D
5 002 02/8/2013 A
6 003 04/8/2013 B
7 003 04/8/2013 D
8 003 04/8/2013 D
What I'm trying to do is obtain all information entered for the last n (by date) 'entries' where an 'entry' is all rows with a unique ID.
So if I want the last 1 entry I will get rows 6, 7 and 8. The last two entries will get me rows 4-8 etc.
I've tried to get the SN's needed in a subselect and then select all entries where those SN's appear, but I couldn't get it to work. Any help appreciated.
Thanks.
The proper Access syntax:
select *
from t
where ID in (select top 10 ID
from t
group by ID
order by max([date]) desc
)
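As a rough check of the group-then-rank idea against the sample rows, here is a sketch using Python's sqlite3 module. SQLite has no TOP, so LIMIT stands in for it. The column names row_no and entry_date are hypothetical, and the dates are assumed to be day/month/year and are stored as ISO strings so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_no mirrors the Row# helper column from the question (not a real column there).
conn.execute("CREATE TABLE t (row_no INTEGER, ID TEXT, entry_date TEXT, Misc TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "001", "2013-08-01", "A"), (2, "001", "2013-08-01", "B"),
        (3, "001", "2013-08-01", "C"), (4, "002", "2013-08-02", "D"),
        (5, "002", "2013-08-02", "A"), (6, "003", "2013-08-04", "B"),
        (7, "003", "2013-08-04", "D"), (8, "003", "2013-08-04", "D"),
    ],
)


def last_entries(n):
    """Return the row numbers belonging to the n most recent IDs."""
    sql = """
        SELECT row_no FROM t
        WHERE ID IN (
            SELECT ID FROM t
            GROUP BY ID
            ORDER BY MAX(entry_date) DESC
            LIMIT ?
        )
        ORDER BY row_no
    """
    return [row[0] for row in conn.execute(sql, (n,))]


print(last_entries(1))  # [6, 7, 8]
print(last_entries(2))  # [4, 5, 6, 7, 8]
```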
I think this will work:
select *
from table
where Date in (
select distinct(Date) as unique_date from table order by unique_date DESC limit <num>
)
The idea is to use the subselect with a limit to only identify dates you care about.
EDIT: Some databases do not allow a limit in a subquery (I'm looking at you, MySQL). In that case, you'll have to make a temporary table out of the subquery then select * from it.

SQL Query to remove cyclic redundancy

I have a table that looks like this:
Column A | Column B | Counter
---------------------------------------------
A | B | 53
B | C | 23
A | D | 11
C | B | 22
I need to remove the last row because it's cyclic to the second row. Can't seem to figure out how to do it.
EDIT
There is an indexed date field. This is for Sankey diagram. The data in the sample table is actually the result of a query. The underlying table has:
date | source node | target node | path count
The query to build the table is:
SELECT source_node, target_node, COUNT(1)
FROM sankey_table
WHERE TO_CHAR(data_date, 'yyyy-mm-dd')='2013-08-19'
GROUP BY source_node, target_node
In the sample, the last row C to B is going backwards and I need to ignore it or the Sankey won't display. I need to only show forward path.
Removing all edges from your graph where the tuple (source_node, target_node) is not ordered alphabetically and the symmetric row exists should give you what you want:
DELETE
FROM sankey_table t1
WHERE source_node > target_node
AND EXISTS (
SELECT NULL from sankey_table t2
WHERE t2.source_node = t1.target_node
AND t2.target_node = t1.source_node)
If you don't want to DELETE them, just use this WHERE clause in your query for generating the input for the diagram.
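Here is a small sketch of that filter run against the four edges from the question, using Python's sqlite3 module as a stand-in for Oracle. The table is reduced to the two node columns for illustration, and the outer table is referenced by name instead of the t1 alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified table: only the node columns, one row per edge (assumption).
conn.execute("CREATE TABLE sankey_table (source_node TEXT, target_node TEXT)")
conn.executemany(
    "INSERT INTO sankey_table VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("A", "D"), ("C", "B")],
)

# Remove an edge only if it runs "backwards" alphabetically
# and the reverse edge is also present.
conn.execute(
    """
    DELETE FROM sankey_table
    WHERE source_node > target_node
      AND EXISTS (
          SELECT 1 FROM sankey_table AS t2
          WHERE t2.source_node = sankey_table.target_node
            AND t2.target_node = sankey_table.source_node
      )
    """
)

remaining = conn.execute(
    "SELECT source_node, target_node FROM sankey_table "
    "ORDER BY source_node, target_node"
).fetchall()
print(remaining)  # [('A', 'B'), ('A', 'D'), ('B', 'C')]
```

A backwards edge with no forward counterpart (say D to A alone) would be kept, since the EXISTS test fails for it.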
If you can adjust how your table is populated, you can change the query you're using to only retrieve the values for the first direction (for that date) in the first place, with a little bit of analytic manipulation:
SELECT source_node, target_node, counter FROM (
SELECT source_node,
target_node,
COUNT(*) OVER (PARTITION BY source_node, target_node) AS counter,
RANK () OVER (PARTITION BY GREATEST(source_node, target_node),
LEAST(source_node, target_node), TRUNC(data_date)
ORDER BY data_date) AS rnk
FROM sankey_table
WHERE TO_CHAR(data_date, 'yyyy-mm-dd')='2013-08-19'
)
WHERE rnk = 1;
The inner query gets the same data you collect now but adds a ranking column, which will be 1 for the first row for any source/target pair in any order for a given day. The outer query then just ignores everything else.
This might be a candidate for a materialised view if you're truncating and repopulating it daily.
If you can't change your intermediate table but can still see the underlying table you could join back to it using the same kind of idea; assuming the table you're querying from is called sankey_agg_table:
SELECT sat.source_node, sat.target_node, sat.counter
FROM sankey_agg_table sat
JOIN (SELECT source_node, target_node,
RANK () OVER (PARTITION BY GREATEST(source_node, target_node),
LEAST(source_node, target_node), TRUNC(data_date)
ORDER BY data_date) AS rnk
FROM sankey_table) st
ON st.source_node = sat.source_node
AND st.target_node = sat.target_node
AND st.rnk = 1;
SQL Fiddle demos.
DELETE FROM yourTable
where [Column A]='C'
given that these are all your rows
EDIT
I would recommend that you clean up your source data if you can, i.e. delete the rows that you call backwards, if those rows are incorrect as you state in your comments.