Finding unique combination of columns associated with 1 non-unique column

Finding unique combination of columns associated with 1 non-unique column - sql

Here's my table:
ItemID
ItemName
ItemBatch
TrackingNumber
a
bag
1
498239
a
bag
1
498239
a
bag
1
958103
b
paper
2
123444
b
paper
2
123444
I'm trying to find occurrences of ItemID + ItemName + ItemBatch that have a non-unique TrackingNumber. So in the example above, there are 3 occurrences of a bag 1 and at least 1 of those rows has a different TrackingNumber from any of the other rows. In this case 958103 is different from 498239 so it should be a hit.
For b paper 2 the TrackingNumber is unique for all the respective rows so we ignore this. Is there a query that can pull this combination of columns with 3 identical fields and 1 non-unique field?

Yet another option:
SELECT *
FROM tab
WHERE ItemBatch IN (SELECT ItemBatch
FROM tab
GROUP BY ItemBatch, TrackingNumber
HAVING COUNT(TrackingNumber) = 1)
This query finds the combination of (ItemBatch, TrackingNumber) that occur only once, then gets all rows corresponding to their ItemBatch values.
Try it here.

You can use GROUP BY and HAVING
SELECT
t.ItemID,
t.ItemName,
t.ItemBatch,
COUNT(*)
FROM YourTable t
GROUP BY
t.ItemID,
t.ItemName,
t.ItemBatch
HAVING COUNT(DISTINCT TrackingNumber) > 1;
Or if you want each individual row you can use a window function. You cannot use COUNT(DISTINCT in a window function, but you can simulate it with DENSE_RANK and MAX
SELECT
t.*
FROM (
SELECT *,
Count = MAX(dr) OVER (PARTITION BY t.ItemID, t.ItemName, t.ItemBatch)
FROM (
SELECT *,
dr = DENSE_RANK() OVER (PARTITION BY t.ItemID, t.ItemName, t.ItemBatch ORDER BY t.TrackingNumber)
FROM YourTable t
) t
) t
WHERE t.Count > 1;
db<>fiddle

Related

SUM a specific column in next rows until a condition is true

Here is a table of articles and I want to store sum of Mass Column from next rows in sumNext Column based on a condition.
If next row has same floor (in floorNo column) as current row, then add the mass of next rows until the floor is changed
E.g : Rows three has sumNext = 2. That is computed by adding the mass from row four and row five because both rows has same floor number as row three.
id
mass
symbol
floorNo
sumNext
2891176
1
D
1
0
2891177
1
L
8
0
2891178
1
L
1
2
2891179
1
L
1
1
2891180
1
1
0
2891181
1
5
2
2891182
1
5
1
2891183
1
5
0
Here is the query, that is generating this table, I just want to add sumNext column with the right value inside.
WITH items AS (SELECT
SP.id,
SP.mass,
SP.symbol,
SP.floorNo
FROM articles SP
ORDER BY
DECODE(SP.symbol,
'P',1,
'D',2,
'L',3,
4 ) asc)
SELECT CLS.*
FROM items CLS;

You could use below solution which uses
common table expression (cte) technique to put all consecutive rows with same FLOORNO value in the same group (new grp column).
Then uses the analytic version of SUM function to sum all next MASS per grp column as required.
Items_RowsNumbered (id, mass, symbol, floorNo, rnb) as (
select ID, MASS, SYMBOL, FLOORNO
, row_number()over(
order by DECODE(symbol, 'P',1, 'D',2, 'L',3, 4 ) asc, ID )
/*
You need to add ID column (or any others columns that can identify each row uniquely)
in the "order by" clause to make the result deterministic
*/
from (Your source query)Items
)
, cte(id, mass, symbol, floorNo, rnb, grp) as (
select id, mass, symbol, floorNo, rnb, 1 grp
from Items_RowsNumbered
where rnb = 1
union all
select t.id, t.mass, t.symbol, t.floorNo, t.rnb
, case when t.floorNo = c.floorNo then c.grp else c.grp + 1 end grp
from Items_RowsNumbered t
join cte c on (c.rnb + 1 = t.rnb)
)
select
ID, MASS, SYMBOL, FLOORNO
/*, RNB, GRP*/
, nvl(
sum(MASS)over(
partition by grp
order by rnb
ROWS BETWEEN 1 FOLLOWING and UNBOUNDED FOLLOWING)
, 0
) sumNext
from cte
;
demo on db<>fiddle

This is a typical gaps-and-islands problem. You can use LAG() in order to determine the exact partitions, and then SUM() analytic function such as
WITH ii AS
(
SELECT i.*,
ROW_NUMBER() OVER (ORDER BY id DESC) AS rn2,
ROW_NUMBER() OVER (PARTITION BY floorNo ORDER BY id DESC) AS rn1
FROM items i
)
SELECT id,mass,symbol, floorNo,
SUM(mass) OVER (PARTITION BY rn2-rn1 ORDER BY id DESC)-1 AS sumNext
FROM ii
ORDER BY id
Demo

Remove all non contiguous records with identical fields

I got a table with some columns like
ID RecordID DateInserted
1 10 now + 1
2 10 now + 2
3 4 now + 3
4 10 now + 4
5 10 now + 5
I would like to remove all non contiguous duplicates of the RecordID Column when they are sorted by DateInserted
In my example I would like to remove record 4 and 5 because between 2 and 4 there is a record with different id.
Is there a way to do it with 1 query ?

You can use window functions. One method is to count the changes in value that occur up to each row and just take the rows with one change:
select t.*
from (select t.*,
sum(case when prev_recordid = recordid then 0 else 1 end) over (order by dateinserted) as grp_num
from (select t.*,
lag(recordid) over (order by dateinserted) as prev_recordid
from t
) t
) t
where grp_num = 1;

One way would be to "flag" all the rows where it is not the first time this RecordID appeared and the prior row contained a different RecordID. Then you just exclude any row beyond that point for that RecordID.
;WITH cte AS
(
SELECT ID, RecordID, DateInserted,
dr = DENSE_RANK() OVER (PARTITION BY RecordID ORDER BY DateInserted),
prior = COALESCE(LAG(RecordID,1) OVER (ORDER BY DateInserted), RecordID)
FROM dbo.table_name
),
FlaggedRows AS
(
SELECT RecordID, dr
FROM cte
WHERE dr > 1 AND prior <> RecordID
)
SELECT cte.ID, cte.RecordID, cte.DateInserted
FROM cte
LEFT OUTER JOIN FlaggedRows AS f
ON cte.RecordID = f.RecordID
WHERE cte.dr < COALESCE(f.dr, cte.dr + 1)
ORDER BY cte.DateInserted;
If you want to actually delete the rows from the source (remove will typically be inferred as removing from the result), then change the SELECT at the end to:
DELETE cte
FROM cte
INNER JOIN FlaggedRows f
ON cte.RecordID = f.RecordID
WHERE cte.dr >= f.dr;

select two grouped by columns but only select row with the highest COUNT()

I have a table that consists of three columns - UPC, ATTRIBUTE, STORE_NUM. I have 10 stores and 2 UPCs at each with different ATTRIBUTEs.
Every store either has either attribute X or Y. I group by UPC and ATTRIBUTE and get the count of stores.
SELECT [UPC], [ATTRIBUTE], COUNT([STORE_NUM]) AS [COUNT]
FROM TABLEA
GROUP BY [UPC], [ATTRIBUTE]
Yields this:
UPC ATTRIBUTE COUNT
1 X 8
1 Y 2
2 X 1
2 Y 9
And I want to select UPC and ATTRIBUTE with the highest count. My desired output would be this:
UPC ATTRIBUTE
1 X
2 Y
I can't figure out how to reach this desired outcome.

You can use window functions with aggregation:
SELECT *
FROM (SELECT [UPC], [ATTRIBUTE], COUNT(*) AS [COUNT],
ROW_NUMBER() OVER (PARTITION BY UPC ORDER BY COUNT(*) DESC) as seqnum
FROM TABLEA
GROUP BY [UPC], [ATTRIBUTE]
) x
WHERE seqnum = 1;
Use RANK() if you want duplicates in the event of ties.

Use row_number and a subquery:
SELECT UPC, ATTRIBUTE
FROM (
SELECT UPC, ATTRIBUTE, ROW_NUMBER() OVER (PARTITION BY UPC ORDER BY a_count DESC) as rn
FROM ( SELECT [UPC],[ATTRIBUTE],COUNT([STORE_NUM]) AS [a_COUNT]
FROM TABLEA
GROUP BY [UPC],[ATTRIBUTE]
) t
) q
WHERE q.rn = 1

Count the total (N) of duplicates in a column

I'm attempting to count the total number of duplicates in a column (not the individual duplicates).
from outputs
GROUP BY journal_id
HAVING ( COUNT(doi) > 1 )
WHERE journal_id = 1
SQL TABLE
doi journal_id
123 1
123 2
123 1
124 1
The expected answer is 2

The number of entire row duplicates can be calculated by taking the total number of rows and subtracting the number of distinct rows:
select a.cnt_all - d.cnt_individual
from (select count(*) as cnt_all
from outputs
) a cross join
(select count(*) as cnt_individual
from (select distinct *
from outputs
) d
) d;
If you know your columns and your database supports multiple arguments to count(distinct), this can be radically simplified to:
select count(*) - count(distinct doi, journal_id)
from outputs;
Or, if your database doesn't support this:
select sum(cnt - 1)
from (select doi, journal_id, count(*) as cnt
from outputs
group by doi, journal_id
) o;

Just sum up the count of the individual duplicates by journal id.
SELECT
SUM(COUNT(doi)) AS total_duplicates
from
outputs
WHERE
journal_id = 1
GROUP BY
journal_id
HAVING
(COUNT(doi) > 1)

Getting rows with duplicate column values

I tried this with solutions avaialble online, but none worked for me.
Table :
Id rank
1 100
1 100
2 75
2 45
3 50
3 50
I want Ids 1 and 3 returned, beacuse they have duplicates.
I tried something like
select * from A where rank in (
select rank from A group by rank having count(rank) > 1
This also returned ids without any duplicates. Please help.

Try this:
select id from table
group by id, rank
having count(*) > 1

select id, rank
from
(
select id, rank, count(*) cnt
from rank_tab
group by id, rank
having count(*) > 1
) t

This general idea should work:
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > 1 AND COUNT(DISTINCT rank) = 1
In plain English: get every id that exists in multiple rows, but all these rows have the same value in rank.
If you want ids that have some duplicated ranks (but not necessarily all), something like this should work:
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > COUNT(DISTINCT rank)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Finding unique combination of columns associated with 1 non-unique column - sql

Related

SUM a specific column in next rows until a condition is true

Remove all non contiguous records with identical fields

select two grouped by columns but only select row with the highest COUNT()

Count the total (N) of duplicates in a column

Getting rows with duplicate column values

Categories

Resources