Using GROUP BY, select ID of record in each group that has lowest ID - sql

I am creating a file orginization system where you can add content items to multiple folders.
I am storing the data in a table that has a structure similar to the following:
ID TypeID ContentID FolderID
1 101 1001 1
2 101 1001 2
3 102 1002 3
4 103 1002 2
5 103 1002 1
6 104 1001 1
7 105 1005 2
I am trying to select the first record for each unique TypeID and ContentID pair. For the above table, I would want the results to be:
ID
1
3
4
6
7
As you can see, the pairs 101 1001 and 103 1002 were each added to two folders, yet I only want the record with the first folder they were added to.
When I try the following query, however, I only get result that have at least two entries with the same TypeID and ContentID:
select MIN(ID)
from table
group by TypeID, ContentID
results in
ID
1
4
If I change MIN(ID) to MAX(ID) I get the correct amount of results, yet I get the record with the last folder they were added to and not the first folder:
ID
2
3
5
6
7
Am I using GROUP BY or the MIN wrong? Is there another way that I can accomplish this task of selecting the first record of each TypeID ContentID pair?

MIN() and MAX() should return the same amount of rows. Changing the function should not change the number of rows returned in the query.
Is this query part of a larger query? From looking at the sample data provided, I would assume that this code is only a snippet from a larger action you are trying to do. Do you later try to join TypeID, ContentID or FolderID with the tables the IDs are referencing?
If yes, this error is likely being caused by another part of your query and not this select statement. If you are using joins or multi-level select statements, you can get different amount of results if the reference tables do not contain a record for all the foreign IDs.
Another suggestion, check to see if any of the values in your records are NULL. Although this should not affect the GROUP BY, I have sometime encountered strange behavior when dealing with NULL values.

Use ROW_NUMBER
WITH CTE AS
(SELECT ID,TypeID,ContentID,FolderID,
ROW_NUMBER() OVER (PARTITION BY TypeID,ContentID ORDER BY ID) as rn FROM t
)
SELECT ID FROM CTE WHERE rn=1

Use it with ORDER BY:
select *
from table
group by TypeID, ContentID
order by id
SQLFiddle: http://sqlfiddle.com/#!9/024016/12

Try with first ( id) instead of min(id)
select first(id)
from table
group by TypeID, ContentID
It works ?

Related

how to write generic DB script to delete some data based on conditions from microsoft SQL server

I have a user table and RoleUserMapReference tables
From RoleUserMapReference table has coulmn 'referenceId' and 'MapId', for 1 'mapId' multiple referenceId records are there I want to remove specific mapId data if multiple records found for mapId in RoleUserMapReference table. how to write generic query to delete these records
i.e RoleUserMapReference MapId referenceId
11 100 1000
12 100 1002
13 101 1002
14 100 1003
here mapId 100 is having 2 records ,so want to delete records where referenceId ='1000'
how to find and delete with query ?
please help
Below script to delete duplicate values and keep only one rows.
WITH cte AS (
SELECT
[MapId ],
ROW_NUMBER() OVER (
PARTITION BY
[MapId]
ORDER BY
[MapId]
) row_num
FROM
RoleUserMapReference
)
DELETE FROM cte
WHERE row_num = 1;

Updating column according to index within group

In our databases we have a table called conditions which references a table called attributes.
So it looks like this (ignoring some other columns that aren't relevant to the question)
id
attribute_id
execution_index
1
1000
1
2
1000
2
3
1000
1
4
2000
1
5
2000
2
6
2000
2
In theory the combination of attribute_id and execution_index should always be unique, but in practice they're not, and the software ends up essentially using the id to decide which comes first between two conditions with the same execution index. We want to add a uniqueness constraint to the table, but before we do that we need to update the execution indexes. So essentially we want to group them by attribute_id, order them by execution_index then id, and give them new execution indexes so that it becomes
id
attribute_id
execution_index
1
1000
1
2
1000
3
3
1000
2
4
2000
1
5
2000
2
6
2000
3
I'm not sure how to do this without just ordering by attribute_id, execution_index, id and then iterating through incrementing the execution_index by 1 each time and resetting it to be 1 whenever the attribute_id changes. (That would work but it'd be slow and someone is going to have to run this script on several dozen databases so I'd rather it didn't take more than a couple of seconds per database.)
Really I'd like to do something along the lines of
UPDATE c
SET c.execution_index = [this needs to be the index within the group somehow]
FROM condities c
GROUP BY c.attribute_id
ORDER BY c.execution_index asc, c.id asc
But I don't know how to make that actually work.
It looks like you can use an updatable CTE:
with cte as (
select *,
Row_Number() over(partition by attribute_id order by execution_index, id) new
from conditions
)
update cte set execution_index = new
I would suggest adding a new column and first updating that and checking the results are as expected.
Example Fiddle
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER
(
PARTITION BY attribute_id
ORDER BY execution_index, id
) AS RowNum
FROM condities
)
UPDATE cte
SET execution_index = RowNum

How to write sql clause where column not equal and any id associated

I want to setup a simple query that will filter out any row that contains "A" in the ItemID, but my issue is I also do NOT want to display any journal ID from a different row since it matched "A". I tried googling the solution, but I am sure I am not using the right keywords to find it. I am using microsoft sql 2008, but I am not a database admin so I am not to familiar. I tried using distinct, and I also tried group by, but in this situation it does not work.
This is a simplified version of the table that I am working with:
JournalID ItemID PrimaryKEY
1 A 1
1 B 2
2 A 3
2 C 4
3 B 5
4 D 6
And here is how I would like to make it look:
JournalID ItemID PrimaryKEY
3 B 5
4 D 6
This will exclude any rows where the ItemID is 'A' and also any rows that have the same JournalID as a row where a ItemID was 'A'.
SELECT JournalID, ItemID, PrimaryKEY
FROM TABLE
WHERE JournalID NOT IN (Select JournalID FROM TABLE WHERE ItemID = 'A')
Try this:
SELECT *
FROM table_name
WHERE JournalID
NOT IN (SELECT JournalID
FROM table_name
WHERE ItemID = 'A')

SQL Remove Duplicates, save lowest of certain column

I've been looking for an answer to this but couldn't find anything the same as this particular situation.
So I have a one table that I want to remove duplicates from.
__________________
| JobNumber-String |
| JobOp - Number |
------------------
So there are multiples of these two values, together they make the key for the row. I want keep all distinct job numbers with the lowest job op. How can I do this? I've tried a bunch of things, mainly trying the min function, but that only seems to work on the entire table not just the JobNumber sets. Thanks!
Original Table Values:
JobNumber Jobop
123 100
123 101
456 200
456 201
780 300
Code Ran:
DELETE FROM table
WHERE CONCAT(JobNumber,JobOp) NOT IN
(
SELECT CONCAT(JobNumber,MIN(JobOp))
FROM table
GROUP BY JobNumber
)
Ending Table Values:
JobNumber Jobop
123 100
456 200
780 300
With SQL Server 2008 or higher you can enhance the MIN function with an OVER clause specifying a PARTITION BY section.
Please have a look at https://msdn.microsoft.com/en-us/library/ms189461.aspx
You can simply select the values you want to keep:
select jobOp, min(number) from table group by jobOp
Then you can delete the records you don't want:
DELETE t FROM table t
left JOIN (select jobOp, min(number) as minnumber from table group by jobOp ) e
ON t.jobob = e.jobob and t.number = e.minnumber
Where e.jobob is null
I like to do this with window functions:
with todelete as (
select t.*, min(jobop) over (partition by numbers) as minjop
from table t
)
delete from todelete
where jobop > minjop;
It sounds like you are not using the correct GROUP BY clause when using the MIN function. This sql should give you the minimum JobOp value for each JobNumber:
SELECT JobNumber, MIN(JobOp) FROM test.so_test GROUP BY JobNumber;
Using this in a subquery, along with CONCAT (this is from MySQL, SQL Server might use different function) because both fields form your key, gives you this sql:
SELECT * FROM so_test WHERE CONCAT(JobNumber,JobOp)
NOT IN (SELECT CONCAT(JobNumber,MIN(JobOp)) FROM test.so_test GROUP BY JobNumber);

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at DATETIME NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing ID's, but I know this could never happen due to my trinary primary key across those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
)
WHERE x = 1
If you don't want to use a sub query based on a composite of all three keys then you need to create a "dense rank" window function field that orders subsets of id, user_id and foo_id by effective date with the rank order field. Then subquery that and take the records where rank_order=1. Since the rank ordering was by effective date you are getting all fields of the record with the highest effective date for each foo and user.
DATSET
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATSET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001