Oracle: DISTINCT or GROUP BY row consistency - sql

I have the following table:
| Name | Parent | Status |
|------|--------|--------|
| A | P1 | 0 |
| A | P2 | 1 |
| B | PB | -1 |
Will the following query guarantee that each resulting row comes from a single row of the table?
SELECT
DISTINCT Name, Parent, Status
FROM
MyTable
For example, could the result set contain:
A, P1, 1
That combination doesn't match any row in the table. How can I write an SQL statement that selects ANY row, but AT MOST ONE row, for each name?

SELECT DISTINCT will get every row and then discard any duplicate rows in the result set.
The data you've given has duplicates in a column but no duplicate rows - so all rows will be returned:
Query 1:
SELECT DISTINCT
Name,
Parent,
Status
FROM MyTable
Results:
| NAME | PARENT | STATUS |
|------|--------|--------|
| A | P2 | 1 |
| B | PB | -1 |
| A | P1 | 0 |
For example, could the result set contain:
A, P1, 1
No. As the results above show, it does not: DISTINCT only removes whole duplicate rows and never combines values from different rows. However, you can write a query that does:
Query 2:
SELECT Name,
MIN( Parent ),
MAX( Status )
FROM MyTable
GROUP BY Name
Results:
| NAME | MIN(PARENT) | MAX(STATUS) |
|------|-------------|-------------|
| A | P1 | 1 |
| B | PB | -1 |
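The difference between Query 1 and Query 2 can be reproduced locally. The following is a minimal sketch that assumes Python's built-in sqlite3 module, with an in-memory SQLite database standing in for Oracle (DISTINCT and GROUP BY behave the same way for this data). The helper name `demo_distinct_vs_group_by` is hypothetical.

```python
import sqlite3


def demo_distinct_vs_group_by():
    """Compare SELECT DISTINCT with GROUP BY + MIN/MAX on the question's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Name TEXT, Parent TEXT, Status INTEGER)")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?, ?, ?)",
        [("A", "P1", 0), ("A", "P2", 1), ("B", "PB", -1)],
    )
    table_rows = set(conn.execute("SELECT Name, Parent, Status FROM MyTable"))
    # Query 1: DISTINCT compares whole rows, so every output row is a real row.
    distinct_rows = set(conn.execute("SELECT DISTINCT Name, Parent, Status FROM MyTable"))
    # Query 2: each aggregate is computed independently within its group,
    # so one output row can combine values taken from different rows.
    grouped_rows = set(conn.execute(
        "SELECT Name, MIN(Parent), MAX(Status) FROM MyTable GROUP BY Name"
    ))
    conn.close()
    return table_rows, distinct_rows, grouped_rows


table_rows, distinct_rows, grouped_rows = demo_distinct_vs_group_by()
print(distinct_rows <= table_rows)  # True
print(grouped_rows - table_rows)    # {('A', 'P1', 1)}
```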
In answer to your final question:
How can I write an SQL statement that selects ANY row, but AT MOST ONE row, for each name?
This query orders the rows for each name randomly and then selects the first one:
Query 3:
WITH Randomness AS (
SELECT Name,
Parent,
Status,
ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY SYS.DBMS_RANDOM.VALUE() ) AS Random_ID
FROM MyTable
)
SELECT Name,
Parent,
Status
FROM Randomness
WHERE Random_ID = 1
Results:
| NAME | PARENT | STATUS |
|------|--------|--------|
| A | P1 | 0 |
| B | PB | -1 |
If you run Query 3 a second time then you may get the other A row returned (or not - it's random).
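Here is a sketch of Query 3 run against SQLite through Python's sqlite3 module. It assumes SQLite's RANDOM() as a stand-in for Oracle's DBMS_RANDOM.VALUE and a SQLite build with window functions (3.25 or later). `pick_one_row_per_name` is a hypothetical helper. Whichever A row is picked, each output row is a complete row from the table.

```python
import sqlite3


def pick_one_row_per_name(conn):
    """Return one randomly chosen, complete row per Name (SQLite sketch of Query 3)."""
    sql = """
        WITH Randomness AS (
            SELECT Name, Parent, Status,
                   -- RANDOM() stands in for Oracle's DBMS_RANDOM.VALUE
                   ROW_NUMBER() OVER (PARTITION BY Name ORDER BY RANDOM()) AS Random_ID
            FROM MyTable
        )
        SELECT Name, Parent, Status
        FROM Randomness
        WHERE Random_ID = 1
        ORDER BY Name
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Name TEXT, Parent TEXT, Status INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [("A", "P1", 0), ("A", "P2", 1), ("B", "PB", -1)],
)

for _ in range(3):
    # A's row may differ between calls; B's row is always ('B', 'PB', -1)
    print(pick_one_row_per_name(conn))
```

Repeated calls may return either A row, but never a mixed row such as ('A', 'P1', 1).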
Or if you want to be rather silly and completely random then you can select a random parent and a random status for each name (such that the Parent and Status do not have to come from the same row of the original table).
Query 4:
SELECT Name,
MIN( Parent ) KEEP ( DENSE_RANK FIRST ORDER BY SYS.DBMS_RANDOM.VALUE() ) AS Random_Parent,
MIN( Status ) KEEP ( DENSE_RANK FIRST ORDER BY SYS.DBMS_RANDOM.VALUE() ) AS Random_Status
FROM MyTable
GROUP BY Name
Results:
| NAME | RANDOM_PARENT | RANDOM_STATUS |
|------|---------------|---------------|
| A | P1 | 1 |
| B | PB | -1 |

Please try:
SELECT
Name,
Parent,
Status
FROM(
select
Name,
Parent,
Status,
ROW_NUMBER()
OVER (PARTITION BY Name order by Status desc) RNum
From YourTable
)x where RNum=1

Related

Find the count of IDs that have the same value

I'd like to get a count of all of the Ids that have the same value (Drops) as other Ids. For instance, the illustration below shows that IDs 1 and 3 have A drops, so the query would count them. Similarly, IDs 7 and 18 have B drops, so that's another two IDs for the query to count, totalling 4 IDs that share a value with another ID. That is what my query should return.
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 2 | C |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
I've tried several approaches; the following query was my latest attempt.
With cte1 (Id1, D1) as
(
select Id, Drops
from Posts
),
cte2 (Id2, D2) as
(
select Id, Drops
from Posts
)
Select count(distinct c1.Id1) newcnt, c1.D1
from cte1 c1
left outer join cte2 c2 on c1.D1 = c2.D2
group by c1.D1
Written out in full, the result would be a single value, but the records the query should be counting look as follows:
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
Any advice would be great. Thanks
You can use a CTE to generate a list of Drops values that have more than one corresponding ID value, and then JOIN that to Posts to find all rows which have a Drops value that has more than one Post:
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT P.*
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output:
| ID | Drops |
|----|-------|
| 1 | A |
| 3 | A |
| 7 | B |
| 18 | B |
If desired you can then count those posts in total (or grouped by Drops value):
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT COUNT(*) AS newcnt
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output:
| newcnt |
|--------|
| 4 |
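The two CTE queries above can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module and a Posts table that holds only the Id and Drops columns from the question. `shared_drops` is a hypothetical helper name.

```python
import sqlite3

# Shared CTE: the Drops values that occur on more than one row.
CTE = """
WITH CTE AS (
    SELECT Drops
    FROM Posts
    GROUP BY Drops
    HAVING COUNT(*) > 1
)
"""


def shared_drops(conn):
    """Return the rows whose Drops value is shared, and how many such rows exist."""
    rows = conn.execute(
        CTE + "SELECT P.Id, P.Drops FROM Posts P JOIN CTE ON P.Drops = CTE.Drops ORDER BY P.Id"
    ).fetchall()
    (newcnt,) = conn.execute(
        CTE + "SELECT COUNT(*) AS newcnt FROM Posts P JOIN CTE ON P.Drops = CTE.Drops"
    ).fetchone()
    return rows, newcnt


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER, Drops TEXT)")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?)",
    [(1, "A"), (2, "C"), (3, "A"), (7, "B"), (18, "B")],
)
rows, newcnt = shared_drops(conn)
print(rows)    # [(1, 'A'), (3, 'A'), (7, 'B'), (18, 'B')]
print(newcnt)  # 4
```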
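You can use dense_rank() to solve this. Within each Drops partition, rows are ranked by ID, and rows with the same ID get the same rank, so count(distinct rnk) is the number of distinct IDs for that Drops value. Keep the Drops values where that count is greater than 1, then sum the counts.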
with cte as
(
select
drops,
count(distinct rnk) as newCnt
from
( select
*,
dense_rank() over (partition by drops order by id) as rnk
from myTable
) t
group by
drops
having count(distinct rnk) > 1
)
select
sum(newCnt) as newCnt
from cte
Output:
|newcnt |
|------ |
| 4 |
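Here is a sketch of the dense_rank() approach. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, and `shared_id_count` is a hypothetical helper. The extra calls show why counting distinct ranks, rather than rows, matters when an (Id, Drops) pair is repeated.

```python
import sqlite3

DENSE_RANK_SQL = """
WITH cte AS (
    SELECT Drops, COUNT(DISTINCT rnk) AS newCnt
    FROM (
        -- rows with the same Id inside one Drops partition share a rank
        SELECT *, DENSE_RANK() OVER (PARTITION BY Drops ORDER BY Id) AS rnk
        FROM myTable
    ) t
    GROUP BY Drops
    HAVING COUNT(DISTINCT rnk) > 1
)
SELECT SUM(newCnt) AS newCnt FROM cte
"""


def shared_id_count(rows):
    """Count IDs whose Drops value is shared with at least one other ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (Id INTEGER, Drops TEXT)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    (total,) = conn.execute(DENSE_RANK_SQL).fetchone()
    conn.close()
    return total


sample = [(1, "A"), (2, "C"), (3, "A"), (7, "B"), (18, "B")]
print(shared_id_count(sample))                       # 4
# A repeated (Id, Drops) row gets the same rank, so it is not counted twice.
print(shared_id_count(sample + [(1, "A")]))          # 4
# One ID repeated on its own does not count as "shared with another ID".
print(shared_id_count(sample + [(5, "Z"), (5, "Z")]))  # 4
```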
First count the IDs for each Drops value, then sum the counts that are greater than 1:
select sum(countdrops) as total from
(select drops , count(id) as countdrops from yourtable group by drops) as temp
where countdrops > 1;

Get row which matched in each group

I am trying to write an SQL query. I get the results below from two tables, and they are correct. Now I want only the values that are present in every group (every ID). For example, A and B are present for every ID, so I want only A and B in the result. I also want the query to be dynamic. Could anyone help?
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
In the following query, I have placed your current query into a CTE. We can then select the values that appear with every ID in your current result: a value qualifies when its number of distinct IDs equals the total number of distinct IDs. Because that total is computed from the data rather than hard-coded, the query stays dynamic.
WITH cte AS (
-- your current query
)
SELECT Value
FROM cte
GROUP BY Value
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte);
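The relational-division query above can be tried out with this sketch, which assumes SQLite via Python's sqlite3 module. `current_result` is a hypothetical table standing in for "your current query", and `values_in_every_id` is a hypothetical helper. The second call shows that the query adapts when a new ID appears.

```python
import sqlite3

DIVISION_SQL = """
SELECT Value
FROM current_result
GROUP BY Value
-- keep a value only if it appears with every distinct ID in the result
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM current_result)
ORDER BY Value
"""


def values_in_every_id(rows):
    """Return the values that occur for every ID in the given (ID, Value) rows."""
    conn = sqlite3.connect(":memory:")
    # current_result is a hypothetical table standing in for the asker's query
    conn.execute("CREATE TABLE current_result (ID INTEGER, Value TEXT)")
    conn.executemany("INSERT INTO current_result VALUES (?, ?)", rows)
    values = [value for (value,) in conn.execute(DIVISION_SQL)]
    conn.close()
    return values


rows = [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
        (2, "A"), (2, "B"), (2, "C"),
        (3, "A"), (3, "B")]
print(values_in_every_id(rows))               # ['A', 'B']
# No IDs are hard-coded, so adding a group changes the answer automatically.
print(values_in_every_id(rows + [(4, "A")]))  # ['A']
```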
There are at least two ways to do this. Group by the letter (Value), aggregate the IDs with SUM or COUNT, and keep the letters whose SUM(ID) or COUNT(ID) equals the sum or count of all distinct IDs. Both versions assume that each (ID, Value) pair appears only once; with duplicate pairs they can return false matches.
select Value from MyTable group by Value
having SUM(ID) = (SELECT SUM(DISTINCT ID) from MyTable);

select Value from MyTable group by Value
having COUNT(ID) = (SELECT COUNT(DISTINCT ID) from MyTable);
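The SUM and COUNT variants only agree with COUNT(DISTINCT ID) when there are no duplicate (ID, Value) pairs. This sketch (SQLite via Python's sqlite3 module; `run_all` is a hypothetical helper) adds invented duplicate rows X and Y, which the question does not contain, to show how each variant can produce a false match.

```python
import sqlite3

QUERIES = {
    "sum": """SELECT Value FROM MyTable GROUP BY Value
              HAVING SUM(ID) = (SELECT SUM(DISTINCT ID) FROM MyTable) ORDER BY Value""",
    "count": """SELECT Value FROM MyTable GROUP BY Value
                HAVING COUNT(ID) = (SELECT COUNT(DISTINCT ID) FROM MyTable) ORDER BY Value""",
    "count_distinct": """SELECT Value FROM MyTable GROUP BY Value
                HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM MyTable)
                ORDER BY Value""",
}


def run_all(rows):
    """Run each variant on the same (ID, Value) rows and collect the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (ID INTEGER, Value TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = {name: [v for (v,) in conn.execute(sql)] for name, sql in QUERIES.items()}
    conn.close()
    return result


rows = [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
        (2, "A"), (2, "B"), (2, "C"),
        (3, "A"), (3, "B")]
print(run_all(rows))
# {'sum': ['A', 'B'], 'count': ['A', 'B'], 'count_distinct': ['A', 'B']}

# Hypothetical duplicates: X twice under ID 3 (sum 6), Y twice under ID 1 plus ID 2 (count 3).
dupes = rows + [(3, "X"), (3, "X"), (1, "Y"), (1, "Y"), (2, "Y")]
print(run_all(dupes))
# {'sum': ['A', 'B', 'X'], 'count': ['A', 'B', 'Y'], 'count_distinct': ['A', 'B']}
```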
Use this:
WITH CTE
AS
(
SELECT
Value,
Cnt = COUNT(DISTINCT ID)
FROM T1
GROUP BY Value
)
SELECT
Value
FROM CTE
WHERE Cnt = (SELECT COUNT(DISTINCT ID) FROM T1)

Postgres: Deleting rows that are duplicated in one column based on the conditions of another column

I have a PostgreSQL table called users that stores user details, as shown below:
ID | user name | item | dos | Charge|
1 | Ed | 32 |01-02-1987| 1 |
2 | Taya | 01 |05-07-1981|-1 |
3 | Damian | 32 |22-19-1990| 1 |
2 | Taya | 01 |05-07-1981| 1 |
2 | Taya | 01 |05-07-1981| 1 |
1 | Ed | 32 |01-02-1987|-1 |
I want to delete rows that are the same across id, user name, item and dos when the sum of their charges is 0. This means both row 1 and row 6 for Ed get deleted.
With more than two occurrences, if the sum of the charges is 1, I want one row with charge -1 and one row with charge 1 deleted, so that one row with charge 1 is retained. For example, row 2 and one of the charge 1 rows for Taya will be deleted.
The output table I am after is:
ID | user name | item | dos | Charge|
3 | Damian | 32 |22-19-1990| 1 |
2 | Taya | 01 |05-07-1981| 1 |
Any ideas?
You want a HAVING clause. This gives the output you want:
select
id, user_name, item, dos, sum (charge)
from users
group by
id, user_name, item, dos
having
sum (charge) != 0
If you're really trying to delete the records that make it zero:
delete from users
where (id, user_name, item, dos) in (
select id, user_name, item, dos
from users
group by id, user_name, item, dos
having sum (charge) = 0
)
This does the same thing, and is quite a bit more code, but because it's using a semi-join it might be better for really large datasets:
with delete_me as (
select id, user_name, item, dos
from users
group by id, user_name, item, dos
having sum (charge) = 0
)
delete from users t
where exists (
select null
from delete_me d
where
t.id = d.id and
t.user_name = d.user_name and
t.item = d.item and
t.dos = d.dos
)
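The DELETE statements above only remove groups whose charges sum to zero (Ed's rows); all three Taya rows stay in place. This sketch shows one way to handle the Taya case as well: within each group, the k-th +1 row cancels the k-th -1 row. It assumes charges are only +1 or -1, and it runs on SQLite through Python's sqlite3 module, where rowid tells otherwise identical rows apart. In PostgreSQL, the ctid system column could play the role of rowid and LEAST the role of SQLite's two-argument MIN; treat that translation as untested. `cancel_charges` is a hypothetical helper.

```python
import sqlite3

PAIR_DELETE = """
DELETE FROM users
WHERE rowid IN (
    SELECT rid FROM (
        SELECT rowid AS rid,
               -- number the +1 rows and the -1 rows separately within each group
               ROW_NUMBER() OVER (
                   PARTITION BY id, user_name, item, dos, charge ORDER BY rowid
               ) AS rn,
               SUM(CASE WHEN charge = 1 THEN 1 ELSE 0 END) OVER (
                   PARTITION BY id, user_name, item, dos
               ) AS n_pos,
               SUM(CASE WHEN charge = -1 THEN 1 ELSE 0 END) OVER (
                   PARTITION BY id, user_name, item, dos
               ) AS n_neg
        FROM users
    )
    -- the k-th +1 row cancels the k-th -1 row, for k up to the smaller count
    WHERE rn <= MIN(n_pos, n_neg)
)
"""


def cancel_charges(rows):
    """Delete matched +1/-1 pairs and return the remaining rows in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER, user_name TEXT, item TEXT, dos TEXT, charge INTEGER)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute(PAIR_DELETE)
    remaining = conn.execute(
        "SELECT id, user_name, item, dos, charge FROM users ORDER BY rowid"
    ).fetchall()
    conn.close()
    return remaining


rows = [
    (1, "Ed", "32", "01-02-1987", 1),
    (2, "Taya", "01", "05-07-1981", -1),
    (3, "Damian", "32", "22-19-1990", 1),
    (2, "Taya", "01", "05-07-1981", 1),
    (2, "Taya", "01", "05-07-1981", 1),
    (1, "Ed", "32", "01-02-1987", -1),
]
for row in cancel_charges(rows):
    print(row)
# (3, 'Damian', '32', '22-19-1990', 1)
# (2, 'Taya', '01', '05-07-1981', 1)
```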

Select multiple distinct rows from table SQL

I am attempting to select distinct (last updated) rows from a table in my database. I am trying to get the last updated row for each "Sub section". However I cannot find a way to achieve this.
The table looks like:
ID | Name |LastUpdated | Section | Sub |
1 | Name1 | 2013-04-07 16:38:18.837 | 1 | 1 |
2 | Name2 | 2013-04-07 15:38:18.837 | 1 | 2 |
3 | Name3 | 2013-04-07 12:38:18.837 | 1 | 1 |
4 | Name4 | 2013-04-07 13:38:18.837 | 1 | 3 |
5 | Name5 | 2013-04-07 17:38:18.837 | 1 | 3 |
I want my SQL statement to return rows 1, 2, and 5: one row per Sub, the most recent one in each.
I have tried:
SELECT DISTINCT Sub, LastUpdated, Name
FROM TABLE
WHERE LastUpdated = (SELECT MAX(LastUpdated) FROM TABLE WHERE Section = 1)
This only returns the single most recently updated row, which makes sense.
I have searched online and checked related posts here, but haven't found one that answers this.
You can use the row_number() window function to number the rows within each partition of rows that share the same Sub value. Ordering by LastUpdated desc makes row number 1 the latest row in each partition:
select *
from (
select row_number() over (
partition by Sub
order by LastUpdated desc) as rn
, *
from YourTable
) as SubQueryAlias
where rn = 1
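Wouldn't it be enough to use group by?
SELECT DISTINCT MIN(Sub), MAX(LastUpdated), MIN(Name) FROM TABLE WHERE Section = 1 GROUP BY Sub
Note that each aggregate is computed independently, so MIN(Name) need not come from the latest row: for Sub 3 this returns Name4, while the latest row is Name5.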
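The two answers can be compared on the question's data with this sketch, which assumes SQLite 3.25+ via Python's sqlite3 module and stores LastUpdated as ISO-formatted text (which sorts chronologically). It also applies the question's Section = 1 filter, which the row_number() answer omits.

```python
import sqlite3

rows = [
    (1, "Name1", "2013-04-07 16:38:18.837", 1, 1),
    (2, "Name2", "2013-04-07 15:38:18.837", 1, 2),
    (3, "Name3", "2013-04-07 12:38:18.837", 1, 1),
    (4, "Name4", "2013-04-07 13:38:18.837", 1, 3),
    (5, "Name5", "2013-04-07 17:38:18.837", 1, 3),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ID INTEGER, Name TEXT, LastUpdated TEXT, Section INTEGER, Sub INTEGER)"
)
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", rows)

# row_number(): keeps the whole latest row per Sub
latest = conn.execute("""
    SELECT ID, Name
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY Sub ORDER BY LastUpdated DESC) AS rn, *
        FROM YourTable
        WHERE Section = 1
    ) AS SubQueryAlias
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

# GROUP BY: MAX and MIN are evaluated separately, so columns may come from different rows
grouped = conn.execute("""
    SELECT Sub, MAX(LastUpdated), MIN(Name)
    FROM YourTable
    WHERE Section = 1
    GROUP BY Sub
    ORDER BY Sub
""").fetchall()

print(latest)                             # [(1, 'Name1'), (2, 'Name2'), (5, 'Name5')]
print([name for _, _, name in grouped])   # ['Name1', 'Name2', 'Name4']
```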

Grouping SQL Results based on order

I have a table with data like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each run of RowNumbers so that my result looks like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way to tell where each group starts and stops is that RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient, since the table I need to do this on has 52 million rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments:
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6" ... a higher number followed by a lower would be a new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row's RowNumber and set a flag to 1 if the current row starts a new group (there is no previous row, or the previous RowNumber is higher), or to 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
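Here is a sketch of the LAG plus running-sum technique run in SQLite through Python's sqlite3 module. It assumes SQLite 3.25+ for window functions, replaces IIF with an equivalent CASE expression, and uses `group_numbers` as a hypothetical helper. The second call uses the clarified RowNumbers "1,1,2,2,3,4" followed by "1,2,4,6".

```python
import sqlite3

GAPS_AND_ISLANDS = """
WITH T1 AS (
    SELECT *, LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
    FROM YourTable
), T2 AS (
    SELECT *,
           -- 1 marks the first row of a group: no previous row, or RowNumber dropped
           CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                THEN 1 ELSE 0 END AS NewGroup
    FROM T1
)
SELECT ID, RowNumber, Data,
       -- the running total of the flags numbers the groups 1, 2, 3, ...
       SUM(NewGroup) OVER (ORDER BY ID
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
ORDER BY ID
"""


def group_numbers(row_numbers):
    """Assign group numbers to rows with IDs 1..n and the given RowNumbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER, Data TEXT)")
    conn.executemany(
        "INSERT INTO YourTable VALUES (?, ?, 'Data')",
        list(enumerate(row_numbers, start=1)),
    )
    groups = [grp for (_, _, _, grp) in conn.execute(GAPS_AND_ISLANDS)]
    conn.close()
    return groups


print(group_numbers([1, 2, 3, 1, 2, 1, 2, 3, 4]))     # [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(group_numbers([1, 1, 2, 2, 3, 4, 1, 2, 4, 6]))  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
```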
Assuming ID is the clustered index, the plan for this has one scan against YourTable and avoids any sort operations.
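If the IDs are truly sequential and RowNumber also increases by exactly 1 within each group (as in the sample data, but not in the clarified requirements), you can do: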
select t.*,
(id - rowNumber) as grp
from t
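Here is a quick check of the id - rowNumber trick, using SQLite via Python's sqlite3 module (`id_minus_rownumber` is a hypothetical helper). It yields one constant value per group on the sample data. With the clarified RowNumbers, though, the first group is split across several values, and different groups can produce the same value.

```python
import sqlite3


def id_minus_rownumber(row_numbers):
    """Return id - rowNumber for rows with IDs 1..n and the given RowNumbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, rowNumber INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(row_numbers, start=1)))
    grps = [g for (g,) in conn.execute("SELECT id - rowNumber AS grp FROM t ORDER BY id")]
    conn.close()
    return grps


print(id_minus_rownumber([1, 2, 3, 1, 2, 1, 2, 3, 4]))     # [0, 0, 0, 3, 3, 5, 5, 5, 5]
print(id_minus_rownumber([1, 1, 2, 2, 3, 4, 1, 2, 4, 6]))  # [0, 1, 1, 2, 2, 2, 6, 6, 5, 4]
```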
You can also use a recursive CTE. Note that it starts a new group whenever RowNumber = 1, not whenever RowNumber decreases:
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
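FROM cte
OPTION (MAXRECURSION 0) -- lift SQL Server's default limit of 100 recursion levels (one level per row here)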
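Here is a sketch of the same recursive walk in SQLite through Python's sqlite3 module. It assumes that SQLite needs the WITH RECURSIVE keyword, renames the [Group] column to Grp to avoid the reserved word, and uses `group_numbers_recursive` as a hypothetical helper. The second call shows the RowNumber = 1 rule at work on the clarified data: the repeated leading 1 opens a new group.

```python
import sqlite3

RECURSIVE_SQL = """
WITH RECURSIVE cte(ID, RowNumber, Data, Grp) AS (
    -- anchor: the first row starts group 1
    SELECT ID, RowNumber, Data, 1 FROM test1 WHERE ID = 1
    UNION ALL
    -- step: move to the next ID, bumping the group whenever RowNumber is 1
    SELECT t.ID, t.RowNumber, t.Data,
           CASE WHEN t.RowNumber != 1 THEN c.Grp ELSE c.Grp + 1 END
    FROM test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT Grp FROM cte ORDER BY ID
"""


def group_numbers_recursive(row_numbers):
    """Walk rows with IDs 1..n one by one, numbering groups as the recursive CTE does."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test1 (ID INTEGER PRIMARY KEY, RowNumber INTEGER, Data TEXT)")
    conn.executemany(
        "INSERT INTO test1 VALUES (?, ?, 'Data')",
        list(enumerate(row_numbers, start=1)),
    )
    groups = [grp for (grp,) in conn.execute(RECURSIVE_SQL)]
    conn.close()
    return groups


print(group_numbers_recursive([1, 2, 3, 1, 2, 1, 2, 3, 4]))     # [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(group_numbers_recursive([1, 1, 2, 2, 3, 4, 1, 2, 4, 6]))  # [1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
```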
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp -- ID of the row that starts this group
from [Your Table] t
) t
order by ID
This should work on SQL Server 2005. You could also use rank() instead if you don't need consecutive group numbers.