postgresql - filter out double rows (but not the first and last one)

postgresql - filter out double rows (but not the first and last one) - sql

i got an "postgres" SQL problem.
I got a table which looks like this
id name level timestamp
1 pete 1 100
2 pete 1 200
3 pete 1 500
4 pete 5 900
7 pete 5 1000
9 pete 5 1200
15 pete 2 700
Now I want to delete the lines i dont need. i only want to now the first line where he get a new level and the last line he has this level.
id name level timestamp
1 pete 1 100
3 pete 1 500
15 pete 2 700
4 pete 5 900
9 pete 5 1200
(there much more columns like realmpoints and so on)
I have a solution if the the timestamp is only increasing.
SELECT id, name, level, timestamp
FROM player_testing
WHERE id IN ( SELECT MAX(dup.id)
FROM player_testing As dup
GROUP BY dup.name, dup.level)
UNION
SELECT MIN(dup.id)
FROM player_testing As dup
GROUP BY dup.name, dup.level)
)
ORDER BY ts
But I find no way to makes it work for my problem.

select id, name, level, timestamp
from (
select id,name,level,timestamp,
row_number() over (partition by name, level order by timestamp) as rn,
count(*) over (partition by name, level) as max_rn
from player_testing
) t
where rn = 1 or rn = max_rn;
Btw: timestamp is a horrible name for a column. For one reason because it's a reserved word, but more importantly because it doesn't document what the column contains. Is that a start_timestamp and end_timestamp a valid_until_timestamp, ...?

Here is an alternate solution to #a_horse_with_no_name's without over partition, and thus more generic SQL:
select *
from player_testing as A
where id = (
select min(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
or id = (
select max(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
Here is the fiddle to show it working: http://sqlfiddle.com/#!2/47bd44/1

Related

How to set an incrementing flag column for related rows?

I am trying to create a flag column called "Related" to use in reporting to highlight specific rows that are related based on the ID column (1 = related, NULL = not related). The original table "table1" looks like below:
Name ID Related
--------------------------------
Jack 101 NULL
John 101 NULL
Pat 105 NULL
Ben 106 NULL
Jordan 106 NULL
George 300 NULL
Alan 500 NULL
Bill 200 NULL
Bob 200 NULL
I then used this UPDATE statement below:
UPDATE a
SET Related = 1
FROM table1 a
JOIN (SELECT ID FROM table1 GROUP BY ID HAVING COUNT(*) > 1) b
ON a.ID = b.ID
Below is the result of this update statement:
Name ID Related
--------------------------------
Jack 101 1
John 101 1
Pat 105 NULL
Ben 106 1
Jordan 106 1
George 300 NULL
Alan 500 NULL
Bill 200 1
Bob 200 1
This gets me close but I need for it to instead of assigning the number 1 to each related row, to increment the number for each set of related rows based on their different ID column values.
Desired result:
Name ID Related
--------------------------------
Jack 101 1
John 101 1
Pat 105 NULL
Ben 106 2
Jordan 106 2
George 300 NULL
Alan 500 NULL
Bill 200 3
Bob 200 3

This is a possible solution using dense_rank to number your related values and an updateable CTE
with r as (
select id
from t
group by id having Count(*) > 1
),
n as (
select t.id, t.related, Dense_Rank() over (order by r.id) r
from r
join t on t.id = r.id
)
update n set related = r

You can do this without a self-join, just using window functions in a CTE, and updating the CTE directly:
WITH tCounted AS (
SELECT
t.id,
t.related,
c = COUNT(*) OVER (PARTITION BY r.id)
FROM t
),
tWithRelated as (
SELECT
t.id,
t.related,
rn = DENSE_RANK() OVER (ORDER BY r.id)
FROM tCounted
WHERE c > 1
)
UPDATE tWithRelated
SET related = rn;

Use an updateable CTE - comments explain the logic.
with cte1 as (
select [Name], ID, Related
-- Get the count within the id partition, less 1 as specified
, count(*) over (partition by id) - 1 cnt
-- Get the row number within the id partition
, row_number() over (partition by id order by id) rn
from #Test
), cte2 as (
select [Name], ID, Related, cnt, rn
-- Add 1 *only* if the count is > 0 *and* its the first row in the id partition
, case when cnt > 0 then sum(case when cnt > 0 and rn = 1 then 1 else 0 end) over (order by id) else null end NewRelated
from cte1
)
update cte2 set Related = NewRelated;
This doesn't assume Related is already null and works for more than 2 rows for any given ID.
It does assume that one can order by the ID column - even though the data provided doesn't do that.

How to delete records with lower version in big query?

Lets say my table contains the following data
id
name
version
1
Rahul
1
1
Rahul
2
2
John
1
3
Mike
1
2
John
2
4
Rubel
1
5
David
1
1
Rahul
3
I need to filter the duplicate records with lower version. How can this be done?
The output essentially should be
id
name
version
1
Rahul
3
2
John
2
3
Mike
1
4
Rubel
1
5
David
1

For this dataset, aggregation seems sufficient:
select id, name, max(version) as max_version
from mytable
group by id, name

You can use not exists as follows:
select id, name, version
from your_table t
Where not exists
(Select 1 from your_table tt
Where tt.id = t.id and tt.version > t.version)
Or you can use analytical function row_number as follows:
Select id, name, version from
(select t.*,
Row_number() over (partition by id order by version desc) as rn
from your_table t) t
Where rn = 1

SQL Server get distinct counts of name by each ID

I have a dataset like :
ID NAME
1 Aaron
2 Theon
3 Jon Snow
4 Jon Snow
4 Dany
5 Arya
5 Robert
5 Tyrion
I need to add a new column to this that shows the output based on the number of distinct names per ID. So expected output would be:
ID NAME Mapping
1 Aaron 1
2 Theon 1
3 Jon Snow 1
4 Jon Snow 2
4 Dany 2
5 Arya 3
5 Robert 3
5 Tyrion 3
I am confused about how to achieve this since I have tried a case statement where count(distinct(name)) does not return the right values.

You may try using COUNT as an analytic function:
SELECT
ID,
Name,
COUNT(*) OVER (PARTITION BY ID) Mapping
FROM yourTable
ORDER BY
ID;

Another approach to get COUNT of DISTINCT Name for each ID
SELECT *,
(SELECT Count(DISTINCT NAME)
FROM #table T
WHERE T1.id = T.id) Mapping
FROM #table T1
Online Demo

You can simply use below query
SELECT COUNT(DISTINCT NAME)
FROM YOUR_TABLE
GROUP BY ID
Thanks

Other method (specif SQL Server, otherwise use INNER JOIN LATERAL):
SELECT *
FROM #table f1
CROSS APPLY
(
select Count(*) Nb from #table f2
where f2.ID=f1.ID
) f3

How to write optimized query for multiple prioritize conditional joins in SQL server?

The scenario I'm after for is :
Result = Nothing
CollectionOfTables = Tbl1, Tbl2, Tbl3
While(True){
CurrentTable = GetHighestPriorityTable(CollectionOfTables)
If(CurrentTable) = Nothing Then Break Loop;
RemoveCurrentTableFrom(CollectionOfTables)
ForEach ID in CurrentTable as TempRow {
If(Result.DoesntContainsId(ID)) Then Result.AddRow(TempRow)
}
}
Assume I have following three tables.
IdNameTable1, Priority 1
1 John
2 Mary
3 Elsa
IdNameTable2, Priority 2
2 Steve
3 Max
4 Peter
IdNameTable3, Priority 3
4 Frank
5 Harry
6 Mona
Here is the final result I need.
IdNameResult
1 John
2 Mary
3 Elsa
4 Peter
5 Harry
6 Mona
A few tips to keep in mind.
Number of actual tables is 10.
Number of rows per table exceeds 1 Million.
It's not necessary to use join in query, but because of amount of data I'm working with the query must be optimized and used set-operations in SQL not a Cursor script.

Here's a way to do it using UNION and ROW_NUMBER():
;With Cte As
(
Select Id, Name, 1 As Prio
From Table1
Union All
Select Id, Name, 2 As Prio
From Table2
Union All
Select Id, Name, 3 As Prio
From Table3
), Ranked As
(
Select Id, Name, Row_Number() Over (Partition By Id Order By Prio) As RN
From Cte
)
Select Id, Name
From Ranked
Where RN = 1
Order By Id Asc;

How to declare a row as a Alternate Row

id Name claim priority
1 yatin 70 5
6 yatin 1 10
2 hiren 30 3
3 pankaj 40 2
4 kavin 50 1
5 jigo 10 4
7 jigo 1 10
this is my table and i want to arrange this table as shown below
id Name claim priority AlternateFlag
1 yatin 70 5 0
6 yatin 1 10 0
2 hiren 30 3 1
3 pankaj 40 2 0
4 kavin 50 1 1
5 jigo 10 4 0
7 jigo 1 10 0
It is sorted as alternate group of same row.
I am Using sql server 2005. Alternate flag starts with '0'. In my example First record with name "yatin" so set AlternateFlag as '0'.
Now second record has a same name as "yatin" so alternate flag would be '0'
Now Third record with name "hiren" is single record, so assign '1' to it
In short i want identify alternate group with same name...
Hope you understand my problem
Thanks in advance

Try
SELECT t.*, f.AlternateFlag
FROM tbl t
JOIN (
SELECT [name],
AlternateFlag = ~CAST(ROW_NUMBER() OVER(ORDER BY MIN(ID)) % 2 AS BIT)
FROM tbl
GROUP BY name
) f ON f.name = t.name
demo

You could use probably an aggregate function COUNT() and then HAVING() and then UNION both Table, like:
SELECT id, A.Name, Claim, Priority, 0 as AlternateFlag
FROM YourTable
INNER JOIN (
SELECT Name, COUNT(*) as NameCount
FROM YourTable
GROUP BY Name
HAVING COUNT(*) > 1 ) A
ON YourTable.Name = A.Name
UNION ALL
SELECT id, B.Name, Claim, Priority, 1 as AlternateFlag
FROM YourTable
INNER JOIN (
SELECT Name, COUNT(*) as NameCount
FROM YourTable
GROUP BY Name
HAVING COUNT(*) = 1 ) B
ON YourTable.Name = B.Name
Now, this assumes that the Names are unique meaning the names like Yatin for example although has two counts is only associated to one person.
See my SqlFiddle Demo

You can use Row_Number() function with OVER that will give you enumeration, than use the reminder of integer division it by 2 - so you'll get 1s and 0s in your SELECT or in the view.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

postgresql - filter out double rows (but not the first and last one) - sql

Related

How to set an incrementing flag column for related rows?

How to delete records with lower version in big query?

SQL Server get distinct counts of name by each ID

How to write optimized query for multiple prioritize conditional joins in SQL server?

How to declare a row as a Alternate Row

Categories

Resources