How to write an optimized query for multiple prioritized conditional joins in SQL Server? - sql

The scenario I'm after is:
Result = Nothing
CollectionOfTables = Tbl1, Tbl2, Tbl3
While(True){
CurrentTable = GetHighestPriorityTable(CollectionOfTables)
If(CurrentTable = Nothing) Then Break Loop;
RemoveCurrentTableFrom(CollectionOfTables)
ForEach ID in CurrentTable as TempRow {
If(Result.DoesntContainsId(ID)) Then Result.AddRow(TempRow)
}
}
Assume I have following three tables.
Table1, Priority 1
Id Name
1 John
2 Mary
3 Elsa
Table2, Priority 2
Id Name
2 Steve
3 Max
4 Peter
Table3, Priority 3
Id Name
4 Frank
5 Harry
6 Mona
Here is the final result I need.
Result
Id Name
1 John
2 Mary
3 Elsa
4 Peter
5 Harry
6 Mona
A few tips to keep in mind.
Number of actual tables is 10.
Number of rows per table exceeds 1 Million.
It's not necessary to use a join in the query, but because of the amount of data I'm working with, the query must be optimized and must use set-based operations in SQL rather than a cursor script.

Here's a way to do it using UNION ALL and ROW_NUMBER():
;With Cte As
(
Select Id, Name, 1 As Prio
From Table1
Union All
Select Id, Name, 2 As Prio
From Table2
Union All
Select Id, Name, 3 As Prio
From Table3
), Ranked As
(
Select Id, Name, Row_Number() Over (Partition By Id Order By Prio) As RN
From Cte
)
Select Id, Name
From Ranked
Where RN = 1
Order By Id Asc;
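To check the idea against the sample data from the question, here is a minimal sketch that runs the same UNION ALL plus ROW_NUMBER() query. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for SQL Server, so it only demonstrates the logic, not SQL Server performance.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Name TEXT);
    CREATE TABLE Table2 (Id INTEGER, Name TEXT);
    CREATE TABLE Table3 (Id INTEGER, Name TEXT);
    INSERT INTO Table1 VALUES (1, 'John'), (2, 'Mary'), (3, 'Elsa');
    INSERT INTO Table2 VALUES (2, 'Steve'), (3, 'Max'), (4, 'Peter');
    INSERT INTO Table3 VALUES (4, 'Frank'), (5, 'Harry'), (6, 'Mona');
""")

# Tag every row with its table's priority, then keep the best-ranked row per Id.
PRIORITY_QUERY = """
WITH Cte AS (
    SELECT Id, Name, 1 AS Prio FROM Table1
    UNION ALL
    SELECT Id, Name, 2 AS Prio FROM Table2
    UNION ALL
    SELECT Id, Name, 3 AS Prio FROM Table3
), Ranked AS (
    SELECT Id, Name,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Prio) AS RN
    FROM Cte
)
SELECT Id, Name FROM Ranked WHERE RN = 1 ORDER BY Id
"""

merged = conn.execute(PRIORITY_QUERY).fetchall()
for row in merged:
    print(row)
```

With ten tables the CTE simply gets ten UNION ALL branches, each with its own Prio constant.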

How to delete rows from the item which equals an exact value onward?

I have the following dataframe
Block_id step name
1 1 Marie
1 2 Bob
1 3 John
1 4 Lola
2 1 Alex
2 2 John
2 3 Kate
2 4 Herald
3 1 Alec
3 2 Paul
3 3 Rex
As you can see, the data frame is sorted by block_id and then by step. Within each block_id, I want to delete the row where the name is John and everything after it. Blocks without a John stay unchanged. So the desired output would be
Block_id step name
1 1 Marie
1 2 Bob
2 1 Alex
3 1 Alec
3 2 Paul
3 3 Rex
An updatable CTE with a cumulative conditional COUNT seems to be what you are after:
CREATE TABLE dbo.YourTable (BlockID int,
Step int,
[Name] varchar(10));
GO
INSERT INTO dbo.YourTable
VALUES(1,1,'Marie'),
(1,2,'Bob'),
(1,3,'John'),
(1,4,'Lola'),
(2,1,'Alex'),
(2,2,'John'),
(2,3,'Kate'),
(2,4,'Herald'),
(3,1,'Alec'),
(3,2,'Paul'),
(3,3,'Rex');
GO
WITH CTE AS(
SELECT COUNT(CASE [Name] WHEN 'John' THEN 1 END) OVER (PARTITION BY BlockID ORDER BY Step) AS Johns
FROM dbo.YourTable)
DELETE FROM CTE
WHERE Johns >= 1;
GO
SELECT *
FROM dbo.YourTable;
GO
DROP TABLE dbo.YourTable;
One method uses an updatable CTE:
with todelete as (
select t.*,
min(case when name = 'John' then step end) over (partition by block_id) as john_step
from t
)
delete from todelete
where step >= john_step;
Or, if you prefer, a correlated subquery:
delete from t
where step >= (select min(t2.step)
from t t2
where t2.block_id = t.block_id and t2.name = 'John'
);
For performance, both of these can take advantage of an index on (block_id, name, step).
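As a rough check of the cumulative conditional COUNT idea on the question's data, here is a small sketch. It assumes Python's sqlite3 module (SQLite 3.25+) instead of SQL Server; SQLite cannot delete through a CTE, so the sketch swaps in a `rowid IN (...)` filter around the same window expression. The table name `t` is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (block_id INTEGER, step INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, "Marie"), (1, 2, "Bob"), (1, 3, "John"), (1, 4, "Lola"),
    (2, 1, "Alex"), (2, 2, "John"), (2, 3, "Kate"), (2, 4, "Herald"),
    (3, 1, "Alec"), (3, 2, "Paul"), (3, 3, "Rex"),
])

# Running count of 'John' rows per block; it reaches 1 at the John row
# and stays >= 1 for every later step in the same block.
conn.execute("""
    DELETE FROM t
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   COUNT(CASE WHEN name = 'John' THEN 1 END)
                       OVER (PARTITION BY block_id ORDER BY step) AS johns
            FROM t
        )
        WHERE johns >= 1
    )
""")

remaining = conn.execute(
    "SELECT block_id, step, name FROM t ORDER BY block_id, step"
).fetchall()
for row in remaining:
    print(row)
```

Block 3 has no John, so its running count never leaves 0 and all of its rows survive.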

How to delete records with a lower version in BigQuery?

Let's say my table contains the following data:
id name version
1 Rahul 1
1 Rahul 2
2 John 1
3 Mike 1
2 John 2
4 Rubel 1
5 David 1
1 Rahul 3
I need to filter the duplicate records with lower version. How can this be done?
The output essentially should be
id name version
1 Rahul 3
2 John 2
3 Mike 1
4 Rubel 1
5 David 1
For this dataset, aggregation seems sufficient:
select id, name, max(version) as max_version
from mytable
group by id, name
You can use not exists as follows:
select id, name, version
from your_table t
Where not exists
(Select 1 from your_table tt
Where tt.id = t.id and tt.version > t.version)
Or you can use analytical function row_number as follows:
Select id, name, version from
(select t.*,
Row_number() over (partition by id order by version desc) as rn
from your_table t) t
Where rn = 1
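To see that the aggregation and the ROW_NUMBER() variants agree on the sample rows, here is a minimal sketch. It assumes Python's sqlite3 module (SQLite 3.25+) rather than BigQuery, so it illustrates the query logic only; `your_table` is the placeholder name used in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, version INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    (1, "Rahul", 1), (1, "Rahul", 2), (2, "John", 1), (3, "Mike", 1),
    (2, "John", 2), (4, "Rubel", 1), (5, "David", 1), (1, "Rahul", 3),
])

# Variant 1: plain aggregation, enough when only id, name and version matter.
by_group = conn.execute("""
    SELECT id, name, MAX(version) AS max_version
    FROM your_table
    GROUP BY id, name
    ORDER BY id
""").fetchall()

# Variant 2: rank versions per id and keep the top one; this keeps whole rows,
# which helps when the table has more columns than the three shown here.
by_rank = conn.execute("""
    SELECT id, name, version
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY version DESC) AS rn
        FROM your_table t
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in by_rank:
    print(row)
```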

SQL Server get distinct counts of name by each ID

I have a dataset like:
ID NAME
1 Aaron
2 Theon
3 Jon Snow
4 Jon Snow
4 Dany
5 Arya
5 Robert
5 Tyrion
I need to add a new column to this that shows the output based on the number of distinct names per ID. So expected output would be:
ID NAME Mapping
1 Aaron 1
2 Theon 1
3 Jon Snow 1
4 Jon Snow 2
4 Dany 2
5 Arya 3
5 Robert 3
5 Tyrion 3
I am confused about how to achieve this since I have tried a case statement where count(distinct(name)) does not return the right values.
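You may try using COUNT as an analytic function. Note that COUNT(*) counts rows per ID, which equals the number of distinct names only when no name repeats within an ID: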
SELECT
ID,
Name,
COUNT(*) OVER (PARTITION BY ID) Mapping
FROM yourTable
ORDER BY
ID;
Another approach to get the COUNT of DISTINCT Name for each ID:
SELECT *,
(SELECT Count(DISTINCT NAME)
FROM #table T
WHERE T1.id = T.id) Mapping
FROM #table T1
You can simply use the query below:
SELECT COUNT(DISTINCT NAME)
FROM YOUR_TABLE
GROUP BY ID
Another method (specific to SQL Server; otherwise use INNER JOIN LATERAL):
SELECT *
FROM #table f1
CROSS APPLY
(
select Count(*) Nb from #table f2
where f2.ID=f1.ID
) f3
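The row-count and distinct-count approaches above only differ when a name repeats within an ID. The sketch below makes that visible; it assumes Python's sqlite3 module (SQLite 3.25+) in place of SQL Server, a table called `names`, and one extra duplicate row `(4, 'Dany')` that is not in the question's data and is added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (ID INTEGER, NAME TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)", [
    (1, "Aaron"), (2, "Theon"), (3, "Jon Snow"),
    (4, "Jon Snow"), (4, "Dany"),
    (5, "Arya"), (5, "Robert"), (5, "Tyrion"),
    (4, "Dany"),  # hypothetical duplicate, not part of the original dataset
])

rows = conn.execute("""
    SELECT T1.ID,
           T1.NAME,
           COUNT(*) OVER (PARTITION BY T1.ID) AS row_count,
           (SELECT COUNT(DISTINCT T.NAME)
            FROM names T
            WHERE T.ID = T1.ID) AS distinct_count
    FROM names T1
    ORDER BY T1.ID, T1.NAME
""").fetchall()

for row in rows:
    print(row)
```

For ID 4 the window COUNT(*) reports three rows while the correlated COUNT(DISTINCT ...) reports two names; for every other ID the two columns match.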

SQL - Choose 4 movies (one which stars one actor and the other 3 where he doesn't participate)

I have 3 tables:
FILMS
id film
1 Gladiator
2 Pulp Fiction
3 Taxi Driver
4 ...
ACTORS
id actor
1 Russell Crowe
2 Robert DeNiro
3 John Travolta
4 Samuel L. Jackson
RELATIONSHIPS
id_film id_actor
1 1
2 3
2 4
3 2
Now I'm trying to make a query where by passing an actor's id I would get 4 random movies - One where he participates and three others where he doesn't.
I'm finding it hard to find a solution. Any idea of what would be the better approach?
The canonical way would use union all. The following breaks this out into separate CTEs, just to make the logic very clear:
with a_1 as (
select top 1 r.id_film
from relationships r
where r.id_actor = @id_actor
order by newid()
),
nota_3 as (
select top 3 r.id_film
from relationships r
group by r.id_film
having sum(case when r.id_actor = @id_actor then 1 else 0 end) = 0
order by newid()
)
select * from a_1 union all
select * from nota_3;
To my mind the clearest way of expressing the query for non-participation films is with EXCEPT, like so:
;WITH ParticipationFilms AS (
SELECT F.id, F.film
FROM FILMS F INNER JOIN RELATIONSHIPS R ON F.id = R.id_film
WHERE R.id_actor = @id_actor
)
, NonParticipationFilms AS (
SELECT id, film
FROM FILMS
EXCEPT
SELECT id, film
FROM ParticipationFilms
)
-- ORDER BY is not allowed directly before UNION ALL, so each TOP query is wrapped in a derived table
SELECT * FROM (SELECT TOP (1) * FROM ParticipationFilms ORDER BY NEWID()) AS P
UNION ALL
SELECT * FROM (SELECT TOP (3) * FROM NonParticipationFilms ORDER BY NEWID()) AS N
;
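For a quick experiment with the "one film with the actor, up to three without" pattern, here is a sketch under several assumptions: it uses Python's sqlite3 module instead of SQL Server, so `ORDER BY random() LIMIT n` stands in for `TOP (n) ... ORDER BY NEWID()`, and a `NOT IN` filter replaces EXCEPT. Only the three films listed in the question are loaded, so at most two non-participation films can come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FILMS (id INTEGER, film TEXT);
    CREATE TABLE RELATIONSHIPS (id_film INTEGER, id_actor INTEGER);
    INSERT INTO FILMS VALUES (1, 'Gladiator'), (2, 'Pulp Fiction'), (3, 'Taxi Driver');
    INSERT INTO RELATIONSHIPS VALUES (1, 1), (2, 3), (2, 4), (3, 2);
""")

# Each random pick lives in its own derived table so that ORDER BY / LIMIT
# apply to that branch only; stars_actor marks which branch a row came from.
PICK_QUERY = """
SELECT id, film, 1 AS stars_actor FROM (
    SELECT F.id, F.film
    FROM FILMS F JOIN RELATIONSHIPS R ON F.id = R.id_film
    WHERE R.id_actor = :id_actor
    ORDER BY random() LIMIT 1
)
UNION ALL
SELECT id, film, 0 AS stars_actor FROM (
    SELECT id, film
    FROM FILMS
    WHERE id NOT IN (SELECT id_film FROM RELATIONSHIPS WHERE id_actor = :id_actor)
    ORDER BY random() LIMIT 3
)
"""

# Actor 3 is John Travolta in the sample data.
picks = conn.execute(PICK_QUERY, {"id_actor": 3}).fetchall()
for row in picks:
    print(row)
```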

postgresql - filter out double rows (but not the first and last one)

I have a PostgreSQL problem.
I have a table which looks like this:
id name level timestamp
1 pete 1 100
2 pete 1 200
3 pete 1 500
4 pete 5 900
7 pete 5 1000
9 pete 5 1200
15 pete 2 700
Now I want to delete the rows I don't need. For each level, I only want to know the first row where he reaches that level and the last row where he still has it.
id name level timestamp
1 pete 1 100
3 pete 1 500
15 pete 2 700
4 pete 5 900
9 pete 5 1200
(There are many more columns, such as realmpoints and so on.)
I have a solution that works if the timestamp is only increasing.
SELECT id, name, level, timestamp
FROM player_testing
WHERE id IN ( SELECT MAX(dup.id)
FROM player_testing As dup
GROUP BY dup.name, dup.level
UNION
SELECT MIN(dup.id)
FROM player_testing As dup
GROUP BY dup.name, dup.level
)
ORDER BY timestamp
But I can't find a way to make it work for my problem.
select id, name, level, timestamp
from (
select id,name,level,timestamp,
row_number() over (partition by name, level order by timestamp) as rn,
count(*) over (partition by name, level) as max_rn
from player_testing
) t
where rn = 1 or rn = max_rn;
Btw: timestamp is a horrible name for a column. For one thing, it's a reserved word, but more importantly it doesn't document what the column contains. Is it a start_timestamp, an end_timestamp, a valid_until_timestamp, ...?
Here is an alternate solution to @a_horse_with_no_name's that avoids OVER (PARTITION BY ...), and thus uses more generic SQL:
select *
from player_testing as A
where id = (
select min(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
or id = (
select max(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
Here is the fiddle to show it working: http://sqlfiddle.com/#!2/47bd44/1
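To try the window-function answer on the question's rows, including the out-of-order id 15, here is a minimal sketch. It assumes Python's sqlite3 module (SQLite 3.25+) instead of PostgreSQL, and it adds an ORDER BY on the timestamp column so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_testing (id INTEGER, name TEXT, level INTEGER, timestamp INTEGER)"
)
conn.executemany("INSERT INTO player_testing VALUES (?, ?, ?, ?)", [
    (1, "pete", 1, 100), (2, "pete", 1, 200), (3, "pete", 1, 500),
    (4, "pete", 5, 900), (7, "pete", 5, 1000), (9, "pete", 5, 1200),
    (15, "pete", 2, 700),
])

# Number the rows of each (name, level) group by timestamp and keep the
# first one (rn = 1) and the last one (rn = group size).
first_and_last = conn.execute("""
    SELECT id, name, level, timestamp
    FROM (
        SELECT id, name, level, timestamp,
               ROW_NUMBER() OVER (PARTITION BY name, level ORDER BY timestamp) AS rn,
               COUNT(*) OVER (PARTITION BY name, level) AS max_rn
        FROM player_testing
    ) AS t
    WHERE rn = 1 OR rn = max_rn
    ORDER BY timestamp
""").fetchall()

for row in first_and_last:
    print(row)
```

A level with a single row, such as level 2 here, satisfies both conditions at once and is returned once.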