Omit duplicate rows, then pick the surviving row based on certain criteria - sql

So I have the following data in my table
id id2 flag
1 11 0 <- this row should not be part of the result
1 12 1 <- this row should survive the distinct operation
2 13 0
3 14 0
I want my result to be
id id2 flag
1 12 1
2 13 0
3 14 0
How would I construct such a query?
Thanks
EDIT1: Sorry, using two column dummy data doesn't correctly reflect the problem I am facing. I added another column, which complicates the problem. As you can see I can't group on id2 because they are all unique. But the row with id2 = 11 should be omitted from the result.
EDIT2: Changed the question to use 'omit' instead of 'remove'
EDIT3:
select id, id2, max(flag)
from table
group by id, id2
This query returns all 4 rows because every id2 value is unique, so grouping by id2 puts each row in its own group.

When you want to apply additional criteria to the data, you typically use GROUP BY instead of DISTINCT. For example, if you would like to keep flag of 1 if it exists, or keep zero otherwise, you can do this:
SELECT id, MAX(flag) as flag -- Since 1 > 0, MAX() works fine
FROM myTable
GROUP BY id -- This keeps only distinct ids
EDIT : (in response to edits #2&3)
Another solution would be using NOT EXISTS in a subquery, like this:
SELECT id, id2, flag
FROM myTable o
WHERE NOT EXISTS (
SELECT * FROM myTable i WHERE o.id=i.id AND i.flag > o.flag
)
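A minimal sketch of the NOT EXISTS query run against the sample data from the question, using Python's built-in sqlite3 module (the in-memory database and the Python wrapper are assumptions for illustration only; the SQL is the same as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, id2 INTEGER, flag INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, 11, 0), (1, 12, 1), (2, 13, 0), (3, 14, 0)],
)

# Keep a row only if no other row with the same id has a higher flag.
rows = conn.execute(
    """
    SELECT id, id2, flag
    FROM myTable o
    WHERE NOT EXISTS (
        SELECT 1 FROM myTable i WHERE o.id = i.id AND i.flag > o.flag
    )
    ORDER BY id
    """
).fetchall()

print(rows)  # [(1, 12, 1), (2, 13, 0), (3, 14, 0)]
```

Note that if two rows share the same id and the same highest flag, both survive, because neither has a row with a strictly higher flag.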

;with CTE as
(
select
row_number() over (partition by id order by flag desc) as rn,
id,
id2,
flag
from myTable
)
SELECT * from CTE where rn = 1

Related

Select first 5 but only if their next ID are +1 to previous one

So I have to:
select first 5 (or any other number, it will be an input) field2
from table1
where field1=YES
AND
last ID = first ID+5 or something
So normally making:
select first 5 field2
from table1
where field1=YES
---------------------------------------
Table1
---------------------------------------
ID Field1 Field2
1 YES something1
2 NO something2
3 NO something3
4 YES something4
5 YES something5
6 YES something6
7 YES something7
8 YES something8
would return
something1
something4
something5
something6
something7
but what I need is:
something4
something5
something6
something7
something8
The rows all have to be consecutive, so the next ID is always the previous one + 1, with no field1 "NO" in between.
I'm going to present a solution that uses a recursive common table expression to select the various 'groups' of contiguous rows. It doesn't exactly meet your requirements, because I think this solution is more generically useful, but it can be modified to achieve your requirement.
This solution requires Firebird 3.0 or later; for simplicity I made field1 a BOOLEAN.
with recursive r as (
select s.id, s.field1, s.field2, s.prev_id,
row_number() over () as group_num
from s
where prev_id is null or prev_id < id - 1
union all
select s.id, s.field1, s.field2, s.prev_id, r.group_num
from r
inner join s
on s.id = r.id + 1
),
s as (
select id, field1, field2,
lag(id) over (order by id) as prev_id
from table1
where field1
),
a as (
select id, field2, group_num,
count(*) over (partition by group_num) as group_count
from r
)
select id, field2, group_num, group_count
from a
order by id
The CTE r first finds the first row of each group, and then recursively adds the next rows in that group. Be aware: this doesn't work if a group has more than 1024 rows due to recursion limits in Firebird. Each group receives a number, group_num.
The CTE a is used to count the number of rows in each group. The CTE s is used to determine the previous id for each row that matches the condition.
To get the actual result you want, you need to modify the main query to select the first group with 5 or more rows, and fetch the first 5 rows:
-- with clause omitted for brevity, copy it from the previous code block
select field2
from a
where a.group_num = (
select group_num
from a
where group_count >= 5
order by id
fetch first row only
)
order by id
fetch first 5 rows only
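For comparison, here is a sketch of the same result using a different technique: the "gaps and islands" trick (id minus row number is constant within a run of consecutive ids) instead of the recursive CTE above. It is run through Python's sqlite3 module against the sample data, so it uses SQLite syntax (LIMIT, a text 'YES' flag) rather than Firebird syntax, and it assumes SQLite 3.25 or later for window functions. The names grp and run_length are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "YES", "something1"), (2, "NO", "something2"),
        (3, "NO", "something3"), (4, "YES", "something4"),
        (5, "YES", "something5"), (6, "YES", "something6"),
        (7, "YES", "something7"), (8, "YES", "something8"),
    ],
)

run_length = 5  # the "first N" input from the question

query = """
WITH s AS (
    -- Within a run of consecutive ids, id - row_number stays constant.
    SELECT id, field2, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM table1
    WHERE field1 = 'YES'
),
a AS (
    SELECT id, field2, grp, COUNT(*) OVER (PARTITION BY grp) AS group_count
    FROM s
)
SELECT field2
FROM a
WHERE grp = (SELECT grp FROM a WHERE group_count >= ? ORDER BY id LIMIT 1)
ORDER BY id
LIMIT ?
"""
result = [row[0] for row in conn.execute(query, (run_length, run_length))]
print(result)
# ['something4', 'something5', 'something6', 'something7', 'something8']
```

This version has no recursion, so the 1024-row recursion limit does not come into play, but it only groups rows; it does not number the groups the way group_num does.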

Removing rows from result set where column only has one value against a user

I have a result set
name stage value
---- ----- -----
jim 1 4
jim 1 8
paul 1 8
paul 1 8
I want to remove the rows where 8 is the only value against a person,
i.e. keep the 2 jim rows and lose the 2 paul rows.
You can use exists. For a select query:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and t2.value <> 8
);
This keeps a row only when the same name has at least one value other than 8. Similar logic (using not exists rather than exists) can be used for a delete -- if you really want to delete the rows from the table.
If you have a complex query that you don't want to repeat, then window functions are helpful:
select t.*
from (select t.*,
sum(case when value <> 8 then 1 else 0 end) over (partition by name) as cnt_not_8
from t
) t
where cnt_not_8 > 0;
If your database supports analytic functions, then you can use count as follows:
Select * from
(Select t.*,
Count(case when value <> 8 then 1 end) over (partition by name) as cnt
From your_table t) t
Where cnt > 0
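As a quick check against the sample data, here is a sketch using Python's sqlite3 module (the in-memory table is an assumption for illustration). It keeps a name only if that name has at least one value other than 8, first as a select and then as the matching delete:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, stage INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("jim", 1, 4), ("jim", 1, 8), ("paul", 1, 8), ("paul", 1, 8)],
)

# Select: keep rows whose name has some value other than 8.
kept = conn.execute(
    """
    SELECT name, stage, value
    FROM t
    WHERE EXISTS (
        SELECT 1 FROM t t2 WHERE t2.name = t.name AND t2.value <> 8
    )
    ORDER BY name, value
    """
).fetchall()
print(kept)  # [('jim', 1, 4), ('jim', 1, 8)]

# Delete: remove rows whose name has no value other than 8.
conn.execute(
    """
    DELETE FROM t
    WHERE NOT EXISTS (
        SELECT 1 FROM t t2 WHERE t2.name = t.name AND t2.value <> 8
    )
    """
)
remaining = conn.execute(
    "SELECT name, stage, value FROM t ORDER BY name, value"
).fetchall()
print(remaining)  # [('jim', 1, 4), ('jim', 1, 8)]
```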
Assuming your table also has an ID column (an auto-increment integer), this query selects the highest id for each unique combination:
select max(id) from t group by name,stage,value
In your example this returns a single id (the latest) for the two duplicate rows having values paul,1,8 in columns name,stage,value respectively.
You can then use the prior query in a where clause to filter out any duplicates:
select * from t
where id in (select max(id) from t group by name,stage,value)
Finally you can also delete rows that are not unique if that's your goal:
delete from t
where id not in (select max(id) from t group by name,stage,value)

Recursive Lag Column Calculation in SQL

I am trying to write a procedure that inserts calculated table data into another table.
The problem I have is that I need each row's calculated column to be influenced by the result of the previous row's calculated column. I tried to lag the calculation itself but this does not work!
Such as:
(Max is a function I created that returns the highest of two values)
Id Product Model Column1 Column2
1 A 1 5 =MAX(Column1*2, Lag(Column2))
2 A 2 2 =MAX(Column1*2, Lag(Column2))
3 B 1 3 =MAX(Column1*2, Lag(Column2))
If I try the above in SQL:
SELECT
Column1,
MyMAX(Column1*2, LAG(Column2, 1, 0) OVER (PARTITION BY Product ORDER BY Model ASC)) As Column2
FROM Source
...it says column2 is unknown.
Output I get if I LAG the Column2 calculation:
Select Column1, MyMAX(Column1*2, LAG(Column1*2, 1, 0) OVER (PARTITION BY Product ORDER BY Model ASC)) As Column2
Id Column1 Column2
1 5 10
2 2 10
3 3 6
Why 6 on row 3? Because 3*2 > 2*2.
Output that I want:
Id Column1 Column2
1 5 10
2 2 10
3 3 10
Why 10 on row 3? Because previous result of 10 > 3*2
The problem is I can't lag the result of Column2 - I can only lag other columns or calculations of them!
Is there a technique for achieving this with LAG, or must I use a recursive CTE? I read that LAG supersedes CTEs, so I assumed it would be possible. If not, what would this 'CTE' look like?
Edit: Or alternatively - what else could I do to resolve this calculation?
Edit
In hindsight, this problem is a running partitioned maximum over Column1 * 2. It can be done as simply as
SELECT Id, Column1, Model, Product,
MAX(Column1 * 2) OVER (Partition BY Model, Product Order BY ID ASC) AS Column2
FROM Table1;
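To see why a running maximum is enough, here is a small sketch of the recurrence Column2[n] = max(Column1[n] * 2, Column2[n-1]) in Python, using the three Column1 values from the question and ignoring partitioning (the list and variable names are just for illustration):

```python
from itertools import accumulate

# Column1 values from the question, ordered by Id.
column1 = [5, 2, 3]

# Each result is the larger of the current Column1 * 2 and the previous
# result, which is exactly a running maximum over Column1 * 2.
column2 = list(accumulate((c * 2 for c in column1), max))

print(column2)  # [10, 10, 10]
```

Because each value depends on the previous result only through max, no lag of the computed column is needed.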
Original Answer
Here's a way to do this with a recursive CTE, without LAG at all, by joining on incrementing row numbers. I haven't assumed that your Id is contiguous, hence I have added an additional ROW_NUMBER(). You haven't mentioned any partitioning, so I haven't applied any. The query simply starts at the first row, and then projects the greater of the current Column1 * 2 and the preceding Column2.
WITH IncrementingRowNums AS
(
SELECT Id, Column1, Column1 * 2 AS Column2,
ROW_NUMBER() OVER (Order BY ID ASC) AS RowNum
FROM Table1
),
lagged AS
(
SELECT Id, Column1, Column2, RowNum
FROM IncrementingRowNums
WHERE RowNum = 1
UNION ALL
SELECT i.Id, i.Column1,
CASE WHEN (i.Column2 > l.Column2)
THEN i.Column2
ELSE l.Column2
END,
i.RowNum
FROM IncrementingRowNums i
INNER JOIN lagged l
ON i.RowNum = l.RowNum + 1
)
SELECT Id, Column1, Column2
FROM lagged;
Edit, Re Partitions
Partitioning is much the same, by just dragging the Model + Product columns through, then partitioning by these in the row numbering (i.e. starting back at 1 each time the Product or Model resets), including these in the CTE JOIN condition and also in the final ordering.
WITH IncrementingRowNums AS
(
SELECT Id, Column1, Column1 * 2 AS Column2, Model, Product,
ROW_NUMBER() OVER (Partition BY Model, Product Order BY ID ASC) AS RowNum
FROM Table1
),
lagged AS
(
SELECT Id, Column1, Column2, Model, Product, RowNum
FROM IncrementingRowNums
WHERE RowNum = 1
UNION ALL
SELECT i.Id, i.Column1,
CASE WHEN (i.Column2 > l.Column2)
THEN i.Column2
ELSE l.Column2
END,
i.Model, i.Product,
i.RowNum
FROM IncrementingRowNums i
INNER JOIN lagged l
ON i.RowNum = l.RowNum + 1
AND i.Model = l.Model AND i.Product = l.Product
)
SELECT Id, Column1, Column2, Model, Product
FROM lagged
ORDER BY Model, Product, Id;

How to select distinct rows with a specified condition

Suppose there is a table
_ _
a 1
a 2
b 2
c 3
c 4
c 1
d 2
e 5
e 6
How can I select distinct minimum value of all the rows of each group?
So the expected result here is:
_ _
a 1
b 2
c 1
d 2
e 5
EDIT
My actual table contains more columns and I want to select them all. The rows differ only in the last column (the second one in the example). I'm new to SQL, and possibly my question was ill-formed in its initial version.
The actual schema is:
| day | currency ('EUR', 'USD') | diff (integer) | id (foreign key) |
There are duplicate pairs (day, currency) that differ by (diff, id). I want to see a table with unique pairs (day, currency), each with the minimum diff from the original table.
Thanks!
in your case it's as simple as this:
select column1, min(column2) as column2
from table
group by column1
for more than two columns I can suggest this:
select top 1 with ties
t.column1, t.column2, t.column3
from table as t
order by row_number() over (partition by t.column1 order by t.column2)
take a look at this post https://stackoverflow.com/a/13652861/1744834
You can use the ranking function ROW_NUMBER() to do this with a CTE. This is especially useful if there are more columns than these two, since it returns the whole row holding the minimum value for each group, like so:
;WITH RankedCTE
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY column1 ORDER BY column2) rownum
FROM Table
)
SELECT column1, column2
FROM RankedCTE
WHERE rownum = 1;
This will give you:
COLUMN1 COLUMN2
a 1
b 2
c 1
d 2
e 5
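Applied to the actual schema from the question's edit, the same ROW_NUMBER() idea might look like the sketch below. It runs through Python's sqlite3 module and assumes SQLite 3.25 or later for window functions; the table name rates and all of the sample rows are invented for illustration, since the question gives only the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (day TEXT, currency TEXT, diff INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows: duplicate (day, currency) pairs with different diff/id.
        ("2013-01-01", "EUR", 5, 101),
        ("2013-01-01", "EUR", 2, 102),
        ("2013-01-01", "USD", 7, 103),
        ("2013-01-02", "USD", 4, 104),
        ("2013-01-02", "USD", 9, 105),
    ],
)

# Rank rows within each (day, currency) pair by diff and keep the smallest,
# carrying every column of that row along.
rows = conn.execute(
    """
    WITH ranked AS (
        SELECT day, currency, diff, id,
               ROW_NUMBER() OVER (PARTITION BY day, currency ORDER BY diff) AS rownum
        FROM rates
    )
    SELECT day, currency, diff, id
    FROM ranked
    WHERE rownum = 1
    ORDER BY day, currency
    """
).fetchall()

for row in rows:
    print(row)
# ('2013-01-01', 'EUR', 2, 102)
# ('2013-01-01', 'USD', 7, 103)
# ('2013-01-02', 'USD', 4, 104)
```

Unlike a plain GROUP BY with MIN(diff), this returns the id that belongs to the minimum diff. If two rows tie on diff, one of them is picked arbitrarily unless a tie-breaker is added to the ORDER BY.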
SELECT ColOne, Min(ColTwo)
FROM Table
GROUP BY ColOne
ORDER BY ColOne
PS: I'm not in front of a machine, but please give the above a try.
select MIN(col2),col1
from dbo.Table_1
group by col1

SQL Query to Select the 'Next' record (similar to First or Top N)

I need to do a query to return the next (or prev) record if a certain record is not present. For instance consider the following table:
ID (primary key) value
1 John
3 Bob
9 Mike
10 Tom.
I'd like to query a record that has id 7 or greater if 7 is not present.
My questions are,
Are these types of queries possible with SQL?
What are such queries called in the DB world?
Thanks!
Yes, it's possible, but implementation will depend on your RDBMS.
Here's what it looks like in MySQL, PostgreSQL and SQLite:
select ID, value
from YourTable
where id >= 7
order by id
limit 1
In MS SQL-Server, Sybase and MS-Access:
select top 1 ID, value
from YourTable
where id >= 7
order by id
In Oracle:
select * from (
select ID, value
from YourTable
where id >= 7
order by id
)
where rownum = 1
In Firebird and Informix:
select first 1 ID, value
from YourTable
where id >= 7
order by id
In DB2 (this syntax is in the SQL:2008 standard):
select id, value
from YourTable
where id >= 7
order by id
fetch first 1 rows only
In those RDBMS that have "window" functions (in SQL-2003 standard):
select ID, Value
from (
select
ROW_NUMBER() OVER (ORDER BY id) as rownumber,
Id, Value
from YourTable
where id >= 7
) as tmp --- remove the "as" for Oracle
where rownumber = 1
And if you are not sure which RDBMS you have:
select ID, value
from YourTable
where id =
( select min(id)
from YourTable
where id >= 7
)
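Here is a small sketch of the "next" and "previous" lookups against the sample table, using Python's sqlite3 module and therefore the LIMIT form of the query. The helper names next_record and prev_record are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "John"), (3, "Bob"), (9, "Mike"), (10, "Tom")],
)


def next_record(conn, wanted_id):
    """Return the row with the given id, or the next higher id if it is absent."""
    return conn.execute(
        "SELECT id, value FROM YourTable WHERE id >= ? ORDER BY id LIMIT 1",
        (wanted_id,),
    ).fetchone()


def prev_record(conn, wanted_id):
    """Return the row with the given id, or the next lower id if it is absent."""
    return conn.execute(
        "SELECT id, value FROM YourTable WHERE id <= ? ORDER BY id DESC LIMIT 1",
        (wanted_id,),
    ).fetchone()


print(next_record(conn, 7))   # (9, 'Mike')
print(prev_record(conn, 7))   # (3, 'Bob')
print(next_record(conn, 11))  # None
```

The last call shows the edge case to handle in application code: when no id at or beyond the requested one exists, the query returns no row.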
Try this for MS-SQL:
SELECT TOP 1
id, value
FROM your_table
WHERE id >= 7
ORDER BY id
or for MySql
SELECT id, value
FROM your_table
WHERE id >= 7
ORDER BY id
LIMIT 0,1
I would simply do it like this:
select top 1 * from myTable where id >=7
order by id
The top 1 part is T-SQL (MSSQL/Sybase); other implementations vary, but it is always possible (MySQL/PostgreSQL: LIMIT 1, Oracle: rownum = 1).
select top 1 * from Persons where Id >= #Id order by Id