how to remove the duplicate fields from SQL

I am trying to write a SQL query over records like the ones below that produces the expected outcome shown. Basically, I want to fetch each Project with the topmost FundSrc in the list.
Can someone please suggest a query for this?
Example table name: Proj
| Project | FundSrc |
|---------|---------|
| 1001 | ABC |
| 1001 | XYZ |
| 1001 | TYS |
| 1002 | XYZ |
| 1002 | TYS |
| 1003 | ABC |
| 1003 | TYS |
| 1003 | TYS |
Expected outcome:
| Project | FundSrc |
|--------- |--------- |
| 1001 | ABC |
| 1002 | XYZ |
| 1003 | ABC |

Find duplicate rows using the GROUP BY clause or ROW_NUMBER() function.
Use the DELETE statement to remove duplicate rows.
SELECT [Project],
[FundSrc],
COUNT(*) AS CNT
FROM [SampleDB].[dbo].[dbname]
GROUP BY [Project],
[FundSrc]
HAVING COUNT(*) > 1;
First, a CTE uses the ROW_NUMBER() function to find the duplicate rows specified by values in the Project and FundSrc columns.
Then, the DELETE statement deletes all the duplicate rows but keeps one occurrence of each duplicate group.
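The CTE and DELETE described above are not shown in this answer. As a minimal sketch of the idea, assuming SQLite driven through Python's sqlite3 module (SQLite 3.25 or later, for window functions) and a hypothetical table named proj, the following numbers the rows inside each (project, fundsrc) group and deletes every row numbered above 1. The rowid column is SQLite specific; other databases would need their own row identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "proj" is an assumed table name that mirrors the sample data in the question.
conn.execute("CREATE TABLE proj (project INTEGER, fundsrc TEXT)")
conn.executemany(
    "INSERT INTO proj VALUES (?, ?)",
    [(1001, "ABC"), (1001, "XYZ"), (1001, "TYS"),
     (1002, "XYZ"), (1002, "TYS"),
     (1003, "ABC"), (1003, "TYS"), (1003, "TYS")],
)

# Number the rows within each (project, fundsrc) group.
# Any row with rn > 1 is an exact duplicate and gets deleted.
conn.execute("""
    DELETE FROM proj
    WHERE rowid IN (
        SELECT rid
        FROM (SELECT rowid AS rid,
                     ROW_NUMBER() OVER (PARTITION BY project, fundsrc
                                        ORDER BY rowid) AS rn
              FROM proj)
        WHERE rn > 1
    )
""")

remaining = conn.execute(
    "SELECT project, fundsrc FROM proj ORDER BY rowid"
).fetchall()
print(remaining)
```

Note that this only removes exact duplicate rows (the second 1003/TYS row here); it does not reduce each project to a single FundSrc, which is what the question asks for.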

SQL tables represent unordered sets. There is no "topmost" value unless a column specifies what "topmost" means. Your data doesn't have such a column.
If it did, then you would have different options. One simple way uses row_number():
select p.*
from (select p.*,
row_number() over (partition by project order by <ordering col> desc) as seqnum
from proj p
) p
where seqnum = 1;

You need an ordering column to avoid an arbitrary result:
with source (key, value, o) as (values
(1001, 'ABC', 1),
(1001, 'XYZ', 2),
(1001, 'TYS', 3),
(1002, 'XYZ', 4),
(1002, 'TYS', 5),
(1003, 'ABC', 6),
(1003, 'TYS', 7),
(1003, 'TYS', 8)
)
select distinct key, first_value (value) over (partition by key order by o) from source
;
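To make the point about the ordering column concrete, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later). The seq column is an assumption added for illustration; the original Proj table has no such column, which is exactly why "topmost" is undefined there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column that records the intended row order.
conn.execute("CREATE TABLE proj (project INTEGER, fundsrc TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO proj VALUES (?, ?, ?)",
    [(1001, "ABC", 1), (1001, "XYZ", 2), (1001, "TYS", 3),
     (1002, "XYZ", 4), (1002, "TYS", 5),
     (1003, "ABC", 6), (1003, "TYS", 7), (1003, "TYS", 8)],
)

# Keep the row with the smallest seq in each project.
top_rows = conn.execute("""
    SELECT project, fundsrc
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY project ORDER BY seq) AS seqnum
          FROM proj p)
    WHERE seqnum = 1
    ORDER BY project
""").fetchall()
print(top_rows)
```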

Related

MySQL: Select penult values

There are 2 tables:
table1:
id |phone| order|
---|-----|------|
1 | 122 | 6 |
2 | 122 | 4 |
3 | 122 | 3 |
4 | 123 | 6 |
5 | 123 | 5 |
6 | 123 | 3 |
7 | 124 | 6 |
8 | 124 | 5 |
9 | 125 | 6 |
10| 125 | 5 |
table2:
|phone |
|------|
|122 |
|123 |
|124 |
I have to select the phone and its last order according to the following conditions:
If the order of the row with the max id for this phone is not 3, take that row
If that order is 3, take the row with the second-highest id for this phone
The phone is in table2.
So result should be:
|phone | order|
|------ |------|
|122 | 4 |
|123 | 5 |
|124 | 5 |
MySQL version: Ver 15.1 Distrib 5.5.64-MariaDB

Basically you want to look at the last two records; if the last record has order 3, then use the previous one.
That would have been a simple query with window functions and/or lateral joins, but your old MySQL version does not support these features. User variables are an option, as demonstrated by nbk, but they are tricky to use, and MySQL 8.0 announced that this feature will be deprecated in a future version.
I am going to recommend correlated subqueries and a little logic:
select t2.id,
coalesce(
nullif((select ord from table1 t1 where t1.id = t2.id order by ordering_id desc limit 1), 3),
(select ord from table1 t1 where t1.id = t2.id order by ordering_id desc limit 1, 1)
) as ord
from table2 t2
The first subquery gets the latest value; nullif() checks the returned value and returns null if it is 3; this tells coalesce() to return the result of the second subquery, which gets the previous value.
order is a language keyword, so I used ord instead.
Demo in MySQL 5.5:
id | ord
--: | --:
122 | 4
123 | 5
124 | 5
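As a rough, runnable sketch of the same coalesce()/nullif() idea, the snippet below uses Python's sqlite3 module with the column layout from the question (id, phone, and ord in place of the reserved word order). This is SQLite rather than MySQL 5.5, so treat it as an illustration of the logic and not of MySQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, phone INTEGER, ord INTEGER)")
conn.execute("CREATE TABLE table2 (phone INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 122, 6), (2, 122, 4), (3, 122, 3),
     (4, 123, 6), (5, 123, 5), (6, 123, 3),
     (7, 124, 6), (8, 124, 5),
     (9, 125, 6), (10, 125, 5)],
)
conn.executemany("INSERT INTO table2 VALUES (?)", [(122,), (123,), (124,)])

# First subquery: ord of the latest row (max id) for the phone.
# NULLIF turns it into NULL when it equals 3, so COALESCE falls back
# to the second subquery, which reads the row just before the latest one.
result = conn.execute("""
    SELECT t2.phone,
           COALESCE(
               NULLIF((SELECT t1.ord FROM table1 t1
                       WHERE t1.phone = t2.phone
                       ORDER BY t1.id DESC LIMIT 1), 3),
               (SELECT t1.ord FROM table1 t1
                WHERE t1.phone = t2.phone
                ORDER BY t1.id DESC LIMIT 1 OFFSET 1)
           ) AS ord
    FROM table2 t2
    ORDER BY t2.phone
""").fetchall()
print(result)
```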

Your MariaDB version is a little old, so this uses user variables.
The query assigns a row number per id, sorted by the order column in descending order, and selects only the second row.
The LIMIT in the subquery is needed because MariaDB follows the standard and would otherwise not sort the subselect.
CREATE TABLE Table1
(`id` int, `order` int)
;
INSERT INTO Table1
(`id`, `order`)
VALUES
(122, 6),
(122, 4),
(122, 3),
(123, 6),
(123, 5),
(123, 3),
(124, 6),
(124, 5),
(125, 6),
(125, 5)
;
CREATE TABLE Table2
(`id` int)
;
INSERT INTO Table2
(`id`)
VALUES
(122),
(123),
(124)
;
SELECT id,`order`
FROM (SELECT
t1.`order`
, IF ( @id = t1.id ,@rn := @rn +1, @rn:= 1) AS rownum
, @id := t1.`id` as id
FROM Table1 t1 INNER JOIN Table2 t2 ON t1.id = t2.id,(SELECT @id := 0,@rn := 0) t3
ORDER BY t1.id,t1.`order` DESC LIMIT 18446744073709551615) t4
WHERE rownum = 2;
id | order
--: | ----:
122 | 4
123 | 5
124 | 5

Classify records based on matching table

I have two tables: ITEMS and MATCHING_ITEMS, as below:
ITEMS:
|---------------------|------------------|
| ID | Name |
|---------------------|------------------|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
| 6 | F |
| 7 | G |
|---------------------|------------------|
MATCHING_ITEMS:
|---------------------|------------------|
| ID_1 | ID_2 |
|---------------------|------------------|
| 1 | 2 |
| 1 | 3 |
| 2 | 3 |
| 4 | 5 |
| 4 | 6 |
| 5 | 6 |
|---------------------|------------------|
The MATCHING_ITEMS table defines items that match each other, and thus belong to the same group, i.e. items 1, 2, and 3 match with each other and thus belong in a group, and the same for items 4, 5, and 6. Item 7 does not have a match and so does not belong to any group.
I now need to add a 'Group' column on the ITEMS table which contains a unique integer for each group, so it would look as follows:
ITEMS:
|---------------------|------------------|------------------|
| ID | Name | Group |
|---------------------|------------------|------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 3 | C | 1 |
| 4 | D | 2 |
| 5 | E | 2 |
| 6 | F | 2 |
| 7 | G | NULL |
|---------------------|------------------|------------------|
So far I have been using a stored procedure to do this, looping over each line in the MATCHING_ITEMS table and updating the ITEMS table with a group value. The problem is that I eventually need to do this for a table containing millions of records, and the looping method is far too slow.
Is there a way that I can achieve this without using a loop?

If you have all pairs of matches in the matching table, then you can just use the minimum id to assign the group. For this:
select i.*,
(case when grp_id is not null
then dense_rank() over (order by grp_id)
end) as grouping
from items i left join
(select mi.id_1, least(mi.id_1, min(mi.id_2)) as grp_id
from matching_items mi
group by mi.id_1
) mi
on i.id = mi.id_1;
Note: This works only if all pairs are in the matching items table in both directions, e.g. both (1, 2) and (2, 1); the sample data lists each pair only once. Otherwise, you will need a recursive/hierarchical query to get all the pairs.
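A sketch of the minimum-id idea, written for SQLite through Python's sqlite3 module (SQLite 3.25 or later). It is not the query above verbatim: the sym CTE is an added assumption that mirrors each pair in both directions and pairs every item with itself, so that the smallest id in each group can be picked with a plain MIN(). It still assumes every group is listed as complete pairs, as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE matching_items (id_1 INTEGER, id_2 INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"), (6, "F"), (7, "G")],
)
conn.executemany(
    "INSERT INTO matching_items VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)],
)

grouped = conn.execute("""
    WITH sym AS (
        -- every pair in both directions, plus each item paired with itself
        SELECT id_1 AS a, id_2 AS b FROM matching_items
        UNION SELECT id_2, id_1 FROM matching_items
        UNION SELECT id_1, id_1 FROM matching_items
        UNION SELECT id_2, id_2 FROM matching_items
    ),
    grp AS (
        -- the smallest id an item is matched with identifies its group
        SELECT a AS id, MIN(b) AS grp_id FROM sym GROUP BY a
    ),
    ranked AS (
        -- turn the group identifiers into consecutive integers 1, 2, ...
        SELECT id, DENSE_RANK() OVER (ORDER BY grp_id) AS grp FROM grp
    )
    SELECT i.id, i.name, r.grp
    FROM items i LEFT JOIN ranked r ON r.id = i.id
    ORDER BY i.id
""").fetchall()
print(grouped)
```

The ranking is done before the LEFT JOIN so that unmatched items (item 7 here) keep a NULL group and do not take part in the numbering.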

You could use min and max at first, then dense_rank to assign group numbers:
select id, name, dense_rank() over (order by mn, mx) grp
from (
select distinct id, name,
min(id_1) over (partition by name) mn,
max(id_2) over (partition by name) mx
from items left join matching_items on id in (id_1, id_2))
order by id

The pairs 2,3 and 5,6 in the Matching_items table seem redundant as they could be derived (if I am reading your question right)
Here is how I did it. I just reused id_1 from your example as the group no:
create table
items (
ID number,
name varchar2 (2)
);
insert into items values (1, 'A');
insert into items values (2, 'B');
insert into items values (3, 'C');
insert into items values (4, 'D');
insert into items values (5, 'E');
insert into items values (6, 'F');
insert into items values (7, 'G');
create table
matching_items (
ID number,
ID_2 number
);
insert into matching_items values (1, 2);
insert into matching_items values (1, 3);
insert into matching_items values (2, 3);
insert into matching_items values (4, 5);
insert into matching_items values (4, 6);
insert into matching_items values (5, 6);
with new_grp as
(
select id, id_2, id as group_no
from matching_items
where id in (select id from items)
and id not in (select id_2 from matching_items)),
assign_grp as
(
select id, group_no
from new_grp
union
select id_2, group_no
from new_grp)
select items.id, name, group_no
from items left outer join assign_grp
on items.id = assign_grp.id;

Using Limit on Distinct group by values psql

Suppose I have a table that looks like this:
create table customers (id text, name text, number int, useless text);
With values
insert into customers (id, name, number, useless)
values
('1','apple',1, 'a'),
('2','banana',3, 'b'),
('3','pear',2, 's'),
('4','apple',1,'e'),
('5','banana',3,'s'),
('6','cherry',3, 'a'),
('7','cherry',4, 's'),
('8','apple',2, 'd'),
('9','banana',4, 'c'),
('10','pear',5, 'e');
My failed psql query is this.
select id, name, number, useless
from customers
where number < 4
group by customers.name limit 2
I want the query to return the rows for the first 2 distinct values of customers.name, not the first 2 rows.
In the end I want it to return
('1','apple',1, 'a'),
('4','apple',1,'e'),
('8','apple',2, 'd'),
('2','banana',3, 'b'),
('5','banana',3,'s'),
so it returns the first 2 grouped names.
How can I make this query?
Thank you.
Edit:
This query is my second try; I think I am fairly close.
select t.id, t.name, t.ranking
from (
SELECT id, name, dense_rank() OVER (order by name) as
ranking
FROM customers
group by name
) t
where t.ranking < 3

try this:
select id, name, number, useless
from customers
where name in (
select name
from customers
where number < 4
group by customers.name
order by name limit 2
)
| id | name | number | useless |
|----|--------|--------|---------|
| 1 | apple | 1 | a |
| 2 | banana | 3 | b |
| 4 | apple | 1 | e |
| 5 | banana | 3 | s |
| 8 | apple | 2 | d |
| 9 | banana | 4 | c |
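The result above still contains row 9 (banana, 4) because the number < 4 filter is applied only inside the subquery. A hedged variant, sketched here with Python's sqlite3 module, repeats the filter in the outer query so that the output matches the rows listed in the question. Note that LIMIT inside an IN subquery is accepted by SQLite and PostgreSQL but not by every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id TEXT, name TEXT, number INTEGER, useless TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [("1", "apple", 1, "a"), ("2", "banana", 3, "b"), ("3", "pear", 2, "s"),
     ("4", "apple", 1, "e"), ("5", "banana", 3, "s"), ("6", "cherry", 3, "a"),
     ("7", "cherry", 4, "s"), ("8", "apple", 2, "d"), ("9", "banana", 4, "c"),
     ("10", "pear", 5, "e")],
)

# Inner query: the first two names (alphabetically) that have a row with number < 4.
# Outer query: all rows for those names, with the same filter applied again.
first_two_groups = conn.execute("""
    SELECT id, name, number, useless
    FROM customers
    WHERE number < 4
      AND name IN (
          SELECT name
          FROM customers
          WHERE number < 4
          GROUP BY name
          ORDER BY name
          LIMIT 2
      )
    ORDER BY name, id
""").fetchall()
print(first_two_groups)
```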

The GROUP BY customers.name clause does not order your output; it only groups the rows by customers.name. What you want is to order the groups, right? So I think what you want is:
select id, name, number, useless
from customers
group by name
order by name [asc | desc]
Use asc for ascending or desc for descending, depending on the order you want.
Hope it helps you.

You can use dense_rank() as:
SELECT * FROM (
SELECT DENSE_RANK() OVER (order by name) AS rank, temp.*
FROM customers temp WHERE number < 4) data
WHERE data.rank <= 2
| rank| id| name | number | useless |
|-----|---|--------|--------|---------|
| 1 | 4 | apple | 1 | e |
| 1 | 1 | apple | 1 | a |
| 1 | 8 | apple | 2 | d |
| 2 | 5 | banana | 3 | s |
| 2 | 2 | banana | 3 | b |

Select distinct one field other first non empty or null

I have table
| Id | val |
| --- | ---- |
| 1 | null |
| 1 | qwe1 |
| 1 | qwe2 |
| 2 | null |
| 2 | qwe4 |
| 3 | qwe5 |
| 4 | qew6 |
| 4 | qwe7 |
| 5 | null |
| 5 | null |
Is there an easy way to select distinct 'id' values with the first non-null 'val' value for each, or null if none exists? For example, the result should be:
| Id | val |
| --- | ---- |
| 1 | qwe1 |
| 2 | qwe4 |
| 3 | qwe5 |
| 4 | qew6 |
| 5 | null |

In your case a simple GROUP BY should be the solution:
SELECT Id
,MIN(val)
FROM dbo.mytable
GROUP BY Id
Whenever you use GROUP BY, you have to apply an aggregate function to every selected column that is not listed in the GROUP BY.
If an Id has a value (val) other than NULL, the smallest such value will be returned.
If there are only NULLs for the Id, NULL will be returned.
As far as I understood (regarding your comment), this is exactly what you are trying to achieve.
If you always want "the first" non-NULL value, you'll need another sort criterion (like a timestamp column) and might be able to solve it with a window function.

If you want the first non-NULL value (where "first" is based on id), then MIN() doesn't quite do it. Window functions do:
select t.*
from (select t.*,
row_number() over (partition by id
order by (case when val is not null then 1 else 2 end),
id
) as seqnum
from t
) t
where seqnum = 1;
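A runnable sketch of this window-function approach, using Python's sqlite3 module (SQLite 3.25 or later). Since "first" needs something to order by, the example assumes an extra column pid that records row order; that column is not part of the table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# pid is an assumed ordering column; the table in the question has only id and val.
conn.execute("CREATE TABLE tab1 (pid INTEGER, id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(1, 1, None), (2, 1, "qwe1"), (3, 1, "qwe2"),
     (4, 2, None), (5, 2, "qwe4"),
     (6, 3, "qwe5"),
     (7, 4, "qew6"), (8, 4, "qwe7"),
     (9, 5, None), (10, 5, None)],
)

# Within each id, sort non-NULL values first, then by pid, and keep the first row.
first_non_null = conn.execute("""
    SELECT id, val
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY id
                     ORDER BY CASE WHEN val IS NOT NULL THEN 1 ELSE 2 END, pid
                 ) AS seqnum
          FROM tab1 t)
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()
print(first_non_null)
```

Unlike MIN(val), this returns the earliest non-NULL value by pid, not the alphabetically smallest one; on this sample data the two happen to agree.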

Create the table (from SQL Fiddle):
CREATE TABLE tab1(pid integer, id integer, val varchar(25));
Insert dummy records:
insert into tab1
values (1, 1 , null),
(2, 1 , 'qwe1' ),
(3, 1 , 'qwe2'),
(4, 2 , null ),
(5, 2 , 'qwe4' ),
(6, 3 , 'qwe5' ),
(7, 4 , 'qew6' ),
(8, 4 , 'qwe7' ),
(9, 5 , null ),
(10, 5 , null );
Run the query below:
SELECT Id ,MIN(val) as val FROM tab1 GROUP BY Id;

Count Number of Consecutive Occurrence of values in Table

I have the table below:
create table #t (Id int, Name char)
insert into #t values
(1, 'A'),
(2, 'A'),
(3, 'B'),
(4, 'B'),
(5, 'B'),
(6, 'B'),
(7, 'C'),
(8, 'B'),
(9, 'B')
I want to count consecutive occurrences of each value in the Name column:
+------+------------+
| Name | Repetition |
+------+------------+
| A | 2 |
| B | 4 |
| C | 1 |
| B | 2 |
+------+------------+
The best thing I tried is:
select Name
, COUNT(*) over (partition by Name order by Id) AS Repetition
from #t
order by Id
but it doesn't give me the expected result.

One approach is the difference of row numbers:
select name, count(*)
from (select t.*,
(row_number() over (order by id) -
row_number() over (partition by name order by id)
) as grp
from t
) t
group by grp, name;
The logic is easiest to understand if you run the subquery and look at the values of each row number separately and then look at the difference.
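To see the difference-of-row-numbers trick in action, the sketch below runs it in SQLite through Python's sqlite3 module (SQLite 3.25 or later). The table is named t here, as in the query above, and the ORDER BY MIN(id) is an addition so that the runs come back in their original order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "B"),
     (6, "B"), (7, "C"), (8, "B"), (9, "B")],
)

# Inner query, exposed so the two row numbers can be inspected side by side.
row_numbers = conn.execute("""
    SELECT id, name,
           ROW_NUMBER() OVER (ORDER BY id) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn_name
    FROM t
    ORDER BY id
""").fetchall()

# rn_all - rn_name stays constant within a run of equal names,
# so (grp, name) identifies each consecutive run.
runs = conn.execute("""
    SELECT name, COUNT(*) AS repetition
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (ORDER BY id)
                 - ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS grp
          FROM t)
    GROUP BY grp, name
    ORDER BY MIN(id)
""").fetchall()
print(runs)
```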

You could use windowed functions like LAG and running total:
WITH cte AS (
SELECT Id, Name, grp = SUM(CASE WHEN Name = prev THEN 0 ELSE 1 END) OVER(ORDER BY id)
FROM (SELECT *, prev = LAG(Name) OVER(ORDER BY id) FROM t) s
)
SELECT name, cnt = COUNT(*)
FROM cte
GROUP BY grp,name
ORDER BY grp;
The first cte returns group number:
+-----+-------+-----+
| Id | Name | grp |
+-----+-------+-----+
| 1 | A | 1 |
| 2 | A | 1 |
| 3 | B | 2 |
| 4 | B | 2 |
| 5 | B | 2 |
| 6 | B | 2 |
| 7 | C | 3 |
| 8 | B | 4 |
| 9 | B | 4 |
+-----+-------+-----+
And main query groups it based on grp column calculated earlier:
+-------+-----+
| name | cnt |
+-------+-----+
| A | 2 |
| B | 4 |
| C | 1 |
| B | 2 |
+-------+-----+
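The same LAG plus running-total logic can be tried outside SQL Server. The sketch below is an adaptation for SQLite via Python's sqlite3 module (SQLite 3.25 or later); it uses standard AS aliases instead of the T-SQL "alias = expression" form, and the CTE names flagged and grouped are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "B"),
     (6, "B"), (7, "C"), (8, "B"), (9, "B")],
)

lag_runs = conn.execute("""
    WITH flagged AS (
        -- 1 marks the first row of a new run, 0 a continuation;
        -- for the very first row LAG() is NULL, so the ELSE branch applies
        SELECT id, name,
               CASE WHEN name = LAG(name) OVER (ORDER BY id)
                    THEN 0 ELSE 1 END AS is_start
        FROM t
    ),
    grouped AS (
        -- the running total of the start flags numbers the runs 1, 2, 3, ...
        SELECT id, name, SUM(is_start) OVER (ORDER BY id) AS grp
        FROM flagged
    )
    SELECT name, COUNT(*) AS cnt
    FROM grouped
    GROUP BY grp, name
    ORDER BY grp
""").fetchall()
print(lag_runs)
```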

I have used a recursive CTE to minimise the use of row_number and to avoid count(*).
I think it will perform better, but in the real world it depends on what other filters you apply to minimise the number of rows affected.
If Id has non-contiguous values, one extra CTE is used to generate a continuous id.
;With CTE2 as
(
select ROW_NUMBER()over(order by id) id, name,1 Repetition ,1 Marker from #t
)
, CTE as
(
select top 1 cast(id as int) id, name,1 Repetition ,1 Marker from CTE2 order by id
union all
select a.id, a.name
, case when a.name=c.name then Repetition +1 else 1 end
, case when a.name=c.name then c.Marker else Marker+1 end
from #t a
inner join CTE c on a.id=c.id+1
)
,CTE1 as
(select *,ROW_NUMBER()over(partition by marker order by id desc)rn from cte c
)
select Name,Repetition from cte1 where rn=1
