How can I make a DISTINCT on one field while selecting multiple fields - SQL

I have some duplicate mail addresses in my database, but I can't remove them.
I want to select several fields, but without duplicate mail addresses.
I have a query like this:
SELECT
DISTINCT MAIL,
ID,
CIVILITE,
PRENOM,
NAME
FROM CONTACT WHERE CODE_PAYS = 'DE'
When I run this query, the duplicate mail values are still there.
Do you know how I can do that?
Update: I have tried this approach, but I need to use it in a view:
ALTER VIEW ALL_VW_CONTACT_DE WITH SCHEMABINDING
AS
with cte as
(
select rn = row_number() over (partition by c.Mail Order By c.Id asc), c.Mail, c.Id, c.Civilite, c.Prenom, c.Name
from dbo.CONTACT c
where code_pays = 'DE'
)
select Mail, Id, Civilite, Prenom, Name
from cte
where rn = 1
But this doesn't work; I get this error:
Cannot schema bind view 'MY_TABLE' because name 'CONTACT' is invalid
for schema binding. Name must be in two-part format and an object
cannot reference itself

When I launch this request, my duplicate values on mail are already here.
The reason is that DISTINCT doesn't work the way you think. It doesn't look only at the first column after the DISTINCT keyword; it compares all columns in the select list. A row is treated as a duplicate only if every one of those columns is equal.
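To see the effect on a small scale, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for the real database. The lowercase contact table and its three sample rows are made up for illustration; only the DISTINCT behaviour is the point.

```python
import sqlite3

# Hypothetical sample data: two rows share the same mail and name but differ in id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, mail TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO contact (id, mail, name) VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "Anna"),
        (2, "a@example.com", "Anna"),
        (3, "b@example.com", "Bert"),
    ],
)

# DISTINCT compares the whole (mail, id, name) tuple, so no row collapses.
with_id = conn.execute("SELECT DISTINCT mail, id, name FROM contact").fetchall()

# Without the unique id, the two identical (mail, name) pairs collapse into one.
without_id = conn.execute("SELECT DISTINCT mail, name FROM contact").fetchall()

print(len(with_id), len(without_id))
```

With the id column in the list every row is unique, so nothing is removed; dropping it is what lets the duplicates collapse.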
One easy way is using ROW_NUMBER:
with cte as
(
select rn = row_number() over (partition by c.Mail Order By c.Id asc), c.*
from dbo.Contact c
where Code_Pays = 'DE'
)
select Mail, Id, Civilite, Prenom, Name
from cte
where rn = 1
Change the ORDER BY if you want to keep a different record; here I take the one with the lowest ID.
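As a rough illustration of how the ORDER BY inside the window decides which row survives, here is a sketch using Python's sqlite3 module instead of SQL Server. It assumes SQLite 3.25 or later (for window functions), writes the alias as AS rn because the T-SQL form rn = ... is not portable, and uses an invented table layout and sample rows. The helper one_row_per_mail is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, mail TEXT, prenom TEXT, code_pays TEXT)"
)
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?, ?)",
    [
        (1, "a@example.com", "Anna", "DE"),
        (2, "a@example.com", "Anja", "DE"),
        (3, "b@example.com", "Bert", "DE"),
        (4, "c@example.com", "Cleo", "FR"),
    ],
)


def one_row_per_mail(order_direction):
    """Return one row per mail; order_direction picks the lowest or highest id."""
    # The direction is interpolated into the SQL text, so only accept known values.
    if order_direction not in ("ASC", "DESC"):
        raise ValueError("order_direction must be 'ASC' or 'DESC'")
    sql = f"""
        WITH cte AS (
            SELECT ROW_NUMBER() OVER (PARTITION BY mail ORDER BY id {order_direction}) AS rn,
                   id, mail, prenom
            FROM contact
            WHERE code_pays = 'DE'
        )
        SELECT mail, id, prenom FROM cte WHERE rn = 1 ORDER BY mail
    """
    return conn.execute(sql).fetchall()


lowest_id_rows = one_row_per_mail("ASC")    # keeps id 1 for a@example.com
highest_id_rows = one_row_per_mail("DESC")  # keeps id 2 for a@example.com
```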

You can use row_number as below:
Select top (1) with ties * from Contact
where CODE_PAYS = 'DE'
order by row_number() over(partition by mail order by id)

When you use DISTINCT with several fields, you get only the unique combinations of those fields.
In this case, you should exclude every field that differs from row to row (possibly ID) from the query:
SELECT
DISTINCT MAIL,
CIVILITE,
PRENOM,
NAME
FROM CONTACT WHERE CODE_PAYS = 'DE'

The problem here is probably the ID field. Since it should be unique for each row, rows that share a mail are never identical across all the selected fields. Remove it from the query and you should be fine.
When you run a DISTINCT query, the trick is to look at the results and find which columns return different values; those are what differentiate the rows. If you add the results to your question, we can help you further.

Related

SQL Question regarding fields associated with the MAX([field]) only

I'm trying to gather the entire row information associated with the MAX() of a particular field.
I essentially have several [Flight_Leg] for a unique [Shipment_ID], and each one has unique [Destination_Airport], [Departure_Time], and [Arrival_Time]. Obviously, each [Shipment_ID] can have multiple [Flight_Leg], and each [Flight_Leg] has a unique row of information.
SELECT
[Shipment_ID],
MAX([Flight_Leg]) AS "Final Leg",
[Arrival_Time],
[Destination_Airport]
FROM
[Flight_Info]
Group By
[Shipment_ID],
[Arrival_Time]
The output is multiple lines, rather than having one unique line for [Shipment_ID]. I'm just trying to isolate the FINAL flight info for a shipment.
Most databases support window functions. Here's one option using row_number():
select *
from (
select *, row_number() over (partition by shipment_id order by flight_leg desc) rn
from flight_info
) t
where rn = 1
Alternatively here's a more generic approach joining back to itself:
select fi.*
from flight_info fi
join (select shipment_id, max(flight_leg) max_flight_leg
from flight_info
group by shipment_id) t on fi.shipment_id = t.shipment_id
and fi.flight_leg = t.max_flight_leg
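For anyone who wants to try the join-back variant without a full database, here is a small sketch using Python's sqlite3 module. The column set is trimmed and the sample shipments are invented; the query itself follows the shape shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_info (shipment_id INTEGER, flight_leg INTEGER, destination_airport TEXT)"
)
# Hypothetical data: shipment 100 has three legs, shipment 200 has one.
conn.executemany(
    "INSERT INTO flight_info VALUES (?, ?, ?)",
    [(100, 1, "FRA"), (100, 2, "JFK"), (100, 3, "ORD"), (200, 1, "CDG")],
)

# The derived table finds the highest leg per shipment; joining back on both
# shipment_id and flight_leg recovers the full row for that leg.
final_legs = conn.execute(
    """
    SELECT fi.shipment_id, fi.flight_leg, fi.destination_airport
    FROM flight_info fi
    JOIN (SELECT shipment_id, MAX(flight_leg) AS max_flight_leg
          FROM flight_info
          GROUP BY shipment_id) t
      ON fi.shipment_id = t.shipment_id
     AND fi.flight_leg = t.max_flight_leg
    ORDER BY fi.shipment_id
    """
).fetchall()

print(final_legs)
```

Each shipment comes back exactly once, with the destination of its last leg, because the aggregate runs in the derived table and not in the outer SELECT.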

SQL server 2016 Getting distinct results when only limited to a where statement

I am looking for a distinct list of the CUSTOMER_NAME field from my table. Normally I would simply do
SELECT
distinct
[CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
or
SELECT
[CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
Group by [CUSTOMER_NAME]
But I am limited in my query. Due to software restrictions, the query can only be of the form
SELECT * from
[iData3].[dbo].[N241650]
where ...
How do I get a distinct list of customer names given these restrictions? I essentially need to cram everything into the WHERE clause. I'm thinking possibly WHERE EXISTS or NOT EXISTS but I haven't used those conditions before so I'm not certain if they'd be useful.
"This is not possible because..." is an acceptable, if disappointing, answer.
You can use the row_number() function:
SELECT TOP (1) WITH TIES [CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
ORDER BY ROW_NUMBER() OVER (PARTITION BY CUSTOMER_NAME ORDER BY ?)
Here ? stands for an identity or primary/unique column in your table.
You can group by that column to achieve the same result.
select CUSTOMER_NAME
from ...
group by CUSTOMER_NAME
order by CUSTOMER_NAME;
Another alternative is to use a stored procedure.
If you can't escape from the *, then you can't GROUP BY, and if you only have a WHERE, you will need a key (a unique set of columns) to filter correctly; otherwise you can't tell duplicates apart (and you end up selecting more than one row with the same customer name).
It's a bit convoluted, but try this. It will get you one row per CUSTOMER_NAME.
SELECT
*
from
[iData3].[dbo].[N241650]
where
[N241650].KeyColumn IN
(
SELECT
Z.KeyColumn
FROM
(
SELECT
X.KeyColumn,
Ranking = ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME ORDER BY X.KeyColumn ASC)
FROM
[iData3].[dbo].[N241650] AS X
WHERE
X.KeyColumn IS NOT NULL
) AS Z
WHERE
Z.Ranking = 1
)
The ORDER BY inside the OVER will determine which row you get for each CUSTOMER_NAME.
If your key consists of multiple columns, you will have to replace the IN with an EXISTS that compares every key column (SQL Server does not support a multi-column IN).
SELECT
*
from
[iData3].[dbo].[N241650]
where
EXISTS (
SELECT
'key columns match'
FROM (
SELECT
X.KeyColumn1,
X.KeyColumn2,
Ranking = ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME ORDER BY X.KeyColumn1 ASC)
FROM
[iData3].[dbo].[N241650] AS X
) AS Z
WHERE
Z.Ranking = 1 AND
[N241650].KeyColumn1 = Z.KeyColumn1 AND
[N241650].KeyColumn2 = Z.KeyColumn2
)
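The single-key version of this idea can be tried out with the sketch below, which keeps the query in the restricted SELECT * ... WHERE ... shape and pushes all the logic into the WHERE clause. It uses Python's sqlite3 module (SQLite 3.25 or later for ROW_NUMBER) in place of SQL Server, and the customers table, its key_column, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (key_column INTEGER PRIMARY KEY, customer_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Acme", "Berlin"),
        (2, "Acme", "Munich"),
        (3, "Globex", "Hamburg"),
        (4, "Acme", "Bonn"),
        (5, "Globex", "Kiel"),
    ],
)

# Only this fragment would be typed into the tool that forces "SELECT * FROM ... WHERE ...".
where_clause = """
    key_column IN (
        SELECT z.key_column
        FROM (
            SELECT x.key_column,
                   ROW_NUMBER() OVER (PARTITION BY x.customer_name
                                      ORDER BY x.key_column ASC) AS ranking
            FROM customers AS x
        ) AS z
        WHERE z.ranking = 1
    )
"""

rows = conn.execute("SELECT * FROM customers WHERE " + where_clause).fetchall()
one_per_customer = sorted(rows)  # sorted in Python only to make the output stable
print(one_per_customer)
```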
You need something unique in each row. If you have that, you can use:
SELECT CUSTOMER_NAME
FROM [iData3].[dbo].[N241650]
WHERE pk = (SELECT MIN(n2.pk)
FROM [iData3].[dbo].[N241650] n2
WHERE n2.CUSTOMER_NAME = N241650.CUSTOMER_NAME
);
pk is the unique column.
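Here is a small sketch of the correlated-subquery approach, again using Python's sqlite3 module as a stand-in for SQL Server. The customers table, the pk column name, and the rows are invented for the example; no window functions are needed for this variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (pk INTEGER PRIMARY KEY, customer_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Acme"), (3, "Globex"), (4, "Acme"), (5, "Globex")],
)

# For every outer row, the subquery finds the smallest pk that carries the same
# customer name; only the row that owns that pk passes the filter.
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT customer_name
        FROM customers
        WHERE pk = (SELECT MIN(n2.pk)
                    FROM customers AS n2
                    WHERE n2.customer_name = customers.customer_name)
        """
    )
]
distinct_names = sorted(names)
print(distinct_names)
```

The correlation has to compare the name column on both sides; comparing against anything else would either fail or silently match nothing.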

GROUP_CONCAT multiple fields in Vertica

How can I do something like:
SELECT ID, Store,
GROUP_CONCAT(keyword::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS keywords,
GROUP_CONCAT(url::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS urls
FROM table_name
I get the following error when I run the above query:
cannot specify more than one user-defined transform function in the SELECT list
I tried MySQL GROUP_CONCAT multiple fields but that seems like a MySQL thing. I also believe GROUP_CONCAT is no longer supported in Vertica 7.1.x, so if there is a better way to do this, I am open to that.
As the error states, you can only have one UDTF in a single SELECT statement. To get around this, you can split the query into two subqueries and join them together.
SELECT x.ID, x.Store, x.keywords, y.urls
FROM (
SELECT
ID,
Store,
GROUP_CONCAT(keyword::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS keywords
FROM table_name
) x
JOIN (
SELECT
ID,
Store,
GROUP_CONCAT(url::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS urls
FROM table_name
) y
ON x.ID = y.ID
AND x.Store = y.Store
;
This will evaluate each query with its own GROUP_CONCAT function separately and then join them together.
You can find GROUP_CONCAT in the Vertica GitHub strings package. You should be able to just run make and make install if your vsql path is set up right.
Another alternative would be to use agg_concatenate which is included in the examples directory. You'd have to finagle the sql a little to get the ordering in the concatenation correct, though. You can see examples of how to do this in this stackoverflow answer.
You will need to handle the transformation of keyword and url in a separate CTE and pass those through to the GROUP_CONCAT calls:
With cte_table_name AS (
SELECT
ID
,Store
,keyword::VARCHAR AS keyword
,url::VARCHAR AS url
FROM table_name
)
SELECT
t.ID
,t.Store
,GROUP_CONCAT(c.keyword) OVER (PARTITION BY t.ID, t.Store ORDER BY t.num ASC) AS keywords
,GROUP_CONCAT(c.url) OVER (PARTITION BY t.ID, t.Store ORDER BY t.num ASC) AS urls
FROM
table_name t
JOIN
cte_table_name c
ON c.ID = t.ID
AND c.Store = t.Store

How to avoid order by in group by query result [duplicate]

I am trying to display the records in the same order as in the WHERE clause.
Example:
select name from table where name in ('Yaksha','Arun','Naveen');
It displays Arun, Naveen, Yaksha (alphabetical order).
I want to display them in the same order, i.e. 'Yaksha','Arun','Naveen'.
How can I do this?
I am using Oracle DB.
Add this ORDER BY at the query's end:
order by case name when 'Yaksha' then 1
when 'Arun' then 2
when 'Naveen' then 3
end
(There's no other way to get that order. You need an ORDER BY to get a specific result set order.)
It may be a bit clunky, but you can create a custom ordering with a case expression:
SELECT *
FROM my_table
WHERE name IN ('Yaksha', 'Arun','Naveen')
ORDER BY CASE name WHEN 'Yaksha' THEN 1
WHEN 'Arun' THEN 2
WHEN 'Naveen' THEN 3
END ASC
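The CASE-based ordering is not Oracle-specific, so it can be checked quickly with Python's sqlite3 module. The sketch below assumes a one-column my_table with a few invented names, including one that is not in the IN list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("Arun",), ("Naveen",), ("Yaksha",), ("Zed",)],
)

# The CASE expression maps each wanted name to its position in the IN list,
# and the result set is sorted by that position.
ordered = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
        FROM my_table
        WHERE name IN ('Yaksha', 'Arun', 'Naveen')
        ORDER BY CASE name WHEN 'Yaksha' THEN 1
                           WHEN 'Arun' THEN 2
                           WHEN 'Naveen' THEN 3
                 END ASC
        """
    )
]
print(ordered)
```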
A slightly longer option, but one that avoids repeating the string literals, is to use a subquery:
SELECT m.*
FROM my_table m
JOIN (SELECT 'Yaksha' AS name, 1 AS name_order FROM dual
UNION ALL
SELECT 'Arun' AS name, 2 AS name_order FROM dual
UNION ALL
SELECT 'Naveen' AS name, 3 AS name_order FROM dual) o
ON o.name = m.name
ORDER BY o.name_order ASC
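If the list of names comes from application code, the ordering subquery can be generated from that list. The following is only a sketch, written in Python against sqlite3 (which needs no FROM dual); the helper select_in_list_order and the sample table are hypothetical, and the names are passed as bind parameters so they are never spliced into the SQL text.

```python
import sqlite3


def select_in_list_order(conn, names):
    """Return the names found in my_table, in the order given by `names`."""
    if not names:
        return []
    # One "SELECT ? AS name, ? AS name_order" per wanted name, glued with UNION ALL.
    ordering_rows = " UNION ALL ".join(
        "SELECT ? AS name, ? AS name_order" for _ in names
    )
    # Flatten to [name1, 1, name2, 2, ...] to match the placeholders above.
    params = [value for pos, name in enumerate(names, start=1) for value in (name, pos)]
    sql = (
        "SELECT m.name FROM my_table m "
        f"JOIN ({ordering_rows}) o ON o.name = m.name "
        "ORDER BY o.name_order ASC"
    )
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("Arun",), ("Naveen",), ("Yaksha",), ("Zed",)],
)

in_list_order = select_in_list_order(conn, ["Yaksha", "Arun", "Naveen"])
other_order = select_in_list_order(conn, ["Naveen", "Yaksha"])
```

Because the join is an inner join, it also does the filtering that the IN list did, so each literal appears only once.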
You can try with something like the following:
SELECT *
FROM test
WHERE name IN ( 'Yaksha', 'Arun', 'Naveen' )
ORDER BY instr ( q'['Yaksha', 'Arun', 'Naveen']', name ) ASC
This way could be useful if your IN list is somehow dynamic.
If the list of values is dynamic or you just don't want to repeat the values you could use (or abuse, depending on your point of view) a table collection, and join your real table to a table collection expression instead of using IN:
select your_table.name
from table(sys.odcivarchar2list('Yaksha','Arun','Naveen')) t
join your_table on your_table.name = t.column_value;
This will generally work, but without an ORDER BY clause the order is of course not guaranteed, so you can use an inline view to assign the order:
select your_table.name from (
select row_number() over (order by null) as rn, column_value as name
from table(sys.odcivarchar2list('Yaksha','Arun','Naveen'))
) t
join your_table on your_table.name = t.name
order by t.rn;
This still relies on row_number() over (order by null) using the order of the elements in the collection, which in turn relies on collection unnesting preserving the element order. I don't think that's guaranteed either, so there is still some risk involved.
