Remove duplicate rows in a SQL query [duplicate]

Possible Duplicate:
Duplicate result
Interview - Detect/remove duplicate entries
I have a SQL query that returns a table with one column.
The returned data may contain duplicates. For example, my query may be something like:
SELECT item FROM myTable WHERE something = 3
and the returned data may be something like this:
item
-----
2
1
4
5
1
9
5
My question is: how do I remove duplicate items from my query's result?
That is, I want to get these results:
item
-----
2
1
4
5
9
Please note that I don't want to change or delete any rows in the table. I just want to remove duplicates from that query's result.
How can I do that?

SELECT DISTINCT item FROM myTable WHERE something = 3
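A minimal, runnable sketch of this answer, using Python's built-in sqlite3 module with SQLite standing in for the asker's database. The myTable layout (integer item and something columns) is assumed from the question, and the rows are the sample values listed there.

```python
import sqlite3

def distinct_items(conn, something):
    # DISTINCT drops repeated rows from the result set only;
    # the rows stored in myTable are left untouched.
    rows = conn.execute(
        "SELECT DISTINCT item FROM myTable WHERE something = ?", (something,)
    ).fetchall()
    return [item for (item,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (item INTEGER, something INTEGER)")
conn.executemany(
    "INSERT INTO myTable (item, something) VALUES (?, 3)",
    [(v,) for v in [2, 1, 4, 5, 1, 9, 5]],
)

print(sorted(distinct_items(conn, 3)))  # [1, 2, 4, 5, 9]
```

DISTINCT makes no promise about row order, so add an ORDER BY if the order of the results matters.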

As noted, the DISTINCT keyword eliminates duplicate rows (rows that have identical values in every selected column) from the result set.
However, for a non-trivial query against a properly designed database, the presence of duplicate rows in the result set, and their elimination via SELECT DISTINCT or SELECT ... GROUP BY, is IMHO most often a "code smell": it indicates improper or incorrect join criteria, or a lack of understanding of the cardinalities of the relationships between tables.
If I'm reviewing the code, a SELECT DISTINCT or a gratuitous GROUP BY with no obvious need for it will get the containing query flagged and gone over with a fine-toothed comb.
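To illustrate the cardinality point, here is a small sketch using Python's sqlite3 module; the customers and orders tables are hypothetical. A one-to-many join repeats each row on the "one" side, and asking the actual question with EXISTS removes the duplicates at their source instead of hiding them with DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders (customer_id) VALUES (1), (1), (1), (2);
""")

# One-to-many join: Ann appears once per order, so she shows up three times.
joined = conn.execute(
    "SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()

# "Customers with at least one order" asked directly: one row per customer,
# no DISTINCT needed.
with_orders = conn.execute(
    "SELECT c.name FROM customers c "
    "WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id) "
    "ORDER BY c.name"
).fetchall()

print(len(joined))   # 4
print(with_orders)   # [('Ann',), ('Bob',)]
```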

You need to add the DISTINCT keyword to your query.
This keyword is part of standard SQL and is supported by all major databases.

SELECT DISTINCT item FROM myTable WHERE something = 3
You just have to use DISTINCT.

Related

How can I select different records from a list with duplicate values?

I'm new to the SQL/Oracle world and would like to ask for your help. This is a very simple question that I'm stuck on.
To give you a picture: I have a regular table; let's call it "table1". Its PK is the first column, "c1". Suppose I run the following select:
select (1) from table1 where c1 in ('1','2','3')
This will give me
(1)
1
1
2
1
3
1
However, if I make the following select
select (1) from table1 where c1 in ('1','2','2')
this will give me
(1)
1
1
2
1
My question is: why are there not 3 records in the second case? Can I modify the second query to return 3 records? In other words, how can I prevent the selection from acting like a "distinct" clause?
I know it may be a silly question, so thank you all in advance.
The where clause filters rows generated by the from clause.
Conditions in the where clause only specify whether or not a given row is in the result set. They do not specify how many times a given row is in the result set.
If you want to "multiply" the number of rows, you would need to use a join with a derived table that has duplicate values.
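A sketch of that idea using Python's sqlite3 module, reusing table1 and c1 from the question (the column type and the three rows are assumed). The derived table is built with UNION ALL so its duplicate '2' survives; in Oracle each branch of the derived table would also need FROM dual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (c1 TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("1",), ("2",), ("3",)])

# IN only tests membership, so listing '2' twice cannot add a row.
in_rows = conn.execute(
    "SELECT c1 FROM table1 WHERE c1 IN ('1', '2', '2') ORDER BY c1"
).fetchall()

# Joining to a derived table that keeps the duplicate value returns
# one row per matching value, duplicates included.
join_rows = conn.execute(
    """
    SELECT t.c1
    FROM table1 AS t
    JOIN (SELECT '1' AS v UNION ALL SELECT '2' UNION ALL SELECT '2') AS wanted
      ON t.c1 = wanted.v
    ORDER BY t.c1
    """
).fetchall()

print(in_rows)    # [('1',), ('2',)]
print(join_rows)  # [('1',), ('2',), ('2',)]
```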

SQL for getting data for each category in MariaDB

I need to fetch 4 random values from each category. What is the correct SQL syntax for MariaDB? I have attached an image of the table structure.
Should I write a stored procedure, or can I do it with basic SQL syntax?
You can do that with a SQL statement if you only have a few rows:
SELECT id, question, ... FROM x1 ORDER BY rand() LIMIT 1
This works fine if you have only a few rows. As soon as you have thousands of rows, the sorting overhead becomes significant: you have to sort all rows just to get one.
A trickier but faster solution (it assumes the ids have no gaps; if the random id is missing, no row is returned) would be:
SELECT id, question FROM x1 JOIN (SELECT CEIL(RAND() * (SELECT MAX(id) FROM x1)) AS id) AS r USING (id);
Running EXPLAIN on both SELECTs will show you the difference.
If you need random values for different categories, combine the SELECTs via UNION and add a WHERE clause to each.
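The per-category UNION could look roughly like this sketch, which uses Python's sqlite3 module so it runs anywhere. The x1 table name comes from the answer, but the category_id column and the demo rows are assumptions, and SQLite spells RAND() as RANDOM(). In MariaDB you would put each branch in parentheses instead of wrapping it in a subquery.

```python
import sqlite3

def random_per_category(conn, categories, n=4):
    # One "ORDER BY RANDOM() LIMIT n" branch per category, combined with
    # UNION ALL. Each branch sits in a subquery because SQLite does not allow
    # ORDER BY/LIMIT on an individual branch of a compound SELECT.
    branch = (
        "SELECT * FROM (SELECT id, category_id FROM x1 "
        "WHERE category_id = ? ORDER BY RANDOM() LIMIT ?)"
    )
    sql = " UNION ALL ".join([branch] * len(categories))
    params = []
    for category in categories:
        params.extend([category, n])
    return conn.execute(sql, params).fetchall()

# Hypothetical demo data: 30 rows spread evenly over categories 0, 1 and 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x1 (id INTEGER PRIMARY KEY, category_id INTEGER)")
conn.executemany(
    "INSERT INTO x1 (id, category_id) VALUES (?, ?)",
    [(i, i % 3) for i in range(1, 31)],
)

for row in random_per_category(conn, [0, 1, 2]):
    print(row)  # 12 rows in total, 4 per category, chosen at random
```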
http://mysql.rjweb.org/doc.php/groupwise_max#top_n_in_each_group
But then use ORDER BY category, RAND(). (Your category corresponds to the blog's province.)
Notice how it uses @variables to do the counting.
If you have MariaDB 10.2 or later, use one of its window functions instead.
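With a window function the whole job becomes a single query. A sketch, run in SQLite through Python (window functions need SQLite 3.25 or later); the x1 table, its category_id column and the demo rows are assumptions, and a MariaDB version would presumably use RAND() in place of RANDOM().

```python
import sqlite3

# ROW_NUMBER() numbers the rows of each category in random order;
# keeping rn <= 4 leaves at most four random rows per category.
FOUR_RANDOM_PER_CATEGORY = """
SELECT id, category_id
FROM (
    SELECT id, category_id,
           ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY RANDOM()) AS rn
    FROM x1
) AS ranked
WHERE rn <= 4
ORDER BY category_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x1 (id INTEGER PRIMARY KEY, category_id INTEGER)")
conn.executemany(
    "INSERT INTO x1 (id, category_id) VALUES (?, ?)",
    [(i, i % 3) for i in range(1, 31)],
)
rows = conn.execute(FOUR_RANDOM_PER_CATEGORY).fetchall()
print(len(rows))  # 12
```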
SELECT column FROM table WHERE category_id = XXX
ORDER BY RAND()
LIMIT 4
Repeat this for each category.

Assistance with SQL statement

I'm using SQL Server 2005 and ASP.NET with C#.
I have a Users table with:
userId (int),
userGender (tinyint),
userAge (tinyint),
userCity (tinyint)
(a simplified version, of course).
For the userId I pass to the query, I always need to select two matching users: of the opposite gender, aged from 5 years younger to 10 years older, and from the same city.
The important point is that there must always be two, so I added a condition: if @@ROWCOUNT < 2, re-select without the age and city filters.
The problem is that I sometimes get two result sets back, because the first SELECT has already returned its rows by the time I check @@ROWCOUNT.
Will it be a problem to use the DataReader object to always read from the second result set? Is there another way to check how many rows were selected without returning them as a result set?
Can you simplify it by using SELECT TOP 2 ?
Update: I would perform both selects every time, UNION the results, and then select from them in priority order (using SELECT TOP 2), since the union may return more than two rows. It's important that this final select picks the rows in order of importance, i.e. it prefers rows from your first select.
Alternatively, have the reader logic read the next result-set if there is one and leave the SQL alone.
To avoid getting two separate result sets, you can do your first SELECT into a table variable and then do your @@ROWCOUNT check. If it is >= 2, just select from the table variable on its own; otherwise, select the rows of the table variable UNION ALLed with the results of the second query.
Edit: There is a slight overhead to using table variables, so you'd need to weigh whether this is cheaper than Adam's suggestion of simply performing the UNION as a matter of routine, by comparing the execution stats for both approaches, e.g. with:
SET STATISTICS IO ON
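The table-variable approach can be sketched outside SQL Server too. This version uses Python's sqlite3 module, with a temporary table standing in for the T-SQL table variable and a COUNT(*) query standing in for @@ROWCOUNT. The Users rows are made up; the age window (5 years younger to 10 years older) follows the question. Unlike a plain UNION ALL, the fallback branch skips users already picked by the strict query.

```python
import sqlite3

# Hypothetical rows: (userId, userGender, userAge, userCity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (userId INTEGER PRIMARY KEY, userGender INTEGER,"
    " userAge INTEGER, userCity INTEGER)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        (1, 0, 30, 7),  # the user we are matching for
        (2, 1, 33, 7),  # strict match
        (3, 1, 50, 7),  # opposite gender, but too old
        (4, 1, 29, 8),  # opposite gender, but another city
        (5, 0, 31, 7),  # same gender: never a match
    ],
)

STRICT = """
    SELECT m2.userId AS userId FROM Users AS m1 JOIN Users AS m2
      ON  m2.userGender <> m1.userGender
      AND m2.userAge BETWEEN m1.userAge - 5 AND m1.userAge + 10
      AND m2.userCity = m1.userCity
    WHERE m1.userId = ?
"""
RELAXED = """
    SELECT m2.userId AS userId FROM Users AS m1 JOIN Users AS m2
      ON m2.userGender <> m1.userGender
    WHERE m1.userId = ?
"""

def two_matches(conn, user_id):
    # Stage the strict matches in a temp table (standing in for the T-SQL
    # table variable) and count them (standing in for @@ROWCOUNT).
    conn.execute("DROP TABLE IF EXISTS temp.first_pick")
    conn.execute("CREATE TEMP TABLE first_pick (userId INTEGER)")
    conn.execute(f"INSERT INTO first_pick (userId) {STRICT}", (user_id,))
    (count,) = conn.execute("SELECT COUNT(*) FROM first_pick").fetchone()
    if count >= 2:
        sql, params = "SELECT userId FROM first_pick ORDER BY userId LIMIT 2", ()
    else:
        # Strict matches first (prio 0); relaxed matches not already picked
        # (prio 1) only fill the remaining slots.
        sql = f"""
            SELECT userId FROM (
                SELECT 0 AS prio, userId FROM first_pick
                UNION ALL
                SELECT 1 AS prio, userId FROM ({RELAXED})
                WHERE userId NOT IN (SELECT userId FROM first_pick)
            ) ORDER BY prio, userId LIMIT 2
        """
        params = (user_id,)
    return [uid for (uid,) in conn.execute(sql, params)]

print(two_matches(conn, 1))  # [2, 3]
```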
Would something along the following lines be of use?
-- Strict matches get prio 1, the gender-only fallback gets prio 2;
-- GROUP BY keeps each candidate once, at their best priority.
SELECT TOP 2 userId
FROM (SELECT 1 AS prio, M2.userId
      FROM Users AS M1
      JOIN Users AS M2
        ON  M2.userGender <> M1.userGender
        AND M2.userAge BETWEEN M1.userAge - 5 AND M1.userAge + 10
        AND M2.userCity = M1.userCity
      WHERE M1.userId = @userId
      UNION ALL
      SELECT 2 AS prio, M2.userId
      FROM Users AS M1
      JOIN Users AS M2
        ON  M2.userGender <> M1.userGender
      WHERE M1.userId = @userId) AS candidates
GROUP BY userId
ORDER BY MIN(prio);
I haven't tried it as I have no SQL Server and there may be dialect issues.

select distinct over specific columns

A query in a system I maintain returns
QID AID DATA
1 2 x
1 2 y
5 6 t
Per a new requirement, I do not want the (QID, AID) = (1, 2) pair to be repeated. We also don't care which value is selected from the DATA column; either x or y will do.
What I have done is wrap the original query like this:
SELECT * FROM (<original query text>) Results group by QID,AID
Is there a better way to go about this? The original query uses multiple joins, unions and whatnot, so I would prefer not to touch it unless it's absolutely necessary.
If you don't care which DATA will be selected, GROUP BY is nice, though using ungrouped and unaggregated columns in the SELECT clause of a GROUP BY query is MySQL-specific and not portable.
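A portable variant wraps DATA in an aggregate such as MIN(), so every selected column is either grouped or aggregated. The sketch below runs it through Python's sqlite3 module; the original_result table is a hypothetical stand-in for the original query text. SQLite itself also tolerates the bare-column form, so this only shows the shape of the portable query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_result (QID INTEGER, AID INTEGER, DATA TEXT)")
conn.executemany(
    "INSERT INTO original_result VALUES (?, ?, ?)",
    [(1, 2, "x"), (1, 2, "y"), (5, 6, "t")],
)

# Every non-grouped column is wrapped in an aggregate, so the query does not
# depend on a database accepting bare columns alongside GROUP BY.
rows = conn.execute(
    """
    SELECT QID, AID, MIN(DATA) AS DATA
    FROM (SELECT QID, AID, DATA FROM original_result) AS Results
    GROUP BY QID, AID
    ORDER BY QID, AID
    """
).fetchall()

print(rows)  # [(1, 2, 'x'), (5, 6, 't')]
```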

Union two or more tables, when you don't know the number of tables you are merging

I am working with MS SQL 2005.
I have defined a tree structure as:
1
|\
2 3
 /|\
4 5 6
I have written a SQL function Subs(id) that takes an id and returns its subtree as a table.
So, Subs(3) will return 4 rows with 3,4,5,6, while Subs(2) will return one row, with 2.
I have a select statement that returns Ids from the tree above (it joins this tree with other tables).
After that select statement, which returns, for example, a table with 2 rows:
2
3
I want to be able to run the Subs function as
Subs(2)
union
Subs(3).
(The result should be the rows with id 2,3,4,5,6)
The problem is that I don't know how to pass the arguments, and I don't know how to use UNION dynamically.
Is it possible to solve this in SQL, or should I move it up a level, to C#?
I do not think you need UNION here; with SQL Server 2005 you can achieve the desired result using CROSS APPLY:
select
f.*
from
resultsTable rt
cross apply dbo.subs(rt.ID) f
This assumes that resultsTable stores the results of your first query and that its id column is named ID.
I think you want to read up on Recursive Common Table Expressions.
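A sketch of the recursive CTE approach, run through Python's sqlite3 module. The tree table (an adjacency list with a parent_id column) and resultsTable layouts are assumptions. SQLite needs WITH RECURSIVE, whereas SQL Server 2005 uses plain WITH and requires UNION ALL between the anchor and recursive members; the sketch uses UNION ALL as well, which is why the final SELECT uses DISTINCT to drop rows from overlapping subtrees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency list for the tree in the question:
# 1 -> 2, 3 and 3 -> 4, 5, 6.
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 3), (5, 3), (6, 3)],
)
# The Ids returned by the first select statement.
conn.execute("CREATE TABLE resultsTable (ID INTEGER)")
conn.executemany("INSERT INTO resultsTable VALUES (?)", [(2,), (3,)])

rows = conn.execute(
    """
    WITH RECURSIVE subtree(id) AS (
        SELECT ID FROM resultsTable          -- anchor: the starting Ids
        UNION ALL
        SELECT t.id FROM tree AS t           -- recursive step: children
        JOIN subtree AS s ON t.parent_id = s.id
    )
    SELECT DISTINCT id FROM subtree ORDER BY id
    """
).fetchall()

print([r[0] for r in rows])  # [2, 3, 4, 5, 6]
```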