Subtract one table from another - SQL

I have two tables coming from two different select statements. These tables contain only one column. I would like to subtract the rows of table2 from the rows of table1, but only once. In other words: I would like to remove only one occurrence, not all.
table1:
apple
apple
orange
table2:
apple
pear
result:
apple
orange

Basically, FYI: if A = {A, A, O} and B = {A, P}, then A - B is logically
select * from t1 except select * from t2
Try this!
create table #t(id varchar(10))
create table #t1(id1 varchar(10))

insert into #t values('apple'),('apple'),('orange')
insert into #t1 values('apple'),('pear')

-- number each duplicate occurrence, then take the difference of the numbered rows with EXCEPT
select * from
(
    select *, rn = row_number() over(partition by id order by id) from #t
    except
    select *, rn1 = row_number() over(partition by id1 order by id1) from #t1
) x

Here is an answer for an Oracle DBMS. The trick is to number the records per fruit, so you get apple 1, apple 2, etc. Then subtract the sets, so that apple 2 remains while apple 1 is removed, for instance. (The row_number function needs a sort order, which is not important for us, but we must specify one for syntax reasons.)
select fruit
from
(
select fruit, row_number() over (partition by fruit order by fruit)
from table1
minus
select fruit, row_number() over (partition by fruit order by fruit)
from table2
);

I don't think you would be able to delete the records through a single SQL query.
In my opinion, you will need to connect the database to a programming language and write an algorithm like this:
for(all records in table2)
{
if(record present in table1)
{
delete from table1 where record = (record in table2) limit 1;
}
}
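For illustration, here is a minimal T-SQL sketch of that loop (assuming SQL Server, and assuming the single column in both tables is called id; adjust names to your schema):
DECLARE @record varchar(10);
DECLARE c CURSOR FOR SELECT id FROM table2;
OPEN c;
FETCH NEXT FROM c INTO @record;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- remove at most one matching occurrence from table1 per table2 row
    DELETE TOP (1) FROM table1 WHERE id = @record;
    FETCH NEXT FROM c INTO @record;
END
CLOSE c;
DEALLOCATE c;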


What is the difference between Postgres DISTINCT vs DISTINCT ON?

I have a Postgres table created with the following statement. This table is filled by a dump of data from another service.
CREATE TABLE data_table (
date date DEFAULT NULL,
dimension1 varchar(64) DEFAULT NULL,
dimension2 varchar(128) DEFAULT NULL
) TABLESPACE pg_default;
One of the steps in an ETL process I'm building is extracting the unique values of dimension1 and inserting them into another intermediary table.
However, during some tests I found out that the two commands below do not return the same results. I would expect both to return the same count.
The first command returns a different count than the second (1466 vs. 1504 rows).
-- command 1
SELECT DISTINCT count(dimension1)
FROM data_table;
-- command 2
SELECT count(*)
FROM (SELECT DISTINCT ON (dimension1) dimension1
FROM data_table
GROUP BY dimension1) AS tmp_table;
Any obvious explanations for this? Alternatively, is there any check on the data you would suggest I run?
EDIT: The following queries both return 1504 (same as the "simple" DISTINCT)
SELECT count(*)
FROM data_table WHERE dimension1 IS NOT NULL;
SELECT count(dimension1)
FROM data_table;
Thank you!
DISTINCT and DISTINCT ON have completely different semantics.
First the theory
DISTINCT applies to an entire tuple. Once the result of the query is computed, DISTINCT removes any duplicate tuples from the result.
For example, assume a table R with the following contents:
#table r;
a | b
---+---
1 | a
2 | b
3 | c
3 | d
2 | e
1 | a
(6 rows)
SELECT distinct * from R will result:
# select distinct * from r;
a | b
---+---
1 | a
3 | d
2 | e
2 | b
3 | c
(5 rows)
Note that distinct applies to the entire list of projected attributes: thus
select distinct * from R
is semantically equivalent to
select distinct a,b from R
You cannot issue
select a, distinct b From R
DISTINCT must follow SELECT. It applies to the entire tuple, not to an attribute of the result.
DISTINCT ON is a PostgreSQL addition to the language. It is similar, but not identical, to GROUP BY.
Its syntax is:
SELECT DISTINCT ON (attributeList) <rest as any query>
For example:
SELECT DISTINCT ON (a) * from R
Its semantics can be described as follows. Compute the query as usual, ignoring the DISTINCT ON (a); but before projecting the result, sort the current result and group it according to the attribute list in DISTINCT ON (similar to GROUP BY). Then do the projection using the first tuple in each group and ignore the other tuples.
Example:
select * from r order by a;
a | b
---+---
1 | a
2 | e
2 | b
3 | c
3 | d
(5 rows)
Then, for every different value of a (in this case 1, 2 and 3), take the first tuple, which is the same as:
SELECT DISTINCT on (a) * from r;
a | b
---+---
1 | a
2 | b
3 | c
(3 rows)
Some DBMSs (most notably SQLite) will allow you to run this query:
SELECT a,b from R group by a;
And this gives you a similar result.
PostgreSQL will allow this query if and only if there is a functional dependency from a to b. In other words, the query is valid if, for any instance of the relation R, there is only one tuple for every value of a (thus selecting the first tuple is deterministic: there is only one tuple).
For instance, if the primary key of R is a, then a->b and:
SELECT a,b FROM R group by a
is identical to:
SELECT DISTINCT on (a) a, b from r;
Now, back to your problem:
First query:
SELECT DISTINCT count(dimension1)
FROM data_table;
computes the count of dimension1 (the number of tuples in data_table where dimension1 is not null). This query returns one tuple, which is always unique (hence DISTINCT is redundant).
Query 2:
SELECT count(*)
FROM (SELECT DISTINCT ON (dimension1) dimension1
FROM data_table
GROUP BY dimension1) AS tmp_table;
This is a query within a query. Let me rewrite it for clarity:
WITH tmp_table AS (
SELECT DISTINCT ON (dimension1)
dimension1 FROM data_table
GROUP by dimension1)
SELECT count(*) from tmp_table
Let us compute tmp_table first. As I mentioned above, let us ignore the DISTINCT ON for a moment and do the rest of the query. This is a GROUP BY on dimension1, hence this part of the query will result in one tuple per distinct value of dimension1.
Now, the DISTINCT ON. It uses dimension1 again. But dimension1 is already unique (due to the GROUP BY). Hence the DISTINCT ON is superfluous (it does nothing).
The final count is simply a count of all the tuples produced by the GROUP BY.
As you can see, the following queries are equivalent (this applies to any relation with an attribute a):
SELECT DISTINCT ON (a) a
FROM R
and
SELECT a FROM R group by a
and
SELECT DISTINCT a FROM R
Warning
Using DISTINCT ON can make a query non-deterministic for a given instance of the database.
In other words, the query might return different results for the same tables.
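For example, a sketch using the sample table r from above: without a further ORDER BY, either (2,b) or (2,e) could be returned for a = 2; adding one pins the choice.
SELECT DISTINCT ON (a) a, b
FROM r
ORDER BY a, b;  -- the smallest b per a is now always returned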
One interesting aspect
DISTINCT ON emulates a bad behaviour of SQLite in a much cleaner way. Assume that R has two attributes, a and b:
SELECT a, b FROM R group by a
is an illegal statement in SQL. Yet it runs on SQLite: it simply takes an arbitrary value of b from one of the tuples in the group sharing the same value of a.
In PostgreSQL this statement is illegal. Instead, you must use DISTINCT ON and write:
SELECT DISTINCT ON (a) a,b from R
Corollary
DISTINCT ON is useful in a GROUP BY-like situation when you want to access a value that is functionally dependent on the grouping attributes. In other words, if you know that for every combination of the grouped attributes the third attribute always has the same value, then use DISTINCT ON on that group of attributes. Otherwise you would have to make a JOIN to retrieve that third attribute.
The first query gives the number of not null values of dimension1, while the second one returns the number of distinct values of the column. These numbers obviously are not equal if the column contains duplicates or nulls.
The word DISTINCT in
SELECT DISTINCT count(dimension1)
FROM data_table;
makes no sense, as the query returns a single row. Maybe you wanted
SELECT count(DISTINCT dimension1)
FROM data_table;
which returns the number of distinct not null values of dimension1. Note, that it is not the same as
SELECT count(*)
FROM (
SELECT DISTINCT ON (dimension1) dimension1
FROM data_table
-- GROUP BY dimension1 -- redundant
) AS tmp_table;
The last query yields the number of all (null or not) distinct values of the column.
To learn and understand what happens by visual example, here's a bit of SQL to execute on a PostgreSQL database:
DROP TABLE IF EXISTS test_table;
CREATE TABLE test_table (
id int NOT NULL primary key,
col1 varchar(64) DEFAULT NULL
);
INSERT INTO test_table (id, col1) VALUES
(1,'foo'), (2,'foo'), (3,'bar'), (4,null);
select count(*) as total1 from test_table;
-- returns: 4
-- Because the table has 4 records.
select distinct count(*) as total2 from test_table;
-- returns: 4
-- The count(*) is just one value. Making 1 total unique can only result in 1 total.
-- So the distinct is useless here.
select col1, count(*) as total3 from test_table group by col1 order by col1;
-- returns 3 rows: ('bar',1),('foo',2),(NULL,1)
-- Since there are 3 unique col1 values. NULLs are included.
select distinct col1, count(*) as total4 from test_table group by col1 order by col1;
-- returns 3 rows: ('bar',1),('foo',2),(NULL,1)
-- The result is already grouped, and therefore already unique.
-- So again, the distinct does nothing extra here.
select count(distinct col1) as total5 from test_table;
-- returns 2
-- NULL's aren't counted in a count by value. So only 'foo' & 'bar' are counted
select distinct on (col1) id, col1 from test_table order by col1 asc, id desc;
-- returns 3 rows: (3,'bar'),(2,'foo'),(4,NULL)
-- So it gets the record with the maximum id per unique col1 value
-- Note that the "order by" matters here. Changing that id DESC to ASC would get the minimum id instead.
select count(*) as total6 from (select distinct on (col1) id, col1 from test_table order by col1 asc, id desc) as q;
-- returns 3.
-- After seeing the previous query, what else would one expect?
select distinct col1 from test_table order by col1;
-- returns 3 unique values : ('bar'),('foo'),(null)
select distinct id, col1 from test_table order by col1;
-- returns all records.
-- Because id is the primary key and therefore makes each returned row unique
Here's a more direct summary that might be useful for Googlers, answering the title but not the intricacies of the full post:
SELECT DISTINCT
availability: ISO
behaviour:
SELECT DISTINCT col1, col2, col3 FROM mytable
returns col1, col2 and col3, and omits duplicate rows, i.e. rows where the entire tuple (col1, col2, col3) is the same as another row's. E.g. you could get a result like:
1 2 3
1 2 4
because those two rows are not identical due to the 4. But you could never get:
1 2 3
1 2 4
1 2 3
because 1 2 3 appears twice, and both rows are exactly the same. That is what DISTINCT prevents.
vs GROUP BY: SELECT DISTINCT is basically a subset of GROUP BY where you can't use aggregate functions: Is there any difference between GROUP BY and DISTINCT
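As a quick sketch of that equivalence (reusing the mytable/col names from this answer), these two statements return the same rows; GROUP BY just additionally allows aggregates:
SELECT DISTINCT col1, col2, col3 FROM mytable;
SELECT col1, col2, col3 FROM mytable GROUP BY col1, col2, col3;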
SELECT DISTINCT ON
availability: PostgreSQL extension, WONTFIXED by SQLite
behavior: unlike DISTINCT, DISTINCT ON allows you to separate
what you want to be unique
from what you want to return
E.g.:
SELECT DISTINCT ON(col1) col2, col3 FROM mytable
returns col2 and col3, and does not return any two rows with the same col1. E.g.:
1 2 3
1 4 5
could not happen, because we have 1 twice on col1.
And e.g.:
SELECT DISTINCT ON(col1, col2) col2, col3 FROM mytable
would prevent any duplicated (col1, col2) tuples, e.g. you could get:
1 2 3
1 4 3
as it has different (1, 2) and (1, 4) tuples, but not:
1 2 3
1 2 4
where (1, 2) happens twice, only one of those two could appear.
We can uniquely determine which one of the possible rows will be selected with ORDER BY which guarantees that the first match is taken, e.g.:
SELECT DISTINCT ON(col1, col2) col2, col3 FROM mytable
ORDER BY col1 DESC, col2 DESC, col3 DESC
would ensure that among:
1 2 3
1 2 4
only 1 2 4 would be picked as it happens first on our DESC sorting.
vs GROUP BY: DISTINCT ON is not a subset of GROUP BY, because it allows you to access columns of the underlying row that are not in the grouping list, which is generally not allowed in GROUP BY, unless:
you group by primary key in Postgres (unique not null is a TODO for them)
or it is allowed as an extension to ISO SQL, as in SQLite/MySQL
This makes DISTINCT ON extremely useful to fulfill the common use case of "find the full row that reaches the maximum/minimum of some column": Is there any difference between GROUP BY and DISTINCT
E.g. to find the city of each country that has the most sales:
SELECT DISTINCT ON ("country") "country", "city", "amount"
FROM "Sales"
ORDER BY "country" ASC, "amount" DESC, "city" ASC
or equivalently with * if we want all columns:
SELECT DISTINCT ON ("country") *
FROM "Sales"
ORDER BY "country" ASC, "amount" DESC, "city" ASC
Here each country appears only once, within each country we then sort by amount DESC and take the first, and therefore highest, amount.
RANK and ROW_NUMBER window functions
These can be used basically as supersets of DISTINCT ON, and are implemented and tested as of both SQLite 3.34 and PostgreSQL 14.3. I highly recommend also looking into them, see e.g.: How to SELECT DISTINCT of one column and get the others?
This is how the above "city with the highest amount for each country" query would look with ROW_NUMBER:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER (
PARTITION BY "country"
ORDER BY "amount" DESC, "city" ASC
) AS "rnk",
*
FROM "Sales"
) sub
WHERE
"sub"."rnk" = 1
ORDER BY
"sub"."country" ASC
Try
SELECT count(dimension1a)
FROM (SELECT DISTINCT ON (dimension1) dimension1 AS dimension1a
FROM data_table
ORDER BY dimension1) AS tmp_table;
In this case, DISTINCT ON is effectively synonymous with GROUP BY.
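For this particular count, a plain GROUP BY subquery gives the same number, which is why the two look interchangeable here (a sketch, reusing the question's table):
SELECT count(dimension1)
FROM (SELECT dimension1
      FROM data_table
      GROUP BY dimension1) AS tmp_table;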

Make value from every second row appear in new 3rd column

Let's assume my data looks like this: every second row represents the old (previous) value in a table that holds historical data.
table 1 :
id value
------------
1 a
1 b
2 c
2 d
3 a
3 b
and I want the value from every second row to appear in a new third column, like this:
table 2:
id new_value old_value
------------------------
1 a b
2 c d
3 a b
EDIT:
For clarity, I'll post the skeleton of the query that produces the data I want to transform (so it's clear I am already using WITH and can't use an additional one, since Oracle does not yet allow nesting of WITH elements):
Skeleton code that produces the data in table 1:
with candidates as
(
--select list of candidates
)
SELECT * FROM
(
(
--select new values
MINUS
--select old values
)
UNION
(
--select old values
MINUS
--select new values
)
)
ORDER BY id;
The goal is to finally get only a list of ids that changed, together with their old and new values.
Thanks in advance.
Use a CTE:
;WITH CTE AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RN
FROM TableName
)
SELECT ID,
MIN(CASE WHEN RN=1 THEN [value] END) NewValue,
MIN(CASE WHEN RN=2 THEN [value] END) OldValue
FROM CTE
GROUP BY ID
It is quite possible that the overall query can be written in a much simpler way: just join the intermediate results with old and new values together on id to put them in two different columns, instead of unioning them into the same column.
WITH
candidates
AS
(
--select list of candidates
)
,CTE_NewValues
AS
(
--select new values
select id, value AS new_value
FROM candidates
WHERE ...
-- assumes id is unique, one row per id
)
,CTE_OldValues
AS
(
--select old values
select id, value AS old_value
FROM candidates
WHERE ...
-- assumes id is unique, one row per id
)
SELECT
CTE_NewValues.id
,CTE_NewValues.new_value
,CTE_OldValues.old_value
FROM
CTE_NewValues
INNER JOIN CTE_OldValues ON CTE_NewValues.id = CTE_OldValues.id
WHERE
CTE_NewValues.new_value <> CTE_OldValues.old_value
ORDER BY
CTE_NewValues.id;
If we stick to the skeleton of the query in the question, there are also many ways to do it. A self-join is likely to be less efficient than using analytic functions like ROW_NUMBER and LEAD.
Sorting just by id is not enough to unambiguously define which value is new and which is old. You need some extra column to resolve it.
You don't "nest" WITH (common table expressions), you "chain" them, something like the following. As you do that, make sure to add the sort_order column to be able to distinguish old and new values, if you don't have a similar column already.
WITH
candidates
AS
(
--select list of candidates
)
,CTE_YourQuery
AS
(
SELECT * FROM
(
(
--select new values
select 1 AS sort_order, id, value
MINUS
--select old values
select 1 AS sort_order, id, value
)
UNION ALL
(
--select old values
select 2 AS sort_order, id, value
MINUS
--select new values
select 2 AS sort_order, id, value
)
)
)
,CTE_RowNumber
AS
(
SELECT
id
,value AS new_value
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY sort_order) AS rn
,LEAD(value) OVER (PARTITION BY id ORDER BY sort_order) AS old_value
FROM CTE_YourQuery
)
SELECT
id
,new_value
,old_value
FROM CTE_RowNumber
WHERE rn = 1
ORDER BY id;
Assuming there is some other column which defines the "order" in which the new and old value appears, you can do this:
select t1.id, t1.value as old_value, t2.value as new_value
from the_table t1
join the_table t2 on t1.id = t2.id and t1.sort_order < t2.sort_order
But you have to have some column that distinguishes the row that is considered "old" from the one that is considered "new".

SQL Duplicates multi criteria selection

I am trying to create a query that removes duplicates based on multiple rules.
Some sample data:
Column A: Column B:
One Apple
One Pear
Two Apple
Two Mango
Three Pear
Four Mango
Five Plum
Six Mango
Zero Banana
Essentially, I would like the query to return only the pairs that are unique in each column. Meaning that if there is a duplicate value in column A, all entries with that value are removed (for example, the duplicated "Two" would remove Two-Apple and Two-Mango). The same logic applies to column B (e.g. the duplicated Apple and Mango rows are taken out). So the final results would be:
Column A: Column B:
Three Pear
Zero Banana
Five Plum
Any pointers would be great. I am on SQL Server. Thank you.
You could join back on the table and then select rows that have no matches (e.g. no duplicates).
SELECT source.a, source.b
FROM my_table source
LEFT JOIN my_table a_dups
    ON source.a = a_dups.a
    AND source.b <> a_dups.b
LEFT JOIN my_table b_dups
    ON source.b = b_dups.b
    AND source.a <> b_dups.a
WHERE a_dups.a IS NULL
    AND b_dups.b IS NULL
Typing this outside IDE so pardon SQL errors but hopefully should give you an idea.
You can use window functions to get the count for each field and then just check that the count is one for both, like this:
SELECT ColumnA, ColumnB
FROM (
SELECT ColumnA, ColumnB,
COUNT(*) OVER (PARTITION BY ColumnA) as ACount,
COUNT(*) OVER (PARTITION BY ColumnB) as BCount
FROM TABLE
) X
WHERE ACount = 1 AND BCount = 1
Here it goes:
Creating sample data set:
CREATE TABLE #temp (ColumnA varchar(20), ColumnB varchar(20))
INSERT INTO #temp
VALUES('One','Apple'),
('One','Pear'),
('Two','Apple'),
('Two','Mango'),
('Three','Pear'),
('Four','Mango'),
('Five','Plum'),
('Six','Mango'),
('Zero','Banana');
Showing full data set:
SELECT * FROM #temp;
Using a common table expression with ROW_NUMBER() and PARTITION BY to identify duplicates in both columns:
WITH CTE AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY ColumnA ORDER BY ColumnA ) AS rn1, ROW_NUMBER() OVER (PARTITION BY ColumnB ORDER BY ColumnB ) AS rn2 FROM #temp)
SELECT * FROM CTE WHERE ColumnA NOT IN (SELECT ColumnA FROM CTE WHERE rn1 <> 1) AND ColumnB NOT IN (SELECT ColumnB FROM CTE WHERE rn2 <> 1)
(Note: the query above only selects the de-duplicated rows; it does not modify #temp.)

Checking for duplicate data in SQL Server

Please don't ask me why but there is a lot of duplicate data where every field is duplicated.
For example
alex, 1
alex, 1
liza, 32
hary, 34
I need to eliminate one of the "alex, 1" rows from this table.
I know this algorithm will be very inefficient, but it does not matter; I just need to remove the duplicate data.
What is the best way to do this? Please keep in mind I do not have 2 fields; I actually have about 10 fields to check on.
As you said, yes this will be very inefficient, but you can try something like
DECLARE #TestTable TABLE(
Name VARCHAR(20),
SomeVal INT
)
INSERT INTO #TestTable SELECT 'alex', 1
INSERT INTO #TestTable SELECT 'alex', 1
INSERT INTO #TestTable SELECT 'liza', 32
INSERT INTO #TestTable SELECT 'hary', 34
SELECT *
FROM #TestTable
;WITH DuplicateVals AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Name, SomeVal ORDER BY (SELECT NULL)) RowID
FROM #TestTable
)
DELETE FROM DuplicateVals WHERE RowID > 1
SELECT *
FROM #TestTable
I understand this does not answer the specific question (eliminating dupes in the SAME table), but I'm offering this solution because it is very fast and might work best for the author.
Speedy solution: if you don't mind creating a new table, create one with the same schema named NewTable, then execute this SQL:
Insert into NewTable
Select
name,
num
from
OldTable
group by
name,
num
Just include every field name in both the select and group by clauses.
Method A. You can get a deduped version of your data using
SELECT field1, field2, ...
INTO Deduped
FROM Source
GROUP BY field1, field2, ...
for example, for your sample data,
SELECT name, number
FROM Source
GROUP BY name, number
yields
alex 1
hary 34
liza 32
then simply drop the old table and rename the new one. Of course, there are a number of fancy in-place solutions, but this is the clearest way to do it.
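A sketch of that swap step, assuming the deduped copy is called Deduped as above (sp_rename is SQL Server's way of renaming a table):
DROP TABLE Source;
EXEC sp_rename 'Deduped', 'Source';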
Method B. An in-place method is to create a primary key and delete duplicates that way. For example, you can
ALTER TABLE Source ADD sid INT IDENTITY(1,1);
which makes Source look like this
alex 1 1
alex 1 2
liza 32 3
hary 34 4
then you can use
DELETE FROM Source
WHERE sid NOT IN
(SELECT MIN(sid)
FROM Source
GROUP BY name, number)
which will give the desired result. Of course, "NOT IN" is not exactly the most efficient, but it will do the job. Alternatively, you can LEFT JOIN the grouped table (maybe stored in a TEMP table), and do the DELETE that way.
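A sketch of that LEFT JOIN variant in T-SQL, reusing the sid column from Method B; rows whose sid is not the minimum of their group are deleted:
DELETE s
FROM Source AS s
LEFT JOIN (
    SELECT MIN(sid) AS keep_sid
    FROM Source
    GROUP BY name, number
) AS k ON s.sid = k.keep_sid
WHERE k.keep_sid IS NULL;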
create table DuplicateTable(name varchar(10), number int)
insert DuplicateTable
values
('alex', 1),
('alex', 1),
('liza', 32),
('hary', 34);
with cte
as
(
select *, row_number() over(partition by name, number order by name) RowNumber
from DuplicateTable
)
delete cte
where RowNumber > 1
A bit different solution, which requires a primary key (or unique index).
Suppose you have a table your_table (id PK, name, num):
DELETE t2
FROM your_table AS t2
WHERE
    (SELECT COUNT(*) FROM your_table y
     WHERE t2.name = y.name AND t2.num = y.num) > 1
    AND t2.id !=
    (SELECT TOP 1 id FROM your_table z
     WHERE t2.name = z.name AND t2.num = z.num);
I assumed that name and num are NOT NULL; if they can contain NULL values, you need to adjust the WHERE clauses in the sub-queries.

Select DISTINCT, return entire row

I have a table with 10 columns.
I want to return all rows for which Col006 is distinct, but return all columns...
How can I do this?
if column 6 appears like this:
| Column 6 |
| item1 |
| item1 |
| item2 |
| item1 |
I want to return two rows, one of the records with item1 and the other with item2, along with all other columns.
In SQL Server 2005 and above:
;WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY col6 ORDER BY id) rn
FROM mytable
)
SELECT *
FROM q
WHERE rn = 1
In SQL Server 2000, provided that you have a primary key column:
SELECT mt.*
FROM (
SELECT DISTINCT col6
FROM mytable
) mto
JOIN mytable mt
ON mt.id =
(
SELECT TOP 1 id
FROM mytable mti
WHERE mti.col6 = mto.col6
-- ORDER BY
-- id
-- Uncomment the lines above if the order matters
)
Update:
Check your database version and compatibility level:
SELECT @@VERSION
SELECT COMPATIBILITY_LEVEL
FROM sys.databases
WHERE name = DB_NAME()
The keyword DISTINCT in SQL means "unique value". When applied to a column in a query, it will return as many rows from the result set as there are unique, different values for that column. As a consequence it creates a grouped result set, and the values of the other columns are arbitrary unless defined by other functions (such as max, min, average, etc.).
If you meant to say you want to return all rows for which Col006 has a specific value, then use the "where Col006 = value" clause.
If you meant to say you want to return all rows for which Col006 is different from all other values of Col006, then you still need to specify what that value is => see above.
If you want to say that the value of Col006 can only be evaluated once all rows have been retrieved, then use the "having Col006 = value" clause. This has the same effect as the "where" clause, but "where" gets applied when rows are retrieved from the raw tables, whereas "having" is applied once all other calculations have been made (i.e. aggregation functions have been run etc.) and just before the result set is returned to the user.
UPDATE:
After having seen your edit, I have to point out that if you use any of the other suggestions, you will end up with random values in all other 9 columns for the row that contains the value "item1" in Col006, due to the constraint further up in my post.
You can group on Col006 to get the distinct values, but then you have to decide what to do with the multiple records in each group.
You can use aggregates to pick a value from the records. Example:
select Col006, min(Col001), max(Col002)
from TheTable
group by Col006
order by Col006
If you want the values to come from a specific record in each group, you have to identify it somehow. Example of using Col002 to identify the record in each group:
select Col006, Col001, Col002
from TheTable t
inner join (
select Col006, min(Col002) as Col002
from TheTable
group by Col006
) x on t.Col006 = x.Col006 and t.Col002 = x.Col002
order by Col006
SELECT *
FROM (SELECT DISTINCT YourDistinctField FROM YourTable) AS A
CROSS APPLY
( SELECT TOP 1 * FROM YourTable B
WHERE B.YourDistinctField = A.YourDistinctField ) AS NewTableName
I tried the answers posted above with no luck... but this does the trick!
select * from yourTable where column6 in (select distinct column6 from yourTable);
SELECT *
FROM harvest
GROUP BY estimated_total;
You can use GROUP BY and MIN() to get a more specific result.
Let's say that you have id as the primary key, and we want to get all the DISTINCT values for a column, say estimated_total, and you also need one sample complete row for each distinct value. The following query should do the trick.
SELECT *, min(id)
FROM harvest
GROUP BY estimated_total;
create table #temp
(C1 TINYINT,
C2 TINYINT,
C3 TINYINT,
C4 TINYINT,
C5 TINYINT,
C6 TINYINT)
INSERT INTO #temp
SELECT 1,1,1,1,1,6
UNION ALL SELECT 1,1,1,1,1,6
UNION ALL SELECT 3,1,1,1,1,3
UNION ALL SELECT 4,2,1,1,1,6
SELECT * FROM #temp
SELECT *
FROM(
SELECT ROW_NUMBER() OVER (PARTITION BY C6 Order by C1) ID,* FROM #temp
)T
WHERE ID = 1