I have a query which returns a number of ints and varchars.
I want the result Distinct but by only one of the ints.
SELECT DISTINCT
t.TaskID ,
td.[TestSteps]
FROM SOX_Task t
snip
Is there a simple way to do this?
DISTINCT doesn't work that way ... it ensures that entire rows are not duplicated. Besides, if it did, how would it decide which values should appear in other columns?
What you probably want to do here is a GROUP BY on the column you want to be distinct, and then apply an appropriate aggregate operator to the other columns to get the values you want (eg. MIN, MAX, SUM, AVG, etc).
For example, the following returns a distinct list of task IDs with the maximum number of steps for that task. That may not be exactly what you want, but it should give you the general idea.
SELECT t.TaskID, MAX(t.TestSteps)
FROM SOX_Task t
GROUP BY t.TaskID
I want the result Distinct but by only one of the ints.
This:
SELECT DISTINCT t.taskid
FROM SOX_TASK t
...will return a list of unique taskid values - there will not be any duplicates.
If you want to return a specific value, you need to specify it in the WHERE clause:
SELECT t.*
FROM SOX_TASK t
WHERE t.taskid = ?
That will return all the rows with the specific taskid value. If you want a distict list of the values associated with the taskid value:
SELECT DISTINCT t.*
FROM SOX_TASK t
WHERE t.taskid = ?
GROUP BY is another means of getting distinct/unique values, and it's my preference to use, but in order to use GROUP BY we need to know the columns in the table and what grouping you want.
Related
I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!
You can use ROW_NUMBER(). For example
select *
from (
select *,
row_number() over(partition by case_number) as rn
) x
where rn = 1
The query above will pseudo-randomly pick one row for each case_number. If you want a better selection criteria you can add ORDER BY or window frames to the OVER clause.
I am aware of select count(distinct a), but I recently came across select distinct count(a).
I'm not very sure if that is even valid.
If it is a valid use, could you give me a sample code with a sample data, that would explain me the difference.
Hive doesn't allow the latter.
Any leads would be appreciated!
Query select count(distinct a) will give you number of unique values in a.
While query select distinct count(a) will give you list of unique counts of values in a. Without grouping it will be just one line with total count.
See following example
create table t(a int)
insert into t values (1),(2),(3),(3)
select count (distinct a) from t
select distinct count (a) from t
group by a
It will give you 3 for first query and values 1 and 2 for second query.
I cannot think of any useful situation where you would want to use:
select distinct count(a)
If the query has no group by, then the distinct is anomalous. The query only returns on row anyway. If there is a group by, then the aggregation columns should be in the select, to identify each row.
I mean, technically, with a group by, it would be answering the question: "how many different non-null values of a are in groups". Usually, it is much more useful to know the value per group.
If you want to count the number of distinct values of a, then use count(distinct a).
How can I use a Distinct or Group by statement on 1 field with a SELECT of All or at least several ones?
Example: Using SQL SERVER!
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works prfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and just get the corresponding rows;
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
An SQLfiddle to test with.
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields to you want the values from a particular record if there are more than one with the same nom_fr?
Some SQL Systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not - you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so it's no use grouping without counting, summing up etc. DISTINCT eleminates duplicates but how that matches with the other columns you want to extract, I can't imagine, because some rows will be removed from the result.
I have to find distinct count of combination of 2 variables. I used the following 2 queries to find the count:
select count(*) from
( select V1, V2
from table1
group by 1,2
) a
select count(distinct catx('-', V1, V2))
from table1
Logically, both the above queries should give the same count but I am getting different counts. Note that
both V1 and V2 are integers
Both variables can have null values, though there are no null values in my table
There are no negative values
Any idea why I might be getting different outputs? And which is the best way to find the count of distinct combinations of 2 or more columns?
Thanks.
The SAS log gives the answer when you run the first sql code. Using 'group by' requires a summary function, otherwise it is ignored. The count will therefore return the overall number of rows instead of a distinct count of the 2 variables combined.
Just add count(*) to the subquery and you will get the same answer with both methods.
select count(*) from
( select V1, V2, count(*)
from table1
group by 1,2
) a
Use distinct in the subquery for the first query..
When you do a group by but don't include any aggregate function, it discards the group by.
so you will still have duplicate combinations of v1 and v2.
It seems that GROUP BY doesn't work that way in SAS. You can't use it to remove duplicates unless you have an aggregate function in your query. I found this in the log of my query output -
NOTE: A GROUP BY clause has been discarded because neither the SELECT
clause nor the optional HAVING clause of the associated
table-expression referenced a summary function.
This answers the question.
you can ignore the group by part also and just add a distinct in the sub-query. Also the second query you wrote is more efficient
I want to find the average amount of a field where it meets a criterion. It is embedded in a big table but I would like this average field in there instead of doing it in a separate table.
This is what I have so far:
Select....
Avg( (currbal) where (select * from table
where ament2 in ('r1','r2'))
From table
If you want to AVG only a subset of a query use case when ... then to replace value in non-matching rows with null as nulls are ignored by avg().
Select id,
sum(something) SomethingSummed,
avg(case when ament2 in ('r1','r2') then currbal end) CurrbalAveragedForR1R2
From [table]
group by id
You can put all the other sums which you want to be embedded into the AVG statement, inside the table reference inside the FROM clause. Something like:
SELECT AVG(currbal)
FROM
(
SELECT * -- other sums
FROM table
WHERE ament2 IN ('r1','r2')
) t
You can write a full sub-select into the select list:
SELECT ...,
(SELECT AVG(Currbal) FROM Table WHERE ament2 IN ('r1', 'r2')) AS avg_currbal,
...
FROM ...
Whether that will do exactly what you want depends on a number of things. You might need to make that into a correlated subquery; assuming 'ament2' is in Table, it is not a correlated sub-query at the moment.