SQL Query for multiple columns with one column distinct - sql

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!

You can use ROW_NUMBER(). For example:
select *
from (
select t.*,
row_number() over(partition by case_number) as rn
from ticket t
) x
where rn = 1
The query above picks one arbitrary row for each case_number. To control which row is kept, add an ORDER BY to the OVER clause (for example, order by date_entered). Some databases, such as SQL Server, require that ORDER BY for ROW_NUMBER().
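As a minimal sketch of the ROW_NUMBER() approach, the snippet below runs the query against an in-memory SQLite database from Python. The sample rows, the choice of SQLite (3.25 or later, for window functions), and the decision to keep the earliest date_entered per case_number are all assumptions made for illustration; the table and column names come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (case_number INTEGER, name TEXT, date_entered TEXT);
    INSERT INTO ticket VALUES
        (101, 'alice', '2023-01-05'),
        (101, 'bob',   '2023-01-02'),
        (102, 'carol', '2023-01-03'),
        (103, 'dave',  '2023-01-04'),
        (103, 'erin',  '2023-01-09');
""")

# Number the rows inside each case_number group, earliest first,
# then keep only the first row of every group.
query = """
    SELECT case_number, name, date_entered
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY case_number
                   ORDER BY date_entered
               ) AS rn
        FROM ticket AS t
    ) AS x
    WHERE rn = 1
    ORDER BY date_entered
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each case_number appears once, and name and date_entered come from the same source row, which is what DISTINCT on all three columns cannot guarantee.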

Related

SQL Query - Rank showing only 1 rank for all records

I am trying to perform ranking based on some calculation of already existing columns. I tried using the SQL RANK() function however it is showing the result as 1 for all entries even if the value of the order by (score column) is different. Please see the details below:
1. qu_point and ti_point are calculated columns.
2. The score column is also a derived column; it is simply the sum of the two columns mentioned in point 1.
I have used the SQL query as follow:
use EFR_DB
GO
select d.serial, d.question_set_id, d.correct_answers, d.total_questions, d.time_taken_seconds, q.total_time_in_secs,
(cast(d.correct_answers as float)/d.total_questions) as qu_point, ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs) as ti_point,
(((cast(d.correct_answers as float)/d.total_questions)*2) + ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs)) as score,
rank() over (partition by d.question_set_id order by score)
from daily_quiz_record d join Question_set q
on q.question_set_id=d.question_set_id
Please help me do the ranking, partitioned by question_set_id and ordered by score.
Screenshot attached for your reference.
You can’t use an alias defined in the select clause elsewhere in the same clause. I suppose that one of your tables has a column called score, otherwise your query would raise an error, so this existing column is being used for ordering instead of the computed value.
Since your expression is lengthy, it is simpler to turn the query into a subquery and rank in the outer query:
select
t.*,
rank() over(partition by question_set_id order by score) rn
from (
-- your existing query (without rank)
) t
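The subquery pattern can be tried out with a small, hypothetical stand-in for the quiz tables. In this sketch the table name quiz, its rows, the simplified score expression, and the DESC ordering (higher score ranks first) are assumptions, not the asker's actual schema; it runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for daily_quiz_record.
    CREATE TABLE quiz (serial INTEGER, question_set_id INTEGER,
                       correct_answers INTEGER, total_questions INTEGER);
    INSERT INTO quiz VALUES
        (1, 10, 8, 10),
        (2, 10, 5, 10),
        (3, 10, 8, 10),
        (4, 20, 3, 5);
""")

# The inner query computes the score alias; the outer query can then
# refer to it inside the window's ORDER BY.
ranked = conn.execute("""
    SELECT t.serial, t.question_set_id, t.score,
           RANK() OVER (PARTITION BY t.question_set_id
                        ORDER BY t.score DESC) AS rn
    FROM (
        SELECT serial, question_set_id,
               CAST(correct_answers AS REAL) / total_questions AS score
        FROM quiz
    ) AS t
    ORDER BY t.question_set_id, rn, t.serial
""").fetchall()

for row in ranked:
    print(row)
```

Rows with equal scores share a rank and the next rank is skipped, which is how RANK() differs from ROW_NUMBER().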

Selecting distinct values from database

I have a table as follows:
ParentActivityID | ActivityID | Timestamp
1 | A1 | T1
2 | A2 | T2
1 | A1 | T1
1 | A1 | T5
I want to select unique ParentActivityID's along with Timestamp. The time stamp can be the most recent one or the first one as is occurring in the table.
I tried to use DISTINCT, but I came to realise that it doesn't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.
DISTINCT applies to the whole select list, not to a single column. With multiple columns, the equivalent is a GROUP BY on all of them:
SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
Actually, I want only one row per ParentActivityID. Your solution will give each pair of ParentActivityID and Timestamp. For example, if I have [1, T1], [2, T2], [1, T3], then I want the values [1, T3] and [2, T2].
You need to decide which of the many timestamps to pick. If you want the earliest one, use MIN:
SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID
Try this:
SELECT [ParentActivityId],
MIN([Timestamp]) AS [FirstTimestamp],
MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]
This gives you the first timestamp and the most recent timestamp for each ParentActivityId present in your table. You can keep whichever of the two you need.
"Group by" is what you need here. Just "group by ParentActivityID" and state that you want the most recent timestamp among all rows with the same ParentActivityID:
SELECT ParentActivityID, MAX(Timestamp) FROM Table GROUP BY ParentActivityID
The "group by" operator is like taking rows from a table and putting them in a map whose key is defined in the group by clause (ParentActivityID in this example). You have to define how grouping handles rows with duplicate keys. For this you have various aggregate functions, which you apply to the columns you want to select that are not part of the key (not listed in the group by clause; think of them as the values in the map).
Some databases (such as MySQL, when ONLY_FULL_GROUP_BY is disabled) also allow you to select columns that are not part of the group by clause without applying an aggregate function to them. In that case you get an arbitrary value for the column (like blindly overwriting the value in a map every time a new row arrives). The SQL standard and most databases do not allow this, so you have to wrap such columns in an aggregate function such as min() or max() (some databases also offer first()- or last()-style aggregates).
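To make the map analogy concrete, here is a small sketch that runs the GROUP BY on SQLite and then rebuilds the same "most recent per key" result with a plain Python dict. The table name activity, the column name ts, and the integer timestamps standing in for T1, T2, T5 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rows mirror the question's table; integers stand in for T1, T2, T5.
    CREATE TABLE activity (ParentActivityID INTEGER, ActivityID TEXT, ts INTEGER);
    INSERT INTO activity VALUES
        (1, 'A1', 1), (2, 'A2', 2), (1, 'A1', 1), (1, 'A1', 5);
""")

# One output row per key; MIN/MAX say how duplicates are combined.
summary = conn.execute("""
    SELECT ParentActivityID, MIN(ts) AS first_ts, MAX(ts) AS recent_ts
    FROM activity
    GROUP BY ParentActivityID
    ORDER BY ParentActivityID
""").fetchall()
print(summary)

# The same idea as a map: the key is the GROUP BY column, and the
# aggregate decides what happens when a key is seen again.
latest = {}
for parent_id, _activity_id, ts in conn.execute("SELECT * FROM activity"):
    latest[parent_id] = max(ts, latest.get(parent_id, ts))
print(latest)
```

Replacing max with min in the loop corresponds to switching MAX(ts) to MIN(ts) in the query.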
Use a CTE to get the latest row from your table for each parent ID; you can then choose any columns from the full row in the output.
;With cte_parent
As
(SELECT ParentActivityId,ActivityId,TimeStamp
, ROW_NUMBER() OVER(PARTITION BY ParentActivityId ORDER BY TimeStamp desc) RNO
FROM YourTable )
SELECT *
FROM cte_parent
WHERE RNO =1

Getting unique column amongst duplicate columns but returning the complete row

I need help on creating a select statement in sql to get the unique rows.
I need the unique Reference ID but since Call Time is also unique, I only need to get the first row out of the similar rows.
I have this table[Calls]:
The result should be:
When I used:
Select Distinct * FROM Calls
It will return the same table and not the result I want.
This may help you.
min(date) is the first datetime for each individual:
Select referenceid,min(date),number from calls
group by referenceid,number
Perhaps a simple GROUP BY:
SELECT ReferenceID,
MIN(CallTime) AS CallTime,
MIN(Number) AS Number
FROM dbo.TableName t
GROUP BY ReferenceID
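One thing worth checking with the per-column MIN approach is that each MIN is computed independently, so the combined output row is not guaranteed to exist in the table. The sketch below illustrates this on SQLite with hypothetical rows (the data and the use of SQLite are assumptions) and compares it with a ROW_NUMBER() query that returns the actual first row per ReferenceID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data for the Calls table.
    CREATE TABLE Calls (ReferenceID INTEGER, CallTime TEXT, Number TEXT);
    INSERT INTO Calls VALUES
        (1, '09:00', '555-0200'),
        (1, '10:00', '555-0100'),
        (2, '11:00', '555-0300');
""")

# Each MIN is evaluated on its own column, so values can come from
# different source rows.
per_column_min = conn.execute("""
    SELECT ReferenceID, MIN(CallTime), MIN(Number)
    FROM Calls
    GROUP BY ReferenceID
    ORDER BY ReferenceID
""").fetchall()

# ROW_NUMBER keeps whole rows together: the earliest call per ReferenceID.
first_rows = conn.execute("""
    SELECT ReferenceID, CallTime, Number
    FROM (
        SELECT c.*,
               ROW_NUMBER() OVER (
                   PARTITION BY ReferenceID
                   ORDER BY CallTime
               ) AS rn
        FROM Calls AS c
    ) AS x
    WHERE rn = 1
    ORDER BY ReferenceID
""").fetchall()

print(per_column_min)
print(first_rows)
```

For ReferenceID 1 the GROUP BY version pairs the earliest CallTime with a Number from the later call, while the ROW_NUMBER() version returns the first call unchanged. If the other columns are identical within each group, both queries agree.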

Group by or Distinct - But several fields

How can I use a Distinct or Group by statement on 1 field with a SELECT of All or at least several ones?
Example: Using SQL SERVER!
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works perfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and just get the corresponding rows;
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
An SQLfiddle to test with.
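Both answers above return one row per nom_fr, but they do not necessarily return the same row: the ROW_NUMBER() query keeps the row with the lowest id_mark, while the MIN(id_product) subquery keeps the row with the lowest id_product. A small sketch on SQLite, using a hypothetical three-column cut of C_Product_Tempo with made-up rows, shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical subset of the columns, with made-up rows.
    CREATE TABLE C_Product_Tempo (
        id_product INTEGER PRIMARY KEY, nom_fr TEXT, id_mark INTEGER);
    INSERT INTO C_Product_Tempo VALUES
        (1, 'chaise', 7),
        (2, 'table',  3),
        (3, 'chaise', 2),
        (4, 'lampe',  5),
        (5, 'table',  9);
""")

# Lowest id_product per nom_fr, then fetch the matching full rows.
by_min_id = conn.execute("""
    SELECT * FROM C_Product_Tempo
    WHERE id_product IN (
        SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
    )
    ORDER BY id_product
""").fetchall()

# First row per nom_fr when ordered by id_mark.
by_row_number = conn.execute("""
    SELECT id_product, nom_fr, id_mark
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (
                   PARTITION BY nom_fr ORDER BY id_mark
               ) AS rn
        FROM C_Product_Tempo AS p
    ) AS d
    WHERE rn = 1
    ORDER BY id_product
""").fetchall()

print(by_min_id)
print(by_row_number)
```

Which one is right depends on which row you consider the "first" for a given nom_fr; the IN subquery also relies on id_product being unique.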
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields, do you want the values from a particular record if there is more than one with the same nom_fr?
Some SQL Systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not - you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so there is little point in grouping without counting, summing, and so on. DISTINCT eliminates duplicates, but I can't see how that fits with the other columns you want to extract, because some rows will be removed from the result.

select group by in sql multiple columns

I have three columns in one table (code, code alt and product). The code column has duplicate data. I want to keep all results without repeating values in the code column. I tried this:
Select code, code alt, product from table
where code in
(
select code from table
group by code
having count (code)=1
)
but not all results appear.
Thanks
If you want to keep only one row for each code out of the rows that share it, you need to decide which of those rows to keep.
You need some criterion by which you can rank rows with the same code and select one of them, for example the one with the best rank. The script below keeps only one (random) row per code.
This is just an example to show the idea, and it is intended for SQL Server, because you did not specify your DBMS:
with [src] as (
select code, [code alt], product, rank() over(partition by code order by newid()) [rank]
from [table])
select * from [src] where [rank] = 1
Ranking Functions