How to use DISTINCT used while selecting all columns including sequence number column? - sql

My query is to avoid duplicate in a particular column while selecting all columns. But DISTINCT is not working since seq.number column is also being selected.
Any idea to make the query work
In the below example query seq_num is unique key.
Edit: including sample data in picture
select DISTINCT(name), seq_num from table_1;![enter image description here](https://i.stack.imgur.com/Y3NYn.jpg)

For two columns this query will be enough:
SELECT name, min(seq_num)
FROM table
GROUP BY name
For more column, use row_number analytic functon
SELECT name, col1, col2, .... col500, seq_num
FROM (
SELECT t.*, row_number() over (partition by name order by seq_num ) As rn
FROM table t
)
WHERE rn = 1
The above queries pick only one row with a given name and the smallest seq_num value for each name.

You cannot do what you want. Please read more about DISTINCT clause and query result set. You will understand that distinct is not suitable for your issue. If you provide some sample data for what you have and what should query show, when possible we will help you.

Related

SQL Query for multiple columns with one column distinct

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!
You can use ROW_NUMBER(). For example
select *
from (
select *,
row_number() over(partition by case_number) as rn
) x
where rn = 1
The query above will pseudo-randomly pick one row for each case_number. If you want a better selection criteria you can add ORDER BY or window frames to the OVER clause.

SQL Query - Rank showing only 1 rank for all records

I am trying to perform ranking based on some calculation of already existing columns. I tried using the SQL RANK() function however it is showing the result as 1 for all entries even if the value of the order by (score column) is different. Please see the details below:
qu_point and ti_points are calculated columns
score column is again a derived column, however, simply sum of two columns mentioned in point 1.
I have used the SQL query as follow:
use EFR_DB
GO
select d.serial, d.question_set_id, d.correct_answers, d.total_questions, d.time_taken_seconds, q.total_time_in_secs,
(cast(d.correct_answers as float)/d.total_questions) as qu_point, ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs) as ti_point,
(((cast(d.correct_answers as float)/d.total_questions)*2) + ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs)) as score,
rank() over (partition by d.question_set_id order by score)
from daily_quiz_record d join Question_set q
on q.question_set_id=d.question_set_id
Please help me how can I do the raking which is partitioned by question_set_id and ranked on the basis of the score.
Screenshot attached for your reference.
enter image description here
You can’t use an alias defined in the select clause in the same clause. I suppose that one of your table has a column called score, otherwise your query would error - so this existing column is being used for ordering instead of the computed value.
Since your expression is lengthy, it is simpler to turn the query to a subquery, and rank in the outer query:
select
t.*,
rank() over(partition by question_set_id order by score) rn
from (
-- your existing query (without rank)
) t

Return only the newest rows from a BigQuery table with a duplicate items

I have a table with many duplicate items – Many rows with the same id, perhaps with the only difference being a requested_at column.
I'd like to do a select * from the table, but only return one row with the same id – the most recently requested.
I've looked into group by id but then I need to do an aggregate for each column. This is easy with requested_at – max(requested_at) as requested_at – but the others are tough.
How do I make sure I get the value for title, etc that corresponds to that most recently updated row?
I suggest a similar form that avoids a sort in the window function:
SELECT *
FROM (
SELECT
*,
MAX(<timestamp_column>)
OVER (PARTITION BY <id_column>)
AS max_timestamp,
FROM <table>
)
WHERE <timestamp_column> = max_timestamp
Try something like this:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY <id_column>
ORDER BY <timestamp column> DESC)
row_number,
FROM <table>
)
WHERE row_number = 1
Note it will add a row_number column, which you might not want. To fix this, you can select individual columns by name in the outer select statement.
In your case, it sounds like the requested_at column is the one you want to use in the ORDER BY.
And, you will also want to use allow_large_results, set a destination table, and specify no flattening of results (if you have a schema with repeated fields).

How to read the maximum date in this SQL query?

Below is the image of the query result. I want to show Tucson/Boulder only once based on maximum 'addressvalidfrom'. How can I create/modify the query?
If you do not want to use grouping (to persist the rest of the query) you can add a ROW_NUMBER column and filter it where it is 1.
Example
SELECT * FROM
( -- insert your query here with new line below in the select fields
, ROW_NUMBER() OVER (PARTITION BY CUST_RETAIL_CHANNEL_NAME ORDER BY addressvalidfrom DESC) AS Rnk
) D
WHERE D.Rnk=1
use a max for the addressvalidfrom field, and a group by for the other fields.
I can show you if you post the actual query.
http://www.w3schools.com/sql/sql_groupby.asp
where the aggregate is your max(addressvalidfrom)
Can you also post what you want to get as a result if possible.

SQL Query Returns different results based on the number of columns selected

Hello
I am writing a query and am little confused about the results i'm getting.
select distinct(serial_number)
from AssyQC
This query returns 309,822 results
However if I modify the select statement to include a different column as follows
select distinct(serial_number), SCAN_TIME
from AssyQC
The query returns 309,827 results. The more columns I add the more results show up.
I thought the results would be bound to only the distinct serial_number that were returned initially. That is what I want, only the distinct serial_numbers
Can anyone explain this behavior to me?
Thanks
SELECT distinct applies to the whole selected column list not just serial_number.
The more columns you add then clearly the more unique combinations you are getting.
Edit
From your comment on Cade's answer
let's say i wanted the largest/latest
time stamp
this is what you neeed.
SELECT serial_number, MAX(SCAN_TIME) AS SCAN_TIME
FROM AssyQC
GROUP BY serial_number
Or if you want additional columns
;WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY serial_number
ORDER BY SCAN_TIME DESC) AS RN
FROM AssyQC
)
SELECT *
FROM CTE
WHERE RN=1
you're probably looking for
select distinct on serial_number serial_number, SCAN_TIME from AssyQC
See this related question:
SQL/mysql - Select distinct/UNIQUE but return all columns?