SQL Query Returns different results based on the number of columns selected - sql

Hello,
I am writing a query and am a little confused by the results I'm getting.
select distinct(serial_number)
from AssyQC
This query returns 309,822 results
However, if I modify the select statement to include another column, as follows:
select distinct(serial_number), SCAN_TIME
from AssyQC
The query returns 309,827 results. The more columns I add the more results show up.
I thought the results would be limited to the distinct serial_number values returned initially. That is what I want: only the distinct serial_numbers.
Can anyone explain this behavior to me?
Thanks

SELECT DISTINCT applies to the whole selected column list, not just serial_number. The parentheses around serial_number have no effect.
Each column you add can only increase the number of unique combinations, so the row count grows.
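To make the effect concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The sample rows are hypothetical and only the two columns from the question are modeled.

```python
import sqlite3

# In-memory database with made-up rows; the real AssyQC table is much larger.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AssyQC (serial_number TEXT, SCAN_TIME TEXT)")
conn.executemany(
    "INSERT INTO AssyQC VALUES (?, ?)",
    [
        ("A1", "2020-01-01 08:00"),
        ("A1", "2020-01-01 08:00"),  # exact duplicate row
        ("A1", "2020-01-02 09:30"),  # same serial, different scan time
        ("B2", "2020-01-01 10:15"),
    ],
)

# DISTINCT over one column: one row per serial_number.
one_col = conn.execute(
    "SELECT DISTINCT serial_number FROM AssyQC"
).fetchall()

# DISTINCT over two columns: one row per (serial_number, SCAN_TIME) pair.
two_col = conn.execute(
    "SELECT DISTINCT serial_number, SCAN_TIME FROM AssyQC"
).fetchall()

print(len(one_col), len(two_col))  # prints: 2 3
```

Serial A1 appears with two different scan times, so it contributes two rows once SCAN_TIME is part of the select list.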
Edit
From your comment on Cade's answer:
"let's say i wanted the largest/latest time stamp"
This is what you need:
SELECT serial_number, MAX(SCAN_TIME) AS SCAN_TIME
FROM AssyQC
GROUP BY serial_number
Or if you want additional columns
;WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY serial_number
ORDER BY SCAN_TIME DESC) AS RN
FROM AssyQC
)
SELECT *
FROM CTE
WHERE RN=1
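The CTE form can be tried end to end with the sketch below. It assumes SQLite 3.25 or later (for window function support) and invents a `station` column purely to show that extra columns come from the same row as the latest SCAN_TIME.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "station" is a hypothetical extra column, not part of the original question.
conn.execute(
    "CREATE TABLE AssyQC (serial_number TEXT, SCAN_TIME TEXT, station TEXT)"
)
conn.executemany(
    "INSERT INTO AssyQC VALUES (?, ?, ?)",
    [
        ("A1", "2020-01-01 08:00", "S1"),
        ("A1", "2020-01-02 09:30", "S2"),  # latest scan for A1
        ("B2", "2020-01-01 10:15", "S1"),
    ],
)

# Number the rows of each serial_number from newest to oldest, keep the newest.
latest = conn.execute(
    """
    WITH CTE AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY serial_number
                                  ORDER BY SCAN_TIME DESC) AS RN
        FROM AssyQC
    )
    SELECT serial_number, SCAN_TIME, station
    FROM CTE
    WHERE RN = 1
    ORDER BY serial_number
    """
).fetchall()

print(latest)
```

Each serial_number comes back exactly once, and the `station` value belongs to the row with the latest scan rather than being aggregated separately.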

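You're probably looking for DISTINCT ON. Note that this syntax is PostgreSQL-specific and requires parentheses around the column:
select distinct on (serial_number) serial_number, SCAN_TIME from AssyQC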
See this related question:
SQL/mysql - Select distinct/UNIQUE but return all columns?

Related

SQL Query for multiple columns with one column distinct

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!
You can use ROW_NUMBER(). For example
select *
from (
select *,
row_number() over(partition by case_number) as rn
from ticket
) x
where rn = 1
The query above will pick an arbitrary row for each case_number. If you want a better selection criterion, add an ORDER BY to the OVER clause. Some databases, such as SQL Server, require that ORDER BY for ROW_NUMBER().
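As a sketch of the deterministic variant, the example below orders each partition by date_entered so the earliest ticket per case_number is kept. It uses Python's sqlite3 module (SQLite 3.25+ assumed) with hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (case_number INTEGER, name TEXT, date_entered TEXT)"
)
conn.executemany(
    "INSERT INTO ticket VALUES (?, ?, ?)",
    [
        (100, "alice", "2021-03-01"),
        (100, "bob", "2021-03-05"),   # duplicate case_number, entered later
        (101, "carol", "2021-03-02"),
    ],
)

# ORDER BY inside OVER decides which duplicate gets rn = 1 (here: the earliest).
picked = conn.execute(
    """
    SELECT case_number, name, date_entered
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY case_number
                                  ORDER BY date_entered) AS rn
        FROM ticket t
    ) x
    WHERE rn = 1
    ORDER BY date_entered
    """
).fetchall()

print(picked)
```

Switching to `ORDER BY date_entered DESC` inside the OVER clause would keep the most recent ticket per case instead.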

How to use DISTINCT used while selecting all columns including sequence number column?

My query is to avoid duplicate in a particular column while selecting all columns. But DISTINCT is not working since seq.number column is also being selected.
Any idea to make the query work
In the below example query seq_num is unique key.
Edit: including sample data in picture
select DISTINCT(name), seq_num from table_1;
For two columns, this query will be enough:
SELECT name, min(seq_num)
FROM table_1
GROUP BY name
For more columns, use the row_number analytic function:
SELECT name, col1, col2, .... col500, seq_num
FROM (
SELECT t.*, row_number() over (partition by name order by seq_num ) As rn
FROM table_1 t
)
WHERE rn = 1
The above queries pick only one row with a given name and the smallest seq_num value for each name.
You cannot do this with DISTINCT alone. DISTINCT applies to the entire result row, so it is not suitable for this problem. If you provide some sample data showing what you have and what the query should return, we can help further.

Return only the newest rows from a BigQuery table with a duplicate items

I have a table with many duplicate items – Many rows with the same id, perhaps with the only difference being a requested_at column.
I'd like to do a select * from the table, but only return one row with the same id – the most recently requested.
I've looked into group by id but then I need to do an aggregate for each column. This is easy with requested_at – max(requested_at) as requested_at – but the others are tough.
How do I make sure I get the value for title, etc that corresponds to that most recently updated row?
I suggest a similar form that avoids a sort in the window function:
SELECT *
FROM (
SELECT
*,
MAX(<timestamp_column>)
OVER (PARTITION BY <id_column>)
AS max_timestamp,
FROM <table>
)
WHERE <timestamp_column> = max_timestamp
Try something like this:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY <id_column>
ORDER BY <timestamp_column> DESC)
row_number,
FROM <table>
)
WHERE row_number = 1
Note it will add a row_number column, which you might not want. To fix this, you can select individual columns by name in the outer select statement.
In your case, it sounds like the requested_at column is the one you want to use in the ORDER BY.
And, you will also want to use allow_large_results, set a destination table, and specify no flattening of results (if you have a schema with repeated fields).
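One practical difference between the two forms above is how they treat ties on the timestamp. The sketch below uses SQLite through Python's sqlite3 module rather than BigQuery, with a hypothetical `items` table, so it only illustrates the logic and not BigQuery-specific behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, title TEXT, requested_at TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "old", "2022-01-01"),
        (1, "new-a", "2022-01-05"),
        (1, "new-b", "2022-01-05"),  # ties with new-a on requested_at
        (2, "only", "2022-01-03"),
    ],
)

# MAX() OVER form: keeps every row whose timestamp equals the partition maximum.
via_max = conn.execute(
    """
    SELECT id, title
    FROM (
        SELECT *, MAX(requested_at) OVER (PARTITION BY id) AS max_timestamp
        FROM items
    ) m
    WHERE requested_at = max_timestamp
    """
).fetchall()

# ROW_NUMBER() form: keeps exactly one row per id, even when timestamps tie.
via_row_number = conn.execute(
    """
    SELECT id, title
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id
                                     ORDER BY requested_at DESC) AS rn
        FROM items
    ) r
    WHERE rn = 1
    """
).fetchall()

print(len(via_max), len(via_row_number))  # prints: 3 2
```

If duplicate timestamps are possible for the same id, the ROW_NUMBER() form is the one that guarantees a single row, although which of the tied rows it keeps is not defined without a further tie-breaker in the ORDER BY.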

Getting unique column amongst duplicate columns but returning the complete row

I need help on creating a select statement in sql to get the unique rows.
I need the unique Reference ID but since Call Time is also unique, I only need to get the first row out of the similar rows.
I have this table[Calls]:
The result should be:
When I used:
Select Distinct * FROM Calls
It will return the same table and not the result I want.
This may help you.
min(date) is the first datetime for each group:
Select referenceid,min(date),number from calls
group by referenceid,number
Perhaps a simple GROUP BY:
SELECT ReferenceID,
MIN(CallTime) AS CallTime,
MIN(Number) AS Number
FROM dbo.TableName t
GROUP BY ReferenceID
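One caveat with aggregating each column separately: the MIN values are computed independently, so the result can combine values from different rows. The sketch below shows this with Python's sqlite3 module (SQLite 3.25+ assumed) and hypothetical data for a `Calls` table, next to a ROW_NUMBER() query that returns the actual first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calls (ReferenceID INTEGER, CallTime TEXT, Number TEXT)")
conn.executemany(
    "INSERT INTO Calls VALUES (?, ?, ?)",
    [
        (1, "09:00", "555-9999"),  # first call, but larger Number
        (1, "10:00", "555-0000"),  # later call, smaller Number
    ],
)

# Independent MIN per column: CallTime and Number come from different rows.
grouped = conn.execute(
    """
    SELECT ReferenceID, MIN(CallTime) AS CallTime, MIN(Number) AS Number
    FROM Calls
    GROUP BY ReferenceID
    """
).fetchall()

# ROW_NUMBER keeps the whole first row per ReferenceID intact.
first_row = conn.execute(
    """
    SELECT ReferenceID, CallTime, Number
    FROM (
        SELECT c.*, ROW_NUMBER() OVER (PARTITION BY ReferenceID
                                       ORDER BY CallTime) AS rn
        FROM Calls c
    ) x
    WHERE rn = 1
    """
).fetchall()

print(grouped)    # [(1, '09:00', '555-0000')]  (not a row that exists in Calls)
print(first_row)  # [(1, '09:00', '555-9999')]
```

When Number is the same for every row of a ReferenceID, both queries agree; the difference only matters when the non-key columns vary within a group.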

How to read the maximum date in this SQL query?

Below is the image of the query result. I want to show Tucson/Boulder only once based on maximum 'addressvalidfrom'. How can I create/modify the query?
If you do not want to use grouping (so the rest of the query stays unchanged), you can add a ROW_NUMBER column and keep only the rows where it equals 1.
Example
SELECT * FROM
( -- insert your query here with new line below in the select fields
, ROW_NUMBER() OVER (PARTITION BY CUST_RETAIL_CHANNEL_NAME ORDER BY addressvalidfrom DESC) AS Rnk
) D
WHERE D.Rnk=1
Use MAX for the addressvalidfrom field, and GROUP BY for the other fields.
I can show you if you post the actual query.
See http://www.w3schools.com/sql/sql_groupby.asp, where the aggregate in your case is max(addressvalidfrom).
Can you also post what you want to get as a result, if possible?