DISTINCT for only one Column and other column random?

I have a table named Demodata with two columns, col1 and col2. The data in the table is:
col1 col2
1    5
1    6
2    7
3    8
3    9
4    10
After the SELECT, we need this output:
col1 col2
1    5
     6
2    7
3    8
     9
4    10
Is this possible? If so, what would the query be?

Try this:
SELECT CASE WHEN RN > 1 THEN NULL ELSE Col1 END, Col2
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS RN
FROM Demodata
) AS T
ORDER BY T.col1, T.col2

No, it is not possible.
SQL Server result sets are row-based, not tree-based. Every row must have a value in each column (possibly NULL).
What you can do is group by col1 and run an aggregate function on the values of col2 (for example STRING_AGG in SQL Server 2017 and later, or the older STUFF ... FOR XML PATH idiom).
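A minimal sketch of the aggregate alternative, assuming SQLite (driven from Python's built-in sqlite3 module) as a stand-in for SQL Server. SQLite's GROUP_CONCAT plays the role that STRING_AGG or STUFF/FOR XML PATH would play in SQL Server; each col1 appears once, with its col2 values collapsed into one string.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; GROUP_CONCAT is SQLite's
# string-aggregation function, not a SQL Server function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demodata (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO Demodata VALUES (?, ?)",
    [(1, 5), (1, 6), (2, 7), (3, 8), (3, 9), (4, 10)],
)

# One row per col1; the order of values inside each list is not guaranteed.
rows = conn.execute(
    "SELECT col1, GROUP_CONCAT(col2, ', ') FROM Demodata GROUP BY col1 ORDER BY col1"
).fetchall()

for col1, col2_list in rows:
    print(col1, col2_list)
```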

You can do this in SQL, using row_number():
select (case when row_number() over (partition by col1 order by col2) = 1
then col1
end), col2
from Demodata t
order by col1, col2;
Notice that the ordering is important. The way you have written the result set, the data is ordered by col1 and then col2. Result sets do not have an inherent ordering, unless you include an order by clause.
Also, I have used NULL for the missing values.
And, finally, although this can be done in SQL, it is often preferable to do these types of manipulations on the client side.
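A minimal runnable sketch of the row_number() approach, assuming SQLite 3.25 or later (for window functions) via Python's sqlite3 module as a stand-in for SQL Server. The output column is named col1_shown here (a hypothetical alias) so that the outer ORDER BY unambiguously sorts on the table's col1.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demodata (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO Demodata VALUES (?, ?)",
    [(1, 5), (1, 6), (2, 7), (3, 8), (3, 9), (4, 10)],
)

# Keep col1 only on the first row (lowest col2) of each col1 group; the CASE
# has no ELSE, so later rows get NULL. The outer ORDER BY fixes the row order.
rows = conn.execute(
    """
    SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) = 1
                THEN col1
           END AS col1_shown,
           col2
    FROM Demodata
    ORDER BY col1, col2
    """
).fetchall()

for col1_shown, col2 in rows:
    # Render NULL as a blank to mimic the layout in the question.
    print(f"{'' if col1_shown is None else col1_shown:<5}{col2}")
```

Printed, this reproduces the layout from the question, with blanks where col1 repeats.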

What do you want to show for the duplicates: an empty string, NULL, 0, ...?
I presume NULL. You can use a CTE with ROW_NUMBER and a CASE on col1:
WITH CTE AS(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY col1
ORDER BY (SELECT 1))
, col1, col2
FROM Demodata
)
SELECT col1 = CASE WHEN RN = 1 THEN col1 ELSE NULL END, col2
FROM CTE

Related

How can I use a COUNT(DISTINCT var) to return the count of unique values per group?

I need to return a count of unique values, but unique per group of the result set, not unique to the entire result set. For example I would like the following code:
SELECT col1 AS letters, count(DISTINCT col2) AS numbers
GROUP BY col1;
applied to this data:
col1 col2
a 5
a 5
a 6
b 1
b 2
b 6
To return this:
col1 col2
a 2
b 3
If the above code will not produce this, how can I accomplish this in T-SQL?

I hope this works for your solution: you need to group by col1 and use COUNT(DISTINCT col2).
SELECT
col1,
COUNT(DISTINCT col2)
FROM
count_unique_values_per_group
GROUP BY
col1
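A minimal sketch that runs this query on the sample data from the question, assuming SQLite via Python's sqlite3 module as a stand-in for SQL Server (COUNT(DISTINCT ...) with GROUP BY behaves the same way here).

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE count_unique_values_per_group (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO count_unique_values_per_group VALUES (?, ?)",
    [("a", 5), ("a", 5), ("a", 6), ("b", 1), ("b", 2), ("b", 6)],
)

# GROUP BY col1 makes COUNT(DISTINCT col2) count unique col2 values per group.
rows = conn.execute(
    """
    SELECT col1, COUNT(DISTINCT col2) AS numbers
    FROM count_unique_values_per_group
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()
print(rows)
```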

Try this:
SELECT DISTINCT col1
,dense_rank() over (partition by col1 order by col2 asc) + dense_rank() over (partition by col1 order by col2 desc) - 1
FROM my_table
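A minimal sketch comparing the DENSE_RANK trick with COUNT(DISTINCT), assuming SQLite 3.25 or later via Python's sqlite3 module as a stand-in for SQL Server. The ('c', 5) and ('c', NULL) rows are hypothetical, added to show one difference: DENSE_RANK ranks NULL like any other value, while COUNT(DISTINCT ...) ignores it.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for SQL Server. The 'c' rows are
# hypothetical, added only to show how NULLs are treated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", 5), ("a", 5), ("a", 6), ("b", 1), ("b", 2), ("b", 6), ("c", 5), ("c", None)],
)

# Ascending rank + descending rank - 1 equals the number of distinct
# col2 values in the partition, on every row of that partition.
dense_rank_rows = conn.execute(
    """
    SELECT DISTINCT col1,
           DENSE_RANK() OVER (PARTITION BY col1 ORDER BY col2 ASC)
         + DENSE_RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) - 1 AS n
    FROM my_table
    ORDER BY col1
    """
).fetchall()

count_distinct_rows = conn.execute(
    "SELECT col1, COUNT(DISTINCT col2) FROM my_table GROUP BY col1 ORDER BY col1"
).fetchall()

print(dense_rank_rows)      # NULL is counted as a value here
print(count_distinct_rows)  # COUNT(DISTINCT ...) ignores NULL
```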

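Concatenating the two columns also gives the unique count, because col1 is constant within each group. Hope this helps.
SELECT col1, count(distinct col1 + col2) FROM table_name group by col1;
or
SELECT col1, count(distinct concat(col1,col2)) FROM table_name group by col1;
The + form only works when both columns are character types: in SQL Server, if col2 is numeric, + tries to convert col1 to a number and fails for values like 'a'. CONCAT converts its arguments to strings first.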
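A minimal sketch of the concatenation variant, assuming SQLite via Python's sqlite3 module as a stand-in for SQL Server, with SQLite's || operator standing in for CONCAT (older SQLite versions have no concat() function). Since col1 is fixed inside each group, the distinct concatenations line up with the distinct col2 values.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server, and || stands in for CONCAT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("a", 5), ("a", 5), ("a", 6), ("b", 1), ("b", 2), ("b", 6)],
)

# 'a' || 5 becomes the text 'a5', so duplicates collapse exactly as col2 does.
rows = conn.execute(
    """
    SELECT col1, COUNT(DISTINCT col1 || col2)
    FROM table_name
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()
print(rows)
```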

SQL DISTINCT based on a single column, but keep all columns as output

--mytable
col1 col2 col3
1 A red
2 A green
3 B purple
4 C blue
Let's call the table above mytable. I want to select only distinct values from col2:
SELECT DISTINCT
col2
FROM
mytable
When I do this the output looks like this, which is expected:
col2
A
B
C
But how do I perform the same type of query, yet keep all columns? The output would look like below. In essence I'm going through mytable looking at col2, and when there are multiple occurrences of col2 I'm only keeping the first row.
col1 col2 col3
1 A red
3 B purple
4 C blue
Do SQL functions (e.g. DISTINCT) have arguments I could set? I could imagine something like KeepAllColumns = TRUE for this DISTINCT function. Or do I need to perform JOINs to get what I want?

You can use window functions, particularly row_number():
select t.*
from (select t.*, row_number() over (partition by col2 order by col1) as seqnum
from mytable t
) t
where seqnum = 1;
row_number() enumerates the rows within each col2 group, starting with 1. The order by inside over() controls which row gets 1, so you can choose the oldest, earliest, biggest, smallest, and so on; ordering by col1 keeps the first row, as in your example.
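A minimal sketch that runs this query on the question's data, assuming SQLite 3.25 or later via Python's sqlite3 module as a stand-in for the asker's database. The inner query numbers rows per col2 by col1; the outer query keeps row 1 and lists only the original columns.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", "red"), (2, "A", "green"), (3, "B", "purple"), (4, "C", "blue")],
)

# Number the rows within each col2 group by col1, then keep row 1,
# i.e. the row with the smallest col1 for every distinct col2.
rows = conn.execute(
    """
    SELECT col1, col2, col3
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col1) AS seqnum
          FROM mytable t) AS ranked
    WHERE seqnum = 1
    ORDER BY col1
    """
).fetchall()
print(rows)
```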

You can use the QUALIFY clause in Teradata:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col1) = 1 -- Get 1st row per group (smallest col1)
If you want to change the ordering for how to determine which col2 row to get, just change the expression in the ORDER BY.

With NOT EXISTS:
select m.* from mytable m
where not exists (
select 1 from mytable
where col2 = m.col2 and col1 < m.col1
)
This code will return the rows for which there is not another row with the same col2 and a smaller value in col1.
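A minimal sketch of the NOT EXISTS version, assuming SQLite via Python's sqlite3 module as a stand-in for the asker's database. It needs no window functions, and on the question's data it returns the same three rows as the row_number() approach.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", "red"), (2, "A", "green"), (3, "B", "purple"), (4, "C", "blue")],
)

# Keep a row only if no other row has the same col2 and a smaller col1.
rows = conn.execute(
    """
    SELECT m.*
    FROM mytable m
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable
        WHERE col2 = m.col2 AND col1 < m.col1
    )
    ORDER BY m.col1
    """
).fetchall()
print(rows)
```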

How to select the min value from a table if two rows have unique values and the rest of the columns are identical

Example input:
ID Col1 Col2 Col3
-- ---- ---- ----
1 a a sql
2 a a hive
Output:
ID Col1 Col2 Col3
-- ---- ---- ----
1 a a sql
Here the ID and Col3 values are unique, but I need to filter on the minimum ID and return all columns of that row.
I know the approach below works, but please suggest a better one if there is:
select Col1,Col2,min(ID) from table group by Col1,Col2;
and then join this back to the table on ID, Col1, Col2.

I think you want row_number():
select t.*
from (select t.*, row_number() over (partition by col1, col2 order by id) as seqnum
from t
) t
where seqnum = 1
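A minimal sketch comparing this window-function query with the group-and-join approach from the question, assuming SQLite 3.25 or later via Python's sqlite3 module as a stand-in for Hive. The table name t is hypothetical. On the example data both return the ID 1 row.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Hive; the table name t is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Col1 TEXT, Col2 TEXT, Col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "a", "a", "sql"), (2, "a", "a", "hive")],
)

# Window approach: keep the lowest-ID row of each (Col1, Col2) group.
window_rows = conn.execute(
    """
    SELECT ID, Col1, Col2, Col3
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY ID) AS seqnum
          FROM t) AS x
    WHERE seqnum = 1
    """
).fetchall()

# The asker's approach: aggregate MIN(ID), then join back for the other columns.
join_rows = conn.execute(
    """
    SELECT t.ID, t.Col1, t.Col2, t.Col3
    FROM t
    JOIN (SELECT Col1, Col2, MIN(ID) AS min_id
          FROM t GROUP BY Col1, Col2) AS g
      ON t.ID = g.min_id AND t.Col1 = g.Col1 AND t.Col2 = g.Col2
    """
).fetchall()

print(window_rows, join_rows)
```

The window version reads the table once and needs no join; the join version works on engines without window functions.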

It appears that Hive supports ROW_NUMBER. Though I've never used Hive, other RDBMSs would use it like this to get the entire contents of the min row without needing a join (it doesn't suffer problems if there are repeated minimum values):
SELECT a.* FROM
(
SELECT *, ROW_NUMBER() OVER(ORDER BY id) rn FROM yourtable
) a
WHERE a.rn = 1
The inner query selects all the table data and establishes an incrementing counter in order of ID. It could be based on any column; here the min ID gets row number 1. If you wanted the max, order by ID desc.
If you want the numbering to restart for different values of another column (e.g. if ten of your rows had Col3 = “sql” and twenty had “hive”), you can say PARTITION BY col3 ORDER BY id, and the row number will be a counter that increments for identical values of col3, restarting from 1 for each distinct value of col3.
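A minimal sketch contrasting the two forms, assuming SQLite 3.25 or later via Python's sqlite3 module as a stand-in for Hive. The third row (ID 3) is hypothetical, added so that one Col3 value repeats: without PARTITION BY there is a single rn = 1 row for the whole table, while PARTITION BY col3 yields one rn = 1 row per col3 value.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Hive; row ID 3 is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(1, "a", "a", "sql"), (2, "a", "a", "hive"), (3, "a", "a", "sql")],
)

# No PARTITION BY: one counter over the whole table.
overall = conn.execute(
    """
    SELECT a.id, a.col1, a.col2, a.col3
    FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM yourtable) AS a
    WHERE a.rn = 1
    """
).fetchall()

# PARTITION BY col3: the counter restarts at 1 for each col3 value.
per_col3 = conn.execute(
    """
    SELECT a.id, a.col1, a.col2, a.col3
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY col3 ORDER BY id) AS rn
          FROM yourtable) AS a
    WHERE a.rn = 1
    ORDER BY a.id
    """
).fetchall()

print(overall)
print(per_col3)
```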

HAVING clause: at least one of the ungrouped values is X

Example table:
Col1 | Col2
A | Apple
A | Banana
B | Apple
C | Banana
Output:
A
I want to get all values of Col1 which have more than one entry and at least one with Banana.
I tried to use GROUP BY:
SELECT Col1
FROM Table
GROUP BY Col1
HAVING count(*) > 1
AND ??? some kind of ONEOF(Col2) = 'Banana'
How can I rephrase the HAVING clause so that my query works?

Use conditional aggregation:
SELECT Col1
FROM Table
GROUP BY Col1
HAVING COUNT(DISTINCT col2) > 1 AND
COUNT(CASE WHEN col2 = 'Banana' THEN 1 END) >= 1
You can conditionally check for Col1 groups having at least one 'Banana' value by using COUNT with a CASE expression inside it.
Please note that the first COUNT has to use DISTINCT, so that only groups with at least two different Col2 values are selected. If "more than one entry" should also include rows that repeat the same Col2 value, drop the DISTINCT.
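A minimal sketch of the conditional-aggregation query on the question's data, assuming SQLite via Python's sqlite3 module as a stand-in for the asker's database. The table name fruit_table is hypothetical, because TABLE is a reserved word.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; fruit_table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_table (Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO fruit_table VALUES (?, ?)",
    [("A", "Apple"), ("A", "Banana"), ("B", "Apple"), ("C", "Banana")],
)

rows = conn.execute(
    """
    SELECT Col1
    FROM fruit_table
    GROUP BY Col1
    HAVING COUNT(DISTINCT Col2) > 1                          -- at least two different fruits
       AND COUNT(CASE WHEN Col2 = 'Banana' THEN 1 END) >= 1  -- at least one Banana
    """
).fetchall()
print(rows)
```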

SELECT Col1
FROM Table
GROUP BY Col1
HAVING count(*) > 1
AND Col1 in (select distinct Col1 from Table where Col2 = 'Banana');
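A minimal sketch of the IN-subquery version, assuming SQLite via Python's sqlite3 module as a stand-in for the asker's database (fruit_table is a hypothetical name). The subquery collects every Col1 that has a Banana row, and HAVING keeps groups with more than one row that appear in that set.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; fruit_table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_table (Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO fruit_table VALUES (?, ?)",
    [("A", "Apple"), ("A", "Banana"), ("B", "Apple"), ("C", "Banana")],
)

rows = conn.execute(
    """
    SELECT Col1
    FROM fruit_table
    GROUP BY Col1
    HAVING COUNT(*) > 1
       AND Col1 IN (SELECT DISTINCT Col1 FROM fruit_table WHERE Col2 = 'Banana')
    """
).fetchall()
print(rows)
```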

Here is a simple approach:
SELECT Col1
FROM table
GROUP BY Col1
HAVING COUNT(DISTINCT CASE WHEN col2= 'Banana' THEN 1 ELSE 2 END) = 2
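A minimal sketch showing what this condition actually tests, assuming SQLite via Python's sqlite3 module as a stand-in for the asker's database (fruit_table is a hypothetical name). The CASE maps Banana to 1 and everything else to 2, so "= 2" means "at least one Banana and at least one non-Banana row". The two ('D', 'Banana') rows are hypothetical, added to show where this differs from COUNT(*) > 1 plus a Banana check.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; the 'D' rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_table (Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO fruit_table VALUES (?, ?)",
    [("A", "Apple"), ("A", "Banana"), ("B", "Apple"), ("C", "Banana"),
     ("D", "Banana"), ("D", "Banana")],
)

# Both CASE results (1 and 2) must occur in the group.
case_rows = conn.execute(
    """
    SELECT Col1 FROM fruit_table
    GROUP BY Col1
    HAVING COUNT(DISTINCT CASE WHEN Col2 = 'Banana' THEN 1 ELSE 2 END) = 2
    ORDER BY Col1
    """
).fetchall()

# More than one row, at least one of them Banana.
count_rows = conn.execute(
    """
    SELECT Col1 FROM fruit_table
    GROUP BY Col1
    HAVING COUNT(*) > 1
       AND Col1 IN (SELECT Col1 FROM fruit_table WHERE Col2 = 'Banana')
    ORDER BY Col1
    """
).fetchall()

print(case_rows, count_rows)
```

Group D has two rows and a Banana, but no non-Banana row, so only the second query returns it.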

Try this:
declare @t table(Col1 varchar(20), Col2 varchar(20))
insert into @t values('A','Apple')
,('A','Banana'),('B','Apple'),('C','Banana')
select col1 from @t A
where exists
(select col1 from @t B where a.col1=b.col1 and b.Col2='Banana')
group by col1
having count(*)>1

Return distinct rows from not entirely distinct results

Two columns: the first is distinct, the second not so much.
Col1 ---- Col2
1 ---- abc
1 ---- abc (123)
2 ---- def
2 ---- def (324)
etc
I need to bring back distinct records, but only the ones with the longer Col2.
I've tried using the CONTAINS function, but my table isn't full-text indexed.

One option is to use ROW_NUMBER(), ordering by the LEN() of Col2:
SELECT *
FROM (
SELECT Col1, Col2, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY LEN(Col2) DESC) rn
FROM YourTable
) t
WHERE rn = 1
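A minimal sketch that runs this query on the question's sample rows, assuming SQLite 3.25 or later via Python's sqlite3 module as a stand-in for SQL Server. SQLite has no LEN(), so LENGTH() is used instead; unlike SQL Server's LEN(), it also counts trailing spaces.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for SQL Server; LENGTH() replaces LEN().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "abc"), (1, "abc (123)"), (2, "def"), (2, "def (324)")],
)

# Rank each Col1 group's rows by string length, longest first, and keep the top one.
rows = conn.execute(
    """
    SELECT Col1, Col2
    FROM (SELECT Col1, Col2,
                 ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY LENGTH(Col2) DESC) AS rn
          FROM YourTable) t
    WHERE rn = 1
    ORDER BY Col1
    """
).fetchall()
print(rows)
```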

SELECT col1 ,
col2
FROM ( SELECT col1 ,
col2 ,
Rank() OVER ( PARTITION BY col1 ORDER BY col2 DESC ) row
FROM dbo.YourTable
) t
WHERE row = 1
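You can also try this. Note that ORDER BY col2 DESC sorts alphabetically, not by length: it returns the longer value in the sample data only because each longer value starts with the shorter one. RANK() can also return more than one row per col1 when col2 values tie.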
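A minimal sketch contrasting alphabetical and length-based ordering, assuming SQLite 3.25 or later via Python's sqlite3 module as a stand-in for SQL Server. The alias rnk and the table name YourTable are assumptions, and the group 3 rows are hypothetical, added to show a case where the longer value does not sort last alphabetically.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for SQL Server; group 3 is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "abc"), (1, "abc (123)"), (2, "def"), (2, "def (324)"),
     (3, "xyz"), (3, "ab (longer)")],
)

def top_per_group(order_expr):
    # Keep the rank-1 row of each col1 group under the given ordering.
    return conn.execute(
        f"""
        SELECT col1, col2
        FROM (SELECT col1, col2,
                     RANK() OVER (PARTITION BY col1 ORDER BY {order_expr}) AS rnk
              FROM YourTable) t
        WHERE rnk = 1
        ORDER BY col1
        """
    ).fetchall()

by_text = top_per_group("col2 DESC")            # alphabetical, as in the answer
by_length = top_per_group("LENGTH(col2) DESC")  # longest string first

print(by_text)
print(by_length)
```

For groups 1 and 2 the two orderings agree; for group 3 the alphabetical ordering picks 'xyz'.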
