sql selecting unique rows based on a specific column

I have a table like this:
Col1 Col2 Col3 Col4
asasa 1 d 44
asasa 2 sd 34
asasa 3 f 3
dssd 4 d 2
sdsdsd 5 sd 11
dssd 1 dd 34
xxxsdsds 2 d 3
erewer 3 sd 3
I am trying to filter out something like this based on Col1
Col1 Col2 Col3 Col4
asasa 1 d 44
dssd 4 d 2
sdsdsd 5 sd 11
xxxsdsds 2 d 3
erewer 3 sd 3
I am trying to get all the unique rows based on the values in Col1. If there are duplicates in Col1, the first row should be taken.
I tried SELECT Col1 FROM tblname GROUP BY Col1 and got the unique Col1 values, but extending it with * gives me an error.

You should be able to achieve your goal using something like the following:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2) AS rn FROM MyTable
)
SELECT * FROM CTE WHERE rn = 1
It creates a CTE (Common Table Expression) that adds a ROW_NUMBER partitioned by Col1 and ordered by the data in Col2.
In the outer select, we then grab only the rows from the CTE where the generated row number is 1.
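As a minimal sketch of the ROW_NUMBER approach described above, the query can be tried against an in-memory SQLite database from Python. SQLite (3.25 or later) is an assumption made only to keep the example runnable; the table name MyTable and the sample rows come from the question.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 TEXT, Col2 INTEGER, Col3 TEXT, Col4 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        ("asasa", 1, "d", 44), ("asasa", 2, "sd", 34), ("asasa", 3, "f", 3),
        ("dssd", 4, "d", 2), ("sdsdsd", 5, "sd", 11), ("dssd", 1, "dd", 34),
        ("xxxsdsds", 2, "d", 3), ("erewer", 3, "sd", 3),
    ],
)

# Number the rows inside each Col1 group, then keep only row number 1.
query = """
WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2) AS rn
    FROM MyTable
)
SELECT Col1, Col2, Col3, Col4 FROM CTE WHERE rn = 1 ORDER BY Col1
"""
first_rows = conn.execute(query).fetchall()
for row in first_rows:
    print(row)
```

Because the rows are ordered by Col2 inside each group, dssd resolves to the row with Col2 = 1 rather than the one listed first in the question. A table has no inherent row order, so "first" only means something once an ordering column is chosen.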

Try this:
;WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY (SELECT NULL)) AS RN
FROM tblname
)
SELECT Col1, Col2, Col3, Col4 FROM CTE WHERE RN = 1;

Depending on the flavor of SQL that you are using, window functions may help.
In SQL Server, this can be accomplished with the FIRST_VALUE window function like so:
DROP TABLE IF EXISTS #vals;
CREATE TABLE #vals (COL1 VARCHAR(10), COL2 INT, COL3 VARCHAR(5), COL4 INT);
INSERT INTO #vals (COL1, COL2, COL3, COL4)
VALUES ('asasa', 1, 'd', 44),
('asasa', 2, 'sd', 34),
('asasa', 3, 'f', 3),
('dssd' , 4, 'd', 2),
('sdsdsd', 5, 'sd', 11),
('dssd', 1, 'dd', 34),
('xxxsdsds', 2, 'd', 3),
('erewer', 3, 'sd', 3);
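SELECT *
FROM #vals;
-- Note: ordering by the partition column (Col1) does not define which row is "first".
-- Order by a column that captures the intended row order to make the result deterministic.
SELECT DISTINCT COL1,
FIRST_VALUE(COL2) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col2,
FIRST_VALUE(COL3) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col3,
FIRST_VALUE(COL4) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col4
FROM #vals AS v1;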
This returns:
|COL1 | Col2 | Col3 | Col4|
|-----------|-----------|-----------|-------|
|asasa | 1 | d | 44 |
|dssd | 4 | d | 2 |
|erewer | 3 | sd | 3 |
|sdsdsd | 5 | sd | 11 |
|xxxsdsds | 2 | d | 3 |
which may then be ORDERed in whatever way is needed.

SELECT DISTINCT should do the trick if you only need Col1; note that DISTINCT applies to the whole select list, not to a single column. Here is a good reference: https://www.w3schools.com/sql/sql_distinct.asp

Related

Addition with NULL values across multiple columns

Col1 Col2 Col3 SumCol
4 9 NULL 13
NULL 8 2 10
8 3 NULL 11
NULL 5 5 10
I have a table populated with columns Col1, Col2, and Col3, and I am trying to create a new column, SumCol. I know addition with NULL values is annoying so I appreciate any help

You can use either of the queries below in SQL Server:
select id, col1, col2, col3, (coalesce(col1, 0) + coalesce(col2, 0) + coalesce(col3, 0)) total
from #tbl
OR
select id, col1, col2, col3, (ISNULL(col1, 0) + ISNULL(col2, 0) + ISNULL(col3, 0)) total
from #tbl
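To see why the COALESCE wrapper matters, here is a small sketch run through Python's sqlite3 module. SQLite is used only so the example is self-contained, and the table name tbl is a stand-in; ISNULL is specific to SQL Server, so only the COALESCE form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER, col3 INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl (col1, col2, col3) VALUES (?, ?, ?)",
    [(4, 9, None), (None, 8, 2), (8, 3, None), (None, 5, 5)],
)

# Plain addition: any NULL operand makes the whole sum NULL.
plain = conn.execute("SELECT col1 + col2 + col3 FROM tbl ORDER BY id").fetchall()

# COALESCE swaps each NULL for 0 before adding.
safe = conn.execute(
    "SELECT COALESCE(col1, 0) + COALESCE(col2, 0) + COALESCE(col3, 0) "
    "FROM tbl ORDER BY id"
).fetchall()

print(plain)  # every row is NULL (None in Python)
print(safe)   # the sums from the question's SumCol column
```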

This is simple with XQuery or COALESCE().
SQL #1
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Col1 INT, Col2 INT, Col3 INT);
INSERT INTO @tbl (Col1, Col2, Col3) VALUES
( 4 , 9, NULL),
(NULL, 8, 2 ),
( 8 , 3, NULL),
(NULL, 5, 5 );
-- DDL and sample data population, end
SELECT ID, Col1, Col2, Col3
, x.value('sum(/root/*/text())', 'INT') AS Summary
FROM @tbl
CROSS APPLY (SELECT Col1, Col2, Col3 FOR XML PATH(''), TYPE, ROOT('root')) AS t(x);
SQL #2
Based on @DaleK's advice, the most common solution is below.
SELECT *
, Summary = COALESCE(Col1,0) + COALESCE(Col2,0) + COALESCE(Col3,0)
FROM @tbl;
SQL #3
A generic way tailored towards the Col1, Col2, ..., ColN scenario.
SELECT ID, Col1, Col2, Col3
, x.value('sum(/root/*[not(local-name()="ID")]/text())', 'INT') AS Summary
FROM @tbl AS p
CROSS APPLY (SELECT * FROM @tbl AS c
WHERE c.ID = p.ID
FOR XML PATH(''), TYPE, ROOT('root')) AS t(x);
Output
+----+------+------+------+---------+
| ID | Col1 | Col2 | Col3 | Summary |
+----+------+------+------+---------+
| 1 | 4 | 9 | NULL | 13 |
| 2 | NULL | 8 | 2 | 10 |
| 3 | 8 | 3 | NULL | 11 |
| 4 | NULL | 5 | 5 | 10 |
+----+------+------+------+---------+

A computed column is often easier if you want that calculation to always be available to anyone who queries the table:
ALTER TABLE YourTable
ADD SumCol AS ISNULL(Col1, 0) + ISNULL(Col2, 0) + ISNULL(Col3, 0);

T-SQL sequential updating with two columns

I have a table created by:
CREATE TABLE table1
(
id INT,
multiplier INT,
col1 DECIMAL(10,5)
)
INSERT INTO table1
VALUES (1, 2, 1.53), (2, 3, NULL), (3, 2, NULL),
(4, 2, NULL), (5, 3, NULL), (6, 1, NULL)
Which results in:
id multiplier col1
-----------------------
1 2 1.53000
2 3 NULL
3 2 NULL
4 2 NULL
5 3 NULL
6 1 NULL
I want to add a column col2 which is defined as multiplier * col1, however the next value of col1 then updates to take the previous calculated value of col2.
The resulting table should look like:
id multiplier col1 col2
---------------------------------------
1 2 1.53000 3.06000
2 3 3.06000 9.18000
3 2 9.18000 18.36000
4 2 18.36000 36.72000
5 3 36.72000 110.16000
6 1 110.16000 110.16000
Is this possible using T-SQL? I've tried a few different things such as joining id to id - 1 and have played around with a sequential update using UPDATE and setting variables but I can't get it to work.

A recursive CTE might be the best approach. Assuming your ids have no gaps:
with cte as (
select id, multiplier, convert(float, col1) as col1, convert(float, col1 * multiplier) as col2
from table1
where id = 1
union all
select t1.id, t1.multiplier, cte.col2 as col1, cte.col2 * t1.multiplier
from cte join
table1 t1
on t1.id = cte.id + 1
)
select *
from cte;
Here is a db<>fiddle.
Note that I converted the destination type to float, which is convenient for this sort of operation. You can convert back to decimal if you prefer that.
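The recursive approach above can be tried end to end with Python's sqlite3 module. This is a sketch under the assumption that SQLite stands in for SQL Server: the WITH RECURSIVE keyword and the REAL column type are SQLite's spelling, not T-SQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, multiplier INTEGER, col1 REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 2, 1.53), (2, 3, None), (3, 2, None),
     (4, 2, None), (5, 3, None), (6, 1, None)],
)

# Anchor on id = 1, then feed each row's col2 into the next row's col1.
chain = conn.execute("""
WITH RECURSIVE cte(id, multiplier, col1, col2) AS (
    SELECT id, multiplier, col1, col1 * multiplier
    FROM table1
    WHERE id = 1
    UNION ALL
    SELECT t1.id, t1.multiplier, cte.col2, cte.col2 * t1.multiplier
    FROM cte JOIN table1 AS t1 ON t1.id = cte.id + 1
)
SELECT id, multiplier, col1, col2 FROM cte ORDER BY id
""").fetchall()

for row in chain:
    print(row)
```

The join on cte.id + 1 is what makes the no-gaps assumption necessary: the recursion stops at the first missing id.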

Basically, this would require an aggregate/window function that computes the product of column values. No such set function exists in SQL, but we can work around this with arithmetic:
select
id,
multiplier,
coalesce(min(col1) over() * exp(sum(log(multiplier)) over(order by id rows between unbounded preceding and 1 preceding)), col1) col1,
min(col1) over() * exp(sum(log(multiplier)) over(order by id)) col2
from table1
Demo on DB Fiddle:
id | multiplier | col1 | col2
-: | ---------: | -----: | -----:
1 | 2 | 1.53 | 3.06
2 | 3 | 3.06 | 9.18
3 | 2 | 9.18 | 18.36
4 | 2 | 18.36 | 36.72
5 | 3 | 36.72 | 110.16
6 | 1 | 110.16 | 110.16
This will fail if there are zero or negative multipliers, because log() is only defined for positive values.
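The arithmetic behind the workaround is the identity exp(log a + log b) = a * b, so a running sum of logarithms stands in for a running product. A minimal Python sketch of that identity, using the multipliers and starting value from the question (the variable names are illustrative only):

```python
import math
from itertools import accumulate

multipliers = [2, 3, 2, 2, 3, 1]
start = 1.53  # col1 of the first row

# Running sum of logs; exp() of each prefix sum is the running product.
running_log_sums = accumulate(math.log(m) for m in multipliers)
col2 = [start * math.exp(s) for s in running_log_sums]

# Each row's col1 is the previous row's col2; the first row keeps the start value.
col1 = [start] + col2[:-1]

for c1, c2 in zip(col1, col2):
    print(round(c1, 5), round(c2, 5))
```

In Python, math.log raises ValueError for zero or negative input, which mirrors the limitation of the SQL version.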
If you wanted an update statement:
with cte as (
select col1, col2,
coalesce(min(col1) over() * exp(sum(log(multiplier)) over(order by id rows between unbounded preceding and 1 preceding)), col1) col1_new,
min(col1) over() * exp(sum(log(multiplier)) over(order by id)) col2_new
from table1
)
update cte set col1 = col1_new, col2 = col2_new

DENSE_RANK() without duplication

Here's what my data looks like:
| col1 | col2 | denserank | whatiwant |
|------|------|-----------|-----------|
| 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 |
| 3 | 2 | 2 | 2 |
| 4 | 2 | 2 | 2 |
| 5 | 1 | 1 | 3 |
| 6 | 2 | 2 | 4 |
| 7 | 2 | 2 | 4 |
| 8 | 3 | 3 | 5 |
Here's the query I have so far:
SELECT col1, col2, DENSE_RANK() OVER (ORDER BY COL2) AS [denserank]
FROM [table1]
ORDER BY [col1] asc
What I'd like to achieve is for my denserank column to increment every time there is a change in the value of col2 (even if the value itself is reused). I can't actually order by the column I have denserank on, so that won't work. See the whatiwant column for an example.
Is there any way to achieve this with DENSE_RANK()? Or is there an alternative?

I would do it with a recursive cte like this:
declare @Dept table (col1 integer, col2 integer)
insert into @Dept values(1, 1),(2, 1),(3, 2),(4, 2),(5, 1),(6, 2),(7, 2),(8, 3)
;with a as (
select col1, col2,
ROW_NUMBER() over (order by col1) as rn
from @Dept),
s as
(select col1, col2, rn, 1 as dr from a where rn=1
union all
select a.col1, a.col2, a.rn, case when a.col2=s.col2 then s.dr else s.dr+1 end as dr
from a inner join s on a.rn=s.rn+1)
select col1, col2, dr from s
result:
col1 col2 dr
----------- ----------- -----------
1 1 1
2 1 1
3 2 2
4 2 2
5 1 3
6 2 4
7 2 4
8 3 5
The ROW_NUMBER is only required in case your col1 values are not sequential. If they are, you can use the recursive CTE straight away.

Try this using window functions:
with t(col1 ,col2) as (
select 1 , 1 union all
select 2 , 1 union all
select 3 , 2 union all
select 4 , 2 union all
select 5 , 1 union all
select 6 , 2 union all
select 7 , 2 union all
select 8 , 3
)
select t.col1,
t.col2,
sum(x) over (
order by col1
) whatyouwant
from (
select t.*,
case
when col2 = lag(col2) over (
order by col1
)
then 0
else 1
end x
from t
) t
order by col1;
This does a single table read: it forms groups of consecutive equal col2 values in increasing order of col1, then numbers those groups.
x: assign 0 if the previous row's col2 is the same as this row's col2 (in order of increasing col1), otherwise 1.
whatyouwant: a running sum of x in order of increasing col1. It increments at the start of each new group, and that is your output.

Here is one way using SUM OVER(Order by) window aggregate function
SELECT col1,Col2,
Sum(CASE WHEN a.prev_val = a.col2 THEN 0 ELSE 1 END) OVER(ORDER BY col1) AS whatiwant
FROM (SELECT col1,
col2,
Lag(col2, 1)OVER(ORDER BY col1) AS prev_val
FROM Yourtable) a
ORDER BY col1;
How it works:
LAG window function is used to find the previous col2 for each row ordered by col1
SUM OVER(Order by) will increment the number only when previous col2 is not equal to current col2
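The LAG plus running SUM technique can be checked against the sample data with Python's sqlite3 module. This is a sketch assuming SQLite 3.25 or later for window function support; the table name t is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1), (6, 2), (7, 2), (8, 3)],
)

# Inner query: fetch the previous row's col2 (NULL on the first row).
# Outer query: add 1 whenever col2 differs from the previous value.
ranked = conn.execute("""
SELECT col1, col2,
       SUM(CASE WHEN prev_val = col2 THEN 0 ELSE 1 END)
           OVER (ORDER BY col1) AS whatiwant
FROM (SELECT col1, col2,
             LAG(col2) OVER (ORDER BY col1) AS prev_val
      FROM t) AS a
ORDER BY col1
""").fetchall()

for row in ranked:
    print(row)
```

On the first row prev_val is NULL, so the comparison is not true and the CASE falls through to 1, which starts the numbering at 1.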

I think this is possible in pure SQL using some gaps-and-islands tricks, but the path of least resistance might be to use a session variable combined with LAG() to keep track of when your computed dense rank changes value. In the query below, I use @a to keep track of the change in the dense rank, and when it changes this variable is incremented by 1. Note that SQL Server does not allow a SELECT to assign a variable and return columns in the same statement, so treat the query as a sketch of the idea rather than something to run as written.
DECLARE @a int
SET @a = 1
SELECT t.col1,
t.col2,
t.denserank,
@a = CASE WHEN LAG(t.denserank, 1, 1) OVER (ORDER BY t.col1) = t.denserank
THEN @a
ELSE @a+1 END AS [whatiwant]
FROM
(
SELECT col1, col2, DENSE_RANK() OVER (ORDER BY COL2) AS [denserank]
FROM [table1]
) t
ORDER BY t.col1

Get ID of a row having maximum value in other column

I have a table like
+------+-------+-------------------------------------+
| id | col2 | col3 |
+------+-------+-------------------------------------+
| 1 | 1 | 10 |
| 2 | 1 | 20 |
| 3 | 1 | 15 |
| 4 | 2 | 10 |
| 5 | 2 | 20 |
| 6 | 2 | 15 |
| 7 | 2 | 30 |
+------+-------+-------------------------------------+
I want to select Id where col3 has maximum value and col2 = 2. (id 7 in this case since it has maximum value 30 where col2=2). I tried with GROUP BY clause
SELECT id, MAX(col3) FROM table_name
WHERE col2 =2
GROUP BY id
But it gives me all the Id's where col2=2. How can I achieve desired output?
Thanks.

You could use ROW_NUMBER:
CREATE TABLE temp(
ID INT,
Col2 INT,
Col3 INT
)
INSERT INTO temp VALUES
(1, 1, 10), (2, 1, 20), (3, 1, 15),
(4, 2, 10), (5, 2, 20), (6, 2, 15),
(7, 2, 30);
SELECT
ID, Col3
FROM(
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col3 DESC)
FROM temp
)t
WHERE
rn = 1
AND col2 = 2
RESULT
ID Col3
----------- -----------
7 30

The following simple query meets your requirement:
SELECT TOP 1 ID
FROM table_name
Where col2 = 2
ORDER BY col3 DESC
You can also use the ROW_NUMBER() function; it is another way to get the same result.

A simple sub-query will do the job:
SELECT id
FROM table_name
WHERE col2 = 2
AND col3 = (SELECT MAX(col3) FROM table_name WHERE col2=2)
This can return multiple IDs if several rows share the same maximum value. If you don't want that, you could use DISTINCT id, an aggregate function like MAX(id), or TOP 1 id.
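The tie behaviour of the sub-query can be seen in a short sketch run through Python's sqlite3 module. SQLite is assumed only for convenience, and the extra row with id 8 is hypothetical, added to force a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 1, 15),
     (4, 2, 10), (5, 2, 20), (6, 2, 15), (7, 2, 30)],
)

max_row_query = """
SELECT id
FROM table_name
WHERE col2 = 2
  AND col3 = (SELECT MAX(col3) FROM table_name WHERE col2 = 2)
ORDER BY id
"""
single = conn.execute(max_row_query).fetchall()

# Hypothetical extra row that shares the maximum col3 value.
conn.execute("INSERT INTO table_name VALUES (8, 2, 30)")
tied = conn.execute(max_row_query).fetchall()

# Collapsing the tie with an aggregate, as suggested above.
collapsed = conn.execute(
    "SELECT MAX(id) FROM table_name "
    "WHERE col2 = 2 AND col3 = (SELECT MAX(col3) FROM table_name WHERE col2 = 2)"
).fetchone()

print(single, tied, collapsed)
```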
Here's another approach using GROUP BY:
SELECT TOP 1 MAX(id)
FROM table_name
GROUP BY col2, col3
HAVING col2 = 2
ORDER BY col3 DESC

Grouping multiple rows from a table into column

I have two tables as below.
Table 1
+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------+------+
| 1 | 1.5 | 1.5 | 2.5 |
| 1 | 2.5 | 3.5 | 1.5 |
+------+------+------+------+
Table 2
+------+--------+
| Col1 | Col2 |
+------+--------+
| 1 | 12345 |
| 1 | 678910 |
+------+--------+
I want the result as below.
+------+------+------+------+-------+--------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+-------+--------+
| 1 | 4 | 5 | 4 | 12345 | 678910 |
+------+------+------+------+-------+--------+
Here Col2, Col3 and Col4 are the sums of the Col2, Col3 and Col4 values from Table 1, and the rows from Table 2 are transposed into columns in the result.
I use Oracle 11g and tried the PIVOT option, but I couldn't aggregate the values from columns 2, 3 and 4 in Table 1.
Is there any function available in Oracle which provides a direct solution without any dirty workaround?
Thanks in advance.

Since you will always have only two records in the second table, a simple grouping and join will do.
Since I don't have the tables, I am using CTEs and inline views:
with cte1 as (
select 1 as col1, 1.5 as col2, 1.5 as col3, 2.5 as col4 from dual
union all
select 1, 2.5, 3.5, 1.5 from dual
),
cte2 as (
select 1 as col1, 12345 as col2 from dual
union all
select 1, 678910 from dual
)
select x.col1, x.col2, x.col3, x.col4, y.col5, y.col6
from (
select col1, sum(col2) as col2, sum(col3) as col3, sum(col4) as col4
from cte1 group by col1
) x
inner join (
select col1, min(col2) as col5, max(col2) as col6 from cte2
group by col1
) y
on x.col1 = y.col1

with
mytab1 as (select col1, col2, col3, col4, 0 col5, 0 col6 from tab1),
mytab2 as
(
select
col1, 0 col2, 0 col3, 0 col4, "1_COL2" col5, "2_COL2" col6
from
(
select
row_number() over (partition by col1 order by rowid) rn, col1, col2
from
tab2
)
pivot
(
max(col2) col2
for rn in (1, 2)
)
)
select
col1,
sum(col2) col2,
sum(col3) col3,
sum(col4) col4,
sum(col5) col5,
sum(col6) col6
from
(
select * from mytab1 union all select * from mytab2
)
group by
col1

You can use the query below:
with t1 (col1,col2,col3,col4)
as
(
select 1,1.5,1.5,2.5 from dual
union
select 1,2.5,3.5,1.5 from dual
),
t2 (col1,col2)
as
(
select 1,12345 from dual
union
select 1,678910 from dual
)
select * from
(
select col1
,max(decode(col2,12345,12345)) as col5
,max(decode(col2,678910,678910)) as col6
from t2
group by col1
) a
inner join
(
select col1,sum(col2) as col2,sum(col3) as col3,sum(col4) as col4
from t1
group by col1
) b
on a.col1=b.col1

Pivot only the second table. You can then do GROUP BY on the nested UNION ALL between table1 (col5 and col6 are null for subsequent group by) and pivoted table2 (col2, col3, col4 are null for subsequent group by).
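The idea just described (pivot the second table, UNION ALL it with the first, then group) can be sketched with Python's sqlite3 module. SQLite has no PIVOT clause, so this example swaps in conditional aggregation over ROW_NUMBER for the pivot step; the table names tab1 and tab2 and the ordering by col2 inside the pivot are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 REAL, col3 REAL, col4 REAL)")
conn.execute("CREATE TABLE tab2 (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?, ?)",
                 [(1, 1.5, 1.5, 2.5), (1, 2.5, 3.5, 1.5)])
conn.executemany("INSERT INTO tab2 VALUES (?, ?)", [(1, 12345), (1, 678910)])

result = conn.execute("""
WITH pivoted AS (
    -- Turn the two tab2 rows per col1 into two columns.
    SELECT col1,
           MAX(CASE WHEN rn = 1 THEN col2 END) AS col5,
           MAX(CASE WHEN rn = 2 THEN col2 END) AS col6
    FROM (SELECT col1, col2,
                 ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rn
          FROM tab2)
    GROUP BY col1
),
combined AS (
    -- NULL padding keeps each side out of the other side's aggregates.
    SELECT col1, col2, col3, col4, NULL AS col5, NULL AS col6 FROM tab1
    UNION ALL
    SELECT col1, NULL, NULL, NULL, col5, col6 FROM pivoted
)
SELECT col1, SUM(col2), SUM(col3), SUM(col4), MAX(col5), MAX(col6)
FROM combined
GROUP BY col1
""").fetchall()

print(result)
```

Aggregates skip NULLs, which is why the padded columns do not disturb the sums or the pivoted values.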
