Getting distinct result with Oracle SQL

I have the following data structure
ID | REFID | NAME
1 | 100 | A
2 | 101 | B
3 | 101 | C
With
SELECT DISTINCT REFID, ID, NAME
FROM my_table
ORDER BY ID
I would like to have the following result:
1 | 100 | A
2 | 101 | B
The NAME and ID columns should contain the MIN or FIRST value for each REFID.
However, I am stuck on how to use MIN/FIRST here.
Any tips are welcome :-)

select id,
refid,
name
from (select id,
refid,
name,
row_number() over(partition by refid order by name) as rn
from my_table)
where rn = 1
order by id

You can use a subquery to do this.
WITH Q AS
( SELECT MIN(NAME) AS NAME, REFID FROM T GROUP BY REFID )
SELECT T.ID, T.REFID, T.NAME
FROM T
JOIN Q
ON (T.NAME = Q.NAME AND T.REFID = Q.REFID)
Also, note that SQL tables have no inherent order, so there is no "first" value unless you define one with an ORDER BY.
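Since "first" only exists relative to an explicit ordering, here is a minimal sketch of the ROW_NUMBER approach with the ordering spelled out. It assumes an in-memory SQLite database (via Python's sqlite3 module, SQLite 3.25 or later) as a stand-in for Oracle, and it orders by ID inside each REFID group, which is one possible reading of "first".

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle; names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, refid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 100, "A"), (2, 101, "B"), (3, 101, "C")],
)

# Number the rows inside each REFID group by ID, then keep only row 1.
first_per_refid = conn.execute(
    """
    SELECT id, refid, name
    FROM (SELECT id, refid, name,
                 ROW_NUMBER() OVER (PARTITION BY refid ORDER BY id) AS rn
          FROM my_table)
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(first_per_refid)  # [(1, 100, 'A'), (2, 101, 'B')]
```

Changing the ORDER BY inside the OVER clause (for example to name) changes which row counts as "first" in each group.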

Related

Get row which matched in each group

I am trying to write a SQL query. I got the results below from two tables, and they are fine so far. Now I want only the values that are present in every group. For example, A and B are present in each group (for each ID), so I want only A and B in the result. I also want the query to be dynamic. Could anyone help?
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
In the following query, I have placed your current query into a CTE for further use. We can try selecting those values for which every ID in your current result appears. This would imply that such values are associated with every ID.
WITH cte AS (
-- your current query
)
SELECT Value
FROM cte
GROUP BY Value
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte);
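To see the HAVING comparison at work, here is a small sketch that loads the sample rows from the question and runs the same query. It assumes an in-memory SQLite database through Python's sqlite3 module, and the table name results is a hypothetical stand-in for "your current query".

```python
import sqlite3

# Assumption: SQLite in memory; "results" is a hypothetical name for the
# output of the asker's current query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
     (2, "A"), (2, "B"), (2, "C"),
     (3, "A"), (3, "B")],
)

# A value qualifies when the number of distinct IDs it appears with
# equals the number of distinct IDs overall.
common_values = [
    row[0]
    for row in conn.execute(
        """
        WITH cte AS (SELECT id, value FROM results)
        SELECT value
        FROM cte
        GROUP BY value
        HAVING COUNT(DISTINCT id) = (SELECT COUNT(DISTINCT id) FROM cte)
        ORDER BY value
        """
    )
]

print(common_values)  # ['A', 'B']
```

C is dropped because it appears with only two of the three IDs, and D because it appears with only one.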
The solution is simple, and there are at least two ways to do it. Group by the letters (Value) and aggregate the IDs with SUM or COUNT. Then keep only the letters whose aggregate equals the same aggregate over all distinct IDs in the table. Use DISTINCT on both sides so that repeated (ID, Value) rows do not skew the comparison (the SUM variant also assumes the IDs are positive).
select Value from MyTable group by Value
having SUM(DISTINCT ID) = (SELECT SUM(DISTINCT ID) from MyTable)
select Value from MyTable group by Value
having COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) from MyTable)
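When the same (ID, Value) pair can occur more than once, a plain SUM(ID) or COUNT(ID) on the left-hand side can match by accident. The sketch below illustrates this with made-up rows; it assumes SQLite in memory through Python's sqlite3 module, and the helper values_having is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (1, "A"), (2, "A"), (3, "A"),   # A really is present for every ID
        (3, "X"), (3, "X"),             # X only for ID 3, but 3 + 3 = 1 + 2 + 3
        (1, "Y"), (1, "Y"), (1, "Y"),   # Y only for ID 1, but it has three rows
    ],
)

def values_having(condition):
    """Hypothetical helper: values whose group satisfies the HAVING condition."""
    sql = (
        "SELECT value FROM my_table GROUP BY value "
        f"HAVING {condition} ORDER BY value"
    )
    return [row[0] for row in conn.execute(sql)]

by_sum = values_having("SUM(id) = (SELECT SUM(DISTINCT id) FROM my_table)")
by_count = values_having("COUNT(id) = (SELECT COUNT(DISTINCT id) FROM my_table)")
by_distinct_count = values_having(
    "COUNT(DISTINCT id) = (SELECT COUNT(DISTINCT id) FROM my_table)"
)

print(by_sum)             # ['A', 'X']  (X is a false match)
print(by_count)           # ['A', 'Y']  (Y is a false match)
print(by_distinct_count)  # ['A']
```

If the data is guaranteed to have no repeated (ID, Value) rows and only positive IDs, all three conditions agree.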
Use this:
WITH CTE
AS
(
SELECT
Value,
Cnt = COUNT(DISTINCT ID)
FROM T1
GROUP BY Value
)
SELECT
Value
FROM CTE
WHERE Cnt = (SELECT COUNT(DISTINCT ID) FROM T1)

Selecting compared pairs from table

I don't really know how to describe it. I have a table:
ID | Name | Date
-------------------------
1 | Mike | 01.01.2016
1 | Michael | 02.03.2016
2 | Samuel | 23.12.2015
2 | Sam | 05.03.2015
3 | Tony | 02.04.2012
I want to select pairs of IDs and Names with latest dates in each pair. The result here should be:
ID | Name | Date
-------------------------
1 | Michael | 02.03.2016
2 | Samuel | 23.12.2015
3 | Tony | 02.04.2012
How do I achieve this?
Oracle Database 11g
You can do it using the ROW_NUMBER() analytic function:
SELECT id, name, "date"
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY "date" DESC ) rn
FROM table_name t
)
WHERE rn = 1
This requires only a single table scan (it does not have a self-join or correlated sub-query - i.e. IN (...) or EXISTS(...)).
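A minimal sketch of that ROW_NUMBER query on the sample rows follows. It assumes SQLite in memory (Python's sqlite3 module) instead of Oracle, stores the dates as ISO text so that string order matches chronological order, and uses the hypothetical column name changed_on in place of "date".

```python
import sqlite3

# Assumptions: SQLite stands in for Oracle; dates are ISO-8601 strings;
# "changed_on" is a hypothetical replacement for the "date" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT, changed_on TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, "Mike", "2016-01-01"),
        (1, "Michael", "2016-03-02"),
        (2, "Samuel", "2015-12-23"),
        (2, "Sam", "2015-03-05"),
        (3, "Tony", "2012-04-02"),
    ],
)

# Rank rows newest-first inside each id and keep the top one.
latest = conn.execute(
    """
    SELECT id, name, changed_on
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY changed_on DESC) AS rn
          FROM table_name t)
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

for row in latest:
    print(row)
# (1, 'Michael', '2016-03-02')
# (2, 'Samuel', '2015-12-23')
# (3, 'Tony', '2012-04-02')
```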
Have a sub-select that returns each id and its max date:
select * from table
where (id, date) in (select id, max(date) from table group by id)
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE t.id = s.id and s.date > t.date)
Possibly the most efficient method is:
select t.*
from table t
where t.date = (select max(date) from table t2 where t2.id = t.id);
along with an index on table(id, date).
This version should scan the table and look up the correct value in the index.
Or, if there are only three columns, you can use keep:
select id, max(date) as date,
max(name) keep (dense_rank first order by date desc) as name
from table
group by id;
I have found that this version works very well in Oracle.
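One practical difference between the approaches in this thread shows up when two rows of the same id share the latest date: the NOT EXISTS and MAX-based filters keep every tied row, while ROW_NUMBER keeps exactly one. The sketch below illustrates this with made-up rows; it assumes SQLite in memory through Python's sqlite3 module, and the table people, the column changed_on and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, changed_on TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "Mike", "2016-03-02"),
        (1, "Michael", "2016-03-02"),  # tie on the latest date for id 1
        (3, "Tony", "2012-04-02"),
    ],
)

def names(sql):
    """Hypothetical helper: sorted list of the names a query returns."""
    return sorted(row[0] for row in conn.execute(sql))

via_not_exists = names(
    """
    SELECT t.name FROM people t
    WHERE NOT EXISTS (SELECT 1 FROM people s
                      WHERE s.id = t.id AND s.changed_on > t.changed_on)
    """
)
via_correlated_max = names(
    """
    SELECT t.name FROM people t
    WHERE t.changed_on = (SELECT MAX(t2.changed_on) FROM people t2
                          WHERE t2.id = t.id)
    """
)
via_row_number = names(
    """
    SELECT name
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY changed_on DESC) AS rn
          FROM people p)
    WHERE rn = 1
    """
)

print(via_not_exists)       # ['Michael', 'Mike', 'Tony']
print(via_correlated_max)   # ['Michael', 'Mike', 'Tony']
print(len(via_row_number))  # 2 (which of the tied names survives is arbitrary)
```

Adding a tie-breaker column to the window's ORDER BY makes the ROW_NUMBER result deterministic.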

Amazon Redshift mechanism for aggregating a column into a string [duplicate]

I have a data set in the form.
id | attribute
-----------------
1 | a
2 | b
2 | a
2 | a
3 | c
Desired output:
attribute| num
-------------------
a | 1
b,a | 1
c | 1
In MySQL, I would use:
select attribute, count(*) num
from
(select id, group_concat(distinct attribute) attribute from dataset group by id) as subquery
group by attribute;
I am not sure this can be done in Redshift because it does not support group_concat or any psql group aggregate functions like array_agg() or string_agg(). See this question.
An alternate solution that would work is if there was a way for me to pick a random attribute from each group instead of group_concat. How can this work in Redshift?
I found a way to pick a random attribute for each id, but it is rather convoluted. I don't think it's a good approach, but it works.
SQL:
-- (1) uniq dataset
WITH uniq_dataset as (select id, attr from dataset group by id, attr)
SELECT
uds.id, rds.attr
FROM
-- (2) generate random rank for each id
(select id, round((random() * ((select count(*) from uniq_dataset iuds where iuds.id = ouds.id) - 1))::numeric, 0) + 1 as random_rk from (select distinct id from uniq_dataset) ouds) uds,
-- (3) rank table
(select rank() over(partition by id order by attr) as rk, id ,attr from uniq_dataset) rds
WHERE
uds.id = rds.id
AND
uds.random_rk = rds.rk
ORDER BY
uds.id;
Result:
id | attr
----+------
1 | a
2 | a
3 | c
OR
id | attr
----+------
1 | a
2 | b
3 | c
Here are tables in this SQL.
-- dataset (original table)
id | attr
----+------
1 | a
2 | b
2 | a
2 | a
3 | c
-- (1) uniq dataset
id | attr
----+------
1 | a
2 | a
2 | b
3 | c
-- (2) generate random rank for each id
id | random_rk
----+----
1 | 1
2 | 1 <- 1 or 2
3 | 1
-- (3) rank table
rk | id | attr
----+----+------
1 | 1 | a
1 | 2 | a
2 | 2 | b
1 | 3 | c
This solution, inspired by Masashi, is simpler and accomplishes selecting a random element from a group in Redshift.
SELECT id, first_value as attribute
FROM(SELECT id, FIRST_VALUE(attribute)
OVER(PARTITION BY id ORDER BY random()
ROWS BETWEEN unbounded preceding AND unbounded following)
FROM dataset)
GROUP BY id, attribute ORDER BY id;
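The same idea can be tried locally. This is only a sketch: it assumes SQLite in memory (Python's sqlite3 module, SQLite 3.25 or later) rather than Redshift, and it uses SELECT DISTINCT instead of the outer GROUP BY to collapse the repeated rows.

```python
import sqlite3

# Assumption: SQLite stands in for Redshift; the window syntax used here
# is the same in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER, attribute TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "a"), (2, "a"), (3, "c")],
)

# Shuffle each partition with random() and take the first value of the
# whole partition, so every row of an id carries the same picked attribute.
picked = conn.execute(
    """
    SELECT DISTINCT id,
           FIRST_VALUE(attribute) OVER (
               PARTITION BY id
               ORDER BY random()
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS attribute
    FROM dataset
    ORDER BY id
    """
).fetchall()

print(picked)  # one row per id; id 2 yields either 'a' or 'b'
```

Because id 2 has two rows with a and one with b, a is the more likely pick; deduplicating the rows first would make the choice uniform over distinct attributes.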
This is an answer for the related question here. That question is closed, so I am posting the answer here.
Here is a method to aggregate a column into a string:
select * from temp;
attribute
-----------
a
c
b
1) Give a unique rank to each row
with sub_table as(select attribute, rank() over (order by attribute) rnk from temp)
select * from sub_table;
attribute | rnk
-----------+-----
a | 1
b | 2
c | 3
2) Use concat operator || to combine in one line
with sub_table as(select attribute, rank() over (order by attribute) rnk from temp)
select (select attribute from sub_table where rnk = 1)||
(select attribute from sub_table where rnk = 2)||
(select attribute from sub_table where rnk = 3) res_string;
res_string
------------
abc
This only works for a fixed, known number of rows (X) in that column. It can be the first X rows ordered by some attribute in the "order by" clause. I'm guessing this is expensive.
A CASE expression can be used to deal with the NULLs that occur when a certain rank does not exist; otherwise a single NULL operand makes the whole concatenation NULL.
with sub_table as(select attribute, rank() over (order by attribute) rnk from temp)
select (select attribute from sub_table where rnk = 1)||
(select attribute from sub_table where rnk = 2)||
(select attribute from sub_table where rnk = 3)||
(case when (select attribute from sub_table where rnk = 4) is NULL then ''
else (select attribute from sub_table where rnk = 4) end) as res_string;
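The effect of a missing rank can be checked with a short sketch. It assumes SQLite in memory through Python's sqlite3 module (where || also returns NULL if any operand is NULL), uses the hypothetical table name temp_attrs, and uses COALESCE as a shorter equivalent of the CASE expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_attrs (attribute TEXT)")
conn.executemany("INSERT INTO temp_attrs VALUES (?)", [("a",), ("c",), ("b",)])

# Ranks 1..3 exist, rank 4 does not, so the fourth subquery yields NULL.
query_template = """
WITH sub_table AS (
    SELECT attribute, RANK() OVER (ORDER BY attribute) AS rnk FROM temp_attrs
)
SELECT (SELECT attribute FROM sub_table WHERE rnk = 1) ||
       (SELECT attribute FROM sub_table WHERE rnk = 2) ||
       (SELECT attribute FROM sub_table WHERE rnk = 3) ||
       {fourth} AS res_string
"""

unguarded = "(SELECT attribute FROM sub_table WHERE rnk = 4)"
guarded = "COALESCE((SELECT attribute FROM sub_table WHERE rnk = 4), '')"

unguarded_result = conn.execute(query_template.format(fourth=unguarded)).fetchone()[0]
guarded_result = conn.execute(query_template.format(fourth=guarded)).fetchone()[0]

print(unguarded_result)  # None: the NULL operand wipes out the whole string
print(guarded_result)    # abc
```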
I haven't tested this query, but these functions are supported in Redshift:
select id, array_to_string(array(select attribute from mydataset m where m.id=d.id),',')
from mydataset d

SQL Server : select from duplicate columns where date newest

I have inherited a SQL Server table in the (abbreviated) form of (includes sample data set):
| SID | name | Invite_Date |
|-----|-------|-------------|
| 101 | foo | 2013-01-06 |
| 102 | bar | 2013-04-04 |
| 101 | fubar | 2013-03-06 |
I need to select all SIDs and the Invite_date, but if there is a duplicate SID, then get only the latest entry (by date).
So the results from the above would look like:
101 | fubar | 2013-03-06
102 | bar | 2013-04-04
Any ideas, please?
N.B. The Invite_date column has been declared as nvarchar, so to get it in a date format I am using CONVERT(DATE, Invite_date).
You can use a ranking function like ROW_NUMBER or DENSE_RANK in a CTE:
WITH CTE AS
(
SELECT SID, name, Invite_Date,
rn = Row_Number() OVER (PARTITION By SID
Order By Invite_Date DESC)
FROM dbo.TableName
)
SELECT SID, name, Invite_Date
FROM CTE
WHERE RN = 1
Use Row_Number if you want exactly one row per group and Dense_Rank if you want all last Invite_Date rows for each group in case of repeating max-Invite_Dates.
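The difference between the two ranking functions only appears when a group has repeated maximum dates. The sketch below illustrates it with made-up rows; it assumes SQLite in memory (Python's sqlite3 module) rather than SQL Server, and the table invites and the helper latest_rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invites (sid INTEGER, name TEXT, invite_date TEXT)")
conn.executemany(
    "INSERT INTO invites VALUES (?, ?, ?)",
    [
        (101, "old", "2013-01-06"),
        (101, "foo", "2013-03-06"),
        (101, "fubar", "2013-03-06"),  # same latest date as "foo"
        (102, "bar", "2013-04-04"),
    ],
)

def latest_rows(ranking_function):
    """Hypothetical helper: rows ranked 1 per sid by the given window function."""
    sql = f"""
        WITH cte AS (
            SELECT sid, name, invite_date,
                   {ranking_function}() OVER (PARTITION BY sid
                                              ORDER BY invite_date DESC) AS rn
            FROM invites
        )
        SELECT sid, name, invite_date FROM cte WHERE rn = 1 ORDER BY sid, name
    """
    return conn.execute(sql).fetchall()

with_row_number = latest_rows("ROW_NUMBER")
with_dense_rank = latest_rows("DENSE_RANK")

print(len(with_row_number))  # 2: exactly one row per sid
print(with_dense_rank)
# [(101, 'foo', '2013-03-06'), (101, 'fubar', '2013-03-06'), (102, 'bar', '2013-04-04')]
```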
select t1.*
from your_table t1
inner join
(
select sid, max(CONVERT(DATE, Invite_date)) mdate
from your_table
group by sid
) t2 on t1.sid = t2.sid and CONVERT(DATE, t1.Invite_date) = t2.mdate
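-- Note: SQL Server rejects this because name is neither aggregated nor in GROUP BY;
-- engines that accept it (e.g. MySQL without ONLY_FULL_GROUP_BY) return an arbitrary name per SID.
select
SID,name,MAX(Invite_date)
FROM
Table1
GROUP BY
SID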
http://sqlfiddle.com/#!2/6b6f66/1

Get first or most repeated value from a group by in SQL Server

I have an ugly table with codes and names from a source I can't control, which is like this one (OriginalTable):
Code | Name
--------------------
001-001 | Name1_a
001-002 | Name1_a
001-002 | Name1_b
001-003 | Name1_a
002-001 | Name2_a
002-001 | Name2_b
002-002 | Name2_a
003-001 | Name3
...
The problem is that I need a unique name for the first 3 digits of each code (SmallCode), like in the following table:
Id | Code | Name
--------------------
1 | 001 | NameX
2 | 002 | NameY
3 | 003 | NameZ
The criterion I want to use for choosing a name is that it should be the most repeated name or the first one in each SmallCode.
For example, NameX is the most repeated name among all codes starting with 001, or the first one (Name1_a in both cases). The same goes for NameY for 002 and NameZ for 003.
Right now I am using this query:
select Substring(Code,1,3) as SmallCode, Code, Name
into #tmpCode
from OriginalTable
select SmallCode, Min(Code) as Code
into #tmpReducedCode
from #tmpCode
group by SmallCode
insert into ResultTable (Code, Name)
select a.SmallCode, a.Name
from #tmpCode a
inner join #tmpReducedCode b
on a.Code = b.Code
But this is my result, which is wrong because there are 2 different names for code 002-001 (Name2_a, Name2_b)
1 | 001 | Name1_a
2 | 002 | Name2_a
3 | 002 | Name2_b
4 | 003 | Name3
So the question is: How can I separate OriginalTable into those 2 tables choosing the most repeated or first appearing name for each small code?
For the first table:
select Substring(Code,1,3) as SmallCode, Code, Name
into #tmpCode
from OriginalTable
select SmallCode, Name
into #tmpReducedCode
from (
select SmallCode, Name, row_number() over (partition by SmallCode order by Total desc) rn
from (
select SmallCode, Name, count(*) Total
from #tmpCode
group by SmallCode, Name) x) y
where rn=1;
select distinct a.SmallCode, b.Name
from #tmpCode a
inner join #tmpReducedCode b
on left(a.Code,3) = b.SmallCode
I think the best way to do this is with window functions:
select smallcode as code,
name
from (select cn.*, ROW_NUMBER() over (partition by smallcode order by cnt desc) as seqnum
from (select LEFT(code, 3) as smallcode, name, COUNT(*) as cnt
from OriginalTable ot
group by LEFT(code, 3), name
) cn
) cn
where seqnum = 1
This assumes you are using SQL Server 2005 or a more recent version.
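As a rough check of the count-then-rank idea on the sample data, here is a sketch that groups on the three-character prefix of the code. It assumes SQLite in memory through Python's sqlite3 module (so substr replaces LEFT), and it adds name as a tie-breaker, which is an assumption about what "first" should mean when counts are equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OriginalTable (code TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO OriginalTable VALUES (?, ?)",
    [
        ("001-001", "Name1_a"), ("001-002", "Name1_a"), ("001-002", "Name1_b"),
        ("001-003", "Name1_a"), ("002-001", "Name2_a"), ("002-001", "Name2_b"),
        ("002-002", "Name2_a"), ("003-001", "Name3"),
    ],
)

# Inner query: how often each name occurs per 3-character prefix.
# Outer query: rank the names per prefix by that count and keep the top one.
most_common = conn.execute(
    """
    SELECT small_code, name
    FROM (SELECT small_code, name,
                 ROW_NUMBER() OVER (PARTITION BY small_code
                                    ORDER BY cnt DESC, name) AS seqnum
          FROM (SELECT substr(code, 1, 3) AS small_code, name, COUNT(*) AS cnt
                FROM OriginalTable
                GROUP BY substr(code, 1, 3), name))
    WHERE seqnum = 1
    ORDER BY small_code
    """
).fetchall()

print(most_common)
# [('001', 'Name1_a'), ('002', 'Name2_a'), ('003', 'Name3')]
```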
Run subquery for each code:
select distinct substring(Code,1,3) as "Code",
(select top 1 Name
from OriginalTable tab2
where substring(tab2.Code,1,3)=substring(tab1.Code,1,3)
group by substring(Code,1,3), Name
order by count(Name) desc) as "Name"
from OriginalTable tab1;