SQL Server get column not in Group By clause?

How to get the following result from this table?
ID1|ID2| Date
----------------------
1 | 1 | 01-01-2014
1 | 2 | 02-01-2014
2 | 3 | 03-01-2014
I want to get ID1 & ID2 for the maximum date when grouped by ID1
Result:
1,2
2,3
My code:
SELECT
ID1, MAX(DATE)
FROM
Table
GROUP BY
ID1
I need something like
SELECT
ID1, ID2, MAX(DATE)
FROM
Table
GROUP BY
ID1
Can someone help me?

There are three ways to do it.
One, a subquery:
SELECT t1.ID1, t1.ID2, t2.MAX_DATE
FROM Table t1
INNER JOIN (
SELECT ID1, MAX(DATE) AS "MAX_DATE" FROM Table GROUP BY ID1) t2
ON t1.ID1 = t2.ID1
Or you can use the OVER() clause if you're on SQL Server 2005+, recent versions of Oracle, or PostgreSQL (MySQL before 8.0 and MariaDB before 10.2 lack window functions):
SELECT ID1,
ID2,
MAX(DATE) OVER(PARTITION BY ID1)
FROM Table
Or you can use a correlated subquery:
SELECT t1.ID1,
t1.ID2,
(SELECT MAX(DATE) FROM Table WHERE ID1 = t1.ID1)
FROM Table t1
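As a rough check of what the three queries above return, here is a small sketch that runs the OVER() variant against the question's data in an in-memory SQLite database. The table name `t`, the ISO date strings, and the use of SQLite (3.25 or later for window functions) are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; dates are ISO strings
# so that MAX() compares them chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID1 INTEGER, ID2 INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "2014-01-01"), (1, 2, "2014-01-02"), (2, 3, "2014-01-03")],
)

# The windowed MAX is attached to every row; no rows are filtered out.
with_group_max = conn.execute(
    """
    SELECT ID1, ID2, MAX(Date) OVER (PARTITION BY ID1) AS MaxDate
    FROM t
    ORDER BY ID1, ID2
    """
).fetchall()

print(with_group_max)
```

Every input row comes back with its group's maximum date, so getting only the row that carries that date still needs a filter on Date = MaxDate.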

You can accomplish this by joining the table to the aggregate, like this:
SELECT t.*
FROM
Table t
INNER JOIN
(
SELECT
ID1,
MAX(Date) MaxDate
FROM Table
GROUP BY ID1
) MaxDate ON
t.ID1 = MaxDate.ID1 AND
t.Date = MaxDate.MaxDate
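The join-to-aggregate query can be tried out with a short sketch like the one below. It assumes an in-memory SQLite database, a table named `t` (since `Table` is a reserved word), and ISO date strings; none of these come from the original post.

```python
import sqlite3

# Assumption: SQLite in place of SQL Server, table renamed to t.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID1 INTEGER, ID2 INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "2014-01-01"), (1, 2, "2014-01-02"), (2, 3, "2014-01-03")],
)

# Join each row to its group's maximum date and keep only the matching rows.
latest_rows = conn.execute(
    """
    SELECT t.ID1, t.ID2
    FROM t
    INNER JOIN (
        SELECT ID1, MAX(Date) AS MaxDate
        FROM t
        GROUP BY ID1
    ) m ON t.ID1 = m.ID1 AND t.Date = m.MaxDate
    ORDER BY t.ID1
    """
).fetchall()

print(latest_rows)  # [(1, 2), (2, 3)]
```

If two rows in a group share the maximum date, this form returns both of them.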

You can use the ROW_NUMBER analytic function:
SELECT *
FROM
(SELECT *,
ROW_NUMBER() over ( partition by ID1 order by [date] desc) as seq
FROM Table1
) T
WHERE T.seq =1
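A minimal sketch of the ROW_NUMBER approach on the question's data follows. It assumes SQLite 3.25 or later (for window functions) and ISO date strings, both chosen only to make the example runnable.

```python
import sqlite3

# Assumption: in-memory SQLite with window-function support (3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID1 INTEGER, ID2 INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, 1, "2014-01-01"), (1, 2, "2014-01-02"), (2, 3, "2014-01-03")],
)

# Number the rows of each ID1 group from newest to oldest, then keep row 1.
top_rows = conn.execute(
    """
    SELECT ID1, ID2
    FROM (
        SELECT ID1, ID2,
               ROW_NUMBER() OVER (PARTITION BY ID1 ORDER BY date DESC) AS seq
        FROM Table1
    ) AS ranked
    WHERE seq = 1
    ORDER BY ID1
    """
).fetchall()

print(top_rows)  # [(1, 2), (2, 3)]
```

Unlike a join on the maximum date, this keeps exactly one row per group even when several rows tie for the latest date; which tied row wins is arbitrary unless the ORDER BY has a tie-breaker.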

Related

SQL select youngest record

I have a table. I want to run a SQL query that selects the youngest record per ID, along with all the other columns of that row. The real table has more than 500 columns.
Please note, I am using AWS Athena. The table has no indexes.
ID COL1 COL2 LAST_UPDATED
1 yyy ddd 01/01/2020
1 ccc eee 12/01/2020
2 xxx ddd 02/01/2020
2 vvv eee 19/01/2020
Desired result:
ID COL1 COL2 LAST_UPDATED
1 ccc eee 12/01/2020
2 vvv eee 19/01/2020
I found a solution that uses ROW_NUMBER() OVER(PARTITION BY ...):
SELECT *
FROM (
SELECT id, updated_at, ROW_NUMBER() OVER(PARTITION BY id ORDER BY updated_at desc) rn
from table t
)
where rn = 1
Try using the query below:
select * from aws
where last_updated in (select max(last_updated) from aws group by id)
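One thing worth checking with the IN form is that the subquery is not tied to the outer row's id, so a date that is the maximum for one id can match an older row of another id. The sketch below illustrates this with hypothetical data (the `xxx` row's date is changed to 12/01/2020 to force the collision) in an in-memory SQLite database, and compares it with a per-id comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aws (id INTEGER, col1 TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO aws VALUES (?, ?, ?)",
    [
        (1, "yyy", "2020-01-01"),
        (1, "ccc", "2020-01-12"),
        (2, "xxx", "2020-01-12"),  # hypothetical: same date as id 1's latest row
        (2, "vvv", "2020-01-19"),
    ],
)

# Uncorrelated IN: matches any row whose date is the maximum of ANY id.
uncorrelated = conn.execute(
    """
    SELECT id, col1 FROM aws
    WHERE last_updated IN (SELECT MAX(last_updated) FROM aws GROUP BY id)
    ORDER BY id, last_updated
    """
).fetchall()

# Per-id comparison: each row is checked against the maximum of its own id.
correlated = conn.execute(
    """
    SELECT a.id, a.col1 FROM aws a
    WHERE a.last_updated = (SELECT MAX(b.last_updated) FROM aws b WHERE b.id = a.id)
    ORDER BY a.id
    """
).fetchall()

print(uncorrelated)  # [(1, 'ccc'), (2, 'xxx'), (2, 'vvv')]
print(correlated)    # [(1, 'ccc'), (2, 'vvv')]
```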
A typical and efficient way in most databases is to use a correlated subquery:
select t.*
from t
where t.LAST_UPDATED = (select max(t2.LAST_UPDATED)
from t t2
where t2.id = t.id
);
For performance, you want an index on (id, LAST_UPDATED).
In a database that doesn't have indexes, use row_number():
select t.*
from (select t.*, row_number() over (partition by id order by last_updated desc) as seqnum
from t
) t
where seqnum = 1;

Join two tables into one by adding data of all tables sequentially

I am facing an issue joining two tables with different data.
Suppose I have table1 and table2 like:
table1 : table2:
ID1 ID2
----- -----
1 102
2 103
I need to join these two tables into table3 as :
table3
------
ID1 ID2
--- ---
1 102
2 103
I am applying a cross join to table1 and table2 but I am getting:
table3 :
ID1 ID2
--- ---
1 102
2 102
1 103
2 103
If you are simply ordering by ID for each table, and then matching the first row with the first row - the following should work.
Select T1.ID1
, T2.ID2
from (Select ID1, row_number() over (order by ID1) rownum from Table1) T1
inner join (Select ID2, row_number() over (order by ID2) rownum from Table2) T2
on T1.rownum = T2.rownum
It creates a subquery for each table with a row number, and then inner joins on the row number.
If your IDs are not always in sequential form then use this:
SELECT t1.ID1, T2.ID2
FROM (SELECT ID1, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) rn FROM table1 ) t1
INNER JOIN (SELECT ID2, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) rn FROM table2) t2
ON t1.rn = t2.rn
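The positional pairing can be sketched as follows, using an in-memory SQLite database (3.25 or later) as a stand-in; the ORDER BY on the ID columns is an assumption about which row order is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID1 INTEGER)")
conn.execute("CREATE TABLE table2 (ID2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(102,), (103,)])

# A cross join pairs every row with every row: 2 x 2 = 4 combinations.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM table1 CROSS JOIN table2"
).fetchone()[0]

# Numbering both sides and joining on the number pairs rows by position.
pairs = conn.execute(
    """
    SELECT t1.ID1, t2.ID2
    FROM (SELECT ID1, ROW_NUMBER() OVER (ORDER BY ID1) AS rn FROM table1) t1
    INNER JOIN (SELECT ID2, ROW_NUMBER() OVER (ORDER BY ID2) AS rn FROM table2) t2
      ON t1.rn = t2.rn
    ORDER BY t1.rn
    """
).fetchall()

print(cross_count)  # 4
print(pairs)        # [(1, 102), (2, 103)]
```

With ORDER BY (SELECT NULL) the numbering order is left to the engine, so the pairing is only deterministic when the window has a real ordering column.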

SQL Complex join not giving distinct result

I have two tables :-
Table1:-
ID1
1
1
1
1
4
5
Table2:-
Id2
2
2
1
1
1
8
I want to show all the ID2 from table2 which are present in ID1 of table1 by using joins
I used :-
select ID2 from Table2 t2 left join Table1 t1
on t2.Id2=t1.Id1
But this was giving repeated result as :-
Id2
1
1
1
1
1
1
1
It should show 1 only 3 times, since it is present in Table2 3 times.
Please help.
You're matching the value 1 with 4 rows in Table1 and 3 rows in Table2, which is why you're seeing 12 rows. You need an additional JOIN condition. You can add a ROW_NUMBER and do an INNER JOIN to achieve your desired result.
WITH Cte1 AS(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY Id1 ORDER BY (SELECT NULL))
FROM Table1
),
Cte2 AS(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY Id2 ORDER BY (SELECT NULL))
FROM Table2
)
SELECT c2.Id2
FROM Cte2 c2
INNER JOIN Cte1 c1
ON c1.Id1 = c2.Id2
AND c1.rn = c2.rn
However, you can achieve the desired result without using a JOIN.
SELECT *
FROM Table2 t2
WHERE EXISTS(
SELECT 1 FROM Table1 t1 WHERE t1.Id1 = t2.Id2
)
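The row multiplication and the EXISTS alternative can be checked with a small sketch such as this one, which loads the question's values into an in-memory SQLite database (an assumption made only so the example runs).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id1 INTEGER)")
conn.execute("CREATE TABLE Table2 (Id2 INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(v,) for v in (1, 1, 1, 1, 4, 5)])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(v,) for v in (2, 2, 1, 1, 1, 8)])

# Each of the 3 rows with value 1 in Table2 matches all 4 such rows in Table1.
joined = conn.execute(
    "SELECT t2.Id2 FROM Table2 t2 INNER JOIN Table1 t1 ON t2.Id2 = t1.Id1"
).fetchall()

# EXISTS only tests for a match, so each Table2 row appears at most once.
filtered = conn.execute(
    """
    SELECT t2.Id2 FROM Table2 t2
    WHERE EXISTS (SELECT 1 FROM Table1 t1 WHERE t1.Id1 = t2.Id2)
    """
).fetchall()

print(len(joined))  # 12
print(filtered)     # [(1,), (1,), (1,)]
```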
It's the expected behavior of Join Operation. It will match every row from the two tables, so you will get 12 rows containing value 1 in result of join query.
You can use below query to get desired result.
select ID2 from Table2 t2 WHERE ID2 IN (SELECT ID1 FROM Table1 t1)
select id2 from table2 t2 where exists ( select 1 from table1 t1 where t1.id1 = t2.id2)
Your join logic works fine, the problem is each of your ID2 is matching against all ID1s. A simple solution would be to join with a table of distinct ID1s to avoid this duplication.
select
t2.ID2
from Table2 t2
left join (select distinct * from Table1) t1
on t1.Id1=t2.Id2
where t1.ID1 is not null
;
The join returns every ID2 with ID1 populated in a column, and ID1 is null where there was no match. The where clause then drops those null rows, so only the matching ID2 values remain.

Oracle 11.2 SQL - help to condense data in ordered set

I have a data-set with a timestamp column and multiple identifier columns. I want to condense it to a single row for each "block" of adjacent rows with equal identifiers, when ordered by the timestamp. The min and max timestamp for each block is required.
Source Data:
TSTAMP ID1 ID2
t1 A B <= start of new block
t2 A B
t3 C D <= start of new block
t4 E F <= start of new block
t5 E F
t6 E F
t7 A B <= start of new block
t8 G H <= start of new block
Desired Result:
MIN_TSTAMP MAX_TSTAMP ID1 ID2
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
I thought this was ripe for a window-ing analytic function but I cannot partition without grouping ALL equal combinations of IDn - rather than only those in adjacent rows, when ordered by timestamp.
A workaround is to create a key column first in an in-line view that I can later group by i.e. with same value for each row in the block and different value for each block. I can do this using LAG analytic function to compare row values and then calling a PL/SQL function to return nextval/currval values of a sequence (calling nextval/currval directly in the SQL is restricted in this context).
select min(ilv.tstamp), max(ilv.tstamp), id1, id2
from (
select case when (id1 != lag(id1,1,'*') over (partition by (1) order by tstamp)
or id2 != lag(id2,1,'*') over (partition by (1) order by tstamp))
then
pk_seq_utils.gav_get_nextval
else
pk_seq_utils.gav_get_currval
end ident, t.*
from tab1 t
order by tstamp) ilv
group by ident, id1, id2
order by 1;
where the gav_get_xxx functions simply return currval/nextval from a sequence.
But I would like to use SQL only and avoid PL/SQL (as I could also write this easily in PL/SQL and pipe out the result-rows from a pipeline function).
Any ideas?
Thanks.
Tabibitosan to the rescue!
with sample_data as (select 't1' tstamp, 'A' id1, 'B' id2 from dual union all
select 't2' tstamp, 'A' id1, 'B' id2 from dual union all
select 't3' tstamp, 'C' id1, 'D' id2 from dual union all
select 't4' tstamp, 'E' id1, 'F' id2 from dual union all
select 't5' tstamp, 'E' id1, 'F' id2 from dual union all
select 't6' tstamp, 'E' id1, 'F' id2 from dual union all
select 't7' tstamp, 'A' id1, 'B' id2 from dual union all
select 't8' tstamp, 'G' id1, 'H' id2 from dual)
select min(tstamp) min_tstamp, max(tstamp) max_tstamp, id1, id2
from (select tstamp,
id1,
id2,
row_number() over (order by tstamp) - row_number() over (partition by id1, id2 order by tstamp) grp
from sample_data)
group by id1,
id2,
grp
order by min(tstamp);
MIN_TSTAMP MAX_TSTAMP ID1 ID2
---------- ---------- --- ---
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
You can use an analytic 'trick' to identify the gaps and islands, comparing the position of each row just against the tstamp across all rows with its position against tstamp just for that id1, id2 combination:
select tstamp, id1, id2,
row_number() over (partition by id1, id2 order by tstamp)
- row_number() over (order by tstamp) as block_id
from tab1;
TS I I BLOCK_ID
-- - - ----------
t1 A B 0
t2 A B 0
t3 C D -2
t4 E F -3
t5 E F -3
t6 E F -3
t7 A B -4
t8 G H -7
The actual value of block_id doesn't matter, just that it's unique for each block for the combination. You can then group using that:
select min(tstamp) as min_tstamp, max(tstamp) as max_tstamp, id1, id2
from (
select tstamp, id1, id2,
row_number() over (partition by id1, id2 order by tstamp)
- row_number() over (order by tstamp) as block_id
from tab1
)
group by id1, id2, block_id
order by min(tstamp);
MI MA I I
-- -- - -
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
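The row-number difference can be reproduced outside Oracle with a sketch like the one below. It assumes an in-memory SQLite database (3.25 or later), which is not the asker's Oracle 11.2, so treat it as an illustration of the technique rather than a test of the Oracle query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (tstamp TEXT, id1 TEXT, id2 TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [
        ("t1", "A", "B"), ("t2", "A", "B"), ("t3", "C", "D"), ("t4", "E", "F"),
        ("t5", "E", "F"), ("t6", "E", "F"), ("t7", "A", "B"), ("t8", "G", "H"),
    ],
)

# Within a run of adjacent equal ids both row numbers advance together,
# so their difference stays constant for the whole run.
blocks = conn.execute(
    """
    SELECT MIN(tstamp), MAX(tstamp), id1, id2
    FROM (
        SELECT tstamp, id1, id2,
               ROW_NUMBER() OVER (PARTITION BY id1, id2 ORDER BY tstamp)
             - ROW_NUMBER() OVER (ORDER BY tstamp) AS block_id
        FROM tab1
    ) AS numbered
    GROUP BY id1, id2, block_id
    ORDER BY MIN(tstamp)
    """
).fetchall()

for row in blocks:
    print(row)
```

The GROUP BY keeps id1 and id2 next to block_id because the difference alone can coincide for two different id combinations.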
You should be able to use the row_number window function to do this, like below:
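select
min(tstamp) mints, max(tstamp) maxts, id1, id2
from (
select
t.*,
row_number() over (order by tstamp)
- row_number() over (partition by id1, id2 order by tstamp) as rn
from t
) subq
group by id1, id2, rn
order by min(tstamp)
I haven't been able to test it on an Oracle database, but the same approach works on MSSQL, and the window functions behave the same way in both. Oracle requires the qualified t.* next to other select-list expressions and does not accept AS before a table alias, so the query is written without them. The result is ordered by min(tstamp) because rn itself is not guaranteed to increase with time.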
You need to do this step by step:
1. Detect ID changes with LAG, marking each change with a flag = 1.
2. Generate keys for the groups (i.e. adjacent records with the same ID) with SUM over the ID change flags (running total).
3. Group by the generated group key and get the min/max timestamp.
Query:
select
min(tstamp) as min_tstamp,
max(tstamp) as max_tstamp,
min(id1) as id1,
min(id2) as id2
from
(
select
grouped.*,
sum(newgroup) over (order by tstamp) as groupkey
from
(
select
mytable.*,
case when id1 <> lag(id1) over (order by tstamp)
or id2 <> lag(id2) over (order by tstamp)
then 1 else 0 end as newgroup
from mytable
order by tstamp
) grouped
)
group by groupkey
order by groupkey;
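To see the intermediate group keys that the LAG flag and the running SUM produce, here is a sketch on the question's sample rows. It assumes an in-memory SQLite database (3.25 or later) instead of Oracle, and the aliases `flagged` and `keyed` are names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (tstamp TEXT, id1 TEXT, id2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("t1", "A", "B"), ("t2", "A", "B"), ("t3", "C", "D"), ("t4", "E", "F"),
        ("t5", "E", "F"), ("t6", "E", "F"), ("t7", "A", "B"), ("t8", "G", "H"),
    ],
)

# Steps 1 and 2: flag each change of ids, then take a running total of the flags.
KEYED = """
    SELECT flagged.tstamp, flagged.id1, flagged.id2,
           SUM(newgroup) OVER (ORDER BY tstamp) AS groupkey
    FROM (
        SELECT mytable.*,
               CASE WHEN id1 <> LAG(id1) OVER (ORDER BY tstamp)
                      OR id2 <> LAG(id2) OVER (ORDER BY tstamp)
                    THEN 1 ELSE 0 END AS newgroup
        FROM mytable
    ) AS flagged
"""
group_keys = [row[3] for row in conn.execute(KEYED + " ORDER BY tstamp")]

# Step 3: collapse each group key to one row with its min/max timestamp.
blocks = conn.execute(
    f"""
    SELECT MIN(tstamp), MAX(tstamp), MIN(id1), MIN(id2)
    FROM ({KEYED}) AS keyed
    GROUP BY groupkey
    ORDER BY groupkey
    """
).fetchall()

print(group_keys)  # [0, 0, 1, 2, 2, 2, 3, 4]
print(blocks)
```

The first row has no predecessor, so LAG returns NULL, the comparison is unknown, and the CASE falls through to 0.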

How to map the 2 different record set one by one?

Let's say I have 2 SQL queries. Table A contains,
ID
--
1
1
1
2
3
4
This query,
Select distinct ID FROM A
gives me,
ID
--
1
2
3
4
Second one
Select ID2 FROM B
which gives me,
ID2
--
8
21
33
43
How do I get this record set?
ID1 ID2
--- ---
1 8
2 21
3 33
4 43
You did not specify which version of SQL Server you are using, but on SQL Server 2008+ one way to do this is to add row_number() to each table and then join on the row number:
select a.id, b.id2
from
(
select id, row_number() over(order by id) rn
from a
) a
inner join
(
select id2, row_number() over(order by id2) rn
from b
) b
on a.rn = b.rn
If you want to only use DISTINCT values, then you should be able to use:
select a.id, b.id2
from
(
select id, row_number() over(order by id) rn
from
(
select distinct id
from a
) a
) a
inner join
(
select id2, row_number() over(order by id2) rn
from b
) b
on a.rn = b.rn;
If you have a different number of rows in each table, then you might want to use a FULL OUTER JOIN:
select a.id, b.id2
from
(
select id, row_number() over(order by id) rn
from
(
select distinct id
from a
) a
) a
full outer join
(
select id2, row_number() over(order by id2) rn
from b
) b
on a.rn = b.rn;
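The difference between the inner and full outer variants shows up only when the two sides have different row counts. The plain-Python sketch below mimics the positional pairing with `zip` and `itertools.zip_longest`; the shortened list for B is hypothetical and chosen to make the counts differ.

```python
from itertools import zip_longest

a_ids = [1, 1, 1, 2, 3, 4]  # table A, with duplicates
b_ids = [8, 21, 33]         # hypothetical: B has one row fewer than distinct A

# Equivalent of SELECT DISTINCT plus row_number() over (order by ...).
distinct_a = sorted(set(a_ids))
sorted_b = sorted(b_ids)

# INNER JOIN on the row number: unmatched positions are dropped.
inner_pairs = list(zip(distinct_a, sorted_b))

# FULL OUTER JOIN on the row number: unmatched positions are padded with NULL.
full_pairs = list(zip_longest(distinct_a, sorted_b))

print(inner_pairs)  # [(1, 8), (2, 21), (3, 33)]
print(full_pairs)   # [(1, 8), (2, 21), (3, 33), (4, None)]
```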