Oracle SQL, how to select first row in a group? - sql

Here is my SQL Fiddle: http://sqlfiddle.com/#!4/75ab7/2
Basically, I have created a table and inserted some data into it:
CREATE TABLE subject (
id INT NOT NULL,
seq_num INT NOT NULL,
name VARCHAR(30) NOT NULL
);
INSERT INTO subject
(id, seq_num, name)
VALUES
(1, 1, 'sub_1_1');
INSERT INTO subject
(id, seq_num, name)
VALUES
(2, 1, 'sub_1_2');
INSERT INTO subject
(id, seq_num, name)
VALUES
(3, 2,'sub_2_1');
INSERT INTO subject
(id, seq_num, name)
VALUES
(4, 2, 'sub_2_2');
INSERT INTO subject
(id, seq_num, name)
VALUES
(5, 2, 'sub_2_3');
INSERT INTO subject
(id, seq_num, name)
VALUES
(6, 3, 'sub_3_1');
INSERT INTO subject
(id, seq_num, name)
VALUES
(7, 3, 'sub_3_1');
I run this select statement:
select
LISTAGG(TRIM(id), ',') WITHIN GROUP (ORDER BY 1) AS IDS,
seq_num,
LISTAGG(TRIM(name), ',') WITHIN GROUP (ORDER BY 1) AS NAMES
from
subject
group by
seq_num
order by
seq_num asc
The select statement result:
| ids | seq_num | names |
|-------|---------|-------------------------|
| 1,2 | 1 | sub_1_1,sub_1_2 |
| 3,4,5 | 2 | sub_2_1,sub_2_2,sub_2_3 |
| 6,7 | 3 | sub_3_1,sub_3_1 |
Can I generate something like this?
| ids | seq_num | names |
|-----|---------|---------|
| 1 | 1 | sub_1_1 |
| 3 | 2 | sub_2_1 |
| 6 | 3 | sub_3_1 |
That is, pick only the first row in each group.

Use row number:
select
id, seq_num, name
from
(
select id, seq_num, name,
row_number() over (partition by seq_num order by id) rn
from subject
) t
where rn = 1
order by seq_num;

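You can use KEEP with DENSE_RANK FIRST in Oracle. Order by id itself rather than trim(id), because TRIM converts the number to a string and the ordering would then be lexicographic ('10' sorts before '9'):
select seq_num,
max(id) keep (dense_rank first order by id) as first_id,
max(name) keep (dense_rank first order by id) as first_name
from subject
group by seq_num
order by seq_num asc;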

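This works for the sample data, but note that MIN(id) and MIN(name) are computed independently, so the two values are only guaranteed to come from the same row when the names sort in the same order as the ids: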
select
min(id) AS IDS,
seq_num,
min(name) AS NAMES
from
subject
group by
seq_num
order by
seq_num asc;
Hope it helps!
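To see where independent MIN() aggregates and a ROW_NUMBER() filter can diverge, here is a minimal sketch. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module as a stand-in for Oracle, and the rows are hypothetical ones chosen so that the lowest id in each group does not carry the lowest name.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subject ("
    "id INT NOT NULL, seq_num INT NOT NULL, name VARCHAR(30) NOT NULL)"
)
conn.executemany(
    "INSERT INTO subject (id, seq_num, name) VALUES (?, ?, ?)",
    [(1, 1, "zeta"), (2, 1, "alpha"), (3, 2, "mike"), (4, 2, "bravo")],
)

# Independent aggregates: id and name may be taken from different rows.
min_rows = conn.execute(
    """
    SELECT MIN(id), seq_num, MIN(name)
    FROM subject
    GROUP BY seq_num
    ORDER BY seq_num
    """
).fetchall()

# ROW_NUMBER numbers the rows inside each seq_num, so filtering on rn = 1
# returns one complete row per group.
first_rows = conn.execute(
    """
    SELECT id, seq_num, name
    FROM (
        SELECT id, seq_num, name,
               ROW_NUMBER() OVER (PARTITION BY seq_num ORDER BY id) AS rn
        FROM subject
    ) t
    WHERE rn = 1
    ORDER BY seq_num
    """
).fetchall()

print(min_rows)    # [(1, 1, 'alpha'), (3, 2, 'bravo')]
print(first_rows)  # [(1, 1, 'zeta'), (3, 2, 'mike')]
```

With this data the MIN() version pairs id 1 with the name from id 2, while the ROW_NUMBER() version keeps each row intact.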

Related

Alternative to DENSE_RANK

I have the following table (table1):
f_name | email
---------|---------------------
john | john123#hotmail.com
peter | peter456#gmail.com
johnny | john123#hotmail.com
peter8 | peter456#gmail.com
...
I would like to add a group number so that rows sharing the same email value get the same number:
f_name | email |Group |
---------|---------------------|------|
john | john123#hotmail.com | 1 |
peter | peter456#gmail.com | 2 |
johnny | john123#hotmail.com | 1 |
peter8 | peter456#gmail.com | 2 |
...
I use the following:
SELECT
email,
s_index = ROW_NUMBER() OVER(PARTITION BY [email] ORDER BY [email]),
t_index = DENSE_RANK() OVER (ORDER BY [email])
FROM dbo.table1
Is this the best way to do it for big data in Oracle? How can it be done in Impala?
Maybe much too slow, and maybe not applicable; but if you're looking for an alternative to dense_rank()...
If your table has - besides f_name and email - some unique id, then one could take the minimal id of each email-group as a value for the groupId. Of course, the groupId must have the same type as the id, and it should not matter that the values for groupId are not continuous, i.e. they will have gaps:
CREATE TABLE Table1
("id" int, "f_name" varchar2(9), "email" varchar2(21), "groupId" int)
\\
INSERT ALL
INTO Table1 ("id", "f_name", "email")
VALUES (1, 'john', 'john123#hotmail.com')
INTO Table1 ("id", "f_name", "email")
VALUES (2, 'peter', 'peter456#gmail.com')
INTO Table1 ("id", "f_name", "email")
VALUES (3, 'johnny', 'john123#hotmail.com')
INTO Table1 ("id", "f_name", "email")
VALUES (4, 'peter8', 'peter456#gmail.com')
SELECT * FROM dual
\\
update table1 set "groupId" =
(select min(t2."id") as groupId
from table1 t2
where table1."email" = t2."email"
group by t2."email"
)
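As a rough comparison of the two numbering schemes, the sketch below computes both DENSE_RANK() and the minimum id per email in one query. It assumes SQLite (3.25+) via Python's sqlite3 as a stand-in for Oracle, unquoted lowercase column names, and one extra hypothetical row (mary) added only to make the gaps visible.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the 'mary' row is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, f_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, f_name, email) VALUES (?, ?, ?)",
    [
        (1, "john", "john123#hotmail.com"),
        (2, "peter", "peter456#gmail.com"),
        (3, "johnny", "john123#hotmail.com"),
        (4, "peter8", "peter456#gmail.com"),
        (5, "mary", "mary#example.com"),
    ],
)

rows = conn.execute(
    """
    SELECT f_name,
           DENSE_RANK() OVER (ORDER BY email)   AS dense_group,   -- 1, 2, 3, ...
           MIN(id) OVER (PARTITION BY email)    AS min_id_group   -- may have gaps
    FROM table1
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# ('john', 1, 1)
# ('peter', 3, 2)
# ('johnny', 1, 1)
# ('peter8', 3, 2)
# ('mary', 2, 5)
```

Both columns group the rows identically; only the labels differ. The DENSE_RANK() labels depend on the sort order of email, whereas the minimum-id labels stay stable when rows with new emails are added.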

Merge where definition exists

I have 3 tables I'm working with here. ATTRIBUTE_MAP, GROUP_DEFINITIONS, and GROUP_MAP.
ATTRIBUTE_MAP contains the CUST_ID and the associated ATTRIBUTE_ID.
GROUP_DEFINITIONS defines a group. Its columns are GROUP_ID, ATTRIBUTE_1, VALUE_1, ATTRIBUTE_2, VALUE_2, ATTRIBUTE_3, VALUE_3. A group consists of 1 to 3 attributes with values. For example, an attribute could be 'State' with its value being 'New York'. Values can also be null for boolean attributes like 'Owns Car'.
GROUP_MAP simply maps the CUST_ID to a GROUP_ID.
Now, I'm trying to write a script that will look at ATTRIBUTE_MAP and see whether a customer falls into one of the groups defined in GROUP_DEFINITIONS. If the customer does, the script should insert/update a row in GROUP_MAP with the CUST_ID and GROUP_ID. The part I'm having trouble with is matching the attribute values.
Here is what I have so far:
merge GROUP_MAP gm using
( select am.CUST_ID
,am.ATTRIBUTE_ID
,am.START_DATE
,gd.GROUP_ID
,gd.ATTRIBUTE_1
,gd.VALUE_1
,gd.ATTRIBUTE_2
,gd.VALUE_2
,gd.ATTRIBUTE_3
,gd.VALUE_3
from ATTRIBUTE_MAP am, GROUP_DEFINITIONS gd ) src
on gm.GROUP_ID=src.GROUP_ID
AND gm.CUST_ID=src.CUST_ID
when not matched then -- create association in GROUP_MAP
insert (CUST_ID, GROUP_ID, FROM_DATE)
values (src.CUST_ID, src.GROUP_ID, src.START_DATE);
Am I approaching this correctly? I'm guessing I just need to improve the nested select statement in my merge so that it joins ATTRIBUTE_MAP and GROUP_DEFINITIONS, and then go from there. Any help/suggestions would be appreciated.
Here's an example for reference:
ATTRIBUTE_MAP:
+---------+--------------+------------+
| CUST_ID | ATTRIBUTE_ID | VALUE |
+---------+--------------+------------+
| 50 | 1 | 'New York' |
+---------+--------------+------------+
| 50 | 2 | |
+---------+--------------+------------+
GROUP_DEFINITIONS:
+----------+-------------+------------+-------------+---------+-------------+---------+
| GROUP_ID | ATTRIBUTE_1 | VALUE_1 | ATTRIBUTE_2 | VALUE_2 | ATTRIBUTE_3 | VALUE_3 |
+----------+-------------+------------+-------------+---------+-------------+---------+
| 10 | 1 | 'New York' | 2 | | | |
+----------+-------------+------------+-------------+---------+-------------+---------+
| 20 | 2 | | | | | |
+----------+-------------+------------+-------------+---------+-------------+---------+
and so the script should generate (in GROUP_MAP):
+---------+----------+--------+
| CUST_ID | GROUP_ID | DATE |
+---------+----------+--------+
| 50 | 10 | *date* |
+---------+----------+--------+
| 50 | 20 | *date* |
+---------+----------+--------+
I could be totally off, but it looks like your inner select needs to be something like this. If I understand what you are trying to do, this will return a unique list of CUST_ID, GROUP_ID, START_DATE where all of the customer attributes match all of the group attributes. Just wrote this fast, so might have some errors, but it might get you going the right direction.
with gd as (
SELECT GROUP_ID, ATTRIBUTE_1 as ATTRIBUTE_ID, VALUE_1 as VALUE from GROUP_DEFINITIONS
UNION
SELECT GROUP_ID, ATTRIBUTE_2, VALUE_2 from GROUP_DEFINITIONS
UNION
SELECT GROUP_ID, ATTRIBUTE_3, VALUE_3 from GROUP_DEFINITIONS
)
MERGE GROUP_MAP gm
USING
(
SELECT am.CUST_ID, gd.GROUP_ID, am.START_DATE
FROM ATTRIBUTE_MAP am
JOIN gd
ON am.ATTRIBUTE_ID = gd.ATTRIBUTE_ID AND coalesce(am.VALUE, '') = coalesce(gd.VALUE, '')
join (select GROUP_ID, count(*) as ATTR_COUNT from gd where ATTRIBUTE_ID is NOT NULL group by GROUP_ID) as gc
on gd.GROUP_ID = gc.GROUP_ID
GROUP BY am.CUST_ID, gd.GROUP_ID, am.START_DATE
HAVING count(am.ATTRIBUTE_ID) = max(gc.ATTR_COUNT)
) src
ON gm.GROUP_ID = src.GROUP_ID
AND gm.CUST_ID = src.CUST_ID
WHEN NOT MATCHED
THEN -- create association in GROUP_MAP
INSERT(CUST_ID,
GROUP_ID,
FROM_DATE) VALUES
(src.CUST_ID, src.GROUP_ID, src.START_DATE);
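The matching idea above (unpivot the group definitions, join on attribute and value with NULL treated as equal, then keep only groups whose attributes all matched) can be tried out with a small sketch. It assumes SQLite via Python's sqlite3 rather than SQL Server, so MERGE is replaced by INSERT ... WHERE NOT EXISTS, and the null-safe comparison uses SQLite's IS operator. The START_DATE values, the choice of MIN(start_date), and customer 60 are hypothetical additions for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; dates and customer 60
# are made up for the example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE attribute_map (cust_id INT, attribute_id INT, value TEXT, start_date TEXT);
    CREATE TABLE group_definitions (
        group_id INT,
        attribute_1 INT, value_1 TEXT,
        attribute_2 INT, value_2 TEXT,
        attribute_3 INT, value_3 TEXT);
    CREATE TABLE group_map (cust_id INT, group_id INT, from_date TEXT);

    INSERT INTO attribute_map VALUES
        (50, 1, 'New York', '2020-01-01'),
        (50, 2, NULL,       '2020-01-01'),
        (60, 2, NULL,       '2020-02-01');
    INSERT INTO group_definitions VALUES
        (10, 1, 'New York', 2, NULL, NULL, NULL),
        (20, 2, NULL, NULL, NULL, NULL, NULL);
    """
)

sync_sql = """
WITH gd AS (                       -- one row per (group, attribute slot)
    SELECT group_id, attribute_1 AS attribute_id, value_1 AS value FROM group_definitions
    UNION ALL
    SELECT group_id, attribute_2, value_2 FROM group_definitions
    UNION ALL
    SELECT group_id, attribute_3, value_3 FROM group_definitions
),
gd_attrs AS (                      -- drop the unused slots
    SELECT group_id, attribute_id, value FROM gd WHERE attribute_id IS NOT NULL
),
gc AS (                            -- how many attributes each group requires
    SELECT group_id, COUNT(*) AS attr_count FROM gd_attrs GROUP BY group_id
)
INSERT INTO group_map (cust_id, group_id, from_date)
SELECT am.cust_id, g.group_id, MIN(am.start_date)
FROM attribute_map am
JOIN gd_attrs g
  ON am.attribute_id = g.attribute_id
 AND am.value IS g.value           -- null-safe equality in SQLite
JOIN gc ON gc.group_id = g.group_id
WHERE NOT EXISTS (                 -- skip associations that already exist
    SELECT 1 FROM group_map gm
    WHERE gm.cust_id = am.cust_id AND gm.group_id = g.group_id
)
GROUP BY am.cust_id, g.group_id
HAVING COUNT(*) = MAX(gc.attr_count)   -- every required attribute matched
"""

conn.execute(sync_sql)
conn.execute(sync_sql)  # running it again should add nothing

rows = conn.execute(
    "SELECT cust_id, group_id, from_date FROM group_map ORDER BY cust_id, group_id"
).fetchall()
print(rows)
# [(50, 10, '2020-01-01'), (50, 20, '2020-01-01'), (60, 20, '2020-02-01')]
```

Customer 60 has only attribute 2, so it lands in group 20 but not in group 10, which also requires attribute 1 to be 'New York'.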
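If I understood the problem correctly, this should do it.
Please note that I have used GETDATE() because I do not have the START_DATE field; you will need to substitute it in the code. Note also that the join below matches on ATTRIBUTE_ID only; it does not compare values or require every attribute of a group to match.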
SAMPLE DATA:
CREATE TABLE #ATTRIBUTE_MAP(CUST_ID INT,
ATTRIBUTE_ID INT,
VALUE VARCHAR(20));
INSERT INTO #ATTRIBUTE_MAP
VALUES
(50, 1, 'New York'),
(50, 2, NULL);
CREATE TABLE #GROUP_DEFINITIONS(GROUP_ID INT,
ATTRIBUTE_1 INT,
VALUE_1 VARCHAR(20),
ATTRIBUTE_2 INT,
VALUE_2 VARCHAR(20),
ATTRIBUTE_3 INT,
VALUE_3 VARCHAR(20));
INSERT INTO #GROUP_DEFINITIONS
VALUES
(10, 1, 'New York', 2, NULL, NULL, NULL),
(20, 2, NULL, NULL, NULL, NULL, NULL);
CREATE TABLE #GROUP_MAP(CUST_ID INT,
GROUP_ID INT,
[FROM_DATE] DATE);
QUERY:
MERGE #GROUP_MAP gm
USING
(SELECT DISTINCT
am.CUST_ID,
CAST(GETDATE() AS DATE) AS [START_DATE], --<-- you will need to change this
gd.GROUP_ID
FROM #ATTRIBUTE_MAP am
INNER JOIN
(
SELECT GROUP_ID,
ATTRIBUTE_1 AS ATTRIBUTE_ID,
VALUE_1
FROM #GROUP_DEFINITIONS
UNION ALL
SELECT GROUP_ID,
ATTRIBUTE_2,
VALUE_2
FROM #GROUP_DEFINITIONS
UNION ALL
SELECT GROUP_ID,
ATTRIBUTE_3,
VALUE_3
FROM #GROUP_DEFINITIONS) gd ON am.ATTRIBUTE_ID = gd.ATTRIBUTE_ID) src
ON gm.GROUP_ID = src.GROUP_ID
AND gm.CUST_ID = src.CUST_ID
WHEN NOT MATCHED
THEN -- create association in GROUP_MAP
INSERT(CUST_ID,
GROUP_ID,
FROM_DATE) VALUES
(src.CUST_ID, src.GROUP_ID, src.START_DATE);
VERIFY RESULT:
SELECT CUST_ID , GROUP_ID , FROM_DATE
FROM #GROUP_MAP;
RESULT:

Get id of max value in group

I have a table, and I would like to get the id of the row in each group that has the max value in a column, but I have a problem.
SELECT group_id, MAX(time)
FROM mytable
GROUP BY group_id
This way I get the correct rows, but I need the id:
SELECT id,group_id,MAX(time)
FROM mytable
GROUP BY id,group_id
This way I get all the rows. How can I get the id of the row with the max time in each group?
Sample Data
id = 1, group_id = 1, time = 2014.01.03
id = 2, group_id = 1, time = 2014.01.04
id = 3, group_id = 2, time = 2014.01.04
id = 4, group_id = 2, time = 2014.01.02
id = 5, group_id = 3, time = 2014.01.01
and from that I should get ids 2, 3, 5
Thanks!
Use your working query as a sub-query, like this:
SELECT `id`
FROM `mytable`
WHERE (`group_id`, `time`) IN (
SELECT `group_id`, MAX(`time`) as `time`
FROM `mytable`
GROUP BY `group_id`
)
Have a look at the below demo
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(id INT , group_id INT , time_st DATE);
INSERT INTO mytable VALUES(1, 1, '2014-01-03'),(2, 1, '2014-01-04'),(3, 2, '2014-01-04'),(4, 2, '2014-01-02'),(5, 3, '2014-01-01');
/** Check all data **/
SELECT * FROM mytable;
+------+----------+------------+
| id | group_id | time_st |
+------+----------+------------+
| 1 | 1 | 2014-01-03 |
| 2 | 1 | 2014-01-04 |
| 3 | 2 | 2014-01-04 |
| 4 | 2 | 2014-01-02 |
| 5 | 3 | 2014-01-01 |
+------+----------+------------+
/** Query for Actual output**/
SELECT
id
FROM
mytable
JOIN
(
SELECT group_id, MAX(time_st) as max_time
FROM mytable GROUP BY group_id
) max_time_table
ON mytable.group_id = max_time_table.group_id AND mytable.time_st = max_time_table.max_time;
+------+
| id |
+------+
| 2 |
| 3 |
| 5 |
+------+
When multiple groups may contain the same value, you could use:
SELECT subq.id
FROM (SELECT id,
time,
MAX(time) OVER (PARTITION BY group_id) as max_time
FROM mytable) as subq
WHERE subq.time = subq.max_time
The subquery here generates a new column (max_time) that contains the maximum time per group. We can then filter on time and max_time being identical. Note that this still returns multiple rows per group if the maximum value occurs multiple times within the same group.
Full example:
CREATE TABLE test (
id INT,
group_id INT,
value INT
);
INSERT INTO test (id, group_id, value) VALUES (1, 1, 100);
INSERT INTO test (id, group_id, value) VALUES (2, 1, 200);
INSERT INTO test (id, group_id, value) VALUES (3, 1, 300);
INSERT INTO test (id, group_id, value) VALUES (4, 2, 100);
INSERT INTO test (id, group_id, value) VALUES (5, 2, 300);
INSERT INTO test (id, group_id, value) VALUES (6, 2, 200);
INSERT INTO test (id, group_id, value) VALUES (7, 3, 300);
INSERT INTO test (id, group_id, value) VALUES (8, 3, 200);
INSERT INTO test (id, group_id, value) VALUES (9, 3, 100);
select * from test;
id | group_id | value
----+----------+-------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 300
4 | 2 | 100
5 | 2 | 300
6 | 2 | 200
7 | 3 | 300
8 | 3 | 200
9 | 3 | 100
(9 rows)
SELECT subq.id
FROM (SELECT id,
value,
MAX(value) OVER (partition by group_id) as max_value
FROM test) as subq
WHERE subq.value = subq.max_value;
id
----
3
5
7
(3 rows)
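The tie behaviour of the windowed MAX() filter can be checked with a small sketch. It assumes SQLite (3.25+) via Python's sqlite3, and hypothetical rows in which ids 2 and 3 share the maximum value of group 1. The second query is one possible way to force a single row per group, using ROW_NUMBER() with id as an arbitrary tie-breaker.

```python
import sqlite3

# Assumption: SQLite stands in for the target database; data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INT, group_id INT, value INT)")
conn.executemany(
    "INSERT INTO test (id, group_id, value) VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 300), (3, 1, 300), (4, 2, 200)],
)

# Windowed MAX: every row that equals its group's maximum is returned.
with_ties = [
    r[0]
    for r in conn.execute(
        """
        SELECT subq.id
        FROM (SELECT id, value,
                     MAX(value) OVER (PARTITION BY group_id) AS max_value
              FROM test) AS subq
        WHERE subq.value = subq.max_value
        ORDER BY subq.id
        """
    )
]

# ROW_NUMBER with a tie-breaker: exactly one row per group.
one_per_group = [
    r[0]
    for r in conn.execute(
        """
        SELECT id
        FROM (SELECT id,
                     ROW_NUMBER() OVER (PARTITION BY group_id
                                        ORDER BY value DESC, id) AS rn
              FROM test) AS subq
        WHERE rn = 1
        ORDER BY id
        """
    )
]

print(with_ties)      # [2, 3, 4]
print(one_per_group)  # [2, 4]
```

Which tie-breaker is appropriate (lowest id, latest insert, and so on) depends on the data; id is used here only to make the result deterministic.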

Oracle duplicate row N times where N is a column

I'm new to Oracle and I'm trying to do something a little unusual. Given this table and data I need to select each row, and duplicate ones where DupCount is greater than 1.
create table TestTable
(
Name VARCHAR(10),
DupCount NUMBER
);
INSERT INTO TestTable VALUES ('Jane', 1);
INSERT INTO TestTable VALUES ('Mark', 2);
INSERT INTO TestTable VALUES ('Steve', 1);
INSERT INTO TestTable VALUES ('Jeff', 3);
Desired Results:
Name DupCount
--------- -----------
Jane 1
Mark 2
Mark 2
Steve 1
Jeff 3
Jeff 3
Jeff 3
If this isn't possible via a single select statement any help with a stored procedure would be greatly appreciated.
You can do it with a hierarchical query:
Query 1:
WITH levels AS (
SELECT LEVEL AS lvl
FROM DUAL
CONNECT BY LEVEL <= ( SELECT MAX( DupCount ) FROM TestTable )
)
SELECT Name,
DupCount
FROM TestTable
INNER JOIN
levels
ON ( lvl <= DupCount )
ORDER BY Name
Results:
| NAME | DUPCOUNT |
|-------|----------|
| Jane | 1 |
| Jeff | 3 |
| Jeff | 3 |
| Jeff | 3 |
| Mark | 2 |
| Mark | 2 |
| Steve | 1 |
You can do this with a recursive CTE. It would look like this:
with cte (name, dupcount, temp) as
(
select name,
dupcount,
dupcount as temp
from testtable
union all
select name,
dupcount,
temp-1 as temp
from cte
where temp > 1
)
select name,
dupcount
from cte
order by name
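For a quick check of the recursive approach outside Oracle, here is a minimal sketch that assumes SQLite via Python's sqlite3 (SQLite needs the RECURSIVE keyword style shown here; the counter column is renamed to remaining for readability, which is my own choice).

```python
import sqlite3

# Assumption: SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Name VARCHAR(10), DupCount NUMBER)")
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?)",
    [("Jane", 1), ("Mark", 2), ("Steve", 1), ("Jeff", 3)],
)

rows = conn.execute(
    """
    WITH RECURSIVE cte (name, dupcount, remaining) AS (
        -- anchor: every row once, with a counter set to DupCount
        SELECT Name, DupCount, DupCount FROM TestTable
        UNION ALL
        -- recursive step: emit the row again while the counter is above 1
        SELECT name, dupcount, remaining - 1
        FROM cte
        WHERE remaining > 1
    )
    SELECT name, dupcount
    FROM cte
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
# ('Jane', 1)
# ('Jeff', 3)
# ('Jeff', 3)
# ('Jeff', 3)
# ('Mark', 2)
# ('Mark', 2)
# ('Steve', 1)
```

Note that in this form a row with a DupCount of 0 would still appear once, because the anchor query selects every row regardless of its count.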

can this be done in one sql query?

I have a table indexed on the field name.
For a given value of name, say "name1", give me that row as well as the N rows before and the N rows after it (alphabetically).
I did it with nested select statements. Replace the 1 in rownumber-1 and rownumber+1 with whatever you want N to be, change the table name, and replace the asterisk with the correct column names. Let me know if you have any problems with this.
select * from
(
Select *
,row_number() over (order by firstname desc) as 'rowNumber'
from attendees
) as temp
where rowNumber between
(
select rownumber-1
from
(
Select *, row_number() over (order by firstname desc) as 'rowNumber'
from attendees
) as temp
where firstname = 'name1') AND (
select rownumber+1
from
(
Select *, row_number() over (order by firstname desc) as 'rowNumber'
from attendees
) as temp
where firstname = 'name1')
The following gets you the row with name = 'name4', the two rows before that, and the two rows after that.
drop table t;
create table t(
name varchar(20)
,primary key(name)
);
insert into t(name) values('name1');
insert into t(name) values('name2');
insert into t(name) values('name3');
insert into t(name) values('name4');
insert into t(name) values('name5');
insert into t(name) values('name6');
insert into t(name) values('name7');
commit;
(select name from t where name = 'name4')
union all
(select name from t where name > 'name4' order by name asc limit 2)
union all
(select name from t where name < 'name4' order by name desc limit 2)
order by name;
+-------+
| name |
+-------+
| name2 |
| name3 |
| name4 |
| name5 |
| name6 |
+-------+
Edit:
Added descending order by as pointed out by cyberkiwi (otherwise I would have gotten the "first" 2 items on the wrong end).
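Both answers can also be expressed as a single query that numbers the rows once and then keeps a window of N positions around the target. The sketch below assumes SQLite (3.25+) via Python's sqlite3 and the sample table t from the answer above; the helper name neighbors and its parameters are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name VARCHAR(20) PRIMARY KEY)")
conn.executemany(
    "INSERT INTO t (name) VALUES (?)",
    [(f"name{i}",) for i in range(1, 8)],  # name1 .. name7
)


def neighbors(conn, target, n):
    """Return the target row plus up to n names before and after it."""
    sql = """
    WITH numbered AS (
        SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn
        FROM t
    )
    SELECT name
    FROM numbered
    WHERE rn BETWEEN (SELECT rn FROM numbered WHERE name = :target) - :n
                 AND (SELECT rn FROM numbered WHERE name = :target) + :n
    ORDER BY name
    """
    return [r[0] for r in conn.execute(sql, {"target": target, "n": n})]


print(neighbors(conn, "name4", 2))  # ['name2', 'name3', 'name4', 'name5', 'name6']
print(neighbors(conn, "name1", 2))  # ['name1', 'name2', 'name3']
print(neighbors(conn, "nobody", 2))  # []
```

Near either end of the table the window is simply cut short, and a target that does not exist yields no rows. On a large table the UNION ALL variant with LIMIT may use the index on name more effectively, since this version numbers every row.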