Alternative to DENSE_RANK - sql

I have the following table (table1):
f_name | email
---------|---------------------
john | john123#hotmail.com
peter | peter456#gmail.com
johnny | john123#hotmail.com
peter8 | peter456#gmail.com
...
I would like to add a Group number that is the same for all rows sharing an email value:
f_name | email |Group |
---------|---------------------|------|
john | john123#hotmail.com | 1 |
peter | peter456#gmail.com | 2 |
johnny | john123#hotmail.com | 1 |
peter8 | peter456#gmail.com | 2 |
...
I use the following:
SELECT
email,
s_index = ROW_NUMBER() OVER(PARTITION BY [email] ORDER BY [email]),
t_index = DENSE_RANK() OVER (ORDER BY [email])
FROM dbo.table1
Is it the best way to do it for big data in Oracle? How can it be done in Impala?

Maybe much too slow, and maybe not applicable; but if you're looking for an alternative to dense_rank()...
If your table has, besides f_name and email, some unique id, then you could take the minimal id of each email group as the value for groupId. Of course, groupId must have the same type as id, and it must not matter that the groupId values are not consecutive, i.e. they will have gaps:
CREATE TABLE Table1
  ("id" int, "f_name" varchar2(9), "email" varchar2(21), "groupId" int);
INSERT ALL
  INTO Table1 ("id", "f_name", "email")
    VALUES (1, 'john', 'john123#hotmail.com')
  INTO Table1 ("id", "f_name", "email")
    VALUES (2, 'peter', 'peter456#gmail.com')
  INTO Table1 ("id", "f_name", "email")
    VALUES (3, 'johnny', 'john123#hotmail.com')
  INTO Table1 ("id", "f_name", "email")
    VALUES (4, 'peter8', 'peter456#gmail.com')
SELECT * FROM dual;
-- set each row's groupId to the smallest id that shares its email
update table1 set "groupId" =
  (select min(t2."id") as groupId
   from table1 t2
   where table1."email" = t2."email"
   group by t2."email"
  );
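To compare the two numbering schemes side by side, here is a minimal sketch. It assumes Python's sqlite3 module with SQLite 3.25 or later (needed for window functions) as a stand-in for Oracle or Impala; the column aliases dense_group and min_id_group are made up for the demo.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption of this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, f_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "john", "john123#hotmail.com"),
        (2, "peter", "peter456#gmail.com"),
        (3, "johnny", "john123#hotmail.com"),
        (4, "peter8", "peter456#gmail.com"),
    ],
)

# dense_group: consecutive numbers in email order.
# min_id_group: smallest id per email; stable, but may have gaps.
rows = conn.execute(
    """
    SELECT f_name,
           DENSE_RANK() OVER (ORDER BY email)  AS dense_group,
           MIN(id) OVER (PARTITION BY email)   AS min_id_group
    FROM table1
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

On this sample both columns happen to agree; with more emails the min-id values would generally differ from the dense rank while still identifying the same groups.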

Related

Get records having the same value in 2 columns but a different value in a 3rd column

I am having trouble writing a query that will return all records where 2 columns have the same value but a different value in a 3rd column. I am looking for the records where the Item_Type and Location_ID are the same, but the Sub_Location_ID is different.
The table looks like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 1 | 00001 | 20 | 78 |
| 2 | 00001 | 110 | 124 |
| 3 | 00001 | 110 | 124 |
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
The result I am trying to get would look like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
I have been trying to use the following query:
SELECT *
FROM Table1
WHERE Item_Type IN (
SELECT Item_Type
FROM Table1
GROUP BY Item_Type
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
)
But it returns all records with the same Item_Type and a different Sub_Location_ID, not all records with the same Item_Type AND Location_ID but a different Sub_Location_ID.
This should do the trick...
-- some test data...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;
CREATE TABLE #TestData (
Item_ID INT NOT NULL PRIMARY KEY,
Item_Type CHAR(5) NOT NULL,
Location_ID INT NOT NULL,
Sub_Location_ID INT NOT NULL
);
INSERT #TestData (Item_ID, Item_Type, Location_ID, Sub_Location_ID) VALUES
(1, '00001', 20, 78),
(2, '00001', 110, 124),
(3, '00001', 110, 124),
(4, '00002', 3, 18),
(5, '00002', 3, 25);
-- adding a covering index will eliminate the sort operation...
CREATE NONCLUSTERED INDEX ix_indexname ON #TestData (Item_Type, Location_ID, Sub_Location_ID, Item_ID);
-- the actual solution...
WITH
cte_count_group AS (
SELECT
td.Item_ID,
td.Item_Type,
td.Location_ID,
td.Sub_Location_ID,
cnt_grp_2 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID),
cnt_grp_3 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID, td.Sub_Location_ID)
FROM
#TestData td
)
SELECT
cg.Item_ID,
cg.Item_Type,
cg.Location_ID,
cg.Sub_Location_ID
FROM
cte_count_group cg
WHERE
cg.cnt_grp_2 > 1
AND cg.cnt_grp_3 < cg.cnt_grp_2;
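To see why comparing the two counts works, here is a small sketch that prints both counts per row. It assumes Python's sqlite3 module with SQLite 3.25+ instead of SQL Server, and the table name test_data is a stand-in for #TestData.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_data (Item_ID INTEGER PRIMARY KEY, Item_Type TEXT, "
    "Location_ID INTEGER, Sub_Location_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?, ?, ?)",
    [
        (1, "00001", 20, 78),
        (2, "00001", 110, 124),
        (3, "00001", 110, 124),
        (4, "00002", 3, 18),
        (5, "00002", 3, 25),
    ],
)

# cnt_grp_2: rows sharing Item_Type + Location_ID.
# cnt_grp_3: rows that additionally share Sub_Location_ID.
counts = conn.execute(
    """
    SELECT Item_ID,
           COUNT(*) OVER (PARTITION BY Item_Type, Location_ID) AS cnt_grp_2,
           COUNT(*) OVER (PARTITION BY Item_Type, Location_ID, Sub_Location_ID) AS cnt_grp_3
    FROM test_data
    ORDER BY Item_ID
    """
).fetchall()

# Same predicate as the WHERE clause above, applied in Python for visibility.
matching_ids = [item_id for item_id, c2, c3 in counts if c2 > 1 and c3 < c2]
print(counts)
print(matching_ids)
```

A row qualifies when its two-column group is larger than its three-column group, meaning some other row in the same Item_Type/Location_ID group has a different Sub_Location_ID.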
You can use exists (replace Table1 with your table name):
select t.*
from Table1 t
where exists (select 1
from Table1 t1
where t.Item_Type = t1.Item_Type and
t.Location_ID = t1.Location_ID and
t.Sub_Location_ID <> t1.Sub_Location_ID
);
SQL Server has no row-value (tuple) IN, so you can emulate it by concatenating the columns into a single string. This assumes '#' never occurs in Item_Type:
SELECT *
FROM Table1
WHERE Item_Type+'#'+Cast(Location_ID as varchar(20)) IN (
SELECT Item_Type+'#'+Cast(Location_ID as varchar(20))
FROM Table1
GROUP BY Item_Type, Location_ID
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
);
The downside is that the expression in the WHERE clause is non-sargable, so an index on these columns cannot be used for a seek.
I think you can use exists:
select t1.*
from table1 t1
where exists (select 1
from table1 tt1
where tt1.Item_Type = t1.Item_Type and
tt1.Location_ID = t1.Location_ID and
tt1.Sub_Location_ID <> t1.Sub_Location_ID
);
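As a quick cross-check of the EXISTS approach, here is a sketch using Python's sqlite3 module (an assumption; the thread targets SQL Server). SQLite does accept a row-value IN, so the grouped variant can be written without string concatenation; the table name items is made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (Item_ID INTEGER PRIMARY KEY, Item_Type TEXT, "
    "Location_ID INTEGER, Sub_Location_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, "00001", 20, 78),
        (2, "00001", 110, 124),
        (3, "00001", 110, 124),
        (4, "00002", 3, 18),
        (5, "00002", 3, 25),
    ],
)

# EXISTS: keep a row if a sibling row differs only in Sub_Location_ID.
exists_ids = [r[0] for r in conn.execute(
    """
    SELECT t.Item_ID
    FROM items t
    WHERE EXISTS (SELECT 1
                  FROM items t1
                  WHERE t1.Item_Type = t.Item_Type
                    AND t1.Location_ID = t.Location_ID
                    AND t1.Sub_Location_ID <> t.Sub_Location_ID)
    ORDER BY t.Item_ID
    """
)]

# Row-value IN (SQLite 3.15+): groups with more than one distinct sub-location.
grouped_ids = [r[0] for r in conn.execute(
    """
    SELECT Item_ID
    FROM items
    WHERE (Item_Type, Location_ID) IN (
        SELECT Item_Type, Location_ID
        FROM items
        GROUP BY Item_Type, Location_ID
        HAVING COUNT(DISTINCT Sub_Location_ID) > 1)
    ORDER BY Item_ID
    """
)]

print(exists_ids, grouped_ids)
```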

SQLite query - filter name where each associated id is contained within a set of ids

I'm trying to work out a query that will find me all of the distinct Names whose LocationIDs are in a given set of ids. The catch is if any of the LocationIDs associated with a distinct Name are not in the set, then the Name should not be in the results.
Say I have the following table:
ID | LocationID | ... | Name
-----------------------------
1 | 1 | ... | A
2 | 1 | ... | B
3 | 2 | ... | B
I'm needing a query similar to
SELECT DISTINCT Name FROM table WHERE LocationID IN (1, 2);
The problem with the above is it's just checking if the LocationID is 1 OR 2, this would return the following:
A
B
But what I need it to return is
B
Since B is the only Name where both of its LocationIDs are in the set (1, 2)
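You can write two subqueries:
one counts the distinct LocationIDs matching your condition across the whole table;
the other counts the distinct matching LocationIDs for each Name.
Then join them on the count, so that only a Name that matches every LocationID counted by the first subquery is returned.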
Schema (SQLite v3.17)
CREATE TABLE T(
ID int,
LocationID int,
Name varchar(5)
);
INSERT INTO T VALUES (1, 1,'A');
INSERT INTO T VALUES (2, 1,'B');
INSERT INTO T VALUES (3, 2,'B');
Query #1
SELECT t2.Name
FROM
(
SELECT COUNT(DISTINCT LocationID) cnt
FROM T
WHERE LocationID IN (1, 2)
) t1
JOIN
(
SELECT COUNT(DISTINCT LocationID) cnt,Name
FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
) t2 on t1.cnt = t2.cnt;
| Name |
| ---- |
| B |
You can just use aggregation. Assuming no duplicates in your table:
SELECT Name
FROM table
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(*) = 2;
If Name/LocationID pairs can be duplicated, use HAVING COUNT(DISTINCT LocationID) = 2.
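The aggregation approach can be generalised so the count is derived from the id set instead of being hard-coded. A minimal sketch, assuming Python's sqlite3 module; the variable name wanted is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER, LocationID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 1, "A"), (2, 1, "B"), (3, 2, "B")],
)

wanted = (1, 2)  # the set of LocationIDs every returned Name must cover
placeholders = ",".join("?" * len(wanted))

query = f"""
    SELECT Name
    FROM T
    WHERE LocationID IN ({placeholders})
    GROUP BY Name
    HAVING COUNT(DISTINCT LocationID) = ?
    ORDER BY Name
"""
# Bind the ids for the IN list, then the size of the set for the HAVING clause.
names = [r[0] for r in conn.execute(query, (*wanted, len(wanted)))]
print(names)
```

COUNT(DISTINCT ...) is used so that duplicated Name/LocationID pairs do not inflate the count.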

Create an SQL query from two tables in postgresql

I have two tables as shown in the image. I want to create a SQL query in postgresql to get the pkey and minimum count for each unique 'pkey' in table 1 where 'name1' is not present in the array of column 'name' in table 2.
'name' is an array.
You can use ANY to check whether an element exists in the name array.
create table t1 (pkey int, cnt int);
create table t2 (pkey int, name text[]);
insert into t1 values (1, 11),(1, 9),(2, 14),(2, 15),(3, 21),(3,16);
insert into t2 values
(1, array['name1','name2']),
(1, array['name3','name2']),
(2, array['name4','name1']),
(2, array['name5','name2']),
(3, array['name2','name3']),
(3, array['name4','name5']);
select pkey
from t2
where 'name1' = any(name);
| pkey |
| ---: |
| 1 |
| 2 |
select t1.pkey, min(cnt) count
from t1
where not exists (select 1
from t2
where t2.pkey = t1.pkey
and 'name1' = any(name))
group by t1.pkey;
| pkey | count |
| ---: | ----: |
| 3 | 16 |

Oracle SQL, how to select first row in a group?

Here is my SQL Fiddle: http://sqlfiddle.com/#!4/75ab7/2
Basically, I have created a table and inserted some data into it.
CREATE TABLE subject (
id INT NOT NULL,
seq_num INT NOT NULL,
name VARCHAR(30) NOT NULL
);
INSERT INTO subject
(id, seq_num, name)
VALUES
(1, 1, 'sub_1_1');
INSERT INTO subject
(id, seq_num, name)
VALUES
(2, 1, 'sub_1_2');
INSERT INTO subject
(id, seq_num, name)
VALUES
(3, 2,'sub_2_1');
INSERT INTO subject
(id, seq_num, name)
VALUES
(4, 2, 'sub_2_2');
INSERT INTO subject
(id, seq_num, name)
VALUES
(5, 2, 'sub_2_3');
INSERT INTO subject
(id, seq_num, name)
VALUES
(6, 3, 'sub_3_1');
INSERT INTO subject
(id, seq_num, name)
VALUES
(7, 3, 'sub_3_1');
I run this select statement:
select
LISTAGG(TRIM(id), ',') WITHIN GROUP (ORDER BY 1) AS IDS,
seq_num,
LISTAGG(TRIM(name), ',') WITHIN GROUP (ORDER BY 1) AS NAMES
from
subject
group by
seq_num
order by
seq_num asc
The select statement result:
| ids | seq_num | names |
|-------|---------|-------------------------|
| 1,2 | 1 | sub_1_1,sub_1_2 |
| 3,4,5 | 2 | sub_2_1,sub_2_2,sub_2_3 |
| 6,7 | 3 | sub_3_1,sub_3_1 |
Can I generate something like this?
| ids | seq_num | names |
|-----|---------|---------|
| 1 | 1 | sub_1_1 |
| 3 | 2 | sub_2_1 |
| 6 | 3 | sub_3_1 |
That is, pick only the first row in each group.
Use row_number():
select
id, seq_num, name
from
(
select id, seq_num, name,
row_number() over (partition by seq_num order by id) rn
from subject
) t
where rn = 1
order by seq_num;
You can use KEEP with DENSE_RANK FIRST in Oracle:
select seq_num,
-- order by the numeric id; trim(id) would compare ids as strings
max(id) keep (dense_rank first order by id) as first_id,
max(name) keep (dense_rank first order by id) as first_name
from subject
group by seq_num
order by seq_num asc;
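This should work as long as the smallest id and the alphabetically smallest name in each group belong to the same row, as they do in the sample data: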
select
min(id) AS IDS,
seq_num,
min(name) AS NAMES
from
subject
group by
seq_num
order by
seq_num asc;
Hope it helps!
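The difference between picking a row and aggregating each column separately can be sketched as follows. This assumes Python's sqlite3 module with SQLite 3.25+ rather than Oracle, and the two rows for seq_num 4 are hypothetical additions that are not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (id INTEGER, seq_num INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO subject VALUES (?, ?, ?)",
    [
        (1, 1, "sub_1_1"), (2, 1, "sub_1_2"),
        (3, 2, "sub_2_1"), (4, 2, "sub_2_2"), (5, 2, "sub_2_3"),
        (6, 3, "sub_3_1"), (7, 3, "sub_3_1"),
        # Hypothetical rows: the lowest id does not carry the
        # alphabetically smallest name.
        (8, 4, "zeta"), (9, 4, "alpha"),
    ],
)

# ROW_NUMBER keeps id and name from the same physical row.
first_rows = conn.execute(
    """
    SELECT id, seq_num, name
    FROM (SELECT id, seq_num, name,
                 ROW_NUMBER() OVER (PARTITION BY seq_num ORDER BY id) AS rn
          FROM subject)
    WHERE rn = 1
    ORDER BY seq_num
    """
).fetchall()

# MIN per column can combine values from different rows.
min_rows = conn.execute(
    """
    SELECT MIN(id), seq_num, MIN(name)
    FROM subject
    GROUP BY seq_num
    ORDER BY seq_num
    """
).fetchall()

print(first_rows[-1])  # row-based pick for seq_num 4
print(min_rows[-1])    # column-wise minimum for seq_num 4
```

For seq_num 1 to 3 both queries agree; for the hypothetical seq_num 4 the column-wise minimum pairs id 8 with the name from id 9.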

Get id of max value in group

I have a table and I would like to get the id of the row with the max value in a column for each group, but I have a problem.
SELECT group_id, MAX(time)
FROM mytable
GROUP BY group_id
This way I get the correct rows, but I need the id:
SELECT id,group_id,MAX(time)
FROM mytable
GROUP BY id,group_id
This way I get all the rows. How can I get the id of the row with the max time in each group?
Sample Data
id = 1, group_id = 1, time = 2014.01.03
id = 2, group_id = 1, time = 2014.01.04
id = 3, group_id = 2, time = 2014.01.04
id = 4, group_id = 2, time = 2014.01.02
id = 5, group_id = 3, time = 2014.01.01
From that I should get the ids 2, 3, 5.
Thanks!
Use your working query as a sub-query, like this:
SELECT `id`
FROM `mytable`
WHERE (`group_id`, `time`) IN (
SELECT `group_id`, MAX(`time`) as `time`
FROM `mytable`
GROUP BY `group_id`
)
Have a look at the demo below:
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(id INT , group_id INT , time_st DATE);
INSERT INTO mytable VALUES(1, 1, '2014-01-03'),(2, 1, '2014-01-04'),(3, 2, '2014-01-04'),(4, 2, '2014-01-02'),(5, 3, '2014-01-01');
/** Check all data **/
SELECT * FROM mytable;
+------+----------+------------+
| id | group_id | time_st |
+------+----------+------------+
| 1 | 1 | 2014-01-03 |
| 2 | 1 | 2014-01-04 |
| 3 | 2 | 2014-01-04 |
| 4 | 2 | 2014-01-02 |
| 5 | 3 | 2014-01-01 |
+------+----------+------------+
/** Query for Actual output**/
SELECT
id
FROM
mytable
JOIN
(
SELECT group_id, MAX(time_st) as max_time
FROM mytable GROUP BY group_id
) max_time_table
ON mytable.group_id = max_time_table.group_id AND mytable.time_st = max_time_table.max_time;
+------+
| id |
+------+
| 2 |
| 3 |
| 5 |
+------+
When multiple groups may contain the same value, you could use a window function:
SELECT subq.id
FROM (SELECT id,
time,
MAX(time) OVER (PARTITION BY group_id) as max_time
FROM mytable) as subq
WHERE subq.time = subq.max_time
The subquery here generates a new column (max_time) that contains the maximum time per group. We can then filter on time and max_time being identical. Note that this still returns multiple rows per group if the maximum value occurs multiple times within the same group.
Full example:
CREATE TABLE test (
id INT,
group_id INT,
value INT
);
INSERT INTO test (id, group_id, value) VALUES (1, 1, 100);
INSERT INTO test (id, group_id, value) VALUES (2, 1, 200);
INSERT INTO test (id, group_id, value) VALUES (3, 1, 300);
INSERT INTO test (id, group_id, value) VALUES (4, 2, 100);
INSERT INTO test (id, group_id, value) VALUES (5, 2, 300);
INSERT INTO test (id, group_id, value) VALUES (6, 2, 200);
INSERT INTO test (id, group_id, value) VALUES (7, 3, 300);
INSERT INTO test (id, group_id, value) VALUES (8, 3, 200);
INSERT INTO test (id, group_id, value) VALUES (9, 3, 100);
select * from test;
id | group_id | value
----+----------+-------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 300
4 | 2 | 100
5 | 2 | 300
6 | 2 | 200
7 | 3 | 300
8 | 3 | 200
9 | 3 | 100
(9 rows)
SELECT subq.id
FROM (SELECT id,
value,
MAX(value) OVER (partition by group_id) as max_value
FROM test) as subq
WHERE subq.value = subq.max_value;
id
----
3
5
7
(3 rows)
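For the question's own sample data, the tuple-IN and window-function variants can be checked against each other. A minimal sketch, assuming Python's sqlite3 module with SQLite 3.25+ and the time_st column name from the demo above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, group_id INTEGER, time_st TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 1, "2014-01-03"),
        (2, 1, "2014-01-04"),
        (3, 2, "2014-01-04"),
        (4, 2, "2014-01-02"),
        (5, 3, "2014-01-01"),
    ],
)

# Variant 1: match (group_id, time_st) pairs against the per-group maximum.
# ISO-formatted date strings compare correctly as text.
tuple_ids = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM mytable
    WHERE (group_id, time_st) IN (
        SELECT group_id, MAX(time_st)
        FROM mytable
        GROUP BY group_id)
    ORDER BY id
    """
)]

# Variant 2: compute the per-group maximum with a window function.
window_ids = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM (SELECT id, time_st,
                 MAX(time_st) OVER (PARTITION BY group_id) AS max_time
          FROM mytable)
    WHERE time_st = max_time
    ORDER BY id
    """
)]

print(tuple_ids, window_ids)
```

Both variants return every row that ties for the maximum within its group, so add a tie-breaker such as ROW_NUMBER() if exactly one id per group is required.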