Finding maximum value from each group set in Oracle - sql

I have below scenario:
Input Data:
---------------------------------------------------------------
| ID | Account | Sub_Acct | Email |
---------------------------------------------------------------
| 100 | AD | AD1 | 123#xyz.com |
| 100 | AB | AB1 | test#abc.com, 123#xyz.com |
| 100 | AB | AB2 | test#abc.com, 123#xyz.com |
| 200 | CD | CD1 | test.1#pqr.com, 123#abc.com |
| 200 | AB | AB1 | test.2#pqr.com |
| 200 | CD | CD2 | test.1#pqr.com, 123#abc.com |
| 200 | AB | AB2 | 123#abc.com |
| 200 | CD | CD3 | test.1#pqr.com, 123#abc.com |
---------------------------------------------------------------
I need to count the rows for each account within each ID. The account with the highest count should be displayed as a single row with Sub_Acct set to NULL. All other accounts should be displayed with their respective Sub_Acct values within that ID.
Email domains are to be extracted from the Email column. The Email_Domain values of the secondary accounts (within a given ID) take their values from the primary account (i.e. the one with the maximum count). Below is the expected output:
---------------------------------------------------
| ID | Account | Sub_Acct | Email_Domain |
---------------------------------------------------
| 100 | AB | | abc.com, xyz.com |
| 100 | AD | AD1 | abc.com, xyz.com |
| 200 | CD | | pqr.com, abc.com |
| 200 | AB | AB1 | pqr.com, abc.com |
| 200 | AB | AB2 | pqr.com, abc.com |
---------------------------------------------------
I have edited this question; sorry for the trouble.
Can someone please help with the SQL query in Oracle? Thanks in advance.

This works in Oracle 10...
with rank1 as
(
select id,
account,
email, count(account) as account_count,
rank() over (partition by id order by count(account) desc) as order_rank
from table1
group by id, account, email
)
select t1.*,
r1.account as primary_account,
r1.email as primary_email
from table1 t1
join rank1 r1
on r1.id = t1.id
where r1.order_rank = 1

In your table, an ID/Account pair can occur multiple times, and in your example such a pair always has the same Email. Is this guaranteed? If it is, then your table isn't normalized, and you may run into consistency problems in the future.
However, relying on this makes writing the query easy:
select
t.id,
t.account,
t.email,
s.primary_account,
s.primary_email
from mytable t
join
(
select
id,
stats_mode(account) as primary_account,
stats_mode(email) as primary_email
from mytable
group by id
) s on s.id = t.id
order by t.id, t.account;
(It would be even easier if Oracle supported STATS_MODE ... OVER, but it doesn't yet.)
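To make concrete what STATS_MODE computes per group, here is a minimal sketch that finds the most frequent account per ID with a count plus ROW_NUMBER. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle, and the alphabetical tie-break is my own addition; STATS_MODE itself simply picks one of the tied values.

```python
import sqlite3

# (id, account) pairs copied from the question's input table.
ROWS = [
    (100, "AD"), (100, "AB"), (100, "AB"),
    (200, "CD"), (200, "AB"), (200, "CD"), (200, "AB"), (200, "CD"),
]

def primary_accounts(rows):
    """Return the most frequent account for each id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, account TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    query = """
        SELECT id, account
        FROM (
            SELECT id, account, n,
                   -- rank accounts within each id by descending count;
                   -- ties are broken alphabetically (an assumption)
                   ROW_NUMBER() OVER (PARTITION BY id
                                      ORDER BY n DESC, account) AS rn
            FROM (SELECT id, account, COUNT(*) AS n
                  FROM mytable
                  GROUP BY id, account)
        )
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(primary_accounts(ROWS))  # [(100, 'AB'), (200, 'CD')]
```

This matches the expected output in the question, where AB is the primary account for ID 100 and CD for ID 200.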

Related

SQL Group By and Join based on a weird client table

I have 3 tables that I want to join together and group it to get client membership info. My code works for grouping the base table together but it breaks at the join part and I can't figure out why.
BASE TABLE : sales_detail
+------------+----------------+--------------+--------------+---------+
| order_date | transaction_id | product_cost | payment_type | country |
+------------+----------------+--------------+--------------+---------+
| 10/1       | 12345          | 20           | mastercard   | usa     |
| 10/1       | 12345          | 50           | mastercard   | usa     |
| 10/5       | 82456          | 50           | mastercard   | usa     |
| 10/9       | 64789          | 30           | visa         | canada  |
| 10/15      | 08546          | 20           | mastercard   | usa     |
| 10/15      | 08546          | 90           | mastercard   | usa     |
| 10/17      | 65898          | 50           | mastercard   | usa     |
+------------+----------------+--------------+--------------+---------+
table : client_information
+----------+-------------+-------+
| other_id | client_Type | item  |
+----------+-------------+-------+
| 112341   | new         | hola  |
| 112341   | old         | mango |
| 145634   | old         | pine  |
| 879547   | old         | vip   |
| 745688   | new         | unio  |
| 745688   | old         | dog   |
| 147899   | new         | cat   |
| 124589   | new         | amigo |
+----------+-------------+-------+
table : connector
+----------------+----------+-------+
| transaction_ID | other_id | item  |
+----------------+----------+-------+
| 12345          | 112341   | hola  |
| 82456          | 145634   | pine  |
| 08157          | 879547   | unio  |
| 08546          | 745688   | dog   |
| 65898          | 147899   | cat   |
| 06587          | 124589   | amigo |
+----------------+----------+-------+
I want the output to look something like this:
IDEAL OUTPUT
+------------+----------------+--------------+-------------+
| order_date | transaction_ID | product_cost | client_Type |
+------------+----------------+--------------+-------------+
| 10/1       | 12345          | 70           | new         |
| 10/5       | 82456          | 70           | old         |
| 10/15      | 08546          | 110          | old         |
| 10/17      | 65898          | 50           | new         |
+------------+----------------+--------------+-------------+
I am trying to join my base table to the connector table by transaction ID to get other_id and item, and then match those to client_type.
This is the code I used, but it failed to compile after I added the left joins:
select t1.transaction_id, sum(t1.product_cost), t1.order_date, t3.client_type
from sales_detail t1
left join (select DISTINCT transaction_ID, other_id, fruits from connector) t2
ON t1.transaction_ID=t2.transaction_ID
left join (select DISTINCT order_id, client_type, fruits from client information) t3
ON t2.other_id=t3.other_id and t2.item=t3.item
where t1.payment_type='mastercard' and t1.order_Date between '2020-10-01' and'2020-10-31'
and country != 'canada'
GROUP BY t1.transaction_id, t1.order_date, t3.client_type;
Thanks in advance! I am a beginner so still learning the ins and outs of sql! (am using hive)
I think that's joins and aggregation. For more efficiency, you can pre-aggregate in a subquery, then join:
select sd.*, ci.client_type
from (
select order_date, transaction_id, sum(product_cost) product_cost
from sales_detail
where
payment_type = 'mastercard'
and order_date >= '2020-10-01'
and order_date < '2020-11-01'
and country <> 'canada'
group by order_date, transaction_id
) sd
inner join connector c on c.transaction_id = sd.transaction_id
inner join client_information ci on ci.other_id = c.other_id
Note that I rewrote the filter on order_date to use half-open intervals rather than between. This properly handles the case when your dates have a time portion.
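As a small illustration of why the half-open interval matters, the sketch below compares both filters on rows whose dates carry a time portion. It assumes SQLite (through Python's sqlite3 module) with timestamps stored as ISO-formatted text, not Hive; the three sample rows are made up for the demonstration.

```python
import sqlite3

def count_october_rows(where_clause):
    """Count the sample rows that match the given date filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_detail (order_date TEXT, product_cost INTEGER)")
    conn.executemany(
        "INSERT INTO sales_detail VALUES (?, ?)",
        [
            ("2020-10-01 00:00:00", 20),
            ("2020-10-31 15:30:00", 50),  # last day of October, with a time part
            ("2020-11-01 00:00:00", 30),  # already November
        ],
    )
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM sales_detail WHERE " + where_clause
    ).fetchone()
    conn.close()
    return n

# BETWEEN stops at '2020-10-31', so the 15:30 sale on that day is lost.
between_count = count_october_rows(
    "order_date BETWEEN '2020-10-01' AND '2020-10-31'"
)
# The half-open interval keeps every October row and excludes November.
half_open_count = count_october_rows(
    "order_date >= '2020-10-01' AND order_date < '2020-11-01'"
)
print(between_count, half_open_count)  # 1 2
```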
From what I understand, your code works (although not as you would like) with an INNER JOIN, and fails once you switch to a LEFT JOIN. I think the failure is due to the NULL values that the LEFT JOIN introduces. To handle a NULL without getting an error, you can use a function that replaces NULL with 0.
One such function is ISNULL(yourColumn, 0) in T-SQL; since you are on Hive, the equivalents there are NVL(yourColumn, 0) or COALESCE(yourColumn, 0).
I can see that the result table only needs clients who used mastercard, so you should use an inner join there so that only those clients are considered. The rest of the query looks okay, I guess; the main problem was the join on client information.
In GMB's answer, I think you also need to join on the item column; otherwise you will get multiple rows in the output.
select sd.*, ci.client_type
from (
select order_date, transaction_id, sum(product_cost) product_cost
from sales_detail
group by order_date, transaction_id
) sd
inner join connector c on c.transaction_id = sd.transaction_id
inner join client_information ci on ci.other_id = c.other_id and ci.item = c.item
Just modify with your filters and you should be sorted.
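To see the duplication concretely, here is a minimal sketch that runs both join conditions against a subset of the question's connector and client_information rows. It assumes SQLite (through Python's sqlite3 module) in place of Hive; the JOIN_SQL template is only a helper for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE connector (transaction_id TEXT, other_id INTEGER, item TEXT);
    CREATE TABLE client_information (other_id INTEGER, client_type TEXT, item TEXT);
    INSERT INTO connector VALUES
        ('12345', 112341, 'hola'),
        ('08546', 745688, 'dog');
    INSERT INTO client_information VALUES
        (112341, 'new', 'hola'), (112341, 'old', 'mango'),
        (745688, 'new', 'unio'), (745688, 'old', 'dog');
""")

# {extra} is filled with an optional additional join condition.
JOIN_SQL = """
    SELECT c.transaction_id, ci.client_type
    FROM connector c
    INNER JOIN client_information ci
        ON ci.other_id = c.other_id {extra}
    ORDER BY c.transaction_id, ci.client_type
"""

# Joining on other_id alone matches every client row sharing that id.
by_id_only = conn.execute(JOIN_SQL.format(extra="")).fetchall()
# Adding the item condition leaves exactly one row per transaction.
by_id_and_item = conn.execute(
    JOIN_SQL.format(extra="AND ci.item = c.item")
).fetchall()
conn.close()

print(by_id_only)      # 4 rows: each transaction appears as both new and old
print(by_id_and_item)  # [('08546', 'old'), ('12345', 'new')]
```

The two remaining rows agree with the client_Type values in the ideal output (12345 is new, 08546 is old).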

Oracle to many Same Table

I have to pull records that have location ids spanning multiple contract ids, also excluding duplicates. Below is a sample table. The second table is the query with my desired result.
| CONTRACT ID | LOCATION ID | CONTRACT NAME | CONTRACT DATE | CONTRACT STATUS |
--------------------------------------------------------------------------
| CT1 | 100 | MICROSOFT | 12/09/2029 | PENDING
| CT1 | 100 | MICROSOFT | 12/09/2029 | APPROVED
| CT3 | 155 | YAHOO | 02/03/2030 | EXPIRED
| CT4 | 180 | ADOBE | 02/03/2030 | IN LITIGATION
| CT4 | 180 | ADOBE | 02/03/2030 | APPROVED
| CT5 | 199 | YAHOO | 02/03/2030 | PENDING
| CT6 | 100 | GOOGLE | 10/23/2028 | PENDING
| CT7 | 155 | UBER | 05/15/2027 | PENDING
---------------------------------------------------------------------------
| CONTRACT ID | LOCATION ID |
----------------------------------
| CT1 | 100 |
| CT6 | 100 |
| CT3 | 155 |
| CT7 | 155 |
-----------------------------------
I tried running this query, but it also includes both CT4 Adobe records, whose location id does not span multiple contract ids.
Even if I put a DISTINCT at the beginning of the query, the CT4 contract id/location id pair should not be part of the results.
SELECT contract_id, location_id from random_table where location_id in
(SELECT location_id FROM random_table
where (location_id is not null)
group by location_id having count( location_id) > 1 )
group by contract_id, location_id
order by location_id
You can try the following, using WHERE EXISTS.
SELECT contract_id,
location_id
FROM random_table y
WHERE EXISTS (SELECT 1
FROM random_table ty
WHERE ty.location_id = y.location_id
AND ty.contract_id <> y.contract_id)
GROUP BY contract_id,
location_id
ORDER BY location_id
Edit:
If you want to find such location id, you can use following query.
SELECT ty.locationid
FROM tablename TY
INNER JOIN tablename y
ON TY.locationid = Y.locationid
AND TY.contractid <> Y.contractid
GROUP BY ty.locationid
You can try the query below, which uses a correlated subquery:
select distinct location_id, contractid
from tablename a
where exists (select 1 from tablename b where a.location_id=b.location_id
having count(distinct b.contractid)>1)
OUTPUT:
LOCATIONID contractid
100 CT1
155 CT3
100 CT6
155 CT7
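The WHERE EXISTS approach can be checked against the question's sample data with a minimal sketch like the one below. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle and uses the table and column names from the question's own query; only the two columns the query needs are loaded.

```python
import sqlite3

# (contract_id, location_id) for all eight rows of the sample table.
ROWS = [
    ("CT1", 100), ("CT1", 100), ("CT3", 155), ("CT4", 180),
    ("CT4", 180), ("CT5", 199), ("CT6", 100), ("CT7", 155),
]

def shared_locations(rows):
    """Return contract/location pairs whose location spans several contracts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE random_table (contract_id TEXT, location_id INTEGER)")
    conn.executemany("INSERT INTO random_table VALUES (?, ?)", rows)
    query = """
        SELECT contract_id, location_id
        FROM random_table y
        WHERE EXISTS (SELECT 1
                      FROM random_table ty
                      WHERE ty.location_id = y.location_id
                        AND ty.contract_id <> y.contract_id)
        GROUP BY contract_id, location_id
        ORDER BY location_id, contract_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(shared_locations(ROWS))
# [('CT1', 100), ('CT6', 100), ('CT3', 155), ('CT7', 155)]
```

CT4 drops out because the only other row at location 180 has the same contract id, so the EXISTS condition never holds for it.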

Finding out Percentage Value using Hive

I have some tables as:
Table_1:
+------------+--------------+
| Student_ID | Student_Name |
+------------+--------------+
| 000 | Jack |
| 001 | Ron |
| 002 | Nick |
+------------+--------------+
Table_2:
+-----+-------+-------+
| ID | Total | Score |
+-----+-------+-------+
| 000 | 100 | 80 |
| 001 | 100 | 80 |
| 002 | 100 | 80 |
+-----+-------+-------+
Table_3:
+-----+-------+-------+
| ID | Total | Score |
+-----+-------+-------+
| 000 | 100 | 60 |
| 001 | 100 | 80 |
| 002 | 100 | 70 |
+-----+-------+-------+
Expected_Output:
ID percent
000 70
001 80
002 75
I have created a hive table before. Now, I want to come up with a single HiveQL so that, I can get the expected output from these above 3 tables.
What I am thinking to do is, in my query I will:
use the Left outer join using ID
find the sum of "Total" and "Score" for each ID
divide sum of "Score" by sum of "Total" to get percentage.
I came up with this:
INSERT OVERWRITE TABLE expected_output
SELECT t1.Student_ID AS ID, (100*t4.SUM1/t4.SUM2) AS percent
FROM Table_1 t1
LEFT OUTER JOIN(
SELECT (ISNULL(Total,0) + ISNULL(Total,0)) AS ‘SUM2’, (ISNULL(Score,0) + ISNULL(Score,0)) AS ‘SUM1’
FROM t4
)ON (t1.Student_ID=t2.ID) JOIN Table_3 t3 ON (t3.ID=t2.ID);
And, I am stuck at this point. Not sure how to reach to the result.
Any idea please?
This is a simple join. Assuming you have one row per ID in each of Table_2 and Table_3, you can do
SELECT t2.ID AS ID, 100.0*(t2.score+t3.score)/(t2.total+t3.total) AS percent
FROM Table_2 t2
JOIN Table_3 t3 ON t3.ID=t2.ID
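A quick way to check the arithmetic is to run the join on the sample rows. The sketch below assumes SQLite (through Python's sqlite3 module) in place of Hive, keeps the IDs as text so the leading zeros survive, and adds an ORDER BY only to make the output order stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_2 (ID TEXT, Total INTEGER, Score INTEGER);
    CREATE TABLE Table_3 (ID TEXT, Total INTEGER, Score INTEGER);
    INSERT INTO Table_2 VALUES ('000', 100, 80), ('001', 100, 80), ('002', 100, 80);
    INSERT INTO Table_3 VALUES ('000', 100, 60), ('001', 100, 80), ('002', 100, 70);
""")

# Multiplying by 100.0 forces floating-point division.
percents = conn.execute("""
    SELECT t2.ID AS ID,
           100.0 * (t2.Score + t3.Score) / (t2.Total + t3.Total) AS percent
    FROM Table_2 t2
    JOIN Table_3 t3 ON t3.ID = t2.ID
    ORDER BY t2.ID
""").fetchall()
conn.close()

print(percents)  # [('000', 70.0), ('001', 80.0), ('002', 75.0)]
```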

Filter by value in last row of LEFT OUTER JOIN table

I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
FROM orders
) o
ON c.client_id = o.client_id
AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
Here is SQLFiddle demo
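The same query can be exercised on the sample data with a minimal sketch like this one. It assumes SQLite (through Python's sqlite3 module) in place of PostgreSQL, and passes the reference date as a parameter (a hypothetical `today` argument) instead of CURRENT_DATE so the result is reproducible.

```python
import sqlite3

def clients_due(today):
    """Clients with no orders, or whose latest order is due before `today`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clients (client_id INTEGER, client_name TEXT);
        CREATE TABLE orders (order_id INTEGER, client_id INTEGER,
                             fulfill_by_date TEXT, created_at TEXT);
        INSERT INTO clients VALUES
            (1, 'FirstClient'), (2, 'SecondClient'), (3, 'ThirdClient');
        INSERT INTO orders VALUES
            (1, 1, '3000-01-01', '2013-01-01'),
            (2, 1, '1999-01-01', '2013-01-02'),
            (3, 2, '1999-01-01', '2013-01-01'),
            (4, 2, '3000-01-01', '2013-01-02');
    """)
    query = """
        SELECT c.client_id, c.client_name
        FROM clients c
        LEFT JOIN (
            -- rnum = 1 marks each client's most recently created order
            SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id
                                         ORDER BY created_at DESC) AS rnum
            FROM orders
        ) o ON c.client_id = o.client_id AND o.rnum = 1
        WHERE o.fulfill_by_date < ? OR o.order_id IS NULL
        ORDER BY c.client_id
    """
    result = conn.execute(query, (today,)).fetchall()
    conn.close()
    return result

print(clients_due("2024-01-01"))  # [(1, 'FirstClient'), (3, 'ThirdClient')]
```

Keeping `o.rnum = 1` in the ON clause rather than in WHERE is what lets clients without any orders survive the LEFT JOIN.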

LEFT JOINing the max/top

I have two tables from which I'm trying to run a query to return the maximum (or top) transaction for each person. I should note that I cannot change the table structure. Rather, I can only pull data.
People
+-----------+
| id | name |
+-----------+
| 42 | Bob |
| 65 | Ted |
| 99 | Stu |
+-----------+
Transactions (there is no primary key)
+---------------------------------+
| person | amount | date |
+---------------------------------+
| 42 | 3 | 9/14/2030 |
| 42 | 4 | 7/02/2015 |
| 42 | *NULL* | 2/04/2020 |
| 65 | 7 | 1/03/2010 |
| 65 | 7 | 5/20/2020 |
+---------------------------------+
Ultimately, for each person I want to return the highest amount. If that doesn't work then I'd like to look at the date and return the most recent date.
So, I'd like my query to return:
+----------------------------------------+
| person_id | name | amount | date |
+----------------------------------------+
| 42 | Bob | 4 | 7/02/2015 | (<- highest amount)
| 65 | Ted | 7 | 5/20/2020 | (<- most recent date)
| 99 | Stu | *NULL* | *NULL* | (<- no records in Transactions table)
+----------------------------------------+
SELECT People.id, name, amount, date
FROM People
LEFT JOIN (
SELECT TOP 1 person_id
FROM Transactions
WHERE person_id = People.id
ORDER BY amount DESC, date ASC
)
ON People.id = person_id
I can't figure out what I am doing wrong, but I know it's wrong. Any help would be much appreciated.
You are almost there, but since the Transactions table has several rows per person, you need to reduce them to one row per person using the Row_number() function.
Try this:
With cte as
(Select person, amount, date, row_number() over (partition by person
order by amount desc, date desc) as row_num
from Transactions)
Select * from People as a
left join cte as b
on a.id = b.person
and b.row_num = 1
The result is in Sql Fiddle
Edit: Row_number() from MSDN
Returns the sequential number of a row within a partition of a result set,
starting at 1 for the first row in each partition.
PARTITION BY groups the result set, and the OVER clause determines the
partitioning and ordering of the rowset before the associated window
function is applied.
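Here is a minimal sketch of the Row_number() approach run against the question's sample rows. It assumes SQLite (through Python's sqlite3 module) in place of SQL Server; the date column is renamed to txn_date and stored as ISO text so that it sorts chronologically, both of which are my own choices for the demonstration. SQLite, like SQL Server, sorts NULL last in a descending order, so the NULL amount never wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, name TEXT);
    CREATE TABLE transactions (person INTEGER, amount INTEGER, txn_date TEXT);
    INSERT INTO people VALUES (42, 'Bob'), (65, 'Ted'), (99, 'Stu');
    INSERT INTO transactions VALUES
        (42, 3, '2030-09-14'),
        (42, 4, '2015-07-02'),
        (42, NULL, '2020-02-04'),
        (65, 7, '2010-01-03'),
        (65, 7, '2020-05-20');
""")

top_transactions = conn.execute("""
    WITH ranked AS (
        -- highest amount first; the most recent date breaks ties
        SELECT person, amount, txn_date,
               ROW_NUMBER() OVER (PARTITION BY person
                                  ORDER BY amount DESC, txn_date DESC) AS row_num
        FROM transactions
    )
    SELECT p.id, p.name, r.amount, r.txn_date
    FROM people p
    LEFT JOIN ranked r ON p.id = r.person AND r.row_num = 1
    ORDER BY p.id
""").fetchall()
conn.close()

for row in top_transactions:
    print(row)
# (42, 'Bob', 4, '2015-07-02')
# (65, 'Ted', 7, '2020-05-20')
# (99, 'Stu', None, None)
```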