Join Table + Group by + Sum + Finding Nth highest sum value - sql

I have 2 tables -
Broker table, columns: ID, Name, City, branch
Policy table, columns: Policy_num, Broker_id, Premium
I need to find the broker name having 4th highest sum of the premiums.
Please suggest the SQL query for this.
NOTE: I am looking for a query that can run on all platforms, so please don't use
TOP, LIMIT or ROW_NUMBER functions.
I am looking for a query similar to the one below, which finds the 4th highest salary in a table.
SELECT Salary
FROM EmployeeSalary Emp1
WHERE 3 = (
SELECT COUNT( DISTINCT ( Emp2.Salary ) )
FROM EmployeeSalary Emp2
WHERE Emp2.Salary > Emp1.Salary
)

First, you don't need a join, because all the information you need is in the policy table. You can join in broker at the end, if you want.
Second, the question is ambiguous, because it doesn't specify how to handle ties.
But basically, you can follow the same structure as the query in your question:
SELECT p.Broker_id
FROM Policy p
GROUP BY p.Broker_id
HAVING 4 = (SELECT COUNT(*)
FROM (SELECT SUM(p2.Premium) as sumpremium
FROM Policy p2
GROUP BY p2.Broker_id
) s
WHERE s.sumpremium >= SUM(p.Premium)
);
But ironically, this is a very, very poor way to approach the problem. Why? There is no guarantee that the SUM() on the two aggregation queries will be exactly the same.
This difference can occur in two ways. If premium is a floating-point number, the order of addition can make a difference, because adding a value near zero to a large value loses precision. Check this out:
select ( (v1 + v2) + v3), (v1 + (v2 + v3))
from (select cast(100000000.0001 as float) as v1, cast(-100000001.0001 as float) as v2, cast(0.000002 as float) as v3) x;
Even with fixed point numbers you can still have problems, because adding all the positive numbers first can result in an overflow.
It is really better to do arithmetic calculations once rather than doing them multiple times and then comparing the values.
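The same effect shows up outside SQL, because it comes from IEEE 754 double arithmetic itself. A minimal Python sketch (the values are illustrative, not from the question): a small value added next to a very large one is rounded away, so grouping the same three numbers differently gives different totals.

```python
# Floating-point addition is not associative, so two SUM()s over the same
# values can disagree if the engine happens to add them in a different order.
left_to_right = (1e17 + 1.0) + -1e17  # 1.0 is rounded away next to 1e17
reordered = (1e17 + -1e17) + 1.0      # the large values cancel first

print(left_to_right, reordered)       # 0.0 1.0

# Comparing two separately computed totals with = or >= is therefore fragile;
# computing each total once and comparing against that value avoids it.
```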

Try this:
SELECT *
FROM
(
SELECT B.Broker_ID,A.Name,
SUM(B.Premium) S_P,
ROW_NUMBER() OVER (ORDER BY SUM(B.Premium) DESC) RN
FROM Broker A
INNER JOIN Policy B ON A.ID = B.Broker_ID
GROUP BY B.Broker_ID,A.Name
)A
WHERE A.RN = 4
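One way to try the ROW_NUMBER() query above is SQLite through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The broker names and premiums below are made-up sample data, with Dee and Eve deliberately tied for 4th place to show the tie ambiguity mentioned earlier: ROW_NUMBER() always returns exactly one row, and which of the tied brokers you get is not defined.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Broker (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Branch TEXT);
CREATE TABLE Policy (Policy_num INTEGER PRIMARY KEY, Broker_id INTEGER, Premium INTEGER);
INSERT INTO Broker (ID, Name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee'), (5, 'Eve');
-- Hypothetical totals: Ann 500, Bob 400, Cy 300, Dee 200, Eve 200 (tie for 4th)
INSERT INTO Policy (Broker_id, Premium) VALUES
    (1, 500), (2, 300), (2, 100), (3, 300), (4, 150), (4, 50), (5, 200);
""")

# Same shape as the answer's query: total per broker, number the totals
# from highest to lowest, then keep row number 4.
rows = con.execute("""
SELECT X.Name, X.S_P
FROM (
    SELECT B.Broker_ID, A.Name,
           SUM(B.Premium) AS S_P,
           ROW_NUMBER() OVER (ORDER BY SUM(B.Premium) DESC) AS RN
    FROM Broker A
    INNER JOIN Policy B ON A.ID = B.Broker_ID
    GROUP BY B.Broker_ID, A.Name
) X
WHERE X.RN = 4
""").fetchall()

# Exactly one row comes back, even though Dee and Eve share the 4th-highest total.
print(rows)
```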
Another option is to use TOP twice to get your desired output:
SELECT TOP 1 * FROM
(
SELECT TOP 4
B.Broker_ID,A.Name,
SUM(B.Premium) S
FROM Broker A
INNER JOIN Policy B ON A.ID = B.Broker_ID
GROUP BY B.Broker_ID,A.Name
ORDER BY SUM(B.Premium) DESC
)A
ORDER BY S
Another option, for SQL Server:
SELECT * FROM
(
SELECT B.Broker_ID,A.Name,
SUM(B.Premium) S_P,
COUNT(*) OVER (ORDER BY SUM(B.Premium) DESC ROWS UNBOUNDED PRECEDING) AS RN
FROM Broker A
INNER JOIN Policy B ON A.ID = B.Broker_ID
GROUP BY B.Broker_ID,A.Name
)A
WHERE A.RN = 4
Another option, following the structure of your sample query:
SELECT * FROM
(
SELECT B.Broker_ID, A.Name, SUM(B.Premium) S
FROM Broker A INNER JOIN Policy B ON A.ID = B.Broker_ID
GROUP BY B.Broker_ID,A.Name
)A
WHERE 3 = (
SELECT COUNT(*)
FROM
(
SELECT B.Broker_ID, A.Name, SUM(B.Premium) S
FROM Broker A INNER JOIN Policy B ON A.ID = B.Broker_ID
GROUP BY B.Broker_ID,A.Name
) B
WHERE B.S > A.S
)
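The portable version can be checked the same way. This sketch runs it on SQLite via Python's sqlite3 module with the same hypothetical sample data (Dee and Eve tied at 200). Because it counts how many totals are strictly greater, both tied brokers come back. The derived tables are renamed t1 and t2 only for readability, and the inner count reads only Policy, since the broker name is not needed there.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Broker (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Branch TEXT);
CREATE TABLE Policy (Policy_num INTEGER PRIMARY KEY, Broker_id INTEGER, Premium INTEGER);
INSERT INTO Broker (ID, Name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee'), (5, 'Eve');
-- Hypothetical totals: Ann 500, Bob 400, Cy 300, Dee 200, Eve 200 (tie for 4th)
INSERT INTO Policy (Broker_id, Premium) VALUES
    (1, 500), (2, 300), (2, 100), (3, 300), (4, 150), (4, 50), (5, 200);
""")

rows = con.execute("""
SELECT t1.Name, t1.S
FROM (
    SELECT p.Broker_id, b.Name, SUM(p.Premium) AS S
    FROM Broker b INNER JOIN Policy p ON b.ID = p.Broker_id
    GROUP BY p.Broker_id, b.Name
) t1
WHERE 3 = (
    -- How many brokers have a strictly larger total than this one?
    SELECT COUNT(*)
    FROM (
        SELECT SUM(p2.Premium) AS S
        FROM Policy p2
        GROUP BY p2.Broker_id
    ) t2
    WHERE t2.S > t1.S
)
ORDER BY t1.Name
""").fetchall()

print(rows)  # [('Dee', 200), ('Eve', 200)]
```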


SQL join best matcher

I have table A:
I need to join (SQL) this table onto table B, using ProductName as the join column, but I want the following order of priorities:
Country on its own, if it has a value (with Standard being null)
The combination of Country and Standard
Standard by itself (if Country is null).
I have tried looking around a lot. I hope the problem statement is clear.
Table A:
|ProductName|Country|Standard|Reportable|
|---|---|---|---|
|ProductA|Null|Value1|Y|
|ProductA|Value2|Value1|N|
|ProductA|Value2|Null|N|
The above is just a subset of the data, but basically the country and standard determine if the output is reportable. Product A could have 1 line or 3, depending on the data required.
Table B:
|ProductName|Year|
|---|---|
|ProductA|2006|
So the final join for the above should be:
|ProductName|Year|Country|Standard|Reportable|
|---|---|---|---|---|
|ProductA|2006|Value2|Null|N|
Perhaps something like this; hopefully it gets the general concept across.
It assumes tables A and B are related on the product by an inner join, not an outer join.
The first CTE (common table expression) sets your priority.
The second CTE assigns a row number based on your priorities.
The final query then filters for the first row number encountered.
Obviously this is untested, as we have no sample data or structure to test with at this time.
WITH CTE AS (
SELECT B.ProductName
, B.Year
, A.Country
, A.Standard
, A.Reportable
, CASE WHEN A.Country IS NOT NULL AND A.Standard IS NULL THEN 1
WHEN A.Country IS NOT NULL AND A.Standard IS NOT NULL THEN 2
WHEN A.Country IS NULL AND A.Standard IS NOT NULL THEN 3
ELSE 99 END AS Priority
FROM A
INNER JOIN B
ON A.ProductName = B.ProductName),
CTE2 AS (
SELECT CTE.*
, ROW_NUMBER() OVER (PARTITION BY ProductName ORDER BY Priority) RN
FROM CTE)
SELECT *
FROM CTE2
WHERE RN = 1
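A runnable sketch of the CTE approach, using SQLite through Python's sqlite3 module and the sample rows from the question. Assumptions: the columns are named as in the question's tables, rows are ranked per ProductName with PARTITION BY, priority 1 (Country set, Standard null) is treated as best so the ranking sorts ascending, and a row with neither value gets 99 so it sorts last.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (ProductName TEXT, Country TEXT, Standard TEXT, Reportable TEXT);
CREATE TABLE B (ProductName TEXT, Year INTEGER);
INSERT INTO A VALUES
    ('ProductA', NULL,     'Value1', 'Y'),
    ('ProductA', 'Value2', 'Value1', 'N'),
    ('ProductA', 'Value2', NULL,     'N');
INSERT INTO B VALUES ('ProductA', 2006);
""")

rows = con.execute("""
WITH CTE AS (
    SELECT B.ProductName, B.Year, A.Country, A.Standard, A.Reportable,
           CASE WHEN A.Country IS NOT NULL AND A.Standard IS NULL THEN 1
                WHEN A.Country IS NOT NULL AND A.Standard IS NOT NULL THEN 2
                WHEN A.Country IS NULL AND A.Standard IS NOT NULL THEN 3
                ELSE 99  -- neither value set: sort last
           END AS Priority
    FROM A
    INNER JOIN B ON A.ProductName = B.ProductName
),
CTE2 AS (
    -- Number the candidate rows per product, best priority first.
    SELECT CTE.*,
           ROW_NUMBER() OVER (PARTITION BY ProductName ORDER BY Priority) AS RN
    FROM CTE
)
SELECT ProductName, Year, Country, Standard, Reportable
FROM CTE2
WHERE RN = 1
""").fetchall()

print(rows)  # [('ProductA', 2006, 'Value2', None, 'N')]
```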
Something like this, maybe?
SELECT s.*, p.*
FROM source_table s
OUTER APPLY ( SELECT p.*
FROM product_table p
WHERE p.product_name = s.product_name
AND ( ( p.country = s.country AND s.standard IS NULL )
OR ( p.country = s.country AND p.standard = s.standard )
OR ( s.country IS NULL AND p.standard = s.standard ) )
ORDER BY CASE
WHEN ( p.country = s.country AND s.standard IS NULL ) THEN 1
WHEN ( p.country = s.country AND p.standard = s.standard ) THEN 2
WHEN ( s.country IS NULL AND p.standard = s.standard ) THEN 3
ELSE 99
END
FETCH FIRST 1 ROW ONLY ) p
OUTER APPLY (instead of CROSS APPLY) keeps your source_table result even if there is no product match. That may not be your desired outcome. If not, switch to CROSS APPLY.
There are probably ways to shorten the conditions and the sort order using NVL(). But I think this is the clearest way to start.
Also, if this is always a product match using one of those three conditions, you can shorten the WHERE clause in the OUTER APPLY.

Find MAX with JOIN where Field also shows up in another Table

I have 3 tables: Master, Paper and iCodes. For a certain set of Master.Ref's, I need to find Max(Paper.Date), where the Paper.Code is also in the iCodes table (i.e., Paper.Code is a type of iCode). Master is joined to Paper by the File field.
EDIT:
I only need the Max(Paper.Date) and its corresponding Code; I do not need all of the Codes.
I wrote the following but it is very slow. I have a few hundred ref #'s to look for. What is a better way to do this?
SELECT Master.Ref,
Paper.Code,
mp.MaxDate
FROM ( SELECT p.File ,
MAX(p.Date) AS MaxDate
FROM Paper AS p
LEFT JOIN Master AS m ON p.File = m.File
WHERE m.Ref IN ('ref1', 'ref2', 'ref3', 'ref4', 'ref5', 'ref6'... )
AND p.Code IN ( SELECT DISTINCT i.iCode
FROM iCodes AS i
)
GROUP BY p.File
) AS mp
LEFT JOIN Master ON mp.File = Master.File
LEFT JOIN Paper ON Master.File = Paper.File
AND mp.MaxDate = Paper.Date
WHERE Paper.Code IN ( SELECT DISTINCT iCodes.iCode
FROM iCodes
)
Does this do what you want?
SELECT m.Ref, p.Code, max(p.date)
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
GROUP BY m.Ref, p.Code;
EDIT:
To get the code on the max date, then use window functions:
select ref, code, date
from (SELECT m.Ref, p.Code, p.date,
row_number() over (partition by m.Ref order by p.date desc) as seqnum
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
) mp
where seqnum = 1;
The function row_number() assigns a sequential number starting at 1 to a group of rows. The groups are defined by the partition by clause, so in this case everything with the same m.Ref value would be in a single group. Within the group, rows are assigned the number based on the order by clause. So, the one with the biggest date gets the value of 1. That is the row you want.
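A small SQLite sketch (run through Python's sqlite3 module) of the windowed query, with hypothetical sample rows: ref1 has a newer paper whose code X9 is not in iCodes, so that row is filtered out before ranking, and the latest remaining row (code C2) gets seqnum 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Master (Ref TEXT, File TEXT);
CREATE TABLE Paper (File TEXT, Code TEXT, Date TEXT);
CREATE TABLE iCodes (iCode TEXT);
INSERT INTO Master VALUES ('ref1', 'F1'), ('ref2', 'F2');
INSERT INTO Paper VALUES
    ('F1', 'C1', '2020-01-01'),
    ('F1', 'C2', '2021-06-01'),
    ('F1', 'X9', '2022-01-01'),  -- newest, but X9 is not an iCode
    ('F2', 'C1', '2019-05-05');
INSERT INTO iCodes VALUES ('C1'), ('C2');
""")

rows = con.execute("""
SELECT Ref, Code, Date
FROM (
    -- Rank each ref's qualifying papers from newest to oldest.
    SELECT m.Ref, p.Code, p.Date,
           ROW_NUMBER() OVER (PARTITION BY m.Ref ORDER BY p.Date DESC) AS seqnum
    FROM Master m
    LEFT JOIN Paper p ON m.File = p.File
    WHERE p.Code IN (SELECT iCode FROM iCodes)
      AND m.Ref IN ('ref1', 'ref2')
) mp
WHERE seqnum = 1
ORDER BY Ref
""").fetchall()

print(rows)  # [('ref1', 'C2', '2021-06-01'), ('ref2', 'C1', '2019-05-05')]
```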

oracle select from 2 table restrict from 1 table and order by another

I have book and recipient tables. I want to select at most 20 rows ordered by the recipient table's membershipdate column, and then order that result by the book table's id column. I wrote the SQL below. Is there any way to do this with less code?
SELECT *
FROM ( SELECT *
FROM ( SELECT b.*
FROM book b
JOIN recipient r ON r.id = b.recipient_id
WHERE b.bookno = 115
ORDER BY r.membershipdate DESC
)
WHERE ROWNUM <= 20
)
ORDER BY ID DESC
You can remove one layer of select:
SELECT *
FROM (
SELECT b.*
FROM book b
JOIN recipient r ON r.id = b.recipient_id
WHERE b.bookno = 115
ORDER BY r.membershipdate DESC
)
WHERE ROWNUM <= 20
ORDER BY id DESC;
Selecting b.* isn't normally a good idea; it's better to specify the columns you actually want, even if you really do want them all, to make sure you get them in the order you expect.
You can also look at the row_number() analytic function in place of rownum, but that will give you slightly more code. Not that it should matter: the effectiveness and efficiency of the query are rather more important than its length.
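SQLite has no ROWNUM pseudocolumn, so this sketch (Python's sqlite3 module, hypothetical data) uses LIMIT in the inner query to show the same two-step shape: take the newest memberships first, then re-sort that subset by book id. It keeps 2 rows instead of 20 so the small sample is meaningful.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE recipient (id INTEGER PRIMARY KEY, membershipdate TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, bookno INTEGER, recipient_id INTEGER);
INSERT INTO recipient VALUES (1, '2020-01-01'), (2, '2021-01-01'), (3, '2022-01-01');
INSERT INTO book VALUES (10, 115, 3), (11, 115, 1), (12, 115, 2), (13, 999, 3);
""")

rows = con.execute("""
SELECT id
FROM (
    -- Step 1: newest memberships first, keep only the first N rows.
    SELECT b.id
    FROM book b
    JOIN recipient r ON r.id = b.recipient_id
    WHERE b.bookno = 115
    ORDER BY r.membershipdate DESC
    LIMIT 2
)
-- Step 2: re-order the kept rows by book id.
ORDER BY id DESC
""").fetchall()

ids = [r[0] for r in rows]
print(ids)  # [12, 10]
```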
Doesn't this work?
SELECT * FROM
(SELECT b.*
FROM book b
JOIN recipient r ON r.id = b.recipient_id
WHERE b.bookno = 115 ORDER BY r.membershipdate DESC
) WHERE ROWNUM <= 20
ORDER BY ID DESC
In Oracle you can use the ROW_NUMBER() function instead of ROWNUM to remove one level of nesting. The final ORDER BY has to be in the outer query, because ordering inside the inline view is not guaranteed to survive:
select * from (
select b.*, row_number() over (order by r.membershipdate desc) cnt
from book b
JOIN recipient r ON r.id = b.recipient_id
WHERE b.bookno = 115
) where cnt <= 20
order by id desc;

Getting difference of two counts in SQL

I'm doing some QA in Netezza and I need to compare the counts from two separate SQL statements. This is the SQL that I am currently using
SELECT COUNT(*) AS RECORD_COUNT
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,mid_key from db..F_EMAIL) B
ON A.MID_KEY=B.MID_KEY
MINUS
SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A
However, it seems like MINUS doesn't work like that. When the counts match, instead of returning 0, this returns null for Record_count. I basically need the record count to be computed as:
record_count=count1-count2
So it is 0 if the counts are equal or the difference otherwise. What is the correct SQL for this?
SELECT
(
SELECT COUNT(*) AS RECORD_COUNT
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,mid_key from db..F_EMAIL) B
ON A.MID_KEY=B.MID_KEY
) -
(
SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A
) TotalCount
Oracle's MINUS (EXCEPT in SQL Server) is a whole different animal :)
If you understand UNION and then think in sets, you will understand MINUS / EXCEPT.
MINUS computes a set difference; it is not an arithmetic operator.
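The difference is easy to see in SQLite (via Python's sqlite3 module; table names are simplified from the Netezza db..NAME form and the data is made up). Subtracting two scalar subqueries returns one number, while EXCEPT returns the rows of the first result that are absent from the second, which is the first count itself, or nothing at all when the counts are equal.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EXT_ACXIOM_WUL_FILE (MID_KEY INTEGER);
CREATE TABLE F_EMAIL (CURRENTLY_OPTED_IN_FL TEXT, MID_KEY INTEGER);
INSERT INTO EXT_ACXIOM_WUL_FILE VALUES (1), (2), (3);
-- MID_KEY 1 appears with two different flags, so the join duplicates it.
INSERT INTO F_EMAIL VALUES ('Y', 1), ('N', 1), ('Y', 2);
""")

joined = """
SELECT COUNT(*)
FROM EXT_ACXIOM_WUL_FILE A
LEFT JOIN (SELECT DISTINCT CURRENTLY_OPTED_IN_FL, MID_KEY FROM F_EMAIL) B
       ON A.MID_KEY = B.MID_KEY
"""
plain = "SELECT COUNT(*) FROM EXT_ACXIOM_WUL_FILE A"

# Arithmetic on two scalar subqueries: one row holding count1 - count2.
(diff,) = con.execute(f"SELECT ({joined}) - ({plain}) AS TotalCount").fetchone()

# Set difference: rows of the first result that are not in the second.
except_rows = con.execute(f"{joined} EXCEPT {plain}").fetchall()

print(diff)         # 1  (4 joined rows - 3 base rows)
print(except_rows)  # [(4,)]  the first count itself, not a difference
```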
You could do
SELECT COUNT(*) - (SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A) AS Val
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,
mid_key
from db..F_EMAIL) B
ON A.MID_KEY = B.MID_KEY
Or another option
SELECT COUNT(*) - COUNT(DISTINCT A.PrimaryKey) AS Val
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,
mid_key
from db..F_EMAIL) B
ON A.MID_KEY = B.MID_KEY
I think this may be what you are looking for
SELECT COUNT(distinct(CURRENTLY_OPTED_IN_FL + F_EMAIL.MID_KEY)) - count(distinct(EXT_ACXIOM_WUL_FILE.MID_KEY))
FROM EXT_ACXIOM_WUL_FILE
LEFT OUTER JOIN F_EMAIL
ON F_EMAIL.MID_KEY = EXT_ACXIOM_WUL_FILE.MID_KEY

MYSQL top N rows from multiple table join

Just as there is the TOP keyword in SQL Server 2005, how do I select the top 1 row in MySQL when I join multiple tables and want to retrieve the extreme value for each ID/column? LIMIT restricts the number of rows returned, so it can't solve my problem.
SELECT v.*
FROM document d
OUTER APPLY
(
SELECT TOP 1 *
FROM version v
WHERE v.document = d.id
ORDER BY
v.revision DESC
) v
or
SELECT v.*
FROM document d
LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY document ORDER BY revision DESC) AS rn
FROM version
) v
ON v.document = d.id
AND v.rn = 1
The latter is more efficient if your documents usually have few revisions and you need to select all or almost all documents; the former is more efficient if the documents have many revisions or you need to select just a small subset of documents.
Update:
Sorry, didn't notice the question is about MySQL.
In MySQL, you do it this way:
SELECT *
FROM document d
LEFT JOIN
version v
ON v.id =
(
SELECT id
FROM version vi
WHERE vi.document = d.id
ORDER BY
vi.document DESC, vi.revision DESC, vi.id DESC
LIMIT 1
)
Create a composite index on version (document, revision, id) for this to work fast.
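The MySQL pattern also runs on SQLite, which makes it easy to check with Python's sqlite3 module. In this sketch the document and version rows are hypothetical; document 3 has no versions, so the subquery yields NULL and the LEFT JOIN keeps that document with NULLs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE version (id INTEGER PRIMARY KEY, document INTEGER, revision INTEGER);
-- Composite index so the inner ORDER BY ... LIMIT 1 can be served from the index.
CREATE INDEX ix_version_doc_rev ON version (document, revision, id);
INSERT INTO document VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO version VALUES (10, 1, 1), (11, 1, 2), (20, 2, 1);
""")

rows = con.execute("""
SELECT d.id, v.id, v.revision
FROM document d
LEFT JOIN version v
       ON v.id = (
           -- Latest revision of this document, one row only.
           SELECT vi.id
           FROM version vi
           WHERE vi.document = d.id
           ORDER BY vi.document DESC, vi.revision DESC, vi.id DESC
           LIMIT 1
       )
ORDER BY d.id
""").fetchall()

print(rows)  # [(1, 11, 2), (2, 20, 1), (3, None, None)]
```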
If I understand you correctly, top doesn't solve your problem either. top is exactly equivalent to limit. What you are looking for is aggregate functions, like max() or min() if you want the extremes. for example:
select a.link_id, max(column_a), min(column_b) from table_a a, table_b b
where a.link_id = b.link_id group by a.link_id