db2 query top group by - sql

I've been trying for hours but can't get the query right. I want to do the following using DB2. From the tables Company and Users I have the following ticket quantity info per company/user.
QUERY USING:
SELECT T.USER, COUNT(T.USER) AS QUANTITY, T.COMPANY FROM TICKET T
INNER JOIN COMPANY P ON P.COMPANY = T.COMPANY
GROUP BY (T.USER, T.COMPANY) ORDER BY QUANTITY DESC
Outcome is:
user   company  quantity
----------------------------------
mark   nissn    300
tom    toyt     50
steve  kryr     80
mark   frd      20
tom    toyt     120
jose   toyt     230
tom    nissn    145
steve  toyt     10
jose   kryr     35
steve  frd      100
THIS SHOULD BE THE RESULT(Top user per company)
user   company  quantity
----------------------------------
mark   nissn    300
jose   toyt     230
steve  frd      100
steve  kryr     80
As you can see, a company has many users, and each user has a different quantity per company. The result should return the user with the highest quantity for each company. For example, company nissn has two users, mark with 300 and tom with 145, so it should give me mark with 300. The same goes for toyt, frd, and kryr. I need all of them in one query.
I wonder if that's possible in a single query or if I will need to create a stored procedure.

You can do this with analytic queries. But be careful: the pattern usually involves nested subqueries (one to produce the dataset, the next to add the analytic ranking column, and a third to select the rows you want).
In this case it should look something like this.
Original query.
SELECT T.USER, COUNT(T.USER) AS QUANTITY, T.COMPANY
FROM TICKET T
JOIN COMPANY P
ON P.COMPANY = T.COMPANY
GROUP BY (T.USER, T.COMPANY)
Analytic query. (Note that the trailing s names the subquery. I have not used DB2, but the SQL standard requires an alias on a derived table, and I know of at least one database that enforces this.)
SELECT user, quantity, company
, RANK () OVER (PARTITION BY company ORDER BY quantity DESC) as r
FROM ( ... previous query ... ) s
Final result.
SELECT user, quantity, company
FROM ( ... previous query ... ) t
WHERE r = 1
The combined query is:
SELECT user, quantity, company
FROM (
    SELECT user, quantity, company
         , RANK () OVER (PARTITION BY company ORDER BY quantity DESC) as r
    FROM (
        SELECT T.USER, COUNT(T.USER) AS QUANTITY, T.COMPANY
        FROM TICKET T
        JOIN COMPANY P
          ON P.COMPANY = T.COMPANY
        GROUP BY (T.USER, T.COMPANY)
    ) s
) t
WHERE r = 1
As I say I have not used DB2. But according to the SQL standard, that query should work.
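For anyone who wants to try the three-layer pattern without a DB2 instance, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. It is not DB2: the table is a hypothetical cut-down `ticket` table, the column is named `user_name` to sidestep `USER` being a special word in some databases, the join to COMPANY is left out, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one row per ticket.
conn.executescript("""
    CREATE TABLE ticket (user_name TEXT, company TEXT);
    INSERT INTO ticket VALUES
        ('mark', 'nissn'), ('mark', 'nissn'), ('mark', 'nissn'),
        ('tom',  'nissn'),
        ('jose', 'toyt'),  ('jose', 'toyt'),
        ('tom',  'toyt');
""")

top_per_company = conn.execute("""
    SELECT user_name, company, quantity
    FROM (
        -- layer 2: rank each user inside its company by ticket count
        SELECT user_name, company, quantity,
               RANK() OVER (PARTITION BY company ORDER BY quantity DESC) AS r
        FROM (
            -- layer 1: ticket count per user and company
            SELECT user_name, company, COUNT(*) AS quantity
            FROM ticket
            GROUP BY user_name, company
        ) s
    ) t
    WHERE r = 1          -- layer 3: keep only the top-ranked rows
    ORDER BY company
""").fetchall()

print(top_per_company)
```

Because RANK() gives tied users the same rank, a company whose top two users have equal counts would return both rows; ROW_NUMBER() would keep exactly one.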

SQL - Return unique rows based on 2 columns and a condition

I have a table on HSQLDB with data as
Id  Account  Opendate  Baldate  LastName  ........  State
1   1234     040111    041217   Jackson             AZ
2   1234     040111    051217   James               FL
3   2345     050112    061213   Thomas              CA
4   2345     050112    061213   Kay                 DE
How can I write a query that returns the rows with distinct values in the Account and Opendate columns, keeping the one with the maximum Baldate? If Baldate is also the same, return the first row ordered by Id.
So the resultset should contain
Id  Account  Opendate  Baldate  LastName  ........  State
2   1234     040111    051217   James               FL
3   2345     050112    061213   Thomas              CA
I have gotten this far.
select LastName,...,State, max(BalDate) from ACCOUNTS group by Account, Opendate
But the query fails since I cannot select columns that are neither aggregated nor in the GROUP BY (LastName, State, etc.). How can I resolve this?
HSQLDB supports correlated subqueries, so I think this will work:
select a.*
from accounts a
where a.id = (select a2.id
              from accounts a2
              where a2.account = a.account and a2.opendate = a.opendate
              order by baldate desc, id asc
              limit 1
             );
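A quick way to check the correlated-subquery idea is to run it on the question's four rows. The sketch below uses SQLite from Python rather than HSQLDB, so it only shows that the logic picks the intended rows; it also assumes the date columns are stored as text whose string order matches date order, which the question does not confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, account INTEGER, opendate TEXT,
                           baldate TEXT, lastname TEXT, state TEXT);
    INSERT INTO accounts VALUES
        (1, 1234, '040111', '041217', 'Jackson', 'AZ'),
        (2, 1234, '040111', '051217', 'James',   'FL'),
        (3, 2345, '050112', '061213', 'Thomas',  'CA'),
        (4, 2345, '050112', '061213', 'Kay',     'DE');
""")

# For each row, the subquery finds the id that should win within the same
# (account, opendate) group: latest baldate first, lowest id as tie-breaker.
winners = conn.execute("""
    SELECT a.id, a.lastname
    FROM accounts a
    WHERE a.id = (SELECT a2.id
                  FROM accounts a2
                  WHERE a2.account = a.account
                    AND a2.opendate = a.opendate
                  ORDER BY a2.baldate DESC, a2.id ASC
                  LIMIT 1)
    ORDER BY a.id
""").fetchall()

print(winners)
```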
I'm not familiar with HSQLDB, so this is just ANSI SQL. I see that it doesn't support analytic functions, which would make life easier.
If it works the other answer is a lot cleaner.
SELECT ACC_T1.Id,
       ACC_T1.Opendate,
       ACC_T1.Baldate,
       ACC_T1.LastName,
       ...
       ACC_T1.State
  FROM Account_Table AS ACC_T1
       -- SB1: latest BalDate per (account, OpenDate)
 INNER
  JOIN (SELECT account,
               OpenDate,
               MAX(BalDate) AS m_BalDate
          FROM Account_Table
         GROUP
            BY account,
               OpenDate
       ) AS SB1
    ON ACC_T1.Account = SB1.Account
   AND ACC_T1.OpenDate = SB1.OpenDate
   AND ACC_T1.BalDate = SB1.m_BalDate
       -- SB2: lowest id per (account, OpenDate, BalDate), used as the tie-breaker
 INNER
  JOIN (SELECT account,
               OpenDate,
               BalDate,
               MIN(id) AS m_id
          FROM Account_Table
         GROUP
            BY account,
               OpenDate,
               BalDate
       ) AS SB2
    ON ACC_T1.Account = SB2.Account
   AND ACC_T1.OpenDate = SB2.OpenDate
   AND ACC_T1.BalDate = SB2.BalDate
   AND ACC_T1.id = SB2.m_id

SQL Server Amount Split

I have below 2 tables in SQL Server database.
Customer Main Expense Table
ReportID CustomerID TotalExpenseAmount
1000 1 200
1001 2 600
Attendee Table
ReportID AttendeeName
1000 Mark
1000 Sam
1000 Joe
There is no amount at the attendee level. I need to manually calculate each attendee's amount as described below (i.e. split TotalExpenseAmount by the number of attendees and ensure the individual split figures are rounded to 2 decimals and sum exactly to TotalExpenseAmount).
The final report should look like:
ReportID  CustID  AttendeeName  TotalAmount  AttendeeAmount
1000      1       Mark          200          66.66
1000      1       Sam           200          66.66
1000      1       Joe           200          66.68
The final report will have about 150,000 records. Notice that I have rounded the last attendee amount in such a way that the totals match 200. What is the best way to write an efficient SQL query in this scenario?
You can do this using window functions:
select ReportID, CustID, AttendeeName, TotalAmount,
       (case when seqnum = 1
             then TotalAmount - perAttendee * (cnt - 1)
             else perAttendee
        end) as AttendeeAmount
from (select a.ReportID, e.CustomerID as CustID, a.AttendeeName,
             e.TotalExpenseAmount as TotalAmount,
             row_number() over (partition by a.ReportID order by a.AttendeeName) as seqnum,
             count(*) over (partition by a.ReportID) as cnt,
             cast(e.TotalExpenseAmount * 1.0 / count(*) over (partition by a.ReportID) as decimal(10, 2)) as perAttendee
      from attendee a join
           expense e
           on a.ReportID = e.ReportID
     ) ae;
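The perAttendee amount is calculated in the subquery and reduced to two decimal places with cast(). Note that in SQL Server a cast to decimal(10, 2) rounds to the nearest cent rather than rounding down, so for 200 split three ways the share is 66.67, not 66.66; use ROUND(x, 2, 1) if you need the shares truncated instead. For one of the rows (the one with seqnum = 1), the amount is the total minus the sum of all the other attendees, so the figures always add up to the total exactly.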
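The "one row absorbs the remainder" idea can be sketched independently of SQL Server's decimal rules by working in integer cents. The example below runs on SQLite from Python (window functions need SQLite 3.25 or later); the table and column names (`expense`, `attendee`, `total_cents`) are hypothetical stand-ins for the question's tables, and storing money as cents is an assumption, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expense  (report_id INTEGER, customer_id INTEGER, total_cents INTEGER);
    CREATE TABLE attendee (report_id INTEGER, attendee_name TEXT);
    INSERT INTO expense  VALUES (1000, 1, 20000);   -- 200.00 stored as cents
    INSERT INTO attendee VALUES (1000, 'Mark'), (1000, 'Sam'), (1000, 'Joe');
""")

shares = conn.execute("""
    SELECT attendee_name,
           CASE WHEN seqnum = 1
                THEN total_cents - share * (cnt - 1)  -- this row takes the remainder
                ELSE share
           END AS attendee_cents
    FROM (
        SELECT a.attendee_name, e.total_cents,
               ROW_NUMBER() OVER (PARTITION BY a.report_id
                                  ORDER BY a.attendee_name) AS seqnum,
               COUNT(*) OVER (PARTITION BY a.report_id) AS cnt,
               -- integer division truncates, so each share is rounded down
               e.total_cents / COUNT(*) OVER (PARTITION BY a.report_id) AS share
        FROM attendee a
        JOIN expense e ON e.report_id = a.report_id
    ) ae
    ORDER BY attendee_name
""").fetchall()

print(shares)
```

Here the first attendee alphabetically (Joe) receives 6668 cents and the other two 6666 each, which matches the 66.66 / 66.66 / 66.68 split in the question.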
Doing something similar to @Gordon's answer but using a CTE instead.
with CTECount AS (
    select A.ReportId, A.AttendeeName,
        ROW_NUMBER() OVER (PARTITION BY A.ReportId ORDER BY A.AttendeeName) [RowNum],
        COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId) [AttendeeCount],
        -- * 1.0 avoids integer division if TotalExpenseAmount is an integer column
        CAST(C.TotalExpenseAmount * 1.0 / (COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId)) AS DECIMAL(10,2)) [PerAmount]
    FROM #Customer C INNER JOIN #Attendee A ON A.ReportId = C.ReportID
)
SELECT CT.ReportID, CT.CustomerId, AT.AttendeeName,
    CASE WHEN CC.RowNum = 1 THEN CT.TotalExpenseAmount - CC.PerAmount * (CC.AttendeeCount - 1)
         ELSE CC.PerAmount END [AttendeeAmount]
FROM #Customer CT INNER JOIN #Attendee AT
    ON CT.ReportID = AT.ReportId
INNER JOIN CTECount CC
    ON CC.ReportId = CT.ReportID AND CC.AttendeeName = AT.AttendeeName
I like the CTE because it allows me to separate the different aspects of the query. The cool thing that @Gordon used was the CASE expression and the inner calculation that make the lines total correctly.

SQL Inner Join query

I have following table structures,
cust_info
    cust_id
    cust_name
bill_info
    bill_id
    cust_id
    bill_amount
    bill_date
paid_info
    paid_id
    bill_id
    paid_amount
    paid_date
Now my output should display the bills whose bill_date falls between two dates (for example, 1 Jan 2013 to 1 Feb 2013), one row per bill, as follows:
cust_name | bill_id | bill_amount | tpaid_amount | bill_date | balance
where tpaid_amount is total paid for particular bill_id
For example, for bill_id abcd, bill_amount is 10000 and the user pays 2000 the first time and 3000 the second time.
This means the paid_info table contains two entries for the same bill_id:
bill_id | paid_amount
abcd 2000
abcd 3000
so, tpaid_amount = 2000 + 3000 = 5000 and balance = 10000 - tpaid_amount = 10000 - 5000 = 5000
Is there any way to do this with a single query (inner joins)?
You'd want to join the 3 tables, then group them by bill ids and other relevant data, like so.
-- the select line, as well as getting your columns to display, is where you'll work
-- out your computed columns, or what are called aggregate functions, such as tpaid and balance
SELECT c.cust_name, p.bill_id, b.bill_amount, SUM(p.paid_amount) AS tpaid, b.bill_date, b.bill_amount - SUM(p.paid_amount) AS balance
-- joining up the 3 tables here on the id columns that point to the other tables
FROM cust_info c INNER JOIN bill_info b ON c.cust_id = b.cust_id
INNER JOIN paid_info p ON p.bill_id = b.bill_id
-- between pretty much does what it says
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
-- in group by, we not only need to join rows together based on which bill they're for
-- (bill_id), but also any column we want to select in SELECT.
GROUP BY c.cust_name, p.bill_id, b.bill_amount, b.bill_date
A quick overview of group by: It will take your result set and smoosh rows together, based on where they have the same data in the columns you give it. Since each bill will have the same customer name, amount, date, etc, we are fine to group by those as well as the bill id, and we'll get a record for each bill. If we wanted to group it by p.paid_amount, though, since each payment would have a different one of those (possibly), you'd get a record for each payment as opposed to for each bill, which isn't what you'd want.
Once group by has smooshed these rows together, you can run aggregate functions such as SUM(column). In this example, SUM(p.paid_amount) totals up all the payments that have that bill_id to work out how much has been paid. For more information, please look at the W3Schools chapter on group by in their SQL tutorials.
Hope I've understood this correctly and that this helps you.
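One thing worth checking with the join-then-group approach is what happens to a bill that has no payments yet: an INNER JOIN to paid_info drops it from the result. The sketch below (SQLite from Python, with a made-up customer 'alice' and a made-up unpaid bill 'efgh' that are not in the question) shows a variant using LEFT JOIN plus COALESCE so unpaid bills still appear with tpaid_amount 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust_info (cust_id INTEGER, cust_name TEXT);
    CREATE TABLE bill_info (bill_id TEXT, cust_id INTEGER,
                            bill_amount INTEGER, bill_date TEXT);
    CREATE TABLE paid_info (paid_id INTEGER, bill_id TEXT,
                            paid_amount INTEGER, paid_date TEXT);
    INSERT INTO cust_info VALUES (1, 'alice');
    INSERT INTO bill_info VALUES ('abcd', 1, 10000, '2013-01-10'),
                                 ('efgh', 1,  4000, '2013-01-20');
    INSERT INTO paid_info VALUES (1, 'abcd', 2000, '2013-01-11'),
                                 (2, 'abcd', 3000, '2013-01-15');
""")

bills = conn.execute("""
    SELECT c.cust_name, b.bill_id, b.bill_amount,
           COALESCE(SUM(p.paid_amount), 0)                 AS tpaid_amount,
           b.bill_date,
           b.bill_amount - COALESCE(SUM(p.paid_amount), 0) AS balance
    FROM cust_info c
    JOIN bill_info b      ON b.cust_id = c.cust_id
    LEFT JOIN paid_info p ON p.bill_id = b.bill_id   -- keeps bills with no payments
    WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
    GROUP BY c.cust_name, b.bill_id, b.bill_amount, b.bill_date
    ORDER BY b.bill_id
""").fetchall()

for row in bills:
    print(row)
```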
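This will do the trick:
select
    cust_info.cust_name,
    bill_info.bill_id,
    bill_info.bill_amount,
    coalesce(sum(paid_info.paid_amount), 0) as tpaid_amount,
    bill_info.bill_date,
    bill_info.bill_amount - coalesce(sum(paid_info.paid_amount), 0) as balance
from
    cust_info
    left outer join bill_info
        left outer join paid_info
        on bill_info.bill_id = paid_info.bill_id
    on cust_info.cust_id = bill_info.cust_id
where
    -- X and Y are placeholders for the start and end of the date range
    bill_info.bill_date between X and Y
group by
    cust_info.cust_name,
    bill_info.bill_id,
    bill_info.bill_amount,
    bill_info.bill_date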

SQL Query For Most Popular Combination

Suppose I have a grocery store application with a table of purchases:
customerId int
itemId int
Four customers come into the store:
Bob buys a banana, lemonade, and a cookie
Kevin buys a banana, lemonade, and a donut
Sam buys a banana, orange juice, and a cupcake
Susie buys a banana
I am trying to write a query which would return which combinations of items are most popular. In this case, the results of this query should be:
banana and lemonade-2
I have already written a query which tells me a list of all items which were in a multi-item purchase (we exclude sales of one item - it cannot form a "combination"). It returns:
banana - 3
lemonade - 2
cookie - 1
donut - 1
cupcake - 1
orange juice - 1
Here is the query:
SELECT itemId, count( * )
FROM grocery_store
INNER JOIN (
SELECT customerId
FROM grocery_store
GROUP BY customerId
HAVING count( itemId ) > 1
)subQuery ON subQuery.customerId = grocery_store.customerId
GROUP BY itemId;
Could I get a pointer about how to expand my existing query to get the desired output?
select a.itemId, b.itemId, COUNT(*) countForCombination
from grocery_store a
inner join grocery_store b
    on a.customerId = b.customerId
    and a.itemId < b.itemId
group by a.itemId, b.itemId
order by countForCombination desc
Assumed:
grocery_store = sales records
customerId = unique sale
This query takes all the grocery_store records and, for each single sales transaction, creates all the possible combinations (a.itemId, b.itemId) in a specific order (a.itemId < b.itemId).
This specific order eliminates duplicates: (apple, orange) is kept, whereas (orange, apple) is not necessary.
After producing all the combinations from all sales, a simple group by and a sort by count show the most popular combinations at the top.
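To see the self-join at work on the four customers from the question, here is a small sketch using SQLite from Python. The numeric ids are invented for illustration (banana = 1, lemonade = 2, cookie = 3, donut = 4, orange juice = 5, cupcake = 6; Bob = 1, Kevin = 2, Sam = 3, Susie = 4), and it assumes a customer never has the same item listed twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grocery_store (customerId INTEGER, itemId INTEGER);
    INSERT INTO grocery_store VALUES
        (1, 1), (1, 2), (1, 3),   -- Bob: banana, lemonade, cookie
        (2, 1), (2, 2), (2, 4),   -- Kevin: banana, lemonade, donut
        (3, 1), (3, 5), (3, 6),   -- Sam: banana, orange juice, cupcake
        (4, 1);                   -- Susie: banana only, so she forms no pair
""")

# Pair every item with every higher-numbered item bought by the same customer,
# then count how often each pair occurs across customers.
top_pair = conn.execute("""
    SELECT a.itemId, b.itemId, COUNT(*) AS pair_count
    FROM grocery_store a
    JOIN grocery_store b
      ON a.customerId = b.customerId
     AND a.itemId < b.itemId
    GROUP BY a.itemId, b.itemId
    ORDER BY pair_count DESC
    LIMIT 1
""").fetchone()

print(top_pair)
```

Single-item purchases drop out on their own, because a row cannot pair with itself under the a.itemId < b.itemId condition.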

How do I write a standard SQL GROUP BY that includes columns not in the GROUP BY clause

Let's say I have a table called Customer, defined like this:
Id Name DepartmentId Hired
1 X 101 2001/01/01
2 Y 102 2002/01/01
3 Z 102 2003/01/01
And I want to retrieve the date of the last hiring in each department.
Obviously I would do this
SELECT c.DepartmentId, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
Which returns:
101 2001/01/01
102 2003/01/01
But what do I do if I want to return the name of the guy hired? I.e. I would want this result set:
101 2001/01/01 X
102 2003/01/01 Z
Note that the following does not work, as it would return three rows rather than the two I'm looking for:
SELECT c.DepartmentId, c.Name, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
I can't remember seeing a query that achieves this.
NOTE: It's not acceptable to join on the Hired field, as that would not be guaranteed to be accurate.
A subselect would do the job and would handle the case where more than one person was hired in the same department on the same day:
SELECT c.DepartmentId, c.Name, c.Hired from Customer c,
    (SELECT DepartmentId, MAX(Hired) as MaxHired
     FROM Customer
     GROUP BY DepartmentId) as sub
WHERE c.DepartmentId = sub.DepartmentId AND c.Hired = sub.MaxHired
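A short sketch of how this behaves when two people share the latest hire date, run on SQLite from Python. The first three rows are the question's data; the fourth row ('W', hired the same day as 'Z') is a hypothetical addition to show the tie, and the join is written with explicit JOIN syntax rather than the comma form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER, Name TEXT, DepartmentId INTEGER, Hired TEXT);
    INSERT INTO Customer VALUES
        (1, 'X', 101, '2001/01/01'),
        (2, 'Y', 102, '2002/01/01'),
        (3, 'Z', 102, '2003/01/01'),
        (4, 'W', 102, '2003/01/01');
""")

# Join each row to its department's latest hire date; every row that matches
# that date is returned, so ties produce more than one row per department.
latest_hires = conn.execute("""
    SELECT c.DepartmentId, c.Hired, c.Name
    FROM Customer c
    JOIN (SELECT DepartmentId, MAX(Hired) AS MaxHired
          FROM Customer
          GROUP BY DepartmentId) sub
      ON c.DepartmentId = sub.DepartmentId
     AND c.Hired = sub.MaxHired
    ORDER BY c.DepartmentId, c.Name
""").fetchall()

print(latest_hires)
```

If exactly one row per department is required, an extra tie-breaker (for example the lowest Id) has to be added on top of this.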
Standard SQL:
select *
from Customer C
where exists
(
    -- Linq to Sql puts NULL here instead ;-)
    -- In fact, you can even put 1/0 here without causing a division-by-zero error,
    -- because the RDBMS typically does not evaluate the select list of an EXISTS subquery
    SELECT NULL
    FROM Customer
    where C.DepartmentId = DepartmentId
    GROUP BY DepartmentId
    having C.Hired = MAX(Hired)
)
If your database supports tuple testing (a row-value IN, as in PostgreSQL, MySQL, or Oracle; SQL Server does not), this is the most succinct:
select *
from Customer
where (DepartmentId, Hired) in
(select DepartmentId, MAX(Hired)
from Customer
group by DepartmentId)
SELECT a.*
FROM Customer AS a
JOIN
(SELECT DepartmentId, MAX(Hired) AS Hired
FROM Customer GROUP BY DepartmentId) AS b
USING (DepartmentId,Hired);