I have been able to get the total value of the two values but I'm not sure how to get the max. When I run this code, I get 4 rows with the correct values, but I just want to display the row with the maximum value out of those 4 rows.
SELECT AC.ACTID, SUM(AC.HOURS_WORKED * AL.HOURLYRATE) TOTAL
FROM ACTION AC
INNER JOIN ALLOCATION AL
ON AC.ACTID = AL.ACTID
INNER JOIN EMPLOYEE E
ON E.EMPID = AL.EMPID
GROUP BY AC.ACTID
I have to also put in EMPID but I'm not worried about that because that part is fine. Also this is SQL code.
You are showing actions with their accumulated costs.
According to your query the action table contains hours_worked and this value applies to every single employee involved. E.g. with hours_worked = 5 and three employees on that action, there were 15 hours worked.
Then there is the allocation table allowing many employees to work on one action on one hand and one employee to participate on many actions on the other (m:n relation). The employees are thus grouped per action. Say, in the example of three employees, one is allocated with an hourlyrate of 100 and the other two are allocated with an hourlyrate of 200. Then you have a total of 1 * 5 * 100 + 2 * 5 * 200 = 2500.
You are selecting many actions and you only want to show the top one(s) according to the calculated totals. If you have four actions for instance with the totals 1000, 2000, 2500, and again 2500, you want to show the two actions with 2500.
In Oracle (and standard SQL for that matter), you use FETCH FIRST ROW WITH TIES for that:
SELECT
ac.actid,
SUM(ac.hours_worked * al.hourlyrate) AS total
FROM action ac
INNER JOIN allocation al ON ac.actid = al.actid
INNER JOIN employee e ON e.empid = al.empid
GROUP BY ac.actid
ORDER BY total DESC
FETCH FIRST ROW WITH TIES;
As there are multiple employees involved per action, you'll have to create a string with their list, if you want to show them with the action. Use LISTAGG for this.
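For engines without FETCH FIRST ... WITH TIES, the same "top rows including ties" result can be had by comparing each total against the maximum total. Below is a minimal sketch using Python's built-in sqlite3 module; the sample rows are made up to reproduce the totals 1000, 2000, 2500 and 2500 from the example above, and SQLite's GROUP_CONCAT stands in for Oracle's LISTAGG. The table is quoted only because ACTION is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "action" (actid INTEGER PRIMARY KEY, hours_worked INTEGER);
CREATE TABLE allocation (actid INTEGER, empid INTEGER, hourlyrate INTEGER);
-- Hypothetical sample data giving totals 1000, 2000, 2500, 2500
INSERT INTO "action" VALUES (1, 5), (2, 10), (3, 5), (4, 25);
INSERT INTO allocation VALUES
    (1, 1, 200),
    (2, 2, 200),
    (3, 1, 100), (3, 2, 200), (3, 3, 200),
    (4, 3, 100);
""")

# Compute the total per action once, then keep every action whose
# total equals the maximum, so ties are preserved.
top_actions = conn.execute("""
WITH totals AS (
    SELECT ac.actid,
           SUM(ac.hours_worked * al.hourlyrate) AS total,
           GROUP_CONCAT(al.empid) AS employees
    FROM "action" ac
    JOIN allocation al ON ac.actid = al.actid
    GROUP BY ac.actid
)
SELECT actid, total, employees
FROM totals
WHERE total = (SELECT MAX(total) FROM totals)
ORDER BY actid
""").fetchall()

for actid, total, employees in top_actions:
    print(actid, total, employees)
```

Both actions with a total of 2500 are returned, each with the list of its employee IDs as a single string.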
This question already has answers here:
Two SQL LEFT JOINS produce incorrect result
(3 answers)
Closed 12 months ago.
I am having some troubles with a count function. The problem is given by a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales,
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        4                    8
James          100022        6                    10
But instead, I get:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        290                  60
James          100022        290                  60
It works when, instead of COUNT(product_code), I use COUNT(DISTINCT product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I use count distinct and take into account more than 1 week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which inflates your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases and the second store has 5. Your total count is 3 * 5 = 15, because each entry in the first store is joined to every entry in the second store with the same customer id. So the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and so on; you can see the bloat. By having each store pre-aggregate in its own subquery, you get AT MOST one record per customer per store (and per product, as per your desired outcome).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
No need for a GROUP BY or HAVING in the outer query, since each pre-aggregate yields at most 1 record per unique combination. As for filtering by date ranges, I would just add a WHERE clause inside each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
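The 3 * 5 bloat described above is easy to reproduce. The following is a small sketch with Python's sqlite3 module and made-up data: one customer, 3 rows in store 1 and 5 rows in store 2. For brevity it joins on product_code, as the question does, rather than on the customerid column the answer assumes the store tables have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_df (customer_name TEXT, product_code INTEGER);
CREATE TABLE store1_df (product_code INTEGER);
CREATE TABLE store2_df (product_code INTEGER);
INSERT INTO customer_df VALUES ('Luigi', 120012);
""")
# Hypothetical sales rows: 3 in store 1, 5 in store 2
conn.executemany("INSERT INTO store1_df VALUES (?)", [(120012,)] * 3)
conn.executemany("INSERT INTO store2_df VALUES (?)", [(120012,)] * 5)

# Joining both detail tables directly pairs every store-1 row with
# every store-2 row, so both counts see 3 * 5 = 15 rows.
naive = conn.execute("""
SELECT c.customer_name,
       COUNT(s1.product_code) AS s1_sales,
       COUNT(s2.product_code) AS s2_sales
FROM customer_df c
LEFT JOIN store1_df s1 ON s1.product_code = c.product_code
LEFT JOIN store2_df s2 ON s2.product_code = c.product_code
GROUP BY c.customer_name
""").fetchall()

# Aggregating each store first leaves at most one row per key,
# so the joins can no longer multiply rows.
pre_aggregated = conn.execute("""
SELECT c.customer_name,
       COALESCE(p1.sales_entries, 0) AS s1_sales,
       COALESCE(p2.sales_entries, 0) AS s2_sales
FROM customer_df c
LEFT JOIN (SELECT product_code, COUNT(*) AS sales_entries
           FROM store1_df GROUP BY product_code) p1
       ON p1.product_code = c.product_code
LEFT JOIN (SELECT product_code, COUNT(*) AS sales_entries
           FROM store2_df GROUP BY product_code) p2
       ON p2.product_code = c.product_code
""").fetchall()

print(naive)           # [('Luigi', 15, 15)]
print(pre_aggregated)  # [('Luigi', 3, 5)]
```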
Is there any way of joining the result of a CASE statement with a reference table without creating a CTE, etc.?
Result AFTER CASE statement:
ID Name Bonus Level (this is the result of a CASE statement)
01 John A
02 Jim B
01 John B
03 Jake C
Reference table
A 10%
B 20%
C 30%
I want to then get the % next to each employee, then the max %age using the MAX function and grouping by ID, then link it back again to the reference so that each employee has the single correct (highest) bonus level next to their name. (This is a totally fictitious scenario, but very similar to what I am looking for).
Just need help with joining the result of the CASE statement with the reference table.
Thanks in advance.
In place of a temporary value as the result of the case statement, you could use a select statement from the reference table.
So if your case statement looks like:
case when variable1 = value then 'A' end as bonuslevel
Then, replacing it like this might help
case when variable1 = value then (select percentage from ReferenceTable where variable2InReferenceTable = 'A') end
I don't know if I am oversimplifying, but based on the results of your CASE query, why not just join that to the reference table and take a max grouped by ID/Name? Since the ID and the person's name won't change (they are the same person), you just get the max you want. To complete the bonus level, rejoin the reference table after the max percentage has been determined for the person.
select
lvl1.ID,
lvl1.Name,
lvl1.FinalBonus,
rt2.BonusLvl
from
( select
PQ.ID,
PQ.Name,
max( rt.PcntBonus ) as FinalBonus
from
(however you
got your
data query ) PQ
JOIN RefTbl rt
on PQ.BonusLvl = rt.BonusLvl
group by
PQ.ID,
PQ.Name ) lvl1
JOIN RefTbl rt2
on lvl1.FinalBonus = rt2.PcntBonus
Since the Bonus levels (A,B,C) do not guarantee corresponding % levels (10,20,30), I did it this way... OTHERWISE, you could have just used max() on both the bonus level and percent. But what if your bonus levels were listed as something like
Limited 10%
Aggressive 20%
Ace 30%
You can see that a max of the level names above would return "Limited", but the max % = 30 is associated with an "Ace" sales rep. Get the 30% first, then look up the label that matches it.
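Here is a runnable sketch of that idea with Python's sqlite3 module. CaseResult is a hypothetical stand-in for "however you got your data", loaded with the Limited/Aggressive/Ace labels, and the join back to RefTbl assumes each percentage appears only once in the reference table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RefTbl (BonusLvl TEXT, PcntBonus INTEGER);
-- CaseResult is a hypothetical stand-in for the CASE query's output
CREATE TABLE CaseResult (ID INTEGER, Name TEXT, BonusLvl TEXT);
INSERT INTO RefTbl VALUES ('Limited', 10), ('Aggressive', 20), ('Ace', 30);
INSERT INTO CaseResult VALUES
    (1, 'John', 'Limited'),
    (2, 'Jim',  'Aggressive'),
    (1, 'John', 'Aggressive'),
    (3, 'Jake', 'Ace');
""")

# MAX on the label sorts alphabetically, which is not the bonus order.
max_label = conn.execute("""
SELECT ID, MAX(BonusLvl) FROM CaseResult GROUP BY ID ORDER BY ID
""").fetchall()

# MAX on the percentage first, then look up the label for that percentage.
max_percent = conn.execute("""
SELECT lvl1.ID, lvl1.Name, lvl1.FinalBonus, rt2.BonusLvl
FROM (SELECT PQ.ID, PQ.Name, MAX(rt.PcntBonus) AS FinalBonus
      FROM CaseResult PQ
      JOIN RefTbl rt ON PQ.BonusLvl = rt.BonusLvl
      GROUP BY PQ.ID, PQ.Name) lvl1
JOIN RefTbl rt2 ON lvl1.FinalBonus = rt2.PcntBonus
ORDER BY lvl1.ID
""").fetchall()

print(max_label[0])    # (1, 'Limited')
print(max_percent[0])  # (1, 'John', 20, 'Aggressive')
```

John holds both "Limited" and "Aggressive": the alphabetical max picks the wrong label, while the percentage-first query returns 20 and "Aggressive".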
I have tried to fetch a record that will return me with the doctor's ID and the total number of all the prescriptions they have given.
SELECT doc.DID, COUNT(pr.DID)
FROM DOCTOR doc, PRESCRIPTION pr
WHERE doc.DID = pr.DID
GROUP BY doc.DID;
With this statement, I get the information as long as a doctor has made at least one prescription. This is what my results look like:
DID COUNT(PR.DID)
-------------------- -------------
3292848 1
3292885 10
3293063 10
3332949 15
3332950 2
But I want doctors who have not prescribed anything to be shown as well, with a count of 0:
DID COUNT(PR.DID)
-------------------- -------------
3292848 1
3292885 10
3293042 0
3293063 10
3332949 15
3332950 2
334021 0
First of all, please avoid using old join syntax. Use proper JOIN syntax.
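Now here you need a LEFT JOIN, which gives you everything from the first table plus the matching records from the second table. For non-matching records you get NULL, which you can make use of in the WHERE or SELECT clause. Because COUNT(pr.DID) counts only non-NULL values, doctors without prescriptions get a count of 0.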
SELECT doc.DID, COUNT(pr.DID)
FROM DOCTOR doc
left join
PRESCRIPTION pr
on doc.DID = pr.DID
GROUP BY doc.DID;
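To see the difference between the two join types on a tiny data set, here is a sketch using Python's sqlite3 module. The rows and the PID column are made up for illustration; only DID comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DOCTOR (DID INTEGER PRIMARY KEY);
-- PID is a hypothetical prescription key
CREATE TABLE PRESCRIPTION (PID INTEGER PRIMARY KEY, DID INTEGER);
INSERT INTO DOCTOR VALUES (3292848), (3292885), (3293042);
INSERT INTO PRESCRIPTION (DID) VALUES (3292848), (3292885), (3292885);
""")

# Inner join: doctor 3293042 has no prescriptions and disappears.
inner_counts = conn.execute("""
SELECT doc.DID, COUNT(pr.DID)
FROM DOCTOR doc
JOIN PRESCRIPTION pr ON doc.DID = pr.DID
GROUP BY doc.DID
ORDER BY doc.DID
""").fetchall()

# Left join: every doctor is kept; COUNT(pr.DID) skips the NULL
# produced for the unmatched doctor and therefore yields 0.
left_counts = conn.execute("""
SELECT doc.DID, COUNT(pr.DID)
FROM DOCTOR doc
LEFT JOIN PRESCRIPTION pr ON doc.DID = pr.DID
GROUP BY doc.DID
ORDER BY doc.DID
""").fetchall()

print(inner_counts)  # [(3292848, 1), (3292885, 2)]
print(left_counts)   # [(3292848, 1), (3292885, 2), (3293042, 0)]
```

Note that COUNT(*) would return 1 for the unmatched doctor, since it counts the row itself rather than the non-NULL values of pr.DID.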
I have three tables: Achievements, Characters, and a Character_Achievements table that stores the IDs of completed achievements and the user ID. I am looking to get each category, the total amount of points possible, and the amount completed.
I am able to get each category and the amount of points possible but I am unsuccessful at retrieving the completed count as well.
I currently use this to get each category and the amount of points possible
SELECT achievements.category, SUM(points) AS Total
FROM achievements
GROUP BY achievements.category ORDER BY achievements._id asc
I get these results.
Category Total
Operations 50
Events 25
I can also get the amount of points completed
SELECT achievements.category, SUM(points) AS Completed
FROM achievements
LEFT JOIN character_achievements
ON character_achievements.achievements_id = achievements._id
LEFT JOIN character
ON character_achievements.character_id = character._id
WHERE character._id = '1'
which returns this, but only for the categories that are completed. How do I combine these two queries?
Category Completed
Operations 50
Events 25
I've tried UNION but it does not return the results I need.
Here are my example tables
Achievements Table
Category Title Points
Operations Epic Enemies 25
Operations Explosive Conflict 25
Events Bounty Contract 25
Character_Achievements Table
Character Character_id Achievements_id
Operations 1 1
Events 1 3
The results I'm looking for would look like this.
Results
Category Completed Total
Operations 25 50
Events 25 25
If I'm understanding your question correctly, you can use SUM with CASE:
SELECT a.category,
SUM(CASE WHEN ca.achievements_id IS NOT NULL THEN a.points ELSE 0 END) AS Completed,
SUM(a.points) AS Total
FROM achievements a
LEFT JOIN character_achievements ca
ON ca.achievements_id = a._id
GROUP BY a.category
ORDER BY MIN(a._id) ASC
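Below is a runnable sketch of the SUM with CASE approach using Python's sqlite3 module and the sample rows from the question. Three things are assumptions rather than part of the answer: the _id values 1 to 3, the per-character filter placed in the join condition (so unmatched achievements still count toward Total), and the ELSE 0 and ORDER BY MIN(a._id) details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE achievements (_id INTEGER PRIMARY KEY, category TEXT,
                           title TEXT, points INTEGER);
CREATE TABLE character_achievements (character_id INTEGER,
                                     achievements_id INTEGER);
INSERT INTO achievements VALUES
    (1, 'Operations', 'Epic Enemies', 25),
    (2, 'Operations', 'Explosive Conflict', 25),
    (3, 'Events', 'Bounty Contract', 25);
INSERT INTO character_achievements VALUES (1, 1), (1, 3);
""")

def points_by_category(character_id):
    """Return (category, completed, total) rows for one character."""
    return conn.execute("""
    SELECT a.category,
           SUM(CASE WHEN ca.achievements_id IS NOT NULL
                    THEN a.points ELSE 0 END) AS completed,
           SUM(a.points) AS total
    FROM achievements a
    LEFT JOIN character_achievements ca
           ON ca.achievements_id = a._id
          AND ca.character_id = ?
    GROUP BY a.category
    ORDER BY MIN(a._id)
    """, (character_id,)).fetchall()

print(points_by_category(1))  # [('Operations', 25, 50), ('Events', 25, 25)]
```

Filtering on the character in a WHERE clause instead would discard the unmatched achievements and shrink Total, which is why the condition sits in the ON clause here.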
I have 2 queries which both aim to select all batchNo values that meet 3 conditions:
ClaimStatus must be 95 or 90
CreatedBy = ProviderLink
The minimum dateUpdate should be from 3pm yesterday until when this query was run
Query 1: Outputs 940 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage,
DateUploaded = MIN(csa.DateUpdated)
FROM
Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
INNER JOIN ClaimStatusAudit csa WITH(NOLOCK) on csa.CLAIMNO = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
GROUP BY bh.BatchNo, bh.Coverage
HAVING MIN(CSA.DateUpdated) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Query 2: Outputs 1314 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage
FROM Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
AND (SELECT MIN(DATEUPDATED) FROM CLAIMSTATUSAUDIT WITH(NOLOCK)WHERE CLAIMNO = ci.ClaimNo) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Though both have the same logic, they output a different number of rows. I would like to know which of the two is more accurate.
BTW, both outputs satisfy the 3 given conditions.
Your assumption is wrong. These two queries are not employing the same logic, simply because of the order in which each clause is evaluated. Clauses are logically evaluated in the following order:
From
Where
Group By
Having
Select
Order By
With that detail out of the way, let's analyze why these two queries return a different number of rows.
The reason you're returning a different number of rows is the point at which each query applies the 3pm-yesterday cutoff.
In Query 1, you're selecting all Batch Numbers and Coverages that meet two conditions:
1. have corresponding records in all joined tables
2. have the desired claim status and were created by "ProviderLink"
You get this list of records once the From and Where clauses have been executed.
The rows are then grouped, and the aggregate (Min) is computed over each whole group, pulling the minimum DateUpdated of any record in the batch; nothing has restricted DateUpdated before this point. When the Having clause then filters the groups, it removes every group that meets criteria 1 and 2 above but whose earliest DateUpdated is before 3pm yesterday. Let's look at an example.
Record 1 has a BatchNo 123 and Coverage A and was last updated on 4/4/2014 12:00:00.000
Record 2 has a BatchNo 123 and Coverage A and was last updated today at 5/1/2014 3:01:00.000
Assuming Records 1 & 2 have corresponding records in all joined tables, Query 1 will pull back the distinct BatchNo and Coverage (123 & A, respectively) and find the minimum DateUpdated, which is 4/4/2014 12:00:00.000. Then, once grouped, your Having clause will see that this DateUpdated is earlier than 3pm yesterday, so it will filter the group out.
Query 2, on the other hand, takes a different approach. It will also see Records 1 and 2 as the same in terms of BatchNo & Coverage because those values are identical. However, its where clause (i.e., the initial filtering process) evaluates the minimum DateUpdated per claim (the subquery is correlated on ClaimNo), not per batch. If Records 1 and 2 belong to different claims, the claim behind Record 2 passes the cutoff on its own, so batch 123 is returned in the dataset.
I think you will find that is the case with the 374 missing records from Dataset 1.
All that said, and with the understanding that we cannot tell you which dataset is better, you'll find that Query 1 only shows distinct BatchNo & Coverage groups whose minimum DateUpdated across all records in the group is at or after 3pm yesterday. This means Query 1 returns only BatchNos and Coverages that contain nothing but very new records.
Query 2 returns any distinct BatchNo & Coverage grouping in which at least one claim has had all of its status updates at or after 3pm yesterday. So which one is right for you?
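The difference between the two filters can be reproduced on a stripped-down schema. This is only a sketch with Python's sqlite3 module: the tables claims_info and audit, their columns, the fixed cutoff string, and the sample rows are all hypothetical simplifications of the original tables, and Records 1 and 2 from the example are assumed to belong to two different claims in batch 123.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for ClaimsInfo and ClaimStatusAudit
CREATE TABLE claims_info (batch_no INTEGER, claim_no INTEGER);
CREATE TABLE audit (claim_no INTEGER, date_updated TEXT);
INSERT INTO claims_info VALUES (123, 1), (123, 2), (456, 3);
INSERT INTO audit VALUES
    (1, '2014-04-04 12:00:00'),   -- old update in batch 123
    (2, '2014-05-01 03:01:00'),   -- new update in batch 123
    (3, '2014-05-01 09:00:00');   -- new update in batch 456
""")
cutoff = "2014-04-30 15:00:00"  # stands in for "3pm yesterday"

# Query 1 style: the minimum is taken over the whole batch, then filtered.
having_style = conn.execute("""
SELECT ci.batch_no
FROM claims_info ci
JOIN audit a ON a.claim_no = ci.claim_no
GROUP BY ci.batch_no
HAVING MIN(a.date_updated) >= ?
ORDER BY ci.batch_no
""", (cutoff,)).fetchall()

# Query 2 style: the minimum is taken per claim inside the WHERE clause,
# so one recent claim is enough for the batch to appear.
where_style = conn.execute("""
SELECT DISTINCT ci.batch_no
FROM claims_info ci
WHERE (SELECT MIN(date_updated) FROM audit
       WHERE claim_no = ci.claim_no) >= ?
ORDER BY ci.batch_no
""", (cutoff,)).fetchall()

print(having_style)  # [(456,)]
print(where_style)   # [(123,), (456,)]
```

Batch 123 is dropped by the HAVING version because of its one old update, but kept by the WHERE version because claim 2 passes on its own, which mirrors the extra rows returned by Query 2.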