Query results half right after joining two tables - sql

The following query returns correct results only for the columns from the inner query (post_engagement, website_purchases), while all the other numbers are inflated many times over. Any ideas? Thanks.
Schema of the two tables:
Favorite_ads (id, campaign_id, campaign_name, objective, impressions, spend)
Actions (id, ads_id, action_type, value)
SELECT
f.campaign_id,
f.campaign_name,
f.objective,
SUM(f.impressions) AS Impressions,
SUM(f.spend) AS Spend,
SUM(a.post_engagement) AS "Post Engagement",
SUM(a.website_purchases) AS "Website Purchases"
FROM
favorite_ads f
LEFT JOIN (
SELECT
ads_id,
CASE WHEN action_type = 'post_engagement' THEN SUM(value) END AS
post_engagement,
CASE WHEN action_type = 'offsite_conversion.fb_pixel_purchase' THEN SUM(value) END AS website_purchases
FROM Actions a
GROUP BY ads_id, action_type
) a ON f.id = a.ads_id
WHERE date_trunc('month',f.date_start) = '2018-04-01 00:00:00' AND
date_trunc('month',f.date_stop) = '2018-04-01 00:00:00' --only get campaigns that ran in April, 2018
GROUP BY f.campaign_id, campaign_name, objective
Order by campaign_id

Without knowing the actual table structure, constraints, dependencies and data, it's hard to tell what the issue may be.
You already have some leads in the comments, which you should consider first.
For example, you wrote that this sub-query returns correct results:
SELECT ads_id,
CASE
WHEN action_type = 'post_engagement'
THEN SUM(value)
END AS post_engagement,
CASE
WHEN action_type = 'offsite_conversion.fb_pixel_purchase'
THEN SUM(value)
END AS website_purchases
FROM Actions a
GROUP BY ads_id, action_type
Does this one also give correct results?
SELECT ads_id,
SUM(
CASE
WHEN action_type = 'post_engagement'
THEN value
END
) AS post_engagement,
SUM(
CASE
WHEN action_type = 'offsite_conversion.fb_pixel_purchase'
THEN value
END
) AS website_purchases
FROM Actions
GROUP BY ads_id
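If so, then try replacing your sub-query with that one. Your version groups by ads_id and action_type, so it returns one row per action type for each ad; each favorite_ads row is then joined to all of those rows, and its impressions and spend are summed once per matching row. Grouping by ads_id alone returns a single row per ad.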
If you still have a problem, then I'd check whether your join condition is correct: it seems that for a campaign (campaign_id) you could have multiple entries with the same id, which would multiply the sub-query results. That depends on what the primary key (or unique constraint) of favorite_ads actually is.
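To see the fan-out described above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables are cut down to the columns involved, the date filter is left out, and the sample rows are made up: one ad with 100 impressions and two action types.

```python
import sqlite3

# Hypothetical sample data: ad 1 has 100 impressions and two action types,
# so a sub-query grouped by (ads_id, action_type) returns two rows for it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE favorite_ads (id INTEGER, campaign_id INTEGER, impressions INTEGER);
CREATE TABLE actions (id INTEGER, ads_id INTEGER, action_type TEXT, value INTEGER);
INSERT INTO favorite_ads VALUES (1, 10, 100);
INSERT INTO actions VALUES
  (1, 1, 'post_engagement', 7),
  (2, 1, 'post_engagement', 3),
  (3, 1, 'offsite_conversion.fb_pixel_purchase', 2);
""")

# The question's sub-query: one row per (ads_id, action_type).
PER_TYPE = """
    SELECT ads_id,
           CASE WHEN action_type = 'post_engagement' THEN SUM(value) END AS post_engagement,
           CASE WHEN action_type = 'offsite_conversion.fb_pixel_purchase' THEN SUM(value) END AS website_purchases
    FROM actions
    GROUP BY ads_id, action_type
"""

# The conditional-aggregation sub-query: one row per ads_id.
PER_AD = """
    SELECT ads_id,
           SUM(CASE WHEN action_type = 'post_engagement' THEN value END) AS post_engagement,
           SUM(CASE WHEN action_type = 'offsite_conversion.fb_pixel_purchase' THEN value END) AS website_purchases
    FROM actions
    GROUP BY ads_id
"""

def campaign_totals(subquery):
    """Join favorite_ads to a per-ad sub-query and sum per campaign."""
    sql = f"""
        SELECT f.campaign_id, SUM(f.impressions),
               SUM(a.post_engagement), SUM(a.website_purchases)
        FROM favorite_ads f
        LEFT JOIN ({subquery}) a ON f.id = a.ads_id
        GROUP BY f.campaign_id
    """
    return conn.execute(sql).fetchone()

print(campaign_totals(PER_TYPE))  # (10, 200, 10, 2): impressions counted twice
print(campaign_totals(PER_AD))    # (10, 100, 10, 2): impressions counted once
```

The action sums are right in both cases, because NULLs are skipped by SUM; only the columns from favorite_ads are multiplied, which matches the symptom in the question.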

Related

SQL Aggregating data based on condition containing the key fields for aggregation

I am new to SQL (Oracle SQL, if it makes a difference), but it so happens I have to use it. I need to aggregate data by some key fields (CustId, AppId). I also have the columns AppDate, PDate and Amount. [image: Initial data]
What I need to do is aggregate, but for each key-field combination I need to aggregate the data from other rows that meet the following conditions:
CustId = CustId, i.e. take only information for this CustId
AppId != AppId, i.e. take only information for applications other than the current one
AppDate >= PDate, i.e. take only information available at the time of application
After a quick look at the SQL language, my approach was to use:
select CustId, AppId, Sum(case when
custid=custid and Appid!=Appid and AppDate >= PDate then Amount else 0 end) as SumAmount
From Table
Group by CustId, AppId
Unfortunately, every SumAmount I get is 0. My guess is that it is because of the last 2 conditions.
The results I want to get from the example table are: [image: Results]
I would also probably add a condition: if the current AppDate minus the AppDate of the other AppId is more than 6 months, exclude those rows from the aggregated amounts.
P.S. I am really sorry for the substandard formatting and probably bad code; I am not very experienced at this.
Edit: I've found a solution as follows:
select distinct a.CustId, a.AppId, a.AppDate, b.PDate, b.Amount
from table a
inner join (select CustId, AppId, Amount, PDate from Table) b
on a.CustId = b.CustId and a.AppId != b.AppId
where a.AppDate >= b.PDate
After that I aggregate by AppId summing the amount.
Basically, I join the table to itself on a condition, and since I get a lot of full duplicates, I deduplicate with DISTINCT.
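The first attempt returns 0 because Appid != Appid compares each row with itself, which is never true; looking at other rows needs a join, as in the solution above. The self-join and the DISTINCT step can also be folded into one aggregating query. The sketch below uses Python's sqlite3 as a stand-in for Oracle (WITH, COALESCE and LEFT JOIN exist in Oracle too); the table name applications and the rows are hypothetical, and it assumes every row of an AppId carries the same AppDate. Reducing the table to one row per application before the join avoids the duplicates, whereas a DISTINCT over the joined rows would also collapse two different payments that happen to share a date and amount. The 6-month rule is not included.

```python
import sqlite3

# Hypothetical data; application A has two payment rows, which is what
# makes a plain self-join produce duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applications (CustId INTEGER, AppId TEXT, AppDate TEXT, PDate TEXT, Amount INTEGER);
INSERT INTO applications VALUES
  (1, 'A', '2020-01-10', '2020-01-01', 100),
  (1, 'A', '2020-01-10', '2020-01-05', 50),
  (1, 'B', '2020-03-01', '2020-02-01', 30),
  (2, 'C', '2020-05-01', '2020-04-01', 70);
""")

rows = conn.execute("""
    WITH apps AS (
        -- one row per application, so each application is counted once
        SELECT DISTINCT CustId, AppId, AppDate FROM applications
    )
    SELECT a.CustId, a.AppId, COALESCE(SUM(b.Amount), 0) AS SumAmount
    FROM apps a
    LEFT JOIN applications b
      ON  b.CustId = a.CustId      -- same customer
      AND b.AppId <> a.AppId       -- a different application
      AND b.PDate <= a.AppDate     -- information available at application time
    GROUP BY a.CustId, a.AppId
    ORDER BY a.CustId, a.AppId
""").fetchall()
print(rows)  # [(1, 'A', 0), (1, 'B', 150), (2, 'C', 0)]
```

The conditions sit in the ON clause rather than in WHERE, so applications with no qualifying rows still appear with 0.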

1407. Top Travellers ending in runtime error

I attempted LeetCode problem 1407, Top Travellers, but my Oracle query below fails with a 'Runtime error'. I'm a little too tired to understand why. Any idea where I am going wrong? I've been rusty with SQL of late. :(
select name as name,
case when rides.distance is null then 0 else sum(rides.distance) end as travelled_distance
from users
left join rides
on users.id = rides.user_id
group by rides.users_id
order by travelled_distance desc, name;
As commented, it's the other way round (SUM goes around the CASE, not inside it):
select
name,
sum(case when rides.distance is null then 0 else rides.distance end) as travelled_distance
from users left join rides on users.id = rides.user_id
group by name
order by travelled_distance desc, name;
Or, simpler, use the nvl function:
select
name,
sum(nvl(rides.distance, 0)) as travelled_distance
from ...
A few more remarks, though:
you should use table aliases (they simplify the query and improve readability)
moreover, you should prefix all column names with table aliases; in your case, you failed to do so for the name column. It probably belongs to the users table, but we can't tell for sure, as we have neither your data model nor access to your database
the GROUP BY clause must contain the column(s) that aren't aggregated. In your query, that's the name column. You can put rides.users_id into that clause, but you must put name there (note that the join uses rides.user_id, so rides.users_id is probably a typo as well)
The solution below works. Thanks to one of the Discussion posts on LeetCode, I was able to figure out the issue:
select r.name,
case when x.td is null
then 0
else x.td
end travelled_distance
from Users r
left join
(
select user_id, sum(distance) td
from Rides
group by user_id
) x
on r.id = x.user_id
order by travelled_distance desc, r.name;
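Grouping by name alone has a subtle catch: users who share a name are merged into a single row, which is one reason to aggregate rides per user_id before joining, as the working solution does. A minimal sketch with Python's sqlite3 standing in for Oracle; the data is made up, and COALESCE (also available in Oracle) replaces NVL and the CASE expression.

```python
import sqlite3

# Hypothetical data: users 1 and 3 are different people both named "Alice".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Rides (id INTEGER PRIMARY KEY, user_id INTEGER, distance INTEGER);
INSERT INTO Users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Alice'), (4, 'Carol');
INSERT INTO Rides VALUES (1, 1, 120), (2, 2, 50), (3, 3, 80), (4, 1, 30);
""")

# Grouping by name alone merges the two Alices into one row.
by_name = conn.execute("""
    SELECT u.name, SUM(COALESCE(r.distance, 0)) AS travelled_distance
    FROM Users u
    LEFT JOIN Rides r ON u.id = r.user_id
    GROUP BY u.name
    ORDER BY travelled_distance DESC, u.name
""").fetchall()

# Pre-aggregating per user_id keeps them apart; users without rides get 0.
by_user = conn.execute("""
    SELECT u.name, COALESCE(x.td, 0) AS travelled_distance
    FROM Users u
    LEFT JOIN (
        SELECT user_id, SUM(distance) AS td
        FROM Rides
        GROUP BY user_id
    ) x ON u.id = x.user_id
    ORDER BY travelled_distance DESC, u.name
""").fetchall()

print(by_name)  # [('Alice', 230), ('Bob', 50), ('Carol', 0)]
print(by_user)  # [('Alice', 150), ('Alice', 80), ('Bob', 50), ('Carol', 0)]
```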

TSQL "where ... group by ..." issue that needs solution like "having ..."

I have 3 sub-tables of different formats combined with UNIONs into full-table (if that affects anything). It has the columns "location", "amount" and "time". Then, to keep generality for my later needs, I UNION full-table with location-table, which has all possible "location" values with the other fields null, to get master-table.
I query master-table:
select location, sum(amount)
from master-table
where (time...)
group by location
However, some "location" values are dropped because sum(amount) is 0 for those "location"s, but I really need the full list of "location"s for my further steps.
An alternative would be a HAVING clause, but as I understand it, HAVING is impossible here because I filter on "time" while grouping on "location", and I would need to add "time" to the grouping, which defeats the purpose. Keep in mind that the goal here is to get sum(amount) for each "location":
select location, sum(amount)
from master-table
group by location, time
having (time...)
To illustrate the output, with the first query I get:
loc1, 5
loc3, 10
loc6, 1
but I want to get:
loc1, 5
loc2, 0
loc3, 10
loc4, 0
loc5, 0
loc6, 1
Any suggestions on what can be done with this structure of master-table? An alternative solution, which I have no idea how to code, would be to add the numbers from the first query's result to location-table (as a query, not an actual table) in the final result query I've posted above.
What you want requires a complete list of locations, then a left outer join between that list and your calculated values, and ISNULL (in T-SQL) to ensure you see the 0s you expect. You can do this with some CTEs, which I find valuable for clarity during development, or you can work on "putting it all together" in a more traditional SELECT...FROM... statement. The CTE approach might look like this:
WITH loc AS (
SELECT DISTINCT LocationID
FROM location_table
), summary_data as (
SELECT LocationID, SUM(amount) AS location_sum
FROM master-table
WHERE (time ...) -- your original time filter
GROUP BY LocationID
)
SELECT loc.LocationID, IsNull(location_sum,0) AS location_sum
FROM loc
LEFT OUTER JOIN summary_data ON loc.LocationID = summary_data.LocationID
See if that gets you a step or two closer to the results you're looking for.
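A runnable version of the CTE approach, using Python's sqlite3 as a stand-in for SQL Server: COALESCE plays the role of ISNULL, the table is named master_table (a hyphen is not valid in an unquoted identifier), the column is called location rather than LocationID, and the time < '16:00' filter is borrowed from the other answer's example condition. The data is made up to reproduce the question's output.

```python
import sqlite3

# Hypothetical data: rows with NULL amount/time stand in for the UNIONed
# location-table rows, so every location exists in master_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_table (location TEXT, amount INTEGER, time TEXT);
INSERT INTO master_table VALUES
  ('loc1', 5, '10:00'), ('loc2', 7, '17:00'), ('loc3', 10, '09:00'),
  ('loc4', NULL, NULL), ('loc5', NULL, NULL), ('loc6', 1, '12:00');
""")

rows = conn.execute("""
    WITH loc AS (
        SELECT DISTINCT location FROM master_table
    ), summary_data AS (
        SELECT location, SUM(amount) AS location_sum
        FROM master_table
        WHERE time < '16:00'          -- the filter only affects the sums
        GROUP BY location
    )
    SELECT loc.location, COALESCE(summary_data.location_sum, 0) AS location_sum
    FROM loc
    LEFT JOIN summary_data ON loc.location = summary_data.location
    ORDER BY loc.location
""").fetchall()
print(rows)
# [('loc1', 5), ('loc2', 0), ('loc3', 10), ('loc4', 0), ('loc5', 0), ('loc6', 1)]
```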
I can think of 2 options:
You could move the WHERE to a CASE WHEN construction:
-- Option 1
select
location,
sum(CASE WHEN time <'16:00' THEN amount ELSE 0 END)
from master_table
group by location
Or you could JOIN with the possible values of location (my first RIGHT JOIN in a very long time 😉):
-- Option 2
select
x.location,
sum(CASE WHEN m.time <'16:00' THEN m.amount ELSE 0 END)
from master_table m
right join (select distinct location from master_table) x ON x.location = m.location
group by x.location
The version using T-SQL without CTEs would be:
SELECT l.location ,
ISNULL(m.location_sum, 0) as location_sum
FROM (SELECT DISTINCT location FROM master-table) l
LEFT JOIN (
SELECT location,
SUM(amount) as location_sum
FROM master-table
WHERE (time ... )
GROUP BY location
) m ON l.location = m.location
This assumes you still have your initial UNION in place, which ensures that master-table includes all possible locations.
It is the WHERE clause that excludes some locations. To retain every location, you could use "conditional aggregation" instead of the WHERE clause, e.g.:
select location, sum(case when (time...) then amount else 0 end) as location_sum
from master-table
group by location
i.e. instead of excluding rows from the result, put the conditions you would have used in the WHERE clause inside the SUM function. Where those conditions are true, the amount is aggregated; where they are false (or unknown, e.g. for a NULL time), 0 is summed instead, and the location is retained in the result.
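The difference between filtering in WHERE and conditional aggregation can be checked with a small sketch in Python's sqlite3 (standing in for SQL Server). The table name master_table, the rows and the time < '16:00' condition are illustrative assumptions; the NULL rows mimic the UNIONed location-table.

```python
import sqlite3

# Hypothetical data reproducing the question's expected output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_table (location TEXT, amount INTEGER, time TEXT);
INSERT INTO master_table VALUES
  ('loc1', 5, '10:00'), ('loc2', 7, '17:00'), ('loc3', 10, '09:00'),
  ('loc4', NULL, NULL), ('loc5', NULL, NULL), ('loc6', 1, '12:00');
""")

# Filtering in WHERE drops locations that have no qualifying rows.
with_where = conn.execute("""
    SELECT location, SUM(amount) AS location_sum
    FROM master_table
    WHERE time < '16:00'
    GROUP BY location
    ORDER BY location
""").fetchall()

# Conditional aggregation: every row is kept, non-matching rows add 0.
conditional = conn.execute("""
    SELECT location,
           SUM(CASE WHEN time < '16:00' THEN amount ELSE 0 END) AS location_sum
    FROM master_table
    GROUP BY location
    ORDER BY location
""").fetchall()

print(with_where)   # [('loc1', 5), ('loc3', 10), ('loc6', 1)]
print(conditional)  # [('loc1', 5), ('loc2', 0), ('loc3', 10), ('loc4', 0), ('loc5', 0), ('loc6', 1)]
```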

Clean up 'duplicate' data while preserving most recent entry

I want to display each crew member, their basic info, and the most recent start date from their contracts. My basic query returns a row for each contract, duplicating the basic info next to each contract's start and end dates.
I only need one row per person, with the latest start date (or null if they have never yet had a start date).
I have a limited understanding of GROUP BY and partition functions. The queries I have reverse-engineered for similar data use PARTITION BY and create temp tables that they then select from. I could ultimately reuse that, but it seems more convoluted than what we need.
select
Case when P01.EMPLOYMENTENDDATE < getdate() then 'Y'
else ''
end as "Deactivate",
concat(p01.FIRSTNAME,' ',p01.MIDDLENAME) as "First and Middle",
p01.LASTNAME,
p01.PIN,
(select top 1 TELENO FROM PW001P0T WHERE PIN = P01.PIN and TELETYPE = 6 ORDER BY TELEPRIORITY) as "EmailAddress",
org.NAME AS Vessel,
case
WHEN c02.CODECATEGORY= '20' then 'MARINE'
WHEN c02.CODECATEGORY= '10' then 'MARINE'
ELSE 'HOTEL' end as "Department",
c02.name as RankName,
c02.Alternative RankCode,
convert(varchar, ACT.DATEFROM,101) EmbarkDate,
convert(varchar,(case when ACT.DATEFROM is null then p03.TODATEESTIMATED else ACT.DATEFROM end),101) DebarkDate
FROM PW001P01 p01
JOIN PW001P03 p03
ON p03.PIN = p01.PIN
LEFT JOIN PW001C02 c02
ON c02.CODE = p03.RANK
/*LEFT JOIN PW001C02 CCIRankTbl
ON CCIRankTbl.CODE = p01.RANK*/
LEFT JOIN PWORG org
ON org.NUMORGID = dbo.ad_scanorgtree(p03.NUMORGID, 3)
LEFT JOIN PWORGVESACT ACT
ON ACT.numorgid=dbo.ad_scanorgtree(p03.numorgid,3)
where P01.EMPLOYMENTENDDATE > getdate()-10 or P01.EMPLOYMENTENDDATE is null
I only need to show one row per person. The first 5 columns are always the same; the remaining columns depend on the contract, and we only need data from the most recent one.
<table><tbody><tr><th>Deactivate</th><th>First and Middle</th><th>Lastname</th><th>PIN</th><th>Email</th><th>Vessel</th><th>Department</th><th>Rank</th><th>RankCode</th><th>Embark</th><th>Debark</th></tr><tr><td> </td><td>Martin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship1</td><td>Marine</td><td>ViceCaptain</td><td>VICE</td><td>9/1/2008</td><td>9/20/2008</td></tr><tr><td> </td><td>Matin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship2</td><td>Marine</td><td>Captain</td><td>CAP</td><td>12/1/2008</td><td>12/20/2008</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship1</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>5/1/2009</td><td>8/1/2009</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship3</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>10/1/2010</td><td>12/20/2010</td></tr></tbody></table>
Change the main query to SELECT DISTINCT and use a sub-select for the DebarkDate column:
(SELECT TOP 1 A.DATEFROM FROM PWORGVESACT A WHERE A.numorgid = ACT.numorgid ORDER BY A.DATEFROM DESC) AS DebarkDate
You can apply whatever date conversions you need to the result of that sub-query.
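A different technique from the DISTINCT plus sub-select answer: since the question already mentions partition functions, ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) can number each person's contracts from newest to oldest, and a LEFT JOIN on row 1 keeps exactly one row per person, with NULLs for someone who never had a contract. The sketch uses Python's sqlite3 as a stand-in for SQL Server, which supports the same window-function syntax. The tables persons and contracts and their columns are simplified, hypothetical stand-ins for PW001P01 and the contract/activity tables; the rows are adapted from the sample table above, plus one made-up person without contracts.

```python
import sqlite3

# Hypothetical, simplified tables (not the real PW001P01 / PWORGVESACT schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (pin INTEGER, last_name TEXT);
CREATE TABLE contracts (pin INTEGER, vessel TEXT, rank_name TEXT, date_from TEXT);
INSERT INTO persons VALUES (123, 'Smith'), (555, 'Doe'), (98765, 'Dude');
INSERT INTO contracts VALUES
  (123,   'Ship1', 'ViceCaptain', '2008-09-01'),
  (123,   'Ship2', 'Captain',     '2008-12-01'),
  (98765, 'Ship1', 'Chef',        '2009-05-01'),
  (98765, 'Ship3', 'Chef',        '2010-10-01');
""")

rows = conn.execute("""
    WITH ranked AS (
        SELECT c.*,
               -- rn = 1 marks each person's most recent contract
               ROW_NUMBER() OVER (PARTITION BY c.pin ORDER BY c.date_from DESC) AS rn
        FROM contracts c
    )
    SELECT p.pin, p.last_name, r.vessel, r.rank_name, r.date_from
    FROM persons p
    LEFT JOIN ranked r ON r.pin = p.pin AND r.rn = 1   -- keeps people with no contract
    ORDER BY p.pin
""").fetchall()
for row in rows:
    print(row)
```

Because the ranking happens per person, the other per-contract columns (vessel, rank) come from the same newest contract rather than being mixed across contracts.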

Retrieve the total number of orders made and the number of orders for which payment has been done

Retrieve the total number of orders made and the number of orders for which payment has been made (delivered).
TABLE ORDER
------------------------------------------------------
ORDERID QUOTATIONID STATUS
----------------------------------------------------
Q1001 Q1002 Delivered
O1002 Q1006 Ordered
O1003 Q1003 Delivered
O1004 Q1006 Delivered
O1005 Q1002 Delivered
O1006 Q1008 Delivered
O1007 Q1009 Ordered
O1008 Q1013 Ordered
I'm unable to get the total number of orders, i.e. 8, with this query:
select count(orderid) as "TOTALORDERSCOUNT",count(Status) as "PAIDORDERSCOUNT"
from orders
where status ='Delivered'
The expected output is:
TOTALORDERSCOUNT PAIDORDERSCOUNT
8 5
I think you want conditional aggregation:
select count(*) as TOTALORDERSCOUNT,
sum(case when status = 'Delivered' then 1 else 0 end) as PAIDORDERSCOUNT
from orders;
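The question's sample data makes it easy to check the conditional aggregation above, and also why the original query returns 5 for both columns: its WHERE clause removes the 'Ordered' rows before either COUNT runs. A sketch with Python's sqlite3; the table is named orders, as in the question's query.

```python
import sqlite3

# The sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid TEXT, quotationid TEXT, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("Q1001", "Q1002", "Delivered"), ("O1002", "Q1006", "Ordered"),
    ("O1003", "Q1003", "Delivered"), ("O1004", "Q1006", "Delivered"),
    ("O1005", "Q1002", "Delivered"), ("O1006", "Q1008", "Delivered"),
    ("O1007", "Q1009", "Ordered"),   ("O1008", "Q1013", "Ordered"),
])

# Conditional aggregation: every row is counted, only 'Delivered' rows add 1.
both = conn.execute("""
    SELECT COUNT(*) AS TOTALORDERSCOUNT,
           SUM(CASE WHEN status = 'Delivered' THEN 1 ELSE 0 END) AS PAIDORDERSCOUNT
    FROM orders
""").fetchone()

# The question's query: WHERE filters rows before both COUNTs are computed.
filtered = conn.execute("""
    SELECT COUNT(orderid), COUNT(status) FROM orders WHERE status = 'Delivered'
""").fetchone()

print(both)      # (8, 5)
print(filtered)  # (5, 5)
```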
Try this:
SELECT COUNT(ORDERID) TOTALORDERSCOUNT,
SUM(CASE WHEN STATUS = 'Delivered' THEN 1 ELSE 0 END ) PAIDORDERSCOUNT
FROM ORDERS
You can also use COUNT in place of SUM, because COUNT ignores NULLs:
SELECT COUNT(ORDERID) TOTALORDERSCOUNT,
COUNT(CASE WHEN STATUS = 'Delivered' THEN 1 ELSE NULL END ) PAIDORDERSCOUNT
FROM ORDERS
You could use a CROSS JOIN between the two counts:
select count(orderid) as TOTALORDERSCOUNT, t.PAIDORDERSCOUNT
from orders
cross join (
select count(Status) PAIDORDERSCOUNT
from orders where Status ='Delivered'
) t
What I've used in the past for summarizing totals is:
SELECT
count(*) 'Total Orders',
sum( iif( orders.STATUS = 'Delivered', 1, 0 ) ) 'Total Paid Orders'
FROM orders
I personally don't like using CASE WHEN if I don't have to. This logic may look like a little too much for a simple summation of totals, but it makes it easy to add more conditions and involves less typing, at least for what I regularly use it for.
The iif() function sets up the condition on the STATUS column: if the status is 'Delivered', it yields 1 for that order; for any other value, whether 'Ordered', NULL, or a status such as 'Pending' that you might need one day, it yields 0, so the count stays accurate.
Nesting this within SUM then totals all the 1s from the matching rows. I use this method regularly for report queries that need many conditions narrowed down to a summed value. It also opens up a lot of options in case you need to join tables in your FROM clause.
Also, purely out of personal preference and depending on which SQL environment you're using, I tend to use AS for renaming only when absolutely necessary and instead just give the column name as a single-quoted string. It does the same thing; that's just personal preference.
As stated before, this may seem like it's doing too much, but for me, good SQL makes it easy to change conditions without rewriting an entire query.
EDIT: I forgot to mention that count(*) only gives the number of orders if the orderids are all unique. For an orders table, orderid is generally expected to be unique, but I wanted to add that as a side note.
SELECT COUNT(DISTINCT ORDERID) AS [TOTALORDERSCOUNT],
COUNT(CASE WHEN STATUS = 'Delivered' THEN ORDERID ELSE NULL END) AS [PAIDORDERCOUNT]
FROM ORDERS
TOTALORDERSCOUNT counts all distinct values in ORDERID, while the CASE expression in PAIDORDERCOUNT returns NULL (which COUNT ignores) for any row that does not have the desired status.