full outer join matching on multiple criteria

full outer join matching on multiple criteria - sql

Two tables I am trying to join:
tableA: this table contains all of the leads we won by BID and by SOURCE and by DATE.
Bid Amount = tableA.price, Source = tableA.lead_source_id, Date = tableA.time
tableB: this table contains all of the leads we lost by BID and by SOURCE and by DATE.
Bid Amount = tableB.cost, Source = tableB.lead_source_id, Date = tableB.bid_at
I would like to be able to return the number of bids we WON and LOST by BID and by SOURCE and by DATE. Often times there are only records in one table or the other (we won all bids or lost all bids), so it appears the outer join is needed.
Ideal output would be grouped by lead_source_id, bid amount, and time like such:
Lead Source ID, Date, Bid Amount, Won, Lost
1, 1/1/2015, $20, 5, 0
1, 1/1/2015, $25, 0, 9
5, 1/1/2015, $30, 1, 1
10, 1/2/2015, $50, 0, 1
10, 1/2/2015, $55, 1, 0

Try the following query.
;WITH tmp AS (
SELECT lead_source_id, [time] AS [Date], price, COUNT(*) AS won, 0 AS lost
FROM tableA
GROUP BY lead_source_id, [time], price
UNION ALL
SELECT lead_source_id, bid_at AS [Date], cost AS price, 0 AS won, COUNT(*) AS lost
FROM tableB
GROUP BY lead_source_id, bid_at, cost)
SELECT lead_source_id AS [Lead Source ID], [Date], price AS [Bid Amount], SUM(won) AS Won , SUM(lost) AS Lost
FROM tmp
GROUP BY lead_source_id, [Date], price
ORDER BY lead_source_id, [Date]
The common table expression (tmp) calculates number of won/lost bids and the outer select groups your data by id, date and bid amounts.

Related

Match a row with a date that is between 2 dates in another table

I am writing a query that calculates whether the correct booking fee was charged for all bookings. The structure of the tables are below. Please note that due to the environment I am working in I cannot create any stored procedures or functions to help me.
BOOKINGS
Date date,
CustomerId int,
BookingTypeId int,
FeeCharged money
FEES
BookingTypeId int,
FeeAmount money,
EffectiveFrom date
Given the example data set below how do I query the data to ensure the fee charged per booking matches the correct fee in the fees table based on the effective from date:

try this
WITH
Fees (BookingTypeId, FeeAmount, EffectiveFrom) AS (
SELECT 1, 50, '2019-06-01'
UNION ALL SELECT 1, 55, '2019-09-01'
),
Bookings (Date, CustomerId, BookingTypeId, FeeCharged) AS (
SELECT '2019-07-01', 1, 1, 50
UNION ALL SELECT '2019-07-02', 1, 1, 50
UNION ALL SELECT '2019-10-01', 1, 1, 150
),
BookingFeeUpdate as (
SELECT
Bookings.BookingTypeId,
Bookings.Date,
MAX(EffectiveFrom) LastFeeUpdate
FROM
Bookings
LEFT JOIN Fees ON
Fees.BookingTypeId = Bookings.BookingTypeId
AND Fees.EffectiveFrom <= Bookings.Date
GROUP BY
Bookings.BookingTypeId,
Bookings.Date
)
SELECT
Bookings.*, Fees.FeeAmount as CorrectFee, Fees.EffectiveFrom, case Fees.FeeAmount when FeeCharged then 1 else 0 end Correct
FROM
Bookings
INNER JOIN BookingFeeUpdate ON
BookingFeeUpdate.BookingTypeId = Bookings.BookingTypeId
AND BookingFeeUpdate.Date = Bookings.Date
LEFT JOIN Fees ON
Fees.BookingTypeId = Bookings.BookingTypeId
AND Fees.EffectiveFrom = BookingFeeUpdate.LastFeeUpdate
--WHERE Fees.FeeAmount <> Bookings.FeeCharged

Using a date field for matching SQL Query

I'm having a bit of an issue wrapping my head around the logic of this changing dimension. I would like to associate these two tables below. I need to match the Cost - Period fact table to the cost dimension based on the Id and the effective date.
As you can see - if the month and year field is greater than the effective date of its associated Cost dimension, it should adopt that value. Once a new Effective Date is entered into the dimension, it should use that value for any period greater than said date going forward.
EDIT: I apologize for the lack of detail but the Cost Dimension will actually have a unique Index value and the changing fields to reference for the matching would be Resource, Project, Cost. I tried to match the query you provided with my fields, but I'm getting the incorrect output.
FYI: Naming convention change: EngagementId is Id, Resource is ConsultantId, and Project is ProjectId
I've changed the images below and here is my query
,_cte(HoursWorked, HoursBilled, Month, Year, EngagementId, ConsultantId, ConsultantName, ProjectId, ProjectName, ProjectRetainer, RoleId, Role, Rate, ConsultantRetainer, Salary, amount, EffectiveDate)
as
(
select sum(t.Duration), 0, Month(t.StartDate), Year(t.StartDate), t.EngagementId, c.ConsultantId, c.ConsultantName, c.ProjectId, c.ProjectName, c.ProjectRetainer, c.RoleId, c.Role, c.Rate, c.ConsultantRetainer,
c.Salary, 0, c.EffectiveDate
from timesheet t
left join Engagement c on t.EngagementId = c.EngagementId and Month(c.EffectiveDate) = Month(t.EndDate) and Year(c.EffectiveDate) = Year(t.EndDate)
group by Month(t.StartDate), Year(t.StartDate), t.EngagementId, c.ConsultantName, c.ConsultantId, c.ProjectId, c.ProjectName, c.ProjectRetainer, c.RoleId, c.Role, c.Rate, c.ConsultantRetainer,
c.Salary, c.EffectiveDate
)
select * from _cte where EffectiveDate is not null
union
select _cte.HoursWorked, _cte.HoursBilled, _cte.Month, _cte.Year, _cte.EngagementId, _cte.ConsultantId, _cte.ConsultantName, _cte.ProjectId, _Cte.ProjectName, _cte.ProjectRetainer, _cte.RoleId, _cte.Role, sub.Rate, _cte.ConsultantRetainer,_cte.Salary, _cte.amount, sub.EffectiveDate
from _cte
outer apply (
select top 1 EffectiveDate, Rate
from Engagement e
where e.ConsultantId = _cte.ConsultantId and e.ProjectId = _cte.ProjectId and e.RoleId = _cte.RoleId
and Month(e.EffectiveDate) < _cte.Month and Year(e.EffectiveDate) < _cte.Year
order by EffectiveDate desc
) sub
where _cte.EffectiveDate is null
Example:
I'm struggling with writing the query that goes along with this. At first I attempted to partition by greatest date. However, when I executed the join I got the highest effective date for every single period (even those prior to the effective date).
Is this something that can be accomplished in a query or should I be focusing on incremental updates of the destination table so that any effective date / time period in the past is left alone?
Any tips would be great!
Thanks,
Channing

Try this one:
; with _CTE as(
select p.* , c.EffectiveDate, c.Cost
from period p
left join CostDimension c on p.id = c.id and p.Month = DATEPART(month, c.EffectiveDate) and p.year = DATEPART (year, EffectiveDate)
)
select * from _CTE Where EffectiveDate is not null
Union
select _CTE.id, _CTE.Month, _CTE.Year, sub.EffectiveDate, sub.Cost
from _CTE
outer apply (select top 1 EffectiveDate, Cost
from CostDimension as cd
where cd.Id = _CTE.id and cd.EffectiveDate < DATETIMEFROMPARTS(_CTE.Year, _CTE.Month, 1, 0, 0, 0, 0)
order by EffectiveDate desc
) sub
where _Cte.EffectiveDate is null

How can I add cumulative sum column?

I use SqlExpress
Following is the query using which I get the attached result.
SELECT ReceiptId, Date, Amount, Fine, [Transaction]
FROM (
SELECT ReceiptId, Date, Amount, 'DR' AS [Transaction]
FROM ReceiptCRDR
WHERE (Amount > 0)
UNION ALL
SELECT ReceiptId, Date, Amount, 'CR' AS [Transaction]
FROM ReceiptCR
WHERE (Amount > 0)
UNION ALL
SELECT strInvoiceNo AS ReceiptId, CONVERT(datetime, dtInvoiceDt, 103) AS Date, floatTotal AS Amount, 'DR' AS [Transaction]
FROM tblSellDetails
) AS t
ORDER BY Date
Result
want a new column which would show balance amount.
For example. 1 Row should show -2500, 2nd should -3900, 3rd should -700 and so on.
basically, it requires previous row' Account column's data and carry out calculation based on transaction type.
Sample Result

Well, that looks like SQL-Server , if you are using 2012+ , then use SUM() OVER() :
SELECT t.*,
SUM(CASE WHEN t.transactionType = 'DR'
THEN t.amount*-1
ELSE t.amount END)
OVER(PARTITION BY t.date ORDER BY t.receiptId,t.TransactionType DESC) as Cumulative_Col
FROM (YourQuery Here) t
This will SUM the value when its CR and the value*-1 when its DR
Right now I grouped by date, meaning each day will recalculate this column, if you want it for all time, replace the OVER() with this:
OVER(ORDER BY t.date,t.receiptId,t.TransactionType DESC) as Cumulative_Col
Also, I didn't understand why in the same date, for the same ReceiptId DR is calculated before CR , I've add it to the order by but if thats not what you want then explain the logic better.

Summing a column over a date range in a CTE?

I'm trying to sum a certain column over a certain date range. The kicker is that I want this to be a CTE, because I'll have to use it multiple times as part of a larger query. Since it's a CTE, it has to have the date column as well as the sum and ID columns, meaning I have to group by date AND ID. That will cause my results to be grouped by ID and date, giving me not a single sum over the date range, but a bunch of sums, one for each day.
To make it simple, say we have:
create table orders (
id int primary key,
itemID int foreign key references items.id,
datePlaced datetime,
salesRep int foreign key references salesReps.id,
price int,
amountShipped int);
Now, we want to get the total money a given sales rep made during a fiscal year, broken down by item. That is, ignoring the fiscal year bit:
select itemName, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName
Simple enough. But when you add anything else, even the price, the query spits out way more rows than you wanted.
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName, price
Now, each group is (name, price) instead of just (name). This is kind of sudocode, but in my database, just this change causes my result set to jump from 13 to 32 rows. Add to that the date range, and you really have a problem:
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between 150101 and 151231
group by itemName, price
This is identical to the last example. The trouble is making it a CTE:
with totals as (
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped, orderDate as startDate, orderDate as endDate
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between startDate and endDate
group by itemName, price, startDate, endDate
)
select totals_2015.itemName as itemName_2015, totals_2015.price as price_2015, ...
totals_2016.itemName as itemName_2016, ...
from (
select * from totals
where startDate = 150101 and endDate = 151231
) totals_2015
join (
select *
from totals
where startDate = 160101 and endDate = 160412
) totals_2016
on totals_2015.itemName = totals_2016.itemName
Now the grouping in the CTE is way off, more than adding the price made it. I've thought about breaking the price query into its own subquery inside the CTE, but I can't escape needing to group by the dates in order to get the date range. Can anyone see a way around this? I hope I've made things clear enough. This is running against an IBM iSeries machine. Thank you!

Depending on what you are looking for, this might be a better approach:
select 'by sales rep' breakdown
, salesRep
, '' year
, sum(price * amountShipped) amount
from etc
group by salesRep
union
select 'by sales rep and year' breakdown
, salesRep
, convert(char(4),orderDate, 120) year
, sum(price * amountShipped) amount
from etc
group by salesRep, convert(char(4),orderDate, 120)
etc

When possible group by the id columns or foreign keys because the columns are indexed already you'll get faster results. This applies to any database.
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from sommewhere
where date between x and y
group by id,rep
) select * from cte order by rep
or more fancy
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from sommewhere
where date between x and y
group by id,rep
) select * from cte join reps on cte.rep = reps.rep order by sls desc

I eventually found a solution, and it doesn't need a CTE at all. I wanted the CTE to avoid code duplication, but this works almost as well. Here's a thread explaining summing conditionally that does exactly what I was looking for.

How to do a group by without having to pass all the columns from the select?

I have the following select, whose goal is to select all customers who had no sales since the day X, and also bringing the date of the last sale and the number of the sale:
select s.customerId, s.saleId, max (s.date) from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'
This way it brings me the following:
19 | 300 | 26/09/2005
19 | 356 | 29/09/2005
27 | 842 | 10/05/2012
In another words, the first 2 lines are from the same customer (id 19), I wish to get only one record for each client, which would be the record with the max date, in the case, the second record from this list.
By that logic, I should take off s.saleId from the "group by" clause, but if I do, of course, I get the error:
Invalid expression in the select list (not contained in either an
aggregate function or the GROUP BY clause)
I'm using Firebird 1.5
How can I do this?

GROUP BY summarizes data by aggregating a group of rows, returning one row per group. You're using the aggregate function max(), which will return the maximum value from one column for a group of rows.
Let's look at some data. I renamed the column you called "date".
create table sales (
customerId integer not null,
saleId integer not null,
saledate date not null
);
insert into sales values
(1, 10, '2013-05-13'),
(1, 11, '2013-05-14'),
(1, 12, '2013-05-14'),
(1, 13, '2013-05-17'),
(2, 20, '2013-05-11'),
(2, 21, '2013-05-16'),
(2, 31, '2013-05-17'),
(2, 32, '2013-03-01'),
(3, 33, '2013-05-14'),
(3, 35, '2013-05-14');
You said
In another words, the first 2 lines are from the same customer(id 19), i wish he'd get only one record for each client, which would be the record with the max date, in the case, the second record from this list.
select s.customerId, max (s.saledate)
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;
customerId max
--
1 2013-05-14
2 2013-05-16
3 2013-05-14
What does that table mean? It means that the latest date on or before May 16 on which customer "1" bought something was May 14; the latest date on or before May 16 on which customer "2" bought something was May 16. If you use this derived table in joins, it will return predictable results with consistent meaning.
Now let's look at a slightly different query. MySQL permits this syntax, and returns the result set below.
select s.customerId, s.saleId, max(s.saledate) max_sale
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;
customerId saleId max_sale
--
1 10 2013-05-14
2 20 2013-05-16
3 33 2013-05-14
The sale with ID "10" didn't happen on May 14; it happened on May 13. This query has produced a falsehood. Joining this derived table with the table of sales transactions will compound the error.
That's why Firebird correctly raises an error. The solution is to drop saleId from the SELECT clause.
Now, having said all that, you can find the customers who have had no sales since May 16 like this.
select distinct customerId from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')
And you can get the right customerId and the "right" saleId like this. (I say "right" saleId, because there could be more than one on the day in question. I just chose the max.)
select sales.customerId, sales.saledate, max(saleId)
from sales
inner join (select customerId, max(saledate) max_date
from sales
where saledate < '2013-05-16'
group by customerId) max_dates
on sales.customerId = max_dates.customerId
and sales.saledate = max_dates.max_date
inner join (select distinct customerId
from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')) no_sales
on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate
Personally, I find common table expressions make it easier for me to read SQL statements like that without getting lost in the SELECTs.
with no_sales as (
select distinct customerId
from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')
),
max_dates as (
select customerId, max(saledate) max_date
from sales
where saledate < '2013-05-16'
group by customerId
)
select sales.customerId, sales.saledate, max(saleId)
from sales
inner join max_dates
on sales.customerId = max_dates.customerId
and sales.saledate = max_dates.max_date
inner join no_sales
on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate

then you can use following query ..
EDIT changes made after comment by likeitlikeit for only one row per CustomerID even when we will have one case where we have multiple saleID for customer with certain condition -
select x.customerID, max(x.saleID), max(x.x_date) from (
select s.customerId, s.saleId, max (s.date) x_date from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'
and max(s.date) = ( select max(s1.date)
from sales s1
where s1.customeId = s.customerId))x
group by x.customerID

You can Try Maxing the s.saleId (Max(s.saleId)) and removing it from the Group By clause

A subquery should do the job, I can't test it right now but it seems ok:
SELECT s.customerId, s.saleId, subq.maxdate
FROM sales AS s
INNER JOIN (SELECT customerId, MAX(date) AS maxdate
FROM sales
GROUP BY customerId, saleId
HAVING MAX(s.date) <= '05-16-2013'
) AS subq
ON s.customerId = subq.customerId AND s.date = subq.maxdate

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas