Select aggregate ignores where clause - sql

I'm trying to transform an existing view into a format I can work with.
The view vw_temp_appHoursLastTwoEntries looks like this:
RowNumber | PersNr | Client | Localtion | Agent | Date | Calweek | Year
----------+--------+--------+-----------+-------+------------+---------+------
1 | 123 | 1 | 1 | ag-01 | 2020-01-01 | 1 | 2021
2 | 123 | 1 | 1 | ag-01 | 2020-01-03 | 1 | 2021
1 | 9999 | 1 | 4 | ag-01 | 2020-01-01 | 1 | 2021
2 | 9999 | 1 | 4 | ag-01 | 2020-01-07 | 1 | 2021
I need this data in a different format that would look like this:
PersNr | Client | Localtion | Agent | minDate | MaxDate | DateDiff | Calweek | Year
-------+--------+-----------+-------+------------+------------+----------+---------+-------
123 | 1 | 1 | ag-01 | 2020-01-01 | 2020-01-03 | 3 | 1 | 2021
9999 | 1 | 4 | ag-01 | 2020-01-01 | 2020-01-07 | 7 | 1 | 2021
In the original format, each person has exactly two rows (RowNumber 1 and 2). I'd like to match on each column and get the min and max date, as well as the difference between them, in a new view.
My code:
select a.persnr, a.client, a.location, a.agent, a.calweek, a.year,
max(a.date) as maxdate, min(b.date) as mindate
, DATEDIFF(day,a.date,b.date) as dDiff
from vw_temp_appHoursLastTwoEntries a
left join vw_temp_appHoursLastTwoEntries b on
a.persnr = b.persnr and a.client = b.client and
a.agent = b.agent and a.date = b.date
where a.date != b.date and DATEDIFF(day,a.date,b.date) != 0
or (a.date is not null and b.date is not null)
group by a.persnr, a.client, a.location, a.agent, a.calweek, a.year, DATEDIFF(day,a.date,b.date)
The issue:
I'm currently getting back values where it seems like the where clause does not take effect, but I don't understand why.
a.date != b.date should not return rows where the min and max dates are the same. The datediff does not return any value other than 0, even when the min and max dates are different.

Pretty sure this is what you want:
declare @Test table (RowNumber int, PersNr int, Client int, Localtion int, Agent varchar(5), [Date] date, Calweek int, [Year] int);
insert into @Test (RowNumber, PersNr, Client, Localtion, Agent, [Date], Calweek, [Year])
values
(1, 123, 1, 1, 'ag-01', '2020-01-01', 1, 2021),
(2, 123, 1, 1, 'ag-01', '2020-01-03', 1, 2021),
(1, 9999, 1, 4, 'ag-01', '2020-01-01', 1, 2021),
(2, 9999, 1, 4, 'ag-01', '2020-01-07', 1, 2021);
select a.PersNr, a.Client, a.Localtion, a.Agent, a.Calweek, a.[Year]
, max(a.[date]) as maxdate
, min(b.[date]) as mindate
, abs(datediff(day,a.[date],b.[date])) as dDiff
from @Test a
left join @Test b on
a.persnr = b.persnr and a.client = b.client and
a.agent = b.agent --and a.[date] = b.[date]
where (/*a.[date] != b.[date] and*/ datediff(day,a.[date],b.[date]) != 0)
and /* not OR */ (a.[date] is not null and b.[date] is not null)
group by a.persnr, a.client, a.Localtion, a.agent, a.calweek, a.[Year], abs(datediff(day,a.[date],b.[date]));
Returns:
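PersNr | Client | Localtion | Agent | Calweek | Year | maxdate    | mindate    | dDiff
-------+--------+-----------+-------+---------+------+------------+------------+------
123    | 1      | 1         | ag-01 | 1       | 2021 | 2020-01-03 | 2020-01-01 | 2
9999   | 1      | 4         | ag-01 | 1       | 2021 | 2020-01-07 | 2020-01-01 | 6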
As Giorgos points out, you don't want to join on a.[date] = b.[date] because your where clause specifically filters that condition out.
The main issue was using OR instead of AND, you want to ensure that both date values are not null so that is an AND condition.
I am also assuming that dDiff is for debugging purposes only, which as you have it kept the rows from grouping, but you can group them by using the absolute value (abs).
You also don't need to test a.[date] != b.[date] because that is already true by virtue of datediff(day,a.[date],b.[date]) != 0.
Please use this form of DDL+DML (or a temp table) in future to provide sample data for us to work with (it gives you a minimal reproducible example also which is never a bad thing, because I picked up a number of typos in your query while copying it).
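Since each person has only two rows, the same result can probably be had without a self-join at all, by grouping once and taking MIN and MAX. The following is a minimal sketch of that idea, not the answer's method: it runs against SQLite through Python's sqlite3 module so it is self-contained, the table name entries is made up for the example, and julianday() stands in for SQL Server's DATEDIFF(day, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "entries" is a hypothetical stand-in for vw_temp_appHoursLastTwoEntries.
conn.execute("""
    CREATE TABLE entries (
        RowNumber INTEGER, PersNr INTEGER, Client INTEGER, Localtion INTEGER,
        Agent TEXT, "Date" TEXT, Calweek INTEGER, "Year" INTEGER
    )
""")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 123, 1, 1, "ag-01", "2020-01-01", 1, 2021),
        (2, 123, 1, 1, "ag-01", "2020-01-03", 1, 2021),
        (1, 9999, 1, 4, "ag-01", "2020-01-01", 1, 2021),
        (2, 9999, 1, 4, "ag-01", "2020-01-07", 1, 2021),
    ],
)

# One pass over the data: group by the identifying columns and aggregate the dates.
# julianday() is SQLite-specific; in SQL Server this would be
# DATEDIFF(day, MIN([Date]), MAX([Date])).
rows = conn.execute("""
    SELECT PersNr, Client, Localtion, Agent,
           MIN("Date") AS minDate,
           MAX("Date") AS maxDate,
           CAST(julianday(MAX("Date")) - julianday(MIN("Date")) AS INTEGER) AS dDiff,
           Calweek, "Year"
    FROM entries
    GROUP BY PersNr, Client, Localtion, Agent, Calweek, "Year"
    ORDER BY PersNr
""").fetchall()

for row in rows:
    print(row)
```

This yields one row per person with a difference of 2 and 6 days, matching the output above.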

Related

How to avoid duplicates when finding row changes?

I am working to extract data when a column changes between user IDs in a single table. I am able to pull the change as well as the previous row (ID) using a Select + Union query. For the previous row, I am getting more than one row due to the parameters. I'm looking for suggestions on how to retrieve only a single previous row (ID). The query below tries to retrieve a single row:
| ID | Year | Event | ActivityDate | UserID
| 1 | 2020 | A | 2020-12-01 | xxx
| 1 | 2021 | A | 2021-03-01 | xyz
| 2 | 2020 | B | 2021-01-01 | xxx
| 1 | 2022 | C | 2021-10-01 | yyy
| 3 | 2021 | D | 2021-12-01 | xxx
Select d.ID, d.Year, d.Event, d.ActivityDate, d.UserID
from tableA d
where
d.year in ('2020','2021','2022')
and d.event <>
(select f.event
from tableA f
where
f.year in ('2020','2021','2022')
and d.id = f.id
and d.activityDate < f.activityDate
order by f.activityDate desc
fetch first 1 row only
)
;
I was hoping to retrieve the following
1, 2021, A, 2021-03-01, xyz
But I got
1, 2020, a, 2020-12-01, xxx
1, 2021, a, 2021-03-01, xyz
I think analytic functions will help you get to your answer.
The row_number() will get you the last row in a series of duplicates.
The count(id) will allow you to limit yourself to combinations that have more than one row.
WITH
aset
AS
(SELECT d.id
, d.year
, d.event
, d.activitydate
, d.userid
, ROW_NUMBER ()
OVER (PARTITION BY id, event ORDER BY year DESC) AS rn
, COUNT (id) OVER (PARTITION BY id, event) AS n
FROM tablea d)
SELECT *
FROM aset
WHERE rn = 1 AND n > 1;
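To check the analytic-function approach against the sample rows from the question, here is a small sketch that runs the same query on SQLite via Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the final SELECT lists the original columns explicitly instead of using SELECT *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tablea (
        id INTEGER, year INTEGER, event TEXT, activitydate TEXT, userid TEXT
    )
""")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2020, "A", "2020-12-01", "xxx"),
        (1, 2021, "A", "2021-03-01", "xyz"),
        (2, 2020, "B", "2021-01-01", "xxx"),
        (1, 2022, "C", "2021-10-01", "yyy"),
        (3, 2021, "D", "2021-12-01", "xxx"),
    ],
)

# rn = 1 picks the latest year within each (id, event) group;
# n > 1 keeps only groups that actually contain repeated events.
rows = conn.execute("""
    WITH aset AS (
        SELECT d.id, d.year, d.event, d.activitydate, d.userid,
               ROW_NUMBER() OVER (PARTITION BY id, event ORDER BY year DESC) AS rn,
               COUNT(id)    OVER (PARTITION BY id, event) AS n
        FROM tablea d
    )
    SELECT id, year, event, activitydate, userid
    FROM aset
    WHERE rn = 1 AND n > 1
""").fetchall()

print(rows)
```

Only the (1, A) group has more than one row, so the single row for 2021 comes back, which is the row the question was hoping for.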

Joining two tables in SQL to get the SUM between two dates

I'm new to SQL and this website so apologies if anything is unclear.
Basically, I got two separate tables:
Table A:
CustomerID | PromoStart | PromoEnd
1 | 2020-05-01 | 2020-05-30
2 | 2020-06-01 | 2020-07-30
3 | 2020-07-01 | 2020-10-15
Table B:
CustomerID | Date | Payment |
1 | 2020-02-15 | 5000 |
1 | 2020-05-04 | 200 |
1 | 2020-05-28 | 100 |
1 | 2020-06-05 | 1000 |
2 | 2020-06-10 | 20 |
2 | 2020-07-25 | 500 |
2 | 2020-08-02 | 1000 |
3 | 2020-09-05 | 580 |
3 | 2020-12-01 | 20 |
What I want is to get the sum of all payments that fall between PromoStart and PromoEnd for each customer.
so the desired result would be :
CustomerID | TotalPayments
1 | 300
2 | 520
3 | 580
I guess this would involve an inner (left?) join and a where clause however I just can't figure it out.
A LATERAL join would do it:
SELECT a.customer_id, b.total_payments
FROM table_a a
LEFT JOIN LATERAL (
SELECT sum(payment) AS total_payments
FROM table_b
WHERE customer_id = a.customer_id
AND date BETWEEN a.promo_start AND a.promo_end
) b ON true;
This assumes inclusive lower and upper bounds, and that you want to include all rows from table_a, even without any payments in table_b.
You can use a correlated subquery or join with aggregation. The correlated subquery looks like:
select a.*,
(select sum(b.payment)
from b
where b.customerid = a.customerid and
b.date >= a.promostart and
b.date <= a.promoend
) as totalpayments
from a;
You don't mention your database, but this can take advantage of an index on b(customerid, date, payment). By avoiding the outer aggregation, this would often have better performance than an alternative using group by.
I hope I didn't overlook something important, but it seems to me that a simple join on a range-matching condition should be sufficient:
with a (CustomerID , PromoStart , PromoEnd) as (values
(1 , date '2020-05-01' , date '2020-05-30'),
(2 , date '2020-06-01' , date '2020-07-30'),
(3 , date '2020-07-01' , date '2020-10-15')
), b (CustomerID , d , Payment ) as (values
(1 , date '2020-02-15' , 5000 ),
(1 , date '2020-05-04' , 200 ),
(1 , date '2020-05-28' , 100 ),
(1 , date '2020-06-05' , 1000 ),
(2 , date '2020-06-10' , 20 ),
(2 , date '2020-07-25' , 500 ),
(2 , date '2020-08-02' , 1000 ),
(3 , date '2020-09-05' , 580 ),
(3 , date '2020-12-01' , 20 )
)
select a.CustomerID, sum(b.Payment)
from a
join b on a.CustomerID = b.CustomerID and b.d between a.PromoStart and a.PromoEnd
group by a.CustomerID
Db fiddle here.
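One difference between the answers above is what happens to a customer with no payments inside the promo window: the inner join drops that customer, while the LATERAL and correlated-subquery versions keep it with a NULL total. As a sketch of a third variant, a LEFT JOIN with the range condition in the ON clause plus COALESCE keeps such customers with a total of 0. The example runs on SQLite via Python's sqlite3; the column name PayDate and customer 4 are made up for illustration and are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (CustomerID INTEGER, PromoStart TEXT, PromoEnd TEXT)")
conn.execute("CREATE TABLE b (CustomerID INTEGER, PayDate TEXT, Payment INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        (1, "2020-05-01", "2020-05-30"),
        (2, "2020-06-01", "2020-07-30"),
        (3, "2020-07-01", "2020-10-15"),
        (4, "2020-01-01", "2020-01-31"),  # hypothetical customer with no payments
    ],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?)",
    [
        (1, "2020-02-15", 5000), (1, "2020-05-04", 200), (1, "2020-05-28", 100),
        (1, "2020-06-05", 1000), (2, "2020-06-10", 20), (2, "2020-07-25", 500),
        (2, "2020-08-02", 1000), (3, "2020-09-05", 580), (3, "2020-12-01", 20),
    ],
)

# The date range lives in the ON clause, so customers without a matching
# payment survive the join; COALESCE turns their NULL sum into 0.
rows = conn.execute("""
    SELECT a.CustomerID, COALESCE(SUM(b.Payment), 0) AS TotalPayments
    FROM a
    LEFT JOIN b
           ON b.CustomerID = a.CustomerID
          AND b.PayDate BETWEEN a.PromoStart AND a.PromoEnd
    GROUP BY a.CustomerID
    ORDER BY a.CustomerID
""").fetchall()

print(rows)
```

ISO-formatted date strings compare correctly as text in SQLite, which is why BETWEEN works here without a date type.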

Do I need a CASE expression?

I have the following query, which returns an organization's total wages for the prior year (relative to the current year, so 2018).
SELECT
organization_id,
CASE
WHEN organization_id IN (SELECT text_1
FROM combo_table_detail
WHERE combo_table_id = 'wageAdjustment')
THEN SUM(ISNULL(component_value, 0)) + ISNULL(ctd.number_2, 0)
ELSE SUM(ISNULL(component_value, 0))
END AS "total_annual_wage",
MAX((begin_date)) AS "total_annual_wage_eff_date"
FROM
actual_pay_hours aph
LEFT JOIN
combo_table_detail ctd ON aph.organization_id = ctd.text_1
AND combo_table_id = 'wageAdjustment'
WHERE
organization_id = 'Org1'
AND component_name IN ('earnDef', 'earnings')
AND begin_date >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) - 1, 0)
AND begin_date < DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
GROUP BY
organization_id, ctd.number_2
However, I've run across an issue where some organizations either don't have a prior year (some only have 2019 wages), or their latest wages are from 2014. In both cases, the query returns blank values. This is due to the line
AND begin_date >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) - 1, 0)
AND begin_date < DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
The expected result should look something like this:
+-----------------+-------------------+----------------------------+
| organization_id | total_annual_wage | total_annual_wage_eff_date |
+-----------------+-------------------+----------------------------+
| Org1 | 50000 | 12/1/2018 |
+-----------------+-------------------+----------------------------+
But instead, it looks like this:
+-----------------+-------------------+----------------------------+
| organization_id | total_annual_wage | total_annual_wage_eff_date |
+-----------------+-------------------+----------------------------+
The issue seems to be that in the aph table, some units have wages for the 2018 year, while others don't. Example:
SELECT DISTINCT
YEAR(BEGIN_DATE) AS [Begin Date for Org1]
FROM
ACTUAL_PAY_HOURS aph
WHERE
ORGANIZATION_ID = 'Org1'
Results:
+---------------------+
| Begin Date for Org1 |
+---------------------+
| 1988 |
| 1989 |
| 1990 |
| 1991 |
| 1992 |
| 1993 |
| 1994 |
| 2004 |
| 2005 |
| 2006 |
| 2007 |
| 2008 |
| 2009 |
| 2010 |
| 2011 |
| 2012 |
| 2013 |
| 2014 |
+---------------------+
Additionally, the total wages for the prior year are then adjusted by a value in the CTD (combo_table_detail) table. The issue is that when there are no values for the year in question, nothing is returned. What I need is for the wage total to be 0 in that case, since there isn't any data for that year, with the value from the CTD table then added on top.
So, if Org1 has no wages for 2018, it should come out like this:
+-----------------+-------------+-------------------+-------+
| organization_id | total_wages | combo_table_value | total |
+-----------------+-------------+-------------------+-------+
| Org1 | 0 | 25000 | 25000 |
+-----------------+-------------+-------------------+-------+
So my question is, what logic can I add to this query that will return a result when the Organization doesn't have any prior year wages, but will still be added to the CTD table resulting in a value being returned?
I'm assuming that two of the fields in the WHERE clause belong to the joined table.
Try moving those criteria into the LEFT JOIN:
SELECT aph.organization_id,
ISNULL(SUM(ctd.component_value),0) +
(CASE
WHEN SUM(ctd.component_value) IS NULL
THEN SUM (
SELECT d.component_value
FROM combo_table_detail d
WHERE d.combo_table_id = 'wageAdjustment'
AND d.text_1 = aph.organization_id
)
ELSE 0
END) AS [total_annual_wage],
MAX(aph.begin_date) AS [total_annual_wage_eff_date]
FROM actual_pay_hours AS aph
LEFT JOIN combo_table_detail AS ctd
ON ctd.text_1 = aph.organization_id
AND ctd.combo_table_id = 'wageAdjustment'
AND aph.component_name IN ('earnDef', 'earnings')
AND aph.begin_date >= DATEFROMPARTS(YEAR(GETDATE())-1,1,1)
AND aph.begin_date < DATEFROMPARTS(YEAR(GETDATE()),1,1)
WHERE aph.organization_id = 'orgID'
GROUP BY aph.organization_id, ctd.number_2
SELECT
aph.organization_id,
MAX(aph.begin_date) AS total_annual_wage_eff_date,
--earn components sum
ISNULL( SUM(aph.component_value), 0)
+
--adjustment sum, if any
(
SELECT ISNULL(SUM(ctd.number_2), 0)
FROM combo_table_detail ctd
WHERE ctd.text_1 = aph.organization_id
AND ctd.combo_table_id = 'wageAdjustment'
) AS total_annual_wage
FROM actual_pay_hours AS aph
WHERE aph.component_name IN ( 'earnDef', 'earnings' )
AND aph.begin_date >= DATEFROMPARTS(YEAR(GETDATE())-1, 1, 1)
AND aph.begin_date < DATEFROMPARTS(YEAR(GETDATE()), 1, 1)
GROUP BY aph.organization_id
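Another way to keep the organization's row when it has no prior-year wages is to move the date filter out of the WHERE clause and into the aggregate itself (conditional aggregation). The sketch below is only an illustration under assumptions: it runs on SQLite via Python's sqlite3, the table layouts are guessed from the column names in the question, the sample values are invented, and the year boundaries are passed as parameters instead of being derived from GETDATE(). It still needs the organization to have at least one matching row in actual_pay_hours for any year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schemas; the real tables likely have more columns.
conn.execute("""
    CREATE TABLE actual_pay_hours (
        organization_id TEXT, component_name TEXT,
        component_value INTEGER, begin_date TEXT
    )
""")
conn.execute("""
    CREATE TABLE combo_table_detail (
        combo_table_id TEXT, text_1 TEXT, number_2 INTEGER
    )
""")
# Invented sample data: Org1 only has wages in 2013 and 2014.
conn.executemany(
    "INSERT INTO actual_pay_hours VALUES (?, ?, ?, ?)",
    [("Org1", "earnings", 40000, "2014-06-01"),
     ("Org1", "earnDef", 1000, "2013-03-01")],
)
conn.execute(
    "INSERT INTO combo_table_detail VALUES ('wageAdjustment', 'Org1', 25000)"
)

# The date window sits inside the CASE, so rows from other years still form
# the group but contribute nothing to the wage total.
rows = conn.execute("""
    SELECT organization_id, total_wages, combo_table_value,
           total_wages + combo_table_value AS total
    FROM (
        SELECT aph.organization_id,
               COALESCE(SUM(CASE WHEN aph.begin_date >= :start
                                  AND aph.begin_date <  :end
                                 THEN aph.component_value END), 0) AS total_wages,
               COALESCE((SELECT SUM(ctd.number_2)
                         FROM combo_table_detail ctd
                         WHERE ctd.text_1 = aph.organization_id
                           AND ctd.combo_table_id = 'wageAdjustment'), 0)
                   AS combo_table_value
        FROM actual_pay_hours aph
        WHERE aph.organization_id = :org
          AND aph.component_name IN ('earnDef', 'earnings')
        GROUP BY aph.organization_id
    )
""", {"org": "Org1", "start": "2018-01-01", "end": "2019-01-01"}).fetchall()

print(rows)
```

With no 2018 wages, the wage total falls back to 0 and only the adjustment remains, which is the shape of the second expected result in the question.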

Sql query to partition and sum the records grouping by their bill number and Product code

Below are two tables. Table A contains parent bill numbers such as 1, 4 and 8, which reference nothing (NULL), and child bill numbers, each of which references a parent. For example, parent bill 1 is referenced by child bills 2, 3 and 6.
Table B holds a product code and a cost for each bill number. The product code is either an actual service (ST values) or an associated service (SV values); an SV is an additional cost on top of an ST.
The same ST may occur under multiple bill numbers; only the bill number is unique.
For example, ST1 appears in bill numbers 1 and 8. Likewise, the same SV may reference the same or different STs.
SV1, SV2 and SV3 reference ST1 under bill no. 1, and SV2 and SV4 reference ST2 under bill no. 4.
How can we get below expected output?
Table A:
| bill no | ref |
+----------------------------------------+
| 1 | |
| 2 | 1 |
| 3 | 1 |
| 4 | |
| 5 | 4 |
| 6 | 1 |
| 7 | 4 |
| 8 | |
| 9 | 8 |
Table B:
| bill no | Prod code | cost |
+-----------------------------------------------------+
| 1 | ST1 | 10
| 2 | SV1 | 20
| 3 | SV2 | 30
| 4 | ST2 | 10
| 5 | SV2 | 20
| 6 | SV3 | 30
| 7 | SV4 | 40
| 8 | ST1 | 50
| 9 | SV1 | 10
Expected output:
| bill no | Prod code | ST_cost | SV1 | SV2 | SV3 |
+---------------------------------------------------------------------------------------------+
| 1 | ST1 | 10 | 20 | 30 | 30 |
| 4 | ST2 | 10 | 20 | 40 | |
| 8 | ST1 | 50 | 10 | | |
Here's a script that should get you there:
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.TableA;
CREATE TABLE dbo.TableA
(
BillNumber int NOT NULL PRIMARY KEY,
Reference int NULL
);
GO
INSERT dbo.TableA (BillNumber, Reference)
SELECT *
FROM (VALUES (1,NULL),
(2,1),
(3,1),
(4,NULL),
(5,4),
(6,1),
(7,4),
(8,NULL),
(9,8)) AS a(BillNumber, Reference);
GO
DROP TABLE IF EXISTS dbo.TableB;
CREATE TABLE dbo.TableB
(
BillNumber int NOT NULL PRIMARY KEY,
ProductCode varchar(10) NOT NULL,
Cost int NOT NULL
);
GO
INSERT dbo.TableB (BillNumber, ProductCode, Cost)
SELECT BillNumber, ProductCode, Cost
FROM (VALUES (1, 'ST1', 10),
(2, 'SV1', 20),
(3, 'SV2', 30),
(4, 'ST2', 10),
(5, 'SV2', 20),
(6, 'SV3', 30),
(7, 'SV4', 40),
(8, 'ST1', 50),
(9, 'SV1', 10)) AS b(BillNumber, ProductCode, Cost);
GO
WITH ParentBills
AS
(
SELECT b.BillNumber, b.ProductCode, b.Cost AS STCost
FROM dbo.TableB AS b
INNER JOIN dbo.TableA AS a
ON b.BillNumber = a.BillNumber
WHERE a.Reference IS NULL
),
SubBills
AS
(
SELECT pb.BillNumber, pb.ProductCode, pb.STCost,
b.ProductCode AS ChildProduct, b.Cost AS ChildCost
FROM ParentBills AS pb
INNER JOIN dbo.TableA AS a
ON a.Reference = pb.BillNumber
INNER JOIN dbo.TableB AS b
ON b.BillNumber = a.BillNumber
)
SELECT sb.BillNumber, sb.ProductCode, sb.STCost,
MAX(CASE WHEN sb.ChildProduct = 'SV1' THEN sb.ChildCost END) AS [SV1],
MAX(CASE WHEN sb.ChildProduct = 'SV2' THEN sb.ChildCost END) AS [SV2],
MAX(CASE WHEN sb.ChildProduct = 'SV3' THEN sb.ChildCost END) AS [SV3]
FROM SubBills AS sb
GROUP BY sb.BillNumber, sb.ProductCode, sb.STCost
ORDER BY sb.BillNumber;
You could write a function that builds your query dynamically based on the number of SV codes,
then use "Execute Immediate" to run the query string and "PIPE ROW" to return the result.
Check This PIPE ROW EXAMPLE
I don't understand where the "SV1" value comes from on the second row.
But your problem is basically conditional aggregation:
with ab as (
select a.*, b.productcode, b.cost,
coalesce(a.reference, a.billnumber) as parent_billnumber
from a join
b
on b.billnumber = a.billnumber
)
select parent_billnumber,
max(case when reference is null then productcode end) as st,
sum(case when reference is null then cost end) as st_cost,
sum(case when productcode = 'SV1' then cost end) as sv1,
sum(case when productcode = 'SV2' then cost end) as sv2,
sum(case when productcode = 'SV3' then cost end) as sv3
from ab
group by parent_billnumber
order by parent_billnumber;
Here is a db<>fiddle.
Note this works because you have only one level of child relationships. If there are more, then recursive CTEs are needed. I would recommend that you ask a new question if this is possible.
The CTE doesn't actually add much to the query, so you can also write:
select coalesce(a.reference, a.billnumber) as parent_billnumber ,
max(case when a.reference is null then productcode end) as st,
sum(case when a.reference is null then b.cost end) as st_cost,
sum(case when b.productcode = 'SV1' then b.cost end) as sv1,
sum(case when b.productcode = 'SV2' then b.cost end) as sv2,
sum(case when b.productcode = 'SV3' then b.cost end) as sv3
from a join
b
on b.billnumber = a.billnumber
group by coalesce(a.reference, a.billnumber)
order by parent_billnumber;
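For the deeper hierarchies mentioned above, a recursive CTE can first resolve every bill to its top-level parent, after which the same conditional aggregation applies. The following is a rough sketch of that idea, not code from the answer: it runs on SQLite via Python's sqlite3, uses the question's data, and adds one hypothetical grandchild bill (bill 10, referencing child bill 5) to show a second level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (billnumber INTEGER PRIMARY KEY, reference INTEGER)")
conn.execute("CREATE TABLE b (billnumber INTEGER PRIMARY KEY, productcode TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, None), (5, 4), (6, 1), (7, 4), (8, None), (9, 8),
     (10, 5)],  # hypothetical grandchild: bill 10 -> bill 5 -> bill 4
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?)",
    [(1, "ST1", 10), (2, "SV1", 20), (3, "SV2", 30), (4, "ST2", 10), (5, "SV2", 20),
     (6, "SV3", 30), (7, "SV4", 40), (8, "ST1", 50), (9, "SV1", 10),
     (10, "SV3", 5)],  # hypothetical cost for the grandchild bill
)

# roots maps every bill to its top-level parent, walking down from the
# bills whose reference is NULL.
rows = conn.execute("""
    WITH RECURSIVE roots (billnumber, root) AS (
        SELECT billnumber, billnumber FROM a WHERE reference IS NULL
        UNION ALL
        SELECT a.billnumber, r.root
        FROM a
        JOIN roots r ON a.reference = r.billnumber
    )
    SELECT r.root AS parent_billnumber,
           MAX(CASE WHEN r.billnumber = r.root THEN b.productcode END) AS st,
           SUM(CASE WHEN r.billnumber = r.root THEN b.cost END) AS st_cost,
           SUM(CASE WHEN b.productcode = 'SV1' THEN b.cost END) AS sv1,
           SUM(CASE WHEN b.productcode = 'SV2' THEN b.cost END) AS sv2,
           SUM(CASE WHEN b.productcode = 'SV3' THEN b.cost END) AS sv3
    FROM roots r
    JOIN b ON b.billnumber = r.billnumber
    GROUP BY r.root
    ORDER BY r.root
""").fetchall()

for row in rows:
    print(row)
```

The grandchild's SV3 cost is rolled up under parent bill 4, which the single-level coalesce(reference, billnumber) trick would have attributed to bill 5 instead.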

How to query the previous record that is in another table?

I have a view that shows something like the following:
View VW
| ID | DT | VAL|
|----|------------|----|
| 1 | 2016-09-01 | 7 |
| 2 | 2016-08-01 | 5 |
| 3 | 2016-07-01 | 8 |
I have a table with historical data that has something like:
Table HIST
| ID | DT | VAL|
|----|------------|----|
| 1 | 2016-06-27 | 4 |
| 1 | 2016-06-29 | 3 |
| 1 | 2016-07-15 | 0 |
| 1 | 2016-09-12 | 8 |
| 2 | 2016-05-05 | 3 |
What I need is to add another column to my view with a boolean that means "the immediately previous record exists in history and has a related value greater than zero".
The expected output is the following:
| ID | DT | VAL| FLAG |
|----|------------|----|------|
| 1 | 2016-09-01 | 7 | false| -- previous is '2016-07-15' and value is zero. '2016-09-12' in hist is greater than '2016-09-01' in view, so it is not the previous
| 2 | 2016-08-01 | 5 | true | -- previous is '2016-05-05' and value is 3
| 3 | 2016-07-01 | 8 | false| -- there is no previous value in HIST table
What I have tried
I've used the query below. It works for small loads, but performs poorly in production because my view is extremely complex and the historical table is very large. Is it possible to write this query without using the view multiple times? (If so, performance should improve and I should no longer see timeouts.)
You can test here http://rextester.com/l/sql_server_online_compiler
create table vw (id int, dt date, val int);
insert into vw values (1, '2016-09-01', 7), (2, '2016-08-01', 5), (3, '2016-07-01', 8);
create table hist (id int, dt date, val int);
insert into hist values (1, '2016-06-27', 4), (1, '2016-06-29', 3), (1, '2016-07-15', 0), (1, '2016-09-12', 8), (2, '2016-05-05', 3);
select vw.id, vw.dt, vw.val, (case when hist_with_flag.flag = 'true' then 'true' else 'false' end)
from vw
left join
(
select hist.id, (case when hist.val > 0 then 'true' else 'false' end) flag
from
(
select hist.id, max(hist.dt) as dt
from hist
inner join vw on vw.id = hist.id
where hist.dt < vw.dt
group by hist.id
) hist_with_max_dt
inner join hist
on hist.id = hist_with_max_dt.id and hist.dt = hist_with_max_dt.dt
) hist_with_flag
on vw.id = hist_with_flag.id
You can use OUTER APPLY in order to get the immediately previous record:
SELECT v.ID, v.DT, v.VAL,
IIF(t.VAL IS NULL OR t.VAL = 0, 'false', 'true') AS FLAG
FROM Vw AS v
OUTER APPLY (
SELECT TOP 1 VAL, DT
FROM Hist AS h
WHERE v.ID = h.ID AND v.DT > h.DT
ORDER BY h.DT DESC) AS t
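To see the "latest earlier row per ID" logic in action on the question's sample data, here is a small sketch on SQLite via Python's sqlite3. SQLite has no OUTER APPLY, so a correlated scalar subquery with ORDER BY ... LIMIT 1 is swapped in as the closest equivalent; it reads vw only once, like the OUTER APPLY version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vw (id INTEGER, dt TEXT, val INTEGER)")
conn.execute("CREATE TABLE hist (id INTEGER, dt TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO vw VALUES (?, ?, ?)",
    [(1, "2016-09-01", 7), (2, "2016-08-01", 5), (3, "2016-07-01", 8)],
)
conn.executemany(
    "INSERT INTO hist VALUES (?, ?, ?)",
    [(1, "2016-06-27", 4), (1, "2016-06-29", 3), (1, "2016-07-15", 0),
     (1, "2016-09-12", 8), (2, "2016-05-05", 3)],
)

# For each vw row, fetch the value of the most recent hist row dated before it.
# A missing previous row yields NULL, which COALESCE maps to 0 and thus 'false'.
rows = conn.execute("""
    SELECT v.id, v.dt, v.val,
           CASE WHEN COALESCE((SELECT h.val
                               FROM hist h
                               WHERE h.id = v.id AND h.dt < v.dt
                               ORDER BY h.dt DESC
                               LIMIT 1), 0) > 0
                THEN 'true' ELSE 'false' END AS flag
    FROM vw v
    ORDER BY v.id
""").fetchall()

for row in rows:
    print(row)
```

On a large history table, either form would likely benefit from an index on hist (id, dt), so the previous row can be found with a single seek per view row.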
Please try this query; it returns the same result as yours and should perform well:
SELECT vw.id, MAX(vw.dt) dt,
MAX(vw.val) val,
case when MAX(h.val) > 0 then 'true' else 'false' END flag
FROM vw
OUTER APPLY(SELECT MAX(dt) dt FROM hist WHERE vw.id = hist.id
AND dt<vw.dt GROUP BY hist.id) t
LEFT JOIN hist h ON vw.id = h.id AND h.dt = t.dt
GROUP BY vw.id
You can avoid multiple JOIN using a simple CTE with 'ROW_NUMBER'.
;with cte_1
as
(select vw.id, vw.dt, vw.val,hist.val HistVal,hist.dt HistDt,ROW_NUMBER()OVER (PARTITION BY vw.id,vw.dt ORDER BY vw.id,vw.dt,hist.dt desc) RNO
FROM vw
left join hist
on hist.id = vw.id and hist.dt < vw.dt
)
SELECT Id,Dt,Val,case when ISNULL(HistVal,0)=0 THEN 'FALSE' ELSE 'TRUE' END as FLAG
FROM cte_1 WHERE RNO=1