SQL Server 2008 Join by date getting right values - sql

My scenario is not really hard. I have two tables that I need to join on a key (NumSalarie) and a date. The salary table has one row per monthly payment, while the bonus table has one row per annual bonus, dated when the bonus was declared. Each bonus has to be linked to the salary rows dated on or after its declaration, but only until the next year's bonus is declared.
To give you an idea, this is what the tables look like.
Salary table:
Sample data:
DateHist;NumSalarie;ValeurMontant;ChargePatronal;ChargesSalariales
2012-10-31 00:00:00.000;1;3519;1322;766,49
2012-11-30 00:00:00.000;1;3519;1322;766,49
2012-12-31 00:00:00.000;1;3519;1322;766,49
2013-01-31 00:00:00.000;1;3519;1395,15;867,84
2013-02-28 00:00:00.000;1;3592,33;1936,78;1157,09
2013-03-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-04-30 00:00:00.000;1;3592,33;1423,23;882,85
2013-05-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-06-30 00:00:00.000;1;3592,33;1423,23;882,85
2013-07-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-08-31 00:00:00.000;1;3592,33;1202,4;765,41
2013-09-30 00:00:00.000;1;3592,33;1385,19;862,52
2013-10-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-11-30 00:00:00.000;1;3592,33;1423,23;882,85
2013-12-31 00:00:00.000;1;3592,33;1423,23;882,85
2014-01-31 00:00:00.000;1;3592,33;1439,35;897,52
2014-02-28 00:00:00.000;1;3592,33;1825,8;1104,15
2014-03-31 00:00:00.000;1;3666,67;2858,27;1656,17
2014-04-30 00:00:00.000;1;3666,67;1468,1;912,89
2014-05-31 00:00:00.000;1;3666,67;1468,1;912,89
Bonus table:
Sample data:
CodeRubrique;NumSalarie;ValeurMontant;DateHist
1200;1;1267;2013-02-28 00:00:00.000
1200;1;3448,64;2014-03-31 00:00:00.000
1200;1;3633;2015-03-31 00:00:00.000
1200;1;2244;2015-09-30 00:00:00.000
1200;1;4042,84;2016-10-31 00:00:00.000
So, now when I join both tables I do in T-SQL:
SELECT
salpaid.DateHist,
salpaid.NumSalarie,
salpaid.ValeurMontant,
bonus.ValeurMontant AS bonus
FROM
(select CodeRubrique,NumSalarie,ValeurMontant,DateHist
FROM table ) salpaid
LEFT JOIN
(select CodeRubrique,NumSalarie,ValeurMontant,DateHist
FROM T_HBNS
WHERE CodeRubrique='1200') bonus
ON salpaid.NumSalarie=bonus.NumSalarie
AND salpaid.DateHist >= bonus.DateHist
So, here is my problem. The date condition in the join is not right: once a salary date falls after the second bonus, the row matches both the previous year's bonus and the current one, so it is linked twice. Here is my output:
As you can see, the payments up to line 73 have NULL in the bonus column because the first registered bonus came after those dates, and lines 74 to 87 are fine. The nightmare starts with the second-year bonus: from line 88 onward each payment is linked to the right bonus, but also to the previous year's bonus.
How should I improve my code to get the right JOIN?
thanks guys

If you build a CTE with the start and end dates for each bonus, you can outer join to that CTE where the salary DateHist falls between the begin and end dates of a bonus. The self-join has to match on NumSalarie as well as on the row number, and the latest bonus needs an open-ended end date:
WITH Bonus_Ordered AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [NumSalarie] ORDER BY [DateHist]) Rn
FROM Bonus
WHERE CodeRubrique = '1200'
),
Bonus_Periods AS (
SELECT a.*,
-- the day before the next bonus; a far-future date for the latest bonus
ISNULL(b.DateHist - 1, '99991231') AS EndDateHist
FROM Bonus_Ordered a LEFT JOIN Bonus_Ordered b ON a.NumSalarie = b.NumSalarie AND a.Rn + 1 = b.Rn
)
SELECT *
FROM Salary s
LEFT JOIN Bonus_Periods bp ON s.NumSalarie = bp.NumSalarie
AND s.DateHist BETWEEN bp.DateHist AND bp.EndDateHist
ORDER BY s.DateHist
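The period idea above can be checked on a few of the sample rows. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the dates are stored as ISO text and the end of each period is derived with MIN over a self-join rather than ROW_NUMBER(). The table layout is trimmed to the columns that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salary (DateHist TEXT, NumSalarie INTEGER, ValeurMontant REAL);
CREATE TABLE Bonus  (CodeRubrique TEXT, NumSalarie INTEGER, ValeurMontant REAL, DateHist TEXT);
INSERT INTO Salary VALUES
  ('2013-01-31', 1, 3519.00),
  ('2013-02-28', 1, 3592.33),
  ('2014-02-28', 1, 3592.33),
  ('2014-03-31', 1, 3666.67),
  ('2014-05-31', 1, 3666.67);
INSERT INTO Bonus VALUES
  ('1200', 1, 1267.00, '2013-02-28'),
  ('1200', 1, 3448.64, '2014-03-31');
""")

rows = conn.execute("""
WITH Bonus_Periods AS (
    -- Each bonus is valid from its own date until the next bonus date (exclusive).
    SELECT a.NumSalarie, a.ValeurMontant, a.DateHist AS StartDate,
           MIN(b.DateHist) AS NextDate      -- NULL for the latest bonus
    FROM Bonus a
    LEFT JOIN Bonus b
           ON b.NumSalarie = a.NumSalarie
          AND b.CodeRubrique = a.CodeRubrique
          AND b.DateHist > a.DateHist
    WHERE a.CodeRubrique = '1200'
    GROUP BY a.NumSalarie, a.ValeurMontant, a.DateHist
)
SELECT s.DateHist, s.ValeurMontant, bp.ValeurMontant AS bonus
FROM Salary s
LEFT JOIN Bonus_Periods bp
       ON bp.NumSalarie = s.NumSalarie
      AND s.DateHist >= bp.StartDate
      AND (bp.NextDate IS NULL OR s.DateHist < bp.NextDate)
ORDER BY s.DateHist
""").fetchall()

for row in rows:
    print(row)
```

Each salary row comes back exactly once: the payment before the first bonus has no bonus, and every later payment carries only the most recent bonus declared on or before its date.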

What if you change
salpaid.DateHist >= bonus.DateHist
to
Year(salpaid.DateHist) = Year(bonus.DateHist)

Related

Finding Where on PO record has 2 different Ship Dates on multi-line order using SQL Server

I have a bit of a strange situation. We have Purchase Orders that can have 1 or more lines. In case a purchase order has more than one line the requested ship dates should always be the same. The issue is that we are finding instances where it has been entered wrong and we have different ship dates on different lines. I have tried using COUNT but it just counts each line and returns 1 and in cases where both lines are the same it returns 2 so I can't compare a sum on the count. Below is the query with a sample output of an incorrect result along with a correct result (both dates on the PO are the same). Any ideas?
Thanks.
Query:
SELECT PO_NUMBER,PO_ITEM_NUMBER,PO_REQ_SHIP_DATE FROM VW_PO_ITEM
WHERE PO_NUMBER IN ('0284141253', '0284503082')
GROUP BY PO_REQ_SHIP_DATE,
PO_NUMBER,
PO_ITEM_NUMBER
Results:
PO_NUMBER PO_ITEM_NUMBER PO_REQ_SHIP_DATE
---------- --------------------------------------- -----------------------
0284141253 1 2018-02-28 00:00:00.000
0284141253 2 2018-03-20 00:00:00.000
0284503082 1 2018-03-31 00:00:00.000
0284503082 2 2018-03-31 00:00:00.000
(4 row(s) affected)
I am wondering if some type of partition by may be needed? If I COUNT the PO_ITEM and REQ_SHIP_DATE columns I get the below:
PO_REQ_SHIP_DATE PO_NUMBER COUNT_ITEM COUNT_SHIP_DATE
----------------------- ---------- ----------- ---------------
2018-02-28 00:00:00.000 0284141253 1 1
2018-03-20 00:00:00.000 0284141253 1 1
2018-03-31 00:00:00.000 0284503082 2 2
(3 row(s) affected)
It seems that the differing ship dates are what cause the issue. My thought would be to filter on COUNT_SHIP_DATE = 1, but we also have many POs that are only one line.
In my setup I just used dates, not full timestamp. If your timestamps have zeroed-out hh:mm:ss then you should be fine, otherwise you'll have to tweak the SQL:
select * from
(
select po_number, po_item_number, po_req_ship_date
, row_number() over(partition by po_number, po_req_ship_date order by po_number, po_req_ship_date) as 'count_dates'
, row_number() over(partition by po_number order by po_number) as 'count_items'
, count(po_number) over(partition by po_number) as 'count_po_num'
from VW_PO_ITEM
)X
where count_po_num > 1
and count_dates > 1 ;
I believe you want this:
SELECT PO_NUMBER
FROM VW_PO_ITEM
WHERE PO_NUMBER IN ('0284141253', '0284503082')
GROUP BY PO_NUMBER
HAVING MIN(PO_REQ_SHIP_DATE) <> MAX(PO_REQ_SHIP_DATE);
This returns the orders whose lines have inconsistent ship dates. The grouping is by PO_NUMBER only, because grouping by PO_ITEM_NUMBER as well would leave a single row per group, and MIN and MAX would always be equal. Note: This ignores NULL values. The logic can be adapted to handle them as well.
I found something that does the trick: it finds any POs that have more than one distinct ship date, which by definition must be multi-line POs. Thanks for the help and ideas; there is some stuff here to use in the future (note that I have joined to an employee table):
SELECT COUNT(DISTINCT I.PO_REQ_SHIP_DATE) AS '# OF REQ_SHIP_DATES', I.PO_NUMBER, E.TEAM_MEMBER_NAME [EMPLOYEE]
FROM PDX_SAP_USER..VW_PO_ITEM I
JOIN ADI_USER_MAINTAINED..SCM_PO_Employee_Name E
ON I.PO_NUMBER = E.PO_NUMBER
WHERE DEL_INDICATOR <> 'L'
AND PO_BALANCE_QUANTITY > 0
GROUP BY I.PO_NUMBER, E.TEAM_MEMBER_NAME
HAVING COUNT(DISTINCT I.PO_REQ_SHIP_DATE) > 1;
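As a quick check of the COUNT(DISTINCT ...) approach, here is a minimal sketch that loads the four sample rows from the question into an in-memory SQLite database (a stand-in for SQL Server, with dates stored as text). The employee join and the extra filters are left out because their tables are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VW_PO_ITEM (PO_NUMBER TEXT, PO_ITEM_NUMBER INTEGER, PO_REQ_SHIP_DATE TEXT);
INSERT INTO VW_PO_ITEM VALUES
  ('0284141253', 1, '2018-02-28'),
  ('0284141253', 2, '2018-03-20'),
  ('0284503082', 1, '2018-03-31'),
  ('0284503082', 2, '2018-03-31');
""")

# One group per PO; keep only the POs with more than one distinct ship date.
inconsistent = conn.execute("""
SELECT PO_NUMBER, COUNT(DISTINCT PO_REQ_SHIP_DATE) AS ship_dates
FROM VW_PO_ITEM
GROUP BY PO_NUMBER
HAVING COUNT(DISTINCT PO_REQ_SHIP_DATE) > 1
""").fetchall()

print(inconsistent)
```

Only the PO with two different dates is returned; single-line POs can never qualify because they have exactly one distinct date.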

SQL Server 2008 Running Total

I'm aware this has been asked but I'm completely baffled.
I'm trying to compute a running total by day using SQL Server 2008. I have looked at solutions elsewhere but am still completely perplexed.
The below code shows Daily sales but I cannot make a running total fit. Have looked at the similar solutions here but no luck. Have looked at partition by, order by, CTE etc but I'm just not there yet with SQL.
Would appreciate help, my code is below. I know this only returns the total grouped by day...
SELECT
dim_invoice_date.invoice_date AS 'Invoice Date',
round(SUM(invoice_amount_corp),2) AS 'Sales'
FROM
fact_om_bud_invoice
JOIN
dim_invoice_date ON fact_om_bud_invoice.dim_invoice_date_key = dim_invoice_date.dim_invoice_date_key
WHERE
dim_invoice_date.current_cal_month IN ('Current')
AND fact_om_bud_invoice.budget_code IN ('BUDGET')
GROUP BY
dim_invoice_date.invoice_date
HAVING
ROUND(SUM(invoice_amount_corp), 2) <> 0
ORDER BY
'Invoice Date'
This returns the output:
Invoice Date Sales
-----------------------
4/10/2016 24,132
5/10/2016 15,849
6/10/2016 24,481
7/10/2016 10,243
10/10/2016 42,398
11/10/2016 24,187
Required format is something like:
Invoice Date Sales Running Sales
-------------------------------------------
04/10/2016 24,132 24,132
05/10/2016 15,849 39,981
06/10/2016 24,481 64,462
07/10/2016 10,243 74,705
10/10/2016 42,398 117,103
11/10/2016 24,187 141,290
dim_invoice_date is a numeric field, it's looking up a separate date table to display as date time.
For example, you can use a common table expression (WITH) and join it to itself, so that each row sums all rows up to and including its own:
WITH cte AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY h.[Date]) RowN,
h.[Date],
SUM(s.Quantity) q
FROM
Sales s
JOIN Headers h
ON s.ID_Headers = h.ID
WHERE
h.[Date] > '2016.10.31'
GROUP BY
h.[Date]
)
SELECT
c.[Date],
c.q,
SUM(c1.q)
FROM
cte c
JOIN cte c1
ON c1.RowN <= c.RowN
GROUP BY
C.[Date],
c.q
ORDER BY
c.[Date]
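The self-join technique can be tried on the daily figures from the question. This is a sketch under assumptions: daily_sales is a hypothetical table holding the already aggregated per-day totals, dates are stored as ISO text (reading 4/10/2016 as 4 October 2016), and SQLite stands in for SQL Server 2008, which does not support SUM() OVER (ORDER BY ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_sales (invoice_date TEXT, sales INTEGER);  -- hypothetical name
INSERT INTO daily_sales VALUES
  ('2016-10-04', 24132),
  ('2016-10-05', 15849),
  ('2016-10-06', 24481),
  ('2016-10-07', 10243),
  ('2016-10-10', 42398),
  ('2016-10-11', 24187);
""")

# Join every day to all days on or before it, then sum those earlier days.
running = conn.execute("""
SELECT c.invoice_date, c.sales, SUM(p.sales) AS running_sales
FROM daily_sales c
JOIN daily_sales p ON p.invoice_date <= c.invoice_date
GROUP BY c.invoice_date, c.sales
ORDER BY c.invoice_date
""").fetchall()

for row in running:
    print(row)
```

The join grows quadratically with the number of days, which is fine for a month of data but worth keeping in mind for longer ranges.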

SQL Query: Calculating the deltas in a time series

For a development aid project I am helping a small town in Nicaragua improve their water-network administration.
There are about 150 households, and every month a person reads each meter and charges the household for the water consumed (the reading from this month minus the reading from last month). Today everything is done on paper, and I would like to digitize the administration to avoid calculation errors.
I have an MS Access Table in mind - e.g.:
*HousholdID* *Date* *Meter*
0 1/1/2013 100
1 1/1/2013 130
0 1/2/2013 120
1 1/2/2013 140
...
From this data I would like to create a query that calculates the consumed water (the meter-difference of one household between two months)
*HouseholdID* *Date* *Consumption*
0 1/2/2013 20
1 1/2/2013 10
...
Please, how would I approach this problem?
This query returns every date with previous date, even if there are missing months:
SELECT TabPrev.*, Tab.Meter as PrevMeter, TabPrev.Meter-Tab.Meter as Diff
FROM (
SELECT
Tab.HousholdID,
Tab.Data,
Max(Tab_1.Data) AS PrevData,
Tab.Meter
FROM
Tab INNER JOIN Tab AS Tab_1 ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID, Tab.Data, Tab.Meter) As TabPrev
INNER JOIN Tab
ON TabPrev.HousholdID = Tab.HousholdID
AND TabPrev.PrevData=Tab.Data
Here's the result:
HousholdID Data PrevData Meter PrevMeter Diff
----------------------------------------------------------
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
The query above will return every delta, for every household, for every month (or for every interval). If you are just interested in the last delta, you could use this query:
SELECT
MaxTab.*,
TabCurr.Meter as CurrMeter,
TabPrev.Meter as PrevMeter,
TabCurr.Meter-TabPrev.Meter as Diff
FROM ((
SELECT
Tab.HousholdID,
Max(Tab.Data) AS CurrData,
Max(Tab_1.Data) AS PrevData
FROM
Tab INNER JOIN Tab AS Tab_1
ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID) As MaxTab
INNER JOIN Tab TabPrev
ON TabPrev.HousholdID = MaxTab.HousholdID
AND TabPrev.Data=MaxTab.PrevData)
INNER JOIN Tab TabCurr
ON TabCurr.HousholdID = MaxTab.HousholdID
AND TabCurr.Data=MaxTab.CurrData
And, depending on what you are after, you could filter on the current month only:
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
DateSerial(Year(DATE()), Month(DATE()), 1)
This way, if you miss a check for a particular household, it won't show.
Or you might be interested in showing the last month present in the table (which can be different from the current month):
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
(SELECT MAX(DateSerial(Year(Data), Month(Data), 1))
FROM Tab)
(Here I am taking into consideration the fact that checks might be made on different days.)
I think the best approach is to use a correlated subquery to get the previous date for the same household and join back to the original table. This ensures that you get the previous record, even if there is more or less than a 1 month lag.
So the right query looks like:
select t.*, tprev.date, tprev.meter
from (select t.*,
(select top 1 date from t t2 where t2.HousholdID = t.HousholdID and t2.date < t.date order by date desc
) as prevDate
from t
) as t inner join
t as tprev
on tprev.HousholdID = t.HousholdID and tprev.date = t.prevdate
In an environment such as the one you describe, it is very important not to make assumptions about the frequency of reading the meter. Although they may be read on average once per month, there will always be exceptions.
Testing with the following data:
HousholdID Date Meter
0 01/12/2012 100
1 01/12/2012 130
0 01/01/2013 120
1 01/01/2013 140
0 01/02/2013 120
1 01/02/2013 140
The following query:
SELECT a.housholdid,
a.date,
b.date,
a.meter,
b.meter,
a.meter - b.meter AS Consumption
FROM (SELECT *
FROM water
WHERE Month([date]) = Month(Date())
AND Year([date])=year(Date())) a
LEFT JOIN (SELECT *
FROM water
WHERE DateSerial(Year([date]),Month([date]),Day([date]))
=DateSerial(Year(Date()),Month(Date())-1,Day([date])) ) b
ON a.housholdid = b.housholdid
The above query selects the records for this month (Month([date]) = Month(Date())) and compares them to the records for last month (Month(Date()) - 1).
Please do not use Date as a field name.
It returns the following result:
housholdid a.date b.date a.meter b.meter Consumption
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
Try
select t.householdID
, max(s.theDate) as billingMonth
, max(s.meter)-max(t.meter) as waterUsed
from myTbl t join (
select householdID, max(theDate) as theDate, max(meter) as meter
from myTbl
group by householdID ) s
on t.householdID = s.householdID and t.theDate <> s.theDate
group by t.householdID
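This works in standard SQL; I am not sure about Access. It relies on meter readings never decreasing, since it subtracts the highest earlier reading from the latest one.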
You can use the LAG() function in certain SQL dialects. I found this to be much faster and easier to read than joins.
Source: http://blog.jooq.org/2015/05/12/use-this-neat-window-function-trick-to-calculate-time-differences-in-a-time-series/
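For illustration, here is a minimal sketch of the LAG() approach on the sample readings from the question. LAG() is not available in MS Access or in SQL Server before 2012, so the sketch runs on SQLite (3.25 or later) through Python's sqlite3 module. The table name water follows one of the answers above; the column name ReadingDate is an assumption, chosen to avoid using Date as a field name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE water (HousholdID INTEGER, ReadingDate TEXT, Meter INTEGER);
INSERT INTO water VALUES
  (0, '2013-01-01', 100),
  (1, '2013-01-01', 130),
  (0, '2013-02-01', 120),
  (1, '2013-02-01', 140);
""")

# LAG(Meter) returns the previous reading of the same household, ordered by date.
rows = conn.execute("""
SELECT HousholdID,
       ReadingDate,
       Meter - LAG(Meter) OVER (PARTITION BY HousholdID ORDER BY ReadingDate) AS Consumption
FROM water
ORDER BY HousholdID, ReadingDate
""").fetchall()

# The first reading of each household has no predecessor, so its delta is NULL.
deltas = [r for r in rows if r[2] is not None]
print(deltas)
```

Because the window is ordered by date rather than tied to calendar months, a skipped or late reading still pairs with the reading before it.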

SQL Query Grouping Two Columns Into One

Sorry for the vague Title.
I have a SQL Query
SELECT SaleDate, Location, SUM(DollarSales)
FROM [TableName]
WHERE (SaleDate= '2012-11-19' OR SaleDate= '2012-11-12')
GROUP BY SaleDate, Location
ORDER BY Location, SaleDate
I get the results as follows
SaleDate Location (No column name)
2012-11-12 00:00:00.000 002 21500.38
2012-11-19 00:00:00.000 002 24166.81
2012-11-12 00:00:00.000 016 14723.26
2012-11-19 00:00:00.000 016 12801.00
I want the results to look like
Location (Sale on 11/12) (Sale on 11/19)
002 21500.38 24166.81
016 14723.26 12801.00
I don't want to use inner selects. Is there any way I can do this?
And this is on one of our legacy systems that uses SQL Server 2000.
Thank you in advance.
You could try:
SELECT
Location,
SUM(CASE WHEN SaleDate = '2012-11-12' THEN DollarSales ELSE 0 END) AS 'Sale on 11/12',
SUM(CASE WHEN SaleDate = '2012-11-19' THEN DollarSales ELSE 0 END) AS 'Sale on 11/19'
FROM [TableName]
GROUP BY Location
ORDER BY Location
This only works if the columns are static; you would have to change the query if the dates change.
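The conditional aggregation above can be verified against the figures from the question. A minimal sketch, assuming a table named Sales in place of [TableName] and using SQLite through Python's sqlite3 module as a stand-in for SQL Server 2000; the column aliases sale_1112 and sale_1119 are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (SaleDate TEXT, Location TEXT, DollarSales REAL);  -- stands in for [TableName]
INSERT INTO Sales VALUES
  ('2012-11-12', '002', 21500.38),
  ('2012-11-19', '002', 24166.81),
  ('2012-11-12', '016', 14723.26),
  ('2012-11-19', '016', 12801.00);
""")

# One CASE expression per output column: rows for other dates contribute 0.
pivot = conn.execute("""
SELECT Location,
       SUM(CASE WHEN SaleDate = '2012-11-12' THEN DollarSales ELSE 0 END) AS sale_1112,
       SUM(CASE WHEN SaleDate = '2012-11-19' THEN DollarSales ELSE 0 END) AS sale_1119
FROM Sales
GROUP BY Location
ORDER BY Location
""").fetchall()

for row in pivot:
    print(row)
```

No subquery is involved, which is what the question asked for; the cost is that each date needs its own hard-coded column.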

Mixing date frequencies in SQL

I have the query below:
select s1.DATADATE, s1.PRCCD, c.EBIT
from sec_dprc s1
left outer join rdq_temp c
on s1.GVKEY = c.GVKEY
and s1.DATADATE = c.rdq
where s1.GVKEY = 008068
order by s1.DATADATE
I am trying to create a rolling calculation across the two columns: PRCCD holds daily prices and EBIT holds a quarterly value. I want to calculate the product of the two, i.e. PRCCD*EBIT, for every day, but EBIT only changes once a quarter, on irregular dates. In short, I want to calculate the product of EBIT and PRCCD going forward, switching to a new EBIT value only when one is reported each quarter.
DATADATE PRCCD EBIT
1984-02-01 00:00:00.000 28.625 NULL
1984-02-02 00:00:00.000 27.875 NULL
1984-02-03 00:00:00.000 26.75 420.155
1984-02-06 00:00:00.000 27 NULL
1984-02-07 00:00:00.000 26.875 NULL
.
.
.
DATADATE PRCCD EBIT
1984-05-02 00:00:00.000 30.75 NULL
1984-05-03 00:00:00.000 30.875 NULL
1984-05-04 00:00:00.000 30.75 NULL
1984-05-07 00:00:00.000 31.125 499.228
1984-05-08 00:00:00.000 31.75 NULL
.
.
.
1984-07-31 00:00:00.000 25.625 NULL
1984-08-01 00:00:00.000 26.75 NULL
1984-08-02 00:00:00.000 26.375 348.364
1984-08-03 00:00:00.000 26.75 NULL
1984-08-06 00:00:00.000 27 NULL
Thanks for the help!
One of the solutions I came to:
select TD.Date, TD.C CD, TQ.C CQ, TQ.C1, TQ.C/TQ.C1 EBITps,TQ.C/TQ.C1/TD.C PE
from
(select DataDate date, PRCCD C from sec_dprc where GVKEY = 008068) TD
cross apply (select top 1 rdq date, ebit C, csh12q C1 from rdq_temp where rdq<=TD.Date order by rdq desc) TQ
order by TD.Date
What you are looking for is a non-equijoin between the two tables. This would be much easier if you had effective and end date on the rdq_temp data. In order to add them in SQL Server, you can do a self join and aggregation (other databases support lag() and lead() functionality).
The following query does this, where the condition on the join is essentially a "between" (the date column of rdq_temp is rdq, as in the question, and the latest quarter has no next date, so it is left open-ended):
with rdq as (
select r.rdq as datadate, r.ebit, min(rnext.rdq) as nextdatadate
from rdq_temp r left outer join
rdq_temp rnext
on r.rdq < rnext.rdq
group by r.rdq, r.ebit
)
select sd.datadate, sd.prccd, rdq.ebit
from sec_dprc sd left outer join
rdq
on sd.datadate >= rdq.datadate and (sd.datadate < rdq.nextdatadate or rdq.nextdatadate is null)
I'm guessing that data by quarters is not very big, so this should work fine. If you had more data, I would strongly suggest having effective and end dates, rather than just the asof date, in the rdq records.
I haven't checked the performance of this one, but I think it gives the result you want.
select datadate
,prccd
,ebit
,( select top 1 ebit
from sec_dprc s2
where s2.datadate <= s1.datadate
and ebit is not null
order by datadate desc
) as latestEbit
from sec_dprc s1
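To make the "latest value on or before this date" lookup concrete, here is a sketch on a few rows from the question. It assumes the two-table layout from the question's own query (sec_dprc with daily prices, rdq_temp with quarterly EBIT keyed by rdq) and runs on SQLite through Python's sqlite3 module, so TOP 1 becomes LIMIT 1 and the dates are ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sec_dprc (GVKEY TEXT, DATADATE TEXT, PRCCD REAL);
CREATE TABLE rdq_temp (GVKEY TEXT, rdq TEXT, EBIT REAL);
INSERT INTO sec_dprc VALUES
  ('008068', '1984-02-02', 27.875),
  ('008068', '1984-02-03', 26.75),
  ('008068', '1984-02-06', 27.0),
  ('008068', '1984-05-07', 31.125),
  ('008068', '1984-05-08', 31.75);
INSERT INTO rdq_temp VALUES
  ('008068', '1984-02-03', 420.155),
  ('008068', '1984-05-07', 499.228);
""")

# For each daily price, pick the most recent quarterly EBIT reported on or before that day.
rows = conn.execute("""
SELECT s.DATADATE,
       s.PRCCD,
       q.latest_ebit,
       s.PRCCD * q.latest_ebit AS product
FROM (
    SELECT s1.*,
           (SELECT c.EBIT
            FROM rdq_temp c
            WHERE c.GVKEY = s1.GVKEY AND c.rdq <= s1.DATADATE
            ORDER BY c.rdq DESC
            LIMIT 1) AS latest_ebit
    FROM sec_dprc s1
) AS q
JOIN sec_dprc s ON s.GVKEY = q.GVKEY AND s.DATADATE = q.DATADATE
ORDER BY s.DATADATE
""").fetchall()

for row in rows:
    print(row)
```

Days before the first reported EBIT keep a NULL value and a NULL product, and every later day uses the EBIT from the most recent report date.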