Mixing date frequencies in SQL - sql

I have the query below:
select s1.DATADATE, s1.PRCCD, c.EBIT
from sec_dprc s1
left outer join rdq_temp c
on s1.GVKEY = c.GVKEY
and s1.DATADATE = c.rdq
where s1.GVKEY = 008068
order by s1.DATADATE
I am trying to create a rolling calculation across the two columns: PRCCD holds daily prices and EBIT holds a quarterly value. I want to calculate the product of the two, i.e. PRCCD*EBIT, for every day, but EBIT only changes once a quarter, on irregular dates. In short, I want to carry the most recent EBIT forward and multiply it by each day's PRCCD until a new EBIT value arrives.
DATADATE PRCCD EBIT
1984-02-01 00:00:00.000 28.625 NULL
1984-02-02 00:00:00.000 27.875 NULL
1984-02-03 00:00:00.000 26.75 420.155
1984-02-06 00:00:00.000 27 NULL
1984-02-07 00:00:00.000 26.875 NULL
.
.
.
DATADATE PRCCD EBIT
1984-05-02 00:00:00.000 30.75 NULL
1984-05-03 00:00:00.000 30.875 NULL
1984-05-04 00:00:00.000 30.75 NULL
1984-05-07 00:00:00.000 31.125 499.228
1984-05-08 00:00:00.000 31.75 NULL
.
.
.
1984-07-31 00:00:00.000 25.625 NULL
1984-08-01 00:00:00.000 26.75 NULL
1984-08-02 00:00:00.000 26.375 348.364
1984-08-03 00:00:00.000 26.75 NULL
1984-08-06 00:00:00.000 27 NULL
Thanks for the help!
One of the solutions I came up with:
select TD.Date, TD.C CD, TQ.C CQ, TQ.C1, TQ.C/TQ.C1 EBITps,TQ.C/TQ.C1/TD.C PE
from
(select DataDate date, PRCCD C from sec_dprc where GVKEY = 008068) TD
cross apply (select top 1 rdq date, ebit C, csh12q C1 from rdq_temp where rdq<=TD.Date order by rdq desc) TQ
order by TD.Date

What you are looking for is a non-equijoin between the two tables. This would be much easier if you had effective and end dates on the rdq_temp data. To add them in SQL Server 2008 or earlier, you can do a self join and aggregation (SQL Server 2012 and later, like most other databases, support lag() and lead() for this).
The following query does this; the condition on the join is essentially a "between":
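with rdq as (
select r.datadate, r.ebit, min(rnext.datadate) as nextdatadate
from rdq_temp r left outer join
rdq_temp rnext
on r.datadate < rnext.datadate
group by r.datadate, r.ebit
)
-- match each daily row to the quarter whose [datadate, nextdatadate) window contains it;
-- the latest quarter has no nextdatadate, so its window is left open
-- (add gvkey to both joins if the tables hold more than one company)
select sd.datadate, sd.prccd, rdq.ebit
from sec_dprc sd left outer join
rdq
on sd.datadate >= rdq.datadate and
(rdq.nextdatadate is null or sd.datadate < rdq.nextdatadate)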
I'm guessing that data by quarters is not very big, so this should work fine. If you had more data, I would strongly suggest having effective and end dates, rather than just the asof date, in the rdq records.
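For what it's worth, here is a minimal sketch of the lead() variant mentioned above, including the PRCCD*EBIT product the question asks for. It assumes SQL Server 2012 or later, that rdq_temp has GVKEY, rdq and EBIT columns as in the question's join, and that GVKEY is numeric. The table variables and the handful of rows (taken from the question's sample output) are only there to make the sketch runnable on its own.

```sql
-- Hypothetical stand-ins for sec_dprc and rdq_temp (column types are assumptions).
declare @sec_dprc table (GVKEY int, DATADATE datetime, PRCCD decimal(12,4));
declare @rdq_temp table (GVKEY int, rdq datetime, EBIT decimal(12,3));

insert into @sec_dprc values
 (8068, '19840202', 27.875),
 (8068, '19840203', 26.75),
 (8068, '19840206', 27),
 (8068, '19840507', 31.125),
 (8068, '19840508', 31.75);

insert into @rdq_temp values
 (8068, '19840203', 420.155),
 (8068, '19840507', 499.228);

with rdq_periods as (
    select r.GVKEY,
           r.rdq as effdate,
           r.EBIT,
           -- the next report date for the same company closes the period;
           -- it is NULL for the most recent report
           lead(r.rdq) over (partition by r.GVKEY order by r.rdq) as enddate
    from @rdq_temp r
)
select s1.DATADATE,
       s1.PRCCD,
       p.EBIT,
       s1.PRCCD * p.EBIT as PriceTimesEbit
from @sec_dprc s1
left outer join rdq_periods p
  on  p.GVKEY = s1.GVKEY
  and s1.DATADATE >= p.effdate
  and (p.enddate is null or s1.DATADATE < p.enddate)
where s1.GVKEY = 8068
order by s1.DATADATE;
```

Rows dated before the first report keep a NULL EBIT (and so a NULL product), because there is no earlier value to carry forward.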

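I haven't checked the performance of this one, but I think it gives the result you want. It assumes the ebit column is already available alongside the daily rows (for example, the result of the join in the question stored as a table or view).
select s1.datadate
,s1.prccd
,s1.ebit
,( select top 1 s2.ebit
from sec_dprc s2
where s2.datadate <= s1.datadate
and s2.ebit is not null
order by s2.datadate desc
) as latestEbit
from sec_dprc s1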

Related

SQL Server 2008 Join by date getting right values

My scenario is not really hard. Basically, I have two tables that I need to join on a key and a date. The salary table has one row per monthly payment, while the second table, bonus, has one row per annual bonus. Each bonus has to be linked to the salary rows dated from the bonus date onward, but only until the next year's bonus.
Knowing that, and just to give you an idea, this is what the tables look like.
Salary table:
Sample data:
DateHist;NumSalarie;ValeurMontant;ChargePatronal;ChargesSalariales
2012-10-31 00:00:00.000;1;3519;1322;766,49
2012-11-30 00:00:00.000;1;3519;1322;766,49
2012-12-31 00:00:00.000;1;3519;1322;766,49
2013-01-31 00:00:00.000;1;3519;1395,15;867,84
2013-02-28 00:00:00.000;1;3592,33;1936,78;1157,09
2013-03-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-04-30 00:00:00.000;1;3592,33;1423,23;882,85
2013-05-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-06-30 00:00:00.000;1;3592,33;1423,23;882,85
2013-07-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-08-31 00:00:00.000;1;3592,33;1202,4;765,41
2013-09-30 00:00:00.000;1;3592,33;1385,19;862,52
2013-10-31 00:00:00.000;1;3592,33;1423,23;882,85
2013-11-30 00:00:00.000;1;3592,33;1423,23;882,85
2013-12-31 00:00:00.000;1;3592,33;1423,23;882,85
2014-01-31 00:00:00.000;1;3592,33;1439,35;897,52
2014-02-28 00:00:00.000;1;3592,33;1825,8;1104,15
2014-03-31 00:00:00.000;1;3666,67;2858,27;1656,17
2014-04-30 00:00:00.000;1;3666,67;1468,1;912,89
2014-05-31 00:00:00.000;1;3666,67;1468,1;912,89
Bonus table:
Sample data:
CodeRubrique;NumSalarie;ValeurMontant;DateHist
1200;1;1267;2013-02-28 00:00:00.000
1200;1;3448,64;2014-03-31 00:00:00.000
1200;1;3633;2015-03-31 00:00:00.000
1200;1;2244;2015-09-30 00:00:00.000
1200;1;4042,84;2016-10-31 00:00:00.000
So, now when I join both tables I do in T-SQL:
SELECT
salpaid.DateHist,
salpaid.NumSalarie,
salpaid.ValeurMontant,
bonus.ValeurMontant AS bonus
FROM
(select CodeRubrique,NumSalarie,ValeurMontant,DateHist
FROM table ) salpaid
LEFT JOIN
(select CodeRubrique,NumSalarie,ValeurMontant,DateHist
FROM T_HBNS
WHERE CodeRubrique='1200') bonus
ON salpaid.NumSalarie=bonus.NumSalarie
AND salpaid.DateHist >= bonus.DateHist
So, here is my problem. The date join is not right: once a second bonus exists, every salary row dated after it links twice, once to the previous year's bonus and once to the current one. Just to show you my output:
As you can see, the payments before line 73 have NULL in the bonus column because the first registered bonus came after those dates, and lines 74 to 87 are OK. The nightmare starts with the second year's bonus: I get a link to the right bonus, but also an additional link to the previous year's bonus, as you can see in the lines after 88.
How should I improve my code to get the right JOIN?
thanks guys
If you build a CTE with the start and end dates for each bonus, you can outer join to the CTE where the DateHist falls between the begin and end dates of the bonuses:
WITH Bonus_Ordered AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [NumSalarie] ORDER BY [DateHist]) Rn
FROM Bonus
WHERE CodeRubrique = '1200'
),
Bonus_Periods AS (
SELECT a.*,
-- day before the next bonus; left open-ended for the latest bonus
COALESCE(b.DateHist - 1, '99991231') AS EndDateHist
FROM Bonus_Ordered a LEFT JOIN Bonus_Ordered b
ON a.NumSalarie = b.NumSalarie AND a.Rn + 1 = b.Rn
)
SELECT *
FROM Salary s
LEFT JOIN Bonus_Periods bp ON s.NumSalarie = bp.NumSalarie
AND s.DateHist BETWEEN bp.DateHist AND bp.EndDateHist
ORDER BY s.DateHist
What if you change
salpaid.DateHist >= bonus.DateHist
to
Year(salpaid.DateHist) = Year(bonus.DateHist)
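Another option that should also run on SQL Server 2008 is to pick, for each salary row, the single most recent bonus on or before its date with OUTER APPLY and TOP 1. This is only a sketch: the table variables are hypothetical stand-ins for the salary and bonus tables, with a few rows copied from the sample data above.

```sql
-- Hypothetical stand-ins for the salary and bonus tables.
declare @Salary table (DateHist datetime, NumSalarie int, ValeurMontant decimal(10,2));
declare @Bonus table (CodeRubrique varchar(10), NumSalarie int,
                      ValeurMontant decimal(10,2), DateHist datetime);

insert into @Salary values
 ('20130131', 1, 3519),
 ('20130228', 1, 3592.33),
 ('20140228', 1, 3592.33),
 ('20140331', 1, 3666.67),
 ('20140430', 1, 3666.67);

insert into @Bonus values
 ('1200', 1, 1267,    '20130228'),
 ('1200', 1, 3448.64, '20140331');

select s.DateHist,
       s.NumSalarie,
       s.ValeurMontant,
       b.ValeurMontant as bonus
from @Salary s
outer apply (
    -- only the latest bonus declared on or before this salary date
    select top 1 bn.ValeurMontant
    from @Bonus bn
    where bn.CodeRubrique = '1200'
      and bn.NumSalarie = s.NumSalarie
      and bn.DateHist <= s.DateHist
    order by bn.DateHist desc
) b
order by s.DateHist;
```

With these rows, the January 2013 payment gets a NULL bonus, the payments from 2013-02-28 through 2014-02-28 get 1267, and the payments from 2014-03-31 onward get 3448.64, so no salary row is linked twice.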

Missing a single day

My database has two tables, a car table and a wheel table.
I'm trying to find the number of wheels that meet a certain condition over a range of days, but some days are not included in the output.
Here is the query:
USE CarDB
SELECT MONTH(c.DateTime1) 'Month',
DAY(c.DateTime1) 'Day',
COUNT(w.ID) 'Wheels'
FROM tblCar c
INNER JOIN tblWheel w
ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND w.Measurement < 18
GROUP BY MONTH(c.DateTime1), DAY(c.DateTime1)
ORDER BY [Month], [Day]
GO
The output results seem to be correct, but days with 0 wheels do not show up. For example:
Sample Current Output:
Month Day Wheels
2 1 7
2 2 4
2 3 2 -- 2/4 is missing
2 5 9
Sample Desired Output:
Month Day Wheels
2 1 7
2 2 4
2 3 2
2 4 0
2 5 9
I also tried a left join but it didn't seem to work.
You were on the right track with a LEFT JOIN.
Try running your query with this kind of outer join but without your WHERE clause. Notice anything?
What's happening is that the join is applied and then the WHERE clause removes the rows that don't match the criteria. All this happens before the GROUP BY, meaning the cars without matching wheels are excluded.
Here's one method for you:
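SELECT Year(cars.datetime1) As the_year
, Month(cars.datetime1) As the_month
, Day(cars.datetime1) As the_day
, Count(wheels.id) As wheels
FROM (
SELECT id
, datetime1
FROM tblcar
WHERE datetime1 BETWEEN '2013-01-05' AND '2013-04-06'
) As cars
LEFT
JOIN tblwheel As wheels
ON wheels.carid = cars.id
AND wheels.measurement < 18 -- filter wheels in the ON clause so cars without a match are kept
GROUP
BY Year(cars.datetime1)
, Month(cars.datetime1)
, Day(cars.datetime1)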
What's different this time round is that we're limiting the results of the car table before we join to the wheels table.
You probably want to use a LEFT OUTER JOIN:
USE CarDB
SELECT MONTH (c.DateTime1) 'Month', DAY (c.DateTime1) 'Day', COUNT (w.ID) 'Wheels'
FROM tblCar c LEFT OUTER JOIN tblWheel w ON c.ID = w.CarID
WHERE c.DateTime1 BETWEEN '05/01/2013' AND '06/04/2013'
AND (w.Measurement IS NULL OR w.Measurement < 18)
GROUP BY MONTH (c.DateTime1), DAY (c.DateTime1)
ORDER BY [Month], [Day]
GO
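And then you need to adapt the WHERE condition, as you want to keep the rows where w.Measurement is NULL because of the OUTER join. Note that this keeps cars with no wheels at all, but a car whose wheels all measure 18 or more is still filtered out; putting the Measurement condition in the ON clause instead avoids that.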
Remove the join and use a correlated subquery in the select list instead (this returns one row per car, so the result still has to be grouped by day):
SELECT MONTH(c.DateTime1) 'Month', DAY(c.DateTime1) 'Day', (SELECT COUNT(*) FROM tblWheel w WHERE w.CarID = c.ID AND w.Measurement < 18) 'Wheels'
FROM tblCar c
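None of the joins above can produce a row for a day on which there are no cars at all. One common way around that is to generate the list of days first and left join everything to it. Below is a minimal sketch using a recursive CTE as the calendar. The table variables, their rows and the date range are made up for illustration, and the DATE type needs SQL Server 2008 or later.

```sql
-- Hypothetical date range and sample data, for illustration only.
DECLARE @from date = '20130201', @to date = '20130205';

DECLARE @tblCar TABLE (ID int, DateTime1 datetime);
DECLARE @tblWheel TABLE (ID int, CarID int, Measurement int);

INSERT INTO @tblCar VALUES
 (1, '2013-02-01T08:00:00'),
 (2, '2013-02-03T09:30:00'),
 (3, '2013-02-05T10:00:00');

INSERT INTO @tblWheel VALUES
 (1, 1, 17),
 (2, 1, 16),
 (3, 2, 19),
 (4, 3, 15);

WITH days AS (
    -- one row per calendar day in the requested range
    SELECT @from AS d
    UNION ALL
    SELECT DATEADD(day, 1, d) FROM days WHERE d < @to
)
SELECT MONTH(days.d) AS [Month],
       DAY(days.d)   AS [Day],
       COUNT(w.ID)   AS Wheels      -- counts only matched wheels, so empty days give 0
FROM days
LEFT JOIN @tblCar c
  ON c.DateTime1 >= days.d
 AND c.DateTime1 <  DATEADD(day, 1, days.d)
LEFT JOIN @tblWheel w
  ON w.CarID = c.ID
 AND w.Measurement < 18
GROUP BY days.d
ORDER BY days.d
OPTION (MAXRECURSION 366);
```

With this sample data every day from 1 to 5 February appears in the output: day 1 has 2 wheels, day 5 has 1, and days 2, 3 and 4 show 0 (the only wheel on day 3 measures 19). For long ranges a permanent calendar table is usually a better fit than a recursive CTE.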

Selecting specific Data per month

So I have a table that determines the Region Code of a branch depending on the month.
Let's say from January to February its Region Code would be 1, from February to March it would be 2, and from April up to today it would be 3.
Here is a sample of the table:
I have code that gets the data from a table, but what I want to achieve is this: if the LoanDate of the selected data falls within the dates above (between fld_Datefrom and fld_Dateto), it should use the fld_BranchRegion indicated for that range. (E.g. if the LoanDate is 2013-01-12 00:00:00 it would use Region Code 4A as indicated above, and if it is 2013-02-04 00:00:00 it would use Region Code 3.)
Here is the code I use:
SELECT
TE.LOAN
,bp.REGION
,BP.ID
,TE.AMOUNT
,te.ID
FROM #TrackExpired TE
inner join Transactions.TBLMAIN PM
on TE.ID = PM.ID
inner join #track BP
on BP.ID=cast(TE.ID/1000000000000 as decimal(38,0))
WHERE ((cast(TE.EXPIRATION as date) < cast(TE.newloandate as date))
OR(TE.NewLoanDate is null and (cast(TE.EXPIRATION as date) < cast(PM.REDEEMED as date))) or ((TE.NewLoanDate is null and PM.REDEEMED is null) and (PM.STATUS = 7 or PM.STATUS = 5)) )
The problem with this is that it generates duplicate values: I have 3 occurrences of the dates in the #track table, so each row of the data is also output 3 times, each with a different Region Code!
Instead of outputting all of them, I would like to select the Region Code from #track based on the loan date of the data.
I just want it to use the Region Code whose date range in the #track table contains the loan date, instead of outputting every region code.
Any help, or another approach? Thank you! Sorry, I'm new to SQL.
EDIT: here is the code to create the temp tables.
#trackexpired
SELECT PH.ID
,PH.LOAN
,PH.EXPIRATION
,PH.AMOUNT
,(SELECT T3.LOAN FROM Transactions.HISTO T3 INNER JOIN
(
SELECT MIN(T2.ID) as pawnhisto
FROM Transactions.HISTO T2
WHERE T2.ID > PH.ID
AND PH.ID = T2.ID
) T4
ON T4.pawnhisto = T3.ID
)as 'NewLoanDate'
INTO #TrackExpired
FROM Transactions.HISTO PH
INNER JOIN Transactions.MAIN PM
ON PM.ID=PH.ID
WHERE YEAR(PH.LOAN) = @YEAR
#track
Select bt.CODE
,bp.ID
,AREA
,REGION
,NCODE
,fld_Datefrom
,isnull(fld_Dateto,GETDATE()) as fld_Dateto
into #sort
from Transactions.tbl_BranchTracking bt
inner join Reference.tbl_BranchProfiles bp
on bt.CODE = bp.CODE
Select * into #track from #sort
where @YEAR >= year(fld_Datefrom)
and
@YEAR <= year(fld_Dateto)
Test Data
create table #LoanTable (
ID int not null,
RegionCode nvarchar(50) not null,
LoanDate datetime not null
);
insert into #LoanTable values
(1,'5','10/01/2014'),
(2,'5','10/18/2014'),
(3,'5','10/02/2014'),
(4,'3','04/11/2014'),
(5,'3','04/05/2014'),
(6,'4A','01/09/2014'),
(7,'4A','01/05/2014')
create table #LoanDetailsTable (
ID int not null,
LoanAmount INT not null,
LoanDate datetime not null
);
insert into #LoanDetailsTable values
(1,5000,'10/15/2014'),
(2,1000,'10/11/2014'),
(3,2000,'10/09/2014'),
(4,1500,'04/13/2014'),
(5,5000,'04/17/2014'),
(6,500,'01/19/2014'),
(7,2500,'01/15/2014')
Query
;With RegCode
AS
(
SELECT RegionCode, MAX(MONTH(LoanDate)) [Month]
FROM #LoanTable
GROUP BY RegionCode
)
SELECT LDT.* , RC.RegionCode
FROM #LoanDetailsTable LDT INNER JOIN RegCode RC
ON MONTH(LDT.LoanDate) = RC.[Month]
Results
ID LoanAmount LoanDate RegionCode
1 5000 2014-10-15 00:00:00.000 5
2 1000 2014-10-11 00:00:00.000 5
3 2000 2014-10-09 00:00:00.000 5
4 1500 2014-04-13 00:00:00.000 3
5 5000 2014-04-17 00:00:00.000 3
6 500 2014-01-19 00:00:00.000 4A
7 2500 2014-01-15 00:00:00.000 4A
Use a CTE to extract the month part of the date along with the Region Code associated with it, then join it to your data table on the month of the loan date to get whatever the Region Code was at that time. Happy days :)
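If the tracking table really does carry a from/to date pair, as the question describes with fld_Datefrom and fld_Dateto, a range join on the loan date returns exactly one region per loan without going through the month. A minimal sketch follows. The table variables, the BranchID and LoanID columns and the rows are hypothetical, chosen to match the 4A and 3 example in the question.

```sql
-- Hypothetical stand-ins for #track and the loan data.
declare @track table (BranchID int, RegionCode nvarchar(50),
                      fld_Datefrom datetime, fld_Dateto datetime null);
declare @loans table (LoanID int, BranchID int, LoanDate datetime);

insert into @track values
 (1, '4A', '20130101', '20130131'),
 (1, '3',  '20130201', '20130331'),
 (1, '5',  '20130401', null);      -- still in effect, no end date yet

insert into @loans values
 (10, 1, '20130112'),
 (11, 1, '20130204'),
 (12, 1, '20130615');

select l.LoanID, l.LoanDate, t.RegionCode
from @loans l
inner join @track t
  on  t.BranchID = l.BranchID
  and l.LoanDate >= t.fld_Datefrom
  -- half-open upper bound so a loan made late on the last day still matches;
  -- an open-ended range is treated as running until now
  and l.LoanDate < dateadd(day, 1, isnull(t.fld_Dateto, getdate()))
order by l.LoanID;
```

Here loan 10 (2013-01-12) picks up region 4A, loan 11 (2013-02-04) picks up region 3 and loan 12 picks up region 5. This only holds if the ranges for a branch do not overlap; overlapping ranges would bring the duplicates back.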

Sql Server Self JOIN (pushing column values down)

I am asked to do the following:
"CycleStartDate needs to be the BillDate from the previous BillDate record. If a previous record does not exist, you should use the most recent CycleEndDate from the DataTime table"
CycleStartDate and CycleEndDate are columns in a table called DataTime
BillDate is a column in a table called BillingData
These are the BillDate values:
2012-07-27 00:00:00.000
2012-07-27 00:00:00.000
2012-08-27 00:00:00.000
2012-08-27 00:00:00.000
2012-09-28 00:00:00.000
2012-09-28 00:00:00.000
2012-10-26 00:00:00.000
2012-10-26 00:00:00.000
2012-11-27 00:00:00.000
2012-11-27 00:00:00.000
2012-12-27 00:00:00.000
How would I set the CycleStartDate values based on the requirements?
The tables DataTime and BillingData are connected by a column called MeterID.
Try something similar to this...
SELECT B.BillDate,
ISNULL(
B2.BillDate,
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
OUTER APPLY (
SELECT TOP 1 B2.BillDate
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillDate < B.BillDate
ORDER BY B2.BillDate DESC
) B2
I still have one doubt... Do you need to take the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID or the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID AND DT.CycleEndDate < B.BillDate?
But it can be done without the OUTER APPLY...
SELECT B.BillDate,
ISNULL(
(SELECT MAX(B2.BillDate)
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillDate < B.BillDate),
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
I think the second version is quite readable. For each row of BillingData B, look for the biggest BillDate (MAX(B2.BillDate)) less than the current BillDate and with the same MeterID. If there is none, the first argument of ISNULL is NULL, so it falls through to the second: the biggest CycleEndDate from DataTime with the same MeterID.
You can use the ROW_NUMBER() function for offsetting a JOIN:
SELECT a.BillDate, COALESCE(b.BillDate,c.CycleEndDate) 'CycleStartDate'
FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)a
LEFT JOIN (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)b
ON a.RowRank = b.RowRank - 1
AND a.MeterID = b.MeterID
LEFT JOIN (SELECT MeterID,MAX(CycleEndDate)'CycleEndDate'
FROM DataTime
GROUP BY MeterID
) c
ON a.MeterID = c.MeterID
The PARTITION BY may not be necessary, nor the MeterID criteria in the JOIN. Your wording is a little confusing as to whether the ORDER BY should be ascending or descending. As written above, the oldest record is the one that gets its date from the DataTime table; remove DESC to make it the newest record instead.
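On SQL Server 2012 or later, the same "previous BillDate, else latest CycleEndDate" rule can also be sketched with LAG(), which avoids the self join. The table variables and rows below are hypothetical, with one row per BillDate; the duplicate BillDate rows in the question would need to be de-duplicated first, since LAG() would otherwise return the same date for the second copy.

```sql
-- Hypothetical stand-ins for BillingData and DataTime.
declare @BillingData table (MeterID int, BillDate datetime);
declare @DataTime table (MeterID int, CycleStartDate datetime, CycleEndDate datetime);

insert into @BillingData values
 (1, '20120727'),
 (1, '20120827'),
 (1, '20120928');

insert into @DataTime values
 (1, '20120528', '20120627');

select b.MeterID,
       b.BillDate,
       coalesce(
           -- BillDate of the previous bill for the same meter, if any
           lag(b.BillDate) over (partition by b.MeterID order by b.BillDate),
           -- otherwise fall back to the most recent CycleEndDate for that meter
           (select max(dt.CycleEndDate)
            from @DataTime dt
            where dt.MeterID = b.MeterID)
       ) as CycleStartDate
from @BillingData b
order by b.MeterID, b.BillDate;
```

With these rows the first bill (2012-07-27) gets 2012-06-27 from DataTime, and each later bill gets the BillDate of the bill before it.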

SQL Query: Calculating the deltas in a time series

For a development aid project I am helping a small town in Nicaragua improving their water-network-administration.
There are about 150 households and every month a person checks the meter and charges the household according to the water consumed (reading from this month minus reading from last month). Today everything is done on paper and I would like to digitize the administration to avoid calculation errors.
I have an MS Access Table in mind - e.g.:
*HousholdID* *Date* *Meter*
0 1/1/2013 100
1 1/1/2013 130
0 1/2/2013 120
1 1/2/2013 140
...
From this data I would like to create a query that calculates the consumed water (the meter-difference of one household between two months)
*HouseholdID* *Date* *Consumption*
0 1/2/2013 20
1 1/2/2013 10
...
Please, how would I approach this problem?
This query returns every date with previous date, even if there are missing months:
SELECT TabPrev.*, Tab.Meter as PrevMeter, TabPrev.Meter-Tab.Meter as Diff
FROM (
SELECT
Tab.HousholdID,
Tab.Data,
Max(Tab_1.Data) AS PrevData,
Tab.Meter
FROM
Tab INNER JOIN Tab AS Tab_1 ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID, Tab.Data, Tab.Meter) As TabPrev
INNER JOIN Tab
ON TabPrev.HousholdID = Tab.HousholdID
AND TabPrev.PrevData=Tab.Data
Here's the result:
HousholdID Data PrevData Meter PrevMeter Diff
----------------------------------------------------------
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
The query above will return every delta, for every household, for every month (or for every interval). If you are just interested in the last delta, you could use this query:
SELECT
MaxTab.*,
TabCurr.Meter as CurrMeter,
TabPrev.Meter as PrevMeter,
TabCurr.Meter-TabPrev.Meter as Diff
FROM ((
SELECT
Tab.HousholdID,
Max(Tab.Data) AS CurrData,
Max(Tab_1.Data) AS PrevData
FROM
Tab INNER JOIN Tab AS Tab_1
ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID) As MaxTab
INNER JOIN Tab TabPrev
ON TabPrev.HousholdID = MaxTab.HousholdID
AND TabPrev.Data=MaxTab.PrevData)
INNER JOIN Tab TabCurr
ON TabCurr.HousholdID = MaxTab.HousholdID
AND TabCurr.Data=MaxTab.CurrData
and (depending on what you are after) you could only filter current month:
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
DateSerial(Year(DATE()), Month(DATE()), 1)
this way if you miss a check for a particular household, it won't show.
Or you might be interested in showing last month present in the table (which can be different than current month):
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
(SELECT MAX(DateSerial(Year(Data), Month(Data), 1))
FROM Tab)
(here I am taking in consideration the fact that checks might be on different days)
I think the best approach is to use a correlated subquery to get the previous date and join back to the original table. This ensures that you get the previous record, even if there is more or less than a 1 month lag.
So the right query looks like:
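select t.*, tprev.date, tprev.meter
from (select t.*,
-- latest earlier reading for the same household
(select top 1 t2.date from t t2
where t2.HousholdID = t.HousholdID and t2.date < t.date
order by t2.date desc
) as prevDate
from t
) t join
t tprev
on tprev.HousholdID = t.HousholdID and tprev.date = t.prevDate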
In an environment such as the one you describe, it is very important not to make assumptions about the frequency of reading the meter. Although they may be read on average once per month, there will always be exceptions.
Testing with the following data:
HousholdID Date Meter
0 01/12/2012 100
1 01/12/2012 130
0 01/01/2013 120
1 01/01/2013 140
0 01/02/2013 120
1 01/02/2013 140
The following query:
SELECT a.housholdid,
a.date,
b.date,
a.meter,
b.meter,
a.meter - b.meter AS Consumption
FROM (SELECT *
FROM water
WHERE Month([date]) = Month(Date())
AND Year([date])=year(Date())) a
LEFT JOIN (SELECT *
FROM water
WHERE DateSerial(Year([date]),Month([date]),Day([date]))
=DateSerial(Year(Date()),Month(Date())-1,Day([date])) ) b
ON a.housholdid = b.housholdid
The above query selects the records for this month (Month([date]) = Month(Date())) and compares them to the records for last month (Month([date]) = Month(Date()) - 1).
Please do not use Date as a field name; it is a reserved word in Access.
Returns the following result.
housholdid a.date b.date a.meter b.meter Consumption
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
Try
select t.householdID
, max(s.theDate) as billingMonth
, max(s.meter)-max(t.meter) as waterUsed
from myTbl t join (
select householdID, max(theDate) as theDate, max(meter) as meter
from myTbl
group by householdID ) s
on t.householdID = s.householdID and t.theDate <> s.theDate
group by t.householdID
This works in SQL; I'm not sure about Access.
You can use the LAG() function in certain SQL dialects. I found this to be much faster and easier to read than joins.
Source: http://blog.jooq.org/2015/05/12/use-this-neat-window-function-trick-to-calculate-time-differences-in-a-time-series/
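As a rough illustration of the LAG() approach, here is a minimal sketch written for SQL Server 2012 or later (MS Access itself does not have LAG(), so this only applies if the data lives in a database that supports window functions). The table variable and the ReadingDate column name are assumptions; the rows are the ones from the question, reading 1/2/2013 as 1 February.

```sql
-- Hypothetical stand-in for the readings table from the question.
declare @readings table (HousholdID int, ReadingDate date, Meter int);

insert into @readings values
 (0, '20130101', 100),
 (1, '20130101', 130),
 (0, '20130201', 120),
 (1, '20130201', 140);

select HousholdID, ReadingDate, Consumption
from (
    select HousholdID,
           ReadingDate,
           -- current reading minus the previous reading of the same household;
           -- NULL for a household's first reading
           Meter - lag(Meter) over (partition by HousholdID
                                    order by ReadingDate) as Consumption
    from @readings
) d
where Consumption is not null
order by HousholdID, ReadingDate;
```

For the sample rows this gives a consumption of 20 for household 0 and 10 for household 1 on the second reading date. Because LAG() looks at the previous row rather than the previous calendar month, it still works when a reading is skipped or taken late.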