SQL table and data extraction

I have never worked with SQL before and I have been reading up on it. There is an exercise in the book I am reading to get me started (I am also using the W3Schools website), and the book is telling me to attempt the below:
A table called Trades, which has the following structure:
trade_id: primary key
timestamp: timestamp of trade
security: underlying security (bought or sold in the trade)
quantity: underlying quantity (positive signifies bought, negative indicates sold)
price: price of 1 security item for this trade
Consider the following table
CREATE TABLE tbProduct
([TRADE_ID] varchar(8), [TIMESTAMP] varchar(8), [SECURITY] varchar(8), [QUANTITY] varchar(8), [PRICE] varchar(8))
;
INSERT INTO tbProduct
([TRADE_ID], [TIMESTAMP], [SECURITY], [QUANTITY], [PRICE])
VALUES
('TRADE1', '10:01:05', 'BP', '+100', '20'),
('TRADE2', '10:01:06', 'BP', '+20', '15'),
('TRADE3', '10:10:00', 'BP', '-100', '19'),
('TRADE4', '10:10:01', 'BP', '-100', '19')
;
The book asks me to write a query that finds all pairs of trades that happened within a range of 10 seconds of each other and whose prices differ by more than 10%.
The result should also list the percentage price difference between the 2 trades.
For a person who has not done SQL before, reading that has really confused me. They have also provided the expected outcome, but I am unsure how they arrived at it.
Expected result:
First_Trade Second_Trade PRICE_DIFF
TRADE1 TRADE2 25
I have created a fiddle if this helps. If someone could show me how to get the expected result, it will help me understand the book exercise.
Thanks

This will get the result you want.
;with cast_cte as
(
    select [TRADE_ID],
           cast([TIMESTAMP] as datetime) as [timestamp],
           [SECURITY],
           [QUANTITY],
           cast([PRICE] as float) as price
    from tbProduct
)
select t1.trade_id,
       t2.trade_id,
       datediff(ms, t1.[timestamp], t2.[timestamp]) as milliseconds_diff,
       ((t1.price - t2.price) / t1.price) * 100 as price_diff
from cast_cte t1
inner join cast_cte t2
    on datediff(ms, t1.[timestamp], t2.[timestamp]) between 0 and 10000
    and t1.trade_id <> t2.trade_id
where ((t1.price - t2.price) / t1.price) * 100 > 10
   or ((t1.price - t2.price) / t1.price) * 100 < -10
However, there are a number of problems with the schema and general query parameters:
1) The columns are all varchars. This is very inefficient because they all need to be cast to their appropriate data types in order to get the results you desire. Use datetime, int, float etc. (I have used a CTE to clean up the query, as per @Jeroen-Mostert's suggestion.)
2) As the table gets larger this query will start performing very poorly as the predicate used (the 10 second timestamp) is not indexed properly.
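As a sketch of the kind of indexing that would address point 2 (the table and index names here are hypothetical, and this assumes the columns have first been converted to proper types):

```sql
-- Hypothetical typed copy of the table
CREATE TABLE tbProductTyped
([TRADE_ID] varchar(8), [TIMESTAMP] datetime, [SECURITY] varchar(8),
 [QUANTITY] int, [PRICE] float);

-- An index on (SECURITY, TIMESTAMP) lets the self-join seek on the
-- 10-second window per security instead of comparing every row pair
CREATE NONCLUSTERED INDEX IX_tbProductTyped_Security_Timestamp
    ON tbProductTyped ([SECURITY], [TIMESTAMP])
    INCLUDE ([PRICE]);
```

With the timestamp stored as a real datetime and indexed, the `between 0 and 10000` predicate can be rewritten as a range seek rather than a function call on every pair.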

A slightly different approach to the other answer, but with much the same effect: I use BETWEEN to find the date range rather than DATEDIFF.
select trade1.trade_ID as TRADE1,
       trade2.trade_ID as TRADE2,
       (cast(trade1.price as float) - cast(trade2.price as float))
           / cast(trade1.price as float) * 100 as PRICE_DIFF_PERC
from tbProduct trade1
inner join tbProduct trade2
    on trade2.timestamp between trade1.timestamp and dateadd(s, 10, trade1.TIMESTAMP)
    and trade1.TRADE_ID <> trade2.TRADE_ID
where (cast(trade1.price as float) - cast(trade2.price as float))
          / cast(trade1.price as float) > 0.1
The schema could definitely be improved; removing the need for 'CAST's would make this a lot clearer:
CREATE TABLE tbProduct2
([TRADE_ID] varchar(8), [TIMESTAMP] datetime, [SECURITY] varchar(8), [QUANTITY] int, [PRICE] float)
;
Allows you to do:
select trade1.trade_ID as TRADE1,
       trade2.trade_ID as TRADE2,
       ((trade1.price - trade2.price) / trade1.price) * 100 as PRICE_DIFF_PERC
from tbProduct2 trade1
inner join tbProduct2 trade2
    on trade2.timestamp between trade1.timestamp and dateadd(s, 10, trade1.TIMESTAMP)
    and trade1.TRADE_ID <> trade2.TRADE_ID
where (trade1.price - trade2.price) / trade1.price > 0.1
;

I have used the LEAD function to get the expected result. Try this:
select
iq.trade_id as FIRST_TRADE,
t1 as SECOND_TRADE,
((price-t3)/price*100) as PRICE_DIFF
from
(
Select trade_id, timestamp, security, quantity, cast(price as float) price,
lead(trade_id) over (partition by security order by timestamp) t1
,lead(timestamp) over (partition by security order by timestamp) t2
,lead(cast(price as float)) over (partition by security order by timestamp) t3
from tbProduct
) iq
where DATEDIFF(SECOND, iq.timestamp,iq.t2) between 0 and 10
and ((price-t3)/price*100) > 10
It relies on the window being partitioned by security. Feel free to comment or suggest corrections.

Related

Convert tick data to candlestick (OHLC) with MS localdb

I have been trying to follow this solution ( Convert tick data to candlestick (OHLC) with SQL ) to fit the needs of my home project, which has SQL Server Express LocalDB as the database. My SQL knowledge is a bit rusty, so I am hoping for help :-)
I have a price, a float(53) (e.g. value 109,2), and a time, a datetime (e.g. value 2021-02-11 21:26:45.000).
I need to get candlesticks per minute.
Then I have this T-SQL:
SELECT
t1.price as open,
m.high,
m.low,
t2.price as close,
open_time
FROM
(SELECT
MIN(Publication_time) AS min_time,
MAX(Publication_time) AS max_time,
MIN(price) AS low,
MAX(price) AS high,
FLOOR((CAST(DATEDIFF(s, Publication_time, GETUTCDATE()) AS BIGINT) * 1000) / (1000 * 60)) AS open_time
FROM
stocks
GROUP BY
open_time) m
JOIN
stocks t1 ON t1.Publication_time = min_time
JOIN
stocks t2 ON t2.Publication_time = max_time
It is parsed alright, but I get an error
Invalid column name 'open_time'
on execution. What is the correct way to do this?
A common way to avoid repeating the same calculation is to compute it in a CROSS APPLY, e.g.:
SELECT
MIN(Publication_time) AS min_time
, MAX(Publication_time) AS max_time
, MIN(price) AS low
, MAX(price) AS high
, C.open_time
FROM stocks S
CROSS APPLY (VALUES (FLOOR((CAST(DATEDIFF(s, Publication_time, GETUTCDATE()) AS BIGINT) * 1000) / (1000 * 60)))) AS C (open_time)
GROUP BY C.open_time
A sub-query will also accomplish the same thing, but isn't as neat (IMO).
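For completeness, a sketch of the sub-query alternative the answer mentions (same logic, just nested; `stocks` and `Publication_time` are taken from the question):

```sql
SELECT
      MIN(Publication_time) AS min_time
    , MAX(Publication_time) AS max_time
    , MIN(price) AS low
    , MAX(price) AS high
    , open_time
FROM (
    SELECT price,
           Publication_time,
           -- Same minute-bucket calculation, now named so GROUP BY can see it
           FLOOR((CAST(DATEDIFF(s, Publication_time, GETUTCDATE()) AS BIGINT) * 1000) / (1000 * 60)) AS open_time
    FROM stocks
) S
GROUP BY open_time
```

Either way, the point is that the alias is defined in an inner scope before the GROUP BY references it, which is what the original query was missing.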

Group By column throwing off query

I have a query that checks a database to see if a customer has visited multiple times a day. If they have, it counts the number of visits and then tells me what times they visited. The problem is that it forces "Tickets.lCustomerID" into the GROUP BY clause, causing me to miss 5 records (customers without barcodes). How can I change the query below to remove "Tickets.lCustomerID" from the GROUP BY clause? If I remove it, I get an error telling me "Tickets.lCustomerID" is not a valid select because it's not part of an aggregate or the GROUP BY clause.
The Query that works:
SELECT Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME) AS dtCreatedDate, COUNT(Customers.sBarcode) AS [Number of Scans],
MAX(Customers.sLastName) AS LastName
FROM Tickets INNER JOIN
Customers ON Tickets.lCustomerID = Customers.lCustomerID
WHERE (Tickets.dtCreated BETWEEN @startdate AND @enddate) AND (Tickets.dblTotal <= 0)
GROUP BY Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME)
HAVING (COUNT(*) > 1)
ORDER BY dtCreatedDate
The Output is:
sBarcode dtcreated Date Number of Scans slastname
1234 1/4/2013 12:00:00 AM 2 Jimbo
1/5/2013 12:00:00 AM 3 Jimbo2
1578 1/6/2013 12:00:00 AM 3 Jimbo3
My current Query with the subquery
SELECT customers.sbarcode,
       MAX(customers.slastname) AS LastName,
       CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME) AS dtCreatedDate,
       COUNT(customers.sbarcode) AS [Number of Scans],
       STUFF((SELECT ', ' + RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
              FROM tickets AS sub
              WHERE (lcustomerid = tickets.lcustomerid)
                AND (dtcreated BETWEEN CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME)
                               AND CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME) + '23:59:59')
                AND (dbltotal <= '0')
              FOR XML PATH('')), 1, 1, '') AS [Times Scanned]
FROM tickets
     INNER JOIN customers ON tickets.lcustomerid = customers.lcustomerid
WHERE (tickets.dtcreated BETWEEN @startdate AND @enddate)
  AND (tickets.dbltotal <= 0)
GROUP BY customers.sbarcode,
         CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME),
         tickets.lcustomerid
HAVING (COUNT(*) > 1)
ORDER BY dtcreateddate
The Current output (notice the record without a barcode is missing) is:
sBarcode dtcreated Date Number of Scans slastname Times Scanned
1234 1/4/2013 12:00:00 AM 2 Jimbo 12:00PM, 1:00PM
1578 1/6/2013 12:00:00 AM 3 Jimbo3 03:05PM, 1:34PM
UPDATE: Based on our "chat" it seems that customerid is not the unique field but barcode is, even though customer id is the primary key.
Therefore, in order to not GROUP BY customer id in the subquery you need to join to a second customers table in there in order to actually join on barcode.
Try this:
SELECT customers.sbarcode,
       MAX(customers.slastname) AS LastName,
       CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME) AS dtCreatedDate,
       COUNT(customers.sbarcode) AS [Number of Scans],
       STUFF((SELECT ', ' + RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
              FROM tickets AS subticket
                   INNER JOIN customers AS subcustomers
                       ON subcustomers.lcustomerid = subticket.lcustomerid
              WHERE (subcustomers.sbarcode = customers.sbarcode)
                AND (subticket.dtcreated BETWEEN CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME)
                                         AND CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME) + '23:59:59')
                AND (dbltotal <= '0')
              FOR XML PATH('')), 1, 1, '') AS [Times Scanned]
FROM tickets
     INNER JOIN customers ON tickets.lcustomerid = customers.lcustomerid
WHERE (tickets.dtcreated BETWEEN @startdate AND @enddate)
  AND (tickets.dbltotal <= 0)
GROUP BY customers.sbarcode,
         CAST(FLOOR(CAST(tickets.dtcreated AS FLOAT)) AS DATETIME)
HAVING (COUNT(*) > 1)
ORDER BY dtcreateddate
I can't directly solve your problem because I don't understand your data model or what you are trying to accomplish with this query. However, I can give you some advice on how to solve the problem yourself.
First do you understand exactly what you are trying to accomplish and how the tables fit together? If so move on to the next step, if not, get this knowledge first, you cannot do complex queries without this understanding.
Next break up what you are trying to accomplish in little steps and make sure you have each covered before moving to the rest. So in your case you seem to be missing some customers. Start with a new query (I'm pretty sure this one has more than one problem). So start with the join and the where clauses.
I suspect you may need to start with customers and left join to tickets (which would move the where conditions on tickets into the join, as they are on tickets). This will get you all the customers whether they have tickets or not. If that isn't what you want, then work with the join and the where clauses (and use select * while you are trying to figure things out) until you are returning the exact set of customer records you need. The reason to use select * at this stage is to see what in the data may be causing the problem you are having. That may tell you how to fix it.
Usually I start with the join and then add in the where clauses one at a time until I know I am getting the right initial set of records. If you have multiple joins, add them one at a time so you know when you suddenly start getting more or fewer records than you would expect.
Then go into the more complex parts. Add each in one at a time and check the results. If you suddenly go from 10 records to 5 or 15, then you have probably hit a problem. When you work one step at a time and run into a problem, you know exactly what caused the problem, making it much easier to find and fix.
GROUP BY is important to understand thoroughly. You must have every non-aggregated field in the group by or it will not work. Think of this as a law, like the law of gravity: it is not something you can change. However, it can be worked around through the use of derived tables or CTEs. Please read up on those a bit if you don't know what they are; they are very useful techniques when you get into complex stuff, and you should understand them thoroughly. I suspect you will need the derived-table approach here: group on only the things you need, then join that derived table to the rest of the query to get the other fields. I'll show a simple example:
select t1.table1id,
       t1.field1,
       t1.field2,
       a.field3,
       a.MostRecentDate
from table1 t1
join (select t1.table1id, t2.field3, max(datefield) as MostRecentDate
      from table1 t1
      join Table2 t2 on t1.table1id = t2.table1id
      where t2.field4 = 'test'
      group by t1.table1id, t2.field3) a
    on a.table1id = t1.table1id
Hope this approach helps you solve this problem.

SQL query to identify paired items (challenging)

Assume there is a relation database with one table:
{datetime, tapeID, backupStatus}
2012-07-09 3:00, ID33, Start
2012-07-09 3:05, ID34, Start
2012-07-09 3:10, ID35, Start
2012-07-09 4:05, ID34, End
2012-07-09 4:10, ID33, Start
2012-07-09 5:05, ID33, End
2012-07-09 5:10, ID34, Start
2012-07-09 6:00, ID34, End
2012-07-10 4:00, ID35, Start
2012-07-11 5:00, ID35, End
tapeID = any of 100 different tapes each with their own unique ID.
backupStatus = one of two assignments either Start or End.
I want to write a SQL query that returns five fields
{startTime,endTime,tapeID,totalBackupDuration,numberOfRestarts}
2012-07-09 3:00,2012-07-09 5:05, ID33, 0days2hours5min,1
2012-07-09 3:05,2012-07-09 4:05, ID34, 0days1hours0min,0
2012-07-09 3:10,2012-07-10 5:00, ID35, 0days0hours50min,1
2012-07-09 5:10,2012-07-09 6:00, ID34, 0days0hours50min,0
I'm looking to pair the Start and End records to identify when each backup set has truly completed. The caveat here is that the backup of a single backup set may be restarted, so there may be multiple Start times that are not considered complete until the following End event. A single backup set may also be backed up multiple times a day, and each run would need to be identified with its own separate start and end time.
Thank you for your assistance in advance!
B
Here's my version. If you add INSERT #T SELECT '2012-07-11 12:00', 'ID35', 'Start' to the table, you'll see unfinished backups in this query as well. OUTER APPLY is a natural way to solve the problem.
SELECT
Min(T.dt) StartTime,
Max(E.dt) EndTime,
T.tapeID,
Datediff(Minute, Min(T.dt), Max(E.dt)) TotalBackupDuration,
Count(*) - 1 NumberOfRestarts
FROM
#T T
OUTER APPLY (
SELECT TOP 1 E.dt
FROM #T E
WHERE
T.tapeID = E.tapeID
AND E.BackupStatus = 'End'
AND E.dt > T.dt
ORDER BY E.dt
) E
WHERE
T.BackupStatus = 'Start'
GROUP BY
T.tapeID,
IsNull(E.dt, T.dt)
One thing about CROSS APPLY is that if you're only returning one row and the outer references are all real tables, you have an equivalent in SQL 2000 by moving the subquery into the SELECT list of a derived table:
SELECT
Min(T.dt) StartTime,
Max(T.EndTime) EndTime,
T.tapeID,
Datediff(Minute, Min(T.dt), Max(T.EndTime)) TotalBackupDuration,
Count(*) - 1 NumberOfRestarts
FROM (
SELECT
T.*,
(SELECT TOP 1 E.dt
FROM #T E
WHERE
T.tapeID = E.tapeID
AND E.BackupStatus = 'End'
AND E.dt > T.dt
ORDER BY E.dt
) EndTime
FROM #T T
WHERE T.BackupStatus = 'Start'
) T
GROUP BY
T.tapeID,
IsNull(T.EndTime, T.dt)
For outer references that are not all real tables (you want a calculated value from another subquery's outer reference) you have to add nested derived tables to accomplish this.
I finally bit the bullet and did some real testing. I used SPFiredrake's table population script to see the actual performance with a large amount of data. I did it programmatically so there are no typing errors. I took 10 executions each, and threw out the worst and best value for each column, then averaged the remaining 8 column values for that statistic.
The indexes were created after populating the table, with 100% fill factor. The Indexes column shows 1 when just the clustered index is present. It shows 2 when the nonclustered index on BackupStatus is added.
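The index definitions themselves are not listed; based on that description, they were presumably along these lines (index names and key columns are my guesses):

```sql
-- "1 index": clustered index on the temp table
CREATE CLUSTERED INDEX CIX_T ON #T (tapeID, dt);

-- "2 indexes": the added nonclustered index on BackupStatus
CREATE NONCLUSTERED INDEX IX_T_BackupStatus
    ON #T (BackupStatus) INCLUDE (tapeID, dt);
```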
To exclude client network data transfer from the testing, I selected each query into variables like so:
DECLARE
    @StartTime datetime,
    @EndTime datetime,
    @TapeID varchar(5),
    @Duration int,
    @Restarts int;

WITH A AS (
    -- The query here
)
SELECT
    @StartTime = StartTime,
    @EndTime = EndTime,
    @TapeID = TapeID,
    @Duration = TotalBackupDuration,
    @Restarts = NumberOfRestarts
FROM A;
I also trimmed the table column lengths to something more reasonable: tapeID varchar(5), BackupStatus varchar(5). In fact, the BackupStatus should be a bit column, and the tapeID should be an integer. But we'll stick with varchar for the time being.
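A sketch of the tightened-up schema that paragraph describes (hypothetical; all the queries in this thread use the varchar version):

```sql
CREATE TABLE #T (
    dt datetime NOT NULL,
    tapeID int NOT NULL,          -- 'ID33' stored as 33
    BackupStatus bit NOT NULL     -- 0 = Start, 1 = End
);
```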
Server Indexes UserName Reads Writes CPU Duration
--------- ------- ------------- ------ ------ ----- --------
x86 VM 1 ErikE 97219 0 599 325
x86 VM 1 Gordon Linoff 606 0 63980 54638
x86 VM 1 SPFiredrake 344927 260 23621 13105
x86 VM 2 ErikE 96388 0 579 324
x86 VM 2 Gordon Linoff 251443 0 22775 11830
x86 VM 2 SPFiredrake 197845 0 11602 5986
x64 Beefy 1 ErikE 96745 0 919 61
x64 Beefy 1 Gordon Linoff 320012 70 62372 13400
x64 Beefy 1 SPFiredrake 362545 288 20154 1686
x64 Beefy 2 ErikE 96545 0 685 164
x64 Beefy 2 Gordon Linoff 343952 72 65092 17391
x64 Beefy 2 SPFiredrake 198288 0 10477 924
Notes:
x86 VM: an almost idle virtual machine, Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86)
x64 Beefy: a quite beefy and possibly very busy Microsoft SQL Server 2008 R2 (RTM) - 10.50.1765.0 (X64)
The second index helped all the queries, mine the least.
It is interesting that Gordon's initially low number of reads on one server was high on the second, but it had a lower duration, so it obviously picked a different execution plan, probably due to having more resources to search the possible plan space faster (being a beefier server). But the index raised the number of reads because that plan lowered the CPU cost by a ton and so costed out less in the optimizer.
What you need to do is to assign the next end date to all starts. Then count the number of starts in-between.
select tstart.datetime as starttime, min(tend.datetime) as endtime, tstart.tapeid
from (select *
from t
where BackupStatus = 'Start'
) tstart join
(select *
from t
where BackupStatus = 'End'
) tend
on tstart.tapeid = tend.tapeid and
tend.datetime >= tstart.datetime
This is close, but we have multiple rows for each end time (depending on the number of starts). To handle this, we need to group by the tapeid and the end time:
select min(a.starttime) as starttime, a.endtime, a.tapeid,
datediff(s, min(a.starttime), endtime), -- NOT CORRECT, DATABASE SPECIFIC
count(*) - 1 as NumRestarts
from (select tstart.dt as starttime, min(tend.dt) as endtime, tstart.tapeid
from (select *
from #t
where BackupStatus = 'Start'
) tstart join
(select *
from #t
where BackupStatus = 'End'
) tend
on tstart.tapeid = tend.tapeid and
tend.dt >= tstart.dt
group by tstart.dt, tstart.tapeid
) a
group by a.endtime, a.tapeid
I've written this version using SQL Server syntax. To create the test table, you can use:
create table #t (
dt datetime,
tapeID varchar(255),
BackupStatus varchar(255)
)
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 3:00', 'ID33', 'Start')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 3:05', 'ID34', 'Start')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 3:10', 'ID35', 'Start')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 4:05', 'ID34', 'End')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 4:10', 'ID33', 'Start')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 5:05', 'ID33', 'End')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 5:10', 'ID34', 'Start')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-09 6:00', 'ID34', 'End')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-10 4:00', 'ID35', 'Start')
insert into #t (dt, tapeID, BackupStatus) values ('2012-07-11 5:00', 'ID35', 'End')
Thought I'd take a stab at it. Tested out Gordon Linoff's solution, and it doesn't quite calculate correctly for tapeID 33 in his own example (matches to the next start, not the corresponding end).
My attempt assumes you're using SQL server 2005+, as it utilizes CROSS/OUTER APPLY. If you need it for server 2000 I could probably swing it, but this seemed like the cleanest solution to me (as you're starting with all end elements and matching the first start elements to get the result). I'll annotate as well so you can understand what I'm doing.
SELECT
startTime, endT.dt endTime, endT.tapeID, DATEDIFF(s, startTime, endT.dt), restarts
FROM
#t endT -- Main source, getting all 'End' records so we can match.
OUTER APPLY ( -- Match possible previous 'End' records for the tapeID
SELECT TOP 1 dt
FROM #t
WHERE dt < endT.dt AND tapeID = endT.tapeID
AND BackupStatus = 'End') g
CROSS APPLY (SELECT ISNULL(g.dt, CAST(0 AS DATETIME)) dt) t
CROSS APPLY (
-- Match 'Start' records between our 'End' record
-- and our possible previous 'End' record.
SELECT MIN(dt) startTime,
COUNT(*) - 1 restarts -- Restarts, so -1 for the first 'Start'
FROM #t
WHERE tapeID = endT.tapeID AND BackupStatus = 'Start'
-- This is where our previous possible 'End' record is considered
AND dt > t.dt AND dt < endt.dt) starts
WHERE
endT.BackupStatus = 'End'
Edit: Test data generation script found at this link.
So decided to run some data against the three methods, and found that ErikE's solution is fastest, mine is a VERY close second, and Gordon's is just inefficient for any sizable set (even when working with 1000 records, it started showing slowness). For smaller sets (at about 5k records), my method wins over Erik's, but not by much. Honestly, I like my method as it doesn't require any additional aggregate functions to get the data, but ErikE's wins in the efficiency/speed battle.
Edit 2: For 55k records in the table (and 12k matching start/end pairs), Erik's takes ~0.307s and mine takes ~0.157s (averaging over 50 attempts). I was a little surprised about this, because I would've assumed that individual runs would've translated to the overall, but I guess the index cache is being better utilized by my query so subsequent hits are less expensive. Looking at the execution plans, ErikE's only has 1 branch off the main path, so he's ultimately working with a larger set for most of the query. I have 3 branches that combine closer to the output, so I'm churning on less data at any given moment and combine right at the end.
Make it very simple: build one subquery for the Start events and another for the End events, compute a RANK in each set for each row, then LEFT JOIN the two subqueries:
-- QUERY
WITH CTE as
(
    SELECT dt
         , ID
         , status
         , RANK () OVER (PARTITION BY ID ORDER BY DT) as rnk1
         , RANK () OVER (PARTITION BY status ORDER BY DT) as rnk2
    FROM INT_backup
)
SELECT *
FROM CTE
ORDER BY id, rnk2
SELECT *
FROM
(
SELECT dt
, ID
, status
, rank () over (PARTITION by ID ORDER BY dt) as rnk
FROM INT_backup
WHERE status = 'start'
) START_SET
LEFT JOIN
(
SELECT dt
, ID
, status
, rank () over (PARTITION by ID ORDER BY dt) as rnk
FROM INT_backup
where status = 'END'
) END_SET
ON Start_Set.ID = End_SET.ID
AND Start_Set.Rnk = End_Set.rnk

Summing historic cost rates over booked time (single effective date)

We have a time management system where our employees or contractors (resources) enter the hours they have worked, and we derive a cost for it. I have a table with the historic costs:
CREATE TABLE ResourceTimeTypeCost (
ResourceCode VARCHAR(32),
TimeTypeCode VARCHAR(32),
EffectiveDate DATETIME,
CostRate DECIMAL(12,2)
)
So I have one date field which marks the effective date. If we have a record which is
('ResourceA', 'Normal', '2012-04-30', 40.00)
and I add a record which is
('ResourceA', 'Normal', '2012-05-04', 50.00)
So all hours entered between the 30th of April and the 3rd of May will be at £40.00, and all time after midnight on the 4th will be at £50.00. I understand this in principle, but how do you write a query expressing this logic?
Assuming my time table looks like the below
CREATE TABLE TimeEntered (
ResourceCode VARCHAR(32),
TimeTypeCode VARCHAR(32),
ProjectCode VARCHAR(32),
ActivityCode VARCHAR(32),
TimeEnteredDate DATETIME,
HoursWorked DECIMAL(12,2)
)
If I insert the following records into the TimeEntered table
('ResourceA','Normal','Project1','Management1','2012-04-30',7.5)
('ResourceA','Normal','Project1','Management1','2012-05-01',7.5)
('ResourceA','Normal','Project1','Management1','2012-05-02',7.5)
('ResourceA','Normal','Project1','Management1','2012-05-03',7.5)
('ResourceA','Normal','Project1','Management1','2012-05-04',7.5)
('ResourceA','Normal','Project1','Management1','2012-05-07',7.5)
('ResourceA','Normal','Project1','Management1','2012-05-08',7.5)
I'd like to get a query that returns the total cost by resource
So in the case above it would be 'ResourceA', (4 * 7.5 * 40) + (3 * 7.5 * 50) = 2325.00
Can anyone provide a sample SQL query? I know this example doesn't make use of TimeType (i.e. it's always 'Normal') but I'd like to see how this is dealt with as well
I can't change the structure of the database. Many thanks in advance
IF OBJECT_ID ('tempdb..#ResourceTimeTypeCost') IS NOT NULL DROP TABLE #ResourceTimeTypeCost
CREATE TABLE #ResourceTimeTypeCost ( ResourceCode VARCHAR(32), TimeTypeCode VARCHAR(32), EffectiveDate DATETIME, CostRate DECIMAL(12,2) )
INSERT INTO #ResourceTimeTypeCost
SELECT 'ResourceA' as resourcecode, 'Normal' as timetypecode, '2012-04-30' as effectivedate, 40.00 as costrate
UNION ALL
SELECT 'ResourceA', 'Normal', '2012-05-04', 50.00
IF OBJECT_ID ('tempdb..#TimeEntered') IS NOT NULL DROP TABLE #TimeEntered
CREATE TABLE #TimeEntered ( ResourceCode VARCHAR(32), TimeTypeCode VARCHAR(32), ProjectCode VARCHAR(32), ActivityCode VARCHAR(32), TimeEnteredDate DATETIME, HoursWorked DECIMAL(12,2) )
INSERT INTO #TimeEntered
SELECT 'ResourceA','Normal','Project1','Management1','2012-04-30',7.5
UNION ALL SELECT 'ResourceA','Normal','Project1','Management1','2012-05-01',7.5
UNION ALL SELECT 'ResourceA','Normal','Project1','Management1','2012-05-02',7.5
UNION ALL SELECT 'ResourceA','Normal','Project1','Management1','2012-05-03',7.5
UNION ALL SELECT 'ResourceA','Normal','Project1','Management1','2012-05-04',7.5
UNION ALL SELECT 'ResourceA','Normal','Project1','Management1','2012-05-07',7.5
UNION ALL SELECT 'ResourceA','Normal','Project1','Management1','2012-05-08',7.5
;with ranges as
(
select
resourcecode
,TimeTypeCode
,EffectiveDate
,costrate
,row_number() OVER (PARTITION BY resourcecode,timetypecode ORDER BY effectivedate ASC) as row
from #ResourceTimeTypeCost
)
,ranges2 AS
(
SELECT
r1.resourcecode
,r1.TimeTypeCode
,r1.EffectiveDate
,r1.costrate
,r1.effectivedate as start_date
,ISNULL(DATEADD(ms,-3,r2.effectivedate),GETDATE()) as end_date
FROM ranges r1
LEFT OUTER JOIN ranges r2 on r2.row = r1.row + 1 --joins onto the next date row
AND r2.resourcecode = r1.resourcecode
AND r2.TimeTypeCode = r1.TimeTypeCode
)
SELECT
tee.resourcecode
,tee.timetypecode
,tee.projectcode
,tee.activitycode
,SUM(ranges2.costrate * tee.hoursworked) as total_cost
FROM #TimeEntered tee
INNER JOIN ranges2 ON tee.TimeEnteredDate >= ranges2.start_date
AND tee.TimeEnteredDate <= ranges2.end_date
AND tee.resourcecode = ranges2.resourcecode
AND tee.timetypecode = ranges2.TimeTypeCode
GROUP BY tee.resourcecode
,tee.timetypecode
,tee.projectcode
,tee.activitycode
What you have is a cost table that is, as some would say, a slowly changing dimension. First, it will help to have an effective and end date for the cost table. We can get this by doing a self join and group by:
with costs as
     (select c.ResourceCode, c.TimeTypeCode, c.CostRate,
             c.EffectiveDate as effdate,
             isnull(dateadd(day, -1, min(c1.EffectiveDate)), getdate()) as endDate
      from ResourceTimeTypeCost c left outer join
           ResourceTimeTypeCost c1
           on c1.ResourceCode = c.ResourceCode and
              c1.TimeTypeCode = c.TimeTypeCode and
              c1.EffectiveDate > c.EffectiveDate
      group by c.ResourceCode, c.TimeTypeCode, c.CostRate, c.EffectiveDate
     )
Although you say you cannot change the table structure, when you have a slowly changing dimension, having an effective and end date is good practice.
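If the structure could be changed, the slowly-changing-dimension shape the answer recommends would look something like this (hypothetical DDL, not the question's actual schema):

```sql
CREATE TABLE ResourceTimeTypeCost (
    ResourceCode  VARCHAR(32),
    TimeTypeCode  VARCHAR(32),
    EffectiveDate DATETIME,
    EndDate       DATETIME NULL,   -- NULL = rate is still current
    CostRate      DECIMAL(12,2)
);
```

With an explicit EndDate maintained on insert, every costing query becomes a simple BETWEEN join with no self-join needed.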
Now, you can use this information with TimeEntered as follows:
select te.*, c.CostRate * te.HoursWorked as dayCost
from TimeEntered te join
     costs c
     on te.ResourceCode = c.ResourceCode and
        te.TimeEnteredDate between c.effdate and c.endDate
To summarize by Resource for a given time range, the full query would look like:
with costs as
     (select c.ResourceCode, c.TimeTypeCode, c.CostRate,
             c.EffectiveDate as effdate,
             isnull(dateadd(day, -1, min(c1.EffectiveDate)), getdate()) as endDate
      from ResourceTimeTypeCost c left outer join
           ResourceTimeTypeCost c1
           on c1.ResourceCode = c.ResourceCode and
              c1.TimeTypeCode = c.TimeTypeCode and
              c1.EffectiveDate > c.EffectiveDate
      group by c.ResourceCode, c.TimeTypeCode, c.CostRate, c.EffectiveDate
     ),
te as
     (select te.*, c.CostRate * te.HoursWorked as dayCost
      from TimeEntered te join
           costs c
           on te.ResourceCode = c.ResourceCode and
              te.TimeTypeCode = c.TimeTypeCode and
              te.TimeEnteredDate between c.effdate and c.endDate
     )
select te.ResourceCode, sum(dayCost)
from te
where te.TimeEnteredDate >= <date1> and te.TimeEnteredDate < <date2>
group by te.ResourceCode
You might give this a try. CROSS APPLY will find the first ResourceTimeTypeCost row with an older or equal date and the same ResourceCode and TimeTypeCode as the current record from TimeEntered.
SELECT te.ResourceCode,
te.TimeTypeCode,
te.ProjectCode,
te.ActivityCode,
te.TimeEnteredDate,
te.HoursWorked,
te.HoursWorked * rttc.CostRate Cost
FROM TimeEntered te
CROSS APPLY
(
-- First one only
SELECT top 1 CostRate
FROM ResourceTimeTypeCost
WHERE te.ResourceCode = ResourceTimeTypeCost.ResourceCode
AND te.TimeTypeCode = ResourceTimeTypeCost.TimeTypeCode
AND te.TimeEnteredDate >= ResourceTimeTypeCost.EffectiveDate
-- By most recent date
ORDER BY ResourceTimeTypeCost.EffectiveDate DESC
) rttc
Unfortunately I can no longer find the article on MSDN, hence the blog link above.
Live test @ SQL Fiddle.

SQL query that selects effective costing rate based on charge date

SQL novice here. I'm trying to generate a costing query that outputs employee time card information and calculates cost based on an effective employee costing rate.
My question is similar to the one asked here: Retro-active effective date changes with overlapping dates but I'm not dealing with retro-activity or overlapping date ranges.
Table examples (null values in the rate table indicate current rate):
CREATE TABLE Emp_Rate
(
Emp int,
Rate money,
Rate_Start datetime,
Rate_Exp datetime
)
CREATE TABLE Emp_Time
(
Emp int,
Chrg_Date datetime,
Chrg_Code varchar(10),
Chrg_Hrs decimal(8, 2)
)
Insert into Emp_Rate (Emp,Rate,Rate_Start,Rate_Exp) Values ('1','20','5/1/09','4/30/10')
Insert into Emp_Rate (Emp,Rate,Rate_Start,Rate_Exp) Values ('1','21','5/1/10','4/30/11')
Insert into Emp_Rate (Emp,Rate,Rate_Start,Rate_Exp) Values ('1','22','5/1/11',NULL)
Insert into Emp_Time (Emp,Chrg_Date,Chrg_Code,Chrg_Hrs) Values ('1','5/10/09','B','8')
Insert into Emp_Time (Emp,Chrg_Date,Chrg_Code,Chrg_Hrs) Values ('1','5/10/10','B','8')
Insert into Emp_Time (Emp,Chrg_Date,Chrg_Code,Chrg_Hrs) Values ('1','5/10/11','B','8')
The query (returns dupes caused by the multiple rate entries, obviously):
Select Emp_Time.Emp,
Cast(Emp_Time.Chrg_Date as DATE) as 'Chrg_Date',
Emp_Time.Chrg_Code,
Emp_Time.Chrg_Hrs,
Emp_Rate.Rate,
Emp_Time.Chrg_Hrs * Emp_Rate.Rate as 'Cost'
From Emp_Time inner join
Emp_Rate on Emp_Rate.Emp = Emp_Time.Emp
Order By [Emp],[Chrg_Date]
Desired output:
Emp Chrg_Date Chrg_Code Chrg_Hrs Rate Cost
1 2009-05-10 B 8.00 20.00 160.00
1 2010-05-10 B 8.00 21.00 168.00
1 2011-05-10 B 8.00 22.00 176.00
I've gone around in circles using the Between operator in a sub query to isolate the correct rate based on the charge date, but have not had any luck.
I appreciate any help!
You didn't specify the DBMS; the answer below is for SQL Server. I am sure there are other ways to do this, but this way replaces the null Rate_Exp date with the current date.
Select et.Emp,
Cast(et.Chrg_Date as DATEtime) as 'Chrg_Date',
et.Chrg_Code,
et.Chrg_Hrs,
er.Rate,
et.Chrg_Hrs * er.Rate as 'Cost'
From Emp_Time et
inner join
(
SELECT Emp
, Rate
, Rate_Start
, CASE
WHEN Rate_Exp is Null
THEN Convert(varchar(10), getdate(), 101)
ELSE Rate_Exp
END as Rate_Exp
FROM Emp_Rate
)er
on er.Emp = et.Emp
WHERE (et.Chrg_Date BETWEEN er.Rate_Start AND er.Rate_Exp)
Order By et.Emp,et.Chrg_Date
OR use the CASE Statement in your WHERE Clause:
Select et.Emp,
Cast(et.Chrg_Date as DATEtime) as 'Chrg_Date',
et.Chrg_Code,
et.Chrg_Hrs,
er.Rate,
et.Chrg_Hrs * er.Rate as 'Cost'
From Emp_Time et
inner join Emp_Rate er
on er.Emp = et.Emp
WHERE (et.Chrg_Date
BETWEEN er.Rate_Start
AND CASE WHEN er.Rate_Exp Is Null
THEN Convert(varchar(10), getdate(), 101)
ELSE er.Rate_Exp END)
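As a variation (an untested sketch), ISNULL can replace the CASE expression, since both branches just substitute the current date when Rate_Exp is NULL:

```sql
SELECT et.Emp,
       CAST(et.Chrg_Date AS DATETIME) AS 'Chrg_Date',
       et.Chrg_Code,
       et.Chrg_Hrs,
       er.Rate,
       et.Chrg_Hrs * er.Rate AS 'Cost'
FROM Emp_Time et
     INNER JOIN Emp_Rate er ON er.Emp = et.Emp
-- ISNULL treats a NULL expiry as "still in effect today"
WHERE et.Chrg_Date BETWEEN er.Rate_Start AND ISNULL(er.Rate_Exp, GETDATE())
ORDER BY et.Emp, et.Chrg_Date
```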