SQL report to show gaps between record entries - sql

I wrote an application where the usage of a number of vehicles is recorded as seen in the screenshot.
I'd like to generate a report for gaps (in miles) between individual usages because the vehicles should not be used other than for travel that is recorded in the application.
A gap would occur when the current record beg odometer - previous record end odometer yields a number greater than 0 -- for that specific car. See the different colored circles. How can I achieve this with sql? I'm using oracle (11g) but I imagine the sql will be similar. Thank you.
Sample output:
Vehicle V06
Invoice Date Dest Gap
123 1/2/14 York 14.0
122 1/1/14 Pburg 0.0
Vehicle V05
Invoice Date Dest Gap
121 1/3/14 Mill 0.0
* I realize I should have used test data that includes a gap, though these should be rare in practice. In such a case,
Invoice 67189 would have End Od of 92590 resulting in a GAP of 3.0 miles for 67190

Just use lag():
select vu.*
from (select vu.*,
lag(endod) over (partition by vehicle order by date) as prev_endod
from vehicleusage vu
) vu
where begod <> prev_endod;
Note that the comparison will fail for NULL values, so there is no problem with the first recording for a vehicle.

SELECT
D1.invoiceNo,
D1.vehicalId,
CASE WHEN D1.odBeg > D0.odEnd THEN 'MISSING MILES' ELSE 'EXTRA MILES' END AS Notes
FROM
(SELECT vehicalId,invoiceNo, RANK() OVER(PARTITION BY vehicalId ORDER BY dateUsed) AS KEY_CUR, odBeg,odEnd FROM #DATA) AS D0
INNER JOIN
(SELECT vehicalId,invoiceNo, RANK() OVER(PARTITION BY vehicalId ORDER BY dateUsed) - 1 AS KEY_LAST, odBeg,odEnd FROM #DATA) AS D1
ON D0.vehicalId = D1.vehicalId AND D0.KEY_CUR = D1.KEY_LAST
WHERE
D1.odBeg <> D0.odEnd

Related

How to retrieve the earliest date from columns that have different dimensions?

DATA Explanation
I have two data tables, one (PAGE VIEWS) which represents user events (CV 1,2,3 etc) and associated timestamp with member ID. The second table (ORDERS) represents the orders made - event time & order value. Membership ID is available on each table.
Table 1 - PAGE VIEWS (1,000 Rows in Total)
Event_Day
Member ID
CV1
CV2
CV3
CV4
11/5/2021
115126
APP
camp1
Trigger
APP-camp1-Trigger
11/14/2021
189192
SEARCH
camp4
Search
SEARCH-camp4-Search
11/5/2021
193320
SEARCH
camp5
Search
SEARCH-camp5-Search
Table 2 - ORDERS (249 rows in total)
Date
Purchase Order ID
Membership Number
Order Value
7/12/2021
0088
183300
29.34
18/12/2021
0180
132159
132.51
4/12/2021
0050
141542
24.35
What I'm trying to answer
I'd like to attribute the CV columns (PAGE VIEWS) with the (ORDERS) order value, by the earliest event date in (PAGE VIEWS). This would be a simple attribution use case.
Visual explanation of the two data tables
Issues
I've spent the weekend result and scrolling through a variety of online articles but the closest is using the following query
Select min (event_day) As "first date",member_id,cv2,order_value,purchase_order_id
from mta_app_allpages,mta_app_orders
where member_id = membership_number
group by member_id,cv2,order_value,purchase_order_id;
The resulting data is correct using the DISTINCT function as Row 2 is different to Row 1, but I'd like to associate the result to Row 1 for member_id 113290, and row 3 for member_id 170897 etc.
Date
member_id
cv2
Order Value
2021-11-01
113290
camp5
58.81
2021-11-05
113290
camp4
58.51
2021-11-03
170897
camp3
36.26
2021-11-09
170897
camp5
36.26
2021-11-24
170897
camp1
36.26
Image showing the results table
I've tried using partition and sub query functions will little success. The correct call should return a maximum of 249 rows as that is as many rows as I have in the ORDERS table.
First-time poster so hopefully I have the format right. Many thanks.
Using RANK() is the best approach:
select * from
(
select *, RANK()OVER(partition by membership_number order by Event_Day) as rnk
from page_views as pv
INNER JOIN orders as o
ON pv.Member_ID=o.Membership_Number
) as q
where rnk=1
This will only fetch the minimum event_day.
However, you can use MIN() to achieve the same (but with complex sub-query):
select *
from
(select pv.*
from page_views as pv
inner join
(
select Member_ID, min(event_day) as mn_dt
from page_views
group by member_id
) as mn
ON mn.Member_ID=pv.Member_ID and mn.mn_dt=pv.event_day
)as sq
INNER JOIN orders as o
ON sq.Member_ID=o.Membership_Number
Both the queries will get us the same answer.
See the demo in db<>fiddle

Total Sum SQL Server

I have a query that collects many different columns, and I want to include a column that sums the price of every component in an order. Right now, I already have a column that simply shows the price of every component of an order, but I am not sure how to create this new column.
I would think that the code would go something like this, but I am not really clear on what an aggregate function is or why I get an error regarding the aggregate function when I try to run this code.
SELECT ID, Location, Price, (SUM(PriceDescription) FROM table GROUP BY ID WHERE PriceDescription LIKE 'Cost.%' AS Summary)
FROM table
When I say each component, I mean that every ID I have has many different items that make up the general price. I only want to find out how much money I spend on my supplies that I need for my pressure washers which is why I said `Where PriceDescription LIKE 'Cost.%'
To further explain, I have receipts of every customer I've worked with and in these receipts I write down my cost for the soap that I use and the tools for the pressure washer that I rent. I label all of these with 'Cost.' so it looks like (Cost.Water), (Cost.Soap), (Cost.Gas), (Cost.Tools) and I would like it so for Order 1 it there's a column that sums all the Cost._ prices for the order and for Order 2 it sums all the Cost._ prices for that order. I should also mention that each Order does not have the same number of Costs (sometimes when I use my power washer I might not have to buy gas and occasionally soap).
I hope this makes sense, if not please let me know how I can explain further.
`ID Location Price PriceDescription
1 Park 10 Cost.Water
1 Park 8 Cost.Gas
1 Park 11 Cost.Soap
2 Tom 20 Cost.Water
2 Tom 6 Cost.Soap
3 Matt 15 Cost.Tools
3 Matt 15 Cost.Gas
3 Matt 21 Cost.Tools
4 College 32 Cost.Gas
4 College 22 Cost.Water
4 College 11 Cost.Tools`
I would like for my query to create a column like such
`ID Location Price Summary
1 Park 10 29
1 Park 8
1 Park 11
2 Tom 20 26
2 Tom 6
3 Matt 15 51
3 Matt 15
3 Matt 21
4 College 32 65
4 College 22
4 College 11 `
But if the 'Summary' was printed on every line instead of just at the top one, that would be okay too.
You just require sum(Price) over(Partition by Location) will give total sum as below:
SELECT ID, Location, Price, SUM(Price) over(Partition by Location) AS Summed_Price
FROM yourtable
WHERE PriceDescription LIKE 'Cost.%'
First, if your Price column really contains values that match 'Cost.%', then you can not apply SUM() over it. SUM() expects a number (e.g. INT, FLOAT, REAL or DECIMAL). If it is text then you need to explicitly convert it to a number by adding a CAST or CONVERT clause inside the SUM() call.
Second, your query syntax is wrong: you need GROUP BY, and the SELECT fields are not specified correctly. And you want to SUM() the Price field, not the PriceDescription field (which you can't even sum as I explained)
Assuming that Price is numeric (see my first remark), then this is how it can be done:
SELECT ID
, Location
, Price
, (SELECT SUM(Price)
FROM table
WHERE ID = T1.ID AND Location = T1.Location
) AS Summed_Price
FROM table AS T1
to get exact result like posted in question
Select
T.ID,
T.Location,
T.Price,
CASE WHEN (R) = 1 then RN ELSE NULL END Summary
from (
select
ID,
Location,
Price ,
SUM(Price)OVER(PARTITION BY Location)RN,
ROW_number()OVER(PARTITION BY Location ORDER BY ID )R
from Table
)T
order by T.ID

Looking Up Value to Return Hourly Rate

I have a table storing hourly pay rates and a start and end value associated to each. The theory being that your hourly pay is dependent on your takings sitting between the start and end values.
Table Example - dbo.PayScales
PayScaleId Starting Ending HourlyRate
1 0.00 32.88 12.00
2 32.89 34.20 12.50
3 34.21 35.52 13.00
I have the takings stored in a separate table along with a person id, and I need to lookup the hourlyrate based on the takings (which I am having a complete mind block about)
Table Example - dbo.Employees
EmpId Takings HourlyRate
1 33.50
2 31.19
3 37.00
So my exepected results would be:
EmpId 1 Hourly rate = 12.50
EmpId 2 Hourly rate = 12.00
EmpId 3 Hourly rate = 13.00 as the value is greater than the ending value.
You can use CROSS APPLY together with TOP:
SELECT *
FROM dbo.Employees e
CROSS APPLY(
SELECT TOP 1 p.HourlyRate
FROM dbo.PayScales p
WHERE
e.Takings BETWEEN p.Starting AND p.Ending
OR e.Takings > p.Ending
ORDER BY p.Ending DESC
) t
ONLINE DEMO
Of course #FelixPamittan's answer solves the problem, but with a small change in your data it's down to a simple join.
Change the highest Ending to a really high value (999999999), greater than any Takings, or NULL:
FROM #Employees AS e
JOIN #PayScales AS p
ON e.Takings BETWEEN p.Starting AND p.Ending
-- or
ON e.Takings BETWEEN p.Starting AND COALESCE(p.Ending, 999999999)

SQL Server 2005 - SUM'ing one field, but only for the first occurence of a second field

Platform: SQL Server 2005 Express
Disclaimer: I’m quite a novice to SQL and so if you are happy to help with what may be a very simple question, then I won’t be offended if you talk slowly and use small words :-)
I have a table where I want to SUM the contents of multiple rows. However, I want to SUM one column only for the first occurrence of text in a different column.
Table schema for table 'tblMain'
fldOne {varchar(100)} Example contents: “Dandelion“
fldTwo {varchar(8)} Example contents: “01:00:00” (represents hh:mm:ss)
fldThree {numeric(10,0)} Example contents: “65”
Contents of table:
Row number fldOne fldTwo fldThree
------------------------------------------------
1 Dandelion 01:00:00 99
2 Daisy 02:15:00 88
3 Dandelion 00:45:00 77
4 Dandelion 00:30:00 10
5 Dandelion 00:15:00 200
6 Rose 01:30:00 55
7 Daisy 01:00:00 22
etc. ad nausium
If I use:
Select * from tblMain where fldTwo < ’05:00:00’ order by fldOne, fldTwo desc
Then all rows are correctly returned, ordered by fldOne and then fldTwo in descending order (although in the example data I've shown, all the data is already in the correct order!)
What I’d like to do is get the SUM of each fldThree, but only from the first occurrence of each fldOne.
So, SUM the first Dandelion, Daisy and Rose that I come across. E.g.
99+88+55
At the moment, I’m doing this programmatically; return a RecordSet from the Select statement above, and MoveNext through each returned row, only adding fldThree to my ‘total’ if I’ve never seen the text from fldOne before. It works, but most of the Select queries return over 100k rows and so it’s quite slow (slow being a relative term – it takes about 50 seconds on my setup).
The actual select statement (selecting about 100k rows from 1.5m total rows) completes in under a second which is fine. The current programatic loop is quite small and tight, it's just the number of loops through the RecordSet that takes time. I'm using adOpenForwardOnly and adLockReadOnly when I open the record set.
This is a routine that basically runs continuously as more data is added, and also the fldTwo 'times' vary, so I can't be more specific with the Select statement.
Everything that I’ve so far managed to do natively with SQL seems to run quickly and I’m hoping I can take the logic (and work) away from my program and get SQL to take the strain.
Thanks in advance
The best way to approach this is with window functions. These let you enumerate the rows within a group. However, you need some way to identify the first row. SQL tables are inherently unordered, so you need a column to specify the ordering. Here are some ideas.
If you have an id column, which is defined as an identity so it is autoincremented:
select sum(fldThree)
from (select m.*,
row_number() over (partition by fldOne order by id) as seqnum
from tblMain m
) m
where seqnum = 1
To get an arbitrary row, you could use:
select sum(fldThree)
from (select m.*,
row_number() over (partition by fldOne order by (select NULL as noorder)) as seqnum
from tblMain m
) m
where seqnum = 1
Or, if FldTwo has the values in reverse order:
select sum(fldThree)
from (select m.*,
row_number() over (partition by fldOne order by FldTwo desc) as seqnum
from tblMain m
) m
where seqnum = 1
Maybe this?
SELECT SUM(fldThree) as ExpectedSum
FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY fldOne ORDER BY fldTwo DSEC) Rn
FROM tblMain) as A
WHERE Rn = 1

SQL Rounding Percentages to make the sum 100% - 1/3 as 0.34, 0.33, 0.33

I am currently trying to split one value with percentage column. But as most of percentages values are 1/3, I am not able to get aboslute 100% with two decimal points in the value. For example:
Product Supplier percentage totalvalue customer_split
decimal(15,14) (decimal(18,2) decimal(18,2)
-------- -------- ------------ --------------- ---------------
Product1 Supplier1 0.33 10.00 3.33
Product1 Supplier2 0.33 10.00 3.33
Product1 Supplier3 0.33 10.00 3.33
So, here we are missing 0.01 in the value column and suppliers would like to put this missing 0.01 value against any one of the supplier randomly. I have been trying to get this done in a two sets of SQLs with temporary tables, but is there any simple way of doing this. If possible how can I get 0.34 in the percentage column itself for one of the above rows? 0.01 is negligible value, but when the value column is 1000000000 it is significant.
It sounds like you're doing some type of "allocation" here. This is a common problem any time you are trying to allocate something from a higher granulartiy to a lower granularity, and you need to be able to re-aggregate to the total value correctly.
This becomes a much bigger problem when dealing with larger fractions.
For example, if I try to divide a total value of, say $55.30 by eight, I get a decimal value of $6.9125 for each of the eight buckets. Should I round one to $6.92 and the rest to $6.91? If I do, I will lose a cent. I would have to round one to $6.93 and the others to $6.91. This gets worse as you add more buckets to divide by.
In addition, when you start to round, you introduce problems like "Should 33.339 be rounded to 33.34 or 33.33?"
If your business logic is such that you just want to take whatever remainder beyond 2 significant digits may exist and add it to one of the dollar values "randomly" so you don't lose any cents, #Diego is on the right track with this.
Doing it in pure SQL is a bit more difficult. For starters, your percentage isn't 1/3, it's .33, which will yield a total value of 9.9, not 10. I would either store this as a ratio or as a high-precision decimal field (.33333333333333).
P S PCT Total
-- -- ------------ ------
P1 S1 .33333333333 10.00
P2 S2 .33333333333 10.00
P3 S3 .33333333333 10.00
SELECT
BaseTable.P, BaseTable.S,
CASE WHEN BaseTable.S = TotalTable.MinS
THEN BaseTable.BaseAllocatedValue + TotalTable.Remainder
ELSE BaseTable.BaseAllocatedValue
END As AllocatedValue
FROM
(SELECT
P, S, FLOOR((PCT * Total * 100)) / 100 as BaseAllocatedValue,
FROM dataTable) BaseTable
INNER JOIN
(SELECT
P, MIN(S) AS MinS,
SUM((PCT * Total) - FLOOR((PCT * Total * 100)) / 100) as Remainder,
FROM dataTable
GROUP BY P) as TotalTable
ON (BaseTable.P = TotalTable.P)
It appears your calculation is an equal distribution based on the total number of products per supplier. If it is, it may be advantageous to remove the percentage and instead just store the count of items per supplier in the table.
If it is also possible to store a flag indicating the row that should get the remainder value applied to it, you could assign based on that flag instead of randomly.
run this, it will give an idea on how you can solve your problem.
I created a table called orders just with an ID to be easy to understand:
create table orders(
customerID int)
insert into orders values(1)
go 3
insert into orders values(2)
go 3
insert into orders values(3)
go 3
these values represent the 33% you have
1 33.33
2 33.33
3 33.33
now:
create table #tempOrders(
customerID int,
percentage numeric(10,2))
declare #maxOrder int
declare #maxOrderID int
select #maxOrderID = max(customerID) from orders
declare #total numeric(10,2)
select #total =count(*) from orders
insert into #tempOrders
select customerID, cast(100*count(*)/#total as numeric(10,2)) as Percentage
from orders
group by customerID
update #tempOrders set percentage = percentage + (select 100-sum(Percentage) from #tempOrders)
where customerID =#maxOrderID
this code will basically calculate the percentage and the order with the max ID, then it gets the diference from 100 to the percentage sum and add it to the order with the maxID (your random order)
select * from #tempOrders
1 33.33
2 33.33
3 33.34
This should be an easy task using Windowed Aggregate Functions. You probably use them already for the calculation of customer_split:
totalvalue / COUNT(*) OVER (PARTITION BY Product) as customer_split
Now sum up the customer_splits and if there's a difference to total value add (or substract) it to one random row.
SELECT
Product
,Supplier
,totalvalue
,customer_split
+ CASE
WHEN COUNT(*)
OVER (PARTITION BY Product
ROWS UNBOUNDED PRECEDING) = 1 -- get a random row, using row_number/order you might define a specific row
THEN totalvalue - SUM(customer_split)
OVER (PARTITION BY Product)
ELSE 0
END
FROM
(
SELECT
Product
,Supplier
,totalvalue
,totalvalue / COUNT(*) OVER (PARTITION BY Product) AS customer_split
FROM dropme
) AS dt
After more than one trial and test i think i found better solution
Idea
Get Count of all(Count(*)) based on your conditions
Get Row_Number()
Check if (Row_Number() value < Count(*))
Then select round(curr_percentage,2)
Else
Get sum of all other percentage(with round) and subtract it from 100
This steps will select current percentage every time EXCEPT Last one will be
100 - the sum of all other percentages
this is part of my code
Select your_cols
,(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] =#E_ID)
AS cnt_all
,(ROW_NUMBER() over ( order by pe.p_id)) as row_num
,Case when (
(ROW_NUMBER() over ( order by pe.p_id)) <
(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] =#E_ID))
then round(([partnership_partners_perc]*100),2)
else
100-
((select sum(round(([partnership_partners_perc]*100),2)) FROM [dbo].
[tbl_Partner_Entity] PEE where [E_ID] =#E_ID and pee.P_ID != pe.P_ID))
end AS [partnership_partners_perc_Last]
FROM [dbo].[tbl_Partner_Entity] PE
where [E_ID] =#E_ID