Finding the next occurrence of a value in a table - sql

Sorry in advance if this has already been covered.
I am working on a database which isn't particularly well structured, but it is owned by a third party and cannot be changed.
I need some assistance with T-SQL in finding the next occurrence of a value within the table and returning records based on the result. Let me first explain the data. I have simplified this to make it easier to understand.
Polref  Effective Date  Transaction Type  Suffix  Value
ABCD1   01/06/2010      New Bus           1       175.00
ABCD1   01/06/2011      Ren               2       200.00
ABCD1   19/08/2011      Adjust            3       50.00
ABCD1   23/04/2012      Adjust            4       50.00
ABCD1   01/06/2012      Ren               5       275.00
So if I ran my query for 2011, the code would need to return, in this example, the rows with suffix 2, 3 and 4. What I have been trying to do is find the first suffix of a New Bus or Ren for the specified year, then find the next suffix of a New Bus or Ren for the same polref, and then use those two suffix values to limit my recordset. It ain't working!!
I can't use MAX() as transactions for 2013 have already been added to the system, so I would get more records than I actually need.
The result I should be expecting for this example data would be:
ABCD1 300.00
Any help would be greatly appreciated.
To answer another question: if I select 2011 as my year to run the report, there should only be one New Bus or Ren transaction for 2011. If it's a New Bus transaction, the next main transaction will be a Ren; if it's a Ren, the next main transaction will also be a Ren. Again, in my example above, if I run for 2011, it should find the Ren from 01/06/2011, so I want to return that Ren and the two Adjust records.
Sorry, I've not used this forum before so apologies if I was a little vague.
The table I am using has many polrefs, so I need this code to calculate totals for all polrefs that fall within the date range. Some polrefs may only have one row, a New Bus; some will have many rows, depending on how many adjustments have been made throughout the year of the policy.

Partial answer:
This query:
create table #t (PolRef char(5) not null, EffectiveDate date not null,TransactionType varchar(10) not null,Suffix int not null,Value decimal(10,2) not null)
insert into #t (Polref,EffectiveDate,TransactionType,Suffix,Value) values
('ABCD1','20100601','New Bus',1,175.00),
('ABCD1','20110601','Ren',2,200.00),
('ABCD1','20110819','Adjust',3,50.00),
('ABCD1','20120423','Adjust',4,50.00),
('ABCD1','20120601','Ren',5,275.00)
;With StartTransactions as (
select PolRef,Suffix,ROW_NUMBER() OVER (PARTITION BY PolRef ORDER BY Suffix) rn
from #t where TransactionType in ('New Bus','Ren')
), Periods as (
select st1.PolRef,st1.Suffix as StartSuffix,st2.Suffix as EndSuffix
from
StartTransactions st1
left join
StartTransactions st2
on
st1.PolRef = st2.PolRef and
st1.rn = st2.rn - 1
)
select
p.PolRef,t2.EffectiveDate,SUM(t.Value) as Total
from
Periods p
inner join
#t t
on
p.PolRef = t.PolRef and
p.StartSuffix <= t.Suffix and
(p.EndSuffix > t.Suffix or
p.EndSuffix is null)
inner join
#t t2
on
p.PolRef = t2.PolRef and
t2.Suffix = p.StartSuffix
group by
p.PolRef,t2.EffectiveDate
Groups each set of transactions based on each successive Ren or New Bus transaction:
PolRef EffectiveDate Total
------ ------------- ---------------------------------------
ABCD1 2010-06-01 175.00
ABCD1 2011-06-01 300.00
ABCD1 2012-06-01 275.00
From that, it should be trivial to e.g. select out only the ones you're interested in from a particular year. But your question is still vague on some specifics, so I'm not taking it any further at this point.
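As a rough illustration of that last step, here is a minimal sketch that keeps only the period starting in a chosen year. It is not the answerer's code: it runs the same period logic on SQLite through Python's sqlite3 module so it can be tried without a SQL Server instance, stores dates as ISO text, and the function name totals_for_year is made up for the example. In T-SQL the year filter would more likely be written with YEAR(EffectiveDate). It also assumes, as the question states, at most one New Bus or Ren per polref per year.

```python
import sqlite3

def totals_for_year(rows, year):
    """Sum each policy period whose New Bus/Ren row falls in the given year."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (PolRef text, EffectiveDate text, "
        "TransactionType text, Suffix int, Value real)"
    )
    con.executemany("insert into t values (?,?,?,?,?)", rows)
    sql = """
    with starts as (
        -- number the New Bus / Ren rows per policy
        select PolRef, Suffix, EffectiveDate,
               row_number() over (partition by PolRef order by Suffix) as rn
        from t
        where TransactionType in ('New Bus', 'Ren')
    ), periods as (
        -- pair each start row with the next start row (null if none yet)
        select s1.PolRef, s1.EffectiveDate,
               s1.Suffix as StartSuffix, s2.Suffix as EndSuffix
        from starts s1
        left join starts s2
          on s1.PolRef = s2.PolRef and s2.rn = s1.rn + 1
    )
    select p.PolRef, sum(t.Value)
    from periods p
    join t
      on t.PolRef = p.PolRef
     and t.Suffix >= p.StartSuffix
     and (p.EndSuffix is null or t.Suffix < p.EndSuffix)
    where substr(p.EffectiveDate, 1, 4) = ?   -- assumption: ISO text dates
    group by p.PolRef
    order by p.PolRef
    """
    result = con.execute(sql, (str(year),)).fetchall()
    con.close()
    return result

rows = [
    ("ABCD1", "2010-06-01", "New Bus", 1, 175.00),
    ("ABCD1", "2011-06-01", "Ren", 2, 200.00),
    ("ABCD1", "2011-08-19", "Adjust", 3, 50.00),
    ("ABCD1", "2012-04-23", "Adjust", 4, 50.00),
    ("ABCD1", "2012-06-01", "Ren", 5, 275.00),
]
print(totals_for_year(rows, 2011))
```

For the sample data and 2011, this picks up suffixes 2, 3 and 4, which matches the 300.00 the question expects.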

Related

Duplicate rows because 1 column has multiple distinct values

I'm running a SELECT query to get data across multiple tables in the same server instance. However I've just noticed that the rows pulled on some data get duplicated because the main table I'm pulling from has a few different values in one of the columns. Here's the query:
SELECT DISTINCT BIF030.C_ACCOUNT AS ACCOUNTNUMBER,
BIF003.C_ACCOUNTTYPE AS ACCOUNTTYPECODE,
CON013.C_DESCRIPTION AS ACCOUNTTYPE,
BIF003.C_DIVISION AS ZONE_DIVISIONCODE,
CON028.C_DESCRIPTION AS ZONE_DIVISION,
BIF030.C_METER as METERNUMBER,
BIF005.C_METERCUSTOM1 AS REGISTERNUMBER,
CONVERT(DECIMAL(20,2), BIF030.N_CONSUMP) AS CONSUMPTION,
CON007.C_DESCRIPTION AS UNITS,
BIF030.T_READDATE AS READINGDATE,
MONTH(BIF030.T_READDATE) AS READINGMONTH,
DAY(BIF030.T_READDATE) AS READINGDAY,
YEAR(BIF030.T_READDATE) AS READINGYEAR,
BIF030.I_DAYS AS READINGDAYSCOUNT
FROM ADVANCED.BIF030
LEFT JOIN ADVANCED.CON007 ON CON007.C_UNITS=BIF030.C_UNITS
LEFT JOIN ADVANCED.BIF005 ON BIF005.C_METER=BIF030.C_METER
LEFT JOIN ADVANCED.BIF003 ON BIF003.C_ACCOUNT=BIF030.C_ACCOUNT
LEFT JOIN ADVANCED.CON013 ON CON013.C_ACCOUNTTYPE=BIF003.C_ACCOUNTTYPE
LEFT JOIN ADVANCED.CON028 ON CON028.C_DIVISION=BIF003.C_DIVISION
WHERE T_READDATE > '01-01-2014'
ORDER BY ACCOUNTNUMBER, READINGDATE ASC
I know SELECT DISTINCT is frowned upon, but I get even more rows without it. Here's a sample of what the data looks like when pulled:
ACCOUNTNUMBER | ACCOUNTTYPECODE | ACCOUNTTYPE | ZONE_DIVISIONCODE | ZONE_DIVISION | METERNUMBER | REGISTERNUMBER | CONSUMPTION | UNITS | READINGDATE | READINGMONTH | READINGDAY | READINGYEAR | READINGDAYSCOUNT
1234567 | SP | ACCOUNT TYPE 1 | 00 | 00-NO ZONE | 123456789 | 987654321 | 3.00 | Thousands of Gallons | 2014-01-16 00:00:00.00 | 1 | 16 | 2014 | 30
1234567 | MF | ACCOUNT TYPE 2 | 02 | 02-GRAVITY | 123456789 | 987654321 | 3.00 | Thousands of Gallons | 2014-01-16 00:00:00.00 | 1 | 16 | 2014 | 30
1234567 | SR | ACCOUNT TYPE 3 | 02 | 02-GRAVITY | 123456789 | 987654321 | 3.00 | Thousands of Gallons | 2014-01-16 00:00:00.00 | 1 | 16 | 2014 | 30
I also know the column that is messing this up is the "AccountTypeCode" because other accounts that don't have multiple codes associated with the "AccountNumber" only show 1 set of rows. So this one specifically (and probably others) is tripling the amount of rows pulled when it should only pull one for each "ReadingDate".
Also if anyone knows a good way to optimize the query I'd be happy to learn. I know just enough SQL to be dangerous, but not enough to figure this out. Thanks.
OK, so good news, and I want to add this in case it helps anyone else in the future. I found out that since ACCOUNTTYPECODE and ZONE_DIVISIONCODE were coming from the table BIF003, I needed to add more to the WHERE clause. This is what fixed it for me:
AND BIF030.C_CUSTOMER = BIF003.C_CUSTOMER
The C_CUSTOMER column (it exists in both the BIF003 and BIF030 tables) was different on each BIF003 row for the account, which led to the separate ACCOUNTTYPECODE results, so I needed to check it in the WHERE clause.
Thanks everyone for kick starting my brain on this one.
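For anyone who wants to see the effect in isolation, here is a small sketch on SQLite via Python's sqlite3 module. The table and column names are taken from the query above, but the customer values (C1, C2, C3, C9) and the second account are invented for the demo. It compares joining on the account alone, adding the customer check in WHERE, and adding it to the join's ON clause. One thing it suggests: with a LEFT JOIN, a WHERE condition on the right-hand table also drops readings that have no BIF003 row at all, whereas the ON version keeps them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table BIF030 (C_ACCOUNT text, C_CUSTOMER text, T_READDATE text);
create table BIF003 (C_ACCOUNT text, C_CUSTOMER text, C_ACCOUNTTYPE text);
-- hypothetical data: one reading for customer C1, one for an account
-- that has no BIF003 row at all
insert into BIF030 values ('1234567', 'C1', '2014-01-16');
insert into BIF030 values ('7654321', 'C9', '2014-01-20');
insert into BIF003 values
    ('1234567', 'C1', 'SP'), ('1234567', 'C2', 'MF'), ('1234567', 'C3', 'SR');
""")

base = """
select r.C_ACCOUNT, a.C_ACCOUNTTYPE
from BIF030 r
left join BIF003 a on a.C_ACCOUNT = r.C_ACCOUNT {extra_on}
{where}
order by r.C_ACCOUNT, a.C_ACCOUNTTYPE
"""

# 1) join on the account only: the reading is tripled
account_only = con.execute(base.format(extra_on="", where="")).fetchall()

# 2) customer check in WHERE: duplicates gone, but the unmatched reading too
in_where = con.execute(
    base.format(extra_on="", where="where r.C_CUSTOMER = a.C_CUSTOMER")
).fetchall()

# 3) customer check in ON: duplicates gone, unmatched reading kept
in_on = con.execute(
    base.format(extra_on="and a.C_CUSTOMER = r.C_CUSTOMER", where="")
).fetchall()

print(account_only)
print(in_where)
print(in_on)
```

Whether the ON form is preferable depends on whether readings without a matching BIF003 row should appear in the report.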

How to get the set size, first and last record in a db2 ordered set with one call

I have a very big transaction table on DB2 v11, and I need to query a subset of it as efficiently as possible. All I need is the total count of the set (not known in advance; it's based on criteria, let's say 1 day), the ID of the first record, and the ID of the last record.
The old code was fetching the entire table, then just using the 1st record ID, the last record ID, and the size, and not making use of the rest. Now this code is timing out. It's a complex query of several joins.
Is there a way to fetch the size of the set, the 1st record, and the last record all in one select query?
I've read that reordering the list in order to fetch the 1st record (so fetch with Desc, then change to Asc) is not efficient.
sample table 1 TRANSACTION_RECORDS:
tdID TIMESTAMP name
-------------------------------
123 2020-03-31 john
234 2020-03-31 dan
456 2020-03-01 Eve
675 2020-04-01 joy
sample table 2 TRANSACTION_TYPE:
invoiceId tdID account
------------------------------
897 123 abc
898 123 def
877 234 mnc
899 456 opp
Sample query
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
group by tr.tdID
order by TR.tdID ASC
This results in multiple rows (but it requires the GROUP BY):
123,123
234,234
456,456
What I want is:
123,456
As I mentioned in the comments, this query needs neither GROUP BY nor ORDER BY. Just do:
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
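It should work as expected: without GROUP BY, the aggregates are computed over the whole filtered set, so the query returns a single row.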
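The question also asks for the size of the set, which can be added to the same SELECT. Below is a hedged sketch using the sample rows, run on SQLite through Python's sqlite3 module rather than DB2. The timestamp column is renamed ts for the demo, and the half-open date range is an assumption on my part: it avoids wrapping the column in a function, which may let an index on that column be used. Because the join repeats tdID 123 (it has two invoices), COUNT(*) and COUNT(DISTINCT ...) give different answers, so pick the one that matches what "size" means for you.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table TRANSACTION_RECORDS (tdID int, ts text, name text);
create table TRANSACTION_TYPE (invoiceId int, tdID int, account text);
insert into TRANSACTION_RECORDS values
    (123, '2020-03-31', 'john'), (234, '2020-03-31', 'dan'),
    (456, '2020-03-01', 'Eve'),  (675, '2020-04-01', 'joy');
insert into TRANSACTION_TYPE values
    (897, 123, 'abc'), (898, 123, 'def'), (877, 234, 'mnc'), (899, 456, 'opp');
""")

# One pass over the filtered, joined set: no GROUP BY, so a single row comes back.
summary = con.execute("""
select count(*)                as joined_rows,
       count(distinct tr.tdID) as distinct_ids,
       min(tr.tdID)            as first_id,
       max(tr.tdID)            as last_id
from TRANSACTION_RECORDS tr
join TRANSACTION_TYPE tt on tr.tdID = tt.tdID
where tr.ts >= '2020-03-31' and tr.ts < '2020-04-01'
""").fetchone()

print(summary)
```

With the sample rows exactly as posted, only tdID 123 and 234 fall on 2020-03-31, so the first and last IDs come out as 123 and 234 rather than the 123,456 shown in the question.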

Compare a date with a block of dates in SQL

I tried to read a lot of the date comparisons that I found here on Stack Overflow and around the internet, but I wasn't able to find the solution.
I have the following table (Trips):
VehicleID DriverID xID CheckIn CheckOut DateHour
462 257 7 1 0 16/12/2017 20:40:00
462 257 7 0 1 19/12/2017 10:05:00
5032 3746 11 1 0 02/10/2017 07:00:00
5032 3746 11 0 1 06/10/2017 17:00:00
When my company receives a traffic ticket, I want to compare the date from the ticket with the whole block of dates from the table "Trips". Each block starts with CheckIn = 1 and finishes with CheckOut = 1, so this way I will know, through the DriverID, which driver was responsible for the ticket.
For example: the traffic ticket date and time are 17/12/2017 08:00:00 and the vehicle is the one with id = 462. I'll insert this date and time in a field in our system to look up automatically which driver was driving that car at that moment; we won't use the ticket table yet. Looking at my example, I know it should return DriverID = 257, but there are a lot of trips with the same vehicle and different drivers. The major problem is how to compare the date and hour from the ticket with the range of dates from the trips, since I have to consider that 1 trip = 2 lines in the table.
Unfortunately I can't change the way this table was created, because we need these 2 lines, CheckIn and CheckOut, separately.
Any thoughts or directions?
Thank you for your attention
select t1.VehicleID
,t1.DriverID
,t1.xID
,t1.DateHour as Checkin
,t2.DateHour as Checkout
from trips as t1 join trips as t2 --self join trips to get both start and end in a single row
on t1.VehicleID = t2.VehicleID -- add all columns
and t1.DriverID = t2.DriverID -- which define
and t1.xID = t2.xID -- a unique trip
and t1.Checkin = 1 -- start
and t2.Checkout = 1 -- end
join tickets -- now join tickets
on tickets.trafficDateHour between t1.DateHour and t2.DateHour
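Since the question says the ticket table is not used yet and the date and vehicle are typed into a field, the same self-join idea can be sketched with two parameters instead of a tickets join. This is only an illustration on SQLite via Python's sqlite3 module: dates are stored as ISO text so they compare correctly as strings, the function name driver_at is invented, and it assumes (as the query above does) that VehicleID, DriverID and xID together identify one trip.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Trips (VehicleID int, DriverID int, xID int,
                    CheckIn int, CheckOut int, DateHour text);
insert into Trips values
    (462, 257, 7, 1, 0, '2017-12-16 20:40:00'),
    (462, 257, 7, 0, 1, '2017-12-19 10:05:00'),
    (5032, 3746, 11, 1, 0, '2017-10-02 07:00:00'),
    (5032, 3746, 11, 0, 1, '2017-10-06 17:00:00');
""")

def driver_at(con, vehicle_id, when):
    """Return the DriverID whose trip on this vehicle covers `when`, or None."""
    sql = """
    select ci.DriverID
    from Trips ci
    join Trips co                      -- pair the check-in row with its check-out row
      on co.VehicleID = ci.VehicleID
     and co.DriverID  = ci.DriverID
     and co.xID       = ci.xID
     and ci.CheckIn   = 1
     and co.CheckOut  = 1
    where ci.VehicleID = ?
      and ? between ci.DateHour and co.DateHour
    """
    row = con.execute(sql, (vehicle_id, when)).fetchone()
    return row[0] if row else None

print(driver_at(con, 462, "2017-12-17 08:00:00"))
```

Filtering on the vehicle matters here: without it, a ticket time could match a trip made by a different vehicle that happened to be out at the same moment.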
I didn't make sample tables, this will not run as is, but something like this should do it for you:
SELECT *
FROM tickets, trips
WHERE
trips.datehour in (
SELECT trips.datehour
FROM tickets, trips
WHERE
tickets.ticket_date < trips.datehour AND
trips.checkin = 0
) AND
tickets.ticket_date > trips.datehour AND
trips.checkin = 1
If you are running this for a specific date as described in the comment above, it will work. If you are trying to run it for a set of ticket dates all at once, you'll require recursion. Recursion is a different beast depending on your flavor of SQL.

Set based approach to SQL Server insert where one column is calculated as Max from same column

I'm wondering if this is one of those situations where I'm forced to use a cursor or if I can use a set based approach. I've searched for several hours and also tried to come up with a solution myself to no avail.
I've got a table, SuperSupplierCodes, that contains two columns: SuperSupplierCode INT, and SupplierName NVARCHAR(50).
SuperSupplierID SupplierName
1 21ST CENTURY GRAPHIC TECHNOLOGIES LLC
2 3D SYSTEMS
3 3G
4 A A ABRASIVOS ARGENTINOS SAIC
5 A AND F DRUCKLUFTTECHNIK GMBH
6 A BAY STATIONERS
7 A C T TOOL AND ENGINEERING LLC
8 A HERZOG AG
9 A LI T DI MONTANARI MARCO AND CO SAS
11 A RAYMOND GMBH AND CO KG
I've got a second table with millions of rows in it containing financial data as well as the SupplierName column.
LocalSupplierName
23 JAN HOFMEYER ROAD
303 TAXICAB, LLC
3D MECA SARL
3D SYSTEMS
3E CO ENVIRONMENTAL, ECO. & EN
3E COMPANY
What I need to do is insert into the SuperSupplierCodes table such that each row gets the MAX(SuperSupplierCode) from the previous row, increments it by one, and inserts that into the SuperSupplierCode column along with the SupplierName from the second table.
I've tried the following, just as a test, that I might be able to use for the insert, but of course it will only do the increment once and try to use that same value for SuperSupplierCode for every row:
SELECT s.SuperSupplierID,
s.SupplierName,
s.SupplierAddress,
s.DateCreated,
s.DateModified,
s.SupplierCode,
s.PlantName,
s.id,
x.MaxSSC
FROM SuperSupplierCodes AS s
CROSS APPLY (SELECT MAX(SuperSupplierID)+1 AS MaxSSC FROM dbo.SuperSupplierCodes) x;
I don't like using cursors unless I absolutely have to. Is there a way to do this with T-SQL in a set based manner versus using a cursor?
Create the column as an identity and insert the existing records once with SET IDENTITY_INSERT switched ON for the table. Then switch it OFF again; rows added after that will get their IDs incremented automatically.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-identity-insert-transact-sql?view=sql-server-2017
Why not something like this?
SELECT (SELECT MAX(SuperSupplierID) FROM dbo.SuperSupplierCodes) + ROW_NUMBER() OVER (ORDER BY s.DateCreated) AS SuperSupplierID,
s.SupplierName,
s.SupplierAddress,
s.DateCreated,
s.DateModified,
s.SupplierCode,
s.PlantName,
s.id
FROM SuperSupplierCodes AS s;
We use the above technique at my work all the time when inserting rows. If some have existing values, you can insert them all into the table and then change the above to only update values that are currently null.
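To make the MAX + ROW_NUMBER idea concrete for the insert itself, here is a minimal sketch on SQLite through Python's sqlite3 module. The second table is called Financials here purely as a placeholder, since the question does not name it, and only a few of the sample rows are loaded. It numbers the distinct supplier names that are not yet present and offsets them by the current maximum ID. Note that this pattern is not safe if several sessions insert at the same time without locking, which is one reason an identity column may be the better long-term choice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table SuperSupplierCodes (SuperSupplierID int primary key, SupplierName text);
create table Financials (LocalSupplierName text);  -- hypothetical table name
insert into SuperSupplierCodes values
    (1, '21ST CENTURY GRAPHIC TECHNOLOGIES LLC'), (2, '3D SYSTEMS'), (3, '3G');
insert into Financials values
    ('3D MECA SARL'), ('3D SYSTEMS'), ('3E COMPANY'), ('3D MECA SARL');
""")

# Set-based insert: current max ID + a running row number over the new names.
con.execute("""
insert into SuperSupplierCodes (SuperSupplierID, SupplierName)
select (select coalesce(max(SuperSupplierID), 0) from SuperSupplierCodes)
       + row_number() over (order by n.LocalSupplierName),
       n.LocalSupplierName
from (select distinct LocalSupplierName from Financials) n
where not exists (select 1 from SuperSupplierCodes s
                  where s.SupplierName = n.LocalSupplierName)
""")

new_rows = con.execute(
    "select SuperSupplierID, SupplierName from SuperSupplierCodes "
    "where SuperSupplierID > 3 order by SuperSupplierID"
).fetchall()
print(new_rows)
```

The DISTINCT and NOT EXISTS parts are there so that a supplier appearing in millions of financial rows, or one already in SuperSupplierCodes, only ever gets one ID.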

Optimal solution for interview question

Recently in a job interview, I was given the following problem.
Say I have the following table
widget_Name | widget_Costs | In_Stock
---------------------------------------------------------
a | 15.00 | 1
b | 30.00 | 1
c | 20.00 | 1
d | 25.00 | 1
where widget_name holds the name of the widget, widget_costs is the price of a widget, and in_stock is a constant of 1.
Now for my business insurance I have a certain deductible. I am looking for a SQL statement that will tell me every widget, and its price, that exceeds the deductible. So if my deductible is $50.00, the above would just return
widget_Name | widget_Costs | In_Stock
---------------------------------------------------------
a | 15.00 | 1
d | 25.00 | 1
since widgets b and c were used to meet the deductible.
The closest I could get is the following
SELECT
*
FROM (
SELECT
widget_name,
widget_price
FROM interview.tbl_widgets
minus
SELECT widget_name,widget_price
FROM (
SELECT
widget_name,
widget_price,
50 - sum(widget_price) over (ORDER BY widget_price ROWS between unbounded preceding and current row) as running_total
FROM interview.tbl_widgets
)
where running_total >= 0
)
;
Which gives me
widget_Name | widget_Costs | In_Stock
---------------------------------------------------------
c | 20.00 | 1
d | 25.00 | 1
because it uses a and b to meet the majority of the deductible
I was hoping someone might be able to show me the correct answer
EDIT: I understood the interview question to be asking this: given a table of widgets and their prices, and given a dollar amount, subtract as many of the widgets as you can up to the dollar amount, and return the widgets and prices that remain.
I'll put an answer up, just in case it's easier than it looks, but if the idea is just to return any widget that costs more than the deductible then you'd do something like this:
Select
Widget_Name, Widget_Cost, In_Stock
From
Widgets
Where
Widget_Cost > 50 -- SubSelect for variable deductibles?
For your sample data my query returns no rows.
I believe I understand your question, but I'm not 100%. Here is what I'm assuming you mean:
Your deductible is, say, $50. To meet the deductible you have to "use" two items. (Is this always two? How high can it go? Can it be just one? What if they don't total exactly $50? There is a lot of missing information.) You then want to return the widgets that aren't being used towards the deductible. I have the following.
CREATE TABLE #test
(
widget_name char(1),
widget_cost money
)
INSERT INTO #test (widget_name, widget_cost)
SELECT 'a', 15.00 UNION ALL
SELECT 'b', 30.00 UNION ALL
SELECT 'c', 20.00 UNION ALL
SELECT 'd', 25.00
SELECT * FROM #test t1
WHERE t1.widget_name NOT IN (
SELECT t1.widget_name FROM #test t1
CROSS JOIN #test t2
WHERE t1.widget_cost + t2.widget_cost = 50 AND t1.widget_name != t2.widget_name)
Which returns
widget_name widget_cost
----------- ---------------------
a 15.00
d 25.00
This looks like a Bin Packing problem; these are really hard to solve, especially with SQL.
If you search on SO for Bin Packing + SQL, you'll find the question "how to find Sum(field) in condition ie “select * from table where sum(field) < 150”", which is basically the same problem, except you want to add a NOT IN to it.
I couldn't get the accepted answer by brianegge to work, but what he wrote about it in general was interesting:
"..the problem you describe of wanting the selection of users which would most closely fit into a given size, is a bin packing problem. This is an NP-Hard problem, and won't be easily solved with ANSI SQL. However, the above seems to return the right result, but in fact it simply starts with the smallest item, and continues to add items until the bin is full.
A general, more effective bin packing algorithm would is to start with the largest item and continue to add smaller ones as they fit. This algorithm would select users 5 and 4."
So with this advice you could write a cursor to loop over the table to do just this (it just wouldn't be pretty).
Aaron Alton gives a nice link to a series of articles that attempt to solve the Bin Packing problem with SQL, but they basically conclude that it's probably best to use a cursor to do it.
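The largest-first idea from the quote is easy to try outside SQL. The sketch below is a plain Python loop standing in for the cursor, not a definitive solution: it is a greedy heuristic, so it is not guaranteed to find the best fit for every set of prices, and the function name split_by_deductible is made up for the example.

```python
def split_by_deductible(widgets, deductible):
    """Greedy largest-first: take each widget that still fits under the deductible.

    Returns (used, remaining) as sorted lists of widget names.
    """
    used, remaining = [], []
    left = deductible
    # Visit widgets from most to least expensive.
    for name, cost in sorted(widgets, key=lambda w: w[1], reverse=True):
        if cost <= left:
            used.append(name)      # counts towards the deductible
            left -= cost
        else:
            remaining.append(name) # does not fit, so it stays in the result
    return sorted(used), sorted(remaining)

widgets = [("a", 15.00), ("b", 30.00), ("c", 20.00), ("d", 25.00)]
used, remaining = split_by_deductible(widgets, 50.00)
print(used, remaining)
```

On the sample table this uses b and c towards the $50 deductible and leaves a and d, which is the result the question was looking for.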