SQL query to add calculated field which compares data rows - sql

The heading of this question is probably poorly worded as I am finding it difficult to explain concisely what I want, other than to provide some demo data.
I have a query which returns the following data from a sql table:
ID Job User Amount
1 101 Bob 100
2 101 Pete 500
3 102 Bob 400
4 102 Pete 200
5 101 Pete 850
6 102 Bob 650
What I want is the query to also return an additional field called (Difference), which contains the difference between the Amount in consecutive entries for the same User and Job. Hence the data I would like returned would be as follows:
ID Job User Amount Diff
1 101 Bob 100 100
2 101 Pete 500 500
3 102 Bob 400 400
4 102 Pete 200 200
5 101 Pete 850 350
6 102 Bob 650 250
In the first four rows, the Diff is the same as the Amount because each is the first entry per User per Job (hence the Difference is calculated with reference to a starting Amount of nil in effect).
The last two lines contain information for a User and Job combination that have appeared in the table previously, and hence Diff is calculated as follows:
Job 101 User Pete 850 - 500 = 350
Job 102 User Bob 650 - 400 = 250
I've never had to compare data from rows like this in a SQL query before so don't really know where to start. Any help would be much appreciated.
Added
Please note that the Amount is not a running total. It is a subjective assessment, made periodically, of the value of a User's input in each particular job. The Amount could in fact go down from one assessment to the next. What I want is a query that returns the difference between the Amounts of successive assessments.
Alternative Explanation
I'm looking to return a history trail of movements in the Amount assessed. So another example, looking at a single Job and User is as follows:
Job User Amount Movement
101 Bob 100 100
101 Bob 500 400
101 Bob 400 (100)
101 Bob 1,000 600
However, as per the original example, this information will need to be extracted from a table which contains many Jobs and Users all intermingled.

For SQL Server 2012, try this
This assumes that ID=5 value is wrong in your example
For "previous value" per pair
CREATE TABLE #t (ID int, Job int, Username varchar(10), Amount int);
INSERT INTO #t
VALUES
(1, 101, 'Bob', 100), (2, 101, 'Pete', 500), (3, 102, 'Bob', 400),
(4, 102, 'Pete', 200), (5, 101, 'Pete', 850), (6, 102, 'Bob', 650);
SELECT
t1.*,
t1.Amount - ISNULL(LAG(Amount) OVER (PARTITION BY Job, Username ORDER BY ID), 0) AS DiffAmount
FROM
#t t1
ORDER BY
t1.ID
For "first value" per pair
SELECT
t1.*,
CASE
WHEN FIRST_VALUE(t1.ID) OVER (PARTITION BY Job, Username ORDER BY ID) = t1.ID THEN t1.Amount
ELSE t1.Amount - FIRST_VALUE(t1.Amount) OVER (PARTITION BY Job, Username ORDER BY ID)
END AS DiffAmount
FROM
#t t1
ORDER BY
t1.ID
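To check the "previous value" query against the sample data without a SQL Server instance, here is a minimal sketch that runs the same LAG logic through Python's sqlite3 module. It assumes an SQLite build with window function support (3.25 or later); the table name t and the function name diffs_with_lag are made up for the illustration, and LAG's third argument (a default of 0) stands in for the ISNULL wrapper.

```python
import sqlite3

# Sample rows from the question: (ID, Job, Username, Amount)
ROWS = [
    (1, 101, "Bob", 100), (2, 101, "Pete", 500), (3, 102, "Bob", 400),
    (4, 102, "Pete", 200), (5, 101, "Pete", 850), (6, 102, "Bob", 650),
]


def diffs_with_lag(rows):
    """Return (ID, DiffAmount) pairs, comparing each row with the previous
    row for the same Job and Username (previous = next lower ID)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID INT, Job INT, Username TEXT, Amount INT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT ID,
               -- LAG(..., 1, 0): previous Amount, or 0 for the first entry
               Amount - LAG(Amount, 1, 0)
                        OVER (PARTITION BY Job, Username ORDER BY ID) AS DiffAmount
        FROM t
        ORDER BY ID
        """
    ).fetchall()
    con.close()
    return result


print(diffs_with_lag(ROWS))
# [(1, 100), (2, 500), (3, 400), (4, 200), (5, 350), (6, 250)]
```

The same function reproduces the "Alternative Explanation" trail: feeding it Bob's four assessments on Job 101 (100, 500, 400, 1000) gives movements of 100, 400, -100 and 600.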

Related

Dividing a sum value into multiple rows due to field length constraint

I am migrating financial data from a very large table (more than 100 million rows) by summarizing the amounts and inserting them into a summary table. I ran into a problem when a summary amount (3 billion) was larger than what the field in the summary table can hold (it can only hold up to 999 million). Changing the field size is not an option, as it requires a change process.
The only option I have is to divide any amount that breaches the size limit into smaller ones so it can be inserted into the table.
I came across the question "SQL - I need to divide a total value into multiple rows in another table", which is similar, except that the number of rows I need to insert is dynamic.
For simplicity, this is what the source table might look like:
account_table
acct_num | amt
-------------------------------
101 125.00
101 550.00
101 650.00
101 375.00
101 475.00
102 15.00
103 325.00
103 875.00
104 200.00
104 275.00
The summary records are as follows
select acct_num, sum(amt)
from account_table
group by acct_num
Account Summary
acct_num | amt
-------------------------------
101 2175.00
102 15.00
103 1200.00
104 475.00
Assuming the maximum value in the destination table is 1000.00, the expected output will be
summary_table
acct_num | amt
-------------------------------
101 1000.00
101 1000.00
101 175.00
102 15.00
103 1000.00
103 200.00
104 475.00
How do I create a query to get the expected result? Thanks in advance.
You need a numbers table. If you have a handful of values, you can define it manually. Otherwise, you might have one on hand or use a similar logic:
with n as (
select (rownum - 1) as n
from account_table
where rownum <= 10
),
a as (
select acct_num, sum(amt) as amt
from account_table
group by acct_num
)
select acct_num,
(case when (n.n + 1) * 1000 < amt then 1000
else amt - n.n * 1000
end) as amt
from a join
n
on n.n * 1000 < amt ;
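As a quick way to try the numbers-table idea on the sample data, here is a sketch that runs it in SQLite through Python's sqlite3 module. The numbers table, the function name split_with_numbers and the parameters cap and max_chunks are assumptions for the illustration; SQLite has no rownum, so the numbers are simply inserted from Python, and the amounts are stored as whole integers.

```python
import sqlite3

# Sample rows from the question: (acct_num, amt)
AMOUNTS = [
    (101, 125), (101, 550), (101, 650), (101, 375), (101, 475),
    (102, 15), (103, 325), (103, 875), (104, 200), (104, 275),
]


def split_with_numbers(amounts, cap=1000, max_chunks=10):
    """Split each account total into rows of at most `cap`, using a numbers
    table 0..max_chunks-1 (so totals above cap * max_chunks are not covered)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account_table (acct_num INT, amt INT)")
    con.executemany("INSERT INTO account_table VALUES (?, ?)", amounts)
    con.execute("CREATE TABLE numbers (n INT)")
    con.executemany("INSERT INTO numbers VALUES (?)",
                    [(i,) for i in range(max_chunks)])
    rows = con.execute(
        """
        WITH a AS (
            SELECT acct_num, SUM(amt) AS amt
            FROM account_table
            GROUP BY acct_num
        )
        SELECT a.acct_num,
               -- full chunk while more than one chunk remains, else the rest
               CASE WHEN (n.n + 1) * :cap < a.amt THEN :cap
                    ELSE a.amt - n.n * :cap
               END AS amt
        FROM a
        JOIN numbers n ON n.n * :cap < a.amt
        ORDER BY a.acct_num, n.n
        """,
        {"cap": cap},
    ).fetchall()
    con.close()
    return rows


print(split_with_numbers(AMOUNTS))
# [(101, 1000), (101, 1000), (101, 175), (102, 15), (103, 1000), (103, 200), (104, 475)]
```

Note that the size of the numbers table is a hard ceiling: with 10 numbers and a cap of 1000, any total above 10,000 would silently lose the excess, so the numbers table should be sized from the largest expected total.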
A variation along these lines might give some ideas (using the 1,000 of your sample data):
WITH summary AS (
SELECT acct_num
,TRUNC(SUM(amt) / 1000) AS times
,MOD(SUM(amt), 1000) AS remainder
FROM account_table
GROUP BY acct_num
), x(acct_num, times, remainder) AS (
SELECT acct_num, times, remainder
FROM summary
UNION ALL
SELECT s.acct_num, x.times - 1, s.remainder
FROM summary s
,x
WHERE s.acct_num = x.acct_num
AND x.times > 0
)
SELECT acct_num
,CASE WHEN times = 0 THEN remainder ELSE 1000 END AS amt
FROM x
ORDER BY acct_num, amt DESC
The idea is to first build a summary table with div and modulo:
ACCT_NUM TIMES REMAINDER
101 2 175
102 0 15
103 1 200
104 0 475
Then run a recursive query over the summary table that emits one row of 1000 for each of the "times", plus one extra row for the remainder.
ACCT_NUM AMT
101 1000
101 1000
101 175
102 15
103 1000
103 200
104 475
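The div/modulo variation can be tried out the same way. Below is a sketch in SQLite via Python's sqlite3 module, with a few stated deviations from the Oracle query: integer division and % replace TRUNC and MOD, the recursive step reads only from x instead of joining back to summary, and a WHERE clause (my addition) drops the zero-amount row that would otherwise appear when a total is an exact multiple of the cap. The function name split_recursive is made up.

```python
import sqlite3

# Sample rows from the question: (acct_num, amt)
AMOUNTS = [
    (101, 125), (101, 550), (101, 650), (101, 375), (101, 475),
    (102, 15), (103, 325), (103, 875), (104, 200), (104, 275),
]


def split_recursive(amounts, cap=1000):
    """Split each account total into rows of at most `cap` using a
    recursive CTE that counts `times` down to 0."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account_table (acct_num INT, amt INT)")
    con.executemany("INSERT INTO account_table VALUES (?, ?)", amounts)
    rows = con.execute(
        """
        WITH RECURSIVE summary AS (
            SELECT acct_num,
                   SUM(amt) / :cap AS times,      -- integer division
                   SUM(amt) % :cap AS remainder
            FROM account_table
            GROUP BY acct_num
        ), x(acct_num, times, remainder) AS (
            SELECT acct_num, times, remainder FROM summary
            UNION ALL
            SELECT acct_num, times - 1, remainder FROM x WHERE times > 0
        )
        SELECT acct_num,
               CASE WHEN times = 0 THEN remainder ELSE :cap END AS amt
        FROM x
        WHERE times > 0 OR remainder > 0   -- skip an empty remainder row
        ORDER BY acct_num, amt DESC
        """,
        {"cap": cap},
    ).fetchall()
    con.close()
    return rows


print(split_recursive(AMOUNTS))
# [(101, 1000), (101, 1000), (101, 175), (102, 15), (103, 1000), (103, 200), (104, 475)]
```

Unlike the numbers-table version, this one has no fixed upper limit on the number of rows per account, at the cost of a recursive step per chunk.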

Distributing Records Evenly From One Table to Another

I have 3 tables:
Users
-----
UserID (varchar)
Active (bit)
Refunds_Upload
--------------
BorrowerNumber (varchar)
Refunds
-------
BorrowerNumber
UserID
I first select all of the UserID values where Active = 1.
I need to insert the records from Refunds_Upload to Refunds but I need to insert the same (or as close as possible) number of records for each Active UserID.
For example, if Refunds_Upload has 20 records and the Users table has 5 people where Active = 1, then I would need to insert 4 records per UserID into table Refunds.
End Result would be:
BorrowerNumber UserID
105 Fred
110 Fred
111 Fred
115 Fred
120 Billy
122 Billy
123 Billy
125 Billy
130 Lucius
131 Lucius
133 Lucius
135 Lucius
138 Lucy
139 Lucy
140 Lucy
141 Lucy
142 Grady
143 Grady
144 Grady
145 Grady
Of course, it won't always come to an even number of records per User so I need to account for that as well.
First run this and check that it returns something like what you want to insert, before you uncomment the INSERT and actually carry it out.
--INSERT INTO Refunds (UserID, BorrowerNumber)
SELECT
numbered_u.UserID,
numbered_ru.BorrowerNumber
FROM
(SELECT u.*, ROW_NUMBER() OVER(ORDER BY UserID) - 1 as rown, SUM(CAST(Active as INT)) OVER() as count_users FROM Users u WHERE active=1) numbered_u
INNER JOIN
(SELECT ru.*, ROW_NUMBER() OVER(ORDER BY BorrowerNumber) - 1 as rown, COUNT(*) OVER() as count_ru FROM Refunds_Upload ru) numbered_ru
ON
ROUND(CAST(numbered_ru.rown AS FLOAT) / (count_ru / count_users), 0) = numbered_u.rown
The logic:
We number every interesting (active=1) row in Users and we also count them all. This should return all 5 users, numbered 0 to 4, with a count_users of 5 on each row.
Then we join them to a similarly numbered list of Refunds_Upload rows (say 20). Those rows will be numbered 0 to 19, for mathematical reasons that become apparent later. We count all these rows too.
And we then join these two datasets together but the condition is a range of values rather than exact values. The logic is "refund_upload row number, divided by the_count_of_rows_there_should_be_per_user" (i.e. 0..19 / (20/5) ) = user_row_number. Hopefully thus refund rows 0 to 3, associate with user 0, refund rows 4 thru 7 associate with user 1.. etc
It's a little hard to debug without full data - I feel it might need a few +1 / -1 tweaks here and there.
I originally used FLOOR but switched to using ROUND, as I think this might work for distributing sets of numbers where there isn't a whole number of divisions in Refund/User, e.g. your 240/13 example. Hopefully some users will have 18 rows and some 19.
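Since the rounding is the fiddly part, here is a sketch of one possible variant that avoids it: assign refund row i to user FLOOR(i * count_users / count_ru), computed with integer arithmetic. This is a swapped-in formula, not the join condition from the answer. As far as I can tell, rounding rown / 4 in the 20/5 case sends rows 18 and 19 to user number 5, which does not exist, so those rows would drop out of the inner join. The sketch runs in SQLite through Python's sqlite3 module (3.25 or later for window functions); the function name distribute is made up.

```python
import sqlite3
from collections import Counter


def distribute(borrowers, users):
    """Pair every borrower number with an active user so that the per-user
    counts differ by at most one. Returns (BorrowerNumber, UserID) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Users (UserID TEXT, Active INT)")
    con.execute("CREATE TABLE Refunds_Upload (BorrowerNumber TEXT)")
    con.executemany("INSERT INTO Users VALUES (?, 1)", [(u,) for u in users])
    con.executemany("INSERT INTO Refunds_Upload VALUES (?)",
                    [(b,) for b in borrowers])
    rows = con.execute(
        """
        SELECT ru.BorrowerNumber, u.UserID
        FROM (SELECT UserID,
                     ROW_NUMBER() OVER (ORDER BY UserID) - 1 AS rown,
                     COUNT(*) OVER () AS count_users
              FROM Users
              WHERE Active = 1) AS u
        JOIN (SELECT BorrowerNumber,
                     ROW_NUMBER() OVER (ORDER BY BorrowerNumber) - 1 AS rown,
                     COUNT(*) OVER () AS count_ru
              FROM Refunds_Upload) AS ru
          -- all operands are integers, so "/" is integer (floor) division
          ON ru.rown * u.count_users / ru.count_ru = u.rown
        ORDER BY ru.rown
        """
    ).fetchall()
    con.close()
    return rows


pairs = distribute([str(100 + i) for i in range(20)],
                   ["Billy", "Fred", "Grady", "Lucius", "Lucy"])
print(Counter(user for _, user in pairs))  # every user gets 4 borrowers
```

With 22 uploads and the same 5 users, two users end up with 5 rows and three with 4, and no upload row is left unassigned. In SQL Server the same ON condition should behave the same way, since dividing two integers there also truncates.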

Oracle : merging a row and its next one into the same row

I have an issue here.
I have these 4 rows of data :
Origin Destination Distance Carrier Price
Miami New-York 800 BF 500
Dallas Chicago 300 AL 200
Dallas Chicago 300 KH 200
Miami New-York 800 JH 500
What I want is to merge rows 2 and 3 into one row like this:
Dallas Chicago 300 AL, KH 200 (All information is the same except the Carrier)
The problem is that, for every row, I have to check whether the previous row contains the same information apart from the carrier.
How can I achieve that? With LEAD and LAG?
Thanks for your help.
Do a self join:
select t1.Origin, t1.Destination, t1.Distance, t1.Carrier, t2.Distance, t2.Carrier
from table t1
join table t2 on t1.Origin = t2.Origin
and t1.Destination = t2.Destination
and t1.Carrier < t2.Carrier
Row order unimportant here. (Of course, that's the dbms way!)
If you also want to return flights that have only one carrier, do a LEFT JOIN instead of just a JOIN.
Here you go. But it will merge the Miami New-York rows too. If you want only 2 adjacent rows to be merged, then you need another column like ID or InsertDate or something like that. Then we can modify the given query to aggregate based on that.
with tbl (Origin, Destination, Distance ,Carrier ,Price)
as
(select 'Miami','New-York',800,'BF ', 500 from dual union
select 'Dallas','Chicago',300,' AL', 200 from dual union
select 'Dallas','Chicago',300,' KH', 200 from dual union
select 'Miami','New-York',800,'JH',500 from dual)
select Origin,Destination,Distance,listagg(carrier,',') WITHIN GROUP (ORDER BY origin ) as AggCarrier,Price from tbl
group by Origin,Destination,Distance,Price
Output
Origin Destination Distance AggCarrier Price
Miami New-York 800 BF ,JH 500
Dallas Chicago 300 AL, KH 200
EDIT: Unless there is a column that identifies the insert order of the data, you cannot achieve what you want. I tried adding rownum to your data, but it does not assign the row numbers in exactly the order you want. The ordering has to come from the table you want to use. See the example below.
with tbl (Origin, Destination, Distance ,Carrier ,Price)
as
(select 'Miami','New-York',800,'BF', 500 from dual union
select 'Dallas','Chicago',300,'AL', 200 from dual union
select 'Dallas','Chicago',300,'KH', 200 from dual union
select 'Miami','New-York',800,'JH',500 from dual
)
select rownum,tbl.* from tbl
The output doesn't return the first Miami row before Dallas.
ROWNUM ORIGIN DESTINATION DISTANCE CARRIER PRICE
1 Dallas Chicago 300 AL 200
2 Dallas Chicago 300 KH 200
3 Miami New-York 800 BF 500
4 Miami New-York 800 JH 500
So you need something, an ID, an InsertTime or some other identifier, to determine that. Otherwise the DB will never know which record was inserted first. Please ask for such a column in order to achieve what you want.
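If such a column does exist, merging only adjacent rows becomes a "gaps and islands" problem. The sketch below assumes a hypothetical id column that records insert order (it is not in the original table) and runs in SQLite through Python's sqlite3 module, with GROUP_CONCAT standing in for Oracle's LISTAGG. LAG flags each row that differs from its predecessor, a running SUM of those flags numbers the runs, and the final GROUP BY aggregates per run. The function name merge_adjacent is made up.

```python
import sqlite3

# The leading id is hypothetical: it stands for whatever column records
# the insert order in the real table.
FLIGHTS = [
    (1, "Miami", "New-York", 800, "BF", 500),
    (2, "Dallas", "Chicago", 300, "AL", 200),
    (3, "Dallas", "Chicago", 300, "KH", 200),
    (4, "Miami", "New-York", 800, "JH", 500),
]


def merge_adjacent(flights):
    """Merge consecutive rows (by id) that differ only in carrier."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE flights (id INT, origin TEXT, destination TEXT, "
        "distance INT, carrier TEXT, price INT)"
    )
    con.executemany("INSERT INTO flights VALUES (?, ?, ?, ?, ?, ?)", flights)
    rows = con.execute(
        """
        WITH flagged AS (
            SELECT f.*,
                   -- 0 when this row matches the previous one, else 1
                   CASE WHEN origin = LAG(origin) OVER (ORDER BY id)
                         AND destination = LAG(destination) OVER (ORDER BY id)
                         AND distance = LAG(distance) OVER (ORDER BY id)
                         AND price = LAG(price) OVER (ORDER BY id)
                        THEN 0 ELSE 1
                   END AS starts_group
            FROM flights f
        ), islands AS (
            -- the running total of the flags numbers each run of equal rows
            SELECT flagged.*, SUM(starts_group) OVER (ORDER BY id) AS island
            FROM flagged
        )
        SELECT MIN(id) AS first_id, origin, destination, distance,
               GROUP_CONCAT(carrier) AS carriers, price
        FROM islands
        GROUP BY island, origin, destination, distance, price
        ORDER BY first_id
        """
    ).fetchall()
    con.close()
    # GROUP_CONCAT order is not guaranteed here, so sort carriers in Python.
    return [(o, d, dist, ", ".join(sorted(c.split(","))), p)
            for _, o, d, dist, c, p in rows]


for row in merge_adjacent(FLIGHTS):
    print(row)
# ('Miami', 'New-York', 800, 'BF', 500)
# ('Dallas', 'Chicago', 300, 'AL, KH', 200)
# ('Miami', 'New-York', 800, 'JH', 500)
```

The two Miami rows stay separate because they are not next to each other in id order, which is the behaviour the question asks for.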

Execute an SQL UPDATE using GROUP BY and COUNT

I am working with SQL in an SQLite database. I have a table that looks something like this:
STORAGE
------------------------------
REC_ID SEQ_NO NAME
------------------------------
100 1 plastic jar
100 2 glass cup
100 fiber rug
101 1 steel fork
101 wool scarf
102 1 leather boots
102 2 paintbox
102 3 cast iron pan
102 toolbox
Keep in mind that this is a very small number of records compared to what I actually have in the table. What I need to do is update the table so that every record with a null SEQ_NO is set to the number it should have in sequence within the group of records sharing the same REC_ID.
Here is what I want the table to look like after the update:
STORAGE
------------------------------
REC_ID SEQ_NO NAME
------------------------------
100 1 plastic jar
100 2 glass cup
100 3 fiber rug
101 1 steel fork
101 2 wool scarf
102 1 leather boots
102 2 paintbox
102 3 cast iron pan
102 4 toolbox
So, for example, the record with REC_ID 102 and a null SEQ_NO should have a SEQ_NO of 4, because it is the fourth record with the REC_ID 102.
If I do:
SELECT REC_ID, COUNT(*) FROM STORAGE GROUP BY REC_ID;
this returns all of the records by REC_ID and the number (count) of records matching each ID, which would also be the number I would want to assign to each of the records with a null SEQ_NO.
Now how would I go about actually updating all of these records with their count values?
This should work:
update storage set
seq_no=(select count(*) from storage s2 where storage.rec_id=s2.rec_id)
where seq_no is null
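Since the question is about SQLite, the update can be tried directly with Python's sqlite3 module. The sketch below loads the sample table, runs the same UPDATE and returns the result; the function name fill_missing_seq is made up. It also shows a limit of the approach: if a REC_ID had more than one null SEQ_NO, every one of them would receive the same count.

```python
import sqlite3

# Sample rows from the question: (REC_ID, SEQ_NO, NAME); None means NULL
STORAGE = [
    (100, 1, "plastic jar"), (100, 2, "glass cup"), (100, None, "fiber rug"),
    (101, 1, "steel fork"), (101, None, "wool scarf"),
    (102, 1, "leather boots"), (102, 2, "paintbox"),
    (102, 3, "cast iron pan"), (102, None, "toolbox"),
]


def fill_missing_seq(rows):
    """Set each NULL seq_no to the number of rows sharing its rec_id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE storage (rec_id INT, seq_no INT, name TEXT)")
    con.executemany("INSERT INTO storage VALUES (?, ?, ?)", rows)
    con.execute(
        """
        UPDATE storage
        SET seq_no = (SELECT COUNT(*)
                      FROM storage s2
                      WHERE s2.rec_id = storage.rec_id)
        WHERE seq_no IS NULL
        """
    )
    result = con.execute(
        "SELECT rec_id, seq_no, name FROM storage ORDER BY rec_id, seq_no, name"
    ).fetchall()
    con.close()
    return result


for row in fill_missing_seq(STORAGE):
    print(row)

# Caveat: two NULLs in one group both get the group size (3 here).
print(fill_missing_seq([(1, 1, "a"), (1, None, "b"), (1, None, "c")]))
# [(1, 1, 'a'), (1, 3, 'b'), (1, 3, 'c')]
```

For the sample data, where each REC_ID has exactly one null, the result matches the desired table.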

oracle sql query to get data from two tables of similar type

I have two tables, ACTUAL and ESTIMATE, each with the unique columns (sal_id, gal_id, amount, tax).
In ACTUAL table I have
actual_id, sal_id, gal_id, process_flag, amount, tax
1 111 222 N 100 1
2 110 223 N 200 2
In ESTIMATE table I have
estimate_id, sal_id, gal_id, process_flag, amount, tax
3 111 222 N 50 1
4 123 250 N 150 2
5 212 312 Y 10 1
Now I want a final table that has the records from the ACTUAL table and, if no record exists for a sal_id + gal_id combination in ACTUAL but one exists in ESTIMATE, the ESTIMATE record instead (with amount and tax added together as the total).
In FINAL table
id sal_id, gal_id, actual_id, estimate_id, total
1 111 222 1 null 101 (since record exist in actual table for 111 222)
2 110 223 2 null 202 (since record exist in actual table for 110 223)
3 123 250 null 4 152 (since no record exists in actual table but an estimate exists for 123 250)
(for 212 312 combination in estimate, since record already processed, no need to process again).
I am using Oracle 11g. Please help me write this logic as a single SQL query.
Thanks.
There are several ways to write this query. One way is to use join and coalesce:
select coalesce(a.sal_id, e.sal_id) as sal_id,
coalesce(a.gal_id, e.gal_id) as gal_id,
coalesce(a.amount + a.tax, e.amount + e.tax) as total
from actual a full outer join
estimate e
on a.sal_id = e.sal_id and
a.gal_id = e.gal_id
This assumes that sal_id/gal_id provides a unique match between the tables.
Since you are using Oracle, here is perhaps a clearer way of doing it:
select sal_id, gal_id, amount + tax as total
from (select t.*,
max(isactual) over (partition by sal_id, gal_id) as hasactual
from ((select 1 as isactual, a.*
from actual a
) union all
(select 0 as isactual, e.*
from estimate e
)
) t
) t
where isactual = 1 or hasactual = 0
This query uses a window function to determine whether there is an actual record with the matching sal_id/gal_id. The logic is to take all actuals and then all records that have no match in the actuals.
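To see the window-function approach produce the FINAL rows from the question, here is a sketch against the question's own columns, run in SQLite through Python's sqlite3 module (3.25 or later). Two things are assumptions on my part: rows with process_flag = 'Y' are filtered out, based on the question's remark that the 212/312 estimate was already processed, and total is taken as amount + tax. The function name build_final is made up.

```python
import sqlite3

# Sample rows from the question:
# (id, sal_id, gal_id, process_flag, amount, tax)
ACTUAL = [(1, 111, 222, "N", 100, 1), (2, 110, 223, "N", 200, 2)]
ESTIMATE = [(3, 111, 222, "N", 50, 1), (4, 123, 250, "N", 150, 2),
            (5, 212, 312, "Y", 10, 1)]


def build_final(actual, estimate):
    """Return (sal_id, gal_id, actual_id, estimate_id, total): all actuals,
    plus estimates whose sal_id/gal_id pair has no actual."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE actual (actual_id INT, sal_id INT, gal_id INT, "
                "process_flag TEXT, amount INT, tax INT)")
    con.execute("CREATE TABLE estimate (estimate_id INT, sal_id INT, gal_id INT, "
                "process_flag TEXT, amount INT, tax INT)")
    con.executemany("INSERT INTO actual VALUES (?, ?, ?, ?, ?, ?)", actual)
    con.executemany("INSERT INTO estimate VALUES (?, ?, ?, ?, ?, ?)", estimate)
    rows = con.execute(
        """
        WITH combined AS (
            SELECT 1 AS isactual, sal_id, gal_id, actual_id,
                   NULL AS estimate_id, amount + tax AS total
            FROM actual
            WHERE process_flag = 'N'      -- assumption: skip processed rows
            UNION ALL
            SELECT 0, sal_id, gal_id, NULL, estimate_id, amount + tax
            FROM estimate
            WHERE process_flag = 'N'
        ), flagged AS (
            SELECT combined.*,
                   MAX(isactual) OVER (PARTITION BY sal_id, gal_id) AS hasactual
            FROM combined
        )
        SELECT sal_id, gal_id, actual_id, estimate_id, total
        FROM flagged
        WHERE isactual = 1 OR hasactual = 0
        ORDER BY isactual DESC, COALESCE(actual_id, estimate_id)
        """
    ).fetchall()
    con.close()
    return rows


for row in build_final(ACTUAL, ESTIMATE):
    print(row)
# (111, 222, 1, None, 101)
# (110, 223, 2, None, 202)
# (123, 250, None, 4, 152)
```

The estimate for 111/222 is dropped because an actual exists for that pair, and the 212/312 estimate is dropped by the process_flag filter.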