Distributing Records Evenly From One Table to Another - sql

I have 3 tables:
Users
-----
UserID (varchar)
Active (bit)
Refunds_Upload
--------------
BorrowerNumber (varchar)
Refunds
-------
BorrowerNumber
UserID
I first select all of the UserID values where Active = 1.
I need to insert the records from Refunds_Upload to Refunds but I need to insert the same (or as close as possible) number of records for each Active UserID.
For example, if Refunds_Upload has 20 records and the Users table has 5 people where Active = 1, then I would need to insert 4 records per UserID into table Refunds.
End Result would be:
BorrowerNumber UserID
105 Fred
110 Fred
111 Fred
115 Fred
120 Billy
122 Billy
123 Billy
125 Billy
130 Lucius
131 Lucius
133 Lucius
135 Lucius
138 Lucy
139 Lucy
140 Lucy
141 Lucy
142 Grady
143 Grady
144 Grady
145 Grady
Of course, it won't always come to an even number of records per User so I need to account for that as well.

First run this and check that it returns something like what you want to insert, before you uncomment the INSERT and actually carry it out.
--INSERT INTO Refunds (UserID, BorrowerNumber)
SELECT
numbered_u.UserID,
numbered_ru.BorrowerNumber
FROM
(SELECT u.*, ROW_NUMBER() OVER(ORDER BY UserID) - 1 as rown, SUM(CAST(Active as INT)) OVER() as count_users FROM Users u WHERE active=1) numbered_u
INNER JOIN
(SELECT ru.*, ROW_NUMBER() OVER(ORDER BY BorrowerNumber) - 1 as rown, COUNT(*) OVER() as count_ru FROM Refunds_Upload ru) numbered_ru
ON
ROUND(CAST(numbered_ru.rown AS FLOAT) / (count_ru / count_users)) = numbered_u.rown
The logic:
We number every interesting (active=1) row in Users and we also count them all. This should return all 5 users, numbered 0 to 4, with a count_users of 5 on each row.
Then we join them to a similarly numbered list of Refunds_Upload rows (say 20). Those rows are numbered 0 to 19, for mathematical reasons that become apparent later. We count all of these rows too.
We then join these two datasets together, but the condition is a range of values rather than exact values. The logic is "refund_upload row number, divided by the_count_of_rows_there_should_be_per_user" (i.e. 0..19 / (20/5)) = user_row_number. Hopefully refund rows 0 to 3 thus associate with user 0, refund rows 4 through 7 associate with user 1, and so on.
It's a little hard to debug without full data; I feel it might need a few +1 / -1 tweaks here and there.
I originally used FLOOR but switched to ROUND, as I think this might work for distributing sets of numbers where there isn't a whole number of divisions in Refund/User, e.g. your 240/13 example. Hopefully some users will have 18 rows and some 19.
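As a rough way to check the bucket arithmetic outside the database, here is a minimal Python sketch. It is not the query above: it assumes a floor-based rule, where the 0-based row i goes to user floor(i * count_users / count_rows), and the function name assign_evenly and the borrower numbers are made up for illustration.

```python
from collections import Counter


def assign_evenly(borrowers, users):
    """Pair each borrower with a user so that bucket sizes differ by at most one.

    Assumed rule (not the ROUND-based join above): the 0-based row i goes to
    user index floor(i * n_users / n_rows), which always stays in range.
    """
    n_rows, n_users = len(borrowers), len(users)
    return [(b, users[(i * n_users) // n_rows]) for i, b in enumerate(borrowers)]


# 20 uploaded rows spread over 5 active users (hypothetical borrower numbers).
users = ["Fred", "Billy", "Lucius", "Lucy", "Grady"]
pairs = assign_evenly(list(range(100, 120)), users)
per_user = Counter(user for _, user in pairs)
print(per_user)

# The uneven case mentioned above: 240 rows over 13 users.
uneven = Counter(u for _, u in assign_evenly(list(range(240)), list(range(13))))
print(sorted(set(uneven.values())))
```

With this rule every row lands on an existing user, so it may be a useful reference point when tuning the join condition. The same expression can be written in SQL with integer arithmetic on the two rown and count columns.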

Related

Solving Logical Questions Using SQL

I am trying to solve a problem for a fun work exercise showing that SQL can be used to solve it. It is a puzzle that goes as follows:
Successfully navigating the waters during sea voyages is a challenging task. A captain’s most important decision is selecting the right crew for the voyage. A mix of different skill sets are required to sail the ship efficiently, navigate to the destination, and fish for food along the way.
Table 1 shows a list of crew members that are available for you to hire for the voyage. Each crew member demands a salary for the voyage and has different skill levels of Fishing, Sailing, and Navigation.
In order for your journey to be successful, you must have a cumulative skill of 15 or more in each of the three skill categories from all of your chosen crew members. You may choose as many crew members as you like.
Question: What is the minimum achievable cost for the voyage?"
I would say I am an intermediate to advanced (depending on the situation) SQL user.
I am not asking for an answer per se, but I have thought about the best way to solve it, and my first idea was to use a WHILE loop in some way. I have created a table to hold the data and added a 'salary_ranking' column (below). Does anyone have tips or suggestions on routes to take? I would like to use something I have never used before, but I am also trying to get to the most efficient answer.
Here is the data (I added the last column):
NAME FISHING SAILING NAVIGATION SALARY SALARY_RANK
---------- ----------- ----------- ----------- ----------- -----------
Amy 3 5 1 46000 3
Bill 1 2 5 43000 2
Carl 3 4 2 47000 4
Dan 4 3 1 36000 1
Eva 4 2 2 43000 2
Fred 1 3 4 55000 5
Greg 3 1 5 68000 8
Henry 5 4 2 64000 7
Ida 3 3 3 60000 6
(9 rows affected)
This is a CTE version, where I first create test data, then run a recursive query, using a MaxID to prevent it doing all the permutations.
-- Test data held in a table variable
declare @t table(Id int, NAME varchar(10), FISHING int, SAILING int, NAVIGATION int, SALARY int)
insert @t values (1,'Amy',3,5,1,46000)
,(2,'Bill',1,2,5,43000 )
,(3,'Carl',3,4,2,47000)
,(4,'Dan',4,3,1,36000)
,(5,'Eva',4,2,2,43000)
,(6,'Fred',1,3,4,55000)
,(7,'Greg',3,1,5,68000)
,(8,'Henry',5,4,2,64000)
,(9,'Ida',3,3,3,60000 )
-- Anchor: each crew member alone. Recursive step: add only members with a higher Id,
-- so each combination is generated once instead of once per permutation.
;with cte as (
select convert(varchar(1000),name) as crew, fishing, sailing, navigation, salary, ID as MaxID from @t
union all
select convert(varchar(1000),cte.crew+', '+ t.name), cte.fishing+t.fishing, cte.sailing+t.sailing, cte.navigation+t.navigation, cte.salary+t.salary, t.ID
from @t t
join cte on t.ID>cte.MaxID
)
-- Cheapest combination that reaches 15 in every skill
select top 1 crew,fishing,sailing,navigation,salary
from cte
where fishing>=15 and sailing>=15 and navigation>=15
order by salary
result is:
crew fishing sailing navigation salary
Amy, Bill, Carl, Greg, Henry 15 16 15 268000
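To cross-check that result outside SQL, a brute-force sketch in Python can enumerate every crew combination, which is the same search space the recursive CTE walks. The function name cheapest_crew and the tuple layout are assumptions made for this example.

```python
from itertools import combinations

# (name, fishing, sailing, navigation, salary), copied from the question's table.
CREW = [
    ("Amy", 3, 5, 1, 46000),
    ("Bill", 1, 2, 5, 43000),
    ("Carl", 3, 4, 2, 47000),
    ("Dan", 4, 3, 1, 36000),
    ("Eva", 4, 2, 2, 43000),
    ("Fred", 1, 3, 4, 55000),
    ("Greg", 3, 1, 5, 68000),
    ("Henry", 5, 4, 2, 64000),
    ("Ida", 3, 3, 3, 60000),
]


def cheapest_crew(crew, need=15):
    """Return (cost, names, skill_totals) for the cheapest valid crew, or None."""
    best = None
    for size in range(1, len(crew) + 1):
        for combo in combinations(crew, size):
            # Sum fishing, sailing and navigation over the chosen members.
            totals = [sum(member[k] for member in combo) for k in (1, 2, 3)]
            cost = sum(member[4] for member in combo)
            if min(totals) >= need and (best is None or cost < best[0]):
                best = (cost, [member[0] for member in combo], totals)
    return best


cost, names, totals = cheapest_crew(CREW)
print(cost, names, totals)
```

With 9 candidates there are only 511 non-empty combinations, so exhaustive search is cheap here; it would not scale to a much longer crew list.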

SQL How to calculate Average time between Order Purchases? (do sql calculations based on next and previous row)

I have a simple table that contains the customer email, their order count (so if this is their 1st order, 3rd, 5th, etc), the date that order was created, the value of that order, and the total order count for that customer.
Here is what my table looks like
Email Order Date Value Total
r2n1w@gmail.com 1 12/1/2016 85 5
r2n1w@gmail.com 2 2/6/2017 125 5
r2n1w@gmail.com 3 2/17/2017 75 5
r2n1w@gmail.com 4 3/2/2017 65 5
r2n1w@gmail.com 5 3/20/2017 130 5
ation@gmail.com 1 2/12/2018 150 1
ylove@gmail.com 1 6/15/2018 36 3
ylove@gmail.com 2 7/16/2018 41 3
ylove@gmail.com 3 1/21/2019 140 3
keria@gmail.com 1 8/10/2018 54 2
keria@gmail.com 2 11/16/2018 65 2
What I want to do is calculate the average time between purchases for each customer. So let's take customer ylove. The first purchase is on 6/15/18. The next one is on 7/16/18, so that's 31 days, and the next purchase is on 1/21/2019, so that is 189 days. The average time between orders would be 110 days.
But I have no idea how to make SQL look at the next row and calculate based on that, but then restart when it reaches a new customer.
Here is my query to get that table:
SELECT
F.CustomerEmail
,F.OrderCountBase
,F.Date_Created
,F.Total
,F.TotalOrdersBase
FROM #FullBase F
ORDER BY f.CustomerEmail
If anyone can give me some suggestions, that would be greatly appreciated.
And then maybe I can calculate value differences (in percentage). So for example, ylove spent $36 on their first order and $41 on their second, which is a 14% increase. Then their third order was $140, which is a 241% increase. So on average, this customer increased their order value by about 128%. Unrelated to SQL, but is this the correct way of calculating a metric like this?
Looking at your sample, you could try taking the difference between the min and max dates and dividing it by total - 1:
select email, datediff(day, min(Order_Date), max(Order_Date))/(total-1) as avg_days
from your_table
group by email, total
To also handle customers with only one order:
select email,
case when total-1 > 0 then
datediff(day, min(Order_Date), max(Order_Date))/(total-1)
else datediff(day, min(Order_Date), max(Order_Date)) end as avg_days
from your_table
group by email, total
The simplest formulation is:
select email,
datediff(day, min(Order_Date), max(Order_Date)) / nullif(total-1, 0) as avg_days
from t
group by email, total;
You can see this is the case. Consider three orders with od1, od2, and od3 as the order dates. The average is:
( (od2 - od1) + (od3 - od2) ) / 2
Check the arithmetic:
--> ( od2 - od1 + od3 - od2 ) / 2
--> ( od3 - od1 ) / 2
This pretty obviously generalizes to more orders.
Hence the max() minus min().
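The telescoping argument can be checked numerically with a short Python sketch, using ylove's three order dates from the question. The function name average_gap_days is made up for this example.

```python
from datetime import date


def average_gap_days(order_dates):
    """Average number of days between consecutive orders, or None with fewer than two."""
    dates = sorted(order_dates)
    if len(dates) < 2:
        return None  # plays the role of nullif(total-1, 0) in the query
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    # The gaps telescope: their sum equals the last date minus the first date.
    assert sum(gaps) == (dates[-1] - dates[0]).days
    return sum(gaps) / len(gaps)


ylove = [date(2018, 6, 15), date(2018, 7, 16), date(2019, 1, 21)]
print(average_gap_days(ylove))  # 110.0
```

The gaps are 31 and 189 days, which sum to the 220 days between the first and last order. Note that in SQL Server dividing one integer by another truncates, so the queries return whole days unless one operand is cast to a decimal type.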

How to Check Duplicate value SQL table?

I am using SQL Server and importing data from Excel. I have the following columns:
Entity ExpenseTypeCode Amount Description APSupplierID ExpenseReportID
12 001 5 Dinner 7171 90
12 001 6 Dinner 7171 90
12 001 5 Dinner 7273 90
12 001 5 Dinner 7171 95
12 001 5 Dinner 7171 90
I added sample data. Now I want to select the duplicate records, that is, the rows where every column has the same value. In the table above, the fifth row is a duplicate. I have more than four thousand records. How can I select the duplicates using a query?
If you want the values that are duplicated, then use group by:
select Entity, ExpenseTypeCode, Amount, Description, APSupplierID, ExpenseReportID, count(*) as numDuplicates
from t
group by Entity, ExpenseTypeCode, Amount, Description, APSupplierID, ExpenseReportID
having count(*) > 1;
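The same group-and-count idea can be tried on the sample rows with a small Python sketch; the function name duplicated_rows is an assumption for illustration. Each whole row acts as the grouping key, just as every column appears in the group by.

```python
from collections import Counter

# Sample rows from the question:
# (Entity, ExpenseTypeCode, Amount, Description, APSupplierID, ExpenseReportID)
rows = [
    (12, "001", 5, "Dinner", 7171, 90),
    (12, "001", 6, "Dinner", 7171, 90),
    (12, "001", 5, "Dinner", 7273, 90),
    (12, "001", 5, "Dinner", 7171, 95),
    (12, "001", 5, "Dinner", 7171, 90),
]


def duplicated_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


duplicates = duplicated_rows(rows)
print(duplicates)
```

Only the first and fifth rows match on every column, so a single key is reported with a count of 2.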

oracle sql query to get data from two tables of similar type

I have two tables ACTUAL AND ESTIMATE having unique column(sal_id, gal_id, amount, tax).
In ACTUAL table I have
actual_id, sal_id, gal_id, process_flag, amount, tax
1 111 222 N 100 1
2 110 223 N 200 2
In ESTIMATE table I have
estimate_id, sal_id, gal_id, process_flag, amount, tax
3 111 222 N 50 1
4 123 250 N 150 2
5 212 312 Y 10 1
Now I want a final table that holds each record from the ACTUAL table, and, where no record exists for a sal_id + gal_id combination in ACTUAL but one exists in ESTIMATE, the estimate record instead (with total being the sum of amount and tax).
In FINAL table
id sal_id, gal_id, actual_id, estimate_id, total
1 111 222 1 null 101 (since record exist in actual table for 111 222)
2 110 223 2 null 202 (since record exist in actual table for 110 223)
3 123 250 null 4 152 (since record not exist in actual table but estimate exist for 123 250)
(for 212 312 combination in estimate, since record already processed, no need to process again).
I am using Oracle 11g. Please help me write this logic in a single SQL query.
Thanks.
There are several ways to write this query. One way is to use join and coalesce:
select coalesce(a.sal_id, e.sal_id) as sal_id,
coalesce(a.gal_id, e.gal_id) as gal_id,
coalesce(a.amount + a.tax, e.amount + e.tax) as total
from actual a full outer join
estimate e
on a.sal_id = e.sal_id and
a.gal_id = e.gal_id
This assumes that sal_id/gal_id provides a unique match between the tables.
Since you are using Oracle, here is perhaps a clearer way of doing it:
select sal_id, gal_id, amount + tax as total
from (select t.*,
max(isactual) over (partition by sal_id, gal_id) as hasactual
from ((select 1 as isactual, a.*
from actual a
) union all
(select 0 as isactual, e.*
from estimate e
)
) t
) t
where isactual = 1 or hasactual = 0
This query uses a window function to determine whether there is an actual record with the matching sal_id/gal_id. The logic is to take all actuals and then all records that have no match in the actuals.
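The "actuals first, then unmatched estimates" rule can be traced on the question's sample rows with a minimal Python sketch. It assumes that total is amount + tax and that estimates with process_flag = 'Y' are skipped, as the question describes; the function name build_final is hypothetical.

```python
actual = [
    {"actual_id": 1, "sal_id": 111, "gal_id": 222, "process_flag": "N", "amount": 100, "tax": 1},
    {"actual_id": 2, "sal_id": 110, "gal_id": 223, "process_flag": "N", "amount": 200, "tax": 2},
]
estimate = [
    {"estimate_id": 3, "sal_id": 111, "gal_id": 222, "process_flag": "N", "amount": 50, "tax": 1},
    {"estimate_id": 4, "sal_id": 123, "gal_id": 250, "process_flag": "N", "amount": 150, "tax": 2},
    {"estimate_id": 5, "sal_id": 212, "gal_id": 312, "process_flag": "Y", "amount": 10, "tax": 1},
]


def build_final(actual, estimate):
    """Return (sal_id, gal_id, actual_id, estimate_id, total) rows."""
    final = []
    seen = set()
    # Every actual record is kept.
    for a in actual:
        seen.add((a["sal_id"], a["gal_id"]))
        final.append((a["sal_id"], a["gal_id"], a["actual_id"], None, a["amount"] + a["tax"]))
    # An estimate is used only when no actual exists for the same key.
    # Assumption: estimates already processed (flag 'Y') are left out.
    for e in estimate:
        key = (e["sal_id"], e["gal_id"])
        if key not in seen and e["process_flag"] == "N":
            final.append((e["sal_id"], e["gal_id"], None, e["estimate_id"], e["amount"] + e["tax"]))
    return final


final = build_final(actual, estimate)
for row in final:
    print(row)
```

The queries above do not filter on process_flag, so a condition such as process_flag = 'N' on the estimate side would be needed to reproduce that part of the requirement.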

Combining two tables in a query and creating new columns from that

I'm having issues with a query that I'm not ENTIRELY sure can be done with the way the database is set up. Basically, I'll be using two different tables in my query, let's say Transactions and Ticket Prices. They look like this (With some sample data):
TRANSACTIONS
Transaction ID | Ticket Quantity | Total Price | Salesperson | Ticket Price ID
5489 250 250 Jim 8765
5465 50 150 Jim 1258
7898 36 45 Ann 4774
Ticket Prices
Ticket Price ID | Quantity | Price | Bundle Name
8765 1 1 1 ticket, $1
4774 12 15 5 tickets, $10
1258 1 3 1 ticket, $3
What I'm aiming for is a report, that breaks down each salesperson's sales by bundle type. The resulting table should be something like this:
Sales Volume/Salesperson
Name | Bundle A | Bundle B | Bundle C | Total
Jim 250 0 50 300
Ann 0 36 0 36
I've been searching the web, and it seems the best way of getting it like this is using various subqueries, which works well as far as getting the column titles properly displayed, but it doesn't work as far as the actual numerical totals. It basically combines the data, giving each salesperson a total readout (In this example, both Jim and Ann would have 250 sales in Bundle A, 36 in Bundle B, etc). Is there any way I can write a query that will give me the proper results? Or even something at least close to it? Thanks for any input.
You can use the PIVOT statement in Oracle to do this. A query might look something like this:
WITH pivot_data AS (
SELECT t.salesperson,p.bundle_name,t.ticket_quantity
FROM ticket_prices p, transactions t
where t.ticket_price_id = p.ticket_price_id
)
SELECT *
FROM pivot_data
PIVOT (
sum(ticket_quantity) --<-- pivot_clause
FOR bundle_name --<-- pivot_for_clause
IN ('1 ticket, $1','5 tickets, $10', '1 ticket, $3' ) --<-- pivot_in_clause
);
which would give you results like this: