SQL: How to group rows with the condition that the sum of a field is limited to a certain value?

This is my table:
id user_id date balance
1 1 2016-05-10 10
2 1 2016-05-10 30
3 2 2017-04-24 5
4 2 2017-04-27 10
5 3 2017-11-10 40
I want to group the rows by user_id and sum the balance, but so that each group's sum is less than or equal to 30. I also need to display the minimum date in each group. It should look like this:
id balance date_start
1-1 10 2016-05-10
1-2 30 2016-05-10
2-1 15 2017-04-24
Excuse my English. Thanks.

You should be able to do this using GROUP BY and HAVING. Here is an example that may solve your case:
SELECT user_id, SUM(balance) AS balance, MIN(date) AS date_start
FROM your_table
GROUP BY user_id
HAVING SUM(balance) <= 30
This is a good way to do it in one query, but be careful when using it on very large tables.
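For reference, with the sample rows above this grouped query returns:
user_id balance date_start
2 15 2017-04-24
Users 1 and 3 both sum to 40, so they are filtered out by the HAVING clause.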

SQL How to calculate Average time between Order Purchases? (do sql calculations based on next and previous row)

I have a simple table that contains the customer email, their order count (so if this is their 1st order, 3rd, 5th, etc), the date that order was created, the value of that order, and the total order count for that customer.
Here is what my table looks like
Email Order Date Value Total
r2n1w#gmail.com 1 12/1/2016 85 5
r2n1w#gmail.com 2 2/6/2017 125 5
r2n1w#gmail.com 3 2/17/2017 75 5
r2n1w#gmail.com 4 3/2/2017 65 5
r2n1w#gmail.com 5 3/20/2017 130 5
ation#gmail.com 1 2/12/2018 150 1
ylove#gmail.com 1 6/15/2018 36 3
ylove#gmail.com 2 7/16/2018 41 3
ylove#gmail.com 3 1/21/2019 140 3
keria#gmail.com 1 8/10/2018 54 2
keria#gmail.com 2 11/16/2018 65 2
What I want to do is calculate the average time between purchases for each customer. So let's take customer ylove. The first purchase is on 6/15/18. The next one is 7/16/18, so that's 31 days, and the next purchase is on 1/21/2019, so that is 189 days. The average time between orders would be 110 days.
But I have no idea how to make SQL look at the next row and calculate based on that, and then restart when it reaches a new customer.
Here is my query to get that table:
SELECT
F.CustomerEmail
,F.OrderCountBase
,F.Date_Created
,F.Total
,F.TotalOrdersBase
FROM #FullBase F
ORDER BY f.CustomerEmail
If anyone can give me some suggestions, that would be greatly appreciated.
And then maybe I can calculate value differences (in percentage). So for example, ylove spent $36 on their first order and $41 on their second, which is a 13% increase. Then their third order was $140, which is a 341% increase. So on average, this customer increased their purchase order value by 177%. Unrelated to SQL, but is this the correct way of calculating a metric like this?
Looking at your sample, you could try using the difference between the min and max date divided by (total - 1):
select email, datediff(day, min(Order_Date), max(Order_Date)) / (total - 1) as avg_days
from your_table
group by email, total
And to also handle customers with only one order:
select email,
       case when total - 1 > 0
            then datediff(day, min(Order_Date), max(Order_Date)) / (total - 1)
            else datediff(day, min(Order_Date), max(Order_Date))
       end as avg_days
from your_table
group by email, total
The simplest formulation is:
select email,
       datediff(day, min(Order_Date), max(Order_Date)) / nullif(total - 1, 0) as avg_days
from t
group by email, total;
You can see this is the case. Consider three orders with od1, od2, and od3 as the order dates. The average is:
( (od2 - od1) + (od3 - od2) ) / 2
Check the arithmetic:
--> ( od2 - od1 + od3 - od2 ) / 2
--> ( od3 - od1 ) / 2
This pretty obviously generalizes to more orders.
Hence the max() minus min().
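As a concrete check with the ylove rows from the sample: the two gaps are 31 and 189 days, so the average is (31 + 189) / 2 = 110, and datediff(day, '2018-06-15', '2019-01-21') / (3 - 1) = 220 / 2 = 110 gives the same answer.
For the row-by-row comparison mentioned in the question, and for the percentage change between consecutive order values, LAG() is one option. The following is only a sketch, assuming SQL Server 2012+ and the column names from the #FullBase query above (it also assumes F.Total holds the order value, as the column order in that query suggests):
SELECT F.CustomerEmail,
       F.OrderCountBase,
       F.Date_Created,
       F.Total,
       -- days since this customer's previous order (NULL for the first order)
       DATEDIFF(day,
                LAG(F.Date_Created) OVER (PARTITION BY F.CustomerEmail ORDER BY F.Date_Created),
                F.Date_Created) AS days_since_prev_order,
       -- percentage change in order value versus the previous order
       100.0 * (F.Total - LAG(F.Total) OVER (PARTITION BY F.CustomerEmail ORDER BY F.Date_Created))
             / NULLIF(LAG(F.Total) OVER (PARTITION BY F.CustomerEmail ORDER BY F.Date_Created), 0)
           AS pct_change_vs_prev_order
FROM #FullBase F
ORDER BY F.CustomerEmail, F.Date_Created;
Averaging days_since_prev_order per customer gives the same 110 days for ylove as the min/max formula, since LAG() returns NULL for each customer's first order and AVG ignores NULLs.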

Count first occurring record per time period

In my table trips, I have two columns: created_at and user_id.
Unique users take many different trips. My goal is to count, per year-month, only the very first trip made by each user_id. I understand that in this case the min() function should be applied.
In a previous query, all unique users per year-month were aggregated:
SELECT to_char(created_at, 'YYYY-MM') as yyyymm, COUNT(DISTINCT user_id)
FROM trips
GROUP BY yyyymm
ORDER BY yyyymm;
Where in the above query should min() be integrated? In other words, instead of counting all unique user_ids per month, I only need to count the first occurrence of each unique user_id per month.
The sample input would look like:
> trips
user_id created_at
1 1 2015-08-07 07:18:21
2 2 2015-05-06 20:43:52
3 3 2015-05-06 20:53:54
4 1 2015-03-30 20:09:07
5 2 2015-10-01 18:28:32
6 3 2015-08-07 07:29:29
7 1 2015-08-28 13:45:44
8 2 2015-08-07 07:37:31
9 3 2015-03-30 20:14:04
10 1 2015-08-07 07:08:50
And the output would be:
count Y-m
1 0 2015-01
2 0 2015-02
3 2 2015-03
4 0 2015-04
5 1 2015-05
Because the first occurrences of user_id 1 and 3 were in March and the first occurrence of user_id 2 was in May
You can do this with 2 levels of aggregation. Get the min time per user_id and then count.
SELECT to_char(first_time, 'YYYY-MM'), count(*)
FROM (SELECT user_id, MIN(created_at) AS first_time
      FROM trips
      GROUP BY user_id
     ) t
GROUP BY to_char(first_time, 'YYYY-MM')
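With the sample input, the inner query yields first trips on 2015-03-30 for users 1 and 3 and on 2015-05-06 for user 2, so the outer query returns 2 for 2015-03 and 1 for 2015-05. If months with no first trips should still appear with a count of 0, as in the desired output, one option is to left-join against a generated month series. This is only a sketch, assuming PostgreSQL (which the to_char syntax suggests) and hard-coded bounds for the year:
SELECT to_char(m.month, 'YYYY-MM') AS yyyymm, count(f.user_id) AS first_trips
FROM generate_series(timestamp '2015-01-01', timestamp '2015-12-01', interval '1 month') AS m(month)
LEFT JOIN (SELECT user_id, date_trunc('month', MIN(created_at)) AS first_month
           FROM trips
           GROUP BY user_id
          ) f ON f.first_month = m.month
GROUP BY m.month
ORDER BY m.month;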

Get the latest price in SQLite

I have a table which contains _id, underSubHeadId, wefDate, and price.
Whenever a product is created or its price is edited, an entry is also made in this table.
What I want is: given a date, get the latest price for each distinct underSubHeadId on or before that date.
_id underSubHeadId wefDate price
1 1 2016-11-01 5
2 2 2016-11-01 50
3 1 2016-11-25 500
4 3 2016-11-01 20
5 4 2016-11-11 30
6 5 2016-11-01 40
7 3 2016-11-20 25
8 5 2016-11-15 52
If I enter 2016-11-20 as the date, I should get:
underSubHeadId price
1 5
2 50
3 25
4 30
5 52
I have achieved the result using the ROW_NUMBER function in SQL Server, but I want this result in SQLite, which doesn't have such a function.
Also, if a date like 2016-10-25 (which has no earlier entries) is entered, I want the price from the first (earliest) date instead.
For example, for underSubHeadId 1 we would get a price of 5, since the nearest (first) entry is on 2016-11-01.
This is the query that works fine in SQL Server, but I want it for SQLite:
select underSubHeadId, price
from (select underSubHeadId, price,
             ROW_NUMBER() OVER (partition by underSubHeadId order by wefDate desc) rn
      from rates
      where wefDate <= '2016-11-20'
     ) newTable
where newTable.rn = 1
Thank You
This is a little tricky, but here is one way:
select t.*
from t
where t.wefDate = (select max(t2.wefDate)
                   from t t2
                   where t2.underSubHeadId = t.underSubHeadId and
                         t2.wefDate <= '2016-11-20'
                  );
A shorter SQLite-specific alternative relies on SQLite's documented handling of bare columns alongside a single max() aggregate: the other selected columns are taken from the row that holds the maximum value, i.e. the latest entry per group.
select underSubHeadId, price, max(wefDate)
from t
where wefDate <= '2016-11-20'
group by underSubHeadId;
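The question also asks for a fallback when the entered date is earlier than every entry (for example 2016-10-25), in which case the first recorded price should be returned. One way to sketch that, following the same correlated-subquery pattern as above, is to fall back to the minimum wefDate when no earlier row exists:
select t.*
from t
where t.wefDate = coalesce(
                   (select max(t2.wefDate)
                    from t t2
                    where t2.underSubHeadId = t.underSubHeadId and
                          t2.wefDate <= '2016-10-25'),
                   (select min(t2.wefDate)
                    from t t2
                    where t2.underSubHeadId = t.underSubHeadId)
                  );
Also worth noting: SQLite 3.25.0 and later supports window functions, so on a current SQLite the original ROW_NUMBER() query works as-is.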

How to get primary key of selected column in row in SQL Server 2012

I have the below table. I want to select 30 pieces of Product Code 011A from the table. Each row contains a number of pieces, in the column PCS. I want to select the 30 pieces in FIFO order based on date, and return the number of pieces selected from each row, so I'll need to know the primary key value for each row that has pieces selected from it. For example, from this data:
Key Product Code PCS Date
1 011A 10 2015-07-01
2 011B 20 2015-07-01
3 011C 20 2015-07-01
4 011A 12 2015-07-02
5 011A 40 2015-07-03
6 011D 60 2015-07-04
7 011A 20 2015-07-04
Selecting 30 pieces of product code "011A" should give an output table like:
Key Product Code PCS DATE
1 011A 10 2015-07-01
4 011A 12 2015-07-02
5 011A 8 2015-07-03
You can see that the total number of pieces is 30, and that the maximum number of pieces were selected from the rows with primary key 1 and 4, because they're the first dates. Only 8 were selected from row #5, because it's the next in date order, and only 8 is needed to reach 30 total. Row #7 wasn't needed, so it doesn't show up in the result.
How can I write a query to accomplish this?
In SQL Server 2012, you can use cumulative sum:
select t.*
from (select t.*,
             sum(pcs) over (partition by productcode order by date) as cumepcs
      from thetable t
      where productcode = '011A'
     ) t
where cumepcs - pcs < 30;
Doing a cumulative sum in SQL Server 2008 is a bit more work. Here is one way:
select t.*
from (select t.*,
             (select sum(t2.pcs)
              from thetable t2
              where t2.productcode = t.productcode and
                    t2.date <= t.date
             ) as cumepcs
      from thetable t
      where productcode = '011A'
     ) t
where cumepcs - pcs < 30;
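To see why the filter is cumepcs - pcs < 30: cumepcs - pcs is the running total before the current row, so a row is kept as long as part of the 30 pieces is still unallocated when that row is reached. On the sample 011A rows, cumepcs is 10, 22, 62, 82 and cumepcs - pcs is 0, 10, 22, 62, so keys 1, 4 and 5 pass the filter and key 7 does not, matching the desired output.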
EDIT:
If you want the allocated amounts from each bucket, you need to tweak the size of the last bucket. Change the select to:
select t.*,
(case when cumepcs <= 30 then pcs
else 30 - (cumepcs - pcs)
end) as allocated_pcs
. . .
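Put together, a full version of the allocation query might look like the following; this is just a sketch that combines the SQL Server 2012 variant above with the tweaked expression, keeping the thetable name and the 30-piece target:
select t.*,
       (case when cumepcs <= 30 then pcs
             else 30 - (cumepcs - pcs)
        end) as allocated_pcs
from (select t.*,
             sum(pcs) over (partition by productcode order by date) as cumepcs
      from thetable t
      where productcode = '011A'
     ) t
where cumepcs - pcs < 30;
On the sample data this allocates 10, 12 and 8 pieces from keys 1, 4 and 5, i.e. the requested 30 in total.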

Performing calculations based on dates in Oracle

I have the following tables.
Accounts(account_number*,balance)
Transactions(account_number*,transaction_number*,date,amount,type)
Date is the date that the transaction happened. Amount is the amount of the transaction,
and it can be positive or negative depending on the type (withdrawal -, deposit +). I think the type is irrelevant here, as the amount is already signed accordingly.
I need to write a query which returns the account_number of the accounts that have had a negative balance at least once.
Here's some sample data from the Transactions table, ordered by account_number and date.
account_number transaction_number date amount type
--------------------------------------------------------------------
1 2 02/03/2013 -20000 withdrawal
1 3 03/15/2013 300 deposit
1 1 01/01/2013 100 deposit
2 1 04/15/2013 235236 deposit
3 1 06/15/2013 500 deposit
4 1 03/01/2013 10 deposit
4 2 04/01/2013 80 deposit
5 1 11/11/2013 10000 deposit
5 2 12/11/2013 20000 deposit
5 3 12/13/2013 -10002 withdrawal
6 1 03/15/2013 102300 deposit
7 1 03/15/2013 100 deposit
8 1 08/08/2013 133990 deposit
9 1 05/09/2013 10000 deposit
9 2 06/01/2013 300 deposit
9 3 10/11/2013 23 deposit
Something like this, with an analytic function to keep a running balance for each account:
SELECT DISTINCT account_number
FROM (SELECT account_number,
             SUM(amount) OVER (PARTITION BY account_number ORDER BY date) AS running_balance
      FROM transactions
     ) x
WHERE running_balance < 0
Explanation:
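As a worked check against the sample data: for account 1 the inner query's running_balance, ordered by date, is 100, -19900 and -19600, so the outer filter keeps it and DISTINCT collapses it to one row. Every other account's running balance stays positive (account 5, for instance, runs 10000, 30000, 19998), so account 1 is the only account_number returned.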
It is using an analytic function: the PARTITION BY breaks the table into groups identified by the account number. Within each group, the data is ordered by date. Then there is a walk through each element in the ordered group and the SUM function is applied (by default summing everything from the beginning of the group to the current row). This gives you a running balance. Just run the inner query on its own and take a look at the output, then read a bit about analytic queries. They are pretty cool.