I have a table like this:
cust_id acc_no trans_id trans_type amount
1111 1001 10 credit 2000.0
1111 1001 11 credit 1000.0
1111 1001 12 debit 1000.0
2222 1002 13 credit 2000.0
2222 1002 14 debit 1000.0
I want a Hive or SQL query that calculates the running balance after every transaction made by a customer.
I want output as follows:
cust_id acc_no trans_id trans_type amount balance
1111.0 1001.0 10.0 credit 2000.0 2000.0
1111.0 1001.0 11.0 credit 1000.0 3000.0
1111.0 1001.0 12.0 debit 1000.0 2000.0
2222.0 1002.0 13.0 credit 2000.0 2000.0
2222.0 1002.0 14.0 debit 1000.0 1000.0
I've tried
SELECT *
FROM (SELECT cust_id,
acc_no,
trans_id,
trans_type,
amount,
CASE
WHEN Trim(trans_type) = 'credit' THEN ball =
Trim(bal) + Trim(amt)
ELSE ball = Trim(bal) - Trim(amt)
end
FROM ban) l;
This query will do the trick :
SELECT t1.cust_id,t1.acc_no,t1.trans_id,t1.trans_type,t1.amount,
sum(t2.amount*case when t2.trans_type = 'credit' then 1
else -1 end) as balance
FROM Table1 t1
INNER JOIN Table1 t2 ON t1.cust_id = t2.cust_id AND
t1.acc_no = t2.acc_no AND
t1.trans_id >= t2.trans_id
GROUP BY t1.cust_id,t1.acc_no,t1.trans_id,t1.trans_type,t1.amount
See SQLFIDDLE : http://www.sqlfiddle.com/#!2/3b5d8/15/0
EDIT :
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE Table1
(`cust_id` int, `acc_no` int, `trans_id` int,
`trans_type` varchar(6), `amount` int)
;
INSERT INTO Table1
(`cust_id`, `acc_no`, `trans_id`, `trans_type`, `amount`)
VALUES
(1111, 1001, 10, 'credit', 2000.0),
(1111, 1001, 11, 'credit', 1000.0),
(1111, 1001, 12, 'debit', 1000.0),
(2222, 1002, 13, 'credit', 2000.0),
(2222, 1002, 14, 'debit', 1000.0)
;
Query 1:
SELECT t1.cust_id,t1.acc_no,t1.trans_id,t1.trans_type,t1.amount,
sum(t2.amount*case when t2.trans_type = 'credit' then 1
else -1 end) as balance
FROM Table1 t1
INNER JOIN Table1 t2 ON t1.cust_id = t2.cust_id AND
t1.acc_no = t2.acc_no AND
t1.trans_id >= t2.trans_id
GROUP BY t1.cust_id,t1.acc_no,t1.trans_id,t1.trans_type,t1.amount
Results:
| CUST_ID | ACC_NO | TRANS_ID | TRANS_TYPE | AMOUNT | BALANCE |
|---------|--------|----------|------------|--------|---------|
| 1111 | 1001 | 10 | credit | 2000 | 2000 |
| 1111 | 1001 | 11 | credit | 1000 | 3000 |
| 1111 | 1001 | 12 | debit | 1000 | 2000 |
| 2222 | 1002 | 13 | credit | 2000 | 2000 |
| 2222 | 1002 | 14 | debit | 1000 | 1000 |
A simple solution is to sign each transaction (+ or -) based on trans_type and then compute the cumulative sum using a window function.
SELECT cust_id,
acc_no,
trans_id,
trans_type,
amount,
Sum (real_amount)
OVER (PARTITION BY cust_id, acc_no ORDER BY trans_id) AS balance
FROM (SELECT cust_id,
acc_no,
trans_id,
trans_type,
amount,
( CASE trans_type
WHEN 'credit' THEN amount
WHEN 'debit' THEN amount *- 1
END ) AS real_amount
FROM test) t
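For the sample data in the question, the running balance should restart for each account and grow in trans_id order. The following is only a minimal sketch to check that: it uses Python's built-in sqlite3 module as a test harness (an assumption, since the question targets Hive; SQLite 3.25 or later is needed for window functions), and the table name test is taken from the query above.

```python
import sqlite3

# In-memory SQLite database, used only as a harness for the SQL (assumption:
# the original target is Hive). Window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (cust_id INT, acc_no INT, trans_id INT, "
    "trans_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [
        (1111, 1001, 10, "credit", 2000.0),
        (1111, 1001, 11, "credit", 1000.0),
        (1111, 1001, 12, "debit", 1000.0),
        (2222, 1002, 13, "credit", 2000.0),
        (2222, 1002, 14, "debit", 1000.0),
    ],
)

# Credits count as positive and debits as negative; any other trans_type is
# ignored (NULL). The sum runs per account in trans_id order.
query = """
SELECT cust_id, acc_no, trans_id, trans_type, amount,
       SUM(CASE trans_type
               WHEN 'credit' THEN amount
               WHEN 'debit'  THEN -amount
           END)
           OVER (PARTITION BY cust_id, acc_no ORDER BY trans_id) AS balance
FROM test
ORDER BY cust_id, acc_no, trans_id
"""
rows = conn.execute(query).fetchall()
balances = [row[-1] for row in rows]
for row in rows:
    print(row)
```

Partitioning by both cust_id and acc_no is a design choice here: it keeps the balances separate if one customer ever holds more than one account.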
You could do this easily with a view. Calculating the balance directly on the table is possible, but it leads to performance and scalability issues (the database slows down as the table grows). With a view, the calculation is performed as needed; if you index the view, you can keep the balances up to date without hurting the performance of the transaction table.
If you really insist on keeping it in the transaction table itself, you could use a computed column that calls a user-defined function to determine the current balance. However, this depends largely on the specific SQL backend you're using.
Here's a basic SELECT Statement which calculates the current balance by Account:
select
acc_no,
sum(case trans_type
when 'credit' then amount
when 'debit' then amount * -1
end) as Amount
from Transactions
group by acc_no
You can use a window function:
select cust_id,
acc_no, trans_id, trans_type, amount,
sum(pre_balance) over (partition by cust_id order by trans_id) as balance
from
(select cust_id, acc_no, trans_id, trans_type,
amount,
amount as pre_balance from test
where trans_type = 'credit'
union
select cust_id, acc_no, trans_id, trans_type,
amount, -amount as pre_balance from
test where trans_type = 'debit'
order by trans_id) as sub;
with current_balances as (
SELECT
id,
user_id,
SUM(amount) OVER (PARTITION BY user_id ORDER BY created ASC) as current_balance
FROM payments_transaction pt
ORDER BY created DESC
)
SELECT
pt.id,
amount,
pt.user_id,
cb.current_balance as running_balance
FROM
payments_transaction pt
INNER JOIN
current_balances cb
ON pt.id = cb.id
ORDER BY created DESC
LIMIT 10;
This works efficiently for large result sets and does not break when you filter or limit the outer query, because the running balance is computed over the full history before the LIMIT is applied. Note that if you select only one user or a subset of users, you should add the user_id filter to both the current_balances CTE and the main SELECT to avoid a full table scan.
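To see why the balance has to be computed before limiting, here is a small sketch that compares the CTE approach with a naive query that limits first and sums afterwards. It runs on Python's built-in sqlite3 module (an assumption, chosen only so the example is self-contained); the table and column names come from the query above, while the three sample rows are made up for illustration.

```python
import sqlite3

# Harness only: in-memory SQLite (>= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments_transaction "
    "(id INTEGER PRIMARY KEY, user_id INT, amount INT, created TEXT)"
)
# Hypothetical sample rows for a single user.
conn.executemany(
    "INSERT INTO payments_transaction VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, "2024-01-01"),
        (2, 1, -30, "2024-01-02"),
        (3, 1, 50, "2024-01-03"),
    ],
)

# Running balance computed over the full history, then limited.
cte_query = """
WITH current_balances AS (
    SELECT id,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY created) AS current_balance
    FROM payments_transaction
)
SELECT pt.id, cb.current_balance
FROM payments_transaction pt
JOIN current_balances cb ON pt.id = cb.id
ORDER BY pt.created DESC
LIMIT 2
"""

# Naive variant: limit first, then sum. The earlier rows are lost,
# so the "balance" only reflects the rows that survived the LIMIT.
naive_query = """
SELECT id,
       SUM(amount) OVER (PARTITION BY user_id ORDER BY created) AS running_balance
FROM (SELECT * FROM payments_transaction ORDER BY created DESC LIMIT 2) AS recent
ORDER BY created DESC
"""

cte_rows = conn.execute(cte_query).fetchall()
naive_rows = conn.execute(naive_query).fetchall()
print("CTE:  ", cte_rows)
print("naive:", naive_rows)
```

The two result lists differ because the naive query never sees the first transaction.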
Table (Transaction)
-
"id" "amount" "is_credit"
1 10000 1
2 2000 0
3 5000 1
Query :
SELECT *
FROM (
SELECT id, amount, is_credit, SUM(CASE WHEN is_credit = 1 THEN amount ELSE -amount END) OVER (ORDER BY id) AS balance
FROM `Transaction`
) AS t
ORDER BY id;
Output :
"id" "amount" "is_credit" "balance"
1 10000 1 10000
2 2000 0 8000
3 5000 1 13000
I'm currently using a UNION of two SELECT statements. While I'm getting the correct data, it's not in the shape I need when it comes time to use it in a front-end view.
I'm currently using this query:
SELECT
T.employee as employee,
'Orders' as TYPE,
SUM(CASE WHEN t.order_count < QUANT THEN t.order_count ELSE QUANT END) as DATA
FROM schemaOne.order_list T
WHERE t.order_date > CURRENT_DATE - 35 DAYS
group by t.employee
UNION
select
T.employee as employee,
'Sales' as TYPE,
sum(price * quant) as DATA
from schemaOne.sales T
WHERE T.sales_date > CURRENT_DATE - 35 DAYS
group by T.employee
order by data desc;
With these dummy tables as examples, I get the following result:
order_list
employee | order_count | quant | order_date
--------------------------------------------------
123 | 5 | 1 | 2022-03-02
456 | 1 | 5 | 2022-03-02
sales
employee | price | quant | order_date
--------------------------------------------------
123 | 500 | 1 | 2022-03-02
456 | 1000 | 1 | 2022-03-02
Result
employee | type | data
------------------------------------------
123 Orders 1
123 Sales 500
456 Orders 5
456 Sales 1000
Is there a way to use a UNION but alter it so that I can instead get a single row for each employee and just get rid of the type/data columns and instead set each piece of data to the desired column (the type would instead be the column name ) like so:
Desired Result
employee | Orders | Sales
---------------------------------
123 | 1 | 500
456 | 5 | 1000
Try adding an outer query:
select employee,
MAX(case when type='Orders' then data end) as orders ,
MAX(case when type='Sales' then data end) as Sales
from (
SELECT T.employee as employee,
'Orders' as TYPE,
SUM(CASE WHEN t.order_count < QUANT THEN t.order_count ELSE QUANT END) as DATA
FROM schemaOne.order_list T
WHERE t.order_date > CURRENT_DATE - 35 DAYS
group by t.employee
UNION
select T.employee as employee,
'Sales' as TYPE,
sum(price * quant) as DATA
from schemaOne.sales T
WHERE T.sales_date > CURRENT_DATE - 35 DAYS
group by T.employee
) as t1
GROUP BY employee;
Note that I removed order by data desc; inside the derived table it has no effect on the final result.
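The pivot step on its own (one MAX(CASE ...) per target column, grouped by employee) can be tried in isolation. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module as a harness, and a hypothetical table named unioned that stands in for the UNION subquery, loaded with the rows from the question's Result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "unioned" is a hypothetical stand-in for the UNION subquery's output.
conn.execute("CREATE TABLE unioned (employee INT, type TEXT, data INT)")
conn.executemany(
    "INSERT INTO unioned VALUES (?, ?, ?)",
    [
        (123, "Orders", 1),
        (123, "Sales", 500),
        (456, "Orders", 5),
        (456, "Sales", 1000),
    ],
)

# Conditional aggregation: each CASE picks out one type, MAX collapses the
# NULLs from the non-matching rows, GROUP BY leaves one row per employee.
pivot_query = """
SELECT employee,
       MAX(CASE WHEN type = 'Orders' THEN data END) AS orders,
       MAX(CASE WHEN type = 'Sales'  THEN data END) AS sales
FROM unioned
GROUP BY employee
ORDER BY employee
"""
pivoted = conn.execute(pivot_query).fetchall()
for row in pivoted:
    print(row)
```

Because each employee has at most one row per type here, MAX simply returns that row's value; SUM would work equally well.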
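You can join the tables on the employee column, as in the query below. Note that this assumes at most one matching row per employee and date in each table; otherwise the join multiplies rows and inflates both sums.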
SELECT o.employee,
SUM(CASE
WHEN o.order_count < o.quant THEN
o.order_count
ELSE
o.quant
END) AS Orders,
SUM(s.price * s.quant) AS Sales
FROM schemaOne.order_list o
JOIN schemaOne.sales s
ON s.employee = o.employee
AND s.sales_date = o.order_date
WHERE o.order_date > current_date - 35 DAYS
GROUP BY o.employee
Here's an example "transactions" table where each row is a record of an amount and the date of the transaction.
+--------+------------+
| amount | date |
+--------+------------+
| 1000 | 2020-01-06 |
| -10 | 2020-01-14 |
| -75 | 2020-01-20 |
| -5 | 2020-01-25 |
| -4 | 2020-01-29 |
| 2000 | 2020-03-10 |
| -75 | 2020-03-12 |
| -20 | 2020-03-15 |
| 40 | 2020-03-15 |
| -50 | 2020-03-17 |
| 200 | 2020-10-10 |
| -200 | 2020-10-10 |
+--------+------------+
The goal is to return one column, "balance", with the balance of all transactions. The only catch is that there is a monthly fee of $5 for each month in which there are not at least THREE payment transactions (represented by a negative value in the amount column) that total at least $100.
In the example, the only month without a $5 fee is March, because there were 3 payments (negative amount transactions) that totaled $145. So the final balance would be $2,746: the sum of the amounts, $2,801, minus $55 in monthly fees (11 months x $5).
I'm not a Postgres expert by any means, so any pointers on how to get started, or on which parts of the Postgres documentation would help me most with this problem, would be much appreciated.
The expected output would be:
+---------+
| balance |
+---------+
| 2746 |
+---------+
This is rather complicated. You can calculate the total span of months and then subtract out the one where the fee is cancelled:
select amount, (extract(year from age) * 12 + extract(month from age)), cnt,
amount - 5 *( extract(year from age) * 12 + extract(month from age) + 1 - cnt) as balance
from (select sum(amount) as amount,
age(max(date), min(date)) as age
from transactions t
) t cross join
(select count(*) as cnt
from (select date_trunc('month', date) as yyyymm, count(*) as cnt, sum(amount) as amount
from transactions t
where amount < 0
group by yyyymm
having count(*) >= 3 and sum(amount) <= -100
) tt
) tt;
Here is a db<>fiddle.
This calculates 2756, which appears to follow your rules. If you want the fee applied over the full year, use 12 in place of the month count derived from age(); with the sample data that gives 2801 - 5 * (12 - 1) = 2746.
I would first left join with a generate_series that represents the months you are interested in (in this case, all in the year 2020). That adds the missing months with a balance of 0.
Then I aggregate these values per month, and also compute the sum of the negative amounts per month and the number of negative transactions.
Finally, I calculate the grand total and subtract the fee for each month that does not meet the criteria.
SELECT sum(amount_per_month) -
sum(5) FILTER (WHERE negative_per_month > -100 OR negative_count < 3)
FROM (SELECT sum(amount) AS amount_per_month,
sum(amount) FILTER (WHERE amount < 0) AS negative_per_month,
month_start,
count(*) FILTER (WHERE amount < 0) AS negative_count
FROM (SELECT coalesce(t.amount, 0) AS amount,
coalesce(date_trunc('month', CAST (t.date AS timestamp)), dates.d) AS month_start
FROM generate_series(
TIMESTAMP '2020-01-01',
TIMESTAMP '2020-12-01',
INTERVAL '1 month'
) AS dates (d)
LEFT JOIN transactions AS t
ON dates.d = date_trunc('month', CAST (t.date AS timestamp))
) AS gaps_filled
GROUP BY month_start
) AS sums_per_month;
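The same three steps (fill in the missing months, aggregate per month, subtract the fee for months that miss the criteria) can be checked against the sample data. This is only a sketch under assumptions: it runs on Python's built-in sqlite3 module rather than Postgres, so a recursive CTE stands in for generate_series and CASE expressions stand in for FILTER, and it matches on the month number alone because all sample rows fall in 2020.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE transactions (amount INT, "date" TEXT)')
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        (1000, "2020-01-06"), (-10, "2020-01-14"), (-75, "2020-01-20"),
        (-5, "2020-01-25"), (-4, "2020-01-29"), (2000, "2020-03-10"),
        (-75, "2020-03-12"), (-20, "2020-03-15"), (40, "2020-03-15"),
        (-50, "2020-03-17"), (200, "2020-10-10"), (-200, "2020-10-10"),
    ],
)

query = """
WITH RECURSIVE months(m) AS (          -- months 1..12 (replaces generate_series)
    SELECT 1
    UNION ALL
    SELECT m + 1 FROM months WHERE m < 12
),
per_month AS (
    SELECT months.m AS m,
           COALESCE(SUM(t.amount), 0) AS amount_per_month,
           COALESCE(SUM(CASE WHEN t.amount < 0 THEN t.amount END), 0)
               AS negative_per_month,
           COUNT(CASE WHEN t.amount < 0 THEN 1 END) AS negative_count
    FROM months
    LEFT JOIN transactions t
           ON CAST(strftime('%m', t."date") AS INTEGER) = months.m
    GROUP BY months.m
)
SELECT SUM(amount_per_month)
       - SUM(CASE WHEN negative_per_month > -100 OR negative_count < 3
                  THEN 5 ELSE 0 END) AS balance
FROM per_month
"""
balance = conn.execute(query).fetchone()[0]
print(balance)
```

Only March has three or more payments totaling at least 100, so eleven months are charged the fee.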
Here is my solution, using CTEs.
DB fiddle here.
balance
2746
Code:
WITH monthly_credited_transactions
AS (SELECT Date_part('month', date) AS cred_month,
Sum(CASE
WHEN amount < 0 THEN Abs(amount)
ELSE 0
END) AS credited_amount,
Sum(CASE
WHEN amount < 0 THEN 1
ELSE 0
END) AS credited_cnt
FROM transactions
GROUP BY 1),
credit_fee
AS (SELECT ( 12 - Count(1) ) * 5 AS fee,
1 AS id
FROM monthly_credited_transactions
WHERE credited_amount >= 100
AND credited_cnt >= 3),
trans
AS (SELECT Sum(amount) AS amount,
1 AS id
FROM transactions)
SELECT amount - fee AS balance
FROM trans a
LEFT JOIN credit_fee b
ON a.id = b.id
The query below worked for me (adapted from @GordonLinoff's answer):
select CAST(totalamount - 5 *(12 - extract(month from firstt) + 1 - nofeemonths) AS int) as balance
from (select sum(amount) as totalamount, min(date) as firstt
from transactions t
) t cross join
(select count(*) as nofeemonths
from (select date_trunc('month', date) as months, count(*) as nofeemonths, sum(amount) as totalamount
from transactions t
where amount < 0
group by months
having count(*) >= 3 and sum(amount) <= -100
) tt
) tt;
Here firstt is the date of the first transaction in that year, and 12 - extract(month from firstt) + 1 - nofeemonths is the number of months for which the fee of 5 is charged.
I am trying to find the customer count and sales by the type of customer (New and Returning) and the number of times they have purchased.
txn_date   Customer_ID  Transaction_Number  Sales  Reference (not in the SQL table)  Customer type (not in the SQL table)
1/2/2019   1            12345               $10    Second Purchase                   SLS Repeat
4/3/2018   1            65890               $20    First Purchase                    SLS Repeat
3/22/2019  3            64453               $30    First Purchase                    SLS new
4/3/2019   4            88567               $20    First Purchase                    SLS new
5/21/2019  4            85446               $15    Second Purchase                   SLS new
1/23/2018  5            89464               $40    First Purchase                    SLS Repeat
4/3/2019   5            99674               $30    Second Purchase                   SLS Repeat
4/3/2019   6            32224               $20    Second Purchase                   SLS Repeat
1/23/2018  6            46466               $30    First Purchase                    SLS Repeat
1/20/2018  7            56558               $30    First Purchase                    SLS new
I am using the below code to get the aggregate sales and customer count for the total customers:
select seqnum, count(distinct customer_id), sum(sales) from (
select co.*,
row_number() over (partition by customer_id order by txn_date) as seqnum
from somya co)
group by seqnum
order by seqnum;
I want to get the same data by the customer type:
for example for the new customers my result should show:
New Customers Customer_Count Sum(Sales)
1st Purchase 3 $80
2nd Purchase 1 $15
Returning Customers Customer_Count Sum(Sales)
1st Purchase 3 $90
2nd Purchase 3 $60
I am trying the below query to get the data for new and repeat customers:
New Customers:
select seqnum, count(distinct customer_id), sum(sales)
from (
select co.*,
row_number() over (partition by customer_id order by trunc(txn_date)) as seqnum,
MIN (TRUNC (TXN_DATE)) OVER (PARTITION BY customer_id) as MIN_TXN_DATE
from somya co
)
where MIN_TXN_DATE between '01-JAN-19' and '31-DEC-19'
group by seqnum
order by seqnum asc;
Returning Customers:
select seqnum, count(distinct customer_id), sum(sales)
from (
select co.*,
row_number() over (partition by customer_id order by trunc(txn_date)) as seqnum,
MIN (TRUNC (TXN_DATE)) OVER (PARTITION BY customer_id) as MIN_TXN_DATE
from somya co
)
where MIN_TXN_DATE <'01-JAN-19'
group by seqnum
order by seqnum asc;
I am not able to figure out what is wrong with my query or if there is a problem with my logic.
This is just sample data. I have transactions from all years in my database, so I need to restrict the transaction date in the query. But as soon as I narrow the data down by transaction date, the repeat-customer query returns nothing and the new-customer query returns the total number of customers for that period.
If I understand correctly, you need to know the first time someone becomes a customer. And then use this:
select (case when first_year < 2019 then 'returning' else 'new' end) as custtype,
seqnum, count(*), sum(sales)
from (select co.*,
row_number() over (partition by customer_id, extract(year from txn_date) order by txn_date) as seqnum,
min(extract(year from txn_date)) over (partition by customer_id) as first_year
from somya co
) s
where txn_date >= date '2019-01-01' and
txn_date < date '2020-01-01'
group by (case when first_year < 2019 then 'returning' else 'new' end),
seqnum
order by custtype, seqnum;
You can categorize your sales data to assign a customer type and a purchase sequence using windowing functions, like this:
SELECT sd.txn_date,
sd.customer_id,
sd.transaction_number,
sd.sales,
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01'
THEN 'Repeat'
ELSE 'New' END customer_type,
row_number() over ( partition by customer_id order by txn_date) purchase_sequence
FROM sales_data sd
+-----------+-------------+--------------------+-------+---------------+-------------------+
| TXN_DATE | CUSTOMER_ID | TRANSACTION_NUMBER | SALES | CUSTOMER_TYPE | PURCHASE_SEQUENCE |
+-----------+-------------+--------------------+-------+---------------+-------------------+
| 03-APR-18 | 1 | 65890 | 20 | Repeat | 1 |
| 02-JAN-19 | 1 | 12345 | 10 | Repeat | 2 |
| 22-MAR-19 | 3 | 64453 | 30 | New | 1 |
| 03-APR-19 | 4 | 88567 | 20 | New | 1 |
| 21-MAY-19 | 4 | 85446 | 15 | New | 2 |
| 23-JAN-18 | 5 | 89464 | 40 | Repeat | 1 |
| 03-APR-19 | 5 | 99674 | 30 | Repeat | 2 |
| 23-JAN-18 | 6 | 46466 | 30 | Repeat | 1 |
| 03-APR-19 | 6 | 32224 | 20 | Repeat | 2 |
| 20-JAN-18 | 7 | 56558 | 30 | New | 1 |
+-----------+-------------+--------------------+-------+---------------+-------------------+
Then, you can wrap that in a common table expression (aka "WITH" clause) and summarize by the customer type and purchase sequence:
WITH categorized_sales_data AS (
SELECT sd.txn_date,
sd.customer_id,
sd.transaction_number,
sd.sales,
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01' AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01' THEN 'Repeat' ELSE 'New' END customer_type,
row_number() over ( partition by customer_id order by txn_date) purchase_sequence
FROM sales_data sd)
SELECT customer_type, purchase_sequence, count(*), sum(sales)
FROM categorized_sales_data
group by customer_type, purchase_sequence
order by customer_type, purchase_sequence
+---------------+-------------------+----------+------------+
| CUSTOMER_TYPE | PURCHASE_SEQUENCE | COUNT(*) | SUM(SALES) |
+---------------+-------------------+----------+------------+
| New | 1 | 3 | 80 |
| New | 2 | 1 | 15 |
| Repeat | 1 | 3 | 90 |
| Repeat | 2 | 3 | 60 |
+---------------+-------------------+----------+------------+
Here's a full SQL with test data:
with sales_data (txn_date, Customer_ID, Transaction_Number, Sales ) as (
SELECT TO_DATE('1/2/2019','MM/DD/YYYY'), 1, 12345, 10 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2018','MM/DD/YYYY'), 1, 65890, 20 FROM DUAL UNION ALL
SELECT TO_DATE('3/22/2019','MM/DD/YYYY'), 3, 64453, 30 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 4, 88567, 20 FROM DUAL UNION ALL
SELECT TO_DATE('5/21/2019','MM/DD/YYYY'), 4, 85446, 15 FROM DUAL UNION ALL
SELECT TO_DATE('1/23/2018','MM/DD/YYYY'), 5, 89464, 40 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 5, 99674, 30 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 6, 32224, 20 FROM DUAL UNION ALL
SELECT TO_DATE('1/23/2018','MM/DD/YYYY'), 6, 46466, 30 FROM DUAL UNION ALL
SELECT TO_DATE('1/20/2018','MM/DD/YYYY'), 7, 56558, 30 FROM DUAL ),
-- Query starts here
/* WITH */ categorized_sales_data AS (
SELECT sd.txn_date,
sd.customer_id,
sd.transaction_number,
sd.sales,
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01' AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01' THEN 'Repeat' ELSE 'New' END customer_type,
row_number() over ( partition by customer_id order by txn_date) purchase_sequence
FROM sales_data sd)
SELECT customer_type, purchase_sequence, count(*), sum(sales)
FROM categorized_sales_data
group by customer_type, purchase_sequence
order by customer_type, purchase_sequence
Response to comment from OP
all the customers whose first purchase date is in 2019 would be a new customer. Any customer who has transacted in 2019 but their first purchase date is before 2019 would be a repeat customer
So, change
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01'
THEN 'Repeat' ELSE 'New' END customer_type
to
case when min(txn_date) over ( partition by customer_id )
BETWEEN DATE '2019-01-01' AND DATE '2020-01-01' - INTERVAL '1' SECOND
THEN 'New' ELSE 'Repeat' END customer_type
i.e., if and only if a customer's first purchase was in 2019 then they are "new".
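A quick way to check the revised rule is to classify each customer from the sample data. The sketch below is an illustration, not the Oracle query itself: it uses Python's built-in sqlite3 module as a harness, stores dates as ISO text (so BETWEEN compares strings), and loads a sales_data table with the question's ten rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data "
    "(txn_date TEXT, customer_id INT, transaction_number INT, sales INT)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        ("2019-01-02", 1, 12345, 10), ("2018-04-03", 1, 65890, 20),
        ("2019-03-22", 3, 64453, 30), ("2019-04-03", 4, 88567, 20),
        ("2019-05-21", 4, 85446, 15), ("2018-01-23", 5, 89464, 40),
        ("2019-04-03", 5, 99674, 30), ("2019-04-03", 6, 32224, 20),
        ("2018-01-23", 6, 46466, 30), ("2018-01-20", 7, 56558, 30),
    ],
)

# A customer is 'New' only if their first purchase falls in 2019.
# ISO-formatted text dates sort chronologically, so BETWEEN works on strings.
query = """
SELECT DISTINCT customer_id,
       CASE WHEN MIN(txn_date) OVER (PARTITION BY customer_id)
                 BETWEEN '2019-01-01' AND '2019-12-31'
            THEN 'New' ELSE 'Repeat' END AS customer_type
FROM sales_data
ORDER BY customer_id
"""
customer_types = conn.execute(query).fetchall()
for row in customer_types:
    print(row)
```

Customer 7, whose only purchase was in 2018, lands in the ELSE branch here; restricting an outer query to 2019 transactions would drop that customer from the summary.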
I am playing around with bigquery and hit an interesting use case. I have a collection of customers and account balances. The account balances collection records any account balance change.
Customers:
+---------+--------+
| ID | Name |
+---------+--------+
| 1 | Alice |
| 2 | Bob |
+---------+--------+
Accounts balances:
+---------+---------------+---------+------------+
| ID | customer_id | value | timestamp |
+---------+---------------+---------+------------+
| 1 | 1 | -500 | 2019-02-12 |
| 2 | 1 | -200 | 2019-02-10 |
| 3 | 2 | 200 | 2019-02-10 |
| 4 | 1 | 0 | 2019-02-09 |
+---------+---------------+---------+------------+
The goal is to find out for how long a customer has had a negative account balance. The resulting collection would look like this:
+---------+--------+---------------------------------+
| ID | Name | Negative account balance since |
+---------+--------+---------------------------------+
| 1 | Alice | 2 days |
+---------+--------+---------------------------------+
Bob is not in the collection, because his last account record shows a positive value.
I think the following steps are involved:
get last account balance per customer, see if it is negative
go through the account balance values until you hit a positive (or no more) value
compute datediff
Is something like this even possible in SQL? Do you have any ideas on how to create such a query? To get customers that currently have a negative account balance, I use this query:
SELECT customer_id FROM (
SELECT t.customer_id, t.account_balance, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) as seqnum FROM `account_balances` t
) t
WHERE seqnum = 1 AND account_balance<0
Below is for BigQuery Standard SQL
#standardSQL
SELECT customer_id, name,
SUM(IF(negative_positive < 0, days, 0)) negative_days,
SUM(IF(negative_positive = 0, days, 0)) zero_days,
SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
SELECT customer_id, negative_positive, grp,
1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
FROM (
SELECT customer_id, ts, SIGN(value) negative_positive,
COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
FROM (
SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
FROM `project.dataset.balances`
)
)
GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.balances` AS (
SELECT 1 customer_id, -500 value, DATE '2019-02-12' ts UNION ALL
SELECT 1, -200, '2019-02-10' UNION ALL
SELECT 2, 200, '2019-02-10' UNION ALL
SELECT 1, 0, '2019-02-09'
), `project.dataset.customers` AS (
SELECT 1 id, 'Alice' name UNION ALL
SELECT 2, 'Bob'
)
SELECT customer_id, name,
SUM(IF(negative_positive < 0, days, 0)) negative_days,
SUM(IF(negative_positive = 0, days, 0)) zero_days,
SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
SELECT customer_id, negative_positive, grp,
1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
FROM (
SELECT customer_id, ts, SIGN(value) negative_positive,
COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
FROM (
SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
FROM `project.dataset.balances`
)
)
GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
-- ORDER BY customer_id
with result
Row customer_id name negative_days zero_days positive_days
1 1 Alice 3 1 0
2 2 Bob 0 0 1
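If only the current negative streak is needed (the three steps listed in the question), a narrower query is possible. The following is a rough sketch under several assumptions: it runs on Python's built-in sqlite3 module rather than BigQuery, uses hypothetical table names customers and balances with a ts column, assumes one record per customer per date, and measures the streak up to the customer's latest record rather than up to today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INT, name TEXT);
CREATE TABLE balances (id INT, customer_id INT, value INT, ts TEXT);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO balances VALUES
    (1, 1, -500, '2019-02-12'),
    (2, 1, -200, '2019-02-10'),
    (3, 2,  200, '2019-02-10'),
    (4, 1,    0, '2019-02-09');
""")

query = """
WITH latest AS (      -- most recent record per customer
    SELECT customer_id, MAX(ts) AS last_ts
    FROM balances GROUP BY customer_id
),
last_ok AS (          -- most recent non-negative record per customer
    SELECT customer_id, MAX(ts) AS ok_ts
    FROM balances WHERE value >= 0 GROUP BY customer_id
)
SELECT c.id, c.name,
       CAST(julianday(l.last_ts) - julianday(MIN(b.ts)) AS INTEGER)
           AS negative_days
FROM latest l
JOIN balances cur                      -- keep only currently negative customers
  ON cur.customer_id = l.customer_id AND cur.ts = l.last_ts AND cur.value < 0
LEFT JOIN last_ok o ON o.customer_id = l.customer_id
JOIN balances b                        -- records in the current negative streak
  ON b.customer_id = l.customer_id
 AND (o.ok_ts IS NULL OR b.ts > o.ok_ts)
JOIN customers c ON c.id = l.customer_id
GROUP BY c.id, c.name, l.last_ts
"""
negative_since = conn.execute(query).fetchall()
for row in negative_since:
    print(row)
```

Bob is filtered out because his latest record is positive; Alice's streak starts at the first negative record after her last non-negative one.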
I have a table like the one below. I am trying to write a T-SQL query that gets the earliest and latest cost for each project_id according to the date column, calculates the percent cost increase or decrease, and returns the data set shown in the second table (I have simplified the table in this question).
project_id date cost
-------------------------------
123 7/1/17 5000
123 8/1/17 6000
123 9/1/17 7000
123 10/1/17 8000
123 11/1/17 9000
456 7/1/17 10000
456 8/1/17 9000
456 9/1/17 8000
876 1/1/17 8000
876 6/1/17 5000
876 8/1/17 10000
876 11/1/17 8000
Result:
(Edit: Fixed the result)
project_id "cost incr/decr pct"
------------------------------------------------
123 80% which is (9000-5000)/5000
456 -20%
876 0%
Whatever query I run I get duplicates.
This is what I tried:
select distinct
p1.Proj_ID, p1.date, p2.[cost], p3.cost,
(nullif(p2.cost, 0) / nullif(p1.cost, 0)) * 100 as 'OVER UNDER'
from
[PROJECT] p1
inner join
(select
[Proj_ID], [cost], min([date]) min_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p2 on p1.Proj_ID = p2.Proj_ID
inner join
(select
[Proj_ID], [cost], max([date]) max_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p3 on p1.Proj_ID = p3.Proj_ID
where
p1.date in (p2.min_date, p3.max_date)
Unfortunately, SQL Server does not have a first_value() aggregation function. It does have an analytic function, though. So, you can do:
select distinct project_id,
first_value(cost) over (partition by project_id order by date asc) as first_cost,
first_value(cost) over (partition by project_id order by date desc) as last_cost,
(first_value(cost) over (partition by project_id order by date desc) /
first_value(cost) over (partition by project_id order by date asc)
) - 1 as ratio
from project;
If cost is an integer, you may need to convert to a representation with decimal places.
Prior to SQL Server 2012 (which introduced first_value()), you can use row_number() together with OUTER APPLY and TOP 1:
select
min_.projectid,
latest_.cost - min_.cost [Calculation]
from
(select
row_number() over (partition by projectid order by date) rn
,projectid
,cost
from projectable) min_ -- get the first dates per project
outer apply (
select
top 1
cost
from projectable
where
projectid = min_.projectid -- get the latest cost for each project
order by date desc
) latest_
where min_.rn = 1
This might perform a little better
;with costs as (
select *,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date) mincost,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date desc) maxcost
from table1
)
select project_id,
min(case when mincost = 1 then cost end) as cost1,
max(case when maxcost = 1 then cost end) as cost2,
(max(case when maxcost = 1 then cost end) - min(case when mincost = 1 then cost end)) * 100 / min(case when mincost = 1 then cost end) as [OVER UNDER]
from costs a
group by project_id
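The row-number approach can be checked against the question's sample data. The sketch below is illustrative only: it runs on Python's built-in sqlite3 module instead of SQL Server, uses a hypothetical table project with a cost_date column holding ISO dates (so they sort chronologically), and multiplies by 100.0 to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "project" and "cost_date" are hypothetical names for this sketch.
conn.execute("CREATE TABLE project (project_id INT, cost_date TEXT, cost INT)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?)",
    [
        (123, "2017-07-01", 5000), (123, "2017-08-01", 6000),
        (123, "2017-09-01", 7000), (123, "2017-10-01", 8000),
        (123, "2017-11-01", 9000),
        (456, "2017-07-01", 10000), (456, "2017-08-01", 9000),
        (456, "2017-09-01", 8000),
        (876, "2017-01-01", 8000), (876, "2017-06-01", 5000),
        (876, "2017-08-01", 10000), (876, "2017-11-01", 8000),
    ],
)

query = """
WITH costs AS (
    SELECT project_id, cost,
           ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY cost_date)      AS first_rn,
           ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY cost_date DESC) AS last_rn
    FROM project
)
SELECT project_id,
       -- (latest cost - earliest cost) * 100.0 / earliest cost
       (MAX(CASE WHEN last_rn = 1 THEN cost END)
        - MAX(CASE WHEN first_rn = 1 THEN cost END)) * 100.0
       / MAX(CASE WHEN first_rn = 1 THEN cost END) AS pct_change
FROM costs
GROUP BY project_id
ORDER BY project_id
"""
pct_changes = conn.execute(query).fetchall()
for row in pct_changes:
    print(row)
```

Grouping by project_id yields exactly one row per project, which avoids the duplicates mentioned in the question.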