TR_DATE     ACC_NAME  TYPE  AMOUNT
01-01-2017  AVNEESH   CR    60000
02-01-2017  AVNEESH   DB    8000
03-01-2017  AVNEESH   CR    8000
04-01-2017  AVNEESH   DB    5000
01-01-2017  NUPUR     CR    10000
02-01-2017  NUPUR     DB    8000
03-01-2017  NUPUR     CR    8000
And the expected result for the above data is:
TR_DATE     ACC_NAME  TYPE  AMOUNT  BALANCE
01-01-2017  AVNEESH   CR    60000   60000
02-01-2017  AVNEESH   DB    8000    52000
03-01-2017  AVNEESH   CR    8000    60000
04-01-2017  AVNEESH   DB    5000    55000
01-01-2017  NUPUR     CR    10000   10000
02-01-2017  NUPUR     DB    8000    2000
03-01-2017  NUPUR     CR    8000    10000
You can use the analytic version of the sum() function, with a case expression to turn debits into negative amounts, and a window clause to apply the sum to amounts up to the current row's date:
select tr_date, acc_name, type, amount,
       sum(case when type = 'DB' then -1 else 1 end * amount)
           over (partition by acc_name order by tr_date) as balance
from passbook
order by acc_name, tr_date
TR_DATE     ACC_NAME  TYPE  AMOUNT  BALANCE
2017-01-01  AVNEESH   CR    60000   60000
2017-01-02  AVNEESH   DB    8000    52000
2017-01-03  AVNEESH   CR    8000    60000
2017-01-04  AVNEESH   DB    5000    55000
2017-01-01  NUPUR     CR    10000   10000
2017-01-02  NUPUR     DB    8000    2000
2017-01-03  NUPUR     CR    8000    10000
fiddle
I have monthly revenue per account. What I'm looking for is to view the earnings for each account, keeping only the rows where earnings dropped below every earlier value for that account (see the expected result below).
Here is the query I have:
SELECT account_id,
monthly_date,
earnings
FROM accounts_revenue
GROUP BY account_id,
monthly_date
The data looks something like this:
account_id  monthly_date  earnings
55          2017-01-01    2000
55          2017-02-01    1950
55          2017-10-01    2000
55          2018-02-01    1500
55          2018-05-01    1200
55          2018-12-01    3000
55          2019-01-01    900
55          2019-02-01    810
55          2019-04-01    1000
55          2019-05-01    600
55          2020-01-01    800
55          2020-02-01    100
122         2020-01-01    800
122         2020-02-01    100
The result should look like this:
account_id  monthly_date  earnings
55          2017-01-01    2000
55          2017-02-01    1950
55          2018-02-01    1500
55          2018-05-01    1200
55          2019-01-01    900
55          2019-02-01    810
55          2019-05-01    600
55          2020-02-01    100
122         2020-01-01    800
122         2020-02-01    100
Any idea how to achieve this?
Use NOT EXISTS:
SELECT ar1.*
FROM accounts_revenue ar1
WHERE NOT EXISTS (
    SELECT 1
    FROM accounts_revenue ar2
    WHERE ar2.account_id = ar1.account_id
      AND ar2.monthly_date < ar1.monthly_date
      AND ar2.earnings <= ar1.earnings
)
ORDER BY ar1.account_id, ar1.monthly_date;
See the demo.
You can use the lag() window function and a CTE (or a subquery, if you prefer) to filter out the rows you don't want:
WITH revenue AS (
    SELECT account_id, monthly_date, earnings,
           lag(earnings) OVER (PARTITION BY account_id ORDER BY monthly_date) AS prev_earnings
    FROM accounts_revenue
)
SELECT account_id, monthly_date, earnings
FROM revenue
WHERE earnings < prev_earnings OR prev_earnings IS NULL
ORDER BY account_id, monthly_date;
For efficiency, you'll want an index on accounts_revenue(account_id, monthly_date).
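For reference, creating that index might look like the following; the index name here is only an illustration:
-- supporting index for the per-account, per-date lookups above (name is illustrative)
CREATE INDEX accounts_revenue_acct_date_idx
    ON accounts_revenue (account_id, monthly_date);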
I used Pandas' resample function to calculate the sales of a list of products every 6 months.
I used resample('6M') with apply({"column-name": "sum"}).
Now I’d like to create a table with the sum of the sales for the first six months.
How can I extract the sum of the first 6 months, given that all products have records for more than 3 years, and none of them have the same start date?
Thanks in advance for any suggestions.
Here is an example of the data:
Product    Date        sales
Product 1  6/30/2017   20
           12/31/2017  60
           6/30/2018   50
           12/31/2018  100
Product 2  1/31/2017   30
           7/31/2017   150
           1/31/2018   200
           7/31/2018   300
           1/31/2019   100
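For reference, the 6-monthly resample described in the question might look roughly like this. It is only a sketch, with the frame and column names (Product, Date, sales) taken from the sample data above:
import pandas as pd

# rebuild the sample data from the question
df = pd.DataFrame({
    'Product': ['Product 1'] * 4 + ['Product 2'] * 5,
    'Date': ['6/30/2017', '12/31/2017', '6/30/2018', '12/31/2018',
             '1/31/2017', '7/31/2017', '1/31/2018', '7/31/2018', '1/31/2019'],
    'sales': [20, 60, 50, 100, 30, 150, 200, 300, 100]})
df['Date'] = pd.to_datetime(df['Date'])

# 6-monthly sales per product, as described in the question
half_yearly = (df.set_index('Date')
                 .groupby('Product')
                 .resample('6M')
                 .agg({'sales': 'sum'}))
print(half_yearly)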
While waiting for your data, I worked on this. See if this is something that will be helpful for you.
import pandas as pd
df = pd.DataFrame({'Date':['2018-01-10','2018-02-15','2018-03-18',
'2018-07-10','2018-09-12','2018-10-14',
'2018-11-16','2018-12-20','2019-01-10',
'2019-04-15','2019-06-12','2019-10-18',
'2019-12-02','2020-01-05','2020-02-25',
'2020-03-15','2020-04-11','2020-07-22'],
'Sales':[200,300,100,250,150,350,150,200,250,
200,300,100,250,150,350,150,200,250]})
# first, break the data down by yearly quarters
df['YQtr'] = pd.PeriodIndex(pd.to_datetime(df.Date), freq='Q')
#next create a column to identify Half Yearly - H1 for Jan-Jun & H2 for Jul-Dec
df.loc[df['YQtr'].astype(str).str[-2:].isin(['Q1','Q2']),'HYear'] = df['YQtr'].astype(str).str[:-2]+'H1'
df.loc[df['YQtr'].astype(str).str[-2:].isin(['Q3','Q4']),'HYear'] = df['YQtr'].astype(str).str[:-2]+'H2'
# do a cumulative sum per half year to get sales by H1 & H2 for each year
df['HYear_cumsum'] = df.groupby('HYear')['Sales'].cumsum()
# now keep only the rows with the max value; that's the H1 & H2 sales figure
df1 = df[df.groupby('HYear')['HYear_cumsum'].transform('max')== df['HYear_cumsum']]
print (df)
print (df1)
The output of this will be:
Source Data + Half Year cumulative sum:
Date Sales YQtr HYear HYear_cumsum
0 2018-01-10 200 2018Q1 2018H1 200
1 2018-02-15 300 2018Q1 2018H1 500
2 2018-03-18 100 2018Q1 2018H1 600
3 2018-07-10 250 2018Q3 2018H2 250
4 2018-09-12 150 2018Q3 2018H2 400
5 2018-10-14 350 2018Q4 2018H2 750
6 2018-11-16 150 2018Q4 2018H2 900
7 2018-12-20 200 2018Q4 2018H2 1100
8 2019-01-10 250 2019Q1 2019H1 250
9 2019-04-15 200 2019Q2 2019H1 450
10 2019-06-12 300 2019Q2 2019H1 750
11 2019-10-18 100 2019Q4 2019H2 100
12 2019-12-02 250 2019Q4 2019H2 350
13 2020-01-05 150 2020Q1 2020H1 150
14 2020-02-25 350 2020Q1 2020H1 500
15 2020-03-15 150 2020Q1 2020H1 650
16 2020-04-11 200 2020Q2 2020H1 850
17 2020-07-22 250 2020Q3 2020H2 250
Filtered to the row with the maximum cumulative sum in each half year, i.e. the half-yearly sales totals:
Date Sales YQtr HYear HYear_cumsum
2 2018-03-18 100 2018Q1 2018H1 600
7 2018-12-20 200 2018Q4 2018H2 1100
10 2019-06-12 300 2019Q2 2019H1 750
12 2019-12-02 250 2019Q4 2019H2 350
16 2020-04-11 200 2020Q2 2020H1 850
17 2020-07-22 250 2020Q3 2020H2 250
I will look at your sample data and work on it later tonight.
I have some transaction data that looks like this.
import pandas as pd
from io import StringIO
from datetime import datetime
from datetime import timedelta
data = """\
cust_id,datetime,txn_type,txn_amt
100,2019-03-05 6:30,Credit,25000
100,2019-03-06 7:42,Debit,4000
100,2019-03-07 8:54,Debit,1000
101,2019-03-05 5:32,Credit,25000
101,2019-03-06 7:13,Debit,5000
101,2019-03-06 8:54,Debit,2000
"""
df = pd.read_table(StringIO(data), sep=',')
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M')
# use datetime as the dataframe index
df = df.set_index('datetime')
print(df)
cust_id txn_type txn_amt
datetime
2019-03-05 06:30:00 100 Credit 25000
2019-03-06 07:42:00 100 Debit 4000
2019-03-07 08:54:00 100 Debit 1000
2019-03-05 05:32:00 101 Credit 25000
2019-03-06 07:13:00 101 Debit 5000
2019-03-06 08:54:00 101 Debit 2000
I would like to resample the data at the daily level, aggregating (summing) txn_amt for each combination of cust_id and txn_type. At the same time, I want to standardize the index to 5 days (currently the data only contains 3 days of data). In essence, this is what I would like to produce:
cust_id txn_type txn_amt
datetime
2019-03-03 100 Credit 0
2019-03-03 100 Debit 0
2019-03-03 101 Credit 0
2019-03-03 101 Debit 0
2019-03-04 100 Credit 0
2019-03-04 100 Debit 0
2019-03-04 101 Credit 0
2019-03-04 101 Debit 0
2019-03-05 100 Credit 25000
2019-03-05 100 Debit 0
2019-03-05 101 Credit 25000
2019-03-05 101 Debit 0
2019-03-06 100 Credit 0
2019-03-06 100 Debit 4000
2019-03-06 101 Credit 0
2019-03-06 101 Debit 7000 => (note: aggregated value)
2019-03-07 100 Credit 0
2019-03-07 100 Debit 1000
2019-03-07 101 Credit 0
2019-03-07 101 Debit 0
So far, I've tried creating a new datetime index, with the idea of resampling and then reindexing with the newly created index, like so:
# create a 5 day datetime index
end_dt = max(df.index).to_pydatetime().strftime('%Y-%m-%d')
start_dt = max(df.index) - timedelta(days=4)
start_dt = start_dt.to_pydatetime().strftime('%Y-%m-%d')
dt_index = pd.date_range(start=start_dt, end=end_dt, freq='1D', name='datetime')
However, I am not sure how to go about the grouping part. Resampling with no grouping outputs wrong results:
# resample timeseries so that one row is 1 day's worth of txns
df2 = df.resample(rule='D').sum().reindex(dt_index).fillna(0)
print(df2)
cust_id txn_amt
datetime
2019-03-03 0.0 0.0
2019-03-04 0.0 0.0
2019-03-05 201.0 50000.0
2019-03-06 302.0 11000.0
2019-03-07 100.0 1000.0
So, how can I incorporate a grouping of cust_id and txn_type when resampling? I have seen this similar question but the OP's data structure is different.
I am using reindex here; the key is setting up the MultiIndex:
# collapse the timestamps to plain dates and sum txn_amt per (date, txn_type, cust_id)
df.index = pd.to_datetime(df.index).date
df = df.groupby([df.index, df['txn_type'], df['cust_id']]).agg({'txn_amt': 'sum'}).reset_index(level=[1, 2])
# full 5-day range ending at the last date present in the data
drange = pd.date_range(end=df.index.max(), periods=5)
# every (date, cust_id, txn_type) combination we want in the output
idx = pd.MultiIndex.from_product([drange, df.cust_id.unique(), df.txn_type.unique()])
# reindex against that MultiIndex, filling the missing combinations with 0
Newdf = df.set_index(['cust_id', 'txn_type'], append=True).reindex(idx, fill_value=0).reset_index(level=[1, 2])
Newdf
Out[749]:
level_1 level_2 txn_amt
2019-03-03 100 Credit 0
2019-03-03 100 Debit 0
2019-03-03 101 Credit 0
2019-03-03 101 Debit 0
2019-03-04 100 Credit 0
2019-03-04 100 Debit 0
2019-03-04 101 Credit 0
2019-03-04 101 Debit 0
2019-03-05 100 Credit 25000
2019-03-05 100 Debit 0
2019-03-05 101 Credit 25000
2019-03-05 101 Debit 0
2019-03-06 100 Credit 0
2019-03-06 100 Debit 4000
2019-03-06 101 Credit 0
2019-03-06 101 Debit 7000
2019-03-07 100 Credit 0
2019-03-07 100 Debit 1000
2019-03-07 101 Credit 0
2019-03-07 101 Debit 0
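If you want readable column names instead of level_1/level_2 after the final reset_index, one small tweak (not part of the original answer, just a sketch reusing the drange and df built above) is to name the MultiIndex levels when building it:
# name the levels so reset_index() yields cust_id / txn_type column names
idx = pd.MultiIndex.from_product(
    [drange, df.cust_id.unique(), df.txn_type.unique()],
    names=['datetime', 'cust_id', 'txn_type'])
Newdf = (df.set_index(['cust_id', 'txn_type'], append=True)
           .reindex(idx, fill_value=0)
           .reset_index(level=[1, 2]))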
I have two tables
exchange_rates
TIMESTAMP curr1 curr2 rate
2018-04-01 00:00:00 EUR GBP 0.89
2018-04-01 01:30:00 EUR GBP 0.92
2018-04-01 01:20:00 USD GBP 1.23
and
transactions
TIMESTAMP user curr amount
2018-04-01 18:00:00 1 EUR 23.12
2018-04-01 14:00:00 1 USD 15.00
2018-04-01 01:00:00 2 EUR 55.00
I want to link these two tables on 1. currency and 2. TIMESTAMP in the following way:
curr in transactions must be equal to curr1 in exchange_rates
TIMESTAMP in exchange_rates must be less than or equal to TIMESTAMP in transactions (so we only pick up the exchange rate that was relevant at the time of transaction)
I have this:
SELECT
    trans.TIMESTAMP, trans.user,
    -- Multiply the amount in transactions by the corresponding rate in exchange_rates
    trans.amount * er.rate AS "Converted Amount"
FROM transactions trans, exchange_rates er
WHERE trans.curr = er.curr1
  AND er.TIMESTAMP <= trans.TIMESTAMP
ORDER BY trans.user
but this is matching too many results, as the output has more rows than there are in transactions.
DESIRED OUTPUT:
TIMESTAMP user Converted Amount
2018-04-01 18:00:00 1 21.27
2018-04-01 14:00:00 1 18.45
2018-04-01 01:00:00 2 48.95
The logic behind the Converted Amount:
row 1: the user spent at 18:00, so take the latest rate whose TIMESTAMP is less than or equal to that, i.e. 0.92 for EUR at 01:30
row 2: the user spent at 14:00, so take the latest rate whose TIMESTAMP is less than or equal to that, i.e. 1.23 for USD at 01:20
row 3: the user spent at 01:00, so take the latest rate whose TIMESTAMP is less than or equal to that, i.e. 0.89 for EUR at 00:00
How can I do this in postgresql 9.6?
You can use a LATERAL join (CROSS APPLY) and limit the result to the first row that matches your conditions.
select t.dt, t.usr, t.amount * e.rate as conv_amount
from transactions t
join lateral (select *
              from exchange_rates er
              where t.curr = er.curr1
                and er.dt <= t.dt
              order by dt desc
              limit 1) e on true;
dt | usr | conv_amount
:------------------ | --: | ----------:
2018-04-01 18:00:00 | 1 | 21.2704
2018-04-01 14:00:00 | 1 | 18.4500
2018-04-01 01:00:00 | 2 | 48.9500
db<>fiddle here
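If exchange_rates is large, an index supporting the lookup inside the lateral subquery may help; a minimal sketch using the column names from the query above (the index name is only an illustration):
-- supporting index for the per-currency, latest-rate lookup (name is illustrative)
CREATE INDEX exchange_rates_curr1_dt_idx ON exchange_rates (curr1, dt);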
I have this table in sybase:
Date               File_name  File_Size  customer  Id
1/1/2015 11:00:00  temp.csv   100000     ESPN      1111
1/1/2015 11:10:00  temp.csv   200000     ESPN      1122
1/1/2015 11:20:00  temp.csv   400000     ESPN      1456
1/1/2015 11:30:00  temp.csv   400000     ESPN      2345
1/2/2015 11:00:00  llc.csv    100000     LLC       445
1/2/2015 11:10:00  llc1.txt   200000     LLC       677
1/2/2015 11:20:00  dtt.txt    500000     LLC       76
1/2/2015 11:30:00  jpp.txt    400000     LLC       666
I need to come up with a query to summarize this data by day, with the date in month/day/year format.
Date      total_file_size  number_of_unique_customers  number_unique_id
1/1/2015  1,100,000        1                           4
1/2/2015  1,200,000        1                           4
How would I do this in a SQL query? I tried this:
select convert(varchar,arrived_at,110) as Date
sum(File_Size),
count(distinct(customer)),
count(distinct(id))
group by Date
It does not seem to be working; any ideas?
try
select
    convert(varchar, arrived_at, 110) as Date,
    sum(File_Size) as total_file_size,
    count(distinct customer) as number_of_unique_customers,
    count(distinct id) as number_unique_id
from your_table  -- the table name is not given in the question; replace with yours
group by convert(varchar, arrived_at, 110)