Grouping of Similar data by amount in Oracle - sql

I have a txn table with columns ac_id, txn_amt. It will store the data txn amounts along with account ids. Below is example of data
AC_ID TXN_AMT
10 1000
10 1000
10 1010
10 1030
10 5000
10 5010
10 10000
20 32000
20 32200
20 5000
I want to write a query in such a way that all the amounts which are within 10% range of the previous amounts should be grouped together. Output should be something like this:
AC_ID TOTAL_AMT TOTAL_CNT GROUP
10 4040 4 1
10 10010 2 2
20 64200 2 3
20 5000 1 4
I tried with LAG function but still clueless. This is the code snippet I tried:
select ac_id, txn_amt, round((((txn_amt - lag(txn_amt, 1) over (partition by ac_id order by ac_id, txn_amt))/txn_amt)*100,2) as amt_diff_pct from txn;
Any clue or help will be highly appreciated.

If by previous you mean "the largest amount less than", then you can do this. You can find where the gaps are (i.e. larger than a 10% difference). Then you can assign a group by counting the number of gaps:
select ac_id, sum(txn_amt) as total_amt, count(*) as total_cnt, grp
from (select t.*,
sum(case when prev_txn_amt * 1.1 > txn_amt then 0 else 1 end) over
(partition by ac_id order by txn_amt) as grp
from (select t.*,
lag(txn_amt) over (partition by ac_id order by txn_amt) as prev_txn_amt
from txn t
) t
) t
group by ac_id, grp;

Related

Can't get the cumulative sum(running total) within a group in SQL Server

I'm trying to get a running total within a group but my current code just gives me an aggregate sum.
For example, my data looks like this
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount
12542 1 Full A 1 12.5 40 500
12542 1 Full A 1 12.5 35 420
12542 2 Full A 1 10 40 400
12542 2 Full B 1.2 10 40 480
17842 1 Full A 1 11 27 297
17842 1 Full B 1.3 11 30 429
And what I want is a running total within the same ID, Shift Number, and Status. For example, I want something like this as my final result
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount Running_Tot
12542 1 Full A 1 12.5 40 500 500
12542 1 Full A 1 12.5 35 420 920
12542 2 Full A 1 10 40 400 400
12542 2 Full B 1.2 10 40 480 880
17842 1 Full A 1 11 27 297 297
17842 1 Full B 1.3 11 30 429 726
However, my current code just gives me the total sum within each group. For example, 920, 920 for row 1&2. Here's my code.
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY ID, ShiftNum, Status) as Runnint_Tot
from table a
How do I fix my code to get the final result I want?
You need an ordering column that uniquely defines each row. There is not an obvious one in your row, but something like this:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY hours) as Running_Tot
Or:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status
ORDER BY (SELECT NULL)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as Running_Tot
The problem you are facing is because the ORDER BY keys have ties. The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Note the RANGE. That means that all rows with ties are combined.
Also note that there is no utility to including the PARTITION BY keys in the ORDER BY (well . . . there is one exception in SQL Server if you don't care about the ordering, then including a key can be a handy short-cut). The ordering occurs within a partition.
If your rows can have exact duplicates, I would first suggest that you add a primary key. But, in the meantime, you could use:
with a as (
select a.*,
row_number() over (order by id, shiftnum, status) as seqnum
from tablea a
)
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY seqnum) as Running_Tot
from a;
The ordering will be arbitrary, but it will at least accumulate.

Count the number of transactions per month for an individual group by date Hive

I have a table of customer transactions where each item purchased by a customer is stored as one row. So, for a single transaction there can be multiple rows in the table. I have another col called visit_date.
There is a category column called cal_month_nbr which ranges from 1 to 12 based on which month transaction occurred.
The data looks like below
Id visit_date Cal_month_nbr
---- ------ ------
1 01/01/2020 1
1 01/02/2020 1
1 01/01/2020 1
2 02/01/2020 2
1 02/01/2020 2
1 03/01/2020 3
3 03/01/2020 3
first
I want to know how many times customer visits per month using their visit_date
i.e i want below output
id cal_month_nbr visit_per_month
--- --------- ----
1 1 2
1 2 1
1 3 1
2 2 1
3 3 1
and what is the avg frequency of visit per ids
ie.
id Avg_freq_per_month
---- -------------
1 1.33
2 1
3 1
I tried with below query but it counts each item as one transaction
select avg(count_e) as num_visits_per_month,individual_id
from
(
select r.individual_id, cal_month_nbr, count(*) as count_e
from
ww_customer_dl_secure.cust_scan
GROUP by
r.individual_id, cal_month_nbr
order by count_e desc
) as t
group by individual_id
I would appreciate any help, guidance or suggestions
You can divide the total visits by the number of months:
select individual_id,
count(*) / count(distinct cal_month_nbr)
from ww_customer_dl_secure.cust_scan c
group by individual_id;
If you want the average number of days per month, then:
select individual_id,
count(distinct visit_date) / count(distinct cal_month_nbr)
from ww_customer_dl_secure.cust_scan c
group by individual_id;
Actually, Hive may not be efficient at calculating count(distinct), so multiple levels of aggregation might be faster:
select individual_id, avg(num_visit_days)
from (select individual_id, cal_month_nbr, count(*) as num_visit_days
from (select distinct individual_id, visit_date, cal_month_nbr
from ww_customer_dl_secure.cust_scan c
) iv
group by individual_id, cal_month_nbr
) ic
group by individual_id;

select top 5 max records in "High" column and 5 min records from "Low" Column in same query and from same table partitioned by stock name

we have 6 months historic data and need to find out what is the top 2 max highs and top 2 min lows per each stock for all the stocks. Below is the sample data
Stock High Low Date prevclose ....
------------------------------------
ABB 100 75 29/12/2019 90
ABB 83 50 30/12/2019 87
ABB 73 45 30/12/2019 87
infy 1000 675 29/12/2019 900
infy 830 650 30/12/2019 810
infy 730 645 30/12/2019 788
I tried the following queries, but not getting the expected results.. I need results such as top 2 high rows and top 3 min low in one result set. I tried below query but no luck..
select * into SRTrend from (
--- Resistance
select * from (Select top (5) with ties 'H' as 'Resistance', RowN=Row_Number() over(partition by name order by High desc),* from Historic
order by Row_Number() over(partition by name order by High desc))B
Union all
--Support
select * from (Select top (5) with ties 'L' as 'Support', RowN=Row_Number() over(partition by name order by Low asc),* from Historic
--where name='ABB'
order by Row_Number() over(partition by name order by Low asc))C
)D
PS: Hurdles which I faced is when I tried to export data to another table, getting very messed up results instead of getting top 2 max(highs) and top3 min(lows), I am getting single rows.
You can use rank() as follows:
select *
from (
select
t.*,
rank() over(partition by stock order by high desc) rn_high,
rank() over(partition by stock order by low asc) rn_low
from mytable t
) t
where rn_high <= 2 or rn_low <= 3
The inner query ranks records twice, by descending high and ascending low within groups of stocks. Then the outer query filters on top 2 and bottom 3 per stock (ties included).

SQL Query to get sums among multiple payments which are greater than or less than 10k

I am trying to write a query to get sums of payments from accounts for a month. I have been able to get it for the most part but I have hit a road block. My challenge is that I need a count of the amount of payments that are either < 10000 or => 10000. The business rules are that a single payment may not exceed 10000 but there can be multiple payments made that can total more than 10000. As a simple mock database it might look like
ID | AccountNo | Payment
1 | 1 | 5000
2 | 1 | 6000
3 | 2 | 5000
4 | 3 | 9000
5 | 3 | 5000
So the results I would expect would be something like
NumberOfPaymentsBelow10K | NumberOfPayments10K+
1 | 2
I would like to avoid doing a function or stored procedure and would prefer a sub query.
Any help with this query would be greatly appreciated!
I suggest avoiding sub-queries as much as possible because it hits the performance, specially if you have a huge amount of data, so, you can use something like Common Table Expression instead. You can do the same by using:
;WITH CTE
AS
(
SELECT AccountNo, SUM(Payment) AS TotalPayment
FROM Payments
GROUP BY AccountNo
)
SELECT
SUM(CASE WHEN TotalPayment < 10000 THEN 1 ELSE 0 END) AS 'NumberOfPaymentsBelow10K',
SUM(CASE WHEN TotalPayment >= 10000 THEN 1 ELSE 0 END) AS 'NumberOfPayments10K+'
FROM CTE
You can get the totals per account using SUM and GROUP BY...
SELECT AccountNo, SUM(Payment) AS TotPay
FROM payments
GROUP BY AccountNo
You can use that result to count the number over 10000
SELECT COUNT(*)
FROM (
SELECT AccountNo, SUM(Payment) AS TotPay
FROM payments
GROUP BY AccountNo
)
WHERE TotPay>10000
You can get the the number over and the number under in a single query if you want but that's a but more complicated:
SELECT
COUNT(CASE WHEN TotPay<=10000 THEN 1 END) AS Below10K,
COUNT(CASE WHEN TotPay> 10000 THEN 1 END) AS Above10K
FROM (
SELECT AccountNo, SUM(Payment) AS TotPay
FROM payments
GROUP BY AccountNo
)

pgsql -Showing top 10 products's sales and other products as 'others' and its sum of sales

I have a table called "products" where it has 100 records with sales details. My requirement is so simple that I was not able to do it.
I need to show the top 10 product names with sales and other product names as "others" and its sales. so totally my o/p will be 11 rows. 11-th row should be others and sum of sales of all remaining products. Can anyone give me the logic?
O/p should be like this,
Name sales
------ -----
1 colgate 9000
2 pepsodent 8000
3 closeup 7000
4 brittal 6000
5 ariies 5000
6 babool 4000
7 imami 3000
8 nepolop 2500
9 lactoteeth 2000
10 menwhite 1500
11 Others 6000 (sum of sales of remaining 90 products)
here is my sql query,
select case when rank<11 then prod_cat else 'Others' END as prod_cat,
total_sales,ID,rank from (select ROW_NUMBER() over (order by (sum(i.grandtotal)) desc) as rank,pc.name as prod_cat,sum(i.grandtotal) as total_sales, pc.m_product_category_id as ID`enter code here`
from adempiere.c_invoice i join adempiere.c_invoiceline il on il.c_invoice_id=i.c_invoice_id join adempiere.m_product p on p.m_product_id=il.m_product_id join adempiere.m_product_category pc on pc.m_product_category_id=p.m_product_category_id
where extract(year from i.dateacct)=extract(year from now())
group by pc.m_product_category_id) innersql
order by total_sales desc
o/p what i got is,
prod_cat total_sales id rank
-------- ----------- --- ----
BSHIRT 4511697.63 460000015 1
BT-SHIRT 2725167.03 460000016 2
SHIRT 2630471.56 1000003 3
BJEAN 1793514.07 460000005 4
JEAN 1115402.90 1000004 5
GT-SHIRT 1079596.33 460000062 6
T SHIRT 446238.60 1000006 7
PANT 405189.00 1000005 8
GDRESS 396789.02 460000059 9
BTROUSER 393739.48 460000017 10
Others 164849.41 1000009 11
Others 156677.00 1000008 12
Others 146678.00 1000007 13
As #e4c5 suggests, use UNION:
select id, prod_cat, sum(total_sales) as total_sales
with
totals as (
select --pc.m_product_category_id as id,
pc.name as prod_cat,
sum(i.grandtotal) as total_sales,
ROW_NUMBER() over (order by sum(i.grandtotal) desc) as rank
from adempiere.c_invoice i
join adempiere.c_invoiceline il on (il.c_invoice_id=i.c_invoice_id)
join adempiere.m_product p on (p.m_product_id=il.m_product_id)
join adempiere.m_product_category pc on (pc.m_product_category_id=p.m_product_category_id)
where i.dateacct >= date_trunc('year', now()) and i.dateacct < date_trunc('year', now()) + interval '1' year
group by pc.m_product_category_id, pc.name
),
rankedothers as (
select prod_cat, total_sales, rank
from totals where rank <= 10
union
select 'Others', sum(total_sales), 11
from totals where rank > 10
)
select prod_cat, total_sales
from ranked_others
order by rank
Also, I recommend using sargable conditions like the one above, which is slightly more complicated than the one you implemented, but generally worth the extra effort.