Hive: error calculating SUM then MAX of grouped items - sql

I would like to run a query that finds, for each credit card and year, the month with the highest total amount spent. To do that, I first need to sum the money spent by each card in each month. I have a table of credit card transactions, credit_transact:
processdate timestamp
cardno_hash string
amount int
year int
month int
Made-up sample data:
card year month amount
a123 2016 12 23160
a123 2016 10 287
c123 2016 11 5503
c123 2016 11 4206
I would like:
card year month amount
a123 2016 12 23160
c123 2016 11 9709
One important thing is year and month are partition columns.
I have tried a subquery like below:
USE credit_card_db;
SELECT sum_amount_transact.cardno_hash, sum_amount_transact.year, sum_amount_transact.month, MAX(sum_amount_transact.sum_amount)
FROM
(
SELECT cardno_hash, year, month, SUM(amount) AS sum_amount FROM credit_transact
GROUP BY cardno_hash, year, month
) AS sum_amount_transact
GROUP BY sum_amount_transact.cardno_hash, sum_amount_transact.year;
However, the following error is shown:
java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid column reference 'month'
The following subquery worked fine and returned results as expected:
SELECT cardno_hash, year, month, SUM(amount) AS sum_amount FROM credit_transact
GROUP BY cardno_hash, year, month
The result is:
card year month amount
a123 2016 12 23160
a123 2016 10 287
c123 2016 11 9709
I would very much appreciate any help with this problem.

I can't quite tell what you really want, but I'm pretty sure you want row_number(). I think you want the month with the highest total for each card and year:
SELECT ct.*
FROM (SELECT cardno_hash, year, month, SUM(amount) AS sum_amount,
ROW_NUMBER() OVER (PARTITION BY cardno_hash, year ORDER BY SUM(amount) DESC) as seqnum
FROM credit_transact
GROUP BY cardno_hash, year, month
) ct
WHERE seqnum = 1;
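To check what this approach returns, here is a minimal sketch that runs it against the made-up sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption for illustration only; SQLite 3.25 or newer is needed for window functions), it ignores partitioning, and it splits the aggregation into a CTE before ranking, which is a restructuring of the query above rather than the exact statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credit_transact (cardno_hash TEXT, year INT, month INT, amount INT)"
)
# Made-up sample rows copied from the question.
conn.executemany(
    "INSERT INTO credit_transact VALUES (?, ?, ?, ?)",
    [
        ("a123", 2016, 12, 23160),
        ("a123", 2016, 10, 287),
        ("c123", 2016, 11, 5503),
        ("c123", 2016, 11, 4206),
    ],
)

QUERY = """
WITH monthly AS (
    -- Step 1: total spend per card, year and month
    SELECT cardno_hash, year, month, SUM(amount) AS sum_amount
    FROM credit_transact
    GROUP BY cardno_hash, year, month
),
ranked AS (
    -- Step 2: rank the months of each card/year by their total, highest first
    SELECT monthly.*,
           ROW_NUMBER() OVER (PARTITION BY cardno_hash, year
                              ORDER BY sum_amount DESC) AS seqnum
    FROM monthly
)
SELECT cardno_hash, year, month, sum_amount
FROM ranked
WHERE seqnum = 1
ORDER BY cardno_hash
"""

top_months = conn.execute(QUERY).fetchall()
for row in top_months:
    print(row)
```

Unlike MAX() with GROUP BY cardno_hash, year, this keeps the month column attached to the winning row, which is what the original query could not do.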

Related

SQL query: get total values for each month

I have a table that stores the number of fruits sold on each day, one row per item and date.
CREATE TABLE data
(
code VARCHAR2(50) NOT NULL,
amount NUMBER(5) NOT NULL,
DATE VARCHAR2(50) NOT NULL,
);
Sample data
code |amount| date
------+------+------------
aple | 1 | 01/01/2010
aple | 2 | 02/02/2010
orange| 3 | 03/03/2010
orange| 4 | 04/04/2010
I need to write a query that lists how many apples and oranges were sold in January and February.
--total apple for jan
select sum(amount) from mg.drum d where date >='01/01/2010' and cdate < '01/02/2020' and code = 'aple';
--total apple for feb
select sum(amount) from mg.drum d where date >='01/02/2010' and cdate < '01/03/2020' and code = 'aple';
--total orange for jan
select sum(amount) from mg.drum d where date >='01/01/2010' and cdate < '01/02/2020' and code = 'orange';
--total orange for feb
select sum(amount) from mg.drum d where date >='01/02/2010' and cdate < '01/03/2020' and code = 'orange';
If I need to calculate this for more months and more fruits, it gets tedious. Is there a shorter query?
Can I at least combine the months into one query, so that a single query returns the total for each month for one fruit?
You can use conditional aggregation such as
SELECT TO_CHAR("date",'MM/YYYY') AS "Month/Year",
SUM( CASE WHEN code = 'apple' THEN amount END ) AS apple_sold,
SUM( CASE WHEN code = 'orange' THEN amount END ) AS orange_sold
FROM data
WHERE "date" BETWEEN date'2020-01-01' AND date'2020-02-29'
GROUP BY TO_CHAR("date",'MM/YYYY')
Note that date is a reserved keyword, so it cannot be used as a column name unless it is quoted.
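As a runnable illustration of conditional aggregation, here is a small sketch using Python's built-in sqlite3 module instead of Oracle (an assumption made only so the example is self-contained). The table name fruit_sales and the column sold_on are hypothetical, the four rows are the question's sample data read as dd/mm/yyyy and stored as ISO dates, and strftime stands in for Oracle's TO_CHAR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: dates stored as ISO text so strftime can parse them.
conn.execute("CREATE TABLE fruit_sales (code TEXT, amount INT, sold_on TEXT)")
conn.executemany(
    "INSERT INTO fruit_sales VALUES (?, ?, ?)",
    [
        ("aple", 1, "2010-01-01"),
        ("aple", 2, "2010-02-02"),
        ("orange", 3, "2010-03-03"),
        ("orange", 4, "2010-04-04"),
    ],
)

QUERY = """
SELECT strftime('%Y-%m', sold_on) AS month,
       -- Each CASE keeps only one fruit's amounts; other rows contribute NULL
       SUM(CASE WHEN code = 'aple'   THEN amount END) AS apple_sold,
       SUM(CASE WHEN code = 'orange' THEN amount END) AS orange_sold
FROM fruit_sales
GROUP BY strftime('%Y-%m', sold_on)
ORDER BY strftime('%Y-%m', sold_on)
"""

monthly_totals = conn.execute(QUERY).fetchall()
for row in monthly_totals:
    print(row)
```

A month with no sales of a fruit yields NULL (None in Python) for that column; adding ELSE 0 inside each CASE would turn those into zeros.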
select sum(amount), //date.month
from mg.drum
group by //date.month
Here //date.month is a placeholder: replace it with an expression that returns the month number or name.
If you are dealing with months, then you should include the year as well. I would recommend:
SELECT TRUNC(date, 'MON') as yyyymm, code,
SUM(amount)
FROM t
GROUP BY TRUNC(date, 'MON'), code;
You can add a WHERE clause if you want only some dates or codes.
This will return a separate row for each month and code that has data. That is pretty close to the results from your four queries, but it does not return rows with 0 for combinations that have no data.
select to_char(date_col,'MONTH') as month, code, sum(amount)
from mg.drum
group by to_char(date_col,'MONTH'), code

record for last two month and their difference in oracle

I need the variance between the last two months, and I am using the query below:
with Positions as
(
select
COUNT(DISTINCT A_SALE||B_SALE) As SALES,
TO_CHAR(DATE,'YYYY-MON') As Period
from ORDERS
where DATE between date '2020-02-01' and date '2020-02-29'
group by TO_CHAR(DATE,'YYYY-MON')
union all
select
COUNT(DISTINCT A_SALE||B_SALE) As SALES,
TO_CHAR(DATE,'YYYY-MON') As Period
from ORDERS
where DATE between date '2020-03-01' and date '2020-03-31'
group by TO_CHAR(DATE,'YYYY-MON')
)
select
SALES,
period,
case when to_char(round((SALES-lag(SALES,1, SALES) over (order by period desc))/ SALES*100,2), 'FM999999990D9999') <0
then to_char(round(abs( SALES-lag(SALES,1, SALES) over (order by period desc))/ SALES*100,2),'FM999999990D9999')||'%'||' (Increase) '
when to_char(round((SALES-lag(SALES,1,SALES) over (order by period desc))/SALES*100,2),'FM999999990D9999')>0
then to_char(round(abs(SALES-lag(SALES,1, SALES) over (order by period desc ))/SALES*100,2),'FM999999990D9999')||'%'||' (Decrease) '
END as variances
from Positions
order by Period;
I am getting output like this:
SALES | Period | variances
---------|------------------|--------------------
100 | 2020-FEB | 100%(Increase)
200 | 2020-MAR | NULL
I want output like the following, where the variance appears on the March row instead of the February row, because we are looking at the variance for the latest month:
SALES | Period | variances
---------|------------------|--------------------
200 | 2020-MAR | 100%(Increase)
100 | 2020-FEB | NULL
I did not analyze the query in too much detail, but it has one obvious flaw.
You convert your period from a date to a char.
That means that when you apply your window functions, the ordering will not work as expected.
A date ordered DESC will look like this (chronological ordering):
MAR - 2020
FEB - 2020
JAN - 2020
Text ordered DESC will look like this (alphabetical ordering):
MAR - 2020
JAN - 2020
FEB - 2020
That being said, you are comparing a 'good' case (FEB + MAR) where both the text ordering and date ordering will work the same way.
The implied ordering is ASCENDING. So at the end when you do
order by Period;
it will display February first and then March. If you do
order by Period DESC;
you will get March displayed first.
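The difference between text ordering and date ordering is easy to reproduce. The sketch below uses Python's built-in sqlite3 module rather than Oracle (an assumption for the sake of a self-contained example); the table periods and its columns period_label and month_start are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the same three months stored as a text label and as an ISO date.
conn.execute("CREATE TABLE periods (period_label TEXT, month_start TEXT)")
conn.executemany(
    "INSERT INTO periods VALUES (?, ?)",
    [
        ("2020-JAN", "2020-01-01"),
        ("2020-FEB", "2020-02-01"),
        ("2020-MAR", "2020-03-01"),
    ],
)

# Ordering by the text label compares characters: 'M' > 'J' > 'F'.
by_label = [r[0] for r in conn.execute(
    "SELECT period_label FROM periods ORDER BY period_label DESC")]

# Ordering by the underlying date gives the chronological result.
by_date = [r[0] for r in conn.execute(
    "SELECT period_label FROM periods ORDER BY month_start DESC")]

print(by_label)  # ['2020-MAR', '2020-JAN', '2020-FEB']
print(by_date)   # ['2020-MAR', '2020-FEB', '2020-JAN']
```

With only FEB and MAR both orderings agree, which is why the problem does not show up in the two-month case; ordering by a real date (or a sortable key such as YYYY-MM) avoids it once more months are involved.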

How to get monthwise sum from table?

Table Transaction(Id, DateTime, Debit, Credit)
I want a monthwise sum of Debit and Credit.
What is a good option to retrieve monthwise result?
Sample Output:
Month Id Debit Credit
January 1 200 70
January 2 400 80
February 1 400 90
February 2 300 50
Try the script below, which uses GROUP BY. I have included YEAR in the grouping; otherwise the same month from different years would be counted as one month.
SELECT YEAR(DateTime),
MONTH(DateTime),
Id,
SUM(Debit) total_debit,
SUM(Credit) total_credit
FROM your_table
GROUP BY YEAR(DateTime), MONTH(DateTime), Id
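Here is a minimal sketch of the same grouping that can be run as-is. It uses Python's built-in sqlite3 module, so strftime replaces YEAR() and MONTH() (an assumption about the stand-in engine, not about your database); the table name transactions and all rows are made up for illustration, with one January 2020 row added to show why the year belongs in the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring Transaction(Id, DateTime, Debit, Credit).
conn.execute("CREATE TABLE transactions (Id INT, DateTime TEXT, Debit INT, Credit INT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-05", 150, 70),
        (1, "2019-01-20", 50, 0),
        (2, "2019-01-11", 400, 80),
        (1, "2019-02-03", 400, 90),
        (2, "2019-02-14", 300, 50),
        (1, "2020-01-09", 10, 5),  # same month, different year
    ],
)

QUERY = """
SELECT CAST(strftime('%Y', DateTime) AS INTEGER) AS yr,
       CAST(strftime('%m', DateTime) AS INTEGER) AS mon,
       Id,
       SUM(Debit)  AS total_debit,
       SUM(Credit) AS total_credit
FROM transactions
GROUP BY 1, 2, 3   -- year, month, Id (positional references to the select list)
ORDER BY 1, 2, 3
"""

monthly = conn.execute(QUERY).fetchall()
for row in monthly:
    print(row)
```

Because the year is part of the grouping, January 2019 and January 2020 stay on separate rows instead of being merged.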
Apply a GROUP BY clause to the SQL query:
group by month(DateTime),Year(DateTime)

Time series past weeks revenue calculation

I have the following data set
Customer Week Revenue
A 201701 100
A 201702 99
A 201703 120
A 201704 110
I need to create the following variables for each customer week in SQL
Customer Week
past 4 weeks revenue
past 7 weeks revenue
past 11 weeks revenue and so on till past 51 weeks revenue.
Here's the approach I am trying to use. The issue with it is that for each "past n weeks" value I would have to create a separate table and then join them all together.
select customer, sum(revenue)
(select customer, cust2.revenue
from customer1 join customer2
on customerid = customerid
where cust1.week <= cust2.week + 51)
Is there a more efficient way of calculating past 4,7,11,15,18,21 till 51 weeks in SQL? I am using spark sql.
Thanks!
You can use window functions. Assuming you have one row per customer/week:
select t.*,
sum(revenue) over (partition by customer order by week rows between 3 preceding and current row) as revenue_4week,
sum(revenue) over (partition by customer order by week rows between 6 preceding and current row) as revenue_7week
-- add further windows the same way, e.g. 10 preceding for the past 11 weeks
from t;
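To see the window frames in action, here is a small sketch that runs this kind of query on the four sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL (an assumption for illustration; the ROWS BETWEEN frame syntax used here is the same in both), and the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customer TEXT, week INT, revenue INT)")
# Sample rows from the question: one row per customer and week.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", 201701, 100), ("A", 201702, 99), ("A", 201703, 120), ("A", 201704, 110)],
)

QUERY = """
SELECT customer, week, revenue,
       -- current row plus the 3 rows before it = past 4 weeks
       SUM(revenue) OVER (PARTITION BY customer ORDER BY week
                          ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS revenue_4week,
       -- current row plus the 6 rows before it = past 7 weeks
       SUM(revenue) OVER (PARTITION BY customer ORDER BY week
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS revenue_7week
FROM t
ORDER BY customer, week
"""

rolling = conn.execute(QUERY).fetchall()
for row in rolling:
    print(row)
```

With only four weeks of data both windows return the same running totals (100, 199, 319, 429). Note that ROWS counts rows, not calendar weeks, so the result is only a true "past n weeks" sum if no week is missing for a customer.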

Oracle sql: Order by with GROUP BY ROLLUP

I'm looking everywhere for an answer but nothing seems to compare with my problem. So, using rollup with query:
select year, month, count (sale_id) from sales
group by rollup (year, month);
Will give the result like:
YEAR MONTH TOTAL
2015 1 200
2015 2 415
2015 null 615
2016 1 444
2016 2 423
2016 null 867
null null 1482
I would like to sort by total descending, but with the year that has the biggest total on top (important: together with all the records that belong to that year), followed by the records for the other years. So I would like it to look like:
YEAR MONTH TOTAL
null null 1482
2016 null 867
2016 1 444
2016 2 423
2015 null 615
2015 2 415
2015 1 200
Or something like that. The main purpose is to avoid splitting up the records that belong to one year while sorting by total. Can somebody help me with that?
Try using window function max to get max of total for each year in the order by clause:
select year, month, count(sale_id) total
from sales
group by rollup(year, month)
order by max(total) over (partition by year) desc, total desc;
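The ordering idea (sort by each year's largest total, then by total within the year) can be tried out with the sketch below. It uses Python's built-in sqlite3 module, which has no ROLLUP, so the rollup rows are emulated with UNION ALL and the window function is computed in a subquery before sorting; both are assumptions made for the stand-in engine, not a claim about how Oracle must be written. The rows are made up to reproduce the counts in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, year INT, month INT)")
# Made-up rows sized to reproduce the counts shown in the question.
counts = [(2015, 1, 200), (2015, 2, 415), (2016, 1, 444), (2016, 2, 423)]
for year, month, n in counts:
    conn.executemany("INSERT INTO sales (year, month) VALUES (?, ?)", [(year, month)] * n)

QUERY = """
WITH rolled AS (
    -- Emulates GROUP BY ROLLUP (year, month): detail rows, year subtotals, grand total
    SELECT year, month, COUNT(sale_id) AS total FROM sales GROUP BY year, month
    UNION ALL
    SELECT year, NULL, COUNT(sale_id) FROM sales GROUP BY year
    UNION ALL
    SELECT NULL, NULL, COUNT(sale_id) FROM sales
)
SELECT year, month, total
FROM (
    -- The largest total in each year partition is that year's subtotal row
    SELECT year, month, total,
           MAX(total) OVER (PARTITION BY year) AS year_max
    FROM rolled
) AS with_year_max
ORDER BY year_max DESC, total DESC
"""

ordered = conn.execute(QUERY).fetchall()
for row in ordered:
    print(row)
```

The grand-total row lands first because its NULL year forms its own partition with the largest maximum, and each year's subtotal precedes its months because it is the largest total in that partition.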
Hmmm. I think this does what you want:
select year, month, count(sale_id) as cnt
from sales
group by rollup (year, month)
order by sum(count(sale_id)) over (partition by year) desc, year;
Actually, I've never used window functions in an ORDER BY with a rollup query. I wouldn't be surprised if a subquery were necessary.
I think you need to use GROUPING SETS and GROUP_ID. These will help you identify a NULL that was produced by a subtotal. Take a look at the doc: https://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm