Summary Statistics for different categories using hive

I have the two Hive tables below, a and b, and I need to create descriptive summary statistics from them.
Expected output
Category   Sum of Amount   Count   Sum of Fraud Amount   Count of Fraud
0-100      120             2       70                    1
100-500    610             3       410                   2
>500       1300            2       700                   1
I need the sum of Amount and the count for each category (0-100, 100-500 and >500).
I also need the sum of the fraud amount (where Fraud = 1) and the count of frauds, which requires a left join to table b to bring in the Fraud column.
For example, for category 0-100 the sum of Amount is 120 (50 + 70) and the count is 2, and the sum of the fraud amount is 70 (the row where Fraud is 1). The other categories are calculated the same way.
Table a
ID Amount Date
1 110 01-01-2020
2 200 02-01-2020
3 50 03-01-2020
4 600 04-01-2020
5 700 05-01-2020
6 70 06-01-2020
7 300 07-01-2020
Table b
ID Fraud
1 1
2 0
3 0
4 0
5 1
6 1
7 0
My approach so far gives the overall count and sum of Amount, but I need them per category (0-100, 100-500 and >500):
select sum(a.Amount), Count(*), count(b.Fraud)
from sample.data a
left join (select id, fraud from sample.label) b
on a.id = b.id
where date between "2020-01-01" and "2020-01-07"
group by fraud;

If I understand correctly, you just need to aggregate by a case expression:
select (case when d.amount <= 100 then '0-100'
             when d.amount <= 500 then '101-500'
             else '> 500'
        end) as grp,
       sum(d.Amount) as sum_amount,
       count(*) as cnt,
       -- fraud amount: only add the amount when the joined row is flagged
       sum(case when l.Fraud = 1 then d.Amount else 0 end) as fraud_amount,
       sum(l.Fraud) as fraud_cnt
from sample.data d left join
     sample.label l
     on d.id = l.id
where d.date between '2020-01-01' and '2020-01-07'
group by (case when d.amount <= 100 then '0-100'
               when d.amount <= 500 then '101-500'
               else '> 500'
          end);
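To check the bucketing logic without a Hive cluster, here is a minimal sketch that runs the same kind of CASE-based aggregation against an in-memory SQLite database from Python. The table names `data` and `label`, the function name `bucket_summary`, and the ISO-formatted dates are assumptions made for the illustration; SQLite is only a stand-in, so Hive behaviour may differ in details.

```python
import sqlite3

def bucket_summary():
    """Return one summary row per amount bucket for the question's sample rows."""
    con = sqlite3.connect(":memory:")
    # Assumed schema mirroring tables a and b; dates stored as ISO strings.
    con.executescript("""
        CREATE TABLE data (id INTEGER, amount INTEGER, date TEXT);
        CREATE TABLE label (id INTEGER, fraud INTEGER);
        INSERT INTO data VALUES
            (1, 110, '2020-01-01'), (2, 200, '2020-01-02'), (3, 50, '2020-01-03'),
            (4, 600, '2020-01-04'), (5, 700, '2020-01-05'), (6, 70, '2020-01-06'),
            (7, 300, '2020-01-07');
        INSERT INTO label VALUES (1, 1), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0);
    """)
    query = """
        SELECT CASE WHEN d.amount <= 100 THEN '0-100'
                    WHEN d.amount <= 500 THEN '101-500'
                    ELSE '> 500' END AS grp,
               SUM(d.amount) AS sum_amount,
               COUNT(*) AS cnt,
               SUM(CASE WHEN l.fraud = 1 THEN d.amount ELSE 0 END) AS fraud_amount,
               SUM(CASE WHEN l.fraud = 1 THEN 1 ELSE 0 END) AS fraud_cnt
        FROM data d
        LEFT JOIN label l ON d.id = l.id
        WHERE d.date BETWEEN '2020-01-01' AND '2020-01-07'
        GROUP BY grp
        ORDER BY MIN(d.amount)  -- list buckets from smallest to largest amounts
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

Calling `bucket_summary()` returns one tuple per bucket in the order (bucket, sum of amount, count, fraud amount, fraud count). Note that, for the rows exactly as listed in the question, the middle bucket contains a single fraud row (ID 1, amount 110).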

Related

SQL join query for a view with sum of columns across 3 tables

I have 3 tables as below
Table - travel_requests
id industry_id travel_cost stay_cost other_cost
1 2 1000 500 200
2 4 4000 100 200
3 5 3000 0 400
4 1 3000 250 100
5 1 200 100 75
Table - industry_tech_region
id industry_name
1 Auto
2 Aero
3 Machinery
4 Education
5 MTV
Table - industry_allocation
id industry_id allocation
1 1 500000
2 2 300000
3 3 500000
4 4 300000
5 5 500000
6 1 200000
I want to create a view which has 3 columns
industry_name, total_costs, total_allocation
I created a view as below
SELECT industry_tech_region.industry_name,
SUM(travel_requests.travel_cost + travel_requests.stay_cost + travel_requests.other_cost) AS total_cost,
SUM(industry_allocation.allocation) AS total_allocation
FROM industry_tech_region
INNER JOIN industry_allocation
ON industry_tech_region.id = industry_allocation.industry_id
INNER JOIN travel_requests
ON industry_tech_region.id = travel_requests.industry_id
GROUP BY industry_tech_region.industry_name
But the result I get is as below which is incorrect
industry_name total_cost total_allocation
Aero 1700 300000
Auto 7450 1400000 (wrong should be 3725 and 700000)
Education 4300 300000
MTV 3400 500000
This is probably happening because there are 2 entries for industry_id 1 in the travel_requests table, but they should be counted only once.
Please let me know how to correct the view statement.
I also want to add another column to the view, remaining_allocation, which is the difference between total_allocation and total_cost for each industry.

You should join the sums (and not sum the join):
select
a.industry_name
, t1.total_cost
, t2.total_allocation
from dbo.industry_tech_region a
left join (
select dbo.travel_requests.industry_id
, SUM(dbo.travel_requests.travel_cost + dbo.travel_requests.stay_cost + dbo.travel_requests.other_cost) AS total_cost
FROM dbo.travel_requests
group by dbo.travel_requests.industry_id
) t1 on a.id = t1.industry_id
left join (
select dbo.industry_allocation.industry_id
, SUM(dbo.industry_allocation.allocation) AS total_allocation
from dbo.industry_allocation
group by dbo.industry_allocation.industry_id
) t2 on a.id = t2.industry_id
This happens because there are two entries for industry_id 1, so the matching rows are joined twice. If you aggregate the rows in subqueries before joining, this cannot happen.
I have used a left join because it seems that not every industry_id appears in all 3 tables.
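The "aggregate first, then join" idea can be tried on the question's sample data with a small sketch like the one below, which uses an in-memory SQLite database from Python rather than SQL Server. The function name `industry_totals`, the aliases, and the use of COALESCE to turn missing totals into 0 are choices made for this illustration; the `remaining_allocation` column is one possible way to get the extra column the question asks for.

```python
import sqlite3

def industry_totals():
    """Return {industry_name: (total_cost, total_allocation, remaining_allocation)}."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE travel_requests (id INTEGER, industry_id INTEGER,
            travel_cost INTEGER, stay_cost INTEGER, other_cost INTEGER);
        CREATE TABLE industry_tech_region (id INTEGER, industry_name TEXT);
        CREATE TABLE industry_allocation (id INTEGER, industry_id INTEGER, allocation INTEGER);
        INSERT INTO travel_requests VALUES
            (1, 2, 1000, 500, 200), (2, 4, 4000, 100, 200), (3, 5, 3000, 0, 400),
            (4, 1, 3000, 250, 100), (5, 1, 200, 100, 75);
        INSERT INTO industry_tech_region VALUES
            (1, 'Auto'), (2, 'Aero'), (3, 'Machinery'), (4, 'Education'), (5, 'MTV');
        INSERT INTO industry_allocation VALUES
            (1, 1, 500000), (2, 2, 300000), (3, 3, 500000),
            (4, 4, 300000), (5, 5, 500000), (6, 1, 200000);
    """)
    # Each subquery collapses its table to one row per industry_id,
    # so the outer joins can no longer multiply rows.
    query = """
        SELECT r.industry_name,
               COALESCE(t.total_cost, 0) AS total_cost,
               COALESCE(a.total_allocation, 0) AS total_allocation,
               COALESCE(a.total_allocation, 0) - COALESCE(t.total_cost, 0)
                   AS remaining_allocation
        FROM industry_tech_region r
        LEFT JOIN (SELECT industry_id,
                          SUM(travel_cost + stay_cost + other_cost) AS total_cost
                   FROM travel_requests
                   GROUP BY industry_id) t ON r.id = t.industry_id
        LEFT JOIN (SELECT industry_id, SUM(allocation) AS total_allocation
                   FROM industry_allocation
                   GROUP BY industry_id) a ON r.id = a.industry_id
    """
    result = {name: (cost, alloc, rest)
              for name, cost, alloc, rest in con.execute(query)}
    con.close()
    return result
```

With the sample rows, `industry_totals()["Auto"]` gives the totals the question expects (3725 and 700000), and Machinery, which has no travel requests, shows a cost of 0 instead of disappearing.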

You can use this approach too (without the ORDER BY because views do not allow it).
;WITH q AS (
SELECT industry_id
, sum(allocation) AS total_allocation
FROM #industry_allocation
GROUP BY industry_id
)
SELECT #industry_tech_region.industry_name
, isnull(SUM(#travel_request.travel_cost
+ #travel_request.stay_cost
+ #travel_request.other_cost),0.0) AS total_cost
,q.total_allocation AS total_allocation
FROM #industry_tech_region
LEFT JOIN q ON #industry_tech_region.id = q.industry_id
LEFT JOIN #travel_request ON #industry_tech_region.id = #travel_request.industry_id
GROUP BY #industry_tech_region.industry_name,q.total_allocation
ORDER BY industry_name

SQL Server : Joining two pivot tables

I'm trying to join a table with itself. In the example below, the query pivots the table on the sum of D (debits) and C (credits). However, I need to join the table with itself to add columns showing the count of "D" entries and the count of "C" entries, plus two more columns showing the overall sum and the overall count. How do I join the table below to create the additional columns?
Input table
GL_BU GL_Source GL_JE_Type GL_Amount Amount_Prefix
------------------------------------------------------------------
202 Payables Purchase Invoices 1234 C
202 Payables Purchase Invoices 123 D
202 Inventory Inventory 123 C
202 Payables Purchase Invoices 1234 C
Output Table
GL_BU GL_Source GL_JE_Type Amount D Amount C Count D Count C Total Count Total Amount
------------------------------------------------------------------------------------------
202 Spreadsheet XXXXX 1234 123 1 1 2 1357
202 Manual XXXXX 1234 123 2 2 4 1357
202 Manual XXXXX 1234 123 1 1 2 1357
202 Inventory XXXXX 1234 123 4 4 8 1357
202 Sales Order XXXXXX 1234 123 1 1 2 1357
Current Code
SELECT *
FROM
(SELECT
[GL_Business_Unit]
,[GL_Source]
,[GL_JE_Type]
,([GL_Amount])
,[Amount_Prefix]
FROM [03_rdm].[table_2013]) as t
Pivot(SUM([GL_Amount])
FOR [Amount_Prefix] IN (D,C)) AS pvt1
Current code link in SQLFiddle http://sqlfiddle.com/#!3/92369/2

Your sample data doesn't match your desired result, so I'm guessing that this is what you need. You could use a PIVOT to get the result, but it seems much easier to get it with aggregate functions and some conditional logic via a CASE expression:
select
GL_BU,
GL_Source,
GL_JE_Type,
sum(case when Amount_Prefix = 'D' then GL_Amount else 0 end) Amount_D,
sum(case when Amount_Prefix = 'C' then GL_Amount else 0 end) Amount_C,
sum(case when Amount_Prefix = 'D' then 1 else 0 end) Count_D,
sum(case when Amount_Prefix = 'C' then 1 else 0 end) Count_C,
count(*) TotalCount,
sum(GL_Amount) TotalAmount
from table_2013
group by GL_BU, GL_Source, GL_JE_Type;
See SQL Fiddle with Demo
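As a quick way to see what the conditional aggregation produces for the question's four input rows, here is a minimal sketch using an in-memory SQLite database from Python. The table name `gl` and the function name `gl_summary` are assumptions for the illustration, and SQLite stands in for SQL Server.

```python
import sqlite3

def gl_summary():
    """Return {(GL_Source, GL_JE_Type): (Amount_D, Amount_C, Count_D, Count_C,
    TotalCount, TotalAmount)} for the question's input rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE gl (GL_BU INTEGER, GL_Source TEXT, GL_JE_Type TEXT,
                         GL_Amount INTEGER, Amount_Prefix TEXT);
        INSERT INTO gl VALUES
            (202, 'Payables',  'Purchase Invoices', 1234, 'C'),
            (202, 'Payables',  'Purchase Invoices', 123,  'D'),
            (202, 'Inventory', 'Inventory',         123,  'C'),
            (202, 'Payables',  'Purchase Invoices', 1234, 'C');
    """)
    # One pass over the table: each CASE routes a row into the D or C column.
    query = """
        SELECT GL_Source, GL_JE_Type,
               SUM(CASE WHEN Amount_Prefix = 'D' THEN GL_Amount ELSE 0 END) AS Amount_D,
               SUM(CASE WHEN Amount_Prefix = 'C' THEN GL_Amount ELSE 0 END) AS Amount_C,
               SUM(CASE WHEN Amount_Prefix = 'D' THEN 1 ELSE 0 END) AS Count_D,
               SUM(CASE WHEN Amount_Prefix = 'C' THEN 1 ELSE 0 END) AS Count_C,
               COUNT(*) AS TotalCount,
               SUM(GL_Amount) AS TotalAmount
        FROM gl
        GROUP BY GL_BU, GL_Source, GL_JE_Type
    """
    result = {(row[0], row[1]): row[2:] for row in con.execute(query)}
    con.close()
    return result
```

No self-join is needed here: the counts and totals come from the same grouped scan as the pivoted sums.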

sql adding specific columns

I have a table in Oracle containing the fields
id, location, stock, rate
select decode(grouping(id),1,'Total',id) id,location,
sum(stock) stock,avg(rate) rate from product
group by rollup(id),location
This gives me
ID Location stock rate
------------------------------------------
A xx 2 10
A xy 5 20
Total 7 10
B SD 3 4
B RT 6 10
Total 9 7
C FG 12 12
C GH 20 18
Total 32 15
Now I want a grand-total row over the Total rows, showing the sum of stock and the sum of rate.
My desired output is
ID Location stock rate
------------------------------------------
A xx 2 10
A xy 5 20
Total 7 10
B SD 3 4
B RT 6 10
Total 9 7
C FG 12 12
C GH 20 18
Total 32 15
Grand Total 48 32
Note: the grand-total rate is not an average but the sum of the average rates in the Total rows.

You could use UNION to append the total:
select decode(grouping(id),1,'Total',id) id,location,
sum(stock) stock,avg(rate) rate from product
group by rollup(id),location
UNION
select 'Grand Total','',sum(stock) stock,sum(rate) rate
from product
If you weren't mixing different aggregates you could get a grand total with ROLLUP or with grouping sets, but I don't think you can get a SUM() grand total with AVG() subtotals.
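To make the "sum of the per-id averages" requirement concrete, here is a hedged sketch that builds the detail rows, the per-id subtotals and the grand total with plain UNION ALL. It runs on an in-memory SQLite database from Python, which has no ROLLUP, so this is a different technique from the Oracle queries in this thread. The function name `stock_report` and the helper columns `sort_key` and `lvl` are invented for the illustration.

```python
import sqlite3

def stock_report():
    """Return detail rows, a 'Total' row per id, and a final 'Grand Total' row."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product (id TEXT, location TEXT, stock INTEGER, rate INTEGER);
        INSERT INTO product VALUES
            ('A', 'xx', 2, 10), ('A', 'xy', 5, 20),
            ('B', 'SD', 3, 4),  ('B', 'RT', 6, 10),
            ('C', 'FG', 12, 12), ('C', 'GH', 20, 18);
    """)
    query = """
        WITH subtotals AS (
            SELECT id, SUM(stock) AS stock, AVG(rate) AS rate
            FROM product
            GROUP BY id
        )
        SELECT id, location, stock, rate
        FROM (
            SELECT id, location, stock, rate, id AS sort_key, 0 AS lvl FROM product
            UNION ALL
            SELECT 'Total', NULL, stock, rate, id, 1 FROM subtotals
            UNION ALL
            -- grand total: sum of stock and sum of the per-id average rates
            SELECT 'Grand Total', NULL, SUM(stock), SUM(rate), NULL, 2 FROM subtotals
        )
        ORDER BY lvl = 2, sort_key, lvl  -- grand total last, each Total after its id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

The grand total is computed from the `subtotals` rows rather than from `product`, which is what makes its rate a sum of averages. For the six rows as listed, the average rate for A works out to 15, so the figures differ from the 10 and 32 shown in the question's output.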

Doing a rollup on all columns will produce the grand total:
select (case when grouping(id) = 1 then 'Total' else id end) as id, location,
sum(stock) stock,avg(rate) rate
from product
group by rollup(id, location)
However, you do not want the rollup on location. So you can filter that out with a having clause:
select (case when grouping(id) = 1 then 'Total' else id end) as id, location,
sum(stock) stock, avg(rate) rate
from product
group by rollup(id, location)
having not (grouping(product.location) = 1 and grouping(product.id) = 0);

Maybe like this:
select decode(grouping(id),1,'Total',id) id,location,
sum(stock) stock,avg(rate) rate from product
group by rollup(id),location
union
select 'Grand Total', null, sum(stock), sum(rate) from
(
select decode(grouping(id),1,'Total',id) id,location,
sum(stock) stock,avg(rate) rate from product
group by rollup(id),location
) where id='Total';

Calculating sum of activity

I have a table with the following kind of information:
activity cost order date other information
10 1 100 --
20 2 100
10 1 100
30 4 100
40 4 100
20 2 100
40 4 100
20 2 100
10 1 101
10 1 101
20 1 101
My requirement is to get the sum of the costs of all distinct activities in a work order.
For example, for order 100:
1+2+4+4=11
1 (for activity 10)
2 (for activity 20)
4 (for activity 30), etc.
I tried GROUP BY, but the calculation takes a long time. There are more than 1 lakh (100,000) records in the warehouse. Is there a more efficient way?
SELECT SUM(MIN(cost))
FROM COST_WAREHOUSE a
WHERE order = 100
GROUP BY (order, ACTIVITY)

You can use the following query to get the sum of cost over the distinct (activity, order, cost) tuples:
SELECT SUM(COST)
FROM
(SELECT DISTINCT activity, order, cost
FROM COST_WAREHOUSE WHERE order = 100) AS A
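The DISTINCT-then-SUM idea can be checked against the question's sample rows with a short sketch like this one, run on an in-memory SQLite database from Python. The column is named `order_no` here because `order` is a reserved word; that name and the function name `order_cost` are assumptions for the illustration.

```python
import sqlite3

SAMPLE_ROWS = [
    (10, 1, 100), (20, 2, 100), (10, 1, 100), (30, 4, 100), (40, 4, 100),
    (20, 2, 100), (40, 4, 100), (20, 2, 100), (10, 1, 101), (10, 1, 101),
    (20, 1, 101),
]

def order_cost(order_no):
    """Sum the cost of each distinct (activity, order_no, cost) tuple for one order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cost_warehouse (activity INTEGER, cost INTEGER, order_no INTEGER)")
    con.executemany("INSERT INTO cost_warehouse VALUES (?, ?, ?)", SAMPLE_ROWS)
    # The inner query removes repeated rows; the outer query adds up what is left.
    (total,) = con.execute(
        """
        SELECT SUM(cost)
        FROM (SELECT DISTINCT activity, order_no, cost
              FROM cost_warehouse
              WHERE order_no = ?) AS a
        """,
        (order_no,),
    ).fetchone()
    con.close()
    return total
```

For order 100 this yields 1 + 2 + 4 + 4, matching the figure in the question. Note that DISTINCT on (activity, order, cost) would count an activity twice if it ever appeared with two different costs in the same order.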

SQL- Calculating SUM/COUNT with rows of table

I have a PriceComparison table with (StoreNumber, ItemNumber, Price) that keeps pricing data for head-to-head comparison shopping. The goal is a recordset with the following things for all stores:
StoreNumber
COUNT of head-to-head wins for that store
COUNT of head-to-head losses for that store
COUNT of head-to-head ties for that store
SUM of all item pricing for that store
SUM of all head-to-head competitor pricing for items above for that store
Example:
StoreNumber ItemNumber Price
----------- ---------- -----
101 1 1.39
102 1 1.89
101 2 3.49
103 2 2.99
101 3 9.99
104 3 9.99
I'm thinking I can calculate these SUMs and COUNTs if I can get a temporary column added for CompetitorPrice. That way, the item has both prices listed, and it becomes easy.
How can I get this information in the correct configuration? I tried an INNER JOIN to the same table, but that gets tricky.
Thanks!
UPDATE: This is for MS SQL Server.
UPDATE: There will only be two prices per item, no more than 2 stores.

SELECT
a.storenumber,
SUM(CASE WHEN a.price < b.price THEN 1 ELSE 0 END) AS wins,
SUM(CASE WHEN a.price > b.price THEN 1 ELSE 0 END) AS losses,
SUM(CASE WHEN a.price = b.price THEN 1 ELSE 0 END) AS ties,
SUM(a.price) AS store_price_sum,
SUM(b.price) AS competitor_price_sum
FROM
pricecomparison a
INNER JOIN
pricecomparison b ON
a.itemnumber = b.itemnumber AND
a.storenumber <> b.storenumber
GROUP BY
a.storenumber
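To see what this self-join produces for the example rows, here is a minimal sketch that runs the same query shape on an in-memory SQLite database from Python. SQLite stands in for MS SQL Server, and the function name `head_to_head` is an assumption for the illustration.

```python
import sqlite3

def head_to_head():
    """Return {storenumber: (wins, losses, ties, store_price_sum, competitor_price_sum)}."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE pricecomparison (storenumber INTEGER, itemnumber INTEGER, price REAL);
        INSERT INTO pricecomparison VALUES
            (101, 1, 1.39), (102, 1, 1.89),
            (101, 2, 3.49), (103, 2, 2.99),
            (101, 3, 9.99), (104, 3, 9.99);
    """)
    # Joining on the same item but a different store pairs every price
    # with its competitor's price, so each row of the join is one matchup.
    query = """
        SELECT a.storenumber,
               SUM(CASE WHEN a.price < b.price THEN 1 ELSE 0 END) AS wins,
               SUM(CASE WHEN a.price > b.price THEN 1 ELSE 0 END) AS losses,
               SUM(CASE WHEN a.price = b.price THEN 1 ELSE 0 END) AS ties,
               SUM(a.price) AS store_price_sum,
               SUM(b.price) AS competitor_price_sum
        FROM pricecomparison a
        INNER JOIN pricecomparison b
                ON a.itemnumber = b.itemnumber
               AND a.storenumber <> b.storenumber
        GROUP BY a.storenumber
    """
    result = {row[0]: row[1:] for row in con.execute(query)}
    con.close()
    return result
```

A lower price counts as a win here. The approach relies on the stated constraint that each item has exactly two prices; with three or more stores per item, every store would be paired with each competitor and the sums would be counted more than once.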
