Computing a per-category total in a query - SQL

My table SALES:
ID | NAMEPRODUCT | CATEGORY | AMOUNT
1 | COMPUTER | IT | 600
2 | T-SHIRT | CLOTHING | 25
3 | Doll | TOY | 10
4 | KEYBORD | IT | 30
5 | CAP | CLOTHING | 10
6 | TOY CAR | TOY | 40
I would like to run a query like this:
SELECT SALES1.NAMEPRODUCT,
SALES1.CATEGORY,
SUM(SALES1.AMOUNT)/( select SUM(AMOUNT) FROM SALES AS SALES2 WHERE SALES1.CATEGORY=SALES2.CATEGORY) as ratio
from SALES AS SALES1
GROUP BY
SALES1.NAMEPRODUCT, SALES1.CATEGORY;
Expected result:
NAMEPRODUCT | CATEGORY | ratio
COMPUTER | IT | 0,95
T-SHIRT | CLOTHING | 0,71
Doll | TOY | 0,20
KEYBORD | IT | 0,05
CAP | CLOTHING | 0,29
TOY CAR | TOY | 0,80
OR
SELECT SALES1.NAMEPRODUCT,
SALES1.CATEGORY,
SUM(SALES1.AMOUNT) AS AMOUNT,
( select SUM(SALES2.AMOUNT) FROM SALES AS SALES2 WHERE SALES1.CATEGORY=SALES2.CATEGORY) AS TOTAL_AMOUNT_for_category
from SALES AS SALES1
GROUP BY
SALES1.NAMEPRODUCT,
SALES1.CATEGORY;
Expected result:
NAMEPRODUCT | CATEGORY | AMOUNT |TOTAL_AMOUNT_for_category
COMPUTER | IT | 600 | 630
T-SHIRT | CLOTHING | 25 | 35
Doll | TOY | 10 | 50
KEYBORD | IT | 30 | 630
CAP | CLOTHING | 10 | 35
TOY CAR | TOY | 40 | 50
I am using a Cassandra database through the Apache Hive connector, and I can't run this kind of query with Hive and Cassandra.
Can someone help me?

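You seem to have only one table. If so, use window functions. The window sum is evaluated after the GROUP BY, so sum(sum(amount)) over (partition by category) is the category total:
select nameproduct, category,
sum(amount) as amount,
sum(sum(amount)) over (partition by category) as total_amount_for_category,
sum(amount) * 1.0 / sum(sum(amount)) over (partition by category) as ratio
from sales
group by nameproduct, category;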
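A minimal sketch for checking this query without a Hive/Cassandra setup: it runs the same window-function SQL on an in-memory SQLite database from Python (assuming SQLite 3.25 or newer, which added window functions) and returns both the category total and the ratio, covering both forms of the question. It has not been run through the Hive connector; the `category_ratios` helper is hypothetical, and the ID column is left out.

```python
import sqlite3

# Sample rows from the SALES table above (ID column omitted).
ROWS = [
    ("COMPUTER", "IT", 600),
    ("T-SHIRT", "CLOTHING", 25),
    ("Doll", "TOY", 10),
    ("KEYBORD", "IT", 30),
    ("CAP", "CLOTHING", 10),
    ("TOY CAR", "TOY", 40),
]

# The window SUM runs after GROUP BY, so SUM(SUM(amount)) over the category
# partition is the category total, repeated on every product row.
QUERY = """
SELECT nameproduct,
       category,
       SUM(amount) AS product_amount,
       SUM(SUM(amount)) OVER (PARTITION BY category) AS total_amount_for_category,
       SUM(amount) * 1.0 / SUM(SUM(amount)) OVER (PARTITION BY category) AS ratio
FROM sales
GROUP BY nameproduct, category
ORDER BY category, nameproduct
"""


def category_ratios():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (nameproduct TEXT, category TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in category_ratios():
        print(row)
```

The `* 1.0` forces floating-point division; without it, integer columns would make the ratio truncate to 0 on databases that do integer division.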

Related

PostgreSQL query to count net new customers grouped by products and segments

I have a table of billing data that looks like this:
+-----------------------+------------+------------+-------------+---------+
| segment | Product | custname | prod_yr_mth | revenue |
+-----------------------+------------+------------+-------------+---------+
| PUBLIC SECTOR | product 1 | customer a | 201806 | 318.34 |
| LARGE ENTERPRISE | product 2 | customer b | 201902 | 8.96 |
| SMALL MEDIUM BUSINESS | product 3 | customer c | 201907 | -10 |
| SMALL MEDIUM BUSINESS | product 4 | customer d | 201804 | 0 |
| SMALL MEDIUM BUSINESS | product 5 | customer e | 201809 | 9.33 |
| LARGE ENTERPRISE | product 1 | customer f | 201901 | 155.75 |
| PUBLIC SECTOR | product 6 | customer g | 201905 | 24.32 |
| SMALL MEDIUM BUSINESS | product 2 | customer h | 201812 | 0.25 |
| SMALL MEDIUM BUSINESS | product 2 | customer i | 201801 | 5.46 |
| LARGE ENTERPRISE | product 7 | customer j | 201805 | 4.5 |
| LARGE ENTERPRISE | product 1 | customer k | 201812 | 0 |
| SMALL MEDIUM BUSINESS | product 8 | customer l | 201809 | 2.99 |
| LARGE ENTERPRISE | product 2 | customer m | 201812 | 0.71 |
| LARGE ENTERPRISE | product 1 | customer n | 201902 | 0 |
| PUBLIC SECTOR | product 2 | customer o | 201803 | 1.08 |
| SMALL MEDIUM BUSINESS | product 9 | customer p | 201802 | 10.27 |
| LARGE ENTERPRISE | product 10 | customer a | 201905 | 52.99 |
| PUBLIC SECTOR | product 1 | customer b | 201810 | 7 |
| SMALL MEDIUM BUSINESS | product 3 | customer c | 201906 | 40 |
+-----------------------+------------+------------+-------------+---------+
I would like to get a count of net new customers for each product grouped by business segment.
This month, the maximum date in the prod_yr_mth column is 201908. I consider a customer "net new" for a product if the earliest prod_yr_mth for that customer buying that product is 201908.
The end result should look like this:
+-----------------------+------------+-------------------+
| segment | Product | Net_New_Customers |
+-----------------------+------------+-------------------+
| LARGE ENTERPRISE | product 1 | 0 |
| LARGE ENTERPRISE | product 10 | 5 |
| LARGE ENTERPRISE | product 2 | 6 |
| LARGE ENTERPRISE | product 6 | 1 |
| LARGE ENTERPRISE | product 7 | 2 |
| PUBLIC SECTOR | product 1 | 3 |
| PUBLIC SECTOR | product 2 | 1 |
| PUBLIC SECTOR | product 5 | 1 |
| PUBLIC SECTOR | product 6 | 1 |
| SMALL MEDIUM BUSINESS | product 1 | 1 |
| SMALL MEDIUM BUSINESS | product 2 | 0 |
| SMALL MEDIUM BUSINESS | product 3 | 9 |
| SMALL MEDIUM BUSINESS | product 4 | 8 |
| SMALL MEDIUM BUSINESS | product 5 | 7 |
| SMALL MEDIUM BUSINESS | product 6 | 3 |
| SMALL MEDIUM BUSINESS | product 8 | 4 |
| SMALL MEDIUM BUSINESS | product 9 | 5 |
| SMALL MEDIUM BUSINESS | product 10 | 2 |
+-----------------------+------------+-------------------+
Thank you!
If I understand correctly, you want to compare the earliest date for each customer/product to the most recent date in the table. If so (counting distinct customers, so a customer with several rows in the latest month is counted only once):
select segment, Product,
count(distinct custname) filter (where max_yyyymm = min_cp_yyyymm) as net_new_customers
from (select t.*,
max(prod_yr_mth) over () as max_yyyymm,
min(prod_yr_mth) over (partition by custname, product) as min_cp_yyyymm
from t -- t is your billing table
) t
group by segment, Product;
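A minimal sketch of the same logic on in-memory SQLite from Python, assuming SQLite 3.25+ for window functions. Two substitutions, stated plainly: the table name `billing` and its rows are hypothetical (the question's sample has no 201908 rows, so it would return only zeros), and `COUNT(DISTINCT CASE WHEN ... THEN custname END)` stands in for Postgres's `count(...) filter (where ...)` because it is more widely supported.

```python
import sqlite3

# Hypothetical billing rows, made up to exercise the "net new" rule.
ROWS = [
    ("LARGE ENTERPRISE", "product 1", "customer a", 201908, 10.0),
    ("LARGE ENTERPRISE", "product 1", "customer a", 201908, 5.0),  # same customer twice
    ("LARGE ENTERPRISE", "product 1", "customer b", 201905, 7.0),
    ("LARGE ENTERPRISE", "product 1", "customer b", 201908, 7.0),  # bought earlier: not new
    ("PUBLIC SECTOR", "product 2", "customer c", 201908, 1.0),
    ("PUBLIC SECTOR", "product 2", "customer d", 201907, 2.0),
    ("PUBLIC SECTOR", "product 2", "customer e", 201908, 3.0),
]

# Rows that fail the test produce NULL in the CASE, and COUNT ignores NULLs.
QUERY = """
SELECT segment, product,
       COUNT(DISTINCT CASE WHEN min_cp_yyyymm = max_yyyymm THEN custname END)
           AS net_new_customers
FROM (SELECT b.*,
             MAX(prod_yr_mth) OVER () AS max_yyyymm,
             MIN(prod_yr_mth) OVER (PARTITION BY custname, product) AS min_cp_yyyymm
      FROM billing b) x
GROUP BY segment, product
ORDER BY segment, product
"""


def net_new_customers():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE billing (segment TEXT, product TEXT, custname TEXT,"
        " prod_yr_mth INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO billing VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(net_new_customers())
```

Here customer a counts once for product 1 despite two 201908 rows, and customer b is excluded because their first purchase was 201905.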

How to write an SQL report using distinct and sum

I'm trying to write an SQL report that groups rows, removes duplicates, and sums up values in virtual columns.
I have this table:
make | model | warranty | price
-------+--------+----------+-------
Honda | Accord | 2 | 700
Honda | Civic | 3 | 500
Lexus | ES 350 | 1 | 900
Lexus | ES 350 | 1 | 900
Lexus | ES 350 | 2 | 1300
Lexus | ES 350 | 3 | 1800
(6 rows)
I'm trying to create a report that adds two virtual columns, qty and total, where total is qty * price. The table should look like the one below.
qty | make | model | warranty | price | total
-------+--------+--------+----------+-------+-------
1 | Honda | Accord | 2 | 700 | 700
1 | Honda | Civic | 3 | 500 | 500
2 | Lexus | ES 350 | 1 | 900 | 1800
1 | Lexus | ES 350 | 2 | 1300 | 1300
1 | Lexus | ES 350 | 3 | 1800 | 1800
(5 rows)
I think this is simple aggregation:
select count(*) as qty, make, model, warranty,
avg(price) as price, sum(price) as total
from t
group by make, model, warranty;
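To see the aggregation produce the requested five rows, here is a minimal sketch that loads the sample table into in-memory SQLite from Python and runs the query. The table name `t` follows the answer; the average is aliased `unit_price` rather than `price` only to avoid reusing a column name as an alias, and `warranty_report` is a hypothetical helper.

```python
import sqlite3

# The six rows from the question, including the duplicated Lexus row.
ROWS = [
    ("Honda", "Accord", 2, 700),
    ("Honda", "Civic", 3, 500),
    ("Lexus", "ES 350", 1, 900),
    ("Lexus", "ES 350", 1, 900),
    ("Lexus", "ES 350", 2, 1300),
    ("Lexus", "ES 350", 3, 1800),
]

# Duplicates collapse into one group; COUNT(*) is qty, and SUM(price) equals
# qty * price as long as every row in a group has the same price.
QUERY = """
SELECT COUNT(*) AS qty, make, model, warranty,
       AVG(price) AS unit_price, SUM(price) AS total
FROM t
GROUP BY make, model, warranty
ORDER BY make, model, warranty
"""


def warranty_report():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (make TEXT, model TEXT, warranty INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in warranty_report():
        print(row)
```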

SQL: Getting the current price of a product and its product code

I have 3 tables and I need to get a listing of product codes and their current prices.
The Product table has the product name (string) and its code (integer); the Manufacturer table has the product name (string) and the manufacturer's code (integer) for it; and the Pricing table has the manufacturer's code (integer) for each product, the price (numeric), and a date (date).
I don't have much experience with SQL beyond the basics and I can't really figure out how to get the proper listing.
I built an example for you in case it helps you learn:
Product
+----+----------------+
| id | name |
+----+----------------+
| 1 | GFORCE TITAN |
| 2 | GFORCE 770 |
| 3 | GFORCE 1060 TI |
+----+----------------+
Manufacturer
+----+----------+
| id | name |
+----+----------+
| 1 | Gigabyte |
| 2 | Asus |
| 3 | MSI |
+----+----------+
Prices
+----+-------+-----------------+------------+
| id | price | manufacturer_id | product_id |
+----+-------+-----------------+------------+
| 1 | 1000 | 1 | 1 |
| 2 | 600 | 1 | 2 |
| 3 | 400 | 2 | 2 |
| 4 | 300 | 3 | 3 |
+----+-------+-----------------+------------+
Then you can query it like this:
SELECT p.id, p.price, m.name AS manufacturer, pr.name AS product
FROM Prices p
JOIN Manufacturer m ON p.manufacturer_id = m.id
JOIN Product pr ON p.product_id = pr.id
ORDER BY p.price DESC;
The result would be:
+----+-------+--------------+----------------+
| id | price | manufacturer | product |
+----+-------+--------------+----------------+
| 1 | 1000 | Gigabyte | GFORCE TITAN |
| 2 | 600 | Gigabyte | GFORCE 770 |
| 3 | 400 | Asus | GFORCE 770 |
| 4 | 300 | MSI | GFORCE 1060 TI |
+----+-------+--------------+----------------+
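The example above has no date column, so it lists every price rather than the current one. A minimal sketch of one common way to pick the latest price per product/manufacturer pair, `ROW_NUMBER()` ordered by date descending, run on in-memory SQLite from Python (SQLite 3.25+). The `price_date` column, its values, and the extra 950 row are hypothetical; if prices can be dated in the future, a `WHERE price_date <= CURRENT_DATE` filter inside the subquery would also be needed.

```python
import sqlite3

PRODUCTS = [(1, "GFORCE TITAN"), (2, "GFORCE 770"), (3, "GFORCE 1060 TI")]
MANUFACTURERS = [(1, "Gigabyte"), (2, "Asus"), (3, "MSI")]
PRICES = [  # (id, price, manufacturer_id, product_id, hypothetical price_date)
    (1, 1000, 1, 1, "2019-01-01"),
    (2, 950, 1, 1, "2019-06-01"),  # newer price for the same product/manufacturer
    (3, 600, 1, 2, "2019-01-01"),
    (4, 400, 2, 2, "2019-01-01"),
    (5, 300, 3, 3, "2019-03-01"),
]

# ROW_NUMBER() ranks each pair's prices newest first; rn = 1 is the current price.
QUERY = """
SELECT pr.id AS product_code, pr.name AS product, m.name AS manufacturer, x.price
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY product_id, manufacturer_id
                                ORDER BY price_date DESC) AS rn
      FROM prices p) x
JOIN product pr ON pr.id = x.product_id
JOIN manufacturer m ON m.id = x.manufacturer_id
WHERE x.rn = 1
ORDER BY pr.id, m.name
"""


def current_prices():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE prices (id INTEGER PRIMARY KEY, price NUMERIC,"
        " manufacturer_id INTEGER, product_id INTEGER, price_date TEXT)"
    )
    conn.executemany("INSERT INTO product VALUES (?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO manufacturer VALUES (?, ?)", MANUFACTURERS)
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?)", PRICES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in current_prices():
        print(row)
```

ISO-formatted date strings sort chronologically as text, which is why ordering by `price_date` works here; a real DATE column behaves the same way.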

Display the results of sub category and rollup on the parent category

I am using SAP HANA and want the data rolled up to the category level with a total for each category, while still showing the subcategory rows.
For example, my query currently returns this:
category | sub_category | count
------------+----------------------+--------
Grocery | Dairy | 200
Grocery | fruits | 600
Kitchen | Microwave | 100
Kitchen | Stove | 100
Other | shoes | 400
Other | racks | 500
What I want is:
category | sub_category | count
-------------+----------------------+--------
Grocery | Dairy | 200
Grocery | fruits | 600
Total Grocery| null | 800
Kitchen | Microwave | 100
Kitchen | Stove | 100
Total Kitchen| null | 200
Other | shoes | 400
Other | racks | 500
Total Other | null | 900
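No answer was posted here. In SAP HANA, `GROUP BY ROLLUP (category, sub_category)` should be the natural tool, though it also adds a grand-total row and leaves the category name (not a "Total ..." label) in the subtotal rows, so some relabeling would be needed; this has not been tried on HANA. As a portable sketch, the following emulates the requested output with `UNION ALL`, executed on in-memory SQLite from Python. `category_counts` is a hypothetical table holding the query result shown above, and the column is named `cnt` to avoid the `count` keyword.

```python
import sqlite3

ROWS = [
    ("Grocery", "Dairy", 200), ("Grocery", "fruits", 600),
    ("Kitchen", "Microwave", 100), ("Kitchen", "Stove", 100),
    ("Other", "shoes", 400), ("Other", "racks", 500),
]

# SQLite has no GROUP BY ROLLUP, so the subtotal rows come from a second,
# category-level GROUP BY appended with UNION ALL. grp and is_total exist only
# to sort each "Total ..." row directly after its detail rows.
QUERY = """
SELECT label, sub_category, cnt
FROM (SELECT category AS grp, category AS label, sub_category, cnt, 0 AS is_total
      FROM category_counts
      UNION ALL
      SELECT category, 'Total ' || category, NULL, SUM(cnt), 1
      FROM category_counts
      GROUP BY category) u
ORDER BY grp, is_total, sub_category
"""


def rollup_report():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE category_counts (category TEXT, sub_category TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO category_counts VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rollup_report():
        print(row)
```

Sorting by `sub_category` puts "racks" before "shoes"; drop it from the ORDER BY if the original detail order does not matter.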

Calculate percentage of revenue per month

I have the following underlying data:
+-------+-------+---------------+
| Order | Month | sqft produced |
+-------+-------+---------------+
| 1001 | 4 | 10.29 |
| 1001 | 6 | 4'367.66 |
| 1001 | 7 | 203.57 |
| 1001 | 8 | 294.61 |
| 1001 | 9 | 92.28 |
| 1001 | 10 | 34.47 |
| 1001 | 12 | 16.59 |
| 1002 | 1 | 1.74 |
| 1002 | 4 | 19.54 |
| 1002 | 7 | 5'552.21 |
| 1002 | 9 | 309.62 |
| 1002 | 10 | 24.15 |
| 1002 | 12 | 52.16 |
| 1003 | 5 | 807.45 |
+-------+-------+---------------+
These are three orders, and I want to split each order's revenue according to the percentage of sqft produced in each month.
The revenue table:
+-------+-----------+
| Order | Revenue |
+-------+-----------+
| 1001 | 1'135'465 |
| 1002 | 1'773'499 |
| 1003 | 172'633 |
+-------+-----------+
So the output of the query should look like this:
+-------+-------+------------------+
| Order | Month | Revenue produced |
+-------+-------+------------------+
| 1001 | 4 | 2'327.72 |
| 1001 | 6 | 988'017.67 |
| 1001 | 7 | 46'050.00 |
| 1001 | 8 | 66'644.36 |
| 1001 | 9 | 20'874.86 |
| 1001 | 10 | 7'797.53 |
| 1001 | 12 | 3'752.86 |
| 1002 | 1 | 517.82 |
| 1002 | 4 | 5'815.02 |
| 1002 | 7 | 1'652'314.97 |
| 1002 | 9 | 92'141.64 |
| 1002 | 10 | 7'186.94 |
| 1002 | 12 | 15'522.60 |
| 1003 | 5 | 172'633.00 |
+-------+-------+------------------+
I am struggling to get the data into that format because I can't work out how to split it by month.
Divide each month's square footage by the sum of the square footage partitioned by [Order] to get the percentage, then multiply by the revenue. Multiplying before the CAST keeps the fraction from being rounded to five decimal places first:
SELECT p.[Order],
p.[Month],
r.Revenue,
p.[sqft produced],
CAST(p.[sqft produced] / SUM(p.[sqft produced]) OVER(PARTITION BY p.[Order]) * 100 AS DECIMAL(10,5)) AS sqft_percentage,
CAST(r.Revenue * p.[sqft produced] / SUM(p.[sqft produced]) OVER(PARTITION BY p.[Order]) AS DECIMAL(18,2)) AS revenue_produced
FROM [Orders] p
INNER JOIN Revenue r
ON r.[Order] = p.[Order];
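As a sanity check on the window-function approach, here is a minimal sketch that runs the same idea on in-memory SQLite from Python (SQLite 3.25+) with the data above. Assumptions: thousands separators are removed, columns are renamed to `order_id`, `month_num`, `sqft_produced`, and `total_revenue` to avoid reserved words, and `ROUND(..., 2)` is used for display. Revenue is multiplied in before rounding, so the monthly share is never truncated to a fixed number of decimals.

```python
import sqlite3

SQFT = [  # (order_id, month_num, sqft_produced)
    (1001, 4, 10.29), (1001, 6, 4367.66), (1001, 7, 203.57), (1001, 8, 294.61),
    (1001, 9, 92.28), (1001, 10, 34.47), (1001, 12, 16.59),
    (1002, 1, 1.74), (1002, 4, 19.54), (1002, 7, 5552.21), (1002, 9, 309.62),
    (1002, 10, 24.15), (1002, 12, 52.16),
    (1003, 5, 807.45),
]
REVENUES = [(1001, 1135465), (1002, 1773499), (1003, 172633)]

# Each month's share is its sqft over the order's total sqft, where the total
# comes from a window SUM partitioned by order_id.
QUERY = """
SELECT s.order_id, s.month_num,
       ROUND(r.total_revenue * s.sqft_produced
             / SUM(s.sqft_produced) OVER (PARTITION BY s.order_id), 2) AS revenue_produced
FROM sqft s
JOIN revenues r ON r.order_id = s.order_id
ORDER BY s.order_id, s.month_num
"""


def revenue_by_month():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sqft (order_id INTEGER, month_num INTEGER, sqft_produced REAL)")
    conn.execute("CREATE TABLE revenues (order_id INTEGER, total_revenue REAL)")
    conn.executemany("INSERT INTO sqft VALUES (?, ?, ?)", SQFT)
    conn.executemany("INSERT INTO revenues VALUES (?, ?)", REVENUES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in revenue_by_month():
        print(row)
```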
Let's call the first table sqft and the second revenues. To avoid reserved words, the first table's column names will be:
order_id, month_num, sqft_produced
The second:
order_id, total_revenue
The SQL goes something like this (written for Oracle):
SELECT sqft.order_id, sqft.month_num,
(sqft.sqft_produced / totals.total_sqft * totals.total_revenue) revenue_produced
FROM sqft
INNER JOIN (SELECT sqfts.order_id, sqfts.total_sqft, revenues.total_revenue
FROM (SELECT order_id, SUM(sqft_produced) total_sqft
FROM sqft
GROUP BY order_id) sqfts
INNER JOIN revenues ON sqfts.order_id = revenues.order_id) totals
ON sqft.order_id = totals.order_id;
Step by step:
SELECT order_id, SUM(sqft_produced) total_sqft
FROM sqft
GROUP BY order_id;
This gets you total square feet produced by order.
SELECT sqfts.order_id, sqfts.total_sqft, revenues.total_revenue
FROM (SELECT order_id, SUM(sqft_produced) total_sqft FROM sqft GROUP BY order_id) sqfts
INNER JOIN revenues ON sqfts.order_id = revenues.order_id;
This gets you a table with total revenues and total square feet.
Finally, join this result back to the sqft table, divide the square feet produced each month by the order's total square feet, and multiply by the total revenue. This splits the revenue proportionally to square feet.
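A minimal sketch that runs this join-based version on in-memory SQLite from Python, using the table and column names proposed in this answer and the question's data with thousands separators removed. It is not Oracle, but the query uses only plain joins and GROUP BY, so the structure carries over. Because every month's share is divided by the same order total, the monthly amounts for an order should add back up to its revenue, which the checks below rely on.

```python
import sqlite3

SQFT = [  # (order_id, month_num, sqft_produced)
    (1001, 4, 10.29), (1001, 6, 4367.66), (1001, 7, 203.57), (1001, 8, 294.61),
    (1001, 9, 92.28), (1001, 10, 34.47), (1001, 12, 16.59),
    (1002, 1, 1.74), (1002, 4, 19.54), (1002, 7, 5552.21), (1002, 9, 309.62),
    (1002, 10, 24.15), (1002, 12, 52.16),
    (1003, 5, 807.45),
]
REVENUES = [(1001, 1135465), (1002, 1773499), (1003, 172633)]

# Step 1: total sqft per order; step 2: join to revenues;
# step 3: join back to the monthly rows and split revenue proportionally.
QUERY = """
SELECT sqft.order_id, sqft.month_num,
       sqft.sqft_produced / totals.total_sqft * totals.total_revenue AS revenue_produced
FROM sqft
JOIN (SELECT sqfts.order_id, sqfts.total_sqft, revenues.total_revenue
      FROM (SELECT order_id, SUM(sqft_produced) AS total_sqft
            FROM sqft GROUP BY order_id) sqfts
      JOIN revenues ON sqfts.order_id = revenues.order_id) totals
  ON sqft.order_id = totals.order_id
ORDER BY sqft.order_id, sqft.month_num
"""


def split_revenue():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sqft (order_id INTEGER, month_num INTEGER, sqft_produced REAL)")
    conn.execute("CREATE TABLE revenues (order_id INTEGER, total_revenue REAL)")
    conn.executemany("INSERT INTO sqft VALUES (?, ?, ?)", SQFT)
    conn.executemany("INSERT INTO revenues VALUES (?, ?)", REVENUES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for order_id, month_num, amount in split_revenue():
        print(order_id, month_num, round(amount, 2))
```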