TERADATA: Aggregate across multiple tables - sql

Consider the following query, which aggregates across two tables (Sales and Promo) and then uses the aggregate values in a further calculation.
SELECT
sales.article_id,
avg((sales.euro_value - ZEROIFNULL(promo.euro_value)) / NULLIFZERO(sales.qty - ZEROIFNULL(promo.qty)))
FROM
( SELECT
sales.article_id,
sum(sales.euro_value) AS euro_value,
sum(sales.qty) AS qty
from SALES_TABLE sales
where year >= 2011
group by article_id
) sales
LEFT OUTER JOIN
( SELECT
promo.article_id,
sum(promo.euro_value) AS euro_value,
sum(promo.qty) AS qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) promo
ON sales.article_id = promo.article_id
GROUP BY sales.article_id;
Some notes on the query:
Both inner queries return a huge number of rows because there are so many articles. According to EXPLAIN on Teradata, the inner queries themselves take very little time, but the join takes a long time.
Assume a primary key on article_id is present and both tables are partitioned by year.
A left outer join is used because the second table contains optional data.
So, can you suggest a better way of writing this query? Thanks for reading this far :)

Not really sure how the avg function got into the mix, so I'm removing it.
SELECT article_id,
(SUM(sales_value) - SUM(promo_value)) /
NULLIFZERO(SUM(sales_qty) - SUM(promo_qty))
FROM (
SELECT
article_id,
sum(euro_value) AS sales_value,
sum(qty) AS sales_qty,
0 AS promo_value,
0 AS promo_qty
from SALES_TABLE sales
where year >= 2011
group by article_id
UNION ALL
SELECT
article_id,
0 AS sales_value,
0 AS sales_qty,
sum(euro_value) AS promo_value,
sum(qty) AS promo_qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) AS comb
GROUP BY article_id;
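To check that the UNION ALL rewrite computes the intended ratio, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Teradata. The sample rows are invented for illustration, and NULLIF(x, 0) is used in place of Teradata's NULLIFZERO, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_table (article_id INTEGER, year INTEGER, euro_value REAL, qty REAL);
CREATE TABLE promotion_table (article_id INTEGER, year INTEGER, euro_value REAL, qty REAL);
-- Hypothetical sample data; the 2010 row must be filtered out.
INSERT INTO sales_table VALUES
  (1, 2011, 100, 10), (1, 2012, 50, 5), (2, 2011, 40, 4), (1, 2010, 999, 99);
INSERT INTO promotion_table VALUES (1, 2011, 30, 5);
""")

# Each branch aggregates one table and pads the other table's columns with 0,
# so the outer GROUP BY only has to add up at most two rows per article.
query = """
SELECT article_id,
       (SUM(sales_value) - SUM(promo_value))
         / NULLIF(SUM(sales_qty) - SUM(promo_qty), 0) AS net_unit_value
FROM (
    SELECT article_id,
           SUM(euro_value) AS sales_value, SUM(qty) AS sales_qty,
           0 AS promo_value, 0 AS promo_qty
    FROM sales_table WHERE year >= 2011 GROUP BY article_id
    UNION ALL
    SELECT article_id,
           0, 0,
           SUM(euro_value), SUM(qty)
    FROM promotion_table WHERE year >= 2011 GROUP BY article_id
) AS comb
GROUP BY article_id
ORDER BY article_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 12.0), (2, 10.0)]
```

One difference from the LEFT OUTER JOIN version is worth keeping in mind: an article that appears only in the promotion table still produces a row here, whereas the left join would drop it.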

Related

simple sql query highest sales

There are 2 tables - Products and Sales
Products
prod_id
prod_nm
Sales
prod_id
cust_id
sls_dt
sls_amt
Write a query selecting ALL the products. For each product, show the total of sales amounts in the past 30 days, or 0 if it was not sold in the past 30 days, without using subqueries.
Since different RDBMS have different date functions, you can filter by date using the following pseudo code - sls_dt > now() - 30.
I'm new to SQL and I'm trying it like this, as I found this online.
Select prod_id, prod_nm from(
Select sls_amt
From Sales) as t
Where t.rank = 1
However, this isn't working. Any help is appreciated.
Try below:
select p.prod_id,
p.prod_nm,
coalesce(sum(s.sls_amt), 0) as total_sls_amt
from products p
left outer join Sales s on p.prod_id = s.prod_id
and s.sls_dt > now() - 30
group by p.prod_id,
p.prod_nm;
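The date filter sits in the ON clause rather than in WHERE so that products without recent sales still survive the outer join. A small sketch of the difference, run against SQLite through Python's sqlite3 module; the sample rows are invented, and SQLite's date('now', '-30 days') stands in for the pseudo code now() - 30.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_nm TEXT);
CREATE TABLE sales (prod_id INTEGER, cust_id INTEGER, sls_dt TEXT, sls_amt REAL);
-- Hypothetical data: widget sold recently, gadget sold long ago, gizmo never sold.
INSERT INTO products VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
INSERT INTO sales VALUES
  (1, 10, date('now', '-5 days'), 20.0),
  (1, 11, date('now', '-10 days'), 30.0),
  (2, 10, date('now', '-90 days'), 99.0);
""")

# Filter in the ON clause: unmatched products keep a NULL-extended row,
# and COALESCE turns the resulting NULL sum into 0.
filter_in_on = conn.execute("""
SELECT p.prod_id, p.prod_nm, COALESCE(SUM(s.sls_amt), 0) AS total
FROM products p
LEFT JOIN sales s
       ON p.prod_id = s.prod_id
      AND s.sls_dt > date('now', '-30 days')
GROUP BY p.prod_id, p.prod_nm
ORDER BY p.prod_id
""").fetchall()

# Filter in WHERE: the NULL-extended rows fail the predicate and disappear.
filter_in_where = conn.execute("""
SELECT p.prod_id, p.prod_nm, COALESCE(SUM(s.sls_amt), 0) AS total
FROM products p
LEFT JOIN sales s ON p.prod_id = s.prod_id
WHERE s.sls_dt > date('now', '-30 days')
GROUP BY p.prod_id, p.prod_nm
ORDER BY p.prod_id
""").fetchall()

print(filter_in_on)     # all three products, two of them with total 0
print(filter_in_where)  # only the widget
```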

Summing a column over a date range in a CTE?

I'm trying to sum a certain column over a certain date range. The kicker is that I want this to be a CTE, because I'll have to use it multiple times as part of a larger query. Since it's a CTE, it has to have the date column as well as the sum and ID columns, meaning I have to group by date AND ID. That will cause my results to be grouped by ID and date, giving me not a single sum over the date range, but a bunch of sums, one for each day.
To make it simple, say we have:
create table orders (
id int primary key,
itemID int foreign key references items.id,
datePlaced datetime,
salesRep int foreign key references salesReps.id,
price int,
amountShipped int);
Now, we want to get the total money a given sales rep made during a fiscal year, broken down by item. That is, ignoring the fiscal year bit:
select itemName, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName
Simple enough. But when you add anything else, even the price, the query spits out way more rows than you wanted.
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName, price
Now, each group is (name, price) instead of just (name). This is kind of pseudocode, but in my database, just this change causes my result set to jump from 13 to 32 rows. Add to that the date range, and you really have a problem:
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between 150101 and 151231
group by itemName, price
This is identical to the last example. The trouble is making it a CTE:
with totals as (
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped, orderDate as startDate, orderDate as endDate
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between startDate and endDate
group by itemName, price, startDate, endDate
)
select totals_2015.itemName as itemName_2015, totals_2015.price as price_2015, ...
totals_2016.itemName as itemName_2016, ...
from (
select * from totals
where startDate = 150101 and endDate = 151231
) totals_2015
join (
select *
from totals
where startDate = 160101 and endDate = 160412
) totals_2016
on totals_2015.itemName = totals_2016.itemName
Now the grouping in the CTE is way off, even more than it was after adding the price. I've thought about breaking the price query into its own subquery inside the CTE, but I can't escape needing to group by the dates in order to get the date range. Can anyone see a way around this? I hope I've made things clear enough. This is running against an IBM iSeries machine. Thank you!
Depending on what you are looking for, this might be a better approach:
select 'by sales rep' breakdown
, salesRep
, '' year
, sum(price * amountShipped) amount
from etc
group by salesRep
union
select 'by sales rep and year' breakdown
, salesRep
, convert(char(4),orderDate, 120) year
, sum(price * amountShipped) amount
from etc
group by salesRep, convert(char(4),orderDate, 120)
etc
When possible, group by the ID columns or foreign keys: those columns are often indexed already, which usually gives faster results. Whether foreign keys are indexed automatically depends on the database, so check before relying on it.
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from somewhere
where date between x and y
group by id,rep
) select * from cte order by rep
or more fancy
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from somewhere
where date between x and y
group by id,rep
) select * from cte join reps on cte.rep = reps.rep order by sls desc
I eventually found a solution, and it doesn't need a CTE at all. I wanted the CTE to avoid code duplication, but this works almost as well. Here's a thread explaining summing conditionally that does exactly what I was looking for.
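The linked thread is not reproduced here, but conditional summing usually means putting the date test inside the aggregate as a CASE expression, so that one GROUP BY produces a separate total per date range. A minimal sketch, assuming that is the technique meant, run on SQLite through Python's sqlite3 module with invented rows and ISO date strings instead of the numeric dates used above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, itemName TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, itemID INTEGER, datePlaced TEXT,
    salesRep INTEGER, price INTEGER, amountShipped INTEGER);
-- Hypothetical sample data; the last order belongs to another sales rep.
INSERT INTO items VALUES (1, 'bolt'), (2, 'nut');
INSERT INTO orders VALUES
  (1, 1, '2015-03-01', 1234, 10, 1),
  (2, 1, '2015-07-15', 1234, 15, 2),
  (3, 1, '2016-02-01', 1234, 20, 1),
  (4, 2, '2016-03-10', 1234, 5, 4),
  (5, 1, '2015-05-05', 9999, 100, 1);
""")

# The date ranges live inside the aggregates, so the date column never
# has to appear in GROUP BY and each item stays on a single row.
rows = conn.execute("""
SELECT i.itemName,
       SUM(CASE WHEN o.datePlaced BETWEEN '2015-01-01' AND '2015-12-31'
                THEN o.price ELSE 0 END) AS totalSales_2015,
       SUM(CASE WHEN o.datePlaced BETWEEN '2016-01-01' AND '2016-04-12'
                THEN o.price ELSE 0 END) AS totalSales_2016
FROM orders o
JOIN items i ON i.id = o.itemID
WHERE o.salesRep = 1234
GROUP BY i.itemName
ORDER BY i.itemName
""").fetchall()
print(rows)  # [('bolt', 25, 20), ('nut', 0, 5)]
```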

SQL / sum on different date ranges with other conditions

I have the following code:
SELECT
day
,product_id
,product_name
,quantity_on_hand
,inventory_condition
FROM
(
SELECT
table1.product_id as product_id
,table1.product_name as product_name
FROM table1
WHERE
product_id = XXXX
)product_table
,
(
SELECT
table2.day as day
,table2.product_id as inv_product_id
,inventory_condition
,sum( table2.quantity) AS quantity_on_hand
FROM table2
WHERE
table2.day = TO_DATE('{RUN_DATE_YYYY/MM/DD}', 'YYYY/MM/DD')
AND table2.inventory_condition = XXX
GROUP BY
table2.day
,table2.product_id
,inventory_condition
) inv
WHERE
product_table.product_id = inv.inv_product_id
This code works great if I want to extract the data for a single day. But I want to extract the data for 3 different days in the same query. I've tried to use an OR in my condition on table2.day, but it gives me the sum of the data for the 3 days all together. I've also tried to do
Sum() over (Partition by table2.day)
But I'm not sure how to use the syntax.
Thanks a lot for your help.

Using a column in sql join without adding it to group by clause

My actual table structures are much more complex but following are two simplified table definitions:
Table invoice
CREATE TABLE invoice (
id integer NOT NULL,
create_datetime timestamp with time zone NOT NULL,
total numeric(22,10) NOT NULL
);
id create_datetime total
----------------------------
100 2014-05-08 1000
Table payment_invoice
CREATE TABLE payment_invoice (
invoice_id integer,
amount numeric(22,10)
);
invoice_id amount
-------------------
100 100
100 200
100 150
I want to select the data by joining the above 2 tables, and the selected data should look like this:
month total_invoice_count outstanding_balance
05/2014 1 550
The query I am using:
select
to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') as month,
count(i.id) as total_invoice_count,
(sum(i.total) - sum(pi.amount)) as outstanding_balance
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by date_trunc('month', i.create_datetime)
order by date_trunc('month', i.create_datetime);
Above query is giving me incorrect results as sum(i.total) - sum(pi.amount) returns (1000 + 1000 + 1000) - (100 + 200 + 150) = 2550.
I want it to return (1000) - (100 + 200 + 150) = 550
And I cannot change it to i.total - sum(pi.amount), because then I am forced to add i.total column to group by clause and that I don't want to do.
You need a single row per invoice, so aggregate payment_invoice first - best before you join.
When the whole table is selected, it's typically fastest to aggregate first and join later:
SELECT to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') AS month
, count(*) AS total_invoice_count
, (sum(i.total) - COALESCE(sum(pi.paid), 0)) AS outstanding_balance
FROM invoice i
LEFT JOIN (
SELECT invoice_id AS id, sum(amount) AS paid
FROM payment_invoice pi
GROUP BY 1
) pi USING (id)
GROUP BY date_trunc('month', i.create_datetime)
ORDER BY date_trunc('month', i.create_datetime);
LEFT JOIN is essential here. You do not want to lose invoices that have no corresponding rows in payment_invoice (yet), which would happen with a plain JOIN.
Accordingly, use COALESCE() for the sum of payments, which might be NULL.
SQL Fiddle with improved test case.
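The row multiplication and its fix can be reproduced with a small sketch on SQLite through Python's sqlite3 module. Invoice 101 is a hypothetical unpaid invoice added to show why LEFT JOIN and COALESCE matter; strftime replaces PostgreSQL's to_char and date_trunc, and the alias pay is used instead of pi.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER NOT NULL, create_datetime TEXT NOT NULL, total NUMERIC NOT NULL);
CREATE TABLE payment_invoice (invoice_id INTEGER, amount NUMERIC);
-- Invoice 100 is from the question; invoice 101 (no payments) is hypothetical.
INSERT INTO invoice VALUES (100, '2014-05-08', 1000), (101, '2014-05-20', 300);
INSERT INTO payment_invoice VALUES (100, 100), (100, 200), (100, 150);
""")

# Join first, aggregate later: invoice 100 is repeated once per payment,
# and invoice 101 is dropped by the inner join.
naive = conn.execute("""
SELECT strftime('%m/%Y', i.create_datetime) AS month,
       COUNT(i.id) AS total_invoice_count,
       SUM(i.total) - SUM(pay.amount) AS outstanding_balance
FROM invoice i
JOIN payment_invoice pay ON i.id = pay.invoice_id
GROUP BY month
""").fetchall()

# Aggregate payments to one row per invoice first, then LEFT JOIN.
fixed = conn.execute("""
SELECT strftime('%m/%Y', i.create_datetime) AS month,
       COUNT(*) AS total_invoice_count,
       SUM(i.total) - COALESCE(SUM(pay.paid), 0) AS outstanding_balance
FROM invoice i
LEFT JOIN (
    SELECT invoice_id, SUM(amount) AS paid
    FROM payment_invoice
    GROUP BY invoice_id
) pay ON pay.invoice_id = i.id
GROUP BY month
""").fetchall()

print(naive)  # [('05/2014', 3, 2550)]
print(fixed)  # [('05/2014', 2, 850)]
```

With only invoice 100 in the table, the second query returns the 550 the question asks for; the extra invoice raises the balance to 850 and the count to 2.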
Do the aggregation in two steps. First aggregate to a single line per invoice, then to a single line per month:
select
to_char(date_trunc('month', t.create_datetime), 'MM/YYYY') as month,
count(*) as total_invoice_count,
(sum(t.total) - sum(t.amount)) as outstanding_balance
from (
select i.create_datetime, i.total, sum(pi.amount) amount
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by i.id, i.create_datetime, i.total
) t
group by date_trunc('month', t.create_datetime)
order by date_trunc('month', t.create_datetime);
See sqlFiddle
SELECT TO_CHAR(invoice.create_datetime, 'MM/YYYY') as month,
COUNT(invoice.create_datetime) as total_invoice_count,
SUM(invoice.total - payments.sum_amount) as outstanding_balance
FROM invoice
JOIN
(
SELECT invoice_id, SUM(amount) AS sum_amount
FROM payment_invoice
GROUP BY invoice_id
) payments
ON invoice.id = payments.invoice_id
GROUP BY TO_CHAR(invoice.create_datetime, 'MM/YYYY')

Select highest profit from each year SQL

How do I obtain the highest value for each year within a table? Let's say we have a table movies and I want to find the most profitable film for each year.
This is my attempt so far:
SELECT year, MAX(income - cost) AS profit, title
FROM Movies m, Movies m2
GROUP BY year
I am pretty certain it is going to need some sub selects but I can't visualise what I need to do. I was also thinking probably some sort of distinct option to rule out duplicate years.
Title Year Income Cost Length
A 2000 10 2 2
B 2000 9 7 2
So from this the expected result would be
Title Year Profit
A 2000 8
I'm guessing slightly at what you want, but since you've not specified any RDBMS a generic solution would be:
SELECT m.Year, (m.Income - m.Cost) AS Profit, m.Title
FROM Movies m
INNER JOIN
( SELECT m.Year, MAX(m.Income - m.Cost) AS Profit
FROM Movies m
GROUP BY m.Year
) MaxProfit
ON MaxProfit.Year = m.Year
AND MaxProfit.Profit = (m.Income - m.Cost)
ORDER BY m.Year
You can also do this using analytic functions if your DBMS permits. e.g. SQL-Server
WITH MovieCTE AS
( SELECT m.Year,
Profit = (m.Income - m.Cost),
m.Title,
RowNumber = ROW_NUMBER() OVER(PARTITION BY m.Year ORDER BY (m.Income - m.Cost) DESC)
FROM Movies m
)
SELECT year, Profit, Title
FROM MovieCTE
WHERE RowNumber = 1
It is possible I have misunderstood your exact criteria, but I am sure the same principles can be applied; you will just need to alter the grouping and the join in the first example, or the partition by in the second.
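The join-to-maximum approach from the first example can be tried out with a short sketch on SQLite through Python's sqlite3 module. Movies A and B come from the question; C and D are hypothetical rows added to show that a tie returns every tied title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (title TEXT, year INTEGER, income INTEGER, cost INTEGER, length INTEGER);
-- A and B are from the question; C and D are invented and tie at profit 4.
INSERT INTO movies VALUES
  ('A', 2000, 10, 2, 2), ('B', 2000, 9, 7, 2),
  ('C', 2001, 5, 1, 2), ('D', 2001, 6, 2, 2);
""")

# Compute the best profit per year, then join back to recover the title(s)
# whose profit equals that maximum.
rows = conn.execute("""
SELECT m.year, m.income - m.cost AS profit, m.title
FROM movies m
JOIN (
    SELECT year, MAX(income - cost) AS profit
    FROM movies
    GROUP BY year
) mx ON mx.year = m.year
    AND mx.profit = m.income - m.cost
ORDER BY m.year, m.title
""").fetchall()
print(rows)  # [(2000, 8, 'A'), (2001, 4, 'C'), (2001, 4, 'D')]
```

The ROW_NUMBER variant differs on exactly this point: it keeps one arbitrary row per year when profits tie, unless the ORDER BY inside OVER adds a tie-breaker.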
select m1year, m1profit, title
from
(select year as m1year, max(income - cost) as m1profit from movies group by year) m1
join
(select year as m2year, (income - cost) as m2profit, title from movies) m2
on
m1year = m2year
and m1profit = m2profit
This will give the highest profit movie for each year, and choose the alphabetically first title in the event of a tie:
select a.year, a.profit,
(select min(title) from Movies where year = a.year and income - cost = a.profit) as title
from (
select year, max(income - cost) as profit
from Movies -- title, year, cost, income, number
group by year
) as a
order by year desc