Using a column in an SQL join without adding it to the GROUP BY clause

My actual table structures are much more complex, but the following are two simplified table definitions:
Table invoice
CREATE TABLE invoice (
id integer NOT NULL,
create_datetime timestamp with time zone NOT NULL,
total numeric(22,10) NOT NULL
);
id create_datetime total
----------------------------
100 2014-05-08 1000
Table payment_invoice
CREATE TABLE payment_invoice (
invoice_id integer,
amount numeric(22,10)
);
invoice_id amount
-------------------
100 100
100 200
100 150
I want to select the data by joining the two tables above, and the selected data should look like this:
month total_invoice_count outstanding_balance
05/2014 1 550
The query I am using:
select
to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') as month,
count(i.id) as total_invoice_count,
(sum(i.total) - sum(pi.amount)) as outstanding_balance
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by date_trunc('month', i.create_datetime)
order by date_trunc('month', i.create_datetime);
The query above gives incorrect results: the join repeats the invoice row once per matching payment, so sum(i.total) - sum(pi.amount) returns (1000 + 1000 + 1000) - (100 + 200 + 150) = 2550.
I want it to return (1000) - (100 + 200 + 150) = 550.
I cannot change it to i.total - sum(pi.amount), because then I am forced to add the i.total column to the GROUP BY clause, which I don't want to do.

You need a single row per invoice, so aggregate payment_invoice first, ideally before you join.
When the whole table is selected, it's typically fastest to aggregate first and join later:
SELECT to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') AS month
, count(*) AS total_invoice_count
, (sum(i.total) - COALESCE(sum(pi.paid), 0)) AS outstanding_balance
FROM invoice i
LEFT JOIN (
SELECT invoice_id AS id, sum(amount) AS paid
FROM payment_invoice pi
GROUP BY 1
) pi USING (id)
GROUP BY date_trunc('month', i.create_datetime)
ORDER BY date_trunc('month', i.create_datetime);
LEFT JOIN is essential here. You do not want to lose invoices that have no corresponding rows in payment_invoice (yet), which would happen with a plain JOIN.
Accordingly, use COALESCE() for the sum of payments, which might be NULL.
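A minimal sketch that reproduces the fan-out and the fix, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. Assumptions: SQLite has no date_trunc() or to_char(), so the month is computed with strftime('%Y-%m', ...) as a sortable key; timestamps are stored as ISO text; and invoice 101 is a hypothetical unpaid invoice added to show what the LEFT JOIN and COALESCE() preserve.

```python
import sqlite3

# Stand-in schema: timestamps as ISO text; invoice 101 is hypothetical and unpaid.
SETUP = """
CREATE TABLE invoice (id INTEGER NOT NULL, create_datetime TEXT NOT NULL, total NUMERIC NOT NULL);
CREATE TABLE payment_invoice (invoice_id INTEGER, amount NUMERIC);
INSERT INTO invoice VALUES (100, '2014-05-08', 1000), (101, '2014-06-02', 300);
INSERT INTO payment_invoice VALUES (100, 100), (100, 200), (100, 150);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def join_then_aggregate(conn: sqlite3.Connection) -> list:
    # The question's approach: invoice 100 is repeated once per payment row,
    # and invoice 101 disappears because it has no payments.
    return conn.execute("""
        SELECT strftime('%Y-%m', i.create_datetime) AS month,
               count(i.id) AS total_invoice_count,
               sum(i.total) - sum(pi.amount) AS outstanding_balance
        FROM invoice i
        JOIN payment_invoice pi ON i.id = pi.invoice_id
        GROUP BY month
        ORDER BY month
    """).fetchall()


def aggregate_then_join(conn: sqlite3.Connection) -> list:
    # The answer's approach: the subquery yields one row per invoice,
    # LEFT JOIN keeps unpaid invoices, COALESCE turns their NULL sum into 0.
    return conn.execute("""
        SELECT strftime('%Y-%m', i.create_datetime) AS month,
               count(*) AS total_invoice_count,
               sum(i.total) - COALESCE(sum(pi.paid), 0) AS outstanding_balance
        FROM invoice i
        LEFT JOIN (
            SELECT invoice_id AS id, sum(amount) AS paid
            FROM payment_invoice
            GROUP BY invoice_id
        ) pi USING (id)
        GROUP BY month
        ORDER BY month
    """).fetchall()


conn = connect()
print(join_then_aggregate(conn))
print(aggregate_then_join(conn))
```

The first query returns a single May row with a count of 3 and a balance of 2550 and no June row at all; the pre-aggregated query returns 550 for May and keeps the unpaid June invoice at 300.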

Do the aggregation in two steps. First aggregate to a single line per invoice, then to a single line per month:
select
to_char(date_trunc('month', t.create_datetime), 'MM/YYYY') as month,
count(*) as total_invoice_count,
(sum(t.total) - sum(t.amount)) as outstanding_balance
from (
select i.create_datetime, i.total, sum(pi.amount) amount
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by i.id, i.create_datetime, i.total
) t
group by date_trunc('month', t.create_datetime)
order by date_trunc('month', t.create_datetime);
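The two-step pattern can be checked the same way. This sketch again uses SQLite through Python's sqlite3 module in place of PostgreSQL (strftime('%Y-%m', ...) replaces date_trunc()/to_char()), with hypothetical invoices 102 and 103 in the same month, where 103 has no payments. The inner query groups by every selected invoice column, mirroring what PostgreSQL requires when id is not declared as a primary key, and the join type is a parameter so the answer's inner JOIN can be compared with a LEFT JOIN.

```python
import sqlite3

# Hypothetical data: two paid invoices and one unpaid invoice (103) in May 2014.
SETUP = """
CREATE TABLE invoice (id INTEGER NOT NULL, create_datetime TEXT NOT NULL, total NUMERIC NOT NULL);
CREATE TABLE payment_invoice (invoice_id INTEGER, amount NUMERIC);
INSERT INTO invoice VALUES (100, '2014-05-08', 1000), (102, '2014-05-20', 500), (103, '2014-05-28', 250);
INSERT INTO payment_invoice VALUES (100, 100), (100, 200), (100, 150), (102, 200), (102, 300);
"""


def two_step(conn: sqlite3.Connection, join_kind: str = "JOIN") -> list:
    # Only two fixed keywords are allowed, so the f-string cannot inject SQL.
    if join_kind not in ("JOIN", "LEFT JOIN"):
        raise ValueError(join_kind)
    # Step 1 (subquery t): one row per invoice with its payments summed.
    # Step 2 (outer query): one row per month.
    return conn.execute(f"""
        SELECT strftime('%Y-%m', t.create_datetime) AS month,
               count(*) AS total_invoice_count,
               sum(t.total) - sum(t.amount) AS outstanding_balance
        FROM (
            SELECT i.id, i.create_datetime, i.total,
                   COALESCE(sum(pi.amount), 0) AS amount
            FROM invoice i
            {join_kind} payment_invoice pi ON i.id = pi.invoice_id
            GROUP BY i.id, i.create_datetime, i.total
        ) t
        GROUP BY month
        ORDER BY month
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(two_step(conn))               # inner join: unpaid invoice 103 is missing
print(two_step(conn, "LEFT JOIN"))  # left join: invoice 103 counted with 0 paid
```

With the inner JOIN, invoice 103 silently drops out of both the count and the balance; with the LEFT JOIN it is counted and its full total stays outstanding.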

SELECT TO_CHAR(invoice.create_datetime, 'MM/YYYY') as month,
COUNT(invoice.create_datetime) as total_invoice_count,
SUM(invoice.total - payments.sum_amount) as outstanding_balance
FROM invoice
JOIN
(
SELECT invoice_id, SUM(amount) AS sum_amount
FROM payment_invoice
GROUP BY invoice_id
) payments
ON invoice.id = payments.invoice_id
GROUP BY TO_CHAR(invoice.create_datetime, 'MM/YYYY')

Related

SQL / sum on different date ranges with other conditions

I have the following code:
SELECT
day
,product_id
,product_name
,quantity_on_hand
,inventory_condition
FROM
(
SELECT
table1.product_id as product_id
,table1.product_name as product_name
FROM table1
WHERE
product_id = XXXX
)product_table
,
(
SELECT
table2.day as day
,table2.product_id as inv_product_id
,inventory_condition
,sum( table2.quantity) AS quantity_on_hand
FROM table2
WHERE
table2.day = TO_DATE('{RUN_DATE_YYYY/MM/DD}', 'YYYY/MM/DD')
AND table2.inventory_condition = XXX
GROUP BY
table2.day
,table2.product_id
,inventory_condition
) inv
WHERE
product_id = inv.inv_product_id
This code works great if I want to extract the data for a single day, but I want to extract the data for 3 different days in the same query. I've tried using OR in my condition on table2.day, but it gives me the sum of the data for the 3 days all together. I've also tried
Sum() over (Partition by table2.day)
but I'm not sure how to use the syntax.
Thanks a lot for your help.
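No answer to this question is included here. One common approach, sketched under assumptions: keep table2.day in both the select list and the GROUP BY, and list the wanted days with IN (equivalent to OR-ing equality tests on the same column), so every day gets its own sum. The sketch runs on SQLite through Python's sqlite3 module; the table contents, the 'SELLABLE' condition and product id 1 are hypothetical stand-ins for the XXX/XXXX placeholders, and the join to table1 is left out for brevity.

```python
import sqlite3

# Hypothetical rows; 'SELLABLE' / 'DAMAGED' and product 1 replace the placeholders.
SETUP = """
CREATE TABLE table2 (day TEXT, product_id INTEGER, inventory_condition TEXT, quantity INTEGER);
INSERT INTO table2 VALUES
    ('2014-05-01', 1, 'SELLABLE', 5), ('2014-05-01', 1, 'SELLABLE', 3),
    ('2014-05-02', 1, 'SELLABLE', 7),
    ('2014-05-03', 1, 'SELLABLE', 2), ('2014-05-03', 1, 'DAMAGED', 9),
    ('2014-05-04', 1, 'SELLABLE', 100);
"""


def quantity_per_day(conn: sqlite3.Connection, days: list, product_id: int, condition: str) -> list:
    # One "?" per requested day; day stays in SELECT and GROUP BY,
    # so each day keeps its own sum instead of being merged with the others.
    placeholders = ", ".join("?" for _ in days)
    sql = f"""
        SELECT day, product_id, inventory_condition, sum(quantity) AS quantity_on_hand
        FROM table2
        WHERE day IN ({placeholders})
          AND product_id = ?
          AND inventory_condition = ?
        GROUP BY day, product_id, inventory_condition
        ORDER BY day
    """
    return conn.execute(sql, [*days, product_id, condition]).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
for row in quantity_per_day(conn, ["2014-05-01", "2014-05-02", "2014-05-03"], 1, "SELLABLE"):
    print(row)
```

If the three days come out merged into one total, the usual cause is that day was dropped from the GROUP BY; a window function such as SUM() OVER (PARTITION BY day) is not needed for this.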

How to get the distinct number of customers in the prior period?

I have a requirement to roll up customer data over a prior period of 365 days.
Table:
CREATE TABLE orders (
persistent_key_str character varying,
ord_id character varying(50),
ord_submitted_date date,
item_sku_id character varying(50),
item_extended_actual_price_amt numeric(18,2)
);
Sample data:
INSERT INTO orders VALUES
('01120736182','ORD6266073' ,'2010-12-08','100856-01',39.90),
('01120736182','ORD33997609' ,'2011-11-23','100265-01',49.99),
('01120736182','ORD33997609' ,'2011-11-23','200020-01',29.99),
('01120736182','ORD33997609' ,'2011-11-23','100817-01',44.99),
('01120736182','ORD89267964' ,'2012-12-05','200251-01',79.99),
('01120736182','ORD89267964' ,'2012-12-05','200269-01',59.99),
('01011679971','ORD89332495' ,'2012-12-05','200102-01',169.99),
('01120736182','ORD89267964' ,'2012-12-05','100907-01',89.99),
('01120736182','ORD89267964' ,'2012-12-05','200840-01',129.99),
('01120736182','ORD125155068','2013-07-27','201443-01',199.99),
('01120736182','ORD167230815','2014-06-05','200141-01',59.99),
('01011679971','ORD174927624','2014-08-16','201395-01',89.99),
('01000217334','ORD92524479' ,'2012-12-20','200021-01',29.99),
('01000217334','ORD95698491' ,'2013-01-08','200021-01',19.99),
('01000217334','ORD90683621' ,'2012-12-12','200021-01',29.990),
('01000217334','ORD92524479' ,'2012-12-20','200560-01',29.99),
('01000217334','ORD145035525','2013-12-09','200972-01',49.99),
('01000217334','ORD145035525','2013-12-09','100436-01',39.99),
('01000217334','ORD90683374' ,'2012-12-12','200284-01',39.99),
('01000217334','ORD139437285','2013-11-07','201794-01',134.99),
('01000827006','W02238550001','2010-06-11','HL 101077',349.000),
('01000827006','W01738200001','2009-12-10','EL 100310 BLK',119.96),
('01000954259','P00444170001','2009-12-03','PC 100455 BRN',389.99),
('01002319116','W02242430001','2010-06-12','TR 100966',35.99),
('01002319116','W02242430002','2010-06-12','EL 100985',99.99),
('01002319116','P00532470001','2010-05-04','HO 100482',49.99);
Using the query below, I am trying to get the number of distinct customers by ord_submitted_date:
select
g.order_date as "Ordered",
count(distinct o.persistent_key_str) as "customers"
from
generate_series(
(select min(ord_submitted_date) from orders),
(select max(ord_submitted_date) from orders),
'1 day'
) g (order_date)
left join
orders o on o.ord_submitted_date between g.order_date - interval '364 days'
and g.order_date
WHERE extract(year from ord_submitted_date) <= 2009
group by 1
order by 1
This is the output I expected.
Ordered Customers
2009-12-03 1
2009-12-10 1
When I execute the query above I get incorrect results.
How can I make this right?
To get your expected output ("the number of distinct customers") for only the days with actual orders in 2009:
SELECT ord_submitted_date, count(DISTINCT persistent_key_str) AS customers
FROM orders
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
Formulate the WHERE conditions this way to make the query sargable (the bare column is compared, so an index on ord_submitted_date can be used) and to keep the input easy.
If you want one row per day (from the earliest entry up to the latest in orders), restricted to 2009:
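A sketch of the sargable filter on a subset of the sample rows, using SQLite through Python's sqlite3 module instead of PostgreSQL (assumption: dates are stored as ISO 'YYYY-MM-DD' text, which compares correctly as strings). The index name orders_date_idx is hypothetical. The second query wraps the column in strftime(); it returns the same rows but cannot use the plain index on ord_submitted_date.

```python
import sqlite3

# Subset of the question's sample rows (customer key and order date only).
ROWS = [
    ("01000954259", "2009-12-03"),
    ("01000827006", "2009-12-10"),
    ("01002319116", "2010-05-04"),
    ("01000827006", "2010-06-11"),
    ("01002319116", "2010-06-12"),
    ("01002319116", "2010-06-12"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (persistent_key_str TEXT, ord_submitted_date TEXT)")
conn.execute("CREATE INDEX orders_date_idx ON orders (ord_submitted_date)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

# Sargable: the bare column is compared against a half-open range [start, end),
# so the index on ord_submitted_date can drive the lookup.
SARGABLE = """
    SELECT ord_submitted_date, count(DISTINCT persistent_key_str) AS customers
    FROM orders
    WHERE ord_submitted_date >= ? AND ord_submitted_date < ?
    GROUP BY 1
    ORDER BY 1
"""

# Not sargable: the function call on the column hides it from the index.
NON_SARGABLE = """
    SELECT ord_submitted_date, count(DISTINCT persistent_key_str) AS customers
    FROM orders
    WHERE strftime('%Y', ord_submitted_date) = ?
    GROUP BY 1
    ORDER BY 1
"""

fast = conn.execute(SARGABLE, ("2009-01-01", "2010-01-01")).fetchall()
slow = conn.execute(NON_SARGABLE, ("2009",)).fetchall()
print(fast)
```

The half-open range also keeps the input easy: the year boundaries are passed as two plain date strings, and the same pattern works for any period.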
SELECT ord_submitted_date AS ordered
, count(DISTINCT o.persistent_key_str) AS customers
FROM (SELECT generate_series(min(ord_submitted_date) -- single query ...
, max(ord_submitted_date) -- ... to get min / max
, '1d')::date FROM orders) g (ord_submitted_date)
LEFT join orders o USING (ord_submitted_date)
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
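generate_series() is generally not available in the SQLite library that Python's sqlite3 module links against, so this sketch builds the calendar with a recursive CTE instead (a swapped-in technique, not the answer's), then LEFT JOINs the orders and filters to 2009. Assumptions: SQLite stands in for PostgreSQL, dates are ISO text, and only three sample rows are loaded. count(DISTINCT ...) over the NULLs produced by the LEFT JOIN yields 0 for days without orders.

```python
import sqlite3

ROWS = [
    ("01000954259", "2009-12-03"),
    ("01000827006", "2009-12-10"),
    ("01002319116", "2010-05-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (persistent_key_str TEXT, ord_submitted_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

DAILY = """
    WITH RECURSIVE
        bounds(lo, hi) AS (                  -- earliest and latest order date
            SELECT min(ord_submitted_date), max(ord_submitted_date) FROM orders
        ),
        g(ord_submitted_date) AS (           -- one row per calendar day
            SELECT lo FROM bounds
            UNION ALL
            SELECT date(g.ord_submitted_date, '+1 day')
            FROM g, bounds
            WHERE g.ord_submitted_date < bounds.hi
        )
    SELECT g.ord_submitted_date AS ordered,
           count(DISTINCT o.persistent_key_str) AS customers
    FROM g
    LEFT JOIN orders o ON o.ord_submitted_date = g.ord_submitted_date
    WHERE g.ord_submitted_date >= '2009-01-01'
      AND g.ord_submitted_date <  '2010-01-01'
    GROUP BY 1
    ORDER BY 1
"""

rows = conn.execute(DAILY).fetchall()
print(rows[:3])
```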
Distinct customers per year
SELECT extract(year from ord_submitted_date) AS year
, count(DISTINCT persistent_key_str) AS customers
FROM orders
GROUP BY 1
ORDER BY 1;
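The per-year query can be run on the full sample data as a quick check. In this sketch, SQLite (via Python's sqlite3 module) stands in for PostgreSQL, so CAST(strftime('%Y', ...) AS INTEGER) replaces extract(year from ...); only the customer key and order date are loaded, since the other columns do not affect a distinct-customer count.

```python
import sqlite3

# (persistent_key_str, ord_submitted_date) pairs from the question's sample data.
ROWS = [
    ("01120736182", "2010-12-08"), ("01120736182", "2011-11-23"),
    ("01120736182", "2011-11-23"), ("01120736182", "2011-11-23"),
    ("01120736182", "2012-12-05"), ("01120736182", "2012-12-05"),
    ("01011679971", "2012-12-05"), ("01120736182", "2012-12-05"),
    ("01120736182", "2012-12-05"), ("01120736182", "2013-07-27"),
    ("01120736182", "2014-06-05"), ("01011679971", "2014-08-16"),
    ("01000217334", "2012-12-20"), ("01000217334", "2013-01-08"),
    ("01000217334", "2012-12-12"), ("01000217334", "2012-12-20"),
    ("01000217334", "2013-12-09"), ("01000217334", "2013-12-09"),
    ("01000217334", "2012-12-12"), ("01000217334", "2013-11-07"),
    ("01000827006", "2010-06-11"), ("01000827006", "2009-12-10"),
    ("01000954259", "2009-12-03"), ("01002319116", "2010-06-12"),
    ("01002319116", "2010-06-12"), ("01002319116", "2010-05-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (persistent_key_str TEXT, ord_submitted_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

# A customer with several orders in one year is counted once for that year.
per_year = conn.execute("""
    SELECT CAST(strftime('%Y', ord_submitted_date) AS INTEGER) AS year,
           count(DISTINCT persistent_key_str) AS customers
    FROM orders
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(per_year)
```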

Grouping multiple selects within a SQL query

I have a table StockBreakdown with the columns Supplier, TotalStock and Date. I'm trying to write a single query that will give me stock totals by week / month / year for a list of suppliers.
So the results will look like this:
SUPPLIER WEEK MONTH YEAR
SupplierA 50 100 2000
SupplierB 60 150 2500
SupplierC 15 25 200
So far I've been playing around with multiple selects but I can't get any further than this:
SELECT Supplier,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-5-12'
GROUP BY Supplier
) AS StockThisWeek,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-5-1'
GROUP BY Supplier
) AS StockThisMonth,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-1-1'
GROUP BY Supplier
) AS StockThisYear
This query throws an error because each individual subquery returns multiple rows (one per supplier). I feel that I'm close to the solution but can't work out where to go from here.
You don't need subqueries to achieve what you want; conditional aggregation does it in a single pass:
SELECT Supplier
, SUM(CASE WHEN Date >= CAST('2014-05-12' as DATE) THEN TotalStock END) AS StockThisWeek
, SUM(CASE WHEN Date >= CAST('2014-05-01' as DATE) THEN TotalStock END) AS StockThisMonth
, SUM(CASE WHEN Date >= CAST('2014-01-01' as DATE) THEN TotalStock END) AS StockThisYear
FROM StockBreakdown
GROUP BY Supplier
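A runnable sketch of the conditional-aggregation query, using SQLite through Python's sqlite3 module in place of the original database. Assumptions: the stock rows are hypothetical, Date is quoted as an identifier, and plain ISO date strings replace CAST('...' AS DATE), because SQLite gives the type name DATE numeric affinity and the cast would not produce a date.

```python
import sqlite3

# Hypothetical stock rows; ISO date strings compare correctly as text.
SETUP = """
CREATE TABLE StockBreakdown (Supplier TEXT, TotalStock INTEGER, "Date" TEXT);
INSERT INTO StockBreakdown VALUES
    ('SupplierA', 10, '2014-05-13'),
    ('SupplierA', 20, '2014-05-05'),
    ('SupplierA', 30, '2014-02-01'),
    ('SupplierB',  5, '2014-05-02'),
    ('SupplierB', 99, '2013-12-31');
"""

# One pass over the table: each CASE keeps TotalStock only for rows inside
# its date window and yields NULL otherwise, which SUM ignores.
QUERY = """
SELECT Supplier
     , SUM(CASE WHEN "Date" >= '2014-05-12' THEN TotalStock END) AS StockThisWeek
     , SUM(CASE WHEN "Date" >= '2014-05-01' THEN TotalStock END) AS StockThisMonth
     , SUM(CASE WHEN "Date" >= '2014-01-01' THEN TotalStock END) AS StockThisYear
FROM StockBreakdown
GROUP BY Supplier
ORDER BY Supplier
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

SupplierB has no rows in the current week, so its StockThisWeek is NULL (None in Python); wrap each SUM in COALESCE(..., 0) if a zero is preferred.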
Each column subquery must return only a single value. You could try this (untested):
SELECT s.Supplier,
(
SELECT TOP 1 StockThisWeek FROM
(
SELECT Supplier, Sum(TotalStock) AS StockThisWeek
FROM StockBreakdown
WHERE Date >= '2014-5-12'
GROUP BY Supplier
) tmp1
WHERE tmp1.Supplier = s.Supplier
) AS StockThisWeek,
(
SELECT TOP 1 StockThisMonth FROM
(
SELECT Supplier, Sum(TotalStock) AS StockThisMonth
FROM StockBreakdown
WHERE Date >= '2014-5-1'
GROUP BY Supplier
) tmp2
WHERE tmp2.Supplier = s.Supplier
) AS StockThisMonth,
...
FROM (SELECT DISTINCT Supplier FROM StockBreakdown) s
This selects each supplier and then builds the StockThisWeek and StockThisMonth columns by taking the first entry from the grouped subquery you wrote, matched on the supplier. Because the GROUP BY yields only one entry per supplier, you don't lose any data.
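For comparison, a sketch of the correlated-subquery idea in SQLite (via Python's sqlite3 module, with hypothetical stock rows). Each scalar subquery is an ungrouped SUM, so it always returns exactly one value and no TOP 1 is needed. Both sides of the correlation are qualified (b.Supplier = s.Supplier): an unqualified Supplier inside a subquery resolves to the innermost table that has that column, so the condition would compare the inner column with itself.

```python
import sqlite3

# Hypothetical stock rows; "Date" is quoted and stored as ISO text.
SETUP = """
CREATE TABLE StockBreakdown (Supplier TEXT, TotalStock INTEGER, "Date" TEXT);
INSERT INTO StockBreakdown VALUES
    ('SupplierA', 10, '2014-05-13'),
    ('SupplierA', 20, '2014-05-05'),
    ('SupplierA', 30, '2014-02-01'),
    ('SupplierB',  5, '2014-05-02'),
    ('SupplierB', 99, '2013-12-31');
"""

# The outer query lists each supplier once; every column is a correlated
# scalar subquery that sums that supplier's stock since a given date.
QUERY = """
SELECT s.Supplier,
       (SELECT SUM(b.TotalStock) FROM StockBreakdown b
         WHERE b.Supplier = s.Supplier AND b."Date" >= '2014-05-12') AS StockThisWeek,
       (SELECT SUM(b.TotalStock) FROM StockBreakdown b
         WHERE b.Supplier = s.Supplier AND b."Date" >= '2014-05-01') AS StockThisMonth,
       (SELECT SUM(b.TotalStock) FROM StockBreakdown b
         WHERE b.Supplier = s.Supplier AND b."Date" >= '2014-01-01') AS StockThisYear
FROM (SELECT DISTINCT Supplier FROM StockBreakdown) s
ORDER BY s.Supplier
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
correlated = conn.execute(QUERY).fetchall()
print(correlated)
```

This scans the table once per column and supplier, so the single-pass conditional aggregation is usually the cheaper choice on large tables.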

TERADATA: Aggregate across multiple tables

Consider the following query, where aggregation happens across two tables, Sales and Promo, and the aggregate values are then used in a calculation.
SELECT
sales.article_id,
avg((sales.euro_value - ZEROIFNULL(promo.euro_value)) / NULLIFZERO(sales.qty - ZEROIFNULL(promo.qty)))
FROM
( SELECT
sales.article_id,
sum(sales.euro_value) AS euro_value,
sum(sales.qty) AS qty
from SALES_TABLE sales
where year >= 2011
group by article_id
) sales
LEFT OUTER JOIN
( SELECT
promo.article_id,
sum(promo.euro_value) AS euro_value,
sum(promo.qty) AS qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) promo
ON sales.article_id = promo.article_id
GROUP BY sales.article_id;
Some notes on the query:
Both inner queries return a huge number of rows because there are many articles. According to EXPLAIN on Teradata, the inner queries themselves take very little time, but the join takes a long time.
Assume a primary key on article_id is present and both tables are partitioned by year.
A LEFT OUTER JOIN is used because the second table contains optional data.
Can you suggest a better way of writing this query? Thanks for reading this far :)
Not really sure how the avg function got into the mix, so I'm removing it.
SELECT article_id,
(SUM(sales_value) - SUM(promo_value)) /
NULLIFZERO(SUM(sales_qty) - SUM(promo_qty))
FROM (
SELECT
article_id,
sum(euro_value) AS sales_value,
sum(qty) AS sales_qty,
0 AS promo_value,
0 AS promo_qty
from SALES_TABLE sales
where year >= 2011
group by article_id
UNION ALL
SELECT
article_id,
0 AS sales_value,
0 AS sales_qty,
sum(euro_value) AS promo_value,
sum(qty) AS promo_qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) AS comb
GROUP BY article_id;
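A sketch comparing the UNION ALL rewrite with the original LEFT JOIN shape on a few hypothetical rows. It runs on SQLite via Python's sqlite3 module rather than Teradata, so COALESCE() and NULLIF(..., 0) stand in for ZEROIFNULL() and NULLIFZERO(), and euro_value is stored as REAL so the division is not truncated to an integer.

```python
import sqlite3

# Hypothetical rows; the 2010 sale is excluded by the year filter.
SETUP = """
CREATE TABLE SALES_TABLE (article_id INTEGER, year INTEGER, euro_value REAL, qty INTEGER);
CREATE TABLE PROMOTION_TABLE (article_id INTEGER, year INTEGER, euro_value REAL, qty INTEGER);
INSERT INTO SALES_TABLE VALUES (1, 2011, 100.0, 10), (1, 2012, 50.0, 5), (1, 2010, 999.0, 1),
                               (2, 2012, 80.0, 4);
INSERT INTO PROMOTION_TABLE VALUES (1, 2011, 30.0, 5);
"""

# Stack both pre-aggregated inputs, padding the missing side with zeros,
# then aggregate once more: no join is needed.
UNION_ALL = """
SELECT article_id,
       (SUM(sales_value) - SUM(promo_value))
       / NULLIF(SUM(sales_qty) - SUM(promo_qty), 0) AS net_price
FROM (
    SELECT article_id, SUM(euro_value) AS sales_value, SUM(qty) AS sales_qty,
           0 AS promo_value, 0 AS promo_qty
    FROM SALES_TABLE WHERE year >= 2011 GROUP BY article_id
    UNION ALL
    SELECT article_id, 0, 0, SUM(euro_value), SUM(qty)
    FROM PROMOTION_TABLE WHERE year >= 2011 GROUP BY article_id
) AS comb
GROUP BY article_id
ORDER BY article_id
"""

# The question's shape: pre-aggregate both sides, then LEFT JOIN them.
LEFT_JOIN = """
SELECT s.article_id,
       (s.euro_value - COALESCE(p.euro_value, 0))
       / NULLIF(s.qty - COALESCE(p.qty, 0), 0) AS net_price
FROM (SELECT article_id, SUM(euro_value) AS euro_value, SUM(qty) AS qty
      FROM SALES_TABLE WHERE year >= 2011 GROUP BY article_id) s
LEFT JOIN (SELECT article_id, SUM(euro_value) AS euro_value, SUM(qty) AS qty
           FROM PROMOTION_TABLE WHERE year >= 2011 GROUP BY article_id) p
       ON s.article_id = p.article_id
ORDER BY s.article_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
via_union = conn.execute(UNION_ALL).fetchall()
via_join = conn.execute(LEFT_JOIN).fetchall()
print(via_union, via_join)
```

Both queries agree for articles that have sales. They can differ for an article that appears only in PROMOTION_TABLE: the UNION ALL version returns a row for it, while the LEFT JOIN version does not.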

In Oracle SQL, how do you query the proportion of records of a certain value?

Say, you have a query like
SELECT COUNT(*), date FROM ORDERS GROUP BY date ORDER BY date
but you also want a third "phantom/dummy field" that tells you the fraction of orders each day that are of a particular type (let's say "Utensils" and "Perishables").
I should say that there is an additional column in the ORDERS table that has the type of the order:
order_type
The third dummy column should take the count of orders on a date that have the "Utensils" or the "Perishables" type (not XOR), divide it by the total count of orders on that day, round to 2 decimal places, and append a percentage sign.
The last few formatting details aren't really important; all I really need to know is how to apply the logic in valid PL/SQL syntax.
Example output
4030 2012-02-02 34.43%
4953 2012-02-03 16.66%
You can do something like
SELECT COUNT(*),
dt,
round( SUM( CASE WHEN order_type = 'Utensils'
THEN 1
ELSE 0
END) * 100 / COUNT(*),2) fraction_of_utensils_orders
FROM ORDERS
GROUP BY dt
ORDER BY dt
If you find it easier to follow, you could also use COUNT, which ignores the NULLs produced by the CASE:
SELECT COUNT(*),
dt,
round( COUNT( CASE WHEN order_type = 'Utensils'
THEN 1
ELSE NULL
END) * 100/ COUNT(*), 2) fraction_of_utensils_orders
FROM ORDERS
GROUP BY dt
ORDER BY dt
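A sketch of the percentage column on hypothetical rows, using SQLite through Python's sqlite3 module instead of Oracle. It counts both "Utensils" and "Perishables" with an IN list, as the question asks; multiplies by 100.0 because SQLite, unlike Oracle, truncates integer division; and formats with SQLite's printf() in place of Oracle's ROUND() plus || '%'.

```python
import sqlite3

# Hypothetical orders on two days; only the types matter for the percentage.
SETUP = """
CREATE TABLE orders (order_id INTEGER, dt TEXT, order_type TEXT);
INSERT INTO orders VALUES
    (1, '2012-02-02', 'Utensils'), (2, '2012-02-02', 'Perishables'), (3, '2012-02-02', 'Toys'),
    (4, '2012-02-03', 'Toys'), (5, '2012-02-03', 'Toys'),
    (6, '2012-02-03', 'Utensils'), (7, '2012-02-03', 'Books');
"""

# COUNT(CASE ... THEN 1 END) counts matching rows (non-matching rows give NULL);
# printf rounds to 2 decimals and appends the percent sign.
QUERY = """
SELECT COUNT(*) AS order_count,
       dt,
       printf('%.2f%%',
              100.0 * COUNT(CASE WHEN order_type IN ('Utensils', 'Perishables') THEN 1 END)
              / COUNT(*)) AS pct_utensils_or_perishables
FROM orders
GROUP BY dt
ORDER BY dt
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
report = conn.execute(QUERY).fetchall()
for row in report:
    print(row)
```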
To add the number of orders of the same type to the query:
select
o.*,
(
select count(o2.order_type)
from ORDERS o2
where o2.order_type = o.order_type
) as NumberOfOrdersOfThisType
from ORDERS o
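The correlated count can be tried on SQLite through Python's sqlite3 module; the order_id column and the three rows are hypothetical, and order_type follows the column name given in the question.

```python
import sqlite3

# Hypothetical orders: two of type Utensils, one of type Toys.
SETUP = """
CREATE TABLE orders (order_id INTEGER, order_type TEXT);
INSERT INTO orders VALUES (1, 'Utensils'), (2, 'Toys'), (3, 'Utensils');
"""

# For every row, the scalar subquery counts all orders sharing its type.
QUERY = """
SELECT o.*,
       (SELECT count(o2.order_type)
          FROM orders o2
         WHERE o2.order_type = o.order_type) AS NumberOfOrdersOfThisType
FROM orders o
ORDER BY o.order_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
counts = conn.execute(QUERY).fetchall()
print(counts)
```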
To add the fraction of orders of the same type to the query:
The variable-based version is not valid PL/SQL as written (a bare SELECT without INTO cannot run inside a PL/SQL block), so compute the total count in a scalar subquery instead, which works as plain SQL:
select
o.*,
(
select count(o2.order_type)
from ORDERS o2
where o2.order_type = o.order_type
) / (select count(*) from ORDERS) as FractionOfOrdersOfThisType
from ORDERS o