Query two tables joined over a third table with the same foreign key? - sql

I have a Postgres database called groceries.
It has two tables, purchases_2019 and purchases_2020, which are connected through a third table, categories.
I can join each table to categories on its own without any problem.
To calculate the year change, I need 2019 and 2020 together.
The problem seems to be that categories has only one foreign key for both tables, so the query always returns a column of zeros because one of the tables has no match. Maybe I am wrong.
Any suggestions on how to query the tables?
More info below.
The groceries database has a subset dairies: 'whole milk','yogurt', 'domestic eggs'.
There are no clear primary keys.
I share the database file with this link:
https://drive.google.com/drive/folders/1BBXr-il7rmDkHAukETUle_ZYcDC7t44v?usp=sharing
I want to answer:
For each month of 2020, what was the percentage increase or decrease in total monthly dairy purchases compared to the same month in 2019 (i.e., the year_change)?
How can I do this?
I have tried different queries along this line:
SELECT
a.month,
COUNT(a.purchaseid) as sales_2020,
COUNT(b.purchase_id) as sales_2019,
ROUND(((CAST(COUNT(purchaseid) as decimal) /
(SELECT COUNT(purchaseid)FROM purchases_2020)) *100),2)
as market_share,
(COUNT(a.purchaseid) - COUNT(b.purchase_id) ) as year_change
FROM purchases_2020 as a
Left Outer Join categories as cat ON a.purchaseid = cat.purchase_id
Left Outer Join purchases_2019 as b ON cat.purchase_id = b.purchase_id
WHERE cat.category in ('whole milk','yogurt', 'domestic eggs')
GROUP BY a.month
ORDER BY a.month
;
It gives me either no result or the result above with an empty sales_2019 column.
The expected result is a table with the monthly dairy sales for 2020, the monthly market share of dairy among all products in 2020, and the monthly year change between 2019 and 2020 as a percentage.
How can I calculate the year change?
Thanks for your help.

%%sql
postgresql:///groceries
with p2019Sales as (
select
month,
count(p.purchase_id) as total_sales
from purchases_2019 p
left join categories c
using (purchase_id)
where c.category in ('whole milk', 'yogurt' ,'domestic eggs')
group by month
order by month
),
mkS as (
select
cast(extract(month from fulldate::date)as int) as month,
count(*) as total_share
from purchases_2020
group by month
order by month
),
p2020Sales as (
select
cast(extract(month from fulldate::date)as int) as month,
count(p.purchaseid) as total_sales,
round(count(p.purchaseid)*100::numeric/ m.total_share,2) as market_share,
sum(count(*)) over() as tos
from purchases_2020 p
left join categories c
on p.purchaseid = c.purchase_id
left join mks m
on cast(extract(month from p.fulldate::date)as int) = m.month
where c.category in ('whole milk', 'yogurt' ,'domestic eggs')
group by 1,m.total_share
order by 1,m.total_share
),
finalSale as (
select
month,
p2.total_sales,
p2.market_share,
round((p2.total_sales - p1.total_sales)*100::numeric/p1.total_sales,2) as year_change
from p2019Sales p1
inner join p2020Sales p2
using(month)
)
select *
from finalSale

The answer by user18262778 is excellent,
but as Jeremy Caney points out:
"add additional details that will help others understand how this addresses the question asked."
So here are some details.
My goal:
Get the output I want in one query.
My problem:
The query is long and complicated.
There are several approaches to the problem:
joins
subqueries
Both are prone to circular dependencies.
The subqueries and joins produce results, but discard data that is needed to get to the final result.
The solution:
The WITH statement lets you compute an aggregation once and reference it by name within the query.
Once you know that the WITH statement is the answer, there is of course a lot of information on the web. The description below summarises the general benefits of this approach.
"In PostgreSQL, the WITH query provides a way to write auxiliary statements for use in a larger query. It helps in breaking down complicated and large queries into simpler forms, which are easily readable. These statements often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query.
The WITH query being CTE query, is particularly useful when subquery is executed multiple times. It is equally helpful in place of temporary tables. It computes the aggregation once and allows us to reference it by its name (may be multiple times) in the queries.
The WITH clause must be defined before it is used in the query."
PostgreSQL - WITH Clause
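The idea of aggregating each year in its own CTE and only then joining on month can be sketched on a tiny, made-up dataset. This is a minimal sketch, not the asker's real data: it uses Python's built-in sqlite3 module instead of Postgres, and the simplified columns (purchase_id and month in both purchase tables) and sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical miniature of the groceries schema (names and rows are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (purchase_id INTEGER, category TEXT);
CREATE TABLE purchases_2019 (purchase_id INTEGER, month INTEGER);
CREATE TABLE purchases_2020 (purchase_id INTEGER, month INTEGER);
INSERT INTO categories VALUES
    (1, 'whole milk'), (2, 'yogurt'), (3, 'soda'), (4, 'domestic eggs');
INSERT INTO purchases_2019 VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO purchases_2020 VALUES (1, 1), (2, 1), (4, 1), (1, 2), (3, 2);
""")

# Each CTE reduces one year to one row per month; the final SELECT joins
# the two small result sets on month, not on purchase_id.
rows = conn.execute("""
WITH dairy_2019 AS (
    SELECT p.month, COUNT(*) AS sales
    FROM purchases_2019 AS p
    JOIN categories AS c ON c.purchase_id = p.purchase_id
    WHERE c.category IN ('whole milk', 'yogurt', 'domestic eggs')
    GROUP BY p.month
),
dairy_2020 AS (
    SELECT p.month, COUNT(*) AS sales
    FROM purchases_2020 AS p
    JOIN categories AS c ON c.purchase_id = p.purchase_id
    WHERE c.category IN ('whole milk', 'yogurt', 'domestic eggs')
    GROUP BY p.month
)
SELECT d20.month,
       d20.sales AS sales_2020,
       d19.sales AS sales_2019,
       ROUND((d20.sales - d19.sales) * 100.0 / d19.sales, 2) AS year_change
FROM dairy_2020 AS d20
JOIN dairy_2019 AS d19 ON d19.month = d20.month
ORDER BY d20.month
""").fetchall()

for row in rows:
    print(row)
```

Joining the two years through purchase_id, as in the original attempt, pairs individual purchases rather than months, which is why aggregating first and joining on month afterwards tends to be easier to reason about.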

Related

Group by Month and Year in SQL

I am trying to make a query, I must receive a date and give a report in which I must show the sum of the amounts at the end of a month.
What I have so far is this:
CREATE PROCEDURE consulta
@fecha DATE
AS
SELECT
SUM(dca.UNIDADES) as Amount,
MONTH(ca.FINICIO) as Month,
YEAR(ca.FINICIO)
FROM
DETALLE_CONTRATO_ALQUILER dca
INNER JOIN
CONTRATOALQUILER ca ON dca.CODCONTRATO = ca.CODCONTRATO
AND ca.FINICIO >= @fecha
AND YEAR(ca.FINICIO) = YEAR(@fecha)
GROUP BY
MONTH(ca.FINICIO), YEAR(ca.FINICIO)
HAVING
SUM(dca.UNIDADES) > 2;
The comparison of years is because I only have to obtain the months of that same year.
I also attach my diagram:
The context of the database is about product rentals, the tables I use are the rental contract and the detail
I know something is wrong because when I enter a specific date, I get no results. I do not know where I am going wrong. Is my query logically correct?
What I expect to obtain is:
Amount | Month | Year
12 1 2017
45 2 2017
...
Here's the example
I would assume all rows of both tables have matching row(s) in the other table, so an INNER JOIN is what you need.
There's a small detail in your query that smells fishy. Your join includes filtering conditions that may throw rows out of the query. Maybe you should place the filtering conditions in a WHERE clause instead of a JOIN clause, as in:
SELECT
SUM(dca.UNIDADES) as Amount,
MONTH(ca.FINICIO) as Month,
YEAR(ca.FINICIO)
FROM
DETALLE_CONTRATO_ALQUILER dca
INNER JOIN
CONTRATOALQUILER ca ON dca.CODCONTRATO = ca.CODCONTRATO
WHERE ca.FINICIO >= @fecha -- Using WHERE instead of JOIN here!
AND YEAR(ca.FINICIO) = YEAR(@fecha)
GROUP BY
MONTH(ca.FINICIO), YEAR(ca.FINICIO)
HAVING
SUM(dca.UNIDADES) > 2;
You can place filtering conditions in the JOIN clause, and that is very useful for OUTER JOINs. For INNER JOINs, however, the condition becomes part of the join itself and may filter out rows you wanted to include.
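To see where the placement of a filter actually changes the result, here is a small sketch using Python's built-in sqlite3 module. The contract and detail tables, their columns, and the sample rows are hypothetical stand-ins for the tables in the question. In this sketch the INNER JOIN returns the same rows whether the date filter sits in ON or in WHERE; the difference only shows up with a LEFT JOIN.

```python
import sqlite3

# Hypothetical stand-ins for the rental contract and detail tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract (id INTEGER, start_date TEXT);
CREATE TABLE detail (contract_id INTEGER, units INTEGER);
INSERT INTO contract VALUES (1, '2017-01-15'), (2, '2017-03-10');
INSERT INTO detail VALUES (1, 5), (2, 7);
""")

def run(sql):
    # :fecha plays the role of the procedure's date parameter.
    return conn.execute(sql, {"fecha": "2017-02-01"}).fetchall()

select = "SELECT d.contract_id, c.start_date FROM detail AS d "
order = " ORDER BY d.contract_id"

# Filter inside ON versus inside WHERE, for both join types.
inner_on = run(select + "JOIN contract AS c ON c.id = d.contract_id "
               "AND c.start_date >= :fecha" + order)
inner_where = run(select + "JOIN contract AS c ON c.id = d.contract_id "
                  "WHERE c.start_date >= :fecha" + order)
left_on = run(select + "LEFT JOIN contract AS c ON c.id = d.contract_id "
              "AND c.start_date >= :fecha" + order)
left_where = run(select + "LEFT JOIN contract AS c ON c.id = d.contract_id "
                 "WHERE c.start_date >= :fecha" + order)

print(inner_on, inner_where, left_on, left_where, sep="\n")
```

With the LEFT JOIN, the ON version keeps the detail row for contract 1 with a NULL start date, while the WHERE version drops it.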

20 Day moving average with joins alone

There are questions like this all over the place so let me specify where I specifically need help.
I have seen moving averages in SQL with Oracle Analytic functions, MSSQL apply, or a variety of other methods. I have also seen this done with self joins (one join for each day of the average, such as here How do you create a Moving Average Method in SQL? ).
I am curious whether there is a way (using only self joins) to do this in SQL (preferably Oracle, but since my question is geared towards joins alone, this should be possible in any RDBMS). The approach would have to be scalable (to a 20 or 100 day moving average, in contrast to the link above, which required one join for each day in the moving average).
My thoughts are
select customer, a.tradedate, a.shares, avg(b.shares)
from trades a, trades b
where b.tradedate between a.tradedate-20 and a.tradedate
group by customer, a.tradedate
But when I tried it in the past it didn't work. To be more specific, I am trying a smaller but similar example (a 5-day average instead of 20-day) with this fiddle demo and can't find out where I am going wrong. http://sqlfiddle.com/#!6/ed008/41
select a.ticker, a.dt_date, a.volume, avg(b.volume)
from yourtable a, yourtable b
where b.dt_date between a.dt_date-5 and a.dt_date
and a.ticker=b.ticker
group by a.ticker, a.dt_date, a.volume
I don't see anything wrong with your second query. I think the only reason it's not what you're expecting is that the volume field is an integer data type, so when you calculate the average the result is also an integer. For an average you have to cast it, because the result won't necessarily be a whole number:
select a.ticker, a.dt_date, a.volume, avg(cast(b.volume as float))
from yourtable a
join yourtable b
on a.ticker = b.ticker
where b.dt_date between a.dt_date - 5 and a.dt_date
group by a.ticker, a.dt_date, a.volume
Fiddle:
http://sqlfiddle.com/#!6/ed008/48/0 (thanks to @DaleM for DDL)
I don't know why you would ever do this vs. an analytic function though, especially since you mention wanting to do this in Oracle (which has analytic functions). It would be different if your preferred database were MySQL or a database without analytic functions.
Just to add to the answer, this is how you would achieve the same result in Oracle using analytic functions. Notice how the PARTITION BY acts as the join you're using on ticker. That splits up the results so that the same date shared across multiple tickers doesn't interfere.
select ticker,
dt_date,
volume,
avg(cast(volume as decimal)) over( partition by ticker
order by dt_date
rows between 5 preceding
and current row ) as mov_avg
from yourtable
order by ticker, dt_date, volume
Fiddle:
http://sqlfiddle.com/#!4/0d06b/4/0
Analytic functions will likely run much faster.
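The two approaches can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), and the trades table, its integer day column, and the sample volumes are assumptions made up for illustration. Because the sample has one row per ticker per consecutive day, the date-range self join and the row-based window cover the same rows; with gaps in the dates the two would differ.

```python
import sqlite3

# Hypothetical trades table: one row per ticker per consecutive day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ticker TEXT, day INTEGER, volume INTEGER)")
rows = [("ABC", d, v) for d, v in enumerate([10, 20, 30, 40, 50, 60, 70, 80], start=1)]
rows += [("XYZ", d, v) for d, v in enumerate([5, 5, 5, 5, 5, 5, 5, 5], start=1)]
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)

# Self join: a single join covers any window length, because the window
# is expressed as a range condition. day-5 .. day spans six days.
self_join = conn.execute("""
    SELECT a.ticker, a.day, AVG(b.volume) AS mov_avg
    FROM trades AS a
    JOIN trades AS b
      ON b.ticker = a.ticker
     AND b.day BETWEEN a.day - 5 AND a.day
    GROUP BY a.ticker, a.day
    ORDER BY a.ticker, a.day
""").fetchall()

# Window function: five preceding rows plus the current row, per ticker.
windowed = conn.execute("""
    SELECT ticker, day,
           AVG(volume) OVER (PARTITION BY ticker
                             ORDER BY day
                             ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS mov_avg
    FROM trades
    ORDER BY ticker, day
""").fetchall()

print(self_join == windowed)
```

Changing the window length only means changing the constant 5 in either query, which is what makes the range-based self join scale to 20 or 100 days.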
http://sqlfiddle.com/#!6/ed008/45 would appear to be what you need.
select a.ticker,
a.dt_date,
a.volume,
(select avg(cast(b.volume as float))
from yourtable b
where b.dt_date between a.dt_date-5 and a.dt_date
and a.ticker=b.ticker)
from yourtable a
order by a.ticker, a.dt_date
This uses a subquery rather than a join.

Two tables with no direct relationship

I have 2 tables with no relation between them. I want to display the data in tabular format by month. Here is a sample output:
There are 2 different tables
1 for income
1 for expense
The problem is that we have no direct relation between these. The only commonality between them is the month (date). Does anyone have a suggestion on how to generate such a report?
Here is my union query:
SELECT TO_DATE(TO_CHAR(PAY_DATE,'MON-YYYY'), 'MON-YYYY') , 'FEE RECEIPT', NVL(SUM(SFP.AMOUNT_PAID),0) AMT_RECIEVED
FROM STU_FEE_PAYMENT SFP, STU_CLASS SC, CLASS C
WHERE SC.CLASS_ID = C.CLASS_ID
AND SFP.STUDENT_NO = SC.STUDENT_NO
AND PAY_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
AND SFP.AMOUNT_PAID >0
GROUP BY TO_CHAR(PAY_DATE,'MON-YYYY')
UNION
SELECT TO_DATE(TO_CHAR(EXP_DATE,'MON-YYYY'), 'MON-YYYY') , ET.DESCRIPTION, SUM(EXP_AMOUNT)
FROM EXP_DETAIL ED, EXP_TYPE ET, EXP_TYPE_DETAIL ETD
WHERE ET.EXP_ID = ETD.EXP_ID
AND ED.EXP_ID = ET.EXP_ID
AND ED.EXP_DETAIL_ID = ETD.EXP_DETAIL_ID
AND EXP_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
GROUP BY TO_CHAR(EXP_DATE,'MON-YYYY'), ET.DESCRIPTION
ORDER BY 1
Regards:
In order to do this you probably want to make the Income and Expenses into separate sub-queries.
I have taken the two parts of your union query and separated them into sub-queries, one called Income and one called Expenses. Both sub-queries summarise the data by month as before, but now you can JOIN on the month to connect the data from each sub-query. Note: I have used an OUTER JOIN, because this still returns a month where there is no income but there is expense, and vice versa. This will require some manipulation, because you are probably better off returning zeros for a month in which no transactions occur.
In the top level SELECT, replace the use of *, with the correct listing of fields required. I simply used this to show that each field can be reused from the sub-query in the outer query, by referring to the alias as the table name.
SELECT Income.*, Expenses.*
FROM (SELECT TO_DATE(TO_CHAR(PAY_DATE,'MON-YYYY'), 'MON-YYYY') as Month, 'FEE RECEIPT', NVL(SUM(SFP.AMOUNT_PAID),0) AMT_RECIEVED
FROM STU_FEE_PAYMENT SFP, STU_CLASS SC, CLASS C
WHERE SC.CLASS_ID = C.CLASS_ID
AND SFP.STUDENT_NO = SC.STUDENT_NO
AND PAY_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
AND SFP.AMOUNT_PAID >0
GROUP BY TO_CHAR(PAY_DATE,'MON-YYYY')) Income
FULL OUTER JOIN (SELECT TO_DATE(TO_CHAR(EXP_DATE,'MON-YYYY'), 'MON-YYYY') as Month, ET.DESCRIPTION, SUM(EXP_AMOUNT)
FROM EXP_DETAIL ED, EXP_TYPE ET, EXP_TYPE_DETAIL ETD
WHERE ET.EXP_ID = ETD.EXP_ID
AND ED.EXP_ID = ET.EXP_ID
AND ED.EXP_DETAIL_ID = ETD.EXP_DETAIL_ID
AND EXP_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
GROUP BY TO_CHAR(EXP_DATE,'MON-YYYY'), ET.DESCRIPTION) Expenses
ON Income.Month = Expenses.Month
There are still many calculations that you will have to insert, to get your final result, which you will have to work on separately. The resulting query to perform what you expect above will likely be a lot longer than this, I am just trying to show you the structure.
However, the final tricky part for you is going to be the BBF (Balance Brought Forward). SQL is great at joining tables and columns, but each row is handled separately; a plain query does not read a value from the previous row and let you manipulate it in the next row. To do this you need another sub-query to SUM() all the changes from a point in time up until the start of the month. Financial products normally store the balance at points in time, because it is possible that not all transactions are accurately recorded and there needs to be a mechanism to adjust the balance. Following this approach, you need to write your sub-query to summarise all changes since the previous balance.
IMO Financial applications are inherently complex, so the solution is going to take some time to mould into the right one.
Final Word: I am not familiar with OracleReports, but there may be something in there which will assist with maintaining the BBF.
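The structure described above (two monthly sub-queries, joined on month, plus a sub-query that sums everything before the current month) can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module rather than Oracle, the income and expense tables and their rows are invented, and the full outer join is emulated with a UNION of months and two LEFT JOINs because older SQLite versions lack FULL OUTER JOIN.

```python
import sqlite3

# Hypothetical, simplified income and expense tables with ISO date strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE income (pay_date TEXT, amount REAL);
CREATE TABLE expense (exp_date TEXT, amount REAL);
INSERT INTO income VALUES ('2014-01-05', 9000), ('2014-02-10', 6000);
INSERT INTO expense VALUES ('2014-01-20', 4000), ('2014-03-03', 2500);
""")

report = conn.execute("""
WITH monthly_income AS (
    SELECT strftime('%Y-%m', pay_date) AS month, SUM(amount) AS total
    FROM income
    GROUP BY strftime('%Y-%m', pay_date)
),
monthly_expense AS (
    SELECT strftime('%Y-%m', exp_date) AS month, SUM(amount) AS total
    FROM expense
    GROUP BY strftime('%Y-%m', exp_date)
),
months AS (
    -- Every month that appears on either side.
    SELECT month FROM monthly_income
    UNION
    SELECT month FROM monthly_expense
),
net AS (
    -- Missing sides become 0 instead of NULL.
    SELECT m.month,
           COALESCE(i.total, 0) AS income,
           COALESCE(e.total, 0) AS expense
    FROM months AS m
    LEFT JOIN monthly_income AS i ON i.month = m.month
    LEFT JOIN monthly_expense AS e ON e.month = m.month
)
SELECT n.month, n.income, n.expense,
       -- Balance brought forward: all net changes before this month.
       (SELECT COALESCE(SUM(p.income - p.expense), 0)
        FROM net AS p
        WHERE p.month < n.month) AS brought_forward
FROM net AS n
ORDER BY n.month
""").fetchall()

for row in report:
    print(row)
```

The brought-forward column here starts from zero; in a real ledger it would start from the last stored balance, as the answer notes.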
sqlite> create table Income(Month text, total_income real);
sqlite> create table Expense(Month text, total_expense real);
sqlite> insert into Income values('Jan 2014', 9000);
sqlite> insert into Income values('Feb 2014', 6000);
sqlite> insert into Expense values('Jan 2014', 9000);
sqlite> insert into Expense values('Feb 2014', 18000);
sqlite> select Income.Month, Income.total_income, Expense.total_expense, Income.total_income - Expense.total_expense as Balance from Income, Expense where Income.Month == Expense.Month
Jan 2014|9000.0|9000.0|0.0
Feb 2014|6000.0|18000.0|-12000.0

Subtracting 2 values from a query and sub-query using CROSS JOIN in SQL

I have a question that I'm having trouble answering.
Find out what is the difference in number of invoices and total of invoiced products between May and June.
One way of doing it is to use sub-queries: one for June and the other one for May, and to subtract the results of the two queries. Since each of the two subqueries will return one row you can (should) use CROSS JOIN, which does not require the "on" clause since you join "all" the rows from one table (i.e. subquery) to all the rows from the other one.
To find the month of a certain date, you can use MONTH function.
Here is the Erwin document
This is what I got so far. I have no idea how to use CROSS JOIN in this situation
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 5
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 6
If you guys can help that would be great.
Thanks
The problem statement suggests you use the following steps:
Construct a query, with a single result row giving the values for June.
Construct a query, with a single result row giving the values for May.
Compare the results of the two queries.
The issue is that, in SQL, it's not super easy to do that third step. One way to do it is by doing a cross join, which yields a row containing all the values from both subqueries; it's then easy to use SELECT (b - a) ... to get the differences you're looking for. This isn't the only way to do the third step, but what you have definitely doesn't work.
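The three steps can be sketched with two single-row subqueries combined by a CROSS JOIN. The sketch below runs on Python's built-in sqlite3 module, so it uses strftime in place of the MONTH function; the invoice and invoice_line tables, their columns, and the sample rows are hypothetical, and "total of invoiced products" is interpreted here as the number of distinct products invoiced, which is an assumption.

```python
import sqlite3

# Hypothetical invoice tables with a handful of made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (invoice_id INTEGER, inv_date TEXT);
CREATE TABLE invoice_line (invoice_id INTEGER, product_id INTEGER);
INSERT INTO invoice VALUES
    (1, '2020-05-03'), (2, '2020-05-20'),
    (3, '2020-06-01'), (4, '2020-06-15'), (5, '2020-06-28');
INSERT INTO invoice_line VALUES
    (1, 10), (1, 11), (2, 10),
    (3, 10), (3, 12), (4, 13), (5, 10);
""")

# Each subquery yields exactly one row, so the CROSS JOIN yields one row
# holding both months' values, ready to be subtracted.
diff = conn.execute("""
SELECT june.invoices - may.invoices AS invoice_diff,
       june.products - may.products AS product_diff
FROM (SELECT COUNT(DISTINCT i.invoice_id) AS invoices,
             COUNT(DISTINCT l.product_id) AS products
      FROM invoice AS i
      JOIN invoice_line AS l ON l.invoice_id = i.invoice_id
      WHERE CAST(strftime('%m', i.inv_date) AS INTEGER) = 6) AS june
CROSS JOIN
     (SELECT COUNT(DISTINCT i.invoice_id) AS invoices,
             COUNT(DISTINCT l.product_id) AS products
      FROM invoice AS i
      JOIN invoice_line AS l ON l.invoice_id = i.invoice_id
      WHERE CAST(strftime('%m', i.inv_date) AS INTEGER) = 5) AS may
""").fetchone()

print(diff)
```

No ON clause is needed because an aggregate without GROUP BY always returns a single row, so there is only one possible pairing.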
Can't you do something with subqueries? I haven't tested this, but something like the below should give you 4 columns: invoices and products for May and June.
select june.june_invoices, june.june_products, may.may_invoices, may.may_products
from (
select 'stuff' a, count(*) as june_invoices, sum(products) as june_products from invoices
where month = 'june'
) june , (
select 'stuff' a, count(*) as may_invoices, sum(products) as may_products from invoices
where month = 'may'
) may
where june.a = may.a

SQL to calculate value of Shares at a particular time

I'm looking for a way that I can calculate what the value of shares are at a given time.
In the example I need to calculate and report on the redemptions of shares in a given month.
There are 3 tables that I need to look at:
Redemptions table that has the Date of the redemption, the number of shares that were redeemed and the type of share.
The share type table which has the share type and links the 1st and 3rd tables.
The Share price table which has the share type, valuation date, value.
So what I need is a report of the value of the redeemed shares, calculated from the number of shares redeemed and broken down by month.
Does that make sense?
Thanks in advance for your help!
Apologies, I think I should elaborate a little further as there might have been some misunderstandings. This isn't to calculate daily changing stocks and shares, it's more for fund management. What this means is that the share price only changes on a monthly basis and it's also normally a month behind.
The effect of this is that the query needs to look at the date of the redemption and work out its month and year. It then needs to look at the share price table: if there is a share price for the given date (this will need to be calculated, as the price is recorded for a single day, i.e. the price was x on day y), multiply the number of units by this value. However, if there isn't a share price for the given date, use the last price for that particular share type.
Hopefully this might be a little more clear but if there's any other information I can provide to make this easier then please let me know and I'll supply you with the information.
Regards,
Phil
This should do the trick (note: updated to group by ShareType):
SELECT
ST.ShareType,
RedemptionMonth = DateAdd(month, DateDiff(month, 0, R.RedemptionDate), 0),
TotalShareValueRedeemed = Sum(P.SharePrice * R.SharesRedeemed)
FROM
dbo.Redemption R
INNER JOIN dbo.ShareType ST
ON R.ShareTypeID = ST.ShareTypeID
CROSS APPLY (
SELECT TOP 1 P.*
FROM dbo.SharePrice P
WHERE
R.ShareTypeID = P.ShareTypeID
AND R.RedemptionDate >= P.SharePriceDate
ORDER BY P.SharePriceDate DESC
) P
GROUP BY
ShareType,
DateAdd(month, DateDiff(month, 0, R.RedemptionDate), 0)
ORDER BY
ShareType,
RedemptionMonth
;
See it working in a Sql Fiddle.
This can easily be parameterized by simply adding a WHERE clause with conditions on the Redemption table. If you need to show a 0 for share types in months where they had no Redemptions, please let me know and I'll improve my answer--it would help if you would fill out your use case scenario a little bit, and describe exactly what you want to input and what you want to see as output.
Also please note: I'm assuming here that there will always be a price for a share redemption--if a redemption exists that is before any share price for it, that redemption will be excluded.
If you have the valuations for every day, then the calculation is a simple join followed by an aggregation. The resulting query is something like:
select year(redemptiondate), month(redemptiondate),
sum(r.NumShares*sp.Price) as TotalPrice
from Redemptions r left outer join
ShareType st
on r.sharetype = st.sharetype left outer join
SharePrice sp
on st.sharename = sp.sharename and r.redemptiondate = sp.pricedate
group by year(redemptiondate), month(redemptiondate)
order by 1, 2;
If I understand your question, you need a query like
select shares.id, shares.name, sum (redemption.quant * shareprices.price)
from shares
inner join redemption on shares.id = redemption.share
inner join shareprices on shares.id = shareprices.share
where redemption.curdate between :p1 and :p2
group by shares.id, shares.name
order by shares.id
:p1 and :p2 are date parameters
If you just need it for one date range:
SELECT s.ShareType, SUM(ISNULL(sp.SharePrice, 0) * ISNULL(r.NumRedemptions, 0)) [RedemptionPrice]
FROM dbo.Shares s
LEFT JOIN dbo.Redemptions r
ON r.ShareType = s.ShareType
OUTER APPLY (
SELECT TOP 1 SharePrice
FROM dbo.SharePrice p
WHERE p.ShareType = s.ShareType
AND p.ValuationDate <= r.RedemptionDate
ORDER BY p.ValuationDate DESC) sp
WHERE r.RedemptionDate BETWEEN @Date1 AND @Date2
GROUP BY s.ShareType
Where @Date1 and @Date2 are your dates
The ISNULL checks are just there so it actually gives you a value if something is null (it'll be 0). It's completely optional in this case, just a personal preference.
The OUTER APPLY acts like a LEFT JOIN that filters the results from SharePrice down to the most recent ValuationDate on or before the RedemptionDate, even if that valuation falls outside the date range you are reporting on. It could probably be achieved another way, but I feel like this is easily readable.
If you don't feel comfortable with the OUTER APPLY, you could use a subquery in the SELECT part (i.e., ISNULL(r.NumRedemptions, 0) * (/* subquery from dbo.SharePrice here */))
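The subquery variant of the "latest price on or before the redemption date" lookup can be sketched like this. It is a minimal illustration using Python's built-in sqlite3 module, where LIMIT 1 takes the place of TOP 1; the share_price and redemption tables, their columns, and the sample rows are assumptions, and dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

# Hypothetical tables: month-end valuations and a few redemptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE share_price (share_type TEXT, valuation_date TEXT, price REAL);
CREATE TABLE redemption (share_type TEXT, redemption_date TEXT, shares INTEGER);
INSERT INTO share_price VALUES
    ('A', '2020-01-31', 10.0), ('A', '2020-02-29', 12.0), ('B', '2020-01-31', 5.0);
INSERT INTO redemption VALUES
    ('A', '2020-02-10', 100), ('A', '2020-03-05', 50), ('B', '2020-03-20', 10);
""")

rows = conn.execute("""
SELECT r.share_type,
       strftime('%Y-%m', r.redemption_date) AS month,
       -- Correlated subquery: most recent price at or before the redemption.
       SUM(r.shares * (SELECT p.price
                       FROM share_price AS p
                       WHERE p.share_type = r.share_type
                         AND p.valuation_date <= r.redemption_date
                       ORDER BY p.valuation_date DESC
                       LIMIT 1)) AS value_redeemed
FROM redemption AS r
GROUP BY r.share_type, strftime('%Y-%m', r.redemption_date)
ORDER BY r.share_type, month
""").fetchall()

for row in rows:
    print(row)
```

A redemption dated before any valuation would get a NULL price in this sketch and so contribute nothing to the sum, which mirrors the caveat in the first answer.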