Two tables with no direct relationship - sql

I have two tables with no relationship between them, and I want to display their data in tabular format by month. Here is a sample output:
There are two different tables:
one for income
one for expense
The problem is that there is no direct relationship between them. The only thing they have in common is the month (date). Does anyone have a suggestion on how to generate such a report?
Here are my union queries:
SELECT TO_DATE(TO_CHAR(PAY_DATE,'MON-YYYY'), 'MON-YYYY') , 'FEE RECEIPT', NVL(SUM(SFP.AMOUNT_PAID),0) AMT_RECIEVED
FROM STU_FEE_PAYMENT SFP, STU_CLASS SC, CLASS C
WHERE SC.CLASS_ID = C.CLASS_ID
AND SFP.STUDENT_NO = SC.STUDENT_NO
AND PAY_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
AND SFP.AMOUNT_PAID >0
GROUP BY TO_CHAR(PAY_DATE,'MON-YYYY')
UNION
SELECT TO_DATE(TO_CHAR(EXP_DATE,'MON-YYYY'), 'MON-YYYY') , ET.DESCRIPTION, SUM(EXP_AMOUNT)
FROM EXP_DETAIL ED, EXP_TYPE ET, EXP_TYPE_DETAIL ETD
WHERE ET.EXP_ID = ETD.EXP_ID
AND ED.EXP_ID = ET.EXP_ID
AND ED.EXP_DETAIL_ID = ETD.EXP_DETAIL_ID
AND EXP_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
GROUP BY TO_CHAR(EXP_DATE,'MON-YYYY'), ET.DESCRIPTION
ORDER BY 1

To do this, you probably want to turn the income and expense queries into separate sub-queries.
I have taken the two parts of your union query and separated them into sub-queries, one aliased Income and one aliased Expenses. Both sub-queries summarise the data by month as before, but now you can JOIN them on the month so that the data from each sub-query is connected. Note: I have used a full outer join, because it still returns a month that has expenses but no income, and vice versa. This will require some manipulation, because you are probably better off returning zeros for a month in which no transactions occur (for example, by wrapping the amounts in NVL).
In the top-level SELECT, replace the * with the list of fields you actually need. I used it only to show that each field from a sub-query can be reused in the outer query by using the sub-query's alias as the table name.
SELECT Income.*, Expenses.*
FROM (SELECT TO_DATE(TO_CHAR(PAY_DATE,'MON-YYYY'), 'MON-YYYY') as Month, 'FEE RECEIPT', NVL(SUM(SFP.AMOUNT_PAID),0) AMT_RECIEVED
FROM STU_FEE_PAYMENT SFP, STU_CLASS SC, CLASS C
WHERE SC.CLASS_ID = C.CLASS_ID
AND SFP.STUDENT_NO = SC.STUDENT_NO
AND PAY_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
AND SFP.AMOUNT_PAID >0
GROUP BY TO_CHAR(PAY_DATE,'MON-YYYY')) Income
FULL OUTER JOIN (SELECT TO_DATE(TO_CHAR(EXP_DATE,'MON-YYYY'), 'MON-YYYY') as Month, ET.DESCRIPTION, SUM(EXP_AMOUNT)
FROM EXP_DETAIL ED, EXP_TYPE ET, EXP_TYPE_DETAIL ETD
WHERE ET.EXP_ID = ETD.EXP_ID
AND ED.EXP_ID = ET.EXP_ID
AND ED.EXP_DETAIL_ID = ETD.EXP_DETAIL_ID
AND EXP_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
GROUP BY TO_CHAR(EXP_DATE,'MON-YYYY'), ET.DESCRIPTION) Expenses
ON Income.Month = Expenses.Month
There are still many calculations you will have to add to get your final result, and you will have to work on those separately. The final query will likely be a lot longer than this; I am just trying to show you the structure.
However, the final tricky part is going to be the BBF (Balance Brought Forward). SQL is great at joining tables and columns, but a plain query handles each row separately: it does not read a value from the previous row and let you manipulate that value in the next row. To do this you need another sub-query to SUM() all the changes from a point in time up until the start of the month. Financial products normally store the balance at points in time, because not all transactions may be accurately recorded and there needs to be a mechanism to adjust the balance. Following this approach, you need to write your sub-query to summarise all changes since the previous stored balance.
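As a rough illustration of the balance-brought-forward idea, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The monthly_totals table, its columns, and the March figures are hypothetical stand-ins for the monthly income and expense sub-queries; the correlated sub-query sums every net change before the current month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for the joined monthly sub-queries.
    CREATE TABLE monthly_totals (month TEXT PRIMARY KEY, income REAL, expense REAL);
    INSERT INTO monthly_totals VALUES
        ('2014-01', 9000, 9000),
        ('2014-02', 6000, 18000),
        ('2014-03', 7000, 2000);
""")

# For each month, the balance brought forward is the sum of
# (income - expense) over all earlier months; 0 when there are none.
bbf_rows = conn.execute("""
    SELECT m.month,
           (SELECT COALESCE(SUM(p.income - p.expense), 0)
              FROM monthly_totals p
             WHERE p.month < m.month) AS bbf,
           m.income - m.expense       AS net_change
      FROM monthly_totals m
     ORDER BY m.month
""").fetchall()

for row in bbf_rows:
    print(row)
```

If a stored balance exists at some point in time, the same sub-query could start summing from that point instead of from the beginning of the data.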
IMO Financial applications are inherently complex, so the solution is going to take some time to mould into the right one.
Final Word: I am not familiar with OracleReports, but there may be something in there which will assist with maintaining the BBF.

sqlite> create table Income(Month text, total_income real);
sqlite> create table Expense(Month text, total_expense real);
sqlite> insert into Income values('Jan 2014', 9000);
sqlite> insert into Income values('Feb 2014', 6000);
sqlite> insert into Expense values('Jan 2014', 9000);
sqlite> insert into Expense values('Feb 2014', 18000);
sqlite> select Income.Month, Income.total_income, Expense.total_expense, Income.total_income - Expense.total_expense as Balance from Income, Expense where Income.Month == Expense.Month;
Jan 2014|9000.0|9000.0|0.0
Feb 2014|6000.0|18000.0|-12000.0

Related

Query two tables joined over a third table with the same foreign key?

I have a Postgres database schema groceries.
There are two tables, purchases_2019 and purchases_2020, connected through a third one, categories.
I can join each table on its own with categories without problems.
To calculate the year change I need 2019 and 2020 together.
The problem seems to be that the third table, categories, has only one foreign key for both tables. Thus it returns a column of zeros every time, because there is no match for one of the tables. Maybe I am wrong.
Any suggestions on how to query the tables?
More info below.
The groceries database has a subset dairies: 'whole milk','yogurt', 'domestic eggs'.
There are no clear primary keys.
I share the database file with this link:
https://drive.google.com/drive/folders/1BBXr-il7rmDkHAukETUle_ZYcDC7t44v?usp=sharing
I want to answer:
For each month of 2020, what was the percentage increase or decrease in total monthly dairy purchases compared to the same month in 2019 (i.e., the year_change)?
How can I do this?
I have tried different queries along this line:
SELECT
a.month,
COUNT(a.purchaseid) as sales_2020,
COUNT(b.purchase_id) as sales_2019,
ROUND(((CAST(COUNT(purchaseid) as decimal) /
(SELECT COUNT(purchaseid)FROM purchases_2020)) *100),2)
as market_share,
(COUNT(a.purchaseid) - COUNT(b.purchase_id) ) as year_change
FROM purchases_2020 as a
Left Outer Join categories as cat ON a.purchaseid = cat.purchase_id
Left Outer Join purchases_2019 as b ON cat.purchase_id = b.purchase_id
WHERE cat.category in ('whole milk','yogurt', 'domestic eggs')
GROUP BY a.month
ORDER BY a.month
;
It gives me either no result or a result with an empty sales_2019 column.
The expected result is a table with the monthly dairy sales for 2020, the monthly market share of dairy among all products in 2020, and the monthly year change between 2019 and 2020 as a percentage.
How can I calculate the year change?
Thanks for your help.
%%sql
postgresql:///groceries
with p2019Sales as (
select
month,
count(p.purchase_id) as total_sales
from purchases_2019 p
left join categories c
using (purchase_id)
where c.category in ('whole milk', 'yogurt' ,'domestic eggs')
group by month
order by month
),
mkS as (
select
cast(extract(month from fulldate::date)as int) as month,
count(*) as total_share
from purchases_2020
group by month
order by month
),
p2020Sales as (
select
cast(extract(month from fulldate::date)as int) as month,
count(p.purchaseid) as total_sales,
round(count(p.purchaseid)*100::numeric/ m.total_share,2) as market_share,
sum(count(*)) over() as tos
from purchases_2020 p
left join categories c
on p.purchaseid = c.purchase_id
left join mks m
on cast(extract(month from p.fulldate::date)as int) = m.month
where c.category in ('whole milk', 'yogurt' ,'domestic eggs')
group by 1,m.total_share
order by 1,m.total_share
),
finalSale as (
select
month,
p2.total_sales,
p2.market_share,
round((p2.total_sales - p1.total_sales)*100::numeric/p1.total_sales,2) as year_change
from p2019Sales p1
inner join p2020Sales p2
using(month)
)
select *
from finalSale
The answer by user18262778 is excellent,
but as Jeremy Caney commented:
"add additional details that will help others understand how this addresses the question asked."
So here are some details.
My goal:
get the output I want in one query
My problem:
The query is long and complicated.
There are several approaches to the problem:
joins
subqueries
Both are prone to circular dependencies.
The subqueries and joins produce results, but they discard data needed to move further towards the final result.
The solution:
The WITH statement lets you compute an aggregation once and reference it by name within the query.
Once you know the WITH statement is what you need, there is of course a lot of information on the web. The description below summarises the general benefits of the given solution.
"In PostgreSQL, the WITH query provides a way to write auxiliary statements for use in a larger query. It helps in breaking down complicated and large queries into simpler forms, which are easily readable. These statements often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query.
The WITH query being CTE query, is particularly useful when subquery is executed multiple times. It is equally helpful in place of temporary tables. It computes the aggregation once and allows us to reference it by its name (may be multiple times) in the queries.
The WITH clause must be defined before it is used in the query."
PostgreSQL - WITH Clause
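To make the WITH idea concrete, here is a small sketch run through Python's sqlite3 module instead of PostgreSQL. The sales_2019 and sales_2020 tables and their rows are made up for illustration and are much simpler than the groceries schema; each CTE computes one year's monthly count, and the final SELECT refers to both by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for purchases_2019 / purchases_2020.
    CREATE TABLE sales_2019 (month INTEGER, purchase_id INTEGER);
    CREATE TABLE sales_2020 (month INTEGER, purchase_id INTEGER);
    INSERT INTO sales_2019 VALUES (1, 1), (1, 2), (2, 3), (2, 4);
    INSERT INTO sales_2020 VALUES (1, 10), (1, 11), (1, 12), (2, 13);
""")

# Each CTE aggregates one year; the outer query joins them by month
# and computes the percentage change relative to 2019.
yoy_rows = conn.execute("""
    WITH y2019 AS (SELECT month, COUNT(*) AS total FROM sales_2019 GROUP BY month),
         y2020 AS (SELECT month, COUNT(*) AS total FROM sales_2020 GROUP BY month)
    SELECT y2020.month,
           y2020.total AS sales_2020,
           ROUND((y2020.total - y2019.total) * 100.0 / y2019.total, 2) AS year_change
      FROM y2019
      JOIN y2020 ON y2020.month = y2019.month
     ORDER BY y2020.month
""").fetchall()

for row in yoy_rows:
    print(row)
```

Because each year is aggregated before the join, neither side can multiply the other's rows, which is the property the long query above relies on.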

Trying to figure out a solution to this question

Eliminating duplicate answers and mismatch in SQL
The problem states that I need to find the transactions that happened each day. There is a mismatch between my answer and the correct answer, and I don't know why.
This is a short description of the database, "Recycling firm":
The firm owns several buy-back centers for the collection of recyclable materials. Each of them receives funds to be paid to the recyclables suppliers. Data on funds received are recorded in the table
Income_o(point, date, inc)
The primary key is (point, date), where the point holds the identifier of the buy-back center, and the date corresponds to the calendar date the funds were received. The date column doesn’t include the time part, thus, money (inc) arrives no more than once a day for each center. Information on payments to the recyclables suppliers is held in the table
Outcome_o(point, date, out)
In this table, the primary key (point, date) ensures each buy-back center reports about payments (out) no more than once a day, too.
For the case income and expenditure may occur more than once a day, another database schema with tables having a primary key consisting of the single column code is used:
Income(code, point, date, inc)
Outcome(code, point, date, out)
Here, the date column doesn’t include the time part, either.
And the question is:
Under the assumption that receipts of money (inc) and payouts (out) can be registered any number of times a day for each collection point [i.e. the code column is the primary key], display a table with one corresponding row for each operating date of each collection point.
Result set: point, date, total payout per day (out), total money intake per day (inc).
Missing values are considered to be NULL.
SELECT Income.point, Income."date", SUM("out"), SUM(inc)
FROM Income left JOIN
Outcome ON Income.point = Outcome.point AND
Income."date" = Outcome."date"
GROUP BY Income.point, Income."date"
UNION
SELECT Outcome.point, Outcome."date", SUM("out"), SUM(inc)
FROM Outcome left JOIN
Income ON Income.point = Outcome.point AND
Income."date" = Outcome."date"
GROUP BY Outcome.point, Outcome."date";
My guess is that you have a bit of a Cartesian join by not including CODE as part of your join criteria. I think the following query should suit your needs:
WITH calendar AS
(
SELECT TRUNC(SYSDATE)-(LEVEL-1) AS DT
FROM DUAL
CONNECT BY LEVEL < 30
)
SELECT d.pnt AS "POINT",
c.dt AS "DATE",
d.outcome_total,
d.income_total
FROM calendar c
LEFT JOIN (SELECT nvl(inc.pnt, outc.pnt) AS PNT,
nvl(inc.dt, outc.dt) AS DT,
outc.amt AS OUTCOME_TOTAL,
inc.amt AS INCOME_TOTAL
FROM (SELECT i.pnt, i.dt, sum(i.inc) AS AMT
FROM income i
GROUP BY i.pnt, i.dt) inc
FULL JOIN (SELECT o.pnt, o.dt, sum(o.out) AS AMT
FROM outcome o
GROUP BY o.pnt, o.dt) outc ON inc.pnt = outc.pnt AND inc.dt = outc.dt) d ON c.dt = d.dt;
I added the calendar table to account for the case where there was neither an income nor an outcome on a given day. However, if you don't need that, the query within the LEFT JOIN should be just fine.
N.B.: With the addition of the calendar WITH clause, this query will currently only show results from the last month(-ish). If you need longer, adjust the 30 day window.
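The row multiplication suspected above can be reproduced with a small sketch in Python's sqlite3 module (not Oracle). The sample rows are invented: one collection point with two income rows and two outcome rows on the same day. Joining the raw tables pairs every income row with every outcome row, while aggregating each table first keeps the totals correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Income  (code INTEGER PRIMARY KEY, point INTEGER, "date" TEXT, inc REAL);
    CREATE TABLE Outcome (code INTEGER PRIMARY KEY, point INTEGER, "date" TEXT, "out" REAL);
    -- Invented sample data: two receipts and two payouts on the same day.
    INSERT INTO Income  VALUES (1, 1, '2001-03-22', 100), (2, 1, '2001-03-22', 200);
    INSERT INTO Outcome VALUES (1, 1, '2001-03-22', 50),  (2, 1, '2001-03-22', 70);
""")

# Joining the raw rows yields 2 x 2 = 4 rows, so both sums are doubled.
naive = conn.execute("""
    SELECT i.point, i."date", SUM(o."out"), SUM(i.inc)
      FROM Income i
      LEFT JOIN Outcome o ON o.point = i.point AND o."date" = i."date"
     GROUP BY i.point, i."date"
""").fetchall()

# Aggregating each table per (point, date) before joining avoids the fan-out.
pre_aggregated = conn.execute("""
    SELECT i.point, i."date", o.total_out, i.total_inc
      FROM (SELECT point, "date", SUM(inc) AS total_inc
              FROM Income GROUP BY point, "date") i
      LEFT JOIN (SELECT point, "date", SUM("out") AS total_out
                   FROM Outcome GROUP BY point, "date") o
        ON o.point = i.point AND o."date" = i."date"
""").fetchall()

print(naive)           # inflated totals
print(pre_aggregated)  # correct totals
```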

(inc) & expenses (out) at each outlet are written not more than once a day, get a result set with the fields: point, date, income, expense

The firm has a few outlets that receive items for recycling.
Each of the outlets receives funds to be paid to deliverers.
Information on received funds is registered in a table:
Income_o(point, date, inc)
The primary key is (point, date),
thus receipt of money (inc) takes place not more than once a day (the date column does not include the time component of the date).
Information on payments to deliverers is registered in the table:
Outcome_o(point, date, out)
In this table the primary key (point, date)
also ensures bookkeeping of the funds distribution at each point not more than once a day.
In case incomes and expenses may occur more than once a day, another database schema is used.
Corresponding tables include code column as primary key:
Income(code, point, date, inc)
Outcome(code, point, date, out)
In this schema the date column does not include the time of day either.
I am trying the following query, but it throws an error :(
select point, date, sum(inc) as Income,
sum(out) as Outcome from (
select point, date, inc from Income_o
union
select point, date, out from Outcome_o
) result group by point,date
You don't want UNION, for a couple of reasons. First, it uses the column names from the first query in the UNION, so you can no longer distinguish between income and outcome. Second, it eliminates duplicates, so a day on which the same amount came in and went out at the same point would produce only a single row for the rest of your query to work with.
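Both effects can be observed in a small sketch using Python's sqlite3 module, with invented sample rows in which the same amount comes in and goes out on the same day at the same point. UNION ALL is shown only for contrast; it keeps both rows but still cannot tell income from outcome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Income_o  (point INTEGER, "date" TEXT, inc REAL,   PRIMARY KEY (point, "date"));
    CREATE TABLE Outcome_o (point INTEGER, "date" TEXT, "out" REAL, PRIMARY KEY (point, "date"));
    -- Invented sample data: identical amounts in and out on one day.
    INSERT INTO Income_o  VALUES (1, '2001-03-22', 500);
    INSERT INTO Outcome_o VALUES (1, '2001-03-22', 500);
""")

cur = conn.execute("""
    SELECT point, "date", inc   FROM Income_o
    UNION
    SELECT point, "date", "out" FROM Outcome_o
""")
# The result columns are named after the first SELECT only.
column_names = [d[0] for d in cur.description]
union_rows = cur.fetchall()

union_all_rows = conn.execute("""
    SELECT point, "date", inc   FROM Income_o
    UNION ALL
    SELECT point, "date", "out" FROM Outcome_o
""").fetchall()

print(column_names[2])      # the amount column is called "inc" for both sources
print(len(union_rows))      # the two identical rows collapse into one
print(len(union_all_rows))  # UNION ALL keeps both
```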
What you want instead is a FULL OUTER JOIN:
select COALESCE(i.point,o.point) as point, COALESCE(i.date,o.date) as date,
inc as Income, out as Outcome
from Income_o i
full outer join
Outcome_o o
on
i.point = o.point and
i.date = o.date
For any particular point and date combination that exists in either table, this will produce an output row.
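To see which rows a full outer join produces, here is a sketch in Python's sqlite3 module with invented data. SQLite only gained FULL OUTER JOIN in version 3.39, and the build bundled with Python may be older, so this sketch emulates it with a LEFT JOIN plus a UNION ALL of the Outcome_o rows that have no Income_o match. That emulation is a substitute chosen for portability here, not the query from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Income_o  (point INTEGER, "date" TEXT, inc REAL,   PRIMARY KEY (point, "date"));
    CREATE TABLE Outcome_o (point INTEGER, "date" TEXT, "out" REAL, PRIMARY KEY (point, "date"));
    -- Invented sample data: one matching day, one income-only, one outcome-only.
    INSERT INTO Income_o  VALUES (1, '2001-03-22', 500), (1, '2001-03-23', 300);
    INSERT INTO Outcome_o VALUES (1, '2001-03-22', 500), (2, '2001-03-24', 100);
""")

full_join_rows = conn.execute("""
    -- Every Income_o row, with its Outcome_o match if there is one ...
    SELECT i.point, i."date", i.inc, o."out"
      FROM Income_o i
      LEFT JOIN Outcome_o o ON o.point = i.point AND o."date" = i."date"
    UNION ALL
    -- ... plus the Outcome_o rows that have no Income_o counterpart.
    SELECT o.point, o."date", NULL, o."out"
      FROM Outcome_o o
     WHERE NOT EXISTS (SELECT 1 FROM Income_o i
                        WHERE i.point = o.point AND i."date" = o."date")
    ORDER BY 1, 2
""").fetchall()

for row in full_join_rows:
    print(row)
```

Each (point, date) combination appears exactly once, with None (NULL) on the side that has no entry for that day.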
You can also use CASE as follows:
SELECT
CASE
WHEN Income_o.point IS NULL
THEN Outcome_o.point
ELSE Income_o.point
END,
CASE
WHEN Income_o.date IS NULL
THEN Outcome_o.date
ELSE Income_o.date
END,
inc, out
FROM
Income_o FULL JOIN Outcome_o
ON Income_o.point = Outcome_o.point
AND Income_o.date = Outcome_o.date

Create SQL view column from result set

I'm new to SQL and am attempting to revise a Create View script to add a new column from a select statement's result set. I've googled this quite a bit but haven't seen a good example.
Here's the select statement:
select lease_id, year(posting_date) as years1, SUM(amount) as Annual
from la_tbl_lease_projection
group by year(posting_date), lease_id
order by lease_id
The complicating factor is this. The Annual column in the result set is the Annual sum of expenses for a lease_id. However, in the view I'm adding the column to, expenses are listed monthly. So lease_id 100001 has 12 lines in 2010, 2011, etc. I want the view to have the new column show the Annual amount on each of the 12 monthly line items. The new Annual column should be to the right of the amount column and each line should contain the sum of the amount column for that year. e.g.:
Lease_id Posting_Date Amount Annual
100001 2010-01-01 $25 $300
100001 2010-02-01 $25 $300
etc...............
The view I'm adding to is a reasonably complex join and union from multiple tables. Instead of creating a new table for my result set, I'd like to access it using a stored procedure, unless there's a better option. MSDN says temp tables and table variables don't work in views so that's not an option.
I think this can be done by something like "when years1 = years1 AND lease_id = lease_id then [Annual] = resultset total", but I can't seem to visualize it. Thanks in advance for your input.
Since you were looking at MSDN, I'm assuming SQL Server for this answer.
To get a yearly column that is a per-lease, per-year sum of amounts, you can use SUM() OVER ():
SELECT *, SUM(Amount) OVER (PARTITION BY lease_id, YEAR(Posting_Date)) Yearly
FROM la_tbl_lease_projection;
An SQLfiddle to test with.
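For a quick local check without SQL Server, the same windowed sum can be sketched with Python's sqlite3 module. This assumes a SQLite build of 3.25 or later (the first with window functions), uses strftime('%Y', ...) in place of YEAR(), and partitions by lease_id as well as year, which the per-lease totals in the question appear to require. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE la_tbl_lease_projection (lease_id INTEGER, posting_date TEXT, amount REAL);
    -- Invented sample rows: two leases, two years.
    INSERT INTO la_tbl_lease_projection VALUES
        (100001, '2010-01-01', 25), (100001, '2010-02-01', 25),
        (100001, '2011-01-01', 30),
        (100002, '2010-01-01', 40);
""")

# Every monthly row keeps its own amount and also carries the
# total for its lease and year.
annual_rows = conn.execute("""
    SELECT lease_id, posting_date, amount,
           SUM(amount) OVER (
               PARTITION BY lease_id, strftime('%Y', posting_date)
           ) AS annual
      FROM la_tbl_lease_projection
     ORDER BY lease_id, posting_date
""").fetchall()

for row in annual_rows:
    print(row)
```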
I think a derived table would do the trick for you, something like:
select blah, blah2, blah3, ..., a.annual
from
<Long complicated set of joins>
join
(select lease_id, year(posting_date) as years1, SUM(amount) as Annual
from la_tbl_lease_projection
group by year(posting_date), lease_id) a
on sometable.lease_id = a.lease_id and year(sometable.posting_date) = a.years1
Where <complex where conditions>

Calculate Average after populating a temp table

I have been tasked with figuring out the average length of time that our customers stick with us. (Specifically from the date they become a customer, to when they placed their last order.)
I am not 100% sure that I am doing this properly, but my thought was to gather the date we enter the customer into the database, and then head over to the order table and grab their most recent order date, dump them into a temp table, and then figure out the length of time between those two dates, and then tally an average based on that number.
(I have to do some other wibbly wobbly time stuff as well, but this is the one that's kicking my butt.)
The end goal with this is to be able to say "On Average our customers stick with us for 4 years, and 3 months." (Or whatever the data shows it to be.)
SELECT * INTO #AvgTable
FROM(
SELECT DISTINCT (c.CustNumber) AS [CustomerNumber]
, COALESCE(convert( VARCHAR(10),c.OrgEnrollDate,101),'') AS [StartDate]
, COALESCE(CONVERT(VARCHAR(10),MAX(co.OrderDate),101),'')AS [EndDate]
,DATEDIFF(DD,c.OrgEnrollDate, co.OrderDate) as [LengthOfTime]
FROM dbo.Customer c
JOIN dbo.CustomerOrder co ON c.ID = co.CustomerID
WHERE c.Archived = 0
AND co.Archived =0
AND c.OrgEnrollDate IS NOT NULL
AND co.OrderDate IS NOT NULL
GROUP BY c.CustNumber
, co.OrderDate 2
)
--This is where I start falling apart
Select AVG[LengthofTime]
From #AvgTable
If I understand you correctly, then just try
SELECT AVG(DATEDIFF(dd, StartDate, EndDate)) AvgTime
FROM #AvgTable
My guess is that since you are storing the data in a temp table, that the integer result of the datediff is being implicitly converted back to a datetime (which you cannot do an average on).
Don't store the difference in your temp table (don't even have a temp table, but that is a whole different conversation). Just do the differencing in your select.
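A sketch of doing the differencing directly in the select, without a temp table, using Python's sqlite3 module. SQLite has no DATEDIFF, so julianday() subtraction stands in for it here; the tables are cut down to the columns needed and the rows are invented. The inner query finds each customer's last order, and the outer query averages the resulting day counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of dbo.Customer and dbo.CustomerOrder.
    CREATE TABLE Customer (ID INTEGER PRIMARY KEY, OrgEnrollDate TEXT);
    CREATE TABLE CustomerOrder (CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customer VALUES (1, '2010-01-01'), (2, '2012-06-01');
    INSERT INTO CustomerOrder VALUES
        (1, '2010-01-11'), (1, '2010-01-31'), (2, '2012-06-11');
""")

# One row per customer: days from enrollment to the most recent order.
# Grouping by customer only (not by order date) keeps MAX() meaningful.
avg_days = conn.execute("""
    SELECT AVG(t.days)
      FROM (SELECT julianday(MAX(co.OrderDate)) - julianday(c.OrgEnrollDate) AS days
              FROM Customer c
              JOIN CustomerOrder co ON co.CustomerID = c.ID
             GROUP BY c.ID, c.OrgEnrollDate) t
""").fetchone()[0]

print(avg_days)  # customer 1 stayed 30 days, customer 2 stayed 10 days
```

The average in days can then be converted to years and months in whatever layer presents the result.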