Eliminating duplicate answers and mismatch in SQL
So the problem states that I need to find the transactions that happened each day. There is a mismatch between my answer and the correct answer, and I don't know why!
This is a short database description, "Recycling firm":
The firm owns several buy-back centers for the collection of recyclable materials. Each of them receives funds to be paid to the recyclables suppliers. Data on funds received are recorded in the table
Income_o(point, date, inc)
The primary key is (point, date), where the point holds the identifier of the buy-back center, and the date corresponds to the calendar date the funds were received. The date column doesn’t include the time part, thus, money (inc) arrives no more than once a day for each center. Information on payments to the recyclables suppliers is held in the table
Outcome_o(point, date, out)
In this table, the primary key (point, date) ensures each buy-back center reports about payments (out) no more than once a day, too.
For the case income and expenditure may occur more than once a day, another database schema with tables having a primary key consisting of the single column code is used:
Income(code, point, date, inc)
Outcome(code, point, date, out)
Here, the date column doesn’t include the time part, either.
The question is:
Under the assumption that receipts of money (inc) and payouts (out) can be registered any number of times a day for each collection point [i.e. the code column is the primary key], display a table with one corresponding row for each operating date of each collection point.
Result set: point, date, total payout per day (out), total money intake per day (inc).
Missing values are considered to be NULL.
SELECT Income.point, Income."date", SUM("out"), SUM(inc)
FROM Income left JOIN
Outcome ON Income.point = Outcome.point AND
Income."date" = Outcome."date"
GROUP BY Income.point, Income."date"
UNION
SELECT Outcome.point, Outcome."date", SUM("out"), SUM(inc)
FROM Outcome left JOIN
Income ON Income.point = Outcome.point AND
Income."date" = Outcome."date"
GROUP BY Outcome.point, Outcome."date";
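My guess is that you have a bit of a Cartesian join: CODE is the only unique key, so when a point has several income rows and several outcome rows on the same date, joining on (point, date) pairs every income row with every outcome row and the sums get inflated. Aggregating each table per point and date before joining avoids that. I think the following query should suit your needs: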
WITH calendar AS
(
SELECT TRUNC(SYSDATE)-(LEVEL-1) AS DT
FROM DUAL
CONNECT BY LEVEL < 30
)
SELECT d.pnt AS "POINT",
c.dt AS "DATE",
d.outcome_total,
d.income_total
FROM calendar c
LEFT JOIN (SELECT nvl(inc.pnt, outc.pnt) AS PNT,
nvl(inc.dt, outc.dt) AS DT,
outc.amt AS OUTCOME_TOTAL,
inc.amt AS INCOME_TOTAL
-- aggregate each table per point and day before joining
FROM (SELECT i.point AS pnt, i."date" AS dt, sum(i.inc) AS AMT
FROM income i
GROUP BY i.point, i."date") inc
FULL JOIN (SELECT o.point AS pnt, o."date" AS dt, sum(o."out") AS AMT
FROM outcome o
GROUP BY o.point, o."date") outc ON inc.pnt = outc.pnt AND inc.dt = outc.dt) d ON c.dt = d.dt;
I added the calendar table to account for the case where there was neither an income nor an outcome on a given day. However, if you don't need that, the query within the LEFT JOIN should be just fine.
N.B.: With the addition of the calendar WITH clause, this query will currently only show results from the last month(-ish). If you need longer, adjust the 30 day window.
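To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The sample rows are made up for illustration. Instead of a FULL JOIN it stacks the two tables with UNION ALL and aggregates once, which is one possible way to get a row per point and date without the fan-out; it is not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Income (code INTEGER PRIMARY KEY, point INTEGER, "date" TEXT, inc REAL);
    CREATE TABLE Outcome(code INTEGER PRIMARY KEY, point INTEGER, "date" TEXT, "out" REAL);
    -- Hypothetical data: point 1 has two receipts and two payouts on one day.
    INSERT INTO Income  VALUES (1, 1, '2001-03-22', 100), (2, 1, '2001-03-22', 50),
                               (3, 2, '2001-03-23', 70);
    INSERT INTO Outcome VALUES (1, 1, '2001-03-22', 30), (2, 1, '2001-03-22', 20),
                               (3, 3, '2001-03-24', 10);
""")

# Joining the raw rows on (point, date): 2 x 2 = 4 joined rows for point 1,
# so every amount is counted twice.
naive = conn.execute("""
    SELECT i.point, i."date", SUM(o."out"), SUM(i.inc)
    FROM Income i LEFT JOIN Outcome o
      ON i.point = o.point AND i."date" = o."date"
    GROUP BY i.point, i."date"
    ORDER BY i.point, i."date"
""").fetchall()

# Stack both tables with UNION ALL (padding the missing column with NULL),
# then aggregate once per point and date.
fixed = conn.execute("""
    SELECT point, "date", SUM("out") AS "out", SUM(inc) AS inc
    FROM (
        SELECT point, "date", NULL AS "out", inc FROM Income
        UNION ALL
        SELECT point, "date", "out", NULL AS inc FROM Outcome
    )
    GROUP BY point, "date"
    ORDER BY point, "date"
""").fetchall()

print(naive)  # point 1 shows out=100, inc=300 (inflated)
print(fixed)  # point 1 shows out=50, inc=150; missing sides stay NULL
```

SUM over only NULLs yields NULL, which matches the "missing values are considered to be NULL" requirement.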
My dataset is about sales; each row corresponds to an invoice. It is possible to have two records on the same day for the same customer if they bought twice that day.
As you can see in the image below, the blue square shows us that customer 355122 (id_cliente = customer_id) bought twice (275831N and 275826N invoice's id) in the same day (2020-12-19) (penult_data = second-last date). This query is meant to be a support table, to left join the main table and bring those results.
First of all, I created a row number over customer_id alone (blue arrow, aux), so that I could just join with aux = 2 (which should be the second-last record). But when the customer bought twice that day, the second-last invoice is not on the second-last date he bought. I need the second-last DATE. He can buy 1, 2, 3, 4, or 5 times a day, so I cannot assume a fixed aux number to filter on.
Then, for some reason, I also created an aux2, it's a row number over customer and date, but it really didn't help. I needed something that would repeat the index for the same date, so that index = 2 would be the second-last date.
I cannot use GROUP BY because I'm retrieving the salesman id (penult_vend), the store id (penult_empe), and so on from the second-last date.
This is the output of part of the query I'm using (as I said, the support table to left join the main table). I'm filtering to this customer's id.
Does anybody know a function or method to make this work?
I'm using Google BigQuery.
Thanks
Assuming the column penult_data has only date information without time of the day, you can find the second to last "date" and then the last "invoice" on that date by using the DENSE_RANK() and ROW_NUMBER() functions:
dense_rank() over(partition by id_cliente
order by penult_data desc) as rnd,
row_number() over(partition by id_cliente, penult_data
order by penult_nf desc) as rnf,
Then, you can use the filtering condition:
where rnd = 2 and rnf = 1
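Put together, the two window functions and the filter could look like the sketch below. It runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than on BigQuery, and the table name sales, the extra invoices, and the salesman ids are made up for illustration. Only the customer id, the date 2020-12-19, and the two invoice numbers come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical support table with the question's column names.
    CREATE TABLE sales (id_cliente INTEGER, penult_data TEXT,
                        penult_nf TEXT, penult_vend INTEGER);
    INSERT INTO sales VALUES
        (355122, '2020-12-19', '275831N', 7),
        (355122, '2020-12-19', '275826N', 7),
        (355122, '2020-12-10', '270001N', 3),
        (355122, '2020-12-10', '270002N', 4),
        (355122, '2020-11-30', '260000N', 5);
""")

rows = conn.execute("""
    SELECT id_cliente, penult_data, penult_nf, penult_vend
    FROM (
        SELECT s.*,
               -- same rank for every invoice on the same date
               DENSE_RANK() OVER (PARTITION BY id_cliente
                                  ORDER BY penult_data DESC) AS rnd,
               -- numbers the invoices within one date, latest invoice first
               ROW_NUMBER() OVER (PARTITION BY id_cliente, penult_data
                                  ORDER BY penult_nf DESC) AS rnf
        FROM sales s
    )
    WHERE rnd = 2 AND rnf = 1
""").fetchall()

print(rows)  # [(355122, '2020-12-10', '270002N', 4)]
```

Because DENSE_RANK repeats its value for ties, the two invoices on 2020-12-19 both get rnd = 1, so rnd = 2 is the second-last date no matter how many purchases happened on the last one.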
I have a column in my fact table that defines whether a Supplier is old or new based on the following case-statement:
CASE
WHEN (SUBSTRING([Reg date], 1, 6) = SUBSTRING([Invoice date], 1, 6))
THEN ('New supplier')
ELSE('Old supplier')
END as [Old/New supplier]
So for example, if a Supplier was registered in 201910 and the Invoice date was in 201910, then the Supplier would be considered a 'New supplier' that month. Now I want to calculate the number of Old/New suppliers for each month by doing a distinct count on Supplier no, which is not a problem. The last step is where it gets tricky: I want to count the number of New/Old suppliers over a 12-month period (if there has been a match on Invoice date and Reg date in any of the lagging 12 months). So I created the following MDX expression:
aggregate(parallelperiod([D Time].[Year-Month-Day].[Year],1,[D Time].[Year-Month-Day].currentmember).lead(1) : [D Time].[Year-Month-Day].currentmember ,[Measures].[Supplier No Distinct Count])
The issue I am facing is that it will count Supplier no "1234" twice since it has been both new and old during that time period. What I wish is that, if it finds one match it would be considered a "New" Supplier for that 12- month period.
This is how the result ends up looking, but I want it to be zero for "Old": since Reg date and Invoice date matched once during that 12-month period, the supplier should be considered new for the whole rolling 12 months on 201910.
Any help, possible approaches or ideas are highly appreciated.
Best regards,
Rubrix
Aggregate first at the supplier level and then at the type level:
select type, count(*)
from (select supplierid,
(case when min(substring(regdate, 1, 6)) = min(substring(invoicedate, 1, 6))
then 'new' else 'old'
end) as type
from t
group by supplierid
) s
group by type;
Note: I assume your date columns are in some obscure string format for your code to work. Otherwise, you should be using appropriate date functions.
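Here is a small runnable sketch of that two-level aggregation, using Python's sqlite3 module with SQLite's substr() in place of SUBSTRING. The table t, its column names, and the rows are assumptions for illustration, and the sketch does not restrict the data to a 12-month window; that filter would go in the inner query's WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table; dates are stored as 'YYYYMMDD' strings.
    CREATE TABLE t (supplierid TEXT, regdate TEXT, invoicedate TEXT);
    INSERT INTO t VALUES
        ('1234', '20191005', '20191012'),  -- invoiced in the registration month
        ('1234', '20191005', '20191120'),  -- invoiced again a month later
        ('5678', '20180101', '20191015');  -- registered long before invoicing
""")

counts = conn.execute("""
    SELECT type, COUNT(*)
    FROM (
        -- one row per supplier, so a supplier can only be 'new' or 'old'
        SELECT supplierid,
               CASE WHEN MIN(substr(regdate, 1, 6)) = MIN(substr(invoicedate, 1, 6))
                    THEN 'new' ELSE 'old'
               END AS type
        FROM t
        GROUP BY supplierid
    ) s
    GROUP BY type
    ORDER BY type
""").fetchall()

print(counts)  # [('new', 1), ('old', 1)]
```

Supplier 1234 appears in two rows but is counted once, as new, because the classification happens after collapsing to one row per supplier.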
SELECT COUNT(*) OVER () AS TotalCount
FROM Facts
WHERE Regdate BETWEEN olddate AND newdate OR InvoiceDate BETWEEN olddate AND newdate
GROUP BY
Supplier
The above query keeps the rows that fall within that time period and then groups them by supplier. COUNT(*) OVER () then counts the groups, so each supplier is counted only once.
You might want to change the WHERE clause, because I didn't quite understand how you are getting the 12-month period. Generally, if your WHERE clause returns the suppliers within that time period (they don't have to be unique), then the GROUP BY and the count will handle the rest.
I'm trying to wrap my head around a problem with making a query for a statistical overview of a system.
The table I want to pull data from is called 'Event', and holds the following columns (among others, only the necessary is posted):
date (as timestamp)
positionId (as number)
eventType (as string)
Another table that is most likely necessary is 'Location', which, among others, holds the following columns:
id (as number)
clinic (as boolean)
What I want is a sum of events in different conditions, grouped by days. The user can give an input over the range of days wanted, which means the output should only show a line per day inside the given limits. The columns should be the following:
date: a date, grouping the data by days
deliverySum: A sum of entries for the given day, where eventType is 'sightingDelivered', and the Location with id=positionId has clinic=true
pickupSum: Same as deliverySum, but eventType is 'sightingPickup'
rejectedSum: A sum over events for the day, where the positionId is 4000
acceptedSum: Same as rejectedSum, but positionId is 3000
So, one line should show the sums for the given day over the different criteria.
I'm fairly well read in SQL, but my experience is quite limited, which led me to ask here.
Any help would be appreciated
SQL Server has no boolean type, and its timestamp type does not hold a date and time, so I'll answer this for MySQL.
select date(e.date) as date,
sum( e.eventType = 'sightingDelivered' and l.clinic ) as deliverySum,
sum( e.eventType = 'sightingPickup' and l.clinic ) as pickupSum,
sum( e.positionId = 4000 ) as rejectedSum,
sum( e.positionId = 3000 ) as acceptedSum
from Event e left join
Location l
on e.positionId = l.id
where e.date >= $date1 and e.date < $date2 + interval 1 day
group by date(e.date);
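The same conditional aggregation can also be written with CASE expressions, which does not rely on MySQL treating a boolean as 0 or 1. Below is a minimal sketch of that variant, run on SQLite through Python's sqlite3 module. The sample rows and the chosen date range are made up, and timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (id INTEGER PRIMARY KEY, clinic INTEGER);
    CREATE TABLE Event ("date" TEXT, positionId INTEGER, eventType TEXT);
    -- Hypothetical data: location 10 is a clinic, location 11 is not.
    INSERT INTO Location VALUES (10, 1), (11, 0);
    INSERT INTO Event VALUES
        ('2021-05-01 08:00:00', 10,   'sightingDelivered'),
        ('2021-05-01 09:00:00', 10,   'sightingPickup'),
        ('2021-05-01 10:00:00', 11,   'sightingDelivered'),
        ('2021-05-01 11:00:00', 4000, 'other'),
        ('2021-05-02 08:00:00', 3000, 'other'),
        ('2021-05-03 08:00:00', 3000, 'other');  -- outside the requested range
""")

first_day, last_day = '2021-05-01', '2021-05-02'  # user-supplied range

rows = conn.execute("""
    SELECT date(e."date") AS day,
           SUM(CASE WHEN e.eventType = 'sightingDelivered' AND l.clinic = 1
                    THEN 1 ELSE 0 END) AS deliverySum,
           SUM(CASE WHEN e.eventType = 'sightingPickup' AND l.clinic = 1
                    THEN 1 ELSE 0 END) AS pickupSum,
           SUM(CASE WHEN e.positionId = 4000 THEN 1 ELSE 0 END) AS rejectedSum,
           SUM(CASE WHEN e.positionId = 3000 THEN 1 ELSE 0 END) AS acceptedSum
    FROM Event e
    LEFT JOIN Location l ON e.positionId = l.id
    -- half-open range, so the whole last day is included
    WHERE e."date" >= ? AND e."date" < date(?, '+1 day')
    GROUP BY date(e."date")
    ORDER BY day
""", (first_day, last_day)).fetchall()

print(rows)  # [('2021-05-01', 1, 1, 1, 0), ('2021-05-02', 0, 0, 0, 1)]
```

With ELSE 0, a day that has events but no matches shows 0 rather than NULL. A day with no events at all still produces no row; that would need a calendar table.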
I have 2 tables with no relation between them. I want to display the data in tabular format by month. Here is a sample output:
There are 2 different tables
1 for income
1 for expense
Problem is that we have no direct relation between these. The only commonality between them is month (date). Does anyone have a suggestion on how to generate such a report?
here is my union queries:
SELECT TO_DATE(TO_CHAR(PAY_DATE,'MON-YYYY'), 'MON-YYYY') , 'FEE RECEIPT', NVL(SUM(SFP.AMOUNT_PAID),0) AMT_RECIEVED
FROM STU_FEE_PAYMENT SFP, STU_CLASS SC, CLASS C
WHERE SC.CLASS_ID = C.CLASS_ID
AND SFP.STUDENT_NO = SC.STUDENT_NO
AND PAY_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
AND SFP.AMOUNT_PAID >0
GROUP BY TO_CHAR(PAY_DATE,'MON-YYYY')
UNION
SELECT TO_DATE(TO_CHAR(EXP_DATE,'MON-YYYY'), 'MON-YYYY') , ET.DESCRIPTION, SUM(EXP_AMOUNT)
FROM EXP_DETAIL ED, EXP_TYPE ET, EXP_TYPE_DETAIL ETD
WHERE ET.EXP_ID = ETD.EXP_ID
AND ED.EXP_ID = ET.EXP_ID
AND ED.EXP_DETAIL_ID = ETD.EXP_DETAIL_ID
AND EXP_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
GROUP BY TO_CHAR(EXP_DATE,'MON-YYYY'), ET.DESCRIPTION
ORDER BY 1
Regards:
In order to do this you probably want to make the Income and Expenses into separate sub-queries.
I have taken the two parts of your union query and separated them into sub-queries, one called Income and one called Expenses. Both sub-queries summarise the data per month as before, but now you can use a JOIN on the month to connect the data from each sub-query. Note: I have used a full OUTER JOIN, because this will still return a month where there is no income but there is expense, and vice versa. This will require some manipulation, because you are probably better off returning a set of zeros for the month if no transactions occur.
In the top level SELECT, replace the use of *, with the correct listing of fields required. I simply used this to show that each field can be reused from the sub-query in the outer query, by referring to the alias as the table name.
SELECT Income.*, Expenses.*
FROM (SELECT TO_DATE(TO_CHAR(PAY_DATE,'MON-YYYY'), 'MON-YYYY') as Month, 'FEE RECEIPT', NVL(SUM(SFP.AMOUNT_PAID),0) AMT_RECIEVED
FROM STU_FEE_PAYMENT SFP, STU_CLASS SC, CLASS C
WHERE SC.CLASS_ID = C.CLASS_ID
AND SFP.STUDENT_NO = SC.STUDENT_NO
AND PAY_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
AND SFP.AMOUNT_PAID >0
GROUP BY TO_CHAR(PAY_DATE,'MON-YYYY')) Income
FULL OUTER JOIN (SELECT TO_DATE(TO_CHAR(EXP_DATE,'MON-YYYY'), 'MON-YYYY') as Month, ET.DESCRIPTION, SUM(EXP_AMOUNT)
FROM EXP_DETAIL ED, EXP_TYPE ET, EXP_TYPE_DETAIL ETD
WHERE ET.EXP_ID = ETD.EXP_ID
AND ED.EXP_ID = ET.EXP_ID
AND ED.EXP_DETAIL_ID = ETD.EXP_DETAIL_ID
AND EXP_DATE BETWEEN '01-JAN-2014' AND '31-DEC-2014'
GROUP BY TO_CHAR(EXP_DATE,'MON-YYYY'), ET.DESCRIPTION) Expenses
ON Income.Month = Expenses.Month
There are still many calculations that you will have to insert, to get your final result, which you will have to work on separately. The resulting query to perform what you expect above will likely be a lot longer than this, I am just trying to show you the structure.
However, the final tricky part for you is going to be the BBF (Balance Brought Forward). SQL is great at joining tables and columns, but each row is treated and handled separately; short of analytic (window) functions, a query does not read a value from the previous row and let you manipulate that value in the next row. To do this you need another sub-query to SUM() all the changes from a point in time up until the start of the month. Financial products normally store the balance at points in time, because it is possible that not all transactions are accurately recorded and there needs to be a mechanism to adjust the balance. Using this theory, you need to write your sub-query to summarise all changes since the previous balance.
IMO Financial applications are inherently complex, so the solution is going to take some time to mould into the right one.
Final Word: I am not familiar with Oracle Reports, but there may be something in there which will assist with maintaining the BBF.
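As a rough illustration of the BBF sub-query idea, here is a sketch on SQLite via Python's sqlite3 module. The table monthly, its columns, and the figures are hypothetical, and it assumes the opening balance before the first month is zero; a real system would start from the last stored balance instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical per-month totals; 'YYYY-MM' keys sort chronologically.
    CREATE TABLE monthly (month TEXT PRIMARY KEY, income INTEGER, expense INTEGER);
    INSERT INTO monthly VALUES
        ('2014-01', 9000,  9000),
        ('2014-02', 6000, 18000),
        ('2014-03', 5000,  1000);
""")

rows = conn.execute("""
    SELECT m.month,
           -- balance brought forward: net change of all earlier months
           COALESCE((SELECT SUM(p.income - p.expense)
                     FROM monthly p
                     WHERE p.month < m.month), 0) AS bbf,
           m.income,
           m.expense,
           -- closing balance: net change up to and including this month
           (SELECT SUM(p.income - p.expense)
            FROM monthly p
            WHERE p.month <= m.month) AS closing
    FROM monthly m
    ORDER BY m.month
""").fetchall()

for row in rows:
    print(row)
# ('2014-01', 0, 9000, 9000, 0)
# ('2014-02', 0, 6000, 18000, -12000)
# ('2014-03', -12000, 5000, 1000, -8000)
```

Each month's bbf equals the previous month's closing figure, which is the relationship the report needs.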
sqlite> create table Income(Month text, total_income real);
sqlite> create table Expense(Month text, total_expense real);
sqlite> insert into Income values('Jan 2014', 9000);
sqlite> insert into Income values('Feb 2014', 6000);
sqlite> insert into Expense values('Jan 2014', 9000);
sqlite> insert into Expense values('Feb 2014', 18000);
sqlite> select Income.Month, Income.total_income, Expense.total_expense, Income.total_income - Expense.total_expense as Balance from Income, Expense where Income.Month == Expense.Month;
Jan 2014|9000.0|9000.0|0.0
Feb 2014|6000.0|18000.0|-12000.0
The firm has a few outlets that receive items for recycling.
Each of the outlets receives funds to be paid to deliverers.
Information on received funds is registered in a table:
Income_o(point, date, inc)
The primary key is (point, date),
thus receipt of money (inc) takes place not more than once a day (date column does not include time component of the date).
Information on payments to deliverers is registered in the table:
Outcome_o(point, date, out)
In this table the primary key (point, date)
also ensures bookkeeping of the funds distribution at each point not more than once a day.
In case incomes and expenses may occur more than once a day, another database schema is used.
Corresponding tables include code column as primary key:
Income(code, point, date, inc)
Outcome(code, point, date, out)
In this schema the date column does not include the time of day either.
I'm trying the following query, but it throws an error :(
select point, date, sum(inc) as Income,
sum(out) as Outcome from (
select point, date, inc from Income_o
union
select point, date, out from Outcome_o
) result group by point,date
You don't want UNION, for a couple of reasons - one is that it uses the column names from the first query that forms part of the UNION - you're no longer able to distinguish between income and outcome. And secondly, it eliminates duplicates - so if you had a day when the same amount came in and went out, that would produce only a single row for the rest of your query to work with.
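Both effects are easy to reproduce. The sketch below uses Python's sqlite3 module with one made-up day on which the same amount came in and went out. The last query shows one possible way to keep the two amounts apart if you do want to stack the tables: UNION ALL with a NULL placeholder for the other column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Income_o (point INTEGER, "date" TEXT, inc INTEGER,
                           PRIMARY KEY (point, "date"));
    CREATE TABLE Outcome_o(point INTEGER, "date" TEXT, "out" INTEGER,
                           PRIMARY KEY (point, "date"));
    -- Hypothetical day where 100 came in and 100 went out.
    INSERT INTO Income_o  VALUES (1, '2001-03-22', 100);
    INSERT INTO Outcome_o VALUES (1, '2001-03-22', 100);
""")

# UNION: column names come from the first SELECT, and duplicates collapse.
cur = conn.execute("""
    SELECT point, "date", inc FROM Income_o
    UNION
    SELECT point, "date", "out" FROM Outcome_o
""")
union_cols = [d[0] for d in cur.description]
union_rows = cur.fetchall()

# UNION ALL keeps both rows, but the third column is still just called inc.
union_all_rows = conn.execute("""
    SELECT point, "date", inc FROM Income_o
    UNION ALL
    SELECT point, "date", "out" FROM Outcome_o
""").fetchall()

# Giving each amount its own column keeps income and outcome distinguishable.
totals = conn.execute("""
    SELECT point, "date", SUM(inc) AS Income, SUM("out") AS Outcome
    FROM (
        SELECT point, "date", inc, NULL AS "out" FROM Income_o
        UNION ALL
        SELECT point, "date", NULL, "out" FROM Outcome_o
    )
    GROUP BY point, "date"
""").fetchall()

print(union_cols, len(union_rows), len(union_all_rows))
print(totals)  # [(1, '2001-03-22', 100, 100)]
```

The derived table in the question has no column named out at all, which is why sum(out) raises an error there.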
What you want instead is a FULL OUTER JOIN:
select COALESCE(i.point,o.point) as point, COALESCE(i.date,o.date) as date,
inc as Income, out as Outcome
from Income_o i
full outer join
Outcome_o o
on
i.point = o.point and
i.date = o.date
For any particular point and date combination that exists in either table, this will produce an output row.
You can also use CASE as follows:
SELECT
CASE
WHEN Income_o.point IS NULL
THEN Outcome_o.point
ELSE Income_o.point
END,
CASE
WHEN Income_o.date IS NULL
THEN Outcome_o.date
ELSE Income_o.date
END,
inc, out
FROM
Income_o FULL JOIN Outcome_o
ON Income_o.point = Outcome_o.point
AND Income_o.date = Outcome_o.date