Where clause in a calculation - sql

Say I have this table:
month
num_of_fruits
harvested
2022-01-01
133
3
2022-02-01
145
12
2022-03-01
123
5
2022-04-01
111
4
2022-05-01
164
9
..
..
..
I want to be able to set a new column called lost based on the month and num_of_fruits columns. To set this lost column, requires a calculation. The calculation is harvested - (num_of_fruits - num_of_fruits(last_month))
I'm having trouble in the parenthesis part - getting the last month's num_of_fruits. I have this to start:
select
id,
"month",
num_of_fruits,
harvested,
harvested - (num_of_fruits - num_of_fruits WHERE date_trunc('month', "month" - interval '1' month)) as lost,
selecting other columns..
It's giving me an error in the where clause.
Can you have a where clause inside a select statement? How would I take the last month's num_of_fruits and subtract it with this month's num_of_fruits - all while inside the select statement?
Any help or advice will greatly help me! Thank you so much in advance!

If you want to check other rows in the table, you will likely want either a subquery in your SELECT or to join the table to itself.
I think you are probably trying to do:
SELECT
harvested - (num_of_fruits - (SELECT num_of_fruits FROM mytable t2 WHERE t2.month = date_trunc('month', t1."month" - interval '1' month))) as lost
FROM mytable t1
Note that I created a whole new subquery (SELECT/FROM/WHERE) within your existing SELECT statement, instead of just adding a stray WHERE clause.
I also changed your condition so that it actually has a compares the result of DATETRUNC with something.
It's not clear to me that you actually need the DATETRUNC here (and, if you do, you might want it on both sides of the comparison), but you can use the basic idea above and fix the condition to match your needs.
An alternative (joining to self) to consider might be:
SELECT
t1.harvested - (t1.num_of_fruits - t2.num_of_fruits)
FROM mytable t1 LEFT OUTER JOIN mytable t2
ON t2.month = date_trunc('month', t1."month" - interval '1' month)))
If you know that you always have one row per month, so the previous row (ordered by month) is also the previous month, you could just use LAG:
SELECT
harvested - (num_of_fruits - LAG(num_of_fruits, 1) OVER (ORDER BY month)
FROM mytable
LAG(num_of_fruits, 1) OVER (ORDER BY month) means "the num_of_fruits from the previous row in the table when the table is ordered by month".

Related

Duplicate records (rows) with modified values (Postgresql)

I have a dataset that I am preparing for a pivot view (in Excel). It's customer data and I want to create a view that allows to summarize how many customers were active every month. Therefore I need to duplicate rows with a modified value for first_status_date. This is my data:
This is where I am trying to get: (The green rows are duplicated, and the cells in bold are modified.
Because I am adding a month on top of the value of the cell above, I am thinking of working with the lag function. But I am not familiar with duplicating rows. The id 123 and 356 needed 2 duplicated rows because there is a 2 month difference between last and first status date, the id 221 needs only 1 duplicated row, as there is a one month difference.
You can use generate_series() to generate the new rows - a little tweaking is needed for the end date to ensure that we do get a row for the last month, regardless of the actual month day:
select
t.id,
t.first_status,
d.first_status_date,
t.last_status,
t.last_status_date
from mytable t
cross join lateral generate_series(
t.first_status_date,
date_trunc('month', t.last_status_date) + interval '1 month',
interval '1 month'
) d(first_status_date)

sum last n days quantity using sql window function

I am trying to create following logic in Alteryx and data is coming from Exasol database.
Column “Sum_Qty_28_days“ should sum up the values of “Qty ” column for same article which falls under last 28 days.
My sample data looks like:
and I want following output:
E.g. “Sum_Qty_28_days” value for “article” = ‘A’ and date = ‘’2019-10-8” is 8 because it is summing up the “Qty” values associated with dates (coming within previous 28 days) Which are:
2019-09-15
2019-10-05
2019-10-08
for “article” = ‘A’.
Is this possible using SQL window function?
I tried myself with following code:
SUM("Qty") OVER (PARTITION BY "article", date_trunc('month',"Date")
ORDER BY "Date")
But, it is far from what I need. It is summing up the Qty for dates falling in same month. However, I need to sum of Qty for last 28 days.
Thanks in advance.
Yes, this is possible using standard SQL and in many databases. However, this will not work in all databases:
select t.*,
sum(qty) over (partition by article
order by date
range between interval '27 day' preceding and current row
) as sum_qty_28_days
from t;
If your RDBMS does not support the range frame, an alternative solution is to use an inline subquery:
select
t.*,
(
select sum(t1.qty)
from mytable t1
where
t1.article = t.article
and t1.date between t.date - interval 28 days and t.date
) sum_qty_28_days
from mytable t

sql query to get today new records compared with yesterday

i have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new ID's from today, compared with yesterday (the ID's from today that are not present yesterday)
This needs to be done with just one query, maximum efficiency because the table will have 4-5 millions records
As a java developer i am able to do this with 2 queries, but with just one is beyond my knowledge so any help would be so much appreciated
EDIT: date format is dd/mm/yyyy and every day each ID may come 0 or 1 times
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function index. Without it, the query would have to full scan the table, which would probably be quite slow.
create idx on mytable( trunc(sysdate) , id );

How to calculate difference between two rows in a date interval?

I'm trying to compare data from an Access 2010 database based on a date interval. Example I have items from various purchase orders and I want to maintain the history of these item's delivery to a warehouse. So my purchase order has a request for a quantity of 10 of a material, for example, and it can be partially delivered in many deliveries and I want to know how this delivery varied in a date interval. To fill the date field the criteria used is the following: if the item had an update in the QtyPending field, I copy the current row deactivating it with a booelan field, create a new entry with the current update date updating the QtyPending field, so the active record is the actual state of the item. So I have a table that holds informations about these items like that
PO POItem QtyPending Date Active
4500000123 10 10 01/09/2014 FALSE
4500000123 10 8 05/09/2014 TRUE
4500000122 30 5 03/09/2014 FALSE
4500000122 30 1 04/09/2014 TRUE
With this example, for the first item, it means that from date 01/09 to 04/09 the QtyPending field didn't suffer a variation, meaning that the supplier didn't make any delivery to me, but from 01/09 to 05/08 he delivered me a qty of 2 of a material. For the second one, from date 03/09 to 04/09 the supplier delivered me a qty of 4 of a material. So, if I were to be making a report query from 02/09/2014 to 04/09/2014, the expected output is like this:
PO POItem QtyDelivered
4500000123 10 0
4500000122 30 4
And a report from 31/08/2014 to 10/09/2014, would have this output
PO POItem QtyDelivered
4500000123 10 2
4500000122 30 4
I'm not coming up with a query to make this report. Can anyone help me?
There are many ways of solving this. The easiest one would be to simply make a query of all the necessary records between two dates, loop over them and insert into a temporary table the result. This temporary table can then be the source of your report. A lot of people will scream at you for not using a big query instead but getting the result that you want in the fastest and simplest way should be your priority.
Your problem with your schema is that you don't have the QtyDelivered stored for each record. If you would have it, it would be an easy thing to sum over it in order to get needed result. By not storing this value, you have transformed a simple and fast query into a much harder and slower one because you need to recalculate this value in some way or other and you must do this without forgetting the fact that it's possible to have more than two records.
For calculating this value, you can either use a sub-query to retrieve the value from the previous row or a Left join do to the same. Once you have this value, you can subtract these two to get the needed difference; allowing for the possibility of Null value if there is no previous row. Once you have these values, you can now sum over them to get the final result with a Group By. Notice that in order to perform these calculations, you need to have one or two more levels of subquery. The first query should be something like:
Select PO, POItem, QtyPending, (Select Top 1 QtyPending from MyTable T2 where T1.PO = T2.PO and T2.Date < T1.Date And (T2.Date between #Date1 and #Date2) Order by T2.Date Desc) as QtyPending2 from MyTable T1 Where T1.Date between #Date1 and #Date2) ...
With this as either another subquery or as a View, you can then compute the desired difference by comparing the values of QtyPending and QtyPending2; without forgetting that QtyPendin2 may be Null. The remaining steps are easy to do.
Notice that the above example is for SQL-Server, you might have to change it a little for Access. In any case, you can find here many examples on how to compare two rows under Access. As noted earlier, you can also use a Left Join instead of a subquery to compare your rows.
I came up with this query that solved the problem, it wasn't that simple
SELECT
ItmDtIni.PO
,ItmDtIni.POItem AS [PO Item]
,ROUND(ItmDtIni.QtyPending - ItmDtEnd.QtyPending, 3) AS [Qty Delivered]
,ROUND((ItmDtIni.QtyPending - ItmDtEnd.QtyPending) * ItmDtEnd.Price, 2) AS [Value delivered(US$)]
//Filtering subqueries to bring only the items in the date interval to make a self join
FROM (((SELECT
PO
,POItem
,QtyPending
,MIN(Date) AS MinDate
FROM Item
WHERE Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy')
GROUP BY
PO
,POItem
,QtyPending) AS ItmDtIni
//Self join filtering to bring only items in the date interval with the previously filtered table
INNER JOIN (SELECT
PO
,POItem
,QtyPending
,Price
,MAX(Date) AS MaxDate
FROM Item
WHERE Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy')
GROUP BY
PO
,POItem
,QtyPending
,Price) AS ItmDtEnd
ON ItmDtIni.PO = ItmDtEnd.PO
AND ItmDtIni.POItem = ItmDtEnd.POItem)
INNER JOIN PO
ON ItmDtEnd.PO = PO.Numero)
WHERE
//Showing only items that had a variation in the date interval
ROUND(ItmDtIni.QtyPending - ItmDtEnd.QtyPending, 3) <> 0
//Anchoring min date in the interval for each item found by the first subquery
AND ItmDtIni.MinDate = (SELECT MIN(Item.Date)
FROM Item
WHERE
ItmDtIni.PO = Item.PO
AND ItmDtIni.POItem = Item.POItem
AND Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy'))
//Anchoring max date in the interval for each item found by the second subquery
AND ItmDtEnd.MaxDate = (SELECT MAX(Item.Date)
FROM Item
WHERE
ItmDtEnd.PO = Item.PO
AND ItmDtEnd.POItem = Item.POItem
AND Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy'))

Difference in time between records

I have a table that has (among others) a timestamp column (named timestamp; it's a standard Oracle DATE datatype). The records are about 4-11 minutes apart, about 7 or 8 records every hour, and I'm trying to determine if there is any pattern to them.
Is there an easy way to see each record, and the number of minutes that record occurred after the previous record?
Thanks,
AndyDan
This is Oracle 9i+, using the LAG function to get the previous timestamp value without needing to self join:
SELECT t.timestamp - LAG(t.timestamp) OVER (ORDER BY t.timestamp) AS diff
FROM YOUR_TABLE t
...but because whole numbers represent the number of days in the result, a difference of less than 24 hours will be a fraction. Also, the LAG will return NULL if there's no earlier value -- same as if having used an OUTER JOIN.
To see minutes, use the ROUND function:
SELECT ROUND((t.timestamp - LAG(t.timestamp) OVER (ORDER BY t.timestamp)) *1440) AS diff_in_minutes
FROM YOUR_TABLE t
If the records have sequential id's you could do a self join like this:
SELECT t2.*, t2.timestamp - t1.timestamp AS timediff
FROM foo t1 inner join foo.t2 on t1.id = t2.id-1
You'd probably need to tweak this to handle the first and last records, but that's the basics.