Finding Max(Date) BEFORE specified date in Redshift SQL - sql

I have a table (Table A) in SQL (AWS Redshift) where I've isolated my beginning population, which contains account IDs and dates. I'd like to take the output from that table and LEFT JOIN back to the "accounts" table to return ONLY the start date that directly precedes the date stored in my output.
Table A (Beg Pop)
-------
select account_id,
min(start_date),
min(end_date)
from accounts
group by 1;
I want to return ONLY the date that precedes the date in my current table where account_id match. I'm looking for something like...
Table B
-------
select a.account_id,
a.start_date,
a.end_date,
b.start_date_prev,
b.end_date_prev
from accounts as a
left join accounts as b on a.account_id = b.account_id
where max(b.start_date) less than a.start_date;
Ultimately, I want to return everything from table a and only the dates where max(start_date) is less than the start_date from table A. I know aggregation is not allowed in the WHERE clause and I guess I can do a subquery but I only want the Max date BEFORE the dates in my output. Any suggestions are greatly appreciated.

I want to return ONLY the date that precedes the date in my current table where account_id match
If you want the previous date for a given row, use lag():
select a.*,
lag(start_date) over (partition by account_id order by start_date) as prev_start_date
from accounts a;
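To see what lag() returns per account, here is a minimal sketch that runs the same window expression in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Redshift here, the rows are illustrative, and the dates are stored as ISO text because SQLite has no date type. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory SQLite stands in for Redshift (an assumption made for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("create table accounts (account_id int, start_date text, end_date text)")
conn.executemany(
    "insert into accounts values (?, ?, ?)",
    [
        (1, "2020-10-01", "2020-10-03"),
        (1, "2020-10-02", "2020-10-05"),
        (1, "2020-10-07", "2020-10-08"),
        (2, "2020-10-01", "2020-10-02"),
        (2, "2020-10-15", "2020-10-16"),
    ],
)

# lag() looks one row back within each account, ordered by start_date.
rows = conn.execute(
    """
    select account_id,
           start_date,
           lag(start_date) over (partition by account_id order by start_date) as prev_start_date
    from accounts
    order by account_id, start_date
    """
).fetchall()

for row in rows:
    print(row)
```

The first row of each account has no predecessor, so its prev_start_date comes back as NULL (None in Python).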

As I understand it, the requirement is to display every row from a base table alongside the preceding row's data, ordered by a column and subject to some conditions.
Please check the following example, which I took from the article "Select Next and Previous Rows with Current Row using SQL CTE Expression":
WITH CTE as (
SELECT
ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY start_date) as RN,
*
FROM accounts
)
SELECT
PreviousRow.*,
CurrentRow.*,
NextRow.*
FROM CTE as CurrentRow
LEFT JOIN CTE as PreviousRow ON
PreviousRow.RN = CurrentRow.RN - 1 and PreviousRow.account_id = CurrentRow.account_id
LEFT JOIN CTE as NextRow ON
NextRow.RN = CurrentRow.RN + 1 and NextRow.account_id = CurrentRow.account_id
ORDER BY CurrentRow.account_id, CurrentRow.start_date;
I tested with the following sample data and it seems to work:
create table accounts(account_id int, start_date date, end_date date);
insert into accounts values (1,'20201001','20201003');
insert into accounts values (1,'20201002','20201005');
insert into accounts values (1,'20201007','20201008');
insert into accounts values (1,'20201011','20201013');
insert into accounts values (2,'20201001','20201002');
insert into accounts values (2,'20201015','20201016');
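The query's result on this sample data can be reproduced roughly with the sketch below, which runs the ROW_NUMBER() self-join in an in-memory SQLite database via Python. SQLite is an assumed stand-in for the real engine, the dates are written as ISO text, and only the start dates are selected to keep the output narrow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table accounts (account_id int, start_date text, end_date text)")
conn.executemany(
    "insert into accounts values (?, ?, ?)",
    [
        (1, "2020-10-01", "2020-10-03"),
        (1, "2020-10-02", "2020-10-05"),
        (1, "2020-10-07", "2020-10-08"),
        (1, "2020-10-11", "2020-10-13"),
        (2, "2020-10-01", "2020-10-02"),
        (2, "2020-10-15", "2020-10-16"),
    ],
)

# Number the rows per account, then join each row to its neighbours (rn - 1, rn + 1).
rows = conn.execute(
    """
    with cte as (
        select row_number() over (partition by account_id order by start_date) as rn,
               account_id, start_date, end_date
        from accounts
    )
    select cur.account_id,
           cur.start_date,
           prev.start_date as prev_start_date,
           nxt.start_date  as next_start_date
    from cte as cur
    left join cte as prev
           on prev.rn = cur.rn - 1 and prev.account_id = cur.account_id
    left join cte as nxt
           on nxt.rn = cur.rn + 1 and nxt.account_id = cur.account_id
    order by cur.account_id, cur.start_date
    """
).fetchall()

for row in rows:
    print(row)
```

Rows at the start or end of an account's history get NULL for the missing neighbour, which is what the LEFT JOINs are for.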

Related

Can I avoid joining the same table multiple times?

Is there a way to improve the following query?
I would need an optimized version of the following query.
The reason I'm joining the Date_Table multiple times is that the ID and date_value columns are not in the same ascending order, i.e.:
ID = 1, date_value = '2022-09-07'; ID = 2, date_value = '2022-02-02'; ID = 3, date_value = '2022-11-12';
Sample data:
The maximum date for the Agreement table is calculated from the Date_Table.date_value column. The query should return only a single row; in this case, the row highlighted in green is the expected result.
Thank you so much!
SELECT * FROM Agreement
WHERE
dim_date_id = (
SELECT
Date_Table.ID
FROM (
SELECT
MAX(Date_Table.date_value) AS date_value
FROM Agreement
INNER JOIN Date_Table
ON Agreement.DIM_DATE_ID = Date_Table.ID
) AS last_day
INNER JOIN Date_Table
ON last_day.date_value = Date_Table.date_value
);
If Agreement is a large table, you should first find all the distinct date_ids, then join it to Date_Table. Also use a rank() windowing function to find the id of the most recent record:
Select Agreement.* From Agreement Inner Join (
Select ID From (
Select Date_Table.ID
,rank() Over (Order by Date_Table.date_value desc) as recent
From Date_Table Inner Join (
Select Distinct Dim_Date_ID as ID From Agreement
) A On A.ID=Date_Table.ID
) R where recent=1
) X On Agreement.DIM_DATE_ID = X.ID
At first glance this looks just as complicated as your original query. But it quickly reduces Agreement to just a list of distinct date ids, which is fast, especially if that field is indexed. Date_Table is then inner joined to find the most recent date_value using the rank() function. The result is filtered to retain only the most recent record, and that date id is used to filter Agreement.
Again, I recommend that you index Agreement.Dim_Date_ID to make this query perform well.
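As a rough check of the rank() approach, the sketch below runs it in an in-memory SQLite database via Python. The agreement_id column, the sample rows, and the extra Date_Table row with ID 4 are all made up for illustration; ID 4 has the latest date overall but is not referenced by Agreement, so it must not win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table Date_Table (ID int, date_value text);
    create table Agreement (agreement_id int, dim_date_id int);  -- agreement_id is hypothetical
    insert into Date_Table values (1, '2022-09-07'), (2, '2022-02-02'),
                                  (3, '2022-11-12'), (4, '2022-12-25');
    insert into Agreement values (10, 1), (11, 2), (12, 3);
    """
)

# Reduce Agreement to its distinct date ids, rank those ids by date, keep rank 1.
rows = conn.execute(
    """
    select Agreement.*
    from Agreement
    inner join (
        select ID
        from (
            select Date_Table.ID as ID,
                   rank() over (order by Date_Table.date_value desc) as recent
            from Date_Table
            inner join (
                select distinct dim_date_id as ID from Agreement
            ) A on A.ID = Date_Table.ID
        ) R
        where recent = 1
    ) X on Agreement.dim_date_id = X.ID
    """
).fetchall()

print(rows)
```

Only the agreement whose dim_date_id points at 2022-11-12 is returned, because the ranking is computed after the join to the ids actually used by Agreement.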

Select difference between two tables

I want to list four columns, date, hourly count, daily count and difference between two counts.
I have used union all for two tables, but I am getting two rows, as shown in the image:
Select a.date, a.hour,b.daily,sum(a.hour-b.daily)
from (select date,count(*) hour,''daily
From table a union all select '' hour,count(*) daily from table b)
Group by date, daily, hourly..
Please suggest to me a solution.
I see that the code supplied uses a UNION to achieve the output. This would be better served by using a JOIN of some kind.
The result is the number of rows in table_b for each date subtracted from the number of rows in table_a for the same date.
This code is untested but should give a good indication of how to achieve this:
SELECT a.date,
a.hour,
ISNULL(b.daily, 0) AS daily,
a.hour - ISNULL(b.daily, 0) AS difference
FROM (
SELECT date,
COUNT(*) AS hour
FROM table_a
GROUP BY date
) a
LEFT JOIN (
SELECT date,
COUNT(*) AS daily
FROM table_b
GROUP BY date
) b ON b.date = a.date
ORDER BY a.date;
This works by:
Calculating the count per date in table_a.
Calculating the count per date in table_b.
Joining all results from table_a with those matching in table_b.
Outputting the date, the hour from table_a, the daily (or 0 if NULL) from table_b, and the difference between the two.
Notes:
I have renamed table a and table b to table_a and table_b. I presume these are not the actual table names
An INNER JOIN may be preferable if you only want results that have matching date columns in both tables. Using the LEFT JOIN will return all results from table_a regardless of whether table_b has an entry.
I'm not convinced that date is an allowed column name but I have reproduced it in the code as per the example given by OP.
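To try the join-based version without a SQL Server instance, here is a small sketch using an in-memory SQLite database from Python. SQLite's ISNULL does not take a default value, so COALESCE is swapped in; the table contents are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table table_a (date text);
    create table table_b (date text);
    insert into table_a values ('2020-01-01'), ('2020-01-01'), ('2020-01-01'),
                               ('2020-01-02'), ('2020-01-02');
    insert into table_b values ('2020-01-01');
    """
)

# Count per date in each table, then LEFT JOIN so dates missing from table_b count as 0.
join_rows = conn.execute(
    """
    select a.date,
           a.hour,
           coalesce(b.daily, 0)          as daily,
           a.hour - coalesce(b.daily, 0) as difference
    from (select date, count(*) as hour from table_a group by date) a
    left join (select date, count(*) as daily from table_b group by date) b
           on b.date = a.date
    order by a.date
    """
).fetchall()

for row in join_rows:
    print(row)
```

The second date exists only in table_a, so its daily count falls back to 0 and the difference equals the hourly count.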
Your method is fine. Your group by columns are not correct:
Select date, sum(hourly) as hourly, sum(daily) as daily,
sum(hourly) - sum(daily) as diff
from ((select date, count(*) as hourly, 0 as daily
from table a
group by date
) union all
(select date, 0 as hourly, count(*) as daily
from table b
group by date
)
) ab
group by date;
The key idea is that the outer query aggregates only by date -- and you still need aggregation functions there as well.
You have other errors in your subquery, such as missing group bys and date columns. I assume those are transcription errors.
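A runnable sketch of the UNION ALL version, again using in-memory SQLite from Python as an assumed stand-in. SQLite does not accept parentheses around the individual branches of a compound select, so they are dropped here, and the output aliases are renamed so they cannot be confused with the inner columns. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table table_a (date text);
    create table table_b (date text);
    insert into table_a values ('2020-01-01'), ('2020-01-01'), ('2020-01-01'),
                               ('2020-01-02'), ('2020-01-02');
    insert into table_b values ('2020-01-01'), ('2020-01-03');
    """
)

# Each branch contributes one count and a zero placeholder; the outer query sums per date.
union_rows = conn.execute(
    """
    select date,
           sum(hourly)              as hourly_total,
           sum(daily)               as daily_total,
           sum(hourly) - sum(daily) as diff
    from (
        select date, count(*) as hourly, 0 as daily from table_a group by date
        union all
        select date, 0 as hourly, count(*) as daily from table_b group by date
    ) ab
    group by date
    order by date
    """
).fetchall()

for row in union_rows:
    print(row)
```

Unlike a LEFT JOIN from table_a, this form also reports dates that appear only in table_b (the last row, with a negative difference).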

Need to find a difference of data from the same table in hive

I have a history table with loaded timestamp column. I need to fetch the subtracted data using the timestamp column.
Logic: get the email addresses by taking the difference between the data loaded at (loaded_timestamp - 1) and the data loaded at current_timestamp. Only the difference should be output.
Select query :
select t1.email_addr
from (select *
from table t1
where loaded_timestamp = current_timestamp
) left outer join
(select *
from table t2
where loaded_timestamp = date_sub(current_timestamp,1)
)
where t1.email!=t2.email;
The table has the following columns:
Email address, First name, Last name, loaded_timestamp
xxx@gmail.com, xxx, aaa, 2020-03-08
yyy@gmail.com, yyy, bbb, 2020-03-08
zzz@gmail.com, zzz, ccc, 2020-03-08
xxx@gmail.com, xxx, aaa, 2020-03-09
yyy@gmail.com, yyy, bbb, 2020-03-09
Desired result:
zzz@gmail.com
So if I subtract the two dates from the same table, i.e. (2020-03-09 - 2020-03-08), I should get only the record that does not match. Matching records should be discarded, and the unmatched record should be the output.
The best I can figure out is that you want emails that appear only once. If that is the case, use window functions:
select t.*
from (select t.*, count(*) over (partition by email_addr) as cnt
from t
) t
where cnt = 1;
If you want emails in the data but not loaded on the current date, then:
select t.email_addr
from t
group by t.email_addr
having max(loaded_timestamp) <> current_date;
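Both queries can be tried on the sample rows from the question with the sketch below, which uses in-memory SQLite from Python instead of Hive (an assumption made so the example is runnable). The column names email_addr and loaded_timestamp follow the question, and the "current" date is passed in as a parameter fixed to 2020-03-09 so the result does not depend on when the code runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (email_addr text, first_name text, last_name text, loaded_timestamp text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("xxx@gmail.com", "xxx", "aaa", "2020-03-08"),
        ("yyy@gmail.com", "yyy", "bbb", "2020-03-08"),
        ("zzz@gmail.com", "zzz", "ccc", "2020-03-08"),
        ("xxx@gmail.com", "xxx", "aaa", "2020-03-09"),
        ("yyy@gmail.com", "yyy", "bbb", "2020-03-09"),
    ],
)

# Variant 1: emails that appear exactly once in the whole table.
appear_once = conn.execute(
    """
    select email_addr
    from (select t.*, count(*) over (partition by email_addr) as cnt from t) x
    where cnt = 1
    """
).fetchall()

# Variant 2: emails whose latest load is not the given "current" date.
not_loaded_today = conn.execute(
    """
    select email_addr
    from t
    group by email_addr
    having max(loaded_timestamp) <> ?
    """,
    ("2020-03-09",),
).fetchall()

print(appear_once, not_loaded_today)
```

On this data both variants agree, but they diverge once the table holds more than two load dates: the first counts occurrences, the second only looks at the most recent load.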

ORACLE SQL: Look for last row in column then update data from another data without affecting older data

I'm currently learning to code in Oracle SQL. Referring to the example in the photo I attached, I would like my script to look for the last date in Table A; if that date is less than the date in Table B, it should insert only the latest-date data from Table B into Table A without affecting the older data in Table A.
The reason is that Table B's data is stored by month, meaning that January's data will be purged when February starts. So the purpose of Table A is to retain all the data I want from Table B.
And how do I embed the script in the table so that it runs automatically every day?
Try this:
insert into TableA (
select * from TableB where date >(select max(date) from TableA)
);
If I understand correctly, in Oracle 12c+, you can use:
insert into a (date, sales)
select date, sales
from b
where b.date > (select max(a.date) from a)
order by date desc
fetch first 1 row only;
The order by and fetch ensure that only the most recent row is inserted.
In earlier versions, you can use a subquery:
insert into a (date, sales)
select date, sales
from (select b.*, row_number() over (order by date desc) as seqnum
from b
where b.date > (select max(a.date) from a)
) b
where seqnum = 1;
Note: If you run this code every day, you will just end up inserting every row of b into a. I assume you are aware of that.
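The behaviour of the "latest row only" insert can be sketched with in-memory SQLite from Python. This is a stand-in for Oracle, so LIMIT 1 replaces FETCH FIRST 1 ROW ONLY, and the sales figures and the helper name copy_latest_row are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table a (date text, sales int);
    create table b (date text, sales int);
    insert into a values ('2021-01-30', 100), ('2021-01-31', 110);
    insert into b values ('2021-01-31', 110), ('2021-02-01', 120), ('2021-02-02', 130);
    """
)


def copy_latest_row(conn):
    """Insert into a the single newest row of b that is later than a's max date."""
    conn.execute(
        """
        insert into a (date, sales)
        select date, sales
        from b
        where b.date > (select max(a.date) from a)
        order by date desc
        limit 1
        """
    )
    return conn.execute("select date, sales from a order by date").fetchall()


after_first_run = copy_latest_row(conn)
after_second_run = copy_latest_row(conn)  # nothing newer is left, so no change
print(after_first_run)
```

Note that 2021-02-01 is skipped: only the newest row is copied, so any intermediate dates that arrived between runs never reach table a.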

How to select multiple rows in SQL Server while filling one column with the first value

Each of my rows has a date. I want the database to keep the right dates. But I am in a situation where I want only the first date, while still getting all the other rows. So I would like the date column in my result to be filled with that same date.
For an example (Because I don't think I expressed myself well)
I have this:
name value date
a 10 5/13
b 14 2/13
c 20 1/13
a 11 7/13
a 5 8/13
b 8 9/13
I want it to become like this in the result:
name value date
a 26 5/13
b 22 5/13
c 20 5/13
I searched for this information but I only find the way to select the first row.
for now I'm doing
SELECT name, SUM(value), date FROM table
ORDER BY name
And I'm kind of clueless for what to do next.
Thanks :)
Databases don't have a concept of "first". Here is an attempt, but no guarantees unless you have a way of ordering to determine first:
select name, sum(value), const.date
from table cross join
(select top 1 date from table) const
group by name, const.date
If you only want to do this for a query, to provide this aggregated data for some specific client requirement, then @freshPrince's answer is appropriate. But if you want to actually modify the data in the table itself, and prevent the issue from arising again, then you need to change the schema.
Create Table newTable(
name varChar(30) not null,
date datetime not null,
value decimal(10,2) not null default(0),
primary key (name, date) )
Insert newTable (name, date, value)
Select name, Min(date), SUM(value)
FROM currentTable
Group By Name
Then delete the old table and rename the new table to whatever you need.
You will also have to modify the process used to insert new rows so that, instead of always inserting a new row, it updates the existing row for a specified name and date if it already exists.
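One possible shape for that "update if it exists, otherwise insert" step is sketched below with SQLite's upsert clause from Python. This is not SQL Server syntax (there you would typically reach for MERGE or an UPDATE followed by an INSERT), the helper add_value is hypothetical, and adding the new value to the stored one is an assumption about what "update" should mean here. The upsert clause needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table newTable (
        name  varchar(30)   not null,
        date  text          not null,
        value decimal(10,2) not null default 0,
        primary key (name, date)
    )
    """
)


def add_value(conn, name, date, value):
    """Insert a row, or add to the existing row with the same (name, date)."""
    conn.execute(
        """
        insert into newTable (name, date, value)
        values (?, ?, ?)
        on conflict (name, date) do update
            set value = value + excluded.value  -- assumed rule: accumulate
        """,
        (name, date, value),
    )


add_value(conn, "a", "2013-05-13", 10)
add_value(conn, "a", "2013-05-13", 11)  # same key: updates instead of inserting
add_value(conn, "b", "2013-02-13", 14)

table_rows = conn.execute("select name, date, value from newTable order by name").fetchall()
print(table_rows)
```

The primary key on (name, date) is what makes the conflict detectable, which is why the schema change comes first.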
Your question is slightly confusing, since your desired result shows a date that does not exist for either b or c, but if that is the result that you want you could use something similar to the following:
select name, sum(value) value, d.date
from yt
cross join
(
select min(date) date
from yt
where name = (select min(name)
from yt)
) d
group by name, d.date;
See SQL Fiddle with Demo
But it seems like you actually would want the min(date) for each name:
select name, sum(value) value, min(date)
from yt
group by name;
See SQL Fiddle with Demo.
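The difference between the two queries above shows up clearly on the question's sample rows. The sketch below runs both in in-memory SQLite from Python (a stand-in for SQL Server); the dates are written as ISO text with an assumed year of 2013 so that min() orders them correctly, and the sum is aliased as total to keep it distinct from the value column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yt (name text, value int, date text)")
conn.executemany(
    "insert into yt values (?, ?, ?)",
    [
        ("a", 10, "2013-05-13"),
        ("b", 14, "2013-02-13"),
        ("c", 20, "2013-01-13"),
        ("a", 11, "2013-07-13"),
        ("a", 5, "2013-08-13"),
        ("b", 8, "2013-09-13"),
    ],
)

# One shared date: the earliest date of the alphabetically first name.
shared_date = conn.execute(
    """
    select name, sum(value) as total, d.date
    from yt
    cross join (
        select min(date) as date
        from yt
        where name = (select min(name) from yt)
    ) d
    group by name, d.date
    order by name
    """
).fetchall()

# Per-name date: each name keeps its own earliest date.
per_name_date = conn.execute(
    """
    select name, sum(value) as total, min(date) as date
    from yt
    group by name
    order by name
    """
).fetchall()

print(shared_date)
print(per_name_date)
```

The first result matches the table in the question (every name carries a's first date); the second gives each name its own first date.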
If the order of the dates should be determined by the name, then you could use:
select t.name, sum(value) value, d.date
from yt t
cross join
(
select top 1 name, date
from yt
order by name, date
) d
group by t.name, d.date;
See Demo