Need to find a difference of data from the same table in hive

I have a history table with a loaded_timestamp column. I need to fetch the difference between two loads using that column.
Logic: get the email addresses by taking the difference between the data for (loaded_timestamp - 1) and the data for current_timestamp. Only the subtracted data should be the output.
Select query:
select t1.email_addr
from (select *
      from table t1
      where loaded_timestamp = current_timestamp
     ) left outer join
     (select *
      from table t2
      where loaded_timestamp = date_sub(current_timestamp,1)
     )
where t1.email!=t2.email;
The table has the following columns:
Email address, First name, Last name, loaded_timestamp.
xxx#gmail.com,xxx,aaa,2020-03-08.
yyy#gmail.com,yyy,bbb,2020-03-08.
zzz#gmail.com,zzz,ccc,2020-03-08.
xxx#gmail.com,xxx,aaa,2020-03-09.
yyy#gmail.com,yyy,bbb,2020-03-09.
Desired Result
zzz#gmail.com
So if I subtract the two dates from the same table, i.e. (2020-03-09 - 2020-03-08), I should get only the record that does not match. Matching records should be discarded, and the unmatched record should be the output.

The best I can figure out is that you want emails that appear only once. If that is the case, use window functions:
select t.*
from (select t.*, count(*) over (partition by email) as cnt
      from t
     ) t
where cnt = 1;
If you want emails in the data but not loaded on the current date, then:
select t.email
from t
group by t.email
having max(loaded_timestamp) <> current_date;
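If the goal is specifically the emails that were in the previous day's load but are missing from the current one, an anti-join between the two loads is another option. The following is only a minimal sketch: it runs against an in-memory SQLite database through Python's sqlite3 module rather than Hive, and the table name history, the column name email_addr, and the fixed dates standing in for the current and previous day are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    "email_addr TEXT, first_name TEXT, last_name TEXT, loaded_timestamp TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        ("xxx#gmail.com", "xxx", "aaa", "2020-03-08"),
        ("yyy#gmail.com", "yyy", "bbb", "2020-03-08"),
        ("zzz#gmail.com", "zzz", "ccc", "2020-03-08"),
        ("xxx#gmail.com", "xxx", "aaa", "2020-03-09"),
        ("yyy#gmail.com", "yyy", "bbb", "2020-03-09"),
    ],
)

# Anti-join: keep previous-day rows that have no partner in the current-day load.
ANTI_JOIN = """
SELECT prev.email_addr
FROM history AS prev
LEFT JOIN history AS cur
       ON cur.email_addr = prev.email_addr
      AND cur.loaded_timestamp = :current_day
WHERE prev.loaded_timestamp = :previous_day
  AND cur.email_addr IS NULL
"""

# The two dates are hard-coded stand-ins for "today" and "yesterday".
missing_today = [
    row[0]
    for row in conn.execute(
        ANTI_JOIN, {"current_day": "2020-03-09", "previous_day": "2020-03-08"}
    )
]
print(missing_today)  # -> ['zzz#gmail.com']
```

In Hive, the two parameters would presumably be replaced by current_date and date_sub(current_date, 1), provided loaded_timestamp holds a plain date.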

Related

Finding Max(Date) BEFORE specified date in Redshift SQL

I have a table (Table A) in SQL (AWS Redshift) where I've isolated my beginning population, which contains account IDs and dates. I'd like to take the output from that table and LEFT JOIN it back to the "accounts" table to return ONLY the start date that comes directly before the date stored in my output.
Table A (Beg Pop)
-------
select account_id,
min(start_date),
min(end_date)
from accounts
group by 1;
I want to return ONLY the date that precedes the date in my current table where account_id match. I'm looking for something like...
Table B
-------
select a.account_id,
a.start_date,
a.end_date,
b.start_date_prev,
b.end_date_prev
from accounts as a
left join accounts as b on a.account_id = b.account_id
where max(b.start_date) less than a.start_date;
Ultimately, I want to return everything from table a and only the dates where max(start_date) is less than the start_date from table A. I know aggregation is not allowed in the WHERE clause and I guess I can do a subquery but I only want the Max date BEFORE the dates in my output. Any suggestions are greatly appreciated.

I want to return ONLY the date that precedes the date in my current table where account_id match
If you want the previous date for a given row, use lag():
select a.*,
lag(start_date) over (partition by account_id order by start_date) as prev_start_date
from accounts a;
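To see what lag() returns per account, here is a small sketch that runs the same window expression on an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). The three sample rows are made up for illustration, and SQLite stands in for Redshift here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER, start_date TEXT, end_date TEXT)"
)
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, "2020-10-01", "2020-10-03"),
        (1, "2020-10-07", "2020-10-08"),
        (2, "2020-10-15", "2020-10-16"),
    ],
)

# LAG looks one row back within each account, ordered by start_date.
rows = conn.execute(
    """
    SELECT account_id,
           start_date,
           LAG(start_date) OVER (
               PARTITION BY account_id ORDER BY start_date
           ) AS prev_start_date
    FROM accounts
    ORDER BY account_id, start_date
    """
).fetchall()

for row in rows:
    print(row)
```

The first row of each account has no predecessor, so its prev_start_date comes back as NULL (None in Python).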

As I understand it, the requirement is to display all rows from a base table together with the preceding row, ordered by a column and subject to some conditions.
Please check the following example, which I took from the article "Select Next and Previous Rows with Current Row using SQL CTE Expression":
WITH CTE as (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY start_date) as RN,
        *
    FROM accounts
)
SELECT
    PreviousRow.*,
    CurrentRow.*,
    NextRow.*
FROM CTE as CurrentRow
LEFT JOIN CTE as PreviousRow ON
    PreviousRow.RN = CurrentRow.RN - 1 and PreviousRow.account_id = CurrentRow.account_id
LEFT JOIN CTE as NextRow ON
    NextRow.RN = CurrentRow.RN + 1 and NextRow.account_id = CurrentRow.account_id
ORDER BY CurrentRow.account_id, CurrentRow.start_date;
I tested with the following sample data, and it seems to work:
create table accounts(account_id int, start_date date, end_date date);
insert into accounts values (1,'20201001','20201003');
insert into accounts values (1,'20201002','20201005');
insert into accounts values (1,'20201007','20201008');
insert into accounts values (1,'20201011','20201013');
insert into accounts values (2,'20201001','20201002');
insert into accounts values (2,'20201015','20201016');
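The row-number self-join can be tried locally on these sample rows. The sketch below is an approximation rather than the answer's exact setup: it uses an in-memory SQLite database through Python's sqlite3 module instead of Redshift, stores the dates as text in the same compact form, and selects only the neighbouring start dates so the result stays readable. The aliases cur, prev and nxt are chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER, start_date TEXT, end_date TEXT)"
)
# Sample rows from the answer; dates kept as sortable YYYYMMDD text.
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, "20201001", "20201003"),
        (1, "20201002", "20201005"),
        (1, "20201007", "20201008"),
        (1, "20201011", "20201013"),
        (2, "20201001", "20201002"),
        (2, "20201015", "20201016"),
    ],
)

# Number the rows per account, then join each row to rn - 1 and rn + 1.
neighbours = conn.execute(
    """
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY account_id ORDER BY start_date
               ) AS rn,
               account_id, start_date, end_date
        FROM accounts
    )
    SELECT cur.account_id,
           prev.start_date AS prev_start_date,
           cur.start_date,
           nxt.start_date AS next_start_date
    FROM cte AS cur
    LEFT JOIN cte AS prev
           ON prev.rn = cur.rn - 1 AND prev.account_id = cur.account_id
    LEFT JOIN cte AS nxt
           ON nxt.rn = cur.rn + 1 AND nxt.account_id = cur.account_id
    ORDER BY cur.account_id, cur.start_date
    """
).fetchall()

for row in neighbours:
    print(row)
```

Rows at the start or end of an account's history get NULL for the missing neighbour, which is why both joins are LEFT JOINs.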

SQL - Summarize column with maximum date value and other fields

I have a table with the following fields:
Id|Date|Name
---------------
A|2019-04-24|"VALUE1"
A|2019-04-23|"VALUE2"
A|2019-06-11|"VALUE3"
A|2019-06-12|"VALUE4"
B|2019-05-21|"VALUE5"
B|2019-05-22|"VALUE6"
B|2019-03-13|"VALUE7"
C|2019-01-03|"VALUE8"
I would like to get one line per Id having the info of the maximum date line. This would be the output:
Id|Date|Name
---------------
A|2019-06-12|"VALUE4"
B|2019-05-22|"VALUE6"
C|2019-01-03|"VALUE8"
I have achieved through a group by getting the Id and the MAX Date, but not the value associated to that date.
What I am working on now is to inner join that table with the input one joining it on date and id, but I am not able to join on two fields.
Is there any way to bring to the result the value field related to the max date in the group by clause?
Otherwise, How could I join on two different fields those two tables?
Any Suggestion?
Thank you so much!!

You can use a correlated subquery:
select t.*
from table t
where t.date = (select max(t1.date) from table t1 where t1.id = t.id);
However, most DBMSs support analytic functions, so you can use:
select t.*
from (select t.*, row_number() over (partition by t.id order by t.date desc) as seq
      from table t
     ) t
where seq = 1;
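Both variants can be checked against the sample data from the question. This is a minimal sketch using an in-memory SQLite database via Python's sqlite3 module (window functions need SQLite 3.25 or later); the table name items is an assumption, since the answer's placeholder name table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT, date TEXT, name TEXT)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("A", "2019-04-24", "VALUE1"),
        ("A", "2019-04-23", "VALUE2"),
        ("A", "2019-06-11", "VALUE3"),
        ("A", "2019-06-12", "VALUE4"),
        ("B", "2019-05-21", "VALUE5"),
        ("B", "2019-05-22", "VALUE6"),
        ("B", "2019-03-13", "VALUE7"),
        ("C", "2019-01-03", "VALUE8"),
    ],
)

# Variant 1: correlated subquery comparing each row to its group's max date.
by_subquery = conn.execute(
    """
    SELECT t.id, t.date, t.name
    FROM items AS t
    WHERE t.date = (SELECT MAX(t1.date) FROM items AS t1 WHERE t1.id = t.id)
    ORDER BY t.id
    """
).fetchall()

# Variant 2: rank rows per id by date descending and keep the first.
by_row_number = conn.execute(
    """
    SELECT id, date, name
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.date DESC) AS seq
          FROM items AS t) AS ranked
    WHERE seq = 1
    ORDER BY id
    """
).fetchall()

print(by_subquery)
print(by_row_number == by_subquery)
```

One difference worth keeping in mind: if two rows of the same id share the maximum date, the correlated subquery returns both, while row_number() keeps exactly one of them.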

Select last item for each unique column value

I have a table containing message logs. Each conversation has a conversation ID.
I want to select distinct conversation IDs, and for each of them, find the latest message with that conversation ID and join it into the row.
This is what I tried, but it doesn't add any data to the result except the two columns (conversationId and id). I want to get all columns from that table for each row with the latest message.
SELECT
logs.conversationId,
-- latest message id
MAX(logs.id) AS id
FROM [dbo].[Logs] AS logs
-- trying to get the remaining columns for the last message with that conversation ID
LEFT JOIN [dbo].[Logs] AS logs2 ON logs.id = logs2.id
WHERE
-- only conversations for last month
logs.timestamp >= DATEADD(month, -1, GETDATE())
GROUP BY logs.conversationId
When I try to add another column into SELECT, I get the error saying I need to add that column into the GROUP BY clause. But that causes the statement to run for an extremely long time, over 20 seconds for just a few dozen rows in the result.

Use the row_number() function:
select *
from (
    select *,
           row_number() over(partition by conversationId order by id desc) as rn
    from logs
) as t where t.rn=1

First get the max log id per conversation from logs, then apply a left join:
select *
from (select logs.conversationId,
             MAX(logs.id) AS id
      from [dbo].[Logs] AS logs
      group by logs.conversationId
     ) a
left join [dbo].[Logs] AS logs2
  on a.id = logs2.id and a.conversationId = logs2.conversationId
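The grouped-maximum join can be sketched on a small data set as follows. This is an illustration only: it uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server, and the message column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, conversationId TEXT, message TEXT)"
)
# Hypothetical sample rows; "message" stands in for the remaining columns.
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        (1, "c1", "hi"),
        (2, "c2", "hello"),
        (3, "c1", "bye"),
        (4, "c2", "later"),
        (5, "c3", "solo"),
    ],
)

# Step 1 (derived table): highest id per conversation.
# Step 2 (join): pull the full row for each of those ids.
latest_messages = conn.execute(
    """
    SELECT l.id, l.conversationId, l.message
    FROM (SELECT conversationId, MAX(id) AS id
          FROM logs
          GROUP BY conversationId) AS latest
    JOIN logs AS l
      ON l.id = latest.id
     AND l.conversationId = latest.conversationId
    ORDER BY l.conversationId
    """
).fetchall()

print(latest_messages)
```

Because only conversationId is grouped, the remaining columns never need to appear in the GROUP BY clause; they come from the joined copy of the table.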

I would use a correlated subquery in the WHERE clause:
select *
from logs t
where t.id = (
    SELECT MAX(tt.id)
    from logs tt
    WHERE tt.conversationId = t.conversationId
    GROUP BY tt.conversationId
)
Note: if you create an index on id, this might be faster than the row_number version.

How to check if a person has duplicate date records?

I am looking to query my Access database from Excel (DAO) to determine if any name in the table has more than one record per date. E.g. If Bob has two records on 05/05/17 then I want to return both records as part of a recordset.

Seems like you are looking for something like:
SELECT *
FROM yourtable
INNER JOIN
    (
        SELECT count(*), name, date
        FROM yourtable
        GROUP BY name, date
        HAVING COUNT(*) > 1
    ) multi
    ON multi.name = yourtable.name
    AND multi.date = yourtable.date
The inner select returns rows with more than 1 entry for the same name and date.
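As a quick check of the join-back idea, the sketch below runs it on an in-memory SQLite database via Python's sqlite3 module rather than Access/DAO. The table name visits, the id column, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, name TEXT, date TEXT)")
# Hypothetical sample rows: Bob appears twice on 2017-05-05.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "Bob", "2017-05-05"),
        (2, "Bob", "2017-05-05"),
        (3, "Bob", "2017-05-06"),
        (4, "Alice", "2017-05-05"),
    ],
)

# The inner query finds (name, date) pairs with more than one record;
# the join returns every full record belonging to such a pair.
duplicates = conn.execute(
    """
    SELECT v.id, v.name, v.date
    FROM visits AS v
    INNER JOIN (SELECT name, date, COUNT(*) AS n
                FROM visits
                GROUP BY name, date
                HAVING COUNT(*) > 1) AS multi
            ON multi.name = v.name
           AND multi.date = v.date
    ORDER BY v.id
    """
).fetchall()

print(duplicates)
```

If the date column also stores a time of day, the grouping would need to be on the date part only; that detail depends on the actual data and is not covered here.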

In Access you can do
select name, date
from your_table
group by name, date
having count(*) > 1

Microsoft Access 2010: Select most recent max Record ID for each LANID

I need to filter out this data based on some criteria.
For every unique LANID, a user can have up to 2 records. Some users will only have 1 record.
I need to select the max Record ID for each LANID.

So create one query that determines the max(recordID) grouped by LANID, then a second query that uses the first as its data source, joining it back to your table on LANID and max(recordID).
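The two-query approach might look like the sketch below, where a view plays the role of the saved first query. It runs on an in-memory SQLite database through Python's sqlite3 module rather than Access; the view name max_record_per_lan and the sample rows are hypothetical, and the table and column names are borrowed from the other answer to this question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sampleData (RecordId INTEGER, LanId TEXT, lastUpdateDate TEXT)"
)
# Hypothetical sample rows: user u1 has two records, user u2 has one.
conn.executemany(
    "INSERT INTO sampleData VALUES (?, ?, ?)",
    [
        (1, "u1", "2017-01-01"),
        (2, "u1", "2017-02-01"),
        (3, "u2", "2017-01-15"),
    ],
)

# Query 1, stored as a view: the largest RecordId for each LanId.
conn.execute(
    """
    CREATE VIEW max_record_per_lan AS
    SELECT LanId, MAX(RecordId) AS MaxRecordId
    FROM sampleData
    GROUP BY LanId
    """
)

# Query 2: join the view back to the table to fetch the full rows.
latest_records = conn.execute(
    """
    SELECT sd.RecordId, sd.LanId, sd.lastUpdateDate
    FROM sampleData AS sd
    JOIN max_record_per_lan AS m
      ON m.LanId = sd.LanId
     AND m.MaxRecordId = sd.RecordId
    ORDER BY sd.LanId
    """
).fetchall()

print(latest_records)
```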

Assuming the last update date is not duplicated for a given row, then one method is to use a correlated subquery to get the last date and then get the rest of the columns in the row:
select sd.*
from sampleData as sd
where sd.RecordId = (select max(sd2.RecordId)
                     from sampleData as sd2
                     where sd2.lanId = sd.lanId
                    );
EDIT:
If you wanted the largest record id for the most recent update date:
select sd.*
from sampleData as sd
where sd.RecordId = (select top 1 sd2.RecordId
                     from sampleData as sd2
                     where sd2.lanId = sd.lanId
                     order by sd2.lastUpdateDate desc, sd2.RecordId desc
                    );
