Teradata / Window functions - QUALIFY - sql

I'm analyzing code written by my colleague and found this query:
SELECT client_id
FROM lib.applications
WHERE closed = 0 AND application_date > '2016-01-01'
QUALIFY Row_Number() Over(PARTITION BY client_id ORDER BY closed) = 1
Logically, the query should return a list of clients with active (not closed) applications.
I can't understand why he used QUALIFY here.
The query below is simpler and returns the same result:
SELECT client_id
FROM lib.applications
WHERE closed = 0 AND application_date > '2016-01-01'
Do you have any idea for what reason QUALIFY could be used here?

QUALIFY here returns one row per client_id. The more colloquial way of writing the query would be:
SELECT DISTINCT client_id
FROM lib.applications
WHERE closed = 0 and application_date > '2016-01-01';
Perhaps the author of the query checked performance and found that QUALIFY is faster in this case (although I would doubt that). Perhaps the author was thinking of including other columns, in which case SELECT DISTINCT would not work.
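For example, here is a sketch (using only columns mentioned in the question) of how QUALIFY can return one complete row per client, which SELECT DISTINCT cannot do once extra columns are added:
-- Sketch only: QUALIFY keeps one whole row per client (here the earliest
-- application by date), which SELECT DISTINCT could not do with extra columns.
SELECT client_id,
       application_date,
       closed
FROM lib.applications
WHERE closed = 0 AND application_date > '2016-01-01'
QUALIFY Row_Number() Over (PARTITION BY client_id ORDER BY application_date) = 1;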


Get first record based on time in PostgreSQL

Do we have a way to get the first record, considering the time?
For example:
get the first record today, get the first record yesterday, get the first record the day before yesterday ...
Note: I want to get all such records, considering the time.
Sample expected output should be:
first_record_today,
first_record_yesterday, ...
As I understand the question, the "first" record per day is the earliest one.
For that, we can use RANK and PARTITION BY the day only, truncating the time.
In the ORDER BY clause, we will sort by the time:
SELECT sub.yourdate FROM (
SELECT yourdate,
RANK() OVER
(PARTITION BY DATE_TRUNC('DAY',yourdate)
ORDER BY DATE_TRUNC('SECOND',yourdate)) rk
FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
In the main query, we will sort the data beginning with the latest date, meaning today's one, if available.
We can try it out here: db<>fiddle
If this understanding of the question is incorrect, please let us know what to change by editing your question.
A note: according to your description, a window function is not strictly necessary. A shorter GROUP BY, as shown in the other answer, can produce the correct result too and might be absolutely fine. I like the window function approach because it makes it easy to add or change conditions that might not fit into a simple GROUP BY, which is why I chose this way.
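For reference, a minimal sketch of such a GROUP BY, using the same yourtable/yourdate names as above:
-- Sketch of the GROUP BY alternative: the earliest timestamp per calendar day,
-- latest day first.
SELECT MIN(yourdate) AS yourdate
FROM yourtable
GROUP BY DATE_TRUNC('DAY', yourdate)
ORDER BY MIN(yourdate) DESC;
As soon as further columns (like the message below) are needed, this plain GROUP BY no longer works directly, which is where the window function pays off.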
EDIT, because the question's author provided further information:
Here is the query that also fetches the first message:
SELECT sub.yourdate, sub.message FROM (
SELECT yourdate, message,
RANK() OVER (PARTITION BY DATE_TRUNC('DAY',yourdate)
ORDER BY DATE_TRUNC('SECOND',yourdate)) rk
FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
Or if only the message without the date should be selected:
SELECT sub.message FROM (
SELECT yourdate, message,
RANK() OVER (PARTITION BY DATE_TRUNC('DAY',yourdate)
ORDER BY DATE_TRUNC('SECOND',yourdate)) rk
FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
Updated fiddle here: db<>fiddle

If statement in a WHERE clause for between two dates

I have a script that counts the number of doses a client has had between their start date and 180 days out.
Now I am trying to have some form of an IF (or CASE) statement in the WHERE clause so that the count is either between the first date and 180 days out, OR, if those 180 days extend past 6/30/20, just between the start date and 6/30/20.
In my research I couldn't find anything about using an IF/ELSE (or CASE) with dates in the WHERE clause.
This is my current script in SQL Server:
SELECT
t.clinic,
t.display_id,
m.FirstDate,
DATEADD(DAY,180,MIN(take_on_date)) AS Days_180,
COUNT(t.dose_number) AS Doses
FROM (SELECT CLINIC
, display_id
, MIN(TAKE_ON_DATE) AS FirstDate
FROM factMedHist
GROUP BY Clinic, display_id
) AS m
INNER JOIN factMedHist AS t
ON t.Clinic = m.Clinic
AND t.display_id = m.display_id
WHERE t.take_on_date
BETWEEN m.FirstDate AND DATEADD(DAY,180,m.FirstDate)
GROUP BY t.Clinic, t.display_id,m.FirstDate
So "start date" = "FirstDate" = "min(TAKE_ON_DATE)". And "Client" = "display_id" but you group and join on the tuple <display_id, Clinic>. I see many struggles in the future based on this unfortunately common issue. Consistent terminology is important.
So here is one take on your issue. A bit verbose to demonstrate what it does. Note also the provision of a MVCE - something that you should provide to encourage others to help. It is a bit of effort you should not expect others to take on just to solve your issues.
You were on the correct path with CASE - but lost it when you started thinking of it as a control-of-flow construct as it is in most other languages. You can compute the startdate and enddate for each client (clinic, display_id)
with cte as (select *,
min(takedate) over (partition by display_id, clinic order by takedate) as startdate,
dateadd(day, 180, min(takedate) over (partition by display_id, clinic order by takedate)) as enddate
from #medhist
)
You were doing that - but the problem is that you need to reference that end date in the WHERE clause to filter the rows as desired. Like this:
where takedate <= case when enddate <= '20200630' then enddate else '20200630' end
Fiddle here. Notice that the first take date is irrelevant in the WHERE clause. This is one way to achieve your result. Another obvious approach is to use a conditional sum (a sketch follows below); working that out would be good practice if you want to increase your knowledge. Depending on your situation, one might be more efficient than the other. CTEs are just syntactic sugar, but they do allow building the logic piece by piece - something I find very helpful when developing a complete T-SQL statement.
WHERE t.take_on_date BETWEEN m.FirstDate AND DATEADD(DAY,180,m.FirstDate)
OR t.take_on_date > DATEADD(DAY, 180, '20200630')
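For completeness, here is a sketch of the conditional-sum approach mentioned above (column names follow the question; the 6/30/20 cap is written as the literal '20200630', as in the answer above):
-- Sketch: compute each client's first take date with a window function,
-- then count only the doses that fall on or before the capped end date.
WITH d AS (
    SELECT clinic,
           display_id,
           take_on_date,
           MIN(take_on_date) OVER (PARTITION BY clinic, display_id) AS FirstDate
    FROM factMedHist
)
SELECT clinic,
       display_id,
       FirstDate,
       DATEADD(DAY, 180, FirstDate) AS Days_180,
       SUM(CASE WHEN take_on_date <=
                     CASE WHEN DATEADD(DAY, 180, FirstDate) <= '20200630'
                          THEN DATEADD(DAY, 180, FirstDate)
                          ELSE '20200630'
                     END
                THEN 1 ELSE 0 END) AS Doses
FROM d
GROUP BY clinic, display_id, FirstDate;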

How to use sql LAG() properly

I have the following SQL line that has a syntax error. I'm trying to reference the prior day's close in my SQL query. How do I fix my query so it doesn't error out?
Thanks!
SELECT *
FROM "daily_data"
WHERE date >'2018-01-01' and (open-LAG(close))/LAG(close)>=1.4 and volume > 1000000 and open > 1
Error:
Query execution failed
Reason: SQL Error [42809]: ERROR: window function lag requires an OVER
clause Position: 63
You need to use a subquery. You cannot use window functions in the where clause. You also need an ORDER BY and potentially a PARTITION BY clause:
SELECT *
FROM (SELECT dd.*,
LAG(close) OVER (ORDER BY date) as prev_close
FROM "daily_data" dd
) dd
WHERE date > '2018-01-01' AND
(open - prev_close) / prev_close >= 1.4 AND
volume > 1000000 AND
open > 1;
lag(close) means "the value of close from the prior record." So the phrase by itself is missing something fundamental, specifically: how do you define "prior record", since there is never any implied order in an RDBMS?
As with functions such as rank and row_number, to properly form the lead and lag commands you need to establish the prior (or next) record by defining the order of output. In other words, "if you were to sort the output by x, the prior record's close" would look like this:
lag (close) over (order by x)
To order by something descending:
lag (close) over (order by x desc)
You can optionally chunk the data by a field using partition by which may or may not be useful in your problem. For example, "for each item, if you were to sort the output by x, the prior record's close:"
lag (close) over (partition by item order by x)
So for the question here: prior record (lag)... how? By which fields, in which order?
As a final thought, analytic/windowing functions cannot be used in the where clause in PostgreSQL. To accomplish this, wrap them in a subquery:
with daily as (
SELECT
d.*,
LAG (d.close) over (order by d.<something>) as prior_close
FROM "daily_data" d
WHERE
d.date >'2018-01-01' and
d.volume > 1000000 and
d.open > 1
)
select *
from daily
where
(open - prior_close) / prior_close >= 1.4

Get Max(date) or latest date with 2 conditions or group by or subquery

I only have basic SQL skills. I'm working in SQL in Navicat. I've looked through the threads of people who were also trying to get latest date, but not yet been able to apply it to my situation.
I am trying to get the latest date for each name, for each chemical. I think of it this way: "Within each chemical, look at data for each name, choose the most recent one."
I have tried using max(date(date)) but it needs to be nested or subqueried within chemical.
I also tried ranking by date(date) DESC, then using LIMIT 1. But I was not able to nest this within chemical either.
When I try to write it as a subquery, I keep getting an error near the ( . I've switched it up so that I am beginning the subquery a number of different ways, but the error always returns near that area.
Here is what the data looks like: [screenshot of the sample data omitted]
Here is one of my failed queries:
SELECT
WELL_NAME,
CHEMICAL,
RESULT,
APPROX_LAT,
APPROX_LONG,
DATE
FROM
data_all
ORDER BY
CHEMICAL ASC,
date( date ) DESC (
SELECT
WELL_NAME,
CHEMICAL,
APPROX_LAT,
APPROX_LONG,
DATE
FROM
data_all
WHERE
WELL_NAME = WELL_NAME
AND CHEMICAL = CHEMICAL
AND APPROX_LAT = APPROX_LAT
AND APPROX_LONG = APPROX_LONG,
LIMIT 2
)
If someone does have a response, it would be great if it is in as lay language as possible. I've only had one coding class. Thanks very much.
Maybe something like this?
SELECT WELL_NAME, CHEMICAL, MAX(DATE)
FROM data_all
GROUP BY WELL_NAME, CHEMICAL
If you want all information, then use the ANSI-standard ROW_NUMBER():
SELECT da.*
FROM (SELECT da.*,
             ROW_NUMBER() OVER (PARTITION BY chemical, well_name ORDER BY date DESC) as seqnum
      FROM data_all da
     ) da
WHERE seqnum = 1;

SQL merging result sets on a unique column value

I have 2 similar queries which both work on the same table, and I essentially want to combine their results such that the second query supplies default values for what the first query doesn't return. I've simplified the problem as much as possible here. I'm using Oracle btw.
The table has account information in it for a number of accounts, and there are multiple entries for each account with a commit_date to tell when the account information was inserted. I need to get the account info that was current for a certain date.
The queries take a list of account ids and a date.
Here is the first query:
-- Select the row which was current for the accounts for the given date. (won't return anything for an account which didn't exist for the given date)
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
AND actr.commit_date =
(
SELECT MAX(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
AND actrInner.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
)
This looks a little ugly, but it returns a single row for each account which was current for the given date. The problem is that it doesn't return anything if the account didn't exist until after the given date.
Selecting the earliest account info for each account is trivial - I don't need to supply a date for this one:
-- Select the earliest row for the accounts.
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date =
(
SELECT MIN(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
)
But I want to merge the result sets in such a way that:
For each account, if there is account info for it in the first result set - use that.
Otherwise, use the account info from the second result set.
I've researched all of the joins I can use without success. Unions almost do it but they will only merge for unique rows. I want to merge based on the account id in each row.
Sql Merging two result sets - my case is obviously more complicated than that
SQL to return a merged set of results - I might be able to adapt that technique? I'm a programmer being forced to write SQL and I can't quite follow that example well enough to see how I could modify it for what I need.
The standard way to do this is with a left outer join and coalesce. That is, your overall query will look like this:
SELECT ...
FROM defaultQuery
LEFT OUTER JOIN currentQuery ON ...
If you did a SELECT *, each row would correspond to the current account data plus your defaults. With me so far?
Now, instead of SELECT *, for each column you want to return, you do a COALESCE() on matched pairs of columns:
SELECT COALESCE(currentQuery.columnA, defaultQuery.columnA) ...
This will choose the current account data if present, otherwise it will choose the default data.
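Put together, here is a sketch of that pattern built from the two queries in the question, used as inline views. Only account_id and commit_date are shown; other columns follow the same COALESCE pattern, and the "default" side takes the earliest row with MIN, as described:
-- Sketch: "cur" is the row current as of the given date, "def" is the
-- earliest row per account; COALESCE prefers the current data when it exists.
SELECT def.account_id,
       COALESCE(cur.commit_date, def.commit_date) AS commit_date
FROM (SELECT actr.account_id, actr.commit_date
      FROM Account_Information actr
      WHERE actr.account_id IN (30000316, 30000350, 30000351)
        AND actr.commit_date = (SELECT MIN(i.commit_date)
                                FROM Account_Information i
                                WHERE i.account_id = actr.account_id)
     ) def
LEFT OUTER JOIN
     (SELECT actr.account_id, actr.commit_date
      FROM Account_Information actr
      WHERE actr.account_id IN (30000316, 30000350, 30000351)
        AND actr.commit_date = (SELECT MAX(i.commit_date)
                                FROM Account_Information i
                                WHERE i.account_id = actr.account_id
                                  AND i.commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD'))
     ) cur
  ON cur.account_id = def.account_id;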
You can do this more directly using analytic functions:
select *
from (SELECT actr.*, max(commit_date) over (partition by account_id) as maxCommitDate,
max(case when commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ') then commit_date end) over
(partition by account_id) as MaxCommitDate2
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
) t
where (MaxCommitDate2 is not null and Commit_date = MaxCommitDate2) or
(MaxCommitDate2 is null and Commit_Date = MaxCommitDate)
The subquery calculates two values, the two possibilities of commit dates. The where clause then chooses the appropriate row, using the logic that you want.
I've combined the other answers. Tried it out at apex.oracle.com. Here's some explanation.
MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END) will give us the latest date that is not after Dec 30th, or NULL if there isn't one. Combining that with a COALESCE, we get
COALESCE(MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END), MAX(commit_date)).
Now we take the account id and commit date we have and join them with the original table to get all the other fields. Here's the whole query that I came up with:
SELECT *
FROM Account_Information
JOIN (SELECT account_id,
COALESCE(MAX(CASE WHEN commit_date <=
to_date('2010-DEC-30', 'YYYY-MON-DD')
THEN commit_date END),
MAX(commit_date)) AS commit_date
FROM Account_Information
WHERE account_id in (30000316, 30000350, 30000351)
GROUP BY account_id)
USING (account_id, commit_date);
Note that if you do use USING, you have to use * instead of actr.*.