Let's take a simple query in Oracle:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
Now let's say another table, EVENT, contains events that may be associated with each case (linked via EVENT.CASE_ID). A case may have several events, or none at all. I want to report on the earliest-dated future event per case, or return NULL if there is none. I can do this with a subquery in the SELECT clause, as follows:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
MIN(EVENT.DATE)
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
This will return a table like this:
Case ID  Case Type  Date Raised  Min Event Date
76       A          03/01/2019   10/05/2019
43       B          02/02/2019   [NULL]
89       A          29/01/2019   08/07/2019
90       A          04/03/2019   [NULL]
102      C          15/04/2019   20/05/2019
Note that if there do not exist any Events which match the criteria, the line is still returned but without a value. This is because the subquery is in the SELECT clause. This works just fine.
My problem, however, is how to return more than one column from the EVENT table while still allowing for there being no matching rows in EVENT. The above code only returns EVENT.DATE as the single subquery result, to ONE column of the main query. But what if I also want to return EVENT.ID, or EVENT.TYPE, while still allowing them to be NULL (if no matching records from EVENT are found)?
I suppose I could use multiple subqueries in the SELECT clause: each returning just ONE column. But this seems horribly inefficient, given that each subquery would be based on the same criteria (the minimum-dated EVENT whose CASE ID matches that of the main query; or NULL if no such events found).
I suspect some nifty joins would be the answer - although I'm struggling to understand which ones exactly.
Please note that the above examples are vastly simplified versions of my actual code, which already contains multiple joins in the "old style" Oracle format, eg:
WHERE
CASE.ID(+) = EVENT.CASE_ID
There are reasons why this is so - therefore a request to anyone answering this, please would you demonstrate any solutions in this style of coding, as my SQL isn't far enough advanced to be able to re-factor the "newer" style joins into existing code.
You can use a join and window functions. For instance:
select c.*, e.*
from case c left join
(select e.*,
row_number() over (partition by e.case_id order by e.date) as seqnum
from event e
where e.date >= current_date
) e
on e.case_id = c.id and e.seqnum = 1
where c.date_raised > date '2019-01-01'; -- assuming the value is a date
Is this what you mean? I just rewrote Gordon's answer with old Oracle join syntax and your code style.
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
MIN_E.DATE AS MIN_EVENT_DATE
FROM
CASE,
(SELECT EVENT.*,
ROW_NUMBER() OVER (PARTITION BY EVENT.CASE_ID ORDER BY EVENT.DATE) AS SEQNUM
FROM
EVENT
WHERE
EVENT.DATE >= CURRENT_DATE
) MIN_E
WHERE
CASE.DATE_RAISED > DATE '2019-01-01'
AND MIN_E.CASE_ID (+) = CASE.ID
AND MIN_E.SEQNUM (+) = 1;
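To see the pattern return several EVENT columns at once, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, so it has to use ANSI LEFT JOIN rather than the (+) notation, and it needs SQLite 3.25 or later for window functions. The table names cases and events, the column event_date, the sample rows and the fixed "today" value are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases  (id INTEGER, type TEXT, date_raised TEXT);
    CREATE TABLE events (id INTEGER, case_id INTEGER, type TEXT, event_date TEXT);
    INSERT INTO cases VALUES (76, 'A', '2019-01-03'), (43, 'B', '2019-02-02');
    INSERT INTO events VALUES
        (1, 76, 'hearing', '2019-07-08'),
        (2, 76, 'review',  '2019-05-10'),
        (3, 76, 'intake',  '2019-01-05');
""")

# Hypothetical fixed "current date" so the result is reproducible.
today = "2019-04-01"

rows = conn.execute("""
    SELECT c.id, c.type, e.id, e.type, e.event_date
    FROM cases c
    LEFT JOIN (
        -- Number each case's future events, earliest first.
        SELECT ev.*,
               ROW_NUMBER() OVER (PARTITION BY ev.case_id
                                  ORDER BY ev.event_date) AS seqnum
        FROM events ev
        WHERE ev.event_date >= ?
    ) e ON e.case_id = c.id AND e.seqnum = 1
    WHERE c.date_raised > '2019-01-01'
    ORDER BY c.id
""", (today,)).fetchall()

for row in rows:
    print(row)
```

Case 43 has no events, so its three event columns come back as NULL (None in Python), while case 76 gets the id, type and date of the same earliest future event.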
Create an object type with the columns you want and return it from the subquery. Your query would look like this:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
t_your_new_type ( MIN(EVENT.DATE) , MIN(EVENT.your_another_column) KEEP (DENSE_RANK FIRST ORDER BY EVENT.DATE) )
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
Related
I have a SQL query (postgresql) that looks something like this:
SELECT
my_timestamp::timestamp::date as the_date,
count(*) as count
FROM my_table
WHERE ...
GROUP BY the_date
ORDER BY the_date
The result is a table of YYYY-MM-DD, count pairs.
Now I've been asked to fill in the empty dates with zero. So if I was previously providing
2022-03-15 3
2022-03-17 1
I'd now want to return
2022-03-15 3
2022-03-16 0
2022-03-17 1
Now I can easily do this client-side (relative to the database) and let my program compute and return the zero-augmented list to its clients based on the original list from postgres. But perhaps it would be better if I could just tell postgresql to include zeros.
I suspect this isn't easy at all, because postgres has no obvious way of knowing what I'm up to. But in the interests of learning more about postgres and SQL, I thought I'd have a try. The try isn't too promising thus far...
Any pointers before I conclude that I was right to leave this to my (postgres client) program?
Update
This is an interesting case where my simplification of the problem led to a correct answer that didn't work for me. For those who come after, I thought it worth documenting what followed, because it takes some fun twists through constructing SQL queries.
@a_horse_with_no_name responded with a query that I've verified works if I simplify my own query to match. Unfortunately, my query had some extra baggage that I didn't think pertinent, and so had trimmed out when posting the original question.
Here's my real (original) query, with all names preserved (if shortened):
-- current query
SELECT
LEAST(time1, time2, time3, time4)::timestamp::date as the_date,
count(*) as count
FROM reading_group_reader rgr
INNER JOIN ( SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
WHERE LEAST(time1, time2, time3, time4) > current_date - 30
GROUP BY the_date
ORDER BY the_date;
If I translate that directly into the proposed solution, however, the inner join between reading_group_reader and the temporary table TT causes the left join to become inner (I think) and the date sequence drops its zeros again. Fwiw, the table TT is a table because sometimes it actually is a subselect.
So I transformed my query into this:
SELECT
g.dt::date as the_date,
count(*) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY the_date;
but this outputs 1's instead of 0's at the places that should be 0.
The reason is that I've now selected every date, so count(*) finds one row for each, even when the joined side is all NULL. I need to include an additional field (which will be NULL when nothing matches) and count that.
So this query finally does what I want:
SELECT
g.dt::date as the_date,
count(rgrt.device_id) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date,
rgr.device_id
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)
) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt(the_date)
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY g.dt;
And, of course, on re-reading the accepted answer, I eventually saw that he did count an unrelated field, which I'd simply missed on my first several readings.
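The difference between count(*) and counting a column from the outer-joined side can be reproduced in a few lines. This is only a sketch using Python's sqlite3 module rather than PostgreSQL; the tables days and readings and their rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (dt TEXT);
    CREATE TABLE readings (device_id INTEGER, the_date TEXT);
    INSERT INTO days VALUES ('2022-03-15'), ('2022-03-16'), ('2022-03-17');
    INSERT INTO readings VALUES (1, '2022-03-15'), (2, '2022-03-15'),
                                (3, '2022-03-15'), (4, '2022-03-17');
""")

# Same join, two different aggregates.
query = """
    SELECT d.dt, {agg} AS n
    FROM days d
    LEFT JOIN readings r ON r.the_date = d.dt
    GROUP BY d.dt
    ORDER BY d.dt
"""
# count(*) counts the NULL-extended row the outer join produces for an empty day.
star = conn.execute(query.format(agg="count(*)")).fetchall()
# count(column) ignores NULLs, so an empty day yields 0.
col = conn.execute(query.format(agg="count(r.device_id)")).fetchall()

print(star)
print(col)
```

With this data, 2022-03-16 shows 1 under count(*) and 0 under count(r.device_id).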
You will need to join to a list of dates. This can e.g. be done using generate_series()
SELECT g.dt::date as the_date,
count(t.my_timestamp) as count
FROM generate_series(date '2022-03-01',
date '2022-03-31',
interval '1 day') as g(dt)
LEFT JOIN my_table as t
ON t.my_timestamp::date = g.dt::date
AND ... -- the original WHERE clause goes here!
GROUP BY the_date
ORDER BY the_date;
Note that the original WHERE conditions need to go into the join condition of the LEFT JOIN. You can't put them into a WHERE clause because that would turn the outer join back into an inner join (which means the missing dates wouldn't be returned).
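As a rough illustration of why the filter placement matters, the sketch below runs the same outer join twice, once with the filter in the ON clause and once in WHERE. It uses Python's sqlite3 module, with a recursive CTE standing in for generate_series(); the city column is a hypothetical stand-in for the elided WHERE conditions, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (my_timestamp TEXT, city TEXT);
    INSERT INTO my_table VALUES
        ('2022-03-15 09:00:00', 'x'),
        ('2022-03-16 10:00:00', 'y'),
        ('2022-03-17 11:00:00', 'x');
""")

# Recursive CTE producing one row per day (SQLite has no generate_series by default).
series = """
    WITH RECURSIVE g(dt) AS (
        SELECT '2022-03-15'
        UNION ALL
        SELECT date(dt, '+1 day') FROM g WHERE dt < '2022-03-17'
    )
"""

# Filter in the ON clause: every day survives, unmatched days count 0.
in_on = conn.execute(series + """
    SELECT g.dt, count(t.my_timestamp)
    FROM g
    LEFT JOIN my_table t ON date(t.my_timestamp) = g.dt AND t.city = 'x'
    GROUP BY g.dt ORDER BY g.dt
""").fetchall()

# Filter in WHERE: rows where t.city is NULL or 'y' are discarded after the join.
in_where = conn.execute(series + """
    SELECT g.dt, count(t.my_timestamp)
    FROM g
    LEFT JOIN my_table t ON date(t.my_timestamp) = g.dt
    WHERE t.city = 'x'
    GROUP BY g.dt ORDER BY g.dt
""").fetchall()

print(in_on)
print(in_where)
```

The first result keeps 2022-03-16 with a count of 0; the second loses that day entirely.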
I have the following query, which returns 100 rows:
SELECT uni_id, uni_mast_id, uni_type
FROM UNIVERSITIES
WHERE uni_master = 'SO88' AND uni_stat = 'OK'
Now I need to join with another table and get the last entry of each day:
SELECT uni_id, uni_teach_name, MAX(cal_update), cal_status
FROM UNIVERSITIES
LEFT JOIN CALENDAR
ON uni_id = cal_id
WHERE uni_master = 'SO88'
AND uni_stat = 'OK'
AND cal_name = 'REGISTRED'
GROUP BY uni_id, uni_teach_name, uni_stat
ORDER BY cal_update
But this query gives me 102 records, because cal_update appears twice for some rows.
For example, one entry has the date 22-OCT-2020 11:34:55 and another, for the same uni_id, has 22-OCT-2020 11:30:22.
I want to get just the latest entry for that date, not both.
In this case the query with the join needs to return the same records as the first SELECT query.
I think you can do what you want using row_number():
SELECT UNI_ID, UNI_TEACH_NAME, CAL_UPDATE, CAL_STATUS
FROM (SELECT U.UNI_ID, U.UNI_TEACH_NAME, C.CAL_UPDATE, C.CAL_STATUS,
ROW_NUMBER() OVER (PARTITION BY U.UNI_ID, TRUNC(C.CAL_UPDATE) ORDER BY C.CAL_UPDATE DESC) as seqnum
FROM UNIVERSITIES U LEFT JOIN
CALENDAR C
ON U.UNI_ID = C.CAL_ID AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88' AND
U.UNI_STAT= 'OK'
) UC
WHERE seqnum = 1;
I have to guess where the columns come from, because the question is not clear. Any filtering columns from CALENDAR should be in the ON clause if you are using a LEFT JOIN.
You can replace the last part of the query, aliasing MAX(cal_update) as cal_update, with:
ORDER BY cal_update DESC
FETCH FIRST 1 ROW WITH TIES
This works for Oracle 12c and later: ordering descending by that column picks the record with the latest value.
The WITH TIES option returns all records sharing the same latest datetime value; replace it with ONLY to return a single row even when ties occur.
The non-aggregated column cal_status should also be removed from the select list.
As an alternative to a subquery and rank, you could use KEEP...LAST :
SELECT U.UNI_ID,
U.UNI_TEACH_NAME,
MAX(C.CAL_UPDATE) AS CAL_UPDATE,
MAX(C.CAL_STATUS) KEEP (DENSE_RANK LAST ORDER BY C.CAL_UPDATE) AS CAL_STATUS
FROM UNIVERSITIES U
LEFT JOIN CALENDAR C
ON U.UNI_ID = C.CAL_ID
AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88'
AND U.UNI_STAT= 'OK'
GROUP BY U.UNI_ID,
U.UNI_TEACH_NAME,
TRUNC(C.CAL_UPDATE)
I've moved the CAL_NAME check into the outer join's ON clause; if it's in the WHERE clause then it will effectively turn it back into an inner join. So this will get one row per university per day that the calendar was updated: "I want just to get the max date for that date". And it will show nulls for the calendar fields if there is no matching calendar, since it's an outer join.
If you actually only want the latest update on any day then just remove the TRUNC(C.CAL_UPDATE) from the grouping:
SELECT U.UNI_ID,
U.UNI_TEACH_NAME,
MAX(C.CAL_UPDATE) AS CAL_UPDATE,
MAX(C.CAL_STATUS) KEEP (DENSE_RANK LAST ORDER BY C.CAL_UPDATE) AS CAL_STATUS
FROM UNIVERSITIES U
LEFT JOIN CALENDAR C
ON U.UNI_ID = C.CAL_ID
AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88'
AND U.UNI_STAT= 'OK'
GROUP BY U.UNI_ID,
U.UNI_TEACH_NAME
db<>fiddle with some made-up data; and also (just for fun) showing Gordon's query with the calendar name clause in both places to show the difference, and to show this gets the same result for that dummy data. (And an 18c version which shows Barbaros' too; getting back a single row.)
I want to calculate DAU and exclude users that we don't consider "real" (employees, beta testers etc).
It worked fine previously when I wrote the filtering in the query:
SELECT
count(distinct user_id) AS daily,
e.event_timestamp::DATE AS date
FROM
"public"."events" AS e
WHERE
user_id IN (SELECT
distinct id
from
"user"."user"
WHERE
username IS NOT NULL AND position IS NOT NULL )
GROUP BY date
When I try changing it to the query below, which should give more or less the same count (instead of defining the 4000 "real users", I define the 1000 "non-users" I want to exclude), I get much higher counts. It's as if the DISTINCT isn't working.
I added the NOT NULL check to the subquery, but it doesn't change the result. Does NOT IN with a subquery work differently from IN?
SELECT
count(distinct e.user_id) AS daily,
e.event_timestamp::DATE AS date
FROM
"public"."events" AS e
WHERE
e.user_id NOT IN (SELECT distinct id FROM "public"."non_users" WHERE id IS NOT NULL)
GROUP BY
date
ORDER BY
date
Yes. If any of the values in the subquery are NULL, then NOT IN returns no rows. For this reason, I strongly recommend that you always use NOT EXISTS -- it behaves as expected.
You seem to know this, because you are using a NULL comparison in the WHERE. So, the difference is probably due to the other condition. So, include it as well:
SELECT count(distinct e.user_id) AS daily,
e.event_timestamp::DATE AS date
FROM "public"."events" e
WHERE NOT EXISTS (SELECT 1
FROM "public"."non_users" nu
WHERE e.user_id = nu.id AND
nu.position IS NOT NULL
)
GROUP BY date
ORDER BY date;
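The NULL behaviour of NOT IN is easy to reproduce. The following sketch uses Python's sqlite3 module instead of PostgreSQL (both follow the standard three-valued logic here); the tables are cut down to a single column and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER);
    CREATE TABLE non_users (id INTEGER);
    INSERT INTO events VALUES (1), (2), (3);
    INSERT INTO non_users VALUES (2), (NULL);
""")

# "1 NOT IN (2, NULL)" evaluates to NULL, not TRUE, so no row passes the filter.
not_in = conn.execute("""
    SELECT user_id FROM events
    WHERE user_id NOT IN (SELECT id FROM non_users)
    ORDER BY user_id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULL ids are harmless.
not_exists = conn.execute("""
    SELECT user_id FROM events e
    WHERE NOT EXISTS (SELECT 1 FROM non_users nu WHERE nu.id = e.user_id)
    ORDER BY user_id
""").fetchall()

print(not_in)
print(not_exists)
```

The NOT IN query returns nothing at all, while NOT EXISTS returns users 1 and 3.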
I am trying to return a set of results and decided to try my luck with a CTE. The first table, "Vendor", has a list of references; the second table, "TVView", has ticket numbers that were created using a reference from the "Vendor" table. There may be one or more entries with the same ticket number, depending on the state of that ticket, and I want to return the last entry for each ticket found in "TVView" that matches a selected reference from "Vendor". The "TVView" table also has a seed field that is incremented.
I got this to return the right number of entries (each duplicated ticket shown only once), but I cannot figure out how to add an additional layer that selects the last entry for that ticket and returns some other fields. Summing is easy, but I really need the top 1 of each ticket entry in "TVView", whether or not it is a duplicate, while returning all references from "Vendor". It would be nice if SQL supported "Last".
How do you do that?
Here is what I have done so far:
with cteTickets as (
Select s.Mth2, c.Ticket, c.PyRt from Vendor s
Inner join
TVView c on c.Mth1 = s.Mth1 and c.Vendor = s.Vendor
)
Select Mth2, Ticket, PayRt from cteTickets
Where cteTickets.Vendor >='20'
and cteTickets.Vendor <='40'
and cteTickets.Mth2 ='8/15/2014'
Group by cteTickets.Ticket
order by cteTickets.Ticket
Several RDBMSs that I am aware of support both common table expressions (CTEs) and analytic functions, including the very useful ROW_NUMBER(), so the following should work in Oracle, T-SQL (MSSQL/Sybase), DB2 and PostgreSQL.
In the suggestions the intention is to return just the most recent entry for each ticket found in TVView. This is done by using ROW_NUMBER() with PARTITION BY Ticket, which instructs row_number to restart numbering for each change of the Ticket value. The subsequent ORDER BY Mth1 DESC determines which record within each partition is assigned 1; here it will be the most recent date.
The output of row_number() needs to be referenced by a column alias, so using it in a CTE or derived table permits selection of just the most recent records by RN = 1 which you will see used in both options below:
-- using a CTE
WITH
TVLatest
AS (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
)
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN TVLatest l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v.Vendor <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
l.Ticket
;
-- using a derived table instead
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
) l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v.Vendor <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
l.Ticket
;
Please note: "SELECT *" is used here as a convenience because the full column details are unknown. The queries above may not run without specifying the field list explicitly (e.g. as written they would fail in Oracle).
I have the following query:
SELECT created_at::DATE, count (*)
FROM messages
WHERE city = 'los angeles'
GROUP BY created_at::DATE
Which works great. The challenge is that if there are no messages for a given date, then it returns no record for that date. How do you make the above query return the date and 0 if there are no messages on that date, for all days between a given date and today?
Working in PostgreSQL 8.3.
Thanks!
It sounds like you need a table of all the dates you are interested in, as it may contain dates not in your messages table. If you have, or build, this table, then left join it with the messages table and count a column from that table; it will return 0 where nothing matches the join.
select d.created_at, count(m.messageId)
from possibleDates d
left join messages m
on d.created_at = m.created_at
group by d.created_at
The typical way is to have a separate calendar table with all of the dates in it, left joined to your table on the date column, and then a COALESCE(x, 0) expression (PostgreSQL's equivalent of ifnull) or a CASE expression that returns 0 when the left join on the date returns NULL and 1 when it does not. Then you can do your normal GROUP BY and use SUM(x) instead of COUNT().
Very often, when you want to fill in zeroes for missing entries in a series, the answer in PostgreSQL involves the generate_series function. (Search Stack Overflow for lots of similar questions and answers.) In your case, use something like this:
SELECT ts::date AS date, coalesce(count, 0) AS count
FROM
(SELECT created_at::date, count(*)
FROM messages
WHERE city = 'los angeles'
GROUP BY created_at::date) AS m
RIGHT JOIN
(SELECT *
FROM generate_series(timestamp '2011-07-01',
timestamp 'today',
interval '1 day')) AS series(ts)
ON m.created_at = series.ts
ORDER BY 1;