SQL In Oracle - How to search through occurrences in an interval? - sql

I've gotten myself stuck working in Oracle with SQL for the first time. In my library example, I need to make a query on my tables for a library member who has borrowed more than 5 books in some week during the past year. Here's my attempt:
SELECT
PN.F_NAME,
PN.L_NAME,
M.ENROLL_DATE,
COUNT(*) AS BORROWED_COUNT,
(SELECT
(BD.DATE_BORROWED + INTERVAL '7' DAY)
FROM DUAL, BORROW_DETAILS BD
GROUP BY BD.DATE_BORROWED + INTERVAL '7' DAY
HAVING COUNT(*) > 5
) AS VALID_INTERVALS
FROM PERSON_NAME PN, BORROW_DETAILS BD, HAS H, MEMBER M
WHERE
PN.PID = M.PID AND
M.PID = BD.PID AND
BD.BORROWID = H.BORROWID
GROUP BY PN.F_NAME, PN.L_NAME, M.ENROLL_DATE, DATEDIFF(DAY, BD.DATE_RETURNED, VALID_INTERVALS)
ORDER BY BORROWED_COUNT DESC;
As I'm sure you can tell, Im really struggling with the Dates in oracle. For some reason DATEDIFF wont work at all for me, and I cant find any way to evaluate the VALID_INTERVAL which should be another date...
Also apologies for the all caps.

DATEDIFF is not a valid function in Oracle; if you want the difference then subtract one date from another and you'll get a number representing the number of days (or fraction thereof) between the values.
If you want to count it for a week starting from Midnight Monday then you can TRUNCate the date to the start of the ISO week (which will be Midnight of the Monday of that week) and then group and count:
SELECT MAX( PN.F_NAME ) AS F_NAME,
MAX( PN.L_NAME ) AS L_NAME,
MAX( M.ENROLL_DATE ) AS ENROLL_DATE,
TRUNC( BD.DATE_BORROWED, 'IW' ) AS monday_of_iso_week,
COUNT(*) AS BORROWED_COUNT
FROM PERSON_NAME PN
INNER JOIN MEMBER M
ON ( PN.PID = M.PID )
INNER JOIN BORROW_DETAILS BD
ON ( M.PID = BD.PID )
GROUP BY
PN.PID,
TRUNC( BD.DATE_BORROWED, 'IW' )
HAVING COUNT(*) > 5
ORDER BY BORROWED_COUNT DESC;
db<>fiddle
You haven't given your table structures or any sample data so its difficult to test; but you don't appear to need to include the HAS table and I'm assuming there is a 1:1 relationship between person and member.
You also don't want to GROUP BY names as there could be two people with the same first and last name (who happened to enrol on the same date) and should use something that uniquely identifies the person (which I assume is PID).

Related

Data value on a given date

This time I have a table on a PostgreSQL database that contains the employee name, the date that he started working and the date that he leaves the company, in the cases of the employee still remains in the company, this field has null value.
Knowing this, I would like to know how many people was working on a predetermined date, ex:
I would like to know how many people works on the company in January 2021.
I don't know where to start, in some attempts I got the number of hires and layoffs per month, but I need to show this accumulated value per month, in another column.
I hope I made myself understood, I'll leave the last SQL I got here.
select reference, sum(hires) from
(
select
date_trunc('month', date_hires) as reference,
count(*) as hires
from
ponto_mais_relatorio_colaboradores
group by
date_hires
union all
select
date_trunc('month', date_layoff) as reference,
count(*)*-1 as layoffs
from
ponto_mais_relatorio_colaboradores
group by
date_layoff
) as reference
join calendar_aux on calendar_aux.ano_mes = reference
group by reference
order by reference
Break the requirement down. The question: how many are employed on any given date? That would include all hired before that date and do not have a layoff date plus all hired before with a layoff date later then the date your interested period. I.e you are interested in Jan so you still want to count an employee with a layoff date in Feb. With that in place convert into SQL. The preceding is available from select comparing dates. other issue is that Jan is not a date, it is a range of dates, so you need each date. You can use generate series to create each day in Jan. Then Join the generated dates with and selection from your table. Resulting query:
with jan_dates( jdate ) as
( select generate_series( date '2021-01-01'
, date '2021-01-31'
, interval '1' day
)::date
)
select jdate "Date", count(*) "Employees"
from jan_dates j
join employees e
on ( e.date_hires <= j.jdate
and ( e.date_layoff is null
or e.date_layoff > j.jdate
)
)
group by j.jdate
order by j.jdate;
Note: Not tested.

PL-SQL query to calculate customers per period from start and stop dates

I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like this shown below:
Many thanks for any pointers...
You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_id)
from t
where t.cover_start_date <= d.dte and
(t.cover_end_date > d.date + interval '1' month or t.cover_end_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as accidents,
(select count(distinct customer_id)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.

ORACLE How to get data from a table depending on year and the amount of duplicates

so the question is "Produce a list of those employees who have made bookings at the Sports Club more than 5 times in the last calendar year (this should be calculated and not hard coded). Order these by the number of bookings made."
what i am struggling with is being able to get todays year and then subtract 1 year aswell as showing ONLY one name of people appearing 5 times.
this is what i have so far, but it doesnt work.
SELECT DISTINCT
FIRSTNAME,SURNAME,DATEBOOKED,MEMBERSHIPTYPEID,BOOKINGID,MEMBER.MEMBERID
FROM MEMBERSHIP,MEMBER,BOOKING
WHERE MEMBER.MEMBERID = BOOKING.MEMBERID
AND MEMBERSHIP.MEMBERSHIPTYPEID = 3 AND BOOKING.DATEBOOKED = (SYSDATE,
'DD/MON/YY -1') AND FIRSTNAME IN (
SELECT FIRSTNAME
FROM MEMBER
GROUP BY FIRSTNAME,SURNAME
HAVING COUNT(FIRSTNAME) >= 5
)
ORDER BY MEMBER.MEMBERID;
EDR
tabele structures
What you need is this day last year. There are various different ways of calculating that. For instance,
add_months(sysdate, -12)
or
sysdate - interval '1' year
The subquery looks a bit whacky too. What you're after is the number of BOOKING records in the last year. So the subquery should drive off the BOOKING table, and the date filter should be there.
Finally, there is a missing join condition between MEMBER and MEMBERSHIP. That's probably why you think you need that distinct; fix the join and you'll get the result set you want and not a product. One advantage of the ANSI 92 explicit join syntax is that it stops us from missing joins.
Your query needs to be sorted by the number of bookings, so you need to count those and sort by the total. This means you don't actually need a subquery at all.
So your query should look something like:
SELECT MEMBER.MEMBERID
, MEMBER.FIRSTNAME
, MEMBER.SURNAME
, count(BOOKING.BOOKID) as no_of_bookings
FROM MEMBER
inner join MEMBERSHIP
on MEMBER.MEMBERID = MEMBERSHIP.MEMBERID
inner join BOOKING
on MEMBER.MEMBERID = BOOKING.MEMBERID
WHERE MEMBERSHIP.MEMBERSHIPTYPEID = 3
and BOOKING.DATEBOOKED >= add_months(trunc(sysdate), -12)
GROUP BY MEMBER.MEMBERID
, MEMBER.FIRSTNAME
, MEMBER.SURNAME
HAVING COUNT(*) >= 5
ORDER BY no_of_bookings desc;
Here is a SQL Fiddle demo for my query.
SELECT
FIRSTNAME,SURNAME,DATEBOOKED,MEMBERSHIPTYPEID,BOOKINGID,MEMBER.MEMBERID
FROM MEMBERSHIP,MEMBER,BOOKING
WHERE MEMBER.MEMBERID = BOOKING.MEMBERID
AND MEMBERSHIP.MEMBERSHIPTYPEID = 3 AND BOOKING.DATEBOOKED between (SYSDATE) and
(sysdate - interval '1' year)
AND FIRSTNAME IN (
SELECT distinct FIRSTNAME
FROM MEMBER
GROUP BY FIRSTNAME,SURNAME
HAVING COUNT(FIRSTNAME) >= 5
)
ORDER BY MEMBER.MEMBERID;
I think this answers the question:
SELECT m.FIRSTNAME, m.SURNAME, m.MEMBERID
FROM MEMBER m JOIN
BOOKING b
ON m.MEMBERID = b.MEMBERID JOIN
MEMBERSHIP ms
ON ms.MEMBERID = m.MEMBERID
WHERE ms.MEMBERSHIPTYPEID = 3 AND
B.DATEBOOKED >= SYSDATE - INTERVAL '1 YEAR'
GROUP BY m.FIRSTNAME, m.SURNAME, m.MEMBERID
HAVING COUNT(*) >= 5
ORDER BY COUNT(* DESC;
Table aliases make the query much easier to write and to read.

RedShift: Alternative to 'where in' to compare annual login activity

Here are the two cases:
Members Lost: Get the distinct count of user ids from 365 days ago who haven't had any activity since then
Members Added: Get the distinct count of user ids from today who don't exist in the previous 365 days.
Here are the SQL statements I've been writing. Logically I feel like this should work (and it does for sample data), but the dataset is 5Million+ rows and takes forever! Is there any way to do this more efficiently? (base_date is a calendar that I'm joining on to build out a 2 year trend. I figured this was faster than joining the 5million table on itself...)
-- Members Lost
SELECT
effective_date,
COUNT(DISTINCT dwuserid) as members_lost
FROM base_date
LEFT JOIN site_visit
-- Get Login Activity for 365th day
ON DATEDIFF(day, srclogindate, effective_date) = 365
WHERE dwuserid NOT IN (
-- Get Distinct Login activity for Current Day (PY) + 1 to Current Day (CY) (i.e. 2013-01-02 to 2014-01-01)
SELECT DISTINCT dwuserid
FROM site_visit b
WHERE DATEDIFF(day, b.srclogindate, effective_date) BETWEEN 0 AND 364
)
GROUP BY effective_date
ORDER BY effective_date;
-- Members Added
SELECT
effective_date,
COUNT(DISTINCT dwuserid) as members_added
FROM base_date
LEFT JOIN site_visit ON srclogindate = effective_date
WHERE dwuserid NOT IN (
SELECT DISTINCT dwuserid
FROM site_visit b
WHERE DATEDIFF(day, b.srclogindate, effective_date) BETWEEN 1 AND 365
)
GROUP BY effective_date
ORDER BY effective_date;
Thanks in advance for any help.
UPDATE
Thanks to #JohnR for pointing me in the right direction. I had to tweak your response a bit because I need to know on any login day how many were "Member Added" or "Member Lost" so it had to be a 365 rolling window looking back or looking forward. Finding the IDs that didn't have a match in the LEFT JOIN was much faster.
-- Trim data down to one user login per day
CREATE TABLE base_login AS
SELECT DISTINCT "dwuserid", "srclogindate"
FROM site_visit
-- Members Lost
SELECT
current."srclogindate",
COUNT(DISTINCT current."dwuserid") as "members_lost"
FROM base_login current
LEFT JOIN base_login future
ON current."dwuserid" = future."dwuserid"
AND current."srclogindate" < future."srclogindate"
AND DATEADD(day, 365, current."srclogindate") >= future."srclogindate"
WHERE future."dwuserid" IS NULL
GROUP BY current."srclogindate"
-- Members Added
SELECT
current."srclogindate",
COUNT(DISTINCT current."dwuserid") as "members_added"
FROM base_login current
LEFT JOIN base_login past
ON current."dwuserid" = past."dwuserid"
AND current."srclogindate" > past."srclogindate"
AND DATEADD(day, 365, past."srclogindate") >= current."srclogindate"
WHERE past."dwuserid" IS NULL
GROUP BY current."srclogindate"
NOT IN should generally be avoided because it has to scan all data.
Instead of joining to the site_visit table (which is presumably huge), try joining to a sub-query that selects UserID and the most recent login date -- that way, there is only one row per user instead of one row per visit.
For example:
SELECT dwuserid, min (srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
You could then simplify the queries to something like:
-- Members Lost: Last login was between 12 and 13 months ago
SELECT
COUNT(*)
FROM
(
SELECT dwuserid, min(srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
)
WHERE
last_login BETWEEN current_date - interval '13 months' and current_date - interval '12 months'
-- Members Added: First visit in last 12 months
SELECT
COUNT(*)
FROM
(
SELECT dwuserid, min(srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
)
WHERE
first_login > current_date - interval '12 months'

in sql, calculating date parts versus date lookup table in group queries

many queries are by week, month or quarter when the base table date is either date or timestamp.
in general, in group by queries, does it matter whether using
- functions on the date
- a day table that has extraction pre-calculated
note: similar question as DATE lookup table (1990/01/01:2041/12/31)
for example, in postgresql
create table sale(
tran_id serial primary key,
tran_dt date not null default current_date,
sale_amt decimal(8,2) not null,
...
);
create table days(
day date primary key,
week date not null,
month date not null,
quarter date non null
);
-- week query 1: group using funcs
select
date_trunc('week',tran_dt)::date - 1 as week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
where date_trunc('week',tran_dt)::date - 1 between '2012-1-1' and '2011-12-31'
group by date_trunc('week',tran_dt)::date - 1
order by 1;
-- query 2: group using days
select
days.week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
join days on( days.day = sale.tran_dt )
where week between '2011-1-1'::date and '2011-12-31'::date
group by week
order by week;
to me, whereas the date_trunc() function seems more organic, the the days table is easier to use.
is there anything here more than a matter of taste?
-- query 3: group using instant "immediate" calendar table
WITH calender AS (
SELECT ser::date AS dd
, date_trunc('week', ser)::date AS wk
-- , date_trunc('month', ser)::date AS mon
-- , date_trunc('quarter', ser)::date AS qq
FROM generate_series( '2012-1-1' , '2012-12-31', '1 day'::interval) ser
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calender cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
Note: I fixed an apparent typo in the BETWEEN range.
UPDATE: I used Erwin's recursive CTE to squeeze out the duplicated date_trunc(). Nested CTE galore:
WITH calendar AS (
WITH RECURSIVE montag AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 AS dd
FROM montag
WHERE dd < '2012-1-1'::date
)
SELECT mo.dd, date_trunc('week', mo.dd + 1)::date AS wk
FROM montag mo
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calendar cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
Yes, it is more than a matter of taste. The performance of the query depends on the method.
As a first approximation, the functions should be faster. They don't require joins, doing the read in a single table scan.
However, a good optimizer could make effective use of a lookup table. It would know the distribution of the target values. And, an in memory join could be quite fast.
As a database design, I think having a calendar table is very useful. Some information such as holidays just isn't going to work as a function. However, for most ad hoc queries the date functions are fine.
1. Your expression:
... between '2012-1-1' and '2011-12-31'
doesn't work. Basic BETWEEN requires the left argument to be less than or equal to the right argument. Would have to be:
... BETWEEN SYMMETRIC '2012-1-1' and '2011-12-31'
Or it's just a typo and you mean something like:
... BETWEEN '2011-1-1' and '2011-12-31'
It's unclear to me, what your queries are supposed to retrieve. I'll assume you want all weeks (Monday to Sunday) that start in 2011 for the rest of this answer. This expression generates exactly that in less than a microsecond on modern hardware (works for any year):
SELECT generate_series(
date_trunc('week','2010-12-31'::date) + interval '7d'
,date_trunc('week','2011-12-31'::date) + interval '6d'
, '1d')::date
*Note that the ISO 8601 definition of the "first week of a year is slightly different.
2. Your second query does not work at all. No GROUP BY?
3. The question you link to did not deal with PostgreSQL, which has outstanding date / timestamp support. And it has generate_series() which can obviate the need for a separate "days" table in most cases - as demonstrated above. Your query would look like this:
In the meantime #wildplasser provided an example query that was supposed to go here.
By popular* demand, a recursive CTE version - which is actually not that far from being a serious alternative!
* and by "popular" I mean #wildplasser's very serious request.
WITH RECURSIVE days AS (
SELECT '2011-01-01'::date AS dd
,date_trunc('week', '2011-01-01'::date )::date AS wk
UNION ALL
SELECT dd + 1
,date_trunc('week', dd + 1)::date AS wk
FROM days
WHERE dd < '2011-12-31'::date
)
SELECT d.wk
,count(*) AS sale_ct
,sum(s.sale_amt) AS sale_amt
FROM days d
JOIN sale s ON s.tran_dt = d.dd
-- WHERE d.wk between '2011-01-01' and '2011-12-31'
GROUP BY 1
ORDER BY 1;
Could also be written as (compare to #wildplasser's version):
WITH RECURSIVE d AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 FROM d WHERE dd < '2011-12-31'::date
), days AS (
SELECT dd, date_trunc('week', dd + 1)::date AS wk
FROM d
)
SELECT ...
4. If performance is of the essence, just make sure, that you do not apply functions or calculations to the values of your table. This prohibits the use of indexes and is generally very slow, because every row has to be processed. That's why your first query is going to suck with big table. When ever possible, apply calculations to the values you filter with, instead.
Indexes on expressions are one way around this. If you had an index like
CREATE INDEX sale_tran_dt_week_idx ON sale (date_trunc('week', tran_dt)::date);
.. your first query could be very fast again - at some cost for write operations for index maintenance.