Compare values of the same column in PL/SQL between 2 days - sql

I have a table called ACCOUNTS which holds data for every day. I want to check, per employee, whether the email address changed between yesterday and today.
select EMAIL,EMPLOYEE from ACCOUNTS where day='30-DEC-20'; -- today's data
select EMAIL,EMPLOYEE from ACCOUNTS where day='29-DEC-20'; -- yesterday's data
I have to deal with bulk data sets here and have no clue whatsoever.

Assuming you have one row per day per employee, one method is aggregation:
select employee,
       max(case when day = date '2020-12-29' then email end) as email_yesterday,
       max(case when day = date '2020-12-30' then email end) as email_today
from accounts
where day in (date '2020-12-29', date '2020-12-30')
group by employee
having min(email) <> max(email);
If you wanted to generalize this to any day:
select employee,
       max(case when day = trunc(sysdate) - interval '1' day then email end) as email_yesterday,
       max(case when day = trunc(sysdate) then email end) as email_today
from accounts
where day >= trunc(sysdate) - interval '1' day
group by employee
having min(email) <> max(email);
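If the two days are not always yesterday and today, the same pattern works with bind variables; a rough sketch (the bind names :d_prev and :d_curr are placeholders, not from the original question):
select employee,
       -- :d_prev / :d_curr are the two days you want to compare
       max(case when day = :d_prev then email end) as email_previous,
       max(case when day = :d_curr then email end) as email_current
from accounts
where day in (:d_prev, :d_curr)
group by employee
having min(email) <> max(email);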

This option compares e-mail addresses between "this" and "previous" days and returns a row if they differ.
Sample data is in a CTE (lines #1 - 9); you don't type that, as you already have it in your table.
The query you might be interested in begins at line #11.
SQL> with accounts (email, employee, day) as
2 -- sample data; you already have that in your table
3 (select 'scott#x.com', 'Scott', date '2020-12-01' from dual union all
4 select 'scott#x.com', 'Scott', date '2020-12-02' from dual union all
5 select 'scott#y.com', 'Scott', date '2020-12-04' from dual union all
6 --
7 select 'adams#x.com', 'Adams', date '2020-12-11' from dual union all
8 select 'adams#y.com', 'Adams', date '2020-12-12' from dual
9 ),
10 -- query begins here
11 data as
12 -- fetch "today's" and "previous day's" e-mail addresses
13 (select email todays_email,
14 lag(email) over (partition by employee order by day desc) previous_email,
15 employee,
16 day
17 from accounts
18 )
19 --
20 -- display data where today's and previous day's e-mail addresses differ
21 select employee, day, todays_email, previous_email
22 from data
23 where todays_email <> previous_email
24 order by employee, day;
EMPLO DAY TODAYS_EMAI PREVIOUS_EM
----- ---------- ----------- -----------
Adams 11.12.2020 adams#x.com adams#y.com
Scott 02.12.2020 scott#x.com scott#y.com
SQL>
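In case the descending sort reads oddly, here is a sketch of the same idea with an ascending sort, so the row reported is the day on which the new address first appears (same sample ACCOUNTS table assumed):
with data as
  (select employee, day, email,
          -- previous stored day's address, in calendar order this time
          lag(email) over (partition by employee order by day) as previous_email
   from accounts
  )
select employee, day, email as new_email, previous_email
from data
where email <> previous_email
order by employee, day;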

Related

How to filter the last 7 days based on the previous query? -BigQuery

Hi, I just want to ask how to resolve this problem.
An example is in the query indicated below.
In the next query I will prepare, I want to filter the last 7 days of the delivery date. I cannot simply use current_date, because the maximum date in the data is well behind it.
Assuming the current date is 7/12/2022 but the query shows a maximum date of 7/07/2022, how can I filter the dates from 7/1/2022 to 7/07/2022?
, Datas1 as
(select distinct (delivery_due_date) as delivery_date
, Specialist
, Id_number
, Staff_Total as Total_Items
from joining
where Delivery_Due_Date is not null
)
I actually tried using the max function in the WHERE clause, but I get an error. Please help me.
I created examples of such data in the first block.
I performed the select on that data in the second block.
I extracted the maximum delivery date in the third block.
And I restricted the last block to the 7 days of data counted back from that maximum.
WITH joining AS(
SELECT '2022-07-01' AS delivery_due_date, 'ABC' as Specialist,222 as Id_number, 21 as Staff_Total union all
SELECT '2022-07-07' AS delivery_due_date, 'ABC2' as Specialist,223 as Id_number, 01 as Staff_Total union all
SELECT '2022-07-15' AS delivery_due_date, 'ABC4' as Specialist,212 as Id_number, 25 as Staff_Total union all
SELECT '2022-07-20' AS delivery_due_date, 'AB5C' as Specialist,224 as Id_number, 15 as Staff_Total union all
SELECT '2022-07-05' AS delivery_due_date, 'ABC7' as Specialist,226 as Id_number, 87 as Staff_Total ),
Datas1 as (
  select distinct (delivery_due_date) as delivery_date, Specialist,
         Id_number, Staff_Total as Total_Items
  from joining
  where Delivery_Due_Date is not null ),
Datas2 as (
  select max(delivery_date) as ddd from Datas1)
select Datas1.*
from Datas1, Datas2
where date(delivery_date) between date_sub(date(Datas2.ddd), interval 7 day) and date(Datas2.ddd)
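One caveat: BETWEEN with DATE_SUB(..., INTERVAL 7 DAY) spans eight calendar days (the maximum date plus the seven before it). If you want exactly seven days ending at the maximum, a sketch of the final SELECT with INTERVAL 6 DAY instead:
select Datas1.*
from Datas1, Datas2
-- 6 days back plus the max date itself = 7 days in total
where date(delivery_date) between date_sub(date(Datas2.ddd), interval 6 day) and date(Datas2.ddd)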

Google Big Query SQL - Get most recent unique value by date

EDIT: Following the comments, I have rephrased my question.
I have a BigQuery table that I want to use to get some KPIs for my application.
In this table, I save each create or update as a new row in order to keep a full history.
So the same record appears several times, each time with a different state.
Example of the table:
uuid | status     | date
-----|------------|-----------
3    | 'inactive' | 2018-05-12
1    | 'active'   | 2018-05-10
1    | 'inactive' | 2018-05-08
2    | 'active'   | 2018-05-08
3    | 'active'   | 2018-05-04
2    | 'inactive' | 2018-04-22
3    | 'inactive' | 2018-04-18
We can see that there are multiple rows for each uuid, each with a different status.
What I would like to get:
I would like the number of currently 'active' entries (so there must be no later 'inactive' entry with the same uuid). And to complicate everything, I need this total per day.
So for each day: the number of 'active' entries, including those carried over from previous days.
So with this example I should have this result:
date | actives
____________|_________
2018-05-02 | 0
2018-05-03 | 0
2018-05-04 | 1
2018-05-05 | 1
2018-05-06 | 1
2018-05-07 | 1
2018-05-08 | 2
2018-05-09 | 2
2018-05-10 | 3
2018-05-11 | 3
2018-05-12 | 2
I've actually managed to get the right number of actives for a single day. But my problem is getting the results for every day.
What I've tried:
I'm stuck between two solutions that each return a different error.
First solution:
WITH
dates AS(
SELECT GENERATE_DATE_ARRAY(
DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), CURRENT_DATE(), INTERVAL 1 DAY)
arr_dates )
SELECT
i_date date,
(
SELECT COUNT(uuid)
FROM (
SELECT
uuid, status, date,
RANK() OVER(PARTITION BY uuid ORDER BY date DESC) rank
FROM users
WHERE
PARSE_DATE("%Y-%m-%d", FORMAT_DATETIME("%Y-%m-%d",date)) <= i_date
)
WHERE
status = 'active'
and rank = 1
## rank is the condition which causes the error
) users
FROM
dates, UNNEST(arr_dates) i_date
ORDER BY i_date;
The SELECT with the RANK() OVER correctly returns the users with a rank column that lets me know which entry is the latest for each uuid.
But when I try this, I get a Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN. error, because of the rank = 1 condition.
Second solution:
WITH
dates AS(
SELECT GENERATE_DATE_ARRAY(
DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), CURRENT_DATE(), INTERVAL 1 DAY)
arr_dates )
SELECT
i_date date,
(
SELECT
COUNT(t1.uuid)
FROM
users t1
WHERE
t1.date = (
SELECT MAX(t2.date)
FROM users t2
WHERE
t2.uuid = t1.uuid
## Here that's the i_date condition which causes problem
AND PARSE_DATE("%Y-%m-%d", FORMAT_DATETIME("%Y-%m-%d", t2.date)) <= i_date
)
AND status='active' ) users
FROM
dates,
UNNEST(arr_dates) i_date
ORDER BY i_date;
Here, the inner SELECT also works and correctly returns the number of active users for a given day.
But the problem comes when I try to use i_date to retrieve data across the multiple days.
And here I get a LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join. error...
Which solution is more likely to succeed? What should I change?
And, if my way of storing the data isn't good, how should I proceed in order to keep a precise history?
Below is for BigQuery Standard SQL
#standardSQL
SELECT date, COUNT(DISTINCT uuid) total_active
FROM `project.dataset.table`
WHERE status = 'active'
GROUP BY date
-- ORDER BY date
Update to address your "rephrased" question :o)
The example below uses the dummy data from your question.
#standardSQL
WITH `project.dataset.users` AS (
SELECT 3 uuid, 'inactive' status, DATE '2018-05-12' date UNION ALL
SELECT 1, 'active', '2018-05-10' UNION ALL
SELECT 1, 'inactive', '2018-05-08' UNION ALL
SELECT 2, 'active', '2018-05-08' UNION ALL
SELECT 3, 'active', '2018-05-04' UNION ALL
SELECT 2, 'inactive', '2018-04-22' UNION ALL
SELECT 3, 'inactive', '2018-04-18'
), dates AS (
SELECT day FROM UNNEST((
SELECT GENERATE_DATE_ARRAY(MIN(date), MAX(date))
FROM `project.dataset.users`
)) day
), active_users AS (
SELECT uuid, status, date first, DATE_SUB(next_status.date, INTERVAL 1 DAY) last FROM (
SELECT uuid, date, status, LEAD(STRUCT(status, date)) OVER(PARTITION BY uuid ORDER BY date ) next_status
FROM `project.dataset.users` u
)
WHERE status = 'active'
)
SELECT day, COUNT(DISTINCT uuid) actives
FROM dates d JOIN active_users u
ON day BETWEEN first AND IFNULL(last, day)
GROUP BY day
-- ORDER BY day
with result
Row day actives
1 2018-05-04 1
2 2018-05-05 1
3 2018-05-06 1
4 2018-05-07 1
5 2018-05-08 2
6 2018-05-09 2
7 2018-05-10 3
8 2018-05-11 3
9 2018-05-12 2
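If the result should also cover days before the first stored row and show 0 for them (the question's expected output starts at 2018-05-02), two small changes to the query above get there: explicit bounds in the dates CTE and a LEFT JOIN. A sketch with the same dummy data:
#standardSQL
WITH `project.dataset.users` AS (
  SELECT 3 uuid, 'inactive' status, DATE '2018-05-12' date UNION ALL
  SELECT 1, 'active', '2018-05-10' UNION ALL
  SELECT 1, 'inactive', '2018-05-08' UNION ALL
  SELECT 2, 'active', '2018-05-08' UNION ALL
  SELECT 3, 'active', '2018-05-04' UNION ALL
  SELECT 2, 'inactive', '2018-04-22' UNION ALL
  SELECT 3, 'inactive', '2018-04-18'
), dates AS (
  -- explicit bounds instead of MIN(date)/MAX(date)
  SELECT day FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-05-02', DATE '2018-05-12')) day
), active_users AS (
  SELECT uuid, status, date first, DATE_SUB(next_status.date, INTERVAL 1 DAY) last FROM (
    SELECT uuid, date, status, LEAD(STRUCT(status, date)) OVER(PARTITION BY uuid ORDER BY date) next_status
    FROM `project.dataset.users` u
  )
  WHERE status = 'active'
)
SELECT day, COUNT(DISTINCT uuid) actives
FROM dates d LEFT JOIN active_users u  -- LEFT JOIN keeps empty days; COUNT(DISTINCT NULL) is 0
ON day BETWEEN first AND IFNULL(last, day)
GROUP BY day
-- ORDER BY day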
I think this -- or something similar -- will do what you want:
SELECT day,
coalesce(running_actives, 0) - coalesce(running_inactives, 0)
FROM UNNEST(GENERATE_DATE_ARRAY(DATE('2015-05-11'), DATE('2018-06-29'), INTERVAL 1 DAY)
) AS day left join
(select date, sum(countif(status = 'active')) over (order by date) as running_actives,
sum(countif(status = 'inactive')) over (order by date) as running_inactives
from t
group by date
) a
on a.date = day
order by day;
The exact solution depends on whether the "inactive" is inclusive of the day (as above) or takes effect the next day. Either is handled the same way, by using cumulative sums of actives and inactives and then taking the difference.
In order to get data on all days, this generates the days using arrays and unnest(). If you have data on all days, that step may be unnecessary.
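For reference, that day-generation step on its own looks roughly like this (the date bounds here are arbitrary placeholders):
#standardSQL
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-05-01', DATE '2018-05-12', INTERVAL 1 DAY)) AS day
ORDER BY day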

POSTGRES - Average for previous 4 weekdays

Hi, I am trying to calculate the average of the previous 4 Tuesdays. I have daily sales data, and I am trying to calculate what the average for the same weekday was over the previous 4 weeks.
Attached is a snapshot of what my dataset looks like.
Now, for March 6, I would like to know the average for the previous 4 weeks (namely Feb 6, Feb 13, Feb 20 and Feb 27). This value needs to be assigned to the Monthly Average column.
I am using a Postgres DB.
Thanks
You can use window functions:
select t.*,
avg(dailycount) over (partition by seller_name, day
order by date
rows between 3 preceding and current row
) as avg_4_weeks
from t
where day = 'Tuesday';
This assumes that "previous 4 weeks" is the current date plus the previous three weeks. If it starts the week before, only the windowing clause needs to change:
select t.*,
avg(dailycount) over (partition by seller_name, day
order by date
rows between 4 preceding and 1 preceding
) as avg_4_weeks
from t
where day = 'Tuesday';
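A small variant of the same idea (same assumed columns) that also reports how many rows actually fell into the frame, which makes missing Tuesdays easy to spot; a sketch using a named window:
select t.*,
       avg(dailycount) over w as avg_4_weeks,
       count(*) over w as rows_in_window   -- 4 means all four Tuesdays were present
from t
where day = 'Tuesday'
window w as (partition by seller_name, day
             order by date
             rows between 3 preceding and current row);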
I decided to post my answer also, for anyone else searching. My answer will allow you to put in any date and get the average for the previous 4 weeks (current day + previous 3 weeks matching that day).
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE sales (sellerName varchar(10), dailyCount int, saleDay date) ;
INSERT INTO sales (sellerName, dailyCount, saleDay)
SELECT 'ABC',10,to_date('2018-03-15','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',11,to_date('2018-03-14','YYYY-MM-DD') UNION ALL
SELECT 'ABC',12,to_date('2018-03-12','YYYY-MM-DD') UNION ALL
SELECT 'ABC',13,to_date('2018-03-11','YYYY-MM-DD') UNION ALL
SELECT 'ABC',14,to_date('2018-03-10','YYYY-MM-DD') UNION ALL
SELECT 'ABC',15,to_date('2018-03-09','YYYY-MM-DD') UNION ALL
SELECT 'ABC',16,to_date('2018-03-08','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',17,to_date('2018-03-07','YYYY-MM-DD') UNION ALL
SELECT 'ABC',18,to_date('2018-03-06','YYYY-MM-DD') UNION ALL
SELECT 'ABC',19,to_date('2018-03-05','YYYY-MM-DD') UNION ALL
SELECT 'ABC',20,to_date('2018-03-04','YYYY-MM-DD') UNION ALL
SELECT 'ABC',21,to_date('2018-03-03','YYYY-MM-DD') UNION ALL
SELECT 'ABC',22,to_date('2018-03-02','YYYY-MM-DD') UNION ALL
SELECT 'ABC',23,to_date('2018-03-01','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',24,to_date('2018-02-28','YYYY-MM-DD') UNION ALL
SELECT 'ABC',25,to_date('2018-02-22','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',26,to_date('2018-02-15','YYYY-MM-DD') UNION ALL
SELECT 'ABC',27,to_date('2018-02-08','YYYY-MM-DD') UNION ALL
SELECT 'ABC',28,to_date('2018-02-01','YYYY-MM-DD')
;
Now For The Query:
WITH theDay AS (
SELECT to_date('2018-03-15','YYYY-MM-DD') AS inDate
)
SELECT AVG(dailyCount) AS totalCount /* 18.5 = (10(3/15)+16(3/8)+23(3/1)+25(2/22))/4 */
FROM sales
CROSS JOIN theDay
WHERE extract(dow from saleDay) = extract(dow from theDay.inDate)
AND saleDay <= theDay.inDate
AND saleDay >= theDay.inDate-INTERVAL '3 weeks' /* Since we want to include the entered
day, for the INTERVAL we need 1 less week than we want */
Results:
| totalcount |
|------------|
| 18.5 |
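If the table held several sellers, the same calculation grouped per seller might look like this (same schema and sample data as the fiddle above):
WITH theDay AS (
  SELECT to_date('2018-03-15','YYYY-MM-DD') AS inDate
)
SELECT sellerName, AVG(dailyCount) AS totalCount
FROM sales
CROSS JOIN theDay
WHERE extract(dow from saleDay) = extract(dow from theDay.inDate)
  AND saleDay <= theDay.inDate
  AND saleDay >= theDay.inDate - INTERVAL '3 weeks'
GROUP BY sellerName;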

Checking if a Birthday was between a 6 month span which crosses the year break

I am trying to write SQL code (using SQL Developer) that checks whether a person had a birthday within the past 6-month insurance term.
This is what my code currently looks like.
SELECT DRIVER_KEY,
       CASE WHEN BDAY BETWEEN EFFDAY AND EXPDAY THEN 1 ELSE 0 END AS BDAYIND
FROM (
SELECT DISTINCT A.DRIVER_KEY
, TO_CHAR(A.BIRTH_DATE,'mm/dd') AS BDAY
, TO_CHAR(SUBSTR(A.EFFECTIVE_DATE_KEY,5,2)||'/'||SUBSTR(A.EFFECTIVE_DATE_KEY,7,2) ) AS EFFDAY
, TO_CHAR(SUBSTR(A.EXPIRATION_DATE_KEY,5,2)||'/'||SUBSTR(A.EXPIRATION_DATE_KEY,7,2) ) AS EXPDAY
FROM DRIVER_TABLE A
);
It works - so long as the term doesn't cross the year break. However, my code currently says that 01/25 is NOT between 09/19 and 03/19... How do I fix this?
EDIT: As APC pointed out, my solution does not work for leap years. I would normally delete this post, but it was already selected as the answer to the question. I updated my code below to use the year logic from Brian Leach's solution instead of the to_date strings. Please upvote Brian or APC's answers instead.
Here is my create statement with arbitrary dates:
create table DRIVER_TABLE
(
BIRTH_DATE date,
EFFECTIVE_DATE_KEY date,
EXPIRATION_DATE_KEY date
);
insert into DRIVER_TABLE
values(to_date('05/01/1980','MM/DD/YYYY'),
to_date('11/01/2016','MM/DD/YYYY'),
to_date('04/01/2017','MM/DD/YYYY'));
Here is the query:
select case when BirthdayEFFYear between EFFECTIVE_DATE_KEY and EXPIRATION_DATE_KEY
or BirthdayEXPYear between EFFECTIVE_DATE_KEY and EXPIRATION_DATE_KEY
or to_number(EXPIRATION_DATE_KEY - EFFECTIVE_DATE_KEY) / 365 > 1
then 1 else 0 end BDAYIND
from(
select add_months(BIRTH_DATE,12 * (extract(year from EFFECTIVE_DATE_KEY) - extract(year from BIRTH_DATE))) BirthdayEFFYear,
add_months(BIRTH_DATE,12 * (extract(year from EXPIRATION_DATE_KEY) - extract(year from BIRTH_DATE))) BirthdayEXPYear,
EFFECTIVE_DATE_KEY,EXPIRATION_DATE_KEY
from DRIVER_TABLE A
)
SQLFiddle
Compare dates as dates, not as strings.
Apparently EFFECTIVE_DATE_KEY contains the year in the first four characters, and as such the following should give you what you're looking for:
SELECT DRIVER_KEY,
CASE
WHEN BDAY BETWEEN EFFDAY AND EXPDAY THEN 1
ELSE 0
END AS BDAYIND
FROM (SELECT DISTINCT A.DRIVER_KEY,
A.BIRTH_DATE AS BDAY,
TO_DATE(A.EFFECTIVE_DATE_KEY, 'YYYYMMDD') AS EFFDAY,
TO_DATE(A.EXPIRATION_DATE_KEY, 'YYYYMMDD') AS EXPDAY
FROM DRIVER_TABLE A);
Best of luck.
'01/25' is not between '09/19' and '03/19' because between() is never true when the second argument is smaller than the first argument. You fall into this trap because you're working with strings. It is always easier to work with dates using the DATE datatype.
It looks like your columns effective_date and expiry_date may not be stored as dates but rather as strings; unfortunately this is a common data modelling mistake. If so, you need to cast them to DATE first before applying the following.
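A minimal sketch of that cast, assuming the keys are stored as YYYYMMDD strings (column names as in the question):
select driver_key,
       to_date(effective_date_key, 'YYYYMMDD')  as effective_date,
       to_date(expiration_date_key, 'YYYYMMDD') as expiry_date
from driver_table;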
This solution has a subquery which selects the pertinent columns from driver_table and also calculates each driver's current age in years. The age is used to derive the last birthday, which is then compared in the main query to the bounds of the insurance term. Because we derive an actual date we can use Oracle's standard date arithmetic so the bdayind is calculated correctly.
with cte as (
  select driver_key
       , date_of_birth
       , trunc(months_between(sysdate, date_of_birth)/12) as age
       , add_months(date_of_birth, 12 * (trunc(months_between(sysdate, date_of_birth)/12))) as last_birthday
       , effective_date
       , expiry_date
  from driver_table
)
select driver_key
     , date_of_birth as dob
     , age
     , effective_date as eff_date
     , expiry_date as exp_date
     , last_birthday as last_bday
     , case
         when last_birthday between effective_date and expiry_date
         then 1
         else 0 end as bdayind
from cte;
DRIVER_KEY DOB AGE EFF_DATE EXP_DATE LAST_BDAY BDAYIND
---------- --------- ---- --------- --------- --------- ----------
12 02-APR-98 19 01-DEC-16 31-MAY-17 02-APR-17 1
22 02-APR-98 19 01-JAN-17 30-JUN-17 02-APR-17 1
32 02-SEP-98 18 01-DEC-16 31-MAY-17 02-SEP-16 0
42 02-SEP-98 18 01-JAN-17 30-JUN-17 02-SEP-16 0
The subquery produces both age and last_birthday just for demonstration purposes. In real life you only need the last_birthday column.
This solution differs slightly from the others in that:
It works for any birthday between any effective and expiration dates
It accounts for leap years
The raw_data is just setting up the dates for the example:
WITH
raw_data
AS
(SELECT DATE '1963-08-03' AS birthday
, DATE '2017-04-01' AS effectiveday
, DATE '2017-10-31' AS expirationday
, 'Billy' AS name
FROM DUAL
UNION ALL
SELECT DATE '1995-03-20' AS birthday
, DATE '2017-04-01' AS effectiveday
, DATE '2017-10-31' AS expirationday
, 'Sue' AS name
FROM DUAL
UNION ALL
SELECT DATE '1997-01-15' AS birthday
, DATE '2016-12-01' AS effectiveday
, DATE '2017-05-31' AS expirationday
, 'Olga' AS name
FROM DUAL),
mod_data
AS
(SELECT raw_data.*
, ADD_MONTHS (
birthday
, (extract(year from effectiveday) - extract (year from birthday)) * 12
)
effectiveanniversary
, ADD_MONTHS (
birthday
, (extract(year from expirationday) - extract (year from birthday)) * 12
)
expirationanniversary
FROM raw_data)
SELECT name, mod_data.birthday, effectiveday, expirationday
, CASE
WHEN effectiveanniversary BETWEEN effectiveday AND expirationday
OR expirationanniversary BETWEEN effectiveday AND expirationday
THEN
1
ELSE
0
END
found_between
FROM mod_data
NAME BIRTHDAY EFFECTIVEDAY EXPIRATIONDAY FOUND_BETWEEN
Billy 1963/08/03 2017/04/01 2017/10/31 1
Sue 1995/03/20 2017/04/01 2017/10/31 0
Olga 1997/01/15 2016/12/01 2017/05/31 1

alternative to lag SQL command

I have a table which has data like this.
Month     Book_Type     Sold_in_Dollars
Jan       A             100
Jan       B             120
Feb       A             50
Mar       A             60
Mar       B             30
and so on
I have to calculate the expected sales for each month and book type based on the last 2 months' sales.
So for March and type A it would be (100+50)/2 = 75.
For March and type B it is 120/1, since there is no data for Feb.
I was trying to use the lag function but it wouldn't work since there is data missing in a few rows.
Any ideas on this?
Since the calculation is meant to ignore missing values, this should probably work. I don't have a database to test it on at the moment, but will give it another go in the morning:
select
month,
book_type,
sold_in_dollars,
avg(sold_in_dollars) over (partition by book_type order by month
range between interval '2' month preceding and interval '1' month preceding) as avg_sales
from myTable;
This sort of assumes that month has a date datatype and can be sorted on... if it's just a text string then you'll need something else.
Normally you could just use rows between 2 preceding and 1 preceding, but this will take the two previous data points and not necessarily the two previous months if there are rows missing.
You could work it out with lag but it would be a bit more complicated.
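For completeness, a rough sketch of that lag-based route (assuming month is a real DATE, one row per book type per month, and a table named my_table): each lagged value only counts when its month actually falls within the last two months.
select month, book_type, sold_in_dollars,
       -- sum only the lagged values whose month is within the 2-month window...
       ( nvl(case when months_between(month, lag(month, 1) over (partition by book_type order by month)) <= 2
                  then lag(sold_in_dollars, 1) over (partition by book_type order by month) end, 0)
       + nvl(case when months_between(month, lag(month, 2) over (partition by book_type order by month)) <= 2
                  then lag(sold_in_dollars, 2) over (partition by book_type order by month) end, 0) )
       -- ...and divide by however many of them qualified (NULL when none did)
       / nullif( case when months_between(month, lag(month, 1) over (partition by book_type order by month)) <= 2 then 1 else 0 end
               + case when months_between(month, lag(month, 2) over (partition by book_type order by month)) <= 2 then 1 else 0 end, 0)
         as expected_sales
from my_table;
Compared with the range-based window above, this is clearly more verbose, which is what makes the lag route "a bit more complicated".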
As far as I know, you can give a default value to lag():
SELECT Month, Book_Type,
       (lag(sold_in_Dollars, 1, 0) OVER (PARTITION BY Book_Type ORDER BY Month)
        + lag(sold_in_Dollars, 2, 0) OVER (PARTITION BY Book_Type ORDER BY Month)) / 2 AS expected_sales
FROM your_table
(Assuming Month column doesn't really contain JAN or FEB but real, orderable dates.)
What about something like this (forgive the SQL Server syntax, but you get the idea):
Select Book_type, AVG(sold_in_dollars)
from MyTable
where MONTH([Month]) in (MONTH(DATEADD(mm, -1, GETDATE())), MONTH(DATEADD(mm, -2, GETDATE())))
group by Book_type
A partition outer join can help create the missing data. Create a set of months and join those values to each row by the month and perform the join once for each book type. I created the months January through April in this example:
with test_data as
(
select to_date('01-JAN-2010', 'DD-MON-YYYY') month, 'A' book_type, 100 sold_in_dollars from dual union all
select to_date('01-JAN-2010', 'DD-MON-YYYY') month, 'B' book_type, 120 sold_in_dollars from dual union all
select to_date('01-FEB-2010', 'DD-MON-YYYY') month, 'A' book_type, 50 sold_in_dollars from dual union all
select to_date('01-MAR-2010', 'DD-MON-YYYY') month, 'A' book_type, 60 sold_in_dollars from dual union all
select to_date('01-MAR-2010', 'DD-MON-YYYY') month, 'B' book_type, 30 sold_in_dollars from dual
)
select book_type, month, sold_in_dollars
,case when denominator = 0 then 'N/A' else to_char(numerator / denominator) end expected_sales
from
(
select test_data.book_type, all_months.month, sold_in_dollars
,count(sold_in_dollars) over
(partition by book_type order by all_months.month rows between 2 preceding and 1 preceding) denominator
,sum(sold_in_dollars) over
(partition by book_type order by all_months.month rows between 2 preceding and 1 preceding) numerator
from
(
select add_months(to_date('01-JAN-2010', 'DD-MON-YYYY'), level-1) month from dual connect by level <= 4
) all_months
left outer join test_data partition by (test_data.book_type) on all_months.month = test_data.month
)
order by book_type, month