Extract records between days - sql

I have an Audit table for each and every day. All add/modify/delete records are stored. When any record is deleted it doesn’t show up the next day. Something like below.
Date records
---- --------
15th 100
16th 102 - Pickup all records, between 15 and 16, which are not in 16th
17th 110 - Pickup all records, between 16 and 17, which are not in 17th
18th 150 - Pickup all records, between 17 and 18, which are not in 18th
.. So on..
This Audit table keeps a snapshot for every day, so a deleted record is present on the previous day but missing today. I need to pick up all the deleted records between two dates.
But I don't want to hard-code the dates; it should work from a given start date up to today.
How can I achieve this in a single SQL query? I tried using UNION, and it works, but only with hard-coded dates. Is there a way to write a generic query that works as of today?

You can use two levels of aggregation. The first gets the maximum date for each id. The second counts those ids per date and reports the deletion on the following day:
select max_date + interval 1 day, count(*)
from (select a.id, max(date) as max_date
from audit a
group by a.id
) t
group by max_date
order by max_date;
You might want a where clause that excludes ids whose maximum date is the latest date in the data (otherwise every record that is still present will look like it is being deleted on the following day).
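As a rough sketch of the two-level aggregation with that where clause added, here is a small reproducible version using Python's built-in sqlite3. The column name snap_date, the sample rows, and the SQLite date() call (standing in for "+ interval 1 day") are assumptions made for the example, not part of the original schema.

```python
import sqlite3

# Hypothetical sample data: one row per (snapshot date, record id).
rows = [
    ("2023-01-15", 1), ("2023-01-15", 2), ("2023-01-15", 3),
    ("2023-01-16", 1), ("2023-01-16", 3),   # id 2 is gone
    ("2023-01-17", 3),                      # id 1 is gone
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (snap_date TEXT, id INTEGER)")
conn.executemany("INSERT INTO audit VALUES (?, ?)", rows)

query = """
SELECT date(max_date, '+1 day') AS deleted_on, COUNT(*) AS deleted
FROM (SELECT id, MAX(snap_date) AS max_date
      FROM audit
      GROUP BY id) t
-- ids whose last snapshot is the latest one are still present, so skip them
WHERE max_date < (SELECT MAX(snap_date) FROM audit)
GROUP BY max_date
ORDER BY max_date
"""
deleted_per_day = conn.execute(query).fetchall()
print(deleted_per_day)  # [('2023-01-16', 1), ('2023-01-17', 1)]
```

To start from a given date instead of the whole history, an extra condition on max_date in the same where clause should be enough.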
An alternative method uses lead():
select date + interval 1 day, count(*)
from (select a.*,
lead(date) over (partition by id order by date) as next_date
from audit a
) t
where next_date <> date_add(date, INTERVAL 1 DAY) or next_date is null
group by date
order by date;
If records can be resurrected and you still want to count them as deleted when they disappear, this is the better method.
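To see how the lead() variant behaves when a record disappears and later comes back, here is a minimal sketch using Python's sqlite3 (window functions need SQLite 3.25 or newer). The column name snap_date, the sample rows, and the extra filter on the latest snapshot are assumptions for illustration.

```python
import sqlite3

# Hypothetical data: id 1 is missing on the 16th and comes back on the 17th,
# id 2 is always present, id 3 disappears after the 16th.
rows = [
    ("2023-01-15", 1), ("2023-01-15", 2), ("2023-01-15", 3),
    ("2023-01-16", 2), ("2023-01-16", 3),
    ("2023-01-17", 1), ("2023-01-17", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (snap_date TEXT, id INTEGER)")
conn.executemany("INSERT INTO audit VALUES (?, ?)", rows)

query = """
SELECT date(snap_date, '+1 day') AS deleted_on, COUNT(*) AS deleted
FROM (SELECT a.*,
             LEAD(snap_date) OVER (PARTITION BY id ORDER BY snap_date) AS next_date
      FROM audit a) t
WHERE (next_date IS NULL OR next_date <> date(snap_date, '+1 day'))
  -- rows in the latest snapshot have no next day yet, so they are not deletions
  AND snap_date < (SELECT MAX(snap_date) FROM audit)
GROUP BY snap_date
ORDER BY snap_date
"""
deleted_with_gaps = conn.execute(query).fetchall()
print(deleted_with_gaps)  # [('2023-01-16', 1), ('2023-01-17', 1)]
```

Here id 1 is counted as deleted on the 16th even though it reappears on the 17th, which the max-date approach would miss.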

Related

SQL - Get historic count of rows collected within a certain period by date

For many years I've been collecting data and I'm interested in knowing the historic counts of IDs that appeared in the last 30 days. The source looks like this
id   dates
1    2002-01-01
2    2002-01-01
3    2002-01-01
...  ...
3    2023-01-10
If I wanted to know the historic count of ids that appeared in the last 30 days I would do something like this
with total_counter as (
select id, count(id) counts
from source
group by id
),
unique_obs as (
select id
from source
where dates >= DATEADD(Day ,-30, current_date)
group by id
)
select count(distinct(id))
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;
The problem is that this returns a single result: today's count, as determined by current_date.
I would like to see a table with such counts as if, for example, I had run this analysis yesterday, the day before, and so on. So the expected result would be something like
counts  date
1235    2023-01-10
1234    2023-01-09
1265    2023-01-08
...     ...
7383    2022-12-11
so for example, let's say that if the current_date was 2023-01-10, my query would've returned 1235.
If you need a distinct count of ids from the 30 days up to and including each date, the query below should work:
WITH CTE_DATES
AS
(
--Create a list of anchor dates
SELECT DISTINCT
dates
FROM source
)
SELECT COUNT(DISTINCT s.id) AS "counts"
,D.dates AS "date"
FROM CTE_DATES D
LEFT JOIN source S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE
GROUP BY D.dates
ORDER BY D.dates DESC
;
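For a quick check of the anchor-date join, here is a small sketch using Python's sqlite3. The sample rows are made up, and SQLite's date(x, '-29 days') stands in for DATEADD(DAY,-29,x); treat both as assumptions.

```python
import sqlite3

# Hypothetical sample of the source table.
rows = [
    (1, "2023-01-01"), (2, "2023-01-01"),
    (1, "2023-01-15"), (3, "2023-01-15"),
    (3, "2023-02-05"),
    (2, "2023-02-20"), (4, "2023-02-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, dates TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)", rows)

query = """
WITH cte_dates AS (
    SELECT DISTINCT dates FROM source          -- anchor dates
)
SELECT d.dates AS date, COUNT(DISTINCT s.id) AS counts
FROM cte_dates d
LEFT JOIN source s
       ON s.dates BETWEEN date(d.dates, '-29 days') AND d.dates  -- 30 days inclusive
GROUP BY d.dates
ORDER BY d.dates DESC
"""
rolling_counts = conn.execute(query).fetchall()
print(rolling_counts)
# [('2023-02-20', 3), ('2023-02-05', 2), ('2023-01-15', 3), ('2023-01-01', 2)]
```

Only dates that actually occur in source act as anchors; a separate calendar table would be needed to get a row for days with no data at all.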
If the distinct count didn't matter, you could likely simplify with a rolling sum, only hitting the source table once:
SELECT S.dates AS "date"
,COUNT(1) AS "count_daily"
,SUM(COUNT(1)) OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day.
FROM source S
GROUP BY S.dates
ORDER BY S.dates DESC
;
This won't work, though, if you have gaps in your list of dates, as it'll just include the latest 30 dates available. In that case, the first example without DISTINCT in the count will do the trick.
SELECT count(*) AS Counts,
dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC

Data value on a given date

This time I have a table in a PostgreSQL database that contains the employee name, the date the employee started working, and the date the employee left the company. If the employee is still with the company, this last field is null.
Knowing this, I would like to know how many people were working on a given date. For example:
I would like to know how many people worked at the company in January 2021.
I don't know where to start. In some attempts I got the number of hires and layoffs per month, but I need to show the accumulated value per month in another column.
I hope I made myself understood. I'll leave the last SQL I got here.
select reference, sum(hires) from
(
select
date_trunc('month', date_hires) as reference,
count(*) as hires
from
ponto_mais_relatorio_colaboradores
group by
date_hires
union all
select
date_trunc('month', date_layoff) as reference,
count(*)*-1 as layoffs
from
ponto_mais_relatorio_colaboradores
group by
date_layoff
) as reference
join calendar_aux on calendar_aux.ano_mes = reference
group by reference
order by reference
Break the requirement down. The question is: how many people are employed on any given date? That includes everyone hired on or before that date who has no layoff date, plus everyone hired on or before that date whose layoff date is later than the date you are interested in. For example, if you are interested in January, you still want to count an employee with a layoff date in February. With that in place, convert it into SQL; the condition above is just a comparison of dates in the join. The other issue is that January is not a date but a range of dates, so you need each date in it. You can use generate_series to create each day in January, then join the generated dates with your table. The resulting query:
with jan_dates( jdate ) as
( select generate_series( date '2021-01-01'
, date '2021-01-31'
, interval '1' day
)::date
)
select jdate "Date", count(*) "Employees"
from jan_dates j
join employees e
on ( e.date_hires <= j.jdate
and ( e.date_layoff is null
or e.date_layoff > j.jdate
)
)
group by j.jdate
order by j.jdate;
Note: Not tested.
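Since the query is untested, here is a small runnable sketch of the same idea using Python's sqlite3. SQLite has no generate_series by default, so a recursive CTE builds the list of days instead; the employees rows are invented, and the LEFT JOIN (so that days with nobody employed still appear with 0) is a small deviation from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, date_hires TEXT, date_layoff TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Ana",   "2020-06-01", None),          # still employed
    ("Bruno", "2021-01-02", None),          # hired during the period
    ("Carla", "2020-01-01", "2021-01-03"),  # leaves during the period
])

query = """
WITH RECURSIVE days(d) AS (
    SELECT '2021-01-01'
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < '2021-01-03'
)
SELECT d, COUNT(e.name) AS employees
FROM days
LEFT JOIN employees e
       ON e.date_hires <= d
      AND (e.date_layoff IS NULL OR e.date_layoff > d)
GROUP BY d
ORDER BY d
"""
headcount = conn.execute(query).fetchall()
print(headcount)  # [('2021-01-01', 2), ('2021-01-02', 3), ('2021-01-03', 2)]
```

The example covers only three days to keep the output short; widening the bounds in the recursive CTE gives the whole month.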

Teradata loop for dates, column adding within loop

I have a table where every row is transaction and there are few columns: clients IDs and dates for every transaction.
I am trying to write a query that produces a table where column N shows, for the clients whose first transaction happened in month N, how many of them made transactions in months N, N+1, N+2, ...
For example (desired table for 3 months data):
1 2 3
100 90 78
80 80
60
The first row of column 1 shows the number of clients whose first transaction happened in month 1, the second row shows how many of these clients stayed after one month, the third row after two months, etc.
My current query (Year is a column with the year of the date, like 2017; month is the number of the month, like 1 for January):
WITH not_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date<date '2017-01-01'),
ID_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date BETWEEN date '2017-01-01' AND date '2017-01-31'
),
from_this AS(
SELECT ID, Year, month
FROM table
)
SELECT Year, Month, count(distinct ID)
FROM from_this
WHERE ID IN (select ID from ID_in)
AND
ID NOT IN (select ID from not_in)
GROUP BY 1,2
ORDER BY 1,2
But this gives only one column (for January 2017) of the desired table. I need to change dates for other months in 2017, 2018 and so on manually.
How to avoid this?
I guess it should be looped somehow. I think I should create a volatile table and add columns to it within a loop, then select * from it.
Also, I cannot find instructions for variable declarations and while loops in Teradata; any clarifications are appreciated.

Need to count unique transactions by month but ignore records that occur 3 days after 1st entry for that ID

I have a table with just two columns: User_ID and fail_date. Each time somebody's card is rejected they are logged in the table, their card is automatically tried again 3 days later, and if they fail again, another entry is added to the table. I am trying to write a query that counts unique failures by month so I only want to count the first entry, not the 3 day retries, if they exist. My data set looks like this
user_id fail_date
222 01/01
222 01/04
555 02/15
777 03/31
777 04/02
222 10/11
so my desired output would be something like this:
month unique_fails
jan 1
feb 1
march 1
april 0
oct 1
I'll be running this in Vertica, but I'm not so much looking for perfect syntax in replies. Just help around how to approach this problem as I can't really think of a way to make it work. Thanks!
You could use lag() to get the previous timestamp per user. If the current and the previous timestamp are less than or exactly three days apart, it's a follow up. Mark the row as such. Then you can filter to exclude the follow ups.
It might look something like:
SELECT month,
count(*) unique_fails
FROM (SELECT month(fail_date) month,
CASE
WHEN datediff(day,
lag(fail_date) OVER (PARTITION BY user_id
ORDER BY fail_date),
fail_date) <= 3 THEN
1
ELSE
0
END follow_up
FROM elbat) x
WHERE follow_up = 0
GROUP BY month;
I'm not so sure about the exact syntax in Vertica, so it might need some adaptations. I also don't know if fail_date actually is some date/time type variant or just a string. If it's just a string, the date/time specific functions may not work on it and have to be replaced, or the string has to be converted before passing it to the functions.
If the data spans several years, you might also want to include the year in addition to the month, to keep months from different years apart. In the inner SELECT add a column year(fail_date) year, and add year to the list of columns and the GROUP BY of the outer SELECT.
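To try the lag() idea on the sample data from the question, here is a sketch using Python's sqlite3 (SQLite 3.25 or newer for window functions). The year 2018, the table name fails, and the julianday() difference standing in for datediff() are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fails (user_id INTEGER, fail_date TEXT)")
conn.executemany("INSERT INTO fails VALUES (?, ?)", [
    (222, "2018-01-01"), (222, "2018-01-04"),   # second row is the 3-day retry
    (555, "2018-02-15"),
    (777, "2018-03-31"), (777, "2018-04-02"),   # retry lands in April
    (222, "2018-10-11"),
])

query = """
SELECT strftime('%Y-%m', fail_date) AS month, COUNT(*) AS unique_fails
FROM (SELECT fail_date,
             julianday(fail_date)
               - julianday(LAG(fail_date) OVER (PARTITION BY user_id
                                                ORDER BY fail_date)) AS gap_days
      FROM fails) x
WHERE gap_days IS NULL OR gap_days > 3   -- no earlier failure, or not a retry
GROUP BY month
ORDER BY month
"""
unique_fails = conn.execute(query).fetchall()
print(unique_fails)
# [('2018-01', 1), ('2018-02', 1), ('2018-03', 1), ('2018-10', 1)]
```

April does not appear at all here, because its only row is a retry; getting an explicit 0 for such months would need a join against a list of months.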
You can add a flag about whether this is a "unique_fail" by doing:
select t.*,
(case when lag(fail_date) over (partition by user_id order by fail_date) >= fail_date - 3
then 0 else 1
end) as first_failure_flag
from t;
Then, you want to count this flag by month:
select to_char(fail_date, 'Mon'), -- should always include the year
sum(first_failure_flag)
from (select t.*,
(case when lag(fail_date) over (partition by user_id order by fail_date) >= fail_date - 3
then 0 else 1
end) as first_failure_flag
from t
) t
group by to_char(fail_date, 'Mon')
order by min(fail_date)
In a derived table, determine the previous fail_date (prev_fail_date) for each user_id and fail_date, using a correlated subquery.
Using the derived table dt, count the failure if there is no previous failure or if the number of days between the current fail_date and prev_fail_date is greater than 3.
The DATEDIFF() function together with the IF() function is used to identify the rows that are not retries.
To group this result by month, you can use the MONTH function.
But the data can span multiple years, so you need to separate them by year as well; you can do a multi-level GROUP BY using the YEAR function.
Try the following (in MySQL); it should give you the idea for other RDBMSs as well:
SELECT YEAR(dt.fail_date) AS year_fail_date,
MONTH(dt.fail_date) AS month_fail_date,
COUNT( IF(dt.prev_fail_date IS NULL OR DATEDIFF(dt.fail_date, dt.prev_fail_date) > 3, dt.user_id, NULL) ) AS unique_fails
FROM (
SELECT
t1.user_id,
t1.fail_date,
(
SELECT t2.fail_date
FROM your_table AS t2
WHERE t2.user_id = t1.user_id
AND t2.fail_date < t1.fail_date
ORDER BY t2.fail_date DESC
LIMIT 1
) AS prev_fail_date
FROM your_table AS t1
) AS dt
GROUP BY
year_fail_date,
month_fail_date
ORDER BY
year_fail_date ASC,
month_fail_date ASC

Retrieve records between a current date and previous date

I need to get a count of the # of rows resulting from a query which needs to have the below logic:
Assume the table includes 3 columns for now; ID, VALUE and INSERT DATE
Records inserted on (current day - 1)
minus
records inserted on the latest business day prior to (current day - 1)
To add more details:
I am looking for the number of records inserted between 2 dates, i.e. if 200 records were inserted between Thursday and Friday, then when I run the query on Monday my result should show me '200 records'.
Assumption: Business days = Mon-Fri
Use the DATEADD function:
Select
*
FROM
TableName
WHERE CreatedDate Between CAST(DATEADD(d,-1,GETDATE()) AS Date) AND CAST( GETDATE() AS Date)
Please try this
Select * FROM TableName WHERE AddedDate Between DATEADD(day,-1,GETDATE()) and GETDATE()
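The Mon-Fri assumption from the question can be sketched separately: compute the two business days first, then pass them to the query as parameters. The sketch below uses Python with sqlite3; the helper previous_business_day, the table name records, the column insert_date, and the inclusive BETWEEN window are all assumptions, and holidays are ignored.

```python
import sqlite3
from datetime import date, timedelta

def previous_business_day(d):
    """Return the latest Mon-Fri date strictly before d (holidays are ignored)."""
    d -= timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

today = date(2023, 1, 9)               # a Monday, fixed so the example is reproducible
end = previous_business_day(today)     # Friday 2023-01-06
start = previous_business_day(end)     # Thursday 2023-01-05

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, value TEXT, insert_date TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
    (1, "a", "2023-01-04"),
    (2, "b", "2023-01-05"), (3, "c", "2023-01-05"),
    (4, "d", "2023-01-06"), (5, "e", "2023-01-06"), (6, "f", "2023-01-06"),
    (7, "g", "2023-01-07"),
])
(inserted,) = conn.execute(
    "SELECT COUNT(*) FROM records WHERE insert_date BETWEEN ? AND ?",
    (start.isoformat(), end.isoformat()),
).fetchone()
print(start, end, inserted)  # 2023-01-05 2023-01-06 5
```

Whether both boundary days should be included depends on what "between Thursday and Friday" means for your data, so adjust the comparison accordingly.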