Getting the oldest data from a table that is older than 100 days

I've been struggling with the following problem:
EXPLAINING
I have a table called part_subhourly_data that holds production data for a part (for the purpose of the problem, there is no need to know what a part is).
I need to archive any data older than 100 days. But since there's a lot of data (it arrives every 5 or 10 minutes) and we have more than 1000 parts, I need to do it for the 5 oldest days at a time.
This is the schema of my table:
part_subhourly_data
id INTEGER,
part_id INTEGER,
produced_at TIMESTAMP,
data HSTORE
So basically I need to get all data that is in this table, where produced_at is prior to 100 days ago and limit that to the first 5 days, per part.
Example:
Part 1 has data from 15 Aug 2016 until 12 Dec 2016
Part 2 has data from 1st Sep 2016 until 12 Dec 2016
100 days ago would be 3 Sep 2016.
For Part 1 I would take data from 15 Aug 2016 until 19 Aug 2016 (5 days).
For Part 2 I would take data from 1st Sep 2016 until 3 Sep 2016 (3 days because of the 100 days old condition).
WHAT I HAVE TRIED
Well, I'm using Rails for this, but a SQL solution is welcome as well. For now, what I'm doing is grabbing the oldest data with:
SELECT "part_subhourly_data"."part_id", MIN(produced_at) produced_at
FROM "part_subhourly_data"
WHERE (produced_at < (NOW() - INTERVAL '100 days'))
GROUP BY "part_subhourly_data"."part_id"
And then I loop over each part_id and grab the data based on the MIN(produced_at). It works, but it doesn't seem ideal. I'm sure that there is some SQL magic to make it simpler and quicker, without having to loop over each part.
Any idea?

Take all records where produced_at is prior to 100 days ago.
Dense-rank the records per part_id, ordered by produced_at::date in ascending order.
The records with the oldest date get 1, the records with the next oldest date get 2, and so on. Keep only the ranks up to 5.
select part_id,produced_at
from (select part_id,produced_at
,dense_rank () over (partition by part_id order by produced_at::date) as dr
from part_subhourly_data
where produced_at < now() - interval '100 days'
) p
where dr <= 5
;
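As a quick sanity check, the query can be tried against the example from the question. This is only a sketch: it uses Python's built-in sqlite3 module rather than PostgreSQL, so date(produced_at) stands in for produced_at::date, the cutoff is passed in as a fixed parameter instead of now() - interval '100 days', and the two rows per day (08:00 and 08:05) are made-up sample data. It assumes the bundled SQLite is 3.25 or newer, which is when window functions were added.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part_subhourly_data (part_id INTEGER, produced_at TEXT)")

def add_days(part_id, first, last):
    """Insert two hypothetical sample rows per day for one part."""
    day = first
    while day <= last:
        for clock in ("08:00:00", "08:05:00"):
            conn.execute(
                "INSERT INTO part_subhourly_data VALUES (?, ?)",
                (part_id, f"{day.isoformat()} {clock}"),
            )
        day += timedelta(days=1)

add_days(1, date(2016, 8, 15), date(2016, 12, 12))  # Part 1 from the question
add_days(2, date(2016, 9, 1), date(2016, 12, 12))   # Part 2 from the question

# Assumed "100 days ago" for a run at midday on 12 Dec 2016.
cutoff = "2016-09-03 12:00:00"

rows = conn.execute(
    """
    SELECT part_id, produced_at
    FROM (SELECT part_id, produced_at,
                 dense_rank() OVER (PARTITION BY part_id
                                    ORDER BY date(produced_at)) AS dr
          FROM part_subhourly_data
          WHERE produced_at < ?) p
    WHERE dr <= 5
    ORDER BY part_id, produced_at
    """,
    (cutoff,),
).fetchall()

# Collect the distinct days selected for each part.
days_per_part = {}
for part_id, produced_at in rows:
    days_per_part.setdefault(part_id, set()).add(produced_at[:10])

for part_id in sorted(days_per_part):
    print(part_id, sorted(days_per_part[part_id]))
```

Part 1 comes back with 15 to 19 Aug and Part 2 with 1 to 3 Sep, as in the question. Note that dense_rank picks the 5 oldest dates that actually have rows, so it matches "5 calendar days" only when a part has no gaps in its data.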

Related

Redshift SQL SUM over fixed number days

I'm currently stuck on an issue. I have daily data and I want to SUM all the data for the next 30 days over a year.
Date        Views
28-01-2021  1
29-01-2021  5
30-01-2021  1
31-01-2021  5
And I want to have the number of views starting the 28th for the next 30 days, then the next 30 days etc... over a year (or twelve times)
So basically what I want to see is something like this, Series being the series of 30 days (first 30 days, second 30 days etc...)
Series                    Views
1 (from 28-01 to 28-02)   250
2 (from 01-03 to 30-03)   200
3 (from 31-03 to 29-04)   300
4 (from 30-04 to 29-05)   550
Thank you if anyone can help.
Regards

Assuming you have one row for each day, you can use a window function:
select t.*,
sum(views) over (order by date
rows between current row and 29 following
) as views_30days
from t;
Note: This interprets "next 30 days" as really being "today plus the next 29 days". If you don't want the current day, then the window frame would be:
rows between 1 following and 30 following
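With only the four sample rows from the question, the "current row and 29 following" frame simply stops at the last row, which makes the behaviour easy to check by hand. A sketch using Python's sqlite3 module in place of Redshift (an assumption, as are the ISO-formatted dates; SQLite 3.25+ is needed for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (date TEXT, views INTEGER);
    INSERT INTO t VALUES
        ('2021-01-28', 1),
        ('2021-01-29', 5),
        ('2021-01-30', 1),
        ('2021-01-31', 5);
    """
)

# Same frame as in the answer: the current row plus up to 29 rows after it.
rows = conn.execute(
    """
    SELECT t.*,
           SUM(views) OVER (ORDER BY date
                            ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING
                           ) AS views_30days
    FROM t
    ORDER BY date
    """
).fetchall()

for row in rows:
    print(row)
```

Each row gets the sum of its own views and everything after it (12, 11, 6, 5 here). This is a rolling total per day; it relies on there being exactly one row per day, as the answer assumes.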

Add results to a row regarding the last 12 months rows- SQL Server

At my last meeting someone asked me if it was possible to hide people who have been ill for a year from a dashboard. So I'm searching for the best way to actually KNOW who has been ill for 12 months.
I am working with a table with the number of days you've been absent for every kind of absence you could have, the number of days you should have been working that month, with a row per person, department and profession each month.
So it looks something like this :
PersonID | YEAR | MONTH | DEPARTMENT | PROFESSION | Absence1 | Absence2 | Absence3 | WORKING DAYS OF THE MONTH...
11111 | 2021 | 07 | HR | ASSISTANT | 0 | 2 | 0 | 22
11111 | 2021 | 08 | HR | ASSISTANT | 0 | 0 | 0 | 22
==> So if I'm on a row of July 2021 I need to check the lines from June 2020 to June 2021.
My guess is that I need to add a column to this table that says (with some kind of loop, maybe): if, for the last 12 months (rows), the total number of days of absence equals the number of working days of the last 12 months, then "ILL FOR A YEAR OR MORE", for each person (knowing that a person can work in more than one department or profession, so they can have more than one row per month).
But I really have no idea how to actually write it in a script, as I usually do very basic things. I'm using SQL SERVER and have 429 207 rows in the table. I'm thinking about doing it on the whole table, and not only treating this month's rows, because the dashboard shows history.

Your table is heavily denormalized. If you want to represent all this information in the database, I would have expected the following tables, instead of just one:
Person
Department
Illness (list of illnesses)
IllnessAbsence (join table between Person and Illness)
Either way, you can get the information you need with something like this:
I've assumed you want the whole table, so you need a window function
We need to flip the logic on its head: exclude all rows which have no non-absence in the last 12 months
SELECT
PersonID,
YEAR,
MONTH,
DEPARTMENT,
PROFESSION,
Absence1,
Absence2,
Absence3,
[WORKING DAYS OF THE MONTH]
FROM (
SELECT *,
-- counts this person's months within the last 12 that have no absence at all
NotIllLast12Months = COUNT(CASE WHEN DATEFROMPARTS(YEAR + 1, MONTH, 1) >= GETDATE()
AND Absence1 + Absence2 + Absence3 = 0 THEN 1 END)
OVER (PARTITION BY PersonID)
FROM HETP_ABS
) abs
WHERE NotIllLast12Months > 0;

return values within the last 365 counting from newest date ORACLE SQL

[ORACLE SQL] I'm trying to write a query that returns all the values in a single column in the last 365 days counting from the last time (or newest date; in other words) data was entered.
For example: table: EMPLOYEE_TIMESTAMPS
EMPLOYEE_ID TIMESTAMP_DATE
1 AUG 2014
1 AUG 2015
2 JAN 2016
1 FEB 2016
1 OCT 2016
The resulting data should be only the last two rows, as it should count 365 days back from OCT 2016.
I tried the following code, but it resulted in [ORA-00934: group function is not allowed here] because of the MAX function. Using SYSDATE does not get the job done, as the last data could have been added months ago.
SELECT * FROM EMPLOYEE_TIMESTAMPS
WHERE TIMESTAMP_DATE >= MAX(TIMESTAMP_DATE) -365;
I'm fairly new to programming so I still have a hard time transmitting my ideas. Thanks for the help.

One method of doing what you want uses window functions:
SELECT *
FROM (SELECT et.*,
MAX(TIMESTAMP_DATE) OVER (PARTITION BY EMPLOYEE_ID) as MAX_TIMESTAMP_DATE
FROM EMPLOYEE_TIMESTAMPS
) et
WHERE TIMESTAMP_DATE >= MAX_TIMESTAMP_DATE - 365;
The problem with your version is the use of the aggregation function MAX(). First, there is no GROUP BY in the query. And, second, these functions are not allowed in the WHERE clause.
The MAX() in the above version is called an analytic function, because it has the OVER clause.
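To see the analytic MAX() at work on the sample rows, here is a small sketch. It runs on Python's sqlite3 module rather than Oracle (an assumption), so julianday() is used for the date arithmetic, and each "AUG 2014"-style value is assumed to be the first day of that month. SQLite 3.25+ is needed for the OVER clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_timestamps (employee_id INTEGER, timestamp_date TEXT);
    INSERT INTO employee_timestamps VALUES
        (1, '2014-08-01'),
        (1, '2015-08-01'),
        (2, '2016-01-01'),
        (1, '2016-02-01'),
        (1, '2016-10-01');
    """
)

# The inner query attaches each employee's newest date to every row;
# the outer query keeps rows within 365 days of that date.
rows = conn.execute(
    """
    SELECT employee_id, timestamp_date
    FROM (SELECT et.*,
                 MAX(timestamp_date) OVER (PARTITION BY employee_id)
                     AS max_timestamp_date
          FROM employee_timestamps et) latest
    WHERE julianday(timestamp_date) >= julianday(max_timestamp_date) - 365
    ORDER BY timestamp_date
    """
).fetchall()

for row in rows:
    print(row)
```

Because the maximum is computed per EMPLOYEE_ID, employee 2's only row is kept as well as the FEB 2016 and OCT 2016 rows of employee 1. Using OVER () with no PARTITION BY would measure every row against the table-wide newest date instead.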

SQL- Last 4 weeks with date column

USERS
ID TIMEMODIFIED
1 1400481271
2 1400481489
3 1400486453
4 1400486525
5 1401777484
I have a timemodified field holding Unix timestamps. I need to get the rows from the last 4 weeks, counting back from today's date.
SELECT id FROM USERS
WHERE FROM_UNIXTIME(timemodified,'%d-%m-%Y') >= curdate()
AND FROM_UNIXTIME(timemodified,'%d-%m-%Y') < curdate()-1

Your times are already in Unix timestamp format. Bear in mind that it'll be far more efficient to compare [TIMEMODIFIED] against the current date converted to a Unix timestamp. In addition, you don't need to check any upper bound unless [TIMEMODIFIED] can be in the future.
Try:
-- 60x60x24x7x4 = 2419200 seconds in four weeks
-- MySQL user variables start with @ (a # would begin a comment)
SET @unix_four_weeks_ago = UNIX_TIMESTAMP(curdate()) - 2419200;
SELECT id FROM USERS
WHERE timemodified >= @unix_four_weeks_ago;
NB. Four weeks ago (i.e. today – 28 days) was 1,437,696,000 (24th July) at the time of this answer. The latest record in the sample you provided has a timestamp going back to the 3rd June 2014, and so none of these records will be returned by the query.
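The arithmetic in the note can be double-checked outside the database. A minimal sketch in Python, assuming the timestamps are read as UTC; the cutoff is the value quoted above and the rows are the sample USERS data from the question:

```python
from datetime import datetime, timezone

# 60 x 60 x 24 x 7 x 4 seconds in four weeks
FOUR_WEEKS = 60 * 60 * 24 * 7 * 4

# Sample rows from the question: id -> timemodified
users = {
    1: 1400481271,
    2: 1400481489,
    3: 1400486453,
    4: 1400486525,
    5: 1401777484,
}

# "Four weeks ago" as quoted in the answer
cutoff = 1437696000

def utc_date(ts):
    """Return the UTC calendar date of a Unix timestamp as YYYY-MM-DD."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()

# Same comparison as the WHERE clause: plain integers, no per-row conversion.
recent_ids = [uid for uid, ts in users.items() if ts >= cutoff]

print(FOUR_WEEKS)                     # 2419200
print(utc_date(cutoff))               # 2015-07-24
print(utc_date(max(users.values())))  # 2014-06-03
print(recent_ids)                     # []
```

The newest sample row is more than a year older than the cutoff, which is why the query returns nothing for this data.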

MS Access grouping query spanning start and end dates

I would like to get a running tally of how many widgets were/are rented at any one time, by month, by year. Data is held in an MS Access 2003 db;
Table name: rent_table
Fields:
rentid
startdate
enddate
rentfee
rentcost
bookingfee
Something like: count the number of rentids that fall within each month/year, then group them?
E.g. if a widget was rented from 5th Jan 2014 to 8th April 2014, it would appear as a count in the Jan, Feb, Mar and April tallies.
Many thanks.
EDIT
More details (sorry);
Access db is fronted by classic ASP.
If possible I don't want to create any new tables.
No input is required in order to run the report.
There are around 350-400 widgets that could be rented at any one time.
Each widget is rented exclusively.
Report output example;
Month | Year | NumRented
Jan 2014 86
Feb 2014 113
...
Can a query pick up dates within dates? So literally do a count of the table where date >Dec 31st 2013 AND <1st Feb 2014 (to grab a count for all of January 2014) and would that include the example of the rent starting on the 5th Jan? So I could just do twelve counts for each year?

Create a calendar table, e.g.
table = cal_yyyymm with one column dt_yyyymm as a numeric field
Populate the table with, say, 5 or 10 years of data:
201401 201402 201403 ... 60 or 120 rows, a small table
Then write a query:
Select
dt_yyyymm,
count(rentid) as cnt
From cal_yyyymm
Left Join rent_table
On format(startdate,"yyyymm") <= dt_yyyymm
And dt_yyyymm <= format(enddate,"yyyymm")
Group By dt_yyyymm
Think about the complications in the data:
A widget was rented from 5th Jan 2014 to 8th Jan 2014
and rented again from 11th Jan 2014 to 21st Jan 2014.
Does this count as 1 or 2 in the month?
If it is 1, then the SQL gets more complicated, because
the rent_table first needs to have its dates converted
to yyyymm format, second needs to be de-duped on rentid,
and third then joined to cal_ on the dates...
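The calendar-table idea can be tried on a tiny data set. The sketch below is an assumption-laden stand-in for Access: it uses Python's sqlite3 module, strftime('%Y%m', ...) in place of format(..., "yyyymm"), and two made-up rentals (the first one is the 5th Jan to 8th April example from the question). A month is counted for a rental when the rental starts in or before that month and ends in or after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cal_yyyymm (dt_yyyymm INTEGER);
    INSERT INTO cal_yyyymm VALUES (201401), (201402), (201403), (201404), (201405);

    CREATE TABLE rent_table (rentid INTEGER, startdate TEXT, enddate TEXT);
    INSERT INTO rent_table VALUES
        (1, '2014-01-05', '2014-04-08'),
        (2, '2014-02-10', '2014-02-20');
    """
)

# COUNT(r.rentid) ignores the NULLs a LEFT JOIN produces for months with
# no rentals, so those months show 0 rather than 1.
rows = conn.execute(
    """
    SELECT c.dt_yyyymm, COUNT(r.rentid) AS cnt
    FROM cal_yyyymm c
    LEFT JOIN rent_table r
           ON CAST(strftime('%Y%m', r.startdate) AS INTEGER) <= c.dt_yyyymm
          AND c.dt_yyyymm <= CAST(strftime('%Y%m', r.enddate) AS INTEGER)
    GROUP BY c.dt_yyyymm
    ORDER BY c.dt_yyyymm
    """
).fetchall()

for month, cnt in rows:
    print(month, cnt)
```

Here the first rental is tallied in January through April, the second only in February, and May shows 0.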
