Creating min and max values and comparing them to timestamp values in SQL

I have a PostgreSQL database and I have a table that I am looking to query to determine which presses have been updated between the first cycle created_timestamp and the most recent cycle created_timestamp. Here is an example of the table, which is called event_log_summary.
press_id cycle_number created_timestamp
1 1 2020-02-07 16:07:52
1 2 2020-02-07 16:07:53
1 3 2020-02-07 16:07:54
1 4 2020-04-01 13:23:10
2 1 2020-01-13 8:33:23
2 2 2020-01-13 8:33:24
2 3 2020-01-13 8:33:25
3 1 2020-02-21 18:45:44
3 2 2020-02-21 18:45:45
3 3 2020-02-26 14:22:12
This is the query I used to get a three-column output of press_id, minCycle, and maxCycle. I then want to compare the created_timestamp of the max cycle to the created_timestamp of the min cycle and check whether there is at least x amount of time between the two, say at least 1 day. I am unsure how to implement that.
SELECT
press_id,
MIN(cycle_number) AS minCycle,
MAX(cycle_number) AS maxCycle
FROM
event_log_detail
GROUP BY
press_id
I have tried different things, like using WHERE (MAX(cycle_number) - MIN(cycle_number)) > 1, but I am pretty new to SQL and don't fully know how to implement this. For a difference of at least one day, the output I am looking for would be the following:
press_id
1
3
Presses 1 and 3 each have at least 1 day between the created_timestamp of their minimum cycle and that of their maximum cycle. I am just looking for the press_ids whose first and last cycles are at least 1 day apart. I don't need any other information in the output, just one column with the press_ids. Any help would be appreciated. Thanks.

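You can use a HAVING clause, which filters groups after aggregation (a WHERE clause cannot reference MIN or MAX). This compares the earliest and latest timestamps per press, which matches the first and last cycle as long as timestamps increase with cycle_number. Drop the diff column if you only need the press_id:
select press_id,
max(created_timestamp) - min(created_timestamp) as diff
from event_log_detail
group by press_id
having max(created_timestamp) >= min(created_timestamp) + interval '1 day';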
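The HAVING approach can be tried out locally with a small sketch. This one uses Python's built-in sqlite3 module rather than PostgreSQL, so it swaps the interval arithmetic for SQLite's julianday() function; the function name presses_spanning is made up for illustration, and the one-digit hour in the sample data is zero-padded so SQLite can parse it.

```python
import sqlite3

# Sample rows from the question: (press_id, cycle_number, created_timestamp).
# The hour "8:33" is zero-padded to "08:33" so SQLite's date functions accept it.
ROWS = [
    (1, 1, "2020-02-07 16:07:52"),
    (1, 2, "2020-02-07 16:07:53"),
    (1, 3, "2020-02-07 16:07:54"),
    (1, 4, "2020-04-01 13:23:10"),
    (2, 1, "2020-01-13 08:33:23"),
    (2, 2, "2020-01-13 08:33:24"),
    (2, 3, "2020-01-13 08:33:25"),
    (3, 1, "2020-02-21 18:45:44"),
    (3, 2, "2020-02-21 18:45:45"),
    (3, 3, "2020-02-26 14:22:12"),
]


def presses_spanning(min_days):
    """Return press_ids whose earliest and latest timestamps are >= min_days apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_log_detail "
        "(press_id INTEGER, cycle_number INTEGER, created_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO event_log_detail VALUES (?, ?, ?)", ROWS)
    # julianday() converts a timestamp to a fractional day count, so the
    # difference of two julianday values is a duration in days.
    result = conn.execute(
        """
        SELECT press_id
        FROM event_log_detail
        GROUP BY press_id
        HAVING julianday(MAX(created_timestamp))
             - julianday(MIN(created_timestamp)) >= ?
        ORDER BY press_id
        """,
        (min_days,),
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


print(presses_spanning(1))
```

With the sample data this keeps presses 1 and 3, since press 2's cycles are only seconds apart.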

Related

Generating columns for daily stats in SQL

I have a table that currently looks like this (simplified to illustrate my issue):
Thing| Date
1 2022-12-12
2 2022-11-05
3 2022-11-18
4 2022-12-01
1 2022-11-02
2 2022-11-21
5 2022-12-03
5 2022-12-08
2 2022-11-18
1 2022-11-20
I would like to generate the following:
Thing| 2022-11 | 2022-12
1 2 1
2 3 0
3 1 0
4 0 1
5 0 2
I'm new to SQL and can't quite figure this out - would I use some sort of FOR loop equivalent in my SELECT clause? I'm happy to figure out the exact syntax myself, I just need someone to point me in the right direction.
Thank you!
You can use conditional aggregation, as follows:
Select Thing,
Count(Case When Date Between '2022-11-01' And '2022-11-30' Then 1 End) As '2022-11',
Count(Case When Date Between '2022-12-01' And '2022-12-31' Then 1 End) As '2022-12'
From table_name
Group By Thing
Order By Thing
The count function counts only non-null values. For each row that does not match the condition, the CASE expression has no ELSE branch and therefore returns NULL, so that row is not counted.
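As a rough, runnable illustration of conditional aggregation, here is a sketch using Python's built-in sqlite3 module with the sample rows from the question. The table name things is an assumption (the answer only says table_name), and the column aliases use double quotes, which is the standard identifier quoting SQLite expects.

```python
import sqlite3

# Sample rows from the question: (Thing, Date).
ROWS = [
    (1, "2022-12-12"), (2, "2022-11-05"), (3, "2022-11-18"), (4, "2022-12-01"),
    (1, "2022-11-02"), (2, "2022-11-21"), (5, "2022-12-03"), (5, "2022-12-08"),
    (2, "2022-11-18"), (1, "2022-11-20"),
]

conn = sqlite3.connect(":memory:")
# "things" is a hypothetical table name used only for this sketch.
conn.execute('CREATE TABLE things (Thing INTEGER, "Date" TEXT)')
conn.executemany("INSERT INTO things VALUES (?, ?)", ROWS)

# Each CASE yields 1 for a matching row and NULL otherwise;
# COUNT skips the NULLs, so each column counts one month only.
monthly_counts = conn.execute(
    """
    SELECT Thing,
           COUNT(CASE WHEN "Date" BETWEEN '2022-11-01' AND '2022-11-30' THEN 1 END) AS "2022-11",
           COUNT(CASE WHEN "Date" BETWEEN '2022-12-01' AND '2022-12-31' THEN 1 END) AS "2022-12"
    FROM things
    GROUP BY Thing
    ORDER BY Thing
    """
).fetchall()
conn.close()

for row in monthly_counts:
    print(row)
```

Each additional month needs its own COUNT(CASE ...) column, so the list of months has to be known when the query is written.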

Return rows depending on the time between them

The problem is that I want to get every n-th record from a table, but based on datetime.
I have a table where I add a record every 30 minutes with the current state of each of my Sub objects, something like below:
Id | SubId | Color | Timestamp
1 | 7EB43D1D-7274-41C4-35DA-08D727A424E6 | orange | 2022-06-27 08:00:17.9843893
2 | A8FDBB08-3747-4B93-BC66-08D7382060CE | purple | 2022-06-27 08:00:17.9843893
3 | 7EB43D1D-7274-41C4-35DA-08D727A424E6 | red | 2022-06-27 08:30:15.7043893
4 | A8FDBB08-3747-4B93-BC66-08D7382060CE | blue | 2022-06-27 08:30:15.7043893
5 | 7EB43D1D-7274-41C4-35DA-08D727A424E6 | yellow | 2022-06-27 09:00:18.2841893
6 | A8FDBB08-3747-4B93-BC66-08D7382060CE | orange | 2022-06-27 09:00:18.2841893
Now I need to get the points for one Sub object in a certain period. But I don't want to get all entries, because I can end up with too many points. I just want 1 per hour or 1 per day (it may change).
I already tried ROW_NUMBER, since I know that I add a point every 30 minutes, but because I need a WHERE clause on SubId I might end up with an incorrect result (I add or remove those Sub objects in the meantime).
SELECT * FROM (
SELECT [Id]
,[SubId]
,[Color]
,[Timestamp]
, ROW_NUMBER() OVER (ORDER BY OccupancyHistoryId) as rownum
FROM [dbo].[Table]) AS t
WHERE t.SubId = '7EB43D1D-7274-41C4-35DA-08D727A424E6' AND t.rownum % 2 = 0
Am I missing something obvious? Or maybe my approach is wrong?
Expected result: e.g. records from 2022-06-27 to 2022-06-28, but only 1 per 2 hours.
Id | SubId | Color | Timestamp
1 | 7EB43D1D-7274-41C4-35DA-08D727A424E6 | orange | 2022-06-27 08:00:17.9843893
5 | 7EB43D1D-7274-41C4-35DA-08D727A424E6 | yellow | 2022-06-27 10:00:18.2841893
10 | 7EB43D1D-7274-41C4-35DA-08D727A424E6 | orange | 2022-06-27 12:00:11.2821893
Thanks to @ourmandave's comments, I was able to resolve the problem. I didn't notice that I can use DATEDIFF with %.
So, to get only one entry per two hours, I simply write the query like this:
SELECT *
FROM [dbo].[Table] WHERE DateDiff(Minute, 0, TimestampUtc) % 120 = 0
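To see what the modulo filter actually keeps, here is a small sketch in plain Python that mimics the idea. It assumes SQL Server semantics for DATEDIFF(minute, 0, ts), namely counting minute boundaries since 1900-01-01 00:00, so seconds are truncated before the comparison; the helper names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: date 0 in SQL Server corresponds to 1900-01-01 00:00:00.
BASE = datetime(1900, 1, 1)


def minutes_since_base(ts):
    """Whole minutes from BASE to ts, ignoring seconds (like DATEDIFF(minute, 0, ts))."""
    truncated = ts.replace(second=0, microsecond=0)
    return (truncated - BASE) // timedelta(minutes=1)


def keep_every(rows, step_minutes):
    """Keep the ids of rows whose minute count is an exact multiple of step_minutes."""
    return [row_id for row_id, ts in rows if minutes_since_base(ts) % step_minutes == 0]


# Rows for one SubId from the question: (Id, Timestamp), fractional seconds dropped.
rows = [
    (1, datetime(2022, 6, 27, 8, 0, 17)),
    (3, datetime(2022, 6, 27, 8, 30, 15)),
    (5, datetime(2022, 6, 27, 9, 0, 18)),
]

print(keep_every(rows, 120))  # one row per two hours
print(keep_every(rows, 60))   # one row per hour
```

Note that this kind of filter only matches rows written during a minute that is an exact multiple of the step. A row logged at 08:01 instead of 08:00 would be dropped, so it relies on the 30-minute job firing within the same minute each time.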

Select maximum value where another column is used for the grouping

I'm trying to join several tables, where one of the tables is acting as a
key-value store, and then after the joins find the maximum value in a
column less than another column. As a simplified example, I have the following three tables:
Documents:
DocumentID | Filename | LatestRevision
1 | D1001.SLDDRW | 18
2 | P5002.SLDPRT | 10
Variables:
VariableID | VariableName
1 | DateReleased
2 | Change
3 | Description
VariableValues:
DocumentID | VariableID | Revision | Value
1 | 2 | 1 | Created
1 | 3 | 1 | Drawing
1 | 2 | 3 | Changed Dimension
1 | 1 | 4 | 2021-02-01
1 | 2 | 11 | Corrected typos
1 | 1 | 16 | 2021-02-25
2 | 3 | 1 | Generic part
2 | 3 | 5 | Screw
2 | 2 | 4 | 2021-02-24
I can use the LEFT JOIN/IS NULL thing to get the latest version of
variables relatively easily (see http://sqlfiddle.com/#!7/5982d/3/0).
What I want is the latest version of variables that are less than or equal
to a revision which has a DateReleased, for example:
DocumentID | Filename | Variable | Value | VariableRev | DateReleased | ReleasedRev
1 | D1001.SLDDRW | Change | Changed Dimension | 3 | 2021-02-01 | 4
1 | D1001.SLDDRW | Description | Drawing | 1 | 2021-02-01 | 4
1 | D1001.SLDDRW | Description | Drawing | 1 | 2021-02-25 | 16
1 | D1001.SLDDRW | Change | Corrected Typos | 11 | 2021-02-25 | 16
2 | P5002.SLDPRT | Description | Generic Part | 1 | 2021-02-24 | 4
How do I do this?
I figured this out. Add another JOIN at the start that brings in a second copy of the VariableValues table, selecting only the DateReleased variables, then make sure that every VariableValues revision selected is less than or equal to the revision of that DateReleased row. I think the LEFT JOIN has to be added after this table.
The example at http://sqlfiddle.com/#!9/bd6068/3/0 shows this better.
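The linked fiddle is not reproduced here, so the following is only a sketch of how the described joins might look, run through Python's built-in sqlite3 module. The aliases rel, vv, and newer are invented for illustration, and the sample loads only document 1's rows from the question, because document 2 has no row stored under VariableID 1 in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Documents (DocumentID INTEGER, Filename TEXT, LatestRevision INTEGER);
    CREATE TABLE Variables (VariableID INTEGER, VariableName TEXT);
    CREATE TABLE VariableValues (DocumentID INTEGER, VariableID INTEGER,
                                 Revision INTEGER, Value TEXT);
    INSERT INTO Documents VALUES (1, 'D1001.SLDDRW', 18);
    INSERT INTO Variables VALUES (1, 'DateReleased'), (2, 'Change'), (3, 'Description');
    INSERT INTO VariableValues VALUES
        (1, 2, 1, 'Created'),
        (1, 3, 1, 'Drawing'),
        (1, 2, 3, 'Changed Dimension'),
        (1, 1, 4, '2021-02-01'),
        (1, 2, 11, 'Corrected typos'),
        (1, 1, 16, '2021-02-25');
    """
)

released_values = conn.execute(
    """
    SELECT d.DocumentID, d.Filename, v.VariableName, vv.Value,
           vv.Revision AS VariableRev, rel.Value AS DateReleased,
           rel.Revision AS ReleasedRev
    FROM Documents AS d
    -- rel: one row per released revision (VariableID 1 = DateReleased)
    JOIN VariableValues AS rel
      ON rel.DocumentID = d.DocumentID AND rel.VariableID = 1
    -- vv: candidate values at or before that released revision
    JOIN VariableValues AS vv
      ON vv.DocumentID = d.DocumentID AND vv.VariableID <> 1
     AND vv.Revision <= rel.Revision
    JOIN Variables AS v ON v.VariableID = vv.VariableID
    -- newer: LEFT JOIN / IS NULL keeps only the latest candidate per variable
    LEFT JOIN VariableValues AS newer
      ON newer.DocumentID = vv.DocumentID AND newer.VariableID = vv.VariableID
     AND newer.Revision > vv.Revision AND newer.Revision <= rel.Revision
    WHERE newer.DocumentID IS NULL
    ORDER BY rel.Revision, v.VariableName
    """
).fetchall()
conn.close()

for row in released_values:
    print(row)
```

The key detail is that the "newer" anti-join is bounded by rel.Revision as well, so a later change (such as revision 11) does not hide the value that was current at an earlier release (revision 4).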

Calculate time difference from previous record

I have a set of data where I want to determine the difference in days between the Begin_Time and End_Time for every 2 records, to determine the processing time. I'm familiar with DateDiff('d', End_Time, Begin_Time) to determine the processing time on the same row, but how do I determine this against the previous record? For example, something like DateDiff(Record2.Begin_Time, Record1.End_Time), then DateDiff(Record4.Begin_Time, Record3.End_Time), then DateDiff(Record6.Begin_Time, Record5.End_Time), etc. It doesn't have to use the DateDiff function; I'm just using it to illustrate my question. Thanks.
Record Begin_Time End_Time Processing_Time
1 11/23/2020 11/24/2020 1
2 11/23/2020 11/24/2020 1
3 11/30/2020 11/30/2020 0
4 11/30/2020 11/30/2020 0
5 11/2/2020 11/3/2020 1
6 11/2/2020 11/3/2020 1
7 11/3/2020 11/5/2020 2
8 11/3/2020 11/5/2020 2
An approach could be like this (each even record is joined to the odd record just before it):
Select DateDiff(YourTableEven.Begin_time, YourTableOdd.End_Time)
From YourTable AS YourTableEven
Join YourTable AS YourTableOdd ON YourTableOdd.Record = YourTableEven.Record - 1
Where YourTableEven.Record % 2 = 0
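A self-join of this kind can be checked with a short sketch using Python's built-in sqlite3 module. SQLite has no DateDiff, so this version subtracts julianday() values instead; the table name processing and the ISO date format are assumptions made for the example, with the sample rows taken from the question.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings:
# (Record, Begin_Time, End_Time).
ROWS = [
    (1, "2020-11-23", "2020-11-24"),
    (2, "2020-11-23", "2020-11-24"),
    (3, "2020-11-30", "2020-11-30"),
    (4, "2020-11-30", "2020-11-30"),
    (5, "2020-11-02", "2020-11-03"),
    (6, "2020-11-02", "2020-11-03"),
    (7, "2020-11-03", "2020-11-05"),
    (8, "2020-11-03", "2020-11-05"),
]

conn = sqlite3.connect(":memory:")
# "processing" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE processing (Record INTEGER, Begin_Time TEXT, End_Time TEXT)")
conn.executemany("INSERT INTO processing VALUES (?, ?, ?)", ROWS)

# Pair each even record with the odd record before it and take
# even.Begin_Time minus odd.End_Time in whole days.
pair_diffs = conn.execute(
    """
    SELECT e.Record,
           CAST(julianday(e.Begin_Time) - julianday(o.End_Time) AS INTEGER) AS days
    FROM processing AS e
    JOIN processing AS o ON o.Record = e.Record - 1
    WHERE e.Record % 2 = 0
    ORDER BY e.Record
    """
).fetchall()
conn.close()

print(pair_diffs)
```

On databases that support window functions, LAG(End_Time) OVER (ORDER BY Record) is another way to reach the previous row without a self-join.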

add column with fixed values for each value of another column Redshift

I have the following table and want to add a date range for each user.
How can I achieve this?
If this is possible with a query in Redshift, that would be useful.
If not, what is an efficient way to create this in Python pandas? The data has 8 lakh (800,000) records.
Given this dataframe df:
userid username
0 1 a
1 2 b
2 3 c
you can use numpy repeat and tile:
import numpy as np
import pandas as pd
dr = pd.date_range('2020-01-01','2020-01-03')
df = pd.DataFrame(np.repeat(df.to_numpy(), len(dr), 0), columns=df.columns).assign(date=np.tile(dr.to_numpy(), len(df)))
Result:
userid username date
0 1 a 2020-01-01
1 1 a 2020-01-02
2 1 a 2020-01-03
3 2 b 2020-01-01
4 2 b 2020-01-02
5 2 b 2020-01-03
6 3 c 2020-01-01
7 3 c 2020-01-02
8 3 c 2020-01-03
In SQL this is simple too: just cross join with the list of dates you want to add to each row (replicating rows). You can see in your example that 3 rows and 3 dates result in 9 rows. (Untested explanatory code:)
select userid, username, d."date" from <table> cross join (select '2020-01-01'::date as "date" union all select '2020-02-01'::date union all select '2020-03-01'::date) as d;
Now the problem with this simple approach is that if you are dealing with large tables and long lists of dates, the multiplication will kill you. 10 billion rows by 5,000 dates is 50 trillion resulting rows; producing this will take a long time and storing it will take lots of disk space. For small tables and short lists of dates this works fine.
If you are on the "big" side of things you will likely need to rethink what you are trying to do. Since you are using Redshift there is a possibility that you may need to do this.
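The cross join idea, and the way its row count multiplies, can be sketched with Python's built-in sqlite3 module. This is SQLite rather than Redshift, and the table names users and dates are assumptions for the example; the rows mirror the small dataframe shown earlier.

```python
import sqlite3

users = [(1, "a"), (2, "b"), (3, "c")]
dates = [("2020-01-01",), ("2020-01-02",), ("2020-01-03",)]

conn = sqlite3.connect(":memory:")
# Hypothetical table names for this sketch.
conn.execute("CREATE TABLE users (userid INTEGER, username TEXT)")
conn.execute('CREATE TABLE dates ("date" TEXT)')
conn.executemany("INSERT INTO users VALUES (?, ?)", users)
conn.executemany("INSERT INTO dates VALUES (?)", dates)

# CROSS JOIN pairs every user with every date.
expanded = conn.execute(
    """
    SELECT u.userid, u.username, d."date"
    FROM users AS u
    CROSS JOIN dates AS d
    ORDER BY u.userid, d."date"
    """
).fetchall()
conn.close()

# The result size is always (number of users) * (number of dates).
expected_rows = len(users) * len(dates)
print(len(expanded), expected_rows)
```

Multiplying the two row counts before running the real query gives a quick estimate of whether the result is practical to materialize.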