How to sum the minutes of each activity in PostgreSQL? - sql

The column "activitie_time_enter" has the times.
The column "activitie_still" indicates the type of activity.
The column "activitie_walking" indicates the other type of activity.
Table example:
| activitie_time_enter | activitie_still | activitie_walking |
| -------------------- | --------------- | ----------------- |
| 17:30:20             | Still           |                   |
| 17:31:32             | Still           |                   |
| 17:32:24             |                 | Walking           |
| 17:33:37             |                 | Walking           |
| 17:34:20             | Still           |                   |
| 17:35:37             | Still           |                   |
| 17:45:13             | Still           |                   |
| 17:50:23             | Still           |                   |
| 17:51:32             |                 | Walking           |
What I need is to sum up the total minutes for each activity separately.
Any suggestions or solution?

First calculate the duration of each activity (the WITH CTE), then do a conditional sum.
with t as
(
    select
        *,
        lead(activitie_time_enter) over (order by activitie_time_enter) - activitie_time_enter as duration
    from _table
)
select
    sum(duration) filter (where activitie_still = 'Still') as total_still,
    sum(duration) filter (where activitie_walking = 'Walking') as total_walking
from t;
/** Result:
total_still|total_walking|
-----------+-------------+
00:19:16| 00:01:56|
*/
BTW do you really need two columns (activitie_still and activitie_walking)? A single activity column with those values will do. That would allow more activities (Running, Sleeping, Working, etc.) without having to change the table structure.
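If you do go that route, the query gets simpler as well. A minimal sketch, assuming the table is refactored to a single activity column holding values such as 'Still' and 'Walking':

with t as
(
    select
        activity,
        lead(activitie_time_enter) over (order by activitie_time_enter) - activitie_time_enter as duration
    from _table
)
select activity, sum(duration) as total
from t
group by activity;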

Related

Dividing sum results

I'm really sorry as this was probably answered before, but I couldn't find something that solved the problem.
In this case, I'm trying to get the result of dividing two sums in the same column.
| Id | month | budget | sales |
| -- | ----- | ------ | ----- |
| 1 | jan | 1000 | 800 |
| 2 | jan | 1000 | 850 |
| 1 | feb | 1200 | 800 |
| 2 | feb | 1100 | 850 |
What I want is to get the % of completion for each id and month (for example: 0.8 or 80% in a fifth column for id 1 in jan).
I have something like
sel
id,
month,
sum (daily_budget) as budget,
sum (daily_sales) as sales,
budget/sales over (partition by 1,2) as efectivenes
from sales
group by 1,2
I know I'm doing this wrong, but I'm kinda new with SQL and can't find the way :|
Thanks!
This should do it
CAST(ROUND(SUM(daily_sales) * 100.00 / SUM(daily_budget), 1) AS DECIMAL(5,2)) AS Effectiveness
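For context, a minimal sketch of the full grouped query that the expression would sit in, assuming the sales table and daily_budget/daily_sales columns from the question:

SELECT
    id,
    month,
    SUM(daily_budget) AS budget,
    SUM(daily_sales) AS sales,
    CAST(ROUND(SUM(daily_sales) * 100.00 / SUM(daily_budget), 1) AS DECIMAL(5,2)) AS Effectiveness
FROM sales
GROUP BY id, month;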
I'm new at SQL too but maybe I can help. Try this?
sel
id,
month,
sum(daily_budget) as budget,
sum(daily_sales) as sales,
sum(daily_sales) / sum(daily_budget) as efectivenes
from sales
group by id, month
If you want to ALTER your table so that it contains a fifth column where the result of budget/sales is automatically calculated, all you need to do is add the formula to an auto-generated column. The example I am about to show is based on MySQL.
1. Open MySQL Workbench.
2. Find the table you wish to modify in the Navigator pane, right-click it and select "Alter Table".
3. Add a new column definition. Make sure you select the NN (Not Null) and G (Generated Column) checkboxes.
4. In the Default/Expression column, simply enter the expression budget / sales.
Once you run your next query, you should see your column generated and populated with the calculated results. If you simply want the SQL statement to do the same from the console, it will be something like this: ALTER table YOUR_TABLE_NAME add result FLOAT as (budget / sales);
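For reference, the explicit generated-column syntax (MySQL 5.7+) looks like this; STORED persists the computed value, while the default VIRTUAL computes it on read:

-- Explicit generated-column form of the same ALTER statement
ALTER TABLE YOUR_TABLE_NAME
    ADD COLUMN result FLOAT GENERATED ALWAYS AS (budget / sales) STORED;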

Get the difference in time between multiple rows with the same column name

I need to get the time difference between two dates on different rows. This part is okay, but I can have instances of the same title. A quick example will explain things some more.
Lets say we have a table with the following records:
| ID | Title | Date |
| ----- | ------- |--------------------|
| 1 | Down |2021-03-07 12:05:00 |
| 2 | Up |2021-03-07 13:05:00 |
| 3 | Down |2021-03-07 10:30:00 |
| 4 | Up |2021-03-07 11:00:00 |
I basically need to get the time difference between the first "Down" and "Up". So ID 1 & 2 = 1 hour.
Then ID 3 & 4 = 30 mins, and so on for the amount of "Down" and "Up" rows there are.
(These will always be grouped together one after another)
It doesn't matter if the results are separate or a SUM of all the differences.
I'm trying to get this done without a temp table.
Thank you.
This can be done using analytic functions; which ones are available depends on your SQL engine. The idea is to bring the next row's value onto the same row as the one you need, so you can calculate the diff/sum.
In the case above it would look something like this:
SELECT
    id,
    title,
    Date AS startdate,
    LEAD(Date, 1) OVER (ORDER BY id) AS enddate
FROM
    table;
Once you have it on the same row, you can carry out your time difference operation.
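As a concrete follow-up, a minimal sketch assuming SQL Server syntax (DATEDIFF) and a hypothetical table name your_table: keep only the "Down" rows after the LEAD step and take the difference in minutes.

WITH paired AS (
    SELECT
        ID,
        Title,
        [Date] AS startdate,
        LEAD([Date]) OVER (ORDER BY ID) AS enddate   -- next row's Date, paired onto this row
    FROM your_table   -- hypothetical table name
)
SELECT
    ID,
    DATEDIFF(MINUTE, startdate, enddate) AS diff_minutes   -- 60 for IDs 1 & 2, 30 for IDs 3 & 4
FROM paired
WHERE Title = 'Down';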

Is there a way to calculate a SUM of a Count Alias in SQL?

I am trying to create a custom SQL report that will give me a percentage of DispositionCodes that are clicked after a customer service rep ends a call with a customer.
I am currently using a COUNT Alias to count how many times a Disposition code is assigned to a customer call. I would then like to summarize that DispositionCount alias into another column called "Total". Then I would like to see the percentage of times that a disposition code is selected by calculating DispositionCount / Total. Is it possible to SUM an alias to give me a Total count, and then calculate a percentage based off of two Alias columns?
CURRENT QUERY:
SELECT
WrapupData,
ISNULL(WrapupData, 'No Dispos Code Entered') as DispositionCode,
COUNT(CASE WHEN WrapupData IS NULL THEN 0 ELSE 1 END) AS DispositionCount
FROM Termination_Call_Detail tcd
LEFT JOIN dbo.t_Call_Type ct ON ct.CallTypeID = tcd.CallTypeID
GROUP BY
WrapupData
CURRENT OUTPUT
+---------------------+------------------------+------------------+
| WrapupData          | DispositionCode        | DispositionCount |
+---------------------+------------------------+------------------+
| NULL                | No Dispos Code Entered | 8                |
| Appointment Request | Appointment Request    | 3                |
+---------------------+------------------------+------------------+
DESIRED OUTPUT
+---------------------+------------------------+------------------+-------+------------+
| WrapupData          | DispositionCode        | DispositionCount | Total | Percentage |
+---------------------+------------------------+------------------+-------+------------+
| NULL                | No Dispos Code Entered | 8                | 11    | 72.72      |
| Appointment Request | Appointment Request    | 3                | 11    | 27.27      |
+---------------------+------------------------+------------------+-------+------------+
I have tried count(sum(WrapupData))
but WrapupData is varchar and invalid for sum operator.
I have also tried count(sum(DispositionCount))
but DispositionCount comes back as an Invalid column name (I'm assuming because it's an Alias and is only temporary)
Any help or suggestions would be greatly appreciated!
You could use analytic functions here:
SELECT
WrapupData,
ISNULL(WrapupData, 'No Dispos Code Entered') AS DispositionCode,
COUNT(*) AS DispositionCount,
SUM(COUNT(*)) OVER () AS Total,
100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS Percentage
FROM Termination_Call_Detail tcd
LEFT JOIN dbo.t_Call_Type ct
ON ct.CallTypeID = tcd.CallTypeID
GROUP BY
WrapupData;
The trick here is to use SUM() with a window over the entire table, post-aggregation, to find the total. We can then find the percentage by normalizing each group's count against this sum.
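If you want the two-decimal figures shown in the desired output, you can cast the percentage expression, for example (note this rounds rather than truncates):

CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5,2)) AS Percentage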

JOIN or analytic function to match different sensors on nearby timestamps within a large dataset?

I have a large dataset consisting of four sensors in a single stream, but for simplicity's sake let's reduce that to two sensors that transmit at approximately (but not exactly) the same times, like this:
+---------+-------------+-------+
| Sensor | Time | Value |
+---------+-------------+-------+
| SensorA | 10:00:01.14 | 10 |
| SensorB | 10:00:01.06 | 8 |
| SensorA | 10:00:02.15 | 11 |
| SensorB | 10:00:02.07 | 9 |
| SensorA | 10:00:03.14 | 13 |
| SensorA | 10:00:04.09 | 12 |
| SensorB | 10:00:04.13 | 6 |
+---------+-------------+-------+
I am trying to find the difference between SensorA and SensorB when their readings are within a half-second of each other. Like this:
+-------------+-------+
| Trunc_Time | Diff |
+-------------+-------+
| 10:00:01 | 2 |
| 10:00:02 | 2 |
| 10:00:04 | 6 |
+-------------+-------+
I know I could write queries to put each sensor in its own table (say SensorA_table and SensorB_table), and then join those tables like this:
SELECT
TIMESTAMP_TRUNC(a.Time, SECOND) as truncated_sec,
a.Value - b.Value as sensor_diff
FROM SensorA_table AS a JOIN SensorB_Table AS b
ON b.Time BETWEEN TIMESTAMP_SUB(a.Time, INTERVAL 500 MILLISECOND) AND TIMESTAMP_ADD(a.Time, INTERVAL 500 MILLISECOND)
But that seems very expensive to make every row of the SensorA_table compare against every row of the SensorB_table, given that the sensor tables are each about 10 TB. Or does partitioning automatically take care of this and only look at one block of SensorB's table per row of SensorA's table?
Either way, I am wondering if there is a better way to do this than a full JOIN. Since the matching values are all coming from within a few rows of each other in the original table, it seems like an analytic function might be able to look at a smaller amount of data at a time, but because we can't guarantee alternating rows of A & B, there's no clear LAG or LEAD offset that would always return the correct row.
Is it a matter of writing an analytic functions to return a few LAG and LEAD rows for each row, then evaluate each of those rows with a CASE statement to see if it is the correct row, then calculating the value? Or is there a way of doing a join against an analytic function's window?
Thanks for any guidance here.
One method uses lag(). Something like this:
select st.time, st.value - st.prev_value
from (select st.*,
lag(sensor) over (order by time, sensor) as prev_sensor,
lag(time) over (order by time, sensor) as prev_time,
lag(value) over (order by time, sensor) as prev_value
from sensor_table st
) st
where st.sensor <> st.prev_sensor and
      st.prev_time > timestamp_sub(st.time, interval 1 second)
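Building on that, here is a sketch of how you might get the trunc_time/diff output from the question, assuming BigQuery-style timestamp functions (as used in the question) and a half-second window; the CASE orients the difference as SensorA minus SensorB:

select timestamp_trunc(time, second) as trunc_time,
       -- orient the difference regardless of which sensor's row carries the pair
       case when sensor = 'SensorA' then value - prev_value
            else prev_value - value end as diff
from (select st.*,
             lag(sensor) over (order by time, sensor) as prev_sensor,
             lag(time) over (order by time, sensor) as prev_time,
             lag(value) over (order by time, sensor) as prev_value
      from sensor_table st
     ) st
where sensor <> prev_sensor
  and prev_time >= timestamp_sub(time, interval 500 millisecond);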

Crosstab splitting results due to presence of unrelated field

I'm using postgres 9.1 with tablefunc:crosstab
I have a table with the following structure:
CREATE TABLE marketdata.instrument_data
(
dt date NOT NULL,
instrument text NOT NULL,
field text NOT NULL,
value numeric,
CONSTRAINT instrument_data_pk PRIMARY KEY (dt , instrument , field )
)
This is populated by a script that fetches data daily. So it might look like so:
| dt | instrument | field | value |
|------------+-------------------+-----------+-------|
| 2014-05-23 | SGX.MiniJGB.2014U | PX_VOLUME | 1 |
| 2014-05-23 | SGX.MiniJGB.2014U | OPEN_INT | 2 |
I then use the following crosstab query to pivot the table:
select dt, instrument, vol, oi
FROM crosstab($$
select dt, instrument, field, value
from marketdata.instrument_data
where field = 'PX_VOLUME' or field = 'OPEN_INT'
$$::text, $$VALUES ('PX_VOLUME'),('OPEN_INT')$$::text
) vol(dt date, instrument text, vol numeric, oi numeric);
Running this I get the result:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | 2 |
The problem:
When running this with lot of real data in the table, I noticed that for some fields the function was splitting the result over two rows:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | |
| 2014-05-23 | SGX.MiniJGB.2014U | | 2 |
I checked that the dt and instrument fields were identical and produced a work-around by grouping the output of the crosstab.
Analysis
I've discovered that it's the presence of one other entry in the input table that causes the output to be split over 2 rows. If I have the input as follows:
| dt | instrument | field | value |
|------------+-------------------+-----------+-------|
| 2014-04-23 | EUX.Bund.2014M | PX_VOLUME | 0 |
| 2014-05-23 | SGX.MiniJGB.2014U | PX_VOLUME | 1 |
| 2014-05-23 | SGX.MiniJGB.2014U | OPEN_INT | 2 |
I get:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-04-23 | EUX.Bund.2014M | 0 | |
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | |
| 2014-05-23 | SGX.MiniJGB.2014U | | 2 |
Where it gets really weird...
If I recreate the above input table manually then the output is as we would expect, combined into a single row.
If I run:
update marketdata.instrument_data
set instrument = instrument
where instrument = 'EUX.Bund.2014M'
Then again, the output is as we would expect, which is surprising as all I've done is set the instrument field to itself.
So I can only conclude that there is some hidden character/encoding issue in that Bund entry that is breaking crosstab.
Are there any suggestions as to how I can determine what it is about that entry that breaks crosstab?
Edit:
I ran the following on the raw table to try and see any hidden characters:
select instrument, encode(instrument::bytea, 'escape')
from marketdata.bloomberg_future_data_temp
where instrument = 'EUX.Bund.2014M';
And got:
| instrument | encode |
|----------------+----------------|
| EUX.Bund.2014M | EUX.Bund.2014M |
Two problems.
1. ORDER BY is required.
The manual:
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row.
With the one-parameter form of crosstab(), ORDER BY 1,2 would be necessary.
2. One column with distinct values per group.
The manual:
crosstab(text source_sql, text category_sql)
source_sql is a SQL statement that produces the source set of data.
...
This statement must return one row_name column, one category column,
and one value column. It may also have one or more "extra" columns.
The row_name column must be first. The category and value columns must
be the last two columns, in that order. Any columns between row_name
and category are treated as "extra". The "extra" columns are expected
to be the same for all rows with the same row_name value.
Emphasis mine: one column. It seems like you want to form groups over two columns, which does not work as you desire.
Related answer:
Pivot on Multiple Columns using Tablefunc
The solution depends on what you actually want to achieve. That isn't stated in your question; you silently assumed the function would do what you hope for.
Solution
I guess you want to group on both leading columns: (dt, instrument). You could play tricks with concatenating or arrays, but that would be slow and / or unreliable. I suggest a cleaner and faster approach with a window function rank() or dense_rank() to produce a single-column unique value per desired group. This is very cheap, because ordering rows is the main cost and the order of the frame is identical to the required order anyway. You can remove the added column in the outer query if desired:
SELECT dt, instrument, vol, oi
FROM crosstab(
$$SELECT dense_rank() OVER (ORDER BY dt, instrument) AS rnk
, dt, instrument, field, value
FROM marketdata.instrument_data
WHERE field IN ('PX_VOLUME', 'OPEN_INT')
ORDER BY 1$$
, $$VALUES ('PX_VOLUME'),('OPEN_INT')$$
) vol(rnk int, dt date, instrument text, vol numeric, oi numeric);
More details:
PostgreSQL Crosstab Query
You could run a query that replaces irregular characters with an asterisk:
select regexp_replace(instrument, '[^a-zA-Z0-9]', '*', 'g')
from marketdata.instrument_data
where instrument = 'EUX.Bund.2014M'
Perhaps the instrument = instrument assignment discards trailing whitespace. That would also explain why where instrument = 'EUX.Bund.2014M' matches two values that crosstab sees as different.
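If you want to check for that, a quick follow-up query along those lines (standard Postgres string functions) surfaces the length differences that trailing whitespace or non-printing characters would cause:

select instrument,
       length(instrument) as char_len,          -- character count
       octet_length(instrument) as byte_len,    -- byte count (differs for multi-byte characters)
       instrument <> btrim(instrument) as has_surrounding_whitespace
from marketdata.instrument_data
where instrument like 'EUX.Bund.2014M%';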