SQL Group by joining with time difference - sql

I have this college project with a good focus on the frontend, but I'm struggling with a SQL query (PostgreSQL) that needs to be executed at one of the backend endpoints.
The table I'm speaking of is the following:
id  todo_id  column_id  time_in_status
0   259190   3          0
1   259190   10300      30
2   259190   10001      60
3   259190   10600      90
4   259190   6          30
Put simply, it's a to-do organizer with vertical columns, where each column is identified by its column_id and each row is a task's column-change event.
What I need is to generate a view (or a better alternative, if you can suggest one) from this table that shows how long each task spent in each column_id. Also, for a given todo_id, column_id is not unique, so there could be multiple events on column 10300; the view should group them and sum their times.
For example, the table above would produce a view like this:
id  todo_id  time_in_column_3  time_in_column_10300  time_in_column_10001  ...
0   259190   0                 30                    60                    ...

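-- requires the tablefunc extension: create extension if not exists tablefunc;
select *
from crosstab(
'select todo_id, column_id, sum(time_in_status)::int
from t
group by todo_id, column_id
order by todo_id, column_id',
'values (3), (10300), (10001), (10600), (6)'
)
as t(todo_id int, "time_in_column_3" int, "time_in_column_10300" int, "time_in_column_10001" int, "time_in_column_10600" int, "time_in_column_6" int )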
todo_id  time_in_column_3  time_in_column_10300  time_in_column_10001  time_in_column_10600  time_in_column_6
259190   0                 30                    60                    90                    30
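For anyone who wants to check the grouping logic outside PostgreSQL, here is a minimal sketch using Python's built-in sqlite3 module. It is not the crosstab approach itself: it sums with a plain GROUP BY and pivots in Python. The function name time_per_column and the extra sixth row (a repeat visit to column 10300) are assumptions added for illustration.

```python
import sqlite3

def time_per_column(rows):
    """Sum time_in_status per (todo_id, column_id), then pivot per todo_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id int, todo_id int, column_id int, time_in_status int)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", rows)
    totals = conn.execute(
        "select todo_id, column_id, sum(time_in_status) "
        "from t group by todo_id, column_id"
    ).fetchall()
    conn.close()
    # Pivot in Python: one dict per todo_id, keyed like the desired view columns.
    pivot = {}
    for todo_id, column_id, total in totals:
        pivot.setdefault(todo_id, {})[f"time_in_column_{column_id}"] = total
    return pivot

rows = [
    (0, 259190, 3, 0),
    (1, 259190, 10300, 30),
    (2, 259190, 10001, 60),
    (3, 259190, 10600, 90),
    (4, 259190, 6, 30),
    (5, 259190, 10300, 15),  # hypothetical repeat visit, not in the question's data
]
print(time_per_column(rows))
```

The repeat visit is there to show that several events on the same column_id collapse into one summed value.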

Related

SAS Proc SQL - ranking top nth (3rd) highest for a group of say universities and their price? (HW to be honest)

(This is homework, not going to lie.)
I wrote an ANSI SQL query that correctly produces the required 3rd highest prices (a sample of the table is given further down):
select unique uni, price
from
(
(
select unique uni, price
from
(
select unique uni, price
from table1
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
Now I need to list the 1st, 2nd and 3rd highest prices in one table, in a way that generalizes to the nth highest.
example:
Col1 Col2
uni1 10
uni1 20
uni2 20
uni2 10
uni3 30
uni3 20
uni1 30
(Sorry for the formatting, I haven't been here for a very long time. I appreciate any assistance. I can supply a link to the uni material: I asked the tutor whether I could, and he said yes, but not the whole code, only something like 10%.)
In SAS you can use the proprietary option OUTOBS to restrict how many rows of a result set are output.
Example:
Use OUTOBS=3 to create top 3 table. Then use that table in a subsequent query.
data have;
input x @@; datalines;
10 9 8 7 6 5 4 3 2 1 0
;
proc sql;
reset outobs=3;
create table top3x as
select * from have
order by x descending;
reset outobs=max;
* another query;
quit;
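OUTOBS limits the whole result set, so it gives an overall top 3 rather than a top n per university. As a rough sketch of the per-group variant the question asks about, the following uses Python's sqlite3 module with a correlated subquery instead of SAS. The function name top_n_per_group is an assumption; the table and column names come from the question.

```python
import sqlite3

def top_n_per_group(rows, n):
    """Return the n highest distinct prices for each uni, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table1 (uni text, price int)")
    conn.executemany("insert into table1 values (?, ?)", rows)
    result = conn.execute(
        """
        select distinct a.uni, a.price
        from table1 a
        where (select count(distinct b.price)
               from table1 b
               where b.uni = a.uni and b.price > a.price) < ?
        order by a.uni, a.price desc
        """,
        (n,),
    ).fetchall()
    conn.close()
    return result

sample = [
    ("uni1", 10), ("uni1", 20), ("uni2", 20), ("uni2", 10),
    ("uni3", 30), ("uni3", 20), ("uni1", 30),
]
print(top_n_per_group(sample, 2))
```

A row is kept when fewer than n distinct prices in the same uni are higher than it, so the same query serves for any n.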

SQL percentage usage calculation using 2 columns

Trying to get the percentage usage for a report based on the following columns:
Dept Ext Sec1 Sec2 StartDate EndDate
---------------------------------------------------------------
1 1234 5 5 2017-05-01:08:00:00 2017-05-04:08:00:10
2 1230 8 8 2017-05-01:09:10:00 2017-05-04:09:10:11
1 1234 15 15 2017-05-02:08:01:00 2017-05-04:08:01:20
I need to display the percentage of time the user spent on the phone, based on the total seconds in Sec1, for the time period. If need be, I can create a 3rd column with the percentage total as part of the creation job (the final table is generated from a join query of 2 other tables). Thanks
I had to add these lines to my creatDB query to get the right results:
alter table compinfo.dbo.pabxreport add TotalSec Int
alter table compinfo.dbo.pabxreport add TotalPer Decimal(14,8)
update compinfo.dbo.pabxreport
set TotalSec= (
select sum(billsec1) from pabxreport)
update compinfo.dbo.pabxreport
set TotalPer= (billsec1 * 100.00 / Totalsec)
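The same percentage can also be computed at query time without storing TotalSec and TotalPer. Below is a minimal sketch using Python's sqlite3 module rather than SQL Server. The function name usage_percentages and the reduced three-column table are assumptions; billsec1 is the column name used in the update above.

```python
import sqlite3

def usage_percentages(rows):
    """Return (dept, ext, billsec1, percent of total billsec1) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table pabxreport (dept int, ext int, billsec1 int)")
    conn.executemany("insert into pabxreport values (?, ?, ?)", rows)
    result = conn.execute(
        "select dept, ext, billsec1, "
        "billsec1 * 100.0 / (select sum(billsec1) from pabxreport) "
        "from pabxreport order by rowid"
    ).fetchall()
    conn.close()
    return result

report = [(1, 1234, 5), (2, 1230, 8), (1, 1234, 15)]
for row in usage_percentages(report):
    print(row)
```

Multiplying by 100.0 rather than 100 keeps the division in floating point, which mirrors the 100.00 in the update statement.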

Substitute Parts/Rows Postgresql

I want to create a relationship in Postgresql that allows me to have ‘substitute’ parts. For example, parts with id 1, 2 and 4 are substitutes.
One way to do this would be to set up a sub_id field and fill that with the substitute part’s id. But as you can see from the dataset below, a simple query:
SELECT *
FROM part
WHERE sub_id = 1
would not return all the substitute parts for part AA (part A2 would be missed). How can I ensure all the substitute parts are catered for?
I know that if the sub_id was 1 for part A2, all would be good. However, it’s possible that in real world usage the users end up making a mistake and that would return the wrong result.
id part sub_id
1 AA 1
2 A 1
3 B NULL
4 A2 2
You can try a different approach.
I don't know what your needs are but this is a general idea:
Set up a table that defines a part "group":
partgid parttype partgname
1 A Engine for A319
2 B Wheel for A319
code:
Create Table partgroup
(
partgid int,
parttype text,
partgname text
);
then define parts in the group:
partid partgid partname manufacturerid
12 1 Engine X319 800
13 1 Engine XL319 800
14 1 Engine XFR319 784
15 2 Wheel F1111 341
code:
Create Table parts
(
partid int,
partgid int,
partname text,
manufacturerid int
);
then you can access all parts in a specific group in a simple query:
By ID:
select *
from parts
where partgid='ID'
By name:
select *
from parts
left join partgroup on (partgroup.partgid=parts.partgid)
where partgroup.parttype= 'TYPE NAME'

How to consolidate blocks of time?

I have a derived table with a list of relative seconds to a foreign key (ID):
CREATE TABLE Times (
ID INT
, TimeFrom INT
, TimeTo INT
);
The table contains mostly non-overlapping data, but there are occasions where one record's TimeFrom falls before the TimeTo of another record:
+----+----------+--------+
| ID | TimeFrom | TimeTo |
+----+----------+--------+
| 10 | 10 | 30 |
| 10 | 50 | 70 |
| 10 | 60 | 150 |
| 10 | 75 | 150 |
| .. | ... | ... |
+----+----------+--------+
The result set is meant to be a flattened linear idle report, but with too many of these overlaps I end up with negative time in use. For example, if the window above for ID = 10 was 150 seconds long and I summed the differences of relative seconds to subtract from the window size, I'd wind up with 150-(20+20+90+75)=-55. I tried this approach, and it is what led me to realize there were overlaps that needed to be flattened.
So, what I'm looking for is a solution to flatten the overlaps into one set of times:
+----+----------+--------+
| ID | TimeFrom | TimeTo |
+----+----------+--------+
| 10 | 10 | 30 |
| 10 | 50 | 150 |
| .. | ... | ... |
+----+----------+--------+
Considerations: Performance is very important here, as this is part of a larger query that performs well on its own, and I'd rather not impact its performance much if I can help it.
On a comment regarding "Which seconds have an interval", this is something I have tried for the end result, and am looking for something with better performance. Adapted to my example:
SELECT SUM(C.N)
FROM (
SELECT A.N, ROW_NUMBER()OVER(ORDER BY A.N) RowID
FROM
(SELECT TOP 60 1 N FROM master..spt_values) A
, (SELECT TOP 720 1 N FROM master..spt_values) B
) C
WHERE EXISTS (
SELECT 1
FROM Times SE
WHERE SE.ID = 10
AND SE.TimeFrom <= C.RowID
AND SE.TimeTo >= C.RowID
AND EXISTS (
SELECT 1
FROM Times2 D
WHERE ID = SE.ID
AND D.TimeFrom <= C.RowID
AND D.TimeTo >= C.RowID
)
GROUP BY SE.ID
)
The problem I have with this solution is that I get a Row Count Spool out of the EXISTS query in the query plan, with a number of executions equal to COUNT(C.*). I left the real numbers in that query to illustrate why getting around this approach is for the best: even with a Row Count Spool reducing the cost of the query by quite a bit, its execution count increases the cost of the query as a whole by quite a bit as well.
Further Edit: The end goal is to put this in a procedure, so Table Variables and Temp Tables are also a possible tool to use.
OK. I'm still trying to do this with just one SELECT, but this totally works:
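-- Work on a copy that carries a GroupId for each block of overlapping rows.
DECLARE @tmp TABLE (ID INT, GroupId INT, TimeFrom INT, TimeTo INT)
INSERT INTO @tmp
SELECT ID, 0, TimeFrom, TimeTo
FROM Times
ORDER BY Id, TimeFrom
DECLARE @timeTo int, @id int, @groupId int
SET @groupId = 0
-- "Quirky update": assumes rows are visited in ID, TimeFrom order, which SQL Server does not guarantee.
UPDATE @tmp
SET
@groupId = CASE WHEN id != @id THEN 0
WHEN TimeFrom > @timeTo THEN @groupId + 1
ELSE @groupId END,
GroupId = @groupId,
@timeTo = TimeTo,
@id = id
-- Collapse each group into a single interval.
SELECT Id, MIN(TimeFrom), Max(TimeTo) FROM @tmp
GROUP BY ID, GroupId ORDER BY ID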
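To make the grouping idea easier to follow, here is a sketch of the same sort-and-sweep logic in plain Python. It is not a translation of the T-SQL above: it tracks the furthest TimeTo seen so far in the current block, which is an assumption on my part about the intended behaviour. The function name merge_intervals is also made up for illustration.

```python
def merge_intervals(rows):
    """Merge overlapping (id, time_from, time_to) rows per id.

    Rows are sorted by id and time_from; a row joins the current block
    when it starts at or before the furthest end seen so far in that block.
    """
    merged = []
    for id_, start, end in sorted(rows):
        if merged and merged[-1][0] == id_ and start <= merged[-1][2]:
            # Overlaps (or touches) the current block: extend its end.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            # Different id, or a gap before this row: start a new block.
            merged.append([id_, start, end])
    return [tuple(block) for block in merged]

times = [(10, 10, 30), (10, 50, 70), (10, 60, 150), (10, 75, 150)]
print(merge_intervals(times))
```

Summing TimeTo - TimeFrom over the merged rows then gives the idle time without double counting.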
Left join each row to its successor overlapping row on the same ID value (where such exist).
Now for each row in the result-set of LHS left join RHS the contribution to the elapsed time for the ID is:
isnull(RHS.TimeFrom,LHS.TimeTo) - LHS.TimeFrom as TimeElapsed
Summing these by ID should give you the correct answer.
Note that:
- where there isn't an overlapping successor row the calculation is simply
LHS.TimeTo - LHS.TimeFrom
- where there is an overlapping successor row the calculation will net to
(RHS.TimeFrom - LHS.TimeFrom) + (RHS.TimeTo - RHS.TimeFrom)
which simplifies to
RHS.TimeTo - LHS.TimeFrom
What about something like below (assumes SQL Server 2005 or later, since it uses a CTE):
WITH Overlaps
AS
(
SELECT t1.Id,
TimeFrom = MIN(t1.TimeFrom),
TimeTo = MAX(t2.TimeTo)
FROM dbo.Times t1
INNER JOIN dbo.Times t2 ON t2.Id = t1.Id
AND t2.TimeFrom > t1.TimeFrom
AND t2.TimeFrom < t1.TimeTo
GROUP BY t1.Id
)
SELECT o.Id,
o.TimeFrom,
o.TimeTo
FROM Overlaps o
UNION ALL
SELECT t.Id,
t.TimeFrom,
t.TimeTo
FROM dbo.Times t
INNER JOIN Overlaps o ON o.Id = t.Id
AND (o.TimeFrom > t.TimeFrom OR o.TimeTo < t.TimeTo);
I do not have a lot of data to test with, but it seems decent on the smaller data sets I have.
I also wrapped my head around this issue, and in the end I found that the problem is your data.
You claim (if I get that right) that these entries should reflect the relative times when a user goes idle and comes back.
So you should consider sanitizing your data and refactoring your inserts to produce valid data sets.
For instance, the two lines:
+----+----------+--------+
| ID | TimeFrom | TimeTo |
+----+----------+--------+
| 10 | 50 | 70 |
| 10 | 60 | 150 |
How can it be possible that a user is idle until second 70, but goes idle at second 60? This already implies that he was back at around second 59 at the latest.
I can only assume that this issue comes from different threads and/or browser windows (tabs) in which a user might be using your application, each having its own "idle detection".
So instead of working around the symptoms, you should fix the cause! Why is this data entry inserted into the table? You could avoid it by simply checking whether the user is already idle before inserting a new row.
Create a unique key constraint on ID and TimeTo
Whenever an idle-event is detected, execute the following query:
INSERT IGNORE INTO Times (ID,TimeFrom,TimeTo)VALUES('10', currentTimeStamp, -1);
-- (If the user is already "idle" - nothing will happen)
Whenever a comeback event is detected, execute the following query:
UPDATE Times SET TimeTo=currentTimeStamp WHERE ID='10' and TimeTo=-1
-- (If the user is already "back" - nothing will happen)
The fiddle linked here: http://sqlfiddle.com/#!2/dcb17/1 would reproduce the chain of events for your example, but resulting in a clean and logical set of idle-windows:
ID TIMEFROM TIMETO
10 10 30
10 50 70
10 75 150
Note: The output is slightly different from the output you desired, but I feel it is more accurate for the reason outlined above: a user cannot go idle at second 70 without first returning from his current idle state. He either stays idle (and a second thread/tab runs into the idle event) or he returned in between.
Especially given your need to maximize performance, you should fix the data and not invent a work-around query. This is maybe 3 ms per insert, but could be worth 20 seconds per select!
Edit: If multi-threading or multiple sessions are the cause of the wrong inserts, you would also need to check whether most_recent_come_back_time < now() - idleTimeout. Otherwise a user might come back on tab1 and be recorded as idle on tab2 a few seconds later, because tab2 ran into its idle timeout while the user only refreshed tab1.
I had the 'same' problem once with 'days' (additionally without counting weekends and holidays).
The word "counting" gave me the following idea:
create table Seconds ( sec INT);
insert into Seconds values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9), ...
select count(distinct sec) from times t, seconds s
where s.sec between t.timefrom and t.timeto-1
and id=10;
You can shift the start to 0 (I put the offset 10 in parentheses here):
select count(distinct sec) from times t, seconds s
where s.sec between t.timefrom- (10) and t.timeto- (10)-1
and id=10;
and finally:
select count(distinct sec) from times t, seconds s,
(select min(timefrom) m from times where id=10) as m
where s.sec between t.timefrom-m.m and t.timeto-m.m-1
and id=10;
Additionally, you can "ignore" e.g. 10 seconds by dividing: you lose some precision but gain speed.
select count(distinct sec)*d from times t, seconds s,
(select min(timefrom) m from times where id=10) as m,
(select 10 d) as d
where s.sec between (t.timefrom-m)/d and (t.timeto-m)/d-1
and id=10;
Sure, it depends on the range you have to look at, but a 'day' or two of seconds should work (although I did not test it).
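As a quick check of the counting idea, here is a small sketch of the first query above using Python's sqlite3 module. The function name covered_seconds and the max_sec parameter (how far the Seconds table is filled) are assumptions for illustration; the data is the question's example for ID = 10.

```python
import sqlite3

def covered_seconds(rows, id_, max_sec):
    """Count the distinct seconds covered by any interval of the given id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table times (id int, timefrom int, timeto int)")
    conn.executemany("insert into times values (?, ?, ?)", rows)
    # Helper table holding one row per second from 0 to max_sec.
    conn.execute("create table seconds (sec int)")
    conn.executemany(
        "insert into seconds values (?)", [(s,) for s in range(max_sec + 1)]
    )
    (count,) = conn.execute(
        "select count(distinct s.sec) from times t, seconds s "
        "where s.sec between t.timefrom and t.timeto - 1 and t.id = ?",
        (id_,),
    ).fetchone()
    conn.close()
    return count

times = [(10, 10, 30), (10, 50, 70), (10, 60, 150), (10, 75, 150)]
print(covered_seconds(times, 10, 150))
```

Because each second is counted once no matter how many intervals cover it, overlaps cannot produce negative time in use.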

SUM by Combination in SAS

I want to get from this table:
[ProductCode] [ClientNO] [Fund]
11 3 100
12 4 45
11 3 18
12 4 5
To this one:
[ProductCode] [ClientNO] [Fund]
11 3 118
12 4 50
So basically sum FUND when all the given variables match.
I'm almost there with this statement:
Proc sql;
create table SumByCombination as
select *, sum(Fund) as Total
from FundsData
group by ProductCode,ClientNO
;
quit;
But with this I get all the rows (duplicates) with a SUM column.
Edit: This is what I get.
[ProductCode] [ClientNO] [_SUM_]
11 3 118
12 4 50
11 3 118
12 4 50
I know this should be a no-brainer but I keep getting stuck.
What would be the easiest way to do this in Proc SQL? What about other methods?
Thanks
Stop using SELECT * in your queries. You should explicitly identify the columns that you want the SELECT to return.
Select * is nasty and evil and should very very rarely, if ever, be used.
Here is a query that returns your expected result:
select ProductCode
,ClientNO
,sum(Fund) as Total
from FundsData
group by
ProductCode
,ClientNO
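To see why naming the columns explicitly matters, here is a minimal sketch that runs the same GROUP BY through Python's sqlite3 module instead of SAS. The function name sum_by_combination and the added ORDER BY are assumptions; the table, columns and data are taken from the question.

```python
import sqlite3

def sum_by_combination(rows):
    """Return one (ProductCode, ClientNO, Total) row per combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table FundsData (ProductCode int, ClientNO int, Fund int)")
    conn.executemany("insert into FundsData values (?, ?, ?)", rows)
    result = conn.execute(
        "select ProductCode, ClientNO, sum(Fund) as Total "
        "from FundsData "
        "group by ProductCode, ClientNO "
        "order by ProductCode, ClientNO"
    ).fetchall()
    conn.close()
    return result

funds = [(11, 3, 100), (12, 4, 45), (11, 3, 18), (12, 4, 5)]
print(sum_by_combination(funds))
```

Every selected column is either a GROUP BY column or an aggregate, so each combination appears once.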
You're using SAS, so do it the SAS way - PROC MEANS.
proc means data=fundsdata;
var fund;
class productcode clientno;
types productcode*clientno;
output out=sumbycombination sum(fund)=fund;
run;