Group ranges in table with non-time-based start and end columns - sql

I need to find the ranges in consecutive data points that may have gaps between them, based on double precision start and end columns.
For simplicity, let's call them startPoint and endPoint; they track a position on a line. The difference between endPoint and startPoint denotes a distance. Over this "distance" particular force/effect signal values are captured, and based on those values a state is stored in the table. Each row has a unique id identifier.
Thus, the table looks like the following:
| id | startPoint | endPoint | state |
|----|------------|----------|----------|
| 1 | 0.0 | 5.8 | Active |
| 2 | 5.8 | 7.1 | Inactive |
| 3 | 7.5 | 10.2 | Inactive |
| 4 | 10.2 | 11.3 | Inactive |
| 5 | 11.6 | 12.1 | Active |
| 6 | 12.1 | 12.9 | Active |
I have struggled to come up with a query that works in PostgreSQL and yields the following result:
| startGap | endGap | state |
|------------|----------|----------|
| 0.0 | 5.8 | Active |
| 5.8 | 7.1 | Inactive |
| 7.5 | 11.3 | Inactive |
| 11.6 | 12.9 | Active |
Any help would be greatly appreciated.

Hmmm . . . You can identify where a group starts using lag() and then use cumulative sums:
select min(startPoint) as startPoint, max(endPoint) as endPoint, state
from (select t.*,
             sum( (prev_endPoint is distinct from startPoint)::int ) over (order by startPoint) as grp
      from (select t.*,
                   lag(endPoint) over (partition by state order by startPoint) as prev_endPoint
            from t
           ) t
     ) t
group by state, grp;
To be honest, floating point numbers are rather dangerous, because two values that look the same may not compare as equal. The sum() defining grp is probably better written as:
sum( (abs(prev_endPoint - startPoint) > 0.001)::int ) over (order by startPoint) as grp
I would also suggest that you switch to fixed point representation (numeric) rather than floating point.
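A minimal sketch of that change, using the table name t from the query above; the precision and scale are assumptions, not something stated in the question:
-- precision/scale below are assumed; pick whatever matches your measurements
alter table t
    alter column startPoint type numeric(10, 2),
    alter column endPoint type numeric(10, 2);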

Related

PostgreSQL: update table ensuring rows have unique timestamp (no duplicate unix timestamp)

I have a table of GPS traces with Unix timestamps, as shown below:
SELECT * FROM mytable LIMIT 10;
id | lat | lon | seconds | speed
-----------+------------+------------+------------+-------
536889001 | 41.1794675 | -8.6017187 | 1460465697 | 1.25
536889001 | 41.1794709 | -8.601675 | 1460465698 | 2
536889001 | 41.1794636 | -8.6016337 | 1460465700 | 1.25
536889001 | 41.1794468 | -8.6016014 | 1460465700 | 2.5
536889001 | 41.1794114 | -8.6015662 | 1460465701 | 3.5
536889001 | 41.1794376 | -8.6015672 | 1460465703 | 1.5
536889001 | 41.17944 | -8.6015516 | 1460465703 | 1.5
536889001 | 41.1794315 | -8.6015353 | 1460465704 | 1.5
536889001 | 41.1794367 | -8.6015156 | 1460465705 | 1.25
536889001 | 41.1794337 | -8.6014974 | 1460465706 | 1.75
(10 rows)
Column seconds is the Unix timestamp. I would like to update the table so that only one row is kept for any timestamp that was logged more than once. For example, above we see two rows each at timestamps 1460465700 and 1460465703.
Without a unique id on the row, this is tricky. But assuming that the combination of values is unique, you can use:
update gps
    set . . .
from (select gps.*, count(*) over (partition by id, seconds) as cnt,
             row_number() over (partition by id, seconds order by seconds) as seqnum
      from gps
     ) gps2
where gps2.cnt > 1 and gps2.seqnum = 1 and
      gps2.seconds = gps.seconds and
      gps2.id = gps.id and
      gps2.speed = gps.speed and
      gps2.lat = gps.lat and
      gps2.lon = gps.lon;
I would advise you to add a unique id to the table, so this is much simpler (and guaranteed to work even if the table has duplicates).
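A minimal sketch of adding such an id (the column name row_id is my own choice, not from the question):
-- row_id is a hypothetical name; bigserial fills existing rows with unique values
alter table gps add column row_id bigserial primary key;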

How to find two consecutive rows sorted by date, containing a specific value?

I have a table with the following structure and data in it:
| ID | Date | Result |
|---- |------------ |-------- |
| 1 | 30/04/2020 | + |
| 1 | 01/05/2020 | - |
| 1 | 05/05/2020 | - |
| 2 | 03/05/2020 | - |
| 2 | 04/05/2020 | + |
| 2 | 05/05/2020 | - |
| 2 | 06/05/2020 | - |
| 3 | 01/05/2020 | - |
| 3 | 02/05/2020 | - |
| 3 | 03/05/2020 | - |
| 3 | 04/05/2020 | - |
I'm trying to write an SQL query (I'm using SQL Server) which returns the date of the first two consecutive negative results for a given ID.
For example, for ID no. 1, the first two consecutive negative results are on 01/05 and 05/05.
The first two consecutive negative results for ID No. 2 are on 05/05 and 06/05.
The first two consecutive negative results for ID No. 3 are on 01/05 and 02/05.
So the query should produce the following result:
| ID | FirstNegativeDate |
|---- |------------------- |
| 1 | 01/05 |
| 2 | 05/05 |
| 3 | 01/05 |
Please note that the dates aren't necessarily one day apart. Sometimes, two consecutive negative tests may be several days apart. But they should still be considered as "consecutive negative tests". In other words, two negative tests are not 'consecutive' only if there is a positive test result in between them.
How can this be done in SQL? I've done some reading and it looks like maybe the PARTITION BY statement is required but I'm not sure how it works.
This is a gaps-and-island problem, where you want the start of the first island of '-'s that contains at least two rows.
I would recommend lead() and aggregation:
select id, min(date) as first_negative_date
from (
    select t.*, lead(result) over (partition by id order by date) as lead_result
    from mytable t
) t
where result = '-' and lead_result = '-'
group by id;
Use the LEAD or LAG function over an ID partition ordered by your Date column.
Then simply check where the LEAD/LAG column is equal to Result.
You'll also need to keep only the first such date for each ID.
The image attached just shows what LEAD/LAG would return
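Since that image isn't included here, a minimal sketch of the kind of output LEAD produces (column names from the question, table name from the answer above):
-- Each row is paired with the next Result for the same ID; rows where both
-- Result and lead_result are '-' mark the start of a consecutive negative pair.
select t.ID, t.Date, t.Result,
       lead(t.Result) over (partition by t.ID order by t.Date) as lead_result
from mytable t
order by t.ID, t.Date;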

Query'd top 15 faults, need the accumulated downtime from another column

I'm currently trying to query up a list of the top 15 occurring faults on a PLC in the warehouse. I've gotten that part down:
Select top 15 fault_number, fault_message, count(*) FaultCount
from Faults_Stator
where T_stamp > dateadd(hour, -18, getdate())
Group by Fault_number, Fault_Message
Order by Faultcount desc
However, I now need to find the accumulated downtime of the faults in that top-15 list; that information is in another column, "Fault_duration". How would I go about doing this? Thanks in advance, you've all helped me so much already.
+--------------+---------------------------------------------+------------+
| Fault Number | Fault Message | FaultCount |
+--------------+---------------------------------------------+------------+
| 122 | ST10: Part A&B Failed | 23 |
| 4 | ST16: Part on Table B | 18 |
| 5 | ST7: No Spring Present on Part A | 15 |
| 6 | ST7: No Spring Present on Part B | 12 |
| 8 | ST3: No Pin Present B | 8 |
| 1 | ST5: No A Housing | 5 |
| 71 | ST4: Shuttle Right Not Loaded | 4 |
| 144 | ST15: Vertical Cylinder did not Retract | 3 |
| 98 | ST8: Plate Loader Can not Retract | 3 |
| 72 | ST4: Shuttle Left Not Loaded | 2 |
| 94 | ST8: Spring Gripper Cylinder did not Extend | 2 |
| 60 | ST8: Plate Loader Can not Retract | 1 |
| 83 | ST6: No A Spring Present | 1 |
| 2 | ST5: No B Housing | 1 |
| 51 | ST4: Vertical Cylinder did not Extend | 1 |
+--------------+---------------------------------------------+------------+
I know I wouldn't be using the same query, but I'm at a loss as to how to do this next step.
Fault_duration is a column that records how long the fault lasted, in ms. I'm trying to have those durations accumulated next to the corresponding fault, so the first offender would have its 23 individual fault occurrences summed next to it, in another column.
You should be able to use the SUM aggregate:
Select top 15 fault_number, fault_message, count(*) FaultCount,
       sum(Fault_duration) as FaultDuration
from Faults_Stator
where T_stamp > dateadd(hour, -18, getdate())
Group by Fault_number, Fault_Message
Order by FaultCount desc
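If the totals are easier to read in seconds, a minimal variant of the same query (this assumes Fault_duration really is in ms, as the question describes):
Select top 15 fault_number, fault_message, count(*) FaultCount,
       sum(Fault_duration) / 1000.0 as FaultDurationSeconds  -- ms to seconds
from Faults_Stator
where T_stamp > dateadd(hour, -18, getdate())
Group by Fault_number, Fault_Message
Order by FaultCount desc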

How to optimize a nested inner Hive query

I have a table with the following stock data, with a few columns: date, ticker, open and close (stock prices).
To query this data, I want to know which stock has given the highest margin on a particular date. So if I have 516 different stocks, my query should return 516 rows of ticker, date, open, close and a new column Margin (which will be max(close - open)).
| deep_stocks.date_ | deep_stocks.ticker | deep_stocks.open | deep_stocks.close |
+--------------------+---------------------+-------------------+--------------------+--+
| 20100721 | A | 27.68 | 27.58 |
| 20100722 | A | 27.95 | 28.72 |
| 20100723 | A | 28.56 | 29.3 |
| 20100726 | A | 29.22 | 29.64 |
| 20100727 | A | 29.73 | 28.87 |
| 20100728 | A | 28.79 | 28.78 |
| 20100729 | A | 28.97 | 28.15 |
| 20100730 | A | 27.78 | 27.93 |
| 20100802 | A | 28.35 | 28.82 |
| 20100803 | A | 28.7 | 27.84 |
I have written a query where my approach was:
Step 1 - Get the difference between Close and Open prices (Inner/Sub query)
Step 2 - Get the maximum of margin for every stock (used group by with max function)
Step 3 - Join the results with Main Table and get the data.
I'll put my query below; can someone please correct it, as it is taking too much time? I would also like to know whether there is an alternative approach.
Following the approach above, please find my query below:
SELECT ds.ticker, ds.date_, ds.close, ds.open, ds.Margin
FROM (SELECT ticker, date_, close, open,
             case when (close - open) > 0 then round(close - open, 2) else 0 end as Margin
      FROM DataStocks) ds
JOIN (SELECT dsIn.ticker, max(dsIn.Margin) as mxMargin
      FROM (SELECT ticker,
                   case when (close - open) > 0 then round(close - open, 2) else 0 end as Margin
            FROM DataStocks) dsIn
      GROUP BY dsIn.ticker) dsEx
  ON ds.ticker = dsEx.ticker AND ds.Margin = dsEx.mxMargin
ORDER BY ds.Margin;
Do we have any other alternatives for this query, or is it possible to optimize it?
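One possible alternative, as a sketch: assuming the Hive version in use supports window functions (table and column names are taken from the query above), this scans DataStocks once and avoids the self-join:
SELECT ticker, date_, open, close, Margin
FROM (SELECT ticker, date_, open, close,
             case when (close - open) > 0 then round(close - open, 2) else 0 end AS Margin,
             -- rank() keeps ties, like the join on Margin = mxMargin does
             rank() OVER (PARTITION BY ticker
                          ORDER BY case when (close - open) > 0 then round(close - open, 2) else 0 end DESC) AS rnk
      FROM DataStocks) t
WHERE rnk = 1
ORDER BY Margin;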

Apply Limit for a Condition

I have a query that returns the credit notes (CN) and debit notes (DN) of an operation; each CN is accompanied by two or more DN (referenced by the payment_plan_id field). When paging, I must fetch, for example, 10 operations, that is, 10 CN and their DN. But if I leave the limit at 10, it also counts the debit notes of the transactions I must return, so the query only brings back 2, 3 or 4 operations, depending on the number of DNs that accompany each credit note.
SELECT
value, installment, payment_plan_id, model,
creation_date, operation
FROM payment_plant
WHERE model != 'IMMEDIATE'
AND operation IN ('CN', 'DN')
AND creation_date BETWEEN '2017-06-12' AND '2017-07-12 23:59:59'
ORDER BY
model,
creation_date,
operation
LIMIT 10
OFFSET 1
Example of the table, omitting some fields:
| id | payment_plan_id | value | installment | operation |
|----|-----------------|-------|-------------|-----------|
| 1  | b3cdaede        | 12    | 1           | NC        |
| 2  | b3cdaede        | 3.5   | 1           | ND        |
| 3  | b3cdaede        | 1.2   | 1           | ND        |
| 4  | e1d7f051        | 36    | 1           | NC        |
| 5  | e1d7f051        | 5.9   | 1           | ND        |
| 6  | 00e6a0b4        | 15    | 1           | NC        |
| 7  | 00e6a0b4        | 1     | 1           | ND        |
| 8  | 00e6a0b4        | 3.6   | 1           | ND        |
How can I apply the LIMIT so that it only counts the CNs (the NC rows above)?
Well, the query you give above doesn't remotely do what you describe, so I'm assuming you actually want "the last 10 CN and their DN". You also don't explain what fields the CN and DN rows have in common, so I'm going to assume those fields are payment_plan_id and installment. Given that, here's how you would get it:
WITH last_10_cn AS (
    SELECT value, installment, payment_plan_id, model,
           creation_date, operation
    FROM payment_plant
    WHERE model != 'IMMEDIATE'
      AND operation = 'CN'
      AND creation_date BETWEEN '2017-06-12' AND '2017-07-12 23:59:59'
    ORDER BY model, creation_date, operation
    LIMIT 10
    OFFSET 1
)
SELECT last_10_cn.*,
       dn.value as dn_value, dn.model as dn_model,
       dn.creation_date as dn_creation_date
FROM last_10_cn
JOIN payment_plant as dn
  ON last_10_cn.payment_plan_id = dn.payment_plan_id
 AND last_10_cn.installment = dn.installment
ORDER BY last_10_cn.model,
         last_10_cn.creation_date,
         last_10_cn.operation,
         dn.creation_date;
Adjust the above according to the actual join conditions and how you really want things to be sorted.
BTW, your table structure is what's giving you trouble here. DNs should really be a separate table with a foreign key to CNs. I realize that's not how most general ledgers (GLs) do it, but the GL model predates relational databases.
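A minimal sketch of that structure (the table and column names here are my own illustration, not from the question):
-- hypothetical tables illustrating the suggested split
CREATE TABLE credit_note (
    id              bigserial PRIMARY KEY,
    payment_plan_id text NOT NULL,
    value           numeric NOT NULL,
    installment     integer NOT NULL,
    model           text NOT NULL,
    creation_date   timestamp NOT NULL
);

CREATE TABLE debit_note (
    id              bigserial PRIMARY KEY,
    credit_note_id  bigint NOT NULL REFERENCES credit_note (id),
    value           numeric NOT NULL,
    creation_date   timestamp NOT NULL
);
With that split, LIMIT 10 on credit_note alone pages by operation, and the matching debit_note rows can be joined in afterwards.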