Verifying a cycle of values in table - sql

I have a table which looks like this (the real table has dates and time in place of the Letters):
| assigned | start | end
| xyz | A | B
| xyz | B | C
| xyz | C | D
| xyz | D | E
| xyz | E | F
| fgh | A | B
| fgh | B | C
etc.
There is a rotation with each assigned code (xyz,fgh and so on) where 'end' is congruent with the next 'start' up to a value indicating a defined end (here 'F').
I am looking for a statement which scans/verifys that this rotation is indeed occurring, that it starts at A and ends with F and did every step up until then.
Any help is greatly appreciated.
edit: The rotation always uses 5 rows (or 4 steps), even if the intervall length can change in between.

This is really a hack that works because the dates are replaced by characters, but it might give you ideas on how to make it work for real.
select * from (
select a_code, min(a_start) as thestart, max(a_end) as theend,
substring(group_concat(a_start order by a_start), 3) as starts,
substring(group_concat(a_end order by a_end), 1, length(group_concat(a_end))-2) as ends
from so_test
group by a_code ) as grpSelect
where thestart = 'a'
and theend = 'f'
and starts = ends
The group_concat of a_start for xyz prduces a string of 'a,b,c,d,e' while the group_concat for a_end prduces b,c,d,e,f. The substring removes the a from the start and the f from the end so that the outer query can compare b,c,d,e in both strings.

Related

PostgreSQL Compare value from row to value in next row (different column)

I have a table of encounters called user_dates that is ordered by 'user' and 'start' like below. I want to create a column indicating whether an encounter was followed up by another encounter within 30 days. So basically I want to go row by row checking if "encounter_stop" is within 30 days of "encounter_start" in the following row (as long as the following row is the same user).
user | encounter_start | encounter_stop
A | 4-16-1989 | 4-20-1989
A | 4-24-1989 | 5-1-1989
A | 6-14-1993 | 6-27-1993
A | 12-24-1999 | 1-2-2000
A | 1-19-2000 | 1-24-2000
B | 2-2-2000 | 2-7-2000
B | 5-27-2001 | 6-4-2001
I want a table like this:
user | encounter_start | encounter_stop | subsequent_encounter_within_30_days
A | 4-16-1989 | 4-20-1989 | 1
A | 4-24-1989 | 5-1-1989 | 0
A | 6-14-1993 | 6-27-1993 | 0
A | 12-24-1999 | 1-2-2000 | 1
A | 1-19-2000 | 1-24-2000 | 0
B | 2-2-2000 | 2-7-2000 | 1
B | 5-27-2001 | 6-4-2001 | 0
You can select..., exists <select ... criteria>, that would return a boolean (always true or false) but if really want 1 or 0 just cast the result to integer: true=>1 and false=>0. See Demo
select ts1.user_id
, ts1.encounter_start
, ts1. encounter_stop
, (exists ( select null
from test_set ts2
where ts1.user_id = ts2.user_id
and ts2.encounter_start
between ts1.encounter_stop
and (ts1.encounter_stop + interval '30 days')::date
)::integer
) subsequent_encounter_within_30_days
from test_set ts1
order by user_id, encounter_start;
Difference: The above (and demo) disagree with your expected result:
B | 2-2-2000 | 2-7-2000| 1
subsequent_encounter (last column) should be 0. This entry starts and ends in Feb 2000, the other B entry starts In May 2001. Please explain how these are within 30 days (other than just a simple typo that is).
Caution: Do not use user as a column name. It is both a Postgres and SQL Standard reserved word. You can sometimes get away with it or double quote it. If you double quote it you MUST always do so. The big problem being it has a predefined meaning (run select user;) and if you forget to double quote is does not necessary produce an error or exception; it is much worse - wrong results.

Postgresql: Dynamic Regex Pattern

I have event data that looks like this:
id | instance_id | value
1 | 1 | a
2 | 1 | ap
3 | 1 | app
4 | 1 | appl
5 | 2 | b
6 | 2 | bo
7 | 1 | apple
8 | 2 | boa
9 | 2 | boat
10 | 2 | boa
11 | 1 | appl
12 | 1 | apply
Basically, each row is a user typing a new letter. They can also delete letters.
I'd like to create a dataset that looks like this, let's call it data
id | instance_id | value
7 | 1 | apple
9 | 2 | boat
12 | 1 | apply
My goal is to extract all the complete words in each instance, accounting for deletion as well - so it's not sufficient to just get the longest word or the most recently typed.
To do so, I was planning to do a regex operation like so:
select * from data
where not exists (select * from data d2 where d2.value ~ (d.value || '.'))
Effectively I'm trying to build a dynamic regex that adds matches one character more than is present, and is specific to the row it's matching against.
The code above doesn't seem to work. In Python, I can "compile" a regex pattern before I use it. What is the equivalent in PostgreSQL to dynamically build a pattern?
Try simple LIKE operator instead of regex patterns:
SELECT * FROM data d1
WHERE NOT EXISTS (
SELECT * FROM data d2
WHERE d2.value LIKE d1.value ||'_%'
)
Demo: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cd064c92565639576ff456dbe0cd5f39
Create an index on value column, this should speed up the query a bit.
To find peaks in the sequential data window functions is a good choice. You just need to compare each value with previous and next ones using lag() and lead() functions:
with cte as (
select
*,
length(value) > coalesce(length(lead(value) over (partition by instance_id order by id)),0) and
length(value) > coalesce(length(lag(value) over (partition by instance_id order by id)),length(value)) as is_peak
from data)
select * from cte where is_peak order by id;
Demo

sql - Postgresql Retrieving distinct row

In the table below.. I am supposed to retrieve all row where the deleted is false and disabled is true and a distinct phrase.. If the phrase isn't the only one in the table (for example the "bad" word).. I must return the one with the device_id.. If it is only one in the table, I must return it even if the device_id is blank..
id | device_id | phrase | disabled | deleted |
----+-----------+---------+----------+---------+
2 | 1 | WTF | f | f |
3 | 1 | White | f | f |
4 | | WTF | f | f |
5 | | wTf | f | f |
6 | 2 | fck | f | f |
7 | 1 | damn | f | f |
8 | 1 | bitch | f | f |
9 | 1 | crap | f | f |
1 | 1 | Shit | t | t |
10 | 1 | ass | f | f |
11 | | bad | f | f |
12 | 1 | bad | t | f |
13 | 1 | badshit | f | f |
What I've done is this query and returns what I've expected.. (for example, the return is only 1 "bad" word with device_id = 1)
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
But when add a keyword search for example the "bad"..
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
The return is "badshit" (ok) and "bad" (but the device_id is null).. My expected is that the "bad" word's device_id is 1..
I'm kind of new to postgresql.. Thanks!
I already fixed this error 9 months ago but was too busy to post it here.
Here's my answer:
order by phrase, device_id
either:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is not null)
order by phrase;
or:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase = 'bad' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
first if you want to only retrieve records without null values in device. second if you want to retrieve records with exact phrase bad.
where phrase like '%bad%'
specifically asks postgres to return both bad and bad****, because they are both 'like' bad.
On another note, clean up your post before asking for help.
Nevermind, I fixed it by adding device_id:
order by phrase;
into
order by phrase, device_id;
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example:
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
retrieves the most recent weather report for each location. But if we had not used ORDER BY to force descending order of time values for each location, we'd have gotten a report from an unpredictable time for each location.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group
for your case use below code as you want device_id=1
select distinct on (phrase) phrase, id, device_id, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
device_id = 1
order by phrase,device_id;

SQL pairing adjacent rows

Using postgreSQL (latest). I'm a total noob.
I have a view that always gives me a table of an even number of rows- no duplicates (the letters are analogous to unique keys) and no nulls, let's call it letter_view:
| letter |
|-------------|
| A |
| B |
| C |
| D |
| E |
| F |
| G |
| H |
My view already uses an ORDER BY clause so the table is pre-sorted.
What I'm trying to do is merge every two rows into a single row
with each value from those two rows. So for n rows, I need the result set to have
n / 2 rows with combined adjacent rows.
| l1 | l2 |
|-------|------|
| A | B |
| C | D |
| E | F |
| G | H |
I've tried using lead and I think I'm close but I can't quite get it in the format I need.
My best query attempt looks like this:
SELECT letter AS letter_1, lead(letter, 1) OVER (PARTITION BY 2) AS letter_2 from letter_view;
but I get:
letter_1 | letter_2
----------+----------
A | B
B | C <--- Don't need this
C | D
D | E <--- Don't need this
E | F
F | G <--- Don't need this
G | H
H | <--- Don't need this
(8 rows)
I checked several other answers on SO, and looked through
the PostgreSQL docs and w3C SQL tutorials but I can't find a succinct answer.
What is this technique called and how would I do it?
I'm trying to do this in pure SQL if possible.
I know I could use multiple queries with LIMIT and OFFSET to get the data with multiple selects or potentially by using a cursor but that seems very inefficient for large input sets although I could be totally wrong. Again, total noob.
Any help in the right direction is highly appreciated.
You can use lead() to get the next value . . . but you need a way to filter as well. I would suggest row_number():
select letter_1, letter_2
from (select letter AS letter_1,
lead(letter, 1) OVER (PARTITION BY 2 order by ??) AS letter_2,
row_number() over (partition by 2 order by ??) as seqnum
from letter_view
) lv
where seqnum % 2 = 1;
Notes:
I included the partition clause as you have in the original code. I don't know what "2" refers to.
You should be explicit about the order by. It is not wise to depend on the ordering of some underlying table or view.

How do I do a WHERE NOT IN for Hierarchical data?

I have a table that is a list of paths between points. I want to create a query to return a list with pointID and range(number of point) from a given point. But have spent a day trying to figure this out and haven't go any where, does any one know how this should be done? ( I am writing this for MS-SQL 2005)
example
-- fromPointID | toPointID |
---------------|-----------|
-- 1 | 2 |
-- 2 | 1 |
-- 1 | 3 |
-- 3 | 1 |
-- 2 | 3 |
-- 3 | 2 |
-- 4 | 2 |
-- 2 | 4 |
with PointRanges ([fromPointID], [toPointID], [Range])
AS
(
-- anchor member
SELECT [fromPointID],
[toPointID],
0 AS [Range]
FROM dbo.[Paths]
WHERE [toPointID] = 1
UNION ALL
-- recursive members
SELECT P.[fromPointID],
P.[toPointID],
[Range] + 1 AS [Range]
FROM dbo.[Paths] AS P
INNER JOIN PointRanges AS PR ON PR.[toPointID] = P.[fromPointID]
WHERE [Range] < 5 -- This part is just added to limit the size of the table being returned
--WHERE P.[fromPointID] NOT IN (SELECT [toPointID] FROM PointRanges)
--Cant do the where statment I want to because it wont allow recurssion in the sub query
)
SELECT * FROM PointRanges
--Want this returned
-- PointID | Range |
-----------|-------|
-- 1 | 0 |
-- 2 | 1 |
-- 3 | 1 |
-- 4 | 2 |
Markus Jarderot's link gives a good answer for this. I end tried using his answer and also tried rewriting my problem in C# and linq but it ended up being more of a mathematical problem than a coding problem because I had a table with several thousands of point that interlinked. This is still something I am interested in and trying to get a better understanding of by reading books on mathematics and graph theory but if anyone else runs into this problem I think Markus Jarderot's link is the best answer you will find.