PostgreSQL Compare value from row to value in next row (different column) - sql

I have a table of encounters called user_dates that is ordered by 'user' and 'start' like below. I want to create a column indicating whether an encounter was followed up by another encounter within 30 days. So basically I want to go row by row checking if "encounter_stop" is within 30 days of "encounter_start" in the following row (as long as the following row is the same user).
user | encounter_start | encounter_stop
A | 4-16-1989 | 4-20-1989
A | 4-24-1989 | 5-1-1989
A | 6-14-1993 | 6-27-1993
A | 12-24-1999 | 1-2-2000
A | 1-19-2000 | 1-24-2000
B | 2-2-2000 | 2-7-2000
B | 5-27-2001 | 6-4-2001
I want a table like this:
user | encounter_start | encounter_stop | subsequent_encounter_within_30_days
A | 4-16-1989 | 4-20-1989 | 1
A | 4-24-1989 | 5-1-1989 | 0
A | 6-14-1993 | 6-27-1993 | 0
A | 12-24-1999 | 1-2-2000 | 1
A | 1-19-2000 | 1-24-2000 | 0
B | 2-2-2000 | 2-7-2000 | 1
B | 5-27-2001 | 6-4-2001 | 0

You can use select ..., exists (select ... <criteria>), which returns a boolean (always true or false). If you really want 1 or 0, just cast the result to integer: true becomes 1 and false becomes 0. See Demo
select ts1.user_id
     , ts1.encounter_start
     , ts1.encounter_stop
     , (exists ( select null
                   from test_set ts2
                  where ts1.user_id = ts2.user_id
                    and ts2.encounter_start
                        between ts1.encounter_stop
                            and (ts1.encounter_stop + interval '30 days')::date
                )::integer
       ) subsequent_encounter_within_30_days
  from test_set ts1
 order by user_id, encounter_start;
Difference: The above (and demo) disagree with your expected result:
B | 2-2-2000 | 2-7-2000| 1
subsequent_encounter (last column) should be 0. This entry starts and ends in Feb 2000; the other B entry starts in May 2001. Please explain how these are within 30 days (unless it is just a typo).
Caution: Do not use user as a column name. It is a reserved word in both Postgres and the SQL Standard. You can sometimes get away with it, or you can double quote it. If you double quote it, you MUST always do so. The big problem is that it has a predefined meaning (run select user;), and if you forget to double quote it, it does not necessarily produce an error or exception. It is much worse: you get wrong results.
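Since the question is about comparing each row with the following row of the same user, the same check can also be pictured as a single "look at the next row" pass, which is what the lead() window function gives you in PostgreSQL. Below is a minimal Python sketch of that logic on the sample data. It assumes the dates are month-day-year and that only the immediately following encounter of the same user counts; the names rows, parse and flag_followups are made up for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (user, encounter_start, encounter_stop).
rows = [
    ("A", "4-16-1989", "4-20-1989"),
    ("A", "4-24-1989", "5-1-1989"),
    ("A", "6-14-1993", "6-27-1993"),
    ("A", "12-24-1999", "1-2-2000"),
    ("A", "1-19-2000", "1-24-2000"),
    ("B", "2-2-2000", "2-7-2000"),
    ("B", "5-27-2001", "6-4-2001"),
]


def parse(text):
    # Assumption: the question's dates are month-day-year.
    return datetime.strptime(text, "%m-%d-%Y").date()


def flag_followups(rows, window_days=30):
    """Return 1/0 per row (ordered by user, start): does the same user's
    next encounter start within window_days after this encounter's stop?"""
    parsed = sorted((user, parse(start), parse(stop)) for user, start, stop in rows)
    flags = []
    for _, group in groupby(parsed, key=lambda r: r[0]):
        group = list(group)
        # Pair every row with its successor; the last row of a user has none.
        for current, following in zip(group, group[1:] + [None]):
            if following is None:
                flags.append(0)
                continue
            stop, next_start = current[2], following[1]
            within = stop <= next_start <= stop + timedelta(days=window_days)
            flags.append(int(within))
    return flags


print(flag_followups(rows))  # [1, 0, 0, 1, 0, 0, 0]
```

On this data the sketch agrees with the exists query: the first B row gets 0, not 1.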

Related

Get the row with latest start date from multiple tables using sub select

I have data from 3 tables, copied below. I am not using joins to get the data, because I don't know how to use joins in a multiple-table scenario. I need to update the rows with an old date (EFF_START_TS) to sysdate in one of the tables when more than 2 rows are returned for a particular user.
subscription_id |Client_id
----------------------------
20685413 |37455837
reward_account_id|subscription_id |CURRENCY_BAL_AMT |CREATE_TS |
----------------------------------------------------------------------
439111697 | 20685413 | -40 |1-09-10 |
REWARD_ACCT_DETAIL_ID|REWARD_ACCOUNT_ID |EFF_START_TS |EFF_STOP_TS |
----------------------------------------------------------------------
230900968 | 439111697 | 14-06-11 | 15-01-19
47193932 | 439111697 | 19-02-14 | 19-12-21
243642632 | 439111697 | 18-03-23 | 99-12-31
247192972 | 439111697 | 17-11-01 | 17-11-01
The SQL should update the EFF_STOP_TS of the last table for every row except the second one (47193932), because that row has the latest EFF_START_TS.
The expected result is to update the EFF_STOP_TS column of 230900968, 243642632 and 247192972 to sysdate.
As I understand it, you need to update per REWARD_ACCOUNT_ID, so you can try the code below:
UPDATE REWARD_ACCT_DETAIL RAD
SET EFF_STOP_TS = SYSDATE
WHERE EFF_START_TS NOT IN (SELECT MAX(EFF_START_TS)
                           FROM REWARD_ACCT_DETAIL RAD1
                           WHERE RAD.REWARD_ACCOUNT_ID = RAD1.REWARD_ACCOUNT_ID)
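To see which rows such a correlated update touches, here is a small sketch that runs the same idea against an in-memory SQLite database from Python. It is only an illustration, not the Oracle statement itself: the literal 'TODAY' stands in for SYSDATE, the dates are kept as the YY-MM-DD text from the question (which happens to sort correctly as text for these four rows), and the lowercase table and column names are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reward_acct_detail ("
    " reward_acct_detail_id INTEGER PRIMARY KEY,"
    " reward_account_id INTEGER,"
    " eff_start_ts TEXT,"
    " eff_stop_ts TEXT)"
)
conn.executemany(
    "INSERT INTO reward_acct_detail VALUES (?, ?, ?, ?)",
    [
        (230900968, 439111697, "14-06-11", "15-01-19"),
        (47193932, 439111697, "19-02-14", "19-12-21"),
        (243642632, 439111697, "18-03-23", "99-12-31"),
        (247192972, 439111697, "17-11-01", "17-11-01"),
    ],
)

# Close every row except the one with the latest start date per account.
# 'TODAY' is a placeholder for SYSDATE (assumption for this sketch).
conn.execute(
    """
    UPDATE reward_acct_detail
    SET eff_stop_ts = 'TODAY'
    WHERE eff_start_ts <> (
        SELECT MAX(d2.eff_start_ts)
        FROM reward_acct_detail AS d2
        WHERE d2.reward_account_id = reward_acct_detail.reward_account_id
    )
    """
)

stops = dict(
    conn.execute("SELECT reward_acct_detail_id, eff_stop_ts FROM reward_acct_detail")
)
for detail_id, stop in stops.items():
    print(detail_id, stop)
```

Only 47193932 keeps its original stop date; the other three rows are closed.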

Filter json values regardless of keys in PostgreSQL

I have a table called diary which includes columns listed below:
| id | user_id | custom_foods |
|----|---------|--------------------|
| 1 | 1 | {"56": 2, "42": 0} |
| 2 | 1 | {"19861": 1} |
| 3 | 2 | {} |
| 4 | 3 | {"331": 0} |
I would like to count, for each user, how many diaries have one or more custom_foods values larger than 0. I don't care about the keys, since a key can be any number stored as a string.
The desired output is:
| user_id | count |
|---------|---------|
| 1 | 2 |
| 2 | 0 |
| 3 | 0 |
I started with:
select *
from diary as d
join json_each_text(d.custom_foods) as e
on d.custom_foods != '{}'
where e.value > 0
I don't even know whether the syntax is correct. Now I am getting the error:
ERROR: function json_each_text(text) does not exist
LINE 3: join json_each_text(d.custom_foods) as e
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
My version is: psql (10.5 (Ubuntu 10.5-1.pgdg14.04+1), server 9.4.19). According to the PostgreSQL 9.4.19 documentation, that function should exist. I am so confused that I don't know how to proceed.
Threads that I referred to:
Postgres and jsonb - search value at any key
Query postgres jsonb by value regardless of keys
Your custom_foods column is defined as text, so you need to cast it to json before applying json_each_text. Because json_each_text returns no rows for an empty JSON object, diaries with '{}' drop out of the join; you can get their count of 0 from a separate CTE and add it with a UNION ALL:
WITH empty AS
  ( SELECT DISTINCT user_id,
                    0 AS COUNT
   FROM diary
   WHERE custom_foods = '{}' )
SELECT user_id,
       count(CASE
                 WHEN VALUE::int > 0 THEN 1
             END)
FROM diary d,
     json_each_text(d.custom_foods::JSON)
GROUP BY user_id
UNION ALL
SELECT *
FROM empty
ORDER BY user_id;
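One thing to keep in mind: the query above counts values greater than 0, which matches the desired output here only because each diary has at most one such value. If the goal is strictly "diaries with at least one value greater than 0", the counting rule looks like the Python sketch below. The list diary and the function count_positive_diaries are illustrative names, and custom_foods is assumed to hold JSON text as in the question.

```python
import json
from collections import defaultdict

# Rows from the question: (id, user_id, custom_foods as JSON text).
diary = [
    (1, 1, '{"56": 2, "42": 0}'),
    (2, 1, '{"19861": 1}'),
    (3, 2, '{}'),
    (4, 3, '{"331": 0}'),
]


def count_positive_diaries(diary):
    """Per user, count diaries where any custom_foods value is > 0."""
    counts = defaultdict(int)
    for _diary_id, user_id, custom_foods in diary:
        values = json.loads(custom_foods).values()
        # Adding 0 still creates the key, so users with no hits show up as 0.
        counts[user_id] += int(any(value > 0 for value in values))
    return dict(counts)


print(count_positive_diaries(diary))  # {1: 2, 2: 0, 3: 0}
```

A diary such as {"1": 3, "2": 4} would count once here, whereas counting values would count it twice.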

Postgresql: Dynamic Regex Pattern

I have event data that looks like this:
id | instance_id | value
1 | 1 | a
2 | 1 | ap
3 | 1 | app
4 | 1 | appl
5 | 2 | b
6 | 2 | bo
7 | 1 | apple
8 | 2 | boa
9 | 2 | boat
10 | 2 | boa
11 | 1 | appl
12 | 1 | apply
Basically, each row is a user typing a new letter. They can also delete letters.
I'd like to create a dataset that looks like this, let's call it data
id | instance_id | value
7 | 1 | apple
9 | 2 | boat
12 | 1 | apply
My goal is to extract all the complete words in each instance, accounting for deletion as well - so it's not sufficient to just get the longest word or the most recently typed.
To do so, I was planning to do a regex operation like so:
select * from data
where not exists (select * from data d2 where d2.value ~ (d.value || '.'))
Effectively, I'm trying to build a dynamic regex that matches one character more than is present and is specific to the row it's matching against.
The code above doesn't seem to work. In Python, I can "compile" a regex pattern before I use it. What is the equivalent in PostgreSQL to dynamically build a pattern?
Try the simple LIKE operator instead of regex patterns:
SELECT * FROM data d1
WHERE NOT EXISTS (
SELECT * FROM data d2
WHERE d2.value LIKE d1.value ||'_%'
)
Demo: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cd064c92565639576ff456dbe0cd5f39
Creating an index on the value column should speed up the query a bit.
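The LIKE condition keeps a row only when no other value starts with it and is at least one character longer. A rough Python sketch of that rule on the sample data follows; events and not_extended are names I made up, and like the query it compares values across all instances.

```python
# Rows from the question: (id, instance_id, value).
events = [
    (1, 1, "a"), (2, 1, "ap"), (3, 1, "app"), (4, 1, "appl"),
    (5, 2, "b"), (6, 2, "bo"), (7, 1, "apple"), (8, 2, "boa"),
    (9, 2, "boat"), (10, 2, "boa"), (11, 1, "appl"), (12, 1, "apply"),
]


def not_extended(events):
    """Keep rows whose value is not a proper prefix of any other value,
    mirroring: NOT EXISTS (... d2.value LIKE d1.value || '_%')."""
    values = {value for _, _, value in events}
    return [
        row
        for row in events
        if not any(
            other != row[2] and other.startswith(row[2]) for other in values
        )
    ]


print([row[0] for row in not_extended(events)])  # [7, 9, 12]
```

Unlike LIKE, startswith treats % and _ in the data as ordinary characters, so the two can differ if typed values contain those characters.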
Window functions are a good choice for finding peaks in sequential data. You just need to compare each value with the previous and next ones using the lag() and lead() functions:
with cte as (
    select
        *,
        length(value) > coalesce(length(lead(value) over (partition by instance_id order by id)),0) and
        length(value) > coalesce(length(lag(value) over (partition by instance_id order by id)),length(value)) as is_peak
    from data)
select * from cte where is_peak order by id;
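For readers less familiar with lag() and lead(), the peak test can be traced by hand in a few lines of Python. This is only a sketch of the same comparison, with made-up names (events, find_peaks): within each instance, ordered by id, a row is a peak when its value is longer than both neighbours, where a missing next row counts as length 0 and a missing previous row counts as the row's own length.

```python
from itertools import groupby

# Rows from the question: (id, instance_id, value).
events = [
    (1, 1, "a"), (2, 1, "ap"), (3, 1, "app"), (4, 1, "appl"),
    (5, 2, "b"), (6, 2, "bo"), (7, 1, "apple"), (8, 2, "boa"),
    (9, 2, "boat"), (10, 2, "boa"), (11, 1, "appl"), (12, 1, "apply"),
]


def find_peaks(events):
    """Return rows whose value is longer than the previous and next value
    within the same instance (ordered by id)."""
    peaks = []
    ordered = sorted(events, key=lambda row: (row[1], row[0]))
    for _, group in groupby(ordered, key=lambda row: row[1]):
        group = list(group)
        for i, row in enumerate(group):
            length = len(row[2])
            # No previous row: fall back to own length, so it is never a peak.
            prev_len = len(group[i - 1][2]) if i > 0 else length
            # No next row: fall back to 0, so the last row can be a peak.
            next_len = len(group[i + 1][2]) if i + 1 < len(group) else 0
            if length > prev_len and length > next_len:
                peaks.append(row)
    return sorted(peaks)


print(find_peaks(events))  # [(7, 1, 'apple'), (9, 2, 'boat'), (12, 1, 'apply')]
```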

sql - Postgresql Retrieving distinct row

In the table below, I am supposed to retrieve all rows where deleted is false and disabled is true, with a distinct phrase. If the phrase isn't the only one in the table (for example the "bad" word), I must return the one with the device_id. If it is the only one in the table, I must return it even if the device_id is blank.
id | device_id | phrase | disabled | deleted |
----+-----------+---------+----------+---------+
2 | 1 | WTF | f | f |
3 | 1 | White | f | f |
4 | | WTF | f | f |
5 | | wTf | f | f |
6 | 2 | fck | f | f |
7 | 1 | damn | f | f |
8 | 1 | bitch | f | f |
9 | 1 | crap | f | f |
1 | 1 | Shit | t | t |
10 | 1 | ass | f | f |
11 | | bad | f | f |
12 | 1 | bad | t | f |
13 | 1 | badshit | f | f |
I wrote this query, and it returns what I expected (for example, only one "bad" row is returned, with device_id = 1):
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
But when I add a keyword search, for example "bad":
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
The result is "badshit" (ok) and "bad" (but with a null device_id). I expected the "bad" row to have device_id 1.
I'm kind of new to PostgreSQL. Thanks!
I already fixed this error 9 months ago but was too busy to post it here.
Here's my answer:
order by phrase, device_id
either:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is not null)
order by phrase;
or:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase = 'bad' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
Use the first if you only want to retrieve records with a non-null device_id. Use the second if you want to retrieve records whose phrase is exactly bad.
where phrase like '%bad%'
specifically asks Postgres to return both bad and bad****, because they are both 'like' bad.
On another note, clean up your post before asking for help.
Never mind, I fixed it by adding device_id to the ORDER BY clause, changing:
order by phrase;
into
order by phrase, device_id;
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example:
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
retrieves the most recent weather report for each location. But if we had not used ORDER BY to force descending order of time values for each location, we'd have gotten a report from an unpredictable time for each location.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
For your case, use the code below, since you want device_id = 1:
select distinct on (phrase) phrase, id, device_id, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
device_id = 1
order by phrase,device_id;
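To make the role of the extra ORDER BY column concrete, here is a small Python sketch of what DISTINCT ON (phrase) with ORDER BY phrase, device_id does for the asker's original filter (device_id is null or device_id = 1). It relies on the fact that PostgreSQL sorts NULLs last in ascending order by default; the rows are a subset of the question's table, and distinct_on_phrase is a hypothetical helper name.

```python
# Subset of the question's table: (id, device_id, phrase, disabled, deleted).
# None stands for a blank (NULL) device_id.
rows = [
    (3, 1, "White", False, False),
    (11, None, "bad", False, False),
    (12, 1, "bad", True, False),
    (13, 1, "badshit", False, False),
]


def distinct_on_phrase(rows, keyword, device_id):
    """Emulate: SELECT DISTINCT ON (phrase) ... WHERE phrase LIKE '%keyword%'
    AND deleted = false AND (device_id IS NULL OR device_id = X)
    ORDER BY phrase, device_id."""
    matches = [
        row
        for row in rows
        if keyword in row[2] and not row[4] and row[1] in (None, device_id)
    ]
    # Ascending sort with NULLs last, as PostgreSQL does by default.
    matches.sort(key=lambda row: (row[2], row[1] is None, row[1] or 0))
    first_per_phrase = {}
    for row in matches:
        # Keep only the first row seen for each phrase.
        first_per_phrase.setdefault(row[2], row)
    return list(first_per_phrase.values())


for row in distinct_on_phrase(rows, "bad", 1):
    print(row)
```

With device_id in the sort key, the "bad" row that wins is id 12 (device_id 1) rather than id 11 (blank device_id).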

Verifying a cycle of values in table

I have a table which looks like this (the real table has dates and time in place of the Letters):
| assigned | start | end
| xyz | A | B
| xyz | B | C
| xyz | C | D
| xyz | D | E
| xyz | E | F
| fgh | A | B
| fgh | B | C
etc.
Each assigned code (xyz, fgh and so on) has a rotation in which each 'end' equals the next 'start', up to a value indicating a defined end (here 'F').
I am looking for a statement that verifies that this rotation is indeed occurring: that it starts at A, ends with F, and goes through every step in between.
Any help is greatly appreciated.
Edit: The rotation always uses 5 rows (or 4 steps), even if the interval length can change in between.
This is really a hack that works because the dates are replaced by characters, but it might give you ideas on how to make it work for real.
select * from (
    select a_code, min(a_start) as thestart, max(a_end) as theend,
           substring(group_concat(a_start order by a_start), 3) as starts,
           substring(group_concat(a_end order by a_end), 1, length(group_concat(a_end))-2) as ends
    from so_test
    group by a_code ) as grpSelect
where thestart = 'a'
  and theend = 'f'
  and starts = ends
The group_concat of a_start for xyz produces the string 'a,b,c,d,e', while the group_concat of a_end produces 'b,c,d,e,f'. The substring calls remove the a from the start of the first string and the f from the end of the second, so that the outer query can compare b,c,d,e in both strings.
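The rule being checked (first start is A, last end is F, and every end equals the next start) can also be written without string tricks, which may carry over more easily to real dates and times. The Python sketch below is one way to express it, assuming the rows of each code can be sorted by start; rows and complete_rotations are illustrative names.

```python
from collections import defaultdict

# Rows from the question: (assigned, start, end).
rows = [
    ("xyz", "A", "B"), ("xyz", "B", "C"), ("xyz", "C", "D"),
    ("xyz", "D", "E"), ("xyz", "E", "F"),
    ("fgh", "A", "B"), ("fgh", "B", "C"),
]


def complete_rotations(rows, first="A", last="F"):
    """Per assigned code, report whether the steps form an unbroken chain
    from `first` to `last` (each end equals the next start)."""
    by_code = defaultdict(list)
    for code, start, end in rows:
        by_code[code].append((start, end))
    result = {}
    for code, steps in by_code.items():
        steps.sort()  # order by start
        linked = all(a[1] == b[0] for a, b in zip(steps, steps[1:]))
        result[code] = (
            steps[0][0] == first and steps[-1][1] == last and linked
        )
    return result


print(complete_rotations(rows))  # {'xyz': True, 'fgh': False}
```

If the rotation must always have exactly 5 rows, an extra len(steps) == 5 condition could be added to the check.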