PostgreSQL: Retrieving distinct rows

In the table below, I am supposed to retrieve all rows where deleted is false and disabled is true, with distinct phrases. If a phrase isn't the only one of its kind in the table (for example the "bad" word), I must return the row that has a device_id. If it is the only one in the table, I must return it even if the device_id is blank.
 id | device_id | phrase  | disabled | deleted |
----+-----------+---------+----------+---------+
  2 |         1 | WTF     | f        | f       |
  3 |         1 | White   | f        | f       |
  4 |           | WTF     | f        | f       |
  5 |           | wTf     | f        | f       |
  6 |         2 | fck     | f        | f       |
  7 |         1 | damn    | f        | f       |
  8 |         1 | bitch   | f        | f       |
  9 |         1 | crap    | f        | f       |
  1 |         1 | Shit    | t        | t       |
 10 |         1 | ass     | f        | f       |
 11 |           | bad     | f        | f       |
 12 |         1 | bad     | t        | f       |
 13 |         1 | badshit | f        | f       |
What I've done is this query, and it returns what I expected (for example, only one "bad" row is returned, with device_id = 1):
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
But when I add a keyword search, for example "bad":
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
The return is "badshit" (ok) and "bad" (but its device_id is null). What I expect is the "bad" row whose device_id is 1.
I'm kind of new to PostgreSQL. Thanks!

I already fixed this error 9 months ago but was too busy to post it here.
Here's my answer:
order by phrase, device_id
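Applied to the query from the question, only the ORDER BY changes (a sketch; in PostgreSQL ascending order puts NULLs last by default, so for "bad" the row with device_id = 1 sorts first and DISTINCT ON keeps it):
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
      (device_id is null or device_id = 1)
order by phrase, device_id;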

either:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is not null)
order by phrase;
or:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase = 'bad' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
Use the first if you only want to retrieve records without null values in device_id, and the second if you want to retrieve records with the exact phrase 'bad'.
where phrase like '%bad%'
specifically asks Postgres to return both "bad" and "badshit", because they both match that pattern.
On another note, please clean up your post before asking for help.

Never mind, I fixed it by adding device_id to the ORDER BY, changing:
order by phrase;
into
order by phrase, device_id;

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example:
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
retrieves the most recent weather report for each location. But if we had not used ORDER BY to force descending order of time values for each location, we'd have gotten a report from an unpredictable time for each location.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
For your case, use the code below, since you want device_id = 1:
select distinct on (phrase) phrase, id, device_id, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
device_id = 1
order by phrase, device_id;

Related

ORACLE SELECT DISTINCT VALUE ONLY IN SOME COLUMNS

+----+-------+-------+------+---------+
| id | order | value | type | account |
+----+-------+-------+------+---------+
|  1 |     1 | a     |    2 |       1 |
|  1 |     2 | b     |    1 |       1 |
|  1 |     3 | c     |    4 |       1 |
|  1 |     4 | d     |    2 |       1 |
|  1 |     5 | e     |    1 |       1 |
|  1 |     5 | f     |    6 |       1 |
|  2 |     6 | g     |    1 |       1 |
+----+-------+-------+------+---------+
I need to select all fields of this table but get only 1 row for each combination of id + type (I don't care which row I get for each combination). I have tried some approaches without result.
The moment I use DISTINCT, I can't include the rest of the fields to make them available in a subquery. If I add ROWNUM in the subquery, all rows become different, so that doesn't work either.
Any ideas?
My best query at the moment is this:
SELECT ID, TYPE, VALUE, ACCOUNT
FROM MYTABLE
WHERE ROWID IN (SELECT DISTINCT MAX(ROWID)
                FROM MYTABLE
                GROUP BY ID, TYPE);
It seems you need to select one (random) row for each distinct combination of id and type. If so, you could do that efficiently using the row_number analytic function. Something like this:
select id, type, value, account
from (
  select id, type, value, account,
         row_number() over (partition by id, type order by null) as rn
  from your_table
)
where rn = 1
;
order by null means random ordering of rows within each group (partition) by (id, type); this means that the ordering step, which is usually time-consuming, will be trivial in this case. Also, Oracle optimizes such queries (for the filter rn = 1).
Or, in versions 12.1 and higher, you can get the same with the match_recognize clause:
select id, type, value, account
from my_table
match_recognize (
  partition by id, type
  all rows per match
  pattern (^r)
  define r as null is null
);
This partitions the rows by id and type, it doesn't order them (which means random ordering), and selects just the "first" row from each partition. Note that some analytic functions, including row_number(), require an order by clause (even when we don't care about the ordering) - order by null is customary, but it can't be left out completely. By contrast, in match_recognize you can leave out the order by clause (the default is "random order"). On the other hand, you can't leave out the define clause, even if it imposes no conditions whatsoever. Why Oracle doesn't use a default for that clause too, only Oracle knows.

PostgreSQL Compare value from row to value in next row (different column)

I have a table of encounters called user_dates that is ordered by 'user' and 'start' like below. I want to create a column indicating whether an encounter was followed up by another encounter within 30 days. So basically I want to go row by row checking if "encounter_stop" is within 30 days of "encounter_start" in the following row (as long as the following row is the same user).
user | encounter_start | encounter_stop
A    | 4-16-1989       | 4-20-1989
A    | 4-24-1989       | 5-1-1989
A    | 6-14-1993       | 6-27-1993
A    | 12-24-1999      | 1-2-2000
A    | 1-19-2000       | 1-24-2000
B    | 2-2-2000        | 2-7-2000
B    | 5-27-2001       | 6-4-2001
I want a table like this:
user | encounter_start | encounter_stop | subsequent_encounter_within_30_days
A    | 4-16-1989       | 4-20-1989      | 1
A    | 4-24-1989       | 5-1-1989       | 0
A    | 6-14-1993       | 6-27-1993      | 0
A    | 12-24-1999      | 1-2-2000       | 1
A    | 1-19-2000       | 1-24-2000      | 0
B    | 2-2-2000        | 2-7-2000       | 1
B    | 5-27-2001       | 6-4-2001       | 0
You can use select ..., exists (select ... criteria); exists returns a boolean (always true or false), but if you really want 1 or 0 just cast the result to integer: true => 1 and false => 0. See Demo.
select ts1.user_id
     , ts1.encounter_start
     , ts1.encounter_stop
     , (exists ( select null
                 from test_set ts2
                 where ts1.user_id = ts2.user_id
                   and ts2.encounter_start
                       between ts1.encounter_stop
                           and (ts1.encounter_stop + interval '30 days')::date
               )::integer
       ) subsequent_encounter_within_30_days
from test_set ts1
order by user_id, encounter_start;
Difference: The above (and demo) disagree with your expected result:
B | 2-2-2000 | 2-7-2000 | 1
subsequent_encounter (last column) should be 0. This entry starts and ends in Feb 2000, the other B entry starts In May 2001. Please explain how these are within 30 days (other than just a simple typo that is).
Caution: Do not use user as a column name. It is a reserved word in both Postgres and the SQL standard. You can sometimes get away with it, or double quote it; but if you double quote it you MUST always do so. The big problem is that it has a predefined meaning (run select user;), and if you forget to double quote it, it does not necessarily produce an error or exception; it is much worse - wrong results.
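A small illustration of the pitfall against the question's user_dates table (a sketch):
select user;                       -- "user" is reserved and evaluates to the current role, e.g. postgres
select user   from user_dates;     -- no error: silently returns the current role for every row
select "user" from user_dates;     -- returns the actual column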

Filter json values regardless of keys in PostgreSQL

I have a table called diary which includes columns listed below:
| id | user_id | custom_foods       |
|----|---------|--------------------|
| 1  | 1       | {"56": 2, "42": 0} |
| 2  | 1       | {"19861": 1}       |
| 3  | 2       | {}                 |
| 4  | 3       | {"331": 0}         |
I would like to count, for each user, how many diaries have custom_foods value(s) larger than 0. I don't care about the keys, since the keys can be any number in string form.
The desired output is:
| user_id | count |
|---------|-------|
| 1       | 2     |
| 2       | 0     |
| 3       | 0     |
I started with:
select *
from diary as d
join json_each_text(d.custom_foods) as e
on d.custom_foods != '{}'
where e.value > 0
I don't even know whether the syntax is correct. Now I am getting the error:
ERROR: function json_each_text(text) does not exist
LINE 3: join json_each_text(d.custom_foods) as e
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
The version I'm using is: psql (10.5 (Ubuntu 10.5-1.pgdg14.04+1), server 9.4.19). According to the PostgreSQL 9.4.19 documentation, that function should exist. I am so confused that I don't know how to proceed now.
Threads that I referred to:
Postgres and jsonb - search value at any key
Query postgres jsonb by value regardless of keys
Your custom_foods column is defined as text, so you should cast it to json before applying json_each_text. Since json_each_text returns no rows for an empty JSON object, you can get the count of 0 for empty ones from a separate CTE and do a UNION ALL:
WITH empty AS
  ( SELECT DISTINCT user_id,
                    0 AS COUNT
    FROM diary
    WHERE custom_foods = '{}' )
SELECT user_id,
       count(CASE WHEN VALUE::int > 0 THEN 1 END)
FROM diary d,
     json_each_text(d.custom_foods::JSON)
GROUP BY user_id
UNION ALL
SELECT *
FROM empty
ORDER BY user_id;
Demo
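The query above counts positive values; for this sample data that equals the number of diaries, but if a diary could contain several positive values and should still count only once, a hedged alternative sketch against the same table is:
select d.user_id,
       sum(case when exists (select 1
                             from json_each_text(d.custom_foods::json) e
                             where e.value::int > 0)
                then 1 else 0 end) as count
from diary d
group by d.user_id
order by d.user_id;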

Postgresql: Dynamic Regex Pattern

I have event data that looks like this:
id | instance_id | value
 1 | 1           | a
 2 | 1           | ap
 3 | 1           | app
 4 | 1           | appl
 5 | 2           | b
 6 | 2           | bo
 7 | 1           | apple
 8 | 2           | boa
 9 | 2           | boat
10 | 2           | boa
11 | 1           | appl
12 | 1           | apply
Basically, each row is a user typing a new letter. They can also delete letters.
I'd like to create a dataset that looks like this, let's call it data
id | instance_id | value
 7 | 1           | apple
 9 | 2           | boat
12 | 1           | apply
My goal is to extract all the complete words in each instance, accounting for deletion as well - so it's not sufficient to just get the longest word or the most recently typed.
To do so, I was planning to do a regex operation like so:
select * from data d
where not exists (select * from data d2 where d2.value ~ (d.value || '.'))
Effectively I'm trying to build a dynamic regex that matches one character more than is present, and is specific to the row it's matching against.
The code above doesn't seem to work. In Python, I can "compile" a regex pattern before I use it. What is the equivalent in PostgreSQL to dynamically build a pattern?
Try the simple LIKE operator instead of regex patterns:
SELECT * FROM data d1
WHERE NOT EXISTS (
  SELECT * FROM data d2
  WHERE d2.value LIKE d1.value || '_%'
)
Demo: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cd064c92565639576ff456dbe0cd5f39
Create an index on the value column; this should speed up the query a bit.
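A sketch of that index (the index name is made up; text_pattern_ops is what lets a B-tree index assist LIKE prefix matching when the database collation is not "C"):
CREATE INDEX data_value_idx ON data (value text_pattern_ops);
With a per-row pattern like d1.value || '_%' the planner may not always be able to use it, though.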
To find peaks in sequential data, window functions are a good choice. You just need to compare each value with the previous and next ones using the lag() and lead() functions:
with cte as (
  select *,
         length(value) > coalesce(length(lead(value) over (partition by instance_id order by id)), 0) and
         length(value) > coalesce(length(lag(value) over (partition by instance_id order by id)), length(value)) as is_peak
  from data)
select * from cte where is_peak order by id;
Demo

Getting a distinct code-name pair but sorting on other columns

I'm a bit stumped on writing a query (SQL not my strong point).
Say I have the following TABLE1:
CODE   NAME   SCOPE1   SCOPE2   SEQ
------------------------------------
A      a      Here              1
B      b      Here              2
C      c      Here              3
C      c               Room     1
A      aa              Room     2
B      bbb             Room     3
The business key is CODE + SCOPE1 + SCOPE2, where SCOPE1 and SCOPE2 are always mutually exclusive.
How can I get a distinct result of CODE and NAME, given that I need to sort by SCOPE1, SCOPE2, and SEQ?
That is, given SCOPE1 = 'Here' and SCOPE2 = 'Room', I would like to get this result:
CODE   NAME
-----------
A      a
B      b
C      c
A      aa
B      bbb
Note: C c from Room is not wanted as it's a duplicate of C c from Here.
I do realise the limitation of using DISTINCT with ORDER BY and the best I could come up with was the following:
select distinct CODE, NAME from
(
select CODE, NAME from MYTABLE
where (SCOPE1='Here' or SCOPE2='Room')
order by SCOPE1, SCOPE2, SEQ
);
The above produces the correct pairs but in the wrong sequence. I tried messing around with GROUP BY, but I guess I didn't know enough.
I have to stick with standard SQL (that is, no product-specific SQL constructs, unless it's Oracle, maybe), and I guess with this particular query, it's probably impossible to avoid subselects.
I would be very grateful for any pointers. Thanks in advance.
UPDATE: I've updated the data set, and based on peterm's answer, here's what I have so far: sqlfiddle. The MIN/MAX trick doesn't work well when I start tweaking the sequences.
The assumption is that I will always search for one specific SCOPE1 paired with one specific SCOPE2. But I need all SCOPE1 records to appear before SCOPE2. The idea is that I don't care whether CODE + NAME comes from SCOPE1 or SCOPE2 - I just want unique pairs that are sorted by SCOPE1, SCOPE2, and SEQ.
UPDATE: Based on your updated requirements, for Oracle:
SELECT CODE, NAME
FROM
(
  SELECT CODE, NAME,
         ROW_NUMBER() OVER (ORDER BY SCOPE1, SCOPE2, SEQ) rnum
  FROM Table1
  WHERE SCOPE1 = 'Here'
     OR SCOPE2 = 'Room'
) q
GROUP BY CODE, NAME
ORDER BY MIN(rnum)
Here is SQLFiddle
To make it work the same way in SQL Server
SELECT CODE, NAME
FROM
(
  SELECT CODE, NAME,
         ROW_NUMBER() OVER (ORDER BY CASE WHEN SCOPE1 IS NULL THEN 1 ELSE 0 END,
                                     SCOPE1,
                                     CASE WHEN SCOPE2 IS NULL THEN 2 ELSE 3 END,
                                     SCOPE2, SEQ) rnum
  FROM Table1
  WHERE SCOPE1 = 'Here'
     OR SCOPE2 = 'Room'
) q
GROUP BY CODE, NAME
ORDER BY MIN(rnum)
Here is SQLFiddle
Output:
| CODE | NAME |
---------------
| A    | a    |
| B    | b    |
| C    | c    |
| A    | aa   |
| B    | bbb  |
Original answer: the only thing I could think of based on your description of the requirements:
SELECT CODE, NAME
FROM Table1
WHERE SCOPE1='Here'
OR SCOPE2='Room'
GROUP BY CODE, NAME
ORDER BY MIN(SCOPE1), MIN(SCOPE2), MIN(SEQ)
Here is SQLFiddle demo (MySql)
Here is SQLFiddle demo (SQL Server)
Here is SQLFiddle demo (Oracle)
Now, in MySQL and SQL Server NULLs go first by default, therefore you'll get:
| CODE | NAME |
---------------
| B    | bbb  |
| A    | a    |
| B    | b    |
| C    | c    |
In Oracle NULLs go last by default, therefore you'll get:
| CODE | NAME |
---------------
| A    | a    |
| B    | b    |
| C    | c    |
| B    | bbb  |
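If you want to pin the placement down instead of relying on each database's default, the standard NULLS FIRST / NULLS LAST modifiers make it explicit (supported by Oracle and PostgreSQL, but not by SQL Server or MySQL); a sketch against the original query:
SELECT CODE, NAME
FROM Table1
WHERE SCOPE1 = 'Here'
   OR SCOPE2 = 'Room'
GROUP BY CODE, NAME
ORDER BY MIN(SCOPE1) NULLS LAST, MIN(SCOPE2) NULLS LAST, MIN(SEQ)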