I have a text field in a table and I want to query it with a WHERE condition: I want to select all records that have at least one word from a list of words, and return JSON like this:
text
The employee was fired today
He likes chocolate a lot
She eat chocolate today
Car was stolen yesterday
select * from tbl
where text CONTAINS ANY ['today','likes','eat']
Desired Output 1:
{"id":"1", "text":"The employee was fired today", "tag":"today"}
{"id":"2", "text":"He likes chocolate a lot", "tag":"likes"}
{"id":"3", "text":"She eat chocolate today", "tag":["today","eat"]}
Desired Output 2:
text | tag | tag_counts
The employee was fired today | today | 1
He likes chocolate a lot | likes | 1
She eat chocolate today | eat, today | 2
I would like to get any of these outputs.
I already found that I can use WHERE IN ('today','likes','eat'), but I can't work out how to get the result in either of the desired formats, if that is possible.
I chose the column name words for the text column. "text" is a basic type name and too confusing as such.
For your given table with a plain text column:
SELECT *
FROM   tbl t
,      LATERAL (
   SELECT string_agg(tag, ', ') AS tags, count(*) AS tag_count
   FROM  (
      SELECT unnest(string_to_array(words, ' ')) tag
      INTERSECT ALL
      SELECT unnest ('{today, likes, eat}'::text[])
      ) i
   ) ct
WHERE  string_to_array(t.words, ' ') && '{today, likes, eat}';
Simpler with a text array (text[]) in the table to begin with:
SELECT *
FROM   tbl1 t
,      LATERAL (
   SELECT string_agg(tag, ', ') AS tags, count(*) AS tag_count
   FROM  (
      SELECT unnest(words) tag
      INTERSECT ALL
      SELECT unnest ('{today, likes, eat}'::text[])
      ) i
   ) ct
WHERE  t.words && '{today, likes, eat}';
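The logic of the two queries above (split the text into words, intersect with the search list, then aggregate) can be sketched outside the database. This is only an illustration in Python, not part of the original answer; the helper name matching_tags and the result keys are assumptions chosen to mirror the column aliases tags and tag_count.

```python
from collections import Counter

def matching_tags(text, wanted):
    """Return the words of `text` that also appear in `wanted`.

    Counter & Counter is a multiset intersection, which mirrors INTERSECT ALL.
    """
    common = Counter(text.split(" ")) & Counter(wanted)
    return sorted(common.elements())

# Sample rows from the question.
rows = [
    "The employee was fired today",
    "He likes chocolate a lot",
    "She eat chocolate today",
    "Car was stolen yesterday",
]
wanted = ["today", "likes", "eat"]

results = []
for text in rows:
    tags = matching_tags(text, wanted)
    if tags:  # plays the role of the && (overlap) filter in the WHERE clause
        results.append({"text": text, "tags": ", ".join(tags), "tag_count": len(tags)})

for r in results:
    print(r)
```

Rows without any matching word are dropped, just as the overlap condition drops them in SQL.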
Can be supported with a GIN index. An expression index for the text column:
CREATE INDEX tbl_words_gin_idx ON tbl USING gin (string_to_array(words, ' '));
Simpler, yet again, for text[]:
CREATE INDEX tbl1_words_gin_idx ON tbl1 USING gin (words);
See:
Query and order by number of matches in JSON array
I have a table like below, called abc_table:
Id | Name | Tags
1 | abc | 1,4,5
2 | aef | 11,14,55
3 | xyz | 1,44,9
4 | demo | 1,98,4
Now, based on above data, I am looking for the name which has tag 1 and 4 / 1 or 4.
I tried using the LIKE and IN operators, but they did not return the expected output; I also tried a regular expression, but that didn't work for me either.
SELECT
ad.name, ad.tags
FROM
abc_table ad
WHERE CONCAT(',', ad.tags, ',') IN (',1,4,')
This returns row 1 but not row 4, because 98 sits between the 1 and the 4.
One idea would be to use the LIKE operator and check all possible cases for each value, e.g.:
the tag is at the beginning and the tags column contains only one element (tags = '4')
the tag is at the beginning and the tags column contains further elements (tags LIKE '4,%')
the tag is in the middle and the tags column contains elements before and after (tags LIKE '%,4,%')
the tag is at the end and the tags column contains elements before it (tags LIKE '%,4')
Apply this for each tag value (1, 4) and combine the results correspondingly (if you want 1 and 4 => intersection and if 1 or 4 => union) and you should get the necessary result.
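The four cases above can be combined into one condition per tag and then joined with AND (intersection) or OR (union). The following is a minimal sketch, assuming an in-memory SQLite database only so that it is runnable; the helper names tag_params and names_with_tags are hypothetical, and the LIKE patterns themselves are plain SQL.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc_table (id INTEGER, name TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO abc_table VALUES (?, ?, ?)",
    [(1, "abc", "1,4,5"), (2, "aef", "11,14,55"),
     (3, "xyz", "1,44,9"), (4, "demo", "1,98,4")],
)

# A tag matches if it is the only element, the first, a middle one, or the last.
HAS_TAG = "(tags = ? OR tags LIKE ? OR tags LIKE ? OR tags LIKE ?)"

def tag_params(tag):
    """Parameters for the four cases, in the same order as HAS_TAG."""
    return [tag, f"{tag},%", f"%,{tag},%", f"%,{tag}"]

def names_with_tags(tags, mode):
    """mode='AND' requires every tag (intersection); mode='OR' requires any (union)."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    where = f" {mode} ".join(HAS_TAG for _ in tags)
    params = [p for tag in tags for p in tag_params(tag)]
    sql = f"SELECT name FROM abc_table WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

both = names_with_tags(["1", "4"], "AND")
either = names_with_tags(["1", "4"], "OR")
print(both)    # names whose tags contain both 1 and 4
print(either)  # names whose tags contain 1 or 4
```

Because every pattern anchors the tag on commas or on the string boundaries, values such as 11, 14 or 44 are not mistaken for 1 or 4.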
You can do it like this to get the rows whose tags contain 1 or 4:
SELECT name, tags
FROM abc_table
CROSS APPLY STRING_SPLIT(Tags, ',')
where value in (1,4)
group by name, tags;
Result :
name tags
abc 1,4,5
xyz 1,44,9
demo 1,98,4
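STRING_SPLIT splits the comma-separated string into individual values.
CROSS APPLY expands each list of tags into rows and joins them with the original row.
If you want only the rows whose tags contain both 1 and 4 (in any order), you can do it as follows; note that HAVING count(*) = 2 assumes a tag value is not repeated within a row: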
SELECT name, tags
FROM abc_table
CROSS APPLY STRING_SPLIT(Tags, ',')
where value in (1,4)
group by name, tags
having count(*) = 2
result :
name tags
abc 1,4,5
demo 1,98,4
I have a table with two columns, item_name and value, where each item_name looks like "abracadabra_prefix.tag_name". I need to select the rows whose tag_name appears in a list of names given without the prefix.
It should be something like:
tag_names = ['f1', 'k500', '23_g']
SELECT * FROM table WHERE item_name IN (LIKE "%{tag_names});
input table:
item_name | value
fasdaf.f1 | 1
asdfe.f2 | 2
eywvs.24_g | 2
asdfe.l500 | 2
asdfe.k500 | 2
eywvs.23_g | 2
output table:
item_name | value
fasdaf.f1 | 1
asdfe.k500 | 2
eywvs.23_g | 2
I have tried concatenating a string in a loop to get a query like this:
SELECT * FROM table WHERE item_name LIKE '%f1' OR item_name LIKE '%k500' OR item_name LIKE '%23_g';
But I can have anywhere from 1 to 200 tags, and as I understand it, a large number of tags makes the query too complicated.
You can extract the suffix of item_name using substring with regexp and then use the any operator for comparison in the where clause.
select * from the_table
where substring (item_name from '\.(\w+)$') = any('{f1,k500,23_g}'::text[]);
If you intend to use the query as a parameterized one then it will be convenient to replace '{f1,k500,23_g}'::text[] with string_to_array('f1,k500,23_g', ','), i.e. pass the list of suffixes as a comma-separated string. Please note that this query will result in a sequential scan.
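The suffix extraction and the comma-separated parameter can be illustrated outside the database. This is only a sketch in Python, assuming the same regular expression as in the query; the function name filter_by_suffix is hypothetical.

```python
import re

# Same pattern as in the query: capture the word characters after the last dot.
SUFFIX = re.compile(r"\.(\w+)$")

def filter_by_suffix(rows, tag_names):
    """Keep the (item_name, value) rows whose suffix is one of tag_names."""
    wanted = set(tag_names)
    selected = []
    for item_name, value in rows:
        match = SUFFIX.search(item_name)
        if match and match.group(1) in wanted:
            selected.append((item_name, value))
    return selected

# Sample rows from the question.
rows = [
    ("fasdaf.f1", 1), ("asdfe.f2", 2), ("eywvs.24_g", 2),
    ("asdfe.l500", 2), ("asdfe.k500", 2), ("eywvs.23_g", 2),
]

# The list of suffixes arrives as one comma-separated string,
# analogous to string_to_array('f1,k500,23_g', ',').
selected = filter_by_suffix(rows, "f1,k500,23_g".split(","))
print(selected)
```

The comparison is an exact match on the extracted suffix, so f1 does not match f2 and 23_g does not match 24_g.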
You can use:
UNNEST to extract tag values from your array,
CROSS JOIN to associate tag value to each row of your table
LIKE to make a comparison between your item_name and your tag
SELECT item_name, value_
FROM tab
CROSS JOIN UNNEST(ARRAY['f1', 'k500', '23_g']) AS tag
WHERE item_name LIKE '%.' || tag
Output:
item_name | value_
fasdaf.f1 | 1
asdfe.k500 | 2
eywvs.23_g | 2
I have a table with one word in each row and a table with some text in each row. I need to select from the second table only those rows that do not contain any word from the first table.
For example:
Table with constraint words
constraint_word
example
apple
orange
mushroom
car
qwerty
Table with text
text
word1. apple; word3, example
word1, apple, word2. car
word1 word2 orange word3
mushroomword1 word2 word3
word1 car
qwerty
Nothing should be selected in this case, because every row in the second table contains words from the first table.
My only idea is to use CROSS JOIN to achieve this:
SELECT DISTINCT text FROM text_table CROSS JOIN words_table
WHERE CONTAINS(text, constraint_word ) = 0
Is there a way to do it without using CROSS JOIN?
contains means Oracle Text; cross join means a Cartesian product (usually a performance nightmare).
One option which avoids both of these is instr function (which checks existence of the constraint_word in text, but this time using inner join) and the minus set operator.
Something like this, using sample data you posted:
SQL> select * from text_table;
TEXT
---------------------------
word1.apple; word3, example
word1, apple, word2.car
word1 word2 orange word3
mushroomword1 word2 word3
word1 car
qwerty
6 rows selected.
SQL> select * From words_table;
CONSTRAI
--------
example
apple
orange
mushroom
car
qwerty
6 rows selected.
SQL>
As you said, the query initially shouldn't return anything, because every text contains at least one constraint_word:
SQL> select c.text
2 from text_table c
3 minus
4 select b.text
5 from words_table a join text_table b on instr(b.text, a.constraint_word) > 0;
no rows selected
Let's modify one of text rows:
SQL> update text_table set text = 'xxx' where text = 'qwerty';
1 row updated.
What's the result now?
SQL> select c.text
2 from text_table c
3 minus
4 select b.text
5 from words_table a join text_table b on instr(b.text, a.constraint_word) > 0;
TEXT
---------------------------
xxx
SQL>
Right; text we've just modified.
Your idea is fine, since you need to test all words for each text.
This is what CROSS JOIN does - a combination (cartesian product).
We can even be more restrictive for better performance and use INNER JOIN, or the shorthand JOIN.
See also: CROSS JOIN vs INNER JOIN in SQL
Additionally, you need to keep only the text records for which there are no matches at all. That is the case when the count of non-matches over all combinations for a text is at its maximum (= number of constraint_words, here 6).
This filter can be done using GROUP BY with HAVING:
-- text without any constraint_word
SELECT t.text, count(*)
FROM text_table t
JOIN words_table w ON CONTAINS(t.text, w.constraint_word, 1) = 0
GROUP BY t.text
HAVING count(*) = (SELECT count(*) FROM words_table)
;
It will output:
text | count(*)
mushroomword1 word2 word3 | 6
Entire-word vs partial matches
Note that 'mushroom' from constraint words is not matched by CONTAINS because it is contained as word-part not as entire word.
For partial-matches you can use INSTR as answered by Littlefoot.
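The difference between the two kinds of matching can be sketched in Python using the sample data from the question. This is an illustration only: the \w+ tokenisation is an assumption that merely approximates how a text index splits words, and both function names are hypothetical.

```python
import re

constraint_words = ["example", "apple", "orange", "mushroom", "car", "qwerty"]
texts = [
    "word1. apple; word3, example",
    "word1, apple, word2. car",
    "word1 word2 orange word3",
    "mushroomword1 word2 word3",
    "word1 car",
    "qwerty",
]

def without_substring_match(texts, words):
    """INSTR-style: drop a text if any word occurs anywhere inside it."""
    return [t for t in texts if not any(w in t for w in words)]

def without_whole_word_match(texts, words):
    """Whole-word style: drop a text only if one of its tokens equals a word."""
    wanted = set(words)
    return [t for t in texts if wanted.isdisjoint(re.findall(r"\w+", t))]

partial = without_substring_match(texts, constraint_words)
whole = without_whole_word_match(texts, constraint_words)
print(partial)  # texts free of any constraint word as a substring
print(whole)    # texts free of any constraint word as a whole word
```

With substring matching nothing is left, while with whole-word matching the row starting with mushroomword1 survives, because mushroom only appears there as part of a longer word.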
See also
Use string contains function in oracle SQL query
How does contains() in PL-SQL work?
Oracle context indexes
Creating and Maintaining Oracle Text Indexes
I believe this works (I think the issue with the CROSS JOIN route is that it includes any texts that don't contain at least one of the words--not just texts that don't contain any):
SELECT DISTINCT text FROM text_table WHERE (SELECT COUNT(*) FROM words_table WHERE CONTAINS(text, constraint_word) > 0) = 0;
I have a table with a field whose contents are a concatenated list of selections from a multi-select form. I would like to convert the data in this field into another table where each row has the text of a selection and a count of the number of times this selection was made.
eg.
Original table:
id selections
1 A;B
2 B;D
3 A;B;D
4 C
I would like to get the following out:
selection count
A 2
B 3
C 1
D 2
I could easily do this with split and maps in javascript etc, but not sure how to approach it in SQL. (I use Postgresql) The goal is to use the second table to plot a graph in Google Data Studio.
A much simpler solution:
select regexp_split_to_table(selections, ';'), count(*)
from test_table
group by 1
order by 1;
You can use a lateral join and handy set-returning function regexp_split_to_table() to unnest the strings to rows, then aggregate and count:
select x.selection, count(*) cnt
from mytable t
cross join lateral regexp_split_to_table(t.selections, ';') x(selection)
group by x.selection
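For comparison, the same split-and-count logic that the question mentions doing in application code can be sketched in Python. This is only an illustration of what the queries compute, using the sample rows from the question; it is not a replacement for doing the work in SQL.

```python
from collections import Counter

# Sample rows from the question: (id, selections).
rows = [(1, "A;B"), (2, "B;D"), (3, "A;B;D"), (4, "C")]

# Split each string on ';' (like regexp_split_to_table) and count every value.
counts = Counter(
    selection
    for _id, selections in rows
    for selection in selections.split(";")
)

# Sort by selection, like ORDER BY 1.
result = sorted(counts.items())
for selection, count in result:
    print(selection, count)
```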
I have a DB of people with a jsonb column interests. In my application, a user can search for people by providing their hobbies, which are a set of predefined values. I want to offer the best matches, and to do so I would like to score a match as intersection/union of interests. This way the top results won't simply be the people who have plenty of hobbies in my DB.
Example:
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["knitting"]
user input in app:
["reading", "swimming", "knitting", "cars"]
my script should output this:
Mary 0.4
John 0.2
Ann 0.16667
Carl 0.25
Now I'm using
SELECT name
FROM people
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
but this gives me even records with many interests and no way to order it.
Is there any way I can achieve it in a reasonable time - let's say up to 5 seconds in DB with around 400K records?
EDIT:
I added another example to clarify my calculations. My calculation needs to filter people with many hobbies. Therefore match should be calculated as Intersection(input, db_record)/Union(input, db_record).
Example:
input = ["reading"]
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["reading"]
Match for Mary would be calculated as (LENGTH(["reading"]))/(LENGTH(["swimming","reading","jogging"])) which is 0.3333
and for Carl it would be (LENGTH(["reading"]))/(LENGTH(["reading"])) which is 1
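The intersection/union score described above can be written down as a small Python function to make the expected numbers easy to check. This is a sketch of the calculation only, not of a fast database query; the function name jaccard is an assumption (the question does not name the measure).

```python
def jaccard(user_input, interests):
    """Size of the intersection divided by the size of the union."""
    a, b = set(user_input), set(interests)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Sample records and user input from the first example.
people = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}
query = ["reading", "swimming", "knitting", "cars"]

scores = {name: jaccard(query, interests) for name, interests in people.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(name, round(score, 5))
```

For Mary this gives 2 shared interests out of 5 distinct ones, i.e. 0.4, and for the single-item input ["reading"] it gives 1/3 for Mary and 1 for a person whose only interest is reading.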
UPDATE: I managed to do it with
SELECT result.id, result.name, result.overlap_count/(jsonb_array_length(persons.interests) + 4 - result.overlap_count)::decimal as score
FROM (SELECT t1.name as name, t1.id, COUNT(t1.name) as overlap_count
FROM (SELECT name, id, jsonb_array_elements(interests)
FROM persons) as t1
JOIN (SELECT unnest(ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"'])::jsonb as elements) as t2 ON t1.jsonb_array_elements = t2.elements
GROUP BY t1.name, t1.id) as result
JOIN persons ON result.id = persons.id ORDER BY score desc
Here's my fiddle https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b4b1760854b2d77a1c7e6011d074a1a3
However it's not fast enough and I would appreciate any improvements.
One option is to unnest the parameter and use the ? operator to check each element against the jsonb array:
select
t.name,
x.match_ratio
from mytable t
cross join lateral (
select avg( (t.interests ? a.val)::int ) match_ratio
from unnest(array['reading', 'swimming', 'knitting', 'cars']) a(val)
) x
It is not very clear what the rules are behind the result that you are showing. This gives you a ratio that represents the percentage of values in the parameter array that can be found in the interests of each person (so Mary gets 0.5 since she has two interests in common with the search parameter, and all other names get 0.25).
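What this ratio computes can be sketched in Python: the average of a 0/1 membership test over the search values. This is an illustration of the query's arithmetic only; the function name match_ratio is borrowed from the column alias, and the measure differs from the intersection/union score asked for in the question because the size of a person's interest list does not enter the denominator.

```python
def match_ratio(user_input, interests):
    """Share of the search values that are present in `interests`."""
    present = set(interests)
    return sum(value in present for value in user_input) / len(user_input)

people = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}
query = ["reading", "swimming", "knitting", "cars"]

ratios = {name: match_ratio(query, interests) for name, interests in people.items()}
for name, ratio in ratios.items():
    print(name, ratio)
```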
One option would be using jsonb_array_elements() to unnest the jsonb column :
SELECT name, count / SUM(count) over () AS ratio
FROM(
SELECT name, COUNT(name) AS count
FROM people
JOIN jsonb_array_elements(interests) AS j(elm) ON TRUE
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
GROUP BY name ) q