Postgres regex rowwise on large comma separated text string - sql

I have created a table with various columns to filter on such as id, date, and regex_col in the examples below. The goal of this database is to allow the user to filter appropriately for the json_b_value they are looking for. The current database is not very large around ~100M rows.
I have taken the row names out of json_b_value to create the regex_col with the thought process that I can index the regex_col in some way and allow users to regex search for the json_b_value they are looking for. The text in the regex_col is stored as a large comma separated string, with the number of words ranging from 10 - 150.
id | date | regex_col | json_b_value
1 2019 'some','stuff','to','search' json
2 2018 'different','stuff','other' json
3 2019 'lots','of','stuff' json
The user will interact and search this column using a selectize.js dropdown. A separate table takes all of the comma separated words from regex_col and binds them all together rowwise, like below. Then words matching their search will populate as they type, any words not matching anything will result in a null.
search_words |
'some'
'stuff'
'to'
'search'
What would be an effective way to index the regex_col? Is this the optimal way to do this, should I even be creating the regex_col or should I be trying to optimize around the json_b_value?
example of json value for id 1 below
[{"regex_col":"some","current":100,"previous":200},{"regex_col":"stuff","current":200,"previous":400},{"regex_col":"to","current":300,"previous":600},{"regex_col":"search","current":400,"previous":800}]

There can be a lot of factors, but in most cases the best solution is the folowing:
Create a new table with two columns
id, term
and then populate
1 'some'
1 stuff'
1 'to'
1 'search'
2 'different'
2 'stuff'
2 'other'
3 'lots'
3 'of'
3 'stuff'
Now put an index on this table and you are GTG

Related

How to match entires in SQL based on their ending letter?

So I'm trying to match entries in two databases so in the new table the row is comprised of two words that end in the same ending letter. I'm working with two tables that have one column in each of them, each named word. table 1 contains the following data in order: Dog, High, It, Weeks, while table two contains the data: Bat, Is, Laugh, Sing. I need to select from both of these tables and match the words so that each row is as follows: Dog | Sing, High | Laugh, It | Bat, Weeks | Is
The screenshot is what I have so far for my SQL statement. I'm still early on in learning SQL so any info to help on this would be appreciated.
Recommend reading up on SUBSTR() for more information on why the below code works: https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2101.htm#OLADM679
SELECT
a.word
, b.word
FROM sec1313_words1 a
JOIN sec1313_words2 b
ON SUBSTR(b.word, -1) = SUBSTR(a.word, -1)
ORDER BY a.word

SQL Server Multiple Likes

I have an unusual question that seems simple but has me stumped in a SQL Server stored procedure.
I have 2 tables as described below.
tblMaster
ID, CommitDate, SubUser, OrigFileName
Sample data
ID CommitDate SubUser OrigFileName
----------------------------------------
1 2014-10-07 Test1 Test1.pdf
2 2014-10-08 Test2 Test2.pdf
3 2014-10-09 Test3 Test3.pdf
The above table is basically the header table that tracks the committed files. In addition to this, we have a details table with the following structure.
tblIndex
ID, FileID (Linking column to the header row described above), Word
Sample data:
1. 1, 1, Oil
2. 2, 1, oil
3. 3, 2, oil
4. 4, 2, tank
5. 5, 3, tank
The above rows represent the words that we want to search on and if a certain criteria matches return the corresponding filename/header row ID. What I would love to figure out to do is if I do a search for
One word (i.e. "oil"), then the system should respond with all the files that meet the criteria (easiest case and figured out)
If more than one word is searched for (i.e. "oil" and "tank"), then we should only see the second file since it is the only one that has both oil and tank as its key words.
Tried using a LIKE "%oil%" AND LIKE "%tank%" and that resulted in no rows being created since one value can't be both oil and tank.
Tried doing a LIKE "%oil%" OR LIKE "%tank%" but I get files 1, 2, and 3 since the OR is inclusive of all the other rows.
One last thing, I recognize I could just do a search for the first term and then save the results into a temp table and then search for the second term in that second table and I will get what I am looking for. The problem with that is that I don't exactly know how many items will be searched for. I don't want to have to create a structure where I am constantly having to store data into another temp table if someone does a search for 6 "keywords".
Any help/ideas will be much appreciated.
try this ! slightly differing from the previous answer
SELECT distinct FileID,COUNT(distinct t.word) FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(distinct t.word) > 1
One simple option would be to do something like this :
SELECT FileID
FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(*) > 1
This assume you do not have duplicate in your tblIndex.
I'm also unsure whether you really need the like or not. According to your sample data you don't, a basic comparison would be way more efficient and would avoid possible collisions.

SQL: Find highest number if its in nvarchar format containing special characters

I need to pull the record containing the highest value, specifically I only need the value from that field. The problem is that the column is nvarchar format that contains a mix of numbers and special characters. The following is just an example:
PK | Column 2 (nvarchar)
-------------------
1 | .1.1.
2 | .10.1.1
3 | .5.1.7
4 | .4.1.
9 | .10.1.2
15 | .5.1.4
Basically, because of natural sort, the items in column 2 are sorted as strings. So instead of returning the PK for the row containing ".10.1.2" as the highest value i get the PK for the row that contains ".5.1.7" instead.
I attempted to write some functions to do this but it seems what I've written looked way more complicated than it should be. Anyone got something simple or complicated functions are the only way?
I want to make clear that I'm trying to grab the PK of the record that contains the highest Column 2 value.
This query might return what you desire
SELECT MAX(CAST(REPLACE(Column2, '.', '') as INT)) FROM table

Postgres text search on multiple rows

I have a table called 'exclude' that contains hashtags:
-------------
id | tag
-------------
1 #oxford
2 #uk
3 #england
-------------
I have another table called 'post':
-----------------------------------------------
id | tags | text
1 #oxford #funtimes Sitting in the sun
2 #oz Beach fun
3 #england Milk was a bad choice
-----------------------------------------------
In order to do a text search on the posts tags I've been running a query like follows:
SELECT * FROM post WHERE to_tsvector(tags) ## plainto_tsquery('mysearchterm')
However, I now want to be able to exclude all posts where some or all of the tags are in my exclude table. Is there any easy way to do this in SQL/Postgres?
I tried converting the tags row into one column, and using this term within the plainto_tsquery function but it doesn't work (I don't know how to do a text search 'not equal' to either, hence the logic is actual wrong, albeit on the right lines in my mind):
select * from post where to_tsvector(tags) ## plainto_tsquery(
select array_to_string(array(select RTRIM(value) from exclude where key = 'tag'), ' ')
)
What version of PostgreSQL are you on? And how flexible is your schema design? In other words, can you change it at will? Or is this out of your control?
Two things immediately popped to mind when I read your questions. One is you should be able to use array and the the #> (contains) or <# (is contains by) operators.
Here is documentation
Second, you might be able to utilize an hstore and do a similar operation.
to:
hstore #> hstore
It's not a true hstore, because you are not using a real key=>value pair. But, I guess you could do {tagname}=True or {tagname}=NULL. Might be a bit hackish.
You can see the documentation (for PostgreSQL 9.1) hstore and how to use it here

How do I create sql query for searching partial matches?

I have a set of items in db .Each item has a name and a description.I need to implement a search facility which takes a number of keywords and returns distinct items which have at least one of the keywords matching a word in the name or description.
for example
I have in the db ,three items
1.item1 :
name : magic marker
description: a writing device which makes erasable marks on whiteboard
2.item2:
name: pall mall cigarettes
description: cigarette named after a street in london
3.item3:
name: XPigment Liner
description: for writing and drawing
A search using keyword 'writing' should return magic marker and XPigment Liner
A search using keyword 'mall' should return the second item
I tried using the LIKE keyword and IN keyword separately ,..
For IN keyword to work,the query has to be
SELECT DISTINCT FROM mytable WHERE name IN ('pall mall cigarettes')
but
SELECT DISTINCT FROM mytable WHERE name IN ('mall')
will return 0 rows
I couldn't figure out how to make a query that accommodates both the name and description columns and allows partial word match..
Can somebody help?
update:
I created the table through hibernate and for the description field, used javax.persistence #Lob annotation.Using psql when I examined the table,It is shown
...
id | bigint | not null
description | text |
name | character varying(255) |
...
One of the records in the table is like,
id | description | name
21 | 133414 | magic marker
First of all, this approach won't scale in the large, you'll need a separate index from words to item (like an inverted index).
If your data is not large, you can do
SELECT DISTINCT(name) FROM mytable WHERE name LIKE '%mall%' OR description LIKE '%mall%'
using OR if you have multiple keywords.
This may work as well.
SELECT *
FROM myTable
WHERE CHARINDEX('mall', name) > 0
OR CHARINDEX('mall', description) > 0