Fuzzy match a substring within a larger string in Postgres

Is it possible to fuzzy match a substring within a larger string in Postgres?
Example:
For a search for color (or colour), return all records where the string includes color, colors or colour.
select
*
from things
where fuzzy(color) in description;
id | description
----------------
1 | A red coloured car
2 | The garden
3 | Painting colors
=> return records 1 and 3
I was wondering if it's possible to combine both fuzzystrmatch and tsvector so that the fuzzy matching could be applied to each vectorized term?
Or if there is another approach?

You can do it, of course (levenshtein comes from the fuzzystrmatch extension), but I doubt it will be very useful:
select *,levenshtein(lexeme,'color') from things, unnest(to_tsvector('english',description))
order by levenshtein;
id | description | lexeme | positions | weights | levenshtein
----+--------------------+--------+-----------+---------+-------------
3 | Painting colors | color | {2} | {D} | 0
1 | A red coloured car | colour | {3} | {D} | 1
1 | A red coloured car | car | {4} | {D} | 3
1 | A red coloured car | red | {2} | {D} | 5
3 | Painting colors | paint | {1} | {D} | 5
2 | The garden | garden | {2} | {D} | 6
Presumably you would want to embellish the query to apply some cutoff (probably one that depends on the word lengths) and return only the best result for each description, provided it meets that cutoff. Doing that should just be routine SQL manipulation.
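A rough sketch of that cutoff-and-best-match step, written in Python rather than SQL so it runs without a Postgres instance. Assumptions to note: it splits descriptions into plain words instead of using to_tsvector (so there is no stemming, and "coloured" is compared as-is), and the relative cutoff of 0.4 edits per character of the longer word is a hypothetical choice, not something taken from the answer above.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def best_fuzzy_match(rows, term, max_ratio=0.4):
    """For each (id, description), keep the closest word whose distance to term,
    divided by the longer word's length, is at most max_ratio (hypothetical cutoff)."""
    result = {}
    for row_id, description in rows:
        best = None
        for word in re.findall(r"[a-z]+", description.lower()):  # naive tokenizer, no stemming
            dist = levenshtein(word, term)
            if dist / max(len(word), len(term)) <= max_ratio and (best is None or dist < best[1]):
                best = (word, dist)
        if best is not None:  # descriptions with no word under the cutoff are dropped
            result[row_id] = best
    return result


things = [(1, "A red coloured car"), (2, "The garden"), (3, "Painting colors")]
print(best_fuzzy_match(things, "color"))  # {1: ('coloured', 3), 3: ('colors', 1)}
```

In SQL, the same shape would typically be a WHERE filter on the levenshtein value followed by DISTINCT ON (id) ... ORDER BY id, levenshtein to keep one row per description.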
Perhaps better would be the word similarity operators added to pg_trgm in PostgreSQL 9.6.
select *, description <->> 'color' as distance from things order by description <->> 'color';
id | description | distance
----+--------------------+----------
3 | Painting colors | 0.166667
1 | A red coloured car | 0.333333
2 | The garden | 1
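For intuition about where those numbers come from, here is a simplified Python approximation. It assumes pg_trgm's usual trigram extraction (lowercase each word, pad it with two spaces in front and one behind), but unlike the real word_similarity it only compares the search term against whole words rather than arbitrary continuous extents of trigrams, so treat it as a stand-in, not a reimplementation. On this sample data it happens to reproduce the distances shown above.

```python
import re


def trigrams(word: str) -> set:
    """pg_trgm-style trigrams: lowercase, pad with two leading spaces and one trailing."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def word_distance(description: str, term: str) -> float:
    """Simplified stand-in for `description <->> term`: 1 minus the largest share of
    the term's trigrams found in any single word of the description."""
    term_trgms = trigrams(term)
    best = 0.0
    for word in re.findall(r"[a-z0-9]+", description.lower()):
        shared = len(term_trgms & trigrams(word))
        best = max(best, shared / len(term_trgms))
    return 1.0 - best


things = [(1, "A red coloured car"), (2, "The garden"), (3, "Painting colors")]
ranked = sorted(things, key=lambda row: word_distance(row[1], "color"))
for row_id, description in ranked:
    print(row_id, description, round(word_distance(description, "color"), 6))
```

"color" has 6 trigrams; "colors" shares 5 of them (distance 1/6) and "coloured" shares 4 (distance 1/3), which is why row 3 ranks ahead of row 1.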
Another option would be to find a stemmer or thesaurus which standardizes British/American spellings (I am not aware of one readily available), and then not use fuzzy matching at all. I think this would be best, if you can do it.
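A minimal sketch of the normalize-then-match idea in Python. Everything here is hypothetical: the three-entry BRITISH_TO_AMERICAN map stands in for a real spelling dictionary or thesaurus, and the suffix stripping is a crude placeholder for a proper stemmer. In Postgres itself, this kind of normalization would more naturally live in a text search configuration (for example a thesaurus dictionary) than in application code.

```python
import re

# Hypothetical spelling map; a real setup would need a much larger list.
BRITISH_TO_AMERICAN = {"colour": "color", "flavour": "flavor", "centre": "center"}

# Crude suffix stripping, a placeholder for a real stemmer (not Snowball/Porter).
SUFFIXES = ("ing", "ed", "s")


def normalize(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least 3 letters remain, so short words like "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return BRITISH_TO_AMERICAN.get(word, word)


def exact_matches(rows, term):
    """Return ids whose description has a word that normalizes to the same form as term."""
    target = normalize(term)
    return [row_id for row_id, text in rows
            if any(normalize(w) == target for w in re.findall(r"[a-z]+", text.lower()))]


things = [(1, "A red coloured car"), (2, "The garden"), (3, "Painting colors")]
print(exact_matches(things, "colour"))  # [1, 3]
```

Once both spellings collapse to the same token, plain equality is enough and no distance threshold has to be tuned.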

Related

How to select or view only the top row in airtable?

I have this table:
Name | Weight | Color
1 Cherry | 1 | Red
2 Apple | 4 | Green
3 Pear | 3 | Yellow
I need a view in which only the top row is visible
Cherry | 1 | Red
When the table changes (new record, sorting), the view changes accordingly
Example 1:
Name | Weight V| Color
1 Apple | 4 | Green
2 Pear | 3 | Yellow
3 Cherry | 1 | Red
single row view:
Apple | 4 | Green
Example 2:
Name | Weight | Color
1 Almond | 0.5 | Brown
2 Apple | 4 | Green
3 Pear | 3 | Yellow
4 Cherry | 1 | Red
single row view:
Almond | 0.5 | Brown
This doesn't seem possible. Didn't find anything in related forums.
GPT3 suggestions were selecting a record by row_id or time_of_creation fields, but this won't help with table resorting.
It also suggested using SELECT(table_name, {}, {limit: N, fields: ["field_1", "field_2"]}) - but limit does not work. Same for FIRST() which doesn't exist.
Any solution to this?

Try a record list. You can sort its elements and also limit how many records it shows.
The other option would be scripting and "limiting" what the query returns when displaying it (which doesn't sound like what you want). I am not sure there is a simple native way.
In the end, record list limits are probably the closest to what you are trying to achieve, especially since your main criterion for the top row is sorting.

SQL "IN" clause usage to include / exclude items at the same time? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I would like some help with the best way to handle a specific check. I have the following SQL query:
SELECT * FROM PRODDTA.F4105
WHERE (COITM, COMCU) IN ((`"Set of Data 1"`))
AND (COLOCN, COLEDG) NOT IN ((`"Set of Data 2"`));
What I want to know is whether there is some way to merge these conditions into one:
SELECT * FROM PRODDTA.F4105
WHERE (COITM, COMCU, COLOCN, COLEDG) IN ((`"Set of Data 1 & 2"`));
Do you know if this is achievable?
Thanks a lot in advance,

This really depends on where Set of Data 1 and Set of Data 2 come from and how they relate to each other, and I'm hard-pressed to imagine a real-world scenario where this would be a good idea.
A Set of Data here is really just a set of Records with Fields. These Fields have, inherently, a relationship to one another which is why they are present on a single Record.
Issue 1: The two sets contain data that don't relate to each other:
Set of Data1 could be completely disparate from Set of Data2 and so there is no way to say that they can be combined.
Imagine two sets:
Set1:
+-------+--------+
| color | animal |
+-------+--------+
| brown | dog |
| white | dog |
| black | dog |
| green | parrot |
| green | turtle |
| brown | turtle |
+-------+--------+
Set2:
+------------+--------+
| food | flavor |
+------------+--------+
| pepper | spicy |
| water | |
| grapefruit | bitter |
| lemon | sour |
| candy | sweet |
+------------+--------+
A query's WHERE clause:
WHERE (f1, f2) IN (SELECT color, animal FROM set1)
AND (f3, f4) IN (SELECT food, flavor FROM set2)
There's no good way to write this where we test (f1, f2, f3, f4) as there is no relationship between color | animal and food | flavor.
We could, if we were in crazytown, cross join the two sets to get their Cartesian product, which yields the same result set as the original query:
WHERE (f1, f2, f3, f4) IN (SELECT color, animal, food, flavor FROM set1, set2)
But now we have a subquery whose intermediate result set has (rows in set1) × (rows in set2) records. This is dumb for multiple reasons:
Indexes on the two sets are ignored
If set1 or set2 are more than a few records, you end up with a HUGE intermediate result set.
There is no relationship between these two sets, so it's crazytown to combine them just to make your SQL a few characters shorter.
There will be huge amounts of unnecessary system resources (CPU, Disk, I/O) used to build and store this intermediate result set resulting in a cumbersome slow query.
Any other developer upon seeing it will hunt you down and kill you. If they call me up, I'll provide the getaway car.
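A small check of the cross-join claim using Python's built-in sqlite3 module (SQLite 3.15 or later is assumed, since the row-value IN syntax needs it). set1 and set2 are the sets above; the table t and its four rows are hypothetical. Both forms keep the same rows, including the NULL case, but the cross-join form has to build the 6 × 5 = 30-row product first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE set1 (color TEXT, animal TEXT);
CREATE TABLE set2 (food TEXT, flavor TEXT);
-- hypothetical table being filtered
CREATE TABLE t (f1 TEXT, f2 TEXT, f3 TEXT, f4 TEXT);
INSERT INTO set1 VALUES ('brown','dog'), ('white','dog'), ('black','dog'),
                        ('green','parrot'), ('green','turtle'), ('brown','turtle');
INSERT INTO set2 VALUES ('pepper','spicy'), ('water',NULL), ('grapefruit','bitter'),
                        ('lemon','sour'), ('candy','sweet');
INSERT INTO t VALUES ('brown','dog','lemon','sour'),
                     ('green','parrot','candy','salty'),
                     ('white','cat','pepper','spicy'),
                     ('black','dog','water',NULL);
""")

# Two independent IN conditions
two_in = [r[0] for r in conn.execute("""
    SELECT rowid FROM t
    WHERE (f1, f2) IN (SELECT color, animal FROM set1)
      AND (f3, f4) IN (SELECT food, flavor FROM set2)
    ORDER BY rowid""")]

# One IN against the Cartesian product of the two sets
cross_in = [r[0] for r in conn.execute("""
    SELECT rowid FROM t
    WHERE (f1, f2, f3, f4) IN (SELECT color, animal, food, flavor FROM set1, set2)
    ORDER BY rowid""")]

cross_rows = conn.execute("SELECT COUNT(*) FROM set1, set2").fetchone()[0]
print(two_in, cross_in, cross_rows)  # [1] [1] 30
```

The last row of t is dropped by both forms, because comparing a NULL flavor yields unknown rather than true.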
Issue 2: The two sets could have a relationship with one another, but merging your IN conditions into one causes records to drop.
Even if the two sets can be combined, you may still end up with results that differ from the original query. Imagine:
Set1:
+-------+--------+
| color | animal |
+-------+--------+
| brown | dog |
| white | dog |
| black | dog |
| green | parrot |
| green | turtle |
| brown | turtle |
+-------+--------+
Set 2:
+--------+-------------+
| animal | stink_scale |
+--------+-------------+
| turtle | 2 |
| parrot | 4 |
| dog | 5 |
| skunk | 10 |
+--------+-------------+
Table1:
+-------+--------+---------+-------------+
| color | animal | animal2 | stink_scale |
+-------+--------+---------+-------------+
| white | dog | dog | 5 |
| brown | dog | parrot | 4 |
| green | turtle | turtle | 2 |
+-------+--------+---------+-------------+
The query you want to change:
SELECT * FROM table1
WHERE (color, animal) IN (SELECT color, animal FROM Set1)
AND (animal2, stink_scale) IN (SELECT animal, stink_scale FROM set2);
This yields 3 records: white | dog is in set1 and dog | 5 is in set2; brown | dog is in set1 and parrot | 4 is in set2; and the same holds for the third record in table1.
BUT if we combine these two sets on their animal key:
SELECT set1.color, set1.animal, set2.animal as animal2, set2.stink_scale FROM set1 JOIN set2 ON set1.animal = set2.animal;
We will get the set:
+-------+--------+---------+-------------+
| color | animal | animal2 | stink_scale |
+-------+--------+---------+-------------+
| brown | dog | dog | 5 |
| white | dog | dog | 5 |
| black | dog | dog | 5 |
| green | parrot | parrot | 4 |
| green | turtle | turtle | 2 |
| brown | turtle | turtle | 2 |
+-------+--------+---------+-------------+
And we use that to combine our IN conditions:
SELECT *
FROM table1
WHERE (color, animal, animal2, stink_scale) IN (SELECT set1.color, set1.animal, set2.animal as animal2, set2.stink_scale FROM set1 JOIN set2 ON set1.animal = set2.animal)
We only get 2 records back, since that subquery contains no row brown | dog | parrot | 4.
So, in the end, unless there is a reason to change the conditions (and thereby the definition of the result set), you're probably best off leaving them alone. Combining them really changes the logic.
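The Issue 2 example can be reproduced with Python's sqlite3 module (again assuming SQLite 3.15 or later for row values). The tables and rows are exactly the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE set1 (color TEXT, animal TEXT);
CREATE TABLE set2 (animal TEXT, stink_scale INTEGER);
CREATE TABLE table1 (color TEXT, animal TEXT, animal2 TEXT, stink_scale INTEGER);
INSERT INTO set1 VALUES ('brown','dog'), ('white','dog'), ('black','dog'),
                        ('green','parrot'), ('green','turtle'), ('brown','turtle');
INSERT INTO set2 VALUES ('turtle',2), ('parrot',4), ('dog',5), ('skunk',10);
INSERT INTO table1 VALUES ('white','dog','dog',5), ('brown','dog','parrot',4),
                          ('green','turtle','turtle',2);
""")

# Original form: two independent IN conditions
separate = conn.execute("""
    SELECT color, animal, animal2, stink_scale FROM table1
    WHERE (color, animal) IN (SELECT color, animal FROM set1)
      AND (animal2, stink_scale) IN (SELECT animal, stink_scale FROM set2)""").fetchall()

# "Merged" form: one IN against set1 joined to set2 on animal
combined = conn.execute("""
    SELECT color, animal, animal2, stink_scale FROM table1
    WHERE (color, animal, animal2, stink_scale) IN (
        SELECT set1.color, set1.animal, set2.animal, set2.stink_scale
        FROM set1 JOIN set2 ON set1.animal = set2.animal)""").fetchall()

print(len(separate), len(combined))  # 3 2
```

The missing row is brown | dog | parrot | 4: the join pairs each animal in set1 only with the same animal in set2, so a dog/parrot combination can never appear in the merged subquery.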

Concatenating multiple entries for a given group on large data set in Access

I have a question regarding joining some large tables then attempting to concatenate multiple entries of an attribute. The data is stored in Access which is where I am attempting to restructure the data via queries for my use case.
I am able to join the tables fine, as seen in the example, but I'm not sure what the best method is for concatenating multi-valued attributes. My data set is huge, so I have been having performance issues.
I have created some comparable data I am working with to give an idea on how I am joining data. I have noted the number of rows that I have for each table.
TabOrders (121,965 rows)
------------------------------------------
OrderNum | Product | ConfigInstance
------------------------------------------
1 | Product1| 100
2 | Product2| 200
TabConfigurations (121,965 rows)
-------------------------------------
ConfigInstance | Configuration
-------------------------------------
100 | C100
200 | C200
TabConfigDetails (4,021,244 rows)
--------------------------------------
Configuration | ConfigIndicator
--------------------------------------
C100 | A1V2
C100 | A2V1
C100 | A3V1
C100 | A3V2
C100 | A4V2
C200 | A1V1
C200 | A2V2
C200 | A2V4
C200 | A3V4
C200 | A3V5
C200 | A4V2
TabAttributes (27,665 rows)
-------------------------------------------
ConfigIndicator | Attribute | Value
-------------------------------------------
A1V1 | Product | Car
A1V2 | Product | Bike
A1V3 | Product | Motorcycle
A1V4 | Product | Go Cart
A2V1 | Color | Red
A2V2 | Color | Green
A2V3 | Color | Blue
A2V4 | Color | Orange
A3V1 | Accessories| Helmet
A3V2 | Accessories| Cup Holder
A3V3 | Accessories| Cargo
A3V4 | Accessories| Trailer
A3V5 | Accessories| GPS
A4V1 | Size | Small
A4V2 | Size | Large
Here is the query I've used to join everything:
SELECT TabOrders.OrderNum, TabOrders.Product, TabAttributes.Attribute, TabAttributes.Value
FROM ((TabOrders INNER JOIN TabConfigurations ON TabOrders.[ConfigInstance] = TabConfigurations.[ConfigInstance]) INNER JOIN TabConfigDetails ON TabConfigurations.[Configuration] = TabConfigDetails.[Configuration]) INNER JOIN TabAttributes ON TabConfigDetails.[ConfigIndicator] = TabAttributes.[ConfigIndicator]
And gets me:
OrderNum | Product | Attribute | Value
------------------------------------------
1 | Product1| Product | Bike
1 | Product1| Color | Red
1 | Product1| Accessories| Helmet
1 | Product1| Accessories| Cup Holder
1 | Product1| Size | Large
2 | Product2| Product | Car
2 | Product2| Color | Green
2 | Product2| Color | Orange
2 | Product2| Accessories| Trailer
2 | Product2| Accessories| GPS
2 | Product2| Size | Large
But I would like to get the data formatted as below; however, the methods* I have used take way too long and Access crashes.
OrderNum | Product | Attribute | Value
------------------------------------------
1 | Product1| Product | Bike
1 | Product1| Color | Red
1 | Product1| Accessories| **Helmet;Cup Holder**
1 | Product1| Size | Large
2 | Product2| Product | Car
2 | Product2| Color | **Green;Orange**
2 | Product2| Accessories| **Trailer;GPS**
2 | Product2| Size | Large
*I've mostly tried using functions. I tried the GetList function (I created another column, CONCAT1, to be used as an index by concatenating ConfigInstance and Attribute, then saved the query as DataConfigurations).
Is there a better way to structure the query for better performance? It seems that when the function runs, it reprocesses the entire query each time it's triggered.
Here is the query:
SELECT DISTINCT DataConfigurations.OrderNum, DataConfigurations.Product, DataConfigurations.Attribute, GetList("Select Value From DataConfigurations As T1 Where DataConfigurations.CONCAT1 = " & [DataConfigurations].[CONCAT1],"",", ") AS Value_CONCAT
FROM DataConfigurations
This seemed to work only when processing a small number of orders; if I tried it on the entire data set, it would run and hang my computer.

See this other question, which seems to achieve what you want to accomplish. The performance issues could be the result of an improper join (i.e. one that duplicates rows), or your PC may just need to be beefier.
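To illustrate the single-pass idea (read the joined rows once and group them, instead of running one GetList subquery per output row), here is a Python sketch over the sample rows from the question. It is an illustration, not Access/VBA code; the same approach would apply to a single loop over a recordset of the join query.

```python
# (OrderNum, Product, Attribute, Value) rows, as returned by the join query above
rows = [
    (1, "Product1", "Product", "Bike"),
    (1, "Product1", "Color", "Red"),
    (1, "Product1", "Accessories", "Helmet"),
    (1, "Product1", "Accessories", "Cup Holder"),
    (1, "Product1", "Size", "Large"),
    (2, "Product2", "Product", "Car"),
    (2, "Product2", "Color", "Green"),
    (2, "Product2", "Color", "Orange"),
    (2, "Product2", "Accessories", "Trailer"),
    (2, "Product2", "Accessories", "GPS"),
    (2, "Product2", "Size", "Large"),
]


def concat_values(rows, sep=";"):
    """Single pass: group by (OrderNum, Product, Attribute) in first-seen order
    and join each group's values with sep."""
    groups = {}
    for order_num, product, attribute, value in rows:
        groups.setdefault((order_num, product, attribute), []).append(value)
    return [key + (sep.join(values),) for key, values in groups.items()]


for row in concat_values(rows):
    print(row)
```

Each input row is touched exactly once, so the cost grows linearly with the 4-million-row join instead of re-running a subquery for every output row.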

Return all the distinct values of column B in one row for each distinct value in column A

Take the following table:
CREATE TABLE boxes (
box integer,
color character varying,
size integer,
...
);
where both box and color can take non-unique values from a small set.
Querying this table with:
SELECT color, box FROM boxes;
the result will be something like:
+-------+-----+
| color | box |
+-------+-----+
| blue | 2 |
| blue | 3 |
| blue | 4 |
| green | 1 |
| green | 3 |
| red | 1 |
| red | 2 |
| red | 2 |
+-------+-----+
Is it possible to query this table in a manner such that the result has two columns, one with an array (or string, or list) with all the different box values for each distinct color?
The result should be something like this:
+-------+-----------+
| color | box_types |
+-------+-----------+
| blue | {2,3,4} |
| green | {1,3} |
| red | {1,2} |
+-------+-----------+
where the color column must contain unique values, and each row must contain only distinct box numbers in the aggregate column.
Since the answer depends on the DBMS, I would like to collect the best solutions for each major DBMS. When answering, please specify which DBMS each query works on.

For Microsoft SQL Server, try the following:
SELECT
color ,
STUFF(
(SELECT DISTINCT ',' +CONVERT(varchar(10), box)
FROM boxes
WHERE color = a.color
FOR XML PATH (''))
, 1, 1, '') AS box_types
FROM boxes AS a
GROUP BY color;

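In MySQL you can do the following (DISTINCT keeps the repeated red | 2 row from being listed twice):
select color, group_concat(distinct box order by box) as box_types from boxes group by color
In Oracle, wm_concat was undocumented and has been removed (as of 12c); the supported function is LISTAGG, whose DISTINCT option needs 19c or later:
select color, listagg(distinct box, ',') within group (order by box) as box_types from boxes group by color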
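For SQLite (not covered by the answers here), group_concat also accepts DISTINCT. Here is a sketch using Python's sqlite3 module with the sample data from the question. SQLite does not guarantee the order of the values inside group_concat, so the sketch sorts them afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE boxes (box INTEGER, color TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO boxes (color, box) VALUES (?, ?)",
    [("blue", 2), ("blue", 3), ("blue", 4), ("green", 1), ("green", 3),
     ("red", 1), ("red", 2), ("red", 2)],
)

rows = conn.execute(
    "SELECT color, group_concat(DISTINCT box) AS box_types "
    "FROM boxes GROUP BY color ORDER BY color"
).fetchall()

# group_concat returns text like '2,3,4'; its internal order is not guaranteed,
# so split and sort the values before using them.
box_types = {color: sorted(int(b) for b in types.split(",")) for color, types in rows}
print(box_types)  # {'blue': [2, 3, 4], 'green': [1, 3], 'red': [1, 2]}
```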

First of all, this goes against the principle of normalization (a column holding a list of values violates first normal form); in other words, it's "bad".
However, some DBMSs, such as Microsoft SQL Server, offer something similar through the PIVOT clause (and its inverse, UNPIVOT).
Using your example, this clause lets you build a table like this:
+-------+------+------+------+
| color | box1 | box2 | box3 |
+-------+------+------+------+
| blue | 2 | 3 | 4 |
| green | 1 | 3 | null |
| red | 1 | 2 | null |
+-------+------+------+------+

How to get the cartesian product in MySQL for a single table

Disclaimers first: I'm dealing with a legacy database with a pretty bizarre schema. Plus, I'm a complete SQL noob, so that's not helping either.
Basically, I have a table that has product variations. A good example might be t-shirts. The "generic" t-shirt has a product id. Each type of variation (size, color) has an id. Each value of the variation (red, small) has an id.
So table looks like:
+----+----------+-----------+-------------+----------+------------+
| id | tshirt | option_id | option_name | value_id | value_name |
+----+----------+-----------+-------------+----------+------------+
| 1 | Zombies! | 2 | color | 13 | red |
| 1 | Zombies! | 2 | color | 24 | black |
| 1 | Zombies! | 3 | size | 35 | small |
| 1 | Zombies! | 3 | size | 36 | medium |
| 1 | Zombies! | 3 | size | 56 | large |
| 2 | Ninja! | 2 | color | 24 | black |
| 2 | Ninja! | 3 | size | 35 | small |
+----+----------+-----------+-------------+----------+------------+
I want to write a query that retrieves the different combinations for a given product.
In this example, the Zombie shirt comes in Red/Small, Red/Medium, Red/Large, Black/Small, Black/Medium, and Black/Large (six variations). The Ninja shirt just has the one variation: Black/Small.
I believe this is the cartesian product of size and color.
Those ids are really foreign keys to other tables, so those names/values aren't actually stored in this table, but I wanted to include them for clarity.
The number of options can vary (not limited to two) and the number of values per option can vary as well. Ultimately, the numbers are likely to be small-ish for a given product, so I'm not worried about millions of rows here.
Any ideas on how I might do this?
Thanks,
p.

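Try this (it assumes exactly two options, color with option_id 2 and size with option_id 3, in a table named foo):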
select
f.id,
f.tshirt,
color.option_id as color_option_id,
color.option_name as color_option_name,
color.value_id as color_value_id,
color.value_name as color_value_name,
size.option_id as size_option_id,
size.option_name as size_option_name,
size.value_id as size_value_id,
size.value_name as size_value_name
from
foo f
inner join foo color on f.id = color.id and f.value_id = color.value_id and color.option_id = 2
inner join foo size on f.id = size.id and size.option_id = 3
order by
f.id,
color.option_id, color.value_id,
size.option_id, size.value_id;
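The query above hard-codes two options. Because the question says the number of options can vary, one alternative is to build the combinations in application code. Here is a Python sketch using itertools.product on the sample rows (tuples in the same column order as the table). Doing this in pure MySQL for an unknown number of options would likely need dynamic SQL with one self-join per option.

```python
from itertools import product

# (id, tshirt, option_id, option_name, value_id, value_name), as in the question
rows = [
    (1, "Zombies!", 2, "color", 13, "red"),
    (1, "Zombies!", 2, "color", 24, "black"),
    (1, "Zombies!", 3, "size", 35, "small"),
    (1, "Zombies!", 3, "size", 36, "medium"),
    (1, "Zombies!", 3, "size", 56, "large"),
    (2, "Ninja!", 2, "color", 24, "black"),
    (2, "Ninja!", 3, "size", 35, "small"),
]


def variations(rows):
    """Map each product id to the Cartesian product of its option values."""
    options = {}  # product id -> {option_id: [value_name, ...]}
    for product_id, _tshirt, option_id, _option_name, _value_id, value_name in rows:
        options.setdefault(product_id, {}).setdefault(option_id, []).append(value_name)
    # One list per option (ordered by option_id), then every combination of them
    return {
        product_id: list(product(*(per_option[k] for k in sorted(per_option))))
        for product_id, per_option in options.items()
    }


for product_id, combos in variations(rows).items():
    print(product_id, combos)
```

Any number of options works here, because product takes one value list per option; the Zombies shirt yields its six variations and the Ninja shirt yields one.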

Looks like distinct can do the trick:
select distinct tshirt
, option_name
, value_name
from YourTable
