SQL "IN" clause usage to include / exclude items at the same time? [closed] - sql

I would like to have some help on the best way to address a specific revision. I have the following SQL query:
SELECT * FROM PRODDTA.F4105
WHERE (COITM, COMCU) IN ((`"Set of Data 1"`))
AND (COLOCN, COLEDG) NOT IN ((`"Set of Data 2"`));
What I want to know is whether there is some way to merge these conditions into one:
SELECT * FROM PRODDTA.F4105
WHERE (COITM, COMCU, COLOCN, COLEDG) IN ((`"Set of Data 1 & 2"`));
Do you know if this is achievable?
Thanks a lot in advance,

This really depends on where Set of Data 1 and Set of Data 2 come from and what their relationship is to each other, and I'm hard pressed to imagine a real-world scenario where this would be a good idea.
A Set of Data here is really just a set of Records with Fields. These Fields have, inherently, a relationship to one another which is why they are present on a single Record.
Issue 1: The two sets contain data that don't relate to each other:
Set of Data1 could be completely disparate from Set of Data2 and so there is no way to say that they can be combined.
Imagine two sets:
Set1:
+-------+--------+
| color | animal |
+-------+--------+
| brown | dog |
| white | dog |
| black | dog |
| green | parrot |
| green | turtle |
| brown | turtle |
+-------+--------+
Set2:
+------------+--------+
| food | flavor |
+------------+--------+
| pepper | spicy |
| water | |
| grapefruit | bitter |
| lemon | sour |
| candy | sweet |
+------------+--------+
A query's WHERE clause:
WHERE (f1, f2) IN (SELECT color, animal FROM set1)
AND (f3, f4) IN (SELECT food, flavor FROM set2)
There's no good way to write this where we test (f1, f2, f3, f4) as there is no relationship between color | animal and food | flavor.
We could, if we were in crazytown, cross join the two sets to get their cartesian product yielding the same result set as the original query:
WHERE (f1, f2, f3, f4) IN (SELECT color, animal, food, flavor FROM set1, set2)
But now we have a subquery whose intermediate result set has (rows in set1) x (rows in set2) records. This is dumb for multiple reasons:
Indexes on the two sets are ignored
If set1 or set2 are more than a few records, you end up with a HUGE intermediate result set.
There is no relationship between these two sets, so it's crazytown to combine them just to make your SQL a few characters shorter.
There will be huge amounts of unnecessary system resources (CPU, Disk, I/O) used to build and store this intermediate result set resulting in a cumbersome slow query.
Any other developer upon seeing it will hunt you down and kill you. If they call me up, I'll provide the getaway car.
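To make the cost concrete, here is a minimal sketch that runs both forms of the WHERE clause against the two sets above. It uses Python's built-in sqlite3 module rather than the asker's database, and the table t with its three rows is made up purely for illustration; the blank flavor for water is stored as an empty string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE set1 (color TEXT, animal TEXT);
INSERT INTO set1 VALUES ('brown','dog'),('white','dog'),('black','dog'),
                        ('green','parrot'),('green','turtle'),('brown','turtle');
CREATE TABLE set2 (food TEXT, flavor TEXT);
INSERT INTO set2 VALUES ('pepper','spicy'),('water',''),('grapefruit','bitter'),
                        ('lemon','sour'),('candy','sweet');
-- Hypothetical table under test; its name and rows are not from the post.
CREATE TABLE t (f1 TEXT, f2 TEXT, f3 TEXT, f4 TEXT);
INSERT INTO t VALUES ('brown','dog','lemon','sour'),
                     ('green','parrot','candy','bitter'),
                     ('pink','dog','pepper','spicy');
""")

# Two independent IN conditions, as in the original WHERE clause.
two_in_rows = conn.execute("""
    SELECT * FROM t
    WHERE (f1, f2) IN (SELECT color, animal FROM set1)
      AND (f3, f4) IN (SELECT food, flavor FROM set2)
""").fetchall()

# One merged IN condition against the cartesian product of both sets.
merged_rows = conn.execute("""
    SELECT * FROM t
    WHERE (f1, f2, f3, f4) IN
          (SELECT color, animal, food, flavor FROM set1, set2)
""").fetchall()

# Size of the intermediate result the merged form has to produce.
intermediate = conn.execute("SELECT COUNT(*) FROM set1, set2").fetchone()[0]

print(two_in_rows)   # same rows as merged_rows
print(merged_rows)
print(intermediate)  # 6 rows x 5 rows = 30
```

Both queries return the same single row, but the merged form needs 30 intermediate rows to do it, and that number grows multiplicatively with the size of the sets.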
Issue 2: The two sets could have a relationship with one another, but merging your IN conditions into one causes records to drop.
Even if the two sets can be combined, you may still end up with results that differ from the original query. Imagine:
Set1:
+-------+--------+
| color | animal |
+-------+--------+
| brown | dog |
| white | dog |
| black | dog |
| green | parrot |
| green | turtle |
| brown | turtle |
+-------+--------+
Set 2:
+--------+-------------+
| animal | stink_scale |
+--------+-------------+
| turtle | 2 |
| parrot | 4 |
| dog | 5 |
| skunk | 10 |
+--------+-------------+
Table1:
+-------+--------+---------+-------------+
| color | animal | animal2 | stink_scale |
+-------+--------+---------+-------------+
| white | dog | dog | 5 |
| brown | dog | parrot | 4 |
| green | turtle | turtle | 2 |
+-------+--------+---------+-------------+
The query you want to change:
SELECT * FROM table1
WHERE (color, animal) IN (SELECT color, animal FROM Set1)
AND (animal2, stink_scale) IN (SELECT animal, stink_scale FROM set2);
This would yield 3 records: white | dog is in set1 and dog | 5 is in set2, brown | dog is in set1 and parrot | 4 is in set2, and the same holds for the third record in table1.
BUT if we combine these two sets on their animal key:
SELECT set1.color, set1.animal, set2.animal as animal2, set2.stink_scale FROM set1 JOIN set2 ON set1.animal = set2.animal;
We will get the set:
+-------+--------+---------+-------------+
| color | animal | animal2 | stink_scale |
+-------+--------+---------+-------------+
| brown | dog | dog | 5 |
| white | dog | dog | 5 |
| black | dog | dog | 5 |
| green | parrot | parrot | 4 |
| green | turtle | turtle | 2 |
| brown | turtle | turtle | 2 |
+-------+--------+---------+-------------+
And we use that to combine our IN conditions:
SELECT *
FROM table1
WHERE (color, animal, animal2, stink_scale) IN (SELECT set1.color, set1.animal, set2.animal as animal2, set2.stink_scale FROM set1 JOIN set2 ON set1.animal = set2.animal)
We only get 2 records back, since no row in that subquery's result matches brown | dog | parrot | 4.
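The difference can be checked with a small sketch. It loads the three tables from this example into an in-memory SQLite database through Python's sqlite3 module (an assumption made for convenience; the asker's database is a different engine) and counts the rows each form returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE set1 (color TEXT, animal TEXT);
INSERT INTO set1 VALUES ('brown','dog'),('white','dog'),('black','dog'),
                        ('green','parrot'),('green','turtle'),('brown','turtle');
CREATE TABLE set2 (animal TEXT, stink_scale INTEGER);
INSERT INTO set2 VALUES ('turtle',2),('parrot',4),('dog',5),('skunk',10);
CREATE TABLE table1 (color TEXT, animal TEXT, animal2 TEXT, stink_scale INTEGER);
INSERT INTO table1 VALUES ('white','dog','dog',5),
                          ('brown','dog','parrot',4),
                          ('green','turtle','turtle',2);
""")

# Original form: two independent IN conditions.
separate = conn.execute("""
    SELECT * FROM table1
    WHERE (color, animal) IN (SELECT color, animal FROM set1)
      AND (animal2, stink_scale) IN (SELECT animal, stink_scale FROM set2)
""").fetchall()

# Merged form: one IN condition against set1 joined to set2 on animal.
merged = conn.execute("""
    SELECT * FROM table1
    WHERE (color, animal, animal2, stink_scale) IN
          (SELECT set1.color, set1.animal, set2.animal, set2.stink_scale
           FROM set1 JOIN set2 ON set1.animal = set2.animal)
""").fetchall()

# The row the merged form loses.
dropped = [row for row in separate if row not in merged]

print(len(separate), len(merged))
print(dropped)
```

The dropped row is brown | dog | parrot | 4: it passes each condition on its own, but the join forces animal and animal2 to be equal.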
So, in the end, unless there is a reason to change the conditions (and thus the definition of the result set), you're probably best off leaving it alone. Merging the conditions really changes the logic.

Related

Fuzzy match a substring within a larger string in Postgres

Is it possible to fuzzy match a substring within a larger string in Postgres?
Example:
For a search of colour (ou), return all records where the string includes color, colors or colour.
select
*
from things
where fuzzy(color) in description;
id | description
----------------
1 | A red coloured car
2 | The garden
3 | Painting colors
=> return records 1 and 3
I was wondering if it's possible to combine both fuzzystrmatch and tsvector so that the fuzzy matching could be applied to each vectorized term?
Or if there is another approach?
You can do it of course, but I doubt it will be very useful:
select *,levenshtein(lexeme,'color') from things, unnest(to_tsvector('english',description))
order by levenshtein;
id | description | lexeme | positions | weights | levenshtein
----+--------------------+--------+-----------+---------+-------------
3 | Painting colors | color | {2} | {D} | 0
1 | A red coloured car | colour | {3} | {D} | 1
1 | A red coloured car | car | {4} | {D} | 3
1 | A red coloured car | red | {2} | {D} | 5
3 | Painting colors | paint | {1} | {D} | 5
2 | The garden | garden | {2} | {D} | 6
Presumably you would want to embellish the query to apply some cutoff, probably where the cutoff depends on the lengths, and return only the best result for each description assuming it met that cutoff. Doing that should be just routine SQL manipulations.
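As a rough illustration of that cutoff step, here is a sketch in Python rather than SQL. It starts from the (id, lexeme) pairs shown in the output above, uses a hand-written Levenshtein function, and applies a length-dependent cutoff of one third of the longer string. That cutoff rule is an assumption chosen for the example, not something fuzzystrmatch prescribes.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping two rows at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# (id, lexeme) pairs copied from the query output above.
lexemes = [(3, "color"), (1, "colour"), (1, "car"),
           (1, "red"), (3, "paint"), (2, "garden")]

def best_matches(term, rows):
    """Return {id: (lexeme, distance)} for the closest lexeme per id,
    keeping only lexemes within the length-dependent cutoff."""
    best = {}
    for row_id, lexeme in rows:
        distance = levenshtein(lexeme, term)
        cutoff = max(len(lexeme), len(term)) // 3  # assumed cutoff rule
        if distance > cutoff:
            continue
        if row_id not in best or distance < best[row_id][1]:
            best[row_id] = (lexeme, distance)
    return best

matches = best_matches("color", lexemes)
print(matches)
```

With these inputs only descriptions 1 and 3 survive the cutoff, which is the result the question asks for.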
Perhaps better would be the word similarity operators recently added to pg_trgm.
select *, description <->> 'color' as distance from things order by description <->> 'color';
id | description | distance
----+--------------------+----------
3 | Painting colors | 0.166667
1 | A red coloured car | 0.333333
2 | The garden | 1
Another option would be to find a stemmer or thesaurus which standardizes British/American spellings (I am not aware of one readily available), and then not use fuzzy matching at all. I think this would be best, if you can do it.

Sybase, show all rows but don't display column data when duplicate

Product: Sybase ASE 11/12/15/16
I am looking to update a Stored Procedure that gets called by different applications, so changing the application(s) isn't an option. What is needed is best explained in examples:
Current results:
type | breed | name
------------------------------------
dog | german shepherd | Bernie
dog | german shepherd | James
dog | husky | Laura
cat | british blue | Mr Fluffles
cat | other | Laserchild
cat | british blue | Sleepy head
fish | goldfish | Goldie
What I need is for the First column's data to be cleared on duplicates. For example, the above data should look like:
type | breed | name
------------------------------------
dog | german shepherd | Bernie
| german shepherd | James
| husky | Laura
cat | british blue | Mr Fluffles
| other | Laserchild
| british blue | Sleepy head
fish | goldfish | Goldie
I know I can use a cursor, but there are around 10,000 records and that doesn't seem efficient. I am looking for a SELECT statement; I don't want to change the data in the database.
After mulling over this, I found a solution that would work and not use a cursor.
select Type,breed,name
into #DontDisplay
from #MyDataList as a1
group by breed
Having breed= (select max(name)
from #MyDataList a2
where a1.breed= a2.breed)
order by breed, name
select n.Type,d.Breed,d.Name
from #MyDataList as d
left join #DontDisplay as n
on d.Breed= n.Breed and d.Name= n.Name
order by Breed
This works great; it is based on the solution to another question, Sybase SQL Select Distinct Based on Multiple Columns with an ID.
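For comparison, the same effect can be sketched with a single SELECT and no temporary table. This runs on SQLite through Python's sqlite3 module, not on Sybase ASE, so treat it as an illustration of the idea only: the table name pets is made up, and the query relies on SQLite's rowid to stand in for the original row order, which Sybase does not provide in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table name; the rows are the sample data from the question.
CREATE TABLE pets (type TEXT, breed TEXT, name TEXT);
INSERT INTO pets VALUES
    ('dog',  'german shepherd', 'Bernie'),
    ('dog',  'german shepherd', 'James'),
    ('dog',  'husky',           'Laura'),
    ('cat',  'british blue',    'Mr Fluffles'),
    ('cat',  'other',           'Laserchild'),
    ('cat',  'british blue',    'Sleepy head'),
    ('fish', 'goldfish',        'Goldie');
""")

# Show the type only on the first row of each type (lowest rowid);
# every later row of the same type gets an empty string instead.
rows = conn.execute("""
    SELECT CASE
             WHEN p.rowid = (SELECT MIN(p2.rowid)
                             FROM pets AS p2
                             WHERE p2.type = p.type)
             THEN p.type
             ELSE ''
           END AS type,
           p.breed,
           p.name
    FROM pets AS p
    ORDER BY p.rowid
""").fetchall()

for row in rows:
    print(row)
```

The stored data is never modified; only the selected value of the first column changes.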

Concatenating multiple entries for a given group on large data set in Access

I have a question regarding joining some large tables then attempting to concatenate multiple entries of an attribute. The data is stored in Access which is where I am attempting to restructure the data via queries for my use case.
I am able to join the tables fine, as seen in the example, but I am not sure what the best method is for concatenating multi-valued attributes. My data set is huge, so I have been having performance issues.
I have created some comparable data I am working with to give an idea on how I am joining data. I have noted the number of rows that I have for each table.
TabOrders (121,965 rows)
------------------------------------------
OrderNum | Product | ConfigInstance
------------------------------------------
1 | Product1| 100
2 | Product2| 200
TabConfigurations (121,965 rows)
-------------------------------------
ConfigInstance | Configuration
-------------------------------------
100 | C100
200 | C200
TabConfigDetails (4,021,244 rows)
--------------------------------------
Configuration | ConfigIndicator
--------------------------------------
C100 | A1V2
C100 | A2V1
C100 | A3V1
C100 | A3V2
C100 | A4V2
C200 | A1V1
C200 | A2V2
C200 | A2V4
C200 | A3V4
C200 | A3V5
C200 | A4V2
TabAttributes (27,665 rows)
-------------------------------------------
ConfigIndicator | Attribute | Value
-------------------------------------------
A1V1 | Product | Car
A1V2 | Product | Bike
A1V3 | Product | Motorcycle
A1V4 | Product | Go Cart
A2V1 | Color | Red
A2V2 | Color | Green
A2V3 | Color | Blue
A2V4 | Color | Orange
A3V1 | Accessories| Helmet
A3V2 | Accessories| Cup Holder
A3V3 | Accessories| Cargo
A3V4 | Accessories| Trailer
A3V5 | Accessories| GPS
A4V1 | Size | Small
A4V2 | Size | Large
Here is the query I've used to join everything:
SELECT TabOrders.OrderNum, TabOrders.Product, TabAttributes.Attribute, TabAttributes.Value
FROM ((TabOrders INNER JOIN TabConfigurations ON TabOrders.[ConfigInstance] = TabConfigurations.[ConfigInstance]) INNER JOIN TabConfigDetails ON TabConfigurations.[Configuration] = TabConfigDetails.[Configuration]) INNER JOIN TabAttributes ON TabConfigDetails.[ConfigIndicator] = TabAttributes.[ConfigIndicator]
And it gets me:
OrderNum | Product | Attribute | Value
------------------------------------------
1 | Product1| Product | Bike
1 | Product1| Color | Red
1 | Product1| Accessories| Helmet
1 | Product1| Accessories| Cup Holder
1 | Product1| Size | Large
2 | Product2| Product | Car
2 | Product2| Color | Green
2 | Product2| Color | Orange
2 | Product2| Accessories| Trailer
2 | Product2| Accessories| GPS
2 | Product2| Size | Large
But I would like to get the data formatted as below. The methods* I have used take far too long, and Access crashes.
OrderNum | Product | Attribute | Value
------------------------------------------
1 | Product1| Product | Bike
1 | Product1| Color | Red
1 | Product1| Accessories| **Helmet;Cup Holder**
1 | Product1| Size | Large
2 | Product2| Product | Car
2 | Product2| Color | **Green;Orange**
2 | Product2| Accessories| **Trailer;GPS**
2 | Product2| Size | Large
*I've mostly attempted to use functions. I tried the GetList function (I created another column, CONCAT1, to use as an index by concatenating ConfigInstance and Attribute, then saved the query as DataConfigurations).
GetList: GetList
Is there a better way to structure the query for better performance? It seems that when the function runs, it reprocesses the entire query each time it's triggered.
Here is the query:
SELECT DISTINCT DataConfigurations.OrderNum, DataConfigurations.Product, DataConfigurations.Attribute, GetList("Select Value From DataConfigurations As T1 Where DataConfigurations.CONCAT1 = " & [DataConfigurations].[CONCAT1],"",", ") AS Value_CONCAT
FROM DataConfigurations
This seemed to work only when processing a small number of orders. When I tried it on the entire data set, it would run and hang my computer.
See this other question, which seems to achieve what you want to accomplish. The performance issues could be the result of an improper link (i.e. duplication), or your PC may just need to be beefier.
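One way to avoid re-running a query for every output row is to concatenate in a single pass over the already-joined rows. The sketch below shows that idea in Python, not in Access or VBA, using the joined sample rows from the question. Porting it would presumably mean looping once over a recordset in VBA, which is an assumption and not something tested here.

```python
# Joined rows from the question: (OrderNum, Product, Attribute, Value).
joined_rows = [
    (1, "Product1", "Product", "Bike"),
    (1, "Product1", "Color", "Red"),
    (1, "Product1", "Accessories", "Helmet"),
    (1, "Product1", "Accessories", "Cup Holder"),
    (1, "Product1", "Size", "Large"),
    (2, "Product2", "Product", "Car"),
    (2, "Product2", "Color", "Green"),
    (2, "Product2", "Color", "Orange"),
    (2, "Product2", "Accessories", "Trailer"),
    (2, "Product2", "Accessories", "GPS"),
    (2, "Product2", "Size", "Large"),
]

def concat_values(rows, separator=";"):
    """Group values by (OrderNum, Product, Attribute) in one pass.

    A dict keeps keys in first-seen order, so the output order follows
    the input order without any extra sorting.
    """
    groups = {}
    for order_num, product, attribute, value in rows:
        groups.setdefault((order_num, product, attribute), []).append(value)
    return [key + (separator.join(values),) for key, values in groups.items()]

result = concat_values(joined_rows)
for row in result:
    print(row)
```

Each input row is touched exactly once, so the work grows linearly with the number of joined rows instead of once per output row.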

mysql GROUP_CONCAT duplicates

I make my join from a farmTOanimal table like this. There is a similar farmTotool table
id | FarmID | animal
1 | 1 | cat
2 | 1 | dog
When I join my tables in a view, I get a result that looks like this
FarmID | animal | tool
1 | cat | shovel
1 | dog | shovel
1 | cat | bucket
1 | dog | bucket
Now, when I do GROUP BY FarmID with GROUP_CONCAT(animal) and GROUP_CONCAT(tool), I get
FarmID | animals | tools
1 | cat,dog,cat,dog | shovel,shovel,bucket,bucket
But, what I really want is a result that looks like this. How can I do it?
FarmID | animals | tools
1 | cat,dog | shovel,bucket
You need to use the DISTINCT option:
GROUP_CONCAT(DISTINCT animal)
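A quick way to see the effect is the sketch below. It runs on SQLite through Python's sqlite3 module, since SQLite happens to have an aggregate with the same name, so it illustrates the behaviour but is not a MySQL session. The table names farm_animal and farm_tool are stand-ins for the farmTOanimal and farmTotool tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in tables; names and the tool rows are assumed from the question.
CREATE TABLE farm_animal (id INTEGER, FarmID INTEGER, animal TEXT);
INSERT INTO farm_animal VALUES (1, 1, 'cat'), (2, 1, 'dog');
CREATE TABLE farm_tool (id INTEGER, FarmID INTEGER, tool TEXT);
INSERT INTO farm_tool VALUES (1, 1, 'shovel'), (2, 1, 'bucket');
""")

# Joining both tables on FarmID pairs every animal with every tool,
# so a plain GROUP_CONCAT sees each value more than once.
plain = conn.execute("""
    SELECT a.FarmID, GROUP_CONCAT(a.animal), GROUP_CONCAT(t.tool)
    FROM farm_animal AS a
    JOIN farm_tool AS t ON t.FarmID = a.FarmID
    GROUP BY a.FarmID
""").fetchone()

# DISTINCT collapses the repeats inside each aggregate.
dedup = conn.execute("""
    SELECT a.FarmID,
           GROUP_CONCAT(DISTINCT a.animal),
           GROUP_CONCAT(DISTINCT t.tool)
    FROM farm_animal AS a
    JOIN farm_tool AS t ON t.FarmID = a.FarmID
    GROUP BY a.FarmID
""").fetchone()

print(plain)
print(dedup)
```

The order of the values inside each concatenated string is not guaranteed here, so compare the contents rather than the exact strings.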

How to get the cartesian product in MySQL for a single table

Disclaimers first: I'm dealing with a legacy database with a pretty bizarre schema. Plus, I'm a complete SQL noob, so that's not helping either.
Basically, I have a table that has product variations. A good example might be t-shirts. The "generic" t-shirt has a product id. Each type of variation (size, color) has an id. Each value of the variation (red, small) has an id.
So table looks like:
+----+----------+-----------+-------------+----------+------------+
| id | tshirt | option_id | option_name | value_id | value_name |
+----+----------+-----------+-------------+----------+------------+
| 1 | Zombies! | 2 | color | 13 | red |
| 1 | Zombies! | 2 | color | 24 | black |
| 1 | Zombies! | 3 | size | 35 | small |
| 1 | Zombies! | 3 | size | 36 | medium |
| 1 | Zombies! | 3 | size | 56 | large |
| 2 | Ninja! | 2 | color | 24 | black |
| 2 | Ninja! | 3 | size | 35 | small |
+----+----------+-----------+-------------+----------+------------+
I want to write a query that retrieves the different combinations for a given product.
In this example, the Zombie shirt comes in Red/Small, Red/Medium, Red/Large, Black/Small, Black/Medium, and Black/Large (six variations). The Ninja shirt just has the one variation: Black/Small.
I believe this is the cartesian product of size and color.
Those ids are really foreign keys to other tables, so those names/values aren't actually in this table, but I wanted to include them for clarity.
The number of options can vary (not limited to two) and the number of values per option can vary as well. Ultimately, the numbers are likely to be small-ish for a given product, so I'm not worried about millions of rows here.
Any ideas on how I might do this?
Thanks,
p.
try this:
select
f.id,
f.tshirt,
color.option_id as color_option_id,
color.option_name as color_option_name,
color.value_id as color_value_id,
color.value_name as color_value_name,
size.option_id as size_option_id,
size.option_name as size_option_name,
size.value_id as size_value_id,
size.value_name as size_value_name
from
foo f
inner join foo color on f.id = color.id and f.value_id = color.value_id and color.option_id = 2
inner join foo size on f.id = size.id and size.option_id = 3
order by
f.id,
color.option_id, color.value_id,
size.option_id, size.value_id;
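A trimmed-down version of this self-join can be checked against the sample data with the sketch below. It uses SQLite through Python's sqlite3 module instead of MySQL, keeps the table name foo from the query above, and hard-codes option ids 2 (color) and 3 (size), so it only covers the two-option case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER, tshirt TEXT, option_id INTEGER,
                  option_name TEXT, value_id INTEGER, value_name TEXT);
INSERT INTO foo VALUES
    (1, 'Zombies!', 2, 'color', 13, 'red'),
    (1, 'Zombies!', 2, 'color', 24, 'black'),
    (1, 'Zombies!', 3, 'size',  35, 'small'),
    (1, 'Zombies!', 3, 'size',  36, 'medium'),
    (1, 'Zombies!', 3, 'size',  56, 'large'),
    (2, 'Ninja!',   2, 'color', 24, 'black'),
    (2, 'Ninja!',   3, 'size',  35, 'small');
""")

# Pair every color row with every size row of the same product.
combos = conn.execute("""
    SELECT c.tshirt, c.value_name AS color, s.value_name AS size
    FROM foo AS c
    JOIN foo AS s ON s.id = c.id
    WHERE c.option_id = 2   -- color rows
      AND s.option_id = 3   -- size rows
    ORDER BY c.id, c.value_id, s.value_id
""").fetchall()

for combo in combos:
    print(combo)
```

This yields six combinations for the Zombies shirt and one for the Ninja shirt. Each additional option type would need one more self-join, so a product with a variable number of options cannot be handled by a fixed query of this shape.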
Looks like distinct can do the trick:
select distinct tshirt
, option_name
, value_name
from YourTable