How to handle multiple one-to-many relationships in SQL?

I have a database with several tables, 5 of which are dedicated to specific publication types. Each of these 5 has a one-to-many relationship with a status table and a people table. All of these tables are tied together using a unique "pubid". I have a view which includes the pubid (for all 5 types) along with their associated keywords. When a user does a keyword search and the results span more than one of those 5 publication-type tables, I am really not sure how to handle it.
If there were only a single publication type (so just one table with a one-to-many relationship) it would be very easy to accomplish with a nested join, something like:
SELECT * FROM articles
INNER JOIN status ON articles.spubid = status.spubid
INNER JOIN people ON articles.spubid = people.spubid
WHERE people.saffil = 'ABC' ORDER BY people.iorder, articles.spubid;
In that example 'articles' is one of the 5 tables I mentioned that has a one-to-many relationship. Let's say that the keyword search brings back results that include articles, books and papers. How can I achieve the same end with that many different tables? If I were to join them all, the Cartesian product would be so large that I think the overhead of parsing it into a usable format would be too high. What are my other options in this case?

Why are they in separate tables? Are the columns really that different? And what columns do you want to return (never, ever use SELECT * in production, especially with joins)? Do they differ between the publication types in your query?
If you can get the columns to be the same, I suggest you use UNION ALL. Even if the columns differ, you can still get what you want by supplying every column you need in each part of the union statement and selecting NULL for the columns a given set of tables doesn't have. Simplified code follows:
SELECT articlename, articlestatus, author, ISBN_Number
FROM articles
INNER JOIN status ON articles.spubid = status.spubid
INNER JOIN people ON articles.spubid = people.spubid
WHERE people.saffil = 'ABC'
UNION ALL
SELECT papername, paperstatus, author, null
FROM papers
INNER JOIN status ON papers.spubid = status.spubid
INNER JOIN people ON papers.spubid = people.spubid
WHERE people.saffil = 'ABC'
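One wrinkle worth noting: if you still want the ORDER BY from your single-table version, it can only appear once, after the last SELECT, and it sorts the combined result. A simplified sketch, with the columns trimmed down:

SELECT articlename AS title, author
FROM articles
INNER JOIN people ON articles.spubid = people.spubid
UNION ALL
SELECT papername AS title, author
FROM papers
INNER JOIN people ON papers.spubid = people.spubid
ORDER BY author;  -- sorts the whole union, not just the last SELECT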

You could create a view that is the union of all the various tables. The tricky bit is making sure that all the queries being UNIONed have the same fields, so you'll need to stub a few in each one. Here's a simplified example with only two tables:
CREATE VIEW AllTables AS
SELECT Afield1, Afield2, NULL as Bfield1, NULL as Bfield2 FROM Atable
UNION
SELECT NULL as Afield1, NULL as Afield2, Bfield1, Bfield2 FROM Btable;
You could use something other than NULL as the stub value if necessary, of course. Then you run your query against the view. If you need to vary the formatting according to the publication type, you could include the originating table as part of the view (i.e., add 'magazine' AS publication_type or something similar to each of your SELECTs).
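For instance, the same two-table view with the originating type baked in might look like this (the type labels 'article' and 'magazine' are just illustrative):

CREATE VIEW AllTables AS
SELECT 'article' AS publication_type, Afield1, Afield2, NULL AS Bfield1, NULL AS Bfield2 FROM Atable
UNION
SELECT 'magazine' AS publication_type, NULL, NULL, Bfield1, Bfield2 FROM Btable;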

Perhaps you could create views for each of the 5 types (vwBook, vwArticle, etc.)
As you're searching, perhaps call into a stored proc that queries all 5 views using the keywords you throw at it. Each of the 5 result sets could go into a table variable in your stored proc.
Modify, of course, as you see fit. Here's a broad stroke example:
create proc MySearch
    @MySearchTerm varchar(50)
AS
DECLARE @SearchResults TABLE
(
    Type varchar(10)        -- the view you found the result in
    ,ID int                 -- the primary key of the record, to link it back to the original
    ,FoundText varchar(512)
    --etc
)
INSERT INTO @SearchResults (Type, ID, FoundText)
SELECT 'Articles', ID, SomeKeyField
FROM vwArticle
WHERE SomeKeyField LIKE '%' + @MySearchTerm + '%'

INSERT INTO @SearchResults (Type, ID, FoundText)
SELECT 'Book', ID, SomeKeyField
FROM vwBook
WHERE SomeKeyField LIKE '%' + @MySearchTerm + '%'

--repeat as needed with the 3 other views that you'd build

SELECT * FROM @SearchResults
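You'd then call the proc in the usual way, e.g.:

EXEC MySearch @MySearchTerm = 'blue';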

I really don't like the table design here. However, I guess redesigning the entire database from scratch is a bit too extreme.
Given the table design, I think you're going to have to go with a UNION and specify the columns in each table. I know it's a monster, but that's what happens when you design tables with lots of columns that are "almost" alike.
And HLGEM is right. Using "select *" in a permanently-stored query is really dangerous.

We ended up creating a very elaborate view which includes the one-to-many tables as array columns in the view. This allows a single query to be performed on a single view, and all the required data is returned. The view definition is VERY complex, but it is working like a champ; the real trick was using the ARRAY() function in PostgreSQL.
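For anyone who lands here later, a minimal sketch of what the ARRAY() trick looks like; the column names (sname, sstatus, iorder) are assumptions based on the question, not the actual view definition:

CREATE VIEW pub_search AS
SELECT a.spubid,
       -- collapse the many-side rows into array columns, one per related table
       ARRAY(SELECT p.sname
             FROM people p
             WHERE p.spubid = a.spubid
             ORDER BY p.iorder) AS authors,
       ARRAY(SELECT s.sstatus
             FROM status s
             WHERE s.spubid = a.spubid) AS statuses
FROM articles a;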

Related

Abap subquery Where Cond [duplicate]

I have a requirement to pull records that do not have history in an archive table. Two fields of one record need to be checked for in the archive.
In the technical sense my requirement is a left join where the right side is null (a.k.a. an excluding join), which in ABAP Open SQL is commonly implemented like this (for my scenario anyway):
Select * from xxxx //xxxx is the result of a multiple-table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with FOR ALL ENTRIES, but I would still like to know if it is possible to use the results of a sub-query to check more than 1 field, or to use another form of excluding-join logic on multiple fields in SQL (without involving the application server).
I have tested quite a few variations of sub-queries over the life-cycle of the program I was making. NOT EXISTS with a multiple-field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance is acceptable (processing time is about 5 seconds), although it's noticeably slower than the same query excluding based on 1 field.
Select * from xxxx //xxxx is the result of multiple inner joins and 1 left join (1-* relation)
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = xxxx~foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as xxxx in my example contained a single left join (where the main-to-secondary table relation is 1-*) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors without financial records in 3 tables.
Additional requirement: also exclude based on alternative payers (1-* relationship). This is what the example above is based on.
More requirements: also exclude based on alternative payees (*-* relationship between payer and payee).
The many-to-many join exponentially increased the record count within the construct I labeled xxxx, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving the left-joined tables from the main query into sub-queries and splitting them up gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1 on lfa1~lifnr = lfb1~lifnr "join condition on vendor number assumed; it was missing here
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have a many-to-many join in the main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria involve a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with one good-looking sub-query).
Anyways, any additional insight into this is welcome.
EDIT 2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on in order to expand it. I learned that a proper excluding join works; I just failed horribly at implementing it the first time.
select header~key
from headers left join items on headers~key = items~key
where items~key is null
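Extending that pattern to the original two-field requirement might look like the sketch below. Table names are the placeholders from above, and whether IS NULL is accepted in the WHERE clause depends on your release, so treat this as a sketch, not tested code:

" one left join per field to exclude; keep rows where both lookups miss
select headers~key
  from headers
  left outer join archive_table as a1 on a1~key = headers~key
  left outer join archive_table as a2 on a2~key = headers~foreign_key
  where a1~key is null
    and a2~key is null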
"if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields"
No, it is not possible to check two columns in a subquery, as SAP Help clearly says:
"The clauses in the subquery subquery_clauses must constitute a scalar subquery."
Scalar is the keyword here, i.e. it should return exactly one column.
Your subquery can have a multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < @wa-seatsmax AND
      seatsmax >= ALL ( SELECT seatsocc
                        FROM sflight
                        WHERE carrid = @wa-carrid AND
                              connid = @wa-connid )
However, you say that these two fields should be checked against different tables:
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic; you cannot just use some sort of NOT IN FOR ALL ENTRIES. It won't be that easy.

Can I leverage BigQuery (BQ) partition via a join?

I am a Tableau designer, and we are building some views that get filtered by category a lot. Because of this, we tried to create a category_id to serve as a partition. The problem seems to be that if I filter by category only, the partition doesn't get used, so the full table size in GB (and the corresponding cost) gets hit.
Our team is trying to see if this could be minimized by using a nested query as follows:
SELECT *
FROM table a
INNER JOIN (
SELECT DISTINCT category_id, category
FROM table
) b
ON a.category_id = b.category_id
WHERE b.category = 'Category A'
The idea is that we could show the user b.category; they select it in Tableau, and the inner join would then kick off the partition and limit the bytes returned. When I try this in the BQ interface, however, the estimated bytes processed come back the same.
You'll need to filter on the partitioned field before you make the inner join.
I haven't used Tableau before so I don't know if this is possible, but just an idea: you could create a parameter which is set by the chosen category in Tableau, and reference it in the WHERE clause of the subquery against the partitioned table:
SELECT *
FROM table a
INNER JOIN (
    SELECT DISTINCT category_id, category
    FROM table
    WHERE category = @chosen_category
) b
ON a.category_id = b.category_id;
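Worth adding: BigQuery only prunes partitions when the filter on the partitioning column is a literal or a query parameter it can see at planning time; a value that arrives through a join result does not qualify. A minimal sketch, assuming category_id is the partitioning column and the id can be resolved up front:

-- a literal or parameter on the partitioning column lets BQ prune;
-- a value that only emerges from a join does not
SELECT *
FROM table
WHERE category_id = @chosen_category_id;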
When you say that when you filter only by category the partition isn't used: have you actually tested querying the table from the console to check whether the partition is being used or not? If it isn't, then you need to look at the partition; but if it is, then you would need to take another look at your Tableau query.
VizQL (Viz Query Language) is Tableau's SQL parser, which converts your Tableau viz into SQL for execution. So whilst you cannot really modify the outgoing SQL, you can at least capture and test it, which enables you to identify poorly performing calculations and/or vizzes, as well as optimise the backend for the queries that Tableau will send.
I've written an article about this here: https://datawonders.atlassian.net/wiki/spaces/TABLEAU/pages/1290600449/Let+s+Talk+Errors+Tuning+6+minute+read
The thing about Tableau is that it treats the source as a derived table, with all filters being placed at the upper level of the query immediately before the stream, so your query:
Select *
From table a
Join (
Select Distinct Category_ID, Category
From table
)b On a.category_id = b.category_id
Where b.category = 'Category A'
Will actually look like this (assuming you just select everything):
Select a1.*
From (
Select *
From table a
Join (
Select Distinct Category_ID, Category
From table
)b On a.category_id = b.category_id
)a1
Where a1.category = <your selected category>
So you can see from here that, being two levels deep, your Category table just won't be hit; instead everything shall be read into the spool, with the join taking place in tempdb, and only the complete set is filtered immediately before streaming to Tableau.
Bad, underperforming sql it most certainly is.
And this is where the new relationship model of v2020.2 comes into play, as it has been designed to treat each table as a separate exclusive entity: joins are only made at execution time, so you could build a view that uses data from table a while using table b to provide the filtering.
As an alternative, my preferred overall method is to switch entirely to Custom SQL with parameters. This enables you to craft and test your own high-performance, low-loading query. And because parameters are parsed before the query is executed, you can place the filtering deep down in the query without the need for a secondary look-up table or a filtered derived statement. A SELECT DISTINCT as you are currently using it is still going to produce a large plan: unless the category column is indexed, the engine shall still need to read every record from the table.
So using parameters, your new query will look something like:
Select a1.*
From (
Select *
From table a
Join lookup_table b On a.category_id = b.category_id
And b.category = <parameters.pCategory>
)a1
(I've placed the filter condition directly onto the join as this can improve performance in some circumstances, though here it shouldn't actually make much difference.)
And when used in conjunction with the Set Parameter action, you can now use parameters as in/out updateable variables which update as the user interacts directly with the viz, instead of the user needing to update them manually as they go. If you haven't used these before, I wrote an article about it here: https://community.tableau.com/s/news/a0A4T00000313S0UAI/psst-have-you-had-a-go-with-variables-in-tableau-yet
Steve

SQL query searching two tables, performance issue

I have two tables (I'll list only the fields that I want to search).
MySQL version: 5.0.96-community
Table 1: Clients (860 rows)
ID_CLIENT (varchar 10)
ClientName (text)
Table 2: Details (22380 rows)
ID_CLIENT (varchar 10)
Details (varchar 1000)
The Details table can have multiple rows from the same client.
I need to search both tables and retrieve the IDs of clients that match a search value.
If I have a search value "blue", it has to match ClientName (e.g. the Blueberries Corp) or the Details in the second table (e.g. "it has a blue logo").
The result should be a list of client IDs that match the criteria.
If I make a query for one table, it takes a decent time
#0.002 sec
select a.`ID_CLIENT` from clienti a
where
a.`ClientName` LIKE '%blue%'
#0.1 sec
SELECT b.`ID_CLIENT` FROM Details b
WHERE b.`Details` LIKE '%blue%'
GROUP BY b.`ID_CLIENT`
But if I try to join those two queries it takes ages.
My question(s):
What's the best way of doing what I need here, to get a list of IDs based on the search results from both tables?
What can I change to improve search performance on the Details table? I know that %...% is not fast, but I need partial matches too.
Edit (based on the answers)
#~0.2 sec
(SELECT a.`ID_CLIENT` FROM `clienti` a where a.`ClientName` like '%blue%')
union
(SELECT distinct b.`ID_CLIENT` FROM `Details` b where b.`Details` like '%blue%')
It returns a list of IDs from both tables, filtered by the search value.
Edit 2: final query
And with that list of IDs I can filter the clients table, to get only the clients that are in both tables based on their ID:
select cl.`ID_CLIENT`, `ClientName`, `OtherField`
from clienti cl
join
((SELECT a.`ID_CLIENT` FROM `clienti` a where a.`ClientName` like '%blue%')
union
(SELECT distinct b.`ID_CLIENT` FROM `Details` b where b.`Details` like '%blue%')) rez
on cl.`ID_CLIENT` = rez.`ID_CLIENT`
If your two queries work, just use union:
select a.`ID_CLIENT`
from clienti a
where a.`ClientName` LIKE '%blue%'
union
SELECT b.`ID_CLIENT`
FROM Details b
WHERE b.`Details` LIKE '%blue%';
The union will remove all duplicates, so you don't need a separate group by query.
Why are the two search strings different for the two tables? The question suggests searching for blue in both of them.
If the individual queries don't perform well, you might need to switch to a full text index.
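For reference, a sketch of the full-text route on MySQL (note that on MySQL 5.0 a FULLTEXT index requires a MyISAM table, and it matches whole words, so it won't reproduce arbitrary '%blue%' substring matches):

ALTER TABLE Details ADD FULLTEXT INDEX ft_details (`Details`);

SELECT DISTINCT ID_CLIENT
FROM Details
WHERE MATCH (`Details`) AGAINST ('blue');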
If both queries are functioning as you wish, simply union the results together. This is much faster than OR on a query that cannot effectively use indexes, and it also allows you to remove duplicates in the same statement.
Divide et impera: create two subqueries.
In the first subquery, left join Clients with Details on the client ID, filtering on rows where ClientName is like '%xxxxx%'.
Then make another subquery, left joining Details with Clients (but keeping the same order of fields in the output projection), filtering on the Details text field. Then create a union query from the two subqueries, and finally select distinct * from that union.
Final schema:
Select distinct * from (subquery1 union subquery2)
This seems to be a really slow and silly "manual query plan optimization", but just give it a try and let us know if it works!

How to avoid Cartesian product in an INNER JOIN query?

I have 6 tables, let's call them a, b, c, d, e, f. Now I want to search all the columns (except the ID columns) of all tables for a certain word, let's say 'Joe'. What I did was make INNER JOINs over all the tables and then use LIKE to search the columns.
INNER JOIN ... ON ...
INNER JOIN ... ON ... etc.
WHERE a.firstname ~* 'Joe'
OR a.lastname ~* 'Joe'
OR b.favorite_food ~* 'Joe'
OR c.job ~* 'Joe'
... etc.
The results are correct; I get all the columns I was looking for. But I also get some kind of Cartesian product: 2 or more rows with almost the same results.
How can I avoid this? I want to have each row only once, since the results should appear in a web search.
UPDATE
I first tried to figure out whether the SELECT DISTINCT approach would work by using this statement: pastie.org/970959. But it still gives me a Cartesian product. What's wrong with it?
try SELECT DISTINCT?
On what conditions do you JOIN these tables? Do you have foreign keys or something?
Maybe you should look for that word in each table separately?
What kind of server are you using? Microsoft SQL Server has a full-text index feature (I think others have something like this too) which lets you search for keywords in a much less resource-intensive way.
Also consider using UNION instead of joining the tables.
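For example, a rough sketch of the UNION idea, assuming each table carries the same entity id (the ~* regex operator is the one from the question, i.e. PostgreSQL):

-- one search per table, then merge; UNION also removes duplicate ids
SELECT id FROM a WHERE firstname ~* 'Joe' OR lastname ~* 'Joe'
UNION
SELECT id FROM b WHERE favorite_food ~* 'Joe'
UNION
SELECT id FROM c WHERE job ~* 'Joe';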
Without seeing your tables, I can only assume that what's going on here is you have a one-to-many relationship somewhere. You probably want to do everything in a subquery, select out the distinct IDs, then get the data you want to display by ID. Something like:
SELECT a.*, b.*
FROM (SELECT DISTINCT a.ID
FROM ...
INNER JOIN ...
INNER JOIN ...
WHERE ...) x
INNER JOIN a ON x.ID = a.ID
INNER JOIN b ON x.ID = b.ID
A couple of things to note, however:
This is going to be sloooow and you probably want to use full-text search instead (if your RDBMS supports it).
It may be faster to search each table separately rather than to join everything in a Cartesian product first and then filter with ORs.
If your tables are entity-type tables, for example a being persons and b being companies, I don't think you can avoid a Cartesian product if you search for the results in this way (a single query).
You say you want to search all the tables for a certain word, but you probably want to separate the results into the corresponding entity types, right? Otherwise a web search would not make much sense.
So if you search for 'Joe', you want to see persons containing the name 'Joe' and, for example, the company named 'Joe's gym'. Since you are searching for different entities, you should split the search into different queries.
If you really want to do this in one query, you will have to change your database structure to accommodate it. You will need some form of 'search table' containing an entity ID (PK), an entity type, and a list of keywords you want that entity to be found with. For example:
EntityType, EntityID, Keywords
------------------------------
Person, 4, 'Joe', 'Doe'
Company, 12, 'Joe''s Gym', 'Gym'
Something like that?
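Searching would then be one pass over a single table, something like this (purely illustrative, using the names from the example above):

-- one scan of the search table instead of a multi-table join
SELECT EntityType, EntityID
FROM SearchTable
WHERE Keywords LIKE '%Joe%';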
However, it's different when your search returns only one type of entity, say a Person, and you want to return the Persons for which you get a hit on that keyword (in any table related to that Person). Then you will need to select all the fields you want to show and group by them, leaving out the fields in which you are searching; including them inevitably leads to a Cartesian product.
I'm just brainstorming here, by the way. I hope it's helpful.

Best way to check that a list of items exists in an SQL database column?

If I have a list of items, say
apples
pears
pomegranates
and I want to identify any that don't exist in the 'fruit' column in an SQL DB table.
Fast performance is the main concern.
Needs to be portable across different SQL implementations.
The input list could contain an arbitrary number of entries.
I can think of a few ways to do it, thought I'd throw it out there and see what you folks think.
Since the list of fruits you are selecting from can be arbitrarily long, I would suggest the following:
create table FruitList (FruitName char(30))
insert into FruitList values ('apples'), ('pears'), ('oranges')
select * from FruitList left outer join AllFruits on AllFruits.fruit = FruitList.FruitName
where AllFruits.fruit is null
A left outer join should be much faster than "not in" or other kinds of queries.
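One portability caveat with the sketch above: char(30) pads values with trailing spaces, and whether those count in equality comparisons differs between engines, so varchar is the safer choice if this has to run everywhere:

-- varchar avoids the trailing-space padding of char(30), which some
-- engines treat as significant in equality comparisons
create table FruitList (FruitName varchar(30))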
Make the search list into a string that looks like '|fruit1|fruit2|...fruitn|' and make your WHERE clause:
where @FruitListString not like '%|' + fruit + '|%'
Or, parse the aforementioned string into a temp table or table variable and do WHERE NOT IN (SELECT fruit FROM temptable). Depending on the number of items you're searching for and the number of items being searched, this method could be faster.
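A rough T-SQL sketch of that variant (#SearchList and AllFruits are illustrative names, not anything from the question):

-- load the search list into a temp table, then anti-join with NOT IN
CREATE TABLE #SearchList (FruitName varchar(30));
INSERT INTO #SearchList VALUES ('apples'), ('pears'), ('pomegranates');

-- list items missing from the fruit column
-- (beware NOT IN if AllFruits.fruit can be NULL)
SELECT FruitName
FROM #SearchList
WHERE FruitName NOT IN (SELECT fruit FROM AllFruits);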
if exists(select top 1 name from fruit where name in ('apples', 'pears', 'pomegranates'))
    PRINT 'one exists'