How to retrieve identical entries from two different queries which are based on the same table - sql

I think I got a knot in my line of thought, surely you can untie it.
Basically I have two working queries which are based on the same table and return an identical structure (the same as the source table). They are simply two different kinds of row filters. Now I would like to "stack" these filters, meaning that I want to retrieve all the entries that appear in both query A and query B.
Why do I want that?
Our club is structured in several local groups and I need to hand different kinds of lists (e.g. members with email-entry) to these groups. In this example I would have a query "groupA" and a query "newsletter". Another could be "groupB" and "activemember", but also "groupB" and "newsletter". Unfortunately each query is based on a set of conditions, which imho would be stored best in a single query instead of copying the conditions several times to different queries (in case something changes).
Judging from the Venn diagrams, I suppose I need to use INNER JOIN, but I could not get it to work, neither with the LibreOffice Base query assistant nor with hand-written SQL. I tried this:
SELECT groupA.*
FROM groupA
INNER JOIN newsletter
ON groupA.memberID = newsletter.memberID
The error code says: Cannot be in ORDER BY clause in statement
I suppose that the problem comes from the fact that both queries are based on the same table.
Maybe there is an even easier way of nesting queries?
I am hoping for something like
SELECT * FROM groupA
WHERE groupA.memberID = newsletter.memberID
Thank you and sorry if this already has a duplicate, I just could not find the right search terms.
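One possible way to stack the two filters, sketched here with Python's built-in sqlite3 module rather than LibreOffice Base. The `members` table, its columns other than `memberID`, and the two view definitions are assumptions made up for illustration; only the names `groupA`, `newsletter` and `memberID` come from the question. Each filter is defined once as a view and then combined in three equivalent ways:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table; only memberID is taken from the question.
    CREATE TABLE members (memberID INTEGER PRIMARY KEY, name TEXT, grp TEXT, email TEXT);
    INSERT INTO members VALUES
        (1, 'Ann',  'A', 'ann@example.org'),
        (2, 'Bob',  'A', NULL),
        (3, 'Cara', 'B', 'cara@example.org');

    -- Each set of conditions lives in exactly one place (assumed definitions).
    CREATE VIEW groupA     AS SELECT * FROM members WHERE grp = 'A';
    CREATE VIEW newsletter AS SELECT * FROM members WHERE email IS NOT NULL;
""")

# Variant 1: join the two saved queries on the key column.
join_rows = conn.execute("""
    SELECT groupA.*
    FROM groupA
    INNER JOIN newsletter ON groupA.memberID = newsletter.memberID
""").fetchall()

# Variant 2: use one saved query as a filter on the other.
in_rows = conn.execute("""
    SELECT * FROM groupA
    WHERE memberID IN (SELECT memberID FROM newsletter)
""").fetchall()

# Variant 3: set intersection; works because both queries return the same columns.
intersect_rows = conn.execute("""
    SELECT * FROM groupA
    INTERSECT
    SELECT * FROM newsletter
""").fetchall()

print(join_rows)
```

Whether LibreOffice Base accepts all three forms depends on the embedded database engine, so the `IN` variant is probably the safest one to try first.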

Related

SQL SELECT JOIN Query over multiple tables

I’m trying to get data about Installations in buildings. The problem is that one building can have multiple installations and I’m unsure how to adjust my sql for that as the initial table I query only holds the relations that own the buildings.
Here’s the situation.
Table 1 (RELRLGRP) holds the id of the group of relations. Those relations own the buildings, the buildings have the installations, and the installations have the data I need.
This is what I have so far, I’m worried I shouldn’t use this many joins in an SQL statement but cannot find a quicker link between the information I need from my starting point at the group of relations till the installation data I seek in the BORGINST table. Please disregard the select portion of the statement (removed it for clarity).
SELECT *
FROM RELRLGRP A
JOIN RELATION R ON A.RELATION_GC_ID = R.GC_ID
JOIN BUILDING G ON R.CODE = G.GC_CODE
JOIN INSTALL I ON G.GC_CODE = I.GC_CODE
JOIN BORGINST B ON I.GC_ID = B.GC_ID
WHERE A.RELGROUP_GC_ID LIKE '100109' -- the group the relations belong to
I’ve done some rudimentary SQL but this linking through tables is new territory for me, in that sense I’d be happy to know if this many join statements are the way to go or if I should head a different route entirely.
JesseJ - Since I don't know all the columns that exist in your tables, I am going to assume that you are joining on primary keys. If this is the case, your solution may be the only one available to link the RELRLGRP to the BORGINST table.
Linking multiple tables like you are doing can be common in a normalized database.
Example:
In the example I posted, in order to find the State where a particular transaction happened, you have to link all the tables together. There is no shortcut.
Don't sweat it: I have views with three times as many joins. Every join does add complexity and suck more processor, but it really comes down to performance: if this process doesn't finish as quickly as you need it to, you can look into other methods, but otherwise multiple joins like this are perfectly fine.
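To see how the join chain handles a building with several installations, here is a small sketch using Python's sqlite3 module. The table names and join columns are the ones from the question; the sample rows and the `INFO` column on BORGINST are invented for illustration, and `=` is used instead of `LIKE` because no wildcard is involved. A building with two installations simply yields two result rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RELRLGRP (RELATION_GC_ID INTEGER, RELGROUP_GC_ID TEXT);
    CREATE TABLE RELATION (GC_ID INTEGER, CODE TEXT);
    CREATE TABLE BUILDING (GC_CODE TEXT);
    CREATE TABLE INSTALL  (GC_ID INTEGER, GC_CODE TEXT);
    CREATE TABLE BORGINST (GC_ID INTEGER, INFO TEXT);  -- INFO is a made-up column

    -- Invented sample data: relation 1 is in group 100109 and owns building B-01,
    -- which has two installations (10 and 11).
    INSERT INTO RELRLGRP VALUES (1, '100109'), (2, '999999');
    INSERT INTO RELATION VALUES (1, 'B-01'), (2, 'B-02');
    INSERT INTO BUILDING VALUES ('B-01'), ('B-02');
    INSERT INTO INSTALL  VALUES (10, 'B-01'), (11, 'B-01'), (12, 'B-02');
    INSERT INTO BORGINST VALUES (10, 'boiler'), (11, 'ventilation'), (12, 'lift');
""")

rows = conn.execute("""
    SELECT R.GC_ID, G.GC_CODE, I.GC_ID, B.INFO
    FROM RELRLGRP A
    JOIN RELATION R ON A.RELATION_GC_ID = R.GC_ID
    JOIN BUILDING G ON R.CODE = G.GC_CODE
    JOIN INSTALL  I ON G.GC_CODE = I.GC_CODE
    JOIN BORGINST B ON I.GC_ID = B.GC_ID
    WHERE A.RELGROUP_GC_ID = '100109'
    ORDER BY I.GC_ID
""").fetchall()

for row in rows:
    print(row)
```

Each installation gets its own row, with the relation and building values repeated, so no special handling is needed for the one-to-many step.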

How to check if a SQL SELECT query is a subset of other query

I am trying to find a way to determine whether or not an SQL SELECT query A is bound to return a subset of the results returned by another query B. Furthermore, this needs to be accomplished from the queries alone, without having access to the respective result sets.
For example, the query SELECT * from employee WHERE salary >= 1000 will return a subset of the results of query SELECT * from employee. I need to find an automated way to perform this validation for any two queries A and B, without accessing the database that stores the data.
If it is infeasible to achieve this without the aid of an RDBMS, we can assume that I have access to a local but empty RDBMS, with the data stored somewhere else. In addition, this check must be done in code, either using an algorithm or a library. The language I am using is Java, but other languages will also do.
Many thanks in advance.
I don't know how deep you want to get into parsing queries, but basically you can say that there are two general ways of making a subset of a query (given that source table and projection(select) staying the same):
using where clause to add condition to row values
using having clause to add conditions to aggregated values
So you can say that if you have two objects that represent queries and say they look something close to this:
{
'select': { ... },
'from': {},
'where': {},
'orderby': {}
}
and select, from and orderby are the same in both, but one has an extra condition in the where clause, then you have a subset.
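A minimal sketch of that rule in Python (the question allows other languages). It assumes the queries have already been parsed into dictionaries shaped like the one above, and that `where` is a list of conditions joined by AND; both the representation and the function names are assumptions. The check is deliberately conservative: it only compares condition text, so logically equivalent conditions written differently are not recognized.

```python
def normalize(conditions):
    """Collapse whitespace so that 'salary  >= 1000' and 'salary >= 1000' match."""
    return {" ".join(c.split()) for c in conditions}


def is_obvious_subset(a, b):
    """Return True if query `a` provably returns a subset of query `b`.

    Rule: select, from and orderby are identical, and the AND-ed where
    conditions of `a` include every condition of `b`.
    False means "not proven", not "definitely not a subset".
    """
    same_shape = all(a[key] == b[key] for key in ("select", "from", "orderby"))
    return same_shape and normalize(b["where"]) <= normalize(a["where"])


# The two example queries from the question, in the assumed parsed form.
everything = {"select": ["*"], "from": ["employee"], "where": [], "orderby": []}
filtered = {"select": ["*"], "from": ["employee"],
            "where": ["salary >= 1000"], "orderby": []}

print(is_obvious_subset(filtered, everything))
print(is_obvious_subset(everything, filtered))
```

Conditions combined with OR, subqueries, joins and HAVING clauses would all need extra handling on top of this.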
One way you might be able to determine if a query is a subset of another is by examining their source tables. If you don't have access to the data itself, this can be tricky. This question references using Snowflake joins to generate database diagrams based on a query without having access to the data itself:
Generate table relationship diagram from existing schema (SQL Server)
If your query is 800 characters or less, the tool is free to use: https://snowflakejoins.com/index.html
I tested it out using the AdventureWorks database and these two queries:
SELECT * FROM HumanResources.Employee
SELECT * FROM HumanResources.Employee WHERE EmployeeID < 200
When I plugged both of them into the Snowflake Joins text editor, this is what was generated:
SnowflakeJoins DB Diagram example
Hope that helps.

Efficient way to query similarly-named tables with identical column names

I'm building a report in SSRS which takes data from several tables with similar names. There are three different 'sets' of tables - i.e., 123xxx, 456xxx, and 789xxx. Within these groups, the only difference in the table names is a three-digit code for a worksite, so, for example, we might have a table called 123001, 123010, and 123011. Within each set of tables, the columns have the same names.
The problem is that there are about 15 different sites, and I'm taking several columns from each site and each group of tables. Is there a more efficient way to write that query than to write out the name of every single column?
I don't believe there is, but I feel that using aliases on your tables would make the query much easier to understand and follow.
Also, if you aren't comparing values on the tables at all, then maybe a union between each table select would help make sense too.
I would give each table an alias.
SELECT s1t1.name
FROM Site1Table1 as s1t1;
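If the union route fits, the repetitive query text could be generated rather than typed by hand. The sketch below builds one `UNION ALL` statement from a list of site codes and tries it against an in-memory SQLite database. The column names `name` and `hours`, the sample rows, and the helper `build_union` are all hypothetical; the table names follow the 123001 / 123010 / 123011 pattern from the question. Identifiers that start with a digit are quoted with double quotes here, which SQL Server may require in a different form depending on its settings.

```python
import sqlite3

SITE_CODES = ["001", "010", "011"]   # worksite codes from the question's example
COLUMNS = ["name", "hours"]          # hypothetical column names


def build_union(prefix, site_codes, columns):
    """Build one SELECT per site table and glue them together with UNION ALL."""
    col_list = ", ".join(columns)
    selects = [
        f"SELECT '{code}' AS site, {col_list} FROM \"{prefix}{code}\""
        for code in site_codes
    ]
    return "\nUNION ALL\n".join(selects)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "123001" (name TEXT, hours INTEGER);
    CREATE TABLE "123010" (name TEXT, hours INTEGER);
    CREATE TABLE "123011" (name TEXT, hours INTEGER);
    INSERT INTO "123001" VALUES ('Ann', 8);
    INSERT INTO "123010" VALUES ('Bob', 6);
    INSERT INTO "123011" VALUES ('Cara', 7);
""")

sql = build_union("123", SITE_CODES, COLUMNS)
rows = sorted(conn.execute(sql).fetchall())
print(sql)
print(rows)
```

The generated text can be pasted into the report's dataset query, so the column list is written once instead of once per site.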

UNION in a subquery throwing the numbers

I'm working on a project for a landing page. Basically, there are multiple criteria that the user can select that will run a query on a DB2 database and return the results. The queries are broken down into various pieces that are assembled depending on user criteria and parameters inserted. While I'm having some difficulty with some that return giant datasets pulled from even larger tables and joins, there's one that stands out as an oddball when I run some performance numbers on the database.
One thing that all of these fully-assembled queries have in common is that they are filtered on a list of user ids. There are half a dozen or so of these queries that return datasets of varying sizes. Most of them are pretty straightforward, i.e.:
TABLE.COLUMN IN (subquery with a few joins that returns a column of user ids)
These subqueries take diddly for time to run by themselves. However, one of these requires a union. Essentially, one table contains a key that has to be used to gather user ids from two different tables, so two sets of user ids must be unioned to get a single list for the subquery, ie:
TABLE.COLUMN IN (subquery UNION subquery)
It's my guess that the DB2 optimizer runs into a lot more limitations when going over a subquery with a union than one with a simple series of joins and can't handle it as well. This particular subquery is middle-of-the-road when it comes to the amount of data it collects, so it's not an issue with a giant dataset.
I'm wondering what alternatives I might have to a union that would at least bring this subquery in line with the others. It's a bit maddening that making changes may help this particular case, but show a detriment to the others, or vice versa. I've tinkered with a few things, but with no luck. The explain plan shows that the proper indexes are being utilized, at least. I know that I don't have much in the way of examples, but these queries are pretty massive overall and it would be difficult to post the necessary data concisely, but let me know if it's necessary and I'll try to knock something together. Thanks.
You could try these two alternatives to a union:
WHERE TABLE.COLUMN IN (subquery1)
OR TABLE.COLUMN IN (subquery2)
Or using filtering joins:
SELECT *
FROM TABLE T
LEFT JOIN
(
subquery1
) f1
ON f1.COLUMN = T.COLUMN
LEFT JOIN
(
subquery2
) f2
ON f2.COLUMN = T.COLUMN
WHERE f1.COLUMN IS NOT NULL
OR f2.COLUMN IS NOT NULL
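To check that the alternatives return the same rows as the union, here is a small comparison run on SQLite through Python's sqlite3 module, not on DB2, so it says nothing about which form DB2 optimizes best. The tables `orders`, `list_a` and `list_b` are made-up stand-ins for the real table and the two subqueries. One caveat with the join form: if a subquery can return the same id more than once, the join multiplies rows, which is why the sketch adds `DISTINCT` there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables standing in for TABLE and the two id sources.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE list_a (user_id INTEGER);
    CREATE TABLE list_b (user_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO list_a VALUES (10), (20);
    INSERT INTO list_b VALUES (20), (30);
""")

# Original shape: one IN over a UNION of both id lists.
with_union = conn.execute("""
    SELECT id FROM orders
    WHERE user_id IN (SELECT user_id FROM list_a
                      UNION
                      SELECT user_id FROM list_b)
    ORDER BY id
""").fetchall()

# Alternative 1: two IN predicates combined with OR.
with_or = conn.execute("""
    SELECT id FROM orders
    WHERE user_id IN (SELECT user_id FROM list_a)
       OR user_id IN (SELECT user_id FROM list_b)
    ORDER BY id
""").fetchall()

# Alternative 2: filtering joins; DISTINCT guards against duplicated rows.
with_joins = conn.execute("""
    SELECT T.id
    FROM orders T
    LEFT JOIN (SELECT DISTINCT user_id FROM list_a) f1 ON f1.user_id = T.user_id
    LEFT JOIN (SELECT DISTINCT user_id FROM list_b) f2 ON f2.user_id = T.user_id
    WHERE f1.user_id IS NOT NULL
       OR f2.user_id IS NOT NULL
    ORDER BY T.id
""").fetchall()

print(with_union, with_or, with_joins)
```

All three return the same ids on this sample, so the choice between them comes down to what the explain plan shows on the real data.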

How (and where) should I combine one-to-many relationships?

I have a user table, and then a number of dependent tables with a one to many relationship
e.g. an email table, an address table and a groups table. (i.e. one user can have multiple email addresses, physical addresses and can be a member of many groups)
Is it better to:
Join all these tables, and process the heap of data in code,
Use something like GROUP_CONCAT and return one row, and split apart the fields in code,
Or query each table independently?
Thanks.
It really depends on how much data you have in the related tables and on how many users you're querying at a time.
Option 1 tends to be messy to deal with in code.
Option 2 tends to be messy to deal with as well in addition to the fact that grouping tends to be slow especially on large datasets.
Option 3 is easiest to deal with but generates more queries overall. If your data set is small and you're not planning to scale much beyond your current needs, it's probably the best option. It's definitely the best option if you're only trying to display one record.
There is, however, a fourth option: a middle-of-the-road approach that I use in my job, where we deal with a very similar situation. Instead of getting the related records for each row one at a time, use IN() to get all of the related records for your result set. Then loop in your code to match them to the appropriate record for display. If you cache search queries, you can cache that second query as well. It's only two queries and only one loop in the code (no parsing; use hashes to relate things by their key).
Personally, assuming my table indexes were up to scratch, I'd go with a table join, get all the data out in one go, and then process that to end up with a nested data structure. This way you're playing to each system's strengths.
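The two-query approach could look roughly like this, sketched with Python's sqlite3 module. The `users` and `emails` tables, their columns, and the sample rows are assumptions for illustration. The first query fetches the users to display, the second fetches every related email in one `IN (...)` call, and a dictionary keyed by user id does the matching:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one user has many email addresses.
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emails (user_id INTEGER, address TEXT);
    INSERT INTO users  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cara');
    INSERT INTO emails VALUES (1, 'ann@example.org'),
                              (1, 'ann@work.example'),
                              (3, 'cara@example.org');
""")

# Query 1: the result set to display (here: an arbitrary filter on users).
users = conn.execute(
    "SELECT id, name FROM users WHERE id <= 2 ORDER BY id"
).fetchall()

# Query 2: all related rows for those users at once, with bound parameters.
ids = [user_id for user_id, _ in users]
email_rows = []
if ids:  # an empty IN () list is not valid in every database
    placeholders = ", ".join("?" for _ in ids)
    email_rows = conn.execute(
        f"SELECT user_id, address FROM emails "
        f"WHERE user_id IN ({placeholders}) ORDER BY address",
        ids,
    ).fetchall()

# One loop: hash the related rows by their key.
emails_by_user = defaultdict(list)
for user_id, address in email_rows:
    emails_by_user[user_id].append(address)

result = [
    {"id": user_id, "name": name, "emails": emails_by_user.get(user_id, [])}
    for user_id, name in users
]
print(result)
```

The same pattern repeats for addresses and groups: one extra query per dependent table, regardless of how many users are on the page.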
Generally speaking, do the most efficient query for the situation you're in. So don't create a mega query that you use in all cases. Create case specific queries that return just the information you need.
In terms of processing the results, if you use GROUP_CONCAT you have to split all the resulting values during processing. If there are extra delimiter characters in your GROUP_CONCAT'd values, this can be problematic. My preferred method is to put the GROUPed BY field into a $holder during the output loop. Compare that field to the $holder each time through and change your output accordingly.
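The holder technique, sketched in Python instead of the PHP-style `$holder` mentioned above. The rows are hard-coded stand-ins for a joined result set that is already sorted by user id; the column layout is an assumption. A header line is emitted only when the grouping field changes:

```python
# Stand-in for the rows of a user/email join, ordered by user id.
rows = [
    (1, "Ann", "ann@example.org"),
    (1, "Ann", "ann@work.example"),
    (2, "Bob", "bob@example.org"),
]

lines = []
holder = None  # remembers the grouping value of the previous row
for user_id, name, address in rows:
    if user_id != holder:
        # The grouping field changed: start a new block for this user.
        lines.append(name)
        holder = user_id
    lines.append("  " + address)

print("\n".join(lines))
```

This only works if the rows arrive sorted by the grouping field, so the query needs a matching ORDER BY.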