SQL - how to make a query account for all potential IDs - sql

There are 10 possible construction sites that I need to account for in my report. However, as my code is now, when a construction site is not in my database, it isn't accounted for at all, which makes sense, but I would prefer it list all of the possible construction sites and put 0 as the value instead of returning nothing. The reason I need to do this is because I am creating reports based off these queries, and it's hard to line everything up unless I consistently have all of the construction sites accounted for every time. Here is the SQL:
TRANSFORM Count(Main.ID) AS CountOfID
SELECT 'Total IDs' AS [Construction site >>>]
FROM Research INNER JOIN Main ON Research.Primary_ID = Main.ID
GROUP BY 'Total IDs'
PIVOT Research.Construction_site;
By the way, I am using MS Access 2007 is that makes a difference.
Thanks

If I'm reading your question correctly, you want all fields from the Research table, regardless of whether they are in the Main table. In which case, you just need a LEFT OUTER JOIN:
TRANSFORM Count(Main.ID) AS CountOfID
SELECT 'Total IDs' AS [Construction site >>>]
FROM Research LEFT OUTER JOIN Main ON Research.Primary_ID = Main.ID
GROUP BY 'Total IDs'
PIVOT Research.Construction_site;
This will return all rows from the Research table at least once - and multiple times if they exist more than once in the Main table.

Most probably you need to replace INNER JOIN with LEFT JOIN. (I.e. simply change 'INNER' to 'LEFT'.) That way the construction sites not represented in Main won't be filtered out.

Related

PostgreSQL: SQL JOIN with "Lookahead" Condition in ON or WHERE Clause

I am not sure how best to describe the problem I have, but it feels very much like an SQL query need for a lookahead condition such as those in a regular expression :). Pardon my verbosity as I try to find a way to express this problem.
I have a relatively complex schema of medical services, patient info, patient payment types (including insurances, workers comp info, etc), and medical sales commission earnings. Naturally there are sales reporting tools. What follows is a very oversimplified version of a query involving varying JOIN conditional cases. Most of this query is just context for the clause my question focuses on, the 2nd-to-last WHERE condition (which defines JOIN conditions):
SELECT vendor_services.*, patients.*, payments.*, vendor_products.*, products.*, vendor_product_commissions.*, vendor_commissions.*, commission_earners.*, users.*
FROM vendor_services
JOIN patients ON patients.vendor_id = vendor_services.vendor_id
JOIN payments ON payments.patient_id = patiends.id
JOIN payment_types ON payment_types.id = payments.payment_type_id
JOIN vendor_products ON vendor_products.id = vendor_services.vendor_product_id
JOIN products ON products.id = vendor_products.product_id
JOIN vendor_product_commissions ON vendor_product_commissions.vendor_product_id = vendor_products.id
JOIN vendor_commissions ON vendor_commissions.id = vendor_product_commissions.vendor_commissions.id
LEFT JOIN commission_earners ON commission_earners.id = vendor_commissions.commission_earners_id
JOIN users ON commission_earners.user_id = users.id
WHERE
vendor_services.state != 'In Progress'
AND
vendor_services.date BETWEEN :datetime_1 AND :datetime_2
AND
vendor_commissions.start_date > :datetime_1
AND
vendor_commissions.end_date < :datetime_2
AND
vendor_product_commissions.payment_type = payment_types.type
AND
payments.transaction_type = 'Paid'
GROUP BY
....
Again, this is very oversimplified: the SELECT clause is far more complex, as are the GROUP BY and ORDER clauses, performing CASE statements and aggregate calculations, etc. I have left out many other tables which represent other systems within the overall application, and focused just on the data and clauses that are relevant. My question is in regards to a needed change to this particular WHERE condition regarding the following JOIN:
WHERE ... AND vendor_product_commissions.payment_type = payment_types.type
There has been an introduction of a new possible vendor_product_commissions.payment_type value that is not a member of the payment_types.type values. With the SQL query exactly as is, it no longer selects rows in most cases, as much of the LEFT table will be using this new value. When adding an OR clause, then duplicate rows are selected when only one row should be selected:
WHERE ... AND vendor_product_commissions.payment_type = payment_types.type OR vendor_product_commissions.payment_type = 'DEFAULTVALUE'
What I need is to JOIN only on the row where vendor_product_commissions.payment_type = payment_types.type, unless that produces NULL, in which case I need to perform the JOIN on vendor_product_commissions.payment_type = 'DEFAULTVALUE'.
I can do this programatically with the ORM code surrounding this query, but that is very inefficient for a very large reporting system (essentially, query first for the specific type, then if none returned, query again for the "default" type).
I dont believe this feature exists in PostgreSQL, but thats why I am describing it as a "lookahead JOIN" problem - I need to have a sort of CASE statement that if the first JOIN condition produces a NULL relation, then perform the subsequent JOIN (OR) condition to match against this newly introduced value ('DEFAULTVALUE'). Can this be done in raw SQL? Or do I need to break this whole query apart and perform the selection of services and related data, and then programatically / iteratively relate it (via ORM/application language code) to the sales commission data? I have a strong hunch that the query can be modified to do this, but without being knowledgeable of a particular label or term for this problem, I am having a hard time searching for a possible SQL-based solution.
This is for a Ruby on Rails 4 application, using ActiveRecord, though the SQL JOIN statements are all in plaintext / strings since AR doesnt provide methods for LEFT JOIN (again, there are more and more types of JOIN statements than those listed above). I am not sure if Rails is relevant to my question, but I figured I would mention it.

MS Access: LEFT JOIN with Filters

My MS Access version is 2003, in case that matters.
I have a single table with daily values for securities in an account. I'd like to compare the values of all securities in each account, one year ago versus today (and create an expression for the difference). The securities in the account may change over the course of a year, so there must be NULL values when linking by security. Accordingly, I'd like to perform a FULL OUTER JOIN, which I understand is not possible in MS Access. Alternatively, I'll have to create a UNION of a LEFT JOIN and RIGHT JOIN, as suggested in this SO post.
Although the below query behaves like an INNER JOIN, I believe the picture will help illustrate what I'm trying to accomplish:
I understand that creating this query in Design View causes the filters to go into the WHERE clause, which is filtering out data before the LEFT JOIN is performed. I'm attempting to replicate the solution proposed in this SO post, so far unsuccessfully. Following is my current SQL statement:
SELECT dbo_vw_Core_Monitor_Historical.AsOFdate,
dbo_vw_Core_Monitor_Historical.Account,
dbo_vw_Core_Monitor_Historical.SecID,
dbo_vw_Core_Monitor_Historical.YTM,
dbo_vw_Core_Monitor_Historical_1.AsOFdate,
dbo_vw_Core_Monitor_Historical_1.Account,
dbo_vw_Core_Monitor_Historical_1.SecID,
dbo_vw_Core_Monitor_Historical_1.YTM,
[dbo_vw_Core_Monitor_Historical_1.YTM] - [dbo_vw_Core_Monitor_Historical.YTM] AS YTM_Change
FROM dbo_vw_Core_Monitor_Historical
LEFT JOIN
dbo_vw_Core_Monitor_Historical AS dbo_vw_Core_Monitor_Historical_1
ON ((dbo_vw_Core_Monitor_Historical.Account = dbo_vw_Core_Monitor_Historical_1.Account)
AND (dbo_vw_Core_Monitor_Historical.SecID = dbo_vw_Core_Monitor_Historical_1.SecID)
AND ((dbo_vw_Core_Monitor_Historical_1.AsOFdate)=#12/8/2015#))
WHERE ((dbo_vw_Core_Monitor_Historical.AsOFdate)=#12/8/2014#);
I've tried a few different queries, but I believe the above is most correct based on what I've gathered from SO. This causes MS Access to immediately crash. I'm expecting output something like the below (where the highlights are for SecID's no longer in the account as of 12/8/2015:
Any advice? Is this just a symptom of using MS Access, rather than some more robust database?
You must make a difference between records that are there but with NULL fields and records that are missing on the right (outer) side. If you are dealing with missing records, your query looks correct. I have no idea why Access crashes. You could change the query into a pass-through query. This means that the query will be executed by SQL-Server. Of course it must be written in T-SQL then:
SELECT
A.AsOFdate,
A.Account,
A.SecID,
A.YTM,
B.AsOFdate,
B.Account,
B.SecID,
B.YTM,
B.YTM - A.YTM AS YTM_Change
FROM
dbo.vw_Core_Monitor_Historical A
LEFT JOIN dbo.vw_Core_Monitor_Historical AS B
ON A.Account = B.Account AND
A.SecID = B.SecID AND
B.AsOFdate = '2015/12/08'
WHERE
A.AsOFdate = '2014/12/08';

Multi Table NOT EQUAL in Access Query

I have two tables. VEHICLES and OWNERSHIP. I am trying to make a query that will give me a list of all VEHICLES NOT in the OWNERSHIP table. I basically need a report of my available VEHICLE inventory. I tried this query:
SELECT VEHICLE.*
FROM VEHICLE, OWNERSHIP
WHERE (VEHICLE.VEH_ID <> OWNERSHIP.VEH_ID);
Im getting:
When I do an equal I get all vehicles which are listed in the ownership so that works. But the NOT Equal does not. Any ideas?
Try
SELECT VEHICLE.*
FROM VEHICLE
WHERE NOT EXISTS
(SELECT NULL FROM OWNERSHIP WHERE VEHICLE.VEH_ID= OWNERSHIP.VEH_ID);
The NOT EXISTS approach can be slow if your tables contain many rows. An alternative approach which can be much faster is to use a LEFT JOIN with a WHERE clause to return only the rows where the right-hand join field is Null.
SELECT VEHICLE.*
FROM
VEHICLE AS v
LEFT JOIN OWNERSHIP AS o
ON v.VEH_ID = o.VEH_ID
WHERE o.VEH_ID Is Null;
You could use Access' "Find Unmatched Query Wizard" to create a similar query.
If both tables are small you probably won't notice a difference. But it should be easy to check whether the difference is noticeable. And this approach will serve you better if your tables grow substantially over time.

Understanding how JOIN works when 3 or more tables are involved. [SQL]

I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]
Take 3 tables A, B [A related to B by some A.AId], and C [B related to C by some B.BId]
If I compose a query e.g
SELECT *
FROM A JOIN B
ON A.AId = B.AId
All good - I'm sweet with how this works.
What happens when Table C (Or some other D,E, .... gets added)
In the situation
SELECT *
FROM A JOIN B
ON A.AId = B.AId
JOIN C ON C.BId = B.BId
What is C joining to? - is it that B table (and the values therein)?
Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?
[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:
Tables
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)
Where:
Account->Opening, and Closing Balances are NULLABLE
(may have opening balance, closing balance, or none)
Balance->BalanceToken is 1:m - a balance could consist of many tokens
Conceptually, Closing Balance of a date, would be tomorrows opening balance
If I was trying to find a list of all the opening and closing balances for an account
I might do something like
SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
LEFT JOIN BALANCE OpeningBal
ON A.OpeningBalanceId = OpeningBal.BalanceId
LEFT JOIN BALANCE ClosingBal
ON A.ClosingBalanceId = ClosingBal.BalanceId
LEFT JOIN BalanceToken openingBalanceAmounts
ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
LEFT JOIN BalanceToken closingBalanceAmounts
ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate
Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.
[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]
I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
I know that the example tables do not conform to good DB design
Apologies for the essay, thanks for any elightenment :)
Edit in response to question by Marc
Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.
Conceptually here is what happens when you join three tables together.
The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A or any of the combinations
The query execution engine applies any predicates (WHERE clause) to the first table that doesn't involve any of the other tables. It selects out the columns mentioned in the JOIN conditions or the SELECT list or the ORDER BY list. Call this result A
It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary resultset.
Then it joins in the final table and applies the ORDER BY
This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.
For example, there is really no need to generate the full result sets along the way. The ORDER BY may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.
We know that the data from B is going to be filtered by the (inner) join to A (the data in A is also filtered). So if we (inner) join from B to C, thus the set C is also filtered by the relationship to A. And note also that any duplicates from the join will be included.
However; what order this happens in is up to the optimizer; it could decide to do the B/C join first then introduce A, or any other sequence (probably based on the estimated number of rows from each join and the appropriate indexes).
HOWEVER; in your later example you use a LEFT OUTER join; so Account is not filtered at all, and may well my duplicated if any of the other tables have multiple matches.
Are there duplicates (per account) in BalanceToken?
I often find it helps to view the actual execution plan. In query analyser/management studio, you can turn this on for queries from the Query menu, or use Ctrl+M. After running the query, the plan that was executed is shown in another result tab. From this you'll see that C and B are joined first, and then the result is joined with A. The plan might vary depending on information the DBMS has because both joins are inner, making it A-and-B-and-C. What I mean is that the result will be the same regardless of which is joined first, but the time it takes might differ greatly, and this is where the optimiser and hints come into play.
Joins can be tricky, and much of the behavior is of course dictated by how the data is stored in the actual tables.
Without seeing the tables it's hard to give a clear answer in your particular case but I think the basic issue is that you are summing over multiple result sets that are being combined into one.
Perhaps instead of multiple joins you should make two separate temporary tables in your query, one with the accountID, date and sum of openingbalances, a second one with the accountID, date and sum of closing balances, then joining those two on AccountID and date.
In order to find out exactly what is happening with joins, also in your specific case, I would do the following:
Change the initial part
SELECT accountID Accountbalancedate, sum(...) as openingbalance,
sum(...) as closingbalance FROM
to simply
"SELECT * FROM"
Study the resulting table, and you will see exactly what data is being duplicated. Remove the joins one by one and see what happens. This should give you a clue to what it is about your particular data that is causing the dupes.
If you open the query in SQL server management studio (Free version exists) you can edit the query in the designer. The visual view of how the tables are being joined might also help you realize what's going on.

Object Relational Mapping Issues: Suggestions needed

I've been trying to come up with a good design pattern for mapping data contained in relational databases to the business objects I've created but I keep hitting a wall.
Consider the following tables:
TYPE: typeid, description
USER: userid, username, usertypeid->TYPE.typeid, imageid->IMAGE.imageid
IMAGE: imageid, location, imagetypeid->TYPE.typeid
I would like to gather all the information regarding a specific user. Creating a query for this isn't too difficult.
SELECT u.*, ut.*, i.*, it.* FROM user u
INNER JOIN type ut ON ut.typeid = u.usertypeid
INNER JOIN image i ON i.imageid = u.imageid
INNER JOIN type it ON it.typeid = i.imagetypeid
WHERE u.userid = #userid
The problem is that the field names collide and then I'm forced to alias every single field which gets out of hand very quickly.
Does anyone have a decent design pattern for this kind of thing?
I've thought about retrieving multiple results from a single stored procedure and then using a dataset to iterate through each one but I'm worried that some performance issues might bite me in the butt later. For example instead of the above query something like:
SELECT u.*, t.* FROM user u
INNER JOIN type t ON t.typeid = u.usertypeid
WHERE u.userid = #userid;
SELECT i.*, t.* FROM image i
INNER JOIN type t ON t.typeid = i.imagetypeid
INNER JOIN user u ON u.imageid = i.imageid
WHERE u.userid = #userid;
Does that seem like a decent solution? Can anyone foresee any issues with this approach?
Never use the SQL * wildcard in production code. Always spell out all the columns you want to retrieve.
Then aliasing some of them doesn't seem like such a huge amount of extra work.
Re your comment asking for background and reasoning:
Sometimes you don't really need every column from all tables, and fetching them can be needlessly costly (especially for large strings and blobs). There is no SQL syntax for "all columns except the following exceptions."
You can't alias columns that you fetch using the wildcard. Once you need to alias any of the columns, you need to expand the wildcard to list all the columns explicitly.
If the table structure changes, e.g. columns are renamed, reordered, dropped, or added, then the wildcard fetches them all, by position as defined in the tables. This may seem like a convenience, but not when your application depends on columns being in the result set by a given name or in a given position. You can get mysterious bugs where your application displays columns in the wrong order (if referencing columns by position), or shows them as blank (if referencing columns by name).
However, if the SQL query names columns explicitly, you can employ the "Fail Early" principle. This helps debugging, because it leads you directly to the SQL query that needs to be edited to account for the schema change.