Joining tables depending on value of cell in another table

Joining tables depending on value of cell in another table - sql

I have a database with a few tables of which I am trying to get some data from.
but due to the layout (which I can't do anything about), I can't seem to get a normal JOIN to work.
I have three tables:
datorer
program
volymlicenser
In the table "datorer" is a number of computers (registered with AD name, room number and a cell for comments).
In the table "program" is different programs that my organization have purchased.
In the table "volymlicenser" is the few licenses owned by the organization that is volume licenses.
The cells in here is ID, RegKey and comp_name.
Most programs are OEM licenses and only installed on one computer, hence they never needed to register the program names together with the belonging computer in another table like with the volume licenses.
When the database was designed, it was only containing the two last tables, and no join queries was needed. Recently they added the table "datorer" which consists of the said cells above.
What I would like to do now, is, preferably by one single query, see if the boolean cell program.VL is set to true.
If so, I want to join progran.RegKey on volymlicenser.RegKey, and from there get the contents from volymlicenser.comp_name.
The query I tried with, is the following.. which did not work.
SELECT
prog.Namn AS Program, prog.comp_name AS Datornamn,
pc.room AS Rum, pc.kommentar AS Kommentar
FROM
program AS prog
JOIN
datorer AS pc ON prog.comp_name = pc.comp_name
JOIN
volymlicenser AS vl ON vl.RegKey = prog.RegKey
WHERE
prog.Namn = "Adobe Production Premium CS6"
Hope someone can help me. :)
Please do ask if something is not fully clear!
The following are example records and desired results:
Table datorer:
| id | comp_name | room | kommentar|
|----------------------------------|
| 1 | MB-56C5 | 1.1 | NULL |
| 2 | MB-569B | 4.1 | NULL |
Table program:
| id | Namn | amount | VL | RegKey | comp_name | leveranotor | purchased | note | Suite | SuiteContents |
|-----------------------------------------------------------------------------------------------|
| 1 | Adobe Production Premium CS6 | 2 | 1 | THE-ADOBE-SERIAL | NULL | Atea | 2012-11-01 | Purchased 2012 together with new computers | 1 | The contents of this suite |
| 2 | Windows 7 PRO retail | 1 | 0 | THE-MS-SERIAL | MB-569B | Atea | 2012-11-01 | Purchased 2012 together with new computers | 0 | NULL |
| 3 | Windows 7 PRO retail | 1 | 0 | THE-MS-SERIAL | MB-56C5 | Atea | 2012-11-01 | Purchased 2012 together with new computers | 0 | NULL |
Table volymlicenser:
| id | RegKey | comp_name |
|-----------------------------------|
| 1 | THE-ADOBE-SERIAL | MB-569B |
Desired result according to the SQL select query:
| Program | Computer name | Room | Kommentar|
|-------------------------------------------|
| Adobe Production Premium CS6 | MB-569B | 4.1 | NULL |
|-------------------------------------------|
Desired result when querying for Windows 7 PRO retail:
| Program | Computer name | Room | Kommentar|
|-------------------------------------------|
| Windows 7 PRO Retail | MB-569B | 4.1 | NULL |
| Windows 7 PRO Retail | MB-56C5 | 1.1 | NULL |
Desired result if the "WHERE" was changed to "Windows 7 PRO Retail"
Simply put, if program.VL is 1, the comp_name will be found in the volymlicenser.comp_name column.
If program.VL is 0, the comp_name will be found in program.comp_name column.
Uppon finding the comp_name, it needs to join comp_name from any of these tables on datorer.comp_name to get the room number.
I hope that this makes as much sense to you as it does to me.

Take a look at COMP_NAME in PROGRAM -- it's NULL for the Adobe product. In following, the first regular join you wrote cut the Adobe out of the results. So, after the first join, you just ended up with the the Microsoft products. And then the second join using Reg_Key would have gotten you an empty table because the remaining RegKeys refer solely to "THE-MS-SERIAL".
Instead...
SELECT prog.namn, coalesce(vl.comp_name, prog.comp_name), pc.room, pc.kommentar
FROM program as prog LEFT JOIN volymlicenser as vl
ON prog.RegKey = vl.RegKey
LEFT JOIN dataorer as pc
ON coalesce(vl.comp_name, prog.comp_name) = pc.comp_name
The use of left joins will preserve the contents of the tables to the left of the join syntax. This join method is required because the join keys are not consistently filled out through all three tables. And the coalesce function acts like an ifelse function. If the first variable is null then it is replaced with the contents of the next variable. Nifty.
By the way, I haven't run this myself.

You're probably better off creating 2 inline tables, one with each of your JOIN configurations, then using a CASE to decide which to select from:
SELECT
CASE WHEN table1.column1 = "A"
THEN table2.column2
ELSE
table1.column2
END
FROM
(SELECT t1.id, t1.column1, t2.column2
FROM t1 INNER JOIN t2 ON t1.x = t2.y) table1 INNER JOIN
(SELECT t1.id, t1.column1, t3.column2
FROM t1 INNER JOIN t3 ON t1.x = t2.y) table2 ON table1.id = table2.id;

I am trying the following:
SELECT
CASE WHEN program.VL = "1"
THEN Volymlicenser.comp_name AS Datornamn
ELSE
program.comp_name AS Datornamn
END
FROM
(SELECT prog.Namn AS Program, pc.room AS Room, pc.kommentar AS Komentar
FROM program AS prog INNER JOIN Volymlicenser AS vlic ON vlic.RegKey = prog.RegKey) table1 INNER JOIN
(SELECT t1.id, t1.column1, t3.column2
FROM t1 INNER JOIN t3 ON t1.x = t2.y) table2 ON table1.id = table2.id;
but come to a grinding halt at the select's..
I seriously don't understand the table1, table2, t1, t2 and t3 here.
loltempast: Could you (or anyone) please claryfy as I don't seem to understand how the query should be made.

Related

Complex SQL Joins with Where Clauses

Being pretty new to SQL, I ask for your patience. I have been banging my head trying to figure out how to create this VIEW by joining 3 tables. I am going to use mock tables, etc to keep this very simple. So that I can try to understand the answer - no just copy and paste.
ICS_Supplies:
Supplies_ ID Item_Description
-------------------------------------
1 | PaperClips
2 | Rubber Bands
3 | Stamps
4 | Staples
ICS_Orders:
ID SuppliesID RequisitionNumber
----------------------------------------------------
1 | 1 | R1234a
6 | 4 | R1234a
2 | 1 | P2345b
3 | 2 | P3456c
4 | 3 | R4567d
5 | 4 | P5678e
ICS_Transactions:
ID RequsitionNumber OrigDate TransType OpenClosed
------------------------------------------------------------------
1 | R1234a | 06/12/20 | Req | Open
2 | P2345b | 07/09/20 | PO | Open
3 | P3456c | 07/14/20 | PO | Closed
4 | R4567d | 08/22/20 | Req | Open
5 | P5678e | 11/11/20 | PO | Open
And this is what I want to see in my View Results
Supplies_ID Item RequsitionNumber OriginalDate TransType OpenClosed
---------------------------------------------------------------------------------------
1 | Paper Clips | P2345b | 07/09/20 | PO | OPEN
2 | Rubber Bands | Null | Null | Null | Null
3 | Stamps | Null | Null | Null | Null
4 | Staples | P56783 | 11/11/20 | PO | OPEN
I just can't get there. I want to always have the same amount of records that we have in the ICS_Supplies Table. I need to join to the ICS_Orders Table in order to grab the Requisition Number because that's what I need to join on the ICS_Transactions Table. I don't want to see data in the new added fields UNLESS ICS_Transactions.TransType = 'PO' AND ICS_Transactions.OpenClosed = 'OPEN', otherwise the joined fields should be seen as null, regardless to what they contain. IF that is possible?
My research shows this is probably a LEFT Join, which is very new to me. I had made many attempts on my own, and then posted my question yesterday. But I was struggling to ask the correct question and it was recommended by other members that I post the question again . .
If needed, I can share what I have done, but I fear it will make things overly confusing as I was going in the wrong direction.
I am adding a link to the original question, for those that need some background info
Original Question
If there is any additional information needed, just ask. I do apologize in advance if I have left out any needed details.

This is a bit tricky, because you want to exclude rows in the second table depending on whether there is a match in the third table - so two left joins are not what you are after.
I think this implements the logic you want:
select s.supplies_id, s.item_description,
t.requisition_number, t.original_date, t.trans_type, t.open_closed
from ics_supplies s
left join ics_transaction t
on t.transtype = 'PO'
and t.open_closed = 'Open'
and exists (
select 1
from ics_order o
where o.supplies_id = s.supplies_id and o.requisition_number = t.requisition_number
)
Another way to phrase this would be an inner join in a subquery, then a left join:
select s.supplies_id, s.item_description,
t.requisition_number, t.original_date, t.trans_type, t.open_closed
from ics_supplies s
left join (
select o.supplies_id, t.*
from ics_order o
inner join ics_transaction t
on t.requisition_number = o.requisition_number
where t.transtype = 'PO' and t.open_closed = 'Open'
) t on t.supplies_id = s.supplies_id

This query should return the data for supplies. The left join will add in all orders that have a supply_id (and return null for the orders that don't).
select
s.supplies_id
,s.Item_Description as [Item]
,t.RequisitionNumber
,t.OrigDate as [OriginalDate]
,t.TransType
,t.OpenClosed
from ICS_Supplies s
left join ICS_Orders o on o.supplies_id = s.supplies_id
left join ICS_Transactions t on t.RequisitionNumber = o.RequisitionNumber
where t.TransType = 'PO'
and t.OpenClosed = 'Open'
The null values will automatically show null if the record doesn't exist. For example, you are joining to the Transactions table and if there isn't a transaction_id for that supply then it will return 'null'.
Modify your query, run it, then maybe update your question using real examples if it's possible.

In the original question you wrote:
"I only need ONE matching record from the ICS_Transactions Table.
Ideally, the one that I want is the most current
'ICS_Transactions.OriginalDate'."
So the goal is to get the most recent transaction for which the TransType is 'PO' and OpenClosed is 'Open'. That the purpose of the CTE 'oa_cte' in this code. The appropriate transactions are then LEFT JOIN'ed on SuppliesId. Something like this
with oa_cte(SuppliesId, RequsitionNumber, OriginalDate,
TransType, OpenClosed, RowNum) as (
select o.SuppliesId, o.RequsitionNumber,
t.OrigDate, t.TransType, t.OpenClosed,
row_number() over (partition by o.SuppliesId
order by t.OrigDate desc)
from ICS_Orders o
join ICS_Transactions t on o.RequisitionNumber=t.RequisitionNumber
where t.TransType='PO'
and t.OpenClosed='OPEN')
select s.*, oa.*
from ICS_Supplies s
left join oa_cte oa on s.SuppliesId=oa.SuppliesId
and oa.RowNum=1;

MS Access 2016 - Pull client name from separate table in complex query

I have three tables for vulnerability scanning jobs: customers, authorization forms, and scans. Relationships are one to many from left to right. I previously had scans directly related to clients, but implemented the forms table to add the ability to prevent scanning without authorization. I have the below query which pulls the dates of the most recent and next coming scans (huge thanks to #donPablo), but when I made the change in tables I'm no longer pulling the correct data from the customers table. I'm not exactly sure how to fix it.
SELECT u.Customer_Company, z.*
FROM (Select
NZ(a.Scan_Data.Customer_ID, b.Scan_Data.Customer_ID) as Customer,
aPast as Past,
aFuture as Future,
DATEDIFF("d", aPast, aFuture) as Difference
FROM
(Select Scan_Data.Customer_ID, Max(Scan_Date) as aPast from Scan_Data where Scan_Date <= DATE() Group By Scan_Data.Customer_ID) a
LEFT JOIN
(Select Scan_Data.Customer_ID, Min(Scan_Date) as aFuture from Scan_Data where Scan_Date > DATE() Group By Scan_Data.Customer_ID) b
ON a.Scan_Data.Customer_ID = B.Scan_Data.Customer_ID
UNION
Select
NZ(a.Scan_Data.Customer_ID, b.Scan_Data.Customer_ID) as Customer,
aPast as Past,
aFuture as Future,
DATEDIFF("d", aPast, aFuture) as Difference
FROM
(Select Scan_Data.Customer_ID, Max(Scan_Date) as aPast from Scan_Data where Scan_Date <= DATE() Group By Scan_Data.Customer_ID) a
RIGHT JOIN
(Select Scan_Data.Customer_ID, Min(Scan_Date) as aFuture from Scan_Data where Scan_Date > DATE() Group By Scan_Data.Customer_ID) b
ON a.Scan_Data.Customer_ID = B.Scan_Data.Customer_ID
) AS z LEFT JOIN Customer_Data AS u ON cint(z.Customer) = cint(u.Customer_ID);
In this query the Scan_Data.Customer_ID winds up being the FormID and it then pulls the customer's name based on the FormID. I fixed it in my other queries by doing a double inner join to pull the actual CustomerID based on the FormID, but I can't get that to work here because of the existing joins. Form_Data.Customer_ID is the way it's identified in the Form table. All IDs in their primary tables are autonumber generated PKs.
Customer_Data table:
.Customer_ID | .Customer_Name | etc.
1 | Microsoft |
2 | Reddit |
Form_Data table:
.Form_ID | .Signature_Date | .Expiration_Date | .Customer_ID
1 | 01-Jan-19 | 01-Jan-20 | 2/Reddit
2 | 15-May-18 | 15-May-21 | 1/Microsoft
Scan_Data table:
.Scan_ID | .Scan_Title | .Scan_Date | .Customer_ID
1 | First MS 19052018 | 19-May-18 | 1/2/Reddit
2 | First R 05012019 | 05-Jan-19 | 2/1/Microsoft
The above Scan_Data shows the problem I'm having. The numbers in the Scan_Data.Customer_ID field are the PKs from the other two tables. The .Customer_ID field is pulling the customer ID based upon the form ID and not the actual customer ID. It should show like this:
.Scan_ID | .Scan_Title | .Scan_Date | .Customer_ID
1 | First MS 19052018 | 19-May-18 | 2/1/Microsoft
2 | First R 05012019 | 05-Jan-19 | 1/2/Reddit

Multiple Self-Join based on GROUP BY results

I'm attempting to collect details about backup activity from a ProgreSQL DB table on a backup appliance (Avamar). The table has several columns including: client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified and more. Simplified example:
| session_id | client_name | dataset | plugin_name | type | completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|----------------|
| 1 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-05T01:00:00Z | 30900 | 11111111 |
| 2 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-04T01:00:00Z | 30000 | 22222222 |
| 3 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-03T01:00:00Z | 30000 | 22222222 |
| 4 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-02T01:00:00Z | 30000 | 22222222 |
| 5 | server01 | Windows | Windows VSS | Scheduled Backup | 2017-12-01T01:00:00Z | 30000 | 33333333 |
| 6 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-05T02:00:00Z | 30000 | 44444444 |
| 7 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-04T02:00:00Z | 30900 | 55555555 |
| 8 | server03 | Windows | Windows File System | On-Demand Backup | 2017-12-05T03:00:00Z | 30000 | 66666666 |
| 9 | server04 | Windows | Windows File System | Validate | 2017-12-05T03:00:00Z | 30000 | 66666666 |
Each client_name (server) can have multiple datasets, and each dataset can have multiple plugin_names. So I have a created a SQL statement that does a GROUP BY of these three columns to get a list of "job" activity over time.
(http://sqlfiddle.com/#!15/f15556/1)
select
client_name,
dataset,
plugin_name
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
Each of these Jobs can be successful or fail based on a status_code column. Using self-join with subqueries I'm able to get results of the Last Good backup along with it's completed_ts (completed time) and bytes_modified and more:
(http://sqlfiddle.com/#!15/f15556/16)
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
I can do the same thing separately to get the Last Attempt details by removing WHERE's status_code line: http://sqlfiddle.com/#!15/f15556/3. Note that most times LastGood and LastAttempted are the same row but sometimes they are not, depending if the last backup was successful.
What I'm having problems with is merging these two statements together (if possible). So I will get this result:
| client_name | dataset | plugin_name | lastgood | lastgood_bytes | lastattempt | lastattempt_bytes |
|-------------|---------|---------------------|----------------------|-----------------|----------------------|-------------------|
| server01 | Windows | Windows File System | 2017-12-04T01:00:00Z | 22222222 | 2017-12-05T01:00:00Z | 11111111 |
| server01 | Windows | Windows VSS | 2017-12-01T01:00:00Z | 33333333 | 2017-12-01T01:00:00Z | 33333333 |
| server02 | Windows | Windows File System | 2017-12-05T02:00:00Z | 44444444 | 2017-12-05T02:00:00Z | 44444444 |
| server03 | Windows | Windows File System | 2017-12-05T03:00:00Z | 66666666 | 2017-12-05T03:00:00Z | 66666666 |
I attempted just adding another RIGHT JOIN to the end (http://sqlfiddle.com/#!15/f15556/4) and getting NULL rows. After doing some reading I see that the first two JOINs run first creating a temporary table before the 2nd join occurs, but at that point the data I need is lost so I get NULL rows.
Using PostgreSQL 8 via groovy scripting. I also only have read-only access to the DB.

You apparently have two intermediate inner join output tables and you want to get columns from each about some things identified by a common key. So inner join them on the key.
select
g.client_name,
g.dataset,
g.plugin_name,
LastGood,
g.status_code,
LastGood_bytes
LastAttempt,
l.status_code,
LastAttempt_bytes
from
( -- cut & pasted Last Good http://sqlfiddle.com/#!15/f15556/16
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
) as g
join
( -- cut & pasted Last Attempt http://sqlfiddle.com/#!15/f15556/3
select
a1.client_name,
a1.dataset,
a1.plugin_name,
a1.LastAttempt,
a3.status_code,
a3.bytes_modified as LastAttempt_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2 a2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as a1
on a3.client_name = a1.client_name and
a3.dataset = a1.dataset and
a3.plugin_name = a1.plugin_name and
a3.completed_ts = a1.LastAttempt
) as l
on l.client_name = g.client_name and
l.dataset = g.dataset and
l.plugin_name = g.plugin_name
order by client_name, dataset, plugin_name
This uses one of the applicable approaches in Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs. However the correspondence of chunks of code might not be so clear. Its intermediate are left vs your inner & group_concat is your max. (But it has more approaches because of particulars of group_concat & its query.)
A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
A correct cumulative LEFT JOIN approach: JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then left join that & q3--1:many--then GROUP BY & GROUP_CONCAT.
Whether this actually serves your purpose in general depends on your actual specification and constraints. Even if the two joins you link are what you want you need to explain exactly what you mean by "merge". You don't say what you want if the joins have different sets of values for the grouped columns. Force yourself to use the English language to say what rows go in the result based on what rows are in the input.
PS 1 You have undocumented/undeclared/unenforced constraints. Please declare when possible. Otherwise enforce by triggers. Document in question text if not in code. Constraints are fundamental to multiple subrow value instances in join & to group by.
PS 2 Learn the syntax/semantics for select. Learn what left/right outer join ons return--whatinner join on does plus unmatched left/right table rows extended by nulls.
PS 3 Is there any rule of thumb to construct SQL query from a human-readable description?

Here is an alternate way that also works but harder to follow and likely more particular to my dataset: http://sqlfiddle.com/#!15/f15556/114
select
Actvty.client_name,
Actvty.dataset,
Actvty.plugin_name,
ActvtyGood.LastGood,
ActvtyGood.status_code as LastGood_status,
ActvtyGood.bytes_modified as LastGood_bytes,
ActvtyOnly.LastAttempt,
Actvty.status_code as LastAttempt_status,
Actvty.bytes_modified as LastAttempt_bytes
from v_activities_2 Actvty
-- 1. Get last attempt of each job (which may or may not match last good)
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as ActvtyOnly
on Actvty.client_name = ActvtyOnly.client_name and
Actvty.dataset = ActvtyOnly.dataset and
Actvty.plugin_name = ActvtyOnly.plugin_name and
Actvty.completed_ts = ActvtyOnly.LastAttempt
-- 4. join the list of good runs with the table of last attempts, there would never be a job that has a last good without also a last attempt.
join (
-- 3. join last good runs with the full table to get the additional details of each
select
ActvtyGoodSub.client_name,
ActvtyGoodSub.dataset,
ActvtyGoodSub.plugin_name,
ActvtyGoodSub.LastGood,
ActvtyAll.status_code,
ActvtyAll.bytes_modified
from v_activities_2 ActvtyAll
-- 2. Get last Good run of each job
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as ActvtyGoodSub
on ActvtyAll.client_name = ActvtyGoodSub.client_name and
ActvtyAll.dataset = ActvtyGoodSub.dataset and
ActvtyAll.plugin_name = ActvtyGoodSub.plugin_name and
ActvtyAll.completed_ts = ActvtyGoodSub.LastGood
) as ActvtyGood
on Actvty.client_name = ActvtyGood.client_name and
Actvty.dataset = ActvtyGood.dataset and
Actvty.plugin_name = ActvtyGood.plugin_name

Complex filtering SQL Server query with multiple conditions

I have a SQL Server query in which I am joining several tables together and we are using SQL Server 2012. I have a customer number, customer name, item type, a field for document number (which can be one document number in either of the two tables I will mention; even though there are two fields mentioned later, I do want to combine this into one field using a CASE statement), a document amount, document date, due date, amount remaining to be paid, and a description.
The problem is that for my item type, one table (RM20101), doesn't recognize an item type associated with it because no item types are in that table. The other table (Custom_RecData), recognizes the correct item type for all pertinent document numbers but that tables only has a record of document numbers associated with items--which not all document numbers in the system have.
So here's an example of the trimmed down version:
CUSTNMBR | ItemType | DocumentNumberRM | DocNumCUSTOM | DocAmount
------------+--------------+--------------------+---------------+-----------
12345ABC | NULL | PYMNT01234567 | NULL | - 28.50
12345ABC | TOYS | 9010456778 | 9010456778 | 300.00
12345ABC | NULL | 9010456778 | NULL | 300.00
12345ABC | NULL | 9019888878 | NULL | 47.90
12345ABC | CRAFTS | 9502345671 | 9502345671 | 145.25
12345ABC | NULL | 9502345671 | NULL | 145.25
I named the columns in this example for document number as "RM" to come out of the RM20101 table and "CUSTOM" to come out of the Custom_RecDate table.
So I've tried lots of ways, from CASE statements, to my WHERE clause, to a HAVING clause, to subqueries... I can't figure it out. Here's what I'd like to see:
CUSTNMBR | ItemType | DocumentNumberRM | DocNumCUSTOM | DocAmount
------------+--------------+--------------------+---------------+-----------
12345ABC | NULL | PYMNT01234567 | NULL | - 28.50
12345ABC | TOYS | 9010456778 | 9010456778 | 300.00
12345ABC | NULL | 9019888878 | NULL | 47.90
12345ABC | CRAFTS | 9502345671 | 9502345671 | 145.25
So why did I call this multiple conditions? Well, if you look at the table, I am cutting out items based on the following:
If both the ItemType And DocNumCUSTOM fields are NULL then show it.
If the ItemType IS NOT NULL and both document number fields are NOT NULL, then show it.
If the ItemType IS NULL and DocNumCUSTOM IS NULL, but we've already seen the document number in the RM20101 DocumentNumberRM field (which I suspect would be in a HAVING clause for a COUNT or something), then don't show it a second time.
If I don't find a way to do step 3, I will get the duplicates as shown in my initial example.
I basically do not want duplication. I want to show an item type whether it be NULL (non-existent for that document number in either table) or show it only once if existent in the Custom_RecDate table, which is the only table that has the item type information in it.
Does this make sense? I know it sounds complicated as heck, but hopefully someone can make sense of it all. :)
Thanks!
By the way, here's the important part (my query, albeit very trimmed down for the example):
SELECT DISTINCT
R1.CUSTNMBR,
I.ITMCLSCD AS [ItemType],
R1.DOCNUMBR AS [DocumentNumberRM],
I.DOCNUMBR AS [DocNumCUSTOM],
R1.ORTRXAMT AS [DocAmount]
FROM
RM20101 R1
JOIN
RM40401 R2 ON R2.RMDTYPAL = R1.RMDTYPAL
JOIN
RM00401 R3 ON R3.DOCNUMBR = R1.DOCNUMBR
JOIN
RM00101 R4 ON R4.CUSTNMBR = R1.CUSTNMBR
LEFT OUTER JOIN
SR_ITCMCD C ON C.CUSTNMBR = R1.CUSTNMBR
LEFT OUTER JOIN
AR_Description D ON D.SOPNUM = R1.DOCNUMBR
LEFT OUTER JOIN
Custom_RecData I ON I.ITEMNMBR = C.ITEMNMBR
AND I.DOCNUMBR = R1.DOCNUMBR
Again, I've tried using CASE statements and putting various conditions in my WHERE clause, I just can't figure it out.

Purely based on the sample data, MAX() should do the trick:
SELECT
R1.CUSTNMBR,
MAX(I.ITMCLSCD) AS [ItemType],
R1.DOCNUMBR AS [DocumentNumberRM],
MAX(I.DOCNUMBR) AS [DocNumCUSTOM],
R1.ORTRXAMT AS [DocAmount]
FROM RM20101 R1
JOIN RM40401 R2 ON R2.RMDTYPAL = R1.RMDTYPAL
JOIN RM00401 R3 ON R3.DOCNUMBR = R1.DOCNUMBR
JOIN RM00101 R4 ON R4.CUSTNMBR = R1.CUSTNMBR
LEFT OUTER JOIN SR_ITCMCD C ON C.CUSTNMBR = R1.CUSTNMBR
LEFT OUTER JOIN AR_Description D ON D.SOPNUM = R1.DOCNUMBR
LEFT OUTER JOIN Custom_RecData I ON I.ITEMNMBR = C.ITEMNMBR
AND I.DOCNUMBR = R1.DOCNUMBR
GROUP BY
R1.CUSTNMBR,
R1.DOCNUMBR,
R1.ORTRXAMT
Removed the DISTINCT since GROUP BY will accomplish the same and is needed for MAX()

SQL script runs VERY slowly with small change

I am relatively new to SQL. I have a script that used to run very quickly (<0.5 seconds) but runs very slowly (>120 seconds) if I add one change - and I can't see why this change makes such a difference. Any help would be hugely appreciated!
This is the script and it runs quickly if I do NOT include "tt2.bulk_cnt
" in line 26:
with bulksum1 as
(
select t1.membercode,
t1.schemecode,
t1.transdate
from mina_raw2 t1
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t1.membercode,
t1.schemecode,
t1.transdate
),
bulksum2 as
(
select t1.schemecode,
t1.transdate,
count(*) as bulk_cnt
from bulksum1 t1
group by t1.schemecode,
t1.transdate
having count(*) >= 10
),
results as
(
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join bulksum2 tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
)
select * from results
EDIT: I apologise for not putting enough detail in here previously - although I can use basic SQL code, I am a complete novice when it comes to databases.
Database: Oracle (I'm not sure which version, sorry)
Execution plans:
QUICK query:
Plan hash value: 1712123489
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | FILTER | |
| 4 | HASH GROUP BY | |
| 5 | VIEW | VM_NWVW_0 |
| 6 | HASH GROUP BY | |
| 7 | TABLE ACCESS FULL| MINA_RAW2 |
| 8 | TABLE ACCESS FULL | MINA_RAW2 |
---------------------------------------------
SLOW query:
Plan hash value: 1298175315
--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FILTER | |
| 2 | HASH GROUP BY | |
| 3 | HASH JOIN | |
| 4 | VIEW | VM_NWVW_0 |
| 5 | HASH GROUP BY | |
| 6 | TABLE ACCESS FULL| MINA_RAW2 |
| 7 | TABLE ACCESS FULL | MINA_RAW2 |
--------------------------------------------

A few observations, and then some things to do:
1) More information is needed. In particular, how many rows are there in the MINA_RAW2 table, what indexes exist on this table, and when was the last time it was analyzed? To determine the answers to these questions, run:
SELECT COUNT(*) FROM MINA_RAW2;
SELECT TABLE_NAME, LAST_ANALYZED, NUM_ROWS
FROM USER_TABLES
WHERE TABLE_NAME = 'MINA_RAW2';
From looking at the plan output it looks like the database is doing two FULL SCANs on MINA_RAW2 - it would be nice if this could be reduced to no more than one, and hopefully none. It's always tough to tell without very detailed information about the data in the table, but at first blush it appears that an index on TRANSACTIONTYPE might be helpful. If such an index doesn't exist you might want to consider adding it.
2) Assuming that the statistics are out-of-date (as in, old, nonexistent, or a significant amount of data (> 10%) has been added, deleted, or updated since the last analysis) run the following:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(owner => 'YOUR-SCHEMA-NAME',
table_name => 'MINA_RAW2');
END;
substituting the correct schema name for "YOUR-SCHEMA-NAME" above. Remember to capitalize the schema name! If you don't know if you should or shouldn't gather statistics, err on the side of caution and do it. It shouldn't take much time.
3) Re-try your existing query after updating the table statistics. I think there's a fair chance that having up-to-date statistics in the database will solve your issues. If not:
4) This query is doing a GROUP BY on the results of a GROUP BY. This doesn't appear to be necessary as the initial GROUP BY doesn't do any grouping - instead, it appears this is being done to get the unique combinations of MEMBERCODE, SCHEMECODE, and TRANSDATE so that the count of the members by scheme and date can be determined. I think the whole query can be simplified to:
WITH cteWORKING_TRANS AS (SELECT *
FROM MINA_RAW2
WHERE TRANSACTIONTYPE IN ('RSP','SP','UNTV',
'ASTR','CN','TVIN',
'UCON','TRAS')),
cteBULKSUM AS (SELECT a.SCHEMECODE,
a.TRANSDATE,
COUNT(*) AS BULK_CNT
FROM (SELECT DISTINCT MEMBERCODE,
SCHEMECODE,
TRANSDATE
FROM cteWORKING_TRANS) a
GROUP BY a.SCHEMECODE,
a.TRANSDATE)
SELECT t.*, b.BULK_CNT
FROM cteWORKING_TRANS t
INNER JOIN cteBULKSUM b
ON b.SCHEMECODE = t.SCHEMECODE AND
b.TRANSDATE = t.TRANSDATE

I managed to remove an unnecessary subquery, but this syntax with distinct inside count may not work outside of PostgreSQL or may not be the desired result. I know I've certainly used it there.
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join (select t2.schemecode,
t2.transdate,
count(DISTINCT membercode) as bulk_cnt
from mina_raw2 t2
where t2.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t2.schemecode,
t2.transdate
having count(DISTINCT membercode) >= 10) tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
When you use those with queries, instead of subqueries when you don't need to, you're kneecapping the query optimizer.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas