How can I implement multiple nameColumns that target the same table? - ssas

I have two tables. The first one references the second one multiple times.
Some demo data:
id fk1 fk2 fk3
-------------------------------
4009 9419 2282 9005
4057 9419 2112 NULL
5480 NULL 4279 NULL
5989 NULL 1677 NULL
The second table contains the names for the foreign keys in table 1
id name
--------------------------------
1677 Bank Account No1
2112 Loyalty Account
2282 Sales Account
4279 Employee Account
9005 Warehouse No1
9419 Sales
I have to create a dimension for Table1. The fields fk1, fk2 and fk3 should use the nameColumn attribute, and show the values from table 2.
I tried creating this named query in the Data Source View, but when I try to use it in the cube, the deployment fails without a useful error message.
SELECT [Table1].[id]
,fk1
,fk2
,fk3
,T2_fk1.name as fk1name
,T2_fk2.name as fk1name
,T2_fk3.name as fk1name
FROM [dbo].[Table1]
left join Table2 as T2_fk1
on Table2.id = [Table1].fk1
left join Table2 as T2_fk2
on Table2.id = [Table1].fk2
left join Table2 as T2_fk3
on Table2.id = [Table1].fk3
How can I implement multiple nameColumns that target the same table?

Although NULL values can technically be used in dimensions in Analysis Services, it is normally a bad idea, because Analysis Services and the relational database treat them differently. Analysis Services treats them as empty strings or numeric zeroes, while the relational database normally treats them as distinct from both (with exceptions such as Oracle, which treats empty strings as NULL). Hence, when Analysis Services issues a SQL statement containing DISTINCT or GROUP BY, it can receive more than one row where it expects only one.
It is best practice to avoid nulls in attribute columns, as well as foreign key columns in the star schema. In measure columns, nulls are fine. Thus, you should change your statement to
SELECT [Table1].[id]
,coalesce(fk1, -1) as fk1
,coalesce(fk2, -1) as fk2
,coalesce(fk3, -1) as fk3
-- each name column needs its own alias
,coalesce(T2_fk1.name, '<unknown>') as fk1name
,coalesce(T2_fk2.name, '<unknown>') as fk2name
,coalesce(T2_fk3.name, '<unknown>') as fk3name
FROM [dbo].[Table1]
-- join on the alias, not on the bare Table2 name
left join Table2 as T2_fk1
on T2_fk1.id = [Table1].fk1
left join Table2 as T2_fk2
on T2_fk2.id = [Table1].fk2
left join Table2 as T2_fk3
on T2_fk3.id = [Table1].fk3
or whatever you choose to replace NULLs with instead of -1 or <unknown>.
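To check the shape of such a query outside Analysis Services, here is a minimal sketch that runs it against the demo data from the question. It uses Python's built-in sqlite3 module as a stand-in for the real relational source (an assumption; the original setup is SQL Server), and the short aliases t1, a, b and c are illustrative only.

```python
import sqlite3

# In-memory SQLite stands in for the relational source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER PRIMARY KEY, fk1 INTEGER, fk2 INTEGER, fk3 INTEGER);
CREATE TABLE Table2 (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO Table1 VALUES (4009, 9419, 2282, 9005), (4057, 9419, 2112, NULL),
                          (5480, NULL, 4279, NULL), (5989, NULL, 1677, NULL);
INSERT INTO Table2 VALUES (1677, 'Bank Account No1'), (2112, 'Loyalty Account'),
                          (2282, 'Sales Account'), (4279, 'Employee Account'),
                          (9005, 'Warehouse No1'), (9419, 'Sales');
""")

# One aliased join per foreign key, one distinct output name per name column.
query = """
SELECT t1.id,
       COALESCE(t1.fk1, -1)          AS fk1,
       COALESCE(t1.fk2, -1)          AS fk2,
       COALESCE(t1.fk3, -1)          AS fk3,
       COALESCE(a.name, '<unknown>') AS fk1name,
       COALESCE(b.name, '<unknown>') AS fk2name,
       COALESCE(c.name, '<unknown>') AS fk3name
FROM Table1 AS t1
LEFT JOIN Table2 AS a ON a.id = t1.fk1
LEFT JOIN Table2 AS b ON b.id = t1.fk2
LEFT JOIN Table2 AS c ON c.id = t1.fk3
ORDER BY t1.id
"""
cur = conn.execute(query)
columns = [d[0] for d in cur.description]  # output column names
rows = cur.fetchall()

for row in rows:
    print(dict(zip(columns, row)))
```

Every output column name is unique, so each of fk1, fk2 and fk3 can point at its own name column.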

Left Join is filtering rows out of my query in MySQL 5.7 without any left join columns in the where clause

I have a query that joins 4 tables. It returns 35 rows every time I run it. Here it is..
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
If I left join a table (using a subquery in the on clause of the join), MySQL does some very unexpected things. Here's the modified query with the left join...
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
LEFT OUTER JOIN taxrecordpayment trp ON trp.taxRecordId = tr.Tax_ID AND trp.paymentId = (
SELECT p.id
FROM taxrecordpayment trpi
JOIN payments p ON p.id = trpi.paymentId
WHERE trpi.taxRecordId = tr.Tax_ID AND p.isFullYear = 0
ORDER BY p.dueDate, p.paymentSendTo
LIMIT 1
)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
I would like to note that the left join table does not appear in the where clause of the query at all, and I am not using the left join table in the select clause. In real life, I actually use the left join records in the select clause, but in my effort to get to the essential elements causing this problem, I have simplified the query and removed everything but the essential parts that cause trouble.
Here's what is happening...
Where I used to get 35 records, now I get a random number of records approaching 35. Sometimes, I get 33. Other times, I get 27, or 29, or 31, and so on. I would never expect a left join like this to filter out any records from my result set. A left join should only add additional columns to the result set, particularly when - as is the case here - the left join table is not part of the where clause.
I have determined that the problem really only happens if the subquery has a non-deterministic sort. In other words, if I have two taxrecordpayment records that match the subquery and both have the same due date and the same "paymentSendTo" value, then I see the issue. If the inner subquery has a deterministic sort, the issue goes away.
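To illustrate what a deterministic sort means here, this is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption, so it does not reproduce the row-dropping behaviour itself). The sample rows and the first_payment helper are made up. Adding the unique p.id as the last sort key means two payments that tie on dueDate and paymentSendTo are always ordered the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (id INTEGER PRIMARY KEY, dueDate TEXT,
                       paymentSendTo TEXT, isFullYear INTEGER);
CREATE TABLE taxrecordpayment (taxRecordId INTEGER, paymentId INTEGER);
-- Payments 11 and 12 tie on both original sort columns (made-up data).
INSERT INTO payments VALUES (12, '2021-06-01', 'county', 0),
                            (11, '2021-06-01', 'county', 0),
                            (13, '2021-09-01', 'county', 0);
INSERT INTO taxrecordpayment VALUES (1, 12), (1, 11), (1, 13);
""")

def first_payment(tax_record_id):
    """Return the id of the first partial-year payment, or None if there is none."""
    row = conn.execute("""
        SELECT p.id
        FROM taxrecordpayment trpi
        JOIN payments p ON p.id = trpi.paymentId
        WHERE trpi.taxRecordId = ? AND p.isFullYear = 0
        ORDER BY p.dueDate, p.paymentSendTo, p.id  -- unique p.id breaks ties
        LIMIT 1
    """, (tax_record_id,)).fetchone()
    return row[0] if row else None

print(first_payment(1))
```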
I would imagine that some people will look at my simplified example and recommend that I simply remove the subquery. If my query were this simple in real life, that would be the way to go.
In reality, the entire query is more complicated, is hitting a LOT of data, and modifying it is possible, but costly. Removing the subquery is even more costly.
Has anyone seen this sort of behavior before? I would expect a non-deterministic subquery to simply produce inconsistent results and I would never expect a left join like this to actually filter records out when the left joined table is not used at all in the where clause.
Here is the query plan, as provided by EXPLAIN...
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
--------------------------------------------------------------------
1 | PRIMARY | parcels | NULL | range | PRIMARY,Loan_ID,state_county,ParcelsCounty,county_state,Location,CountyLoan | county_state | 106 | NULL | 590 | 1 | Using index condition; Using where
1 | PRIMARY | tr | NULL | eq_ref | parcel_year,ParcelsTax_Record,Year | parcel_year | 8 | infoexchange.parcels.Parcel_ID,const | 1 | 100 | Using index
1 | PRIMARY | Loans | NULL | eq_ref | PRIMARY,Bank_ID,Bank,DateSub,loan_number | PRIMARY | 4 | infoexchange.parcels.Loan_ID | 1 | 21.14 | Using where
1 | PRIMARY | Lender | NULL | eq_ref | PRIMARY | PRIMARY | 8 | infoexchange.Loans.bank_id | 1 | 100 | Using index
1 | PRIMARY | trp | NULL | eq_ref | taxRecordPayment_key,IDX_trp_pymtId_trId | taxRecordPayment_key | 8 | infoexchange.tr.Tax_ID,func | 1 | 100 | Using where; Using index
2 | DEPENDENT SUBQUERY | trpi | NULL | ref | taxRecordPayment_key,IDX_trp_pymtId_trId | taxRecordPayment_key | 4 | infoexchange.tr.Tax_ID | 1 | 100 | Using index; Using temporary; Using filesort
2 | DEPENDENT SUBQUERY | p | NULL | eq_ref | PRIMARY | PRIMARY | 4 | infoexchange.trpi.paymentId | 1 | 10 | Using where
I have attempted to recreate this with a contrived data setup and an analogous query, but with my contrived data set, I cannot get the subquery to behave non-deterministically, even though it suffers from the same problem as my subquery above (there are multiple records that match the subquery and the ORDER BY is not unique for those records).
This seems to require a massive data set to start misbehaving. It happens on multiple distinct instances of MySQL 5.7, while a MySQL 5.6 instance does not demonstrate the problem at all. I am hoping someone can spot something in the above query plan to help me understand why the subquery is non-deterministic and - more importantly - why that causes records to get dropped from the result set.
I feel like this is either a data set issue (perhaps we need to do a table optimize or do some maintenance on our tables), or a bug in MySQL.
I have submitted a bug for this behavior.
https://bugs.mysql.com/bug.php?id=104824
You can recreate this behavior as follows...
CREATE TABLE tableA (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(10)
);
CREATE TABLE tableB (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
tableAId INTEGER NOT NULL,
name VARCHAR(10),
CONSTRAINT tableBFKtableAId FOREIGN KEY (tableAId) REFERENCES tableA (id)
);
INSERT INTO tableA (name)
VALUES ('he'),
('she'),
('it'),
('they');
INSERT INTO tableB (tableAId, name)
VALUES (1, 'hat'),
(2, 'shoes'),
(4, 'roof');
Run this query multiple times and the number of rows returned will vary:
SELECT COALESCE(b.id, -1) AS tableBId,
a.id AS tableAId
FROM tableA a
LEFT JOIN tableB b ON (b.tableAId = a.id AND 0.5 > RAND());

Join 2 tables by 1 field, partial matching on other two fields - no cartesian product

I have 2 tables, representing the substitutions of components in an intervention (ID_TK), each with a Part Number OLD and a Part Number NEW, from two different systems point of view (PN OLD and NEW in one system, HWC OLD and NEW in the other one).
The number of rows in the two tables may be different for each ID_TK.
ID_SOST is the unique KEY in the first table, PR_ID is the unique key in the second one.
However, only the intervention ID (ID_TK) links the two tables exactly.
I have to check whether the substitutions match in the two tables for each intervention.
"Match" means: same ID_TK, AND PN_OLD must be a substring of (or equal to) HWC_OLD, AND PN_NEW must be a substring of (or equal to) HWC_NEW (both compared in uppercase):
case
when
(UPPER(PN_NEW) = UPPER(SUBSTR(HWC_NEW,1,LENGTH(PN_NEW))) AND UPPER(PN_OLD) = UPPER(SUBSTR(HWC_OLD,1,LENGTH(PN_OLD)))) THEN 'YES' else 'NO' end as MATCH
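As a quick way to reason about this rule outside the database, here is a small sketch of the same comparison in Python. The function names is_prefix_match and match_flag are hypothetical. Note that the SUBSTR(..., 1, LENGTH(...)) form compares against the leading characters only, so it is a prefix test. Empty or missing values are treated as non-matching, on the assumption that this mirrors Oracle, where an empty string is NULL.

```python
def is_prefix_match(pn, hwc):
    """Mirror UPPER(pn) = UPPER(SUBSTR(hwc, 1, LENGTH(pn))).

    None or empty values never match (assumed to mirror Oracle NULL handling).
    """
    if not pn or not hwc:
        return False
    return hwc.upper().startswith(pn.upper())


def match_flag(pn_old, pn_new, hwc_old, hwc_new):
    """Return 'YES' only when both the OLD and the NEW part numbers match."""
    both = is_prefix_match(pn_new, hwc_new) and is_prefix_match(pn_old, hwc_old)
    return "YES" if both else "NO"


# PN is a prefix of HWC on both sides.
print(match_flag("3AL79090BA", "3AL79090BA", "3AL79090BAAS", "3AL79090BAAS"))
# PN_OLD is longer than HWC_OLD, so it cannot be a prefix of it.
print(match_flag("3AL79090BAAS04", "3AL79090BA", "3AL79090BAAS", "3AL79090BAAS"))
```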
In the desired result table, the matching ID_SOST-PR_ID pairs have to be listed first, and a pair that has matched must not be considered for any other match within the same ID_TK (the best option is the exact match). The remaining non-matching pairs have to be listed after the matching ones, with the closest similarity first. The difficulty is that I don't want to show the cartesian product of the non-matching substitutions (PRs and ID_SOSTs). Also, if the number of PRs (or SOSTs) differs, NULL fills the missing fields.
So first, for future reference, here's how to give example data in a question:
create table table1 (id_tk varchar2(20), id_sost varchar2(20), hwc_old varchar2(20), hwc_new varchar2(20));
create table table2 (id_tk varchar2(20), pr_id varchar2(20), pn_old varchar2(20), pn_new varchar2(20));
insert into table1 values ('TK0000001296676', '00000000199412', '3AL80407AA', '3AL80407AA');
insert into table1 values ('TK0000001296676', '00000000199413', '3AL79090BAAS', '3AL79090BAAS');
insert into table2 values ('TK0000001296676', 'pr-20191025-008', '3AL79090BAAS04', '3AL79090BA');
insert into table2 values ('TK0000001296676', 'pr-20191115-009', '3AL79090BA', '3AL79090BA');
insert into table2 values ('TK0000001296676', 'pr-20191115-011', '3AL79090BAAS04', '3AL79090BA');
It's very helpful - please don't make us re-type text that you already have. Just copy/pasting the data from Excel is fine too.
For the actual query - we're doing a full outer join, and only including the rows which either match, or have nulls instead of a match. We could also have written this as a full outer join, excluding the cartesian product rows which don't match.
select nvl(table1.id_tk, table2.id_tk) as id_tk, id_sost, hwc_old, hwc_new, pr_id, pn_old, pn_new,
case when (UPPER(PN_NEW) = UPPER(SUBSTR(HWC_NEW,1,LENGTH(PN_NEW))) AND UPPER(PN_OLD) = UPPER(SUBSTR(HWC_OLD,1,LENGTH(PN_OLD)))) THEN 'YES' else 'NO' end as MATCH
from table1
full outer join table2
on (table1.id_tk = table2.id_tk -- display matches if they exist
and UPPER(PN_NEW) = UPPER(SUBSTR(HWC_NEW,1,LENGTH(PN_NEW)))
and UPPER(PN_OLD) = UPPER(SUBSTR(HWC_OLD,1,LENGTH(PN_OLD)))
) OR ( -- also include rows with no matches
(id_sost is null or pr_id is null))
order by MATCH desc -- show matches first
;
Output - my output is a little different from yours, because I don't think you really want to have non-matching ID_SOST - PR_ID data in the same row. It's hard to do in SQL, and I think this way is more clear that the table1/table2 data in the row is not a match with each other.
ID_TK ID_SOST HWC_OLD HWC_NEW PR_ID PN_OLD PN_NEW MATCH
TK0000001296676 199413 3AL79090BAAS 3AL79090BAAS pr-20191115-009 3AL79090BA 3AL79090BA YES
TK0000001296676 null null null pr-20191115-011 3AL79090BAAS04 3AL79090BA NO
TK0000001296676 null null null pr-20191025-008 3AL79090BAAS04 3AL79090BA NO
TK0000001296676 199412 3AL80407AA 3AL80407AA null null null NO

SQL Query Schema and Data into One Row from 2 Databases

We have data that was merged by accident in our production site, but we still have the data separated in our test site/database. I'd like to be able to query customer data from both databases to compare them based on a customer's uniqueidentifier. What I'd like to see in the query results is the schema tables, columns, and primary uniqueidentifier keys of tables where the table has a matching foreign column containing the customer's key.
For instance, if the customer has an invoice the customer cst_key would be in the inv_cst_key of the ac_invoice. I need the primary key of that table, which would come from the inv_key column of that row. So if the customer had two invoices, two inv_key's would list as separate rows.
I installed ApexSQL Search to search the database uniqueidentifier columns for the customer key, and it provided the tables and foreign columns where the customer's cst_key exists (e.g. inv_cst_key), but I still need the primary key (e.g. inv_key) of the table row in which the customer key resides. I tried using those search results to build something with Excel (piecing together tables, operators, columns, etc.) and copy/paste it into SSMS, but the query pulls millions of results the way it's set up...
DECLARE @cstkey uniqueidentifier
SET @cstkey = 'xxxxxxxxxxxxxxxxxx'
SELECT cst_key,
inv_key,
-- many more columns
FROM co_customer
LEFT JOIN ac_invoice ON cst_key = inv_cst_key
-- more LEFT JOINS
WHERE cst_key = @cstkey
Also, I know how to query data from 2 databases, but I don't know how to query so that I can see table and column names in a row next to the column's data.
DECLARE @cstkey uniqueidentifier
SET @cstkey = 'xxxxxxxxxxxxxxxxxx'
SELECT cst1.cst_key AS cst_key_1, inv1.inv_key AS inv_key_1,
cst2.cst_key AS cst_key_2, inv2.inv_key AS inv_key_2
FROM db1name.dbo.co_customer cst1
LEFT JOIN db1name.dbo.ac_invoice inv1 (NOLOCK) ON inv1.inv_cst_key = cst1.cst_key
INNER JOIN db2name.dbo.co_customer cst2 (NOLOCK) ON cst1.cst_key = cst2.cst_key
INNER JOIN db2name.dbo.ac_invoice inv2 (NOLOCK) ON inv2.inv_cst_key = cst2.cst_key
WHERE cst1.cst_key = @cstkey
AND cst2.cst_key = @cstkey
I'd like the results to look something like this...
DB1 | T1 | PC1 | PC1 Data Key | FC1 | FC1 Data Key || DB2 | T2 | ...
--------------------------------------------------------------------
DB = Database, T= Table, PC = Primary Column, FC = Foreign Column
Btw, the FC1 Data Key would also be the cst_key as mentioned above.
Thanks in advance for any assistance.

Update on custom relation: is primary key required?

UPDATE (
SELECT
o.order_id,
o.shipping_from
FROM
orders o,
items i
WHERE
o.item_id = i.item_id
AND o.shipping_from = 'foot'
AND i.type = 'ent'
) t
SET
t.shipping_from = 'car';
The inner SELECT query returns 2 rows from orders. The whole query works as expected. o.order_id and i.item_id are primary keys, o.item_id is a foreign key, and the other column names don't match.
When I run an update this way, is it required to include a primary key in the relation I want to update? Why? If not, how would the DBMS know which table a row is located in? Sure, items doesn't have a shipping_from field, so it's not ambiguous which row I select, but what if it had?
Some data examples:
SELECT * FROM items WHERE type = 'ent';
ITEM_ID ITEM_SERIAL_CODE NAME BRAND TYPE DAILY_COST PURCHASE_DAT
---------- -------------------- -------------------- -------------------- ------------ ---------- ------------
1007 DC00755250 Dragon costume Branded ent 19000 14-DEC. -15
1010 SS01003632 Serpentine streamer Chinese ent 132500 10-MÁRC. -03
SELECT * FROM orders WHERE shipping_from = 'foot';
ORDER_ID ITEM_ID EVENT_ID LIABLE_PERSON SHIPPING_T SHIPPING_F ORDER_COMMENT
---------- ---------- ---------- --------------- ---------- ---------- -----------------------------------
3011 1006 2010 Géza Nagy car foot It will be a great party.
3018 1009 2011 Ferenc Nagy boat foot Multiple celebs expected.
3019 1010 2011 Ferenc Balázs bus foot Changing weather, changing seasons.
3020 1010 2012 Béci Patkó boat foot Bring the stuff to the first floor.
Is it required to include a primary key in the relation I want to
update? Why?
Yes
The answer is in the documentation: 24.1.5 DML Statements and Join Views
Updating a Join View
An updatable join view (also referred to as a modifiable join view) is a view that contains multiple tables in the
top-level FROM clause of the SELECT statement, and is not restricted
by the WITH READ ONLY clause.
The rules for updatable join views are shown in the following table.
Views that meet these criteria are said to be inherently updatable.
General Rule: Any INSERT, UPDATE, or DELETE operation on a join view can modify only one underlying base table at a time.
UPDATE Rule: All updatable columns of a join view must map to columns of a key-preserved table. See "Key-Preserved Tables" for a discussion of key-preserved tables. If the view is defined with the WITH CHECK OPTION clause, then all join columns and all columns of repeated tables are not updatable.
.......................
In the context of the above documentation, a subquery in the UPDATE statement (UPDATE ( subquery ) SET ... ) is treated as the view, that is like UPDATE the_view SET ... - because any view is nothing but a (sub)query.
The answer to what I wanted to know is no. The primary key is not required in the projection list; the UPDATE's effect is the same even if o.order_id is not selected in the sub-query. However, this doesn't mean the primary key is unused, thanks to the declarative technique. Whether the primary key is included or excluded, the same execution plan is created to fetch the records which will be updated.
The projection doesn't change much here; the DBMS can identify each row after the hash join operation, see krokodilko's answer.
If you don't select anything from the table (for example, if you type SELECT 1 FROM ...), you can't execute the update command: the sub-query's result is just a normal set of data that can't be mapped to a table.
I think we should always include the primary key to make it clear to the reader how the sub-query's result is used and which table we plan to update.
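For comparison, here is a minimal sketch of the same update written as a plain UPDATE with an IN sub-query, which names the target table explicitly. It is a different technique from the inline-view form above, not a reproduction of it. It runs on Python's sqlite3 module as a stand-in (an assumption; SQLite does not accept UPDATE on an inline view). The order and item ids come from the sample data, while the 'other' type for items 1006 and 1009 is made up, since the question does not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     item_id INTEGER REFERENCES items(item_id),
                     shipping_from TEXT);
-- Types of 1006 and 1009 are placeholders, not taken from the question.
INSERT INTO items VALUES (1006, 'other'), (1007, 'ent'),
                         (1009, 'other'), (1010, 'ent');
INSERT INTO orders VALUES (3011, 1006, 'foot'), (3018, 1009, 'foot'),
                          (3019, 1010, 'foot'), (3020, 1010, 'foot');
""")

# The target table is named directly; the join becomes a filter on item_id.
cur = conn.execute("""
    UPDATE orders
    SET shipping_from = 'car'
    WHERE shipping_from = 'foot'
      AND item_id IN (SELECT item_id FROM items WHERE type = 'ent')
""")
updated = cur.rowcount  # number of rows changed by the UPDATE

remaining_foot = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders WHERE shipping_from = 'foot' ORDER BY order_id")]
print(updated, remaining_foot)
```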

INSERT INTO from two tables into one empty table with foreign key constraints

I have a table called PurchaseOrderAccount that is empty. I need to insert the AccountCode from a table called DailyCosts. I also need to insert a PurchaseOrderID from PurchaseOrder.ID. Both are foreign key constraints. No column accepts NULL. This is what I have:
Insert Into PurchaseOrderAccount (WellID, JobID,ID,PurchaseOrderID,AccountCode)
Select DailyCosts.WellID,
DailyCosts.JobID,
NEWID(),
PurchaseOrder.ID,
DailyCosts.AccountCode
From DailyCosts
inner join
PurchaseOrder
on DailyCosts.Notes =PurchaseOrder.PONumber
Join
PurchaseOrderDailyCost
On DailyCosts.DailyCostID = PurchaseOrderDailyCost.DailyCostID
Where DailyCosts.WellID = '24A-23'
Group By DailyCosts.WellID,
DailyCosts.JobID,
PurchaseOrder.ID,
DailyCosts.AccountCode;
With this, I get 191 records. I only want unique AccountCodes from DailyCosts which are 54. I would appreciate any direction.
I feel that you would have to remove DailyCosts.JobID in the 'Group By' if you want all unique combinations of DailyCosts.AccountCode and PurchaseOrder.ID.
As you have filtered DailyCosts.WellID = '24A-23', a group by should not affect it.
Insert Into PurchaseOrderAccount (WellID, ID,PurchaseOrderID,AccountCode)
Select DailyCosts.WellID,
NEWID(),
PurchaseOrder.ID,
DailyCosts.AccountCode
From DailyCosts
inner join
PurchaseOrder
on DailyCosts.Notes =PurchaseOrder.PONumber
Join
PurchaseOrderDailyCost
On DailyCosts.DailyCostID = PurchaseOrderDailyCost.DailyCostID
Where DailyCosts.WellID = '24A-23'
Group By DailyCosts.WellID,
PurchaseOrder.ID,
DailyCosts.AccountCode;
Here, I believe there are multiple JobID values, which brings your total to 191 instead of the expected 54. Please let me know if this works for you.
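To see why an extra GROUP BY column inflates the row count, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption). The five sample rows and the group_count helper are made up for illustration and only use the column names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DailyCosts (WellID TEXT, JobID INTEGER, AccountCode TEXT);
-- Two account codes spread over several jobs (made-up data).
INSERT INTO DailyCosts VALUES ('24A-23', 1, 'A100'), ('24A-23', 2, 'A100'),
                              ('24A-23', 1, 'B200'), ('24A-23', 3, 'B200'),
                              ('24A-23', 3, 'B200');
""")

def group_count(columns):
    """Count the groups produced by GROUP BY over the given column list."""
    sql = f"SELECT COUNT(*) FROM (SELECT 1 FROM DailyCosts GROUP BY {columns})"
    return conn.execute(sql).fetchone()[0]

with_job = group_count("WellID, JobID, AccountCode")  # one group per job and code
without_job = group_count("WellID, AccountCode")      # one group per code

print(with_job, without_job)
```

With JobID in the grouping, each account code appears once per job; without it, each account code appears once.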