Help with a complex join query

Keep in mind I am using SQL 2000
I have two tables.
tblAutoPolicyList contains a field called PolicyIDList.
tblLossClaims contains two fields called LossPolicyID & PolicyReview.
I am writing a stored proc that will get the distinct PolicyID from PolicyIDList field, and loop through LossPolicyID field (if match is found, set PolicyReview to 'Y').
Sample table layout:
PolicyIDList | LossPolicyID | PolicyReview
-----------------
9651XVB19 | 5021WWA85, 4421WWA20, 3314WWA31, 1121WAW11, 2221WLL99 | Y
5021WWA85 | 3326WAC35, 1221AXA10, 9863AAA44, 5541RTY33, 9651XVB19 | Y
0151ZVB19 | 4004WMN63, 1001WGA42, 8587ABA56, 8541RWW12, 9329KKB08 | N
How would I go about writing the stored proc (looking for logic more than syntax)?
Keep in mind I am using SQL 2000.

Select LossPolicyID, * from tableName where charindex('PolicyID',LossPolicyID,1)>0

Basically, the idea is this:
'Unroll' tblLossClaims and return two columns: a tblLossClaims key (you didn't mention any, so I guess it's going to be LossPolicyID) and Item = a single item from LossPolicyID.
Find matches of unrolled.Item in tblAutoPolicyList.PolicyIDList.
Find matches of distinct matched.LossPolicyID in tblLossClaims.LossPolicyID.
Update tblLossClaims.PolicyReview accordingly.
The main UPDATE can look like this:
UPDATE claims
SET PolicyReview = 'Y'
FROM tblLossClaims claims
JOIN (
SELECT DISTINCT unrolled.LossPolicyID
FROM (
SELECT LossPolicyID, Item = itemof(LossPolicyID)
FROM unrolling_join
) unrolled
JOIN tblAutoPolicyList
ON unrolled.Item = tblAutoPolicyList.PolicyIDList
) matched
ON matched.LossPolicyID = claims.LossPolicyID
You can take advantage of the fixed item width and the fixed list format and thus easily split LossPolicyID without a UDF. I can see this done with the help of a number table and SUBSTRING(). unrolling_join in the above query is actually tblLossClaims joined with the number table.
Here's the definition of unrolled 'zoomed in':
...
(
SELECT LossPolicyID,
Item = SUBSTRING(LossPolicyID,
(v.number - 1) * (@ItemLength + 2) + 1,
@ItemLength)
FROM tblLossClaims c
JOIN master..spt_values v ON v.type = 'P'
AND v.number BETWEEN 1 AND (LEN(c.LossPolicyID) + 2) / (@ItemLength + 2)
) unrolled
...
master..spt_values is a system table that is used here as the number table. Filter v.type = 'P' gives us a rowset with number values from 0 to 2047, which is narrowed down to the list of numbers from 1 to the number of items in LossPolicyID. Eventually v.number serves as an array index and is used to cut out single items. Each item occupies @ItemLength characters plus the two-character ', ' separator, which is why both the start offset and the item count use @ItemLength + 2.
@ItemLength is of course simply LEN(tblAutoPolicyList.PolicyIDList). I would probably also declare @ItemLength2 = @ItemLength + 2 so that it isn't recalculated every time the filter is applied.
Basically, that's it, if I haven't missed anything.
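To make the unrolling step concrete, here is a minimal sketch in Python using the standard sqlite3 module rather than SQL Server 2000. The numbers table stands in for master..spt_values, SUBSTR and LENGTH stand in for SUBSTRING and LEN, and the 9-character item width with a ', ' separator is an assumption taken from the sample data.

```python
import sqlite3

ITEM_LEN = 9           # assumed fixed width of one policy ID (from the sample data)
STRIDE = ITEM_LEN + 2  # each item is followed by the two-character ", " separator

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblAutoPolicyList (PolicyIDList TEXT);
CREATE TABLE tblLossClaims (LossPolicyID TEXT, PolicyReview TEXT DEFAULT 'N');
CREATE TABLE numbers (n INTEGER PRIMARY KEY);  -- stand-in for master..spt_values
""")
conn.executemany("INSERT INTO tblAutoPolicyList VALUES (?)",
                 [("9651XVB19",), ("5021WWA85",), ("0151ZVB19",)])
conn.executemany("INSERT INTO tblLossClaims (LossPolicyID) VALUES (?)", [
    ("5021WWA85, 4421WWA20, 3314WWA31, 1121WAW11, 2221WLL99",),
    ("3326WAC35, 1221AXA10, 9863AAA44, 5541RTY33, 9651XVB19",),
    ("4004WMN63, 1001WGA42, 8587ABA56, 8541RWW12, 9329KKB08",),
])
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 101)])

# Unroll each list into single items with the numbers table, match the items
# against tblAutoPolicyList, and flag every claim row that has a match.
conn.execute("""
UPDATE tblLossClaims
SET PolicyReview = 'Y'
WHERE LossPolicyID IN (
    SELECT c.LossPolicyID
    FROM tblLossClaims c
    JOIN numbers v
      ON v.n BETWEEN 1 AND (LENGTH(c.LossPolicyID) + 2) / :stride
    JOIN tblAutoPolicyList p
      ON p.PolicyIDList = SUBSTR(c.LossPolicyID, (v.n - 1) * :stride + 1, :item_len)
)
""", {"stride": STRIDE, "item_len": ITEM_LEN})

reviews = [r[0] for r in conn.execute(
    "SELECT PolicyReview FROM tblLossClaims ORDER BY rowid")]
print(reviews)
```

On the sample rows this flags the first two claims and leaves the third at 'N'.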

If the PolicyIDList field is a delimited list, you first have to separate the individual policy IDs and create a temporary table with all of the results. Then run an update query on tblLossClaims with 'where exists (select * from #temptable tt where tt.PolicyID = LossPolicyID)'.
Depending on the size of the table/data, you might wish to add an index to your temporary table.

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings between the two tables, although a NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table, with a reference to a corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe that @Caius Jard is correct in that this does not need to be an outer join. There should always be an ST/OT pair.
With that I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
Right now, I've simplified the query to keep variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So, there are matching records between the two tables that meet the criteria, but I think I'm getting no results because when I convert the numeric to a string, it does not match, because one value is, say "12345", while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
Was able to finally get the query to work by converting the numeric doc num to a left zero padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )
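For illustration only, the padding expression in Update 4 can be mimicked outside the database. This small Python sketch (the function name is hypothetical) shows why the padded numeric value lines up with the zero-prefixed string column:

```python
def zero_pad_doc_num(doc_num: int, width: int = 8) -> str:
    """Mimic RIGHT(CONCAT('00000000', CAST(n AS VARCHAR(8))), 8)."""
    # Prepend `width` zeros, then keep only the rightmost `width` characters.
    return ("0" * width + str(doc_num))[-width:]

padded = zero_pad_doc_num(12345)
print(padded)
```

A numeric 12345 becomes the same 8-character string that the related-document column stores, so the equality check in the join can succeed.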

You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
A LEFT JOIN is a type of OUTER join (LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (a LEFT JOIN followed by WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process are then quashed by the WHERE clause.
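A small sketch of this effect, using Python's sqlite3 module and made-up rows (the data and the cut-down table layout are assumptions for illustration, not the real F4211/F4311 tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (DOC_NUM INTEGER, DOC_TYPE TEXT, RELATED_DOC_NUM INTEGER, NEXT_STATUS INTEGER);
CREATE TABLE B (DOC_NUM INTEGER, DOC_TYPE TEXT, RELATED_DOC_NUM INTEGER, NEXT_STATUS INTEGER);
INSERT INTO A VALUES (1, 'ST', 10, 500), (2, 'ST', 99, 500);  -- doc 2 has no partner in B
INSERT INTO B VALUES (10, 'OT', 1, 999);
""")

# LEFT JOIN, but the WHERE clause filters on a column of the right-hand table.
left_where = conn.execute("""
SELECT A.DOC_NUM, B.DOC_NUM
FROM A LEFT JOIN B ON A.RELATED_DOC_NUM = B.DOC_NUM
WHERE A.DOC_TYPE = 'ST' AND B.DOC_TYPE = 'OT'
ORDER BY A.DOC_NUM
""").fetchall()

# The same query written as an INNER JOIN.
inner = conn.execute("""
SELECT A.DOC_NUM, B.DOC_NUM
FROM A INNER JOIN B ON A.RELATED_DOC_NUM = B.DOC_NUM
WHERE A.DOC_TYPE = 'ST' AND B.DOC_TYPE = 'OT'
ORDER BY A.DOC_NUM
""").fetchall()

# LEFT JOIN with the right-table condition moved into ON keeps unmatched rows.
left_on = conn.execute("""
SELECT A.DOC_NUM, B.DOC_NUM
FROM A LEFT JOIN B ON A.RELATED_DOC_NUM = B.DOC_NUM AND B.DOC_TYPE = 'OT'
WHERE A.DOC_TYPE = 'ST'
ORDER BY A.DOC_NUM
""").fetchall()

print(left_where, inner, left_on)
```

The first two queries return the same single pair. Only the third keeps the unmatched document, with NULL (None in Python) on the B side.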

See Update 4 for details of how I resolved the "data conversion or mapping" error.

Completely Unique Rows and Columns in SQL

I want to randomly pick 4 rows which are distinct and do not have any entry that matches with any of the 4 chosen columns.
Here is what I coded:
SELECT DISTINCT en,dialect,fr FROM words ORDER BY RANDOM() LIMIT 4
Here is some data:
**en** **dialect** **fr**
number SFA numero
number TRI numero
hotel CAI hotel
hotel SFA hotel
I want:
**en** **dialect** **fr**
number SFA numero
hotel CAI hotel
Some retrieved rows have something in common with each other, like the same en or the same fr. I would like to retrieve rows that do not share any value with each other. How do I do that?

I think I’d do this in the front end code rather than the DB; here’s some pseudo code (don’t know what your node looks like):
var seenEn = "en not in (''";
var seenFr = "fr not in (''";
var rows = [];
while (rows.length < 4)
{
var newrow = sqlquery("SELECT * FROM table WHERE " + seenEn + ") and "
+ seenFr + ") ORDER BY random() LIMIT 1");
if (!newrow)
break;
rows.push(newrow);
seenEn += ",'" + newrow.en + "'";
seenFr += ",'" + newrow.fr + "'";
}
The loop runs as many times as needed to retrieve 4 rows (or maybe make it a for loop that runs 4 times) unless the query returns null. Each time the query returns a row, its values are added to a list of values we don’t want the query to return again. That list has to start out with a dummy value (the empty string '') that is never in the data, to prevent a syntax error when concatenating a comma-value string onto the seenXX variable. Those syntax errors can be avoided in other ways, like having a Boolean of “if it’s the first value don’t put the comma”, but I chose to put dummy ineffective values into the SQL to make the JS simpler. Same goes for the
As noted, it looks like JS to ease your understanding but this should be treated as pseudo code outlining a general algorithm - it’s never been compiled/run/tested and may have syntax errors or not at all work as JS if pasted into your file; take the idea and work it into your solution
Please note this was posted from an iphone and it may have done something stupid with all the apostrophes and quotes (turned them into the curly kind preferred by writers rather than the straight kind used by programmers)
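The same loop can be sketched in runnable form. This version uses Python with sqlite3 and bound parameters instead of string concatenation; the function name pick_disjoint_rows is hypothetical, and it relies on SQLite accepting an empty NOT IN () list, which most other engines reject (there the dummy-value trick above would still be needed).

```python
import sqlite3

def pick_disjoint_rows(conn, limit=4):
    """Pick up to `limit` random rows that share no en or fr value."""
    rows, seen_en, seen_fr = [], [], []
    while len(rows) < limit:
        # One "?" placeholder per value that has already been used.
        en_marks = ",".join("?" * len(seen_en))
        fr_marks = ",".join("?" * len(seen_fr))
        row = conn.execute(
            "SELECT en, dialect, fr FROM words "
            f"WHERE en NOT IN ({en_marks}) AND fr NOT IN ({fr_marks}) "
            "ORDER BY RANDOM() LIMIT 1",
            seen_en + seen_fr,
        ).fetchone()
        if row is None:  # nothing left that avoids the seen values
            break
        rows.append(row)
        seen_en.append(row[0])
        seen_fr.append(row[2])
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (en TEXT, dialect TEXT, fr TEXT)")
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", [
    ("number", "SFA", "numero"),
    ("number", "TRI", "numero"),
    ("hotel", "CAI", "hotel"),
    ("hotel", "SFA", "hotel"),
])

picked = pick_disjoint_rows(conn)
print(picked)
```

With the four sample rows only two rows can be returned, one for number and one for hotel; which dialect is picked varies from run to run.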

You can use ROW_NUMBER() (or RANK) to find the first row of each group and achieve your result.
See the code below; I hope it helps.
SELECT 'number' AS Col1, 'SFA' AS Col2, 'numero' AS Col3 INTO #tbl
UNION ALL
SELECT 'number','TRI','numero'
UNION ALL
SELECT 'hotel','CAI' ,'hotel'
UNION ALL
SELECT 'hotel','SFA','hotel'
UNION ALL
SELECT 'Location','LocationA' ,'Location data'
UNION ALL
SELECT 'Location','LocationB','Location data'
;
WITH summary AS (
SELECT Col1,Col2,Col3,
ROW_NUMBER() OVER(PARTITION BY p.Col1 ORDER BY p.Col2 DESC) AS rk
FROM #tbl p)
SELECT s.Col1,s.Col2,s.Col3
FROM summary s
WHERE s.rk = 1
DROP TABLE #tbl

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes going back 20+ years. And the data science guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts? The database is Postgres but any SQL logic should help.

I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
select
* -- simplifying here to show the important parts
,row_number() over (
partition by d.master_quote_number, d.quote_version_number, d.quote_line_number
order by f.maintenance_months desc) as seqnum
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() and the chosen partition by and order by criteria guarantee that only ONE row for each combination of quote number, version number and line number will get the value of 1, and it will be the one with the highest value in maintenance_months (if your colleagues are right, there would only be one with a value > 0 anyway). Rows that have no duplicate also get seqnum = 1, so they are kept even when their maintenance is 0.
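A self-contained sketch of the dedupe, run against a single made-up quote_lines table with Python's sqlite3 module instead of the real Postgres tables (window functions need SQLite 3.25 or later). Here the partition is quote number, version number and line number, which is the duplicate key described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote_lines (
    master_quote_number INTEGER,
    quote_version_number INTEGER,
    quote_line_number INTEGER,
    maintenance_months INTEGER
);
INSERT INTO quote_lines VALUES
    (100, 1, 1, 0),   -- duplicate of the next row, without maintenance
    (100, 1, 1, 12),  -- duplicate with maintenance: the one to keep
    (100, 1, 2, 0),   -- no duplicate, 0 maintenance: must survive
    (200, 1, 1, 0);   -- no duplicate, 0 maintenance: must survive
""")

# Number the rows inside each (quote, version, line) group, highest
# maintenance first, and keep only the first row of every group.
deduped = conn.execute("""
SELECT master_quote_number, quote_version_number,
       quote_line_number, maintenance_months
FROM (
    SELECT q.*,
           ROW_NUMBER() OVER (
               PARTITION BY master_quote_number, quote_version_number, quote_line_number
               ORDER BY maintenance_months DESC) AS seqnum
    FROM quote_lines q
) x
WHERE seqnum = 1
ORDER BY master_quote_number, quote_version_number, quote_line_number
""").fetchall()

print(deduped)
```

The duplicated line keeps its 12-month row, while the two unduplicated lines with 0 maintenance are untouched.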

Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance) as maintenance
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!

I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?

Update 1 field in a table from another field in a different table (OS400, not a 1 to 1 relationship)

I'm trying to update a field in a table from another field in a different table.
The table being updated will have multiple records that need updating from 1 match in the other table.
For example, I have a 1 million row sales history file. Those million records have approximately 40,000 different SKU codes, and each row has a date and time stamp. Each SKU will have multiple records in there.
I added a new field called MATCOST (material cost).
I have a second table containing SKU and the MATCOST.
So I want to stamp every line in table 1 with the corresponding SKU's MATCOST in table 2. I cannot seem to achieve this when it's not a 1 to 1 relationship.
This is what i have tried:
update
aulsprx3/cogtest2
set
matcost = (select Matcost from queryfiles/coskitscog where
aulsprx3/cogtest2.item99 = queryfiles/coskitscog.ITEM )
where
aulsprx3/cogtest2.item99=queryfiles/coskitscog.ITEM
But that results in the SQL error: Column qualifier or table COSKITSCOG undefined, highlighting the q in the last reference to queryfiles/coskitscog.ITEM
Any ideas?
Kindest Regards
Adam
Update: This is what my tables look like in principle. One table contains the sales data, the other contains the MATCOSTs for the items that were sold. I need to update the sales data table (COGTEST2) with the data from the COSKITSCOG table. I cannot use a coalesce statement because it's not a 1 to 1 relationship, and most select functions I use result in the error of multiple selects. The only matching field is Item=Item99
I can't find a way of matching multiples. In the example we would have to use 3 SQL statements and just specify the item code. But in live I have about 40,000 item codes and over a million sales data records to update. If SQL won't do it, I suppose I'd have to try to write it in an RPG program, but that's way beyond me for the moment.
Thanks for any help you can provide.

Ok this is the final SQL statement that worked. (there were actually 3 values to update)
UPDATE atst2f2/SAP20 ct
SET VAL520 = (SELECT cs.MATCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20),
VAL620 = (SELECT cs.LABCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20),
VAL720 = (SELECT cs.OVRCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20)
WHERE ct.pnum20 IN (SELECT cs.ITEM
FROM queryfiles/coskitscog cs)

This more compact way to do the same thing should be more efficient, eh?
UPDATE atst2f2/SAP20 ct
SET (VAL520, VAL620, VAL720) =
(SELECT cs.MATCOST, cs.LABCOST, cs.OVRCOST
FROM queryfiles/coskitscog cs
WHERE cs.ITEM = ct.pnum20)
WHERE ct.pnum20 IN (SELECT cs.ITEM
FROM queryfiles/coskitscog cs)

Qualify the columns with correlation names.
UPDATE AULSPRX3/COGTEST2 A
SET A.matcost = (SELECT matcost
FROM QUERYFILES/COSKITSCOG B
WHERE A.item99 = B.item)
WHERE EXISTS(SELECT *
FROM QUERYFILES/COSKITSCOG C
WHERE A.item99 = C.item)
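The pattern above (a correlated subquery in SET, guarded by EXISTS) can be tried out on a toy copy of the two tables. This sketch uses Python's sqlite3 module, not DB2 for i, so library qualifiers such as AULSPRX3/ are dropped, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cogtest2 (item99 TEXT, matcost REAL);            -- sales history
CREATE TABLE coskitscog (item TEXT PRIMARY KEY, matcost REAL); -- one cost per SKU
INSERT INTO cogtest2 (item99) VALUES ('A1'), ('A1'), ('B2'), ('C3');
INSERT INTO coskitscog VALUES ('A1', 1.5), ('B2', 2.25);

-- Every sales row looks up the single cost row for its own SKU; the EXISTS
-- guard leaves rows without a cost entry (C3) untouched.
UPDATE cogtest2
SET matcost = (SELECT b.matcost FROM coskitscog b WHERE b.item = cogtest2.item99)
WHERE EXISTS (SELECT 1 FROM coskitscog c WHERE c.item = cogtest2.item99);
""")

stamped = conn.execute(
    "SELECT item99, matcost FROM cogtest2 ORDER BY rowid").fetchall()
print(stamped)
```

Both A1 rows receive the same cost, which is the many-to-one case from the question. The subquery must return at most one row per SKU, so the cost table needs a unique item column.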

From UPDATE, I'd suggest:
update
aulsprx3/cogtest2
set
(matcost) = (select Matcost from queryfiles/coskitscog where
aulsprx3/cogtest2.item99 = queryfiles/coskitscog.ITEM)
where
aulsprx3/cogtest2.item99=queryfiles/coskitscog.ITEM
Note the parentheses around matcost.

Selecting elements that don't exist

I am working on an application that has to assign numeric codes to elements. These codes are not consecutive, and my idea is not to insert them into the database until I have the related element, but I would like to find, in SQL, the unassigned codes, and I don't know how to do it.
Any ideas?
Thanks!!!
Edit 1
The table can be so simple:
code | element
-----------------
3 | three
7 | seven
2 | two
And I would like something like this: 1, 4, 5, 6. Without any other table.
Edit 2
Thanks for the feedback, your answers have been very helpful.

This will return NULL if a code is not assigned:
SELECT assigned_codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE codes.code = @code
This will return all non-assigned codes:
SELECT codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE assigned_codes.code IS NULL
There is no pure SQL way to do exactly the thing you want.
In Oracle, you can do the following:
SELECT lvl
FROM (
SELECT level AS lvl
FROM dual
CONNECT BY
level <=
(
SELECT MAX(code)
FROM elements
)
)
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
In PostgreSQL, you can do the following:
SELECT lvl
FROM generate_series(
1,
(
SELECT MAX(code)
FROM elements
)) lvl
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
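For engines without CONNECT BY or generate_series, a recursive common table expression can play the same role. The following sketch is an assumption on my part rather than part of the answer above; it runs the idea on the sample rows with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elements (code INTEGER PRIMARY KEY, element TEXT);
INSERT INTO elements VALUES (3, 'three'), (7, 'seven'), (2, 'two');
""")

# Generate 1..MAX(code) with a recursive CTE, then keep the numbers that
# have no matching row in elements.
missing = [r[0] for r in conn.execute("""
WITH RECURSIVE series(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM series WHERE n < (SELECT MAX(code) FROM elements)
)
SELECT series.n
FROM series
LEFT JOIN elements ON elements.code = series.n
WHERE elements.code IS NULL
ORDER BY series.n
""")]

print(missing)
```

On the sample table this yields 1, 4, 5 and 6, the list asked for in the question.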

Contrary to the assertion that this cannot be done using pure SQL, here is a counter example showing how it can be done. (Note that I didn't say it was easy - it is, however, possible.) Assume the table's name is value_list with columns code and value as shown in the edits (why does everyone forget to include the table name in the question?):
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code - 1)) AS t,
(SELECT l1.code + 1 AS bottom
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
AND NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code >= b.bottom AND l2.code <= t.top);
The two parallel queries in the from clause generate values that are respectively at the top and bottom of a gap in the range of values in the table. The cross-product of these two lists is then restricted so that the bottom is not greater than the top, and such that there is no value in the original list in between the bottom and top.
On the sample data, this produces the range 4-6. When I added an extra row (9, 'nine'), it also generated the range 8-8. Clearly, you also have two other possible ranges for a suitable definition of 'infinity':
-infinity .. MIN(code)-1
MAX(code)+1 .. +infinity
Note that:
If you are using this routinely, there will generally not be many gaps in your lists.
Gaps can only appear when you delete rows from the table (or you ignore the ranges returned by this query or its relatives when inserting data).
It is usually a bad idea to reuse identifiers, so in fact this effort is probably misguided.
However, if you want to do it, here is one way to do so.
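To check the gap query against the sample data, it can be run as-is through Python's sqlite3 module. The helper name find_gaps and the added ORDER BY are mine; the SQL is otherwise the query shown above.

```python
import sqlite3

GAP_QUERY = """
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
      FROM value_list l1
      WHERE NOT EXISTS (SELECT * FROM value_list l2
                        WHERE l2.code = l1.code - 1)) AS t,
     (SELECT l1.code + 1 AS bottom
      FROM value_list l1
      WHERE NOT EXISTS (SELECT * FROM value_list l2
                        WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
  AND NOT EXISTS (SELECT * FROM value_list l2
                  WHERE l2.code >= b.bottom AND l2.code <= t.top)
ORDER BY b.bottom
"""

def find_gaps(conn):
    """Return (bottom, top) pairs for each run of unused codes between rows."""
    return conn.execute(GAP_QUERY).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE value_list (code INTEGER PRIMARY KEY, value TEXT);
INSERT INTO value_list VALUES (3, 'three'), (7, 'seven'), (2, 'two');
""")

gaps_before = find_gaps(conn)
conn.execute("INSERT INTO value_list VALUES (9, 'nine')")
gaps_after = find_gaps(conn)
print(gaps_before, gaps_after)
```

The first call reports the range 4-6, and after adding (9, 'nine') the range 8-8 appears as well, matching the results described above.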

This is the same idea that Quassnoi published.
I just linked all the ideas together in T-SQL-like code.
DECLARE
@series table(n int)
DECLARE
@max_n int,
@i int
SET @i = 1
-- max value in elements table
SELECT
@max_n = (SELECT MAX(code) FROM elements)
-- fill @series table with numbers from 1 to max_n - 1 (max_n itself is assigned)
WHILE @i < @max_n BEGIN
INSERT INTO @series (n) VALUES (@i)
SET @i = @i + 1
END
-- unassigned codes -- these without pair in elements table
SELECT
n
FROM
@series AS series
LEFT JOIN
elements
ON
elements.code = series.n
WHERE
elements.code IS NULL
EDIT:
This is, of course, not an ideal solution. If you have a lot of elements or check for non-existing codes often, this could cause performance issues.
