SQL query for Crystal Reports produces duplicate results

I created three tables, TstInvoice, TstProd and TstPersons, and added some data:
TstInvoice
INVOICE_NBR CLIENT_NR VK_CONTACT
=========== ========= ==========
A10304 003145 AT
A10305 000079 EA
A10306 004458 AT
A10307 003331 JDJ
TstProd
PROD_NR INVOICE_NBR
======= ===========
P29366 A10304
P29367 A10304
P29368 A10305
P29369 A10306
P29370 A10306
P29371 A10307
TstPersons
PERS_NR INITIALEN STATUS PERSOON
======= ========= ====== =======
0001 AT 7 Alice Thompson
0002 EA 1 Edgar Allen
0003 JDJ 1 John Doe Joe
0004 AT 1 Arthur Twins
The parameter passed to the Crystal report is INVOICE_NBR.
On my Crystal report I placed some fields from the database tables and one SQL expression:
(
SELECT "TstPersons"."PERSOON" FROM "TstPersons"
WHERE "TstPersons"."INITIALEN" = "TstInvoice"."VK_CONTACT" AND "TstPersons"."STATUS" = 1
)
The full query that is generated:
SELECT "TstInvoice"."INVOICE_NBR", "TstInvoice"."CLIENT_NR", "TstPersons"."STATUS", "TstPersons"."PERSOON", "TstProd"."PROD_NR", "TstProd"."INVOICE_NBR", (
SELECT "TstPersons"."PERSOON" FROM "TstPersons"
WHERE "TstPersons"."INITIALEN" = "TstInvoice"."VK_CONTACT" AND "TstPersons"."STATUS" = 1
)
FROM ("GCCTEST"."dbo"."TstInvoice" "TstInvoice" INNER JOIN "GCCTEST"."dbo"."TstProd" "TstProd" ON "TstInvoice"."INVOICE_NBR"="TstProd"."INVOICE_NBR") INNER JOIN "GCCTEST"."dbo"."TstPersons" "TstPersons" ON "TstInvoice"."VK_CONTACT"="TstPersons"."INITIALEN"
WHERE "TstInvoice"."INVOICE_NBR"='A10304'
The result is as shown in the screenshot:
As you can see, the TstPersons.PERSOON field is populated with Alice Thompson and the SQL expression field is correctly populated with Arthur Twins. However, I would like to see each PROD_NR only once. This query returns every product number twice because of the double entry for "AT", even though I ask only for status 1. I could just delete the old entry, but I want to know whether it is possible this way.
EDIT: I added STATUS = 1 in the Record Selection Formula Editor and that seems to work. I don't need the SQL expression field at all. I'm not sure whether this is the correct approach, though.
So now it looks like this:
SELECT "TstInvoice"."INVOICE_NBR", "TstInvoice"."CLIENT_NR", "TstPersons"."STATUS", "TstPersons"."PERSOON", "TstProd"."PROD_NR", "TstProd"."INVOICE_NBR"
FROM ("GCCTEST"."dbo"."TstInvoice" "TstInvoice" INNER JOIN "GCCTEST"."dbo"."TstProd" "TstProd" ON "TstInvoice"."INVOICE_NBR"="TstProd"."INVOICE_NBR") INNER JOIN "GCCTEST"."dbo"."TstPersons" "TstPersons" ON "TstInvoice"."VK_CONTACT"="TstPersons"."INITIALEN"
WHERE "TstInvoice"."INVOICE_NBR"='A10304' AND "TstPersons"."STATUS"=1

Your query has a very weak join because of the duplicate values in the INITIALEN column. Filtering on STATUS = 1 is a workaround rather than a solution: if you ever need to report on an invoice whose contact has a status other than 1, you will have to change the report design, because STATUS is not stored on the invoice and so cannot take part in a proper join.
The workaround will also break down completely if two contacts ever share both the same initials and the same status.
The correct way to solve this problem is to join TstInvoice to TstPersons through a field that has unique values. The PERS_NR column appears to be a good choice for this.
This will also require redesigning the TstInvoice table to include the PERS_NR column as a foreign key.
A stronger join between invoices and persons would also remove the need for the subquery in your select statement. This simplifies your query to the following:
SELECT "TstInvoice"."INVOICE_NBR", "TstInvoice"."CLIENT_NR", "TstPersons"."STATUS", "TstPersons"."PERSOON", "TstProd"."PROD_NR", "TstProd"."INVOICE_NBR"
FROM "GCCTEST"."dbo"."TstInvoice" "TstInvoice"
INNER JOIN "GCCTEST"."dbo"."TstProd" "TstProd"
ON "TstInvoice"."INVOICE_NBR"="TstProd"."INVOICE_NBR"
INNER JOIN "GCCTEST"."dbo"."TstPersons" "TstPersons"
ON "TstInvoice"."PERS_NR"="TstPersons"."PERS_NR"
WHERE "TstInvoice"."INVOICE_NBR"='A10304'
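A minimal sketch of why the INITIALEN join doubles the rows and the PERS_NR join does not, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server. The rows are the sample data from the question; the PERS_NR column on TstInvoice is the foreign key the answer proposes, and the person each invoice points to is an illustrative assumption.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server tables in the question.
# PERS_NR on TstInvoice is the hypothetical foreign key the answer proposes;
# the PERS_NR values assigned to each invoice are illustrative guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TstPersons (PERS_NR TEXT PRIMARY KEY, INITIALEN TEXT,
                         STATUS INTEGER, PERSOON TEXT);
CREATE TABLE TstInvoice (INVOICE_NBR TEXT PRIMARY KEY, CLIENT_NR TEXT,
                         VK_CONTACT TEXT, PERS_NR TEXT);
CREATE TABLE TstProd (PROD_NR TEXT PRIMARY KEY, INVOICE_NBR TEXT);
INSERT INTO TstPersons VALUES
  ('0001', 'AT', 7, 'Alice Thompson'), ('0002', 'EA', 1, 'Edgar Allen'),
  ('0003', 'JDJ', 1, 'John Doe Joe'), ('0004', 'AT', 1, 'Arthur Twins');
INSERT INTO TstInvoice VALUES
  ('A10304', '003145', 'AT', '0004'), ('A10305', '000079', 'EA', '0002'),
  ('A10306', '004458', 'AT', '0004'), ('A10307', '003331', 'JDJ', '0003');
INSERT INTO TstProd VALUES
  ('P29366', 'A10304'), ('P29367', 'A10304'), ('P29368', 'A10305'),
  ('P29369', 'A10306'), ('P29370', 'A10306'), ('P29371', 'A10307');
""")

# Weak join on initials: both 'AT' persons match, so every product doubles.
weak = conn.execute("""
    SELECT p.PROD_NR, pe.PERSOON
    FROM TstInvoice i
    JOIN TstProd p ON i.INVOICE_NBR = p.INVOICE_NBR
    JOIN TstPersons pe ON i.VK_CONTACT = pe.INITIALEN
    WHERE i.INVOICE_NBR = 'A10304'
    ORDER BY p.PROD_NR, pe.PERSOON
""").fetchall()

# Join on the unique PERS_NR key: exactly one person per invoice.
strong = conn.execute("""
    SELECT p.PROD_NR, pe.PERSOON
    FROM TstInvoice i
    JOIN TstProd p ON i.INVOICE_NBR = p.INVOICE_NBR
    JOIN TstPersons pe ON i.PERS_NR = pe.PERS_NR
    WHERE i.INVOICE_NBR = 'A10304'
    ORDER BY p.PROD_NR
""").fetchall()

print(len(weak), len(strong))  # 4 2
```

The weak join returns each of the two products once per matching "AT" person; the key-based join needs no STATUS filter at all.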

Related

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings in the two tables, although NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table with a reference to the corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe @Caius Jard is correct that this does not need to be an outer join. There should always be an ST/OT pair.
With that in mind, I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
For now, I've simplified the query to keep the number of variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but I'm not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So there are matching records between the two tables that meet the criteria, but I think I'm getting no results because the converted values still don't match: one value is, say, "12345" while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
I was finally able to get the query to work by converting the numeric doc number to a left-zero-padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )
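A minimal sketch of the Update 4 padding trick, assuming SQLite (through Python's sqlite3 module) as a stand-in for DB2. The two rows are made-up values that mimic the column types described in Update 3. SQLite has no RIGHT() function, so substr(s, -8), which returns the last 8 characters, takes its place here.

```python
import sqlite3

# Minimal stand-ins for F4211/F4311 with only the join columns. As in the
# question, SDRORN is zero-padded text and PDDOCO is numeric. Values are
# made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE F4211 (SDDOCO INTEGER, SDRORN TEXT, SDDCTO TEXT, SDNXTR INTEGER);
CREATE TABLE F4311 (PDDOCO INTEGER, PDRORN TEXT, PDDCTO TEXT, PDNXTR INTEGER);
INSERT INTO F4211 VALUES (500, '00012345', 'ST', 999);
INSERT INTO F4311 VALUES (12345, '00000500', 'OT', 400);
""")

# Plain cast: '00012345' is compared with '12345', so nothing matches.
unpadded = conn.execute("""
    SELECT COUNT(*) FROM F4211
    JOIN F4311 ON F4211.SDRORN = CAST(F4311.PDDOCO AS TEXT)
    WHERE F4211.SDDCTO = 'ST' AND F4311.PDDCTO = 'OT'
""").fetchone()[0]

# Prepend 8 zeros, then keep the rightmost 8 characters. SQLite has no
# RIGHT(), so substr(s, -8) plays that role here.
padded = conn.execute("""
    SELECT COUNT(*) FROM F4211
    JOIN F4311
      ON F4211.SDRORN = substr('00000000' || CAST(F4311.PDDOCO AS TEXT), -8)
    WHERE F4211.SDDCTO = 'ST' AND F4311.PDDCTO = 'OT'
      AND (F4211.SDNXTR < 999 OR F4311.PDNXTR < 999)
""").fetchone()[0]

print(unpadded, padded)  # 0 1
```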

You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
LEFT JOIN is a type of OUTER join (LEFT JOIN is shorthand for LEFT OUTER JOIN). OUTER means "one side might have NULLs in every column because there was no match". Most critically, the code as posted in the question (a LEFT JOIN followed by WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process are then removed by the WHERE clause.
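A small sketch of that point, assuming SQLite (through Python's sqlite3 module) and two made-up rows: ST document 1 has a matching OT, ST document 2 has none. The LEFT JOIN alone keeps document 2 with NULLs, but once the WHERE clause tests B.DOC_TYPE, the result is the same as the INNER JOIN.

```python
import sqlite3

# Toy versions of tables A and B: ST 1 has a matching OT 10, ST 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (DOC_NUM INTEGER, DOC_TYPE TEXT, RELATED_DOC_NUM INTEGER, NEXT_STATUS INTEGER);
CREATE TABLE B (DOC_NUM INTEGER, DOC_TYPE TEXT, RELATED_DOC_NUM INTEGER, NEXT_STATUS INTEGER);
INSERT INTO A VALUES (1, 'ST', 10, 500), (2, 'ST', 20, 500);
INSERT INTO B VALUES (10, 'OT', 1, 999);
""")

FILTER = ("A.DOC_TYPE IN ('ST') AND B.DOC_TYPE IN ('OT') "
          "AND (A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)")

def run(join_type, where):
    sql = (f"SELECT A.DOC_NUM, B.DOC_NUM FROM A {join_type} B "
           f"ON A.RELATED_DOC_NUM = B.DOC_NUM WHERE {where} ORDER BY A.DOC_NUM")
    return conn.execute(sql).fetchall()

# LEFT JOIN on its own keeps ST 2 and fills B's columns with NULL ...
left_only = run("LEFT JOIN", "A.DOC_TYPE IN ('ST')")
# ... but testing B.DOC_TYPE in WHERE discards those NULL rows again,
left_filtered = run("LEFT JOIN", FILTER)
# so the result is identical to the INNER JOIN.
inner = run("INNER JOIN", FILTER)

print(left_only)      # [(1, 10), (2, None)]
print(left_filtered)  # [(1, 10)]
print(inner)          # [(1, 10)]
```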

See Update 4 for details of how I resolved the "data conversion or mapping" error.

How to make a query to obtain only results that have N number within a range of values?

I'm trying to extract nutrient data in MS Access 2007 from the USDA food database, freely available at http://www.ars.usda.gov/Services/docs.htm?docid=24912
I need records that have ALL of the nutrients whose NUT_DATA.Nutr_No values are between '501' and '511', and I want to exclude incomplete records that have missing values.
For example, Baby food banana has all nutrients from 501 to 511, but Baby food beverage has only 9 of them, and many other foods are like that.
As a last resort, I guess it would be acceptable to return all records, showing NULL for missing values, as long as each FOOD_DES.Long_Desc has exactly 11 records, one for each NUT_DATA.Nutr_No or NUTR_DEF.NutrDesc (which correspond to each other).
SELECT
FOOD_DES.NDB_No, FOOD_DES.FdGrp_Cd, FOOD_DES.Long_Desc, NUT_DATA.Nutr_No, NUTR_DEF.NutrDesc, NUT_DATA.Nutr_Val, WEIGHT.Amount, WEIGHT.Msre_Desc, WEIGHT.Gm_Wgt, [WEIGHT]![Amount] & " " & [WEIGHT]![Msre_Desc] AS msre
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE
(NUT_DATA.Nutr_No between '501' and '511' ) and ((WEIGHT.Seq)="1") and NUT_DATA.Nutr_Val > '0' and
// this part is me out of ideas trying stuff, but didn't help
EXISTS (SELECT 1
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE count FOOD_DES.Long_Desc = "11" )
//end wild of experimentation
ORDER BY FOOD_DES.Long_Desc, NUTR_DEF.SR_Order;
This is a sample of the data; I copied only the most important columns. The rows in red are not what I'm looking for, because they don't have all 11 nutrients. I can paste the whole table into the Google doc if someone thinks that would help.
https://docs.google.com/spreadsheets/d/1FghDD59wy2PYlpsqUlYVc3Ulwvy4MMLagpBUYtvLBfI/edit?usp=sharing

As your starting point, identify which food items have values > 0 for all 11 of those nutrients. Check whether this simpler GROUP BY query shows you the correct items:
SELECT ndat.NDB_No
FROM
NUT_DATA AS ndat
INNER JOIN WEIGHT AS wt
ON ndat.NDB_No = wt.NDB_No
WHERE
ndat.Nutr_Val>0
AND ndat.Nutr_No IN('501','502','503','504','505','506','507','508','509','510','511')
AND wt.Seq='1'
GROUP BY ndat.NDB_No
HAVING Count(ndat.Nutr_No)=11;
Note that you could use Val(ndat.Nutr_No) Between 501 And 511 as the Nutr_No restriction, which would make the statement more concise. However, because Val() must be evaluated for every row in the table, that approach forgoes the performance benefit of indexed retrieval, so that version of the query is likely to be noticeably slower.
Save that query and create a new query which joins it to the base tables for the additional data you need from other columns. Or use it as a subquery instead of a named query if you prefer.
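A minimal sketch of both steps, assuming SQLite (through Python's sqlite3 module) in place of Access; the food codes and nutrient values are hypothetical. One food gets all 11 nutrients and the other only 9, so the GROUP BY / HAVING query keeps just the first, and the detail query joins back to it as a subquery. In Access itself, the nested joins would need to be wrapped in parentheses.

```python
import sqlite3

NUTRIENTS = [str(n) for n in range(501, 512)]          # '501' .. '511'
IN_LIST = ", ".join(f"'{n}'" for n in NUTRIENTS)       # text, as in NUT_DATA

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FOOD_DES (NDB_No TEXT PRIMARY KEY, Long_Desc TEXT);
CREATE TABLE NUT_DATA (NDB_No TEXT, Nutr_No TEXT, Nutr_Val REAL);
CREATE TABLE WEIGHT (NDB_No TEXT, Seq TEXT, Amount REAL, Msre_Desc TEXT);
INSERT INTO FOOD_DES VALUES ('00001', 'Baby food banana'),
                            ('00002', 'Baby food beverage');
INSERT INTO WEIGHT VALUES ('00001', '1', 1, 'serving'),
                          ('00002', '1', 1, 'serving');
""")
# Hypothetical data: food 00001 gets all 11 nutrients, food 00002 only 9.
rows = [("00001", n, 1.0) for n in NUTRIENTS]
rows += [("00002", n, 1.0) for n in NUTRIENTS[:9]]
conn.executemany("INSERT INTO NUT_DATA VALUES (?, ?, ?)", rows)

# The answer's GROUP BY / HAVING query: foods with all 11 values > 0.
COMPLETE = f"""
    SELECT ndat.NDB_No
    FROM NUT_DATA AS ndat
    INNER JOIN WEIGHT AS wt ON ndat.NDB_No = wt.NDB_No
    WHERE ndat.Nutr_Val > 0
      AND ndat.Nutr_No IN ({IN_LIST})
      AND wt.Seq = '1'
    GROUP BY ndat.NDB_No
    HAVING COUNT(ndat.Nutr_No) = 11
"""
complete = [r[0] for r in conn.execute(COMPLETE)]

# Reuse it as a subquery and join back for the detail columns.
detail = conn.execute(f"""
    SELECT f.Long_Desc, n.Nutr_No, n.Nutr_Val
    FROM FOOD_DES AS f
    INNER JOIN NUT_DATA AS n ON f.NDB_No = n.NDB_No
    INNER JOIN ({COMPLETE}) AS c ON f.NDB_No = c.NDB_No
    WHERE n.Nutr_No IN ({IN_LIST})
    ORDER BY f.Long_Desc, n.Nutr_No
""").fetchall()

print(complete, len(detail))  # ['00001'] 11
```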

SQL using inner join

Here is my situation. I have three tables (tblEmployeesInfo, SQLSummRepMTC, sqlSumrepMTC15th) and I display their data using inner joins:
SELECT
SQLSummRepMTC."RepCompany", SQLSummRepMTC."WHTax",
SQLSummRepMTC."Company", SQLSummRepMTC."MonthName",
SQLSummRepMTC."YearVal", SQLSummRepMTC."Basis",
tblEmployeesInfo."LastName", tblEmployeesInfo."FirstName",
tblEmployeesInfo."Company", tblEmployeesInfo."MInitial",
tblEmployeesInfo."Division", sqlSumrepMTC15th."WHTax",
sqlSumrepMTC15th."Basis"
FROM
{
oj ("BIOMETRICS"."dbo"."SQLSummRepMTC" SQLSummRepMTC INNER JOIN
"BIOMETRICS"."dbo"."tblEmployeesInfo" tblEmployeesInfo
ON SQLSummRepMTC."EmployeeNo" = tblEmployeesInfo."EmployeeNo")
INNER JOIN "BIOMETRICS"."dbo"."sqlSumrepMTC15th" sqlSumrepMTC15th
ON tblEmployeesInfo."EmployeeNo" = sqlSumrepMTC15th."EmployeeNo"
}
ORDER BY
SQLSummRepMTC."Basis" ASC,
tblEmployeesInfo."Company" ASC,
tblEmployeesInfo."LastName" ASC
Let's say an employee has a record in SQLSummRepMTC with a tax value of 50 but does not exist in sqlSumrepMTC15th. My problem is that this record will not be displayed, because an inner join returns only rows that have a match in both tables. What I want is to display 0 when the record does not exist in the other table. This is what my report looks like:
Employeeno employeename 15th 30th
01 james 10 20
02 Chris NULL 50
The first record appears in the report because it exists in both tables; the second does not, because it has no 15th value (no row in sqlSumrepMTC15th). I need it to appear in the report even when one value is NULL or missing from the other table. Thanks in advance.

You have two joins. If there may be no record in sqlSumrepMTC15th, you need to replace the second join with a LEFT JOIN. If there may also be no matching record in tblEmployeesInfo, you need to replace both joins with LEFT JOIN.
You can then replace the NULL with 0 using a CASE expression:
SELECT CASE
WHEN attribute IS NULL
THEN 0
ELSE attribute
END AS resultColumnName,
nextAttribute
FROM ...
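Putting both suggestions together, here is a minimal sketch assuming SQLite (through Python's sqlite3 module) instead of SQL Server, with the tables cut down to the columns the example needs and rows copied from the report sample in the question. Only the join to sqlSumrepMTC15th becomes a LEFT JOIN, and CASE turns the missing 15th value into 0.

```python
import sqlite3

# Reduced versions of the three tables with only the columns the example
# needs; rows mirror the report sample in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblEmployeesInfo (EmployeeNo TEXT, FirstName TEXT);
CREATE TABLE SQLSummRepMTC (EmployeeNo TEXT, WHTax INTEGER);
CREATE TABLE sqlSumrepMTC15th (EmployeeNo TEXT, WHTax INTEGER);
INSERT INTO tblEmployeesInfo VALUES ('01', 'james'), ('02', 'Chris');
INSERT INTO SQLSummRepMTC VALUES ('01', 20), ('02', 50);
INSERT INTO sqlSumrepMTC15th VALUES ('01', 10);
""")

report = conn.execute("""
    SELECT e.EmployeeNo,
           e.FirstName,
           -- LEFT JOIN leaves WHTax NULL for Chris; CASE turns that into 0.
           CASE WHEN m15.WHTax IS NULL THEN 0 ELSE m15.WHTax END AS Tax15th,
           m30.WHTax AS Tax30th
    FROM SQLSummRepMTC m30
    INNER JOIN tblEmployeesInfo e ON m30.EmployeeNo = e.EmployeeNo
    LEFT JOIN sqlSumrepMTC15th m15 ON e.EmployeeNo = m15.EmployeeNo
    ORDER BY e.EmployeeNo
""").fetchall()

for row in report:
    print(row)
# ('01', 'james', 10, 20)
# ('02', 'Chris', 0, 50)
```

COALESCE(m15.WHTax, 0) is a shorter equivalent of that CASE expression and works in both SQL Server and SQLite.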

Way to combine filtered results using LIKE

I have a many-to-many relationship between people and some electronic codes. The codes table holds the code itself and a text description of the code. A typical result set from a query might be the following (many codes contain "broken", so I feel it's better to search the text description than to add a bunch of ORs, one for every code):
id# text of code
1234 broken laptop
1234 broken mouse
Currently, the best way for me to get a result set like this is to use a LIKE '%broken%' filter. Without changing the text description, is there any way I can return only one instance of a code containing "broken"? In the example above, the query would return only 1234 with either broken mouse OR broken laptop. In this scenario it doesn't matter which one is returned; all I'm looking for is the presence of "broken" in any of the text descriptions of that person's codes.
My solution at the moment is to create a view that returns
id# text of code
1234 broken laptop
1234 broken mouse
and then use SELECT DISTINCT ID# when querying the view to get only one instance of each.
EDIT: ACTUAL QUERY
SELECT tblVisits.kha_id, tblICD.descrip, min(tblICD.Descrip) as expr1
FROM tblVisits inner join
icd_jxn on tblVisits.kha_id = icd_jxn.kha)id inner join tblICD.icd_fk=tblICD.ICD_ID
group by tblVisits.kha_id, tblicd.descrip
having (tblICD.descrip like n'%broken%')

You could use the query below to select the MIN description. This ensures only one text per id.
SELECT t.id, MIN(t.textofcode) as textofcode
FROM table t
WHERE t.textofcode LIKE '%broken%'
GROUP BY t.id
Updated for your actual query:
SELECT tblVisits.kha_id,
MIN(tblICD.Descrip) AS Descrip
FROM tblVisits
INNER JOIN icd_jxn ON tblVisits.kha_id = icd_jxn.kha_id
INNER JOIN tblICD ON icd_jxn.icd_fk = tblICD.ICD_ID
WHERE tblICD.descrip LIKE N'%broken%'
GROUP BY tblVisits.kha_id
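A minimal sketch of the MIN / GROUP BY approach, assuming SQLite (through Python's sqlite3 module) and hypothetical visits and codes: visit 1234 has two "broken" codes and visit 5678 has none. The N'' prefix is dropped here because it is SQL Server syntax for Unicode literals.

```python
import sqlite3

# Hypothetical visits and ICD codes; visit 1234 has two "broken" codes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblVisits (kha_id INTEGER PRIMARY KEY);
CREATE TABLE tblICD (ICD_ID INTEGER PRIMARY KEY, descrip TEXT);
CREATE TABLE icd_jxn (kha_id INTEGER, icd_fk INTEGER);
INSERT INTO tblVisits VALUES (1234), (5678);
INSERT INTO tblICD VALUES (1, 'broken laptop'), (2, 'broken mouse'),
                          (3, 'sprained ankle');
INSERT INTO icd_jxn VALUES (1234, 1), (1234, 2), (5678, 3);
""")

rows = conn.execute("""
    SELECT v.kha_id, MIN(i.descrip) AS descrip
    FROM tblVisits v
    INNER JOIN icd_jxn j ON v.kha_id = j.kha_id
    INNER JOIN tblICD i ON j.icd_fk = i.ICD_ID
    WHERE i.descrip LIKE '%broken%'   -- filter rows before grouping
    GROUP BY v.kha_id                 -- one output row per visit
""").fetchall()

print(rows)  # [(1234, 'broken laptop')]
```

Filtering in WHERE rather than HAVING applies the LIKE before grouping, which is why descrip no longer has to appear in the GROUP BY as it did in the question's query.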

SQL: Need to remove duplicate rows in query containing multiple joins

Note that I'm a complete SQL noob and in the process of learning. Based on Google searches (including searching here), I've tried using SELECT DISTINCT and GROUP BY, but neither works, likely due to all of my joins (if anyone knows exactly why they don't work, that would be helpful to learn).
I need data from a variety of tables, and the query below is the only way I know to do it (I just know the basics). It works fine but shows duplicates, and I need to know how to remove them. The only hint I have right now is perhaps a nested SELECT query, but based on my research I'm not sure how to implement one. Any help at all would be great, thanks!
USE SQL_Contest
go
SELECT
CLT.Description AS ClockType,
CLK.SerialNumber AS JobClockSerial,
SIT.SiteNumber AS JobID,
SIT.[Name] AS JobsiteName,
SIT.Status AS SiteActivityStatus,
DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite
CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite
SES.ClockVoltage
FROM tb_Clock CLK
INNER JOIN tb_ClockType CLT
ON CLK.TypeID = CLT.ClockTypeID
INNER JOIN tb_DeviceHolderActivity DHA
ON CLK.ClockGUID = DHA.DeviceGUID
INNER JOIN tb_Site SIT
ON SIT.SiteGUID = DHA.HolderGUID
LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber
WHERE DHA.ReturnedDate IS NULL
ORDER BY SIT.[Name] ASC
EDIT: I will be reviewing these answers shortly, thank you very much. I'm posting the additional duplicate info per Rob's request:
Everything displays fine until I add:
LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber
Which I need. That's when a duplicate appears:
JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049
JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049
I want that information to display only once. Essentially, this query determines all active jobsites that have a clock assigned to them. That job has only one clock assigned to it, and it's only one jobsite, but it appears twice.
EDIT 2: Based on the help you provided, I was able to determine that they are actually NOT duplicates: each session is independent, and this clock just happens to be the only one with two sessions. So now I'm going to try to figure out how to pull in information only from the latest session.

If everything "works fine" until you add:
LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber
Then there must be more than one record in tb_Session for at least one CLK.SerialNumber.
Run the following query:
SELECT *
FROM tb_Session SES
WHERE ClockSerialNumber = '500248E4'
It should return two records. You need to decide how to handle this (i.e. which record do you want to use?), unless both rows in tb_Session contain identical data, in which case, should they?
You could always change your query to:
SELECT
CLT.Description AS ClockType,
CLK.SerialNumber AS JobClockSerial,
SIT.SiteNumber AS JobID,
SIT.[Name] AS JobsiteName,
SIT.Status AS SiteActivityStatus,
DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite
CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite
SES.ClockVoltage
FROM tb_Clock CLK
INNER JOIN tb_ClockType CLT
ON CLK.TypeID = CLT.ClockTypeID
INNER JOIN tb_DeviceHolderActivity DHA
ON CLK.ClockGUID = DHA.DeviceGUID
INNER JOIN tb_Site SIT
ON SIT.SiteGUID = DHA.HolderGUID
LEFT JOIN
(
SELECT DISTINCT ClockSerialNumber, ClockVoltage
FROM tb_Session
) SES
ON SES.ClockSerialNumber = CLK.SerialNumber
WHERE DHA.ReturnedDate IS NULL
ORDER BY SIT.[Name] ASC
That should ensure SES contains only one record for each unique combination of ClockSerialNumber and ClockVoltage.
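A reduced sketch of the DISTINCT derived table, assuming SQLite (through Python's sqlite3 module) and modelling only tb_Clock and tb_Session, since the other joins in the question do not change the row count here. The serial number and voltage come from the posted duplicate row; the two SessionID values are hypothetical.

```python
import sqlite3

# Only tb_Clock and tb_Session are modelled; the other joins in the question
# don't affect the row count here. Serial and voltage come from the posted row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_Clock (SerialNumber TEXT PRIMARY KEY);
CREATE TABLE tb_Session (SessionID INTEGER PRIMARY KEY,
                         ClockSerialNumber TEXT, ClockVoltage INTEGER);
INSERT INTO tb_Clock VALUES ('500248E4');
INSERT INTO tb_Session VALUES (1, '500248E4', 3049), (2, '500248E4', 3049);
""")

# Joining tb_Session directly: one output row per session.
plain = conn.execute("""
    SELECT CLK.SerialNumber, SES.ClockVoltage
    FROM tb_Clock CLK
    LEFT JOIN tb_Session SES ON SES.ClockSerialNumber = CLK.SerialNumber
""").fetchall()

# Joining a DISTINCT derived table: one row per (serial, voltage) pair.
deduped = conn.execute("""
    SELECT CLK.SerialNumber, SES.ClockVoltage
    FROM tb_Clock CLK
    LEFT JOIN (SELECT DISTINCT ClockSerialNumber, ClockVoltage
               FROM tb_Session) SES
      ON SES.ClockSerialNumber = CLK.SerialNumber
""").fetchall()

print(plain)    # [('500248E4', 3049), ('500248E4', 3049)]
print(deduped)  # [('500248E4', 3049)]
```

If the two sessions had recorded different voltages, the derived table would still return two rows, which is the caveat about unique combinations above.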

Take this example dataset:
Ingredient
IngredientId IngredientName
============ =========
1 Apple
2 Orange
3 Pear
4 Tomato
Recipe
RecipeId RecipeName
======== ==========
1 Apple Turnover
2 Apple Pie
3 Poached Pears
Recipe_Ingredient
RecipeId IngredientId Quantity
======== ============ ========
1 1 0.25
1 1 1.00
2 1 2.00
3 3 1.00
Note: why the Apple Turnover has two lots of apple as ingredients is neither here nor there; it just does.
The following query returns two rows for the "Apple Turnover" recipe, one row for the "Apple Pie" recipe and one row for the "Poached Pears" recipe, because Recipe_Ingredient contains two entries for RecipeId 1 with IngredientId 1. That's just what happens with a join.
SELECT I.IngredientName,
R.RecipeName
FROM Ingredient I
JOIN Recipe_Ingredient RI
ON I.IngredientId = RI.IngredientId
JOIN Recipe R
ON RI.recipeId = R.RecipeId
You could get this to return only one row per ingredient and recipe by changing it to:
SELECT I.IngredientName,
R.RecipeName
FROM Ingredient I
JOIN Recipe_Ingredient RI
ON I.IngredientId = RI.IngredientId
JOIN Recipe R
ON RI.recipeId = R.RecipeId
GROUP BY I.IngredientName, R.RecipeName
Without more specifics about your data, it's hard to apply this to your scenario, but as someone new to SQL you may find the walkthrough helpful in understanding where the "duplicates" are coming from.
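The walkthrough can be run end to end with the example dataset above. This sketch assumes SQLite (through Python's sqlite3 module) rather than SQL Server, but both queries are otherwise unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ingredient (IngredientId INTEGER PRIMARY KEY, IngredientName TEXT);
CREATE TABLE Recipe (RecipeId INTEGER PRIMARY KEY, RecipeName TEXT);
CREATE TABLE Recipe_Ingredient (RecipeId INTEGER, IngredientId INTEGER, Quantity REAL);
INSERT INTO Ingredient VALUES (1, 'Apple'), (2, 'Orange'), (3, 'Pear'), (4, 'Tomato');
INSERT INTO Recipe VALUES (1, 'Apple Turnover'), (2, 'Apple Pie'), (3, 'Poached Pears');
INSERT INTO Recipe_Ingredient VALUES (1, 1, 0.25), (1, 1, 1.00), (2, 1, 2.00), (3, 3, 1.00);
""")

JOINS = """
    FROM Ingredient I
    JOIN Recipe_Ingredient RI ON I.IngredientId = RI.IngredientId
    JOIN Recipe R ON RI.RecipeId = R.RecipeId
"""
# One output row per Recipe_Ingredient row: Apple Turnover appears twice.
plain = conn.execute("SELECT I.IngredientName, R.RecipeName" + JOINS).fetchall()
# Grouping collapses rows whose selected columns are identical.
grouped = conn.execute("SELECT I.IngredientName, R.RecipeName" + JOINS +
                       "GROUP BY I.IngredientName, R.RecipeName").fetchall()

print(len(plain), len(grouped))  # 4 3
```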

The joins are not your problem. From your comments, I infer that what you are calling "duplicates" are not actual duplicates. If all column values for two "duplicates" returned by the query matched, then either SELECT DISTINCT or GROUP BY would eliminate them. So you should be able to find a solution by looking at your column definitions.
My best guess is that you're getting rows for the same date that aren't really duplicates, because the time components of the dates don't match. To eliminate this problem, you can truncate the date fields to the date only using this technique:
DATEADD(DAY, DATEDIFF(DAY, 0, DHA.IssuedDate), 0) AS DHAIssuedDate,
DATEADD(DAY, DATEDIFF(DAY, 0, CLK.CreatedDate), 0) AS CLKCreatedDate,
If that doesn't work, you might want to take a look at JobClockSerial: does this column belong in the query results?
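To see why the time component defeats DISTINCT, here is a minimal sketch assuming SQLite (through Python's sqlite3 module) with two hypothetical rows that differ only in the time. The DATEADD/DATEDIFF expression above is SQL Server specific, so SQLite's date() function stands in for the truncation here.

```python
import sqlite3

# Two hypothetical rows for the same device and day that differ only in the time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DHA (DeviceGUID TEXT, IssuedDate TEXT);
INSERT INTO DHA VALUES ('clock-1', '2007-05-04 13:36:54.000'),
                       ('clock-1', '2007-05-04 15:02:10.000');
""")

# DISTINCT compares full values, so the differing times keep both rows.
raw = conn.execute("SELECT DISTINCT DeviceGUID, IssuedDate FROM DHA").fetchall()
# Truncating to the date first makes the rows identical, so DISTINCT merges them.
truncated = conn.execute(
    "SELECT DISTINCT DeviceGUID, date(IssuedDate) AS IssuedDay FROM DHA"
).fetchall()

print(len(raw), truncated)  # 2 [('clock-1', '2007-05-04')]
```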
