Access query inconsistently treats empty string as null - sql

I have an application that is grabbing data from an Access database. I am seeking the minimum value of a column and the results I am getting back are inconsistent.
Have I run into a feature where Access inconsistently treating an empty string as a null depending on whether I add a filter or not, or is there something wrong with the way I am querying the data?
The column contains one blank value (not null) and several non-blank values that are all identical (about 30 instances of 'QLD'). The query I am using has a filter that involves multiple other tables, so that only the blank value and about half of the 'QLD' values are eligible.
It's probably easier to show the code and the effects rather than describe it. I have created a series of unioned queries which 'should' bring back identical results but do not.
Query:
SELECT 'min(LOC_STATE)' as Category
, min(LOC_STATE) as Result
FROM pay_run, pay_run_employee, employee, department, location
WHERE pr_id = pre_prid
AND em_location = loc_id
AND pre_empnum = em_empnum
AND em_department = dm_id
AND pr_date >= #2/24/2015#
AND pr_date <= #2/24/2016#
UNION ALL
(SELECT TOP 1 'top 1 LOC_STATE'
, LOC_STATE
FROM pay_run, pay_run_employee, employee, department, location
WHERE pr_id = pre_prid
AND em_location = loc_id
AND pre_empnum = em_empnum
AND em_department = dm_id
AND pr_date >= #2/24/2015#
AND pr_date <= #2/24/2016#
ORDER BY LOC_STATE)
UNION ALL
SELECT 'min unfiltered', min(loc_state)
FROM location
UNION ALL
(SELECT TOP 1 'iif is null', iif(loc_state is null, 'a', loc_state)
FROM location
ORDER BY loc_state)
Results:
Category Result
min(LOC_STATE) 'QLD'
top 1 LOC_STATE ''
min unfiltered ''
iif is null ''
If I do a minimum with the filter it brings back 'QLD' and not the empty string. At this stage it is possible that the empty string is not being included because it is treated as a null or the filter removes it.
The second query, which brings back the top 1 state using the filter shows that the empty string is not filtered out, which means that the Min function is ignoring the empty string.
The third query, which gets the minimum of the unfiltered table, brings back the empty string - so the minimum function does not exclude empty strings / treat them as null.
The fourth query, ensures that there is not a null in the empty string position.
My conclusion is that perhaps the inclusion of other tables and filter criteria is causing the empty string value to be treated as a null, but I feel that I must be missing something.
NB: I have a very similar query (date literals altered) that executes against the same data imported into a SQL Server database. It is correctly returning '' for all 4 queries.
Does anyone know why the empty string is ignored by the Min function in the first query?
PS: for those who prefer a query with joins
SELECT 'min(LOC_STATE)' as Category
, min(LOC_STATE) as Result
FROM (((pay_run
INNER JOIN pay_run_employee ON pay_run.pr_id = pay_run_employee.pre_prid)
INNER JOIN employee ON pay_run_employee.pre_empnum = employee.em_empnum)
INNER JOIN department ON employee.em_department = department.dm_id)
INNER JOIN location on employee.em_location = location.loc_id
WHERE
PR_DATE >= #2/24/2015# and
PR_DATE <= #2/24/2016#
union all
(SELECT TOP 1 'TOP 1 LOC_STATE'
, LOC_STATE
FROM (((pay_run
INNER JOIN pay_run_employee ON pay_run.pr_id = pay_run_employee.pre_prid)
INNER JOIN employee ON pay_run_employee.pre_empnum = employee.em_empnum)
INNER JOIN department ON employee.em_department = department.dm_id)
INNER JOIN location on employee.em_location = location.loc_id
WHERE
PR_DATE >= #2/24/2015# and
PR_DATE <= #2/24/2016#
order by LOC_STATE)
union all
select 'min unfiltered', min(loc_state)
from location

This has got nothing to do with corrupt data or unions or joins. The problem can be easily made visible by exectuting following queries in access:
create table testbug (Field1 varchar (255) NULL)
insert into testbug (Field1) values ('a')
insert into testbug (Field1) values ('')
insert into testbug (Field1) values ('c')
select min(field1) from testbug
To my opinion this is a bug in ms-access. When the MIN function in ms-access comes across an empty string ('') it forgets all the values he has come across and returns the minimum value from all the values below the empty string. (in my simple example only value 'c')

Related

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings between the two tables, although a NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table, with a reference to a corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe that #Caius Jard is correct in that this does not need to be an outer join. There should always be an ST/OT pair.
With that I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
Right now, I've simplified the query to keep it simple and keep variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So, there are matching records between the two tables that meet the criteria, but I think I'm getting no results because when I convert the numeric to a string, it does not match, because one value is, say "12345", while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
Was able to finally get the query to work by converting the numeric doc num to a left zero padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )
You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
LEFT join is a type of OUTER join; LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (with a LEFT JOIN, but then has WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process, are then quashed by the WHERE clause
See Update 4 for details of how I resolved the "data conversion or mapping" error.

Query that detects difference between accounts/loads from TODAY and YESTERDAY

GOAL: DETECT any difference between yesterday's table loads and today's loads. Each load loads values of data that are associated with bank accounts. So I need a query that returns each individual account that has a difference, with the value in the column name.
I need data from several columns that are located from two different tables. AEI_GFXAccounts and AEI_GFXAccountSTP. Each time the table is loaded, it has a "run_ID" that is incremented by one. So it needs to be compared to MAX(run_id) and MAX(run_id) -1.
I have tried the following queries. All this query does is return all the columns I need. I now need to implement logic that runs these queries WHERE runID = MAX(runID). Then run it again where run_ID = Max(runID) -1. Compare the two tables, show the differences that can be displayed under columns like SELECT AccountBranch WHERE MAX(Run_ID) -1 AS WAS. etc. and another custom named column as 'IS NOW' etc for each column.
SELECT AEI_GFXAccounts.AccountNumber,
AccountBranch,
AccountName,
AccountType,
CostCenter,
TransactionLimit,
ClientName,
DailyCumulativeLimit
FROM AEI_GFXAccounts
JOIN AEI_GFXAccountSTP
ON (AEI_GFXAccounts.feed_id = AEI_GFXAccountSTP.feed_id
and AEI_GFXAccounts.run_id = AEI_GFXAccountSTP.run_id)
I use something similar to this to detect changes for a logging system:
WITH data AS (
SELECT
a.run_id,
a.AccountNumber,
?.AccountBranch,
?.AccountName,
?.AccountType,
?.CostCenter,
?.TransactionLimit,
?.ClientName,
?.DailyCumulativeLimit
FROM
AEI_GFXAccounts a
INNER JOIN AEI_GFXAccountSTP b
ON
a.feed_id = b.feed_id and
a.run_id = b.run_id
),
yest AS (
SELECT * FROM data WHERE run_id = (SELECT MAX(run_id)-1 FROM AEI_GFXAccounts)
),
toda AS (
SELECT * FROM data WHERE run_id = (SELECT MAX(run_id) FROM AEI_GFXAccounts)
)
SELECT
CASE WHEN COALESCE(yest.AccountBranch, 'x') <> COALESCE(toda.AccountBranch, 'x') THEN yest.AccountBranch END as yest_AccountBranch,
CASE WHEN COALESCE(yest.AccountBranch, 'x') <> COALESCE(toda.AccountBranch, 'x') THEN toda.AccountBranch END as toda_AccountBranch,
CASE WHEN COALESCE(yest.AccountName, 'x') <> COALESCE(toda.AccountName, 'x') THEN yest.AccountName END as yest_AccountName,
CASE WHEN COALESCE(yest.AccountName, 'x') <> COALESCE(toda.AccountName, 'x') THEN toda.AccountName END as toda_AccountName,
...
FROM
toda INNER JOIN yest ON toda.accountNumber = yestaccountNumber
Notes:
You didn't say which table some of your columns are from. I've prefixed them with ?. - replace these with a. or as. respectively (always good practice to fully qualify all your column aliases)
When you're repeating out the pattern in the bottom select (above ...) choose data for the COALESCE that will not appear in the column. I'm using COALESCE as a quick way to avoid having to write CASE WHEN a is null and b is not null or b is null and a is not null or a != b, but the comparison fails if accountname (for example) was 'x' yesterday and today it is null, because the null becomes 'x'. If you pick data that will never appear in the column then the check will work out because nulls will be coalesced to something that can never appear in the real data, and hence the <> comparison will work out
If you don't care when a column goes to null today from a value yesterday, or was null yesterday but is a value today, you can ditch the coalesce and literally just do toda.X <> yest.X
New accounts today won't show up until tomorrow. If you want them to show up do toda LEFT JOIN yest .... Of course all their properties will show as new ;)
This query returns all the accounts regardless of whether any changes have been made. If you only want a list of accounts with changes you'll need a where clause that is similar to your case whens:
WHERE
COALESCE(toda.AccountBranch, 'x') <> COALESCE(yest.AccountBranch, 'x') OR
COALESCE(toda.AccountName, 'x') <> COALESCE(yest.AccountName, 'x') OR
...
Do you have a date field? If so you can use Row_Number partitioned by your accounts. Exclude all accounts that have a max of 1 row 'New accounts", and then subtract the Max(rownumber) of each account's load by the Max(rownumber)-1's load. Only return accounts where this returned load is >0.You can also use the lag function to grab the previous accounts load instead of Max(rownumber)-1

Is there a way to exlude SQL results that are ALMOST duplicates?

I have query that runs daily that shows old and new member addresses as they are updated. The query works fine except for the times when a USPS address match is done in our core system and just changes some of the abbreviations
For example:
Old Address - 1234 East Main Street
New Address - 1234 E Main St
I don't need to see these results.
I have tried removing based on unique fields in the core, however, the USPS match process creates all new fields so the query can't remove based on that information.
The main SP for this is:
INSERT INTO #results
SELECT
distinct i.INDIVIDUAL_ID,
i.FIRST_NAME,
i.MIDDLE_NAME,
i.LAST_NAME,
i.D1NAME,
CurrentAddress.ADDRESS1,
PreviousAddress.ADDRESS1,
CurrentAddress.ADDRESS2,
PreviousAddress.ADDRESS2,
CurrentAddress.ADDRESS3,
PreviousAddress.ADDRESS3,
CurrentAddress.CITY,
PreviousAddress.CITY,
CurrentAddress.STATE,
PreviousAddress.STATE,
CurrentAddress.ZIP_STR,
PreviousAddress.ZIP_STR,
CurrentAddress.ZIP4_STR,
PreviousAddress.ZIP4_STR,
CurrentAddress.COUNTRY,
PreviousAddress.COUNTRY
FROM INDIVIDUAL i
INNER JOIN MEMBERSHIPPARTICIPANT mpt
ON i.INDIVIDUAL_ID = mpt.INDIVIDUAL_ID
AND i.DL_LOAD_DATE = mpt.DL_LOAD_DATE
INNER JOIN AGR_MEMBERTOTAL_TODAY m
ON mpt.MEMBER_NBR = m.MEMBER_NBR
AND mpt.DL_LOAD_DATE = m.DL_LOAD_DATE
INNER JOIN BRANCH b
ON i.BRANCH_NBR = b.BRANCH_NBR
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, #latestDate) AS CurrentAddress
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, #previousDate) AS PreviousAddress
WHERE i.DL_LOAD_DATE = #latestDate
AND ( m.OPN_LN_ALL_CNT > 0 OR m.OPN_SV_ALL_CNT > 0 )
order by i.FIRST_NAME asc
DELETE #results
WHERE Address1_Today = Address2_Yesterday
AND Address2_Today = Address1_Yesterday
SELECT *
FROM #results
WHERE (Address1_Today != Address1_Yesterday
OR Address2_Today != Address2_Yesterday
OR Address3_Today != Address3_Yesterday
OR City_Today != City_Yesterday
OR State_Today != State_Yesterday
OR ZipCode_Today != ZipCode_Yesterday
--OR FullZip_Today != FullZip_Yesterday
OR Country_Today != Country_Yesterday)
I'd like to remove the almost duplicate rows
For example:
Old Address - 1234 East Main Street
New Address - 1234 E Main St
There isn't a built in way to test via SQL, and it will have to be defined by logic via procedure. The first thing I'd do is group the substrings in both Old Address and New Address by count of those substrings. The ones where the counts equal each other at the row level, you can split by space and break up the address. Think of each address field as three parts [street_nbr, street_nm, street_suffix]. The street_nm can have an abbreviated prefix, which is why grouping the count of substrings is important thereby increasing the count past 3. Secondary lookup tables that match words/abbreviations that you identify can then be used to "un-duplicate" those suffixes and prefixes.
CREATE TABLE lookup_abbreviations(
unabbreviated_name varchar(50),
abbreviated_name varchar(50));
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('East', 'E')
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('Street', 'St');
-- Use Cross Applies and functions(LEN, LEFT, RIGHT, CHARINDEX, SUBSTRING) to split the address
-- into equal parts. This is where you'll have to figure out the best logic for grouping.
SELECT DISTINCT
Old_Street_Nbr = SUBSTRING(Old_Address, CHARINDEX(' ', Old_Address))
Old_Street_Nm_Prefix = CASE WHEN /*Here is where the count of substrings is tested*/ END
Old_Street_Nm = CASE WHEN /*Here is where the count of substrings is tested*/ END
Old_Street_Suffix = []
INTO #AbbreviatonSort
FROM Results;
SELECT
Old_Street_Nbr ,
Old_Street_Nm_Prefix = CASE
WHEN Old_Street_Nm_Prefix IN (SELECT abbreviated_name from
lookup_abbreviations)
THEN (SELECT unabbreviated_name from
lookup_abbreviations WHERE abbreviated_name =
Old_Street_Nm_Prefix)
ELSE Old_Street_Nm_Prefix
END
INTO #SortedAddresses
FROM #AbbreviationSort
;
SELECT DISTINCT * FROM
(
SELECT Old_Street_Nbr, Old_Prefix FROM #SortedAddresses
UNION ALL
SELECT New_Street_Nbr, New_Prefix FROM #SortedAddresses
) AS DupSearch

How to return a potential NULL in SQL while performing a comparison on the potentially null value

Developing a query that will return information about an item stored across 4 tables. When all the fields have values my query works fine, however some of the data has null fields (which I can't change) that I need to perform my comparison on. When these show up, the row doesn't show up in the query results, even if all the other fields have values.
Here is what I have so far:
select [Item_NO], [Color_Name], [size_code], [style_code], [PERM_UNIT_PRICE]
FROM [USICOAL].[dbo].[ITEM], [USICOAL].[dbo].[COLOR], [USICOAL].[dbo].[SIZE], [USICOAL].[dbo].[STYLE]
where [ITEM_NO] in ('191202002944', '191202003026')
AND [USICOAL].[dbo].[ITEM].[COLOR_ID] = [USICOAL].[dbo].[COLOR].[COLOR_ID]
AND [USICOAL].[dbo].[ITEM].[SIZE_ID] = [USICOAL].[dbo].[SIZE].[SIZE_ID]
AND [USICOAL].[dbo].[ITEM].[STYLE_ID] = [USICOAL].[dbo].[STYLE].[STYLE_ID]
For these 2 items numbers, the Size_ID field is null. How can I get the results to reflect this null?
SELECT
[Item_NO]
,[Color_Name]
,[size_code]
,[style_code]
,[PERM_UNIT_PRICE]
FROM
[USICOAL].[dbo].[ITEM] i
LEFT OUTER JOIN
[USICOAL].[dbo].[COLOR] c
ON c.[COLOR_ID] = i.[COLOR_ID]
LEFT OUTER JOIN
[USICOAL].[dbo].[SIZE] s
ON s.[SIZE_ID] = i.[SIZE_ID]
LEFT OUTER JOIN
[USICOAL].[dbo].[STYLE] t
ON t.[STYLE_ID] = i.[STYLE_ID]
WHERE
[ITEM_NO] in ('191202002944', '191202003026')

IBM DB2: Using MINUS to exclude information in the subselect statement

Currently I am having an issue bringing back the correct data for this particular query below. I am attempting to bring back data that excludes select criteria from the subselect statement after MINUS keyword.
SELECT
DISTINCT ORDER.OWNER, ORDER_H.PO_ID
FROM ORDER ORDER
WHERE ORDER.TYPE != 'X'
AND ORDER.STATUS='10'
AND ORDER.CLOSE_DATE IS NULL MINUS
(
SELECT
DISTINCT ORDER.OWNER, ORDER.PO_ID
FROM ORDER ORDER
INNER JOIN COST COST ON COST.PO_ID = ORDER.PO_ID
AND COST.CODE IN
(
'LGSF',
'DFCDC',
'BOF',
'TFR',
'RFR',
'TFLHC',
'BF',
'CBF',
'CHAP',
'DYPH' ,
'OFFP',
'PTWT',
'DTEN',
'OTHR',
'DMSG',
'STOR',
'TOF',
'ANTCV',
'ANTIP',
'CVD',
'TRAN'
)
WHERE ORDER.TYPE != 'OTR'
AND ORDER.STATUS = '10'
AND (COST.E_AMT > 0 AND COST.A_AMT IS NULL)
)
FOR READ ONLY WITH UR
The data coming back includes the data within the subquery instead of excluding this data from the resultset. I cannot figure out why this is the case. Does anyone have any idea why after MINUS it doesn't exclude this data and is bringing back data where COST.E_AMT is actually greater than 0 and COST.A_AMT is actually populated for each CODE listed in the subquery? Any help would be appreciated, thanks.