Is it possible to match the "next" unmatched record in a SQL query where there is no strictly unique common field between tables?

Using Access 2010 and its dialect of SQL, I am trying to relate two tables in a query where neither table has a strictly unique key. The best I have is a concatenated field that is mostly unique, so I want to match on that field and then pair each next unmatched record in one table (with "next" measured by a date field or the record ID) with the next unmatched record in the other.
My business receives checks that we do not cash ourselves, but rather forward to a client for processing. I am trying to build a query that will match the checks that we forward to the client with a static report that we receive from the client indicating when checks were cashed. I have no control over what the client reports back to us.
When we receive a check, we record the name of the payor, the date that we received the check, the client's account number, the amount of the check, and some other details in a table called "Checks". We add a matching field which comes as close as we can get to a unique identifier to match against the client reports (more on that in a minute).
Checks:
ID  Name  Acct  Amt    Our_Date  Match
--  ----  ----  -----  --------  ----------
1   Dave  1001  10.51  2/14/14   1001*10.51
2   Joe   1002  12.14  2/28/14   1002*12.14
3   Sam   1003  50.00  3/01/14   1003*50.00
4   Sam   1003  50.00  4/01/14   1003*50.00
5   Sam   1003  50.00  5/01/14   1003*50.00
The client does not report back to us the date that WE received the check, the check number, or anything else useful for making unique matches. They report the name, account number, amount, and the date of deposit. The client's report comes weekly. We take that weekly report and append the records to make a second table out of it.
Return:
ID   Name  Acct  Amt    Their_Date  Unique1
---  ----  ----  -----  ----------  ----------
355  Dave  1001  10.51  3/25/14     1001*10.51
378  Joe   1002  12.14  4/04/14     1002*12.14
433  Sam   1003  50.00  3/08/14     1003*50.00
599  Sam   1003  50.00  5/11/14     1003*50.00
Instead of giving us back the date we received the check, we get back the date that they processed it. There is no way to make a rule to compare the two dates, because the deposit dates vary wildly. So the closest thing I can get for a unique identifier is a concatenated field of the account number and the amount.
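For concreteness, that identifier can also be computed on the fly in a query rather than stored; a minimal sketch in Access SQL, where the Format call is my assumption, pinning the amount to two decimals so that 50.5 and 50.50 concatenate identically:

SELECT ID, Name, Acct, Amt, Our_Date,
       Acct & "*" & Format(Amt, "0.00") AS Match
FROM Checks;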
I am trying to match the records on these two tables so that I know when the checks we forward get deposited. If I do a simple join using the two concatenated fields, it works most of the time, but we run into a problem with payors like Sam, above, who is making regular monthly payments of the same amount. In a simple join, if one of Sam's payments appears in the Return table, it matches to all of the records in the Checks table.
To limit that behavior and match the first Sam entry on the Return table to the first Sam entry on the Checks table, I wrote the following query:
SELECT return.*, checks.*
FROM return, checks
WHERE checks.id = (SELECT TOP 1 id
                   FROM checks
                   WHERE match = return.unique1
                   ORDER BY [our_date]);
This works when there is only one of Sam's records in the Return table. The problem comes when the second entry for Sam hits the Return table (Return.ID 599) as the client's weekly reports are added to the table. When that happens, the query appropriately (for my purposes) only lists that two of Sam's checks have been processed, but uses the "Top 1 ID" record to supply the row's details from the Return table:
Checks_Return_query:
Checks.ID  Name  Acct  Amt    Our_Date  Their_Date  Return.ID
---------  ----  ----  -----  --------  ----------  ---------
1          Dave  1001  10.51  2/14/14   3/25/14     355
2          Joe   1002  12.14  2/28/14   4/04/14     378
3          Sam   1003  50.00  3/01/14   3/08/14     433
4          Sam   1003  50.00  4/01/14   3/08/14     433
In other words, the query repeats the Return table info for record Return.ID 433 instead of matching Return.ID 599, which I guess is what I should expect from the TOP 1 predicate.
So I am trying to figure out how to get the query to take the two concatenated fields in Checks and Return, compare them to find matching sets, and then pair the next unmatched record in Checks (with "next" measured either by ID or Our_Date) with the next unmatched record in Return (again, with "next" measured either by ID or Their_Date).
I spent many hours in a dark room turning the query into various joins and back again, looking at constructs like WHERE NOT IN, WHERE NOT EXISTS, FIRST(), NEXT(), MIN(), and MAX(). I am afraid I am way over my head.
I am beginning to think that I may have a structural problem, and may need to write the "matched" records in this query to another table of completed transactions, so that I can differentiate between "matched" and "unmatched" records better. But that still wouldn't help me if two of Sam's transactions are on the same weekly report I get from my client.
Are there any suggestions as to query functions I should look into for further research, or confirmation that I am barking up the wrong tree?
Thanks in advance.

I'd say that you really need another table of completed transactions; it could be a temporary table.
Regarding your fear that "... two of Sam's transactions are on the same weekly report", you can use a cursor to write the records one by one instead of a single set-based operation.
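Another set-based option is to number the occurrences within each match group on both sides and pair first-with-first, second-with-second. A minimal sketch in Access SQL, assuming the Checks and Return tables above and that ID increases in date order (the brackets around Return are defensive, in case the name clashes with a reserved word):

SELECT c.*, r.*
FROM Checks AS c, [Return] AS r
WHERE c.Match = r.Unique1
  AND (SELECT COUNT(*) FROM Checks AS c2
       WHERE c2.Match = c.Match AND c2.ID <= c.ID)
    = (SELECT COUNT(*) FROM [Return] AS r2
       WHERE r2.Unique1 = r.Unique1 AND r2.ID <= r.ID);

Each check pairs with the return row that has the same occurrence number within its group, so Sam's third check stays unmatched until a third 1003*50.00 return arrives, and two of Sam's payments landing on the same weekly report simply take consecutive occurrence numbers instead of colliding.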

Related

The IDs change in the column

I am quite a novice at programming, and I need your help with an SQL issue I have noticed.
I have a table:
date     | ID  | secondary ID | expenses
jul2020  | 258 | 0004         | 1000
jul2020  | xxx | xxxx         | xxx
...      | ... | ....         | ....
aug2020  | 258 | 0008         | 2000
aug2020  | xxx | xxxx         | xxx
aug2020  | 500 | 0004         | 1000
The ID and secondary ID should be unique and always match each other, but I have noticed that they do not: sometimes the ID is correct, sometimes the secondary ID. I want to sum the expenses (the D column) per unique ID.
Thanks for reading, and any ideas would be very helpful.
UPDATE: everything is numeric, even the ID. It's like this: as you can see, we have different dates (but multiple customers for the same date). I noticed that for customer 258 with secondary ID 0004, the ID or the secondary ID changes over the years. I want to assign the same ID as on the first date, and the same secondary ID as on the first date (or any date, just to be consistent). I want to do this because I want to know how many expenses each customer has over the years. There are around 50m observations.
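A hedged sketch of the "first date wins" idea (all table and column names here are hypothetical; it assumes the date column sorts chronologically and that the database supports window functions, and it resolves only one level of drift, so chains where both fields change would need the mapping step repeated):

WITH first_seen AS (
    SELECT secondary_id,
           id AS canonical_id,
           ROW_NUMBER() OVER (PARTITION BY secondary_id
                              ORDER BY obs_date) AS rn
    FROM customer_expenses
)
SELECT f.canonical_id,
       SUM(e.expenses) AS total_expenses
FROM customer_expenses AS e
INNER JOIN first_seen AS f
        ON f.secondary_id = e.secondary_id
       AND f.rn = 1
GROUP BY f.canonical_id;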

Duplicate rows because 1 column has multiple distinct values

I'm running a SELECT query to get data across multiple tables in the same server instance. However, I've just noticed that some of the rows pulled are duplicated, because the main table I'm pulling from has a few different values in one of the columns. Here's the query:
SELECT DISTINCT
       BIF030.C_ACCOUNT AS ACCOUNTNUMBER,
       BIF003.C_ACCOUNTTYPE AS ACCOUNTTYPECODE,
       CON013.C_DESCRIPTION AS ACCOUNTTYPE,
       BIF003.C_DIVISION AS ZONE_DIVISIONCODE,
       CON028.C_DESCRIPTION AS ZONE_DIVISION,
       BIF030.C_METER AS METERNUMBER,
       BIF005.C_METERCUSTOM1 AS REGISTERNUMBER,
       CONVERT(DECIMAL(20,2), BIF030.N_CONSUMP) AS CONSUMPTION,
       CON007.C_DESCRIPTION AS UNITS,
       BIF030.T_READDATE AS READINGDATE,
       MONTH(BIF030.T_READDATE) AS READINGMONTH,
       DAY(BIF030.T_READDATE) AS READINGDAY,
       YEAR(BIF030.T_READDATE) AS READINGYEAR,
       BIF030.I_DAYS AS READINGDAYSCOUNT
FROM ADVANCED.BIF030
LEFT JOIN ADVANCED.CON007 ON CON007.C_UNITS = BIF030.C_UNITS
LEFT JOIN ADVANCED.BIF005 ON BIF005.C_METER = BIF030.C_METER
LEFT JOIN ADVANCED.BIF003 ON BIF003.C_ACCOUNT = BIF030.C_ACCOUNT
LEFT JOIN ADVANCED.CON013 ON CON013.C_ACCOUNTTYPE = BIF003.C_ACCOUNTTYPE
LEFT JOIN ADVANCED.CON028 ON CON028.C_DIVISION = BIF003.C_DIVISION
WHERE T_READDATE > '01-01-2014'
ORDER BY ACCOUNTNUMBER, READINGDATE ASC
I know SELECT DISTINCT is frowned upon, but I get even more rows without it. Here's a sample of what the data looks like when pulled:
ACCOUNTNUMBER | ACCOUNTTYPECODE | ACCOUNTTYPE    | ZONE_DIVISIONCODE | ZONE_DIVISION | METERNUMBER | REGISTERNUMBER | CONSUMPTION | UNITS                | READINGDATE            | READINGMONTH | READINGDAY | READINGYEAR | READINGDAYSCOUNT
1234567       | SP              | ACCOUNT TYPE 1 | 00                | 00-NO ZONE    | 123456789   | 987654321      | 3.00        | Thousands of Gallons | 2014-01-16 00:00:00.00 | 1            | 16         | 2014        | 30
1234567       | MF              | ACCOUNT TYPE 2 | 02                | 02-GRAVITY    | 123456789   | 987654321      | 3.00        | Thousands of Gallons | 2014-01-16 00:00:00.00 | 1            | 16         | 2014        | 30
1234567       | SR              | ACCOUNT TYPE 3 | 02                | 02-GRAVITY    | 123456789   | 987654321      | 3.00        | Thousands of Gallons | 2014-01-16 00:00:00.00 | 1            | 16         | 2014        | 30
I also know the column that is messing this up is ACCOUNTTYPECODE, because accounts that don't have multiple codes associated with the ACCOUNTNUMBER only show one set of rows. So this one specifically (and probably others) is tripling the number of rows pulled when it should only pull one for each READINGDATE.
Also if anyone knows a good way to optimize the query I'd be happy to learn. I know just enough SQL to be dangerous, but not enough to figure this out. Thanks.
OK, so good news, and I want to add this in case it helps anyone else in the future. I found out that since ACCOUNTTYPECODE and ZONE_DIVISIONCODE come from the table BIF003, I needed to add more to the WHERE clause. This is what fixed it for me:
AND BIF030.C_CUSTOMER = BIF003.C_CUSTOMER
Because the C_CUSTOMER column (a column in both the BIF003 and BIF030 tables) was different, which led to the separate ACCOUNTTYPECODE results, I needed to check it in the WHERE clause.
Thanks everyone for kick-starting my brain on this one.
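As a side note, the same condition can be expressed in the join itself rather than the WHERE clause; a sketch against the query above. This matters for LEFT JOINs in general: filtering the right-hand table in WHERE silently turns the outer join into an inner join, while putting the condition in ON preserves unmatched rows.

LEFT JOIN ADVANCED.BIF003
       ON BIF003.C_ACCOUNT = BIF030.C_ACCOUNT
      AND BIF003.C_CUSTOMER = BIF030.C_CUSTOMER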

Changes to table not designed for SQL

I am supposed to make some changes to an enormous CSV file based on a different file. I chose to do it in SQL, but after further consideration I am not sure how to proceed.
In the 1st table I have a list of contracts. Columns represent segments the contract belongs to and products that can be linked to the contract (example in the table below).
Here contract no. 1234 belongs to segments X1 and Y2. There is no product number 1 linked to it, but it has product number 2 linked to it. The product originally ends on the 1st of January 2030.
cont_n | date | segment_1 | segment_2 | .. | prod_1 | date_prod_1 | product_2 | date_product_2 | ..
1234   | 3011 | X1        | Y2        | .. |        |             | YES       | 01/01/2030     | ..
The 2nd file is a list of combinations of segments and an indication of how the "date" columns should be adjusted. The example shows the following situation: if there is prod_2 linked to a contract that belongs to groups X1 and Y2, end prod_2 this year. I need this result to alter table no. 1.
prod_no | segment_1 | segment_2 | result
prod_2  | X1        | Y2        | end the product on anniversary
Ergo I need to get to this result:
cont_n | date | segment_1 | segment_2 | .. | prod_1 | date_prod_1 | product_2 | date_product_2 | ..
1234   | 3011 | X1        | Y2        | .. |        |             | YES       | 30/11/2019     | ..
In the original files I have around 600k rows and more than 300 columns (meaning around 100 different products) in table 1 and around 800 possible combinations of segments in table 2.
The algorithm I need to implement (very generally):
for x = 1 to 100
    IF product_x = YES THEN date_product_x = date + "Search for result in table2"
Is there a reasonable way to change the "date_product_x" columns based on the 2nd table, or would it be better to find a different solution?
Thanks a lot!
I can only give you a general approach, because the information in your question is general (for example, why does "end the product on anniversary" translate to "30/11/2019"? It's not explained in the question, so I assume you're going to be able to handle that part of the logic).
You can approach this by using an UNPIVOT on Table 1 to get a structure like:
cont_n | segment1 | segment2 | product_number | product_date
You will UNPIVOT .. FOR date_product_1 through date_product_100. You'll either have to type out all 100 column names or use dynamic SQL to build the whole thing.
You'll do some string manipulation to grab the "x" portion of "date_product_x", and turn it into "prod_x", and then you can join to the second table on the two segment columns and the "prod_x" column, get the result column value, and do whatever rules you're doing to get the value you want for date_product_x.
Finally, you take that result, and PIVOT it back to the one-row-per-contract form, and JOIN it to your original table to UPDATE the date_product_x columns.
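A minimal T-SQL sketch of the unpivot-and-join step (UNPIVOT is SQL Server syntax; the column list is truncated, the names assume the date columns are uniformly called date_product_x, and the anniversary arithmetic is left out since the question doesn't define it):

SELECT u.cont_n,
       u.segment_1,
       u.segment_2,
       'prod_' + REPLACE(u.col_name, 'date_product_', '') AS prod_no,
       u.product_date,
       t2.result
FROM table1
UNPIVOT (product_date
         FOR col_name IN (date_product_1, date_product_2 /* ... date_product_100 */)
        ) AS u
INNER JOIN table2 AS t2
        ON t2.segment_1 = u.segment_1
       AND t2.segment_2 = u.segment_2
       AND t2.prod_no = 'prod_' + REPLACE(u.col_name, 'date_product_', '');

Applying whatever rule the result column encodes to product_date, then pivoting back and joining to the original table for the UPDATE, follows the same pattern in reverse.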

Multi-Row Per Record SQL Statement

I'm not sure this is possible but my manager wants me to do it...
Using the below picture as a reference, is it possible to retrieve a group of records, where each record has 2 rows of columns?
So the columns Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened, and Updated would form the first row, and the Work Notes column would be a second row spanning the width of the report. Each record would have two rows. Is this possible with a GROUP BY statement?
Record 1
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record 2
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record n
...
I don't think that's possible with the built-in report engine. You'll need to export the data and format it using something else.
You could get something similar to what you want on short description (list report, grouped by short description), but you can't group by work notes, so that's out.
One thing to note is that work_notes is not actually a field on the table; it is of type journal_input, which means it's really just a gateway to the actual underlying data model. "Modifying" work_notes actually just inserts into sys_journal_field.
sys_journal_field is the table which stores the work notes you're looking for. Given a sys_id of an incident record, this URL will give you all journal field entries for that particular record:
/sys_journal_field_list.do?sysparm_query=name=task^element_id=<YOUR_SYS_ID>
You will notice this includes ALL journal fields (comments + work_notes + anything else), so if you just wanted work notes, you could simply add a query against element thusly:
/sys_journal_field_list.do?sysparm_query=name=task^element=work_notes^element_id=<YOUR_SYS_ID>
What this means for you!
While you can't separate a physical row into multiple logical rows in the UI, in the case of journal fields you can join your target table against the sys_journal_field table using a Database View. This deviates from your goal in that you wouldn't get a single row for all work notes, but rather an additional row for each matched work note.
Given an incident INC123 with 3 work notes, your report against the Database View would look something like this:
Row 1: INC123 | markmilly | This is a test incident |
Row 2: INC123 |           |                         | Work note #1
Row 3: INC123 |           |                         | Work note #2
Row 4: INC123 |           |                         | Work note #3
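Conceptually, the Database View amounts to something like the following join (a sketch only; Database Views are assembled in the ServiceNow UI rather than written as raw SQL, and the sys_journal_field column names are inferred from the URLs above):

SELECT inc.number,
       inc.short_description,
       j.value AS work_note
FROM incident AS inc
LEFT JOIN sys_journal_field AS j
       ON j.name = 'task'
      AND j.element = 'work_notes'
      AND j.element_id = inc.sys_id;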

Access 2007: Delete Partially Duplicate Rows

I have data in an Excel spreadsheet that I import into Access 2007. There is a candidate key (CN). For rows that share the same CN, the data differs across all columns. Example below (the real data has 100 columns, and MsgNum might vary more often; I haven't confirmed this pattern with other instances yet, so although I tried to select on it, the solution should probably ignore that the combination of CN and MsgNum could be unique).
Date        | CN           | MsgNum
----------  | ------------ | ------
2012-01-03  | 111-111-1111 | 101
2012-01-04  | 222-222-2222 | 101
2012-01-05  | 222-222-2222 | 202
2012-01-05  | 333-333-3333 | 101
2012-01-05  | 333-333-3333 | 202
2012-01-04  | 444-444-4444 | 101
2012-01-04  | 444-444-4444 | 101
I do not have access to SQL Server. All I have is Access 2007. I don't want to use Excel's remove duplicates procedure because the data that gets to me comes from Access before being exported to Excel, so I'm trying to find a solution to remove the duplicates through Access.
Using SQL in Query Design in Access, I have tried using a subquery in the WHERE clause that groups by CN, keeping those with a count of 1, but that removes all instances rather than keeping at least one.
I tried selecting just two columns (CN and MIN(MsgNum)), grouping appropriately, and that gives me what I want; but when I run it with all the columns specified (100 columns in all), I still get duplicates.
I tried the Find Duplicates Query Wizard on a single column, returning the rest of the columns, and that works to isolate the duplicates in a view. But since I cannot set up any primary keys, I am not sure how to join the tables. When running the previous MIN query with all columns, I get the same problem as before.
I was trying to set up something in the WHERE clause that compared the combination of two columns, but I read that that cannot be done. So I am at a loss on how to solve this issue, where there is a candidate key but the duplicate records on that column differ in their other columns. What I want is what Excel 2007's Remove Duplicates procedure does: remove duplicates on one column while retaining the others.
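For what it's worth, a hedged Access-flavoured sketch (ImportedData and DedupedData are hypothetical names, and Access's query editor does not accept SQL comments, so the explanation stays up here): grouping on CN and wrapping every other column in the Jet FIRST() aggregate keeps one arbitrary row per CN, much like Excel's Remove Duplicates, though with 100 columns each First() has to be typed out. Date is bracketed because it is a reserved word.

SELECT CN,
       First([Date]) AS First_Date,
       First(MsgNum) AS First_MsgNum
INTO DedupedData
FROM ImportedData
GROUP BY CN;

The INTO turns this into a make-table query, so the de-duplicated rows land in a new table that can be exported back to Excel.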