Best way to remove duplicate records


I have 3 files. The customer file contains customers who never placed an order or had an invoice, and we want to remove those customers from the customer file. I have 2 RPG programs, one for each of the orders and invoice files. They create 2 temporary outfiles containing the records we want to purge.
I want to merge these 2 files. There are duplicates in this sense:
Customer number  Suffix
123456           000
123456           001
123456           002
567890           000
A suffix is present if the customer contacted us a second time, and so on.
So both outfiles can contain these duplicates.
I would like a final file that contains only the customer number.
But I want to do this automatically, in a CL program.
Can this be done in CL, rather than with ad hoc SQL?

Generally speaking, CL is not a database language. Put the ad hoc SQL in a source member and execute it with the CL command RUNSQLSTM. For more dynamic SQL inside a CL program, use RUNSQL.
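As an illustration of the kind of statement that could go in that source member, here is a minimal sketch that runs the merge in an in-memory SQLite database from Python. The table names ORDPURGE and INVPURGE and the CUSTNO/SUFFIX columns are hypothetical stand-ins for the two RPG outfiles; on Db2 for i you would wrap the same UNION query in whatever INSERT or CREATE TABLE statement builds your final file.

```python
import sqlite3

def merge_purge_files(order_rows, invoice_rows):
    """Return each customer number found in either purge outfile, once.

    order_rows / invoice_rows are (custno, suffix) tuples standing in for the
    two RPG outfiles (hypothetical tables ORDPURGE and INVPURGE).
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE ORDPURGE (CUSTNO TEXT, SUFFIX TEXT)")
        con.execute("CREATE TABLE INVPURGE (CUSTNO TEXT, SUFFIX TEXT)")
        con.executemany("INSERT INTO ORDPURGE VALUES (?, ?)", order_rows)
        con.executemany("INSERT INTO INVPURGE VALUES (?, ?)", invoice_rows)
        # The statement you would put in the source member: UNION (unlike
        # UNION ALL) drops duplicate rows, so selecting only CUSTNO collapses
        # every suffix, from both files, into one row per customer.
        rows = con.execute(
            "SELECT CUSTNO FROM ORDPURGE "
            "UNION "
            "SELECT CUSTNO FROM INVPURGE "
            "ORDER BY CUSTNO"
        ).fetchall()
        return [custno for (custno,) in rows]
    finally:
        con.close()
```

If a customer should be purged only when it appears in both outfiles, INTERSECT would take the place of UNION.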
If I recall correctly, this application is creating some archive files and this is the final step. When you create the archive files, it would be easy to also create the 'duplicates' file. I'd consider that the better route, because it also makes it easy to produce a report, spreadsheet, web page, or some other record of the customers you are about to purge.

Related

What is the most efficient way to find the 'inverse' of getting all records that match particular criteria?

I am trying to find the most efficient way to find the 'inverse' of getting all records that match particular criteria,
i.e., find all the predefined criteria in a set that a particular record matches.
I have a table of 'target' criteria with many records. Each one is built using a query-builder JavaScript component, so each target record has its criteria stored as a JSON string in a field.
I also have a standard 'person' table.
It is straightforward to query how many people fit a particular target.
What I am trying to do is get all targets that match a particular person.
Is there a more efficient way than just running each target's criteria against a person?
I'm open to suggestions beyond just SQL, e.g. caching, hashing, or building some kind of lookup table/file.
Edit:
Hopefully the tables below clarify the issue. If I parsed and ran the 'Good Eyesight' target criteria, I would expect it to return both Bob and Sue.
But I want to know that Bob matches the 'Young People' and 'Good Eyesight' targets. I will have thousands of users and probably up to 50 active targets.
Table 1: Person
ID  Name  Age  Fav_Vegetable
---------------------------------
1   Bob   20   Carrot
2   Sue   40   Carrot
Table 2: Target
ID  Name           Criteria_JSON
---------------------------------
1   Young People   {"rule": "young_age", "selectedOperator": "<", "selectedOperand": "Age", "value": "30"}
2   Old People     {"rule": "old_age", "selectedOperator": ">", "selectedOperand": "Age", "value": "30"}
3   Good Eyesight  {"rule": "vegetable", "selectedOperator": "equals", "selectedOperand": "Fav_Vegetable", "value": "Carrot"}

The answer I have come up with is to run all targets against all people and maintain an index type table of the results.
i.e., have a table TargetIndex with columns targetId and personId.
Then, when I need to know the targets for a particular person, I can just check the TargetIndex table rather than rerunning the queries.
Obviously these results would need to be refreshed as the target or person records change: probably whenever a target is added or edited, and periodically (hourly/nightly?) to pick up changes in people.
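A minimal Python sketch of that index approach, with SQLite standing in for the database. It assumes every criteria JSON has the shape shown in the Target table (one operand, one operator, one value) and supports only the three operators that appear there; the OPERATORS mapping and the function names are hypothetical. Rebuilding the index is the expensive part (targets × people evaluations); looking up one person's targets is then a single indexed query.

```python
import json
import operator
import sqlite3

# Hypothetical mapping from the query builder's operator names to Python
# comparisons; only the three operators in the Target table are covered.
OPERATORS = {"<": operator.lt, ">": operator.gt, "equals": operator.eq}

def matches(person, criteria_json):
    """True if one person (a dict of column -> value) satisfies one target."""
    rule = json.loads(criteria_json)
    field_value = person[rule["selectedOperand"]]
    # The JSON stores every value as a string ("30"), so convert it to the
    # type of the person's column before comparing.
    target_value = type(field_value)(rule["value"])
    return OPERATORS[rule["selectedOperator"]](field_value, target_value)

def rebuild_target_index(con, people, targets):
    """Run every target against every person and store the (targetId, personId) hits."""
    con.execute("CREATE TABLE IF NOT EXISTS TargetIndex (targetId INTEGER, personId INTEGER)")
    con.execute("CREATE INDEX IF NOT EXISTS TargetIndex_person ON TargetIndex (personId)")
    con.execute("DELETE FROM TargetIndex")  # full refresh, e.g. nightly
    con.executemany(
        "INSERT INTO TargetIndex (targetId, personId) VALUES (?, ?)",
        [(t["ID"], p["ID"]) for t in targets for p in people
         if matches(p, t["Criteria_JSON"])],
    )
    con.commit()

def targets_for_person(con, person_id):
    """Cheap lookup: no criteria are re-evaluated."""
    rows = con.execute(
        "SELECT targetId FROM TargetIndex WHERE personId = ? ORDER BY targetId",
        (person_id,),
    ).fetchall()
    return [target_id for (target_id,) in rows]
```

With the two tables above loaded as lists of dicts, Bob (ID 1) maps to targets 1 and 3, and Sue (ID 2) to targets 2 and 3.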
Thanks for people's thoughts

Merge multiple rows in fixed width file source into one row

I'm working with the craziest file format I've seen. It is fixed width, and contains multiple record types (in the sense that each row may have different columns and widths). There's a file header, trailer, and then a static number of rows that when put together make up one record. The problem I'm having is that there is nothing in the rows that tell you they belong to the same record other than their sort order and a row number attribute.
Example:
001 David Wellingsworth Mr.
002 312-555-5555 3060 W Maple St. Chicago
001 Jimothy Bogendath Dr.
002 563-555-5432 123 Main St. Davenport
My question is therefore: is it possible, without using a Script Component, to process a file like this? I understand the basic concept of how to handle disparate record types in a fixed width file (making use of conditional splits and substrings), but I can't get past how to join up all this data after the splits if the rows don't have identifiers.
If it helps, my question is basically this previous question but in reverse.

It's possible, but it takes some work. I've worked with data like this, and this is how we solved it:
Build a table that gives each record its own unique RecordID.
Create another table for your files that logs each filename with a unique FileID.
Link your FileID to the RecordID so you know which file each record came from.
Build all your sub-tables linking to each unique RecordID.
Building your tables this way gives you:
A unique RecordID for each record (there may be duplicates in the file, but in your tables they are unique).
Knowledge of which file each record came from.
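The answer doesn't spell out how the RecordID is assigned when the rows carry no identifier. One way, sketched in Python, is to rely on sort order: start a new RecordID at every "001" row and tag each following row with it, together with the FileID. This assumes every record begins with a "001" row and that header and trailer rows use some other row type; DETAIL_ROW_TYPES and the function name are hypothetical.

```python
# Row types that together make up one logical record, in order
# (assumed from the example; header/trailer rows use other types).
DETAIL_ROW_TYPES = ("001", "002")

def assign_record_ids(lines, file_id, next_record_id=1):
    """Tag each detail row as (file_id, record_id, row_type, payload).

    A new RecordID starts at every "001" row, so rows are grouped purely by
    their sort order. Header/trailer rows are skipped. Returns the tagged
    rows and the next unused RecordID (to continue with the next file).
    """
    tagged = []
    record_id = None
    for line in lines:
        row_type = line[:3]
        if row_type not in DETAIL_ROW_TYPES:
            continue  # file header or trailer
        if row_type == DETAIL_ROW_TYPES[0]:
            record_id = next_record_id
            next_record_id += 1
        elif record_id is None:
            raise ValueError(f"{row_type} row appears before any {DETAIL_ROW_TYPES[0]} row")
        # The payload would then be split into columns with substrings,
        # using the layout for this row_type.
        tagged.append((file_id, record_id, row_type, line[4:].rstrip("\n")))
    return tagged, next_record_id
```

Each tagged row can then be routed to the sub-table for its row type, and the shared RecordID joins them back together.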

Database design and composite primary key

I have a table named minibar_bill that I use to keep track of clients' expenditure. I'm trying to build a hotel/pension management system.
I thought that I could make a table
Minibar_bill with (id_bill, id_minibar_product, id_client)
And I would like to add this info to an invoice based on bill_id...
How should I do it?
I mean, I want to have something like this:
Id_bill(1)
id_minibar_product(1,2,3)
id_client(123)
So the first 3 records will be:
1, 1, 123
1, 2, 123
1, 3, 123
And I want the id_bill to be on the invoice... maybe I could switch id_product with id_bill.
Where id_bill(1) would be the first bill record in the database,
id_minibar_product(1,2,3) would be products 1, 2, 3, which have been consumed by the client,
id_client(123) is the client ID, which we use on the invoice to collect data from the Client table in order to print it on the invoice (I will use C# for the UI).
What I have tried:
I've tried to make a db with fields id_bill and id_product, but I think it's the wrong approach, since I made them a composite primary key and I cannot use them as a foreign key in the Invoice table.

Here are some suggestions for your design:
It's a good idea to name things descriptively, but if you create a table called Minibar_bill, that's going to be inconsistent and short-sighted if you want to start charging in-room movies, in-room dining, services, etc. to the room. I suggest you call it something more generic: remove Minibar from all of your table names.
You must never put comma-separated values into a single field.
There are a million sales data models online, including, as already suggested, templates in MS Access. There's no point reinventing the wheel.
I suggest you have something like this:
Client: a list of clients
Products: a list of products you can be billed for (not just minibar)
Bill: a client has zero or more bills (usually one)
BillLine: a bill has zero or more lines. Each line represents one product being charged for on a bill
So Bill is the header. It's up to you whether you add a column indicating when / if it is invoiced, paid, etc., or whether you create a separate invoicing module.
With regards to this comment:
What I wish for is to link Invoice to minibar_bill in order to have, on a single invoice, the status of all minibar products bought by a customer.
If you have a separate invoice table, you can write the BillID to it to link it.
I'm not sure if you understand that all this info exists across different tables, and when, for example, you print an invoice, you go and collect all the info from across the tables at that time.
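A minimal sketch of the suggested layout, using SQLite from Python so it runs anywhere. The table names follow the suggestion above (Client, Products, Bill, BillLine, plus a separate Invoice table); the column names and the invoice_lines function are hypothetical. Because Bill has a single-column key, Invoice can reference it with one BillID column, which sidesteps the composite-key problem from the question. The query shows collecting an invoice's details from across the tables at print time.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Client   (ClientID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL,
                       Price REAL NOT NULL);
-- Bill is the header: one row per bill, with a single-column key.
CREATE TABLE Bill     (BillID    INTEGER PRIMARY KEY,
                       ClientID  INTEGER NOT NULL REFERENCES Client (ClientID));
-- One row per product charged on a bill, instead of "1,2,3" in one field.
CREATE TABLE BillLine (BillLineID INTEGER PRIMARY KEY,
                       BillID     INTEGER NOT NULL REFERENCES Bill (BillID),
                       ProductID  INTEGER NOT NULL REFERENCES Products (ProductID),
                       Quantity   INTEGER NOT NULL DEFAULT 1);
-- A separate invoice table only needs the BillID to reach everything else.
CREATE TABLE Invoice  (InvoiceID INTEGER PRIMARY KEY,
                       BillID    INTEGER NOT NULL REFERENCES Bill (BillID));
"""

def invoice_lines(con, invoice_id):
    """Collect client name, product, quantity and line total for one invoice."""
    return con.execute(
        """
        SELECT c.Name, p.Name, bl.Quantity, bl.Quantity * p.Price
        FROM Invoice i
        JOIN Bill     b  ON b.BillID    = i.BillID
        JOIN Client   c  ON c.ClientID  = b.ClientID
        JOIN BillLine bl ON bl.BillID   = b.BillID
        JOIN Products p  ON p.ProductID = bl.ProductID
        WHERE i.InvoiceID = ?
        ORDER BY bl.BillLineID
        """,
        (invoice_id,),
    ).fetchall()
```

With the data from the question (bill 1, products 1, 2, 3, client 123), that becomes one Bill row and three BillLine rows, and the Invoice row simply points at BillID 1.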

VBA: track new equity (name) in a transaction statement

We have an investment account, with mainly equities. We can download transaction history in xlsx format.
The table format looks like this:
Date   Activity  Symbols/description  Price
10/29  Buy       FLY/leasing LTD      13.92
10/22  Sell      BRFRF/BURFORD cap    1.99
10/20  Buy       BRFRF/BURFORD cap    1.92
10/17  NameChg   JGWPT/JGWPT Holding
10/13  Buy       JGW/JG Wentworth     5.31
Now, I am going to write a VBA module to:
Generate a list in a separate tab (worksheet) that keeps the UNIQUE symbols -- the names of the equities in my portfolio.
Consider only 'Buy' and 'Sell' transactions; all other activities should be ignored.
Whenever the transaction history is updated, check for any new symbols (I don't care whether it is fully automatic or triggered by clicking a button) and add them to the list.
I am sure I can do it with a lot of conditional statements and loops, but I believe there must be a better way.
Any ideas would be helpful. Thanks so much.

Split Symbols/description on the "/" using Text to Columns. Make sure you preserve the price.
Use Microsoft Query on the Excel file in question to pull the relevant information into a table in the second sheet.
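The update logic itself needs only one loop and a set, not a pile of conditionals. Here it is sketched in Python rather than VBA; the function name update_symbol_list and the row layout (date, activity, symbol/description, price) are assumptions based on the sample table. It keeps only Buy/Sell rows, takes the text before the "/", and appends any symbol not already on the list, so it can be re-run every time the history is refreshed. In VBA, a Scripting.Dictionary would play the role of the set.

```python
def update_symbol_list(transactions, known_symbols):
    """Append unseen Buy/Sell symbols to known_symbols, preserving order.

    transactions: (date, activity, symbol_description, price) rows, e.g.
    ("10/29", "Buy", "FLY/leasing LTD", 13.92).
    """
    symbols = list(known_symbols)
    seen = set(symbols)  # fast "is it already on the list?" check
    for _date, activity, symbol_description, _price in transactions:
        if activity.strip().lower() not in ("buy", "sell"):
            continue  # NameChg and any other activity is ignored
        # The symbol is the part before the "/" (same split as Text to Columns).
        symbol = symbol_description.split("/", 1)[0].strip()
        if symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols
```

Applied to the sample rows with an empty list, this yields FLY, BRFRF and JGW; JGWPT is skipped because its only row is a NameChg.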

Fortran 90 OPEN file

I've been working on my project about bank account transactions (withdrawal, deposit, check cashing, and balance inquiry) using "account.txt".
My TA said that I have to use a temporary file, which will be read line by line to find what the user is looking for. However, I don't understand this temporary OPEN file at all. Could anyone explain what it is and, if possible, attach an example of it?
Here are the project instructions:
This project is about writing a program to perform transactions on bank accounts. You will be given a file that contains all the accounts in the bank (the file is named "account.txt"). Your program is to provide an interactive menu for users to perform transactions on these accounts. Your program needs to update the account file after each transaction. The user may attempt transactions on accounts that do not exist; your program needs to print an error message on the screen and return to the menu. In addition, your program needs to print whether each transaction is successful. For an unsuccessful transaction, your program must print the reason it failed.
Your program needs to be able to handle the following transactions:
Deposit money into an account
Withdraw money from an account
Check cashed against an account
Balance inquiry of an account
There is a limit on how many checks can be cashed against a saving account. The limit is 2 checks per month. There is a $0.25 penalty for each check cashed over the limit. If there are enough funds to cash the check but not to cover the penalty, the transaction should go through and the resulting balance should be zero.
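The check-cashing rule is easy to get wrong at the edges, so here is a minimal Python sketch of it. The function name cash_check and its return tuple are hypothetical. It assumes a check fails only when the balance cannot cover the check itself, that the limit applies only to saving ("S") accounts, and that the checks-cashed counter is incremented for every cashed check.

```python
CHECK_LIMIT = 2   # checks per month on a saving account before penalties
PENALTY = 0.25    # charged for each check cashed over the limit

def cash_check(account_type, balance, checks_cashed, amount):
    """Return (success, new_balance, new_checks_cashed, message)."""
    if amount > balance:
        # Not even the check itself can be covered: the transaction fails.
        return False, balance, checks_cashed, "insufficient funds"
    penalty = 0.0
    if account_type == "S" and checks_cashed >= CHECK_LIMIT:
        penalty = PENALTY  # this is the 3rd or later check this month
    # Enough to cover the check but not the penalty: the check still goes
    # through and the balance bottoms out at zero.
    new_balance = round(max(balance - amount - penalty, 0.0), 3)
    return True, new_balance, checks_cashed + 1, "check cashed"
```

For example, cashing a $10.00 check against a saving account holding exactly $10.00 that has already cashed 2 checks this month succeeds and leaves a balance of 0.0.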
Here is the format in the account file for one account (data fields are separated by exactly one space):
Account type, S for saving, C for checking (1 character)
Account number of 5 digits
Last name of account holder (15 characters)
First name of account holder (15 characters)
Balance of the account in the form xxxxx.xxx
An integer field indicating how many checks have been cashed this month (three digits)
An interest rate in the form of xx.xx (e.g. 10.01 = 10.01%)
For names with fewer than 15 characters, the data will be padded to have width of 15 characters.
Here is an example of the account file:
C 12345 Smith John 100.000 10 0.00
S 45834 Doe Jane 3462.340 0 0.30
C 58978 Bond Jones 13.320 5 0.00
*Creating a temporary file
There is a way in FORTRAN to create a temporary file. Use:
OPEN(UNIT = , STATUS = "SCRATCH", ...)
There is no need to provide FILE = "". By using a temporary file, you can copy the accounts from the account file to the temporary file. Then, when you copy the data back from the temporary file to the account file, perform the necessary transactions. Your program should not copy accounts between these two files if the transaction is going to fail.
Please forgive my English; I'm Japanese.

They are saying that with a statement such as:
OPEN (UNIT = 7, STATUS = 'SCRATCH', ACCESS = 'SEQUENTIAL', FORM = 'FORMATTED')
you can create a temporary file: one that only lives until you close it (or the program ends) and is then deleted rather than kept on disk. This file needs no name (it's never going to be referred to by name), just a unit number (7 in my example). Direct access would also work, but then the OPEN must also give a record length with RECL =; sequential access is the natural fit for reading the file line by line.
You can use this file to hold the account information temporarily during a transaction. You need it because, when you insert rows into the real file, you don't want to overwrite the data that follows them. So they are saying:
Copy everything to a temporary file
If the transaction succeeds, copy the data back to the main file but
Omit rows that are to be deleted
Add in the rows that are to be inserted
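A minimal Python sketch of the copy-to-scratch-then-copy-back pattern, using a deposit as the transaction. Python's tempfile.TemporaryFile plays the role of the Fortran SCRATCH unit (unnamed, deleted automatically when closed). The function name deposit and the whitespace-split parsing are assumptions; the field order follows the account file format in the project instructions, and only the changed line is rewritten, with the names padded back to 15 characters.

```python
import tempfile

def deposit(path, account_no, amount):
    """Deposit amount into account_no; return True on success, False otherwise.

    Field order assumed from the project spec: type, number, last name,
    first name, balance, checks cashed, interest rate.
    """
    if amount <= 0:
        return False  # nothing to deposit: report a failed transaction
    # Like a Fortran STATUS='SCRATCH' unit: no file name, deleted when closed.
    with tempfile.TemporaryFile(mode="w+") as scratch:
        found = False
        # Step 1: copy every account from the real file to the scratch file.
        with open(path) as account_file:
            for line in account_file:
                scratch.write(line)
                fields = line.split()
                if len(fields) >= 7 and fields[1] == account_no:
                    found = True
        if not found:
            # Failed transaction: the account file is never rewritten.
            return False
        # Step 2: copy back, performing the transaction on the matching row.
        scratch.seek(0)
        with open(path, "w") as account_file:
            for line in scratch:
                fields = line.split()
                if len(fields) >= 7 and fields[1] == account_no:
                    fields[4] = f"{float(fields[4]) + amount:.3f}"
                    line = "{} {} {:<15} {:<15} {} {} {}\n".format(*fields[:7])
                account_file.write(line)
    return True
```

A withdrawal or a cashed check follows the same two steps; only the change made to the matching row in step 2 differs.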
Does that help?

We Keep Coding
