Consolidate duplicate clients and sales

I can't come up with a good way to solve the following problem. This is on SQL Server 2008 R2.
I have 3 tables:
Client, Invoice, Car
Each client is duplicated n times, and each duplicate has its own ticket (invoice) and one product (car).
I am trying to consolidate the clients under a unique identifier. My question is how to update the reference field on the product and the ticket so that they point to the consolidated client.
Example
**Client**
[Nombre]
,[Apellido_Paterno]
,[Apellido_Materno]
,[Sexo]
,[Estado_Civil]
,[Fecha_Nacimiento]
,[RFC]
,[Saludo]
,[Persona]
,[Razon_Social]
,[Direccion]
,[Colonia]
,[Municipio_Delegacion]
,[Estado]
,[Codigo_Postal]
,[Lada_Casa]
,[Telefono_Casa]
,[Ext_Telefono_Casa]
,[Lada_Oficina]
,[Telefono_Oficina]
,[Ext_Telefono_Oficina]
,[Telefono_Celular]
,[Email_Personal]
,[Vehiculo_Actual_Anterior]
,[Marca_Actual_Anterior]
,[AnioModelo_Actual_Anterior]
,[Color_Actual_Anterior]
,[Escolaridad]
,[Venta_Id]
,[Nombre1]
,[Nombre2]
**Invoice**
[Factura_Cliente]
,[Factura_Distribuidor]
,[Precio_Base_Vehiculo]
,[Precio_Accesorios]
,[Precio_Vehiculo_DeContado]
,[Descuento]
,[Incentivo_Calculado]
,[Fecha_Entrega_DelVehiculo]
,[Fecha_Factura_Cliente]
,[Clave_Distribuidor]
,[Seguro_Gratis]
,[Clave_Promocion]
,[Tipo_Venta]
,[Unidad_de_Intercambio]
,[Venta_Id]
**Car**
[Modelo]
,[Marca]
,[AnioModelo]
,[Basico]
,[Cabecera]
,[Version]
,[Color_Exterior]
,[Color_Interior]
,[VIN]
,[Motor]
,[Transmision]
,[Origen]
,[Basico_Linea_Modelo]
,[Venta_Id]
I can consolidate the clients (even if there are discrepancies in their fields), but I can't find an effective way to update the references.
Edit: The first column on client shows the duplicate, while the second is the id that matches the ticket and product.

I would make two more tables:
one for the new consolidated clients (with all the same fields as the normal client table),
and a mapping table between the old clients and the new clients.
The mapping table should store the id of each old client and the id of the new client it was mapped to.
From here it's pretty easy: update the other tables with the new id, looked up in the mapping table by their current old id.
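The mapping-table idea can be sketched as follows. This is a minimal illustration in Python against an in-memory SQLite database, not the SQL Server 2008 R2 schema from the question. The table and column names (client_old, client_new, client_map, invoice.old_client_id, invoice.new_client_id) are hypothetical, and grouping duplicates by RFC is an assumption; use whatever rule you already consolidate by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client_old (id INTEGER PRIMARY KEY, rfc TEXT, nombre TEXT);
CREATE TABLE client_new (id INTEGER PRIMARY KEY, rfc TEXT, nombre TEXT);
CREATE TABLE client_map (old_id INTEGER PRIMARY KEY, new_id INTEGER NOT NULL);
CREATE TABLE invoice (id INTEGER PRIMARY KEY, old_client_id INTEGER, new_client_id INTEGER);

-- Sample data: clients 1 and 2 are the same person entered twice.
INSERT INTO client_old VALUES
  (1, 'AAA010101', 'Ana'), (2, 'AAA010101', 'Ana M.'), (3, 'BBB020202', 'Luis');
INSERT INTO invoice (id, old_client_id) VALUES (10, 1), (11, 2), (12, 3);

-- 1. One consolidated row per duplicate group (grouping by RFC is an assumption).
INSERT INTO client_new (rfc, nombre)
SELECT rfc, MIN(nombre) FROM client_old GROUP BY rfc;

-- 2. Record which new client each old client was folded into.
INSERT INTO client_map (old_id, new_id)
SELECT o.id, n.id FROM client_old AS o JOIN client_new AS n ON n.rfc = o.rfc;

-- 3. Repoint the dependent table through the map.
UPDATE invoice
SET new_client_id = (SELECT m.new_id FROM client_map AS m
                     WHERE m.old_id = invoice.old_client_id);
""")

# Invoice id -> consolidated client id
new_ids = dict(conn.execute("SELECT id, new_client_id FROM invoice"))
print(new_ids)
```

The same UPDATE would be repeated for every table that references the client (the car table in the question). Writing the new id to a separate column, rather than overwriting the old one, keeps the old reference around until the result has been checked.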

Related

How to use loop to find related object using Pentaho Data Integration

I want to identify bad/invalid records so that I can add them to a separate SQL table. For example, we have an account object, and I want to find bad accounts. But I need to apply some filters on the contact object: if the conditions on the contacts are satisfied, I want to insert those invalid account records into the SQL table.
I don't want to query contact directly. I want to query by account, but the conditions should come from contact.
Does anyone know the best way to perform a loop in Pentaho? Check each contact record: if all of an account's contacts satisfy the condition, add the account id to the table. If any one of the contact records doesn't satisfy the condition, the relevant account should not be added to the SQL table.
For example:
Account "A" has 10 contacts.
If the email field is empty on all 10 contacts, add the account to the SQL table (as bad data).
If two of the contact records have the email field populated and 8 are blank, the account id shouldn't be added to the SQL table.
How can we best implement this scenario using Pentaho? Any help is appreciated.
Thanks

You don't need a loop for this. You can create a transformation along these lines:
Start with a query that returns the contacts of the different accounts.
Order the query data by account.
Group the rows by account and calculate the maximum ContactMail. If all the contact mails of an account are null, the max will be null (the result of that step is shown in the Preview data part of my screenshot).
Filter rows on the condition MaxContactMail IS NOT NULL. The rows that fail it (MaxContactMail is null) are the accounts where every contact's email is empty, so those are the ones to send to the SQL table.
These could be the basic steps; you'll need to add more steps or perform more than one transformation depending on the complexity of your data.
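To check the logic outside Pentaho, the group-and-filter steps above can be sketched in plain Python. This is only an illustration of the rule, not Pentaho code: the function name bad_accounts and the sample data are made up, and treating an empty or whitespace-only string the same as a missing email is an assumption.

```python
from collections import defaultdict

def bad_accounts(contact_rows):
    """Return the ids of accounts where no contact has an email.

    contact_rows: iterable of (account_id, email) pairs, one per contact.
    """
    has_email = defaultdict(bool)  # account_id -> True once any contact has an email
    for account_id, email in contact_rows:
        # Assumption: None, "" and whitespace-only all count as "empty".
        has_email[account_id] |= bool(email and email.strip())
    return sorted(acc for acc, ok in has_email.items() if not ok)

contacts = [
    ("A", None), ("A", ""),               # every contact on A is blank -> bad
    ("B", None), ("B", "b@example.com"),  # one contact on B has an email -> fine
]
result = bad_accounts(contacts)
print(result)  # ['A']
```

This is the same rule as taking the maximum per group: one populated email is enough to keep the account out of the bad-data table.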

Add rows in table only if those rows do not already exist in that same table

I have a bunch of .csv files containing client information about their cloud usage. Each .csv contains data about one client and each file tracks what the client has used for the day.
I have one year's worth of those files for every client. For instance: I have client1__05022020, client1_06022020, client2_05022020, etc., up to client1_06022021, and likewise for all the other clients.
I have a temporary table named Cloud__TMP. I load 1 raw .csv file into this table, and then a stored procedure runs SQL queries which fill other tables, such as my "client" table. Then I delete what's in Cloud__TMP, load another .csv, and so on.
As I said, I have a "client" table with id (primary key), idCloud (nvarchar), clientName (nvarchar). I want to fill it with data from Cloud__TMP. Basically, there are 2 columns in Cloud__TMP that are of interest for my "client" table : Org (which corresponds to idCloud) and OrgFullName (which corresponds to clientName).
Edit: It's important to note that each file contains a multitude of rows, since it tracks cloud usage (every time the client's usage goes up or down, a new row is added to the .csv file). So my columns OrgFullName and idCloud are filled with the same client and the same cloud id many, many times.
My issue is that I have trouble making it so there are no duplicates in my "client" table. I tried a bunch of requests but I'm not very good at it and could use your help.
Here's one request I've tried:
INSERT INTO client (clientName, idCloud)
SELECT OrgFullName, Org
FROM Cloud__TMP AS Cloud
WHERE NOT EXISTS (SELECT * FROM client WHERE Cloud.Org = client.idCloud)

Your insert is fine . . . unless the incoming data has duplicates. You may want to remove those as well:
INSERT INTO client (clientName, idCloud)
SELECT MAX(OrgFullName), Org
FROM Cloud__TMP AS Cloud
WHERE NOT EXISTS (SELECT * FROM client WHERE Cloud.Org = client.idCloud)
GROUP BY Org;
This arbitrarily chooses one value when the incoming data has duplicates.
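A quick way to convince yourself that the grouped insert leaves no duplicates is to run it twice against a small sample. The sketch below uses Python with an in-memory SQLite database rather than SQL Server, so the table definitions and sample rows are assumptions; the INSERT itself is the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (id INTEGER PRIMARY KEY, idCloud TEXT, clientName TEXT);
CREATE TABLE Cloud__TMP (Org TEXT, OrgFullName TEXT);

-- org1 is already known; org2 appears twice in the incoming file.
INSERT INTO client (idCloud, clientName) VALUES ('org1', 'Client One');
INSERT INTO Cloud__TMP (Org, OrgFullName) VALUES
  ('org1', 'Client One'),
  ('org2', 'Client Two'),
  ('org2', 'Client Two'),
  ('org3', 'Client Three');
""")

INSERT_NEW_CLIENTS = """
INSERT INTO client (clientName, idCloud)
SELECT MAX(OrgFullName), Org
FROM Cloud__TMP AS Cloud
WHERE NOT EXISTS (SELECT * FROM client WHERE Cloud.Org = client.idCloud)
GROUP BY Org
"""

conn.execute(INSERT_NEW_CLIENTS)
conn.execute(INSERT_NEW_CLIENTS)  # second run finds nothing new to insert

cloud_ids = [row[0] for row in conn.execute("SELECT idCloud FROM client ORDER BY idCloud")]
print(cloud_ids)  # ['org1', 'org2', 'org3']
```

A unique constraint on idCloud would additionally make the database reject a duplicate if one ever slipped through.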

SSIS Inserting incrementing ID with starting range into multiple tables at a time

Are there one or more reliable ways to solve this simple task?
I've got a number of XML files which will be converted into 6 SQL tables (via SSIS).
Before the end of this process I need to add a new column (in fact, one common to all tables) to each of them.
This column represents an ID with an assigned starting value and an increment step of +1, like (350000, 1).
Yes, I know how to solve it at the SQL stage in SSMS. But I need a solution at the SSIS level, before the data reaches SQL.
I'm sure there should be well-known patterns for dealing with this.

I am going to take a stab at this. Just to be clear, I don't have a lot of information in your question to go on.
Most XML files that I have dealt with have a common element (let's call it a customer) with one to many attributes (this can be invoices, addresses, email, contacts, etc).
So your table structure will be somewhat star shaped around the customer.
So your XML will have a core customer information on a 1 to 1 basis that can be loaded into a single main table, and will have array information of invoices and an array of addresses etc. Those arrays would be their own tables referencing the customer as a key.
I think you are asking how to create that key.
Load the customer data first and return the identity column to be used as a foreign key when loading the other tables.
I find it easiest to do so in a script component. I'm only going to explain how to get the key back. I personally would handle the whole process in C# (deserializing and all).
Add this to the using block:
using System.Data.OleDb;
Add this to your main or row-processing method, depending on where the script task / component is:
string SQL = @"INSERT INTO Customer(CustName,field1, field2,...)
values(?,?,?,...); Select cast(scope_identity() as int);";
OleDbCommand cmd = new OleDbCommand();
cmd.CommandType = System.Data.CommandType.Text;
cmd.CommandText = SQL;
// cmd.Connection must be set to your OleDbConnection before it is opened below
cmd.Parameters.AddWithValue("@p1", CustName); // OleDb binds parameters to the ? placeholders by position
...
cmd.Connection.Open();
int CustomerKey = (int)cmd.ExecuteScalar(); // ExecuteScalar returns the first column of the first row, which here is scope_identity()
cmd.Connection.Close();
Now you can use CustomerKey for all of the other tables.
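The pattern itself (insert the parent row, read back its generated key, use that key for the child rows) is not specific to SSIS or OleDb. Here is a minimal sketch of it in Python with an in-memory SQLite database, where cursor.lastrowid plays the role that scope_identity() plays on SQL Server. The Invoice table, its columns and the load_customer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerKey INTEGER PRIMARY KEY, CustName TEXT);
CREATE TABLE Invoice (
  InvoiceKey  INTEGER PRIMARY KEY,
  CustomerKey INTEGER REFERENCES Customer(CustomerKey),
  Amount      REAL
);
""")

def load_customer(conn, cust_name, invoice_amounts):
    """Insert one customer, then its invoices keyed by the generated id."""
    cur = conn.execute("INSERT INTO Customer (CustName) VALUES (?)", (cust_name,))
    customer_key = cur.lastrowid  # key generated for the row just inserted
    conn.executemany(
        "INSERT INTO Invoice (CustomerKey, Amount) VALUES (?, ?)",
        [(customer_key, amount) for amount in invoice_amounts],
    )
    return customer_key

key = load_customer(conn, "Acme", [10.0, 25.5])
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM Invoice WHERE CustomerKey = ?", (key,)
).fetchone()[0]
print(key, invoice_count)
```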

Databases design and primary key composed

I have a table named minibar_bill that I use to keep a record of clients' expenditure. I'm trying to build a hotel/pension management system.
I thought that I could make a table
Minibar_bill with (id_bill, id_minibar_product, id_client)
and I would like to add that info to an invoice based on bill_id...
How should I do it?
I mean I want to have something like this:
Id_bill(1)
id_minibar_product(1,2,3)
id_client(123)
So first 3 records will be :
1, 1, 123
1, 2, 123
1, 3, 123
And I want id_bill to be on the invoice... maybe I could switch id_product with id_bill
where id_bill(1) would be the first bill record in the database,
id_minibar_product(1,2,3) would be products 1, 2, 3, which have been consumed by the client,
id_client(123) would be the client id which we use on the invoice to collect data from the Client table in order to print it on the invoice (I will use C# for the UI).
What I have tried:
I've tried to make a db with the fields id_bill and id_product, but I think it's the wrong approach, since I made them a composite primary key and I cannot use them as a foreign key in the Invoice table.

Here are some suggestions for your design:
It's a good idea to name things descriptively, but if you create a table called Minibar_bill, that's going to be inconsistent and short sighted if you want to start charging in-room movies and in-room dining, services etc. to the room. I suggest you call it something more generic - remove Minibar from all of your table names.
You must never put comma separated values into a single field.
There are a million sales data models online, including, as already suggested, templates in MS Access. There's no point reinventing the wheel.
I suggest you have something like this:
Client: a list of clients
Products: a list of products you can be billed for (not just minibar)
Bill: a client has zero or more bills (usually one)
BillLine: a bill has zero or more lines; each line represents one product being charged for on a bill
So Bill is the header. It's up to you whether you add a column indicating when / if it is invoiced, paid etc., or whether you want to create a separate invoicing module.
With regards to this comment:
What i wish for is to link Invoice to minibar_bill in order to have the status on a single Invoice of all products from minibar which have been bought by a customer.
If you have a separate invoice table you can write the BillID to it to link it.
I'm not sure if you understand that all this info exists across different tables, and when, for example, you print an invoice, you go and collect all the info from across the tables at that time.
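As a rough sketch of that header/line layout, here is one way the four tables and the "collect everything at print time" query could look. It is written in Python against an in-memory SQLite database purely for illustration; the column names, prices and sample rows are assumptions, and the table is called Product here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client  (ClientID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Price REAL NOT NULL);
CREATE TABLE Bill    (BillID    INTEGER PRIMARY KEY,
                      ClientID  INTEGER NOT NULL REFERENCES Client(ClientID));
CREATE TABLE BillLine (
  BillLineID INTEGER PRIMARY KEY,
  BillID     INTEGER NOT NULL REFERENCES Bill(BillID),
  ProductID  INTEGER NOT NULL REFERENCES Product(ProductID)
);

-- The example from the question: bill 1, products 1-3, client 123.
INSERT INTO Client  VALUES (123, 'Guest');
INSERT INTO Product VALUES (1, 'Water', 2.0), (2, 'Juice', 3.5), (3, 'Snack', 4.0);
INSERT INTO Bill    VALUES (1, 123);
INSERT INTO BillLine (BillID, ProductID) VALUES (1, 1), (1, 2), (1, 3);
""")

# Everything needed to print the invoice is gathered with joins at print time.
lines = conn.execute("""
SELECT c.Name, p.Name, p.Price
FROM Bill AS b
JOIN Client   AS c  ON c.ClientID  = b.ClientID
JOIN BillLine AS bl ON bl.BillID   = b.BillID
JOIN Product  AS p  ON p.ProductID = bl.ProductID
WHERE b.BillID = ?
ORDER BY bl.BillLineID
""", (1,)).fetchall()

total = sum(price for _, _, price in lines)
print(lines, total)
```

Because BillLine has its own single-column key and Bill carries the client, an Invoice table only needs to store BillID, which avoids the composite-key problem described in the question.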

Duplicate a record and its references in web2py

In my web2py application I have a requirement to duplicate a record and all its references.
For example:
One user has a product (sponserid is the user), and this product has many features stored in other tables (which reference the product id).
My requirement is that if another user copies this product, a new record is generated in the product table with a new productid and a new sponserid, and all the records in the referencing tables are also duplicated with the new product id. Effectively, a duplicate entry is created in all the tables; the only changes are the product id and sponserid.
The product table fields will change, so I have to write a dynamic query.
I would like to write code like the following:
product = db(db.tbl_product.id==productid).select(db.tbl_product.ALL).first()
newproduct = db.tbl_product.insert(sponserid=newsponserid)
for field,value in product.iteritems():
    if field!='sponserid':
        db(db.tbl_product.id==newproduct).update(field=value)
But I cannot refer to a field name like this in the update function.
Also I would like to know if there is any other better logic to achieve this requirement.
I would greatly appreciate any suggestions.

For the specific problem of using the .update() method when the field name is stored in a variable, you can do:
db(db.tbl_product.id==newproduct).update(**{field: value})
But an easier approach altogether would be something like this:
product = db(db.tbl_product.id==productid).select(db.tbl_product.ALL).first()
product.update(sponserid=newsponserid)
db.tbl_product.insert(**db.tbl_product._filter_fields(product))
The .update() method applied to the Row object updates only the Row object, not the original record in the db. The ._filter_fields() method of the table takes a record (Row, Storage, or plain dict) and returns a dict including only the fields that belong to the table (it also filters out the id field, which the db will auto-generate).
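The reason update(field=value) does not work is ordinary Python keyword-argument behaviour, which can be seen without web2py. In the sketch below, update is a stand-in function, not the DAL method, and copy_record is a hypothetical helper that only mimics the filtering described above on plain dicts.

```python
def update(**fields):
    """Stand-in for any keyword-argument API; returns what it received."""
    return fields

field, value = "price", 10
literal = update(field=value)       # the keyword is literally named "field"
dynamic = update(**{field: value})  # the keyword name is taken from the variable

def copy_record(record, table_fields, **overrides):
    """Build the values for a duplicate row from an existing record (a dict).

    Keeps only keys that are real table fields, drops 'id' so the database
    can generate a new one, then applies the overrides.
    """
    new_values = {k: v for k, v in record.items() if k in table_fields and k != "id"}
    new_values.update(overrides)
    return new_values

product = {"id": 7, "name": "Widget", "price": 10, "sponserid": 1, "update_record": None}
duplicate = copy_record(product, {"id", "name", "price", "sponserid"}, sponserid=2)
print(literal, dynamic, duplicate)
```

The rows in the referencing tables can be copied the same way, overriding their product reference with the id returned by the new product insert.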
