How to iterate over each row of a CSV for a specific process - Pentaho

The CSV contains contacts with FirstName, LastName, and HouseholdName:
FName LName HouseName
ABC XYZ Family1
PQR XYZ Family1
In my case, multiple contacts can belong to the same family (HouseholdName). For each row I want to:
1 - Check whether the household exists in the table.
2 - If it does not, create the household and create the contact with the new household ID; otherwise, create the contact with the existing household ID found in step 1.
But when the transformation runs, the first step checks the household name for all rows at once. Because the table is empty at that point, the check returns false for every row and a household is created each time.
How can I avoid this situation without manipulating the database? Please suggest.
Thanks in advance.

Related

Database schema for Sales Commissions

I'm trying to create a database with a table of titles, which contains the different titles, a code (a short code for the name), and the commission of each title on the other titles.
I have a table named Title
Id Name Code CommissionOnA CommissionOnEng
1 Admin A 0 15
2 Engineer Eng 1 0
Is it good to have a table schema like this, given that titles will change and can be inserted, updated, or deleted dynamically? With my current approach I have to alter the table and add another column in order to add a commission for each new title.
Is there a better way to do it, keeping in mind that it should also support a multilevel sales hierarchy? A schema for any database is fine, but MySQL is preferred.
The scenario is that the form where the user creates a new title dynamically renders all the titles that exist in the table, each with a textbox, so that the user can enter the commissions of the new title on the other titles.
For instance, if the user creates a new title named "Consultant" with code "C", he should see textboxes for Admin and Engineer, so that when he saves it, a row is created and the table has the following data:
Id Name Code CommissionOnA CommissionOnEng CommissionOnC
1 Admin A 0 15 0
2 Engineer Eng 1 0 0
3 Consultant C 12 5 0
Now I have another table called Employees
Id Name Title ManagerId
1 Rob 1 Null
2 Kate 2 1
3 Eli 3 2
4 Al 2 3
Now when I do the recursion, each time a junior makes a sale, a commission should be transferred to his manager as well as to his manager's manager, based on the commission specified in the title table.
So when Al sells something, Eli should get a commission of 5: Eli is Al's boss and has the title Consultant (3), and an employee with the title Consultant gets a commission of 5 when an employee with the title Engineer (2) sells something.
It's better to normalise your table schemas so you don't need to add new columns. Instead, put those related columns into their own table and join the records via a foreign key.
For example, create a new table named commissions with a column for its unique ID, a column for the ID that relates to the titles table, and a column for the commission amount:
commissions
----------------------------
id (INT, NOT NULL, Primary Key)
titles_id (INT, NOT NULL)
amount (INT, NOT NULL, DEFAULT=0)
and the data would look like:
id titles_id amount
1 1 15
2 2 1
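As a rough sketch of how the normalised layout could serve the manager chain from the question, the Python/sqlite3 snippet below adds one column that the answer does not list: on_titles_id, the title of the person who made the sale. That column, the table and function names, and the reading that every manager up the chain is paid according to the seller's title are all assumptions made for illustration, not part of the answer.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE titles (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            code TEXT NOT NULL
        );
        -- on_titles_id is an assumed extra column: the title of the seller.
        CREATE TABLE commissions (
            id           INTEGER PRIMARY KEY,
            titles_id    INTEGER NOT NULL REFERENCES titles(id),
            on_titles_id INTEGER NOT NULL REFERENCES titles(id),
            amount       INTEGER NOT NULL DEFAULT 0,
            UNIQUE (titles_id, on_titles_id)
        );
        CREATE TABLE employees (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            title_id   INTEGER NOT NULL REFERENCES titles(id),
            manager_id INTEGER REFERENCES employees(id)
        );
        INSERT INTO titles VALUES
            (1, 'Admin', 'A'), (2, 'Engineer', 'Eng'), (3, 'Consultant', 'C');
        -- Only non-zero commissions are stored; a missing row means 0.
        INSERT INTO commissions (titles_id, on_titles_id, amount) VALUES
            (1, 2, 15), (2, 1, 1), (3, 1, 12), (3, 2, 5);
        INSERT INTO employees VALUES
            (1, 'Rob', 1, NULL), (2, 'Kate', 2, 1), (3, 'Eli', 3, 2), (4, 'Al', 2, 3);
    """)
    return conn


def commissions_for_sale(conn, seller_id):
    """Walk up the manager chain and return (manager name, commission) pairs."""
    seller_title, manager_id = conn.execute(
        "SELECT title_id, manager_id FROM employees WHERE id = ?", (seller_id,)
    ).fetchone()
    payouts = []
    while manager_id is not None:
        name, title_id, next_manager = conn.execute(
            "SELECT name, title_id, manager_id FROM employees WHERE id = ?",
            (manager_id,),
        ).fetchone()
        row = conn.execute(
            "SELECT amount FROM commissions WHERE titles_id = ? AND on_titles_id = ?",
            (title_id, seller_title),
        ).fetchone()
        payouts.append((name, row[0] if row else 0))
        manager_id = next_manager
    return payouts


if __name__ == "__main__":
    # Al (Engineer) sells something; his managers are Eli, then Kate, then Rob.
    print(commissions_for_sale(build_demo_db(), 4))
```

With this layout, adding a new title means inserting rows into commissions rather than altering the table.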

How do you Swap details in columns from one ID to another ID?

I have 3 tables: the Person table, the Organisation table, and a link table called PersonOrganisation with PersonId and Organisation keys. I want to swap two people between two organisations after the user inputs the IDs of the two people he wants to switch.
Example: I enter 2 and 6 for the PersonIds, and then the organisation associated with person 2 is swapped with the organisation associated with person 6.
Thanks in Advance
How about this:
update PersonOrganisation
set PersonId = (case when PersonId = 2 then 6 else 2 end)
where PersonId in (2, 6);
In other words, swap the persons, not the organizations.
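A minimal sketch of that swap with user-supplied IDs, run against an in-memory SQLite database from Python. The column name OrganisationId and the sample rows are assumptions for illustration; the question only says the link table has PersonId and Organisation keys.

```python
import sqlite3


def swap_people(conn, first_id, second_id):
    """Swap the two person IDs in the link table in a single UPDATE."""
    conn.execute(
        """
        UPDATE PersonOrganisation
        SET PersonId = CASE WHEN PersonId = :a THEN :b ELSE :a END
        WHERE PersonId IN (:a, :b)
        """,
        {"a": first_id, "b": second_id},
    )


def organisation_of(conn, person_id):
    """Return the organisation currently linked to the given person."""
    return conn.execute(
        "SELECT OrganisationId FROM PersonOrganisation WHERE PersonId = ?",
        (person_id,),
    ).fetchone()[0]


def build_demo_db():
    """Person 2 is in organisation 10, person 6 in 20, person 7 in 30."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PersonOrganisation (PersonId INTEGER, OrganisationId INTEGER)"
    )
    conn.executemany(
        "INSERT INTO PersonOrganisation VALUES (?, ?)",
        [(2, 10), (6, 20), (7, 30)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    swap_people(conn, 2, 6)
    print(organisation_of(conn, 2), organisation_of(conn, 6))  # prints "20 10"
```

Passing the IDs as bound parameters avoids building the statement from raw user input.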

How do I make a query for if value exists in row add a value to another field?

I have a database in Access and I want to add a value to a column at the end of each row based on which hospital the person is in. This value is stored separately: for example, the hospital called "St. James Hospital" has the id "3" in a separate field. How do I do this with a query rather than going through the whole database manually?
example here
Not the best solution, but you can do something like this:
create table new_table as
select id, case when hospital = "St. James Hospital" then 3 else null end as hospital_id
from old_table
Or, the better option would be to create a table with the columns hospital_name and hospital_id. You can then create a foreign key relationship that will create the mapping for you, and enforce data integrity. A join across the two tables will produce what you want.
Read about this here:
http://net.tutsplus.com/tutorials/databases/sql-for-beginners-part-3-database-relationships/
The answer to your question is a JOIN + UPDATE. I am fairly sure that if you had searched, you would have found the link below.
Access DB update one table with value from another
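To illustrate the idea of updating one table with a value looked up in another, here is a small sketch using Python's sqlite3 module. The table and column names are made up for the example. SQLite does not accept the UPDATE ... INNER JOIN ... SET form that Access uses, so the lookup is written as a correlated subquery instead; the effect is the same.

```python
import sqlite3


def build_demo_db():
    """One lookup table of hospitals, one table that stores only the name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hospitals (
            hospital_id INTEGER PRIMARY KEY,
            hospital    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE health_professionals (
            id          INTEGER PRIMARY KEY,
            second_name TEXT NOT NULL,
            hospital    TEXT NOT NULL,
            hospital_id INTEGER
        );
        INSERT INTO hospitals VALUES
            (1, 'Beaumont'), (2, 'St James'), (3, 'Letterkenny');
        INSERT INTO health_professionals (id, second_name, hospital) VALUES
            (1, 'Doe', 'St James'), (2, 'Smith', 'Beaumont'), (3, 'Johnson', 'Letterkenny');
    """)
    return conn


def fill_hospital_ids(conn):
    """Copy the matching hospital_id into every row that does not have one yet."""
    conn.execute("""
        UPDATE health_professionals
        SET hospital_id = (
            SELECT h.hospital_id
            FROM hospitals AS h
            WHERE h.hospital = health_professionals.hospital
        )
        WHERE hospital_id IS NULL
    """)


if __name__ == "__main__":
    conn = build_demo_db()
    fill_hospital_ids(conn)
    for row in conn.execute(
        "SELECT id, hospital, hospital_id FROM health_professionals ORDER BY id"
    ):
        print(row)
```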
You could do this:
update yourTable
set yourFinalColumnWhateverItsNameIs = {your desired value}
where someColumn = 3
Every row in the table that has a 3 in the someColumn column will then have that final column set to your desired value.
If this isn't what you want, please make your question clearer. Are you trying to put the name of the hospital into this table? If so, that is not a good idea and there are better ways to accomplish that.
Furthermore, if every row with a certain value (3) gets this value, you could simply add it to the other (i.e. Hospitals) table. No need to repeat it everywhere in the table that points back to the Hospitals table.
P.S. Here's an example of what I meant:
Let's say you have two tables
HOSPITALS
id
name
city
state
BIRTHS
id
hospitalid
babysname
gender
mothersname
fathername
You could get a baby's city of birth without having to include the City column in the Births table, simply by joining the tables on hospitals.id = births.hospitalid.
After examining your ACCDB file, I suggest you consider setting up the tables differently.
Table Health_Professionals:
ID First Name Second Name Position hospital_id
1 John Doe PI 2
2 Joe Smith Co-PI 1
3 Sarah Johnson Nurse 3
Table Hospitals:
hospital_id Hospital
1 Beaumont
2 St James
3 Letterkenny Hosptial
A key point is to avoid storing both the hospital ID and name in the Health_Professionals table. Store only the ID. When you need to see the name, use the hospital ID to join with the Hospitals table and get the name from there.
A useful side effect of this design is that if anyone ever misspells a hospital name, e.g. "Hosptial", you need to correct that error in only one place. The same holds true whenever a hospital is intentionally renamed.
Based on those tables, the query below returns this result set.
ID Second Name First Name Position hospital_id Hospital
1 Doe John PI 2 St James
3 Johnson Sarah Nurse 3 Letterkenny Hosptial
2 Smith Joe Co-PI 1 Beaumont
SELECT
hp.ID,
hp.[Second Name],
hp.[First Name],
hp.Position,
hp.hospital_id,
h.Hospital
FROM
Health_Professionals AS hp
INNER JOIN Hospitals AS h
ON hp.hospital_id = h.hospital_id
ORDER BY
hp.[Second Name],
hp.[First Name];

MySQL duplicates -- how to specify when two records actually AREN'T duplicates?

I have an interesting problem, and my logic isn't up to the task.
We have a table that sometimes develops duplicate records (for process reasons, and this is unavoidable). Take the following example:
id FirstName LastName PhoneNumber  email
-- --------- -------- ------------ ----------------
1  John      Doe      123-555-1234 jdoe@gmail.com
2  Jane      Smith    123-555-1111 jsmith@foo.com
3  John      Doe      123-555-4321 jdoe@yahoo.com
4  Bob       Jones    123-555-5555 bob@bar.com
5  John      Doe      123-555-0000 jdoe@hotmail.com
6  Mike      Roberts  123-555-9999 roberts@baz.com
7  John      Doe      123-555-1717 wally@domain.com
We find the duplicates this way:
SELECT c1.*
FROM `clients` c1
INNER JOIN (
SELECT `FirstName`, `LastName`, COUNT(*)
FROM `clients`
GROUP BY `FirstName`, `LastName`
HAVING COUNT(*) > 1
) AS c2
ON c1.`FirstName` = c2.`FirstName`
AND c1.`LastName` = c2.`LastName`
This generates the following list of duplicates:
id FirstName LastName PhoneNumber  email
-- --------- -------- ------------ ----------------
1  John      Doe      123-555-1234 jdoe@gmail.com
3  John      Doe      123-555-4321 jdoe@yahoo.com
5  John      Doe      123-555-0000 jdoe@hotmail.com
7  John      Doe      123-555-1717 wally@domain.com
As you can see, based on FirstName and LastName, all of the records are duplicates.
At this point, we actually make a phone call to the client to clear up potential duplicates.
After doing so, we learn (for example) that records 1 and 3 are real duplicates, but records 5 and 7 are actually two different people altogether.
So we merge any extraneously linked data from records 1 and 3 into record 1, remove record 3, and leave records 5 and 7 alone.
Now here's where the problem comes in:
The next time we re-run the "duplicates" query, it will contain the following rows:
id FirstName LastName PhoneNumber  email
-- --------- -------- ------------ ----------------
1  John      Doe      123-555-4321 jdoe@gmail.com
5  John      Doe      123-555-0000 jdoe@hotmail.com
7  John      Doe      123-555-1717 wally@domain.com
They all appear to be duplicates, even though we've previously recognized that they aren't.
How would you go about identifying that these records aren't duplicates?
My first thought is to build a lookup table identifying which records aren't duplicates of each other (for example, {1,5},{1,7},{5,7}), but I have no idea how to build a query that would be able to use this data.
Further, if another duplicate record shows up, it may be a duplicate of 1, 5, or 7, so we would need them all to show back up in the duplicates list so the customer service person can call the person in the new record to find out which record he may be a duplicate of.
I'm stretched to the limit trying to understand this. Any brilliant geniuses out there that would care to take a crack at this?
Interesting problem. Here's my crack at it.
How about if we approach the problem from a slightly different perspective.
Assume the system is clean to start with, i.e. all records currently in the system either have unique first + last name combinations, or the ones sharing a first + last name have already been manually confirmed to be different people.
At the point of entering a NEW user in the system, we have an additional check. It can be implemented as an INSERT trigger or just as another procedure called after the insert is successfully done.
This trigger / procedure matches the first + last name combination of the "inserted" record against all existing records in the table.
For every matching first + last name, it creates an entry in a matching table (a new table) with NewUserID, ExistingMatchingRecordsUserID
From an SQL perspective,
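TABLE MatchingTable
COLUMNS 1. NewUserID 2. ExistingMatchingRecordsUserID
Constraint : Logical PK = NewUserID + ExistingMatchingRecordsUserID
-- :NewUserId is the ID of the record that was just inserted
INSERT INTO MatchingTable (NewUserID, ExistingMatchingRecordsUserID)
SELECT :NewUserId, u.userId FROM User u
WHERE u.firstName = 'John' AND u.lastName = 'Doe' AND u.userId <> :NewUserId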
All entries in MatchingTable need resolution.
When say an Admin logs into the system, the admin sees the list of all entries in MatchingTable
eg: New User John Doe - (ID 345) - 3 Potential matches John Doe - ID 123 ID 231 / ID 256
The admin will check the data for 345 against the data for 123, 231 and 256 and manually confirm whether it is a duplicate of any or none of them.
If it is a duplicate, 345 is deleted from the User table (soft / hard delete - whatever suits you).
If not, the entries for ID 345 are just removed from MatchingTable (I would go with hard deletes here as this is like a transactional temp table, but again anything is fine).
Additionally, when entries for ID 345 are removed from MatchingTable, all other entries in MatchingTable where ExistingMatchingRecordsUserID = 345 are automatically removed, so that already verified data does not need unnecessary manual verification.
Again, this could be a potential DELETE trigger / Just logic executed additionally on DELETE of MatchingTable. The implementation is subject to preference.
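The trigger variant described above could look roughly like the following, sketched with Python's sqlite3 module and SQLite trigger syntax rather than MySQL. The snake_case table, column, and trigger names are placeholders chosen for the example, and resolve_as_distinct stands in for the admin's "not a duplicate" decision.

```python
import sqlite3


def build_db():
    """Create the user table, the matching table, and the insert trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            user_id    INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL
        );
        CREATE TABLE matching_table (
            new_user_id      INTEGER NOT NULL,
            existing_user_id INTEGER NOT NULL,
            PRIMARY KEY (new_user_id, existing_user_id)
        );
    """)
    # After every insert, queue one row per existing record with the same name.
    conn.execute("""
        CREATE TRIGGER queue_potential_duplicates
        AFTER INSERT ON users
        BEGIN
            INSERT INTO matching_table (new_user_id, existing_user_id)
            SELECT NEW.user_id, u.user_id
            FROM users AS u
            WHERE u.first_name = NEW.first_name
              AND u.last_name = NEW.last_name
              AND u.user_id <> NEW.user_id;
        END
    """)
    return conn


def add_user(conn, first_name, last_name):
    """Insert a user and return the generated ID."""
    cur = conn.execute(
        "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
        (first_name, last_name),
    )
    return cur.lastrowid


def pending_matches(conn):
    """Return all (new_user_id, existing_user_id) pairs awaiting review."""
    return conn.execute(
        "SELECT new_user_id, existing_user_id FROM matching_table "
        "ORDER BY new_user_id, existing_user_id"
    ).fetchall()


def resolve_as_distinct(conn, user_id):
    """The admin confirmed this user is not a duplicate: clear its entries."""
    conn.execute(
        "DELETE FROM matching_table WHERE new_user_id = ? OR existing_user_id = ?",
        (user_id, user_id),
    )
```

A later insert with the same name is still matched against every existing record, including ones already confirmed as distinct, which is what lets the admin decide which record the newcomer duplicates.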
At the expense of adding a single byte per row to your table, you could add a manually_verified BOOL column, with a default of FALSE. Set it to TRUE if you have manually verified the data. Then you can simply query where manually_verified = FALSE.
It's simple, effective, and matches what is actually happening in the business processes: you manually verify the data.
If you want to go a step further, you might want to store when the row was verified and who verified it. Since this might be annoying to store in the main table, you could certainly store it in a separate table, and LEFT JOIN in the verification data. You could even create a view to recreate the appearance of a single master table.
To solve the problem of a new duplicate being added: you would check non-verified data against the entire data set. So that means your main table, c1, would have the condition manually_verified = FALSE, but your INNER JOINed table, c2, does not. This way, the unverified data will still find all potential duplicate matches:
SELECT t1.*, t2.*
FROM clients t1
INNER JOIN clients t2
ON t1.FirstName = t2.FirstName AND t1.LastName = t2.LastName AND t1.id <> t2.id
WHERE t1.manually_verified = FALSE
The possible matches for the duplicates will be in the joined table.
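Here is a small runnable sketch of the manually_verified idea, using Python's sqlite3 module with the clients table from the question reduced to the columns that matter. SQLite stores the flag as 0/1, and the sample IDs mirror the state after records 1 and 3 were merged; the new record with ID 8 is made up for the example.

```python
import sqlite3


def build_demo_db():
    """Records 1, 5 and 7 share a name but were verified as different people."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE clients (
            id                INTEGER PRIMARY KEY,
            FirstName         TEXT NOT NULL,
            LastName          TEXT NOT NULL,
            manually_verified INTEGER NOT NULL DEFAULT 0
        )
    """)
    conn.executemany(
        "INSERT INTO clients VALUES (?, ?, ?, ?)",
        [
            (1, "John", "Doe", 1),
            (2, "Jane", "Smith", 1),
            (5, "John", "Doe", 1),
            (7, "John", "Doe", 1),
        ],
    )
    return conn


def unverified_matches(conn):
    """Pair each unverified record with every other record of the same name."""
    return conn.execute("""
        SELECT c1.id, c2.id
        FROM clients AS c1
        INNER JOIN clients AS c2
            ON c1.FirstName = c2.FirstName
           AND c1.LastName = c2.LastName
           AND c1.id <> c2.id
        WHERE c1.manually_verified = 0
        ORDER BY c1.id, c2.id
    """).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(unverified_matches(conn))  # prints []
    # A new, unverified John Doe arrives (manually_verified defaults to 0).
    conn.execute(
        "INSERT INTO clients (id, FirstName, LastName) VALUES (8, 'John', 'Doe')"
    )
    print(unverified_matches(conn))  # prints [(8, 1), (8, 5), (8, 7)]
```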

sql insert and update question..multiple queries in one statement

I have a table called auctions, which has various columns such as username, auction id(the primary key), firstname, lastname, location etc, as well as a category column. The category column is blank by default, unless it is filled in for a particular record by a user.
I have made a new users table, which has username and category columns, as well as additional fields which will be completed by user input.
I would like to know if it is possible when updating a record in the auctions table to have a category, to insert the username and category from that record into the users table as long as the username is not already present in the table.
For example, if I have the following tables:
auctions
auctionid username firstname lastname category
------------------------------------------------------------------------
1 zerocool john henry
2 fredflint fred smith
3 azazal mike cutter
Then, upon updating the second record to have a category like so:
2 fredflint fred smith shoes
The resulting users table should be:
users
username shoes pants belts misc1 misc2
--------------------------------------------------
fredflint true
with no record having existed previously.
If additional auctions exist with the same username in the auctions table, such as:
7 fredflint fred smith belts
Then even if this auction is given a category, a new record should not be inserted into the users table, as the username is already present; however, the existing record should be updated as necessary, resulting in:
username shoes pants belts misc1 misc2
--------------------------------------------------
fredflint true true
What you are looking for is known as a TRIGGER. You can specify something to run after every insert/update in the auctions table and then determine what to do to the users table.
A couple of questions come to mind. The first is, your user table looks denormalized. What happens when you add a new category? Consider a user table in the form of:
id username category
Where you have multiple rows if a user has multiple categories:
1 fredflint shoes
2 fredflint pants
....
The second question I have is, why do you need a user table at all? It looks like all the information in the user table is already stored in the auction table! You can retrieve the user table simply by:
select distinct username, category
from auctions
If you need the separate table, an option is to manually update it when you create a new auction. I'd do it like this (I know just enough about triggers to avoid them):
1 - Make sure there's a row for this user
if not exists (select * from users where username = 'fredflint')
insert into users (username) values ('fredflint')
2 - Make sure he has the shoe category
if not exists (select * from users where username = 'fredflint' and shoes = 1)
update users set shoes = 1 where username = 'fredflint'
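The two steps above can be sketched as a small Python function using sqlite3. INSERT OR IGNORE is SQLite's way of writing "insert only if the row is not there yet" and relies on username being the primary key; the function name and the whitelist of category columns are assumptions for the example. The whitelist is needed because a column name cannot be passed as a bound parameter.

```python
import sqlite3

# Columns of the users table that represent categories (assumed set).
CATEGORY_COLUMNS = ("shoes", "pants", "belts")


def build_demo_db():
    """Create an empty users table with one flag column per category."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE users (
            username TEXT PRIMARY KEY,
            shoes    INTEGER NOT NULL DEFAULT 0,
            pants    INTEGER NOT NULL DEFAULT 0,
            belts    INTEGER NOT NULL DEFAULT 0
        )
    """)
    return conn


def record_category(conn, username, category):
    """Make sure the user exists, then flag the given category for that user."""
    if category not in CATEGORY_COLUMNS:
        raise ValueError(f"unknown category: {category!r}")
    # Step 1: create the row only if this username is not present yet.
    conn.execute("INSERT OR IGNORE INTO users (username) VALUES (?)", (username,))
    # Step 2: set the flag; the column name comes from the whitelist above.
    conn.execute(f"UPDATE users SET {category} = 1 WHERE username = ?", (username,))


if __name__ == "__main__":
    conn = build_demo_db()
    record_category(conn, "fredflint", "shoes")
    record_category(conn, "fredflint", "belts")
    print(conn.execute("SELECT * FROM users").fetchall())
    # prints [('fredflint', 1, 0, 1)]
```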