Database changes between two databases - sql

I have two Microsoft Access 2007 databases. They both have tables that are very similar, there are a couple of columns that are different, but most of the structure is the same.
I exported one of the tables (table A) to Excel and filtered, getting a list of rows and their keys.
What I want to do is update rows in the other table (table B) where the keys match. I can't filter table B using the same logic because it doesn't have the columns needed to filter that way.
What will happen is that a cell in table B will be set to the value from table A whenever the row's key matches a key in the filtered table A.
My idea was to import both tables into a C# application and make the changes programmatically, but if there is an easier way to update one table based on a table in a different database, possibly SQL-based (or using something in Excel), I would like to use it so I can finish faster.

You can write cross-database SQL queries in MS Access; tables in another database file are referenced by file location, as explained here.
So you could do the same with an update query. Note that Access uses the UPDATE ... INNER JOIN ... SET order, not SQL Server's UPDATE ... FROM syntax:
UPDATE [c:\data\Sales2006.accdb].Sales AS s2006
INNER JOIN [c:\data\Sales2007.accdb].Sales AS s2007
ON s2006.id = s2007.id
SET s2006.col1 = s2007.col1,
    s2006.col2 = s2007.col2;
This sounds to me like a situation to avoid, however.
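A runnable parallel of the cross-database update idea, using SQLite through Python: ATTACH plays the role of Access's file-location reference, and a correlated subquery does the join-based update. The database and table names here mirror the example above but are otherwise illustrative; in-memory databases stand in for the two files.

```python
import sqlite3

con = sqlite3.connect(":memory:")          # stands in for Sales2006
con.execute("ATTACH ':memory:' AS s2007")  # stands in for Sales2007

con.executescript("""
    CREATE TABLE Sales (id INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE s2007.Sales (id INTEGER PRIMARY KEY, col1 TEXT);
    INSERT INTO Sales VALUES (1, 'old'), (2, 'untouched');
    INSERT INTO s2007.Sales VALUES (1, 'new');

    -- update rows in the main db from the attached db where keys match
    UPDATE Sales
    SET col1 = (SELECT col1 FROM s2007.Sales WHERE s2007.Sales.id = Sales.id)
    WHERE id IN (SELECT id FROM s2007.Sales);
""")

rows = con.execute("SELECT id, col1 FROM Sales ORDER BY id").fetchall()
# rows → [(1, 'new'), (2, 'untouched')]
```

The correlated-subquery form is used here because it works on any SQLite version; the JOIN-style update in the answer above is the Access equivalent.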

Related

How do I implement an "add if not present in the database" in Pentaho?

What steps do I use to create a transformation that compares a table and a list? For example, a database table named Schools and an Excel file with a huge list of school names.
If an entry in the Excel file is not present in the database, it should be added to the database table.
I'm not quite sure I can use the Database Lookup step, since it doesn't tell you when a lookup fails. The Insert/Update step doesn't seem to be a solution either, because it requires an ID value, and no ID is present in the list of schools in the Excel file.
Based on the information you provided, a simple join with a Table Output step will do the task. You can use the Merge Rows (diff) step to compare the two data streams (Excel and database). The Merge Rows step compares the two streams on a key and adds a flag field that marks each row as new, identical, changed, or deleted. In your case, you want to insert all the rows flagged as new, using a Table Output step.
Please check the below links for more reference.
Merge rows, Synchronize after merge
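The Merge Rows comparison logic described above can be sketched in plain Python: compare the Excel stream against the database stream on a key, flag each row, and keep only the rows flagged as new. The school names and the key field are invented for illustration.

```python
def merge_rows(db_rows, excel_rows, key="name"):
    """Flag each Excel row as 'new', 'identical', or 'changed'
    by comparing it against the database stream on the key field."""
    db_by_key = {row[key]: row for row in db_rows}
    flagged = []
    for row in excel_rows:
        existing = db_by_key.get(row[key])
        if existing is None:
            flagged.append((row, "new"))
        elif existing == row:
            flagged.append((row, "identical"))
        else:
            flagged.append((row, "changed"))
    return flagged

db = [{"name": "Central High"}, {"name": "Westside Academy"}]
excel = [{"name": "Central High"}, {"name": "North Prep"}]

# keep only the rows the database has never seen; these go to Table Output
to_insert = [row for row, flag in merge_rows(db, excel) if flag == "new"]
# to_insert → [{'name': 'North Prep'}]
```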
This is what worked for me:
excel file -->
select values (to delete unnecessary fields) -->
database lookup (this will create a new field, and will set null if not found) -->
filter rows (get the fields with null output from lookup) -->
table output (insert the filtered records)
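The lookup/filter pipeline above can be sketched in Python to show what each step contributes: the "lookup" returns None for names not in the database, the "filter" keeps the null lookups, and those rows are what gets inserted. The names here are illustrative, not from the question.

```python
existing = {"Central High", "Westside Academy"}  # names already in the DB

incoming = ["Central High", "North Prep", "East Elementary"]  # Excel stream

# Database lookup step: attach a lookup result (None when not found)
looked_up = [(name, name if name in existing else None) for name in incoming]

# Filter rows step: keep only the rows whose lookup came back null
to_insert = [name for name, hit in looked_up if hit is None]
# to_insert → ['North Prep', 'East Elementary']
```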

FIND all tables I need to join to get relationship between two tables

I'm using SQL Server 2012. I want to join two tables that have no columns I can join them on directly. How can I find all the intermediate tables needed to connect them?
For example: to join table A to table D, I need to connect A to B, then B to C, and finally C to D.
My question is: can I find tables B and C among the thousands of tables in the database without searching table by table?
Thanks a lot,
Ohad
Assuming that:
You want to automate this process
You have FOREIGN KEY constraints that you can rely on
You should proceed as follows:
Query sys.foreign_keys and create a directed graph structure that will contain the links between tables.
Then implement a graph search algorithm that starts from table A and tries to find a path to table D (searching from D towards A as well, since foreign keys are directed).
Once you have found the path, it is easy to construct dynamic SQL containing the join of all tables on the path. You will need to query sys.foreign_key_columns as well, to construct the ON clauses of the JOINs.
Let me know if you need help with more detail.
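The graph-building and path-finding steps above can be sketched as a breadth-first search in Python. The table names and foreign-key pairs are made up for illustration; in practice they would come from querying sys.foreign_keys.

```python
from collections import deque

# (referencing_table, referenced_table) pairs, as sys.foreign_keys gives you
foreign_keys = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "X")]

def join_path(fks, start, goal):
    """Breadth-first search for the shortest chain of tables joining start to goal."""
    graph = {}
    for child, parent in fks:
        # treat FK links as undirected for pathfinding: a JOIN can
        # traverse the relationship in either direction
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of foreign keys connects the two tables

path = join_path(foreign_keys, "A", "D")
# path → ['A', 'B', 'C', 'D']
```

From the returned path, the dynamic SQL would be assembled by pairing each adjacent table with its FK columns from sys.foreign_key_columns.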
There are a couple of things you can do to help your cause, but for the most part there's no direct way: you would need to know how the database is structured and what the tables are for. Furthermore, depending on the database's design, it might be very difficult to find the answer intuitively, and you might just need guidance from someone who knows the design. Regardless:
Fields in your tables A & D:
You can look at primary or unique fields in the tables to determine what other tables may link to them. Usually such fields are named in a way that matches those other tables, so you can tell which table they refer to.
Information_Schema views:
You can use the information_schema.tables and information_schema.columns views to easily search for names of tables and columns across the entire database and narrow your search to fewer tables.

Refer to table by id

Is there a way to reference a table by an id? Essentially, my table contains rows that each pertain to one of several other tables.
E.g. I need to reference table A, B, or C in table D's rows so that I know which table to join on.
I assume that if there is no shorthand way to reference a table externally, the only option is to store the table's name.
There is a "shorthand reference" for a table: the object identifier - the OID of the catalog table pg_class. Simple way to get it:
SELECT 'schema_name.tbl_name'::regclass
However, do not persist this in your user tables. OIDs of system tables are not stable across dump / restore cycles.
Also, plain SQL statements do not allow you to parameterize table names, so you'll have to jump through hoops to use this idea in queries: either dynamic SQL, or multiple LEFT JOINs with CASE expressions, which is potentially expensive.
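The "multiple LEFT JOINs with CASE" workaround can be demonstrated in SQLite through Python: table d stores both a table tag and a row id, each candidate table is LEFT JOINed only when its tag matches, and a CASE expression picks the value from whichever join fired. All table and column names here are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER, val TEXT);
    CREATE TABLE b (id INTEGER, val TEXT);
    CREATE TABLE d (id INTEGER, ref_table TEXT, ref_id INTEGER);
    INSERT INTO a VALUES (1, 'from a');
    INSERT INTO b VALUES (1, 'from b');
    INSERT INTO d VALUES (10, 'a', 1), (11, 'b', 1);
""")

rows = con.execute("""
    SELECT d.id,
           CASE d.ref_table WHEN 'a' THEN a.val WHEN 'b' THEN b.val END
    FROM d
    LEFT JOIN a ON d.ref_table = 'a' AND a.id = d.ref_id
    LEFT JOIN b ON d.ref_table = 'b' AND b.id = d.ref_id
    ORDER BY d.id
""").fetchall()
# rows → [(10, 'from a'), (11, 'from b')]
```

With many candidate tables this query grows one LEFT JOIN per table, which is why the answer above calls it potentially expensive.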

Joining two tables on different database servers

I need to join two tables Companies and Customers.
Companies table is in MS SQLServer and Customer table is in MySQL Server .
What is the best way to achieve this goal ?
If I understand correctly, you need to join the tables in SQL Server, not in code, because the question is tagged sql.
If that's right, you need to do some administrative work first, namely server linking.
Here is an explanation of how to link a MySQL server into an MSSQL server.
After you successfully link the servers, the syntax is as simple as:
SELECT
[column_list]
FROM companies
JOIN [server_name].[database_name].[schema_name].[table_name]
ON [join_condition]
WHERE ...
Keep in mind that when accessing tables on a linked server, you must use four-part names.
In order to query 2 databases, you need 2 separate connections. In this case, you would also need separate drivers, since you have a MSSQL and a MySQL database. Because you need separate connections, you need 2 separate queries. Depending on what you want to do, you can first retrieve your Companies and then do a query on Customers WHERE company = 'some value from COMPANIES' (or the other way around).
You could also just fetch every row from both tables in their own lists and compare those lists in your code, rather than using a query.
Try the following:
1. Retrieve the data from the Companies table from SQL Server and store the required columns as an ArrayList<HashMap<String,String>>: each ArrayList index is a row, and each HashMap holds the key/value pairs for that row (key: column name, value: that column's value in the row).
2. Then pull the data from the Customers table, adding a WHERE clause built by converting the keys from your first map into a comma-separated list, creating a filter similar to a SQL join. Add this data to the same result structure, making sure the column names in the HashMap don't overlap.
When you need, say, row 5, column7, you write:
resultList.get(4).get("column7");
The logic is given; please implement it yourself.
Select Companies from DB1
Select Customers from DB2
Put them in Map<WhatToJoinOn, Company> and Map<WhatToJoinOn, Customer>
Join on map keys, creating a List<CompanyCustomer>
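The four steps above can be sketched in Python: fetch each table over its own connection, index one side by the join key, and join in memory. The field names and sample rows are illustrative.

```python
companies = [  # pretend: rows fetched over the MSSQL connection
    {"company_id": 1, "company_name": "Acme"},
    {"company_id": 2, "company_name": "Globex"},
]
customers = [  # pretend: rows fetched over the MySQL connection
    {"customer_id": 7, "company_id": 1, "name": "Alice"},
    {"customer_id": 8, "company_id": 2, "name": "Bob"},
]

# index one side by the join key
companies_by_id = {c["company_id"]: c for c in companies}

# join on map keys, building the combined CompanyCustomer rows
company_customers = [
    {**companies_by_id[cu["company_id"]], **cu}
    for cu in customers
    if cu["company_id"] in companies_by_id
]
# each result row carries both the company and customer fields
```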

How to create a Primary Key on quasi-unique data keys?

I have a nightly SSIS process that exports a TON of data from an AS400 database system. Due to bugs in the AS400 DB software, occasional duplicate keys are inserted into data tables. Every time a new duplicate is added to an AS400 table, it kills my nightly export process. This issue has moved from being a nuisance to being a problem.
What I need is an option to insert only unique data. If there are duplicates, select the first encountered of the duplicate rows. Is there SQL syntax that could help me do this? I know of the DISTINCT clause, but that doesn't work in my case because for most of the offending records the entirety of the data is non-unique except for the fields that comprise the PK.
In my case, it is more important for my primary keys to remain unique in my SQL Server DB cache, rather than having a full snapshot of data. Is there something I can do to force this constraint on the export in SSIS/SQL Server with out crashing the process?
EDIT
Let me further clarify my request. What I need is to ensure that the data in my exported SQL Server tables maintains the same keys that are maintained in the AS400 data tables. In other words, creating a unique row-count identifier wouldn't work, nor would inserting all of the data without a primary key.
If a bug in the AS400 software allows mistaken, duplicate PKs, I want to either ignore those rows or, preferably, select just one of the rows with the duplicate key, but not both.
This selection should probably happen in the SELECT statement in my SSIS project, which connects to the mainframe through an ODBC connection.
I suspect that there may not be a "simple" solution to my problem. I'm hoping, however, that I'm wrong.
Since you are using SSIS, you must be using OLE DB Source to fetch the data from AS400 and you will be using OLE DB Destination to insert data into SQL Server.
Let's assume that you don't have any transformations
Add a Sort transformation after the OLE DB Source. In the Sort transformation there is a check box at the bottom to remove duplicate rows based on a given set of column values. Check all the fields except the primary key that comes from the AS400. This will eliminate the duplicate rows while still inserting the data you need.
I hope that is what you are looking for.
In SQL Server 2005 and above:
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY almost_unique_field ORDER BY id) rn
FROM import_table
) q
WHERE rn = 1
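The ROW_NUMBER() dedup pattern above can be demonstrated end to end in SQLite through Python (SQLite 3.25+ is needed for window functions). The column names and sample rows are illustrative; it keeps the first-encountered row per duplicated key, which matches the requirement in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE import_table (id INTEGER, pk_field INTEGER, payload TEXT);
    INSERT INTO import_table VALUES
        (1, 100, 'first copy'),
        (2, 100, 'duplicate copy'),
        (3, 200, 'only copy');
""")

rows = con.execute("""
    SELECT pk_field, payload
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY pk_field ORDER BY id) AS rn
        FROM import_table
    ) q
    WHERE rn = 1
    ORDER BY pk_field
""").fetchall()
# rows → [(100, 'first copy'), (200, 'only copy')]
```

ORDER BY id inside the window is what makes "first encountered" well-defined; without it, which duplicate survives is arbitrary.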
There are several options.
If you use IGNORE_DUP_KEY (http://www.sqlservernation.com/home/creating-indexes-with-ignore_dup_key.html) option on your primary key, SQL will issue a warning and only the duplicate records will fail.
You can also group/roll up your data, but this can get very expensive. What I mean by that is:
SELECT Id, MAX(value1), MAX(value2), MAX(value3), etc.
FROM import_table
GROUP BY Id
Another option is to add an identity column (and cluster on it for an efficient join later) to your staging table and then create a mapping in a temp table. The mapping table would be:
CREATE TABLE #mapping
(
RowID INT PRIMARY KEY CLUSTERED,
PKID INT
)
INSERT INTO #mapping (RowID, PKID)
SELECT MIN(RowID), PKID
FROM staging_table
GROUP BY PKID
INSERT INTO presentation_table
SELECT S.*
FROM staging_table S
INNER JOIN #mapping M
ON S.RowID = M.RowID
If I understand you correctly, you have duplicated PKs that have different data in the other fields.
First, put the data from the other database into a staging table. I find it easier to research issues with imports (especially large ones) this way. Actually, I use two staging tables (and for this case I strongly recommend it): one with the raw data and one with only the data I intend to import into my system.
Now you can use an Execute SQL task to grab one of the records for each key (see @Quassnoi's answer for an idea of how to do that; you may need to adjust the query for your situation). Personally, I put an identity column in my staging table so I can identify the first or last occurrence of duplicated data. Then put the record you chose for each key into your second staging table. If you are using an exception table, copy the records you are not moving into it, and don't forget a reason code for the exception ("Duplicated key", for instance).
Now that you have only one record per key in a staging table, your next task is to decide what to do about the other data that is not unique. If there are two different business addresses for the same customer, which do you choose? This is a matter of business-rule definition, not strictly speaking SSIS or SQL code. You must define the rules for how to choose the data when it needs to be merged between two records (what you are doing is the equivalent of a de-duping process). If you are lucky, there is a date field or some other way to determine which data is newest or oldest, and that is the data they want you to use. In that case, once you have selected just one record, the initial transform is done.
More likely, though, you will need a different rule for each field to choose the correct value. In that case, write SSIS transforms in a data flow, or Execute SQL tasks, to pick the correct data and update the staging table.
Once you have the exact record you want to import, run the data flow to move it into the correct production tables.