SQL Server: Remove substrings from field data by iterating through a table of city names

I have two databases, Database A and Database B.
Database A contains some data which needs to be placed in a table in Database B. However, before that can happen, some of that data must be “cleaned up” in the following way:
The table in Database A which contains the data to be placed in Database B has a field called “Desc.” Every now and then the users of the system put city names in with the data they enter into the “Desc” field. For example: a user may type in “Move furniture to new cubicle. New York. Add electric.”
Before that data can be imported into Database B the word “New York” needs to be removed from that data so that it only reads “Move furniture to new cubicle. Add electric.” However—and this is important—the original data in Database A must remain untouched. In other words, Database A’s data will still read “Move furniture to new cubicle. New York. Add electric,” while the data in Database B will read “Move furniture to new cubicle. Add electric.”
Database B contains a table which has a list of the city names which need to be removed from the “Desc” field data from Database A before being placed in Database B.
How do I construct a stored procedure or function that grabs the data from Database A, iterates through the Cities table in Database B, and, whenever it finds a city name in the "Desc" field, removes it while keeping the rest of the information in that field? The result would be a recordset I can then use to populate the appropriate table in Database B.
I have tried several things but still haven’t cracked it. Yet I’m sure this is probably fairly easy. Any help is greatly appreciated!
Thanks.
EDIT:
The latest thing I have tried to solve this problem is this:
DECLARE @cityName VARCHAR(50)

WHILE (SELECT COUNT(*) FROM ABCScanSQL.dbo.tblDiscardCitiesList) > 0
BEGIN
    SELECT @cityName = ABCScanSQL.dbo.tblDiscardCitiesList.CityName FROM ABCScanSQL.dbo.tblDiscardCitiesList

    SELECT JOB_NO, LTRIM(RTRIM(SUBSTRING(JOB_NO, (LEN(JOB_NO) - 2), 5))) AS LOCATION
          ,JOB_DESC, [Date_End], REPLACE(JOB_DESC, @cityName, ' ') AS NoCity
    FROM fmcs_tables.dbo.Jobt WHERE JOB_NO LIKE '%loc%'
END
"Job_Desc" is the field which needs to have the city names removed.

This is a data quality issue. You can always make a copy of the [description] in Database A and call it [cleaned_desc].
One simple solution is to write a function that does the following:
1 - Read the data from [tbl_remove_these_words]. These are the phrases you want removed.
2 - Compare the input, @var_description, to the rows in the table.
3 - Upon a match, replace the phrase with an empty string.
This solution depends upon a cleansing table that you maintain and update.
Run an update query that feeds the input from [description] through a call to [fn_remove_these_words] and sets [cleaned_desc] to the output, as in the sketch below.
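Here is a minimal sketch of that approach, using the placeholder names above; the cursor, the phrase column, and your_staging_table are assumptions to adapt to your schema.

CREATE FUNCTION dbo.fn_remove_these_words (@var_description VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @phrase VARCHAR(100)

    -- Walk the cleansing table and strip each phrase from the input
    DECLARE phrase_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT phrase FROM dbo.tbl_remove_these_words

    OPEN phrase_cursor
    FETCH NEXT FROM phrase_cursor INTO @phrase

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @var_description = REPLACE(@var_description, @phrase, '')
        FETCH NEXT FROM phrase_cursor INTO @phrase
    END

    CLOSE phrase_cursor
    DEALLOCATE phrase_cursor

    RETURN LTRIM(RTRIM(@var_description))
END
GO

-- Populate [cleaned_desc] while leaving the original [description] untouched
UPDATE dbo.your_staging_table
SET cleaned_desc = dbo.fn_remove_these_words([description])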
Another solution is to look at products like the Melissa Data (DQ) product for SSIS, or Data Quality Services in the SQL Server stack, to give you an application framework to solve the problem.

Related

SSIS Inserting incrementing ID with starting range into multiple tables at a time

Is there a reliable way to solve this seemingly easy task?
I've got a number of XML files which will be converted into 6 SQL tables (via SSIS).
Before the end of this process I need to add a new column (in fact, one common to all tables) to each of them.
This column represents an ID with a starting range and a +1 increment step, like (350000, 1).
Yes, I know how to solve it at the SSMS/SQL stage. But I need a solution at SSIS's pre-SQL conversion level.
I'm sure there should be well-known pattern solutions to deal with this.
I am going to take a stab at this. Just to be clear, there isn't a lot of information in your question to go on.
Most XML files that I have dealt with have a common element (let's call it a customer) with one-to-many attributes (these can be invoices, addresses, emails, contacts, etc.).
So your table structure will be somewhat star-shaped around the customer.
Your XML will have core customer information on a 1-to-1 basis that can be loaded into a single main table, plus array information: an array of invoices, an array of addresses, and so on. Those arrays become their own tables referencing the customer as a key.
I think you are asking how to create that key.
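To make that shape concrete, here is a hypothetical pair of tables for the star layout; the table and column names are illustrative, not taken from the question, with the (350000, 1) seed and step applied to the parent key.

-- Parent table: the identity column supplies the incrementing ID
CREATE TABLE Customer (
    CustomerKey INT IDENTITY(350000, 1) PRIMARY KEY,
    CustName    VARCHAR(100) NOT NULL
)

-- Child table: each row references its customer through the key
CREATE TABLE Invoice (
    InvoiceKey    INT IDENTITY(1, 1) PRIMARY KEY,
    CustomerKey   INT NOT NULL REFERENCES Customer (CustomerKey),
    InvoiceNumber VARCHAR(20) NOT NULL,
    Amount        DECIMAL(18, 2) NOT NULL
)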
Load the customer data first and return the identity column to be used as a foreign key when loading the other tables.
I find it easiest to do so in a script component. I'm only going to explain how to get the key back; I personally would handle the whole process in C# (deserializing and all).
Add this to the using block:
using System.Data.OleDb;
Add this into your main or row processing depending on where the script task / component is:
string SQL = @"INSERT INTO Customer(CustName, field1, field2)
               VALUES (?, ?, ?);
               SELECT CAST(SCOPE_IDENTITY() AS int);";

int CustomerKey;

// The connection string would normally come from the package's connection manager
using (OleDbConnection conn = new OleDbConnection(connectionString))
using (OleDbCommand cmd = new OleDbCommand(SQL, conn))
{
    cmd.CommandType = System.Data.CommandType.Text;
    cmd.Parameters.AddWithValue("@p1", Row.CustName); // OleDb binds parameters by position; the name is only a label
    // ... one parameter per ? placeholder ...
    conn.Open();
    // ExecuteScalar returns the first row / first column, which here is the SCOPE_IDENTITY() value
    CustomerKey = (int)cmd.ExecuteScalar();
}
Now you can use CustomerKey for all of the other tables.
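From there, the child-table loads are ordinary parameterized inserts; a hypothetical example of the SQL they would run, with the Invoice columns being assumptions:

-- Each child row carries the CustomerKey captured above,
-- bound to the first ? parameter
INSERT INTO Invoice (CustomerKey, InvoiceNumber, Amount)
VALUES (?, ?, ?)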

Using MS Access to obtain data across linked tables

I'm new to MS Access and am trying to speed up a data-gathering process that is taking forever in PowerShell. In PowerShell I have 10 or so web API calls to get data, and each comes back as an object with multiple properties (fields). Each set of data has fields relating it to 1 or more of the other sets of data. Getting the data is very quick, but piping an array of objects through Where-Object to Select-Object takes over an hour, and there's really not that much data. Each object contains 500-1500 "records" and 5 to 10 "fields", so I thought: why not export that data and use something that's intended to search through data to do the job? I exported each object as a separate .CSV file. So, enter MS Access.
I imported each of the CSVs as a separate table (easy enough). I'm going to simplify things down to the following 3 tables for this example:
[Tables](https://i.stack.imgur.com/UCH1F.jpg)
Every table has fields that relate it to other tables. Pretty much there's some sort of Id field in every table that is related to an Id field in a different table, from which I need to pull a field called "name". I'm trying to follow the bread crumbs from the Player name to its Network name, to its Application name, to its Layout name, etc. I want to build a query that I would eventually just be able to export as an Excel file. I also would prefer to just write out the SQL, unless it's really easier to understand the visual query builder. I'm looking to build a sheet with the following information:
Player's Name would include all names from the Players table, and getting just that data makes sense to me:
SELECT Name AS PlayerName FROM Players
Everything else, not so much. I feel like this will end up being some mega query as I get deeper into related table after related table. In Excel it would be straightforward using VLOOKUPs across tabs, but that doesn't seem to be the best approach. Given the info above, I'm trying to achieve the following output:
Result table
Any help with strategy and syntax greatly appreciated!
You're looking for the JOIN clause.
SELECT
    Players.Name AS PlayerName,
    Networks.Name AS PlayerNetwork,
    Applications.Name AS ApplicationName
FROM
    (Players
    LEFT JOIN Networks ON Networks.ID = Players.NetworkId)
    LEFT JOIN Applications ON Applications.Id = Players.ApplicationID
Note that Access SQL wants column aliases introduced with AS, and needs the extra parentheses when more than one join is chained.

SQL Insert & Update Options

Ok, so I need some SQL 101 assistance. I am building a new table in SQL that will be used over a VPN connection to an outside source. I have built a view inside SQL that contains all the information I need from the main database for this new table. I now need to push the data into the table. The problem is that the data is constantly changing, and I am not really sure what the easiest way to do this is. I need to copy the data from the view to the table initially, but after that I need to be able to update already-existing records with new information and insert new records into the table that don't already exist.
Now, the reason I am not just using the view for the data needed through the VPN is that the outside source using the data needs to be able to push back some values that don't exist anywhere in my table and save them to the unique records they are associated with. When I do the update, I need to leave those pushed-back values alone and update only the values that already exist in my database.
I hope this makes sense as I need some guidance on how to do this as I have not done this before.
The table for this looks like this:
ID      Name        Address     Email       X    Y   Z
123456  John Smith  123 Any St  john@d.com  123  12  1125
X, Y, and Z are the fields that will be filled from the outside source, tied to the ID number of the record through the web service.
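For what it's worth, the insert-or-update behavior described here is what T-SQL's MERGE statement provides. A minimal sketch, assuming the view is named vw_Source and the target table VpnData (both placeholder names), which refreshes only the locally owned columns so X, Y, and Z survive the update:

MERGE dbo.VpnData AS target
USING dbo.vw_Source AS source
    ON target.ID = source.ID
WHEN MATCHED THEN
    -- Refresh only the columns this database owns; X, Y, Z stay as pushed back
    UPDATE SET target.Name    = source.Name,
               target.Address = source.Address,
               target.Email   = source.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Name, Address, Email)
    VALUES (source.ID, source.Name, source.Address, source.Email);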
Thanks for the assistance

Best way to load xml data into new SQL table

I have user info in a SQL table with 3 columns. One of the columns is of XML datatype and holds the user information in XML format. The number of fields in the XML data can vary from user to user. For instance, under User 1 I can have 25 fields, and then User 2 can have 100 fields. That can change again to 50 for User 3. The fields for each user change. I need to be able to pull all the fields (columns) under each user and write them to a SQL table XYZ.
After writing user A's record into SQL table XYZ, user B may have more fields (columns) than A; here I need to ADD these fields (columns) to table XYZ, leaving their values NULL for user A.
Is there an efficient way of achieving this using T-SQL OR SSIS?
I think your problem is not the data loading mechanism but the data injection strategy.
2 strategies I can think of right now:
I would suggest you define an XSD for your XML against the worst-case scenario (hoping it is definable) and then design your DB table around it. As long as the user info conforms to the XSD, you should be fine with your inserts.
You create a table like: UserId | ColumnName | ColumnValue
and then enter the data row-wise; that would give you a lot of flexibility to work around the scenario. You could then always write queries to extract the data in the format you want, as sketched below.
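A minimal sketch of that second, row-wise (entity-attribute-value) strategy, assuming a source table named Users with columns UserId and UserData (the XML column) and one child element per field under the XML root; all of these names are assumptions.

-- Hypothetical EAV target: one row per user per field
CREATE TABLE dbo.UserFields (
    UserId      INT           NOT NULL,
    ColumnName  NVARCHAR(128) NOT NULL,
    ColumnValue NVARCHAR(MAX) NULL
)

-- Shred each user's XML into rows, however many fields it happens to contain
INSERT INTO dbo.UserFields (UserId, ColumnName, ColumnValue)
SELECT  u.UserId,
        f.node.value('local-name(.)', 'NVARCHAR(128)'),  -- element name becomes ColumnName
        f.node.value('.', 'NVARCHAR(MAX)')               -- element text becomes ColumnValue
FROM dbo.Users AS u
CROSS APPLY u.UserData.nodes('/*/*') AS f(node)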

One to Many - Calculated Column

I am trying to teach myself the new Tabular model for SQL 2012 SSAS to handle some analytic reports that were previously handled in (slow) stored procedures.
I've made decent progress on most of it, just figuring out how things work and how to add the calculations I need but I have been banging my head against the following:
I have a table that has file information -- it has:
ID
FileName
CurrentStatus
UploadedBy
And then a table that has the statuses the file went through (a one-to-many relationship to the file table):
FileID
StatusID
TimeStamp
What I'm trying to do is add a calculated column to the File table that returns the TimeStamp for when a file was in a particular status, e.g. StatusID = 100 is "uploaded". I want to add a calculated column called UploadedDate on the File table that holds the associated TimeStamp from the FileStatus table.
It seems like this should be doable with DAX, but I just can't seem to wrap my head around it. Any ideas out there?
In advance, many thanks,
Brent
Here's a formula that should work for what you want to do...
=MAXX(
CALCULATETABLE(
'FileStatus'
,'FileStatus'[StatusID] = 100
)
,'FileStatus'[TimeStamp]
)
I'm assuming each file can only be in each status once (there is only one row per FileID that has StatusID 100). I believe you can just use a LOOKUPVALUE formula. The formula for your UploadedDate calculated column would be something like:
=LOOKUPVALUE(FileStatus[TimeStamp], FileStatus[FileID], File[ID], FileStatus[StatusID], 100)
Here's the MSDN description of LOOKUPVALUE. You provide the column containing the value you want returned, then pairs of a column to search and the value to search it for; you can add multiple such pairs as criteria. Here's a blog post that contains a good example.