I have user info in a SQL table with 3 columns. One of the columns is of the XML datatype and holds user information in XML format. The number of fields in the XML data can vary from user to user. For instance, User 1 can have 25 fields, User 2 can have 100 fields, and that can change again to 50 for User 3. The fields for each user change. I need to be able to pull all the fields (columns) under each user and write them to a SQL table XYZ.
After writing user A's record into table XYZ, user B may have more fields (columns) than A; here I need to ADD those fields (columns) to table XYZ, making their values NULL for user A.
Is there an efficient way of achieving this using T-SQL or SSIS?
I think your problem is not the data-loading mechanism but the data ingestion strategy.
Two strategies I can think of right now:
I would suggest defining an XSD for your XML that covers the worst-case scenario (hoping it is definable) and then designing your DB table around it. As long as the user info conforms to the XSD, you should be fine with your inserts.
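For illustration, once the worst-case field set is known from the XSD, the target table could be a single wide table with every possible field nullable. A minimal sketch, with hypothetical names and types:

CREATE TABLE dbo.XYZ
(
    UserId INT NOT NULL,
    Field1 NVARCHAR(255) NULL,  -- nullable, since not every user supplies it
    Field2 NVARCHAR(255) NULL,
    -- ... one nullable column per field in the worst-case XSD ...
    FieldN NVARCHAR(255) NULL
);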
You create a table like: UserId | ColumnName | ColumnValue
and then enter the data row-wise. That gives you a lot of flexibility to work around the scenario, and you can always write queries to extract the data in the format you want.
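A minimal sketch of the row-wise approach, assuming the source table is dbo.Users(UserId, UserData XML) and each field is a child element under a root <user> node (all names here are hypothetical):

CREATE TABLE dbo.UserFields
(
    UserId      INT           NOT NULL,
    ColumnName  SYSNAME       NOT NULL,
    ColumnValue NVARCHAR(MAX) NULL
);

INSERT INTO dbo.UserFields (UserId, ColumnName, ColumnValue)
SELECT u.UserId,
       f.node.value('local-name(.)', 'NVARCHAR(128)'),  -- element name becomes the "column"
       f.node.value('(./text())[1]', 'NVARCHAR(MAX)')   -- element text becomes the value
FROM dbo.Users AS u
CROSS APPLY u.UserData.nodes('/user/*') AS f(node);

This copes with 25, 50, or 100 fields per user without ever issuing an ALTER TABLE; pivoting the rows back into columns becomes a query-time concern.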
Related
I have a .net solution with a big form with a lot of data the customer needs to fill in, like a form with many steps to collect all the data we need.
So I was wondering whether it's better (from a performance and design standpoint) to use a traditional big table with many fields, or to store the data in a single field of XML type.
Example of one "TraditionalTable":
RecordId
CustomerId
Data 1
Data 2....
to Data N
1
120
01/01/1980
abcd ....
123
2
20
04/02/2004
fgh ....
230
3
10
05/01/1995
xyz ....
135
Example of one "DataWithXMLField":
RecordId
CustomerId
FormData
1
120
< data>< customerdetails>< borndate>01/01/1980< /borndate>< /customerdetails >< financialinfo >...."
I've done many systems like this and prefer to keep the data as XML (often it's a serialized object). I find this to be efficient at runtime and at design time. (See the item below about binary attachments.)
The following are some suggestions based on what I've done in the past. Obviously it's not a one-size-fits-all hammer...
Often data is "collected" by a user and "approved" by an administrator. While the data is being collected, it's stored as XML. When approved, the XML is shredded and placed into "normal" relational tables/fields.
Often this data has been collected through multiple pages. Storing as XML allows collecting data in a way that is logical to the user but doesn't fit the final data structure very well.
If a form is abandoned (not completed or canceled), it's easy to delete a single row.
Things to keep in mind:
Some data is related to workflow and is separate from the data being collected. For example, a field for "Form Status" may go from "In Progress" to "Submitted" to "Approved". This type of data should be kept as regular columns.
Store binary data separately. If your form includes submitting binary data (like uploading a PDF), I like to generate a GUID on the front end, store that GUID in the XML, and save the binary data separately using the GUID, possibly on disk or in a separate "attachments" table.
Define a column for a "version number" of the XML. This way you can programmatically identify what is in the XML. This will help in the future when you need to make changes to the XML.
Define a column for a "Summary" that is a short, human-friendly version of the XML. For example, if your XML contains information for registering for summer camps, your "XML Summary" might contain the text "SMITH, JOHN, Camp White Pine 2021". This text is calculated on the front end. It can then be used for displaying rows of data without having to poke into the XML; for example, an administrative page may list applications that require approval.
Define a column to indicate whether the XML meets all your requirements. You don't want to validate XML in the database (it's often hard, and likely repetitive of the UI). Your business layer can apply business rules (validation) to the XML (or classes) and store in the database an indicator that all business rules are met. A combined sketch of these columns follows.
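Putting the suggestions above together, a minimal sketch of the storage tables (all names and types here are assumptions, not a prescription):

CREATE TABLE dbo.FormSubmission
(
    FormId     UNIQUEIDENTIFIER NOT NULL PRIMARY KEY DEFAULT NEWID(),
    FormStatus VARCHAR(20)      NOT NULL DEFAULT 'In Progress', -- workflow state, kept relational
    FormData   XML              NULL,                           -- the collected form data
    XmlVersion INT              NOT NULL,                       -- version number of the XML layout
    Summary    NVARCHAR(200)    NULL,                           -- human-friendly text computed on the front end
    IsValid    BIT              NOT NULL DEFAULT 0              -- set by the business layer once rules pass
);

-- Binary uploads live in a separate table, keyed by the GUID embedded in the XML.
CREATE TABLE dbo.FormAttachment
(
    AttachmentId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    FormId       UNIQUEIDENTIFIER NOT NULL REFERENCES dbo.FormSubmission (FormId),
    Content      VARBINARY(MAX)   NOT NULL
);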
I built a simple UI for our users to query on our SQL Server DB. The UI started off as just one input field for a person's name. This field's input would be used to search on 3 fields on our database. The query up until now looks like this:
SELECT [Id], [Url], [PersonName], [BusinessName], [DOB], [POB], [Text]
FROM dbo.DataAggregate
WHERE CONTAINS([PersonName], 'NEAR((john, doe), 2, FALSE)')
OR CONTAINS([BusinessName], 'NEAR((john, doe), 2, FALSE)')
OR CONTAINS([Text], 'NEAR((john, doe), 2, FALSE)')
The above assumes the user queried on John Doe. The requirement for NEAR has to do with the format inconsistencies across data in our fields, but that's not relevant to this question, just an FYI.
Now, I've been instructed to add 4 more input fields in the UI to allow users to further tailor their query. These fields already exist in the DB records. My question is how I add on to the above query for when the additional UI fields are used. Am I simply adding several AND or OR conditions to it?
Let me give an example to help you help me:
User Query:
Person Name: John Doe
DOB: 01/01/1900
Address: 123 Main St
POB: USA
Occupation: Worker
How would I add to my query to include the data from the other 4 input fields? And to handle which input fields are populated and which are not, do I need IF statements in the query?
Each value in the other 4 input fields would need to be searched for in its own field, plus the Text field, i.e.:
- The DOB would need to be searched for in the DOB field and the Text field
- The Address would need to be searched for in the Address field and the Text field
etc.
It just seems there has to be a more efficient way to structure a query like this than having basically 5 sections similar to the one above, separated by IF/AND/OR.
Thank you.
If you use parameters, you can get around unpopulated inputs with this trick:
WHERE
((@input1 IS NULL) OR (somefield = @input1))
Or / And
...
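Applied to the question's fields, a sketch might look like the following. The parameter names, the [Address] column, and the idea that the application layer builds the CONTAINS search strings (e.g. @nameSearch holding 'NEAR((john, doe), 2, FALSE)') are all assumptions:

SELECT [Id], [Url], [PersonName], [BusinessName], [DOB], [POB], [Text]
FROM dbo.DataAggregate
WHERE ((@nameSearch IS NULL)
       OR CONTAINS([PersonName], @nameSearch)
       OR CONTAINS([BusinessName], @nameSearch)
       OR CONTAINS([Text], @nameSearch))
  AND ((@dob IS NULL)
       OR [DOB] = @dob
       OR CONTAINS([Text], @dobSearch))
  AND ((@addressSearch IS NULL)
       OR CONTAINS([Address], @addressSearch)
       OR CONTAINS([Text], @addressSearch))
-- ...and the same pattern for POB and Occupation.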
This is going to get messy real quick though; they'll come up with more inputs next week.
Other options
Grid with Filter capability.
Data dump to, say, Excel.
Building the query programmatically, with parameters.
I'm developing an application which generates big HTML reports, and I need to store data for the HTML pages in temp tables in the DB. What is the best way to do it? Generate a big XML string (approx. 400 KB) per HTML page in a table tmpTable(num, xmlStr), insert it into the table, and then select that page when the user requests it? Or save the data in a temp table like tmpTable1(num, val1, val2, val3...), where the vals are just short strings, ints, and doubles, and generate the XML from this data on user request? Which way will be better for performance?
If you can normalize the data into tabular format, it's better to have that data in a table and generate the report on user demand. Also, if the report doesn't change frequently, you may generate it as a batch process and keep it on the server for the required time period.
Additionally, if you want to do any historical data mining, you still have the raw data in your table: you can always run your queries and get the desired outputs. I'd personally go with this approach. Please share what you choose and any further input/feedback.
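For illustration, a minimal sketch of that approach with hypothetical names: keep the raw values in the table and build the XML only when a page is requested.

CREATE TABLE dbo.tmpTable1
(
    Num  INT NOT NULL,      -- page number
    Val1 VARCHAR(50) NULL,  -- short strings, ints, doubles rather than a 400 KB blob
    Val2 INT NULL,
    Val3 FLOAT NULL
);

DECLARE @num INT = 1;

-- Generate the XML for the requested page on demand.
SELECT Val1, Val2, Val3
FROM dbo.tmpTable1
WHERE Num = @num
FOR XML PATH('row'), ROOT('data');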
I have two databases, Database A and Database B.
Database A contains some data which needs to be placed in a table in Database B. However, before that can happen, some of that data must be “cleaned up” in the following way:
The table in Database A which contains the data to be placed in Database B has a field called “Desc.” Every now and then the users of the system put city names in with the data they enter into the “Desc” field. For example: a user may type in “Move furniture to new cubicle. New York. Add electric.”
Before that data can be imported into Database B the word “New York” needs to be removed from that data so that it only reads “Move furniture to new cubicle. Add electric.” However—and this is important—the original data in Database A must remain untouched. In other words, Database A’s data will still read “Move furniture to new cubicle. New York. Add electric,” while the data in Database B will read “Move furniture to new cubicle. Add electric.”
Database B contains a table which has a list of the city names which need to be removed from the “Desc” field data from Database A before being placed in Database B.
How do I construct a stored procedure or function which grabs the data from Database A, iterates through the Cities table in Database B, and, if it finds a city name in the "Desc" field, removes it while keeping the rest of the information in that field, thus creating a recordset I can then use to populate the appropriate table in Database B?
I have tried several things but still haven’t cracked it. Yet I’m sure this is probably fairly easy. Any help is greatly appreciated!
Thanks.
EDIT:
The latest thing I have tried to solve this problem is this:
DECLARE @cityName VARCHAR(50)

WHILE (SELECT COUNT(*) FROM ABCScanSQL.dbo.tblDiscardCitiesList) > 0
BEGIN
    SELECT @cityName = ABCScanSQL.dbo.tblDiscardCitiesList.CityName FROM ABCScanSQL.dbo.tblDiscardCitiesList

    SELECT JOB_NO, LTRIM(RTRIM(SUBSTRING(JOB_NO, (LEN(job_no) - 2), 5))) AS LOCATION
          ,JOB_DESC, [Date_End], REPLACE(Job_Desc, @cityName, ' ') AS NoCity
    FROM fmcs_tables.dbo.Jobt WHERE Job_No LIKE '%loc%'
END
"Job_Desc" is the field which needs to have the city names removed.
This is a data quality issue. You can always make a copy of the [description] in Database A and call it [cleaned_desc].
One simple solution is to write a function that does the following:
1 - Read data from [tbl_remove_these_words]. These are the phrases you want removed.
2 - Compare the input, @var_description, to the rows in the table.
3 - Upon a match, replace it with an empty string.
This solution depends upon a cleansing table that you maintain and update.
Run an update query that feeds the input from [description] through a call to [fn_remove_these_words] and sets [cleaned_desc] to the output; a sketch follows.
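A minimal sketch of that function and update query, assuming the cleansing table has a single [phrase] column and the source table is dbo.SourceTable (names follow the answer's conventions but are assumptions):

CREATE FUNCTION dbo.fn_remove_these_words (@var_description NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    -- Apply REPLACE once for each phrase in the cleansing table.
    SELECT @var_description = REPLACE(@var_description, phrase, '')
    FROM dbo.tbl_remove_these_words;

    RETURN @var_description;
END;
GO

-- Populate the copy; the original [description] stays untouched.
UPDATE dbo.SourceTable
SET cleaned_desc = dbo.fn_remove_these_words([description]);

The multi-row SELECT assignment applies the REPLACE once per row of the cleansing table; row order is not guaranteed, which is usually acceptable for simple word removal.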
Another solution is to look at products like the Melissa Data (DQ) product for SSIS, or Data Quality Services in the SQL Server stack, to give you an application framework for solving the problem.
At the moment the team I am working with is looking into the possibility of storing data, entered by users across a series of input wizard screens, as an XML blob in the database. The main reason for this is that I would like to write the input wizard as a component which can be brought into a number of systems without having to bring a large table structure with it.
To clarify: if the wizard has 100 input fields (for example), then with a normal relational DB structure there will be a 1-to-1 relationship, so there will be 100 columns in the database. So to get this working in another system I would have to bring the tables, stored procedures, etc. into the new system.
I have a number of reservations about this, but I would like people's opinions.
Thanks.
If those input fields don't need to be updated or used later to calculate or compute other values, then XML or JSON is a smart choice.
So for your scenario it seems like a perfect solution.
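For illustration, the portable structure the wizard component ships with could be as small as this (a hedged sketch; names are hypothetical):

CREATE TABLE dbo.WizardSubmission
(
    SubmissionId INT IDENTITY(1, 1) PRIMARY KEY,
    CustomerId   INT NOT NULL,
    FormData     XML NOT NULL  -- all wizard fields, however many, in one blob
);

One table travels with the component no matter how many fields the wizard grows to.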