I'm using Control Flow and Data Flow tasks to record the number of rows read from an Excel data source in Visual Studio SSIS. The data is then processed into good and bad tables, the rows in these are counted, and the results are written into a statistics table via a parameterised SQL statement.
For reasons unknown, the data seems to be getting written into the wrong fields in the statistics table, and despite recreating the variables and explicitly setting the columns for each variable I can't fix or identify the problem.
Three variables are set:
1. Total rows read from the source Excel via a Row Count task (approx 28,964 rows)
2. Rows written to a table as 'good' data after processing (most of the source file, approx 28,540)
3. Rows written to a table as 'bad' data after processing (approx 424)
The variables are then stored in a separate table via a SQL command that reads parameters set from the variables. A final percentage field is calculated from the total rows and the errors.
However, the results in the table seem to be in the wrong fields (see image).
I've checked this several times and recreated all the tables and variables but get the same result. All tables are Access.
Any ideas?
Any help is much appreciated.
Is that an Access parameterised query?
I've never run one of those from SSIS. I do know that SSIS can be weird about mapping the values from the variables to the query parameters. Have you noticed that the display order of your variables (in the variable-to-parameter mapping) is the same as how they get assigned to parameters?
It looks as though the GoodRows value (28540) is going to P1, BadRows to P2 and TotalRows to P3. That's the order that the variables appear in the mapping.
This is exactly the bizarre, infuriating thing that I've seen SSIS do - though not specifically with Access SQL statements. SSIS sometimes maps your variables to the parameters in the order that they appear in the mapping list, completely ignoring what you specify in the Parameter Name column.
Try deleting all the mappings, and mapping the variables one after another so that they appear in the order P1, P2, P3 in the mapping table.
I recommend that you create a fourth variable for the fourth parameter rather than trying to do math in the ExecuteSQL task.
Instead of using P1, P2 & P3 in the Parameter Names column of the Parameter-mapping tab, try using their zero-based ordinal position.
In the query itself, use question marks for the parameters:
...VALUES ("France", ?, ?, ?, ?)
In other words, for the parameter used first in the query, use 0 for the name. Use 1 for the next parameter, 2 for the next parameter, and so on.
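To make that concrete, here is a hypothetical sketch (the statistics table, its columns and the variable names are my assumptions, not taken from the original package) of how the statement and the Parameter Mapping tab would line up:

INSERT INTO StatsTable (Country, TotalRows, GoodRows, BadRows, ErrorPct)
VALUES ("France", ?, ?, ?, ?)

Parameter Mapping tab, listed in the order the ?s appear:
Variable Name     Parameter Name   Data Type
User::TotalRows   0                LONG
User::GoodRows    1                LONG
User::BadRows     2                LONG
User::ErrorPct    3                DOUBLE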
If that doesn't work, you can use your variables to build a string variable that holds the entire SQL string that you want to execute, and use the "SQL from Variable" option in the ExecuteSQL task.
Please try to replace the Parameter Names in the Parameter Mapping with 0, 1 and 2.
Just use numbers in the column order you need. In my SSIS projects this works fine.
I'm writing a Command for a Crystal Report that queries an SQL Database. The Command will use parameters/inputs that are generated from a different program. I've put parameters directly in Commands before, but this one has to be handled differently.
Said input will be a string of numbers with an & in between, such as this: "6&12&15"; order is irrelevant in this case. For understanding purposes, we'll say that the numbers are product IDs and are unique. When a user wants to search for multiple products in this database, the input will look like the string above.
I have used the following code in the past for non-number based strings and it works well because of how other fields are set up:
CASE WHEN '{?WearhouseState}' = '' THEN 1
WHEN CHARINDEX(Products.WearhouseState,'{?WearhouseState}',0)>0 THEN 1
ELSE 0
END = 1
That code will search for the field's value as a substring essentially anywhere in the given input parameter, which works for things like a state because "Texas" is never going to be a substring of any other state. However, this doesn't work so well with numbers. For example, if a product has an ID of 3, then the search will return that record if the parameter is '31', which I clearly do not want (it would also return product 1).
In the meantime, I have been splitting the string up with a delimiter in Crystal Reports, which works fine but slows down the overall time to create the document. Most of the parameters I use I tend to put right in the query, which drastically improves the speed. The Crystal code is as follows:
{?ProductID}="" or {Command.ProductID} in split({?ProductID},"&")
This works exactly as intended but again, time is of the essence. Any additional information can be provided. It is technically InterSystems SQL, so keep in mind that the commands/clauses can vary between SQL dialects.
I'd do the split-string operation in SQL Server instead of CR. See e.g. T-SQL split string for a working code sample. Note that this logic does not need to run as a function; you could also include it directly in your CR command.
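As a minimal sketch of the in-command version, assuming the report runs against SQL Server 2016+ (where STRING_SPLIT is available) and a hypothetical Products table:

SELECT p.*
FROM Products p
WHERE '{?ProductID}' = ''
   OR p.ProductID IN (SELECT value FROM STRING_SPLIT('{?ProductID}', '&'))

Crystal substitutes {?ProductID} into the command text before it reaches the server, so the split and the membership test both run server-side instead of per-record in the report.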
I am writing a query where 'batch_name' is the parameter; sometimes I get only one batch name and sometimes I get 2 or more batch names. How can I handle this in an Oracle BI Publisher query?
Here is my query:
Select * from pay_batch_headers pbh Where UPPER(pbh.batch_name) = UPPER(:p_batch_name)
Now this query will handle for only one batch name, I want it to handle multiple batch names.
something like Where UPPER(pbh.batch_name) IN ('Batch1','Batch2','Batch3')
But the problem with using an IN clause is that I can't predict the number of batches I have to query. Can anyone help me with this, please?
You have two choices. One is to munge the variables together into a string and use some method, such as regexp_like():
where regexp_like(upper(pbh.batch_name), :p_batch_name)

The parameter string should look like: '^(ABC|DEF|GHI|JKL)$' (the parentheses keep the ^ and $ anchors applying to every alternative, and the alternatives should be upper case to match the upper() call). You can make it as long as you like.
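For example (hypothetical, assuming the bound value arrives as a pipe-separated list like 'BATCH1|BATCH2|BATCH3' and the anchors and grouping are added in the query itself):

SELECT *
FROM pay_batch_headers pbh
WHERE regexp_like(upper(pbh.batch_name), '^(' || upper(:p_batch_name) || ')$')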
Another method is to use execute immediate: dump the values into a SQL query as a string, using IN. The advantage of this method is that it can more easily use indexes.
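A minimal PL/SQL sketch of that route, assuming the batch list arrives as a pre-quoted, comma-separated string built by trusted code (concatenating raw user input this way would open the query to SQL injection):

DECLARE
  v_batches VARCHAR2(4000) := '''BATCH1'',''BATCH2''';  -- hypothetical pre-quoted list
  v_sql     VARCHAR2(4000);
  v_count   PLS_INTEGER;
BEGIN
  v_sql := 'SELECT COUNT(*) FROM pay_batch_headers pbh'
        || ' WHERE UPPER(pbh.batch_name) IN (' || v_batches || ')';
  EXECUTE IMMEDIATE v_sql INTO v_count;
END;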
I've got a table with 75 columns. What is the SQL statement to display only the columns with values in them?
Thanks
It's true that such a statement doesn't exist (in a SELECT you can use condition filters only for the rows, not for the columns). But you could try to write a (slightly tricky) procedure: it must check, using queries, which columns contain at least one non-NULL/non-empty value. Once you have this list of columns, join them into a comma-separated string and compose a query that you can run, returning what you wanted.
EDIT: I thought about it and I think you can do it with a procedure but under one of these conditions:
find a way to retrieve column names dynamically in the procedure, i.e. from the metadata (I've never worked with that, but I'm new to procedures)
or hardcode all column names (losing generality)
You could collect column names inside an array, if the stored procedures of your DBMS support arrays (or write the procedure in a programming language like C), and loop over them, running a SELECT each time to check whether the column is empty*. If a column contains at least one value, concatenate its name into a comma-separated string. Finally you can run your query with only the non-empty columns! (See the sketch after this answer.)
As an alternative to a stored procedure you could write a short program (e.g. in Java), where you have more flexibility.
*if you check for NULL values it will be simple, but if you check for empty values you will need to deal with each column's data type... another array with data types?
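For the simple NULL-only case, a minimal T-SQL sketch (assuming SQL Server and a hypothetical table dbo.MyTable):

DECLARE @col  SYSNAME,
        @cols NVARCHAR(MAX) = N'',
        @sql  NVARCHAR(MAX),
        @hit  INT;

-- Loop over the table's column names from the metadata.
DECLARE col_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.MyTable');

OPEN col_cur;
FETCH NEXT FROM col_cur INTO @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Does this column hold at least one non-NULL value?
    SET @sql = N'SELECT @hit = CASE WHEN EXISTS (SELECT 1 FROM dbo.MyTable '
             + N'WHERE ' + QUOTENAME(@col) + N' IS NOT NULL) THEN 1 ELSE 0 END;';
    EXEC sp_executesql @sql, N'@hit INT OUTPUT', @hit = @hit OUTPUT;

    IF @hit = 1
        SET @cols += CASE WHEN @cols = N'' THEN N'' ELSE N', ' END + QUOTENAME(@col);

    FETCH NEXT FROM col_cur INTO @col;
END;
CLOSE col_cur; DEALLOCATE col_cur;

-- Select only the columns that contained data.
EXEC (N'SELECT ' + @cols + N' FROM dbo.MyTable;');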
I would suggest that you write a SELECT statement and define which COLUMNS you wish to display and then save that QUERY as a VIEW.
This will save you the trouble of typing in the column names every time you wish to run that query.
As marc_s pointed out in the comments, there is no select statement to hide columns of data.
You could do a pre-parse and dynamically create a statement to do this, but it would be a very inefficient thing to do from a SQL performance perspective. I would strongly advise against what you are trying to do.
A simplified version of this is to just select the relevant columns, which was what I needed personally. A quick look at what we're dealing with in the table:
SELECT * FROM table1 LIMIT 10;
-> shows 20 columns, of which I'm interested in 3. The LIMIT is just to avoid overflowing the console.
SELECT column1, column3, column19 FROM table1 WHERE column3='valueX';
It is a bit of a manual filter but it works for what I need.
I have a data driven site with many stored procedures. What I want to eventually be able to do is to say something like:
For Each @variable in sproc inputs
UPDATE @TableName SET @variable.toString = @variable
Next
I would like it to be able to accept any number of arguments.
It will basically loop through all of the inputs and update the column with the name of the variable with the value of the variable; for example, column "Name" would be updated with the value of @Name. I would like to basically have one stored procedure for updating and one for creating. However, to do this I will need to be able to convert the actual name of a variable, not the value, to a string.
Question 1: Is it possible to do this in T-SQL, and if so how?
Question 2: Are there any major drawbacks to using something like this (like performance or CPU usage)?
I know that if a value is not valid then it will only prevent the update involving that variable and any subsequent ones, but all the data is validated in the vb.net code anyway, so it will always be valid on submitting to the database, and I will ensure that only variables where the column exists can be submitted.
Many thanks in advance,
Regards,
Richard Clarke
Edit:
I know about using SQL strings and the risk of SQL injection attacks - I studied this a bit in my dissertation a few weeks ago.
Basically the website uses an object-oriented architecture. There are many classes - for example Product - which have many "Attributes" (I created my own class called Attribute, which has properties such as DataField, Name and Value). DataField is used to get or update data, Name is displayed on the administration frontend when creating or updating a Product, and Value, which may be displayed on the customer frontend, is set by the administrator. DataField is the field I will be using in the "UPDATE Blah SET @Field = @Value".
I know this is probably confusing but it's really complicated to explain - I have a really good understanding of the entire system in my head but I can't put it into words easily.
Basically the structure is set up such that no user will be able to change the value of DataField or Name, but they can change Value. I think that if I were to use dynamic parameterised SQL strings there would therefore be no risk of SQL injection attacks.
I mean basically loop through all the attributes so that it ends up like:
UPDATE Products SET [Name] = @Name, Description = @Description, Display = @Display
Then loop through all the attributes again and add the parameter values; this will have the same effect as using stored procedures, right?
I don't mind adding to the page load time, since this is mainly going to affect the administration frontend and will only marginally affect the customer frontend.
Question 1: you must use dynamic SQL - construct your update statement as a string, and run it with the EXEC command.
Question 2: yes there are - SQL injection attacks, the risk of malformed queries, and the added overhead of having to compile a separate SQL statement.
Your example is very inefficient: if I pass in 10 columns, you will update the same table 10 times?
The better way is to do one update using sp_executesql and build it dynamically; take a look at The Curse and Blessings of Dynamic SQL to see how you have to do it.
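As a minimal sketch (hypothetical table and column names, and assuming the column list has been validated against sys.columns before it is concatenated), the single dynamic UPDATE could look like:

DECLARE @sql NVARCHAR(MAX) =
    N'UPDATE dbo.Products '
  + N'SET [Name] = @Name, [Description] = @Description, [Display] = @Display '
  + N'WHERE ProductID = @ID;';

-- The values travel as real parameters, so they can't inject SQL.
EXEC sp_executesql @sql,
    N'@Name NVARCHAR(100), @Description NVARCHAR(MAX), @Display BIT, @ID INT',
    @Name = N'Widget', @Description = N'A sample widget.', @Display = 1, @ID = 42;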
Is this a new system where you have the freedom to design as necessary, or are you stuck with an existing DB design?
You might consider representing the attributes not as columns, but as rows in a child table.
In the parent MyObject you'd just have header-level data, things that are common to all objects in the system (maybe just an identifier). In the child table MyObjectAttribute you'd have a primary key of (objectID, attrName) with another column attrValue. This way you can do an UPDATE like so:
UPDATE MyObjectAttribute
SET attrValue = @myValue
WHERE objectID = @myID
AND attrName = @myAttrName
I am reading and validating large fixed-width text files (ranging from 10-50K lines) that are submitted via our ASP.net website (coded in VB.Net). I do an initial scan of the file to check for basic issues (line length, etc). Then I import each row into an MS SQL table. Each DB row basically consists of a record_ID (primary, auto-incrementing) and about 50 varchar fields.
After the insert is done, I run a validation function on the file that checks each field in each row against a bunch of criteria (trimmed length, isnumeric, range checks, etc). If it finds an error in any field, it inserts a record into the Errors table, which has an error_ID, the record_ID and an error message. In addition, if the field fails in a particular way, I have to do a "reset" on that field. A reset might consist of blanking the entire field, or simply replacing the value with another value (e.g. replacing the string with a new one that has all illegal chars taken out).
I have a 5,000 line test file. The upload, initial check, and import takes about 5-6 seconds. The detailed error check and insert into the Errors table takes about 5-8 seconds (this file has about 1200 errors in it). However, the "resets" part takes about 40-45 seconds for 750 fields that need to be reset. When I comment out the resets function (returning immediately without actually calling the UPDATE stored proc), the process is very fast. With the resets turned on, the pages take 50 seconds to return.
My UPDATE stored proc is using some recommended code from http://sommarskog.se/dynamic_sql.html, whereby it uses CASE instead of dynamic SQL:
UPDATE dbo.Records
SET dbo.Records.file_ID = CASE @field_name WHEN 'file_ID' THEN @field_value ELSE file_ID END,
.
. (all 50 varchar field CASE statements here)
.
WHERE dbo.Records.record_ID = @record_ID
Is there any way I can improve my performance here? Can I somehow group all of these UPDATE calls into a single transaction? Should I be reworking the UPDATE query somehow? Or is it just the sheer quantity of 750+ UPDATEs and things are just slow (it's a quad-proc server with 8GB RAM)?
Any suggestions appreciated.
Don't do this in SQL; fix the data up in code, then do your updates.
If you have SQL 2008, then look into table-valued parameters. They enable you to pass an entire table as a parameter to a sproc. From there you just have the one insert/update or merge statement (see the sketch below).
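A minimal sketch of that approach, assuming SQL Server 2008+ and at most one reset per (record, field) pair; all names besides dbo.Records are hypothetical:

CREATE TYPE dbo.FieldReset AS TABLE
(
    record_ID   INT           NOT NULL,
    field_name  SYSNAME       NOT NULL,
    field_value VARCHAR(8000) NULL,
    PRIMARY KEY (record_ID, field_name)
);
GO
CREATE PROCEDURE dbo.ApplyFieldResets
    @resets dbo.FieldReset READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- Pivot the resets to one row per record, then apply everything in a
    -- single set-based UPDATE instead of 750+ round trips. Note that with
    -- COALESCE, a reset that blanks a field must pass '' rather than NULL.
    UPDATE r
    SET r.file_ID = COALESCE(x.file_ID, r.file_ID)
        -- ...repeat the pattern for the remaining 49 varchar fields...
    FROM dbo.Records AS r
    JOIN (SELECT record_ID,
                 MAX(CASE WHEN field_name = 'file_ID' THEN field_value END) AS file_ID
          FROM @resets
          GROUP BY record_ID) AS x
      ON x.record_ID = r.record_ID;
END;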
If you're looping through the lines and doing individual updates/inserts, this can be really expensive... Consider using SqlBulkCopy, which can speed up all your inserts. Similarly, you can create a DataSet, make your updates on the dataset and then submit them all in one shot through a SqlDataAdapter.
I believe you are doing 50 case statements on every update. Sounds like that would be slow.
It is possible to solve this problem with injection-proof code via parameterized queries and a constant table of query strings.
Quick and dirty example code.
string[] queryList = { "UPDATE records SET col1 = @val WHERE ID = @key",
                       "UPDATE records SET col2 = @val WHERE ID = @key",
                       "UPDATE records SET col3 = @val WHERE ID = @key",
                       ...
                       "UPDATE records SET col50 = @val WHERE ID = @key" };
Then in your call to SQL you just pick the item in the array corresponding to the col you want to update, and set @val and @key on the parameterized command.
I'm guessing you will see a significant improvement... let me know how it goes.
Um. Why are you inserting numeric data into VARCHAR fields then trying to run numeric checks on it? This is yucky.
Apply correct data typing and constraints to your table, do the INSERT, and see if it failed. SQL Server will happily report errors back to you.
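For example, a minimal sketch with hypothetical columns, folding the numeric and range checks into the table definition so the INSERT itself reports the errors:

CREATE TABLE dbo.StagedRecords
(
    record_ID INT IDENTITY(1,1) PRIMARY KEY,
    quantity  INT NOT NULL CHECK (quantity BETWEEN 0 AND 9999),  -- numeric + range check
    state     CHAR(2) NULL CHECK (state NOT LIKE '%[^A-Z]%'),    -- character whitelist
    ship_date DATE NULL                                          -- rejects non-dates
);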
I would try changing the recovery model to simple and look at my indexes. Kimberly Tripp did a session showing a scenario with improved performance using a heap.