How to fix a BCP file extracted with widechar support (-w) that causes Pentaho Data Integration to fail on insert because of the first character of the first row?

I have a .bat file that collects data via a bcp extract call, which executes a stored procedure (SP) with the -w flag. When the data from that file is consumed by our Pentaho transformation, extra characters are added to the first value of the first row. The CSV input step uses "UTF-16LE", but the first field arrives with garbage characters prepended to its value (for example, "1" comes through with stray bytes in front of it instead of plain "1"). Is there an additional option to bcp that can add a header row, or is there something on the Pentaho side that can cleanse these characters?
Sample BCP command:
bcp "exec [companyschema].[collectdataprocedure] %SESSIONID%" queryout collectedoutput.csv -t "," -w -T -S
The issue occurs when I try to load to the database within the transformation.
I have tried skipping the first row of the data, but I do need to have that data loaded to the DB.

I found an answer to the issue: use the Replace in String step with a Search Pattern of "([^A-Za-z0-9\-])", set Empty String to "Y", and have it replace the first field in your row in place (out stream field with the same name).
This resolved the issue of losing the first row of data.
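If the stray bytes turn out to be a byte-order mark at the start of the UTF-16LE extract (a common cause of this symptom), another option is to cleanse the file before Pentaho reads it. A minimal sketch, assuming the garbage is the 2-byte UTF-16LE BOM (0xFF 0xFE) and that a Unix-style shell such as Git Bash is available alongside the .bat file:
# Inspect the first two bytes of the extract; "ff fe" would indicate a UTF-16LE BOM.
head -c 2 collectedoutput.csv | od -An -tx1
# Drop the first two bytes, keep everything else, and point the CSV input step
# at the cleaned file instead of the original.
tail -c +3 collectedoutput.csv > collectedoutput_clean.csv
If the extra bytes are something other than a BOM, the Replace in String approach above is the safer route.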

Related

SSIS pipe-delimited file not failing when a row has more pipes than the number of columns?

My source file is a pipe (|) delimited text file (.txt). I am trying to load the file into SQL Server 2012 using SSIS (SQL Server Data Tools 2012). The file has three columns, but the last row contains extra pipes, so it has more delimiters than the column count.
I was hoping my package would fail because the file is pipe (|) delimited; instead the package succeeds, and everything after the second delimiter in that last row is loaded into the third column.
My question is: why isn't the package failing? I consider the data corrupt because, going by the delimiter, it has more columns than defined.
If I want to fail the package when the number of delimiters is greater than the number of columns, what are my options?
You can tell what is happening if you look at the Advanced page of the Flat File Connection Manager. For all but the last field the delimiter is '|'; for the last field it is CRLF.
So, by design, all data between the last defined pipe and the end of the line (CRLF) is imported into your last field.
What I would do is add another column to the connection manager and to your staging table. Map the new 'TestColumn' in the destination. When the import is complete, you want to ensure that this column is null in every row; if not, throw an error.
You could use a Script Task instead, but this way you will not need to write C# and you will not have to process the file twice. If you are comfortable coding a Script Task, or you cannot use a staging table with an extra column, then that is the only other route I can think of.
To check for nulls, use an Execute SQL Task with a single-row result set mapped to an integer variable; if the value is > 0, fail the package.
The query would be: SELECT COUNT(*) AS NotNullCount FROM Table WHERE TestColumn IS NOT NULL.
You can write a script task that reads the file, counts the pipes, and raises an error if the number of pipes is not what you want.
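Outside SSIS, the same delimiter-count idea can be sketched as a quick pre-check on the file itself; this is only an illustration of the check the Script Task would perform (the file name and the expectation of exactly three pipe-delimited columns are assumptions):
# Hypothetical pre-validation: report any row of input.txt (name assumed) that does
# not split into exactly 3 pipe-delimited fields, and exit non-zero if one is found.
awk -F'|' 'NF != 3 { printf "row %d has %d fields\n", NR, NF; bad = 1 } END { exit bad }' input.txt
A non-zero exit code could then be used to fail the package (for example from an Execute Process Task).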

ignore bcp right truncation

I have a file with stock information, such as ticker and stock price. The file was loaded into a database table using freebcp. The stock price format in the file looks like 23.125, while the stock price data type in the database table is [decimal](28, 2). freebcp loaded the data into the table without any problem by dropping the last digit: 23.12 was loaded into the table column for that record. We are now using Microsoft SQL Server's bcp utility (version 11.0) to load the data, and we hit an issue: bcp considers loading 23.125 into decimal(28, 2) an error (## Row 783, Column 23: String data, right truncation ##) and rejects the record.
I don't want to modify the input file, because there are a lot of columns in the file that would need to be fixed by removing the last digit.
Is there any way to configure bcp or Microsoft SQL Server to ignore the right truncation error?
A common workaround back in the day was to BCP into a secondary/staging table, then do a SELECT (column list) INTO the base table with the necessary conversion, as sketched below. Another option is to use the OPENROWSET BULK rowset provider, which lets you cast/convert as needed.
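A minimal sketch of that staging-table route, assuming a dbo.stock_staging table already exists with a price column wide enough to hold the raw values (the table, column, database, and server names here are assumptions, not from the original post):
# Load the raw file into the staging table first, then convert into the real table.
bcp StockDb.dbo.stock_staging in stocks.dat -S <server> -T -c -t ,
sqlcmd -S <server> -E -d StockDb -Q "INSERT INTO dbo.stock (ticker, price) SELECT ticker, CAST(price AS DECIMAL(28,2)) FROM dbo.stock_staging;"
Note that CAST rounds (23.125 becomes 23.13); if the old freebcp behaviour of dropping the last digit is required, T-SQL's ROUND(value, 2, 1) truncates instead of rounding.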
I encountered this error today and fixed it by using the -m parameter (SQL Server version 15):
bcp dbo.<table> in <csv file> -S <server> -d <db> -U <user> -P <psw> -m 999999 -q -c -t ,
Reference: https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-ver15#m
Note:
The -m option also does not apply to converting the money or bigint data types.

push query results into array

I have a bash shell script with a psql copy command that has captured some rows of data from a database. The data from the database are actual SQL statements that I will use in the bash script. I put the statements in a database because they are of varying length and I want to be able to dynamically call certain statements.
1) I'm unsure what delimiter to use in the copy statement. I can't use a comma or a pipe because they appear in my data coming from the database. I have tried a couple of random characters that are not in my data, but copy has a fit and only wants a single ASCII character.
Also, to complicate things, I need to get query_name and query_string for each row.
This is what I currently have. I get all the data fine with the copy, but now I just want to push the data into an array so that I will be able to loop over it later:
q="copy (select query_name,query_string from query where active=1)
to stdout delimiter ','"
statements=$(psql -d ${db_name} -c "${q}")
statements_split=(`echo ${statements//,/ }`)
echo ${statements_split[0]};
Looks to me like you actually want to build something like a dictionary (associative array) mapping query_name to query_string. bash isn't really the best choice for handling complex data structures. I'd suggest using Perl for this kind of task if that's an option.
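If staying in bash is an option, bash 4+ associative arrays can hold that query_name to query_string mapping. A rough sketch, assuming tabs and newlines never appear inside query_string (psql's -A -t -F switches produce unaligned, tuples-only output with a chosen field separator):
# Sketch only: requires bash 4+, and assumes each query_string is a single line
# containing no tab characters.
declare -A queries
while IFS=$'\t' read -r name sql; do
    queries["$name"]="$sql"
done < <(psql -d "${db_name}" -At -F $'\t' -c "select query_name, query_string from query where active=1")

# Later in the script, look up a statement by name (some_query_name is a placeholder):
echo "${queries[some_query_name]}"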

Moving results of T-SQL query to a file without using BCP?

What I want to do is output some query results to a file. Basically, when I query the table I'm interested in, my results look like this:
HTML_ID HTML_CONTENT
1 <html>...
2 <html>...
3 <html>...
4 <html>...
5 <html>...
6 <html>...
7 <html>...
The field HTML_CONTENT is of type ntext, and each record's value is around 500+ characters of HTML content.
I can create a cursor to move each record's content to a temp table or whatever.
But my question is this: instead of a temp table, how would I move this into files without using BCP?
BCP isn't an option, as our sysadmin has blocked access to sys.xp_cmdshell.
Note: I want to store each record's HTML content in an individual file.
My version of SQL Server is: Microsoft SQL Server 2008 (SP1) - 10.0.2531.0
You can make use of SSIS to read the table data and output the content of the table rows as files. The Export Column transformation, available within the Data Flow Task of SSIS packages, might help you do that.
Here is an example: The Export Column Transformation.
See also the MSDN documentation about the Export Column transformation.
This answer would have worked until you added the requirement for individual files.
You can run the SQL from the command line and dump the output into a file. The following utilities can be used for this.
SQLCMD
OSQL
Here is an example with SQLCMD with an inline query
sqlcmd -S ServerName -E -Q "Select GetDate() CurrentDateAndTime" > output.txt
You can save the query to a file (QueryString.sql) and use -i instead
sqlcmd -S ServerName -E -i QueryString.sql > output.txt
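To meet the individual-files requirement with the same sqlcmd approach, the redirect can be done once per row. A rough sketch, assuming a Unix-style shell and a table named dbo.HtmlTable (the table name is an assumption; HTML_ID and HTML_CONTENT are the columns from the question), with -h -1 dropping headers and -y 0 intended to keep long variable-length values from being truncated:
# Hypothetical per-row export: one file per HTML_ID.
for id in $(sqlcmd -S ServerName -E -h -1 -W -Q "SET NOCOUNT ON; SELECT HTML_ID FROM dbo.HtmlTable"); do
  sqlcmd -S ServerName -E -y 0 -h -1 -Q "SET NOCOUNT ON; SELECT HTML_CONTENT FROM dbo.HtmlTable WHERE HTML_ID = $id" > "html_${id}.html"
done
The same idea can be expressed with a batch FOR /F loop if only cmd.exe is available.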
Edit
Use SSIS:
Create a package.
Create a variable called RecordsToOutput of type Object at the package level.
Use an Execute SQL Task and read the result set into RecordsToOutput.
Use a Foreach Loop container to go through the RecordsToOutput dataset.
In the loop, create a variable for each column in the dataset (give it the same name).
Add a Data Flow Task.
Use an OLE DB source with a SQL statement that returns one row (built from the data you already have).
Use a Flat File destination to write out the row.
Use expressions on the flat file connection to change the name of the destination file for each row in the loop.

Unable to update the table of SQL Server with BCP utility

We have a database table that we pre-populate with data as part of our deployment procedure. Since one of the columns is binary (it's a binary serialized object) we use BCP to copy the data into the table.
So far this has worked very well. However, today we tried this technique on a Windows Server 2008 machine for the first time and noticed that not all of the rows were being populated correctly. Out of the 31 rows that are normally inserted as part of this operation, only 2 actually had their binary column populated correctly. The other 29 rows simply had a null value in their binary column. This is the first time we've seen an issue like this, and it is the same .dat file that we use for all of our deployments.
Has anyone else ever encountered this issue before or have any insight as to what the issue could be?
Thanks in advance,
Jeremy
My guess is that you're using -c or -w to dump as text, and it's choking on a particular combination of characters it doesn't like and subbing in a NULL. This can also happen in Native mode if there's no format file. Try the following and see if it helps. (Obviously, you'll need to add the server and login switches yourself.)
bcp MyDatabase.dbo.MyTable format nul -f MyTable.fmt -n
bcp MyDatabase.dbo.MyTable out MyTable.dat -f MyTable.fmt
This'll dump the table data as straight binary using the format file, so everything lines up exactly. Then, to insert it back:
bcp MySecondData.dbo.MyTable in MyTable.dat -f MyTable.fmt -k -E -b 1000 -h "TABLOCK"
...which uses the format file and data file, keeps NULLs (-k) and identity values (-E) to make absolutely sure everything lines up, and batches 1000 rows at a time with a table lock to increase the speed a little. If you need more speed than that, you'll want to look at BULK INSERT, FirstRow/LastRow, and loading in parallel, but that's a bit beyond the scope of this question. :)