"Error loading data: 42000" in Pentaho PDI MonetDB bulk Loader step - pentaho

I want to insert data from a large CSV file to MonetDB. I can't use MonetDB "mclient" because this procedure must run inside a Pentaho Server application within a Docker container. MonetDB is inside a Docker container too.
Here's my very simple transformation:
When I test the transformation, I always get the following error message:
2021/03/20 22:37:37 - MonetDB bulk loader.0 - Error loading data: 42000!COPY INTO: record separator contains '\r\n' but in the input stream, '\r\n' is being normalized into '\n'
Does anyone have any idea what is happening?
Thank you!

This is related to line endings. Pentaho issues a COPY INTO statement,
COPY INTO <table> FROM <file>
USING DELIMITERS ',', '\r\n'
Here, \r\n means DOS/Windows line endings. Since the Oct2020 release, MonetDB always normalizes the line endings from DOS/Windows to Unix style \n when loading data. Before, it used to sometimes normalize and sometimes not. However, normalizing to \n means that looking for \r\n would yield one giant line containing the whole file, hence the error message.
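To see the mismatch concretely, here is a tiny Python illustration (not MonetDB code, just string handling) of why a '\r\n' record separator finds no boundaries once the stream has been normalized:
# Illustration only: the server normalizes DOS line endings to '\n',
# but the COPY INTO statement still claims records end in '\r\n'.
raw = "a,1\r\nb,2\r\nc,3\r\n"            # data sent with Windows line endings
normalized = raw.replace("\r\n", "\n")    # what the loader sees after normalization

print(normalized.split("\r\n"))   # ['a,1\nb,2\nc,3\n']      -> one giant "record"
print(normalized.split("\n"))     # ['a,1', 'b,2', 'c,3', ''] -> the actual rows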
I will submit a patch to MonetDB to automatically replace the USING '\r\n' with '\n'.
This will fix it in the longer term.
In the short term I have no good solution to offer. I have no experience with Pentaho, but looking at the source code, it seems Pentaho uses the system property line.separator, which is \r\n on Windows.
This means that if you have access to a Mac or Linux machine to run Pentaho on, that will work, as line.separator is \n there. Otherwise, maybe you can ask the Pentaho people if the JVM can be started with something like java -Dline.separator="\n" as a workaround; see also this Stack Overflow question.
Otherwise, we'll have to use a patched version of either Pentaho, the MonetDB JDBC driver or MonetDB. I could send you a patched version of the JDBC driver that automagically replaces the '\r\n' with '\n' before sending the query to the server but you have to figure out yourself how to get Pentaho to use this JDBC driver instead of the default one.

Related

dbt Error : Encountered an error: 'utf-8' codec can't decode byte 0xa0 in position 441: invalid start byte

I upgraded my dbt version to 1.0.0 last night and ran a few connection tests, which went well. Now when I run my first dbt example model, I get the error below, even though I have not changed any code in this default example model.
I get the same error while running the dbt seed command for a csv dataset. The csv is utf-8 encoded and has no special characters in it.
I am using Python 3.9.
Could anyone suggest what the issue is?
Below is my first dbt model sql
After lots of back and forth, I figured out the issue. This is really a fundamental concept issue.
Every time we execute dbt run, dbt scans through the entire project directory (including the seeds directory, even though it is not materializing the seed) [screenshot attached below].
If it finds any csv, it also parses it.
In the case of the above error, I had a csv file which looks as follows:
The highlighted line contains a symbol character which dbt (i.e. Python) was not able to parse, causing the above error.
This symbol was not visible earlier in Excel or Notepad++.
It could be the issue with the Snowflake Python connector that #PeterH has pointed out.
As a temporary solution, for now we are manually removing these characters from the data file.
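If the manual cleanup becomes tedious, a small script can at least locate (or strip) the offending bytes before dbt parses the seed; this is only a sketch, and the seed path is a placeholder:
# Sketch: report the first byte in a seed CSV that is not valid UTF-8.
# Replace "seeds/my_seed.csv" with your actual seed file.
path = "seeds/my_seed.csv"

with open(path, "rb") as f:
    data = f.read()

try:
    data.decode("utf-8")
    print("File is valid UTF-8")
except UnicodeDecodeError as e:
    print(f"Invalid byte {data[e.start]:#x} at position {e.start}: "
          f"{data[max(0, e.start - 20):e.end + 20]!r}")
    # Optionally write a cleaned copy with undecodable bytes dropped.
    with open(path + ".cleaned", "w", encoding="utf-8") as out:
        out.write(data.decode("utf-8", errors="ignore"))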
I’d leave this as a comment but I don’t have the rep yet…
This appears to be related to a recently-opened issue.
https://github.com/dbt-labs/dbt-snowflake/issues/66
Apparently it’s something to do with the snowflake python adapter.
Since you’re seeing the error from a different context, it might be helpful for you to post in that issue that you’re seeing this outside of query preview.

Replacing a word in a db2 sql file causes DSNC105I: End of file reached while reading the command error

I have a dynamic sql file in which the name of TBCREATOR changes as given by a parameter.
I use a simple python script to change the TBCREATOR=<variable here> and write the result to an output sql file.
Calling this file using db2 -td# -vf <generated sql file> gives
DSNC105I : End of file reached while reading the command
Here is the file in which I need the TBCREATOR variable replaced:
CONNECT to 204.90.115.200:5040/DALLASC user *** using ****#
select REMARKS from sysibm.SYSCOLUMNS WHERE TBCREATOR='table' AND NAME='LCODE'
#
Here is the python script:
#!/usr/bin/python3
# #------replace table value with schema name
# print(list_of_lines)
fin = open("decrypt.sql", "rt")
#output file to write the result to
fout = open("decryptout.sql", "wt")
for line in fin:
    fout.write(line.replace('table', 'ZXP214'))
fin.close()
fout.close()
After decryptout.sql is generated I call it using db2 -td# -vf decryptout.sql
and get the error given above.
What's irritating is that I have another sql file that contains exactly the same data as decryptout.sql and runs smoothly with the db2 -td# -vf ... command. I used the unix command cmp to compare the generated file with the one I wrote by hand, with the variable ZXP214 already replaced, and there are no differences. What is causing this error?
Here is the file (which executes without error) that I compare the generated output with:
CONNECT to 204.90.115.200:5040/DALLASC user *** using ****#
select REMARKS from sysibm.SYSCOLUMNS WHERE TBCREATOR='ZXP214' AND NAME='LCODE'
#
I found that, specifically on the https://ibmzxplore.influitive.com/ challenge, if you are using the java db2 command and working in the Zowe USS system (Unix System Services of z/OS), there is a conflict of character sets. I believe the system will generally create files in EBCDIC format, whereas if you do
echo "CONNECT ..." > syscat.clp
the resulting file will be tagged as ISO8859-1 and will not be processed properly by db2. Instead, go to the USS interface and choose "create file", give it a folder and a name, and it will create the file untagged. You can use
ls -T
to see the tags. Then edit the file to give it the commands you need, and db2 will interoperate with it properly. Because you are creating the file with python, you may be running into similar issues. When you open the new file, use something like
open(input_file_name, mode="w", encoding="cp1047")
This makes sure the file is opened as an EBCDIC file.
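Applied to the script in the question, a minimal sketch might look like this (the input encoding is an assumption; match it to however decrypt.sql was actually created):
#!/usr/bin/python3
# Sketch: same replacement as before, but the output is written as
# EBCDIC (cp1047) so the java db2 command on USS can process it.
with open("decrypt.sql", "rt", encoding="cp1047") as fin, \
     open("decryptout.sql", "wt", encoding="cp1047") as fout:
    for line in fin:
        fout.write(line.replace("table", "ZXP214"))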
If you are using the Db2-LUW CLP (command line processor) that is written in c/c++ and runs on windows/linux/unix, then your syntax for CONNECT is not valid.
Unfortunately your question is ambiguously tagged, so we cannot tell which Db2-server platform you actually use.
For Db2-LUW with the c/c++ written classic db2 command, the syntax for a type-1 CONNECT statement does not allow a connection-string (or partial connection string) as shown in your question. For the Db2-LUW db2 clp, the target database must be defined externally (i.e. not inside the script), either via the legacy actions of catalog tcpip node ... combined with catalog database ..., or via the db2dsdriver.cfg configuration file as plain XML.
If you want to use connection-strings then you can use the clpplus tool which is available for some Db2-LUW client packages, and is present on currently supported Db2-LUW servers. This lets you use Oracle style scripting with Db2. Refer to the online documentation for details.
If you are not using the c/c++ classic db2 command, and are instead using the emulated clp written in Java that is only available with z/OS USS, then you must open a ticket with IBM support for that component, as that is not a matter for Stack Overflow.

How to use csvSeparator with jpexport (jprofiler)?

I am using JProfiler to run some tests on the memory usage of my application. I would like to include them in my build process. All the steps should work on the command line.
One step exports a csv file from a jps file with a command like:
~/jprofiler7/bin/jpexport q1.jps "TelemetryHeap" -format=csv q1_telemetry_heap.csv
On my local machine (Windows), it works. On my server (Linux), the csv file is not well formatted:
"Time [s]","Committed size","Free size","Used size"
0.0,30,784,000,19,558,000,11,226,000
1.0,30,976,000,18,376,000,12,600,000
2.0,30,976,000,16,186,000,14,790,000
3.0,30,976,000,16,018,000,14,958,000
4.01,30,976,000,14,576,000,16,400,000
There is no way to distinguish the commas of the csv format from the commas of the number formatting.
According to the documentation, I need to change the value of -Djprofiler.csvSeparator in the file bin/export.vmoptions.
But that fails. I also tried to change this value in jpexport.vmoptions and in jprofiler.vmoptions.
What should I do?
Thanks for your help
This bug was fixed in JProfiler 8.0.2.
Adding
-Djprofiler.csvSeparator=;
on a new line in bin/jpexport.vmoptions should work in JProfiler 7, though.
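Once the separator is a semicolon, downstream parsing is unambiguous again; for example, a quick check in Python (the file name is taken from the question):
import csv

# Read the export produced after setting -Djprofiler.csvSeparator=;
with open("q1_telemetry_heap.csv", newline="") as f:
    for row in csv.reader(f, delimiter=";"):
        print(row)
# The number fields may still contain grouping commas (e.g. '30,784,000'),
# but they can no longer be confused with the column separator.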

stata odbc sqlfile

I am trying to load data from a database (either MS Access or SQL Server) using odbc sqlfile. The code seems to run without any error, but I am not getting any data. I am using the following code: odbc sqlfile("sqlcode.sql"), dsn("mysqlodbcdata"). Note that sqlcode.sql contains just a SQL statement with SELECT. The thing is that the same SQL code returns data with odbc load, exec(sqlstmt) dsn("mysqlodbcdata"). Can anyone suggest how I can use odbc sqlfile to import data? This would be a great help for me.
Thanks
Joy
sqlfile doesn't load any data. It just executes the SQL (and displays the results when the loud option is specified) without loading anything into Stata. That's somewhat counter-intuitive, but true. The reasons are somewhat opaquely explained in the pdf/dead-tree manual entry for the odbc command.
Here's a more helpful answer. Suppose you have your SQL file named sqlcode.sql. You can open it in Stata (as long as it's not too long, where too long depends on your flavor of Stata). Basically, -file read- reads the SQL code line by line, storing the results in a local macro named exec. Then you pass that macro as an argument to the -odbc load- command:
Updated Code To Deal With Some Double Quotes Issues
Cut & paste the following code into a file called loadsql.ado, which you should put in directory where Stata can see it (like ~/ado/personal). You can find such directories with the -adopath- command.
program define loadsql
*! Load the output of an SQL file into Stata, version 1.3 (dvmaster#gmail.com)
version 14.1
syntax using/, DSN(string) [User(string) Password(string) CLEAR NOQuote LOWercase SQLshow ALLSTRing DATESTRing]
#delimit;
tempname mysqlfile exec line;
file open `mysqlfile' using `"`using'"', read text;
file read `mysqlfile' `line';
while r(eof)==0 {;
    local `exec' `"``exec'' ``line''"';
    file read `mysqlfile' `line';
};
file close `mysqlfile';
odbc load, exec(`"``exec''"') dsn(`"`dsn'"') user(`"`user'"') password(`"`password'"') `clear' `noquote' `lowercase' `sqlshow' `allstring' `datestring';
end;
/* All done! */
The syntax in Stata is
loadsql using "./sqlfile.sql", dsn("mysqlodbcdata")
You can also add all the other odbc load options, such as clear, as well. Obviously, you will need to change the file path and the odbc parameters to reflect your setup. This code should do the same thing as -odbc sqlfile("sqlfile.sql"), dsn("mysqlodbcdata")- plus actually load the data.
I also added the functionality to specify your DB credentials like this:
loadsql using "./sqlfile.sql", dsn("mysqlodbcdata") user("user_name") password("not12345")
For "--XYZ" style comments, do something like this (assuming you don't have "--" in your SQL code):
if strpos(`"``line''"', "--") > 0 {;
    local `line' = substr(`"``line''"', 1, strpos(`"``line''"', "--")-1);
};
I had to post this as an answer otherwise the formatting would've been all messed up, but it's obviously referring to Dimitriy's code.
(You could also define a local macro holding the position of the "--" string to make your code a little cleaner.)

Reading a text file with SSIS with CRLF or LF

Running into an issue where I receive a text file that has LF's as the EOL. Sometimes they send the file with CRLF's as the EOL. Does anyone have any good ideas on how I can make SSIS use either one as the EOL?
It's a very easy conversion with Notepad++ to change it to whatever I need; however, it's manual and I want it to be automatic.
Thanks,
EDIT: I fixed it (though not perfectly) by using Swiss File Knife before the data flow.
If the line terminators are always one or the other, I'd suggest setting up 2 File Connection Managers, one with the "CRLF" row delimiter, and the other with the "LF" row delimiter.
Then, create a boolean package variable (something like #IsCrLf) and scope this to your package. Make the first step in your SSIS package a Script Task, in which you read in a file stream, and attempt to discover what the line terminator is (based on what you find in the stream). Set the value of your variable accordingly.
Then, after the Script Task in your Control Flow, create 2 separate Data Flows (one for each File Connection Manager) and use a Precedence Constraint set to "Expression and Constraint" on the connectors to specify which Data Flow to use, depending on the value of the #IsCrLf variable.
Example of the suggested Control Flow below.
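The detection logic in that Script Task boils down to "does the first line end in \r\n?"; here is a sketch of the check in Python (the real Script Task would be written in C# or VB.NET, and the file path is a placeholder):
def uses_crlf(path, sample_size=64 * 1024):
    """Return True if the first line ending found in the file is CRLF."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    idx = sample.find(b"\n")          # first LF in the sample
    if idx <= 0:
        return False                  # no line ending found (or file starts with LF)
    return sample[idx - 1:idx] == b"\r"

# Set the package variable (e.g. IsCrLf) from this result in the Script Task.
print(uses_crlf("incoming_file.txt"))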
How about a Derived Column with the REPLACE operation after your file source to change the CRLFs to LFs?
I second the OP's vote for Swiss File Knife.
To integrate that, I had to add an Execute Process Task:
However, I have a bunch of packages that run For-Each-File loops, so I needed some BIML - maybe this'll help the next soul.
<ExecuteProcess Name="(EXE) Convert crlf for <#= tableName #>"
                Executable="<#= myExeFolder #>sfk.exe">
    <Expressions>
        <Expression PropertyName="Arguments">
            "crlf-to-lf " + @[User::sFullFilePath]
        </Expression>
    </Expressions>
</ExecuteProcess>
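If sfk is not available on a box, the same crlf-to-lf conversion is easy to script; here is a minimal sketch in Python (the path is a placeholder):
# Sketch: convert CRLF to LF in place, equivalent to `sfk crlf-to-lf <file>`.
path = "incoming_file.txt"
with open(path, "rb") as f:
    data = f.read()
with open(path, "wb") as f:
    f.write(data.replace(b"\r\n", b"\n"))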