Pentaho Spoon - Decimal from Text File Input

I'm new to Pentaho and have a little problem with the Text file input step.
I currently need to write several data records to a database. In the files, the decimal numbers use a point as the decimal separator.
Pentaho is currently transforming the number 123.3659 € to 12.33 €.
Can someone help?

When you read the file, do you read it as a CSV, Excel or something like that? If that's the case, then you can specify the format of the column so the number is interpreted correctly (I think; I'm talking from memory now). Or maybe playing with the language (locale) setting of the file might work.
If it's a file containing a string, you can use a step like "Replace in string" to replace the point with a comma.

This problem might have several causes, but I think the following steps will solve it:
- First, add a "Replace in String" step;
- Then have it search for the dot and replace it with nothing, or with a comma if the number you show is a float.
Hope this helped!
Give feedback if so!
Have a good day!
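If it helps to see the idea outside the GUI, the "read it as text, then convert it yourself" approach looks roughly like this (a Python sketch; the sample value comes from the question, everything else is an assumption):
# Read the field as a plain string first, so no locale-dependent parser
# gets a chance to misinterpret the point.
raw = "123.3659"

# Convert explicitly, treating the point as the decimal separator.
amount = float(raw)                    # 123.3659

# Or, if the target system expects a comma as the decimal separator,
# swap the character in the string before converting/loading.
comma_style = raw.replace(".", ",")    # "123,3659"

print(amount, comma_style)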

Related

Multi-line text in a .env file

In Vue, is there a way to have a value span multiple lines in a .env file? Ex:
Instead of:
someValue=[{"someValue":"Here is a really really long piece which should be split into multiple lines"}]
I want to do something like:
someValue=`[{"someValue":"Here is a really
really long piece which
should be split into multiple lines"}]`
Doing the latter gives me a JSON parsing error if I try to do JSON.parse(someValue) in my code
I don't know if this will work, but I can't format a comment well enough to get the point across, so see if this helps:
someValue=[{"someValue":"Here is a really\
really long piece which\
should be split into multiple lines"}]
Where "\" should escape the newline similar to how you can write long bash commands while escaping the newline. I'm not certain the .env interpreter will support it though.
EDIT
Looks like this won't work. This syntax was actually proposed, but I don't think it was incorporated. See motdotla/dotenv#333 (which is what Vue uses to parse .env).
As #zero298 said, this isn't possible. You could likely delimit the entry with a character that wouldn't normally show up in the text (^ is a good candidate), then parse it within the application using string.replace('^', '\n');
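A rough sketch of that workaround, shown in Python for illustration (the sample value and the ^ placeholder follow the question; everything else is made up):
import json

# The value as it would sit on a single line in the .env file,
# with "^" standing in for the intended line breaks.
RAW = '[{"someValue":"Here is a really^really long piece which^should be split into multiple lines"}]'

# Parse the JSON first, then restore the newlines on the string value itself;
# a literal newline inside a JSON string would not parse.
parsed = json.loads(RAW)
text = parsed[0]["someValue"].replace("^", "\n")
print(text)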

Inconsistent line endings in SSIS Flat File import

I have a large, pipe-delimited text file with no text qualifiers, and it looks like whatever spit out this file accidentally emitted false "LF" markers in the last column every few hundred rows.
The last column is a descriptive column, and it is not text-qualified in any way like it should be.
The file looks similar to this:
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Descr[LF]
iption[LF]
id|data|data|data|data|Description[LF]
Id|data|data|data|data|Description[LF]
id|data|data|data|data|Descripti[LF]
on[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|D[LF]
escription[LF]
I'm pretty new to SSIS and SQL in general. Does anyone have any advice on how to fix this?
I did actually find a way to fix it in Notepad++, because I don't know C# and I don't know SSIS well enough.
The ID was 8 digits long and followed by 7 blank spaces, which was absolutely unique to this file.
In Notepad++ I used the Extended search mode to find and replace "\n" (LF) with nothing,
then I used this expression for Find:
(\d\d\d\d\d\d\d\d[[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]])
to find all 8 digit numbers with 7 trailing spaces, and for replace, used this:
\r\n\1
to put a [CR][LF] in front of those 8 digit numbers.
Lo and behold it worked!
But either way, my boss contacted the client and is requesting a better file. Now I get kudos, and we get proper data. Thanks for the advice all!
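For anyone who would rather script that repair than do it by hand in Notepad++, a rough Python equivalent (assuming the same 8-digits-plus-7-spaces pattern; the file names are made up):
import re

# Read the broken export, joining the falsely split lines by removing every LF,
# then re-insert CRLF in front of each "8 digits + 7 spaces" record start.
with open("broken_export.txt", "r", newline="") as f:
    text = f.read().replace("\n", "")

fixed = re.sub(r"(\d{8} {7})", "\r\n\\1", text)

with open("fixed_export.txt", "w", newline="") as f:
    # lstrip() drops the CRLF that gets inserted before the very first record
    f.write(fixed.lstrip("\r\n"))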
If I had to take a guess, I would say that this is occurring because of how the file is created... the data probably just happens to include certain special characters which are being incorrectly interpreted as a line feed.
Check this site to see if the data within your problem lines match any of these encodings. If this is the case then ultimately you have two options available:
1) Create some elaborate and complicated ETL process to detect and correct the file data before you process it. This is inadvisable as it will be a major pain to create and maintain.
2) Try changing the way this file is produced. Most text export wizards will allow you to place quotes (") around text items so that your import process can quickly detect something as a text block as opposed to a series of encoded characters to interpret.
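To see why option 2 works, here is a small Python sketch (with made-up data): once the text column is wrapped in quotes, a standard delimited-file parser keeps an embedded line break inside the field instead of starting a new record.
import csv, io

# Pipe-delimited data where the quoted description contains a real line break;
# the quotes tell the parser it is still the same field, not a new record.
data = 'id|data|"Descr\niption"\nid2|data2|"Description"\n'

rows = list(csv.reader(io.StringIO(data), delimiter="|", quotechar='"'))
print(rows)   # [['id', 'data', 'Descr\niption'], ['id2', 'data2', 'Description']]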

SAS : read in PDF file

I am looking for ways to read a PDF file into SAS. Apparently this is not basic functionality and there is very little to be found on the internet. (It doesn't help that googling with "PDF" in your search also returns links to PDF documents about entirely different things.)
The only things that can be found are people looking for ways to import data from a PDF into datasets. For me, that is not even necessary. I would like to be able to read the contents of the PDF file into one big character variable. If possible, it would even be better to be able to read in the file's binary data.
Is this possible with SAS and how? (I got it to work in Access VBA, but can't find any similar ways in SAS.)
(In the end, the purpose is to convert this to base64 and put that base64-string into an XML document.)
You probably will not be able to read the entire file into one character variable, since the maximum length of a SAS character variable is 32,767 bytes. A simple way to read in one line at a time, though, is something like the following:
/* Name of the PDF and the maximum record length to read */
%let pdfFileName = Test.pdf;
%let lineSize = 2000;

data base;
   length text_line $&lineSize.;
   /* TRUNCOVER prevents errors when a record is shorter than the informat width */
   infile "&pdfFileName" lrecl=&lineSize truncover;
   /* $CHARw. reads the whole record, including embedded blanks,
      instead of stopping at the first space like simple list input */
   input text_line $char&lineSize..;
run;
This requires that you have a general idea of the maximum record length ahead of time, but you could write additional code to determine the maximum record size prior to reading in the file. In this example each line of text is read into one character variable named "text_line". From there, you could use a RETAIN statement or line-pointer controls (such as / or #n) in the INPUT statement to process multiple lines at a time. The SAS website has plenty of documentation on how to read and process text from various types of input files.
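As a reference for the stated end goal (file bytes → base64 → XML), here is what the overall flow looks like outside SAS, sketched in Python with made-up file and element names; in SAS the equivalent would involve reading the file in binary mode (e.g. with RECFM=N) plus a base64 encoding step:
import base64

# Read the whole PDF as raw bytes, base64-encode it, and drop the result into
# an XML element (the file name and element name here are made up).
with open("Test.pdf", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

with open("Test.xml", "w") as f:
    f.write("<document>{}</document>".format(encoded))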

Is there a tool for finding the Char code of a character?

I am trying to write a VB function to strip unwanted characters from a string. It is for generating a 'clean' URL from data that has been entered into a CMS. Someone has copied and pasted from a Word document, so there appears to be an em dash or en dash in the product title. This results in ─ appearing instead of -
I have tried a Replace(text, Chr(196), Chr(45)) but it isn't working so it can't be 196. Is there a tool or something where I can copy this character and paste it into the tool and it will tell me what char code it is?
Thanks.
You can make your program write out the character code using the function Asc().
Response.write Asc("-") would write out
45
for example.
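If you just want a quick one-off lookup, any scripting language can print the code of a pasted character; for example, in Python:
# Print the decimal and hex code point of whatever character you paste in.
ch = "─"                         # the box-drawing character from the question
print(ord(ch), hex(ord(ch)))     # 9472 0x2500
Note that this gives the Unicode code point; VB's Asc() returns the ANSI code for the current code page, which can differ, while AscW() returns the Unicode value.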
Try here or here. From the 2nd link I can see that your char is alt150.

Pentaho Spoon - Validate Fixed Width Input File Format

I'm trying to process a fixed-width input file in Pentaho and validate the format. The file will be a mixture of strings, numbers and dates. However, when attempting to process a number field that has an incorrect character present (which I had expected would throw an error), it just reads the first part of the number and ignores the bad character.
I can recreate this issue with a very simple input file containing a single field, specifying the expected number format along with the start position and length.
On running the transformation I would have expected the 'Q' to cause an error; instead, it just reads the first two digits, "67", and pads the rest to match the specified format.
If the input file is formatted correctly it runs perfectly well, but I need it to throw an error otherwise. Any suggestions would be awesome. Thanks!
Just an FYI in case someone stumbles across this question after hitting the same issue as me.
I was able to construct a workaround by reading all values in the "Text File Input" step as strings, and then using a "Data Validator" step equipped with regex evaluation to ensure numbers were correctly formatted before parsing to a number type with a following "Select Values" step.
It takes a bit longer to do this for every field, but it was the most robust solution I could come up with.
Thanks
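For reference, the same idea outside Pentaho (validate the raw text against a regex before converting it to a number) as a rough Python sketch; the pattern and sample values are assumptions:
import re

# Accept only digits with an optional decimal part; adjust to the expected format.
NUMBER_PATTERN = re.compile(r"\d+(\.\d+)?")

def parse_number(raw):
    # Validate the raw field as text first, so a stray character such as 'Q'
    # is rejected instead of being silently truncated away.
    raw = raw.strip()
    if not NUMBER_PATTERN.fullmatch(raw):
        raise ValueError("badly formatted number: {!r}".format(raw))
    return float(raw)

print(parse_number("67123"))    # fine
parse_number("67Q123")          # raises ValueError instead of returning 67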