Read flat file line by line using Smooks

I have a flat file as input. I need to read the flat file line by line using Smooks.
Can anyone please give me suggestions or sample code on how to do it?

Here is my Smooks config file (SmooksConfig_ForFlatFile.xml).
Here is the input file (input-messages.txt):
charles moulliard Male 43BE
maxence dewil Male 30NL
eleonor moulliard Female 12AD
Here is the Java code:
Smooks smooks = new Smooks("SmooksConfig_ForFlatFile.xml");
FileOutputStream fos = new FileOutputStream("FlatFile_Output.xml");
smooks.filterSource(new StreamSource(new FileInputStream("input-messages.txt")), new StreamResult(fos));
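
The contents of SmooksConfig_ForFlatFile.xml aren't shown above. As one possible starting point, Smooks provides a fixed-length reader that turns each line of a fixed-width file into a record event; a minimal config sketch might look like this (the namespace version, field names, and field widths are assumptions and must be adjusted to the actual record layout):

<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:fl="http://www.milyn.org/xsd/smooks/fixed-length-1.3.xsd">
    <!-- hypothetical widths; adjust to the exact column layout of input-messages.txt -->
    <fl:reader fields="firstname[9],lastname[10],gender[7],age[2],country[2]"/>
</smooks-resource-list>

With such a reader configured, the filterSource call above processes the input record by record instead of loading the whole file at once.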

Related

How to loop over CSV file and write each line in Write File activity?

I am using the TIBCO BW 6.5 designer, and I am trying to read a large CSV file (with ; as the separator). Below is some of my sample CSV file data:-
ORDER_NUMBER;CODE_NUMBER
A;014 53758
B;015 73495
C;016 67569
D;017 59390
I am trying to start reading from 2nd line i.e. "A;014 53758".
I am using "ParseData" activity which is placed inside a "Repeat" group as shown in Image below:-
The configuration of my "Repeat" group is below:-
The configuration of my "ParseData" is:-
In my WriteFile I have checked the "append" box, and I am writing as 'Text' in my file. The textContent for my WriteFile is :-
concat($ParseData/Rows/Updates[$index]/ORDER_NUMBER, $ParseData/Rows/Updates[$index]/CODE_NUMBER , '&crlf;')
But when I run my project, the Write File activity only writes the first row and all the remaining rows are blank.
Can anybody please help me rectify what I am doing wrong?
Thanks,
Rudra
Try this:
ParseData activity input: startRecord should be 1 instead of $index + 1
WriteFile activity input: concat($ParseData/Rows/Updates[1]/ORDER_NUMBER, $ParseData/Rows/Updates[1]/CODE_NUMBER, '&crlf;') (1 instead of $index)
You can uncheck "accumulate" in the Repeat loop

BigQuery error (ASCII 0) encountered for external table and when loading table

I'm getting this error:
"Error: Error detected while parsing row starting at position: 4824. Error: Bad character (ASCII 0) encountered."
The data is not compressed.
My external table points to multiple CSV files, and one of them contains a couple of lines with that character. In my table definition I added "MaxBadRecords", but that had no effect. I also get the same problem when loading the data in a regular table.
I know I could use Dataflow or even try to fix the CSVs, but is there an alternative that does not involve writing a parser, and is hopefully just as easy and efficient?
is there an alternative that does not involve writing a parser, and is hopefully just as easy and efficient?
Try the below in the Google Cloud SDK Shell (using the tr utility):
gsutil cp gs://bucket/badfile.csv - | tr -d '\000' | gsutil cp - gs://bucket/fixedfile.csv
This will:
Read your "bad" file
Remove the ASCII 0 characters
Save the "fixed" content into a new file
After you have the new file, just make sure your table points to the fixed one.
Sometimes a stray NUL byte ends up at the end of the file.
What can help is replacing it with a space:
tr '\0' ' ' < file1 > file2
You can clean the file using an external tool like Python or PowerShell. There is no way to load a file containing ASCII 0 into BigQuery.
This is a script that can clean the file with Python:
import os
import shutil
from tempfile import mkstemp

def replace_chars(file_path, original_string, new_string):
    # Create a temp file for the cleaned content
    fh, abs_path = mkstemp()
    with os.fdopen(fh, 'w', encoding='utf-8') as new_file:
        with open(file_path, encoding='utf-8', errors='replace') as old_file:
            print("\nCurrent line: \t")
            i = 0
            for line in old_file:
                print(i, end="\r", flush=True)
                i = i + 1
                line = line.replace(original_string, new_string)
                new_file.write(line)
    # Copy the file permissions from the old file to the new file
    shutil.copymode(file_path, abs_path)
    # Remove the original file
    os.remove(file_path)
    # Move the new file into place
    shutil.move(abs_path, file_path)
The same but for PowerShell:
(Get-Content "C:\Source.DAT") -replace "`0", " " | Set-Content "C:\Destination.DAT"

Write Chinese words in CSV file using Python 2.7

I am trying to write Chinese words like 花花公子昊天鞋类专营店 to a CSV file in Python, but I am not able to do it. I tried the solution given here ("issues with writing Chinese to csv file in Python"). Any help will be appreciated.
The unicodecsv module helps with that (you can install it with pip):
import unicodecsv
w = unicodecsv.writer(open("test.csv", "wb"))
w.writerow((u"花花公子昊天鞋类专营店", 78.10))
del w
The resulting CSV file opens successfully in OpenOffice.
You can also read it back in Python:
r = unicodecsv.reader(open("test.csv", "rb"))
for row in r:
    print row[0], row[1]
And when run, it should print:
(user@motoom) ~/Prj/python $ python chinesecsv.py
花花公子昊天鞋类专营店 78.1
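If installing a package is not an option, the same idea works with the Python 2 standard library: encode the Unicode strings to UTF-8 byte strings yourself before handing them to csv. A minimal sketch (this is essentially what unicodecsv automates):

import csv

w = csv.writer(open("test.csv", "wb"))
# the Python 2 csv module handles byte strings, so encode the Unicode text first
w.writerow((u"花花公子昊天鞋类专营店".encode("utf-8"), 78.10))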

File line splitting in Jython

I am trying to read a file and populate the values into a DB with the help of Jython in ODI.
For this, I read the lines one by one and split each line on the ',' characters present.
Now I have a line as
4JGBB8GB5AA557812,,Miss,Maria,Cruz,,"266 Faller Drive Apt. B",
New Milford,NJ,07646,2015054604,2015054604,20091029,51133,,,
N,LESSEE,"MERCEDES-BENZ USA, LLC",N,N
"MERCEDES-BENZ USA, LLC" this field has , within the double quotes due to which it gets split into two fields whereas it should only be considered one. Can someone please tell me how should i avoid this.
fields = valueList.split(',')
I use this for splitting, where valueList is the individual line read from the file.
You can use the csv module, which can take care of quotes:
import StringIO
import csv

line = '4JGBB8GB5AA557812,,Miss,Maria,Cruz,,"266 Faller Drive Apt. B",New Milford,NJ,07646,2015054604,2015054604,20091029,51133,,,N,LESSEE,"MERCEDES-BENZ USA, LLC",N,N'
f = StringIO.StringIO(line)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\n'.join(row))
result:
...
266 Faller Drive Apt. B
...
LESSEE
MERCEDES-BENZ USA, LLC
...
My example uses StringIO because the test line is a string in the code; with a real file you can simply use the open file handle as f.
You will find more examples at "Module of the Month": http://pymotw.com/2/csv/index.html#module-csv
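For instance, a minimal sketch of the same parsing reading straight from a file (the file name here is a placeholder):

import csv

f = open('input.csv', 'rb')  # hypothetical input file
reader = csv.reader(f, delimiter=',')
for row in reader:
    # quoted fields like "MERCEDES-BENZ USA, LLC" stay intact as single elements
    print(row)
f.close()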

How to read different file format data and use it for compression

fob = open('this.txt', 'rb')
fob1 = open('that.txt', 'wb')
content = ''
for i in fob:
    content += i
fob1.write(content)
fob.close()
fob1.close()
This is code used to read a txt file and store its contents in another txt file. How do I read any kind of file? It might even be a JPEG file, a PDF file or some other file. Please do help me.
Thanks in advance.
Your code reads a *.txt file line by line (and copies it).
If you want to read a different type of file byte by byte and print its bits, you can do this:
f = open('test.gnu', 'rb')
flag = 1
while flag:
    byte = f.read(1)
    flag = (byte != "")
    if flag:
        # do something with the byte, e.g.
        # print its bits:
        print '{0:08b}'.format(ord(byte))
f.close()
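
If the goal is just to copy arbitrary binary files rather than inspect each byte, a simpler pattern is to read in fixed-size chunks. A sketch, reusing the file names from the question:

# copy any file (text, JPEG, PDF, ...) in 4 KB binary chunks
with open('this.txt', 'rb') as src, open('that.txt', 'wb') as dst:
    while True:
        chunk = src.read(4096)
        if not chunk:  # empty read means end of file
            break
        dst.write(chunk)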
Or if you want to zip and unzip files, you can use the "zipfile" package:
http://docs.python.org/2/library/zipfile; for example code covering various compression formats, see:
http://pymotw.com/2/compression.html
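
For instance, a minimal zipfile sketch (archive and directory names are placeholders); it works the same for text, JPEG, PDF or any other file type:

import zipfile

# compress a file into a new archive
with zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write('this.txt')

# extract everything back out
with zipfile.ZipFile('archive.zip', 'r') as zf:
    zf.extractall('extracted')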