Write byte strings to a file - lxml

I have the following code:
from lxml import etree
xmlns = "http://www.fpml.org/FpML-5/confirmation"
xsi = "http://www.w3.org/2001/XMLSchema-instance"
fpmlVersion="http://www.fpml.org/FpML-5/confirmation ../../fpml-main-5-6.xsd http://www.w3.org/2000/09/xmldsig# ../../xmldsig-core-schema.xsd"
page = etree.Element("{"+xmlns+"}dataDocument",nsmap={None:xmlns,'xsi':xsi })
doc = etree.ElementTree(page)
page.set("fpmlVersion", fpmlVersion)
trade = etree.SubElement(page,'trade')
party = etree.SubElement(page,'party',id='party1')
partyID = etree.SubElement(party,'partyID')
partyID.text = 'PARTYAUS33'
partyName = etree.SubElement(party,'partyName')
partyName.text = 'Party A'
party = etree.SubElement(page,'party',id='party2')
partyID = etree.SubElement(party,'partyID')
partyID.text = 'BARCGB2L'
partyName = etree.SubElement(party,'partyName')
partyName.text = 'Party B'
s = etree.tostring(doc, xml_declaration=True,encoding="UTF-8",pretty_print=True)
print (s)
How do I save the contents of s to a file?

You have to open() a new file in binary mode and later use the file handle's write() function, so add:
import sys
fd = open(sys.argv[1], 'wb')
at the beginning.
And:
fd.write(s)
at the end.
So, you now have to pass an additional argument to the script, which will be the output file to write. Also note the 'wb' flags in the open() call: w will destroy (re-create) any file with that name in the current directory, and b opens the file in binary mode, which is needed because etree.tostring() returns bytes here.
Run it like:
python3 script.py outfile
That yields:
<?xml version='1.0' encoding='UTF-8'?>
<dataDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.fpml.org/FpML-5/confirmation" fpmlVersion="http://www.fpml.org/FpML-5/confirmation ../../fpml-main-5-6.xsd http://www.w3.org/2000/09/xmldsig# ../../xmldsig-core-schema.xsd">
<trade/>
<party id="party1">
<partyID>PARTYAUS33</partyID>
<partyName>Party A</partyName>
</party>
<party id="party2">
<partyID>BARCGB2L</partyID>
<partyName>Party B</partyName>
</party>
</dataDocument>
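
Putting the pieces together, a minimal sketch of the complete save step could look like this (same tree-building code as in the question, output path taken from the command line):

import sys
from lxml import etree

# ... build `doc` exactly as in the question ...

s = etree.tostring(doc, xml_declaration=True, encoding="UTF-8", pretty_print=True)

# tostring() returns bytes here, so the output file must be opened in binary mode.
with open(sys.argv[1], 'wb') as fd:
    fd.write(s)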

Related

wait until the blob storage folder is created

I would like to download a picture into a blob folder.
Before that, I need to create the folder first.
The code below is what I am doing.
The issue is that the folder needs time to be created.
When it comes to with open(abs_file_name, "wb") as f:
it cannot find the folder.
I am wondering whether there is an 'await' to know when the folder creation has completed, and then do the write operation.
for index, row in data.iterrows():
    url = row['Creatives']
    file_name = url.split('/')[-1]
    r = requests.get(url)
    abs_file_name = lake_root + file_name
    dbutils.fs.mkdirs(abs_file_name)
    if r.status_code == 200:
        with open(abs_file_name, "wb") as f:
            f.write(r.content)
The final sub-folder will not be created when using dbutils.fs.mkdirs() on blob storage.
It creates a file with the final sub-folder's name that looks like a directory, but it is not actually a directory. Look at the following demonstration:
dbutils.fs.mkdirs('/mnt/repro/s1/s2/s3.csv')
When I try to open this file, the error says that this is a directory.
This might be the issue with the code. So, try using the following code instead:
for index, row in data.iterrows():
    url = row['Creatives']
    file_name = url.split('/')[-1]
    r = requests.get(url)
    abs_file_name = lake_root + 'fail'  # creates the fake directory (to counter the problem we are facing above)
    dbutils.fs.mkdirs(abs_file_name)
    if r.status_code == 200:
        with open(lake_root + file_name, "wb") as f:
            f.write(r.content)

Combine two TTS outputs in a single mp3 file not working

I want to combine two requests to the Google Cloud Text-to-Speech API into a single mp3 output. The reason I need to combine two requests is that the output should contain two different languages.
The code below works fine for many language pair combinations, but unfortunately not for all. If I request e.g. a sentence in English and one in German and combine them, everything works. If I request one in English and one in Japanese, I can't combine the two files into a single output. The output only contains the first sentence, and instead of the second sentence it outputs silence.
I have now tried multiple ways to combine the two outputs, but the result stays the same. The code below should show the issue.
Please run the code first with:
python synthesize_bug.py --t1 'Hallo' --code1 de-De --t2 'August' --code2 de-De
This works perfectly.
python synthesize_bug.py --t1 'Hallo' --code1 de-De --t2 'こんにちは' --code2 ja-JP
This doesn't work. The single files are ok, but the combined files contain silence instead of the Japanese part.
Also, if used with two Japanese sentences, everything works.
I already filed a bug report at Google with no response yet, but maybe it's just me who is doing something wrong here with encoding assumptions. Hope someone has an idea.
#!/usr/bin/env python
import argparse

# [START tts_synthesize_text_file]
def synthesize_text_file(text1, text2, code1, code2):
    """Synthesizes speech from the input file of text."""
    from apiclient.discovery import build
    import base64

    service = build('texttospeech', 'v1beta1')
    collection = service.text()

    data1 = {}
    data1['input'] = {}
    data1['input']['ssml'] = '<speak><break time="2s"/></speak>'
    data1['voice'] = {}
    data1['voice']['ssmlGender'] = 'FEMALE'
    data1['voice']['languageCode'] = code1
    data1['audioConfig'] = {}
    data1['audioConfig']['speakingRate'] = 0.8
    data1['audioConfig']['audioEncoding'] = 'MP3'
    request = collection.synthesize(body=data1)
    response = request.execute()
    audio_pause = base64.b64decode(response['audioContent'].decode('UTF-8'))
    raw_pause = response['audioContent']

    ssmlLine = '<speak>' + text1 + '</speak>'
    data1 = {}
    data1['input'] = {}
    data1['input']['ssml'] = ssmlLine
    data1['voice'] = {}
    data1['voice']['ssmlGender'] = 'FEMALE'
    data1['voice']['languageCode'] = code1
    data1['audioConfig'] = {}
    data1['audioConfig']['speakingRate'] = 0.8
    data1['audioConfig']['audioEncoding'] = 'MP3'
    request = collection.synthesize(body=data1)
    response = request.execute()

    # The response's audio_content is binary.
    with open('output1.mp3', 'wb') as out:
        out.write(base64.b64decode(response['audioContent'].decode('UTF-8')))
        print('Audio content written to file "output1.mp3"')
    audio_text1 = base64.b64decode(response['audioContent'].decode('UTF-8'))
    raw_text1 = response['audioContent']

    ssmlLine = '<speak>' + text2 + '</speak>'
    data2 = {}
    data2['input'] = {}
    data2['input']['ssml'] = ssmlLine
    data2['voice'] = {}
    data2['voice']['ssmlGender'] = 'MALE'
    data2['voice']['languageCode'] = code2  # 'ko-KR'
    data2['audioConfig'] = {}
    data2['audioConfig']['speakingRate'] = 0.8
    data2['audioConfig']['audioEncoding'] = 'MP3'
    request = collection.synthesize(body=data2)
    response = request.execute()

    # The response's audio_content is binary.
    with open('output2.mp3', 'wb') as out:
        out.write(base64.b64decode(response['audioContent'].decode('UTF-8')))
        print('Audio content written to file "output2.mp3"')
    audio_text2 = base64.b64decode(response['audioContent'].decode('UTF-8'))
    raw_text2 = response['audioContent']

    result = audio_text1 + audio_pause + audio_text2
    with open('result.mp3', 'wb') as out:
        out.write(result)
        print('Audio content written to file "result.mp3"')

    raw_result = raw_text1 + raw_pause + raw_text2
    with open('raw_result.mp3', 'wb') as out:
        out.write(base64.b64decode(raw_result.decode('UTF-8')))
        print('Audio content written to file "raw_result.mp3"')
# [END tts_synthesize_text_file]

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('--t1')
    parser.add_argument('--code1')
    parser.add_argument('--t2')
    parser.add_argument('--code2')
    args = parser.parse_args()
    synthesize_text_file(args.t1, args.t2, args.code1, args.code2)
You can find the answer here:
https://issuetracker.google.com/issues/120687867
Short answer: It's not clear why it is not working, but Google suggests a workaround: first write the files as .wav, combine them, and then re-encode the result to mp3.
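For illustration, here is a rough sketch of that workaround (my own sketch, not code from the issue tracker): request audioEncoding 'LINEAR16' instead of 'MP3' so each response is a WAV file, save them, concatenate the audio frames with the standard-library wave module, and re-encode the combined file to mp3 afterwards (for example with ffmpeg). The helper and file names below are made up.

import wave

def concat_wavs(paths, out_path):
    """Append the audio frames of several WAV files into one output WAV.

    Assumes every input was synthesized with the same audioConfig
    (same sample rate and channel count), so the WAV parameters match.
    """
    params = None
    frames = []
    for path in paths:
        with wave.open(path, 'rb') as w:
            if params is None:
                params = w.getparams()
            frames.append(w.readframes(w.getnframes()))
    with wave.open(out_path, 'wb') as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)

# concat_wavs(['output1.wav', 'pause.wav', 'output2.wav'], 'result.wav')
# followed by e.g.: ffmpeg -i result.wav result.mp3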
I have managed to do this in NodeJS with just one function (I don't know how optimal it is, but at least it works). Maybe you can take inspiration from it.
I have used the memory-streams dependency from npm.
var streams = require('memory-streams');

function mergeAudios(audios) {
    var reader = new streams.ReadableStream();
    var writer = new streams.WritableStream();
    audios.forEach(element => {
        if (element instanceof streams.ReadableStream) {
            element.pipe(writer);
        } else {
            writer.write(element);
        }
    });
    reader.append(writer.toBuffer());
    return reader;
}
The input parameter is a list that contains either ReadableStream objects or response.audioContent buffers from the synthesizeSpeech operation. If an element is a ReadableStream, it is piped into the writer; if it is audio content, it is written directly with the write method. At the end, all content is appended to a single ReadableStream, which is returned.

Test file upload in Flask

I have a Flask controller (POST) to upload a file:
f = request.files['external_data']
filename = secure_filename(f.filename)
f.save(filename)
I have tried to test it:
handle = open(filepath, 'rb')
fs = FileStorage(stream=handle, filename=filename, name='external_data')
payload['files'] = fs
url = '/my/upload/url'
test_client.post(url, data=payload)
But in the controller request.files contains:
ImmutableMultiDict: ImmutableMultiDict([('files', <FileStorage: u'myfile.png' ('image/png')>)])
My tests pass if I replace 'external_data' with 'files'.
How is it possible to create a Flask test request that contains request.files['external_data']?
You're not showing where payload comes from, which is the issue.
payload should probably be a .copy() of a dict() version of your original object.
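For what it's worth, a hedged sketch of what that could look like (base_payload is a hypothetical dict holding your other form fields): if I'm reading Werkzeug's test client right, the key you use in the data dict, not the name argument of FileStorage, determines the form field name, so put the file under the key 'external_data' to have it show up as request.files['external_data'].

from werkzeug.datastructures import FileStorage

payload = dict(base_payload)  # a fresh copy of your original fields
with open(filepath, 'rb') as handle:
    # The dict key ('external_data') becomes the field name in request.files.
    payload['external_data'] = FileStorage(stream=handle,
                                           filename=filename,
                                           name='external_data')
    response = test_client.post('/my/upload/url',
                                data=payload,
                                content_type='multipart/form-data')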

How to set log filename in flume

I am using Apache Flume for log collection. This is my config file:
httpagent.sources = http-source
httpagent.sinks = local-file-sink
httpagent.channels = ch3
#Define source properties
httpagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
httpagent.sources.http-source.channels = ch3
httpagent.sources.http-source.port = 8082
# Local File Sink
httpagent.sinks.local-file-sink.type = file_roll
httpagent.sinks.local-file-sink.channel = ch3
httpagent.sinks.local-file-sink.sink.directory = /home/avinash/log_dir
httpagent.sinks.local-file-sink.sink.rollInterval = 21600
# Channels
httpagent.channels.ch3.type = memory
httpagent.channels.ch3.capacity = 1000
My application is working fine. My problem is that in log_dir the files are named with some random number (I guess it's a timestamp) by default.
How do I give the log files a proper filename suffix?
Having a look at the documentation, it seems there is no parameter for configuring the name of the files that are going to be created. I've gone through the sources looking for some hidden parameter, but there is none :)
Going into the details of the implementation, it seems the name of the file is managed by the PathManager class:
private PathManager pathController;
...
#Override
public Status process() throws EventDeliveryException {
...
if (outputStream == null) {
File currentFile = pathController.getCurrentFile();
logger.debug("Opening output stream for file {}", currentFile);
try {
outputStream = new BufferedOutputStream(new FileOutputStream(currentFile));
...
}
Which, as you already noticed, is based on the current timestamp (showing the constructor and the next file getter):
public PathManager() {
    seriesTimestamp = System.currentTimeMillis();
    fileIndex = new AtomicInteger();
}

public File nextFile() {
    currentFile = new File(baseDirectory, seriesTimestamp + "-" + fileIndex.incrementAndGet());
    return currentFile;
}
So, I think the only possibility you have is to extend the File Roll sink and override the process() method in order to use a custom path controller.
For sources, you can use exec commands to tail a file and prepend or append details, based on shell scripting. Below is a sample:
# Describe/configure the source for tailing file
httpagent.sources.source.type = exec
httpagent.sources.source.shell = /bin/bash -c
httpagent.sources.source.command = tail -F /path/logs/*_details.log
httpagent.sources.source.restart = true
httpagent.sources.source.restartThrottle = 1000
httpagent.sources.source.logStdErr = true

Generate xml documents using lxml and vary element text and attributes based on logic

I have my lxml code like this:
from lxml import etree
import sys
fd = open('D:\\text.xml', 'wb')
xmlns = "http://www.fpml.org/FpML-5/confirmation"
xsi = "http://www.w3.org/2001/XMLSchema-instance"
fpmlVersion="http://www.fpml.org/FpML-5/confirmation ../../fpml-main-5-6.xsd http://www.w3.org/2000/09/xmldsig# ../../xmldsig-core-schema.xsd"
page = etree.Element("{"+xmlns+"}dataDocument",nsmap={None:xmlns,'xsi':xsi })
doc = etree.ElementTree(page)
page.set("fpmlVersion", fpmlVersion)
trade = etree.SubElement(page,'trade')
tradeheader = etree.SubElement(trade,'tradeheader')
partyTradeIdentifier = etree.SubElement(tradeheader,'partyTradeIdentifier')
partyReference = etree.SubElement(partyTradeIdentifier,'partyReference',href='party1')
tradeId = etree.SubElement(partyTradeIdentifier,'tradeId',tradeIdScheme='http://www.partyA.com/swaps/trade-id')
tradeId.text = 'TW9235'
swap = etree.SubElement(trade,'swap')
party = etree.SubElement(page,'party',id='party1')
partyID = etree.SubElement(party,'partyID')
partyID.text = 'PARTYAUS33'
partyName = etree.SubElement(party,'partyName')
partyName.text = 'Party A'
party = etree.SubElement(page,'party',id='party2')
partyID = etree.SubElement(party,'partyID')
partyID.text = 'BARCGB2L'
partyName = etree.SubElement(party,'partyName')
partyName.text = 'Party B'
s = etree.tostring(doc, xml_declaration=True,encoding="UTF-8",pretty_print=True)
print (s)
fd.write(s)
And I need to generate an XML file like:
<?xml version='1.0' encoding='UTF-8'?>
<dataDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.fpml.org/FpML-5/confirmation" fpmlVersion="http://www.fpml.org/FpML-5/confirmation ../../fpml-main-5-6.xsd http://www.w3.org/2000/09/xmldsig# ../../xmldsig-core-schema.xsd">
<trade>
<tradeheader>
<partyTradeIdentifier>
<partyReference href="party1"/>
<tradeId tradeIdScheme="http://www.partyA.com/swaps/trade-id">TW9235</tradeId>
</partyTradeIdentifier>
</tradeheader>
<swap/>
</trade>
<party id="party1">
<partyID>PARTYAUS33</partyID>
<partyName>Party A</partyName>
</party>
<party id="party2">
<partyID>BARCGB2L</partyID>
<partyName>Party B</partyName>
</party>
</dataDocument>
Now the above code works.
However, I need to generate 10k such files where the element text or attributes vary.
For example, the partyID may be different, like
PARTYGER45 instead of PARTYAUS33. Is there a clean way to do this instead of hard-coding it?
Similarly, I need to vary lots of other things, like the tradeId TW9235.
One way could be to load the output XML, without values, into lxml objectify and then loop, setting the relevant values and writing each result to its own file, meaning:
from lxml import etree, objectify

with open('in.xml', 'rb') as f_in:
    template = f_in.read()

for pId in ['PARTYGER45', ...]:
    dataDocument = objectify.fromstring(template)
    dataDocument.party.partyID._setText(pId)
    ...
    obj_xml = etree.tostring(dataDocument)
    with open('out_%s.xml' % pId, 'wb') as f_out:
        f_out.write(obj_xml)
Another way might be to use lxml and XSLT: again, start from an empty structured XML template and transform it according to your needs, as in the sketch below.
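A rough sketch of that XSLT idea (my own illustration; the stylesheet, parameter name and file names are made up): an identity transform copies the template unchanged, and one extra template overwrites party1's partyID with a stylesheet parameter, which you can vary per output file.

from lxml import etree

xslt_doc = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fpml="http://www.fpml.org/FpML-5/confirmation">
  <xsl:param name="newPartyID"/>
  <!-- identity template: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- replace the text of party1's partyID with the parameter value -->
  <xsl:template match="fpml:party[@id='party1']/fpml:partyID/text()">
    <xsl:value-of select="$newPartyID"/>
  </xsl:template>
</xsl:stylesheet>
''')
transform = etree.XSLT(xslt_doc)
template = etree.parse('in.xml')

for pId in ['PARTYGER45']:
    result = transform(template, newPartyID=etree.XSLT.strparam(pId))
    with open('out_%s.xml' % pId, 'wb') as f_out:
        f_out.write(etree.tostring(result, xml_declaration=True,
                                   encoding='UTF-8', pretty_print=True))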