I'm trying to extract JSON fields from syslog inputs.
In ./etc/system/default/props.conf I've added the following lines:
[mylogtype]
SEDCMD-StripHeader = s/^[^{]+//
INDEXED_EXTRACTIONS = json
KV_MODE = none
pulldown_type = true
The SEDCMD works; the syslog headers are removed.
But the JSON fields are not parsed.
Any ideas?
Resolved. Use the following configuration in props.conf:
[yourlogtype]
SEDCMD-StripHeader = s/^[^{]+//
KV_MODE = json
pulldown_type = true
This works because KV_MODE = json extracts the JSON fields at search time, after the SEDCMD has already stripped the syslog header, whereas INDEXED_EXTRACTIONS = json parses the raw data earlier in the pipeline, while the header is still in front of the JSON.
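For example, given a hypothetical incoming event such as
<13>Jan 01 12:00:00 myhost myapp: {"user": "alice", "action": "login"}
the SEDCMD strips everything up to the first {, so the event is indexed as
{"user": "alice", "action": "login"}
and the fields user and action are then extracted at search time by KV_MODE = json.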
My flume config:
agent.sinks = s3hdfs
agent.sources = MySpooler
agent.channels = channel
agent.sinks.s3hdfs.type = hdfs
agent.sinks.s3hdfs.hdfs.path = s3a://mybucket/test
agent.sinks.s3hdfs.hdfs.filePrefix = FilePrefix
agent.sinks.s3hdfs.channel = channel
agent.sinks.s3hdfs.hdfs.useLocalTimeStamp = true
agent.sources.MySpooler.channels = channel
agent.sources.MySpooler.type = spooldir
agent.sources.MySpooler.spoolDir = /flume_to_aws
agent.sources.MySpooler.fileHeader = true
agent.channels.channel.type = memory
agent.channels.channel.capacity = 100
Now I add a file to the /flume_to_aws folder with the following text content:
Oracle and SQL Server
After it was uploaded to S3, I downloaded the file, opened it, and it showed the following text:
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable
Œúg ÊC•ý¤ïM·T.C ! †"ûþ Oracle and SQL ServerÿÿÿÿŒúg ÊC•ý¤ïM·T.C
Why is the file not uploaded with only the text "Oracle and SQL Server"?
Problem solved. I found this question on Stack Overflow here.
Flume was generating files in binary format instead of text format: by default the HDFS sink writes Hadoop SequenceFiles, which is where the SEQ!org.apache.hadoop.io... header in the downloaded file comes from.
So I added the following lines:
agent.sinks.s3hdfs.hdfs.writeFormat = Text
agent.sinks.s3hdfs.hdfs.fileType = DataStream
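For reference, the sink section with both settings applied (same agent and sink names as above):
agent.sinks.s3hdfs.type = hdfs
agent.sinks.s3hdfs.hdfs.path = s3a://mybucket/test
agent.sinks.s3hdfs.hdfs.filePrefix = FilePrefix
agent.sinks.s3hdfs.channel = channel
agent.sinks.s3hdfs.hdfs.useLocalTimeStamp = true
agent.sinks.s3hdfs.hdfs.writeFormat = Text
agent.sinks.s3hdfs.hdfs.fileType = DataStream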
I have to read a file using this file format:
CREATE OR REPLACE FILE FORMAT FF_CSV
TYPE = CSV
COMPRESSION = GZIP
RECORD_DELIMITER = '\n'
FIELD_DELIMITER = 'µµµ'
FILE_EXTENSION = 'csv'
SKIP_HEADER = 0
SKIP_BLANK_LINES = TRUE
DATE_FORMAT = AUTO
TIME_FORMAT = AUTO
TIMESTAMP_FORMAT = AUTO
BINARY_FORMAT = UTF8
ESCAPE = NONE --may need to set to '<character>'
ESCAPE_UNENCLOSED_FIELD = NONE --may need to set to '<character>'
TRIM_SPACE = TRUE
FIELD_OPTIONALLY_ENCLOSED_BY = '"' --may need to set to '<character>'
NULL_IF = '' --( '<string>' [ , '<string>' ... ] )
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
REPLACE_INVALID_CHARACTERS = TRUE
VALIDATE_UTF8 = TRUE
EMPTY_FIELD_AS_NULL = TRUE
SKIP_BYTE_ORDER_MARK = FALSE
ENCODING = UTF8
;
For now I just want to test whether this definition works correctly with my file. However, I am uncertain how to test this. This is how I upload the file to my Snowflake stage:
put file://Users/myname/Desktop/leg.csv @~;
Now, how can I use the file format FF_CSV in a SELECT statement so that I can read my uploaded file using that format?
You would use a copy into statement to pull the data from the stage into a table.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
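A minimal sketch (it assumes a target table MY_TABLE already exists, and that PUT compressed the file to leg.csv.gz on upload, matching COMPRESSION = GZIP in your format):
COPY INTO MY_TABLE
FROM @~/leg.csv.gz
FILE_FORMAT = (FORMAT_NAME = 'FF_CSV');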
I would also recommend checking out some of these pages on loading data into Snowflake.
https://docs.snowflake.com/en/user-guide-data-load.html
I have created an ICF handler class which sends files back to the caller. It works fine with a single file, where I read the data in binary format and attach it to the body using set_data.
But when I try to add more than one file, I am unable to add the two files separately. I am using IF_HTTP_EXTENSION and do not have the NW Gateway component yet.
I am also using the MULTIPART feature, but I don't know exactly how to add two files separately. Can you please help me?
* file 1
server->response->set_header_field( name = 'Content-Type' value = 'multipart/mixed' ).
CONCATENATE 'form-data;name="file"; filename="' filename+5(9) '"' INTO lv_header_value.
server->response->set_header_field( name = 'content-disposition' value = lv_header_value ).
server->response->set_data( data = attach_xstring ).
* file 2
server->response->add_multipart( ).
CONCATENATE 'form-data;name="file"; filename="' filename+5(9) '"' INTO lv_header_value.
server->response->set_header_field( name = 'content-disposition' value = lv_header_value ).
server->response->set_data( data = attach_xstring ).
You need to use the add_multipart( ) method. Try it like this:
cl_http_client=>create( EXPORTING host = host service = port scheme = scheme
                        IMPORTING client = lo_http_client ).

lo_http_client->request->set_header_field( name = 'Content-Type' value = 'multipart/form-data' ). "#EC NOTEXT

lo_request_part = lo_http_client->request->add_multipart( ).
lo_request_part->set_content_type( 'application/xml' ).
lv_content_disposition = |form-data; name="item"; filename="item_data.xml" |.
lo_request_part->set_header_field( name = `Content-Disposition` value = lv_content_disposition ).
lo_request_part->set_data( data = lv_create_item_xml ).

LOOP AT mt_files ASSIGNING <attachment>.
  lo_request_part = lo_http_client->request->add_multipart( ).
  lo_request_part->set_content_type( <attachment>-content_type ). "#EC NOTEXT
  lv_content_disposition = |form-data; name="{ <attachment>-part_name }"; filename="{ <attachment>-filename }" |.
  lo_request_part->set_header_field( name = `Content-Disposition` value = lv_content_disposition ).
  lo_request_part->set_data( <attachment>-file ).
ENDLOOP.
This is a sample for a request, but for a response the approach is the same. Note that each call to add_multipart( ) returns a new part, and the Content-Disposition header and the data are set on that returned part, not on the request (or response) itself. Here an XML file is added to the request first, and then the attachments are processed in a loop.
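Applied to the response side of your ICF handler, a minimal sketch (hypothetical variable names; it assumes both files are already available as xstrings and reuses your Content-Disposition values):
DATA lo_part TYPE REF TO if_http_entity.

* file 1: create a part, then set header and data on the returned part
lo_part = server->response->add_multipart( ).
lo_part->set_header_field( name = 'content-disposition' value = lv_header_value1 ).
lo_part->set_data( data = attach_xstring1 ).

* file 2: a second add_multipart( ) call creates a separate part
lo_part = server->response->add_multipart( ).
lo_part->set_header_field( name = 'content-disposition' value = lv_header_value2 ).
lo_part->set_data( data = attach_xstring2 ).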
I want to combine two requests to the Google Cloud Text-to-Speech API into a single MP3 output. The reason I need to combine two requests is that the output should contain two different languages.
The code below works fine for many language pairs, but unfortunately not for all. If I request, for example, one sentence in English and one in German and combine them, everything works. If I request one in English and one in Japanese, I can't combine the two files into a single output: the output contains only the first sentence, and instead of the second sentence it outputs silence.
I have tried multiple ways of combining the two outputs, but the result stays the same. The code below demonstrates the issue.
Please run the code first with:
python synthesize_bug.py --t1 'Hallo' --code1 de-De --t2 'August' --code2 de-De
This works perfectly.
python synthesize_bug.py --t1 'Hallo' --code1 de-De --t2 'こんにちは' --code2 ja-JP
This doesn't work. The single files are OK, but the combined files contain silence instead of the Japanese part.
Also, if used with two Japanese sentences, everything works.
I have already filed a bug report with Google, with no response yet, but maybe I am just making a wrong encoding assumption here. I hope someone has an idea.
#!/usr/bin/env python
import argparse


# [START tts_synthesize_text_file]
def synthesize_text_file(text1, text2, code1, code2):
    """Synthesizes speech for two texts plus a pause and combines the MP3 outputs."""
    from apiclient.discovery import build
    import base64

    service = build('texttospeech', 'v1beta1')
    collection = service.text()

    # First request: a two-second pause, to be placed between the two sentences.
    data1 = {}
    data1['input'] = {}
    data1['input']['ssml'] = '<speak><break time="2s"/></speak>'
    data1['voice'] = {}
    data1['voice']['ssmlGender'] = 'FEMALE'
    data1['voice']['languageCode'] = code1
    data1['audioConfig'] = {}
    data1['audioConfig']['speakingRate'] = 0.8
    data1['audioConfig']['audioEncoding'] = 'MP3'

    request = collection.synthesize(body=data1)
    response = request.execute()
    audio_pause = base64.b64decode(response['audioContent'])
    raw_pause = response['audioContent']

    # Second request: the first sentence, spoken in code1.
    ssmlLine = '<speak>' + text1 + '</speak>'
    data1 = {}
    data1['input'] = {}
    data1['input']['ssml'] = ssmlLine
    data1['voice'] = {}
    data1['voice']['ssmlGender'] = 'FEMALE'
    data1['voice']['languageCode'] = code1
    data1['audioConfig'] = {}
    data1['audioConfig']['speakingRate'] = 0.8
    data1['audioConfig']['audioEncoding'] = 'MP3'

    request = collection.synthesize(body=data1)
    response = request.execute()

    # The response's audioContent is base64-encoded binary.
    with open('output1.mp3', 'wb') as out:
        out.write(base64.b64decode(response['audioContent']))
        print('Audio content written to file "output1.mp3"')
    audio_text1 = base64.b64decode(response['audioContent'])
    raw_text1 = response['audioContent']

    # Third request: the second sentence, spoken in code2.
    ssmlLine = '<speak>' + text2 + '</speak>'
    data2 = {}
    data2['input'] = {}
    data2['input']['ssml'] = ssmlLine
    data2['voice'] = {}
    data2['voice']['ssmlGender'] = 'MALE'
    data2['voice']['languageCode'] = code2  # e.g. 'ko-KR'
    data2['audioConfig'] = {}
    data2['audioConfig']['speakingRate'] = 0.8
    data2['audioConfig']['audioEncoding'] = 'MP3'

    request = collection.synthesize(body=data2)
    response = request.execute()

    with open('output2.mp3', 'wb') as out:
        out.write(base64.b64decode(response['audioContent']))
        print('Audio content written to file "output2.mp3"')
    audio_text2 = base64.b64decode(response['audioContent'])
    raw_text2 = response['audioContent']

    # Variant 1: concatenate the decoded MP3 bytes.
    result = audio_text1 + audio_pause + audio_text2
    with open('result.mp3', 'wb') as out:
        out.write(result)
        print('Audio content written to file "result.mp3"')

    # Variant 2: concatenate the base64 strings first, then decode once.
    raw_result = raw_text1 + raw_pause + raw_text2
    with open('raw_result.mp3', 'wb') as out:
        out.write(base64.b64decode(raw_result))
        print('Audio content written to file "raw_result.mp3"')
# [END tts_synthesize_text_file]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('--t1')
    parser.add_argument('--code1')
    parser.add_argument('--t2')
    parser.add_argument('--code2')
    args = parser.parse_args()
    synthesize_text_file(args.t1, args.t2, args.code1, args.code2)
You can find the answer here:
https://issuetracker.google.com/issues/120687867
Short answer: it's not clear why it doesn't work, but Google suggests a workaround: first write the files as .wav, combine them, and then re-encode the result to MP3.
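A minimal sketch of that workaround, assuming the two parts were requested with audioEncoding='LINEAR16' and saved as output1.wav and output2.wav (hypothetical filenames), and that pydub with ffmpeg is installed:
from pydub import AudioSegment

# Load the parts that were synthesized as WAV (audioEncoding='LINEAR16').
part1 = AudioSegment.from_wav('output1.wav')
part2 = AudioSegment.from_wav('output2.wav')

# Insert two seconds of silence instead of a synthesized pause.
combined = part1 + AudioSegment.silent(duration=2000) + part2

# Re-encode the combined audio as a single MP3.
combined.export('result.mp3', format='mp3')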
I managed to do this in NodeJS with just one function (I don't know how optimal it is, but at least it works). Maybe you can take inspiration from it.
I used the memory-streams dependency from npm.
var streams = require('memory-streams');

function mergeAudios(audios) {
    var reader = new streams.ReadableStream();
    var writer = new streams.WritableStream();
    audios.forEach(element => {
        if (element instanceof streams.ReadableStream) {
            element.pipe(writer);
        } else {
            writer.write(element);
        }
    });
    reader.append(writer.toBuffer());
    return reader;
}
The input parameter is a list whose elements are either ReadableStreams or response.audioContent values from the synthesizeSpeech operation. If an element is a ReadableStream, it is piped into the writer; if it is audio content, it is written with the write() method. At the end, all of the collected content is appended to a single ReadableStream, which is returned.
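For example (hypothetical variable names; it assumes fs is required and the audio contents come from two synthesizeSpeech responses):
var fs = require('fs');

// Merge the two audio contents and stream the result to a file.
var merged = mergeAudios([response1.audioContent, response2.audioContent]);
merged.pipe(fs.createWriteStream('result.mp3'));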
I followed softlayer-object-storage-python in order to return a list of my objects matching specific criteria.
This code seems to just return everything in my container, no matter what I put into the search:
sl_storage = object_storage.get_client(
    username=environment['slos_username'],
    password=environment['api_key'],
    auth_url=environment['auth_url']
)

# get container
sl_container = sl_storage[environment['object_container']]

# get list, the search function doesn't actually work...
containers = sl_container.search("icm10restapi-qa.zip.*")
I expected to get back only objects whose names start with icm10restapi-qa.zip.
I also tried using ^=icm10restapi-qa.zip, but no luck either.
Reviewing the method, it seems it is not possible to filter the objects as you would like:
https://github.com/softlayer/softlayer-object-storage-python/blob/master/object_storage/client.py#L147
API Operations for Search Services
My apologies for the inconvenience; I recommend filtering these in your code.
Updated
This script will help filter your objects, keeping those whose names start with a specific string:
import object_storage
import pprint
# Declare username, apikey and datacenter
USERNAME = 'set me'
API_KEY = 'set me'
DATACENTER = 'https://dal05.objectstorage.softlayer.net/auth/v1.0/'
# Creating object storage connection
sl_storage = object_storage.get_httplib2_client(USERNAME, API_KEY, auth_url=DATACENTER)
# Declare name to filter
name = 'icm10restapi-qa.zip'
# Filtering
containers = sl_storage.search(name)
for container in containers['results']:
    if container.__dict__['name'].startswith(name):
        print(container)