Unable to write a DynamicFrame without a header

Hello, I am converting a Parquet file to CSV and want to write it without a header. I have followed this document.
Below is my code:
datasink3 = applymapping1.coalesce(1)
datasink2 = glueContext.write_dynamic_frame.from_options(
    frame=datasink3,
    connection_type="s3",
    connection_options={"path": "s3://saphana12/output1"},
    format="csv",
    format_options={"writeHeader": 'false', "quoteChar": '-1', "separator": '|'},
    transformation_ctx="datasink2")
I see that the headers are still appearing.
Am I doing something wrong?
Kindly help me out with this.

Use format_options = {"writeHeader": False, "quoteChar": '-1', "separator": '|'}.
Instead of passing the string 'false', pass the boolean False, since the option is a boolean.
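The pitfall is that any non-empty string, including 'false', is truthy. As a sanity check outside of Glue, the corrected options can be sketched as a plain dictionary; the bucket path and variable names are copied from the question, and the Glue call itself is shown commented out because it only runs inside a Glue job:

```python
# Corrected format_options for glueContext.write_dynamic_frame.from_options.
# The key change: writeHeader must be the boolean False, not the string 'false'.
format_options = {
    "writeHeader": False,  # boolean; suppresses the header row
    "quoteChar": "-1",     # -1 disables quoting
    "separator": "|",
}

# Inside a Glue job the full call would look like:
# datasink2 = glueContext.write_dynamic_frame.from_options(
#     frame=datasink3,
#     connection_type="s3",
#     connection_options={"path": "s3://saphana12/output1"},
#     format="csv",
#     format_options=format_options,
#     transformation_ctx="datasink2",
# )

# Demonstrate why the string version fails: 'false' is truthy.
print(bool('false'))  # → True
print(bool(False))    # → False
```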

Related

Flume not writing correctly to Amazon S3 (weird characters)

My flume config:
agent.sinks = s3hdfs
agent.sources = MySpooler
agent.channels = channel
agent.sinks.s3hdfs.type = hdfs
agent.sinks.s3hdfs.hdfs.path = s3a://mybucket/test
agent.sinks.s3hdfs.hdfs.filePrefix = FilePrefix
agent.sinks.s3hdfs.channel = channel
agent.sinks.s3hdfs.hdfs.useLocalTimeStamp = true
agent.sources.MySpooler.channels = channel
agent.sources.MySpooler.type = spooldir
agent.sources.MySpooler.spoolDir = /flume_to_aws
agent.sources.MySpooler.fileHeader = true
agent.channels.channel.type = memory
agent.channels.channel.capacity = 100
Now I add a file to the /flume_to_aws folder with the following text content:
Oracle and SQL Server
After it was uploaded to S3, I downloaded and opened the file, and it showed the following:
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable
Œúg ÊC•ý¤ïM·T.C ! †"û­þ Oracle and SQL ServerÿÿÿÿŒúg ÊC•ý¤ïM·T.C
Why is the file not uploaded with only the text "Oracle and SQL Server"?
Problem solved. I found this question on Stack Overflow here.
Flume was generating files in binary (SequenceFile) format instead of text format.
So I added the following lines:
agent.sinks.s3hdfs.hdfs.writeFormat = Text
agent.sinks.s3hdfs.hdfs.fileType = DataStream
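For reference, a consolidated sink section with the two fixes applied might look like this (same agent, sink, and path names as in the original config):

```
agent.sinks.s3hdfs.type = hdfs
agent.sinks.s3hdfs.hdfs.path = s3a://mybucket/test
agent.sinks.s3hdfs.hdfs.filePrefix = FilePrefix
agent.sinks.s3hdfs.channel = channel
agent.sinks.s3hdfs.hdfs.useLocalTimeStamp = true
agent.sinks.s3hdfs.hdfs.writeFormat = Text
agent.sinks.s3hdfs.hdfs.fileType = DataStream
```

Without these settings, the HDFS sink defaults to SequenceFile output, which is what produced the SEQ!org.apache.hadoop.io.LongWritable header and the binary bytes around the text.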

Read a table with a specified file format

I have to read a file using this file format:
CREATE OR REPLACE FILE FORMAT FF_CSV
TYPE = CSV
COMPRESSION = GZIP
RECORD_DELIMITER = '\n'
FIELD_DELIMITER = 'µµµ'
FILE_EXTENSION = 'csv'
SKIP_HEADER = 0
SKIP_BLANK_LINES = TRUE
DATE_FORMAT = AUTO
TIME_FORMAT = AUTO
TIMESTAMP_FORMAT = AUTO
BINARY_FORMAT = UTF8
ESCAPE = NONE --may need to set to '<character>'
ESCAPE_UNENCLOSED_FIELD = NONE --may need to set to '<character>'
TRIM_SPACE = TRUE
FIELD_OPTIONALLY_ENCLOSED_BY = '"' --may need to set to '<character>'
NULL_IF = '' --( '<string>' [ , '<string>' ... ] )
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
REPLACE_INVALID_CHARACTERS = TRUE
VALIDATE_UTF8 = TRUE
EMPTY_FIELD_AS_NULL = TRUE
SKIP_BYTE_ORDER_MARK = FALSE
ENCODING = UTF8
;
For now I just want to test whether this definition works correctly with my file. However, I am unsure how to test this. This is how I upload the file to my Snowflake stage:
put file:///Users/myname/Desktop/leg.csv @~;
Now, how can I use FILE FORMAT FF_CSV in a SELECT statement so that I can read my uploaded file using that format?
You would use a COPY INTO statement to pull the data from the stage into a table.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
I would also recommend checking out some of these pages about loading data into Snowflake:
https://docs.snowflake.com/en/user-guide-data-load.html
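A minimal sketch of both options, assuming a target table named LEG already exists with columns matching the file, and that PUT's default AUTO_COMPRESS gzipped the upload to leg.csv.gz (which matches the format's COMPRESSION = GZIP); the table name and column count are hypothetical:

```sql
-- Load the staged file into a table using the named file format.
COPY INTO LEG
FROM @~/leg.csv.gz
FILE_FORMAT = (FORMAT_NAME = 'FF_CSV');

-- Or preview the staged file without loading it: a SELECT over the
-- stage can reference the named file format directly.
SELECT $1, $2, $3
FROM @~/leg.csv.gz (FILE_FORMAT => 'FF_CSV');
```

The second query is the quickest way to test whether the format definition parses your file as expected, since it reads the staged file in place.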

Combine two TTS outputs in a single mp3 file not working

I want to combine two requests to the Google cloud text-to-speech API in a single mp3 output. The reason I need to combine two requests is that the output should contain two different languages.
The code below works fine for many language-pair combinations, but unfortunately not for all. If I request e.g. a sentence in English and one in German and combine them, everything works. If I request one in English and one in Japanese, I can't combine the two files into a single output. The output contains only the first sentence and, instead of the second sentence, silence.
I tried now multiple ways to combine the two outputs but the result stays the same. The code below should show the issue.
Please run the code first with:
python synthesize_bug.py --t1 'Hallo' --code1 de-De --t2 'August' --code2 de-De
This works perfectly.
python synthesize_bug.py --t1 'Hallo' --code1 de-De --t2 'こんにちは' --code2 ja-JP
This doesn't work. The single files are OK, but the combined files contain silence instead of the Japanese part.
Also, when used with two Japanese sentences, everything works.
I already filed a bug report with Google with no response yet, but maybe it's just me making a wrong encoding assumption here. I hope someone has an idea.
#!/usr/bin/env python
import argparse


# [START tts_synthesize_text_file]
def synthesize_text_file(text1, text2, code1, code2):
    """Synthesizes speech from the input file of text."""
    from apiclient.discovery import build
    import base64

    service = build('texttospeech', 'v1beta1')
    collection = service.text()

    # Synthesize a two-second pause.
    data1 = {}
    data1['input'] = {}
    data1['input']['ssml'] = '<speak><break time="2s"/></speak>'
    data1['voice'] = {}
    data1['voice']['ssmlGender'] = 'FEMALE'
    data1['voice']['languageCode'] = code1
    data1['audioConfig'] = {}
    data1['audioConfig']['speakingRate'] = 0.8
    data1['audioConfig']['audioEncoding'] = 'MP3'

    request = collection.synthesize(body=data1)
    response = request.execute()
    audio_pause = base64.b64decode(response['audioContent'].decode('UTF-8'))
    raw_pause = response['audioContent']

    # Synthesize the first sentence.
    ssmlLine = '<speak>' + text1 + '</speak>'
    data1 = {}
    data1['input'] = {}
    data1['input']['ssml'] = ssmlLine
    data1['voice'] = {}
    data1['voice']['ssmlGender'] = 'FEMALE'
    data1['voice']['languageCode'] = code1
    data1['audioConfig'] = {}
    data1['audioConfig']['speakingRate'] = 0.8
    data1['audioConfig']['audioEncoding'] = 'MP3'

    request = collection.synthesize(body=data1)
    response = request.execute()

    # The response's audio_content is binary.
    with open('output1.mp3', 'wb') as out:
        out.write(base64.b64decode(response['audioContent'].decode('UTF-8')))
        print('Audio content written to file "output1.mp3"')

    audio_text1 = base64.b64decode(response['audioContent'].decode('UTF-8'))
    raw_text1 = response['audioContent']

    # Synthesize the second sentence.
    ssmlLine = '<speak>' + text2 + '</speak>'
    data2 = {}
    data2['input'] = {}
    data2['input']['ssml'] = ssmlLine
    data2['voice'] = {}
    data2['voice']['ssmlGender'] = 'MALE'
    data2['voice']['languageCode'] = code2  # 'ko-KR'
    data2['audioConfig'] = {}
    data2['audioConfig']['speakingRate'] = 0.8
    data2['audioConfig']['audioEncoding'] = 'MP3'

    request = collection.synthesize(body=data2)
    response = request.execute()

    # The response's audio_content is binary.
    with open('output2.mp3', 'wb') as out:
        out.write(base64.b64decode(response['audioContent'].decode('UTF-8')))
        print('Audio content written to file "output2.mp3"')

    audio_text2 = base64.b64decode(response['audioContent'].decode('UTF-8'))
    raw_text2 = response['audioContent']

    # Concatenate the decoded MP3 payloads.
    result = audio_text1 + audio_pause + audio_text2
    with open('result.mp3', 'wb') as out:
        out.write(result)
        print('Audio content written to file "result.mp3"')

    # Concatenate the base64 payloads, then decode once.
    raw_result = raw_text1 + raw_pause + raw_text2
    with open('raw_result.mp3', 'wb') as out:
        out.write(base64.b64decode(raw_result.decode('UTF-8')))
        print('Audio content written to file "raw_result.mp3"')
# [END tts_synthesize_text_file]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('--t1')
    parser.add_argument('--code1')
    parser.add_argument('--t2')
    parser.add_argument('--code2')
    args = parser.parse_args()

    synthesize_text_file(args.t1, args.t2, args.code1, args.code2)
You can find the answer here:
https://issuetracker.google.com/issues/120687867
Short answer: it's not clear why it isn't working, but Google suggests a workaround: first write the files as .wav, combine them, and then re-encode the result to MP3.
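Following that workaround, the combine step is straightforward because WAV is uncompressed. Below is a minimal sketch using only the standard-library wave module; it assumes the TTS requests are re-issued with WAV (LINEAR16) output and that all inputs share the same sample rate, sample width, and channel count. Re-encoding the combined result to MP3 (e.g. with ffmpeg) is a separate step.

```python
import wave


def concat_wavs(in_paths, out_path):
    """Concatenate WAV files that share identical audio parameters."""
    params = None
    frames = []
    for path in in_paths:
        with wave.open(path, 'rb') as w:
            if params is None:
                # Take sample rate, width, and channels from the first file.
                params = w.getparams()
            frames.append(w.readframes(w.getnframes()))
    with wave.open(out_path, 'wb') as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)
```

Unlike naive MP3 byte concatenation, this produces a single valid audio stream regardless of how each synthesis request was encoded internally, which is why the workaround sidesteps the silent-second-sentence problem.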
I managed to do this in NodeJS with just one function (I don't know how optimal it is, but at least it works). Maybe you can take inspiration from it.
I used the memory-streams dependency from npm:
var streams = require('memory-streams');

function mergeAudios(audios) {
    var reader = new streams.ReadableStream();
    var writer = new streams.WritableStream();
    audios.forEach(element => {
        if (element instanceof streams.ReadableStream) {
            element.pipe(writer);
        } else {
            writer.write(element);
        }
    });
    reader.append(writer.toBuffer());
    return reader;
}
The input parameter is a list containing either ReadableStream objects or response.audioContent from the synthesizeSpeech operation. If an element is a ReadableStream, it is piped into the writer; if it is audio content, the write method is used. At the end, all content is appended into a single ReadableStream.

Unable to extract JSON fields in Splunk

I'm trying to extract JSON fields from syslog inputs.
In ./etc/system/default/props.conf I've added the following lines:
[mylogtype]
SEDCMD-StripHeader = s/^[^{]+//
INDEXED_EXTRACTIONS = json
KV_MODE = none
pulldown_type = true
The SEDCMD works; the syslog headers are removed.
But the JSON fields are not parsed.
Any ideas?
Resolved. Use the following configuration in props.conf:
[yourlogtype]
SEDCMD-StripHeader = s/^[^{]+//
KV_MODE = json
pulldown_type = true
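As a side note, the SEDCMD is an ordinary sed-style substitution, so its effect can be sanity-checked outside Splunk with a regex; the sample event below is hypothetical:

```python
import re

# s/^[^{]+// : strip everything before the first '{' (the syslog header),
# leaving only the JSON payload for KV_MODE = json to parse at search time.
event = 'Jan 01 00:00:00 host app[123]: {"level": "info", "msg": "ok"}'
stripped = re.sub(r'^[^{]+', '', event)
print(stripped)  # → {"level": "info", "msg": "ok"}
```

The original attempt failed because INDEXED_EXTRACTIONS = json runs at index time on the raw event, before the SEDCMD has removed the non-JSON header; KV_MODE = json instead parses the already-stripped event at search time.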

How to access project name from a query of type portfolioitem

I am trying to match the project name in my query and also to print the name of the project associated with each feature record. I know there are plenty of answers, but I couldn't find anything that helped me. I am trying to do something like this:
pi_query.type = "portfolioitem"
pi_query.fetch="Name,FormattedID,Owner,c_ScopingTeam,c_AspirationalRelease,c_AssignedProgram,Tags"
#To be configured as per requirement
pi_query.project_scope_up = false
pi_query.project_scope_down = false
pi_query.order = "FormattedID Asc"
pi_query.query_string = "(Project.Name = \"Uni - Serviceability\")"
pi_results = @rally.find(pi_query)
I am trying to match the project name, but it simply doesn't work. I also tried printing the name of the project; I tried Project.Name, Project.Values, and simply Project, but none of them work. I am guessing it is because my query type is "portfolioitem", and I can't change the type because I am getting all the other attribute values correctly.
Thanks.
Make sure to fetch Project, e.g.: feature_query.fetch = "Name,FormattedID,Project"
and this should work:
feature_query.query_string = "(Project.Name = \"My Project\")"
Here is an example where a feature is found by project name.
require 'rally_api'
#Setup custom app information
headers = RallyAPI::CustomHttpHeader.new()
headers.name = "create story in one project, add it to a feature from another project"
headers.vendor = "Nick M RallyLab"
headers.version = "1.0"
# Connection to Rally
config = {:base_url => "https://rally1.rallydev.com/slm"}
config[:username] = "user@co.com"
config[:password] = "secret"
config[:workspace] = "W"
config[:project] = "Product1"
config[:headers] = headers #from RallyAPI::CustomHttpHeader.new()
@rally = RallyAPI::RallyRestJson.new(config)
obj = {}
obj["Name"] = "new story xyz123"
new_s = @rally.create("hierarchicalrequirement", obj)
query = RallyAPI::RallyQuery.new()
query.type = "portfolioitem"
query.fetch = "Name,FormattedID,Project"
query.workspace = {"_ref" => "https://rally1.rallydev.com/slm/webservice/v2.0/workspace/12352608129" }
query.query_string = "(Project.Name = \"Team Group 1\")"
result = @rally.find(query)
feature = result.first
puts feature
field_updates={"PortfolioItem" => feature}
new_s.update(field_updates)