My application writes log data to a file on disk. Each log entry is a single-line JSON object, as below. I use the Splunk forwarder to send the log to the Splunk indexer:
{"line":{"level": "info","message": "data is correct","timestamp": "2017-08-01T11:35:30.375Z"},"source": "std"}
I want to send only the sub-JSON object {"level": "info","message": "data is correct","timestamp": "2017-08-01T11:35:30.375Z"} to the Splunk indexer, not the whole JSON. How should I configure the Splunk forwarder or the Splunk indexer?
You can use SEDCMD to delete the wrapper data before it gets written to disk by the indexer(s).
Add this to your props.conf:
[Yoursourcetype]
#...Other configurations...
# keep only the captured inner object; the pattern is keyed to the wrapper format shown in the question
SEDCMD-removejson = s/^\{"line"\s*:\s*(\{.*\})\s*,\s*"source"\s*:\s*"[^"]*"\}$/\1/g
This is an index-time setting, so you will need to restart splunkd for the change to take effect.
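For reference, applied to the sample event above, the intent is roughly this transformation (a sketch based on the SEDCMD sketched above; adjust the pattern if your wrapper keys differ):

# raw event written by the application
{"line":{"level": "info","message": "data is correct","timestamp": "2017-08-01T11:35:30.375Z"},"source": "std"}
# what gets indexed after the SEDCMD strips the wrapper
{"level": "info","message": "data is correct","timestamp": "2017-08-01T11:35:30.375Z"}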
I have installed the Splunk UF on Windows. I have one static log file (JSON) on the system that needs to be monitored. I have configured this in the inputs.conf file.
I see only the System/Application and Security logs being sent to the indexer, whereas the static log file is not.
I ran "splunk list inputstatus" and checked:
C:\Users\Administrator\Downloads\test\test.json
file position = 75256
file size = 75256
percent = 100.00
type = finished reading
So this means the file is being read properly.
What can be the issue that I don't see the test.json logs on the Splunk side? I tried checking index=_internal on the indexer but was not able to figure out what is causing the issue; I checked a few blogs on the Internet as well. Can anyone please help with this?
inputs.conf stanza:
[monitor://C:\Users\Administrator\Downloads\data test\test.json]
disabled = 0
index = test_index
sourcetype = test_data
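For reference, searches along these lines can confirm whether anything from test.json reached the indexer and what splunkd logged about the file (the index and sourcetype values come from the stanza above; the component names are an assumption and may differ by Splunk version):

index=test_index sourcetype=test_data source="*test.json"
index=_internal sourcetype=splunkd (component=WatchedFile OR component=TailReader) "test.json"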
I am reading different logs from the same source folder, but not all the files are getting read: one stanza works, the other doesn't.
If I restart the UF, all stanzas work, but after that, changed data is not captured by one of the stanzas.
These are the files I am planning to monitor:
performance_data.log
performance_data.log.1
performance_data.log.2
performance_data.log.3
performance.log
performance.log.1
performance.log.2
SystemOut.log
My inputs.conf file:
[default]
host = LOCALHOST
[monitor://E:\Data\AppServer\A1\performance_data.lo*]
source=applogs
sourcetype=data_log
index=my_apps
[monitor://E:\Data\AppServer\A1\performance.lo*]
source=applogs
sourcetype=perf_log
index=my_apps
[monitor://E:\Data\logs\ImpaCT_A1\SystemOu*]
source=applogs
sourcetype=systemout_log
index=my_apps
The \performance_data.lo* and \SystemOu* stanzas are working fine, but the performance.lo* stanza is not: it only sends data when I restart the UF (universal forwarder); changes are not picked up automatically the way they are for the other stanzas.
Am I doing anything wrong here?
It may be that the forwarder's throughput limit is being exceeded, so the forwarder is unable to send the data to Splunk in time.
Try adding crcSalt to the stanza in inputs.conf as below (it salts the file checksum with the full source path, so rotated files that start with identical content are tracked as separate files), and create a limits.conf under the local path to lift the forwarder's default thruput cap:
inputs.conf
[monitor://E:\Data\AppServer\A1\performance.lo*]
source=applogs
sourcetype=perf_log
index=my_apps
crcSalt = <SOURCE>
limits.conf
[thruput]
maxKBps = 0
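Before lifting the cap, a search along these lines against the internal index should show whether the forwarder was actually throttling (the host value is a placeholder, and the ThruputProcessor component name is an assumption that may vary by version):

index=_internal sourcetype=splunkd host=YOUR_FORWARDER_HOST component=ThruputProcessor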
I am copying data from a JSON script into my SQL DW. I am using the Copy Activity from Data Factory v2, where my datasets are an HTTP source and a SQL DW sink.
Everything is working fine until I set up a URL parameter for my HTTP linked service connection, which I then want to feed from a Lookup activity.
Is it possible to parameterize LinkedServices?
The ‘Linked Service parameterization’ feature is currently under preview.
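Once the preview reaches you, a parameterized linked service definition looks roughly like the sketch below (the linked service name, the baseUrl parameter, and the HttpServer property layout are illustrative assumptions rather than the documented schema for every connector; parameterization initially covers only a subset of connector types, so check whether the HTTP connector is included):

{
    "name": "HttpLinkedService",
    "properties": {
        "type": "HttpServer",
        "parameters": {
            "baseUrl": { "type": "String" }
        },
        "typeProperties": {
            "url": "@{linkedService().baseUrl}",
            "authenticationType": "Anonymous"
        }
    }
}

The value returned by the Lookup activity can then be passed down through the dataset to this parameter at runtime.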
I want to write a test for my Spark Streaming application that consumes a Flume source.
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/ suggests using ManualClock, but for the moment, reading a file and verifying the outputs would be enough for me.
So I wish to use:
JavaStreamingContext streamingContext = ...
JavaDStream<String> stream = streamingContext.textFileStream(dataDirectory);
stream.print();
streamingContext.awaitTermination();
streamingContext.start();
Unfortunately it does not print anything.
I tried:
dataDirectory = "hdfs://node:port/absolute/path/on/hdfs/"
dataDirectory = "file://C:\\absolute\\path\\on\\windows\\"
adding the text file in the directory BEFORE the program begins
adding the text file in the directory WHILE the program run
Nothing works.
Any suggestion to read from text file?
Thanks,
Martin
The order of start() and awaitTermination() is indeed inverted.
In addition to that, the easiest way to pass data to your Spark Streaming application for testing is a QueueDStream. It's a mutable queue of RDDs of arbitrary data, which means you can create the data programmatically or load it from disk into an RDD and pass that to your Spark Streaming code.
E.g., to avoid the timing issues faced with the file consumer, you could try this:
import scala.collection.mutable.Queue
import org.apache.spark.rdd.RDD

// load the test data from disk into an RDD and wrap it in a mutable queue
val rdd = sparkContext.textFile(...)
val rddQueue: Queue[RDD[String]] = Queue()
rddQueue += rdd
// each queued RDD is served as one batch of the resulting DStream
val dstream = streamingContext.queueStream(rddQueue)
doMyStuffWithDstream(dstream)
streamingContext.start()
streamingContext.awaitTermination()
I am so stupid: I inverted the calls to start() and awaitTermination().
If you want to do the same, you should read from HDFS and add the file WHILE the program is running.
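For completeness, here is the corrected version of the snippet from the question (same names as above; only the last two calls are swapped, since the context must be started before awaiting termination):

JavaStreamingContext streamingContext = ...
JavaDStream<String> stream = streamingContext.textFileStream(dataDirectory);
stream.print();
streamingContext.start();            // start the streaming computation first
streamingContext.awaitTermination(); // then block until it is stopped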
I have data in Spark which I want to save to S3. The recommended method is to save it using the saveAsTextFile method on the RDD, which succeeds. I expect that the data will be saved as 'parts'.
My problem is that when I go to S3 to look at my data, it has been saved in a folder named _temporary, with a subfolder 0, and then each part or task saved in its own folder.
For example,
data.saveAsTextFile("s3://kirk/data");
results in files like
s3://kirk/data/_SUCCESS
s3://kirk/data/_temporary/0/_temporary_$folder$
s3://kirk/data/_temporary/0/task_201411291454_0001_m_00000_$folder$
s3://kirk/data/_temporary/0/task_201411291454_0001_m_00000/part-00000
s3://kirk/data/_temporary/0/task_201411291454_0001_m_00001_$folder$
s3://kirk/data/_temporary/0/task_201411291454_0001_m_00001/part-00001
and so on. I would expect, and have seen before, something like:
s3://kirk/data/_SUCCESS
s3://kirk/data/part-00000
s3://kirk/data/part-00001
Is this a configuration setting, or do I need to 'commit' the save to resolve the temporary files?
I had the same problem with Spark Streaming; it was because my Spark master was set up with conf.setMaster("local") instead of conf.setMaster("local[*]").
Without the [*], Spark has only a single local thread, so it can't execute saveAsTextFile while the stream is running.
Try using coalesce() to reduce the RDD to 1 partition before you export.
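For example, a minimal sketch reusing the RDD name from the question (coalesce(1) funnels all the data through a single task, so it only makes sense for small outputs):

// write a single part file instead of one file per partition
data.coalesce(1).saveAsTextFile("s3://kirk/data");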
Good luck!