Create a Splunk alert from a log file when a file named hello.imp is below 10 bytes

I'm trying to write a Splunk query that searches a log file for a file called hello.imp and returns output if the file size is below 10 bytes. I have the index and log location but can't work out the exact query. Please help me write the query and create an alert from it.

You can get the size of a source file by adding up the size of each event within that file, like this:
index=foo source=bar
| eval size=len(_raw)
| stats sum(size) as TotalSize
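Applied to the original question, a minimal sketch might look like this (the index name and the source pattern matching hello.imp are assumptions; adjust them to your environment). The final where clause returns results only when the total is under 10 bytes, so the alert can simply trigger whenever the search returns any results:
index=your_index source="*hello.imp"
| eval size=len(_raw)
| stats sum(size) as TotalSize
| where TotalSize < 10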

Related

Log file size calculated using len(_raw) in Splunk does not match even close to the actual file size on the host?

I am using a Splunk query to calculate the size of log files sent to Splunk. This is the Splunk query I have used:
index="<my_index>" path="/<my_path>/<my_log_file>"
| eval raw_len=len(_raw)
| eval raw_len_kb = raw_len/1024
| eval raw_len_mb = raw_len/1024/1024
| eval raw_len_gb = raw_len/1024/1024/1024
| stats sum(raw_len) as Bytes sum(raw_len_kb) as KB sum(raw_len_mb) as MB sum(raw_len_gb) as GB by source
| addcoltotals
Splunk reports the size as 17 GB. On the other hand, when I do this on the Unix host:
ls -l /<my_path>/<my_log_file>
the value is just a few MB.
Any idea why there is so much difference?
One should not expect the size of data indexed in Splunk to exactly match the size reported by an OS. This is because Splunk by default strips line endings and because the len function counts characters rather than bytes.
Also, the query shown does not account for multiple hosts sending data to Splunk. No time window is indicated, so we don't know whether the file was truncated at some point while Splunk still retains all of the data the file ever held.
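As a rough sketch of a closer approximation, assuming single-byte characters and one stripped newline per event, you could add one byte per event and also group by host (field names follow the query above):
index="<my_index>" path="/<my_path>/<my_log_file>"
| eval raw_len=len(_raw)+1
| stats sum(raw_len) as ApproxBytes by host, source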

Add file name and timestamp into each record in BigQuery using Dataflow

I have a few .txt files with JSON data to be loaded into a Google BigQuery table. Along with the columns in the text files, I need to insert the filename and the current timestamp for each row. This runs on GCP Dataflow with Python 3.7.
I accessed the FileMetadata containing the file path and size using GCSFileSystem.match and metadata_list.
I believe I need to get the pipeline code to run in a loop, pass the filepath to ReadFromText, and call a FileNameReadFunction ParDo.
(p
| "read from file" >> ReadFromText(known_args.input)
| "parse" >> beam.Map(json.loads)
| "Add FileName" >> beam.ParDo(AddFilenamesFn(), GCSFilePath)
| "WriteToBigQuery" >> beam.io.WriteToBigQuery(known_args.output,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
)
I followed the steps in Dataflow/apache beam - how to access current filename when passing in pattern? but I can't make it quite work.
Any help is appreciated.
You can use textio.ReadFromTextWithFilename instead of ReadFromText. That will produce a PCollection of (filename,line) tuples.
To include the file and timestamp in your output JSON record, you could change your "parse" step to something like:
| "parse" >> beam.Map(lambda kv: {  # kv is a (filename, line) tuple
    **json.loads(kv[1]),
    "filename": kv[0],
    "timestamp": datetime.now().isoformat()})  # needs: from datetime import datetime

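Putting it together, a minimal self-contained sketch of the pipeline might look like the following (known_args, the pipeline options, and the helper name add_metadata are assumptions based on the question; adjust them to your setup):
import json
from datetime import datetime, timezone

import apache_beam as beam
from apache_beam.io.textio import ReadFromTextWithFilename

def add_metadata(element):
    # element is a (filename, line) tuple emitted by ReadFromTextWithFilename
    filename, line = element
    record = json.loads(line)
    record["filename"] = filename
    record["timestamp"] = datetime.now(timezone.utc).isoformat()
    return record

with beam.Pipeline(options=pipeline_options) as p:
    (p
     | "read from file" >> ReadFromTextWithFilename(known_args.input)
     | "parse" >> beam.Map(add_metadata)
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
           known_args.output,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
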
How to loop over CSV file and write each line in Write File activity?

I am using TIBCO BW 6.5 Designer and I am trying to read a large CSV file (with ; as the separator). Below is some sample data from my CSV file:
ORDER_NUMBER;CODE_NUMBER
A;014 53758
B;015 73495
C;016 67569
D;017 59390
I am trying to start reading from the 2nd line, i.e. "A;014 53758".
I am using a "ParseData" activity placed inside a "Repeat" group, as shown in the image below:
The configuration of my "Repeat" group is below:
The configuration of my "ParseData" is:
In my WriteFile activity I have checked the "append" box, and I am writing as 'Text' to my file. The textContent for my WriteFile is:
concat($ParseData/Rows/Updates[$index]/ORDER_NUMBER, $ParseData/Rows/Updates[$index]/CODE_NUMBER , '&crlf;')
But when I run my project, the Write File activity only writes the first row and all the remaining rows are blank.
Can anybody please help me rectify what I am doing wrong?
Thanks,
Rudra
Try this:
ParseData activity input: startRecord should be 1 instead of $index + 1
WriteFile activity input: concat($ParseData/Rows/Updates[1]/ORDER_NUMBER, $ParseData/Rows/Updates[1]/CODE_NUMBER, '&crlf;') (1 instead of $index)
You can also uncheck the accumulate option in the Repeat loop.

Uploading job fails on the same file that was uploaded successfully before

I'm running a regular upload job to load CSV into BigQuery. The job runs every hour. According to a recent failure log, it says:
Error: [REASON] invalid [MESSAGE] Invalid argument: service.geotab.com [LOCATION] File: 0 / Offset:268436098 / Line:218637 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I went to line 218638 (the original CSV has a header line, so I assume 218638 should be the actual failed line; let me know if I'm wrong) but it seems all right. I checked the corresponding table in BigQuery, and it has that line too, which means I actually uploaded this line successfully before.
Then why does it cause failures recently?
project id: red-road-574
Job ID: Job_Upload-7EDCB180-2A2E-492B-9143-BEFFB36E5BB5
This indicates that there was a problem with the data in your file, where it didn't match the schema.
The error message says it occurred at File: 0 / Offset:268436098 / Line:218637 / Field:2. This means the first file (it looks like you just had one), and then the chunk of the file starting at 268436098 bytes from the beginning of the file, then the 218637th line from that file offset.
The reason for the offset portion is that BigQuery processes large files in parallel across multiple workers. Each file worker starts at an offset from the beginning of the file. The offset we include is the offset that the worker started from.
From the rest of the error message, it looks like the string service.geotab.com showed up in the second field, but the second field was a number, and service.geotab.com isn't a valid number. Perhaps there was a stray newline?
You can see what the lines looked like around the error by doing:
cat <yourfile> | tail -c +268436098 | tail -n +218636 | head -3
This will print out three lines... the one before the error (since I used -n +218636 instead of +218637), the one that had the error, and the next line as well.
Note that if this is just one line in the file that has a problem, you may be able to work around the issue by specifying maxBadRecords.
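For example, with the bq command-line tool this corresponds to the --max_bad_records flag (the dataset, table, and file names below are illustrative):
bq load --source_format=CSV --max_bad_records=1 mydataset.mytable gs://mybucket/myfile.csv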

Google BigQuery - bq load failure displays the file number; how can I get the file name?

I'm running the following bq command
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=1000 --replace raw_data.order_20150131 gs://raw-data/order/order/2050131/* order.json
and I am getting the following message when loading data into BigQuery:
*************************************
Waiting on bqjob_r4ca10491_0000014ce70963aa_1 ... (412s) Current status: DONE
BigQuery error in load operation: Error processing job
'orders:bqjob_r4ca10491_0000014ce70963aa_1': Too few columns: expected
11 column(s) but got 1 column(s). For additional help: http://goo.gl/RWuPQ
Failure details:
- File: 844 / Line:1: Too few columns: expected 11 column(s) but got
1 column(s). For additional help: http://goo.gl/RWuPQ
**********************************
The message displays only the file number.
I checked the files' content and most of them are good.
gsutil ls and the Cloud Console, on the other hand, display file names.
How can I know which file it is according to the file number?
There seems to be some weird spacing introduced in the question, but if the desired path to ingest is "*/order.json", that won't work: you can only use "*" at the end of the path when ingesting data into BigQuery.
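To illustrate that rule using the command from the question, the wildcard has to be the trailing component of the URI, with order.json supplied separately as the schema file (quoting the URI also avoids the shell expanding the * and introducing the spacing seen above):
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=1000 --replace raw_data.order_20150131 "gs://raw-data/order/order/2050131/*" order.json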