How to use the Reject file in Infobright version 4.0.6 GA - ICE

In the release notes of version 4.0.6 GA, we can find that the first new feature is "Infobright Loader now supports a Reject file".
But I cannot find anything more about this, such as how to use it and in which cases we might need it.
Please help.

To enable the reject file functionality, you must specify BH_REJECT_FILE_PATH and one of
the associated parameters (BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD).
For example, if you want to load data from the file DATAFILE.csv into table T but you expect
that up to 10 rows in this file might be wrongly formatted, you would run the following
commands:
set @BH_REJECT_FILE_PATH = '/tmp/reject_file';
set @BH_ABORT_ON_COUNT = 10;
load data infile 'DATAFILE.csv' into table T;
If fewer than 10 rows are rejected, a warning is output, the load succeeds, and all
problematic rows are written to the file /tmp/reject_file. If the Infobright Loader finds a
tenth bad row, the load terminates with an error and all bad rows found so far are
written to the file /tmp/reject_file.
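For completeness, here is a minimal sketch of driving the same load programmatically in Python, assuming the ICE server is reachable with a MySQL-compatible client such as pymysql; the connection details and the file path below are placeholders:

import pymysql  # Infobright ICE speaks the MySQL protocol, so a MySQL client works

# Placeholder connection details - adjust for your environment
conn = pymysql.connect(host="127.0.0.1", user="root", password="", database="test")
try:
    with conn.cursor() as cur:
        # Same session variables as in the commands above
        cur.execute("set @BH_REJECT_FILE_PATH = '/tmp/reject_file'")
        cur.execute("set @BH_ABORT_ON_COUNT = 10")
        # Server-side load: the CSV must be readable by the server process
        cur.execute("load data infile '/path/to/DATAFILE.csv' into table T")
    conn.commit()
finally:
    conn.close()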

Related

Is there a way to start reading where we stopped in CSV Data Set Config in JMeter?

I have created a test that reads the users from a CSV Data Set Config in JMeter.
For example, when I run a test, JMeter reads the first 20 users in the CSV file.
If I run the same test again, JMeter again reads the first 20 users in the CSV file.
But I want JMeter to read 20 users starting from the 21st user, and so on.
Is there a way to make this possible?
As per CSV Data Set Config documentation:
By default, the file is only opened once, and each thread will use a different line from the file. However the order in which lines are passed to threads depends on the order in which they execute, which may vary between iterations. Lines are read at the start of each test iteration. The file name and mode are resolved in the first iteration.
So there is no way to specify an "offset" for reading the file; the options are:
Use the __CSVRead() function, where you can call ${__CSVRead(/path/to/your/file.csv,next)} as many times as needed to "skip" the lines which have already been "used"
Use a setUp Thread Group and a JSR223 Sampler to remove the first 20 lines from the CSV file programmatically (a standalone sketch of the same idea follows this list)
Go for the Redis Data Set Config instead, which has a Recycle Data on Use option; if you set it to False, the "used" data will be removed
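To illustrate the second option, here is a minimal standalone sketch (not part of the original answer) that removes the first 20 already-used lines from the CSV before the next run; the file path and the line count are assumptions to adapt, and the same logic could be ported into a JSR223 Sampler:

# Placeholders - point these at your CSV and the number of users already consumed
USERS_CSV = "/path/to/your/file.csv"
USED_LINES = 20

# Read everything, then rewrite the file without the lines that were already used
with open(USERS_CSV, "r", encoding="utf-8") as f:
    lines = f.readlines()

with open(USERS_CSV, "w", encoding="utf-8") as f:
    f.writelines(lines[USED_LINES:])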

How to update the "last update" date in text files automatically according their modification date in OS?

There are several VHDL source files. All files have a header which gives the file name, creation date and description among other things. One of these things is the last update date. All files are version controlled in Git.
What happens is that the files are often modified, committed and pushed, but the last update date is not updated as often. This happens by mistake: so many different files are worked on at different times that one might forget to change the "last update" part of the file header when the file has actually been changed.
I want to automate this process and believe there are many different ways to do this.
A script of some sort must check the last update date in the text file header. Then, if it differs from the actual last-modified date available from the file system, the last update date in the text must be updated to that value. What would be the best way to do this? A Python script, a Bash script or something else?
Basically, I want to do this when the files are being committed into Git. It should ideally happen automatically, but running one line in the terminal to execute a script is not a big deal. The check is required on the files that are being committed and pushed.
I'm not a Python programmer, but I made a little script that will hopefully help you out. Maybe this fits your needs.
What the script does:
Get all files from the path (here c:\Python) which have the extension .vhdl
Loop over the files and extract the "Last update" date from the header (line 9 in my example) via regex
Get the last modified date of the file from the file system
If the last modified date is newer than the date in the file, update the file
import os
import re
import glob
import datetime

path = r"c:\Python"

# Collect all .vhdl files in the given directory
mylist = glob.glob(os.path.join(path, "*.vhdl"))
print(mylist)

for filepath in mylist:
    with open(filepath, 'r+') as f:
        content = f.read()
        # Grab the "Last update: YYYY-MM-DD" date from the header
        last_update = re.findall(r"Last\supdate:\s+(\d{4}-\d{2}-\d{2})", content)
        # File-system modification time, truncated to YYYY-MM-DD
        modified = os.path.getmtime(filepath)
        modified_readable = str(datetime.datetime.fromtimestamp(modified))[:10]
        if not last_update:
            print(filepath, 'NO "Last update" LINE FOUND')
            continue
        # ISO dates compare correctly as plain strings
        if modified_readable > last_update[0]:
            print(filepath, 'UPDATE')
            text = re.sub(last_update[0], modified_readable, content)
            f.seek(0)
            f.write(text)
            f.truncate()
        else:
            print(filepath, 'NO CHANGE')
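Since the question mentions doing this at commit time, one possible extension (an assumption on my part, e.g. for a Git pre-commit hook run from the repository root) is to restrict the check to the .vhdl files that are currently staged:

import subprocess

# Ask Git for the files staged for the current commit
staged = subprocess.check_output(
    ["git", "diff", "--cached", "--name-only"], text=True
).splitlines()

# Keep only the VHDL sources; these paths could replace the glob() result
# in the loop of the script above
vhdl_files = [name for name in staged if name.endswith(".vhdl")]
print(vhdl_files)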

How to configure/capture 'failureMessage' to the result file in Jmeter

I have a jmx script which saves the results to a CSV file.
I need to see the 'failureMessage' field in the CSV, especially when the 'success' column says 'false', as in the example below. But the failureMessage column always appears blank in the CSV.
Example -
timeStamp|time|label|responseCode|threadName|dataType|success|failureMessage
02/06/03 08:21:42|1187|Home|200|Thread Group-1|text|true|
02/06/03 08:21:42|47|Login|200|Thread Group-1|text|false|Test Failed: expected to contain: password etc.
I tried looking in the jmeter.properties file to check the setting below, which is set to true, but it still doesn't save the message to failureMessage in the CSV:
assertion_results_failure_message only affects CSV output
jmeter.save.saveservice.assertion_results_failure_message=true
I cannot reproduce your issue using:
Latest JMeter 5.2.1
With the default Results File Configuration
Running JMeter in command-line non-GUI mode
If you cannot see custom assertion failure messages your setup violates at least one of the above 3 points.
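As a quick way to verify that the column is populated (a sketch added for illustration, not part of the original answer; the results file name is a placeholder), you can scan the CSV results for failed samples and print their failure messages:

import csv

# Placeholder path to the JMeter results file
RESULTS_CSV = "results.jtl"

with open(RESULTS_CSV, newline="", encoding="utf-8") as f:
    # The sample in the question uses '|' as the delimiter; the stock default is ','
    reader = csv.DictReader(f, delimiter="|")
    for row in reader:
        if row.get("success") == "false":
            print(row.get("label"), "->", row.get("failureMessage", ""))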
Try adding assertions to your requests and you will find the failure message in your results whenever an assertion fails.

Split CSV file into records and save each as a CSV file - Apache NiFi

What I want to do is the following:
I want to divide the input file into records, convert each record into a
file and put all the files in a directory.
My .csv file has the following structure:
ERP,J,JACKSON,8388 SOUTH CALIFORNIA ST.,TUCSON,AZ,85708,267-3352,,ALLENTON,MI,48002,810,710-0470,369-98-6555,462-11-4610,1953-05-00,F,
ERP,FRANK,DIETSCH,5064 E METAIRIE AVE.,BRANDSVILLA,MO,65687,252-5592,1176 E THAYER ST.,COLUMBIA,MO,65215,557,291-9571,217-38-5525,129-10-0407,1/13/35,M,
As you can see, it doesn't have a header row.
Here is my flow.
My problem is that when the Split processor divides my CSV into flowfiles of 400 lines, the result isn't saved in my output directory.
It's my first time using NiFi, sorry.
Make sure your RecordReader controller service is configured correctly (delimiter, etc.) to read the incoming flowfile.
Set the Records Per Split value to 1.
You need to use an UpdateAttribute processor before the PutFile processor to change the filename to a unique value (like a UUID), unless you have configured the PutFile processor's Conflict Resolution Strategy as Ignore.
The reason for changing the filename is that the SplitRecord processor keeps the same filename for all the split flowfiles.
I tried your case and the flow worked as expected. Use this template for your reference, upload it to your NiFi instance, and make changes as per your requirements.

Internal error while loading to a BigQuery table

I ran this command to load 11 files to a Bigquery table:
bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part* /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
I got this error:
Waiting on bqjob_r46f38146351d545_00000147ef890755_1 ... (11s) Current status: DONE
BigQuery error in load operation: Error processing job 'ardent-course-601:bqjob_r46f38146351d545_00000147ef890755_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 5: Unexpected. Please try again.
I tried many times after that and still got the same error.
To debug what went wrong, I instead loaded each file one by one into the BigQuery table. For example:
/usr/local/bin/bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-00011.gz /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
There are 11 files in total and each loaded fine.
Could someone please help? Is this a bug on the BigQuery side?
Thank you.
There was an error reading one of the files: gs://...part-m-00005.gz
Looking at the import logs, it appears that the gzip reader encountered an error decompressing the file.
It looks like that file may not actually be compressed. BigQuery samples the header of the first file in the list to determine whether it is dealing with compressed or uncompressed files and to determine the compression type. When you import all of the files at once, it only samples the first file.
When you run the files individually, BigQuery reads the header of each file and determines that it isn't actually compressed (despite having the '.gz' suffix), so it imports it as a normal flat file.
If you run a load that doesn't mix compressed and uncompressed files, it should work successfully.
Please let me know if you think this is not the case and I'll dig in some more.
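If you want to check which of the part files are genuinely gzip-compressed before reloading (a sketch under the assumption that the gs:// objects have been copied locally, e.g. with gsutil cp), you can look at the first two bytes for the gzip magic number:

import glob

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

# Assumes local copies of the objects, e.g. fetched with: gsutil cp "gs://.../part-m-*.gz" .
for name in sorted(glob.glob("part-m-*.gz")):
    with open(name, "rb") as f:
        head = f.read(2)
    status = "gzip" if head == GZIP_MAGIC else "NOT gzip (plain text despite the .gz suffix?)"
    print(name, "->", status)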