how to get a binary schema from a text schema file? - flatbuffers

I have a flatbuffer text schema se.fbs file.
How can I get a corresponding binary schema file?
This command does not work: flatc --cpp se.fbs --schema
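For reference, flatc's --binary mode combined with --schema serializes the schema itself instead of generating C++ code; a minimal sketch, assuming the se.fbs from the question:
flatc --binary --schema se.fbs
This should produce a binary schema file named se.bfbs next to the input.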

Related

how to read a mounted dbc file in databricks?

I am trying to read a dbc file in Databricks (mounted from an S3 bucket).
The file path is:
file_location="dbfs:/mnt/airbnb-dataset-ml/dataset/airbnb.dbc"
How can I read this file using Spark?
I tried the code below:
df=spark.read.parquet(file_location)
But it generates an error:
AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
Thanks for the help!
You are using spark.read.parquet but want to read a dbc file; it won't work this way.
Don't use parquet, use load. Give the file path with the file name (without the .dbc extension) in the path parameter and dbc in the format parameter.
Try below code:
df=spark.read.load(path='<file_path_with_filename>', format='dbc')
E.g.: df=spark.read.load(path='/mnt/airbnb-dataset-ml/dataset/airbnb', format='dbc')

Create an ADF Dataset to load multiple csv files (same format) from the Blob

I am trying to create a dataset containing multiple csv files from the Blob. In the file path of the dataset settings, I create a parameter - @dataset().FolderName - and add FolderName in the Parameters.
I leave the file (from File Path) empty, as I want to grab all files in the folder. However, there is no data when I preview data. Is there anything missing? Thank you
I have tested it on my side and it works fine:
1.add the FolderName parameter
2.preview the data
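For reference, a minimal sketch of what such a parameterized DelimitedText dataset could look like in JSON (the dataset name, linked service name, and container name are placeholder assumptions, not taken from the question):
{
    "name": "CsvFolderDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FolderName": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "mycontainer",
                "folderPath": {
                    "value": "@dataset().FolderName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
Referencing @dataset().FolderName as an Expression with the file name left empty is what lets the dataset pick up every file in the folder supplied at run time.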
If you want to merge all csv files in Data Flow, you can do this:
1.output to single file
2.set Single partition

How to ignore errors but not skip rows in redshift copy command

I have a nested JSON file as my source in S3 and I am trying to copy this file into Redshift.
My setup is as follows:
I use MAXERROR - I need to skip certain errors because the source file is missing certain fields in some cases and has them in others
I use a JSONPATH file - to pick the fields that I need to copy to Redshift
All the columns in the table are varchar
Obviously, since I am using maxerror, the copy command executes successfully, but the table has 0 records. Here is my copy command:
COPY public.table(col1,col2,col3,col4,col5,col6)
from 's3://bucket/filename'
credentials 'redshift'
format as JSON 'jsonpathfile.json'
timeformat 'YYYY-MM-DDTHH:MI:SS'
EMPTYASNULL ACCEPTANYDATE ACCEPTINVCHARS TRUNCATECOLUMNS maxerror 100 ;
When I check stl_load_errors it keeps saying
Invalid JSONPath format: Member is not an object.
Does this mean the copy command is not able to find even one object that fits the jsonpath file? That is definitely not true; I inferred the schema of the input file to design the jsonpath file.
Here is an example from COPY Examples - Amazon Redshift:
copy category
from 's3://mybucket/category_object_paths.json'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
json 's3://mybucket/category_jsonpath.json';
The path to the jsonpath file is specified fully, whereas your example just refers to the filename.
Try specifying the full path starting with s3:// and see whether that helps.
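Applied to the COPY statement from the question, that would look something like the sketch below; only the jsonpaths reference changes to a full s3:// URI (the bucket and key are placeholders), everything else is kept as posted:
COPY public.table(col1,col2,col3,col4,col5,col6)
from 's3://bucket/filename'
credentials 'redshift'
format as JSON 's3://bucket/jsonpathfile.json'
timeformat 'YYYY-MM-DDTHH:MI:SS'
EMPTYASNULL ACCEPTANYDATE ACCEPTINVCHARS TRUNCATECOLUMNS maxerror 100 ;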

how to convert Flatbuffer data file to json?

I used FlatBuffers to encode data into a file named 'person.txt'. How can I convert it into a JSON file?
I tried 'flatc --json person.txt person.json' but it failed.
I have 'person.fbs' and 'person.txt'. How do I do it?
You need to provide the path of the schema too in the command you are executing:
flatc.exe --raw-binary -t <path to fbs schema file> -- <path to flatbuffer binary file>
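Applied to the files named in the question (assuming person.txt really contains the FlatBuffers binary and person.fbs is its schema), that would be along the lines of:
flatc --raw-binary -t person.fbs -- person.txt
which should write person.json to the current directory.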

Using the BQ command line to change configuration.load.quote

I want to know how, using the BQ command line tool, I can change the configuration of a BigQuery API job. E.g., I want to change the configuration.load.quote property from the command line tool. Is there any way? I need this to load a table with a double quote (") inside a field.
You cannot modify a job once it is created, but I guess what you want is to set the quote property when creating the job.
In most cases, bq help <command> will get you what you need. Here's the output of bq help load. As you can see, you just have to specify --quote="'" after the command but before the arguments.
$ bq help load
Python script for interacting with BigQuery.
USAGE: bq.py [--global_flags] <command> [--command_flags] [args]
load Perform a load operation of source into destination_table.
Usage:
load <destination_table> <source> [<schema>]
The <destination_table> is the fully-qualified table name of table to
create, or append to if the table already exists.
The <source> argument can be a path to a single local file, or a
comma-separated list of URIs.
The <schema> argument should be either the name of a JSON file or a
text schema. This schema should be omitted if the table already has
one.
In the case that the schema is provided in text form, it should be a
comma-separated list of entries of the form name[:type], where type
will default to string if not specified.
In the case that <schema> is a filename, it should contain a single
array object, each entry of which should be an object with properties
'name', 'type', and (optionally) 'mode'. See the online documentation
for more detail:
https://developers.google.com/bigquery/preparing-data-for-bigquery
Note: the case of a single-entry schema with no type specified is
ambiguous; one can use name:string to force interpretation as a
text schema.
Examples:
bq load ds.new_tbl ./info.csv ./info_schema.json
bq load ds.new_tbl gs://mybucket/info.csv ./info_schema.json
bq load ds.small gs://mybucket/small.csv name:integer,value:string
bq load ds.small gs://mybucket/small.csv field1,field2,field3
Arguments:
destination_table: Destination table name.
source: Name of local file to import, or a comma-separated list of
URI paths to data to import.
schema: Either a text schema or JSON file, as above.
Flags for load:
/home/David/google-cloud-sdk/platform/bq/bq.py:
--[no]allow_jagged_rows: Whether to allow missing trailing optional columns in
CSV import data.
--[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import
data.
-E,--encoding: <UTF-8|ISO-8859-1>: The character encoding used by the input
file. Options include:
ISO-8859-1 (also known as Latin-1)
UTF-8
-F,--field_delimiter: The character that indicates the boundary between
columns in the input file. "\t" and "tab" are accepted names for tab.
--[no]ignore_unknown_values: Whether to allow and ignore extra, unrecognized
values in CSV or JSON import data.
--max_bad_records: Maximum number of bad records allowed before the entire job
fails.
(default: '0')
(an integer)
--quote: Quote character to use to enclose records. Default is ". To indicate
no quote character at all, use an empty string.
--[no]replace: If true erase existing contents before loading new data.
(default: 'false')
--schema: Either a filename or a comma-separated list of fields in the form
name[:type].
--skip_leading_rows: The number of rows at the beginning of the source file to
skip.
(an integer)
--source_format: <CSV|NEWLINE_DELIMITED_JSON|DATASTORE_BACKUP>: Format of
source data. Options include:
CSV
NEWLINE_DELIMITED_JSON
DATASTORE_BACKUP
gflags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on
the command line even if the program does not define a flag with that name.
IMPORTANT: flags in this list that have arguments MUST use the --flag=value
format.
(default: '')
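Putting that together, a hedged example (the dataset, table, bucket, and schema below are placeholders):
bq load --quote="'" --source_format=CSV mydataset.mytable gs://mybucket/data.csv col1:string,col2:string
Setting --quote to a single quote means the double quotes inside your fields are no longer treated as enclosing characters and are loaded as ordinary data.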