Changing configuration.load.quote with the bq command-line tool - google-bigquery

I want to know how I can change the configuration of a BigQuery API load job using the bq command-line tool. E.g., I want to set the configuration.load.quote property from the command line. Is there any way to do this? I need it in order to load a table whose fields contain double quotes (") inside.

You cannot modify a job once it is created, but I guess what you want is to set the quote property when creating the job.
In most cases, bq help <command> will get you what you need. Here's the output of bq help load. As you can see, you just have to specify --quote="'" after the command but before the positional arguments; a full example command follows the help output below.
$ bq help load
Python script for interacting with BigQuery.
USAGE: bq.py [--global_flags] <command> [--command_flags] [args]
load Perform a load operation of source into destination_table.
Usage:
load <destination_table> <source> [<schema>]
The <destination_table> is the fully-qualified table name of table to
create, or append to if the table already exists.
The <source> argument can be a path to a single local file, or a
comma-separated list of URIs.
The <schema> argument should be either the name of a JSON file or a
text schema. This schema should be omitted if the table already has
one.
In the case that the schema is provided in text form, it should be a
comma-separated list of entries of the form name[:type], where type
will default to string if not specified.
In the case that <schema> is a filename, it should contain a single
array object, each entry of which should be an object with properties
'name', 'type', and (optionally) 'mode'. See the online documentation
for more detail:
https://developers.google.com/bigquery/preparing-data-for-bigquery
Note: the case of a single-entry schema with no type specified is
ambiguous; one can use name:string to force interpretation as a
text schema.
Examples:
bq load ds.new_tbl ./info.csv ./info_schema.json
bq load ds.new_tbl gs://mybucket/info.csv ./info_schema.json
bq load ds.small gs://mybucket/small.csv name:integer,value:string
bq load ds.small gs://mybucket/small.csv field1,field2,field3
Arguments:
destination_table: Destination table name.
source: Name of local file to import, or a comma-separated list of
URI paths to data to import.
schema: Either a text schema or JSON file, as above.
Flags for load:
/home/David/google-cloud-sdk/platform/bq/bq.py:
--[no]allow_jagged_rows: Whether to allow missing trailing optional columns in
CSV import data.
--[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import
data.
-E,--encoding: <UTF-8|ISO-8859-1>: The character encoding used by the input
file. Options include:
ISO-8859-1 (also known as Latin-1)
UTF-8
-F,--field_delimiter: The character that indicates the boundary between
columns in the input file. "\t" and "tab" are accepted names for tab.
--[no]ignore_unknown_values: Whether to allow and ignore extra, unrecognized
values in CSV or JSON import data.
--max_bad_records: Maximum number of bad records allowed before the entire job
fails.
(default: '0')
(an integer)
--quote: Quote character to use to enclose records. Default is ". To indicate
no quote character at all, use an empty string.
--[no]replace: If true erase existing contents before loading new data.
(default: 'false')
--schema: Either a filename or a comma-separated list of fields in the form
name[:type].
--skip_leading_rows: The number of rows at the beginning of the source file to
skip.
(an integer)
--source_format: <CSV|NEWLINE_DELIMITED_JSON|DATASTORE_BACKUP>: Format of
source data. Options include:
CSV
NEWLINE_DELIMITED_JSON
DATASTORE_BACKUP
gflags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on
the command line even if the program does not define a flag with that name.
IMPORTANT: flags in this list that have arguments MUST use the --flag=value
format.
(default: '')
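For example, assuming a dataset mydataset, a table mytable, and a local file data.csv (all names here are placeholders), a load that uses a single quote as the CSV quote character, so that embedded double quotes pass through as ordinary data, might look like:
$ bq load --source_format=CSV --quote="'" mydataset.mytable ./data.csv field1:string,field2:string
Per the help text above, --quote="" (an empty string) disables quote handling entirely, which is another option when the data uses no quoting at all.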

Related

How to parameterize values in the csv file which is already parameterized in Jmeter

I have multiple API request bodies and I'm passing them using a txt file in CSV Data Set Config. These API request bodies have certain values to parameterize. How do I achieve this? Something like parameterizing inside an already parameterized CSV file.
If your CSV file contains JMeter Functions or Variables which you want to evaluate at runtime, you need to wrap the variable(s) defined in the CSV Data Set Config in the __eval() function.
For example if you have:
test.csv file with the single line containing ${foo} and CSV Data Set Config reading this file into some-variable
User Defined Variables which assigns the variable foo the value of bar
And a couple of Debug Samplers for visualization
You will see that:
${some-variable} will return ${foo}, basically the line from the CSV file
${__eval(${some-variable})} will return bar because the variable will be evaluated and its respective value will be resolved.
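Put together, a minimal sketch of that setup (using the same file and variable names as above) is:
test.csv (read by CSV Data Set Config into some-variable):
${foo}
User Defined Variables:
foo = bar
Debug Sampler output:
${some-variable} -> ${foo}
${__eval(${some-variable})} -> bar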

can dbt seed be used with a Pipe delimited csv files?

We have over a dozen tables we have built that are perfect candidates for dbt seed and it is working great. We do have two files with addresses with commas in them though. I tried to use a pipe delimited file but get a syntax error saying it found an unexpected '|'.
I have searched the internet, getdbt and stackoverflow and don't find any reference to possibly declaring a delimiter in the dbt_project.yml file. Can we use a pipe delimiter in the csv file instead of a comma? Thanks.
Just quote the values like so:
col_1,col_2,col_3
value,"value, with comma",another value
Unfortunately, this is an issue which I've brought up to the dbt team but has yet to be resolved.
See: dbt-core issue #3990
That issue thread has more details into the underlying issues with the agate library's csv function, kwargs, etc.
In the interim, Jeremy was helpful in giving some options that make it possible to parse "|" (or other delimiters) via the dbt-external-tables package, which you can find more information about here:
See: dbt-external-tables issue #72
The file_format property, used as the input to create external table
statements, accepts a string of any length. So today, you could pass
it:
sources:
  - name: my_external_source
    tables:
      - name: my_external_tbl
        external:
          location: "#my_stage"
          file_format: "( type = csv field_delimiter = 'aa' record_delimiter = 'aabb' )"
The package macros will template this out as:
create or replace external table my_external_source.my_external_tbl
with location = '#my_stage'
file_format = ( type = csv field_delimiter = 'aa' record_delimiter = 'aabb' )
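For the pipe-delimited files in your question, the same pattern would presumably look like this (the source, table, and stage names are placeholders):
sources:
  - name: my_external_source
    tables:
      - name: addresses
        external:
          location: "#my_stage"
          file_format: "( type = csv field_delimiter = '|' )"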
Hope that helps. I find support for "|"-delimited files to be relatively weak in general, since they seem to be used heavily only in certain industries (e.g. financial) or locales.
Will update this answer if any of the above changes, but this is current information as of dbt-core v1.0 & dbt-external-tables v0.1.2.

Getting Error for Excel to Table Conversion

I just started learning Python and now I'm trying to integrate that with my GIS knowledge. As the title suggests, I'm attempting to convert an Excel sheet to a table, but I keep getting errors: one is wholly undecipherable to me, and the other seems to suggest that my file does not exist, which I know is incorrect since I copied its location directly from its properties.
Here is a screenshot of my environment. Please help if you can and thanks in advance.
Environment/Error
Simply put, you set the workspace directory inside the filename variable, so when arcpy handles it, it tries to access a file that does not exist, in an unknown workspace.
Try this (note the raw string and the dropped trailing backslash, so the backslashes are not treated as escape characters):
arcpy.env.workspace = r"J:\egis_work\dpcd\projects\SHARITA\Python"
arcpy.ExcelToTable_conversion("Exceltest.xlsx", "Bookstorestable", "Sheet1")
Arcpy uses the following syntax to convert geodatabase tables to Excel. It is straightforward.
Example
Excel tables cannot be stored in a geodatabase. The most reasonable thing is to store them in the root folder containing the geodatabase that holds the table. Say I want to convert the table below into Excel and save it in that root folder, i.e. the folder in which the geodatabase is.
I will go as follows (I have put the explanations after the #):
import arcpy
import os
from datetime import datetime, date, time
# Set environment settings
in_table= r"C:\working\Sunderwood\Network Analyst\MarchDistances\Centroid.gdb\SunderwoodFirstArcpyTable"
#os.path.basename(in_table)
out_xls = os.path.basename(in_table) + datetime.now().strftime('%Y%m%d')
#os.path.basename(in_table) gives the base name of the pathname. In this case, it returns the table name
# + is used in python to concatenate
# datetime.now() gives today's date
# strftime('%Y%m%d') converts today's date into a string in the format YYYYMMDD
# Put the above together and you get a new file name: the input table name plus today's date
#os.path.dirname() method in Python is used to get the directory name from the specified path
geodatabase = os.path.dirname(in_table)
# In this case, os.path.dirname(in_table) gives us the geodatabase
# The join() method takes all items in an iterable and joins them into one string
SaveInFolder = "\\".join(geodatabase.split('\\')[:-1])
# Here I tell python to join the path components back together with "\", after dropping the last one (the geodatabase itself). The split below shows what gets dropped.
# I use the split() method, which splits a string into a list
#In the case above it splits into ['W:', 'working', 'Sunderwood', 'Network Analyst', 'MarchDistances', 'Centroid.gdb']. However, that is not quite what I want: I want to drop 'Centroid.gdb' so that I remain with the path 'W:\working\Sunderwood\Network Analyst\MarchDistances'
#Before I tell arcpy to save, I have to specify the workspace in which it will save. So I now make my environment the SaveInFolder
arcpy.env.workspace =SaveInFolder
## Now I have to tell arcpy what I will call my newtable. I use os.path.join. This method concatenates path components with exactly one directory separator (os.sep) following each non-empty part except the last path component
newtable = os.path.join(arcpy.env.workspace, out_xls)
#In the above case it will give me "W:\working\Sunderwood\Network Analyst\MarchDistances\SunderwoodFirstArcpyTable20200402"
# Notice that newtable does not have an Excel extension. I use + to concatenate ".xls" onto the path, making it "W:\working\Sunderwood\Network Analyst\MarchDistances\SunderwoodFirstArcpyTable20200402.xls"
table= newtable+".xls"
#Finally, I call the arcpy method and feed it with the required variables
# Execute TableToExcel
arcpy.TableToExcel_conversion(in_table, table)
print(table + " is now available")

How to ignore errors but not skip rows in redshift copy command

I have a nested json as my source file in S3 and I am trying to copy this file into redshift.
My issues with this are as follows,
I use MAXERROR - I need to skip certain errors because the source file is missing certain fields in some cases and has them in others
I use a JSONPATH file - to pick the fields that I need to copy to redshift
All the columns in the table are varchar
Obviously, since I am using maxerror, the copy command executes successfully, but the table has 0 records. Here is my copy command:
COPY public.table(col1,col2,col3,col4,col5,col6)
from 's3://bucket/filename'
credentials 'redshift'
format as JSON 'jsonpathfile.json'
timeformat 'YYYY-MM-DDTHH:MI:SS'
EMPTYASNULL ACCEPTANYDATE ACCEPTINVCHARS TRUNCATECOLUMNS maxerror 100 ;
If I check stl_load_errors, it keeps saying
Invalid JSONPath format: Member is not an object.
Does this mean the copy command is not able to find even one object that fits the jsonpath file? That is definitely not true; I inferred the schema of the input file to design the jsonpath file.
Here is an example from COPY Examples - Amazon Redshift:
copy category
from 's3://mybucket/category_object_paths.json'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
json 's3://mybucket/category_jsonpath.json';
The path to the jsonpath file is specified fully, whereas your example just refers to the filename.
Try specifying the full path starting with s3:// and see whether that helps.
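Applied to your command, that would mean something like the following (bucket and file names are just the placeholders from your example; the jsonpaths file is assumed to live in the same bucket):
COPY public.table(col1,col2,col3,col4,col5,col6)
from 's3://bucket/filename'
credentials 'redshift'
format as JSON 's3://bucket/jsonpathfile.json'
timeformat 'YYYY-MM-DDTHH:MI:SS'
EMPTYASNULL ACCEPTANYDATE ACCEPTINVCHARS TRUNCATECOLUMNS maxerror 100 ;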

Upload csv files with comma inside it

As per my requirement, I need to upload a .csv file into the application. I am trying to simulate this using LoadRunner. The issue I am encountering is that my csv file is in the below format
Header - AA,BB,CC
Data-xyz,"yyx,zzy",xxz
On using the below statement to upload the file, I am getting an error: "line 2 contains 4 columns instead of 3"
web_submit_data("upload",
"Action=xxx/upload",
"Method=POST",
"EncType=multipart/form-data",
"RecContentType=text/html",
"Referer=xxx",
"Snapshot=t86.inf",
"Mode=HTML",
ITEMDATA,
"Name=utf8", "Value=✓", ENDITEM,
"Name=token", "Value={token_1}", ENDITEM,
"Name=upload_file", "Value={NewParam_5}", "File=yes", "ContentType=text/csv", ENDITEM,
"Name=Button1", "Value=Upload", ENDITEM,
LAST);
As per the information provided in How to deal with a string with comma in it from a csv, when we have to read the data by using loadrunner?,
I tried updating the .prm file to a new delimiter, pipe (|), but I still get the error.
[parameter:NewParam_5]
Delimiter="|"
ParamName="NewParam_5"
TableLocation="C:\temp"
ColumnName="Col 1"
I also notice that even though I set the delimiter to pipe, if I right-click on web_submit_data() and go to Parameter properties, I see a column delimiter option there as well; it is set to comma rather than pipe, which indicates that this setting takes precedence over the setting in the .prm file.
Can someone please guide me on the right way to set a new delimiter so that VuGen recognizes and parses the csv file as I want it to?
I am using LoadRunner 12.5.
Thanks for your help.
Do you need to upload a file or a line of comma-separated values? Right now you appear to be reading a line of CSV values, not a file. For a file upload, your parameter file would contain a list of filenames, or a single file reference, within the directory of the virtual user (extra files, transferred with the script) or created by the virtual user and then uploaded.
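For example, if the goal is to upload whole files rather than individual values, the data table behind {NewParam_5} (e.g. NewParam_5.dat in TableLocation; the file names below are purely illustrative) would simply list one CSV file name per row:
Col 1
upload_1.csv
upload_2.csv
Each listed file then needs to be available to the virtual user (e.g. added to the script as an extra file), and web_submit_data() with "File=yes" uploads the whole file, so commas inside the file never pass through the parameter delimiter logic at all.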