I am trying to add a label to my BigQuery table/view using the following bq command:
bq update --set_label primary_keys:a,b project-id:dataset.tablename
The command works perfectly fine if I have only one key (a) as the primary key. However, when I try to set multiple keys (a,b) separated by a comma, it throws an invalid characters error. Is there a way to add multiple keys within the same label, separated by a comma?
I don't think that this is feasible, because the comma character is not accepted there, according to the documentation:
Keys and values can contain only lowercase letters, numeric
characters, underscores, and dashes. All characters must use UTF-8
encoding, and international characters are allowed.
According to the documentation, labels are key-value pairs that help you organize your Google Cloud BigQuery resources.
Being a key-value pair is a requirement as per the documentation, and this is not compatible with your intention of giving two different values to the same key.
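If you control both the writer and the readers of the label, one possible workaround (my assumption, not documented behavior) is to join the key names with an allowed character such as a dash, since BigQuery column names cannot contain dashes anyway, and split the value back on read:
bq update --set_label primary_keys:a-b project-id:dataset.tablename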
In my project I saw two Hive tables. In the CREATE TABLE statements, one table has ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0004' and the other has ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'. I want to know what '\u0004' and '\u001C' mean and when to use them.
In many text formats, \u introduces a Unicode escape sequence. This is a way of storing or sending a character that can't be easily displayed or represented in the format you're using. The four characters after the \u are the Unicode "code point" in hexadecimal. A Unicode code point is a number denoting a specific Unicode character.
All characters have a code point, even the printable ones. For example, a is U+0061.
U+0004 and U+001C are both unprintable characters, meaning there's no standard character you can use to display them on the screen. That's why an escape sequence is used here.
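You can check both facts from Python, for example (a small sketch, nothing Hive-specific):
print(hex(ord("a")))           # 0x61, i.e. U+0061
print("\u0004".isprintable())  # False: no standard way to display it
print("\u001c".isprintable())  # False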
If you use a simple, printable character like , as your field delimiter, it will make the stored data easier for a human to read. The field values will be stored with a , between each one. For example, you might see the values one, two and three stored as:
one,two,three
But if you expect your field values to actually contain a ,, it would be a poor choice of field delimiter (because then you'd need a special way to tell the difference between a single field with the value one,two and two different fields with the values one and two). The choice of delimiter depends both on whether you want to be able to read it easily and on what characters you expect the fields to contain.
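A minimal Python sketch of why an unprintable delimiter helps when the data itself contains commas (the sample values are made up):
record_csv = "one,two,three"
record_u4 = "one\u0004two, with a comma\u0004three"
print(record_csv.split(","))      # ['one', 'two', 'three']
print(record_u4.split("\u0004"))  # ['one', 'two, with a comma', 'three']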
Does anyone know how to include special characters like $ in a table name, or, if the table already exists with a $ in its name, how to ingest it into BigQuery?
Thank you in advance,
That's not possible.
BigQuery column names must contain only letters, numbers, and underscores.
They must start with either a letter or an underscore.
Link to relevant doc: https://cloud.google.com/bigquery/docs/schemas#column_names
If a table or column name contains special characters, you need to change it before ingesting into BigQuery ($COLUMN_NAME -> DOLLAR_COLUMN_NAME, maybe?)
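A small Python sketch of such a rename, assuming a one-way cleanup is enough (the sanitize helper is illustrative, not part of any BigQuery API):
import re

def sanitize(name: str) -> str:
    # Replace anything that is not a letter, digit, or underscore.
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    # Names must start with a letter or an underscore.
    return cleaned if re.match(r"[A-Za-z_]", cleaned) else "_" + cleaned

print(sanitize("$COLUMN_NAME"))  # _COLUMN_NAME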
BigQuery column names (fields) can only contain English letters, numbers, and underscores.
I am using Python and I want to create a script to migrate my data from Postgres to BigQuery, and the Postgres tables have many non-English column names.
I will probably need to encode the column names into some format that BigQuery accepts, but I will need the ability to later decode them back to the originals.
What is the best way to do this?
You can encode the column names to something like base64 and replace the +, /, and = characters with some kind of placeholder.
If you don't care about field length, you can encode to base32 instead (it's about 20% longer than base64, but it doesn't use '+' or '/', and '=' is only used for padding, so you can discard it and it won't affect the string).
Alternatively, you can build a small conversion table mapping each non-English character in your language to some combination of English characters; this only works if you have a small number of non-English characters.
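A minimal sketch of the base32 round trip in Python; the c_ prefix and helper names are illustrative (the prefix keeps the name starting with a letter, and the padding is recomputed on decode):
import base64

def encode_column(name: str) -> str:
    encoded = base64.b32encode(name.encode("utf-8")).decode("ascii")
    # Strip padding and lowercase; base32 only uses A-Z and 2-7.
    return "c_" + encoded.rstrip("=").lower()

def decode_column(column: str) -> str:
    encoded = column[len("c_"):].upper()
    encoded += "=" * (-len(encoded) % 8)  # restore padding
    return base64.b32decode(encoded).decode("utf-8")

name = "имя_клиента"  # a non-English column name
print(decode_column(encode_column(name)) == name)  # True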
I am facing an issue while fetching data via a query from a Redshift table. For example:
table name: test_users
column names: user_id, userName, userLastName
Now, when the test_users table is created, Redshift converts the capital letters in userName to username and, similarly, userLastName is converted to userlastname.
I have found ways to return all column names in uppercase or in lowercase, but not a way to get them back exactly as they were created.
Unfortunately, AWS Redshift does not support case-sensitive identifiers at the time of writing (Feb 2020). And, while Redshift is based on PostgreSQL, AWS has heavily modified it to the point where many assumptions that would be correct for PostgreSQL 8 are not correct for Redshift.
The documentation at https://docs.aws.amazon.com/redshift/latest/dg/r_names.html explicitly states that it downcases identifiers. The relevant paragraph is below; the critical sentence is the one stating that identifiers are folded to lowercase:
Names identify database objects, including tables and columns, as well as users and passwords. The terms name and identifier can be used interchangeably. There are two types of identifiers, standard identifiers and quoted or delimited identifiers. Identifiers must consist of only UTF-8 printable characters. ASCII letters in standard and delimited identifiers are case-insensitive and are folded to lowercase in the database. In query results, column names are returned as lowercase by default. To return column names in uppercase, set the describe_field_name_in_uppercase configuration parameter to true.
To preserve case:
SET enable_case_sensitive_identifier TO true;
https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html
To force returned uppercase fields (for anyone else curious):
SET describe_field_name_in_uppercase TO on;
https://docs.aws.amazon.com/redshift/latest/dg/r_describe_field_name_in_uppercase.html
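A hedged end-to-end sketch in Python (Redshift accepts PostgreSQL drivers such as psycopg2; the connection parameters and table are made up):
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="dev", user="admin", password="***")
cur = conn.cursor()

# Case is preserved only for double-quoted identifiers created and
# queried while the setting is on.
cur.execute("SET enable_case_sensitive_identifier TO true;")
cur.execute('CREATE TABLE test_users (user_id int, "userName" varchar(64));')
cur.execute('SELECT user_id, "userName" FROM test_users;')
print([col.name for col in cur.description])  # ['user_id', 'userName']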
I have a string with the value:
'MAX DATE QUERY: SELECT iso_timestamp(MAX(time_stamp)) AS MAXTIME FROM observation WHERE offering_id = 'HOBART''
But on inserting it into a PostgreSQL table I am getting this error:
org.postgresql.util.PSQLException: ERROR: syntax error at or near "HOBART".
This is probably because my string contains single quotes. I don't know the string value in advance; it changes every time and may contain special characters like \ since I am reading it from a file and saving it into a Postgres database.
Please give a general solution to escape such characters.
As per the SQL standard, single quotes are escaped by doubling them, i.e.:
insert into table (column) values ('I''m OK')
If you replace every single quote in your text with two single quotes, it will work.
Normally, a backslash escapes the following character; literal backslashes are similarly escaped by using two backslashes:
insert into table (column) values ('Look in C:\\Temp')
You can use double dollar quotation to escape the special characters in your string.
The query mentioned above, insert into table (column) values ('I''m OK'), then changes to insert into table (column) values ($$I'm OK$$).
To make the delimiter unique so that it doesn't collide with the value, you can put characters between the two dollar signs, such as:
insert into table (column) values ($aesc6$I'm OK$aesc6$).
Here $aesc6$ is the unique string delimiter (the tag), so even if $$ appears in the value it will be treated as part of the value and not as the end of the string.
You appear to be using Java and JDBC. Please read the JDBC tutorial, which describes how to use parameterized queries to safely insert data without risking SQL injection problems.
Please read the prepared statements section of the JDBC tutorial and these simple examples in various languages including Java.
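The question appears to use Java, but the idea is the same in any driver. A short sketch with Python and psycopg2 (the DSN, table, and column names are made up):
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
cur = conn.cursor()

# Quotes and backslashes in the value need no manual escaping.
line = "SELECT ... WHERE offering_id = 'HOBART'"

# The driver passes the value separately from the SQL text.
cur.execute("INSERT INTO logs (message) VALUES (%s)", (line,))
conn.commit()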
Since you're having issues with backslashes, not just single quotes, I'd say you're running PostgreSQL 9.0 or older, which defaults to standard_conforming_strings = off. In newer versions, backslashes are only special if you use the PostgreSQL extension E'escape strings'. (This is why you should always include your PostgreSQL version in questions.)
You might also want to examine:
Why you should use prepared statements.
The PostgreSQL documentation on the lexical structure of SQL queries.
While it is possible to explicitly quote values, doing so is error-prone, slow and inefficient. You should use parameterized queries (prepared statements) to safely insert data.
In future, please include a code snippet that you're having a problem with and details of the language you're using, the PostgreSQL version, etc.
If you really must manually escape strings, you'll need to make sure that standard_conforming_strings is on and double your single quotes, e.g. don''t manually escape text; or use PostgreSQL-specific E'escape strings where you \'backslash escape\' quotes'. But really, use prepared statements; it's way easier.
Some possible approaches are:
use prepared statements
convert all special characters to their equivalent HTML entities.
use base64 encoding while storing the string, and base64 decoding while reading the string from the db table.
Approach 1 (prepared statements) can be combined with approaches 2 and 3.
Approach 3 (base64 encoding) converts the string to a limited set of ASCII characters without losing any information. But you may not be able to do full-text search using this approach.
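A small sketch of approach 3 in Python (with approach 1 you would then pass the encoded value as a query parameter):
import base64

raw = "don't \\ mix 'quotes' and backslashes"
stored = base64.b64encode(raw.encode("utf-8")).decode("ascii")
# stored contains only A-Z, a-z, 0-9, +, / and = -- safe to embed.
restored = base64.b64decode(stored).decode("utf-8")
assert restored == raw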
In SQL Server, Unicode string literals are prefixed with N, and embedded single quotes are doubled, like this:
update table set stringField = N'/;l;sldl;''mess'