I am facing an issue while fetching data via a query from a Redshift table. For example:
table name: test_users
column names: user_id, userName, userLastName
When I create the test_users table, Redshift converts the userName column to username, and userLastName to userlastname.
I have found a way to return all column names in uppercase or in lowercase, but not a way to get them back exactly as they were defined.
Unfortunately, AWS Redshift does not support case-sensitive identifiers at the time of writing (Feb 2020). And, while Redshift is based on PostgreSQL, AWS has heavily modified it to the point where many assumptions that would be correct for PostgreSQL 8 are not correct for Redshift.
The documentation at https://docs.aws.amazon.com/redshift/latest/dg/r_names.html explicitly states that it downcases identifiers. The relevant paragraph is below; the critical sentence is the one saying that ASCII letters are folded to lowercase:
Names identify database objects, including tables and columns, as well as users and passwords. The terms name and identifier can be used interchangeably. There are two types of identifiers, standard identifiers and quoted or delimited identifiers. Identifiers must consist of only UTF-8 printable characters. ASCII letters in standard and delimited identifiers are case-insensitive and are folded to lowercase in the database. In query results, column names are returned as lowercase by default. To return column names in uppercase, set the describe_field_name_in_uppercase configuration parameter to true.
To preserve case, enable case-sensitive identifiers and write the names in double quotes:
SET enable_case_sensitive_identifier TO true;
https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html
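To see how the folding rule and the setting interact, here is a small Python model of the documented behaviour (a sketch, not anything AWS ships). `stored_name` and its `case_sensitive` flag are hypothetical names, and the model assumes, as the linked page describes, that only double-quoted names keep their case when the setting is on; unquoted names are still folded to lowercase.

```python
def _fold_ascii(text: str) -> str:
    # Only ASCII letters are folded; other characters are kept as written.
    return "".join(ch.lower() if ch.isascii() else ch for ch in text)


def stored_name(identifier: str, case_sensitive: bool = False) -> str:
    """Hypothetical model of the name Redshift stores for an identifier.

    identifier: the name as written in SQL, e.g. userName or "userName".
    case_sensitive: stands in for SET enable_case_sensitive_identifier TO true.
    """
    quoted = len(identifier) >= 2 and identifier[0] == identifier[-1] == '"'
    if quoted:
        inner = identifier[1:-1]
        # Assumption: delimited names keep their case only when the setting is on.
        return inner if case_sensitive else _fold_ascii(inner)
    # Standard (unquoted) identifiers are folded to lowercase either way.
    return _fold_ascii(identifier)


for name in ["userName", '"userName"']:
    print(name, "->", stored_name(name), "/", stored_name(name, case_sensitive=True))
# userName -> username / username
# "userName" -> username / userName
```

Under this model, getting userName back means running the SET statement first, then creating and querying the table with "userName" and "userLastName" in double quotes.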
To return column names in uppercase (for anyone else curious):
SET describe_field_name_in_uppercase TO on;
https://docs.aws.amazon.com/redshift/latest/dg/r_describe_field_name_in_uppercase.html
Does anyone know how to include special characters like $ in a table name, or, if a table with a $ in its name already exists, how to ingest it into BigQuery?
Thank you in advance,
That's not possible.
BigQuery column names must contain only letters, numbers, and underscores.
They must start with either a letter or an underscore.
Link to relevant doc: https://cloud.google.com/bigquery/docs/schemas#column_names
If a column name contains special characters, you need to rename it before ingesting into BigQuery (for example, $COLUMN_NAME -> DOLLAR_COLUMN_NAME).
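A minimal Python sketch of that renaming, assuming the only rules are the two quoted above. `bigquery_column_name` and the `REPLACEMENTS` table are hypothetical; add entries for whatever other characters your source uses, and keep in mind that two different source names can end up with the same result.

```python
import re

# Hypothetical replacement table: extend it with the characters your data uses.
REPLACEMENTS = {"$": "DOLLAR_"}


def bigquery_column_name(name: str) -> str:
    """Rewrite a column name so it uses only letters, digits and underscores."""
    out = "".join(REPLACEMENTS.get(ch, ch) for ch in name)
    out = re.sub(r"[^A-Za-z0-9_]", "_", out)  # anything else becomes "_"
    if not re.match(r"[A-Za-z_]", out):       # must start with a letter or underscore
        out = "_" + out
    return out


print(bigquery_column_name("$COLUMN_NAME"))  # DOLLAR_COLUMN_NAME
print(bigquery_column_name("2020-sales"))    # _2020_sales
```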
BigQuery column names (fields) can only contain English letters, numbers, and underscores.
I am using Python and I want to write a script to migrate my data from Postgres to BigQuery, and the Postgres tables have many non-English column names.
I will probably need to encode the column names in some format that BigQuery accepts, but I will need to be able to decode them back to the originals later.
What is the best way to do this?
You can encode the column names with something like base64 and replace the +, = and / characters with some kind of placeholder.
If you don't care about field length, you can encode to base32 instead. It's about 20% longer than base64 (8 output characters per 5 bytes instead of 4 per 3), but it doesn't use '+' or '/', and '=' is used only for padding, so you can discard it and restore it before decoding.
Alternatively, you can make a small conversion table that maps each non-English character in your language to some combination of English characters; this only works if you have a small number of non-English characters.
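A minimal sketch of the base32 approach using Python's standard `base64` module. The `c_` prefix is a hypothetical choice: base32 output can begin with a digit (UTF-8 Cyrillic text, for instance, encodes to a leading `2`), which BigQuery would reject. Lowercasing is safe because base32 uses a single case, which also sidesteps the fact that BigQuery treats column names differing only in case as duplicates, something mixed-case base64 could trip over.

```python
import base64

PREFIX = "c_"  # hypothetical prefix so the encoded name never starts with a digit


def encode_column(name: str) -> str:
    """Encode any column name into [a-z2-7_] characters, reversibly."""
    raw = base64.b32encode(name.encode("utf-8")).decode("ascii")
    return PREFIX + raw.rstrip("=").lower()  # drop padding, use one case


def decode_column(encoded: str) -> str:
    """Reverse encode_column: strip the prefix, restore case and padding."""
    if not encoded.startswith(PREFIX):
        raise ValueError(f"{encoded!r} was not produced by encode_column")
    body = encoded[len(PREFIX):].upper()
    body += "=" * (-len(body) % 8)  # base32 works in 8-character blocks
    return base64.b32decode(body).decode("utf-8")


print(decode_column(encode_column("имя")))  # имя
```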
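A sketch of such a table in Python. The German characters and the `_ae_`-style tokens are hypothetical choices; swap in your own language. The encoder checks its own round trip, so a name that happens to contain a token literally, or a character the table doesn't cover, is rejected instead of being silently mangled.

```python
import re

# Hypothetical conversion table; each token is assumed never to appear
# literally in the original column names (the round-trip check enforces it).
TABLE = {"ä": "_ae_", "ö": "_oe_", "ü": "_ue_", "ß": "_ss_"}
REVERSE = {token: ch for ch, token in TABLE.items()}
VALID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def decode_name(name: str) -> str:
    """Replace every token with the character it stands for."""
    for token, ch in REVERSE.items():
        name = name.replace(token, ch)
    return name


def encode_name(name: str) -> str:
    """Map table characters to tokens, refusing anything not reversible."""
    encoded = "".join(TABLE.get(ch, ch) for ch in name)
    if not VALID.fullmatch(encoded):
        raise ValueError(f"{name!r} has characters missing from the table")
    if decode_name(encoded) != name:  # a token clashed with literal text
        raise ValueError(f"{name!r} cannot be encoded reversibly")
    return encoded


print(encode_name("größe"))  # gr_oe__ss_e
```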
I want to create a SQL Server database that will hold thousands of tables whose names will reflect stock ticker names. For example, '0099-OL.HK' is a company's ticker name. Many of the stocks I'm creating tables for have special characters in them, just like that.
I've read that special characters in table names should be avoided, but I still don't know why. SQL Server lets you use special characters in table names if you enclose the name with brackets, e.g., 'CREATE TABLE [0099-OL.HK] ...'.
Should I use the ticker names as their table names, or should I avoid using their special characters?
This will lead to no end of problems. SQL Server allows names with spaces and special characters because people migrate from databases that allow these characters in their names. If you must do this, replace all special characters with _, like so: TN0099_OL_HK (TN for ticker name), so users can write SQL without using brackets. The prefix also keeps the name from starting with a digit.
It is bad practice to do so, since not every library can be relied on to handle such table names correctly.
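A sketch of that naming convention in Python. `ticker_table_name` and `build_table_names` are hypothetical helpers. The main design concern is collisions: '0099-OL.HK' and '0099.OL-HK' both become TN0099_OL_HK, and under SQL Server's usual case-insensitive collation names differing only in case clash too, so the sketch refuses to build a mapping when that happens.

```python
import re


def ticker_table_name(ticker: str, prefix: str = "TN") -> str:
    """'0099-OL.HK' -> 'TN0099_OL_HK': prefix, then non-alphanumerics become _."""
    return prefix + re.sub(r"[^A-Za-z0-9]", "_", ticker)


def build_table_names(tickers):
    """Map each ticker to a table name, refusing tickers that would collide."""
    names, owner = {}, {}
    for ticker in tickers:
        name = ticker_table_name(ticker)
        key = name.upper()  # assumes a case-insensitive collation
        if key in owner and owner[key] != ticker:
            raise ValueError(f"{ticker!r} and {owner[key]!r} both map to {name}")
        owner[key] = ticker
        names[ticker] = name
    return names


print(build_table_names(["0099-OL.HK", "AAPL"]))
# {'0099-OL.HK': 'TN0099_OL_HK', 'AAPL': 'TNAAPL'}
```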
Avoid using special characters, spaces, and leading numbers in database names, table names, and column names.
For the full rules for regular identifiers, see "Database Identifiers" in the SQL Server documentation.
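To check a name against those rules before deciding whether it needs brackets, here is a rough Python sketch. It is an ASCII-only approximation (the real rules accept Unicode letters and also exclude Transact-SQL reserved words, which this does not check), and `needs_brackets` and `bracket` are hypothetical helpers; inside T-SQL itself, the built-in QUOTENAME function does the bracketing. A leading @ or # has special meaning (variables and temporary tables), so neither is a good first character for an ordinary table name.

```python
import re

# ASCII-only approximation of SQL Server's regular-identifier rules:
# first character a letter, _, @ or #; then letters, digits, @, $, # or _.
REGULAR_IDENTIFIER = re.compile(r"[A-Za-z_@#][A-Za-z0-9_@$#]*")


def needs_brackets(name: str) -> bool:
    """True if the name cannot be used as a regular (undelimited) identifier."""
    return REGULAR_IDENTIFIER.fullmatch(name) is None


def bracket(name: str) -> str:
    """Delimit a name for T-SQL; a closing bracket inside is doubled."""
    return "[" + name.replace("]", "]]") + "]"


for name in ["TN0099_OL_HK", "0099-OL.HK"]:
    print(name, needs_brackets(name), bracket(name))
# TN0099_OL_HK False [TN0099_OL_HK]
# 0099-OL.HK True [0099-OL.HK]
```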
I am using Oracle XE. When I create columns, they are created in upper case. Is there any setting to make all columns in the database lower case?
To answer your specific question: no.
(you cannot globally change all column headings to lowercase)
Nonquoted Identifiers:
Unless you use quotes when creating objects the default for storage is upper case and these are called nonquoted identifiers in documentation. This allows "case insensitive" use of those names. e.g. a field stored with the name DESCRIPTION can be used in lowercase or mixed case like dEscRipTion
Quoted Identifiers
If, however, you use "quoted identifiers", you create case-sensitive names, and all references to them must match exactly. So a field with the stored name DesCripTion must always be referenced as "DesCripTion", with exactly that case; "DescripTion" would not work.
In short, don't mess with the defaults, you are way better off leaving them as "nonquoted identifiers"
see: Schema Object Naming Rules
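The two rules can be modelled in a few lines of Python (a sketch; `oracle_name` is a hypothetical helper, not an Oracle API). It returns the name Oracle would store or look up for an identifier as written, which makes it easy to see why an unquoted reference never matches a mixed-case quoted name.

```python
def oracle_name(identifier: str) -> str:
    """Name Oracle stores or looks up for an identifier as written (model)."""
    if len(identifier) >= 2 and identifier[0] == identifier[-1] == '"':
        return identifier[1:-1]   # quoted identifier: case kept exactly
    return identifier.upper()     # nonquoted identifier: folded to upper case


stored = oracle_name('"DesCripTion"')
print(oracle_name("dEscRipTion"))              # DESCRIPTION
print(oracle_name('"DescripTion"') == stored)  # False
print(oracle_name("DesCripTion") == stored)    # False
```

A quoted all-caps name such as "DESCRIPTION" resolves to the same name as the nonquoted description, which is why the default behaves as case-insensitive.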
Use Oracle's ALTER TABLE ... RENAME syntax.
For example:
-- existing column on TABLE1 is CITY_NAME
-- new name is cityName
-- you can rename the columns and/or the table (rename columns first if you do both)
ALTER TABLE SCHEMA1.TABLE1 RENAME COLUMN CITY_NAME TO "cityName";
and
ALTER TABLE SCHEMA1.TABLE1 RENAME TO "NewTableName";
The main thing to remember is to enclose the new name in double quotes; every later reference must then use the quoted, case-exact name.
You will have to update existing views and cursors that use the old names, because they will break.
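If you go down this route for every column, the statements can be generated rather than typed. This is a hypothetical Python helper that only builds the SQL text (it does not connect to Oracle); it assumes the current column names are ordinary unquoted upper-case names, uses the ALTER TABLE ... RENAME COLUMN form, and rejects targets containing a double quote, since Oracle does not allow one inside a quoted identifier. Every renamed column must then be quoted in all later SQL.

```python
def rename_column_statements(table, columns, new_name=str.lower):
    """Build Oracle ALTER TABLE ... RENAME COLUMN statements (text only).

    table: the table as written in SQL, e.g. SCHEMA1.TABLE1 (hypothetical).
    columns: current unquoted, upper-case column names.
    new_name: function giving the target name; lower case by default.
    """
    statements = []
    for column in columns:
        target = new_name(column)
        if '"' in target:
            raise ValueError(f"quoted identifiers cannot contain double quotes: {target}")
        statements.append(f'ALTER TABLE {table} RENAME COLUMN {column} TO "{target}";')
    return statements


for statement in rename_column_statements("SCHEMA1.TABLE1", ["CITY_NAME", "ZIP_CODE"]):
    print(statement)
# ALTER TABLE SCHEMA1.TABLE1 RENAME COLUMN CITY_NAME TO "city_name";
# ALTER TABLE SCHEMA1.TABLE1 RENAME COLUMN ZIP_CODE TO "zip_code";
```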
I tried to create table named 15909434_user with syntax like below:
CREATE TABLE 15909434_user ( ... )
It produced an error, of course. After a bit of research on Google, I found a good article that describes:
When you create an object in PostgreSQL, you give that object a name. Every table has a name, every column has a name, and so on. PostgreSQL uses a single data type to define all object names: the name type.
A value of type name is a string of 63 or fewer characters. A name must start with a letter or an underscore; the rest of the string can contain letters, digits, and underscores.
...
If you find that you need to create an object that does not meet these rules, you can enclose the name in double quotes. Wrapping a name in quotes creates a quoted identifier. For example, you could create a table whose name is "3.14159"—the double quotes are required, but are not actually a part of the name (that is, they are not stored and do not count against the 63-character limit). ...
Okay, now I know how to solve this with the following syntax (putting double quotes around the table name):
CREATE TABLE "15909434_user" ( ... )
You can create a table or column named "15909434_user" or user_15909434, but you cannot create a table or column whose name begins with a digit without using double quotes.
So I am curious about the reason behind this (beyond it being a convention). Why does this convention exist? Is it to avoid a syntax limitation, or is there some other reason?
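A rough Python version of that rule, useful when generating DDL. `quote_if_needed` is a hypothetical helper covering only the ASCII letters/digits/underscore rule quoted above plus PostgreSQL's folding of unquoted names to lowercase (so mixed-case names get quoted too); it does not know about reserved words. It measures the limit in UTF-8 bytes, since PostgreSQL's 63 is really NAMEDATALEN - 1 bytes, and raises rather than letting the server truncate a long name. In real code, prefer the driver's own quoting, e.g. psycopg2's `sql.Identifier`.

```python
import re

# ASCII-only form of the rule quoted above: a letter or underscore first,
# then letters, digits or underscores.
PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_BYTES = 63  # NAMEDATALEN - 1 in a default PostgreSQL build


def quote_if_needed(name: str) -> str:
    """Return name as-is if it can be used unquoted, else as a quoted identifier."""
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        raise ValueError(f"{name!r} is longer than {MAX_NAME_BYTES} bytes")
    # Unquoted names are folded to lowercase, so mixed case also needs quotes.
    if PLAIN_NAME.fullmatch(name) and name == name.lower():
        return name
    return '"' + name.replace('"', '""') + '"'  # "" is an escaped quote


print(quote_if_needed("user_15909434"))  # user_15909434
print(quote_if_needed("15909434_user"))  # "15909434_user"
```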
Thanks in advance for your attention!
It comes from the original SQL standards, which through several layers of indirection eventually define an identifier start character. That can be one of several things, but primarily it is "a simple Latin letter". There are other options too, but if you want to see all the details, go to http://en.wikipedia.org/wiki/SQL-92 and follow the links to the actual standard (page 85).
Having non-numeric identifier introducers makes writing a parser to decode SQL for execution easier and quicker, but a quoted form is fine too.
Edit: Why is it easier for the parser?
The problem for a parser is more in the SELECT-list clause than the FROM clause. The select-list is the list of expressions that are selected from the tables, and this is very flexible, allowing simple column names and numeric expressions. Consider the following:
SELECT 2e2 + 3.4 FROM ...
If table and column names could start with digits, is 2e2 a column name or a valid number (e notation is typically permitted in numeric literals)? And is 3.4 the table "3" and column "4", or the numeric value 3.4?
Having the rule that identifiers start with simple Latin letters (and some other specific characters) means that a parser that sees 2e2 can immediately tell it is a numeric literal; the same goes for 3.4.
While it would be possible to devise a scheme to allow numeric leading characters, this might lead to even more obscure rules (opinion), so this rule is a nice solution. If you allowed digits first, then it would always need quoting, which is arguably not as 'clean'.
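To make the parsing argument concrete, here is a toy Python tokenizer (a sketch, far simpler than any real SQL lexer; `tokenize` is a hypothetical helper). It decides what a token is from its first character alone: a digit always starts a number, and a letter or underscore always starts an identifier. That single-character decision only works because identifiers may not begin with a digit.

```python
import re

TOKEN_RE = re.compile(
    r"""
    \s*
    (?:
        (?P<number>\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)  # digit first: always a number
      | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)           # letter or _ first: identifier
      | (?P<op>[-+*/(),.])                          # single-character operator
    )
    """,
    re.VERBOSE,
)


def tokenize(sql: str):
    """Split a tiny SQL fragment into (kind, text) tokens."""
    tokens, pos, sql = [], 0, sql.rstrip()
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {sql[pos]!r}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


print(tokenize("2e2 + 3.4"))
# [('number', '2e2'), ('op', '+'), ('number', '3.4')]
```

An unquoted digit-first name such as 15909434_user comes out as the number 15909434 followed by the identifier _user, which is exactly the ambiguity the rule avoids.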
Disclaimer: I've simplified the above slightly, ignoring correlation names to keep it short. I'm not totally familiar with PostgreSQL, but have double-checked the above answer against the Oracle RDB documentation and the SQL spec.
I'd imagine it's to do with the grammar.
SELECT 24*DAY_NUMBER as X from MY_TABLE
is fine, but would be ambiguous if 24 were allowed as a column name.
Adding quotes means you're explicitly referring to an identifier not a constant. So in order to use it, you'd always have to escape it anyway.