I have a SQL database with tables imported from Excel CSV files. Some tables have field names that contain special characters such as a space, /, $, #, or %, among others.
Is there a way to dynamically look at every column name in every table and replace each special character with an underscore or some other string (for example, replace % with PCT, # with NBR, or $ with CUR), or even delete the character altogether?
Thank you in advance.
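One way to approach this is to compute the new names in a script and then apply them with whatever rename statement your database supports. Here is a minimal Python sketch of the name-cleaning step only. The `REPLACEMENTS` table and the function names `clean_name` and `build_rename_map` are made up for illustration; how you list the columns (for example from INFORMATION_SCHEMA.COLUMNS) and how you run the renames depends on your database.

```python
import re

# Hypothetical replacement table taken from the examples in the question.
REPLACEMENTS = {"%": "PCT", "#": "NBR", "$": "CUR"}


def clean_name(name: str) -> str:
    """Return a name made only of letters, digits, and underscores."""
    # Swap the characters that have a word replacement.
    for char, word in REPLACEMENTS.items():
        name = name.replace(char, f"_{word}_")
    # Turn every other special character (space, /, ...) into an underscore.
    name = re.sub(r"[^0-9A-Za-z_]", "_", name)
    # Collapse runs of underscores and trim them from both ends.
    return re.sub(r"_+", "_", name).strip("_")


def build_rename_map(columns: list[str]) -> dict[str, str]:
    """Map old name -> new name for the columns that actually change."""
    return {c: clean_name(c) for c in columns if clean_name(c) != c}


if __name__ == "__main__":
    print(build_rename_map(["id", "Unit Price $", "Growth %", "Item #", "In/Out"]))
```

Two different original names can clean to the same result, and a name made only of special characters cleans to an empty string, so check the map for duplicates and empty values before renaming anything.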
Related
In my project I saw two Hive tables. In the create table statements, one table has ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0004' and the other has ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'. What do '\u0004' and '\u001C' mean, and when should they be used?
In many text formats, \u introduces a Unicode escape sequence. This is a way of storing or sending a character that can't be easily displayed or represented in the format you're using. The four characters after the \u are the Unicode "code point" in hexadecimal. A Unicode code point is a number denoting a specific Unicode character.
All characters have a code point, even the printable ones. For example, a is U+0061.
U+0004 and U+001C are both unprintable characters, meaning there's no standard character you can use to display them on the screen. That's why an escape sequence is used here.
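To see this for yourself, here is a small sketch in Python, which happens to use the same \u escape syntax in its string literals. The helper name `describe` is made up for illustration.

```python
def describe(ch: str) -> str:
    """Show a character's code point and whether it is printable."""
    return f"U+{ord(ch):04X} printable={ch.isprintable()}"


if __name__ == "__main__":
    # The escape \u0061 and the literal a are the same character.
    print("\u0061" == "a")
    # The two Hive delimiters are control characters.
    print(describe("\u0004"))
    print(describe("\u001C"))
```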
If you use a simple, printable character like , as your field delimiter, it will make the stored data easier for a human to read. The field values will be stored with a , between each one. For example, you might see the values one, two and three stored as:
one,two,three
But if you expect your field values to actually contain a ,, it would be a poor choice of field delimiter (because then you'd need a special way to tell the difference between a single field with a value of one,two or two different fields with the values one and two). The choice of delimiter depends both on whether you want to be able to read it easily, and what characters you expect the field to contain.
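The ambiguity is easy to reproduce. This sketch joins and splits a row by hand in Python, with no quoting or escaping, which is an assumption made to keep the example small; the variable names are illustrative only.

```python
COMMA = ","
CTRL = "\u0004"  # one of the unprintable delimiters from the question

# Two fields, the first of which contains a comma.
fields = ["one,two", "three"]

comma_row = COMMA.join(fields)
ctrl_row = CTRL.join(fields)

# Splitting on the comma yields three fields: the original two are lost.
comma_fields = comma_row.split(COMMA)

# Splitting on U+0004 recovers the original two fields.
ctrl_fields = ctrl_row.split(CTRL)

if __name__ == "__main__":
    print(comma_fields)
    print(ctrl_fields)
```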
Does anyone know how to include special characters like $ in a table name or, if the table already exists with a $ in its name, how to ingest it into BigQuery?
Thank you in advance,
That's not possible.
BigQuery column names must contain only letters, numbers, and underscores.
They must start with either a letter or an underscore.
Link to relevant doc: https://cloud.google.com/bigquery/docs/schemas#column_names
If a table contains special characters, you need to change its name before ingesting to BigQuery ($COLUMN_NAME -> DOLLAR_COLUMN_NAME, maybe?)
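If you want to check names before loading, the two rules quoted above fit in a single regular expression. This is a sketch of those rules as stated in this answer, not of everything in the linked documentation; the function names are made up, and the DOLLAR_ replacement is just the suggestion above.

```python
import re

# A letter or underscore first, then letters, digits, or underscores.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_column_name(name: str) -> bool:
    """Check a name against the two rules quoted in this answer."""
    return VALID_NAME.fullmatch(name) is not None


def rename_dollar(name: str) -> str:
    """Spell out each $ as DOLLAR_, as suggested above."""
    return name.replace("$", "DOLLAR_")


if __name__ == "__main__":
    for name in ["$COLUMN_NAME", "1st_column", "column_1"]:
        print(name, is_valid_column_name(name))
    print(rename_dollar("$COLUMN_NAME"))
```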
I'm trying to pull data from a column called file name, in which users are supposed to enter the file name using digits only, e.g. 245654, 346595, 700542. In a few cases, though, users have entered special characters and letters as well, e.g. 245654 / Abc, 654658-cgds, 78345|ghj. I need to extract all entries that contain such special characters or letters in addition to the digits.
You can use a regular expression, like this (the ~ operator is PostgreSQL's regex match):
SELECT *
FROM yourTable
WHERE filename ~ '[^0-9]';
The above query returns every record whose file name contains one or more non-digit characters.
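If you want to try the pattern outside the database first, here is the same check sketched in Python with the values from the question. The function name `has_non_digit` is made up for illustration.

```python
import re

# Same pattern as in the query: any single character that is not 0-9.
NON_DIGIT = re.compile(r"[^0-9]")


def has_non_digit(value: str) -> bool:
    """True if the value contains at least one non-digit character."""
    return NON_DIGIT.search(value) is not None


entries = ["245654", "346595", "700542", "245654 / Abc", "654658-cgds", "78345|ghj"]
flagged = [e for e in entries if has_non_digit(e)]

if __name__ == "__main__":
    print(flagged)
```

Note that whitespace counts as a non-digit too, so a value with a leading or trailing space is flagged as well.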
BigQuery column names (fields) can only contain English letters, numbers, and underscores.
I am using Python and I want to write a script to migrate my data from Postgres to BigQuery. The Postgres tables have many non-English column names.
I will probably need to encode the column names in some format that BigQuery accepts, but I will also need to be able to decode them back to the originals later.
What is the best way to do this?
You can encode the column names with something like base64 and replace the +, =, and / characters with some kind of placeholder.
If field length is not a concern, you can encode to base32 instead. It is about 20% longer than base64, but it doesn't use '+' or '/', and '=' is used only for padding, so you can strip it without losing information.
Alternatively, you can build a small conversion table that maps each non-English character in your language to a combination of English characters. This only works if you have a small number of non-English characters.
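A rough sketch of the base32 option using Python's standard base64 module. The `c_` prefix and the function names are my own additions, not part of any library: the prefix is there because base32 output can begin with a digit (2 to 7), while the name has to begin with a letter or underscore. Name length limits are not checked here.

```python
import base64

PREFIX = "c_"  # hypothetical prefix so the name never starts with a digit


def encode_column(name: str) -> str:
    """Encode any column name using only letters, digits, and underscores."""
    raw = base64.b32encode(name.encode("utf-8")).decode("ascii")
    # Padding carries no information, so drop it; lowercase is cosmetic.
    return PREFIX + raw.rstrip("=").lower()


def decode_column(encoded: str) -> str:
    """Reverse encode_column and return the original name."""
    body = encoded[len(PREFIX):].upper()
    # Restore the padding so the length is a multiple of 8 again.
    body += "=" * (-len(body) % 8)
    return base64.b32decode(body).decode("utf-8")


if __name__ == "__main__":
    for original in ["prénom", "שם", "total $"]:
        encoded = encode_column(original)
        print(original, encoded, decode_column(encoded) == original)
```

The encoded names are not human-readable, so keeping a side table of original and encoded names may still be worth it for anyone querying the data by hand.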
I want to create a SQL Server database that will hold thousands of tables whose names will reflect stock ticker names. For example, '0099-OL.HK' is a company's ticker name. Many of the stocks I'm creating tables for have special characters in them just like that.
I've read that special characters in table names should be avoided, but I still don't know why. SQL Server lets you use special characters in table names if you enclose the name with brackets, e.g., 'CREATE TABLE [0099-OL.HK] ...'.
Should I use the ticker names as their table names, or should I avoid using their special characters?
This will lead to no end of problems. SQL Server allows names with spaces and special characters because people migrate from databases that allow them. If you must do this, replace all special characters with _, like so: TN0099_OL_HK (TN for ticker name), so users can write SQL without the brackets.
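If you generate the table names from a script, the replacement described above could look something like this in Python. The function names are made up for illustration. The collision check is an extra precaution I've added, since two different tickers can map to the same table name once their special characters become underscores.

```python
import re


def ticker_to_table(ticker: str, prefix: str = "TN") -> str:
    """Prefix the ticker and replace every special character with _."""
    return prefix + re.sub(r"[^0-9A-Za-z]", "_", ticker)


def build_table_names(tickers: list[str]) -> dict[str, str]:
    """Map ticker -> table name, refusing tickers that would collide."""
    seen: dict[str, str] = {}
    result: dict[str, str] = {}
    for ticker in tickers:
        table = ticker_to_table(ticker)
        if table in seen:
            raise ValueError(f"{ticker} and {seen[table]} both map to {table}")
        seen[table] = ticker
        result[ticker] = table
    return result


if __name__ == "__main__":
    print(ticker_to_table("0099-OL.HK"))
```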
It is bad practice to do so, since not every library might be able to process the table name correctly.
Avoid using special characters, spaces, and leading numbers in database names, table names, and column names.
For the full rules for regular identifiers, see "Database Identifiers" in the SQL Server docs.