Parsing Escape character with delimiter in csv into same field in bigquery - google-bigquery

I have the following text in a csv file with ',' as the delimiter:
Vliesbehang_0\,52&1\,04,103
I want the output below:
Vliesbehang_0\,52&1\,04 | 103
but when I do the bq load it ignores the escape character, and the output I get is
Vliesbehang_0\ | 52&1\ | 04 | 103

I think you should replace the last delimiter with another symbol, such as a semicolon (;) or a tab (\t). After that, you can use the --field_delimiter flag to specify the new delimiter.
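A minimal pre-processing sketch of that idea, in Python. The filenames and the choice of tab as the new delimiter are assumptions; the point is to split each line only on its last comma, so the escaped commas inside the first field are left alone, and then load with something like `bq load --field_delimiter=tab`.

```python
# Replace only the LAST comma on each line with a tab, so the
# escaped "\," sequences inside the first field are untouched.
def retab_last_delimiter(line: str) -> str:
    # rpartition splits on the last occurrence of ','
    head, _, tail = line.rstrip("\n").rpartition(",")
    return head + "\t" + tail

# Hypothetical filenames -- adapt to your own pipeline:
# with open("in.csv") as src, open("out.tsv", "w") as dst:
#     for line in src:
#         dst.write(retab_last_delimiter(line) + "\n")
```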

Related

How to add delimiter to String after every n character using hive functions?

I have a hive table column value as below:
"112312452343"
I want to add a delimiter such as ":" (i.e., a colon) after every 2 characters.
I would like the output to be:
11:23:12:45:23:43
Is there any hive string manipulation function available to achieve the above output?
For fixed length this will work fine:
select regexp_replace(str, "(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})","$1:$2:$3:$4:$5:$6")
from
(select "112312452343" as str)s
Result:
11:23:12:45:23:43
Another solution, which works for strings of dynamic length. Split the string at every position preceded by the end of the previous match (\\G) followed by two digits (\\d{2}), i.e. the lookbehind (?<=\\G\\d{2}); concatenate the array with ':' and remove the trailing delimiter (:$):
select regexp_replace(concat_ws(':',split(str,'(?<=\\G\\d{2})')),':$','')
from
(select "112312452343" as str)s
Result:
11:23:12:45:23:43
If it can contain not only digits, use dot (.) instead of \\d:
regexp_replace(concat_ws(':',split(str,'(?<=\\G..)')),':$','')
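As a cross-check of the dynamic-length idea: Python's re module does not support \\G, so an equivalent illustration is to take every run of two characters and join them with ':'. This is only a sketch of the same result (it assumes an even-length string), not the Hive query itself.

```python
import re

# Grab every consecutive pair of characters, then join with ':'.
# Assumes the input length is even; a trailing odd character would
# be dropped by findall("..").
def pair_join(s: str) -> str:
    return ":".join(re.findall("..", s))
```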
This is actually quite simple if you're familiar with regex & lookahead.
Replace every 2 characters that are followed by another character, with themselves + ':'
select regexp_replace('112312452343','..(?=.)','$0:')
+-------------------+
| _c0 |
+-------------------+
| 11:23:12:45:23:43 |
+-------------------+
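The lookahead replacement above translates directly to Python's re module, which can be handy for testing the pattern outside Hive (illustration only; `\g<0>` is Python's spelling of Hive's `$0`):

```python
import re

# Every 2 characters that are followed by at least one more
# character get a ':' appended; the final pair is left alone.
result = re.sub(r"..(?=.)", r"\g<0>:", "112312452343")
print(result)  # 11:23:12:45:23:43
```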

Insert comma after every 7th character using regex and hive sql

I need to insert a comma after every 7th character using regex in hive sql, and make sure every 7th character is correctly followed by a comma.
The space should also be ignored when selecting the 7th character.
Sample Input Data:
12F123f, 123asfH 0DB68ZZ, AG12453
112312f, 1212sfH 0DB68ZZ, AQ13463
Output:
12F123f,123asfH,0DB68ZZ,AG12453
112312f,1212sfH,0DB68ZZ,AQ13463
I tried the code below, but it didn't insert the commas correctly.
select regexp_replace('12345 12456,12345 123', '(/(.{5})/g,"$1$")','')
I think you can use
select regexp_replace('12345 12456,12345 123', '(?!^)[\\s,]+([^\\s,]+)', ',$1')
See the regex demo
Details
(?!^) - no match if at string start
[\s,]+ - 1 or more whitespaces or commas
([^\s,]+) - Capturing group 1: one or more chars other than whitespaces and commas.
The ,$1 replacement replaces the match with a comma and the value in Group 1.
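The same pattern can be checked with Python's re module on the sample input (illustration only; the logic is identical to the Hive call, with doubled backslashes removed because Python raw strings don't need them):

```python
import re

# Replace each run of whitespace/commas (not at string start) and
# the token after it with ',' + that token.
s = "12F123f, 123asfH 0DB68ZZ, AG12453"
out = re.sub(r"(?!^)[\s,]+([^\s,]+)", r",\1", s)
print(out)  # 12F123f,123asfH,0DB68ZZ,AG12453
```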
You just want to replace the space character with ,, am I right? The SQL is as below:
select regexp_replace('12F123f,123asfH 0DB68ZZ,AG12453',' ',',') as result;
+----------------------------------+--+
| result |
+----------------------------------+--+
| 12F123f,123asfH,0DB68ZZ,AG12453 |
+----------------------------------+--+

read a csv file with comma as delimiter and escaping quotes in psql

I want to read a csv file which is separated by comma (,) but want to ignore comma within the double quotes (""). I want to store the result into a table.
Example:
abc,00.000.00.00,00:00:00:00:00:00,Sun Nov 01 00:00:00 EST 0000,Sun Nov 01 00:00:00 EST 0000,"Apple, Inc.",abcd-0000abc-a,abcd-abcd-a0000-00
Here I don't want to split on the comma inside "Apple, Inc.".
I know Python has a csv reader and I could use it in plpython, but that's slow considering there are millions of such strings! I would like a pure psql method!
Here is an example of reading a CSV file with an External Table using the CSV format.
CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
'gpfdist://etlhost-2:8082/*.txt')
FORMAT 'CSV' ( DELIMITER ',' )
LOG ERRORS SEGMENT REJECT LIMIT 5;
This was taken from the Greenplum docs:
http://gpdb.docs.pivotal.io/530/admin_guide/external/g-example-4-single-gpfdist-instance-with-error-logging.html
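The FORMAT 'CSV' option works because standard CSV parsing treats a comma inside a double-quoted field as data, not as a delimiter. A quick sanity check of that behaviour with Python's csv module on the sample line (illustration only, not part of the psql solution):

```python
import csv

# Parse the sample row; the quoted "Apple, Inc." should stay one field.
row = next(csv.reader([
    'abc,00.000.00.00,00:00:00:00:00:00,'
    'Sun Nov 01 00:00:00 EST 0000,Sun Nov 01 00:00:00 EST 0000,'
    '"Apple, Inc.",abcd-0000abc-a,abcd-abcd-a0000-00'
]))
print(row[5])  # Apple, Inc.
```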

PostgreSQL COPY FROM csv - csv formating issues

I have a csv file that I'm trying to import into my PostgreSQL database (v.10). I'm using the following basic SQL syntax:
COPY table (col_1, col_2, col_3)
FROM '/filename.csv'
DELIMITER ',' CSV HEADER
QUOTE '"'
ESCAPE '\';
The first 30,000 lines or so are imported without any problem, but then I start bumping into formatting issues in the csv file that break the import:
Double quotes in double quotes: "value_1",""value_2"","value_3" or "value_1","val"ue_2","value_3"
The typical error I get is
ERROR: extra data after last expected column
So I started editing the csv file manually using Vim (the csv file has close to 7 million lines, so I can't really think of another desktop tool to use).
Is there anything I can do with my SQL syntax to handle those malformed strings? Using alternative ESCAPE clauses? Using regex?
Can you think of a way to handle those formatting issues in Vim or using another tool or function?
Thanks a lot!
Note that the file does not meet the CSV specification:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote.
You should specify a quote sign other than double-quote, for example '|':
create table test(a text, b text, c text);
copy test from '/data/example.csv' (format csv, quote '|');
select * from test;
a | b | c
-----------+-------------+-----------
"value_1" | ""value_2"" | "value_3"
"value_1" | "val"ue_2" | "value_3"
(2 rows)
You can get rid of the unwanted double-quotes using the trim() or replace() functions, e.g.:
update test
set a = trim(a, '"'), b = trim(b, '"'), c = trim(c, '"');
select * from test;
a | b | c
---------+----------+---------
value_1 | value_2 | value_3
value_1 | val"ue_2 | value_3
(2 rows)
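Note that trim(col, '"') strips every leading and trailing double quote but leaves embedded quotes alone, which is why val"ue_2 keeps its inner quote in the result above. Python's str.strip behaves the same way, shown here purely as an illustration of that semantics:

```python
# strip('"') removes all leading/trailing '"' characters,
# but never touches quotes in the middle of the string.
outer = '""value_2""'.strip('"')  # -> value_2
inner = '"val"ue_2"'.strip('"')   # -> val"ue_2
```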

How to add leading spaces in oracle?

I want to add leading spaces in one of the columns of the table. The ID column has data type CHAR(6).
Example: Table1
ID
1234
5678
When I do select * from Table1 and save the file as .csv with pipe delimiter,
it shows spaces at the end of the number.
Current output:
|1234 |
|5678 |
Desired output:
| 1234|
| 5678|
You'd need to trim the value to remove the trailing spaces and then lpad it to add the leading spaces:
select lpad(trim(id),6)
from your_table
Here is a sqlfiddle example that shows the steps
Try:
select LPAD(trim(id), 6) from table
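The trim-then-pad step can be sketched in Python for illustration: strip the trailing spaces that CHAR(6) storage adds, then left-pad back to the column width of 6.

```python
# Equivalent of lpad(trim(id), 6): drop surrounding spaces,
# then right-justify to the CHAR(6) width.
def lpad6(id_value: str) -> str:
    return id_value.strip().rjust(6)

print("|" + lpad6("1234  ") + "|")  # |  1234|
```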