sqoop importing string column of a dataset containing "," in it - sql

The dataset I am importing contains string columns with "," in them.
When I try to import, those string values get split into separate fields.
Here is my sqoop script:
sqoop import --connect 'jdbc:sqlserver://XXX.XX.XX.XX:51260;database=Common' -username=BIG_DATA -P --table Carriers --hive-import --hive-table common.Carriers --hive-drop-import-delims --optionally-enclosed-by '\"' --map-column-hive UpdatedDate=string,ResourceID=string --lines-terminated-by '\n' -- --schema Truck -m 10
The sqoop command works fine for integer-type columns, but it splits the string columns because they contain "," (comma) within the string. Is there any way to escape the comma while parsing strings that contain it?

Adding --fields-terminated-by '^' to the sqoop import solved a similar problem of mine.
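For example, a sketch of the question's command with that flag (assuming '^' never occurs in the data; an already existing Hive table would need a matching FIELDS TERMINATED BY '^'):
sqoop import --connect 'jdbc:sqlserver://XXX.XX.XX.XX:51260;database=Common' \
    -username=BIG_DATA -P --table Carriers \
    --hive-import --hive-table common.Carriers \
    --hive-drop-import-delims \
    --fields-terminated-by '^' \
    --lines-terminated-by '\n' -- --schema Truck -m 10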

This should work:
$ sqoop import --fields-terminated-by , --escaped-by \\ --enclosed-by '\"' ...
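With --enclosed-by every field is wrapped in the quote character (use --optionally-enclosed-by to quote only fields that actually contain the delimiter), so a made-up record with an embedded comma would land as:
"Smith, John","42","TX"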

Related

impala shell output csv file generation with enclosure

I am new to Impala. I am trying to fetch data from a table and load it into a csv file, but I want to enclose the data in double quotes because there is a conflict with the delimiter. How can I enclose each field in double quotes?
query = "select t1max,t2max,rest_call from topo_tax"
result_string = 'env -i /usr/bin/impala-shell -i "' + impalad + ':' + port + '" -u "' + user + '" -d "' + database + '" -B --delimited -q "' + query + '" -o /tmp/data_dump.csv --print_header --output_delimiter=,'
As a workaround, you might pass the delimiter explicitly as ",".
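If the fields really must be wrapped in double quotes, impala-shell itself offers no enclosure flag, so one sketch is to add the quoting in the query (assuming t1max and t2max are numeric, hence the casts; concat() takes string arguments):
query = """select concat('"', cast(t1max as string), '"'),
       concat('"', cast(t2max as string), '"'),
       concat('"', cast(rest_call as string), '"')
from topo_tax"""
Note that when this query is spliced into the -q "..." argument above, the embedded double quotes will need escaping for the shell.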

sqoop is not working as expected: NULL String from source is populated as \\N

I am trying to use --null-string while pulling data from SQL Server.
But for null values in SQL Server, I am getting '\\N' when importing to HDFS, and the Hive table is populated with '\\N' as well.
I use:
sqoop import --connect ABCD --username A \
--password B#123 \
--escaped-by \\ \
--null-string '\\N' \
--target-dir /user/anu/dear
But in HDFS I am getting:
Anu,'\\N',Boledion,12638
Expected Results :
Anu,\N,Boledion,12638
NB: I tried using '\N' and got a Sqoop Java error saying '\N' cannot be processed / is a wrong escape.
I even tried '\\\\N' (four slashes), as mentioned in a Cloudera Q&A.
Any help getting this solved is highly appreciated.
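One thing worth ruling out before blaming Sqoop is shell quoting. In bash, single quotes pass backslashes through untouched, so '\\N' reaches Sqoop as two backslashes plus N and '\N' as one; a quick check:
$ echo '\\N'
\\N
$ echo '\N'
\N
Given that, one possibility worth testing (an assumption, not a confirmed fix) is that --escaped-by \\ is also escaping the substituted null string, doubling the backslash in the output.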

Multiple character delimiter using apache sqoop import

I am importing data from Teradata (an RDBMS) to Hive using Apache Sqoop. The usual delimiters used for import, like ",", "|", and "~", are present in the table data. Is there a way to use multiple characters as a delimiter in Apache Sqoop?
To avoid the clashes, I have used the --escaped-by "\t" and --fields-terminated-by "," parameters in the sqoop import command. So is there a way to 'unescape' the "\t" I used in the sqoop import?
I use the '\b' delimiter whenever I get challenging tables with large data fields containing text that might have TAB and CR/LF characters. '\b' is BACKSPACE, which is very difficult to insert into a character field in most databases.
Here is an example of the sqoop command I use:
sqoop import \
--connect "jdbc:sqlserver://myserver;DatabaseName=MyDB;user=MyUser;password=MyPassword;port=1433" \
--warehouse-dir=/user/MyUser/Import/MyDB \
--fields-terminated-by '\b' --num-mappers 8 \
--table training_deficiency \
--hive-table stage.training_deficiency \
--hive-import --hive-overwrite \
--hive-delims-replacement '<newline>' \
--split-by Training_Deficiency_ID \
--outdir /home/MyUser/sqoop/java \
--where "batch_update_dt > '2016-12-09 23:06:44.69'"

sqoop import into hive

1st command:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table departments \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table sqoop_import.departments \
--outdir java_files
2nd command:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table departments \
--target-dir=/user/hive/warehouse/department_test \
--append
In both commands we create the table in Hive without specifying field and line delimiters and import using sqoop, so why do we get NULLs in the second case but not in the first?
Hive's default delimiters:
Field: Ctrl+A (\001)
Line: \n
Case 1: HIVE IMPORT
Imports the table into Hive (using Hive's default delimiters if none are set).
It also creates the table named in --hive-table (if it does not exist) with Hive's default delimiters.
Case 2: HDFS IMPORT
Here the data from the RDBMS is stored with , as the field delimiter and \n as the line delimiter (Sqoop's defaults), which are not Hive's default delimiters. That is why you are getting NULL entries in your data.
You can solve it in one of two ways:
change your Hive table's field delimiter, or
use --fields-terminated-by in your import command, as in the sketch below.
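A sketch of the second option, reusing the second command from the question: '\001' (Ctrl+A) is Hive's default field delimiter, so the files Sqoop writes line up with what the Hive table expects:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table departments \
--target-dir=/user/hive/warehouse/department_test \
--fields-terminated-by '\001' \
--append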

Passing parameter in sqoop

Below is my sqoop command in a shell script:
sqoop import --connect 'jdbc:sqlserver://190.148.155.91:1433;username=****;password=****;database=Testdb' --query 'Select DimFreqCellRelationID,OSSC_RC, MeContext, ENodeBFunction,EUtranCellFDD,EUtranFreqRelation, EUtranCellRelation FROM dbo.DimCellRelation WHERE DimFreqCellRelationID > $maxval and $CONDITIONS' --split-by OSS --target-dir /testval;
Before executing this command, I have assigned a value to $maxval. When I execute the sqoop command, that value should be substituted in place of $maxval, but that's not happening. Is it possible to pass a parameter through sqoop like this? Please let me know if you have any suggestions for achieving this.
I believe the problem you are seeing is incorrect quoting. Using single quotes (') prevents bash from performing any substitutions. You need to use double quotes (") if you want to use variables inside the parameter. However, you also have to be careful, because you do not want to substitute the $CONDITIONS placeholder. Try it without Sqoop:
jarcec@Odie ~ % echo '$maxval and $CONDITIONS'
$maxval and $CONDITIONS
jarcec@Odie ~ % echo "$maxval and $CONDITIONS"
and
jarcec@Odie ~ % export maxval=30
jarcec@Odie ~ % echo "$maxval and $CONDITIONS"
30 and
jarcec@Odie ~ % echo "$maxval and \$CONDITIONS"
30 and $CONDITIONS
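Applied to the command from the question, that means double-quoting the --query argument and escaping only the placeholder, so bash substitutes $maxval but leaves $CONDITIONS for Sqoop (a sketch; the credentials stay elided as in the question):
sqoop import --connect 'jdbc:sqlserver://190.148.155.91:1433;username=****;password=****;database=Testdb' --query "Select DimFreqCellRelationID, OSSC_RC, MeContext, ENodeBFunction, EUtranCellFDD, EUtranFreqRelation, EUtranCellRelation FROM dbo.DimCellRelation WHERE DimFreqCellRelationID > $maxval and \$CONDITIONS" --split-by OSS --target-dir /testval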