DBT RUN - Getting a Database Error using VS Code but not using DBT Cloud

I'm using DBT connected to Snowflake. I use DBT Cloud, but we are moving to VS Code for our DBT project work.
I have an incremental DBT model that compiles and runs without error when I issue the DBT RUN command in DBT Cloud. Yet when I run the exact same model from the same git branch using the DBT RUN command in the VS Code terminal, I get the following error:
Database Error in model dim_cifs (models\core_data_warehouse\dim_cifs.sql)
16:45:31 040050 (22000): SQL compilation error: cannot change column LOAN_MGMT_SYS from type VARCHAR(7) to VARCHAR(3) because reducing the byte-length of a varchar is not supported.
The table in Snowflake defines this column as VARCHAR(50). I have no idea why DBT is attempting to change the data length, or why it only happens when the command is run from the VS Code terminal. There is no need to make this DDL change to the table.
When I view the compiled SQL in the target folder, there is nothing that indicates a DDL change.
When I look in the logs I find the following, but I don't understand what is triggering the DDL change:
describe table "DEVELOPMENT_DW"."DBT_XXXXXXXX"."DIM_CIFS"
16:45:31.354314 [debug] [Thread-9 (]: SQL status: SUCCESS 36 in 0.09 seconds
16:45:31.378864 [debug] [Thread-9 (]:
In "DEVELOPMENT_DW"."DBT_XXXXXXXX"."DIM_CIFS":
Schema changed: True
Source columns not in target: []
Target columns not in source: []
New column types: [{'column_name': 'LOAN_MGMT_SYS', 'new_type': 'character varying(3)'}]
16:45:31.391828 [debug] [Thread-9 (]: Using snowflake connection "model.xxxxxxxxxx.dim_cifs"
16:45:31.391828 [debug] [Thread-9 (]: On model.xxxxxxxxxx.dim_cifs: /* {"app": "dbt", "dbt_version": "1.1.1", "profile_name": "xxxxxxxxxx", "target_name": "dev", "node_id": "model.xxxxxxxxxx.dim_cifs"} */
alter table "DEVELOPMENT_DW"."DBT_XXXXXXXX"."DIM_CIFS" alter "LOAN_MGMT_SYS" set data type character varying(3);
16:45:31.546962 [debug] [Thread-9 (]: Snowflake adapter: Snowflake query id: 01a5bc8d-0404-c9c1-0000-91b5178ac72a
16:45:31.548895 [debug] [Thread-9 (]: Snowflake adapter: Snowflake error: 040050 (22000): SQL compilation error: cannot change column LOAN_MGMT_SYS from type VARCHAR(7) to VARCHAR(3) because reducing the byte-length of a varchar is not supported.
Any help is greatly appreciated.
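For context, the schema comparison in the log (Schema changed: True, New column types: ...) appears to be dbt's incremental on_schema_change handling: when that config is set to something other than the default ignore, dbt compares the existing table with the columns produced by the model's SELECT, and with sync_all_columns it issues ALTER ... SET DATA TYPE statements; Snowflake can infer a narrow VARCHAR (here VARCHAR(3)) for an expression built only from short string literals. Below is a minimal sketch, not the actual project code, of pinning the width with an explicit cast so the inferred type cannot shrink; the config values, column list, and ref() target are assumptions.

-- models/core_data_warehouse/dim_cifs.sql (sketch; ref() target and columns are hypothetical)
{{ config(
    materialized='incremental',
    on_schema_change='ignore'  -- the default; 'sync_all_columns' lets dbt alter column types to match
) }}

select
    -- cast explicitly so the model's output type stays VARCHAR(50) regardless of the source values
    cast(loan_mgmt_sys as varchar(50)) as loan_mgmt_sys
from {{ ref('stg_cifs') }}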

Related

Delta Table : org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'FROM'

I am trying to run this query on EMR / EMR Notebooks (Spark with Scala):
SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.`s3://a/b/c/d`)
But I am getting the ParseException shown in the title (mismatched input 'FROM'). The same query works fine on Databricks.
Another doubt I have: why does the colour of the s3 location change after the //?
So I tried to break the above query apart and only run the DESCRIBE HISTORY part, and for some reason it says:
Error Log -
An error was encountered:
org.apache.spark.sql.AnalysisException: Table or view not found: HISTORY;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:835)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:787)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:817)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:810)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:810)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:756)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:88)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:80)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:164)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withLocalMetrics(Analyzer.scala:104)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:126)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:125)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:125)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:630)
at org.apache.spark.sql.execution.command.DescribeColumnCommand.run(tables.scala:714)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3391)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3390)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:196)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:81)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:644)
... 50 elided
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'history' not found in database 'default';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84)
at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:141)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:98)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:722)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:706)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:832)
UPDATE (18-Feb-2021): What I have tried so far.
Query using Spark SQL:
spark.sql("SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.s3://a/b/c/d)")
But this didn't work; same error.
Create the Spark session with:
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
and spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog.
But it's throwing the same error.
UPDATE 2 (18-Feb-2021): Trying the approach mentioned by @alex, using PySpark. It was partly working, but not completely.
Thanks in Advance.
Per the documentation, to get support for DESCRIBE HISTORY you need to configure the Spark SQL extensions and catalog by passing two properties (see the docs):
spark.sql.extensions set to io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog set to org.apache.spark.sql.delta.catalog.DeltaCatalog
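As an illustration, here is a minimal PySpark sketch (assuming Spark 3.x with a matching Delta Lake release on the classpath; the app name is made up and the s3 path is the placeholder from the question) of passing the two properties when the session is built:

from pyspark.sql import SparkSession

# Build a session with the Delta extension and catalog so DESCRIBE HISTORY is parsed.
spark = (
    SparkSession.builder
    .appName("delta-history-check")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

spark.sql(
    "SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.`s3://a/b/c/d`)"
).show()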
Update:
For Spark 2.4.x, Delta 0.6.1 should be used, and its documentation has the following code snippet to activate the extensions:
spark.sparkContext._jvm.io.delta.sql.DeltaSparkSessionExtension() \
    .apply(spark._jsparkSession.extensions())
spark = SparkSession(spark.sparkContext, spark._jsparkSession.cloneSession())

Create table name using username in Hive query running in Oozie workflow?

I've got a Hive SQL script/action as part of an Oozie workflow. I'm doing a CREATE TABLE AS SELECT to output the results. I want to name the table using the username plus an appended string (e.g. "User123456_output_table"), but can't seem to get the correct syntax.
set tablename=${hivevar:current_user()};
CREATE TABLE `${hiveconf:tablename}_output_table` AS SELECT ...
That doesn't work and gives:
Error while compiling statement: FAILED: IllegalArgumentException java.net.URISyntaxException: Relative path in absolute URI: ${hivevar:current_user()%7D_output_table
Or changing the first line to set tablename=${current_user()}; starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [${current_user()}_output_table]: is not a valid table name
Or changing the first line to set tablename=current_user(); starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [current_user()_output_table]: is not a valid table name
Alternatively, is there a way to pass the username from the Oozie workflow via a parameter?
I'm using Hue to do all this rather than the command line.
Thanks
This is wrong: set tablename=${hivevar:current_user()}; it will not be resolved, and it is substituted as-is.
Hive does not evaluate variables before substitution; it substitutes them as-is, and functions inside variables are NOT calculated. Variables are just text replacement.
This:
set tablename=current_user();
CREATE TABLE `${hiveconf:tablename}_output_table` ...
gets resolved as
CREATE TABLE `current_user()_output_table` ...
Functions are not supported in table names, so it will not work this way.
The solution is to calculate functions outside the script and pass them as parameters.
See this blog: https://prodlife.wordpress.com/2013/12/06/parameterizing-hive-actions-in-oozie-workflows/
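As an illustration of that approach (a hedged sketch; the JDBC URL, script name, and whoami-based value are hypothetical, and in an Oozie workflow the value would typically come from a shell action or a workflow parameter such as ${wf:user()} wired into the Hive action), the name is computed outside Hive and passed in as a variable:

# Compute the table name outside of Hive, then hand it to the script as a Hive variable.
tablename="$(whoami)_output_table"

# create_output_table.hql would contain:
#   CREATE TABLE `${hivevar:tablename}` AS SELECT ... ;
beeline -u "jdbc:hive2://hiveserver:10000/default" \
        --hivevar tablename="$tablename" \
        -f create_output_table.hql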

TDI for HCL Connections 6.5 synchronization fails with "bad SQL grammar [];" error

I'm using Tivoli Directory Integrator (TDI) to sync users from Domino LDAP to the local DB2 people database of HCL Connections. On a test installation, I got the following error when trying to initially sync the users:
[root@cnx65 tdisol]# LANG=en_US.utf8 ./sync_all_dns.sh
create synchronization lock
log4j:WARN No appenders could be found for logger (server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
**********
CLFRN1275I: Begin to hash records in database.
CLFRN1269I: Finish hash records in database.
**********
"message": "CLFRN1254E: An error occurred while performing findEntry: {0}."
"exception": "com.ibm.lconn.profiles.api.tdi.service.TDIException: CLFRN1254E: An error occurred while performing findEntry: {0}."
Synchronize of Database Repository failed
HCL's documentation recommends checking the logs in case of CLFRN1254E. The file logs/SyncUpdates.log contains the following exception:
2020-01-21 07:50:03,803 INFO [org.apache.log4j.DailyRollingFileAppender.7431103d-4d0a-4d63-bdb7-61e274f23ed4] - CTGDIS092I Use entry provided at runtime as work entry (first pass only).
2020-01-21 07:50:11,723 ERROR [org.apache.log4j.DailyRollingFileAppender.7431103d-4d0a-4d63-bdb7-61e274f23ed4] - [hash_db_entries] CTGDIS181E Error while evaluating the hook 'Function error' in the component 'hash_db_entries (hash_db_entries.functioncall_fail).
com.ibm.lconn.profiles.api.tdi.service.TDIException: CLFRN1254E: An error occurred while executing findEntry: {0}.
at com.ibm.lconn.profiles.api.tdi.connectors.ProfileConnector$ProfileCodeBlock.handleRecoverable(ProfileConnector.java:1063)
at com.ibm.lconn.profiles.api.tdi.connectors.Util.TDICodeRunner.run(TDICodeRunner.java:41)
at com.ibm.lconn.profiles.api.tdi.connectors.ProfileConnector.getNextEntry(ProfileConnector.java:155)
at com.ibm.di.server.AssemblyLineComponent.executeOperation(AssemblyLineComponent.java:3370)
at com.ibm.di.server.AssemblyLineComponent.getnext(AssemblyLineComponent.java:932)
at com.ibm.di.server.AssemblyLine.msGetNextIteratorEntry(AssemblyLine.java:3689)
at com.ibm.di.server.AssemblyLine.executeMainStep(AssemblyLine.java:3388)
at com.ibm.di.server.AssemblyLine.executeMainLoop(AssemblyLine.java:3000)
at com.ibm.di.server.AssemblyLine.executeMainLoop(AssemblyLine.java:2983)
at com.ibm.di.server.AssemblyLine.executeAL(AssemblyLine.java:2952)
at com.ibm.di.server.AssemblyLine.run(AssemblyLine.java:1319)
Caused by: org.springframework.jdbc.BadSqlGrammarException: SqlMapClient operation; bad SQL grammar []; nested exception is com.ibatis.common.jdbc.exception.NestedSQLException:
--- The error occurred while applying a parameter map.
--- Check the TDIProfile.get-InlineParameterMap.
--- Check the statement (query failed).
--- Cause: com.ibm.db2.jcc.c.SqlException: DB2 SQL error: SQLCODE: -551, SQLSTATE: 42501, SQLERRMC: LCUSER;SELECT;EMPINST.EMPLOYEE
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:97)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:212)
at org.springframework.orm.ibatis.SqlMapClientTemplate.executeWithListResult(SqlMapClientTemplate.java:249)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryForList(SqlMapClientTemplate.java:296)
at com.ibm.lconn.profiles.internal.service.store.sqlmapdao.TDIProfileSqlMapDao.get(TDIProfileSqlMapDao.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88)
What could be the problem? How can I find out more about why this error occurs?
What I already tried
Increase log level
In profiles_tdi.properties I enabled debug logs for every component:
debug_collect=true
debug_draft=true
debug_fill_codes=true
debug_managers=true
debug_photos=true
debug_pronounce=true
debug_special=true
debug_update_profile=true
trace_profile_tdi_javascript=on
Since this had no effect, I set the log4j level to debug for the entire application in etc/log4j.properties:
log3j.rootCategory=DEBUG, Default
I also tried ALL instead of DEBUG. However, there is no change in the output. I expected to see the SQL query that caused the exception.
Set mode in properties
According to this post, the mode attribute is used to decide whether a user is internal or external. Since the example config says
Actually, any string other than "external" is interpreted as employee.
it is set to mode=memberType. I also tried mode=uid and mode=mail. Both are fields containing a string not equal to "external", so this should result in all members being imported as internal users.
Sync single users
Since my LDAP filter applies to around 60 users, I ran ./collect_dns.sh successfully and removed all users from the collect.dns file except my own. Then I synced the user from the dn file with ./populate_from_dn_file.sh. I did this for two other users as well, always with the same error:
CLFRN0027I: After operation, success records is 0, duplicate records 0, failure records is 1, and last successful entry is null.
CLFRN1280I: 20200121105123 Iterations total number: 1.
The only difference is that logs/PopulateDBFromDNFile.log contains more detailed information about the fetched attributes, mappings and so on. Unfortunately, it doesn't really help me with the error, since it produces a similar message:
2020-01-21 10:55:27,530 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [add_manager_data] [setup_if_lookup] CTGDIS126I Return false.
2020-01-21 10:55:27,530 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [add_manager_data] [setup_if_lookup] CTGDIS123I Returned object class java.lang.Boolean.
2020-01-21 10:55:27,530 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [add_manager_data] CTGDIS075I Trying to exit TaskCallBlock.
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [add_manager_data] CTGDIS076I Succeeded exiting TaskCallBlock.
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [add_manager_data] CTGDIS057I Hook after_functioncall not enabled.
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] CTGDIS352I Use null Behavior for $_already_lookup_manager.
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] CTGDIS351I Map Attribute $manager_uid [1].
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] CTGDIS353I Script is: conn["$manager_uid"]
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] CTGDIS352I Use null Behavior for $manager_uid.
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [add_manager_data] CTGDIS057I Hook functioncall_ok not enabled.
2020-01-21 10:55:27,531 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [add_manager_data] CTGDIS057I Hook default_ok not enabled.
2020-01-21 10:55:27,538 INFO [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] Result: <My Name of the User in dn file>
2020-01-21 10:55:27,591 ERROR [com.ibm.di.log.FileRollerAppender.268b5e1d-d0fc-4a7c-9e12-4d742c44faa5] - [callSyncDB_mod] [ProfileConnector] SqlMapClient operation; bad SQL grammar []; nested exception is com.ibatis.common.jdbc.exception.NestedSQLException:
--- The error occurred while applying a parameter map.
--- Check the TDIProfile.get-InlineParameterMap.
--- Check the statement (query failed).
--- Cause: com.ibm.db2.jcc.c.SqlException: DB2 SQL error: SQLCODE: -551, SQLSTATE: 42501, SQLERRMC: LCUSER;SELECT;EMPINST.EMPLOYEE
It turned out that this was an unlucky logical mistake on my part. The databases are created using SQL files shipped with the Connections installation wizard, which I import automatically in a loop. Since this was very slow (about 30 minutes for all scripts), I tried to parallelize it by adding a & at the end of the command and a final wait to make sure all scripts had been executed.
- name: Check and create non existing DBs for CNX
  become: yes
  become_user: "{{ db2.instance.name }}"
  shell: |
    db={{ item.name }}
    scripts=({{ item.files | join(' ') }})
    existing_dbs=$(echo -e '{{ existing_dbs.stdout }}')
    echo "Check db ${db}"
    if ! echo ${existing_dbs} | grep -q ${db}; then
      echo "DB ${db} doesn't exist, execute scripts"
      for script in "${scripts[@]}"
      do
        echo "${db}: Execute script ${script}"
        {{ db2.target }}/bin/db2 -td# -f {{ cnx_sql_dir }}/${script} &
      done
      wait
    fi
  register: db_check
  changed_when: "'execute scripts' in db_check.stdout"
  loop: "{{ cnx.db_scripts }}"
cnx.db_scripts is a mapping of database names to SQL files:
db_scripts:
  - name: PEOPLEDB
    files:
      - profiles/db2/createDb.sql
      - profiles/db2/appGrants.sql
  - name: FORUM
    files:
      # - ...
In retrospect, this was a terrible logical mistake, because I missed the fact that those scripts rely on each other: when profiles/db2/appGrants.sql is executed before profiles/db2/createDb.sql has finished, it fails because the database doesn't exist yet.
As a result, TDI's queries failed because the database and tables were only partly created. I didn't notice this immediately, since the machine was re-deployed several times during development of the Ansible playbook. Strangely, TDI only failed in 2 of 10 deployments. It seems DB2 queues the statements to some degree, and depending on the timing the people database required for TDI is created successfully on some runs.
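For illustration, a minimal sketch of the corrected inner loop (the same shell block as above, with only the backgrounding removed so each script, in particular appGrants.sql, starts only after the previous one has completed):

for script in "${scripts[@]}"
do
  echo "${db}: Execute script ${script}"
  # run synchronously: no trailing '&', so createDb.sql finishes before appGrants.sql starts
  {{ db2.target }}/bin/db2 -td# -f {{ cnx_sql_dir }}/${script}
done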

How to fix org.apache.kafka.common.config.ConfigException: Missing required configuration "group.id" which has no default value

I have a Kafka topic set up and am attempting to create an external table in Hive to query the Kafka stream.
However, when querying the external table I get the error message
Error: java.io.IOException: org.apache.kafka.common.config.ConfigException: Missing required configuration "group.id" which has no default value. (state=,code=0)
I tried putting group.id in server.properties when starting the Kafka server.
I tried putting group.id in the external table properties:
CREATE EXTERNAL TABLE kafka_table2
(`timestamp` timestamp , `page` string, `newPage` boolean,
added int, deleted bigint, delta double)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "connect-test", "kafka.bootstrap.servers"="mykafka:9092","kafka.group.id"="1")
INFO : Completed compiling command(queryId=hive_20190426082255_729f8adb-bb23-4317-8f3f-2f9049b62bd7); Time taken: 0.6 seconds
INFO : Executing command(queryId=hive_20190426082255_729f8adb-bb23-4317-8f3f-2f9049b62bd7): select * from kafka_table2
INFO : Completed executing command(queryId=hive_20190426082255_729f8adb-bb23-4317-8f3f-2f9049b62bd7); Time taken: 0.018 seconds
INFO : OK
Error: java.io.IOException: org.apache.kafka.common.config.ConfigException: Missing required configuration "group.id" which has no default value. (state=,code=0)
You should put "kafka.consumer.group.id"="1" and not "kafka.group.id"="1" in TBLPROPERTIES.
See: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_set_consumer_producer.html
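For illustration, here is the table definition from the question with only that property renamed (topic and broker values copied from the question):

CREATE EXTERNAL TABLE kafka_table2
  (`timestamp` timestamp, `page` string, `newPage` boolean,
   added int, deleted bigint, delta double)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "connect-test",
  "kafka.bootstrap.servers" = "mykafka:9092",
  "kafka.consumer.group.id" = "1"
);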

BigQuery loads manually but not through the Java SDK

I have a Dataflow pipeline running locally. The objective is to read a JSON file using TextIO, build sessions, and load the result into BigQuery. Given the structure, I have to create a temp directory in GCS and then load into BigQuery from there. Previously I had a data schema error that prevented me from loading the data, see here. That issue is resolved.
So now when I run the pipeline locally it ends with dumping a temporary JSON newline delimited file into GCS. The SDK then gives me the following:
Starting BigQuery load job beam_job_xxxx_00001-1: try 1/3
INFO [main] (BigQueryIO.java:2191) - BigQuery load job failed: beam_job_xxxx_00001-1
...
Exception in thread "main" com.google.cloud.dataflow.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to create the load job beam_job_xxxx_00001, reached max retries: 3
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:187)
at pedesys.Dataflow.main(Dataflow.java:148)
Caused by: java.lang.RuntimeException: Failed to create the load job beam_job_xxxx_00001, reached max retries: 3
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$WriteTables.load(BigQueryIO.java:2198)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$WriteTables.processElement(BigQueryIO.java:2146)
The errors are not very descriptive and the data is still not loaded into BigQuery. What is puzzling is that if I go to the BigQuery UI and manually load, into the same table, the same temporary file from GCS that was dumped by the SDK's Dataflow pipeline, it works beautifully.
The relevant code parts are as follows:
PipelineOptions options = PipelineOptionsFactory.create();
options.as(BigQueryOptions.class)
    .setTempLocation("gs://test/temp");
Pipeline p = Pipeline.create(options)
...
...
session_windowed_items.apply(ParDo.of(new FormatAsTableRowFn()))
    .apply(BigQueryIO.Write
        .named("loadJob")
        .to("myproject:db.table")
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    );
The SDK is swallowing the error/exception and not reporting it to users. It's most likely a schema problem. To get the actual error you need to fetch the job details in either of these ways:
CLI: bq show -j beam_job_<xxxx>_00001-1
Browser/Web: use "try it" at the bottom of the page here.
@jkff has raised an issue here to improve the error reporting.