Error extracting data using TPT script for fixed-width table

I am trying to export data from a fixed-width table (Teradata).
The following is the error log:
Found CheckPoint file: /path
This is a restart job; it restarts at step MAIN_STEP.
Teradata Parallel Transporter DataConnector Version 13.10.00.05
FILE_WRITER Instance 1 directing private log report to 'dataconnector_log-1'.
Teradata Parallel Transporter SQL Selector Operator Version 13.10.00.05
SQL_SELECTOR: private log specified: selector_log
FILE_WRITER: TPT19007 DataConnector Consumer operator Instances: 1
FILE_WRITER: TPT19003 ECI operator ID: FILE_WRITER-31608
FILE_WRITER: TPT19222 Operator instance 1 processing file 'path/out.dat'.
SQL_SELECTOR: connecting sessions
SQL_SELECTOR: TPT15105: Error 13 in finalizing the table schema definition
SQL_SELECTOR: disconnecting sessions
SQL_SELECTOR: Total processor time used = '0.02 Second(s)'
SQL_SELECTOR: Start : Sat Aug 9 12:37:48 2014
SQL_SELECTOR: End : Sat Aug 9 12:37:48 2014
FILE_WRITER: TPT19221 Total files processed: 0.
Job step MAIN_STEP terminated (status 12)
Job edwaegcp terminated (status 12)
TPT script used:
USING CHARACTER SET UTF8
DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from a Teradata table to a file'
(
DEFINE SCHEMA PRODUCT_SOURCE_SCHEMA
(
id char(20)
);
DEFINE OPERATOR SQL_SELECTOR
TYPE SELECTOR
SCHEMA PRODUCT_SOURCE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'selector_log',
VARCHAR TdpId= '****',
VARCHAR UserName= '****',
VARCHAR UserPassword='*****',
VARCHAR SelectStmt= 'LOCKING ROW FOR ACCESS SELECT
CAST(id AS CHAR(20)),
FROM sample_db.sample_table'
);
DEFINE OPERATOR FILE_WRITER
TYPE DATACONNECTOR CONSUMER
SCHEMA *
ATTRIBUTES
(
VARCHAR PrivateLogName = 'dataconnector_log',
VARCHAR DirectoryPath = '/path/',
VARCHAR Format = 'Text',
VARCHAR FileName= 'out.dat',
VARCHAR OpenMode= 'Write'
);
APPLY TO OPERATOR (FILE_WRITER)
SELECT * FROM OPERATOR (SQL_SELECTOR);
);
Could you point out the error in the TPT script that's leading to this failure?
or
How do we extract a fixed-width table using TPT?
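For what it's worth, two things stand out in the script above: the stray comma after CAST(id AS CHAR(20)) just before FROM, and the schema length under USING CHARACTER SET UTF8 (TPT schema lengths are byte lengths, so a CHAR(20) column can occupy up to 60 bytes in UTF8). Below is a minimal sketch with only those two changes applied and everything else kept as in the original; treat it as an assumption to verify, not a confirmed fix.

USING CHARACTER SET UTF8
DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from a Teradata table to a file'
(
    DEFINE SCHEMA PRODUCT_SOURCE_SCHEMA
    (
        id CHAR(60)    /* CHAR(20) column; up to 3 bytes per character under UTF8 */
    );
    DEFINE OPERATOR SQL_SELECTOR
    TYPE SELECTOR
    SCHEMA PRODUCT_SOURCE_SCHEMA
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = 'selector_log',
        VARCHAR TdpId          = '****',
        VARCHAR UserName       = '****',
        VARCHAR UserPassword   = '*****',
        VARCHAR SelectStmt     = 'LOCKING ROW FOR ACCESS SELECT
                                  CAST(id AS CHAR(20))
                                  FROM sample_db.sample_table'   /* no comma before FROM */
    );
    DEFINE OPERATOR FILE_WRITER
    TYPE DATACONNECTOR CONSUMER
    SCHEMA *
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = 'dataconnector_log',
        VARCHAR DirectoryPath  = '/path/',
        VARCHAR Format         = 'Text',
        VARCHAR FileName       = 'out.dat',
        VARCHAR OpenMode       = 'Write'
    );
    APPLY TO OPERATOR (FILE_WRITER)
    SELECT * FROM OPERATOR (SQL_SELECTOR);
);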

Related

Flink Window Aggregation using TUMBLE failing on TIMESTAMP

We have a table A in the database. We are loading that table into Flink using the Flink SQL JdbcCatalog.
Here is how we are loading the data:
val catalog = new JdbcCatalog("my_catalog", "database_name", username, password, url)
streamTableEnvironment.registerCatalog("my_catalog", catalog)
streamTableEnvironment.useCatalog("my_catalog")
val query = "select timestamp, count from A"
val sourceTable = streamTableEnvironment.sqlQuery(query)
streamTableEnvironment.createTemporaryView("innerTable", sourceTable)
val aggregationQuery = "select window_end, sum(count) from TABLE(TUMBLE(TABLE innerTable, DESCRIPTOR(timestamp), INTERVAL '10' minutes)) group by window_end"
It throws the following error:
Exception in thread "main" org.apache.flink.table.api.ValidationException: SQL validation failed. The window function TUMBLE(TABLE table_name, DESCRIPTOR(timecol), datetime interval[, datetime interval]) requires the timecol is a time attribute type, but is TIMESTAMP(6).
In short, we want to apply a windowing aggregation on an already existing column. How can we do that?
Note: this is batch processing.
Timestamp columns used as time attributes in Flink SQL must be either TIMESTAMP(3) or TIMESTAMP_LTZ(3), and the column must also be marked as ROWTIME.
Add this line to your code:
sourceTable.printSchema();
and check the result. The column should be marked as ROWTIME, as shown below:
(
`deviceId` STRING,
`dataStart` BIGINT,
`recordCount` INT,
`time_Insert` BIGINT,
`time_Insert_ts` TIMESTAMP(3) *ROWTIME*
)
You can find my sample below.
Table tableCpuDataCalculatedTemp = tableEnv.fromDataStream(streamCPUDataCalculated, Schema.newBuilder()
.column("deviceId", DataTypes.STRING())
.column("dataStart", DataTypes.BIGINT())
.column("recordCount", DataTypes.INT())
.column("time_Insert", DataTypes.BIGINT())
.column("time_Insert_ts", DataTypes.TIMESTAMP(3))
.watermark("time_Insert_ts", "time_Insert_ts")
.build());
The watermark method is what marks the column as ROWTIME.
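Once the table exposes a rowtime attribute, the TUMBLE windowing table function from the question should validate. A minimal sketch, assuming the table above has been registered as a temporary view named cpu_data (a hypothetical name):

-- Hypothetical view name; register it first, e.g.
-- tableEnv.createTemporaryView("cpu_data", tableCpuDataCalculatedTemp);
SELECT window_start,
       window_end,
       SUM(recordCount) AS total_records
FROM TABLE(
       TUMBLE(TABLE cpu_data, DESCRIPTOR(time_Insert_ts), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;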

Airflow: psycopg2.errors.InternalError_: Value too long for character type

I have a SubDAG that fails and gives me the following error. I want to find out which value is too long for the table it is being inserted into, and hopefully the location in the error will help me figure out which value that is.
The code is the following:
[2022-08-17 05:19:33,251] {postgres_operator.py:62} INFO - Executing: insert into user_visit (
select ew.uri uri
, date_trunc('day',ew.created_at) created_at
, ew.platform platform
, ew."type" "type"
, count(uuid) traffic
, ew."host" "host"
from event_web ew where date_trunc('day',ew.created_at) > date_trunc('day',coalesce((select max(created_at) from user_visit ),'1000-01-01')) group by uri,created_at,platform,"type","host");
[2022-08-17 05:19:33,262] {logging_mixin.py:112} INFO - [2022-08-17 05:19:33,262] {base_hook.py:84} INFO - Using connection to: id: dwh_redshift. Host: dwh-pro.cefs12046ciy.eu-central-1.redshift.amazonaws.com, Port: 5439, Schema: pro, Login: etl_user, Password: XXXXXXXX, extra: None
[2022-08-17 05:19:33,383] {logging_mixin.py:112} INFO - [2022-08-17 05:19:33,383] {dbapi_hook.py:174} INFO - insert into user_visit (
select ew.uri uri
, date_trunc('day',ew.created_at) created_at
, ew.platform platform
, ew."type" "type"
, count(uuid) traffic
, ew."host" "host"
from event_web ew where date_trunc('day',ew.created_at) > date_trunc('day',coalesce((select max(created_at) from user_visit ),'1000-01-01')) group by uri,created_at,platform,"type","host");
And the error message is this one:
psycopg2.errors.InternalError_: Value too long for character type
DETAIL:
-----------------------------------------------
error: Value too long for character type
code: 8001
context: Value too long for type character varying(2048)
query: 67463367
location: string.cpp:247
process: query1_221_67463367 [pid=31276]
This problem is not related to Airflow. The table user_visit has a column of type varchar(2048), which does not accept strings longer than 2048 characters (bytes, in Redshift), and the data you are trying to insert contains a record whose value for that column exceeds that limit. Check your data to find the maximum length you actually need, then alter the table to increase the column's varchar size.
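A hedged sketch of how to track down the offending value and widen the column. The choice of uri is an assumption; any of the varchar columns in the insert (uri, platform, "type", "host") could be the one that overflows, so check the DDL of user_visit to see which column is varchar(2048):

-- Redshift varchar lengths are in bytes, so measure with OCTET_LENGTH
SELECT uri, OCTET_LENGTH(uri) AS uri_bytes
FROM event_web
WHERE OCTET_LENGTH(uri) > 2048
ORDER BY uri_bytes DESC
LIMIT 10;

-- If widening is acceptable, Redshift can increase a VARCHAR column's length in place
ALTER TABLE user_visit ALTER COLUMN uri TYPE VARCHAR(4096);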

Recursive Query on Graph ends with error "could not write to tuplestore temporary file" in PostgreSQL

I ended up with a table storing a network topology as follows:
create table topology.network_graph(
    node_from_id varchar(50) not null,
    node_to_id varchar(50) not null,
    PRIMARY KEY (node_from_id, node_to_id)
);
The expected output is something like this, where all the sub-graphs starting from node "A" are listed:
Now I try to find the paths between the nodes, starting at a specific node, using this query:
WITH RECURSIVE network_nodes AS (
select
node_from_id,
node_to_id,
0 as hop_count,
ARRAY[node_from_id::varchar(50), node_to_id::varchar(50)] AS "path"
from topology.network_graph
where node_from_id = '_9EB23E6C4C824441BB5F75616DEB8DA7' --Set this node as the starting element
union
select
nn.node_from_id,
ng.node_to_id,
nn.hop_count + 1 as hop_count,
(nn."path" || ARRAY[ng.node_to_id])::varchar(50)[] AS "path"
from topology.network_graph as ng
inner join network_nodes as nn
on ng.node_from_id = nn.node_to_id
and ng.node_to_id != ALL(nn."path")
)
select node_from_id, node_to_id, hop_count, "path"
from network_nodes
order by node_from_id, node_to_id, hop_count ;
The query runs several minutes before throwing the error:
could not write to tuplestore temporary file: No space left on device
The topology.network_graph table has 2148 records, and during query execution the base/pgsql_tmp directory grows by some GBs. It seems I have an infinite loop.
Can someone find what could be wrong?
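One way to narrow this down: the != ALL(nn."path") predicate already prevents revisiting a node within a single path, so rather than a true infinite loop this may be a combinatorial explosion of simple paths. A hedged debugging sketch that caps the recursion depth (the 10-hop limit is an arbitrary assumption) and counts paths per hop:

WITH RECURSIVE network_nodes AS (
    SELECT node_from_id,
           node_to_id,
           0 AS hop_count,
           ARRAY[node_from_id::varchar(50), node_to_id::varchar(50)] AS "path"
    FROM topology.network_graph
    WHERE node_from_id = '_9EB23E6C4C824441BB5F75616DEB8DA7'
    UNION
    SELECT nn.node_from_id,
           ng.node_to_id,
           nn.hop_count + 1 AS hop_count,
           (nn."path" || ARRAY[ng.node_to_id])::varchar(50)[] AS "path"
    FROM topology.network_graph AS ng
    INNER JOIN network_nodes AS nn
            ON ng.node_from_id = nn.node_to_id
           AND ng.node_to_id != ALL(nn."path")
    WHERE nn.hop_count < 10   -- assumed cap, purely for debugging
)
SELECT hop_count, count(*) AS paths
FROM network_nodes
GROUP BY hop_count
ORDER BY hop_count;

If the per-hop counts keep multiplying, the temporary-file growth comes from the sheer number of paths (for example, densely connected or bidirectional edges), not from a syntax problem.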

Hive simple Regular expression

I am trying to check whether all the data in a column is a valid date.
create table dates (tm string, dt string) row format delimited fields terminated by '\t';
dates.txt (sample data):
20181205 15
20171023 23
20170516 16
load data local inpath 'dates.txt' overwrite into table dates;
create temporary macro isitDate(s string)
case when regexp_extract(s,'((0[1-9]|[12][0-9]|3[01])',0) = ''
then false
else true
end;
select * from dates where isitDate(dt);
But the select statement is giving the below error:
Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to execute method public java.lang.String
org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer)
on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract#66b45e1e of
class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments
{15:java.lang.String, ((0[1-9]|[12][0-9]|3[01]):java.lang.String,
0:java.lang.Integer} of size 3
Is there something wrong with my regular expression?
I made a stupid mistake; there is one extra opening bracket in the macro.
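For reference, the macro with the extra opening bracket removed would look like this:

create temporary macro isitDate(s string)
case when regexp_extract(s,'(0[1-9]|[12][0-9]|3[01])',0) = ''
     then false
     else true
end;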

Pig - reading Hive table stored as Avro

I have created a Hive table stored in the Avro file format. I am trying to load the same Hive table using the Pig commands below:
pig -useHCatalog;
hive_avro = LOAD 'hive_avro_table' using org.apache.hive.hcatalog.pig.HCatLoader();
I am getting a "failed to read from hive_avro_table" error when I try to display hive_avro using the DUMP command.
Please help me resolve this issue. Thanks in advance.
create table hivecomplex
(name string,
phones array<INT>,
deductions map<string,float>,
address struct<street:string,zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#'
STORED AS AVRO
;
hive> select * from hivecomplex;
OK
John [650,999,9999] {"pf":500.0} {"street":"pleasantville","zip":88888}
Time taken: 0.078 seconds, Fetched: 1 row(s)
Now for the Pig side:
pig -useHCatalog;
a = LOAD 'hivecomplex' USING org.apache.hive.hcatalog.pig.HCatLoader();
dump a;
ne.util.MapRedUtil - Total input paths to process : 1
(John,{(650),(999),(9999)},[pf#500.0],(pleasantville,88888))