I have one job and two transformations.
In the first transformation I have a Table input step where I get max(id) from a table.
Next: Copy rows to result -> Write to log -> Set Variables.
In Set Variables I have Field name = max_id and Variable name = max_id.
In the second transformation I have a Get Variables step with Name = max_id, Variable = ${max_id} and Type = Integer -> Write to log.
Next I have a Table input step with this query:
select id AS CCOLUMNS from REPO.dbo.STATUS with (nolock)
where id = ${max_id}
This is what I get in the log:
2019/09/11 15:21:33 - Merge Join.0 - offending row : [max_id Integer(9)]
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 - Error setting value #1 [Integer(9)] on prepared statement
2019/09/11 15:21:33 - Merge Join.0 - The index 1 is out of range.
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.database.Database.getQueryFieldsFallback(Database.java:2354)
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.database.Database.getQueryFields(Database.java:2193)
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.trans.steps.tableinput.TableInputMeta.getFields(TableInputMeta.java:253)
2019/09/11 15:21:33 - Merge Join.0 - ... 9 more
2019/09/11 15:21:33 - Merge Join.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2019/09/11 15:21:33 - Merge Join.0 - offending row : [max_id Integer(9)]
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 - Error setting value #1 [Integer(9)] on prepared statement
2019/09/11 15:21:33 - Merge Join.0 - The index 1 is out of range.
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.database.Database.setValues(Database.java:1076)
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.database.Database.getQueryFieldsFallback(Database.java:2328)
2019/09/11 15:21:33 - Merge Join.0 - ... 11 more
2019/09/11 15:21:33 - Merge Join.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2019/09/11 15:21:33 - Merge Join.0 - Error setting value #1 [Integer(9)] on prepared statement
2019/09/11 15:21:33 - Merge Join.0 - The index 1 is out of range.
2019/09/11 15:21:33 - Merge Join.0 -
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.row.value.ValueMetaBase.setPreparedStatementValue(ValueMetaBase.java:5165)
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.database.Database.setValue(Database.java:1058)
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.database.Database.setValues(Database.java:1074)
2019/09/11 15:21:33 - Merge Join.0 - ... 12 more
2019/09/11 15:21:33 - Merge Join.0 - Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The index 1 is out of range.
2019/09/11 15:21:33 - Merge Join.0 - at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:227)
2019/09/11 15:21:33 - Merge Join.0 - at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.setterGetParam(SQLServerPreparedStatement.java:919)
2019/09/11 15:21:33 - Merge Join.0 - at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.setNull(SQLServerPreparedStatement.java:1491)
2019/09/11 15:21:33 - Merge Join.0 - at org.pentaho.di.core.row.value.ValueMetaBase.setPreparedStatementValue(ValueMetaBase.java:5052)
This happens when you configure the Table Input step to make use of incoming fields from the stream by specifying a step in "Insert data from step". That's not needed when you use named variables (and variable substitution).
Manually clear the step name from that and you should be good to go.
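As a rough illustration of the difference (a sketch only; the option names below are the usual Table input settings, and the query follows the one in the question), the two styles look like this:
-- Variable substitution: leave "Insert data from step" empty and tick
-- "Replace variables in script?"; ${max_id} is substituted as text before the query runs.
select id AS CCOLUMNS from REPO.dbo.STATUS with (nolock)
where id = ${max_id}

-- Stream parameters: only when "Insert data from step" names a step; each ? is then
-- bound from a field of the incoming row. A query with no ? while a max_id field is
-- being pushed in is what appears to produce "The index 1 is out of range."
select id AS CCOLUMNS from REPO.dbo.STATUS with (nolock)
where id = ?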
For more detailed info, see my answer here: https://stackoverflow.com/a/43651035/6803853
Related
I am trying to execute the following:
jdbcTemplate.update("DELETE FROM my_table WHERE created <= (NOW() - interval '? milliseconds')", 1);
but getting the following error:
org.springframework.dao.DataIntegrityViolationException: PreparedStatementCallback; SQL [DELETE FROM my_table WHERE created <= (NOW() - interval '? milliseconds')]; The column index is out of range: 1, number of columns: 0.; nested exception is org.postgresql.util.PSQLException: The column index is out of range: 1, number of columns: 0.
I think it may be because the ? is inside a string literal:
'? milliseconds'
but I'm not sure what the solution is.
I can execute the following ok using the SQL editor, and get results back:
select * from my_table where created <= (NOW() - interval '1 milliseconds');
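If the string literal is indeed the cause, one possible workaround (a sketch, not verified against this exact schema) is to keep the placeholder outside the literal and multiply it by a fixed one-unit interval:
-- PostgreSQL: ? stays an ordinary bind parameter; the unit lives in the literal.
DELETE FROM my_table
WHERE created <= NOW() - (? * interval '1 millisecond');
The value 1 is then passed as a normal parameter, e.g. jdbcTemplate.update(sql, 1).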
Two questions:
How can I convert a long column holding a number of seconds to the calendarinterval type in Python / Spark SQL?
How can I convert the following code into a plain Spark SQL query?
from pyspark.sql.functions import unix_timestamp
df2 = df.withColumn(
"difference_duration",
unix_timestamp("CAL_COMPLETION_TIME") - unix_timestamp("Prev_Time")
)
Sample dataframe screenshot (not reproduced here).
Basically I am trying to achieve the below PostgreSQL query in Spark SQL:
case
when t1.prev_time <> t1.prev_time_calc and t1."CAL_COMPLETION_TIME" - t1.prev_time < interval '30 min'
then t1.next_time_calc - t1.prev_time_calc
when (t1.next_time <> t1.next_time_calc and t1.next_time - t1."CAL_COMPLETION_TIME" < interval '30 min') or (t1.next_time - t1."CAL_COMPLETION_TIME" < interval '30 min')
then t1.next_time_calc - t1."CAL_COMPLETION_TIME"
else null
end min_diff
But this part t1."CAL_COMPLETION_TIME" - t1.prev_time < interval '30 min' is throwing the following error:
AnalysisException: "cannot resolve '(t1.`CAL_COMPLETION_TIME` - t1.`prev_time`)' due to data type mismatch: '(t1.`CAL_COMPLETION_TIME` - t1.`prev_time`)' requires (numeric or calendarinterval) type, not timestamp;
You can't subtract timestamps directly; you need to convert them to seconds first. So cast the timestamp columns to long/bigint before subtracting, divide the difference by 60 to get minutes, and then check whether it is less than 30.
#example=df1
#both columns are of type Timestamp
+-------------------+-------------------+
| prev_time|CAL_COMPLETION_TIME|
+-------------------+-------------------+
|2019-04-26 01:19:10|2019-04-26 01:19:35|
+-------------------+-------------------+
PySpark:
from pyspark.sql import functions as F

df1.withColumn(
    "sub",
    F.when(
        (F.col("CAL_COMPLETION_TIME").cast("long") - F.col("prev_time").cast("long")) / 60 < 30,
        F.lit("LESSTHAN30"),
    ).otherwise(F.lit("GREATERTHAN")),
).show()
+-------------------+-------------------+----------+
| prev_time|CAL_COMPLETION_TIME| sub|
+-------------------+-------------------+----------+
|2019-04-26 01:19:10|2019-04-26 01:19:35|LESSTHAN30|
+-------------------+-------------------+----------+
Spark SQL:
df1.createOrReplaceTempView("df1")
spark.sql("""
    select prev_time,
           CAL_COMPLETION_TIME,
           IF((CAST(CAL_COMPLETION_TIME as bigint) - CAST(prev_time as bigint)) / 60 < 30,
              'LESSTHAN30', 'GREATER') as difference_duration
    from df1
""").show()
+-------------------+-------------------+-------------------+
| prev_time|CAL_COMPLETION_TIME|difference_duration|
+-------------------+-------------------+-------------------+
|2019-04-26 01:19:10|2019-04-26 01:19:35| LESSTHAN30|
+-------------------+-------------------+-------------------+
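If what's wanted is the raw difference in seconds (as in the original withColumn version) rather than a flag, a plain Spark SQL sketch against the same df1 view (e.g. run via spark.sql()) might look like this; it is untested here and reuses the cast-to-bigint trick from above:
select prev_time,
       CAL_COMPLETION_TIME,
       CAST(CAL_COMPLETION_TIME AS bigint) - CAST(prev_time AS bigint) AS difference_duration
from df1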
How can I generate a sequence of minutes in Redshift?
In postgres this will generate a sequence of minutes over the past day:
SELECT date_trunc('minute', generate_series) as minute
FROM generate_series(NOW() - '1 day'::interval, NOW(), '1 minute')
I'm not sure how to get it to work in Redshift, though.
generate_series() works in Amazon Redshift, as long as you don't try to join it to data in tables. This is because it runs on the Leader Node, but not on Compute Nodes.
SELECT CURRENT_DATE - generate_series(1, 60) * interval '1 minute'
Returns:
2018-07-09 23:59:00
2018-07-09 23:58:00
2018-07-09 23:57:00
...
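To cover the question's full "past day", the same leader-node-only pattern can be stretched to 1440 values (a sketch that keeps the CURRENT_DATE anchor used above, so the series ends at midnight rather than at the current minute):
-- 1440 rows, one per minute of the preceding day
SELECT CURRENT_DATE - generate_series(0, 1439) * interval '1 minute' AS minute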
On running a transformation with Kettle I get the following error:
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : Because of an error, this step can't continue:
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : org.pentaho.di.core.exception.KettleValueException:
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - CREATION_DATE String : couldn't convert string [20170326 01:10] to a date using format [yyyyMMdd HH:mm] on offset location 14
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - 20170326 01:10
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 -
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToDate(ValueMetaBase.java:791)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.value.ValueMetaBase.getDate(ValueMetaBase.java:2047)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertData(ValueMetaBase.java:3672)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertBinaryStringToNativeType(ValueMetaBase.java:1371)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.value.ValueMetaBase.getString(ValueMetaBase.java:1555)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.RowMeta.getString(RowMeta.java:319)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.RowMeta.getString(RowMeta.java:827)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:372)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at java.lang.Thread.run(Thread.java:745)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - Caused by: java.text.ParseException: 20170326 01:10
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToDate(ValueMetaBase.java:782)
2017/06/01 17:57:46 - Table PAX_TKT_UPDATES.0 - ... 10 more
Questions:
The date 20170326 01:10 looks absolutely valid for the format yyyyMMdd HH:mm. Why the parse error?
I've selected "Ignore insert errors" in my Table output step, and that does ignore insert errors (such as column constraint violations). But it doesn't seem to ignore data conversion errors, and the transformation stops. How can I ignore data conversion errors?
I'm going to guess you're running that transformation on a computer located in Europe.
March 26, 2017 was the last Sunday of March, and at 01:00 the clocks switched to daylight saving time. Therefore, in local time, there is no such moment: after 00:59:59 comes 02:00:00.
Your field in the database is probably a Datetime, which is in local time, so the clock jumped forward at 1 am. You can use one of the following approaches:
change the data type to UTC or a fixed timezone;
make the timezone explicit in your data, e.g. store 20170326 01:10 +0100 and parse it with the mask yyyyMMdd HH:mm Z.
My database is using an integer in epoch time for the date in this table.
I want to do something like this:
select * from myTable where date_column > CURRENT_TIMESTAMP - 6 months
I'm not sure how to get 6 months out of this dynamically, and the result of CURRENT_TIMESTAMP - 6 months would have to be in epoch time.
Any insight appreciated.
In Postgres, I believe the correct syntax is:
date_column > EXTRACT(EPOCH FROM (NOW() - interval '6 months'))
or similarly:
to_timestamp(date_column) > NOW() - interval '6 months'
You can read the complete documentation of the date/time functions for Postgres for more information.
In MSSQL you can use
select *
from myTable
where date_column > dateadd(month,-6,CURRENT_TIMESTAMP)
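One caveat with the MSSQL version, since the question says date_column holds an epoch integer: comparing that integer directly to a datetime leans on implicit conversion, so it is probably safer to compare epoch to epoch (a sketch, assuming the column holds seconds since 1970-01-01):
select *
from myTable
-- convert the 6-months-ago cutoff to epoch seconds and compare integer to integer
where date_column > DATEDIFF(second, '1970-01-01', DATEADD(month, -6, CURRENT_TIMESTAMP))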
You can try this
SELECT *
FROM myTable
WHERE TO_TIMESTAMP(date_column) > CURRENT_DATE - INTERVAL '6 MONTH';
Here is sqlfiddle