tMap Talend NullPointerException

I'm trying to read two CSVs into my SQL Server. I'm getting a NullPointerException even though I'm allowing nullables and checking for null each time. When debugging, it appears that the null-check expression evaluates to the FALSE branch every time, even when the string is null.
Here's an example of my null check:
( row2.Acq_date.equals(null) || row2.Acq_date.equals("") ||
row2.Acq_date == null ) ? null : (int)
TalendDate.diffDate(TalendDate.getCurrentDate(),row2.Acq_date,"MM")

You shouldn't do null checking like row2.Acq_date.equals(null): if row2.Acq_date is null, calling .equals() on it is itself what throws the NullPointerException.
The correct check is row2.Acq_date == null, which you actually included in your test, but it comes too late, after the .equals(null) call has already been evaluated.
Here's the correct way :
( row2.Acq_date == null ) ? null : (int)
TalendDate.diffDate(TalendDate.getCurrentDate(), row2.Acq_date, "MM")
Based on your comment, row2.Acq_date is of type Date, which you can read directly from your file using the appropriate date pattern. If the column is empty (or contains only whitespace) in your file, Talend returns a null date, which is handled by the test above.

Prevent Oracle error ORA-01843 in select statement

In my application, I'm storing the logs of an API in a table. In one of the columns, I store the raw JSON sent in the HTTP request.
I've been tasked with creating a page in my application dedicated to easily exploring all logged entries, with filters, sorting, etc.
One of the required sorts is on the date indicated in the JSON body of the HTTP call. I've managed to do so using Oracle's JSON API:
SELECT *
FROM FUNDING_REQUEST f
ORDER BY TO_TIMESTAMP_TZ(JSON_VALUE(
f.REQUEST_CONTENTS,
'$.leasing_information.consumer_request_date_time'
), 'YYYY/MM/DD"T"HH24:MI:SS.FFTZH:TZM') ASC
This works whether $.leasing_information.consumer_request_date_time is defined or not, but I have an issue when the value is badly formatted. In one of my tests, I sent this to my API:
{
  [...],
  "leasing_information": {
    "consumer_request_date_time": "2021-25-09T12:30:00.000+02:00",
    [...]
  }
}
There is no 25th month, and my SQL query now returns the following error:
ORA-01843: not a valid month.
I would like to handle this value as NULL rather than returning an error, but it seems like the TO_TIMESTAMP_TZ DEFAULT clause does not really work the way I want it to. Doing this also returns an error:
SELECT *
FROM FUNDING_REQUEST f
ORDER BY TO_TIMESTAMP_TZ(JSON_VALUE(
f.REQUEST_CONTENTS,
'$.leasing_information.consumer_request_date_time'
) DEFAULT NULL ON CONVERSION ERROR, 'YYYY/MM/DD"T"HH24:MI:SS.FFTZH:TZM') ASC
ORA-00932: inconsistent datatypes: expected - got TIMESTAMP WITH TIME ZONE
I would also like to avoid using a PL/SQL function if possible. How can I prevent this query from returning an error?
You don't need to extract a string and then convert that to a timestamp; you can do the conversion within json_value() itself, and return null if it errors, at least from version 12c Release 2 onwards:
SELECT *
FROM FUNDING_REQUEST f
ORDER BY JSON_VALUE(
f.REQUEST_CONTENTS,
'$.leasing_information.consumer_request_date_time'
RETURNING TIMESTAMP WITH TIME ZONE
NULL ON ERROR
) ASC
db<>fiddle, including showing what the string and timestamp extractions evaluate to (i.e. a real timestamp value, or null).
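For a quick sanity check, here is a self-contained sketch of the same idea run against a literal JSON value (no table needed), assuming a version where json_value accepts this RETURNING clause, per the 12.2 note above; the badly formatted month should come back as null instead of raising ORA-01843:
-- the invalid month (25) makes the conversion fail, so NULL ON ERROR yields null
SELECT JSON_VALUE(
         '{"leasing_information": {"consumer_request_date_time": "2021-25-09T12:30:00.000+02:00"}}',
         '$.leasing_information.consumer_request_date_time'
         RETURNING TIMESTAMP WITH TIME ZONE
         NULL ON ERROR
       ) AS request_ts
FROM dual
Rows whose date is missing or unparseable then simply sort together at one end of the ORDER BY (controllable with NULLS FIRST / NULLS LAST).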

JSON stored in SUPER type fails to select camelcase element. Too long to be serialized. How can I select?

Summary:
I am working with a large JSON that is stored in a Redshift SUPER type.
Context
This issue is nearly identical to the question posted here for T-SQL. My schema:
chainId BIGINT
properties SUPER
Sample data:
{
  "chainId": 5,
  "$browser": "Chrome",
  "token": "123x5"
}
I have this as a column in my table called properties.
Desired behavior
I want to be able to retrieve the value 5 from the chainId key and store it in a BIGINT column.
What I've tried
I have referenced the following aws docs:
https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html
https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html
https://docs.aws.amazon.com/redshift/latest/dg/super-overview.html
I have tried the following which haven't worked for me:
SELECT
properties.chainId::varchar as test1
, properties.chainId as test2
, properties.chainid as test3
, properties."chainId" as test4
, properties."chainid" as test5
, json_extract_path_text(json_serialize(properties), 'chainId') serial_then_extract
, properties[0].chainId as testval1
, properties[0]."chainId" as testval2
, properties[0].chainid as testval3
, properties[0]."chainid" as testval4
, properties[1].chainId as testval5
, properties[1]."chainId" as testval6
FROM clean
Of these, only serial_then_extract returned a correct, non-null value, but not all of the values in my properties field are short enough to serialize, so this only works on some of the rows.
All the others return null.
Referencing the following docs: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html#unnest I have also attempted to iterate over the SUPER type using PartiQL:
SELECT ps.*
, p.chainId
from clean ps, ps.properties p
where 1=1
But this returns no rows.
I also tried the following:
select
properties
, properties.token
, properties."$os"
from base
And this returned rows with values. I know that there is a chainId value, as I've checked the corresponding key and am working with sample data.
What am I missing? What else should I be trying?
Does anyone know if this has to do with the way the JSON key is formatted (camelCase)?
You need to enable case-sensitive identifiers. By default, Redshift maps everything to lowercase for table and column names. If you have mixed-case identifiers, like in your SUPER field, you need to enable case sensitivity with
SET enable_case_sensitive_identifier TO true;
See: https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html
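Putting it together, a minimal sketch assuming the clean table and properties column from your question, with the setting applied in the same session:
SET enable_case_sensitive_identifier TO true;

-- with case sensitivity on, the camelCase key resolves,
-- and the SUPER scalar can be cast to BIGINT
SELECT properties."chainId"::bigint AS chain_id
FROM clean;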

Check for null keys in map? Presto

I have a query that has been running fine for a while now, doing the following:
MAP_AGG(key, value) AS k_v1,  -- value is a map
MAP_CONCAT(
  k_v,  -- some map
  MAP_UNION_SUM(
    MAP(ARRAY[K], ARRAY[V])
  )
) AS k_v2
With some source data that looks like this:
key  | value             | k_v             | K    | V
id_2 | {"KEY2": "20"}    | {"KEY4": "100"} | KEY8 | 100
id_1 | {"KEY1": "96.25"} | {"KEY5": "150"} | KEY8 | 150
From which it produces a table like this:
k_v1                                                | k_v2
{"id_2": {"KEY2": "20"}, "id_1": {"KEY1": "96.25"}} | {{"KEY4": "100"}, {"KEY5": "150"}, {"KEY8": "250"}}
But now, as a new job was running, I get an error stating:
"Failure": "map key cannot be null"
I'm trying to understand how to catch such a case with Presto, as it seems like a pretty verbose process to have to unnest in these kinds of situations to check for null keys. Is there an easier or built-in solution to do this kind of check and drop those entries from the mapping?
Edit: I have hundreds of thousands of records that need to be processed. The sample data above is just to illustrate the schema.
Not sure where and how you want to apply unnest, but my guess would be that the source of the issue is MAP(ARRAY[K], ARRAY[V]) with some K being null (MAP_AGG should ignore null keys, and the other functions work with existing maps). For this case you can try using a conditional expression to ignore such rows (by creating empty maps), i.e. if(K is null, MAP(), MAP(ARRAY[K], ARRAY[V])):
MAP_CONCAT(
  k_v,  -- some map
  MAP_UNION_SUM(
    if(K is null, MAP(), MAP(ARRAY[K], ARRAY[V]))
  )
) AS k_v2
or substitute the key with some default value using coalesce(K, 'KEYDEFAULT'):
MAP_CONCAT(
  k_v,  -- some map
  MAP_UNION_SUM(
    MAP(ARRAY[coalesce(K, 'KEYDEFAULT')], ARRAY[V])
  )
) AS k_v2
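If you want to see the guard in isolation, here is a hedged, self-contained sketch with a few made-up rows (the t(k, v) names and the values are just for illustration), assuming your Presto version already has MAP_UNION_SUM, as your existing query does:
-- the NULL-keyed row contributes an empty map, so it is ignored by the sum
SELECT
  MAP_UNION_SUM(
    if(k IS NULL, MAP(), MAP(ARRAY[k], ARRAY[v]))
  ) AS summed
FROM (
  VALUES ('KEY8', 100), (NULL, 50), ('KEY8', 150)
) AS t(k, v)
-- expected: {KEY8=250}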

Error maxClauseCount searching if certain property is null or not null

When I try to search with Lucene in Alfresco for whether a certain property is not null:
myProperty IS NOT NULL;
or is null:
myProperty IS NULL;
I have this error:
org.apache.lucene.search.BooleanQuery$TooManyClauses - maxClauseCount is set to 10000
This is my query:
SELECT D.cmis:name, D.cmis:objectId, D.cmis:creationDate, R.regxun:numReg, R.regxun:numInterno
FROM cmis:document AS D
JOIN regxun:contextoRegistroBase AS R ON D.cmis:objectId = R.cmis:objectId
WHERE D.cmis:creationDate >= TIMESTAMP '2016-02-18T00:00:00.000Z'
  AND D.cmis:creationDate < TIMESTAMP '2016-02-19T00:00:00.000Z'
  AND R.regxun:ambitoDoc = 'prrubuh'
  AND R.regxun:numReg IS NOT NULL
Any alternative?
Increase lucene.query.maxClauses in alfresco-global.properties to a number higher than 10000.
Like:
lucene.query.maxClauses=100000
But I wouldn't keep it that high; rather, try to leave out the IS NOT NULL statement, which expands into a lot of OR clauses in the underlying query.
For example, you could create a rule or behaviour which fills the custom property with a special value (like 'empty') and then search on that instead.
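As a rough sketch of that last suggestion (the sentinel value 'empty' is just an example name, and it assumes the rule/behaviour has already backfilled it on existing documents), the IS NOT NULL predicate in your query could then become a plain comparison:
SELECT D.cmis:name, D.cmis:objectId, D.cmis:creationDate, R.regxun:numReg, R.regxun:numInterno
FROM cmis:document AS D
JOIN regxun:contextoRegistroBase AS R ON D.cmis:objectId = R.cmis:objectId
WHERE D.cmis:creationDate >= TIMESTAMP '2016-02-18T00:00:00.000Z'
  AND D.cmis:creationDate < TIMESTAMP '2016-02-19T00:00:00.000Z'
  AND R.regxun:ambitoDoc = 'prrubuh'
  AND R.regxun:numReg <> 'empty'
This is only a sketch of the sentinel approach, not a drop-in replacement; it works once every document carries either a real value or the sentinel.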

Handle null values within SQL IN clause

I have the following SQL query in my hbm file. SCHEMA is the schema, and A and B are two tables.
select *
from SCHEMA.A os
inner join SCHEMA.B o
  on o.ORGANIZATION_ID = os.ORGANIZATION_ID
where
  case
    when (:pass = 'N' and os.ORG_ID in (:orgIdList)) then 1
    when (:pass = 'Y') then 1
  end = 1
  and (os.ORG_SYNONYM like :orgSynonym or :orgSynonym is null)
This is a pretty simple query. I had to use the CASE ... WHEN to handle a null value of the "orgIdList" parameter (when null is passed to SQL IN, it gives an error). Below is the relevant Java code which sets the parameters.
if (_orgSynonym.getOrgIdList().isEmpty()) {
    query.setString("orgIdList", "pass");
    query.setString("pass", "Y");
} else {
    query.setString("pass", "N");
    query.setParameterList("orgIdList", _orgSynonym.getOrgIdList());
}
This works and gives me the expected output, but I would like to know if there is a better way to handle this situation (orgIdList sometimes becomes null).
There must be at least one element in the comma-separated list that defines the set of values for the IN expression.
In other words, regardless of Hibernate's ability to parse the query and pass an empty IN(), and regardless of whether particular databases support that syntax (PostgreSQL doesn't, according to the Jira issue), best practice is to use a dynamic query here if you want your code to be portable (and I usually prefer the Criteria API for dynamic queries).
If not, you need some other workaround like what you have done, or you can wrap the list in a custom list, etc.
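If you do stay with the single static query, the same guard can also be written without the CASE, as a plain OR; this is only a sketch of an equivalent predicate under your existing parameter conventions (:pass bound to 'Y' and :orgIdList bound to a placeholder when the list is empty), not a change in behaviour:
select *
from SCHEMA.A os
inner join SCHEMA.B o
  on o.ORGANIZATION_ID = os.ORGANIZATION_ID
-- when :pass = 'Y' the IN check is irrelevant, but Hibernate still needs a value
-- bound to :orgIdList, so the placeholder binding in your Java code stays
where (:pass = 'Y' or os.ORG_ID in (:orgIdList))
  and (os.ORG_SYNONYM like :orgSynonym or :orgSynonym is null)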