"IN" support for joins - pentaho

We're trying to connect Pentaho BI to ClickHouse, and sometimes Pentaho generates queries like this:
select
...
from
date_dimension_table,
fact_table,
other_dimension_table
where
fact_table.fact_date = date_dimension_table.date
and date_dimension_table.calendar_year = 2019
and date_dimension_table.month_name in ('April', 'June', ...)
and fact_table.other_dimension_id = other_dimension_table.id
and other_dimension_table.code in ('code1', 'code2', ...)
group by
date_dimension_table.calendar_year,
date_dimension_table.month_name,
other_dimension_table.code;
It produces ClickHouse error: Code: 403, e.displayText() = DB::Exception: Invalid expression for JOIN ON. Expected equals expression, got (code AS c2) IN ('code1', 'code2', ...). Supported syntax: JOIN ON Expr([table.]column, ...) = Expr([table.]column, ...) [AND Expr([table.]column, ...) = Expr([table.]column, ...)...] (version 19.15.3.6 (official build))
Engines used for tables: fact_table - MergeTree, both dimensions - TinyLog.
So, two questions:
Can this problem be solved by changing table engines? Unfortunately, we can't change the query; it's autogenerated.
If not, are there any plans to support joins with an IN clause in ClickHouse in the near future?
Thanks.

This issue has been fixed beginning with ClickHouse release v20.3.2.1, 2020-03-12 (see Issue 7314), so you need to upgrade CH.
! Don't forget to check all backward-incompatible changes (see changelog).
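To confirm which server version you are on before and after the upgrade, you can run the following in clickhouse-client:
SELECT version();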
Let's reproduce this problem on CH 19.15.3 revision 54426 to get the error you described:
Received exception from server (version 19.15.3):
Code: 403. DB::Exception: Received from localhost:9000. DB::Exception: Invalid expression for JOIN ON. Expected equals expression, got code IN ('code1', 'code2'). Supported syntax: JOIN ON Expr([table.]column, ...) = Expr([table.]column, ...) [AND Expr([table.]column, ...) = Expr([table.]column, ...) ...].
Now execute this query on the latest version of CH (20.3.7 revision 54433) to make sure that it works correctly:
docker pull yandex/clickhouse-server:latest
docker run -d --name ch_test_latest yandex/clickhouse-server:latest
docker exec -it ch_test_latest clickhouse-client
# create tables as described below
..
# execute test query
..
Test preparation:
create table date_dimension_table (
    date DateTime,
    calendar_year Int32,
    month_name String
) Engine = Memory;
create table fact_table (
    fact_date DateTime,
    other_dimension_id Int32
) Engine = Memory;
create table other_dimension_table (
    id Int32,
    code String
) Engine = Memory;
Test query:
SELECT
date_dimension_table.calendar_year,
date_dimension_table.month_name,
other_dimension_table.code
FROM date_dimension_table
,fact_table
,other_dimension_table
WHERE (fact_table.fact_date = date_dimension_table.date)
AND (date_dimension_table.calendar_year = 2019)
AND (date_dimension_table.month_name IN ('April', 'June'))
AND (fact_table.other_dimension_id = other_dimension_table.id)
AND (other_dimension_table.code IN ('code1', 'code2'))
GROUP BY
date_dimension_table.calendar_year,
date_dimension_table.month_name,
other_dimension_table.code
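The error occurs because 19.x rewrites the comma-join into JOIN ... ON and only accepts chains of equality expressions there, so the IN conditions that get pushed into the ON clause are rejected (you can see the pushed-down IN in the error message itself). If the generated SQL could be intercepted (the asker says it can't), a sketch of a rewrite that older versions accept is to filter each dimension in a subquery before joining:
SELECT
    d.calendar_year,
    d.month_name,
    o.code
FROM fact_table AS f
INNER JOIN
(
    SELECT date, calendar_year, month_name
    FROM date_dimension_table
    WHERE calendar_year = 2019 AND month_name IN ('April', 'June')
) AS d ON f.fact_date = d.date
INNER JOIN
(
    SELECT id, code
    FROM other_dimension_table
    WHERE code IN ('code1', 'code2')
) AS o ON f.other_dimension_id = o.id
GROUP BY
    d.calendar_year,
    d.month_name,
    o.code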

Related

I've performed a JOIN using bigrquery and the dbGetQuery function. Now I'd like to query the temporary table I've created but can't connect

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries, so my example code is for a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
bigrquery::bigquery(),
project = 'myproject',
dataset = 'dataset',
billing = 'myproject'
)
Then I performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
"SELECT
`myproject.dataset.table_1`.Pokemon,
coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
`myproject.dataset.table_1`.Total,
`myproject.dataset.table_1`.HP,
`myproject.dataset.table_1`.Attack,
`myproject.dataset.table_1`.Special_Attack,
`myproject.dataset.table_1`.Defense,
`myproject.dataset.table_1`.Special_Defense,
`myproject.dataset.table_1`.Speed
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended, and now I'd like to query that table, but, like... where is it? How do I connect? Can I save it locally so that I can start working on my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn@quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care, as there is no check on the size of the results, so it is easy to run out of memory when doing this.
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to write R and have it translated to SQL, you can do this using dbplyr. It would look something like this:
con <- dbConnect(...) # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")
new_remote_tbl = remote_tbl1 %>%
left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
mutate(Type_1 = coalesce(Type_1, Type_1.y),
Type_2 = coalesce(Type_2, Type_2.y)) %>%
select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disk - but you can query it and interact with it as if it were, and the database will produce it for you on demand.)
There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = new_remote_tbl %>%
collect()

EXTERNAL_QUERY suddenly started to return BYTE value instead of STRING

I'm using a query which joins external data through EXTERNAL_QUERY(), like this (this is just an example, not the actual one):
SELECT
ext.program_id,
SUM(price) AS total_price
FROM a_dataset.purchases pcs
LEFT OUTER JOIN (
SELECT
program_id,
version
FROM EXTERNAL_QUERY(
'CONNECTION_INFO',
'SELECT program_id, version FROM products'
)
) ext ON pcs.program_id = ext.program_id
This query used to work in my environment. However, as of today, this part:
EXTERNAL_QUERY(
'CONNECTION_INFO',
'SELECT program_id, version FROM products'
)
started to return BYTES values that look encrypted, and the query now fails with this message:
No matching signature for operator = for argument types: STRING, BYTES. Supported signatures: ANY = ANY at [37:9]
'CONNECTION_INFO' refers to a Cloud SQL read replica instance of MySQL.
Do you have any idea how to fix this, or why these return values changed?
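As a workaround sketch, assuming the MySQL column is UTF-8 text that is now surfacing as BYTES, the value can be converted explicitly on the BigQuery side so the join compares STRING with STRING (SAFE_CONVERT_BYTES_TO_STRING returns NULL for invalid UTF-8):
SELECT
    SAFE_CONVERT_BYTES_TO_STRING(program_id) AS program_id,
    version
FROM EXTERNAL_QUERY(
    'CONNECTION_INFO',
    'SELECT program_id, version FROM products'
)
Alternatively, the cast could be pushed into the MySQL statement itself, e.g. 'SELECT CAST(program_id AS CHAR) AS program_id, version FROM products'. A sudden switch to BYTES is also worth checking against recent type or collation changes on the source column.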

ERROR: column mm.geom does not exist in PostgreSQL execution using R

I am trying to run a model in R that calls functions from GRASS GIS (version 7.0.2) and PostgreSQL (version 9.5) to complete the task. I have created a database in PostgreSQL, created the PostGIS extension, and imported the required vector layers into the database using the PostGIS shapefile importer. Every time I try to run it from R (run as an administrator), it returns an error like:
Error in fetch(dbSendQuery(con, q, n = -1)) :
error in evaluating the argument 'res' in selecting a method for function 'fetch': Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: column mm.geom does not exist
LINE 5: (st_dump(st_intersection(r.geom, mm.geom))).geom as geom,
^
HINT: Perhaps you meant to reference the column "r.geom".
QUERY:
insert into m_rays
with os as (
select r.ray, st_endpoint(r.geom) as s,
(st_dump(st_intersection(r.geom, mm.geom))).geom as geom,
mm.legend, mm.hgt as hgt, r.totlen
from rays as r,bh_gd_ne_clip as mm
where st_intersects(r.geom, mm.geom)
)
select os.ray, os.geom, os.hgt, l.absorb, l.barrier, os.totlen,
st_length(os.geom) as shape_length, st_distance(os.s, st_endpoint(os.geom)) as near_dist
from os left join lut as l
on os.legend = l.legend
CONTEXT: PL/pgSQL function do_crtn(text,text,text) line 30 at EXECUTE
I have checked over and over again; the geometry column does exist under Schema > Public > Views in PostgreSQL. Any advice on how to resolve this error?
Add quotes and use r."geom" instead of r.geom.
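For example, the line the error points at would become (quoted identifiers matter in PostgreSQL when a column was created with a non-lowercase name):
(st_dump(st_intersection(r."geom", mm."geom"))).geom as geom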

Ambiguous error in Lookup Override for Teradata query

I am using the self-join query below as a Lookup override in Informatica.
SELECT A.region_cd AS REGION_CODE,
A.enp_no AS ENP_NBR,
B.sla_cd AS SLA_CODE
FROM edb_man_work.emp A,
edb_man_work.emp B
WHERE A.company_no = Trim(Cast(B.enp_no AS INTEGER))
AND A.region_cd = B.region_cd
This runs fine in Teradata, but when it runs in the mapping it fails with the error: Column SLA_CD is ambiguous.
I am not sure why this error occurs.
Since you're using multiple source tables, make sure you end the Lookup SQL override with --.
If you look at the session log, you'll see that Informatica automatically appends Lookup SQL overrides with an ORDER BY statement. Adding the -- will comment out this addition.
SELECT A.region_cd AS REGION_CODE,
A.enp_no AS ENP_NBR,
B.sla_cd AS SLA_CODE
FROM edb_man_work.emp A,
edb_man_work.emp B
WHERE A.company_no = Trim(Cast(B.enp_no AS INTEGER))
AND A.region_cd = B.region_cd
--
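Informatica concatenates its ORDER BY directly after the override text, so the trailing -- swallows it, and the statement Teradata actually receives ends roughly like this (the appended column list shown is illustrative):
AND A.region_cd = B.region_cd
-- ORDER BY REGION_CODE, ENP_NBR, SLA_CODE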

IBM DB2 simple query error 901 - system error

I'm working on an IBM iSeries V6R1M0 system.
I'm trying to execute a very simple query :
select * from XG.ART where DOS = 998 and (DES like 'ALB%' or DESABR like 'ALB%')
The columns are:
DOS -> numeric (3,0)
DES -> Graphic(80) CCSID 1200
DESABR -> Graphic(25) CCSID 1200
I get :
SQL State : 58004
SQL Code : -901
Message : [SQL0901] SQL System error.
Cause . . . . . : An SQL system error has occurred. The current SQL statement cannot be completed successfully. The error will not prevent other SQL statements from being processed. Previous messages may indicate that there is a problem with the SQL statement and SQL did not correctly diagnose the error. The previous message identifier was CPF4204. Internal error type 3107 has occurred. If precompiling, processing will not continue beyond this statement.
Recovery . . . : See the previous messages to determine if there is a problem with the SQL statement. To view the messages, use the DSPJOBLOG command if running interactively, or the WRKJOB command to view the output of a precompile. An application program receiving this return code may attempt further SQL statements. Correct any errors and try the request again.
If I change DES into REF (graphic(25)), it works...
EDIT:
I ran some tests this afternoon, and it is very strange:
Just after creating the table/indexes, I get no errors.
If I insert some data: error.
If I clear the table: error.
If I remove an index (see below): it works (with or without data)!
The index is:
create index XG.GTFAT_ART_B on XG.ART(
    DOS,
    DESABR,
    ART_ID
)
Edit 2:
Here is the job log (sorry, it is in French...).
It says:
Function error X'1720' in machine instruction. Internal snapshot ID 01010054
Foo file created in library QTEMP.
*** stuff with the printer
DBOP *** FAILED open. Exception from call to SLIC$
Internal error in the query processor file
Sql system error
I finally contacted IBM.
It was an old bug from V5.
I installed the latest PTF, and now it works.
You need to use the GRAPHIC scalar function to convert your character literals in the LIKE predicate.
CREATE TABLE QTEMP/TEST (F1 GRAPHIC(80))
INSERT INTO QTEMP/TEST (F1) VALUES (GRAPHIC('TEST'))
SELECT * FROM QTEMP/TEST WHERE F1 LIKE GRAPHIC('TE%')
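Applied to the original query, that would look like this (a sketch):
select * from XG.ART where DOS = 998 and (DES like GRAPHIC('ALB%') or DESABR like GRAPHIC('ALB%'))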
I know this guy got his problem fixed with an update. But here is something that worked for me that might work for the next person who has this problem.
My problem query had a lot of common table expressions, most of which did not produce very many records. So if I figured that the maximum number of records a CTE could produce was 1000, I added FETCH FIRST 9999 ROWS ONLY to it; I knew the CTE couldn't possibly have more rows than that. I guess the query optimizer had less to think about with that added (see the sketch below).
If you have this problem and don't have the option to upgrade or talk to IBM, I hope this helps.
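As an illustrative sketch (the table and column names here are made up), the change amounts to putting a known upper bound on each small CTE:
WITH small_cte AS (
    SELECT cust_no, region
    FROM mylib.customers
    WHERE region = 'EU'
    FETCH FIRST 9999 ROWS ONLY -- the CTE can never reach this bound anyway
)
SELECT c.cust_no, o.order_no
FROM small_cte c
JOIN mylib.orders o ON o.cust_no = c.cust_no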
For other people getting this error: I encountered it on an IBM i Series V7R3 when I tried an UPDATE that retrieved the value to set on a field using an inner SELECT, where multiple results were reduced to one using DISTINCT. I solved the problem by removing DISTINCT and adding FETCH FIRST 1 ROW ONLY at the end of the inner SELECT.
E.g.: changed from
UPDATE MYTABLE AS T1
SET T1.FIELD1 = (
SELECT DISTINCT T2.FIELD5
FROM MYTABLE AS T2
WHERE T1.FIELD2 = T2.FIELD2
AND T1.FIELD3 = T2.FIELD3
)
WHERE T1.FIELD4 = 'XYZ'
to
UPDATE MYTABLE AS T1
SET T1.FIELD1 = (
SELECT T2.FIELD5
FROM MYTABLE AS T2
WHERE T1.FIELD2 = T2.FIELD2
AND T1.FIELD3 = T2.FIELD3
FETCH FIRST 1 ROW ONLY
)
WHERE T1.FIELD4 = 'XYZ'