Issue when issuing a SELECT query to a Hive table in Prestodb

I am able to connect to my Hive metastore, and a DESCRIBE works:
DESCRIBE sample_07;
Query 20131113_025614_00005_af2fx, RUNNING, 1 node, 2 splits
Column | Type | Null | Partition Key
-------------+---------+------+---------------
code | varchar | true | false
description | varchar | true | false
total_emp | bigint | true | false
salary | bigint | true | false
(4 rows)
However, a SELECT does not work:
select * from sample_07;
2013-11-12T16:54:58.611-0800 DEBUG query-scheduler-7 com.facebook.presto.execution.QueryStateMachine Query 20131113_005458_00004_af2fx is PLANNING
Query 20131113_005458_00004_af2fx failed: java.io.IOException: Failed on local exception: com.facebook.presto.hadoop.shaded.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "sandbox.hortonworks.com/xx.xx.2.15"; destination host is: "sandbox.hortonworks.com":8020;
presto:default> 2013-11-12T16:56:04.771-0800 ERROR Stage-20131113_005458_00004_af2fx.1-219 com.facebook.presto.execution.SqlStageExecution Error while starting stage 20131113_005458_00004_af2fx.1 ~[guava-15.0.jar:na]
at com.facebook.presto.hive.HiveSplitIterable$HiveSplitQueue.computeNext(HiveSplitIterable.java:433) ~[na:na]
at com.facebook.presto.hive.HiveSplitIterable$HiveSplitQueue.computeNext(HiveSplitIterable.java:392) ~[na:na]
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-15.0.jar:na]
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-15.0.jar:na]
As you can tell, I am using Hortonworks' sandbox, so maybe that's the issue? Or is it choking on the IP address? I'm not completely sure I understand the problem.
cheers,
Matt

Your error message suggests that you are not running Presto against CDH4 but against the Hortonworks Sandbox, which I believe is Hadoop 2.2.0. There are known incompatibilities at this point. See this thread on the Presto Google Group for more information: https://groups.google.com/forum/#!topic/presto-users/lVLvMGP1sKE

Related

Apache Drill and Apache Kudu - not able to run "select * from <some table>" using Apache Drill, for the table created in Kudu through Apache Impala

I'm able to connect to Kudu through Apache Drill, and able to list tables fine. But when I have to fetch data from the table "impala::default.customer" below, I tried different options but none is working for me.
The table in Kudu was created through Impala-Shell as external table.
Initial connection to Kudu and listing objects
ubuntu@ubuntu-VirtualBox:~/Downloads/apache-drill-1.19.0/bin$ sudo ./drill-embedded
Apache Drill 1.19.0
"A Drill is a terrible thing to waste."
apache drill> SHOW DATABASES;
+--------------------+
| SCHEMA_NAME |
+--------------------+
| cp.default |
| dfs.default |
| dfs.root |
| dfs.tmp |
| information_schema |
| kudu |
| sys |
+--------------------+
7 rows selected (24.818 seconds)
apache drill> use kudu;
+------+----------------------------------+
| ok | summary |
+------+----------------------------------+
| true | Default schema changed to [kudu] |
+------+----------------------------------+
1 row selected (0.357 seconds)
apache drill (kudu)> SHOW TABLES;
+--------------+--------------------------------+
| TABLE_SCHEMA | TABLE_NAME |
+--------------+--------------------------------+
| kudu | impala::default.customer |
| kudu | impala::default.my_first_table |
+--------------+--------------------------------+
2 rows selected (9.045 seconds)
apache drill (kudu)> show tables;
+--------------+--------------------------------+
| TABLE_SCHEMA | TABLE_NAME |
+--------------+--------------------------------+
| kudu | impala::default.customer |
| kudu | impala::default.my_first_table |
+--------------+--------------------------------+
Now, when I try to run "select * from impala::default.customer", it does not run at all. These are my attempts:
>>>>>>>>>
apache drill (kudu)> SELECT * FROM `impala::default`.customer;
Error: VALIDATION ERROR: Schema [[impala::default]] is not valid with respect to either root schema or current default schema.
>>>>>>>>>
apache drill (kudu)> SELECT * FROM `default`.customer;
Error: VALIDATION ERROR: Schema [[default]] is not valid with respect to either root schema or current default schema.
Current default schema: kudu
[Error Id: 8a4ca4da-2488-4775-b2f3-443b8b4b17ef ] (state=,code=0)
Current default schema: kudu
[Error Id: ce96ea13-392f-4910-9f6c-789a6052b5c1 ] (state=,code=0)
apache drill (kudu)>
>>>>>>>>>
apache drill (kudu)> SELECT * FROM `impala`::`default`.customer;
Error: PARSE ERROR: Encountered ":" at line 1, column 23.
SQL Query: SELECT * FROM `impala`::`default`.customer
^
[Error Id: 5aacdd98-db6e-4308-9b33-90118efa3625 ] (state=,code=0)
>>>>>>>>>
apache drill (kudu)> SELECT * FROM `impala::`.`default`.customer;
Error: VALIDATION ERROR: Schema [[impala::, default]] is not valid with respect to either root schema or current default schema.
Current default schema: kudu
[Error Id: 5450bd90-dfcd-4efe-a8d3-b517be85b10a ] (state=,code=0)
>>>>>>>>>>>
In Drill conventions, the first part of the FROM clause is the storage plugin, in this case kudu. When you ran the SHOW TABLES query, you saw that the table name is actually impala::default.my_first_table. If I'm reading that correctly, that whole bit is the table name and the query below is how you should escape it.
Note the backtick before impala and after my_first_table, and nowhere else.
SELECT *
FROM kudu.`impala::default.my_first_table`
Does that work for you?
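The same quoting rule applied to the customer table from the question can be sketched in Python. This is only an illustration of the string that gets sent to Drill: quote_identifier and select_all are hypothetical helper names, not part of any Drill client, and the sketch refuses names that themselves contain a backtick rather than guess at an escaping rule.

```python
def quote_identifier(name):
    """Wrap a whole table name in backticks so '::' and '.' stay inside it."""
    if "`" in name:
        # Escaping embedded backticks is deliberately out of scope here.
        raise ValueError("backticks inside identifiers are not handled by this sketch")
    return f"`{name}`"


def select_all(plugin, table):
    """Build SELECT * with the storage plugin first, then the quoted table name."""
    return f"SELECT * FROM {plugin}.{quote_identifier(table)}"


# Table name exactly as SHOW TABLES printed it.
query = select_all("kudu", "impala::default.customer")
print(query)
```

The point is that the plugin name (kudu) stays outside the backticks, while everything SHOW TABLES reported in the TABLE_NAME column goes inside one pair.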

Why Query_parallelism affects the result of a join between two UUID columns

I'm running the following test on Ignite 2.10.0.
I have 2 tables created with query_parallelism=1 and without an affinity key.
When I join the 2 following tables, I get the expected result.
0: jdbc:ignite:thin://localhost:10800> SELECT "id" AS "_A_id", "source_id" AS "_A_source_id" FROM PUBLIC."source_ml_blue";
+--------------------------------------+--------------------------------------+
| _A_id | _A_source_id |
+--------------------------------------+--------------------------------------+
| 86c068cd-da89-11eb-a185-3da86c6c6bb3 | 86c068cc-da89-11eb-a185-3da86c6c6bb3 |
+--------------------------------------+--------------------------------------+
1 row selected (0.004 seconds)
0: jdbc:ignite:thin://localhost:10800> SELECT "id" AS "_B_id", "flx_src_ip_text" AS "_B_src_ip" FROM PUBLIC."source_nprobe_tcp_blue";
+--------------------------------------+-----------+
| _B_id | _B_src_ip |
+--------------------------------------+-----------+
| 86c068cc-da89-11eb-a185-3da86c6c6bb3 | 1.1.1.1 |
+--------------------------------------+-----------+
1 row selected (0.003 seconds)
0: jdbc:ignite:thin://localhost:10800> SELECT _A."id" AS "_A_id", _A."source_id" AS "_A_source_id", _B."id" AS "_B_id", _B."flx_src_ip_text" AS "_B_src_ip" FROM PUBLIC."source_ml_blue" AS "_A" INNER JOIN PUBLIC."source_nprobe_tcp_blue" AS "_B" ON "_A"."source_id"="_B"."id";
+--------------------------------------+--------------------------------------+--------------------------------------+-----------+
| _A_id | _A_source_id | _B_id | _B_src_ip |
+--------------------------------------+--------------------------------------+--------------------------------------+-----------+
| 86c068cd-da89-11eb-a185-3da86c6c6bb3 | 86c068cc-da89-11eb-a185-3da86c6c6bb3 | 86c068cc-da89-11eb-a185-3da86c6c6bb3 | 1.1.1.1 |
+--------------------------------------+--------------------------------------+--------------------------------------+-----------+
1 row selected (0.005 seconds)
If I drop and recreate the same tables with query_parallelism=8, I do not get an SQL error (the parallelism is equal on the 2 tables), BUT the result of the join is empty.
Any idea why I get this behavior?
You observe this behaviour because of optimisations for parallel query execution. Most likely your records landed in different partitions (each handled by a different thread). If you increase the number of records in both tables, you will see a subset of the expected join rows in the result.
The most elegant option here is to make "_A"."source_id" and "_B"."id" affinity keys. Enabling ignite.jdbc.distributedJoins will most likely affect performance on a clustered installation. Affinity collocation makes items with matching "_A"."source_id" and "_B"."id" reside in the same partition, which avoids cross-partition interaction (in clustered environments that interaction would mean additional network hops).
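To make the effect concrete, here is a toy model in Python. It is not Ignite's code: the integer keys, the "key modulo partition count" placement, and the helper names partition and local_join are all assumptions made for illustration. It only shows why a join that is evaluated partition by partition returns nothing when matching rows sit in different partitions, and why collocating on the join key fixes that.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Place each row in a partition chosen from the given column (toy rule: value % n_parts)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key] % n_parts].append(row)
    return parts


def local_join(a_parts, b_parts, a_key, b_key, n_parts):
    """Join only rows that live in the same partition, as a non-distributed join would."""
    out = []
    for p in range(n_parts):
        for a in a_parts.get(p, []):
            for b in b_parts.get(p, []):
                if a[a_key] == b[b_key]:
                    out.append((a, b))
    return out


# Hypothetical data: one row per table, A.source_id references B.id.
table_a = [{"id": 1, "source_id": 10}]
table_b = [{"id": 10, "src_ip": "1.1.1.1"}]
N = 8  # stands in for query_parallelism = 8

# Both tables placed by their own id: the matching rows end up in partitions 1 and 2.
not_colocated = local_join(partition(table_a, "id", N), partition(table_b, "id", N),
                           "source_id", "id", N)

# Table A placed by source_id instead: both rows end up in partition 2.
colocated = local_join(partition(table_a, "source_id", N), partition(table_b, "id", N),
                       "source_id", "id", N)

print(len(not_colocated), len(colocated))
```

With a single partition (N = 1) every row shares partition 0, which matches the observation that the join works with query_parallelism=1.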
The problem comes from the SQL client: it has to be aware of the parallelism.
In DBeaver, I had to enable ignite.jdbc.distributedJoins in the connection properties to make the query work properly.

Insufficient Resources error while inserting into SQL table using Vertica

I'm running a Python script to load data from a DataFrame into a SQL Table. However, the insert command is throwing this error:
(pyodbc.Error) ('HY000', '[HY000] ERROR 3587: Insufficient resources to execute plan on pool fastlane [Request exceeds session memory cap: 28357027KB > 20971520KB]\n (3587) (SQLExecDirectW)')
This is my code:
df.to_sql('TableName',engine,schema='trw',if_exists='append',index=False) #copying data from Dataframe df to a SQL Table
Can you do the following for me:
Run this command and share the output. MAXMEMORYSIZE, MEMORYSIZE and MAXQUERYMEMORYSIZE, plus PLANNEDCONCURRENCY, give you an idea of the (memory) budget at the time when the query / COPY command was planned.
gessnerm@gessnerm-HP-ZBook-15-G3:~/1/fam/fam-ostschweiz$ vsql -x -c \
"select * from resource_pools where name='fastlane'"
-[ RECORD 1 ]------------+------------------
pool_id | 45035996273841188
name | fastlane
is_internal | f
memorysize | 0%
maxmemorysize |
maxquerymemorysize |
executionparallelism | 16
priority | 0
runtimepriority | MEDIUM
runtimeprioritythreshold | 2
queuetimeout | 00:05
plannedconcurrency | 2
maxconcurrency |
runtimecap |
singleinitiator | f
cpuaffinityset |
cpuaffinitymode | ANY
cascadeto |
Then, you should dig the actual SQL command that your Python script triggered out of the QUERY_REQUESTS system table. It should be in the format of:
COPY <_the_target_table_>
FROM STDIN DELIMITER ',' ENCLOSED BY '"'
DIRECT REJECTED DATA '<_bad_file_name_>'
or similar.
Then: how big is the file (or are the files) you're trying to load in one go? If it is too big, then B.Muthamizhselvi is right: you'll need to split the data you load into smaller portions.
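As a rough sketch of what loading in portions can look like from Python: the snippet below batches rows and commits after each batch. It uses the standard-library sqlite3 module purely as a stand-in so it runs anywhere; the table demo_table, the batch size, and the helper names chunks and load_in_batches are hypothetical. Whether smaller batches are enough to stay under the session memory cap of the fastlane pool is something you would have to verify on your Vertica system.

```python
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(conn, rows, batch_size):
    """Insert rows batch by batch, committing after each one; returns the batch count."""
    n_batches = 0
    for batch in chunks(rows, batch_size):
        conn.executemany("INSERT INTO demo_table (id, value) VALUES (?, ?)", batch)
        conn.commit()
        n_batches += 1
    return n_batches


# Stand-in database and data (assumptions for the sketch, not the real trw schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_table (id INTEGER, value TEXT)")
rows = [(i, f"row-{i}") for i in range(10)]

batches = load_in_batches(conn, rows, batch_size=4)
total = conn.execute("SELECT COUNT(*) FROM demo_table").fetchone()[0]
print(batches, total)
```

If you stay with pandas, DataFrame.to_sql also accepts a chunksize argument that writes the rows in batches of that size, which may be the smallest change to the script in the question.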
Can you also run:
vsql -c "SELECT EXPORT_OBJECTS('','<schema>.<table>',FALSE)"
... and share the output? It could well be that you have too many projections for the available memory, or that you are sorting by too many columns.
Hope this helps for starters ...

BigQuery select error. Error: Unexpected. Please try again

When I run the following query I get this error: Error: Unexpected. Please try again.
It only happens when I run a query that returns a timestamp and a column named "profile", and only if I try to write the result into a destination table.
SELECT 'helllo' as [profile],
TIMESTAMP( '2014-10-22' ) as date_out
I have tried changing the column name from "profile" to something else and it works, but I really need it to be profile...
I see the expected result when I try your query with "allowLargeResults" disabled, both in the web UI and with the 'bq' command-line tool:
+---------+---------------------+
| profile | date_out |
+---------+---------------------+
| helllo | 2014-10-22 00:00:00 |
+---------+---------------------+
If "allowLargeResults" is enabled, however, this produces an internal error. This is a bug in BigQuery, and I've filed it with the team. Is it possible for you to work around the issue by disabling allowLargeResults?

ASP.NET / SQL Server - Timeout expired while searching

We have a table called Purchases:
| PRSNumber | ... | ... | ProjectCode |
| PRJCD-00001 | | | PRJCD |
| PRJCD-00002 | | | PRJCD |
| PRJCD-00003 | | | PRJCD |
| PRJX2-00003 | | | PRJX2 |
| PRJX2-00003 | | | PRJX2 |
Note: ProjectCode is the prefix of PRSNumber.
Before, when there was no ProjectCode field in the table, our former developers used this query to search for purchases with a specific supplier:
select * from Purchases where left(PRSNumber,5) = @ProjectCode
Yes, they take the first five characters of PRSNumber in order to obtain and compare the ProjectCode. Still, the code above works fine regardless of the table design.
But when I added a new field, ProjectCode, and used this query:
select * from Purchases where ProjectCode = @ProjectCode
I receive this exception:
Timeout expired. The timeout period elapsed prior to completion
of the operation or the server is not responding.
I can't believe that the first query, which has to extract a substring before the comparison, is faster than the second one, which has to do nothing but compare. Can you please tell me why this is happening?
Some information which might be helpful:
PRSNumber is varchar(11) and is the primary key
ProjectCode is nvarchar(10)
Both queries work fine in SQL Server Management Studio
The first query works in the ASP.NET website, but the second does not
ProjectCode is indexed
The table has 32k rows
Update
ProjectCode is now indexed, still no luck
The first thing I would do is check the index on PRSNumber. I assume there is an index on that field and that the table is very large.
Adding an index to your new field will likely fix the problem (if that is the case).
The code to add an index:
CREATE INDEX IX_Purchases_ProjectCode
ON dbo.Purchases (ProjectCode);
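One way to check whether a predicate can use such an index is to look at the query plan. The sketch below does this in Python with the standard-library sqlite3 module as a stand-in engine, so it says nothing definite about SQL Server's optimizer; the sample rows and the plan helper are assumptions for illustration, and substr stands in for left. On SQL Server you would inspect the execution plan instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchases (PRSNumber TEXT PRIMARY KEY, ProjectCode TEXT)")
conn.execute("CREATE INDEX IX_Purchases_ProjectCode ON Purchases (ProjectCode)")
conn.executemany(
    "INSERT INTO Purchases VALUES (?, ?)",
    [(f"PRJCD-{i:05d}", "PRJCD") for i in range(1, 4)] + [("PRJX2-00001", "PRJX2")],
)


def plan(sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Predicate wraps the column in a function: the engine has to look at every row.
prefix_plan = plan("SELECT * FROM Purchases WHERE substr(PRSNumber, 1, 5) = ?", ("PRJCD",))

# Predicate compares the indexed column directly: the engine can seek in the index.
column_plan = plan("SELECT * FROM Purchases WHERE ProjectCode = ?", ("PRJCD",))

print(prefix_plan)
print(column_plan)
```

In this stand-in the direct comparison is the one that uses the index, which is why a timeout on the second query points at something other than the predicate itself, such as the datatype of the parameter.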
Update:
I would also try adding the field as a varchar to eliminate the datatype change from the equation.
I set the CommandTimeout property of my SqlCommand higher instead of making the query faster. It didn't solve the speed but solved the timeout issue.