How do I force Hue to run blocks of code in sequence? (Hive)

I am having some issues with Hue when running scripts that generate multiple tables in sequence: Hue executes the blocks in a semi-random order, or raises errors, either immediately (in the case where table_4 fetches columns from table_3) or in the middle of execution, apparently for a myriad of reasons.
Example:
create table table_1
<table code>;
create table table_2
<table code>;
create table table_3
<table code>;
create table table_4
<table code
from table_3>;
create table table_5
<table code
from table_4>;
I ran a script with this structure just this evening, and it ran only the blocks for table_1 and table_3, then stopped.
In other similar cases, the code would not run at all because table_3 and table_4 did not exist yet; in other cases, it started blocks out of sequence and hit errors due to tables not existing, along with other seemingly random errors.
Is there any way to force Hue to evaluate and execute each block strictly in sequence, one at a time? I am running close to 100 scripts that each take between 5 and 30 minutes to execute, so running each block as a separate script is not really a feasible answer, unfortunately.

Related

How do I lock out writes to a specific table while several queries execute?

I have a table set up in my SQL Server that keeps track of inventory items (in another database) that have changed. This table is fed by several different triggers. Every 15 minutes a scheduled task runs a batch file that executes a number of different queries that send updates on the items flagged in this table to several ecommerce websites. The last query in the batch file resets the flags.
As you can imagine, there is potential to lose changes if an item is flagged while this batch file is running. I have worked around this by replaying the last 25 hours of updates every 24 hours, just in case this scenario happens. It works, but IMO it is kind of clumsy.
What I would like to do is delay any writes to this table until my script finishes and resets the flags on all the rows that were flagged when the script started, and then allow all of the delayed writes to happen.
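The setup looks roughly like this (table, column, and trigger names below are simplified placeholders, not my actual schema):

-- Flag table tracking inventory items that have changed.
CREATE TABLE dbo.ChangedItems (
    ItemID    int      NOT NULL PRIMARY KEY,
    FlaggedAt datetime NOT NULL DEFAULT GETDATE()
);

-- One of the several triggers that feed the flag table.
CREATE TRIGGER trg_Inventory_Update ON dbo.Inventory
AFTER UPDATE
AS
BEGIN
    INSERT INTO dbo.ChangedItems (ItemID)
    SELECT i.ItemID
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ChangedItems AS c WHERE c.ItemID = i.ItemID);
END;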
I've looked into doing this with table hints (TABLOCK), but this seems to be limited to one query, unless I'm misunderstanding what I have read, which is certainly possible. I have several queries that run in succession. TIA.
Alex
Could you modify your script into a stored procedure that extracts all the data into a temporary table, using a select statement that applies a lock to the production table? You could then drop your lock on the main table and do all your processing in the temporary table (or a permanent table built for the purpose), away from the live system. It will be slower and put more load on your SQL box, but speed shouldn't be an issue if you have a point-in-time snapshot of it.
If that option is not applicable, then maybe you could play with wrapping the whole thing in a transaction and putting a table lock on your production table with the first select statement.
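A rough sketch of that transaction approach, using the hypothetical flag table from above (the processing in the middle is elided):

BEGIN TRANSACTION;

-- Take an exclusive table lock up front and hold it until COMMIT;
-- any writes to the flag table will wait until then.
SELECT ItemID
INTO #Snapshot
FROM dbo.ChangedItems WITH (TABLOCKX, HOLDLOCK);

-- ... run the ecommerce update queries against #Snapshot here ...

-- Reset only the rows that were flagged when the script started.
DELETE c
FROM dbo.ChangedItems AS c
INNER JOIN #Snapshot AS s ON s.ItemID = c.ItemID;

COMMIT TRANSACTION;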
Good luck mate

table to table copy command with where condition

Is there any way to write a COPY command that copies data directly from one table into another (ideally with some condition)?
From what I have observed, COPY performance is far better than that of INSERT INTO in Vertica, so I am trying to replace the INSERT INTO with a COPY command.
Thanks!!
What you want to do is an INSERT /*+ DIRECT */ INTO table2 SELECT ... FROM table1 WHERE .... The direct hint will make it do a direct load to ROS containers instead of through WOS. If you are doing large bulk loads, this would be fastest. If you are doing many small insert/selects like this, then it would be best to use WOS and leave out the DIRECT.
Another possibility would be to do a CREATE TABLE table2 AS SELECT ... FROM table1 WHERE ....
Finally, if you are really just copying all the data and not filtering (which I know isn't your question, but I'm including this for completeness) and the tables are partitioned, you can use COPY_PARTITIONS_TO_TABLE, which just creates references from the source table's ROS containers in the target table. Any changes to the new table would reorganize the ROS containers over time (using the Tuple Mover, etc.; containers wouldn't get cleaned up unless both tables reorganized them).
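Spelled out, with placeholder table names, column names, and partition-key values:

-- Large bulk load: write straight to ROS containers.
INSERT /*+ DIRECT */ INTO table2
SELECT col1, col2
FROM table1
WHERE col3 > 100;

-- Or build the target table from the query in one step.
CREATE TABLE table2 AS
SELECT col1, col2
FROM table1
WHERE col3 > 100;

-- Full, unfiltered copy of one partition range by reference.
SELECT COPY_PARTITIONS_TO_TABLE('table1', '2014', '2014', 'table2');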

SQL - Table not found after backup

I saved a SQL table before deleting some information from it, with the SQL statement:
select * into x_table from y_table
After doing some operations, I want to get back some information from the table I saved with the query above. Unfortunately, SQL Server Management Studio shows an error saying that the table does not exist.
However, when I write a drop statement, the table is recognized: its name is not underlined.
Any idea why the table is recognized by the drop table statement but not by the select statement? This seems strange to me.
It may be that the table isn't underlined in your drop table command because its name is still in your IntelliSense cache. Select Edit -> IntelliSense -> Refresh Local Cache in SSMS (or just press Ctrl+Shift+R) and see if the table name is underlined then.
Edit:
Another possibility is that your drop table command might be in the same batch as another statement that creates the table, in which case SSMS won't underline it because it knows that even though the table doesn't exist now, it will exist by the time that command is executed. For instance:
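(The original answer showed a screenshot of such a script; it is missing here, so the following is a reconstruction based on the line numbers described below:)

create table one (id int);        -- line 1
drop table one;                   -- line 2
                                  -- line 3 (blank)
                                  -- line 4 (blank)
create table two (id int);        -- line 5
drop table two;                   -- line 6: two is NOT underlined
                                  -- line 7 (blank)
--create table three (id int);    -- line 8: creation commented out
drop table three;                 -- line 9: three IS underlined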
None of the tables one, two, or three existed in my database. If I highlight line 6 and try to run it by itself, it will fail. Yet you can see that two is not underlined on line 6, because SSMS can see that if I run the whole script, the table will be created on line 5. On the other hand, three is underlined on line 9, because I commented out the code that would have created it on line 8.
All of that said, I think we might be making too much of this problem. If you try to select from a table and SQL Server tells you it doesn't exist, then it doesn't exist. You can't rely on IntelliSense to tell you that it does; the two examples above are probably not the only ways that IntelliSense might mislead you about the current status of a table.
If you want the simplest way to know whether an object with a given name (like x_table) exists, just use:
select object_id('x_table');
If this query returns null, x_table doesn't exist, regardless of what IntelliSense is telling you. If it returns non-null, then there is some object out there with that name, and then the real question is why your select statement is failing. And to answer that, I'd need to see the statement.
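For example, as a guard before querying (x_table as in the question):

if object_id('x_table') is not null
    select * from x_table;
else
    print 'x_table does not exist';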
As a lot of posts like this point out, you have to copy in two statements (note that this is MySQL syntax; it won't run as-is on SQL Server):
CREATE TABLE newtable LIKE oldtable;
INSERT INTO newtable SELECT * FROM oldtable;

SQL Server Insert Statement Hangs

Really hoping somebody can offer some advice here. I have a SQL Statement with the following structure:
SELECT <Columns> FROM (SELECT <More columns> FROM (SELECT <AllColumns> FROM View))
Some of the steps might seem unneeded, but there is a reason for all of them.
The view contains data that is automatically updated each time our import processes run. It contains a rather large query that gathers data from several tables.
From that view, our clients can build their own queries in order to create SSAS databases for reporting purposes; hence the second query. Some additional logic, like friendly names, is also applied here.
Sometimes our clients would like to inspect the data up close, and bringing back a 2 GB result set over a web application is rarely a good idea. So we have built a facility where the clients can select the data they would like returned from the main query; hence the outer query.
Here is an example query:
SELECT ColA, ColB, ColC
FROM (SELECT ColA, ColB, ColC, ColD, ColE, ColF
      FROM (SELECT * FROM FullView) AS inner_query) AS middle_query
All of this works correctly, but in one specific case there is a problem. In order to make data retrieval as fast as possible, we create a summary table in the database containing the results of the middle query. However, for one of our queries the summary table refuses to build: the statements successfully create the table with the correct columns, but the data never gets populated; the SQL statement simply hangs.
I have tried the following:
Originally the statement that creates the summary table was a SELECT INTO statement. It worked for all the queries except the one in question.
Thinking that for some obscure reason resources might not get released, I split the statement into a SELECT INTO statement with a WHERE clause of 1=2 (allowing the empty table to be created) and an INSERT INTO ... SELECT statement to load the data. The SELECT INTO statement runs perfectly; the INSERT INTO ... SELECT does nothing.
However, if I run just the SELECT part of the INSERT INTO statement, the query executes in under a minute, which is acceptable, as that is how long the original view takes. But if I add the INSERT INTO clause at the front, without changing any part of the SELECT statement, the query hangs. Maybe it would complete eventually, but a query that returns less data going from a minute to several hours might irritate some people.
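For reference, the split looks roughly like this (the view and column names are placeholders):

-- Step 1: create the empty summary table; WHERE 1 = 2 returns no rows.
SELECT ColA, ColB, ColC
INTO dbo.SummaryTable
FROM dbo.MiddleQueryView
WHERE 1 = 2;

-- Step 2: populate it. This is the statement that hangs for the problem query.
INSERT INTO dbo.SummaryTable (ColA, ColB, ColC)
SELECT ColA, ColB, ColC
FROM dbo.MiddleQueryView;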
I have looked at the sp_who2 results and there is no blocking in the system.
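(The check was essentially this; a non-dot BlkBy value in sp_who2, or any row from the DMV query below, would indicate blocking:)

EXEC sp_who2 'active';

-- Or via the DMVs:
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;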
Does anybody have any advice or another place I can look for the error?
Thanks!

Creating a hive table with ~40K columns

I'm trying to create a fairly large table, ~3 million rows and ~40K columns, using Hive. To begin, I'm creating an empty table and then inserting the data into it.
However, I hit an error when trying this:
Unable to acquire IMPLICIT, SHARED lock default after 100 attempts. FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time
The query is pretty straightforward:
create external table database.dataset (
var1 decimal(10,2),
var2 decimal(10,2),
...
var40000 decimal(10,2)
) location 'hdfs://nameservice1/root/user1/project1';
Has anybody seen this error before? Cloudera says there is no limit on the number of columns, but I am clearly hitting some system limitation here.
Additionally, I can create a smaller hive table in the specified location.
Ran across this blog post which appears to identify and fix the problem: http://gbif.blogspot.com/2014/03/lots-of-columns-with-hive-and-hbase.html
Short answer: there is a limit on the number of characters Hive can store for a table's column metadata (the PARAM_VALUE column of the SERDE_PARAMS table in the metastore database), but you can raise it by widening that column in the metastore database (PostgreSQL syntax):
alter table "SERDE_PARAMS" alter column "PARAM_VALUE" type text;
Untested, as I went with a different tool to handle the data (for the problem above) since Hive was failing for unknown reasons. If you come across something similar, try this out and give an update, please.