Apache drill Error Index out of bounds

Apache drill Error Index out of bounds - hive

Im performing queries with apache drill over a 100GB dataset in relatively small cluster (4 nodes with 16GB). When i try to run a certain query it gives me the following error:
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 88, length: 4
(expected: range(0, 64)).
Im using the standard TPCH-H data model, the error happens in Query 12.The same query ran successfully when i was working with a smaller dataset. Can someone give some sugestion why this happens?
Query:
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
pq_orders,
pq_lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('REG AIR', 'MAIL')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= '1995-01-01'
and l_receiptdate < '1996-01-01'
group by
l_shipmode
order by
l_shipmode;

Related

Trying to grab data from two columns and format them properly

So I have a database here with a table that lists off whether or not certain processes have failed. There are two columns, IsProcessed, and IsFailed. A failed process can still be considered processed if the error was handled, but I still need to recognize that it failed. They're both bit values, and so I have to try and grab and separate them despite that they may depend on one another. After they've been separated out, I need to count the relative successes and relative failures.
I utilize an AND statement in my WHERE clause to try and separate out the successes from the failures. I honestly have no idea where to go from here.
SELECT CAST(PQ.ProcessedDate AS Date) AS Date, COUNT(PQ.IsProcessed) AS Successes
FROM PQueue PQ
WHERE PQ.ProcessDate BETWEEN '2019-10-1' AND '2019-10-31' AND PQ.IsFailed = 0 AND PQ.IsProcessed = 1
GROUP BY CAST(PQ.ProcessDate AS Date)
ORDER BY CAST(PQ.ProcessDate AS Date) ASC
Because a failed process can still be processed in the system, we have to do a check first to try and grab the data that was processed but didn't flag a failure. Now I need to try and find a way to not exclude the failures, but include them and place them in a group. I can do the group part, but I'm relatively new to SQL so I don't know whether or not I can place something in an IF statement somewhere or try to use variables to get this done. Thank you in advance.

You seem to want conditional aggregation:
SELECT CAST(PQ.ProcessedDate AS Date) AS Date,
SUM(CASE WHEN PQ.IsFailed = 0 AND PQ.IsProcessed = 1 THEN 1 ELSE 0 END) as Successes,
SUM(CASE WHEN PQ.IsFailed = 1 AND PQ.IsProcessed = 1 THEN 1 ELSE 0 END) as Fails
FROM PQueue PQ
WHERE PQ.ProcessDate BETWEEN '2019-10-1' AND '2019-10-31'
GROUP BY CAST(PQ.ProcessDate AS Date)
ORDER BY CAST(PQ.ProcessDate AS Date) ASC

If SQL Server then maybe a CASE statement would help you out.
eg
SELECT ...........
CASE
WHEN IsFailed = 1 AND IsProcessed = 1 THEN "Processed But Failed"
WHEN IsFailed = 0 AND IsProcessed = 0 THEN "Not Processed"
WHEN IsFailed = 0 AND IsProcessed = 1 THEN "Processed Succesfully"
WHEN IsFailed = 1 AND IsProcessed = 0 THEN "Failed"
END as REsult

Hive query with except conditions

I am trying to build a hive query that does only the below features or a combination of these features. For example, the features include
name = "summary"
name = "details"
name1 = "vehicle stats"
Basically, the query should exclude all the other features in name and name1.
I am quite new to hive. In sql, i know this can be done using except keyword. Just wondering whether there is some functions that can achieve the same.
Thanks very much !!

If I understand correctly, I approach this using group by and having:
select ?
from t
group by ?
having sum(case when name = 'summary' then 1 else 0 end) > 0 and
sum(case when name = 'details' then 1 else 0 end) > 0 and
sum(case when name1 = 'vehicle_stats' then 1 else 0 end) > 0;
The ? is for the column that you want the summary of.

Hive summary function inside case statement

I am trying to write a simple Hive query:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) then 1 else 0)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 where p_wk_end_d = 2014-06-28;
Here pit_sls_q and pot_sls_q both are columns in the Hive table and I want proportion of records which have pot_sls_q more than 2 times average of pit_sls_q. However I get error:
FAILED: SemanticException [Error 10128]: Line 1:95 Not yet supported place for UDAF 'avg'
To fool around I even tried using some window function:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) over (partition by dept_i,class_i) then 1 else 0 end)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 and p_wk_end_d = '2014-06-28';
which is fine considering the fact filtering or partitioning the data on same condition is "same" data essentially but even with this I get error:
FAILED: SemanticException [Error 10002]: Line 1:36 Invalid column reference 'avg': (possible column names are: p_wk_end_d, dept_i, class_i, item_i, pit_sls_q, pot_sls_q)
please suggest right way of doing this.

You are using AVG inside SUM which won't work (along with other syntax errors).
Try analytic AVG OVER () this:
select sum(case when pot_sls_q > 2 * avg_pit_sls_q then 1 else 0 end) / count(*)
from (
select t.*,
avg(pit_sls_q) over () avg_pit_sls_q
from prd_inv_fnd.item_pot_sls t
where dept_i = 43
and class_i = 3
and p_wk_end_d = '2014-06-28'
) t;

Query optimization with 3000000 in single date oracle

Table x contains millions of rows and I have to fetch data for single date using function based index(trunc).
Single date data for eg, for 22-07-16 we have 3000000 rows. I am also using case for sum of columns. Query taking 18 sec. How I can reduce time.
EDIT
QUERY:
SELECT SUM(
CASE
WHEN cssgoldenc1_.impact='Low'
THEN 1
ELSE 0
END) AS col_0_0_,
SUM(
CASE
WHEN cssgoldenc1_.impact='High'
THEN 1
ELSE 0
END) AS col_1_0_
FROM CSSCOMPLIANCEDETAIL csscomplia0_,
CSSGoldenConfiguration cssgoldenc1_,
CSS css7_
WHERE csscomplia0_.cssGoldenConfigurationID_FK=cssgoldenc1_.CSSGoldenConfigurationId_PK
AND csscomplia0_.cssID_FK =css7_.cssId_PK
AND (cssgoldenc1_.cmcategory IN ('Access List','Application of QoS Policy','Archive','BFD','BGP', 'CPU','Clock','Debug','Default settings','Entity Check','IGP Routing','Inclusion in VRF', 'Interface Parameters','LDP','LDP Establishment','License','Logging/Syslog/Debug','MTU Size', 'Multicast','Multilink','NodeReadiness','Nomenclature Related','Performance Optimization', 'QoS','Router OAM','Routing','SNMP','Security','Services','System Recovery', 'Type of Interface','Unicast','Unrequired Services','mBGP'))
AND TRUNC(csscomplia0_.creationDate) =to_Date('22-07-16','dd-mm-yy')
AND (css7_.softwareVersion IN ('/asr920-universalk9.V155_1S2_SR635680903_6.bin', '/asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin','/asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin', '/asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','/asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin', '/bootflash','asr901-universalk9-mz.155-3.S1a.bin','asr903rsp1-universalk9_npe.V155_1_S2_SR635680903_10.bin', 'asr920-universalk9.V155_1S2_SR635680903_6.bin','asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin', 'asr920-universalk9_npe.03.13.00z.S.154-3.S0z-ext.bin','asr920-universalk9_npe.03.14.02.S.155-1.S2-', 'asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin','asr920-universalk9_npe.03.15.01.S.155-2.S1-std.bin', 'asr920-universalk9_npe.03.16.01a.S.155-3.S1a-ext.bin','asr920-universalk9_npe.2016-05-10_07.53_saappuku.bin' ,'asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin',
'asr920-universalk9_npe.V155_1_S2_SR635680903_6.binn','bootflash'));
Index :
create index idx_fnc on CSSCOMPLIANCEDETAIL(trunc(creationDate));

Try this. Basically I took the CASE to a subquery since this way it shouldn't be evaluated 3M times. I also change the query in order to use JOIN
with cssgoldenc1_ as
(select /*+ Materialize */ CASE WHEN impact='Low' THEN 1
ELSE 0 END AS col_0_0_,
CASE WHEN impact='High' THEN 1
ELSE 0 END AS col_1_0_,
CSSGoldenConfigurationId_PK
from CSSGoldenConfiguration
where cssgoldenc1_.cmcategory IN ('Access List','Application of QoS Policy','Archive','BFD','BGP', 'CPU','Clock','Debug','Default settings','Entity Check','IGP Routing','Inclusion in VRF', 'Interface Parameters','LDP','LDP Establishment','License','Logging/Syslog/Debug','MTU Size', 'Multicast','Multilink','NodeReadiness','Nomenclature Related','Performance Optimization', 'QoS','Router OAM','Routing','SNMP','Security','Services','System Recovery', 'Type of Interface','Unicast','Unrequired Services','mBGP')
)
SELECT SUM(col_0_0_) AS col_0_0_,
SUM(col_1_0_) AS col_1_0_
FROM CSSCOMPLIANCEDETAIL csscomplia0_ join cssgoldenc1_ on csscomplia0_.cssGoldenConfigurationID_FK = cssgoldenc1_.CSSGoldenConfigurationId_PK
join CSS css7_ on csscomplia0_.cssID_FK = css7_.cssId_PK
WHERE TRUNC(csscomplia0_.creationDate) =to_Date('22-07-16','dd-mm-yy')
AND css7_.softwareVersion IN ('/asr920-universalk9.V155_1S2_SR635680903_6.bin', '/asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin','/asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin', '/asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','/asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin', '/bootflash','asr901-universalk9-mz.155-3.S1a.bin','asr903rsp1-universalk9_npe.V155_1_S2_SR635680903_10.bin', 'asr920-universalk9.V155_1S2_SR635680903_6.bin','asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin', 'asr920-universalk9_npe.03.13.00z.S.154-3.S0z-ext.bin','asr920-universalk9_npe.03.14.02.S.155-1.S2-', 'asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin','asr920-universalk9_npe.03.15.01.S.155-2.S1-std.bin', 'asr920-universalk9_npe.03.16.01a.S.155-3.S1a-ext.bin','asr920-universalk9_npe.2016-05-10_07.53_saappuku.bin' ,'asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin',
'asr920-universalk9_npe.V155_1_S2_SR635680903_6.binn','bootflash');

Trouble counting null and missing values (and differentiating between the two) using RODBC package

I am working to create a a matrix of missingness for a SQL database consisting of 5 tables and nearly 10 years of data. I have established ODBC connectivity and am using the RODBC package in R as my working environment. I am trying to write a function that will output a count of rows for each year for each table, a count and percent of null values (values not present) in a given year for a given table, and a count and percent of missing (questions skipped/not answered) values for a given table. I am working with the code below, trying to get it to work on one variable then turning it into a function once it works. However, when I run this code(see below), it appears to not be working, and I believe the issue lies with assigning an integer value to the character for null, NA. I am getting this message when trying to list vars in the function:
Error in as.environment(pos) : no item called "22018 245 [Microsoft][ODBC SQL Server Driver][SQL Server]Conversion failed when converting the varchar value 'NA' to data type int." on the search list.
Also, when I try to find the environment for the function, R returns NULL. I do not necessarily want to assign a new value to the already existent variable, and I new to SQL, but I am trying to do something along these lines If X = 'NA' then Y = 1 else 0. I get the following error message when I try to run the final 2 lines creating the percent vars:
Error in eval(substitute(expr), data, enclos = parent.frame()) : invalid 'envir' argument of type 'character'
Any insight?
test1 <- sqlQuery(channel, "select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = 'NA' THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]")
test1$nullpct<-with(test1, NULL_VAL/TOTAL)
test1$misspct<-with(test1, MISS_VAL/TOTAL)

I believe the data type of your column MOTHER_EDUCATION_TRENDABLE is an integer, if so, try:
select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE IS NULL THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Apache drill Error Index out of bounds - hive

Related

Trying to grab data from two columns and format them properly

Hive query with except conditions

Hive summary function inside case statement

Query optimization with 3000000 in single date oracle

Trouble counting null and missing values (and differentiating between the two) using RODBC package

Categories

Resources