Find columns with non-zero values in Hive - sql

Let's say you have a table with n columns, col1 to coln, where n is large.
Would it be possible to find all rows such that at least one column from colk to coln has a non-zero value (assuming the columns hold non-negative numbers and some values may be missing)?

You could use Hive's coalesce function (which returns the first non-null value among its arguments) combined with an if() expression for each column, something like:
select *
from table
where coalesce(if(col1 > 0, 1, null), if(col2 > 0, 1, null)...) = 1
;
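This returns every row in which at least one of the columns listed in coalesce() is greater than zero: that column's if() yields 1, so coalesce() returns 1. If every listed column is zero or NULL, coalesce() returns NULL and the row is filtered out. Let me know if this works for you.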
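The coalesce trick can be checked outside Hive. This is a minimal sketch using Python's built-in sqlite3 module; SQLite has no Hive-style if(), so `CASE WHEN ... THEN 1 END` (which likewise yields NULL when the test is false or the value is NULL) stands in for it. The table name `tb`, the four columns standing in for colk..coln, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: col1..col4 stand in for colk..coln; None models a missing value.
ROWS = [
    (1, 0, 0, 0, 0),              # all zeros: excluded
    (2, 0, None, 3, 0),           # one positive value: included
    (3, None, None, None, None),  # all missing: excluded
    (4, 5, 0, None, 7),           # several positive values: included
]

def rows_with_nonzero(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb (id INTEGER, col1, col2, col3, col4)")
    conn.executemany("INSERT INTO tb VALUES (?, ?, ?, ?, ?)", rows)
    # Each CASE yields 1 for a positive value and NULL otherwise (including NULL input),
    # so coalesce() is 1 exactly when at least one column is > 0.
    query = """
        SELECT id FROM tb
        WHERE coalesce(
            CASE WHEN col1 > 0 THEN 1 END,
            CASE WHEN col2 > 0 THEN 1 END,
            CASE WHEN col3 > 0 THEN 1 END,
            CASE WHEN col4 > 0 THEN 1 END
        ) = 1
        ORDER BY id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

print(rows_with_nonzero(ROWS))  # [2, 4]
```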
Edit: Another method which is a little cleaner (doesn't require you to list all columns) but less flexible:
select *
from table tb
where sort_array(array(tb.*))[n-1] > 0
;
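The above sorts the array in ascending order, so element [n-1] (Hive arrays are zero-indexed) is the largest value in the row; checking that it is greater than zero returns only the matching rows. Note that tb.* puts every column of the row into the array, not just colk to coln, which is why this method is less flexible.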
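The reasoning behind the sort_array() version is that "the largest value is greater than zero" is the same test as "some value is greater than zero". A small Python sketch of that equivalence follows; dropping missing values before sorting is an assumption of this sketch, not a statement about how Hive's sort_array() orders NULLs.

```python
def largest_is_positive(values):
    # Drop missing values (None); this sketch assumes they should be ignored.
    present = sorted(v for v in values if v is not None)  # ascending, like sort_array()
    # present[-1] plays the role of [n-1]: the largest value in the row.
    return bool(present) and present[-1] > 0

def any_positive(values):
    return any(v is not None and v > 0 for v in values)

rows = [[0, 0, 0], [0, None, 3], [None, None, None], [5, 0, 7]]
for row in rows:
    # max(row) > 0 holds exactly when some value in the row is > 0.
    assert largest_is_positive(row) == any_positive(row)

print([largest_is_positive(r) for r in rows])  # [False, True, False, True]
```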

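Why use such complicated queries? A plain filter also works (note that it checks only the single column COL_NAME), either appending to the target table or overwriting it:
INSERT INTO TABLE DB_1.table_name_1
SELECT * FROM DB_2.table_name_2
WHERE table_name_2.COL_NAME > 0;
INSERT OVERWRITE TABLE DB_1.table_name_1
SELECT * FROM DB_2.table_name_2
WHERE table_name_2.COL_NAME > 0;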

Related

Function to sum all first values of Result array - SQL

I have a table with "Number", "Name" and "Result" columns. Result is a 2D text array, and I need to create a column named "Average" that sums all first values of the Result array and divides by 2. Can somebody help me, please? I must use CREATE FUNCTION for this. It looks like this:
Table1
Number | Name  | Result                       | Average
01     | Kevin | {{2.0,10},{3.0,50}}          | 2.5
02     | Max   | {{1.0,10},{4.0,30},{5.0,20}} | 5.0

Average (Kevin) = (2.0 + 3.0) / 2 = 2.5
Average (Max)   = (1.0 + 4.0 + 5.0) / 2 = 5.0
First of all: you should avoid storing arrays in a table (or generating them in a subquery unless absolutely necessary). Normalize the data; it makes life much easier in nearly every use case.
Second: you should avoid multi-dimensional arrays. They are very hard to handle. See "Unnest array by one level".
However, in your special case you could do something like this:
SELECT
number,
name,
SUM(value::numeric) FILTER (WHERE idx % 2 = 1) / 2 AS average -- 2
FROM mytable,
unnest(result) WITH ORDINALITY AS elements(value, idx) -- 1
GROUP BY number, name
1. unnest() expands the array into one record per element. This is not a one-level expansion: it flattens ALL elements, at every depth. To keep track of the elements' positions, add an index using WITH ORDINALITY.
2. Because you have nested two-element arrays, the unnested data can be used as follows: you want to sum the first of every two elements, i.e. every odd-numbered element. The FILTER clause in the aggregate lets you aggregate exactly these elements.
However: if the array was the result of a subquery, you should consider doing the calculation BEFORE the array aggregation (if that aggregation is really necessary). This makes things easier.
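To make the unnest / WITH ORDINALITY / FILTER steps concrete, here is a small Python model of the same logic: flatten the nested array completely, number the elements from 1, and sum only the odd positions. It is an illustration of the reasoning, not PostgreSQL itself; storing the elements as strings mirrors the question's text array and is an assumption of the sketch.

```python
def flatten(arr):
    # Like unnest() on a multi-dimensional array: yields every leaf element, in order.
    for item in arr:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

def average(result):
    total = 0.0
    # enumerate(..., start=1) mimics WITH ORDINALITY (1-based idx).
    for idx, value in enumerate(flatten(result), start=1):
        if idx % 2 == 1:  # FILTER (WHERE idx % 2 = 1): the first element of each pair
            total += float(value)  # mirrors the value::numeric cast
    return total / 2

kevin = [["2.0", "10"], ["3.0", "50"]]
max_ = [["1.0", "10"], ["4.0", "30"], ["5.0", "20"]]
print(average(kevin), average(max_))  # 2.5 5.0
```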
Assumptions:
The number column is the primary key.
The result column is of type text or varchar.
Here are the steps for your requirements:
Add the column to your table with the following query (skip this step if the column already exists):
alter table table1 add column average decimal;
Then fill in the calculated values with the query below:
update table1 t1
set average = t2.value_
from
(
select
number,
sum(t::decimal)/2 as value_
from table1
cross join lateral unnest((result::text[][])[1:999][1]) as t
group by 1
) t2
where t1.number=t2.number
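Explanation: unnest((result::text[][])[1:999][1]) returns the first value of each child array. The slice assumes at most 999 child arrays in the 2D array; increase or decrease the upper bound as needed. Because one subscript is a slice, PostgreSQL treats every subscript as a slice, so the lone [1] means [1:1] and the result is still an array that unnest() can expand.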
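The slice [1:999][1] can be modelled in Python as "take at most 999 child arrays, then element 1 of each". This sketch mirrors PostgreSQL's documented behaviour of trimming a slice that only partially overlaps the array bounds; the 1-based subscripts become 0-based list indexing. The parameter name `max_children` is hypothetical.

```python
MAX_CHILDREN = 999  # the upper bound used in the slice; adjust as in the answer

def first_values(result, max_children=MAX_CHILDREN):
    # [1:999][1]: child arrays 1..999, trimmed to the real length, keeping element 1 of each.
    # Python lists are 0-based, hence [:max_children] and child[0].
    return [child[0] for child in result[:max_children]]

def average(result, max_children=MAX_CHILDREN):
    # sum(t::decimal) / 2 over the unnested first values
    return sum(float(v) for v in first_values(result, max_children)) / 2

kevin = [["2.0", "10"], ["3.0", "50"]]
max_ = [["1.0", "10"], ["4.0", "30"], ["5.0", "20"]]
print(average(kevin), average(max_))  # 2.5 5.0
print(average(max_, max_children=2))  # 2.5: the third child array is cut off by the bound
```

The second call shows why the bound matters: child arrays past it are silently ignored rather than raising an error.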
Now you can wrap the query above in a function, as your requirement demands.

How to sum in SQL with conditions?

It's likely this question has been asked before and I just couldn't find it, not knowing what to search for.
Assume I have a simple table with two columns: one holds one of two values, e.g. "positive" or "negative", and the other holds an integer.
Is there a way, using standard SQL, to calculate the sum of all the numbers in the second column, where a number is added if the first column reads "positive" and subtracted if it reads "negative"?
Also, it would be interesting to understand how to do the same in MS Access, if it differs from standard SQL.
You can use sum(case ...):
select sum(case when col1 = '+' then value
when col1 = '-' then - value
end) as overall
from t;
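Here is a quick way to see sum(case ...) at work, using SQLite through Python's sqlite3 module; the table `t`, its columns, and the sample rows are hypothetical. Rows whose col1 matches neither branch produce NULL, which sum() ignores, and an empty table makes sum() return NULL.

```python
import sqlite3

def signed_sum(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # '+' rows add their value, '-' rows subtract it; anything else yields NULL.
    (overall,) = conn.execute("""
        SELECT sum(CASE WHEN col1 = '+' THEN value
                        WHEN col1 = '-' THEN -value
                   END) AS overall
        FROM t
    """).fetchone()
    conn.close()
    return overall

print(signed_sum([("+", 10), ("-", 3), ("+", 5)]))  # 12
```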
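In MS Access, you would use switch() or iif(). Switch() takes condition/value pairs, so the default is written as a final True, 0 pair:
select sum(switch(col1 = '+', value, col1 = '-', - value, True, 0)
) as overall
from t;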

How to find the next sequence number in oracle string field

I have a database table with document names stored as a VARCHAR and I need a way to figure out what the lowest available sequence number is. There are many gaps.
name        partial  seq
A-B-C-0001  A-B-C-   0001
A-B-C-0017  A-B-C-   0017
In the above example, it would be 0002.
The distinct name values total 227,705. The number of "partial" combinations is quite large: A=150, B=218, C=52, so 150 × 218 × 52 = 1,700,400 potential combinations.
I found a way to iterate from min to max per distinct value and list all the "missing" (i.e. available) values, but this seems inefficient, given that we use nowhere near the maximum number of potential partial combinations (10,536 out of 1,700,400).
I'd rather have a table, based on the existing data, listing each partial value and its next available sequence value; a partial that doesn't exist yet would start at 0001.
Thanks
Hmmmm, you can try this:
select coalesce(min(to_number(seq)), 0) + 1
from t
where partial = 'A-B-C-' and
not exists (select 1
from t t2
where t2.partial = t.partial and
to_number(T2.seq) = to_number(t.seq) + 1
);
EDIT:
For all partials, you need a GROUP BY. You can use to_char() to convert the result back to a string if necessary, e.g. to_char(n, 'FM0000') to restore the zero padding:
select partial, coalesce(min(to_number(seq)), 0) + 1
from t
where not exists (select 1
from t t2
where t2.partial = t.partial and
to_number(T2.seq) = to_number(t.seq) + 1
)
group by partial;
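The grouped query can be tried out in SQLite via Python's sqlite3 module, with CAST(... AS INTEGER) standing in for Oracle's to_number() and a Python format string standing in for to_char(n, 'FM0000'). The sample rows and the helper `next_seq_string` are hypothetical. The sample also shows a property of the approach: it returns one more than the smallest sequence number with no successor, so a gap *before* the first existing number (partial X-Y-Z- below, which starts at 0003) is not detected.

```python
import sqlite3

def next_available(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, partial TEXT, seq TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # Same shape as the Oracle query: keep seqs whose successor is missing,
    # take the smallest per partial, add 1.
    query = """
        SELECT partial, coalesce(min(CAST(seq AS INTEGER)), 0) + 1
        FROM t
        WHERE NOT EXISTS (SELECT 1 FROM t t2
                          WHERE t2.partial = t.partial
                            AND CAST(t2.seq AS INTEGER) = CAST(t.seq AS INTEGER) + 1)
        GROUP BY partial
    """
    result = {partial: nxt for partial, nxt in conn.execute(query)}
    conn.close()
    return result

def next_seq_string(table, partial):
    # A partial not present in the table starts at 0001, as the question asks.
    return f"{table.get(partial, 1):04d}"

ROWS = [
    ("A-B-C-0001", "A-B-C-", "0001"), ("A-B-C-0017", "A-B-C-", "0017"),
    ("A-B-D-0001", "A-B-D-", "0001"), ("A-B-D-0002", "A-B-D-", "0002"),
    ("X-Y-Z-0003", "X-Y-Z-", "0003"),
]
table = next_available(ROWS)
print(next_seq_string(table, "A-B-C-"), next_seq_string(table, "Q-R-S-"))  # 0002 0001
```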

How to use subquery result as the column name of another query

I want to use the result of a subquery as a column name in another query, because the data moves between columns all the time and the subquery decides which column the current forecast data is stored in. My example:
select item,
item_type,
...
forcast_0 * 0.9 as finalforcast,
forcast_0 * 0.8 as newforcast
from sales_data
But forcast_0 is itself the result (fore_column_name) of the subquery below, and that result may change to forcast_1 or forcast_2:
select
fore_column_name
from forecast_history
where ...
Also, the forecast column will be used multiple times in the first query. How could I implement this?
Use your subquery as an inline table. Something like:
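select item,
item_type,
...
decode(fore_column_name, 'forcast_0', forcast_0, 'forcast_1', forcast_1, 'forcast_2', forcast_2) * 0.9 as finalforcast,
decode(fore_column_name, 'forcast_0', forcast_0, 'forcast_1', forcast_1, 'forcast_2', forcast_2) * 0.8 as newforcast
from sales_data,
(
select fore_column_name
from forecast_history
where ...
) inlineTable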
I'm assuming here that the subquery returns a single value that is the same for every row, so a simple cross join suffices. If the value varies depending on the values in each row of the sales_data table, some other type of join would be more appropriate.
If you aren't familiar with decode, see Oracle's documentation for the DECODE function.
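The cross-join-plus-lookup pattern can be tried in SQLite through Python's sqlite3 module. SQLite has no DECODE, so a simple CASE expression performs the same "pick the column named by fore_column_name" lookup; the sample tables and values are hypothetical. The single-row inline table is joined to every sales_data row, and the CASE then chooses forcast_1 because that is the name stored in forecast_history.

```python
import sqlite3

def final_forecasts():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_data (item TEXT, forcast_0 REAL, forcast_1 REAL, forcast_2 REAL);
        INSERT INTO sales_data VALUES ('a', 100, 200, 300), ('b', 10, 20, 30);
        CREATE TABLE forecast_history (fore_column_name TEXT);
        INSERT INTO forecast_history VALUES ('forcast_1');
    """)
    # CASE plays the role of decode(): map the stored column name to that column's value.
    query = """
        SELECT item,
               CASE fore_column_name
                    WHEN 'forcast_0' THEN forcast_0
                    WHEN 'forcast_1' THEN forcast_1
                    WHEN 'forcast_2' THEN forcast_2
               END * 0.9 AS finalforcast,
               CASE fore_column_name
                    WHEN 'forcast_0' THEN forcast_0
                    WHEN 'forcast_1' THEN forcast_1
                    WHEN 'forcast_2' THEN forcast_2
               END * 0.8 AS newforcast
        FROM sales_data,
             (SELECT fore_column_name FROM forecast_history) inlineTable
        ORDER BY item
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

for item, final, new in final_forecasts():
    print(item, round(final, 2), round(new, 2))
```

If forecast_history returned more than one row here, every sales_data row would be repeated once per returned row, which is why the answer assumes a single value.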

Coalesce not evaluating second argument?

I am trying to run the following query:
SELECT COALESCE(count(percent_cov), 0)
FROM sample_cov
WHERE target = 542
GROUP BY percent_cov
HAVING percent_cov < 10
Basically, I want to show the number of times this statistic was < 10, and return 0 rather than null if the count was 0. If the count is > 0, I get the number I want; however, if the count is 0, I still get a null returned (same thing if I set the second argument of coalesce to a positive number). What am I doing wrong?
I rewrote your query the way I think you want it:
SELECT count(*) AS ct
FROM sample_cov
WHERE target = 542
AND percent_cov < 10;
count() returns 0 when no matching rows (or no non-null values in the column) are found, so there is no need for coalesce(). Quoting the manual:
"It should be noted that except for count, these functions return a null value when no rows are selected."
If you want to return a different value when count() comes back with 0, use a CASE expression.
Also, there's no point in writing count(percent_cov) when you have WHERE percent_cov < 10: only non-null values pass that condition anyway, so count(*) yields the same result, slightly faster and simpler.
You don't need a GROUP BY clause, because you don't want groups; you are aggregating over all qualifying rows. With GROUP BY, a query that finds no qualifying rows returns no rows at all, which is why you never saw a 0.
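The difference between the two query shapes can be reproduced in SQLite via Python's sqlite3 module; the sample rows are hypothetical. With GROUP BY percent_cov, no qualifying rows means an empty result (no row for coalesce() to act on), and when rows do qualify you get one count per distinct percent_cov value rather than a single total. The plain aggregate always returns exactly one row.

```python
import sqlite3

def compare(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_cov (target INTEGER, percent_cov REAL)")
    conn.executemany("INSERT INTO sample_cov VALUES (?, ?)", rows)
    # Original shape: one output row per percent_cov group that survives HAVING.
    grouped = conn.execute("""
        SELECT coalesce(count(percent_cov), 0) FROM sample_cov
        WHERE target = 542
        GROUP BY percent_cov
        HAVING percent_cov < 10
    """).fetchall()
    # Rewritten shape: a plain aggregate always yields exactly one row.
    plain = conn.execute("""
        SELECT count(*) AS ct FROM sample_cov
        WHERE target = 542 AND percent_cov < 10
    """).fetchall()
    conn.close()
    return grouped, plain

print(compare([(542, 55.0), (542, 80.0), (7, 3.0)]))  # ([], [(0,)])
```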
You could GROUP BY target, but this would be a different query:
SELECT target, count(*)
FROM sample_cov
WHERE percent_cov < 10
GROUP BY target;
You would need to spell out the expression in the HAVING clause again. Output column names are visible in ORDER BY and GROUP BY clauses, not in WHERE or HAVING.
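To see how the GROUP BY target variant behaves, here is a sketch in SQLite through Python's sqlite3 module with hypothetical rows. It returns one count per target that has at least one row below 10; a target with no such rows (target 9 below) simply does not appear, rather than appearing with 0. ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

def low_coverage_per_target(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_cov (target INTEGER, percent_cov REAL)")
    conn.executemany("INSERT INTO sample_cov VALUES (?, ?)", rows)
    result = dict(conn.execute("""
        SELECT target, count(*) FROM sample_cov
        WHERE percent_cov < 10
        GROUP BY target
        ORDER BY target
    """).fetchall())
    conn.close()
    return result

print(low_coverage_per_target(
    [(542, 5.0), (542, 8.0), (542, 50.0), (7, 2.0), (9, 90.0)]
))  # {7: 1, 542: 2}
```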