How would I use a for loop in my Hive query?

I am running a Hive query in Spark SQL that goes like this:
select
  GREATEST(
    CASE WHEN x1 = 2 THEN y1 + 3 ELSE -1 END,
    CASE WHEN x2 = 2 THEN y2 + 3 ELSE -1 END,
    CASE WHEN x3 = 2 THEN y3 + 3 ELSE -1 END,
    ...
    CASE WHEN x100 = 2 THEN y100 + 3 ELSE -1 END
  ) AS greatest_sum
FROM table
I need to loop through about 100 columns for each row and get the greatest value after applying the CASE expressions.
I have 100 CASE expressions, and I'm looking for a way to do this without repeating the same logic over 100 lines. How would I do this?
I have simplified the CASE expressions in this example, but each one spans over 15 lines in my real code, which makes repetition not the best way to go.
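Rather than looping inside the query, one common workaround is to generate the repetitive SQL text with a loop in the host program and then run the resulting string. Below is a minimal Python sketch under that assumption: `case_expr` and `build_query` are hypothetical helpers, your real multi-line CASE would go into the template in `case_expr` once, and SQLite (whose multi-argument `max()` behaves like `GREATEST`) stands in for Spark so the example runs on its own. In PySpark you could pass the generated string to `spark.sql(...)`.

```python
import sqlite3


def case_expr(i: int) -> str:
    # Template for one column pair; the real multi-line CASE goes here once.
    return f"CASE WHEN x{i} = 2 THEN y{i} + 3 ELSE -1 END"


def build_query(n_cols: int, table: str, greatest_fn: str = "GREATEST") -> str:
    # Generate every CASE with a loop and wrap them in a single GREATEST(...) call.
    cases = ",\n    ".join(case_expr(i) for i in range(1, n_cols + 1))
    return f"SELECT {greatest_fn}(\n    {cases}\n) AS greatest_sum\nFROM {table}"


# Demo on SQLite, where max(a, b, ...) with several arguments acts like GREATEST.
n = 3
conn = sqlite3.connect(":memory:")
cols = ", ".join(f"x{i} INTEGER, y{i} INTEGER" for i in range(1, n + 1))
conn.execute(f"CREATE TABLE t ({cols})")
conn.execute("INSERT INTO t VALUES (2, 10, 1, 50, 2, 4)")
rows = conn.execute(build_query(n, "t", greatest_fn="max")).fetchall()
print(rows)  # [(13,)]
```

With `build_query(100, "table")` the same code emits all 100 CASE expressions, so only the template has to be maintained.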

Related

SQL tuple/lexicographic comparison with multiple directions

I need to return elements from a database query based on an inequality using the lexicographic ordering on multiple columns. As described in this question, this is straightforward if I'm comparing all the columns in the same direction. Is there a straightforward way to do this if I want to reverse the direction of the sort on some columns?
For instance, I might have columns A, B and C and values 5, 7, and 23 and I'd like to return something like:
WHERE A < 5 OR (A = 5 AND B > 7) OR (A = 5 AND B = 7 AND C < 23)
Is there any easier way to do this using tuples (I have to construct it in a function without knowing the number of columns beforehand)? Note that some columns are DateTime columns, so I can't rely on tricks that apply only to integers (e.g. negating both sides). I'm happy to use PostgreSQL-specific tricks.
And, if not, is there a specific way/order I should build expressions like the above to best use multicolumn indexes?
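The expanded form above follows a fixed pattern: each term requires equality on all earlier columns and a strict inequality, in that column's own direction, on the next one. A minimal Python sketch of a function that builds it for any number of columns; `lex_where` is a hypothetical helper, it uses SQLite's `?` placeholders (other drivers use other placeholder styles), and column names are interpolated directly, so they must come from trusted code. Because only plain `=`, `<` and `>` are used, nothing in it is specific to integers.

```python
import sqlite3


def lex_where(columns, directions, values):
    """Return (sql, params) for a lexicographic comparison with per-column direction.

    directions[k] is "<" or ">" and says which way column k must differ.
    """
    terms, params = [], []
    for k, (col, op) in enumerate(zip(columns, directions)):
        # Equality on every earlier column, strict inequality on this one.
        parts = [f"{c} = ?" for c in columns[:k]] + [f"{col} {op} ?"]
        terms.append("(" + " AND ".join(parts) + ")")
        params.extend(values[: k + 1])
    return " OR ".join(terms), params


sql, params = lex_where(["A", "B", "C"], ["<", ">", "<"], [5, 7, 23])
print(sql)     # (A < ?) OR (A = ? AND B > ?) OR (A = ? AND B = ? AND C < ?)
print(params)  # [5, 5, 7, 5, 7, 23]

# Cross-check against Python's tuple ordering on a small grid (negating B
# flips its direction; that trick is only used here, in the check).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER)")
grid = [(a, b, c) for a in (4, 5, 6) for b in (6, 7, 8) for c in (22, 23, 24)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", grid)
got = sorted(conn.execute(f"SELECT A, B, C FROM t WHERE {sql}", params).fetchall())
expected = sorted(r for r in grid if (r[0], -r[1], r[2]) < (5, -7, 23))
print(got == expected)  # True
```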
One option is to go the CTE route and create a column that stores 1 or 0 depending on whether the row passes the filter criteria:
WITH CTE AS
(
SELECT
...,
...,
CASE
WHEN A < 5 THEN 1
WHEN A = 5 AND B > 7 THEN 1
WHEN A = 5 AND B = 7 AND C < 23 THEN 1
ELSE 0
END AS filter_criteria
FROM <your_table>
)
SELECT
...,
...
FROM
CTE
WHERE filter_criteria = 1
Or apply the CASE expression directly in the WHERE clause, which avoids the extra CTE step:
WHERE 1 = CASE
WHEN A < 5 THEN 1
WHEN A = 5 AND B > 7 THEN 1
WHEN A = 5 AND B = 7 AND C < 23 THEN 1
ELSE 0
END
Referring to the thread you mentioned, can you try the idea of swapping the column and the value for the reversed columns, e.g. WHERE (col_a, 'value_b') > ('value_a', col_b)?
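The swap works because row-value comparison is lexicographic, so moving a column to the other side of the operator reverses its direction. For the example in the question, WHERE (A, 7, C) < (5, B, 23) expands to A < 5 OR (A = 5 AND 7 < B) OR (A = 5 AND 7 = B AND C < 23), which is exactly the predicate asked for, with no negation involved. The sketch below checks that equivalence on SQLite (which supports row values since version 3.15) rather than PostgreSQL; it does not test whether a planner can use a multicolumn index for the mixed form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER)")
grid = [(a, b, c) for a in (4, 5, 6) for b in (6, 7, 8) for c in (22, 23, 24)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", grid)

# Explicit OR chain from the question: A ascending, B descending, C ascending.
chain = conn.execute(
    "SELECT A, B, C FROM t "
    "WHERE A < 5 OR (A = 5 AND B > 7) OR (A = 5 AND B = 7 AND C < 23) "
    "ORDER BY A, B, C"
).fetchall()

# Row-value form: B sits on the right-hand side, which reverses its direction.
swapped = conn.execute(
    "SELECT A, B, C FROM t WHERE (A, 7, C) < (5, B, 23) ORDER BY A, B, C"
).fetchall()

print(len(chain), chain == swapped)  # 13 True
```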

Oracle 11g Nested Case Statement Calculation

In Oracle 11g, I am trying to get to a sell price from a query of data. Yes, I can export this and write the code somewhere else, but I want to try to do this elegantly in the query.
I only seem to get the first part of the equation and not the last CASE, where I use:
WHEN sales_code
What I am ultimately trying to do is take the result from the top part and divide it by the bottom part, except for sales_code 4, where I add 1+1 (that is, 2) to the top result and then divide by the bottom part.
round(to_number(price) *
CASE WHEN class_code='X'
THEN .48
ELSE .5
END * e1.set_qty +
CASE WHEN carton_pack_qty = '1'
THEN 0
ELSE (
CASE WHEN NVL(SUBSTR(size, 1,NVL(LENGTH(size) - 2,0)),1) > '35'
THEN 3.5
ELSE 3
END)
END +
CASE
WHEN sales_code='1' THEN 0 /(1-17/100)
WHEN sales_code='2' THEN 0 /(1-5/100)
WHEN sales_code='3' THEN 0 /(1-18/100)
WHEN sales_code='4' THEN 1+1 / (1-9.5/100)
WHEN sales_code='5' THEN 0 /(1-17/100)
WHEN sales_code='6' THEN 0 /(1-8/100)
WHEN sales_code='7' THEN 0 /((1-150)/100)
ELSE (100/100)
END,2) AS "Price",
I get a result from the query, but not the whole calculation. I tried this many other ways and there was always an error with parentheses or some other arbitrary error.
Any help would be appreciated.
I think this is your problem:
WHEN sales_code='1' THEN 0 /(1-17/100)
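A CASE expression returns a single scalar value, and each branch is evaluated on its own: THEN 0 /(1-17/100) simply yields 0, which is then added to the rest of the formula, so nothing gets divided. You're trying to have the CASE return the second half of the formula in your calculation. You need something more like this: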
...
END +
CASE WHEN sales_code='4' THEN 1 ELSE 0 END /
CASE
WHEN sales_code='1' THEN (1-17/100)
WHEN sales_code='2' THEN (1-5/100)
WHEN sales_code='3' THEN (1-18/100)
WHEN sales_code='4' THEN (1-9.5/100)
WHEN sales_code='5' THEN (1-17/100)
WHEN sales_code='6' THEN (1-8/100)
WHEN sales_code='7' THEN ((1-150)/100)
ELSE 1 END ...
Actually, I'm not entirely sure what you're trying to do with sales_code='4', but that looks close.
I think I understand now what you are trying to do. Almost at least :-)
The first thing you should do is write down the complete formula with parentheses where needed. Something like:
final = ((price * class_code_factor * set_qty) + quantity_summand + two_if_sales_code4) * sales_code_factor
(That last part looks like a percentage factor, not a divisor to me. I may be wrong of course.)
Once you have the formula right, translate this to SQL:
ROUND
(
(
(
TO_NUMBER(price) *
CASE WHEN class_code = 'X' THEN 0.48 ELSE 0.5 END *
e1.set_qty
)
+
CASE WHEN carton_pack_qty = 1 THEN 0
ELSE CASE WHEN NVL(SUBSTR(size, 1,NVL(LENGTH(size) - 2,0)),1) > '35'
THEN 3.5
ELSE 3
END
END
+
CASE WHEN sales_code = 4 THEN 2 ELSE 0 END
)
*
CASE
WHEN sales_code = 1 THEN 1 - (17 / 100)
WHEN sales_code = 2 THEN 1 - (5 / 100)
WHEN sales_code = 3 THEN 1 - (18 / 100)
WHEN sales_code = 4 THEN 1 - (9.5 / 100)
WHEN sales_code = 5 THEN 1 - (17 / 100)
WHEN sales_code = 6 THEN 1 - (8 / 100)
WHEN sales_code = 7 THEN (1 - 150) / 100
ELSE 1
END
, 2 ) AS "Price",
Adjust this to the formula you actually want. There are some things I want to point out:
Why is price not a number in your database, but a string that you must convert to a number with TO_NUMBER? That shouldn't be the case. Store values in the appropriate format in your database.
In a good database you would not have to get a substring of size. It seems you are storing two different things in this column, which violates database normalization. Separate the two things and store them in separate columns.
The substring logic looks strange, too. You are taking the left part of size, leaving out the last two characters. It seems, then, that you don't know the length of the part you are getting; let's say it can be one, two or three characters. (I don't know, of course.) You then compare this result with another string, one that contains a numeric value. But because you are comparing strings, '4' is greater than '35', because '4' > '3', and '200' is less than '35', because '2' < '3'. Is this really intended?
There are more things you treated as strings, and I took the liberty of changing them to numbers. For instance, a quantity (carton_pack_qty) should be stored as a number, so do this and compare it to the number 1, not the string '1'. The sales code seems to be numeric, too. Well, again, I may be wrong.
In a good database there would be no magic numbers in the query. Knowledge belongs in the database, not in the query. If class code 'X' means a factor of 0.48 and other class codes mean a factor of 0.5, then why is there no table of class codes showing what each class code represents and what factor to apply? The same goes for the mysterious summands 3 and 3.5; there should be a table holding these values and the size and quantity ranges they apply to. And finally there is the sales code, which should also be stored in a table showing the summand (2 for code 4, 0 otherwise) and the factor.
The query part would then look something like this:
ROUND(((price * cc.factor * e1.set_qty) + qs.value + sc.value) * sc.factor, 2) AS "Price"
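A minimal sketch of the lookup-table idea, run on SQLite so it is self-contained. The table and column names (class_codes, sales_codes, items, qty_summand) are hypothetical; to keep it short, the quantity summand is stored per item instead of in a range table, and the sales factors are precomputed (0.83 = 1 - 17/100, 0.905 = 1 - 9.5/100), partly because SQLite, unlike Oracle, evaluates 17/100 as integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup tables that hold the former magic numbers.
CREATE TABLE class_codes (class_code TEXT PRIMARY KEY, factor REAL);
CREATE TABLE sales_codes (sales_code INTEGER PRIMARY KEY, summand REAL, factor REAL);
CREATE TABLE items (style TEXT, price REAL, set_qty INTEGER,
                    class_code TEXT, qty_summand REAL, sales_code INTEGER);

INSERT INTO class_codes VALUES ('X', 0.48), ('Y', 0.5);
INSERT INTO sales_codes VALUES (1, 0, 0.83), (4, 2, 0.905);
INSERT INTO items VALUES ('A', 100, 2, 'X', 3, 1), ('B', 10, 1, 'Y', 3.5, 4);
""")

# Same formula shape as above, with every constant coming from a joined table.
rows = conn.execute("""
SELECT i.style,
       ROUND(((i.price * cc.factor * i.set_qty) + i.qty_summand + sc.summand)
             * sc.factor, 2) AS price
FROM items AS i
JOIN class_codes AS cc ON cc.class_code = i.class_code
JOIN sales_codes AS sc ON sc.sales_code = i.sales_code
ORDER BY i.style
""").fetchall()
print(rows)
```

Changing a factor then means updating one row in a lookup table, with no edit to the query.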
Moving the dividend into a subquery worked, and then, in the main query, putting parentheses around it and dividing it worked.
(
select
s1.style,
to_number(price) *
CASE WHEN class_code='X'
THEN .48
ELSE .5
END * set_qty +
CASE WHEN carton_pack_qty = '1'
THEN 1
ELSE (
CASE WHEN to_number(NVL(SUBSTR(size, 1,NVL(LENGTH(size) - 2,0)),1)) > 35
THEN 3.5
ELSE 3
END)
END as Price
FROM STYL1 s1,STY2 s2
WHERE s1.style=s2.style
) P1
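The to_number(...) > 35 in this version makes the size check a numeric comparison, which addresses the string-comparison concern raised in the answer above. A quick way to see the difference, using Python's sqlite3 instead of Oracle (both compare these digit strings character by character, so the results should match):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    "SELECT '4' > '35', "                  # string comparison: '4' sorts after '3'
    "       '200' < '35', "                # string comparison: '2' sorts before '3'
    "       CAST('4' AS INTEGER) > 35, "   # numeric comparison
    "       CAST('200' AS INTEGER) < 35"   # numeric comparison
).fetchone()
print(row)  # (1, 1, 0, 0)
```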

Max match same numbers from each row

Generating a report of 1 million rows with the script below takes almost 2 days, so I would really appreciate it if somebody could help me with a different script that generates the report within 10-15 minutes.
The requirements of the report are as follows:
Table “cover” contains 5 million rows and 6 columns of data; likewise, table “data” contains 500,000 rows and 6 columns.
Each row of table cover has to be compared with every row of table data, and the report must give the maximum number of matching values.
For instance, as in the sample table below, there could be 3 matches with row #1, 2 matches with row #2 and 5 matches with row #3, so the script has to select the maximum, which is 5 (row #3).
Sample table
UPDATE public.cover_sheet AS fc
SET maxmatch = (SELECT MAX(tmp.mtch)
FROM (
SELECT (SELECT CASE WHEN fc.a=drwo.a THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.b=drwo.b THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.c=drwo.c THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.d=drwo.d THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.e=drwo.e THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.f=drwo.f THEN 1 ELSE 0 END) AS mtch
FROM public.data AS drwo
) AS tmp)
WHERE fc.code>0;
SELECT *
FROM public.cover_sheet AS fc
WHERE fc.maxmatch>0;
As @a_horse_with_no_name mentioned in the comment to the question, your question is not clear...
It seems you want to get the number of records in which all 6 fields are equal in both tables.
I'd suggest that you:
reduce the number of SELECT statements, which should speed up query execution,
split your query into a few smaller ones (good practice) to check your logic,
use a join to get matching data (see: Visual Representation of SQL Joins),
use a subquery or CTE to get a result with which you can update the table.
I think you want to get result as follow:
SELECT COUNT(*) mtch
FROM public.cover_sheet AS fc INNER JOIN public.data AS drwo ON
fc.a=drwo.a AND fc.b=drwo.b AND fc.c=drwo.c AND fc.d=drwo.d AND fc.e=drwo.e AND fc.f=drwo.f
If I'm not wrong and the above query is correct, its execution time should drop to about 1-2 minutes.
Finally, the update query may look like this:
WITH qry AS
(
-- proper select statement here
)
UPDATE public.cover_sheet AS fc
SET maxmatch = qry.<fieldname>
FROM qry
WHERE fc.code>0 AND fc.<key> = qry.<key>;
Note:
I can't see your data and I know nothing about its structure, relationships, etc., so you will have to adapt the above query to your needs.
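A sketch of the structure suggested above (compute the result in a CTE, then update from it), applied to the max-match requirement as stated in the question rather than to full-row equality. It runs on SQLite with a hypothetical id primary key on cover_sheet; since SQLite only gained UPDATE ... FROM in version 3.33, the CTE is read through a correlated subquery instead. It keeps the logic in one set-based statement, but it still compares every cover row with every data row, so by itself it is not expected to meet the 10-15 minute target.

```python
import sqlite3

COLS = "abcdef"

conn = sqlite3.connect(":memory:")
# Hypothetical schema: an id key on cover_sheet, six value columns on both tables.
conn.execute("CREATE TABLE cover_sheet (id INTEGER PRIMARY KEY, code INTEGER, "
             + ", ".join(f"{c} INTEGER" for c in COLS) + ", maxmatch INTEGER)")
conn.execute("CREATE TABLE data (" + ", ".join(f"{c} INTEGER" for c in COLS) + ")")
conn.executemany(
    "INSERT INTO cover_sheet (id, code, a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 2, 3, 4, 5, 6), (2, 1, 7, 8, 9, 10, 11, 12)])
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 2, 3, 0, 0, 0), (1, 2, 3, 4, 5, 0), (7, 0, 0, 0, 0, 0)])

# In SQLite a comparison yields 0 or 1, so matches can be summed directly
# (in PostgreSQL you would cast each one, e.g. (fc.a = d.a)::int).
match_expr = " + ".join(f"(fc.{c} = d.{c})" for c in COLS)

conn.execute(f"""
WITH qry AS (
    SELECT fc.id AS id, MAX({match_expr}) AS mtch
    FROM cover_sheet AS fc CROSS JOIN data AS d
    WHERE fc.code > 0
    GROUP BY fc.id
)
UPDATE cover_sheet
SET maxmatch = (SELECT mtch FROM qry WHERE qry.id = cover_sheet.id)
WHERE code > 0
""")
result = conn.execute("SELECT id, maxmatch FROM cover_sheet ORDER BY id").fetchall()
print(result)  # [(1, 5), (2, 1)]
```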

SQL Server query to return 1 if value exist in a column else return 0

I am trying to query the database for checking if a specific column has a value or not. If there is a value in that column, the query should return 1, else it should return 0.
But my query returns the total number of matching rows (e.g. 10).
Note: the query runs in the Dell Boomi integration platform against SQL Server.
select count (*)
from ApplicationRequest
where EmpID = '993685' and ApplID = '1';
Do you just want case?
select (case when count(*) > 0 then 1 else 0 end)
from ApplicationRequest
where EmpID = 993685 and ApplID = 1;
I removed the single quotes around the comparisons. If they are really numbers then single quotes are not appropriate. If they are indeed strings, then use the single quotes.
If this is what you want, a more efficient method would use exists:
select (case when exists (select 1
from ApplicationRequest
where EmpID = 993685 and ApplID = 1
)
then 1 else 0
end)
The aggregation query needs to find all matching rows. This version can stop at the first one.
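A small runnable check of the EXISTS form, using Python's sqlite3 instead of SQL Server so it is self-contained; has_request is a hypothetical helper. Passing the IDs as parameters also sidesteps the quoting question, because the values are bound with their Python types rather than pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ApplicationRequest (EmpID INTEGER, ApplID INTEGER)")
conn.executemany("INSERT INTO ApplicationRequest VALUES (?, ?)", [(993685, 1)] * 10)


def has_request(emp_id, appl_id):
    # EXISTS only needs one matching row, so the engine can stop early.
    row = conn.execute(
        "SELECT CASE WHEN EXISTS (SELECT 1 FROM ApplicationRequest "
        "WHERE EmpID = ? AND ApplID = ?) THEN 1 ELSE 0 END",
        (emp_id, appl_id),
    ).fetchone()
    return row[0]


print(has_request(993685, 1))  # 1 (ten matching rows, but still just 1)
print(has_request(993685, 2))  # 0
```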

MS SQL UPDATE-SET 'Case' different results to SELECT 'Case'

I have an update statement which is designed to manage "jobs". I process one job first whilst the others wait in line until the first one is complete. After that, they are sent to be processed (far faster, as I can leverage calcs from the first run).
To quote my own comment:
-- Set job status to 2 for where there is saved results file
-- Where there is no saved results file:
-- --Set job status to 2 for one instance of each unique param combination
-- --Set job status to 8 for all others
PseudoCode:
UPDATE #Jobs
SET [JobStatusId] = CASE
WHEN ISNULL([PreCalculated].[FileCount], 0) > 0 THEN 2
WHEN ISNULL([PreCalculated].[FileCount], 0) = 0 AND [GroupedOrder] = 1 THEN 2
ELSE 8 END
FROM #NewJobsGrouped [NewJobsGrouped]
LEFT JOIN (
SELECT COUNT([Id]) [FileCount],
[ResultsFile].[ParamsId]
FROM #ResultsFile [ResultsFile]
WHERE [ResultsFile].[IsActive] = 1
GROUP BY [ResultsFile].[ParamsId]
) [PreCalculated]
ON [PreCalculated].[ParamsId] = [NewJobsGrouped].[ParamsId]
where #NewJobsGrouped looks like:
Job ID | GroupedOrder | ParamsId
1460   | 1            | 807
1461   | 2            | 807
1462   | 3            | 807
This does not work. Every job is being set to status 2. However:
SELECT CASE
WHEN ISNULL([PreCalculated].[FileCount], 0) > 0 THEN 2
WHEN ISNULL([PreCalculated].[FileCount], 0) = 0 AND [GroupedOrder] = 1 THEN 2
ELSE 8 END [JobStatusId]
etc
Works exactly as I am expecting.
Why would these two CASE expressions give different results? Is there something obvious I am missing? I honestly can't explain what I'm seeing. I could probably use another temp table to hold the output of the SELECT and run a simpler UPDATE, but I'd like to understand what's going on.