Oracle SQL how to get rowmax? - sql

I likely lack the correct vocabulary, which is why my google searches were unsuccessful to achieve the following rowmax type of operation: I want to create a new column that is for each row the maximum of two existing columns and is bounded by 0.
SELECT a,b, rowmax(a,b,0) as c
FROM ...

Use greatest():
greatest(a, b, 0) as c

Related

Adding a "calculated column" to BigQuery query without repeating the calculations

I want to resuse value of calculated columns in a new third column.
For example, this query works:
select
countif(cond1) as A,
countif(cond2) as B,
countif(cond1)/countif(cond2) as prct_pass
From
Where
Group By
But when I try to use A,B instead of repeating the countif, it doesn't work because A and B are invalid:
select
countif(cond1) as A,
countif(cond2) as B,
A/B as prct_pass
From
Where
Group By
Can I somehow make the more readable second version work ?
Is this first one inefficient ?
You should construct a subquery (i.e. a double select) like
SELECT A, B, A/B as prct_pass
FROM
(
SELECT countif(cond1) as A,
countif(cond2) as B
FROM <yourtable>
)
The same amount of data will be processed in both queries.
In the subquery one you will do only 2 countif(), in case that step takes a long time then doing 2 instead of 4 should be more efficient indeed.
Looking at an example using bigquery public datasets:
SELECT
countif(homeFinalRuns>3) as A,
countif(awayFinalRuns>3) as B,
countif(homeFinalRuns>3)/countif(awayFinalRuns>3) as division
FROM `bigquery-public-data.baseball.games_post_wide`
or
SELECT A, B, A/B as division FROM
(
SELECT countif(homeFinalRuns>3) as A,
countif(awayFinalRuns>3) as B
FROM `bigquery-public-data.baseball.games_post_wide`
)
we can see that doing all in one (without a subquery) is actually slightly faster. (I ran the queries 6 times for different values of the inequality, 5 times was faster and one time slower)
In any case, the efficiency will depend on how taxing is to compute the condition in your particular dataset.

Using previous table in pig group syntax after filter

Suppose I have a table in pig with 3 columns, a , b, c. Now suppose I want to filter the table by b == 4 and then group it by a. I believe that would look something like this.
t1 = my_table; -- the table contains three columns a, b, c
t1_filtered = FILTER t1_filtered by (
b == 4
);
t1_grouped = GROUP t1_filtered by my_table.a;
My question is why can't it look like this:
t1 = my_table; -- the table contains three columns a, b, c
t1_filtered = FILTER t1_filtered by (
b == 4
);
t1_grouped = GROUP t1_filtered by t1_filtered.a;
Why do you have to reference the table before the filter? I'm trying to learn pig and i find myself making this mistake a lot. It seems to me that t1_filtered should equal a table that is just the filtered version of t1. Therefore a simple group should make sense, but i've been told you need to reference the table from before. Does anyone know whats going on behind the scenes and why this makes sense? Also, help naming this question is also appreciated.
The way you have De-referenced(.) is also not correct. This is how it should be.
A = LOAD '/filepath/to/tabledata' using PigStorage(',') as (a:int,b:int,c:int);
B = FILTER A BY a==1;
C = GROUP B BY a;
But your way of dereferencing(.) will also work in some cases. You can only use dot(.) when you are referencing a complex data type like a map,tuple or bag. If we use dot operator to access the normal fields it would expect a scalar output. If it has more than one output then you will get a error something like this.
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (1,2,3), 2nd :(2,2,2)
Your way of using the dot operator would work only if the output of your group by has only one output if not you will end up with this error. Relation B is not a complex data type that is the reason we do not use any dereferencing operator in the group by clause.
Hope this answers your question.

PostgreSQL: How to access column on anonymous record

I have a problem that I'm working on. Below is a simplified query to show the problem:
WITH the_table AS (
SELECT a, b
FROM (VALUES('data1', 2), ('data3', 4), ('data5', 6)) x (a, b)
), my_data AS (
SELECT 'data7' AS c, array_agg(ROW(a, b)) AS d
FROM the_table
)
SELECT c, d[array_upper(d, 1)]
FROM my_data
In the my data section, you'll notice that I'm creating an array from multiple rows, and the array is returned in one row with other data. This array needs to contain the information for both a and b, and keep two values linked together. What would seem to make sense would be to use an anonymous row or record (I want to avoid actually creating a composite type).
This all works well until I need to start pulling data back out. In the above instance, I need to access the last entry in the array, which is done easily by using array_upper, but then I need to access the value in what used to be the b column, which I cannot figure out how to do.
Essentially, right now the above query is returning:
"data7";"(data5,6)"
And I need to return
"data7";6
How can I do this?
NOTE: While in the above example I'm using text and integers as the types for my data, they are not the actual final types, but are rather used to simplify the example.
NOTE: This is using PostgreSQL 9.2
EDIT: For clarification, Something like SELECT 'data7', 6 is not what I'm after. Imagine that the_table is actually pulling from database tables and not the WITH statement the I put in for convenience, and I don't readily know what data is in the table.
In other words, I want to be able to do something like this:
SELECT c, (d[array_upper(d, 1)]).b
FROM my_data
And get this back:
"data7";6
Essentially, once I've put something into an anonymous record by using the row() function, how do I get it back out? How do I split up the 'data5' part and the 6 part so that they don't both return in one column?
For another example:
SELECT ROW('data5', 6)
makes 'data5' and 6 return in one column. How do I take that one column and break it back into the original two?
I hope that clarifies
If you can install the hstore extension:
with the_table as (
select a, b
from (values('data1', 2), ('data3', 4), ('data5', 6)) x (a, b)
), my_data as (
select 'data7' as c, array_agg(row(a, b)) as d
from the_table
)
select c, (avals(hstore(d[array_upper(d, 1)])))[2]
from my_data
;
c | avals
-------+-------
data7 | 6
This is just a very quick throw together around a similarish problem - not an answer to your question. This appears to be one direction towards identifying columns.
with x as (select 1 a, 2 b union all values (1,2),(1,2),(1,2))
select a from x;

Using NVL for multiple columns - Oracle SQL

Good morning my beloved sql wizards and sorcerers,
I am wanting to substitute on 3 columns of data across 3 tables. Currently I am using the NVL function, however that is restricted to two columns.
See below for an example:
SELECT ccc.case_id,
NVL (ccvl.descr, ccc.char)) char_val
FROM case_char ccc, char_value ccvl, lookup_value lval1
WHERE
ccvl.descr(+) = ccc.value
AND ccc.value = lval1.descr (+)
AND ccc.case_id IN ('123'))
case_char table
case_id|char |value
123 |email| work_email
124 |issue| tim_
char_value table
char | descr
work_email | complaint mail
tim_ | timeliness
lookup_value table
descr | descrlong
work_email| xxx#blah.com
Essentially what I am trying to do is if there exists a match for case_char.value with lookup_value.descr then display it, if not, then if there exists a match with case_char.value and char_value.char then display it.
I am just trying to return the description for 'issue'from the char_value table, but for 'email' I want to return the descrlong from the lookup_value table (all under the same alias 'char_val').
So my question is, how do I achieve this keeping in mind that I want them to appear under the same alias.
Let me know if you require any further information.
Thanks guys
You could nest NVL:
NVL(a, NVL(b, NVL(c, d))
But even better, use the SQL-standard COALESCE, which does take multiple arguments and also works on non-Oracle systems:
COALESCE(a, b, c, d)
How about using COALESCE:
COALESCE(ccvl.descr, ccc.char)
Better to Use COALESCE(a, b, c, d) because of below reason:
Nested NVL logic can be achieved in single COALESCE(a, b, c, d).
It is SQL standard to use COALESCE.
COALESCE gives better performance in terms, NVL always first calculate both of the queries used and then compare if the first value is null then return a second value. but in COALESCE function it checks one by one and returns response whenever if found a non-null value instead of executing all the used queries.

Parameters in Microsoft Access

I'm really confused with how parameters work in Microsoft Access. I know that parameters are supposed to be used to allow a user to type in values when the query is run - instead of having to modify the query for each instance.
So, let's use the following example.
SELECT countyTable.countyName, Sqr((69.1*(46.47-avgLatitude))^2+(69.1*(-90.17-avgLongitude)*Cos(avgLatitude/57.3))^2) as Distance
FROM countyTable
WHERE ((([avgLatitude]-5)<46.47) AND (([avgLatitude]+5)>46.47) AND (([avgLongitude]-5)<-90.17) AND (([avgLongitude]+5)>-90.17))
ORDER BY Sqr((69.1*(46.47-avgLatitude))^2+(69.1*(-90.17-avgLongitude)*Cos(avgLatitude/57.3))^2), countyTable.countyName
1) I am SELECTing a column that contains the SQR function. I also have that column named as 'Distance'. However, when I try to ORDER BY on said column - and refer to it as 'Distance' - it asks for a value instead of sorting on that column. The only way I can get the query to ORDER BY is to duplicate the expression from the SELECT line. This seems unnecessary.
2) Right now, I have some values hard-coded in. I could care less about the values '57.3' and '69.1' However, for '46.47' I would like to replace with 'x2' and -90.17 with 'y2'. How I've been trying to write this with parameters, Access asks for values for each instance of 'x2' and 'y2'. This doesn't help me at all, so I have them hardcoded in.
Any help at all? Thanks!
1) I am SELECTing a column that contains the SQR function. I also have that column named as 'Distance'. However, when I try to ORDER BY on said column - and refer to it as 'Distance' - it asks for a value instead of sorting on that column. The only way I can get the query to ORDER BY is to duplicate the expression from the SELECT line. This seems unnecessary.
Yes Access does a poor job. Every real DBMS now supports ordering by the column alias created in the SELECT clause. To do this in Access, you can either do what you are doing (repeat the expression) or subquery it, e.g.
select a,b,c
from (
select a, b, a+b as C
from sometable
) AS SUBQUERIED
order by c
2) How I've been trying to write this with parameters, Access asks for values for each instance of 'x2' and 'y2'.
You're doing it wrong. Access should prompt only once. If you have a query like this
select a, b, a+b as C
from sometable
where a > [x] and y > [x]
It will see both [x]'s as being the same - and only one prompt for both. Just make sure they are spelt exactly the same.
If you wanted something like this simplified example:
SELECT
countyTable.countyName,
Sqr((69.1*(46.47-avgLatitude))^2+(69.1*(-90.17-avgLongitude)*Cos(avgLatitude/57.3))^2) as Distance
FROM countyTable
ORDER BY Distance;
For the ORDER BY you can reference that complex Distance expression by its ordinal position in the field list.
SELECT
countyTable.countyName,
Sqr((69.1*(46.47-avgLatitude))^2+(69.1*(-90.17-avgLongitude)*Cos(avgLatitude/57.3))^2) as Distance
FROM countyTable
ORDER BY 2;
That method is supported at least since Jet 4 (Access 2000), and also by the newer ACE database engine.