How to transfer Oracle (+) operator with joins to Spark SQL?


I have this piece of Oracle code:
SELECT
t1.col1,
t1.col2,
t2.col1,
t2.col2
FROM
t1,
t2
WHERE
t1.col1 <> 121
AND t1.col1 = t2.col1(+)
AND t1.col2 = t2.col2(+)
AND 'ABC' = t2.col3(+)
How can I translate it to Spark SQL, assuming the tables (t1 and t2) are already registered?
Thanks.

(+) is the old Oracle outer join operator. Here is how it is used with Scott's sample schema; there are no employees in department 40, so you need an outer join if you want to display it:
SQL> select d.deptno, d.dname, e.ename
2 from emp e,
3 dept d
4 where d.deptno = e.deptno (+) --> outer join
5 order by d.deptno, e.ename;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING CLARK
10 ACCOUNTING KING
10 ACCOUNTING MILLER
20 RESEARCH ADAMS
20 RESEARCH FORD
20 RESEARCH JONES
20 RESEARCH SCOTT
20 RESEARCH SMITH
30 SALES ALLEN
30 SALES BLAKE
30 SALES JAMES
30 SALES MARTIN
30 SALES TURNER
30 SALES WARD
40 OPERATIONS --> department with no employees
15 rows selected.
Spark SQL supports ANSI join syntax, so the above code can be rewritten as:
SQL> select d.deptno, d.dname, e.ename
2 from dept d left join emp e on e.deptno = d.deptno --> outer join
3 order by d.deptno, e.ename;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING CLARK
10 ACCOUNTING KING
10 ACCOUNTING MILLER
20 RESEARCH ADAMS
20 RESEARCH FORD
20 RESEARCH JONES
20 RESEARCH SCOTT
20 RESEARCH SMITH
30 SALES ALLEN
30 SALES BLAKE
30 SALES JAMES
30 SALES MARTIN
30 SALES TURNER
30 SALES WARD
40 OPERATIONS --> department with no employees
15 rows selected.
SQL>
See if it helps.
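Applied to the query in the question, every predicate that carries (+) would move into the ON clause of a LEFT JOIN, while the filter on t1 alone stays in WHERE. Here is a minimal sketch of that translation. It runs against SQLite from Python rather than Spark, and the sample rows are made up; only the table and column names come from the question. The SELECT text itself uses plain ANSI syntax, so it should carry over to spark.sql(...) unchanged.

```python
import sqlite3

# Hypothetical sample rows; only table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 INTEGER, col2 INTEGER);
    CREATE TABLE t2 (col1 INTEGER, col2 INTEGER, col3 TEXT);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (121, 30);
    INSERT INTO t2 VALUES (1, 10, 'ABC'), (2, 20, 'XYZ'), (121, 30, 'ABC');
""")

# ANSI form of the (+) query: predicates that carried (+) go into ON,
# the plain filter on t1 stays in WHERE.
ansi_query = """
    SELECT t1.col1, t1.col2, t2.col1, t2.col2
    FROM t1
    LEFT JOIN t2
           ON t1.col1 = t2.col1
          AND t1.col2 = t2.col2
          AND t2.col3 = 'ABC'
    WHERE t1.col1 <> 121
    ORDER BY t1.col1
"""
rows = conn.execute(ansi_query).fetchall()
for row in rows:
    print(row)
```

Note that moving t2.col3 = 'ABC' into WHERE instead would drop the unmatched t1 rows and effectively turn the outer join into an inner join.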

I do not see the point of joining t1 and t2:
SELECT
t1.col1,
t1.col2
FROM
t1
WHERE
t1.col1 <> 121

Related

SQL subquery COUNT for Oracle

In my Oracle database, I have two tables: T1 with primary key k1, and T2 with a composite primary key (k1, k2). I would like to select all columns in T1 along with the number of rows in T2 for which T1.k1 = T2.k1.
It seems simple, but I can't figure out how to use the COUNT function to get this result. Any ideas?

I don't have your tables so I'll try to illustrate it using Scott's sample emp and dept tables:
SQL> select * from dept t1 order by t1.deptno;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
SQL> select deptno, empno, ename from emp order by deptno;
DEPTNO EMPNO ENAME
---------- ---------- ----------
10 7782 CLARK --> 3 employees in deptno 10
10 7839 KING
10 7934 MILLER
20 7566 JONES --> 5 employees in deptno 20
20 7902 FORD
20 7876 ADAMS
20 7369 SMITH
20 7788 SCOTT
30 7521 WARD --> 6 employees in deptno 30
30 7844 TURNER
30 7499 ALLEN
30 7900 JAMES
30 7698 BLAKE
30 7654 MARTIN
--> 0 employees in deptno 40
14 rows selected.
SQL>
A few options you might try:
Correlated subquery:
SQL> select t1.*,
2 (select count(*) from emp t2 where t2.deptno = t1.deptno) cnt
3 from dept t1
4 order by t1.deptno;
DEPTNO DNAME LOC CNT
---------- -------------- ------------- ----------
10 ACCOUNTING NEW YORK 3
20 RESEARCH DALLAS 5
30 SALES CHICAGO 6
40 OPERATIONS BOSTON 0
SQL>
(Outer) join with the COUNT function and the GROUP BY clause:
SQL> select t1.*, count(t2.rowid) cnt
2 from dept t1 left join emp t2 on t2.deptno = t1.deptno
3 group by t1.deptno, t1.dname, t1.loc
4 order by t1.deptno;
DEPTNO DNAME LOC CNT
---------- -------------- ------------- ----------
10 ACCOUNTING NEW YORK 3
20 RESEARCH DALLAS 5
30 SALES CHICAGO 6
40 OPERATIONS BOSTON 0
SQL>
(Outer) join with the COUNT function in its analytic form:
SQL> select distinct t1.*,
2 count(t2.rowid) over (partition by t1.deptno) cnt
3 from dept t1 left join emp t2 on t2.deptno = t1.deptno
4 order by t1.deptno;
DEPTNO DNAME LOC CNT
---------- -------------- ------------- ----------
10 ACCOUNTING NEW YORK 3
20 RESEARCH DALLAS 5
30 SALES CHICAGO 6
40 OPERATIONS BOSTON 0
SQL>
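The outer-join variants above count t2.rowid rather than using COUNT(*), and that choice matters: COUNT(*) also counts the NULL-extended row that the outer join produces for a department with no employees. A small sketch of the difference, run against SQLite from Python with a cut-down, hypothetical copy of the tables (empno stands in for rowid here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (40, 'OPERATIONS');
    INSERT INTO emp  VALUES (7782, 10), (7839, 10), (7934, 10);
""")

counts = conn.execute("""
    SELECT d.deptno,
           COUNT(*)       AS cnt_star,   -- counts the NULL-extended row too
           COUNT(e.empno) AS cnt_child   -- counts only real emp rows
    FROM dept d
    LEFT JOIN emp e ON e.deptno = d.deptno
    GROUP BY d.deptno
    ORDER BY d.deptno
""").fetchall()
print(counts)
```

Department 40 comes back with cnt_star = 1 but cnt_child = 0, which is why the count should be taken on a column from the outer-joined table.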

list values that are duplicated more than 4 times

I am joining two tables to fetch the phone numbers (num_telefono) that are related, together with the number of times each one is repeated. Now I am trying to put a condition on the count so that only the numbers repeated more than 4 times are listed. I tried HAVING COUNT, but then the counter comes back as NULL for every row.
It is worth mentioning that I did not use GROUP BY for the count because it gave me wrong values.
SELECT
REPLACE(REPLACE(t.id_contrato,'0999',''),'0998','')as contrato ,
t.num_telefono,
conc.valor_actual,
(
SELECT COUNT('x')
FROM TBL_TELEFONO ct
WHERE ct.num_telefono = t.num_telefono
AND ct.origen_tel='TELEFONO CONTRATO'
-- HAVING COUNT(*) > 4
) as counter
FROM TBL_TELEFONO t
INNER JOIN CAM_TBL_ALERTA_CONCENTRADO conc ON t.num_telefono = conc.valor_actual
WHERE id_contrato IS NOT NULL
AND id_contrato NOT IN ('N/A')
ORDER BY 4 DESC
How can I list only those that are repeated more than 4 times?

I don't like debugging code with no sample data, so I'll try to illustrate it on Scott's sample EMP table. "Jobs" will act like your "telephone numbers".
SQL> select deptno, ename, job
2 from emp
3 order by job;
DEPTNO ENAME JOB
---------- ---------- ---------
20 SCOTT ANALYST --> 2 analysts
20 FORD ANALYST
10 MILLER CLERK --> 4 clerks
30 JAMES CLERK
20 SMITH CLERK
20 ADAMS CLERK
30 BLAKE MANAGER --> 3 managers
20 JONES MANAGER
10 CLARK MANAGER
10 KING PRESIDENT --> 1 president
30 TURNER SALESMAN --> 4 salesmen
30 MARTIN SALESMAN
30 WARD SALESMAN
30 ALLEN SALESMAN
14 rows selected.
SQL>
According to that, we'd like to fetch all clerks and salesmen as there are 4 (or more) of them.
Instead of the COUNT aggregate function, use COUNT in its analytic form:
SQL> select deptno, ename, job,
2 count(*) over (partition by job) cnt
3 from emp
4 order by job;
DEPTNO ENAME JOB CNT
---------- ---------- --------- ----------
20 SCOTT ANALYST 2
20 FORD ANALYST 2
10 MILLER CLERK 4
30 JAMES CLERK 4
20 SMITH CLERK 4
20 ADAMS CLERK 4
30 BLAKE MANAGER 3
20 JONES MANAGER 3
10 CLARK MANAGER 3
10 KING PRESIDENT 1
30 TURNER SALESMAN 4
30 MARTIN SALESMAN 4
30 WARD SALESMAN 4
30 ALLEN SALESMAN 4
14 rows selected.
SQL>
Now things become easier: use that query as a CTE (or a subquery) and apply a WHERE clause:
SQL> with temp as
2 (select deptno, ename, job,
3 count(*) over (partition by job) cnt
4 from emp
5 )
6 select deptno, ename, job
7 from temp
8 where cnt >= 4
9 order by job;
DEPTNO ENAME JOB
---------- ---------- ---------
10 MILLER CLERK
30 JAMES CLERK
20 SMITH CLERK
20 ADAMS CLERK
30 TURNER SALESMAN
30 MARTIN SALESMAN
30 WARD SALESMAN
30 ALLEN SALESMAN
8 rows selected.
SQL>
Applied to your query (again, can't test it without any sample data):
with temp as
(select
replace(replace(t.id_contrato,'0999',''),'0998','')as contrato ,
t.num_telefono,
conc.valor_actual,
count(*) over (partition by t.num_telefono) as counter
from tbl_telefono t
inner join cam_tbl_alerta_concentrado conc on t.num_telefono = conc.valor_actual
where id_contrato is not null
and id_contrato not in ('N/A')
)
select contrato, num_telefono, valor_actual
from temp
where counter > 4;
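The same pattern can be tried out end to end on a toy table. The sketch below is only an illustration: the phones table and its rows are hypothetical, and it runs on SQLite from Python (window functions need SQLite 3.25 or later). It uses the strict "more than 4" condition from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phones (contract TEXT, num TEXT)")

# Hypothetical data: one number used by 5 contracts, another by only 2.
sample = [(f"C{i}", "555-0001") for i in range(5)]
sample += [("C5", "555-0002"), ("C6", "555-0002")]
conn.executemany("INSERT INTO phones VALUES (?, ?)", sample)

# Analytic COUNT keeps every row and attaches the per-number count to it;
# the outer query can then filter on that count.
repeated = conn.execute("""
    WITH temp AS (
        SELECT contract, num,
               COUNT(*) OVER (PARTITION BY num) AS counter
        FROM phones
    )
    SELECT contract, num, counter
    FROM temp
    WHERE counter > 4
    ORDER BY contract
""").fetchall()
for row in repeated:
    print(row)
```

Only the five rows for the number that occurs five times survive the filter; the number that occurs twice is dropped.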

Join to a subquery that selects only the numbers occurring more than 4 times:
SELECT
REPLACE(REPLACE(t.id_contrato,'0999',''),'0998','')as contrato ,
t.num_telefono,
conc.valor_actual,
ct.counter
FROM TBL_TELEFONO t
INNER JOIN (
SELECT num_telefono, COUNT(*) AS counter
FROM TBL_TELEFONO
WHERE origen_tel='TELEFONO CONTRATO'
GROUP BY num_telefono
HAVING COUNT(*) > 4
) ct
ON ct.num_telefono = t.num_telefono
INNER JOIN CAM_TBL_ALERTA_CONCENTRADO conc
ON t.num_telefono = conc.valor_actual
WHERE id_contrato IS NOT NULL
AND id_contrato NOT IN ('N/A')
ORDER BY ct.counter DESC

Wrap the query in another one that returns only the rows where counter > 4:
select *
from (
<your query, but without order by>
) x
where counter > 4
order by counter desc

ORDER BY with DISTINCT gives ORA-01791: not a SELECTed expression

I am trying to use ORDER BY with a SELECT DISTINCT statement in Oracle, but it raises an exception:
ORA-01791: not a SELECTed expression.
select distinct usermenu.menuname
from usermenu, userpermission
where userpermission.menuno = usermenu.menuno
and userpermission.userno = 1
order by userpermission.menuno;

When there's DISTINCT or an aggregate function in the SELECT statement's column list, ORDER BY a column which isn't part of the SELECT column list won't work.
Here's an example, based on Scott's schema.
This works OK, although D.LOC isn't selected:
SQL> select d.dname, e.ename
2 from dept d join emp e on e.deptno = d.deptno
3 order by d.loc;
DNAME ENAME
-------------- ----------
SALES BLAKE
SALES TURNER
SALES ALLEN
SALES MARTIN
SALES WARD
SALES JAMES
RESEARCH SCOTT
RESEARCH JONES
RESEARCH SMITH
RESEARCH ADAMS
RESEARCH FORD
ACCOUNTING KING
ACCOUNTING MILLER
ACCOUNTING CLARK
14 rows selected.
Now, add DISTINCT - basically, that's what you have:
SQL> select distinct d.dname, e.ename
2 from dept d join emp e on e.deptno = d.deptno
3 order by d.loc;
order by d.loc
*
ERROR at line 3:
ORA-01791: not a SELECTed expression
The same goes for aggregate functions, such as COUNT:
SQL> select d.dname, e.ename, count(*)
2 from dept d join emp e on e.deptno = d.deptno
3 group by d.dname, e.ename
4 order by d.loc;
order by d.loc
*
ERROR at line 4:
ORA-00979: not a GROUP BY expression
SQL>
So, what to do? Order by something else. Alternatively, use the current query as an inline view, join it with the table that contains the column you'd want to order the result by and it'll work:
SQL> select x.dname, x.ename
2 from (select distinct d.dname, e.ename
3 from dept d join emp e on e.deptno = d.deptno
4 ) x
5 join dept d1 on d1.dname = x.dname
6 order by d1.loc;
DNAME ENAME
-------------- ----------
SALES TURNER
SALES JAMES
SALES BLAKE
SALES WARD
SALES MARTIN
SALES ALLEN
RESEARCH SMITH
RESEARCH FORD
RESEARCH ADAMS
RESEARCH SCOTT
RESEARCH JONES
ACCOUNTING MILLER
ACCOUNTING KING
ACCOUNTING CLARK
14 rows selected.
SQL>
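Another option, not shown above, is to replace DISTINCT with GROUP BY and order by an aggregate of the column that isn't selected, e.g. MIN(menuno). Here is a sketch of that idea applied to the tables from the question. The rows are hypothetical, and it runs on SQLite from Python purely to show the shape of the query; SQLite does not raise ORA-01791, so this demonstrates the rewrite, not the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usermenu (menuno INTEGER PRIMARY KEY, menuname TEXT);
    CREATE TABLE userpermission (userno INTEGER, menuno INTEGER);
    -- Hypothetical rows: two menu entries share the name 'Reports'.
    INSERT INTO usermenu VALUES (1, 'Reports'), (2, 'Admin'),
                                (3, 'Billing'), (4, 'Reports');
    INSERT INTO userpermission VALUES (1, 3), (1, 1), (1, 4), (2, 2);
""")

# GROUP BY removes the duplicates; MIN(menuno) gives each group a single,
# well-defined sort key, so ORDER BY no longer needs an unselected column.
menus = [name for (name,) in conn.execute("""
    SELECT m.menuname
    FROM usermenu m
    JOIN userpermission p ON p.menuno = m.menuno
    WHERE p.userno = 1
    GROUP BY m.menuname
    ORDER BY MIN(p.menuno)
""")]
print(menus)
```

Whether MIN or MAX is the right aggregate depends on which of the duplicate rows should decide the position.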

Apex. How to output NULL values too?

I have a question about a single SQL query behind an interactive table in Apex.
The query below is supposed to list all the artists in the database, but it only returns entries where all the fields have values; artists with at least one NULL field are not displayed.
select artist.name as "Artist", country.name as "Country" , city.name as "City of foundation", label.name as "Label of records"
from artist, country, city, label
where artist.country = country_id
and artist.city = city_id
and city.country = country_id
and artist.label = label_id
How to fix it?
https://i.stack.imgur.com/ZRYzm.png

As you didn't provide a test case (a screenshot isn't quite enough - at least, not to me), I'll try to show what might be going on using Scott's schema.
There are 4 departments: note department 40, and the fact that nobody works in it:
SQL> select * from dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
SQL> select * from emp where deptno = 40;
no rows selected
If you want to display all 4 departments and employees who work in them, you'd join EMP and DEPT table. Outer join lets you display department 40 (which, as we saw, has no employees):
SQL> select d.deptno, d.dname, e.ename
2 from dept d left join emp e on e.deptno = d.deptno --> outer join is here
3 order by d.deptno;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING CLARK
10 ACCOUNTING MILLER
10 ACCOUNTING KING
20 RESEARCH JONES
20 RESEARCH SMITH
20 RESEARCH SCOTT
20 RESEARCH FORD
20 RESEARCH ADAMS
30 SALES WARD
30 SALES TURNER
30 SALES ALLEN
30 SALES JAMES
30 SALES MARTIN
30 SALES BLAKE
40 OPERATIONS --> this is what you're looking for
15 rows selected.
SQL>
You'd get the same result using Oracle's old (+) outer join operator. It is better to switch to modern joins and avoid that operator, though.
SQL> select d.deptno, d.dname, e.ename
2 from dept d, emp e
3 where d.deptno = e.deptno (+) --> the old outer join operator
4 order by d.deptno;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING CLARK
10 ACCOUNTING MILLER
10 ACCOUNTING KING
20 RESEARCH JONES
20 RESEARCH SMITH
20 RESEARCH SCOTT
20 RESEARCH FORD
20 RESEARCH ADAMS
30 SALES WARD
30 SALES TURNER
30 SALES ALLEN
30 SALES JAMES
30 SALES MARTIN
30 SALES BLAKE
40 OPERATIONS
15 rows selected.
SQL>
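Carried over to the artist query, the idea would be to start from artist and LEFT JOIN each lookup table, so that a NULL country, city or label no longer removes the row. The sketch below rests on assumptions: the question doesn't show the table definitions, so country_id, city_id and label_id are assumed to be the keys of country, city and label, and all rows are invented. It runs on SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema; the question does not show the real definitions.
    CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city    (city_id INTEGER PRIMARY KEY, name TEXT, country INTEGER);
    CREATE TABLE label   (label_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artist  (name TEXT, country INTEGER, city INTEGER, label INTEGER);
    INSERT INTO country VALUES (1, 'Norway');
    INSERT INTO city    VALUES (1, 'Oslo', 1);
    INSERT INTO label   VALUES (1, 'Label One');
    INSERT INTO artist  VALUES ('Artist A', 1, 1, 1),
                               ('Artist B', 1, NULL, 1),
                               ('Artist C', NULL, NULL, NULL);
""")

# Every lookup is outer-joined, so artists with missing references stay.
artists = conn.execute("""
    SELECT a.name, co.name, ci.name, l.name
    FROM artist a
    LEFT JOIN country co ON co.country_id = a.country
    LEFT JOIN city    ci ON ci.city_id = a.city
                        AND ci.country = co.country_id
    LEFT JOIN label   l  ON l.label_id = a.label
    ORDER BY a.name
""").fetchall()
for row in artists:
    print(row)
```

With the original comma joins only 'Artist A' would be returned; here all three artists appear, with NULLs where a reference is missing.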

select how many DEPTNO's exists for each LOC plus number of EMP's

I guess I messed up the joins...
A single DEPT has many LOCations, and every DEPT should have a sum of its EMPs. I need to show the DISTINCT LOCs, the number of LOCs per DEPT and the sum of EMPs in each DEPT, without splitting the locations.
The DEPT and EMP tables look like this:
SELECT DEPTNO, DNAME, LOC FROM DEPT;
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
50 NONE DALLAS
select ENAME, DEPTNO from EMP;
SMITH 20
ALLEN 30
WARD 30
JONES 20
MARTIN 30
BLAKE 30
CLARK 10
SCOTT 20
KING 10
TURNER 30
ADAMS 20
JAMES 30
FORD 20
MILLER 10
CURT 40
I can't properly add the subquery below into my WITH clause further down. I need to select how many DEPTNOs exist for each LOC plus the number of EMPs in each DEPT, as shown here (plus the location count); it should combine two different kinds of summaries:
select e.deptno, count(e.deptno) from emp e
group by e.deptno;
10 3
20 5
30 6
40 1
Here is what I did:
WITH workers_per_dept as
(SELECT
d.LOC LOC,
d.deptno DEPTNO,
count(e.empno) EMP_NUMBER
FROM dept d
LEFT OUTER JOIN emp e ON (e.deptno = d.deptno)
GROUP BY d.LOC,d.deptno
ORDER BY d.deptno)
select
d.LOC LOCATION,
count(d.LOC) LOCATIONS_PER_DEPT,
workers_per_dept.EMP_NUMBER
from DEPT d, workers_per_dept
WHERE d.LOC = workers_per_dept.LOC
GROUP BY d.LOC, workers_per_dept.EMP_NUMBER
ORDER BY 1;
I receive this (it should be grouped by LOC):
BOSTON 1 1
CHICAGO 1 6
DALLAS 2 0
DALLAS 2 5
NEW YORK 1 3
(the result should not repeat LOCs; 'DALLAS 2 0' should be skipped)

Here is the query. You need to group by location and count the distinct departments in each group:
select d.loc,
count(distinct d.deptno) depts,
count(e.ename) emps
from dept d
left join emp e on d.deptno = e.deptno
group by d.loc
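To check that query against the data from the question, here is a small sketch that loads the DEPT and EMP rows listed above into SQLite from Python and runs it. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT, loc TEXT)")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER)")

# Rows copied from the question.
conn.executemany("INSERT INTO dept VALUES (?, ?, ?)", [
    (10, "ACCOUNTING", "NEW YORK"), (20, "RESEARCH", "DALLAS"),
    (30, "SALES", "CHICAGO"), (40, "OPERATIONS", "BOSTON"),
    (50, "NONE", "DALLAS"),
])
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    ("SMITH", 20), ("ALLEN", 30), ("WARD", 30), ("JONES", 20),
    ("MARTIN", 30), ("BLAKE", 30), ("CLARK", 10), ("SCOTT", 20),
    ("KING", 10), ("TURNER", 30), ("ADAMS", 20), ("JAMES", 30),
    ("FORD", 20), ("MILLER", 10), ("CURT", 40),
])

# COUNT(DISTINCT d.deptno) counts departments per location, while
# COUNT(e.ename) skips the NULL row produced for department 50.
per_loc = conn.execute("""
    SELECT d.loc,
           COUNT(DISTINCT d.deptno) AS depts,
           COUNT(e.ename)           AS emps
    FROM dept d
    LEFT JOIN emp e ON d.deptno = e.deptno
    GROUP BY d.loc
    ORDER BY d.loc
""").fetchall()
for row in per_loc:
    print(row)
```

DALLAS now appears once, with 2 departments and 5 employees.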

Stupid me... but I finally did it myself.
It was so easy...
SELECT d.LOC LOCATION,
(select count(d.LOC) FROM DEPT l WHERE l.LOC=d.LOC) LOCS,
(select count(e.deptno) FROM EMP e WHERE e.DEPTNO = d.DEPTNO) EMPS
FROM DEPT d;
Thanks for help!!!
