SQL Optimization, Nested Query on the same table - sql

I have to create a query that returns employees having multiple parent territories for the same function code:
Table employee_territory_function
employee_id| employee_function_id | territory_id
+----------+-----------------------+-------------+
| 12345 | C1 | t1 |
| 12345 | C1 | t2 |
| 12346 | C2 | t3 |
| 12346 | C2 | t4 |
| 12347 | C4 | t8 |
Table territory
territory_id| territory_parent_id
+-----------+-------------------+
| t1 | P1 |
| t2 | P1 |
| t3 | P2 |
| t4 | P3 |
| t8 | P8 |
The result must be employee_id 12346, which has multiple parents.
My query was:
select * from employee_territory_function tr1 where tr1.employee_id in (
select ee.employee_id from (
select et.employee_id from employee_territory_function et
join territory territory on territory.id = et.territory_id
where et.employee_id in (
select etf.employee_id ,etf.employee_function_id from employee_territory_function etf
group by etf.employee_id ,etf.employee_function_id having count(*)>1)) ee
group by ee.employee_id ,ee.employee_function_id ,ee.territory_parent_id having count(*) =1)
The query takes a long time to execute with 10k (employee, function code) pairs.
Is there a way to optimize the query or rewrite it differently?

SELECT E.EMPLOYEE_ID,E.EMPLOYEE_FUNCTION_ID
FROM EMPLOYEE_TERRITORY_FUNCTION AS E
JOIN TERRITORY AS T ON E.TERRITORY_ID=T.TERRITORY_ID
GROUP BY E.EMPLOYEE_ID,E.EMPLOYEE_FUNCTION_ID
HAVING MIN(T.TERRITORY_PARENT_ID)<>MAX(T.TERRITORY_PARENT_ID)
Based on your sample data, this returns employee 12346 with function code C2.
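To check the MIN/MAX idea against the sample data, here is a minimal sketch that loads the two tables into an in-memory SQLite database from Python. SQLite and the lowercase aliases are my assumptions, not something the question specifies; the query itself is the one above, run against employee_territory_function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_territory_function (
    employee_id INTEGER, employee_function_id TEXT, territory_id TEXT);
CREATE TABLE territory (territory_id TEXT, territory_parent_id TEXT);
INSERT INTO employee_territory_function VALUES
    (12345, 'C1', 't1'), (12345, 'C1', 't2'),
    (12346, 'C2', 't3'), (12346, 'C2', 't4'),
    (12347, 'C4', 't8');
INSERT INTO territory VALUES
    ('t1', 'P1'), ('t2', 'P1'), ('t3', 'P2'), ('t4', 'P3'), ('t8', 'P8');
""")

# A group has more than one distinct parent exactly when MIN <> MAX.
rows = conn.execute("""
SELECT e.employee_id, e.employee_function_id
FROM employee_territory_function AS e
JOIN territory AS t ON e.territory_id = t.territory_id
GROUP BY e.employee_id, e.employee_function_id
HAVING MIN(t.territory_parent_id) <> MAX(t.territory_parent_id)
""").fetchall()
print(rows)
```

The single join plus one GROUP BY replaces the three nested subqueries in the question, which is likely where most of the time goes.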

Related

Can someone help me figure out if I'm making a mistake in my query?

I'm trying to create a query that returns the names of all people in my database that have less than half of the money of the person with the most money.
This is my query:
select P1.name
from Persons P1 left join
AccountOf A1 on A1.person_id = P1.id left join
BankAccounts B1 on B1.id = A1.account_id
group by name
having SUM(B1.balance) < MAX((select SUM(B1.balance) as b
from AccountOf A1 left join
BankAccounts B1 on B1.id = A1.account_id
group by A1.person_id
order by b desc
LIMIT 1)) * 0.5
This is the result:
+-------+
| name |
+-------+
| Evert |
+-------+
I have the following tables in the database:
+---------+--------+--+
| Persons | | |
+---------+--------+--+
| id | name | |
| 11 | Evert | |
| 12 | Xavi | |
| 13 | Ludwig | |
| 14 | Ziggy | |
+---------+--------+--+
+--------------+---------+
| BankAccounts | |
+--------------+---------+
| id | balance |
| 11 | 525000 |
| 12 | 750000 |
| 13 | 1900000 |
| 14 | 1600000 |
+--------------+---------+
+-----------+-----------+------------+
| AccountOf | | |
+-----------+-----------+------------+
| id | person_id | account_id |
| 301 | 11 | 12 |
| 302 | 13 | 12 |
| 303 | 13 | 14 |
| 304 | 14 | 11 |
| 305 | 14 | 13 |
+-----------+-----------+------------+
What am I missing here? I should get two entries in the result (Evert, Xavi)
I wouldn't approach the logic this way (I would use window functions). But your final having has two levels of aggregation. That shouldn't work. You want:
having SUM(B1.balance) < (select 0.5 * SUM(B1.balance) as b
from AccountOf A1 join
BankAccounts B1 on B1.id = A1.account_id
group by A1.person_id
order by b desc
limit 1
)
I also moved the 0.5 into the subquery and changed the left join to a join -- the tables need to match to get balances.
I would recommend window functions, if your - undisclosed! - database supports them.
You can join and aggregate just once, and then use a window max() to get the top balance. All that is left is to filter in an outer query:
select *
from (
select p.id, p.name, coalesce(sum(balance), 0) balance,
max(sum(balance)) over() max_balance
from persons p
left join accountof ao on ao.person_id = p.id
left join bankaccounts ba on ba.id = ao.account_id
group by p.id, p.name
) t
where balance < max_balance * 0.5
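The aggregate-once idea can be checked against the sample tables. This is only a sketch, assuming SQLite driven from Python; it uses a CTE (named totals here, my choice) and a scalar MAX subquery in place of the window function so it also runs on older SQLite builds. Note the COALESCE: Xavi has no accounts, so his SUM is NULL, and a NULL comparison silently drops him, which is why the original query returned only Evert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (id INTEGER, name TEXT);
CREATE TABLE BankAccounts (id INTEGER, balance INTEGER);
CREATE TABLE AccountOf (id INTEGER, person_id INTEGER, account_id INTEGER);
INSERT INTO Persons VALUES (11,'Evert'),(12,'Xavi'),(13,'Ludwig'),(14,'Ziggy');
INSERT INTO BankAccounts VALUES (11,525000),(12,750000),(13,1900000),(14,1600000);
INSERT INTO AccountOf VALUES
    (301,11,12),(302,13,12),(303,13,14),(304,14,11),(305,14,13);
""")

names = [r[0] for r in conn.execute("""
WITH totals AS (
    -- one row per person; people without accounts get a total of 0
    SELECT p.id, p.name, COALESCE(SUM(ba.balance), 0) AS total
    FROM Persons p
    LEFT JOIN AccountOf ao ON ao.person_id = p.id
    LEFT JOIN BankAccounts ba ON ba.id = ao.account_id
    GROUP BY p.id, p.name
)
SELECT name
FROM totals
WHERE total < 0.5 * (SELECT MAX(total) FROM totals)
ORDER BY name
""")]
print(names)
```

With the sample data the top total is Ziggy's 2425000, so the cutoff is 1212500 and both Evert (750000) and Xavi (0) fall below it.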

SQL - Select records without duplicate on just one field in SQL?

Table 1
| Customer_ID | Template_ID
---------------------
| C1 | T1 |
| C1 | T2 |
---------------------
Table 2
---------------------
| Template_ID | Product_ID
---------------------
| T1 | P1 |
| T1 | P5 |
| T1 | P5 |
| T2 | P10 |
| T2 | P45 |
Expected Join query result:
------------------------------------------
| Customer_ID | Template_ID | Product_ID
------------------------------------------
| C1 | T1 | P1
| C1 | T1 | P5
| C1 | T2 | P10
| C1 | T2 | P45
.
.
For a template, I want to get only the unique Product_ID like above. Currently my query returns P5 twice like,
.
.
| C1 | T1 | P5
| C1 | T1 | P5
.
.
How can I handle this at the query level?
Use DISTINCT:
select distinct t1.*,t2.Product_ID
from table1 t1 join table2 t2 on t1.Template_ID =t2.Template_ID
Use DISTINCT to eliminate duplicates. It does not apply to the first column only, but to the whole row.
For example:
select distinct t1.customer_id, t1.template_id, t2.product_id
from t1
join t2 on t2.template_id = t1.template_id
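You can also GROUP BY every selected column, which removes duplicate rows the same way DISTINCT does. Grouping by Product_ID alone is rejected by most databases and would also merge the same product across different templates:
SELECT Customer_ID, Template_ID, Product_ID
FROM table1
JOIN table2 using ( Template_ID )
GROUP BY Customer_ID, Template_ID, Product_ID;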
Please try this.
SELECT
DISTINCT A.Customer_ID ,A.Template_ID ,B.Product_ID
FROM
table1 AS A
INNER JOIN table2 AS B
ON A.Template_ID = B.Template_ID
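To see the effect of DISTINCT on the sample rows, here is a small sketch that runs the same join with and without it. It assumes SQLite through Python, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Customer_ID TEXT, Template_ID TEXT);
CREATE TABLE table2 (Template_ID TEXT, Product_ID TEXT);
INSERT INTO table1 VALUES ('C1','T1'), ('C1','T2');
INSERT INTO table2 VALUES
    ('T1','P1'), ('T1','P5'), ('T1','P5'), ('T2','P10'), ('T2','P45');
""")

join_sql = """
SELECT {distinct} a.Customer_ID, a.Template_ID, b.Product_ID
FROM table1 AS a
JOIN table2 AS b ON a.Template_ID = b.Template_ID
ORDER BY a.Template_ID, b.Product_ID
"""

# Same join twice: once plain, once with DISTINCT applied to the whole row.
plain = conn.execute(join_sql.format(distinct="")).fetchall()
unique = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
print(len(plain), plain)
print(len(unique), unique)
```

The plain join keeps both (C1, T1, P5) rows; the DISTINCT version collapses them into one because the entire row is identical.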

MS SQL Server - Select group of max n results of multiple columns in join query result set

Table 1
| Customer_ID | Template_ID
---------------------
| C1 | T1 |
| C1 | T2 |
| C2 | T100 |
| C2 | T5 |
---------------------
Table 2
---------------------
| Template_ID | Product_ID
---------------------
| T1 | P1 |
| T1 | P2 |
| T1 | P5 |
| T2 | P10 |
| T2 | P45 |
| T100 | P98 |
| T100 | P78 |
| T5 | P7777 |
| T5 | P9 |
| T5 | P10 |
| T5 | P1 |
Join query result:
------------------------------------------
| Customer_ID | Template_ID | Product_ID
------------------------------------------
| C1 | T1 | P1
| C1 | T1 | P2
| C1 | T1 | P5
| C1 | T2 | P10
| C1 | T2 | P45
| C2 | T100 | P98
| C2 | T100 | P78
| C2 | T5 | P7777
.
.
I have an existing join query which returns all the matches for Customer_ID & Template_ID. I want to restrict it to only the latest template per customer (the latest Customer_ID & Template_ID combination).
Expected output:
Customer_ID Template_ID Product ID
C1 T1 P1
C1 T1 P2
C1 T1 P5
C2 T100 P98
C2 T100 P78
PS: Actually I want the latest 10; for easier understanding I mention only the most recent Customer_ID & Template_ID combination. I have a date column in Table1 and I use 'order by SAVED_DATE DESC', so in the result set I want to get the first one. There are other tables in the join, which I have left out to keep it simple.
You can try using a COUNT window function partitioned by the Customer_ID and Template_ID columns in a CTE,
then use a correlated EXISTS subquery to keep the rows with the max count from the CTE.
;with cte as (
SELECT t1.Template_ID,
t1.Customer_ID,
t2.Product_ID,
COUNT(*) OVER(PARTITION BY Customer_ID,t2.Template_ID) cnt
FROM t1
join t2 on t1.Template_ID = t2.Template_ID
)
select Template_ID,
Customer_ID,
Product_ID
from cte c1
where exists (
select 1
from cte cc
where c1.Customer_ID = cc.Customer_ID
having max(cc.cnt) = c1.cnt
)
sqlfiddle
Sounds like you need a RANK on table1:
;with cte as
(
select *,
-- assign a ranking for each customer
RANK() OVER(PARTITION BY Customer_ID ORDER BY SAVED_DATE DESC) AS rnk
from table1
)
select ...
from cte
join table2 as t2
on cte.Template_ID = t2.Template_ID
WHERE cte.rnk <= 10 -- now filter for the n latest rows per customer
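Here is a sketch of the RANK approach on the sample data, with n = 1 so the output matches the expected result in the question. Several things are assumptions on my side: it runs on SQLite via Python (window functions need SQLite 3.25 or newer), and the SAVED_DATE values are invented, since the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Customer_ID TEXT, Template_ID TEXT, SAVED_DATE TEXT);
CREATE TABLE table2 (Template_ID TEXT, Product_ID TEXT);
-- SAVED_DATE values are hypothetical; T1 and T100 are made the latest
INSERT INTO table1 VALUES
    ('C1','T1','2024-02-01'), ('C1','T2','2024-01-01'),
    ('C2','T100','2024-03-01'), ('C2','T5','2024-01-15');
INSERT INTO table2 VALUES
    ('T1','P1'), ('T1','P2'), ('T1','P5'), ('T2','P10'), ('T2','P45'),
    ('T100','P98'), ('T100','P78'),
    ('T5','P7777'), ('T5','P9'), ('T5','P10'), ('T5','P1');
""")

n = 1  # number of latest templates to keep per customer
rows = conn.execute("""
WITH cte AS (
    SELECT Customer_ID, Template_ID,
           RANK() OVER (PARTITION BY Customer_ID
                        ORDER BY SAVED_DATE DESC) AS rnk
    FROM table1
)
SELECT cte.Customer_ID, cte.Template_ID, t2.Product_ID
FROM cte
JOIN table2 AS t2 ON cte.Template_ID = t2.Template_ID
WHERE cte.rnk <= ?
ORDER BY cte.Customer_ID, t2.Product_ID
""", (n,)).fetchall()
print(rows)
```

Ranking table1 before joining keeps the window small: the rank is computed once per template, not once per joined product row. If two templates of one customer share the same SAVED_DATE, RANK gives them the same rank, so ROW_NUMBER may be the better fit when exactly n templates are required.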

SQL to find linking column across tables without foreign keys

I am trying to find table links using duplicate column names. Say I have the following tables:
T1:
| Prod_ID | Cust_Id | Value |
| P1 | C1 | 1 |
| P2 | C2 | 2 |
| P3 | C3 | 3 |
| P4 | C4 | 4 |
| P5 | C5 | 5 |
T2:
| Prod_ID | Prod_Num |
| P1 | PN1 |
| P2 | PN2 |
| P3 | PN3 |
| P4 | PN4 |
| P5 | PN5 |
I rely on system tables to fetch table information. The data looks like
| tabname | colname |
| T1 | Prod_ID |
| T1 | Cust_Id |
| T1 | Value |
| T2 | Prod_ID |
| T2 | Prod_Num |
| T3 | .... |
If I want to find all tables with columns Prod_ID and Cust_Id, I can do that using
SELECT tabname, count(*)
FROM syscat.columns
WHERE colname IN ('Prod_ID', 'Cust_Id')
GROUP BY tabname
HAVING count(*) > 1
Now, when I want to find how two columns across tables are linked, the query gets complex.
For example: To find how Cust_Id and Prod_Num are linked, the expected output would be something like
| tabname | colname |
| T1 | Cust_id |
| T1 | Prod_id |
| T2 | Prod_id |
| T2 | Prod_Num |
Suggesting that Prod_Id is contained in both tables and can be used to map Cust_Id and Prod_num. Is there a script for getting something like above?
I would use self-joins for that.
SELECT c1.tabname, c2.colname joinCol, c3.tabname
FROM syscat.columns c1
JOIN syscat.columns c2 ON c1.tabname = c2.tabname
JOIN syscat.columns c3 ON c3.tabname != c2.tabname and c3.colname = c2.colname
JOIN syscat.columns c4 ON c4.tabname = c3.tabname and c3.colname = c2.colname
WHERE c1.colname = 'Cust_Id' and c4.colname = 'Prod_Num'
The output is the following:
tabname joinCol tabname
---------------------------
T1 Prod_id T2
which means that table t1 is joined with t2 using prod_id (cust_id and prod_num are on the input, therefore there is no need to have them on the output)
demo - it is SQL Server, however, JOIN will work in DB2 as well ;)
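The self-join can be tried without a real catalog. This is a minimal sketch assuming SQLite via Python, with a stand-in table called catalog_columns (a hypothetical name) holding the same tabname/colname pairs that syscat.columns would return for T1 and T2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- catalog_columns is a hypothetical stand-in for syscat.columns
CREATE TABLE catalog_columns (tabname TEXT, colname TEXT);
INSERT INTO catalog_columns VALUES
    ('T1','Prod_ID'), ('T1','Cust_Id'), ('T1','Value'),
    ('T2','Prod_ID'), ('T2','Prod_Num');
""")

rows = conn.execute("""
SELECT c1.tabname, c2.colname AS joinCol, c3.tabname
FROM catalog_columns c1                                   -- table holding the first column
JOIN catalog_columns c2 ON c1.tabname = c2.tabname        -- every column of that table
JOIN catalog_columns c3 ON c3.tabname != c2.tabname
                       AND c3.colname = c2.colname        -- same column name elsewhere
JOIN catalog_columns c4 ON c4.tabname = c3.tabname        -- columns of the other table
WHERE c1.colname = 'Cust_Id' AND c4.colname = 'Prod_Num'
""").fetchall()
print(rows)
```

This only finds links that are one hop apart. A chain through a third table would need another pair of joins or a recursive query.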

Build tree from database

I have two tables.
Table 1. (tbl_1)
| ID | Name |
| -- | -------- |
| 1 | Company1 |
| 2 | Company2 |
| 3 | Company3 |
Table 2. (tbl_2)
| ID | Company_group |
| -- | ------------- |
| 1 | Company2 |
| 2 | Company2 |
| 3 | Company2 |
I know that Company2 is the parent company, and I want to get the following result:
| ID | Name | RootName | RootId |
| -- | -------- | -------- | ------ |
| 1 | Company1 | Company2 | 2 |
| 3 | Company3 | Company2 | 2 |
I don't know the parentId, but I can select all parent companies with the following query:
SELECT DISTINCT id parentId,
name parent_name FROM tbl_1 WHERE name in (
SELECT DISTINCT
Company_group
FROM tbl_2)
How can I build a tree for this hierarchy? I am stuck, please help.
It is a strange architecture for this case, but I am not the architect of the database.
I also wrote a query, but it does not work correctly. It returns too many records.
SELECT ac.id_c parentId, acc.id, ac.Company_group parent_name
FROM tbl_2 ac
JOIN tbl_2 acc
ON ac.Company_group = acc.Company_group
AND ac.id in (
SELECT DISTINCT id parentId
FROM tbl_1 WHERE name in (
SELECT DISTINCT
id parentId
FROM tbl_2)
)
WHERE ac.Company_group iS NOT NULL AND acc.id IS NOT NULL
and ac.id <> acc.id
ORDER BY ac.Company_group
create table tbl_1 (ID int,Name varchar(100));
insert into tbl_1 (ID,Name) values (1,'Company1'),(2,'Company2'),(3,'Company3');
create table tbl_2 (ID int,Company_group varchar(100));
insert into tbl_2 (ID,Company_group) values (1,'Company2'),(2,'Company2'),(3,'Company2');
select t1.ID
,t1.Name
,t2.Company_group as RootName
,t1_b.ID as RootId
from tbl_1 t1
join tbl_2 t2
on t2.ID =
t1.ID
join tbl_1 t1_b
on t1_b.Name =
t2.Company_group
where t1.ID <> t1_b.ID
;
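The script above can be run end to end to confirm it produces the requested rows. A minimal sketch, assuming SQLite via Python (the original dialect is not stated); the ORDER BY is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_1 (ID INT, Name VARCHAR(100));
INSERT INTO tbl_1 (ID, Name) VALUES (1,'Company1'), (2,'Company2'), (3,'Company3');
CREATE TABLE tbl_2 (ID INT, Company_group VARCHAR(100));
INSERT INTO tbl_2 (ID, Company_group) VALUES (1,'Company2'), (2,'Company2'), (3,'Company2');
""")

rows = conn.execute("""
SELECT t1.ID, t1.Name, t2.Company_group AS RootName, t1_b.ID AS RootId
FROM tbl_1 t1
JOIN tbl_2 t2   ON t2.ID = t1.ID                  -- group of each company
JOIN tbl_1 t1_b ON t1_b.Name = t2.Company_group   -- resolve group name to the parent row
WHERE t1.ID <> t1_b.ID                            -- drop the parent pointing at itself
ORDER BY t1.ID
""").fetchall()
print(rows)
```

The second join back to tbl_1 is what recovers the parent ID from the group name, so it relies on company names being unique in tbl_1.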
You simply need to join the table with itself:
SELECT *
FROM Company c1
LEFT OUTER JOIN Company c2 ON c1.ParentID = c2.ID