SQL Question : Technical question that requires you to answer How many members worked at IBM before working at Google? [closed] - sql

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 months ago.
Assume the following dataset:
member_id  company    Year_started
=========  =========  ============
1          Apple      2001
1          IBM        2002
1          Oracle     2005
1          Microsoft  2010
2          IBM        2002
2          Microsoft  2004
2          Oracle     2008
Member 1 began work at IBM in 2002, moved to Oracle in 2005, and moved to Microsoft in 2010. Member 2 began working at IBM in 2002, moved to Microsoft in 2004, and then moved to Oracle in 2008. Assume that each member has only one company in any given year (a member cannot work at two different companies in the same year).
Question: How many members ever worked at IBM prior to working at Oracle?
How would you go about solving this? I tried a combination of CASE when's but am lost as to where else to go. Thanks.

I would phrase this using EXISTS:
SELECT DISTINCT member_id
FROM yourTable t1
WHERE company = 'Oracle' AND
      EXISTS (SELECT 1 FROM yourTable t2
              WHERE t2.member_id = t1.member_id AND
                    t2.company = 'IBM' AND
                    t2.Year_started < t1.Year_started);
In plain English, the above query reports every member who worked at Oracle in some year and for whom there is a record from an earlier year showing the same member at IBM. Counting the returned member_id values gives the answer to the question.
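To see the EXISTS approach produce the count directly, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite and Python are assumptions made for illustration only; yourTable is the placeholder table name, and the later company is taken to be 'Oracle', as in the question body.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (member_id INTEGER, company TEXT, Year_started INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "Apple", 2001), (1, "IBM", 2002), (1, "Oracle", 2005), (1, "Microsoft", 2010),
        (2, "IBM", 2002), (2, "Microsoft", 2004), (2, "Oracle", 2008),
    ],
)

# Count members with an Oracle row that is preceded by an IBM row.
query = """
SELECT COUNT(DISTINCT t1.member_id)
FROM yourTable t1
WHERE t1.company = 'Oracle'
  AND EXISTS (SELECT 1
              FROM yourTable t2
              WHERE t2.member_id = t1.member_id
                AND t2.company = 'IBM'
                AND t2.Year_started < t1.Year_started)
"""
ibm_before_oracle = conn.execute(query).fetchone()[0]
print(ibm_before_oracle)
```

Both members in the sample data joined IBM before Oracle, so both are counted.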

I would go for querying the table twice and then join like so:
select count(*)
from (
select member_id, min(Year_started) y
from my_table
where company = 'IBM'
group by member_id
) i
join (
select member_id, max(Year_started) y
from my_table
where company = 'Oracle'
group by member_id
) o on i.member_id = o.member_id and i.y < o.y
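Note the different aggregates on the year column: min on the IBM side and max on the Oracle side. Comparing the earliest IBM year with the latest Oracle year matches whenever any IBM stint precedes any Oracle stint, so this yields a match for sequences such as Oracle-IBM-Oracle as well as IBM-Oracle-IBM.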
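The effect of the min/max choice can be checked with a small sketch. It uses Python's sqlite3 module (an assumption for illustration) and adds a hypothetical member 3 with the sequence Oracle-IBM-Oracle, which is not part of the question's data. The helper count_matches is also hypothetical; it only swaps the aggregate used on the Oracle side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (member_id INTEGER, company TEXT, Year_started INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "IBM", 2002), (1, "Oracle", 2005),
        # Hypothetical member: Oracle, then IBM, then Oracle again.
        (3, "Oracle", 2000), (3, "IBM", 2003), (3, "Oracle", 2006),
    ],
)

def count_matches(oracle_agg):
    """Run the join query with the given aggregate ('min' or 'max') on the Oracle side."""
    sql = f"""
    select count(*)
    from (select member_id, min(Year_started) y
          from my_table where company = 'IBM' group by member_id) i
    join (select member_id, {oracle_agg}(Year_started) y
          from my_table where company = 'Oracle' group by member_id) o
      on i.member_id = o.member_id and i.y < o.y
    """
    return conn.execute(sql).fetchone()[0]

min_max = count_matches("max")  # earliest IBM year vs latest Oracle year
min_min = count_matches("min")  # earliest IBM year vs earliest Oracle year
print(min_max, min_min)
```

With max on the Oracle side, member 3 is counted because the 2003 IBM stint precedes the 2006 Oracle stint; with min on both sides, member 3 is missed because the first Oracle year (2000) comes before the first IBM year.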

Simply do a self join:
select count(distinct m1.member_id)
from members m1
join members m2
on m1.member_id = m2.member_id and m1.Year_started < m2.Year_started
where m1.company = 'IBM' and m2.company = 'Oracle';

More verbose, but using CTEs:
For each member_id, get the first year they worked at IBM (if any), and get the first year they worked at Oracle (again, if any).
"For each member_id" translates to GROUP BY member_id
"Get the first year..." translates to MIN( CASE WHEN company = 'etc' THEN Year_started END )
Filter those rows to rows where the first-year-at-IBM is less-than their first-year-at-Oracle.
Then simply get the COUNT(*) of those rows.
WITH ibmOracleYears AS (
SELECT
member_id,
MIN( CASE WHEN company = 'IBM' THEN Year_started END ) AS JoinedIbm,
MIN( CASE WHEN company = 'Oracle' THEN Year_started END ) AS JoinedOracle
FROM
yourTable
GROUP BY
member_id
),
workedAtIbmBeforeOracle AS (
SELECT
y.*
FROM
ibmOracleYears AS y
WHERE
y.JoinedIbm IS NOT NULL /* <-- This `IS NOT NULL` check isn't absolutely necessary, but I'm including it for clarity. */
AND
y.JoinedOracle IS NOT NULL
AND
y.JoinedIbm < y.JoinedOracle
)
SELECT
COUNT(*) AS "Number of members that worked at IBM before Oracle"
FROM
workedAtIbmBeforeOracle
But that query can be reduced down to this (if you don't mind anonymous expressions in HAVING clauses):
SELECT
COUNT(*) AS "Number of members that worked at IBM before Oracle"
FROM
(
SELECT
member_id
FROM
yourTable
GROUP BY
member_id
HAVING
MIN( CASE WHEN company = 'IBM' THEN Year_started END ) < MIN( CASE WHEN company = 'Oracle' THEN Year_started END )
) AS q

Related

How to rewrite CONNECT BY PRIOR Oracle style query to RECURSIVE CTE Postgres for query with correlated WHERE clause? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
Now I have following working query for Oracle:
select * from (
select orgId, oNdId, stamp, op,
lgin, qwe, rty,
tusid, tnid, teid,
thid, tehid, trid,
name1, name2,
xtrdta, rownum as rnum from
(
select a.*
from tblADT a
where a.orgId=? and EXISTS(
SELECT oNdId, prmsn FROM (
SELECT oNdId, rp.prmsn FROM tblOND
LEFT JOIN tblRoleprmsn rp ON rp.roleId=? AND rp.prmsn='vors'
START WITH oNdId IN (
SELECT oNdId FROM tblrnpmsn rnp
WHERE rnp.roleId=?
AND rnp.prmsn=?
)
CONNECT BY PRIOR oNdId = parentId
)
WHERE oNdId = a.oNdId OR 1 = (
CASE WHEN prmsn IS NOT NULL THEN
CASE WHEN a.oNdId IS NULL THEN 1 ELSE 0 END
END
)
)
AND op IN (?)
order by stamp desc
) WHERE rownum < (? + ? + 1)
) WHERE rnum >= (? + 1)
I am now trying to implement an equivalent for PostgreSQL. Based on my investigation, I could use a recursive CTE,
but I have not been successful. The examples I found all lack a WHERE clause, so it is not so easy.
Could you please help me with that?
The Oracle query seems to have a few extra quirks and conditions I'm not able to understand. It's probably related to the specific use case.
In the absence of sample data I'll show you the simple case. You say:
"There is a table 'tblOND' which has 2 columns, 'oNdId' and 'parentId'; it is a hierarchy."
Here's a query that gets all the descendants of the nodes selected by an initial filtering predicate:
create table tblond (
  ondid int primary key not null,
  parentid int references tblond (ondid)
);
with recursive
n as (
select ondid, parentid, 1 as lvl
from tblond
where <search_predicate> -- initial nodes
union all
select t.ondid, t.parentid, n.lvl + 1
from n
join tblond t on t.parentid = n.ondid -- #1
)
select * from n
Recursive CTEs are not limited to hierarchies; they can traverse any kind of graph. As long as you can express the relationship used to "walk" to the next nodes (#1), you can keep adding rows.
Also the example shows a "made up" column lvl; you can produce as many columns as you need/want.
The section before the UNION ALL is the "anchor" query that is run only once. After the UNION ALL is the "iterative" query that is run iteratively until it does not return any more rows.
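The anchor/iterative split can be tried out with a small sketch. It runs the same recursive CTE shape against an in-memory SQLite database through Python's sqlite3 module, which is an assumption for illustration (the question targets PostgreSQL). The tree contents and the search predicate ondid = 1 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblond (
        ondid    INTEGER PRIMARY KEY,
        parentid INTEGER REFERENCES tblond (ondid)
    )
""")
# Hypothetical hierarchy: 1 -> (2 -> 4), 1 -> 3, plus a separate root 5 -> 6.
conn.executemany(
    "INSERT INTO tblond VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, None), (6, 5)],
)

rows = conn.execute(
    """
    WITH RECURSIVE n AS (
        -- anchor: runs once and selects the starting nodes
        SELECT ondid, parentid, 1 AS lvl
        FROM tblond
        WHERE ondid = ?
        UNION ALL
        -- iterative part: walks from each found node to its children
        SELECT t.ondid, t.parentid, n.lvl + 1
        FROM n
        JOIN tblond t ON t.parentid = n.ondid
    )
    SELECT ondid, lvl FROM n ORDER BY lvl, ondid
    """,
    (1,),
).fetchall()
print(rows)
```

Starting from node 1, the query returns node 1 at level 1, its children 2 and 3 at level 2, and node 4 at level 3; the unrelated subtree under node 5 is never reached.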

How to get the count/records from the table which is not there in same table? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I want the employee numbers of employees who don't have an out-punch record for the given date.
For example, in the table below, the employee with number 9100001820 doesn't have any record with MODE = 'OUT', so the query should return that employee number as output.
Employee_Number PDate PTime MODE
9100001820 9/8/2019 15:03:29 IN
9100001820 9/8/2019 14:55:34 IN
Use not exists:
select e.employee_number
from employees e -- certainly you have such a table with all employees
where not exists (select 1
from punches p
where p.employee_number = e.employee_number and
p.mode = 'OUT' and p.pdate = '2019-08-09'
);
Actually, though, I suspect you want employees who punched in on the date but didn't punch out. Use aggregation for that:
select p.employee_number
from punches p
where p.pdate = '2019-08-09'
group by p.employee_number
having sum(case when p.mode = 'OUT' then 1 else 0 end) = 0;
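The difference between the two queries shows up once there is an employee with no punches at all. The sketch below uses Python's sqlite3 module, which is an assumption for illustration. The employees table, the two extra employee numbers, and the ISO date strings are hypothetical; only employee 9100001820 and its two IN punches come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_number TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE punches (employee_number TEXT, pdate TEXT, ptime TEXT, mode TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("9100001820",), ("9100001821",), ("9100001822",)],  # last two are hypothetical
)
conn.executemany(
    "INSERT INTO punches VALUES (?, ?, ?, ?)",
    [
        ("9100001820", "2019-08-09", "15:03:29", "IN"),
        ("9100001820", "2019-08-09", "14:55:34", "IN"),
        ("9100001821", "2019-08-09", "08:00:00", "IN"),   # hypothetical
        ("9100001821", "2019-08-09", "17:00:00", "OUT"),  # hypothetical
        # 9100001822 has no punches on this date
    ],
)

not_exists_sql = """
select e.employee_number
from employees e
where not exists (select 1
                  from punches p
                  where p.employee_number = e.employee_number
                    and p.mode = 'OUT' and p.pdate = '2019-08-09')
order by e.employee_number
"""
aggregation_sql = """
select p.employee_number
from punches p
where p.pdate = '2019-08-09'
group by p.employee_number
having sum(case when p.mode = 'OUT' then 1 else 0 end) = 0
order by p.employee_number
"""
no_out = [r[0] for r in conn.execute(not_exists_sql)]
in_without_out = [r[0] for r in conn.execute(aggregation_sql)]
print(no_out, in_without_out)
```

The not exists version also reports the employee who never punched at all, while the aggregation version reports only employees who have a punch on that date and no 'OUT' among them.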
This query returns all the employee numbers for which no 'OUT' record exists:
select distinct t1.Employee_Number
from yourtable t1
left outer join yourtable t2 on t2.Employee_Number = t1.Employee_Number and t2.Mode = 'OUT'
where t2.Employee_Number is null;
The same result using not exists:
select distinct t1.Employee_Number
from yourtable t1
where not exists (select * from yourtable where Employee_Number = t1.Employee_Number and Mode = 'OUT');
This assumes you want employees who have punched in but not punched out; in other words, their latest punch record is a punch-in record.
select *
from (select employee_number
,pdate
,ptime
,mode
,max(pdate) over (partition by employee_number) as max_pdate
,max(ptime) over (partition by employee_number, pdate) as max_ptime
from employee_punch
where pdate = '2019-08-09'
) m
where pdate = max_pdate
and ptime = max_ptime
and mode = 'IN'
Note, if you have shifts that span midnight, you will need to be smarter about your day filter.
Also, Gordon's example uses not exists. That can perform poorly when the optimizer executes it as a nested loop join. Eray gave the outer join example, which is often the better way to go.
My example is a more flexible version of Gordon's aggregation query; either is probably closer to what you want, and they should perform similarly.

DB2 SQL Using a set of clauses from this years sales to find out what the Client purchased last year

DB2 SQL Using a set of clauses I am getting a result of specific deals.
Now I am trying (using a WHERE EXISTS) to look at the same table, matching on Part_num, Customer_Name and Date (Date - 1 year), to get the previous year's sales for the same parts/customers.
My end result is just to get those previous year's deals where the customer number and part match this year's sales. I cannot seem to get it to work, but I suspect there may be an easier way.
Select Part_Num, Customer_Num,Start_Date, End_Date
from TableA
Where SalesNumber = A
)as A
WHERE (EXISTS
(
SELECT Part_Num, Customer_Num,Start_Date, End_Date from Table A )AS B
where B.Part_Num = A.PArt_num and B.Customer_Num = A.Customer_Num and date(A.Start_date) = year(B.Start_Date - 1 year)
I am sure I have it wrong. Any thoughts?
Try this (but please try to make an effort next time...):
SELECT Part_Num
      ,Customer_Num
      ,Start_Date
      ,End_Date
FROM TableA A
WHERE EXISTS (
    SELECT Part_Num
          ,Customer_Num
          ,Start_Date
          ,End_Date
    FROM TableA B
    WHERE B.Part_Num = A.Part_Num
      AND B.Customer_Num = A.Customer_Num
      AND YEAR(A.Start_Date) = YEAR(B.Start_Date) - 1 )
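Here is a rough sketch of how that year-offset match behaves, run against an in-memory SQLite database through Python's sqlite3 module rather than DB2 (an assumption for illustration). SQLite has no YEAR() function, so the sketch swaps in CAST(strftime('%Y', ...) AS INTEGER). The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (Part_Num TEXT, Customer_Num TEXT, Start_Date TEXT, End_Date TEXT)"
)
# Hypothetical deals: only P1/C1 appears in two consecutive years.
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        ("P1", "C1", "2018-03-01", "2018-12-31"),
        ("P1", "C1", "2019-03-01", "2019-12-31"),
        ("P2", "C1", "2018-05-01", "2018-12-31"),
        ("P3", "C2", "2019-01-15", "2019-12-31"),
    ],
)

# Keep a row A only if the same part/customer has a row B starting one year later.
sql = """
SELECT A.Part_Num, A.Customer_Num, A.Start_Date, A.End_Date
FROM TableA A
WHERE EXISTS (
    SELECT 1
    FROM TableA B
    WHERE B.Part_Num = A.Part_Num
      AND B.Customer_Num = A.Customer_Num
      AND CAST(strftime('%Y', A.Start_Date) AS INTEGER)
          = CAST(strftime('%Y', B.Start_Date) AS INTEGER) - 1
)
"""
prior_year_deals = conn.execute(sql).fetchall()
print(prior_year_deals)
```

Only the 2018 deal for P1/C1 is returned, since it is the only row with a matching part and customer in the following year.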

SQL query MAX(SUM(..)) [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
Table Structure:
Article(
model int(key),
year int(key),
author varchar(key),
num int)
num: number of articles written during the year
Find all the authors who, in at least one year, wrote the maximal number of articles (relative to all the other authors).
I tried:
SELECT author FROM Article,
(SELECT year,max(sumnum) s FROM
(SELECT year,author,SUM(num) sumnum FROM Article GROUP BY year,author)
GROUP BY year) AS B WHERE Article.year=B.year and Article.num=B.s;
Is this the right answer?
Thanks.
You might want to try a self-JOIN to get what you are looking for:
SELECT Main.author
FROM Article AS Main
INNER JOIN (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS SumMain
ON SumMain.year = Main.year
AND SumMain.author = Main.author
GROUP BY Main.author
HAVING SUM(Main.num) = MAX(SumMain.sumnum)
;
This would guarantee (as it is ANSI) you are getting the MAX of the SUMmed nums and only bringing back results for what you need. Keep in mind I only JOINed on those two fields because of the information provided ... if you have a unique ID you can JOIN on, or you require more specificity to get a 1-to-1 match, adjust accordingly.
Depending on what DBMS you are using, it can be simplified one of two ways:
SELECT author
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
HAVING SUM(num) = MAX(sumnum)
) AS Main
;
Some DBMSes allow you to do multiple aggregate functions, and this could work there.
If your DBMS supports OLAP (window) functions together with a QUALIFY clause (Teradata, for example), you can do something like this:
SELECT author
FROM (
    SELECT year
          ,author
          ,SUM(num) AS sumnum
    FROM Article
    GROUP BY year
            ,author
) AS Main
QUALIFY (
    RANK() OVER (
        PARTITION BY year
        ORDER BY sumnum DESC
    ) = 1
)
;
This limits the result set to the authors with the highest sumnum in each year. Partitioning by year alone matters: partitioning by both author and year would put each grouped row in its own partition, so every row would rank first. RANK() keeps authors who tie for the maximum, which ROW_NUMBER() would not.
Hope this helps!
You mention this is for homework and made a valid attempt, however incorrect.
This is under a premise (unclear since no sample data) that the model column is like an auto-increment, and there is only going to be one entry per author per year and never multiple records for the same author within the same year. Ex:
model year author num
===== ==== ====== ===
1 2013 A 15
2 2013 C 18
3 2013 X 17
4 2014 A 16
5 2014 B 12
6 2014 C 16
7 2014 X 18
8 2014 Y 18
So the result expected is highest article count in 2013 = 18 and would only return author "C". In 2014, highest article count is 18 and would return authors "X" and "Y"
First, get a query of what was the maximum number of articles written...
select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year
This would give you one record per year, and the maximum number of articles published... so if you had data for years 2010-2014, you would at MOST have 5 records returned. Now, it is as simple as joining this to the original table that had the matching year and articles
select
A2.*
from
( select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year ) PreQuery
JOIN Article A2
on PreQuery.Year = A2.Year
AND PreQuery.ArticlesPerYear = A2.num
I suggest a CTE
WITH maxyear AS
(SELECT year, max(num) AS max_articles
FROM article
GROUP BY year)
SELECT DISTINCT author
FROM article a
JOIN maxyear m
ON a.year=m.year AND a.num=m.max_articles;
and compare that in performance to a partition, which is another way
SELECT DISTINCT author FROM
(SELECT author, rank() OVER (PARTITION BY year ORDER BY num DESC) AS r
FROM article) AS subq
WHERE r = 1;
Some RDBMSs (Teradata, for example) provide a QUALIFY clause that lets you filter on rank() = 1 directly, so you don't need to nest queries.
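Both variants can be checked against the sample rows from the previous answer. The sketch below uses Python's sqlite3 module, which is an assumption for illustration; the window-function query needs SQLite 3.25 or later. The alias is placed after the OVER clause, which is where SQL expects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (model INTEGER, year INTEGER, author TEXT, num INTEGER)"
)
# Sample rows from the example table above.
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?, ?)",
    [
        (1, 2013, "A", 15), (2, 2013, "C", 18), (3, 2013, "X", 17),
        (4, 2014, "A", 16), (5, 2014, "B", 12), (6, 2014, "C", 16),
        (7, 2014, "X", 18), (8, 2014, "Y", 18),
    ],
)

cte_sql = """
WITH maxyear AS
  (SELECT year, max(num) AS max_articles
   FROM article
   GROUP BY year)
SELECT DISTINCT author
FROM article a
JOIN maxyear m
  ON a.year = m.year AND a.num = m.max_articles
ORDER BY author
"""
rank_sql = """
SELECT DISTINCT author
FROM (SELECT author, rank() OVER (PARTITION BY year ORDER BY num DESC) AS r
      FROM article) AS subq
WHERE r = 1
ORDER BY author
"""
cte_authors = [row[0] for row in conn.execute(cte_sql)]
rank_authors = [row[0] for row in conn.execute(rank_sql)]
print(cte_authors, rank_authors)
```

Both queries return C (top in 2013) plus X and Y (tied for top in 2014), so ties are kept either way.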

Oracle - group by of joined tables

I looked for an answer and found several pieces of advice, but none of them was helpful, so I'm asking now.
I have two tables, one with distributors (columns: distributorid, name) and the second one with delivered products (columns: distributorid, productid, corruptcount, date) - the column corruptcount contains the number of corrupted deliveries. I need to select the first five distributors with the most corrupted deliveries in last two months. I need to select distributorid, name and sum of corruptcount, here is my query:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
GROUP BY del.distributorid
But Oracle returns the error message "not a GROUP BY expression". And when I edit my query to this:
SELECT del.distributorid, d.name, del.corruptcount-- , SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
--GROUP BY del.distributorid
It works as expected and returns correct data:
1 IBM 10
2 DELL 0
2 DELL 1
2 DELL 6
3 HP 3
8 ACER 2
9 ASUS 1
I'd like to group this data. Where and why is my query wrong? Can you help please? Thank you very, very much.
I think the problem is just the d.name in the select list; you need to include it in the group by clause as well. Try this:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d join
delivery del
on d.distributorid = del.distributorid
WHERE d.distributorid IN
(SELECT distributorid
FROM delivery
WHERE storeid = 1 AND
"date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE AND
ROWNUM <= 5
GROUP BY distributorid
ORDER BY SUM(corruptcount) DESC
)
GROUP BY del.distributorid, d.name;
I also switched the query to using explicit join syntax with an on clause, instead of the outdated implicit join syntax using a condition in the where.
I also removed the additional layer of subquery. It is not really necessary.
EDIT:
"Why does d.name have to be included in the group by?" The easy answer is that SQL requires it because it does not know which value to include from the group. You could instead use min(d.name) in the select, for instance, and there would be no need to change the group by clause.
The real answer is a wee bit more complicated. The ANSI standard does actually permit the query as you wrote it. This is because distributorid is (presumably) declared as a primary key on the distributor table. When you group by a primary key (or unique key), you can use other columns from the same table just as you did. Although ANSI supports this, most databases do not yet. So, the real reason is that Oracle doesn't support the ANSI standard functionality that would allow your query to work.
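The two ways of satisfying the GROUP BY rule can be compared side by side. The sketch below uses Python's sqlite3 module, which is an assumption for illustration; note that SQLite would not raise Oracle's "not a GROUP BY expression" error, so it only shows that both rewrites give the same totals. The rows are the ungrouped output shown in the question; the date and store filters are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distributor (distributorid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE delivery (distributorid INTEGER, corruptcount INTEGER)")
conn.executemany(
    "INSERT INTO distributor VALUES (?, ?)",
    [(1, "IBM"), (2, "DELL"), (3, "HP"), (8, "ACER"), (9, "ASUS")],
)
conn.executemany(
    "INSERT INTO delivery VALUES (?, ?)",
    [(1, 10), (2, 0), (2, 1), (2, 6), (3, 3), (8, 2), (9, 1)],
)

# Variant 1: list every non-aggregated column in GROUP BY.
group_by_both = conn.execute("""
    SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
    FROM distributor d
    JOIN delivery del ON d.distributorid = del.distributorid
    GROUP BY del.distributorid, d.name
    ORDER BY corrupt DESC
""").fetchall()

# Variant 2: keep GROUP BY as it was and wrap the name in an aggregate.
min_name = conn.execute("""
    SELECT del.distributorid, MIN(d.name) AS name, SUM(del.corruptcount) AS corrupt
    FROM distributor d
    JOIN delivery del ON d.distributorid = del.distributorid
    GROUP BY del.distributorid
    ORDER BY corrupt DESC
""").fetchall()
print(group_by_both)
```

Because each distributorid maps to exactly one name, MIN(d.name) picks that single value and both variants return identical rows.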