Tricky SQL SELECT Statement

I have a performance issue when selecting data in my project.
There is a table with 3 columns: "id","time" and "group"
The ids are just unique ids as usual.
The time is the creation date of the entry.
The group column is used to group certain entries together.
So the table data may look like this:
ID | TIME | GROUP
------------------------
1 | 20090805 | A
2 | 20090804 | A
3 | 20090804 | B
4 | 20090805 | B
5 | 20090803 | A
6 | 20090802 | B
...and so on.
The task is now to select the "current" entries (their ids) in each group for a given date. That is, for each group find the most recent entry for a given date.
The following preconditions apply:
I do not know the different groups in advance; there may be many different ones, changing over time.
The selection date may lie "in between" the dates of the entries in the table. In that case I have to find the closest entry in each group, that is, the entry whose TIME is the latest one that is still before the selection date.
What I currently do is a multi-step process that I would like to turn into a single SELECT statement:
1) SELECT DISTINCT group FROM table to find the available groups.
2) For each group found in 1), SELECT * FROM table WHERE time<selectionDate AND group=loop ORDER BY time DESC.
3) Take the first row of each result found in 2).
Obviously this is not optimal.
So I would be very happy if some more experienced SQL expert could help me to find a solution to put these steps in a single statement.
Thank you!

The following will work on SQL Server 2005+ and Oracle 9i+ (selectionDate stands for the date parameter from the question):
WITH groups AS (
SELECT t.group,
MAX(t.time) AS maxtime
FROM TABLE t
WHERE t.time < selectionDate
GROUP BY t.group)
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN groups g ON g.group = t.group AND g.maxtime = t.time
Any database should support the equivalent derived-table form:
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN (SELECT t2.group,
MAX(t2.time) AS maxtime
FROM TABLE t2
WHERE t2.time < selectionDate
GROUP BY t2.group) g ON g.group = t.group AND g.maxtime = t.time
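To try the derived-table join on the sample rows from the question, here is a minimal sketch using SQLite through Python's sqlite3 module (an assumption; the answer targets SQL Server and Oracle). The table name `entries` is hypothetical, and `"group"`/`"time"` are quoted because they are keywords. The question's rule (TIME strictly before the selection date) is applied inside the derived table.

```python
import sqlite3

# Sample rows from the question; "entries" is a hypothetical table name.
ROWS = [
    (1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
    (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE entries (id INTEGER PRIMARY KEY, "time" INTEGER, "group" TEXT)')
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", ROWS)
    return conn


def current_ids(conn, selection_date):
    # g holds, per group, the latest time strictly before the selection date;
    # joining back on (group, time) recovers the ids of those rows.
    sql = """
        SELECT t.id
        FROM entries t
        JOIN (SELECT "group", MAX("time") AS maxtime
              FROM entries
              WHERE "time" < ?
              GROUP BY "group") g
          ON g."group" = t."group" AND g.maxtime = t."time"
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(sql, (selection_date,))]


print(current_ids(make_db(), 20090805))  # [2, 3]
```

If two rows in a group share the maximum time, the join returns both of them.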

Here's how I would do it in SQL Server, using a correlated TOP 1 subquery per group:
SELECT * FROM [table] t WHERE t.id =
(SELECT TOP 1 t2.id FROM [table] t2 WHERE t2.[group] = t.[group] AND t2.[time] < selectionDate ORDER BY t2.[time] DESC)

The solution will vary by database server, since the syntax for TOP queries varies. What you are looking for is a "top-N-per-group" query, so searching for that term will turn up many variants.
Here is a solution in SQL Server. The following returns the top 10 players by home runs for each year after 1990. The key is to calculate each player's "Home Run Rank" within each year.
select
HRRanks.*
from
(
Select
b.yearID, b.PlayerID, sum(b.Hr) as TotalHR,
rank() over (partition by b.yearID order by sum(b.hr) desc) as HR_Rank
from
Batting b
where
b.yearID > 1990
group by
b.yearID, b.playerID
)
HRRanks
where
HRRanks.HR_Rank <= 10
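The same rank-then-filter idea can be checked on a tiny dataset. This sketch assumes SQLite via Python's sqlite3, uses made-up home-run numbers (not real statistics), and pre-aggregates in an inner subquery before ranking, which is equivalent to ranking over `sum(b.hr)` directly. A cutoff of 2 is used so the tie handling of RANK() is visible.

```python
import sqlite3

# Hypothetical home-run rows (not real statistics). p2 has two rows in 1992
# (e.g. two stints), so the totals must be summed before ranking.
BATTING = [
    (1991, "p1", 30), (1991, "p2", 25), (1991, "p3", 25), (1991, "p4", 10),
    (1992, "p1", 40), (1992, "p2", 12), (1992, "p2", 8), (1992, "p5", 35),
]


def top_n_per_year(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Batting (yearID INTEGER, playerID TEXT, HR INTEGER)")
    conn.executemany("INSERT INTO Batting VALUES (?, ?, ?)", BATTING)
    # Aggregate per (year, player), then rank within each year. RANK() gives
    # tied players the same rank, so more than n rows per year can come back.
    sql = """
        SELECT yearID, playerID, TotalHR, HR_Rank
        FROM (SELECT yearID, playerID, TotalHR,
                     RANK() OVER (PARTITION BY yearID
                                  ORDER BY TotalHR DESC) AS HR_Rank
              FROM (SELECT yearID, playerID, SUM(HR) AS TotalHR
                    FROM Batting
                    WHERE yearID > 1990
                    GROUP BY yearID, playerID) totals) HRRanks
        WHERE HR_Rank <= ?
        ORDER BY yearID, HR_Rank, playerID
    """
    return conn.execute(sql, (n,)).fetchall()


for row in top_n_per_year(2):
    print(row)
```

In 1991 three players come back for a top 2, because p2 and p3 tie at rank 2.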
Here is a solution in Oracle (top 10 departments by average salary):
SELECT deptno, avg_sal
FROM(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
ORDER BY AVG(sal) DESC
)
WHERE ROWNUM <= 10;
Or using analytic functions:
SELECT deptno, avg_sal
FROM (
SELECT deptno, avg_sal, RANK() OVER (ORDER BY avg_sal DESC) rnk
FROM
(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
)
)
WHERE rnk <= 10;
Or same again, but using DENSE_RANK() instead of RANK().
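The difference between RANK() and DENSE_RANK() only shows up with ties. A small sketch, assuming SQLite via Python's sqlite3 and hypothetical `emp` rows in which departments 10 and 20 tie on average salary, with a cutoff of 2:

```python
import sqlite3

# Hypothetical emp rows: departments 10 and 20 both average 5000.
EMP = [(10, 4000), (10, 6000), (20, 5000), (30, 4000), (40, 3000)]


def top_departments(n, rank_fn):
    # rank_fn is interpolated into the SQL text, so only allow known names.
    if rank_fn not in ("RANK", "DENSE_RANK"):
        raise ValueError(rank_fn)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (deptno INTEGER, sal INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)
    sql = f"""
        SELECT deptno, avg_sal
        FROM (SELECT deptno, avg_sal,
                     {rank_fn}() OVER (ORDER BY avg_sal DESC) AS rnk
              FROM (SELECT deptno, AVG(sal) AS avg_sal
                    FROM emp
                    GROUP BY deptno) a) r
        WHERE rnk <= ?
        ORDER BY deptno
    """
    return conn.execute(sql, (n,)).fetchall()


print(top_departments(2, "RANK"))        # [(10, 5000.0), (20, 5000.0)]
print(top_departments(2, "DENSE_RANK"))  # [(10, 5000.0), (20, 5000.0), (30, 4000.0)]
```

RANK() skips rank 2 after the tie (department 30 gets rank 3), while DENSE_RANK() does not skip, so department 30 gets rank 2 and passes the filter.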

select * from things where (GROUP, TIME) in (
select GROUP, max(TIME) from things
where TIME < 20090804
group by GROUP
)
Tested with MySQL (but I had to change the table and column names because they are keywords).
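The row-value `IN` form also runs in SQLite (row values are supported from SQLite 3.15). Here is a minimal sketch via Python's sqlite3, assuming the question's sample rows, a hypothetical table `things`, and the question's rule that TIME must be strictly before the selection date.

```python
import sqlite3

ROWS = [
    (1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
    (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B"),
]


def current_rows(selection_date):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE things (id INTEGER PRIMARY KEY, "time" INTEGER, "group" TEXT)')
    conn.executemany("INSERT INTO things VALUES (?, ?, ?)", ROWS)
    # Keep rows whose (group, time) pair equals a per-group maximum time
    # among the rows before the selection date.
    sql = """
        SELECT id, "time", "group"
        FROM things
        WHERE ("group", "time") IN (SELECT "group", MAX("time")
                                    FROM things
                                    WHERE "time" < ?
                                    GROUP BY "group")
        ORDER BY id
    """
    return conn.execute(sql, (selection_date,)).fetchall()


print(current_rows(20090805))  # [(2, 20090804, 'A'), (3, 20090804, 'B')]
```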

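With QUALIFY (available in Teradata and some other databases such as Snowflake), the window function can be filtered directly; selectionDate is the date parameter from the question:
SELECT *
FROM TABB T1
WHERE TIMEE < selectionDate
QUALIFY ROW_NUMBER() OVER (PARTITION BY GROUPP ORDER BY TIMEE DESC, id DESC) = 1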
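On databases without QUALIFY, the same filter moves into an outer query over a ROW_NUMBER() subquery. A sketch assuming SQLite via Python's sqlite3, reusing the renamed columns TABB/TIMEE/GROUPP with the question's sample rows:

```python
import sqlite3

ROWS = [
    (1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
    (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B"),
]


def latest_per_group(selection_date):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABB (id INTEGER PRIMARY KEY, TIMEE INTEGER, GROUPP TEXT)")
    conn.executemany("INSERT INTO TABB VALUES (?, ?, ?)", ROWS)
    # Number rows within each group, newest first (id DESC breaks ties),
    # then keep only the first row of each group: the QUALIFY filter.
    sql = """
        SELECT id, TIMEE, GROUPP
        FROM (SELECT T1.*,
                     ROW_NUMBER() OVER (PARTITION BY GROUPP
                                        ORDER BY TIMEE DESC, id DESC) AS rn
              FROM TABB T1
              WHERE TIMEE < ?) ranked
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql, (selection_date,)).fetchall()


print(latest_per_group(20090805))  # [(2, 20090804, 'A'), (3, 20090804, 'B')]
```

Unlike the join and IN versions, this returns exactly one row per group even when times tie.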

Related

query to find the second most common word in a table (oracle sql)

So I have a table employees as shown below:
ID | name | department
---|------|-----------
1 | john | home
2 | alex | home
3 | ryan | tech
I'm trying to group these by department and display the count, but I want to select the second most common department, which in this case should return (tech 1). Any help on how to approach this is appreciated.
Edit:
I'd like to do this using only MINUS; I'm still not familiar with LIMIT from searching around online.
We can use COUNT along with DENSE_RANK:
WITH cte AS (
SELECT department, COUNT(*) AS cnt,
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM yourTable
GROUP BY department
)
SELECT department, cnt
FROM cte
WHERE rnk = 2;
As of Oracle 12c, you might find the following limit query satisfactory:
SELECT department, COUNT(*) AS cnt
FROM yourTable
GROUP BY department
ORDER BY COUNT(*) DESC
OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY;
But this limit approach does not handle ties well, for example when two or more departments tie for first place. DENSE_RANK handles such edge cases better.
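Here is a small sketch of the DENSE_RANK approach, assuming SQLite via Python's sqlite3 (not Oracle). Counts are computed in one CTE and ranked in a second, and a hypothetical extra employee shows how a tie at second place is handled.

```python
import sqlite3


def second_most_common(employees):
    # employees: list of (name, department) tuples
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
    conn.executemany("INSERT INTO employees (name, department) VALUES (?, ?)", employees)
    # Count per department, then dense-rank the counts; rank 2 is the
    # second most common count, and every department with it is returned.
    sql = """
        WITH counts AS (
            SELECT department, COUNT(*) AS cnt
            FROM employees
            GROUP BY department
        ),
        ranked AS (
            SELECT department, cnt,
                   DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
            FROM counts
        )
        SELECT department, cnt FROM ranked WHERE rnk = 2 ORDER BY department
    """
    return conn.execute(sql).fetchall()


print(second_most_common([("john", "home"), ("alex", "home"), ("ryan", "tech")]))
# [('tech', 1)]
```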

How to 'detect' a change of value in a column in SQL?

I'm new to SQL and wanted to ask:
I combined multiple tables using CTEs and joins, which resulted in the table in this image.
From this table, I want to detect and count how many workers changed their job category between their 1st and 2nd job.
For example, Jonathan Carey has 'Sales Lapangan' as his first job_category and changed to 'other' on his 2nd job; I want to count this job_category change as one.
I tried CASE WHEN and WHILE, but I'm getting more confused.
This is the query I used to create the table:
with job_id as (
select job_category,
row_number() over(order by job_category) as job_id
from job_post
group by job_category
),
all_apply as (
select jp.*, job_id.job_id
from job_post jp
join job_id
on job_id.job_category = jp.job_category
),
data_apply as (
select ja.worker_id, wk.name, ja.id as id_application, aa.job_category, aa.job_id
from job_post_application ja
join all_apply aa
on aa.id = ja.job_post_id
join workers wk
on wk.id = ja.worker_id
),
data_apply2 as (
select *,
row_number() over(partition by worker_id order by id_application) as worker_num
from data_apply
)
select *
from data_apply2
order by worker_id, id_application
Thank You
You can group by worker and check the number of distinct job categories (returning the comparison as a column works in PostgreSQL and MySQL, but not in SQL Server):
SELECT worker_id,
COUNT(DISTINCT job_category) > 1 AS category_change
FROM data_apply
GROUP BY worker_id;
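To turn the per-worker flag into the single count the question asks for, the grouped query can be wrapped in a SUM. A sketch assuming SQLite via Python's sqlite3 (where, as in MySQL, the comparison yields 1 or 0) and hypothetical rows shaped like the data_apply CTE; "Worker B" and "Worker C" are made-up names.

```python
import sqlite3

# Hypothetical rows: (worker_id, name, id_application, job_category)
ROWS = [
    (1, "Jonathan Carey", 10, "Sales Lapangan"),
    (1, "Jonathan Carey", 11, "other"),
    (2, "Worker B", 12, "other"),
    (2, "Worker B", 13, "other"),
    (3, "Worker C", 14, "IT"),
]


def workers_with_change(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_apply (worker_id INTEGER, name TEXT, "
        "id_application INTEGER, job_category TEXT)"
    )
    conn.executemany("INSERT INTO data_apply VALUES (?, ?, ?, ?)", rows)
    # The inner flag is 1 when a worker applied in more than one category;
    # summing the flags counts such workers (COALESCE covers an empty table).
    sql = """
        SELECT COALESCE(SUM(category_change), 0)
        FROM (SELECT worker_id,
                     COUNT(DISTINCT job_category) > 1 AS category_change
              FROM data_apply
              GROUP BY worker_id) per_worker
    """
    return conn.execute(sql).fetchone()[0]


print(workers_with_change(ROWS))  # 1
```

This counts any worker with more than one category; it does not check the order of the jobs.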
select sum(case when job_category <> prev_job_category then 1 else 0 end) as cnt
from
(
select
worker_id,
name,
id_application,
job_category,
job_id,
worker_num,
coalesce(lag(job_category) over(partition by worker_id order by id_application), job_category) as prev_job_category
from
sales_table
) x
This should help. Using the LAG function, I access the previous row's job_category (per worker, ordered by id_application) under the alias prev_job_category and compare it with the current job_category; every row where they differ is counted as one change.
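A runnable sketch of the LAG comparison, assuming SQLite via Python's sqlite3 (3.25+ for window functions) and hypothetical rows shaped like the question's data. Instead of COALESCE, it relies on `job_category <> NULL` evaluating to NULL for each worker's first row, which sends the CASE to its ELSE branch.

```python
import sqlite3

# Hypothetical rows: (worker_id, name, id_application, job_category)
ROWS = [
    (1, "Jonathan Carey", 10, "Sales Lapangan"),
    (1, "Jonathan Carey", 11, "other"),
    (2, "Worker B", 12, "other"),
    (2, "Worker B", 13, "other"),
    (3, "Worker C", 14, "IT"),
]


def count_category_changes(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_apply (worker_id INTEGER, name TEXT, "
        "id_application INTEGER, job_category TEXT)"
    )
    conn.executemany("INSERT INTO data_apply VALUES (?, ?, ?, ?)", rows)
    # LAG() fetches the previous application's category for the same worker;
    # each row whose category differs from it counts as one change.
    sql = """
        SELECT COALESCE(SUM(CASE WHEN job_category <> prev_category
                                 THEN 1 ELSE 0 END), 0)
        FROM (SELECT job_category,
                     LAG(job_category) OVER (PARTITION BY worker_id
                                             ORDER BY id_application) AS prev_category
              FROM data_apply) x
    """
    return conn.execute(sql).fetchone()[0]


print(count_category_changes(ROWS))  # 1
```

A worker who goes A, B, A counts as two changes here, whereas the COUNT(DISTINCT) approach would flag that worker once.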

How to choose max of one column per other column

I am using SQL Server and I have a table "a"
month segment_id price
-----------------------------
1 1 100
1 2 200
2 3 50
2 4 80
3 5 10
I want a query that returns the original columns for the row with the maximum price in each month.
The result should be:
month segment_id price
----------------------------
1 2 200
2 4 80
3 5 10
I tried this SQL:
Select
month, segment_id, max(price) as MaxPrice
from
a
but I got an error:
Column segment_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
I tried to fix it in many ways but couldn't find a solution.
You need a GROUP BY clause, without segment_id:
Select month, max(price) as MaxPrice
from a
Group By month
since you want results per month, and segment_id is not aggregated in your original SELECT statement.
If you want segment_id in every row along with the maximum price of its month, use max() as a window (analytic) function instead, without a GROUP BY clause:
Select month, segment_id,
max(price) over ( partition by month ) as MaxPrice
from a
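A quick way to see what the windowed max() returns, and why an ORDER BY inside OVER changes it: with ORDER BY, the default frame ends at the current row, so the result becomes a running maximum. This sketch assumes SQLite via Python's sqlite3 and the sample table `a` from the question.

```python
import sqlite3

A_ROWS = [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)]


def window_max(with_order_by):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", A_ROWS)
    # Without ORDER BY the frame is the whole month (the monthly maximum);
    # with ORDER BY the default frame stops at the current row (running max).
    over = "PARTITION BY month ORDER BY segment_id" if with_order_by else "PARTITION BY month"
    sql = (
        f"SELECT month, segment_id, MAX(price) OVER ({over}) AS MaxPrice "
        "FROM a ORDER BY month, segment_id"
    )
    return conn.execute(sql).fetchall()


print(window_max(False))  # [(1, 1, 200), (1, 2, 200), (2, 3, 80), (2, 4, 80), (3, 5, 10)]
print(window_max(True))   # [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)]
```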
Edit (based on your latest desired results): you need one more window function, row_number(), as Gordon already mentioned:
Select month, segment_id, price From
(
Select a.*,
row_number() over ( partition by month order by price desc ) as Rn
from a
) q
Where rn = 1
I would recommend a correlated subquery:
select t.*
from t
where t.price = (select max(t2.price) from t t2 where t2.month = t.month);
The "canonical" solution is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by month order by price desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery often performs better.
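Both queries can be checked against the expected result from the question. This sketch assumes SQLite via Python's sqlite3; the composite index on (month, price) is an assumption about what "the right indexes" means here, and its name is hypothetical.

```python
import sqlite3

A_ROWS = [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)]

QUERIES = {
    # Correlated subquery: keep rows whose price equals their month's maximum.
    "correlated": """
        SELECT t.month, t.segment_id, t.price
        FROM a t
        WHERE t.price = (SELECT MAX(t2.price) FROM a t2 WHERE t2.month = t.month)
        ORDER BY t.month
    """,
    # row_number(): number rows per month by descending price, keep the first.
    "row_number": """
        SELECT month, segment_id, price
        FROM (SELECT a.*,
                     ROW_NUMBER() OVER (PARTITION BY month ORDER BY price DESC) AS seqnum
              FROM a) t
        WHERE seqnum = 1
        ORDER BY month
    """,
}


def max_per_month(method):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", A_ROWS)
    # Hypothetical index that lets the correlated MAX() be answered per month.
    conn.execute("CREATE INDEX idx_a_month_price ON a (month, price)")
    return conn.execute(QUERIES[method]).fetchall()


print(max_per_month("correlated"))  # [(1, 2, 200), (2, 4, 80), (3, 5, 10)]
```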
Only because it was not mentioned: yet another option is the WITH TIES clause.
To be clear, the approaches by Gordon and Barbaros would be slightly more performant, but this technique does not require or generate an extra column.
Select Top 1 with ties *
From YourTable
Order By row_number() over (partition by month order by price desc)
With not exists:
select t.*
from tablename t
where not exists (
select 1 from tablename
where month = t.month and price > t.price
)
or:
select t.*
from tablename t inner join (
select month, max(price) as price
from tablename
group By month
) g on g.month = t.month and g.price = t.price
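One behavioural difference worth knowing: NOT EXISTS and the join against the per-month maximum both return every tied row, whereas the row_number() version picks one. A sketch assuming SQLite via Python's sqlite3, with a hypothetical extra row (month 3, segment 6, price 10) that ties for month 3.

```python
import sqlite3

# Question's rows plus a hypothetical tie in month 3.
ROWS = [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10), (3, 6, 10)]

NOT_EXISTS = """
    SELECT t.month, t.segment_id, t.price
    FROM tablename t
    WHERE NOT EXISTS (SELECT 1 FROM tablename
                      WHERE month = t.month AND price > t.price)
    ORDER BY t.month, t.segment_id
"""

JOIN_MAX = """
    SELECT t.month, t.segment_id, t.price
    FROM tablename t
    INNER JOIN (SELECT month, MAX(price) AS price
                FROM tablename
                GROUP BY month) g ON g.month = t.month AND g.price = t.price
    ORDER BY t.month, t.segment_id
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (month INTEGER, segment_id INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", ROWS)
    return conn.execute(sql).fetchall()


print(run(NOT_EXISTS))  # [(1, 2, 200), (2, 4, 80), (3, 5, 10), (3, 6, 10)]
```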

Find the months in 2019 that had the highest and least number of new joinees respectively

Suppose I have an employee table with a DateOfJoining column, containing the following data:
2019-12-4
2019-12-6
2019-12-5
2019-10-5
2010-08-17
Now I want to write an SQL query to find the month with the maximum number of joinees.
You can try the following, using aggregation and TOP:
select top 1 month(dateofjoining),count(*) as totaljoining
from tablename
where year(dateofjoining)=2019
group by month(dateofjoining)
order by 2 desc
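For databases without TOP and MONTH(), here is a sketch of the same aggregation assuming SQLite via Python's sqlite3: strftime('%m', ...) stands in for MONTH() and LIMIT 1 for TOP 1. The question's dates are zero-padded here (e.g. '2019-12-04'), because SQLite's date functions and the string range filter both expect the YYYY-MM-DD form.

```python
import sqlite3

# Question's dates, zero-padded to YYYY-MM-DD.
JOINED = ["2019-12-04", "2019-12-06", "2019-12-05", "2019-10-05", "2010-08-17"]


def busiest_month_2019():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (DateOfJoining TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?)", [(d,) for d in JOINED])
    # Range filter on the raw column for 2019, group by month number,
    # then keep the month with the highest count.
    sql = """
        SELECT CAST(strftime('%m', DateOfJoining) AS INTEGER) AS mnth,
               COUNT(*) AS totaljoining
        FROM employee
        WHERE DateOfJoining >= '2019-01-01' AND DateOfJoining < '2020-01-01'
        GROUP BY mnth
        ORDER BY totaljoining DESC
        LIMIT 1
    """
    return conn.execute(sql).fetchone()


print(busiest_month_2019())  # (12, 3)
```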
Using a derived table and row_number():
The query below gives you the month with the maximum number of joinees for each year.
select cnt,mnth,yr
from
(select count(DateOfJoining)cnt,
month(DateOfJoining)mnth,
year(DateOfJoining)yr,
row_number()over(partition by year(DateOfJoining) order by count(DateOfJoining)desc)srno
from #employee
group by month(DateOfJoining),year(DateOfJoining)
)tbl
where srno = 1
output
cnt mnth yr
----------- ----------- -----------
1 8 2010
3 12 2019
If you want results for 2019 only, add the condition yr = 2019 to the WHERE clause:
where srno = 1
and yr =2019
output
cnt mnth yr
----------- ----------- -----------
3 12 2019
You want both the highest and the lowest count, although I am guessing you only want months with at least one new employee.
with e as (
select year(dateofjoining) as yyyy,
month(dateofjoining) as mm,
count(*) as totaljoining
from employee
where dateofjoining >= '2019-01-01' and
dateofjoining < '2020-01-01'
group by year(dateofjoining), month(dateofjoining)
)
select e.*
from ((select top (1) e.*
from e
order by totaljoining asc
) union all
(select top (1) e.*
from e
order by totaljoining desc
)
) e;
Notes:
The date comparison uses direct comparisons to dates rather than using functions. This is a best practice so the optimizer can use indexes.
You want both the smallest and biggest value, so this uses a CTE so the group by is represented only once.
This will not return months with no employees.
If you want ties, then use top (1) with ties.
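The CTE-plus-UNION ALL pattern can be sketched in SQLite via Python's sqlite3 (an assumption), with LIMIT 1 in each branch playing the role of TOP (1) and the same zero-padded sample dates. The result is sorted in Python because UNION ALL does not guarantee an output order.

```python
import sqlite3

JOINED = ["2019-12-04", "2019-12-06", "2019-12-05", "2019-10-05", "2010-08-17"]


def lowest_and_highest_2019():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (DateOfJoining TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?)", [(d,) for d in JOINED])
    # The CTE computes the monthly counts once; each branch then picks one
    # extreme (months with no joinees do not appear at all).
    sql = """
        WITH e AS (
            SELECT CAST(strftime('%m', DateOfJoining) AS INTEGER) AS mm,
                   COUNT(*) AS totaljoining
            FROM employee
            WHERE DateOfJoining >= '2019-01-01' AND DateOfJoining < '2020-01-01'
            GROUP BY mm
        )
        SELECT * FROM (SELECT * FROM e ORDER BY totaljoining ASC LIMIT 1)
        UNION ALL
        SELECT * FROM (SELECT * FROM e ORDER BY totaljoining DESC LIMIT 1)
    """
    return sorted(conn.execute(sql).fetchall())


print(lowest_and_highest_2019())  # [(10, 1), (12, 3)]
```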

How to select a row based on its row number?

I'm working on a small project in which I'll need to select a record from a temporary table based on the actual row number of the record.
How can I select a record based on its row number?
A couple of the other answers touched on the problem, but this might explain it. There is no implied row order in SQL (tables are sets), so referring to the "fifth row" requires you to introduce an ordering yourself:
Select *
From
(
Select
Row_Number() Over (Order By SomeField) As RowNum
, *
From TheTable
) t2
Where RowNum = 5
In the subquery, a row number is "created" by defining the order you expect. Now the outer query is able to pull the fifth entry out of that ordered set.
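The same pattern in SQLite via Python's sqlite3 (an assumption), with a hypothetical TheTable whose rows are inserted out of order, so the row number clearly comes from the ORDER BY and not from insertion order:

```python
import sqlite3


def nth_row(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TheTable (SomeField TEXT)")
    conn.executemany("INSERT INTO TheTable VALUES (?)", [(v,) for v in "eadbcf"])
    # The subquery assigns row numbers in SomeField order; the outer query
    # then selects the requested position.
    sql = """
        SELECT SomeField
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY SomeField) AS RowNum, t.*
              FROM TheTable t) t2
        WHERE RowNum = ?
    """
    row = conn.execute(sql, (n,)).fetchone()
    return None if row is None else row[0]


print(nth_row(5))  # e
```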
Technically, SQL rows do not have row numbers in their tables. Some implementations provide one of their own (Oracle has the ROWNUM pseudocolumn, for example), but that's not standard, and SQL Server/T-SQL does not. You can add one to the table (sort of) with an IDENTITY column.
Or you can add one (for real) in a query with the ROW_NUMBER() function, but unless you specify your own unique ORDER BY for the rows, the row numbers will be assigned non-deterministically.
What you're looking for is the row_number() function, as Kaf mentioned in the comments.
Here is an example:
WITH MyCte AS
(
SELECT employee_id,
RowNum = row_number() OVER ( order by employee_id )
FROM V_EMPLOYEE
)
SELECT employee_id
FROM MyCte
WHERE RowNum = 5
There are 3 ways of doing this.
Suppose you have an employee table with the columns emp_id, emp_name and salary, and you need the 10 employees with the highest salaries.
Using row_number() analytic function
Select * from
( select emp_id,emp_name,row_number() over (order by salary desc) rank
from employee)
where rank<=10
Using rank() analytic function
Select * from
( select emp_id,emp_name,rank() over (order by salary desc) rank
from employee)
where rank<=10
Using rownum
select * from
(select * from employee order by salary desc)
where rownum<=10;
This numbers the rows without sorting them by any column (the order, and therefore the numbering, is not guaranteed to be the same between executions):
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT '1')) AS RowID, * FROM #table
If you are using SQL Server 2012 or later, you can use OFFSET/FETCH:
declare @rowIndexToFetch int
set @rowIndexToFetch = 0
select
*
from
dbo.EntityA ea
order by
ea.Id
offset @rowIndexToFetch rows
fetch next 1 rows only
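SQLite has no OFFSET ... FETCH syntax, but `LIMIT 1 OFFSET ?` expresses the same zero-based "row at index" lookup. A sketch via Python's sqlite3, assuming a hypothetical EntityA table with an extra Name column:

```python
import sqlite3


def fetch_row_at(index):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EntityA (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany("INSERT INTO EntityA VALUES (?, ?)", [(30, "c"), (10, "a"), (20, "b")])
    # ORDER BY fixes the order; OFFSET skips `index` rows and LIMIT keeps one.
    sql = "SELECT Id, Name FROM EntityA ORDER BY Id LIMIT 1 OFFSET ?"
    return conn.execute(sql, (index,)).fetchone()


print(fetch_row_at(0))  # (10, 'a')
```

Past the end of the table, the query returns no row, so `fetchone()` gives `None`.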