How to 'detect' a change of value in a column in SQL?

I'm new to SQL and wanted to ask:
I have combined multiple tables with CTEs and joins, resulting in the table shown in this image.
From this table, I want to detect and count how many workers changed job category between their 1st and 2nd job.
For example, Jonathan Carey has 'Sales Lapangan' as his first job_category and changed to 'other' on his 2nd job; I want to count this job_category change as one.
I tried CASE WHEN and WHILE, but I'm getting more confused.
This is the query I used to create the table:
with job_id as (
  -- give each distinct job category a numeric id
  select job_category,
         row_number() over (order by job_category) as job_id
  from job_post
  group by job_category
),
all_apply as (
  select jp.*, job_id.job_id
  from job_post jp
  join job_id on job_id.job_category = jp.job_category
),
data_apply as (
  select ja.worker_id, wk.name, ja.id as id_application, aa.job_category, aa.job_id
  from job_post_application ja
  join all_apply aa on aa.id = ja.job_post_id
  join workers wk on wk.id = ja.worker_id
),
data_apply2 as (
  -- number each worker's applications in application order
  select *,
         row_number() over (partition by worker_id order by id_application) as worker_num
  from data_apply
)
select *
from data_apply2
order by worker_id, id_application
Thank You

You can group by worker and check the number of distinct job categories:
SELECT worker_id,
COUNT(DISTINCT job_category) > 1 category_change
FROM data_apply
GROUP BY worker_id;
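A minimal runnable sketch of this answer, using Python's built-in sqlite3 module with SQLite standing in for the asker's database (it assumes the bundled SQLite is recent enough to run the query as written). The rows are hypothetical sample data; only Jonathan Carey's two categories come from the question. Summing the per-worker flags gives the number of workers who changed category.

```python
import sqlite3

# Hypothetical sample rows: (worker_id, name, id_application, job_category)
ROWS = [
    (1, "Jonathan Carey", 10, "Sales Lapangan"),
    (1, "Jonathan Carey", 11, "other"),
    (2, "Worker Two", 20, "other"),
    (2, "Worker Two", 21, "other"),
    (3, "Worker Three", 30, "IT"),
]


def category_change_flags(rows):
    """Return {worker_id: 1 if the worker has more than one distinct category, else 0}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_apply (worker_id INTEGER, name TEXT, "
        "id_application INTEGER, job_category TEXT)"
    )
    conn.executemany("INSERT INTO data_apply VALUES (?, ?, ?, ?)", rows)
    # SQLite evaluates the comparison to 1 or 0, like the boolean column in the answer
    flags = conn.execute(
        "SELECT worker_id, COUNT(DISTINCT job_category) > 1 AS category_change "
        "FROM data_apply GROUP BY worker_id ORDER BY worker_id"
    ).fetchall()
    conn.close()
    return dict(flags)


def workers_who_changed(rows):
    """Count workers whose applications span more than one job category."""
    return sum(category_change_flags(rows).values())


print(category_change_flags(ROWS))  # {1: 1, 2: 0, 3: 0}
print(workers_who_changed(ROWS))    # 1
```

Note that a distinct count ignores order: it flags any worker with more than one category, not specifically a change from the 1st to the 2nd job.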

select case when job_category <> prev_category then 1 else 0 end as cnt
from
(
select
worker_id,
name,
id_application,
job_category,
job_id,
worker_num,
coalesce(lag(job_category) over(partition by worker_id order by id_application), job_category) as prev_category
from
sales_table
) x
This should help: using the LAG function, I access the previous row's job_category for the same worker and compare it with the current job_category; if they are not equal, the row is counted as 1. Summing cnt gives the total number of changes.
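To get the single number the question asks for, the per-row flags still have to be aggregated. The sketch below runs in SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later) on hypothetical rows. It counts every category change, and separately counts only changes between a worker's 1st and 2nd application, which is at most one per worker.

```python
import sqlite3

# Hypothetical sample rows: (worker_id, id_application, job_category)
ROWS = [
    (1, 10, "Sales Lapangan"),
    (1, 11, "other"),   # change between the 1st and 2nd application
    (2, 20, "other"),
    (2, 21, "other"),
    (2, 22, "IT"),      # change, but only on the 3rd application
    (3, 30, "IT"),      # single application, nothing to compare
]

QUERY = """
WITH ordered AS (
    SELECT worker_id, job_category,
           -- previous category for the same worker; NULL on the worker's first row
           LAG(job_category) OVER (PARTITION BY worker_id ORDER BY id_application) AS prev_category,
           ROW_NUMBER() OVER (PARTITION BY worker_id ORDER BY id_application) AS worker_num
    FROM data_apply
)
SELECT
    -- NULL <> x is not true, so first rows fall into the ELSE branch
    SUM(CASE WHEN job_category <> prev_category THEN 1 ELSE 0 END) AS total_changes,
    SUM(CASE WHEN worker_num = 2 AND job_category <> prev_category THEN 1 ELSE 0 END) AS first_to_second
FROM ordered
"""


def count_changes(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_apply (worker_id INTEGER, id_application INTEGER, job_category TEXT)"
    )
    conn.executemany("INSERT INTO data_apply VALUES (?, ?, ?)", rows)
    total, first_to_second = conn.execute(QUERY).fetchone()
    conn.close()
    return total, first_to_second


print(count_changes(ROWS))  # (2, 1)
```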

Related

SQL ZOO Window LAG #8

Question: For each country that has had at least 1000 new cases in a single day, show the date of the peak number of new cases.
Here is some sample data from the covid table.
What I wrote:
SELECT name,date,MAX(confirmed-lag) AS PeakNew
FROM(
SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') date, confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) lag
FROM covid
ORDER BY confirmed
) temp
GROUP BY name
HAVING PeakNew>=1000
ORDER BY PeakNew DESC;
The result I got is weird: PeakNew seems correct, but the associated date is not.
[Screenshots: "My answer" and "The right answer"]
Can anyone help me get the right answer? Thank you!
The query below works for me. Although the dates and values are correct, the checker may still report the output as wrong because the row order differs. Here the rows are ordered by date, then by name.
SELECT z1.name, DATE_FORMAT(c.dt,'%Y-%m-%d'), z1.nc
FROM
(
SELECT z.name, MAX(z.nc) AS 'mx'
FROM (
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid ) z
WHERE z.nc >= 1000
GROUP BY z.name
) z1
INNER JOIN
(
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid
) c
ON c.nc = z1.mx
AND c.name = z1.name
ORDER BY 2 ASC, 1 ASC
The date value in the outer query doesn't correspond to the row where MAX(confirmed-lag) is found; it is just an arbitrary date value from within that group. See the section titled "The ONLY_FULL_GROUP_BY Issue" in this blog post for more information: https://www.percona.com/blog/2019/05/13/solve-query-failures-regarding-only_full_group_by-sql-mode/
I used the ROW_NUMBER() function to get the entire row corresponding to the maximum number of new cases. However, my final result wasn't ordered the way the expected answer was, and there's no specification of how it should be ordered, so I still didn't get that satisfying happy emoji.
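A sketch of the ROW_NUMBER() approach, run in SQLite through Python's sqlite3. SQLite has no DATE_FORMAT, so date() stands in for it, and the covid rows are invented. An explicit final ORDER BY name makes the row order deterministic, since the exercise does not specify one; if two days tie for the peak, ROW_NUMBER() keeps only one of them.

```python
import sqlite3

# Invented cumulative counts: (name, whn, confirmed)
ROWS = [
    ("Xland", "2020-03-01", 0), ("Xland", "2020-03-02", 1500),
    ("Xland", "2020-03-03", 4000), ("Xland", "2020-03-04", 4500),
    ("Yland", "2020-03-01", 100), ("Yland", "2020-03-02", 600),
    ("Zland", "2020-03-01", 0), ("Zland", "2020-03-02", 2000),
    ("Zland", "2020-03-03", 3000),
]

QUERY = """
WITH daily AS (
    SELECT name, date(whn) AS dt,
           -- new cases = today's cumulative total minus the previous day's
           confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS new_cases
    FROM covid
),
ranked AS (
    SELECT name, dt, new_cases,
           -- rn = 1 marks the day with the most new cases for that country
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY new_cases DESC) AS rn
    FROM daily
    WHERE new_cases >= 1000
)
SELECT name, dt, new_cases FROM ranked WHERE rn = 1 ORDER BY name
"""


def peak_days(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
    conn.executemany("INSERT INTO covid VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(peak_days(ROWS))
# [('Xland', '2020-03-03', 2500), ('Zland', '2020-03-02', 2000)]
```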
You need to self join to obtain the date on which the max count occurred:
WITH CTE1 as
(SELECT name,DATE_FORMAT(whn, "%Y-%m-%d") as date,
confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY DATE(whn)) as increase
FROM covid
ORDER BY whn),
CTE2 AS
(SELECT name, MAX(increase) as max_increase
FROM CTE1
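WHERE increase > 999
GROUP BY name)
SELECT c1.name, c1.date, c2.max_increase as peakNewCases
FROM CTE1 as c1
JOIN CTE2 as c2
ON c1.name = c2.name AND c1.increase = c2.max_increase
-- sort once, in the outer query (date is not grouped in CTE2, so it cannot be sorted there)
ORDER BY c1.name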
Alternatively, number each country's days by new cases with ROW_NUMBER() and keep the top row (the alias is rn because RANK is a reserved word in MySQL 8.0):
WITH CTE1 as
(SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') as date_form, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY whn) AS newcases
FROM covid
ORDER BY name,whn)
SELECT name, date_form, newcases FROM
(
SELECT name, date_form, newcases, ROW_NUMBER() OVER (PARTITION BY name ORDER BY newcases DESC) as rn
FROM CTE1
WHERE newcases > 999
) cte2
WHERE rn = 1

Insert into another table after fetching the latest date and performing an inner join

I have a table called "Member_Details" which has multiple records for each member_ID. For example:
I have another table called "BMI_Data" that looks like the following.
The goal is to fetch the names of those members whose "BMI" in "Member_Details" is less than the "target_BMI" in the "BMI_Data" table, and insert them into a new table called "results" with "Member_ID, First_Name and BMI" as its schema.
Also, one consideration is to fetch the latest data available in "Member_Details" for each member (based on date) and then do the comparison.
The result for the above scenario would be something like this.
I tried using the following query:
INSERT INTO results_table (Member_ID, First_Name, BMI)
select c.Member_ID, First_Name, BMI
from
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Member_ID ORDER BY Date desc)
AS ROWNUM FROM Member_Details) x
JOIN
BMI_Data c ON x.Member_ID = c.Member_ID
where
x.BMI < c.Target_BMI
The above query doesn't fetch the latest date and simply loads all records in which member BMI is less than target_BMI.
Please help!
An alternative query might be:
INSERT INTO results_table (Member_ID, First_Name, BMI)
select md2.member_ID, md2.First_Name, md2.BMI
from BMI_Data bd
inner join (select distinct md.member_ID, md.First_Name,
                   -- latest BMI for this member
                   (select top 1 BMI from Member_Details
                    where member_ID = md.member_ID
                    order by Date desc) BMI
            from Member_Details md) md2
  on md2.member_ID = bd.member_ID
where md2.BMI < bd.Target_BMI
First, you haven't specified any condition on the row number after defining it; keep only ROWNUM = 1, i.e. each member's latest row:
INSERT INTO results_table (Member_ID, First_Name, BMI)
select c.Member_ID, First_Name, BMI
from (SELECT *,
ROW_NUMBER() OVER (PARTITION BY Member_ID ORDER BY Date desc) AS ROWNUM
FROM Member_Details
) x JOIN BMI_Data c
ON x.Member_ID = c.Member_ID
where x.ROWNUM = 1 and x.BMI < c.Target_BMI;
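A runnable sketch of the corrected INSERT, using SQLite through Python's sqlite3 with hypothetical members. Dates are stored as ISO strings (YYYY-MM-DD) so they sort correctly as text. Ann's older BMI of 22 is below her target, which is why the query without the ROWNUM = 1 filter wrongly includes her; her latest BMI of 27 is not.

```python
import sqlite3

# Hypothetical data: (Member_ID, First_Name, BMI, Date)
MEMBER_DETAILS = [
    (1, "Ann", 22, "2018-01-01"),
    (1, "Ann", 27, "2018-05-01"),  # Ann's latest BMI
    (2, "Bob", 30, "2018-02-01"),
    (2, "Bob", 24, "2018-06-01"),  # Bob's latest BMI
]
BMI_DATA = [(1, 25), (2, 26)]  # (Member_ID, Target_BMI)

INSERT_LATEST = """
INSERT INTO results_table (Member_ID, First_Name, BMI)
SELECT x.Member_ID, x.First_Name, x.BMI
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY Member_ID ORDER BY Date DESC) AS ROWNUM
      FROM Member_Details) x
JOIN BMI_Data c ON x.Member_ID = c.Member_ID
WHERE x.ROWNUM = 1            -- keep only each member's most recent row
  AND x.BMI < c.Target_BMI
"""


def load_results():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Member_Details (Member_ID INTEGER, First_Name TEXT, BMI INTEGER, Date TEXT)"
    )
    conn.execute("CREATE TABLE BMI_Data (Member_ID INTEGER, Target_BMI INTEGER)")
    conn.execute("CREATE TABLE results_table (Member_ID INTEGER, First_Name TEXT, BMI INTEGER)")
    conn.executemany("INSERT INTO Member_Details VALUES (?, ?, ?, ?)", MEMBER_DETAILS)
    conn.executemany("INSERT INTO BMI_Data VALUES (?, ?)", BMI_DATA)
    conn.execute(INSERT_LATEST)
    rows = conn.execute("SELECT * FROM results_table ORDER BY Member_ID").fetchall()
    conn.close()
    return rows


print(load_results())  # [(2, 'Bob', 24)]
```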
I wanted to note that there is no such date as '31-April-2018'! You probably meant '1-May-2018'.
In any case, when ordering by Date it is important to first convert it to the DATE data type; otherwise the values are ordered as text and the ordering is not correct. The query below (BigQuery Standard SQL) orders properly and also proposes an alternative approach using ARRAY_AGG() with ORDER BY and LIMIT 1:
#standardSQL
INSERT INTO results_table (Member_ID, First_Name, BMI)
SELECT * EXCEPT(Target_BMI)
FROM (
SELECT Member_ID, First_Name,
ARRAY_AGG(BMI ORDER BY PARSE_DATE('%d-%B-%Y', Date) DESC LIMIT 1)[OFFSET(0)] BMI
FROM `project.dataset.member_details`
GROUP BY Member_ID, First_Name
) d
JOIN `project.dataset.bmi_data` t
USING(Member_ID)
WHERE BMI < Target_BMI
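A small Python illustration of the ordering point (plain Python rather than SQL, with made-up date strings): comparing the raw strings picks the wrong "latest" date, while parsing with the same '%d-%B-%Y' pattern picks the right one and rejects '31-April-2018'. It assumes the default C locale, where %B matches English month names.

```python
from datetime import datetime

# Made-up values in the question's day-MonthName-year format
DATES = ["15-March-2018", "1-May-2018", "9-January-2018"]
FORMAT = "%d-%B-%Y"  # %B = full month name (English in the default C locale)


def latest_as_text(dates):
    # Plain string comparison: '9-...' sorts after both '15-...' and '1-...'
    return max(dates)


def latest_as_date(dates):
    # Parse first, as PARSE_DATE('%d-%B-%Y', Date) does in the query above
    return max(dates, key=lambda s: datetime.strptime(s, FORMAT))


def is_valid_date(s):
    # strptime rejects impossible dates such as 31 April
    try:
        datetime.strptime(s, FORMAT)
        return True
    except ValueError:
        return False


print(latest_as_text(DATES))           # 9-January-2018
print(latest_as_date(DATES))           # 1-May-2018
print(is_valid_date("31-April-2018"))  # False
```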

Find the month in which the maximum number of employees were hired

I have a situation where I need to find the month in which the maximum number of employees were hired.
Here is my Employee table:
I already have a solution for this:
select MM
from (
select *, dense_RANK() OVER(order by cnt desc) as rnk
from (
select month(doj) as MM,count(month(doj)) as CNT
from employee
group by month(doj)
)x
)y
where rnk=1
But I am not satisfied with what I have implemented and would like a simpler, more efficient solution.
I think the simplest way is:
select top 1 year(doj), month(doj), count(*)
from employee
group by year(doj), month(doj)
order by count(*) desc;
Notes:
This interprets "month" as being "year/month". If you really do only want the month, then remove year() from both the select and group by.
This returns one row. If you want multiple rows when there are ties, then use select top (1) with ties.
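SQLite has no TOP, so in this sketch (Python's sqlite3, invented hire dates) a ranked subquery plays the role of TOP (1) WITH TIES, and strftime() replaces year() and month(). Two months tie with two hires each, and both are returned.

```python
import sqlite3

# Invented hire dates (ISO format)
HIRES = ["2020-01-15", "2020-01-20", "2020-03-05", "2020-03-10", "2021-02-01"]

QUERY = """
SELECT yr, mm, cnt
FROM (
    SELECT strftime('%Y', doj) AS yr,
           strftime('%m', doj) AS mm,
           COUNT(*) AS cnt,
           -- RANK keeps ties, similar to TOP (1) WITH TIES in SQL Server
           RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM employee
    GROUP BY strftime('%Y', doj), strftime('%m', doj)
)
WHERE rnk = 1
ORDER BY yr, mm
"""


def busiest_months(hire_dates):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (doj TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?)", [(d,) for d in hire_dates])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(busiest_months(HIRES))  # [('2020', '01', 2), ('2020', '03', 2)]
```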

SQL RANK with multiple WHERE clauses

I have a few sales offices, together with their sales. I am trying to set up a report that shows how each office is performing. Getting SUMs and COUNTs is easy, but I am struggling to get the rank of a single office.
I would like this query to return the rank of a single office over the entire period and/or a specified time range (e.g. BETWEEN '2015-01-01' AND '2015-01-15').
I also need to exclude some offices from the ranking (e.g. OfficeName NOT IN ('GGG','QQQ')), so using the sample data, the rank of office 'XYZ' would be 5.
If OfficeName = 'XYZ' is included in the WHERE clause, the rank is obviously 1, because WHERE filters out all other rows before the rest of the query (including RANK()) is evaluated.
Is there any way of doing this without using a temporary table?
SELECT OfficeName, SUM(Value) as SUM,
RANK() OVER (ORDER BY SUM(VALUE) DESC) AS Rank
FROM Transactions t
JOIN Office o ON t.TransID=o.ID
WHERE OfficeName NOT IN ('GGG','QQQ')
--AND OfficeName = 'XYZ'
GROUP BY OfficeName
ORDER BY 2 DESC;
I am using MS SQL Server 2008.
SQL Fiddle with some random data is here: http://sqlfiddle.com/#!3/fac7a/35
Many thanks for help!
If I understand you correctly, you want:
SELECT *
FROM (
SELECT OfficeName, SUM(Value) as SUM,
RANK() OVER (ORDER BY SUM(VALUE) DESC) AS Rank
FROM Transactions t
JOIN Office o ON t.TransID=o.ID
WHERE OfficeName NOT IN ('GGG','QQQ')
GROUP BY OfficeName
) dat
WHERE OfficeName = 'XYZ';
You just need to wrap your code in a derived table, or use a CTE like this, and then filter on OfficeName = 'XYZ':
;WITH CTE AS
(
SELECT OfficeName, SUM(Value) as SUM,
RANK() OVER (ORDER BY SUM(VALUE) DESC) AS Rank
FROM Transactions t
JOIN Office o ON t.TransID=o.ID
WHERE OfficeName NOT IN ('GGG','QQQ')
GROUP BY OfficeName
)
SELECT *
FROM CTE
WHERE OfficeName = 'XYZ';
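The difference between filtering before and after ranking can be checked in SQLite through Python's sqlite3. The schema is a hypothetical simplification (Transactions.OfficeID references Office.ID) and the values are invented: AAA totals 100, XYZ 70, BBB 50, and GGG is excluded.

```python
import sqlite3

# Hypothetical, simplified schema and invented data
OFFICES = [(1, "AAA"), (2, "BBB"), (3, "XYZ"), (4, "GGG")]
TRANSACTIONS = [(1, 60), (1, 40), (2, 50), (3, 70), (4, 500)]  # (OfficeID, Value)

# Rank every office that survives the NOT IN filter
RANKED = """
SELECT OfficeName, SUM(Value) AS Total,
       RANK() OVER (ORDER BY SUM(Value) DESC) AS Rnk
FROM Transactions t
JOIN Office o ON t.OfficeID = o.ID
WHERE OfficeName NOT IN ('GGG', 'QQQ')
GROUP BY OfficeName
"""

# Filtering in WHERE removes the other offices before RANK() runs
FILTER_IN_WHERE = """
SELECT OfficeName, SUM(Value) AS Total,
       RANK() OVER (ORDER BY SUM(Value) DESC) AS Rnk
FROM Transactions t
JOIN Office o ON t.OfficeID = o.ID
WHERE OfficeName NOT IN ('GGG', 'QQQ') AND OfficeName = 'XYZ'
GROUP BY OfficeName
"""

# Filtering in an outer query keeps the rank computed over all offices
FILTER_OUTSIDE = "WITH cte AS (" + RANKED + ") SELECT * FROM cte WHERE OfficeName = 'XYZ'"


def _run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Office (ID INTEGER, OfficeName TEXT)")
    conn.execute("CREATE TABLE Transactions (OfficeID INTEGER, Value INTEGER)")
    conn.executemany("INSERT INTO Office VALUES (?, ?)", OFFICES)
    conn.executemany("INSERT INTO Transactions VALUES (?, ?)", TRANSACTIONS)
    row = conn.execute(sql).fetchone()
    conn.close()
    return row


def where_filter_rank():
    return _run(FILTER_IN_WHERE)


def outer_filter_rank():
    return _run(FILTER_OUTSIDE)


print(where_filter_rank())  # ('XYZ', 70, 1)
print(outer_filter_rank())  # ('XYZ', 70, 2)
```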
Here is an amusing way to do this without a subquery:
SELECT TOP 1 OfficeName, SUM(Value) as SUM,
RANK() OVER (ORDER BY SUM(VALUE) DESC) AS Rank
FROM Transactions t JOIN
Office o
ON t.TransID = o.ID
WHERE OfficeName NOT IN ('GGG','QQQ')
GROUP BY OfficeName
ORDER BY (CASE WHEN OfficeName = 'XYZ' THEN 1 ELSE 2 END);
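This trick works because the window function is evaluated before ORDER BY and TOP/LIMIT trim the result. A sketch in SQLite through Python's sqlite3, with LIMIT 1 in place of TOP 1 and a hypothetical, simplified schema (Transactions.OfficeID references Office.ID) holding invented data.

```python
import sqlite3

# Hypothetical, simplified schema and invented data
OFFICES = [(1, "AAA"), (2, "BBB"), (3, "XYZ"), (4, "GGG")]
TRANSACTIONS = [(1, 60), (1, 40), (2, 50), (3, 70), (4, 500)]  # (OfficeID, Value)

QUERY = """
SELECT OfficeName, SUM(Value) AS Total,
       RANK() OVER (ORDER BY SUM(Value) DESC) AS Rnk
FROM Transactions t
JOIN Office o ON t.OfficeID = o.ID
WHERE OfficeName NOT IN ('GGG', 'QQQ')
GROUP BY OfficeName
-- RANK() is computed over all remaining offices; only then are rows sorted and cut
ORDER BY CASE WHEN OfficeName = 'XYZ' THEN 1 ELSE 2 END
LIMIT 1
"""


def xyz_row():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Office (ID INTEGER, OfficeName TEXT)")
    conn.execute("CREATE TABLE Transactions (OfficeID INTEGER, Value INTEGER)")
    conn.executemany("INSERT INTO Office VALUES (?, ?)", OFFICES)
    conn.executemany("INSERT INTO Transactions VALUES (?, ?)", TRANSACTIONS)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row


print(xyz_row())  # ('XYZ', 70, 2)
```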

Tricky SQL SELECT Statement

I have a performance issue when selecting data in my project.
There is a table with 3 columns: "id", "time" and "group".
The ids are just unique ids as usual.
The time is the creation date of the entry.
The group is there to bundle certain entries together.
So the table data may look like this:
ID | TIME | GROUP
------------------------
1 | 20090805 | A
2 | 20090804 | A
3 | 20090804 | B
4 | 20090805 | B
5 | 20090803 | A
6 | 20090802 | B
...and so on.
The task is now to select the "current" entries (their ids) in each group for a given date. That is, for each group find the most recent entry for a given date.
The following preconditions apply:
I do not know the different groups in advance; there may be many different ones, changing over time.
The selection date may lie "in between" the dates of the entries in the table. Then I have to find the closest one in each group, that is, the entry whose TIME is the maximum among the entries in that group with TIME less than the selection date.
What I currently do is a multi-step process, which I would like to turn into a single SELECT statement:
1) SELECT DISTINCT group FROM table to find the available groups.
2) For each group found in 1), SELECT * FROM table WHERE time<selectionDate AND group=loop ORDER BY time DESC.
3) Take the first row of each result found in 2).
Obviously this is not optimal.
So I would be very happy if some more experienced SQL expert could help me to find a solution to put these steps in a single statement.
Thank you!
The following will work on SQL Server 2005+ and Oracle 9i+:
WITH groups AS (
SELECT t.group,
MAX(t.time) maxtime
FROM TABLE t
WHERE t.time < selectionDate
GROUP BY t.group)
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN groups g ON g.group = t.group AND g.maxtime = t.time
Any database should support:
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN (SELECT t.group,
MAX(t.time) maxtime
FROM TABLE t
WHERE t.time < selectionDate
GROUP BY t.group) g ON g.group = t.group AND g.maxtime = t.time
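The join-back pattern with the question's selection-date condition, sketched in SQLite through Python's sqlite3 using the sample rows from the question. The names entries, entry_time and grp are hypothetical stand-ins, since GROUP and TIME are keywords. If two entries in a group share the latest qualifying time, both are returned.

```python
import sqlite3

# Sample rows from the question: (id, entry_time, grp)
ENTRIES = [
    (1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
    (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B"),
]

QUERY = """
SELECT t.id
FROM entries t
JOIN (SELECT grp, MAX(entry_time) AS maxtime
      FROM entries
      WHERE entry_time < ?          -- only entries before the selection date
      GROUP BY grp) g
  ON g.grp = t.grp AND g.maxtime = t.entry_time
ORDER BY t.id
"""


def current_ids(selection_date):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER, entry_time INTEGER, grp TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", ENTRIES)
    ids = [row[0] for row in conn.execute(QUERY, (selection_date,))]
    conn.close()
    return ids


print(current_ids(20090805))  # [2, 3]
print(current_ids(20090804))  # [5, 6]
```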
Here's how I would do it in SQL Server:
SELECT * FROM [table] t1 WHERE id IN
(SELECT TOP 1 t2.id FROM [table] t2 WHERE t2.[time] < selectionDate AND t2.[group] = t1.[group] ORDER BY t2.[time] DESC)
The solution will vary by database server, since the syntax for TOP queries varies. Basically you are looking for a "top n per group" query, so you can Google that if you want.
Here is a solution in SQL Server. The following will return the top 10 players who hit the most home runs per year since 1990. The key is to calculate the "Home Run Rank" of each player for each year.
select
HRRanks.*
from
(
Select
b.yearID, b.PlayerID, sum(b.Hr) as TotalHR,
rank() over (partition by b.yearID order by sum(b.hr) desc) as HR_Rank
from
Batting b
where
b.yearID >= 1990
group by
b.yearID, b.playerID
)
HRRanks
where
HRRanks.HR_Rank <= 10
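The same "top n per group" shape, run in SQLite through Python's sqlite3 on invented batting lines. With RANK(), a tie can put more than n rows in a group (1991 returns three players for n = 2); ROW_NUMBER() would cut each group to exactly n.

```python
import sqlite3

# Invented batting lines: (yearID, playerID, HR); a player can have several lines per year
BATTING = [
    (1991, "p1", 30), (1991, "p2", 25), (1991, "p3", 25), (1991, "p4", 10),
    (1992, "p1", 12), (1992, "p2", 40), (1992, "p2", 5),
]

QUERY = """
SELECT yearID, playerID, TotalHR, HR_Rank
FROM (
    SELECT yearID, playerID, SUM(HR) AS TotalHR,
           -- ranking restarts for every year
           RANK() OVER (PARTITION BY yearID ORDER BY SUM(HR) DESC) AS HR_Rank
    FROM Batting
    WHERE yearID >= 1990
    GROUP BY yearID, playerID
) HRRanks
WHERE HR_Rank <= ?
ORDER BY yearID, HR_Rank, playerID
"""


def top_n_per_year(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Batting (yearID INTEGER, playerID TEXT, HR INTEGER)")
    conn.executemany("INSERT INTO Batting VALUES (?, ?, ?)", BATTING)
    rows = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return rows


for row in top_n_per_year(2):
    print(row)
```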
Here is a solution in Oracle (top 10 departments by average salary):
SELECT deptno, avg_sal
FROM(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
ORDER BY AVG(sal) DESC
)
WHERE ROWNUM <= 10;
Or using analytic functions:
SELECT deptno, avg_sal
FROM (
SELECT deptno, avg_sal, RANK() OVER (ORDER BY avg_sal DESC) rank
FROM
(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
)
)
WHERE rank <= 10;
Or same again, but using DENSE_RANK() instead of RANK()
select * from things where (GROUP, TIME) in (
select GROUP, max(TIME) from things
where TIME < 20090804
group by GROUP
)
Tested with MySQL (but I had to change the table and column names because they are keywords).
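With QUALIFY (supported by Teradata, Snowflake and BigQuery, among others), keep the latest row per group:
SELECT *
FROM TABB T1
WHERE TIMEE < selectionDate
QUALIFY ROW_NUMBER() OVER (PARTITION BY GROUPP ORDER BY TIMEE DESC, id DESC) = 1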
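SQLite has no QUALIFY, so a common workaround is to compute ROW_NUMBER() in a subquery and filter outside it. This sketch (Python's sqlite3) uses the question's rows plus one invented row, id 7, which ties with id 2 in group A; the id DESC tie-breaker keeps exactly one row per group. The names entries, entry_time and grp are hypothetical.

```python
import sqlite3

# Question's rows plus an invented tie (id 7): (id, entry_time, grp)
ENTRIES = [
    (1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
    (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B"),
    (7, 20090804, "A"),
]

QUERY = """
SELECT id
FROM (
    SELECT id,
           -- 1 = latest entry before the selection date; ties broken by higher id
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY entry_time DESC, id DESC) AS rn
    FROM entries
    WHERE entry_time < ?
)
WHERE rn = 1
ORDER BY id
"""


def latest_one_per_group(selection_date):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER, entry_time INTEGER, grp TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", ENTRIES)
    ids = [row[0] for row in conn.execute(QUERY, (selection_date,))]
    conn.close()
    return ids


print(latest_one_per_group(20090805))  # [3, 7]
print(latest_one_per_group(20090804))  # [5, 6]
```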