Grouping Start Dates


Example of what I am trying to do:
I have 10 employees. They all started on different days throughout the year, and each gets paid once a week. I want to query their first paycheck and call that week 1 for every employee; each subsequent paycheck is then week 2, 3, and so on through 13. Basically, I want to see what each employee's first 13 weeks on the job looked like, stacked against each other. I would expect my output to look something like this:

You can use row_number():
select
    row_number() over (partition by EmployeeId order by PaycheckDate) as week,
    EmployeeId,
    PaycheckDate,
    Amount
from mytable
order by week, EmployeeId
If you want just the first 13 weeks per employee, then:
select *
from (
    select
        row_number() over (partition by EmployeeId order by PaycheckDate) as week,
        EmployeeId,
        PaycheckDate,
        Amount
    from mytable
) t
where week <= 13
order by week, EmployeeId
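To try the query end to end, here is a minimal sketch that runs it from Python against an in-memory SQLite database (an assumed stand-in for your server; SQLite 3.25 or newer is needed for window functions). The table layout mirrors mytable, but the two employees, their start dates and the amounts are made up:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (EmployeeId INTEGER, PaycheckDate TEXT, Amount REAL)")

# Made-up data: employee 1 has 15 weekly paychecks, employee 2 only 5,
# and the two started on different dates.
starts = {1: date(2023, 1, 6), 2: date(2023, 6, 2)}
counts = {1: 15, 2: 5}
for emp, start in starts.items():
    for i in range(counts[emp]):
        paid = start + timedelta(weeks=i)
        conn.execute(
            "INSERT INTO mytable VALUES (?, ?, ?)",
            (emp, paid.isoformat(), 100.0 + i),
        )

# Number each employee's paychecks 1, 2, 3, ... in date order and keep the first 13.
query = """
SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY PaycheckDate) AS week,
           EmployeeId,
           PaycheckDate,
           Amount
    FROM mytable
) AS t
WHERE week <= 13
ORDER BY week, EmployeeId
"""
result = conn.execute(query).fetchall()
for row in result[:4]:
    print(row)
```

Employee 2 only has 5 paychecks, so it contributes weeks 1 to 5; the filter on week never pads missing weeks.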

Related

How to write a conditional SQL select

My table consists of user_id, revenue, publish_month columns.
Right now I use GROUP BY user_id and SUM(revenue) to get the revenue for each individual user.
Is there a single SQL query I can use to get user revenue across a time period conditionally? If, for a specific user, there is a row for this month, I want to query this month, last month and the month before. If there is not yet a row for this month, I want to query last month and the two months before that.
Any advice on which approach to take would be helpful. Should I be using CASE expressions, IF/ELSE with EXISTS, or is this doable with a single SQL query?
UPDATE: since I did a bad job of describing the question, I've included some example data and expected results.
Where the current month is not present for user 33:
Where the current month is present:

Assuming publish_month is a DATE datatype, this should get the most recent three months of data per user...
SELECT
    user_id, SUM(revenue) AS s_revenue
FROM
(
    SELECT
        user_id, revenue, publish_month,
        MAX(publish_month) OVER (PARTITION BY user_id) AS user_latest_publish_month
    FROM
        yourtableyoudidnotname
)
    summarised
WHERE
    publish_month >= DATEADD(month, -2, user_latest_publish_month)
GROUP BY
    user_id
If you want to limit that to the most recent 3 months out of the last 4 calendar months, just add AND publish_month >= DATEADD(month, -3, DATE_TRUNC(month, GETDATE())) to the WHERE clause.
The ambiguity here is why it is important to include a Minimal Reproducible Example.
With input data and required results, we could test our code against your requirements.
If you're using strings for the publish_month, you shouldn't be, and should fix that with utmost urgency.
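Here is a quick sketch of the MAX() OVER idea in Python with SQLite, assuming publish_month is stored as the first day of the month in ISO text form. SQLite has no DATEADD, so date(x, '-2 months') stands in for DATEADD(month, -2, x); the table name revenue_table and the rows are hypothetical, with user 33 missing the current month (2023-03):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; publish_month holds the first day of each month as ISO text.
conn.execute("CREATE TABLE revenue_table (user_id INTEGER, revenue INTEGER, publish_month TEXT)")
conn.executemany(
    "INSERT INTO revenue_table VALUES (?, ?, ?)",
    [
        # User 33 has no row for the current month (2023-03).
        (33, 1000, "2022-11-01"), (33, 10, "2022-12-01"),
        (33, 20, "2023-01-01"), (33, 30, "2023-02-01"),
        # User 44 does have a row for 2023-03.
        (44, 1000, "2022-12-01"), (44, 1, "2023-01-01"),
        (44, 2, "2023-02-01"), (44, 3, "2023-03-01"),
    ],
)

query = """
SELECT user_id, SUM(revenue) AS s_revenue
FROM (
    SELECT user_id, revenue, publish_month,
           MAX(publish_month) OVER (PARTITION BY user_id) AS user_latest_publish_month
    FROM revenue_table
) AS summarised
-- date(x, '-2 months') is SQLite's counterpart of DATEADD(month, -2, x)
WHERE publish_month >= date(user_latest_publish_month, '-2 months')
GROUP BY user_id
ORDER BY user_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(33, 60), (44, 6)]
```

Each user's window is anchored on their own latest month, so user 33 sums December to February and user 44 sums January to March.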

You can use a windowing function to "number" the months. In this way the most recent one will have a value of 1, the prior 2, and the one before 3. Then you can only select the items with a number of 3 or less.
Here is how:
SELECT user_id, revenue, publish_month,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY publish_month DESC) AS RN
FROM yourtableyoudidnotname
Now you just select the rows with an RN of 3 or less and do your sum:
SELECT user_id, SUM(revenue) AS s_revenue
FROM (
    SELECT user_id, revenue, publish_month,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY publish_month DESC) AS RN
    FROM yourtableyoudidnotname
) X
WHERE RN <= 3
GROUP BY user_id
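The same kind of made-up data can be run through the ROW_NUMBER version as a sketch (Python's sqlite3 with SQLite 3.25+ assumed; revenue_table is a hypothetical name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue_table (user_id INTEGER, revenue INTEGER, publish_month TEXT)")
conn.executemany(
    "INSERT INTO revenue_table VALUES (?, ?, ?)",
    [
        (33, 1000, "2022-11-01"), (33, 10, "2022-12-01"),
        (33, 20, "2023-01-01"), (33, 30, "2023-02-01"),
        (44, 1000, "2022-12-01"), (44, 1, "2023-01-01"),
        (44, 2, "2023-02-01"), (44, 3, "2023-03-01"),
    ],
)

# RN = 1 is each user's most recent row; keep RN 1..3 and sum them.
query = """
SELECT user_id, SUM(revenue) AS s_revenue
FROM (
    SELECT user_id, revenue, publish_month,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY publish_month DESC) AS RN
    FROM revenue_table
) AS X
WHERE RN <= 3
GROUP BY user_id
ORDER BY user_id
"""
result = conn.execute(query).fetchall()
print(result)
```

One design difference from the MAX() OVER approach: RN counts rows, not calendar months, so a user with a gap month (or with several rows in one month) gets their three most recent rows rather than a three-month window.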
You could also do this without a subquery by using SUM as a window function with a range (frame) clause, but I think this is easier to understand.
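From the comments: there could be an issue if you have months from more than one year (for example, if publish_month holds only the month number and the year is in a separate publish_year column). To solve this, build a sort key in which the most recent month always has the biggest number. So instead of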
ORDER BY publish_month DESC
you would have
ORDER BY (100*publish_year)+publish_month DESC
This means more recent years always produce a higher number, so January 2023 becomes 202301 while December 2022 becomes 202212. Since 202301 is the bigger number, January 2023 gets a row number of 1 and December 2022 gets a row number of 2.
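A small sketch of why the combined key matters, assuming hypothetical publish_year and publish_month integer columns in SQLite. It numbers the same three rows once by month alone and once by (100*publish_year)+publish_month:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: month and year kept in separate integer columns.
conn.execute(
    "CREATE TABLE revenue_table "
    "(user_id INTEGER, revenue INTEGER, publish_year INTEGER, publish_month INTEGER)"
)
conn.executemany(
    "INSERT INTO revenue_table VALUES (?, ?, ?, ?)",
    [(1, 10, 2022, 11), (1, 20, 2022, 12), (1, 30, 2023, 1)],
)


def ranking(order_by):
    # order_by is a fixed constant below, never user input.
    sql = f"""
        SELECT publish_year, publish_month,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY {order_by}) AS RN
        FROM revenue_table
        ORDER BY RN
    """
    return conn.execute(sql).fetchall()


month_only = ranking("publish_month DESC")                        # January 2023 ranked last
year_month = ranking("(100 * publish_year) + publish_month DESC")  # January 2023 ranked first
print(month_only)
print(year_month)
```

The order_by string is interpolated with an f-string only because it is a fixed constant here; building SQL that way from user input would be unsafe.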

Problems with MAX() in SQL: not returning all desired information

I am exploring a sales dataset in Microsoft SQL Server Management Studio.
I want to get the day with the highest number of items sold for each year, i.e. a table like this (the values in the rows are made up):
| Year | Purchase Day | Max_Daily_Sales |
|------|--------------|-----------------|
| 2011 | 2011-11-12   | 48              |
| 2012 | 2012-12-22   | 123             |
I first tried to run this query:
WITH CTE_DailySales AS
(
    SELECT DISTINCT
        Purchase_Day,
        Year,
        SUM(Order_Quantity) OVER (PARTITION BY Purchase_Day, Year) AS Daily_Quantity_Sold
    FROM
        [sql_cleaning].[dbo].[Sales$]
)
SELECT
    Year, MAX(Daily_Quantity_Sold) AS Max_Daily_Sales
FROM
    CTE_DailySales
GROUP BY
    Year
ORDER BY
    Year
It partially works since it gives me the highest quantity of items sold in a day for each year. However, I would also like to specify what day of the year it was.
If I add Purchase_Day to the SELECT list (and therefore to the GROUP BY), it returns the max for each day, not the single day with the highest number of items sold.
How could I resolve this problem?
I hope I've been clear enough, and thank you all for your help.

I suggest you use ROW_NUMBER to get your max value. Your query would be:
WITH CTE_DailySales AS
(
    SELECT Purchase_Day,
           Year,
           SUM(Order_Quantity) AS Daily_Quantity_Sold,
           ROW_NUMBER() OVER (PARTITION BY Year ORDER BY SUM(Order_Quantity) DESC) AS rn
    FROM
        [sql_cleaning].[dbo].[Sales$]
    GROUP BY Purchase_Day,
             Year
)
SELECT
    *
FROM
    CTE_DailySales
WHERE rn = 1
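A sketch of the ROW_NUMBER approach on made-up data, using SQLite from Python instead of SQL Server. SQL Server accepts the window function over SUM() in the same grouped query as above; here the daily totals are computed in one CTE and ranked in a second, which gives the same result. The table name Sales is a hypothetical stand-in for [sql_cleaning].[dbo].[Sales$]:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Purchase_Day TEXT, Year INTEGER, Order_Quantity INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("2011-11-12", 2011, 30), ("2011-11-12", 2011, 18), ("2011-11-13", 2011, 20),
        ("2012-12-22", 2012, 100), ("2012-12-22", 2012, 23), ("2012-12-23", 2012, 50),
    ],
)

query = """
WITH daily AS (            -- total items sold per day
    SELECT Purchase_Day, Year, SUM(Order_Quantity) AS Daily_Quantity_Sold
    FROM Sales
    GROUP BY Purchase_Day, Year
), ranked AS (             -- rn = 1 marks the best day of each year
    SELECT Purchase_Day, Year, Daily_Quantity_Sold,
           ROW_NUMBER() OVER (PARTITION BY Year ORDER BY Daily_Quantity_Sold DESC) AS rn
    FROM daily
)
SELECT Year, Purchase_Day, Daily_Quantity_Sold
FROM ranked
WHERE rn = 1
ORDER BY Year
"""
result = conn.execute(query).fetchall()
print(result)  # [(2011, '2011-11-12', 48), (2012, '2012-12-22', 123)]
```

If two days tie for the top total, ROW_NUMBER keeps only one of them, chosen arbitrarily; RANK() would keep both.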

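Simply compare each day's total with the maximum for its year:
SELECT Year, Purchase_Day, Daily_Quantity_Sold
FROM (
    SELECT Purchase_Day,
           Year,
           SUM(Order_Quantity) AS Daily_Quantity_Sold,
           MAX(SUM(Order_Quantity)) OVER (PARTITION BY Year) AS MAX_QTY_YEAR
    FROM [sql_cleaning].[dbo].[Sales$]
    GROUP BY Purchase_Day, Year
) AS t
WHERE Daily_Quantity_Sold = MAX_QTY_YEAR;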
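Comparing each day's total with the maximum for its year keeps every day that ties for the top spot. Here is a sketch on hypothetical data in SQLite (daily totals first, then MAX() as a window function per Year), where 2011 has two days with 48 items; Sales is an assumed stand-in table name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Purchase_Day TEXT, Year INTEGER, Order_Quantity INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("2011-11-12", 2011, 48), ("2011-11-13", 2011, 20),
        ("2011-11-20", 2011, 30), ("2011-11-20", 2011, 18),  # ties with 2011-11-12
        ("2012-12-22", 2012, 123), ("2012-12-23", 2012, 50),
    ],
)

query = """
WITH daily AS (
    SELECT Purchase_Day, Year, SUM(Order_Quantity) AS Daily_Quantity_Sold
    FROM Sales
    GROUP BY Purchase_Day, Year
)
SELECT Year, Purchase_Day, Daily_Quantity_Sold
FROM (
    SELECT Purchase_Day, Year, Daily_Quantity_Sold,
           MAX(Daily_Quantity_Sold) OVER (PARTITION BY Year) AS Max_Qty_Year
    FROM daily
) AS t
WHERE Daily_Quantity_Sold = Max_Qty_Year   -- keep every day equal to its year's max
ORDER BY Year, Purchase_Day
"""
result = conn.execute(query).fetchall()
print(result)
```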

T-SQL - Get last 30 Rows for each ID

How can I get the last 30 rows in a month for each employee? I have a table with evaluations for each employee.
SELECT
    Date,
    Month,
    Team,
    Employee_ID,
    Evaluation_Score,
    Evaluation_Case_Number
FROM X
WHERE Month = @month
ORDER BY Date DESC
This is what I have, but I only want to see the last 30 evaluation scores per employee (or fewer, if they don't have that many) for the declared month.
Is there a way to do this? Thanks in advance.

You can use row_number(). Something like this:
select x.*
from (select x.*,
             row_number() over (partition by employee_id order by date desc) as seqnum
      from x
      where month = @month
     ) x
where seqnum <= 30;
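A sketch of the per-employee limit in Python's sqlite3 with made-up data. The ? placeholder plays the role of the declared month variable, and the x table with its Date/Month columns mirrors the question; employee 1 has 35 evaluations in the month (plus 5 in the previous month), employee 2 only 10:

```python
import sqlite3
from collections import Counter
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x (Date TEXT, Month TEXT, Team TEXT, Employee_ID INTEGER, "
    "Evaluation_Score INTEGER, Evaluation_Case_Number INTEGER)"
)

# Made-up evaluations, one per hour so every row has a distinct timestamp.
rows = []
case_no = 0
for emp, month, start, n in [
    (1, "2023-05", datetime(2023, 5, 1), 35),
    (2, "2023-05", datetime(2023, 5, 1), 10),
    (1, "2023-04", datetime(2023, 4, 1), 5),   # other month, must be ignored
]:
    for i in range(n):
        case_no += 1
        stamp = (start + timedelta(hours=i)).strftime("%Y-%m-%d %H:%M")
        rows.append((stamp, month, "A", emp, 80, case_no))
conn.executemany("INSERT INTO x VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT *
FROM (
    SELECT x.*,
           ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY Date DESC) AS seqnum
    FROM x
    WHERE Month = ?          -- plays the role of the declared month variable
) AS x
WHERE seqnum <= 30
"""
result = conn.execute(query, ("2023-05",)).fetchall()
per_employee = Counter(row[3] for row in result)          # rows kept per employee
oldest_kept = min(row[0] for row in result if row[3] == 1)
print(per_employee, oldest_kept)
```

Employee 2 simply returns all 10 rows, which covers the "or fewer" case from the question.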

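Use the TOP clause:
SELECT TOP (30)
    Date,
    Month,
    Team,
    Employee_ID,
    Evaluation_Score,
    Evaluation_Case_Number
FROM X
WHERE Month = @month
ORDER BY Date DESC
This limits the number of returned rows. Note that TOP applies to the whole result set, so it returns the 30 most recent rows across all employees rather than 30 per employee.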

How to run SQL n times with an increasing variable and then join the results

I have a historical Transact table with a CreatedDate; it is related to an employee transact table (inner join on transact_id).
That said, here is the problem: I need to query these tables and get the state by month, because during the year the CreatedDate can change. For example, an employee update in July will create a new row, but this shouldn't affect the March total.
The solution looks like a foreach loop, but how can I combine all the rows at the end? The result should be something like:
January - $123
February - $234
March - $123
...
I get the last state of each employee with this:
select AllTransact.id_employee, AllTransact.id_department
from (
    select id_employee, id_department,
           rank() over (partition by id_employee order by created_date desc) desc_rank
    from Transact_Employee TransEmployee
    inner join Transact on TransEmployee.ID_Transact = Transact.ID_Transact
        and Transact.Status = 8
        and Transact.Created_Date < @currentMonth
) AllTransact
where desc_rank = 1
*I don't want to copy and paste all the code 12 times. :)

You can partition over several columns. rank() OVER (PARTITION BY [id_employee], DATEPART(month, [Created_Date]) ORDER BY [Created_Date] DESC) will give you what you have now, but for each month. It doesn't care which year that month is in, so you either need to partition by year too or add a limit on created_date.
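To see the year caveat in action, here is a sketch using SQLite, where strftime('%m', ...) and strftime('%Y-%m', ...) play the role of DATEPART. The transact table is a hypothetical, flattened version of the two joined tables, with one March row in 2022 and two in 2023:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: one row per employee transaction.
conn.execute("CREATE TABLE transact (id_employee INTEGER, id_department INTEGER, created_date TEXT)")
conn.executemany(
    "INSERT INTO transact VALUES (?, ?, ?)",
    [
        (1, 9, "2022-03-15"),
        (1, 10, "2023-03-05"),
        (1, 11, "2023-03-20"),
        (1, 12, "2023-07-01"),
    ],
)


def latest_per_period(period_expr):
    # RANK() restarts for every (employee, period) pair; rank 1 is the latest row.
    # period_expr is a fixed constant below, never user input.
    sql = f"""
        SELECT id_employee, period, id_department
        FROM (
            SELECT id_employee, id_department, {period_expr} AS period,
                   RANK() OVER (PARTITION BY id_employee, {period_expr}
                                ORDER BY created_date DESC) AS desc_rank
            FROM transact
        ) AS t
        WHERE desc_rank = 1
        ORDER BY period
    """
    return conn.execute(sql).fetchall()


month_only = latest_per_period("strftime('%m', created_date)")        # ignores the year
year_and_month = latest_per_period("strftime('%Y-%m', created_date)")  # one row per calendar month
print(month_only)
print(year_and_month)
```

With month alone, the 2022 March row competes with the 2023 ones and disappears; adding the year keeps it.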

How can I select one row for each week in a date range that spans more than a year?

In my PostgreSQL database, I have a table with columns of dates and prices ('transdate' and 'price').
I would like to form a query which selects one row for each week over a date range which spans more than one year.
From another question/answer here, I implemented this code which works for date ranges of less than a year:
;with cte as
(
    select *,
           row_number() over (partition by extract(week from transdate) order by transdate desc) as rn
    from "tablename"
    where transdate between '06-01-1999' and '06-01-1999'::timestamp + '50 week'::interval
)
select transdate, price from cte where rn = 1 order by transdate;
However, when I extend the interval beyond 50 weeks, it still selects at most 12 months' worth of weeks.
How can I rewrite this code to select one date/price from every week in the range?

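Your problem is that week numbers wrap around at year boundaries, so you need to look at the week number and the year at the same time. Lucky for you, you can PARTITION BY several things at once. Because extract(week ...) returns the ISO 8601 week number, pair it with isoyear rather than year; otherwise dates around New Year can be grouped with the wrong year:
row_number() over (
    partition by extract(isoyear from transdate),
                 extract(week from transdate)
    order by transdate desc
) as rn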
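A sketch of the fix in Python with SQLite. Older SQLite builds have no ISO week specifier in strftime, so a hypothetical helper iso_week_key, registered with sqlite3's create_function and built on Python's date.isocalendar(), produces keys like '1999-W22'. The prices table and its 700 days of made-up data stand in for "tablename":

```python
import sqlite3
from datetime import date, timedelta


def iso_week_key(text):
    # Hypothetical helper: ISO 8601 year and week, e.g. '1999-W22',
    # mirroring PostgreSQL's extract(isoyear ...) and extract(week ...).
    iso_year, iso_week, _ = date.fromisoformat(text).isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


conn = sqlite3.connect(":memory:")
conn.create_function("iso_week_key", 1, iso_week_key)
conn.execute("CREATE TABLE prices (transdate TEXT, price REAL)")

# Made-up daily prices for 700 days (100 weeks) starting 1999-06-01.
start = date(1999, 6, 1)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [((start + timedelta(days=i)).isoformat(), float(i)) for i in range(700)],
)


def one_row_per_week(partition_expr):
    # Latest row in each partition; partition_expr is a fixed constant, not user input.
    sql = f"""
        WITH cte AS (
            SELECT transdate, price,
                   ROW_NUMBER() OVER (PARTITION BY {partition_expr}
                                      ORDER BY transdate DESC) AS rn
            FROM prices
        )
        SELECT transdate, price FROM cte WHERE rn = 1 ORDER BY transdate
    """
    return conn.execute(sql).fetchall()


week_only = one_row_per_week("substr(iso_week_key(transdate), 7)")  # week number only
year_and_week = one_row_per_week("iso_week_key(transdate)")         # ISO year + week
print(len(week_only), len(year_and_week))
```

Partitioning on the week number alone collapses the 101 weeks touched by the data into 52 rows, which is the symptom from the question; the year-and-week key returns one row per week.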
