SQL : Group By on range of dynamic values - sql

This is similar to some other questions here, but those use a CASE which I cannot. This is on Oracle, and I will be running the query from an excel sheet. (And by the way these do not support WITH, which makes life much harder)
I have a range of dates in one big table - like 1/3/2011, 4/5/2012, 7/1/2013, 9/1/2013.....
Then I have another table with hours worked by employees on certain dates. So what I need to do is get a sum of number of hours worked by each employee in each intervening time period. So the tables are like
Dates
1-May-2011
5-Aug-2011
4-Apr-2012
....
and another
Employee Hours Date
Sam 4 1-Jan-2011
Sam 7 5-Jan-2011
Mary 12 7-Jan-2012
Mary 5 12-Dec-2013
......
so the result should be
Employee Hours In Date Range Till
Sam 11 1-May-2011
Sam 0 5-Aug-2011
Sam 0 4-Apr-2012
Mary 0 1-May-2011
Mary 0 5-Aug-2011
Mary 12 4-Apr-2012
....
Any pointers on how to achieve this please?

I'm unfamiliar with Oracle SQL and it's abilities/limitations, but since you asked for pointers, here's my take:
Join the tables (INNER JOIN) with the join rule being EmployeeHours.Date < Dates.Dates. Then GROUP BY Employee, Dates.Dates and select the grouping columns + SUM(Hours). What you'd end up with (Using your sample data) is:
Employee | Dates | Hours
Sam | 1-May-2011 | 11
Sam | 5-Aug-2011 | 11
Sam | 4-Apr-2012 | 11
Mary | 1-May-2011 | 0
Mary | 5-Aug-2011 | 0
Mary | 4-Apr-2012 | 12
With other (more complex) data, there will be more "interesting" results, but basically each row contains total hours up to that point.
You could then use that as an input to an outer query to find MAX(Hours) for all rows where Dates < currentDates and subtract that from your result.
Again, this is not a complete answer, but it's a direction that should work.

Related

Sum of field in a consecutive period based on a condition

I did this without complicated query and with Python. But I'm looking for a way to do this with Django ORM.
I have a table as follows:
user
date
point
Mary
2022/01/04
13
John
2022/01/04
10
Mary
2022/01/03
0
John
2022/01/03
5
Mary
2022/01/01
1
John
2022/01/01
1
Mary
2021/12/31
5
I want to calculate the Sum of points from now() to the date when the point value is greater than one.
Desired Output:
user
sum
Mary
14
13+1
John
10
10

How to measure an average count from a set of days each with their own data points, in SQL/LookerML

I have the following table:
id | decided_at | reviewer
1 2020-08-10 13:00 john
2 2020-08-10 14:00 john
3 2020-08-10 16:00 john
4 2020-08-12 14:00 jane
5 2020-08-12 17:00 jane
6 2020-08-12 17:50 jane
7 2020-08-12 19:00 jane
What I would like to do is get the difference between the min and max for each day and get the total count from the id's that are the min, the range between min and max, and the max. Currently, I'm only able to get this data for the past day.
Desired output:
Date | Time(h) | Count | reviewer
2020-08-10 3 3 john
2020-08-12 5 4 jane
From this, I would like to get the average show this data over the past x number of days.
Example:
If today was the 13th, filter on the past 2 days (48 hours)
Output:
reviewer | reviews/hour
jane 5/4 = 1.25
Example 2:
If today was the 13th, filter on the past 3 days (48 hours)
reviewer | reviews/hour
john 3/3 = 1
jane 5/4 = 1.25
Ideally, if this is possible in LookML without the use of a derived table, it would be nicest to have that. Otherwise, a solution in SQL would be great and I can try to convert to LookerML.
Thanks!
In SQL, one solution is to use two levels of aggregation:
select reviewer, sum(cnt) / sum(diff_h) review_per_hour
from (
select
reviewer,
date(decided_at) decided_date,
count(*) cnt,
timestampdiff(hour, min(decided_at), max(decided_at)) time_h
from mytable
where decided_at >= current_date - interval 2 day
group by reviewer, date(decided_at)
) t
group by reviewer
The subquery filters on the date range, aggregates by reviewer and day, and computes the number of records and the difference between the minimum and the maximum date, as hours. Then, the outer query aggregates by reviewer and does the final computation.
The actual function to compute the date difference varies across databases; timestampdiff() is supported in MySQL - other engines all have alternatives.

DAX SUMMARIZE() with filter - Powerpivot

Rephrasing a previous question after further research. I have a denormalised hierarchy of cases, each with an ID, a reference to their parent (or themselves) and a closure date.
Cases
ID | Client | ParentMatterName | MatterName | ClaimAmount | OpenDate | CloseDate
1 | Mr. Smith | ABC Ltd | ABC Ltd | $40,000 | 1 Jan 15 | 4 Aug 15
2 | Mr. Smith | ABC Ltd | John | $0 |20 Jan 15 | 7 Oct 15
3 | Mr. Smith | ABC Ltd | Jenny | $0 | 1 Jan 15 | 20 Jan 15
4 | Mrs Bow | JQ Public | JQ Public | $7,000 | 1 Jan 15 | 4 Aug 15
After the help of greggyb I also have another column, Cases[LastClosed], which will be true if the current row is closed, and is the last closed of the parent group.
There is also a second table of payments, related to Cases[ID]. These payments could be received in parent or child matters. I sum payments received as follows:
Recovery All Time:=CALCULATE([Recovery This Period], ALL(Date_Table[dateDate]))
I am looking for a new measure which will calculate the total recovered for a unique ParentMatterName, if the last closed matter in this group was closed in the Financial Year we are looking at - 30 June end date.
I am now looking at the SUMMARIZE() function to do the first part of this, but I don't know how to filter it. The layers of calculate are confusing. I've looked at This MSDN blog but it appears that this will filter to only show the total payments for that matter that was last closed (not adding the related children).
My current formula is:
Recovery on Closed This FY :=
CALCULATE (
SUMX (
SUMMARIZE (
MatterListView,
MatterListView[UniqueParentName],
"RecoveryAllTime", [Recovery All Time]
),
[RecoveryAllTime]
)
)
All help appreciated.
Again, your solution is much more easily solved with a model addition. Remember, storage is cheap, your end users are impatient.
Just store in your Cases table a column with the LastClosedDate of every parent matter, which indicates the date associated with the last closed child matter. Then it's a simple filter to return only those payments/matters that have LastClosedDate in the current fiscal year. Alternately, if you know for certain that you are only concerned with the year, you could store just LastClosedFiscalYear, to make your filter predicate a bit simpler.
If you need help with specific measures or how you might implement the additional field, let us know (I'd recommend adding these fields at the source, or deriving them in the source query rather than using calculated columns).

Showing Null Value Based on Three Tables

I’m working on showing employees that have not entered in any hours for a previous week. I’m currently working with three tables. One table is a calendar that has the first date of each week. The week format is Sunday to Saturday. The second table is the list of hours entered. The Time table contains the date the time was entered and the employees name. The third table is the list of all the employees. I can’t seem to get the joins to work how I would like them to. The end result I would like to see that Bob entered time in week 7 and 8, but week 9 is null. Thank you for your help. Its greatly appreciated.
Current Code
SELECT
d.Resource
,SUM(p.Hours) AS Hours
,m.[WeeksSundayToSaturday]
,DatePart(wk, m.[WeeksSundayToSaturday]) AS WeekNumber
FROM CalendarWeeks m
LEFT JOIN [TimeTracking] p ON
(m.[WeeksSundayToSaturday] BETWEEN p.Date AND p.Date + 7)
RIGHT JOIN [DepartmentMembers] d ON
d.Resource = p.CreatedBy
GROUP BY
d.Resource
,m.WeeksSundayToSaturday
Data Tables
Department Members
Name Department
Bob Engineer
Sue HR
John Operations
Time Tracking
Resource Hours Date
Bob 13 2/9/2014
Sue 12 2/10/2014
John 2 2/11/2014
Bob 6 2/12/2014
Bob 8 2/13/2014
John 8 2/14/2014
John 8 2/15/2014
Bob 8 2/16/2014
Bob 1 2/17/2014
Bob 2 2/18/2014
Bob 1 2/19/2014
Bob 8 2/20/2014
Bob 9 2/21/2014
Bob 6 2/22/2014
Sue 8 2/23/2014
John 2 2/24/2014
Calendar
WeeksSundayToSaturday
1/5/2014
1/12/2014
1/19/2014
1/26/2014
2/2/2014
2/9/2014
2/16/2014
2/23/2014
3/2/2014
3/9/2014
3/16/2014
3/23/2014
3/30/2014
Desired Result
Bob
Week 7 = 27
Week 8 = 35
Week 9 = NULL
Your above query is giving compilation error, please try below query i think it will help you
SELECT
d.Resource
,SUM(p.Hours) AS Hours
,m.[WeeksSundayToSaturday]
,DatePart(wk, m.[WeeksSundayToSaturday]) AS WeekNumber
FROM CalendarWeeks m
LEFT JOIN [TimeTracking] p ON (p.Date BETWEEN
m.[WeeksSundayToSaturday] AND Dateadd(d,6, m.[WeeksSundayToSaturday])
RIGHT JOIN [DepartmentMembers] d ON d.Resource = p.CreatedBy
GROUP BY d.Resource ,m.WeeksSundayToSaturday

How To Traverse a Tree/Work With Hierarchical data in SQL Code

Say I have an employee table, with a record for each employee in my company, and a column for supervisor (as seen below). I would like to prepare a report, which lists the names and title for each step in a supervision line. eg for dick robbins, 1d #15, i'd like a list of each supervisor in his "chain of command," all the way to the president, big cheese. I'd like to avoid using cursors, but if that's the only way to do this then that's ok.
id fname lname title supervisorid
1 big cheese president 1
2 jim william vice president 1
3 sally carr vice president 1
4 ryan allan senior manager 2
5 mike miller manager 4
6 bill bryan manager 4
7 cathy maddy foreman 5
8 sean johnson senior mechanic 7
9 andrew koll senior mechanic 7
10 sarah ryans mechanic 8
11 dana bond mechanic 9
12 chris mcall technician 10
13 hannah ryans technician 10
14 matthew miller technician 11
15 dick robbins technician 11
The real data probably won't be more than 10 levels deep...but I'd rather not just do 10 outside joins...I was hoping there was something better than that, and less involved than cursors.
Thanks for any help.
This is basically a port of the accepted answer on my question that I linked to in the OP comments.
you can use common-table expressions
WITH Family As
(
SELECT e.id, e.supervisorid, 0 as Depth
FROM Employee e
WHERE id = #SupervisorID
UNION All
SELECT e2.ID, e2.supervisorid, Depth + 1
FROM Employee e2
JOIN Family
On Family.id = e2.supervisorid
)
SELECT*
FROM Family
For more:
Recursive Queries Using Common Table Expressions
You might be interested in the "Materialized Path" solution, which does slightly de-normalize the table but can be used on any type of SQL database and prevents you from having to do recursive queries. In fact, it can even be used on no-SQL databases.
You just need to add a column which holds the entire ancestry of the object. For example, the table below includes a column named tree_path:
+----+-----------+----------+----------+
| id | value | parent | tree_path|
+----+-----------+----------+----------+
| 1 | Some Text | 0 | |
| 2 | Some Text | 0 | |
| 3 | Some Text | 2 | -2-|
| 4 | Some Text | 2 | -2-|
| 5 | Some Text | 3 | -2-3-|
| 6 | Some Text | 3 | -2-3-|
| 7 | Some Text | 1 | -1-|
+----+-----------+----------+----------+
Selecting all the descendants of the record with id=2 looks like this:
SELECT * FROM comment_table WHERE tree_path LIKE '-2-%' ORDER BY tree_path ASC
To build a tree, you can sort by tree_path to get an array that's fairly easy to convert to a tree.
You can also index tree_path and the index can be used when the wildcard is not at the beginning.
For example, tree_path LIKE '-2-%' can use the index, but tree_path LIKE '%-2-' cannot.
Some recursive function which either return the supervisor (if any) or null. Could be a SP which invokes itself as well, and using UNION.
SQL is a language for performing set operations and recursion is not one of them. Further, many database systems have limitations on recursion using stored procedures as a safety measure to prevent rogue code from running away with precious server resources.
So, when working with SQL always think 'flat', not 'hierarchical'. So I would highly recommend the 'tree_path' method that has been suggested. I have used the same approach and it works wonderfully and crucially, very robustly.