How to calculate Working Hours over a 17 week period - SQL

I am trying to work out how many field engineers average over 48 hours a week over a 17 week period. (By law you cannot average more than 48 hours a week over a 17 week period.)
I managed to run the query for 1 engineer, but when I run it without the engineer filter the query is very slow.
I need to get the count of engineers working over 48 hours, the count under 48 hours, and then the average time worked per week.
Note: I am doing a Union on SPICEMEISTER & SMARTMEISTER because they are our old and new databases.
• How many field engineers go over the 48 hours
• How many field engineers are under the 48 hours
• What is the average time worked per week for engineers
SELECT DS_Date
      ,TechPersNo
FROM
(
    SELECT DISTINCT
           SMDS.EPL_DAT as DS_Date
          ,EN.pers_no as TechPersNo
    FROM [SpiceMeister].[FS_OTBE].[EngPayrollNumbers] EN
    INNER JOIN [SmartMeister].[Main].[PlusDailyKopf] SMDS
        ON RIGHT(CAST(EN.[TechnicianID] AS CHAR(10)),5) = SMDS.PRPO_TECHNIKERNR
    WHERE SMDS.EPL_DAT >= '2017-01-01'
      AND SMDS.EPL_DAT < '2018-03-01'
    UNION ALL
    SELECT DISTINCT
           SPDS.DailySummaryDate as DS_Date
          ,EN.pers_no as TechPersNo
    FROM [SpiceMeister].[FS_OTBE].[EngPayrollNumbers] EN
    INNER JOIN [SpiceMeister].[FS_DS_BO].[DailySummaryHeader] SPDS
        ON EN.TechnicianID = SPDS.TechnicianID
    WHERE SPDS.DailySummaryDate >= '2018-03-01'
) as Techa
where TechPersNo = 850009
) Tech
cross APPLY
Fast results

The slowness is definitely due to the use of cross apply with a correlated subquery. This forces the computation on a per-row basis and prevents SQL Server from optimizing anything.
This seems more like it should be a 'group by' query, but I can see why you had trouble writing it: the calculation is cumulative, you need output by person and by date, and yet the average involves not the date in question but a date range ending on the date in question.
What I would do first is make a common query to capture your base data across the two databases. That's what I do in the 'dailySummaries' common table expression below. Then I would join dailySummaries onto itself, matching by employee and selecting the date range required. From that, I would group by employee and date, aggregating over the date range.
with
dailySummaries as (
select techPersNo = en.pers_no,
ds_date = smds.epl_dat,
dtDif = datediff(minute, smds.abfahrt_zeit, smds.rueck_zeit)
from spiceMeister.fs_otbe.engPayrollNumbers en
join smartMeister.main.plusDailyKopf smds
on right(cast(en.technicianid as char(10)),5) = smds.prpo_technikernr
where smds.epl_dat < '2018-03-01'
union all
select techPersNo = en.pers_no,
dailySummaryDate,
datediff(minute,
iif(spds.leaveHome < spds.workStart, spds.leaveHome, spds.workStart),
iif(spds.arrivehome > spds.workEnd, spds.arrivehome, spds.workEnd)
)
from spiceMeister.fs_otbe.engPayrollNumbers en
join spiceMeister.fs_ds_bo.dailySummaryHeader spds
on en.TechnicianID = spds.TechnicianID
where spds.DailySummaryDate >= '2018-03-01'
)
select ds.ds_date,
ds.techPersNo,
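-- dtDif is in minutes, so dividing the 17-week (119-day) total by 60*17 gives average hours per week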
AvgHr = convert(real, sum(dsPrev.dtDif)) / (60*17)
from dailySummaries ds
left join dailySummaries dsPrev
on ds.techPersNo = dsPrev.techPersNo
and dsPrev.ds_date between dateadd(day,-118, ds.ds_date) and ds.ds_date
where ds.ds_date >= '2017-01-01'
group by ds_date,
techPersNo
order by ds_date
I may have gotten a thing or two wrong in translation but you get the idea.
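If you then want the over/under-48-hour counts the question actually asks for, one way is to push the select above into a second CTE and classify each engineer by the highest rolling weekly average they ever hit. This is only a sketch layered on top of the query above, assuming dailySummaries stays exactly as defined there:
with dailySummaries as (
    /* ...exactly as defined above... */
),
rollingAvg as (
    select ds.ds_date,
           ds.techPersNo,
           AvgHr = convert(real, sum(dsPrev.dtDif)) / (60*17)
    from dailySummaries ds
    left join dailySummaries dsPrev
      on ds.techPersNo = dsPrev.techPersNo
     and dsPrev.ds_date between dateadd(day, -118, ds.ds_date) and ds.ds_date
    where ds.ds_date >= '2017-01-01'
    group by ds.ds_date, ds.techPersNo
)
select EngineersOver48  = sum(case when maxAvgHr > 48 then 1 else 0 end),
       EngineersUnder48 = sum(case when maxAvgHr <= 48 then 1 else 0 end)
from (
    -- one row per engineer: the worst (highest) rolling weekly average they reached
    select techPersNo, maxAvgHr = max(AvgHr)
    from rollingAvg
    group by techPersNo
) perEngineer;
The average time worked per week is the AvgHr column itself; whether you then average it per engineer or across every window is a matter of which definition you need.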
In the future, post a more minimal example. The union of the two datasets from the separate databases is not central to the problem you were trying to ask about, a lot of the date filtering isn't core to the question, and the special casting in your join logic is not important. These things cloud the real issue.

Related

Query for Age calculation and Date comparison inconsistent

I'm doing some counts to validate a table XXX. I designed 2 queries to count people younger than 18 years.
The query I'm using is the following:
select count(distinct user.id) from user
left join sometable on sometable.id = user.someTableId
left join anotherTable on sometable.anotherTableId = anotherTable.id
where (sometable.id = 'x' or user.public = true)
AND (DATE_PART('year', age(current_date, user.birthdate)) >= 0 and DATE_PART('year', age(current_date, user.birthdate)) <= 18);
This query gives a count of 5000 (not the real figure),
but this query, which is supposed to do the same:
select count(distinct user.id) from user
left join sometable on sometable.id = user.someTableId
left join anotherTable on sometable.anotherTableId = anotherTable.id
where (sometable.id = 'x' or user.public = true)
and (user.birthdate between '2002-08-26' and current_date)
SIDE NOTE: date '2002-08-26' is because today is 2020-08-26, so I subtracted 18 years from today's date.
gives me a different count from the first one. (This last one gives the correct count, since it matches what I have in another NoSQL database.)
I would like to know what's the difference in the queries or why the counts are different.
Thanks in advance.
In your first query, you are including everyone who has not yet turned 19.
In your second query, you are excluding a bunch of 18-year-olds who were born prior to 2002-08-26. For example, someone born on 2002-04-12 is still 18 years old; she won't turn 19 until 2021-04-12.
The easiest way to write this in Postgres is the following, which gives the same results as your first query:
where extract(year from age(now(), birthdate)) <= 18
If you really want to use the format of your 2nd query, then change your line to:
where (birth_date between '2001-08-27' and current_date)
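If you'd rather not hard-code the cutoff date at all, you can derive it from current_date. A small sketch against the same user table, with only the age predicate changed:
select count(distinct u.id)
from "user" u   -- "user" is a reserved word in Postgres, so quote it or use your real table name
left join sometable on sometable.id = u.someTableId
where (sometable.id = 'x' or u.public = true)
  -- born strictly after the date 19 years ago, i.e. has not yet turned 19
  and u.birthdate > current_date - interval '19 years'
  and u.birthdate <= current_date;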

Can someone help me with this join

I need it to give me a total of 0 for weeks 33 - 39, but I'm really bad at joining 3 tables and I can't figure it out.
Right now it only gives me an answer for dates where there are actual records in the tracker_weld_table.
SELECT SUM(tracker_parts_archive.weight),
WEEK(mycal.dt) as week
FROM
tracker_parts_archive, tracker_weld_archive
RIGHT JOIN
(SELECT dt FROM calendar_table WHERE dt >= '2018-7-1' AND dt <= '2018-10-1') as mycal
ON
weld_worker = '133' AND date(weld_dateandtime) = mycal.dt
WHERE
tracker_weld_archive.tracker_partsID = tracker_parts_archive.id
GROUP BY week
I think you are trying for something like this:
SELECT WEEK(c.dt) as week, COALESCE(SUM(tpa.weight), 0)
FROM calendar_table c left join
tracker_weld_archive tw
on date(tw.weld_dateandtime) = c.dt left join
tracker_parts_archive tp
on tw.tracker_partsID = tp.id and tp.weld_worker = 133
WHERE c.dt >= '2018-07-01' AND c.dt <= '2018-10-01'
GROUP BY week
ORDER BY week;
Notes:
You want to keep all (matching) rows in the calendar table, so it should be first.
All subsequent joins should be LEFT JOINs.
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
Write out the full proper date constant -- YYYY-MM-DD. This is an ISO-standard format.
I am guessing that weld_worker is a number, so single quotes are not needed for the comparison.
First, let's start with understanding what you want: totals per week. This means there will be a "GROUP BY" clause (as there is for any MIN(), MAX(), AVG(), SUM(), COUNT(), etc. aggregates). What is the GROUP BY basis? In this scenario, you want it per week. That leads to the next part: you want it for a specific date range, qualified per your calendar table.
I would start with the filtering criteria first. Also, ALWAYS TRY to qualify every column with its table (or alias) in your queries so anyone after you knows where the columns are coming from, especially when multiple tables are involved. In this case "ct" is the alias for "calendar_table":
SELECT
ct.dt
from
calendar_table ct
where
ct.dt >= '2018-07-01'
AND ct.dt <= '2018-10-01'
Now, the above date range is INCLUSIVE of October 1, and it looks like you are trying to generate a quarterly sum for July, August, and September. I would change it to LESS THAN Oct 1.
Now, your calendar has many days and you want it grouped by week, so the WEEK() function gets you that distinct reference without explicitly checking every date. Also, try NOT to use reserved keywords as final column names... makes for confusion later on sometimes.
I have aliased the column name as "WeekBasis". Here, I did a COUNT(*) just to show the total days and the group by showing it in context.
SELECT
WEEK( ct.dt ) WeekBasis,
MIN( ct.dt ) as FirstDayOfThisWeek,
MAX( ct.dt ) as LastDayOfThisWeek,
COUNT(*) as DaysInThisWeek
from
calendar_table ct
where
ct.dt >= '2018-07-01'
AND ct.dt <= '2018-10-01'
group by
WEEK( ct.dt )
So, at this point, we have 1 record per week within the date period you are concerned with, but I also grabbed the earliest and latest dates just to show other components too.
Now, let's get back to your extra tables. We know the dates in question; now we need to get the details from the other tables, which is lacking in the post. You should post critical components such as how the tables are related via their common / joined columns. How is tracker_parts_archive related to tracker_weld_archive?
To simplify your query, you don't even NEED your calendar table, as the welding table HAS a date field and you know your range; just query against that directly. IF your worker's ID is numeric, don't add quotes around it, just leave it as a number.
SELECT
WEEK( twa.Weld_DateAndTime ) WeekBasis,
COUNT(*) WeldingEntriesDone,
SUM(tpa.weight) TotalWeight
from
tracker_weld_archive twa
JOIN tracker_parts_archive tpa
-- GUESSING on the relationship here.
-- may also be on a given date too???
-- all pieces welded by a person on a given date
ON twa.weld_worker = tpa.weld_worker
AND twa.Weld_DateAndTime = tpa.Weld_DateAndTime
where
twa.Weld_Worker = 133
AND twa.Weld_DateAndTime >= '2018-07-01'
AND twa.Weld_DateAndTime < '2018-10-01'
group by
WEEK( twa.Weld_DateAndTime )
IF you provide the table structures AND sample data, this can be refined a bit more for you.
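That said, if you do need the explicit 0 totals for the empty weeks 33 - 39, the calendar table has to come back as the driving table, as in the other answer. A rough sketch combining it with the query above (it assumes weld_worker lives on the weld table and that tracker_weld_archive.tracker_partsID = tracker_parts_archive.id, as in your WHERE clause):
SELECT
    WEEK( ct.dt ) WeekBasis,
    COALESCE( SUM( tpa.weight ), 0 ) TotalWeight
from
    calendar_table ct
        -- LEFT JOINs so calendar weeks with no welding records still show up with 0
        LEFT JOIN tracker_weld_archive twa
            ON date( twa.Weld_DateAndTime ) = ct.dt
            AND twa.Weld_Worker = 133
        LEFT JOIN tracker_parts_archive tpa
            ON twa.tracker_partsID = tpa.id
where
    ct.dt >= '2018-07-01'
    AND ct.dt < '2018-10-01'
group by
    WEEK( ct.dt )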

calculated field from two fields in separate tables based on a date

Using PostgreSQL v9.6
Trying to average the actual intrusion time by dividing the total intrusion time for the day by the available uptime for that day, grouped by computer.
Query should look something like:
SELECT availability.adate, sum(intrusionEvents.duration)/availability.uptime, intrusionEvents.host
FROM availability LEFT JOIN intrusionEvents ON (availability.adate = intrusionEvents.itime::date)
WHERE availability.adate >= '20171001' AND availability.adate < '20171101'
GROUP BY availability.adate, intrusionEvents.host
ORDER BY availability.adate, intrusionEvents.host
(where uptime and intrusionDuration are in seconds...)
In variations of this I'm running into a couple of issues:
In this example, availability.uptime needs to be in the aggregate.
I've gotten around this but then I end up with hundreds of rows.
Any suggestions (and examples) on how to sum up individual durations from a field and then divide them by a value from a different table with a matching date would be much appreciated!
So I added availability.uptime in the group by:
select availability.adate, sum(intrusionEvents.duration)/availability.uptime, intrusionEvents.host
FROM availability left join intrusionEvents on (availability.adate = intrusionEvents.itime::date)
WHERE availability.adate >= '20171001' and availability.adate < '20171101'
GROUP by availability.adate, intrusionEvents.host,
availability.uptime
I got a results set back.
Is this what you wanted?
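If grouping by availability.uptime leaves you with more rows than you want, another option is to pre-aggregate the intrusion durations per day and host first and only then join them to availability, so uptime never has to appear in the GROUP BY. A sketch using the column names from the question:
select a.adate,
       i.host,
       -- cast to numeric so this isn't integer division; nullif avoids dividing by zero
       i.total_duration::numeric / nullif(a.uptime, 0) as intrusion_ratio
from availability a
left join (
    select itime::date as adate, host, sum(duration) as total_duration
    from intrusionEvents
    group by itime::date, host
) i on i.adate = a.adate
where a.adate >= '20171001' and a.adate < '20171101'
order by a.adate, i.host;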

MDX multiple filters

20 Yr SQL pro, new to MDX.
Trying to create a measure to get sales for products 30, 60, 90 days, etc. after launch, but I want to exclude incomplete time periods. Here is what the SQL would be:
select ProductName, sum(sales) '60DaySales'
from dimProduct p join factSales s on p.productkey = s.productkey
join dimCalendar c on s.orderDateKey = c.CalendarKey
where datediff(dd,p.LaunchDate,c.Date) between 31 and 62
and exists (select 1 from sales etc... where date >= 62 days)
Basically I only want to show '60DaySales' for products that also have sales beyond 62 days.
I have this MDX which gets me the time period:
sum(
filter(
[Sales].[Days Since Launch].members
,sales.[Days Since Launch].membervalue > 30 AND
sales.[Days Since Launch].membervalue < 63
)
,[Measures].[SalesBase]
)
but I'm not sure how to exclude items with no sales beyond 62 days. I've tried some combinations of iif(exists.. ) and nonempty but no luck...
I'd add extra columns rather than a calculated measure. I had a similar task (sales for 30, 60, 90 days, but from the customer's first sale date). The best way is to add columns to your sales measure table:
sales30 = iif(dateadd(day,30,p.LaunchDate) >= c.Date, sales, null),
sales60 = iif(dateadd(day,60,p.LaunchDate) >= c.Date, sales, null),
sales90 = iif(dateadd(day,90,p.LaunchDate) >= c.Date, sales, null)
Tasks like 30-day sales per product are doable via MDX, but they are performance killers for big dimensions. SQL Server handles heavy calculations like these better due to its concurrent nature, while MDX isn't good at them, so I am not even providing the MDX code.
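Spelled out against the question's dimProduct / factSales / dimCalendar names, the extra columns could be computed per fact row roughly like this (a sketch only; the exact column names are guesses, and in practice this would live in the ETL or in a view feeding the measure group):
select s.*,
       iif(c.[Date] <= dateadd(day, 30, p.LaunchDate), s.sales, null) as sales30,
       iif(c.[Date] <= dateadd(day, 60, p.LaunchDate), s.sales, null) as sales60,
       iif(c.[Date] <= dateadd(day, 90, p.LaunchDate), s.sales, null) as sales90
from factSales s
join dimProduct p on p.productkey = s.productkey
join dimCalendar c on c.CalendarKey = s.orderDateKey;
Each salesNN column then becomes a plain SUM measure in the cube, so the 30/60/90-day logic is paid for once at load time instead of per query.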

Multiple aggregate sums from different conditions in one sql query

While I believe this is a fairly general SQL question, I am working in PostgreSQL 9.4 without the option to use other database software, and thus request that any answer be compatible with its capabilities.
I need to be able to return multiple aggregate totals from one query, such that each sum is on a new row and each grouping is determined by a unique span of time, e.g. WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'. The number of records that satisfy the WHERE clause is unknown and may be zero, in which case ideally the result is "0". This is what I have worked out so far:
(
SELECT SUM(minutes) AS min
FROM downtime
WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-14' AND '2016-02-21'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-28' AND '2016-03-06'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-06' AND '2016-03-13'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-13' AND '2016-03-20'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-20' AND '2016-03-27'
)
Result:
   | min
---+-----
 1 | 119
 2 | 4
 3 | 30
 4 |
 5 | 62
 6 | 350
That query gets me almost the exact result that I want; certainly good enough in that I can do exactly what I need with the results. Time spans with no records are blank but that was predictable, and whereas I would prefer "0" I can account for the blank rows in software.
But, while it isn't terrible for the 6 weeks that it represents, I want to be flexible and able to do the same thing for different time spans and for a different number of data points, such as each day in a week, each week in 3 or 6 months, each month in 1 or 2 years, etc. As written above, it feels as if it is going to get tedious fast; for instance, 1-week spans over a 2-year period would be 104 subqueries.
What I'm after is a more elegant way to get the same (or similar) result.
I also don't know if doing 104 iterations of a similar query to the above (vs. the 6 it does now) is particularly efficient.
Ultimately I am going to write some code which will help me build (and thus abstract away) the long, ugly query--but it would still be great to have a more concise and scale-able query.
In Postgres, you can generate a series of times and then use these for the aggregation:
select g.dte, coalesce(sum(dt.minutes), 0) as minutes
from generate_series('2016-02-07'::timestamp, '2016-03-20'::timestamp, interval '7 day') g(dte) left join
downtime dt
on dt.time_stamp >= g.dte and dt.time_stamp < g.dte + interval '7 day'
group by g.dte
order by g.dte;
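The same shape covers the other grains you mention by changing the series bounds, the step, and the bucket width. For example, a monthly rollup over two years would look something like this (a sketch against the same downtime table):
select g.dte, coalesce(sum(dt.minutes), 0) as minutes
from generate_series('2015-01-01'::timestamp, '2016-12-01'::timestamp, interval '1 month') g(dte) left join
     downtime dt
     on dt.time_stamp >= g.dte and dt.time_stamp < g.dte + interval '1 month'
group by g.dte
order by g.dte;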