Workload distribution in Oracle SQL

I am trying to implement a workload distribution in SQL, but it seems hard.
My data are:
work-station | workload
------------------------
Station1 | 500
Station2 | 450
Station3 | 50
Station4 | 600
Station5 | 2
Station6 | 350
And:
Real worker number: 5
My needs are the following:
I require an exact match between the real worker count and the theoretical worker count (every worker must be placed somewhere).
I don't want to put someone at a station if it isn't required (example: Station5).
I don't need to know whether my workers will be able to finish the complete workload.
I want the best theoretical placement of my workers, for the best productivity.
Is it possible to do this workload distribution in a single SQL query?
Possible result:
work-station | workload | theoretical worker distribution
------------------------
Station1 | 500 | 1
Station2 | 450 | 1
Station3 | 50 | 0
Station4 | 600 | 2
Station5 | 2 | 0
Station6 | 350 | 1

Here is a very simplistic way to do it by prorating the workers by the percentage of total work assigned to each station.
The complexity comes from making sure that an integer number of workers is assigned and that the total number of assigned workers equals the number of workers that are available. Here is the query that does that:
with params as (
    select 5 total_workers from dual
),
info (station, workload) as (
    select 'Station1', 500 from dual union all
    select 'Station2', 450 from dual union all
    select 'Station3',  50 from dual union all
    select 'Station4', 600 from dual union all
    select 'Station5',   2 from dual union all
    select 'Station6', 350 from dual
),
targets as (
    select station,
           workload,
           -- What % of the total work is assigned to this station?
           workload / sum(workload) over (partition by null) pct_work,
           -- How many workers (target_workers) would we assign if we could assign fractional workers?
           total_workers * (workload / sum(workload) over (partition by null)) target_workers,
           -- The integer part of target_workers
           floor(total_workers * (workload / sum(workload) over (partition by null))) target_workers_floor,
           -- The fractional part of target_workers
           mod(total_workers * (workload / sum(workload) over (partition by null)), 1) target_workers_frac
    from params, info
)
select t.station,
       t.workload,
       -- Start with the integer part of target_workers...
       target_workers_floor +
       -- ...then order the stations by the fractional part of target_workers and assign one
       -- additional worker to each until the total assigned equals the workers available.
       case when row_number() over (partition by null order by target_workers_frac desc)
                 <= total_workers - sum(target_workers_floor) over (partition by null)
            then 1 else 0 end target_workers
from params, targets t
order by station;
+----------+----------+----------------+
| STATION  | WORKLOAD | TARGET_WORKERS |
+----------+----------+----------------+
| Station1 |      500 |              1 |
| Station2 |      450 |              1 |
| Station3 |       50 |              0 |
| Station4 |      600 |              2 |
| Station5 |        2 |              0 |
| Station6 |      350 |              1 |
+----------+----------+----------------+
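To make the largest-remainder step concrete with these numbers: the total workload is 1952, so the fractional targets are 5 * 500/1952 ≈ 1.28, 5 * 450/1952 ≈ 1.15, 5 * 600/1952 ≈ 1.54 and 5 * 350/1952 ≈ 0.90. The integer parts sum to 3, and the 2 leftover workers go to the two largest fractional parts, Station6 (0.90) and Station4 (0.54), which is exactly the distribution shown above.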

The query below should also work:
First I assign workers to the stations that have more workload than the mean workload per worker.
Then I distribute the rest of the workers to the stations in descending order of their remaining workloads.
http://sqlfiddle.com/#!4/55491/12
The constant 5 represents the number of workers.
SELECT workload,
       station,
       SUM(worker_count)
FROM (
    -- Assign floor(workload / mean workload per worker) workers to each station
    SELECT workload,
           station,
           floor(workload / (SELECT SUM(workload) / 5 FROM work_station)) worker_count
    FROM work_station works
    UNION ALL
    -- Hand the remaining workers, one each, to the stations with the largest leftover workload
    SELECT t_table.*, 1
    FROM (SELECT workload, station
          FROM work_station
          ORDER BY (workload
                    - floor(workload / (SELECT SUM(workload) / 5 FROM work_station))
                      * (SELECT SUM(workload) / 5 FROM work_station)) DESC
         ) t_table
    WHERE rownum < (5 - (SELECT SUM(floor(workload / (SELECT SUM(workload) / 5 FROM work_station)))
                         FROM work_station) + 1)   -- count of the remaining workers
) table_sum
GROUP BY workload,
         station
ORDER BY station
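With the sample data, the mean workload per worker is 1952 / 5 = 390.4, so the first branch assigns floor(500/390.4) = 1, floor(450/390.4) = 1 and floor(600/390.4) = 1 workers (3 in total), and the second branch gives the remaining 2 workers to the stations with the most leftover workload: Station6 (350) and Station4 (600 - 390.4 = 209.6).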

Stop SQL Select After Sum Reached

My database is Db2 for IBM i.
I have read-only access, so my query must use only basic SQL select commands.
Goal:
I want to select every record in the table until the sum of the amount column exceeds the predetermined limit.
Example:
I want to match every item down the table until the sum of matched values in the "price" column >= $9.00.
The desired result:
Is this possible?
You may use the SUM analytic function to calculate a running total of price and then filter by its value:
with a as (
select
t.*,
sum(price) over(order by salesid asc) as price_rsum
from t
)
select *
from a
where price_rsum <= 9
SALESID | PRICE | PRICE_RSUM
--------+-------+-----------
   1001 |     5 |          5
   1002 |     3 |          8
   1003 |     1 |          9
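Note that this keeps only rows whose running total stays within the limit. If the limit can be crossed mid-row and you also want the row that pushes the total past it, filter on the running total before each row instead — a minimal variant of the same query, assuming the same table t:
with a as (
  select
    t.*,
    sum(price) over(order by salesid asc) as price_rsum
  from t
)
select *
from a
where price_rsum - price < 9  -- the running total before this row is still under the limit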

SQL Query to change the column format to row for each value

I have a query that fetches the data, somewhat like this:
#1
select
    assignment_number,
    ldg_name,
    effective_start_date,
    tax_reporting_name,
    payroll_relationship_number,
    filing_status,
    allowance,
    additional_tax_extra_withold,
    exemp_fit,
    exemp_medicare,
    exemp_wage_accumulation,
    exemp_unemployment
from
    (SELECT PPRD.payroll_relationship_number,
            PAAM.assignment_number,
            PLDG.name LDG_NAME,
            To_char(PAAM.effective_start_date, 'YYYY-MM-DD') effective_start_date,
            (SELECT DISTINCT HOU.name
               FROM pay_dir_rep_cards_f PDRCF,
                    hr_organization_units HOU,
                    pay_dir_rep_card_usages_f PDRCUF,
                    pay_rel_groups_dn PRGD1
              WHERE PDRCF.dir_card_id = PDCF.dir_card_id
                AND HOU.organization_id = PDRCF.tax_unit_id
                AND PDRCUF.dir_rep_card_id = PDRCF.dir_rep_card_id
                AND PRGD1.relationship_group_id = PDRCUF.relationship_group_id
                AND PDRCF.dir_card_comp_id = PDCCF2.dir_card_comp_id
                AND trunc(sysdate) - 11 BETWEEN PDRCF.effective_start_date AND PDRCF.effective_end_date
                AND trunc(sysdate) - 11 BETWEEN PRGD1.start_date AND PRGD1.end_date) TAX_REPORTING_NAME,
            (SELECT DISTINCT Decode(dir_information_char1,
                                    '1', 'Single',
                                    '2', 'Married',
                                    '3', 'Married and withholding at higher single rate')
               FROM pay_dir_card_components_f PDCCF3,
                    pay_dir_comp_details_f PDCDF3
              WHERE PDCCF3.dir_card_id = PDCF.dir_card_id
                AND PDCCF3.dir_card_comp_id = PDCDF3.dir_card_comp_id
                AND PDCDF3.dir_information_category = 'HRX_US_WTH_FEDERAL_INCOME_TAX'
                AND trunc(sysdate) - 11 BETWEEN PDCDF3.effective_start_date AND PDCDF3.effective_end_date) FILING_STATUS,
            exemp_fit,
            PDFC.exemp_medicare,
            PDFC.exemp_wage_accumulation,
            PDFC.exemp_unemployment,
            PDFC.exemp_social_security,
            PDFC.regular_rate,
            PDFC.regular_amount,
            PDFC.supplemental_rate,
            PDFC.supplemental_amt,
            PDFC.irs_lock_in_date,
            PDFC.statutory_employee,
            PDFC.cumulative_taxation,
            PDFC.primary_address,
            PDFC.state_disability_calc,
            PDFC.state_unemp_calc,
            PDFC.qualifying_dependent,
            PDFC.other_dependent,
            PDFC.total_dependent,
            PDFC.other_income,
            PDFC.deduction_amount,
            PDFC.max_federal_allowance,
            PDFC.allowance,
            PDFC.additional_tax_extra_withold
       FROM per_all_assignments_m PAAM,
            per_all_payroll_relationship PPRD,
            per_legislative_table PLDG,
            per_cards PDCF,
            per_components PDFC
      WHERE PAAM.assignment_number = PPRD.assignment_number
        AND PPRD.legal_unit = PLDG.legal_unit
        AND PDCF.dir_card_id = PDFC.dir_card_id
        AND PDFC.assignment_id = PAAM.assignment_id
        AND trunc(sysdate) - 11 BETWEEN PAAM.effective_start_date AND PAAM.effective_end_date)
The above query gives me the correct output; for one assignment_number the row looks like this (shown as column: value for readability):
assignment_number: 10
ldg_name: US
effective_start_date: 02-Aug-2020
tax_reporting_name: Ontario
payroll_relationship_number: 10-1
filing_status: Single
allowance: 1000
additional_tax_extra_withold: 10
exemp_fit: Y
exemp_medicare: N
exemp_wage_accumulation: (null)
exemp_unemployment: (null)
I need these columns in this format (columns turned into rows):
ASSIGNMENT_NUMBER ValueDefinitionName
10 filing_status
10 allowance
10 additional_tax_extra_withold
10 exemp_fit
10 exemp_medicare
i.e. only the column names that are not null for an assignment_number.
And also, with the value included, in this format:
ASSIGNMENT_NUMBER ValueDefinitionName Value1
10 filing_status Single
10 allowance 1000
10 additional_tax_extra_withold 10
10 exemp_fit Y
10 exemp_medicare N
Since exemp_wage_accumulation and exemp_unemployment are null, they should not be included.
Is there any way to achieve this using my first query #1?
You should be able to use unpivot for this...
Here is a way to do this. I replaced your large query with a CTE named "data" that holds sample rows with the appropriate data types. You would need to swap the sample query inside the "data" block for your large query, and make sure the data types of all the columns are consistent (in this case, all of the non-varchar columns are cast to varchar).
After that, you can use UNPIVOT as follows:
with data
as (select assignment_number
,ldg_name
,to_char(effective_start_date,'dd-mon-yyyy') as effective_start_date
,tax_reporting_name
,payroll_relationship_number
,filing_status
,to_char(allowance) as allowance
,to_char(additional_tax_extra_withold) as additional_tax_extra_withold
,exemp_fit
,exemp_medicare
,exemp_wage_accumulation
,exemp_unemployment
from t
)
select *
from data
unpivot (value1 for valuedefinitions in (
ldg_name
,effective_start_date
,tax_reporting_name
,payroll_relationship_number
,filing_status
,allowance
,additional_tax_extra_withold
,exemp_fit
,exemp_medicare
,exemp_wage_accumulation
,exemp_unemployment
)
)
+-------------------+------------------------------+-------------+
| ASSIGNMENT_NUMBER | VALUEDEFINITIONS | VALUE1 |
+-------------------+------------------------------+-------------+
| 10 | LDG_NAME | US |
| 10 | EFFECTIVE_START_DATE | 02-aug-2020 |
| 10 | TAX_REPORTING_NAME | Ontario |
| 10 | PAYROLL_RELATIONSHIP_NUMBER | 10-1 |
| 10 | FILING_STATUS | Single |
| 10 | ALLOWANCE | 1000 |
| 10 | ADDITIONAL_TAX_EXTRA_WITHOLD | 10 |
| 10 | EXEMP_FIT | Y |
| 10 | EXEMP_MEDICARE | N |
+-------------------+------------------------------+-------------+
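Note that UNPIVOT excludes null values by default, which is why EXEMP_WAGE_ACCUMULATION and EXEMP_UNEMPLOYMENT are missing from the output, exactly as you requested. If you ever need to keep them, Oracle also accepts unpivot include nulls (value1 for valuedefinitions in (...)).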
Here is a db fiddle link:
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=9d211abbede82e276464333018a70731
You can try the method below, but it will work only if all columns except assignment_number are of the same type, i.e. all varchar or all integer. You can wrap your existing main query in a CTE.
Note: this is not a complete working query, just a hint to get you started.
with table1 as (...your existing query...)
SELECT assignment_number, 'FILING_STATUS' AS valuedefinitionname, filing_status AS value1 FROM table1 WHERE filing_status IS NOT NULL
UNION ALL
SELECT assignment_number, 'ALLOWANCE', allowance FROM table1 WHERE allowance IS NOT NULL

How to dynamically perform a weighted random row selection in PostgreSQL?

I have the following table for an app in which a student is assigned the task of playing an educational game.
Student{id, last_played_datetime, total_play_duration, total_points_earned}
The app selects a student at random and assigns the task. The student earns a point just for playing the game. The app records the date and time the game was played and for how long. I want to randomly select a student and assign the task; at any one time, only one student can be assigned the task. To give every student an equal opportunity, I dynamically calculate a weight for each student using the date and time the student last played, the total play duration, and the total points earned. A student is then chosen at random, influenced by this weight.
How do I, in PostgreSQL, randomly select a row from a table depending on the dynamically calculated weight of the row?
The weight for each student is calculated as follows: (minutes(current_datetime - last_played_datetime) * 0.75 + total_play_duration * 0.5 + total_points_earned * 0.25) / 1.5
Sample data:
+----+----------------------+---------------------+---------------------+
| Id | last_played_datetime | total_play_duration | total_points_earned |
+----+----------------------+---------------------+---------------------+
|  1 | 01/02/2011           | 300 mins            | 7                   |
|  2 | 06/02/2011           | 400 mins            | 6                   |
|  3 | 01/03/2011           | 350 mins            | 8                   |
|  4 | 22/03/2011           | 550 mins            | 9                   |
|  5 | 01/03/2011           | 350 mins            | 8                   |
|  6 | 10/01/2011           | 130 mins            | 2                   |
|  7 | 03/01/2011           | 30 mins             | 1                   |
|  8 | 07/10/2011           | 0 mins              | 0                   |
+----+----------------------+---------------------+---------------------+
Here is a solution that works as follows:
first compute the weight of each student
sum the weights of all students and multiply the total by a random seed
then pick the first student whose cumulative weight reaches that random target
Query:
with
student_with_weight as (
    select
        id,
        (
            extract(epoch from (now() - last_played_datetime)) / 60 * 0.75
            + total_play_duration * 0.5
            + total_points_earned * 0.25
        ) / 1.5 as weight
    from student
),
cumulated as (
    select
        id,
        sum(weight) over (order by id) as cumulative_weight
    from student_with_weight
),
random_weight as (
    select random() * (select sum(weight) from student_with_weight) as target
)
select c.id
from cumulated c
cross join random_weight r
where c.cumulative_weight >= r.target
order by c.cumulative_weight
limit 1;
You can use a cumulative sum on the weights and compare it to random(). It looks like this:
with s as (
select s.*,
<your expression> as weight
from s
)
select s.*
from (select s.*,
sum(weight) over (order by weight) as running_weight,
sum(weight) over () as total_weight
from s
) s cross join
(values (random())) r(rand)
where r.rand * total_weight >= running_weight - weight and
r.rand * total_weight < running_weight;
The values() clause ensures that the random value is calculated only once for the query. Funky things can happen if you put random() in the where clause, because it will be recalculated for each comparison.
Basically, you can think of the cumulative sum as dividing the total weight into discrete regions; random() then just chooses one of them.
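For instance, here is a tiny self-contained PostgreSQL illustration — the ids 'A', 'B', 'C' and weights 1, 2, 3 are made up for the demo:
with s(id, weight) as (
    values ('A', 1.0), ('B', 2.0), ('C', 3.0)
)
select w.id
from (select s.*,
             sum(weight) over (order by weight, id) as running_weight,
             sum(weight) over () as total_weight
      from s
     ) w
cross join (values (random())) r(rand)
-- the regions are A -> [0, 1), B -> [1, 3), C -> [3, 6)
where r.rand * total_weight >= running_weight - weight
  and r.rand * total_weight <  running_weight;
Run it repeatedly and 'C' should come back about half the time, since its region covers 3 of the 6 units of total weight.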

SQL GROUP BY and differences on same field (for MS Access)

Hi, I have the following style of table in MS Access (I didn't make the table and can't change it):
Date_r | Id_Person |Points |Position
25/05/2015 | 120 | 2000 | 1
25/05/2015 | 230 | 1500 | 2
25/05/2015 | 100 | 500 | 3
21/12/2015 | 120 | 2200 | 1
21/12/2015 | 230 | 2000 | 4
21/12/2015 | 100 | 200 | 20
What I am trying to do is get a list of players (identified by Id_Person) ordered by the points difference between two dates.
So, for example, if I pick date1 = 25/05/2015 and date2 = 21/12/2015, I would get:
Id_Person |Points_Diff
230 | 500
120 | 200
100 |-300
I think I need something like:
SELECT Id_Person , MAX(Points)-MIN(Points)
FROM Table
WHERE date_r = #25/05/2015# or date_r = #21/12/2015#
GROUP BY Id_Person
ORDER BY MAX(Points)-MIN(Points) DESC
But my problem is that I don't really want to order by (MAX(Points) - MIN(Points)) but rather by (points at date2 - points at date1), which can be different because points can decrease over time.
One method is to use FIRST and LAST. However, these can sometimes produce strange results, so I think that conditional aggregation is best:
SELECT Id_Person,
       (MAX(IIF(date_r = #21/12/2015#, Points, Null)) -
        MAX(IIF(date_r = #25/05/2015#, Points, Null))
       ) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY (MAX(IIF(date_r = #21/12/2015#, Points, Null)) -
          MAX(IIF(date_r = #25/05/2015#, Points, Null))) DESC;
Because you have two dates, this is more easily written as:
SELECT Id_Person,
       SUM(IIF(date_r = #21/12/2015#, Points, -Points)) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY SUM(IIF(date_r = #21/12/2015#, Points, -Points)) DESC;
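As a quick sanity check against the sample data: for Id_Person 230 this gives 2000 - 1500 = 500, for 120 it gives 2200 - 2000 = 200, and for 100 it gives 200 - 500 = -300, matching the desired output.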

How to calculate bandwidth by SQL query

I have a table like this:
-----------+------------+-------
   first   |    last    | bytes
-----------+------------+-------
1441013602 | 1441013602 |    10
1441013602 | 1441013603 |    20
1441013603 | 1441013605 |    30
1441013610 | 1441013612 |    30
where:
'first' is the switch time of the first packet of a traffic flow,
'last' is the switch time of the last packet of a traffic flow,
'bytes' is the volume of the traffic flow.
How can I calculate bandwidth usage for each second from 1441013602 to 1441013612 ?
I want this:
1441013602 20 B/s
1441013603 20 B/s
1441013604 10 B/s
1441013605 10 B/s
1441013606 0 B/s
1441013607 0 B/s
1441013608 0 B/s
1441013609 0 B/s
1441013610 10 B/s
1441013611 10 B/s
1441013612 10 B/s
You can use PostgreSQL's generate_series function for this. Generate a series of rows, one for each second, since that's what you want. Then left join on the table of info, so that you get one row for each second for each data flow. GROUP BY seconds, and sum the data flow bytes.
e.g.:
SELECT seconds.second, coalesce(sum(t.bytes::float8 / (t.last::float8-t.first::float8+1)),0)
FROM generate_series(
(SELECT min(t1.first) FROM Table1 t1),
(SELECT max(t1.last) FROM Table1 t1)
) seconds(second)
LEFT JOIN table1 t
ON (seconds.second BETWEEN t.first and t.last)
GROUP BY seconds.second
ORDER BY seconds.second;
http://sqlfiddle.com/#!15/b3b07/7
Note that we calculate the bytes per second of the flow, then sum that over the seconds of the flow across all flows. This only gives an estimate, since we don't know if the flow rate was steady over the flow duration.
For formatting the bytes, use the format function and/or pg_size_pretty.
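For example, a minimal sketch (pg_size_pretty takes a bigint or numeric argument, so cast the float result first):
select pg_size_pretty(20::bigint) || '/s' as bandwidth;  -- '20 bytes/s'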
Here is an approach at SQL Fiddle:
PostgreSQL 9.3 Schema Setup:
create table t ( first int, last int, bytes int );
insert into t values
(1441013602 , 1441013602 , 10 ),
(1441013602 , 1441013603 , 20 ),
(1441013603 , 1441013605 , 30 ),
(1441013610 , 1441013612 , 30 );
The query:
with
bytes as
( select first, last,
( last - first ) as calc_time,
bytes
from t
where ( last - first )>0 ),
bytes_per_second as
( select first, last, bytes / calc_time as Bs
from bytes ),
calc_interval as
( SELECT * FROM generate_series(1441013602,1441013612) )
select
i.generate_series, bps.Bs
from
calc_interval i
left outer join
bytes_per_second bps
on i.generate_series between bps.first and bps.last - 1
order by
i.generate_series
Results:
| generate_series | bs |
|-----------------|--------|
| 1441013602 | 20 |
| 1441013603 | 15 |
| 1441013604 | 15 |
| 1441013605 | (null) |
| 1441013606 | (null) |
| 1441013607 | (null) |
| 1441013608 | (null) |
| 1441013609 | (null) |
| 1441013610 | 15 |
| 1441013611 | 15 |
| 1441013612 | (null) |
Explanation:
bytes and bytes_per_second clean up the data; it would perhaps be more accurate to take an average.
calc_interval generates one row per second.
The last select does the final calculation, joining the generated seconds to the per-flow bandwidth.
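If you want to match the asker's desired output exactly (zeros instead of nulls, and one-second flows counted too), a possible tweak of the above — just a sketch against the same schema — is to use (last - first + 1) as the duration, drop the (last - first) > 0 filter, and coalesce the missing seconds to 0:
with bytes_per_second as
  ( select first, last,
           bytes::float8 / (last - first + 1) as Bs  -- +1 so instantaneous flows still count
    from t ),
calc_interval as
  ( select generate_series(1441013602, 1441013612) as second )
select i.second,
       coalesce(sum(bps.Bs), 0) as Bs
from calc_interval i
left outer join bytes_per_second bps
  on i.second between bps.first and bps.last
group by i.second
order by i.second;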