Windowing functions in MS Access SQL

I am working on a class scheduling database in MS Access. There are a variety of classes, each of which is taught multiple times, sometimes multiple times in a day, but not necessarily every day. Each course has a unique set of software and data that is stored on a laptop, and there is a set of laptops for each course with that software loaded.
For any given training day I need to assign a range of laptop IDs to the right classes in different rooms, depending on how many people will be taking that class in that room, so that the instructors know which laptops to take to the room with them to teach the class that day.
For example, I have the raw data:
Date   Room  ClassName  HeadCount
-----  ----  ---------  ---------
11/30  101   Intro         10
11/30  102   Intro         15
11/30  103   Course 2       5
12/1   101   Intro         10
12/1   102   Course 2      15
12/1   103   Course 3      10
I also know the following about the laptops:
ClassName  LaptopID
---------  --------------
Intro      LT.Intro_1
Intro      ...
Intro      LT.Intro_30
Course 2   LT.Course 2_1
Course 2   ...
Course 2   LT.Course 2_30
Course 3   LT.Course 3_1
Course 3   ...
Course 3   LT.Course 3_30
Based on the above two tables, I would want to output:
Date   Room  ClassName  HeadCount  First Laptop   Last Laptop
-----  ----  ---------  ---------  -------------  --------------
11/30  101   Intro         10      LT.Intro_1     LT.Intro_10
11/30  102   Intro         15      LT.Intro_11    LT.Intro_25
11/30  103   Course 2       5      LT.Course 2_1  LT.Course 2_5
12/1   101   Intro         10      LT.Intro_1     LT.Intro_10
12/1   102   Course 2      15      LT.Course 2_1  LT.Course 2_15
12/1   103   Course 3      10      LT.Course 3_1  LT.Course 3_10
I know this calls for a window function, but MS Access doesn't have LEAD or LAG. Is there a workaround?

You might want to change your table definitions for better performance. I have recreated the two tables you described.
You know your laptop IDs are in sequence, and you know the headcount per class. To emulate a lead, you need the running total of prior headcounts: the total attendees on the same date, for the same class, in rows before the current one.
x = Sum(HeadCount) where ID < current ID and ClassName = current ClassName and Date = current Date ("current" meaning the current row).
Now you know the total number of laptops used before the current row and the headcount for the current row. The first laptop is
f = Min(LaptopID) where the laptop number > x (x being the total laptops used before this row)
For the last laptop, you must also add the current headcount:
l = Min(LaptopID) where the laptop number >= current HeadCount + x
Note that f uses a strict > while l uses >=.
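As a minimal sketch of just the x part (using the tbl_ClassEvents table from the demo below, with its autonumber ID), the running total of prior headcounts can be computed with a correlated subquery:
SELECT E.ID, E.ClassName, E.HeadCount,
       Nz((SELECT Sum(T2.HeadCount)
           FROM tbl_ClassEvents AS T2
           WHERE T2.ID < E.ID
             AND T2.[Date] = E.[Date]
             AND T2.ClassName = E.ClassName), 0) AS LaptopsUsedBefore
FROM tbl_ClassEvents AS E;
FirstLaptop and LastLaptop then wrap that value in the Min(LaptopID) lookups described above, which is what the full query below does.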
Here is a working demo which you can improve on:
Table1: tbl_ClassEvents
+----+------------+------+-----------+-----------+
| ID | date | Room | ClassName | HeadCount |
+----+------------+------+-----------+-----------+
| 1 | 30/11/2017 | 101 | Intro | 10 |
| 2 | 30/11/2017 | 102 | intro | 15 |
| 3 | 30/11/2017 | 103 | Course 2 | 5 |
| 4 | 01/12/2017 | 101 | Intro | 10 |
| 5 | 01/12/2017 | 102 | Course 2 | 15 |
| 6 | 01/12/2017 | 103 | Course 3 | 10 |
| 7 | 17/11/2017 | 101 | Intro | 16 |
+----+------------+------+-----------+-----------+
Table2: Tbl_ClassVsLaptop
+----+-----------+----------------+
| Id | ClassName | LaptopId |
+----+-----------+----------------+
| 1 | Intro | LT.Intro_1 |
| 2 | Intro | LT.Intro_2 |
| 3 | Intro | LT.Intro_3 |
| 4 | Intro | LT.Intro_4 |
| 5 | Intro | LT.Intro_5 |
| 6 | Intro | LT.Intro_6 |
| 7 | Intro | LT.Intro_7 |
| 8 | Intro | LT.Intro_8 |
| 9 | Intro | LT.Intro_9 |
| 10 | Intro | LT.Intro_10 |
| 11 | Intro | LT.Intro_11 |
| 12 | Intro | LT.Intro_12 |
| 13 | Intro | LT.Intro_13 |
| 14 | Intro | LT.Intro_14 |
| 15 | Intro | LT.Intro_15 |
| 16 | Intro | LT.Intro_16 |
| 17 | Intro | LT.Intro_17 |
| 18 | Intro | LT.Intro_18 |
| 19 | Intro | LT.Intro_19 |
| 20 | Intro | LT.Intro_20 |
| 21 | Intro | LT.Intro_21 |
| 22 | Intro | LT.Intro_22 |
| 23 | Intro | LT.Intro_23 |
| 24 | Intro | LT.Intro_24 |
| 25 | Intro | LT.Intro_25 |
| 26 | Intro | LT.Intro_26 |
| 27 | Intro | LT.Intro_27 |
| 28 | Intro | LT.Intro_28 |
| 29 | Intro | LT.Intro_29 |
| 30 | Intro | LT.Intro_30 |
| 31 | Course 2 | LT.Course 2_1 |
| 32 | Course 2 | LT.Course 2_2 |
| 33 | Course 2 | LT.Course 2_3 |
| 34 | Course 2 | LT.Course 2_4 |
| 35 | Course 2 | LT.Course 2_5 |
| 36 | Course 2 | LT.Course 2_6 |
| 37 | Course 2 | LT.Course 2_7 |
| 38 | Course 2 | LT.Course 2_8 |
| 39 | Course 2 | LT.Course 2_9 |
| 40 | Course 2 | LT.Course 2_10 |
| 41 | Course 2 | LT.Course 2_11 |
| 42 | Course 2 | LT.Course 2_12 |
| 43 | Course 2 | LT.Course 2_13 |
| 44 | Course 2 | LT.Course 2_14 |
| 45 | Course 2 | LT.Course 2_15 |
| 46 | Course 2 | LT.Course 2_16 |
| 47 | Course 2 | LT.Course 2_17 |
| 48 | Course 2 | LT.Course 2_18 |
| 49 | Course 2 | LT.Course 2_19 |
| 50 | Course 2 | LT.Course 2_20 |
| 51 | Course 2 | LT.Course 2_21 |
| 52 | Course 2 | LT.Course 2_22 |
| 53 | Course 2 | LT.Course 2_23 |
| 54 | Course 2 | LT.Course 2_24 |
| 55 | Course 2 | LT.Course 2_25 |
| 56 | Course 2 | LT.Course 2_26 |
| 57 | Course 2 | LT.Course 2_27 |
| 58 | Course 2 | LT.Course 2_28 |
| 59 | Course 2 | LT.Course 2_29 |
| 60 | Course 2 | LT.Course 2_30 |
| 61 | Course 3 | LT.Course 3_1 |
| 62 | Course 3 | LT.Course 3_2 |
| 63 | Course 3 | LT.Course 3_3 |
| 64 | Course 3 | LT.Course 3_4 |
| 65 | Course 3 | LT.Course 3_5 |
| 66 | Course 3 | LT.Course 3_6 |
| 67 | Course 3 | LT.Course 3_7 |
| 68 | Course 3 | LT.Course 3_8 |
| 69 | Course 3 | LT.Course 3_9 |
| 70 | Course 3 | LT.Course 3_10 |
| 71 | Course 3 | LT.Course 3_11 |
| 72 | Course 3 | LT.Course 3_12 |
| 73 | Course 3 | LT.Course 3_13 |
| 74 | Course 3 | LT.Course 3_14 |
| 75 | Course 3 | LT.Course 3_15 |
| 76 | Course 3 | LT.Course 3_16 |
| 77 | Course 3 | LT.Course 3_17 |
| 78 | Course 3 | LT.Course 3_18 |
| 79 | Course 3 | LT.Course 3_19 |
| 80 | Course 3 | LT.Course 3_20 |
| 81 | Course 3 | LT.Course 3_21 |
| 82 | Course 3 | LT.Course 3_22 |
| 83 | Course 3 | LT.Course 3_23 |
| 84 | Course 3 | LT.Course 3_24 |
| 85 | Course 3 | LT.Course 3_25 |
| 86 | Course 3 | LT.Course 3_26 |
| 87 | Course 3 | LT.Course 3_27 |
| 88 | Course 3 | LT.Course 3_28 |
| 89 | Course 3 | LT.Course 3_29 |
| 90 | Course 3 | LT.Course 3_30 |
+----+-----------+----------------+
Here is the query:
SELECT tbl_ClassEvents.ID
     , tbl_ClassEvents.[Date]
     , tbl_ClassEvents.Room
     , tbl_ClassEvents.ClassName
     , tbl_ClassEvents.HeadCount
     , (SELECT Min(LaptopID)
        FROM Tbl_ClassVsLaptop AS T1
        WHERE T1.ClassName = tbl_ClassEvents.ClassName
          AND Mid(T1.LaptopID, InStrRev(T1.LaptopID, "_") + 1, 3)
              > Nz((SELECT Sum(HeadCount)
                    FROM tbl_ClassEvents AS T2
                    WHERE T2.ID < tbl_ClassEvents.ID
                      AND T2.[Date] = tbl_ClassEvents.[Date]
                      AND T2.ClassName = tbl_ClassEvents.ClassName), 0)
       ) AS FirstLaptop
     , (SELECT Min(LaptopID)
        FROM Tbl_ClassVsLaptop AS T1
        WHERE T1.ClassName = tbl_ClassEvents.ClassName
          AND Mid(T1.LaptopID, InStrRev(T1.LaptopID, "_") + 1, 3)
              >= tbl_ClassEvents.HeadCount
                 + Nz((SELECT Sum(HeadCount)
                       FROM tbl_ClassEvents AS T2
                       WHERE T2.ID < tbl_ClassEvents.ID
                         AND T2.[Date] = tbl_ClassEvents.[Date]
                         AND T2.ClassName = tbl_ClassEvents.ClassName), 0)
       ) AS LastLaptop
FROM tbl_ClassEvents
ORDER BY tbl_ClassEvents.[Date]
       , tbl_ClassEvents.Room
       , tbl_ClassEvents.ClassName;
And the output:
+----+------------+------+-----------+-----------+---------------+----------------+
| ID | DATE | Room | ClassName | HeadCount | FirstLaptop | LastLaptop |
+----+------------+------+-----------+-----------+---------------+----------------+
| 7 | 17/11/2017 | 101 | Intro | 16 | LT.Intro_1 | LT.Intro_16 |
| 1 | 30/11/2017 | 101 | Intro | 10 | LT.Intro_1 | LT.Intro_10 |
| 2 | 30/11/2017 | 102 | intro | 15 | LT.Intro_11 | LT.Intro_25 |
| 3 | 30/11/2017 | 103 | Course 2 | 5 | LT.Course 2_1 | LT.Course 2_5 |
| 4 | 01/12/2017 | 101 | Intro | 10 | LT.Intro_1 | LT.Intro_10 |
| 5 | 01/12/2017 | 102 | Course 2 | 15 | LT.Course 2_1 | LT.Course 2_15 |
| 6 | 01/12/2017 | 103 | Course 3 | 10 | LT.Course 3_1 | LT.Course 3_10 |
+----+------------+------+-----------+-----------+---------------+----------------+

Related

Theil–Sen estimator using Hive

I would like to calculate the Theil–Sen estimator per ID for the value column in the sample table below using Hive. The Theil–Sen estimator is defined here: https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator. I tried to use arrays but could not figure out a solution. Any help is appreciated.
+----+-------+-------+
| 1 | 1 | 10 |
| 1 | 2 | 20 |
| 1 | 3 | 30 |
| 1 | 4 | 40 |
| 1 | 5 | 50 |
| 2 | 1 | 100 |
| 2 | 2 | 90 |
| 2 | 3 | 102 |
| 2 | 4 | 75 |
| 2 | 5 | 70 |
| 2 | 6 | 50 |
| 2 | 7 | 100 |
| 2 | 8 | 80 |
| 2 | 9 | 60 |
| 2 | 10 | 50 |
| 2 | 11 | 40 |
| 2 | 12 | 40 |
+----+-------+-------+
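A hedged sketch of one possible approach: the Theil–Sen slope per ID is the median of the pairwise slopes (value_j - value_i) / (x_j - x_i) over all pairs of rows for that ID, so a self-join plus a median aggregate is enough. Table and column names (sample_data, id, x, value) are assumptions, since the headers are not shown; percentile_approx is used because the slopes are doubles.
-- median of pairwise slopes per id (approximate median)
SELECT a.id,
       percentile_approx((b.`value` - a.`value`) / (b.x - a.x), 0.5) AS theil_sen_slope
FROM sample_data a
JOIN sample_data b
  ON a.id = b.id   -- Hive wants the equi-join condition in ON
WHERE a.x < b.x    -- each unordered pair exactly once
GROUP BY a.id;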

Get aggregate quantity from JOINED tables

I have the following two tables in my database
inventory_transactions table
id | date_created | company_id | product_id | quantity | amount | is_verified | buy_or_sell_to | transaction_type | parent_tx | invoice_id | order_id | transaction_comment
----+----------------------------+------------+------------+----------+--------+-------------+----------------+------------------+-----------+------------+----------+---------------------
1 | 2022-04-25 10:42:00.627495 | 20 | 100 | 23 | 7659 | t | | BUY | | 1 | |
2 | 2022-04-25 10:48:48.02342 | 21 | 2 | 10 | 100 | t | | BUY | | 2 | |
3 | 2022-04-25 11:00:11.624176 | 21 | 7 | 10 | 100 | t | | BUY | | 3 | |
4 | 2022-04-25 11:08:14.607117 | 23 | 1 | 11 | 1210 | t | | BUY | | 4 | |
5 | 2022-04-25 11:13:24.084845 | 23 | 28 | 16 | 2560 | t | | BUY | | 5 | |
6 | 2022-04-25 11:26:56.338881 | 23 | 28 | 15 | 3525 | t | 5 | BUY | | 6 | 1 |
7 | 2022-04-25 11:26:56.340112 | 5 | 28 | 15 | 3525 | t | 23 | SELL | 6 | 6 | 1 |
8 | 2022-04-25 11:30:08.529288 | 23 | 30 | 65 | 15925 | t | 5 | BUY | | 7 | 2 |
9 | 2022-04-25 11:30:08.531005 | 5 | 30 | 65 | 15925 | t | 23 | SELL | 8 | 7 | 2 |
14 | 2022-04-25 12:28:51.658902 | 23 | 28 | 235 | 55225 | t | 5 | BUY | | 11 | 5 |
15 | 2022-04-25 12:28:51.660478 | 5 | 28 | 235 | 55225 | t | 23 | SELL | 14 | 11 | 5 |
20 | 2022-04-25 13:01:31.091524 | 20 | 4 | 4 | 176 | t | | BUY | | 15 | |
10 | 2022-04-25 11:50:48.4519 | 21 | 38 | 1 | 10 | t | | BUY | | 8 | |
11 | 2022-04-25 11:50:48.454118 | 21 | 36 | 1 | 10 | t | | BUY | | 8 | |
12 | 2022-04-25 11:52:19.827671 | 21 | 29 | 1 | 10 | t | | BUY | | 9 | |
13 | 2022-04-25 11:53:16.699881 | 21 | 74 | 1 | 10 | t | | BUY | | 10 | |
16 | 2022-04-25 12:37:39.739125 | 20 | 1 | 228 | 58824 | t | | BUY | | 12 | |
17 | 2022-04-25 12:37:39.741106 | 20 | 3 | 228 | 58824 | t | | BUY | | 12 | |
18 | 2022-04-25 12:49:09.922686 | 21 | 41 | 10 | 1000 | t | | BUY | | 13 | |
19 | 2022-04-25 12:55:11.986451 | 20 | 5 | 22 | 484 | t | | BUY | | 14 | |
NOTE: each transaction in the inventory_transactions table is recorded twice, with the company_id and buy_or_sell_to swapped in the 2nd row and the transaction_type BUY or SELL reversed (similar to how a journal is maintained in accounting).
db# select * from inventory_transactions where buy_or_sell_to is not Null order by date_created limit 50;
id | date_created | company_id | product_id | quantity | amount | is_verified | buy_or_sell_to | transaction_type | parent_tx | invoice_id | order_id | transaction_comment
----+----------------------------+------------+------------+----------+--------+-------------+----------------+------------------+-----------+------------+----------+---------------------
6 | 2022-04-25 11:26:56.338881 | 23 | 28 | 15 | 3525 | t | 5 | BUY | | 6 | 1 |
7 | 2022-04-25 11:26:56.340112 | 5 | 28 | 15 | 3525 | t | 23 | SELL | 6 | 6 | 1 |
8 | 2022-04-25 11:30:08.529288 | 23 | 30 | 65 | 15925 | t | 5 | BUY | | 7 | 2 |
9 | 2022-04-25 11:30:08.531005 | 5 | 30 | 65 | 15925 | t | 23 | SELL | 8 | 7 | 2 |
companies table (consider this as the users table, in my project all users are companies)
id | company_type | gstin | name | phone_no | address | pincode | is_hymbee_verified | is_active | district_id | pancard_no
----+--------------+-----------------+-------------+------------+---------+---------+--------------------+-----------+-------------+------------
26 | RETAILER | XXXXXXXXXXXXXXX | ACD LLC | 12345%7898 | AQWSAQW | 319401 | | | 11 | AQWSDERFVV
27 | DISTRIBUTOR | XXXXXXXXXXXXXXX | CDF LLC | 123XX7898 | AGWSAQW | 319201 | | | 13 | AQWSDERFVV
28 | RETAILER | XXXXXXXXXXXXXXX | !## LLC | 1234!67XX9 | AQCCAQW | 319101 | | | 16 | AQWSDERFVV
29 | COMPANY | XXXXXXXXXXXXXXX | ZAZ LLC | 123456S898 | AQWQQQW | 319001 | | | 19 | AQWSDERFVV
Problem statement
The query I am trying to write should fetch the quantity sold to users who are RETAILERs or DISTRIBUTORs, by users who are themselves either a RETAILER or a DISTRIBUTOR.
For example, if a user is a RETAILER, we need to calculate how much quantity this RETAILER has sold to other users who are either a RETAILER or a DISTRIBUTOR.
In other words, for all rows in the companies table, check whether the company is of company_type RETAILER or DISTRIBUTOR, and from the inventory_transactions table check how much quantity that particular RETAILER or DISTRIBUTOR has sold to other RETAILERs and DISTRIBUTORs.
I have very basic knowledge of SQL and have only gotten this far:
SELECT Seller.id AS Seller_ROW
     , Buyer.id AS Buyer_row
     , Seller.company_id
     , Buyer.buy_or_sell_to
     , Seller.company_type AS Seller_Type
     , Buyer.company_type AS Buyer_Type
     , Seller.quantity
     , Buyer.quantity
FROM (SELECT t.id, t.company_id, t.quantity, c.company_type
      FROM inventory_transactions AS t
      JOIN companies AS c ON c.id = t.company_id
      WHERE c.company_type = 'RETAILER' OR c.company_type = 'DISTRIBUTOR'
     ) AS Seller
JOIN (SELECT t.id, t.buy_or_sell_to, t.quantity, c.company_type
      FROM inventory_transactions AS t
      JOIN companies AS c ON c.id = t.buy_or_sell_to
      WHERE c.company_type = 'RETAILER' OR c.company_type = 'DISTRIBUTOR'
     ) AS Buyer ON Seller.id = Buyer.id
output
seller_row | buyer_row | company_id | buy_or_sell_to | seller_type | buyer_type | quantity | quantity
------------+-----------+------------+----------------+-------------+-------------+----------+----------
25 | 25 | 22 | 25 | RETAILER | DISTRIBUTOR | 1 | 1
26 | 26 | 25 | 22 | DISTRIBUTOR | RETAILER | 1 | 1
31 | 31 | 37 | 43 | DISTRIBUTOR | RETAILER | 10 | 10
32 | 32 | 43 | 37 | RETAILER | DISTRIBUTOR | 10 | 10
33 | 33 | 21 | 43 | DISTRIBUTOR | RETAILER | 1 | 1
34 | 34 | 43 | 21 | RETAILER | DISTRIBUTOR | 1 | 1
35 | 35 | 21 | 49 | DISTRIBUTOR | RETAILER | 1 | 1
36 | 36 | 49 | 21 | RETAILER | DISTRIBUTOR | 1 | 1
37 | 37 | 21 | 51 | DISTRIBUTOR | RETAILER | 1 | 1
38 | 38 | 51 | 21 | RETAILER | DISTRIBUTOR | 1 | 1
There are duplicate rows in the resulting table, so I am unable to do a SUM().
Expected result
SELLER.company_id | SELLER.company_name | SELLER.company_type | QUANTITY | BUYER.company_type
26 | XYZ Retail Co. | RETAILER | 14 | RETAILER
26 | XYZ Retail Co. | RETAILER | 1 | DISTRIBUTOR
27 | ACD Distributions | DISTRIBUTOR | 0 | RETAILER
27 | ACD Distributions | DISTRIBUTOR | 10 | DISTRIBUTOR
This answer assumes that every sale is represented as two rows in inventory_transactions, which makes it possible to avoid duplicates by working with only one transaction_type, so we'll filter on SELL transactions.
SELECT t.company_id AS seller_company_id
, s.name AS seller_company_name
, s.company_type AS seller_company_type
, SUM(t.quantity) AS quantity
, b.company_type AS buyer_company_type
FROM inventory_transactions AS t
INNER JOIN companies AS s
ON s.id = t.company_id
INNER JOIN companies AS b
ON b.id = t.buy_or_sell_to
WHERE t.transaction_type = 'SELL'
AND s.company_type IN ('RETAILER','DISTRIBUTOR')
AND b.company_type IN ('RETAILER','DISTRIBUTOR')
GROUP BY t.company_id, s.company_name, s.company_type, b.company_type
ORDER BY seller_company_id, seller_company_name, seller_company_type, buyer_company_type
;
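One caveat with this approach: it only returns combinations that actually have SELL rows, so the zero-quantity line in the expected result (a DISTRIBUTOR with no sales to RETAILERs) will simply be missing. A possible sketch for keeping those rows, under the same table and column assumptions as above, is to start from each seller crossed with the two buyer types and LEFT JOIN the sales onto that:
SELECT s.id            AS seller_company_id
     , s.name          AS seller_company_name
     , s.company_type  AS seller_company_type
     , COALESCE(SUM(t.quantity), 0) AS quantity
     , bt.company_type AS buyer_company_type
FROM companies AS s
CROSS JOIN (VALUES ('RETAILER'), ('DISTRIBUTOR')) AS bt(company_type)
LEFT JOIN (inventory_transactions AS t
           JOIN companies AS b ON b.id = t.buy_or_sell_to)   -- each sale with its buyer
       ON t.company_id = s.id
      AND t.transaction_type = 'SELL'
      AND b.company_type = bt.company_type
WHERE s.company_type IN ('RETAILER', 'DISTRIBUTOR')
GROUP BY s.id, s.name, s.company_type, bt.company_type
ORDER BY seller_company_id, buyer_company_type;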

grouping dataframe based on specific column value

I am working on a real-time project. I have a dataframe that looks like below.
| id | name | values|
| 101 | a | 13 |
| 101 | b | 14 |
| cv |
59 |
| 101 | c | 13 |
| 23 |
| 102 | a | 13 |
| 102 | b | 14 |
| cv |
56 |
| 102 | c | 17 |
| 23
I need the dataframe to look like below when the value is the same, like 'cv':
| 101 | a | 13 |
| 101 | b | cv |
| 101 | c | 13 |
| 23 |
| 102 | a | 13 |
| 102 | b | cv |
| 102 | c | 17 |
23 |

How to return the maximum and minimum values for specific ID SQL

Given the following SQL tables: https://imgur.com/a/NI8VrC7. For each specific ID_t I need to return the MAX() and MIN() values of the Cena_c (total price) column for that ID_t.
| ID_t | Nazwa |
| ---- | ----- |
| 1 | T1 |
| 2 | T2 |
| 3 | T3 |
| 4 | T4 |
| 5 | T5 |
| 6 | T6 |
| 7 | T7 |
| ID | ID_t | Ilosc | Cena_j | Cena_c | ID_p |
| ---- | ---- | ----- | ------ | ------ | ---- |
| 100 | 1 | 1 | 10 | 10 | 1 |
| 101 | 2 | 3 | 20 | 60 | 2 |
| 102 | 4 | 5 | 10 | 50 | 7 |
| 103 | 2 | 2 | 20 | 40 | 5 |
| 104 | 5 | 1 | 30 | 30 | 5 |
| 105 | 7 | 6 | 80 | 480 | 1 |
| 106 | 6 | 7 | 15 | 105 | 2 |
| 107 | 6 | 5 | 15 | 75 | 1 |
| 108 | 3 | 3 | 25 | 75 | 7 |
| 109 | 7 | 1 | 80 | 80 | 5 |
| 110 | 4 | 1 | 10 | 10 | 2 |
| 111 | 2 | 9 | 20 | 180 | 2 |
Based on provided tables the correct result should look like this:
| ID_t | Cena_c_max | Cena_c_min |
| ----- | ---------- | ---------- |
| T1 | 10 | 10 |
| T2 | 180 | 60 |
| T3 | 75 | 75 |
| T4 | 50 | 10 |
| T5 | 30 | 30 |
| T6 | 105 | 75 |
| T7 | 480 | 80 |
Is this even possible?
I haven't found anything yet that I could use to implement my solution.
SELECT concat('T',ID_t), max(Cena_c) as Cena_c_max, min(Cena_c) as Cena_c_min
FROM table
GROUP BY ID_t
It is better to solve this with a join of the two tables, because that avoids problems if the prefix T is ever changed to another letter.
Hardcoding should be avoided.
select b.Nazwa as "Nazwa", max(a.Cena_c) as "Cena_c_max", min(a.Cena_c) as "Cena_c_min"
from table1 as a
left join table2 as b on (
    a.ID_t = b.ID_t
)
group by b.ID_t, b.Nazwa

Pandas - Grouping Rows With Same Value in Dataframe

Here is the dataframe in question:
|City|District|Population| Code | ID |
| A | 4 | 2000 | 3 | 21 |
| A | 8 | 7000 | 3 | 21 |
| A | 38 | 3000 | 3 | 21 |
| A | 7 | 2000 | 3 | 21 |
| B | 34 | 3000 | 6 | 84 |
| B | 9 | 5000 | 6 | 84 |
| C | 4 | 9000 | 1 | 28 |
| C | 21 | 1000 | 1 | 28 |
| C | 32 | 5000 | 1 | 28 |
| C | 46 | 20 | 1 | 28 |
I want to regroup the population counts by city to have this kind of output:
|City|Population| Code | ID |
| A | 14000 | 3 | 21 |
| B | 8000 | 6 | 84 |
| C | 15020 | 1 | 28 |
df = df.groupby(['City', 'Code', 'ID'], as_index=False)['Population'].sum()
Group by 'City', 'Code' and 'ID', then take the sum of 'Population'; as_index=False keeps the group keys as columns so the result matches the table above.