MS SQL Server: Load All Data vs Aggregate with +1 round trip

I would love to get your opinion on this problem.
I need to show the list of order records for a particular date/time range, then summarise it with the total number of orders compared with the "last" period. "Last" can mean either last month OR last year.
Since I am going to show the list of order records anyway, I am thinking of fetching the records from last month OR last year in the same hit (i.e. together with the records of the current date/time range).
OR, alternatively, I can:
Get the records of the current date/time range, THEN
Get the total number of orders (using an aggregate) for last month OR last year
The alternative means there are 2 round trips to the database (but less data to return). Or should I stick with my current method (loading all records, including those from last month OR last year)?
NOTE: The website and the SQL Server are hosted in Microsoft Azure Cloud, but we might switch to AWS in the future.
Thanks
Input example (some fields are omitted including time for simplicity)
----------------------------------------------------------------
| Warehouse Id | Order Id | Product Id | Quantity | Order Date |
----------------------------------------------------------------
| 1 | 10 | 1 | 10 | 2016-09-25 |
| 1 | 9 | 5 | 5 | 2016-09-24 |
| 1 | 8 | 4 | 8 | 2016-09-23 |
| 1 | 7 | 6 | 2 | 2016-09-23 |
| 1 | 6 | 8 | 1 | 2016-09-23 |
| 1 | 5 | 1 | 2 | 2016-09-22 |
| 1 | 4 | 1 | 2 | 2016-09-21 |
| 1 | 3 | 5 | 10 | 2016-09-21 |
| 1 | 2 | 5 | 15 | 2016-08-12 |
| 1 | 1 | 5 | 5 | 2016-08-10 |
----------------------------------------------------------------
The desired output:
Input:
WarehouseId: 1
StartDate: 2016-09-01 EndDate: 2016-09-30
Comparison type: Last Month (i.e. StartDate: 2016-08-01 EndDate: 2016-08-31)
Output:
Warehouse: xxx
-------------------------------------------------
| Order Id | Product Id | Quantity | Order Date |
-------------------------------------------------
| 10 | 1 | 10 | 2016-09-25 |
| 9 | 5 | 5 | 2016-09-24 |
| 8 | 4 | 8 | 2016-09-23 |
| 7 | 6 | 2 | 2016-09-23 |
| 6 | 8 | 1 | 2016-09-23 |
| 5 | 1 | 2 | 2016-09-22 |
| 4 | 1 | 2 | 2016-09-21 |
| 3 | 5 | 10 | 2016-09-21 |
-------------------------------------------------
Total Order: 40 (increase 100% from last month)
So, what I am doing now is to get ALL records from 2016-08-01 to 2016-09-30. That way I can avoid 2 round trips.
Alternatively, I can do the following:
1. Get the records from 2016-09-01 to 2016-09-30
var recs = (from rec in ef.tblOrders
            where (rec.WarehouseId == whsId) && (rec.OrderDate >= startDate) && (rec.OrderDate <= endDate)
            select rec).ToList();
2. Then compute the SUM of the order quantity from 2016-08-01 to 2016-08-31 for comparison purposes
var recSum = (from rec in ef.tblOrders
              where (rec.WarehouseId == whsId) && (rec.OrderDate >= cStartDate) && (rec.OrderDate <= cEndDate)
              group rec by rec.WarehouseId into grec
              select new
              {
                  TotalQty = grec.Sum(x => x.Quantity),
              }).FirstOrDefault();

You can do this with window functions:
select o.*
from (select o.*,
             -- @prev_start and @prev_end are placeholders for the "last month" or "last year" range
             sum(case when o.datetime between @prev_start and @prev_end then 1 else 0 end) over () as last_num_orders
      from orders o
     ) o
where o.datetime between @date1 and @date2;
I am unclear on what "last" means in this context, so the condition inside the case expression is only a placeholder. However, you can do what you want with window functions, which is the preferred option.
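As a rough illustration of the single-round-trip idea, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server (window functions need SQLite 3.25 or later). The table and column names (orders, order_date, and so on) and the named parameters are assumptions made for the example. It also sums Quantity instead of counting rows, which is an assumption based on the "Total Order: 40" line in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (warehouse_id INTEGER, order_id INTEGER, "
    "product_id INTEGER, quantity INTEGER, order_date TEXT)"
)
# Sample rows copied from the question.
sample_orders = [
    (1, 10, 1, 10, "2016-09-25"), (1, 9, 5, 5, "2016-09-24"),
    (1, 8, 4, 8, "2016-09-23"), (1, 7, 6, 2, "2016-09-23"),
    (1, 6, 8, 1, "2016-09-23"), (1, 5, 1, 2, "2016-09-22"),
    (1, 4, 1, 2, "2016-09-21"), (1, 3, 5, 10, "2016-09-21"),
    (1, 2, 5, 15, "2016-08-12"), (1, 1, 5, 5, "2016-08-10"),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", sample_orders)

# The inner query reads both date ranges and attaches the previous-period
# total to every row; the outer query keeps only the current-range rows.
QUERY = """
SELECT order_id, product_id, quantity, order_date, prev_total_qty
FROM (
    SELECT o.*,
           SUM(CASE WHEN o.order_date BETWEEN :prev_start AND :prev_end
                    THEN o.quantity ELSE 0 END) OVER () AS prev_total_qty
    FROM orders o
    WHERE o.warehouse_id = :whs
      AND (o.order_date BETWEEN :cur_start AND :cur_end
           OR o.order_date BETWEEN :prev_start AND :prev_end)
) t
WHERE order_date BETWEEN :cur_start AND :cur_end
ORDER BY order_id DESC
"""
params = {
    "whs": 1,
    "cur_start": "2016-09-01", "cur_end": "2016-09-30",
    "prev_start": "2016-08-01", "prev_end": "2016-08-31",
}
rows = conn.execute(QUERY, params).fetchall()

current_total = sum(r[2] for r in rows)
previous_total = rows[0][4] if rows else 0
change_pct = (
    (current_total - previous_total) * 100.0 / previous_total
    if previous_total else None
)
print(current_total, previous_total, change_pct)
```

One limitation of this shape: if the current range has no rows, the previous-period total is not returned at all, so a separate aggregate query may still be needed for that case.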

Related

Find Aggregated Data Between Two Dates in Two Tables Where One is Updated Weekly and Other is Updated Hourly

I have data in two different tables, one is updated every week or once in the middle of the week if needed, and the other table is updated every hour or so because it has more data. The first table, can be seen as
agent_id | rank | ranking_date
---------------------------
1 | 1 | 2022-03-21
2 | 2 | 2022-03-21
1 | 4 | 2022-03-14
2 | 3 | 2022-03-14
1 | 2 | 2022-03-10
And the second table contains detailed information on sales.
agent_id | call_id | talk_time | product_sold | amount | call_date
------------------------------------------------------------------
1 | 1 | 13 | 1 | 53 |2022-03-10
1 | 2 | 24 | 2 | 2 |2022-03-10
2 | 3 | 43 | 4 | 11 |2022-03-10
1 | 4 | 31 | - | 0 |2022-03-10
2 | 5 | 12 | - | 0 |2022-03-10
1 | 6 | 11 | - | 0 |2022-03-11
1 | 7 | 35 | 2 | 79 |2022-03-11
2 | 8 | 76 | - | 0 |2022-03-11
1 | 9 | 42 | 1 | 23 |2022-03-11
2 | 10 | 69 | - | 0 |2022-03-11
How can I merge the two tables to get their aggregated information? Remember that the ranks change at the beginning of every week, and the sales happen every day, but the rankings can also be changed in the middle of the week if needed. What I am trying to create is an aggregated table for understanding the sales by each agent. Something like this:
agent_id | rank | ranking_date | total_calls_handled | total_talktime | total_amount
------------------------------------------------------------------------------------
1 | 1 | 2022-03-21 | 100 | 875 | 3000 (this is 3/21 - today)
2 | 2 | 2022-03-21 | 120 | 576 | 3689 (this is 3/21 - today)
1 | 4 | 2022-03-14 | 210 | 246 | 1846 (this is 3/14 - 3/21)
2 | 3 | 2022-03-14 | 169 | 693 | 8562 (this is 3/14 - 3/21)
1 | 2 | 2022-03-10 | 201 | 559 | 1749 (this is 3/7 - 3/10)
So the data is aggregated for each agent from 7-10, 10 - 14, then 14-21. Also, if say, the latest ranking date is 2022-03-21, and today is 2022-03-23, the query returns aggregation until today.
[Edit]: added table and data details
Table and data details:
Rankings table:
agent_id: unique_id of the agent
rank: rank of an agent assigned updated every Monday or if needed
ranking_date: date when agent's ranking was last updated (Automatically every Monday or if needed)
Sales Table:
agent_id: unique_id of the agent
call_id: unique_id for a call
talk_time: duration of the call
product_sold: unique_id of the product sold (- if agent was unsuccessful to sell)
amount: commission earned by the agent (therefore same product_id has different amount) (0 if agent was unsuccessful to sell)
call_date: date on which the call was made
[Edit 2]: Here is SQLFiddle.
Here we join where ranking_date and call_date fall in the same week. If calls are made on a Sunday, check that they land in the week you expect, because the first day of the week depends on the server settings.
Note that the join compares week numbers only, not agent_id, so the 2022-03-10 row in the result below counts the calls of both agents.
The syntax in the query is for SQL Server, matching the SQL Fiddle given. You will need to change the join condition to
on date_part(w,r.ranking_date) = date_part(w,s.call_date)
which should be compatible with Amazon Redshift.
select
r.agent_id,
r.rank,
r.ranking_date,
count(s.call_id) TotalCalls,
sum(s.talk_time) TotalTime,
sum(s.amount) TotalAmount
from rankings r
left join sales s
on datename(ww,r.ranking_date)= datename(ww,s.call_date)
group by
r.agent_id,
r.rank,
r.ranking_date
GO
agent_id | rank | ranking_date | TotalCalls | TotalTime | TotalAmount
-------: | ---: | :----------- | ---------: | --------: | ----------:
1 | 1 | 2022-03-21 | 0 | null | null
1 | 2 | 2022-03-10 | 10 | 356 | 168
1 | 4 | 2022-03-14 | 0 | null | null
2 | 2 | 2022-03-21 | 0 | null | null
2 | 3 | 2022-03-14 | 0 | null | null
db<>fiddle here
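An alternative to matching on week numbers is to treat each ranking as valid from its ranking_date until the agent's next ranking_date (or open-ended for the latest one) and to join sales per agent inside that period. The sketch below is only an illustration under that assumption, run on SQLite through Python's sqlite3 module rather than SQL Server or Redshift. The product_sold column is left out for brevity, and the periods CTE and next_date column are names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE rankings (agent_id INTEGER, "rank" INTEGER, ranking_date TEXT)')
conn.execute(
    "CREATE TABLE sales (agent_id INTEGER, call_id INTEGER, "
    "talk_time INTEGER, amount INTEGER, call_date TEXT)"
)
# Sample rows copied from the question.
conn.executemany("INSERT INTO rankings VALUES (?, ?, ?)", [
    (1, 1, "2022-03-21"), (2, 2, "2022-03-21"),
    (1, 4, "2022-03-14"), (2, 3, "2022-03-14"),
    (1, 2, "2022-03-10"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 13, 53, "2022-03-10"), (1, 2, 24, 2, "2022-03-10"),
    (2, 3, 43, 11, "2022-03-10"), (1, 4, 31, 0, "2022-03-10"),
    (2, 5, 12, 0, "2022-03-10"), (1, 6, 11, 0, "2022-03-11"),
    (1, 7, 35, 79, "2022-03-11"), (2, 8, 76, 0, "2022-03-11"),
    (1, 9, 42, 23, "2022-03-11"), (2, 10, 69, 0, "2022-03-11"),
])

QUERY = """
WITH periods AS (
    -- next_date is the start of the agent's following ranking period
    SELECT agent_id, "rank", ranking_date,
           LEAD(ranking_date) OVER (
               PARTITION BY agent_id ORDER BY ranking_date
           ) AS next_date
    FROM rankings
)
SELECT p.agent_id, p."rank", p.ranking_date,
       COUNT(s.call_id)              AS total_calls_handled,
       COALESCE(SUM(s.talk_time), 0) AS total_talktime,
       COALESCE(SUM(s.amount), 0)    AS total_amount
FROM periods p
LEFT JOIN sales s
       ON s.agent_id = p.agent_id
      AND s.call_date >= p.ranking_date
      AND (p.next_date IS NULL OR s.call_date < p.next_date)
GROUP BY p.agent_id, p."rank", p.ranking_date
ORDER BY p.ranking_date DESC, p.agent_id
"""
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

With the sample data, only agent 1's 2022-03-10 ranking has sales inside its period. Agent 2's calls on 2022-03-10 and 2022-03-11 precede that agent's first ranking row and therefore match nothing.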

Cumulative days across rows with several constraints SQL

I'm trying to figure out how to return a single line per Asset which shows the total days of cumulative Periods. However, I only want to add certain Periods depending on whether the StartDate is within 10 days of the EndDate of the previous Period.
TotalDays column in the sample data: If a Period does not have an EndDate the total days is Today (13/10/2018) minus the StartDate.
Breakdown of the Expected Output Table:
Row 1 / Asset1: TotalDays is 278 because
Period 2 started 1 day after Period 1 ended
and Period 3 started 6 days after Period 2 ended
therefore 63+29+186 = 278
Row 2 / Asset 2: TotalDays is 120 because
Period 1 and 2 are both Open so use the earliest StartDate
Today minus 15/06/2018 = 120
Row 3 / Asset 3: TotalDays is 66 because
Period 2 started over 10 days after Period 1 ended
If an Asset has no Open Periods it would not be displayed in the Output.
Happy to clarify anything as I know this is a bit fiddly!
Many Thanks.
Data Sample:
+-----+---------+--------+------------+------------+-----------+--------------------------------------+--------+
| Row | AssetID | Period | StartDate | EndDate | TotalDays | DaysBetweenEndDateAndStartDateOfNext | Status |
+-----+---------+--------+------------+------------+-----------+--------------------------------------+--------+
| 1 | 1 | 1 | 01/01/2018 | 05/03/2018 | 63 | NULL | Closed |
| 2 | 1 | 2 | 06/03/2018 | 04/04/2018 | 29 | 1 | Closed |
| 3 | 1 | 3 | 10/04/2018 | NULL | 186 | 6 | Open |
| 4 | 2 | 1 | 15/06/2018 | NULL | 120 | NULL | Open |
| 5 | 2 | 2 | 01/07/2018 | NULL | 104 | NULL | Open |
| 6 | 3 | 1 | 01/02/2018 | 10/02/2018 | 9 | NULL | Closed |
| 7 | 3 | 2 | 08/08/2018 | NULL | 66 | 179 | Open |
+-----+---------+--------+------------+------------+-----------+--------------------------------------+--------+
Expected Output:
+-----+---------+------------+---------+-----------+
| Row | AssetID | StartDate | EndDate | TotalDays |
+-----+---------+------------+---------+-----------+
| 1 | 1 | 01/01/2018 | NULL | 278 |
| 2 | 2 | 15/06/2018 | NULL | 120 |
| 3 | 3 | 08/08/2018 | NULL | 66 |
+-----+---------+------------+---------+-----------+
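The rules in the breakdown can be read as a chaining procedure, sketched here in plain Python rather than SQL. This is one interpretation, not a confirmed solution: a period joins the current chain when it starts no more than 10 days after the chain's latest end (an open period ends "today"), overlapping days are counted once, and an asset is shown only if its final chain contains an open period. The function and variable names are made up for the example.

```python
from datetime import date

TODAY = date(2018, 10, 13)   # "today" as stated in the question
MAX_GAP_DAYS = 10

# (asset_id, period, start_date, end_date); end_date None means Open.
periods = [
    (1, 1, date(2018, 1, 1), date(2018, 3, 5)),
    (1, 2, date(2018, 3, 6), date(2018, 4, 4)),
    (1, 3, date(2018, 4, 10), None),
    (2, 1, date(2018, 6, 15), None),
    (2, 2, date(2018, 7, 1), None),
    (3, 1, date(2018, 2, 1), date(2018, 2, 10)),
    (3, 2, date(2018, 8, 8), None),
]


def summarize(periods, today=TODAY, max_gap=MAX_GAP_DAYS):
    """Return (asset_id, chain_start, total_days) for assets whose last chain is open."""
    by_asset = {}
    for asset_id, _period, start, end in sorted(periods, key=lambda p: (p[0], p[2])):
        by_asset.setdefault(asset_id, []).append((start, end))

    result = []
    for asset_id, spans in by_asset.items():
        chain_start, covered_until, total, has_open = None, None, 0, False
        for start, end in spans:
            effective_end = end or today
            if covered_until is None or (start - covered_until).days > max_gap:
                # Gap too large (or first period): start a new chain.
                chain_start, total, has_open = start, 0, False
                counted_from = start
                covered_until = effective_end
            else:
                # Continue the chain; skip days already covered by an overlap.
                counted_from = max(start, covered_until)
                covered_until = max(covered_until, effective_end)
            total += max((effective_end - counted_from).days, 0)
            has_open = has_open or end is None
        if has_open:
            result.append((asset_id, chain_start, total))
    return result


summary = summarize(periods)
for asset_id, chain_start, total_days in summary:
    print(asset_id, chain_start.strftime("%d/%m/%Y"), total_days)
```

On the sample data this reproduces the three rows of the expected output. Whether overlapping periods should really be counted once is an assumption inferred from the Asset 2 row.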

SQL generate unique ID from rolling ID

I've been trying to find an answer to this for the better part of a day with no luck.
I have a SQL table with measurement data for samples and I need a way to assign a unique ID to each sample. Right now each sample has an ID number that rolls over frequently. What I need is a unique ID for each sample. Below is a table with a simplified dataset, as well as an example of a possible UID that would do what I need.
| Row | Time | Meas# | Sample# | UID (Desired) |
| 1 | 09:00 | 1 | 1 | 1 |
| 2 | 09:01 | 2 | 1 | 1 |
| 3 | 09:02 | 3 | 1 | 1 |
| 4 | 09:07 | 1 | 2 | 2 |
| 5 | 09:08 | 2 | 2 | 2 |
| 6 | 09:09 | 3 | 2 | 2 |
| 7 | 09:24 | 1 | 3 | 3 |
| 8 | 09:25 | 2 | 3 | 3 |
| 9 | 09:25 | 3 | 3 | 3 |
| 10 | 09:47 | 1 | 1 | 4 |
| 11 | 09:47 | 2 | 1 | 4 |
| 12 | 09:49 | 3 | 1 | 4 |
My problem is that rows 10-12 have the same Sample# as rows 1-3. I need a way to uniquely identify and group each sample. Having the row number or time of the first measurement on the sample would be good.
One other complication is that the measurement number doesn't always start with 1. It's based on measurement locations, and sometimes it skips location 1 and only has locations 2 and 3.
I am going to speculate that you want a unique number assigned to each sample, where now you have repeats.
If so, you can use lag() and a cumulative sum:
select t.*,
sum(case when prev_sample = sample then 0 else 1 end) over (order by row) as new_sample_number
from (select t.*,
lag(sample) over (order by row) as prev_sample
from t
) t;
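To check the idea against the sample data, here is a small sketch that runs the same lag-plus-running-sum pattern on SQLite through Python's sqlite3 module (SQLite 3.25 or later). The column names row_no, meas_no and sample_no are stand-ins chosen for the example, since the real column names are not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (row_no INTEGER, meas_time TEXT, meas_no INTEGER, sample_no INTEGER)"
)
# Sample rows copied from the question (without the desired UID column).
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "09:00", 1, 1), (2, "09:01", 2, 1), (3, "09:02", 3, 1),
    (4, "09:07", 1, 2), (5, "09:08", 2, 2), (6, "09:09", 3, 2),
    (7, "09:24", 1, 3), (8, "09:25", 2, 3), (9, "09:25", 3, 3),
    (10, "09:47", 1, 1), (11, "09:47", 2, 1), (12, "09:49", 3, 1),
])

QUERY = """
SELECT s.row_no,
       s.sample_no,
       -- add 1 whenever the sample number differs from the previous row
       SUM(CASE WHEN s.prev_sample = s.sample_no THEN 0 ELSE 1 END)
           OVER (ORDER BY s.row_no) AS uid
FROM (SELECT t.*,
             LAG(sample_no) OVER (ORDER BY row_no) AS prev_sample
      FROM t) AS s
ORDER BY s.row_no
"""
uids = [uid for _row_no, _sample_no, uid in conn.execute(QUERY)]
print(uids)
```

The approach never looks at the measurement number, so a sample that starts at location 2 is handled the same way. It would, however, merge two consecutive samples that happen to share the same sample number.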

SQL to display value for different dates

I have a table named Reading_Hist containing columns such as Reading, Date, ID. This table contains the history of the readings. For example:
+----+---------+-------------+
| ID | Reading | ReadingDate |
+----+---------+-------------+
| 1 | 12 | 9/12/2018 |
| 2 | 15 | 9/12/2018 |
| 1 | 16 | 9/5/2018 |
| 4 | 1 | 9/12/2018 |
| 3 | 65 | 9/12/2018 |
| 1 | 23 | 8/29/2018 |
| 3 | 25 | 9/5/2018 |
| 2 | 23 | 9/5/2018 |
| 4 | 3 | 9/5/2018 |
+----+---------+-------------+
I want to write a SQL query that displays each ID with its current reading in the first column, then the reading taken a week before, then the reading taken two weeks before, and finally the trend of the reading.
Example Result below.
+----+---------+------+------+-------+
| ID | Current | Wk_1 | Wk_2 | Trend |
+----+---------+------+------+-------+
| 1 | 12 | 16 | 23 | Down |
| 2 | 15 | 23 | NULL | Down |
| 3 | 65 | 25 | NULL | UP |
| 4 | 1 | 3 | NULL | Down |
+----+---------+------+------+-------+
You can use aggregation to get the latest reading date per ID. Then left join the current readings, those from the previous week and those from two weeks ago. Use CASE to calculate the trend.
It could look something like:
SELECT x.id,
       rh2.reading AS "current",  -- quoted because CURRENT is a reserved word in some DBMSs
       rh3.reading AS wk_1,
       rh4.reading AS wk_2,
       CASE
         WHEN rh2.reading > rh3.reading THEN
           'Up'
         WHEN rh2.reading < rh3.reading THEN
           'Down'
         WHEN rh2.reading = rh3.reading THEN
           '-'
       END AS trend
FROM (SELECT rh1.id,
             max(rh1.reading_date) AS reading_date
      FROM reading_hist rh1
      GROUP BY rh1.id) x
LEFT JOIN reading_hist rh2
       ON rh2.id = x.id
      AND rh2.reading_date = x.reading_date
LEFT JOIN reading_hist rh3
       ON rh3.id = x.id
      AND rh3.reading_date = dateadd(day, -7, x.reading_date)
LEFT JOIN reading_hist rh4
       ON rh4.id = x.id
      AND rh4.reading_date = dateadd(day, -14, x.reading_date);
Of course, this requires that there are readings exactly 7 and 14 days before the latest reading date of each ID.
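The same join structure can be tried against the sample data with SQLite through Python's sqlite3 module. This is only a sketch under a few assumptions: dates are stored as ISO strings (YYYY-MM-DD), SQLite's date() modifier stands in for dateadd, and the aliases cur, w1, w2 and current_reading are names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading_hist (id INTEGER, reading INTEGER, reading_date TEXT)")
# Sample rows copied from the question, with dates rewritten as ISO strings.
conn.executemany("INSERT INTO reading_hist VALUES (?, ?, ?)", [
    (1, 12, "2018-09-12"), (2, 15, "2018-09-12"), (1, 16, "2018-09-05"),
    (4, 1, "2018-09-12"), (3, 65, "2018-09-12"), (1, 23, "2018-08-29"),
    (3, 25, "2018-09-05"), (2, 23, "2018-09-05"), (4, 3, "2018-09-05"),
])

QUERY = """
SELECT x.id,
       cur.reading AS current_reading,
       w1.reading  AS wk_1,
       w2.reading  AS wk_2,
       CASE
         WHEN cur.reading > w1.reading THEN 'Up'
         WHEN cur.reading < w1.reading THEN 'Down'
         WHEN cur.reading = w1.reading THEN '-'
       END AS trend
FROM (SELECT id, MAX(reading_date) AS reading_date
      FROM reading_hist
      GROUP BY id) AS x
LEFT JOIN reading_hist AS cur
       ON cur.id = x.id AND cur.reading_date = x.reading_date
LEFT JOIN reading_hist AS w1
       ON w1.id = x.id AND w1.reading_date = date(x.reading_date, '-7 days')
LEFT JOIN reading_hist AS w2
       ON w2.id = x.id AND w2.reading_date = date(x.reading_date, '-14 days')
ORDER BY x.id
"""
trend_rows = conn.execute(QUERY).fetchall()
for row in trend_rows:
    print(row)
```

When there is no reading from a week earlier, every comparison in the CASE expression is unknown, so trend comes back as NULL for that ID.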

SQL Statement to show columns multiple times

I have a table containing an integer column that represents a work place, an integer column that represents the number of workpieces finished at that workplace and a date column.
I want to create a query that creates rows of the following type
location int | date of Max(workpiece) | max workpieces | Min(Date) | workpieces (Min(Date)) | max(Date) | workpieces (Max(Date))
So I want one row per location containing the date on which the most pieces were finished plus that number of pieces, the oldest date plus the pieces finished on that day, and the newest date plus the pieces finished on that day.
Do I have to join the table with itself 3 times, once for each of these criteria, and then join on location? Is the GROUP BY operator involved? I don't quite have the hang of it.
EDIT: Here's some sample data
+-------+-----------+-----------+-------------------+
| id | location | amount | date |
+-------+-----------+-----------+-------------------+
| 1 | 1 | 10 | 01.01.2016 |
| 2 | 2 | 5 | 01.01.2016 |
| 3 | 1 | 6 | 02.01.2016 |
| 4 | 2 | 35 | 02.01.2016 |
| 5 | 1 | 50 | 03.01.2016 |
| 6 | 2 | 20 | 03.01.2016 |
+-------+-----------+-----------+-------------------+
I want my output to look like this:
loc | dateMaxAmount| MaxAmount | MinDate | AmountMinDate | MaxDate | MaxDateAmount
1 | 03.01.2016 | 50 | 01.01.2016| 10 | 03.01.2016| 50
2 | 02.01.2016 | 35 | 01.01.2016| 5 | 03.01.2016| 20
I am using MS Access.
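One way to read the self-join idea from the question is: aggregate once per location with GROUP BY, then join the table back three times to pick up the row for the maximum amount, the oldest date and the newest date. The sketch below demonstrates that shape on SQLite through Python's sqlite3 module, not on MS Access. Access's SQL dialect differs (for example in how multiple joins must be parenthesized), so it is an illustration only. The table name production and the column work_date are assumptions, and dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE production (id INTEGER, location INTEGER, amount INTEGER, work_date TEXT)"
)
# Sample rows copied from the question (01.01.2016 read as 1 January 2016).
conn.executemany("INSERT INTO production VALUES (?, ?, ?, ?)", [
    (1, 1, 10, "2016-01-01"), (2, 2, 5, "2016-01-01"),
    (3, 1, 6, "2016-01-02"), (4, 2, 35, "2016-01-02"),
    (5, 1, 50, "2016-01-03"), (6, 2, 20, "2016-01-03"),
])

QUERY = """
SELECT g.location,
       mx.work_date  AS date_max_amount,
       g.max_amount,
       g.min_date,
       oldest.amount AS amount_min_date,
       g.max_date,
       newest.amount AS amount_max_date
FROM (SELECT location,
             MAX(amount)    AS max_amount,
             MIN(work_date) AS min_date,
             MAX(work_date) AS max_date
      FROM production
      GROUP BY location) AS g
JOIN production AS mx
  ON mx.location = g.location AND mx.amount = g.max_amount
JOIN production AS oldest
  ON oldest.location = g.location AND oldest.work_date = g.min_date
JOIN production AS newest
  ON newest.location = g.location AND newest.work_date = g.max_date
ORDER BY g.location
"""
location_rows = conn.execute(QUERY).fetchall()
for row in location_rows:
    print(row)
```

If two days tie for the maximum amount at a location, or a location has several rows on the same date, this returns more than one row for that location.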