I have an MS Access database that is used for monitoring my investment portfolio.
I have a series of individual queries that monitor individual stock performance over 1 week, 1 month, 3 month, 6 month intervals, etc. and return the top 5 values. What I want to do now is merge these into a table in a format similar to the following. The tickers shown for each period aren't necessarily the same, or in the same order.
Ticker1 - Gain1Week  | Ticker2 - Gain1Month  | Ticker3 - Gain3Month | Ticker4 - Gain6Month
Top reading for wk   | Top reading for mth   | ...                  | ...
2nd top reading wk   | 2nd top reading mth   | ...                  | ...
etc.
Here is a sample of the individual query for Ticker2 - Gain 1Month
SELECT TOP 5 qryStockOverview01Month.StockID, Round(([CurrentPrice]-[1Month])/[1Month]*100,2) AS Gain01Month
FROM qryStockOverview01Month LEFT JOIN qryStockOverviewNow ON qryStockOverview01Month.StockID = qryStockOverviewNow.StockID
ORDER BY Round(([CurrentPrice]-[1Month])/[1Month]*100,2) DESC;
What I am unsure of is how to join the queries together, given there is no common field. I did try to put an autonumber field (1 - 5) in each query, thinking I could use that as the common field, but I couldn't get that to work.
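For what it's worth, one way to manufacture a common field is to rank each TOP 5 query by its gain and join on that rank. The sketch below is only an outline: the query names (qryTop01Week, qryTop01Month, qryRank01Week, qryRank01Month) are assumptions, and ties in the gain values would produce duplicate ranks. A ranking query, saved once per period, could look like:

SELECT q1.StockID, q1.Gain01Month,
       (SELECT COUNT(*)
        FROM qryTop01Month AS q2
        WHERE q2.Gain01Month > q1.Gain01Month) + 1 AS RankPos
FROM qryTop01Month AS q1;

The ranked queries can then be joined on RankPos (Access needs extra parentheses when chaining more than two joins):

SELECT w.StockID & " - " & w.Gain01Week AS WeekCol,
       m.StockID & " - " & m.Gain01Month AS MonthCol
FROM qryRank01Week AS w
INNER JOIN qryRank01Month AS m ON w.RankPos = m.RankPos;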
For an analysis I need to aggregate the rows of a single table depending on their creation time. Basically, I want to know the count of orders that have been created within a certain period of time before the current order. Can't seem to find the solution to this.
Table structure:
order_id | time_created
---------|-------------
1        | 00:00
2        | 00:01
3        | 00:03
4        | 00:05
5        | 00:10
Expected result:
order_id | count within 3 seconds
---------|-----------------------
1        | 1
2        | 2
3        | 3
4        | 2
5        | 1
Sounds like an application for window functions. But, sadly, that's not quite the case: before PostgreSQL 11, window frames could only be based on row counts, not on actual column values.
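On PostgreSQL 11 or later, a RANGE frame with an offset can express this directly. A minimal sketch, assuming time_created is a timestamp rather than time:

SELECT order_id,
       count(*) OVER (ORDER BY time_created
                      RANGE BETWEEN interval '3 sec' PRECEDING
                                AND CURRENT ROW) AS count_within_3_sec
FROM   tbl
ORDER  BY order_id;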
On any version, a simple query with a LEFT JOIN can do the job:
SELECT t0.order_id
, count(t1.time_created) AS count_within_3_sec
FROM tbl t0
LEFT JOIN tbl t1 ON t1.time_created BETWEEN t0.time_created - interval '3 sec'
AND t0.time_created
GROUP BY 1
ORDER BY 1;
This does not work with the time type used in your minimal demo, since time wraps around at midnight. I suppose it's reasonable to assume timestamp or timestamptz.
Since you include each row itself in the count, an INNER JOIN would work, too. (LEFT JOIN is still more reliable in the face of possible NULL values.)
Or use a LATERAL subquery and you don't need to aggregate on the outer query level:
SELECT t0.order_id
, t1.count_within_3_sec
FROM tbl t0
LEFT JOIN LATERAL (
SELECT count(*) AS count_within_3_sec
FROM tbl t1
WHERE t1.time_created BETWEEN t0.time_created - interval '3 sec'
AND t0.time_created
) t1 ON true
ORDER BY 1;
Related:
Rolling sum / count / average over date interval
For big tables and many rows in the time frame, a procedural solution that walks through the table once will perform better. Like:
Window Functions or Common Table Expressions: count previous rows within range
Alternatives to broken PL/ruby: convert a warehouse journal table
GROUP BY and aggregate sequential numeric values
I'm new to SQL and I would appreciate any advice!
I have a table that stores the history of an order. It includes the following columns: ORDERID, ORDERMILESTONE, NOTES, TIMESTAMP.
There is one TIMESTAMP for every ORDERMILESTONE in an ORDERID and vice versa.
What I want to do is compare the TIMESTAMPs for certain ORDERMILESTONEs to obtain the amount of time it takes to go from start to finish or from order to shipping, etc.
To get this, I have to gather all of the lines for a specific ORDERID and then somehow iterate through them. I was trying to do this by declaring a TVP for each ORDERID, but that is just going to take more time, because some of my datasets are around 20,000 rows long.
What do you recommend? Thanks in advance.
EDIT:
In my problem, I want to find the number of days that the order spends in QA. For example, once an order is placed, we need to make the item requested and then send it to QA. so there's a milestone "Processing" and a milestone "QA". The item could be in "Processing" then "QA" once, and get shipped out, or it could be sent back to QA several times, or back and forth between "Processing" and "Engineering". I want to find the total amount of time that the item spends in QA.
Here's some sample data:
ORDERID | ORDERMILESTONE | NOTES | TIMESTAMP
43 | Placed | newly ordered custom time machine | 07-11-2020 12:00:00
43 | Processing | first time assembling | 07-11-2020 13:00:05
43 | QA | sent to QA | 07-11-2020 13:30:12
43 | Engineering | Engineering is fixing the crank on the time machine that skips even years | 07-12-2020 13:00:02
43 | QA | Sent to QA to test the new crank. Time machine should no longer skip even years. | 07-13-2020 16:00:18
0332AT | Placed | lightsaber custom made with rainbow colors | 07-06-2020 01:00:09
0332AT | Processing | lightsaber being built | 07-06-2020 06:00:09
0332AT | QA | lightsaber being tested | 07-06-2020 06:00:09
I want the total number of days that each order spends with QA.
So I suppose I could create a lookup table that has each QA milestone and its next milestone. Then sum up the difference between each QA milestone and the one that follows. My main issue is that I don't necessarily know how many times the item will need to be sent to QA on each order...
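One way to make that work without knowing in advance how many times an order visits QA is to pair every milestone row with the next one using LEAD(), and then sum the gaps only for the QA rows. A minimal sketch, assuming SQL Server 2012 or later and a history table named OrderMilestones (adjust the names to your schema):

SELECT ORDERID,
       SUM(DATEDIFF(SECOND, [TIMESTAMP], NextTimestamp)) / 86400.0 AS DaysInQA
FROM (
    SELECT ORDERID, ORDERMILESTONE, [TIMESTAMP],
           -- timestamp of whatever milestone comes next for the same order
           LEAD([TIMESTAMP]) OVER (PARTITION BY ORDERID ORDER BY [TIMESTAMP]) AS NextTimestamp
    FROM OrderMilestones
) x
WHERE ORDERMILESTONE = 'QA'
  AND NextTimestamp IS NOT NULL   -- an order whose last recorded milestone is QA has no closing timestamp yet
GROUP BY ORDERID;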
To get the hours to complete a specific milestone for all orders, you can do
select orderid,
DATEDIFF(hh, min(TIMESTAMP), max(TIMESTAMP))
from your_table
where ORDERMILESTONE = 'milestone name'
group by orderid
Assuming you are using SQL Server and your milestones are not repeated, then you can use:
select om.orderid,
datediff(second, min(timestamp), max(timestamp))
from order_milestones om
where milestone in ('milestone1', 'milestone2')
group by om.orderid;
If you want to do this more generally on every row, you can use a cumulative aggregation function:
select om.*,
datediff(second,
timestamp,
min(case when milestone = 'order' then timestamp end) over
(partition by orderid
order by timestamp
rows between current row and unbounded following
)
) as time_to_order
from order_milestones om;
You can create a lookup table taking a milestone and giving you the previous milestone. Then you can left join to it, and left join back to the original table to get the row for the same order at the previous milestone, and compare the dates:
select om.*, datediff(minute, pm.TIMESTAMP, om.TIMESTAMP) as [Minutes]
from OrderMilestones om
left join MilestoneSequence ms on ms.ORDERMILESTONE = om.ORDERMILESTONE
left join OrderMilestones pm on pm.ORDERID = om.ORDERID
and pm.ORDERMILESTONE = ms.PREVIOUSMILESTONE
order by om.TIMESTAMP
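For reference, the lookup table referred to above might be populated along these lines (a sketch with assumed contents; note that a single PREVIOUSMILESTONE per milestone cannot express a flow where, say, QA is reached from both Processing and Engineering):

CREATE TABLE MilestoneSequence (
    ORDERMILESTONE    VARCHAR(50) NOT NULL PRIMARY KEY,
    PREVIOUSMILESTONE VARCHAR(50) NULL
);

INSERT INTO MilestoneSequence (ORDERMILESTONE, PREVIOUSMILESTONE) VALUES
    ('Placed', NULL),
    ('Processing', 'Placed'),
    ('QA', 'Processing'),
    ('Engineering', 'QA');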
I have the following table, let's call it Names:
Name Id Date
Dirk 1 27-01-2015
Jan 2 31-01-2015
Thomas 3 21-02-2015
Next I have another table called Consumption:
Id Date Consumption
1 26-01-2015 30
1 01-01-2015 20
2 01-01-2015 10
2 05-05-2015 20
I think doing this in SQL will be fastest, since the table contains about 1.5 million rows.
The problem is as follows: I would like to match each Id from the Names table with the Consumption table, such that the difference between the dates is the smallest; so we have: Dirk consumes on 27-01-2015 about 30. In case there are two dates with the same difference, I would like to calculate the average consumption over those two dates.
While I know how to join, I do not know how to code the difference part.
Thanks.
DBMS is Microsoft SQL Server 2012.
I believe that my question differs from the one mentioned in the comments, because it is much more complicated since it involves comparison of dates between two tables rather than having one date and comparing it with the rest of the dates in the table.
This is how you could do it in SQL Server:
SELECT Id, Name, AVG(Consumption)
FROM (
SELECT n.Id, Name, Consumption,
RANK() OVER (PARTITION BY n.Id
ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date]))) AS rnk
FROM Names AS n
INNER JOIN Consumption AS c ON n.Id = c.Id ) t
WHERE t.rnk = 1
GROUP BY Id, Name
Using RANK with PARTITION BY n.Id and ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date])) you can locate all matching records per Id: all records with the smallest difference in days are going to have rnk = 1.
Then, using AVG in the outer query, you are calculating the average value of Consumption between all matching records.
I am having performance issues with a set of SQLs that generate the current month's statement in real time.
Customers purchase goods using points from an online system, and a statement containing "open_balance", "point_earned", "point_used", "current_balance" should be generated.
The following shows the shortened schema:
//~200k records
customer: {account_id:string, create_date:timestamp, bill_day:int} //totally 14 fields
//~250k records per month, kept for 6 month
history_point: {point_id:long, account_id:string, point_date:timestamp, point:int} //totally 9 fields
//each customer have maximum of 12 past statements kept
history_statement: {account_id:string, open_date:date, close_date:date, open_balance:int, point_earned:int, point_used:int, close_balance:int} //totally 9 fields
On every bill day, the view should automatically create a new month statement.
i.e. if bill_day is 15, then a transaction done on or after 16 Dec 2013 00:00:00 should belong to the new bill cycle of 16 Dec 2013 00:00:00 - 15 Jan 2014 23:59:59.
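For illustration, the open date of the bill cycle that a given transaction falls into can be derived roughly like this (a sketch against the schema above; Oracle syntax assumed, and bill days late in the month, e.g. 29-31, would need extra handling):

SELECT h.point_id,
       h.account_id,
       CASE
         WHEN EXTRACT(DAY FROM h.point_date) > c.bill_day
           THEN TRUNC(h.point_date, 'MM') + c.bill_day                 -- day after this month's bill day
         ELSE ADD_MONTHS(TRUNC(h.point_date, 'MM'), -1) + c.bill_day   -- day after last month's bill day
       END AS cycle_open_date
FROM history_point h
JOIN customer c ON c.account_id = h.account_id;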
I tried the approach described below:
1. Calculate the last close day for each account (in a materialized view, so that it updates only after a new customer or a past month's statement is inserted into history_statement)
2. Generate a record for each customer for each month that I need to calculate (also in a materialized view)
3. Sieve the point records down to only those within the dates that I will calculate (this takes ~0.1s only)
4. Join 2 with 3 to obtain the points earned and used for each customer in each month
5. Join 4 with 4 on date less than open date to sum the open and close balances
6a. Select from 5 where the open date is less than 1 month old as the current balance (these are not closed yet, and the points reflect what each customer owns now)
6b. All the statements are obtained by a union of history_statement and 5
On a development server, the average response time (200K customers, 1.5M transactions in the current month) is ~3s, which is pretty slow for a web application; on the testing server, where resources are likely to be shared, the average response time (200K customers, ~200k transactions per month for 8 months) is 10-15s.
Does anyone have some idea on writing a query with better approach or to speed up the query?
Related SQL:
2: IV_STCLOSE_2_1_T (Materialized view)
3: IV_STCLOSE_2_2_T (~0.15s)
SELECT ACCOUNT_ID, POINT_DATE, POINT
FROM history_point
WHERE point_date >= (
SELECT MIN(open_date)
FROM IV_STCLOSE_2_1_t
)
4: IV_STCLOSE_3_T (~1.5s)
SELECT p0.account_id, p0.open_date, p0.close_date, COALESCE(SUM(DECODE(SIGN(p.point),-1,p.point)),0) AS point_used, COALESCE(SUM(DECODE(SIGN(p.point),1,p.point)),0) AS point_earned
FROM iv_stclose_2_1_t p0
LEFT JOIN iv_stclose_2_2_t p
ON p.account_id = p0.account_id
AND p.point_date >= p0.open_date
AND p.point_date < p0.close_date + INTERVAL '1' DAY
GROUP BY p0.account_id, p0.open_date, p0.close_date
5: IV_STCLOSE_4_T (~3s)
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
SELECT t1.account_id AS STAT_ACCOUNT_ID, t1.open_date, t1.close_date, t1.open_balance, t1.point_earned AS point_earn, t1.point_used , t1.open_balance + t1.point_earned + t1.point_used AS close_balance
FROM (
SELECT v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used, COALESCE(sum(v2.point_used + v2.point_earned),0) AS OPEN_BALANCE
FROM t v1
LEFT JOIN t v2
ON v1.account_id = v2.account_id
AND v1.OPEN_DATE > v2.OPEN_DATE
GROUP BY v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used
) t1
It turns out that in IV_STCLOSE_4_T the line
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
is problematic.
At first I thought WITH t AS would be faster since IV_STCLOSE_3_T is only evaluated once, but it apparently forced materialization of the whole IV_STCLOSE_3_T, generating over 200k records even though I only need at most 12 of them for a single customer at any time.
With that removed and account_id appropriately indexed, the cost dropped from over 500k to less than 500.
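For reference, the supporting indexes might look something like this (which tables to index, and the second columns, are an assumption based on the schema above):

CREATE INDEX idx_history_point_acct ON history_point (account_id, point_date);
CREATE INDEX idx_history_statement_acct ON history_statement (account_id, open_date);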
I've seen a lot of questions on SO concerning how to group data by a range in a SQL query.
The exact scenarios vary, but the general underlying problem in each is to group by a range of values rather than each discrete value in the GROUP BY column. In other words, to group by a less precise granularity than you're storing in the database table.
This crops up often in the real world when producing things like histograms, calendar representations, pivot tables and other bespoke reporting outputs.
Some example data (tables unrelated):
| OrderHistory | | Staff |
--------------------------- ------------------------
| Date | Quantity | | Age | Name |
--------------------------- ------------------------
|01-Jul-2012 | 2 | | 19 | Barry |
|02-Jul-2012 | 5 | | 53 | Nigel |
|08-Jul-2012 | 1 | | 29 | Donna |
|10-Jul-2012 | 3 | | 26 | James |
|14-Jul-2012 | 4 | | 44 | Helen |
|17-Jul-2012 | 2 | | 49 | Wendy |
|28-Jul-2012 | 6 | | 62 | Terry |
--------------------------- ------------------------
Now let's say we want to use the Date column of the OrderHistory table to group by weeks, i.e. 7-day ranges. Or perhaps group the Staff into 10-year age ranges:
| Week | QtyCount | | AgeGroup | NameCount |
-------------------------------- -------------------------
|01-Jul to 07-Jul | 7 | | 10-19 | 1 |
|08-Jul to 14-Jul | 8 | | 20-29 | 2 |
|15-Jul to 21-Jul | 2 | | 30-39 | 0 |
|22-Jul to 28-Jul | 6 | | 40-49 | 2 |
-------------------------------- | 50-59 | 1 |
| 60-69 | 1 |
-------------------------
GROUP BY Date and GROUP BY Age on their own won't do it.
The most common answers I see (none of which are consistently voted "correct") are to use one or more of:
a bunch of CASE statements, one per grouping
a bunch of UNION queries, with a different WHERE clause per grouping
as I'm working with SQL Server, PIVOT() and UNPIVOT()
a two-stage query using a sub-select, temp table or View construct
Is there an established generic pattern for dealing with such queries?
You can use some of the dimensional modeling techniques, such as fact tables and dimension tables. Order History can act as a fact table with DateKey foreign key relation to a Date dimension.
Date dimension can have a schema such as below:
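(A minimal sketch of such a schema; only DateKey and CalendarWeek are referenced by the query below, the remaining columns are typical calendar attributes and are assumed.)

CREATE TABLE DimDate (
    DateKey       INT  NOT NULL PRIMARY KEY,  -- e.g. 20120701
    [Date]        DATE NOT NULL,
    CalendarWeek  INT  NOT NULL,
    CalendarMonth INT  NOT NULL,
    CalendarYear  INT  NOT NULL
);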
Note that the Date table is pre-filled with data up to N years ahead.
Using an example above, here is a sample query to get the result:
select CalendarWeek, sum(Quantity)
from OrderHistory a
join DimDate b
on a.DateKey = b.DateKey
group by CalendarWeek
For the Staff table, you can store a birthday key instead of the age and let the query calculate the age and the ranges.
As is often the case this SQL problem requires using more than one pattern in composition.
In this case the two you can use are
NTILE
Numbers Table
You can use NTILE to create a set number of groups. However, since you don't have each member of the groups represented, you also need to use a numbers table. Since you're using SQL Server you have it easy, as you don't have to simulate either.
Here's an example for the Staff problem
WITH g AS (
    SELECT NTILE(6) OVER (ORDER BY number) AS grp,
           number
    FROM master..spt_values
    WHERE type = 'P'
      AND number >= 10 AND number <= 69
)
SELECT CAST(MIN(g.number) AS varchar) + ' - ' +
       CAST(MAX(g.number) AS varchar) AS AgeGroup,
       COUNT(s.Age) AS NameCount
FROM g
LEFT JOIN Staff s ON g.number = s.Age
GROUP BY grp
You can apply this to dates as well; it just requires some date-to-day manipulation.
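For example, the week buckets for the OrderHistory sample can be generated from the same numbers table and outer-joined to the orders (a sketch; the anchor date of 1 Jul 2012 and the four weeks are assumptions taken from the sample data):

WITH g AS (
    SELECT DATEADD(DAY, number * 7, '20120701') AS WeekStart,
           DATEADD(DAY, number * 7 + 6, '20120701') AS WeekEnd
    FROM master..spt_values
    WHERE type = 'P' AND number BETWEEN 0 AND 3
)
SELECT g.WeekStart, g.WeekEnd,
       SUM(ISNULL(o.Quantity, 0)) AS QtyCount
FROM g
LEFT JOIN OrderHistory o ON o.[Date] BETWEEN g.WeekStart AND g.WeekEnd
GROUP BY g.WeekStart, g.WeekEnd;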
Take a look at the OVER clause and its associated clauses: PARTITION BY, ROWS, RANGE...

Determines the partitioning and ordering of a rowset before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
My favorite case in this genre is where transactions must be grouped by fiscal quarter or fiscal year. The fiscal quarter or fiscal year boundaries of various enterprises can border on the bizarre.
My favorite way to implement this is to create a separate table for the attributes of a date. Let's call the table "Almanac". One of the columns in this table is the fiscal quarter, and another one is the fiscal year. The key to this table is of course the date. Ten years' worth of data fills about 3,650 rows, plus a few for leap years. You then need a program that can populate this table from scratch; all the enterprise calendar rules are built into this one program.
When you need to group transaction data by fiscal quarter, you just join with this table over date, and then group by fiscal quarter.
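A minimal sketch of that join (the transaction table and column names here are placeholders, not from the original schema):

SELECT a.FiscalQuarter,
       SUM(t.Amount) AS TotalAmount
FROM Transactions t
JOIN Almanac a ON a.CalendarDate = t.TransactionDate
GROUP BY a.FiscalQuarter;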
I figure this pattern could be extended to groupings by other kinds of ranges, but I've never done it myself.
In your first example your intervals are regular, so you can achieve the desired result simply by using functions. Below is an example that gets the data as you require it. The first query keeps the first column in date format (how I would prefer to handle it, doing any formatting outside of SQL); the second does the string conversion for you.
DECLARE @OrderHistory TABLE (Date DATE, Quantity INT)
INSERT @OrderHistory VALUES
('20120701', 2), ('20120702', 5), ('20120708', 1), ('20120710', 3),
('20120714', 4), ('20120717', 2), ('20120728', 6)
SET DATEFIRST 7
SELECT DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date) AS WeekStart,
SUM(Quantity) AS Quantity
FROM @OrderHistory
GROUP BY DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date)
SELECT WeekStart,
SUM(Quantity) AS Quantity
FROM @OrderHistory
CROSS APPLY
( SELECT CONVERT(VARCHAR(6), DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date), 6) + ' to ' +
CONVERT(VARCHAR(6), DATEADD(DAY, 7 - DATEPART(WEEKDAY, Date), Date), 6) AS WeekStart
) ws
GROUP BY WeekStart
Something similar can be done for your age grouping using:
SELECT CAST(FLOOR(Age / 10.0) * 10 AS INT)
However this fails for 30-39 because there is no data for this group.
My stance on the matter would be: if you are doing the query as a one-off, a temp table, CTE or CASE statement should work just fine; this should also extend to reusing the same query on small sets of data.
If you are likely to reuse the grouping, however, or you are dealing with significant amounts of data, then create a permanent table with the ranges defined and indexes applied to any columns required. This is the basis of creating dimensions in OLAP.
Couldn't you treat the age (or date) as a foreign key in a new, tiny table that is just ages (or dates) and their corresponding ranges? A join statement could provide a new table with a column that contains AgeGroups. With the new table you could use the standard group-by method.
It does seem reckless to make a new table for grouping, but it would be easy to create programmatically, and I think it would be easier to maintain (or drop and recreate) than a CASE statement or a WHERE clause. If the result of this query is a one-off, a throwaway SQL statement would probably work best, but I think my method makes the most sense for long-term use.
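A rough sketch of that idea (table and column names assumed); because of the LEFT JOIN and the COUNT over a column from Staff, empty groups such as 30-39 still come back with a count of 0:

CREATE TABLE AgeGroups (
    Age      INT NOT NULL PRIMARY KEY,
    AgeGroup VARCHAR(10) NOT NULL    -- e.g. '30-39', one row per discrete age
);

SELECT ag.AgeGroup,
       COUNT(s.Name) AS NameCount
FROM AgeGroups ag
LEFT JOIN Staff s ON s.Age = ag.Age
GROUP BY ag.AgeGroup;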
Well, some years ago with Oracle DB we did it the following way:
We had two tables: Sessions and Ranges. Ranges had a foreign key that referenced Sessions.
When we needed to perform SQL, we created a new record in Sessions and several new records in Ranges that referred to that session.
Our SQL joined Ranges with filter by Session:
select sum(t.Value), r.Name
from DataTable t
join Ranges r on (r.Session = ? and r.Start <= t.MyDate and t.MyDate < r.Finish) -- the comparison operators were garbled in the original; "Finish" (the range's end column) is an assumed name
group by r.Name
After we got the results, we deleted that record from Sessions, and the records in Ranges were deleted by cascade.
We had a daemon job that purged Sessions of junk records that leaked in exceptional situations (killed processes, etc.).
This worked perfectly. Since that time Oracle has added new SQL clauses, and maybe they could be used instead. But on other RDBMSes this is still a valid way.
Another approach is to create a number of functions such as GET_YEAR_BY_DATE, GET_QUARTER_BY_DATE or GET_WEEK_BY_DATE (they would return the start date of the corresponding period; for example, for any date, GET_YEAR_BY_DATE returns the start date of its year), and then group by them:
select sum(Value), GET_YEAR_BY_DATE(MyDate) from DataTable
group by GET_YEAR_BY_DATE(MyDate)
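One possible implementation of such a function, as a sketch in Oracle PL/SQL (matching the dialect of the earlier example):

CREATE OR REPLACE FUNCTION GET_YEAR_BY_DATE(p_date IN DATE)
RETURN DATE DETERMINISTIC
IS
BEGIN
  -- first day of the year that p_date falls in
  RETURN TRUNC(p_date, 'YYYY');
END;
/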