Calculation based on values in 2 different rows - sql

I have a table in MS Access which has stock prices arranged like
Ticker1, 9:30:00, $49.01
Ticker1, 9:30:01, $49.08
Ticker2, 9:30:00, $102.02
Ticker2, 9:30:01, $102.15
and so on.
I need to do a calculation that compares the price in one row with the price in the immediately previous row (and if the price movement is greater than X% in one second, I need to report the instance separately).
If I were doing this in Excel, it would be a fairly simple formula, but I have a few million rows of data, so that's not an option.
Any suggestions on how I could do it in MS Access?
I am open to any kind of solutions (with or without SQL or VBA).
Update:
I ended up trying to traverse my records by using ADODB.Recordset in nested loops. Code below. I thought it was a good idea, and the logic worked for a small table (20k rows). But when I ran it on a larger table (3m rows), Access ballooned to its 2 GB limit without finishing the task (the growth came from temporary objects; the original table itself was only ~300 MB). Posting it here in case it helps someone with smaller data sets.
Do While Not rstTickers.EOF
    myTicker = rstTickers!ticker
    rstDates.MoveFirst
    Do While Not rstDates.EOF
        myDate = rstDates!Date_Only
        'Get all prices for the given ticker on the given date
        strSql = "select * from Prices where ticker = """ & myTicker & """ and Date_Only = #" & myDate & "#"
        rst.Open strSql, cn, adOpenKeyset, adLockOptimistic 'I needed to do this to open in editable mode
        rst.MoveFirst
        'The first row of the day seeds the running "previous price"
        sPrice1 = rst!Open_Price
        rst!Row_Num = i
        rst.MoveNext
        Do While Not rst.EOF
            i = i + 1
            rst!Row_Num = i
            rst!Previous_Price = sPrice1
            sPrice2 = rst!Open_Price
            rst!Price_Move = Round(Abs((sPrice2 / sPrice1) - 1), 6)
            sPrice1 = sPrice2
            rst.MoveNext 'ADO saves the pending edits when the cursor moves
        Loop
        i = i + 1
        rst.Close
        rstDates.MoveNext
    Loop
    rstTickers.MoveNext
Loop

If the data is always exactly one second apart, without any milliseconds, then you can join the table to itself on the ticker ID and on the time, offset by one second.
Otherwise, if there is no sequence counter of some sort to join on, you will need to create one. You can do this with a "ranking" query. There are multiple approaches to this; you can try each and see which one works fastest in your situation.
One approach is to use a subquery that returns the number of rows that come before the current row. Another approach is to join the table to itself on all the rows before the current one and do a group by and count. Both approaches produce the same results, but depending on the nature of your data, how it's structured, and what indexes you have, one approach will be faster than the other.
Once you have a "rank" column, you follow the procedure described in the first paragraph, but instead of joining on an offset of time, you join on an offset of rank.
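To make the subquery approach concrete, here is a minimal sketch in Access SQL, assuming a Prices table with Ticker, Price_Time, and Open_Price columns (placeholder names). Saved as a query called Ranked, this numbers each row within its ticker by counting the rows that come before it:
SELECT p.Ticker, p.Price_Time, p.Open_Price,
    (SELECT COUNT(*) FROM Prices AS q
     WHERE q.Ticker = p.Ticker AND q.Price_Time < p.Price_Time) AS Rank_Num
FROM Prices AS p;
A second query can then join Ranked to itself on an offset of one rank and flag large moves (1% here stands in for the X% threshold):
SELECT a.Ticker, a.Price_Time, b.Open_Price AS Prev_Price, a.Open_Price
FROM Ranked AS a INNER JOIN Ranked AS b
    ON (a.Ticker = b.Ticker) AND (a.Rank_Num = b.Rank_Num + 1)
WHERE Abs((a.Open_Price / b.Open_Price) - 1) > 0.01;
The correlated COUNT(*) is exactly the part that can get slow on millions of rows, which is why it's worth timing both ranking approaches.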

I ended up moving my data to SQL Server (which had its own issues). I added a row number column (Row_Num) like this
ALTER TABLE Prices ADD Row_Num INT NOT NULL IDENTITY (1,1)
It worked for me (I think) because my underlying data was already in the order I needed. I've read enough comments saying you shouldn't rely on this, because you don't know in what order the server stores the data.
Anyway, after that it was a join on itself. It took me a while to figure out the syntax (I am new to SQL). Adding the SQL here for reference (works on SQL Server but not in Access).
UPDATE A SET Previous_Price = B.Open_Price
FROM Prices A INNER JOIN Prices B
  ON A.Date_Only = B.Date_Only
WHERE A.Ticker = B.Ticker AND A.Row_Num = B.Row_Num + 1;
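For reference, on SQL Server 2012 or later a window function can replace the self-join entirely. A sketch under the same column assumptions:
-- Compute each row's previous price per ticker and day, then write it back.
WITH x AS (
    SELECT Previous_Price,
           LAG(Open_Price) OVER (PARTITION BY Ticker, Date_Only ORDER BY Row_Num) AS Prev_Price
    FROM Prices
)
UPDATE x SET Previous_Price = Prev_Price;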
BTW, I had to first add the column Date_Only like this (this works in Access but not on SQL Server)
UPDATE Prices SET Prices.Date_Only = Format([Time_Date],"mm/dd/yyyy");
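If anyone needs the SQL Server side of that, something like this should work (assuming Time_Date is a datetime column; CONVERT(date, ...) is available from SQL Server 2008 on):
UPDATE Prices SET Date_Only = CONVERT(date, Time_Date);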
I think the solution for row numbers described by @Rabbit should work better (broadly speaking). I just haven't had the time to try it out. It took me a whole day to get this far.

Related

How to query only old and duplicate data from a database in SQL

I'm trying to query my database to pull only duplicate/old data to write to a scratch section in Excel (using a macro passing SQL to the DB).
For now, I'm testing in Access alone to filter out only the old data.
First, I'm trying to filter my database by a specified WorkOrder, RunNumber, and Row.
The code below filters by WorkOrder, RunNumber, and Row, but SQL doesn't like it when I tack on a second AND condition, so this currently isn't working.
SELECT *
FROM DataPoints
WHERE (((DataPoints.[WorkOrder])=[WO2]) AND ((DataPoints.[RunNumber])=6) AND ((DataPoints.[Row]=1)
Once I figure that portion out....
Then, if there is only one entry with the specified WorkOrder, RunNumber, and Row, I want to filter it out (it's not needed in the scratch section, because its data is already written to the main section of my report).
If there are two or more entries with said criteria (WO, RN, and Row), then I want to filter out the newest entry based on RunDate and RunTime, and keep only the older entries.
For instance, in the clip below, the only item remaining in my filtered query would be the top entry with the timestamp 11:47:00 AM.
[screenshot of the example entries]
Are there any recommended commands to complete this problem? Any ideas are helpful. Thank you.
I would suggest something along the lines of the following:
select t.*
from datapoints t
where t.workorder = [WO2]
  and t.runnumber = 6
  and t.row = 1
  and exists
      (select 1
       from datapoints u
       where u.workorder = t.workorder
         and u.runnumber = t.runnumber
         and u.row = t.row
         and (u.rundate > t.rundate or (u.rundate = t.rundate and u.runtime > t.runtime)))
Here, if the correlated subquery within the where clause finds a record with the same workorder, runnumber and row, but with either a later rundate or the same rundate and a later runtime, then the record is returned by the main query.
You need two more )'s at the end of your code snippet. Or you can delete the parentheses completely in this example; MS Access will add them back in as it deems necessary.
MS Access SQL can be tricky: it is not standards-compliant, and it either doesn't allow for very complex queries or needs an ugly workaround, like a parenthesis-nesting nightmare when trying to join more than two tables.
For these reasons, I suggest using multiple Access queries to produce your results.
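For example (a sketch; qryFiltered is a placeholder name), save the filter as its own query:
SELECT *
FROM DataPoints
WHERE DataPoints.[WorkOrder] = [WO2]
  AND DataPoints.[RunNumber] = 6
  AND DataPoints.[Row] = 1;
Then layer the duplicate check on top of it in a second query:
SELECT t.*
FROM qryFiltered AS t
WHERE EXISTS
    (SELECT 1 FROM qryFiltered AS u
     WHERE u.RunDate > t.RunDate
        OR (u.RunDate = t.RunDate AND u.RunTime > t.RunTime));
Because qryFiltered already restricts the rows to one WorkOrder, RunNumber, and Row, the correlated subquery no longer needs to repeat those comparisons.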

MS Access - SQL Inner Join two conditions and Len-Function

I have a database with a table which I use as master and which is being updated and extended on a daily basis by a table with the same layout. Before I update almost the whole master with daily data, I want to test if the values from a specific column changed during the daily update. Usually this column only contains Null or an "X".
As a prototype I compared only the specific column of Table A and Table B and, if there was a difference, set a value with more than one character into the column (here, yesterday's date).
This is the code which worked as a prototype:
UPDATE ReiseMaster
INNER JOIN Update_Import
ON (ReiseMaster.Col3 <> Update_Import.Col3)
SET ReiseMaster.Col3 = Date() - 1
Now the column in the master contains Null, "X", or a date. For the next update I have to make sure that previously updated column values containing a date as a string are excluded (otherwise ReiseMaster.Col3 <> Update_Import.Col3 will always be true for them in the future and the date will always be updated, which is not intended).
My approach was to exclude all datasets from the master table where the length of the values in the column is longer than 1.
Now here is my problem:
Running the SQL code makes MS Access stop responding and the whole program crashes. Can somebody advise me on what could be wrong with the following code?
UPDATE ReiseMaster
INNER JOIN ReiseMaster_Import
ON(ReiseMaster.`Attachment Indicator` <> ReiseMaster_Import.`Attachment Indicator` AND LEN(ReiseMaster.`Attachment Indicator`) <= 1)
SET ReiseMaster.`Attachment Indicator` = Date() - 1
Additional info: I use Access VBA to run the code, including the SQL statements, which are stored in a string. As for why I add a date once I observe a change: I want to use the dates as a reference for when the value first changed, for further analysis at a later stage.
Avoid using complex joins in update queries! Since the entire recordset needs to be updateable, Access tends to have problems with it.
Instead, use a WHERE clause:
UPDATE ReiseMaster
INNER JOIN ReiseMaster_Import
ON(ReiseMaster.[Attachment Indicator] <> ReiseMaster_Import.[Attachment Indicator])
SET ReiseMaster.[Attachment Indicator] = Date() - 1
WHERE LEN(ReiseMaster.[Attachment Indicator]) <= 1
Also, note that Access uses brackets, not backticks, to escape spaces in column names.
Note that if you're not using any information from the joined table, and just use it to select records, you should use an Exists clause instead:
UPDATE ReiseMaster
SET ReiseMaster.[Attachment Indicator] = Date() - 1
WHERE EXISTS(SELECT 1 FROM ReiseMaster_Import WHERE ReiseMaster.[Attachment Indicator] <> ReiseMaster_Import.[Attachment Indicator])
AND LEN(ReiseMaster.[Attachment Indicator]) <= 1

daily difference calculation performance improvement

I need to calculate the daily price difference in percentage. The query I have works but is getting slower every day. The main idea is to calculate the delta with the previous row. The previous row is normally the previous day, but there might sometimes be a day missing. When that happens it needs to take the last day available.
I'm looking for a way to limit the set that I retrieve in the inner query. There are about 20,000 records added per day.
update price_watches pw
set    min_percent_changed = calc.delta
from  (
       select id, product_id, calculation_date,
              (1 - (price_min / lag(price_min) over (order by product_id, calculation_date))) * 100 as delta
       from   price_watches
       where  price_min > 0
      ) calc
where  calc.id = pw.id;
This is wrong on many levels.
1.) It looks like you are updating all rows, including old rows that already have their min_percent_changed set and probably shouldn't be updated again.
2.) You are updating even if the new min_percent_changed is the same as the old.
3.) You are updating rows to store a redundant value that could be calculated on the fly rather cheaply (if done right), thereby making the row bigger and more error prone and producing lots of dead row versions, which means a lot of work for vacuum and slowing down everything else.
You shouldn't be doing any of this.
If you need to materialize the daily delta to optimize read performance, I suggest a small additional 1:1 table that can be updated cheaply without messing with the main table, especially if you recalculate the value for every row every time. But better to calculate only the new data.
If you really want to recalculate for every row (like your current UPDATE seems to do), make that a MATERIALIZED VIEW to automate the process.
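A minimal sketch of that materialized view, assuming the same table and columns as in the question (note the PARTITION BY, which keeps the lag from crossing product boundaries):
CREATE MATERIALIZED VIEW price_deltas AS
SELECT id, product_id, calculation_date,
       (1 - price_min / lag(price_min) OVER (PARTITION BY product_id
                                             ORDER BY calculation_date)) * 100 AS delta
FROM   price_watches
WHERE  price_min > 0;

-- After each daily load:
REFRESH MATERIALIZED VIEW price_deltas;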
If the new query I am going to demonstrate is fast enough, don't store any redundant data and calculate deltas on the fly.
For your current setup, this query should be much faster, when combined with this matching index:
CREATE INDEX price_watches_product_id_calculation_date_idx
ON price_watches(product_id, calculation_date DESC NULLS LAST);
Query:
UPDATE price_watches pw
SET min_percent_changed = calc.delta
FROM price_watches p1
, LATERAL (
SELECT (1 - p1.price_min / p2.price_min) * 100 AS delta
FROM price_watches p2
WHERE p2.product_id = p1.product_id
AND p2.calculation_date < p1.calculation_date
ORDER BY p2.calculation_date DESC NULLS LAST
LIMIT 1
) calc
WHERE p1.price_min > 0
AND p1.calculation_date = current_date - 1 -- only update new rows!
AND pw.id = p1.id
AND pw.min_percent_changed IS DISTINCT FROM calc.delta;
I am restricting the update to rows from "yesterday": current_date - 1. This is a wild guess at what you actually need.
Explanation for the added last line of the query:
How do I (or can I) SELECT DISTINCT on multiple columns?
Similar to this answer on dba.SE from just a few hours ago:
Slow window function query with big table
Proper information in the question would allow me to adapt the query and give more explanation.

SQL Select/From/Where Run Speed

I have a program that is pulling data from a Visual FoxPro table and dumping it into a DataSet with VB.NET. My connection string works great, and the query I'm using usually runs with respectable speed. As I've run it more, however, I've learned that there is a large amount of "bad" data in my table. So now, I'm trying to refine my query to buffer against the "bad" data, but what I thought would be a very small tweak has yielded massive performance losses, and I'm not particularly sure why.
My original query is:
'Pull desired columns for orders that have not "shipped" and were received in past 60 days.
'To "ship", an order must qualify with both an updated ship date and Sales Order #.
sqlSelect = "SELECT job_id,cust_id,total_sale,received,due,end_qty,job_descr,shipped,so "
sqlFrom = "FROM job "
sqlWhere = "WHERE fac = 'North Side' AND shipped < {12/30/1899} AND so = '' AND received >= DATE()-60;"
sql = sqlSelect & sqlFrom & sqlWhere
This has a run-time of about 20 seconds; while I'd prefer it to be quicker, it's not a problem. In my original testing (and occasional debugging), I replaced sqlWhere with sqlWhere = "WHERE job_id = 127350". This runs pretty much instantaneously.
Now the problem block: Once I replaced sqlWhere with
'Find jobs that haven't "shipped" OR were received within last 21 days.
'Recently shipped items are desired in results.
sqlWhere = "WHERE fac = 'North Side' AND ((shipped < {12/30/1899} AND so = '') OR received >= DATE()-21);"
My performance jumped to about 3 min 40 sec. This time is almost exactly the same as the time to run with sqlWhere = "WHERE received >= DATE();".
I'm not the maintainer of these tables; I'm merely pulling from them to create a series of reports for our users. My best guess is that the received field is not indexed and that this is the cause of my performance drop-off. But while my first search returns about 100 records, pulling only today's jobs returns about 5, and it still takes about 11x as long.
So my question is three part:
1) Would someone be able to explain the phenomenon I'm experiencing right now? I feel like I'm somewhat on the right track, but my knowledge of SQL has been limited to circumstantial use within other languages...
2) Is there something I'm missing, or some better way to obtain the results I need? There is a large volume of records that haven't "shipped" simply because the user entered only a ship date or an s/o and not the other. I need a way to view very recent orders (regardless of "shipped" status), and also less recent orders that have "bad" data, so I can get the users in the habit of cleaning up the data.
3) Is it bad SQL practice to overconstrain a WHERE clause? If I run fifteen field comparisons, joined together with nested ANDs/ORs, am I wasting my time when I could be doing something much cleaner?
Many thanks,
B
If your WHERE clause filters on a non-indexed column, the SQL engine must do a table scan, i.e., look at every record in the table.
The difference between the two queries is having the OR instead of the AND. When you have a non-indexed column in an AND, the SQL engine can use the indexes to narrow down the number of records it has to look at for the non-indexed column. When you have an OR, it now must look at every record in the table and compare on that column.
Adding an index on the Received column would probably fix the performance issue.
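If adding that index is not an option, one common workaround (sketched here against the columns from the question) is to split the OR into two queries and UNION them, so each branch can use whatever indexes do exist:
SELECT job_id, cust_id, total_sale, received, due, end_qty, job_descr, shipped, so
FROM job
WHERE fac = 'North Side' AND shipped < {12/30/1899} AND so = ''
UNION
SELECT job_id, cust_id, total_sale, received, due, end_qty, job_descr, shipped, so
FROM job
WHERE fac = 'North Side' AND received >= DATE() - 21
UNION also removes duplicates, so a job matching both branches appears only once.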
In general, there are two things you don't want to have happen in your WHERE clause.
1. A primary condition on a non-indexed column
2. Using a calculation on a column. For example, doing WHERE Shipped-2 < date() is often worse than doing Shipped < Date() + 2, because the former doesn't typically allow the index to be used.
Refining your query through multiple WHERE clauses is generally a good thing. The fewer records you need to return to your application the better your performance will be, but you need to have appropriate indexing in place.

MS Access - Editing a query's sql with vba - change "SELECT Top n"

Context - I'm building an access database that keeps track of sailboat races and calculates overall season scores as well as smaller "series" of scores. However, in a "series" not every race is counted. For example, if a series has 10 races, I only want to count the top 7 races of each individual person.
I have a separate query that calculates the number of races actually counted, based on the total number in each series. The query I am working on now calculates each individual's score by adding up their points for their top "n" races in that series. I don't have extensive knowledge of SQL or VBA, but I was able to figure out how to use "SELECT TOP n" to filter each individual's top scores and then use a SUM to get the total.
The problem I have now is that the "n" has to be adaptable because the series could have any number of races. After some research, I learned that the only way to alter "SELECT TOP" is to use VBA to rewrite the query's definition. I'm not exactly sure how to accomplish this; I don't even know where to put the code that alters the query in VBA.
Again, I don't have much experience in vba, but I'm eager to learn in order to accomplish what I need. Any help is appreciated and I can show my sql if needed.
So, I think you want to store the value of the number of races in a series into a variable, and use that variable in your Top N query.
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim series As Integer
Dim SQL As String

Set db = CurrentDb

'Open the table (or query) and store the number of races to count into a variable.
Set rs = db.OpenRecordset("YourTableNameOrQueryName")
series = rs!YourSeriesCountFieldInTableOrQuery
rs.Close
Set rs = Nothing

'The SQL string would be your query that you have working, as posted in your OP.
'The only difference is the concatenated, dynamic TOP value.
SQL = "SELECT TOP " & series & " races FROM YourTable"

'You can check that you have the right number by setting a breakpoint or
'using Debug.Print SQL to see the finished SQL in the Immediate window.

'db.Execute only runs action queries, so it can't run a SELECT.
'Instead, overwrite the saved query's definition with the new SQL:
db.QueryDefs("YourSavedQueryName").SQL = SQL