Performance difference between max and greater than in SQL - sql

While designing a table which has an impact based on date (e.g. Currency Rate), which one is better?
1. Effective date (find out max(Effective date) and get the current value)
2. From Date - To Date condition (with greater than or equal to sysdate)
Regards.

It depends. You have a table representing changes to a particular entity, and you want to record when the entity changed, and when the change was replaced by a subsequent change.
1. If you record just the Effective Date for each event, INSERTs are simpler: they don't need to find the earlier record to update it. However, querying becomes more complex - you'll need to run a window over the dataset to find which record is applicable at any one time, potentially resulting in poorer performance.
Another downside is that this model is open-ended; you can't record a currency as "permanently closed" (which you could do if you had an End date).
2. If you record the Start and End dates for each event, queries are simpler: you only need to look at each row individually to know whether it is "current" as of any point in time or not. INSERTs, however, are slightly more complex - when you insert a new event, you have to update the earlier event to mark its End date.
This model is closed-ended; you can put an End date on the final event and have no "open" record.
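To make the difference concrete, here is a hedged sketch of the two lookup styles; the CurrencyRate table and its columns are hypothetical names, not from the question:

-- Option 1: effective-date-only model; find the latest rate at or before now.
SELECT Rate
FROM CurrencyRate
WHERE Currency = 'EUR'
  AND EffectiveDate = (SELECT MAX(EffectiveDate)
                       FROM CurrencyRate
                       WHERE Currency = 'EUR'
                         AND EffectiveDate <= SYSDATE);

-- Option 2: from/to model; a single range predicate per row.
SELECT Rate
FROM CurrencyRate
WHERE Currency = 'EUR'
  AND FromDate <= SYSDATE
  AND (ToDate IS NULL OR ToDate > SYSDATE);

Note the NULL-able ToDate in option 2: that is one common way to still represent an "open" final record in the closed-ended model.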


Selection Issue

So I have this QVD where records ceased to update incrementally; the last record is dated 25/04/2022. However, I am able to generate new dates to calculate the interest accrued with the expression: RANGESUM(BELOW(SUM(DailyInterest),0,NOOFROWS())) + SUM(Interest).
My challenge is that whenever I select any subsequent date, the amount defaults to 5,263.25, which is the initial amount as of 25/04/2022.
[Screenshot: table without any date selection]
[Screenshot: table with 27/04/2022 selected]
Apparently, in the above scenario, the amount should read 5,409.45 and not 5,263.25.
Help me out here, please!
[Screenshot: table with Daily Interest column]
Disclaimer: there is some guesswork in this answer; it might not work.
Your formula of RANGESUM + BELOW(x, x, NOOFROWS()) accesses the data table and sums all (NOOFROWS) rows with a date less than or equal to the dimension value.
There is no set analysis in your formula; that means that when you make a selection, the other dates' data are removed from the calculation. Obviously 5,263.25 (or maybe 5,263.25 - 73.10) is an initial value, maybe SUM(Interest).
One solution would be to add set analysis that makes DailyInterest ignore your selections:
RANGESUM(BELOW(SUM({1}DailyInterest),0,NOOFROWS())) + SUM(Interest).
That might cause your chart to re-show dates you don't need. In this case, you have the generic problem of "I want to see fewer rows, but not actually select less data". You can solve this by using "Hide zeroes" and adding IFs on every measure to make sure they are equal to zero/null for the dimension values you don't want.
You could, of course, calculate the accrued interest in your data model, which would be simpler.
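If the data originates in a SQL source, a minimal sketch of that upstream calculation (InterestLedger, AccrualDate and DailyInterest are hypothetical names) is a windowed running total:

SELECT AccrualDate,
       DailyInterest,
       SUM(DailyInterest) OVER (ORDER BY AccrualDate
                                ROWS UNBOUNDED PRECEDING) AS AccruedInterest
FROM InterestLedger;

Loading a precomputed AccruedInterest column into the QVD would sidestep both the RANGESUM expression and the selection problem entirely.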

Double or triple timestamp issue

I am using SQL Assistant, and my data brings in snapshots from a huge database in the form of timestamps. Occasionally the snapshots bring in multiples per hour. The data is correct; multiple snapshots do happen from time to time within an hour, not always, but it does happen.
I am bringing this into Spotfire and viewing by hour, and when more than one snapshot happens in the hour, the data shows as doubled.
I only want to display one per hour, preferably the last (max) timestamp for the hour. Example: for the 7 AM hour, the data has a snapshot for 7:10 AM and one for 7:55 AM.
These are correct, but I only want to display the last (max) timestamp, 7:55 AM in this case. I can't figure the issue out in Spotfire, so I am leaning towards a fix in SQL. How can I display only one for each hour?
You'd do this similarly to how you'd probably do it in SQL -- using a ranking/rownumber function.
The basic way Rank in Spotfire works is Rank(Order columns, order direction, partitioned columns, tie method)
You need to partition by the combination of Date and Hour, and then sort descending by your timestamp column.
So the code to identify the rows that you want to isolate should be something along the lines of:
Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first")
What you do with it from here depends on how you plan to use the data. For example, you can Limit Data Using Expression and set the code above = 1, which will limit your table accordingly (helpful if you don't want your users to accidentally forget to filter). Or you can create a calculated column which turns it into a flag of some form, like here:
If(Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first") = 1, "Latest", "Duplicate")
This allows your users to filter by this property; that way, they still have the option to look at the extra rows.
Ultimately, though, if you want to only ever see these rows, and have no use for the earlier records, I'd probably do it in SQL, if you have that ability. This reduces the number of rows you have to load into your analytic.
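As a hedged sketch of that SQL-side approach (Snapshots and SnapshotTime are hypothetical names, and the date functions below are SQL Server flavored; Teradata and others spell them differently), a ROW_NUMBER() window partitioned by day and hour keeps only the latest snapshot per hour:

SELECT *
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY CAST(s.SnapshotTime AS DATE),
                                             DATEPART(HOUR, s.SnapshotTime)
                                ORDER BY s.SnapshotTime DESC) AS rn
      FROM Snapshots s) AS ranked
WHERE rn = 1;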

SQL query for limiting records

I have the following SQL query in the Data Flow of an SSIS package, and I want to limit the records with a cut-off point: the current day/date from the system. It should only display past records, not including today's. I think I need to take the specific date field (in the query it's called 'FinalCloseDate'), compare it with the current system date, and pull only the records (perhaps < today's date) that happened before today.
Add
AND dbo.Producthit.FinalCloseDate < CAST(GETDATE() AS DATE)
to your WHERE clause. Casting GETDATE() to DATE strips the time-of-day portion, so the comparison excludes everything from midnight today onward.
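For context, a minimal sketch of how that condition sits in the query (the selected column list is a placeholder; only the date filter comes from the answer):

SELECT p.FinalCloseDate          -- plus whatever columns the package already selects
FROM dbo.Producthit AS p
WHERE p.FinalCloseDate < CAST(GETDATE() AS DATE);   -- strictly before today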

Dynamically filtering large query result for presentation in SSRS

We have a system that records data captured from field equipment to a SQL Server DB every minute. This data is used for a number of purposes, one of which is charting in reports via SSRS.
The issue is that with such a high volume of data, when a report is run for a period of, for example, 3 months, the volume of data returned causes excessive report rendering times.
I've been thinking of finding a way of dynamically reducing the amount of data returned, based on the start and end time periods chosen. Something along the lines of a sliding scale where, based on the duration between the start and end periods, I can apply different levels of filtering, so that where larger periods are chosen more filtering occurs, while for smaller periods little or no filtering occurs.
There is still a need to be able to produce higher resolution (as in more data points returned) reports for troubleshooting purposes.
For example:
Scenario 1:
User is executing a report for a period of 3 months. The result set returned by the query is reduced for performance reasons without adversely affecting the information the user wants to see (the chart is still representative of the changes over time).
Scenario 2:
User executes the report for a period of 1 hour, in order to look for potential indicator(s) of problems with field devices while troubleshooting the system. For this short time period, no filtering is applied.
My first thought was to use a modulo operation on the primary key of the data (which is an identity field), whereby the divisor is chosen depending on the difference between the start and end dates.
For example: if the difference between the start and end dates for the report execution period is 5 weeks, choose a divisor of 5, apply a mod to the PK, and select the rows where the result equals zero.
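A minimal T-SQL sketch of that idea (table, column, and parameter names are hypothetical; @StartDate and @EndDate would be the report's period parameters):

-- Scale the divisor with the report window: roughly one divisor step per week.
DECLARE @Divisor INT =
    CASE WHEN DATEDIFF(WEEK, @StartDate, @EndDate) > 1
         THEN DATEDIFF(WEEK, @StartDate, @EndDate)
         ELSE 1
    END;

SELECT ReadingId, ReadingTime, ReadingValue
FROM dbo.Readings
WHERE ReadingTime >= @StartDate
  AND ReadingTime < @EndDate
  AND ReadingId % @Divisor = 0;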
I would love to get feedback as to whether this sounds like a valid approach or whether there is a better way to do this.
Thanks.

Opening Hours Database Design

We are currently developing an application in which multiple entities have associated opening hours. Opening hours may span multiple days, or may be contained within a single day.
Ex. Opens Monday at 06:00 and closes Friday at 18:00.
Or
Opens Monday at 06:00 and closes Monday at 15:00.
Also, an entity may have multiple sets of opening hours per day.
So far, the best design I have found is to define an opening hour to consist of the following:
StartDay, StartTime, EndDay and EndTime.
This design allows for all the needed flexibility. However, data integrity becomes an issue. I cannot seem to find a solution that will disallow overlapping spans (in the database).
Please share your thoughts.
EDIT: The database is Microsoft SQL Server 2008 R2
Consider storing your StartDay and StartTime, but then have a value for the number of hours that it's open. This will ensure that your closing datetime is after the opening.
OpenDate -- day of week? e.g. 1 for Monday
OpenTime -- time of day. e.g. 08:00
DurationInHours -- in hours or mins. e.g. 15.5
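A sketch of that design as a table (all names and types here are illustrative, assuming the SQL Server 2008 R2 mentioned in the question):

CREATE TABLE OpeningHours (
    EntityId        INT     NOT NULL,
    OpenDay         TINYINT NOT NULL,   -- day of week, e.g. 1 for Monday
    OpenTime        TIME    NOT NULL,   -- time of day, e.g. '08:00'
    DurationMinutes INT     NOT NULL,   -- e.g. 930 for 15.5 hours
    CHECK (DurationMinutes > 0)         -- guarantees closing comes after opening
);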
Presuming a robust trigger framework: on insert/update you would check whether the new start or end date falls inside any existing range. If it does, you would roll back the change.
CREATE TRIGGER [dbo].[mytable_iutrig] ON [dbo].[mytable] FOR INSERT, UPDATE AS
-- Reject the change if the new start or end falls strictly inside an existing range.
-- The strict inequalities keep a freshly inserted row from matching itself, but note
-- that they also miss a new range that completely contains an existing one.
IF EXISTS (SELECT 1
           FROM inserted
           JOIN mytable
             ON (inserted.startdate < mytable.enddate
                 AND inserted.startdate > mytable.startdate)
             OR (inserted.enddate < mytable.enddate
                 AND inserted.enddate > mytable.startdate))
BEGIN
    RAISERROR ('Time period overlaps an existing record.', 16, 1)
    ROLLBACK TRANSACTION
END
Detecting and preventing overlapping time periods will have to be done at the application level. Of course you can attempt to use a trigger in the database but in my opinion this is not a database issue. The structure that you came up with is fine, but your application logic will have to take care of the overlap.
There's an article by Joe Celko on the SimpleTalk website that discusses a similar issue and presents an elegant if complex solution. This is probably applicable to your situation.
A table with a single column TimeOfChangeBetweenOpeningAndClosing?
More seriously though, I would probably not worry too much about coming up with a single database structure for representing everything; eventually you'll probably want a system involving recurrences, planned closures, etc. Persist objects representing those, and then evaluate them to find out the closing/opening times.
This looks like a good solution, but you'll have to write a custom validation function. The built-in database validation (e.g. unique, less than x, etc.) isn't going to cut it here. To ensure you don't have overlapping spans, every time you insert a record into the database, you're going to have to select existing records and compare...
First the logic: two spans will overlap if the start value of one falls between the start/end of the other. This is much easier if we have combined datetimes, instead of date1, time1 and date2, time2. So a query to find an overlap looks like this:
select o1.openingId
from opening o1
join opening o2
  on o1.openingId <> o2.openingId   -- don't compare a row with itself
 and o1.startDateTime between o2.startDateTime and o2.endDateTime
You can put this into a trigger and throw an error if a match is found.