Is this SQL the most efficient way - sql

We have a table that converts SAT scores into ACT scores, keyed by year. If the conversion changes in the future, we add the new scores along with the year in which they take effect. We need to pass in a year and an SAT score and get back the correct ACT score.
Sample data with three rows would be:
act sat year
28 1010 1998
29 1010 2012
30 1010 2015
If I pass in an SAT score of 1010 and a year of 2014, I should get back an ACT score of 29.
I wrote the following SQL statement that works.
select act,
RANK() OVER(ORDER BY year DESC)
from keessattbl
where sat = 1010 and INT(year) <= 2014
FETCH FIRST ROW ONLY
Is this the most efficient way to handle this?
Thanks in advance Doug

Another option would be to use the following:
select k1.*
from keessattbl k1
where k1.sat = 1010
and k1.year = (select max(k2.year)
from keessattbl k2
where k2.sat = k1.sat
and k2.year <= 2014)
You will need to check which one is more efficient. If year (and possibly sat) is indexed, then both are probably quite fast.
But you will need to look at the execution plan (or simply time the statements) to find out.
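As a rough illustration of the subquery approach, here is a minimal sketch that runs the same lookup against an in-memory SQLite database from Python. The table name and sample rows come from the question; the helper name `act_for` is made up for this example, and SQLite stands in for DB2, so timings will not carry over.

```python
import sqlite3

def act_for(conn, sat, year):
    """Return the ACT score in effect for `sat` as of `year`, or None."""
    row = conn.execute(
        """
        SELECT k1.act
        FROM keessattbl k1
        WHERE k1.sat = ?
          AND k1.year = (SELECT MAX(k2.year)
                         FROM keessattbl k2
                         WHERE k2.sat = k1.sat
                           AND k2.year <= ?)
        """,
        (sat, year),
    ).fetchone()
    return row[0] if row else None

# Sample data from the question, loaded into a throwaway database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keessattbl (act INTEGER, sat INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO keessattbl VALUES (?, ?, ?)",
    [(28, 1010, 1998), (29, 1010, 2012), (30, 1010, 2015)],
)

print(act_for(conn, 1010, 2014))  # 29
```

A year earlier than every row for that SAT score yields no match, so the helper returns None in that case.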

I would say "Sure." Is it not performing well?
Also, most DBMSs have some way to get the first row of a result set, so you don't need to use DB2 unless you want to.

If you are not sure whether it's the most efficient way to write the query, you can check by running an EXPLAIN on it. Write the query another way, run an EXPLAIN on that too, and compare the costs. IBM provides the IBM Data Studio product for free. You can just right-click on your SQL and select Visual Explain to see the results in the GUI.
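The same compare-the-plans habit can be tried out without DB2. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from Python as a stand-in for DB2's EXPLAIN; the index name `idx_sat_year` and the helper `plan` are invented for the example, and the plan text is SQLite-specific.

```python
import sqlite3

QUERY = """
SELECT act FROM keessattbl
WHERE sat = ? AND year <= ?
ORDER BY year DESC
LIMIT 1
"""

def plan(conn, sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keessattbl (act INTEGER, sat INTEGER, year INTEGER)")

# Plan before any index exists (expected to be a full table scan).
plan_without_index = plan(conn, QUERY, (1010, 2014))

# Plan after adding an index on the two filter columns.
conn.execute("CREATE INDEX idx_sat_year ON keessattbl (sat, year)")
plan_with_index = plan(conn, QUERY, (1010, 2014))

print(plan_without_index)
print(plan_with_index)
```

Comparing the two printed plans shows whether the optimizer picks up the index, which is the same question Visual Explain answers for DB2.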

Related

IBM Cognos Analytics Selecting top 2 from a dataset

I'm working on a report where I need data for the current week and the week before, so that I can compare the two. My data consists of transactions and has a week column, so it looks something like:
Amount - Week
13 - 01
19 - 01
11 - 02
10 - 02
13 - 02
12 - 03
18 - 03
15 - 04
And I want this as the result: the two most recent weeks, each with its sum of Amount:
Week 03: 30
Week 04: 15
Now it is easy to get the most recent week, just a maximum (Week for report), but when I want to select the second largest I get stuck.
I've tried a filter that is basically "Maximum( case when week = maximum(week) then null else week)", but either I have not figured out the syntax or this approach does not work.
The other alternative I tried was the rank() feature, followed by a query that selects rank in (1, 2), but for whatever reason I couldn't get this approach to work and only got the error
The function "to_char" is being used for local processing but is not available as a built-in function, or at least one of its parameters is not supported.
I believe this has something to do with the aggregation (multiple records per occurrence of week). Anyway, I'm kind of stuck and the error messages aren't giving me any clues. I would very much appreciate some help!
RANK should work fine, but it may not work well if you try to get Cognos to do all of the work in one place. I thought I could filter on the ranked data item and set the Application property to After auto aggregation. But I got strange results.
Rather than trying to create one complicated solution, try breaking the problem into smaller, simpler components.
Define Query1
Data Items:
Week = [namespace].[query subject].[Week]
Amount = [namespace].[query subject].[Amount] with the detail aggregation set to Total
Rank = rank([namespace].[query subject].[Week])
Create Query2 and set Query1 as its source.
Data Items:
[Query1].[Week]
[Query1].[Amount]
Detail Filters:
[Query1].[Rank] <= 2
Use Query2 as the source for your list.
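For readers who think in SQL rather than Cognos query objects, the two-query split above can be sketched as an inner aggregate plus an outer filter. This is only an analogy, run in SQLite from Python: the table name `transactions` is made up, and `ORDER BY ... LIMIT 2` replaces the rank filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (amount INTEGER, week TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(13, "01"), (19, "01"), (11, "02"), (10, "02"), (13, "02"),
     (12, "03"), (18, "03"), (15, "04")],
)

# Inner query (like Query1): total the amounts per week.
# LIMIT 2 on the descending week order plays the role of "Rank <= 2".
# Outer query (like Query2): present the two surviving weeks in order.
top_two_weeks = conn.execute(
    """
    SELECT week, total
    FROM (SELECT week, SUM(amount) AS total
          FROM transactions
          GROUP BY week
          ORDER BY week DESC
          LIMIT 2)
    ORDER BY week
    """
).fetchall()

print(top_two_weeks)  # [('03', 30), ('04', 15)]
```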

Rank in powerpivot

In PowerPivot, I am having trouble ranking the rows in Table 1 by Sales within each Year. I want a result like this:
Year Store Sales **Rank**
2013 A 200 3
2013 B 250 2
2013 C 300 1
2014 A 350 2
2014 B 300 3
2014 C 400 1
Which rank function could I use to have this rank result?
Thanks in advance.
Tran,
Probably the smartest way to go is to use the 'X' functions. They can be a bit tricky and non-intuitive, yet they are extremely powerful.
First, create a simple measure to calculate the total sales:
TotalSales:=SUM(Stores[Sales])
Then, use this formula below to calculate the rank (per store per year):
Rank:=RANKX(ALL(Stores[Store]), [TotalSales])
That should do what you are looking for. Once those two measures are ready, create a new PowerPivot table, drag Year and Store onto the rows pane, and add the required values.
The ALL function overrides the row filter applied to Store (but not the one on Year), which is what allows the rank to be calculated per year.
The result should look like this:
Hope this helps.
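To make the "rank per year" logic concrete, here is a small Python sketch of what the measure computes for each row: within one year, a store's rank is one plus the number of stores with strictly higher sales. The function name `rank_within_year` is invented for illustration; this mimics the descending, skip-on-ties ranking and is not how PowerPivot evaluates DAX internally.

```python
from collections import defaultdict

def rank_within_year(rows):
    """rows: iterable of (year, store, sales). Returns {(year, store): rank}."""
    by_year = defaultdict(list)
    for year, store, sales in rows:
        by_year[year].append((store, sales))

    ranks = {}
    for year, stores in by_year.items():
        for store, sales in stores:
            # Rank = 1 + count of stores in the same year with higher sales.
            ranks[(year, store)] = 1 + sum(1 for _, other in stores if other > sales)
    return ranks

# Data from the question.
sales_rows = [
    (2013, "A", 200), (2013, "B", 250), (2013, "C", 300),
    (2014, "A", 350), (2014, "B", 300), (2014, "C", 400),
]
store_ranks = rank_within_year(sales_rows)
for (year, store), rank in sorted(store_ranks.items()):
    print(year, store, rank)
```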

Finding Outliers In SQL

I am very new to SQL and have my data in an Access database (~50k rows) with the following structure
State Year Date Price
CA 2012 1/2/13 5.00
NY 2013 1/2/13 6.00
NY 2013 1/7/13 7.00
A (State, Year) pair, though held in different columns here, represents a vintage (like a wine). So we talk about how the price of "CA 2012" moves throughout the year.
Because some of our data is entered manually into this database, there is opportunity for error. We would like to write a query that flags any suspicious entries for further review.
I have read many different questions and threads on the subject but have not found anything that addresses my main concern: how to find local outliers. The price can move up and down, so a price that is fine for one date range may be an outlier earlier in the year.
Update: I chunked my data into buckets of months so finding local outliers might be easier as a result of that. I'm still looking for good outlier detection methods I can implement in SQL.
Sometimes simple is best. No need for an intro to statistics yet. I would recommend starting with simple grouping. Within a grouped query you can get the average, the minimum, the maximum, and other useful bits of data. Here are a couple of examples to get you started:
SELECT Table1.State, Table1.Yr, Count(Table1.Price) AS CountOfPrice, Min(Table1.Price) AS MinOfPrice, Max(Table1.Price) AS MaxOfPrice, Avg(Table1.Price) AS AvgOfPrice
FROM Table1
GROUP BY Table1.State, Table1.Yr;
Or (in case you want month data included)
SELECT Table1.State, Table1.Yr, Month([Dt]) AS Mnth, Count(Table1.Price) AS CountOfPrice, Min(Table1.Price) AS MinOfPrice, Max(Table1.Price) AS MaxOfPrice
FROM Table1
GROUP BY Table1.State, Table1.Yr, Month([Dt]);
Obviously you'll need to modify the table and field names (Just so you know though- 'Year' and 'Date' are both reserved words and best not used for field names.)
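One way to turn those grouped statistics into a review list is to keep only the groups whose spread looks suspicious. The sketch below is an assumption layered on the answer, not part of it: it flags any (State, Yr, month) group whose maximum price is more than twice its minimum. The table name `prices`, the factor of 2, and the sample rows (including the deliberately mistyped 52.00) are all made up, and it runs on SQLite from Python rather than Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (state TEXT, yr INTEGER, dt TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        ("CA", 2012, "2013-01-02", 5.00),
        ("CA", 2012, "2013-01-09", 5.20),
        ("CA", 2012, "2013-01-16", 52.00),  # hypothetical data-entry slip
        ("NY", 2013, "2013-01-02", 6.00),
        ("NY", 2013, "2013-01-07", 7.00),
    ],
)

def suspicious_groups(conn, max_ratio=2.0):
    """Return monthly groups whose max price exceeds max_ratio * min price."""
    return conn.execute(
        """
        SELECT state, yr, strftime('%m', dt) AS mnth,
               COUNT(price), MIN(price), MAX(price)
        FROM prices
        GROUP BY state, yr, strftime('%m', dt)
        HAVING MAX(price) > ? * MIN(price)
        """,
        (max_ratio,),
    ).fetchall()

flagged = suspicious_groups(conn)
print(flagged)  # [('CA', 2012, '01', 3, 5.0, 52.0)]
```

The threshold is a judgment call; a looser or tighter ratio may suit real price data better.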

SQL YTD for previous years and this year

Wondering if anyone can help with the code for this.
I want to query the data and get 2 entries, one for YTD previous year and one for this year YTD.
Only way I know how to do this is as 2 separate queries with where clauses.. I would prefer to not have to run the query twice.
One column called DatePeriod and populated with 2011 YTD and 2012YTD, would be even better if I could get it to do 2011YTD, 2012YTD, 2011Total, 2012Total... though guessing this is 4 queries.
Thanks
EDIT:
In response to help clear a few things up:
This is being coded in MS SQL.
The data looks like so: (very basic example)
Date | Call_Volume
1/1/2012 | 4
What I would like is to have the Call_Volume summed up, I have queries that group it by week, and others that do it by month. I could pull all the dailies in and do this in Excel but the table has millions of rows so always best to reduce the size of my output.
I currently group by Week, Month and Year and UNION ALL the results so there is one output. That means I have 3 queries accessing the same table, which is a large pain, very slow and not efficient. I can live with that, but now I also need a YTD figure, so it's either one more query or, ideally, a way to add it to the yearly query:
So
DatePeriod | Sum_Calls
2011 Total | 40
2011 YTD | 12
2012 Total | 45
2012 YTD | 15
Hope this makes any sense.
SQL is built to do operations on rows, not columns (you select columns, of course, but aggregate operations are all on rows).
The most standard approach to this is something like:
SELECT SUM(your_table.sales), YEAR(your_table.sale_date)
FROM your_table
GROUP BY YEAR(your_table.sale_date)
Now you'll get one row for each year on record, with no limit to how many years you can process. If you're already grouping by another field, that's fine; you'll then get one row for each year in each of those groups.
Your program can then iterate over the rows and organize/render them however you like.
If you absolutely, positively must have columns instead, you'll be stuck with something like this:
SELECT SUM(IF(YEAR(date) = 2011, sales, 0)) AS total_2011,
SUM(IF(YEAR(date) = 2012, sales, 0)) AS total_2012
FROM your_table
If you're building the query programmatically you can add as many of those column criteria as you need, but I wouldn't count on this running very efficiently.
(These examples are written with some MySQL-specific functions. Corresponding functions exist for other engines but the syntax would be a little different.)
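The conditional-sum idea also covers the YTD case from the question: group by year once, and sum the rows up to a cutoff day inside a CASE expression. This is a minimal sketch on SQLite from Python, not MS SQL; the table name `calls`, the sample rows, and the '03-31' cutoff are invented so that the totals match the question's example output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT, call_volume INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2011-01-15", 5), ("2011-03-01", 7), ("2011-08-20", 28),
        ("2012-01-01", 4), ("2012-02-10", 11), ("2012-09-05", 30),
    ],
)

def totals_and_ytd(conn, cutoff_mm_dd):
    """One pass over the table: yearly total plus sum up to cutoff month-day."""
    return conn.execute(
        """
        SELECT strftime('%Y', call_date) AS yr,
               SUM(call_volume) AS total,
               SUM(CASE WHEN strftime('%m-%d', call_date) <= ?
                        THEN call_volume ELSE 0 END) AS ytd
        FROM calls
        GROUP BY strftime('%Y', call_date)
        ORDER BY yr
        """,
        (cutoff_mm_dd,),
    ).fetchall()

summary = totals_and_ytd(conn, "03-31")
# Reshape into the DatePeriod | Sum_Calls layout the question asks for.
for yr, total, ytd in summary:
    print(f"{yr} Total | {total}")
    print(f"{yr} YTD | {ytd}")
```

Because both sums come from the same GROUP BY, the table is read once instead of once per figure.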

SQL to select records for a specific date given created time and modified time

CONTEXT
I've been asked by my management to "analyze" our issue tracking database - they use it to catalog our internal bugs, etc. My SQL and DB skills are primitive so I need some help.
THE DATA
I received a single table of 3 million records. It accounts for 250K bugs. Each revision of a bug is a row in the table, which is how 250K bugs end up as 3 million records.
The data looks like this
BugID Created Modified AssignedTo Priority Status
27 mar-31-2003 mar-31-2003 mel 2 Open
27 mar-31-2003 apr-01-2003 mel 1 Open
27 mar-31-2003 apr-10-2003 steve 1 Fixed
Thus, I have the complete history of every bug and can see how they have evolved every day.
WHAT I WANT TO ACCOMPLISH
I have a lot of things I've been asked to provide as reports. But the most basic question I have been asked to do is enable someone to look at the bugs as they existed at a specific date.
For example, if someone asked for all the bugs on Mar 1 2003, then bug 27 isn't in the result because it doesn't exist on that day. Or if they asked for the bugs on April 7, they'd see bug 27 and see that it is still marked as Open.
MY SPECIFIC QUESTION
Given the schema I outlined, what SQL query will provide a view of the records on a specific date?
TECHNICAL DETAILS
I am using Microsoft SQL Server 2008
WHAT I'VE TRIED SO FAR
As I said, my SQL skills are primitive. I was able to use WHERE clauses to filter out modifications made after the target date and bugs that didn't exist by the target date, but I wasn't able to pick out the single record that was current on that date.
WITH
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY BugID ORDER BY Modified DESC) AS sequence_id,
*
FROM
yourTable
WHERE
Modified <= @datetime_stamp
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
This assumes you want to see the fixed bugs. If you want to filter out bugs that were fixed 'a long time ago' (say, more than 30 days before the target date), add this...
AND (Status <> 'Fixed' OR Modified >= DATEADD(DAY, -30, @datetime_stamp))
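To check the logic against the sample rows from the question, here is a small sketch of the same ROW_NUMBER pattern run on SQLite from Python. It assumes a SQLite build with window-function support (3.25 or later), stores dates as ISO strings so they compare correctly as text, and uses a made-up table name `bugs` and helper `bugs_as_of`.

```python
import sqlite3

AS_OF_SQL = """
WITH sequenced_data AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY BugID ORDER BY Modified DESC) AS sequence_id,
           BugID, Modified, AssignedTo, Priority, Status
    FROM bugs
    WHERE Modified <= ?
)
SELECT BugID, Modified, AssignedTo, Priority, Status
FROM sequenced_data
WHERE sequence_id = 1
ORDER BY BugID
"""

def bugs_as_of(conn, target_date):
    """Return the latest revision of each bug on or before target_date."""
    return conn.execute(AS_OF_SQL, (target_date,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bugs (BugID INTEGER, Created TEXT, Modified TEXT, "
    "AssignedTo TEXT, Priority INTEGER, Status TEXT)"
)
conn.executemany(
    "INSERT INTO bugs VALUES (?, ?, ?, ?, ?, ?)",
    [
        (27, "2003-03-31", "2003-03-31", "mel", 2, "Open"),
        (27, "2003-03-31", "2003-04-01", "mel", 1, "Open"),
        (27, "2003-03-31", "2003-04-10", "steve", 1, "Fixed"),
    ],
)

print(bugs_as_of(conn, "2003-03-01"))  # [] because bug 27 did not exist yet
print(bugs_as_of(conn, "2003-04-07"))  # [(27, '2003-04-01', 'mel', 1, 'Open')]
```

No separate check on Created is needed here, because a bug's first revision has Modified equal to Created, so the Modified filter already excludes bugs that did not exist yet.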