IBM Cognos Analytics Selecting top 2 from a dataset - sql

I'm working on a report where I need data for the current week and the week before, so that I can compare the two. My data consists of transactions and has a week column, so it looks something like:
Amount - Week
13 - 01
19 - 01
11 - 02
10 - 02
13 - 02
12 - 03
18 - 03
15 - 04
And I want this as the result, the sum of Amount for each of the two most recent weeks:
Week 03: 30
Week 04: 15
Now it's easy to get the most recent week, just a maximum (Week for report), but when I want to select the 2nd largest I'm getting stuck.
I've tried a filter that is basically "Maximum( case when week = maximum(week) then null else week)", but either I have not figured out the syntax or this approach does not work.
The other alternative I tried was the rank() feature with a query that selects rank in (1, 2), but for whatever reason I couldn't get this approach to work and only got the error:
The function "to_char" is being used for local processing but is not available as a built-in function, or at least one of its parameters is not supported.
Which I believe has something to do with the aggregation (multiple records per occurrence of week). Anyway, I'm kind of stuck and the error messages aren't giving me any clues. Would very much appreciate some help!

RANK should work fine, but it may not work well if you try to get Cognos to do all of the work in one place. I thought I could filter on the ranked data item and set the Application property to After auto aggregation. But I got strange results.
Rather than trying to create one complicated solution, try breaking the problem into smaller, simpler components.
Define Query1
Data Items:
Week = [namespace].[query subject].[Week]
Amount = [namespace].[query subject].[Amount] with the detail aggregation set to Total
Rank = rank([namespace].[query subject].[Week])
Create Query2 and set Query1 as its source.
Data Items:
[Query1].[Week]
[Query1].[Amount]
Detail Filters:
[Query1].[Rank] <= 2
Use Query2 as the source for your list.
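
The two-query idea above can be sketched outside Cognos as well. The following is a minimal illustration in Python with an in-memory SQLite database, not Cognos syntax: the table name transactions and the column week_rank are made up for the example, and it assumes a SQLite build with window function support (3.25 or later). The inner query plays the role of Query1 (total per week plus a rank), and the outer query plays the role of Query2 (filter on rank <= 2).

```python
import sqlite3

# Sample data from the question: (amount, week)
rows = [(13, "01"), (19, "01"), (11, "02"), (10, "02"),
        (13, "02"), (12, "03"), (18, "03"), (15, "04")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (amount INTEGER, week TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

sql = """
WITH query1 AS (              -- step 1: total per week, ranked by week
    SELECT week,
           SUM(amount) AS amount,
           RANK() OVER (ORDER BY week DESC) AS week_rank
    FROM transactions
    GROUP BY week
)
SELECT week, amount           -- step 2: keep only the two latest weeks
FROM query1
WHERE week_rank <= 2
ORDER BY week
"""
top_two_weeks = conn.execute(sql).fetchall()
print(top_two_weeks)
```

Because the rank is computed after grouping, each week appears exactly once in the ranked set, which avoids the "multiple records per week" problem the question suspects.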

Related

Need column comprised of data from date two weeks ago for comparison

Let me start by saying that I am somewhat new to SQL/Snowflake and have been putting together queries for roughly 2 months. Some of my query language may not be ideal and I fully understand if there's a better, more efficient way to execute this query. Any and all input is appreciated. Also, this particular query is being developed in Snowflake.
My current query is pulling customer volumes by department and date based on a 45 day window with a 24 day lookback from current date and a 21 day look forward based on scheduled appointments. Each date is grouped based on where it falls within that 45 day window: current week (today through next 7 days), Week 1 (forward-looking days 8-14), and Week 2 (forward-looking days 15-21). I have been working to try and build out a comparison column that, for any date that lands within either the Week 1 or Week 2 group, will pull in prior period volumes from either 14 days prior (Week 1) or 21 days prior (Week 2) but am getting nowhere. Is there a best-practice for this type of column? Generic example of the current output is attached. Please note that the 'Prior Wk' column in the sample output was manually populated in an effort to illustrate the way this column should ideally work.
I have tried several different iterations of count(case...) similar to that listed below; however, the 'Prior Wk' column returns the count of encounters/scheduled encounters for the same day rather than those that occurred 14 or 21 days ago.
Count(Case When datediff(dd,SCHED_DTTM,getdate())
between -21 and -7 then 1 else null end
) as "Prior Wk"
I've tried to use an IFF statement as shown below, but no values return.
(IFF(ENCOUNTER_DATE > dateadd(dd,8,getdate()),
count(case when ENC_STATUS in ('Phone','InPerson') AND
datediff(dd,ENCOUNTER_Date,getdate()) between 7 and 14 then 1
else null end), '0')
) as "Prior Wk"
Also have attempted creating and using a temporary table (example included) but have not managed to successfully pull information from the temp table that didn't completely disrupt my encounter/scheduled counts. Please note for this approach I've only focused on the 14 day group and have not begun to look at the 21 day/Week 2 group. My attempt to use the temp table to resolve the problem centered around the following clause (temp table alias: "Date1"):
CASE when AHS.GL_Number = "DATEVISIT1"."GL_NUMBER" AND
datevisit1.lookback14 = dateadd(dd,14,PE.CONTACT_Date)
then "DATEVISIT1"."ENC_Count"
else null end
as "Prior Wk"
I am extremely appreciative of any insight on the current best practices around pulling prior period data into a column alongside current period data. Any misuse of terminology on my part is not deliberate.
I'm struggling to understand your requirement, but it sounds like you need window functions (https://docs.snowflake.com/en/sql-reference/functions-analytic.html), in this case likely a SUM window function. The LAG window function (https://docs.snowflake.com/en/sql-reference/functions/lag.html) might also be of some help.
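
As a rough illustration of the LAG suggestion, here is a minimal sketch in Python with an in-memory SQLite database rather than Snowflake. The table daily_volume, its columns, and the sample numbers are all hypothetical. It assumes the data has already been aggregated to exactly one row per department per day with no missing dates; with gaps, an offset of 14 rows would no longer mean 14 days, and a self-join on the date would be safer. It also assumes SQLite 3.25 or later for window functions.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_volume (dept TEXT, visit_date TEXT, encounters INTEGER)"
)

# Hypothetical data: 21 consecutive days for one department.
start = date(2024, 1, 1)
rows = [("Cardiology", (start + timedelta(days=i)).isoformat(), 10 + i)
        for i in range(21)]
conn.executemany("INSERT INTO daily_volume VALUES (?, ?, ?)", rows)

sql = """
SELECT dept,
       visit_date,
       encounters,
       -- value from the row 14 positions earlier in the same department
       LAG(encounters, 14) OVER (
           PARTITION BY dept ORDER BY visit_date
       ) AS prior_wk
FROM daily_volume
ORDER BY dept, visit_date
"""
result = conn.execute(sql).fetchall()

# The first 14 days have no prior value (NULL); day 15 sees day 1's count.
print(result[0])
print(result[14])
```

The same LAG(value, offset) OVER (PARTITION BY ... ORDER BY ...) shape exists in Snowflake; an offset of 21 would cover the Week 2 group.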

Using the TABLE_DATE_RANGE function in BigQuery

I'm using BigQuery for the first time in quite a while, so I'm a bit rusty.
I'm using a public dataset that can be found here for Reddit data.
What I'm trying to do is create a query that extracts all data from 2017.
Basically, I want to use the BQ syntax specific equivalent of this, which is written using Standard SQL:
fh-bigquery.reddit_posts.2017*
I know that would involve using the TABLE_DATE_RANGE function, but I'm stumped on the specific wording of it.
If I was using just one of the tables, it would look like this:
SELECT
FORMAT_UTC_USEC(SEC_TO_TIMESTAMP(created_utc)) AS created_date
FROM
[fh-bigquery:reddit_posts.2017_06]
LIMIT
10
But I'm obviously trying to span this over multiple months.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
TIMESTAMP_SECONDS(created_utc) AS created_date
FROM `fh-bigquery.reddit_posts.2017_*`
LIMIT 10
It does what your single-table query does, but for all of the 2017 tables. (I'm not sure what logic you actually need in your query, but I assume you left it out of the question for simplicity's sake.)
Note: you can use _TABLE_SUFFIX in your query to identify exactly which table a specific row comes from, for example:
#standardSQL
SELECT
_TABLE_SUFFIX AS month,
COUNT(1) AS records
FROM `fh-bigquery.reddit_posts.2017_*`
GROUP BY month
ORDER BY month
with output as below
month records
----- ---------
01 9,218,513
02 8,588,120
03 9,616,340
04 9,211,051
05 9,498,553
06 9,597,725
07 9,989,122
08 10,424,133
09 9,787,604
10 10,281,718
If for whatever reason you are still bound to BigQuery Legacy SQL, you can use the query below:
#legacySQL
SELECT
FORMAT_UTC_USEC(SEC_TO_TIMESTAMP(created_utc)) AS created_date
FROM TABLE_QUERY([fh-bigquery:reddit_posts], "LEFT(table_id, 5) = '2017_'")
LIMIT 10
But it is highly recommended to migrate to Standard SQL.

How do I perform an "average days since event" type query in mdx?

I have a cube that I have in the past used to report on counts of events. Let's say for the month of July I want to break down the number of events that have occurred for each product.
I'd have something like this:
SELECT Measures.[Count] ON COLUMNS,
Product.[Product Id].Members ON ROWS
FROM [MyCube]
WHERE [Time].[Month].[July 2012]
With an output like:
Count
Car 5
Train 6
Now I want to modify the output to get something like this:
Count Avg Days since sale
Car 5 12
Train 6 14
How do I do that?
I've spent a few hours trying to find a solution to this in MDX, but can't find a way to do this. (I'm very new to MDX)
I've found several solutions that would work if I included days in either COLUMNS or ROWS.
For instance:
WITH MEMBER Measures.[NumDays] as
count([Time].[Date].CurrentMember :
[Time].[Date].&[2012-07-27T00:00:00])
SELECT
([Time].[Date].Members,{Measures.[Count],Measures.[NumDays]}) ON COLUMNS,
[Product].[Product Id].Members ON ROWS
FROM [MyCube]
WHERE [Time].[Month].[July 2012]
This gets me really close. The result is something like:
July 1 July 2 ...
Count Num Days Count Num Days ...
Car 1 24 3 23
Train 0 24 1 23
I could use this result set and get what I want in my .NET code. I can calculate the average days since sale by weighted average.
For car, for instance, it would be (1*24 + 3*23)/(1 + 3). I'd have to do some mangled and unfortunate things in my .NET code to get this to work, and I'd also have to send back about 80 times as much data from the server as I need.
The problem I've been running into with MDX is that as soon as I take away [Time].[Date].Members from the result set, [Time].[Date].CurrentMember is [Time].[Date].All, and I can't do any meaningful calculations from it.
Is there a way to use the second MDX as a subquery, and rollup the values it returns in the way I need to?
The subquery approach isn't necessary for my solution, but I am curious if that can be done. Any help would be appreciated. Thanks in advance.
You can use the Sum function:
WITH MEMBER [Measures].[NumDays] AS
Count([Time].[Date].CurrentMember :
[Time].[Date].&[2012-07-27T00:00:00])
MEMBER [Measures].[avg] AS Sum([Time].[Date].Members, Measures.[Count] * [Measures].[NumDays])/ Sum([Time].[Date].Members, Measures.[Count])
SELECT
{[Measures].[avg]} ON COLUMNS,
[Product].[Product Id].Members ON ROWS
FROM [MyCube]
WHERE [Time].[Month].[July 2012]
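When you remove [Time].[Date].Members from your second query, the current member on the hierarchy related to [Time].[Date] is the default member (which is the All member). Summing over [Time].[Date].Members inside the calculated member iterates over the dates explicitly, so NumDays is evaluated once per date rather than once for the All member.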
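
To make the arithmetic behind the [avg] measure concrete, here is a small Python sketch of the same count-weighted average. It is only an illustration of the formula, not MDX; the function name is made up, and the input uses just the two days visible in the sample output from the question.

```python
def weighted_avg_days(daily):
    """Count-weighted average of days since sale.

    daily: list of (count, num_days) pairs, one pair per date.
    Returns None when there are no events at all (avoids dividing by zero).
    """
    total_count = sum(count for count, _ in daily)
    if total_count == 0:
        return None
    return sum(count * num_days for count, num_days in daily) / total_count


# (Count, Num Days) for July 1 and July 2, taken from the sample result set.
car = [(1, 24), (3, 23)]
train = [(0, 24), (1, 23)]

car_avg = weighted_avg_days(car)      # (1*24 + 3*23) / (1 + 3)
train_avg = weighted_avg_days(train)  # (0*24 + 1*23) / (0 + 1)
print(car_avg, train_avg)
```

The numerator corresponds to Sum(dates, Count * NumDays) and the denominator to Sum(dates, Count) in the MDX above.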

SQL YTD for previous years and this year

Wondering if anyone can help with the code for this.
I want to query the data and get 2 entries, one for YTD previous year and one for this year YTD.
The only way I know how to do this is as 2 separate queries with WHERE clauses, and I would prefer not to have to run the query twice.
Ideally there would be one column called DatePeriod, populated with 2011 YTD and 2012 YTD. It would be even better if I could get 2011 YTD, 2012 YTD, 2011 Total, and 2012 Total, though I'm guessing that means 4 queries.
Thanks
EDIT:
In response to help clear a few things up:
This is being coded in MS SQL.
The data looks like so: (very basic example)
Date | Call_Volume
1/1/2012 | 4
What I would like is to have the Call_Volume summed up. I have queries that group it by week, and others that do it by month. I could pull all the dailies in and do this in Excel, but the table has millions of rows, so it is always best to reduce the size of my output.
I currently group by Week/Month and Year and UNION ALL the results so it's 1 output. That means I have 3 queries accessing the same table, which is a pain and slow, but I can live with it. Now I also need a YTD figure, so it's either 1 more query or, ideally, a way to add it to the yearly query:
So
DatePeriod | Sum_Calls
2011 Total | 40
2011 YTD | 12
2012 Total | 45
2012 YTD | 15
Hope this makes sense.
SQL is built to do operations on rows, not columns (you select columns, of course, but aggregate operations are all on rows).
The most standard approach to this is something like:
SELECT SUM(your_table.sales), YEAR(your_table.sale_date)
FROM your_table
GROUP BY YEAR(your_table.sale_date)
Now you'll get one row for each year on record, with no limit to how many years you can process. If you're already grouping by another field, that's fine; you'll then get one row for each year in each of those groups.
Your program can then iterate over the rows and organize/render them however you like.
If you absolutely, positively must have columns instead, you'll be stuck with something like this:
SELECT SUM(IF(YEAR(sale_date) = 2011, sales, 0)) AS total_2011,
SUM(IF(YEAR(sale_date) = 2012, sales, 0)) AS total_2012
FROM your_table
If you're building the query programmatically you can add as many of those column criteria as you need, but I wouldn't count on this running very efficiently.
(These examples are written with some MySQL-specific functions. Corresponding functions exist for other engines but the syntax would be a little different.)
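
The conditional-sum idea can also be combined with GROUP BY to get each year's total and YTD figure in a single pass over the table. The sketch below is an illustration under assumptions, not the answer's own method: it runs in Python against an in-memory SQLite database, the table calls and the sample rows are invented so that the sums match the figures in the question, and the YTD cutoff (March 31) is a hypothetical stand-in for "today's month and day". The strftime calls are SQLite-specific; MS SQL would need its own date functions for the same comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT, call_volume INTEGER)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", [
    ("2011-01-15", 5), ("2011-03-01", 7), ("2011-09-10", 28),
    ("2012-01-01", 4), ("2012-02-20", 11), ("2012-11-05", 30),
])

cutoff = "03-31"  # hypothetical month-day cutoff for "year to date"

sql = """
SELECT strftime('%Y', call_date) AS yr,
       SUM(call_volume) AS total_calls,
       -- only rows on or before the cutoff day count toward YTD
       SUM(CASE WHEN strftime('%m-%d', call_date) <= ?
                THEN call_volume ELSE 0 END) AS ytd_calls
FROM calls
GROUP BY yr
ORDER BY yr
"""
rows = conn.execute(sql, (cutoff,)).fetchall()

# Reshape into the DatePeriod | Sum_Calls layout from the question.
report = []
for yr, total, ytd in rows:
    report.append((f"{yr} Total", total))
    report.append((f"{yr} YTD", ytd))
print(report)
```

The table is scanned once; the reshaping into one row per period is done on the client, as the answer suggests.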

Is this SQL the most efficient way

We have a table that converts SAT scores into ACT scores using a year. If the data changes in the future, we would add the new scores along with the year the scores changed. We need to pass in a year and an SAT score and return the correct ACT score.
Sample data with three rows would be:
act sat year
28 1010 1998
29 1010 2012
30 1010 2015
If I pass in an SAT score of 1010 and a year of 2014, I should get back an ACT score of 29.
I wrote the following SQL statement that works.
select act,
RANK() OVER(ORDER BY year DESC)
from keessattbl
where sat = 1010 and INT(year) <= 2014
FETCH FIRST ROW ONLY
Is this the most efficient way to handle this?
Thanks in advance Doug
Another option would be to use the following:
select k1.*
from keessattbl k1
where k1.sat = 1010
and k1.year = (select max(k2.year)
from keessattbl k2
where k2.sat = k1.sat
and k2.year <= 2014)
You will need to check which one is more efficient. If year (and possibly sat) is indexed, then both are probably quite fast.
But you will need to look at the execution plan (or simply time the statements) to find out.
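
For a quick check of the MAX subquery variant, here is a minimal sketch in Python with an in-memory SQLite database instead of DB2. The table and column names come from the question; the helper name act_for is made up, and year is stored as an integer here, which is an assumption (the question's INT(year) suggests it may be a character column in the real table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keessattbl (act INTEGER, sat INTEGER, year INTEGER)")
conn.executemany("INSERT INTO keessattbl VALUES (?, ?, ?)", [
    (28, 1010, 1998),
    (29, 1010, 2012),
    (30, 1010, 2015),
])


def act_for(conn, sat, year):
    """Return the ACT score in effect for `sat` as of `year`, or None."""
    row = conn.execute(
        """
        SELECT k1.act
        FROM keessattbl k1
        WHERE k1.sat = ?
          AND k1.year = (SELECT MAX(k2.year)   -- latest year not after `year`
                         FROM keessattbl k2
                         WHERE k2.sat = k1.sat
                           AND k2.year <= ?)
        """,
        (sat, year),
    ).fetchone()
    return row[0] if row else None


print(act_for(conn, 1010, 2014))
print(act_for(conn, 1010, 1997))
```

When no conversion row exists on or before the requested year, the subquery yields NULL, no row matches, and the helper returns None.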
I would say "Sure." Is it not performing well?
Also, most DBMS's have some way to get the first row of a result set, so you don't need to use DB2 unless you want to.
If you are not sure whether it's the most efficient way to write it, you can check by doing an EXPLAIN on the query. Write the query another way, do an EXPLAIN on that, and compare the costs. IBM provides the IBM Data Studio product for free. You can just right-click on your SQL and select Visual Explain to get the results in the GUI.