Qlikview - calculate and use calculated variable in script - sql

As a new Qlikview user, I'm looking for the best way to create calculated variables, and variables based on calculated variables, in my data and use them in displays. My data is connected via ODBC.
For example, let's say I want a variable Rating based on the "Risk" variable in my dataset. The raw data contains a Risk variable that is "L" or "H". I would like to create an indicator, like Risk_H, that is 1 if Risk='H' and 0 otherwise. Then I would like to create the Rating like "Rating = 1 + Risk_H*2". Can I do all of this in a script and have the variable Rating in my dataset?
When I try the above, I can create the Risk_H variable, but then I am not sure how to reference it in the script to calculate the Rating variable. I have read other posts that address using the load statement (Qlikview Calculated Fields with Load Script) but have been unsuccessful using calculated variables to create new variables.
Example code (which works):
SQL SELECT *,
case when (Risk = 'H') then 1
else 0
end as Risk_H
FROM [Data];
How can I create Risk_H in order to use it in the same script, like the below? In other settings, I would use something like "calculated Risk_H" to refer to it.
SQL SELECT *,
case when (Risk = 'H') then 1
else 0
end as Risk_H,
(10 + Risk_H*2) as Rating // Qlikview says it can't find Risk_H
FROM [Data];
I've tried creating Risk_H in a load script, but Qlikview doesn't recognize Risk_H in a later SQL statement. I've also tried creating a table with Risk_H and pulling the data from that table. And in reality I'm trying to create 10+ indicators, not just one, so nested case statements aren't the answer.
EDIT: I'm told that resident tables may be the answer to performing calculations. If you can provide syntax for this using tables connected via ODBC that may answer the question.

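It appears that your second SELECT statement is not valid SQL: in standard SQL (and in most databases) a column alias such as Risk_H cannot be referenced elsewhere in the same SELECT list. QlikView passes the SQL statement through ODBC to the database as-is, so the reference to Risk_H fails. You could try a more complicated SQL query with a sub-query to resolve this, or you could use a resident load in QlikView as follows: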
Source_Data:
SQL SELECT *,
case when (Risk = 'H') then 1
else 0
end as Risk_H
FROM [Data];
Calculated_Data:
NOCONCATENATE
LOAD
*,
(10 + Risk_H*2) as Rating
RESIDENT Source_Data;
DROP TABLE Source_Data;
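For comparison, the sub-query route mentioned at the start of this answer can be sketched with Python's built-in sqlite3 module standing in for the ODBC database. The table name Data and the sample rows are hypothetical, and your database may need slightly different syntax (for example, [Data] in square brackets). The point is that the inner query defines Risk_H, so the outer query is allowed to use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the [Data] table behind the ODBC connection.
conn.execute("CREATE TABLE Data (id INTEGER, Risk TEXT)")
conn.executemany("INSERT INTO Data VALUES (?, ?)", [(1, "L"), (2, "H"), (3, "H")])

# The inner query defines the Risk_H alias; the outer query can then reference it.
query = """
SELECT d.*, (10 + d.Risk_H * 2) AS Rating
FROM (
    SELECT *,
           CASE WHEN Risk = 'H' THEN 1 ELSE 0 END AS Risk_H
    FROM Data
) AS d
ORDER BY d.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (id, Risk, Risk_H, Rating)
```

With ten or more indicators this query grows one CASE per indicator, which is why the resident load is usually easier to maintain.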
You also mentioned that you have around 10 indicators you wish to use, so I agree that a CASE statement would probably not be a good idea. If you like, you can move this part into QlikView as well, using a MAPPING load and the ApplyMap function as follows:
Indicator_Map:
MAPPING
LOAD
*
INLINE [
Risk, Value
H, 1
I, 2
J, 3
];
Source_Data:
SQL SELECT *,
case when (Risk = 'H') then 1
else 0
end as Risk_H
FROM [Data];
Calculated_Data:
NOCONCATENATE
LOAD
*,
(10 + (ApplyMap('Indicator_Map',Risk, 0) * 2)) as Rating
RESIDENT Source_Data;
DROP TABLE Source_Data;
I added a couple of extra entries for your Risk "indicators" to give you an idea. Of course, the table doesn't need to be inline; it could come from another SQL statement, another file, etc.
In the above example, the Risk field's value is passed to ApplyMap, which looks it up in the mapping table Indicator_Map and returns the associated value. If the Risk value is not found in the map, ApplyMap returns 0 (the third parameter).
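To make the lookup-with-default behaviour concrete, here is a rough Python analogue: the mapping table acts like a dictionary, and the third ApplyMap argument plays the role of the default returned for unmapped values. This only mimics the semantics described above; it is not how QlikView implements ApplyMap, and the helper names and sample Risk values are illustrative.

```python
# Rough Python analogue of the Indicator_Map mapping table above.
indicator_map = {"H": 1, "I": 2, "J": 3}

def apply_map(mapping, value, default):
    """Mimic ApplyMap('Indicator_Map', Risk, 0): look the value up, else return the default."""
    return mapping.get(value, default)

def rating(risk):
    # Same formula as the resident load: 10 + (mapped indicator * 2)
    return 10 + apply_map(indicator_map, risk, 0) * 2

risks = ["H", "L", "J", "I"]
ratings = [rating(r) for r in risks]
print(ratings)
```

"L" is not in the map, so it falls back to 0 and gets the base rating of 10.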

Related

Write SQL from SAS

I have this code in SAS and I'm trying to write the SQL equivalent. I have no experience in SAS.
data Fulls Fulls_Dupes;
set Fulls;
by name coeff week;
if rid = 0 and ^last.week then output Fulls_Dupes;
else output Fulls;
run;
I tried the following, but it didn't produce the same output:
Select * from Fulls where rid = 0 group by name, coeff, week
Is my SQL query correct?
SQL does not have a concept of observation order. So there is no direct equivalent of the LAST. concept. If you have some variable that is monotonically increasing within the groups defined by distinct values of name, coeff, and week then you could select the observation that has the maximum value of that variable to find the observation that is the LAST.
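To make the SAS logic concrete, the Python sketch below replays the data step on records that are already sorted by name, coeff and week, as the BY statement requires. A record is LAST.week when the next record has a different (name, coeff, week) key. The field names come from the question; the DAY column and the sample rows are hypothetical.

```python
from typing import NamedTuple

class Row(NamedTuple):
    name: str
    coeff: int
    week: int
    rid: int
    day: int  # hypothetical ordering column, as discussed in the answer

def split_like_sas(rows):
    """Replay: if rid = 0 and ^last.week then output Fulls_Dupes; else output Fulls."""
    fulls, dupes = [], []
    for i, row in enumerate(rows):
        key = (row.name, row.coeff, row.week)
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        # LAST.week is true when the next record starts a new BY group (or there is none).
        is_last = nxt is None or (nxt.name, nxt.coeff, nxt.week) != key
        if row.rid == 0 and not is_last:
            dupes.append(row)
        else:
            fulls.append(row)
    return fulls, dupes

# Rows must already be sorted by name, coeff, week (SAS would error otherwise).
data = [
    Row("a", 1, 1, 0, 1),
    Row("a", 1, 1, 7, 2),
    Row("a", 1, 1, 0, 3),
    Row("b", 2, 1, 0, 1),
]
fulls, dupes = split_like_sas(data)
print([r.day for r in fulls], [(r.name, r.day) for r in dupes])
```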
So, for example, if you also had a variable named DAY that uniquely identified and ordered the observations in the same way as they appear in the FULLS dataset now, then you could use the test DAY=MAX(DAY) to find the last observation. In PROC SQL you can use that test directly, because SAS automatically remerges the aggregate value back onto all of the detail observations. In other SQL implementations you might need to add an extra query to get the max.
create table new_FULLS as
select * from FULLS
group by name, coeff, week
having day=max(day) or rid ne 0
;
SQL also has no concept of writing two datasets at once. But in this example the two generated datasets are disjoint and together contain all of the original observations, so you can generate the second from the first using EXCEPT.
So once you have built the new FULLS, you can get FULLS_DUPES from the new FULLS and the old FULLS:
create table FULLS_DUPES as
select * from FULLS
except
select * from new_FULLS
;
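In SQL dialects that do not remerge aggregates the way PROC SQL does (SQLite is one), the "extra query to get the max" can be a sub-query joined back to the detail rows. The sketch below runs both steps with Python's sqlite3 module; DAY is the hypothetical ordering column from the answer, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FULLS (name TEXT, coeff INTEGER, week INTEGER, rid INTEGER, day INTEGER)")
conn.executemany(
    "INSERT INTO FULLS VALUES (?, ?, ?, ?, ?)",
    [("a", 1, 1, 0, 1), ("a", 1, 1, 7, 2), ("a", 1, 1, 0, 3), ("b", 2, 1, 0, 1)],
)

# Step 1: keep the last row of each BY group (max day), plus every row with rid <> 0.
conn.execute("""
CREATE TABLE new_FULLS AS
SELECT f.*
FROM FULLS AS f
JOIN (SELECT name, coeff, week, MAX(day) AS max_day
      FROM FULLS
      GROUP BY name, coeff, week) AS m
  ON f.name = m.name AND f.coeff = m.coeff AND f.week = m.week
WHERE f.day = m.max_day OR f.rid <> 0
""")

# Step 2: every remaining row goes to FULLS_DUPES.
conn.execute("""
CREATE TABLE FULLS_DUPES AS
SELECT * FROM FULLS
EXCEPT
SELECT * FROM new_FULLS
""")

kept = conn.execute("SELECT name, day FROM new_FULLS ORDER BY name, day").fetchall()
dupes = conn.execute("SELECT name, day FROM FULLS_DUPES ORDER BY name, day").fetchall()
print(kept)
print(dupes)
```

Note that EXCEPT also collapses exact duplicate rows; that is harmless here only because DAY makes every row distinct.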

Add new column in SQL based on 2 conditions

I would like to create a new column in SQL using two conditions: it should say 1 if the item count is greater than 1 and item/sum is greater than 0.5, otherwise 0. This column is not in my original data. Is there any way I could add this new column, based on these conditions, using a SELECT statement?
First, I want to say that using SQL keywords or SQL function names as table or column names should be avoided, so if possible, I recommend renaming the columns "number" and "sum".
The second point is that it's unclear what should happen if the sum column is 0. Since division by zero is not possible, you will need to add a condition for that.
Anyway, the way to achieve such things is using CASE WHEN. So let's add the new column:
ALTER TABLE yourtable ADD column column_name INT;
Now, you need to execute an update command for that column providing the logic you want to apply. As an example, you can do this:
UPDATE yourtable SET column_name =
CASE WHEN item <= 1 THEN 0
WHEN sum = 0 THEN 1
WHEN item / sum > 0.5 THEN 1
ELSE 0 END;
This will set the new column to 1 only if item is > 1 and either sum is 0 or item / sum is > 0.5 (greater than 50%). In all other cases it will be set to 0. Again, you can see the problem with the column naming here: "WHEN sum..." looks like you are really computing a sum rather than just reading a column.
If you would rather set the new column to 0 instead of 1 when sum is 0, just change THEN 1 to THEN 0 in the WHEN sum = 0 branch and try it out.
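The ALTER TABLE plus UPDATE approach can be tried quickly with Python's sqlite3 module. Assumptions: the sample values are invented, and item is multiplied by 1.0 because SQLite, like several other databases, truncates integer / integer division (MySQL's / operator does not). The last query shows that the same CASE expression also works in a plain SELECT, which is what the question asked for, without changing the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourtable (id INTEGER, item INTEGER, "sum" INTEGER)')
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 3, 0), (3, 3, 4), (4, 3, 10)],
)

# "sum" is quoted because it is also the name of an aggregate function.
# item * 1.0 forces real division so 3 / 4 is 0.75 rather than 0.
case_expr = """
CASE WHEN item <= 1 THEN 0
     WHEN "sum" = 0 THEN 1
     WHEN item * 1.0 / "sum" > 0.5 THEN 1
     ELSE 0 END
"""

# Option 1: store the flag in a new column.
conn.execute("ALTER TABLE yourtable ADD COLUMN column_name INT")
conn.execute(f"UPDATE yourtable SET column_name = {case_expr}")
stored = conn.execute("SELECT id, column_name FROM yourtable ORDER BY id").fetchall()

# Option 2: compute the same flag on the fly in a SELECT.
selected = conn.execute(
    f"SELECT id, {case_expr} AS flag FROM yourtable ORDER BY id"
).fetchall()
print(stored)
print(selected)
```

The WHEN branches are evaluated in order, so the sum = 0 check runs before the division and avoids dividing by zero.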
If you want to apply this logic automatically to future inserts or updates, you can add a trigger on your table. Something like this:
CREATE TRIGGER set_column_name
BEFORE INSERT ON yourtable
FOR EACH ROW
SET new.column_name = CASE WHEN new.item <= 1 THEN 0
WHEN new.sum = 0 THEN 1
WHEN new.item / new.sum > 0.5 THEN 1
ELSE 0 END;
But take care: the syntax of triggers depends on the DB you are using (this example works in MySQL). Since you did not tell us which DB you use, you may need to modify it. Furthermore, depending on your DB type and your requirements, you need zero, one or two triggers (for updates and for inserts).
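As an example of how trigger syntax varies, SQLite does not allow assigning to NEW.column_name the way the MySQL trigger does. One common workaround, sketched below with Python's sqlite3 module, is an AFTER INSERT trigger that updates the row it just inserted. The table and column names follow the answer; the sample values are invented, and covering updates would need a second trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (item INTEGER, "sum" INTEGER, column_name INT);

-- SQLite cannot modify NEW in a trigger, so update the freshly inserted row instead.
CREATE TRIGGER set_column_name
AFTER INSERT ON yourtable
FOR EACH ROW
BEGIN
    UPDATE yourtable
    SET column_name = CASE WHEN NEW.item <= 1 THEN 0
                           WHEN NEW."sum" = 0 THEN 1
                           WHEN NEW.item * 1.0 / NEW."sum" > 0.5 THEN 1
                           ELSE 0 END
    WHERE rowid = NEW.rowid;
END;
""")

conn.executemany(
    'INSERT INTO yourtable (item, "sum") VALUES (?, ?)',
    [(1, 10), (3, 0), (3, 4), (3, 10)],
)
flags = [r[0] for r in conn.execute("SELECT column_name FROM yourtable ORDER BY rowid")]
print(flags)
```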

Query Snowflake Jobs [duplicate]

Is there any way, within Snowflake or a SQL query, to see which tables are being queried the most, as well as which columns? I want to know which data is of most value to my users and I'm not sure how to do this programmatically. Any thoughts are appreciated. Thank you!
2021 update
The new ACCESS_HISTORY view has this information (in preview at the time of writing; Enterprise Edition).
For example, if you want to find the most used columns:
select obj.value:objectName::string objName
, col.value:columnName::string colName
, count(*) uses
, min(query_start_time) since
, max(query_start_time) until
from snowflake.account_usage.access_history
, table(flatten(direct_objects_accessed)) obj
, table(flatten(obj.value:columns)) col
group by 1, 2
order by uses desc
Ref: https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html
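The two FLATTEN calls in that query unnest first the list of accessed objects and then each object's columns, before grouping. The Python sketch below mirrors that logic on hand-written records shaped like the fields the query reads (objectName, columns, columnName, query_start_time); the sample values are made up, and real ACCESS_HISTORY rows carry more fields than shown.

```python
from collections import defaultdict

# Hypothetical ACCESS_HISTORY-like rows, reduced to the fields the query uses.
access_history = [
    {"query_start_time": "2021-03-01T10:00", "direct_objects_accessed": [
        {"objectName": "DB.S.ORDERS", "columns": [{"columnName": "ID"}, {"columnName": "AMOUNT"}]},
    ]},
    {"query_start_time": "2021-03-02T09:30", "direct_objects_accessed": [
        {"objectName": "DB.S.ORDERS", "columns": [{"columnName": "AMOUNT"}]},
        {"objectName": "DB.S.CUSTOMERS", "columns": [{"columnName": "NAME"}]},
    ]},
]

stats = defaultdict(lambda: {"uses": 0, "since": None, "until": None})
for row in access_history:
    ts = row["query_start_time"]
    for obj in row["direct_objects_accessed"]:        # first FLATTEN
        for col in obj.get("columns", []):            # second FLATTEN
            s = stats[(obj["objectName"], col["columnName"])]  # GROUP BY 1, 2
            s["uses"] += 1                                      # count(*)
            s["since"] = ts if s["since"] is None else min(s["since"], ts)
            s["until"] = ts if s["until"] is None else max(s["until"], ts)

# ORDER BY uses DESC
ranking = sorted(stats.items(), key=lambda kv: kv[1]["uses"], reverse=True)
for (obj_name, col_name), s in ranking:
    print(obj_name, col_name, s["uses"], s["since"], s["until"])
```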
2020 answer
The best I found (for now):
For any given query, you can find what tables are scanned through looking at the plan generated for it:
SELECT *, "objects"
FROM TABLE(EXPLAIN_JSON(SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.any_table_or_view')))
WHERE "operation"='TableScan'
You can also find all of your previously run queries:
select QUERY_TEXT
from table(information_schema.query_history())
So the natural next step would be to combine both, but that's not straightforward, as you'll get an error like:
SQL compilation error: argument 1 to function EXPLAIN_JSON needs to be constant, found 'SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.c')'
The solution would be to combine them outside of SQL: take the query texts from query_history() and call SYSTEM$EXPLAIN_PLAN_JSON on each one separately, so that each argument is a constant string. Then you will be able to find out the most queried tables.
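One way to do that combination is a small client-side script: fetch the query texts, request each plan separately so every argument is a constant, and tally the TableScan objects. The sketch below keeps the Snowflake call behind a hypothetical `explain_plan_json` function (in practice it might run `SELECT SYSTEM$EXPLAIN_PLAN_JSON(...)` through a connector). The walker searches the JSON for any node whose "operation" is "TableScan" instead of assuming an exact nesting, and the fake plans used here are invented.

```python
import json
from collections import Counter
from typing import Callable, Iterable

def table_scans(node):
    """Yield every object name listed under a TableScan operation, at any depth."""
    if isinstance(node, dict):
        if node.get("operation") == "TableScan":
            yield from node.get("objects", [])
        for value in node.values():
            yield from table_scans(value)
    elif isinstance(node, list):
        for item in node:
            yield from table_scans(item)

def most_scanned(queries: Iterable[str], explain_plan_json: Callable[[str], str]) -> Counter:
    """explain_plan_json is hypothetical: it should return the plan JSON text for one query."""
    counts = Counter()
    for text in queries:
        plan = json.loads(explain_plan_json(text))  # one constant query text per call
        counts.update(table_scans(plan))
    return counts

# Invented plans standing in for real SYSTEM$EXPLAIN_PLAN_JSON output.
fake_plans = {
    "SELECT * FROM a.b.t1": {"Operations": [[
        {"operation": "Result"},
        {"operation": "TableScan", "objects": ["A.B.T1"]}]]},
    "SELECT * FROM a.b.t1 JOIN a.b.t2": {"Operations": [[
        {"operation": "TableScan", "objects": ["A.B.T1"]},
        {"operation": "TableScan", "objects": ["A.B.T2"]}]]},
}
counts = most_scanned(fake_plans, lambda q: json.dumps(fake_plans[q]))
print(counts.most_common())
```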

How to use aggregate function to filter a dataset in ssrs 2008

I have a matrix in SSRS 2008 like the one below:
GroupName   Zone   CompletedVolume
Cancer      1      7
Tunnel      1      10
Surgery     1      64
The CompletedVolume value comes from an expression (shown as <<expr>>), which is equal to [Max(CVolume)].
This matrix is filled by a stored procedure that I am not supposed to change, if possible. What I need is to not show the rows whose CompletedVolume is <= 50. I tried going to the tablix properties and adding a filter like [Max(Q9Volume)] >= 50, but when I run the report it says that aggregate functions cannot be used in dataset filters or data region filters. How can I fix this as easily as possible?
Note that adding a WHERE clause to the SQL query would not solve this issue, since many other tables use the same SP and they need the rows where CompletedVolume <= 50. Any help would be appreciated.
EDIT: I am trying to get the max(Q9Volume) value in the SP, but something is happening that I have never seen before. The query is like:
Select r.* from (select * from results1 union select * from results2) r
left outer join procedures p on r.pid = p.id
The interesting thing is that when I run the query, I see some columns that are not included in results1/results2 or in the procedures table. For example, there is no column like Q9Volume in the tables (results1, results2 and procedures), yet when I run the query I see those columns in the output! How is that possible?
You can set the row's Hidden property to True when [Max(CVolume)] is less than or equal to 50.
Select the row and go to Row Visibility.
Select the "Show or hide based on an expression" option and use this expression:
=IIF(
Max(Fields!Q9Volume.Value)<=50,
True,False
)
Note that the maximum values for Cancer and Tunnel are 7 and 10 respectively, so those rows will be hidden if you apply the above expression.
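The visibility rule works per group: the aggregate is computed first, and only then is the row hidden or shown, which is why it works where a data region filter on an aggregate does not. As a quick sanity check of that logic outside SSRS, the Python sketch below applies the same test to hypothetical detail rows whose per-group maximums match the matrix in the question.

```python
from collections import defaultdict

# Hypothetical detail rows; their per-group maximums match the matrix in the question.
detail_rows = [("Cancer", 3), ("Cancer", 7), ("Tunnel", 10), ("Tunnel", 2),
               ("Surgery", 40), ("Surgery", 64)]

# Aggregate first, like Max(Fields!Q9Volume.Value) evaluated in each group's scope.
group_max = defaultdict(int)
for group, volume in detail_rows:
    group_max[group] = max(group_max[group], volume)

def is_hidden(max_volume):
    # Same test as =IIF(Max(Fields!Q9Volume.Value)<=50, True, False)
    return max_volume <= 50

visible = [g for g, m in group_max.items() if not is_hidden(m)]
print(dict(group_max), visible)
```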
Let me know if this helps.

Sql custom statement

What is the SQL equivalent, for a Custom Statement in Microsoft Expression Web, of the following Access database statement:
Sum(IIf([accident]![Rly]='CR',1,0))
I'm not sure you will be able to use this format directly. The IIf(...) part runs against each individual row and returns 1 or 0; to get the total sum over your whole query, you need something similar to this:
SELECT SUM(Rly) FROM
(SELECT ID, CASE WHEN Accident.Rly = 'CR' THEN 1 ELSE 0 END AS Rly
FROM Accident) AS t
The exact syntax may differ slightly depending on your database (some, such as SQL Server, require the alias on the sub-query), but hopefully this will point you in the right direction.
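The sub-query can be tried with Python's sqlite3 module; the Accident rows below are invented. The sketch also runs the more compact SUM(CASE ...) form, which most SQL databases accept as a direct translation of Sum(IIf(...)) and which gives the same total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accident (ID INTEGER, Rly TEXT)")
conn.executemany("INSERT INTO Accident VALUES (?, ?)",
                 [(1, "CR"), (2, "WR"), (3, "CR"), (4, "ER")])

# The answer's approach: flag each row in a sub-query, then sum the flags.
subquery_total = conn.execute("""
SELECT SUM(Rly) FROM
  (SELECT ID, CASE WHEN Accident.Rly = 'CR' THEN 1 ELSE 0 END AS Rly
   FROM Accident) AS t
""").fetchone()[0]

# Direct translation of Sum(IIf([accident]![Rly]='CR',1,0)).
direct_total = conn.execute(
    "SELECT SUM(CASE WHEN Rly = 'CR' THEN 1 ELSE 0 END) FROM Accident"
).fetchone()[0]
print(subquery_total, direct_total)
```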