QlikView: Conditional based on number of rows in straight table

So in QlikView, I am trying to make a conditional that would only display the table if the table has fewer than 50000 rows. How would I go about doing this?
The table I am working with is used by users to create their own reports. They are able to choose which fields they want to see, and they see those fields next to a calculated value column. I have tried using the RowNo() and NoOfRows() functions but was not able to get anywhere with that. If you have any other ideas, I would appreciate it.
Thanks

Consider that the number of rows will be determined by the number of distinct values of your table's dimension. So you could use:
Count(Distinct myDimension) < 50000
Where myDimension is the dimension of your table (or some concatenation of many dimensions if you have more than one dimension in your table).
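For example, with two dimensions you could concatenate them inside the count (a sketch with placeholder field names, not from the original post):
Count(Distinct Dimension1 & '|' & Dimension2) < 50000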
Chris J's answer should be faster than the above Count(Distinct ...) since it does not require runtime elimination of duplicates, but depending on your data, you may need to create an extra table with a resident load to contain the counter correctly.
In my experience however, users prefer to have a logical limit on their data (something like being forced to select a week) rather than having a fixed limit to the number of records.
You can enforce this kind of limit with a condition like
GetSelectedCount(myWeekField) <= 1

As part of your load script, you should add an additional field to the table that you are loading:
,1 as RecordSum;
Then set a variable in the script
set vRecordSum = sum(RecordSum)
Then, on the straight table, set the Conditional show option to the formula $(vRecordSum) < 50000
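For context, a minimal sketch of how these pieces might fit together in the load script (the table, field and file names here are made up, not from the original answer):
MyData:
LOAD
    OrderID,
    Amount,
    1 as RecordSum
FROM Orders.qvd (qvd);

SET vRecordSum = sum(RecordSum);

// The straight table's Conditional expression $(vRecordSum) < 50000 then
// expands to sum(RecordSum) < 50000 when the chart is evaluated.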

One easy way is to use the following as a condition:
Sum(1) < 50000
Sum(1) should represent the number of rows.

Related

iSeries query changes selected RRN of subquery result rows

I'm trying to make an optimal SQL query for an iSeries database table that can contain millions of rows (perhaps up to 3 million per month). The only key I have for each row is its RRN (relative record number, which is the physical record number for the row).
My goal is to join the table with another small table to give me a textual description of one of the numeric columns. However, the number of rows involved can exceed 2 million, which typically causes the query to fail due to an out-of-memory condition. I want to rewrite the query to avoid joining a large subset with any other table: the idea is to select a single page (up to 30 rows) within a given month, and then join that subset to the second table.
However, I ran into a weird problem. I use the following query to retrieve the RRNs of the rows I want for the page:
select t.RRN2 -- Gives correct RRNs
from (
select row_number() over() as SEQ,
rrn(e2) as RRN2, e2.*
from TABLE1 as e2
where e2.UPDATED between '2013-05-01' and '2013-05-31'
order by e2.UPDATED, e2.ACCOUNT
) as t
where t.SEQ > 270 and t.SEQ <= 300 -- Paging
order by t.UPDATED, t.ACCOUNT
This query works just fine, returning the correct RRNs for the rows I need. However, when I attempted to join the result of the subquery with another table, the RRNs changed. So I simplified the query to a subquery within a simple outer query, without any join:
select rrn(e) as RRN, e.*
from TABLE1 as e
where rrn(e) in (
select t.RRN2 -- Gives correct RRNs
from (
select row_number() over() as SEQ,
rrn(e2) as RRN2, e2.*
from TABLE1 as e2
where e2.UPDATED between '2013-05-01' and '2013-05-31'
order by e2.UPDATED, e2.ACCOUNT
) as t
where t.SEQ > 270 and t.SEQ <= 300 -- Paging
order by t.UPDATED, t.ACCOUNT
)
order by e.UPDATED, e.ACCOUNT
The outer query simply grabs all of the columns of each row selected by the subquery, using the RRN as the row key. But this query does not work - it returns rows with completely different RRNs.
I need the actual RRN, because it will be used to retrieve more detailed information from the table in a subsequent query.
Any ideas about why the RRNs end up different?
Resolution
I decided to break the query into two calls, one to issue the simple subquery and return just the RRNs (row IDs), and the second to do the rest of the JOINs and so forth to retrieve the complete info for each row. (Since the table gets updated only once a day, and rows never get deleted, there are no potential timing problems to worry about.)
This approach appears to work quite well.
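A rough sketch of what the two calls might look like (TABLE2 and its CODE/DESCRIPTION columns are hypothetical placeholders for the small lookup table; the RRN list in call 2 stands in for whatever call 1 returned):
-- Call 1: run the paging subquery shown above and return only the RRNs
-- (select t.RRN2 from ( ... row_number() over() ... ) as t where t.SEQ > 270 and t.SEQ <= 300)

-- Call 2: fetch the full rows, plus the description, for just those RRNs
select rrn(e) as RRN, e.*, d.DESCRIPTION
from TABLE1 as e
join TABLE2 as d on d.CODE = e.CODE   -- TABLE2, CODE, DESCRIPTION are hypothetical names
where rrn(e) in (1234, 1235, 1236)    -- placeholder values returned by call 1
order by e.UPDATED, e.ACCOUNT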
Addendum
As to the question of why an out-of-memory error occurs, this appears to be a limitation on only some of our test servers. Some can only handle up to around 2m rows, while others can handle much more than that. So I'm guessing that this is some sort of limit imposed by the admins on a server-by-server basis.
Trying to use RRN as a primary key is asking for trouble.
I find it hard to believe there isn't a key available.
Granted, there may be no explicit primary key defined in the table itself. But is there a unique key defined in the table?
It's possible there are no keys defined in the table itself (a practice that is 20 years out of date), but in that case there's usually a logical file with a unique key defined that is used by the application as the de facto primary key to the table.
Try looking for related objects via green screen (DSPDBR) or GUI (via "Show related"). Keyed logical files show in the GUI as views. So you'd need to look at the properties to determine if they are uniquely keyed DDS logicals instead of non-keyed SQL views.
A few times I've run into tables with no existing de-facto primary key. Usually, it was possible to figure out what could be defined as one from the existing columns.
When there truly is no PK, I simply add one. Usually a generated identity column. There's a technique you can use to easily add columns without having to recompile or test any heritage RPG/COBOL programs. (and note LVLCHK(*NO) is NOT it!)
The technique is laid out in Chapter 4 of the modernizing Redbook
http://www.redbooks.ibm.com/abstracts/sg246393.html
1) Move the data to a new PF (or SQL table)
2) Create a new LF using the name of the existing PF
3) Repoint the existing LF to the new PF (or SQL table)
Done properly, the record format identifiers of the existing objects don't change and thus you don't have to recompile any RPG/COBOL programs.
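For the "just add one" case, a hedged sketch of a generated identity column on DB2 for i (library and table names are placeholders; behaviour on an already-populated table should be verified for your release):
ALTER TABLE MYLIB.MYTABLE
    ADD COLUMN ID BIGINT NOT NULL
        GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1)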
I find it hard to believe that querying a table of a mere 3 million rows, even when joined with something else, should cause an out-of-memory condition, so in my view you should address this issue first (or cause it to be addressed).
As for your question of why the RRNs end up different, I'll take the liberty of quoting the manual:
If the argument identifies a view, common table expression, or nested table expression derived from more than one base table, the function returns the relative record number of the first table in the outer subselect of the view, common table expression, or nested table expression.
A construct of the type ...where something in (select somethingelse...) typically translates into a join, so there.
Unless you can specifically control it, e.g., via ALWCPYDTA(*NO) for STRSQL, SQL may make copies of result rows for any intermediate set of rows. The RRN() function always accesses physical record number, as contrasted with the ROW_NUMBER() function that returns a logical row number indicating the relative position in an ordered (or unordered) set of rows. If a copy is generated, there is no way to guarantee that RRN() will remain consistent.
Other considerations apply over time; but in this case it's as likely to be simple copying of intermediate result rows as anything.

Fast way to select small sample from huge table

The table I have is huge, about 100+ million entries, and it is ordered by default by 'A'. There can be many items with the same column A; A increases from 0 to... a big number. I tried TABLESAMPLE, but it does not quite select a good number from each A value; it skips some of them, or maybe I am not using it well. So I would like to select the same amount of values from each A value, and I would like the total of selected rows to be a given number, let's say 10 million, or let's call it B.
While it's not exactly clear to me what you need to achieve, when I have needed a large sample subset that is very well distributed between Parent and/or common Attribute values, I have done it like this:
SELECT *
FROM YourTable
WHERE (YourID % 10) = 3
This also has the advantage that you can get another completely different sample just by changing the "3" to another digit. Plus you can change the sub-sample size by adjusting the "10".
You can make use of NEWID():
SELECT TOP 100 *
FROM YourTable
ORDER BY NEWID()
@RBarryYoung's solution is right, generic, and it works for any evenly distributed key, like ID sequences (or any auto-increment column). Sometimes, though, your distribution is not even, or you can run into performance issues (SQL Server has to scan all index entries to evaluate the WHERE clause).
If any of those affects your problem, consider the built-in T-SQL operator TOP that may suit your needs:
SELECT TOP (30) PERCENT *
FROM YourTable;

Fast Way To Estimate Rows By Criteria

I have seen a few posts detailing fast ways to "estimate" the number of rows in a given SQL table without using COUNT(*). However, none of them seem to really solve the problem if you need to estimate the number of rows which satisfy given criteria. I am trying to find a way of estimating the number of rows which satisfy given criteria, but the information for these criteria is scattered around two or three tables. Of course a SELECT COUNT(*) with the NOLOCK hint and a few joins will do, and I can afford under- or over-estimating the total records. The problem is that this kind of query will be running every 5-10 minutes or so, and since I don't need the actual number, only an estimate, I would like to trade off accuracy for speed.
The solution, if any, may be "SQL Server"-specific. In fact, it must be compatible with SQL Server 2005. Any hints?
There is no easy way to do this. You can get an estimate for the total number of rows in a table, e.g. from system catalog views.
But there's no way to do this for a given set of criteria in a WHERE clause - either you would have to keep counts for each set of criteria and values, or you'd have to use black magic to find it out. The only place SQL Server keeps something that goes in that direction is the statistics it maintains on indices. Those contain information about which values occur how frequently in an index - but I quite honestly don't have any idea if (and how) you could leverage that information in your own queries.
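If you want to look at those index statistics yourself, DBCC SHOW_STATISTICS will display them (the index name below is a placeholder, not something from the question):
DBCC SHOW_STATISTICS ('dbo.YourTable', 'IX_YourIndex');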
If you really must know the number of rows matching a certain criteria, you need to do a count of some sort - either a SELECT COUNT(*) FROM dbo.YourTable WHERE (yourcriteria) or something else.
Something else could be something like this:
wrap your SELECT statement into a CTE (Common Table Expression)
define a ROW_NUMBER() in that CTE ordering your data by some column (or set of columns)
add a second ROW_NUMBER() to that CTE that orders your data by the same column (or columns) - but in the opposite direction (DESC vs. ASC)
Something like this:
;WITH YourDataCTE AS
(
SELECT (list of columns you need),
ROW_NUMBER() OVER(ORDER BY <your column>) AS 'RowNum',
ROW_NUMBER() OVER(ORDER BY <your column> DESC) AS 'RowNum2'
FROM
dbo.YourTable
WHERE
<your conditions here>
)
SELECT *
FROM YourDataCTE
Doing this, you would get the following effect:
your first row in your result set will contain your usual data columns
the first ROW_NUMBER() will contain the value 1
the second ROW_NUMBER() will contain the total number of rows that match that criteria set
It's surprisingly good at dealing with small to mid-size result sets; I haven't tried yet how it'll hold up with really large result sets, but it might be something to investigate and see if it works.
Possible solutions:
If the number of matching rows is small in comparison to the total number of rows in the table, then adding indexes that cover the WHERE condition will help, and the query will be very fast.
If the result number is close to the total number of rows in the table, indexes will not help much. You could instead implement a trigger that maintains a 'conditional count table': whenever a row matching the condition is added you increment the value in that table, and when such a row is deleted you decrement it. You then query this small 'summary count table' (a rough sketch follows).
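A rough sketch of the trigger idea, using the same kind of placeholders as the CTE example above (all object names and conditions are stand-ins, not from the original post):
CREATE TABLE dbo.ConditionalCount (MatchCount int NOT NULL);
INSERT INTO dbo.ConditionalCount (MatchCount)
    SELECT COUNT(*) FROM dbo.YourTable WHERE <your conditions here>;  -- seed it once
GO
CREATE TRIGGER trg_YourTable_ConditionalCount
ON dbo.YourTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- add the newly matching rows, subtract the ones that no longer match
    UPDATE dbo.ConditionalCount
    SET MatchCount = MatchCount
        + (SELECT COUNT(*) FROM inserted WHERE <your conditions here>)
        - (SELECT COUNT(*) FROM deleted WHERE <your conditions here>);
END;
GO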

Selecting 'highest' X rows without sorting

I've got a table with a huge amount of data, let's say 10 GB of lines, containing a bunch of crap. I need to select, for example, the X rows (X is usually below 10) with the highest amount column.
Is there any way to do it without sorting the whole table? Sorting this amount of data is extremely time-expensive; I'd be OK with one scan through the whole table that selects the X highest values and leaves the rest untouched. I'm using SQL Server.
Create an index on amount; then SQL Server can select the top 10 from that index and do bookmark lookups to retrieve the missing columns.
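A sketch of such an index (the index name and the descending order are my own choices, not from the answer):
CREATE NONCLUSTERED INDEX IX_myTable_Amount ON myTable (Amount DESC);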
SELECT TOP 10 Amount FROM myTable ORDER BY Amount DESC
If it is indexed, the query optimizer should use the index.
If not, I do not see how one could avoid scanning the whole thing...
Whether an index is useful or not depends on how often you do that search.
You could also consider putting that query into an indexed view. I think this will give you the best benefit/cost ratio.

Optimized way to get x Random rows satisfying given criteria in MySQL

I need to get x rows from a Database Table which satisfy some given criteria.
I know that we can get random rows from MySQL using ORDER BY RAND().
SELECT * FROM `vids` WHERE `cat` = n ORDER BY RAND() LIMIT x
I am looking for the most optimized way to do the same (low usage of system resources is the main priority; the next priority is the speed of the query). Also, in the table design, should I make `cat` an index?
I'm trying to think of how to do this too. My thinking at the moment is the following three alternatives:
1) select random rows ignoring criteria, then throw out ones that do not match at the application level and select more random rows if needed. This method will be effective if your criteria matches lots of rows in your table, perhaps 20% or more (need to benchmark)
2) select rows following the criteria, and choose a row based on a random number between 1 and count(*), with the random number determined in the application (sketched after this answer). This will be effective if the data matching the criteria is evenly distributed, but it will fail terribly if, for example, you are selecting a date range and the majority of random numbers fall upon records outside this range.
3) my current favourite, but also the most work. For every combination of criteria you intend to use to select a random record, you insert a record into a special table for that criteria. You then select random records from the special table, and follow them back to your data. For example, you might have a table like this:
Table cat: name, age, eye_colour, fur_type
If you want to be able to select random cats with brown fur, then you need a table like this:
Table cats_with_brown_fur: id (autonumber), cat_fk
You can then select a random record from this table based on the autonumber id, and it will be fast and will produce evenly distributed random results. But indeed, if you select from many sets of criteria, you will have some overhead in maintaining these tables.
That's my current take on it, anyway. Good luck
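One way to read approach 2 above, sketched as two statements (12345 stands in for a random offset the application would compute between 0 and count - 1; `n` is a placeholder value, as in the question):
SELECT COUNT(*) FROM `vids` WHERE `cat` = n;          -- step 1: how many rows match
SELECT * FROM `vids` WHERE `cat` = n LIMIT 12345, 1;  -- step 2: fetch one row at that random offset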
Order by Rand() is a bad idea.
Here's a better solution:
How can i optimize MySQL's ORDER BY RAND() function?
Google is your friend; a lot of people have explained it better than I ever could.
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
http://www.phpbuilder.com/board/showthread.php?t=10338930
http://www.paperplanes.de/2008/4/24/mysql_nonos_order_by_rand.html