How to exclude NULL values from distinct count measures on a fact table in SSAS?

I have many distinct count measures in a cube. My problem is that these measures also count the NULL value. I've found two ways to eliminate it:
1. I've created named queries in the data source view, one per measure, each with the condition that the column I need is not NULL (WHERE column IS NOT NULL). This isn't very practical: if many measures must not count NULL, you have to create a lot of fact tables as named queries just to eliminate the NULLs.
2. I've added a named calculation to the fact table that is 1 when the column I need is NULL and 0 otherwise (CASE WHEN Column IS NULL THEN 1 ELSE 0 END). Then I created a Maximum measure on this additional column and a distinct count measure on the column I need. Finally, I created a calculation that tests the following: IIF([measure that I need] - [Maximum of additional column] < 0, null, [measure that I need] - [Maximum of additional column])
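The arithmetic behind the second workaround can be checked outside the cube. This is a plain Python model, not MDX: it assumes, as the question describes, that the cube's distinct count treats NULL as one extra value, and models that with a Python set that keeps None. The sample values are made up.

```python
from typing import Optional


def corrected_distinct_count(values: list) -> Optional[int]:
    """Model of workaround 2: distinct count minus the Maximum of a NULL flag."""
    # Distinct count as described in the question: None (NULL) counts as one value.
    raw = len(set(values))
    # Named calculation CASE WHEN Column IS NULL THEN 1 ELSE 0 END, aggregated with Maximum.
    max_flag = max((1 if v is None else 0 for v in values), default=0)
    diff = raw - max_flag
    # Mirrors the IIF(... < 0, null, ...) guard from the question.
    return None if diff < 0 else diff


def filtered_distinct_count(values: list) -> int:
    """Model of workaround 1: filter NULLs first (WHERE column IS NOT NULL), then count."""
    return len({v for v in values if v is not None})


sample = ["A", "B", "B", None, "C", None]  # hypothetical column values
print(corrected_distinct_count(sample), filtered_distinct_count(sample))  # 3 3
```

Both give 3 here. The flag is aggregated with Maximum rather than Sum because NULL contributes at most one extra distinct value, however many NULL rows there are.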
Both solutions work, but is there a simpler solution than these two, or an option in SSAS that handles this?
If anyone knows, please share.

In SQL you can use
select count(column_name) from table
which doesn't count NULL values, whereas count(*) counts every row, including rows where the column is NULL.
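A quick check of that COUNT behaviour, using Python's built-in sqlite3 module as a stand-in relational engine (SQLite, not SQL Server or SSAS; the table and rows are hypothetical). COUNT(DISTINCT column) skips NULLs in the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (id INTEGER, column_name TEXT)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "b"), (4, None), (5, None)],
)

count_col, count_star, count_distinct = conn.execute(
    "SELECT COUNT(column_name), COUNT(*), COUNT(DISTINCT column_name) FROM fact"
).fetchone()
print(count_col, count_star, count_distinct)  # 3 5 2
```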

Related

How to get data based on two columns from same table in SQL

I wanted to retrieve some data from a table based on two columns see the below table structure
Update
I want the output based on two conditions:
1. The code value is 'Web' or 'Offline'.
2. The Memo column contains the same data as the Pre_memo column.
The output should be as shown below.
So far i got the output by using same table two times but i wanted to get the output result by using the table only 1 time to avoid performance related issues as this table is having huge data.
select OrderTable.Memo,
max(OrderTable.Memo_Date) as Date1,
max(ot.Pre_Memo_Date) as Date2
from OrderTable
inner join OrderTable ot on OrderTable.Memo = ot.Pre_Memo
where OrderTable.code = 'Web'
and ot.code = 'Offline'
group by OrderTable.Memo
Can anyone help with this? I'd like to reference OrderTable only once in the query and filter on the Memo and Pre_memo columns, since they contain the same data.
You can use UNION ALL and then do conditional aggregation:
select Memo, max(case when code = 'Offline' then Date end) as Memo_date,
max(case when code = 'Web' then Date end) as Pre_Memo_date
from (select Date, 'Web' as code, Pre_memo as Memo
from OrderTable o
where code = 'Web'
union all
select Date, 'Offline', Memo
from OrderTable o
where code = 'Offline'
) t
group by Memo;
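A runnable sketch of this answer using sqlite3. The OrderTable columns (Memo, Pre_memo, code, Date) follow the answer's query; the rows are invented, since the question's sample table is not shown. ORDER BY Memo is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderTable (Memo TEXT, Pre_memo TEXT, code TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO OrderTable VALUES (?, ?, ?, ?)",
    [
        ("M1", None, "Offline", "2021-03-01"),
        ("M1", None, "Offline", "2021-03-04"),  # MAX() keeps the later date
        ("W7", "M1", "Web", "2021-02-20"),      # Pre_memo points at M1
        ("M2", None, "Offline", "2021-03-10"),  # no matching Web row
    ],
)

QUERY = """
SELECT Memo,
       MAX(CASE WHEN code = 'Offline' THEN Date END) AS Memo_date,
       MAX(CASE WHEN code = 'Web' THEN Date END) AS Pre_Memo_date
FROM (SELECT Date, 'Web' AS code, Pre_memo AS Memo FROM OrderTable WHERE code = 'Web'
      UNION ALL
      SELECT Date, 'Offline', Memo FROM OrderTable WHERE code = 'Offline') AS t
GROUP BY Memo
ORDER BY Memo
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('M1', '2021-03-04', '2021-02-20'), ('M2', '2021-03-10', None)]
```

Unlike the self-join in the question, this also returns memos that have only one side (M2 here), with NULL for the missing date; a HAVING clause requiring both MAX(CASE ...) expressions to be non-NULL would keep only matched memos. Note also that the UNION ALL still references OrderTable twice.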
"I wanted to retrieve some data from a table based on two columns see the below table structure"
Providing a sample helps illustrate the problem (and doing so is desirable on SO), but it is not a replacement for actually defining the problem, which you have not done.
Without such a definition, we can only guess what you're trying to achieve, e.g.:
From the subset of tuples that have 'Offline' as the 'code' value, take the MAX() 'Date' value for each distinct 'Memo' value.
Match that (using some matching condition) to the subset of tuples that have 'Web' as the 'code' value, and retain the 'Date' value from those as 'Memo_date' in the result set.
The matching condition being that the 'Memo' value of [a tuple in] the former equals the 'Pre_memo' value of [the matching tuple in] the latter.
If all that is correct, then that explains why it is impossible to do this in SQL without having at least two references. You cannot avoid doing some kind of matching, and matching by definition takes two distinct things to match (even if the two distinct things are distinct subsets of one and the same thing). In fact it is almost certainly a fundamental design mistake for you to have those two distinct things in one single table, probably under the totally misguided belief that "having everything in one table makes things easier".
"So far i got the output by using same table two times but i wanted to get the output result by using the table only 1 time to avoid performance related issues as this table is having huge data"
From the way you have presented the question, I suspect that you were hoping for some means to exploit the fact that those 'Offline' tuples are "the next" after a 'Web' tuple, and that you could write the SQL in such a way that the engine could then derive a sort of "single pass" algorithm (which you probably assume would go faster).
It does not work like that. SQL tables have no inherent ordering and as a consequence there simply ain't no such thing as "the next" in a table.

SSAS: Show distinct count measure with unknown member

I have a measure that counts distinct IDs in a fact table.
Let's say it looks like this:
id   linkedtableid   datecolumn
1    someid          date1
2    someid          date1
3    someid          date1
4    someid          date1
5    null            date1
As you can see, there are 5 distinct rows for date1, but in my case the count is 4. I thought this might be related to UnknownMember processing, but that assumption led nowhere. I've tried everything in my cube solution but can't find the reason for this behavior. It seems the row with the NULL value simply isn't counted by the distinct count function.
Also, if I fill in this NULL value in the relational DB and then reprocess the cube, everything is counted correctly.
I've probably missed something, maybe an option somewhere.
Resolved by removing unneeded relationships between the distinct count measure and dimensions. There were two other dimensions, one connected through a direct (regular) relationship and one through a referenced relationship. I don't know why the NULLs were not counted there; perhaps because a row with a NULL key cannot be linked through the referenced relationship.
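A relational analogy for that resolution, using sqlite3 with the sample rows from the question plus a hypothetical linked table. This is not how SSAS processes a referenced relationship internally; it only shows how resolving a relationship by key equality silently drops a row whose key is NULL, because NULL = 'someid' is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (id INTEGER, linkedtableid TEXT, datecolumn TEXT);
CREATE TABLE linked (linkedtableid TEXT PRIMARY KEY);  -- hypothetical dimension table
INSERT INTO linked VALUES ('someid');
INSERT INTO fact VALUES (1, 'someid', 'date1'), (2, 'someid', 'date1'),
                        (3, 'someid', 'date1'), (4, 'someid', 'date1'),
                        (5, NULL, 'date1');
""")

# Counting on the fact table alone sees all five ids.
fact_only = conn.execute(
    "SELECT COUNT(DISTINCT id) FROM fact WHERE datecolumn = 'date1'"
).fetchone()[0]

# Going through the relationship drops id 5, whose key is NULL.
through_join = conn.execute(
    "SELECT COUNT(DISTINCT f.id) FROM fact f "
    "JOIN linked l ON f.linkedtableid = l.linkedtableid "
    "WHERE f.datecolumn = 'date1'"
).fetchone()[0]
print(fact_only, through_join)  # 5 4
```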

String grouping from a single column in an Oracle database with a million rows, and removing duplicates

We have a huge table in which one of the columns contains query strings, e.g. in row 1:
1. (((firstname:Adam OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Individual
and in row 2 of the same column:
2. (((firstname:Adam* OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Organization
There are a few other types of query strings like these, which are eventually used to query external services.
The issue is that, based on certain criteria, I have to group these rows and remove duplicates from the table.
There are a few rules that determine how strings in different rows are grouped. One of them is that if the first name and last name are the same, the category and type values are ignored, so the two rows above would be grouped into one. There are around a million rows, and comparing strings to do the grouping doesn't seem like an elegant solution. What would be the best possible solution using SQL?
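One common way to approach this is to derive a normalized grouping key from each string and then keep one row per key. The Python sketch below shows that idea for the single rule stated in the question; the regex and the normalization are assumptions based only on the two sample rows (in particular, treating 'Adam*' as equal to 'Adam' and ignoring the AND/OR structure). In Oracle, the same key could be computed in SQL with REGEXP_SUBSTR or REGEXP_REPLACE, and duplicates dropped with ROW_NUMBER() OVER (PARTITION BY key ORDER BY ...), which avoids pairwise string comparison.

```python
import re

# Hypothetical pattern: matches terms like "firstname:Adam*" or "lastname:Lee",
# stopping at whitespace or parentheses. Derived only from the two sample rows.
NAME_TERM = re.compile(r"\b(firstname|lastname):([^\s()]+)")


def grouping_key(query: str) -> tuple:
    """Key built from firstname/lastname terms only; category and type are ignored.

    ASSUMPTION: a trailing '*' wildcard and letter case are ignored, and the
    boolean structure (AND/OR, parentheses) is discarded as a simplification.
    """
    terms = {(field, value.rstrip("*").lower()) for field, value in NAME_TERM.findall(query)}
    return tuple(sorted(terms))


def dedupe(queries: list) -> list:
    """Keep the first query string seen for each grouping key, in input order."""
    first_by_key = {}
    for q in queries:
        first_by_key.setdefault(grouping_key(q), q)
    return list(first_by_key.values())


rows = [
    '(((firstname:Adam OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Individual',
    '(((firstname:Adam* OR firstname:Neil ) AND lastname:Lee) ) AND category:"Legal" AND type:Organization',
]
print(len(dedupe(rows)))  # 1
```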

In SSAS, how to remove NULL values from distinct count measures

I have a column in a fact table that contains NULL in some rows. I have a measure based on this column with its aggregate function set to DistinctCount.
This measure counts the NULL value too,
but I don't want it to. What should I do?
The most efficient approach would be to filter out NULL values in the data source view (using a named query, for example). This won't affect performance much, as a distinct count measure is calculated in a separate measure group anyway.
One popular solution that works is to count from a view of the table that filters out the nulls. This works, but I would bet that it requires another scan of the fact table.
Another solution is like fighting fire with fire.
Add a computed column that is 0 if it's null and 1 if it's not:
CASE WHEN _DollarsLY IS NULL THEN 0 ELSE 1 END AS _DistinctCountHackLY
Then you can do something like this in a cube calculation:
iif(_DistinctCountHackLY=2 or _DollarsLY=null,_DistinctUPCLY-1,_DistinctUPCLY)

How not to display columns which are NULL in a view

I've set up a view which combines all the data across several tables. Is there a way to write this so that only columns which contain non-null data are displayed, and those columns which contain all NULL values are not included?
ADDED:
Sorry, I'm still studying and working on my first big project, so every day seems to be a new experience at the minute. I haven't been very clear, partly because I'm not sure I'm going about things the right way! The client is an academic library, and the database records details of specific collections. The view I mentioned displays all the data held about an item, so it brings together tables on publication, copy, author, publisher, language and so on. A small number of items in the collection are papers, which have additional details over and above the standard bibliographic ones. What I didn't want was for a user to get all the empty paper-related fields when the results consisted only of books, in which case the paper table fields would all be NULL. So I thought there might be a way not to show these. Someone has commented that this is the job of the client application rather than the database itself, so I can leave this until I get to that phase of the project.
There is no way to do this in SQL.
CREATE VIEW dbo.YourView
AS
SELECT (list of fields)
FROM dbo.Table1 t1
INNER JOIN dbo.Table2 t2 ON t1.ID = t2.FK_ID
WHERE t1.SomeColumn IS NOT NULL
AND t2.SomeOtherColumn IS NOT NULL
In your view definition, you can include WHERE conditions which can exclude rows that have certain columns that are NULL.
Update: you cannot really filter out columns - you define the list of columns that are part of your view in your view definition, and this list is fixed and cannot be changed dynamically.
What you might be able to do is use an ISNULL(column, '') construct to replace those NULLs with an empty string. Otherwise, you need to handle excluding those columns in your display front end - not in the SQL view definition.
The only thing I see you could do is make sure to select only those columns from the view that you know aren't NULL:
SELECT (list of non-null fields) FROM dbo.YourView
WHERE (column1 IS NOT NULL)
and so forth - but there's no simple or magic way to select all columns that aren't NULL in one SELECT statement...
You cannot do this in a view, but you can do it fairly easily using dynamic SQL in a stored procedure.
Of course, having a schema which shifts is not necessarily good for clients who consume the data, but it can be efficient if you have very sparse data AND the consuming client understands the varying schema.
If you have to use a view, you can add a "header" row to it and inspect that row client-side (the first row in your loop) to decide whether to skip a column in your grid or wherever. Something like this:
SELECT * FROM (
-- This is the view code
SELECT 'data' as typ
,int_col
,varchar_col
FROM dbo.YourTable
UNION ALL
SELECT 'hdr' as typ
-- note that different types have to be handled differently
,CASE WHEN COUNT(int_col) = 0 THEN NULL ELSE 0 END
,CASE WHEN COUNT(varchar_col) = 0 THEN NULL ELSE '' END
FROM dbo.YourTable
) AS X
-- have to get header row first
ORDER BY typ DESC -- add other sort criteria here
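The header-row idea can be tried end to end with sqlite3. The items table and its rows are hypothetical (every int_col value is NULL), and the client-side check is a few lines of Python standing in for the grid code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (int_col INTEGER, varchar_col TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(None, "book"), (None, "paper")])

rows = conn.execute("""
SELECT * FROM (
    SELECT 'data' AS typ, int_col, varchar_col FROM items
    UNION ALL
    SELECT 'hdr' AS typ,
           CASE WHEN COUNT(int_col) = 0 THEN NULL ELSE 0 END,
           CASE WHEN COUNT(varchar_col) = 0 THEN NULL ELSE '' END
    FROM items
) AS X
ORDER BY typ DESC  -- 'hdr' sorts after 'data', so DESC puts the header first
""").fetchall()

header, data = rows[0], rows[1:]
columns = ["int_col", "varchar_col"]
# Client side: a NULL in the header row means the column is NULL in every data row.
visible = [name for name, flag in zip(columns, header[1:]) if flag is not None]
print(visible)  # ['varchar_col']
```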
If we're reading your question right, there won't be a way to do this in SQL. The output of a view must be a relation - in (over-)simplified terms, it must be rectangular. That is, each row must have the same number of columns.
If you can tell us more about your data and give us some idea of what you want to do with the output, we can perhaps offer more positive suggestions.
In general, add a WHERE clause to your query, e.g.
WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL
Here, a, b and c are your column names.
If you are joining tables together on potentially NULL columns, then use an INNER JOIN, and NULL values will not be included.
EDIT: I may have misunderstood. The above filters out rows, but you may be asking to filter out columns, i.e. you have several columns and only want to display those that contain at least one non-null value across all the rows you are returning. Dynamic SQL offers a solution, since the set of columns varies depending on your data.
Here's a SQL query that builds another SQL query containing the appropriate columns. You could run this query and then submit its result as another query. It assumes 'pk' is some column that is always non-null, e.g. a primary key, so every additional column name can be prefixed with a comma.
SELECT CONCAT('SELECT pk',
CASE COUNT(columnA) WHEN 0 THEN '' ELSE ', columnA' END,
CASE COUNT(columnB) WHEN 0 THEN '' ELSE ', columnB' END,
-- etc.
' FROM (YourQuery) base')
FROM
(YourQuery) AS base
The query works using COUNT(column): the aggregate function ignores NULL values, so it returns 0 for a column consisting entirely of NULLs. The query builder assumes that YourQuery uses aliases to ensure there are no duplicate column names.
While you can't put this into a view, you could wrap it up as a stored procedure that copies the data to another table (the result table). You could also set up a trigger so that the result table is updated whenever the base tables change.
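A sketch of the same two-step approach driven from Python with sqlite3: COUNT(column) finds the non-empty columns, then the second query is built and run. Here the SQL string is assembled in Python rather than with CONCAT inside one statement. The base table, its columns and the candidate list are hypothetical, and the column names are interpolated into the SQL only because they come from a fixed, trusted list; building SQL this way from user input would be unsafe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (pk INTEGER PRIMARY KEY, columnA TEXT, columnB TEXT)")
conn.executemany("INSERT INTO base VALUES (?, ?, ?)", [(1, "x", None), (2, None, None)])

candidates = ["columnA", "columnB"]  # hypothetical optional columns

# Step 1: COUNT(col) ignores NULLs, so 0 means the column is entirely NULL.
counts = conn.execute(
    "SELECT " + ", ".join(f"COUNT({c})" for c in candidates) + " FROM base"
).fetchone()

# Step 2: build the second query from pk plus the non-empty columns only.
select_list = ["pk"] + [c for c, n in zip(candidates, counts) if n > 0]
dynamic_sql = f"SELECT {', '.join(select_list)} FROM base ORDER BY pk"

# Step 3: run the generated query.
cursor = conn.execute(dynamic_sql)
header = [d[0] for d in cursor.description]
result = cursor.fetchall()
print(header, result)  # ['pk', 'columnA'] [(1, 'x'), (2, None)]
```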
I suspect what's going on is that an end user is running Crystal Reports and complaining about all the empty columns that have to be removed manually.
It would actually be possible to create a stored procedure that would create a view on the fly, leaving out dataless columns. But then you would have to run this proc before using the view.
Is that acceptable?