My Fact table looks like this:
ticketID price statusID
1 100 1
2 100 1
2 100 2
3 150 1
I am using SSAS to create an OLAP Cube for my data warehouse.
I cannot use the aggregateFunction 'Sum' for the measure 'total price', because I will get $450 instead of $350 (which is the correct total): ticket 2 appears twice, once per status.
Regards
Then you can still add a view in db like this:
select
ticketid,
price,
statusid,
case when rn=1 then 1 else 0 end as IsMaxStatus
from
(select ticketid,price,statusid,
row_number()over
(partition by ticketid, price order by statusid desc) as rn
from yourFactTb
) as fact
Then add a dimension [IsMaxStatus], which includes two records (0 and 1), to your cube, set its dimension usage to Regular for the measure group that depends on the fact table above, and then add a calculated measure, say [cal-price], with the formula below:
with member [cal-price] as
([Price],[IsMaxStatus].[IsMaxStatus].&[1])
select [cal-price] on 0
from [YourCube]
You can also calculate other measures from this measure group without the [IsMaxStatus] dimension filter.
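To see the effect of the IsMaxStatus flag outside the cube, here is a minimal sketch that runs the same row_number() logic against the sample rows. It uses Python's sqlite3 module instead of SQL Server (an assumption made only to keep it self-contained, and it needs SQLite 3.25+ for window functions); the table name fact is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (ticketid INTEGER, price INTEGER, statusid INTEGER)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [(1, 100, 1), (2, 100, 1), (2, 100, 2), (3, 150, 1)],
)

# Flag the row with the highest statusid per ticket, as the view does.
flagged_sql = """
SELECT ticketid, price, statusid,
       CASE WHEN rn = 1 THEN 1 ELSE 0 END AS IsMaxStatus
FROM (SELECT ticketid, price, statusid,
             ROW_NUMBER() OVER (PARTITION BY ticketid, price
                                ORDER BY statusid DESC) AS rn
      FROM fact) AS f
"""

# Plain SUM double counts ticket 2; filtering on the flag does not.
naive_total = conn.execute("SELECT SUM(price) FROM fact").fetchone()[0]
filtered_total = conn.execute(
    f"SELECT SUM(price) FROM ({flagged_sql}) WHERE IsMaxStatus = 1"
).fetchone()[0]

print(naive_total, filtered_total)
```

The filtered sum corresponds to what the [cal-price] member computes when it slices on [IsMaxStatus].&[1].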
Hope it helps.
www.mdx-helper.com
I am using SQL Server and wondering if it is possible to iterate through time series data until specific condition is met and based on that label my data in other table?
For example, let's say I have a table like this:
Id Date Some_kind_of_event
+--+----------+------------------
1 |2018-01-01|dsdf...
1 |2018-01-06|sdfs...
1 |2018-01-29|fsdfs...
2 |2018-05-10|sdfs...
2 |2018-05-11|fgdf...
2 |2018-05-12|asda...
3 |2018-02-15|sgsd...
3 |2018-02-16|rgw...
3 |2018-02-17|sgs...
3 |2018-02-28|sgs...
What I want to get, is to calculate for each key the difference between two adjacent events and find out if there exists difference > 10 days between these two adjacent events. In case yes, I want to stop iterating for that specific key and put label 'inactive', otherwise 'active' in my other table. After we finish with one key, we start with another.
So, for example, id = 1 would get the label 'inactive' because there are two adjacent dates more than 10 days apart. The final result would look like this:
Id Label
+--+----------+
1 |inactive
2 |active
3 |inactive
Any ideas how to do that? Is it possible to do it with SQL?
When working with a DBMS you need to get away from the idea of thinking iteratively. Instead you need to try and think in sets. "Instead of thinking about what you want to do to a row, think about what you want to do to a column."
If I understand correctly, is this what you're after?
CREATE TABLE SomeEvent (ID int, EventDate date, EventName varchar(10));
INSERT INTO SomeEvent
VALUES (1,'20180101','dsdf...'),
(1,'20180106','sdfs...'),
(1,'20180129','fsdfs..'),
(2,'20180510','sdfs...'),
(2,'20180511','fgdf...'),
(2,'20180512','asda...'),
(3,'20180215','sgsd...'),
(3,'20180216','rgw....'),
(3,'20180217','sgs....'),
(3,'20180228','sgs....');
GO
WITH Gaps AS(
SELECT *,
DATEDIFF(DAY,LAG(EventDate) OVER (PARTITION BY ID ORDER BY EventDate),EventDate) AS EventGap
FROM SomeEvent)
SELECT ID,
CASE WHEN MAX(EventGap) > 10 THEN 'inactive' ELSE 'active' END AS Label
FROM Gaps
GROUP BY ID
ORDER BY ID;
GO
DROP TABLE SomeEvent;
GO
This assumes you are using SQL Server 2012+, as it uses the LAG function, and SQL Server 2008 has less than 12 months of any kind of support.
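For readers without a SQL Server instance at hand, the same set-based idea can be tried with Python's sqlite3 module. This is only a sketch under assumptions: it needs SQLite 3.25+ for LAG, and julianday() subtraction stands in for DATEDIFF(DAY, ...), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeEvent (ID INTEGER, EventDate TEXT)")
conn.executemany(
    "INSERT INTO SomeEvent VALUES (?, ?)",
    [
        (1, "2018-01-01"), (1, "2018-01-06"), (1, "2018-01-29"),
        (2, "2018-05-10"), (2, "2018-05-11"), (2, "2018-05-12"),
        (3, "2018-02-15"), (3, "2018-02-16"), (3, "2018-02-17"), (3, "2018-02-28"),
    ],
)

# Gap in days between each event and the previous event of the same ID.
# The first event per ID has no predecessor, so its gap is NULL and MAX ignores it.
labels = conn.execute("""
WITH Gaps AS (
    SELECT ID,
           julianday(EventDate)
             - julianday(LAG(EventDate) OVER (PARTITION BY ID ORDER BY EventDate))
             AS EventGap
    FROM SomeEvent)
SELECT ID,
       CASE WHEN MAX(EventGap) > 10 THEN 'inactive' ELSE 'active' END AS Label
FROM Gaps
GROUP BY ID
ORDER BY ID
""").fetchall()

print(labels)
```

No loop over the rows is needed: the window function computes every gap at once and the GROUP BY collapses them to one label per ID.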
Try this. Note, replace #MyTable with your actual table.
WITH Diffs AS (
SELECT
Id
-- PARTITION BY keeps each Id's dates separate; the last row per Id gets a NULL Diff
,DATEDIFF(DAY,[Date],LEAD([Date]) OVER (PARTITION BY [Id] ORDER BY [Date])) Diff
FROM #MyTable)
SELECT
Id
,CASE WHEN MAX(Diff) > 10 THEN 'Inactive' ELSE 'Active' END AS Label
FROM Diffs
GROUP BY Id
Just to share another approach (without a CTE).
SELECT
ID
, CASE WHEN SUM(TotalDays) = (MAX(CNT) - 1) THEN 'Active' ELSE 'Inactive' END Label
FROM (
SELECT
ID
, EventDate
, CASE WHEN DATEDIFF(DAY, EventDate, LEAD(EventDate) OVER(PARTITION BY ID ORDER BY EventDate)) <= 10 THEN 1 ELSE 0 END TotalDays
, COUNT(ID) OVER(PARTITION BY ID) CNT
FROM EventsTable
) D
GROUP BY ID
The method counts how many records each ID has and sets TotalDays to 1 when the gap (in days) between the current date and the next date is at most 10, otherwise 0. The last row for each ID has no next date, so it always gets 0.
Then it compares: if the sum of TotalDays equals the number of records for that ID minus one, every gap is within the limit and the ID is labeled Active; otherwise it is labeled Inactive.
This is just another approach that doesn't use CTE.
So I've been just re-familiarizing myself with SQL after some time away from it, and I am using Mode Analytics sample Data warehouse, where they have a dataset for SF police calls in 2014.
For reference, it's set up as this:
incident_num, category, descript, day_of_week, date, time, pd_district, Resolution, address, ID
What I am trying to do is figure out the total number of incidents for a category, and a new column of all the people who have been arrested. Ideally looking something like this
Category, Total_Incidents, Arrested
-------------------------------------
Battery 10 4
Murder 200 5
Something like that..
So far I've been trying this out:
SELECT category, COUNT (Resolution) AS Total_Incidents, (
Select COUNT (resolution)
from tutorial.sf_crime_incidents_2014_01
where Resolution like '%ARREST%') AS Arrested
from tutorial.sf_crime_incidents_2014_01
group by 1
order by 2 desc
That returns the total amount of incidents correctly, but for the Arrested, it keeps printing out 9014 Arrest
Any idea what I am doing wrong?
The subquery is not correlated. It just selects the count of all rows. Add a condition, that checks for the category to be equal to that of the outer query.
SELECT o.category,
count(o.resolution) total_incidents,
(SELECT count(i.resolution)
FROM tutorial.sf_crime_incidents_2014_01 i
WHERE i.resolution LIKE '%ARREST%'
AND i.category = o.category) arrested
FROM tutorial.sf_crime_incidents_2014_01 o
GROUP BY 1
You could use this:
SELECT category,
COUNT(Resolution) AS Total_Incidents,
SUM(CASE WHEN Resolution LIKE '%ARREST%' THEN 1 ELSE 0 END) AS Arrested
FROM tutorial.sf_crime_incidents_2014_01
GROUP BY category
ORDER BY 2 DESC;
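To make the difference concrete, the sketch below runs the uncorrelated subquery and the conditional aggregation side by side on a tiny made-up table. The table name incidents and its rows are hypothetical, and SQLite (via Python's sqlite3) stands in for the Mode warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (category TEXT, resolution TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [
        ("Battery", "ARREST, BOOKED"),
        ("Battery", "NONE"),
        ("Battery", "ARREST, CITED"),
        ("Theft", "NONE"),
        ("Theft", "NONE"),
    ],
)

# Uncorrelated subquery: the inner count ignores the outer category,
# so every row shows the table-wide number of arrests.
uncorrelated = conn.execute("""
SELECT category,
       COUNT(resolution) AS total_incidents,
       (SELECT COUNT(resolution) FROM incidents
        WHERE resolution LIKE '%ARREST%') AS arrested
FROM incidents
GROUP BY category
ORDER BY category
""").fetchall()

# Conditional aggregation: one pass, counted per category.
conditional = conn.execute("""
SELECT category,
       COUNT(resolution) AS total_incidents,
       SUM(CASE WHEN resolution LIKE '%ARREST%' THEN 1 ELSE 0 END) AS arrested
FROM incidents
GROUP BY category
ORDER BY category
""").fetchall()

print(uncorrelated)
print(conditional)
```

The first result repeats the same arrest count for every category, which is the symptom described in the question; the second gives a per-category count.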
Does anyone know how to sum and subtract aggregate values of a group? I have Groups 1, 2, and 3 with Amounts. I want to take the sum of Group 1 and subtract the sum of Group 2 from it, in a row outside the main row grouping. I've used IF statements, looked all over, and tested everything I can think of. Can anyone shed light on how to do this?
DataSet1
Fields: "Group", "Amounts"
Note: SQL is not an option because I have a column grouping across the top as well.
I don't know what you mean by "outside the main row grouping." Could you provide an example?
From your description (less the one hedge factor I just mentioned), it looks like all you need to do is this:
select <whatever>, Sum( Field1 ) as Group1, Sum( Field2 ) as Group2,
Sum( Field1 ) - Sum( Field2 ) as FinalAmount
from ...
Edit: So you have an actual column called "Group" and you want sums of each group designated by the value of that column. That's easy to do.
select
Sum( case [Group] when 1 then Amounts else 0 end ) as Group1,
Sum( case [Group] when 2 then Amounts else 0 end ) as Group2,
Sum( case [Group] when 3 then Amounts else 0 end ) as Group3
from ...
If you want to do further processing between the group totals, I would suggest CTE or derived/inline view.
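As a rough illustration of the CTE suggestion, the sketch below computes the three group totals once and then subtracts Group 2 from Group 1 in the outer query. It runs on SQLite through Python's sqlite3; the table name DataSet1 comes from the question, but the rows are made up, and the column is quoted as "Group" because GROUP is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE DataSet1 ("Group" INTEGER, Amounts INTEGER)')
conn.executemany(
    "INSERT INTO DataSet1 VALUES (?, ?)",
    [(1, 100), (1, 50), (2, 30), (2, 20), (3, 10)],
)

# The CTE pivots the per-group sums into one row; the outer query
# can then do arithmetic between the totals.
totals = conn.execute("""
WITH Totals AS (
    SELECT SUM(CASE "Group" WHEN 1 THEN Amounts ELSE 0 END) AS Group1,
           SUM(CASE "Group" WHEN 2 THEN Amounts ELSE 0 END) AS Group2,
           SUM(CASE "Group" WHEN 3 THEN Amounts ELSE 0 END) AS Group3
    FROM DataSet1)
SELECT Group1, Group2, Group3, Group1 - Group2 AS Group1MinusGroup2
FROM Totals
""").fetchone()

print(totals)
```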
I am trying to get a summary of the balance per month within my database. The table has the following fields
tran_date
type (Income or Expense)
amount
I can get as far as retrieving the sum for each type for every month but want the sum for the whole month. This is my current query:
SELECT DISTINCT strftime('%m%Y', tran_date), type, SUM(amount) FROM tran WHERE exclude = 0 GROUP BY tran_date, type
This returns
032013 Income 100
032013 Expense 200
I would like the summary on one row, in this example 032013 -100.
Just use the right group by. This uses conditional aggregation, assuming that you want "income - expense":
SELECT strftime('%m%Y', tran_date),
SUM(case when type = 'Income' then amount when type = 'Expense' then - amount end)
FROM tran WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date);
If you want just the full sum, then this is easier:
SELECT strftime('%m%Y', tran_date),
SUM(amount)
FROM tran WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date);
Your original query returned type rows because "type" was in the group by clause.
Also, distinct is (almost) never needed with group by.
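A quick way to check the grouping is to run it against a throwaway SQLite database. The sketch below does that from Python's sqlite3; the rows are invented for illustration (including one excluded row and a second month), and it groups by the month expression so that several dates in one month collapse into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tran (tran_date TEXT, type TEXT, amount INTEGER, exclude INTEGER)"
)
conn.executemany(
    "INSERT INTO tran VALUES (?, ?, ?, ?)",
    [
        ("2013-03-05", "Income", 100, 0),
        ("2013-03-20", "Expense", 200, 0),
        ("2013-04-02", "Income", 300, 0),
        ("2013-04-10", "Expense", 50, 1),  # excluded from the balance
    ],
)

# Income counts as positive, expense as negative; one row per month.
balances = conn.execute("""
SELECT strftime('%m%Y', tran_date) AS month,
       SUM(CASE WHEN type = 'Income' THEN amount
                WHEN type = 'Expense' THEN -amount END) AS balance
FROM tran
WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date)
ORDER BY month
""").fetchall()

print(balances)
```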
Imagine I have a table showing the sales of Acme Widgets, and where they were sold. It's fairly easy to produce a report grouping sales by country. It's fairly easy to find the top 10. But what I'd like is to show the top 10, and then have a final row saying Other. E.g.,
Ctry | Sales
=============
GB | 100
US | 80
ES | 60
...
IT | 10
Other | 50
I've been searching for ages but can't seem to find any help which takes me beyond the standard top 10.
TIA
I tried some of the other solutions here, however they seem to be either slightly off, or the ordering wasn't quite right.
My attempt at a Microsoft SQL Server solution appears to work correctly:
SELECT Ctry, Sales FROM
(
SELECT TOP 2
Ctry,
SUM(Sales) AS Sales
FROM
Table1
GROUP BY
Ctry
ORDER BY
Sales DESC
) AS Q1
UNION ALL
SELECT
'Other' AS Ctry,
SUM(Sales) AS Sales
FROM
Table1
WHERE
Ctry NOT IN (SELECT TOP 2
Ctry
FROM
Table1
GROUP BY
Ctry
ORDER BY
SUM(Sales) DESC)
Note that in my example, I'm only using TOP 2 rather than TOP 10. This is simply due to my test data being rather more limited. You can easily substitute the 2 for a 10 in your own data.
Here's the SQL Script to create the table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Table1](
[Ctry] [varchar](50) NOT NULL,
[Sales] [float] NOT NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
And my data looks like this:
GB 10
GB 21.2
GB 34
GB 16.75
US 10
US 11
US 56.43
FR 18.54
FR 98.58
WE 44.33
WE 11.54
WE 89.21
KR 10
PO 10
DE 10
Note that the query result is correctly ordered by the aggregated Sales value and not the alphabetic country code, and that the "Other" category is always last, even if its aggregated Sales value would ordinarily push it to the top of the list.
I'm not saying this is the best (read: most optimal) solution, however, for the dataset that I provided it seems to work pretty well.
SELECT Ctry, sum(Sales) Sales
FROM (SELECT COALESCE(T2.Ctry, 'OTHER') Ctry, T1.Sales
FROM (SELECT Ctry, sum(Sales) Sales
FROM Table1
GROUP BY Ctry) T1
LEFT JOIN
(SELECT TOP 10 Ctry, sum(sales) Sales
FROM Table1
GROUP BY Ctry
ORDER BY sum(sales) DESC) T2
on T1.Ctry = T2.Ctry
) T
GROUP BY Ctry
The pure SQL solutions to this problem make more than one pass through the individual records. The following solution queries the data only once, and uses a SQL ranking function, ROW_NUMBER(), to determine whether a result belongs in the "Other" category. The ROW_NUMBER() function has been available in SQL Server since SQL Server 2005. In my database, this seems to have resulted in a more efficient query. Please note that the "Other" row will appear above some of the top 10 rows if the "Other" total exceeds their sales. If this is not desired, some adjustments would need to be made to this query:
SELECT CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END AS Ctry,
SUM(Sales) as Sales FROM
(
SELECT Ctry, SUM(Sales) as Sales,
ROW_NUMBER() OVER(ORDER BY SUM(Sales) DESC) AS RowNumber
FROM Table1 GROUP BY Ctry
) as AggregateQuery
GROUP BY CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END
ORDER BY SUM(Sales) DESC
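One possible adjustment for keeping "Other" at the bottom is to sort on an "is this the Other row" flag before sorting on sales. The sketch below tries that on the sample data from the earlier answer with a top 2 instead of a top 10. It runs on SQLite 3.25+ through Python's sqlite3 rather than SQL Server, and the CTE names Totals, Ranked and Labeled are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Ctry TEXT NOT NULL, Sales REAL NOT NULL)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [
        ("GB", 10), ("GB", 21.2), ("GB", 34), ("GB", 16.75),
        ("US", 10), ("US", 11), ("US", 56.43),
        ("FR", 18.54), ("FR", 98.58),
        ("WE", 44.33), ("WE", 11.54), ("WE", 89.21),
        ("KR", 10), ("PO", 10), ("DE", 10),
    ],
)

TOP_N = 2  # the question asks for 10; 2 fits the small sample

rows = conn.execute("""
WITH Totals AS (
    SELECT Ctry, SUM(Sales) AS Sales FROM Table1 GROUP BY Ctry
), Ranked AS (
    SELECT Ctry, Sales, ROW_NUMBER() OVER (ORDER BY Sales DESC) AS RowNumber
    FROM Totals
), Labeled AS (
    SELECT CASE WHEN RowNumber > :n THEN 'Other' ELSE Ctry END AS Ctry,
           SUM(Sales) AS Sales
    FROM Ranked
    GROUP BY CASE WHEN RowNumber > :n THEN 'Other' ELSE Ctry END
)
SELECT Ctry, ROUND(Sales, 2) AS Sales
FROM Labeled
-- The comparison is 0 for real countries and 1 for 'Other', so 'Other' sorts last.
ORDER BY (Ctry = 'Other'), Sales DESC
""", {"n": TOP_N}).fetchall()

for ctry, sales in rows:
    print(ctry, sales)
```

With this data the "Other" total is larger than either of the top two countries, yet it still comes out as the final row.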
Using a real analytics SQL engine, such as Apache Spark, you can use a Common Table Expression (WITH) to do this:
with t as (
select rank() over (order by sales desc) as r, sales,city
from DB
order by sales desc
)
select sales, city, r
from t where r <= 10
union
select sum(sales) as sales, "Other" as city, 11 as r
from t where r > 10
In pseudo SQL:
select top 10 order by sales
UNION
select 'Other',SUM(sales) where Ctry not in (select top 10 like above)
Union the top ten with an outer join of the top ten against the table itself to aggregate the rest.
I don't have access to SQL here but I'll hazard a guess:
select Ctry, sales
from (select top (10) Ctry, sum(sales) as sales
from table1 group by Ctry order by sum(sales) desc) as top10
union all
select 'other', sum(table1.sales)
from table1
left outer join (select top (10) Ctry
from table1 group by Ctry order by sum(sales) desc) as table2
on table1.Ctry = table2.Ctry
where table2.Ctry is null
Of course if this is a rapidly changing top(10) then you either lock or maintain a copy of the top(10) for the duration of the query.
Keep in mind that, depending on your use (and database volume / restrictions), you can achieve the same results using application code (Python, Node, C#, Java, etc.). Sure, it will depend on your use case, but hey, it's possible.
I ended up doing this in C# for instance:
// Mockup class that has a CATEGORY and its VOLUME
class YourModel { public string category; public double volume; }
// wholeList is assumed to be a List<YourModel> already sorted by volume, descending (needs using System.Linq)
List<YourModel> groupedList = wholeList.Take (5).ToList ();
groupedList.Add (new YourModel()
{
category = "Others",
volume = wholeList.Skip (5).Select (t => t.volume).Sum ()
});
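The same application-layer idea can be sketched in Python. The function name top_n_with_others and the sample values are hypothetical; the input is assumed to be a list of (category, total) pairs that has already been aggregated, for example by a plain GROUP BY query.

```python
from typing import List, Tuple


def top_n_with_others(totals: List[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """Keep the n largest categories and fold the rest into one 'Others' row."""
    ordered = sorted(totals, key=lambda item: item[1], reverse=True)
    head, tail = ordered[:n], ordered[n:]
    if tail:
        # Only add the extra row when something is actually left over.
        head.append(("Others", sum(value for _, value in tail)))
    return head


result = top_n_with_others([("A", 50.0), ("B", 40.0), ("C", 5.0), ("D", 3.0)], 2)
print(result)
```

Because the "Others" row is appended after sorting, it stays last regardless of its size.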
Disclaimer
I understand that this is a "SQL Only" tagged question, but there might be other people like me out there who can make use of the application layer instead of relying only on SQL to make it happen. I am just trying to show people other ways of doing the same thing that might be helpful. Even if this gets downvoted to oblivion, I know that someone will be happy to read this because they were taught to use each tool to its best and to think "outside the box".