SQL Hive add column based on column value - sql

I have a query which looks like
select
number,
class,
unix_timestamp(date1) - unix_timestamp(date2) as time_example,
sum(unix_timestamp(date1) - unix_timestamp(date2)) over(partition by unix_timestamp(date1) - unix_timestamp(date2) order by class) as class_time
from myTable
Which gives results such as
number class time_example class_time
1 math 5 5
1 science 5 10
1 art 5 15
1 math 2 2
1 science 2 4
1 art 2 6
1 math 10 10
1 science 10 20
1 art 10 30
I want to add columns based on class, and only have 3 different columns because there are only 3 columns. For example the time for math and would get 17. This is the desired table i would like to get
number class class_time
1 math 17
1 science 17
1 art 17

You can do that by using group by
select number, class,
sum(unix_timestamp(date1) - unix_timestamp(date2)) as class_time
from myTable
group by number,class;

Related

Burndown analysis in SQL Server Management Studio

I'm trying to prepare my data to create a burndown visual. As you can see the Rate column isn't simply A - B, as it carries forward the previous value if B is null.
I've tried some case statements using lag and sums but no avail.
Some direction on the case statement or an optimal solution would be ideal.
For example, this is how my data looks:
ID
A
B
1
20
NULL
2
20
3
3
20
NULL
4
20
7
5
20
NULL
6
20
NULL
7
20
NULL
8
20
5
9
20
7
And I want a rate column that looks like this.
ID
A
B
Rate
1
20
NULL
20
2
20
3
17
3
20
NULL
17
4
20
7
10
5
20
NULL
10
6
20
NULL
10
7
20
NULL
10
8
20
5
5
9
20
7
-2
Thanks to #Larnu for the guidance.
Here is the solution when you have your data partitioned by some group ID and ordered by some data or row ID.
SELECT
GROUP_ID,
ROW_ID,
COL_A,
COL_B,
COL_A - (SUM(ISNULL(COL_B,0)) OVER (PARTITION BY GROUP_ID ORDER BY ROW_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))
FROM table

In a game show database scenario, how do I fetch the average total episode score per season in a single query?

Pardon the title gore. I'm having trouble finding a good way to express my question, which is endemic to the problem.
The Tables
season
id name
---- ------
1 Season 1
2 Season 2
3 Season 3
episode
id season_id number title
---- ----------- -------- ---------------------------------------
1 1 1 Pilot
2 1 2 1x02 - We Got Picked Up
3 1 3 1x03 - This is the Third Episode
4 2 1 2x01 - We didn't get cancelled.
5 2 2 2x02 - We're running out of ideas!
6 3 1 3x01 - We're still here.
7 3 2 3x02 - Okay, this game show is dying.
8 3 3 3x03 - Untitled
score
id episode_id score contestant_id (table not given)
---- ------------ ------- ---------------------------------
1 1 35 1
2 1 -12 2
3 1 8 3
4 1 5 4
5 2 13 1
6 2 -2 5
7 2 3 3
8 2 -14 6
9 3 -14.5 1
10 3 -3 2
11 3 1.5 7
12 3 9.5 5
13 4 22.8 1
14 4 -3 8
15 5 2 1
16 5 13.5 9
17 5 7 3
18 6 13 1
19 6 -84 10
20 6 12 11
21 7 3 1
22 7 10 2
23 8 29 1
24 8 1 5
As you can see, you have multiple episodes per season, and multiple scores per episode (one score per contestant). Contestants can reappear in later episodes (irrelevant), scores are floating point values, and there can be an arbitrary number of scores per episode.
So what am I looking for?
I'd like to get the average total episode score per season, where the total episode score is the sum of all the scores in an episode. Mathematically, this comes out to be the sum of all scores in a season divided by the number of episodes. Easy enough to comprehend, but I have had trouble doing it in a single query and getting the correct result. I'd like an output like the following:
name average_total_episode_score
---------- -----------------------------
Season 1 9.83
Season 2 21.15
Season 3 -5.33
The top-level query needs to be on the season table as it will be combined with other, similar queries on the same table. It's easy enough to do this with an aggregate in a subquery, but an aggregation executes the subquery, failing my single-query requirement. Can this be done in a single query?
Hope this should work
Select s.id, avg(score)
FROM Season S,
Episode e,
Score sc
WHERE s.id = e.season_id
AND e.id = sc.episode_id
Group by s.id
Okay, just figured it out. As usual, I had to write and post a whole book before the simple solution descended upon me.
The problem in my query (which I didn't give in the question) was the lack of a DISTINCT count. Here is a working query:
SELECT
"season"."id",
"season"."name",
(SUM("score"."score") / COUNT(DISTINCT "episode"."id")) AS "average_total_episode_score"
FROM "season"
LEFT OUTER JOIN "episode"
ON ("season"."id" = "episode"."season_id")
LEFT OUTER JOIN "score"
ON ("episode"."id" = "score"."episode_id")
GROUP BY "season"."id"
select Se.id AS Season_Id, sum(score) As season_score, avg(score) from score S join episode E ON S.episode_id = E.id
join Season se ON se.id = e.season_id group by se.id

Calculate scattered rows's average in sql server?

The table looks like below:
testid stepid serverid duration
1 1 1 10
1 2 1 11
2 1 2 12
2 2 2 13
3 1 1 14
3 2 1 15
4 1 2 16
4 2 2 17
4 tests ran on two servers. Each test has 2 steps. I would like to calculate average duration of each step of all tests on the 2 servers given test id. For example, if given test ids are 1 and 2, the final table looks like below:
stepid avg_duration
1 (10 + 12) / 2
2 (11 + 13) / 2
This is just a group by, right?
select stepid, avg(duration)
from t
where testid in (1, 2)
group by stepid;
Note: You might want avg(duration*1.0) if you want "normal" division.

Using temporary extended table to make a sum

From a given table I want to be able to sum values having the same number (should be easy, right?)
Problem: A given value can be assigned from 2 to n consecutive numbers.
For some reasons this information is stored in a single row describing the value, the starting number and the ending number as below.
TABLE A
id | starting_number | ending_number | value
----+-----------------+---------------+-------
1 2 5 8
2 0 3 5
3 4 6 6
4 7 8 10
For instance the first row means:
value '8' is assigned to numbers: 2, 3 and 4 (5 is excluded)
So, I would like the following intermediairy result table
TABLE B
id | number | value
----+--------+-------
1 2 8
1 3 8
1 4 8
2 0 5
2 1 5
2 2 5
3 4 6
3 5 6
4 7 10
So I can sum 'value' for elements having identical 'number'
SELECT number, sum(value)
FROM B
GROUP BY number
TABLE C
number | sum(value)
--------+------------
2 13
3 8
4 14
0 5
1 5
5 6
7 10
I don't know how to do this and didn't find any answer on the web (maybe not looking with appropriate key words...)
Any idea?
You can do what you want with generate_series(). So, TableB is basically:
select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA;
Your aggregation is then:
select n, sum(value)
from (select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA
) a
group by n;

MDX: iif condition on the value of dimension

I have 1 Virtual cube consists of 2 cubes.
Example of fact table of 1st cube.
id object_id time_id date_id state
1 10 2 1 0
2 11 5 1 0
3 10 7 1 1
4 10 3 1 0
5 11 4 1 0
6 11 7 1 1
7 10 8 1 0
8 11 5 1 0
9 10 7 1 1
10 10 9 1 2
Where State: 0 - Ok, 1 - Down, 2 - Unknown
For this cube I have one measure StateCount it should count States for each object_id.
Here for example we have such result:
for 10 : 3 times Ok , 2 times Down, 1 time Unknown
for 11 : 3 times Ok , 1 time Down
Second cube looks like this:
id object_id time_id date_id status
1 10 2 1 0
2 11 5 1 0
3 10 7 1 1
4 10 3 1 1
5 11 4 1 1
Where Status: 0 - out, 1 - in. I keep this in StatusDim.
In this table I keep records that should not be count. If object have status 1 that means that I have exclude it from count.
If we intersect these tables and use StateCount we will receive this result:
for 10 : 2 times Ok , 1 times Down, 1 time Unknown
for 11 : 2 times Ok , 1 time Down
As far as i know, i must use calculated member with IIF condition. Currently I'm trying something like this.
WITH MEMBER [Measures].[StateTimeCountDown] AS(
iif(
[StatusDimDown.DowntimeHierarchy].[DowntimeStatus].CurrentMember.MemberValue
<> "in"
, [Measures].[StateTimeCount]
, null )
)
The multidimensional way to do this would be to make attributes from your state and status columns (hopefully with user understandable members, i. e. using "Ok" and not "0"). Then, you can just use a normal count measure on the fact tables, and slice by these attributes. No need for complex calculation definitions.