In various reports I often see intermediate roll-ups like this:
|calltypename |rating |number |
+-----------------+----------------------------------------+-------+
|sales |1.0 |1 |
|sales |5.0 |2 |
| |3.666666666666666666666666666667 |3 |
|service |1.0 |1 |
|service |3.0 |1 |
|service |5.0 |3 |
|service |9.0 |1 |
| |4.666666666666666666666666666667 |6 |
Here the records are grouped by calltypename, and each group ends with a roll-up row holding the average rating and the sum of number.
Informix SQL has no ROLLUP operator, so I'm trying to get a similar result with UNION ALL:
select calltypename, TO_NUMBER(datavalue) as rating, count(*) as number
from calldata
where datakey="qrate1"
group by calltypename, rating
union all
select calltypename, AVG(TO_NUMBER(datavalue)) as rating, count(*) as number
from calldata
where datakey="qrate1"
group by calltypename
order by calltypename, rating
It produces the following result:
|calltypename |rating |number |
+-----------------+----------------------------------------+-------+
|sales |1.0 |1 |
|sales |3.666666666666666666666666666667 |3 |
|sales |5.0 |2 |
|service |1.0 |1 |
|service |3.0 |1 |
|service |4.666666666666666666666666666667 |6 |
|service |5.0 |3 |
|service |9.0 |1 |
Is there a way to order the records so that each roll-up row always appears below its group?
After some time, I found a solution that I don't like very much. The idea is to add a fake column, "rollup", that is used in the ORDER BY clause:
select calltypename as queue, "" as rollup,
datavalue as rating, count(*) as number
from calldata
where datakey="qrate1"
group by queue, rating
union all
select calltypename as queue, "rollup" as rollup,
TO_CHAR(AVG(TO_NUMBER(datavalue)),"*.*") as rating, count(*) as number
from calldata
where datakey="qrate1"
group by queue
order by queue, rollup, rating
This produces:
|queue |rollup |rating|number |
+-------+----------+------+-------+
|sales | |1 |1 |
|sales | |5 |2 |
|sales |rollup |3.7 |3 |
|service| |1 |1 |
|service| |3 |1 |
|service| |5 |3 |
|service| |9 |1 |
|service|rollup |4.7 |6 |
But I would like to get this without the rollup column...
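A common way to keep the sort key without displaying it is to wrap the UNION ALL in a derived table, select only the display columns from it, and still order by the hidden flag. The sketch below shows the idea with Python's built-in sqlite3 module rather than Informix (SQLite has no ROLLUP either). The sample rows are reconstructed from the report above, CAST(... AS REAL) stands in for TO_NUMBER, and ROUND(..., 1) for the "*.*" format; whether your Informix version accepts exactly this derived-table form is an assumption not checked here.

```python
import sqlite3

# Sample rows reconstructed from the report: (calltypename, datakey, datavalue)
SAMPLE = [
    ("sales", "qrate1", "1"), ("sales", "qrate1", "5"), ("sales", "qrate1", "5"),
    ("service", "qrate1", "1"), ("service", "qrate1", "3"),
    ("service", "qrate1", "5"), ("service", "qrate1", "5"),
    ("service", "qrate1", "5"), ("service", "qrate1", "9"),
]

def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calldata (calltypename TEXT, datakey TEXT, datavalue TEXT)")
    conn.executemany("INSERT INTO calldata VALUES (?, ?, ?)", SAMPLE)
    return conn

ROLLUP_SQL = """
SELECT queue, rating, number          -- is_rollup is not selected...
FROM (
    SELECT calltypename AS queue, 0 AS is_rollup,
           CAST(datavalue AS REAL) AS rating, COUNT(*) AS number
    FROM calldata
    WHERE datakey = 'qrate1'
    GROUP BY calltypename, CAST(datavalue AS REAL)
    UNION ALL
    SELECT calltypename, 1,
           ROUND(AVG(CAST(datavalue AS REAL)), 1), COUNT(*)
    FROM calldata
    WHERE datakey = 'qrate1'
    GROUP BY calltypename
) AS u
ORDER BY queue, is_rollup, rating     -- ...but it still drives the sort
"""

def rollup_report(conn):
    return conn.execute(ROLLUP_SQL).fetchall()

for row in rollup_report(build_demo_db()):
    print(row)
```

Because the flag is 0 for detail rows and 1 for roll-up rows, each roll-up sorts after its group regardless of its average value.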
I have the following table in Impala.
|LogTime|ClientId|IsNewSession|
|1 |123 |1 |
|2 |123 | |
|3 |123 | |
|3 |666 |1 |
|4 |666 | |
|10 |123 |1 |
|23 |666 |1 |
|24 |666 | |
|25 |444 |1 |
|26 |444 | |
I want to make a new table as follows:
|LogTime|ClientId|IsNewSession|SessionId|
|1 |123 |1 |1 |
|2 |123 | |1 |
|3 |123 | |1 |
|3 |666 |1 |1 |
|4 |666 | |1 |
|10 |123 |1 |2 |
|23 |666 |1 |2 |
|24 |666 | |2 |
|25 |444 |1 |1 |
|26 |444 | |1 |
Basically, I want a SessionId column that identifies each session within a ClientId: grouping by ClientId and ordering by LogTime, every row with IsNewSession = 1 starts a new session, and the rows that follow belong to it until the next 1.
I've created the IsNewSession column for this purpose, but I'm not sure how to iterate over the rows to build the SessionId column.
Any help would be greatly appreciated!
You can use a cumulative sum:
select t.*,
sum(isnewsession) over (partition by clientid order by logtime) as sessionid
from t;
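To see the cumulative sum at work, here is a small sketch that runs the same query against the sample rows from the question, using Python's sqlite3 module as a stand-in for Impala (window functions need SQLite 3.25 or later). SUM ignores the blank (NULL) IsNewSession values, so every row gets the number of session starts seen so far for its client.

```python
import sqlite3

# Sample rows from the question: (logtime, clientid, isnewsession);
# blank IsNewSession cells are stored as NULL (None).
ROWS = [
    (1, 123, 1), (2, 123, None), (3, 123, None), (3, 666, 1), (4, 666, None),
    (10, 123, 1), (23, 666, 1), (24, 666, None), (25, 444, 1), (26, 444, None),
]

def sessionize(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (logtime INTEGER, clientid INTEGER, isnewsession INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # Running count of session starts per client, in time order.
    return conn.execute("""
        SELECT logtime, clientid, isnewsession,
               SUM(isnewsession) OVER (PARTITION BY clientid ORDER BY logtime) AS sessionid
        FROM t
        ORDER BY logtime, clientid
    """).fetchall()

for row in sessionize(ROWS):
    print(row)
```

If one client can have several rows with the same LogTime, add a tiebreaker to the window's ORDER BY: the default frame treats tied rows as peers, so they would all receive the same running total.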
Is it possible to "group" rows in BigQuery/SQL based on column values? Let's say I want to assign a string or ID to all rows between stream_start_init and stream_start, and then do the same for the rows between stream_resume and the last stream_ad.
The number of stream_ad events can vary, so I can't use RANK() or ROW_NUMBER() to group them based on those values.
|id |timestamp |event             |
|1  |1231231   |first_visit       |
|2  |1231232   |login             |
|3  |1231233   |page_view         |
|4  |1231234   |page_view         |
|5  |1231235   |stream_start_init |
|6  |1231236   |stream_ad         |
|7  |1231237   |stream_ad         |
|8  |1231238   |stream_ad         |
|9  |1231239   |stream_start      |
|6  |1231216   |stream_resume     |
|6  |1231236   |stream_ad         |
|7  |1231217   |stream_ad         |
|8  |1231258   |stream_ad         |
|10 |1231240   |page_view         |
How I want the table to look:
|id |timestamp |event             |group_id |
|1  |1231231   |first_visit       |null     |
|2  |1231232   |login             |null     |
|3  |1231233   |page_view         |null     |
|4  |1231234   |page_view         |null     |
|5  |1231235   |stream_start_init |group_1  |
|6  |1231236   |stream_ad         |group_1  |
|7  |1231237   |stream_ad         |group_1  |
|8  |1231238   |stream_ad         |group_1  |
|9  |1231239   |stream_start      |group_1  |
|6  |1231216   |stream_resume     |group_2  |
|6  |1231236   |stream_ad         |group_2  |
|7  |1231217   |stream_ad         |group_2  |
|8  |1231258   |stream_ad         |group_2  |
|10 |1231240   |page_view         |null     |
I wouldn't assign a string; I would assign a number. This looks like a cumulative sum: a running count of "stream_start_init" and "stream_resume" events should do what you want:
select t.*,
countif(event in ('stream_start_init', 'stream_resume')) over (order by timestamp) as group_id
from t;
Note that this produces 0 for the rows before the first group, which seems like a good thing. You can convert that to NULL using NULLIF().
If you really want strings, you can use CONCAT().
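Below is a sketch of the running count plus the NULLIF() and string conversions, run in SQLite through Python's sqlite3 module. SQLite has no COUNTIF(), so SUM(CASE ... THEN 1 ELSE 0 END) plays that role, and || replaces CONCAT(). Some timestamps in the sample (for example 1231216 on the stream_resume row) look out of order, so this sketch assumes the rows are meant to be in the order listed and orders by a hypothetical seq column holding that position.

```python
import sqlite3

# Events in the order listed in the question. `seq` is a hypothetical
# ordering column standing in for the (partly inconsistent) timestamps.
EVENTS = ["first_visit", "login", "page_view", "page_view", "stream_start_init",
          "stream_ad", "stream_ad", "stream_ad", "stream_start", "stream_resume",
          "stream_ad", "stream_ad", "stream_ad", "page_view"]

RUNNING_GROUP_SQL = """
SELECT seq, event,
       'group_' || NULLIF(n, 0) AS group_id   -- NULL || text is NULL in SQLite
FROM (
    SELECT seq, event,
           -- emulates COUNTIF(event IN (...)) OVER (ORDER BY ...)
           SUM(CASE WHEN event IN ('stream_start_init', 'stream_resume')
                    THEN 1 ELSE 0 END) OVER (ORDER BY seq) AS n
    FROM t
)
ORDER BY seq
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seq INTEGER, event TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(EVENTS, start=1)))
rows = conn.execute(RUNNING_GROUP_SQL).fetchall()
for row in rows:
    print(row)
```

As written, the trailing page_view also lands in group_2, because the running count never goes back down; the next answer handles that by returning NULL for non-stream events.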
Below is for BigQuery Standard SQL. It returns NULL for events that are not part of a stream sequence:
#standardSQL
SELECT *,
IF(event IN ('stream_start_init', 'stream_start', 'stream_resume', 'stream_ad'),
COUNTIF(event IN ('stream_start_init', 'stream_resume')) OVER(ORDER BY timestamp),
NULL
) AS group_id
FROM `project.dataset.table`
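The masking idea can be checked in SQLite through Python's sqlite3 module. In this sketch, CASE WHEN ... THEN n END stands in for BigQuery's IF(..., n, NULL), and SUM(CASE ...) for COUNTIF(). Because a few timestamps in the sample look out of order, the rows are assumed to be in the order listed and are ordered by a hypothetical seq column; the running count is computed in a subquery and masked in the outer query.

```python
import sqlite3

# Events in the order listed in the question; `seq` is a hypothetical
# position column used instead of the sample's inconsistent timestamps.
EVENTS = ["first_visit", "login", "page_view", "page_view", "stream_start_init",
          "stream_ad", "stream_ad", "stream_ad", "stream_start", "stream_resume",
          "stream_ad", "stream_ad", "stream_ad", "page_view"]

MASKED_SQL = """
SELECT seq, event,
       CASE WHEN event IN ('stream_start_init', 'stream_start', 'stream_resume', 'stream_ad')
            THEN n END AS group_id        -- plays the role of IF(..., n, NULL)
FROM (
    SELECT seq, event,
           SUM(CASE WHEN event IN ('stream_start_init', 'stream_resume')
                    THEN 1 ELSE 0 END) OVER (ORDER BY seq) AS n
    FROM t
)
ORDER BY seq
"""

def group_ids():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (seq INTEGER, event TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(EVENTS, start=1)))
    return [row[2] for row in conn.execute(MASKED_SQL)]

print(group_ids())
```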
I have the following input table. I need a smart way to dynamically renumber the top-level (parent) section indexes, starting from "01", and show them in a new column.
I'm using SQL Server 2014 Express SP2.
MyTable:
ID Integer
SECTION Varchar
Query:
SELECT * FROM MyTable
Results:
+--+--------+
|ID|SECTION |
+--+--------+
|1 |03 |
|2 |03.01 |
|3 |03.01.01|
|4 |03.02 |
|5 |03.03 |
|6 |04 |
|7 |04.01 |
|8 |04.02 |
|9 |05 |
+--+--------+
Here is what I'm trying to achieve from my select or procedure:
+--+--------+--------+
|ID|SECTION |NEWSECT |
+--+--------+--------+
|1 |03 |01 |
|2 |03.01 |01.01 |
|3 |03.01.01|01.01.01|
|4 |03.02 |01.02 |
|5 |03.03 |01.03 |
|6 |04 |02 |
|7 |04.01 |02.01 |
|8 |04.02 |02.02 |
|9 |05 |03 |
+--+--------+--------+
This is just string manipulation:
select t.*,
stuff(section, 1, 2,
right(concat('00', dense_rank() over (order by left(section, 2))), 2)
) as newsect
from t;
The dense_rank() does the work of renumbering the main sections; the rest just splices the new value into the section string in place of its first two characters.
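The same approach can be tried outside SQL Server. This sketch uses Python's sqlite3 module, where printf('%02d', ...) stands in for right(concat('00', ...), 2) and substr(section, 3) keeps everything after the first two characters (the part stuff() preserves in the answer). Like the answer, it assumes every top-level number is exactly two digits.

```python
import sqlite3

# Sample rows from the question: (id, section)
SECTIONS = [(1, "03"), (2, "03.01"), (3, "03.01.01"), (4, "03.02"), (5, "03.03"),
            (6, "04"), (7, "04.01"), (8, "04.02"), (9, "05")]

RENUMBER_SQL = """
SELECT id, section,
       -- zero-pad the new top-level number, then re-attach the remainder
       printf('%02d', top_rank) || substr(section, 3) AS newsect
FROM (
    SELECT id, section,
           DENSE_RANK() OVER (ORDER BY substr(section, 1, 2)) AS top_rank
    FROM MyTable
)
ORDER BY id
"""

def renumber():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER, section TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", SECTIONS)
    return conn.execute(RENUMBER_SQL).fetchall()

for row in renumber():
    print(row)
```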
I have a table of lesson scores, stored per user:
------------------------
|uid |lesson_id |score |
------------------------
|1 |0 |20 |
|1 |0 |25 |
|1 |0 |15 |
|1 |0 |40 |
|1 |1 |70 |
|1 |0 |10 |
|1 |1 |20 |
|1 |1 |55 |
|1 |1 |55 |
|1 |0 |5 |
|1 |2 |65 |
------------------------
I also have a table of all possible lessons that can be scored:
------------
|lesson_id |
------------
|0 |
|1 |
|2 |
|3 |
|4 |
|5 |
------------
I need to calculate the maximum score for each lesson in the second table from the scores in the first table, then average those maxima over the number of lessons in the second table (a lesson with no scores counts as 0):
So, the maximum scores for the scores table are (for user 1):
-----------------------
|lesson_id |max_score |
-----------------------
|0 |40 |
|1 |70 |
|2 |65 |
-----------------------
I need to sum them (175) and divide by the total number of lessons in table 2 (6), which should give 29.166... (about 29.17).
Any ideas how to do this in a single statement?
I can get the average of all max values for the scores table (for user 1) like so:
SELECT AVG(max_score) AS avg_max_score FROM
(
SELECT uid, lesson_id, MAX(score) AS max_score FROM cdu_user_progress
WHERE uid = 1
GROUP BY lesson_id
) AS m
SELECT
AVG(max_score)
FROM
(
SELECT
lesson.lesson_id,
MAX(COALESCE(score, 0)) AS max_score
FROM
lesson
LEFT JOIN
cdu_user_progress
ON
lesson.lesson_id = cdu_user_progress.lesson_id
AND cdu_user_progress.uid = 1
GROUP BY
lesson.lesson_id
) AS m
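The LEFT JOIN is what makes lessons without any score count as 0 in the average. The sketch below checks this against the sample data with Python's sqlite3 module (SQLite, like MySQL, has COALESCE but not SQL Server's two-argument ISNULL). Putting the uid condition in the ON clause rather than in WHERE keeps the unscored lessons in the result; a WHERE filter on the outer-joined table would discard them. The uid parameter is the only addition.

```python
import sqlite3

# Sample data from the question: (uid, lesson_id, score) and all lesson ids.
SCORES = [(1, 0, 20), (1, 0, 25), (1, 0, 15), (1, 0, 40), (1, 1, 70), (1, 0, 10),
          (1, 1, 20), (1, 1, 55), (1, 1, 55), (1, 0, 5), (1, 2, 65)]
LESSONS = [0, 1, 2, 3, 4, 5]

AVG_MAX_SQL = """
SELECT AVG(max_score)
FROM (
    SELECT lesson.lesson_id,
           MAX(COALESCE(p.score, 0)) AS max_score  -- unscored lessons count as 0
    FROM lesson
    LEFT JOIN cdu_user_progress AS p
           ON p.lesson_id = lesson.lesson_id
          AND p.uid = ?                            -- user filter in ON, not WHERE
    GROUP BY lesson.lesson_id
) AS m
"""

def avg_max_score(uid):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cdu_user_progress (uid INTEGER, lesson_id INTEGER, score INTEGER)")
    conn.execute("CREATE TABLE lesson (lesson_id INTEGER)")
    conn.executemany("INSERT INTO cdu_user_progress VALUES (?, ?, ?)", SCORES)
    conn.executemany("INSERT INTO lesson VALUES (?)", [(lid,) for lid in LESSONS])
    return conn.execute(AVG_MAX_SQL, (uid,)).fetchone()[0]

print(avg_max_score(1))
```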
I need the sum of the first score for each lesson_id, but I also need the overall minimum and maximum scores across all lesson_ids, as well as some other info:
cdu_groups:
----------------
|id |name |
----------------
|1 |group_1 |
|2 |group_2 |
----------------
cdu_user_progress (rows for uid 145):
-----------------------------------------------------------
|id |uid |group_id |lesson_id |game_id |score |date |
-----------------------------------------------------------
|1 |145 |1 |1 |0 |40 |1391627323 |
|2 |145 |1 |1 |0 |80 |1391627567 |
|3 |145 |1 |2 |0 |40 |1391627323 |
|4 |145 |1 |3 |0 |30 |1391627323 |
|5 |145 |1 |3 |0 |90 |1391627567 |
|6 |145 |1 |4 |0 |20 |1391627323 |
|7 |145 |1 |5 |0 |35 |1391627323 |
-----------------------------------------------------------
I need this output:
-----------------------------------------------------------------
|name |group_id |min_score |max_score |... |sum_first_scores |
-----------------------------------------------------------------
|group_1 |1 |20 |90 |... |165 |
-----------------------------------------------------------------
SELECT
cdu_groups.*,
MAX(score) AS max_score,
MIN(score) AS min_score,
COUNT(DISTINCT(lesson_id)) AS scored_lesson_count,
COUNT(DISTINCT CASE WHEN score >= 75 then lesson_Id ELSE NULL END) as passed_lesson_count,
SUM(first_scores.first_score) AS sum_first_scores
FROM cdu_user_progress
JOIN cdu_groups ON cdu_groups.id = cdu_user_progress.group_id
JOIN
(
SELECT lesson_id, MIN(date), score AS first_score FROM cdu_user_progress
WHERE cdu_user_progress.uid = 145
GROUP BY lesson_id
) AS first_scores ON first_scores.lesson_id = cdu_user_progress.lesson_id
WHERE cdu_user_progress.uid = 145
I'm getting this error though:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SUM(first_scores.first_score) AS sum_first_scores FROM cdu_user_progress ' at line 7
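The syntax error aside, the query has logical problems. In the derived table, score is not tied to the row holding MIN(date): MySQL returns an arbitrary score per lesson, or rejects the query under ONLY_FULL_GROUP_BY. Joining that table back to every progress row also repeats a lesson's first score once per attempt, which inflates the SUM. In addition, the unqualified lesson_id in the outer SELECT is ambiguous, and there is no GROUP BY. One way around this, sketched below in SQLite via Python's sqlite3 (ROW_NUMBER() needs SQLite 3.25+ or MySQL 8.0+), is to pick each lesson's first score with ROW_NUMBER(), sum those separately, and join the already-aggregated value. Breaking ties on date by id is an assumption.

```python
import sqlite3

GROUPS = [(1, "group_1"), (2, "group_2")]
PROGRESS = [  # (id, uid, group_id, lesson_id, game_id, score, date) from the question
    (1, 145, 1, 1, 0, 40, 1391627323), (2, 145, 1, 1, 0, 80, 1391627567),
    (3, 145, 1, 2, 0, 40, 1391627323), (4, 145, 1, 3, 0, 30, 1391627323),
    (5, 145, 1, 3, 0, 90, 1391627567), (6, 145, 1, 4, 0, 20, 1391627323),
    (7, 145, 1, 5, 0, 35, 1391627323),
]

SUMMARY_SQL = """
WITH ranked AS (            -- number each lesson's attempts, oldest first
    SELECT group_id, lesson_id, score,
           ROW_NUMBER() OVER (PARTITION BY group_id, lesson_id
                              ORDER BY date, id) AS rn
    FROM cdu_user_progress
    WHERE uid = :uid
),
firsts AS (                 -- one row per group: sum of first attempts
    SELECT group_id, SUM(score) AS sum_first_scores
    FROM ranked
    WHERE rn = 1
    GROUP BY group_id
)
SELECT g.name, p.group_id,
       MIN(p.score) AS min_score,
       MAX(p.score) AS max_score,
       COUNT(DISTINCT p.lesson_id) AS scored_lesson_count,
       COUNT(DISTINCT CASE WHEN p.score >= 75 THEN p.lesson_id END) AS passed_lesson_count,
       f.sum_first_scores
FROM cdu_user_progress AS p
JOIN cdu_groups AS g ON g.id = p.group_id
JOIN firsts AS f ON f.group_id = p.group_id
WHERE p.uid = :uid
GROUP BY g.name, p.group_id, f.sum_first_scores
"""

def summary(uid):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cdu_groups (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE cdu_user_progress (id INTEGER, uid INTEGER, group_id INTEGER,"
                 " lesson_id INTEGER, game_id INTEGER, score INTEGER, date INTEGER)")
    conn.executemany("INSERT INTO cdu_groups VALUES (?, ?)", GROUPS)
    conn.executemany("INSERT INTO cdu_user_progress VALUES (?, ?, ?, ?, ?, ?, ?)", PROGRESS)
    return conn.execute(SUMMARY_SQL, {"uid": uid}).fetchall()

print(summary(145))
```

Because firsts is aggregated to one row per group before the join, the number of attempts per lesson no longer affects sum_first_scores.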