Calculation going wrong due to JOIN issue - sql

Table -
+----+-----------+-----------+---------+---------------------+------------+
| ID | Client_Id | Driver_Id | City_Id | Status | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1 | 1 | 10 | 1 | completed | 2013-10-01 |
| 2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 |
| 3 | 3 | 12 | 6 | completed | 2013-10-01 |
| 4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 |
| 5 | 1 | 10 | 1 | completed | 2013-10-02 |
| 6 | 2 | 11 | 6 | completed | 2013-10-02 |
| 7 | 3 | 12 | 6 | completed | 2013-10-02 |
| 8 | 2 | 12 | 12 | completed | 2013-10-03 |
| 9 | 3 | 10 | 12 | completed | 2013-10-03 |
| 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
My attempt -
WITH src
AS (SELECT Count(status) AS Denom,
request_at
FROM trips
WHERE status = 'completed'
GROUP BY request_at),
src2
AS (SELECT Count(status) AS Num,
request_at
FROM trips
WHERE status <> 'completed'
GROUP BY request_at)
SELECT Cast(Count(num) AS FLOAT)/Cast(Count(Denom) AS FLOAT) AS cancel_rate,
trips.request_at
FROM src,
src2,
trips
GROUP BY trips.request_at;
I am trying to find the cancellation rate per day, but it is coming out wrong (my output):
+-------------+------------+
| cancel_rate | request_at |
+-------------+------------+
| 24 | 2013-10-01 |
| 18 | 2013-10-02 |
| 18 | 2013-10-03 |
+-------------+------------+
The cancellation rate for 2013-10-01 should be 0.5 and not 24. Similarly for other dates it should be different.
I know the problem lies with this part but I do not know what is the correct way or how to approach it
SELECT Cast(Count(num) AS FLOAT)/Cast(Count(Denom) AS FLOAT) AS cancel_rate,
trips.request_at
FROM src,
src2,
trips
Is there any way to put in more than 1 select statement in With NAME as () clause ? So that I won't use any JOIN or multiple tables.

Use conditional aggregation:
SELECT request_at,
       SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as denom,
       SUM(CASE WHEN status <> 'completed' THEN 1 ELSE 0 END) as num,
       AVG(CASE WHEN status <> 'completed' THEN 1.0 ELSE 0 END) as cancel_rate
FROM trips
GROUP BY request_at;
Note the calculation for the cancel_rate. This is simpler to do using AVG() than by dividing the two values. The 1.0 is needed because SQL Server does integer arithmetic, so 1 / 2 is 0 rather than 0.5.
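To check the conditional aggregation against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database from Python. The SQLite/Python harness is my assumption (the question is about SQL Server), and only the columns the query needs are created. The original FROM src, src2, trips has no join condition, so it pairs every row with every other row; a single pass over trips avoids that.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, status TEXT, request_at TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        (1, "completed", "2013-10-01"),
        (2, "cancelled_by_driver", "2013-10-01"),
        (3, "completed", "2013-10-01"),
        (4, "cancelled_by_client", "2013-10-01"),
        (5, "completed", "2013-10-02"),
        (6, "completed", "2013-10-02"),
        (7, "completed", "2013-10-02"),
        (8, "completed", "2013-10-03"),
        (9, "completed", "2013-10-03"),
        (10, "cancelled_by_driver", "2013-10-03"),
    ],
)

# One pass over trips: each CASE turns a row into 1 or 0, and the
# aggregate adds them up (SUM) or averages them (AVG) per day.
cancel_rates = conn.execute("""
    SELECT request_at,
           SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed,
           SUM(CASE WHEN status <> 'completed' THEN 1 ELSE 0 END) AS cancelled,
           AVG(CASE WHEN status <> 'completed' THEN 1.0 ELSE 0 END) AS cancel_rate
    FROM trips
    GROUP BY request_at
    ORDER BY request_at
""").fetchall()

for row in cancel_rates:
    print(row)
```

The rate for 2013-10-01 comes out as 0.5, which is the value the question expects.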

OK, a bit late again, but here is another variation (edited):
SELECT SUM(CASE LEFT(status,9) WHEN 'cancelled' THEN 1. ELSE 0 END)
/COUNT(*) cancellation_rate,
request_at
FROM trips GROUP BY request_at ORDER BY request_at

Related

How do I return a row with 0 if the group by has 0 records?

I have this alert_levels table:
| id | levels |
-----------------
| 1 | critical |
| 2 | error |
| 3 | warning |
| 4 | info |
Then I have this alerts table
| id | alert_time | alert_level_id | alert_type |
--------------------------------------------------------------
| 1 | 2020-03-01 08:01:00.000 | 4 | Type 1 |
| 2 | 2020-03-03 10:58:00.000 | 4 | Type 1 |
| 3 | 2020-03-17 09:05:00.000 | 4 | Type 2 |
| 4 | 2020-03-21 21:03:00.000 | 4 | Type 2 |
| 5 | 2020-03-27 23:10:00.000 | 4 | Type 1 |
| 6 | 2020-04-10 05:49:00.000 | 4 | Type 2 |
| 7 | 2020-04-10 06:29:00.000 | 4 | Type 2 |
| 8 | 2020-04-14 18:56:00.000 | 4 | Type 2 |
| 9 | 2020-04-19 22:34:00.000 | 4 | Type 2 |
...
The alert_level_id in the alerts table is a foreign key of id from the alert_levels table.
What I want is to count the number of occurrences of each alert_type grouped by the alert_level_id within a chosen time period. If there are no occurrences, it should show 0.
This is how it should look like:
| alert_level_id | type_1_count | type_2_count | total_count|
-------------------------------------------------------------
| 1 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 |
| 4 | 9 | 130 | 139 |
I've tried something like this:
SELECT al.id,
count(CASE WHEN alert_type = 'Type 1' THEN 1 END) type_1_count,
count(CASE WHEN alert_type = 'Type 2' THEN 1 END) type_2_count,
count(CASE WHEN alert_type = 'Type 1' OR alert_type = 'Type 2' THEN 1 END) total_count
FROM alert_levels al
LEFT JOIN alerts a ON al.id = a.alert_level_id
WHERE a.alert_time >= ? AND a.alert_time < ?
GROUP BY al.id
ORDER BY al.id ASC;
First, I feel there should be a simpler query for this. Second, if the chosen period only contains alerts with alert_level_id 4, the query returns only the row for that alert level, but I always want all 4 rows returned.
In Postgres, you can use filter for conditional aggregation:
SELECT al.id,
       count(*) FILTER (WHERE a.alert_type = 'Type 1') as type_1_count,
       count(*) FILTER (WHERE a.alert_type = 'Type 2') as type_2_count,
       COUNT(a.id) as total_count
FROM alert_levels al LEFT JOIN
     alerts a
     ON al.id = a.alert_level_id AND
        a.alert_time >= ? AND a.alert_time < ? AND
        a.alert_type in ('Type 1', 'Type 2')
GROUP BY al.id
ORDER BY al.id ASC;
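Also note the conditions that have been moved to the ON clause. In the WHERE clause they are evaluated after the join, where a.alert_time is NULL for levels without alerts, so those rows are discarded and the LEFT JOIN behaves like an inner join.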
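To see the effect of moving the conditions, here is a rough sketch that runs both variants against a few of the sample rows in an in-memory SQLite database from Python. The harness, the chosen period, and the use of COUNT(CASE ...) in place of FILTER (so the sketch does not depend on the SQLite version) are my assumptions, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alert_levels (id INTEGER PRIMARY KEY, levels TEXT);
    CREATE TABLE alerts (
        id INTEGER PRIMARY KEY,
        alert_time TEXT,
        alert_level_id INTEGER REFERENCES alert_levels (id),
        alert_type TEXT
    );
    INSERT INTO alert_levels VALUES
        (1, 'critical'), (2, 'error'), (3, 'warning'), (4, 'info');
    INSERT INTO alerts VALUES
        (1, '2020-03-01 08:01:00.000', 4, 'Type 1'),
        (2, '2020-03-03 10:58:00.000', 4, 'Type 1'),
        (3, '2020-03-17 09:05:00.000', 4, 'Type 2'),
        (6, '2020-04-10 05:49:00.000', 4, 'Type 2');
""")

# Shared part of both queries; it ends inside the ON clause on purpose.
BASE = """
    SELECT al.id,
           COUNT(CASE WHEN a.alert_type = 'Type 1' THEN 1 END) AS type_1_count,
           COUNT(CASE WHEN a.alert_type = 'Type 2' THEN 1 END) AS type_2_count,
           COUNT(a.id) AS total_count
    FROM alert_levels al
    LEFT JOIN alerts a ON al.id = a.alert_level_id
"""
TAIL = " GROUP BY al.id ORDER BY al.id"
period = ("2020-03-01", "2020-04-01")  # hypothetical period: March 2020

# Period filter in WHERE: NULL-extended rows fail the test and vanish.
where_rows = conn.execute(
    BASE + " WHERE a.alert_time >= ? AND a.alert_time < ?" + TAIL, period
).fetchall()

# Period filter in ON: levels without matching alerts survive with NULLs.
on_rows = conn.execute(
    BASE + " AND a.alert_time >= ? AND a.alert_time < ?" + TAIL, period
).fetchall()

print(where_rows)
print(on_rows)
```

With the filter in WHERE only level 4 comes back; with the filter in ON all four levels are returned, the first three with zero counts.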

Get the total time every time a truck has no speed in SQL?

I have the following table in SQL Server 2014:
Vehicle_Id | Speed | Event | Datetime
-----------+---------+--------------+----------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00
1 | 0 | Door-Closed | 2019-05-04 15:15:00
1 | 50 | Driving | 2019-05-04 15:35:00
1 | 0 | Parked | 2019-05-04 15:50:00
1 | 0 | Door-Open | 2019-05-04 15:51:00
1 | 0 | Door-Closed | 2019-05-04 15:52:00
1 | 50 | Driving | 2019-05-04 15:57:00
I need to identify blocks of time in which the truck has been at speed = 0 for more than an hour. Every time a row appears with speed 0, it should start a block with a unique block_id that lasts until a row with speed > 0 appears. The total time should run from the first row where the truck has speed 0 until the next row with speed > 0.
Expected Output:
Vehicle_Id | Speed | Event | Datetime | Block | Total_State_Time_Block(Minutes)
-----------+---------+--------------+------------------------+-------------+---------------------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 | 35 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 | 35 Minutes
1 | 50 | Driving | 2019-05-04 15:35:00 | 2 | 15 Minutes
1 | 0 | Parked | 2019-05-04 15:50:00 | 3 | 7 Minutes
1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 | 7 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 | 7 Minutes
1 | 50 | Driving | 2019-05-04 15:57:00 | 4 | ...
So, as it's ordered by datetime, the idea is to create groups of adjacent rows with speed = 0 so I can identify the times a truck hasn't moved for more than an hour.
I tried windowing functions to get the result by vehicle and day. But I can't achieve this last step.
You can try with lag()
select
vehicle_id,
speed,
event,
datetime,
sum(case when speed = rnk then 0 else 1 end) over (order by datetime) as block
from
(
select
*,
lag(speed) over (order by datetime) as rnk
from myTable
) val
output:
| vehicle_id | speed | event | datetime | block |
| ---------- | ----- | ----------- | ------------------------ | ----- |
| 1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 |
| 1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 |
| 1 | 50 | Driving | 2019-05-04 15:35:00 | 2 |
| 1 | 0 | Parked | 2019-05-04 15:50:00 | 3 |
| 1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 |
| 1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 |
| 1 | 50 | Driving | 2019-05-04 15:57:00 | 4 |
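The block numbering above can be reproduced with the sample rows, and the per-block minutes from the expected output can then be derived from the block start times. This is only a sketch under a few assumptions of mine: it runs on in-memory SQLite (3.25 or later, for window functions) rather than SQL Server 2014, the table is called truck_events, the Datetime column is renamed event_time, a PARTITION BY vehicle_id is added, and the durations are computed in Python for a single vehicle.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE truck_events "
    "(vehicle_id INTEGER, speed INTEGER, event TEXT, event_time TEXT)"
)
conn.executemany(
    "INSERT INTO truck_events VALUES (?, ?, ?, ?)",
    [
        (1, 0, "Door-Open", "2019-05-04 15:00:00"),
        (1, 0, "Door-Closed", "2019-05-04 15:15:00"),
        (1, 50, "Driving", "2019-05-04 15:35:00"),
        (1, 0, "Parked", "2019-05-04 15:50:00"),
        (1, 0, "Door-Open", "2019-05-04 15:51:00"),
        (1, 0, "Door-Closed", "2019-05-04 15:52:00"),
        (1, 50, "Driving", "2019-05-04 15:57:00"),
    ],
)

# A row starts a new block when its speed differs from the previous row's
# speed (or when there is no previous row); the running sum of those
# "start" flags is the block number.
rows = conn.execute("""
    SELECT vehicle_id, speed, event, event_time,
           SUM(CASE WHEN speed = prev_speed THEN 0 ELSE 1 END)
               OVER (PARTITION BY vehicle_id ORDER BY event_time) AS block
    FROM (
        SELECT *,
               LAG(speed) OVER (PARTITION BY vehicle_id
                                ORDER BY event_time) AS prev_speed
        FROM truck_events
    ) AS val
    ORDER BY vehicle_id, event_time
""").fetchall()

blocks = [row[4] for row in rows]

# First timestamp seen per block (rows are already ordered by time).
block_start = {}
for _vehicle, _speed, _event, event_time, block in rows:
    block_start.setdefault(block, datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S"))

# A block lasts from its own start to the start of the next block;
# the last block has no end yet, so it is left out.
ordered = sorted(block_start)
block_minutes = {
    cur: (block_start[nxt] - block_start[cur]).total_seconds() / 60
    for cur, nxt in zip(ordered, ordered[1:])
}

print(blocks)
print(block_minutes)
```

Filtering block_minutes for zero-speed blocks longer than 60 minutes would then give the stops the question is after.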
If you just want periods where the truck has been at speed = 0 for an hour or more, you don't need your expected output. Instead, you can look at the next value with a speed and calculate the decimal hours.
That is, you can get the blocks directly. This gets the start of the block with the duration:
select t.*,
       datediff(second, datetime, coalesce(next_speed, max_datetime)
               ) / (60.0 * 60) as decimal_hours
from (select t.*,
             lag(speed) over (partition by vehicle_id order by datetime) as prev_speed,
             -- descending order makes the window cover the current and later rows,
             -- so the minimum is the next datetime with speed > 0
             min(case when speed > 0 then datetime end) over (partition by vehicle_id order by datetime desc) as next_speed,
             max(datetime) over (partition by vehicle_id) as max_datetime
      from t
     ) t
where (prev_speed is null or prev_speed > 0) and
      speed = 0

SQL - Identify consecutive numbers in a table

Is there a way to flag consecutive numbers in an SQL table?
Based on the values in the 'value_group_4' column, is it possible to tag continuous values? This needs to be done within each 'date_group_1' group.
I tried using row_numbers, rank, dense_rank but unable to come up with a foolproof way.
This has nothing to do with consecutiveness. You simply want to mark all rows where date_group_1 and value_group_4 are not unique.
One way:
select
mytable.*,
case when exists
(
select null
from mytable agg
where agg.date_group_1 = mytable.date_group_1
and agg.value_group_4 = mytable.value_group_4
group by agg.date_group_1, agg.value_group_4
having count(*) > 1
) then 1 else 0 end as flag
from mytable
order by date_group_1, value_group_4;
In a later version of SQL Server you'd use COUNT OVER instead.
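For reference, a COUNT OVER version could look like the sketch below. The question's data is not shown here, so the rows are made up for illustration, and the query runs on in-memory SQLite (3.25 or later) from Python rather than on SQL Server; both are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(id INTEGER PRIMARY KEY, date_group_1 TEXT, value_group_4 REAL)"
)
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO mytable (date_group_1, value_group_4) VALUES (?, ?)",
    [
        ("2018-01-11", 15.3),
        ("2018-01-11", 17.3),
        ("2018-01-11", 17.3),
        ("2018-01-11", 21.0),
        ("2018-01-22", 17.3),
    ],
)

# COUNT(*) OVER counts the rows sharing the same date and value without
# collapsing them, so every row can be flagged in a single pass.
flagged = conn.execute("""
    SELECT date_group_1,
           value_group_4,
           CASE WHEN COUNT(*) OVER (PARTITION BY date_group_1, value_group_4) > 1
                THEN 1 ELSE 0
           END AS flag
    FROM mytable
    ORDER BY id
""").fetchall()

for row in flagged:
    print(row)
```

The 17.3 on 2018-01-22 is not flagged because the count is taken per date_group_1.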
SQL tables represent unordered sets. There is no such thing as consecutive values, unless a column specifies the ordering. Your data does not have such an obvious column, but I'll assume one exists and just call it id for convenience.
With such a column, lag()/lead() does what you want:
select t.*,
(case when lag(value_group_4) over (partition by date_group_1 order by id) = value_group_4
then 1
when lead(value_group_4) over (partition by date_group_1 order by id) = value_group_4
then 1
else 0
end) as flag
from t;
On close inspection, value_group_3 may do what you want. So you can use that for the id.
If your version of SQL Server doesn't have a full suite of windowing functions, it should still be possible. This looks like a last-non-null problem, which Itzik Ben-Gan has a good example of here: http://www.itprotoday.com/software-development/last-non-null-puzzle
Also, look at Mikael Eriksson's answer here which uses no windowing functions.
If the order of your data is determined by the date_group_1, value_group_3 column values, then why not make it as simple as the following query:
select
*,
rank() over(partition by date_group_1 order by value_group_3) - 1 value_group_3,
case
when count(*) over(partition by date_group_1, value_group_3) > 1 then 1
else 0
end expected_result
from data;
Output:
| date_group_1 | category_group_2 | value_group_3 | value_group_3 | expected_result |
+--------------+------------------+---------------+---------------+-----------------+
| 2018-01-11 | A | 15.3 | 0 | 0 |
| 2018-01-11 | B | 17.3 | 1 | 1 |
| 2018-01-11 | A | 17.3 | 1 | 1 |
| 2018-01-11 | B | 21 | 3 | 0 |
| 2018-01-22 | A | 15.3 | 0 | 0 |
| 2018-01-22 | B | 17.3 | 1 | 0 |
| 2018-01-22 | A | 21 | 2 | 0 |
| 2018-01-22 | B | 23 | 3 | 0 |
| 2018-03-13 | A | 15.3 | 0 | 0 |
| 2018-03-13 | B | 17.3 | 1 | 1 |
| 2018-03-13 | A | 17.3 | 1 | 1 |
| 2018-03-13 | B | 23 | 3 | 0 |
| 2018-05-15 | A | 6 | 0 | 0 |
| 2018-05-15 | B | 6.3 | 1 | 0 |
| 2018-05-15 | A | 15 | 2 | 0 |
| 2018-05-15 | B | 16.3 | 3 | 1 |
| 2018-05-15 | A | 16.3 | 3 | 1 |
| 2018-05-15 | B | 22 | 5 | 0 |
| 2019-05-04 | A | 0 | 0 | 0 |
| 2019-05-04 | B | 7 | 1 | 0 |
| 2019-05-04 | A | 15.3 | 2 | 0 |
| 2019-05-04 | B | 17.3 | 3 | 0 |

Using CTE to count number of rows in inner query

I'm learning CTEs and I've encountered an exercise I cannot solve. It is not homework, but an exercise from an online course I'm taking to learn SQL. I'm interested in where I made a mistake and why, so an answer with only the correct code will not help me learn CTEs.
The task is to count projects that raised 100% to 150% of the minimum amount, and those that raised more than 150%.
I've written the following CTE:
WITH nice_proj AS
(SELECT project_id AS pid,
amount AS amount,
minimal_amount AS minimal
FROM donation d
INNER JOIN project p ON (d.project_id = p.id)
GROUP BY pid,
minimal,
amount
HAVING sum(amount) >= minimal_amount)
SELECT count(*) AS COUNT,
(CASE
WHEN sum(amount)/minimal <=1.5 THEN 'good projects'
ELSE 'great projects'
END) AS tag
FROM nice_proj
GROUP BY minimal;
The query returns nothing but it should produce something similar to:
+-------+----------------+
| count | tag |
+-------+----------------+
| 16 | good projects |
+-------+----------------+
| 7 | great projects |
+-------+----------------+
Please have a look at the tables (they are truncated):
donation
+----+------------+--------------+---------+------------+------------+
| id | project_id | supporter_id | amount | amount_eur | donated |
+----+------------+--------------+---------+------------+------------+
| 1 | 4 | 4 | 928.40 | 807.70 | 2016-09-07 |
+----+------------+--------------+---------+------------+------------+
| 2 | 8 | 18 | 384.38 | 334.41 | 2016-12-16 |
+----+------------+--------------+---------+------------+------------+
| 3 | 6 | 12 | 367.21 | 319.47 | 2016-01-21 |
+----+------------+--------------+---------+------------+------------+
| 4 | 2 | 19 | 108.62 | 94.50 | 2016-12-29 |
+----+------------+--------------+---------+------------+------------+
| 5 | 10 | 20 | 842.58 | 733.05 | 2016-11-30 |
+----+------------+--------------+---------+------------+------------+
| 6 | 4 | 15 | 653.76 | 568.77 | 2016-08-05 |
+----+------------+--------------+---------+------------+------------+
| 7 | 4 | 14 | 746.52 | 649.48 | 2016-08-03 |
+----+------------+--------------+---------+------------+------------+
| 8 | 10 | 3 | 962.36 | 837.25 | 2016-10-30 |
+----+------------+--------------+---------+------------+------------+
| 9 | 1 | 20 | 764.05 | 664.72 | 2016-08-24 |
+----+------------+--------------+---------+------------+------------+
| 10 | 10 | 4 | 1033.42 | 899.08 | 2016-02-26 |
+----+------------+--------------+---------+------------+------------+
| 11 | 5 | 6 | 571.90 | 497.55 | 2016-10-06 |
+----+------------+--------------+---------+------------+------------+
project
+----+------------+-----------+----------------+
| id | category | author_id | minimal_amount |
+----+------------+-----------+----------------+
| 1 | music | 1 | 1677 |
+----+------------+-----------+----------------+
| 2 | music | 5 | 21573 |
+----+------------+-----------+----------------+
| 3 | travelling | 2 | 4952 |
+----+------------+-----------+----------------+
| 4 | travelling | 5 | 3135 |
+----+------------+-----------+----------------+
| 5 | travelling | 2 | 8555 |
+----+------------+-----------+----------------+
| 6 | video | 4 | 6835 |
+----+------------+-----------+----------------+
| 7 | video | 4 | 7978 |
+----+------------+-----------+----------------+
| 8 | games | 1 | 4560 |
+----+------------+-----------+----------------+
| 9 | games | 2 | 4259 |
+----+------------+-----------+----------------+
| 10 | games | 1 | 5253 |
+----+------------+-----------+----------------+
My advice is to aggregate the donations table first, then compare it to the project table.
By doing this the join between donations and project is always 1:1. This in turn means you avoid having to group by "values" (minimal_amount), instead only grouping by "identifiers" (project_id).
WITH
donation_summary AS
(
SELECT
project_id,
SUM(amount) AS total_amount
FROM
donation
GROUP BY
project_id
)
SELECT
CASE WHEN d.total_amount <= p.minimal_amount * 1.5
THEN 'good projects'
ELSE 'great projects'
END
AS tag,
COUNT(*) AS project_count
FROM
donation_summary AS d
INNER JOIN
project AS p
ON p.id = d.project_id
WHERE
d.total_amount >= p.minimal_amount
GROUP BY
tag
That said, I'd normally use the following final query and get two columns rather than two rows...
SELECT
SUM(CASE WHEN d.total_amount <= p.minimal_amount * 1.5 THEN 1 ELSE 0 END) AS good_projects,
SUM(CASE WHEN d.total_amount > p.minimal_amount * 1.5 THEN 1 ELSE 0 END) AS great_projects
FROM
donation_summary AS d
INNER JOIN
project AS p
ON p.id = d.project_id
WHERE
d.total_amount >= p.minimal_amount
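To try the aggregate-first approach end to end, here is a small sketch on in-memory SQLite from Python. The tables in the question are truncated, so the four projects and their donations below are invented for illustration; the harness and the data are my assumptions, while the query is the two-row version from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, minimal_amount REAL);
    CREATE TABLE donation (
        id INTEGER PRIMARY KEY,
        project_id INTEGER REFERENCES project (id),
        amount REAL
    );
    -- Hypothetical data: every project needs 1000.
    INSERT INTO project VALUES (1, 1000), (2, 1000), (3, 1000), (4, 1000);
    INSERT INTO donation (project_id, amount) VALUES
        (1, 600), (1, 500),   -- 1100 in total: 110% of the minimum
        (2, 900), (2, 700),   -- 1600 in total: 160% of the minimum
        (3, 400),             -- below the minimum, filtered out
        (4, 1500);            -- exactly 150% of the minimum
""")

# Step 1 (CTE): one row per project with its donation total.
# Step 2: join 1:1 to project, keep funded projects, label and count them.
tags = conn.execute("""
    WITH donation_summary AS (
        SELECT project_id, SUM(amount) AS total_amount
        FROM donation
        GROUP BY project_id
    )
    SELECT CASE WHEN d.total_amount <= p.minimal_amount * 1.5
                THEN 'good projects'
                ELSE 'great projects'
           END AS tag,
           COUNT(*) AS project_count
    FROM donation_summary AS d
    INNER JOIN project AS p ON p.id = d.project_id
    WHERE d.total_amount >= p.minimal_amount
    GROUP BY tag
    ORDER BY tag
""").fetchall()

print(tags)
```

Because the CTE groups only by project_id, two donations of different amounts to the same project land in the same row, which is what the original attempt was missing.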
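You need to remove amount from the grouping and sum it instead. Grouping by amount puts each distinct donation amount in its own group, so sum(amount) is computed per donation amount rather than per project. This should return the expected result: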
WITH nice_proj AS
(SELECT project_id AS pid,
sum(amount) AS amount,
minimal_amount AS minimal
FROM donation d
INNER JOIN project p ON (d.project_id = p.id)
GROUP BY pid,
minimal
HAVING sum(amount) >= minimal_amount)
SELECT count(*) AS COUNT,
(CASE
WHEN amount/minimal <=1.5 THEN 'good projects'
ELSE 'great projects'
END) AS tag
FROM nice_proj
GROUP BY tag;

Complicated SELECT statement in Oracle DB

Can you please help me with one complicated select statement?
I have a table like this:
+----+-----------+-----------+-----------------+
| ID | User_name | Situation | Date_time |
+----+-----------+-----------+-----------------+
| 1 | Alex | 1 | 14.3.18 11:30 |
| 4 | Alex | 2 | 14.3.18 11:35 |
| 6 | Alex | 3 | 14.3.18 12:30 |
| 7 | Johnny | 1 | 15.3.18 10:01 |
| 9 | Johnny | 2 | 15.3.18 10:05 |
| 12 | Johnny | 3 | 15.3.18 10:20 |
| 14 | Alex | 1 | 20.3.18 20:00 |
| 15 | Alex | 2 | 20.3.18 20:25 |
| 17 | Alex | 3 | 20.3.18 21:25 |
+----+-----------+-----------+-----------------+
And I need a select statement, which will give me the following result:
User_name, Date_time_1 (Date_time of situation 1), Date_time_3 (Date_time of situation 3).
*In this case the result will have just 3 rows (2 for Alex and 1 for Johnny). Each row will contain 3 columns as described above.
And sorry for the formatting - I posted this from a mobile. I will add the result table when I get to a PC.*
This is what the output should look like:
+----+-----------+-------------+-----------------+
| ID | User_name |Date_time_1 | Date_time_3 |
+----+-----------+-------------+-----------------+
| 1 | Alex |14.3.18 11:30| 14.3.18 12:30 |
| 2 | Johnny |15.3.18 10:01| 15.3.18 10:20 |
| 3 | Alex |20.3.18 20:00| 20.3.18 21:25 |
+----+-----------+-------------+-----------------+
You could use conditional aggregation:
SELECT User_name,
MAX(CASE WHEN Situation = 1 THEN Date_time END) AS date_time_1,
MAX(CASE WHEN Situation = 3 THEN Date_time END) AS date_time_3
FROM tab
GROUP BY User_name;
EDIT
In this case the result will have just 3 rows (2 for Alex and 1 for Johnny)
WITH cte AS (
SELECT t.*, SUM(CASE WHEN Situation=1 THEN 1 ELSE 0 END)
OVER(PARTITION BY User_name ORDER BY id) AS s
FROM tab t
)
SELECT User_name,
MAX(CASE WHEN Situation = 1 THEN Date_time END) AS date_time_1,
MAX(CASE WHEN Situation = 3 THEN Date_time END) AS date_time_3
FROM cte
GROUP BY s, User_name;
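Here is a quick sketch of the edited query running against the sample rows, using in-memory SQLite (3.25 or later) from Python instead of Oracle, which is my assumption. Date_time is stored as plain text, and an ORDER BY MIN(id) is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, user_name TEXT, situation INTEGER, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        (1, "Alex", 1, "14.3.18 11:30"),
        (4, "Alex", 2, "14.3.18 11:35"),
        (6, "Alex", 3, "14.3.18 12:30"),
        (7, "Johnny", 1, "15.3.18 10:01"),
        (9, "Johnny", 2, "15.3.18 10:05"),
        (12, "Johnny", 3, "15.3.18 10:20"),
        (14, "Alex", 1, "20.3.18 20:00"),
        (15, "Alex", 2, "20.3.18 20:25"),
        (17, "Alex", 3, "20.3.18 21:25"),
    ],
)

# The running count of "situation = 1" rows per user increases by one each
# time a new 1-2-3 sequence starts, so s identifies the sequence a row
# belongs to. Grouping by (s, user_name) then pivots each sequence.
sessions = conn.execute("""
    WITH cte AS (
        SELECT t.*,
               SUM(CASE WHEN situation = 1 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY user_name ORDER BY id) AS s
        FROM tab t
    )
    SELECT user_name,
           MAX(CASE WHEN situation = 1 THEN date_time END) AS date_time_1,
           MAX(CASE WHEN situation = 3 THEN date_time END) AS date_time_3
    FROM cte
    GROUP BY s, user_name
    ORDER BY MIN(id)
""").fetchall()

for row in sessions:
    print(row)
```

This returns three rows, two for Alex and one for Johnny, as asked for in the question.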