How to streamline mssql query that includes calculation - sql

I have 3 tables containing data that I am attempting to get counts from and then do calculations on. I have a working query, but it is repetitious.
SELECT person_id,
(SELECT COUNT(*) from place_to_go where people.person_id=person_id) as 'Num_To_Go',
(SELECT COUNT(*) from place_been where people.person_id=person_id) as 'Num_Visited',
((SELECT COUNT(*) FROM place_been WHERE people.person_id=person_id) / (SELECT COUNT(*) FROM place_to_go WHERE people.person_id=person_id)) * 100 AS 'Perc_Visited'
FROM people;
What I'm trying to accomplish is to avoid repeating the subqueries for the percentage calculation. Every change I make to that end results in syntax errors, and it's getting quite frustrating.
Thought I may have been able to use
SELECT person_id,
(SELECT COUNT(*) from place_to_go where people.person_id=person_id) as 'Num_To_Go',
(SELECT COUNT(*) from place_been where people.person_id=person_id) as 'Num_Visited',
(CONVERT(DECIMAL(3,0), 'Num_To_Go'))/(CONVERT(DECIMAL(3,0), 'Num_Visited')) * 100 AS 'Perc_Visited'
FROM people;
But that ends in an error converting data type varchar to numeric.
Any pointers would be very much appreciated.

I would use APPLY:
SELECT person_id, Num_To_Go, Num_Visited, (Num_Visited * 1.0 / NULLIF(Num_To_Go, 0)) * 100 AS Perc_Visited
FROM people p OUTER APPLY
( SELECT COUNT(*) AS Num_To_Go
FROM place_to_go pg
WHERE P.person_id = pg.person_id
) pg OUTER APPLY
( SELECT COUNT(*) AS Num_Visited
FROM place_been pb
WHERE p.person_id = pb.person_id
) pb;

You can try using a subquery:
select *, CONVERT(DECIMAL(10,2), Num_Visited) / NULLIF(Num_To_Go, 0) * 100 AS 'Perc_Visited'
from
(
SELECT person_id,
(SELECT COUNT(*) from place_to_go where people.person_id=person_id) as 'Num_To_Go',
(SELECT COUNT(*) from place_been where people.person_id=person_id) as 'Num_Visited'
FROM people
)A

This is a bit of a stab in the dark, however, perhaps:
SELECT p.person_id,
COUNT(DISTINCT p2g.{id_column}) AS NumToGo,
COUNT(DISTINCT pb.{id_column}) AS NumVisited,
((COUNT(DISTINCT pb.{id_column}) * 1.0) / COUNT(DISTINCT p2g.{id_column})) * 100 AS Perc_Visited --* 1.0 due to integer math. I.e. 99/100 = 0
FROM people p
LEFT JOIN place_to_go p2g ON p.person_id = p2g.person_id
LEFT JOIN place_been pb ON p.person_id = pb.person_id
GROUP BY p.person_id;
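The "* 1.0 due to integer math" comment above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in, since SQLite, like SQL Server, truncates the result when both operands of a division are integers. The variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer is truncated, so a ratio below 1 collapses to 0.
int_result = conn.execute("SELECT 99 / 100").fetchone()[0]

# Multiplying by 1.0 first makes the division happen on real numbers.
real_result = conn.execute("SELECT 99 * 1.0 / 100").fetchone()[0]

# Putting 100.0 first gives the percentage directly.
pct_result = conn.execute("SELECT 100.0 * 99 / 100").fetchone()[0]

print(int_result, real_result, pct_result)
conn.close()
```

This is why the original query, which divides one COUNT(*) by another and only then multiplies by 100, returns 0 for anyone who has not visited every place.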

Here's how I'd tackle it:
select ppl.person_id
, coalesce(ptg.cnt,0) as 'Num_To_Go'
, coalesce(pb.cnt,0) as 'Num_Visited'
, case
when coalesce(ptg.cnt,0) = 0 then 100 --avoid /0 error ; if there are no places to go let's say we've been to them all
else 100.0 * coalesce(pb.cnt,0) / ptg.cnt
end 'Perc_Visited'
from people ppl
left outer join (select person_id, count(1) cnt from place_to_go group by person_id) ptg on ptg.person_id = ppl.person_id
left outer join (select person_id, count(1) cnt from place_been group by person_id) pb on pb.person_id = ppl.person_id
I've moved the queries to get counts into subqueries under the FROM clause; so you get a count per person once for each of the tables (place_to_go, place_been), then reuse those results any time you require them.
I join those subqueries using the person_id field. I've used left outer joins so that even if a person doesn't have any records in either table, we still see that person in the results.
I use coalesce(cnt,0) to ensure that should there be no records associated with a person we see 0 instead of null.
I stuck a case statement around the logic to calculate a percentage, since division is involved and potentially the divisor may be 0, resulting in a divide-by-zero error. This case statement ensures that in such situations we return 100%; and only use the calculation where we're safe from this exception.
Finally, I stuck 100.0 * in instead of 100 * to ensure our solution can return non-integer results; i.e. so we're not truncated to 0 decimal places.
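The points above can be tried out with a minimal sketch. It runs the same shape of query against SQLite through Python's sqlite3 module rather than SQL Server; the sample rows and the place column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY);
    CREATE TABLE place_to_go (person_id INTEGER, place TEXT);
    CREATE TABLE place_been (person_id INTEGER, place TEXT);
    INSERT INTO people VALUES (1), (2), (3);
    -- person 1: 4 to go, 1 visited; person 2: 1 and 1; person 3: nothing at all
    INSERT INTO place_to_go VALUES (1,'a'), (1,'b'), (1,'c'), (1,'d'), (2,'a');
    INSERT INTO place_been VALUES (1,'a'), (2,'a');
""")

rows = conn.execute("""
    SELECT ppl.person_id,
           COALESCE(ptg.cnt, 0) AS num_to_go,
           COALESCE(pb.cnt, 0)  AS num_visited,
           CASE WHEN COALESCE(ptg.cnt, 0) = 0 THEN 100.0  -- avoid dividing by zero
                ELSE 100.0 * COALESCE(pb.cnt, 0) / ptg.cnt
           END AS perc_visited
    FROM people ppl
    LEFT JOIN (SELECT person_id, COUNT(*) AS cnt
               FROM place_to_go GROUP BY person_id) ptg
           ON ptg.person_id = ppl.person_id
    LEFT JOIN (SELECT person_id, COUNT(*) AS cnt
               FROM place_been GROUP BY person_id) pb
           ON pb.person_id = ppl.person_id
    ORDER BY ppl.person_id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Person 3 has no rows in either table but still appears, with zero counts, because of the left joins and COALESCE.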
However, there's also an issue with your design. It assumes that every place you've been to is listed in the places to go table. If that assumption's true, you're better off having one table for places_to_go with a field to flag whether or not you've been. That way you enforce that rule in your code, improve performance, and reduce space.
i.e.:
create table places_to_go
(
place_id bigint not null foreign key references places(place_id)
, person_id bigint not null foreign key references people(person_id)
, have_been bit not null default (0)
--& indexes / primary key field for this table / whatever else as required
)
select ppl.person_id
, coalesce(ptg.cnt_to_go,0) as 'Num_To_Go'
, coalesce(ptg.cnt_have_been,0) as 'Num_Visited'
, case
when coalesce(ptg.cnt_to_go,0) = 0 then null --avoid /0 error ;
else 100.0 * ptg.cnt_have_been / ptg.cnt_to_go
end 'Perc_Visited'
from people ppl
left outer join
(
select person_id
, count(1) cnt_to_go
, count(case when have_been = 1 then 1 end) cnt_have_been
from places_to_go
group by person_id
) ptg
on ptg.person_id = ppl.person_id
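As a rough check of the single-table design, here is a sketch using SQLite via Python's sqlite3 module. The foreign keys are left out and the sample rows are invented; only the have_been flag and the conditional COUNT matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY);
    CREATE TABLE places_to_go (
        place_id  INTEGER NOT NULL,
        person_id INTEGER NOT NULL,
        have_been INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO people VALUES (1), (2);
    -- person 1 has four places listed and has been to two; person 2 has none
    INSERT INTO places_to_go VALUES (10,1,1), (11,1,0), (12,1,1), (13,1,0);
""")

flag_rows = conn.execute("""
    SELECT ppl.person_id,
           COALESCE(ptg.cnt_to_go, 0)     AS num_to_go,
           COALESCE(ptg.cnt_have_been, 0) AS num_visited,
           CASE WHEN COALESCE(ptg.cnt_to_go, 0) = 0 THEN NULL
                ELSE 100.0 * ptg.cnt_have_been / ptg.cnt_to_go
           END AS perc_visited
    FROM people ppl
    LEFT JOIN (
        SELECT person_id,
               COUNT(*) AS cnt_to_go,
               -- CASE yields NULL when have_been = 0, and COUNT skips NULLs
               COUNT(CASE WHEN have_been = 1 THEN 1 END) AS cnt_have_been
        FROM places_to_go
        GROUP BY person_id
    ) ptg ON ptg.person_id = ppl.person_id
    ORDER BY ppl.person_id
""").fetchall()

print(flag_rows)
conn.close()
```

Both counts come from one pass over one table, which is the performance argument made above.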

Wrap your query up in a derived table. Do the final calculation on its result:
select person_id, [Num_To_Go], [Num_Visited],
[Num_Visited] * 100.0 / NULLIF([Num_To_Go], 0) AS [Perc_Visited]
from
(
SELECT person_id,
(SELECT COUNT(*) from place_to_go where people.person_id=person_id) as [Num_To_Go],
(SELECT COUNT(*) from place_been where people.person_id=person_id) as [Num_Visited]
FROM people
) dt
Or have a CTE (common table expression):
with cte as
(
SELECT person_id,
(SELECT COUNT(*) from place_to_go where people.person_id=person_id) as [Num_To_Go],
(SELECT COUNT(*) from place_been where people.person_id=person_id) as [Num_Visited]
FROM people
)
select person_id, [Num_To_Go], [Num_Visited],
[Num_Visited] * 100.0 / NULLIF([Num_To_Go], 0) AS [Perc_Visited]
from cte
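A small sketch of the CTE variant, run against SQLite through Python's sqlite3 module instead of SQL Server. The table contents are invented, and the NULLIF guard against a zero divisor is an addition of this sketch rather than part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY);
    CREATE TABLE place_to_go (person_id INTEGER, place TEXT);
    CREATE TABLE place_been (person_id INTEGER, place TEXT);
    INSERT INTO people VALUES (1), (2);
    INSERT INTO place_to_go VALUES (1,'a'), (1,'b'), (1,'c'), (1,'d');
    INSERT INTO place_been VALUES (1,'a');
""")

cte_rows = conn.execute("""
    WITH counts AS (
        SELECT person_id,
               (SELECT COUNT(*) FROM place_to_go g
                 WHERE g.person_id = people.person_id) AS num_to_go,
               (SELECT COUNT(*) FROM place_been b
                 WHERE b.person_id = people.person_id) AS num_visited
        FROM people
    )
    -- The aliases are ordinary columns here, so each count is written only once.
    SELECT person_id, num_to_go, num_visited,
           num_visited * 100.0 / NULLIF(num_to_go, 0) AS perc_visited
    FROM counts
    ORDER BY person_id
""").fetchall()

print(cte_rows)
conn.close()
```

The column aliases only become usable names one query level up, which is why the wrapping is needed at all.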

Related

How to get rid of VIEW in this request

CREATE VIEW A1 AS
SELECT client_ID , COUNT(dog_id)
FROM test_clients
GROUP BY client_ID
HAVING COUNT(dog_id)=2;
CREATE VIEW A2 AS
SELECT filial , COUNT(A1.client_ID)
FROM A1
JOIN test_clients USING (client_ID)
GROUP BY filial
HAVING COUNT(A1.client_ID)>10;
SELECT COUNT(filial)
FROM A2;
As far as I understand, this can be done through a subquery, but how?
Boils down to:
SELECT count(*)
FROM (
SELECT 1
FROM (
SELECT client_id
FROM test_clients
GROUP BY 1
HAVING count(dog_id) = 2
) a1
JOIN test_clients USING (client_id)
GROUP BY filial
HAVING count(*) > 10
) a2;
Assuming filial is defined NOT NULL.
Probably faster to use a window function and get rid of the self-join:
SELECT count(*)
FROM (
SELECT 1
FROM (
SELECT filial
, count(dog_id) OVER (PARTITION BY client_id) AS dog_ct
FROM test_clients
) a1
WHERE dog_ct = 2
GROUP BY filial
HAVING count(*) > 10
) a2;
Depending on your exact table definition we might be able to optimize a bit further ...
A slight refactor of Erwin's suggestion, just for you to play around with...
The outer query works because...
the inner query happens first
the WHERE clause happens next
then the GROUP BY and HAVING clauses
then the SELECT clause (so the COUNT() OVER ())
finally the DISTINCT
SELECT
DISTINCT
COUNT(filial) OVER ()
FROM
(
SELECT
filial,
client_id,
COUNT(dog_id) OVER (PARTITION BY client_id) AS client_dog_ct
FROM
test_clients
)
count_dogs
WHERE
client_dog_ct = 2
GROUP BY
filial
HAVING
COUNT(DISTINCT client_id) > 10
You may or may not want the COUNT(DISTINCT client_id); it's not clear. So, play with that too.
I'm not saying it's any better, just that it's different and might help your learning.
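To experiment with the window-function version without creating views, here is a sketch against SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The count_filials helper, its min_rows parameter, and the sample data are hypothetical; the fixed threshold of 10 is turned into a parameter so a handful of rows is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_clients (client_id INTEGER, dog_id INTEGER, filial TEXT)")
conn.executemany(
    "INSERT INTO test_clients VALUES (?, ?, ?)",
    [
        # north: three clients with exactly two dogs each (6 qualifying rows)
        (1, 101, "north"), (1, 102, "north"),
        (2, 103, "north"), (2, 104, "north"),
        (3, 105, "north"), (3, 106, "north"),
        # south: only client 4 has exactly two dogs (2 qualifying rows)
        (4, 107, "south"), (4, 108, "south"),
        (5, 109, "south"),
        (6, 110, "south"), (6, 111, "south"), (6, 112, "south"),
    ],
)

def count_filials(conn, min_rows):
    """Count filials with more than min_rows rows from two-dog clients."""
    sql = """
        SELECT count(*)
        FROM (
            SELECT 1
            FROM (
                SELECT filial,
                       count(dog_id) OVER (PARTITION BY client_id) AS dog_ct
                FROM test_clients
            ) a1
            WHERE dog_ct = 2
            GROUP BY filial
            HAVING count(*) > ?
        ) a2
    """
    return conn.execute(sql, (min_rows,)).fetchone()[0]

print(count_filials(conn, 4), count_filials(conn, 1), count_filials(conn, 10))
```

Note that HAVING count(*) counts rows, and every two-dog client contributes two of them; that is the difference from COUNT(DISTINCT client_id) mentioned above.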

Put together two selects into one

Could you help me put the second select into the first one? I need to calculate the rate by type in the first select. The second select works fine on its own.
First select:
WITH "global" AS (
SELECT
m.id
,json_build_array(
ce.payload->>'Name',
ce.payload->>'Date',
ce.payload->>'Type',
ce.payload->>'Rate',
row_number() over (partition by m.id order by ce.payload->>'Date' desc)) as "value"
FROM public."events" ce
LEFT OUTER JOIN "external"."mapping" m
ON ce.id=m.id
WHERE ce.type IN ('cs_calls','pc_calls')
AND coalesce(ce.payload ->> 'Name', '')!=''
AND m.id IS NOT NULL
)
SELECT
id,
value
FROM "global"
Second select:
select
id,
cast(issue as float)/cast(total_count as float) as Rate
from (select
id,
sum(case when type='Issue' then 1 else 0 end) as issue,
count(*) total_count
from events
GROUP BY id)
If id is the way to join these tables, then you can try the following (keep the WITH "global" AS (...) definition from your first query in front of it):
select
g.id,
g.value,
((issue * 1.0) / total_count) as Rate
from
(
select
id,
sum(case when type='Issue' then 1 else 0 end) as issue,
count(*) total_count
from events
group by
id
) e
join global g
on e.id = g.id

Oracle - optimising SQL query

I have two tables - countries (id, name) and users (id, name, country_id). Each user belongs to one country. I want to select 10 random users from the same random country. However, there are countries that have fewer than 10 users, so I can't use them. I need to select only from those countries that have at least 10 users.
I can write something like this:
SELECT * FROM(
SELECT *
FROM users u
{MANY_OTHER_JOINS_AND_CONDITIONS}
WHERE u.country_id =
(
SELECT *
FROM
(
SELECT c.id
FROM countries c
JOIN
(
SELECT users.country_id, COUNT(*) as cnt
FROM users
{MANY_OTHER_JOINS_AND_CONDITIONS}
GROUP BY users.country_id
) X ON X.country_id = c.id
WHERE X.cnt >= 10
ORDER BY DBMS_RANDOM.RANDOM
) Y
WHERE ROWNUM = 1
)
ORDER BY DBMS_RANDOM.RANDOM
) Z WHERE ROWNUM <= 10
However, in my real scenario, I have more conditions and joins to other tables for determining which users are applicable. With this query, I must have these conditions in two places: in the query that actually selects the data and in the count subquery.
Is there any way to write a query like this without having those other conditions in two places (which is probably not good performance-wise)?
You can use a CTE for the user criteria to avoid repeating the logic and to allow the DB to cache that set once (though in my experience the DB isn't as good at that as it should be, so check your execution plan).
I'm more of a SQL Server guy than Oracle, and the syntax is subtly different, so this may need some tweaks yet, but try this:
WITH SafeUsers (ID, Name, country_id) As
(
--criteria for users only has to be specified here
SELECT ID, Name, country_id
FROM users
WHERE ...
),
RandomCountry (ID) As
(
SELECT ID
FROM (
SELECT u.country_id AS ID
FROM SafeUsers u -- but we reference it HERE
GROUP BY u.country_id
HAVING COUNT(u.Id) >= 10
ORDER BY DBMS_RANDOM.RANDOM
) c
WHERE ROWNUM = 1
)
SELECT u.*
FROM (
SELECT s.*
FROM SafeUsers s -- and HERE
INNER JOIN RandomCountry r ON s.country_id = r.ID
ORDER BY DBMS_RANDOM.RANDOM
) u
WHERE ROWNUM <= 10
And by removing nesting and introducing names for each intermediate step, this query is suddenly much easier to read and maintain.
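The structure of that CTE can be tried out with a sketch in SQLite through Python's sqlite3 module. SQLite has neither DBMS_RANDOM nor ROWNUM, so random() and LIMIT stand in for them here; the filter name <> '' is only a placeholder for the real user criteria, and the data is invented so that exactly one country is eligible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER)")
# Country 1 has 12 users (eligible); country 2 has only 3 (not eligible).
conn.executemany(
    "INSERT INTO users (name, country_id) VALUES (?, ?)",
    [(f"user{i}", 1) for i in range(12)] + [(f"guest{i}", 2) for i in range(3)],
)

picked = conn.execute("""
    WITH safe_users AS (
        -- the user criteria live here, and only here
        SELECT id, name, country_id
        FROM users
        WHERE name <> ''
    ),
    random_country AS (
        SELECT country_id
        FROM safe_users
        GROUP BY country_id
        HAVING COUNT(*) >= 10
        ORDER BY random()
        LIMIT 1
    )
    SELECT s.id, s.name, s.country_id
    FROM safe_users s
    JOIN random_country r ON s.country_id = r.country_id
    ORDER BY random()
    LIMIT 10
""").fetchall()

print(len(picked), {row[2] for row in picked})
conn.close()
```

Which ten users come back varies from run to run, but they always belong to the one country that passes the HAVING filter.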
You could create a view for the shared joins and conditions:
create view user_with_many_cond as
SELECT *
FROM users u
{MANY_OTHER_JOINS_AND_CONDITIONS}
Then, looking at your query: you could use HAVING inside the grouped subquery instead of a WHERE outside it, and the ORDER BY can be placed inside the inner query, so that the outer filter keeps just one row:
SELECT * FROM(
SELECT *
FROM user_with_many_cond u
WHERE u.country_id =
(
SELECT c.id
FROM countries c
JOIN
(
SELECT country_id, COUNT(*) as cnt
FROM user_with_many_cond
GROUP BY country_id
HAVING COUNT(*) >= 10
ORDER BY DBMS_RANDOM.RANDOM
) X ON X.country_id = c.id
WHERE ROWNUM = 1
)
ORDER BY DBMS_RANDOM.RANDOM
) Z WHERE ROWNUM <= 10
To get countries with at least 10 users:
SELECT users.country_id
, row_number() over (order by dbms_random.value()) as rn
FROM users
GROUP BY users.country_id having count(*) >= 10
Use this as a sub-query to choose a country and grab some users:
with ctry as (
SELECT users.country_id
, row_number() over (order by dbms_random.value()) as ctry_rn
FROM users
GROUP BY users.country_id having count(*) >= 10
)
, usr as (
select user_id
, row_number() over (order by dbms_random.value()) as usr_rn
from ctry
join users
on users.country_id = ctry.country_id
where ctry.ctry_rn = 1
)
select users.*
from usr
join users
on users.user_id = usr.user_id
where usr.usr_rn <= 10
/
This example ignores your {MANY_OTHER_JOINS_AND_CONDITIONS}: please inject them back where you need them.

SQL I have to find the entire row of people that did something the same day. count function?

I have a table called Donates.
I have to find all d_names who donated more than once on a single day.
I have no idea how to combine those 2 queries.
Any help is appreciated.
This is my table.
3 fields.
donors receivers giftdate
A donor can only give a given receiver a gift once.
Donors can donate more than once and receivers can receive more than once.
I just have to find who donated a gift more than once on a single day. But I also need to know when and to whom.
You are correct that you would use COUNT, and you would use a HAVING clause to filter:
select d_name
from Donates
group by d_name
having count(1) > 1
You will of course need to add whatever other clauses to meet your requirements, such as limiting to or grouping by day. The simplest being to limit the results to one single day (you can use both WHERE and HAVING in the same query):
select d_name
from Donates
where g_date = #Date
group by d_name
having count(1) > 1
Responding to your comment, you can join on this query as a derived table:
select *
from Donates
inner join (
select d_name
from Donates
where g_date = #Date
group by d_name
having count(1) > 1
) x on Donates.d_name = x.d_name
After all the comments in multiple places, I believe you're finally looking for something like:
select Donates.d_name, Donates.r_name, Donates.g_date
from Donates
inner join (
select d_name, g_date
from Donates
group by d_name, g_date
having count(1) > 1
) x on Donates.d_name = x.d_name and Donates.g_date = x.g_date
OP now says he is using Oracle, can't use GROUP BY, and wants all fields in the table.
He wants donors who donated more than once in any given day (regardless of the receivers).
select distinct d1.*
from Donates d1
inner join Donates d2
on d1.donors = d2.donors
and trunc(d1.giftdate) = trunc(d2.giftdate)
and d1.rowid <> d2.rowid
;
select *
from Donates
where d_name in (
select d_name
from Donates
where cast(d_date as Date) in (
select cast(d_date as Date)
from Donates
group by cast(d_date as Date)
having count(cast(d_date as Date)) > 1
)
group by d_name
)
I would suggest simply using analytic functions:
select d.*
from (select d.*, count(*) over (partition by trunc(d.giftdate), d.donors) as cnt
from donates d
) d
where cnt > 1;
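A sketch of the analytic-function approach, using SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). SQLite's date() plays the role of Oracle's trunc() here, the dates are stored as text, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donates (donors TEXT, receivers TEXT, giftdate TEXT);
    INSERT INTO donates VALUES
        ('ann', 'bob', '2020-01-01 09:00:00'),
        ('ann', 'cat', '2020-01-01 17:30:00'),
        ('ann', 'dan', '2020-01-02 10:00:00'),
        ('eve', 'bob', '2020-01-01 12:00:00');
""")

repeat_rows = conn.execute("""
    SELECT donors, receivers, giftdate
    FROM (
        SELECT d.*,
               -- number of gifts by this donor on this calendar day
               COUNT(*) OVER (PARTITION BY date(d.giftdate), d.donors) AS cnt
        FROM donates d
    ) d
    WHERE cnt > 1
    ORDER BY giftdate
""").fetchall()

for row in repeat_rows:
    print(row)
conn.close()
```

Every row of a same-day repeat donor is kept whole, so the receiver and the exact time come along without a join or GROUP BY.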

Inner join to check tables contain same values not working as expected

SELECT COUNT(1) FROM own.no_preselection_1_a;
SELECT COUNT(1) FROM own.no_preselection_1;
SELECT COUNT(1) FROM
(SELECT DISTINCT * FROM own.no_preselection_1_a
);
SELECT COUNT(1) FROM
(SELECT DISTINCT * FROM own.no_preselection_1
);
SELECT COUNT(1)
FROM OWN.no_preselection_1 t1
INNER JOIN OWN.no_preselection_1_a t2
ON t1.number = t2.number
AND t1.location_number = t2.location_number;
This returns:
COUNT(1)
----------------------
398
COUNT(1)
----------------------
398
COUNT(1)
----------------------
308
COUNT(1)
----------------------
308
COUNT(1)
----------------------
578
If we look at the visual explanation of joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
The problem is with that last query. I would have thought that if the sets are the same (i.e. a perfect overlap), then the inner join would return a set the size of the original sets.
Is the problem that each of the duplicates are creating entries for all of each other? (eg if there are 3 dupes of the same value on each table, it would create 3x3 = 9 entries for it?)
What's the solution here? (Just select the distincts to do the inner join on?) Is this a good test for checking if two tables contain the same data?
You have duplicates in your table, as the first and third, and second and fourth counts in your list make clear.
The join is working as it should, so there is no "problem". What are you trying to accomplish? Your goal is not being satisfied by the join.
I would suggest that you annotate your question with some actual data and the results that you want.
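The 3x3 = 9 effect from the question is easy to reproduce. This is a sketch in SQLite via Python's sqlite3 module with two tiny made-up tables, t1 and t2; the column is called num here because number is a reserved word in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (num INTEGER, location_number INTEGER);
    CREATE TABLE t2 (num INTEGER, location_number INTEGER);
    -- identical contents: one value repeated three times, one unique value
    INSERT INTO t1 VALUES (1, 10), (1, 10), (1, 10), (2, 20);
    INSERT INTO t2 VALUES (1, 10), (1, 10), (1, 10), (2, 20);
""")

total_rows = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM t1)"
).fetchone()[0]
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM t1
    INNER JOIN t2
            ON t1.num = t2.num
           AND t1.location_number = t2.location_number
""").fetchone()[0]

# 4 rows, 2 distinct, but 3*3 + 1*1 = 10 rows after the join
print(total_rows, distinct_rows, joined_rows)
conn.close()
```

Each duplicate on one side pairs with every duplicate on the other, so the join count grows with the square of the number of copies.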
If you want to show that the two tables have the same values, you might try a union. Assuming that all the columns are the same in both tables and the columns in a row uniquely identify each row:
select < all the columns in the tables >
from ((select '1' as which, t.*
from OWN.no_preselection_1 t
) union all
(select '1-a' as which, t.*
from OWN.no_preselection_1_a t
)
) t
group by < all the columns in the tables >
having count(*) <> 1
If you are limited to the two columns and want to see if there are corresponding entries (with duplicates), the following works:
select number, location_number, seqnum
from ((select '1' as which, number, location_number,
row_number() over (partition by number, location_number order by number) as seqnum
from OWN.no_preselection_1 t
) union all
(select '1-a' as which, number, location_number,
row_number() over (partition by number, location_number order by number) as seqnum
from OWN.no_preselection_1_a
)
) t
group by number, location_number, seqnum
having count(*) <> 1