How to calculate max, min and average in postgresql - sql

I want to separate jobs with color based on conversion of what I did as follow:
select count(ap.id),
(count(ap.id)/cast(SUM(j.views) as float) * 100) as conversion,
j.company_id
from applications ap
right join jobs j
on ap.job_id = j.id
where j.company_id = 61805
group by j.id
if conversion column is greater than 75% of total average of that result, I want to create new alias column and value will be green. if conversion column is between 35% and 75%, column value will be yellow and less than 35%, column value will be red.
Is can be possible do in postgres as above query I've mentioned? Thanks in advance.

Window functions are your friend:
SELECT count,
conversion,
CASE
WHEN conversion > avg_conv * 0.75
THEN 'green'
WHEN conversion < avg_conv * 0.35
THEN 'red'
ELSE 'yellow'
END AS color,
company_id
FROM (SELECT count,
conversion,
avg(conversion) OVER () AS avg_conv,
company_id
FROM (SELECT count(ap.id),
(count(ap.id)/cast(SUM(j.views) as float) * 100) AS conversion,
j.company_id
FROM applications ap
RIGHT JOIN jobs j
ON ap.job_id = j.id
WHERE j.company_id = 61805
GROUP BY j.company_id
) with_avg
) with_color;
(untested)

Related

SQL - Group values by range

I have following query:
SELECT
polutionmm2 AS metric,
sum(cnt) as value
FROM polutiondistributionstatistic as p inner join crates as c on p.crateid = c.id
WHERE
c.name = '154'
and to_timestamp(startts) >= '2021/01/20 00:00:00' group by polutionmm2
this query returns these values:
"metric","value"
50,580
100,8262
150,1548
200,6358
250,869
300,3780
350,505
400,2248
450,318
500,1674
550,312
600,7420
650,1304
700,2445
750,486
800,985
850,139
900,661
950,99
1000,550
I would need to edit the query in a way that it groups them toghether in ranges of 100, starting from 0. So everything that has a metric value between 0 and 99 should be one row, and the value the sum of the rows... like this:
"metric","value"
0,580
100,9810
200,7227
300,4285
400,2556
500,1986
600,8724
700,2931
800,1124
900,760
1000,550
The query will run over about 500.000 rows.. Can this be done via query? Is it efficient?
EDIT:
there can be up to 500 ranges, so an automatic way of grouping them would be great.
You can use generate_series() and a range type to generate the the ranges you want, e.g.:
select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
from generate_series(0,1000,100) as x(start)
This generates the ranges [0,100), [100,200) and so on up until [1000,).
You can adjust the width and the number of ranges by using different parameters for generate_series() and adjusting the expression that evaluates the last range
This can be used in an outer join to aggregate the values per range:
with ranges as (
select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
from generate_series(0,1000,100) as x(start)
)
select r.range as metric,
sum(t.value)
from ranges r
left join the_table t on r.range #> t.metric
group by range;
The expression r.range #> t.metric tests if the metric value falls into the (generated) range
Online example
You can create a Pseudo table with interval you like and join with that table.
I'll use recursive CTE for this case.
WITH RECURSIVE cte AS(
select 0 St, 99 Ed
UNION ALL
select St + 100, Ed + 100 from cte where St <= 1000
)
select cte.st as metric,sum(tb.value) as value from cte
inner join [tableName] tb --with OP query result
on tb.metric between cte.St and cte.Ed
group by cte.st
order by st
here is DB<>fiddle with some pseudo data.
use conditional aggregation
SELECT
case when polutionmm2>=0 and polutionmm2<100 then '100'
when polutionmm2>=100 and polutionmm2<200 then '200'
........
when polutionmm2>=900 and polutionmm2<1000 then '1000'
end AS metric,
sum(cnt) as value
FROM polutiondistributionstatistic as p inner join crates as c on p.crateid = c.id
WHERE
c.name = '154'
and to_timestamp(startts) >= '2021/01/20 00:00:00'
group by case when polutionmm2>=0 and polutionmm2<100 then '100'
when polutionmm2>=100 and polutionmm2<200 then '200'
........
when polutionmm2>=900 and polutionmm2<1000 then '1000'
end

Out of range integer: infinity

So I'm trying to work through a problem thats a bit hard to explain and I can't expose any of the data I'm working with but what Im trying to get my head around is the error below when running the query below - I've renamed some of the tables / columns for sensitivity issues but the structure should be the same
"Error from Query Engine - Out of range for integer: Infinity"
WITH accounts AS (
SELECT t.user_id
FROM table_a t
WHERE t.type like '%Something%'
),
CTE AS (
SELECT
st.x_user_id,
ad.name as client_name,
sum(case when st.score_type = 'Agility' then st.score_value else 0 end) as score,
st.obs_date,
ROW_NUMBER() OVER (PARTITION BY st.x_user_id,ad.name ORDER BY st.obs_date) AS rn
FROM client_scores st
LEFT JOIN account_details ad on ad.client_id = st.x_user_id
INNER JOIN accounts on st.x_user_id = accounts.user_id
--WHERE st.x_user_id IN (101011115,101012219)
WHERE st.obs_date >= '2020-05-18'
group by 1,2,4
)
SELECT
c1.x_user_id,
c1.client_name,
c1.score,
c1.obs_date,
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
FROM CTE c1
LEFT JOIN CTE c2 on c1.x_user_id = c2.x_user_id and c1.client_name = c2.client_name and c1.rn = c2.rn +2
I know the query works for sure because when I get rid of the first CTE and hard code 2 id's into a where clause i commented out it returns the data I want. But I also need it to run based on the 1st CTE which has ~5k unique id's
Here is a sample output if i try with 2 id's:
Based on the above number of row returned per id I would expect it should return 5000 * 3 rows = 150000.
What could be causing the out of range for integer error?
This line is likely your problem:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
When the value of c2.score is 0, 1.0/c2.score will be infinity and will not fit into an integer type that you’re trying to cast it into.
The reason it’s working for the two users in your example is that they don’t have a 0 value for c2.score.
You might be able to fix this by changing to:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / NULLIF(c2.score, 0)) * 100, 0) AS INT) AS score_diff

Get column sum and use to calculate percent of total, why doesn't work with CTEs

I did this following query, however it gave the the result of 0 for each orderStatusName, does anyone know where is the problem?
with tbl as (
select s.orderstatusName, c.orderStatusId,count(c.orderId) counts
from [dbo].[ci_orders] c left join
[dbo].[ci_orderStatus] s
on s.orderStatusId = c.orderStatusId
where orderedDate between '2018-10-01' and '2018-10-29'
group by orderStatusName, c.orderStatusId
)
select orderstatusName, counts/(select sum(counts) from tbl as PofTotal) from tbl
the result is :0
You're using what is known as integer math. When using 2 integers in SQL (Server) the return value is an integer as well. For example, 2 + 2 = 4, 5 * 5 = 25. The same applies to division 8 / 10 = 0. That's because 0.8 isn't an integer, but the return value will be one (so the decimal points are lost).
The common way to change this behaviour is to multiply one of the expressions by 1.0. For example:
counts/(select sum(counts) * 1.0 from tbl) as PofTotal
If you need more precision, you can increase the precision of the decimal value of 1.0 (i.e. to 1.000, 1.0000000, etc).
Use window functions and proper division:
select orderstatusName, counts * 1.0 / total_counts
from (select t.*, sum(counts) over () as total_counts
from tbl
) t;
The reason you are getting 0 is because SQL Server does integer division when the operands are integers. So, 1/2 = 0, not 0.5.

Combine 2 complex queries into 1

I am trying to figure out if there's a way to combine these 2 queries into a single one. I've run into the limits of what I know and can't figure out if this is possible or not.
This is the 1st query that gets last year sales for each day per location (for one month):
if object_id('tempdb..#LY_Data') is not null drop table #LY_Data
select
[LocationId] = ri.LocationId,
[LY_Date] = convert(date, ri.ReceiptDate),
[LY_Trans] = count(distinct ri.SalesReceiptId),
[LY_SoldQty] = convert(money, sum(ri.Qty)),
[LY_RetailAmount] = convert(money, sum(ri.ExtendedPrice)),
[LY_NetSalesAmount] = convert(money, sum(ri.ExtendedAmount))
into #LY_Data
from rpt.SalesReceiptItem ri
join #Location l
on ri.LocationId = l.Id
where ri.Ignored = 0
and ri.LineType = 1 /*Item*/
and ri.ReceiptDate between #_LYDateFrom and #_LYDateTo
group by
ri.LocationId,
ri.ReceiptDate
Then the 2nd query computes a ratio based on the total sales for that month for each day (to be used later):
if object_id('tempdb..#LY_Data2') is not null drop table #LY_Data2
select
[LocationId] = ly.LocationId,
[LY_Date] = ly.LY_Date,
[LY_Trans] = ly.LY_Trans,
[LY_RetailAmount] = ly.LY_RetailAmount,
[LY_NetSalesAmount] = ly.LY_NetSalesAmount,
[Ratio] = ly.LY_NetSalesAmount / t.MonthlySales
into #LY_Data2
from (
select
[LocationId] = ly.LocationId,
[MonthlySales] = sum(ly.LY_NetSalesAmount)
from #LY_Data ly
group by
ly.LocationId
) t
join #LY_Data ly
on t.LocationId = ly.LocationId
I've tried using the first query as a subquery in the 2nd query group-by from clause, but that won't let me select those columns in the outer most select statement (multi part identifier couldn't be bound).
As well as putting the first query into the join clause at the end of the 2nd query with the same issue.
There's probably something I'm missing, but I'm still pretty new to SQL so any help or just a pointer in the right direction would be greatly appreciated! :)
You can try using a Common Table Expression (CTE) and window function:
if object_id('tempdb..#LY_Data') is not null drop table #LY_Data
;with
cte AS
(
select
[LocationId] = ri.LocationId,
[LY_Date] = convert(date, ri.ReceiptDate),
[LY_Trans] = count(distinct ri.SalesReceiptId),
[LY_SoldQty] = convert(money, sum(ri.Qty)),
[LY_RetailAmount] = convert(money, sum(ri.ExtendedPrice)),
[LY_NetSalesAmount] = convert(money, sum(ri.ExtendedAmount))
from rpt.SalesReceiptItem ri
join #Location l
on ri.LocationId = l.Id
where ri.Ignored = 0
and ri.LineType = 1 /*Item*/
and ri.ReceiptDate between #_LYDateFrom and #_LYDateTo
group by
ri.LocationId,
ri.ReceiptDate
)
select
[LocationId] = cte.LocationId,
[LY_Date] = cte.LY_Date,
...
[Ratio] = cte.LY_NetSalesAmount / sum(cte.LY_NetSalesAmount) over (partition by cte.LocationId)
into #LY_Data
from cte
sum(cte.LY_NetSalesAmount) over (partition by cte.LocationId) gives you the sum for each locationId. The code assume that this sum is always non-zero. Otherwise, a divide-by-0 error will occur.
Seems like all you need to do is calculate ratio in the first query.
You can do this with a correlated subquery.
SELECT
...
convert(money, sum(ri.ExtendedAmount)/(SELECT sum(ri2.ExtendedAmount)
FROM rpt.SalesReceiptItem ri2
WHERE ri2.LocationId=ri.LocationId
)
) AS ratio --extended amount/total extended amount for this location

sql work out growth in query

I'm trying to work out the percentage of growth between 2 years but it is returning growth as 0.
SELECT my.finmonth,
my.trnyear,
my.drawofficenum,
my1.ytdty,
ly1.ytdly,
CASE
WHEN my1.ytdty <> 0 THEN ( my1.ytdty - ly1.ytdly ) / ly1.ytdly * 100
ELSE 0
END AS Growth2012
FROM salestymonth my
LEFT JOIN salestyytd my1
ON my.finmonth = my1.finmonth
AND my.trnyear = my1.trnyear
AND my.drawofficenum = my1.drawofficenum
LEFT JOIN saleslyytd ly1
ON my.finmonth = ly1.finmonth
AND my.trnyear = ly1.trnyear
AND my.drawofficenum = ly1.drawofficenum
WHERE my.finmonth = '1'
ORDER BY ytdty DESC
Try 1.00*(my1.YTDTY - ly1.YTDLY) / ly1.YTDLY. If your column types are integers, you won't get non-integer results from dividing unless you force the numerator to be a decimal or float.