How to calculate metrics between two tables - sql

How can I calculate a metric from two tables? I also noticed that when I use FROM tbl1, tbl2 the results are noisy: the WHERE filters do not seem to work, and a total count(*) is returned.
Query:
select
count(*) filter(WHERE tb_get_gap.system in ('LINUX','UNIX')) as gaps,
SUM(CAST(srvs AS INT)) filter(WHERE tb_getcountsrvs.type = 'LZ') as total,
100 - (( gaps / total ) * 100)
FROM tb_get_gap, tb_getcountsrvs
Error:
SQL Error [42703]: ERROR: column "gaps" does not exist
I need to count the rows in the tb_get_gap table where system is 'LINUX' or 'UNIX', then SUM() the srvs field in the
tb_getcountsrvs table where type = 'LZ', and finally
apply this formula: 100 - ((gaps / total) * 100)

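You cannot define the alias gaps and then use it in the same SELECT list, which is what the error is telling you. In SQL Server you would have to repeat the logic as well. A subquery works better. Note also that FROM tb_get_gap, tb_getcountsrvs is a cross join: every row of one table is paired with every row of the other, so both aggregates are inflated. Computing each aggregate in its own subquery avoids both problems, and multiplying by 100.0 avoids integer division:
select 100 - ((g.gaps * 100.0) / s.total) as result
from
(
select count(*) as gaps
from tb_get_gap
where system in ('LINUX','UNIX')
) g,
(
select SUM(CAST(srvs AS INT)) as total
from tb_getcountsrvs
where type = 'LZ'
) s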
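The row inflation caused by FROM tb_get_gap, tb_getcountsrvs can be checked with a small experiment. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, the sample rows are invented, and CASE expressions replace FILTER so that it also runs on older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_get_gap (system TEXT);
    CREATE TABLE tb_getcountsrvs (type TEXT, srvs TEXT);
    -- Hypothetical sample data: 2 matching gaps, 20 'LZ' servers in total.
    INSERT INTO tb_get_gap VALUES ('LINUX'), ('UNIX'), ('WINDOWS');
    INSERT INTO tb_getcountsrvs VALUES ('LZ', '12'), ('LZ', '8'), ('XX', '5');
""")

# Comma join: each gap row is paired with each server row (3 x 3 = 9 rows),
# so every aggregate is multiplied by the other table's row count.
cross_gaps, cross_total = conn.execute("""
    SELECT SUM(CASE WHEN g.system IN ('LINUX','UNIX') THEN 1 ELSE 0 END),
           SUM(CASE WHEN s.type = 'LZ' THEN CAST(s.srvs AS INTEGER) ELSE 0 END)
    FROM tb_get_gap g, tb_getcountsrvs s
""").fetchone()

# One aggregate per table, each in its own one-row subquery, then combined.
gaps, total, metric = conn.execute("""
    SELECT g.gaps, s.total, 100 - (g.gaps * 100.0 / s.total)
    FROM (SELECT COUNT(*) AS gaps FROM tb_get_gap
          WHERE system IN ('LINUX','UNIX')) g,
         (SELECT SUM(CAST(srvs AS INTEGER)) AS total FROM tb_getcountsrvs
          WHERE type = 'LZ') s
""").fetchone()

print(cross_gaps, cross_total)  # inflated values from the cross join
print(gaps, total, metric)      # per-table values and the final metric
```

Joining two one-row subqueries with a comma is harmless, because 1 x 1 is still one row; the inflation only appears when the multi-row tables themselves are cross joined.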

Related

Avoid Cartesian Join for the spark SQL query

I am trying to calculate the processRate from the total count of two temp tables but I'm getting the error "Detected implicit cartesian product for INNER join between logical plans" where I am not even performing joins. I am sure this error can be resolved by restructuring the query in correct format and I need your help on it. Below is the query,
spark.sql("""
CREATE OR REPLACE TEMPORARY VIEW final_processRate AS
SELECT
((a.total - b.total) / a.total) * 100 AS processRate
FROM
(select count (*) as total from sales) a,
(select count (*) as total from sales where status = 'PENDING') b
""")
I'm getting this error while trying to view the data using,
spark.sql("select * from final_processRate limit 10").show(false)
Can you please help on formatting the above query to resolve this issue and view the data of final_processRate?
You don't need a subquery for this. Just use conditional aggregation:
spark.sql("""
CREATE OR REPLACE TEMPORARY VIEW final_processRate AS
SELECT
((count(*) - count(case when status='PENDING' then 1 end)) / count(*)) * 100 AS processRate
FROM sales
""")
Then you can query the temp view using:
spark.sql("select * from final_processRate")
which should give you a single number/percentage calculated above.
I would write this as:
select avg(case when status = 'PENDING' then 0.0 else 1 end)
from sales;
This returns the proportion of rows that are not pending.
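To see that the count-based and AVG-based forms agree, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for Spark SQL and a made-up sales table with four rows; in SQLite the 100.0 literal is needed to force floating-point division, which Spark's / operator does on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER, status TEXT);
    -- Hypothetical sample data: 1 of 4 rows is pending.
    INSERT INTO sales VALUES (1, 'DONE'), (2, 'PENDING'), (3, 'DONE'), (4, 'DONE');
""")

# Count-based form: COUNT(expr) skips NULLs, so the CASE without ELSE
# counts only the pending rows.
(rate_counts,) = conn.execute("""
    SELECT (COUNT(*) - COUNT(CASE WHEN status = 'PENDING' THEN 1 END))
           * 100.0 / COUNT(*)
    FROM sales
""").fetchone()

# AVG-based form: average of 0/1 flags, scaled to a percentage.
(rate_avg,) = conn.execute("""
    SELECT AVG(CASE WHEN status = 'PENDING' THEN 0.0 ELSE 1.0 END) * 100
    FROM sales
""").fetchone()

print(rate_counts, rate_avg)
```

Both queries read the table once and involve no join, so the Cartesian product warning does not arise.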

query the percentage of occurrences in an SQL table

I have a table of names, where each row has the columns name, and occurrences.
I'd like to calculate the percentage of a certain name from the table.
How can I do that in one query?
You can get it by using SUM(occurrences):
select
name,
100.0 * sum(occurrences) / (select sum(occurrences) from users) as percentage
from
users
where name = 'Bob'
Try this:
SELECT name, cast(sum(occurance) as float) /
(select sum(occurance) from test) * 100 percentage FROM test
where name like '%dog%'
It is not very elegant due to the subquery in the field list but this will do the job if you want it in one query:
SELECT
`name`,
(CAST(SUM(`occurance`) AS DOUBLE)/CAST((SELECT SUM(`occurance`) FROM `user`) AS DOUBLE)) as `percent`
FROM
`user`
WHERE
`name`='miroslav';
Hope this helps,
I think conditional aggregation is the best approach:
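select sum(case when name = @name then occurrences else 0 end) / sum(occurrences) as ratio
from t;
If you want an actual percentage between 0 and 100, multiply by 100. If occurrences is an integer column, multiply by 100.0 before dividing so the result is not truncated by integer division.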
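The conditional aggregation approach can be tried out as follows. This is a sketch under assumptions: Python's built-in sqlite3 stands in for the real database, the users table and its rows are invented, the helper name percentage_for is hypothetical, and the ? placeholder plays the role of the name parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, occurrences INTEGER);
    -- Hypothetical sample data: Bob has 50 of 200 occurrences.
    INSERT INTO users VALUES ('Bob', 30), ('Alice', 50), ('Bob', 20), ('Eve', 100);
""")

def percentage_for(name):
    """Return the share of all occurrences that belong to `name`, in percent."""
    (pct,) = conn.execute("""
        SELECT 100.0 * SUM(CASE WHEN name = ? THEN occurrences ELSE 0 END)
               / SUM(occurrences)
        FROM users
    """, (name,)).fetchone()
    return pct

print(percentage_for("Bob"))
```

Unlike the WHERE name = ... variants, this form scans the table once and returns 0 rather than NULL for a name that does not occur.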

calculating percentage in postgresql with conditions

I have one table, and I want to calculate a percentage for one column.
I tried to do so in two ways,
but I get an error.
The error is 'syntax error at or near "select"'
This is my code:
WITH total AS
(
select krs_name,count(fclass) as cluster
FROM residentioal
GROUP BY krs_name
)
SELECT total.cluster,
total.krs_name
(select count(fclass)
FROM residentioal
where fclass='village'
or fclass='hamlet'
or fclass='suburb'
or fclass='island'
AND krs_name = total.krs_name
)::float / count(fclass) * 100 as percentageofonline
FROM residentioal, total
WHERE residentioal.krs_name = total.krs_name
GROUP BY total.krs_name total.krs_name
My table has 5437 rows, with 8 distinct values of krs_name and 6 distinct values in the other column, fclass. I want to calculate, for each krs_name, the percentage of rows that fall into 4 of the fclass values. So I first have to query count(fclass) grouped by krs_name, then the count of fclass where fclass matches my condition, grouped by krs_name, and finally compute count(fclass) "with condition" / count(fclass) "total fclass" * 100, grouped by krs_name.
I'm using Postgresql 9.1.
The problem is in this line:
SELECT total.cluster, total.krs_name (
The open paren makes no sense there, because a comma is missing after total.krs_name.
However, this seems to do what you want, and it is much simpler:
SELECT r.krs_name, COUNT(*) as total,
AVG( (fclass in ('village', 'hamlet', 'suburb', 'island'))::int ) * 100 as percentageofonline
FROM residentioal r
GROUP BY r.krs_name
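The per-group percentage can be checked on a toy data set. The sketch below assumes Python's built-in sqlite3 as a stand-in for PostgreSQL and invented rows; since the ::int cast is PostgreSQL syntax, a CASE expression produces the 0/1 flag here instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE residentioal (krs_name TEXT, fclass TEXT);
    -- Hypothetical sample data: group A has 2 of 4 matches, group B 2 of 3.
    INSERT INTO residentioal VALUES
        ('A', 'village'), ('A', 'town'), ('A', 'hamlet'), ('A', 'city'),
        ('B', 'suburb'), ('B', 'island'), ('B', 'town');
""")

# AVG over a 0/1 flag gives the matching fraction per group in one pass.
rows = conn.execute("""
    SELECT krs_name,
           COUNT(*) AS total,
           AVG(CASE WHEN fclass IN ('village', 'hamlet', 'suburb', 'island')
                    THEN 1.0 ELSE 0.0 END) * 100 AS percentageofonline
    FROM residentioal
    GROUP BY krs_name
    ORDER BY krs_name
""").fetchall()

for krs_name, total, pct in rows:
    print(krs_name, total, round(pct, 2))
```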

Divide COUNT Column By COUNT(DISTINCT(Column To give avg order size SQL SERVER

OK, I have a list of the month's orders, so this bit is easy:
SELECT COUNT(*) AS ITEMS,
The next part is easy too:
COUNT(DISTINCT(PICKSET_NO))AS PICKSETS,
It's the next part I can't work out:
SUM(ITEMS/PICKSETS) AS AVGPICKSETSIZE
FROM dbo.orders
Thanks for your help on this. Here is the code in one block.
SELECT
COUNT(*) AS ITEMS,
COUNT(DISTINCT(PICKSET_NO))AS PICKSETS,
SUM(ITEMS/PICKSETS)
FROM dbo.CollationOrders
GO
Repeat the expression:
SELECT COUNT(*) AS ITEMS,
COUNT(DISTINCT PICKSET_NO) AS PICKSETS,
COUNT(*) / (1.0 * COUNT(DISTINCT PICKSET_NO))
FROM dbo.CollationOrders;
You can't re-use column aliases in the same select.
The 1.0 is to prevent integer division.
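The effect of the 1.0 factor is easy to demonstrate. This sketch assumes Python's built-in sqlite3 as a stand-in for SQL Server (both truncate when dividing two integers) and a made-up CollationOrders table with 5 items in 2 picksets; the item_id column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CollationOrders (item_id INTEGER, PICKSET_NO INTEGER);
    -- Hypothetical sample data: 5 items spread over 2 picksets.
    INSERT INTO CollationOrders VALUES (1, 10), (2, 10), (3, 10), (4, 20), (5, 20);
""")

# The aggregate expressions are repeated because the aliases cannot be
# referenced elsewhere in the same SELECT list.
items, picksets, truncated, exact = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT PICKSET_NO),
           COUNT(*) / COUNT(DISTINCT PICKSET_NO),
           COUNT(*) / (1.0 * COUNT(DISTINCT PICKSET_NO))
    FROM CollationOrders
""").fetchone()

print(items, picksets, truncated, exact)
```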
You can also do it like this:
SELECT ITEMS,
PICKSETS,
ITEMS / (1.0 * PICKSETS)
FROM (
SELECT COUNT(*) AS ITEMS,
COUNT(DISTINCT PICKSET_NO) AS PICKSETS
FROM TableName)t
I ended up using:
SELECT
COUNT(*) AS ITEMS,
COUNT(DISTINCT om.PicksetNo) AS PICKSETS ,
FORMAT(COUNT(*) / (1.0 * COUNT(DISTINCT om.PicksetNo)),'N2'),
ca.YWK AS YWK
FROM CHDS_Common.dbo.OMOrder om
INNER JOIN CHDS_Management.dbo.Calendar ca ON om.EarliestPickDate = ca.DT
GROUP BY ca.YWK

how to perform multiple aggregations on a single SQL query

I have a table with three columns:
GEOID, ParcelID, and PurchaseDate.
The PKs are GEOID and ParcelID. The data is formatted as such:
GEOID   PARCELID   PURCHASEDATE
12345   AB123      1/2/1932
12345   sfw123     2/5/2012
12345   fdf323     4/2/2015
12346   dfefej     2/31/2022    <-New GEOID
What I need is an aggregation based on GEOID.
I need to count the number of ParcelIDs from last month PER GEOID
and I need to provide a percentage of that GEOID of all total sold last month.
I need to produce three columns:
GEOID Nbr_Parcels_Sold Percent_of_total
For each GEOID, I need to know how many parcels were sold last month and what percentage of all sales that number represents.
For example, if 20 parcels were sold last month and 4 of them were sold from GEOID 12345, then the output would be:
GEOID   Nbr_Parcels_Sold   Perc_Total
12345   4                  .2 (or 20%)
I am having issues with the dual aggregation. The concern is that the table in question has over 8 million records.
If there is a SQL Warrior out there who has seen this issue before, any wisdom would be greatly appreciated.
Thanks.
Hopefully you are using SQL Server 2005 or a later version, in which case you can take advantage of windowed aggregation. In this case, windowed aggregation will allow you to get the total sale count alongside the counts per GEOID and use the total in calculations. Basically, the following query returns just the counts:
SELECT
GEOID,
Nbr_Parcels_Sold = COUNT(*),
Total_Parcels_Sold = SUM(COUNT(*)) OVER ()
FROM
dbo.atable
GROUP BY
GEOID
;
The COUNT(*) call gives you counts per GEOID, according to the GROUP BY clause. Now, the SUM(...) OVER expression gives you the grand total count in the same row as the detail count. It is the empty OVER clause that tells the SUM function to add up the results of COUNT(*) across the entire result set. You can use that result in calculations just like the result of any other function (or any expression in general).
The above query simply returns the total value. As you actually want not the value itself but a percentage from it for each GEOID, you can just put the SUM(...) OVER call into an expression:
SELECT
GEOID,
Nbr_Parcels_Sold = COUNT(*),
Percent_of_total = COUNT(*) * 100 / SUM(COUNT(*)) OVER ()
FROM
dbo.atable
GROUP BY
GEOID
;
The above will give you integer percentages (truncated). If you want more precision or a different representation, remember to cast either the divisor or the dividend (optionally both) to a non-integer numeric type, since SQL Server always performs integral division when both operands are integers.
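The difference between the truncated and the exact percentage can be seen in a small experiment. This is a sketch under several assumptions: Python's built-in sqlite3 stands in for SQL Server, the bundled SQLite must be 3.25 or newer for window functions, the rows are invented, and the grouping is done in a derived table before the empty OVER () is applied, rather than nesting COUNT(*) inside SUM directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE atable (GEOID INTEGER, ParcelID TEXT);
    -- Hypothetical sample data: 1 sale in GEOID 12345, 2 in GEOID 12346.
    INSERT INTO atable VALUES (12345, 'AB123'), (12346, 'dfefej'), (12346, 'xyz789');
""")

# SUM(n) OVER () repeats the grand total on every row of the grouped result.
rows = conn.execute("""
    SELECT GEOID,
           n AS Nbr_Parcels_Sold,
           n * 100 / SUM(n) OVER ()   AS pct_truncated,
           n * 100.0 / SUM(n) OVER () AS pct_exact
    FROM (SELECT GEOID, COUNT(*) AS n FROM atable GROUP BY GEOID)
    ORDER BY GEOID
""").fetchall()

for geoid, sold, pct_truncated, pct_exact in rows:
    print(geoid, sold, pct_truncated, round(pct_exact, 2))
```

Multiplying by 100.0 instead of 100 is one way to apply the cast mentioned above; an explicit CAST to a decimal type works as well.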
How about using a subquery to count the total:
WITH data AS
(
SELECT *
FROM [Table]
WHERE
YEAR(PURCHASEDATE) * 100 + MONTH(PURCHASEDATE) = 201505
)
SELECT
GEOID,
COUNT(*) AS Nbr_Parcels_Sold,
CONVERT(decimal(18,8), COUNT(*)) /
(SELECT COUNT(*) FROM data) AS Perc_Total
FROM
data t
GROUP BY
GEOID
EDIT
To update another table with the result, put an UPDATE after the WITH clause:
WITH data AS
(
SELECT *
FROM [Table]
WHERE
YEAR(PURCHASEDATE) * 100 + MONTH(PURCHASEDATE) = 201505
)
UPDATE target SET
Nbr_Parcels_Sold = source.Nbr_Parcels_Sold,
Perc_Total = source.Perc_Total
FROM
[AnotherTable] target
INNER JOIN
(
SELECT
GEOID,
COUNT(*) AS Nbr_Parcels_Sold,
CONVERT(decimal(18,8), COUNT(*)) /
(SELECT COUNT(*) FROM data) AS Perc_Total
FROM
data t
GROUP BY
GEOID
) source ON target.GEOID = source.GEOID
Try the following. It grabs the total sales into a variable then uses it in the subsequent query:
DECLARE @pMonthStartDate DATETIME
DECLARE @MonthEndDate DATETIME
DECLARE @TotalPurchaseCount INT
SET @pMonthStartDate = <EnterFirstDayOfAMonth>
SET @MonthEndDate = DATEADD(MONTH, 1, @pMonthStartDate)
SELECT
@TotalPurchaseCount = COUNT(*)
FROM
GEOIDs
WHERE
PurchaseDate BETWEEN @pMonthStartDate
AND @MonthEndDate
SELECT
GEOID,
COUNT(PARCELID) AS Nbr_Parcels_Sold,
CAST(COUNT(PARCELID) AS FLOAT) / CAST(@TotalPurchaseCount AS FLOAT) * 100.0 AS Perc_Total
FROM
GEOIDs
WHERE
PurchaseDate BETWEEN @pMonthStartDate
AND @MonthEndDate
GROUP BY
GEOID
I'm guessing your table name is GEOIDs. Change the value of @pMonthStartDate to suit yourself. If your PKs are as you say, then this will be a quick query.