Ratingshive table is dynamically partitioned on Genre and the table contains movie_titles and rating of movies
select 100 * stars / total
from (select count(rating) as stars
from ratingshive
where rating = 5) t1,
(select count(1) as total
from ratingshive) t2
When I run the above query in Hive, I get this error -
FAILED: ParseException line 1:100 missing EOF at ',' near 't1'
Hive does not like your query most probably because of old version. But you do not need to query the same ratingshive table two times and then cross join scalar results. Use aggregation with case:
select 100 * count(case when rating=5 then 1 end)/count(*) as rating_percent
from ratingshive;
Related
I am trying to calculate the processRate from the total count of two temp tables but I'm getting the error "Detected implicit cartesian product for INNER join between logical plans" where I am not even performing joins. I am sure this error can be resolved by restructuring the query in correct format and I need your help on it. Below is the query,
spark.sql("""
CREATE OR REPLACE TEMPORARY VIEW final_processRate AS
SELECT
((a.total - b.total)/a.total))* 100 AS processRate
FROM
(select count (*) as total from sales) a,
(select count (*) as total from sales where status = 'PENDING') b
""")
I'm getting this error while trying to view the data using,
spark.sql("select * from processRate limit 10").show(false)
Can you please help on formatting the above query to resolve this issue and view the data of final_processRate?
You don't need subquery for this. Just use a conditional aggregation:
spark.sql("""
CREATE OR REPLACE TEMPORARY VIEW final_processRate AS
SELECT
((count(*) - count(case when status='PENDING' then 1 end)) / count(*)) * 100 AS processRate
FROM sales
""")
Then you can query the temp view using:
spark.sql("select * from final_processRate")
which should give you a single number/percentage calculated above.
I would write this as:
select avg(case when status = 'PENDING' then 0.0 else 1 end)
from sales;
This returns the proportion of rows that are not pending.
I have one table and I want to calculate the percentage of one column
I tried to do so in two ways.
but I am actually face with error.
The error is 'syntax error at or near "select"'
This is my code in below:
WITH total AS
(
select krs_name,count(fclass) as cluster
FROM residentioal
GROUP BY krs_name
)
SELECT total.cluster,
total.krs_name
(select count(fclass)
FROM residentioal
where fclass='village'
or fclass='hamlet'
or fclass='suburb'
or fclass='island'
AND krs_name = total.krs_name
)::float / count(fclass) * 100 as percentageofonline
FROM residentioal, total
WHERE residentioal.krs_name = total.krs_name
GROUP BY total.krs_name total.krs_name
My table has 5437 rows in which there is 8 group of krs_name and in the other column namely fclass, there is 6 group. Therefore I want to calculate the percentage of 4 groups from fclass for each krs_name . thus, i have to first query the count(fclass) group by krs_name and then query the count of fclass where fclass is equal to my condition group by krs_name and finally count(fclass) "with condition" / count(fclass) "total fclass" * 100 goup by krs_name?
I'm using Postgresql 9.1.
The problem is in this line:
SELECT total.cluster, total.krs_name (
The open paren makes no sense.
But, this seems to do what you want and it is much simpler:
SELECT r.krs_name, COUNT(*) as total,
AVG( (fclass in ('village', 'hamlet', 'suburb', 'island'))::int ) * 100 as percentageofonline
FROM residentioal r
GROUP BY r.krs_name
select COUNT(analysed) from Results where analysed="True"
I want to display count of rows in which analysed value is true.
However, my query gives the error: "The multi-part identifier "Results.runId" could not be bound.".
This is the actual query:
select ((SELECT COUNT(*) AS 'Count'
FROM Results
WHERE Analysed = 'True')/failCount) as PercentAnalysed
from Runs
where Runs.runId=Analysed.runId
My table schema is:
The value I want for a particular runId is: (the number of entries where analysed=true)/failCount
EDIT : How to merge these two queries?
i) select runId,Runs.prodId,prodDate,prodName,buildNumber,totalCount as TotalTestCases,(passCount*100)/(passCount+failCount) as PassPercent,
passCount,failCount,runOwner from Runs,Product where Runs.prodId=Product.prodId
ii) select (cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
I tried this :
select runId,Runs.prodId,prodDate,prodName,buildNumber,totalCount as TotalTestCases,(passCount*100)/(passCount+failCount) as PassPercent,
passCount,failCount,runOwner,counts.runId,(cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs,Product
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
where Runs.prodId=Product.prodId
but it gives error.
Your problems are arising from improper joining of tables. You need information from both Runs and Results, but they aren't combined properly in your query. You have the right idea with a nested subquery, but it's in the wrong spot. You're also referencing the Analysed table in the outer where clause, but it hasn't been included in the from clause.
Try this instead:
select (cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
I've set this up as an inner join to eliminate any runs which don't have analysed results; you can change it to a left join if you want those rows, but will need to add code to handle the null case. I've also added casts to the two numbers, because otherwise the query will perform integer division and truncate any fractional amounts.
I'd try the following query:
SELECT COUNT(*) AS 'Count'
FROM Results
WHERE Analysed = 'True'
This will count all of your rows where Analysed is 'True'. This should work if the datatype of your Analysed column is either BIT (Boolean) or STRING(VARCHAR, NVARCHAR).
Use CASE in Count
SELECT COUNT(CASE WHEN analysed='True' THEN analysed END) [COUNT]
FROM Results
Click here to view result
select COUNT(*) from Results where analysed="True"
Hi how can I get the percentage of each record over the total?
Lets imagine I have one table with the following
ID code Points
1 101 2
2 201 3
3 233 4
4 123 1
The percentage for ID 1 is 20% for 2 is 30% and so one
how do I get it?
There's a couple approaches to getting that result.
You essentially need the "total" points from the whole table (or whatever subset), and get that repeated on each row. Getting the percentage is a simple matter of arithmetic, the expression you use for that depends on the datatypes, and how you want that formatted.
Here's one way (out a couple possible ways) to get the specified result:
SELECT t.id
, t.code
, t.points
-- , s.tot_points
, ROUND(t.points * 100.0 / s.tot_points,1) AS percentage
FROM onetable t
CROSS
JOIN ( SELECT SUM(r.points) AS tot_points
FROM onetable r
) s
ORDER BY t.id
The view query s is run first, that gives a single row. The join operation matches that row with every row from t. And that gives us the values we need to calculate a percentage.
Another way to get this result, without using a join operation, is to use a subquery in the SELECT list to return the total.
Note that the join approach can be extended to get percentage for each "group" of records.
id type points %type
-- ---- ------ -----
1 sold 11 22%
2 sold 4 8%
3 sold 25 50%
4 bought 1 50%
5 bought 1 50%
6 sold 10 20%
To get that result, we can use the same query, but a a view query for s that returns total GROUP BY r.type, and then the join operation isn't a CROSS join, but a match based on type:
SELECT t.id
, t.type
, t.points
-- , s.tot_points_by_type
, ROUND(t.points * 100.0 / s.tot_points_by_type,1) AS `%type`
FROM onetable t
JOIN ( SELECT r.type
, SUM(r.points) AS tot_points
FROM onetable r
GROUP BY r.type
) s
ON s.type = t.type
ORDER BY t.id
To do that same result with the subquery, that's going to be a correlated subquery, and that subquery is likely to get executed for every row in t.
This is why it's more natural for me to use a join operation, rather than a subquery in the SELECT list... even when a subquery works the same. (The patterns we use for more complex queries, like assigning aliases to tables, qualifying all column references, and formatting the SQL... those patterns just work their way back into simple queries. The rationale for these patterns is kind of lost in simple queries.)
try like this
select id,code,points,(points * 100)/(select sum(points) from tabel1) from table1
To add to a good list of responses, this should be fast performance-wise, and rather easy to understand:
DECLARE #T TABLE (ID INT, code VARCHAR(256), Points INT)
INSERT INTO #T VALUES (1,'101',2), (2,'201',3),(3,'233',4), (4,'123',1)
;WITH CTE AS
(SELECT * FROM #T)
SELECT C.*, CAST(ROUND((C.Points/B.TOTAL)*100, 2) AS DEC(32,2)) [%_of_TOTAL]
FROM CTE C
JOIN (SELECT CAST(SUM(Points) AS DEC(32,2)) TOTAL FROM CTE) B ON 1=1
Just replace the table variable with your actual table inside the CTE.
I have a postgresql query like this:
with r as (
select
1 as reason_type_id,
rarreason as reason_id,
count(*) over() count_all
from
workorderlines
where
rarreason != 0
and finalinsdate >= '2012-12-01'
)
select
r.reason_id,
rt.desc,
count(r.reason_id) as num,
round((count(r.reason_id)::float / (select count(*) as total from r) * 100.0)::numeric, 2) as pct
from r
left outer join
rtreasons as rt
on
r.reason_id = rt.rtreason
and r.reason_type_id = rt.rtreasontype
group by
r.reason_id,
rt.desc
order by r.reason_id asc
This returns a table of results with 4 columns: the reason id, the description associated with that reason id, the number of entries having that reason id, and the percent of the total that number represents.
This table looks like this:
What I would like to do is only display the top 10 results based off the total number of entries having a reason id. However, whatever is leftover, I would like to compile into another row with a description called "Other". How would I do this?
with r2 as (
...everything before the select list...
dense_rank() over(order by pct) cause_rank
...the rest of your query...
)
select * from r2 where cause_rank < 11
union
select
NULL as reason_id,
'Other' as desc,
sum(r2.num) over() as num,
sum(r2.pct) over() as pct,
11 as cause_rank
from r2
where cause_rank >= 11
As said above Limit and for the skipping and getting the rest use offset... Try This Site
Not sure about Postgre but SELECT TOP 10... should do the trick if you sort correctly
However about the second part: You might use a Right Join for this. Join the TOP 10 Result with the whole table data and use only the records not appearing on the left side. If you calculate the sum of those you should get your "Sum of the rest" result.
I assume that vw_my_top_10 is the view showing you the top 10 records. vw_all_records shows all records (including the top 10).
Like this:
SELECT SUM(a_field)
FROM vw_my_top_10
RIGHT JOIN vw_all_records
ON (vw_my_top_10.Key = vw_all_records.Key)
WHERE vw_my_top_10.Key IS NULL