Why is my SQL aliasing not being recognized? - sql

This may be an incredibly simple question, but I'm not seeing what the problem here is. I'm trying to teach myself SQL and was working on an experiment to play with subqueries and aliasing. When I try to enter the following query (into BigQuery), I get an error message "Unrecognized name: cast1 at [3:1]" which persists even if I copy the COUNT lines into the outer query. Obviously there is something I'm not understanding about aliasing here, but I'm not sure where I am going wrong. I would appreciate any help from more experienced SQL users out there on how to improve, thank you in advance!
SELECT
cast__1_,
cast1 + cast2 + cast3 + cast4 AS num_films,
(
SELECT
cast__1_,
COUNT (cast__1_) AS cast1,
COUNT (cast__2_) AS cast2,
COUNT (cast__3_) AS cast3,
COUNT (cast__4_) AS cast4,
FROM `dataproject1-351413.movie_data.movies` AS movies
WHERE
cast__1_ IS NOT Null
GROUP BY
cast__1_
)
FROM `dataproject1-351413.movie_data.movies`
GROUP BY
cast__1_
(The intended result was two columns, pairing each actor with the number of films across the dataset, in case that is not clear from the query)

The structure of your query seems a bit skewed, you need to treat the aggregate query as a derived table, such as:
SELECT cast__1_, cast1 + cast2 + cast3 + cast4 AS num_films
from (
SELECT
cast__1_,
COUNT (cast__1_) AS cast1,
COUNT (cast__2_) AS cast2,
COUNT (cast__3_) AS cast3,
COUNT (cast__4_) AS cast4,
FROM `dataproject1-351413`.movie_data.movies
WHERE cast__1_ IS NOT Null
GROUP BY cast__1_
)t;

Related

Grouping a percentage calculation in postgres/redshift

I keep running in to the same problem over and over again, hoping someone can help...
I have a large table with a category column that has 28 entries for donkey breed, then I'm counting two specific values grouped by each of those categories in subqueries like this:
WITH totaldonkeys AS (
SELECT donkeybreed,
COUNT(*) AS total
FROM donkeytable1
GROUP BY donkeybreed
)
,
sickdonkeys AS (
SELECT donkeybreed,
COUNT(*) AS totalsick
FROM donkeytable1
JOIN donkeyhealth on donkeytable1.donkeyid = donkeyhealth.donkeyid
WHERE donkeyhealth.sick IS TRUE
GROUP BY donkeybreed
)
,
It's my goal to end up with a table that has primarily the percentage of sick donkeys for each breed but I always end up struggling like hell with the problem of not being able to group by without using an aggregate function which I cannot do here:
SELECT (CAST(sickdonkeys.totalsick AS float) / totaldonkeys.total) * 100 AS percentsick,
totaldonkeys.donkeybreed
FROM totaldonkeys, sickdonkeys
GROUP BY totaldonkeys.donkeybreed
When I run this I end up with 28 results for each breed of donkey, one correct I believe but obviously hundreds of useless datapoints.
I know I'm probably being really dumb here but I keep hitting in to this same problem again and again with new donkeydata, I should obviously be structuring the whole thing a new way because you just can't do this final query without an aggregate function, I think I must be missing something significant.
You can easily count the proportion that are sick in the donkeyhealth table
SELECT d.donkeybreed,
AVG( (dh.sick)::int ) AS proportion_sick
FROM donkeytable1 d JOIN
donkeyhealth dh
ON d.donkeyid = dh.donkeyid
GROUP BY d.donkeybreed

how to calculate prevalence using sql code

I am trying to calculate prevalence in sql.
kind of stuck in writing the code.
I want to make automative code.
I have check that I have 1453477 of sample size and number of people who has disease is 851451 using count.
The formula of calculating prevalence is no.of person who has disease/no.sample size.
select (COUNT(condition_id)/COUNT(person_id)) as prevalence
from disease
where condition_id=12345;
when I run above code, I get 1 as a output where I am suppose to get 0.5858.
Can some one please help me out?
Thanks!
In your current query you count the number of rows in the disease table, once using the column condition_id, once using the column person_id. But the number of rows is the same - this is why you get 1 as a result.
I think you need to find the number of different values for these columns. This can be done using count distinct:
select (COUNT(DISTINCT condition_id)/COUNT(DISTINCT person_id)) as prevalence
from disease
where condition_id=12345;
You can cast by
count(...)/count(...)::numeric(6,4) or
count(...)/count(...)::decimal
as two options.
Important point is apply cast to denominator or numerator part(in this case denominator), Do not apply to division as
(count(...)/count(...))::numeric(6,4) which again results an integer.
I am pretty sure that the logic that you want is something like this:
select avg( (condition_id = 12345)::int )
from disease;
Your version doesn't have the sample size, because you are filtering out people without the condition.
If you have duplicate people in the data, then this is a little more complicated. One method is:
select (count(distinct person_id) filter (where condition_id = 12345)::numeric /
count(distinct person_id
)
from disease;

SQL Oracle: Trying to pull a count with AND operators, New and needs experienced eyes

I am new to SQL and have had pretty good luck figuring things out thus far but I am missing something in this query:
The question is how to return a distinct count from two columns using another column and the criteria if the value is greater than 0.
I have tried IF and AND operators (My current query returns a 0 not an error, and it works when only using one .shp criteria)
select count (distinct ti.TO_ADDRESS)
from ti
where ti.input_id = 'xxx_029_01z_c_zzzzbab_ecrm.shp'
and ti.input_id = 'xxx_030_01z_c_zzzzbab_ecrm.shp'
and ti.OPENED>0;
Thanks so much!!
I think you want two levels of aggregation:
select count(*)
from (select ti.TO_ADDRESS
from ti
where ti.input_id in ('xxx_029_01z_c_zzzzbab_ecrm.shp', 'xxx_030_01z_c_zzzzbab_ecrm.shp') and
ti.OPENED > 0
group by ti.TO_ADDRESS
having count(distinct ti.input_id) = 2 -- has both of them
) ti;

Add a Percent of Total Column in Access Query

I have the following query:
SELECT [3-2017].[Dealer#], Sum([3-2017].Balance) AS SumOfBalance
FROM [3-2017]
GROUP BY [3-2017].[Dealer#];
I want to add a column that returns the percent of grand total for each dealer#. I think I need another single-row query with the totals and join or union somehow, but I'm completely lost.
I've looked through a few different posts and this is the top result, but it doesn't solve my problem, unfortunately.
Really best and easiest done in a report. However, try:
SELECT [3-2017].[Dealer#], Sum([3-2017].Balance) AS SumDealerBal,
(SELECT Sum(Balance) FROM [3-2017]) AS SumBalance, SumDealerBal/SumBalance AS DealerPct
FROM [3-2017]
GROUP BY [3-2017].[Dealer#];
Advise no spaces or special character/punctuation (underscore is only exception) in any names. Better would be DealerNum or Dealer_Num.

Getting a unique value from an aggregated result set

I've got an aggregated query that checks if I have more than one record matching certain conditions.
SELECT RegardingObjectId, COUNT(*) FROM [CRM_MSCRM].[dbo].[AsyncOperationBase] a
where WorkflowActivationId IN ('55D9A3CF-4BB7-E311-B56B-0050569512FE',
'1BF5B3B9-0CAE-E211-AEB5-0050569512FE',
'EB231B79-84A4-E211-97E9-0050569512FE',
'F0DDF5AE-83A3-E211-97E9-0050569512FE',
'9C34F416-F99A-464E-8309-D3B56686FE58')
and StatusCode = 10
group by RegardingObjectId
having COUNT(*) > 1
That's nice, but then there is one field in AsyncOperationBase that will be unique. Say count(*) = 3, well, AsyncOperationBaseId in AsyncOperationBase will have 3 different values since AsyncOperationBase is the table's primary key.
To be honest, I would not even know what terms and expressions to Google to find a solution.
If anyone has a solution and also, is there any words to describe what I'm looking for ? Perhaps BI people are often faced with such a requirement or something...
I could do it with an SSRS report where the report would visually do the grouping then I could expand each grouped row to get the AsyncOperationBaseId value, but simply through SQL, I can't seem to find a way out...
Thanks.
select * from [CRM_MSCRM].[dbo].[AsyncOperationBase]
where RegardingObjectId in
(
SELECT RegardingObjectId
FROM [CRM_MSCRM].[dbo].[AsyncOperationBase] a
where WorkflowActivationId IN
(
'55D9A3CF-4BB7-E311-B56B-0050569512FE',
'1BF5B3B9-0CAE-E211-AEB5-0050569512FE',
'EB231B79-84A4-E211-97E9-0050569512FE',
'F0DDF5AE-83A3-E211-97E9-0050569512FE',
'9C34F416-F99A-464E-8309-D3B56686FE58'
)
and StatusCode = 10
group by RegardingObjectId
having COUNT(*) > 1
)