Calculating x users of total users - count vs. sum - sql

I have to calculate the percentage of total daily users that are x_users. X_users are users whose column_y value is either a real NULL or the string 'null'; both occur in my tables.
For simplicity, I provided a shortened example (minus all the dimensions and group bys) of my query and subquery below.
Sample query
COUNT (DISTINCT (CASE
WHEN event_name ='launch' THEN user_id
END
)) AS daily_users,
SUM(is_null + null_str) as x_users
Sample subquery
if(column_y is null,1,0) as is_null,
if(column_y = 'null',1,0) as null_str
However, when I run this query, the resulting table shows far more x_users than daily users. That cannot be correct: x_users are a subset of all users, so their count should be lower than the total.
Sample final table
User  country  daily_users  x_users
1     US       5            12
2     UK       10           18
Can anyone help point me in the right direction? Any help is appreciated. Thanks!

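SUM(is_null + null_str) adds 1 for every matching row, so a user with several such rows is counted several times. I speculate that you want a distinct count of users instead, something like this: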
COUNT(DISTINCT CASE WHEN event_name = 'launch' THEN user_id END) as daily_users,
COUNT(DISTINCT CASE WHEN event_name = 'launch' AND (is_null > 0 OR null_str > 0) THEN user_id END) as daily_x_users,
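To see why the two approaches disagree, here is a minimal sketch run against an in-memory SQLite database. The table name events and its rows are made up for illustration, and the is_null / null_str flags are inlined as a CASE expression because SQLite has no IF().

```python
import sqlite3

# Hypothetical data: user 1 has three NULL rows, user 2 has two 'null' rows,
# user 3 has a normal value in column_y.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_name TEXT, column_y TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "launch", None), (1, "launch", None), (1, "launch", None),
        (2, "launch", "null"), (2, "launch", "null"),
        (3, "launch", "abc"),
    ],
)

daily_users, x_rows, daily_x_users = conn.execute("""
    SELECT
        COUNT(DISTINCT CASE WHEN event_name = 'launch' THEN user_id END) AS daily_users,
        -- counts matching ROWS, so repeat rows inflate the number
        SUM(CASE WHEN column_y IS NULL OR column_y = 'null' THEN 1 ELSE 0 END) AS x_rows,
        -- counts matching USERS once each
        COUNT(DISTINCT CASE WHEN event_name = 'launch'
                             AND (column_y IS NULL OR column_y = 'null')
                            THEN user_id END) AS daily_x_users
    FROM events
""").fetchone()

print(daily_users, x_rows, daily_x_users)  # 3 5 2
```

The row-based SUM (5) exceeds the number of daily users (3), while the distinct count (2) stays below it.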

Query to find sorted list of same columns with different conditions

I am new to SQL and have a question that I hope you can help me with.
with properties as
(
select count(*) as records
From propdata P
where root_tstamp >= '2020-01-01' and
(case when min_rooms is null and max_rooms is null then 0 else 1 end) = 1 and
(1 between min_rooms and max_rooms)
)
select apartments, count(*)/properties.records as all
From propdata P inner join properties
on properties.records is not null
where root_tstamp >= '2020-01-01' and
(1 between rooms and rooms) and
(case when min_rooms is null and max_rooms is null then 0 else 1 end) = 1
group by apartments, all
order by all desc;
When I run this query, I get the apartments and their share (all), sorted in descending order, for the 1-room condition in the WHERE clause ((1 between min_rooms and max_rooms)).
RESULT:
APARTMENTS             ALL
willington             9.893
greens apartment       8.92
garden glow apartment  6.82
What I need is a descending sorted list of apartments for each room count (1, 2, 3) from the WHERE condition, shown side by side as separate columns.
EXPECTED RESULT
APARTMENTS - 1 ROOM    ALL    APARTMENTS - 2 ROOM  ALL
willington             9.893  FLOWARD APARTMENTS   8.1
greens apartment       8.92   KNIGHT ANGELS        5.8
garden glow apartment  6.82   HOVEY APARTMENTS     2.3
Can anyone please help me out? Thank you.
I have shown the result I get now and the result I want. Is there any way to display the same column once per condition, each sorted in descending order, in separate columns? I appreciate your help.
SQL is not designed to work that way. It looks as though you need this for presentation to a human. If so, do not try to do this in your data layer; do it in your application / presentation layer.
For a natural SQL structure, anything dynamically varying, such as the apartment's categorisation by number of rooms, should be in a column's values, not a new column.
SELECT
rooms,
apartments,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS all
FROM
propdata
WHERE
root_tstamp >= '2020-01-01'
AND (1 between rooms and rooms)
AND (case when min_rooms is null and max_rooms is null then 0 else 1 end) = 1
GROUP BY
rooms,
apartments
ORDER BY
rooms,
all DESC;
Notes:
SUM(COUNT(*)) OVER () totals up the COUNT(*) values from the whole result set, avoiding the need for your common table expression.
I'm not sure exactly what the WHERE clause is meant to be doing, so I've just copied it from your question. If you explain it a bit more, with examples, that can most likely be improved / simplified too.
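As a rough illustration of the SUM(COUNT(*)) OVER () note, here is a small sketch against an in-memory SQLite database (window functions need SQLite 3.25 or newer). The rows are invented, the WHERE clause is left out, and the alias pct stands in for all, which is a reserved word in many dialects.

```python
import sqlite3

# Hypothetical rows: one row per listing, with its apartment name and room count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE propdata (apartments TEXT, rooms INTEGER)")
conn.executemany(
    "INSERT INTO propdata VALUES (?, ?)",
    [("willington", 1)] * 3
    + [("greens apartment", 1)]
    + [("hovey apartments", 2)] * 4,
)

rows = conn.execute("""
    SELECT
        rooms,
        apartments,
        -- the window SUM runs over the grouped result, so it is the grand total
        COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct
    FROM propdata
    GROUP BY rooms, apartments
    ORDER BY rooms, pct DESC
""").fetchall()

for row in rows:
    print(row)
# (1, 'willington', 37.5)
# (1, 'greens apartment', 12.5)
# (2, 'hovey apartments', 50.0)
```

The room count stays in a column's values, and the side-by-side layout from the question can then be built in the presentation layer from these rows.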

How to write a SQL query to calculate percentages based on values across different tables?

Suppose I have a database containing two tables, similar to below:
Table 1:
tweet_id tweet
1 Scrap the election results
2 The election was great!
3 Great stuff
Table 2:
politician tweet_id
TRUE 1
FALSE 2
FALSE 3
I'm trying to write a SQL query which returns the percentage of tweets that contain the word 'election' broken down by whether they were a politician or not.
So for instance here, the first 2 tweets in Table 1 contain the word election. By looking at Table 2, you can see that tweet_id 1 was written by a politician, whereas tweet_id 2 was written by a non-politician.
Hence, the result of the SQL query should return 50% for politicians and 50% for non-politicians (i.e. two tweets contained the word 'election', one by a politician and one by a non-politician).
Any ideas how to write this in SQL?
You could do this by creating one subquery to return all election tweets, and one subquery to return all election tweets by politicians, then join.
Here is a sample. Note that you may need to cast the totals to decimals before dividing (depending on which SQL provider you are working in).
select
politician_tweets.total / election_tweets.total
from
(
select
count(tweet) as total
from
table_1
join table_2 on table_1.tweet_id = table_2.tweet_id
where
tweet like '%election%'
) election_tweets
join
(
select
count(tweet) as total
from
table_1
join table_2 on table_1.tweet_id = table_2.tweet_id
where
tweet like '%election%' and
politician = 1
) politician_tweets
on 1 = 1
You can use aggregation like this:
select t2.politician, avg( case when t.tweet like '%election%' then 1.0 else 0 end) as election_ratio
from tweets t join
table2 t2
on t.tweet_id = t2.tweet_id
group by t2.politician;
Here is a db<>fiddle.
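The two readings of "percentage" give different numbers on the sample data, which a quick sketch makes visible. This uses an in-memory SQLite database (3.25 or newer for the window function), stores politician as 1/0, and uses the table names table_1 and table_2 as an assumption about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (tweet_id INTEGER, tweet TEXT)")
conn.execute("CREATE TABLE table_2 (politician INTEGER, tweet_id INTEGER)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", [
    (1, "Scrap the election results"),
    (2, "The election was great!"),
    (3, "Great stuff"),
])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)", [(1, 1), (0, 2), (0, 3)])

# Reading A: of all election tweets, what share came from each group?
share_of_election_tweets = conn.execute("""
    SELECT t2.politician,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS share
    FROM table_1 t1
    JOIN table_2 t2 ON t1.tweet_id = t2.tweet_id
    WHERE t1.tweet LIKE '%election%'
    GROUP BY t2.politician
    ORDER BY t2.politician
""").fetchall()

# Reading B: within each group, what fraction of its tweets mention the election?
ratio_within_group = conn.execute("""
    SELECT t2.politician,
           AVG(CASE WHEN t1.tweet LIKE '%election%' THEN 1.0 ELSE 0 END) AS election_ratio
    FROM table_1 t1
    JOIN table_2 t2 ON t1.tweet_id = t2.tweet_id
    GROUP BY t2.politician
    ORDER BY t2.politician
""").fetchall()

print(share_of_election_tweets)  # [(0, 0.5), (1, 0.5)]
print(ratio_within_group)        # [(0, 0.5), (1, 1.0)]
```

Reading A matches the 50% / 50% result described in the question; reading B is what the AVG(CASE ...) query computes.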

google bigQuery subqueries to joins

I have the following table, which is basically a simplified version of my real table. I need to aggregate a few columns; I will explain what I am trying to do and what I have written so far.
tableName
food.id STRING NULLABLE
food.basket.id STRING NULLABLE
food.foodType STRING NULLABLE
food.price INTEGER NULLABLE
food.printed BOOLEAN NULLABLE
food.variations RECORD REPEATED
food.variations.id INTEGER REPEATED
food.variations.amount INTEGER NULLABLE
Sample data
id  basket.id  foodType  price  printed  variations.id  variations.amount
1   abbcd      JUNK      100    TRUE     NULL           NULL
2   cdefg      PIZZA     200    TRUE     1234           10
                                         2345           20
                                         5678           20
3   abbcd      JUNK      200    FALSE    1234           10
4   uiwka      TOAST     500    FALSE    NULL           NULL
Variations can be things like pizza toppings. Each variation has an amount; for simplicity, say veggie toppings cost 10 cents and meat toppings cost 20 cents.
Now I am trying to aggregate some data for this table.
I am trying to get:
number of items printed (items where printed = TRUE)
number of items unprinted (items where printed = FALSE)
total cost of all items
total price of all variations
total number of unique baskets for a specific foodType
This is the query I have:
select SUM(CASE When item.printed = TRUE Then 1 Else 0 End ) as printed,
SUM(CASE When item.printed = FALSE Then 1 Else 0 End) as nonPrinted,
SUM(item.price) as price,
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
from tableName;
printed  unprinted  price  baskets  variations
2        2          1000   1        60
Now I get the result that I expect. I am trying to understand whether we can do this without subqueries, using only joins.
Below is for BigQuery Standard SQL and assumes that your query really works (I say this because your sample data does not exactly fit the query you provided).
So, the two subqueries below
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
can be replaced with
COUNT(DISTINCT IF(itemType = "JUNK", item.basket.id, NULL)) AS baskets,
SUM((SELECT SUM(amount) FROM item.variations)) AS variations
Believe it or not, the result will be the same:
Row printed nonPrinted price baskets variations
1 2 2 1000 1 60
So, as you can see, you don't need subqueries, and you don't need joins here either.
Note: in the second line, (SELECT SUM(amount) FROM item.variations) is not really the same type of subquery as in your original query. Here, for each row, it queries that row's own array to find the sum of amount in that row, and those per-row sums are then aggregated into the total.
Hope you get it
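If the per-row array sum is hard to picture, the same logic can be written out in plain Python over the sample data. This is only an analogy for what the query does, not BigQuery code; the dictionary keys (basket_id and so on) are made-up names, and a NULL variations value is modelled as an empty list.

```python
# Sample rows from the question, with the repeated record as a nested list.
rows = [
    {"id": "1", "basket_id": "abbcd", "foodType": "JUNK", "price": 100,
     "printed": True, "variations": []},
    {"id": "2", "basket_id": "cdefg", "foodType": "PIZZA", "price": 200,
     "printed": True, "variations": [
         {"id": 1234, "amount": 10},
         {"id": 2345, "amount": 20},
         {"id": 5678, "amount": 20},
     ]},
    {"id": "3", "basket_id": "abbcd", "foodType": "JUNK", "price": 200,
     "printed": False, "variations": [{"id": 1234, "amount": 10}]},
    {"id": "4", "basket_id": "uiwka", "foodType": "TOAST", "price": 500,
     "printed": False, "variations": []},
]

printed = sum(1 for r in rows if r["printed"] is True)
non_printed = sum(1 for r in rows if r["printed"] is False)
price = sum(r["price"] for r in rows)

# COUNT(DISTINCT IF(type = "JUNK", basket id, NULL)): distinct baskets among JUNK rows.
baskets = len({r["basket_id"] for r in rows if r["foodType"] == "JUNK"})

# SUM((SELECT SUM(amount) FROM item.variations)): inner sum per row, outer sum over rows.
per_row_variation_totals = [sum(v["amount"] for v in r["variations"]) for r in rows]
variations = sum(per_row_variation_totals)

print(printed, non_printed, price, baskets, variations)  # 2 2 1000 1 60
print(per_row_variation_totals)                          # [0, 50, 10, 0]
```

The inner sum never looks at another row, which is why no join back to the table is needed.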

SQL: Select Top 2 Query is Excluding Records with more than 2 Records

I just joined after having a problem writing a query in MS Access. I am trying to write a query that pulls the first two valid samples from a list of replicated sample results and then averages their values. My current query works for samples that have exactly two valid results and averages those values. However, it doesn't return samples that have more than two valid results. Here's my query:
SELECT temp_platevalid_table.samp_name AS samp_name, avg (temp_platevalid_table.mean_conc) AS fin_avg, count(temp_platevalid_table.samp_valid) AS sample_count
FROM Temp_PlateValid_table
WHERE (Temp_PlateValid_table.id In (SELECT TOP 2 S.id
FROM Temp_PlateValid_table as S
WHERE S.samp_name = S.samp_name and s.samp_valid=1 and S.samp_valid=1
ORDER BY ID))
GROUP BY Temp_PlateValid_table.samp_name
HAVING ((Count(Temp_PlateValid_table.samp_valid))=2)
ORDER BY Temp_PlateValid_table.samp_name;
Here's an example of what I'm trying to do:
ID Samp_Name Samp_Valid Mean_Conc
1 54d2d2 1 15
2 54d2d2 1 20
3 54d2d2 1 25
The average mean_conc should be 17.5, however, with my current query, I wouldn't receive a value at all for 54d2d2. Is there a way to tweak my query so that I get a value for samples that have more than two valid values? Please note that I'm using MS Access, so I don't think I can use fancier SQL code (partition by, etc.).
Thanks in advance for your help!
Is this what you want?
select pv.samp_name, avg(pv.mean_conc)
from Temp_PlateValid_table pv
where pv.samp_valid = 1 and
pv.id in (select top 2 pv2.id
from Temp_PlateValid_table as pv2
where pv2.samp_name = pv.samp_name and pv2.samp_valid = 1
order by pv2.id
)
group by pv.samp_name;
You might need avg(pv.mean_conc * 1.0).
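The key change is that the subquery is correlated on samp_name, so the "top 2" is picked per sample instead of once for the whole table. Here is a minimal sketch of that idea in an in-memory SQLite database; SQLite has no TOP, so ORDER BY ... LIMIT 2 stands in for it, and the second sample (a1b2c3) is invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Temp_PlateValid_table "
    "(id INTEGER, samp_name TEXT, samp_valid INTEGER, mean_conc REAL)"
)
conn.executemany("INSERT INTO Temp_PlateValid_table VALUES (?, ?, ?, ?)", [
    (1, "54d2d2", 1, 15), (2, "54d2d2", 1, 20), (3, "54d2d2", 1, 25),
    # hypothetical second sample with one invalid replicate in the middle
    (4, "a1b2c3", 1, 10), (5, "a1b2c3", 0, 99), (6, "a1b2c3", 1, 30),
])

averages = conn.execute("""
    SELECT pv.samp_name, AVG(pv.mean_conc) AS fin_avg
    FROM Temp_PlateValid_table AS pv
    WHERE pv.samp_valid = 1
      AND pv.id IN (
          -- first two valid ids for THIS sample (LIMIT 2 plays the role of TOP 2)
          SELECT pv2.id
          FROM Temp_PlateValid_table AS pv2
          WHERE pv2.samp_name = pv.samp_name AND pv2.samp_valid = 1
          ORDER BY pv2.id
          LIMIT 2
      )
    GROUP BY pv.samp_name
    ORDER BY pv.samp_name
""").fetchall()

print(averages)  # [('54d2d2', 17.5), ('a1b2c3', 20.0)]
```

The sample with three valid results now averages its first two values (15 and 20), giving the 17.5 expected in the question.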

SQL - Group by an aggregate function

I would like to know whether it is possible to group by an aggregate function.
Scenario:
I have a table that holds biomass (kg) and number of individuals for every day, plus a description, so I can calculate the total average weight and total number of individuals between two dates as:
select
description,
sum(biomass)/sum(number_individuals) as av.weight,
sum(number_individuals) as individuals
from
Table
group by description
This works okay. Now I want to group those individuals by weight range, in order to get something like:
description  range(kg)  number  av.weight(g)
Foo          2-3        2400    2584.48
I have tried something like
SELECT
description,
case when sum(biomass)/sum(number_individuals) >= 2000.0
and sum(biomass)*1000/sum(number_individuals) < 3000 then '2-3'
else 'nothing'
end as desc_range
FROM Table
Group by
description,
sum(biomass)/sum(number_individuals)
But it doesn't seem to work, and neither does grouping by the alias desc_range, of course.
I am using Informix 9.40 TC3
Any help will be appreciated.
Best regards
If you want to aggregate on an aggregation, you usually need a subquery. However, you mention individuals, so perhaps this is what you want:
select description,
(case when biomass between 2 and 3 then '2-3'
else 'nothing'
end) as biomass,
sum(biomass)/sum(number_individuals) as av_weight,
sum(number_individuals) as individuals
from Table
group by description,
(case when biomass between 2 and 3 then '2-3'
else 'nothing'
end);
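For the "aggregate on an aggregation" case mentioned at the start of this answer, the subquery approach could look roughly like the sketch below. It runs on an in-memory SQLite database with invented rows and a hypothetical table name (weights); whether Informix 9.40 accepts a derived table in the FROM clause is not something this sketch checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weights (description TEXT, biomass REAL, number_individuals INTEGER)"
)
# Hypothetical daily rows: biomass in kg, as in the question.
conn.executemany("INSERT INTO weights VALUES (?, ?, ?)", [
    ("Foo", 5.0, 2), ("Foo", 5.5, 2),   # 10.5 kg over 4 individuals = 2625 g each
    ("Bar", 1.0, 2),                    # 1.0 kg over 2 individuals  = 500 g each
])

ranges = conn.execute("""
    SELECT description,
           CASE WHEN av_weight_g >= 2000 AND av_weight_g < 3000 THEN '2-3'
                ELSE 'nothing'
           END AS desc_range,
           individuals,
           av_weight_g
    FROM (
        -- inner query: one row per description with its aggregates
        SELECT description,
               SUM(biomass) * 1000.0 / SUM(number_individuals) AS av_weight_g,
               SUM(number_individuals) AS individuals
        FROM weights
        GROUP BY description
    ) AS per_description
    ORDER BY description
""").fetchall()

print(ranges)  # [('Bar', 'nothing', 2, 500.0), ('Foo', '2-3', 4, 2625.0)]
```

The inner query computes the average weight once, so the outer CASE can test it without repeating the aggregate or grouping by it.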