Google BigQuery subqueries to joins - sql

I have the following table, a simplified version of my real one. I need to aggregate a few columns; I will explain what I am trying to do and what I have written so far.
tableName
food.id STRING NULLABLE
food.basket.id STRING NULLABLE
food.foodType STRING NULLABLE
food.price INTEGER NULLABLE
food.printed BOOLEAN NULLABLE
food.variations RECORD REPEATED
food.variations.id INTEGER REPEATED
food.variations.amount INTEGER NULLABLE
Sample data
id  basket.id  foodType  price  printed  variations.id  variations.amount
1   abbcd      JUNK      100    TRUE     NULL           NULL
2   cdefg      PIZZA     200    TRUE     1234           10
                                         2345           20
                                         5678           20
3   abbcd      JUNK      200    FALSE    1234           10
4   uiwka      TOAST     500    FALSE    NULL           NULL
Variations are like pizza toppings. Each variation has an amount: say, for simplicity, veggie toppings cost 10 cents and meat toppings cost 20 cents.
Now I am trying to aggregate some data from this table.
I am trying to get:
number of items printed (items where printed = TRUE)
number of items unprinted (items where printed = FALSE)
total cost of all items
total price of all variations
total number of unique baskets for a specific foodType
This is the query I have:
select SUM(CASE When item.printed = TRUE Then 1 Else 0 End ) as printed,
SUM(CASE When item.printed = FALSE Then 1 Else 0 End) as nonPrinted,
SUM(item.price) as price,
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
from tableName;
printed  unprinted  price  baskets  variations
2        2          1000   1        60
This gives the result I expect. Can the same be done without subqueries, using only joins?

The following is for BigQuery Standard SQL and assumes that your query really works (I say this because your sample data does not exactly match the query you provided).
So, the two subqueries below
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
can be replaced with
COUNT(DISTINCT IF(itemType = "JUNK", item.basket.id, NULL)) AS baskets,
SUM((SELECT SUM(amount) FROM item.variations)) AS variations
Believe it or not, the result will be the same:
Row printed nonPrinted price baskets variations
1 2 2 1000 1 60
So, as you can see, you don't need subqueries, and you don't need joins here either.
Note: in the second line, (SELECT SUM(amount) FROM item.variations) is not the same kind of subquery as in your original query. Here, for each row, you query that row's own array to get the sum of amount for the row, and those per-row sums are then aggregated into the total.
Hope you get it
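The behaviour of the two replacement expressions can be tried outside BigQuery. The following is a minimal sketch using Python's standard-library sqlite3 module, not BigQuery itself. SQLite has no REPEATED fields, so the variations array is modelled here as a hypothetical child table named variations, and the per-row array query becomes a correlated subquery. The table and column names (food, basket_id, food_id) are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food (id INTEGER, basket_id TEXT, foodType TEXT,
                       price INTEGER, printed INTEGER);
    -- Assumption: stand-in for the REPEATED RECORD food.variations.
    CREATE TABLE variations (food_id INTEGER, id INTEGER, amount INTEGER);
    INSERT INTO food VALUES
        (1, 'abbcd', 'JUNK',  100, 1),
        (2, 'cdefg', 'PIZZA', 200, 1),
        (3, 'abbcd', 'JUNK',  200, 0),
        (4, 'uiwka', 'TOAST', 500, 0);
    INSERT INTO variations VALUES
        (2, 1234, 10), (2, 2345, 20), (2, 5678, 20), (3, 1234, 10);
""")

row = conn.execute("""
    SELECT SUM(CASE WHEN printed = 1 THEN 1 ELSE 0 END) AS printed_count,
           SUM(CASE WHEN printed = 0 THEN 1 ELSE 0 END) AS non_printed_count,
           SUM(price) AS price_total,
           -- CASE yields NULL for non-JUNK rows; COUNT(DISTINCT ...) skips NULLs.
           COUNT(DISTINCT CASE WHEN foodType = 'JUNK' THEN basket_id END) AS baskets,
           -- Inner SUM runs once per food row; outer SUM adds the per-row totals.
           SUM((SELECT SUM(v.amount) FROM variations v
                WHERE v.food_id = food.id)) AS variations_total
    FROM food
""").fetchone()

print(row)  # (2, 2, 1000, 1, 60)
```

Rows without variations produce a NULL inner sum, which the outer SUM ignores, so no explicit NULL-to-0 conversion is needed in this sketch.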

Related

Calculating x users of total users - count vs. sum

I have to calculate the percentage of total daily users that are x_users. X_users are defined as those whose column y is either NULL or equal to 'null'. The first is a real null value and the second is the string 'null'; both occur in my tables.
For simplicity, I provided a shortened example (minus all the dimensions and group bys) of my query and subquery below.
Sample query
COUNT (DISTINCT (CASE
WHEN event_name ='launch' THEN user_id
END
)) AS daily_users,
SUM(is_null + null_str) as x_users
Sample subquery
if(column_y is null,1,0) as is_null,
if(column_y = 'null',1,0) as null_str
However, when I run this query, I get a table where the number of x_users is much higher than the number of daily users. That cannot be correct, since one type of user (in this case, x_users) should never exceed the total number of users.
Sample final table
User  country  daily_users  x_users
1     US       5            12
2     UK       10           18
Can anyone help point me in the right direction? Any help is appreciated. Thanks!
I speculate that you want something like this:
COUNT(DISTINCT CASE WHEN event_name = 'launch' THEN user_id END) as daily_users,
COUNT(DISTINCT CASE WHEN event_name = 'launch' AND (is_null > 0 OR null_str > 0) THEN user_id END) as daily_x_users,
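To see why SUM(is_null + null_str) can exceed the number of daily users, here is a minimal sketch using Python's standard-library sqlite3 module. The events table and its rows are invented for the illustration (they are not the asker's data), and IF(...) is written as CASE because SQLite is used here. SUM counts matching rows, while COUNT(DISTINCT ...) counts matching users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: u1 and u2 have several rows each, u3 has a real value.
    CREATE TABLE events (user_id TEXT, event_name TEXT, column_y TEXT);
    INSERT INTO events VALUES
        ('u1', 'launch', NULL), ('u1', 'launch', NULL), ('u1', 'launch', NULL),
        ('u2', 'launch', 'null'), ('u2', 'launch', 'null'),
        ('u3', 'launch', 'ok');
""")

daily_users, x_rows, daily_x_users = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN event_name = 'launch' THEN user_id END),
           SUM(is_null + null_str),   -- counts rows, not users
           COUNT(DISTINCT CASE WHEN event_name = 'launch'
                                AND (is_null > 0 OR null_str > 0)
                               THEN user_id END)
    FROM (SELECT user_id, event_name,
                 CASE WHEN column_y IS NULL THEN 1 ELSE 0 END AS is_null,
                 CASE WHEN column_y = 'null' THEN 1 ELSE 0 END AS null_str
          FROM events)
""").fetchone()

print(daily_users, x_rows, daily_x_users)  # 3 5 2
```

With three users and five flagged rows, the SUM version reports 5 while the distinct count reports 2, which matches the symptom described in the question.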

SQL - query to SUM based on certain criteria from other columns

I have an issue where I need to sum dollars by category, based on criteria from different columns. There are multiple ways that a client can be categorized, and of those categories the only one that matters is HIT. If a client has at least one line that contains HIT, it should always be categorized as HIT, even on lines where the category isn't HIT (for example, line 9). As you can see in my data set, client A has lines that are both HIT and NONE, but since client A has at least one line that is HIT, all of its dollars should be categorized as HITS dollars. Since none of the other clients have HIT categories, all of their dollars would go into NOT.
CLIENT DOLLARS CATEGORY
A 12434 HIT
B 212 NONE
C 21 NONE
D 1231 NONE
B 784 NONE
A 43577 HIT
D 64 NONE
A 123 NONE
D 12 NONE
A 53 NONE
A 10 NONE
I'm trying to build this into a CASE, i.e.:
SELECT CASE
WHEN category = 'HIT' THEN 'HITS'
WHEN category <> 'HIT' THEN 'NOT'
ELSE 'OTHER' END AS result,
SUM(dollars) AS Dollars
FROM table1
GROUP BY result
Obviously this won't pick up HIT Dollars for Client A when the category is NONE. Any help would be greatly appreciated.
Thanks!
You could join your table with a subquery that lists the hit clients:
select (case when (hits.client is null)
then 0
else 1
end) as hit,
sum(dollars) as Dollars
from t
left outer join ( select distinct client
from t
where category = 'HIT' ) hits
on t.client = hits.client
group by hit;
SQL fiddle: http://sqlfiddle.com/#!9/8d403e/7/0
Each row needs more information, namely whether its client has any HIT line. I would suggest window functions:
SELECT (CASE WHEN is_hit = 1 THEN 'HITS'
ELSE 'NOT'
END) as result,
SUM(dollars) AS Sum_Dollars
FROM (SELECT dollars,
MAX(CASE WHEN category = 'HIT' THEN 1 ELSE 0 END) OVER (PARTITION BY client) as is_hit
FROM t
)
GROUP BY is_hit
SQL Fiddle: http://sqlfiddle.com/#!4/40351/6
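The window-function approach can be checked against the sample data from the question. This is a minimal sketch using Python's standard-library sqlite3 module; it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions) and that the table is named t, as in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (client TEXT, dollars INTEGER, category TEXT);
    INSERT INTO t VALUES
        ('A', 12434, 'HIT'),  ('B', 212, 'NONE'), ('C', 21, 'NONE'),
        ('D', 1231, 'NONE'),  ('B', 784, 'NONE'), ('A', 43577, 'HIT'),
        ('D', 64, 'NONE'),    ('A', 123, 'NONE'), ('D', 12, 'NONE'),
        ('A', 53, 'NONE'),    ('A', 10, 'NONE');
""")

rows = conn.execute("""
    SELECT CASE WHEN is_hit = 1 THEN 'HITS' ELSE 'NOT' END AS result,
           SUM(dollars) AS sum_dollars
    FROM (SELECT dollars,
                 -- 1 on every row of a client that has at least one HIT line
                 MAX(CASE WHEN category = 'HIT' THEN 1 ELSE 0 END)
                     OVER (PARTITION BY client) AS is_hit
          FROM t)
    GROUP BY is_hit
    ORDER BY result
""").fetchall()

print(rows)  # [('HITS', 56197), ('NOT', 2324)]
```

All of client A's dollars, including the NONE lines, land in HITS because the flag is computed per client rather than per row.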

Postgresql: Query to know which fraction of the values are larger/smaller

I would like to query my database to know which fraction/percentage of the elements of a table are larger/smaller than a given value.
For instance, let's say I have a table shopping_list with the following schema:
id integer
name text
price double precision
with contents:
id name price
1 banana 1
2 book 20
3 chicken 5
4 chocolate 3
I am now going to buy a new item with price 4, and I would like to know where this new item will be ranked in the shopping list. In this case the element will be greater than 50% of the elements.
I know I can run two queries and count the number of elements, e.g.:
-- returns = 4
SELECT COUNT(*)
FROM shopping_list;
-- returns = 2
SELECT COUNT(*)
FROM shopping_list
WHERE price > 4;
But I would like to do it with a single query to avoid post-processing the results.
If you just want them in a single query, use UNION:
SELECT COUNT(*), 'total'
FROM shopping_list
UNION
SELECT COUNT(*),'greater'
FROM shopping_list
WHERE price > 4;
The simplest way is to use avg():
SELECT AVG( (price > 4)::float)
FROM shopping_list;
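The averaging idea works because the mean of a 0/1 flag is the fraction of rows where the flag is 1. Here is a minimal sketch using Python's standard-library sqlite3 module rather than PostgreSQL; since the ::float cast is PostgreSQL syntax, the flag is written as a CASE expression, which is an adaptation and not the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shopping_list (id INTEGER, name TEXT, price REAL);
    INSERT INTO shopping_list VALUES
        (1, 'banana', 1), (2, 'book', 20), (3, 'chicken', 5), (4, 'chocolate', 3);
""")

fraction_greater, fraction_smaller = conn.execute("""
    SELECT AVG(CASE WHEN price > 4 THEN 1.0 ELSE 0.0 END),  -- share priced above 4
           AVG(CASE WHEN price < 4 THEN 1.0 ELSE 0.0 END)   -- share priced below 4
    FROM shopping_list
""").fetchone()

print(fraction_greater, fraction_smaller)  # 0.5 0.5
```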
One way to get both results is as follows:
select count(*) as total,
(select count(*) from shopping_list where price > 4) as greater
from shopping_list
It will get both results in a single row, with the names you specified. It does, however, involve a query within a query.
I found the aggregate function PERCENT_RANK which does exactly what I wanted:
SELECT PERCENT_RANK(4) WITHIN GROUP (ORDER BY price)
FROM shopping_list;
-- returns 0.5

T-SQL SUM All with a Conditional COUNT

I have a query that produces the following:
Team | Member | Cancelled | Rate
-----------------------------------
1 John FALSE 150
1 Bill TRUE 10
2 Sarah FALSE 145
2 James FALSE 110
2 Ashley TRUE 0
What I need is to select the count of members for a team where cancelled is false and the sum of the rate regardless of cancelled status...something like this:
SELECT
Team,
COUNT(Member), --WHERE Cancelled = FALSE
SUM(Rate) --All Rows
FROM
[QUERY]
GROUP BY
Team
So the result would look like this:
Team | CountOfMember | SumOfRate
----------------------------------
1 1 160
2 2 255
This is just an example. The real query has multiple complex joins. I know I could do one query for the sum of the rate and then another for the count and then join the results of those two together, but is there a simpler way that would be less taxing and not cause me to copy and paste an already complex query?
You want a conditional sum, something like this:
sum(case when cancelled = 'false' then 1 else 0 end)
The reason for using sum(): it processes the records and adds a value, either 0 or 1, for every record. The value depends on the value of cancelled. When it is false, the sum() increments by 1, thereby counting the number of such records.
You can do something similar with count(), like this:
count(case when cancelled = 'false' then cancelled end)
The trick here is that count() counts the number of non-NULL values. The then clause can be anything that is not NULL -- cancelled, the constant 1, or some other field. Without an else, any other value is turned into NULL and not counted.
I have always preferred the sum() version over the count() version, because I think it is more explicit. In other dialects of SQL, you can sometimes shorten it to:
sum(cancelled = 'false')
which, once you get used to it, makes a lot of sense.
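Both conditional forms can be compared on the sample rows from the question. This is a minimal sketch using Python's standard-library sqlite3 module instead of SQL Server; the table name members is an assumption standing in for the asker's [QUERY], and cancelled is stored as the strings 'true' and 'false' to match the comparisons above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for the asker's [QUERY].
    CREATE TABLE members (team INTEGER, member TEXT, cancelled TEXT, rate INTEGER);
    INSERT INTO members VALUES
        (1, 'John',   'false', 150), (1, 'Bill',  'true',  10),
        (2, 'Sarah',  'false', 145), (2, 'James', 'false', 110),
        (2, 'Ashley', 'true',  0);
""")

rows = conn.execute("""
    SELECT team,
           SUM(CASE WHEN cancelled = 'false' THEN 1 ELSE 0 END) AS count_via_sum,
           -- No ELSE branch: non-matching rows become NULL and are not counted.
           COUNT(CASE WHEN cancelled = 'false' THEN 1 END) AS count_via_count,
           SUM(rate) AS sum_of_rate   -- all rows, cancelled or not
    FROM members
    GROUP BY team
    ORDER BY team
""").fetchall()

print(rows)  # [(1, 1, 1, 160), (2, 2, 2, 255)]
```

The two counting columns agree, and the rate total still includes cancelled members because it is not wrapped in a condition.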

My aggregate is not affected by ROLLUP

I have a query similar to the following:
SELECT CASE WHEN (GROUPING(Name) = 1) THEN 'All' ELSE Name END AS Name,
CASE WHEN (GROUPING(Type) = 1) THEN 'All' ELSE Type END AS Type,
sum(quantity) AS [Quantity],
CAST(sum(quantity) * (SELECT QuantityMultiplier FROM QuantityMultipliers WHERE a = t.b) AS DECIMAL(18,2)) AS [Multiplied Quantity]
FROM #Table t
GROUP BY Name, Type WITH ROLLUP
I'm trying to return a list of Names, Types, a summed Quantity and a summed quantity multiplied by an arbitrary number. All fine so far. I also need to return a sub-total row per Name and per Type, such as the following
Name Type Quantity Multiplied Quantity
------- --------- ----------- -------------------
a 1 2 4
a 2 3 3
a ALL 5 7
b 1 6 12
b 2 1 1
b ALL 7 13
ALL ALL 24 40
The first 3 columns are fine. I'm getting null values in the rollup rows for the multiplied quantity though. The only reason I can think this is happening is because SQL doesn't recognize the last column as an aggregate now that I've multiplied it by something.
Can I somehow work around this without things getting too convoluted?
I will be falling back onto temporary tables if this can't be done.
In your sub-query to acquire the multiplier, you have WHERE a=b. Are either a or b from the tables in your main query?
If these values are static (nothing to do with the main query), it looks like it should be fine...
If the a or b values are the name or type field, they can be NULL for the rollup records. If so, you can change to something similar to...
CAST(sum(quantity * (<multiplier_query>)) AS DECIMAL(18,2))
If a or b are other fields from your main query, you'd be getting multiple records back, not just a single multiplier. You could change to something like...
CAST(sum(quantity) * (SELECT MAX(multiplier) FROM ...) AS DECIMAL(18,2))