Adding values with a condition in Google BigQuery

I need to add some values with a condition in Google BigQuery.
NOTICE: I edited the original question because it was not accurate enough.
Thanks to the two participants who tried to help me.
I tried to apply the solutions you kindly suggested, but the result was the same as the pct column.
Something like this:
[image: results]
Here is a more detailed definition:
Table columns:
shop: shop location
brand: brand of car sold at the shop location
sales: sales of each brand at each shop location
rank: rank of each brand within its shop location (the biggest seller ranks first)
total_sales_shop: SUM of all brand sales per shop location
pct: the brand's sales as a percentage of total_sales_shop
pct_acc: the cumulative sum of pct within each shop, ordered by rank (it does not depend on brand). This is the column I need to calculate.
[image: PCT_ACC]
I need to get something like PCT_ACC and then save the results in another table like this:
[image: end table]

You can use the following query to get the required data:
select values, rank,
sum(case when rank < 2 then values else 0 end) as condition1
from table
group by values, rank
Add or remove columns in the select and group by as your requirements dictate.
To get the cumulative sum, you can use the following query:
select shop, brand, sales, rank, total_sales_shop, pct ,
sum(pct) over (partition by shop order by rank) as pct_act
from data
To get the final table, you can combine CASE expressions with GROUP BY (conditional aggregation), where cumulative_sum is the result of the previous query, e.g.:
select shop,
max(case when rank=1 then pct_act end) as rank_1,
max(case when rank=2 then pct_act end) as rank_2,
max(case when rank=3 then pct_act end) as rank_3,
max(case when rank=4 then pct_act end) as rank_4,
max(case when rank=5 then pct_act end) as rank_5
from cumulative_sum
group by shop
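The window and pivot steps can be checked outside BigQuery. The sketch below uses Python's built-in sqlite3 module (it assumes SQLite 3.25 or later for window functions) with made-up sample rows; the table name data and the pct_act/rank_N names follow the answer, while the shares CTE is a hypothetical helper that derives pct. The frame is spelled out with ROWS because, with ORDER BY, the default frame is RANGE, which would add brands with tied ranks together.

```python
import sqlite3

# Made-up sample rows: (shop, brand, sales, rank), where rank 1 is the top seller.
ROWS = [
    ("A", "Ford", 50, 1),
    ("A", "Fiat", 30, 2),
    ("A", "Kia", 20, 3),
    ("B", "Fiat", 60, 1),
    ("B", "Ford", 40, 2),
]

QUERY = """
WITH shares AS (
    -- pct: the brand's share of its shop's total sales, in percent
    SELECT shop, rank,
           100.0 * sales / SUM(sales) OVER (PARTITION BY shop) AS pct
    FROM data
),
cumulative_sum AS (
    -- running total of pct inside each shop, in rank order
    SELECT shop, rank,
           SUM(pct) OVER (PARTITION BY shop ORDER BY rank
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pct_act
    FROM shares
)
-- pivot: one row per shop, one column per rank
SELECT shop,
       MAX(CASE WHEN rank = 1 THEN pct_act END) AS rank_1,
       MAX(CASE WHEN rank = 2 THEN pct_act END) AS rank_2,
       MAX(CASE WHEN rank = 3 THEN pct_act END) AS rank_3
FROM cumulative_sum
GROUP BY shop
ORDER BY shop
"""


def cumulative_pct_pivot(rows):
    """Load rows into an in-memory table named data and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE data (shop TEXT, brand TEXT, sales INTEGER, rank INTEGER)")
        con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(cumulative_pct_pivot(ROWS))
```

With these rows, shop A accumulates to 50, 80 and 100, while shop B gets 60, 100 and NULL because it has no third brand.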

If you want only the final sum for the rows with that condition, you can try:
SELECT
SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
If you want the rank and the sum only for the rows with that condition, you can try:
SELECT
Rank,
SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
WHERE
Rank < 2
GROUP BY
Rank
Finally, if you want the rank and the sum across all rows, you can try:
SELECT
Rank,
SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
GROUP BY
Rank
I hope it helps.
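The three queries differ only in how they filter and group. A small sketch with Python's sqlite3 module shows what each returns; SQLite has no BigQuery-style IF(), so the equivalent CASE expression stands in for it, and the table t and its rows are made up. The column name values is double-quoted because VALUES is a keyword in SQLite.

```python
import sqlite3

# Made-up rows: (rank, values).
ROWS = [(1, 10), (1, 5), (2, 7), (3, 4)]

# CASE WHEN ... plays the role of BigQuery's IF(Rank < 2, Values, 0).
COND = 'SUM(CASE WHEN rank < 2 THEN "values" ELSE 0 END)'


def conditional_sums(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE t (rank INTEGER, "values" INTEGER)')
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        # 1) one overall sum for the rows that meet the condition
        total = con.execute(f"SELECT {COND} FROM t").fetchone()[0]
        # 2) rank and sum, only for rows that meet the condition
        filtered = con.execute(
            f"SELECT rank, {COND} FROM t WHERE rank < 2 GROUP BY rank ORDER BY rank"
        ).fetchall()
        # 3) rank and sum over all rows; ranks failing the condition sum to 0
        per_rank = con.execute(
            f"SELECT rank, {COND} FROM t GROUP BY rank ORDER BY rank"
        ).fetchall()
        return total, filtered, per_rank
    finally:
        con.close()


print(conditional_sums(ROWS))
```

The only difference between variants 2 and 3 is whether ranks that fail the condition appear with a 0 or are filtered out by the WHERE clause.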

Related

T-SQL query to find the required output

I am new to SQL queries. I have some data and I am trying to get the result shown below.
In my sample data, the customer ID repeats multiple times because a customer can have multiple locations. I want to create a query that gives the output shown in the output image:
If the customer exists only once, take that row.
If the customer exists more than once, check the country: if Country = 'US', take that row and discard the others.
If the customer exists more than once and none of the countries is US, pick the first row.
PLEASE NOTE: I have 35 columns and I don't want to change the row order, because I have to select the first row when a customer exists more than once and the country is not 'US'.
What I have tried: I tried to do this with the RANK function but was unsuccessful. I'm not sure my approach is right. Could anyone please share a T-SQL query for this problem?
Regards,
Rahul
Sample data:
Output required:
I have created a short dbfiddle.
Short explanation (repeating the code here on SO):
Step1:
-- select everything, with the 'US' row first for each customer
SELECT
cust_id,
country,
sales,
CASE WHEN country='US' THEN 0 ELSE 1 END X,
ROW_NUMBER() OVER (PARTITION BY cust_id
ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END;
Step2:
-- keep only the first row per customer
SELECT *
FROM (
SELECT
cust_id,
country,
sales,
CASE WHEN country='US' THEN 0 ELSE 1 END X,
ROW_NUMBER() OVER (PARTITION BY cust_id
ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
-- ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END
) x
WHERE x.R=1
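One caveat: ROW_NUMBER() ordered only by the US flag leaves the order among non-US rows unspecified, and SQL tables have no inherent row order, so "the first row" is not guaranteed. The sketch below, using Python's sqlite3 module, assumes a hypothetical row_id column that records the original order and adds it as a tiebreaker; the sample rows are made up.

```python
import sqlite3

# Made-up rows: (row_id, cust_id, country, sales).
# row_id is a hypothetical column recording the original order, since SQL tables
# have no inherent row order.
ROWS = [
    (1, 100, "CA", 10),
    (2, 100, "US", 20),
    (3, 200, "UK", 30),
    (4, 200, "FR", 40),
    (5, 300, "DE", 50),
]

QUERY = """
SELECT cust_id, country, sales
FROM (
    SELECT cust_id, country, sales,
           ROW_NUMBER() OVER (
               PARTITION BY cust_id
               -- US rows first, then the earliest row by row_id
               ORDER BY CASE WHEN country = 'US' THEN 0 ELSE 1 END, row_id
           ) AS r
    FROM table1
) x
WHERE x.r = 1
ORDER BY cust_id
"""


def pick_one_row_per_customer(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE table1 (row_id INTEGER, cust_id INTEGER, country TEXT, sales INTEGER)"
        )
        con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(pick_one_row_per_customer(ROWS))
```

Customer 100 keeps its US row, while customer 200 keeps UK because it has the smaller row_id. In T-SQL the same idea applies: append the ordering column after the CASE in the ROW_NUMBER() ORDER BY.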
I can't vouch for performance, but it should work on SQL Server 2005. Assuming your table is named CustomerData, try this:
select cust_id, country, Name, Sales, [Group]
from CustomerData
where country = 'US'
union
select c.* from CustomerData c
join (
select cust_id, min(country) country
from CustomerData
where cust_id not in (
select cust_id
from CustomerData
where country = 'US'
)
group by cust_id
) a on a.cust_id = c.cust_id and a.country = c.country
It works by selecting every record whose country is US and then unioning that with the record for the alphabetically first country (MIN) of every customer who has no US record. If MIN() doesn't pick the country you want, you'll need a different aggregation that selects the country you want.

SQL calculation of percentage with group by with condition and sum of distinct items

I am working with the table below:
My goal is to use SQL to calculate the number of distinct IDs per zip code and the percentage of those IDs that are fraudulent, grouped by zip code. Important note: the same ID can occur several times, sometimes flagged as fraud and sometimes not. If an ID is a fraud at least once, it counts as fraud. Only IDs that are always "True" are counted as non-fraudulent.
So the desired output should look like this:
What's the most efficient way to create my query?
Use the following query. This is pseudo-SQL, but I think you get the point.
We group by zip_code, count distinct IDs with COUNT(DISTINCT id), and compute the percentage as the share of rows in the group that have fraud = 'true'. Note that this share is computed per row, not per distinct ID.
SELECT
zip_code,
COUNT(DISTINCT id) AS number_distinct,
SUM(CASE WHEN fraud = 'true' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS percentage
FROM
table
GROUP BY
zip_code
Use two levels of aggregation:
select zip_code, count(*) as num_ids,
avg(case when fraud = 'true' then 1.0 else 0 end) as fraud_ratio
from (select zip_code, id, count(*) as cnt,
max(fraud) as fraud
from t
group by zip_code, id
) t
group by zip_code;
Note: This uses the fact that, as strings, 'false' < 'true', so MAX(fraud) is 'true' if the ID was flagged as fraud at least once.
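A quick check of the two-level aggregation with Python's sqlite3 module, assuming fraud holds the strings 'true'/'false' with 'true' meaning fraud; the table t and its rows are made up. As strings, 'false' sorts before 'true', so collapsing each (zip_code, id) pair with MAX (not MIN) returns 'true' whenever the ID was flagged at least once.

```python
import sqlite3

# Made-up rows: (zip_code, id, fraud). ID 1 is flagged once; IDs 2 and 3 never are.
ROWS = [
    ("1000", 1, "false"),
    ("1000", 1, "true"),
    ("1000", 2, "false"),
    ("1000", 2, "false"),
    ("2000", 3, "false"),
]

QUERY = """
SELECT zip_code,
       COUNT(*) AS num_ids,
       AVG(CASE WHEN fraud = 'true' THEN 1.0 ELSE 0 END) AS fraud_ratio
FROM (
    -- one row per ID; MAX is 'true' if the ID was ever flagged
    SELECT zip_code, id, MAX(fraud) AS fraud
    FROM t
    GROUP BY zip_code, id
) AS per_id
GROUP BY zip_code
ORDER BY zip_code
"""


def fraud_by_zip(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (zip_code TEXT, id INTEGER, fraud TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(fraud_by_zip(ROWS))
```

Multiply fraud_ratio by 100 if you want a percentage.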

PL/SQL Group By question adding extra columns dependent on row numbers

I'm struggling with a GROUP BY. I have a query that pulls two rows of data for some stock that has been counted. The rows it returns look like this:
However, I need this displayed on one row, like below:
This example has only two counts, but other examples could have up to four rows, which would need Count 3 and Count 4 columns. The count difference needs to be the last count quantity minus the first row's original quantity. There is a dstamp field that can be used to identify when each count happened.
The SQL I'm currently using to pull this data is below:
Select bin, sku, original_qty, (original_qty + count_qty) countQty, count_difference, quantity, counter
FROM stock_counts
order by bin, dstamp DESC
You are not even returning dstamp in the results. But if you want to pivot, you can use conditional aggregation. It is not really clear what all the columns mean. But you can readily pivot the quantities by time using:
select bin, sku,
max(case when seqnum = 1 then countQty end) as original_qty,
max(case when seqnum = 2 then countQty end) as qty1,
max(case when seqnum = 3 then countQty end) as qty2,
max(case when seqnum = 4 then countQty end) as qty3
from (select sc.*,
sc.original_qty + sc.count_qty as countQty,
row_number() over (partition by sku, bin order by dstamp) as seqnum
from stock_counts sc
) sc
group by sku, bin;
Of course, you need to have enough columns to cover the number of quantities you are concerned about.
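The answer leaves out the count difference. The sketch below, using Python's sqlite3 module, adds it under stated assumptions: countQty is original_qty + count_qty as in the question's query, "last count quantity" means the countQty of the latest dstamp, and the sample rows are made up. A second ROW_NUMBER() in descending dstamp order marks the last count.

```python
import sqlite3

# Made-up rows: (bin, sku, original_qty, count_qty, dstamp).
ROWS = [
    ("B1", "S1", 10, -2, "2024-01-01 10:00"),
    ("B1", "S1", 8, 1, "2024-01-01 12:00"),
    ("B2", "S2", 5, 0, "2024-01-02 09:00"),
]

QUERY = """
SELECT bin, sku,
       MAX(CASE WHEN seqnum = 1 THEN original_qty END) AS first_original_qty,
       MAX(CASE WHEN seqnum = 1 THEN countQty END) AS count1,
       MAX(CASE WHEN seqnum = 2 THEN countQty END) AS count2,
       MAX(CASE WHEN seqnum = 3 THEN countQty END) AS count3,
       MAX(CASE WHEN seqnum = 4 THEN countQty END) AS count4,
       -- last count quantity minus the first row's original quantity
       MAX(CASE WHEN revnum = 1 THEN countQty END)
         - MAX(CASE WHEN seqnum = 1 THEN original_qty END) AS count_difference
FROM (
    SELECT sc.*,
           sc.original_qty + sc.count_qty AS countQty,
           ROW_NUMBER() OVER (PARTITION BY sku, bin ORDER BY dstamp) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY sku, bin ORDER BY dstamp DESC) AS revnum
    FROM stock_counts sc
) sc
GROUP BY bin, sku
ORDER BY bin, sku
"""


def pivot_counts(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE stock_counts (bin TEXT, sku TEXT, original_qty INTEGER,"
            " count_qty INTEGER, dstamp TEXT)"
        )
        con.executemany("INSERT INTO stock_counts VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(pivot_counts(ROWS))
```

Because the last count is picked with its own descending row number, the difference works whether a bin/sku pair has one count or four.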

Use where clause in count function

I'm looking to create a table with two columns. One column contains the amount paid each month, while the other contains the number of customers who ordered that month and paid that month.
select sum(paid), count(distinct customer where Order_Month = Paid_Month)
from DataTable
group by Paid_Month
Is there an easy way of doing so?
Use a case expression:
select sum(paid),
count(distinct case when Order_Month = Paid_Month then customer end)
from DataTable
group by Paid_Month;
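A sketch with Python's sqlite3 module shows why this works: the CASE yields NULL for rows where the months differ, and COUNT(DISTINCT ...) ignores NULLs. The rows are made up, and Paid_Month is added to the output only to make it readable.

```python
import sqlite3

# Made-up rows: (customer, paid, Order_Month, Paid_Month).
ROWS = [
    ("c1", 100, 1, 1),
    ("c2", 50, 1, 2),  # ordered in month 1, paid in month 2: not counted
    ("c1", 20, 2, 2),
    ("c3", 30, 2, 2),
    ("c3", 10, 2, 2),  # same customer again: DISTINCT counts c3 once
]

QUERY = """
SELECT Paid_Month,
       SUM(paid),
       COUNT(DISTINCT CASE WHEN Order_Month = Paid_Month THEN customer END)
FROM DataTable
GROUP BY Paid_Month
ORDER BY Paid_Month
"""


def monthly_summary(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE DataTable (customer TEXT, paid INTEGER,"
            " Order_Month INTEGER, Paid_Month INTEGER)"
        )
        con.executemany("INSERT INTO DataTable VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(monthly_summary(ROWS))
```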

Sum values from one column if Index column is distinct?

How do I sum values from one column when the index column is distinct?
Initially, I had this SQL query:
SELECT COALESCE(SUM(ISNULL(cast(Quantity as int),0)),0) AS QuantitySum FROM Records
I also tried this, but it is incorrect when some Quantity values happen to be the same:
SELECT COALESCE(SUM(DISTINCT ISNULL(cast(Quantity as int),0)),0) AS QuantitySum FROM Records
How can I fix this query so that it sums the quantity only once per distinct Index value?
Example table:
Index Quantity
AN121 40
AN121 40
BN222 120
BN111 20
BN2333 40
So I want it to return 220 (40 + 120 + 20 + 40).
I have duplicate IDs, and the same quantity can appear in different records.
Do you mean that you only want to sum one value of quantity for each individual value of the index column?
select sum(case when seqnum = 1 then cast(Quantity as int) end) as QuantitySum
from (select r.*,
row_number() over (partition by [Index] order by newid()) as seqnum
from Records r
) r;
Or, do you mean that you only want to sum values of quantity when there is exactly one row with a given index value:
select sum(case when cnt = 1 then cast(Quantity as int) end) as QuantitySum
from (select r.*,
count(*) over (partition by [Index]) as cnt
from Records r
) r;
Both of these use window functions to restrict the values being processed.
Also, a column called quantity should be stored as a numeric type, so conversion isn't needed to take the sum.
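Window functions cannot be nested inside an aggregate such as SUM, so the sketch below computes them in a subquery first and then sums. It uses Python's sqlite3 module on the question's example table; SQLite has no newid(), so rowid stands in as the arbitrary ordering, and Index is double-quoted because INDEX is a keyword there.

```python
import sqlite3

# The question's example table; Quantity is stored as text to mirror the casts.
ROWS = [("AN121", "40"), ("AN121", "40"), ("BN222", "120"), ("BN111", "20"), ("BN2333", "40")]

# Sum one quantity per distinct Index value.
ONE_PER_INDEX = """
SELECT SUM(CASE WHEN seqnum = 1 THEN CAST(Quantity AS INTEGER) END)
FROM (SELECT Quantity,
             ROW_NUMBER() OVER (PARTITION BY "Index" ORDER BY rowid) AS seqnum
      FROM Records) AS r
"""

# Sum only the Index values that occur exactly once.
UNIQUE_INDEX_ONLY = """
SELECT SUM(CASE WHEN cnt = 1 THEN CAST(Quantity AS INTEGER) END)
FROM (SELECT Quantity,
             COUNT(*) OVER (PARTITION BY "Index") AS cnt
      FROM Records) AS r
"""


def run_both(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE Records ("Index" TEXT, Quantity TEXT)')
        con.executemany("INSERT INTO Records VALUES (?, ?)", rows)
        return (con.execute(ONE_PER_INDEX).fetchone()[0],
                con.execute(UNIQUE_INDEX_ONLY).fetchone()[0])
    finally:
        con.close()


print(run_both(ROWS))
```

The first interpretation returns the 220 the question asks for; the second returns 180, because AN121 is dropped entirely.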
You can try something like:
SELECT DISTINCT COL1
, SUM(COL2)
FROM MYTABLE
GROUP BY COL1
You can use this if you have duplicated IDs and quantities:
SELECT COALESCE(SUM(ISNULL(CAST(Quantity AS int), 0)), 0) AS QuantitySum
FROM (SELECT [Index], MIN(Quantity) AS Quantity FROM Records GROUP BY [Index]) AS r
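The grouped subquery is the key step, and the outer sum should not use DISTINCT: after grouping, AN121 and BN2333 both contribute 40, and SUM(DISTINCT ...) would count that value only once. A sketch with Python's sqlite3 module compares the two on the question's example table (Index is double-quoted because it is an SQLite keyword, and SQLite's IFNULL stands in for T-SQL's ISNULL).

```python
import sqlite3

ROWS = [("AN121", 40), ("AN121", 40), ("BN222", 120), ("BN111", 20), ("BN2333", 40)]

# Collapse duplicates to one quantity per Index value, then sum.
PER_INDEX = """
SELECT COALESCE(SUM(IFNULL(Quantity, 0)), 0)
FROM (SELECT "Index", MIN(Quantity) AS Quantity FROM Records GROUP BY "Index") AS r
"""

# Same subquery, but DISTINCT also merges equal quantities of different Index values.
PER_INDEX_DISTINCT = """
SELECT COALESCE(SUM(DISTINCT IFNULL(Quantity, 0)), 0)
FROM (SELECT "Index", MIN(Quantity) AS Quantity FROM Records GROUP BY "Index") AS r
"""


def compare(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE Records ("Index" TEXT, Quantity INTEGER)')
        con.executemany("INSERT INTO Records VALUES (?, ?)", rows)
        return (con.execute(PER_INDEX).fetchone()[0],
                con.execute(PER_INDEX_DISTINCT).fetchone()[0])
    finally:
        con.close()


print(compare(ROWS))
```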