How to GROUP BY a column/s in data studio - sql

I have a sales table with the purchase history of multiple customers, so a single customer can appear multiple times in the table. I need to group by customer, count the industries each customer works in, and visualize the result in a table in Data Studio. All of this has to be done in Data Studio itself.
In BigQuery the query would look something like this:
SELECT Industry, count(industry) AS industry_count
FROM (SELECT
CustomerID,
Industry
FROM `project1.pr.df_full`
WHERE segment = 'Lost'
GROUP BY CustomerID, Industry)
GROUP BY Industry
ORDER BY industry_count DESC
How can I achieve the same thing in Data Studio? The WHERE clause doesn't have to be there, because I already have a segment filter on the page where I'm building this.

As I said in the comment, I reproduced your query and it worked fine.
Here you can see a guide about how to connect BigQuery to Data Studio.
Please note that Data Studio has some limitations in the query syntax.
If you need further information, please let me know.

You could query the raw data and do the calculations on the Data Studio side. Be sure to use the field you need to group by as a dimension.
SELECT
CustomerID,
Industry,
segment
FROM `project1.pr.df_full`
Then, in a Data Studio table, use the "Industry" field as the dimension and the "CustomerID" field as the metric, with Count as the aggregation for the metric (or Count Distinct, so that a customer with several purchases is counted only once per industry, as in your query). Since the data source also has a "segment" field, filtering by it will not be a problem.
I hope this helps!
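To check which aggregation reproduces the nested query from the question, here is a minimal sketch that runs both formulations against a few made-up rows. It uses Python's built-in sqlite3 rather than BigQuery or Data Studio, and the table contents are purely hypothetical; the point is only to compare a distinct count of CustomerID per Industry with a plain row count.

```python
import sqlite3

# Hypothetical sample rows standing in for project1.pr.df_full.
rows = [
    (1, "Retail", "Lost"),
    (1, "Retail", "Lost"),    # repeat purchase by the same customer
    (2, "Retail", "Lost"),
    (3, "Finance", "Lost"),
    (3, "Finance", "Lost"),   # repeat purchase by the same customer
    (4, "Finance", "Active"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df_full (CustomerID INTEGER, Industry TEXT, segment TEXT)")
conn.executemany("INSERT INTO df_full VALUES (?, ?, ?)", rows)

# The nested query from the question (table name shortened for SQLite).
nested = conn.execute("""
    SELECT Industry, COUNT(Industry) AS industry_count
    FROM (SELECT CustomerID, Industry
          FROM df_full
          WHERE segment = 'Lost'
          GROUP BY CustomerID, Industry)
    GROUP BY Industry
    ORDER BY industry_count DESC, Industry
""").fetchall()

# Industry as dimension, distinct count of CustomerID as metric.
distinct_count = conn.execute("""
    SELECT Industry, COUNT(DISTINCT CustomerID) AS industry_count
    FROM df_full
    WHERE segment = 'Lost'
    GROUP BY Industry
    ORDER BY industry_count DESC, Industry
""").fetchall()

# Industry as dimension, plain count of CustomerID as metric.
plain_count = conn.execute("""
    SELECT Industry, COUNT(CustomerID) AS industry_count
    FROM df_full
    WHERE segment = 'Lost'
    GROUP BY Industry
    ORDER BY industry_count DESC, Industry
""").fetchall()

print("nested:        ", nested)
print("count distinct:", distinct_count)
print("plain count:   ", plain_count)
```

On this sample the distinct count matches the nested query, while the plain count is inflated by repeat purchases.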

I'm wondering why you don't write the query as:
SELECT CustomerID, COUNT(DISTINCT Industry) as industry_count
FROM `project1.pr.df_full`
WHERE segment = 'Lost'
GROUP BY CustomerID
ORDER BY industry_count DESC;

Related

Query GROUP BY and COUNT

I'm new to SQL and taking Coursera's "SQL for Data Science" course. I have the following question in a summary assignment:
Show the number of orders placed by each customer and sort the result by the number of orders in descending order.
I failed to write the correct code myself. The given answer is as follows (one of several options, of course):
SELECT *
,COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC
I am still having trouble understanding the query logic. I would appreciate your assistance in understanding this query.
I seriously hope that Coursera isn't giving you the query you cited above as the recommended answer. It won't run on most databases, and even in cases such as MySQL where it might run, it is not completely correct. You should be using this version:
SELECT CustomerId, COUNT(InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC;
A basic rule of GROUP BY is that the only columns available for selection are those which appear in the GROUP BY clause. In addition to these columns, aggregates of any column(s) may also appear in the select. The version I gave you above follows these rules and is ANSI compliant, so it should run on virtually any database.
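To see the grouping logic in action, here is a small sketch using Python's built-in sqlite3. The Invoices rows and the Total column are made up for illustration; only CustomerId and InvoiceId come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL)"
)
# Hypothetical invoices: customer 10 has three, customer 12 has two, customer 11 has one.
conn.executemany(
    "INSERT INTO Invoices (InvoiceId, CustomerId, Total) VALUES (?, ?, ?)",
    [(1, 10, 5.0), (2, 10, 7.5), (3, 11, 1.0), (4, 10, 2.0), (5, 12, 9.0), (6, 12, 3.0)],
)

# One output row per CustomerId; COUNT runs over the invoices inside each group.
# CustomerId is added to ORDER BY only to make ties deterministic.
orders_per_customer = conn.execute("""
    SELECT CustomerId, COUNT(InvoiceId) AS number_of_orders
    FROM Invoices
    GROUP BY CustomerId
    ORDER BY number_of_orders DESC, CustomerId
""").fetchall()

for customer_id, number_of_orders in orders_per_customer:
    print(customer_id, number_of_orders)
```

Every selected column is either the grouping column or an aggregate, which is exactly the rule described above.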
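SELECT * means ALL COLUMNS, but you are grouping only by CustomerId, which is not valid in standard SQL.
Add the other columns you want to show to the GROUP BY clause. Keep in mind that every extra column in GROUP BY makes the groups finer, so the count is then per combination of those columns.
The script should be something like: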
SELECT CustomerName, DateEntered
,COUNT(InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId, CustomerName, DateEntered
ORDER BY number_of_orders DESC
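One thing worth checking before adding columns to GROUP BY is how they change the counts. The sketch below (Python's sqlite3, with hypothetical names and dates) compares grouping by customer only with grouping by customer and DateEntered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Invoices (
        InvoiceId INTEGER PRIMARY KEY,
        CustomerId INTEGER,
        CustomerName TEXT,
        DateEntered TEXT
    )
""")
# Hypothetical data: Ann has three invoices spread over two dates, Bob has one.
conn.executemany(
    "INSERT INTO Invoices VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Ann", "2020-01-01"),
        (2, 1, "Ann", "2020-01-01"),
        (3, 1, "Ann", "2020-01-02"),
        (4, 2, "Bob", "2020-01-01"),
    ],
)

# Grouping by customer only: one row per customer.
per_customer = conn.execute("""
    SELECT CustomerName, COUNT(InvoiceId) AS number_of_orders
    FROM Invoices
    GROUP BY CustomerId, CustomerName
    ORDER BY number_of_orders DESC, CustomerName
""").fetchall()

# Adding DateEntered: one row per customer and date, so the counts are split.
per_customer_and_date = conn.execute("""
    SELECT CustomerName, DateEntered, COUNT(InvoiceId) AS number_of_orders
    FROM Invoices
    GROUP BY CustomerId, CustomerName, DateEntered
    ORDER BY number_of_orders DESC, CustomerName, DateEntered
""").fetchall()

print(per_customer)
print(per_customer_and_date)
```

If the goal is the total number of orders per customer, only columns that are constant per customer (such as the name) can be added safely.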

Repeat a query over the parameter list

I would like to iterate the same query while using different parameter values from a predefined list.
Say I have a table with two columns. The first column contains the customer name. The second column contains the customer's spending.
###CUSTOMER; SPENDING###
customer1; 1000
customer2; 111
customer3; 100
customer1; 323
...
I know the complete list of customers: customerlist = {customer1, customer2, customer3}.
I would like to do something like:
Select sum(spending)
from mytable
where customer = #customerlist
The query should compute the sum of spending for each customer defined in the customer list. I have found some examples of SQL procedures with stored parameters, but none for a single parameter holding multiple values.
Thank you
P.S. This is just a hypothetical example to illustrate my question (I know it would be much more effective to use a simple GROUP BY here).
You can use a nested query like this:
SELECT CustomerList.CustomerName AS Cust,
ISNULL((SELECT SUM(Spending)
FROM Customer
WHERE Customer.CustomerName = CustomerList.CustomerName), 0) AS CustSpending
FROM CustomerList
This would normally be done using GROUP BY:
Select customer, sum(spending)
from mytable
group by customer;
GROUP BY is a very fundamental part of SQL. You should review your knowledge of SQL so you understand how to use it.
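For the literal question (running the same query once per value in a list), here is a rough sketch in Python's sqlite3 that shows both the loop and the single GROUP BY query restricted to the list. The table contents follow the example in the question, plus one hypothetical customer4 row that is outside the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (customer TEXT, spending INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("customer1", 1000),
        ("customer2", 111),
        ("customer3", 100),
        ("customer1", 323),
        ("customer4", 50),   # hypothetical row, not in the list below
    ],
)

customerlist = ["customer1", "customer2", "customer3"]

# Option 1: repeat the same parameterized query once per list entry.
per_customer = {}
for name in customerlist:
    (total,) = conn.execute(
        "SELECT SUM(spending) FROM mytable WHERE customer = ?", (name,)
    ).fetchone()
    per_customer[name] = total

# Option 2: a single query, with one placeholder per list entry and a GROUP BY.
placeholders = ", ".join("?" for _ in customerlist)
grouped = dict(
    conn.execute(
        f"SELECT customer, SUM(spending) FROM mytable "
        f"WHERE customer IN ({placeholders}) GROUP BY customer",
        customerlist,
    ).fetchall()
)

print(per_customer)
print(grouped)
```

The two differ for a listed customer with no rows: the loop stores None for that name, while the GROUP BY version simply has no entry for it.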

Total Count of each subtype and assign to every record again

I am a novice at SQL and am learning count(). Here is what I am trying to do. There is a table in which Products and Product Types are listed. I want an output with a separate column that gives the total count of each product_type and assigns it to every record. Can anyone help me write this query? I searched the forums but couldn't find a similar requirement. Please find attached an image with an example of the source table and the target table.
Thank you,
DP
Case for SQL Query
Use windowed COUNT:
SELECT Product,
Product_Type,
COUNT(*) OVER(PARTITION BY Product_Type) AS "Count"
FROM your_table
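As a quick illustration of the windowed COUNT, here is a sketch using Python's sqlite3. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and the products table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (Product TEXT, Product_Type TEXT)")
# Hypothetical rows: three fruits and two vegetables.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Apple", "Fruit"),
        ("Banana", "Fruit"),
        ("Carrot", "Vegetable"),
        ("Cherry", "Fruit"),
        ("Leek", "Vegetable"),
    ],
)

# Unlike GROUP BY, the window function keeps every row and attaches the
# size of its Product_Type partition to it.
with_counts = conn.execute("""
    SELECT Product,
           Product_Type,
           COUNT(*) OVER (PARTITION BY Product_Type) AS "Count"
    FROM products
    ORDER BY Product
""").fetchall()

for row in with_counts:
    print(row)
```

Each product keeps its own row, and all rows of the same type carry the same count.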

Using SQL to find total amount donated to organizations

I have a SQL table with 2 columns,
organization_id,
amount_donated
I want to return a new table with 2 columns,
organization_id
total_amount_donated
How would I go about finding the total amount donated to each organization? I am thinking I have to use GROUP BY but I haven't been able to find a solution.
You're right, you do want to use GROUP BY. This is because you need to use the SUM aggregate function, and by using GROUP BY organization_id you will sum the amounts corresponding to each organization.
SELECT organization_id, SUM(amount_donated) AS total_amount_donated
FROM your_table
GROUP BY organization_id
This is pretty simple SQL, you should probably get your hands on a tutorial or a book to step you through the basics. :)
You shouldn't create a new table, because it would duplicate data. Create a VIEW instead.
CREATE VIEW vw_total_donation
AS
SELECT organization_id, SUM(amount_donated) tot_donation
FROM table1
GROUP BY organization_id
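A small sketch of the view approach, using Python's sqlite3 with made-up donation rows. It is meant to show that the view stores no data of its own: totals are recomputed from table1 each time the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (organization_id INTEGER, amount_donated REAL)")
# Hypothetical donations.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (1, 2.5), (3, 1.0), (2, 5.0)],
)

conn.execute("""
    CREATE VIEW vw_total_donation AS
    SELECT organization_id, SUM(amount_donated) AS tot_donation
    FROM table1
    GROUP BY organization_id
""")

query = "SELECT organization_id, tot_donation FROM vw_total_donation ORDER BY organization_id"
totals = conn.execute(query).fetchall()

# A new donation shows up in the view without any refresh step.
conn.execute("INSERT INTO table1 VALUES (3, 4.0)")
totals_after = conn.execute(query).fetchall()

print(totals)
print(totals_after)
```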
You can query like this:
select organization_id, sum(amount_donated) as total_amt_donated from your_table
group by organization_id

Query Tuning with Different Disaggregations

Is there any way I can reduce the number of queries and implement the different disaggregations below in a single query? The table accessed is huge and contains millions of records, so I am trying to optimize the query so that I do not have to access the same table multiple times!
It looks like you want rollup or grouping sets. I think this may come close to what you are looking for:
SELECT 'Chicago' AS Region, District, SchoolName AS School, Category ,
COUNT(DISTINCT ssid) AS Total ,
SUM(DirectEnroll) AS Met
FROM final.NSC_Analysis
WHERE GradYear = 2013
GROUP BY Category, Schoolname, District WITH rollup;
Unfortunately, this doesn't quite work for the count(distinct) in SQL Server prior to 2008.
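To make clear which rows a rollup over Category, SchoolName, District adds, here is a plain Python sketch (not SQL Server's implementation, and SQLite has no ROLLUP to test against). It reads hypothetical rows once and aggregates them at every prefix of the grouping list, using None where SQL would show NULL for a rolled-up column.

```python
from collections import defaultdict

# Hypothetical rows: (ssid, Category, SchoolName, District, DirectEnroll)
rows = [
    (1, "A", "North High", "D1", 1),
    (2, "A", "North High", "D1", 0),
    (2, "A", "North High", "D1", 1),   # same student listed twice
    (3, "A", "South High", "D2", 1),
    (4, "B", "South High", "D2", 0),
]

students = defaultdict(set)   # grouping key -> distinct ssid values
met = defaultdict(int)        # grouping key -> SUM(DirectEnroll)

for ssid, category, school, district, direct_enroll in rows:
    full_key = (category, school, district)
    # A rollup aggregates over every prefix of the GROUP BY list,
    # from the full key down to the empty key (the grand total).
    for depth in range(len(full_key) + 1):
        key = full_key[:depth] + (None,) * (len(full_key) - depth)
        students[key].add(ssid)
        met[key] += direct_enroll

# key -> (COUNT(DISTINCT ssid), SUM(DirectEnroll))
rollup = {key: (len(students[key]), met[key]) for key in students}

for key, (total, met_sum) in rollup.items():
    print(key, total, met_sum)
```

Note that the distinct counts at the coarser levels are computed from the underlying ssid values rather than by adding up the finer-level counts, which is why COUNT(DISTINCT) needs care when combined with rollup.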