Query Tuning with Different Disaggregations - sql

Is there any way to reduce the number of queries and implement the different disaggregations below in a single query? The table is huge and contains millions of records, so I am trying to optimize the query so that I do not have to access the same table multiple times.

It looks like you want rollup or grouping sets. I think this may come close to what you are looking for:
SELECT 'Chicago' AS Region, District, SchoolName AS School, Category,
       COUNT(DISTINCT ssid) AS Total,
       SUM(DirectEnroll) AS Met
FROM final.NSC_Analysis
WHERE GradYear = 2013
GROUP BY Category, SchoolName, District WITH ROLLUP;
Unfortunately, this doesn't quite work for the count(distinct) in SQL Server prior to 2008.
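To make the rollup levels explicit, here is a minimal sketch of the same query written with GROUPING SETS. The demo table name, the sample rows, and the reading of DirectEnroll as a 0/1 flag are assumptions made for illustration. The syntax assumes a database that supports GROUPING SETS and multi-row INSERT, such as SQL Server 2008+ or PostgreSQL 9.5+.

```sql
-- Hypothetical demo table standing in for final.NSC_Analysis.
CREATE TABLE nsc_analysis_demo (
    ssid         INT,
    District     VARCHAR(50),
    SchoolName   VARCHAR(50),
    Category     VARCHAR(50),
    DirectEnroll INT,          -- assumed to be a 0/1 flag
    GradYear     INT
);

INSERT INTO nsc_analysis_demo VALUES
    (1, 'North', 'Lincoln HS', 'A', 1, 2013),
    (2, 'North', 'Lincoln HS', 'A', 0, 2013),
    (3, 'North', 'Lincoln HS', 'B', 1, 2013),
    (4, 'South', 'Grant HS',   'A', 1, 2013),
    (5, 'South', 'Grant HS',   'B', 0, 2012);  -- excluded by the GradYear filter

-- The same levels that a rollup over Category, SchoolName, District produces,
-- spelled out one grouping set at a time.
SELECT 'Chicago' AS Region, District, SchoolName AS School, Category,
       COUNT(DISTINCT ssid) AS Total,
       SUM(DirectEnroll)    AS Met
FROM nsc_analysis_demo
WHERE GradYear = 2013
GROUP BY GROUPING SETS (
    (Category, SchoolName, District),  -- most detailed level
    (Category, SchoolName),            -- per category and school
    (Category),                        -- per category
    ()                                 -- grand total
);
```

With this sample data the grand-total row has Total = 4 and Met = 3. Columns that are rolled up in a given row come back as NULL, and adding or removing a tuple in the list changes which disaggregations are returned, all in a single pass over the table.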

Related

Use Common Table Expression to get difference between two sets of data

I'm trying to use a common table expression to find the difference between the results of two queries I wrote. The first query returns how many patients belong to each ROOMID (each ID represents a specific room).
The second query returns how many patients in each ROOMID had surgery; PatientID identifies each patient. Here is the surgery query:
select roomID, count(distinct patientID) as totalinsurgery
from data with (nolock)
where ptprocess = 'surgery'
group by roomID
And the query for patients per room:
select CAroomid, sum(patientsinroom) as patientsinroom
from data
group by caroomid
The idea is to get the 'difference' between the results of the two queries, i.e. how many patients in the room went to surgery. What is the best way to use a common table expression to get that result?
You want to know "how many patients in the room went to surgery."
I suspect you just want conditional aggregation:
select roomId,
count(distinct case when ptprocess = 'surgery' then patientID end) as num_surgery,
count(distinct patientID) as total
from data
group by roomId;
Note: I have no idea why you are using count(distinct). Can a patient really occur more than one time in a room?
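A small self-contained sketch of the conditional aggregation idea, with the "difference" added as an extra column. The table name room_data and its rows are made up for illustration; the question's table is called data. The multi-row INSERT assumes SQL Server 2008+ or a similar database.

```sql
-- Hypothetical sample table and rows.
CREATE TABLE room_data (
    roomId    INT,
    patientID INT,
    ptprocess VARCHAR(20)
);

INSERT INTO room_data VALUES
    (1, 100, 'surgery'),
    (1, 100, 'recovery'),   -- same patient appears twice in room 1
    (1, 101, 'checkup'),
    (2, 200, 'checkup'),
    (2, 201, 'surgery'),
    (2, 202, 'surgery');

-- CASE returns NULL for non-surgery rows, and COUNT ignores NULLs,
-- so the first count only sees patients who had surgery.
SELECT roomId,
       COUNT(DISTINCT CASE WHEN ptprocess = 'surgery' THEN patientID END) AS num_surgery,
       COUNT(DISTINCT patientID) AS total,
       COUNT(DISTINCT patientID)
         - COUNT(DISTINCT CASE WHEN ptprocess = 'surgery' THEN patientID END) AS no_surgery
FROM room_data
GROUP BY roomId
ORDER BY roomId;

-- roomId  num_surgery  total  no_surgery
-- 1       1            2      1
-- 2       2            3      1
```

In this sample the DISTINCT matters: patient 100 has two rows in room 1 and would otherwise be counted twice in total. No common table expression is needed, because both counts come from one scan of the table.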

How to GROUP BY column(s) in Data Studio

I have a sales table with the purchase history of multiple customers. Needless to say, a single customer can appear multiple times in the table. I need to group by customer, count the industries each customer works in, and visualize the result in a table in Data Studio. I need to do all of this in Data Studio itself.
In BigQuery the syntax would look something like this:
SELECT Industry, count(industry) AS industry_count
FROM (SELECT
CustomerID,
Industry
FROM `project1.pr.df_full`
WHERE segment = 'Lost'
GROUP BY CustomerID, Industry)
GROUP BY Industry
ORDER BY industry_count DESC
How can I achieve the same thing in Data Studio? The WHERE clause doesn't have to be there, because I have a segment filter on the page where I'm trying to do this.
As I said in the comment, I reproduced your query and it worked fine.
Here you can see a guide about how to connect BigQuery to Data Studio.
Please note that Data Studio has some limitations in the query syntax:
If you need further information, please let me know.
You could query the raw data and do the calculations on the Data Studio side. Be sure to use the field you need to group by as a dimension.
SELECT
CustomerID,
Industry,
segment
FROM `project1.pr.df_full`
Then, in a Data Studio table, use the "Industry" field as the dimension and the "CustomerID" field as the metric, with Count as the aggregation for the metric. Since you also have a "segment" field in your data source, filtering by this field will not be a problem.
I hope this helps!
I'm wondering why you don't write the query as:
SELECT CustomerID, COUNT(DISTINCT Industry) as industry_count
FROM `project1.pr.df_full`
WHERE segment = 'Lost'
GROUP BY CustomerID
ORDER BY industry_count DESC;
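The two shapes of query in this thread answer different questions, which a small sketch makes visible. The nested query in the question counts customers per industry; the COUNT(DISTINCT Industry) query counts industries per customer. The table name sales_demo and its rows are made up, the statements are written in generic SQL rather than BigQuery's own types, and CustomerID and Industry are assumed to be non-NULL.

```sql
-- Hypothetical sample table and rows.
CREATE TABLE sales_demo (
    CustomerID INT,
    Industry   VARCHAR(50),
    segment    VARCHAR(20)
);

INSERT INTO sales_demo VALUES
    (1, 'Retail',  'Lost'),
    (1, 'Retail',  'Lost'),    -- repeat purchase, same customer and industry
    (1, 'Banking', 'Lost'),
    (2, 'Retail',  'Lost'),
    (3, 'Banking', 'Active');  -- excluded by the segment filter

-- Customers per industry: a single-level form of the nested GROUP BY.
SELECT Industry, COUNT(DISTINCT CustomerID) AS industry_count
FROM sales_demo
WHERE segment = 'Lost'
GROUP BY Industry
ORDER BY industry_count DESC;
-- Retail   2
-- Banking  1

-- Industries per customer.
SELECT CustomerID, COUNT(DISTINCT Industry) AS industry_count
FROM sales_demo
WHERE segment = 'Lost'
GROUP BY CustomerID
ORDER BY industry_count DESC;
-- 1  2
-- 2  1
```

Which one is right depends on whether the table in the report should have one row per industry or one row per customer.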

What does GROUP BY multiple columns mean?

I use Oracle 11g. I have read a lot of articles about it, but I don't understand
exactly how it happens in the database. Let's say we have two tables:
select * from Employee
select * from student
When we want to group by multiple columns:
SELECT SUBJECT, YEAR, Count(*)
FROM Student
GROUP BY SUBJECT, YEAR;
So my question is: what exactly happens in the database? Is COUNT(*) computed first for every column in the GROUP BY and then sorted, or something else? Can anyone explain it in detail?
SQL is a descriptive language, not a procedural language.
What the query does is determine all rows in the original data where the group by keys are the same. It then reduces them to one row.
For example, in your data, these rows all have the same subject and year:
subject  year  name
English  1     Harsh
English  1     Pratik
English  1     Ramesh
You are saying to group by subject, year, so these become:
subject  year  count(*)
English  1     3
Often, this aggregation is implemented using sorting. However, that is up to the database, and there are many other algorithms. You cannot assume that the database will sort the data. But if it is easier for you to think of it that way, you can imagine the data being sorted by the group by keys in order to identify the groups. One caution: the returned rows are not necessarily in any particular order (unless your query includes an order by).
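Here is a small runnable sketch of that reduction. The table name student_demo and the rows beyond the three English/1 students are made up for illustration. Single-row INSERT statements are used because Oracle 11g does not accept a multi-row VALUES list.

```sql
-- Hypothetical sample table.
CREATE TABLE student_demo (
    name    VARCHAR(50),
    subject VARCHAR(50),
    year    INT
);

-- The three rows from the example above.
INSERT INTO student_demo VALUES ('Harsh',  'English', 1);
INSERT INTO student_demo VALUES ('Pratik', 'English', 1);
INSERT INTO student_demo VALUES ('Ramesh', 'English', 1);
-- Made-up extra rows so that more than one group exists.
INSERT INTO student_demo VALUES ('Asha',   'English', 2);
INSERT INTO student_demo VALUES ('Ravi',   'Maths',   1);
INSERT INTO student_demo VALUES ('Meena',  'Maths',   1);

-- One output row per distinct (subject, year) pair.
SELECT subject, year, COUNT(*) AS student_count
FROM student_demo
GROUP BY subject, year
ORDER BY subject, year;   -- explicit ordering; GROUP BY alone guarantees none

-- subject  year  student_count
-- English  1     3
-- English  2     1
-- Maths    1     2
```

Note that English/1 and English/2 are separate groups: rows fall into the same group only when every grouping column matches.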

SQL - making several groupings (performance)

I have a SQL query that finds records based on provided parameters. The query is pretty heavy, so I want to execute it as few times as possible.
After getting the result from that query, I need to break it down.
For example, consider the following query:
SELECT location, department, industry
FROM data
WHERE ...
After that, I need to break down the results, e.g. provide a list of all locations that appear in the results with a count for each, and the same for departments and industries.
As far as I know, to get the breakdown by location I need to GROUP BY location and then count.
My question is: for performance reasons, is it possible to perform several groupings/counts on the query result without recalculating it for each grouping?
Yes, this is possible, unless I misunderstood you.
You need to use window functions. For instance:
SELECT location
, department
, industry
, COUNT(*) OVER(PARTITION BY location, department)
, COUNT(*) OVER(PARTITION BY location, department, industry)
FROM data
WHERE ...;
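Keep in mind that COUNT(DISTINCT column) is not possible as a window function in every database; SQL Server and PostgreSQL, for example, reject DISTINCT inside a windowed aggregate.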
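A minimal sketch of what the window-function approach returns, using one partition per breakdown column rather than the combined partitions above. The table name results_window_demo and its rows are made up, and the multi-row INSERT assumes SQL Server 2008+ or PostgreSQL.

```sql
-- Hypothetical table standing in for the filtered result of the heavy query.
CREATE TABLE results_window_demo (
    location   VARCHAR(50),
    department VARCHAR(50),
    industry   VARCHAR(50)
);

INSERT INTO results_window_demo VALUES
    ('Paris',  'Sales', 'Retail'),
    ('Paris',  'Sales', 'Banking'),
    ('Paris',  'IT',    'Retail'),
    ('Berlin', 'Sales', 'Retail');

-- Each COUNT(*) OVER is computed over its own partition,
-- but every input row is still returned.
SELECT location, department, industry,
       COUNT(*) OVER (PARTITION BY location)   AS per_location,
       COUNT(*) OVER (PARTITION BY department) AS per_department,
       COUNT(*) OVER (PARTITION BY industry)   AS per_industry
FROM results_window_demo
ORDER BY location, department, industry;

-- location  department  industry  per_location  per_department  per_industry
-- Berlin    Sales       Retail    1             3               3
-- Paris     IT          Retail    3             1               3
-- Paris     Sales       Banking   3             3               1
-- Paris     Sales       Retail    3             3               3
```

The counts repeat on every row of a partition, so the caller still has to pick out one row per location, department, or industry. That is the main trade-off compared with an aggregate query that returns one row per group.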
If I understand correctly, you can do what you want with grouping sets:
SELECT location, department, industry, count(*)
FROM data
WHERE ...
GROUP BY GROUPING SETS ((location), (department), (industry))
This will return rows like:
location1 NULL NULL 10
. . .
NULL dept1 NULL 17
. . .
If you want to get fancy, and you have no NULL values in any of the columns, you can do:
SELECT (case when location is not null then 'location'
when department is not null then 'department'
when industry is not null then 'industry'
end) as which,
coalesce(location, department, industry) as name, count(*)
FROM data
WHERE ...
GROUP BY GROUPING SETS ((location), (department), (industry))
ORDER BY which;
You can actually do the same thing using the GROUPING() function, if you do have NULL values in the columns, but you have to replace the coalesce() as well.
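A sketch of that GROUPING() variant, assuming a database that supports GROUPING SETS and GROUPING(), such as SQL Server 2008+ or PostgreSQL 9.5+. The table name results_sets_demo and its rows are made up; one row has a NULL location on purpose. GROUPING(col) returns 0 when col is one of the grouping columns for that output row and 1 when it has been rolled up.

```sql
-- Hypothetical table standing in for the filtered result of the heavy query.
CREATE TABLE results_sets_demo (
    location   VARCHAR(50),
    department VARCHAR(50),
    industry   VARCHAR(50)
);

INSERT INTO results_sets_demo VALUES
    ('Paris',  'Sales', 'Retail'),
    ('Paris',  'IT',    'Retail'),
    ('Berlin', 'Sales', 'Banking'),
    (NULL,     'IT',    'Banking');   -- genuine NULL in the data

SELECT CASE WHEN GROUPING(location)   = 0 THEN 'location'
            WHEN GROUPING(department) = 0 THEN 'department'
            ELSE 'industry'
       END AS which,
       -- Replaces coalesce(): pick the column this row is actually grouped by.
       CASE WHEN GROUPING(location)   = 0 THEN location
            WHEN GROUPING(department) = 0 THEN department
            ELSE industry
       END AS name,
       COUNT(*) AS cnt
FROM results_sets_demo
GROUP BY GROUPING SETS ((location), (department), (industry))
ORDER BY which, name;

-- Rows returned (where the NULL name sorts depends on the database):
-- department  IT       2
-- department  Sales    2
-- industry    Banking  2
-- industry    Retail   2
-- location    Berlin   1
-- location    Paris    2
-- location    NULL     1
```

The last row is the one the coalesce() version would get wrong: a NULL location from the data would be indistinguishable from a rolled-up column, so both which and name would come back as NULL.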

(SQL) Using SELECT statements to display data with odd requirements

So I'm taking a course on learning basic SQL (using Oracle), and I felt like I had become fairly fluent with using SELECT statements (grouping, joining, having, etc), but now I'm at a loss on how to deal with this latest problem.
I need to write a statement that would only display rows with more than one piece of data. So, say I had
COMPANY PRODUCT
One Car
One Book
Two Game
it should only list company 'One'. But I can't find anything online to help me.
Select Company
From YourTableName
Group By Company
Having Count(*) > 1
A better way, which also shows the count for each company, is:
Select Company, Count(*)
From YourTableName
Group By Company
Having Count(*) > 1
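For completeness, a runnable sketch with the sample data from the question. The table name company_products is made up; single-row INSERT statements are used so the script also runs on older Oracle versions.

```sql
-- Hypothetical table holding the question's sample rows.
CREATE TABLE company_products (
    company VARCHAR(50),
    product VARCHAR(50)
);

INSERT INTO company_products VALUES ('One', 'Car');
INSERT INTO company_products VALUES ('One', 'Book');
INSERT INTO company_products VALUES ('Two', 'Game');

-- WHERE filters rows before grouping; HAVING filters groups after aggregation,
-- which is why the COUNT(*) condition has to go in HAVING.
SELECT company, COUNT(*) AS product_count
FROM company_products
GROUP BY company
HAVING COUNT(*) > 1;

-- company  product_count
-- One      2
```

Company Two has only one row, so its group is dropped by the HAVING condition.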