Count the number of occurrences in each bucket Redshift SQL - sql

This might be difficult to explain, but I'm trying to write a Redshift SQL query that counts the organizations falling into different market buckets. There are 50 markets. For example, company x can be found in only 1 market and company y can be found in 3 markets. I have over 10,000 companies to fit into these buckets, so the real result would look more like, hypothetically, 500 companies found in 3 markets or 7 companies found in 50 markets.
The table would look like:
Market Bucket   Org Count
1 Markets       3
2 Markets       1
3 Markets       0
select count(distinct case when enterprise_account = true and (market_name then organization_id end) as "1 Market" from organization_facts
I was trying to build on the query above, but I got confused about how to formulate it effectively.
Organization Facts
Market Name   Org ID   Org Name
New York      15683    Company x
Orlando       38478    Company y
Twin Cities   2738     Company z
Twin Cities   15683    Company x
Detroit       99       Company xy

You would need a sub-query that retrieves the number of markets per company, and an outer query that summarises into a count of markets.
Something like:
with markets as (
select
org_name,
count(distinct market_name) as market_count
from organization_facts
group by org_name
)
select
market_count,
count(*) as org_count
from markets
group by market_count
order by market_count

If I follow you correctly, you can do this with two levels of aggregation. Assuming that org_id represents a company in your dataset:
select cnt_markets, count(*) cnt_org_id
from (select count(*) cnt_markets from organization_facts group by org_id) t
group by cnt_markets
The subquery counts the number of markets per company. I assumed there are no duplicate (org_id, market_name) tuples in the table; if that's not the case, then you need count(distinct market_name) instead of count(*) in that spot.
Then, the outer query just counts how many times each market count occurs in the subquery, which yields the result that you want.
Note that I left out the enterprise_account column, which appears in your query but not in your data.
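As a quick check of the two-level aggregation, here is a minimal sketch that runs the query against the sample rows from the question. It assumes an in-memory SQLite database as a stand-in for Redshift, and it uses count(distinct market_name) so that duplicate (org_id, market_name) rows would not inflate the counts. The Python names (BUCKET_QUERY, market_buckets) are made up for the illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Redshift; the SQL itself is plain aggregation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organization_facts (market_name TEXT, org_id INTEGER, org_name TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO organization_facts VALUES (?, ?, ?)",
    [
        ("New York", 15683, "Company x"),
        ("Orlando", 38478, "Company y"),
        ("Twin Cities", 2738, "Company z"),
        ("Twin Cities", 15683, "Company x"),
        ("Detroit", 99, "Company xy"),
    ],
)

BUCKET_QUERY = """
select cnt_markets, count(*) as cnt_org_id
from (
    -- inner level: one row per company with its number of distinct markets
    select org_id, count(distinct market_name) as cnt_markets
    from organization_facts
    group by org_id
) t
-- outer level: how many companies share each market count
group by cnt_markets
order by cnt_markets
"""


def market_buckets(connection):
    """Return {number_of_markets: number_of_organizations}."""
    return dict(connection.execute(BUCKET_QUERY).fetchall())


buckets = market_buckets(conn)
for n_markets, n_orgs in buckets.items():
    print(f"{n_markets} Markets: {n_orgs}")
```

A bucket that no company falls into (3 Markets in the sample) produces no row at all. Showing it with a count of 0 would need a separate list of bucket numbers to left join against.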

Related

I need to get count of total count for the query I had in postgresql

I created the following select query. Now I need the total of the "No.of Ideas generated" column in a separate row labelled Total, which sums the individual counts of each idea_sector and idea_industry combination.
Query:
select c.idea_sector,c.idea_industry,
count(*) as "No.of Ideas generated"
from hackathon2k21.consolidated_report c
group by idea_sector,idea_industry
order by idea_sector ,idea_industry
Output:
----------------------------------------------------------------------
idea_sector idea_industry No.of Ideas generated
-----------------------------------------------------------------------
COMMUNICATION-ROC TELECOMMUNICATIONS 1
Cross Sector Cross Industry 5
DISTRIBUTION TRAVEL AND TRANSPORTATION 1
FINANCIAL SERVICES BANKING 1
PUBLIC HEALTHCARE 1
Required output:
----------------------------------------------------------------------
idea_sector idea_industry No.of Ideas generated
-----------------------------------------------------------------------
COMMUNICATION-ROC TELECOMMUNICATIONS 1
Cross Sector Cross Industry 5
DISTRIBUTION TRAVEL AND TRANSPORTATION 1
FINANCIAL SERVICES BANKING 1
PUBLIC HEALTHCARE 1
------------------------------------------------------------------------
Total 9
You can accomplish this with grouping sets. That's where we tell Postgres, in the GROUP BY clause, all of the different ways we would like to see our result set grouped for the aggregated column(s):
SELECT
c.idea_sector,
c.idea_industry,
count(*) as "No.of Ideas generated"
FROM hackathon2k21.consolidated_report c
GROUP BY
GROUPING SETS (
(idea_sector,idea_industry),
())
ORDER BY idea_sector ,idea_industry;
This generates two grouping sets. One that groups by idea_sector, idea_industry granularity like in your existing sql and another that groups by nothing, essentially creating a full table Total.
The easiest way seems to be adding a UNION ALL operator like this:
select c.idea_sector,c.idea_industry,
count(*) as "No.of Ideas generated"
from hackathon2k21.consolidated_report c
group by idea_sector,idea_industry
--order by idea_sector ,idea_industry
UNION ALL
SELECT 'Total', NULL, COUNT(*)
from hackathon2k21.consolidated_report
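To see the total row in action, here is a small sketch of the UNION ALL variant. It assumes SQLite as a stand-in for PostgreSQL (SQLite has no GROUPING SETS, so only the UNION ALL form is shown), drops the hackathon2k21 schema prefix, and rebuilds the rows from the counts in the question. How the output splits into sector and industry is my reading of it, and the is_total helper column is an addition that keeps the Total row last.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the schema prefix is dropped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consolidated_report (idea_sector TEXT, idea_industry TEXT)")

# One row per idea, rebuilt from the counts shown in the question's output.
ideas = (
    [("COMMUNICATION-ROC", "TELECOMMUNICATIONS")]
    + [("Cross Sector", "Cross Industry")] * 5
    + [("DISTRIBUTION", "TRAVEL AND TRANSPORTATION")]
    + [("FINANCIAL SERVICES", "BANKING")]
    + [("PUBLIC", "HEALTHCARE")]
)
conn.executemany("INSERT INTO consolidated_report VALUES (?, ?)", ideas)

TOTAL_QUERY = """
select idea_sector, idea_industry, count(*) as ideas, 0 as is_total
from consolidated_report
group by idea_sector, idea_industry
union all
select 'Total', null, count(*), 1
from consolidated_report
-- is_total sorts the summary row after every detail row
order by is_total, idea_sector, idea_industry
"""


def report_with_total(connection):
    """Return (sector, industry, count) rows followed by a final Total row."""
    return [row[:3] for row in connection.execute(TOTAL_QUERY)]


report = report_with_total(conn)
for sector, industry, count in report:
    print(sector, industry or "", count)
```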

how many banks are currently rated B+ or above and when was the last time (dateindex) that they had been below

I have these two tables and I'm trying to get the dateindex of the last time that the company was rated below a B+. For example, dateindex=19941 means 1994 quarter 1.
This selects all the companies that have B+ or above in q2 2020
SELECT DISTINCT mr.name, mc.rating, mr.DateIndex
FROM [Model].[rating] mc INNER JOIN [Model].[RawHist] mr
ON mc.BankId = mr.BankId
WHERE mc.Rating IN ('A+','A','A-','B+') AND mr.DateIndex IN ('20202')
And it yields the following
How can I add the dateindex of the last time it was below B+? The result would have those three fields plus two more, the last grade below B+ and its dateindex, for 5 fields in total.
This is what I have so far with the results
It's giving me way too many rows.
I have these two tables and im trying to get the dateindex of the last time that the company was rated below a B+.
This sounds like aggregation:
SELECT mr.name, MAX(mr.DateIndex)
FROM [Model].[rating] mc JOIN
[Model].[RawHist] mr
ON mc.BankId = mr.BankId
WHERE mc.Rating NOT IN ('A+', 'A', 'A-','B+')
GROUP BY mr.name;
This assumes that "less than B+" means that it is not one of the listed ratings.
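The question also asks for the current rating next to the last low rating, so here is a rough sketch of one way to get all five fields. The real layout of [Model].[rating] and [Model].[RawHist] is not shown, so everything here is an assumption: a single hypothetical rating_history table with one row per bank and quarter, made-up banks and ratings, and SQLite instead of SQL Server. Like the answer above, it treats "below B+" as "not one of the listed ratings".

```python
import sqlite3

# Assumption: one hypothetical table with a rating per bank and quarter (YYYYQ as integer).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rating_history (bank_id INTEGER, name TEXT, date_index INTEGER, rating TEXT)"
)
conn.executemany(
    "INSERT INTO rating_history VALUES (?, ?, ?, ?)",
    [
        (1, "Bank A", 20183, "C"),
        (1, "Bank A", 20194, "B"),
        (1, "Bank A", 20201, "B+"),
        (1, "Bank A", 20202, "A-"),
        (2, "Bank B", 20194, "A"),
        (2, "Bank B", 20202, "A"),
        (3, "Bank C", 20201, "A"),
        (3, "Bank C", 20202, "C"),
    ],
)

LAST_LOW_QUERY = """
SELECT cur.name, cur.rating, cur.date_index,
       prev.rating AS last_low_rating,
       prev.date_index AS last_low_date_index
FROM rating_history cur
LEFT JOIN rating_history prev
       ON prev.bank_id = cur.bank_id
      AND prev.date_index = (
            -- most recent earlier quarter with a rating outside the "good" list
            SELECT MAX(h.date_index)
            FROM rating_history h
            WHERE h.bank_id = cur.bank_id
              AND h.date_index < cur.date_index
              AND h.rating NOT IN ('A+', 'A', 'A-', 'B+')
          )
WHERE cur.date_index = ?
  AND cur.rating IN ('A+', 'A', 'A-', 'B+')
ORDER BY cur.name
"""


def last_time_below(connection, date_index):
    """Banks rated B+ or above at date_index, with their last lower rating (or None)."""
    return connection.execute(LAST_LOW_QUERY, (date_index,)).fetchall()


result = last_time_below(conn, 20202)
for row in result:
    print(row)
```

The LEFT JOIN keeps banks that were never below B+ (they get NULLs in the last two fields), and filtering on a single quarter first is what keeps the result to one row per bank.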

How to link the relational datatables and divide the sum of 3 columns from one table by 1 column in another table using SQL SELECT query

I'm creating a database for Formula 1 drivers/teams. The idea is to display the cost comparisons of team budgets vs team points and driver salaries, to see the effective cost per point for the top 10 teams.
E.g from 2015 info
Team Mercedes
Income received = euro 467.4m (sponsors 122m, partners 212.40m, tv 133m).
Points Scored = 703 (total from both drivers: Hamilton 381, Rosberg 322).
Effective Cost Per Point = 2,506,417.11
I have 3 datatables: One each for the team, the drivers, and a table to join tables 1&2 together, to then create the correct SELECT queries:
Table 'team'
teamid
teamname
sponsors
partners
tv
total
Table 'driver'
driverid
drivername
salary
points
Table 'driverteam'
teamid
driverid
totalbudget
totalpoints
I first need to get the sum of the sponsors,partners and tv so I have created the following SELECT statement:
$sql = "SELECT teamname, (sponsors + partners + tv) AS total FROM team ORDER BY total DESC LIMIT 10";
I know how to get the salary and points from a driver:
$sql = "SELECT drivername, salary FROM driver ORDER BY salary DESC LIMIT 10";
$sql = "SELECT drivername, points FROM driver ORDER BY points DESC LIMIT 10";
Seems OK, but now comes my problem: how do I get the sum from table 1 and divide it by the points that the drivers have got in table 2?
My limited brain says I need an INNER JOIN using table 3 to get tables 1 & 2 together, before performing the necessary divisions etc.
As there are 2 drivers per team, I need to divide the team total budget by each driver's points, as well as the team total by both drivers' points.
What SELECT queries do I need to achieve this?
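One possible shape for those queries, as a sketch only: join team to driver through driverteam, then divide the per-row income sum either by the summed points of the team's drivers or by each driver's own points. The example assumes SQLite, uses only the teamid and driverid columns of driverteam, and fills the tables with made-up teams, drivers and figures (in millions) purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE team (teamid INTEGER, teamname TEXT, sponsors REAL, partners REAL, tv REAL);
    CREATE TABLE driver (driverid INTEGER, drivername TEXT, salary REAL, points INTEGER);
    CREATE TABLE driverteam (teamid INTEGER, driverid INTEGER);
    """
)
# Hypothetical data, not real season figures.
conn.executemany(
    "INSERT INTO team VALUES (?, ?, ?, ?, ?)",
    [(1, "Team A", 100.0, 200.0, 100.0), (2, "Team B", 50.0, 30.0, 20.0)],
)
conn.executemany(
    "INSERT INTO driver VALUES (?, ?, ?, ?)",
    [(1, "Driver 1", 10.0, 300), (2, "Driver 2", 5.0, 100),
     (3, "Driver 3", 2.0, 40), (4, "Driver 4", 1.0, 10)],
)
conn.executemany("INSERT INTO driverteam VALUES (?, ?)", [(1, 1), (1, 2), (2, 3), (2, 4)])

# Team income divided by the combined points of its drivers.
TEAM_QUERY = """
SELECT t.teamname,
       (t.sponsors + t.partners + t.tv) AS total,
       SUM(d.points) AS points,
       (t.sponsors + t.partners + t.tv) / SUM(d.points) AS cost_per_point
FROM team t
JOIN driverteam dt ON dt.teamid = t.teamid
JOIN driver d ON d.driverid = dt.driverid
GROUP BY t.teamid, t.teamname, t.sponsors, t.partners, t.tv
ORDER BY cost_per_point DESC
LIMIT 10
"""

# Team income divided by each individual driver's points.
DRIVER_QUERY = """
SELECT t.teamname, d.drivername,
       (t.sponsors + t.partners + t.tv) / d.points AS cost_per_point
FROM team t
JOIN driverteam dt ON dt.teamid = t.teamid
JOIN driver d ON d.driverid = dt.driverid
ORDER BY t.teamname, d.drivername
"""

team_rows = conn.execute(TEAM_QUERY).fetchall()
driver_rows = conn.execute(DRIVER_QUERY).fetchall()
for row in team_rows + driver_rows:
    print(row)
```

With this layout the total, totalbudget and totalpoints columns would not need to be stored, since they can be derived from the other columns whenever the query runs.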

Access SQL: How to specify which record to return based on the "more important" condition?

I have 2 tables (MS ACCESS):
Table "Orders"
OrderID   Product   Product_Group   Client      Client_Group   Revenue
1         Cars      Vehicles        Men         People         10 000
2         Houses    NC_Assets       Women       People         15 000
3         Houses    NC_Assets       Partnersh   Companies      12 000
4         Cars      Vehicles        Corps       Companies      3 000
Table "Grouping"
Product   Product_Group   Client   Client_Group   Tax rate
Cars                               Companies      Taxable 30%
          Vehicles                 Companies      Taxable 15%
Houses                             People         Taxable 13%
Houses                    Women                   Taxable 15%
I want to join these tables to see which orders will fall into which taxable group. As you can see some products/clients are mapped differently than their groups -> if that is the case, the query should return only one record for this pair and exclude any pairing containing their groups. In pseudo-code:
If there's a product / client grouping, return this record, else
If there's a product / client-group grouping, return it, else
If there's a product-group / client grouping, return it, else
If there's a product-group / client-group grouping, return it
End if * 4
In that order.
Now my query (pseudo):
SELECT [Orders].*, [Grouping].* FROM [Orders] LEFT JOIN [Grouping] ON
(([Orders].Product = [Grouping].Product OR [Orders].Product_Group = [Grouping].Product_Group) AND
([Orders].Client = [Grouping].Client OR [Orders].Client_Group = [Grouping].Client_Group))
Returns both Cars-Companies and Vehicles-Companies. I'm out of ideas how to set it up to get only the most granular records from each combination. UNION? NOT EXISTS?
Any help appreciated.
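One way to get only the most granular record is to score each matching grouping row by how specific it is and keep the highest score per order. The sketch below is only an illustration of that idea: it assumes SQLite rather than Access (so it uses ORDER BY ... LIMIT 1 where Access would need something like TOP 1), a hypothetical grouping_rules table with NULL in the columns a rule leaves blank, and a tax_rate column holding just the percentage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, product TEXT, product_group TEXT,
                         client TEXT, client_group TEXT, revenue INTEGER);
    CREATE TABLE grouping_rules (product TEXT, product_group TEXT,
                                 client TEXT, client_group TEXT, tax_rate TEXT);
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Cars", "Vehicles", "Men", "People", 10000),
        (2, "Houses", "NC_Assets", "Women", "People", 15000),
        (3, "Houses", "NC_Assets", "Partnersh", "Companies", 12000),
        (4, "Cars", "Vehicles", "Corps", "Companies", 3000),
    ],
)
# Blank cells of the Grouping table are stored as NULL (an assumption of this sketch).
conn.executemany(
    "INSERT INTO grouping_rules VALUES (?, ?, ?, ?, ?)",
    [
        ("Cars", None, None, "Companies", "30%"),
        (None, "Vehicles", None, "Companies", "15%"),
        ("Houses", None, None, "People", "13%"),
        ("Houses", None, "Women", None, "15%"),
    ],
)

BEST_MATCH_QUERY = """
SELECT o.order_id,
       (SELECT g.tax_rate
        FROM grouping_rules g
        WHERE (g.product = o.product OR g.product_group = o.product_group)
          AND (g.client = o.client OR g.client_group = o.client_group)
        -- specificity: product/client = 3, product/client group = 2,
        -- product group/client = 1, product group/client group = 0
        ORDER BY (g.product IS NOT NULL) * 2 + (g.client IS NOT NULL) DESC
        LIMIT 1) AS tax_rate
FROM orders o
ORDER BY o.order_id
"""

tax_by_order = conn.execute(BEST_MATCH_QUERY).fetchall()
for order_id, tax_rate in tax_by_order:
    print(order_id, tax_rate)
```

Order 4 matches both the Cars / Companies and the Vehicles / Companies rows, and the score keeps only the Cars one. Orders 1 and 3 match no row in the sample data, so they come back with NULL.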
I want to join these tables to see how many orders qualify as good,
mediocre etc.
Sounds like you want counts of the particular conditions...Assuming you have a SUM and CASE (I haven't written queries for MS Access in about 10 years...), here's some pseudo-code that should get you started:
SELECT SUM(CASE WHEN {mediocre-conditions} THEN 1 ELSE 0 END) AS MediocreCount,
SUM(CASE WHEN {good-conditions} THEN 1 ELSE 0 END) AS GoodCount,
SUM(CASE WHEN {great-conditions} THEN 1 ELSE 0 END) AS GreatCount
FROM [Orders] LEFT JOIN [Grouping] ON (([Orders].Product = [Grouping].Product OR [Orders].Product_Group = [Grouping].Product_Group) AND ([Orders].Client = [Grouping].Client OR [Orders].Client_Group = [Grouping].Client_Group))
[update] I don't like giving bad answers, so did a quick look...based on this link: Does MS Access support "CASE WHEN" clause if connect with ODBC?, it appears you may be able to do:
SELECT SUM(IIF({mediocre-conditions},1,0)) AS MediocreCount,
SUM(IIF({good-conditions},1,0)) AS GoodCount,
SUM(IIF({great-conditions},1,0)) AS GreatCount

Need help in understanding a SELECT query

I have the following query. It uses only one table (Customers) from the Northwind database.
I have no idea how it works or what its intention is. I hope there are a lot of DBAs here, so I'm asking for an explanation. In particular, I don't know what OVER and PARTITION do here.
WITH NumberedWomen AS
(
SELECT CustomerId ,ROW_NUMBER() OVER
(
PARTITION BY c.Country
ORDER BY LEN(c.CompanyName) ASC
)
women
FROM Customers c
)
SELECT * FROM NumberedWomen WHERE women > 3
If you needed the db schema, it is here
This function:
ROW_NUMBER() OVER (PARTITION BY c.Country ORDER BY LEN(c.CompanyName) ASC)
assigns continuous row numbers to records within each country, ordering the records by LEN(companyName).
If you have these data:
country companyName
US Apple
US Google
UK BAT
UK BP
US GM
then the query will assign the numbers 1 to 3 to the US companies and 1 to 2 to the UK companies, ordering them by name length:
country companyName ROW_NUMBER()
US GM 1
US Apple 2
US Google 3
UK BP 1
UK BAT 2
ROW_NUMBER() is a ranking function.
OVER tells it how to create rank numbers.
PARTITION BY [expression] tells the ROW_NUMBER function to restart ranking whenever [expression] contains a new value
In your case, for every country, a series of numbers starting with 1 is created. Within a country, the Companies are ordered by the length of their name (shorter name = lower rank).
The final query:
SELECT * FROM NumberedWomen WHERE women > 3
selects every customer except, within each country, the three whose company names are shortest.
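The numbering can be reproduced on the five example companies with a short sketch. It assumes SQLite 3.25 or newer (the first version with window functions), where LENGTH plays the role of SQL Server's LEN, and a simplified two-column customers table; the rows_after helper and the rn alias are names made up for the example.

```python
import sqlite3

# Assumption: the bundled SQLite is 3.25+ so that ROW_NUMBER() OVER (...) is available.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (country TEXT, company_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("US", "Apple"), ("US", "Google"), ("UK", "BAT"), ("UK", "BP"), ("US", "GM")],
)

NUMBERED_QUERY = """
WITH numbered AS (
    SELECT country, company_name,
           -- numbering restarts at 1 for each country, shortest name first
           ROW_NUMBER() OVER (
               PARTITION BY country
               ORDER BY LENGTH(company_name) ASC
           ) AS rn
    FROM customers
)
SELECT country, company_name, rn
FROM numbered
WHERE rn > ?
ORDER BY country, rn
"""


def rows_after(connection, skip):
    """Return the rows whose per-country row number is greater than skip."""
    return connection.execute(NUMBERED_QUERY, (skip,)).fetchall()


all_rows = rows_after(conn, 0)   # every row with its number
beyond_two = rows_after(conn, 2) # only rows numbered 3 and up
beyond_three = rows_after(conn, 3)
for row in all_rows:
    print(row)
```

With only three US and two UK companies, the original filter (greater than 3) returns nothing here, while a threshold of 2 leaves just Google.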