Need help in understanding a SELECT query - sql

I have a following query. It uses only one table (Customers) from Northwind database.
I completely have no idea how does it work, and what its intention is. I hope there is a lot of DBAs here so I ask for explanation. particularly don't know what the OVER and PARTITION does here.
WITH NumberedWomen AS
(
SELECT CustomerId ,ROW_NUMBER() OVER
(
PARTITION BY c.Country
ORDER BY LEN(c.CompanyName) ASC
)
women
FROM Customers c
)
SELECT * FROM NumberedWomen WHERE women > 3
If you needed the db schema, it is here

This function:
ROW_NUMBER() OVER (PARTITION BY c.Country ORDER BY LEN(c.CompanyName) ASC)
assigns continuous row numbers to records within each country, ordering the records by LEN(companyName).
If you have these data:
country companyName
US Apple
US Google
UK BAT
UK BP
US GM
, then the query will assign numbers from 1 and 3 to the US companies and 1 to 2 to UK companies, ordering them by the name length:
country companyName ROW_NUMBER()
US GM 1
US Apple 2
US Google 3
UK BP 1
UK BAT 2

ROW_NUMBER() is a ranking function.
OVER tells it how to create rank numbers.
PARTITION BY [expression] tells the ROW_NUMBER function to restart ranking whenever [expression] contains a new value
In your case, for every country, a series of numbers starting with 1 is created. Within a country, the Companies are ordered by the length of their name (shorter name = lower rank).
The final query:
SELECT * FROM NumberedWomen WHERE women > 3
selects all customers except if the company-country combination is part of one of the companies with the 3 shortest names in the same country.

Related

Count the number of occurences in each bucket Redshift SQL

This might be difficult to explain. But Im trying to write a redshift sql query where I have want the count of organizations that fall into different market buckets. There are 50 markets. For example company x can be only be found in 1 market and company y can be found in 3 markets. I want to preface that I have over 10,000 companies to fit into these buckets. So ideally it would be more like, hypothetically 500 companies are found in 3 markets or 7 companies are found in 50 markets.
The table would like
Market Bucket
Org Count
1 Markets
3
2 Markets
1
3 Markets
0
select count(distinct case when enterprise_account = true and (market_name then organization_id end) as "1 Market" from organization_facts
I was trying to formulate the query from above but I got confused on how to effectively formulate the query
Organization Facts
Market Name
Org ID
Org Name
New York
15683
Company x
Orlando
38478
Company y
Twin Cities
2738
Company z
Twin Cities
15683
Company x
Detroit
99
Company xy
You would need a sub-query that retrieves the number of markets per company, and an outer query that summarises into a count of markets.
Something like:
with markets as (
select
org_name,
count(distinct market_name) as market_count
from organization_facts
)
select
market_count,
count(*) as org_count
from markets
group by market_count
order by market_count
If I follow you correctly, you can do this with two levels of aggregation. Assuming that org_id represents a company in your dataset:
select cnt_markets, count(*) cnt_org_id
from (select count(*) cnt_markets from organization_facts group by org_id) t
group by cnt_markets
The subquery counts the number of markets per company. I assumed no duplicate (ord_id, market_name) tuples in the table ; if that's not the case, then you need count(distinct market_name) instead of count(*) in that spot.
Then, the outer query just counts how many times each market count occurs in the subquery, which yields the result that you want.
Note that I left apart the enterprise_account column ,that appears in your query but not in your data.

I need to get count of total count for the query I had in postgresql

I created a select query as following, now I need to get the total count of the "No.of Ideas generated" column in a separate row as total which will have a count of the individual count of particular idea_sector and idea_industry combination.
Query:
select c.idea_sector,c.idea_industry,
count(*) as "No.of Ideas generated"
from hackathon2k21.consolidated_report c
group by idea_sector,idea_industry
order by idea_sector ,idea_industry
Output:
----------------------------------------------------------------------
idea_sector idea_industry No.of Ideas generated
-----------------------------------------------------------------------
COMMUNICATION-ROC TELECOMMUNICATIONS 1
Cross Sector Cross Industry 5
DISTRIBUTION TRAVEL AND TRANSPORTATION 1
FINANCIAL SERVICES BANKING 1
PUBLIC HEALTHCARE 1
Required output:
----------------------------------------------------------------------
idea_sector idea_industry No.of Ideas generated
-----------------------------------------------------------------------
COMMUNICATION-ROC TELECOMMUNICATIONS 1
Cross Sector Cross Industry 5
DISTRIBUTION TRAVEL AND TRANSPORTATION 1
FINANCIAL SERVICES BANKING 1
PUBLIC HEALTHCARE 1
------------------------------------------------------------------------
Total 9
You can accomplish this with grouping sets. That's where we tell postgres, in the GROUP BY clause, all of the different ways we would like to see our result set grouped for the aggregated column(s)
SELECT
c.idea_sector,
c.idea_industry,
count(*) as "No.of Ideas generated"
FROM hackathon2k21.consolidated_report c
GROUP BY
GROUPING SETS (
(idea_sector,idea_industry),
())
ORDER BY idea_sector ,idea_industry;
This generates two grouping sets. One that groups by idea_sector, idea_industry granularity like in your existing sql and another that groups by nothing, essentially creating a full table Total.
The easiest way seems to be adding a UNION ALL operator like this:
select c.idea_sector,c.idea_industry,
count(*) as "No.of Ideas generated"
from hackathon2k21.consolidated_report c
group by idea_sector,idea_industry
--order by idea_sector ,idea_industry
UNION ALL
SELECT 'Total', NULL, COUNT(*)
from hackathon2k21.consolidated_report

Total Sum SQL Server

I have a query that collects many different columns, and I want to include a column that sums the price of every component in an order. Right now, I already have a column that simply shows the price of every component of an order, but I am not sure how to create this new column.
I would think that the code would go something like this, but I am not really clear on what an aggregate function is or why I get an error regarding the aggregate function when I try to run this code.
SELECT ID, Location, Price, (SUM(PriceDescription) FROM table GROUP BY ID WHERE PriceDescription LIKE 'Cost.%' AS Summary)
FROM table
When I say each component, I mean that every ID I have has many different items that make up the general price. I only want to find out how much money I spend on my supplies that I need for my pressure washers which is why I said `Where PriceDescription LIKE 'Cost.%'
To further explain, I have receipts of every customer I've worked with and in these receipts I write down my cost for the soap that I use and the tools for the pressure washer that I rent. I label all of these with 'Cost.' so it looks like (Cost.Water), (Cost.Soap), (Cost.Gas), (Cost.Tools) and I would like it so for Order 1 it there's a column that sums all the Cost._ prices for the order and for Order 2 it sums all the Cost._ prices for that order. I should also mention that each Order does not have the same number of Costs (sometimes when I use my power washer I might not have to buy gas and occasionally soap).
I hope this makes sense, if not please let me know how I can explain further.
`ID Location Price PriceDescription
1 Park 10 Cost.Water
1 Park 8 Cost.Gas
1 Park 11 Cost.Soap
2 Tom 20 Cost.Water
2 Tom 6 Cost.Soap
3 Matt 15 Cost.Tools
3 Matt 15 Cost.Gas
3 Matt 21 Cost.Tools
4 College 32 Cost.Gas
4 College 22 Cost.Water
4 College 11 Cost.Tools`
I would like for my query to create a column like such
`ID Location Price Summary
1 Park 10 29
1 Park 8
1 Park 11
2 Tom 20 26
2 Tom 6
3 Matt 15 51
3 Matt 15
3 Matt 21
4 College 32 65
4 College 22
4 College 11 `
But if the 'Summary' was printed on every line instead of just at the top one, that would be okay too.
You just require sum(Price) over(Partition by Location) will give total sum as below:
SELECT ID, Location, Price, SUM(Price) over(Partition by Location) AS Summed_Price
FROM yourtable
WHERE PriceDescription LIKE 'Cost.%'
First, if your Price column really contains values that match 'Cost.%', then you can not apply SUM() over it. SUM() expects a number (e.g. INT, FLOAT, REAL or DECIMAL). If it is text then you need to explicitly convert it to a number by adding a CAST or CONVERT clause inside the SUM() call.
Second, your query syntax is wrong: you need GROUP BY, and the SELECT fields are not specified correctly. And you want to SUM() the Price field, not the PriceDescription field (which you can't even sum as I explained)
Assuming that Price is numeric (see my first remark), then this is how it can be done:
SELECT ID
, Location
, Price
, (SELECT SUM(Price)
FROM table
WHERE ID = T1.ID AND Location = T1.Location
) AS Summed_Price
FROM table AS T1
to get exact result like posted in question
Select
T.ID,
T.Location,
T.Price,
CASE WHEN (R) = 1 then RN ELSE NULL END Summary
from (
select
ID,
Location,
Price ,
SUM(Price)OVER(PARTITION BY Location)RN,
ROW_number()OVER(PARTITION BY Location ORDER BY ID )R
from Table
)T
order by T.ID

sql statement to select previous rows to a search param

Im after an sql statement (if it exists) or how to set up a method using several sql statements to achieve the following.
I have a listbox and a search text box.
in the search box, user would enter a surname e.g. smith.
i then want to query the database for the search with something like this :
select * FROM customer where surname LIKE searchparam
This would give me all the results for customers with surname containing : SMITH . Simple, right?
What i need to do is limit the results returned. This statement could give me 1000's of rows if the search param was just S.
What i want is the result, limited to the first 20 matches AND the 10 rows prior to the 1st match.
For example, SMI search:
Sives
Skimmings
Skinner
Skipper
Slater
Sloan
Slow
Small
Smallwood
Smetain
Smith ----------- This is the first match of my query. But i want the previous 10 and following 20.
Smith
Smith
Smith
Smith
Smoday
Smyth
Snedden
Snell
Snow
Sohn
Solis
Solomon
Solway
Sommer
Sommers
Soper
Sorace
Spears
Spedding
Is there anyway to do this?
As few sql statements as possible.
Reason? I am creating an app for users with slow internet connections.
I am using POSTGRESQL v9
Thanks
Andrew
WITH ranked AS (
SELECT *, ROW_NUMBER() over (ORDER BY surname) AS rowNumber FROM customer
)
SELECT ranked.*
FROM ranked, (SELECT MIN(rowNumber) target FROM ranked WHERE surname LIKE searchparam) found
WHERE ranked.rowNumber BETWEEN found.target - 10 AND found.target + 20
ORDER BY ranked.rowNumber
SQL Fiddle here. Note that the fiddle uses the example data, and I modified the range to 3 entries before and 6 entries past.
I'm assuming that you're looking for a general algorithm ...
It sounds like you're looking for a combination of finding the matches "greater than or equal to smith", and "less than smith".
For the former you'd order by surname and limit the result to 20, and for the latter you'd order by surname descending and limit to 10.
The two result sets can then be added together as arrays and reordered.
I think you need to use ROW_NUMBER() (see this link).
WITH cust1 AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY surname) as numRow FROM customer
)
SELECT c1.surname, c1.numRow, x.flag
FROM cust1 c1, (SELECT *,
case when numRow = (SELECT MIN(numRow) FROM cust1 WHERE surname='Smith') then 1 else 0 end as flag
FROM cust1) x
WHERE x.flag = 1 and c1.numRow BETWEEN x.numRow - 1 AND x.numRow + 1
ORDER BY c1.numRow
SQLFiddle here.
This works, but the flag finally isn't necessary and it would be a query like PinnyM posts.
A variation on #PinnyM's solution:
WITH ranked AS (
SELECT
*,
ROW_NUMBER() over (ORDER BY surname) AS rowNumber
FROM customer
),
minrank AS (
SELECT
*,
MIN(CASE WHEN surname LIKE searchparam THEN rowNumber END) OVER () AS target
FROM ranked
)
SELECT
surname
FROM minrank
WHERE rowNumber BETWEEN target - 10 AND target + 20
;
Instead of two separate calls to the ranked CTE, one to get the first match's row number and the other to read the results from, another CTE is introduced to serve both purposes. Can't speak for PostgreSQL but in SQL Server this might result in a better execution plan for the query, although in either case the real efficiency would still need to be verified by proper testing.

Any option except cursor in this kind of group by?

I have a sample data as:
Johnson; Michael, Surendir;Mishra, Mohan; Ram
Johnson; Michael R.
Mohan; Anaha
Jordan; Michael
Maru; Tushar
The output of the query should be:
Johnson; Michael 2
Mohan; Anaha 1
Michael; Jordon 1
Maru; Tushar 1
Surendir;Mishra 1
Mohan; Ram 1
As you can see it is print the count of each name separated by , but with a twist. We cannot simply do a groupby on full name because sometimes the name may contain middle name 1st initial and sometimes it may not. Eg. Johnson; Michael and Johnson; Michael R. are counted as single name and hence their count is 2. Further either Johnson; Michael should appear or Johnson; Michael R. should appear in resultset with count of 2 (not both because that would be repeated record)
The table contains names separated by , and it is not possible to denormalize it as it is LIVE and given to us by someone else.
Is there anyway to write a query for this without using cursor? I have around 3 million records in my DB and I have to support pagination etc also. What do you think would be the best way to achieve this?
This is why your data should be normalised.
;with cte as
(
select 1 as Item, 1 as Start, CHARINDEX(',',People+',' , 1) as Split,
People+',' as People
from YourHorribleTable
union all
select cte.Item+1, cte.Split+1, nullif(CHARINDEX(',',people, cte.Split+1),0), People as Split
from cte
where cte.Split<>0
)
select Person, COUNT(*)
from
(
select case when nullif(charindex (' ', person, 2+nullif(CHARINDEX(';', person),0)),0) is null then person
else substring(person,1,charindex (' ', person, 2+nullif(CHARINDEX(';', person),0)))
end as Person
from
(
select LTRIM(RTRIM( SUBSTRING(people, start,isnull(split,len(People)+1)-start))) as person
from cte
) v
where person<>''
) v
group by Person
order by COUNT(*) desc