Assistance with SQL Query (Windowing Functions) - sql

Houston Apartment Order 1
Houston Apartment Order 5
Houston TownHouse Order 3
Houston TownHouse Order 4
Austin Condo
Dallas MultiFamily Order 2
All,
I have a result set like the one above.
Using the familiar Customer -> Orders schema as an example:
The first two columns (e.g. Houston, Apartment) come from the category1 and category2 fields on the Customer table.
The third column comes from the Orders table and represents that table's primary key. The values in this column were deliberately listed out of order (1...5...3) to show that I cannot guarantee the order of the values.
What I want is to have a column that adds a Rank or Row_number (or calculation?) that numbers each combination of Category 1 and 2:
1 Houston Apartment Order 1
1 Houston Apartment Order 5
2 Houston TownHouse Order 3
2 Houston TownHouse Order 4
3 Austin Condo
4 Dallas MultiFamily Order 2
So, Houston-Apartment is 1, Houston-TownHouse is 2, etc...
I would like to avoid any sub/nested queries if possible.
Please note:
The values in the example are just sample data. The real data is not based on a Customer/Orders schema, so I respectfully and humbly ask that you please not chastise me for having cities and apartment types as categories, etc. (I would put these in separate domain tables in that case) or suggest a change of schema.
Can anyone help please?!
Steve

Based on the results that you show, I think you want:
select dense_rank() over (order by Category1, Category2) as rankorder, *
from Customers c join
Orders o
on o.CustomerID = c.CustomerID
You seem to be adding an index based only on the first two categories, and never starting the count over again (a PARTITION BY is what makes the count start over). You want ties with no gaps (1, 1, 2, ...), so for this case you want dense_rank(). If you wanted ties with gaps (1, 1, 3, ...), you would use rank(). If you just wanted a plain sequence (1, 2, 3, ...), you would use row_number().
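For comparison, here is a sketch (using the same hypothetical Customers/Orders names, and assuming OrderID is the name of the Orders primary key) that shows all three ranking functions side by side so the tie behaviour is visible:
select c.Category1,
       c.Category2,
       o.OrderID,  -- assumed name for the Orders primary key
       dense_rank() over (order by c.Category1, c.Category2) as drank, -- ties, no gaps: 1, 1, 2, ...
       rank()       over (order by c.Category1, c.Category2) as rnk,   -- ties, with gaps: 1, 1, 3, ...
       row_number() over (order by c.Category1, c.Category2) as rn     -- no ties: 1, 2, 3, ...
from Customers c
join Orders o
  on o.CustomerID = c.CustomerID;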

If your database supports windowing functions, you could use row_number():
select row_number() over (partition by Category1, Category2 order by CustomerID)
from Customers c
join Orders o
on o.CustomerID = c.CustomerID

Something like this should do:
create table Data
(
city varchar(50),
propertyType varchar(50),
anOrder int
)
insert into Data select 'Houston', 'Apartment', 1
insert into Data select 'Houston', 'Apartment', 5
insert into Data select 'Houston', 'TownHouse', 3
insert into Data select 'Houston', 'TownHouse', 4
insert into Data select 'Austin', 'Condo', 1
insert into Data select 'Dallas', 'MultiFamily', 2
select city, propertyType,
       RANK() OVER (PARTITION BY Data.city ORDER BY Data.city, Data.propertyType DESC) AS Rank
from Data
group by city, propertyType
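If the goal is the numbering from the question (the same number for every row of a given city/propertyType combination, across the whole set), a sketch against the same Data table would be:
-- dense_rank numbers each (city, propertyType) combination and repeats it per order.
-- Note the numbering follows the ORDER BY inside OVER (alphabetical here),
-- so the combinations may come out in a different order than in the question.
select dense_rank() over (order by city, propertyType) as combo_no,
       city,
       propertyType,
       anOrder
from Data
order by combo_no, anOrder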

Related

How to consecutively count everything greater than or equal to itself in SQL?

Let's say I have a table that contains an equipment ID for each piece of equipment, along with its equipment type and equipment age. How can I do a count distinct of equipment IDs that have at least that equipment age?
For example, let's say this is all the data we have:
equipment_type  equipment_id  equipment_age
---------------------------------------------
Screwdriver     A123          1
Screwdriver     A234          2
Screwdriver     A345          2
Screwdriver     A456          2
Screwdriver     A567          3
I would like the output to be:
equipment_type  equipment_age  count_of_equipment_at_least_this_age
--------------------------------------------------------------------
Screwdriver     1              5
Screwdriver     2              4
Screwdriver     3              1
The reason is that there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers at least 2 days old, and only 1 screwdriver at least 3 days old.
So far I have only been able to count the equipment that falls within each equipment_age (like the query shown below), but not "at least that equipment_age".
SELECT
    equipment_type,
    equipment_age,
    COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2
Consider the join-less solution below:
select distinct
    equipment_type,
    equipment_age,
    count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
    partition by equipment_type
    order by equipment_age
    range between current row and unbounded following
)
If applied to the sample data in your question, the output matches the desired result shown above.
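If your engine does not support a named WINDOW clause, the same thing can be written with an inline OVER clause (a sketch, assuming the engine supports RANGE window frames):
select distinct
    equipment_type,
    equipment_age,
    count(*) over (
        partition by equipment_type
        order by equipment_age
        -- counts the current row, its peers, and every row with a greater equipment_age
        range between current row and unbounded following
    ) as count_of_equipment_at_least_this_age
from equipment_table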
Use a self join approach:
SELECT
    e1.equipment_type,
    e1.equipment_age,
    COUNT(*) AS count_of_equipments
FROM equipment_table e1
INNER JOIN equipment_table e2
    ON  e2.equipment_type = e1.equipment_type
    AND e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;
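If the same equipment_id could ever appear in more than one row per type (an assumption; it does not in the sample data), counting distinct ids keeps the semantics of your original COUNT(DISTINCT):
SELECT
    e1.equipment_type,
    e1.equipment_age,
    COUNT(DISTINCT e2.equipment_id) AS count_of_equipment_at_least_this_age
FROM equipment_table e1
INNER JOIN equipment_table e2
    ON  e2.equipment_type = e1.equipment_type
    AND e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;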
GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or windowing functions to get those. One way:
SELECT
    equipment_type,
    equipment_age,
    (SELECT COUNT(*)
     FROM equipment_table cnt
     WHERE cnt.equipment_type = a.equipment_type
       AND cnt.equipment_age >= a.equipment_age
    ) AS count_of_equipments
FROM equipment_table a
GROUP BY 1, 2
I am not sure if your environment supports this syntax, though. If not, let us know we will find another way.

Is there a way to display the total count of rows in a separate row?

I have a table that looks like this:
City_Id  City
------------------------
41       Athena
39       Beijing
35       London
30       Rio de Janeiro
28       Salt Lake City
18       Sochi
7        Sydney
4        Torino
Is there a way to display another row at the bottom that shows the total count of rows?
City_Id  City
------------------------
41       Athena
39       Beijing
35       London
30       Rio de Janeiro
28       Salt Lake City
18       Sochi
7        Sydney
4        Torino
Total    8
You can actually use GROUPING SETS for this. This avoids having to scan the table twice.
However, you still have the data-type mismatch problem (the string 'Total' does not fit in the numeric City_Id column). You could solve it by casting, but it's probably easier to just swap the columns around:
SELECT
CASE WHEN GROUPING(City) = 0 THEN City ELSE 'Total' END AS City,
CASE WHEN GROUPING(City_Id) = 0 THEN City_Id ELSE COUNT(*) END AS City_Id
FROM Table1
GROUP BY GROUPING SETS (
(City_Id, City),
()
)
ORDER BY GROUPING(City_Id);
What this does is generate separate result-sets, unioned together. You can differentiate between a grouped row and a non-grouped row using the GROUPING function.
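To make the mechanism visible, a quick sketch that exposes the GROUPING flag as its own column (it is 0 on detail rows and 1 on the grand-total row):
SELECT
    GROUPING(City_Id) AS is_total_row,  -- 0 = detail row, 1 = the () grouping set
    City_Id,
    City,
    COUNT(*) AS row_count               -- 1 per detail row, the overall count on the total row
FROM Table1
GROUP BY GROUPING SETS (
    (City_Id, City),
    ()
)
ORDER BY GROUPING(City_Id), City_Id;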
I would agree with most of the other comments that acquiring a result set count would be more appropriate from the application code (which usually has a mechanism specifically for this purpose).
However...
If you must have a TSQL solution for your question, an option is to return the count in a separate column. This is different than returning it in a separate row, of course. There are pros & cons with each approach.
DROP TABLE IF EXISTS #Cities;
CREATE TABLE #Cities (
City_Id INT,
City VARCHAR(128)
);
INSERT INTO #Cities
VALUES
(41, 'Athena'),
(39, 'Beijing'),
(35, 'London'),
(30, 'Rio de Janeiro'),
(28, 'Salt Lake City'),
(18, 'Sochi'),
(7 , 'Sydney'),
(4 , 'Torino');
SELECT *, COUNT(*) OVER(ORDER BY (SELECT NULL)) AS Total
FROM #Cities;
--Count is properly reflected based on WHERE clause.
SELECT *, COUNT(*) OVER(ORDER BY (SELECT NULL)) AS Total
FROM #Cities
WHERE City LIKE 'S%';
--Be careful with this one--the COUNT(*) may not be what you expected.
SELECT TOP(4) *, COUNT(*) OVER(ORDER BY (SELECT NULL)) AS Total
FROM #Cities;
NOTE: be aware that this approach may not scale (perform) well for large result sets. Be sure to do some testing!
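If you do not need the ORDER BY (SELECT NULL) trick, an empty OVER () gives the same whole-result-set count (the same TOP caveat applies):
--An empty window spans the entire result set, so Total is the same for every row.
SELECT *, COUNT(*) OVER() AS Total
FROM #Cities;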
As you already know, this should be done in the presentation layer. But if you just want to know whether there is a way, then I would suggest using UNION ALL:
select cast(City_Id as varchar(10)) City_Id, City from Table1
union all
select 'Total' as City_Id, cast(count(*) as varchar(14)) from Table1

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Based on the way the request was made we need to use this hierarchy, and although we are seeing different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE conditions that are not germane to this question.
Results:
   Customer_Name_Count  Company_Name        Total_Sales
   -----------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  6                    Jimmy's Restaurant  1,500
4  9                    Impala Hotel        2,000
5  12                   Sports Drink        2,500
In the above set, rows 2 and 3 have the same count and the same Total_Sales, and similar company names. Is there a way to create a CASE statement that takes these three factors into consideration and drops one or the other of the Jimmy's entries? It also has to be generic, because there are other instances where this happens, and I only want it to apply when the count and the sales number match and the company names are similar.
Desired result:
   Customer_Name_Count  Company_Name        Total_Sales
   -----------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  9                    Impala Hotel        2,000
4  12                   Sports Drink        2,500
It looks like the other answers are accurate based on the assumption that the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like the query below. I suggest you get the functional users involved and do some data clean-up, otherwise you'll be maintaining this every time the issue arises:
SELECT
    COUNT(DISTINCT CASE
                       WHEN A12.Company_Name = 'Name2' THEN 'Name1'
                       ELSE A12.Company_Name
                   END) AS Customer_Name_Count,
    CASE
        WHEN A12.Company_Name = 'Name2' THEN 'Name1'
        ELSE A12.Company_Name
    END AS Company_Name,
    SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
             WHEN A12.Company_Name = 'Name2' THEN 'Name1'
             ELSE A12.Company_Name
         END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returns one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
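dbo.GetCommonName is not a built-in; it would be a user-defined function you maintain yourself. A minimal sketch of what such a function might look like (assuming SQL Server, and with a purely illustrative mapping):
-- Hypothetical scalar function: map known alias names onto one canonical company name.
CREATE FUNCTION dbo.GetCommonName (@Company_Name VARCHAR(200))
RETURNS VARCHAR(200)
AS
BEGIN
    RETURN CASE @Company_Name
               WHEN 'Jimmy''s Restaurant' THEN 'Jimmy''s Bar'  -- illustrative mapping only
               ELSE @Company_Name
           END;
END;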
You can try the ROW_NUMBER window function to number the rows within each Customer_Name_Count and Total_Sales combination, then keep rn = 1:
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Customer_Name_Count, Total_Sales ORDER BY Company_Name) rn
    FROM (
        SELECT
            COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
            Company_Name,
            SUM(Total_Sales) AS Total_Sales
        FROM
            some_table AS A12
        GROUP BY
            Company_Name
    ) t1
) t2
WHERE rn = 1

Collapse Multiple Records Into a Single Record With Multiple Columns

In a program I'm maintaining, we were given a massive (~500 line) SQL statement by the customer. It is used for generating flat files with fixed-length records for transmitting data to another big business. Since it's a massive flat file, it's not relational and the standard normal forms of data are collapsed. So, if a record can have multiple codes associated with it, in this case up to 19, they all have to be written into a single line, but in separate fields, in the flat file.
Note: this example is simplified.
The data might look like this, with three tables:
RECORDS
record_id firstname lastname
--------------------------------
123 Bob Schmidt
324 George Washington
325 Ronald Reagan
290 George Clooney
CODE_TABLE
code_id code_cd code_txt
--------------------------------
5 3 President
2 4 Actor
3 7 Plumber
CODES_FOR_RECORDS
record_id code_cd
-------------------
123 7
325 3
290 4
324 3
325 4
123 4
This needs to produce records like:
firstname  lastname    code1      code2      code3
Bob        Schmidt     Actor      Plumber    NULL
George     Washington  President  NULL       NULL
Ronald     Reagan      Actor      President  NULL
George     Clooney     Actor      NULL       NULL
The portion of the current query we were given looks like this, but with 19 code columns instead of the 5:
select
    x.record_id,
    max(case when x.rankk = 1 then x.ctag end) as CodeColumn1,
    max(case when x.rankk = 2 then x.ctag end) as CodeColumn2,
    max(case when x.rankk = 3 then x.ctag end) as CodeColumn3,
    max(case when x.rankk = 4 then x.ctag end) as CodeColumn4,
    max(case when x.rankk = 5 then x.ctag end) as CodeColumn5
from
    (
        select
            r.record_id,
            ct.code_txt as ctag,
            dense_rank() over (partition by r.record_id order by cfr.code_id) as rankk
        from
            records as r,
            codes_for_records as cfr,
            code_table as ct
        where
            r.record_id = cfr.record_id
            and ct.code_cd = cfr.code_cd
            and cfr.code_cd is not null
            and ct.code_txt not like '%V%'
    ) as x
where
    x.record_id is not null
group by
    x.record_id
I trimmed things down for simplicity's sake; the actual statement includes an inner query, a join, and more WHERE conditions, but that should get the idea across. My brain is telling me there has to be a better way, but I'm not an SQL expert. We are using DB2 v8 if that helps. And the codes have to be in separate columns, so no coalescing things into a single string. Is there a cleaner solution than this?
Update:
I ended up just refactoring the original query. It still uses the ugly MAX() business, but overall the query is much more readable after reworking other parts.
It sounds like what you are looking for is pivoting.
WITH joined_table(firstname, lastname, code_txt, rankk) AS
(
SELECT
r.firstname,
r.lastname,
ct.code_txt,
dense_rank() over (partition by r.record_id order by cfr.code_id) as rankk
FROM
records r
INNER JOIN
codes_for_records cfr
ON r.record_id = cfr.record_id
INNER JOIN
code_table ct
ON ct.code_cd = cfr.code_cd
),
decoded_table(firstname, lastname,
CodeColumn1, CodeColumn2, CodeColumn3, CodeColumn4, CodeColumn5) AS
(
SELECT
firstname,
lastname,
DECODE(rankk, 1, code_txt),
DECODE(rankk, 2, code_txt),
DECODE(rankk, 3, code_txt),
DECODE(rankk, 4, code_txt),
DECODE(rankk, 5, code_txt)
FROM
joined_table jt
)
SELECT
firstname,
lastname,
MAX(CodeColumn1),
MAX(CodeColumn2),
MAX(CodeColumn3),
MAX(CodeColumn4),
MAX(CodeColumn5)
FROM
decoded_table dt
GROUP BY
firstname,
lastname;
Note that I've never actually done this myself before. I'm relying on the linked document as a reference.
You might need to include the record_id to account for duplicate names.
Edit: Added the GROUP BY.
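For example, the final grouping could look like this (a sketch only; record_id would also need to be added to the column lists of both CTEs above):
SELECT
    record_id,
    firstname,
    lastname,
    MAX(CodeColumn1) AS CodeColumn1,
    MAX(CodeColumn2) AS CodeColumn2,
    MAX(CodeColumn3) AS CodeColumn3,
    MAX(CodeColumn4) AS CodeColumn4,
    MAX(CodeColumn5) AS CodeColumn5
FROM
    decoded_table dt
GROUP BY
    record_id,
    firstname,
    lastname;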
One possible solution is to use a recursive query:
with recursive_view (record_id, rankk, final) as
(
select
record_id,
rankk,
cast (ctag as varchar (100))
from inner_query t1
union all
select
t1.record_id,
t1.rankk,
/* all formatting here */
cast (t2.final || ',' || t1.ctag as varchar (100))
from
inner_query t1,
recursive_view t2
where
t2.rankk < t1.rankk
and t1.record_id = t2.record_id
and locate(t1.ctag, t2.final) = 0
)
select record_id, final from recursive_view;
I can't guarantee that it works, but I hope it will be helpful. Another way is to use a custom aggregate function.

How to Select and Order By columns not in Group By SQL statement - Oracle

I have the following statement:
SELECT
IMPORTID,Region,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
GROUP BY
IMPORTID, Region,RefObligor
Order BY
IMPORTID, Region,RefObligor
There are some extra columns in the Positions table that I want in the output as display data, but that I don't want in the GROUP BY clause.
These are Site and Desk.
Final output would have the following columns:
IMPORTID,Region,Site,Desk,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
Ideally I'd want the data sorted like:
Order BY
IMPORTID,Region,Site,Desk,RefObligor
How to achieve this?
It does not make sense to include columns that are not part of the GROUP BY clause. Consider: if you have MIN(X) and MAX(Y) in the SELECT clause, which row should the other (non-grouped) columns come from?
If your Oracle version is recent enough, you can use SUM() OVER () to show the grouped SUM against every data row.
SELECT
IMPORTID,Site,Desk,Region,RefObligor,
SUM(NOTIONAL) OVER(PARTITION BY IMPORTID, Region,RefObligor) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
Order BY
IMPORTID,Region,Site,Desk,RefObligor
Alternatively, you need to make an aggregate out of the Site and Desk columns:
SELECT
IMPORTID,Region,Min(Site) Site, Min(Desk) Desk,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
GROUP BY
IMPORTID, Region,RefObligor
Order BY
IMPORTID, Region,Min(Site),Min(Desk),RefObligor
I believe this is
select
IMPORTID,
Region,
Site,
Desk,
RefObligor,
Sum(Sum(Notional)) over (partition by IMPORTID, Region, RefObligor)
from
Positions
group by
IMPORTID, Region, Site, Desk, RefObligor
order by
IMPORTID, Region, RefObligor, Site, Desk;
... but it's hard to tell without further information and/or test data.
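To unpack the Sum(Sum(Notional)) part: the inner SUM is the ordinary grouped aggregate per (IMPORTID, Region, Site, Desk, RefObligor) row, and the outer SUM ... OVER then adds those group totals across the coarser partition. A sketch of the equivalent two-step form, with an illustrative desk_total alias:
select
    IMPORTID, Region, Site, Desk, RefObligor,
    sum(desk_total) over (partition by IMPORTID, Region, RefObligor) as SUM_NOTIONAL
from (
    select IMPORTID, Region, Site, Desk, RefObligor,
           sum(Notional) as desk_total
    from Positions
    group by IMPORTID, Region, Site, Desk, RefObligor
) t
order by IMPORTID, Region, RefObligor, Site, Desk;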
A great blog post that covers this dilemma in detail is here:
http://bernardoamc.github.io/sql/2015/05/04/group-by-non-aggregate-columns/
Here are some snippets of it:
Given:
CREATE TABLE games (
game_id serial PRIMARY KEY,
name VARCHAR,
price BIGINT,
released_at DATE,
publisher TEXT
);
INSERT INTO games (name, price, released_at, publisher) VALUES
('Metal Slug Defense', 30, '2015-05-01', 'SNK Playmore'),
('Project Druid', 20, '2015-05-01', 'shortcircuit'),
('Chroma Squad', 40, '2015-04-30', 'Behold Studios'),
('Soul Locus', 30, '2015-04-30', 'Fat Loot Games'),
('Subterrain', 40, '2015-04-30', 'Pixellore');
SELECT * FROM games;
game_id | name | price | released_at | publisher
---------+--------------------+-------+-------------+----------------
1 | Metal Slug Defense | 30 | 2015-05-01 | SNK Playmore
2 | Project Druid | 20 | 2015-05-01 | shortcircuit
3 | Chroma Squad | 40 | 2015-04-30 | Behold Studios
4 | Soul Locus | 30 | 2015-04-30 | Fat Loot Games
5 | Subterrain | 40 | 2015-04-30 | Pixellore
(5 rows)
Trying to get something like this:
SELECT released_at, name, publisher, MAX(price) as most_expensive
FROM games
GROUP BY released_at;
But name and publisher are not added due to being ambiguous when aggregating...
Let’s make this clear:
Selecting the MAX(price) does not select the entire row.
The database can’t know and when it can’t give the right answer every
time for a given query it should give us an error, and that’s what it
does!
Ok… Ok… It’s not so simple, what can we do?
Use an inner join to get the additional columns:
SELECT g1.name, g1.publisher, g1.price, g1.released_at
FROM games AS g1
INNER JOIN (
SELECT released_at, MAX(price) as price
FROM games
GROUP BY released_at
) AS g2
ON g2.released_at = g1.released_at AND g2.price = g1.price;
Or use a left outer join of the table against itself, and then keep only the rows for which no row with a higher price was found (g2.price IS NULL):
SELECT g1.name, g1.publisher, g1.price, g2.price, g1.released_at
FROM games AS g1
LEFT OUTER JOIN games AS g2
ON g1.released_at = g2.released_at AND g1.price < g2.price
WHERE g2.price IS NULL;
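Since this whole page is about window functions, a third option (a sketch; it works in PostgreSQL and any other database with window functions) is to rank the rows per release date and keep the top-ranked ones:
SELECT name, publisher, price, released_at
FROM (
  SELECT g.*,
         RANK() OVER (PARTITION BY released_at ORDER BY price DESC) AS price_rank
  FROM games g
) ranked
WHERE price_rank = 1;
RANK keeps ties (two games sharing the top price on the same date both survive), which matches the behaviour of the two join versions above.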
Hope that helps.