Oracle SQL - Counting distinct column combinations - sql

Running Oracle 12.1. I have a Line Items table. Its structure is fixed, and I cannot change it. I need to build a dashboard style page of information of the Line items table for a person to look at their sales territory. This person might be a GVP, who owns a large territory, or a Manager, or an individual rep. The Line Items table is pretty de-normalized, as this copy is part of a DW. This ‘copy’ of the table is only updated every 2 weeks, and it looks like this.
Line_Item_ID // PK
Account_ID //
Company_Name // The legal name of the Headquarters
LOB_Name // Line of business, aka Division within the Company_Name
Account_Type // One of 2 values, ‘NAMED’ or ‘GENERAL’
ADG_STATUS // 3 possible values, ‘A’, ‘D’ or ‘G’
Industry // One of 15 values, for this example assume it is ONLY ‘MFG’, ‘GOV’, ‘HEALTHCARE’
// Now have the sales hierarchy of the rep who sold this
GVP // Group Vice President
SVP // Sales Vice President
RVP // Regional Vice President
RM // Regional Manager
REP // Sales Rep
// Now have information about the product sold
ProductName
ProductPrice
VariousOtherFields….
I need to make an aggregated table which will be used for quick access of the dashboard. It will have counts of various combinations, and there will be one row per PERSON, not account. A person is every UNIQUE person listed in any of the GVP, SVP, RVP, RM or REP fields. Here is what the end result table will look like. Other than PERSON, every column is based on a DISTINCT count, and it is an integer value.
PERSON
TOTAL_COMPANIES // For this person, count of DISTINCT COMPANY_NAME
TOTAL_LOBS // For this person, count of DISTINCT LOBS
TOTAL_COMPANIES_NAMED // count of DISTINCT COMPANY_NAME with ACCOUNT_TYPE=’NAMED’
TOTAL_COMPANIES_GENERAL // count of DISTINCT COMPANY_NAME with ACCOUNT_TYPE=’GENERAL’
TOTAL_LOBS_NAMED // count of DISTINCT LOB_NAME with ACCOUNT_TYPE=’NAMED’
TOTAL_LOBS_GENERAL // count of DISTINCT LOB_NAME with ACCOUNT_TYPE=’GENERAL’
TOTAL_COMPANIES_STATUS_A // count of DISTINCT COMPANY_NAME with ADG_STATUS=’A’
TOTAL_COMPANIES_STATUS_D // count of DISTINCT COMPANY_NAME with ADG_STATUS=’D’
TOTAL_COMPANIES_STATUS_G // count of DISTINCT COMPANY_NAME with ADG_STATUS=’G’
TOTAL_LOB_STATUS_A // count of DISTINCT LOB_NAME with ADG_STATUS=’A’
TOTAL_LOB_STATUS_D // count of DISTINCT LOB_NAME with ADG_STATUS=’D’
TOTAL_LOB_STATUS_G // count of DISTINCT LOB_NAME with ADG_STATUS=’G’
//Now Various Industry Permutations. I have 15 different industries, but only showing 2. This will only be at the COMPANY_NAME level, not the LOB_NAME level
MFG_COMPANIES_STATUS_A // count of DISTINCT COMPANY_NAME with ADG_STATUS=’A’ and Industry = ‘MFG’
MFG_COMPANIES_STATUS_D // count of DISTINCT COMPANY_NAME with ADG_STATUS=’D’ and Industry = ‘MFG’
MFG_COMPANIES_STATUS_G // count of DISTINCT COMPANY_NAME with ADG_STATUS=’G’ and Industry = ‘MFG’
GOV_COMPANIES_STATUS_A // count of DISTINCT COMPANY_NAME with ADG_STATUS=’A’ and Industry = ‘GOV’
GOV_COMPANIES_STATUS_D // count of DISTINCT COMPANY_NAME with ADG_STATUS=’D’ and Industry = ‘GOV’
GOV_COMPANIES_STATUS_G // count of DISTINCT COMPANY_NAME with ADG_STATUS=’G’ and Industry = ‘GOV’
There are approx. 400 people, 35000 unique accounts, and 200,000 entries in the line items table.
So what is my strategy? I have thought about making another table of unique PERSON values, and using it as a driving table. Let’s call this table PERSON_LIST.
Pseudo-code…
For each entry in PERSON_LIST
For all LINE_ITEMS where person_list in ANY(GVP, SVP, RVP, RM, REP) do
Calculations…
This would be an incredibly long running process…
How can I do this more effectively (set based as opposed to row by row)? I believe I would have to use the PIVOT operator for the INDUSTRY list, but can I use PIVOT with additional criteria? Aka count of distinct COMPANY with a specific industry and a specific ADG_STATUS?
Any ideas or SQL code most appreciated.

You could unpivot the original data to get the data from the original GVP etc. columns into one 'person' column:
select * from line_items
unpivot (person for role in (gvp as 'GVP', svp as 'SVP', rvp as 'RVP',
rm as 'RM', rep as 'REP'))
And then use that as a CTE or inline view, with pretty much what you showed; conditional aggregation using case expressions, something like:
select person,
count(distinct company_name) as total_companies,
count(distinct lob_name) as total_lobs,
count(distinct case when account_type='NAMED' then company_name end)
as total_companies_named,
count(distinct case when account_type='GENERAL' then company_name end)
as total_companies_general,
count(distinct case when account_type='NAMED' then lob_name end)
as total_lobs_named,
count(distinct case when account_type='GENERAL' then lob_name end)
as total_lobs_general,
count(distinct case when adg_status='A' then company_name end)
as total_companies_status_a,
count(distinct case when adg_status='D' then company_name end)
as total_companies_status_d,
count(distinct case when adg_status='G' then company_name end)
as total_companies_status_g,
count(distinct case when adg_status='A' then lob_name end)
as total_lob_status_a,
count(distinct case when adg_status='D' then lob_name end)
as total_lob_status_d,
count(distinct case when adg_status='G' then lob_name end)
as total_lob_status_g,
count(distinct case when adg_status='A' and industry = 'MFG' then company_name end)
as mfg_companies_status_a,
count(distinct case when adg_status='D' and industry = 'MFG' then company_name end)
as mfg_companies_status_d,
count(distinct case when adg_status='G' and industry = 'MFG' then company_name end)
as mfg_companies_status_g,
count(distinct case when adg_status='A' and industry = 'GOV' then company_name end)
as gov_companies_status_a,
count(distinct case when adg_status='D' and industry = 'GOV' then company_name end)
as gov_companies_status_d,
count(distinct case when adg_status='G' and industry = 'GOV' then company_name end)
as gov_companies_status_g
from (
select * from line_items
unpivot (person for role in (gvp as 'GVP', svp as 'SVP', rvp as 'RVP',
rm as 'RM', rep as 'REP'))
)
group by person;
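The shape of this query can be tried on a small dataset. The sketch below uses Python with the standard-library sqlite3 module purely so that it runs anywhere. SQLite has no UNPIVOT, so a UNION ALL over the five role columns stands in for it; that substitution is an assumption of this sketch, not Oracle syntax. The sample rows are invented and only three of the target columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE line_items (
        line_item_id INTEGER PRIMARY KEY,
        company_name TEXT, lob_name TEXT,
        account_type TEXT, adg_status TEXT, industry TEXT,
        gvp TEXT, svp TEXT, rvp TEXT, rm TEXT, rep TEXT
    )
""")
# Invented sample rows: one GVP (Ann) above two reps (Raj and Sue).
conn.executemany(
    "INSERT INTO line_items VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    [
        (1, "Acme", "Tools", "NAMED", "A", "MFG", "Ann", "Bo", "Cy", "Di", "Raj"),
        (2, "Acme", "Paint", "NAMED", "A", "MFG", "Ann", "Bo", "Cy", "Di", "Raj"),
        (3, "Bolt", "Nuts", "GENERAL", "D", "MFG", "Ann", "Bo", "Cy", "Di", "Sue"),
        (4, "City", "Roads", "NAMED", "A", "GOV", "Ann", "Bo", "Cy", "Di", "Sue"),
    ],
)

# Stand-in for UNPIVOT: one SELECT per role column, stacked with UNION ALL.
roles = ["gvp", "svp", "rvp", "rm", "rep"]
unpivoted = " UNION ALL ".join(
    f"SELECT {r} AS person, company_name, lob_name, account_type, "
    f"adg_status, industry FROM line_items WHERE {r} IS NOT NULL"
    for r in roles
)

query = f"""
    SELECT person,
           COUNT(DISTINCT company_name) AS total_companies,
           COUNT(DISTINCT CASE WHEN account_type = 'NAMED'
                               THEN company_name END) AS total_companies_named,
           COUNT(DISTINCT CASE WHEN adg_status = 'A' AND industry = 'MFG'
                               THEN company_name END) AS mfg_companies_status_a
    FROM ({unpivoted})
    GROUP BY person
    ORDER BY person
"""
# Map each person to (total_companies, total_companies_named, mfg_companies_status_a).
result = {row[0]: row[1:] for row in conn.execute(query)}
print(result)
```

Because every measure is a DISTINCT count, a person who appears in more than one role column of the same row is still counted once per company.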

Related

select sum where condition is true and count > 3

I have two tables.
FootballPlayers with columns Id_footballplayer, Last_Name, Fisrt_Name, Age
Transfers with columns Id_transfer, Name_club, price, date, acceptance (yes or no), code_footballplayer
How can I write a SQL query to select the last names of the players and the sum of the successful transfers carried out by them, the number of which exceeds 3?
I already wrote a query that displays the total amount of all successful transfers for each player
SELECT FootballPLayers.Last_Name,
SUM(CASE acceptance WHEN 'yes' THEN price ELSE 0 END) AS amount_price
FROM FootballPlayers
INNER JOIN Transfers ON FootballPlayers.ID_footballplayer = Transfers.code_footballplayer
GROUP BY FootballPlayers.Last_Name;
But I don’t know how to add a condition if the number of successful transfers is more than 3
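Since this is a grouped query, after the GROUP BY you probably want:
HAVING COUNT(1) > 3
The HAVING clause filters much like WHERE, but it is applied to each group after aggregation rather than to individual rows before it. Note that COUNT(1) counts every joined transfer, accepted or not; to count only the successful ones, use HAVING SUM(CASE acceptance WHEN 'yes' THEN 1 ELSE 0 END) > 3 instead.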
An alternative would be the sub-query:
SELECT * FROM
(
SELECT FootballPLayers.Last_Name,
SUM(CASE acceptance WHEN 'yes' THEN price ELSE 0 END) AS amount_price,
COUNT(1) AS [Transfers]
FROM FootballPlayers
INNER JOIN Transfers ON FootballPlayers.ID_footballplayer = Transfers.code_footballplayer
GROUP BY FootballPlayers.Last_Name
) x
WHERE x.Transfers > 3
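To see how the choice of HAVING condition changes the result, here is a small sketch in Python with the standard-library sqlite3 module. The table and column names follow the question; the sample players and prices are invented. One player has four accepted transfers, the other has four transfers of which only two were accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FootballPlayers (Id_footballplayer INTEGER PRIMARY KEY, Last_Name TEXT);
    CREATE TABLE Transfers (
        Id_transfer INTEGER PRIMARY KEY,
        price INTEGER,
        acceptance TEXT,
        code_footballplayer INTEGER
    );
    INSERT INTO FootballPlayers VALUES (1, 'Silva'), (2, 'Kane');
    INSERT INTO Transfers (price, acceptance, code_footballplayer) VALUES
        (10, 'yes', 1), (20, 'yes', 1), (30, 'yes', 1), (40, 'yes', 1), (99, 'no', 1);
    INSERT INTO Transfers (price, acceptance, code_footballplayer) VALUES
        (5, 'yes', 2), (5, 'yes', 2), (7, 'no', 2), (7, 'no', 2);
""")

base = """
    SELECT p.Last_Name,
           SUM(CASE t.acceptance WHEN 'yes' THEN t.price ELSE 0 END) AS amount_price
    FROM FootballPlayers p
    INNER JOIN Transfers t ON p.Id_footballplayer = t.code_footballplayer
    GROUP BY p.Last_Name
    HAVING {condition}
    ORDER BY p.Last_Name
"""

# Counts every transfer, accepted or not.
all_transfers = conn.execute(base.format(condition="COUNT(1) > 3")).fetchall()

# Counts only accepted transfers.
successful_only = conn.execute(
    base.format(condition="SUM(CASE t.acceptance WHEN 'yes' THEN 1 ELSE 0 END) > 3")
).fetchall()

print(all_transfers)
print(successful_only)
```

With this data the first condition keeps both players, while the second keeps only the player with more than three accepted transfers.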

How to aggregate data in one column by values in another column using SQL

I have a table in PostgreSQL that contains demographic data for each province of my country.
Columns are: Province_name, professions, Number_of_people.
As you can see, Province_names are repeated for each profession.
How then can I get the province names not repeated and instead get the professions in separate columns?
It sounds like you want to pivot your table (Really: It is better to show data and expected output in your question!)
This is the PostgreSQL way (since 9.4) to do that using the FILTER clause
SELECT
province,
SUM(people) FILTER (WHERE profession = 'teacher') AS teacher,
SUM(people) FILTER (WHERE profession = 'banker') AS banker,
SUM(people) FILTER (WHERE profession = 'supervillian') AS supervillian
FROM mytable
GROUP BY province
If you want a more portable approach that also works in other databases, you can use a CASE expression:
SELECT
province,
SUM(CASE WHEN profession = 'teacher' THEN people ELSE 0 END) AS teacher,
SUM(CASE WHEN profession = 'banker' THEN people ELSE 0 END) AS banker,
SUM(CASE WHEN profession = 'supervillian' THEN people ELSE 0 END) AS supervillian
FROM mytable
GROUP BY province
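The two forms differ in one detail: SUM with FILTER yields NULL for a province that has no rows for a profession, while CASE with ELSE 0 yields 0. The sketch below illustrates this in Python with the standard-library sqlite3 module. It uses CASE without an ELSE branch to mimic the FILTER behaviour, so it does not depend on the SQLite version supporting FILTER. The table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (province TEXT, profession TEXT, people INTEGER);
    -- 'South' deliberately has no banker row.
    INSERT INTO mytable VALUES
        ('North', 'teacher', 100),
        ('North', 'banker', 20),
        ('South', 'teacher', 50);
""")

# CASE ... ELSE 0: a missing profession shows up as 0.
with_else = conn.execute("""
    SELECT province,
           SUM(CASE WHEN profession = 'teacher' THEN people ELSE 0 END) AS teacher,
           SUM(CASE WHEN profession = 'banker'  THEN people ELSE 0 END) AS banker
    FROM mytable
    GROUP BY province
    ORDER BY province
""").fetchall()

# CASE without ELSE: non-matching rows contribute NULL, so a missing
# profession shows up as NULL (None in Python), as it would with FILTER.
without_else = conn.execute("""
    SELECT province,
           SUM(CASE WHEN profession = 'teacher' THEN people END) AS teacher,
           SUM(CASE WHEN profession = 'banker'  THEN people END) AS banker
    FROM mytable
    GROUP BY province
    ORDER BY province
""").fetchall()

print(with_else)
print(without_else)
```

Wrapping the FILTER form in COALESCE(..., 0) makes the two outputs agree if zeros are preferred.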
What you want to do is a pivot, which is a little more complicated in PostgreSQL than in other RDBMSs. You can use the crosstab function (provided by the tablefunc extension). Find an introduction here: https://www.vertabelo.com/blog/technical-articles/creating-pivot-tables-in-postgresql-using-the-crosstab-function
For you it would look something like this:
SELECT *
FROM crosstab( 'select Province_name, professions, Number_of_people from table1 order by 1,2')
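-- With the single-argument form, output columns are filled in the sorted order of the category values (order by 1,2)
AS final_result(Province_name TEXT, data_architect NUMERIC,data_engineer NUMERIC,data_scientist NUMERIC,student NUMERIC);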

Trying to combine two different attributes into one column

I am trying to write a query that shows 3 different columns: the activity class name, total revenue, and number of customers. My code below shows those columns (along with 2 different prices, student and regular). What I am trying to do is write a query that calculates revenue as the number of customers of each type multiplied by the price for that customer type (student or regular). I can't seem to write the query so that it distinguishes the 2 customer types and applies the appropriate price in a 'revenue' column. Any help would be much appreciated!
SELECT ACTIVITY_NAME AS CLASS,
STUDENT_PRICE,
REGULAR_PRICE,
Count(CUSTOMER.CUSTOMER_TYPE) AS NUM_CUST,
Count(CUSTOMER.CUSTOMER_TYPE) * ACTIVITY.STUDENT_PRICE AS REVENUE
FROM ACTIVITY
JOIN ACTIVITY_BOOKING
ON ACTIVITY.ID = ACTIVITY_BOOKING.ACTIVITY_ID
JOIN CUSTOMER
ON ACTIVITY_BOOKING.CUSTOMER_ID = CUSTOMER.ID
GROUP BY ACTIVITY_NAME,
STUDENT_PRICE,
REGULAR_PRICE
ORDER BY ACTIVITY.ACTIVITY_NAME
I think you need SUM rather than COUNT, and you can sum a conditional value like this:
SELECT ACTIVITY_NAME AS CLASS,
STUDENT_PRICE,
REGULAR_PRICE,
Count(CUSTOMER.CUSTOMER_TYPE) AS NUM_CUST,
Sum (case when CUSTOMER.CUSTOMER_TYPE = 'Student' then STUDENT_PRICE when CUSTOMER.CUSTOMER_TYPE = 'Regular' then REGULAR_PRICE else 0 end) AS REVENUE
FROM ACTIVITY
JOIN ACTIVITY_BOOKING
ON ACTIVITY.ID = ACTIVITY_BOOKING.ACTIVITY_ID
JOIN CUSTOMER
ON ACTIVITY_BOOKING.CUSTOMER_ID = CUSTOMER.ID
GROUP BY ACTIVITY_NAME,
STUDENT_PRICE,
REGULAR_PRICE
ORDER BY ACTIVITY.ACTIVITY_NAME
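As a quick check of the conditional SUM, the sketch below runs the same query shape in Python with the standard-library sqlite3 module. Table and column names follow the question; the activities, prices, and bookings are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY, activity_name TEXT,
        student_price INTEGER, regular_price INTEGER
    );
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_type TEXT);
    CREATE TABLE activity_booking (activity_id INTEGER, customer_id INTEGER);

    INSERT INTO activity VALUES (1, 'Yoga', 10, 15), (2, 'Spin', 8, 12);
    INSERT INTO customer VALUES (1, 'Student'), (2, 'Regular'), (3, 'Regular');
    -- Yoga: one student and two regulars. Spin: one student.
    INSERT INTO activity_booking VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

rows = conn.execute("""
    SELECT a.activity_name AS class,
           COUNT(c.customer_type) AS num_cust,
           -- Each booking contributes the price that matches its customer type.
           SUM(CASE WHEN c.customer_type = 'Student' THEN a.student_price
                    WHEN c.customer_type = 'Regular' THEN a.regular_price
                    ELSE 0 END) AS revenue
    FROM activity a
    JOIN activity_booking b ON a.id = b.activity_id
    JOIN customer c ON b.customer_id = c.id
    GROUP BY a.activity_name, a.student_price, a.regular_price
    ORDER BY a.activity_name
""").fetchall()

print(rows)
```

Yoga comes out as 10 + 15 + 15 and Spin as a single student price, which is what multiplying a plain COUNT by one price cannot express.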

SQL: multiple counts from same table

I am having a real problem trying to get a query with the data I need. I have tried a few methods without success. I can get the data with 4 separate queries, but I just can't get them into 1 query. All data comes from 1 table. I will list as much info as I can.
My data looks like this. I have a customerID and 3 columns that record who has worked on the record for that customer as well as the assigned acct manager
RecID  Customer  CreatedBy    LastUser     AcctMan
1      1374      Bob Jones    Mary Willis  Bob Jones
2      1375      Mary Willis  Bob Jones    Bob Jones
3      1376      Jay Scott    Mary Willis  Mary Willis
4      1377      Jay Scott    Mary Willis  Jay Scott
5      1378      Bob Jones    Jay Scott    Jay Scott
I want the query to return the following data. See below for a description of how each is obtained.
Employee     Created  Modified  Mod Own  Created Own
Bob Jones    2        1         1        1
Mary Willis  1        2         1        0
Jay Scott    2        1         1        1
Created = Counts the number of records created by each Employee
Modified = Number of records where the Employee is listed as Last User
(except where they created the record)
Mod Own = Number of records for each where the LastUser = Acctman
(account manager)
Created Own = Number of Records created by the employee where they are
the account manager for that customer
I can get each of these from a query, just need to somehow combine them:
Select CreatedBy, COUNT(CreatedBy) as Created
FROM [dbo].[Cust_REc] GROUP By CreatedBy
Select LastUser, COUNT(LastUser) as Modified
FROM [dbo].[Cust_REc] Where LastUser != CreatedBy GROUP By LastUser
Select AcctMan, COUNT(AcctMan) as CreatePort
FROM [dbo].[Cust_REc] Where AcctMan = CreatedBy GROUP By AcctMan
Select AcctMan, COUNT(AcctMan) as ModPort
FROM [dbo].[Cust_REc] Where AcctMan = LastUser AND NOT AcctMan = CreatedBy GROUP By AcctMan
Can someone see a way to do this? I may have to join the table to itself, but my attempts have not given me the correct data.
The following will give you the results you're looking for.
select
e.employee,
create_count=(select count(*) from Cust_Rec c where c.CreatedBy=e.employee),
mod_count=(select count(*) from Cust_Rec c where c.LastUser=e.employee and c.CreatedBy<>e.employee),
create_own_count=(select count(*) from Cust_Rec c where c.CreatedBy=e.employee and c.AcctMan=e.employee),
mod_own_count=(select count(*) from Cust_Rec c where c.LastUser=e.employee and c.AcctMan=e.employee)
from (
select employee=CreatedBy from Cust_Rec
union
select employee=LastUser from Cust_Rec
union
select employee=AcctMan from Cust_Rec
) e
Note: there are other approaches that are more efficient than this but potentially far more complex as well. Specifically, I would bet there is a master Employee table somewhere that would prevent you from having to do the inline view just to get the list of names.
This seems pretty straightforward. Try this:
select a.employee,b.created,c.modified ....
from (select distinct created_by as employee from data) as a
inner join
(select created_by,count(*) as created from data group by created_by) as b
on a.employee = b.created_by
inner join ....
This highly inefficient query may be a rough start to what you are looking for. Once you validate the data then there are things you can do to tidy it up and make it more efficient.
Also, I don't think you need the DISTINCT on the UNION part because the UNION will return DISTINCT values unless UNION ALL is specified.
SELECT
Employees.EmployeeID,
Created = (SELECT COUNT(*) FROM Cust_Rec WHERE Cust_Rec.CreatedBy = Employees.EmployeeID),
Modified = (SELECT COUNT(*) FROM Cust_Rec WHERE Cust_Rec.LastUser = Employees.EmployeeID AND Cust_Rec.CreatedBy <> Employees.EmployeeID),
ModOwn =
CASE WHEN Employees.IsManager = 0 THEN NULL ELSE
(SELECT COUNT(*) FROM Cust_Rec WHERE AcctMan = Employees.EmployeeID AND LastUser = Employees.EmployeeID)
END,
CreatedOwn = (SELECT COUNT(*) FROM Cust_Rec WHERE AcctMan = Employees.EmployeeID AND CreatedBy = Employees.EmployeeID)
FROM
(
SELECT
EmployeeID,
-- 1 when the employee appears as an account manager on any record, otherwise 0
IsManager = CASE WHEN EXISTS(SELECT AcctMan FROM Cust_Rec WHERE AcctMan = Y.EmployeeID) THEN 1 ELSE 0 END
FROM
(
SELECT DISTINCT
EmployeeID
FROM
(
SELECT EmployeeID=CreatedBy FROM Cust_Rec
UNION
SELECT EmployeeID=LastUser FROM Cust_Rec
UNION
SELECT EmployeeID=AcctMan FROM Cust_Rec
)AS Z
)AS Y
)
AS Employees
I had the same issue with the Modified column. All the other columns worked okay. DCR example would work well with the join on an employees table if you have it.
SELECT CreatedBy AS [Employee],
COUNT(CreatedBy) AS [Created],
--Couldn't get modified to pull the right results
SUM(CASE WHEN LastUser = AcctMan THEN 1 ELSE 0 END) [Mod Own],
SUM(CASE WHEN CreatedBy = AcctMan THEN 1 ELSE 0 END) [Created Own]
FROM Cust_Rec
GROUP BY CreatedBy
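One possible way around the Modified problem is to stack the CreatedBy rows and the LastUser rows with UNION ALL, tagging each row with the flags it contributes, and then sum the flags per employee. The sketch below is a minimal illustration in Python with the standard-library sqlite3 module, loaded with the five sample rows from the question. It is a swapped-in technique, not the query from any of the answers above. Applying the stated definition of Modified literally gives 3 for Mary Willis on this data, whereas the expected table in the question shows 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cust_Rec (
        RecID INTEGER PRIMARY KEY, Customer INTEGER,
        CreatedBy TEXT, LastUser TEXT, AcctMan TEXT
    );
    INSERT INTO Cust_Rec VALUES
        (1, 1374, 'Bob Jones',   'Mary Willis', 'Bob Jones'),
        (2, 1375, 'Mary Willis', 'Bob Jones',   'Bob Jones'),
        (3, 1376, 'Jay Scott',   'Mary Willis', 'Mary Willis'),
        (4, 1377, 'Jay Scott',   'Mary Willis', 'Jay Scott'),
        (5, 1378, 'Bob Jones',   'Jay Scott',   'Jay Scott');
""")

rows = conn.execute("""
    SELECT employee,
           SUM(created)     AS created,
           SUM(modified)    AS modified,
           SUM(mod_own)     AS mod_own,
           SUM(created_own) AS created_own
    FROM (
        -- One row per record, attributed to its creator.
        SELECT CreatedBy AS employee,
               1 AS created,
               0 AS modified,
               0 AS mod_own,
               CASE WHEN CreatedBy = AcctMan THEN 1 ELSE 0 END AS created_own
        FROM Cust_Rec
        UNION ALL
        -- One row per record, attributed to its last user.
        SELECT LastUser,
               0,
               CASE WHEN LastUser <> CreatedBy THEN 1 ELSE 0 END,
               CASE WHEN LastUser = AcctMan THEN 1 ELSE 0 END,
               0
        FROM Cust_Rec
    )
    GROUP BY employee
    ORDER BY employee
""").fetchall()

# Map each employee to (created, modified, mod_own, created_own).
result = {r[0]: r[1:] for r in rows}
print(result)
```

The table is read twice rather than once per count per employee, and no separate employee list is needed because the names come from the stacked rows themselves.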

How can I choose the closest match in SQL Server 2005?

In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.
We have the following information coming in from the sales table:
ServiceId,
Address,
ZipCode,
EmailAddress,
HomePhone,
FirstName,
LastName
The customers information includes all of this, as well as a 'LastTransaction' date.
Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.
The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on Address+ZipCode and HomePhone, and possibly more than one of them would match on ServiceId as well.
I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.
Any ideas would be quite appreciated.
Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.
For SQL Server 2005 and up, try:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
s.*,c.*
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.S_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
EDIT
I hate to write so much actual code when there was no schema given, because I can't actually run this and be sure it works. However, to answer the question of how to handle ties using the last transaction date, here is a newer version of the above code:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
*
FROM (SELECT
s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.S_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
) dt2
WHERE dt2.RankValue=1
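The score-then-tie-break idea can also be expressed with a single ranking step: order each sale's candidates by score descending and then by last transaction descending, and keep the first. The sketch below is a variant of the approach above rather than the same query, written in Python with the standard-library sqlite3 module (it needs an SQLite build with window functions, version 3.25 or later). The lower-case table and column names and all sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        pk_id INTEGER PRIMARY KEY, service_id TEXT, address TEXT,
        zip TEXT, email TEXT, phone TEXT
    );
    CREATE TABLE customers (
        pk_id INTEGER PRIMARY KEY, service_id TEXT, address TEXT,
        zip TEXT, email TEXT, phone TEXT, last_transaction TEXT
    );
    INSERT INTO sales VALUES
        (1, 'S1', '1 Main', '111', 'a@x', '555'),
        (2, 'S7', '9 Elm',  '999', 'z@x', '000');
    -- Customers 1 and 2 both score 3 against sale 1; customer 2 is more recent.
    INSERT INTO customers VALUES
        (1, 'S1', '1 Main', '111', 'john@x', '555', '2009-01-01'),
        (2, 'S9', '1 Main', '111', 'a@x',    '555', '2009-06-01'),
        (3, 'S8', '1 Main', '111', 'jim@x',  '555', '2009-12-01');
""")

best = conn.execute("""
    WITH scored AS (
        SELECT s.pk_id AS s_pk,
               c.pk_id AS c_pk,
               c.last_transaction,
               CASE WHEN s.service_id = c.service_id THEN 1 ELSE 0 END
             + CASE WHEN s.address = c.address AND s.zip = c.zip THEN 1 ELSE 0 END
             + CASE WHEN s.email = c.email THEN 1 ELSE 0 END
             + CASE WHEN s.phone = c.phone THEN 1 ELSE 0 END AS score
        FROM sales s
        LEFT JOIN customers c
               ON s.service_id = c.service_id
               OR (s.address = c.address AND s.zip = c.zip)
               OR s.email = c.email
               OR s.phone = c.phone
    ),
    ranked AS (
        SELECT s_pk, c_pk, score,
               ROW_NUMBER() OVER (
                   PARTITION BY s_pk
                   ORDER BY score DESC, last_transaction DESC
               ) AS rn
        FROM scored
    )
    SELECT s_pk, c_pk, score FROM ranked WHERE rn = 1 ORDER BY s_pk
""").fetchall()

print(best)
```

A sale with no matching customer keeps a single row with a NULL customer and a score of 0, because of the LEFT JOIN.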
Here's a fairly ugly way to do this, using SQL Server code. Assumptions:
- Column CustomerId exists in the Customer table, to uniquely identify customers.
- Only exact matches are supported (as implied by the question).
SELECT top 1 CustomerId, LastTransaction, count(*) HowMany
from (select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.ServiceId = sa.ServiceId
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.EmailAddress = sa.EmailAddress
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.Address = sa.Address
and cu.ZipCode = sa.ZipCode
union all [etcetera -- repeat for each possible link]
) xx
group by CustomerId, LastTransaction
order by count(*) desc, LastTransaction desc
I dislike using "top 1", but it is quicker to write. (The alternative is to use ranking functions and that would require either another subquery level or implementing it as a CTE.) Of course, if your tables are large this would fly like a cow unless you had indexes on all your columns.
Frankly I would be wary of doing this at all as you do not have a unique identifier in your data.
John Smith lives with his son John Smith and they both use the same email address and home phone. These are two people, but you would match them as one. We run into this all the time with our data and have no solution for automated matching because of it. We identify possible dups and actually physically call to find out if they are dups.
I would probably create a stored function for that (in Oracle) and order by the highest match
SELECT * FROM (
SELECT c.*, MATCH_CUSTOMER( c.Id, par1, par2, par3 ) matches FROM Customer c
) WHERE matches >0 ORDER BY matches desc
The function match_customer returns the number of matches based on the input parameters... I guess it is probably slow, as this query will always scan the complete customer table
For close matches you can also look at a number of string similarity algorithms.
For example, in Oracle there is the UTL_MATCH.JARO_WINKLER_SIMILARITY function:
http://www.psoug.org/reference/utl_match.html
There is also the Levenshtein distance algorithm.
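For reference, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal sketch of the usual dynamic-programming formulation in plain Python; the function name is chosen for this illustration and is not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("Smith", "Smyth"))
print(levenshtein("kitten", "sitting"))
```

A small distance relative to the string length suggests a likely typo or spelling variant, which is the kind of close match the exact-equality queries above would miss.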