SQL: multiple counts from same table

SQL: multiple counts from same table - sql

I am having a real problem trying to get a query with the data I need. I have tried a few methods without success. I can get the data with 4 separate queries, just can't get hem into 1 query. All data comes from 1 table. I will list as much info as I can.
My data looks like this. I have a customerID and 3 columns that record who has worked on the record for that customer as well as the assigned acct manager
RecID_Customer___CreatedBy____LastUser____AcctMan
1-------1374----------Bob Jones--------Mary Willis------Bob Jones
2-------1375----------Mary Willis------Bob Jones--------Bob Jones
3-------1376----------Jay Scott--------Mary Willis-------Mary Willis
4-------1377----------Jay Scott--------Mary Willis------Jay Scott
5-------1378----------Bob Jones--------Jay Scott--------Jay Scott
I want the query to return the following data. See below for a description of how each is obtained.
Employee___Created__Modified__Mod Own__Created Own
Bob Jones--------2-----------1---------------1----------------1
Mary Willis------1-----------2---------------1----------------0
Jay Scott--------2-----------1---------------1----------------1
Created = Counts the number of records created by each Employee
Modified = Number of records where the Employee is listed as Last User
(except where they created the record)
Mod Own = Number of records for each where the LastUser = Acctman
(account manager)
Created Own = Number of Records created by the employee where they are
the account manager for that customer
I can get each of these from a query, just need to somehow combine them:
Select CreatedBy, COUNT(CreatedBy) as Created
FROM [dbo].[Cust_REc] GROUP By CreatedBy
Select LastUser, COUNT(LastUser) as Modified
FROM [dbo].[Cust_REc] Where LastUser != CreatedBy GROUP By LastUser
Select AcctMan, COUNT(AcctMan) as CreatePort
FROM [dbo].[Cust_REc] Where AcctMan = CreatedBy GROUP By AcctMan
Select AcctMan, COUNT(AcctMan) as ModPort
FROM [dbo].[Cust_REc] Where AcctMan = LastUser AND NOT AcctMan = CreatedBy GROUP By AcctMan
Can someone see a way to do this? I may have to join the table to itself, but my attempts have not given me the correct data.

The following will give you the results you're looking for.
select
e.employee,
create_count=(select count(*) from customers c where c.createdby=e.employee),
mod_count=(select count(*) from customers c where c.lastmodifiedby=e.employee),
create_own_count=(select count(*) from customers c where c.createdby=e.employee and c.acctman=e.employee),
mod_own_count=(select count(*) from customers c where c.lastmodifiedby=e.employee and c.acctman=e.employee)
from (
select employee=createdby from customers
union
select employee=lastmodifiedby from customers
union
select employee=acctman from customers
) e
Note: there are other approaches that are more efficient than this but potentially far more complex as well. Specifically, I would bet there is a master Employee table somewhere that would prevent you from having to do the inline view just to get the list of names.

this seems pretty straight forward. Try this:
select a.employee,b.created,c.modified ....
from (select distinct created_by from data) as a
inner join
(select created_by,count(*) as created from data group by created_by) as b
on a.employee = b.created_by)
inner join ....

This highly inefficient query may be a rough start to what you are looking for. Once you validate the data then there are things you can do to tidy it up and make it more efficient.
Also, I don't think you need the DISTINCT on the UNION part because the UNION will return DISTINCT values unless UNION ALL is specified.
SELECT
Employees.EmployeeID,
Created =(SELECT COUNT(*) FROM Cust_REc WHERE Cust_REc.CreatedBy=Employees.EmployeeID),
Mopdified =(SELECT COUNT(*) FROM Cust_REc WHERE Cust_REc.LastUser=Employees.EmployeeID AND Cust_REc.CreateBy<>Employees.EmployeeID),
ModOwn =
CASE WHEN NOT Empoyees.IsManager THEN NULL ELSE
(SELECT COUNT(*) FROM Cust_REc WHERE AcctMan=Employees.EmployeeID)
END,
CreatedOwn=(SELECT COUNT(*) FROM Cust_REc WHERE AcctMan=Employees.EmployeeID AND CReatedBy=Employees.EMployeeID)
FROM
(
SELECT
EmployeeID,
IsManager=CASE WHEN EXISTS(SELECT AcctMan FROM CustRec WHERE AcctMan=EmployeeID)
FROM
(
SELECT DISTINCT
EmployeeID
FROM
(
SELECT EmployeeID=CreatedBy FROM Cust_Rec
UNION
SELECT EmployeeID=LastUser FROM Cust_Rec
UNION
SELECT EmployeeID=AcctMan FROM Cust_Rec
)AS Z
)AS Y
)
AS Employees

I had the same issue with the Modified column. All the other columns worked okay. DCR example would work well with the join on an employees table if you have it.
SELECT CreatedBy AS [Employee],
COUNT(CreatedBy) AS [Created],
--Couldn't get modified to pull the right results
SUM(CASE WHEN LastUser = AcctMan THEN 1 ELSE 0 END) [Mod Own],
SUM(CASE WHEN CreatedBy = AcctMan THEN 1 ELSE 0 END) [Created Own]
FROM Cust_Rec
GROUP BY CreatedBy

Related

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS paid_follower_gain,
null AS organic_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100

UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through in SQL because its not a procedural language. The operation you define in the query is performed for all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account.id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)

You could do something like this, instead of using the variable you can just wrap it another bracket and write at end ) AS FollowerGrowth
DECLARE #FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASCENDING )

Join result of two queries with same column in oracle

I am using Oracle. I am currently working on one table with two different query output. I want to combine two output in single output, I have tried Union all and union but no luck.
with D as
(
Select
VP.HOMELABORLEVELNM4 as DEPT,
SUM(X.DURATIONSECSQTY/3600.0) as ACTL_HR,
SUM(X.WAGEAMT) as ACTL_DLR,
to_char(X.APPLYDTM,'YYYY-MM') AS MONTHLY,
VP.HOMELABORLEVELDSC4 as DESCRIPTION,
NULL as DAILY,
NULL as DEPT1,
NULL as ACTL_HR1,
NULL as ACTL_DLR1
from VP_EMPLOYEEV42 VP,
WFCTOTAL X
where
VP.PERSONID = X.EMPLOYEEID and
X.APPLYDTM between '01-DEC-18' and '31-DEC-18' and
X.EMPLOYEEID in (select personid from PERSONCSTMDATA where CUSTOMDATADEFID ='154' and PERSONCSTMDATATXT = 'USKEANE')
group by VP.HOMELABORLEVELNM4, VP.HOMELABORLEVELDSC4, to_char(X.APPLYDTM,'YYYY-MM')
union all
Select
NULL as DEPT,
NULL as ACTL_HR,
NULL as ACTL_DLR,
NULL as MONTHLY,
VP.HOMELABORLEVELDSC4 as DESCRIPTION,
to_char(X.APPLYDTM) as DAILY,
VP.HOMELABORLEVELNM4 as DEPT1,
SUM(X.DURATIONSECSQTY/3600.0) as ACTL_HR1,
SUM(X.WAGEAMT) as ACTL_DLR1
from VP_EMPLOYEEV42 VP,
WFCTOTAL X
where
VP.PERSONID = X.EMPLOYEEID and
X.APPLYDTM = '31-DEC-18' and
X.EMPLOYEEID in (select personid from PERSONCSTMDATA where CUSTOMDATADEFID ='154' and PERSONCSTMDATATXT = 'USKEANE')
group by VP.HOMELABORLEVELNM4, VP.HOMELABORLEVELDSC4, to_char(X.APPLYDTM)
)
select D.DEPT DEPT,
SUM(D.ACTL_HR) ACTL_HR,
SUM(D.ACTL_DLR) ACTL_DLR,
D.MONTHLY MONTHLY,
D.DESCRIPTION DESCRIPTION,
D.DAILY DAILY,
D.DEPT1 DEPT1,
SUM(D.ACTL_HR1) ACTL_HR1,
SUM(D.ACTL_DLR1) ACTL_DLR1
from D
group by D.DEPT, D.MONTHLY, D.DAILY, D.DESCRIPTION, D.DEPT1
order by DESCRIPTION
it is giving me output like this
-DEPT-HR-DLR-MONTHLY-DESC-DAILY-DEPT-HR-DLR-
-1-12-12-11/1-Manu-NULL-NULL-NULL-NULL-
-NULL-NULL-NULL-NULL-Manu-17-1-12-12-

As long as you have null value in any of the fields you want to group on, you will receive it as a separate row. I think you'd want to review your needed output, then we can try to paraphrase it with code to you.
Hint: you may want to look into JOIN, and group only on D.MONTHLY, D.DAILY, D.DESCRIPTION, D.DEPT1, because your DEPT column is missing in one of the tables.

I think your goal (which is not quite clear to me) might be easier to achieve following this pattern:
with Query1 as (select fields from table where conditions are met),
Query2 as (select fields from table where conditions are met)
select fields
from Query1
outer join Query2
on Query1.identifier_for_match=Query2.identifier_for_match
where optional conditions are true
Note - the 'identifier_for_match' might be employeeid in your case (which would make it a required part of the Query1/Query2 resultset) - you have to look at the model and figure out how the query should combine the rows.
Also - an answer fitting your tables is easier to provide if the DDL for the tables is provided and some data for the same (including the desired output)

Show unique ID's in a table with all extra info

SELECT Personeelsnummer, Achternaam, Voornaam, Departement, SubDep, SubSubDep, FTE, RedenUitDienst, Anciennitëitsdatum, GeldigOp, Schrapping, Ancienniteit, Positie, Nieveau, OmschrijfingStatuut
FROM tbl_Worker
GROUP BY Personeelsnummer
OR
SELECT (DISTINCT Personeelsnummer), Achternaam, Voornaam, Departement, SubDep, SubSubDep, FTE, RedenUitDienst, Anciennitëitsdatum, GeldigOp, Schrapping, Ancienniteit, Positie, Nieveau, OmschrijfingStatuut
FROM tbl_Worker
GROUP BY Personeelsnummer
I have a worker table with 49000 records, this includes a 'snapshot' from all workers EVERY month. But what I need is a table with all employees the company 'ever' had but only once. so I tried to wright the query's show above but they are not working.
So what I need is a query that shows all unique 'Personeelsnummers' with all the extra information about these persons.
what does work is this: SELECT DISTINCT Personeelsnummer FROM tbl_Worker ==> this gives me a table with 1200 records but only the numbers but I need all the extra information.

Instead of GROUP BY, use WHERE to get the first or last record:
SELECT w.*
FROM tbl_Worker as w
WHERE monthcol = (SELECT MAX(w2.monthcol)
FROM tbl_Worker as w2
WHERE w2.Personeelsnummer = w.Personeelsnummer
);
You would use MIN() to get the first month's record. My Dutch is a bit weak, so I don't know which column refers to the date for the record.
For performance, you want an index on tbl_Worker(Personeelsnummer, GeldigOp):
create index idx_tbl_worker_Personeelsnummer_GeldigOp on tbl_Worker(Personeelsnummer, GeldigOp);
EDIT:
Or you can do it with a JOIN:
SELECT w.*
FROM tbl_Worker as w INNER JOIN
(SELECT Personeelsnummer, MAX(GeldigOp) as max_GeldigOp
FROM tbl_Worker
GROUP BY Personeelsnummer
) as ww
ON ww.Personeelsnummer = w.Personeelsnummer and ww.max_GeldigOp = w.GeldigOp;

You're looking for a group by:
select *
from table
group by field1
Which can occasionally be written with a distinct on statement:
select distinct on field1 *
from table
As seen in this topic.

SQL Nested Select -Subquery returned more than 1 value-

I have a table Sales with columns SalesID, SalesName, SalesCity, SalesState.
I am trying to come up with a query that only shows salesName where there is one SalesName per SalesCity. So for example, if SaleA is in Houston and SaleB is in Houston, SaleA and SaleB will not be returned.
select
SalesName, SalesCity, SalesState
from
Sales
where
(select count(*) from Sales group by SalesCity) = 1;
I am not entirely sure how to link the inner select back out. I need another column in the nested select to identify the SalesID. I am currently stuck and have made no progress.

You can get the names of cities that have only 1 sale by using GROUP BY and HAVING operators. Then use these results in your where clause:
SELECT SalesName, SalesCity, SalesState
FROM Sales WHERE SalesCity IN
(
SELECT SalesCity
FROM Sales
GROUP BY SalesCity
HAVING COUNT(SalesCity) = 1
)

You can do this without a subquery:
select MIN(SalesName) as SalesName, SalesCity, MIN(SalesState) as SalesState
from Sales
group by SalesCity
having count(*) = 1;
If there is only one row for the city, then the min() will return the value on that row.

How can I choose the closest match in SQL Server 2005?

In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.
We have the following information coming in from the sales table:
ServiceId,
Address,
ZipCode,
EmailAddress,
HomePhone,
FirstName,
LastName
The customers information includes all of this, as well as a 'LastTransaction' date.
Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.
The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on on Address+ZipCode, and HomePhone--and possibly more than one of them would match on ServiceId, as well.
I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.
Any ideas would be quite appreciated.
Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.

for SQL Server 2005 and up try:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
s.*,c.*
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
EDIT
I hate to write so much actual code when there was no shema given, because I can't actually run this and be sure it works. However to answer the question of the how to handle ties using the last transaction date, here is a newer version of the above code:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
*
FROM (SELECT
s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
) dt2
WHERE dt2.RankValue=1

Here's a fairly ugly way to do this, using SQL Server code. Assumptions:
- Column CustomerId exists in the Customer table, to uniquely identify customers.
- Only exact matches are supported (as implied by the question).
SELECT top 1 CustomerId, LastTransaction, count(*) HowMany
from (select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.ServiceId = sa.ServiceId
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.EmailAddress = sa.EmailAddress
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.Address = sa.Address
and cu.ZipCode = sa.ZipCode
union all [etcetera -- repeat for each possible link]
) xx
group by CustomerId, LastTransaction
order by count(*) desc, LastTransaction desc
I dislike using "top 1", but it is quicker to write. (The alternative is to use ranking functions and that would require either another subquery level or impelmenting it as a CTE.) Of course, if your tables are large this would fly like a cow unless you had indexes on all your columns.

Frankly I would be wary of doing this at all as you do not have a unique identifier in your data.
John Smith lives with his son John Smith and they both use the same email address and home phone. These are two people but you would match them as one. We run into this all the time with our data and have no solution for automated matching because of it. We identify possible dups and actually physically call and find out id they are dups.

I would probably create a stored function for that (in Oracle) and oder on the highest match
SELECT * FROM (
SELECT c.*, MATCH_CUSTOMER( Customer.Id, par1, par2, par3 ) matches FROM Customer c
) WHERE matches >0 ORDER BY matches desc
The function match_customer returns the number of matches based on the input parameters... I guess is is probably slow as this query will always scan the complete customer table

For close matches you can also look at a number of string similarity algorithms.
For example, in Oracle there is the UTL_MATCH.JARO_WINKLER_SIMILARITY function:
http://www.psoug.org/reference/utl_match.html

There is also the Levenshtein distance algorithym.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas