SQL COUNT By month - sql

I have the following table that registers the days students went to school. I need to count the days they are PRESENT, but also need to count the total of school days for each month. (When ASISTENCIA is either 0 or 1)
This what I have so far, but it doesn't count the total.
BEGIN
SELECT
u.user_id user_id,
u.user_first_name as names,
u.user_last_name_01 as lastname1,
u.user_last_name_02 as lastname2,
MONTH(a.FECHA_ASISTENCIA) month,
COUNT(*) as absent_days,
p.PHONE as phone,
p.CITY as city,
#EDUCATION_LEVEL_ID
FROM
users u
inner join asistencia a ON u.user_id = a.USER_ID
inner join profile p ON u.rut_SF = p.RUT_SF
WHERE
a.ASISTENCIA = 0 -- NOT PRESENT
AND a.EDUCATION_LEVEL_ID = #EDUCATION_LEVEL_ID
AND YEAR(a.FECHA_ASISTENCIA) = #EDUCATION_LEVEL_YEAR
GROUP BY
u.user_id,
u.user_first_name,
u.user_last_name_01,
u.user_last_name_02,
MONTH(a.FECHA_ASISTENCIA),
p.TELEFONO,
p.CIUDAD_DOM
ORDER BY mes
END
ATTENDANCE
USER_ID
DATE
ATTENDANCE
EDUCATION_LEVEL_ID
123
2021-04-13
0
1
123
2021-04-14
1
1
DESIRED OUTPUT
names
lastname1
lastname2
month
absent_days
total_class_days
city
JOHN
SMITH
SMITH
3
10
24
CITY
JOHN
SMITH
SMITH
4
8
24
CITY

Without examples of what these tables look like, it is hard to give you a solid answer. However, it appears as though your biggest challenge is that ABSTENTIA does not exist.
This is a common problem for analysis - you need to create rows that do not exist (when the user was absent).
The general approach is to:
create a list of unique users
create a list of unique dates you care about
Cross Join (cartesian join) these to create every possible combination of user and date
Outer Join #3 to #4 so you can populate a PRESENT flag, and now you can see both who were PRESENT and which were not.
Filter out rows which don't apply (for instance if a user joined on 3/4/2021 then ignore the blank rows before this date)
You can accomplish this with some SQL that looks like this:
WITH GLOBAL_SPINE AS (
SELECT
ROW_NUMBER() OVER (ORDER BY NULL) as INTERVAL_ID,
DATEADD('DAY', (INTERVAL_ID - 1), '2021-01-01'::timestamp_ntz) as SPINE_START,
DATEADD('DAY', INTERVAL_ID, '2022-01-01'::timestamp_ntz) as SPINE_END
FROM TABLE (GENERATOR(ROWCOUNT => 365))
),
GROUPS AS (
SELECT
USERID,
MIN(DESIRED_INTERVAL) AS LOCAL_START,
MAX(DESIRED_INTERVAL) AS LOCAL_END
FROM RASGO.PUBLIC.RASGO_SDK__OP4__AGGREGATE_TRANSFORM__8AB1FEDF90
GROUP BY
USERID
),
GROUP_SPINE AS (
SELECT
USERID,
SPINE_START AS GROUP_START,
SPINE_END AS GROUP_END
FROM GROUPS G
CROSS JOIN LATERAL (
SELECT
SPINE_START, SPINE_END
FROM GLOBAL_SPINE S
WHERE S.SPINE_START BETWEEN G.LOCAL_START AND G.LOCAL_END
)
)
SELECT
G.USERID AS GROUP_BY_USERID,
GROUP_START,
GROUP_END,
T.*
FROM GROUP_SPINE G
LEFT JOIN {{ your_table }} T
ON DESIRED_INTERVAL >= G.GROUP_START
AND DESIRED_INTERVAL < G.GROUP_END
AND G.USERID = T.USERID;
The above script works on Snowflake, but the syntax might be slightly different depending on your RDBMS. There are also some other tweaks you can make regarding when you insert the blank rows, I chose 'local' which means that we begin inserting rows for each user on their very first day. You could change this to global if you wanted to populate data from every single day between 1/1/2021 and 1/1/2022.
I used Rasgo to generate this SQL

Related

Query keeps giving me duplicate records. How can I fix this?

I wrote a query which uses 2 temp tables. And then joins them into 1. However, I am seeing duplicate records in the student visit temp table. (Query is below). How could this be modified to remove the duplicate records of the visit temp table?
with clientbridge as (Select *
from (Select visitorid, --Visid
roomnumber,
room_id,
profid,
student_id,
ambc.datekey,
RANK() over(PARTITION BY visitorid,student_id,profid ORDER BY ambc.datekey desc) as rn
from university.course_office_hour_bridge cohd
--where student_id = '9999999-aaaa-6634-bbbb-96fa18a9046e'
)
where rn = 1 --visitorid = '999999999999999999999999999999'---'1111111111111111111111111111111' --and pai.datekey is not null --- 00000000000000000000000000
),
-----------------Data Header Table
studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
select
*
from clientbridge ab inner join studentvisit sv on sv.visid_visitorid = cb.visitorid
I wrote a query which uses 2 temp tables. And then joins them into 1. However, I am seeing duplicate records in the student visit temp table. (Query is below). How could this be modified to remove the duplicate records of the visit temp table?
I think you may get have a better shot by joining the two datasets in the same query where you want the data ranked, otherwise your rank from query will be ignored within the results from the second query. Perhaps, something like ->
;with studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
,clientbridge as (
Select
sv.*,
university.course_office_hour_bridge cohd, --Visid
roomnumber,
room_id,
profid,
student_id,
ambc.datekey,
RANK() over(PARTITION BY sv.visitorid,sv.student_id,sv,profid ORDER BY ambc.datekey desc) as rn
from university.course_office_hour_bridge cohd
inner join studentvisit sv on sv.visid_visitorid = cohd.visitorid
)
select
*
from clientbridge WHERE rn=1

SQL: multiple counts from same table

I am having a real problem trying to get a query with the data I need. I have tried a few methods without success. I can get the data with 4 separate queries, just can't get hem into 1 query. All data comes from 1 table. I will list as much info as I can.
My data looks like this. I have a customerID and 3 columns that record who has worked on the record for that customer as well as the assigned acct manager
RecID_Customer___CreatedBy____LastUser____AcctMan
1-------1374----------Bob Jones--------Mary Willis------Bob Jones
2-------1375----------Mary Willis------Bob Jones--------Bob Jones
3-------1376----------Jay Scott--------Mary Willis-------Mary Willis
4-------1377----------Jay Scott--------Mary Willis------Jay Scott
5-------1378----------Bob Jones--------Jay Scott--------Jay Scott
I want the query to return the following data. See below for a description of how each is obtained.
Employee___Created__Modified__Mod Own__Created Own
Bob Jones--------2-----------1---------------1----------------1
Mary Willis------1-----------2---------------1----------------0
Jay Scott--------2-----------1---------------1----------------1
Created = Counts the number of records created by each Employee
Modified = Number of records where the Employee is listed as Last User
(except where they created the record)
Mod Own = Number of records for each where the LastUser = Acctman
(account manager)
Created Own = Number of Records created by the employee where they are
the account manager for that customer
I can get each of these from a query, just need to somehow combine them:
Select CreatedBy, COUNT(CreatedBy) as Created
FROM [dbo].[Cust_REc] GROUP By CreatedBy
Select LastUser, COUNT(LastUser) as Modified
FROM [dbo].[Cust_REc] Where LastUser != CreatedBy GROUP By LastUser
Select AcctMan, COUNT(AcctMan) as CreatePort
FROM [dbo].[Cust_REc] Where AcctMan = CreatedBy GROUP By AcctMan
Select AcctMan, COUNT(AcctMan) as ModPort
FROM [dbo].[Cust_REc] Where AcctMan = LastUser AND NOT AcctMan = CreatedBy GROUP By AcctMan
Can someone see a way to do this? I may have to join the table to itself, but my attempts have not given me the correct data.
The following will give you the results you're looking for.
select
e.employee,
create_count=(select count(*) from customers c where c.createdby=e.employee),
mod_count=(select count(*) from customers c where c.lastmodifiedby=e.employee),
create_own_count=(select count(*) from customers c where c.createdby=e.employee and c.acctman=e.employee),
mod_own_count=(select count(*) from customers c where c.lastmodifiedby=e.employee and c.acctman=e.employee)
from (
select employee=createdby from customers
union
select employee=lastmodifiedby from customers
union
select employee=acctman from customers
) e
Note: there are other approaches that are more efficient than this but potentially far more complex as well. Specifically, I would bet there is a master Employee table somewhere that would prevent you from having to do the inline view just to get the list of names.
this seems pretty straight forward. Try this:
select a.employee,b.created,c.modified ....
from (select distinct created_by from data) as a
inner join
(select created_by,count(*) as created from data group by created_by) as b
on a.employee = b.created_by)
inner join ....
This highly inefficient query may be a rough start to what you are looking for. Once you validate the data then there are things you can do to tidy it up and make it more efficient.
Also, I don't think you need the DISTINCT on the UNION part because the UNION will return DISTINCT values unless UNION ALL is specified.
SELECT
Employees.EmployeeID,
Created =(SELECT COUNT(*) FROM Cust_REc WHERE Cust_REc.CreatedBy=Employees.EmployeeID),
Mopdified =(SELECT COUNT(*) FROM Cust_REc WHERE Cust_REc.LastUser=Employees.EmployeeID AND Cust_REc.CreateBy<>Employees.EmployeeID),
ModOwn =
CASE WHEN NOT Empoyees.IsManager THEN NULL ELSE
(SELECT COUNT(*) FROM Cust_REc WHERE AcctMan=Employees.EmployeeID)
END,
CreatedOwn=(SELECT COUNT(*) FROM Cust_REc WHERE AcctMan=Employees.EmployeeID AND CReatedBy=Employees.EMployeeID)
FROM
(
SELECT
EmployeeID,
IsManager=CASE WHEN EXISTS(SELECT AcctMan FROM CustRec WHERE AcctMan=EmployeeID)
FROM
(
SELECT DISTINCT
EmployeeID
FROM
(
SELECT EmployeeID=CreatedBy FROM Cust_Rec
UNION
SELECT EmployeeID=LastUser FROM Cust_Rec
UNION
SELECT EmployeeID=AcctMan FROM Cust_Rec
)AS Z
)AS Y
)
AS Employees
I had the same issue with the Modified column. All the other columns worked okay. DCR example would work well with the join on an employees table if you have it.
SELECT CreatedBy AS [Employee],
COUNT(CreatedBy) AS [Created],
--Couldn't get modified to pull the right results
SUM(CASE WHEN LastUser = AcctMan THEN 1 ELSE 0 END) [Mod Own],
SUM(CASE WHEN CreatedBy = AcctMan THEN 1 ELSE 0 END) [Created Own]
FROM Cust_Rec
GROUP BY CreatedBy

SQL Grouping / Contract Value

I know this is going to be a simple one but after a midnight run at other coding my SQL brain is fried so I'm just reaching out for some quick help. I've got a test table with Agent, AgentID, Parent Account, AccountID, and TCV. What I need to do is pull all the agent/account IDs where the AccountIDs belong to an aggregate parent account under that agents name >= 10K.
So in this example, John has 2 accounts under the parent account ABC123 and since their total value is >=10K, these 2 would get pulled. But notice below where Jane has 2 accounts likewise under ABC123 but b/c their total value in her name is < 10K, they would not be pulled. So the results would be something like this:
Essetially I need to pull all AccountIDs where the total value of the parent account they roll-up to for that person is >= 10K. BTW, I'm using SQL Server Management Studio R2.
You just do a simple group by/having to get a list of agentids/parentaccounts that meet the 10k criteria. Then you can use that in a sub-select to join back to the same table and get the list of account ids
select agentid, accountid
from table t
inner join (
select agentid, parentaccount
from table
group by agentid, parentaccount
having sum(tcv) >= 10000
) t1
on t.agentid = t1.agentid
and t.parentaccount = t1.parentaccount
;WITH MyCTE AS
(
SELECT AgentID,
ParentAccount,
SUM(TCV) AS Total
FROM TableName
GROUP BY AgentID,
ParentAccount
)
SELECT T.AgentId, T.AccountId
FROM Table T
JOIN MyCTE M
ON M.AgentId = T.AgentId
AND M.ParentAccount= T.ParentAccount
WHERE M.Total>10000

Find incorrect records by Id

I am trying to find records where the personID is associated to the incorrect SoundFile(String). I am trying to search for incorrect records among all personID's, not just one specific one. Here are my example tables:
TASKS-
PersonID SoundFile(String)
123 D10285.18001231234.mp3
123 D10236.18001231234.mp3
123 D10237.18001231234.mp3
123 D10212.18001231234.mp3
123 D12415.18001231234.mp3
**126 D19542.18001231234.mp3
126 D10235.18001234567.mp3
126 D19955.18001234567.mp3
RECORDINGS-
PhoneNumber(Distinct Records)
18001231234
18001234567
So in this example, I am trying to find all records like the one that I indented. The majority of the soundfiles like '%18001231234%' are associated to PersonID 123, but this one record is PersonID 126. I need to find all records where for all distinct numbers from the Recordings table, the PersonID(s) is not the majority.
Let me know if you need more information!
Thanks in advance!!
; WITH distinctRecordings AS (
SELECT DISTINCT PhoneNumber
FROM Recordings
),
PersonCounts as (
SELECT t.PersonID, dr.PhoneNumber, COUNT(*) AS num
FROM
Tasks t
JOIN distinctRecordings dr
ON t.SoundFile LIKE '%' + dr.PhoneNumber + '%'
GROUP BY t.PersonID, dr.PhoneNumber
)
SELECT t.PersonID, t.SoundFile
FROM PersonCounts pc1
JOIN PersonCounts pc2
ON pc2.PhoneNumber = pc1.PhoneNumber
AND pc2.PersonID <> pc1.PersonID
AND pc2.Num < pc1.Num
JOIN Tasks t
ON t.PersonID = pc2.PersonID
AND t.SoundFile LIKE '%' + pc2.PhoneNumber + '%'
SQL Fiddle Here
To summarize what this does... the first CTE, distinctRecordings, is just a distinct list of the Phone Numbers in Recordings.
Next, PersonCounts is a count of phone numbers associated with the records in Tasks for each PersonID.
This is then joined to itself to find any duplicates, and selects whichever duplicate has the smaller count... this is then joined back to Tasks to get the offending soundFile for that person / phone number.
(If your schema had some minor improvements made to it, this query would have been much simpler...)
here you go, receiving all pairs (PersonID, PhoneNumber) where the person has less entries with the given phone number than the person with the maximum entries. note that the query doesn't cater for multiple persons on par within a group.
select agg.pid
, agg.PhoneNumber
from (
select MAX(c) KEEP ( DENSE_RANK FIRST ORDER BY c DESC ) OVER ( PARTITION BY rt.PhoneNumber ) cmax
, rt.PhoneNumber
, rt.PersonID pid
, rt.c
from (
select r.PhoneNumber
, t.PersonID
, count(*) c
from recordings r
inner join tasks t on ( r.PhoneNumber = regexp_replace(t.SoundFile, '^[^.]+\.([^.]+)\.[^.]+$', '\1' ) )
group by r.PhoneNumber
, t.PersonID
) rt
) agg
where agg.c < agg.cmax
;
caveat: the solution is in oracle syntax though the operations should be in the current sql standard (possibly apart from regexp_replace, which might not matter too much since your sound file data seems to follow a fixed-position structure ).

How can I choose the closest match in SQL Server 2005?

In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.
We have the following information coming in from the sales table:
ServiceId,
Address,
ZipCode,
EmailAddress,
HomePhone,
FirstName,
LastName
The customers information includes all of this, as well as a 'LastTransaction' date.
Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.
The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on on Address+ZipCode, and HomePhone--and possibly more than one of them would match on ServiceId, as well.
I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.
Any ideas would be quite appreciated.
Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.
for SQL Server 2005 and up try:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
s.*,c.*
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
EDIT
I hate to write so much actual code when there was no shema given, because I can't actually run this and be sure it works. However to answer the question of the how to handle ties using the last transaction date, here is a newer version of the above code:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
*
FROM (SELECT
s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
) dt2
WHERE dt2.RankValue=1
Here's a fairly ugly way to do this, using SQL Server code. Assumptions:
- Column CustomerId exists in the Customer table, to uniquely identify customers.
- Only exact matches are supported (as implied by the question).
SELECT top 1 CustomerId, LastTransaction, count(*) HowMany
from (select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.ServiceId = sa.ServiceId
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.EmailAddress = sa.EmailAddress
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.Address = sa.Address
and cu.ZipCode = sa.ZipCode
union all [etcetera -- repeat for each possible link]
) xx
group by CustomerId, LastTransaction
order by count(*) desc, LastTransaction desc
I dislike using "top 1", but it is quicker to write. (The alternative is to use ranking functions and that would require either another subquery level or impelmenting it as a CTE.) Of course, if your tables are large this would fly like a cow unless you had indexes on all your columns.
Frankly I would be wary of doing this at all as you do not have a unique identifier in your data.
John Smith lives with his son John Smith and they both use the same email address and home phone. These are two people but you would match them as one. We run into this all the time with our data and have no solution for automated matching because of it. We identify possible dups and actually physically call and find out id they are dups.
I would probably create a stored function for that (in Oracle) and oder on the highest match
SELECT * FROM (
SELECT c.*, MATCH_CUSTOMER( Customer.Id, par1, par2, par3 ) matches FROM Customer c
) WHERE matches >0 ORDER BY matches desc
The function match_customer returns the number of matches based on the input parameters... I guess is is probably slow as this query will always scan the complete customer table
For close matches you can also look at a number of string similarity algorithms.
For example, in Oracle there is the UTL_MATCH.JARO_WINKLER_SIMILARITY function:
http://www.psoug.org/reference/utl_match.html
There is also the Levenshtein distance algorithym.