choosing latest string when aggregating results in mysql


I've been tasked with generating some reports on our Request Tracker usage. Request Tracker is a ticketing system we use for several departments where I work. To do this I'm taking a nightly snapshot of details about tickets altered that day into another database. This approach decouples my reporting from the internal database schema that RT uses.
Amongst many other questions for the report, I need to report how many tickets were resolved in each month per Department. In RT the department is stored as a CustomField, and my modelling follows that trend, as you can see in my query below. However due to how I'm grabbing snapshots each night, I have multiple rows for a ticket, and the Department field can change over the month. I'm only interested in the most recent Department field. I don't know how to get that in a query.
I know I can use 'GROUP BY' to reduce my query results down to one per ticket, but when I do that, I don't know how to grab the last Department setting. As the Departments are all strings, MAX() doesn't get the last one. MySQL doesn't require you to use an aggregating function for fields you're selecting, but the results are indeterminate (from my testing it looks like it might grab the first one on my version of MySQL).
To illustrate, here are the results from a query that shows me two tickets and all their Department field settings:
"ticket_num","date","QueueName","CF","CFValue","closed"
35750,"2009-09-22","IT_help","Department","",""
35750,"2009-09-23","IT_help","Department","",""
35750,"2009-09-24","IT_help","Department","",""
35750,"2009-09-25","IT_help","Department","",""
35750,"2009-09-26","IT_help","Department","",""
35750,"2009-10-02","IT_help","Department","",""
35750,"2009-10-03","IT_help","Department","",""
35750,"2009-10-12","IT_help","Department","",""
35750,"2009-10-13","IT_help","Department","",""
35750,"2009-10-26","IT_help","Department","Conference/Visitors","2009-10-26 10:10:32"
35750,"2009-10-27","IT_help","Department","Conference/Visitors","2009-10-26 10:10:32"
36354,"2009-10-20","IT_help","Department","",""
36354,"2009-10-21","IT_help","Department","",""
36354,"2009-10-22","IT_help","Department","FS Students",""
36354,"2009-10-23","IT_help","Department","FS Students",""
36354,"2009-10-26","IT_help","Department","FS Students","2009-10-26 12:23:00"
36354,"2009-10-27","IT_help","Department","FS Students","2009-10-26 12:23:00"
As we can see, both tickets were closed on the 26th, and both tickets had an empty Department field for a few days when they first showed up. I've included my query below; you can see that I've artificially limited the number of rows returned in the second half of the WHERE clause:
SELECT d.ticket_num, d.date, q.name as QueueName, cf.name as CF, cfv.value as CFValue, d.closed
FROM daysCF dcf
INNER JOIN daily_snapshots d on dcf.day_id = d.id
INNER JOIN Queues q on d.queue_id = q.id
INNER JOIN CustomFieldValues cfv on dcf.cfv_id = cfv.id
INNER JOIN CustomFields cf on cf.id = cfv.field_id
WHERE cf.name = 'Department' and (d.ticket_num = 35750 or d.ticket_num = 36354)
ORDER by d.ticket_num, d.date
How can I modify that query so I get a result set that tells me that in October there was one ticket closed for "FS Students" and one ticket closed for "Conference/Visitors"?

This is the "greatest-n-per-group" problem that comes up frequently on Stack Overflow.
Here's how I'd solve it in your case:
SELECT d1.ticket_num, d1.date, q.name as QueueName,
cf.name as CF, cfv.value as CFValue, d1.closed
FROM daysCF dcf
INNER JOIN daily_snapshots d1 ON (dcf.day_id = d1.id)
INNER JOIN Queues q ON (d1.queue_id = q.id)
INNER JOIN CustomFieldValues cfv ON (dcf.cfv_id = cfv.id)
INNER JOIN CustomFields cf ON (cf.id = cfv.field_id)
LEFT OUTER JOIN daily_snapshots d2 ON (d1.ticket_num = d2.ticket_num AND d1.date < d2.date)
WHERE d2.id IS NULL AND cf.name = 'Department'
ORDER by d1.ticket_num, d1.date;
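As a rough, runnable sketch of how the same self-join idea can be extended to the monthly count the question asks for, the example below uses Python's built-in sqlite3 module as a stand-in for MySQL. The single flattened snapshots table and its column names are assumptions made for illustration (the real schema spreads this across daysCF, daily_snapshots and the custom-field tables), and the rows are a subset of the sample data from the question.

```python
import sqlite3

# Hypothetical flattened table: one row per ticket per nightly snapshot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots (ticket_num INTEGER, date TEXT, department TEXT, closed TEXT)"
)
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
    [
        (35750, "2009-10-13", "", ""),
        (35750, "2009-10-26", "Conference/Visitors", "2009-10-26 10:10:32"),
        (35750, "2009-10-27", "Conference/Visitors", "2009-10-26 10:10:32"),
        (36354, "2009-10-21", "", ""),
        (36354, "2009-10-23", "FS Students", ""),
        (36354, "2009-10-27", "FS Students", "2009-10-26 12:23:00"),
    ],
)

# Keep only the newest snapshot per ticket (no later row s2 exists),
# then count closed tickets per month and department.
query = """
SELECT substr(s1.closed, 1, 7) AS month,
       s1.department,
       COUNT(*) AS tickets_closed
FROM snapshots s1
LEFT JOIN snapshots s2
  ON s1.ticket_num = s2.ticket_num AND s1.date < s2.date
WHERE s2.ticket_num IS NULL
  AND s1.closed <> ''
GROUP BY month, s1.department
ORDER BY month, s1.department
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because each ticket contributes exactly one row after the anti-join, the COUNT(*) is a count of tickets rather than of snapshots.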

MySQL doesn't have a LAST aggregate function, so one straightforward way to do this is with a temporary table.
CREATE TEMPORARY TABLE last_dates SELECT ticket_num, MAX(date) AS date
FROM daily_snapshots GROUP BY ticket_num;
That gets you a table with the last date for each ticket. Then in your main query, join against this table on both the ticket_num and date fields. This will filter out all rows for which the date isn't the latest for the corresponding ticket number.
You might need an index on that temporary table, I'll leave that to you.

Related

SQL - join three tables based on (different) latest dates in two of them

Using Oracle SQL Developer, I have three tables with some common data that I need to join.
Appreciate any help on this!
Please refer to https://i.stack.imgur.com/f37Jh.png for the input and desired output (table formatting doesn't work on all tables).
These tables are made up in order to anonymize them, and in reality contain other data with millions of entries, but you could think of them as representing:
Product = Main product categories in a grocery store.
Subproduct = Subcategory products to the above. Each time the table is updated, the main product category may lose or gain some subproducts. E.g. you can see that from May to June the Pulled pork entered while the Fishsoup was thrown out.
Issues = Status of the products, for example an apple is bad if it has brown spots on it.
What I need to find is: for each P_NAME, find the latest updated set of subproducts (SP_ID and SP_NAME), and append that information with the latest updated issue status (STATUS_FLAG).
Please note that each main product category gets its set of subproducts updated at individual occasions i.e. 1234 and 5678 might be "latest updated" on different dates.
I have tried multiple queries but failed each time. I am using combos of SELECT, LEFT OUTER JOIN, JOIN, MAX and GROUP BY.
Latest attempt, which gives me the combo of the first two tables, but missing the third:
SELECT
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUBPRODUCT.SP_VALUE_DATE
FROM SUBPRODUCT
LEFT OUTER JOIN PRODUCT ON PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
JOIN(SELECT SP_PRODUCT_ID, MAX(SP_VALUE_DATE) AS latestdate FROM SUBPRODUCT GROUP BY SP_PRODUCT_ID) sub ON
sub.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID AND sub.latestDate = SUBPRODUCT.SP_VALUE_DATE;

Finding the row with a max value is a common SQL pattern. You can do it with a join, like your example, but it's usually clearer to use a subquery or a window function.
Correlated subquery example
select
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUBPRODUCT.SP_VALUE_DATE,
ISSUES.STATUS_FLAG, ISSUES.STATUS_LAST_UPDATED
from PRODUCT
join SUBPRODUCT
on PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
and SUBPRODUCT.SP_VALUE_DATE = (select max(S2.SP_VALUE_DATE) as latestDate
from SUBPRODUCT S2
where S2.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID)
join ISSUES
on ISSUES.ISSUE_ID = SUBPRODUCT.SP_ID
and ISSUES.STATUS_LAST_UPDATED = (select max(I2.STATUS_LAST_UPDATED) as latestDate
from ISSUES I2
where I2.ISSUE_ID = ISSUES.ISSUE_ID)
Window function / inline view example
select
PRODUCT.P_NAME,
S.SP_PRODUCT_ID, S.SP_NAME, S.SP_ID, S.SP_VALUE_DATE,
I.STATUS_FLAG, I.STATUS_LAST_UPDATED
from PRODUCT
join (select SUBPRODUCT.*,
max(SP_VALUE_DATE) over (partition by SP_PRODUCT_ID) as latestDate
from SUBPRODUCT) S
on PRODUCT.P_ID = S.SP_PRODUCT_ID
and S.SP_VALUE_DATE = S.latestDate
join (select ISSUES.*,
max(STATUS_LAST_UPDATED) over (partition by ISSUE_ID) as latestDate
from ISSUES) I
on I.ISSUE_ID = S.SP_ID
and I.STATUS_LAST_UPDATED = I.latestDate
This often performs a bit better, but window functions can be tricky to understand.
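To make the two approaches easier to compare, here is a small sketch that runs both against the same data using Python's built-in sqlite3 module (standing in for Oracle; window functions need SQLite 3.25 or newer). The lowercase table, the dates, and the "Meatballs" row are invented for illustration, since the original data is only available as an image; only the Fishsoup and Pulled pork names and the May to June change come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subproduct (sp_id INTEGER, sp_product_id INTEGER,
                         sp_name TEXT, sp_value_date TEXT);
INSERT INTO subproduct VALUES
  (1, 1234, 'Fishsoup',    '2020-05-01'),
  (2, 1234, 'Meatballs',   '2020-05-01'),
  (2, 1234, 'Meatballs',   '2020-06-01'),
  (3, 1234, 'Pulled pork', '2020-06-01'),
  (4, 5678, 'Apple',       '2020-04-01');
""")

# Correlated subquery: compare each row with the max date of its own product.
correlated = conn.execute("""
SELECT s.sp_product_id, s.sp_name, s.sp_value_date
FROM subproduct s
WHERE s.sp_value_date = (SELECT MAX(s2.sp_value_date)
                         FROM subproduct s2
                         WHERE s2.sp_product_id = s.sp_product_id)
ORDER BY s.sp_product_id, s.sp_name
""").fetchall()

# Window function: compute the per-product max once, then filter on it.
windowed = conn.execute("""
SELECT v.sp_product_id, v.sp_name, v.sp_value_date
FROM (SELECT s.*,
             MAX(s.sp_value_date) OVER (PARTITION BY s.sp_product_id) AS latest_date
      FROM subproduct s) v
WHERE v.sp_value_date = v.latest_date
ORDER BY v.sp_product_id, v.sp_name
""").fetchall()

print(correlated)
print(windowed)
```

Both queries keep every row that shares the latest date within its product, which is what lets product 1234 and product 5678 be "latest" on different dates.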

Include missing years in Group By query

I am fairly new to Access and SQL programming. I am trying to do the following:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
and group by year even when there is no amount in some of the years. I would like to have these years listed as well for a report with charts. I'm not certain if this is possible, but every bit of help is appreciated.
My code so far is as follows:
SELECT
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
FROM
Base_CustomerT
INNER JOIN (
SO_SalesOrderPaymentHistoryLineT
INNER JOIN SO_SalesOrderT
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
GROUP BY
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
SO_SalesOrderPaymentHistoryLineT.PaymentType,
Base_CustomerT.IsActive
HAVING
(((SO_SalesOrderPaymentHistoryLineT.PaymentType)=1)
AND ((Base_CustomerT.IsActive)=Yes))
ORDER BY
Base_CustomerT.SalesRep,
Base_CustomerT.Customer;

You need another table with all years listed. You can create it on the fly or keep one in the database, then join from it. If you had a table called alltheyears with a single column y that lists the years, you could use code like this:
WITH minmax as
(
select min(year(SO_SalesOrderPaymentHistoryLineT.DatePaid)) as minyear,
max(year(SO_SalesOrderPaymentHistoryLineT.DatePaid)) as maxyear
from SO_SalesOrderPaymentHistoryLineT
), yearsused as
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
left join ( /* your query above goes here */ ) T
ON year(T.DatePaid) = yearsused.y

You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
SELECT
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
Sum(NZ(data.Amount)) AS [Sum Of PaymentPerYear]
FROM
(SELECT * FROM Base_CustomerT INNER JOIN Interesting_year) AS base
LEFT JOIN
(SELECT * FROM
SO_SalesOrderT
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
WHERE SO_SalesOrderPaymentHistoryLineT.PaymentType = 1
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.year = Year(data.DatePaid)))
WHERE
(base.IsActive = Yes)
AND (base.year BETWEEN
(SELECT Min(year(DatePaid)) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(year(DatePaid)) FROM SO_SalesOrderPaymentHistoryLineT))
GROUP BY
base.SalesRep,
base.CustomerId,
base.Customer,
base.year
ORDER BY
base.SalesRep,
base.Customer;
Note the following:
The revised query first forms the Cartesian product of Base_CustomerT with Interesting_year in order to have base customer data associated with each year (this is sometimes called a CROSS JOIN, but it's the same thing as an INNER JOIN with no join predicate, which is what Access requires)
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated orders, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected from SO_SalesOrderT as in the starting query
I'm using the Access Nz() function to convert NULL payment amounts to 0 (from rows corresponding to years with no payments)
I converted your HAVING clause to a WHERE clause. That's semantically equivalent in this particular case, and it will be more efficient because the WHERE filter is applied before groups are formed, and because it allows some columns to be omitted from the GROUP BY clause.
Following Hogan's example, I filter out data for years outside the overall range covered by your data. Alternatively, you could achieve the same effect without that filter condition and its subqueries by ensuring that table Interesting_year contains only the year numbers for which you want results.
Update: modified the query to a different, but logically equivalent "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
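The pattern described in these notes (year table, Cartesian product, outer join, NULL-to-zero conversion) can be sketched in a few lines with Python's built-in sqlite3 module. This is only an illustration under assumptions: the simplified customer and payment tables and their data are made up, SQLite stands in for Access, and COALESCE plays the role of Nz().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, name TEXT);
CREATE TABLE payment (customer_id INTEGER, date_paid TEXT, amount REAL);
CREATE TABLE interesting_year (year INTEGER);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO payment VALUES
  (1, '2012-03-01', 100.0),
  (1, '2012-09-15', 50.0),
  (1, '2014-01-10', 25.0);
INSERT INTO interesting_year VALUES (2011), (2012), (2013), (2014), (2015);
""")

# Every (customer, year) pair comes from the cross join; the LEFT JOIN keeps
# pairs without payments, and COALESCE turns their NULL sum into 0.
query = """
SELECT c.name, y.year, COALESCE(SUM(p.amount), 0) AS total_paid
FROM customer c
CROSS JOIN interesting_year y
LEFT JOIN payment p
  ON p.customer_id = c.customer_id
 AND CAST(strftime('%Y', p.date_paid) AS INTEGER) = y.year
WHERE y.year BETWEEN
      (SELECT MIN(CAST(strftime('%Y', date_paid) AS INTEGER)) FROM payment)
  AND (SELECT MAX(CAST(strftime('%Y', date_paid) AS INTEGER)) FROM payment)
GROUP BY c.name, y.year
ORDER BY c.name, y.year
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that the only filter in the WHERE clause is on the year table. A condition on a payment column placed there would discard the NULL-extended rows and remove the empty years again.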

Thank you John for your help. I found a solution which works for me. It looks quite different but I learned a lot from it. If you are interested, here is how it looks now.
SELECT DISTINCTROW
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
FROM
Base_Customer_RevenueYearQ
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
GROUP BY
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
;

SQL query gets data very slowly from different tables

I am writing a SQL query to get data from different tables, but it runs very slowly.
It takes more than 2 minutes to complete.
What I am doing is this:
1. I am getting date differences, and based on the date difference I am getting account numbers.
2. I am comparing tables to get the exact data I need.
Here is my query:
select T.accountno,
MAX(T.datetxn) as MxDt,
datediff(MM,MAX(T.datetxn), '2011-6-30') as Diffs,
max(P.Name) as POName
from Account_skd A,
AccountTxn_skd T,
POName P
where A.AccountNo = T.AccountNo and
GPOCode = A.OfficeCode and
Code = A.POCode and
A.servicecode = T.ServiceCode
group by T.AccountNo
order by len(T.AccountNo) DESC
Please help me use joins or any other approach to get the data in much less time, say 5-10 seconds.

Since it appears you are getting EVERY ACCOUNT, and performance is slow, I would try creating a prequery by account alone, then joining it once to the other tables, something like:
select
T.Accountno,
T.MxDt,
datediff(MM, T.MxDt, '2011-6-30') as Diffs,
P.Name as POName
from
( select T1.AccountNo,
Max( T1.DateTxn ) MxDt
from AccountTxn_skd T1
group by T1.AccountNo ) T
JOIN Account_skd A
on T.AccountNo = A.AccountNo
JOIN POName P
on A.POCode = P.Code -- GUESSING, as you didn't qualify alias.field
AND A.OfficeCode = P.GPOCode -- in your query for these two fields
order by
len(T.AccountNo) DESC
You had other elements based on the T.ServiceCode matching, but since you are only grouping on the account number anyhow, did it matter which service code was used? Otherwise, you would need to group by both the account AND service code (which I would have added the service code into the prequery and added as join condition to the account table too).
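For readers who want to try the prequery shape on a toy data set, the sketch below uses Python's built-in sqlite3 module in place of SQL Server. The table and column names are simplified, hypothetical versions of the ones in the question, and it shows only the structure of the query (aggregate first, join afterwards); it says nothing about actual timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_no TEXT, po_code INTEGER);
CREATE TABLE account_txn (account_no TEXT, date_txn TEXT);
CREATE TABLE po_name (code INTEGER, name TEXT);
INSERT INTO account VALUES ('A1', 10), ('A2', 20);
INSERT INTO account_txn VALUES
  ('A1', '2011-01-05'), ('A1', '2011-03-20'), ('A2', '2010-12-31');
INSERT INTO po_name VALUES (10, 'North'), (20, 'South');
""")

# The derived table t collapses transactions to one row per account first,
# so the joins that follow touch one row per account instead of one per transaction.
query = """
SELECT t.account_no, t.max_date, p.name
FROM (SELECT account_no, MAX(date_txn) AS max_date
      FROM account_txn
      GROUP BY account_no) t
JOIN account a ON a.account_no = t.account_no
JOIN po_name p ON p.code = a.po_code
ORDER BY t.account_no
"""
result = conn.execute(query).fetchall()
print(result)
```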

Return data entered in column order by row

I am working on a simple timesheet module for a larger production system and need to display a table of information to the user. I have the following tables to work with:
TimeRecords: ID, WorkerID, AssyLineID, Station, Sequence, NbrHours, DateSubmitted
Workers: ID, Name
AssyLines: Name
During data entry, time is entered by AssyLine for each worker. A given worker may work on 2 or more different stations during the course of the day. The Sequence value is assigned based on the order of names as entered during data entry.
Now I want to return this data for all assembly lines and all workers in the following format:
ResultSet
Worker.ID
Worker.Name
AssyLine.Name - group returned rows by assembly line, in alphabetical order
Sequence - within each assembly line, group by sequence
NbrHours - total hours for worker for this assembly line, all stations
TotalHours - total hours for this worker across all assembly lines and stations
Other caveats:
1) The rows for a given worker should be grouped together, starting with the assembly line where they logged the most hours, in the sequence for that assembly line. I plan to consolidate all entries for a given worker into one row for display to the user and this is much easier if all rows for one user are grouped together. If that can't be done I will have to group and sort the row data in code...
Here is the query I have come up with so far:
SELECT
w.ID
,w.Name
,a.Name
,tr.NbrHours
,tr.Seq
FROM
TimeRecords tr
INNER JOIN
Workers w ON
w.ID = tr.WorkerId
INNER JOIN
AssyLines a ON
a.AssyLineID = tr.AssyLineId
WHERE
tr.DateSubmitted < '2000-01-01'
ORDER BY
w.Name
,a.Name
,tr.Seq
,NbrHours DESC
Obviously this leaves a lot to be desired. The worker entries are not grouped together and there is no overall total for the worker.
Can anyone help me get this right? I'm thinking I will need to do this with a Stored Proc rather than a view...
Thanks,
Dave

Most of this can be done with a simple group by clause; the messy part comes with your requirement for showing all hours, but I believe something like this should work depending on what DB you are using:
SELECT
w.ID
,w.Name
,a.Name
,tr.Seq
,SUM(tr.NbrHours) as nbrHours
,(SELECT SUM(tr2.NbrHours)
FROM TimeRecords tr2
WHERE tr2.WorkerId = w.id and tr2.DateSubmitted < '2000-01-01') as TotalHours
FROM
TimeRecords tr
INNER JOIN
Workers w ON
w.ID = tr.WorkerId
INNER JOIN
AssyLines a ON
a.AssyLineID = tr.AssyLineId
WHERE
tr.DateSubmitted < '2000-01-01'
GROUP BY
w.ID
,w.Name
,a.Name
,tr.Seq
ORDER BY
w.Name
,a.Name
,tr.Seq
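A minimal runnable sketch of the same idea (per-line hours from GROUP BY plus an overall total from a correlated subquery) is shown below, using Python's built-in sqlite3 module. The trimmed-down tables, column names, and sample rows are assumptions for illustration only. The ORDER BY also puts each worker's busiest assembly line first, which the question listed as a requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workers (id INTEGER, name TEXT);
CREATE TABLE time_records (worker_id INTEGER, assy_line TEXT, nbr_hours REAL);
INSERT INTO workers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO time_records VALUES
  (1, 'Line A', 2.0), (1, 'Line B', 5.0), (1, 'Line B', 1.0), (2, 'Line A', 8.0);
""")

# line_hours is summed per (worker, line); total_hours is recomputed per worker
# by a correlated subquery, so it repeats on every row for that worker.
query = """
SELECT w.name,
       tr.assy_line,
       SUM(tr.nbr_hours) AS line_hours,
       (SELECT SUM(tr2.nbr_hours)
        FROM time_records tr2
        WHERE tr2.worker_id = w.id) AS total_hours
FROM time_records tr
JOIN workers w ON w.id = tr.worker_id
GROUP BY w.id, w.name, tr.assy_line
ORDER BY w.name, line_hours DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```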

How to view the Suggestions stored in the Database for the last three months?

I am a new ASP.NET developer and I am trying to develop a simple suggestion box system. I have the following part of my database design:
User Table: Username, Name, DivisionCode... etc
Division Table: SapCode, Division
SuggestionLog Table: ID, Title, Description, submittedDate, Username
(The first attribute is the primary key in each table and the attribute (submittedDate) is of DateTime data type)
Now, I need to develop a table that shows suggestions for the last three months. I already developed a query that shows the Employee Name, Username, Division, Suggestion Title, and Suggestion Description. All I want now is to show the month. For example, to show the suggestions for the last three months, the Month column should show: Jan-2012, Dec-2011, Nov-2011. How can I do that?
My current SQL query:
SELECT dbo.SafetySuggestionsLog.Title, dbo.SafetySuggestionsLog.Description, dbo.SafetySuggestionsType.Type, dbo.SafetySuggestionsLog.Username,
dbo.employee.Name, dbo.Divisions.DivisionShortcut
FROM dbo.Divisions INNER JOIN
dbo.employee ON dbo.Divisions.SapCode = dbo.employee.DivisionCode INNER JOIN
dbo.SafetySuggestionsLog ON dbo.employee.Username = dbo.SafetySuggestionsLog.Username INNER JOIN
dbo.SafetySuggestionsType ON dbo.SafetySuggestionsLog.TypeID = dbo.SafetySuggestionsType.ID
The desired output is to display:
Employee Name, Username, Division, SuggestionTitle, SuggestionDescription, SuggestionType, Month(submissionDate)

I reformatted your query so it would fit on the page without scrolling.
Hopefully this provides what you need. It uses DATENAME to get the month and year parts of the submitted date and DATEDIFF to do the "three months ago" calculation.
Note that DATEDIFF doesn't behave as you might expect: it counts the number of period boundaries crossed (in this case month boundaries), hence the condition is
...WHERE DATEDIFF(month,SafetySuggestionsLog.submittedDate,getdate()) < 3
because the last three months have two month-end boundaries between them.
I also added an ORDER BY clause.
SELECT dbo.SafetySuggestionsLog.Title,
dbo.SafetySuggestionsLog.Description,
dbo.SafetySuggestionsType.Type,
dbo.SafetySuggestionsLog.Username,
dbo.employee.Name,
dbo.Divisions.DivisionShortcut,
left(datename(month,SafetySuggestionsLog.submittedDate),3)
+ '-'
+ datename(year,SafetySuggestionsLog.submittedDate) AS SubmittedMonth
FROM dbo.Divisions
INNER JOIN dbo.employee
ON dbo.Divisions.SapCode = dbo.employee.DivisionCode
INNER JOIN dbo.SafetySuggestionsLog
ON dbo.employee.Username = dbo.SafetySuggestionsLog.Username
INNER JOIN dbo.SafetySuggestionsType
ON dbo.SafetySuggestionsLog.TypeID = dbo.SafetySuggestionsType.ID
WHERE DATEDIFF(month,SafetySuggestionsLog.submittedDate,getdate()) < 3
ORDER BY SafetySuggestionsLog.submittedDate DESC
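To see why the filter uses "< 3", it can help to reproduce the month-boundary counting outside the database. The Python sketch below mimics how DATEDIFF(month, start, end) counts boundaries; the helper name month_boundaries and the sample dates are made up for illustration.

```python
from datetime import date


def month_boundaries(start: date, end: date) -> int:
    """Count month boundaries crossed between start and end.

    Mirrors DATEDIFF(month, start, end): only the year and month parts
    matter, the day of the month is ignored.
    """
    return (end.year - start.year) * 12 + (end.month - start.month)


today = date(2012, 1, 15)

same_month = month_boundaries(date(2012, 1, 1), today)    # January itself
two_back = month_boundaries(date(2011, 11, 1), today)     # start of November
too_old = month_boundaries(date(2011, 10, 31), today)     # last day of October

print(same_month, two_back, too_old)
```

With a current date in January 2012, every day of November, December and January yields 0, 1 or 2, so the "< 3" condition keeps exactly those three calendar months, while 31 October already counts as 3 and is excluded.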
It might also be worth noting that you don't have to fully qualify the name of all the columns in the query - it's valid SQL to alias the input tables like so:
...INNER JOIN dbo.SafetySuggestionsLog AS log
You can then refer to column names by alias in the query - e.g.
log.Username
instead of
dbo.SafetySuggestionsLog.Username
which makes it a bit easier to read.
