SQL: Find the latest record only if the COMPLETE field is 0

I have a table with multiple records submitted by a user. In each record is a field called COMPLETE to indicate if a record is fully completed or not.
I need a way to get the latest record for each user, LOCATION, and DATE combination where COMPLETE is 0 and no other record for that combination has COMPLETE = 1. Each record also has additional fields such as TYPE, AMOUNT, TOTAL, etc. These can differ even when the NAME, LOCATION, and DATE are the same.
There is also a SUB_DATE field, the day the submission was made, and an auto-incremented ID field. Here is the table:
ID  NAME   LOCATION  DATE        COMPLETE  SUB_DATE    TYPE1   AMOUNT1  TYPE2  AMOUNT2  TOTAL
1   user1  loc1      2017-09-15  1         2017-09-10  Food    12.25    Hotel  65.54    77.79
2   user1  loc1      2017-09-15  0         2017-09-11  Food    12.25    NULL   0        12.25
3   user1  loc2      2017-08-13  0         2017-09-05  Flight  140      Food   5        145.00
4   user1  loc2      2017-08-13  0         2017-09-10  Flight  140      NULL   0        140
5   user1  loc3      2017-07-14  0         2017-07-15  Taxi    25       NULL   0        25
6   user1  loc3      2017-08-25  1         2017-08-26  Food    45       NULL   0        45
I would like to retrieve ID 4, because its SUB_DATE is later than that of ID 3, which has the same NAME, LOCATION, and DATE, and no record in that group has COMPLETE = 1.
I would also like to retrieve ID 5, since it is the latest record for that NAME, LOCATION, and DATE, and its COMPLETE is 0.
I would also appreciate it if you could explain your answer to help me understand what is happening in the solution.

Not sure if I fully understood, but try this:
SELECT *
FROM (
    SELECT *,
        MAX(CONVERT(INT, COMPLETE)) OVER (PARTITION BY NAME, LOCATION, DATE) AS CompleteForNameLocationAndDate,
        MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
    FROM your_table t
) a
WHERE CompleteForNameLocationAndDate = 0
    AND SUB_DATE = LastSubDate
So what we have done here:
First, if you run just the inner query in Management Studio, you will see what that does:
The first MAX function partitions the data in the table by each unique Name, Location, Date set.
In your data, IDs 1 and 2 form the first partition, 3 and 4 the second, 5 the third, and 6 the fourth.
For each of these partitions it gets the maximum value in the COMPLETE column, so any partition whose maximum is 1 has been completed.
Note also the CONVERT function. It is there because COMPLETE is of datatype BIT (1 or 0) and the MAX function does not work with that datatype, so we convert to INT. If your COMPLETE column is type INT, you can take the CONVERT out.
The second MAX function partitions by unique Name, Location, and Date again, but this time gets the maximum SUB_DATE, which gives us the date of the latest record for each Name, Location, Date.
So we take that query and wrap it in a derived table, which for simplicity we call a. We need to do this because SQL Server doesn't allow windowed functions in the WHERE clause of a query. A windowed function is one that uses the OVER keyword, as we have done. In an ideal world, SQL would let us do:
SELECT *,
    MAX(CONVERT(INT, COMPLETE)) OVER (PARTITION BY NAME, LOCATION, DATE) AS CompleteForNameLocationAndDate,
    MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
FROM your_table t
WHERE MAX(CONVERT(INT, COMPLETE)) OVER (PARTITION BY NAME, LOCATION, DATE) = 0
    AND SUB_DATE = MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE)
But it doesn't allow it so we have to use the derived table.
Then we SELECT everything from our derived table WHERE
CompleteForNameLocationAndDate = 0
which keeps only the Name, Location, Date partitions that do not have a record marked as complete.
Then we filter further, asking for only the latest record in each partition:
SUB_DATE = LastSubDate
Hope that makes sense; I'm not sure what level of detail you need.
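To see the derived-table query run end to end, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's standard sqlite3 module. SQLite (3.25 or later, which added window functions) is an assumption standing in for SQL Server; because SQLite stores COMPLETE as a plain integer, the CONVERT is dropped. The sketch also checks that the "ideal world" form, with the window function directly in WHERE, is rejected.

```python
import sqlite3

# Sample rows from the question (only the columns the query needs).
ROWS = [
    (1, "user1", "loc1", "2017-09-15", 1, "2017-09-10"),
    (2, "user1", "loc1", "2017-09-15", 0, "2017-09-11"),
    (3, "user1", "loc2", "2017-08-13", 0, "2017-09-05"),
    (4, "user1", "loc2", "2017-08-13", 0, "2017-09-10"),
    (5, "user1", "loc3", "2017-07-14", 0, "2017-07-15"),
    (6, "user1", "loc3", "2017-08-25", 1, "2017-08-26"),
]

# Derived-table version: the window functions are computed in the inner
# query and only filtered in the outer WHERE.
LATEST_INCOMPLETE = """
SELECT ID
FROM (
    SELECT t.*,
           MAX(COMPLETE) OVER (PARTITION BY NAME, LOCATION, DATE) AS CompleteForNameLocationAndDate,
           MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
    FROM your_table t
) a
WHERE CompleteForNameLocationAndDate = 0
  AND SUB_DATE = LastSubDate
ORDER BY ID
"""

# Window function placed directly in WHERE: the engine refuses to prepare it.
WINDOW_IN_WHERE = """
SELECT ID FROM your_table
WHERE MAX(COMPLETE) OVER (PARTITION BY NAME, LOCATION, DATE) = 0
"""


def latest_incomplete_ids():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (ID INTEGER, NAME TEXT, LOCATION TEXT, "
        "DATE TEXT, COMPLETE INTEGER, SUB_DATE TEXT)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(LATEST_INCOMPLETE)]
    try:
        conn.execute(WINDOW_IN_WHERE)
        window_in_where_allowed = True
    except sqlite3.Error:
        window_in_where_allowed = False
    conn.close()
    return ids, window_in_where_allowed


if __name__ == "__main__":
    print(latest_incomplete_ids())
```

With the sample data only IDs 4 and 5 survive both filters, matching what the question asks for.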
As an aside, I would look at restructuring your tables as follows (unless, of course, you have simplified them to better explain this problem):
(Assuming the table in your examples is called Booking)
tblBooking
    BookingID
    PersonID
    LocationID
    Date
    Complete
    SubDate
tblPerson
    PersonID
    PersonName
tblLocation
    LocationID
    LocationName
tblType
    TypeID
    TypeName
tblBookingType
    BookingTypeID
    BookingID
    TypeID
    Amount
This way, if you ever want to add a Type3 or Type4 to your booking information, you don't need to alter your table layout.
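A minimal SQLite sketch of the suggested layout, run through Python's sqlite3 module. Only the table and column names come from the suggestion; the column types, the sample booking, the hypothetical third expense type ("Taxi" on the loc1 booking), and the SUM query are illustrative assumptions. A third expense line is just another row in tblBookingType, with no ALTER TABLE, and in this sketch the total is derived rather than stored.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tblPerson   (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE tblLocation (LocationID INTEGER PRIMARY KEY, LocationName TEXT);
CREATE TABLE tblType     (TypeID INTEGER PRIMARY KEY, TypeName TEXT);
CREATE TABLE tblBooking (
    BookingID  INTEGER PRIMARY KEY,
    PersonID   INTEGER REFERENCES tblPerson(PersonID),
    LocationID INTEGER REFERENCES tblLocation(LocationID),
    Date       TEXT,
    Complete   INTEGER,
    SubDate    TEXT
);
CREATE TABLE tblBookingType (
    BookingTypeID INTEGER PRIMARY KEY,
    BookingID     INTEGER REFERENCES tblBooking(BookingID),
    TypeID        INTEGER REFERENCES tblType(TypeID),
    Amount        REAL
);
"""


def booking_totals():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO tblPerson VALUES (1, 'user1')")
    conn.execute("INSERT INTO tblLocation VALUES (1, 'loc1')")
    conn.executemany("INSERT INTO tblType VALUES (?, ?)",
                     [(1, "Food"), (2, "Hotel"), (3, "Taxi")])
    conn.execute(
        "INSERT INTO tblBooking VALUES (1, 1, 1, '2017-09-15', 1, '2017-09-10')"
    )
    # Three expense lines on one booking (the Taxi line is hypothetical):
    # a third type needs no new column, only a new row.
    conn.executemany(
        "INSERT INTO tblBookingType (BookingID, TypeID, Amount) VALUES (?, ?, ?)",
        [(1, 1, 12.25), (1, 2, 65.54), (1, 3, 25.0)],
    )
    # The booking total is computed with SUM() instead of a stored TOTAL column.
    rows = conn.execute(
        """
        SELECT b.BookingID, COUNT(*) AS lines, ROUND(SUM(bt.Amount), 2) AS total
        FROM tblBooking b
        JOIN tblBookingType bt ON bt.BookingID = b.BookingID
        GROUP BY b.BookingID
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(booking_totals())
```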

Related

Query to select appropriate row and calculate elapsed time

I need some help in coming up with a query that will return the answer to the question “How long has a Help Desk Ticket been owned by the currently assigned group?” Following is a subset of the data model with some sample data:
Help Desk Cases
Case ID (PK)  Assigned Person  Assigned Group
123456        Robert           Hardware

Help Desk Case Assignment History
Case ID (PK)  Seq # (PK)  Assigned Group  Assigned Person  Elapsed Time  Row Added Date/Time
123456        1           Hardware                         10
123456        2           Software                         2
123456        3           Hardware        Sam              1
123456        4           Software        Sophie           6
123456        5           Hardware                         8
123456        6           Hardware        Sam              3
123456        7           Hardware        Robert
The Elapsed Time column for the most recent row (Seq #7) is not updated until a subsequent row (Seq #8) is written, so I don’t think I can use an aggregate function. For the sample data above, I need to get the Row Added column from Seq # 5 and subtract it from the current date to get the total amount of time the case has been most recently assigned to the Hardware group (we ignore previous assignments such as Seq # 1 and Seq # 3).
The query output for the example above should be:
Case ID  Assigned Group  Assigned Person  Time Owned
123456   Hardware        Robert           Current Date - Seq #5 Row Added Date/Time
With Oracle 12c and higher...
select case_id,
       last_assigned_group as assigned_group,
       last_assigned_person as assigned_person,
       systimestamp - first_row_added as time_owned
from help_desk_case_assignment_history
match_recognize (
    partition by case_id
    order by seq#
    measures
        first(row_added) as first_row_added,
        last(assigned_group) as last_assigned_group,
        last(assigned_person) as last_assigned_person
    one row per match
    after match skip past last row
    pattern (
        assignment_run* case_end
    )
    define
        assignment_run as (assigned_group = next(assigned_group)),
        case_end as (elapsed_time is null or next(assigned_group) is null)
)
;
In human words: for each help desk case ID, find the last uninterrupted "run" of assignments within the same group. For that last run, identify its starting time, ending time, and ending person, and display the values found.
With Oracle 11g and lower...
with xyz as (
    select X.*,
           case when lnnvl(assigned_group = lag(assigned_group) over (partition by case_id order by seq#))
                then seq# end as assignment_run_start
    from help_desk_case_assignment_history X
),
xyz2 as (
    select X.*,
           last_value(assignment_run_start) ignore nulls over (partition by case_id order by seq#) as assignment_run_id
    from xyz X
),
xyz3 as (
    select case_id, assigned_group, assignment_run_id,
           max(assigned_person) keep (dense_rank last order by seq#) as last_assigned_person,
           systimestamp - min(row_added) keep (dense_rank first order by seq#) as time_owned,
           row_number() over (partition by case_id order by assignment_run_id desc) as last_group_ind
    from xyz2 X
    group by case_id, assigned_group, assignment_run_id
)
select case_id, assigned_group, last_assigned_person as assigned_person, time_owned
from xyz3
where last_group_ind = 1
;
Perhaps ugly, but pretty straightforward and working.
In human words:
Identify the boundaries (starts) of assignment runs as increasing numeric IDs.
Extend the found assignment run starts to the whole assignment runs.
Calculate the assignments' run times and last assigned persons.
Restrict the previous calculation to the last (by their ID) assignment run only.
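SQLite has neither MATCH_RECOGNIZE nor Oracle's LNNVL, IGNORE NULLS, or KEEP (DENSE_RANK ...), so the following sketch swaps in a common portable variant of the same four steps: LAG() flags where a run starts, and a running SUM() of those flags numbers the runs. It is run through Python's sqlite3 module on the sample history; the column name seq (for seq#), the table name history, and all row_added timestamps are hypothetical, because the question's sample leaves that column blank. The "current date" is passed in so the result is reproducible.

```python
import sqlite3
from datetime import datetime

# (case_id, seq, assigned_group, assigned_person, row_added)
# The row_added values are hypothetical; the question's sample leaves them blank.
HISTORY = [
    (123456, 1, "Hardware", None,     "2024-01-01 08:00:00"),
    (123456, 2, "Software", None,     "2024-01-01 18:00:00"),
    (123456, 3, "Hardware", "Sam",    "2024-01-01 20:00:00"),
    (123456, 4, "Software", "Sophie", "2024-01-01 21:00:00"),
    (123456, 5, "Hardware", None,     "2024-01-02 03:00:00"),
    (123456, 6, "Hardware", "Sam",    "2024-01-02 11:00:00"),
    (123456, 7, "Hardware", "Robert", "2024-01-02 14:00:00"),
]

QUERY = """
WITH flagged AS (   -- step 1: 1 where the group differs from the previous row
    SELECT h.*,
           CASE WHEN assigned_group IS
                     LAG(assigned_group) OVER (PARTITION BY case_id ORDER BY seq)
                THEN 0 ELSE 1 END AS is_run_start
    FROM history h
),
numbered AS (       -- step 2: running total of the flags gives each run an ID
    SELECT f.*,
           SUM(is_run_start) OVER (PARTITION BY case_id ORDER BY seq) AS run_id
    FROM flagged f
),
per_run AS (        -- step 3: start time and last person of every run
    SELECT n.*,
           MAX(run_id) OVER (PARTITION BY case_id) AS max_run_id,
           FIRST_VALUE(assigned_person)
               OVER (PARTITION BY case_id, run_id ORDER BY seq DESC) AS last_person,
           MIN(row_added) OVER (PARTITION BY case_id, run_id) AS run_started
    FROM numbered n
)
SELECT DISTINCT case_id, assigned_group, last_person, run_started
FROM per_run
WHERE run_id = max_run_id   -- step 4: keep only each case's latest run
"""


def current_ownership(now):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (case_id INTEGER, seq INTEGER, assigned_group TEXT, "
        "assigned_person TEXT, row_added TEXT)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", HISTORY)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Time owned = current time minus the start of the latest run (Seq #5 here).
    return [(case_id, group, person, now - datetime.fromisoformat(started))
            for case_id, group, person, started in rows]


if __name__ == "__main__":
    print(current_ownership(datetime(2024, 1, 3, 3, 0, 0)))
```

The null-safe IS comparison plays the role of LNNVL here: on the first row LAG() returns NULL, the comparison is false, and the row is flagged as a run start.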

How many customers upgraded from Product A to Product B?

I have a "daily changes" table that records when a customer "upgrades" or "downgrades" their membership level. In the table, let's say field 1 is the customer ID, field 2 is the membership type, and field 3 is the date of the change. Customers 123 and ABC each have two rows in the table: the values in field 1 (ID) are the same, but the values in field 2 (TYPE) and field 3 (DATE) differ. I'd like to write a SQL query that tells me how many customers "upgraded" from membership type 1 to membership type 2, and how many customers "downgraded" from membership type 2 to membership type 1, in any given time frame.
The table also shows other types of changes. To identify the records with changes in the membership type field, I've created the following code:
SELECT *
FROM member_detail_daily_changes_new
WHERE customer IN (
SELECT customer
FROM member_detail_daily_changes_new
GROUP BY customer
HAVING COUNT(distinct member_type_cd) > 1)
I'd like to see an end report which tells me:
For Fiscal 2018,
X,XXX customers moved from Member Type 1 to Member Type 2 and
X,XXX customers moved from Member Type 2 to Member type 1
This sounds like a good time to use the LEAD() analytic function: look ahead to a given customer's next member type, compare it to the current record to decide whether the change is an upgrade or a downgrade, and then sum the results.
WITH CTE AS (
    SELECT CASE WHEN LEAD(Member_Type_Code) OVER (PARTITION BY Customer ORDER BY Date ASC) > Member_Type_Code THEN 1 ELSE 0 END AS Upgrade,
           CASE WHEN LEAD(Member_Type_Code) OVER (PARTITION BY Customer ORDER BY Date ASC) < Member_Type_Code THEN 1 ELSE 0 END AS Downgrade
    FROM member_detail_daily_changes_new
    WHERE Date BETWEEN '20190101' AND '20190201'
)
SELECT SUM(Upgrade) AS upgrades, SUM(Downgrade) AS downgrades
FROM CTE
Using my sample data, this gives:
+----+----------+------------+
| | upgrades | downgrades |
+----+----------+------------+
| 1 | 3 | 2 |
+----+----------+------------+
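I wasn't sure whether SQL Server Express on Rextester simply doesn't support SUM() directly over the analytic function, which is why I had to add the CTE. In fact this applies to every edition: SQL Server does not allow a windowed function inside an aggregate, so the CTE (or a derived table) is required.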
Some other notes:
I let the system implicitly cast the dates in the WHERE clause.
I assume the Member_Type_Code itself tells me whether a change is an upgrade or a downgrade, which long term probably isn't right. Say we add membership type 3 and it ranks between 1 and 2: now what? So we may need a separate ranking value (for example, a decimal) outside of Member_Type_Code, so we can handle future memberships and tell whether a change is an upgrade, a downgrade, or a lateral move.
I assumed all upgrades/downgrades are counted, so a customer can be counted multiple times if their membership changed that often in the desired time period.
I assume two membership changes for the same customer can't occur at the same date/time; otherwise the sort order for LEAD may not work correctly. (If it's a timestamp field, this shouldn't be an issue.)
So how does this work?
We use a common table expression (CTE) to evaluate whether each change for each customer is an upgrade or a downgrade, and then we sum the results. This could also be done inline with a derived table, but I find CTEs easier to read.
LEAD(Member_Type_Code) OVER (PARTITION BY Customer ORDER BY Date ASC) does the following:
It groups the data by customer and then sorts each customer's rows by date in ascending order.
So each customer's records end up in consecutive rows ordered by date. LEAD(field) then starts on record 1, looks ahead to record 2 for the same customer, and returns record 2's Member_Type_Code on record 1. We can then compare those type codes to determine whether an upgrade or downgrade occurred, and sum the results of the comparison to get the desired totals.
And now we have a long winded explanation for a very small query :P
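A minimal sketch of the LEAD() approach, run in SQLite through Python's sqlite3 module rather than SQL Server. The answer's Rextester sample data isn't shown, so the rows below are hypothetical, and the column names follow the question (customer, member_type_cd) plus an assumed changedate column. As in the answer, a higher type code counts as an upgrade.

```python
import sqlite3

# Hypothetical rows: (customer, member_type_cd, changedate).
CHANGES = [
    ("123", 1, "2018-03-01"), ("123", 2, "2018-06-01"),  # 1 -> 2: upgrade
    ("ABC", 2, "2018-02-01"), ("ABC", 1, "2018-05-01"),  # 2 -> 1: downgrade
    ("XYZ", 1, "2018-01-15"), ("XYZ", 2, "2018-04-01"),  # 1 -> 2: upgrade
]

QUERY = """
WITH CTE AS (
    -- LEAD() returns the same customer's next member type. On a customer's
    -- last row it is NULL, both comparisons are NULL, and CASE yields 0.
    SELECT CASE WHEN LEAD(member_type_cd) OVER (PARTITION BY customer ORDER BY changedate)
                     > member_type_cd THEN 1 ELSE 0 END AS upgrade,
           CASE WHEN LEAD(member_type_cd) OVER (PARTITION BY customer ORDER BY changedate)
                     < member_type_cd THEN 1 ELSE 0 END AS downgrade
    FROM member_detail_daily_changes_new
    WHERE changedate BETWEEN ? AND ?
)
SELECT SUM(upgrade), SUM(downgrade) FROM CTE
"""


def count_changes(start, end):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE member_detail_daily_changes_new "
        "(customer TEXT, member_type_cd INTEGER, changedate TEXT)"
    )
    conn.executemany(
        "INSERT INTO member_detail_daily_changes_new VALUES (?, ?, ?)", CHANGES
    )
    result = conn.execute(QUERY, (start, end)).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_changes("2018-01-01", "2018-12-31"))
```

Note that the date filter here sits inside the CTE, before LEAD() runs, exactly as in the answer; a change whose earlier row falls outside the range is therefore not seen.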
You want to use lag() for this, but you need to be careful about the date filtering. So, I think you want:
SELECT prev_membership_type, membership_type,
       COUNT(*) AS num_changes,
       COUNT(DISTINCT customer_id) AS num_members
FROM (SELECT mddc.*,
             LAG(mddc.membership_type) OVER (PARTITION BY mddc.customer_id ORDER BY mddc.date) AS prev_membership_type
      FROM member_detail_daily_changes_new mddc
     ) mddc
WHERE prev_membership_type <> membership_type AND
      date >= '2018-01-01' AND
      date < '2019-01-01'
GROUP BY membership_type, prev_membership_type;
Notes:
The filtering on date needs to occur after the calculation of lag().
This takes into account that members may have a certain type in 2017 and then change to a new type in 2018.
The date filtering is compatible with indexes.
Two values are calculated. One is the overall number of changes. The other counts each member only once for each type of change.
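The first note (filter after LAG()) is easy to check with a small experiment. This sketch runs the query in SQLite through Python's sqlite3 module, once with the date filter outside the LAG() subquery and once with it inside; the rows are hypothetical, with customer 1 holding type 1 in 2017 and switching to type 2 in 2018.

```python
import sqlite3

# Hypothetical rows: (customer_id, membership_type, date).
ROWS = [
    (1, 1, "2017-11-01"), (1, 2, "2018-02-01"),  # 2017 type, changed in 2018
    (2, 2, "2018-01-10"), (2, 1, "2018-05-01"), (2, 2, "2018-10-01"),
    (3, 1, "2018-03-01"),                        # new member, no change
    (4, 2, "2018-12-01"), (4, 1, "2019-01-15"),  # change falls in 2019
]

COUNTS = """
SELECT prev_membership_type, membership_type,
       COUNT(*) AS num_changes, COUNT(DISTINCT customer_id) AS num_members
FROM (SELECT mddc.*,
             LAG(mddc.membership_type)
                 OVER (PARTITION BY mddc.customer_id ORDER BY mddc.date) AS prev_membership_type
      FROM member_detail_daily_changes_new mddc
      {inner_filter}) mddc
WHERE prev_membership_type <> membership_type
  {outer_filter}
GROUP BY prev_membership_type, membership_type
ORDER BY prev_membership_type, membership_type
"""
RANGE = "date >= '2018-01-01' AND date < '2019-01-01'"


def compare_filter_placement():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE member_detail_daily_changes_new "
        "(customer_id INTEGER, membership_type INTEGER, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO member_detail_daily_changes_new VALUES (?, ?, ?)", ROWS
    )
    # Filter applied after LAG(): the 2017 row still supplies the previous type.
    after = conn.execute(
        COUNTS.format(inner_filter="", outer_filter="AND " + RANGE)).fetchall()
    # Filter applied before LAG(): customer 1's 2018 row has no previous row.
    before = conn.execute(
        COUNTS.format(inner_filter="WHERE " + RANGE, outer_filter="")).fetchall()
    conn.close()
    return after, before


if __name__ == "__main__":
    print(compare_filter_placement())
```

Filtering first loses customer 1's 1 -> 2 change, so the 1 -> 2 row drops from two changes to one.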
With conditional aggregation after self joining the table:
select
    2018 fiscal,
    sum(case when m.member_type_cd > t.member_type_cd then 1 else 0 end) upgrades,
    sum(case when m.member_type_cd < t.member_type_cd then 1 else 0 end) downgrades
from member_detail_daily_changes_new m
inner join member_detail_daily_changes_new t
    on t.customer = m.customer
    and t.changedate = (
        select max(changedate) from member_detail_daily_changes_new
        where customer = m.customer and changedate < m.changedate
    )
where year(m.changedate) = 2018
This will work even if there are more than 2 types of membership level.
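A minimal sketch of the self-join approach in SQLite through Python's sqlite3 module. SQLite has no YEAR() function, so strftime('%Y', ...) is swapped in; the rows are hypothetical. For each change m, the correlated subquery finds the same customer's most recent earlier row t, and the CASE expressions compare the two type codes.

```python
import sqlite3

# Hypothetical rows: (customer, member_type_cd, changedate).
ROWS = [
    (1, 1, "2017-11-01"), (1, 2, "2018-02-01"),
    (2, 2, "2018-01-10"), (2, 1, "2018-05-01"), (2, 2, "2018-10-01"),
    (3, 1, "2018-03-01"),
    (4, 2, "2018-12-01"), (4, 1, "2019-01-15"),
]

QUERY = """
SELECT 2018 AS fiscal,
       SUM(CASE WHEN m.member_type_cd > t.member_type_cd THEN 1 ELSE 0 END) AS upgrades,
       SUM(CASE WHEN m.member_type_cd < t.member_type_cd THEN 1 ELSE 0 END) AS downgrades
FROM member_detail_daily_changes_new m
INNER JOIN member_detail_daily_changes_new t
    ON t.customer = m.customer
   -- t is the customer's previous row; a first row has none and drops out.
   AND t.changedate = (
       SELECT MAX(changedate) FROM member_detail_daily_changes_new
       WHERE customer = m.customer AND changedate < m.changedate
   )
WHERE strftime('%Y', m.changedate) = '2018'
"""


def upgrades_downgrades_2018():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE member_detail_daily_changes_new "
        "(customer INTEGER, member_type_cd INTEGER, changedate TEXT)"
    )
    conn.executemany(
        "INSERT INTO member_detail_daily_changes_new VALUES (?, ?, ?)", ROWS
    )
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(upgrades_downgrades_2018())
```

Because the previous row is looked up without a date filter, customer 1's 2017 row is still used as the baseline for its 2018 change, which mirrors the "filter after LAG()" point in the previous answer.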

SQL Query to return one record for a column with duplicated values based on the date of another column

I am sure this question may have been answered before, but it's hard to phrase, and I've spent a couple of hours on Google and still not found a solution.
I have a table (view) that has a record of device serial numbers that are rented out. The row is created by a sales order when we ship out a device (Status = Shipped) and then when the device is returned the same record is updated (Status = Returned). The TransDate is updated when shipped and then when returned.
Once a device is back in our warehouse, we rent it out again, this time with a new order (because 99% of the time it will be a different customer). So we get a new row in the table, with a new order number but, of course, the same serial number.
Here is some example data (the real table has many additional fields; RecId is a record ID used only to describe the issue):
SerNo  TransDate  OrdNo  Status    RecId
1111   20170105   1234   Returned  1
2222   20161220   1235   Shipped   2
3333   20170105   1235   Returned  3
4444   20170105   1236   Returned  4
1111   20170115   1311   Returned  5
4444   20170110   1312   Shipped   6
6666   20170110   1313   Shipped   7
1111   20170125   1401   Shipped   8
My challenge is that I need a Select query that will return just one record for every serial number that is in the table... and where there is more than one record for the same serial number, I need the one with the latest date.
In other words, the result would include record IDs
2, 6, 7, 8 (devices out at customers) AND 3 (because it has been returned but not re-shipped).
(Records 1, 4, 5 have been returned, but then rented out again, so they are now just historical records and do not represent current status).
I know GROUP BY will not work because I have to aggregate the other fields, and I need all the other fields (there are many more) from the record with the latest date for a given serial number.
We are running this on SQL Server 2012.
Thank you in advance!
You can use the row_number function to get the first row for each serno ordered by descending transdate.
select serno,transdate,ordno,status
from (
select t.*, row_number() over(partition by serno order by transdate desc) as rnum
from data t
) x
where rnum=1
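A quick way to check this against the question's sample is to run the same query in SQLite through Python's sqlite3 module (SQLite 3.25 or later, standing in for SQL Server 2012). The RecId column is included only so the result can be compared with the record IDs the question expects.

```python
import sqlite3

# Sample rows from the question: (SerNo, TransDate, OrdNo, Status, RecId).
ROWS = [
    ("1111", "20170105", 1234, "Returned", 1),
    ("2222", "20161220", 1235, "Shipped", 2),
    ("3333", "20170105", 1235, "Returned", 3),
    ("4444", "20170105", 1236, "Returned", 4),
    ("1111", "20170115", 1311, "Returned", 5),
    ("4444", "20170110", 1312, "Shipped", 6),
    ("6666", "20170110", 1313, "Shipped", 7),
    ("1111", "20170125", 1401, "Shipped", 8),
]

QUERY = """
SELECT RecId, SerNo, TransDate, OrdNo, Status
FROM (
    SELECT t.*,
           -- rnum = 1 marks the newest row for each serial number
           ROW_NUMBER() OVER (PARTITION BY SerNo ORDER BY TransDate DESC) AS rnum
    FROM data t
) x
WHERE rnum = 1
ORDER BY RecId
"""


def current_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (SerNo TEXT, TransDate TEXT, OrdNo INTEGER, "
        "Status TEXT, RecId INTEGER)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", ROWS)
    rec_ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return rec_ids


if __name__ == "__main__":
    print(current_rows())
```

If one serial number could ever have two rows with the same TransDate, ROW_NUMBER() would pick one of them arbitrarily; adding a tie-breaker to the ORDER BY (for example OrdNo DESC, assuming higher order numbers are newer) makes the choice deterministic.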

Always Include Certain Records in Daterange

Sorry if this is too general a question, but I couldn't find much material on this.
I'm wondering if there is any way in SQL or Tableau to always include certain records despite changes in a date range?
For example, I have 200 records that range from 1940-2004 and want 2 or 3 of these records to always be returned by the query (which includes a date range statement). Is there a known method?
I'd like to avoid altering the date attributes based on the date range statement itself, if possible.
Initial data:
Person_ID | Group | Date
ID 1      | 2     | 1-1-2003
ID 2      | 1     | 1-1-1994
ID 3      | 1     | 1-1-1985
ID 4      | 1     | 1-1-1992
ID 5      | 2     | 1-1-1991
ID 6      | 2     | 1-1-2002
ID 7      | 1     | 1-1-2003
ID 8      | 2     | 1-1-2005
ID 9      | 2     | 1-1-1999
ID 10     |       | 1-1-2002
ID 11     |       | 1-1-1989
For my results, I want it to be possible so that, no matter the date range I select, ID 10 and ID 11 are included. That is,
SELECT Person_ID
FROM table
WHERE DATE BETWEEN date1 AND date2
should always yield ID 10 and ID 11, no matter which dates are entered.
I don't know much about Tableau, but you can try this:
SELECT Person_ID
FROM table
WHERE (DATE BETWEEN date1 AND date2) OR Person_ID = 10 OR Person_ID = 11
If you want help figuring out queries, try to say what you want as literally as possible using SQL words. In this case you could say "I want to select the person_id from table where the date is between date1 and date2, or the person_id is 10, or the person_id is 11". If you ever find yourself saying "but" (e.g., "date between date1 and date2, but if their ID is 10 then include it too"), you can most likely put an OR there :). Without the OR it sounds more natural, at least in my opinion: "where the date is between date1 and date2, but if the person_id is 10 or 11, also include it". Hope that helps!
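A minimal sketch of the OR approach in SQLite via Python's sqlite3 module. The table name people is hypothetical (table is a reserved word), Person_ID is stored as a plain integer, and the question's dates are rewritten as ISO strings so BETWEEN compares them correctly; the Group column is omitted because the query doesn't use it.

```python
import sqlite3

# Sample rows from the question: (Person_ID, Date), dates as ISO strings.
PEOPLE = [
    (1, "2003-01-01"), (2, "1994-01-01"), (3, "1985-01-01"), (4, "1992-01-01"),
    (5, "1991-01-01"), (6, "2002-01-01"), (7, "2003-01-01"), (8, "2005-01-01"),
    (9, "1999-01-01"), (10, "2002-01-01"), (11, "1989-01-01"),
]

# The parentheses keep the date range as one condition; IDs 10 and 11 are
# OR-ed in, so they match regardless of the range.
QUERY = """
SELECT Person_ID
FROM people
WHERE (Date BETWEEN ? AND ?) OR Person_ID = 10 OR Person_ID = 11
ORDER BY Person_ID
"""


def always_include(date1, date2):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (Person_ID INTEGER, Date TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", PEOPLE)
    ids = [row[0] for row in conn.execute(QUERY, (date1, date2))]
    conn.close()
    return ids


if __name__ == "__main__":
    print(always_include("2000-01-01", "2004-12-31"))
    print(always_include("1940-01-01", "1950-12-31"))
```

The parentheses matter once more conditions are added: AND binds tighter than OR, so an extra AND filter must wrap the whole OR expression in parentheses if it is meant to apply to IDs 10 and 11 as well.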

Get last value based on an Id in SQL Server 2005

Consider my query,
Select EmpId,RemainingBalance from Salary where EmpId='15'
My results pane,
15 450.00
15 350.00
15 250.00
How do I get the last RemainingBalance amount (i.e., 250.00)?
Presumably you have a datetime in the table that can be used to determine which is the latest record, so you can use this:
SELECT TOP 1 EmpId, RemainingBalance
FROM Salary
WHERE EmpId = '15'
ORDER BY SomeDateTimeField DESC
If you don't have such a datetime field indicating when a record was created, then you need another field that implies the same thing (e.g., an IDENTITY field, where a greater number means a more recent record); the approach would be the same as above.
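A minimal sketch of the IDENTITY-style variant, run in SQLite through Python's sqlite3 module. SQLite has no TOP, so ORDER BY ... DESC LIMIT 1 is the equivalent here, and an INTEGER PRIMARY KEY column (named SalaryID, a hypothetical name) stands in for an IDENTITY column. Without an ORDER BY, the database is free to return the rows in any order, so "last" has no defined meaning.

```python
import sqlite3


def last_remaining_balance(emp_id):
    conn = sqlite3.connect(":memory:")
    # SalaryID is assigned automatically and increases with each insert,
    # like an IDENTITY column in SQL Server.
    conn.execute(
        "CREATE TABLE Salary (SalaryID INTEGER PRIMARY KEY, EmpId TEXT, "
        "RemainingBalance REAL)"
    )
    conn.executemany(
        "INSERT INTO Salary (EmpId, RemainingBalance) VALUES (?, ?)",
        [("15", 450.00), ("15", 350.00), ("15", 250.00)],
    )
    # Highest SalaryID first, keep one row: the most recently inserted balance.
    row = conn.execute(
        "SELECT EmpId, RemainingBalance FROM Salary WHERE EmpId = ? "
        "ORDER BY SalaryID DESC LIMIT 1",
        (emp_id,),
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(last_remaining_balance("15"))
```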