SQL query to get latest user to update record

I have a postgres database that contains an audit log table which holds a historical log of updates to documents. It contains which document was updated, which field was updated, which user made the change, and when the change was made. Some sample data looks like this:
doc_id | user_id | created_date | field | old_value | new_value
--------+---------+------------------------+-------------+---------------+------------
A | 1 | 2018-07-30 15:43:44-05 | Title | | War and Piece
A | 2 | 2018-07-30 15:45:13-05 | Title | War and Piece | War and Peas
A | 1 | 2018-07-30 16:05:59-05 | Title | War and Peas | War and Peace
B | 1 | 2018-07-30 15:43:44-05 | Description | test 1 | test 2
B | 2 | 2018-07-30 17:45:44-05 | Description | test 2 | test 3
You can see that the Title of document A was changed three times, first by user 1 then by user 2, then again by user 1.
Basically I need to know which user was the last one to update a field on a particular document. So for example, I need to know that User 1 was the last user to update the Title field on document A. I don't really care what time it happened, just the document, field, and user.
So sample output would be something like this:
doc_id | field | user_id
--------+-------------+---------
A | Title | 1
B | Description | 2
Seems like it should be a fairly straightforward query to write, but I'm having some trouble with it. I would think that group by would be in order, but the problem is that if I group by doc_id I lose the user data:
select doc_id, max(created_date)
from document_history
group by doc_id;
doc_id | max
--------+------------------------
B | 2018-07-30 17:45:44-05
A | 2018-07-30 16:05:59-05
I could join these results back to the document_history table, but I would need to do so based on the doc_id and timestamp, which doesn't seem quite right: if two people edited a document at the exact same time, I would get multiple rows back for that document and field. Maybe that's so unlikely I shouldn't worry about it, but still...
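Something like this is what I mean by the join-back (just a sketch of that idea):
select dh.doc_id, dh.field, dh.user_id
from document_history dh
join (select doc_id, field, max(created_date) as max_created
      from document_history
      group by doc_id, field) latest
  on latest.doc_id = dh.doc_id
 and latest.field = dh.field
 and latest.max_created = dh.created_date;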
Any thoughts on a way to do this in a single query?

You want to filter the records, so think where, not group by:
select dh.*
from document_history dh
where dh.created_date = (select max(dh2.created_date)
                         from document_history dh2
                         where dh2.doc_id = dh.doc_id
                           and dh2.field = dh.field);
In most databases, this will have better performance than a group by if you have an index on document_history(doc_id, field, created_date).
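For reference, that index might be created like this (the index name is just illustrative):
create index document_history_doc_field_created_idx
    on document_history (doc_id, field, created_date);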

If your DBMS supports window functions (e.g. PostgreSQL, SQL Server; aka analytic functions in Oracle) you could do something like this (SQLFiddle with Postgres; other systems might differ slightly in syntax):
http://sqlfiddle.com/#!17/981af/4
SELECT DISTINCT
       doc_id, field,
       first_value(user_id) OVER (PARTITION BY doc_id, field
                                  ORDER BY created_date DESC) AS last_user
FROM get_last_updated
first_value() OVER (... ORDER BY x DESC) orders each window partition descending and then takes the first value, which corresponds to your latest timestamp.
I added the DISTINCT to get your expected result. The window function just adds a new column to your SELECT result, with the same value within each partition. If you do not need the DISTINCT, remove it, and you can then work with the original data plus the newly derived information.
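If the DISTINCT feels awkward, an alternative sketch (written against the question's document_history table) filters on row_number() in a derived table instead:
SELECT doc_id, field, user_id AS last_user
FROM (SELECT doc_id, field, user_id,
             row_number() OVER (PARTITION BY doc_id, field
                                ORDER BY created_date DESC) AS rn
      FROM document_history) ranked
WHERE rn = 1;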

Display the previous date for a user in an additional column

I have a list of users and a list of review dates corresponding to each user; a user can have multiple reviews relating to them. What I need to do is create an additional column that shows me the user's previous review date, and if they don't have a previous review I need it to be null.
An example of the result I require is shown below with the column in bold being the column I want to add:
| User | Review Date | Previous Review Date
| ----- | -------------- | ------------------------
| 1122334 | 01/01/2022 | 06/06/2021
| 1122334 | 06/06/2021 | 06/01/2021
| 1122334 | 06/01/2021 | null
| 2244668 | 01/10/2021 | 01/04/2021
| 2244668 | 01/04/2021 | null
| 3344556 | 10/11/2021 | 10/03/2021
| 3344556 | 10/03/2021 | null
You can see in the example that the previous review date for the user on row 1 is the same user's review date on row 2.
I have tried using the below:
select user, lead(review_date) over (order by user, review_date desc) as Previous_review_date
This code works until I need it to be a null value, in which case it simply adds the previous review date from an unrelated user.
Any help would be greatly appreciated.
Pretty sure OUTER APPLY would work here as well using a limit.
Note this could be useful if you need more than just a single column of data.
some docs - outer apply
ask tom - LINQ, cross/outer apply
In essence, OUTER APPLY runs the subquery once for each row in table A, correlating the results between the two. Since we order the results and limit them to one row, we only get back the single record whose review date is less than the outer row's review date. And because it's an outer apply, we keep all records from A and only show results from Z when they exist, so Z.review_date will be null when no such date/user can be correlated.
SELECT A.user, A.Review_Date, Z.Review_Date as Previous_Review_Date
FROM TABLE A
OUTER APPLY (SELECT B.Review_Date
             FROM TABLE B
             WHERE A.user = B.user
               AND B.Review_Date < A.Review_Date
             ORDER BY B.Review_Date DESC
             FETCH FIRST 1 ROWS ONLY) Z
Depending on the volume of data, one approach vs. the other can be more efficient (see the Ask Tom article).
Using your current approach:
SELECT A.user, A.Review_Date,
       lead(A.Review_Date) over (partition by A.user ORDER BY A.Review_Date DESC) as Previous_Review_Date
FROM TABLE A
The reason yours isn't working is that it orders ALL records by date, not those specific to a user. So you need to "partition" the data by user and only order that user's review dates.
You need to partition the data to identify the lead value:
select user, lead(review_date) over (partition by user order by review_date desc) as Previous_review_date
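As a side note, LAG() with an ascending sort expresses the same "previous" value; here is a sketch (the table name reviews is just a placeholder, and user may need quoting since it is often a reserved word):
select user,
       review_date,
       lag(review_date) over (partition by user
                              order by review_date) as Previous_review_date
from reviews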

SQL Help - Join small lookup table where not all columns are required (and an other option)

I have one large table with transactions and a smaller lookup table with values I want to add based on 4 common columns. The trick here is that not every combination of these 4 columns will exist in the lookup table, and there are scenarios where I want it to stop checking and accept the match instead of going to the next column. I also have an "Other" option to default to if it doesn't match any of the options.
Table structures are something like this:
transaction_table
country, trans_id, store_type, store_name, channel, browser, purchase_amount, currency
lookup_table
country, store_name, channel, browser, trans_fee
The data could be something like this:
transaction_table:
country| trans_id| store_type |store_name |channel |browser |amt |currency
US | 001 | Big Box | Target | B&M |N/A |1.45 |USD
US | 002 | Big Box | Target | Online |Chrome |1.79 |USD
US | 003 | Small | Bob's Store| B&M |N/A |2.50 |USD
US | 004 | Big Box | Walmart | B&M |N/A |1.12 |USD
US | 005 | Big Box | Walmart | Online |Firefox |3.79 |USD
US | 006 | Big Box | Amazon | Online |IE |4.54 |USD
US | 007 | Small | Jim's Plc | B&M |IE |2.49 |USD
lookup_table:
country|store_name |channel |browser |trans_fee
US |Target |B&M |N/A |0.25
US |Target |Online | |0.15
US |Walmart | | |0.30
US |Other | | |0.45
So looking at the lookup_table data:
Row 1 is very specific and would be a match on all 4 of the join columns.
Row 2 would not care what browser was used to shop at Target, so regardless of the "browser" value, the trans_fee should come back the same (other stores may care though).
Row 3 is saying any transaction with country='US' and store_name='Walmart', regardless of the rest of the join columns, would have the same trans_fee.
Row 4 is the "Other" scenario, where it should look first at the store_name column and, if it doesn't find a match, go to Other.
The lookup_table data can change and may end up being time dependent (start_date and end_date columns added) so it really wouldn't be a good candidate for a long, complex CASE statement.
I was thinking of a combination of checking each column with an IF IN statement, but I'm hoping there's a more straightforward conditional-join-type statement I can use to go column by column and have an "Other" option.
Thanks!
edit: I didn't specify this but I want to basically return all of the data from transaction_table and add the corresponding trans_fee to each line.
You will need to use a conditional JOIN.
Something like this:
SELECT *
FROM lookup_table
LEFT OUTER JOIN transaction_table
ON CASE WHEN lookup_table.store_name IS NOT NULL
        THEN transaction_table.store_name = lookup_table.store_name END
Such partial matching is tricky. And your problem is not really that well set up. You seem to have NULLs in some columns and general values in others.
In any case, you can solve this by matching what you can and then using order by to get the best match. In your case, I think this looks like this:
select tt.*,
       (select l.trans_fee
        from lookup_table l
        where l.country = tt.country and
              l.store_name in ('Other', tt.store_name) and
              (l.channel = tt.channel or l.channel is null) and
              (l.browser = tt.browser or l.browser is null)
        order by (case when l.store_name = tt.store_name then 1 else 2 end),
                 (case when l.channel = tt.channel then 1 else 2 end),
                 (case when l.browser = tt.browser then 1 else 2 end)
        fetch first 1 row only
       ) as trans_fee
from transaction_table tt;
This is generic SQL; the same idea should work in any database, although the fetch first 1 row only syntax may vary slightly.
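If this happens to be running on Postgres (an assumption; the question doesn't name a DBMS), the same idea can also be written as a LEFT JOIN LATERAL, which keeps the priority logic in one place and still returns every transaction row:
select tt.*, best.trans_fee
from transaction_table tt
left join lateral (
    select lu.trans_fee
    from lookup_table lu
    where lu.country = tt.country
      and lu.store_name in ('Other', tt.store_name)
      and (lu.channel = tt.channel or lu.channel is null)
      and (lu.browser = tt.browser or lu.browser is null)
    order by (case when lu.store_name = tt.store_name then 1 else 2 end),
             (case when lu.channel = tt.channel then 1 else 2 end),
             (case when lu.browser = tt.browser then 1 else 2 end)
    fetch first 1 row only
) best on true;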

Need Strategy To Generate Snapshot Of Table Contents After a Series of Random Inserts

There are 10 rooms, each with a set of inventory items. When an item is added to or deleted from a room, a new row gets inserted into an MS-SQL table. I need the latest update for each room.
Take this series of inserts:
id| room| descriptor1| descriptor2| descriptor3|
1 | A | blue | 2 | large |
2 | B | red | 1 | small |
3 | A | blue | 1 | large |
What the resulting table needs to show:
room| descriptor1| descriptor2| descriptor3|
A | blue |1 | large |
B | red |1 | small |
Ideally, I would write a trigger that would update a room status table. I could then just query the room status table (Select *) to obtain the result. However, this table does not belong to me; I only have read access to a constantly updated table. I need to poll periodically or when I need a report.
How do I do this in MS-SQL? I have some inkling of how I would do it to obtain the status of just one room, something like:
SELECT descriptor1, descriptor2, descriptor3
FROM myTable mt1
WHERE id = (SELECT MAX(id)
from myTable mt2
WHERE room = 'A'
);
Since I have 10 rooms, I would need to do this query 10 times. Can this be narrowed down to a single query? What happens when there are 100 rooms? Is there a better way?
Thanks!
Matt
You were very close:
SELECT room, descriptor1, descriptor2, descriptor3
FROM myTable mt1
WHERE id IN (SELECT MAX(id)
             FROM myTable
             GROUP BY room
            );
Instead of creating a trigger to update a static table, you should look into creating a view.
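A minimal sketch of such a view, assuming the underlying table is called myTable as in the question (the view name RoomStatus is just illustrative):
CREATE VIEW RoomStatus AS
SELECT room, descriptor1, descriptor2, descriptor3
FROM myTable
WHERE id IN (SELECT MAX(id)
             FROM myTable
             GROUP BY room);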

Access 2007 select first value of query results

I am running into a rather annoying thingy in Access (2007) and I am not sure if this is a feature or if I am asking for the impossible.
Although the actual database structure is more complex, my problem boils down to this:
I have a table with data about Units for specific years. This data comes from different sources and might overlap.
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 80 | 1 |
A | 2010 | 101 | 2 |
A | 2010 | 150 | 3 |
A | 2011 | 90 | 1 |
...
Now I would like the user to select certain sources, order them by priority and then extract one data value for each year.
For example, if the user selects source 1, 2 and 3 and orders them by (3, 1, 2), then I would like the following result:
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 150 | 3 |
A | 2011 | 90 | 1 |
I am able to order the initial table based on a specific order. I do this with the following query:
SELECT Unit, IYR, X1, Source
FROM TestTable
WHERE Source In (1,2,3)
ORDER BY Unit, IYR,
IIf(Source=3,1,IIf(Source=1,2,IIf(Source=2,3,4)))
This gives me the following intermediate result:
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 150 | 3 |
A | 2010 | 80 | 1 |
A | 2010 | 101 | 2 |
A | 2011 | 90 | 1 |
The next step is to only get the first value of each year. I was thinking of using the following query:
SELECT X.Unit, X.IYR, first(X.X1) as FirstX1
FROM (...) AS X
GROUP BY X.Unit, X.IYR
Where (…) is the above query.
Now Access goes bananas. Whatever order I give to the intermediate results, the result of this query is:
Unit | IYR | X1 |
--------------------
A | 2009 | 55 |
A | 2010 | 80 |
A | 2011 | 90 |
In other words, for year 2010 it shows the value of source 1 instead of 3. It seems that Access does not care about the ordering of the nested query when it applies the FIRST() function and sticks to the original ordering of the data.
Is this a feature of Access or is there a different way of achieving the desired results?
PS: The next step would be to use a self join to add the Source column to the results again, but I first need to resolve the above problem.
Rather than use FIRST, it may be better to determine the MIN priority and then join back, e.g.:
SELECT
t.UNIT,
t.IYR,
t.X1,
t.Source ,
t.PrioritySource
FROM
(SELECT
Unit,
IYR,
X1,
Source,
SWITCH ( [Source]=3, 1,
[Source]=1, 2,
[Source]=2, 3) as PrioritySource
FROM
TestTable
WHERE
Source In (1,2,3)
) as t
INNER JOIN
(SELECT
Unit,
IYR,
MIN(SWITCH ( [Source]=3, 1,
[Source]=1, 2,
[Source]=2, 3)) as PrioritySource
FROM
TestTable
WHERE
Source In (1,2,3)
GROUP BY
Unit,
IYR ) as MinPriority
ON t.Unit = MinPriority.Unit and
t.IYR = MinPriority.IYR and
t.PrioritySource = MinPriority.PrioritySource
which will produce this result (note: Source and PrioritySource are included for demonstration purposes only):
UNIT | IYR | X1 | Source | PrioritySource
----------------------------------------------
A | 2009 | 55 | 1 | 2
A | 2010 | 150 | 3 | 1
A | 2011 | 90 | 1 | 2
Note the first subquery is there to handle the fact that Access won't let you join on a Switch().
Yes, FIRST() does use an arbitrary ordering. From the Access Help:
These functions return the value of a specified field in the first or
last record, respectively, of the result set returned by a query. If
the query does not include an ORDER BY clause, the values returned by
these functions will be arbitrary because records are usually returned
in no particular order.
I don't know whether FROM (...) AS X means you are using an inline ORDER BY (assuming that is actually possible) or a VIEW ('stored Query object') here, but either way I assume the ORDER BY is being disregarded, because an ORDER BY should only apply to the final result.
The alternative is to use MIN() (or possibly MAX()).
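A compact sketch of that MIN() idea, reusing the Switch() priority mapping shown in the other answer (treat it as untested Access SQL; ties on priority would return more than one row):
SELECT t.Unit, t.IYR, t.X1, t.Source
FROM TestTable AS t
WHERE t.Source In (1,2,3)
  AND Switch(t.Source=3,1, t.Source=1,2, t.Source=2,3) =
      (SELECT MIN(Switch(t2.Source=3,1, t2.Source=1,2, t2.Source=2,3))
       FROM TestTable AS t2
       WHERE t2.Unit = t.Unit
         AND t2.IYR = t.IYR
         AND t2.Source In (1,2,3));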
This is the most concise way I have found to write such queries in Access that require pulling back all columns that correspond to the first row in a group of records that are ordered in a particular way.
First, I added a UniqueID to your table. In this case, it's just an AutoNumber field. You may already have a unique value in your table, in which case you can use that.
This will choose the row with Source 3 first, then Source 1, then Source 2. If there is a tie, it picks the one with the higher X1 value. If there is a further tie, it is broken by the UniqueID value:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID=
(SELECT TOP 1 [UniqueID] FROM [TestTable]
WHERE t.IYR=IYR ORDER BY Choose([Source],2,3,1), X1 DESC, UniqueID)
This yields:
Unit IYR X1 Source UniqueID
A 2009 55 1 1
A 2010 150 3 4
A 2011 90 1 5
I recommend (1) you create an index on the IYR field -- this will dramatically increase your performance for this type of query, and (2) if you have a lot of records (>~100K), this isn't the best choice. I find it works quite well for tables in the 1-70K range. For larger datasets, I like to use my GroupIncrement function to partition each group (similar to SQL Server's ROW_NUMBER() OVER).
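For reference, that index could be created with DDL like this (a sketch; the index name is illustrative):
CREATE INDEX idxTestTableIYR ON TestTable (IYR);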
The Choose() function is a VBA function and may not be clear here. In your case, it sounds like there is some interactivity required. For that, you could create a second table called "Choices", like so:
Rank Choice
1 3
2 1
3 2
Then, you could substitute the following:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID=(SELECT TOP 1 [UniqueID] FROM
[TestTable] t2 INNER JOIN [Choices] c
ON t2.Source=c.Choice
WHERE t.IYR=t2.IYR ORDER BY c.[Rank], t2.X1 DESC, t2.UniqueID);
Indexing Source on TestTable and Choice on the Choices table may be helpful here, too, depending on the number of choices required.
Q:
Can you get this to work without the need for a surrogate key? For example, what if the unique key is the composite of {Unit, IYR, X1, Source}?
A:
If you have a compound key, you can do it like this; however, I think that if you have a large dataset it will totally kill the performance of the query. It may help to index all four columns, but I can't say for sure because I don't regularly use this method.
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.Unit & t.IYR & t.X1 & t.Source =
(SELECT TOP 1 Unit & IYR & X1 & Source FROM [TestTable]
WHERE t.IYR=IYR ORDER BY Choose([Source],2,3,1), X1 DESC, Unit, IYR)
In certain cases, you may have to coalesce some of the individual parts of the key as follows (though Access generally will coalesce values automatically):
t.Unit & CStr(t.IYR) & CStr(t.X1) & CStr(t.Source)
You could also use a query in your FROM statements instead of the actual table. The query itself would build a composite of the four fields used in the key, and then you'd use the new key name in the WHERE clause of the top SELECT statement, and in the SELECT TOP 1 [key] of the subquery.
In general, though, I will either: (a) create a new table with an AutoNumber field, (b) add an AutoNumber field, (c) add an integer and populate it with a unique number using VBA - this is useful when you get a MaxLocks error when trying to add an AutoNumber, or (d) use an already indexed unique key.
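For option (b), the AutoNumber can be added with DDL along these lines (a sketch; COUNTER is the Jet/ACE type name, and some execution contexts accept AUTOINCREMENT instead):
ALTER TABLE TestTable ADD COLUMN UniqueID COUNTER;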

Select ID given the list of members

I have a table for the link/relationship between two other tables, a table of customers and a table of groups. A group is made up of one or more customers. The link table looks like this:
APP_ID | GROUP_ID | CUSTOMER_ID
1 | 1 | 123
1 | 1 | 124
1 | 1 | 125
1 | 2 | 123
1 | 2 | 125
2 | 3 | 123
3 | 1 | 123
3 | 1 | 124
3 | 1 | 125
I now have a need, given a list of customer IDs, to get the group ID for that list of customer IDs. A group ID may not be unique; the same group ID will contain the same list of customer IDs, but this group may exist in more than one app_id.
I'm thinking that
SELECT APP_ID, GROUP_ID, COUNT(CUSTOMER_ID) AS COUNT
FROM GROUP_CUST_REL
WHERE CUSTOMER_ID IN ( <list of ids> )
GROUP BY APP_ID, GROUP_ID
HAVING COUNT(CUSTOMER_ID) = <number of ids in list>
will return me all of the group IDs that contain all of the customer IDs in the given list, and only those group IDs. So for a list of (123, 125), only group ID 2 would be returned from the above example.
I will then have to link with the app table to use its created timestamp to identify the most recent application that the group existed in so that I can then pull the correct/most up to date info from the group table.
Does anyone have any thoughts on whether this is the most efficient way to do this? If there is another quicker/cleaner way I'd appreciate your thoughts.
This smells like a division:
Division sample
Other related stack overflow question
Taking a look at the provided links, you'll see solutions to similar issues from relational algebra's point of view; it doesn't seem to be quicker, though it's arguably cleaner.
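For comparison, a classic relational-division formulation uses a double NOT EXISTS. A sketch, assuming the wanted customer IDs have been loaded into a table called WANTED_IDS (a placeholder name) with a CUSTOMER_ID column:
-- groups (per app) that contain every customer in the wanted list
SELECT DISTINCT g1.APP_ID, g1.GROUP_ID
FROM GROUP_CUST_REL g1
WHERE NOT EXISTS (SELECT 1
                  FROM WANTED_IDS w
                  WHERE NOT EXISTS (SELECT 1
                                    FROM GROUP_CUST_REL g2
                                    WHERE g2.APP_ID = g1.APP_ID
                                      AND g2.GROUP_ID = g1.GROUP_ID
                                      AND g2.CUSTOMER_ID = w.CUSTOMER_ID))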
I didn't look at your solution at first, and when I solved this I turned out to have solved this the same way you did.
Actually, I thought this:
<number of ids in list>
Could be turned into something like this (so that you don't need the extra parameter):
select count(*) from (<list of ids>) as t
But clearly, I was wrong. I'd stay with your current solution if I were you.