I'm aware there is an almost identical question here, but that covers the SQL query required rather than the mechanism of event triggering.
Let's say I have two tables. One contains performance data for each staff member each week; the other holds the staff members' information. What I want is to update a value in the first table to 'Y' or 'N' based on whether that staff member had left by the week date.
staffTable
+----------+----------------+------------+
| staff_id | staff_name     | leave_date |
+----------+----------------+------------+
|        1 | Joseph Blogges | 2020-01-24 |
|        2 | Joe Bloggs     | 9999-12-31 |
|        3 | Joey Blogz     | 9999-12-31 |
+----------+----------------+------------+
targetTable
+------------+----------+--------+-----------+
| week_start | staff_id | target | left_flag |
+------------+----------+--------+-----------+
| 2020-01-13 |        1 |     10 | N         |
| 2020-01-20 |        1 |     10 | N         |
| 2020-01-27 |        1 |      8 | Y         |
+------------+----------+--------+-----------+
What I am trying to do is have the left_flag automatically change from 'N' to 'Y' when the week_start value is greater than the leave_date of the staff member (in the other table).
I have successfully put this into a view, which works, but the problem is that all existing applications, views and queries would then need to reference a new view instead of the table. I want to keep querying the data table itself, as my front-end has issues interacting live with a view rather than a table.
I have also successfully used a UDF to return the leave_date, with a computed column that compares it against the week_start column. This worked fine until I realised that the UDF was the most resource-consuming query on the entire server, completely disproportionate to the task.
Is there a way that I can trigger an update to one table when a criterion is met in another table, or is there a better way of doing this altogether? If it can't be done easily, I'll switch to a view and work around it in the front-end.
I'm going to describe the process rather than writing the code.
What you are describing can be accomplished using triggers on staffTable. When a row is inserted or updated, the trigger would update the matching rows in targetTable. This would be an after insert/update trigger.
The heart of the trigger would be:
update tt
    set left_flag = 'Y'
    from targettable tt join
         inserted i
         on tt.staff_id = i.staff_id
    where i.leave_date < tt.week_start and
          tt.left_flag <> 'Y';
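Wrapped up, a minimal sketch of the full trigger might look like this (assuming SQL Server; the trigger name is illustrative):

CREATE TRIGGER trg_staff_leave_flag
ON staffTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Flag every weekly row that starts after the staff member's leave date.
    UPDATE tt
    SET left_flag = 'Y'
    FROM targetTable tt
    JOIN inserted i ON tt.staff_id = i.staff_id
    WHERE i.leave_date < tt.week_start
      AND tt.left_flag <> 'Y';
END;

A similar AFTER INSERT trigger on targetTable would cover weekly rows added after the leave date is already on record.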
I am facing an issue where a data supplier is generating a dump of his multi-tenant databases into a single table. Recreating the original tables is not impossible; the problem is that I am receiving millions of rows every day, and recreating everything every day is out of the question.
Until now, I have been using SSIS to do this, with a lookup-intensive approach. In the past year, my virtual machine went from 2 GB of RAM to 128 GB, and it is still growing.
Let me explain the mess:
Imagine a database where users have posts, and posts have comments. In my real scenario, I am talking about 7 distinct tables. Analyzing a few rows, I have the following:
+-----+------+------+--------+------+-----------+------+----------------+
| Id* | T_Id | U_Id | U_Name | P_Id | P_Content | C_Id | C_Content      |
+-----+------+------+--------+------+-----------+------+----------------+
|   1 |    1 |    1 | john   |    1 | hello     |    1 | hello answer 1 |
|   2 |    1 |    2 | maria  |    2 | cake      |    2 | cake answer 1  |
|   3 |    2 |    1 | pablo  |    1 | hello     |    1 | hello answer 3 |
|   4 |    2 |    1 | pablo  |    2 | hello     |    2 | hello answer 2 |
|   5 |    1 |    1 | john   |    3 | nosql     |    3 | nosql answer 1 |
+-----+------+------+--------+------+-----------+------+----------------+
The Id is from my table (marked with * above).
T_Id is the "tenant" Id, which identifies the multiple source databases.
I have imagined the following possible solution:
I make a query that, for each table, selects only the Ids that have not been seen before, such as:
SELECT DISTINCT n.t_id,
                n.c_id,
                n.c_content
FROM mytable n
WHERE n.id > 4
  AND NOT EXISTS (SELECT 1
                  FROM mytable o
                  WHERE o.id <= 4
                    AND n.t_id = o.t_id
                    AND n.c_id = o.c_id)
This way, I am able to select only the new occurrences whenever a new Id of a table is found. Although it works, it may perform badly when working with hundreds of millions of rows.
Could anyone share a suggestion? I am quite lost.
Thanks in advance.
EDIT > my question is vague
My final intent is to rebuild the tables from the dump, incrementally, avoiding lookups outside the database. Every now and then I will run a script that selects new tenants, users, posts and comments and adds them to their corresponding tables.
My previous solution worked as follows:
Cache the whole database
For each new row, search for the columns inside the cache
If it doesn't exist, then insert it
I know it sounds dumb, but it made sense to me as a new developer working with ETLs.
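For concreteness, the per-table step I'm imagining looks roughly like this (with a hypothetical comments target table; columns taken from the sample above):

-- Sketch: add only the comments not already rebuilt, per tenant.
INSERT INTO comments (t_id, c_id, c_content)
SELECT DISTINCT n.t_id, n.c_id, n.c_content
FROM mytable n
WHERE NOT EXISTS (SELECT 1
                  FROM comments c
                  WHERE c.t_id = n.t_id
                    AND c.c_id = n.c_id);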
First, if you have a full flat DB dump, I suggest you work on the file before even importing it into your DB (low-level file processing is pretty cheap and nearly instantaneous).
Following the approach from Removing lines in one file that are present in another file using python, you can strip out all the lines already parsed in your last run:
# Read the new dump and the previous run's dump.
with open('new.csv', 'r') as source:
    lines_src = source.readlines()

with open('old.csv', 'r') as f:
    lines_f = set(f.readlines())  # a set makes each membership test O(1)

# Keep only the lines that were not present in the previous dump.
with open('diff_add.csv', 'w') as destination:
    for data in lines_src:
        if data not in lines_f:
            destination.write(data)
This takes less than five seconds on a 900 MB to 1.2 GB dump. With this you'll only be working with the lines that actually change one of your new tables.
Now you can import this flat file into a working table.
As you'll have to search for the needle in each line, some indexes on the ids may be a good idea (go for a composite index that uses your Tenant_id first).
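For instance, a hypothetical composite index on that working table (names are illustrative, columns from the sample):

-- Tenant id first, then the per-table id you are matching on.
CREATE INDEX ix_worktable_tenant_comment
    ON worktable (t_id, c_id);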
For the last part, I don't know exactly what your data looks like; do you have any updates to handle as well?
The EXCEPT and INTERSECT operators can also help you with this kind of problem.
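For example, a minimal sketch (assuming SQL Server and the same illustrative names) that pulls only the comment rows still missing from the rebuilt table:

-- Rows in the working table that are not yet in the rebuilt comments table.
SELECT t_id, c_id, c_content
FROM worktable
EXCEPT
SELECT t_id, c_id, c_content
FROM comments;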
I've created a table that tracks the various attributes of objects over time.
Id | Attribute1 | Attribute2 | Attribute3 | StartDate  | EndDate
-------------------------------------------------------------------
01 | 100        | Null       | Null       | 2004-02-03 | 2006-04-30
01 | 100        | Null       | D          | 2006-05-01 | 2010-11-06
01 | 150        | Null       | D          | 2010-11-07 | Null
02 | 700        | 5600       | Null       | 1998-09-27 | 2002-01-27
New data (tens of thousands of records) come in each day. What I want to do is compare each record to the current data for that Id, and then:
a) Do nothing if the attributes match.
b) If the attributes are different, update the current record so that the EndDate is the current date, and create a new record with the new attributes.
c) Create a new record if there isn't any data for that id.
My question is, what is the most efficient way to do this?
I can write a script that goes through each record, does the comparison, and then updates the table as appropriate, but I feel like this is brute force rather than an intelligent solution.
Would this be a good place to use a cursor?
How do you process data? As it comes in or in batch?
If it is as it comes in, then I would do a set of checks from the attribute most likely to change down to the least likely (just to optimize the checking a bit) and update as needed. Tens of thousands of rows is not enough data to worry much about slowdowns. This is the straightforward approach.
If you process as a batch (like at end of business each day), sort the data by Id and then by descending EndDate. Discard all other instances of each Id and only care about the latest one; no intermediate data would matter.
Example: you have two entries for Id 1, one with EndDate Jan 1 and the other with EndDate Jan 25. Look at the Jan 25 entry first and update if needed; the Jan 1 entry is too old to care about at that point.
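Either way, the comparison can be done set-based rather than with a cursor. A minimal sketch of steps (b) and (c) from the question, assuming SQL Server, a hypothetical staging table named incoming, and history standing in for the tracking table shown above:

-- (b) Close out current rows whose attributes changed.
-- The ISNULL sentinels are assumptions: pick values the attributes can never take.
UPDATE h
SET EndDate = CAST(GETDATE() AS date)
FROM history h
JOIN incoming i ON i.Id = h.Id
WHERE h.EndDate IS NULL
  AND (ISNULL(h.Attribute1, -1) <> ISNULL(i.Attribute1, -1)
    OR ISNULL(h.Attribute2, -1) <> ISNULL(i.Attribute2, -1)
    OR ISNULL(h.Attribute3, '') <> ISNULL(i.Attribute3, ''));

-- (b) and (c) Open a new row for changed Ids and for Ids never seen before.
INSERT INTO history (Id, Attribute1, Attribute2, Attribute3, StartDate, EndDate)
SELECT i.Id, i.Attribute1, i.Attribute2, i.Attribute3, CAST(GETDATE() AS date), NULL
FROM incoming i
WHERE NOT EXISTS (SELECT 1
                  FROM history h
                  WHERE h.Id = i.Id
                    AND h.EndDate IS NULL);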
I am running into a rather annoying thingy in Access (2007) and I am not sure if this is a feature or if I am asking for the impossible.
Although the actual database structure is more complex, my problem boils down to this:
I have a table with data about Units for specific years. This data comes from different sources and might overlap.
Unit | IYR  | X1  | Source |
----------------------------
A    | 2009 | 55  | 1      |
A    | 2010 | 80  | 1      |
A    | 2010 | 101 | 2      |
A    | 2010 | 150 | 3      |
A    | 2011 | 90  | 1      |
...
Now I would like the user to select certain sources, order them by priority and then extract one data value for each year.
For example, if the user selects source 1, 2 and 3 and orders them by (3, 1, 2), then I would like the following result:
Unit | IYR  | X1  | Source |
----------------------------
A    | 2009 | 55  | 1      |
A    | 2010 | 150 | 3      |
A    | 2011 | 90  | 1      |
I am able to order the initial table based on a specific priority. I do this with the following query:
SELECT Unit, IYR, X1, Source
FROM TestTable
WHERE Source In (1,2,3)
ORDER BY Unit, IYR,
IIf(Source=3,1,IIf(Source=1,2,IIf(Source=2,3,4)))
This gives me the following intermediate result:
Unit | IYR  | X1  | Source |
----------------------------
A    | 2009 | 55  | 1      |
A    | 2010 | 150 | 3      |
A    | 2010 | 80  | 1      |
A    | 2010 | 101 | 2      |
A    | 2011 | 90  | 1      |
The next step is to get only the first value for each year. I was thinking of using the following query:
SELECT X.Unit, X.IYR, first(X.X1) as FirstX1
FROM (...) AS X
GROUP BY X.Unit, X.IYR
Where (…) is the above query.
Now Access goes bananas. Whatever order I give to the intermediate results, the result of this query is:
Unit | IYR  | X1 |
------------------
A    | 2009 | 55 |
A    | 2010 | 80 |
A    | 2011 | 90 |
In other words, for year 2010 it shows the value of source 1 instead of 3. It seems that Access does not care about the ordering of the nested query when it applies the FIRST() function and sticks to the original ordering of the data.
Is this a feature of Access or is there a different way of achieving the desired results?
PS: The next step would be to use a self-join to add the Source column back to the results, but I first need to resolve the above problem.
Rather than use FIRST(), it may be better to determine the MIN priority and then join back, e.g.:
SELECT t.Unit,
       t.IYR,
       t.X1,
       t.Source,
       t.PrioritySource
FROM (SELECT Unit,
             IYR,
             X1,
             Source,
             SWITCH([Source]=3, 1,
                    [Source]=1, 2,
                    [Source]=2, 3) AS PrioritySource
      FROM TestTable
      WHERE Source IN (1,2,3)) AS t
INNER JOIN
     (SELECT Unit,
             IYR,
             MIN(SWITCH([Source]=3, 1,
                        [Source]=1, 2,
                        [Source]=2, 3)) AS PrioritySource
      FROM TestTable
      WHERE Source IN (1,2,3)
      GROUP BY Unit,
               IYR) AS MinPriority
  ON t.Unit = MinPriority.Unit
 AND t.IYR = MinPriority.IYR
 AND t.PrioritySource = MinPriority.PrioritySource
which will produce this result (note that I include Source and PrioritySource for demonstration purposes only):
Unit | IYR  | X1  | Source | PrioritySource
--------------------------------------------
A    | 2009 | 55  | 1      | 2
A    | 2010 | 150 | 3      | 1
A    | 2011 | 90  | 1      | 2
Note that the first subquery is there to handle the fact that Access won't let you join on a Switch() expression.
Yes, FIRST() does use an arbitrary ordering. From the Access Help:
These functions return the value of a specified field in the first or last record, respectively, of the result set returned by a query. If the query does not include an ORDER BY clause, the values returned by these functions will be arbitrary because records are usually returned in no particular order.
I don't know whether FROM (...) AS X means you are using an inline ORDER BY (assuming that is actually possible) or a VIEW ('stored Query object') here, but either way I assume the ORDER BY is being disregarded, because an ORDER BY should only apply to the final result.
The alternative is to use MIN() (or possibly MAX()).
This is the most concise way I have found to write such queries in Access that require pulling back all columns that correspond to the first row in a group of records that are ordered in a particular way.
First, I added a UniqueID to your table. In this case, it's just an AutoNumber field. You may already have a unique value in your table, in which case you can use that.
This will choose the row with a Source 3 first, then Source 1, then Source 2. If there is a tie, it picks the one with the higher X1 value. If there is a further tie, it is broken by the UniqueID value:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID =
      (SELECT TOP 1 [UniqueID]
       FROM [TestTable]
       WHERE t.IYR = IYR
       ORDER BY Choose([Source],2,3,1), X1 DESC, UniqueID)
This yields:
Unit  IYR   X1   Source  UniqueID
A     2009  55   1       1
A     2010  150  3       4
A     2011  90   1       5
I recommend (1) creating an index on the IYR field, which will dramatically increase your performance for this type of query, and (2) noting that if you have a lot (>~100K) of records, this isn't the best choice. I find it works quite well for tables in the 1-70K range. For larger datasets, I like to use my GroupIncrement function to partition each group (similar to SQL Server's ROW_NUMBER() OVER statement).
The Choose() function is a VBA function and may not be clear here. In your case, it sounds like there is some interactivity required. For that, you could create a second table called "Choices", like so:
Rank  Choice
1     3
2     1
3     2
Then, you could substitute the following:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID =
      (SELECT TOP 1 [UniqueID]
       FROM [TestTable] t2 INNER JOIN [Choices] c
            ON t2.Source = c.Choice
       WHERE t.IYR = t2.IYR
       ORDER BY c.[Rank], t2.X1 DESC, t2.UniqueID);
Indexing Source on TestTable and Choice on the Choices table may be helpful here, too, depending on the number of choices required.
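If it helps, a hypothetical sketch of those two indexes (standard DDL, which Access also accepts):

-- Illustrative index names; both support the join on Source = Choice.
CREATE INDEX idxTestTableSource ON TestTable (Source);
CREATE INDEX idxChoicesChoice ON Choices (Choice);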
Q:
Can you get this to work without the need for a surrogate key? For example, what if the unique key is the composite of {Unit, IYR, X1, Source}?
A:
If you have a compound key, you can do it like this; however, I think that with a large dataset it will totally kill the performance of the query. It may help to index all four columns, but I can't say for sure because I don't regularly use this method.
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.Unit & t.IYR & t.X1 & t.Source =
      (SELECT TOP 1 Unit & IYR & X1 & Source
       FROM [TestTable]
       WHERE t.IYR = IYR
       ORDER BY Choose([Source],2,3,1), X1 DESC, Unit, IYR)
In certain cases, you may have to coalesce some of the individual parts of the key as follows (though Access generally will coalesce values automatically):
t.Unit & CStr(t.IYR) & CStr(t.X1) & CStr(t.Source)
You could also use a query in your FROM statements instead of the actual table. The query itself would build a composite of the four fields used in the key, and then you'd use the new key name in the WHERE clause of the top SELECT statement, and in the SELECT TOP 1 [key] of the subquery.
In general, though, I will either: (a) create a new table with an AutoNumber field, (b) add an AutoNumber field, (c) add an integer and populate it with a unique number using VBA - this is useful when you get a MaxLocks error when trying to add an AutoNumber, or (d) use an already indexed unique key.
I have a relationships table that looks something like this:
--------------------------
| client_id | service_id |
--------------------------
|         1 |          1 |
|         1 |          2 |
|         1 |          4 |
|         1 |          7 |
|         2 |          1 |
|         2 |          5 |
--------------------------
I have a list of new permissions I need to add. What I'm doing right now is, for example, if I have to add new permissions for the client with id 1, I do:
DELETE FROM myTable WHERE client_id = 1
INSERT INTO ....
Is there a more efficient way I can remove only the ones I won't insert later, and add only the new ones?
Yes, you can do this, but in my humble opinion it's not really an SQL-dependent subject; it actually depends on your language/platform choice. If you use a powerful platform like .NET or Java, there are many database classes (adapters, datasets, etc.) that can take care of things for you, like finding the changed parts and updating/inserting/deleting only the necessary rows.
I prefer using Hibernate/NHibernate-like libraries. In that case you don't even need to write SQL queries most of the time; just work at the OOP level and synchronize with the database.
If you put the new permissions into another table, you could do something like:
DELETE FROM myTable WHERE client_id IN (SELECT client_id FROM tmpTable);
INSERT INTO myTable (client_id, service_id) SELECT client_id, service_id FROM tmpTable;
You are still making two passes, but you are doing them in bulk instead of one row at a time.
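To touch only the rows that actually change, a hedged sketch along the same lines (reusing the hypothetical tmpTable; exact syntax varies by engine):

-- Remove only the permissions the client should no longer have.
DELETE FROM myTable
WHERE client_id IN (SELECT client_id FROM tmpTable)
  AND NOT EXISTS (SELECT 1
                  FROM tmpTable t
                  WHERE t.client_id = myTable.client_id
                    AND t.service_id = myTable.service_id);

-- Add only the pairs that are not already present.
INSERT INTO myTable (client_id, service_id)
SELECT t.client_id, t.service_id
FROM tmpTable t
WHERE NOT EXISTS (SELECT 1
                  FROM myTable m
                  WHERE m.client_id = t.client_id
                    AND m.service_id = t.service_id);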