Using SELECT ... FOR UPDATE to poll for a value change - sql

I have a table that contains tasks and their status, akin to:
| task_id | task_status |
+---------+-------------+
|      71 |           1 |
|      85 |           3 |
|     110 |           2 |
Let's call the table TASKS.
Status is an enumerated value, for example:
1 = SCHEDULED
2 = RUNNING
3 = DONE
I need to poll this status to inform the user about a task he started. Currently, I'm just polling it on the server using a while loop, like this pseudocode:
status = old_status
while (timeout_not_expired and status == old_status) {
    status = get_status("SELECT task_status FROM TASKS WHERE task_id=%1", task_id)
    wait(check_interval)
}
return status
That's nasty: not only does it spam the Oracle server, it also spams our log of SQL queries.
So I did a bit of googling and found out about SELECT ... FOR UPDATE. I tried to run this statement:
SELECT task_status
FROM TASKS
WHERE task_id = 361
FOR UPDATE OF task_status
But it returns immediately. So the question:
1. Is this even what FOR UPDATE is for?
2. If yes, how do I get it to wait on the row with a timeout?

No, that isn't what that clause is for. From the documentation:
The FOR UPDATE clause lets you lock the selected rows so that other users cannot lock or update the rows until you end your transaction.
Your query selects the current status for that task and locks the row, essentially on the assumption that you plan to update it, and don't want anyone else to be able to change it between your select and subsequent update.
So after you perform that query, no-one else can update the status of that task until you commit or rollback - kind of the opposite of what you're trying to achieve.
You could look at alert or queueing mechanisms (Oracle's DBMS_ALERT package covers the alert case), or investigate continuous query notification, though that might be overkill for this.
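For illustration, a minimal DBMS_ALERT sketch; how you'd wire it in is an assumption, and the alert name is made up:
-- Signaling side: run by whatever marks the task done.
-- The alert is only delivered when this transaction commits.
BEGIN
  UPDATE TASKS SET task_status = 3 WHERE task_id = 361;
  DBMS_ALERT.SIGNAL('task_status_361', '3');
  COMMIT;
END;
/

-- Waiting side: blocks until the alert fires or 30 seconds elapse,
-- replacing the polling loop (and its query spam) entirely.
DECLARE
  l_message VARCHAR2(1800);
  l_status  INTEGER;  -- 0 = alert received, 1 = timed out
BEGIN
  DBMS_ALERT.REGISTER('task_status_361');
  DBMS_ALERT.WAITONE('task_status_361', l_message, l_status, 30);
  DBMS_ALERT.REMOVE('task_status_361');
END;
/
Note that DBMS_ALERT serializes signalers on the same alert name, which is fine for infrequent status transitions like these.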

Related

How do I create a Splunk query for unused event types?

I have found that I can create a Splunk query to show how many results of each event type appear:
severity=error | stats count by eventtype
This creates a table like so:
eventtype    | count
---------------------
myEventType1 |     5
myEventType2 |    12
myEventType3 |    30
So far so good. However, I would like to find event types with zero results. Unfortunately, those with a count of 0 do not appear in the query above, so I can't just filter by that.
How do I create a Splunk query for unused event types?
There are lots of different ways to do that, depending on what you mean by "event types". Somewhere, you have to get a list of whatever you are interested in, and roll it into the query.
Here's one version, assuming you had a csv that contained a list of eventtypes you wanted to see...
severity=error
| stats count as mycount by eventtype
| inputcsv append=t mylist.csv
| eval mycount=coalesce(mycount,0)
| stats sum(mycount) as mycount by eventtype
Here's another version, assuming that you wanted a list of all eventtypes that had occurred in the last 90 days, along with the count of how many had occurred yesterday:
earliest=-90d@d latest=@d severity=error
| addinfo
| stats count as totalcount count(eval(_time>=info_max_time-86400)) as yesterdaycount by eventtype

Updating the user ranking with a SQL Server stored procedure

I have two tables: one contains user data and the other contains user ranking information (the points needed for promotion).
Let's say that the user table looks like this:
login | ArticlePoints | PhotoPoints | StageId
and the user ranking information table looks like this:
StageId | StageName | MinimumPoints
and the ranking information table might contain data like this:
1 | Beginner | 100
2 | Advanced | 200
3 | Expert   | 300
What I would like is a procedure which adds user points and checks whether they are enough for a ranking promotion. Right now I do it like this:
1. I have a function which "manually" checks whether the user's points are between 100 and 200 and, if so, sets the user's stage = 2; if they're higher, it checks whether they're between 200 and 300, and so on.
2. A stored procedure which does UPDATE users SET stage = MYFUNCTION using the function from point 1.
The thing is, that's not a good solution: it doesn't allow easy updates (I can't just add a Super Expert stage with a minimum of 400 points; I'd have to edit the function).
I am trying to prepare a better solution for this problem but I have no idea how to "connect" both tables.
Write an UPDATE query that sets the StageID from the calculated point totals, something like:
UPDATE t1
SET t1.StageID =
    (SELECT TOP 1 StageID
     FROM [RANKING_TABLE] t2
     WHERE t1.ArticlePoints + t1.PhotoPoints >= t2.MinimumPoints
     ORDER BY t2.MinimumPoints DESC)
FROM [USER_TABLE] t1
So if the user has 250 points in total, both Beginner and Advanced are achieved; TOP 1 combined with ORDER BY t2.MinimumPoints DESC selects the highest stage reached.
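To fold that into the "add points, then promote" procedure the question asks for, here is a sketch; the table and column names ([USER_TABLE], [RANKING_TABLE], the @-parameters) are assumptions based on the question:
CREATE PROCEDURE AddUserPoints
    @login         varchar(50),
    @articlePoints int,
    @photoPoints   int
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE u
    SET u.ArticlePoints = u.ArticlePoints + @articlePoints,
        u.PhotoPoints   = u.PhotoPoints + @photoPoints,
        -- Column references in SET see pre-update values, so the new
        -- totals have to be recomputed inside the subquery.
        u.StageId =
            (SELECT TOP 1 t2.StageId
             FROM [RANKING_TABLE] t2
             WHERE u.ArticlePoints + @articlePoints
                 + u.PhotoPoints + @photoPoints >= t2.MinimumPoints
             ORDER BY t2.MinimumPoints DESC)
    FROM [USER_TABLE] u
    WHERE u.login = @login;
END
Adding a Super Expert stage is then just an INSERT into [RANKING_TABLE]; the procedure doesn't change.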

User-defined pseudocolumn in Oracle

I have a large dataset in an oracle database that is currently accessed from Java one item at a time. For example if a user is trying to do a bulk get of 50 items it will process them sequentially, calling a stored procedure for each one. I am now trying to implement a bulk get, but am having some difficulty due to the way the user can pass in a range query:
An example table:
prim_key | identifier | start | end
---------+------------+-------+-----
       1 | aaa        |     1 |   3
       2 | aaa        |     3 |   7
       3 | bbb        |     1 |   5
The way it works is that if you have a query like (id='aaa' and pos=1) it will find prim_key = 1, but if you query (id='aaa' and pos=2) it won't find anything. If you do (id='aaa' and pos=-2) then it will again find prim_key=1 because the stored proc converts the -2 into a range scan equivalent to start<=2 and end>2.
(Extra context: start/end are actually dates, and this querying mechanism allows efficient "latest as of date" queries, as opposed to doing something like:
select prim_key, start
from myTable
where start = (select max(start) from myTable where start <= 2))
This is all fine and works correctly for single gets, but now I'm trying to do bulk gets so that we can speed up the batch considerably. The first attempt was to multithread the individual calls, but it put too much stress on the database to be doing so many parallel queries on the same table. To solve this I've been trying to create a query like
select prim_key
from myTable
where (identifier='aaa' and start=3)
or (identifier='aaa' and start<=2 and end>2)
building this up from the list of input parameters ('aaa',3 ; 'bbb',-2), which works well and produces an explain plan using all of the indexes I would expect.
My problem: I need to know which input parameters retrieved each row, in order to do further processing and return the relevant prim_key. I need something like a pseudocolumn that I can define myself:
select prim_key, PSEUDO
from myTable
where (identifier='aaa' and start=3 and PSEUDO='a3')
or (identifier='aaa' and start<=2 and end>2 and PSEUDO='a-2')
but I can't find any way to return a value from the where clause, and I think subqueries would lose the indexing efficiencies gained by doing it all in one select.
Try something like:
select
  prim_key,
  case when start = 3 then 'a3' else 'a-2' end as pseudo
from
  your_table
where
  ...
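Another option, sketched here under the assumption that the real columns are named start_pos/end_pos (start and end run into reserved-word trouble in Oracle), is to join against an inline table of the input parameters, so each parameter's tag is carried through to the output in a single select:
select t.prim_key, p.tag
from myTable t
join (select 'aaa' as identifier,  3 as pos, 'a3'  as tag from dual
      union all
      select 'aaa', -2, 'a-2' from dual) p
  on t.identifier = p.identifier
  -- pos >= 0 is an exact match on start; pos < 0 is the range-scan form
  and (   (p.pos >= 0 and t.start_pos =  p.pos)
       or (p.pos <  0 and t.start_pos <= -p.pos and t.end_pos > -p.pos))
Whether the optimizer still uses the indexes you saw with your hand-built OR version is worth confirming with an explain plan.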

Cumulative average number of records created for specific day of week or date range

Yeah, so I'm filling out a requirements document for a new client project and they're asking for growth trends and performance expectations calculated from existing data within our database.
The best source of data for something like this would be our logs table as we pretty much log every single transaction that occurs within our application.
Now, here's the issue: I don't have a whole lot of experience with MySQL when it comes to calculating cumulative sums and running averages. I've thrown together the following query, which kind of makes sense to me, but it just keeps locking up the command console. The thing takes forever to execute, and there are only 80k records in the test sample.
So, given the following basic table structure:
  id | action | date_created
   1 | 'merp' | 2007-06-20 17:17:00
   2 | 'foo'  | 2007-06-21 09:54:48
   3 | 'bar'  | 2007-06-21 12:47:30
 ... thousands of records ...
3545 | 'stab' | 2007-07-05 11:28:36
How would I go about calculating the average number of records created for each given day of the week?
day_of_week | average_records_created
          1 |                     234
          2 |                      23
          3 |                       5
          4 |                      67
          5 |                     234
          6 |                      12
          7 |                      36
I have the following query which makes me want to murderdeathkill myself by casting my body down an elevator shaft... and onto some bullets:
SELECT
DISTINCT(DAYOFWEEK(DATE(t1.datetime_entry))) AS t1.day_of_week,
AVG((SELECT COUNT(*) FROM VMS_LOGS t2 WHERE DAYOFWEEK(DATE(t2.date_time_entry)) = t1.day_of_week)) AS average_records_created
FROM VMS_LOGS t1
GROUP BY t1.day_of_week;
Halps? Please, don't make me cut myself again. :'(
How far back do you need to go when sampling this information? This solution works as long as it's less than a year.
Because day of week and week number are constant for a record, create a companion table that has the ID, WeekNumber, and DayOfWeek. Whenever you want to run this statistic, just generate the "missing" records from your master table.
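A minimal sketch of that companion table and the backfill, assuming the master table is VMS_LOGS with id and datetime_entry columns (MySQL syntax):
create table MyCompanionTable (
    id         int primary key,
    WeekNumber int not null,
    DayOfWeek  int not null
);

-- Generate the "missing" records: master rows with no companion row yet.
insert into MyCompanionTable (id, WeekNumber, DayOfWeek)
select t.id, week(t.datetime_entry), dayofweek(t.datetime_entry)
from VMS_LOGS t
left join MyCompanionTable c on c.id = t.id
where c.id is null;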
Then, your report can be something along the lines of:
select
    DayOfWeek
    , count(*) / count(distinct WeekNumber) as Average
from
    MyCompanionTable
group by
    DayOfWeek
Of course if the table is too large, then you can instead pre-summarize the data on a daily basis and just use that, and add in "today's" data from your master table when running the report.
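Pre-summarizing on a daily basis could look like this sketch (daily_counts is an invented name; one row per calendar day):
create table daily_counts (
    day        date primary key,
    DayOfWeek  int not null,
    WeekNumber int not null,
    records    int not null
);

insert into daily_counts (day, DayOfWeek, WeekNumber, records)
select date(t.datetime_entry),
       dayofweek(t.datetime_entry),
       week(t.datetime_entry),
       count(*)
from VMS_LOGS t
group by date(t.datetime_entry),
         dayofweek(t.datetime_entry),
         week(t.datetime_entry);
The weekday report then averages over distinct weeks exactly as above, but reads far fewer rows.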
I rewrote your query as:
SELECT x.day_of_week,
AVG(x.count) 'average_records_created'
FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
COUNT(*) 'count'
FROM VMS_LOGS t
GROUP BY DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week
The reason your query takes so long is the inner select: you are essentially running 6,400,000,000 row comparisons, since the correlated subquery rescans all 80,000 rows for each of the 80,000 outer rows. With a query like this, your best solution may be a timed reporting system, where the user receives an email when the query is done and the report is built, or logs in and checks the report later.
Even with the optimization written by OMG Ponies (below), you are still looking at around the same number of queries.
SELECT x.day_of_week,
AVG(x.count) 'average_records_created'
FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
COUNT(*) 'count'
FROM VMS_LOGS t
GROUP BY DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week

How should I go about implementing an "autonumber" field in SQL Server 2005?

I'm aware of IDENTITY fields but I have a feeling that I couldn't use one to solve my problem.
Let's say I have multiple clients. Each client has multiple orders. Each client needs to have their orders numbered sequentially, specific to them.
Example table structure:
Orders:
OrderID | ClientID | ClientOrderID | etc...
Some example rows for this table would be:
OrderID | ClientID | ClientOrderID | etc...
      1 |        1 |             1 | ...
      2 |        1 |             2 | ...
      3 |        2 |             1 | ...
      4 |        3 |             1 | ...
      5 |        1 |             3 | ...
      6 |        2 |             2 | ...
I know the naive way would be to take the MAX ClientOrderID for the client and use that value for INSERTs, but that would be subject to concurrency issues. I was considering using a transaction, but I'm not quite sure of the broadest isolation scope that can be used for this. I'll be using LINQ to SQL, but I have a feeling that isn't relevant.
Somebody correct me if I'm wrong, but as long as your MAX() call is in the same step as your insert, you won't have a problem with concurrency.
So, you could not do
select @newOrderID = max(ClientOrderID) + 1
from orders
where clientid = @myClientID;

insert into orders ( ClientID, ClientOrderID, ...)
values( @myClientID, @newOrderID, ...);
But you can do:
insert into orders ( ClientID, ClientOrderID, ...)
select @myClientID, max(ClientOrderID) + 1, ...
from orders
where clientid = @myClientID;
I'm assuming OrderID is an identity column.
Again, if I'm incorrect on this, please let me know. Preferably with a URL
You could use a Repository pattern to handle your orders and let it control the numbering of each specific client's orders. If you implement the OrderRepository correctly, it could control the concurrency and number the order before saving it to the database (let the repository, not the db, set the number).
Repository pattern: http://martinfowler.com/eaaCatalog/repository.html
One possibility (though I don't like to do this) is to have a lookup table that tells you the greatest order number given for each vendor. Inside a transaction, you'd fetch the most recent value from VendorOrderNumber, save your new order, increment the value in VendorOrderNumber, and commit the transaction.
This is an odd way to store data, but assuming you need it, there is nothing built-in that you can use.
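A sketch of that approach in T-SQL; the VendorOrderNumber columns and the Orders shape are assumptions, and the UPDLOCK/HOLDLOCK hints are what stop two sessions from reading the same counter value:
DECLARE @myClientID int;
DECLARE @next int;
SET @myClientID = 1;

BEGIN TRANSACTION;

-- Lock the counter row until commit so concurrent inserts queue up.
-- Assumes a counter row already exists for this client.
SELECT @next = LastOrderNumber + 1
FROM VendorOrderNumber WITH (UPDLOCK, HOLDLOCK)
WHERE ClientID = @myClientID;

UPDATE VendorOrderNumber
SET LastOrderNumber = @next
WHERE ClientID = @myClientID;

INSERT INTO Orders (ClientID, ClientOrderID)
VALUES (@myClientID, @next);

COMMIT TRANSACTION;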
Your suggestion of MAX(ClientOrderID) is straightforward and pretty easy to implement (follow John MacIntyre's advice). It will probably work acceptably well on tables with a few thousand orders, but as the table grows this approach will of course slow down.
Nick DeVore's suggestion of a lookup table is a little messier to implement but won't be substantially affected by data growth.
Depending on where/when you actually need the ClientOrderID, you could calculate the id when needed like this:
SELECT *,
ROW_NUMBER() OVER(ORDER BY OrderID) AS ClientOrderID
FROM Orders
WHERE ClientID = 1
This assumes that the ClientOrderIDs are in the same sequence as the OrderID. Without actually persisting the ID, it is awkward to use as a key to anything else. This approach should not be affected by data growth.
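If you need the numbering for every client in one result set, the same idea extends with a partition (a sketch under the same assumptions):
SELECT *,
       ROW_NUMBER() OVER(PARTITION BY ClientID ORDER BY OrderID) AS ClientOrderID
FROM Orders
Each client's numbering restarts at 1 in OrderID sequence, matching the example rows in the question.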