I've been working on learning SQL starting with basically no knowledge. I've only been at this for about a week.
I'm working with a piece of simulation software. I've been given a task to run many replications of a simulation that generates random events: either something happens or nothing happens. The sim creates a table of the events with a column for replication numbers. If the event happened, the replication gets an event number; if it didn't, the sim skips that rep and goes to the next. I need a query that takes the replications that have no event and replaces the null entry with a zero.
As an added bonus, I might also need to generate the replication number itself where there is a null.
The last bit of code I tried looked like this:
SELECT IsNull(table1.column1, 0)
FROM table1
SELECT COALESCE(table1.column1,0)
FROM table1
Should do the trick: COALESCE returns the first non-null value in its argument list, so any null in column1 is replaced with 0.
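If some replications have no row at all in table1, COALESCE alone can't help, since it only fills nulls in rows that exist. For the bonus requirement, a LEFT JOIN from a list of replication numbers is one way to generate the missing reps; a minimal sketch, assuming a hypothetical replications table (one row per replication number) and a rep_no column linking the two tables:

SELECT r.rep_no,
       COALESCE(t.column1, 0) AS event_number  -- reps with no event row come back NULL and become 0
FROM replications AS r
LEFT JOIN table1 AS t
       ON t.rep_no = r.rep_no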
I have CDC enabled on a table and I'm trying to get the minimum LSN for that table to use in an ETL job. However, when I run this query
select sys.fn_cdc_get_min_lsn('dbo_Table')
I get a different result from this query
select min(__$start_lsn) from cdc.dbo_Table_CT
Shouldn't these queries return the same values? If they don't, why not? And how do I get them back in sync?
The first query:
select sys.fn_cdc_get_min_lsn('dbo_Table')
Invokes a system function that returns the lowest POSSIBLE LSN for the capture instance. This value is set when the cleanup function runs. It is recorded in, and queried from, cdc.change_tables.
The second query:
select min(__$start_lsn) from cdc.dbo_Table_CT
Looks at the actual capture instance and returns the lowest ACTUAL LSN for the instance. This value is set when the first actual change to the instance is logged after the cleanup function runs. It is recorded in, and queried from, cdc.dbo_Table_CT.
They're unlikely to tie out, statistically speaking. For the purposes of an ETL job, using the call to the system table will likely be quicker, and is a more accurate reflection of when the current set of change records started being accumulated.
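For an ETL job, the validated LSN range typically feeds straight into the change-table query; a sketch using the capture instance from the question (cdc.fn_cdc_get_all_changes_dbo_Table is the wrapper function SQL Server generates per capture instance):

DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Table');  -- lowest possible LSN, from cdc.change_tables
SET @to_lsn = sys.fn_cdc_get_max_lsn();               -- current high-water mark of the log
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Table(@from_lsn, @to_lsn, N'all');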
A coworker in accounting was complaining that she ran a query twice, it doubled her values, and she got confused. I'm just a junior IT person with very little VBA experience. I'm basically trying to add code so that the queries in our databases can't be run more than once unless you restart the database. I was thinking of doing a boolean check to see if a query has been run and, if it has, not allowing it to be run again. Or maybe I could just do a simple if statement. Please let me know if you have any input on this issue.
I couldn't find anything on the Googs either.
I would think of a date and a session ID as default values in each table; you could code the addition of both.
These are populated with date = Date() as the default value, and SessionID as the DMax from your SessionID table, added as an extra column in said query.
This SessionID table is incremented by a startup popup form running a macro.
The primary key of each table being operated on would be the date and the SessionID, not allowing dupes. You probably don't need the date, just a SessionID in the PK.
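A rough sketch of that design, with hypothetical names: tblSessions holds the counter bumped by the startup form, DMax reads its current value, and a composite primary key on (BillID, SessionID) in the target table rejects a second run within the same session:

INSERT INTO tblFeeRuns ( BillID, SessionID )
SELECT BillID, DMax("SessionID", "tblSessions") AS SessionID
FROM tblBills;

Running the same query again in the same session tries to insert the same (BillID, SessionID) pairs and fails on the duplicate key.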
It is not always the best idea to implement ad-hoc ideas from users like this.
You should analyze what happened here, and make sure it cannot happen in the application design, not by arbitrary rules.
Example: If the update query adds fees to a bill, and this must happen only once per bill, then the update query should also set a flag "fees added" in the bill record. And it should not update bills with this flag set.
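A minimal sketch of that flag idea, with hypothetical table and column names (FeesAdded as a Yes/No field on the bill, FeeAmount as the fee column to apply):

UPDATE Bills
SET Total = Total + FeeAmount, FeesAdded = True
WHERE FeesAdded = False;

Running the query a second time matches no rows, so nothing gets doubled.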
I'm coding an application that deals with files. So, I have a table that contains information about all the files registered in the application.
My "files" table looks like this: ID, Path and LastScanTime.
The algorithm that I use in my application is simple:
1. Take the oldest row (LastScanTime is the oldest)
2. Extract the file path
3. Do some magic on this file (takes exactly 5 minutes)
4. Update the LastScanTime to the current time (now)
5. Go to step 1
Until now, the task is pretty simple. For this, I'm going to use this SQL statement to get the oldest item:
SELECT TOP 1 * FROM files ORDER BY [LastScanTime] ASC
and at the end of the item's processing (to prevent the item from being selected again immediately):
UPDATE Files SET [LastScanTime]=GETDATE() WHERE Id=#ItemID
Now, I'm going to add some complexity to the algorithm:
1. Take the 3 oldest rows (LastScanTime is the oldest)
2. For each row, do:
A. Extract the file path
B. Do some magic on this file (takes exactly 5 minutes)
C. Update the LastScanTime to the current time (now)
D. Go to step 1
The problem I'm now facing is that the whole process is going to run in parallel (no more serial processing). So, changing my SQL statement to the next statement is not enough!
SELECT TOP 3 * FROM files ORDER BY [LastScanTime] ASC
Why isn't this SQL statement enough?
Let's say that I run my code and start executing the first 3 items. Now, after a minute, I want to execute another 3 items. This SQL statement will retrieve exactly the same "oldest" items that we already started to process.
Possible solution
Implementing a combined SELECT & UPDATE that gets the 3 oldest items and immediately updates their last scan time. Since I don't know of a SELECT & UPDATE in a single statement, what happens if another SELECT comes in between the first SELECT and its UPDATE? Both statements will get the same results. This is a problem... Another problem is that we mark the item as "scanned recently" before the scan has really finished. What happens if the scan is terminated by an error?
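For what it's worth, SQL Server (which the GETDATE() above suggests) does offer a way to claim and return rows in one atomic statement: UPDATE ... OUTPUT. A rough sketch of the pattern; the locking hints are the usual queue-table idiom, assumed here rather than taken from the question:

WITH oldest AS (
    -- UPDLOCK claims the rows; READPAST skips rows another session has already claimed
    SELECT TOP (3) *
    FROM Files WITH (UPDLOCK, READPAST)
    ORDER BY LastScanTime ASC
)
UPDATE oldest
SET LastScanTime = GETDATE()
OUTPUT inserted.ID, inserted.Path;  -- the claimed rows come back from the same statement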
I'm looking for tips and tricks to solve this problem. The solutions can add columns as needed.
I'll appreciate your help.
Well, I usually have a habit of having two different fields in the database: one is AddedDate and another is ModifiedDate.
So the algorithm in your terms will be:
1. Take the oldest row (AddedDate is the oldest)
2. Extract the file path
3. Do some processing on this file
4. Update the ModifiedDate to the current time (now)
It seems that you are going to reinvent an event queue with your SQL. Possibly standard approaches like RabbitMQ or ActiveMQ may solve your problem.
Check this simple piece of code that uses a generator to create unique primary keys in a Firebird table:
SET TERM ! ;
CREATE OR ALTER TRIGGER ON_BEFOREINSERT_PK_BOOKING_ITEM
FOR BOOKING_ITEM BEFORE INSERT POSITION 0
AS
BEGIN
    IF ((NEW.booking_item_id IS NULL) OR (NEW.booking_item_id = 0)) THEN
    BEGIN
        SELECT GEN_ID(LastIdBookingItem, 1) FROM RDB$DATABASE INTO :NEW.booking_item_id;
    END
END!
SET TERM ; !
This trigger grabs, increments, and then assigns a generated value for the booking item id, thus creating an auto-incremented key for the BOOKING_ITEM table. The trigger even checks that the booking item id has not already been assigned a value.
The problem is that the auto-incremented value will be lost (wasted) if, for some reason, the BOOKING_ITEM record cannot be posted.
I have a couple of ideas on how to avoid this waste, but I have concerns about each one. Here they are:
1. Decrement the counter if a posting error occurs. Within the trigger I set up a try-except block (do try-except blocks even exist in Firebird PSQL?) and run SELECT GEN_ID(LastIdBookingItem, -1) FROM RDB$DATABASE on post exceptions. Would this work? What if another transaction sneaks in and increments the generator before I decrement it? That would really mess things up.
2. Use a temporary id. Set the id to some unique temp value that I change to the generator value I want in an AFTER INSERT trigger. This method feels somewhat contrived and requires a way to ensure that the temp id is unique. But what if the booking_item_id was supplied client side; how would I distinguish that from a temp id? Plus I need another trigger.
3. Use transaction control. This is like option 1, except instead of using the try-except block to reset the generator, I start a transaction and then roll it back if the record fails to post. I don't know the syntax for using transaction control; I thought I read somewhere that SAVEPOINT/SET TRANSACTION is not allowed in PSQL. Plus the rollback would have to happen in the AFTER INSERT trigger, so once again I need another trigger.
Surely this is an issue for any Firebird developer that wants to use Generators. Any other ideas? Is there something I'm missing?
Sequences are outside transaction control, and meddling with them to get 'gapless' numbers will only cause trouble, because another transaction could increment the sequence concurrently as well, leading to gaps + duplicates instead of no gaps:
start: generator value = 1
T1: increment: value is 2
T2: increment: value is 3
T1: 'rollback', decrement: value is 2 (and not 1 as you expect)
T3: increment: value is 3 => duplicate value
Sequences should primarily be used for generating artificial primary keys, and you shouldn't care about the existence of gaps: it doesn't matter as long as the number uniquely identifies the record.
If you need an auditable sequence of numbers, and the requirement is that there are no gaps, then you shouldn't use a database sequence to generate it. You could use a sequence to assign numbers after creating and committing the invoice itself (so that it is certain it is persisted). An invoice without a number is simply not final yet. However, even here there is a window of opportunity for a gap, e.g. if an error or other failure occurs between assigning the invoice number and committing.
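A minimal sketch of that assign-after-commit step in Firebird, with assumed names (invoice_number_gen is the sequence; invoice 1234 stands in for an invoice row that was already created and committed):

UPDATE invoices
SET invoice_number = GEN_ID(invoice_number_gen, 1)
WHERE id = 1234 AND invoice_number IS NULL;  -- only number invoices that don't have one yet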
Another way might be to explicitly create a zero-invoice (marked as cancelled/number lost) for the gap numbers, so that the auditor knows what happened to that invoice.
Depending on local law and regulations, you shouldn't 're-use' or recycle lost numbers as that might be construed as fraud.
You might find other ideas in "An Auditable Series of Numbers". This also contains a Delphi project using IBObjects, but the document itself describes the problem and possible solutions pretty well.
What if, instead of using generators, you created a table with as many columns as the number of generators, giving each column the name of a generator? Something like:
create table generators
(
invoiceNumber integer default 0 not null,
customerId integer default 0 not null,
other generators...
)
Now you have a table where you can increment the invoice number using SQL inside a transaction, something like:
begin transaction;
update generators set invoiceNumber = invoiceNumber + 1 returning invoiceNumber;
insert into invoices ...;
end transaction;
If anything goes wrong, the transaction would be rolled back, together with the new invoice number. I think there would be no more gaps in the sequence.
Enio
I have a view that lists employee (EmpID), request number (ReqNo), the date the request was opened (OpenDate), and the date it was moved to the next step in the process (AssignDate). What I am trying to do is get an average of the daily queue size. If EmpID 001 has 20 requests on 1/1/13, then 24 on 1/2/13, and 21 on 1/3/13, the average over 3 days should be 21.66, rounded up to 22. I have the following view:
CREATE VIEW EmpReqs
AS
SELECT [EmpID], [OpenDate], [AssignDate], [ReqID]
FROM [Metrics].[dbo].[Assignments]
WHERE OpenDate BETWEEN '01/01/2013' AND '12/31/2013' AND
[EmpID] IS NOT NULL AND
[ReqNo] NOT LIKE 'M%'
I then wrote a query to pull individual employee's queues per day:
/* First attempt to generate daily queue #s */
SELECT * FROM BLReqs
WHERE [BusLiaison] LIKE 'PN' AND
[OpenDate] <= '11/15/2013' AND
[AssignDate] > '11/15/2013'
Because no one has attempted to pull this information before, I have no way of verifying how accurate the above is. I tried using current dates, since I can see those in our database to compare, but the code doesn't work: nothing is returned when I change the dates to 2014 and run my query.
What is the easiest way to verify that my code is correct, short of manually counting a day's queue?
Can anyone see any issues with the above scripts?
Is there a way to get the above code to work with current dates?
This question is really hard to answer because it is kind of broad and has little information at the same time. I'll try anyway:
Because no one has attempted to pull this information before, I have no way of verifying how accurate the above is.
Try checking the result of this query for a few sampled dates.
I tried using current dates, since I can see those in our database to compare, but the code doesn't work: nothing is returned when I change the dates to 2014 and run my query.
So clearly the query is not working, and you should find out why. Run the query for a date for which you know it should return results but doesn't. Remove conditions one by one to see which one is incorrectly removing all rows. This should be enough to identify the bug.
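For example, start from the broadest form and re-add one condition per run until the rows disappear (same names as in the question; the 2014 date is only a placeholder):

SELECT * FROM BLReqs
WHERE [BusLiaison] LIKE 'PN';
-- next run, re-add one condition:
SELECT * FROM BLReqs
WHERE [BusLiaison] LIKE 'PN' AND
      [OpenDate] <= '11/15/2014';
-- and finally re-add the [AssignDate] condition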
Can anyone see any issues with the above scripts?
No, it looks fine; a very simple query. That's why I said that we have too little information: there is some key piece of information missing that would allow us to find the bug.
Is there a way to get the above code to work with current dates?
Stop staring at the code and hoping for a revelation. Debug it. Experiment.