select records that do not exist in a select - sql

I need to track changes made in a directory and also save the history. I have a function that scans the files in that directory and then inserts them into a table. Let's say that the first time this program was run there were files A and B. As a result the table should look like:
FileID  File  DateModified
1       A     101010
2       B     020202
Let's say the user modifies file B; therefore the next time the program runs the table should look like:
FileID  File  DateModified
1       A     101010
2       B     020202
3       A     101010
4       B     030303
From looking at the table above we know that file B has been changed, because it has a different modified date, and also that file A was not modified. Moreover, my program knows that the records just inserted are all the records with a FileID greater than 2. How could I perform a select that returns only the last file B, because that file was modified? I want to be able to know which files have been modified; how could I build that query?
Please read above first in order to understand this part. Here is another example.
First time program runs:
FileID  File  DateModified
1       X     101010
2       Y     020202
Next time program runs:
FileID  File  DateModified
1       X     101010
2       Y     020202
3       Y     020202
4       A     010101
So far we know that file X has been deleted, because it is not included in the new scan. Moreover, we know that file A has been created. And lastly, file Y was not modified; it is the same. I would like to perform a select where I just get the files that were created or modified, such as file A in this case.
I am looking for something like:
select * from table1 where FileID > 2 AND File NOT IN (SELECT File FROM table1 WHERE FileID <= 2) AND DateModified NOT IN (SELECT DateModified FROM table1 WHERE FileID <= 2)
I don't know why it is that when I perform such a query I get different results than expected. Maybe I will have to group File and DateModified into one column to make it work.

I would add a column called scan_number so that you can compare the latest scan with the previous scan.
-- compare the latest scan (here 100) against the previous one (99)
SELECT curr.file, prev.file, curr.DateModified, prev.DateModified
FROM table1 curr
LEFT JOIN table1 prev
    ON curr.file = prev.file
    AND prev.scan_number = 99
WHERE curr.scan_number = 100
  AND (prev.file IS NULL                           -- file is new in this scan
       OR curr.DateModified != prev.DateModified)  -- file was modified
This catches inserts and updates. If you also want to catch deletes, you need a FULL OUTER JOIN, but SQLite doesn't support that. You might have to run the query twice: once to find inserts and updates, and once, with the join direction reversed, to find deletes.
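For example, the delete-finding pass could look like this (a sketch reusing the same table and scan numbers as above):
-- files present in the previous scan (99) but missing from the latest (100)
SELECT prev.file, prev.DateModified
FROM table1 prev
LEFT JOIN table1 curr
    ON curr.file = prev.file
    AND curr.scan_number = 100
WHERE prev.scan_number = 99
  AND curr.file IS NULL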

DateModified is being asked to perform too many jobs: it is used to track both the file modification date and proof-of-existence for a given filename on a given date.
You could add another column, ScanId, a foreign key to a new table ScanDates that records a scan id and the date the scan was run.
That way, you could inspect all the results with a given ScanId and compare against any selected previous scan, and also keep track of the real modification dates of your files.
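A minimal sketch of that change in SQLite (the names follow the suggestion above; table1 comes from the question):
-- one row per scan run
CREATE TABLE ScanDates (
    ScanId   INTEGER PRIMARY KEY,
    ScanDate TEXT NOT NULL          -- when the scan was run
);

-- tie each scanned file row to the scan that observed it
ALTER TABLE table1 ADD COLUMN ScanId INTEGER REFERENCES ScanDates (ScanId);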

Script that will take a field from one table and see if there has been an entry in another table containing that field

I have an issue where we are monitoring uploads into the database and alerting if a file takes more than 10 minutes to upload.
However we are finding that a majority of the alerts are happening because the file size is such that it is taking more than 10 minutes to upload.
So what I am hoping someone can help me with is a script that will take the ID from table A (unique per upload) and look in the second table for anything in the last 10 minutes that contains that ID in column UploadFile of the second table.
So Table A, Upload ID 121212 created > now - 10 minutes
In Table B, has there been an entry in the last 10 minutes containing 121212 in column UploadFile?
If no, then return positive; if yes, then end.
What I thought would be a simple task, I am failing miserably at creating.
You can use a query like the one below: an INNER JOIN with a condition on UploadID, and B's Created between A's Created and 10 minutes after A's Created.
Reference - DATEADD (Transact-SQL)
IF EXISTS (SELECT 1
           FROM A
           INNER JOIN B
               ON B.UploadFile LIKE CONCAT('%:', A.UploadID)
               AND B.Created BETWEEN A.Created AND DATEADD(MINUTE, 10, A.Created)
           WHERE A.UploadID = 121212)
BEGIN
    RETURN 0;  -- a matching entry arrived within 10 minutes: end
END
ELSE
BEGIN
    RETURN 1;  -- no entry found: return positive (raise the alert)
END
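Since RETURN like this is only valid inside a stored procedure, a minimal wrapper (the procedure name and parameter are just placeholders) might look like:
CREATE PROCEDURE dbo.CheckUpload @UploadID INT
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM A
               INNER JOIN B
                   ON B.UploadFile LIKE CONCAT('%:', A.UploadID)
                   AND B.Created BETWEEN A.Created AND DATEADD(MINUTE, 10, A.Created)
               WHERE A.UploadID = @UploadID)
        RETURN 0;  -- entry found in time: end
    RETURN 1;      -- no entry: return positive (alert)
END

-- usage:
DECLARE @alert INT;
EXEC @alert = dbo.CheckUpload @UploadID = 121212;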

What is the best way to reassign ordinal number of a move operation

I have a column in SQL Server called "Ordinal" that is used to indicate the display order of the rows. It starts from 0 and skips 10 for the next row, so we have something like this:
Id  Ordinal
1   0
2   20
3   10
It skips 10 because we wanted to be able to move an item in between other items (based on ordinal) without having to reassign ordinal numbers for the entire table.
As you can imagine, eventually the ordinal numbers will need to be reassigned somehow for a move-in-between operation, either on the surrounding rows or for the entire table, once the unused ordinal numbers between the target items are all used up.
Is there an algorithm I can use to effectively reorder the ordinal numbers for the move operation, taking into consideration long-term maintainability of the table and minimizing update operations?
You can re-number the sequences using a somewhat complicated UPDATE statement:
UPDATE u
SET u.sequence = 10 * (c.num_below - 1)
FROM test u
JOIN (
    SELECT t.id, COUNT(*) AS num_below
    FROM test t
    JOIN test tr ON tr.sequence <= t.sequence
    GROUP BY t.id
) c ON c.id = u.id
The idea is to obtain, for each row, a count of the items with a sequence at or below that row's sequence, subtract one, multiply by ten, and assign the result as the new sequence value.
The content of test before the UPDATE:
ID  Sequence
__  ________
1   0
2   10
3   20
4   12
The content of test after the UPDATE:
ID  Sequence
__  ________
1   0
2   10
3   30
4   20
Now the sequence numbers are evenly spread again, so you can continue inserting in the middle until you run out of new sequence numbers; then you can re-number again.
These won't answer your question directly--I just thought I might suggest some other approaches:
One possibility--don't try to do it by hand. Have your software manage the numbers. If they need re-writing, just save them with new numbers.
A second--use a "linked list" instead: in each record, store the index of the next record you want displayed, then have your code load that directly into a linked list. A sketch follows.
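A sketch of that linked-list layout (the items table and NextId column are hypothetical):
-- each row stores the id of the row displayed after it; NULL marks the end
CREATE TABLE items (
    Id     INT PRIMARY KEY,
    NextId INT NULL REFERENCES items (Id)
);
-- moving a row then touches at most three NextId values, however large the
-- table: unlink the moved row, point its new predecessor at it, and point it
-- at its new successor.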
Yet another simple approach (sketched below): let's say you're inserting a new record with an ordinal equal to x.
First, check whether there is already a row with ordinal value x. If there is, update all records whose ordinal value is equal to or bigger than x, increasing them by y. Then you are safe to insert the new record.
This way you're sure you won't run the update every time, and of course you'll keep the order.
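A sketch of that insert-with-shift in T-SQL (table and variable names are illustrative), with x = 15 and y = 10:
DECLARE @x INT = 15, @y INT = 10;

-- make room at ordinal @x only when that slot is already taken
IF EXISTS (SELECT 1 FROM test WHERE Ordinal = @x)
    UPDATE test
    SET Ordinal = Ordinal + @y
    WHERE Ordinal >= @x;

INSERT INTO test (Ordinal) VALUES (@x);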

sql query to fetch only those records where sum(colNm)<xyz and store first and last records rowid/pk

I have a table with millions of records which holds information about a user, his or her documents in a BLOB, and a column holding the file size per row.
While reporting I need to extract all these records along with their attachments and store them in a folder. However, the constraint is that the folder size should not exceed 4GB.
What I need is to fetch records only up to the record where the running sum of file sizes stays below 4GB. I have hardly any experience with databases, and do not have a DB expert to ask.
For example, say I need to fetch records only while sum(fileSize) < 9:
Name  fileSize
A     1
B     2
C     3
D     2
E     9
F     4
My query needs to return records A, B, C and D.
Also, I need to store the rowID/uniqueID of the first and last record for a subsequent process.
The DB being used is IBM DB2.
Thanks!
Here is a trick for finding your file size; in a procedure you can then work with the data.
select length(file_data) from files
where length(file_data) < 99999999;

LENGTH(FILE_DATA)
            82944
            82944
            91136
3 rows selected.

select dbms_lob.getlength(file_data) from files
where length(file_data) < 89999;

DBMS_LOB.GETLENGTH(FILE_DATA)
                        82944
                        82944
2 rows selected.
See also: dbms_lob.getlength() vs. length() to find BLOB size in Oracle.
Hope this helps.
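For the running-total part of the question itself, a sketch using DB2's OLAP windowing (assuming a table named files with columns Name, fileSize, and rowID, and that rowID defines the order) could look like:
SELECT Name, fileSize, rowID
FROM (
    SELECT Name, fileSize, rowID,
           SUM(fileSize) OVER (ORDER BY rowID
                               ROWS UNBOUNDED PRECEDING) AS running_total
    FROM files
) t
WHERE running_total < 9
ORDER BY rowID;
MIN(rowID) and MAX(rowID) over that result would then give the first and last record for the subsequent process.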

Limit results to x groups

I'm developing a system using Trac, and I want to limit the number of "changelog" entries returned. The issue is that Trac collates these entries from multiple tables using a union, and then later combines them into single 'changesets' based on their timestamp. I wish to limit the results to the latest e.g. 3 changesets, but this requires retrieving as many rows as necessary until I've got 3 unique timestamps. Solution needs to work for SQLite/Postgres.
Trac's current SQL
Current SQL Result
Time              User  Field        oldvalue  newvalue  permanent
==================================================================
1371806593507544  a     owner        b         c         1
1371806593507544  a     comment      2         lipsum    1
1371806593507544  a     description  foo       bar       1
1371806593324529  b     comment      hello     world     1
1371806593125677  c     priority     minor     major     1
1371806592492812  d     comment      x         y         1
Intended SQL Result (limited to 1 timestamp, e.g.)
Time              User  Field        oldvalue  newvalue  permanent
==================================================================
1371806593507544  a     owner        b         c         1
1371806593507544  a     comment      2         lipsum    1
1371806593507544  a     description  foo       bar       1
As you already pointed out yourself, this cannot be resolved in SQL alone due to the undetermined number of rows required. And I think it is not even necessary.
You can use a slightly modified trac/ticket/templates/ticket.html Genshi template to get what you want. Change
<div id="changelog">
<py:for each="change in changes">
into
<div id="changelog">
<py:for each="change in changes[-3:]">
and place the file into <env>/templates/, then restart your web server. But watch out for changes to ticket.html whenever you upgrade your Trac installation: every time you do that, you might need to re-apply this change to the current template of the respective version. But IMHO it's still a lot faster and cleaner than patching Trac core code.
If you want just three records (as in the "Data Limit 1" result set), you can use limit:
select *
from t
order by time desc
limit 3
If you want all records for the three most recent time stamps, you can use a join:
select t.*
from t
join (select distinct time
      from t
      order by time desc
      limit 3
     ) tt
    on tt.time = t.time
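With limit 1 instead of 3 in the subquery, the join version returns exactly the rows shown in the "Intended SQL Result" above.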

Join More Than 2 Tables

I have three tables.
Table Data contains data for individual parts that come from a "data.txt" file.
Table Limits contains the limits for the Data table, from a "limits.txt" file.
Table Files is a listing of each individual .txt file above.
So the "Files" table looks like this. As you can see it is a listing of each file that exists. The LimitsA file will contain the limits for every Data file of type A.
ID  File_Name  Type  Sub-Type
1   DataA_10   A     10
2   DataA_20   A     20
3   DataA_30   A     30
4   LimitsA    A     NONE
5   DataB_10   B     10
6   DataB_20   B     20
7   LimitsB    B     NONE
The "Data" table looks like this. The File_ID is the foreign key from the "Files" table. Specifically, this would be data for DataA_10 above:
ID  File_ID  Dat1  Dat2  Dat3 ... Dat20
1   1        50    52    53
2   1        12    43    52
3   1        32    42    62
The "Limits" table looks like this. The File_ID is the foreign key from the "Files" table. Specifically, this would be data for LimitsA above:
ID  File_ID  Sub-Type  Lim1  Lim2
1   4        10        40    60
2   4        20        20    30
3   4        30        10    20
So what I want to do is JOIN the correct limits from the "Limits" table to the data from the corresponding "Data" table. Each row of DataA_10 would have the limits of "40" and "60" from the LimitsA table. Unfortunately, there is no way to directly link the Limits table to the Data table. The only way to do this is to look back at the Files table and see that LimitsA and DataA_10 are of type A. Once I link those two together, I then need to grab only the limits for Sub-Type 10.
In the end I would like to have a result that looks like this.
Result:
ID  File_ID  Dat1  Dat2  Dat3 ... Dat20  Lim1  Lim2
1   1        50    52    53              40    60
2   1        12    43    52              40    60
3   1        32    42    62              40    60
I hope this is clear enough to understand. It seems to me like an issue of joining more than 2 tables, but I have been unable to find a suitable solution online as of yet. If you have a solution or any advice it would be greatly appreciated.
Your 'Files' table actually merges two separate (but related) concepts. If you break them out using subqueries, you'll have a much easier time making the join. Note that joining like this is not the most efficient method, but then again, neither is the given schema...
SELECT Data.*, Limits.Lim1, Limits.Lim2
-- per the sample data, limit file rows are marked with Sub-Type = 'NONE'
FROM (SELECT * FROM Files WHERE [Sub-Type] <> 'NONE') DataFiles   -- data file entries
JOIN (SELECT * FROM Files WHERE [Sub-Type] = 'NONE') LimitFiles   -- limit file entries
    ON LimitFiles.Type = DataFiles.Type
JOIN Data
    ON DataFiles.ID = Data.File_ID
JOIN Limits
    ON LimitFiles.ID = Limits.File_ID
    AND DataFiles.[Sub-Type] = Limits.[Sub-Type]
ORDER BY Data.File_ID
UPDATE
To be more specific about how to improve the schema: currently, the Files table doesn't have a clear way to differentiate between Data and Limits file entries. Aside from this, the Data entries don't have a clear link to a single Limits file entry. Although both of these can be figured out as in the SQL above, such logic might not play well with the query optimizer, and it certainly can't guarantee the Data-Limits link that you require.
Consider these options:
Instead of linking to a 'Limit' file via Type, link directly to a Limit entry Id. Set a foreign key on that link to ensure the expected Limit entry is available.
Separate the 'Limit' entries from the 'Data' entries by putting them in a separate table.
Create an index on the foreign key. For that matter, add indices for all foreign keys - SQL Server doesn't do this by default.
Of these, I would consider having a foreign key essential, and the others modest improvements. A sketch of the foreign-key option follows.
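A minimal T-SQL sketch of that direct link (the Limit_ID column, constraint, and index names are hypothetical, and it assumes Limits.ID is the primary key):
-- let each Data row point straight at its Limits row
ALTER TABLE Data ADD Limit_ID INT NULL;

ALTER TABLE Data ADD CONSTRAINT FK_Data_Limits
    FOREIGN KEY (Limit_ID) REFERENCES Limits (ID);

-- SQL Server does not create indexes on foreign keys automatically
CREATE INDEX IX_Data_Limit_ID ON Data (Limit_ID);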