I am trying to write a SQL Server query where my left table (database 1, Db1) holds data that needs to be checked from time to time.
I can't write to this table (Db1 is read-only), so I created another database, Db2, where I save the date of each check I perform on a Db1 record.
I then have a list, driven by the left table, that shows all the records that still need to be checked. Once I create the check record in Db2, the list should stop showing that record for some time (3-4 months).
So the query must handle this: if there is no matching record in Db2 (the join is null), the record must appear in the list. Once I create a check, the record should be hidden for 3-4 months; after that period it should reappear in the list as a reminder to do a new check. If I then create a new check in Db2 with a new date, the record on the left should hide again for another 3-4 months, and so on.
I have tried the query below. It works while there is only one record in Db2, but as soon as I create a second check, the older check record prevents the left record from being hidden from the list.
I hope someone can give me a clue.
Thank you in advance
SELECT uno.fecha_ini, uno.fecha_res, uno.fecha_fin, uno.n_contrato, uno.n_propied, tres.n_contrato
FROM [GI].[dbo].[alq03] uno
LEFT JOIN [GI_impuestos].[dbo].[checkar] tres ON uno.n_contrato = tres.n_contrato
WHERE tres.n_contrato IS NULL
/*checks*/
AND (tres.fecha_checkar IS NULL)
OR (tres.fecha_checkar NOT BETWEEN (GETDATE() - 120) AND (GETDATE()))
GROUP BY uno.fecha_ini, uno.fecha_res, uno.fecha_fin, uno.n_contrato, uno.n_propied, tres.n_contrato
ORDER BY uno.fecha_fin asc
Here is a simple example of what you are asking for. You will have to modify it to suit your needs.
create table readOnly(
id int primary key,
info varchar(100) not null);
insert into readOnly values
(1,'info to check'),
(2,'more info'),
(3,'not finished yet!');
create table chk (
rid int primary key,
checked date);
insert into chk values
(1,'2021-07-01'),
(2,'2021-12-01');
select
id,
coalesce(cast(checked as char(10)),'never') last_check,
info info_to_check
from
readOnly
left join
chk on id = rid
where
datediff(m,checked,getdate())>4
or checked is null
id | last_check | info_to_check
-: | :----------- | :----------------
1 | 2021-07-01 | info to check
3 | never | not finished yet!
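Your real checkar table can hold several check rows per contract, which is exactly what breaks your query. A hedged variant of the example above that copes with that is to reduce the checks to the latest date per record first (this assumes chk.rid is no longer unique):

select
    r.id,
    coalesce(cast(lc.last_check as char(10)), 'never') last_check,
    r.info info_to_check
from
    readOnly r
left join
    (select rid, max(checked) as last_check
     from chk
     group by rid) lc on r.id = lc.rid
where
    lc.last_check is null
    or datediff(m, lc.last_check, getdate()) > 4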
[UPDATE: 2/20/19]
I figured out a pretty trivial solution to solve this problem.
CREATE TRIGGER TriggerClaims_History on Claims
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON
INSERT INTO Claims_History
SELECT name, status, claim_date
FROM Claims
EXCEPT SELECT name, status, claim_date FROM Claims_History
END
GO
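For what it's worth, a hedged variant that compares only the rows from the current insert (via SQL Server's inserted pseudo-table) instead of re-reading all of Claims; a sketch, untested against the real schema:

CREATE TRIGGER TriggerClaims_History ON Claims
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON
    -- only the rows from this insert, minus anything already archived
    INSERT INTO Claims_History (name, status, claim_date)
    SELECT name, status, claim_date FROM inserted
    EXCEPT
    SELECT name, status, claim_date FROM Claims_History
END
GO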
I am standing up a SQL Server database for a project I am working on.

Important info: I have 3 tables - enrollment, cancel, and claims. There are files located on a server that populate these tables every day. These files are NOT deltas (i.e. each new file placed on the server every day contains data from all previous files), and because of this I am able to simply drop all tables, create tables, and then populate tables from files each day.

My question is regarding my claims table - since tables will be dropped and created each night, I need a way to keep track of all the different status changes.
I'm struggling to figure out the best way to go about this.
I was thinking of creating a claims_history table that is NOT dropped each night. Essentially I'd want my claims_history table to be populated each time an initial new record is added to the claims table. Then I'd want to scan the claims table and add a row to the claims_history table if and only if there was a change in the status column (i.e. claims.status != claims_history.status).
Day 1:
select * from claims
id | name | status
1 | jane doe | received
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
Day 2:
select * from claims
id | name | status
1 | jane doe | processed
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
1 | jane doe | processed | datetime
Is there a SQL script that can do this? I'd also like the timestamp field in the claims_history table to populate automatically each time a new row is added (status change). I know I could write a Python script to handle something like this, but I'd like to keep it in SQL if at all possible. Thank you.
According to your question, you need to create a trigger that fires after an update of the claims.status column, and it is very simple to do; use this link to see how to write one: create a simple sql server trigger.
Then, since manipulating datetime values in queries can be a pain, I would suggest using Unix time instead of datetime: you can store the date as a number in a bigint column. In MySQL you would get the current value with SELECT UNIX_TIMESTAMP(); note that SQL Server has no built-in equivalent, but the same value can be computed as shown below.
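For reference, a minimal equivalent in SQL Server (assuming you want seconds since 1970-01-01 UTC):

-- seconds since the Unix epoch (SQL Server equivalent of MySQL's UNIX_TIMESTAMP())
SELECT DATEDIFF(SECOND, '1970-01-01', GETUTCDATE()) AS unix_time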
A very common approach is to use a staging table and a production (or final) table. All your ETLs truncate and load the staging table (volatile), and then you execute a stored procedure that adds only the new records to your final table. This requires that all the data you handle this way has some form of key that identifies each row unequivocally.
What happens if your files suddenly change format or arrive badly formatted? You would drop your table and be unable to load it back until you fixed your ETL. This approach saves you from that, since the process will fail while loading the staging table and won't impact the final table. You can also keep deleted records for historical reasons instead of losing them.
I prefer to separate the staging tables into their proper schema, for example:
CREATE SCHEMA Staging
GO
CREATE TABLE Staging.Claims (
ID INT,
Name VARCHAR(100),
Status VARCHAR(100))
Now you do all your loads from your files into these staging tables, truncating them first:
TRUNCATE TABLE Staging.Claims
BULK INSERT Staging.Claims
FROM '\\SomeFile.csv'
WITH
--...
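The WITH options depend entirely on your file layout; for a comma-separated file with a header row they might look like this (illustrative values only, the path is hypothetical):

BULK INSERT Staging.Claims
FROM '\\SomeServer\SomeShare\Claims.csv' -- hypothetical path
WITH (
    FIELDTERMINATOR = ',',  -- column delimiter
    ROWTERMINATOR = '\n',   -- row delimiter
    FIRSTROW = 2            -- skip the header row
)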
Once this table is loaded, you execute a specific SP that adds the delta between the staging content and your final table. You can add whichever logic you want here, such as doing only inserts for new records, or also updating and soft-deleting already existing rows. For example:
CREATE TABLE dbo.Claims (
ClaimAutoID INT IDENTITY PRIMARY KEY,
ClaimID INT,
Name VARCHAR(100),
Status VARCHAR(100),
WasDeleted BIT DEFAULT 0,
ModifiedDate DATETIME,
CreatedDate DATETIME DEFAULT GETDATE())
GO
CREATE PROCEDURE Staging.UpdateClaims
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
-- Update changed values
UPDATE C SET
Name = S.Name,
Status = S.Status,
ModifiedDate = GETDATE()
FROM
Staging.Claims AS S
INNER JOIN dbo.Claims AS C ON S.ID = C.ClaimID -- This has to be by the key columns
WHERE
ISNULL(C.Name, '') <> ISNULL(S.Name, '') OR -- update when any tracked column differs
ISNULL(C.Status, '') <> ISNULL(S.Status, '')
-- Insert new records
INSERT INTO dbo.Claims (
ClaimID,
Name,
Status)
SELECT
ClaimID = S.ID,
Name = S.Name,
Status = S.Status
FROM
Staging.Claims AS S
WHERE
NOT EXISTS (SELECT 'not yet loaded' FROM dbo.Claims AS C WHERE S.ID = C.ClaimID) -- This has to be by the key columns
-- Mark deleted records as deleted
UPDATE C SET
WasDeleted = 1,
ModifiedDate = GETDATE()
FROM
dbo.Claims AS C
WHERE
NOT EXISTS (SELECT 'not anymore on files' FROM Staging.Claims AS S WHERE S.ID = C.ClaimID) -- This has to be by the key columns
COMMIT
END TRY
BEGIN CATCH
DECLARE @v_ErrorMessage VARCHAR(MAX) = ERROR_MESSAGE()
IF @@TRANCOUNT > 0
ROLLBACK
RAISERROR (@v_ErrorMessage, 16, 1)
END CATCH
END
This way you always work with dbo.Claims and the records are never lost (just updated or inserted).
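After each nightly load of the staging table, the whole delta step is then a single call:

-- run once Staging.Claims has been reloaded from the daily file
EXEC Staging.UpdateClaims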
If you need to check the last status of a particular claim you can create a view:
CREATE VIEW dbo.vClaimLastStatus
AS
WITH ClaimsOrdered AS
(
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate,
DateRanking = ROW_NUMBER() OVER (PARTITION BY C.ClaimID ORDER BY C.CreatedDate DESC)
FROM
dbo.Claims AS C
)
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate
FROM
ClaimsOrdered AS C
WHERE
DateRanking = 1
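Querying the latest status then stays a plain lookup, for example:

-- latest known status of claim 1
SELECT ClaimID, Name, Status, ModifiedDate
FROM dbo.vClaimLastStatus
WHERE ClaimID = 1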
I would like to create a product table. Each product has a unique part number. However, each part number has a varying number of previous part numbers, and a varying number of machines where the part can be used.
For example, the description for part no. AA1007:
Previous part no's: AA1001, AA1002, AA1004, AA1005,...
Machine brands: Bosch, Indesit, Samsung, HotPoint, Sharp,...
Machine Brand Models: Bosch A1, Bosch A2, Bosch A3, Indesit A1, Indesit A2,....
I would like to create a table for this, but I am not sure how to proceed. What I have been able to think of is to create separate tables for Previous Part No, Machine Brand, and Machine Brand Model.
Question: what is the proper way to design these tables?
There are of course various ways to design the tables. A very basic approach would be the following.
You could create tables like the ones below. I added the columns ValidFrom and ValidTill to identify the time span during which a part was active/in use.
It depends on your data whether the date datatype is precise enough, or whether you need datetime to be more exact.
CREATE TABLE Parts
(
ID bigint NOT NULL
,PartNo varchar(100)
,PartName varchar(100)
,ValidFrom date
,ValidTill date
)
CREATE TABLE Brands
(
ID bigint NOT NULL
,Brand varchar(100)
)
CREATE TABLE Models
(
ID bigint NOT NULL
,BrandsID bigint NOT NULL
,ModelName varchar(100)
)
CREATE TABLE ModelParts
(
ModelsID bigint NOT NULL
,PartID bigint NOT NULL
)
Fill your data like:
-- all five rows share ID 1: they are successive part numbers of the same logical part
INSERT INTO Parts VALUES
(1,'AA1007', 'Screw HyperFuturistic', '2017-08-09', '9999-12-31'),
(1,'AA1001', 'Screw Iron', '1800-01-01', '1918-06-30'),
(1,'AA1002', 'Screw Steel', '1918-07-01', '1945-05-08'),
(1,'AA1004', 'Screw Titanium', '1945-05-09', '1983-10-05'),
(1,'AA1005', 'Screw Futurium', '1983-10-06', '2017-08-08')
INSERT INTO Brands VALUES
(1,'Bosch'),
(2,'Indesit'),
(3,'Samsung'),
(4,'HotPoint'),
(5,'Sharp')
INSERT INTO Models VALUES
(1,1,'A1'),
(2,1,'A2'),
(3,1,'A3'),
(4,2,'A1'),
(5,2,'A2')
INSERT INTO ModelParts VALUES
(1,1)
To select all parts valid on a certain date (in this case 2013-03-03) for the "Bosch A1":
DECLARE @ReportingDate date = '2013-03-03'
SELECT B.Brand
,M.ModelName
,P.PartNo
,P.PartName
,P.ValidFrom
,P.ValidTill
FROM Brands B
INNER JOIN Models M
ON M.BrandsID = B.ID
INNER JOIN ModelParts MP
ON MP.ModelsID = M.ID
INNER JOIN Parts P
ON P.ID = MP.PartID
WHERE B.Brand = 'Bosch'
AND M.ModelName = 'A1'
AND P.ValidFrom <= @ReportingDate
AND P.ValidTill >= @ReportingDate
Of course there are several ways to historize data.
ValidFrom and ValidTill (ValidTo) is one of my favourites, as it makes historical reports easy.
Unfortunately you have to handle the historization yourself: when inserting a new row - for example for your screw - you have to "close" the old record by setting its ValidTill column before inserting the new one, as sketched below. Furthermore, you have to develop logic to handle deletes...
Well, that's quite a large topic. You will find tons of information on the web.
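As a hedged sketch of that close-and-insert step for the Parts table above (the new part number, name, and date are made up for illustration):

DECLARE @NewValidFrom date = '2020-01-01' -- hypothetical switch-over date

-- "close" the currently valid record of logical part 1 ...
UPDATE Parts
SET ValidTill = DATEADD(DAY, -1, @NewValidFrom)
WHERE ID = 1
AND ValidTill = '9999-12-31'

-- ... then insert its successor
INSERT INTO Parts VALUES
(1, 'AA1008', 'Screw Unobtainium', @NewValidFrom, '9999-12-31')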
For the part number table, you can consider the following suggestion:
id | part_no | time_created
1 | AA1007 | 2017-08-08
1 | AA1001 | 2017-07-01
1 | AA1002 | 2017-06-10
1 | AA1004 | 2017-03-15
1 | AA1005 | 2017-01-30
In other words, you can add a datetime column which versions each part number. Note that I added an id column here, which is invariant over time and keeps track of each logical part, even though the part number may change.
For time-independent queries, you would join this table using the id column. However, the part number might also serve as a foreign key. Off the top of my head: if you were generating an invoice for a previous date, you might look up the part number that applied at that time, and then join out to one or more tables using that part number, as sketched below.
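A hedged sketch of that invoice-date lookup, assuming the versioned table above is called part_no_history:

DECLARE @InvoiceDate date = '2017-07-15' -- hypothetical invoice date

-- part number that was current for logical part 1 on the invoice date
SELECT TOP (1) part_no
FROM part_no_history
WHERE id = 1
AND time_created <= @InvoiceDate
ORDER BY time_created DESC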
For the other tables you mentioned, I do not see a similar requirement.
I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 5 | ... |
The way that I would prefer this to be done is to build up a list of available IDs via an SQL query. From there, I can do all the checks within the code of my application.
So, in summary, I would like an SQL query that retrieves all available IDs between 1 and 99999 from a specific table column.
First build a table of all N IDs.
declare @allPossibleIds table (id integer)
declare @currentId integer
select @currentId = 1
while @currentId < 1000000
begin
    insert into @allPossibleIds
    select @currentId
    select @currentId = @currentId + 1
end
Then, left join that table to your real table. You can select MIN if you want, or you could limit @allPossibleIds to be less than the max table id.
select a.id
from #allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null
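If you'd rather avoid the loop, a set-based way to build the same number list (this works from SQL Server 2005 onwards) is a recursive CTE; a sketch, with YourTable again standing in for your real table:

with allPossibleIds (id) as
(
    select 1
    union all
    select id + 1 from allPossibleIds where id < 99999
)
select a.id
from allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null
option (maxrecursion 0) -- the default recursion cap is 100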
Don't go for identity.
Let me give you an easy option while I work on a proper one.
Store the integers from 1 to 999999 in a table, say Insert_sequence, and write a stored procedure for the insertion: you can easily identify the minimum value that is present in Insert_sequence but not in your main table, store that value in a variable, and insert the row with the ID taken from the variable. A hedged sketch follows below.
Regards
Ashutosh Arya
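A hedged sketch of that stored procedure idea (Insert_sequence, MainTable, and the columns are placeholder names):

CREATE PROCEDURE InsertWithReusedId
    @Data varchar(100) -- hypothetical payload column
AS
BEGIN
    DECLARE @NewId int

    -- smallest sequence number not yet used as an ID in the main table
    SELECT @NewId = MIN(s.num)
    FROM Insert_sequence s
    WHERE NOT EXISTS (SELECT 1 FROM MainTable m WHERE m.ID = s.num)

    INSERT INTO MainTable (ID, Data) VALUES (@NewId, @Data)
END

Note that under concurrent inserts two sessions could pick the same ID, so in practice this needs a transaction with appropriate locking.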
You could also loop through the keys, and when you hit an empty one, select it and exit the loop.
DECLARE @intStart INT, @loop bit
SET @intStart = 1
SET @loop = 1
WHILE (@loop = 1)
BEGIN
    IF NOT EXISTS (SELECT [Key] FROM [Table] WHERE [Key] = @intStart)
    BEGIN
        SELECT @intStart AS 'FreeKey'
        SET @loop = 0
    END
    SET @intStart = @intStart + 1
END
GO
From there you can use the key as you please. Setting an @intStop variable to limit the loop would be no problem.
Why do you need a table from 1..999999? All the information you need is in your source table. Here is a query which gives you the minimal ID to insert into a gap.
It works for all combinations:
(2,3,4,5) -> 1
(1,2,3,5) -> 4
(1,2,3,4) -> 5
select min(t1.id) + 1
from
(
    select id from t
    union
    select 0
) t1
left join t as t2 on t1.id = t2.id - 1
where t2.id is null
Many people use an auto-incrementing integer or long value for the primary key of their tables, and it is often called ID or MyEntityID or something similar. This column, since it's just an auto-incrementing integer, often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database couldn't care less about which IDs are in use and which are not.
I would highly suggest you forget trying to do this and just leave the ID column to auto-increment. You should also create an index on your table made up of the (subset of) columns that uniquely identify each record (and even consider using this index as your primary key index). In the rare case where you would need all columns to accomplish that, an auto-incrementing primary key ID is extremely useful, because it may not be performant to create an index over every column in the table. Even so, the database engine couldn't care less about this ID (e.g. which ones are in use, which are not, etc.).
Also consider that a 32-bit integer ID gives you roughly 4.3 billion values in total (about 2.1 billion if you stick to positive numbers). It is quite unlikely that you'll exhaust that supply in any short amount of time, which further bolsters the argument that this sort of thing is a waste of time and resources.
I have a very simple table students, structure as below, where the primary key is id. This table is a stand-in for about 20 multi-million row tables that get joined together a lot.
+----+----------+------------+
| id | name | dob |
+----+----------+------------+
| 1 | Alice | 01/12/1989 |
| 2 | Bob | 04/06/1990 |
| 3 | Cuthbert | 23/01/1988 |
+----+----------+------------+
If Bob wants to change his date of birth, then I have a few options:
Update students with the new date of birth.
Positives: 1 DML operation; the table can always be accessed by a single primary key lookup.
Negatives: I lose the fact that Bob ever thought he was born on 04/06/1990
Add a column, created date default sysdate, to the table and change the primary key to id, created. Every update becomes:
insert into students(id, name, dob) values (:id, :name, :new_dob)
Then, whenever I want the most recent information do the following (Oracle but the question stands for every RDBMS):
select id, name, dob
from ( select a.*, rank() over ( partition by id
order by created desc ) as "rank"
from students a )
where "rank" = 1
Positives: I never lose any information.
Negatives: All queries over the entire database take that little bit longer. If the table stayed the size indicated this wouldn't matter, but once you're on your 5th left outer join, using range scans rather than unique scans begins to have an effect.
Add a different column, deleted date default to_date('2100/01/01','yyyy/mm/dd'), or whatever overly early, or futuristic, date takes my fancy. Change the primary key to id, deleted then every update becomes:
update students x
set deleted = sysdate
where id = :id
and deleted = ( select max(deleted) from students where id = x.id );
insert into students(id, name, dob) values ( :id, :name, :new_dob );
and the query to get out the current information becomes:
select id, name, dob
from ( select a.*, rank() over ( partition by id
order by deleted desc ) as "rank"
from students a )
where "rank" = 1
Positives: I never lose any information.
Negatives: Two DML operations; I still have to use ranked queries, with the additional cost of a range scan rather than a unique index scan, in every query.
Create a second table, say student_archive and change every update into:
insert into student_archive select * from students where id = :id;
update students set dob = :newdob where id = :id;
Positives: Never lose any information.
Negatives: 2 DML operations; if you ever want to get all the information ever you have to use union or an extra left outer join.
For completeness, have a horribly de-normalised data-structure: id, name1, dob, name2, dob2... etc.
Number 1 is not an option if I never want to lose any information and always do a soft delete. Number 5 can be safely discarded as causing more trouble than it's worth.
I'm left with options 2, 3 and 4 with their attendant negative aspects. I usually end up using option 2 and the horrific 150 line (nicely-spaced) multiple sub-select joins that go along with it.
tl;dr I realise I'm skating close to the line on a "not constructive" vote here but:
What is the optimal (singular!) method of maintaining logical consistency while never deleting any data?
Is there a more efficient way than those I have documented? In this context I'll define efficient as "less DML operations" and / or "being able to remove the sub-queries". If you can think of a better definition when (if) answering please feel free.
I'd stick to #4, with some modifications. There is no need to delete data from the original table; it's enough to copy the old values into the archive table before updating (or deleting) the original record. That can easily be done with a row-level trigger, as sketched below. Retrieving all the information is, in my opinion, not a frequent operation, and I don't see anything wrong with the extra join/union. Also, you can define a view so that all queries are straightforward from the end user's perspective.
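A hedged sketch of such a trigger in Oracle syntax, using the student_archive table from option 4 plus an archived_at timestamp column (the extra column is my assumption):

CREATE OR REPLACE TRIGGER students_archive_trg
BEFORE UPDATE OR DELETE ON students
FOR EACH ROW
BEGIN
  -- copy the old values away before they are overwritten or removed
  -- (archived_at is an assumed extra column)
  INSERT INTO student_archive (id, name, dob, archived_at)
  VALUES (:old.id, :old.name, :old.dob, SYSDATE);
END;
/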
I'm getting odd results from a MySQL SELECT query involving a LEFT JOIN, and I can't understand whether my understanding of LEFT JOIN is wrong or whether I'm seeing a genuinely odd behavior.
I have two tables with a many-to-one relationship: for every record in table 1 there are 0 or more records in table 2. I want to select all the records in table 1 with a column that counts the number of related records in table 2. As I understand it, LEFT JOIN should always return all records on the left side of the statement.
Here's a test database that exhibits the problem:
CREATE DATABASE Test;
USE Test;
CREATE TABLE Dates (
dateID INT UNSIGNED NOT NULL AUTO_INCREMENT,
date DATE NOT NULL,
UNIQUE KEY (dateID)
) ENGINE=MyISAM;
CREATE TABLE Slots (
slotID INT UNSIGNED NOT NULL AUTO_INCREMENT,
dateID INT UNSIGNED NOT NULL,
UNIQUE KEY (slotID)
) ENGINE=MyISAM;
INSERT INTO Dates (date) VALUES ('2008-10-12'),('2008-10-13'),('2008-10-14');
INSERT INTO Slots (dateID) VALUES (3);
The Dates table has three records, and the Slots table has one - and that record points to the third record in Dates.
If I do the following query...
SELECT d.date, count(s.slotID) FROM Dates AS d LEFT JOIN Slots AS s ON s.dateID=d.dateID GROUP BY s.dateID;
..I expect to see a table with 3 rows in - two with a count of 0, and one with a count of 1. But what I actually see is this:
+------------+-----------------+
| date | count(s.slotID) |
+------------+-----------------+
| 2008-10-12 | 0 |
| 2008-10-14 | 1 |
+------------+-----------------+
The first record with a zero count appears, but the other record with a zero count (2008-10-13) is missing.
Am I doing something wrong, or do I just not understand what LEFT JOIN is supposed to do?
You need to GROUP BY d.dateID. In two of your rows, s.dateID is NULL (because of the LEFT JOIN), and those rows are grouped together.
I think you will also find that this is invalid (ANSI) SQL, because d.date is not part of a GROUP BY or the result of an aggregate operation, and should not be able to be SELECTed.
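Concretely, grouping by the left table's key (plus the selected column, to keep it valid ANSI SQL) should give the expected three rows:

-- expected: 2008-10-12 -> 0, 2008-10-13 -> 0, 2008-10-14 -> 1
SELECT d.date, COUNT(s.slotID)
FROM Dates AS d
LEFT JOIN Slots AS s ON s.dateID = d.dateID
GROUP BY d.dateID, d.date;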
I think you mean to group by d.dateId.
Try removing the GROUP BY s.dateID
The dateIDs for 10-12 and 10-13 are grouped together by your query: since they are two NULL values, the count evaluates to 0.
I don't know if this is valid in MySQL, but you could probably avoid this mistake in the future by using the following syntax instead:
SELECT date, count(slotID) as slotCount
FROM Dates LEFT OUTER JOIN Slots USING (dateID)
GROUP BY (date)
By using the USING clause you don't get two dateIDs to keep track of.
Replace GROUP BY s.dateID with GROUP BY d.dateID.