SQL join: Detect changes

Periodically, I want to compare a global sql table (called "resource") with a local backup one (called "region_db") to see if a field has been changed. The field I'm monitoring this way is called "state", and the primary key is called "id". Currently I'm doing
SELECT id, state FROM resource
Then manually going through the resulting rows in a loop. For each (id, state) tuple, I do
SELECT state FROM region_db WHERE id = id
And check if the state from the local region_db matches the one from the global resource db. I'm able to detect two cases this way: 1) when a new id is added to resource, and 2) when the state of an existing row changes.
However, I'm missing the case where a row is deleted from the resource table.
I'm thinking about using JOINs but not sure about how to efficiently distinguish between the three cases (modify existing, add new, and delete row from resource table) while minimizing the number of JOINs / DB operations.

You can use a full join:
select coalesce(r.id, reg.id) as id,
(case when r.id is null then 'DELETED'
when reg.id is null then 'CREATED'
else 'UPDATED'
end) as change_type
from resource r full join
region_db reg
on r.id = reg.id
where r.id is null or reg.id is null or r.state <> reg.state; -- something changed
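Not every engine supports FULL JOIN, so here is a minimal sketch of the same three-way classification built from two LEFT JOINs and a UNION ALL. It runs against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the detect_changes helper are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource  (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE region_db (id INTEGER PRIMARY KEY, state TEXT);
    INSERT INTO resource  VALUES (1, 'active'), (2, 'stopped'), (3, 'active');
    INSERT INTO region_db VALUES (1, 'active'), (2, 'active'),  (4, 'active');
""")

def detect_changes(conn):
    """Return (id, change_type) for every row that differs between the tables."""
    query = """
        -- Rows present in resource: either new or changed.
        SELECT r.id AS id,
               CASE WHEN reg.id IS NULL THEN 'CREATED' ELSE 'UPDATED' END AS change_type
        FROM resource r
        LEFT JOIN region_db reg ON r.id = reg.id
        WHERE reg.id IS NULL OR r.state <> reg.state
        UNION ALL
        -- Rows present only in the backup: deleted from resource.
        SELECT reg.id, 'DELETED'
        FROM region_db reg
        LEFT JOIN resource r ON r.id = reg.id
        WHERE r.id IS NULL
        ORDER BY 1
    """
    return conn.execute(query).fetchall()

changes = detect_changes(conn)
print(changes)
```

Note that `<>` never evaluates to true when either state is NULL, so a change to or from a NULL state would need an extra null-aware comparison.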

WITH joined AS (
SELECT
resource.id, -- keep the key so changed rows can be identified
region_db.state AS region_state,
resource.state AS global_state
FROM
resource
INNER JOIN
region_db
ON
resource.id = region_db.id
) SELECT * FROM joined WHERE region_state <> global_state;
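This query returns the rows whose state differs between the two tables. If you use a left join instead of an inner join in the CTE, you can also get records that have been added to resource but not yet backed up to region_db. Likewise, with a right join, you can get records that have been deleted from resource but not yet removed from region_db. In both cases the outer filter must also test for NULL (for example, region_state IS NULL), because a <> comparison against NULL never evaluates to true.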
Hopefully this helps.

You could use a UNION ALL to list the differences between the tables. Grouping by id and state and keeping only the groups where count(*) = 1 leaves the rows that appear in just one of the tables, that is, the rows that don't match:
SELECT id,state
FROM (
SELECT id, state FROM resource
UNION ALL
SELECT id,state FROM region_db
) tbl
GROUP BY id, state
HAVING count(*) = 1
ORDER BY id;
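The UNION ALL result does not say which table an unmatched row came from. A minimal sketch of one way to recover that, assuming id is unique within each table: tag every row with its source before grouping. The source_table column, the sample rows, and the unmatched_rows helper are hypothetical additions, not part of the original query.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource  (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE region_db (id INTEGER PRIMARY KEY, state TEXT);
    INSERT INTO resource  VALUES (1, 'active'), (2, 'stopped'), (3, 'active');
    INSERT INTO region_db VALUES (1, 'active'), (2, 'active'),  (4, 'active');
""")

def unmatched_rows(conn):
    """Return (id, state, source_table) for rows found in only one table."""
    query = """
        SELECT id, state, MIN(src) AS source_table
        FROM (
            SELECT id, state, 'resource'  AS src FROM resource
            UNION ALL
            SELECT id, state, 'region_db' AS src FROM region_db
        ) tbl
        GROUP BY id, state
        HAVING COUNT(*) = 1          -- the (id, state) pair exists in one table only
        ORDER BY id, source_table
    """
    return conn.execute(query).fetchall()

for row in unmatched_rows(conn):
    print(row)
```

An id that shows up twice (once per table) has a changed state; an id that shows up once was added to or deleted from resource, depending on its source_table.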

Related

Join to grab only non-matching records SQL

I have some data which I'm trying to clean in order to run further analysis on. One of the columns in the table is called record_type, this can either be NEW or DEL. This means that initially a NEW record might be added but then a DEL record would come in later to say that particular record is now expired (NEW and DEL records would be matched on the record_id). However both the NEW and DEL record would stay in the data, it doesn't get deleted.
So what I had planned to do is to create two CTEs, one for DEL records only and one for NEW records only:
WITH deleted_rec AS(
SELECT *
FROM main_table
WHERE record_type = 'DEL'
)
, new_rec AS(
SELECT *
FROM main_table
WHERE record_type = 'NEW'
)
Then outer join on the record_id column in both the CTEs.
SELECT *
FROM new_rec
FULL OUTER JOIN deleted_rec ON deleted_rec.record_id = new_rec.record_id
The goal was for the output to include only records that have never had a DEL record come in for their record_id. That way I can guarantee that every NEW record in my final table is still active. However, I had forgotten that FULL OUTER JOIN returns everything rather than just the rows that didn't match. Is there a way around this to get my desired output?
I would just use a single query with exists logic:
SELECT *
FROM main_table t1
WHERE record_type = 'NEW' AND
NOT EXISTS (SELECT 1 FROM main_table t2
WHERE t2.record_id = t1.record_id AND t2.record_type = 'DEL');
In plain English, the above query finds all NEW records for which no DEL record exists with the same record_id.
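A small sketch of the NOT EXISTS pattern on made-up data, run through Python's sqlite3 module. The sample rows and the active_record_ids helper are assumptions for illustration; main_table, record_id and record_type come from the question.

```python
import sqlite3

# Hypothetical sample data: records 1 and 3 were later expired by a DEL row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (record_id INTEGER, record_type TEXT);
    INSERT INTO main_table VALUES
        (1, 'NEW'), (1, 'DEL'),
        (2, 'NEW'),
        (3, 'NEW'), (3, 'DEL'),
        (4, 'NEW');
""")

def active_record_ids(conn):
    """Return record_ids of NEW rows that have no matching DEL row."""
    query = """
        SELECT t1.record_id
        FROM main_table t1
        WHERE t1.record_type = 'NEW'
          AND NOT EXISTS (SELECT 1
                          FROM main_table t2
                          WHERE t2.record_id = t1.record_id
                            AND t2.record_type = 'DEL')
        ORDER BY t1.record_id
    """
    return [row[0] for row in conn.execute(query)]

print(active_record_ids(conn))
```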

How to Select table for Join using Case in SQL

I have a table attribute_names in which a column c_type indicates what type of value we have (1, 2, 3, 4, ...), and based on that value I decide which table to join.
So I select from that table first, then JOIN (CASE statement) ON (CASE statement),
but it does not work.
SELECT attribute_names.*,attributes_trans_name.*,
(CASE
WHEN attribute_names.c_type=1
THEN attribute_values_text.c_fk_files_id
WHEN attribute_names.c_type=3
THEN attribute_values_longtext.c_fk_files_id
WHEN attribute_names.c_type=8
THEN attribute_values_file.c_fk_files_id
END) as file_id
From attributes_trans_name,
attribute_names JOIN
(CASE
WHEN attribute_names.c_type=1
THEN attribute_values_text
WHEN attribute_names.c_type=3
THEN attribute_values_longtext
WHEN attribute_names.c_type=8
THEN attribute_values_file
END)
ON
(CASE
WHEN attribute_names.c_type=1
THEN attribute_values_text.c_fk_attribute_names_id
WHEN attribute_names.c_type=3
THEN attribute_values_longtext.c_fk_attribute_names_id
WHEN attribute_names.c_type=8
THEN attribute_values_file.c_fk_attribute_names_id
END) = attribute_names.c_id
WHERE
attribute_names.c_id=attributes_trans_name.c_fk_attribute_names_id
With proper JOIN / LEFT JOIN usage, you can do this in a single query. A left join means: I always want the record from the left side, and the match on the right side is optional. I have adjusted your query to reflect that. I have also rewritten it to use aliases for the table names, which are shorter to read and write than the long table names.
The main table is attribute_names, as its c_id column appears to be the basis of all the joins into each of the other tables. Notice that the indentation helps me follow what is linked to what, rather than listing all the tables in bulk.
With each of the left joins in place, the query will always try to link to the respective table by the foreign key, but as you know your data, only one of them will really have the information you need. So your CASE construct is simplified: if c_type = 1, look at the AVT alias and its column, otherwise the AVLT alias if c_type = 3, and finally AVF if c_type = 8.
SELECT
AN.*,
ATN.*,
CASE WHEN AN.c_type = 1
THEN AVT.c_fk_files_id
WHEN AN.c_type = 3
THEN AVLT.c_fk_files_id
WHEN AN.c_type = 8
THEN AVF.c_fk_files_id END as file_id
From
attribute_names AN
JOIN attributes_trans_name ATN
ON AN.c_id = ATN.c_fk_attribute_names_id
LEFT JOIN attribute_values_text AVT
ON AN.c_id = AVT.c_fk_attribute_names_id
LEFT JOIN attribute_values_longtext AVLT
ON AN.c_id = AVLT.c_fk_attribute_names_id
LEFT JOIN attribute_values_file AVF
ON AN.c_id = AVF.c_fk_attribute_names_id
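To see the LEFT JOIN plus CASE pattern behave, here is a reduced sketch in Python with sqlite3. It leaves out attributes_trans_name for brevity, and the sample rows and the file_ids helper are hypothetical. A c_type with no matching branch (5 in this data) simply yields NULL for file_id.

```python
import sqlite3

# Hypothetical, reduced schema: only the columns the CASE expression needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute_names (c_id INTEGER PRIMARY KEY, c_type INTEGER);
    CREATE TABLE attribute_values_text     (c_fk_attribute_names_id INTEGER, c_fk_files_id INTEGER);
    CREATE TABLE attribute_values_longtext (c_fk_attribute_names_id INTEGER, c_fk_files_id INTEGER);
    CREATE TABLE attribute_values_file     (c_fk_attribute_names_id INTEGER, c_fk_files_id INTEGER);
    INSERT INTO attribute_names VALUES (10, 1), (11, 3), (12, 8), (13, 5);
    INSERT INTO attribute_values_text     VALUES (10, 100);
    INSERT INTO attribute_values_longtext VALUES (11, 200);
    INSERT INTO attribute_values_file     VALUES (12, 300);
""")

def file_ids(conn):
    """Return (c_id, file_id), taking file_id from the table selected by c_type."""
    query = """
        SELECT an.c_id,
               CASE an.c_type
                   WHEN 1 THEN avt.c_fk_files_id
                   WHEN 3 THEN avlt.c_fk_files_id
                   WHEN 8 THEN avf.c_fk_files_id
               END AS file_id
        FROM attribute_names an
        LEFT JOIN attribute_values_text avt
               ON an.c_id = avt.c_fk_attribute_names_id
        LEFT JOIN attribute_values_longtext avlt
               ON an.c_id = avlt.c_fk_attribute_names_id
        LEFT JOIN attribute_values_file avf
               ON an.c_id = avf.c_fk_attribute_names_id
        ORDER BY an.c_id
    """
    return conn.execute(query).fetchall()

print(file_ids(conn))
```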

SQL Server group by foreign key and select dependent columns

I have some performance issues when querying in SQL Server. I need to GROUP BY a foreign key (academic_unit_id), but I also need to select a column that is dependent on the FK (academic_unit_name).
In SQL Server I can't just select academic_unit_name in the same query, it must be aggregated or in the GROUP BY.
I think the options I have are:
SELECT academic_unit_id (foreign key) and academic_unit_name (dependent on the FK), and then group by both
SELECT
ria.COD_DOCENTE_SCD,
ria.COD_CURSO_SECCION_SCD,
ria.COD_ITEM_SCD,
ria.COD_GRUPO_PREGUNTA_SCD,
alumno.IDN_UNIDAD_ACADEM_SCD, -- id
alumno.NOM_UNIDAD_ACADEM_SCD, -- name
ROUND(COUNT(case when opcion.punto = 1 then 1 end), 2) as amount_yes,
ROUND(COUNT(case when opcion.punto = 0 then 1 end), 2) as amount_no
FROM BANNER_ENCUESTA.R_RESP_ITEM_ALUMNO_CSP_SCD AS ria INNER JOIN
BANNER_ENCUESTA.TIPO_OPCION_SCD AS opcion ON ria.COD_TIPO_OPCION_SCD = opcion.COD_TIPO_OPCION_SCD INNER JOIN
BANNER_ENCUESTA.ALUMNO_SCD AS alumno ON ria.COD_ALUMNO_SCD = alumno.COD_ALUMNO_SCD
GROUP BY
ria.COD_DOCENTE_SCD,
ria.COD_CURSO_SECCION_SCD,
ria.COD_ITEM_SCD,
ria.COD_GRUPO_PREGUNTA_SCD,
alumno.IDN_UNIDAD_ACADEM_SCD, -- group by FK
alumno.NOM_UNIDAD_ACADEM_SCD -- group by name
GROUP BY PK and aggregate the academic_unit_name. I can aggregate using max, since all names are equals for a given id.
SELECT
ria.COD_DOCENTE_SCD,
ria.COD_CURSO_SECCION_SCD,
ria.COD_ITEM_SCD,
ria.COD_GRUPO_PREGUNTA_SCD,
alumno.IDN_UNIDAD_ACADEM_SCD, -- id
MAX(alumno.NOM_UNIDAD_ACADEM_SCD), -- aggregate name
ROUND(COUNT(case when opcion.punto = 1 then 1 end), 2) as amount_yes,
ROUND(COUNT(case when opcion.punto = 0 then 1 end), 2) as amount_no
FROM BANNER_ENCUESTA.R_RESP_ITEM_ALUMNO_CSP_SCD AS ria INNER JOIN
BANNER_ENCUESTA.TIPO_OPCION_SCD AS opcion ON ria.COD_TIPO_OPCION_SCD = opcion.COD_TIPO_OPCION_SCD INNER JOIN
BANNER_ENCUESTA.ALUMNO_SCD AS alumno ON ria.COD_ALUMNO_SCD = alumno.COD_ALUMNO_SCD
GROUP BY
ria.COD_DOCENTE_SCD,
ria.COD_CURSO_SECCION_SCD,
ria.COD_ITEM_SCD,
ria.COD_GRUPO_PREGUNTA_SCD,
alumno.IDN_UNIDAD_ACADEM_SCD --Group by FK
SELECT only academic_unit_id and then JOIN with AcademicUnit again to obtain the name.
with banner_questions as (
SELECT
ria.COD_DOCENTE_SCD,
ria.COD_CURSO_SECCION_SCD,
ria.COD_ITEM_SCD,
ria.COD_GRUPO_PREGUNTA_SCD,
alumno.IDN_UNIDAD_ACADEM_SCD, -- id
ROUND(COUNT(case when opcion.punto = 1 then 1 end), 2) as amount_yes,
ROUND(COUNT(case when opcion.punto = 0 then 1 end), 2) as amount_no
FROM BANNER_ENCUESTA.R_RESP_ITEM_ALUMNO_CSP_SCD AS ria INNER JOIN
BANNER_ENCUESTA.TIPO_OPCION_SCD AS opcion ON ria.COD_TIPO_OPCION_SCD = opcion.COD_TIPO_OPCION_SCD INNER JOIN
BANNER_ENCUESTA.ALUMNO_SCD AS alumno ON ria.COD_ALUMNO_SCD = alumno.COD_ALUMNO_SCD
GROUP BY
ria.COD_DOCENTE_SCD,
ria.COD_CURSO_SECCION_SCD,
ria.COD_ITEM_SCD,
ria.COD_GRUPO_PREGUNTA_SCD,
alumno.IDN_UNIDAD_ACADEM_SCD) -- group by FK
SELECT
banner_questions.*,
student_ua.name -- Join with name
from NORMALIZADO_PRELIMINAR.AcademicUnit as student_ua INNER JOIN
banner_questions on student_ua.id = banner_questions.IDN_UNIDAD_ACADEM_SCD
In terms of performance, I'd like to know if one of these alternatives is better and under what assumptions. Also, I'd like to know if there are better choices to get the same result.
In the question I think you mean Foreign key rather than Primary key... the field is a Primary key in another table Academic_unit but is looking at, say, student_unit records which have an FK to Academic_unit.
So the question is for the field alumno.NOM_UNIDAD_ACADEM_SCD - do you GROUP BY it, MAX() it or JOIN it later?
Personally I suggest just
trying all three and see which ones run the fastest - which is best really depends on specific circumstances - and they often run very similarly
use the simplest version if they run at similar speeds - which is likely to be the GROUP BY version
In particular, the GROUP BY and MAX() should result in almost identical plans as they are sorted the same way.
The 'join it later' approach can have some speed advantages in certain circumstances (particularly when it's not just being joined to a reference table, but to a broader set of sub-queries), but I'm often wary about these. They have the disadvantage of making your code a bit more complex - which can have issues if you use the data for other things, or if SQL Server has bad estimates for the amount of data it expects. In this case, as this is just linking to the reference table alumno, it's unlikely to give any specific advantage.
In your code for option 3 above, you still have links to BANNER_ENCUESTA.ALUMNO_SCD AS alumno. The advantage of doing the join later would be to remove that from the initial grouping component, then link to it later to get the specific values e.g.,
In the GROUP BY within the CTE, also group by ria.COD_ALUMNO_SCD, but remove BANNER_ENCUESTA.ALUMNO_SCD AS alumno from the FROM clause
Put BANNER_ENCUESTA.ALUMNO_SCD AS alumno into the main SELECT part of the query, and join to banner_questions on that field
Note there is also a fourth option (temporary tables) which is used when
SQL Server gets estimates for how many rows it expects really wrong - and makes a really bad plan
You're joining not to reference tables, but to views (particularly if they have 'TOP' expressions or 'GROUP BY' in them) - in these cases, SQL Server may sometimes run the view completely once for every row in the join.
In these cases, it can be useful to split the query into two parts along the lines of #3, but instead of a CTE, you save it into a temporary table e.g., SELECT .... INTO #temp FROM ... GROUP BY.
You then use the temporary table, joined to the view that was problematic, and it will often run better.
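The temporary-table split can be sketched as follows. This is only an illustration of the query shape, not a performance measurement: it runs on SQLite through Python's sqlite3 module, where CREATE TEMP TABLE ... AS SELECT plays the role of SQL Server's SELECT ... INTO #temp. The simplified response and academic_unit tables, their rows, and the counts_with_names helper are all assumptions.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE academic_unit (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE response (unit_id INTEGER, punto INTEGER);
    INSERT INTO academic_unit VALUES (1, 'Engineering'), (2, 'Law');
    INSERT INTO response VALUES (1, 1), (1, 1), (1, 0), (2, 0);
""")

def counts_with_names(conn):
    """Aggregate by the foreign key into a temp table, then join for the name."""
    conn.execute("DROP TABLE IF EXISTS unit_counts")
    # Step 1: group by the key only and materialise the result.
    conn.execute("""
        CREATE TEMP TABLE unit_counts AS
        SELECT unit_id,
               COUNT(CASE WHEN punto = 1 THEN 1 END) AS amount_yes,
               COUNT(CASE WHEN punto = 0 THEN 1 END) AS amount_no
        FROM response
        GROUP BY unit_id
    """)
    # Step 2: join the small aggregated table to the reference table.
    return conn.execute("""
        SELECT au.name, uc.amount_yes, uc.amount_no
        FROM unit_counts uc
        INNER JOIN academic_unit au ON au.id = uc.unit_id
        ORDER BY au.id
    """).fetchall()

rows = counts_with_names(conn)
print(rows)
```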

How does SQL Server Update rows with more than one value?

In an update statement for a temp table, how does SQL Server decide which value to use when there are multiple values returned, for example:
UPDATE A
SET A.dte_start_date = table1.dte_start_date
FROM #temp_table A
INNER JOIN table1 ON A.id = table1.id
In this situation the problem is that more than one dte_start_date is returned for each id value in the temp table. There's no index or unique value in the tables I'm working on, so I need to know how SQL Server will choose between the different values.
It is non-deterministic. See the following example for a better understanding. Though it is not exactly the same scenario as the one explained here, it is pretty similar.
When a single value is to be retrieved from the database, use the SET statement with a query to set the value. For example:
SET @v_user_user_id = (SELECT u.user_id FROM users u WHERE u.login = @v_login);
Reason: Unlike Oracle, SQL Server does not raise an error if more than one row is returned from a SELECT query that is used to populate variables. The above query will throw an exception if the subquery returns more than one row, whereas the following will not throw an exception, and the variable will contain an arbitrary value from the queried table(s).
SELECT @v_user_user_id = u.user_id FROM users u WHERE u.login = @v_login;
It is non-deterministic which value is used if you have a one-to-many relationship.
In SQL Server (2005 and later) I would use a CTE, since it's a readable way to specify what I want using ROW_NUMBER. Another advantage of a CTE is that you can easily change it to do a SELECT instead of an UPDATE (or DELETE) to see what will happen.
Assuming that you want the latest record (according to dte_start_date) for every id:
WITH CTE AS
(
SELECT table1.id, table1.dte_start_date,
rn = ROW_NUMBER() OVER (PARTITION BY table1.id
ORDER BY table1.dte_start_date DESC) -- rn = 1 is the latest date per id
FROM table1
)
UPDATE A
SET A.dte_start_date = CTE.dte_start_date
FROM #temp_table A INNER JOIN CTE ON A.id = CTE.id
WHERE CTE.rn = 1
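The same "latest value per id" rule can also be written without ROW_NUMBER. The sketch below swaps in a correlated MAX subquery, which is a different technique from the CTE above but gives the same deterministic choice when only the latest date is needed. It runs on SQLite through Python's sqlite3 module; the table temp_table stands in for #temp_table, and the sample rows and the apply_latest_dates helper are hypothetical.

```python
import sqlite3

# Hypothetical sample data; table1 has two candidate dates for id 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_table (id INTEGER, dte_start_date TEXT);
    CREATE TABLE table1     (id INTEGER, dte_start_date TEXT);
    INSERT INTO temp_table VALUES (1, NULL), (2, NULL), (3, '2000-01-01');
    INSERT INTO table1 VALUES (1, '2020-01-01'), (1, '2021-06-15'), (2, '2019-03-03');
""")

def apply_latest_dates(conn):
    """Set each row's date to the latest matching date in table1."""
    conn.execute("""
        UPDATE temp_table
        SET dte_start_date = (SELECT MAX(t.dte_start_date)
                              FROM table1 t
                              WHERE t.id = temp_table.id)
        -- Leave rows without any match in table1 untouched.
        WHERE EXISTS (SELECT 1 FROM table1 t WHERE t.id = temp_table.id)
    """)
    return conn.execute(
        "SELECT id, dte_start_date FROM temp_table ORDER BY id"
    ).fetchall()

print(apply_latest_dates(conn))
```

The dates are stored as ISO 8601 strings here, so MAX on text picks the most recent one.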

SQL Query to retrieve data while excluding a set of rows

I have basically four tables (SQL Server):
Objects:
id
ObjectName
Components
id
ComponentName
ObjectsDetails:
ObjectID
ComponentID
ExclusionTable
id
ComponentID
Basically, these tables describe Objects and what Objects are made of (what components)
For example, Object "A" may be made out of component "A" and component "B".
In this case, the tables would be populated this way:
Objects:
id ObjectName
1 A
Components:
id ComponentName
1 A
2 B
ObjectsDetails:
ObjectID ComponentID
1 1
1 2
Now, "ExclusionTable" may have a list of components that are to be excluded from a search (therefore, excluding entire objects if the object is made out of at least one of those components).
For example, I would like to ask:
"Give me all the Objects that are not made out of components A and B".
Therefore, my question is:
Is there a way to write a query for that ? No views, no stored procedures please.. my SQL engine does not support that.
I tried something like:
SELECT DISTINCT ObjectName FROM Objects INNER JOIN ObjectsDetails ON Objects.id =
ObjectsDetails.ObjectID WHERE ObjectsDetails.ComponentID NOT IN (1,2)
in case ExclusionTable tells us that Components A and B needs to be excluded.
Of course, that doesn't work...
I tried a few variations using WHERE NOT EXISTS (SELECT * FROM ExclusionTable) but I am not proficient enough in SQL to understand how to get it to work using one query only (if it is even possible).
Thanks!
You should avoid doing queries with [not] in (select ...)
SELECT DISTINCT ObjectName
FROM Objects
INNER JOIN ObjectsDetails ON Objects.id = ObjectsDetails.ObjectID
LEFT JOIN ExclusionTable on ExclusionTable.ComponentId = ObjectsDetails.ComponentID
where ExclusionTable.ComponentId is null;
This will retrieve only rows for which the ComponentID is not in ExclusionTable.
Update:
SELECT ObjectName
FROM Objects
INNER JOIN ObjectsDetails ON Objects.id = ObjectsDetails.ObjectID
LEFT JOIN ExclusionTable on ExclusionTable.ComponentId = ObjectsDetails.ComponentID
group by ObjectName
having count(distinct ObjectsDetails.ComponentID) = sum(case when ExclusionTable.id is null then 1 else 0 end)
New approach: I think the only other way to do it is to compare the number of components per object with the number of that object's components that are not on the exclusion list. When these numbers are equal, no component is on the excluded list and we can show the object.
I'm sorry I can't make a test right now, please use EXPLAIN select ... to compare the queries, if they work.
Basically, if you need to get all objects not made from A or B, you need to get all objects EXCEPT those made from A or B.
SELECT DISTINCT Id, ObjectName
FROM Objects
WHERE Id NOT IN (
SELECT DISTINCT ObjectsDetails.ObjectID
FROM ObjectsDetails
INNER JOIN Components ON ObjectsDetails.ComponentID = Components.Id
WHERE Components.ComponentName = 'A' OR Components.ComponentName = 'B'
)
Would that be what you're looking for?
EDIT: Of course, you can omit the join if you already have the component ids - then just put those in the where clause to filter them out.
select Objects.id, Objects.ObjectName
from Objects
left outer join
( select od.ObjectID from ObjectsDetails od inner join ExclusionTable et
on od.ComponentID = et.ComponentID) excludedid
on Objects.id = excludedid.ObjectID
where excludedid.ObjectID is null -- keep only objects with no excluded component
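For comparison, here is a sketch of the same exclusion written with NOT EXISTS, driven by ExclusionTable rather than hard-coded component ids. It runs on SQLite through Python's sqlite3 module. The sample rows and the allowed_objects helper are assumptions for illustration; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data: components 1 and 2 are on the exclusion list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Objects        (id INTEGER PRIMARY KEY, ObjectName TEXT);
    CREATE TABLE ObjectsDetails (ObjectID INTEGER, ComponentID INTEGER);
    CREATE TABLE ExclusionTable (id INTEGER PRIMARY KEY, ComponentID INTEGER);
    INSERT INTO Objects VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO ObjectsDetails VALUES (1, 1), (1, 2), (2, 3), (3, 2), (3, 3);
    INSERT INTO ExclusionTable VALUES (1, 1), (2, 2);
""")

def allowed_objects(conn):
    """Return names of objects that contain no excluded component."""
    query = """
        SELECT o.ObjectName
        FROM Objects o
        WHERE NOT EXISTS (SELECT 1
                          FROM ObjectsDetails od
                          INNER JOIN ExclusionTable et
                                  ON et.ComponentID = od.ComponentID
                          WHERE od.ObjectID = o.id)
        ORDER BY o.ObjectName
    """
    return [row[0] for row in conn.execute(query)]

print(allowed_objects(conn))
```

Object C is dropped even though one of its components (3) is allowed, because it also contains the excluded component 2.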