Updating a column with maximum matches from another table

Updating a column with maximum matches from another table - sql

I have two tables let’s say A & B and would like to update the column of Status in table A with maximum matches from the column Scores in table B by comparing the 'Topics' of two tables.
I am using the script shown here, but it's taking a really long time so I'd appreciate it if somebody could provide an alternative / better and faster option/script
UPDATE tableA
SET status = (SELECT max(scores)
FROM tableB
WHERE tableB.topics = tableA.topics)

Try creating proper indexes for each column involved and you should be fine, e.g:
CREATE INDEX idx_tableb_topics_scores ON tableb (topics,scores);
An alternative to your query is to apply the aggregate function max() in a way that it only has to be executed once, but I doubt it will speed things up:
UPDATE tablea a SET status = j.max_scores
FROM (SELECT a.topics,max(b.scores) AS max_scores
FROM tablea a
JOIN tableb b ON a.topics = b.topics
GROUP BY a.topics) j
WHERE a.topics = j.topics;

For this query:
UPDATE tableA
SET status = (SELECT max(scores)
FROM tableB
WHERE tableB.topics = tableA.topics
);
The only index you need is on tableB(topics, scores).
If you like, you can rewrite this as an aggregation, which looks like this:
UPDATE tableA
SET status = b.max_scores
FROM (SELECT b.topic, MAX(scores) as max_scores
FROM tableB b
GROUP BY b.topic
) b
WHERE b.topic = a.topic;
Note that this is subtly different from your query. If there are topics in A that are not in B, then this will not update those rows. I do not know if that is desirable.
If many rows in A have the same topic, then pre-aggregating could be significantly faster than the correlated subquery.

Related

Compare 2 Tables and write RowID from Table A to Table B

I have Table A with RowID, Date, Vendor and Cost, Confirmed
I have table B with RowID, Date, Vendor and Cost, Confirmed
Table A list our purchases.
Table B list the statement data from the credit card.
I would like to compare Date, Vendor and Cost in Table A with the same columns in Table B. If there is a match with those three columns, then I would like to take the RowID value from Table A and write it to the matching row in Table B under the Confirmation column.
I am very new to SQL and I am not even sure this is a reasonable expectation.
What do you think?
Is this enough detail to provide your opinion?
Thank you for any help you can give.
Currently I am using an Outer Right Join to get all the rows that do NOT have a match. What I really need is the opposite.

It might help to know what database engine you are using...
My answers are going to relate to MS SQL Server, but much SQL Syntax is the same...
To answer your first question, I would write something like:
Update TableB Set Confirmation = TableA.RowID From TableA
Where TableA.Vendor = TableB.Vendor
And TableA.Cost = TableB.Cost
And TableA.Date = TableB.Date
I would use aliases, but I left them out to hopefully make it easier to understand.
To answer your second question, you can specify an INNER JOIN which is the opposite of an OUTER JOIN and as you mentioned, what you are looking for, as it will return ALL rows that Match and exclude the rest.

You can do this with this query
UPDATE
B
SET
Confirmation = A.RowID
FROM
TableA A
INNER JOIN TableB B
ON B.Vendor = A.Vendor
AND B.Cost = A.Cost
AND B.Date = A.Date
Basically we do an inner join to keep the intersection (the records that match) of the two tables. and update the records that coincided from table B with the id of table A

updating column when equi join returns multiple rows

I have a query like this:
update TABLEA set x1 = b.x2, y1 = b.y2
from TABLEB b
where pid = b.id
however, ID is not unique in TABLEB and so can return 0, 1 or more rows. It seems to be working, and I'm not actually that bothered which row I get the data from as the x2/y2 values I want are always the same for the same ID.
Is the above ok and will it just update from one row, or should I be modifying the sql to ensure only one row is returned? For future reference, even if it is ok, what is the query to only return the first row it finds and is it just as efficient as the above?
thanks.

Your code is fine, but it is going to update using an indeterminate value.
If this concerns you, you can use apply:
update a
set x1 = b.x2,
y1 = b.y2
from tablea a cross apply
(select top (1) b.*
from tableb b
where a.pid = b.id
) b
By putting an order by in the subquery, you can control which value gets updated.

Determine datatypes of columns - SQL selection

Is it possible to determine the type of data of each column after a SQL selection, based on received results? I know it is possible though information_schema.columns, but the data I receive comes from multiple tables and is joint together and the data is renamed. Besides that, I'm not able to see or use this query or execute other queries myself.
My job is to store this received data in another table, but without knowing beforehand what I will receive. I'm obviously able to check for example if a certain column contains numbers or text, but not if it is originally stored as a TINYINT(1) or a BIGINT(128). How to approach this? To clarify, it is alright if the data-types of the columns of the source and destination aren't entirely the same, but I don't want to reserve too much space beforehand (or too less for that matter).
As I'm typing, I realize I'm formulation the question wrong. What would be the best approach to handle described situation? I thought about altering tables on the run (e.g. increasing size if needed), but that seems a bit, well, wrong and not the proper way.
Thanks

Can you issue the following query about your new table after you create it?
SELECT *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'JoinedQueryResults'
Is the query too big to run before knowing how big the results will be? Get a idea of how many rows it may return, but the trick with queries with joins is to group on the columns you are joining on, to help your estimate return more quickly. Here's of an example of just returning a row count from the query above which would have created the JoinedQueryResults table above.
SELECT SUM(A.NumRows * B.NumRows)
FROM (SELECT ID, COUNT(*) AS NumRows
FROM TableA
GROUP BY ID) AS A
INNER JOIN (SELECT ID, COUNT(*) AS NumRows
FROM TableB
GROUP BY ID) AS B ON A.ID = B.ID
The query above will run faster if all you need is a record count to help you estimate a size.
Also try instantiating a table for your results with a query like this.
SELECT TOP 0 *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?

You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
and (common_id, seq_3c) in
(
select common_id, seq_3c
from table_a
where checklist_status = 'I'
and admin_function = 'ADMA'
and checklist_cd = 'APPL'
)
and common_id in
(
select EMPLID
from table_c
where admit_term = '2171'
and institution = 'SOMEWHERE'
);

SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'

This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the `id' column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?

It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C, on matching ID. Because all ID's are, in fact, identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.

How to efficiently retrieve data in one to many relationships

I am running into an issue where I have a need to run a Query which should get some rows from a main table, and have an indicator if the key of the main table exists in a subtable (relation one to many).
The query might be something like this:
select a.index, (select count(1) from second_table b where a.index = b.index)
from first_table a;
This way I would get the result I want (0 = no depending records in second_table, else there are), but I'm running a subquery for each record I get from the database. I need to get such an indicator for at least three similar tables, and the main query is already some inner join between at least two tables...
My question is if there is some really efficient way to handle this. I have thought of keeping record in a new column the "first_table", but the dbadmin don't allow triggers and keeping track of it by code is too risky.
What would be a nice approach to solve this?
The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list. If no row in the second table exists, I won't turn on this indicator.
To search for all rows in first_table which have at least one row in second_table, or which don't have rows in the second table.
Another option I just found:
select a.index, b.index
from first_table a
left join (select distinct(index) as index from second_table) b on a.index = b.index
This way I will get null for b.index if it doesn' exist (display can finally be adapted, I'm concerned on query performance here).
The final objective of this question is to find a proper design approach for this kind of case. It happens often, a real application culd be a POS system to show all clients and have one icon in the list as an indicator wether the client has open orders.

Try using EXISTS, I suppose, for such case it might be better then joining tables. On my oracle db it's giving slightly better execution time then the sample query, but this may be db-specific.
SELECT first_table.ID, CASE WHEN EXISTS (SELECT * FROM second_table WHERE first_table.ID = second_table.ID) THEN 1 ELSE 0 END FROM first_table

why not try this one
select a.index,count(b.[table id])
from first_table a
left join second_table b
on a.index = b.index
group by a.index

Two ideas: one that doesn't involve changing your tables and one that does. First the one that uses your existing tables:
SELECT
a.index,
b.index IS NOT NULL,
c.index IS NOT NULL
FROM
a_table a
LEFT JOIN
b_table b ON b.index = a.index
LEFT JOIN
c_table c ON c.index = a.index
GROUP BY
a.index, b.index, c.index
Worth noting that this query (and likely any that resemble it) will be greatly helped if b_table.index and c_table.index are either primary keys or are otherwise indexed.
Now the other idea. If you can, instead of inserting a row into b_table or c_table to indicate something about the corresponding row in a_table, indicate it directly on the a_table row. Add exists_in_b_table and exists_in_c_table columns to a_table. Whenever you insert a row into b_table, set a_table.exists_in_b_table = true for the corresponding row in a_table. Deletes are more work since in order to update the a_table row you have to check if there are any rows in b_table other than the one you just deleted with the same index. If deletes are infrequent, though, this could be acceptable.

Or you can avoid join altogether.
WITH comb AS (
SELECT index
, 'N' as exist_ind
FROM first_table
UNION ALL
SELECT DISTINCT
index
, 'Y' as exist_ind
FROM second_table
)
SELECT index
, MAX(exist_ind) exist_ind
FROM comb
GROUP BY index

The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list.
To search for all rows in first_table which have at least one row in second_table.
Here you go:
SELECT a.index, 1 as c_check -- 1: at least one row in second_table exists for a given row in first_table
FROM first_table a
WHERE EXISTS
(
SELECT 1
FROM second_table b
WHERE a.index = b.index
);

I am assuming that you can't change the table definitions, e.g. partitioning the columns.
Now, to get a good performance you need to take into account other tables which are getting joined to your main table.
It all depends on data demographics.
If the other joins will collapse the rows by high factor, you should consider doing a join between your first table and second table. This will allow the optimizer to pick best join order , i.e, first joining with other tables then the resulting rows joined with your second table gaining the performance.
Otherwise, you can take subquery approach (I'll suggest using exists, may be Mikhail's solution).
Also, you may consider creating a temporary table, if you need such queries more than once in same session.

I am not expert in using case, but will recommend the join...
that works even if you are using three tables or more..
SELECT t1.ID,t2.name, t3.date
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 ON t1.ID = t2.ID
LEFT OUTER JOIN Table3 t3 ON t2.ID = t3.ID
--WHERE t1.ID = #ProductID -- this is optional condition, if want specific ID details..
this will help you fetch the data from Normalized(BCNF) tables.. as they always categorize data with type of nature in separate tables..
I hope this will do...

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas