Deleting records from a table without primary key - sql

I need to delete some specific records from a database table, but the table itself does not have a primary key, so the condition depends on other tables. What is the correct way to do that?
delete from table_1
where exists
  (select distinct tb1.*
   from table_1 tb1, table_2 tb2, table_3 tb3
   where tb1.col = tb2.col
   and tb3.col = tb2.col
   and tb3.col_2 = 10)
Is that the correct way to do it? Let's say table_1 has 4 columns and the first two columns should be the criteria for removal.

If the SELECT version of your query returns the results you want to delete, then you're good. A couple of things, though:
Use the ANSI-compliant explicit join syntax instead of the comma-delimited, implicit syntax (which has long since been deprecated). The explicit syntax looks better and is easier to read anyway.
Correlate your EXISTS back to the main table. And you don't need DISTINCT; EXISTS returns true whether there is 1 matching row or 10 billion.
SELECT *
FROM table_1 tb_1
WHERE EXISTS (SELECT *
              FROM table_2 tb_2
              JOIN table_3 tb_3 ON tb_2.col = tb_3.col
              WHERE tb_1.col = tb_2.col
              AND tb_3.col_2 = 10)
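If that returns the right rows, the corresponding DELETE is the same query with the select list swapped out (a sketch only; the outer table is referenced by name here because alias support in DELETE varies by database):
DELETE FROM table_1
WHERE EXISTS (SELECT *
              FROM table_2 tb_2
              JOIN table_3 tb_3 ON tb_2.col = tb_3.col
              WHERE table_1.col = tb_2.col
              AND tb_3.col_2 = 10)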

Related

Conditional statement execution in SQLite

New to SQL and SQLite, I'm afraid. I need to delete a row from table A, but only if that row isn't referenced in table B. So, I guess I need to do something like this:
delete from tableA
where (col1 = 94) and
(select count(*) from tableB (where col2 = 94) = 0);
What's the right way to do this?
Alternatively, I could just do this from C in two steps, first checking that the row isn't referenced, and then deleting it. Would this be better? I'm hesitant to do this because I would need to put an sqlite3_mutex around several steps in the C code, which might be less efficient than executing a single more complex statement. Thanks.
Your method is pretty close. The parentheses are in the wrong place:
delete from tableA
where (col1 = 94) and
(select count(*) from tableB where col2 = 94) = 0;
For instance, SQL doesn't allow a paren before the where.
I would, however, suggest that you learn about foreign keys. These are a useful construct in SQL; see the SQLite documentation on them.
A foreign key constraint would have the database do this check automatically whenever a row is being deleted from tableA. The delete would return an error, but it would make sense -- something like "you can't delete this row because another row references it".
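A minimal sketch of that in SQLite (names follow your example; I'm assuming col1 is tableA's key, and note that SQLite only enforces foreign keys when the pragma is on):
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when this is enabled

CREATE TABLE tableA (col1 INTEGER PRIMARY KEY);
CREATE TABLE tableB (
    col2 INTEGER REFERENCES tableA (col1)  -- each tableB row must point at an existing tableA row
);

-- With the constraint in place, this fails with "FOREIGN KEY constraint failed"
-- if any tableB row still references 94:
DELETE FROM tableA WHERE col1 = 94;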
delete tableA
from tableA
left join tableB on tableA.col1 = tableB.col2
where tableB.col2 is null
and tableA.col1 = 94
You can left join the two tables and delete only those where the link could not be established between them (tableB.col2 is null).
Alternatively you can do
delete from tableA
where col1 = 94
and not exists
(
select 1 from tableB where col2 = 94
)

Number of records doesn't match when joining three tables

Despite going through every resource I could possibly find on the internet, I haven't been able to solve this issue myself. I am new to MS Access and would really appreciate any pointers.
Here's my problem - I have three tables
Source1084 with columns - Department, Sub-Dept, Entity, Account, +few more
R12CAOmappingTable with columns - Account, R12_Account
Table4 with columns - R12_Account, Department, Sub-Dept, Entity, New Dept, LOB +few more
I have a total of 1084 records in Source and the result table must also contain 1084 records. I need to draw a table with all the columns from Source + R12_account from R12CAOmappingTable + all columns from Table4.
Here is the query I wrote. It yields the right columns but returns more or fewer records depending on which join options I use.
SELECT rmt.r12_account,
srb.version,
srb.fy,
srb.joblevel,
srb.scenario,
srb.department,
srb.[sub-department],
srb.[job function],
srb.entity,
srb.employee,
table4.lob,
table4.product,
table4.newacct,
table4.newdept,
srb.[beg balance],
srb.jan,
srb.feb,
srb.mar,
srb.apr,
srb.may,
srb.jun,
srb.jul,
srb.aug,
srb.sep,
srb.oct,
srb.nov,
srb.dec,
rmt.r12_account
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
WHERE ( ( ( srb.[sub-department] ) = table4.subdept )
AND ( ( srb.entity ) = table4.entity )
AND ( ( rmt.r12_account ) = table4.r12_account ) );
In this simple example, Table1 contains 3 rows with unique fld1 values. Table2 contains one row, and the fld1 value in that row matches one of those in Table1. Therefore this query returns 3 rows.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
However if I add the WHERE clause as below, that version of the query returns only one row --- the row where the fld1 values match.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1
WHERE t1.fld1 = t2.fld1;
In other words, that WHERE clause counteracts the LEFT JOIN because it excludes rows where t2.fld1 is Null. If that makes sense, notice that the second query is functionally equivalent to this ...
SELECT *
FROM
Table1 AS t1
INNER JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
Your situation is similar. I suggest you first eliminate the WHERE clause and confirm this query returns at least your expected 1084 rows.
SELECT Count(*) AS CountOfRows
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity );
After you get the query returning the correct number of rows, you can alter the SELECT list to return the columns you want. But the columns aren't really the issue until you can get the correct rows.
Without knowing your tables' values, it is hard to give a complete answer to your question. Based on how you described it, the issue causing your problem is more than likely the type of joins you are using.
The best way I have found to understand which type of join you should use is to reference a Venn diagram explaining the different types of joins.
Jeff Atwood also has a really good explanation of SQL joins on his site, using the same method.
Best to just use the query builder. Drop in your main table. Choose the columns you want. Now, for any of the other lookup values, simply drop in the other tables, draw the join line(s), double-click, and use a left join. You can do this for 2 or 30 columns that need to "grab" or look up other values from other tables. The number of ORIGINAL rows in the base table returned should ALWAYS remain the same.
So just use the query builder and follow the above.
The problem with your posted SQL is that you NESTED the joins inside (). Don't do that (or let the query builder do it for you – the result tends to be quite messy but will also work).
Just use this:
FROM source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
As noted, I don't see why you are repeating the conditions in the WHERE clause.

How to auto-increment a value in one table when inserting a row into another table

I currently have two tables:
Table 1 has a unique ID and a count.
Table 2 has some data columns and one column that holds the value of the unique ID from Table 1.
When I insert a row of data into Table 2, the count for the row with the referenced unique ID in Table 1 should be incremented.
Hope I made myself clear. I am very new to PostgreSQL and SQL in general, so I would appreciate any help how to do that. =)
You could achieve that with triggers.
Be sure to cover all kinds of write access appropriately if you do. INSERT, UPDATE, DELETE.
Also be aware that TRUNCATE on Table 2 or manual edits in Table 1 could break data integrity.
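A minimal sketch of the trigger variant in PL/pgSQL, using the same illustrative names as the view below (tbl1(tbl1_id, ct) and tbl2(tbl1_id, ...)):
CREATE OR REPLACE FUNCTION tbl2_maintain_ct() RETURNS trigger AS
$$
BEGIN
   IF TG_OP = 'INSERT' THEN
      UPDATE tbl1 SET ct = ct + 1 WHERE tbl1_id = NEW.tbl1_id;
   ELSIF TG_OP = 'DELETE' THEN
      UPDATE tbl1 SET ct = ct - 1 WHERE tbl1_id = OLD.tbl1_id;
   ELSIF TG_OP = 'UPDATE' AND NEW.tbl1_id IS DISTINCT FROM OLD.tbl1_id THEN
      UPDATE tbl1 SET ct = ct - 1 WHERE tbl1_id = OLD.tbl1_id;
      UPDATE tbl1 SET ct = ct + 1 WHERE tbl1_id = NEW.tbl1_id;
   END IF;
   RETURN NULL;  -- return value is ignored for AFTER row triggers
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER tbl2_maintain_ct
AFTER INSERT OR UPDATE OR DELETE ON tbl2
FOR EACH ROW EXECUTE PROCEDURE tbl2_maintain_ct();
Even with this in place, TRUNCATE on tbl2 or manual edits in tbl1 can still break the counts, as noted above.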
I suggest you consider a VIEW instead to return aggregated results that are automatically up to date. Like:
CREATE VIEW tbl1_plus_ct AS
SELECT t1.*, t2.ct
FROM tbl1 t1
LEFT JOIN (
SELECT tbl1_id, count(*) AS ct
FROM tbl2
GROUP BY 1
) t2 USING (tbl1_id)
If you use a LEFT JOIN, all rows of tbl1 are included, even if there is no reference in tbl2. With a regular JOIN, those rows would be omitted from the VIEW.
For all or much of the table, it is fastest to aggregate tbl2 first in a subquery and then join to tbl1, as demonstrated above.
Instead of creating a view, you could also just use the query directly; and if you only fetch a single row, or only a few, this alternative form would perform better:
SELECT t1.*, count(t2.tbl1_id) AS ct
FROM tbl1 t1
LEFT JOIN tbl2 t2 USING (tbl1_id)
WHERE t1.tbl1_id = 123 -- for example
GROUP BY t1.tbl1_id -- being the primary key of tbl1!

How to efficiently retrieve data in one-to-many relationships

I am running into an issue where I need to run a query that gets some rows from a main table, along with an indicator of whether the key of the main table exists in a subtable (a one-to-many relation).
The query might be something like this:
select a.index, (select count(1) from second_table b where a.index = b.index)
from first_table a;
This way I would get the result I want (0 = no dependent records in second_table; anything else means there are some), but I'm running a subquery for each record I get from the database. I need such an indicator for at least three similar tables, and the main query is already an inner join between at least two tables...
My question is whether there is a really efficient way to handle this. I have thought of keeping a record in a new column in first_table, but the DBA doesn't allow triggers, and keeping track of it in application code is too risky.
What would be a nice approach to solve this?
The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list. If no row in the second table exists, I won't turn on this indicator.
To search for all rows in first_table which have at least one row in second_table, or which don't have rows in the second table.
Another option I just found:
select a.index, b.index
from first_table a
left join (select distinct(index) as index from second_table) b on a.index = b.index
This way I will get null for b.index if it doesn't exist (the display can be adapted afterwards; I'm concerned about query performance here).
The final objective of this question is to find a proper design approach for this kind of case. It happens often; a real application could be a POS system showing all clients, with one icon in the list as an indicator of whether the client has open orders.
Try using EXISTS; I suppose for such a case it might be better than joining tables. On my Oracle DB it gives slightly better execution time than the sample query, but this may be DB-specific.
SELECT first_table.ID,
       CASE WHEN EXISTS (SELECT * FROM second_table WHERE first_table.ID = second_table.ID)
            THEN 1 ELSE 0 END
FROM first_table
Why not try this one:
select a.index,count(b.[table id])
from first_table a
left join second_table b
on a.index = b.index
group by a.index
Two ideas: one that doesn't involve changing your tables and one that does. First the one that uses your existing tables:
SELECT
a.index,
b.index IS NOT NULL,
c.index IS NOT NULL
FROM
a_table a
LEFT JOIN
b_table b ON b.index = a.index
LEFT JOIN
c_table c ON c.index = a.index
GROUP BY
a.index, b.index, c.index
Worth noting that this query (and likely any that resemble it) will be greatly helped if b_table.index and c_table.index are either primary keys or are otherwise indexed.
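For example (hypothetical statements; if index is a reserved word in your database, the column will need quoting or a different name):
CREATE INDEX b_table_index_idx ON b_table (index);
CREATE INDEX c_table_index_idx ON c_table (index);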
Now the other idea. If you can, instead of inserting a row into b_table or c_table to indicate something about the corresponding row in a_table, indicate it directly on the a_table row. Add exists_in_b_table and exists_in_c_table columns to a_table. Whenever you insert a row into b_table, set a_table.exists_in_b_table = true for the corresponding row in a_table. Deletes are more work since in order to update the a_table row you have to check if there are any rows in b_table other than the one you just deleted with the same index. If deletes are infrequent, though, this could be acceptable.
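A rough sketch of that flag-column idea (names taken from the paragraph above; the bind parameters and the boolean type are placeholders, and the maintenance statements would run from the application, since triggers are not an option here):
ALTER TABLE a_table ADD COLUMN exists_in_b_table boolean DEFAULT false;

-- right after inserting a b_table row for a given key:
UPDATE a_table SET exists_in_b_table = true WHERE index = :inserted_index;

-- right after deleting a b_table row, clear the flag only if no other rows remain:
UPDATE a_table
SET exists_in_b_table = false
WHERE index = :deleted_index
  AND NOT EXISTS (SELECT 1 FROM b_table b WHERE b.index = a_table.index);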
Or you can avoid the join altogether.
WITH comb AS (
SELECT index
, 'N' as exist_ind
FROM first_table
UNION ALL
SELECT DISTINCT
index
, 'Y' as exist_ind
FROM second_table
)
SELECT index
, MAX(exist_ind) exist_ind
FROM comb
GROUP BY index
The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list.
To search for all rows in first_table which have at least one row in second_table.
Here you go:
SELECT a.index, 1 as c_check -- 1: at least one row in second_table exists for a given row in first_table
FROM first_table a
WHERE EXISTS
(
SELECT 1
FROM second_table b
WHERE a.index = b.index
);
I am assuming that you can't change the table definitions, e.g. partitioning the columns.
Now, to get good performance you need to take into account the other tables that are joined to your main table.
It all depends on the data demographics.
If the other joins collapse the rows by a high factor, you should consider doing a join between your first table and second table. This will allow the optimizer to pick the best join order, i.e., first joining with the other tables and then joining the resulting rows with your second table, gaining performance.
Otherwise, you can take the subquery approach (I'd suggest using EXISTS, perhaps Mikhail's solution).
Also, you may consider creating a temporary table if you need such queries more than once in the same session.
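For instance (a PostgreSQL-style sketch with the thread's illustrative names; temporary table syntax differs between databases):
CREATE TEMPORARY TABLE first_table_flags AS
SELECT a.index,
       CASE WHEN EXISTS (SELECT 1 FROM second_table b WHERE b.index = a.index)
            THEN 1 ELSE 0 END AS has_second
FROM first_table a;

-- later queries in the same session reuse the cached indicator:
SELECT f.index, f.has_second
FROM first_table_flags f
WHERE f.has_second = 1;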
I am not an expert in using CASE, but I recommend the join...
It works even if you are using three tables or more.
SELECT t1.ID,t2.name, t3.date
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 ON t1.ID = t2.ID
LEFT OUTER JOIN Table3 t3 ON t2.ID = t3.ID
--WHERE t1.ID = #ProductID -- this is optional condition, if want specific ID details..
This will help you fetch the data from normalized (BCNF) tables, as they always categorize data by its nature into separate tables.
I hope this will do.

Find difference between two big tables in PostgreSQL

I have two similar tables in Postgres with just one 32-byte latin field (a simple md5 hash).
Both tables have ~30,000,000 rows. The tables differ only slightly (10-1000 rows are different).
Is it possible with Postgres to find the difference between these tables? The result should be the 10-1000 differing rows described above.
This is not a real task, I just want to know about how PostgreSQL deals with JOIN-like logic.
EXISTS seems like the best option.
tbl1 is the table with surplus rows in this example:
SELECT *
FROM tbl1
WHERE NOT EXISTS (SELECT FROM tbl2 WHERE tbl2.col = tbl1.col);
If you don't know which table has surplus rows or both have, you can either repeat the above query after switching table names, or:
SELECT *
FROM tbl1
FULL OUTER JOIN tbl2 USING (col)
WHERE tbl2.col IS NULL OR
      tbl1.col IS NULL;
An overview of basic techniques in a related post:
Select rows which are not present in other table
Aside: The data type uuid is efficient for md5 hashes:
Convert hex in text representation to decimal number
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
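For example (illustrative only; md5() returns 32 hex characters, which cast cleanly to uuid):
SELECT md5('some value')::uuid;
-- one-time conversion of an existing text column (hypothetical table / column names):
ALTER TABLE tbl1 ALTER COLUMN col TYPE uuid USING col::uuid;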
To augment the existing answers, I use the row() function for the join condition. This allows you to compare entire rows. E.g., my typical query to see the symmetric difference looks like this:
select *
from tbl1
full outer join tbl2
on row(tbl1) = row(tbl2)
where tbl1.col is null
or tbl2.col is null
If you want to find the difference without knowing which table has more rows than the other, you can try this option, which returns the rows present in only one of the two tables:
SELECT * FROM A
WHERE NOT EXISTS (SELECT * FROM B WHERE B.col = A.col)
UNION
SELECT * FROM B
WHERE NOT EXISTS (SELECT * FROM A WHERE A.col = B.col)
In my experience, NOT IN with a subquery takes a very long time. I'd do it with an inclusive join:
DELETE FROM table1 WHERE ID IN (
    SELECT table1.ID
    FROM table1
    LEFT OUTER JOIN table2 ON table1.hashfield = table2.hashfield
    WHERE table2.hashfield IS NULL)
And then do the same the other way around for the other table.