SQL design: finding rows that are not on a different table - sql

Simplified situation:
I have a table, let's call it header with, say, a primary key header_ID, and some other attributes
I will be needing to flag those rows with different filters. I can't know in advance how many, but for most rows it will be between 0 to 10.
So, I thought of creating another table (let's call it filter) with attributes h_ID and filter_ID. So, if this table has the following it would mean that header_ID=2 was flagged with the filter 1 and 3, and header_ID=3 with filter 1
2 1
2 3
3 1
Then, another table with filter_ID and an attribute with some text describing the filter.
The basic query that I see myself doing often is, for example, "get certain rows from header that don't have any filters". Is this a design that makes these types of queries efficient? If so, how?
I was thinking about perhaps adding a filter_ID column to header, and replacing header_ID column with a filter_ID primary key, that will have value NULL in header when no filters have been assigned. Is that a better idea?
I appreciate your input

You can do a left join and then filter to only get rows with null values in the left table.
SELECT [cols]
FROM header
LEFT JOIN filters
ON header_id=h_id
WHERE [filter col] IS NULL
By the descriptions of your tables, it would be a pretty bad idea to have a filter column in the header table. It would lead to some pretty inefficient queries.

Related

How to get the differences between two - kind of - duplicated tables (sql)

Prolog:
I have two tables in two different databases, one is an updated version of the other. For example we could imagine that one year ago I duplicated table 1 in the new db (say, table 2), and from then I started working on table 2 never updating table 1.
I would like to compare the two tables, to get the differences that have grown in this period of time (the tables has preserved the structure, so that comparison has meaning)
My way of proceeding was to create a third table, in which I would like to copy both table 1 and table 2, and then count the number of repetitions of every entry.
In my opinion, this, added to a new attribute that specifies for every entry the table where he cames from would do the job.
Problem:
Copying the two tables into the third table I get the (obvious) error to have two duplicate key values in a unique or primary key costraint.
How could I bypass the error or how could do the same job better? Any idea is appreciated
Something like this should do what you want if A and B have the same structure, otherwise just select and rename the columns you want to confront....
SELECT
*
FROM
B
WHERE NOT EXISTS (SELECT * FROM A)
if NOT EXISTS doesn't work in your DBMS you could also use a left outer join comparing the rows columns values.
SELECT
A.*
from
A left outer join B
on A.col = B.col and ....

SQL 2 JOINS USING SINGLE REFERENCE TABLE

I'm trying to achieve 2 joins. If I run the 1st join alone it pulls 4 lots of results, which is correct. However when I add the 2nd join which queries the same reference table using the results from the select statement it pulls in additional results. Please see attached. The squared section should not be being returned
So I removed the 2nd join to try and explain better. See pic2. I'm trying to get another column which looks up InvolvedInternalID against the initial reference table IRIS.Practice.idvClient.
Your database is simply doing as you tell it. When you add in the second join (confusingly aliased as tb1 in a 3 table query) the database is finding matching rows that obey the predicate/truth statement in the ON part of the join
If you don't want those rows in there then one of two things must be the case:
1) The truth you specified in the ON clause is faulty; for example saying SELECT * FROM person INNER JOIN shoes ON person.age = shoes.size is faulty - two people with age 13 and two shoes with size 13 will produce 4 results, and shoe size has nothing to do with age anyway
2) There were rows in the table joined in that didn't apply to the results you were looking for, but you forgot to filter them out by putting some WHERE (or additional restriction in the ON) clause. Example, a table holds all historical data as well as current, and the current record is the one with a NULL in the DeletedOn column. If you forget to say WHERE deletedon IS NULL then your data will multiply as all the past rows that don't apply to your query are brought in
Don't alias tables with tbX, tbY etc.. Make the names meaningful! Not only do aliases like tbX have no relation to the original table name (so you encounter tbX, and then have to go searching the rest of the query to find where it's declared so you can say "ah, it's the addresses table") but in this case you join idvclient in twice, but give them unhelpful aliases like tb1, tb3 when really you should have aliased them with something that describes the relationship between them and the rest of the query tables
For example, ParentClient and SubClient or OriginatingClient/HandlingClient would be better names, if these tables are in some relationship with each other.
Whatever the purpose of joining this table in twice is, alias it in relation to the purpose. It may make what you've done wriong easier to spot, for example "oh, of course.. i'm missing a WHERE parentclient.type = 'parent'" (or WHERE handlingclient.handlingdate is not null etc..)
The first step to wisdom is by calling things their proper names

Reproduce certain rows in PostgreSQL

I have a table with IDs that represent Individuals (call it id_table) and another table with characteristics an individual can have (call it char_table).
Now I want to add a column to the id_table containing certain values from the char_table. The difficulty, for me, is, that some ID shall get only one characteristic (simple case) and some ID shall get several characteristics. For that reason the rows with these ID that get several characteristics must be reproduced so often until I have matched every characteristic to that ID.
For Example:
ID '001' shall get the characteristics 'a', 'b', and 'c'. So the row ID='001' must be reproduced 2 times (to get the same row 3 times) and each of these rows shall get one of these 3 characteristics.
I hope I explained intelligible enough.
Anyone an idea how to do this?
Thanks.
In almost all cases, you want to do this with an association or junction table. This table would like like:
create table IndividualCharacteristicsr (
individualId int not null,
characterId int not null
);
(It might also have a unique id itself.)
Each characteristic an individual has would be on a separate row. So, three characteristics mean three rows. A typical query using this information would join all three tables:
from id_table i left outer join
IndividualCharacteristicsr ic
on i.individualId = ic.individualId left outer join
char_table c
on c.characteristicId = ic.characteristicId
(The left outer join includes individuals with no characteristics.)
In Postgres, you could store the characteristics in a string or an array. Neither is very convenient for joining back to id_char. The better approach is an additional table.

How not to display columns which are NULL in a view

I've set up a view which combines all the data across several tables. Is there a way to write this so that only columns which contain non-null data are displayed, and those columns which contain all NULL values are not included?
ADDED:
Sorry, still studying and working on my first big project so every day seems to be a new experience at the minute. I haven't been very clear, and that's partly because I'm not sure I'm going about things the right way! The client is an academic library, and the database records details of specific collections. The view I mentioned is to display all the data held about an item, so it is bringing together tables on publication, copy, author, publisher, language and so on. A small number of items in the collection are papers, so have additional details over and above the standard bibliographic details. What I didn't want was a user to get all the empty fields relating to papers if what was returned only consisted of books, therefore the paper table fields were all null. So I thought perhaps there would be a way to not show these. Someone has commented that this is the job of the client application rather than the database itself, so I can leave this until I get to that phase of the project.
There is no way to do this in sql.
CREATE VIEW dbo.YourView
AS
SELECT (list of fields)
FROM dbo.Table1 t1
INNER JOIN dbo.Table2 t2 ON t1.ID = t2.FK_ID
WHERE t1.SomeColumn IS NOT NULL
AND t2.SomeOtherColumn IS NOT NULL
In your view definition, you can include WHERE conditions which can exclude rows that have certain columns that are NULL.
Update: you cannot really filter out columns - you define the list of columns that are part of your view in your view definition, and this list is fixed and cannot be dynamically changed......
What you might be able to do is us a ISNULL(column, '') construct to replace those NULLs with an empty string. Or then you need to handle excluding those columns in your display front end - not in the SQL view definition...
The only thing I see you could do is make sure to select only those columns from the view that you know aren't NULL:
SELECT (list of non-null fields) FROM dbo.YourView
WHERE (column1 IS NOT NULL)
and so forth - but there's no simple or magic way to select all columns that aren't NULL in one SELECT statement...
You cannot do this in a view, but you can do it fairly easily using dynamic SQL in a stored procedure.
Of course, having a schema which shifts is not necessarily good for clients who consume the data, but it can be efficient if you have very sparse data AND the consuming client understands the varying schema.
If you have to have a view, you can put a "header" row in your view which you can inspect client-side on the first row in your loop to see if you want to not bother with the column in your grid or whatever, you can do something like this:
SELECT * FROM (
-- This is the view code
SELECT 'data' as typ
,int_col
,varchar_col
FROM TABLE
UNION ALL
SELECT 'hdr' as typ
-- note that different types have to be handled differently
,CASE WHEN COUNT(int_col) = 0 THEN NULL ELSE 0 END
,CASE WHEN COUNT(varchar_col) = 0 THEN NULL ELSE '' END
FROM TABLE
) AS X
-- have to get header row first
ORDER BY typ DESC -- add other sort criteria here
If we're reading your question right, there won't be a way to do this in SQL. The output of a view must be a relation - in (over-)simplified terms, it must be rectangular. That is, each row must have the same number of columns.
If you can tell us more about your data and give us some idea of what you want to do with the output, we can perhaps offer more positive suggestions.
In general, add a WHERE clause to your query, e.g.
WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL
Here, a b c are your column names.
If you are joining tables together on potentially NULL columns, then use an INNER JOIN, and NULL values will not be included.
EDIT: I may have misunderstood - the above filters out rows, but you may be asking to filter out columns, e.g. you have several columns and you only want to display columns that contain at least one null value across all the rows you are returning. Using dynamic SQL offers a solution, since the set columns varies depending upon your data.
Here's a SQL query that builds another SQL query containing the appropriate columns. You could run this query, and then submit it's result as another query. It assumes 'pk' is some column that is always non-null, e.g. a primary key - this means we can prefix additional row names with a comma.
SELECT CONCAT("SELECT pk"
CASE (count(columnA)) WHEN 0 THEN '' ELSE ',columnA' END,
CASE (count(columnB)) WHEN 0 THEN '' ELSE ',columnB' END,
// etc..
' FROM (YourQuery) base')
FROM
(YourQuery) As base
The query works using Count(column) - the aggregate function ignores NULL values, and so returns 0 for a column consisting entirely of NULLs. The query builder assumes that YourQuery uses aliases to ensure there no duplicate column names.
While you cant put this into a view, you could wrap it up as a stored procedure that copies the data to another table - the result table. You may also set up a trigger so that the result table is updated whenever the base tables change.
I suspect what's going on is that an end user is running CrystalReports and complaining about all the empty columns that have to be removed manually.
It would actually be possible to create a stored procedure that would create a view on the fly, leaving out dataless columns. But then you would have to run this proc before using the view.
Is that acceptable?

Update all rows of a single column

I'm dealing with two tables which have 2 columns, as listed under.
Table 1: table_snapshot
account_no | balance_due
Table 2: table_ paid
account_no | post_balance | delta_balance
I added a third column to table2 with the following command:
ALTER TABLE table_paid ADD delta_balance number(18);
I'm trying to use the following query, to update the new column ( delta_balance ) with the difference in balances between 1 and 2.
FYI, table_paid is a subset of table_snapshot. i,e., table 2 has only a few accounts present in table 1. I get an error saying : SQL Statement not properly ended. the query i'm using is:
UPDATE table_paid
SET table_paid.delta_balance = table_paid.post_balance - table_snapshot.balance_due
from table_paid, table_snapshot
WHERE table_paid.account_no = table_snapshot.account_no;
Appreciate if someone can correct my query.
Many thanks.
novice.
Oracle doesn't have the UPDATE ... FROM syntax that you're using from MS Sql Server (which, I believe, isn't ANSI anyway). Instead, when you need to do an update on a result set, Oracle has you create the resultset as a kind of inline view, then you update through the view, like so:
UPDATE ( SELECT tp.delta_balance
, tp.post_balance
, ts.balance_due
FROM table_paid tp
JOIN table_snapshot ts
ON tp.account_no = ts.account_no
)
SET delta_balance = post_balance - balance_due;
This is more "correct" than the answers supplied by Babar and palindrom, as their queries will update every row in table_paid, even if there are no corresponding rows in table_snapshot. If there is a 1-1 correspondance, you don't need to worry, but it's safer to do it with the inline view.
It's unclear from your example which table is the parent table, or (as I'm guessing) neither is the parent table and account_no is pointing to the primary key of another table (presumably account, or "table_account" by your naming conventions). In any case, it's clear that there is not a 1-1 correspondence in your table - 15K in one, millions in the other.
This could mean 2 things: either there are many rows in table_snapshot that have no corresponding row in table_paid, or there are many rows in table_snapshot for each row in table_paid. If the latter is true, your query is impossible - you will have multiple updates for each row in table_paid, and the result will be unpredictable; how will you know which of the "post_balance - balance_due" expressions will ultimately determine the value of a given delta_balance?
If you run my query, you will find this out quickly enough - you will get an error message that says, "ORA-01779: cannot modify a column which maps to a non key-preserved table". This error will appear based not on the data in the table (it may be okay), but based on the primary keys you have defined on the two tables. If the join condition you specify doesn't unambiguously result in a 1-1 relationship between the updated table and the rest of the join, based on the defined keys, you will get this error. It's Oracle's way of telling you, "You're about to screw up your data".
In the other answers here, you will only get an error (in that case, ORA-01427: single-row subquery returns more than one row) if you actually have data that would cause a problem; my version is more strict, so it may turn out that you will need to use the other versions.
And, as the others have said, you'll definitely want an index on account_no for the table_snapshot table. One on the table_paid wouldn't hurt either.
Try this
UPDATE table_paid
SET table_paid.delta_balance = table_paid.post_balance -
(SELECT table_snapshot.balance_due from table_snapshot WHERE table_paid.account_no =
table_snapshot.account_no);
UPDATE table_paid
SET table_paid.delta_balance = table_paid.post_balance - ( select balance_due from table_snapshot
WHERE table_paid.account_no = table_snapshot.account_no )