I am currently working with an H2 database and I have written the following SQL, however the H2 database engine does not support the NOT IN being performed on a multiple column sub-query.
DELETE FROM AllowedParam_map
WHERE (AllowedParam_map.famid,AllowedParam_map.paramid) NOT IN (
SELECT famid,paramid
FROM macros
LEFT JOIN macrodata
ON macros.id != macrodata.macroid
ORDER BY famid)
Essentially I want to remove rows from allowedparam_map wherever it has the same combination of famid and paramid as the sub-query
Edit: To clarify, the sub-query is specifically trying to find famid/paramid combinations that are NOT present in macrodata, in an effort to weed out the allowedparam_map, hence the ON macros.id != macrodata.macroid. I'm also terrible at SQL so this might be completely the wrong way to do it.
Edit 2: Here is some more info about the pertinent schema:
Macros
| ID | NAME | FAMID |
| 0 | foo | 1 |
| 1 | bar | 1 |
| 2 | baz | 1 |
MacroData
| ID | MACROID | PARAMID | VALUE |
| 0 | 0 | 1 | 1024 |
| 1 | 0 | 2 | 200 |
| 2 | 0 | 3 | 89.85 |
AllowedParam_Map
| ID | FAMID | PARAMID |
| 0 | 1 | 1 |
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
The parameters are allowed on a per-family basis. Notice how the allowedParam_map table contains an entry for famid=1 and paramid=4, even though macro 0, aka "foo", does not have an entry for paramid=4. If we expand this, there might be another famid=1 macro that has paramid=4, but we cant be sure. I want to cull from the allowedParam_map table any unused parameters, based on the data in the macrodata table.
IN and NOT IN can always be replaced with EXISTS and NOT EXISTS.
Some points first:
You are using an ORDER BY in your subquery, which is of course superfluous.
You are outer-joining a table, which should have no effect when asking for existence. So either you need to look up a field in the outer-joined table, then inner-join it or you don't, then remove it from the query. (It's queer to join every non-related record (macros.id != macrodata.macroid) anyway.
You say in the comments section that both famid and paramid reside in table macros, so you can remove the outer join to macrodata from your query. You get:
As you say now that famid is in table macros and paramid is in table macrodata and you want to look up pairs that exist in AllowedParam_map, but not in the aformentioned tables, you seem to be looking for a simple inner join.
DELETE FROM AllowedParam_map
WHERE NOT EXISTS
(
SELECT *
FROM macros m
JOIN macrodata md ON md.macroid = m.id
WHERE m.famid = AllowedParam_map.famid
AND md.paramid = AllowedParam_map.paramid
);
You can use not exists instead:
DELETE FROM AllowedParam_map m
WHERE NOT EXISTS (SELECT 1
FROM macros LEFT JOIN
macrodata
ON macros.id <> macrodata.macroid -- I strongly suspect this should be =
WHERE m.famid = ?.famid and m.paramid = ?.paramid -- add the appropriate table aliases
);
Notes:
I strongly suspect the <> should be =. <> does not make sense in this context.
Replace the ? with the appropriate table alias.
NOT EXISTS is better than NOT IN anyway. It does what you expect if one of the value is NULL.
Related
I'm struggling to find a value that might be in different tables but using UNION is a pain as there are a lot of tables.
[Different table that contains the suffixes from the TestTable_]
| ID | Name|
| -------- | -----------|
| 1 | TestTable1 |
| 2 | TestTable2 |
| 3 | TestTable3 |
| 4 | TestTable4 |
TestTable1 content:
| id | Name | q1 | a1 |
| -------- | ---------------------------------------- |
| 1 | goose | withFeather? |featherID |
| 2 | rooster| withoutFeather?|shinyfeatherID |
| 3 | rooster| age | 20 |
TestTable2 content:
| id | Name | q1 | a1 |
| -------- | ---------------------------------------------------|
| 1 | brazilian_goose | withFeather? |featherID |
| 2 | annoying_rooster | withoutFeather?|shinyfeatherID |
| 3 | annoying_rooster | no_legs? |dead |
TestTable3 content:
| id | Name | q1 | a1 |
| -------- | ---------------------------------------- |
| 1 | goose | withFeather? |featherID |
| 2 | rooster| withoutFeather?|shinyfeatherID |
| 3 | rooster| age | 15 |
Common columns: q1 and a1
Is there a way to parse through all of them to lookup for a specific value without using UNION because some of them might have different columns?
Something like: check if "q1='age'" exists in all those tables (from 1 to 50)
Select q1,*
from (something)
where q1 exists in (TestTable_*)... or something like that.
If not possible, not a problem.
You could use dynamic SQL but something I do in situations like this where I have a list of tables that I want to quickly perform the same actions on is to either use a spreadsheet to paste the list of tables into and type a query into the cell with something like #table then use the substitute function to replace it.
Alternative I just paste the list into SSMS and use SHIFT+ALT+ArrowKey to select the column and start typing stuff out.
So here is my list of tables
Then I use that key combo. As you can see my cursor has now selected all those rows.
Now I can start typing and all rows selected will get the input.
Then I just go to the other side of the table names and repeat the action
It's not a perfect solution but it's quick a quick and dirty way of doing something repetitive quickly.
If you want to find all the tables with that column name you can use information schema.
Select table_name from INFORMATION_SCHEMA.COLUMNS where COLUMN_NAME = 'q1'
Given the type of solution you are after I can offer a method that I've had to use on legacy systems.
You can query sys.columns for the name of the column(s) you need to find in N tables and join using object_id to sys.tables where type='U'. This will give you a list of table names.
From this list you can then build a working query for each table, and depending on your requirements (is this ad-hoc?) either just manually execute it yourself of build a procedure that will do it for you using sp_executesql
Eg
select t.name, c.name
into #workingtable
from sys.columns c
join sys.tables t on t.object_id=c.object_id
where c.name in .....
psudocode:
begin loop while rows exist in #working table
select top 1 row from #workingtable
set #sql=your query specific to that table and column(s)
exec(#sql) / sp_executesql / try/catch as necessary
delete row from working table
end loop
Hopefully that give ideas at least for how you might implement your requirements.
Say if I have two queries returning two tables with the same number of rows.
For example, if query 1 returns
| a | b | c |
| 1 | 2 | 3 |
| 4 | 5 | 6 |
and query 2 returns
| d | e | f |
| 7 | 8 | 9 |
| 10 | 11 | 12 |
How to obtain the following, assuming both queries are opaque
| a | b | c | d | e | f |
| 1 | 2 | 3 | 7 | 8 | 9 |
| 4 | 5 | 6 | 10 | 11 | 12 |
My current solution is to add to each query a row number column and inner join them
on this column.
SELECT
q1_with_rownum.*,
q2_with_rownum.*
FROM (
SELECT ROW_NUMBER() OVER () AS q1_rownum, q1.*
FROM (.......) q1
) q1_with_rownum
INNER JOIN (
SELECT ROW_NUMBER() OVER () AS q2_rownum, q2.*
FROM (.......) q2
) q2_with_rownum
ON q1_rownum = q2_rownum
However, if there is a column named q1_rownum in either of the query,
the above will break. It is not possible for me to look into q1 or q2;
the only information available is that they are both valid SQL queries
and do not contain columns with same names. Are there any SQL construct
similar to UNION but for columns instead of rows?
There is no such function. A row in a table is an entity.
If you are constructing generic code to run on any tables, you can try using less common values, such as "an unusual query rownum" -- or something more esoteric than that. I would suggest using the same name in both tables and then using using clause for the join.
Not sure if I understood your exact problem, but I think you mean both q1 and q2 are joined on a column with the same name?
You should add each table name before the column to distinguish which column is referenced:
"table1"."similarColumnName" = "table2"."similarColumnName"
EDIT:
So, problem is that if there is already a column with the same alias as your ROW_NUMBER(), the JOIN cannot be made because you have an ambiguous column name.
The easier solution if you cannot know your incoming table's columns is to make a solid alias, for example _query_join_row_number
EDIT2:
You could look into prefixing all columns with their original table's name, thus removing any conflict (you get q1_with_rows.rows and conflict column is q1_with_rows.q1.rows)
an example stack on this: In a join, how to prefix all column names with the table it came from
I was looking to provide an answer to this question in which the OP has two tables:
Table1
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | |
| 2 | |
| 3 | |
+--------+--------+
Table2
+----+--------+--------+--------+
| ID | testID | stepID | status |
+----+--------+--------+--------+
| 1 | 1 | 1 | pass |
| 2 | 1 | 2 | fail |
| 3 | 1 | 3 | pass |
| 4 | 2 | 1 | pass |
| 5 | 2 | 2 | pass |
| 6 | 3 | 1 | fail |
+----+--------+--------+--------+
Here, the OP is looking to update the status field for each testID in Table1 with pass if the status of all stepID records associated with the testID in Table2 have a status of pass, else Table1 should be updated with fail for that testID.
In this example, the result should be:
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | fail |
| 2 | pass |
| 3 | fail |
+--------+--------+
I wrote the following SQL code in an effort to accomplish this:
update Table1 a inner join
(
select
b.testID,
iif(min(b.status)=max(b.status) and min(b.status)='pass','pass','fail') as v
from Table2 b
group by b.testID
) c on a.testID = c.testID
set a.testStatus = c.v
However, MS Access reports the all-too-familiar, 'operation must use an updateable query' response.
I know that a query is not updateable if there is a one-to-many relationship between the record being updated and the set of values, but in this case, the aggregated subquery would yield a one-to-one relationship between the two testID fields.
Which left me asking, why is this query not updateable?
You're joining in a query with an aggregate (Max).
Aggregates are not updateable. In Access, in an update query, every part of the query has to be updateable (with the exception of simple expressions, and subqueries in WHERE part of your query), which means your query is not updateable.
You can work around this by using domain aggregates (DMin and DMax) instead of real ones, but this query will take a large performance hit if you do.
You can also work around it by rewriting your aggregates to take place in an EXISTS or NOT EXISTS clause, since that's part of the WHERE clause thus doesn't need to be updateable. That would likely minimally affect performance, but means you have to split this query in two: 1 query to set all the fields to "pass" that meet your condition, another to set them to "fail" if they don't.
Problem: SQL Query that looks at the values in the "Many" relationship, and doesn't return values from the "1" relationship.
Tables Example: (this shows two different tables).
+---------------+----------------------------+-------+
| Unique Number | <-- Table 1 -- Table 2 --> | Roles |
+---------------+----------------------------+-------+
| 1 | | A |
| 2 | | B |
| 3 | | C |
| 4 | | D |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
+---------------+----------------------------+-------+
When I run my query, I get multiple, unique numbers that show all of the roles associated to each number like so.
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 4 | C |
| 4 | A |
| 5 | B |
| 5 | C |
| 5 | D |
| 6 | D |
| 6 | A |
+---------------+-------+
I would like to be able to run my query and be able to say, "When the role of A is present, don't even show me the unique numbers that have the role of A".
Maybe if SQL could look at the roles and say, WHEN role A comes up, grab unique number and remove it from column 1.
Based on what I would "like" to happen (I put that in quotations as this might not even be possible) the following is what I would expect my query to return.
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 5 | B |
| 5 | C |
| 5 | D |
+---------------+-------+
UPDATE:
Query Example: I am querying 8 tables, but I condensed it to 4 for simplicity.
SELECT
c.UniqueNumber,
cp.pType,
p.pRole,
a.aRole
FROM c
JOIN cp ON cp.uniqueVal = c.uniqueVal
JOIN p ON p.uniqueVal = cp.uniqueVal
LEFT OUTER JOIN a.uniqueVal = p.uniqueVal
WHERE
--I do some basic filtering to get to the relevant clients data but nothing more than that.
ORDER BY
c.uniqueNumber
Table sizes: these tables can have anywhere from 50,000 rows to 500,000+
Pretending the table name is t and the column names are alpha and numb:
SELECT t.numb, t.alpha
FROM t
LEFT JOIN t AS s ON t.numb = s.numb
AND s.alpha = 'A'
WHERE s.numb IS NULL;
You can also do a subselect:
SELECT numb, alpha
FROM t
WHERE numb NOT IN (SELECT numb FROM t WHERE alpha = 'A');
Or one of the following if the subselect is materializing more than once (pick the one that is faster, ie, the one with the smaller subtable size):
SELECT t.numb, t.alpha
FROM t
JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') = 0) AS s USING (numb);
SELECT t.numb, t.alpha
FROM t
LEFT JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') > 0) AS s USING (numb)
WHERE s.numb IS NULL;
But the first one is probably faster and better[1]. Any of these methods can be folded into a larger query with multiple additional tables being joined in.
[1] Straight joins tend to be easier to read and faster to execute than queries involving subselects and the common exceptions are exceptionally rare for self-referential joins as they require a large mismatch in the size of the tables. You might hit those exceptions though, if the number of rows that reference the 'A' alpha value is exceptionally small and it is indexed properly.
There are many ways to do it, and the trade-offs depend on factors such as the size of the tables involved and what indexes are available. On general principles, my first instinct is to avoid a correlated subquery such as another, now-deleted answer proposed, but if the relationship table is small then it probably doesn't matter.
This version instead uses an uncorrelated subquery in the where clause, in conjunction with the not in operator:
select num, role
from one_to_many
where num not in (select otm2.num from one_to_many otm2 where otm2.role = 'A')
That form might be particularly effective if there are many rows in one_to_many, but only a small proportion have role A. Of course you can add an order by clause if the order in which result rows are returned is important.
There are also alternatives involving joining inline views or CTEs, and some of those might have advantages under particular circumstances.
I have a table that contains URL strings, i.e.
/A/B/C
/C/E
/C/B/A/R
Each string is split into tokens where the separator in my case is '/'. Then I assign integer value to each token and the put them into dictionary (different database table) i.e.
A : 1
B : 2
C : 3
E : 4
D : 5
G : 6
R : 7
My problem is to find those rows in first tables which contain given sequence of tokens. Additional problem is that my input is sequence of ints, i.e. I have
3, 2
and I'd like to find following rows
/A/B/C
/C/B/A/R
How to do this in efficient way. By this I mean how to design proper database structure.
I use PostgreSQL, solution should work well for 2 mln of rows in first table.
To clarify my example - I need both 'B' AND 'C' to be in the URL. Also 'B' and 'C' can occur in any order in the URL.
I need efficient SELECT. INSERT does not have to be efficient. I do not have to do all work in SQL if this changes anything.
Thanks in advance
I'm not sure how to do this, but I'm just giving you some idea that might be useful. You already have your initial table. You process is and create the token table:
+------------+---------+
| TokenValue | TokenId |
+------------+---------+
| A | 1 |
| B | 2 |
| C | 3 |
| E | 4 |
| D | 5 |
| G | 6 |
| R | 7 |
+------------+---------+
That's ok for me. Now, what I would do is to create a new table in which I would match the original table with the tokens of the token table (OrderedTokens). Something like:
+-------+---------+---------+
| UrlID | TokenId | AnOrder |
+-------+---------+---------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 3 |
| 2 | 5 | 1 |
| 2 | 2 | 2 |
| 2 | 1 | 3 |
| 2 | 7 | 4 |
| 3 | 3 | 1 |
| 3 | 4 | 2 |
+-------+---------+---------+
This way you can even recreate your original table as long as you use the order field. For example:
select string_agg(t.tokenValue, '/' order by ot.anOrder) as OriginalUrl
from OrderedTokens as ot
join tokens t on t.tokenId = ot.tokenId
group by ot.urlId
The previous query would result in:
+-------------+
| OriginalUrl |
+-------------+
| A/B/C |
| D/B/A/R |
| C/E |
+-------------+
So, you don't even need your original table anymore. If you want to get Urls that have any of the provided token ids (in this case B OR C), you sould use this:
select string_agg(t.tokenValue, '/' order by ot.anOrder) as OriginalUrl
from OrderedTokens as ot
join Tokens t on t.tokenId = ot.tokenId
group by urlid
having count(case when ot.tokenId in (2, 3) then 1 end) > 0
This results in:
+-------------+
| OriginalUrl |
+-------------+
| A/B/C | => It has both B and C
| D/B/A/R | => It has only B
| C/E | => It has only C
+-------------+
Now, if you want to get all Urls that have BOTH ids, then try this:
select string_agg(t.tokenValue, '/' order by ot.anOrder) as OriginalUrl
from OrderedTokens as ot
join Tokens t on t.tokenId = ot.tokenId
group by urlid
having count(distinct case when ot.tokenId in (2, 3) then ot.tokenId end) = 2
Add in the count all the ids you want to filter and then equal that count the the amount of ids you added. The previous query will result in:
+-------------+
| OriginalUrl |
+-------------+
| A/B/C | => It has both B and C
+-------------+
The funny thing is that none of the solutions I provided results in your expected result. So, have I misunderstood your requirements or is the expected result you provided wrong?
Let me know if this is correct.
It really depends on what you mean by efficient. It will be a trade-off between query performance and storage.
If you want to efficiently store this information, then your current approach is appropriate. You can query the data by doing something like this:
SELECT DISTINCT
u.url
FROM
urls u
INNER JOIN
dictionary d
ON
d.id IN (3, 2)
AND u.url ~ E'\\m' || d.url_component || E'\\m'
This query will take some time, as it will be required to do a full table scan, and perform regex logic on each URL. It is, however, very easy to insert and store data.
If you want to optimize for query performance, though, you can create a reference table of the URL components; it would look something like this:
/A/B/C A
/A/B/C B
/A/B/C C
/C/E C
/C/E E
/D/B/A/R D
/D/B/A/R B
/D/B/A/R A
/D/B/A/R R
You can then create a clustered index on this table, on the URL component. This query would retrieve your results very quickly:
SELECT DISTINCT
u.full_url
FROM
url_components u
INNER JOIN
dictionary d
ON
d.id IN (3, 2)
AND u.url_component = d.url_component
Basically, this approach moves the complexity of the query up front. If you are doing few inserts, but lots of queries against this data, then that is appropriate.
Creating this URL component table is trivial, depending on what tools you have at your disposal. A simple awk script could work through your 2M records in a minute or two, and the subsequent copy back into the database would be quick as well. If you need to support real-time updates to this table, I would recommend a non-SQL solution: whatever your app is coded in could use regular expressions to parse the URL and insert the components into the component table. If you are limited to using the database, then an insert trigger could fulfill the same role, but it will be a more brittle approach.