Modifying a column in Access - SQL

I have 2 tables in MS Access, TableA and TableB. TableA has only 1 field, myFieldId, and TableB has only 1 field, myFieldName (in reality both tables have more fields, but these are the ones that matter for my problem).
Both tables have records that mean the same thing, but they are written in different, though similar, ways.
For instance, TableA has:
| TableA.myFieldId |
|------------------|
| MM0001P          |
| HR0003P          |
| MH0567P          |
So as you can see, all of the records are formatted this way (with a P at the end):
([A-Z][A-Z][0-9][0-9][0-9][0-9]P)
Then, TableB has:
| TableB.myFieldName                           |
|----------------------------------------------|
| MH-0567 Materials Handling important Role    |
| MM-0001 Materials Management Minor Role      |
| HR-0003 Human Resources Super Important Role |
So this one has the format (without 'P' at the end):
([A-Z][A-Z]-[0-9][0-9][0-9][0-9] ([A-Z]|[a-z]*))
First, I would like to make join queries between TableA and TableB on these fields, but as you can see, the join will never match, since the two fields store the values in completely different formats.
So I would like to replace every value in TableA.myFieldId with its corresponding name from TableB.myFieldName.
The problem is that both tables have around 1 million records, the values are repeated multiple times in both tables, and I don't know how to do this (MS Access doesn't even let me use regular expressions).

I would make a table (or query, if it changes often enough) of all unique entries in the 2nd table and the corresponding key for the 1st table. Then use that table or query to help join the two tables.
Something like
SELECT myFieldName AS FName,
       Left(myFieldName, 2) & Mid(myFieldName, 4, 4) & "P" AS FID
FROM TableB
GROUP BY myFieldName, Left(myFieldName, 2) & Mid(myFieldName, 4, 4) & "P"
Important note: are all IDs found in both tables, or do you have records in either table that are not in the other? If they don't always match, you may need additional logic or steps to build a master table from both TableA and TableB.
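As a rough sketch of how you might use it (assuming you save the query above as qryBMap; that name is just an example), the join then becomes:
SELECT A.myFieldId, M.FName
FROM TableA AS A
INNER JOIN qryBMap AS M ON A.myFieldId = M.FID
This keeps TableA itself unchanged, which avoids rewriting a million rows.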

Related

How to get the differences between two (kind of) duplicated tables (SQL)

Prologue:
I have two tables in two different databases; one is an updated version of the other. For example, imagine that one year ago I duplicated table 1 into the new DB (say, as table 2), and from then on I worked only on table 2, never updating table 1.
I would like to compare the two tables to get the differences that have accumulated over this period of time (the tables have kept the same structure, so the comparison is meaningful).
My plan was to create a third table, copy both table 1 and table 2 into it, and then count the number of repetitions of every entry.
In my opinion this, together with a new attribute that specifies for every entry which table it came from, would do the job.
Problem:
When copying the two tables into the third table, I get the (obvious) error that there are duplicate key values in a unique or primary key constraint.
How could I bypass the error, or how could I do the same job better? Any idea is appreciated.
Something like this should do what you want if A and B have the same structure; otherwise just select and rename the columns you want to compare.
SELECT *
FROM B
WHERE NOT EXISTS (SELECT * FROM A WHERE A.col = B.col AND ....)
If NOT EXISTS doesn't work in your DBMS, you could also use a left outer join that compares the rows' column values and keeps only the rows with no match.
SELECT A.*
FROM A
LEFT OUTER JOIN B
    ON A.col = B.col AND ....
WHERE B.col IS NULL
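If you prefer the third-table idea from the question, one sketch (assuming the two tables share the same column list, abbreviated here as col1, col2, and have no duplicate rows within a single table) is to combine them with a source tag instead of copying the original primary keys, so no key constraint is violated:
SELECT col1, col2, MIN(src) AS found_in, COUNT(*) AS occurrences
FROM (
    SELECT col1, col2, 'table1' AS src FROM table1
    UNION ALL
    SELECT col1, col2, 'table2' AS src FROM table2
) AS combined
GROUP BY col1, col2
HAVING COUNT(*) = 1   -- keep only rows that exist in exactly one of the two tables
Rows present in both tables are counted twice and filtered out by the HAVING clause, so only the differences remain, together with the table they came from.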

Optimizing PostgreSQL search queries on big texts ('like', full text search, ...)

We have a software solution which is used by more than 200 customers. We recently switched to PostgreSQL because our former database was too slow handling the search queries our customers use.
Our database looks like this:
TABLE A
1. ID
(+ some other fields which aren't important here)
TABLE B
This table is used to store 'data' on the items in TABLE A. It is different for every customer: for example, TYPE can be 'CLIENTNAME' and VALUE 'AZERTY'. One record in TABLE A can have any number of records in TABLE B; typically a record in TABLE A has between 5 and 10 records in TABLE B.
1. ID TABLE A
2. TYPE
3. VALUE
TABLE C
1. TABLE A ID
2. VERSIONNR
3. DESCRIPTION
This table holds the different versions of the records in TABLE A. Each of these versions has an extended description, which can range from empty to arbitrarily long.
Our problem: our customers are used to 'Google-like' searching. For example, they type 'AZERTY' and we show all the records from TABLE A where:
'AZERTY' is in the description of the most recent version of TABLE C
'AZERTY' is in one of the values of TABLE B
Additional problem: this search is a 'contains'. If they search 'ZER', they should also find the records with 'AZERTY' in them. Multiple arguments are an 'AND': if they search for 'ZER 123', we need to show all records where the description matches both 'ZER' and '123', or the values match both 'ZER' and '123'.
What we have done so far:
There is an option a user can check to choose whether or not to search the description. We mostly advise them to search only the values and to use the description only when needed.
We make several search threads to the database for one search query, because searching all documents at once would take too much time.
Some time ago, on our former slow database engine, a colleague of mine made 'search tables': basically a table which contains all values for a TABLE A ID, so no join is needed in the SQL query when searching. It looks like this:
TABLE D
TABLE A ID
VALUES (all values from TABLE B for this TABLE A ID, separated by a ' ')
DESCRIPTION (the description of the most recent version for this TABLE A ID)
Example record:
- 1
- ZER 123 CLIENT NAME NUMBER 7856 jsdfjklf 4556423
- DESCRIPTION CAN BE VERY LONG.
If a customer searches for 'ZER 123' this becomes:
"select TABLE_A_ID from TABLE_D where values like '%ZER%' and values like '%123%'"
Important:
Some of our customers have a lot of records in TABLE A: more than 5,000,000, which means there are a lot of records in TABLE B (roughly 50,000,000). Most of our customers have between 300,000 and 500,000 records in TABLE A.
My questions:
Is there a better / faster way to search through all the values than that search table? Without the search table I would have to add a join for every space-separated term in the customer's search argument, which will be too slow (I think?) if they have a lot of records in TABLE A. For example:
SELECT TABLE_A.ID FROM TABLE_A
INNER JOIN TABLE_B Sub1 ON TABLE_A.ID = Sub1.TABLE_A_ID AND Sub1.VALUE LIKE '%ZER%'
INNER JOIN TABLE_B Sub2 ON TABLE_A.ID = Sub2.TABLE_A_ID AND Sub2.VALUE LIKE '%123%'
I have taken a look at full text search in PostgreSQL. I don't think I can use it, since it doesn't do a 'contains' match the way LIKE does?
Is there any index I can use on the values (TABLE B or the search table) and the description (TABLE C or the search table) to make the searches faster? I've read up on this and I don't think there is, because ordinary indexes aren't used when searching with LIKE '%ZER%'?
I hope I've explained this clearly.
Thanks in advance!
Your terminology is confusing, but I assume you mean "tables" when you write "files".
You cannot reasonably search in several tables with a single query, but you can search in several columns of a single table at the same time.
Based on your description, I would say that you need a trigram index on the concatenation of the relevant string columns in the table.
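A minimal sketch of that idea, assuming the denormalized search table is called table_d with columns table_a_id, vals and description (names chosen here for illustration; the original VALUES column would need renaming or quoting, since it is a reserved word):
-- enable the trigram extension once per database
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- index the concatenation of the searchable columns
CREATE INDEX table_d_search_trgm_idx
    ON table_d
 USING gin ((coalesce(vals, '') || ' ' || coalesce(description, '')) gin_trgm_ops);

-- 'contains' searches can then use the index, provided the WHERE
-- clause repeats exactly the same expression that was indexed
SELECT table_a_id
FROM table_d
WHERE (coalesce(vals, '') || ' ' || coalesce(description, '')) ILIKE '%ZER%'
  AND (coalesce(vals, '') || ' ' || coalesce(description, '')) ILIKE '%123%';
The planner only considers the index when the search expression matches the indexed expression, so it can be convenient to wrap that expression in a view to keep the two in sync.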

SQL: 2 joins using a single reference table

I'm trying to achieve 2 joins. If I run the 1st join alone it pulls 4 lots of results, which is correct. However, when I add the 2nd join, which queries the same reference table using the results from the select statement, it pulls in additional results. Please see the attached screenshot; the boxed section should not be returned.
So I removed the 2nd join to try and explain better (see pic2). I'm trying to get another column which looks up InvolvedInternalID against the initial reference table IRIS.Practice.idvClient.
Your database is simply doing what you tell it to. When you add in the second join (confusingly aliased as tb1 in a 3-table query), the database finds matching rows that obey the predicate/truth statement in the ON part of the join.
If you don't want those rows in there then one of two things must be the case:
1) The truth you specified in the ON clause is faulty. For example, SELECT * FROM person INNER JOIN shoes ON person.age = shoes.size is faulty: two people with age 13 and two shoes with size 13 will produce 4 results, and shoe size has nothing to do with age anyway.
2) There were rows in the joined table that didn't apply to the results you were looking for, but you forgot to filter them out with a WHERE clause (or an additional restriction in the ON clause). For example, a table holds all historical data as well as current data, and the current record is the one with a NULL in the DeletedOn column. If you forget to say WHERE DeletedOn IS NULL, then your data will multiply as all the past rows that don't apply to your query are brought in.
Don't alias tables with tbX, tbY, etc. Make the names meaningful! Not only do aliases like tbX have no relation to the original table name (so you encounter tbX and then have to search the rest of the query to find where it's declared, just to say "ah, it's the addresses table"), but in this case you join idvClient in twice yet give the two copies unhelpful aliases like tb1 and tb3, when you should have aliased them with something that describes their relationship to the rest of the query tables.
For example, ParentClient and SubClient or OriginatingClient/HandlingClient would be better names, if these tables are in some relationship with each other.
Whatever the purpose of joining this table in twice is, alias it in relation to that purpose. It may make what you've done wrong easier to spot, for example "oh, of course... I'm missing a WHERE ParentClient.Type = 'parent'" (or WHERE HandlingClient.HandlingDate IS NOT NULL, etc.).
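As a rough illustration of that advice (everything here except idvClient and InvolvedInternalID is an invented name, since your real schema isn't shown):
SELECT job.JobID,
       OriginatingClient.ClientName AS OriginatingClientName,
       HandlingClient.ClientName    AS HandlingClientName
FROM Jobs AS job
INNER JOIN IRIS.Practice.idvClient AS OriginatingClient
        ON OriginatingClient.ClientID = job.OriginatingClientID
INNER JOIN IRIS.Practice.idvClient AS HandlingClient
        ON HandlingClient.ClientID = job.InvolvedInternalID
WHERE job.DeletedOn IS NULL   -- filter out historical rows so they don't multiply the results
With names like these, a stray extra row immediately points at the alias (and therefore the join or missing filter) responsible for it.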
The first step to wisdom is by calling things their proper names

Extracting different data from the same table as different fields with additional tables as lookups

I have two tables. One gives me basic information about demographics. One of the categories in my demographics table is a subset of people, which is housed in ATID 530 (there are several hundred different ATIDs) of this table:
As you can see, the PK of this table is ADefID. My other table uses this as an FK: it holds pointers to additional definitions for records in the original table. However, those additional definitions are also just records in the original table; the second table just provides the pointers.
So if we pick a record, let's say ADefID=4684423, and look it up in the second table, we are returned this:
The CategoryADefID will then point back to the original table's ADefID for another record:
(note that the ATID of this ADefID differs from that of the original ADefID it relates to)
So, let's say I want to pull out a set of records from the first table, say:
WHERE ATID = 530 AND CycleID = 9600
But I also want to pull the ADesc (and maybe ADEValue) from the related definition as a separate field.
So the end result would look sort of like this:
I understand enough to make the join to the second table and return the CategoryADefID, but I don't know how to use that to look back up another ADefID in the original table. The other limitation is that I would use the ATID field in the WHERE clause (ATID = 530), and the related definition will have a different ATID.
Just add another join back to the original table:
SELECT *
FROM tableA a
JOIN tableB b ON b.ADefID = a.ADefID
JOIN tableA a2 ON a2.ADefID = b.CategoryADefID
WHERE a.ADefID = 4684423
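Applied to the filter from the question, a sketch might look like this (tableA and tableB stand in for the real table names, which aren't given; ADesc, ADEValue, ATID and CycleID come from the question):
SELECT a.ADefID,
       a.ADesc,
       a2.ADesc    AS CategoryADesc,    -- description of the related definition
       a2.ADEValue AS CategoryADEValue
FROM tableA a
JOIN tableB b  ON b.ADefID = a.ADefID
JOIN tableA a2 ON a2.ADefID = b.CategoryADefID
WHERE a.ATID = 530            -- the ATID filter only constrains the first alias,
  AND a.CycleID = 9600        -- so the related definition's different ATID doesn't matter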

Using a list with a selected item as a value

Let us consider the following table structures:
Table1
| Table1_ID | A     |
|-----------|-------|
| 1         | A1    |
| 2         | A1;B1 |
and
Table2
| Table2_ID | Table1_ID | B      | C      |
|-----------|-----------|--------|--------|
| 1         | 1         | foobar | barfoo |
| 2         | 2         | foofoo | barbar |
The view I'm using is defined by the following query:
SELECT Table1.A, B, C
FROM Table2
INNER JOIN Table1 ON Table1.Table1_ID = Table2.Table1_ID;
95% of A's data consists of a 2-character string. In this case, it works fine. However, 5% of it is actually a list (using a semicolon as a separator) of possible values for this field.
This means my users would like to choose between these values when it is appropriate, and keep using the single value automatically the rest of the time. Of course, this is not possible with a single INNER JOIN, since there cannot be a constant selected value.
Table2 is very large, while Table1 is quite small. Manually filling a local A field in each row within Table2 would be a huge waste of time.
Is there an efficient way for SQL (or, more specifically, SQL Server 2008) to handle this? Such as a list with a selected item within a field?
I was planning to add a "A_ChosenValue" field that would store the chosen value when there's a list in A, and remain empty when A only stores a single value. It would only require users to fill it 5% of the time, which is okay. But I was thinking there could be a better way than using two columns to store a single value.
Ideally you would just alter your schema and add a new entity to support the many-to-many relationship between Table1 and Table2, such as the following, with a compound key of all three columns.
Table3
| Table1_ID | Table2_ID | A  |
|-----------|-----------|----|
| 1         | 1         | A1 |
| 2         | 2         | A1 |
| 2         | 2         | B1 |
You could then select from and join on this table, and because it is indexed you won't lose any performance.
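A sketch of the resulting query under that schema (keeping the column names from the question) might be:
SELECT t3.A, t2.B, t2.C
FROM Table2 AS t2
INNER JOIN Table3 AS t3
        ON t3.Table2_ID = t2.Table2_ID   -- one row per (Table2 row, A value) pair
A Table2 row whose A was a list then simply appears once per value.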
Without altering the table structure or normalizing the data, it is possible to use a conditional select statement like the one shown in this SO post, but the query wouldn't perform as well, since you would have to use a function to split the values containing a semicolon.
Answering my own question:
I added a LocalA column to Table1, so that my view actually selects ISNULL(LocalA, Table1.A). Therefore, the displayed value equals A by default, and users can manually override it to select a specific value when A stores a list.
I am not sure whether this is the most efficient solution or not, but at least it works without requiring two columns in the view.
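For reference, the modified view would presumably look something like this sketch (based on the description above):
SELECT ISNULL(Table1.LocalA, Table1.A) AS A,   -- LocalA, when filled in, overrides the raw A list
       B, C
FROM Table2
INNER JOIN Table1 ON Table1.Table1_ID = Table2.Table1_ID;
Only the 5% of rows whose A is a list ever need LocalA filled in; everything else keeps falling back to A automatically.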