Replace certain values by means of a table with replacement values - SQL

I have a table with data in it. Generally the values are correct but sometimes we need to modify them. The modifications are saved in a second table.
I wanted to create a query that dynamically replaces the values if they exist in the replacement table.
This is my query, but it doesn't work:
SELECT
b.Pos,
b.Posten,
IsNull(c.Wert_Neu, b.Bez1) AS Bez1,
IsNull(c.Wert_Neu, b.Bez2) AS Bez2,
IsNull(c.Wert_Neu, b.Bez3) AS Bez3,
b.Wert,
b.Einheit
FROM
Table_Values b LEFT JOIN
Table_Replacements c ON b.Bez1 = c.Wert_Alt AND b.Bez2 = c.Wert_Alt AND b.Bez3 = c.Wert_Alt
Where is my logic error? The values are not replaced. I assume it has something to do with the join conditions all being combined with AND rather than OR, but OR would be too costly for performance.
Anyone with a better idea?

It looks like what you want to do is replace each value with the one that appears in the replacement table. But you have three separate columns, and each of those three values may have a different corresponding entry in the replacement table. So you have to join to that table three separate times, once for each column, to link each value to its replacement, something like:
SELECT
b.Pos,
b.Posten,
IsNull(c.Wert_Neu, b.Bez1) AS Bez1,
IsNull(d.Wert_Neu, b.Bez2) AS Bez2,
IsNull(e.Wert_Neu, b.Bez3) AS Bez3,
b.Wert,
b.Einheit
FROM
Table_Values b
LEFT JOIN Table_Replacements c on b.bez1=c.wert_alt
LEFT JOIN Table_Replacements d on b.bez2=d.wert_alt
LEFT JOIN Table_Replacements e on b.bez3=e.wert_alt
It will be important that your replacement table has an index on wert_alt so that those joins can be done efficiently.
Another possibility is to actually store the replacement values in your main data table. So the fields in it would be:
bez1
bez1Replacement
bez2
bez2Replacement
...
Maybe put a trigger on the table so that on any insert or update, the trigger looks up each of the three replacement values in the replacement table and stores them on the main data record. That would not be exactly normalized, but it would speed up your query. You may not need to do that at all, though: the query above is probably efficient enough if you do have that index.
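The three-join pattern can be sketched in SQLite, where COALESCE plays the role of SQL Server's IsNull. The table is trimmed to four columns and the sample data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_Values (Pos INTEGER, Bez1 TEXT, Bez2 TEXT, Bez3 TEXT);
CREATE TABLE Table_Replacements (Wert_Alt TEXT, Wert_Neu TEXT);
-- index the lookup column so the three joins stay cheap
CREATE INDEX ix_repl ON Table_Replacements (Wert_Alt);

INSERT INTO Table_Values VALUES (1, 'old1', 'keep', 'old3');
INSERT INTO Table_Replacements VALUES ('old1', 'new1'), ('old3', 'new3');
""")

rows = conn.execute("""
    SELECT b.Pos,
           COALESCE(c.Wert_Neu, b.Bez1) AS Bez1,   -- replacement if found,
           COALESCE(d.Wert_Neu, b.Bez2) AS Bez2,   -- else the original value
           COALESCE(e.Wert_Neu, b.Bez3) AS Bez3
    FROM Table_Values b
    LEFT JOIN Table_Replacements c ON b.Bez1 = c.Wert_Alt
    LEFT JOIN Table_Replacements d ON b.Bez2 = d.Wert_Alt
    LEFT JOIN Table_Replacements e ON b.Bez3 = e.Wert_Alt
""").fetchall()
print(rows)   # [(1, 'new1', 'keep', 'new3')]
```

Bez2 has no entry in Table_Replacements, so it falls through unchanged, while Bez1 and Bez3 are replaced.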


Create a table with a Boolean column generated based on other tables' column values?

I have tables A, B, C with millions of rows each. Tables B and C reference table A. The tables are mainly used for one query with multiple filters, but only one of those filters varies between queries. Since the constant parameters add significant time to the query execution, I was wondering whether there is a way to precompute them into a new table. I was looking at materialized views, but the issue is that the computed type I want differs from the original column type. To explain, I will give an example.
Let's say these tables represent a bookstore database. Table A contains general information, and table B contains multiple codes for each book to indicate which categories it falls under, such as 406, 678, 252. I'm building a query to search for books that fall under only 3 of those categories. The variable here is the keyword search in the description of the book. I will always need books under those 3 categories (codes), so these are constants.
What I want to do is create a table with a column that tells me whether a given serial falls under those 3 codes or not; this can be done with a Boolean type. I don't want to have to join these tables and filter for these 3 codes (and more in the real scenario) on every query. As I understand it, materialized views can't have generated fields?
What do you think is a good solution here?
You have multiple options.
Partial Index
PostgreSQL allows you to create an index with a WHERE clause, like so:
create index tableb_category on tableb (category)
where category in (406, 678, 252);
View
Create a view for those categories:
create view v_books_of_interest
as
select tablea.*, tableb.*
from tablea
inner join tableb
on tableb.bookid = tablea.bookid
and tableb.category in (406, 678, 252);
Now your queries can use v_books_of_interest rather than the base tables. Frankly, I would start with this first. Query optimization with the right indexes goes a long way; millions of rows in multiple tables are manageable.
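Both the partial index and the view can be sketched in SQLite, which also supports partial indexes; the column list and sample data below are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (bookid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tableb (bookid INTEGER, category INTEGER);

-- partial index: only rows in the three constant categories are indexed
CREATE INDEX tableb_category ON tableb (category)
WHERE category IN (406, 678, 252);

-- view restricted to the categories of interest
CREATE VIEW v_books_of_interest AS
SELECT tablea.bookid, tablea.title, tableb.category
FROM tablea
JOIN tableb ON tableb.bookid = tablea.bookid
           AND tableb.category IN (406, 678, 252);

INSERT INTO tablea VALUES (1, 'A'), (2, 'B');
INSERT INTO tableb VALUES (1, 406), (2, 999);
""")

rows = conn.execute(
    "SELECT bookid, title, category FROM v_books_of_interest"
).fetchall()
print(rows)   # only book 1 falls in a constant category: [(1, 'A', 406)]
```

Book 2's category (999) is outside the constant set, so it is excluded by the view and never touches the partial index.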
Materialized view
create materialized view mv_books_of_interest
as
select tablea.*, tableb.*
from tablea
inner join tableb
on tableb.bookid = tablea.bookid
and tableb.category in (406, 678, 252)
with no data;
Periodically, run a cron job (or the like) to refresh it:
refresh materialized view mv_books_of_interest;
Partitioning data
https://www.postgresql.org/docs/9.3/ddl-partitioning.html will get you started. If your team is on board with table inheritance, great. Give it a shot and see how it works for your use case.
Trigger
Create a field is_interesting in tableA (or tableB, depending on how you want to access the data). Create a trigger that checks certain criteria when data is inserted into the dependent tables and then sets the book's flag to true/false. That will allow your queries to run faster, but it could slow down your inserts and updates.
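The trigger idea can be sketched in SQLite (the trigger syntax differs from PostgreSQL's trigger functions, but the flow is the same); table names, the flag column, and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (bookid INTEGER PRIMARY KEY, is_interesting INTEGER DEFAULT 0);
CREATE TABLE tableb (bookid INTEGER, category INTEGER);

-- when a category row arrives, flag the book if it is one of the constant codes
CREATE TRIGGER tableb_flag AFTER INSERT ON tableb
WHEN NEW.category IN (406, 678, 252)
BEGIN
    UPDATE tablea SET is_interesting = 1 WHERE bookid = NEW.bookid;
END;

INSERT INTO tablea (bookid) VALUES (1), (2);
INSERT INTO tableb VALUES (1, 406), (2, 999);
""")

flags = conn.execute(
    "SELECT bookid, is_interesting FROM tablea ORDER BY bookid"
).fetchall()
print(flags)   # [(1, 1), (2, 0)]
```

Queries can now filter on is_interesting directly; the insert-time cost of the trigger is the trade-off the answer mentions.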

Conditional join acting as a master table

Hi, sorry, I could not find a better title for what I am looking for.
Anyway, I have an idea of how to do something, but I just need a new pair of eyes to look at what I am trying to do and see whether it is possible. I basically have two tables: one holds a load of text and numbers, and the other acts like a master table, so if anything is found in the second table, use that; otherwise, use only what is found in the first table. However, I am unsure how to write the SELECT statement for this. I know I could do two separate SELECT statements and UNION them together, but there must be an easier way. After playing around, I have a query which I think may work, but I am unsure whether I have missed something. For example, we have Table A and Table B (B holding the master data):
SELECT DISTINCT
A.ID,
COALESCE(B.PROD, A.PROD) AS PROD,
COALESCE(B.TEXT1, A.TEXT1) AS TEXT1,
COALESCE(B.NUMBER, A.NUMBER) AS NUMBER
FROM
TABLEA A
FULL OUTER JOIN TABLEB B ON A.PROD = B.PROD
Now, what I want is for the statement to pick up the following information:
Anything found in Table A but not in Table B
Anything found in Table B not in Table A
Anything in Table B (as the master) which is also found in Table A
I added a full outer join as there may be items in Table B that are not in Table A.
Will the query work? I have checked it against our data and it seems to work, but I am not sure whether I have missed something.
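The intended semantics can be sketched in SQLite. Older SQLite versions (before 3.39) lack FULL OUTER JOIN, so this sketch emulates it with a LEFT JOIN plus the B-only rows; column names follow the question, data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLEA (ID INTEGER, PROD TEXT, TEXT1 TEXT);
CREATE TABLE TABLEB (PROD TEXT, TEXT1 TEXT);

INSERT INTO TABLEA VALUES (1, 'p1', 'a-text'), (2, 'p2', 'a-only');
INSERT INTO TABLEB VALUES ('p1', 'master-text'), ('p3', 'b-only');
""")

rows = conn.execute("""
    -- rows in A, taking B's values when B has the product (master wins)
    SELECT A.ID, COALESCE(B.PROD, A.PROD) AS PROD,
           COALESCE(B.TEXT1, A.TEXT1) AS TEXT1
    FROM TABLEA A LEFT JOIN TABLEB B ON A.PROD = B.PROD
    UNION ALL
    -- rows only in B (the part a FULL OUTER JOIN would add)
    SELECT NULL, B.PROD, B.TEXT1
    FROM TABLEB B
    WHERE NOT EXISTS (SELECT 1 FROM TABLEA A WHERE A.PROD = B.PROD)
    ORDER BY PROD
""").fetchall()
print(rows)
```

p1 takes the master's TEXT1, p2 comes only from A, and p3 comes only from B with a NULL ID, which covers the three cases the question lists.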
Thanks

Insert with select, dependent on the values in the table being inserted into (EDITED)

So I need to figure out how to insert into a table from another table, with a WHERE clause that requires access to the table I am inserting into. I tried an alias on the table I am inserting into, but I quickly found out that you cannot do that. Basically, I want to check that the values I am inserting match a particular field within the table I am inserting into. Here is what I've tried:
INSERT INTO "USER"."TABLE1" AS A1
SELECT *
FROM "USER"."TABLE2" AS A2
WHERE A2."HIERARCHYLEVEL" = 2
AND A2."PARENT" = A1."INSTANCE"
Obviously, this was to no avail. I've tried a couple of other queries, but they didn't get me anywhere either. Any help would be much appreciated.
EDIT:
I would like to add rows to this table, not add columns to it. The two tables have exactly the same structure; in fact, I extracted the data already in table1 from table2. What I have in table1 currently is a bunch of records that have no PARENT, but do have an INSTANCE. What I want to add is all the records in table2 whose parent equals an instance in table1.
Currently there is no way to join on a table when inserting. The solution with the subselect, where you select from the table, is correct.
Aliasing the table you want to change is only possible with UPDATE, UPSERT and MERGE. For those operations it makes sense, as you need to match a column and then decide whether to update it or insert something instead. In your example the row from table1 that you match is not relevant, as you don't want to change it; so from the statement's point of view it does not really matter that the table you use in your subselect is the same as the one you insert into.
As an alternative, I can suggest the following solution, which is equivalent to yours:
INSERT INTO "user"."table1"
SELECT
A1."ROOT",
A1."INSTANCE",
A1."PARENT",
A1."HIERARCHYLEVEL"
FROM "user"."table2" AS A1
WHERE A1."INSTANCE" in (select "PARENT" from "user"."table1" where "HIERARCHYLEVEL" = 2)
This gave me the answer I was looking for, although I am sure there is an easier, or more efficient, way to do it:
INSERT INTO "user"."table1"
SELECT
A1."ROOT",
A1."INSTANCE",
A1."PARENT",
A1."HIERARCHYLEVEL"
FROM "user"."table2" AS A1,
"user"."table1" AS A2
WHERE A1."INSTANCE" = A2."PARENT"
AND A2."HIERARCHYLEVEL" = 2
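The subselect pattern, filtering on the insert target without aliasing it, can be sketched in SQLite; the schema is trimmed to three columns and the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (INSTANCE TEXT, PARENT TEXT, HIERARCHYLEVEL INTEGER);
CREATE TABLE table2 (INSTANCE TEXT, PARENT TEXT, HIERARCHYLEVEL INTEGER);

-- table1 already holds a level-2 row whose PARENT we want to resolve
INSERT INTO table1 VALUES ('i1', 'root', 2);
-- table2 holds candidates; only 'root' matches a PARENT in table1
INSERT INTO table2 VALUES ('root', NULL, 1), ('other', NULL, 1);
""")

conn.execute("""
    INSERT INTO table1
    SELECT A1.INSTANCE, A1.PARENT, A1.HIERARCHYLEVEL
    FROM table2 AS A1
    WHERE A1.INSTANCE IN (SELECT PARENT FROM table1
                          WHERE HIERARCHYLEVEL = 2)
""")

rows = conn.execute("SELECT INSTANCE FROM table1 ORDER BY INSTANCE").fetchall()
print(rows)   # [('i1',), ('root',)]
```

Only the 'root' row is copied over, because it is the only table2 instance referenced as a PARENT by a level-2 row in table1.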

Hive to Hive ETL

I have two large Hive tables, say TableA and TableB (which get loaded from different sources).
These two tables have almost identical structures/columns, with the same partition column: a date stored as a string.
I need to filter records from each table based on certain (identical) filter criteria.
These tables have some columns containing "codes", which need to be looked up to get its corresponding "values".
There are eight to ten such lookup tables, say LookupA, LookupB, LookupC, etc.
Now, I need to:
do a union of the filtered records from TableA and TableB;
do a lookup into the lookup tables and replace the "codes" in the filtered records with their respective "values" (if a "code" or "value" is unavailable in the filtered records or the lookup table, respectively, substitute zero or an empty string);
transform the dates in the filtered records from one format to another.
I am a beginner in Hive. Please let me know how I can do this. Thanks.
Note: I can manage up to the union of the tables; I need some guidance on the lookup and the transformation.
To do a lookup, please follow the steps below.
You have to create a custom user-defined function (UDF) which does the lookup work, meaning you write a Java program for the lookup, jar it, and add it to Hive, something like below:
ADD JAR /home/ubuntu/lookup.jar
You then have to add the lookup file containing the key-value pairs, as follows:
ADD FILE /home/ubuntu/lookupA;
You then have to create a temporary lookup function, such as:
CREATE TEMPORARY FUNCTION getLookupValueA AS 'com.LookupA';
Finally, you call this lookup function in the SELECT query, which populates the lookup value for each lookup key.
The same thing can be achieved using a JOIN, but that will take a hit on performance. Taking the join approach, you join the source and lookup tables by the lookup code, something like:
select a.key, b.lookupvalue
from TableA a join lookuptable b
on a.key = b.lookupKey
Now, for the date transformation, you can use the date functions in Hive.
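For reformatting a string date, the underlying transformation is just parse-then-render with two patterns; a minimal Python sketch of that step (the yyyyMMdd input format is an assumption, not from the question):

```python
from datetime import datetime

def reformat_date(s, in_fmt="%Y%m%d", out_fmt="%Y-%m-%d"):
    """Parse a string date in one format and render it in another."""
    return datetime.strptime(s, in_fmt).strftime(out_fmt)

print(reformat_date("20240131"))   # 2024-01-31
```

In Hive the commonly used equivalent is a round trip through a timestamp, e.g. from_unixtime(unix_timestamp(dt, 'yyyyMMdd'), 'yyyy-MM-dd').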
For the above problem, follow these steps:
Use union to combine the two tables (the schemas must be the same).
For this scenario you can try a Pig script.
The script would look like the following (join TableA and TableB with the lookup tables and generate the appropriate columns):
a = join TableA by codesA left outer, lookupA by codesA;
b = join a by codesB left outer, lookupB by codesB;
Similarly for TableB.
Suppose some value of codesA has no value in the lookup table; then:
z = foreach b generate codesA as codesA, (valueA is null ? '0' : valueA) as valuesA;
(This replaces all null values of valueA with '0'.)
If you are using Pig 0.12 or later, you can use ToString(CurrentTime(),'yyyy-MM-dd')
I hope it will solve your problem. Let me know in case of any concern.

How do you complex join a number table with an actual table with many clauses dependent on the data from the number table?

I have a table of numbers (a PL/SQL collection containing some_table_line_ids, passed in from a website).
Then I have some_table, which also has the columns config_data and config_state.
I want to pull in all lines whose table_id appears among the table_ids in the number table.
I also want to pull in all lines that have the same config_data as each record pulled in by the first part.
So it's a parent/child relationship. This can be done with two for loops: select a line by id in a cursor, then in another loop select each line equaling the parent's config_data. In each loop I am performing data manipulation on each line.
I would like to combine both these into a single cursor having all table ids that I need.
What would that look like?
You just want to do a complicated join on different factors. Something like:
select st2.*
from numbers n join
some_table st
on st.table_id = n.table_id join
some_table st2
on st2.config_data = st.config_data
Quite possibly, you actually want:
select distinct st.*
since you might otherwise have duplicates. Or, you might want:
select n.table_id, st.config_data, st2.*
So you know which of the original values was responsible for bringing in the row.
You describe the array as a PL/SQL collection. If you employed a SQL type instead, you could include it in the FROM clause by using the TABLE function.
create type some_table_line_id_nt as table of number;
Something like:
select s.*
from some_table s
join table(some_table_line_ids) t
on s.id = t.column_value
(I haven't offered a complete solution as you haven't given enough details of table structure and data.)
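Oracle's TABLE(collection) has a rough SQLite analog in json_each, which likewise turns a passed-in list into something joinable; a sketch with an invented two-column version of some_table:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_table (id INTEGER, config_data TEXT);
INSERT INTO some_table VALUES (1, 'cfgA'), (2, 'cfgB'), (3, 'cfgA');
""")

line_ids = [1]  # stands in for the collection passed from the website

rows = conn.execute("""
    -- parent rows from the id list, plus children sharing their config_data
    SELECT DISTINCT st2.id, st2.config_data
    FROM json_each(?) t
    JOIN some_table st  ON st.id = t.value
    JOIN some_table st2 ON st2.config_data = st.config_data
    ORDER BY st2.id
""", (json.dumps(line_ids),)).fetchall()
print(rows)   # id 1 has cfgA, so ids 1 and 3 come back: [(1, 'cfgA'), (3, 'cfgA')]
```

This is the same two-step join as in the first answer: the list picks the parents, and the self-join on config_data brings in the children.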
I solved the issue using START WITH and CONNECT BY PRIOR.