Joining two tables based off of parsed column content

I am trying to join two tables on a column in one table that was built (often imperfectly) by combining two columns from the other. I want the correct records associated with each other so that I can compare the FDebit and Debit fields.
The FMEMO is normally generated by taking the Num, adding a space, and then adding the Memo text. As you can see below, our process is not yet perfect. I would therefore like to match the Num, then the space, then the first 10 characters of the Memo field against the FMEMO field.
I have included code below with sample data. Could you please offer some suggestions on how to accomplish this?
Invoices table
MEMO | Num | DEBIT
Supplies. Soto Cano | 1135 | 2.25
Suction Hose (1-1/2") by the food | 3 | 74.04
Hose/Tubing:Braided Hose (1") by the food | 3 | 98.72
QP10 Meyers surface pump (60hz) | 3 | 206.27
Cage including f tank, alum parts box and 2 buckets of alum | 3 | 752.03
Cage including valve manifold, F1 & F2 | 3 | 3774.08
cage with IBC in it | 1135 | 268.41
Pvc accesories for installation of LWTS. | 1175 | 4.26
Pvc accesories for installation of LWTS. | 1175 | 27.26
Expenses table
FMEMO | FDebit
Supplies. Soto Cano 41.8 | 2.25
3 Suction Hose (1-1/2 | 74.04
3 Hose/Tubing:Braided Hose (1 | 98.72
3 QP10 Meyers surface pump (60hz) 3970 | 206.27
3 Cage including f tank, alum parts box and 2 buckets of alum 14474 | 752.03
3 Cage including valve manifold, F1 & F2 72638 | 3774.08
3 cage with IBC in it 5166 | 268.41
1175 Pvc accesories for installation of LWTS. 82.03 | 4.26
1175 Pvc accesories for installation of LWTS. 524.67 | 27.26
Code to replicate:
CREATE TABLE #tempExpenses (
FMEMO varchar(Max), FDebit money)
CREATE TABLE #tempInvoices (
MEMO varchar(Max), Num integer, DEBIT money)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('Supplies. Soto Cano 41.8', 2.25)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Suction Hose (1-1/2', 74.04)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Hose/Tubing:Braided Hose (1', 98.72)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 QP10 Meyers surface pump (60hz) 3970', 206.27)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Cage including f tank, alum parts box and 2 buckets of alum 14474', 752.03)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Cage including valve manifold, F1 & F2 72638', 3774.08)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 cage with IBC in it 5166', 268.41)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('1175 Pvc accesories for installation of LWTS. 82.03', 4.26)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('1175 Pvc accesories for installation of LWTS. 524.67', 27.26)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Supplies. Soto Cano', 1135, 2.25)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Suction Hose (1-1/2") by the food', 3, 74.04)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Hose/Tubing:Braided Hose (1") by the food', 3, 98.72)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('QP10 Meyers surface pump (60hz)', 3, 206.27)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Cage including f tank, alum parts box and 2 buckets of alum', 3, 752.03)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Cage including valve manifold, F1 & F2', 3, 3774.08)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('cage with IBC in it', 1135, 268.41)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Pvc accesories for installation of LWTS.', 1175, 4.26)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Pvc accesories for installation of LWTS.', 1175, 27.26)
SELECT *
FROM #tempExpenses
SELECT *
FROM #tempInvoices

Well, I really hate myself for producing this T-SQL, but I think this is what you are looking for:
SELECT *
FROM #tempInvoices i
INNER JOIN #tempExpenses e ON CAST(Num as varchar(10)) + ' ' + SUBSTRING(MEMO,1,9-LEN(CAST(NUM as varchar(10)))) = SUBSTRING(FMEMO,1,10)
This concatenates the number, a space, and as many characters from MEMO as are needed to make 10 characters in total (9 - 1 = 8 characters when Num is 3, 9 - 4 = 5 when Num is 1111), then joins on the first 10 characters of FMEMO from the other table.
Of course, this is a very inefficient and ugly query. I would rather normalize the data in the database (parse it, clean it, etc.).
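As a rough illustration of the rule stated in the question (Num, a space, then the first 10 characters of Memo, compared against the start of FMEMO), here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names invoices and expenses are stand-ins for the temp tables, only four of the sample rows are loaded, and the comparison is case-sensitive, which may not match your collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (memo TEXT, num INTEGER, debit REAL);
    CREATE TABLE expenses (fmemo TEXT, fdebit REAL);
""")

# A subset of the sample rows from the question.
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", [
    ('Suction Hose (1-1/2") by the food', 3, 74.04),
    ('Hose/Tubing:Braided Hose (1") by the food', 3, 98.72),
    ('QP10 Meyers surface pump (60hz)', 3, 206.27),
    ('cage with IBC in it', 1135, 268.41),
])
conn.executemany("INSERT INTO expenses VALUES (?, ?)", [
    ('3 Suction Hose (1-1/2', 74.04),
    ('3 Hose/Tubing:Braided Hose (1', 98.72),
    ('3 QP10 Meyers surface pump (60hz) 3970', 206.27),
    ('3 cage with IBC in it 5166', 268.41),
])

# Build the key as Num + ' ' + first 10 characters of MEMO, then compare it
# with a prefix of FMEMO that has the same length as the key.
rows = conn.execute("""
    WITH keyed AS (
        SELECT memo, num, debit,
               CAST(num AS TEXT) || ' ' || substr(memo, 1, 10) AS join_key
        FROM invoices
    )
    SELECT k.num, k.memo, k.debit, e.fdebit
    FROM keyed AS k
    JOIN expenses AS e
      ON substr(e.fmemo, 1, length(k.join_key)) = k.join_key
    ORDER BY k.debit
""").fetchall()

for row in rows:
    print(row)
```

The 'cage with IBC in it' invoice finds no partner here because its FMEMO was generated with a different Num (3 instead of 1135), so rows like that still need manual cleanup. On the full sample data the 10-character prefix is also not unique (both 'Cage including ...' rows share it), so a longer prefix may be needed to avoid duplicate matches.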

Related

How would I create one data set from a table that contains two identical values in the first column but different data in the remaining columns Oracle

I have produced a dataset where some of the data has two identical memberkeys but different contract values, while other memberkeys only appear once. I need to merge those memberkeys that have two rows into one distinct memberkey row containing all the data from both rows while leaving the single row memberkey as is.
Current
MemberKey | SubscriberKey | VALUEONE | VALUETWO | VALUETHREE | VALUEFOUR | VALUEFIVE | VALUESIX | VALUESEVEN | VALUEEIGHT | VALUENINE | VALUETEN | VALUEELEVEN
2235 | | | | H4931 | MA84100303 | ENGLISH | ACOC | | | 5TX4VV3TD79 | 13 | 1
2235 | 2235 | A84100303 | b | | | ENGLISH | | AUCOC | A84100303 | | |
4375 | | | | H4931 | MA48450239 | SPANISH | APIM | | | 9QP3K96WK88 | 14 | 1
4375 | 4375 | A48450239 | | | | SPANISH | | AUPIM | A48450239 | | |
375 | 375 | | | H4931 | MA08111511 | ENGLISH | AMAR | | | 8B06P95CG54 | |
Desired
MemberKey | SubscriberKey | VALUEONE | VALUETWO | VALUETHREE | VALUEFOUR | VALUEFIVE | VALUESIX | VALUESEVEN | VALUEEIGHT | VALUENINE | VALUETEN | VALUEELEVEN
2235 | 2235 | A84100303 | b | H4931 | MA84100303 | ENGLISH | ACOC | AUCOC | A84100303 | 5TX4VV3TD79 | 13 | 1
4375 | 4375 | A48450239 | | H4931 | MA48450239 | SPANISH | APIM | AUPIM | A48450239 | 9QP3K96WK88 | 14 | 1
375 | 375 | | | H4931 | MA08111511 | ENGLISH | AMAR | | | 8B06P95CG54 | |
I've tried several approaches (ctes, temp tables, convoluted joins etc) without success.
Thanks

Could you create a new table and then update it from your old table by joining on the memberkey?
update new_table a
set a.valueone = (select max(x.valueone)
from old_table x
where a.memberkey = x.memberkey);
commit;
This would work as long as for the same memberkey you do not have a different value for the same column.
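The same "take the MAX of each column per memberkey" idea can also be written as a single GROUP BY. The following is only a sketch in Python's sqlite3 module, not Oracle: the table name old_table comes from the answer above, only a few of the columns are included, and it assumes the blank cells in the question are stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE old_table (
        memberkey INTEGER, subscriberkey INTEGER,
        valueone TEXT, valuethree TEXT, valuesix TEXT, valueseven TEXT
    )
""")

# Two partial rows for member 2235 and a single row for member 375.
# None stands for an empty cell (an assumption about how blanks are stored).
conn.executemany("INSERT INTO old_table VALUES (?, ?, ?, ?, ?, ?)", [
    (2235, None, None, "H4931", "ACOC", None),
    (2235, 2235, "A84100303", None, None, "AUCOC"),
    (375, 375, None, "H4931", "AMAR", None),
])

# MAX ignores NULLs, so each column picks up whichever row has a value.
merged = conn.execute("""
    SELECT memberkey,
           MAX(subscriberkey), MAX(valueone), MAX(valuethree),
           MAX(valuesix), MAX(valueseven)
    FROM old_table
    GROUP BY memberkey
    ORDER BY memberkey
""").fetchall()

for row in merged:
    print(row)
```

As with the UPDATE above, this only gives a sensible result when the two rows for a memberkey never hold different non-NULL values in the same column; otherwise MAX silently keeps just one of them.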

This seems to work
MERGE INTO MEMBERS m2
USING MEMBERS_GTT m1 ON ( m1.MemberKey = m2.MemberKey )
WHEN MATCHED
THEN
UPDATE SET m2.SUBSCRIBERKEY = COALESCE(m1.SUBSCRIBERKEY,m2.SUBSCRIBERKEY)
WHEN NOT MATCHED
THEN
INSERT ( MemberKey ,
SUBSCRIBERKEY
)
VALUES ( m1.MemberKey ,
m1.SUBSCRIBERKEY
)

Filter condition using "parent" CurrentMember

Here is the data-set:
CREATE TABLE Movies(id INT, name VARCHAR(50), genre VARCHAR(50), budget DECIMAL(10));
INSERT INTO Movies VALUES
(1, 'Pirates of the Caribbean', 'Fantasy', 379000000),
(2, 'Avengers', 'Superhero', 365000000),
(3, 'Star Wars', 'Science fiction', 275000000),
(4, 'John Carter', 'Science fiction', 264000000),
(5, 'Spider-Man', 'Superhero', 258000000),
(6, 'Harry Potter', 'Fantasy', 250000000),
(7, 'Avatar', 'Science fiction', 237000000);
Filtering relative to a constant value is no problem, e.g. to get all the movies with a budget higher than $300M:
WITH
MEMBER X AS SetToStr(Filter(Movie.[Name].[Name].Members - Movie.[Name].CurrentMember, Measures.Budget > 300000000))
SELECT
Movie.[Name].[Name].Members ON ROWS,
X ON COLUMNS
FROM
Cinema
Which gives:
Avatar {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Avengers {[Movie].[Name].&[Pirates of the Caribbean]}
Harry Potter {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
John Carter {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Pirates of the Caribbean {[Movie].[Name].&[Avengers]}
Spider-Man {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Star Wars {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
But how can I compare against the budget of the current movie instead of the hard-coded $300M, to get the movies that are more expensive than the current one?
It would give {} for "Pirates of the Caribbean" as it is the most expensive movie.
For "Avengers" it would be { 'Pirates of the Caribbean' } as this is the second most expensive and only "Pirates of the Caribbean" is more expensive.
For "Avatar" it would give all the other movies as it is the least expensive.
The issue is that, inside the Filter function's condition, CurrentMember refers to the tuple currently being tested and not to the one currently selected on the ROWS axis.

Instead of using Filter() for each movie, I would first compute an ordered set of movies based on budget values. Then X could be defined using the SubSet and Rank function.
Here is an example using a different schema but I guess you'll get the point easily:
with
set ordered_continents as order( [Geography].[Geography].[Continent], -[Measures].[#Sales] )
member xx as SetToStr( SubSet( ordered_continents, 0, Rank( [Geography].[Geography].currentMember, ordered_continents) - 1))
select {[#Sales], [xx] } on 0, [Geography].[Geography].[Continent] on 1 from [Sales]
I'm not familiar with SSAS so I'm using icCube but I guess the MDX should be very much similar.
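To make the order-then-rank logic concrete, here is the same idea applied to the Movies data from the question, written in plain Python rather than MDX. It is only an illustration of what Order, Rank and SubSet compute together; the variable names are made up for this sketch.

```python
# Budgets taken from the INSERT statement in the question.
movies = {
    "Pirates of the Caribbean": 379_000_000,
    "Avengers": 365_000_000,
    "Star Wars": 275_000_000,
    "John Carter": 264_000_000,
    "Spider-Man": 258_000_000,
    "Harry Potter": 250_000_000,
    "Avatar": 237_000_000,
}

# Counterpart of Order(<set>, -budget): most expensive movie first.
ordered = sorted(movies, key=movies.get, reverse=True)

# Counterpart of SubSet(ordered, 0, Rank(current, ordered) - 1):
# Rank is 1-based, so Rank - 1 equals the 0-based position in the list,
# and the slice holds every movie ranked above the current one.
more_expensive = {name: ordered[:ordered.index(name)] for name in ordered}

for name, others in more_expensive.items():
    print(name, others)
```

The ordered set is computed once and reused for every row, which is the main advantage over running a Filter for each movie.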

SQL - count function not working correctly

I'm trying to count the blood types for each blood bank. I'm using Oracle DB.
The blood bank table is created like this:
CREATE TABLE BloodBank (
BB_ID number(15),
BB_name varchar2(255) not NULL,
B_type varchar2(255),
CONSTRAINT blood_ty_pk FOREIGN KEY (B_type) references BloodType(B_type),
salary number(15) not NULL,
PRIMARY KEY (BB_ID)
);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (370,'new york Blood Bank','A+,A-,B+',12000);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (791,'chicago Blood Bank','B+,AB-,O-',90000);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (246,'los angeles Blood Bank','O+,A-,AB+',4500);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (360,'boston Blood Bank','A+,AB+',13000);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (510,'seattle Blood Bank','AB+,AB-,B+',2300);
select * from BloodBank;
when I use the count function
select count(B_type)
from bloodbank
group by BB_ID;
the result would be like this
So why is the count function not working correctly?
I'm trying to display the blood type count for each blood bank, which is more than one in this case.

I hope I don't get downvoted for solving the specific problem you're asking about, but this query would work:
select bb_id,
bb_name,
REGEXP_COUNT(b_type, ',')+1
from bloodbank;
However, this solution ignores a MAJOR issue with your data, which is that it is not normalized, as @Tim Biegeleisen correctly instructs you to do. The solution I've provided is EXTREMELY hacky in that it counts the commas in your string to determine the number of blood types. This is not at all reliable, and you should 100% do what Tim B recommends. But for the circumstances you find yourself in, this will tell you how many different blood types are kept at a specific blood bank.
http://sqlfiddle.com/#!4/8ed1c2/2
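For anyone who wants to try the comma-counting trick without an Oracle instance, here is a small sketch using Python's sqlite3 module. SQLite has no REGEXP_COUNT, so this version swaps in a length difference to count the commas; the table is a cut-down copy of BloodBank with three of the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bloodbank (bb_id INTEGER PRIMARY KEY, bb_name TEXT, b_type TEXT)"
)
conn.executemany("INSERT INTO bloodbank VALUES (?, ?, ?)", [
    (370, "new york Blood Bank", "A+,A-,B+"),
    (791, "chicago Blood Bank", "B+,AB-,O-"),
    (360, "boston Blood Bank", "A+,AB+"),
])

# Number of commas = length with commas - length without commas.
# Adding 1 turns the separator count into an item count.
counts = conn.execute("""
    SELECT bb_id,
           length(b_type) - length(replace(b_type, ',', '')) + 1 AS type_count
    FROM bloodbank
    ORDER BY bb_id
""").fetchall()

print(counts)
```

Like the Oracle query, this reports 1 for an empty string and miscounts values with a trailing comma, which is part of why the approach is fragile.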

You should normalize your data and get each blood type value onto a separate record. That is, your starting data should look like this:
BB_ID | BB_name | b_type | salary
370 | new york Blood Bank | A+ | 12000
370 | new york Blood Bank | A- | 12000
370 | new york Blood Bank | B+ | 12000
... and so on
With this data model, the query you want is something along these lines:
SELECT BB_ID, BB_name, COUNT(*) AS cnt
FROM bloodbank
GROUP BY BB_ID, BB_name;
Or, if you want just counts of types across all bloodbanks, then use:
SELECT b_type, COUNT(*) AS cnt
FROM bloodbank
GROUP BY b_type;
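A quick way to see the normalized layout in action is the sketch below, which uses Python's sqlite3 module instead of Oracle. The table name bank_blood_type is made up for this example, the splitting is done in Python for brevity, and only two of the sample banks are loaded.

```python
import sqlite3

# Two of the original comma-separated rows from the question.
raw = [
    (370, "new york Blood Bank", "A+,A-,B+"),
    (360, "boston Blood Bank", "A+,AB+"),
]

# One output row per (bank, blood type) pair.
normalized = [
    (bb_id, name, blood_type.strip())
    for bb_id, name, types in raw
    for blood_type in types.split(",")
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank_blood_type (bb_id INTEGER, bb_name TEXT, b_type TEXT)")
conn.executemany("INSERT INTO bank_blood_type VALUES (?, ?, ?)", normalized)

# Blood types per bank.
per_bank = conn.execute("""
    SELECT bb_id, bb_name, COUNT(*) AS cnt
    FROM bank_blood_type
    GROUP BY bb_id, bb_name
    ORDER BY bb_id
""").fetchall()

# Banks per blood type.
per_type = conn.execute("""
    SELECT b_type, COUNT(*) AS cnt
    FROM bank_blood_type
    GROUP BY b_type
""").fetchall()

print(per_bank)
print(per_type)
```

Once each blood type sits on its own row, a plain COUNT(*) with GROUP BY answers both questions and no string parsing is needed in the query.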

SQL Insert values in order - can it be done in one or two steps?

I am trying to insert values in the defined order. Is there any way to consolidate the 6 lines into 2 lines of code? Thank you in advance.
students = [('TOM', 6120, 85),
('Jerry', 6110,86),
('Spike', 6120,55),
('Tyke',6110,73),
('Butch',6110,89),
('Toodle',6120,76)]
courses = [(6110,'Data Science I', 'LSB105'),
(6120,'Data Science II', 'LSB109')]
grading = [('A', 90, 100),
('B', 80,90),
('C',70,80)]
import sqlite3
conn = sqlite3.connect('example3.db')
c = conn.cursor()
c.execute('CREATE TABLE students(name TEXT, courseid INTEGER, score INTEGER)') #create a table
c.executemany('INSERT INTO students VALUES(?,?,?)', students)
c.execute('CREATE TABLE courses(courseid INTEGER, name TEXT, classroom TEXT)') #create a table
c.executemany('INSERT INTO courses VALUES(?,?,?)', courses)
c.execute('CREATE TABLE gradingscheme(letter TEXT, lower REAL, upper REAL)') #create a table
c.executemany('INSERT INTO gradingscheme VALUES(?,?,?)', grading)
conn.commit()
conn.close()

You can use executescript.
For example (from the docs):
con.executescript("""
insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes');
insert into recipe (name, ingredients) values ('pumpkin stew', 'pumpkin onions garlic celery');
insert into recipe (name, ingredients) values ('broccoli pie', 'broccoli cheese onions flour');
insert into recipe (name, ingredients) values ('pumpkin pie', 'pumpkin sugar flour butter');
""")
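Note that executescript takes plain SQL text and does not accept ? parameters, so it suits the CREATE TABLE statements but not the executemany calls. One possible way to shorten the code from the question, offered only as a sketch, is to create all tables in one executescript call and then loop over the data. The dictionary name data is made up here, and an in-memory database is used in place of example3.db.

```python
import sqlite3

students = [('TOM', 6120, 85), ('Jerry', 6110, 86), ('Spike', 6120, 55),
            ('Tyke', 6110, 73), ('Butch', 6110, 89), ('Toodle', 6120, 76)]
courses = [(6110, 'Data Science I', 'LSB105'),
           (6120, 'Data Science II', 'LSB109')]
grading = [('A', 90, 100), ('B', 80, 90), ('C', 70, 80)]

conn = sqlite3.connect(':memory:')

# One call creates all three tables.
conn.executescript("""
    CREATE TABLE students(name TEXT, courseid INTEGER, score INTEGER);
    CREATE TABLE courses(courseid INTEGER, name TEXT, classroom TEXT);
    CREATE TABLE gradingscheme(letter TEXT, lower REAL, upper REAL);
""")

# Map each table name to its rows; dicts keep insertion order.
data = {'students': students, 'courses': courses, 'gradingscheme': grading}

# Table names come from our own dict, never from user input, so building
# the statement with an f-string is safe here.
for table, rows in data.items():
    placeholders = ','.join('?' * len(rows[0]))
    conn.executemany(f'INSERT INTO {table} VALUES({placeholders})', rows)

conn.commit()
```

The rows are still inserted in list order within each table, exactly as with the separate executemany calls.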

Using Trim function in PL/SQL

I have the values below in a column of a table:
Veracruz-Llave Federal Hwy 150
Campeche Federal Hwy 261
Puebla Federal Hwy 190
Morelos Federal Hwy 160
Puebla Federal Hwy 160
I would like to update the table so that the state name before the string 'Federal Hwy' is removed.
Desired output would be:
Federal Hwy 150
Federal Hwy 261
Federal Hwy 190
Federal Hwy 160
I.e., remove anything before 'Federal Hwy'.
Can the TRIM() function help here?

I'm guessing you are cleaning up addresses and need to do an update. Test it first like this with some sample addresses:
select regexp_substr('Veracruz-Llave Federal Hwy 150', ' (Federal.*)$', 1, 1, null, 1 ) from dual;
Then make a backup of the table and run the following (untested, use at your own risk). It assumes that ALL of your column data contains ' Federal' (with a leading space) right after the text you want to delete, and that you want to keep the rest of the string. REGEXP_SUBSTR returns NULL for any value that does not match, so check for such rows first:
update table_name set column_name = regexp_substr(column_name, ' (Federal.*)$', 1, 1, null, 1 );
This takes the first subgroup (what's in the parentheses) of the first occurrence of the pattern: a space, followed by 'Federal' and zero or more characters of any type, anchored to the end of the line. In other words, keep everything from 'Federal' onward and drop the state name and the space before it.
Edit: Amended example based on new info provided by the OP below.
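If you want to check the pattern before touching the table, the same regular expression can be tried outside the database. The sketch below uses Python's re module, whose syntax agrees with Oracle's for this particular pattern; the helper name strip_state is made up for this example.

```python
import re

# Same pattern as in the REGEXP_SUBSTR call: a space, then a captured
# group starting at 'Federal' and running to the end of the string.
PATTERN = re.compile(r" (Federal.*)$")


def strip_state(value):
    """Return the part starting at 'Federal', or the value unchanged if no match."""
    match = PATTERN.search(value)
    return match.group(1) if match else value


samples = [
    "Veracruz-Llave Federal Hwy 150",
    "Campeche Federal Hwy 261",
    "Puebla Federal Hwy 190",
]
for sample in samples:
    print(strip_state(sample))
```

One difference to keep in mind: this helper leaves non-matching values unchanged, whereas REGEXP_SUBSTR returns NULL for them.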

You can use REPLACE along with SUBSTR and INSTR
SQL> create table states (name varchar2(200));
insert into states values('Campeche Federal Hwy 261');
insert into states values ('Puebla Federal Hwy 190');
insert into states values('Morelos Federal Hwy 160');
insert into states values ('Puebla Federal Hwy 160');
commit;
-- OUTPUT:
Table created.
1 row created.
1 row created.
1 row created.
1 row created.
Commit complete.
--update table STATES to remove the state name
SQL>update STATES
set NAME = REPLACE
(NAME,
SUBSTR(NAME,0,
(INSTR(NAME,'Federal',1,1))-1
)
,'');
--4 rows updated.
SQL> commit;
Commit complete.
SQL> select * from STATES;
NAME
----------------------------------------------------------------------------------------------------
Federal Hwy 261
Federal Hwy 190
Federal Hwy 160
Federal Hwy 160

I assume that you need to trim the state name (i.e. the first word) from the string.
The logic is to use the substr function to return the required string; instr returns the position of the first space, so that trimming can start from there.
update table_name set column_name = ltrim(substr(column_name, instr(column_name,' ')));
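The first-space approach can be tried out with Python's sqlite3 module, which also provides substr, instr and ltrim. This is a sketch against SQLite, not Oracle, and the table name states is borrowed from the other answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT)")
conn.executemany("INSERT INTO states VALUES (?)", [
    ("Veracruz-Llave Federal Hwy 150",),
    ("Campeche Federal Hwy 261",),
    ("Puebla Federal Hwy 190",),
    ("Morelos Federal Hwy 160",),
    ("Puebla Federal Hwy 160",),
])

# instr finds the first space, substr keeps everything from that space on,
# and ltrim removes the leading space that is left over.
conn.execute("UPDATE states SET name = ltrim(substr(name, instr(name, ' ')))")

names = [row[0] for row in conn.execute("SELECT name FROM states ORDER BY rowid")]
print(names)
```

This works for the sample values because every state name is a single word. A state name that itself contains a space would be cut at the wrong place; searching for 'Federal' instead of ' ' avoids that.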
