full outer join behavior - hive

Hive SQL: I have two tables, survey and survey_comments, with the structure shown below:
create external table if not exists survey(
id string,
category_name string,
subcategory_name string)
STORED AS parquet;
insert into survey(id, category_name, subcategory_name)
values ('1', 'Engine', 'Engine problem other than listed');
insert into survey(id, category_name, subcategory_name)
values ('1', 'Exterior Body', 'Color match of painted parts');
insert into survey(id, category_name, subcategory_name)
values ('1', 'Exterior Body', 'Tail lights');
insert into survey(id, category_name, subcategory_name)
values ('1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up');
insert into survey(id, category_name, subcategory_name)
values ('1', 'Transmission', 'Rough shifting');
create external table if not exists survey_comments(
id string,
category_name_txt string,
subcategory_name_txt string,
comments string)
STORED AS parquet;
insert into survey_comments(id, category_name_txt, subcategory_name_txt)
values ('1', 'Exterior Body', 'Tail lights', 'Moisture in lower portion of rear tail lights along with leaves etc.');
insert into survey_comments(id, category_name_txt, subcategory_name_txt)
values ('1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up', 'Small amount of fog low on front windshield during/after rain.');
insert into survey_comments(id, category_name_txt, subcategory_name_txt)
values ('1', 'Miscellaneous', 'General problem other than listed', 'When filling vehicle with gas; the pumps fill the gas line too quickly, had to hold the pump handle only 1/2 way on.');
insert into survey_comments(id, category_name_txt, subcategory_name_txt)
values ('1', 'Miscellaneous', 'General problem other than listed', 'Touch-up paint too red, not same red as on the car.');
Now my full outer join is as shown below:
select b.id, b.category_name, b.subcategory_name, a.category_name_txt, a.sub_category_name_txt, a.comments
from survey b full outer join survey_comments a
on (
b.id = a.id and
b.category_name = a.category_name_txt and b.subcategory_name = a.sub_category_name_txt
)
I am not getting the rows in survey_comments that have category_name_txt 'Miscellaneous'. I need the non-matching rows from both survey and survey_comments as separate rows, as well as the matching rows. What am I doing wrong?

I tested this using CTEs and found two issues. First, you are inserting four values into survey_comments but naming only three columns. Second, the column name in the query should be subcategory_name_txt, not sub_category_name_txt.
Test after fixing:
with survey as (
select stack(5,
'1', 'Engine', 'Engine problem other than listed',
'1', 'Exterior Body', 'Color match of painted parts',
'1', 'Exterior Body', 'Tail lights',
'1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up',
'1', 'Transmission', 'Rough shifting') as (id, category_name, subcategory_name)
),
survey_comments as (
select stack(4,
'1', 'Exterior Body', 'Tail lights', 'Moisture in lower portion of rear tail lights along with leaves etc.',
'1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up', 'Small amount of fog low on front windshield during/after rain.',
'1', 'Miscellaneous', 'General problem other than listed', 'When filling vehicle with gas; the pumps fill the gas line too quickly, had to hold the pump handle only 1/2 way on.',
'1', 'Miscellaneous', 'General problem other than listed', 'Touch-up paint too red, not same red as on the car.') as (id, category_name_txt, subcategory_name_txt,comments)
)
select b.id, b.category_name, b.subcategory_name, a.category_name_txt, a.subcategory_name_txt, a.comments
from survey b full outer join survey_comments a
on (
b.id = a.id and
b.category_name = a.category_name_txt and b.subcategory_name = a.subcategory_name_txt
)
Returns:
b.id | b.category_name | b.subcategory_name | a.category_name_txt | a.subcategory_name_txt | a.comments
1 | Engine | Engine problem other than listed | NULL | NULL | NULL
1 | Exterior Body | Color match of painted parts | NULL | NULL | NULL
1 | Exterior Body | Tail lights | Exterior Body | Tail lights | Moisture in lower portion of rear tail lights along with leaves etc.
1 | Heating/Ventilation and Cooling | Front windshield fogs up | Heating/Ventilation and Cooling | Front windshield fogs up | Small amount of fog low on front windshield during/after rain.
NULL | NULL | NULL | Miscellaneous | General problem other than listed | Touch-up paint too red, not same red as on the car.
NULL | NULL | NULL | Miscellaneous | General problem other than listed | When filling vehicle with gas; the pumps fill the gas line too quickly, had to hold the pump handle only 1/2 way on.
1 | Transmission | Rough shifting | NULL | NULL | NULL
Miscellaneous is returned.
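The corrected join can also be checked outside Hive. Below is a minimal sketch using Python's built-in sqlite3 module rather than Hive. Since older SQLite versions have no FULL OUTER JOIN, it emulates one with a LEFT JOIN plus a UNION ALL of the survey_comments rows that have no match. The in-memory database and the shortened comment texts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
conn.executescript("""
CREATE TABLE survey(id TEXT, category_name TEXT, subcategory_name TEXT);
CREATE TABLE survey_comments(id TEXT, category_name_txt TEXT,
                             subcategory_name_txt TEXT, comments TEXT);
""")
conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", [
    ("1", "Engine", "Engine problem other than listed"),
    ("1", "Exterior Body", "Color match of painted parts"),
    ("1", "Exterior Body", "Tail lights"),
    ("1", "Heating/Ventilation and Cooling", "Front windshield fogs up"),
    ("1", "Transmission", "Rough shifting"),
])
# All four columns are supplied here; the comment texts are shortened.
conn.executemany("INSERT INTO survey_comments VALUES (?, ?, ?, ?)", [
    ("1", "Exterior Body", "Tail lights", "Moisture in tail lights"),
    ("1", "Heating/Ventilation and Cooling", "Front windshield fogs up",
     "Fog on windshield"),
    ("1", "Miscellaneous", "General problem other than listed",
     "Gas pump fills too quickly"),
    ("1", "Miscellaneous", "General problem other than listed",
     "Touch-up paint too red"),
])

# Emulated full outer join: every survey row (matched or not),
# plus the survey_comments rows that matched nothing.
rows = conn.execute("""
SELECT b.id, b.category_name, b.subcategory_name,
       a.category_name_txt, a.subcategory_name_txt, a.comments
FROM survey b
LEFT JOIN survey_comments a
  ON b.id = a.id
 AND b.category_name = a.category_name_txt
 AND b.subcategory_name = a.subcategory_name_txt
UNION ALL
SELECT NULL, NULL, NULL,
       a.category_name_txt, a.subcategory_name_txt, a.comments
FROM survey_comments a
WHERE NOT EXISTS (
    SELECT 1 FROM survey b
    WHERE b.id = a.id
      AND b.category_name = a.category_name_txt
      AND b.subcategory_name = a.subcategory_name_txt)
""").fetchall()

for row in rows:
    print(row)
```

With the column names spelled consistently, the two 'Miscellaneous' rows appear with NULLs on the survey side, as in the Hive result above.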


SQL Error: ORA-00933: SQL command not properly ended?

I want to populate a table with new data. I have tried the INSERT INTO command on its own, as well as INSERT INTO with the columns and values listed underneath, but I keep getting the error in the title.
INSERT INTO A2_FILM (FILM_NO, FILM_NAME, CLASSIFICATION, DURATION, DESCRIPTION, YEAR_RELEASED)
VALUES (00948371, 'Lightyear', 'U', 105, 'Legendary space ranger Buzz Lightyear embarks on an intergalactic adventure alongside ambitious recruits Izzy, Mo, Darby, and his robot companion, Sox.', TO DATE('2022', 'YYYY'));
With Oracle 19c and SQL Developer 21 I did not receive any error, provided the function is written TO_DATE (with an underscore, not TO DATE):
Name            Null?    Type
--------------- -------- --------------
FILM_NO                  NUMBER(12)
FILM_NAME                VARCHAR2(100)
CLASSIFICATION           CHAR(1)
DURATION                 NUMBER(5)
DESCRIPTION              VARCHAR2(1000)
YEAR_RELEASED            DATE
INSERT INTO A2_FILM (FILM_NO, FILM_NAME, CLASSIFICATION, DURATION, DESCRIPTION, YEAR_RELEASED)
VALUES (00948371,
'Lightyear',
'U',
105,
'Legendary space ranger Buzz Lightyear embarks on an intergalactic adventure alongside ambitious recruits Izzy, Mo, Darby, and his robot companion, Sox.',
TO_DATE('2022', 'YYYY')
);
1 row inserted.
If the value stored in the YEAR_RELEASED column has to be just the year (YYYY), it looks like you have to change the datatype to NUMBER. Otherwise, TO_DATE('2022', 'YYYY') in the INSERT returns a full date (DD-MM-YYYY): the first day of the current month in the given year, so in this case (run in July) the stored value is 01-07-2022.
If you change the datatype of the YEAR_RELEASED column to NUMBER, you can use EXTRACT(), for example:
INSERT INTO A2_FILM (FILM_NO, FILM_NAME, CLASSIFICATION, DURATION, DESCRIPTION, YEAR_RELEASED) VALUES (00948371, 'Lightyear', 'U', 105, 'Legendary space ranger Buzz Lightyear embarks on an intergalactic adventure alongside ambitious recruits Izzy, Mo, Darby, and his robot companion, Sox.', EXTRACT (YEAR FROM TO_DATE('2022-07-01', 'YYYY-MM-DD')));
db<>fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=91ba7c0f3117bd6cfc814656a896856e
If the stored value has to be a full date (DD-MM-YYYY), you can use:
INSERT INTO A2_FILM (FILM_NO, FILM_NAME, CLASSIFICATION, DURATION, DESCRIPTION, YEAR_RELEASED) VALUES (00948371, 'Lightyear', 'U', 105, 'Legendary space ranger Buzz Lightyear embarks on an intergalactic adventure alongside ambitious recruits Izzy, Mo, Darby, and his robot companion, Sox.', TO_DATE('2022', 'YYYY'));
db<>fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=41aac71840efc45eaa62e9db42fb75c5

Filter condition using "parent" CurrentMember

Here is the data-set:
CREATE TABLE Movies(id INT, name VARCHAR(50), genre VARCHAR(50), budget DECIMAL(10));
INSERT INTO Movies VALUES
(1, 'Pirates of the Caribbean', 'Fantasy', 379000000),
(2, 'Avengers', 'Superhero', 365000000),
(3, 'Star Wars', 'Science fiction', 275000000),
(4, 'John Carter', 'Science fiction', 264000000),
(5, 'Spider-Man', 'Superhero', 258000000),
(6, 'Harry Potter', 'Fantasy', 250000000),
(7, 'Avatar', 'Science fiction', 237000000);
Filtering relative to a constant value is no problem, e.g. to get, for each movie, all the other movies with a budget higher than $300M:
WITH
MEMBER X AS SetToStr(Filter(Movie.[Name].[Name].Members - Movie.[Name].CurrentMember, Measures.Budget > 300000000))
SELECT
Movie.[Name].[Name].Members ON ROWS,
X ON COLUMNS
FROM
Cinema
Which gives:
Avatar {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Avengers {[Movie].[Name].&[Pirates of the Caribbean]}
Harry Potter {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
John Carter {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Pirates of the Caribbean {[Movie].[Name].&[Avengers]}
Spider-Man {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Star Wars {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
But how can I compare against the budget of the current movie instead of the hard-coded $300M, to get the movies that are more expensive than the current one?
It would give {} for "Pirates of the Caribbean", as it is the most expensive movie.
For "Avengers" it would be { 'Pirates of the Caribbean' }, as this is the second most expensive movie and only "Pirates of the Caribbean" is more expensive.
For "Avatar" it would give all the other movies, as it is the least expensive.
The issue is that inside the Filter function's condition CurrentMember refers to the currently tested tuple and not the one currently selected on the ROWS axis.
Instead of using Filter() for each movie, I would first compute an ordered set of movies based on budget values. Then X could be defined using the SubSet and Rank function.
Here is an example using a different schema but I guess you'll get the point easily:
with
set ordered_continents as order( [Geography].[Geography].[Continent], -[Measures].[#Sales] )
member xx as SetToStr( SubSet( ordered_continents, 0, Rank( [Geography].[Geography].currentMember, ordered_continents) - 1))
select {[#Sales], [xx] } on 0, [Geography].[Geography].[Continent] on 1 from [Sales]
I'm not familiar with SSAS so I'm using icCube but I guess the MDX should be very much similar.
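To make the ordered-set idea concrete, here is a rough Python sketch of the same logic (not MDX) applied to the Movies data from the question: sort the movies by budget in descending order, then for each movie take everything that comes before it in that order. The variable names are made up for illustration, and ties in budget are not handled.

```python
# Budgets taken from the Movies table in the question.
budgets = {
    "Pirates of the Caribbean": 379_000_000,
    "Avengers": 365_000_000,
    "Star Wars": 275_000_000,
    "John Carter": 264_000_000,
    "Spider-Man": 258_000_000,
    "Harry Potter": 250_000_000,
    "Avatar": 237_000_000,
}

# Counterpart of Order(..., -budget): most expensive movie first.
ordered = sorted(budgets, key=budgets.get, reverse=True)

# Counterpart of SubSet(ordered, 0, Rank(current, ordered) - 1):
# enumerate() yields the 0-based position, which equals Rank - 1.
more_expensive = {name: ordered[:pos] for pos, name in enumerate(ordered)}

for name in sorted(more_expensive):
    print(name, more_expensive[name])
```

The set is ordered once and each row only slices it, which is the point of preferring SubSet and Rank over running Filter() per movie.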

Advice on SQL query for finding records that are included in a range of epoch times

EDITED to include a 'CREATE TABLE' script for those that need less abstraction (see bottom of question)
I'm working on a water management alarm system in which the hardware runs a scheduled timer 'program' that has start/stop times. Each program may run one or more 'stations' (valves, sprinklers, or whatever), which have their own start/stop times controlled by the program. See the block below for a rough example, followed by an explanation.
|<---- Program 1 ----------------------------------------------->|
|<-- Station 1 -->| |<---- Station 2 ---->|<---- Station 3 --->|
|<---- Program 2 ---------------------------------------->|
|<---- Station 4 ---->|<---- Station 5 -->|
|<---- Program 3 --->|
|<---- Station 6 -->|
|<-- Master Pump Fail Alarm -->|
In the above 'pseudo schedule', Program 1 has 3 stations running and Program 2 has 2 stations running. Program 3 has one station. A master pump fails exactly when Station 5 in Program 2 starts, but the alarm ends (maybe got fixed?) after Program 2 ends but still within the run time of Program 1. Arrows indicate start and stop times and range of the event's running time.
Basically, I'm trying to suss out a WHERE clause algorithm along the lines of something like:
SELECT stuff WHERE "a program's or station's running time occurred *during* the alarm period (alarm.begintime to alarm.endtime in the schema below)".
The result for that using the above example schedule should return a list of records that includes Program 1 with Stations 2 and 3, and Program 2 with only Station 5.
The start and end times for Programs, Stations and Alarms are recorded in the database in EPOCH/UNIXTIME. Database engine is PostgreSQL 9.6.5 on x86_64-pc-linux-gnu.
If something needs more clarification, please ask. Thanks in advance for any tidbits to help solve the problem.
Schema below should create a similar layout to the visualization block above:
CREATE TABLE test(
oid INTEGER PRIMARY KEY,
itemid INTEGER,
itemtype VARCHAR (255),
begintime INTEGER,
endtime INTEGER
);
INSERT INTO test VALUES (1, 1, 'Program', 1556809522, 1556809680);
INSERT INTO test VALUES (2, 1, 'Station', 1556809522, 1556809560);
INSERT INTO test VALUES (3, 2, 'Station', 1556809620, 1556809642);
INSERT INTO test VALUES (4, 3, 'Station', 1556809642, 1556809680);
INSERT INTO test VALUES (5, 2, 'Program', 1556809522, 1556809670);
INSERT INTO test VALUES (6, 4, 'Station', 1556809541, 1556809630);
INSERT INTO test VALUES (7, 5, 'Station', 1556809630, 1556809660);
INSERT INTO test VALUES (8, 3, 'Program', 1556809522, 1556809618);
INSERT INTO test VALUES (9, 6, 'Station', 1556809522, 1556809617);
INSERT INTO test VALUES (10, 1, 'Alarm', 1556809630, 1556809675);
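One possible shape for the WHERE clause described above is the usual interval-overlap test: an item ran during the alarm if it began before the alarm ended and ended after the alarm began. The following is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL, the aliases t and a are made up, and the inequalities are strict so that Station 4, which ends at the exact second the alarm begins, is left out as the expected result requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
conn.execute("""
CREATE TABLE test(
    oid INTEGER PRIMARY KEY,
    itemid INTEGER,
    itemtype TEXT,
    begintime INTEGER,
    endtime INTEGER)
""")
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "Program", 1556809522, 1556809680),
    (2, 1, "Station", 1556809522, 1556809560),
    (3, 2, "Station", 1556809620, 1556809642),
    (4, 3, "Station", 1556809642, 1556809680),
    (5, 2, "Program", 1556809522, 1556809670),
    (6, 4, "Station", 1556809541, 1556809630),
    (7, 5, "Station", 1556809630, 1556809660),
    (8, 3, "Program", 1556809522, 1556809618),
    (9, 6, "Station", 1556809522, 1556809617),
    (10, 1, "Alarm", 1556809630, 1556809675),
])

# Two intervals overlap when each one starts before the other ends.
overlapping = conn.execute("""
SELECT t.itemtype, t.itemid
FROM test t
JOIN test a ON a.itemtype = 'Alarm'
WHERE t.itemtype <> 'Alarm'
  AND t.begintime < a.endtime
  AND t.endtime > a.begintime
ORDER BY t.oid
""").fetchall()

print(overlapping)
```

Switching to <= and >= would also count items that merely touch the alarm window, such as Station 4 here.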

SQL Insert values in order - can it be done in one or two steps?

I am trying to insert values in the defined order. Is there any way to consolidate the 6 lines into 2 lines of code? Thank you in advance.
students = [('TOM', 6120, 85),
('Jerry', 6110,86),
('Spike', 6120,55),
('Tyke',6110,73),
('Butch',6110,89),
('Toodle',6120,76)]
courses = [(6110,'Data Science I', 'LSB105'),
(6120,'Data Science II', 'LSB109')]
grading = [('A', 90, 100),
('B', 80,90),
('C',70,80)]
import sqlite3
conn = sqlite3.connect('example3.db')
c = conn.cursor()
c.execute('CREATE TABLE students(name TEXT, courseid INTEGER, score INTEGER)') #create a table
c.executemany('INSERT INTO students VALUES(?,?,?)', students)
c.execute('CREATE TABLE courses(courseid INTEGER, name TEXT, classroom TEXT)') #create a table
c.executemany('INSERT INTO courses VALUES(?,?,?)', courses)
c.execute('CREATE TABLE gradingscheme(letter TEXT, lower REAL, upper REAL)') #create a table
c.executemany('INSERT INTO gradingscheme VALUES(?,?,?)', grading)
conn.commit()
conn.close()
You can use executescript() to run several SQL statements in a single call.
For example (from the docs):
con.executescript("""
insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes');
insert into recipe (name, ingredients) values ('pumpkin stew', 'pumpkin onions garlic celery');
insert into recipe (name, ingredients) values ('broccoli pie', 'broccoli cheese onions flour');
insert into recipe (name, ingredients) values ('pumpkin pie', 'pumpkin sugar flour butter');
""")
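Applied to the tables in the question, this might look like the sketch below. It is one possible arrangement, not the only one: the three CREATE TABLE statements go into a single executescript() call, and the three executemany() calls become one loop. executescript() does not accept parameters, so the parameterised inserts still go through executemany(). The in-memory database and the tables mapping are assumptions made for illustration.

```python
import sqlite3

students = [("TOM", 6120, 85), ("Jerry", 6110, 86), ("Spike", 6120, 55),
            ("Tyke", 6110, 73), ("Butch", 6110, 89), ("Toodle", 6120, 76)]
courses = [(6110, "Data Science I", "LSB105"),
           (6120, "Data Science II", "LSB109")]
grading = [("A", 90, 100), ("B", 80, 90), ("C", 70, 80)]

conn = sqlite3.connect(":memory:")  # assumption: in-memory instead of example3.db

# One call creates all three tables.
conn.executescript("""
CREATE TABLE students(name TEXT, courseid INTEGER, score INTEGER);
CREATE TABLE courses(courseid INTEGER, name TEXT, classroom TEXT);
CREATE TABLE gradingscheme(letter TEXT, lower REAL, upper REAL);
""")

# One loop performs all three bulk inserts, in the defined order.
tables = {"students": students, "courses": courses, "gradingscheme": grading}
for table, rows in tables.items():
    placeholders = ",".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {table} VALUES({placeholders})", rows)
conn.commit()

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in tables}
print(counts)
conn.close()
```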

Joining two tables based off of parsed column content

I am trying to join two tables based on two columns that have been combined (often imperfectly) into a column of another table. I want the correct records to be associated with each other so that I can compare the FDebit and Debit fields.
The FMEMO is normally generated by taking the Num, then adding a space, then adding the Memo text. As you can see below, our process is not yet perfect. I would therefore like to match the Num, then the space, then the first 10 characters of the Memo field to the FMEMO field.
I have included code below with sample data. Could you please offer me some suggestions on how to accomplish this?
Invoices table
MEMO | Num | DEBIT
Supplies. Soto Cano | 1135 | 2.25
Suction Hose (1-1/2") by the food | 3 | 74.04
Hose/Tubing:Braided Hose (1") by the food | 3 | 98.72
QP10 Meyers surface pump (60hz) | 3 | 206.27
Cage including f tank, alum parts box and 2 buckets of alum | 3 | 752.03
Cage including valve manifold, F1 & F2 | 3 | 3774.08
cage with IBC in it | 1135 | 268.41
Pvc accesories for installation of LWTS. | 1175 | 4.26
Pvc accesories for installation of LWTS. | 1175 | 27.26
Expenses table
FMEMO | FDebit
Supplies. Soto Cano 41.8 | 2.25
3 Suction Hose (1-1/2 | 74.04
3 Hose/Tubing:Braided Hose (1 | 98.72
3 QP10 Meyers surface pump (60hz) 3970 | 206.27
3 Cage including f tank, alum parts box and 2 buckets of alum 14474 | 752.03
3 Cage including valve manifold, F1 & F2 72638 | 3774.08
3 cage with IBC in it 5166 | 268.41
1175 Pvc accesories for installation of LWTS. 82.03 | 4.26
1175 Pvc accesories for installation of LWTS. 524.67 | 27.26
Code to replicate:
CREATE TABLE #tempExpenses (
FMEMO varchar(Max), FDebit money)
CREATE TABLE #tempInvoices (
MEMO varchar(Max), Num integer, DEBIT money)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('Supplies. Soto Cano 41.8', 2.25)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Suction Hose (1-1/2', 74.04)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Hose/Tubing:Braided Hose (1', 98.72)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 QP10 Meyers surface pump (60hz) 3970', 206.27)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Cage including f tank, alum parts box and 2 buckets of alum 14474', 752.03)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 Cage including valve manifold, F1 & F2 72638', 3774.08)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('3 cage with IBC in it 5166', 268.41)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('1175 Pvc accesories for installation of LWTS. 82.03', 4.26)
INSERT INTO #tempExpenses (FMEMO, FDEBIT) VALUES ('1175 Pvc accesories for installation of LWTS. 524.67', 27.26)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Supplies. Soto Cano', 1135, 2.25)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Suction Hose (1-1/2") by the food', 3, 74.04)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Hose/Tubing:Braided Hose (1") by the food', 3, 98.72)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('QP10 Meyers surface pump (60hz)', 3, 206.27)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Cage including f tank, alum parts box and 2 buckets of alum', 3, 752.03)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Cage including valve manifold, F1 & F2', 3, 3774.08)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('cage with IBC in it', 1135, 268.41)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Pvc accesories for installation of LWTS.', 1175, 4.26)
INSERT INTO #tempInvoices (MEMO, Num, DEBIT) VALUES ('Pvc accesories for installation of LWTS.', 1175, 27.26)
SELECT *
FROM #tempExpenses
SELECT *
FROM #tempInvoices
Well I really hate myself for producing this TSQL but I think this is what you are looking for:
SELECT *
FROM #tempInvoices i
INNER JOIN #tempExpenses e ON CAST(Num as varchar(10)) + ' ' + SUBSTRING(MEMO,1,9-LEN(CAST(NUM as varchar(10)))) = SUBSTRING(FMEMO,1,10)
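This concatenates the number, a space, and as many characters from MEMO as are needed to reach 10 characters in total: for Num = 3 (one digit) it takes 9 - 1 = 8 characters, and for a four-digit Num such as 1175 it takes 9 - 4 = 5. The result is then compared with the first 10 characters of FMEMO in the other table.
Of course this is a very inefficient and ugly query. I would rather normalize the data in the database (parse it, clean it, etc.).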
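To see what the join condition actually compares, here is a small Python sketch (not T-SQL) that builds the same 10-character key and matches it against the first 10 characters of FMEMO. The function name join_key and the handful of sample rows are chosen for illustration. Python compares strings exactly, whereas SQL Server's comparison depends on the collation and ignores trailing spaces, so treat this only as an approximation of the query's behaviour.

```python
def join_key(num: int, memo: str) -> str:
    """Num, a space, then enough of MEMO to reach 10 characters in total."""
    prefix = str(num)
    # Mirrors SUBSTRING(MEMO, 1, 9 - LEN(CAST(Num AS varchar(10))))
    return prefix + " " + memo[:9 - len(prefix)]

# A few rows from the sample data: (MEMO, Num, DEBIT) and (FMEMO, FDebit).
invoices = [
    ("Supplies. Soto Cano", 1135, 2.25),
    ('Suction Hose (1-1/2") by the food', 3, 74.04),
    ("QP10 Meyers surface pump (60hz)", 3, 206.27),
    ("Pvc accesories for installation of LWTS.", 1175, 4.26),
]
expenses = [
    ("Supplies. Soto Cano 41.8", 2.25),
    ("3 Suction Hose (1-1/2", 74.04),
    ("3 QP10 Meyers surface pump (60hz) 3970", 206.27),
    ("1175 Pvc accesories for installation of LWTS. 82.03", 4.26),
]

# Inner join on the 10-character key.
matches = [(memo, fmemo)
           for memo, num, _ in invoices
           for fmemo, _ in expenses
           if join_key(num, memo) == fmemo[:10]]

for memo, fmemo in matches:
    print(repr(memo), "<->", repr(fmemo))
```

The 'Supplies. Soto Cano' row finds no partner because its FMEMO does not start with the invoice number, which is the kind of imperfect data the question mentions. Rows whose keys collide (for example the two 'Pvc accesories' lines in the full data set) will match each other more than once, so comparing the debit amounts as well may be needed to pair them up.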