Hive : join 2 tables and select different columns in single query

I have two tables in Hive, say Table A and Table B. I want to join them and select different columns depending on a condition, all in a single query.
Table A:
empid;name;sal;dept
1;'X';100;IT
2;'Y';100;IT
3;'Z';100;ADMIN
Table B:
empid;name;address
1;'X';A
2;'Y';B
3;'Z';C
Desired output:
When Dept='IT'
select empid,name,address from Table A join Table B on (A.empid=B.empid)
When Dept='ADMIN'
select empid,address from Table A join Table B on (A.empid=B.empid)
Can someone please help me with the approach?

If you want a single query, every row of the output has to have the same structure (the same set of columns).
I assume you don't want to show the names of ADMIN employees for security reasons.
If so, you could do the following instead. Names in ADMIN will be shown as 'Name not available'; any department other than IT or ADMIN gets NULL, because the CASE has no ELSE branch. Hope this helps. If not, please tell us the reason behind your question.
SELECT ta.empid,
       CASE WHEN ta.dept = 'IT' THEN ta.name
            WHEN ta.dept = 'ADMIN' THEN 'Name not available'
       END AS name,
       tb.address
FROM tableA ta
JOIN tableB tb
  ON ta.empid = tb.empid;
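As a rough check of the CASE approach above, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Hive here (an assumption made so the example runs anywhere); the CASE expression itself is the one from the answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (empid INTEGER, name TEXT, sal INTEGER, dept TEXT);
    CREATE TABLE tableB (empid INTEGER, name TEXT, address TEXT);
    INSERT INTO tableA VALUES (1, 'X', 100, 'IT'), (2, 'Y', 100, 'IT'), (3, 'Z', 100, 'ADMIN');
    INSERT INTO tableB VALUES (1, 'X', 'A'), (2, 'Y', 'B'), (3, 'Z', 'C');
""")

# Same shape for every row: the name column is masked for ADMIN.
rows = conn.execute("""
    SELECT ta.empid,
           CASE WHEN ta.dept = 'IT' THEN ta.name
                WHEN ta.dept = 'ADMIN' THEN 'Name not available'
           END AS name,
           tb.address
    FROM tableA ta
    JOIN tableB tb ON ta.empid = tb.empid
    ORDER BY ta.empid
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, the two IT rows keep their names and the ADMIN row shows the placeholder text.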

Related

SQL Inner Join w/ Unique Vals

Questions similar to this one about using DISTINCT values in an INNER JOIN have been asked a few times, but I don't see my (simple) use case.
Problem Description:
I have two tables Table A and Table B. They can be joined via a variable ID. Each ID may appear on multiple rows in both Table A and Table B.
I would like to INNER JOIN Table A to the distinct ID values in Table B that satisfy some condition, and select every row of Table A whose ID is among them.
What I want:
I want to make sure I get only one copy of each row of Table A with a Table A.ID matching a Table B.ID which satisfies [some condition].
What I would like to do:
SELECT * FROM TABLE A
INNER JOIN (
SELECT DISTINCT ID FROM TABLE B WHERE [some condition]
) ON TABLE A.ID=TABLE B.ID
Additionally:
As a further (really dumb) constraint, I can't say anything about the SQL standard in use, since I'm executing the SQL query through Stata's odbc load command on a database I have no information about beyond the variable names and the fact that "it does accept SQL queries," ( <- this is the extent of the information I have).

If you want all rows in a that match an id in b, then use exists:
select a.*
from a
where exists (select 1 from b where b.id = a.id and [some condition]);
Trying to use join just complicates matters, because it both filters and generates duplicates.
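To make the difference concrete, here is a small sketch using Python's sqlite3 module with invented data. The column names `val` and `flag`, and the condition `flag = 'keep'`, are hypothetical stand-ins for the unnamed columns and "[some condition]" in the question. It compares a plain join, the EXISTS form, and the DISTINCT derived table the asker had in mind (which needs an alias to be valid).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, val TEXT);   -- hypothetical columns
    CREATE TABLE b (id INTEGER, flag TEXT);  -- hypothetical columns
    INSERT INTO a VALUES (1, 'a1'), (1, 'a2'), (2, 'a3'), (3, 'a4');
    INSERT INTO b VALUES (1, 'keep'), (1, 'keep'), (2, 'drop'), (3, 'keep'), (3, 'drop');
""")

# Plain join: every matching pair of rows is returned, so rows of a repeat.
joined = conn.execute("""
    SELECT a.* FROM a JOIN b ON a.id = b.id
    WHERE b.flag = 'keep' ORDER BY a.val
""").fetchall()

# EXISTS: each row of a appears at most once.
exists_rows = conn.execute("""
    SELECT a.* FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id AND b.flag = 'keep')
    ORDER BY a.val
""").fetchall()

# Join against distinct ids; the derived table needs an alias (d here).
distinct_rows = conn.execute("""
    SELECT a.* FROM a
    JOIN (SELECT DISTINCT id FROM b WHERE flag = 'keep') d ON a.id = d.id
    ORDER BY a.val
""").fetchall()

print(len(joined), len(exists_rows), len(distinct_rows))
```

The plain join returns more rows than table a contains matches for, while the EXISTS and DISTINCT forms agree with each other.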

SQL SELECT query where the IDs were already found

I have 2 tables:
Table A has 3 columns (for example) with opportunity sales header data:
OPP_ID, CLOSE_DTTM, STAGE
Table B has 3 columns with the individual line items for the Opportunities:
OPP_LINE_ID, OPP_ID, AMOUNT_USD
I have a select statement that correctly parses through Table A and returns a list of Opportunities. What I would like to do is, without joining the data, to have a SELECT statement that will get data from Table B but only for the OPP_IDs that were found in my first query.
The result should be 2 views/resultset (one for each select query) and not just 1 combined view where Table B is joined to Table A.
The reason why I want to keep them separate is because I will have to perform a few manipulations to the result from table B and i don't want the result from table A affected.

A subquery is all you need. Select from table B and restrict it to the OPP_IDs returned by your first query on table A:
SELECT OPP_LINE_ID, OPP_ID, AMOUNT_USD
FROM b
WHERE b.opp_id IN (SELECT opp_id FROM a)

Presuming you're using this in some client-side data access library that represents B's data in a two-dimensional collection, and you want to manipulate it without A's data being present in that collection:
Identify the records in A:
SELECT * FROM a WHERE somecolumn = 'somevalue'
Identify the records in B that relate to A, but don't return A's data:
SELECT b.* FROM a JOIN b ON a.opp_id = b.opp_id WHERE a.somecolumn = 'somevalue'
Just because JOIN is used doesn't mean your end-consuming program has to know about A's data. You could also use IN, like the other answer does; many query optimizers will treat the two forms much the same way, as long as OPP_ID is unique in A.

I tend to use exists for this type of query:
select b.*
from b
where exists (select 1 from a where a.opp_id = b.opp_id);
If you want two result sets, you need to run two queries. It is unclear what the second query is; perhaps it is the first query on A.
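A minimal sketch of the two-query approach, using Python's sqlite3 module. The sample rows and the filter `stage = 'Won'` are invented for illustration (the question does not show the first query's conditions); only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (opp_id INTEGER, close_dttm TEXT, stage TEXT);
    CREATE TABLE b (opp_line_id INTEGER, opp_id INTEGER, amount_usd REAL);
    INSERT INTO a VALUES (1, '2020-01-01', 'Won'), (2, '2020-02-01', 'Lost'), (3, '2020-03-01', 'Won');
    INSERT INTO b VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0), (13, 3, 20.0);
""")

# First result set: the opportunity headers (hypothetical filter on stage).
headers = conn.execute(
    "SELECT * FROM a WHERE stage = 'Won' ORDER BY opp_id"
).fetchall()

# Second result set: only B's columns, limited to the same opportunities.
lines = conn.execute("""
    SELECT b.* FROM b
    WHERE EXISTS (SELECT 1 FROM a
                  WHERE a.opp_id = b.opp_id AND a.stage = 'Won')
    ORDER BY b.opp_line_id
""").fetchall()

print(headers)
print(lines)
```

The second result set can then be manipulated on the client side without touching the first, since the two are fetched independently.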

How to populate query with result from different table?

I have two tables, table A and table B. Table A has a column that references the primary key of table B. I want to run a select query on table A and then populate the column that refers to B with all of the data in the matching row of B.
SELECT * from A a LEFT JOIN B b ON a."b_id" = b."id" WHERE ...
That gives a result with each row containing all of the columns of A and all of the columns of B. It is a confusing mess to figure out which column is from which table. I want to be able to do something like.
row.A."column name"
row.B."column name"
I don't want to have to rename every single column using AS. There must be a better way to do this.

I'm not 100% sure what you're asking, but here is what I think you mean.
You want only the columns from B to show? If so, you could do:
SELECT B.*
FROM A
JOIN B
ON A.b_id = B.id
That gets you only B's columns and data. If you also want A's columns, but visibly separated from B's, you could put a marker column between them:
SELECT B.*,'|' AS ['|'], A.*
FROM A
JOIN B
ON A.b_id = B.id
Hopefully this is helpful, if not to you then maybe to another reader.
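If the goal is to tell A's columns from B's without typing every AS by hand, one option is to generate the aliases on the client. The sketch below does this with Python's sqlite3 module; it is an illustration under assumptions, not something the answers above propose. The helper `prefixed_columns`, the `label` column, and the sample rows are hypothetical, and `PRAGMA table_info` is specific to SQLite (other databases expose column lists differently, for example through information_schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE A (id INTEGER PRIMARY KEY, label TEXT, b_id INTEGER REFERENCES B(id));
    INSERT INTO B VALUES (10, 'b-ten');
    INSERT INTO A VALUES (1, 'a-one', 10), (2, 'a-two', NULL);
""")

def prefixed_columns(conn, table, alias):
    """Hypothetical helper: build 'alias."col" AS "alias.col"' for each column."""
    # PRAGMA table_info is SQLite-specific; field 1 of each row is the column name.
    names = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    return ", ".join(f'{alias}."{n}" AS "{alias}.{n}"' for n in names)

select_list = prefixed_columns(conn, "A", "a") + ", " + prefixed_columns(conn, "B", "b")
cursor = conn.execute(
    f'SELECT {select_list} FROM A AS a LEFT JOIN B AS b '
    f'ON a."b_id" = b."id" ORDER BY a."id"'
)

# Each row becomes a dict keyed by the prefixed column name.
column_names = [d[0] for d in cursor.description]
rows = [dict(zip(column_names, r)) for r in cursor]

print(column_names)
print(rows[0]["a.label"], rows[0]["b.label"])
```

Each value can then be looked up as `row["a.label"]` or `row["b.label"]`, which is close to the `row.A."column name"` access the question asks for.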

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?

You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
  and (common_id, seq_3c) in
  (
    select common_id, seq_3c
    from table_a
    where checklist_status = 'I'
      and admin_function = 'ADMA'
      and checklist_cd = 'APPL'
  )
  and common_id in
  (
    select EMPLID
    from table_c
    where admit_term = '2171'
      and institution = 'SOMEWHERE'
  );

SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'

This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the 'id' column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?

It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C on matching ID. Because all the IDs are in fact identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.
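The arithmetic above can be reproduced with a small sketch using Python's sqlite3 module. The original table screenshot is not available, so the rows below are invented to match the counts in the explanation (1 row in table A, 3 matching rows in table B, 9 rows in table C of which 3 pass the filter); the `item` column and the second admit term '2161' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (common_id TEXT, seq_3c INTEGER);
    CREATE TABLE table_b (common_id TEXT, seq_3c INTEGER, item TEXT);
    CREATE TABLE table_c (emplid TEXT, admit_term TEXT);
    INSERT INTO table_a VALUES ('123456789', 1);
    INSERT INTO table_b VALUES ('123456789', 1, 'item1'),
                               ('123456789', 1, 'item2'),
                               ('123456789', 1, 'item3');
""")
# Nine rows in table_c for the same id; only three have the wanted term.
terms = ["2171"] * 3 + ["2161"] * 6
conn.executemany("INSERT INTO table_c VALUES ('123456789', ?)", [(t,) for t in terms])

join_sql = """
    FROM table_a a
    JOIN table_b b ON a.common_id = b.common_id AND a.seq_3c = b.seq_3c
    JOIN table_c c ON a.common_id = c.emplid
"""
# Rows produced by the joins alone, then after the filter on table_c.
joined_all = conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]
joined_filtered = conn.execute(
    "SELECT COUNT(*) " + join_sql + " WHERE c.admit_term = '2171'"
).fetchone()[0]

# Using the other tables only as existence tests keeps one row per table_b row.
exists_rows = conn.execute("""
    SELECT b.item FROM table_b b
    WHERE EXISTS (SELECT 1 FROM table_a a
                  WHERE a.common_id = b.common_id AND a.seq_3c = b.seq_3c)
      AND EXISTS (SELECT 1 FROM table_c c
                  WHERE c.emplid = b.common_id AND c.admit_term = '2171')
    ORDER BY b.item
""").fetchall()

print(joined_all, joined_filtered, len(exists_rows))
```

Under these assumptions the joins yield 27 rows before the filter and 9 after it, while the EXISTS version returns the 3 rows of table B exactly once each.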

Fetch all data when the WHERE clause column can contain NULL (SQL Server 2012)

I am trying to show some data in a table on my web page using SQL Server 2012, but I'm having a hard time writing my SELECT query.
This is what I have so far:
SELECT DISTINCT a.Id_972 Id,
a.Datum_972 Date,
a.Omschrijving_972 Title,
a.Bedrag_972 Amount,
b.OmschrN_977 Type,
c.OmschrKN_976 Project,
d.Status_975 Status
FROM WebOnkosten_972 AS a,
WebOnkostenType_977 AS b,
WebOnkostenProject_976 AS c,
WebOnkostenToestand_975 AS d
WHERE a.Type_972 = b.Type_977 AND
a.Project_972 = c.ProjectNR_976 AND
a.Id_972 = d.IdOnkost_975
This straightforward SELECT query gets the data as it should, but it is not exactly what I want.
I'm fetching the project name from table 'c' using an id in table 'a'.
My problem is that the project can be NULL, but I still want to get every record. I want to show this data in a table, with an empty cell where the project is NULL in the DB. I understand why this query does not return the records where the project id in table 'a' is NULL, but I can't find a way to make it happen.
Can anybody help me?
Sorry for my imperfect English and the bad title. I didn't really know what to say there.

If you mean that there are records in table 'a' that do not have a match in table 'c', then replace the comma-separated tables in the FROM clause with LEFT JOINs.
Example:
SELECT A.PK, B.PK
FROM A
LEFT JOIN B ON a.pk = b.pk
This will bring all the records from table 'A'. For each match on table B, the result set will display (A.pk,B.pk) .
For each A.pk that has no match, the result set will display (A.pk,NULL)
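Here is a minimal sketch of that behaviour using Python's sqlite3 module (SQLite standing in for SQL Server, which is an assumption made so the example runs anywhere). The tables `expense` and `project` and their rows are simplified, hypothetical versions of the tables in the question; COALESCE turns the missing project name into an empty string for display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the question's tables.
    CREATE TABLE expense (id INTEGER, title TEXT, project_nr INTEGER);
    CREATE TABLE project (project_nr INTEGER, name TEXT);
    INSERT INTO expense VALUES (1, 'Taxi', 100), (2, 'Lunch', NULL);
    INSERT INTO project VALUES (100, 'Website');
""")

# Inner join: the expense without a project disappears.
inner_rows = conn.execute("""
    SELECT e.id, e.title, p.name
    FROM expense e
    JOIN project p ON e.project_nr = p.project_nr
    ORDER BY e.id
""").fetchall()

# Left join: every expense is kept; a missing project becomes ''.
left_rows = conn.execute("""
    SELECT e.id, e.title, COALESCE(p.name, '') AS project
    FROM expense e
    LEFT JOIN project p ON e.project_nr = p.project_nr
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Only the join to the table that may lack a match needs to be a LEFT JOIN; the other joins can stay inner joins if those matches are always present.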
