How to pull all records in Hive based on another column


If this is what my table looks like below:
my_id  my_words                my_people  my_number
100    need more info?         Jim        1
100    now                     Mary       2
100    what's that?            Jim        3
101    okay now                Jim        1
101    sounds good             Mary       2
102    still hungry?           Jim        1
102    now I'm thirsty though  Mary       2
102    I don't understand      Jim        3
102    no I'm not hungry       Mary       4
103    are you there?          Jim        1
103    I don't know            Mary       2
103    That's okay             Jim        3
How can I get this output?
my_id  my_words                my_people  my_number
100    need more info?         Jim        1
100    now                     Mary       2
100    what's that?            Jim        3
102    still hungry?           Jim        1
102    now I'm thirsty though  Mary       2
102    I don't understand      Jim        3
Right now I have:
SELECT my_id, my_words, my_people, my_number FROM table WHERE my_people = "Mary" AND lower(my_words) LIKE 'now%';
But I don't want to return only those rows. I also want to return Jim's comment right before and right after Mary's (before/after based on the my_number column).
Maybe this is unrelated, but ultimately, I'm going to want this in Excel with this format:
my_id  Jim_words        Mary_words              Jim_next_words
100    need more info?  now                     what's that?
102    still hungry?    now I'm thirsty though  I don't understand

Could you please try the code below? The comments explain each step.
WITH cte AS (
  SELECT my_id, my_words, my_people, my_number,
         row_number() OVER (PARTITION BY my_id ORDER BY my_number) AS rn -- sequential row number within each my_id
  FROM table -- replace with your actual table name
)
SELECT DISTINCT
  mytab.my_id, chosentab.my_words jims_words, mytab.my_people, mytab.my_number,
  CASE WHEN mytab.rn + 1 = chosentab.rn THEN chosentab.my_words END jims_words_after_marys_now,
  CASE WHEN mytab.rn - 1 = chosentab.rn THEN chosentab.my_words END jims_words_before_marys_now
FROM
  cte mytab,
  cte chosentab
WHERE
  mytab.my_id = chosentab.my_id
  AND mytab.my_people = 'Mary'
  AND lower(mytab.my_words) LIKE 'now%' -- mytab holds Mary's "now..." rows
  AND (
    mytab.rn + 1 = chosentab.rn    -- the row right after Mary's "now..."
    OR mytab.rn - 1 = chosentab.rn -- the row right before Mary's "now..."
  )
I wrote this SQL based on our discussion. Could you please validate it and let me know if it worked?
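As an alternative to the self join, the before/after lookup can be sketched with the LAG and LEAD window functions, which Hive also supports. The sketch below runs the query against an in-memory SQLite database from Python purely so it can be tried locally. The table name my_table and the output column names are assumptions, and it relies on the rows next to Mary's really being Jim's, as they are in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (my_id INT, my_words TEXT, my_people TEXT, my_number INT)"
)
# Sample rows copied from the question; my_table is a stand-in table name.
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (100, "need more info?", "Jim", 1),
    (100, "now", "Mary", 2),
    (100, "what's that?", "Jim", 3),
    (101, "okay now", "Jim", 1),
    (101, "sounds good", "Mary", 2),
    (102, "still hungry?", "Jim", 1),
    (102, "now I'm thirsty though", "Mary", 2),
    (102, "I don't understand", "Jim", 3),
    (102, "no I'm not hungry", "Mary", 4),
    (103, "are you there?", "Jim", 1),
    (103, "I don't know", "Mary", 2),
    (103, "That's okay", "Jim", 3),
])

# The window functions are computed in the CTE first; the filter on Mary's
# "now..." rows is applied afterwards so the neighbours are still visible.
query = """
WITH neighbours AS (
    SELECT my_id, my_words, my_people,
           LAG(my_words)  OVER (PARTITION BY my_id ORDER BY my_number) AS prev_words,
           LEAD(my_words) OVER (PARTITION BY my_id ORDER BY my_number) AS next_words
    FROM my_table
)
SELECT my_id,
       prev_words AS jim_words,
       my_words   AS mary_words,
       next_words AS jim_next_words
FROM neighbours
WHERE my_people = 'Mary' AND lower(my_words) LIKE 'now%'
ORDER BY my_id
"""
wide_rows = conn.execute(query).fetchall()
for row in wide_rows:
    print(row)
```

This yields one row per matching my_id, which is already the wide layout wanted for Excel.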

Related

Self JOIN to find the parent detail which matches with the row data

I am trying to query in MS SQL and I can not resolve it. I have a table employees:
Id  Name     Surname  FatherName  MotherName  WifeName  Pincode  isChild
--  -------  -------  ----------  ----------  --------  -------  -------
1   John     Green    James       Sue         null      101011   1
2   Michael  Sloan    Barry       Lilly       null      101011   1
3   Sally    Green    Andrew      Molly       Jemi      101011   1
4   Barry    Sloan    Soul        Paul        Lilly     101011   0
5   James    Green    Ned         White       Sue       101011   0
I want a query that returns the rows whose Name and WifeName match the FatherName and MotherName of a child row. For example, for id=1, John's father name James and mother name Sue match id 5, which has James as Name and Sue as WifeName. So my query should return (this is my expected result):
Id  Name     Surname  FatherName  MotherName  WifeName  Pincode  isChild
--  -------  -------  ----------  ----------  --------  -------  -------
5   James    Green    Ned         White       Sue       101011   0
4   Barry    Sloan    Soul        Paul        Lilly     101011   0
I tried the query below, but it checks for James only. How can I change my query so that it checks all the names and returns the expected result?
select * FROM employees
where first_name like '%James%'
and wife_name like '%Sue%'
and pincode=101011;
Any tips on this will be really helpful. I am new to joins, need help on writing self join to get the result.

…
select *
from thetable as p -- the parent/father
where exists -- with one child at least
(
select *
from thetable as c
where c.fathername = p.name
and c.mothername = p.wifename
-- lastname?
)

This is too long for a comment, and it is not intended as a slam against what you are working with; please take it as constructive criticism.
The table design is VERY POOR, and getting that corrected should be done first, before you get too deep into whatever you are working on. A more typical design would be a single table of people. You could then model the relationships in a couple of ways. One is to add two additional IDs, FatherID and MotherID, to each individual person's record. These IDs join directly back to the parent rows instead of matching against hard-coded strings. Take a surname like Smith or Jones and consider how many instances of a "John Smith" may exist: a lot. The probability of also matching a wife's name of Sue, Mary or whatever else is lower, but even that could lead to multiple possibilities. Yes, you are adding a PIN, but that is not unique either; even a computer can generate a random PIN of 1234.
By having the IDs, there is NO ambiguity of who the relationship is with.
If the data were slightly altered to something like
Id  Name     Surname      FatherID  MotherID  SpouseID
--  -------  -----------  --------  --------  --------
1   John     Green        5         6         null
2   Michael  Sloan        4         3         null
3   Lilly    Sloan        null      null      4
4   Barry    Sloan        null      null      3
5   James    Green        9         10        6
6   Sue      Green        7         8         5
7   Bill     Jones        null      null      8
8   Martha   Jones        null      null      7
9   Brian    Green        null      null      10
10  Beth     Smith-Green  null      null      9
In this modified example, you can see right away that ID#1, John Green, has James (ID#5) as father and Sue (ID#6) as mother. James is in turn a child of father Brian (ID#9) and mother Beth (ID#10). The scenario therefore extends to the grandparent level: James and Sue are each children of their own respective parents, and Sue's parents carry the Jones surname.
Michael Sloan's parents are #4 Barry and #3 Lilly.
I also added a spouse ID. This prevents people's names from being copied redundantly all over the table. You can then query on a child's parent IDs instead of relying on a hopeful name LIKE guess.
So, even though this does not solve your relatively simple query, fixing the underlying foundation of your database and its relations will, long-term, make your querying easier.
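To illustrate how the ID-based design would be queried, here is a small sketch that loads the modified table into an in-memory SQLite database from Python and resolves each child's parents with two self joins. The table name people is an assumption; the rows are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        Id INTEGER PRIMARY KEY, Name TEXT, Surname TEXT,
        FatherID INTEGER, MotherID INTEGER, SpouseID INTEGER
    )
""")
# "people" is a hypothetical table name; the data follows the modified example.
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "John", "Green", 5, 6, None),
    (2, "Michael", "Sloan", 4, 3, None),
    (3, "Lilly", "Sloan", None, None, 4),
    (4, "Barry", "Sloan", None, None, 3),
    (5, "James", "Green", 9, 10, 6),
    (6, "Sue", "Green", 7, 8, 5),
    (7, "Bill", "Jones", None, None, 8),
    (8, "Martha", "Jones", None, None, 7),
    (9, "Brian", "Green", None, None, 10),
    (10, "Beth", "Smith-Green", None, None, 9),
])

# Join each child row (c) to its father (f) and mother (m) by ID.
# Inner joins drop people whose parents are not recorded.
parents = conn.execute("""
    SELECT c.Name, f.Name, m.Name
    FROM people AS c
    JOIN people AS f ON f.Id = c.FatherID
    JOIN people AS m ON m.Id = c.MotherID
    ORDER BY c.Id
""").fetchall()
for child, father, mother in parents:
    print(child, "<-", father, "and", mother)
```

Because the joins compare IDs rather than names, two different people who happen to share a name cannot be confused.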

Try this:
SELECT
  T2.*
FROM employees T1 -- T1 is the child row
JOIN employees T2 -- T2 is the matching parent row
  ON T2.Name = T1.FatherName
  AND T2.WifeName = T1.MotherName
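The two approaches above, the EXISTS filter and the self join, can be compared on the question's data with the sketch below, run through SQLite from Python. One difference worth seeing: the join returns a parent once per matching child, while EXISTS returns each parent once. To make that visible, the sketch adds one extra child row (Id 6, Anna) that is purely hypothetical and not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        Id INTEGER, Name TEXT, Surname TEXT, FatherName TEXT,
        MotherName TEXT, WifeName TEXT, Pincode INTEGER, isChild INTEGER
    )
""")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, "John", "Green", "James", "Sue", None, 101011, 1),
    (2, "Michael", "Sloan", "Barry", "Lilly", None, 101011, 1),
    (3, "Sally", "Green", "Andrew", "Molly", "Jemi", 101011, 1),
    (4, "Barry", "Sloan", "Soul", "Paul", "Lilly", 101011, 0),
    (5, "James", "Green", "Ned", "White", "Sue", 101011, 0),
    # Hypothetical extra row: a second child of James and Sue.
    (6, "Anna", "Green", "James", "Sue", None, 101011, 1),
])

# Self join: one output row per (child, parent) match.
join_ids = [r[0] for r in conn.execute("""
    SELECT p.Id
    FROM employees AS c
    JOIN employees AS p
      ON p.Name = c.FatherName AND p.WifeName = c.MotherName
    ORDER BY p.Id
""")]

# EXISTS: one output row per parent that has at least one child.
exists_ids = [r[0] for r in conn.execute("""
    SELECT p.Id
    FROM employees AS p
    WHERE EXISTS (
        SELECT 1 FROM employees AS c
        WHERE c.FatherName = p.Name AND c.MotherName = p.WifeName
    )
    ORDER BY p.Id
""")]

print(join_ids)    # parent 5 appears twice because of the extra child
print(exists_ids)
```

Adding DISTINCT to the join version is one way to collapse the repeated parent rows.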

SQL for joining two tables and grouping by shared column

I want to join two tables and use the column they both share to group the results, including a null result for those accountIds which only appear in one table.
Table a
AccountId  productApurchases
---------  -----------------
Steve      1
Jane       5
Bill       10
Abed       2
Table b
AccountId  productBpurchases
---------  -----------------
Allan      1
Jane       10
Bill       2
Abed       1
Mike       2
Desired output
AccountId  productApurchases  productBpurchases
---------  -----------------  -----------------
Steve      1                  0
Jane       5                  10
Bill       10                 2
Abed       2                  1
Mike       0                  2
I've been trying with various joins but cannot figure out how to group by all the account ids.
Any advice much appreciated, thanks.

Use full join:
select accountid,
       coalesce(productApurchases, 0) as productApurchases,
       coalesce(productBpurchases, 0) as productBpurchases
from a
full join b using (accountid);
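For trying this out locally, here is a sketch in Python with SQLite. SQLite only gained FULL JOIN in version 3.39, so the sketch deliberately swaps in a different technique that gives the same result for this data: UNION ALL over both tables followed by a SUM per account. It assumes table b's count column is named productBpurchases, as in the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (AccountId TEXT, productApurchases INT)")
conn.execute("CREATE TABLE b (AccountId TEXT, productBpurchases INT)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [("Steve", 1), ("Jane", 5), ("Bill", 10), ("Abed", 2)])
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [("Allan", 1), ("Jane", 10), ("Bill", 2), ("Abed", 1), ("Mike", 2)])

# Emulation of the full join: stack both tables with a zero in the column
# each one lacks, then add the columns up per account.
merged = conn.execute("""
    SELECT AccountId,
           SUM(a_count) AS productApurchases,
           SUM(b_count) AS productBpurchases
    FROM (
        SELECT AccountId, productApurchases AS a_count, 0 AS b_count FROM a
        UNION ALL
        SELECT AccountId, 0, productBpurchases FROM b
    )
    GROUP BY AccountId
    ORDER BY AccountId
""").fetchall()
for row in merged:
    print(row)
```

Note that Allan shows up with 0 purchases of product A: a full join, like this emulation, keeps account ids that appear in either table.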

SQL statements without group by

If I wanted to find all values in a table that occur more than twice without using group by, how would I do that? I understand how to do this with group by and was curious how to do it without group by (EDIT: could you do this with join?).
For example, if I had last names in a certain zip code, and I wanted to find entries with this last name more than twice, how would I do this without group by in SQL statements?
I tried
select name, count() from population order by name asc having count() > 2;
but that doesn't do what I want it to. Any suggestions?

Since this is tagged only as sql, it seems a general solution is being looked for. Since the SQL:2003 revision, it should be fair to say that this can be solved with window functions:
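SELECT name FROM (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rn, -- "rank" is a reserved word in some dialects
    name
  FROM population
) s
WHERE rn = 3 -- a third row exists only for names that occur more than twice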
Anyway, the fact that it is possible to solve this without a GROUP BY doesn't mean that it should :)
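A quick way to check the window-function approach is the sketch below, which runs it in SQLite from Python. The single-column population table and its sample names are assumptions made for illustration, and the row-number alias is rn here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (name TEXT)")
# Hypothetical sample data: John and Krishna occur three times each.
names = ["John"] * 3 + ["Ram"] + ["Kelly"] * 2 + ["Krishna"] * 3 + ["David"] * 2
conn.executemany("INSERT INTO population VALUES (?)", [(n,) for n in names])

# Number the rows within each name; only names with at least three rows
# ever reach rn = 3, and each such name reaches it exactly once.
frequent = conn.execute("""
    SELECT name FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rn,
               name
        FROM population
    ) AS s
    WHERE rn = 3
    ORDER BY name
""").fetchall()
print(frequent)
```

Filtering on rn = 3 rather than rn >= 3 is what keeps each qualifying name from being listed more than once.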

It seems a correlated subquery can work for you. Please check.
Assuming the data set given below:
Id Zip Lastname
--- ----- --------
101 12345 John
102 12345 John
103 12345 John
104 12345 Ram
105 12345 Kelly
106 12345 Kelly
107 45678 Krishna
108 45678 Krishna
109 45678 Krishna
110 45678 David
111 45678 David
Query
select * from test.population pop1
where 2 < (select count(*) from test.population pop2
           where pop1.Lastname = pop2.Lastname and pop1.Zip = pop2.Zip)
The output of the above query is
Id Zip Lastname
--- ------ --------
101 12345 John
102 12345 John
103 12345 John
107 45678 Krishna
108 45678 Krishna
109 45678 Krishna

SQL Query that leaves duplicate fields blank?

Suppose my data is this:
ID Name
0  Jim
1  Dave
1  Bob
1  John
2  Ann
I would like my query to return this:
ID Name
0  Jim
1  Dave
   Bob
   John
2  Ann
Is there a simple way to do this?
Edit:
The query which returns the original 2 columns of data would be something like this:
SELECT ID, NAME
FROM TestTable
ORDER BY ID
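One possible approach, sketched here with SQLite from Python, is to number the rows within each ID and show the ID only on the first one. This is an illustration under assumptions rather than a definitive answer: the alias shown_id is made up, and SQLite's built-in rowid stands in for whatever column defines the order of names within an ID, since the table has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (ID INT, NAME TEXT)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)",
                 [(0, "Jim"), (1, "Dave"), (1, "Bob"), (1, "John"), (2, "Ann")])

# Show the ID on the first row of each group and an empty string afterwards.
# rowid (insertion order) is a SQLite-specific stand-in for a real ordering column.
blanked = conn.execute("""
    SELECT CASE
             WHEN ROW_NUMBER() OVER (PARTITION BY ID ORDER BY rowid) = 1
             THEN CAST(ID AS TEXT)
             ELSE ''
           END AS shown_id,
           NAME
    FROM TestTable
    ORDER BY ID, rowid
""").fetchall()
for shown_id, name in blanked:
    print(f"{shown_id:<2} {name}")
```

Blanking repeated values is often easier to do in the reporting layer, but the query form works when only SQL is available.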

Merging two rows to one while replacing null values

Let's say I've got the following database table
Name | Nickname | ID
----------------------
Joe Joey 14
Joe null 14
Now I want to do a select statement that merges these two rows into one while replacing the null values. The result should look like this:
Joe, Joey, 14
Which sql statement manages this (if it's even possible)?

Simplest solution:
SQL> select * from t69
  2  /
NAME       NICKNAME           ID
---------- ---------- ----------
Joe        Joey               14
Joe                           14
Michael                       15
           Mick               15
           Mickey             15
SQL> select max(name) as name
  2       , max(nickname) as nickname
  3       , id
  4  from t69
  5  group by id
  6  /
NAME       NICKNAME           ID
---------- ---------- ----------
Joe        Joey               14
Michael    Mickey             15
SQL>
If you have 11gR2 you could use the new-fangled LISTAGG() function but otherwise it is simple enough to wrap the above statement in a SELECT which concatenates the NAME and NICKNAME columns.
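The wrapping step mentioned above can be sketched as follows, using SQLite from Python instead of Oracle so it can be run anywhere. The rows mirror the t69 listing, on the assumption that Mick and Mickey are nicknames stored on rows with no name. The `||` concatenation operator is common to both databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t69 (name TEXT, nickname TEXT, id INT)")
conn.executemany("INSERT INTO t69 VALUES (?, ?, ?)", [
    ("Joe", "Joey", 14),
    ("Joe", None, 14),
    ("Michael", None, 15),
    (None, "Mick", 15),    # assumed: nickname-only row
    (None, "Mickey", 15),  # assumed: nickname-only row
])

# MAX ignores NULLs, so each group keeps its largest non-null value;
# the outer expression joins the three columns into one string.
merged_rows = [r[0] for r in conn.execute("""
    SELECT MAX(name) || ', ' || MAX(nickname) || ', ' || id AS merged
    FROM t69
    GROUP BY id
    ORDER BY id
""")]
print(merged_rows)
```

When a group has several distinct nicknames, MAX simply picks the alphabetically last one, which is why Mickey wins over Mick here.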

AFAIK, the question is not clear, so I am making some assumptions here.
In your output, the first and third columns are the same for both rows; only the second field is different.
So you can simply write a select query:
select one.name, two.nickname, one.id from
(select name, id from your_tb group by name, id) one,
your_tb two
where two.nickname is not NULL
and two.name = one.name
and two.id = one.id;
Maybe this can be tuned, but I am not good at tuning SQL queries; this is what I think you need.
