How can I add one more table to this hive command?

How can I add one more table to this hive command? - hive

dt1=$1
desfile="data_$dt1"
hql="
select DISTINCT b.mid, b.create_time, d.content from
(select mid, create_time, to_id, dt from table_1 where dt>=$dt1 and to_id='1042015' ) b
join
(select mid, content from table_2 where dt>=$dt1) d
on(b.mid=d.mid)
"
hive -e "$hql"> $desfile
This query is to get content from two tables table_1 and table_2, different fields from different tables. If I want to get one more field from another table table_3 conditioned on the same 'mid' field, should I change the query to this form below:
hql="
select DISTINCT a.off_time, b.mid, b.create_time, d.content from
(select mid, create_time, to_id, dt from table_1 where dt>=$dt1 and to_id='1042015' ) b
join
(select mid, content from table_2 where dt>=$dt1) d
join
(select mid, off_time from table_3 where dt>=dt1) a
on(b.mid=d.mid=a.mid)
"

You can't join on all three tables at the same time, but you should join b and d first, and then join to a.
select DISTINCT a.off_time, b.mid, b.create_time, d.content from
(select mid, create_time, to_id, dt from table_1 where dt>=$dt1 and to_id='1042015') b
join
(select mid, content from table_2 where dt>=$dt1) d
on(b.mid = d.mid)
join
(select mid, off_time from table_3 where dt>=$dt1) a
on(b.mid = a.mid)

Related

Joining with a distinct column

How do I join a table with a distinct value in a SQL view?
My code looks like below
Select a, b, c, d from TableA inner join TableB on TableA.account = TableB.account
I want TableB.account to have distinct values when I join the table.
The selected fields(a,b,c,d) do not have to be distinct values.

SELECT distinct_foo.thing
, bar.other_things
FROM (
SELECT DISTINCT
thing
FROM foo
) AS distinct_foo
INNER
JOIN bar
ON bar.thing = distinct_foo.thing
;

Find common values in many tables

I have 5 tables and I want to find the common values between them in one column. The column name differs in two tables(account_number, account,account_id).
select * from db.table1 as a
INNER JOIN db.table2 as b
ON a.account = b.account_id
INNER JOIN db.table3 as c
ON a.account = c.account_number
INNER JOIN db.table4 as d
ON a.account = d.account_number
INNER JOIN db.table_5 as f
ON a.account = f.account_number;`
I tried the above with the idea of every time compare the last result of the inner join with the new one but it seems wrong.

Your method is okay, but it can return duplicates. Another method is to use union all:
select account
from ((select account, 'a' as which from db.table1) union all
(select account_id, 'b' as which from db.table2) union all
(select account_number, 'c' as which from db.table3) union all
(select account_number, 'd' as which from db.table4) union all
(select account_number, '3' as which from db.table5)
) t
group by account
having count(distinct which) = 5;
Note that this can be easily tweaked to get accounts that are in three or four tables rather than in all of them.

oracle12c,sql,difference between count(*) and sum()

Tell me the difference between sql1 and sql2:
sql1:
select count(1)
from table_1 a
inner join table_2 b on a.key = b.key where a.id in (
select id from table_1 group by id having count(1) > 1
)
sql2:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b on a.key = b.key group by a.id having count(1) > 1
)
Why is the output not the same?

The queries are not even similar. They are very different. Let's check the first one:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
where a.id in (
select id from table_1 group by id having count(1) > 1
) ;
You are first making an inner join:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
In this case, you can use count(1), count(id), count(*), it's equivalent. You are counting the common elements in both tables: those ones that have in common the key field.
After that, you are enforcing this:
where a.id in (
select id from table_1 group by id having count(1) > 1
)
In other words, that every "id" of the table_1 must be at least two times in the table_1 table.
And lastly, you are doing this:
select count(1)
In other words, counting those elements. So, translated into english you have done this:
get every record of table_1 and pair with records of table_2 for the id, and get only those that match
for the result above, filter out only the elements whose id of the table_1 appears more than one time
count that result
Let's see what happens with the second query:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
group by a.id
having count(1) > 1
);
You are making the same inner join:
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
but, you are grouping it by the id of the table:
group by a.id
and then filtering out only those elements who appear more than one time:
having count(1) > 1
The result so far are a set of records that have in common the key field in both tables, but grouped by the id: this means that only those fields that are at leas two times in the table_b are outputed of this join. After that, you group by id, collapsing those results into the table_1.id field and counting the result. I presume that very few records will match this strict criteria.
And lastly, you sum all those set.

When you use count(*) you count ALL the rows. The SUM() function is an aggregate function that returns the sum of all or distinct values in a set of values.

PostgreSQL: Error in left join

I am trying to join my master table to some sub-tables in PostgreSQL in a single select query. I am getting a syntax error and I have the feeling I am making a terrible mistake or doing something which is not allowed. The code:
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2) tbl2 using (id)
left join
-- I get syntax error here
(
With a as (select id from some_table),
b as (Select value from other_table)
Select id, value from a, b) tbl3 using (id)
order by tbl1.id
Can we use WITH clause in left joins sub or nested queries and Is there a better way to do this?
UPDATE1
Well, I would like to add some more details. I have three select queries like this (having unique ID) and I want to join them based on ID.
Query1:
With a as (Select id, my_other records... from postgres_table1)
b as (select id, my_records... from postgres_table2)
c as (select id, my_record.. from postgres_table3, b)
Select
id,
my_records
from a left join c on some_condtion_with_a
order by 1
Second query:
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
Third Query:
With d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
select
id,
records..
from f
I tried to join them using left join. The first two queries were joined successfully. While, joining third query I got the syntax error. Maybe, I am complicating things thus I asked is there a better way to do it.

You are over complicating things. There is no need to use a derived table to outer join my_table2. And there is no need for a CTE plus a derived table to join the tbl3 alias:
Select id,
length,
other_stuff
from my_table tbl1
Left join my_table2 tbl2 using (id)
left join (
select st.id, ot.value
from some_table st
cross join other_table ot
) tbl3 using (id)
order by tbl1.id;
This assumes that the cross join you create with Select id, value from a, b is intended.

Not tested, but I think you need this. try:
with a as (select id from some_table),
b as (Select value from other_table)
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2
)
tbl2 using (id)
left join
(
Select id, value from a, b
)
tbl3 using (id)
order by tbl1.id

I've only ever seen/used WITH in the following format:
WITH
temptablename(columns) as (query),
temptablename2(columns) as (query),
...
temptablenameX(columns) as (query)
SELECT ...
i.e. they come first
You'll probably find it easier to write queries if you use indentation to describe nesting levels. I like to make my SELECT FROM WHERE GROUPBY ORDERBY at one indent level, and then tablename INNER JOIN ON etc more indented:
SELECT
column
FROM
table
INNER JOIN
(
SELECT subcolumn FROM subtable WHERE subclause
) myalias
ON
table.id = myalias.whatever
WHERE
blah
Organising your indents every time you nest down a layer really helps. By making everything that is "a table or a block of data like a table (i.e. a subquery)" indented the same amount you can easily see the notional order that the DB should retrieve
Move your WITHs to the top of the statement, you will still use the alias names in place in the sub sub query of course
Looking at your query, there isn't much point in your subqueries.. You don't do any grouping or particularly complex processing of the data, you just select an ID and another column and then join it in. Your query will be simpler if you don't do this:
SELECT
column
FROM
table
INNER JOIN
(
SELECT subcolumn FROM subtable WHERE subclause
) myalias
ON
table.id = myalias.whatever
WHERE
blah
Instead, do this:
SELECT
column
FROM
table
INNER JOIN
subtable
ON
table.id = subtable.id
WHERE
blah

Re your updated requirements, following the same pattern.
look for --my comments
With a as (Select id, my_other records... from postgres_table1)
b as (select id, my_records... from postgres_table2)
c as (select id, my_record.. from postgres_table3, b)
d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
SELECT * FROM
(
--your first
Select
id,
my_records
from a left join c on some_condtion_with_a
) Q1
LEFT OUTER JOIN
(
--your second
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
) Q2
ON Q1.XXXX = Q2.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
LEFT OUTER JOIN
(
--your third
select
id,
records..
from f
) Q3
ON QX.XXXXX = Q3.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
It'll work, but it might not be the prettiest or most necessary SQL arrangement. As both i and HWNN have said, you can rewrite a lot of these queries where you're just doing some simple selecting in your WITH.. But likely that theyre simple enough that the database optimizer can also see this and rerwite the query for you when it runs it
Just remember to code clearly, and lay your indentation out nicely to stop it tunring into a massive, unmaintainable, undebuggable spaghetti mess

SQL Select Distinct with Conditional

Table1 has columns (id, a, b, c, group). There are several rows that have the same group, but id is always unique. I would like to SELECT group,a,b FROM Table1 WHERE the group is distinct. However, I would like the returned data to be from the row with the greatest id for that group.
Thus, if we have the rows
(id=10, a=6, b=40, c=3, group=14)
(id=5, a=21, b=45, c=31, group=230)
(id=4, a=42, b=65, c=2, group=230)
I would like to return these 2 rows:
[group=14, a=6,b=40] and
[group=230, a=21,b=45] (because id=5 > id=4)
Is there a simple SELECT statement to do this?

Try:
select grp, a, b
from table1 where id in
(select max(id) from table1 group by grp)

You can do it using a self join or an inner-select. Here's inner select:
select `group`, a, b from Table1 AS T1
where id=(select max(id) from Table1 AS T2 where T1.`group` = T2.`group`)
And self-join method:
select T1.`group`, T2.a, T2.b from
(select max(id) as id,`group` from Table1 group by `group`) T1
join Table1 as T2 on T1.id=T2.id

2 selects, your inner select gets:
SELECT MAX(id) FROM YourTable GROUP BY [GROUP]
Your outer select joins to this table.
Think about it logically, the inner select gets a sub set of the data you need.
The outer select inner joins to this subset and can get further data.
SELECT [group], a, b FROM YourTable INNER JOIN
(SELECT MAX(id) FROM YourTable GROUP BY [GROUP]) t
ON t.id = YourTable.id

SELECT mi.*
FROM (
SELECT DISTINCT grouper
FROM mytable
) md
JOIN mytable mi
ON mi.id =
(
SELECT id
FROM mytable mo
WHERE mo.grouper = md.grouper
ORDER BY
id DESC
LIMIT 1
)
If your table is MyISAM or id is not a PRIMARY KEY, then make sure you have a composite index on (grouper, id).
If your table is InnoDB and id is a PRIMARY KEY, then a simple index on grouper will suffice (id, being a PRIMARY KEY, will be implictly included).
This will use an INDEX FOR GROUP-BY to build the list of distinct groupers, and for each grouper it will use the index access to find the maximal id.

Don't know how to do it in mysql. But the following code will work for MsSQL...
SELECT Y.* FROM
(
SELECT DISTINCT [group], MAX(id) ID
FROM Table1
GROUP BY [group]
) X
INNER JOIN Table1 Y ON X.ID=Table1.ID

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How can I add one more table to this hive command? - hive

Related

Joining with a distinct column

Find common values in many tables

oracle12c,sql,difference between count(*) and sum()

PostgreSQL: Error in left join

SQL Select Distinct with Conditional

Categories

Resources