Hive join with a where query

I basically want to do a cross join of my store-product master list with the calendar table, which has all possible dates. However, I want to filter the calendar down to one year (365 days) before joining it with the master list.
I am trying the following query:
select * from ( select a.store_id,a.product_id from mez_2018_store_product_lst) a cross join
(select b.day_id,cast(to_date(from_unixtime(unix_timestamp(b.day_date, 'yyyy-MM-dd'))) as b.date from calendar where day_id>=20170101 and day_id<=20180101 ) b
And I keep getting an EOF error.
Can you help?

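Try the query below. In the original, the cast( call is never closed and has no target type, which is the likely source of the EOF error; the a. and b. prefixes were also used inside the subqueries, where those aliases are not in scope.
hive> select * from
(select store_id,
product_id from mez_2018_store_product_lst) a
cross join
(select day_id,
to_date(from_unixtime(unix_timestamp(day_date, 'yyyy-MM-dd'))) dt from calendar
where day_id>=20170101 and day_id<=20180101 ) b;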
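The same pattern (filter the calendar inside a subquery, then cross join) can be checked on a tiny data set. The following is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for Hive, the sample rows are made up, and day_date is passed through unchanged instead of going through Hive's date functions.

```python
import sqlite3

# Hypothetical miniature versions of the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mez_2018_store_product_lst (store_id INTEGER, product_id INTEGER);
    CREATE TABLE calendar (day_id INTEGER, day_date TEXT);
    INSERT INTO mez_2018_store_product_lst VALUES (1, 10), (1, 11), (2, 10);
    INSERT INTO calendar VALUES
        (20161231, '2016-12-31'),
        (20170101, '2017-01-01'),
        (20170102, '2017-01-02'),
        (20180102, '2018-01-02');
""")

# The calendar is filtered inside its own subquery, so only the days in
# range take part in the cross join.
rows = conn.execute("""
    SELECT a.store_id, a.product_id, b.day_id, b.dt
    FROM (SELECT store_id, product_id FROM mez_2018_store_product_lst) a
    CROSS JOIN
         (SELECT day_id, day_date AS dt
          FROM calendar
          WHERE day_id >= 20170101 AND day_id <= 20180101) b
    ORDER BY a.store_id, a.product_id, b.day_id
""").fetchall()

# 3 store-product pairs x 2 calendar days inside the range = 6 rows.
print(len(rows))
```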

Related

My SQL LEFT JOIN statement has multiple matches on the table on the right. I would only like to return a single match containing the next date

My SQL statement:
SELECT c.*, s.followup FROM clients c LEFT JOIN scans s ON c.id=s.client_id
The scans table joins to the clients table on the client_id column. The scans table can have multiple entries for each client and has a followup column containing dates. I would like to return only the scan that has the date closest to today.
When I attempted this using a WHERE condition, it eliminated items from the left table that didn't have a followup date.
If you use a WHERE in a subquery, you can select the rows you want to include:
SELECT c.*, s.followup
FROM clients c
LEFT JOIN (
SELECT client_id, MIN(followup) AS followup
FROM scans
WHERE followup > CURDATE()
GROUP BY client_id
) s
ON c.id=s.client_id
SELECT c.*, s.min
FROM clients c
LEFT JOIN (
SELECT client_id, MIN(followup)
FROM scans
WHERE followup > CURRENT_DATE
GROUP BY client_id
) AS s
ON c.id=s.client_id
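Compared with the query above, the main change is s.followup to s.min in line 1. In this version the subquery's aggregate has no alias, so it only works where the database names that column min by default (PostgreSQL does); an explicit alias such as MIN(followup) AS followup is more portable.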
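To see that the subquery approach keeps clients without an upcoming follow-up, here is a small sketch using SQLite through Python's sqlite3 module. The sample rows and the name column are invented for illustration, and date('now') is SQLite's counterpart of CURDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scans (client_id INTEGER, followup TEXT);
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO scans VALUES
        (1, '2000-01-01'),   -- past date, filtered out by the subquery
        (1, '2999-06-01'),
        (1, '2999-03-01'),   -- earliest future follow-up for client 1
        (2, '2000-05-05');   -- client 2 only has past follow-ups
""")

# The date filter lives inside the subquery, so it cannot remove rows
# of the left table; clients without a match just get NULL.
rows = conn.execute("""
    SELECT c.id, c.name, s.followup
    FROM clients c
    LEFT JOIN (
        SELECT client_id, MIN(followup) AS followup
        FROM scans
        WHERE followup > date('now')
        GROUP BY client_id
    ) s ON c.id = s.client_id
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Clients 2 and 3 still appear, with None in the followup column.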

Avoid using CROSS JOIN on my SQL query (too heavy)

I am working on an SQL query to define customer types. The goal is to differentiate the old active customers from the churned customers (churn = customers that stopped using your company's product or service during a certain time frame).
To do that, I came up with this query, which works perfectly:
WITH customers AS (
SELECT
DATE(ord.delivery_date) AS date,
ord.customer_id
FROM table_template AS ord
WHERE cancel_date IS NULL
AND order_type_id IN (1,3)
GROUP BY DATE(ord.delivery_date), ord.customer_id, ord.delivery_date),
days AS (SELECT DISTINCT date FROM customers),
recap AS (
SELECT * FROM (
SELECT
a1.date,
a2.customer_id,
MAX(a2.date) AS last_order,
DATE_DIFF(a1.date, MAX(a2.date), day) AS days_since_last,
MIN(a2.date) AS first_order,
DATE_DIFF(a1.date, MIN(a2.date), day) AS days_since_first
FROM days AS a1
CROSS JOIN customers AS a2 WHERE a2.date <= a1.date
GROUP BY a1.date, customer_id)
)
SELECT * FROM recap
The only issue with this query is that the calculation is too heavy (it uses a lot of CPU seconds). I think this is due to the CROSS JOIN.
I need help finding another way to get the same output without a CROSS JOIN. Do you think it is possible?
As you mentioned, the query taking a long time to load was due to an internet issue. I will also try to explain INNER JOIN further with the sample query below:
SELECT distinct a1.id,a1.date
FROM `table1` AS a1
INNER JOIN `table2` AS a2
ON a2.date <= a1.date
An INNER JOIN returns every pair of rows from the two tables that satisfies the join condition. In this sample query the condition is a2.date <= a1.date, so a row of table1 appears in the result only if its date is greater than or equal to the date of at least one row in table2.
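As a quick check that moving the condition from WHERE into an INNER JOIN ... ON clause does not change the output, the sketch below runs both forms on a made-up customers table. It uses SQLite through Python's sqlite3 module rather than the original engine, leaves out the DATE_DIFF columns, and says nothing about how either form performs on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (date TEXT, customer_id INTEGER);
    INSERT INTO customers VALUES
        ('2024-01-01', 1),
        ('2024-01-03', 1),
        ('2024-01-02', 2);
""")

# Same aggregation as the recap step, with the join clause left as a slot.
template = """
    SELECT a1.date, a2.customer_id,
           MAX(a2.date) AS last_order,
           MIN(a2.date) AS first_order
    FROM (SELECT DISTINCT date FROM customers) AS a1
    {join}
    GROUP BY a1.date, a2.customer_id
    ORDER BY a1.date, a2.customer_id
"""
cross = conn.execute(template.format(
    join="CROSS JOIN customers AS a2 WHERE a2.date <= a1.date")).fetchall()
inner = conn.execute(template.format(
    join="INNER JOIN customers AS a2 ON a2.date <= a1.date")).fetchall()

print(cross == inner)
for row in inner:
    print(row)
```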

Pivot table in SQL Server results in error

I'm currently trying to learn pivot tables; here is my table diagram.
I want to generate one row per branch name and one column per month, containing the sum of total sales.
SELECT *
FROM
(SELECT
BRANCH.NAME, SALES.TOTAL, TIME.MONTH
FROM
SALES
INNER JOIN
BRANCH ON SALES.BRANCH_ID = BRANCH.BRANCH_ID
INNER JOIN
TIME ON SALES.TIME_ID = TIME.TIME_ID
) AS TABLE1
PIVOT (
SUM(SALES.TOTAL) FOR TIME.MONTH IN ([APR],[MAY],[JUN])
) PIVOTTABLE
It shows an error:
The column prefix 'SALES' does not match with a table name or alias name used in the query.
Is there a problem with my table structure, or is my query just wrong?
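Remove the SALES and TIME prefixes, or use TABLE1 instead. The PIVOT clause operates on the derived table TABLE1, so the table names used inside the subquery are not visible to it: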
PIVOT (
SUM(TOTAL) FOR MONTH IN ([APR],[MAY],[JUN])
) PIVOTTABLE
Try this:
SELECT * FROM
(
SELECT BRANCH.NAME,SALES.TOTAL,TIME.MONTH
FROM SALES
INNER JOIN BRANCH
ON SALES.BRANCH_ID=BRANCH.BRANCH_ID
INNER JOIN TIME
ON SALES.TIME_ID=TIME.TIME_ID
)AS TABLE1
PIVOT (
SUM(TABLE1.TOTAL) FOR TABLE1.MONTH IN ([APR],[MAY],[JUN])
) PIVOTTABLE
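For comparison, the same branch-by-month layout can be produced without PIVOT by using conditional aggregation (SUM over a CASE expression). This is a different technique from the PIVOT answers above, shown here only as a sketch in SQLite through Python's sqlite3 module, since SQLite has no PIVOT operator. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BRANCH (BRANCH_ID INTEGER, NAME TEXT);
    CREATE TABLE TIME (TIME_ID INTEGER, MONTH TEXT);
    CREATE TABLE SALES (BRANCH_ID INTEGER, TIME_ID INTEGER, TOTAL INTEGER);
    INSERT INTO BRANCH VALUES (1, 'North'), (2, 'South');
    INSERT INTO TIME VALUES (1, 'APR'), (2, 'MAY'), (3, 'JUN');
    INSERT INTO SALES VALUES (1, 1, 100), (1, 1, 50), (1, 2, 70), (2, 3, 30);
""")

# Outside the subquery only the TABLE1 columns exist, so NAME, TOTAL and
# MONTH are referenced without the SALES / TIME prefixes.
rows = conn.execute("""
    SELECT NAME,
           SUM(CASE WHEN MONTH = 'APR' THEN TOTAL END) AS APR,
           SUM(CASE WHEN MONTH = 'MAY' THEN TOTAL END) AS MAY,
           SUM(CASE WHEN MONTH = 'JUN' THEN TOTAL END) AS JUN
    FROM (
        SELECT BRANCH.NAME, SALES.TOTAL, TIME.MONTH
        FROM SALES
        INNER JOIN BRANCH ON SALES.BRANCH_ID = BRANCH.BRANCH_ID
        INNER JOIN TIME ON SALES.TIME_ID = TIME.TIME_ID
    ) AS TABLE1
    GROUP BY NAME
    ORDER BY NAME
""").fetchall()

for row in rows:
    print(row)
```

Months with no sales for a branch come back as NULL (None in Python).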

sql optimize a query using the join

I have a table productHistory:
productHistory (id_H , id_product , name , tsInsert);
I want to get the last entry of each product in a given period (start, end) from productHistory:
tsInsert must be between the start and the end.
I can do it like this:
select max(id_H)
from productHistory
where tsInsert>=:start and tsInsert <=:end
group by id_product;
and then select all rows from productHistory whose id_H is in the previous selection.
This query is very heavy. Is there another solution, for example using a right join?
I tried this solution:
SELECT * FROM productHistory x
INNER JOIN
(
SELECT MAX(id_H) as maxId
FROM productHistory
GROUP BY id_product
) y
ON x.id_H = y.maxId
and x.TSINSERT >=:start and x.TSINSERT <=:end
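One thing worth checking in the join attempt above is where the period filter sits. If it is applied only after the join, the subquery picks the overall latest id_H per product, and a product whose latest row falls outside the period drops out entirely. The sketch below compares both placements on invented rows, using SQLite through Python's sqlite3 module; it assumes, as the question does, that a higher id_H means a later entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE productHistory (
        id_H INTEGER PRIMARY KEY, id_product INTEGER, name TEXT, tsInsert TEXT);
    INSERT INTO productHistory VALUES
        (1, 100, 'a-v1', '2024-01-05'),
        (2, 100, 'a-v2', '2024-01-20'),
        (3, 100, 'a-v3', '2024-03-01'),  -- outside the period
        (4, 200, 'b-v1', '2024-01-10');
""")
params = {"start": "2024-01-01", "end": "2024-01-31"}

# Period filter applied after the join: MAX(id_H) is taken over all rows.
filter_outside = conn.execute("""
    SELECT x.id_H
    FROM productHistory x
    INNER JOIN (
        SELECT MAX(id_H) AS maxId
        FROM productHistory
        GROUP BY id_product
    ) y ON x.id_H = y.maxId
    WHERE x.tsInsert >= :start AND x.tsInsert <= :end
    ORDER BY x.id_H
""", params).fetchall()

# Period filter applied inside the subquery: MAX(id_H) within the period.
filter_inside = conn.execute("""
    SELECT x.id_H
    FROM productHistory x
    INNER JOIN (
        SELECT MAX(id_H) AS maxId
        FROM productHistory
        WHERE tsInsert >= :start AND tsInsert <= :end
        GROUP BY id_product
    ) y ON x.id_H = y.maxId
    ORDER BY x.id_H
""", params).fetchall()

print(filter_outside)
print(filter_inside)
```

With the filter inside the subquery, product 100 is represented by its last row within the period (id_H 2) instead of disappearing.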

SQL: Left Outer Join, different GROUP BY, need to replicate records ?

I have information about accounts in two tables (A, B).
The records in A are all unique at the account level (account_id), but in table B, accounts are identified by account_id and month_start_dt, so each account may exist in zero or more months.
The trouble is, when I left outer join A to B so the joined table contains all records from A with the records from B (by account, by month) any account that does not exist in table B for a given month does not have a record for that month.
Desired outcome: If an account does not exist in table B for a given month, create a record for that account in the joined table with month_start_dt and 0 for all variables being selected from B.
As it stands, I can get the join to work where all accounts not appearing in B (not appearing at all, in any month) have 0 values for all variables being selected from B (using nvl(variable, 0) ) but, these accounts only have a single record. They should have one for each month.
Create a temp table with the number of records you want for the non-existing rows, and right join it to the result of the first query.
select tbl.* from ( select * from A left join B on a.col1 = b.col2) tbl join tmpTable on tbl.col2 = tmpTable.zerocol
Try this.
I don't see why you need an outer join. This uses Standard SQL's EXCEPT (MINUS in Oracle):
SELECT account_id, month_start_dt, all_variables
FROM B
UNION
(
SELECT account_id, month_start_dt, 0 AS all_variables
FROM A
CROSS JOIN (
SELECT DISTINCT month_start_dt
FROM B
) AS DT1
EXCEPT
SELECT account_id, month_start_dt, 0 AS all_variables
FROM B
);
You could use a tally Calendar table, with months (of several years). See this similar question: How to create a Calender table for 100 years in Sql
And then have:
FROM
A
CROSS JOIN
( SELECT y
, m
FROM Calendar
WHERE ( y = #start_year
AND m >= #start_month
)
OR ( y > #start_year
AND y < #end_year
)
OR ( y = #end_year
AND m <= #end_month
)
) AS C
LEFT JOIN
B
ON B.account_id = A.account_id
AND YEAR(B.start_date) = C.y
AND MONTH(B.start_date) = C.m
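The cross-join-then-left-join idea can be tried end to end on a small example. The sketch below uses SQLite through Python's sqlite3 module; the amount column and the sample rows are hypothetical, and the month list is taken from the distinct months in B rather than from a separate Calendar table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (account_id INTEGER PRIMARY KEY);
    CREATE TABLE B (account_id INTEGER, month_start_dt TEXT, amount INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES
        (1, '2024-01-01', 10),
        (1, '2024-02-01', 20),
        (2, '2024-02-01', 5);
""")

# Step 1: A x months gives one row per account per month.
# Step 2: the LEFT JOIN attaches B where it exists; COALESCE fills in 0.
rows = conn.execute("""
    SELECT A.account_id, M.month_start_dt, COALESCE(B.amount, 0) AS amount
    FROM A
    CROSS JOIN (SELECT DISTINCT month_start_dt FROM B) AS M
    LEFT JOIN B
      ON B.account_id = A.account_id
     AND B.month_start_dt = M.month_start_dt
    ORDER BY A.account_id, M.month_start_dt
""").fetchall()

for row in rows:
    print(row)
```

Account 3, which never appears in B, still gets one zero row for each month.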