Hive - select rows within 1 year of earliest date - sql

I am trying to select all rows in a table that are within 1 year of the earliest date in the table. I'm using the following code:
select *
from baskets a
where activitydate < (select date_add((select min(activitydate) mindate_a from baskets), 365) date_b from baskets)
limit 10;
but get the following error message:
Error while compiling statement: FAILED: ParseException line 1:55 cannot recognize input near 'select' 'date_add' '(' in expression specification
Total execution time: 00:00:00.338
Any suggestions?
EDIT:
With this code:
select *
from baskets a
where activitydate < (select date_add(min(activitydate), 365) from baskets)
limit 10;
I'm getting this error:
Error while compiling statement: FAILED: ParseException line 1:55 cannot recognize input near 'select' 'date_add' '(' in expression specification

I'd be tempted to use window functions:
select b.*
from (select b.*, min(activitydate) over () as min_ad
from baskets b
) b
where activitydate < add_months(min_ad, 12);
If you really want your syntax to work, try reducing the number of selects:
where activitydate < (select date_add(min(activitydate), 365) from baskets)

Use a JOIN instead of a subquery in the WHERE clause. I don't think Hive supports a subquery in the WHERE clause with a < comparison; as of Hive 0.13, only IN and EXISTS subqueries can be used there.
See: Language Manual SubQueries
SELECT a.*
FROM baskets a
JOIN (SELECT DATE_ADD(MIN(activitydate), 365) maxdate
FROM baskets) b
ON a.activitydate < b.maxdate
LIMIT 10;
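If your Hive version rejects the inequality in the ON clause (as far as I know, older releases accept only equality conditions there, and non-equi join conditions arrived around Hive 2.2), the same idea can be sketched as a cross join with the comparison moved into WHERE. The derived table returns a single row, so the cross join does not multiply rows. This is an untested sketch; note that strict mode may refuse Cartesian products.

```sql
-- Sketch: one-row derived table cross-joined to baskets,
-- with the date comparison done in WHERE instead of ON
SELECT a.*
FROM baskets a
CROSS JOIN (SELECT DATE_ADD(MIN(activitydate), 365) AS maxdate
            FROM baskets) b
WHERE a.activitydate < b.maxdate
LIMIT 10;
```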

Related

How to write DELETE Statement with Inner Query in PostgreSQL?

The below query works fine in Oracle DB.
The same query fails in Postgres DB, but the inner query works fine in Postgres DB.
DELETE FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY A.RESORT, A.RESV_NAME_ID ORDER BY ACTION_ID DESC) RNK
FROM STAGE_DETAILS A ) B
WHERE B.RNK>1;
I get a syntax error for this:
[Code: 0, SQL State: 42601] ERROR: syntax error at or near "(" Position: 13 [Script position: 3109 - 3110]
How can I fix this?
Thanks.
Use the following syntax:
DELETE FROM STAGE_DETAILS A
USING (SELECT ACTION_ID,RESV_NAME_ID,ROW_NUMBER() OVER (PARTITION BY RESORT, RESV_NAME_ID ORDER BY ACTION_ID DESC) RNK
FROM STAGE_DETAILS) B
WHERE B.RNK>1 AND B.ACTION_ID = A.ACTION_ID AND B.RESV_NAME_ID = A.RESV_NAME_ID;
You cannot delete from a subquery in Postgres. This should do what you want:
DELETE FROM STAGE_DETAILS SD
WHERE ACTION_ID < (SELECT MAX(SD2.ACTION_ID)
FROM STAGE_DETAILS SD2
WHERE SD2.RESORT = SD.RESORT AND
SD2.RESV_NAME_ID = SD.RESV_NAME_ID
);
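Before running a DELETE like this, it may help to preview which rows rank below the newest one in each group. A minimal sketch that reuses the inner query from the question as a plain SELECT, assuming ACTION_ID distinguishes rows within each RESORT / RESV_NAME_ID group:

```sql
-- Preview only: lists the rows with RNK > 1, deletes nothing
SELECT B.RESORT, B.RESV_NAME_ID, B.ACTION_ID
FROM (SELECT A.RESORT, A.RESV_NAME_ID, A.ACTION_ID,
             ROW_NUMBER() OVER (PARTITION BY A.RESORT, A.RESV_NAME_ID
                                ORDER BY A.ACTION_ID DESC) AS RNK
      FROM STAGE_DETAILS A) B
WHERE B.RNK > 1;
```

If the row count matches what you expect to remove, the DELETE should be safe to run.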

how to run a different select statement based on condition in Hive SQL

I would like to know how to run a different select statement based on condition in Hive SQL.
The query below does not work; it throws this error:
Error while compiling statement: FAILED: ParseException line 4:2 cannot recognize input near '(' 'SELECT' '1' in expression specification
SELECT
CASE WHEN '${UN}'!= '' THEN
(
SELECT *
from table1 t
WHERE t.yymmddval BETWEEN '${D1}' AND '${D2}'
AND t.un in ('${UN}')
)
ELSE
(
SELECT *
from table1 t
WHERE t.yymmddval BETWEEN '${D1}' AND '${D2}'
AND t.un in (
(SELECT
o.unq_num as un
FROM table2 as o
WHERE o.date >= '2017-01-01'
AND upper(o.srl_num) in ('${R}')
LIMIT 1)
)
)
END
Combine both queries with UNION ALL and add a switching condition to each branch, so that only the matching branch returns rows:
select *
from table1 t
where (t.yymmddval BETWEEN '${D1}' and '${D2}')
and t.un in ('${UN}')
and '${UN}'!= '' --switching condition
union all
select *
from table1 t
where (t.yymmddval BETWEEN '${D1}' AND '${D2}')
and t.un in
(SELECT
o.unq_num as un
FROM table2 as o
WHERE o.date >= '2017-01-01'
AND upper(o.srl_num) in ('${R}')
LIMIT 1)
and '${UN}'= '' --switching condition

'LAG' function not working in Amazon Redshift

I'm trying to find out the retention rate by using the following query on Amazon Redshift:
WITH t AS (
SELECT ga.ownerid,
DATE_PART('month',ga.creationtime) AS month,
COUNT(*) AS item_transactions,
LAG(DATE_PART('month',ga.creationtime)) OVER (PARTITION BY ownerid ORDER BY DATE_PART('month',ga.creationtime)) = DATE_PART('month',ga.creationtime) -interval '1 month' OR NULL AS repeat_transaction
FROM flx2.groupactivities ga
JOIN auth.members m ON ga.ownerid = m.id
WHERE ga.activitytype = 'assign'
AND ga.groupid NOT IN (SELECT groupid
FROM (SELECT groupid,
COUNT(DISTINCT memberid)
FROM flx2.grouphasmembers
GROUP BY groupid
HAVING COUNT(DISTINCT memberid) = 1))
AND ga.ownerid IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 5)
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
GROUP BY ga.ownerid,
DATE_TRUNC('month',ga.creationtime)
ORDER BY ga.ownerid,
DATE_TRUNC('month',ga.creationtime)
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
but it gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT ga.ownerid,
DATE_TRUNC('month',ga.creationtime) AS month,
COUNT(*) AS item_transactions,
LAG(DATE_TRUNC('month...
[Amazon](500310) Invalid operation: ORDER/GROUP BY expression not found in targetlist;
Execution time: 0.29s
1 statement failed.
I believe there's something wrong with the LAG function here, but I'm not quite sure. I got this query from the post here and I modified it according to my requirements.
Would someone please be able to help me out with what's going wrong here?
I appreciate the help in advance.
From a quick look: LAG by itself is not an aggregate function, so repeat_transaction would need to be included in the GROUP BY.

HIVE: Error in GROUP BY Key

hive -e "select a.EMP_ID,
count(distinct c.SERIAL_NBR) as NUM_CURRENT_EMP,
count(distinct c.SERIAL_NBR)/count(distinct a.SERIAL_NBR) as DISTINCT_EMP
from ORDERS_COMBINED_EMPLOYEES as a
inner join ORDERS_EMPLOYEE_STATS as b
on a.CPP_ID = b.CPP_ID
left join ( select SERIAL_NBR, MIN(TRAN_DT) as TRAN_DT
from EMP_TXNS
group by SERIAL_NBR
) c
on c.SERIAL_NBR = a.SERIAL_NBR
where c.TRAN_DT > a.LAST_TXN_DT
group by a.EMP_ID
having (
(NUM_CURRENT_EMP >= 25 and DISTINCT_EMP > 0.01)
) ; " > EMPLOYEE_ORDERS.txt
Getting error message,
"FAILED: SemanticException [Error 10025]: Line 15:31 Expression not in GROUP BY key '0.01'".
When I ran the same query with just one condition in the HAVING clause, NUM_CURRENT_EMP >= 25, it ran fine without any issues. NUM_CURRENT_EMP is an int and DISTINCT_EMP is a float in the table where I am trying to insert the results. I can't figure out what is wrong.
Any help is appreciated.
What happens if you replace the aliases in the having with the expressions that define them?
having count(distinct c.SERIAL_NBR) >= 25 and
count(distinct c.SERIAL_NBR)/count(distinct a.SERIAL_NBR) > 0.01
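Another option, if you would rather keep filtering on the alias names, is to aggregate in an inner query and apply the conditions in an outer WHERE. A rough, untested sketch; the alias t is just a name I picked:

```sql
-- Sketch: compute the aggregates first, then filter on their aliases outside
select t.EMP_ID, t.NUM_CURRENT_EMP, t.DISTINCT_EMP
from (
  select a.EMP_ID,
         count(distinct c.SERIAL_NBR) as NUM_CURRENT_EMP,
         count(distinct c.SERIAL_NBR)/count(distinct a.SERIAL_NBR) as DISTINCT_EMP
  from ORDERS_COMBINED_EMPLOYEES a
  inner join ORDERS_EMPLOYEE_STATS b
    on a.CPP_ID = b.CPP_ID
  left join (select SERIAL_NBR, MIN(TRAN_DT) as TRAN_DT
             from EMP_TXNS
             group by SERIAL_NBR) c
    on c.SERIAL_NBR = a.SERIAL_NBR
  where c.TRAN_DT > a.LAST_TXN_DT
  group by a.EMP_ID
) t
where t.NUM_CURRENT_EMP >= 25
  and t.DISTINCT_EMP > 0.01;
```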

Join Oracle tables on an exact match, and a closest match

I am trying to join two tables of performance metrics: system stats and memory usage. Entries in these tables come in on differing time schedules. I need to join the tables by finding the exact match for System_Name in both tables, and the closest match for WRITETIME. WRITETIME uses the system's own idea of time and is NOT a standard Oracle timestamp.
I can select the closest timestamp from one table with something like:
select "Unix_Memory"."WRITETIME", ABS ('1140408134015004' - "Unix_Memory"."WRITETIME")
as Diff from "Unix_Memory"
where "Unix_Memory"."WRITETIME" > '1140408104015004' order by Diff;
The constants there will be parameterised in my script.
However when I try to expand this into my larger query:
select "System"."System_Name", "System"."WRITETIME" as SysStamp,
from "System"
join "Unix_Memory" on "System"."System_Name" = "Unix_Memory"."System_Name"
and "Unix_Memory"."WRITETIME" = (
select Stamp from (
select "Unix_Memory"."WRITETIME" as Stamp,
ABS ( "System"."WRITETIME" - "Unix_Memory"."WRITETIME") as Diff
from "Unix_Memory" where "Unix_Memory"."WRITETIME" > '1140408104015004' and rownum = 1 order by Diff
)
)
WHERE "System"."System_Name" in ('this','that', 'more')
and "System"."WRITETIME" > '1140408124015004';
I get:
Error at Command Line:38 Column:72
Error report:
SQL Error: ORA-00904: "System"."WRITETIME": invalid identifier
00904. 00000 - "%s: invalid identifier"
I have tried a few variations, but I am not getting any closer.
You must state the System table in the inner Select as well.
select "System"."System_Name", "System"."WRITETIME" as SysStamp
from "System"
join "Unix_Memory" on "System"."System_Name" = "Unix_Memory"."System_Name"
and "Unix_Memory"."WRITETIME" = (
select Stamp from (
select "Unix_Memory"."WRITETIME" as Stamp,
ABS ( "System"."WRITETIME" - "Unix_Memory"."WRITETIME") as Diff
from "Unix_Memory"
-- THE NEXT LINE IS MISSING IN YOUR CODE
INNER JOIN "System" ON "System"."System_Name" = "Unix_Memory"."System_Name"
and "System"."WRITETIME" > '1140408124015004'
-- end of missing
where "Unix_Memory"."WRITETIME" > '1140408104015004' and rownum = 1 order by Diff
)
)
WHERE "System"."System_Name" in ('this','that', 'more')
and "System"."WRITETIME" > '1140408124015004';
Unfortunately, column names from the outer query are only known one nesting level down. So "System"."WRITETIME" would be known in select Stamp from ..., but no longer in select "Unix_Memory"."WRITETIME" as Stamp ...
In any case, you would select a rather arbitrary stamp: the first "Unix_Memory"."WRITETIME" > '1140408104015004' found, to be precise, because rownum = 1 is applied before order by. You will have to rewrite your statement completely.
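To illustrate the rownum point on its own: the usual Oracle pattern is to sort in an inline view and apply rownum outside it. A sketch based on the single-table query from the question, with the same constants; it only covers the fixed-timestamp case, not the correlated join:

```sql
-- Sketch: ORDER BY runs inside the inline view,
-- ROWNUM is applied afterwards to the already sorted rows
select Stamp
from (
  select "Unix_Memory"."WRITETIME" as Stamp,
         ABS('1140408134015004' - "Unix_Memory"."WRITETIME") as Diff
  from "Unix_Memory"
  where "Unix_Memory"."WRITETIME" > '1140408104015004'
  order by Diff
)
where rownum = 1;
```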
EDIT: Here is one possibility to re-write the statement using MIN/MAX KEEP:
select
s.system_name,
s.writetime as sysstamp,
min(um.id) keep (dense_rank first order by abs(s.writetime - um.writetime)) as closest_um_id
from system s
join unix_memory um on s.system_name = um.system_name
where s.system_name in ('this','that', 'more')
and s.writetime > '1140408124015004'
and um.writetime > '1140408104015004'
group by s.system_name, s.writetime
order by s.system_name, s.writetime;
If you need more than just the ID of unix_memory then surround this with another select:
select
sy.system_name,
sy.sysstamp,
mem.*
from
(
select
s.system_name,
s.writetime as sysstamp,
min(um.id) keep (dense_rank first order by abs(s.writetime - um.writetime)) as closest_um_id
from system s
join unix_memory um on s.system_name = um.system_name
where s.system_name in ('this','that', 'more')
and s.writetime > '1140408124015004'
and um.writetime > '1140408104015004'
group by s.system_name, s.writetime
) sy
join unix_memory mem on mem.id = sy.closest_um_id
order by sy.system_name, sy.sysstamp;