Understanding Hive in MapR

Why is the query below not working?
select count(*) from db.full where substr(starttime,1,10) < (select min(cast(substr(starttime,1,10) as date)) as min_date from db.v9_N where substr(starttime,1,10) between '2018-10-01' and '2018-10-10');
Why is the query below working?
select count(*) from db.full where substr(starttime,1,10) in (select min(cast(substr(starttime,1,10) as date)) as min_date from db.v9_N where substr(starttime,1,10) between '2018-10-01' and '2018-10-10');
Please help me understand the difference, and how I can rewrite the first query so that it works.
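A likely explanation (stated as an assumption about the Hive version in use): Hive's WHERE clause only supports subqueries of the IN / NOT IN / EXISTS / NOT EXISTS form, so a scalar subquery compared with < is rejected while the IN form is accepted. One possible way to make the first query work is to move the subquery into the FROM clause and cross join its single row; this is only a sketch, reusing the table and column names from the question and casting the outer substr to date so both sides compare as dates:
-- Sketch of a workaround: compute the minimum once in a derived table,
-- cross join the single resulting row, and compare against it in the outer WHERE.
select count(*)
from db.full f
cross join (
  select min(cast(substr(starttime, 1, 10) as date)) as min_date
  from db.v9_N
  where substr(starttime, 1, 10) between '2018-10-01' and '2018-10-10'
) m
where cast(substr(f.starttime, 1, 10) as date) < m.min_date;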


Dynamic dates queries in SQL

I have a big dataset and I want to make it shorter so that it is easier for Power BI to read. I need to get data for only the last 6 months, using my date column FechaCarga in MyTable, which is refreshed daily and contains daily data.
Example:
select *
from Mytable
where FechaCarga between (
select max(FechaCarga)
from MyTable)
and
--THIS IS THE PART THAT I'M MISSING, PROBABLY USING DATEADD.
I expect data from today (MaxDate) back to MaxDate - 6 months. Please help me.
Thanks in advance,
IC
Is this what you want?
select t.*
from (select t.*, max(fechacarga) over () as max_fechacarga
from mytable t
) t
where fechacarga > dateadd(month, -6, max_fechacarga);
Like you said, just use DATEADD(). Try current_date to get today's date (not sure whether all DBMSs support that).
select *
from Mytable
where
FechaCarga between
dateadd(month, -6, current_date)
and (select max(FechaCarga) from MyTable)
The easiest way, since you are always looking up to the max date, is:
select *
from Mytable
where FechaCarga >= dateadd(month, -6, (select max(FechaCarga) from MyTable))

SQL - Delete all records as a result of subquery

I'm really struggling to come up with a solution here.
SELECT
mach_id,
value1,
CASE
WHEN value1 = 0 THEN lead(created_on) OVER (ORDER BY mach_id)
END,
created_on
FROM MyTable
WHERE
field_name='someValue' and
CAST(created_on AS DATE) = CAST(GETDATE() AS DATE)
I need to get the created_on date where value1 is 0, and then the created_on date of the lead (next) record. Then, using those two dates, I need to delete all records in another table where created_on falls between them, by mach_id.
I'm really at a loss for a solution here. Any suggestions?
I finally came up with a solution; I'm posting it in the hope that it helps someone else. Thanks everyone for the comments and suggestions.
delete fe
from MySecondTable fe
join (
    select *
    from (
        select mach_id, station_id,
               lag(value1) over (partition by mach_id order by created_on) as shiftEndValue,
               case when value1 = 0
                    then lag(created_on) over (partition by mach_id order by created_on)
               end as shiftEndTime,
               value1, created_on
        from MyFirstTable
        where field_name = 'cur_trgt_cnt'
          and cast(created_on as date) = cast(getdate() as date)
    ) a
    where shiftEndTime is not null
) b
    on fe.mach_id = b.mach_id and fe.station_id = b.station_id
where fe.created_on between b.shiftEndTime and b.created_on

SQL BETWEEN AND operation on date is not working

This is the SQL query that I'm trying to execute:
select *,count(dummy) over(partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
WHERE (max_timestamp BETWEEN '2017-01-01' AND '2018-01-01')
ORDER BY max_timestamp DESC
As far as I know, both values are inclusive in a BETWEEN ... AND operation. Yet this query is unable to fetch records corresponding to 2018-01-01.
I changed the query to this:
select *,count(dummy) over(partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
WHERE (max_timestamp >= '2017-01-01' AND max_timestamp <= '2018-01-01')
ORDER BY max_timestamp DESC
Still, it's not working.
Then I tried this:
select *,count(dummy) over(partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
WHERE (max_timestamp >= '2017-01-01' AND max_timestamp <= '2018-01-02')
ORDER BY max_timestamp DESC
It's able to fetch records related to 2018-01-01.
What could be the reason for this, and how can I fix it?
Thanks in advance.
This is your query:
select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
where max_timestamp BETWEEN '2017-01-01' AND '2018-01-01'
order by max_timestamp DESC;
Simply don't use between with date times. Use explicit logic:
select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
where max_timestamp >= '2017-01-01' and
max_timestamp < '2018-01-02' --> notice this is one day later
order by max_timestamp DESC;
The problem is that you have a time component on the date.
Aaron Bertrand explains this very well in his blog What do BETWEEN and the devil have in common? (I am amused by the title, given that BETWEEN definitely does exist, but there is more controversy about the existence of the devil.)
This is a known issue with Spark; please refer to this link for more info: https://issues.apache.org/jira/browse/SPARK-10837
I fixed the issue by using the date_add function provided by Spark:
the last date was changed to date_add(endDate, 1), so that all values are returned, including those corresponding to the last date.
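A minimal sketch of that approach, reusing the table and column names from the question (assuming Spark SQL, where date_add(start_date, num_days) shifts a date by whole days):
-- Sketch: shift the end date forward by one day with date_add,
-- then use an exclusive upper bound so rows on the last day are included.
select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
where max_timestamp >= '2017-01-01'
  and max_timestamp < date_add('2018-01-01', 1)
order by max_timestamp desc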

Netezza not supporting subquery and similar... any workaround?

I'm sure this will be a very simple question for most of you, but it is driving me crazy...
I have a table like this (simplifying):
| customer_id | date | purchase amount |
I need to extract, for each day, the number of customers that made a purchase that day, and the number of customers that made at least one purchase in the 30 days preceding it.
I tried using a subquery like this:
select purch_date as date, count (distinct customer_id) as DAU,
count(distinct (select customer_id from table where purch_date<= date and purch_date>date-30)) as MAU
from table
group by purch_date
Netezza returns an error saying that subqueries like this are not supported and that I should rewrite the query. But how?
I tried using a CASE WHEN expression instead, but that did not work either. In fact, the following:
select purch_date as date, count (distinct customer_id) as DAU,
count(distinct case when (purch_date<= date and purch_date>date-30) then player_id else null end) as MAU
from table
group by purch_date
returned no errors, but the MAU and DAU columns come out identical (which is wrong).
Can anybody help me, please? Thanks a lot.
I don't believe Netezza supports subqueries in the SELECT list; move them to the FROM clause:
select purch_date as date, count(distinct customer_id) as DAU
from table
group by purch_date
select purch_date as date, count(distinct customer_id) as MAU
from table
where purch_date <= date and purch_date > date-30
group by purch_date
I hope that's right for MAU and DAU. Join them to get the combined results:
select a.date, a.dau, b.mau
from
(select purch_date as date, count(distinct customer_id) as DAU
from table
group by purch_date) a
left join
(select purch_date as date, count(distinct customer_id) as MAU
from table
where purch_date <= date and purch_date > date-30
group by purch_date) b
on b.date = a.date
I finally got it :) For anyone interested, here is how I solved it:
select a.date_dt, max(a.dau), count(distinct b.player_id)
from (select dt.cal_day_dt as date_dt,
count(distinct s.player_id) as dau
FROM IA_PLAYER_SALES_HOURLY s
join IA_DATES dt on dt.date_key = s.date_key
group by dt.cal_day_dt
order by dt.cal_day_dt
) a
join (
select dt.cal_day_dt as date_dt,
s.player_id as player_id
FROM IA_PLAYER_SALES_HOURLY s
join IA_DATES dt on dt.date_key = s.date_key
order by dt.cal_day_dt
) b on b.date_dt <= a.date_dt and b.date_dt > a.date_dt - 30
group by a.date_dt
order by a.date_dt;
Hope this is helpful.

Help optimize SQL query

I have a tracking table tbl_track with id, session_id, and created_date fields.
I need to count unique session_id values for one day.
Here is what I've got:
select count(0)
from (
select distinct session_id
from tbl_track
where created_date between getdate()-1 and getdate()
group by session_id
)tbl
I feel there could be a better solution for this.
select count(distinct session_id)
from tbl_track
where created_date between getdate()-1 and getdate()
Why not just do exactly what you ask for?
select count(distinct session_id)
from tbl_track
where created_date between getdate()-1 and getdate()
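If "one day" means a specific calendar day rather than the last 24 hours, a variant like the following might be closer to the intent (a sketch assuming SQL Server, since getdate() is used; it counts distinct sessions for yesterday's calendar date):
-- Sketch: count distinct sessions for yesterday's calendar date;
-- casting to date drops the time portion on both sides.
select count(distinct session_id)
from tbl_track
where cast(created_date as date) = cast(dateadd(day, -1, getdate()) as date)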