Why this subquery hiveql not work?

Why this subquery hiveql not work? - hive

select (
SELECT SUM(countNumber) from(
SELECT FROM_UNIXTIME(unix_timestamp(AddTime),'yyyy-MM-dd') AS dateTime,
COUNT(productID) AS countNumber
FROM product
GROUP BY FROM_UNIXTIME(unix_timestamp(AddTime),'yyyy-MM-dd')
) as bb
where aa.dateTime >= bb.dateTime
) as totalCount,
aa.countNumber,
aa.dateTimefrom (
SELECT FROM_UNIXTIME(unix_timestamp(AddTime),'yyyy-MM-dd')AS dateTime,
COUNT(productID) AS countNumber
FROM product
GROUP BY FROM_UNIXTIME(unix_timestamp(AddTime),'yyyy-MM-dd')
) aa order by dateTime desc limit 10000;
I wan't to query daily cumulative quantity.Why this HQL can't work?The Hive engine hint that:
FAILED: ParseException line 2:8 cannot recognize input near 'SELECT' 'SUM' '(' in expression specification

Sub queries as columns aren't allowed in Hive. To get the cumulative quantity, use the window function sum.
SELECT dateTime,countNumber,SUM(countNumber) OVER(order by dateTime) as cumsum
FROM (SELECT FROM_UNIXTIME(unix_timestamp(AddTime),'yyyy-MM-dd') AS dateTime,
COUNT(productID) AS countNumber
FROM product
GROUP BY FROM_UNIXTIME(unix_timestamp(AddTime),'yyyy-MM-dd')
) t

Related

SQLite Getting multiple results with LIMIT 1

I have the following problem.
Part of a task is to determine the visitor(s) with the most money spent between 2000 and 2020.
It just looks like this.
SELECT UserEMail FROM Visitor
JOIN Ticket ON Visitor.UserEMail = Ticket.VisitorUserEMail
where Ticket.Date> date('2000-01-01') AND Ticket.Date < date ('2020-12-31')
Group by Ticket.VisitorUserEMail
order by SUM(Price) DESC;
Is it possible to output more than one person if both have spent the same amount?

Use rank():
SELECT VisitorUserEMail
FROM (SELECT VisitorUserEMail, SUM(PRICE) as sum_price,
RANK() OVER (ORDER BY SUM(Price) DESC) as seqnum
FROM Ticket t
WHERE t.Date >= date('2000-01-01') AND Ticket.Date <= date('2021-01-01')
GROUP BY t.VisitorUserEMail
) t
WHERE seqnum = 1;
Note: You don't need the JOIN, assuming that ticket buyers are actually visitors. If that assumption is not true, then use the JOIN.

Use a CTE that returns all the total prices for each email and with NOT EXISTS select the rows with the top total price:
WITH cte AS (
SELECT VisitorUserEMail, SUM(Price) SumPrice
FROM Ticket
WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
GROUP BY VisitorUserEMail
)
SELECT c.VisitorUserEMail
FROM cte c
WHERE NOT EXISTS (
SELECT 1 FROM cte
WHERE SumPrice > c.SumPrice
)
or:
WITH cte AS (
SELECT VisitorUserEMail, SUM(Price) SumPrice
FROM Ticket
WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
GROUP BY VisitorUserEMail
)
SELECT VisitorUserEMail
FROM cte
WHERE SumPrice = (SELECT MAX(SumPrice) FROM cte)
Note that you don't need the function date() because the result of date('2000-01-01') is '2000-01-01'.
Also I think that the conditions in the WHERE clause should include the =, right?

Tweaking a Query - looking for duplicates within a certain day range

I posted a question similar to this, and got an answer, but the answer isn't configurable - my fault I should have been more clear, so I'll try again.
I have a table where TABLENAME has the following information - OrderDate, OrderNumber, CustomerID, ProductSKU, ProductName exist. This table has lines for invoices. So an order will have a data line for every item in the order.
I want to know, which customers have ordered the same item, more than once, where the order is within 90 of any other order of that same product by that customer, after a specific date. Same product in the same order number do not count. The catch is that I want "more than once" to be configurable, so if I need to see 3 or more, or 4 or more I can adjust AND I want to see the counts. Here's the query I have so far, which I think gives me the items and the counts - but not the 90 day thing:
EDITED: I don't think the former version gave me the right counts
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >'2017-11-01'
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 2

Try doing this. it'll go back 90 days
declare #date date = '2017-11-01'
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >= dateadd(DD,-90,#date) and orderdate <= #date
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 1

yes that is what I was doing in the first query. so this might be a really crappy way of doing it but without seeing any data it was kind of tough. this query shows gives you the order dates as well. hope it helps
WITH DupsWithin90Days (customerid,productsku,productname,orderdate,num)
as
(
select customerid,productsku,productname,orderdate ,count(*) num from (
SELECT X.customerid, X.productsku, X.productname,X.ORDERDATE,ROW_NUMBER() OVER (partition by x.customerid,x.orderdate order by x.orderdate) rownum
FROM
(
SELECT T1.customerid, T1.productsku, T1.productname,T1.ORDERDATE
FROM TABLENAME1 T1
) X
JOIN
(
SELECT T2.customerid, T2.productsku, T2.productname,T2.ORDERDATE
FROM
TABLENAME1 T2
) Y
ON X.customerid = Y.customerid AND X.orderdate >= dateadd(DD,-90,Y.orderdate)
) dup
where rownum > 1
group by customerid,productsku,productname,orderdate
)
select customerid,productsku,productname,orderdate
from DupsWithin90Days
order by customerid ,orderdate desc

How to compute cumulative product in SQL Server 2008?

I have below table with 2 columns, DATE & FACTOR. I would like to compute cumulative product, something like CUMFACTOR in SQL Server 2008.
Can someone please suggest me some alternative.

Unfortunately, there's not PROD() aggregate or window function in SQL Server (or in most other SQL databases). But you can emulate it as such:
SELECT Date, Factor, exp(sum(log(Factor)) OVER (ORDER BY Date)) CumFactor
FROM MyTable

You can do it by:
SELECT A.ROW
, A.DATE
, A.RATE
, A.RATE * B.RATE AS [CUM RATE]
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY DATE) as ROW, DATE, RATE
FROM TABLE
) A
LEFT JOIN (
SELECT ROW_NUMBER() OVER(ORDER BY DATE) as ROW, DATE, RATE
FROM TABLE
) B
ON A.ROW + 1 = B.ROW

To calculate the cumulative product, as displayed in the CumFactor column in the original post, the following code does the job:
--first, load the sample data to a temp table
select *
into #t
from
(
values
('2/3/2000', 10),
('2/4/2000', 20),
('2/5/2000', 30),
('2/6/2000', 40)
) d ([Date], [Rate]);
--next, calculate cumulative product
select *, CumFactor = cast(exp(sum(log([Rate])) over (order by [Date])) as int) from #t;
Here is the result:

SQL Server 2008 calculating data difference when we have only one date column

I have a date column Order_date and I am looking for ways to calculate the date difference between customer last order date and his recent previous ( previous form last) order_date ....
Example
Customer : 1, 2 , 1 , 1
Order_date: 01/02/2007, 02/01/2015, 06/02/2014, 04/02/2015
As you can see customer # 1 has three orders.
I want to know the date difference between his recent order date (04/02/2015) and his recent previous (06/02/2014).

For SQL Server 2012 & 2014 you could use LAG with a DATEDIFF to see the number of days between them.
For older versions, a CTE would probably be your best bet:
;WITH CTE AS
(
SELECT CustomerID,
Order_Date,
rn = ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Order_Date DESC)
)
SELECT c1.CustomerID,
DATEDIFF(d, c1.Order_Date, c2.Order_Date)
FROM CTE c1
INNER JOIN CTE c2 ON c2.rn = c1.rn + 1

In SQL Server 2012+, you can use lag() to get the difference between any two dates:
select t.*,
datediff(day, lag(order_date) over (partition by customer order by order_date),
order_date) as days_dff
from table t;
If you have an older version, you can do something similar with correlated subqueries or outer apply.
EDIT:
If you just want the difference between the two most recent dates, use conditional aggregation instead:
select customer,
datediff(day, max(case when seqnum = 2 then order_date end),
max(case when seqnum = 1 then order_date end)
) as MostRecentDiff
from (select t.*,
row_number() over (partition by customer order by order_date desc) as seqnum
from table t
) t
group by customer;

If you're using SQL Server 2008 or later, you can try CROSS APPLY.
SELECT [customers].[customer_id], DATEDIFF(DAY, MIN([recent_orders].[order_date]), MAX([recent_orders].[order_date])) AS [elapsed]
FROM [customers]
CROSS APPLY (
SELECT TOP 2 [order_date]
FROM [orders]
WHERE ([orders].[customer_id] = [customers].[customer_id])
) [recent_orders]
GROUP BY [customers].[customer_id]

SELECT DATEDIFF(DAY, Y.PrevLastOrderDate, Y.LastOrderDate) AS PreviousDays
FROM
(
SELECT X.LastOrderDate
, (SELECT MAX(OrderDate) FROM dbo.Orders SO WHERE SO.CustomerID=1 AND SO.OrderDate < X.LastOrderDate) AS PrevLastOrderDate
FROM
(
select MAX(OrderDate) AS LastOrderDate
FROM dbo.Orders O
WHERE O.CustomerID=1
)X
)Y

drop table #Invoices
create table #Invoices ( OrderId int , OrderDate datetime )
insert into #Invoices (OrderId , OrderDate )
select 101, '01/01/2001' UNION ALL Select 202, '02/02/2002' UNION ALL Select 303, '03/03/2003'
UNION ALL Select 808, '08/08/2008' UNION ALL Select 909, '09/09/2009'
;
WITH
MyCTE /* http://technet.microsoft.com/en-us/library/ms175972.aspx */
( OrderId,OrderDate,ROWID) AS
(
SELECT
OrderId,OrderDate
, ROW_NUMBER() OVER ( ORDER BY OrderDate ) as ROWID
FROM
#Invoices inv
)
SELECT
OrderId,OrderDate
,(Select Max(OrderDate) from MyCTE innerAlias where innerAlias.ROWID = (outerAlias.ROWID-1) ) as PreviousOrderDate
,
[MyDiff] =
CASE
WHEN (Select Max(OrderDate) from MyCTE innerAlias where innerAlias.ROWID = (outerAlias.ROWID-1) ) iS NULL then 0
ELSE DATEDIFF (mm, OrderDate , (Select Max(OrderDate) from MyCTE innerAlias where innerAlias.ROWID = (outerAlias.ROWID-1) ) )
END
, ROWIDMINUSONE = (ROWID-1)
, ROWID as ROWID_SHOWN_FOR_KICKS , OrderDate as OrderDateASecondTimeForConvenience
FROM
MyCTE outerAlias
ORDER BY outerAlias.OrderDate Desc , OrderId

writing a sql query in MySQL with subquery on the same table

I have a table svn1:
id | date | startdate
23 2002-12-04 2000-11-11
23 2004-08-19 2005-09-10
23 2002-09-09 2004-08-23
select id,startdate from svn1 where startdate>=(select max(date) from svn1 where id=svn1.id);
Now the problem is how do I let know the subquery to match id with the id in the outer query. Obviously id=svn1.id wont work. Thanks!
If you have the time to read more:
This really is a simplified version of asking what I really am trying to do here. my actual query is something like this
select
id, count(distinct archdetails.compname)
from
svn1,svn3,archdetails
where
svn1.name='ant'
and svn3.name='ant'
and archdetails.name='ant'
and type='Bug'
and svn1.revno=svn3.revno
and svn3.compname=archdetails.compname
and
(
(startdate>=sdate and startdate<=edate)
or
(
sdate<=(select max(date) from svn1 where type='Bug' and id=svn1.id)
and
edate>=(select max(date) from svn1 where type='Bug' and id=svn1.id)
)
or
(
sdate>=startdate
and
edate<=(select max(date) from svn1 where type='Bug' and id=svn1.id)
)
)
group by id LIMIT 0,40;
As you notice select max(date) from svn1 where type='Bug' and id=svn1.id has to be calculated many times.
Can I just calculate this once and store it using AS and then use that variable later. Main problem is to correct id=svn1.id so as to correctly equate it to the id in the outer table.

I'm not sure you can eliminate the repetition of the subquery, but the subquery can reference the main query if you use a table alias, as in the following:
select id,
count(distinct archdetails.compname)
from svn1 s1,
svn3 s3,
archdetails a
where s1.name='ant' and
s3.name='ant' and
a.name='ant' and
type='Bug' and
s1.revno=s3.revno and
s3.compname = a.compname and
( (startdate >= sdate and startdate<=edate) or
(sdate <= (select max(date)
from svn1
where type='Bug' and
id=s1.id and
edate>=(select max(date)
from svn1
where type='Bug' and
id=s1.id)) or
(sdate >= startdate and edate<=(select max(date)
from svn1
where type='Bug' and
id=s1.id)) )
group by id LIMIT 0,40;
Share and enjoy.

You should be able to left join to a sub-select so you only run the query once. Then you can do a join condition to pull out the maximum for the ID on each record as shown below:
SELECT id,
COUNT(DISTINCT archdetails.compname)
FROM svn1,
svn3,
archdetails
LEFT JOIN (
SELECT id, MAX(date) AS MaximumDate
FROM svn1
WHERE TYPE = 'Bug'
GROUP BY id
) AS MaxDate ON MaxDate.id = svn1.id
WHERE svn1.name = 'ant'
AND svn3.name = 'ant'
AND archdetails.name = 'ant'
AND TYPE = 'Bug'
AND svn1.revno = svn3.revno
AND svn3.compname = archdetails.compname
AND (
(startdate >= sdate AND startdate <= edate)
OR (
sdate <= MaxDate.MaximumDate
AND edate >= MaxDate.MaximumDate
)
OR (
sdate >= startdate
AND edate <= MaxDate.MaximumDate
)
)
GROUP BY
id LIMIT 0,
40;

Try using alias, something like this should work:
select s.id,s.startdate from svn1.s where s.startdate>=(select max(date) from svn1.s2 where s.id=s2.id);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Why this subquery hiveql not work? - hive

Related

SQLite Getting multiple results with LIMIT 1

Tweaking a Query - looking for duplicates within a certain day range

How to compute cumulative product in SQL Server 2008?

SQL Server 2008 calculating data difference when we have only one date column

writing a sql query in MySQL with subquery on the same table

Categories

Resources