Netezza not supporting sub query and similar... any workaround? - sql

I'm sure this will be a very simple question for most of you, but it is driving me crazy...
I have a table like this (simplifying):
| customer_id | date | purchase amount |
I need to extract, for each day, the number of customers who made a purchase that day, and the number of customers who made at least one purchase in the 30 days ending on that day.
I tried using a subquery like this:
select purch_date as date, count (distinct customer_id) as DAU,
count(distinct (select customer_id from table where purch_date<= date and purch_date>date-30)) as MAU
from table
group by purch_date
Netezza returns an error saying that subqueries are not supported and that I should consider rewriting the query. But how?
I tried using a CASE WHEN expression, but it did not work. In fact, the following:
select purch_date as date, count (distinct customer_id) as DAU,
count(distinct case when (purch_date<= date and purch_date>date-30) then customer_id else null end) as MAU
from table
group by purch_date
returned no errors, but the MAU and DAU columns are the same (which is wrong).
Can anybody help me, please? thanks a lot

I don't believe Netezza supports subqueries in the SELECT list, so move them to the FROM clause:
select purch_date as date, count(distinct customer_id) as DAU
from table
group by purch_date
select purch_date as date, count (distinct customer_ID) as MAU
from table
where purch_date<= date and purch_date>date-30
group by purch_date
I hope that's right for MAU and DAU. Join them to get the combined results:
select a.date, a.dau, b.mau
from
(select purch_date as date, count(distinct customer_id) as DAU
from table
group by purch_date) a
left join
(select purch_date as date, count (distinct customer_ID) as MAU
from table
where purch_date<= date and purch_date>date-30
group by purch_date) b
on b.date = a.date

I finally got it :) For anyone interested, here is how I solved it:
select a.date_dt, max(a.dau), count(distinct b.player_id)
from (select dt.cal_day_dt as date_dt,
count(distinct s.player_id) as dau
FROM IA_PLAYER_SALES_HOURLY s
join IA_DATES dt on dt.date_key = s.date_key
group by dt.cal_day_dt
order by dt.cal_day_dt
) a
join (
select dt.cal_day_dt as date_dt,
s.player_id as player_id
FROM IA_PLAYER_SALES_HOURLY s
join IA_DATES dt on dt.date_key = s.date_key
order by dt.cal_day_dt
) b on b.date_dt <= a.date_dt and b.date_dt > a.date_dt - 30
group by a.date_dt
order by a.date_dt;
Hope this is helpful.
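For anyone who wants to try the self-join idea without a Netezza box, here is a minimal sketch using Python's built-in sqlite3 module. The table name purchases, the sample rows, and the use of julianday() for date arithmetic are my own assumptions for illustration; on Netezza you would keep the plain date arithmetic from the query above.

```python
import sqlite3

# In-memory database with a hypothetical "purchases" table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, purch_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10.0),
        (2, "2024-01-01", 5.0),
        (1, "2024-01-15", 7.5),
        (3, "2024-01-15", 2.0),
        (2, "2024-02-20", 4.0),
    ],
)

# Derived table "a" holds one row per day with its DAU.
# Joining it to the raw rows on a 30-day range lets COUNT(DISTINCT ...)
# see every customer active in the window ending on that day.
query = """
SELECT a.purch_date,
       MAX(a.dau)                    AS dau,
       COUNT(DISTINCT b.customer_id) AS mau
FROM (SELECT purch_date, COUNT(DISTINCT customer_id) AS dau
      FROM purchases
      GROUP BY purch_date) a
JOIN purchases b
  ON julianday(b.purch_date) <= julianday(a.purch_date)
 AND julianday(b.purch_date) >  julianday(a.purch_date) - 30
GROUP BY a.purch_date
ORDER BY a.purch_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The range join is what replaces the correlated subquery: each day in a is matched with every purchase row from its trailing 30-day window, so the outer GROUP BY can count distinct customers per day.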

Related

SQL: Difference between consecutive rows

Table with 3 columns: order id, member id, order date
I need to pull the distribution of orders broken down by the number of days between two consecutive orders, per member id.
What I have is this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id+1;
It's not helping me completely as the output I need is:
You can use lag() to get the date of the previous order by the same customer:
select o.*,
datediff(
order_date,
lag(order_date) over(partition by member_id order by order_date, order_id)
) days_diff
from orders o
When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.
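As a rough illustration of the lag() approach, here is a small sketch run through Python's sqlite3 module. The sample rows are made up, and julianday() stands in for datediff() because SQLite has no such function; window functions need SQLite 3.25 or newer, which I am assuming is what your Python ships with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, member_id INTEGER, order_date TEXT)"
)
# Hypothetical sample data: member 100 places two orders on the same day.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 100, "2024-03-01"),
        (2, 100, "2024-03-04"),
        (3, 100, "2024-03-04"),
        (4, 200, "2024-03-10"),
        (5, 200, "2024-03-20"),
    ],
)

# LAG() fetches the previous order date of the same member;
# the first order of each member has no predecessor, so days_diff is NULL.
query = """
SELECT member_id,
       order_id,
       CAST(julianday(order_date)
            - julianday(LAG(order_date) OVER (
                  PARTITION BY member_id
                  ORDER BY order_date, order_id)) AS INTEGER) AS days_diff
FROM orders
ORDER BY member_id, order_date, order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Grouping the result by days_diff and counting rows would then give the distribution the question asks for.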
I don't understand the logic you want for computing num_orders.
Maybe something like this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id
where not exists (
select 1
from orders as intermediate_order
where intermediate_order.order_date < a1.order_date and intermediate_order.order_date > a2.order_date) ;

How to summarize information over the dynamic period in sql?

I have a table with orders and the following fields:
create table orders2 (
orderID int,
customerID int,
date DateTime,
amount int)
engine=Memory;
Each customer can make 0 or many orders each day. I need to create an SQL query that will show for each customer how many orders he/she made during the period of 3 days starting from the day when the customer has made his/her first order.
So, for each customer, the query should detect the date of the first order, then compute the date that is 3 days in the future from the first date, then filter rows to take only orders with dates in the given range, and then perform counting of orders (orderID) in that time period. At the moment, I was able to just detect the date of the first order for each customer.
SELECT
O.customerID,
O.date AS first_day,
COUNT(O.orderID) AS first_day_orders_num,
SUM(O.amount) AS first_day_amount
FROM orders2 AS O
INNER JOIN
(
SELECT
customerID,
MIN(date) AS first_date
FROM orders2
GROUP BY customerID
) AS I ON (O.customerID = I.customerID) AND (O.date = I.first_date)
GROUP BY
O.customerID,
O.date
I don't really understand what result you need. It can probably be solved using arrays.
Here is a solution using vanilla SQL:
select customerID, min(first_date), sum(num_orders_per_day)
from (
select customerID, date, min(date) first_date, count() num_orders_per_day
from orders2
group by customerID, date
having date <= first_date + interval 3 days
)
group by customerID
You can use window functions to get the first order date:
select o.CustomerID, count(*) as num_orders_3_days
from (select o.*, min(date) over (partition by CustomerID) as min_date
from orders2 o
) o
where date < min_date + interval '3 day'
group by CustomerID;
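Here is a small sketch of the window-function version that can be run locally with Python's sqlite3 module. It is not ClickHouse: the sample rows are invented, dates are stored as text, and datetime(min_date, '+3 days') is SQLite's way of writing the interval arithmetic, so treat those details as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders2 (orderID INTEGER, customerID INTEGER, date TEXT, amount INTEGER)"
)
# Hypothetical data: customer 1 has an order just inside and one exactly at the 3-day limit.
conn.executemany(
    "INSERT INTO orders2 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-05-01 09:00:00", 10),
        (2, 1, "2024-05-02 12:00:00", 20),
        (3, 1, "2024-05-04 08:59:59", 5),
        (4, 1, "2024-05-04 09:00:00", 5),
        (5, 2, "2024-05-10 00:00:00", 7),
        (6, 2, "2024-05-20 00:00:00", 7),
    ],
)

# MIN(date) OVER (...) attaches each customer's first order time to every row,
# then the outer query keeps only orders strictly within 3 days of it.
query = """
SELECT customerID, COUNT(*) AS num_orders_3_days
FROM (SELECT o.*,
             MIN(date) OVER (PARTITION BY customerID) AS min_date
      FROM orders2 o) t
WHERE date < datetime(min_date, '+3 days')
GROUP BY customerID
ORDER BY customerID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The window is half-open here (first order time inclusive, first order time plus 3 days exclusive); switch to <= if the end of the period should be included.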
Try this query:
SELECT customerID, orders_count
FROM (
SELECT customerID,
arraySort(x -> x.1, groupArray((date, orderID))) sorted_date_per_order_pairs,
sorted_date_per_order_pairs[1].1 + INTERVAL 3 day AS end_date,
arrayFilter(x -> x.1 < end_date, sorted_date_per_order_pairs) orders_in_period,
length(orders_in_period) orders_count
FROM orders2
GROUP BY customerID);

Days Since Last Help Ticket was Filed

I am trying to create a report to show me the last date a customer filed a ticket.
Customers can file dozens of tickets. I want to know when the last ticket was filed and show how many days it's been since they have done so.
The fields I have are:
Customer,
Ticket_id,
Date_Closed
All from the Same table "Tickets"
I'm thinking I want to do a ranking of tickets by min date? I tried this query to grab something but it's giving me all the tickets from the customer. (I'm using SQL in a product called Domo)
select * from (select *, rank() over (partition by "Ticket_id"
order by "Date_Closed" desc) as date_order
from tickets ) zd
where date_order = 1
This should be simple enough,
SELECT customer,
MAX (date_closed) last_date,
ROUND((SYSDATE - MAX (date_closed)),0) days_since_last_ticket_logged
FROM tickets
GROUP BY customer
select Customer, datediff(day, date_closed, current_date) as days_since_last_tkt
from
(select *, rank() over (partition by Customer order by "Date_Closed" desc) as date_order
from tickets) zd
join tickets t on zd.date_closed = t.date_closed
where zd.date_order = 1
Or you can simply do
select customer, datediff(day, max(Date_closed), current_date) as days_since_last_tkt
from tickets
group by customer
To select other fields
select t.*
from tickets t
join (select customer, max(Date_closed) as mxdate,
datediff(day, max(Date_closed), current_date) as days_since_last_tkt
from tickets
group by customer) tt
on t.customer = tt.customer and tt.mxdate = t.date_closed
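To see the MAX plus date-difference idea in action, here is a sketch using Python's sqlite3 module. The sample tickets and the fixed as_of date are assumptions made so the output is reproducible; in Domo you would use current_date and its own datediff() instead of julianday().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (customer TEXT, ticket_id INTEGER, date_closed TEXT)"
)
# Hypothetical sample tickets.
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("acme", 1, "2024-01-05"),
        ("acme", 2, "2024-01-20"),
        ("globex", 3, "2024-01-10"),
    ],
)

# Fixed reference date instead of "today", so the result does not change over time.
as_of = "2024-01-31"

# One row per customer: the latest closed date and the days elapsed since then.
query = """
SELECT customer,
       MAX(date_closed) AS last_date,
       CAST(julianday(?) - julianday(MAX(date_closed)) AS INTEGER)
           AS days_since_last_tkt
FROM tickets
GROUP BY customer
ORDER BY customer
"""
rows = conn.execute(query, (as_of,)).fetchall()
print(rows)
```

Grouping by customer (rather than partitioning by ticket id) is what collapses the result to a single row per customer.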
I would do this with a simple sub-query to select the last closed date for the customer. Then compare this to today with datediff() to get the number of days since last closed.
Select
LastTicket.Customer,
LastTicket.LastClosedDate,
DateDiff(day,LastTicket.LastClosedDate,getdate()) as DaysSinceLastClosed
From
(select
tickets.customer,
max(tickets.Date_Closed) as LastClosedDate
from tickets
Group By tickets.Customer) as LastTicket
Based on the responses this is what I did:
select "Customer",
Max("date_closed") "last_date",
round(datediff(DAY, CURRENT_DATE, max("date_closed")), 0) as "Closed_date"
from tickets
group by "Customer"
ORDER BY "Customer"

SQL: aggregation (group by like) in a column

I have a select that groups customer spending over the past two months by customer id and date. What I need to do is attach to each row the total amount spent by that customer during the whole first week of the two-month period (of course the value would repeat on every row for a given customer, but that's ok). Do you know how to do that without using a subquery as a column?
I was thinking using some combination of OVER PARTITION, but could not figure out how...
Thanks a lot in advance.
Raffaele
Query:
select customer_id, date, sum(sales)
from transaction_table
group by customer_id, date
If it's a specific first week (e.g. you always want the first week of the year, and your data set normally includes January and February spending), you could use sum(case...):
select distinct customer_id, date, sum(sales) over (partition by customer_ID, date)
, sum(case when date between '1/1/15' and '1/7/15' then Sales end)
over (partition by customer_id) as FirstWeekSales
from transaction_table
In response to the comments below; I'm not sure if this is what you're looking for, since it involves a subquery, but here's my best shot:
select distinct a.customer_id, date
, sum(sales) over (partition by a.customer_ID, date)
, sum(case when date between mindate and dateadd(DD, 7, mindate)
then Sales end)
over (partition by a.customer_id) as FirstWeekSales
from transaction_table a
left join
(select customer_ID, min(date) as mindate
from transaction_table group by customer_ID) b
on a.customer_ID = b.customer_ID
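Here is a sketch of the same pattern that can be run with Python's sqlite3 module (SQLite 3.25 or newer for window functions). The sample rows are invented, and I use a half-open 7-day window starting at each customer's first purchase, which is my own choice and slightly narrower than the BETWEEN ... dateadd(DD, 7, mindate) range above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transaction_table (customer_id INTEGER, date TEXT, sales REAL)"
)
# Hypothetical sample transactions.
conn.executemany(
    "INSERT INTO transaction_table VALUES (?, ?, ?)",
    [
        (1, "2015-01-01", 10.0),
        (1, "2015-01-01", 5.0),
        (1, "2015-01-05", 20.0),
        (1, "2015-01-20", 100.0),
        (2, "2015-01-10", 7.0),
        (2, "2015-01-16", 3.0),
        (2, "2015-01-17", 50.0),
    ],
)

# The conditional SUM over the customer partition repeats the first-week total
# on every row, while the second window gives the per-day total.
query = """
SELECT DISTINCT a.customer_id,
       a.date,
       SUM(a.sales) OVER (PARTITION BY a.customer_id, a.date) AS daily_sales,
       SUM(CASE WHEN a.date < date(b.mindate, '+7 days') THEN a.sales END)
           OVER (PARTITION BY a.customer_id) AS first_week_sales
FROM transaction_table a
JOIN (SELECT customer_id, MIN(date) AS mindate
      FROM transaction_table
      GROUP BY customer_id) b
  ON a.customer_id = b.customer_id
ORDER BY a.customer_id, a.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

DISTINCT is needed because window functions do not collapse rows: without it, days with several transactions would appear more than once.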

Efficiently group by column aggregate

SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
GROUP BY date, id
HAVING sum(revenue)>1000
Returns rows that have revenue>1000.
SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
AND id IN (SELECT id FROM table where date between '2013-01-01' and '2013-01-08' GROUP BY id HAVING sum(revenue)>1000)
GROUP BY date, id
Returns rows for id's whose total revenue over the date period is >1000 as desired. But this query is much slower. Any quicker way to do this?
Make sure you have indexes on the date and id columns, and try this variation:
select t.date, t.id, sum(t.revenue)
from table t
inner join (
select id
from table
where date between '2013-01-01' and '2013-01-08'
group by id
having sum(revenue) > 1000
) ts on t.id = ts.id
where t.date between '2013-01-01' and '2013-01-08'
group by t.date, t.id
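To check that the join form returns the intended rows, here is a small sketch using Python's sqlite3 module. The table name revenue_log and the sample data are assumptions, and this only demonstrates the logic: it says nothing about how Vertica or any other engine will plan the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "revenue_log" is a hypothetical stand-in for the table in the question.
conn.execute("CREATE TABLE revenue_log (date TEXT, id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO revenue_log VALUES (?, ?, ?)",
    [
        ("2013-01-01", 1, 600.0),
        ("2013-01-02", 1, 600.0),
        ("2013-01-01", 2, 300.0),
        ("2013-01-03", 2, 300.0),
        ("2013-01-09", 2, 5000.0),  # outside the period, must not count
    ],
)

# The derived table "ts" finds ids whose total revenue in the period exceeds 1000;
# the join then keeps the per-day rows for those ids only.
query = """
SELECT t.date, t.id, SUM(t.revenue) AS revenue
FROM revenue_log t
INNER JOIN (SELECT id
            FROM revenue_log
            WHERE date BETWEEN '2013-01-01' AND '2013-01-08'
            GROUP BY id
            HAVING SUM(revenue) > 1000) ts
  ON t.id = ts.id
WHERE t.date BETWEEN '2013-01-01' AND '2013-01-08'
GROUP BY t.date, t.id
ORDER BY t.date, t.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that no single day for id 1 exceeds 1000, so the plain HAVING version from the question would return nothing for this data, while the join version returns both days.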
it's not MySQL, it's Vertica ;)
Cris, what projection and ORDER BY are you using in CREATE TABLE?
Have you tried using Database Designer?
see http://my.vertica.com/docs/6.1.x/HTML/index.htm#14415.htm