Correlated subquery not working in Netezza

I have a query like this in Netezza, but I'm not sure how to rewrite it so that it works:
with dates as (
select distinct event_date from table
)
select event_date,
(select count(distinct id)
from table
where event_date < dates.event_date
)
from dates
It fails with the error: This form of correlated query is not supported - consider rewriting

This would be more efficient using window functions anyway. I think the logic is:
select event_date,
sum(count(*)) over (order by event_date) - count(*) as events_before
from table
group by event_date
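Note that this query counts rows dated before each event_date, while the original correlated subquery counts distinct ids. The sketch below is a minimal example that assumes Python's sqlite3 module (window functions need SQLite 3.25 or newer), made-up data, and a hypothetical table named events standing in for table. It compares the original correlated query, a distinct-id window rewrite that counts each id only on the first date it appears, and the row-count logic above. SQLite accepts the correlated form, so it serves as a reference here. This is not Netezza.

```python
import sqlite3

# Hypothetical sample data; "events" stands in for the question's "table".
ROWS = [
    ("2021-01-01", 1), ("2021-01-01", 2),
    ("2021-01-02", 1), ("2021-01-02", 3),
    ("2021-01-03", 2), ("2021-01-03", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table events (event_date text, id integer)")
conn.executemany("insert into events values (?, ?)", ROWS)

# The question's correlated subquery: distinct ids seen strictly before each date.
correlated = conn.execute("""
    with dates as (select distinct event_date from events)
    select event_date,
           (select count(distinct e.id) from events e
            where e.event_date < dates.event_date) as ids_before
    from dates
    order by event_date
""").fetchall()

# Window rewrite: an id counts toward every date after its first appearance,
# so count the new ids per date and take a running total of earlier dates.
distinct_ids = conn.execute("""
    with first_seen as (
        select id, min(event_date) as first_date from events group by id
    ),
    per_date as (
        select d.event_date, count(f.id) as new_ids
        from (select distinct event_date from events) d
        left join first_seen f on f.first_date = d.event_date
        group by d.event_date
    )
    select event_date,
           sum(new_ids) over (order by event_date) - new_ids as ids_before
    from per_date
    order by event_date
""").fetchall()

# The answer's row-count logic, with the per-date count computed in a CTE.
rows_before = conn.execute("""
    with per_date as (
        select event_date, count(*) as n from events group by event_date
    )
    select event_date, sum(n) over (order by event_date) - n as events_before
    from per_date
    order by event_date
""").fetchall()

print(correlated)    # [('2021-01-01', 0), ('2021-01-02', 2), ('2021-01-03', 3)]
print(distinct_ids)  # same as correlated
print(rows_before)   # [('2021-01-01', 0), ('2021-01-02', 2), ('2021-01-03', 4)]
```

The two approaches agree until an id reappears on a later date. Each event_date occurs once in per_date, so the default window frame produces a plain running total.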

Related

Query gives desired results, but I would like to apply the window function in a CASE expression using the same logic

I'm looking for a better solution. The query I wrote works, but I would like to know whether there is a better approach, or whether the same logic could be used in a CASE expression.
The query uses a window function to find, for each id and quarter, the country with the highest count, and then keeps only the rows where seqnum is 1:
select id,Quarter_yr, country
from (select id,Quarter_yr, country, count(*) as cnt,
row_number() over (partition by id, Quarter_yr order by count(*) desc) as seqnum
from table
group by id,Quarter_yr,country
) t
where seqnum = 1;
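Maybe this is what you're looking for. In SQL Server, TOP (1) WITH TIES ordered by the same ROW_NUMBER() keeps the top country for every id and quarter without a subquery (a plain TOP (1) would return only one row for the whole table):
select top (1) with ties id, Quarter_yr, country, count(*) as cnt
from table
group by id, Quarter_yr, country
order by row_number() over (partition by id, Quarter_yr order by count(*) desc);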
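The sketch below is a minimal example that assumes Python's sqlite3 module (SQLite 3.25+ for window functions) and a hypothetical visits table standing in for table. It shows that the question's seqnum = 1 pattern returns one row per id and quarter, while a plain TOP (1)-style query (LIMIT 1 in SQLite) returns a single row in total. I compute the counts in a CTE before ROW_NUMBER() and add country as a tie-breaker so the output is deterministic. Both of those choices are mine, not part of the question.

```python
import sqlite3

# Hypothetical rows: (id, Quarter_yr, country). "visits" stands in for "table".
ROWS = (
    [(1, "Q1-2021", "US")] * 3 + [(1, "Q1-2021", "CA")]
    + [(1, "Q2-2021", "CA")] * 2 + [(1, "Q2-2021", "US")]
    + [(2, "Q1-2021", "MX")] * 2
)

conn = sqlite3.connect(":memory:")
conn.execute("create table visits (id integer, Quarter_yr text, country text)")
conn.executemany("insert into visits values (?, ?, ?)", ROWS)

# Count per (id, quarter, country), rank countries within each (id, quarter)
# by that count, and keep rank 1. "country" breaks ties deterministically.
top_country = conn.execute("""
    with counts as (
        select id, Quarter_yr, country, count(*) as cnt
        from visits
        group by id, Quarter_yr, country
    )
    select id, Quarter_yr, country
    from (select c.*,
                 row_number() over (partition by id, Quarter_yr
                                    order by cnt desc, country) as seqnum
          from counts c) t
    where seqnum = 1
    order by id, Quarter_yr
""").fetchall()

# LIMIT 1 (like a plain TOP (1)) keeps one row for the whole table.
single = conn.execute("""
    select id, Quarter_yr, country, count(*) as cnt
    from visits
    group by id, Quarter_yr, country
    order by cnt desc
    limit 1
""").fetchall()

print(top_country)  # [(1, 'Q1-2021', 'US'), (1, 'Q2-2021', 'CA'), (2, 'Q1-2021', 'MX')]
print(single)       # [(1, 'Q1-2021', 'US', 3)]
```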

Which is more efficient: GROUP BY or JOIN?

I have this table and want to know which query is more efficient:
[ID_SOGGETTO]
,[COGNOME]
,[NOME]
,[DENOMINAZIONE]
,[FISICA_GIURID]
,[CODICEFISCALE]
,[PARTITAIVA]
,[ID_COMUNE]
,[DATA_NASCITA]
,[RAE]
,[SAE]
,[TIPO_SOCIETA]
,[NDG]
,[CODICECCIAA]
,[CODICECR]
,[ATTIVITALAVORATIVA]
,[PROFESSIONISTA]
,[CODICEFORNITORESAP]
,[TRASFERITOSAP]
,[ALBOPRECS]
,[ID_USER]
,[ID_USERINC]
,[ID_VERSIONE]
,[DATA_AGGIORNAMENTO]
,[DATA_STORICIZZAZIONE]
I tried this query to select all rows that have the same PARTITAIVA (VAT number) but a different ID_SOGGETTO:
SELECT * FROM table WHERE PARTITAIVA IN ( SELECT PARTITAIVA FROM table GROUP BY PARTITAIVA HAVING COUNT(distinct ID_SOGGETTO) > 1)
Would it be more efficient with a JOIN?
Often the most efficient way to do what you want uses window functions:
SELECT t.*
FROM (SELECT t.*,
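(DENSE_RANK() OVER (PARTITION BY PARTITAIVA ORDER BY ID_SOGGETTO ASC) +
DENSE_RANK() OVER (PARTITION BY PARTITAIVA ORDER BY ID_SOGGETTO DESC) - 1
) as cnt
FROM table t
) t
WHERE cnt > 1;
The ascending and descending DENSE_RANK() values always add up to the number of distinct ID_SOGGETTO values in the partition plus one, so the sum minus 1 is COUNT(DISTINCT ID_SOGGETTO) computed as a window function.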
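The rank identity can be checked on a small example. The sketch below assumes Python's sqlite3 module (SQLite 3.25+ for window functions), made-up data, and a hypothetical soggetti table with only the two relevant columns. In every partition, the ascending and descending dense ranks add up to the distinct count plus one. A VAT number with a single ID_SOGGETTO (IT002 below) therefore still has a rank sum of 2. One caveat: DENSE_RANK() ranks NULL as a value, whereas COUNT(DISTINCT) ignores NULLs.

```python
import sqlite3

# Hypothetical rows: (PARTITAIVA, ID_SOGGETTO); "soggetti" stands in for "table".
ROWS = [
    ("IT001", 10), ("IT001", 10), ("IT001", 11),
    ("IT002", 20), ("IT002", 20),
    ("IT003", 30), ("IT003", 31), ("IT003", 32),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table soggetti (PARTITAIVA text, ID_SOGGETTO integer)")
conn.executemany("insert into soggetti values (?, ?)", ROWS)

# asc rank + desc rank is constant within each partition.
rank_sums = conn.execute("""
    select distinct PARTITAIVA, rank_sum
    from (select PARTITAIVA,
                 dense_rank() over (partition by PARTITAIVA order by ID_SOGGETTO asc)
               + dense_rank() over (partition by PARTITAIVA order by ID_SOGGETTO desc)
                 as rank_sum
          from soggetti) x
    order by PARTITAIVA
""").fetchall()

# Reference values from an ordinary aggregate.
distinct_counts = conn.execute("""
    select PARTITAIVA, count(distinct ID_SOGGETTO)
    from soggetti
    group by PARTITAIVA
    order by PARTITAIVA
""").fetchall()

print(rank_sums)        # [('IT001', 3), ('IT002', 2), ('IT003', 4)]
print(distinct_counts)  # [('IT001', 2), ('IT002', 1), ('IT003', 3)]
```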
In other databases, EXISTS would be recommended:
select t.*
from t
where exists (select 1
from t t2
where t2.partitaiva = t.partitaiva and
t2.id_soggetto <> t.id_soggetto
);
However, I am not sure if this would be faster in SparkSQL.
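This does not say anything about speed, but it does confirm that the three formulations select the same rows. The sketch below assumes Python's sqlite3 module (SQLite 3.25+ for window functions), made-up data, and a hypothetical soggetti table with only PARTITAIVA and ID_SOGGETTO. It runs the question's IN/GROUP BY query, the EXISTS query, and the dense-rank query with 1 subtracted. Performance in Spark SQL or any other engine would still need to be measured on real data.

```python
import sqlite3

# Hypothetical rows: (PARTITAIVA, ID_SOGGETTO); "soggetti" stands in for "table".
ROWS = [
    ("IT001", 10), ("IT001", 10), ("IT001", 11),
    ("IT002", 20), ("IT002", 20),
    ("IT003", 30), ("IT003", 31), ("IT003", 32),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table soggetti (PARTITAIVA text, ID_SOGGETTO integer)")
conn.executemany("insert into soggetti values (?, ?)", ROWS)

# The question's IN + GROUP BY/HAVING version.
in_rows = conn.execute("""
    select * from soggetti
    where PARTITAIVA in (select PARTITAIVA from soggetti
                         group by PARTITAIVA
                         having count(distinct ID_SOGGETTO) > 1)
    order by PARTITAIVA, ID_SOGGETTO
""").fetchall()

# EXISTS: keep a row if another row shares the VAT number with a different id.
exists_rows = conn.execute("""
    select t.* from soggetti t
    where exists (select 1 from soggetti t2
                  where t2.PARTITAIVA = t.PARTITAIVA
                    and t2.ID_SOGGETTO <> t.ID_SOGGETTO)
    order by PARTITAIVA, ID_SOGGETTO
""").fetchall()

# Window version: rank sum minus 1 equals the distinct count per partition.
window_rows = conn.execute("""
    select PARTITAIVA, ID_SOGGETTO
    from (select t.*,
                 dense_rank() over (partition by PARTITAIVA order by ID_SOGGETTO asc)
               + dense_rank() over (partition by PARTITAIVA order by ID_SOGGETTO desc)
               - 1 as cnt
          from soggetti t) x
    where cnt > 1
    order by PARTITAIVA, ID_SOGGETTO
""").fetchall()

print(in_rows == exists_rows == window_rows)  # True
```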

Filter out null values resulting from window function lag() in SQL query

Example query:
SELECT *,
lag(sum(sales), 1) OVER(PARTITION BY department
ORDER BY date ASC) AS end_date_sales
FROM revenue
GROUP BY department, date;
I want to show only the rows where end_date_sales is not NULL.
Is there a clause meant specifically for this case? WHERE does not allow aggregate or window functions, and HAVING does not allow window functions.
One method uses a subquery:
SELECT r.*
FROM (SELECT department, date, SUM(sales) AS sales,
LAG(SUM(sales), 1) OVER (PARTITION BY department ORDER BY date ASC) AS end_date_sales
FROM revenue
GROUP BY department, date
) r
WHERE end_date_sales IS NOT NULL;
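The derived-table approach can be checked with a minimal sketch. It assumes Python's sqlite3 module (SQLite 3.25+ for window functions) and made-up revenue rows. I sum the daily totals in a CTE before applying LAG, which gives the same result as nesting SUM inside LAG under GROUP BY. Window functions are evaluated after WHERE, so the NULL filter has to go in the outer query.

```python
import sqlite3

# Hypothetical revenue rows: (department, date, sales).
ROWS = [
    ("A", "2021-01-01", 10), ("A", "2021-01-01", 5), ("A", "2021-01-02", 7),
    ("B", "2021-01-01", 3), ("B", "2021-01-03", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table revenue (department text, date text, sales integer)")
conn.executemany("insert into revenue values (?, ?, ?)", ROWS)

# Compute LAG in a derived table, then filter its NULLs in the outer query.
rows = conn.execute("""
    with daily as (
        select department, date, sum(sales) as sales
        from revenue
        group by department, date
    )
    select *
    from (select d.*,
                 lag(sales, 1) over (partition by department order by date)
                     as end_date_sales
          from daily d) r
    where end_date_sales is not null
    order by department, date
""").fetchall()

print(rows)  # [('A', '2021-01-02', 7, 15), ('B', '2021-01-03', 4, 3)]
```

The first date of each department has no previous row, so LAG returns NULL there and the outer WHERE removes it.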
That said, I don't think the query is correct as written. I would assume that you want something like this:
SELECT r.*
FROM (SELECT r.*,
LEAD(end_date, 1) OVER (PARTITION BY ? ORDER BY date ASC) AS next_end_date
FROM revenue r
) r
WHERE next_end_date IS NOT NULL;
where ? is a placeholder for a column such as the customer id.
Try this:
select *
from (select distinct *, SUM(sales) OVER (PARTITION BY dept) as total_sales from test) t
where t.date = (select max(t2.date) from test t2 where t2.dept = t.dept)
order by date, dept;
And a simpler way, without a subquery:
SELECT distinct dept,
MAX(date) OVER (PARTITION BY dept) AS last_date,
SUM(sales) OVER (PARTITION BY dept) AS total_sales
FROM test;
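The DISTINCT plus window-function query returns one row per dept, which is the same result as a plain GROUP BY dept with MAX and SUM. The sketch below compares the two. It assumes Python's sqlite3 module (SQLite 3.25+ for window functions), a hypothetical test table with made-up rows, and the illustrative aliases last_date and total_sales.

```python
import sqlite3

# Hypothetical rows for the "test" table: (dept, date, sales).
ROWS = [("HR", "2021-01-01", 100), ("HR", "2021-02-01", 50), ("IT", "2021-01-15", 70)]

conn = sqlite3.connect(":memory:")
conn.execute("create table test (dept text, date text, sales integer)")
conn.executemany("insert into test values (?, ?, ?)", ROWS)

# Window functions repeat the per-dept values on every row; DISTINCT collapses them.
window_rows = conn.execute("""
    select distinct dept,
           max(date) over (partition by dept) as last_date,
           sum(sales) over (partition by dept) as total_sales
    from test
    order by dept
""").fetchall()

# Equivalent plain aggregate.
group_rows = conn.execute("""
    select dept, max(date), sum(sales)
    from test
    group by dept
    order by dept
""").fetchall()

print(window_rows)  # [('HR', '2021-02-01', 150), ('IT', '2021-01-15', 70)]
```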

How to pass a query result into LIMIT in Impala

I am trying to sample 20% of a table in Impala. I have heard that Impala's built-in sampling has issues.
Is there a way to pass a subquery to LIMIT in Impala so that it samples n percent of the entire table?
I have something like this:
select *
from table_a
order by rand()
limit (
select round(count(distinct ids) * .2, 0)
from table_a
)
The subquery gives me the 20% figure I want to use as the limit.
I'm not sure if Impala has specific sampling logic (some databases do). But you can use window functions:
select a.*
from (select a.*,
row_number() over (order by rand()) as seqnum,
count(*) over () as cnt
from table_a a
) a
where seqnum <= cnt * 0.2;
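The sketch below is a minimal example of the row_number()/count(*) over () sampling pattern. It assumes Python's sqlite3 module (SQLite 3.25+ for window functions), a hypothetical table_a filled with 100 ids, and SQLite's random() in place of Impala's rand(). It checks that the query returns exactly 20% of the rows, with no duplicates. The rows selected will differ on every run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_a (ids integer)")
conn.executemany("insert into table_a values (?)", [(i,) for i in range(1, 101)])

# Number the rows in random order, attach the total row count to every row,
# and keep the first 20%. SQLite's random() stands in for Impala's rand().
sample = conn.execute("""
    select ids
    from (select a.*,
                 row_number() over (order by random()) as seqnum,
                 count(*) over () as cnt
          from table_a a) a
    where seqnum <= cnt * 0.2
""").fetchall()

print(len(sample))  # 20
```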

query with partition and count

Given the following table, which records users' item-viewing history with sessions:
create table view_log (
server_time timestamp,
device char(2),
session_id char(10),
uid char(7),
item_id char(7)
);
I'm trying to understand what the following code does:
create table coo_cs as
select
item_id,
session_id,
count(distinct session_id) / (sum(count(distinct session_id)) over (partition by item_id)) cs
from view_log
group by item_id, session_id;
I tried breaking down the line with the partition to understand what it does, but then it fails with the error "DISTINCT is not implemented for window functions".
I understand basic PARTITION BY and GROUP BY, but I can't make sense of the SQL above.
Edit:
There is a rather large dataset for testing:
http://pakdd2017.recobell.io/site_view_log_small.csv000.gz
Some databases do not (yet) support count(distinct) as a window function. For this query, the count(distinct) is not necessary, because you are aggregating by the same column used for the count(distinct). Hence, count(distinct session_id) is 1 on each row.
Your query is essentially:
select item_id, session_id,
1.0 / count(session_id) over (partition by item_id) as cs
from view_log
group by item_id, session_id;
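The sketch below checks both claims. Inside each (item_id, session_id) group, count(distinct session_id) is 1, and cs comes out as 1 divided by the number of sessions for the item. It assumes Python's sqlite3 module (SQLite 3.25+ for window functions) and made-up data. It also uses a hypothetical view_log reduced to its item_id and session_id columns, with the grouping done in a CTE before the window count.

```python
import sqlite3

# Hypothetical, reduced view_log rows: (session_id, item_id).
ROWS = [
    ("s1", "i1"), ("s1", "i1"), ("s2", "i1"),
    ("s1", "i2"), ("s3", "i2"), ("s4", "i2"), ("s4", "i2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table view_log (session_id text, item_id text)")
conn.executemany("insert into view_log values (?, ?)", ROWS)

# Grouping by session_id makes count(distinct session_id) trivially 1.
per_group = conn.execute("""
    select item_id, session_id, count(distinct session_id) as n
    from view_log
    group by item_id, session_id
    order by item_id, session_id
""").fetchall()

# So the ratio reduces to 1 / (number of sessions per item).
cs = conn.execute("""
    with groups as (
        select item_id, session_id
        from view_log
        group by item_id, session_id
    )
    select item_id, session_id,
           1.0 / count(session_id) over (partition by item_id) as cs
    from groups
    order by item_id, session_id
""").fetchall()

print(per_group)  # every n is 1
print(cs)         # i1 rows: 0.5 each; i2 rows: 1/3 each
```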
I wouldn't be surprised if you wanted the ratios at the item_id level, in which case the intended query is:
select item_id, count(distinct session_id) as numsessions,
count(distinct session_id) * 1.0 / sum(count(distinct session_id)) over () as cs
from view_log
group by item_id;
If so, the equivalent logic can use a subquery:
select vl.*, numsessions * 1.0 / sum(numsessions) over () as cs
from (select item_id, count(distinct session_id) as numsessions
from view_log vl
group by item_id
) vl;
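The sketch below shows the item-level version. Each item's share of all item-session pairs is its session count divided by the grand total, so the cs values add up to 1. It assumes Python's sqlite3 module (SQLite 3.25+ for window functions) and the same hypothetical, reduced view_log with made-up rows.

```python
import sqlite3

# Hypothetical, reduced view_log rows: (session_id, item_id).
ROWS = [
    ("s1", "i1"), ("s1", "i1"), ("s2", "i1"),
    ("s1", "i2"), ("s3", "i2"), ("s4", "i2"), ("s4", "i2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table view_log (session_id text, item_id text)")
conn.executemany("insert into view_log values (?, ?)", ROWS)

# Aggregate once per item, then divide by the total over all items.
shares = conn.execute("""
    select item_id, numsessions,
           numsessions * 1.0 / sum(numsessions) over () as cs
    from (select item_id, count(distinct session_id) as numsessions
          from view_log
          group by item_id) vl
    order by item_id
""").fetchall()

print(shares)  # [('i1', 2, 0.4), ('i2', 3, 0.6)]
```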