Correlated subquery not working in Netezza


I have a query like this in Netezza, but I'm not sure how to rewrite it so that it works. Thanks.
with dates as (
select distinct event_date from table
)
select event_date,
(select count(distinct id)
from table
where event_date < dates.event_date
)
from dates
Netezza rejects it with the error: This form of correlated query is not supported - consider rewriting

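This would be more efficient using window functions anyway. I think the logic is as follows (note that it counts rows before each date rather than distinct ids, so the results differ when an id appears on more than one date):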
select event_date,
sum(count(*)) over (order by event_date) - count(*) as events_before
from table
group by event_date
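Here is a small sketch of the difference, using Python's built-in sqlite3 module rather than Netezza. It assumes SQLite 3.25 or later for window functions, and the table name events and the sample rows are hypothetical. The window query above counts rows before each date, while the original correlated subquery counts distinct ids. One way to keep the distinct-id meaning with window functions is to find each id's first date and take a running sum of those first appearances:

```python
import sqlite3

# Hypothetical sample data: (id, event_date)
SAMPLE = [
    (1, "2021-01-01"), (2, "2021-01-01"),
    (1, "2021-01-02"), (3, "2021-01-02"),
    (2, "2021-01-03"),
]

# The original correlated form (SQLite accepts it; Netezza does not).
CORRELATED = """
WITH dates AS (SELECT DISTINCT event_date FROM events)
SELECT event_date,
       (SELECT COUNT(DISTINCT id) FROM events e
         WHERE e.event_date < dates.event_date) AS ids_before
FROM dates
ORDER BY event_date
"""

# Window rewrite: an id has appeared before date d exactly when its first
# date is earlier than d, so take a running count of first appearances.
WINDOWED = """
WITH firsts AS (              -- first date on which each id appears
    SELECT id, MIN(event_date) AS first_date
    FROM events
    GROUP BY id
),
per_date AS (                 -- ids appearing for the first time on each date
    SELECT d.event_date, COUNT(f.id) AS new_ids
    FROM (SELECT DISTINCT event_date FROM events) d
    LEFT JOIN firsts f ON f.first_date = d.event_date
    GROUP BY d.event_date
)
SELECT event_date,
       SUM(new_ids) OVER (ORDER BY event_date) - new_ids AS ids_before
FROM per_date
ORDER BY event_date
"""


def run(sql, rows=SAMPLE):
    """Load rows into an in-memory events table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER, event_date TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(run(CORRELATED))
    print(run(WINDOWED))
```

For this sample both queries give 0, 2 and 3 ids before the three dates, whereas the row-counting version gives 4 for the last date because id 1 appears on both earlier dates.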

Related

Query gives desired results, but I would like to apply the window function in a CASE expression using this logic

I'm looking for a better solution. The query I wrote works for me, but I would like to know whether there is a better approach, or whether I could use the same logic in a CASE expression.
The query uses a window function to find the country with the highest count for each id in each quarter, then keeps the rows where seqnum is 1.
select id,Quarter_yr, country
from (select id,Quarter_yr, country, count(*) as cnt,
row_number() over (partition by id, Quarter_yr order by count(*) desc) as seqnum
from table
group by id,Quarter_yr,country
) t
where seqnum = 1;

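Maybe this query is what you're looking for. Note that TOP (1) returns only the single most frequent (id, Quarter_yr, country) combination in the whole table, not one row per id and quarter: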
select top(1) id, Quarter_yr, country, count(*) as cnt
from table
group by id, Quarter_yr, country
order by count(*) desc;
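For comparison, here is a sketch of the per-group version with the window function placed inside a CASE expression, run through Python's sqlite3 module (SQLite 3.25+ assumed). The table name visits, the sample rows and the tie-break on country are assumptions, not part of the question. Unlike TOP (1), this keeps one row per id and Quarter_yr:

```python
import sqlite3

# Hypothetical sample rows: (id, quarter_yr, country)
SAMPLE = [
    (1, "Q1", "US"), (1, "Q1", "US"), (1, "Q1", "CA"),
    (1, "Q2", "CA"),
    (2, "Q1", "FR"), (2, "Q1", "FR"), (2, "Q1", "FR"), (2, "Q1", "DE"),
]

# Counts are computed in a subquery; ROW_NUMBER() is then wrapped in a CASE
# expression that flags the most frequent country per id and quarter.
# Ties are broken by country name (an assumption; the question does not say).
QUERY = """
SELECT id, quarter_yr, country
FROM (SELECT id, quarter_yr, country,
             CASE WHEN ROW_NUMBER() OVER (PARTITION BY id, quarter_yr
                                          ORDER BY cnt DESC, country) = 1
                  THEN 1 ELSE 0 END AS is_top
      FROM (SELECT id, quarter_yr, country, COUNT(*) AS cnt
            FROM visits
            GROUP BY id, quarter_yr, country) c
     ) t
WHERE is_top = 1
ORDER BY id, quarter_yr
"""


def top_country_per_quarter(rows=SAMPLE):
    """Return (id, quarter_yr, country) for the top country of each group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (id INTEGER, quarter_yr TEXT, country TEXT)")
    con.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(top_country_per_quarter())
```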

Which is more efficient, GROUP BY or JOIN?

I have this table and I want to know which query is more efficient:
[ID_SOGGETTO]
,[COGNOME]
,[NOME]
,[DENOMINAZIONE]
,[FISICA_GIURID]
,[CODICEFISCALE]
,[PARTITAIVA]
,[ID_COMUNE]
,[DATA_NASCITA]
,[RAE]
,[SAE]
,[TIPO_SOCIETA]
,[NDG]
,[CODICECCIAA]
,[CODICECR]
,[ATTIVITALAVORATIVA]
,[PROFESSIONISTA]
,[CODICEFORNITORESAP]
,[TRASFERITOSAP]
,[ALBOPRECS]
,[ID_USER]
,[ID_USERINC]
,[ID_VERSIONE]
,[DATA_AGGIORNAMENTO]
,[DATA_STORICIZZAZIONE]
I tried this query to select all rows that have the same PARTITAIVA but a different ID_SOGGETTO:
SELECT * FROM table WHERE PARTITAIVA IN ( SELECT PARTITAIVA FROM table GROUP BY PARTITAIVA HAVING COUNT(distinct ID_SOGGETTO) > 1)
Would it be more efficient with a JOIN?

Often the most efficient way to do what you want uses window functions:
SELECT t.*
FROM (SELECT t.*,
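(DENSE_RANK() OVER (PARTITION BY PARTITAIVA ORDER BY ID_SOGGETTO ASC) +
DENSE_RANK() OVER (PARTITION BY PARTITAIVA ORDER BY ID_SOGGETTO DESC) - 1
) as cnt
FROM table t
) t
WHERE cnt > 1;
The sum of the two DENSE_RANK() values, minus 1 (the row's own value is counted by both ranks), is simply a way to calculate COUNT(DISTINCT ID_SOGGETTO) within each PARTITAIVA, because many databases do not support COUNT(DISTINCT) as a window function.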
In other databases, EXISTS would be recommended:
select t.*
from table t
where exists (select 1
from table t2
where t2.partitaiva = t.partitaiva and
t2.id_soggetto <> t.id_soggetto
);
However, I am not sure if this would be faster in SparkSQL.
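A quick way to check that the three formulations return the same rows is to run them against a toy table with Python's sqlite3 module (SQLite 3.25+ assumed; the table name soggetti and the rows are hypothetical). This only checks results, not performance, which depends on the engine and its indexes:

```python
import sqlite3

# Hypothetical sample rows: (id_soggetto, partitaiva); "B" has one distinct id.
SAMPLE = [(1, "A"), (2, "A"), (3, "B"), (3, "B"), (4, "C"), (5, "C")]

IN_QUERY = """
SELECT id_soggetto, partitaiva
FROM soggetti
WHERE partitaiva IN (SELECT partitaiva FROM soggetti
                     GROUP BY partitaiva
                     HAVING COUNT(DISTINCT id_soggetto) > 1)
ORDER BY id_soggetto
"""

# Ascending rank + descending rank - 1 = number of distinct ids in the partition.
DENSE_RANK_QUERY = """
SELECT id_soggetto, partitaiva
FROM (SELECT t.*,
             DENSE_RANK() OVER (PARTITION BY partitaiva ORDER BY id_soggetto ASC) +
             DENSE_RANK() OVER (PARTITION BY partitaiva ORDER BY id_soggetto DESC) - 1
               AS cnt
      FROM soggetti t) t
WHERE cnt > 1
ORDER BY id_soggetto
"""

EXISTS_QUERY = """
SELECT t.id_soggetto, t.partitaiva
FROM soggetti t
WHERE EXISTS (SELECT 1 FROM soggetti t2
              WHERE t2.partitaiva = t.partitaiva
                AND t2.id_soggetto <> t.id_soggetto)
ORDER BY t.id_soggetto
"""


def run(sql, rows=SAMPLE):
    """Load rows into an in-memory soggetti table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE soggetti (id_soggetto INTEGER, partitaiva TEXT)")
    con.executemany("INSERT INTO soggetti VALUES (?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for q in (IN_QUERY, DENSE_RANK_QUERY, EXISTS_QUERY):
        print(run(q))
```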

Filter out null values resulting from window function lag() in SQL query

Example query:
SELECT department, date, sum(sales) AS sales,
lag(sum(sales), 1) OVER(PARTITION BY department
ORDER BY date ASC) AS end_date_sales
FROM revenue
GROUP BY department, date;
I want to show only the rows where end_date_sales is not NULL.
Is there a clause meant specifically for this case? WHERE does not allow aggregates, and neither WHERE nor HAVING allows window functions.

One method uses a subquery:
SELECT r.*
FROM (SELECT department, date, SUM(sales) AS sales,
LAG(SUM(sales), 1) OVER (PARTITION BY department ORDER BY date ASC) AS end_date_sales
FROM revenue
GROUP BY department, date
) r
WHERE end_date_sales IS NOT NULL;
That said, I don't think the query is correct as you have written it. I would assume that you want something like this:
SELECT r.*
FROM (SELECT r.*,
LEAD(end_date, 1) OVER (PARTITION BY ? ORDER BY date ASC) AS next_end_date
FROM revenue r
) r
WHERE next_end_date IS NOT NULL;
Where ? is a column such as the customer id.
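A minimal sketch of the subquery approach, using Python's sqlite3 module (SQLite 3.25+ assumed; the sample rows are made up). Here the aggregation is done in an inner subquery, LAG is applied to the aggregated rows, and the outer WHERE removes the NULLs that LAG produces for the first date of each department:

```python
import sqlite3

# Hypothetical sample rows: (department, date, sales)
SAMPLE = [
    ("A", "2021-01-01", 10), ("A", "2021-01-01", 5),
    ("A", "2021-01-02", 20),
    ("B", "2021-01-01", 7),
]

# Inner subquery: one row per (department, date) with total sales.
# Middle subquery: LAG over the aggregated rows within each department.
# Outer WHERE: drop the NULLs LAG returns for each department's first date.
QUERY = """
SELECT department, date, sales, end_date_sales
FROM (SELECT department, date, sales,
             LAG(sales, 1) OVER (PARTITION BY department
                                 ORDER BY date ASC) AS end_date_sales
      FROM (SELECT department, date, SUM(sales) AS sales
            FROM revenue
            GROUP BY department, date) g
     ) r
WHERE end_date_sales IS NOT NULL
ORDER BY department, date
"""


def filtered_lag(rows=SAMPLE):
    """Return aggregated rows whose LAG value is not NULL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE revenue (department TEXT, date TEXT, sales INTEGER)")
    con.executemany("INSERT INTO revenue VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(filtered_lag())
```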

Try this:
select * from (select distinct *, SUM(sales) OVER (PARTITION BY dept) as total_sales from test) t
where t.date = (select max(t2.date) from test t2 where t2.dept = t.dept)
order by date, dept;
Or, more simply, without a subquery:
SELECT distinct dept, MAX(date) OVER (PARTITION BY dept) as last_date,
SUM(sales) OVER (PARTITION BY dept) as total_sales
FROM test;
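As a sanity check, the window-function query with DISTINCT returns the same rows as a plain GROUP BY. Here is a sketch with Python's sqlite3 module (SQLite 3.25+ assumed; the rows in test are hypothetical):

```python
import sqlite3

# Hypothetical sample rows for the test table: (dept, date, sales)
SAMPLE = [("A", "2021-01-01", 10), ("A", "2021-01-02", 20), ("B", "2021-01-01", 7)]

# Window aggregates repeat the per-dept value on every row; DISTINCT collapses them.
WINDOW_DISTINCT = """
SELECT DISTINCT dept,
       MAX(date)  OVER (PARTITION BY dept) AS last_date,
       SUM(sales) OVER (PARTITION BY dept) AS total_sales
FROM test
ORDER BY dept
"""

# The equivalent plain aggregation.
GROUP_BY = """
SELECT dept, MAX(date) AS last_date, SUM(sales) AS total_sales
FROM test
GROUP BY dept
ORDER BY dept
"""


def run(sql, rows=SAMPLE):
    """Load rows into an in-memory test table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (dept TEXT, date TEXT, sales INTEGER)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(run(WINDOW_DISTINCT))
    print(run(GROUP_BY))
```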

How to pass a query result into LIMIT in Impala

I am attempting to sample 20% of a table in Impala. I have heard somewhere that the built-in Impala sampling function has issues.
Is there a way to pass a subquery to the Impala LIMIT clause to sample n percent of the entire table?
I have something like this:
select
* from
table_a
order by rand()
limit
(
select
round( (count(distinct ids)) *.2,0)
from table_a)
The subquery on its own gives me 20% of the record count.

I'm not sure if Impala has specific sampling logic (some databases do). But you can use window functions:
select a.*
from (select a.*,
row_number() over (order by rand()) as seqnum,
count(*) over () as cnt
from table_a
) a
where seqnum <= cnt * 0.2;
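The same pattern can be tried locally with Python's sqlite3 module (SQLite 3.25+ assumed). SQLite has no rand(), so random() is used instead; this is an SQLite-specific substitution, and the contents of table_a here are hypothetical:

```python
import sqlite3

# Number the rows in random order, count all rows, keep the first 20%.
QUERY = """
SELECT id
FROM (SELECT a.*,
             ROW_NUMBER() OVER (ORDER BY random()) AS seqnum,
             COUNT(*) OVER () AS cnt
      FROM table_a a) a
WHERE seqnum <= cnt * 0.2
"""


def sample_20_percent(n_rows=100):
    """Fill a hypothetical table_a with ids 0..n_rows-1 and return a 20% sample."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_a (id INTEGER)")
    con.executemany("INSERT INTO table_a VALUES (?)", [(i,) for i in range(n_rows)])
    ids = [row[0] for row in con.execute(QUERY)]
    con.close()
    return ids


if __name__ == "__main__":
    print(sorted(sample_20_percent()))
```

Because ROW_NUMBER() assigns each row a distinct number, the result always has exactly floor(cnt * 0.2) rows; only which rows are chosen varies between runs.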

Query with partition and count

Given the following table, which records users' item-viewing history by session:
create table view_log (
server_time timestamp,
device char(2),
session_id char(10),
uid char(7),
item_id char(7)
);
I'm trying to understand what the following code does:
create table coo_cs as
select
item_id,
session_id,
count(distinct session_id) / (sum(count(distinct session_id)) over (partition by item_id)) cs
from view_log
group by item_id, session_id;
I've tried to break down the line with the partition to understand what it's doing, but then I get the error: DISTINCT is not implemented for window functions.
I understand basic PARTITION BY and GROUP BY, but I can't make sense of the SQL above.
Edit:
There's a rather large dataset for testing:
http://pakdd2017.recobell.io/site_view_log_small.csv000.gz

Some databases do not (yet) support count(distinct) as a window function. For this query, the count(distinct) is not necessary, because you are aggregating by the same column used for the count(distinct). Hence, count(distinct session_id) is 1 on each row.
Your query is essentially:
select item_id, session_id,
1.0 / count(session_id) over (partition by item_id) as cs
from view_log
group by item_id, session_id;
I wouldn't be surprised if you wanted the ratios at the level of item_id, in which case the intended query is:
select item_id, count(distinct session_id),
count(distinct session_id) * 1.0 / sum(count(distinct session_id)) over () as cs
from view_log
group by item_id;
If so, the equivalent logic can use a subquery:
select vl.*, numsessions * 1.0 / sum(numsessions) over () as cs
from (select item_id, count(distinct session_id) as numsessions
from view_log vl
group by item_id
) vl;
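Here is a sketch that reproduces both readings with Python's sqlite3 module (SQLite 3.25+ assumed; the sample rows are invented, not taken from the linked dataset). The grouping is done in a subquery before the window function is applied:

```python
import math
import sqlite3

# Hypothetical rows: (server_time, device, session_id, uid, item_id)
SAMPLE = [
    ("2017-04-01 10:00:00", "PC", "s1", "u1", "i1"),
    ("2017-04-01 10:05:00", "PC", "s1", "u1", "i1"),
    ("2017-04-01 11:00:00", "MW", "s2", "u2", "i1"),
    ("2017-04-01 12:00:00", "PC", "s1", "u1", "i2"),
]

# Session-level reading: each (item, session) pair gets 1 / (sessions for the item).
SESSION_LEVEL = """
SELECT item_id, session_id,
       1.0 / COUNT(*) OVER (PARTITION BY item_id) AS cs
FROM (SELECT item_id, session_id
      FROM view_log
      GROUP BY item_id, session_id) g
ORDER BY item_id, session_id
"""

# Item-level reading: each item's share of all (item, session) pairs.
ITEM_LEVEL = """
SELECT item_id, numsessions,
       numsessions * 1.0 / SUM(numsessions) OVER () AS cs
FROM (SELECT item_id, COUNT(DISTINCT session_id) AS numsessions
      FROM view_log
      GROUP BY item_id) vl
ORDER BY item_id
"""


def run(sql, rows=SAMPLE):
    """Load rows into an in-memory view_log table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE view_log (server_time TIMESTAMP, device CHAR(2), "
        "session_id CHAR(10), uid CHAR(7), item_id CHAR(7))"
    )
    con.executemany("INSERT INTO view_log VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(run(SESSION_LEVEL))
    items = run(ITEM_LEVEL)
    print(items, math.fsum(row[2] for row in items))
```

In the session-level result the cs values for each item sum to 1; in the item-level result they sum to 1 across all items.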

We Keep Coding
