Replacement for row_number() in ClickHouse - SQL

ROW_NUMBER() is not supported by the ClickHouse database, and I am looking for an alternative function.
SELECT company_name AS company,
DOMAIN,
city_name AS city,
state_province_code AS state,
country_code AS country,
location_revenue AS revenueRange,
location_TI_industry AS industry,
location_employeecount_range AS employeeSize,
topic,
location_duns AS duns,
rank AS intensityRank,
dnb_status_code AS locationStatus,
rank_delta AS intensityRankDelta,
company_id,
ROW_NUMBER() OVER (PARTITION BY DOMAIN) AS rowNumber
FROM company_intent c
WHERE c.rank > 0
AND c.rank <= 10
AND c.signal_count > 0
AND c.topic IN ('Cloud Computing')
AND c.country_code = 'US'
AND c.rank IN (7, 8, 9, 10)
GROUP BY c.location_duns,
company_name,
DOMAIN,
city_name,
state_province_code,
country_code,
location_revenue,
location_TI_industry,
location_employeecount_range,
topic,
rank,
dnb_status_code,
rank_delta,
company_id
ORDER BY intensityRank DESC
LIMIT 15;
SELECT COUNT(DISTINCT c.company_id) AS COUNT
FROM company_intent c
WHERE c.rank > 0
AND c.rank <= 10
AND c.signal_count > 0
AND c.topic IN ('Cloud Computing')
AND c.country_code = 'US'
AND c.rank IN (7, 8, 9, 10)
When I execute the query above, I get the following error:
Expected one of: SETTINGS, FORMAT, WITH, HAVING, LIMIT, FROM, PREWHERE, token, UNION ALL, Comma, WHERE, ORDER BY, INTO OUTFILE, GROUP BY
Any suggestions are appreciated.

Solution #1
SELECT
*,
rowNumberInAllBlocks()
FROM
(
-- YOUR SELECT HERE
)
https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#rownumberinallblocks says:
rowNumberInAllBlocks() Returns the ordinal number of the row in the data block. This function only considers the affected data blocks.
Solution #2
SELECT
row_number() OVER (),
...
FROM
...
https://clickhouse.com/docs/en/sql-reference/window-functions/
In my tests, both solutions show identical results. However, keep in mind that, as of early 2022, window functions run in single-threaded mode.

At the time of this answer, ClickHouse doesn't support window functions. There is a rowNumberInAllBlocks function that might be interesting to you.

SELECT *, rowNumberInAllBlocks() as row_count FROM (SELECT .....)

Something like this (it looks terrible, but it works well):
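-- current: position of the row within its i_device group; last: number of rows in that group
SELECT *, rn + 1 - min_rn AS current, max_rn - min_rn + 1 AS last FROM (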
SELECT *, rowNumberInAllBlocks() rn FROM (
SELECT i_device, i_time
FROM tbl
ORDER BY i_device, i_time
) t
) t1 LEFT JOIN (
SELECT i_device, min(rn) min_rn, max(rn) max_rn FROM (
SELECT *, rowNumberInAllBlocks() rn FROM (
SELECT i_device, i_time
FROM tbl
ORDER BY i_device, i_time
) t
) t GROUP BY i_device
) t2 USING (i_device)
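To see why the arithmetic above yields a per-device row number, here is a minimal sketch that replays it in SQLite through Python's sqlite3 module. SQLite has no rowNumberInAllBlocks(), so as an assumption ROW_NUMBER() OVER (ORDER BY i_device, i_time) - 1 stands in for it (the current ClickHouse docs describe that function's numbering as starting at 0). The tbl rows are invented, and the extra expected column, computed with a real per-device ROW_NUMBER(), is there only for comparison. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (i_device TEXT, i_time INTEGER)")
# Hypothetical sample data: three rows for device 'a', two for device 'b'.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("a", 10), ("a", 20), ("a", 30), ("b", 5), ("b", 15)],
)

# Zero-based global numbering over the sorted rows, standing in for
# ClickHouse's rowNumberInAllBlocks() (assumption, not ClickHouse itself).
numbered = """
    SELECT i_device, i_time,
           ROW_NUMBER() OVER (ORDER BY i_device, i_time) - 1 AS rn
    FROM tbl
"""

query = f"""
    SELECT t1.i_device, t1.i_time,
           t1.rn + 1 - t2.min_rn AS "current",   -- position within the device
           t2.max_rn - t2.min_rn + 1 AS "last",  -- row count for the device
           ROW_NUMBER() OVER (PARTITION BY t1.i_device
                              ORDER BY t1.i_time) AS expected
    FROM ({numbered}) AS t1
    JOIN (SELECT i_device, MIN(rn) AS min_rn, MAX(rn) AS max_rn
          FROM ({numbered}) AS s
          GROUP BY i_device) AS t2
      USING (i_device)
    ORDER BY t1.i_device, t1.i_time
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (i_device, i_time, current, last, expected)
```

Because current subtracts the device's own minimum rn, it does not matter whether the global numbering starts at 0 or 1.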

Related

Finding Median Using MySQL

I know that there are many ways to find the median, but I am trying to use this method to find the median. Can someone explain to me why this does not work? The error here says "Invalid use of group function," but when I use HAVING instead of WHERE, the system doesn't recognize what RowNumber is. I'm very confused.
SELECT
ROUND(AVG(LS.LAT_N))
FROM(
SELECT
LAT_N,
ROW_NUMBER() OVER (ORDER BY LAT_N) AS RowNumber
FROM
STATION
) AS LS
WHERE
RowNumber IN (
IF(
FLOOR(COUNT(LS.LAT_N)/2+0.5) = CEIL(COUNT(LS.LAT_N)/2+0.5),
FLOOR(COUNT(LS.LAT_N)/2+0.5),
FLOOR(COUNT(LS.LAT_N)/2+0.5) AND CEIL(COUNT(LS.LAT_N)/2+0.5)
)
)
I typically write this as:
SELECT AVG(LAT_N)
FROM (SELECT LAT_N,
ROW_NUMBER() OVER (ORDER BY LAT_N) AS RowNumber,
COUNT(*) OVER () as cnt
FROM STATION
) s
WHERE 2 * RowNumber IN (CNT, CNT + 1, CNT + 2);
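Here is a minimal sketch, run against SQLite from Python rather than MySQL (window functions need SQLite 3.25+). It executes the query above on made-up LAT_N values and shows the 2 * RowNumber test picking one middle row for an odd count and two for an even count. The median() helper is hypothetical.

```python
import sqlite3

def median(values):
    """Median of `values` via the ROW_NUMBER() / COUNT(*) OVER () query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (LAT_N REAL)")
    conn.executemany("INSERT INTO STATION VALUES (?)", [(v,) for v in values])
    (result,) = conn.execute("""
        SELECT AVG(LAT_N)
        FROM (SELECT LAT_N,
                     ROW_NUMBER() OVER (ORDER BY LAT_N) AS RowNumber,
                     COUNT(*) OVER () AS cnt
              FROM STATION) s
        -- odd cnt: only 2*RowNumber = cnt + 1 matches (the middle row);
        -- even cnt: cnt and cnt + 2 match (the two middle rows)
        WHERE 2 * RowNumber IN (cnt, cnt + 1, cnt + 2)
    """).fetchone()
    conn.close()
    return result

print(median([3.0, 1.0, 2.0]))       # → 2.0
print(median([4.0, 1.0, 3.0, 2.0]))  # → 2.5
```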
The median is the middle element in an ordered series, or the average of the two middle elements if the series has an even number of elements.
SELECT
AVG(LAT_N)
FROM(
SELECT
LAT_N,
ROW_NUMBER() OVER (ORDER BY LAT_N) AS RowNumber
FROM
STATION
) AS q
WHERE
RowNumber >= FLOOR ( (SELECT COUNT(*) FROM STATION)/2 + 0.5)
AND
RowNumber <= CEIL ( (SELECT COUNT(*) FROM STATION)/2 + 0.5)
Here is dbfiddle https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=b31e08f4ece61ecb95d9dde76c389fb1
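The bounds in this query keep row positions FLOOR(n/2 + 0.5) through CEIL(n/2 + 0.5), where n is (SELECT COUNT(*) FROM STATION). The following quick arithmetic check in Python uses a hypothetical middle_positions() helper. It confirms that the bounds select the single middle row when n is odd and the two middle rows when n is even.

```python
import math

def middle_positions(n):
    """1-based row positions kept by FLOOR(n/2 + 0.5) <= RowNumber <= CEIL(n/2 + 0.5)."""
    lo = math.floor(n / 2 + 0.5)
    hi = math.ceil(n / 2 + 0.5)
    return list(range(lo, hi + 1))

for n in range(1, 7):
    print(n, middle_positions(n))
```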

Selecting the latest order

I need to select the data of all my customers, with the records displayed in the image, but only the most recent record for each customer: for example, order #E987 for John and #E888 for Adam. As you can see from the example, my SELECT statement returns all of the order records.
You don't mention the specific database, so I'll answer with a generic solution.
You can do:
select *
from (
select t.*,
row_number() over(partition by name order by order_date desc) as rn
from t
) x
where rn = 1
You can use the analytic function row_number():
Select * from
(Select t.*,
Row_number() over (partition by customer_id order by order_date desc) as rn
From your_table t) t
Where rn = 1
Or you can use not exists as follows:
Select *
From your_table t
Where not exists
(Select 1 from your_table tt
Where t.customer_id = tt.customer_id
And tt.order_date > t.order_date)
You can do it with a subquery that finds the last order date.
SELECT t.*
FROM your_table t
JOIN (SELECT tt.customer_id,
MAX(tt.order_date) MaxOrderDate
FROM your_table tt
GROUP BY tt.customer_id) AS tt
ON t.customer_id = tt.customer_id
AND t.order_date = tt.MaxOrderDate
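Here is a minimal sketch that compares the three approaches above on a hypothetical your_table in SQLite, run through Python's sqlite3. The column values and order dates are invented; only the order numbers come from the question. All three return E987 for John and E888 for Adam. They differ only when a customer has two orders on the same latest date: ROW_NUMBER() keeps exactly one of them, while NOT EXISTS and the MAX() join return both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (customer_id INTEGER, name TEXT, order_no TEXT, order_date TEXT)"
)
# Hypothetical orders; ISO-8601 date strings compare correctly as text.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, "John", "E123", "2021-01-05"),
        (1, "John", "E987", "2021-03-01"),
        (2, "Adam", "E555", "2021-02-01"),
        (2, "Adam", "E888", "2021-04-10"),
    ],
)

queries = {
    "row_number": """
        SELECT customer_id, name, order_no, order_date FROM (
            SELECT t.*, ROW_NUMBER() OVER (PARTITION BY customer_id
                                           ORDER BY order_date DESC) AS rn
            FROM your_table t) x
        WHERE rn = 1""",
    "not_exists": """
        SELECT * FROM your_table t
        WHERE NOT EXISTS (SELECT 1 FROM your_table tt
                          WHERE tt.customer_id = t.customer_id
                            AND tt.order_date > t.order_date)""",
    "join_max": """
        SELECT t.* FROM your_table t
        JOIN (SELECT customer_id, MAX(order_date) AS MaxOrderDate
              FROM your_table GROUP BY customer_id) m
          ON t.customer_id = m.customer_id AND t.order_date = m.MaxOrderDate""",
}

results = {}
for label, sql in queries.items():
    # Sort in Python so the three result sets can be compared directly.
    results[label] = sorted(conn.execute(sql).fetchall())
    print(label, results[label])
```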

Hide row_number() column

I'm using row_number() as a counter to see whether any data matches the conditions I specified. I want the rows where p = 1, but I don't need the column p itself, which is 1 for every returned row. Is there any way I can exclude that column?
with base as(
select
state.simple as State,
customer_status,
cast(created_at as date) as Order_Date,
cast(dbt_valid_from as date) as Aktiv_Start_Date,
row_number() over (partition by lower(email) order by dbt_valid_from) as p
from analytics.fct_orders_all
left join `analytics.dim_customers_history` on
lower(customer.email) = lower(email)
where customer_status like "Aktiv%"
and state.simple = "pending"
and dbt_valid_from between created_at and timestamp_add(created_at, interval 14 day)
)
select *
from base
where p = 1
order by 4 desc
BigQuery has a nice extension called SELECT * EXCEPT:
with base as (...)
select * except(p) from base where p = 1 order by 4 desc
BigQuery allows this using except:
select b.* except (p)
from base b
where p = 1;
You can also use REPLACE to substitute a new expression for a column's value while keeping the column's name. These very handy modifiers are explained in the documentation.
You can always avoid using select * and list the columns you want explicitly.
I have a similar query here:
;with base as(
select
'Pending' as State,
a.preferredName,
cast(createDate as date) as Order_Date,
cast(birthdate as date) as Aktiv_Start_Date,
a.nationalitycode,
row_number() over (partition by a.nationalitycode order by a.nationalitycode) as p
from [app].[applicant] a
)
select *
from base
where p = 1
order by 4 desc
Putting the list of columns that I want to see explicitly in the select:
;with base as(
select
'Pending' as State,
a.preferredName,
cast(createDate as date) as Order_Date,
cast(birthdate as date) as Aktiv_Start_Date,
a.nationalitycode,
row_number() over (partition by a.nationalitycode order by a.nationalitycode) as p
from [app].[applicant] a
)
select
State,
preferredName,
Order_Date,
Aktiv_Start_Date,
nationalityCode
from base
where p = 1
order by 4 desc
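Here is a small sketch of the explicit-column approach in SQLite, run through Python's sqlite3 rather than SQL Server. The CTE still computes p for filtering, but the outer SELECT never lists it, so p is absent from the result. The applicant rows are invented. The window orders by preferredName rather than nationalitycode so that the row kept in each partition is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applicant (preferredName TEXT, nationalitycode TEXT)")
# Hypothetical applicants: two share the 'BR' nationality code.
conn.executemany(
    "INSERT INTO applicant VALUES (?, ?)",
    [("Ana", "BR"), ("Bruno", "BR"), ("Chen", "CN")],
)

cur = conn.execute("""
    WITH base AS (
        SELECT a.preferredName,
               a.nationalitycode,
               ROW_NUMBER() OVER (PARTITION BY a.nationalitycode
                                  ORDER BY a.preferredName) AS p
        FROM applicant a
    )
    -- Listing the columns explicitly leaves the helper column p out.
    SELECT preferredName, nationalitycode
    FROM base
    WHERE p = 1
    ORDER BY nationalitycode
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)  # → [('Ana', 'BR'), ('Chen', 'CN')]
```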

Select most recent status for each ID and department code

I have the following table:
I want to get the most recent status for each dept_code that a CL_ID has. So the desired output would be this:
I have tried the following, but this gives me just the most recent status for each client and not for each of their dept_codes.
SELECT *
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT] C
INNER JOIN
(SELECT CLIENT_NUMBER, MAX(STATUS_DATE) AS SDATE
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT]
GROUP BY CLIENT_NUMBER) X
ON X.CLIENT_NUMBER = C.CLIENT_NUMBER
AND X.SDATE = C.STATUS_DATE
ORDER BY C.CLIENT_NUMBER
Any help would be much appreciated. Thanks.
A convenient method that works in SQL Server is:
select top (1) with ties cl.*
from [CIMSHR6_MERGED].[dbo].[C3CLSTAT] cl
order by row_number() over (partition by cl_id, dept_code order by status_date desc);
A method that is efficient with the right indexes in almost any database is:
select cl.*
from [CIMSHR6_MERGED].[dbo].[C3CLSTAT] cl
where cl.status_date = (select max(cl2.status_date)
from [CIMSHR6_MERGED].[dbo].[C3CLSTAT] cl2
where cl2.cl_id = cl.cl_id and cl2.dept_code = cl.dept_code
);
The right index is on (cl_id, dept_code, status_date).
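Here is a minimal sketch in SQLite, run through Python's sqlite3 rather than SQL Server. It creates the suggested (CL_ID, Dept_Code, Status_Date) index and checks that the correlated MAX() subquery and a ROW_NUMBER() partitioned by both CL_ID and Dept_Code return the same latest row per pair. The sample rows are hypothetical. Partitioning by the pair, not by the client alone, is what the query in the question was missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE C3CLSTAT (CL_ID INTEGER, Status_Date TEXT, Status TEXT, Dept_Code TEXT)"
)
# The index suggested for the correlated MAX() lookup.
conn.execute("CREATE INDEX ix_cl_dept_date ON C3CLSTAT (CL_ID, Dept_Code, Status_Date)")
# Hypothetical status history: client 1 has two departments, client 2 has one.
conn.executemany(
    "INSERT INTO C3CLSTAT VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "Open", "A"),
        (1, "2020-06-01", "Closed", "A"),
        (1, "2020-03-01", "Open", "B"),
        (2, "2020-02-01", "Open", "A"),
        (2, "2020-05-01", "Pending", "A"),
    ],
)

correlated = conn.execute("""
    SELECT cl.* FROM C3CLSTAT cl
    WHERE cl.Status_Date = (SELECT MAX(cl2.Status_Date) FROM C3CLSTAT cl2
                            WHERE cl2.CL_ID = cl.CL_ID
                              AND cl2.Dept_Code = cl.Dept_Code)
""").fetchall()

windowed = conn.execute("""
    SELECT CL_ID, Status_Date, Status, Dept_Code FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY CL_ID, Dept_Code
                                     ORDER BY Status_Date DESC) AS rn
        FROM C3CLSTAT) t
    WHERE rn = 1
""").fetchall()

print(sorted(correlated))
print(sorted(windowed))
```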
I would also use ROW_NUMBER, but with a subquery:
SELECT CL_ID, Status_date, Status, Dept_code
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY CL_ID, Dept_code ORDER BY Status_date DESC) rn
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT]
) t
WHERE rn = 1;
1) First, partition the rows by Dept_Code and CL_ID, and rank each row within its group by Status_Date in descending order.
2) Select all the rows with rnk = 1, which gives the desired result.
SELECT Z.CL_ID,
Z.Status_Date,
Z.Status,
Z.Dept_Code
FROM
(
SELECT *,
RANK() OVER (PARTITION BY Dept_Code, CL_ID ORDER BY Status_Date DESC) AS rnk
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT]
) Z
WHERE Z.rnk = 1;
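One behavior is worth knowing about the RANK() version. When two statuses share the latest Status_Date, both get rank 1 and both are returned, whereas ROW_NUMBER() returns exactly one of them. The following small SQLite sketch shows the difference. It runs through Python's sqlite3 on hypothetical rows with a deliberate tie, and the latest() helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE C3CLSTAT (CL_ID INTEGER, Status_Date TEXT, Status TEXT, Dept_Code TEXT)"
)
# Hypothetical tie: two statuses recorded on the same latest date.
conn.executemany(
    "INSERT INTO C3CLSTAT VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "Open", "A"),
        (1, "2020-06-01", "Closed", "A"),
        (1, "2020-06-01", "Reopened", "A"),
    ],
)

def latest(func):
    """Rows numbered 1 by the given ranking function within (Dept_Code, CL_ID)."""
    return conn.execute(f"""
        SELECT Z.CL_ID, Z.Status_Date, Z.Status, Z.Dept_Code FROM (
            SELECT *, {func}() OVER (PARTITION BY Dept_Code, CL_ID
                                     ORDER BY Status_Date DESC) AS rnk
            FROM C3CLSTAT) Z
        WHERE Z.rnk = 1
    """).fetchall()

print(len(latest("RANK")))        # → 2: both tied rows get rank 1
print(len(latest("ROW_NUMBER")))  # → 1: one row, chosen arbitrarily among the ties
```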
This works in almost all databases:
select * from c3clstat c
where exists
(select 1 from c3clstat c1
where c1.cl_id=c.cl_id
and c1.dept_code=c.dept_code
group by cl_id,dept_code
having c.status_date=max(c1.status_date)
)

How to query the time at which the max wind occurred from a database?

I want to find the time at which the max wind occurred, the max wind itself, and the total rain from a database. The table has three columns: observerTime, wind and rain. How do I write the SQL statement to get this result?
select observerTime from t where wind = (select max(wind) from t)
Or, if you need the latest time at which it occurs:
select max(observerTime) from t where wind = (select max(wind) from t)
You don't mention the database, but one of the following is likely to work:
select top 1 *, (select sum(rain) from t) as TotalRain
from t
order by wind desc
or:
select *, (select sum(rain) from t) as TotalRain
from t
order by wind desc
limit 1
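Here is a minimal sketch of the LIMIT 1 variant, run against SQLite from Python with invented readings for the table t. The row with the highest wind comes back together with the total rain over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (observerTime TEXT, wind REAL, rain REAL)")
# Hypothetical readings.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2013-01-01 06:00", 12.0, 0.5),
        ("2013-01-01 12:00", 30.5, 2.0),
        ("2013-01-01 18:00", 8.0, 1.5),
    ],
)

# The scalar subquery sums rain over all rows, independent of LIMIT 1.
row = conn.execute("""
    SELECT *, (SELECT SUM(rain) FROM t) AS TotalRain
    FROM t
    ORDER BY wind DESC
    LIMIT 1
""").fetchone()
print(row)  # → ('2013-01-01 12:00', 30.5, 2.0, 4.0)
```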
or
select t.*, (select sum(rain) from t) as TotalRain
from (select *
from t
order by wind desc
) t
where rownum = 1
You should be able to use something like this:
select t1.observerTime,
t1.wind,
(select sum(rain) from yourtable) TotalRain
from yourtable t1
inner join
(
select max(wind) MaxWind
from yourtable
) t2
on t1.wind = t2.maxwind
Since you are using SQL Server, you can also use row_number():
select observertime,
wind,
(select sum(rain) from yourtable) TotalRain
from
(
select observertime,
wind,
rain,
row_number() over(order by wind desc) rn
from yourtable
) src
where rn = 1
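The join and the row_number() queries agree when the maximum wind occurs once, but they differ when there are ties. The following small SQLite sketch uses hypothetical readings with the maximum recorded twice, run through Python's sqlite3. It shows the join returning every tied row and ROW_NUMBER() returning only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (observertime TEXT, wind REAL, rain REAL)")
# Hypothetical readings where the maximum wind (30.5) occurs twice.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [("06:00", 12.0, 0.5), ("12:00", 30.5, 2.0), ("18:00", 30.5, 1.5)],
)

joined = conn.execute("""
    SELECT t1.observertime, t1.wind, (SELECT SUM(rain) FROM yourtable) AS TotalRain
    FROM yourtable t1
    INNER JOIN (SELECT MAX(wind) AS MaxWind FROM yourtable) t2
      ON t1.wind = t2.MaxWind
""").fetchall()

numbered = conn.execute("""
    SELECT observertime, wind, (SELECT SUM(rain) FROM yourtable) AS TotalRain
    FROM (SELECT observertime, wind, rain,
                 ROW_NUMBER() OVER (ORDER BY wind DESC) AS rn
          FROM yourtable) src
    WHERE rn = 1
""").fetchall()

print(len(joined))    # → 2: the join keeps every row tied for the maximum
print(len(numbered))  # → 1: ROW_NUMBER() keeps a single row
```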