Group data with a single statement on CrateDB nested SELECT - sql

How can I group the following query to the time frame in CrateDB?
SELECT * FROM (
SELECT
(
SELECT
date_bin('1 day'::INTERVAL, time_index, 0) AS time_frame,
count(*) FROM schema.status
WHERE processstatus IN ('State_01')
GROUP BY time_frame
ORDER BY time_frame DESC
) AS parts_good,
(
SELECT
date_bin('1 day'::INTERVAL, time_index, 0) AS time_frame,
count(*) FROM schema.status
WHERE processstatus IN ('State_02')
GROUP BY time_frame
ORDER BY time_frame DESC
) AS parts_bad
)
At the moment I'm getting the following error:
Error! UnsupportedFeatureException[Subqueries with more than 1 column are not supported.]
Maybe I can make it work with a JOIN, but if possible I would like to avoid repeating date_bin(), GROUP BY and ORDER BY in each SELECT statement. Any suggestions?
Thanks!

I am not entirely sure what you are trying to achieve, but the following query returns the good and bad part counts for every time_frame:
SELECT
date_bin('1 day'::INTERVAL, time_index, 0) AS time_frame,
count(*) FILTER ( WHERE processstatus = 'State_01') AS "parts_good",
count(*) FILTER ( WHERE processstatus = 'State_02') AS "parts_bad"
FROM schema.status
GROUP BY time_frame
ORDER BY time_frame DESC
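The idea behind the answer (one pass over the table, one conditional count per status) can be tried out locally with a minimal sketch. It uses Python's built-in sqlite3 module rather than CrateDB, so several things here are assumptions: the table is a hypothetical stand-in for schema.status, date(time_index) replaces date_bin('1 day', ...), and SUM(CASE ...) is used as a portable equivalent of count(*) FILTER (WHERE ...).

```python
import sqlite3

# Hypothetical stand-in for schema.status; SQLite is used only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (time_index TEXT, processstatus TEXT)")
conn.executemany(
    "INSERT INTO status VALUES (?, ?)",
    [
        ("2023-01-01 08:00:00", "State_01"),
        ("2023-01-01 09:30:00", "State_01"),
        ("2023-01-01 10:15:00", "State_02"),
        ("2023-01-02 07:45:00", "State_02"),
        ("2023-01-02 11:00:00", "State_03"),
    ],
)

# date(time_index) plays the role of date_bin('1 day', ...);
# each SUM(CASE ...) plays the role of count(*) FILTER (WHERE ...).
rows = conn.execute(
    """
    SELECT date(time_index) AS time_frame,
           SUM(CASE WHEN processstatus = 'State_01' THEN 1 ELSE 0 END) AS parts_good,
           SUM(CASE WHEN processstatus = 'State_02' THEN 1 ELSE 0 END) AS parts_bad
    FROM status
    GROUP BY time_frame
    ORDER BY time_frame DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Each day appears once, with both counts side by side, so date_bin(), GROUP BY and ORDER BY only need to be written once.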


Returning the full record of each duplicated row by selecting the table and joining it to the duplicates?

The first query works. Query A is based on a post from StackOverflow (Using GROUP BY and HAVING COUNT(*) >1 to select duplicate and non-duplicate field).
But is it possible to return the full record of each duplicated row by selecting the table and joining it to the duplicates? That's what I'm attempting in Query B. I'm trying to do so on two fields. Is it possible to accomplish this with the HAVING clause constructed this way? I'm a n00b. Any advice or education would be appreciated.
Query A) Based on an example from StackOverflow:
SELECT InstanceID, InstanceSequenceNumber
FROM [dbo].[ANBasics]
WHERE InstanceID IN
(SELECT InstanceID FROM [dbo].[ANBasics]
GROUP BY InstanceID
HAVING (COUNT(*) > 1))
ORDER BY InstanceID
Query B) What I'm trying to accomplish:
SELECT A.*, COUNT(*) AS B
FROM [dbo].[ANBasics] AS A
JOIN(
SELECT [InstanceID], [InstanceSequenceNumber], COUNT(*)
FROM [dbo].[ANBasics]
GROUP BY [InstanceID], [InstanceSequenceNumber]
HAVING (B > 1) )
ON A.[InstanceID] = B.[InstanceID]
AND A.[InstanceSequenceNumber] = B.[InstanceSequenceNumber]
ORDER BY A.[InstanceID]
If I understand correctly, window functions are the simplest solution:
SELECT ab.*
FROM (SELECT ab.*,
COUNT(*) OVER (PARTITION BY InstanceID, InstanceSequenceNumber) as cnt
FROM [dbo].[ANBasics] ab
) ab
WHERE cnt > 1;
If you only want duplicates of InstanceID, regardless of InstanceSequenceNumber:
SELECT ab.*
FROM (SELECT ab.*,
COUNT(*) OVER (PARTITION BY InstanceID) as cnt
FROM [dbo].[ANBasics] ab
) ab
WHERE cnt > 1;
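A minimal runnable sketch of the two-column window count, written in Python with the built-in sqlite3 module instead of SQL Server. The Id column and the sample rows are hypothetical, added only so the duplicated rows can be told apart; window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down version of dbo.ANBasics; Id is an assumed surrogate key.
conn.execute(
    "CREATE TABLE ANBasics ("
    "Id INTEGER PRIMARY KEY, InstanceID INTEGER, InstanceSequenceNumber INTEGER)"
)
conn.executemany(
    "INSERT INTO ANBasics (InstanceID, InstanceSequenceNumber) VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (2, 1), (3, 5), (3, 5)],
)

# Count rows per (InstanceID, InstanceSequenceNumber) without collapsing them,
# then keep every full row whose pair occurs more than once.
dupes = conn.execute(
    """
    SELECT Id, InstanceID, InstanceSequenceNumber
    FROM (SELECT ab.*,
                 COUNT(*) OVER (PARTITION BY InstanceID, InstanceSequenceNumber) AS cnt
          FROM ANBasics ab) AS counted
    WHERE cnt > 1
    ORDER BY Id
    """
).fetchall()

for row in dupes:
    print(row)
```

Unlike GROUP BY with HAVING, the window count keeps every original row, which is why no join back to the table is needed.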

Replacement for row_number() in clickhouse

ROW_NUMBER() is not supported by the ClickHouse database, so I am looking for an alternative function.
SELECT company_name AS company,
DOMAIN,
city_name AS city,
state_province_code AS state,
country_code AS country,
location_revenue AS revenueRange,
location_TI_industry AS industry,
location_employeecount_range AS employeeSize,
topic,
location_duns AS duns,
rank AS intensityRank,
dnb_status_code AS locationStatus,
rank_delta AS intensityRankDelta,
company_id,
ROW_NUMBER() OVER (PARTITION BY DOMAIN) AS rowNumber
FROM company_intent c
WHERE c.rank > 0
AND c.rank <= 10
AND c.signal_count > 0
AND c.topic IN ('Cloud Computing')
AND c.country_code = 'US'
AND c.rank IN (7, 8, 9, 10)
GROUP BY c.location_duns,
company_name,
DOMAIN,
city_name,
state_province_code,
country_code,
location_revenue,
location_TI_industry,
location_employeecount_range,
topic,
rank,
dnb_status_code,
rank_delta,
company_id
ORDER BY intensityRank DESC
LIMIT 15
SELECT COUNT (DISTINCT c.company_id) AS COUNT
FROM company_intent c
WHERE c.rank > 0
AND c.rank <= 10
AND c.signal_count > 0
AND c.topic IN ('Cloud Computing')
AND c.country_code = 'US'
AND c.rank IN (7, 8, 9, 10)
When I executed the above query, I got the error below.
Expected one of: SETTINGS, FORMAT, WITH, HAVING, LIMIT, FROM, PREWHERE, token, UNION ALL, Comma, WHERE, ORDER BY, INTO OUTFILE, GROUP BY
Any suggestions are appreciated.
Solution #1
SELECT
*,
rowNumberInAllBlocks()
FROM
(
-- YOUR SELECT HERE
)
https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#rownumberinallblocks says:
rowNumberInAllBlocks() Returns the ordinal number of the row in the data block. This function only considers the affected data blocks.
Solution #2
SELECT
row_number() OVER (),
...
FROM
...
https://clickhouse.com/docs/en/sql-reference/window-functions/
In my tests, both solutions produced identical results. Keep in mind, however, that as of the beginning of 2022 window functions run in single-threaded mode.
ClickHouse doesn't support Window Functions for now. There is a rowNumberInAllBlocks function that might be interesting to you.
SELECT *, rowNumberInAllBlocks() as row_count FROM (SELECT .....)
Something like this (it looks terrible but works well):
SELECT *, rn +1 -min_rn current, max_rn - min_rn + 1 last FROM (
SELECT *, rowNumberInAllBlocks() rn FROM (
SELECT i_device, i_time
FROM tbl
ORDER BY i_device, i_time
) t
) t1 LEFT JOIN (
SELECT i_device, min(rn) min_rn, max(rn) max_rn FROM (
SELECT *, rowNumberInAllBlocks() rn FROM (
SELECT i_device, i_time
FROM tbl
ORDER BY i_device, i_time
) t
) t GROUP BY i_device
) t2 USING (i_device)

Trying to find duplicate values in TWO rows and TWO columns - SQL Server

Using SQL Server, I'm not a DBA but I can write some general SQL. Been pulling my hair out for about an hour now. Searching I've found several solutions but they all fail due to how GROUP BY works.
I have a table with two columns that I'm trying to check for duplicates:
userid
orderdate
I'm looking for rows that have BOTH userid and orderdate as duplicates. I want to display these rows.
If I use group by, I can't pull any other data, such as the order ID, because it's not in the group by clause.
You could use the grouped query in a subquery:
SELECT *
FROM mytable a
WHERE EXISTS (SELECT userid, orderdate
FROM mytable b
WHERE a.userid = b.userid AND a.orderdate = b.orderdate
GROUP BY userid, orderdate
HAVING COUNT(*) > 1)
You can also use a windowed function:
; With CTE as
(Select *
, count(*) over (partition by UserID, OrderDate) as DupRows
from MyTable)
Select *
from CTE
where DupRows > 1
order by UserID, OrderDate
You can get the duplicates by using GROUP BY and HAVING, like so:
SELECT
userid,orderdate, COUNT(*)
FROM
yourTable
GROUP BY
userid,orderdate
HAVING
COUNT(*) > 1
EDIT:
SELECT * FROM yourTable
WHERE CONCAT(userid,orderdate) IN
(
SELECT
CONCAT(userid,orderdate)
FROM
yourTable
GROUP BY
userid,orderdate
HAVING
COUNT(*) > 1
)
SELECT *
FROM myTable
WHERE CAST(userid as Varchar) + '/' + CONVERT(varchar(10),orderdate,103) In
(
SELECT
CAST(userid as Varchar) + '/' + CONVERT(varchar(10),orderdate,103)
FROM myTable
GROUP BY userid , orderdate
HAVING COUNT(*) > 1
);

Using query result as subquery syntax

I have a table in which I need to identify duplicate entries so that I can delete them. I can find the duplicates using the following query:
select s.*, t.*
from [tableXYZ] s
join (
select [date], [product], count(*) as qty
from [tableXYZ]
group by [date], [product]
having count(*) > 1
) t on s.[date] = t.[date] and s.[product] = t.[product]
ORDER BY s.[date], s.[product], s.[id]
I then need to filter this result to show only the rows where [fieldZ] IS NULL.
I've tried the following, but I get the error "The column 'date' was specified multiple times for 'subquery'."
select * from
(
select s.*, t.*
from [tableXYZ] s
join (
select [date], [product], count(*) as qty
from [tableXYZ]
group by [date], [product]
having count(*) > 1
) t on s.[date] = t.[date] and s.[product] = t.[product]
) as subquery
where [fieldZ] is null
Your subquery contains the date column twice because you are selecting both s.* and t.*, which returns s.date and t.date. If you need both columns, alias one of them.
You will also run into this problem with the product column. Your subquery cannot return multiple columns with the same name. Only select the columns you need in your subquery instead of selecting all columns. This is a good practice in general and will solve this issue.
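A minimal sketch of that advice, run through Python's built-in sqlite3 module rather than SQL Server. SQLite would not raise the duplicate-column error itself, so this only shows the shape of the fix: take s.* plus the one extra column (qty) from t, so every column name in the subquery is unique. The column types and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical version of tableXYZ; column types and sample data are assumptions.
conn.execute(
    "CREATE TABLE tableXYZ ("
    "[id] INTEGER PRIMARY KEY, [date] TEXT, [product] TEXT, [fieldZ] TEXT)"
)
conn.executemany(
    "INSERT INTO tableXYZ VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "A", None),
        (2, "2020-01-01", "A", "x"),
        (3, "2020-01-02", "B", None),
        (4, "2020-01-03", "C", None),
        (5, "2020-01-03", "C", None),
    ],
)

# Select s.* and only t.qty, so [date] and [product] appear once in the subquery.
rows = conn.execute(
    """
    SELECT *
    FROM (
        SELECT s.*, t.qty
        FROM tableXYZ s
        JOIN (
            SELECT [date], [product], COUNT(*) AS qty
            FROM tableXYZ
            GROUP BY [date], [product]
            HAVING COUNT(*) > 1
        ) t ON s.[date] = t.[date] AND s.[product] = t.[product]
    ) AS subquery
    WHERE [fieldZ] IS NULL
    ORDER BY [id]
    """
).fetchall()

for row in rows:
    print(row)
```

Only duplicated (date, product) rows with a NULL fieldZ come back, each carrying its qty.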

How to find the most repeated data in Oracle?

I have a TRANSACTIONS table.
Its columns are Personnel_id and Personnel_Name.
I want to find the most frequently repeated data, sorted by how often it occurs.
How can I do it?
I tried and found the most repeated result, but I can't show personnel_name.
Here is my query:
SELECT PERSONNEL_ID,
COUNT(PERSONNEL_ID)
FROM KOMTAS.TRANSACTIONS
GROUP BY PERSONNEL_ID;
And this code gives an error:
SELECT MAX(R),
PERSONNEL_ID
FROM ( SELECT PERSONNEL_ID,
COUNT(PERSONNEL_ID) R
FROM KOMTAS.TRANSACTIONS
GROUP BY PERSONNEL_ID
) ;
Help, please!
Add a second group by to the second query:
SELECT MAX(R),
PERSONNEL_ID
FROM ( SELECT PERSONNEL_ID,
COUNT(PERSONNEL_ID) R
FROM KOMTAS.TRANSACTIONS
GROUP BY PERSONNEL_ID )
GROUP BY PERSONNEL_ID;
I would suggest using rownum and a subquery:
SELECT t.*
FROM (SELECT PERSONNEL_ID, COUNT(PERSONNEL_ID) as cnt
FROM KOMTAS.TRANSACTIONS
GROUP BY PERSONNEL_ID
ORDER BY COUNT(PERSONNEL_ID) DESC
) t
WHERE rownum = 1;
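The question also asks for personnel_name alongside the count. One possible way, sketched here with Python's built-in sqlite3 module rather than Oracle, is to group by both columns and keep the first row of the sorted result. The table contents are hypothetical, and LIMIT 1 is SQLite syntax standing in for the rownum filter above; the sketch also assumes each personnel_id always carries the same personnel_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for KOMTAS.TRANSACTIONS with made-up rows.
conn.execute("CREATE TABLE transactions (personnel_id INTEGER, personnel_name TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (2, "Grace"), (3, "Linus"), (2, "Grace"), (3, "Linus")],
)

# Grouping by id and name lets the name be selected next to the count;
# sorting by the count and keeping one row yields the most frequent person.
top = conn.execute(
    """
    SELECT personnel_id, personnel_name, COUNT(*) AS cnt
    FROM transactions
    GROUP BY personnel_id, personnel_name
    ORDER BY cnt DESC
    LIMIT 1
    """
).fetchone()

print(top)
```

If several people tie for the highest count, this returns only one of them, just as the rownum = 1 filter does.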
Try This
SELECT MAX(cnt) AS c,
PersonId
FROM ( SELECT PersonId,
COUNT(PersonId) cnt
FROM TRANSACTIONS
GROUP BY PersonId )
GROUP BY PersonId
ORDER BY c DESC;
Check here SQLFiddle