How can I find duplicate records in ClickHouse? [closed] - sql

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I want to know how I can find duplicate data entries within one table in ClickHouse.
I am investigating a MergeTree table and have already run OPTIMIZE statements against it, but that didn't do the trick. The duplicate entries still persist.
Ideally, I would like a universal strategy that does not reference individual column names.
I only want to see the duplicate entries, since I am working with very large tables.

The straightforward way would be to run this query:
SELECT
*,
count() AS cnt
FROM myDB.myTable
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC
If that query gets too big, you can run it in pieces, for example one date range at a time:
SELECT
*,
count() AS cnt
FROM myDB.myTable
WHERE (date >= '2020-08-01') AND (date < '2020-09-01')
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC
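
For what it's worth, the same GROUP BY / HAVING idea can be tried out on a toy table. The sketch below is not ClickHouse: it uses Python's built-in sqlite3 module as a stand-in, and the table `events`, its columns, and the helper `find_duplicate_rows` are all made up for illustration. SQLite has no GROUP BY *, so the helper reads the column names from the table metadata, which still keeps individual column names out of the query you write by hand.

```python
import sqlite3


def find_duplicate_rows(conn, table):
    """Return (col1, ..., colN, cnt) tuples for rows that occur more than once.

    The table name is interpolated into the SQL text, so only pass trusted names.
    """
    # Read the column names from the table metadata instead of hard-coding them.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    column_list = ", ".join(f'"{name}"' for name in columns)
    query = (
        f'SELECT {column_list}, count(*) AS cnt FROM "{table}" '
        f"GROUP BY {column_list} HAVING count(*) > 1 ORDER BY 1"
    )
    return conn.execute(query).fetchall()


# Hypothetical toy data: one exact duplicate and one unique row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2020-08-01", 1, "login"),
        ("2020-08-01", 1, "login"),  # exact duplicate of the row above
        ("2020-08-02", 2, "logout"),
    ],
)
duplicates = find_duplicate_rows(conn, "events")
print(duplicates)
```

Only the duplicated row comes back, together with how many times it occurs, so the result stays small even when the table is large.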

Related

SQL Adding quantity of another column [closed]

Closed 11 months ago.
I need to compute a running sum of one column (B) in another column (C), adding each next value and restarting whenever the value of a third column (A) changes, something like this:
Is this possible with a simple SELECT?
In SQL Server, the window function sum() over() should do the trick.
Note the ORDER BY ColB: it is just a placeholder. I suspect you have another column that holds the proper sequence.
Example
Select ColA
,ColB
,ColC = sum(ColB) over (partition by ColA order by ColB rows unbounded preceding )
From YourTable
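
To see what the window function produces, here is a small runnable sketch. It is an assumption-laden stand-in: it uses Python's built-in sqlite3 module rather than SQL Server (so the T-SQL `ColC = ...` alias form becomes `... AS ColC`), and the sample rows are invented. It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ColA TEXT, ColB INTEGER)")
# Invented sample rows: two groups in ColA.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [("x", 1), ("x", 2), ("x", 3), ("y", 10), ("y", 20)],
)

# Running total of ColB that restarts for every distinct ColA value.
running_totals = conn.execute(
    """
    SELECT ColA,
           ColB,
           sum(ColB) OVER (PARTITION BY ColA
                           ORDER BY ColB
                           ROWS UNBOUNDED PRECEDING) AS ColC
    FROM YourTable
    ORDER BY ColA, ColB
    """
).fetchall()

for row in running_totals:
    print(row)
```

The total accumulates within each ColA group and starts over when ColA changes.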

SQL Case Statement or Sub-Query [closed]

Closed 1 year ago.
In our application we have a query that brings back records that are open, closed, and archived, each with a date associated with it. These records come from a table that is joined to a main table. That table can hold one to three records for the same ID of the main table, depending on whether the record has been opened, closed, and/or archived; those are the three stages.
What we're looking to do is this: when EStatusID = 1 (which means open), we need DateClosed to read as blank, because the record is not closed or archived yet.
SELECT
E.EID,
EStatus.EStatusID,
FORMAT (EStatus.DateCreated, 'MM/dd/yyyy') as DateClosed
I won't bore you with the rest of the query because it's long and not useful to the question. So we need some kind of Case statement or sub query or something in the Select to accomplish this task.
You can use a CASE expression. With no ELSE branch, it returns NULL when EStatusID = 1:
CASE WHEN EStatus.EStatusID <> 1 THEN FORMAT (EStatus.DateCreated, 'MM/dd/yyyy') END
AS DateClosed,
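
A quick way to check the behaviour is the sketch below. It runs against SQLite through Python's built-in sqlite3 module, not SQL Server, so `strftime` stands in for `FORMAT`; the sample rows are invented and the single-table layout is a simplification of the joined query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EStatus (EID INTEGER, EStatusID INTEGER, DateCreated TEXT)"
)
# Invented sample rows: record 1 was opened (status 1) and later closed (status 2).
conn.executemany(
    "INSERT INTO EStatus VALUES (?, ?, ?)",
    [(1, 1, "2021-03-05"), (1, 2, "2021-04-10")],
)

# No ELSE branch: rows with EStatusID = 1 get NULL (None in Python).
status_rows = conn.execute(
    """
    SELECT EID,
           EStatusID,
           CASE WHEN EStatusID <> 1
                THEN strftime('%m/%d/%Y', DateCreated)
           END AS DateClosed
    FROM EStatus
    ORDER BY EID, EStatusID
    """
).fetchall()
print(status_rows)
```

If an empty string is wanted instead of NULL, adding `ELSE ''` before `END` should do it.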

Remove rows with duplicate value for a column in SQL [closed]

Closed 4 years ago.
From the above table, I want to remove the rows that have a duplicate value in the CODE column. Out of each set of duplicate rows, the row with the smallest ID_CHECK should be retained.
Output:
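If I understand correctly, this should be your SQL query. It partitions by Code alone, so exactly one row (the one with the smallest ID_CHECK) survives per code:
SELECT Names, Code, ID_CHECK
FROM
(
SELECT Names, Code, ID_CHECK,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY ID_CHECK) AS rnk
FROM YourTable -- placeholder name; "table" itself is a reserved word
) tmp
WHERE tmp.rnk = 1;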
Let me know if this works.
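
For anyone who wants to try the ROW_NUMBER() approach without a server at hand, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). The table name `Items` and the sample rows are invented, and the query partitions by Code only, matching the requirement that one row per code is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Names TEXT, Code TEXT, ID_CHECK INTEGER)")
# Invented sample rows: code C1 appears twice, code C2 once.
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [("a", "C1", 3), ("b", "C1", 1), ("c", "C2", 2)],
)

# Number the rows inside each Code group by ascending ID_CHECK
# and keep only the first row of every group.
kept_rows = conn.execute(
    """
    SELECT Names, Code, ID_CHECK
    FROM (
        SELECT Names, Code, ID_CHECK,
               ROW_NUMBER() OVER (PARTITION BY Code ORDER BY ID_CHECK) AS rnk
        FROM Items
    ) AS tmp
    WHERE tmp.rnk = 1
    ORDER BY Code
    """
).fetchall()
print(kept_rows)
```

This only selects the surviving rows; it does not delete anything from the table.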

select top N records after first M records [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am developing an application, and in my scenario I have a table, BookTable, which does not have a primary key (at least not yet). I want to be able to retrieve records in pages: first the top 10, then 11-20, and then 21-30. I hope I am making my point clear. I have searched Google and so far have been unsuccessful in finding a solution. I hope I will get help here. Thanks.
P.S. I am working with MS SQL Server 2012.
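SELECT TOP (@numRegPerPage) * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY myOrderField ASC) AS NUM_REG FROM BookTable
) AS numbered -- the alias NUM_REG can only be filtered from an outer query
WHERE NUM_REG > @lastNumReg ORDER BY NUM_REG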
A bit of googling leads me to believe that this will work in MSSQL 2012.
SELECT a, b FROM Table
ORDER BY a
OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY
The documentation says that it can only be used if you have an ORDER BY clause.
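
The paging arithmetic itself is easy to check on a toy table. The sketch below is a stand-in, not SQL Server: it uses Python's built-in sqlite3 module, where the equivalent of OFFSET ... FETCH NEXT is written LIMIT ... OFFSET. The `title` column, the sample data, and the helper `fetch_page` are assumptions made for illustration.

```python
import sqlite3


def fetch_page(conn, page_number, page_size):
    """Return the titles on one page; page_number starts at 1."""
    # Rows to skip: every full page that comes before the requested one.
    offset = (page_number - 1) * page_size
    rows = conn.execute(
        "SELECT title FROM BookTable ORDER BY title LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BookTable (title TEXT)")
# Invented sample data: 25 books named "Book 01" to "Book 25".
conn.executemany(
    "INSERT INTO BookTable VALUES (?)",
    [(f"Book {n:02d}",) for n in range(1, 26)],
)

second_page = fetch_page(conn, 2, 10)
print(second_page)
```

As with OFFSET/FETCH, the pages are only stable if the ORDER BY column gives a deterministic order, which matters for a table without a primary key.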

Replace one column by values of the same column in another table in SQL Server [closed]

Closed 8 years ago.
I have two tables, one big and one small. Both contain columns ID and EffectiveDate.
The bigger table has additional columns and, of course, more rows than the smaller table.
For rows where the ID is the same in both tables, the EffectiveDate in the small table is earlier than in the big table. I want to replace the EffectiveDate in the big table with the value of the EffectiveDate column from the small table.
What should I do?
Seems like a very basic SQL query....
UPDATE bt
SET EffectiveDate = st.EffectiveDate
FROM dbo.BiggerTable bt
INNER JOIN dbo.SmallerTable st ON bt.ID = st.ID
-- maybe you also need this condition, if not *ALL* EffectiveDate values in the
-- smaller table are indeed before the values in the bigger table
WHERE st.EffectiveDate < bt.EffectiveDate
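
Here is a small sketch for checking the effect of such an update on toy data. It runs on SQLite through Python's built-in sqlite3 module, and it deliberately swaps the T-SQL UPDATE ... FROM ... JOIN form for a correlated subquery, which older SQLite builds also accept. The sample rows are invented, dates are stored as ISO text so that `<` compares them chronologically, and ID is assumed to be unique in SmallerTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BiggerTable (ID INTEGER, EffectiveDate TEXT, Other TEXT)")
conn.execute("CREATE TABLE SmallerTable (ID INTEGER, EffectiveDate TEXT)")
# Invented sample rows.
conn.executemany(
    "INSERT INTO BiggerTable VALUES (?, ?, ?)",
    [(1, "2020-05-01", "a"), (2, "2020-01-01", "b"), (3, "2020-03-01", "c")],
)
conn.executemany(
    "INSERT INTO SmallerTable VALUES (?, ?)",
    [(1, "2020-02-01"), (2, "2020-06-01")],
)

# Take the date from SmallerTable only where a matching ID exists
# and the smaller table's date is earlier than the current one.
conn.execute(
    """
    UPDATE BiggerTable
    SET EffectiveDate = (
        SELECT st.EffectiveDate
        FROM SmallerTable AS st
        WHERE st.ID = BiggerTable.ID
    )
    WHERE EXISTS (
        SELECT 1
        FROM SmallerTable AS st
        WHERE st.ID = BiggerTable.ID
          AND st.EffectiveDate < BiggerTable.EffectiveDate
    )
    """
)

updated_rows = conn.execute(
    "SELECT ID, EffectiveDate FROM BiggerTable ORDER BY ID"
).fetchall()
print(updated_rows)
```

Only ID 1 changes here: ID 2 has a later date in the smaller table, and ID 3 has no match at all.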