SQL Query Pattern Selecting

Table part
I need to select all rows where the first 7 characters of the Assoc.Ref column are the same on a specific day.
Result example

You need aggregation:
SELECT t.col
FROM table t
GROUP BY t.col
HAVING COUNT(*) > 1;
If you want exactly two rows for each value, use COUNT(*) = 2 instead.
If you want all rows, you can use a window function:
SELECT t.*
FROM (SELECT t.*,
COUNT(*) OVER(PARTITION BY col) AS cnt
FROM table t
) t
WHERE t.cnt > 1;
EDIT: After the update to the question, you might need LEFT():
SELECT t.*
FROM (SELECT t.*,
COUNT(*) OVER(PARTITION BY CAST(Date_created AS date), LEFT(associated_ref, 7)) AS cnt
FROM table t
) t
WHERE t.cnt > 1 AND CAST(t.Date_created AS date) = '2019-02-08';
If Date_created has no time component, then no conversion is needed. Just use Date_created instead.
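The prefix-and-day query can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the answer targets SQL Server), so LEFT(x, 7) becomes substr(x, 1, 7) and CAST(... AS date) becomes date(...). The table name refs and the sample rows are hypothetical, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only for illustration.
conn.execute(
    "CREATE TABLE refs (id INTEGER PRIMARY KEY, associated_ref TEXT, date_created TEXT)"
)
conn.executemany(
    "INSERT INTO refs (associated_ref, date_created) VALUES (?, ?)",
    [
        ("ABC1234-01", "2019-02-08 09:15:00"),
        ("ABC1234-02", "2019-02-08 17:40:00"),  # same prefix, same day
        ("ABC1234-03", "2019-02-09 08:00:00"),  # same prefix, different day
        ("XYZ9999-01", "2019-02-08 10:00:00"),  # prefix appears once that day
    ],
)

# Count rows per (day, 7-character prefix) and keep groups with more than one row.
query = """
SELECT id, associated_ref
FROM (SELECT r.*,
             COUNT(*) OVER (PARTITION BY date(date_created),
                                         substr(associated_ref, 1, 7)) AS cnt
      FROM refs r) AS t
WHERE t.cnt > 1 AND date(t.date_created) = ?
ORDER BY id
"""
duplicates = conn.execute(query, ("2019-02-08",)).fetchall()
print(duplicates)
```

Only the two rows that share a prefix on the requested day come back; the row with the same prefix on the following day is counted in a separate partition.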

Related

Returning the full record of each duplicated row by selecting the table and joining it to the duplicates?

The first query works. Query A is based on a post from StackOverflow (Using GROUP BY and HAVING COUNT(*) >1 to select duplicate and non-duplicate field).
But is it possible to return the full record of each duplicated row by selecting the table and joining it to the duplicates? That's what I'm attempting in Query B. I'm trying to do so on two fields. Is it possible to accomplish this with the HAVING clause constructed this way? I'm a n00b. Any advice or education would be appreciated.
Query A) Based on an example from StackOverflow:
SELECT InstanceID, InstanceSequenceNumber
FROM [dbo].[ANBasics]
WHERE InstanceID IN
(SELECT InstanceID FROM [dbo].[ANBasics]
GROUP BY InstanceID
HAVING (COUNT(*) > 1))
ORDER BY InstanceID
Query B) What I'm trying to accomplish:
SELECT A.*, COUNT(*) AS B
FROM [dbo].[ANBasics] AS A
JOIN(
SELECT [InstanceID], [InstanceSequenceNumber], COUNT(*)
FROM [dbo].[ANBasics]
GROUP BY [InstanceID], [InstanceSequenceNumber]
HAVING (B > 1) )
ON A.[InstanceID] = B.[InstanceID]
AND A.[InstanceSequenceNumber] = B.[InstanceSequenceNumber]
ORDER BY A.[InstanceID]

If I understand correctly, window functions are the simplest solution:
SELECT ab.*
FROM (SELECT ab.*,
COUNT(*) OVER (PARTITION BY InstanceID, InstanceSequenceNumber) as cnt
FROM [dbo].[ANBasics] ab
) ab
WHERE cnt > 1;
If you want duplicates based on InstanceID alone:
SELECT ab.*
FROM (SELECT ab.*,
COUNT(*) OVER (PARTITION BY InstanceID) as cnt
FROM [dbo].[ANBasics] ab
) ab
WHERE cnt > 1;
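The join-based form attempted in Query B is also possible, provided the derived table gets an alias and the HAVING clause uses COUNT(*) directly instead of a column alias. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the Note column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: only the pair (1, 1) occurs more than once.
conn.execute(
    "CREATE TABLE ANBasics (InstanceID INTEGER, InstanceSequenceNumber INTEGER, Note TEXT)"
)
conn.executemany(
    "INSERT INTO ANBasics VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "b"), (1, 2, "c"), (2, 1, "d")],
)

# The derived table B lists duplicated key pairs; joining back returns full rows.
query = """
SELECT A.InstanceID, A.InstanceSequenceNumber, A.Note, B.cnt
FROM ANBasics AS A
JOIN (SELECT InstanceID, InstanceSequenceNumber, COUNT(*) AS cnt
      FROM ANBasics
      GROUP BY InstanceID, InstanceSequenceNumber
      HAVING COUNT(*) > 1) AS B
  ON A.InstanceID = B.InstanceID
 AND A.InstanceSequenceNumber = B.InstanceSequenceNumber
ORDER BY A.InstanceID, A.Note
"""
rows = conn.execute(query).fetchall()
print(rows)
```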

SQL - delete record where sum = 0

I have a table with the values below:
If the sum of the values for the same ID is 0, I want to delete those rows from the table. So the result should look like this:
The code I have:
DELETE FROM tmp_table
WHERE ID in
(SELECT ID
FROM tmp_table WITH(NOLOCK)
GROUP BY ID
HAVING SUM(value) = 0)
It only deletes rows with ID = 2.
UPD: Including an additional example:
Rows in yellow need to be deleted

Your query is working correctly: the only group that totals zero is ID 2. The other IDs have subgroups which total zero (such as the first two rows with ID 1), but the total for all of those records is -3.
What you want is a much more complex algorithm that does "bin packing" in order to remove the subgroups which sum to zero.

You can do what you want using window functions -- by enumerating the values for each id. Taking your approach using a subquery:
with t as (
select t.*,
row_number() over (partition by id, value order by id) as seqnum
from tmp_table t
)
delete from t
where exists (select 1
from t t2
where t2.id = t.id and t2.value = - t.value and t2.seqnum = t.seqnum
);
You can also do this with a second layer of window functions:
with t as (
select t.*,
row_number() over (partition by id, value order by id) as seqnum
from tmp_table t
),
tt as (
select t.*, count(*) over (partition by id, abs(value), seqnum) as cnt
from t
)
delete from tt
where cnt = 2;
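The pairing idea behind these queries can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in (an assumption; the answer targets SQL Server). SQLite cannot delete through a CTE, so this variant collects the matched rows by rowid instead. The sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_table (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tmp_table VALUES (?, ?)",
    [
        (1, 5), (1, -5), (1, -3),  # 5 and -5 cancel, -3 stays
        (2, 4), (2, -4),           # whole group cancels
        (3, 2), (3, 2), (3, -2),   # only one of the two 2s has a partner
    ],
)

# Number equal values within each id, then delete every row that has an
# opposite-signed partner carrying the same sequence number.
conn.execute("""
    WITH numbered AS (
        SELECT rowid AS rid, id, value,
               ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY rowid) AS seqnum
        FROM tmp_table
    )
    DELETE FROM tmp_table
    WHERE rowid IN (
        SELECT n.rid
        FROM numbered n
        WHERE EXISTS (SELECT 1
                      FROM numbered n2
                      WHERE n2.id = n.id
                        AND n2.value = -n.value
                        AND n2.seqnum = n.seqnum)
    )
""")
remaining = conn.execute("SELECT id, value FROM tmp_table ORDER BY id, value").fetchall()
print(remaining)
```

Note that this only cancels pairs of equal magnitude; a subgroup such as 1, 2, -3 is left untouched.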

How to get the top N percent (e.g., 50%) of a table in BigQuery (standard SQL)?

I have tried the following approaches, none of which worked:
Using SELECT TOP 50 PERCENT: BigQuery does not have a TOP function
Using LIMIT (SELECT COUNT(*) FROM tabl)/2: this fails because BigQuery does not accept a non-integer value here.
Using SET to set the median value and then use WHERE

In BigQuery I would use window function percent_rank().
select t.* except (prnk)
from (select t.*, percent_rank() over(order by id) prnk from mytable t) t
where prnk <= 0.5
Note: any answer to your question will require that you provide a column to order your data. I assumed that this column is called id.

One method uses window functions:
select t.* except (seqnum, cnt)
from (select t.*, row_number() over (order by ?) as seqnum,
count(*) over () as cnt
from t
) t
where seqnum <= cnt / 2;
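The two window-function approaches above do not always return the same number of rows, which a small sketch makes visible. It runs on Python's built-in sqlite3 module rather than BigQuery (an assumption, for the sake of a runnable example), so SELECT * EXCEPT is replaced by an explicit column list and the division uses 2.0 to avoid SQLite's integer division. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(10,), (20,), (30,), (40,), (50,)])

# PERCENT_RANK() is (rank - 1) / (rows - 1), so the middle of 5 rows is exactly 0.5.
by_percent_rank = [r[0] for r in conn.execute("""
    SELECT id
    FROM (SELECT id, PERCENT_RANK() OVER (ORDER BY id) AS prnk FROM mytable) AS t
    WHERE prnk <= 0.5
    ORDER BY id
""")]

# ROW_NUMBER() compared against half of the total row count.
by_row_number = [r[0] for r in conn.execute("""
    SELECT id
    FROM (SELECT id,
                 ROW_NUMBER() OVER (ORDER BY id) AS seqnum,
                 COUNT(*) OVER () AS cnt
          FROM mytable) AS t
    WHERE seqnum <= cnt / 2.0
    ORDER BY id
""")]

print(by_percent_rank, by_row_number)
```

With an odd number of rows, the percent_rank() filter keeps the middle row while the row_number() filter drops it, so which one fits depends on how "50%" should round.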

Another possibility would be to limit the data with a WHERE clause instead of LIMIT. This is an example if you want to filter by an ID (it only returns half of the rows if the IDs are consecutive integers starting at 1):
SELECT * FROM table_name as t
WHERE t.id <= (SELECT COUNT(*) FROM table_name)/2;
And if you want to filter by the row number:
SELECT t.* except (rn)
FROM (
SELECT t.*, ROW_NUMBER() OVER () AS rn
FROM table_name as t
) AS t
WHERE t.rn <= (SELECT COUNT(*) FROM table_name)/2;

To scale up, you can use an approx algorithm to find the 50% point:
DECLARE mid_date TIMESTAMP DEFAULT (
SELECT APPROX_QUANTILES(creation_date, 2)[OFFSET(1)] mid_date
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers` )
;
SELECT mid_date
, COUNTIF(creation_date > mid_date) first_half
, COUNTIF(creation_date < mid_date) second_half
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers`
Looks like it works well:
Now let's get these records out:
CREATE TABLE `temp.fifty_percent`
AS
SELECT *
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers`
WHERE creation_date < (
SELECT APPROX_QUANTILES(creation_date, 2)[OFFSET(1)] mid_date
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers`
)
This method will happily scale, while solutions using OVER(ORDER BY) won't.

sql: Select count(*) - nth record from each group

I'm grouping by tenant_id. I want to select the (count() - 1000)th record (ordered by _updated time) from each GROUP BY group, for the groups where count() is greater than 1000. As follows:
select t1.tenant_id,
(select temp._updated
from trace temp
where temp.tenant_id = t1.tenant_id
order by _updated limit 1 offset
count(*) - 1000
) as timekey
from fgc.trace as t1
group by tenant_id
having count(*) > 1000;
But this is not allowed as count(*) cannot be used inside the subquery.
So I tried the following, which still doesn't work as I don't have access to t1 since this is not a join.
select t1.tenant_id,
(select temp._updated
from trace temp
where temp.tenant_id = t1.tenant_id
order by _updated limit 1 offset
(select count(*)-1000
from trace t2
group by tenant_id
having t2.tenant_id = t1.tenant_id)
) as timekey
from fgc.trace as t1
group by tenant_id
having count(*) > 1000;
So how can I get the following?
tenant_id | timekey
+-----------+----------------------------------+
n7ia6ryc | 2019-07-23 23:09:49.951406+00:00

You seem to want ROW_NUMBER(). CockroachDB supports window functions, so:
SELECT tenant_id, _updated AS timekey
FROM (
SELECT
tenant_id,
_updated,
ROW_NUMBER() OVER(PARTITION BY tenant_id ORDER BY _updated DESC) rn
FROM fgc.trace
) x WHERE rn = 1001
For each tenant_id, this returns the timestamp of the 1001st most recent record. If a given tenant has 1000 or fewer records, it will not appear in the results.
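The behaviour of the ROW_NUMBER() filter is easier to see at a small scale. The sketch below uses Python's built-in sqlite3 module in place of CockroachDB (an assumption), the _updated column from the question, and N = 3 standing in for 1001. The tenants and timestamps are hypothetical.

```python
import sqlite3

N = 3  # stands in for 1001 in the query above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trace (tenant_id TEXT, _updated TEXT)")
conn.executemany(
    "INSERT INTO trace VALUES (?, ?)",
    [
        ("a", "2019-07-01"), ("a", "2019-07-02"), ("a", "2019-07-03"), ("a", "2019-07-04"),
        ("b", "2019-07-01"), ("b", "2019-07-02"),  # fewer than N rows
    ],
)

# Rank each tenant's rows from newest to oldest and keep the Nth one.
rows = conn.execute("""
    SELECT tenant_id, _updated AS timekey
    FROM (SELECT tenant_id, _updated,
                 ROW_NUMBER() OVER (PARTITION BY tenant_id
                                    ORDER BY _updated DESC) AS rn
          FROM trace) AS x
    WHERE rn = ?
""", (N,)).fetchall()
print(rows)
```

Tenant b has only two rows, so no row reaches rn = 3 and the tenant drops out without a separate HAVING clause.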

select x.tenant_id
from (
select t.tenant_id,
row_number() over (partition by t.tenant_id order by t.timekey) as tenant_number
from fgc.trace as t
) x
where x.tenant_number > 1000
group by x.tenant_id
Just the one timestamp would look like this:
select min(x.timekey) as min_timestamp
from (
select t.tenant_id, t.timekey,
row_number() over (partition by t.tenant_id order by t.timekey) as tenant_number
from fgc.trace as t
) x
where x.tenant_number > 1000
Note that grouping does not matter here because each row can only be in one group and you are only looking at one row.

How to select top 1 and ordered by date in Oracle SQL? [duplicate]

There is a clear answer how to select top 1:
select * from table_name where rownum = 1
and how to order by date in descending order:
select * from table_name order by trans_date desc
but they do not work together (rownum is not generated according to trans_date):
... where rownum = 1 order by trans_date desc
The question is how to select top 1 also ordered by date?

... where rownum = 1 order by trans_date desc
This selects one record arbitrarily chosen (where rownum = 1) and then sorts this one record (order by trans_date desc).
As shown by Ivan, you can use a subquery where you order the records and then keep the first record with where rownum = 1 in the outer query. This, however, is extremely Oracle-specific and violates the SQL standard, where a subquery result is considered unordered (i.e. the order by clause can be ignored by the DBMS).
So better go with the standard solution. As of Oracle 12c:
select *
from table_name
order by trans_date desc
fetch first 1 row only;
In older versions:
select *
from
(
select t.*, row_number() over (order by trans_date desc) as rn
from table_name t
)
where rn = 1;
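The row_number() form is portable enough to try outside Oracle. A minimal sketch, assuming Python's built-in sqlite3 module as the engine (SQLite does not support FETCH FIRST, so only the window-function variant is shown) and hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, trans_date TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "2019-01-15"), (2, "2019-03-01"), (3, "2019-02-10")],
)

# Number the rows from newest to oldest, then keep the first one.
latest = conn.execute("""
    SELECT id, trans_date
    FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY trans_date DESC) AS rn
          FROM table_name t) AS ranked
    WHERE rn = 1
""").fetchall()
print(latest)
```

If several rows share the latest trans_date, row_number() picks one of them arbitrarily; adding a tie-breaking column to the ORDER BY makes the choice deterministic.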

Modern Oracle versions have FETCH FIRST:
select * from table_name order by trans_date desc
fetch first 1 row only

You need a subquery so that the combination of rownum and order by works (note that Oracle does not accept the AS keyword before a table alias):
select * from (select * from table_name order by trans_date desc) tb where rownum = 1

You can use window functions for that:
select t.*
from (
select tt.*,
min(trans_date) over () as min_date,
max(trans_date) over () as max_date
from the_table tt
) t
where trans_date = min_date
or trans_date = max_date;
Another option would be to join on a derived table:
select t1.*
from the_table t1
join (
select min(trans_date) as min_date,
max(trans_date) as max_date
from the_table
) t2 on t1.trans_date = t2.min_date
or t1.trans_date = t2.max_date;
I'm not sure which one would be faster; you need to check the execution plan.
