Remove Duplicate values in Snowflake - sql

I have the below table and I need to remove the duplicate values and leave the values only for the last session, example if an anonymous_id has 1,2,3,4 sessions_group I just need session_group 4.
channel_to_order table
select ANONYMOUS_ID, order_number,session_group_b2, CHAN_ATTRIBUTION, max (session_group) as last_session
from channel_to_order
where session_group = session_group_b2
group by order_number,ANONYMOUS_ID, session_group_b2, CHAN_ATTRIBUTION;
The above query is giving me the last session however I'm still having some duplicate values, not sure how to solve this I have also tried
select * from(
select cto.*, row_number() over (partition by order_number order by ANONYMOUS_ID ) as rn
from channel_to_order cto)
where rn = 1
;
In this case I do not have duplicates however the results are not showing the last session_group also I have been told to not use partition by

You can try the following query to remove duplicates and get the last session_group:
select ANONYMOUS_ID, order_number, CHAN_ATTRIBUTION, max(session_group) as last_session
from channel_to_order
where session_group = session_group_b2
group by ANONYMOUS_ID, order_number, CHAN_ATTRIBUTION
This query uses the group by clause to group the records by ANONYMOUS_ID, order_number, and CHAN_ATTRIBUTION. The max function is used to get the maximum value of session_group for each group. Since you only want to keep the last session, this is equivalent to getting the last session_group.

Related

How to use Array_agg without returning the same values in different Order?

When using Array_agg, it returns the same values in different orders. I tried using distinct in a few places and it didn't work. I tried using an order before and after the array and it would fail or not properly exclude results.
I am trying to find all fields in the field column that share the same time and same ID and put them into an array.
Columns are Fieldname, ID, Time
select b.Field, count(*)
from (select Time, ID, array_agg(fieldname) as Field
from a
group by 1,2
order by 3) b
group by b.field
order by 1 desc
This produces duplicate results
For example I will have:
Field Name Count
Ghost,Mark 1234
Mark,Ghost 1234
I also tried this below where I add a subquery where I first order the fields alphabetically when grouping time and ID but it failed to execute. I think due to array_agg not being the root query?
select a.Field, count(*)
from
(select Time, ID, array_agg(fieldname) as field
from
(select Time, ID, fieldname
from a
group by 1,2
order by 3 desc) a
group by 1,2 ) b
group by 1
order by 2 desc

Getting MAX of a column and adding one more

I'm trying to make an SQL query that returns the greatest number from a column and its respective id.
For more information I have two columns ID and NUMBER. Both of them have 2 entries and I want to get the highest number with the ID next to it. This is what I tried but didn't success.
SELECT ID, MAX(NUMBER) AS MAXNUMB
FROM TABLE1
GROUP BY ID, MAXNUMB;
The problem I'm experiencing is that it just shows ALL the entries and if I add a "where" expression it just shows the same (all entries [ids+numbers]).
Pd.: Yes, I got what I wanted but only with one column (number) if I add another column (ID) to select it "brokes".
Try:
SELECT
ID,
A_NUMBER
FROM TABLE1
WHERE A_NUMBER = (
SELECT MAX(A_NUMBER)
FROM TABLE1);
Presuming you want the IDs* of the row with the highest number (and not, instead, the highest number for each ID -- if IDs were not unique in your table, for example).
* there may be more than one ID returned if there are two or more IDs with equal maximum numbers
you can try this
Select ID,maxNumber
From
(
SELECT
ID,
(Select Max(NUMBER) from Tmp where Id = t.Id) maxNumber
FROM
Tmp t
)T1
Group By ID,maxNumber
The query you posted has an illegal column name (number) and is group by the alias for the max value, which is illegal and also doesn't make sense; and you can't include the unaliased max() within the group-by either. So it's likely you're actually doing something like:
select id, max(numb) as maxnumb
from table1
group by id;
which will give one row per ID, with the maximum numb (which is the new name I've made up for your numeric column) for each ID. Or as you said you get "ALL the entries" you might have group by id, numb, which would show all rows from the table (unless there are duplicate combinations).
To get the maximum numb and the corresponding id you could group by id only, order by descending maxnumb, and then return the first row only:
select id, max(numb) as maxnumb
from table1
group by id
order by maxnumb desc
fetch first 1 row only
If there are two ID with the same maxnumb then you would only get one of them - and which one is indeterminate unless you modify the order by - but in that case you might prefer to use first 1 row with ties to see them all.
You could achieve the same thing with a subquery and analytic function to generating a ranking, and have the outer query return the highest-ranking row(s):
select id, numb as maxnumb
from (
select id, numb, dense_rank() over (order by numb desc) as rnk
from table1
)
where rnk = 1
You could also use keep to get the same result as first 1 row only:
select max(id) keep (dense_rank last order by numb) as id, max(numb) as maxnumb
from table1
fiddle

how to get latest date column records when result should be filtered with unique column name in sql?

I have table as below:
I want write a sql query to get output as below:
the query should select all the records from the table but, when multiple records have same Id column value then it should take only one record having latest Date.
E.g., Here Rudolf id 1211 is present three times in input---in output only one Rudolf record having date 06-12-2010 is selected. same thing with James.
I tried to write a query but it was not succssful. So, please help me to form a query string in sql.
Thanks in advance
You can partition your data over Date Desc and get the first row of each partition
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1
you can use WITH TIES
select top 1 PERCENT WITH TIES * from t
order by (row_number() over(partition by id order by date desc))
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3
As i can see from your example, duplicate rows differ only in Date. If it's a case, then simple GROUP BY with MAX aggregate function will do the job for you.
SELECT Id, Name, Place, MAX(Date)
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is working example: http://sqlfiddle.com/#!18/7025e/2

Select last duplicate row with different id Oracle 11g

I have a table that look like this:
The problem is I need to get the last record with duplicates in the column "NRODENUNCIA".
You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1
There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIID in the OVER clause. This defines the "Last Record" as the one that has the largest DENUNCIID. If you want to define it differently you'd need to change the field that is being ordered.
with dupes as (
SELECT
ROW_NUMBER() OVER (Partition by NRODENUNCIA ORDER BY DENUNCIID DESC) RN,
*
FROM
YourTable
)
SELECT * FROM dupes where rn = 1
This only get's the last record per dupe.
If you want to only include records that have dupes then you change the where clause to
WHERE rn =1
and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)

SQL Query with MIN function is Returning Multiple Rows

I'm trying to select the estimated hours of a row with the lowest date from a table.
SELECT prev_est_hrs
FROM ( SELECT MIN(change_date), prev_est_hrs
FROM task_history
WHERE task_id = 5
GROUP BY prev_est_hrs
);
However this is returning two rows, why? I thought MIN was supposed to return the lowest only?
Help much appreciated.
You have a GROUP BY clause. The MIN will return the minimum value in each group.
Also, you are only returning the group by value from the outer SELECT.
Mitch is right. One way to get the prev_est_hrs for the record with the earliest change_date, which seems to be what you're trying to find, is with an analytic function:
SELECT prev_est_hrs
FROM (
SELECT prev_est_hrs, ROW_NUMBER() OVER (ORDER BY change_date) AS rn
FROM task_history
WHERE task_id = 5
)
WHERE rn = 1;
You need to consider what should happen if you have two rows with the same date. This would pick one of them at random. If there is some other criteria you could use to break the tie you could add that to the order by clause. If you wanted all matching rows in that case you could use rank() instead; look at dense_rank() as well, they all have their place.
Use this query to do that:
SELECT max(prev_est_hrs) keep (dense_rank first order by change_date)
FROM task_history
WHERE task_id = 5;