Removing exact duplicate rows from presto - sql

With the following table (assuming it has many other rows and columns), how could I query it while removing duplicates?
order_id  customer_name  amount  bill_type
1         Chris          10      sale
1         Chris          1       tip
1         Chris          10      sale
Note that while all 3 rows are about the same order, only row 3 is a duplicate -- since row 2 tells us about the tips of that order.
Using distinct order_id would remove rows 2 and 3, while I am looking to only remove row 3.
Appreciate any ideas

If you want a new result set, you can use:
select distinct t.*
from t;
I would suggest saving this into a new table, if you need to materialize the result.
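As a quick check of the query above, here is a minimal sketch that runs the same SELECT DISTINCT against the three sample rows. It uses Python's built-in sqlite3 module as a stand-in for Presto (an assumption made only so the example is runnable), and the table name t follows the answer.

```python
import sqlite3

# In-memory SQLite database standing in for Presto (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (order_id INTEGER, customer_name TEXT, amount INTEGER, bill_type TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "Chris", 10, "sale"),
        (1, "Chris", 1, "tip"),
        (1, "Chris", 10, "sale"),  # exact duplicate of the first row
    ],
)

# DISTINCT compares whole rows, so only the exact duplicate is dropped;
# the tip row differs in amount and bill_type and is kept.
deduped = conn.execute(
    "SELECT DISTINCT t.* FROM t ORDER BY amount DESC"
).fetchall()
print(deduped)
```

The ORDER BY is only there to make the printed order predictable; it is not needed for the deduplication itself.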

Related

How to create a new table that only keeps rows with more than 5 data records under the same id in Bigquery

I have a table like this:
Id  Date        Steps  Distance
1   2016-06-01  1000   1
There are over 1000 records and 50 Ids in this table. Most ids have about 20 records, but some have only 1 or 2, which I think are useless.
I want to create a table that excludes the ids with fewer than 5 records.
I wrote this code to find the ids that I want to exclude:
SELECT
Id,
COUNT(Id) AS num_id
FROM `table`
GROUP BY
Id
ORDER BY
num_id
Since there are only two ids I need to exclude, I used a WHERE clause:
CREATE TABLE `` AS
SELECT
*
FROM ``
WHERE
Id <> 2320127002
AND Id <> 7007744171
Although I can get the result I want, I think there are better ways to solve this kind of problem. For example, if there are over 20 ids with fewer than 5 records in this table, what should I do? Thank you.
Consider this:
CREATE TABLE `filtered_table` AS
SELECT *
FROM `table`
WHERE TRUE QUALIFY COUNT(*) OVER (PARTITION BY Id) >= 5
Note: You can remove WHERE TRUE if it runs successfully without it.
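QUALIFY is not available in every SQL dialect. As a hedged sketch of the same filter written portably, the example below uses GROUP BY ... HAVING in a subquery instead of the window function. It runs on Python's built-in sqlite3 module rather than BigQuery, and the table name activity and its rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for BigQuery here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (Id INTEGER, Date TEXT, Steps INTEGER, Distance REAL)"
)

# Hypothetical data: Id 1 has 6 records, Id 2 has 2, Id 3 has exactly 5.
rows = []
for id_, n in [(1, 6), (2, 2), (3, 5)]:
    for day in range(1, n + 1):
        rows.append((id_, f"2016-06-{day:02d}", 1000 * day, 1.0 * day))
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)

# Keep only the rows whose Id appears at least 5 times in the table.
conn.execute(
    """
    CREATE TABLE filtered_table AS
    SELECT *
    FROM activity
    WHERE Id IN (SELECT Id FROM activity GROUP BY Id HAVING COUNT(*) >= 5)
    """
)

kept = conn.execute(
    "SELECT Id, COUNT(*) FROM filtered_table GROUP BY Id ORDER BY Id"
).fetchall()
print(kept)
```

Id 2 is dropped because it has only two records, while Id 3 is kept because the threshold is inclusive.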

Combine numbers to 1 row

I have 2 rows, and I need to sum 1 column across them to make 1 row. Is this possible?
I really just need these 2 rows...
To combine into this row. Only difference is the Pay_Amount fields are summed.
I'm at the point where I have the rows isolated using partition by but am not sure where to go from here. Thanks!
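You can use aggregate functions: group by the columns that identify the row, and aggregate the rest.
select Location, client_no, MAX(Price) as Price, MAX(Tax_1) as Tax_1, MAX(Tax_2) as Tax_2, SUM(Pay_Amount) as Pay_Amount
from table
group by Location, client_no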
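To see the effect of the aggregate query on two rows that differ only in Pay_Amount, here is a small sketch using Python's built-in sqlite3 module. The table name payments and the sample values are hypothetical, since the original rows were not included in the question text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payments (
        Location TEXT, client_no INTEGER,
        Price REAL, Tax_1 REAL, Tax_2 REAL, Pay_Amount REAL
    )
    """
)
# Two hypothetical rows that are identical except for Pay_Amount.
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Store A", 100, 50.0, 2.5, 1.5, 20.0),
        ("Store A", 100, 50.0, 2.5, 1.5, 30.0),
    ],
)

# MAX() picks the shared value for the repeated columns; SUM() adds the payments.
combined = conn.execute(
    """
    SELECT Location, client_no, MAX(Price), MAX(Tax_1), MAX(Tax_2), SUM(Pay_Amount)
    FROM payments
    GROUP BY Location, client_no
    """
).fetchall()
print(combined)
```

MAX is safe here only because Price, Tax_1 and Tax_2 are the same in both rows; if they could differ, MAX would silently keep the larger value.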

Sql to get unique rows

I have a table as below.
OId CustId CustSeq
1 A 10
1 A 20
2 A 10
2 A 20
I'm trying to extract unique records as below.
OId CustId CustSeq (Different OIds with different CustSeqs)
1 A 10
2 A 20
May I know how I could come out the query to extract like above?
Just use DISTINCT. That's what it was designed for, although GROUP BY will also work.
http://www.techonthenet.com/oracle/distinct.php
SELECT DISTINCT OID, CUSTID, CUSTSEQ
FROM TABLE_NAME
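It may help to check what DISTINCT returns for the sample rows in the question. The sketch below loads them into an in-memory SQLite database through Python's sqlite3 module (SQLite and the table name orders are assumptions for illustration). Because no two sample rows match in all three columns, DISTINCT over all three leaves every row in place, while DISTINCT over CustId and CustSeq alone collapses them to two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OId INTEGER, CustId TEXT, CustSeq INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "A", 20), (2, "A", 10), (2, "A", 20)],
)

# DISTINCT over all three columns: every sample row is already unique.
all_cols = conn.execute(
    "SELECT DISTINCT OId, CustId, CustSeq FROM orders ORDER BY OId, CustSeq"
).fetchall()

# DISTINCT over two columns: rows that differ only in OId collapse together.
two_cols = conn.execute(
    "SELECT DISTINCT CustId, CustSeq FROM orders ORDER BY CustSeq"
).fetchall()

print(all_cols)
print(two_cols)
```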
Use DISTINCT, and also use GROUP BY on the 2 columns CustId and CustSeq.
For an example, see "Is it possible to GROUP BY multiple columns using MySQL?"

Query to count the number record contain text

I am trying to count the number of records based on the text in the table.
I have a table structure like this:
SN_ID NUMBER
PERSON_ID NUMBER
NOTICE_TYPE VARCHAR2
and the contents of the table like this
 SN_ID | PERSON_ID | NOTICE_TYPE
-------+-----------+--------------
     1 |         5 | Appreciation
     2 |         5 | Warning
     3 |         1 | Warning
     4 |         5 | Incident
     5 |         2 | Warning
     6 |         5 | Warning
I want to count the number of Appreciation, Warning and Incident records for the person with an Id = 5.
select Notice_type, count(*) from [Table]
where person_id=5
group by notice_type
SELECT NOTICE_TYPE, count(SN_ID)
FROM [Table]
WHERE PERSON_ID = 5
GROUP BY NOTICE_TYPE
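This is slightly different from MikkaRin's answer.
The difference is count(SN_ID): I count a single column instead of using count(*). Note that count(column) skips rows where that column is NULL, while count(*) counts every row, so the two agree only when the column is never NULL. Most database engines optimize count(*) well, so any speed difference on large queries is likely small and worth measuring.
P.S. If you do count a column, the primary key is the safe choice because it is never NULL. Here SN_ID looks like the PK.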
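Both queries above can be tried against the sample data with a short sketch. It uses Python's built-in sqlite3 module instead of the original database, and the table name notices is a placeholder for [Table]; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notices (SN_ID INTEGER PRIMARY KEY, PERSON_ID INTEGER, NOTICE_TYPE TEXT)"
)
conn.executemany(
    "INSERT INTO notices VALUES (?, ?, ?)",
    [
        (1, 5, "Appreciation"),
        (2, 5, "Warning"),
        (3, 1, "Warning"),
        (4, 5, "Incident"),
        (5, 2, "Warning"),
        (6, 5, "Warning"),
    ],
)

# One row per notice type for person 5, counted with COUNT(*).
counts_star = conn.execute(
    """
    SELECT NOTICE_TYPE, COUNT(*)
    FROM notices
    WHERE PERSON_ID = 5
    GROUP BY NOTICE_TYPE
    ORDER BY NOTICE_TYPE
    """
).fetchall()

# Same query with COUNT(SN_ID); identical here because SN_ID is never NULL.
counts_pk = conn.execute(
    """
    SELECT NOTICE_TYPE, COUNT(SN_ID)
    FROM notices
    WHERE PERSON_ID = 5
    GROUP BY NOTICE_TYPE
    ORDER BY NOTICE_TYPE
    """
).fetchall()

print(counts_star)
print(counts_pk)
```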

I DISTINCTly hate MySQL (help building a query)

This is straightforward, I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT 'location' FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but the whole row for each match.
My best guess is something like SELECT * from (SELECT DISTINCT 'location' FROM myTable) or something like that, but it says I have a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?
SELECT * FROM myTable GROUP BY `location`
or if you want to move to another table
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`
DISTINCT applies to the entire row returned. So you can simply use
SELECT DISTINCT * FROM myTable GROUP BY `location`
Using Distinct on a single column doesn't make a lot of sense. Let's say I have the following simple set
-id- -location-
1 store
2 store
3 home
If there were some sort of query that returned all columns but was distinct only on location, which row would be returned? 1 or 2? Should it just pick one at random? Because of this, DISTINCT applies to all columns in the returned result set.
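The point about DISTINCT covering every selected column can be seen with the three-row set above. This is a minimal sketch on Python's built-in sqlite3 module rather than MySQL, with a hypothetical table name places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER, location TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [(1, "store"), (2, "store"), (3, "home")],
)

# Only location is selected, so the two 'store' rows collapse into one.
by_location = conn.execute(
    "SELECT DISTINCT location FROM places ORDER BY location"
).fetchall()

# id is selected too, so every (id, location) pair is already distinct.
by_row = conn.execute(
    "SELECT DISTINCT id, location FROM places ORDER BY id"
).fetchall()

print(by_location)
print(by_row)
```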
Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row. There is no meaning to moving the original rows from the original table into a new table, since those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: just SELECT DISTINCT * FROM YourTable.)
If you don't care which values come from the other columns you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion.
Multiple rows with identical values in all columns make no sense. OK, the question might be about a way to correct exactly that situation.
Considering this table, with id being the PK:
kram=# select * from foba;
 id | no | name
----+----+---------------
  2 |  1 | a
  3 |  1 | b
  4 |  2 | c
  5 |  2 | a,b,c,d,e,f,g
you may extract a sample for every single no (:=location) by grouping over that column, and selecting the row with minimum PK (for example):
SELECT * FROM foba WHERE id IN (SELECT min (id) FROM foba GROUP BY no);
 id | no | name
----+----+------
  2 |  1 | a
  4 |  2 | c
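The same "lowest id per group" selection can be reproduced with a short sketch. It uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption made so the example is self-contained), with the foba table and rows from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "no" is quoted because it is an SQL keyword in some dialects.
conn.execute('CREATE TABLE foba (id INTEGER PRIMARY KEY, "no" INTEGER, name TEXT)')
conn.executemany(
    "INSERT INTO foba VALUES (?, ?, ?)",
    [(2, 1, "a"), (3, 1, "b"), (4, 2, "c"), (5, 2, "a,b,c,d,e,f,g")],
)

# The subquery finds the smallest id in each "no" group;
# the outer query returns the full row for each of those ids.
representatives = conn.execute(
    """
    SELECT *
    FROM foba
    WHERE id IN (SELECT MIN(id) FROM foba GROUP BY "no")
    ORDER BY id
    """
).fetchall()
print(representatives)
```

Unlike a bare GROUP BY with SELECT *, this makes the choice of row explicit: the row with the smallest primary key in each group.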