SELECT DISTINCT doesn't appear to work with BigQuery

I am writing to a second table that should have duplicates removed. However, DISTINCT does not seem to be working, and I end up with rows that have identical IDs. I want to keep only one row per unique ID and throw the remaining ones away, but this is not what is happening. In other words, I do not care about the values in the other columns.
def de_dupe_affiliates(read_table, write_table):
    query = """
    CREATE OR REPLACE TABLE `{write_table}` AS
    SELECT DISTINCT ID, BRAND, TITLE, SHORT_TITLE, PRICE, FROM `{read_table}`
    """.format(read_table=read_table, write_table=write_table)
    response = client.query(query).result()
I also tried
SELECT DISTINCT(ID), BRAND
But this did the same. Is it possible to do this with a DISTINCT on one column?

Consider the approach below, which groups by ID and keeps an arbitrary row from each group:
SELECT AS VALUE ANY_VALUE(t) FROM (
SELECT ID, BRAND, TITLE, SHORT_TITLE, PRICE FROM read_table
) t
GROUP BY ID

Your select clause
SELECT DISTINCT ID, BRAND, TITLE, SHORT_TITLE, PRICE FROM `{read_table}`
is equivalent to
SELECT ID, BRAND, TITLE, SHORT_TITLE, PRICE FROM `{read_table}` GROUP BY ID, BRAND, TITLE, SHORT_TITLE, PRICE
meaning any difference within these fields creates a new row in your result.
Your query returns one row per ID only if each ID always appears with the same BRAND, TITLE, SHORT_TITLE, and PRICE values.
If not, you can use window functions such as ROW_NUMBER() or RANK() to select one row per ID (RANK() can still return several rows per ID when the ordering has ties).
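To illustrate the ROW_NUMBER() suggestion, here is a minimal sketch that runs against an in-memory SQLite database instead of BigQuery (window functions need SQLite 3.25 or later). The sample rows and the ORDER BY PRICE tie-breaker are assumptions made up for the example, not part of the original question.

```python
import sqlite3

# Hypothetical sample data: ID 1 appears twice with different PRICE values.
rows = [
    (1, "Acme", "Widget", "W", 9.99),
    (1, "Acme", "Widget", "W", 10.99),
    (2, "Bolt", "Gadget", "G", 5.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE read_table "
    "(ID INT, BRAND TEXT, TITLE TEXT, SHORT_TITLE TEXT, PRICE REAL)"
)
conn.executemany("INSERT INTO read_table VALUES (?, ?, ?, ?, ?)", rows)

# DISTINCT over all five columns keeps both ID = 1 rows,
# because their PRICE values differ.
distinct_rows = conn.execute(
    "SELECT DISTINCT ID, BRAND, TITLE, SHORT_TITLE, PRICE FROM read_table"
).fetchall()

# ROW_NUMBER() numbers the rows within each ID; keeping rn = 1 leaves
# exactly one row per ID. Ordering by PRICE is an arbitrary choice here.
deduped = conn.execute("""
    SELECT ID, BRAND, TITLE, SHORT_TITLE, PRICE
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY PRICE) AS rn
        FROM read_table
    )
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

print(len(distinct_rows))  # DISTINCT still returns every row
print(deduped)             # one row per ID
```

The same inner query should carry over to BigQuery with the table name in backticks, but that part is untested here.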

Related

Custom table query in SQL Server similar to custom table in SPSS?

From the sample data, the query computes the average price per name and code, including the combinations for all names and all codes.
Currently I'm using UNION ALL for all the combinations, which is tedious. Is there a simpler way to write this query?
SELECT NAME ,CODE, AVG(PRICE)
FROM SAMPLE_DATA
GROUP BY
NAME ,CODE
UNION ALL
SELECT 'ALL NAMES' ,CODE, AVG(PRICE)
FROM SAMPLE_DATA
GROUP BY
CODE
UNION ALL
SELECT NAME, 'ALL CODES', AVG(PRICE)
FROM SAMPLE_DATA
GROUP BY NAME
UNION ALL
SELECT 'ALL NAMES', 'ALL CODES', AVG(PRICE)
FROM SAMPLE_DATA
You can use GROUPING SETS:
SELECT NAME, CODE, AVG(PRICE)
FROM SAMPLE_DATA
GROUP BY GROUPING SETS ( (NAME, CODE), (NAME), (CODE), () )
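Just include all the combinations you want in the list. Columns left out of a grouping set come back as NULL rather than 'ALL NAMES' or 'ALL CODES', so wrap them in COALESCE or ISNULL if you need those labels.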

How to formulate a conditional sum in PostgreSQL?

I have a table containing id, category, noofquestions and company. I want a query that returns the sum of noofquestions across all rows that share the same category. The query below only adds up rows whose category and noofquestions are both equal, which is wrong; it should not compare noofquestions.
SELECT id , category, SUM(NULLIF(noofquestions, '')::int), company
FROM tableName
WHERE id=1
GROUP BY id, category, noofquestions, company;
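You should not group by noofquestions. Grouping by it creates a separate group for each distinct value, so only equal values are summed together: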
SELECT id, category, SUM(NULLIF(noofquestions, '')::int), company
FROM tableName
WHERE id = 1
GROUP BY id, category, company;
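The difference between the two GROUP BY lists can be checked with a small sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, so `CAST(... AS INTEGER)` replaces the `::int` cast, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableName "
    "(id INT, category TEXT, noofquestions TEXT, company TEXT)"
)
# Hypothetical rows: three 'math' rows (3, 5, 5) and one 'art' row with ''.
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?, ?)",
    [
        (1, "math", "3", "acme"),
        (1, "math", "5", "acme"),
        (1, "math", "5", "acme"),
        (1, "art", "", "acme"),
    ],
)

# CAST(... AS INTEGER) stands in for PostgreSQL's ::int cast.
query = """
    SELECT id, category,
           SUM(CAST(NULLIF(noofquestions, '') AS INTEGER)),
           company
    FROM tableName
    WHERE id = 1
    GROUP BY {columns}
    ORDER BY category
"""

# Grouping by noofquestions splits 'math' into one group per distinct value.
with_noofquestions = conn.execute(
    query.format(columns="id, category, noofquestions, company")
).fetchall()

# Without it, all 'math' rows fall into a single group and are summed.
without_noofquestions = conn.execute(
    query.format(columns="id, category, company")
).fetchall()

print(with_noofquestions)
print(without_noofquestions)
```

With noofquestions in the GROUP BY, 'math' yields two rows (3 and 5 + 5); without it, 'math' yields a single row with the full sum.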

How to use distinct clause only for one column

I want to apply the DISTINCT clause to only one column. My query looks like this:
select id,brandname from brand
Here the same brandname appears multiple times. I want to select each distinct brandname along with an id.
You have to pick some way of getting only one ID, e.g.,
select max(id) , brandname
from brand
group by brandname
If you want more than one column and the data in them is the same for each brandname, you can just keep adding them to the GROUP BY. If the extra columns have varying data, you can use a slightly different strategy:
select * from brand
where id in
(
select max(id)
from brand
group by brandname
)
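Both queries above can be tried out with a short sketch against an in-memory SQLite database. The `country` column and the sample rows are assumptions added for illustration, and the second query relies on `id` being unique in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'country' is a hypothetical extra column whose values vary per brandname.
conn.execute("CREATE TABLE brand (id INT, brandname TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO brand VALUES (?, ?, ?)",
    [(1, "Acme", "US"), (2, "Acme", "UK"), (3, "Bolt", "DE")],
)

# One id per brandname: the largest one.
max_ids = conn.execute("""
    SELECT MAX(id), brandname
    FROM brand
    GROUP BY brandname
    ORDER BY brandname
""").fetchall()

# Full rows for those ids, including the column that varies.
one_per_brand = conn.execute("""
    SELECT * FROM brand
    WHERE id IN (SELECT MAX(id) FROM brand GROUP BY brandname)
    ORDER BY id
""").fetchall()

print(max_ids)        # [(2, 'Acme'), (3, 'Bolt')]
print(one_per_brand)  # [(2, 'Acme', 'UK'), (3, 'Bolt', 'DE')]
```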
You can do this:
select Id,BrandName from brand group by BrandName,Id

select max, min values from two tables

I have two tables: one is an archive and the other holds the current records. Both record the company's sales, and both have, among other fields, id, name, and price of sale. I need to select the highest and lowest price for a given name across both tables. I tried this query:
select name, max (price_of_sale), min (price_of_sale)
from wapzby
union
select name, max (price_of_sale), min (price_of_sale)
from wpzby
order by name
but this query returns two records: one from the current table and one from the archive table. I want a single row per name with the lowest and highest price taken across both tables. How do I write this query?
Here are two options (MS SQL compliant).
Note: UNION ALL combines the sets without eliminating duplicates. That's a much simpler behavior than UNION.
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM
(
SELECT Name, Price_Of_Sale
FROM wapzby
UNION ALL
SELECT Name, Price_Of_Sale
FROM wpzby
) as subQuery
GROUP BY Name
ORDER BY Name
This one figures out the max and min from each table before combining the set - it may be more performant to do it this way.
SELECT Name, MAX(MaxPrice) as MaxPrice, MIN(MinPrice) as MinPrice
FROM
(
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wapzby
GROUP BY Name
UNION ALL
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wpzby
GROUP BY Name
) as subQuery
GROUP BY Name
ORDER BY Name
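As a quick check of the first option, here is a sketch that runs the UNION ALL query against an in-memory SQLite database rather than SQL Server. The table contents are hypothetical; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("wapzby", "wpzby"):
    conn.execute(
        f"CREATE TABLE {table} (id INT, name TEXT, price_of_sale REAL)"
    )

# Hypothetical sales rows spread over the two tables.
conn.executemany(
    "INSERT INTO wapzby VALUES (?, ?, ?)",
    [(1, "bike", 100.0), (2, "bike", 80.0), (3, "car", 900.0)],
)
conn.executemany(
    "INSERT INTO wpzby VALUES (?, ?, ?)",
    [(1, "bike", 120.0), (2, "car", 950.0)],
)

# Stack the rows of both tables first, then aggregate once per name.
result = conn.execute("""
    SELECT name,
           MAX(price_of_sale) AS MaxPrice,
           MIN(price_of_sale) AS MinPrice
    FROM (
        SELECT name, price_of_sale FROM wapzby
        UNION ALL
        SELECT name, price_of_sale FROM wpzby
    ) AS subQuery
    GROUP BY name
    ORDER BY name
""").fetchall()

print(result)  # [('bike', 120.0, 80.0), ('car', 950.0, 900.0)]
```

Each name comes back as one row, with the maximum and minimum taken over both tables together.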
In SQL Server you could use a subquery:
SELECT [name],
MAX([price_of_sale]) AS [MAX price_of_sale],
MIN([price_of_sale]) AS [MIN price_of_sale]
FROM (
SELECT [name],
[price_of_sale]
FROM [dbo].[wapzby]
UNION
SELECT [name],
[price_of_sale]
FROM [dbo].[wpzby]
) u
GROUP BY [name]
ORDER BY [name]
Is this more like what you want?
SELECT
a.name,
MAX (a.price_of_sale),
MIN (a.price_of_sale) ,
b.name,
MAX (b.price_of_sale),
MIN (b.price_of_sale)
FROM
wapzby a,
wpzby b
ORDER BY
a.name
It's untested but should return all your records on one row without the need for a union
SELECT MAX(value) FROM tabl1 UNION SELECT MAX(value) FROM tabl2;
SELECT MIN(value) FROM tabl1 UNION SELECT MIN(value) FROM tabl2;
SELECT (SELECT MAX(value) FROM table1 WHERE trn_type='CSL' and till='TILL01') as summ, (SELECT MAX(value) FROM table2 WHERE trn_type='CSL' and till='TILL01') as summ_hist

Using SELECT DISTINCT in MYSQL

Been doing a lot of searching and haven't really found an answer to my MYSQL issue.
SELECT DISTINCT name, type, state, country FROM table
Results in 1,795 records
SELECT DISTINCT name FROM table
Results in 1,504 records
For each duplicate "name", the "type", "state", and "country" values don't match across records.
I'm trying to figure out how to SELECT one associated row for each DISTINCT name, without requiring the other columns to be distinct.
SELECT name, type, state, country FROM table GROUP BY name;
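should do the trick, as long as ONLY_FULL_GROUP_BY is disabled (MySQL 5.7.5 and later enable it by default and reject this query); the type, state, and country returned for each name are then arbitrary.
If you want distinct names, you must decide which of the multiple values that may occur for each distinct name you want. For example, you may want minimums or counts: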
SELECT name, min(type), min(state), count(country) FROM table GROUP BY name
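A small sketch of the aggregate approach, using an in-memory SQLite database in place of MySQL. The table name `places` and the sample rows are hypothetical (the original table name is not usable as-is because `table` is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'places' is a hypothetical stand-in for the table in the question.
conn.execute(
    "CREATE TABLE places (name TEXT, type TEXT, state TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?, ?)",
    [
        ("Springfield", "town", "IL", "US"),
        ("Springfield", "city", "MA", "US"),
        ("Paris", "city", "TX", "US"),
    ],
)

# One row per name; every other column is reduced by an aggregate.
per_name = conn.execute("""
    SELECT name, MIN(type), MIN(state), COUNT(country)
    FROM places
    GROUP BY name
    ORDER BY name
""").fetchall()

print(per_name)  # [('Paris', 'city', 'TX', 1), ('Springfield', 'city', 'IL', 2)]
```

Note that each aggregate is computed independently: the Springfield result combines 'city' from one row with 'IL' from another, so the output row need not match any single row in the table.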