DISTINCT in SQL Redshift

What exactly does the query below return? I have tried playing around with it, but I don't understand the results it produces.
SELECT DISTINCT metric, value
FROM
Table X
If I had just written "SELECT DISTINCT metric FROM table X", I understand that it would return all distinct values in the metric column. But what does it do when you add another column (like the column "value" in the query above)?

When you use DISTINCT with multiple columns in the SELECT clause, it applies to all of those columns together and returns each unique combination of their values.

Adding a column means DISTINCT treats the columns together as one unit of uniqueness, not just one column on its own. It has to work that way: if one column has more duplicates than the other, de-duplicating one column but not the other would leave the two columns with different numbers of rows, and the result would be inconsistent.

The metric and value columns act as a combination here. For example, if the metric table_name appears with two tables, x and y, that have different owners, the DISTINCT result will be:
table_name - x
table_name - y
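To see the difference concretely, here is a small sketch that runs both queries against an in-memory SQLite database. SQLite is only a stand-in for Redshift here (DISTINCT behaves the same way for this simple case), and the table name table_x and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Redshift (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (metric TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?)",
    [
        ("table_name", "x"),
        ("table_name", "x"),  # exact duplicate of the previous row
        ("table_name", "y"),
        ("row_count", "10"),
    ],
)

# One column: one row per distinct metric.
one_col = conn.execute(
    "SELECT DISTINCT metric FROM table_x ORDER BY metric"
).fetchall()

# Two columns: one row per distinct (metric, value) pair.
two_cols = conn.execute(
    "SELECT DISTINCT metric, value FROM table_x ORDER BY metric, value"
).fetchall()

print(one_col)   # [('row_count',), ('table_name',)]
print(two_cols)  # [('row_count', '10'), ('table_name', 'x'), ('table_name', 'y')]
```

Only the exact duplicate ("table_name", "x") disappears; "table_name" still shows up twice because it pairs with two different values.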

Related

Combine multiple rows with N-1 identical columns and 1 different column into one row, preserving the first N-1 columns and summing the last column

I have a query that produces a table with 26 columns, A-Z. For some rows, columns A-Y are identical, and column Z is the only one that differs. Is there an easy and clean way to combine duplicate rows, such that columns A-Y are the same and column Z is summed over? My solution is to do something like
SELECT A, B, C,...,Y,SUM(Z)
-- lots of work
FROM [table produced by multiple joins]
GROUP BY A, B, C,...,Y
The last GROUP BY clause ends up being very long. It's also error-prone whenever columns are added to or removed from the SELECT statement, because the two lists have to be kept in sync by hand. Is this the only way to do what I want?
Below is a solution for BigQuery Standard SQL:
#standardSQL
SELECT
ANY_VALUE((SELECT AS STRUCT t.* EXCEPT(z))).*,
SUM(z) AS z
FROM `project.dataset.table_produced_by_multiple_joins` t
GROUP BY FORMAT('%t', (SELECT AS STRUCT t.* EXCEPT(z)))
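Outside BigQuery, a different approach (not the one above) is to generate the column list in application code from the table's schema, so SELECT and GROUP BY can never drift apart. This is a minimal sketch assuming the joined result has been materialized as a table; it uses SQLite's PRAGMA table_info, and the table joined and its columns a, b, c, z are hypothetical.

```python
import sqlite3

def sum_over_duplicates(conn, table, sum_col):
    """Group by every column of `table` except `sum_col` and sum `sum_col`.

    The column list is read from the schema, so SELECT and GROUP BY
    stay in sync when columns are added or removed. `table` and
    `sum_col` are interpolated into SQL, so they must be trusted names.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    keys = ", ".join(f'"{c}"' for c in cols if c != sum_col)
    sql = (
        f'SELECT {keys}, SUM("{sum_col}") AS "{sum_col}" '
        f"FROM {table} GROUP BY {keys} ORDER BY {keys}"
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the joined result: key columns a, b, c plus z.
conn.execute("CREATE TABLE joined (a TEXT, b TEXT, c TEXT, z INTEGER)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [("p", "q", "r", 1), ("p", "q", "r", 2), ("p", "q", "s", 5)],
)
result = sum_over_duplicates(conn, "joined", "z")
print(result)  # [('p', 'q', 'r', 3), ('p', 'q', 's', 5)]
```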

How to select rows that meets multiple criteria from a single column in SQL?

I have a question similar to this one:
SQL: how to select a single id ("row") that meets multiple criteria from a single column
But in my case, the pairs of values are not unique, for example:
A user_id could be paired with the same ancestry more than once (more than one row with the same user_id - ancestry pair).
What would be a good, efficient solution?
The array of ancestries that must pass the condition could be large and variable (up to 200), which makes me think that the join solution will be very inefficient. Furthermore, because the pairs of values are not unique, the "in..group by" solution will not work.
Correct me if I'm wrong: do you want to know which user_ids have all of the ancestries in XAncestors (XAncestors being a list of variable length)?
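Select t.user_id
from (select distinct user_id, ancestry
      from your_table) t
where t.ancestry in ('a1', 'a2', 'a3')  -- XAncestors: hypothetical list of required ancestries
group by t.user_id
having count(*) = 3                     -- length(XAncestors): number of items in the list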
Just to clarify, this is exactly the same query as in the question you linked, but with a subquery in the FROM clause that selects only distinct (user_id, ancestry) pairs, so duplicate pairs can no longer inflate the count.
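Here is a runnable sketch of that query with a variable-length list, using SQLite as a stand-in database and bound parameters for the IN list. The sample rows are made up; user 2 has the same pair twice, which would wrongly pass HAVING COUNT(*) = 2 without the DISTINCT subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (user_id INTEGER, ancestry TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        (1, "a"), (1, "a"), (1, "b"),   # user 1 has a (twice) and b
        (2, "a"), (2, "a"),             # user 2 has only a, duplicated
        (3, "a"), (3, "b"), (3, "c"),   # user 3 has a, b and c
    ],
)

def users_with_all(conn, required):
    """Return the user_ids paired with every ancestry in `required`."""
    required = sorted(set(required))  # de-duplicate the input list as well
    placeholders = ", ".join("?" for _ in required)
    sql = f"""
        SELECT t.user_id
        FROM (SELECT DISTINCT user_id, ancestry FROM your_table) AS t
        WHERE t.ancestry IN ({placeholders})
        GROUP BY t.user_id
        HAVING COUNT(*) = ?
        ORDER BY t.user_id
    """
    rows = conn.execute(sql, [*required, len(required)]).fetchall()
    return [user_id for (user_id,) in rows]

print(users_with_all(conn, ["a", "b"]))  # [1, 3]
```

The input list is also de-duplicated, since a repeated entry in it would otherwise make the expected count too high.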

comparing rows in Sql, without using distinct operator? (distinct operator implementation)

I want to compare two rows from a query result, for instance, to check whether the first row equals the second.
Given a query of the form
SELECT * FROM table_name
If the query returns 100 rows, how do we compare the rows for equality? I'm just curious how SQL Server implements this, basically how the DISTINCT operator works behind the scenes, because knowing that would help me understand the concept more clearly.
The simplest approach SQL Server could use is to compare hashes of whole rows:
SELECT CHECKSUM(*)
from YourTable
or of chosen columns:
SELECT CHECKSUM(col1, col2, col3)
from YourTable
If the checksums differ, the rows differ. If the checksums match, the rows still have to be checked more carefully by comparing the exact column values, because different rows can produce the same checksum. Even so, the checksums make it much easier to filter out rows that cannot be duplicates.
To find candidate duplicates:
SELECT CHECKSUM(*)
from YourTable
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
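The hash-then-verify idea can be sketched in a few lines. This is only an illustration of the concept, not how SQL Server actually implements DISTINCT (real engines typically choose between sorting and hashing strategies); Python's built-in hash() plays the role of CHECKSUM here.

```python
from collections import defaultdict

def distinct_rows(rows):
    """Hash-based DISTINCT: keep the first occurrence of each row.

    Rows are bucketed by hash first (cheap); rows that land in the same
    bucket are then compared value by value, mirroring the checksum idea.
    """
    buckets = defaultdict(list)  # hash value -> rows already kept with that hash
    result = []
    for row in rows:
        bucket = buckets[hash(row)]
        # Equal hashes do not prove equality, so compare exact values.
        if not any(kept == row for kept in bucket):
            bucket.append(row)
            result.append(row)
    return result

rows = [(1, "x"), (2, "y"), (1, "x"), (3, "z"), (2, "y")]
print(distinct_rows(rows))  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

In CPython, hash(-1) == hash(-2), so distinct_rows([-1, -2, -1]) exercises the collision path and still correctly keeps both -1 and -2.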
You could use the following query:
SELECT col1, col2, ..., COUNT(*) AS copies  -- same columns as in the GROUP BY
FROM table_name
GROUP BY col1, col2, ...  -- all columns to test for equality here
HAVING COUNT(*) > 1
In the GROUP BY (and the SELECT list), put the name of every column that must be equal. If you want entire rows to be equal, list every column in the table.
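A quick way to try the GROUP BY ... HAVING query is against an in-memory SQLite database. The table name table_name, its two columns, and the rows below are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "a"), (2, "b"), (1, "a"), (1, "a"), (3, "c")],
)

# Every column being compared appears in both SELECT and GROUP BY;
# COUNT(*) reports how many copies of each duplicated row exist.
dupes = conn.execute(
    """
    SELECT col1, col2, COUNT(*) AS copies
    FROM table_name
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)  # [(1, 'a', 3)]
```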
If your table has a primary key (as most tables in a relational database do, typically referenced from other tables), rows 1-100 will all be unique because of that key.
However, if you want to compare specific columns, you can do it in application code instead, for example with a PHP loop like this:
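$prev = null;      // name from the previous row
$saveIDs = [];     // id of the first row in each run of equal names
// Assumes $mysqli is an open mysqli connection; ORDER BY makes equal names adjacent
$stmt = $mysqli->prepare("SELECT id, name FROM users ORDER BY name");
$stmt->execute();
$stmt->bind_result($id, $name);
while ($stmt->fetch()) {
    if ($saveIDs === [] || $prev !== $name) {
        $prev = $name;
        $saveIDs[] = $id;
    }
}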

question about SQL query

Given a relation R with n columns, use SQL to return the tuples that have the maximum number of occurrences. I have no idea how to query horizontally.
SELECT MAX(t.*) FROM mytable t
or
SELECT DISTINCT a, b, c FROM mytable
or
SELECT DISTINCT * FROM mytable
It depends on which SQL implementation you are using, and more generally on more details about the query, but the examples above should give you some terms to search for.
I'm not sure what you mean by querying horizontally. Is it one relation with multiple key columns linking the two tables? It sounds like you might just need to GROUP BY those columns and ORDER BY COUNT(*) DESC.
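A sketch of the GROUP BY idea, assuming "maximum number of occurrences" means the tuples that appear most often. Instead of ORDER BY COUNT(*) DESC with a row limit, it compares each count to the maximum so that ties are all returned. SQLite is used as a stand-in, and mytable with columns a, b, c is hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 3), (4, 5, 6), (4, 5, 6), (7, 8, 9)],
)

# Count each distinct tuple, then keep those whose count equals the
# largest count (all ties are returned, unlike ORDER BY ... LIMIT 1).
most_frequent = conn.execute(
    """
    WITH counts AS (
        SELECT a, b, c, COUNT(*) AS n
        FROM mytable
        GROUP BY a, b, c
    )
    SELECT a, b, c, n
    FROM counts
    WHERE n = (SELECT MAX(n) FROM counts)
    ORDER BY a, b, c
    """
).fetchall()
print(most_frequent)  # [(1, 2, 3, 2), (4, 5, 6, 2)]
```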

SQL Query MAX with SUM

I have a table with the columns ID, matchid, point1 and point2. I need to get the ID with the maximum points, but the problem I'm facing is that the maximum depends on the sum of both columns (point1+point2). I have no idea how to get the max of a combination of two columns. I have tried queries such as:
SELECT MAX(column1+column2) FROM table
MAX(SUM(column1,column2)) FROM table
but nothing works. I am using MS Access.
This returns more than one ID if several rows share the maximum sum:
SELECT ID FROM Table1
WHERE ([Field1]+[Field2])=(
SELECT Max([Field1]+[Field2]) AS Expr1
FROM Table1)
You can use a subquery, e.g.:
select id from [table] where point1 + point2 = (select max(point1 + point2) from [table])
Note that this will return multiple rows if more than one record has the same maximum points.
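To check the tie behaviour, here is the same subquery run against an in-memory SQLite database. SQLite is only a stand-in for MS Access, and the table name scores and its rows are hypothetical; ids 1 and 2 both total 7 points, so both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id INTEGER, matchid INTEGER, point1 INTEGER, point2 INTEGER)"
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [(1, 10, 3, 4), (2, 10, 5, 2), (3, 11, 1, 1)],
)

# The subquery computes the largest point1 + point2; the outer query
# returns every id whose own sum equals it (ties included).
ids = conn.execute(
    """
    SELECT id FROM scores
    WHERE point1 + point2 = (SELECT MAX(point1 + point2) FROM scores)
    ORDER BY id
    """
).fetchall()
print(ids)  # [(1,), (2,)]
```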