How to select multiple columns while keeping one of them distinct - sql

I want to select two columns (A and B) from a table and keep only the distinct values of one of them (A). However, a single value of A can map to multiple values of B, so the following query won't work:
select distinct A, B from table1
I am thinking of something like this:
select A, agg(B) from table1 group by A
I want the agg function to pick a single value of B at random for each group of A. How can I do this in Postgres?

If you want an arbitrary value ("any old value"), then min() or max() will give you one:
select a, min(b) as b
from table1
group by a;
If you want an indeterminate value ("value from any row that matches"), then:
select distinct on (a) a, b
from table1
order by a;
If you want a random value ("value from a random matching row chosen from a uniform distribution"), then:
select distinct on (a) a, b
from table1
order by a, random();
In other words, "random" means something different from "arbitrary" and "indeterminate". However, distinct on is probably what you wanted all along.
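A minimal sketch of the min() variant on sample data. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made so the example runs anywhere; distinct on itself is Postgres-specific and is not shown here). The rows in table1 are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
# Hypothetical sample data: each value of a maps to two values of b.
conn.executemany(
    "INSERT INTO table1 (a, b) VALUES (?, ?)",
    [(1, "x"), (1, "y"), (2, "z"), (2, "w")],
)

# SELECT DISTINCT a, b keeps every distinct (a, b) pair, so a still repeats.
distinct_pairs = conn.execute(
    "SELECT DISTINCT a, b FROM table1 ORDER BY a, b"
).fetchall()

# GROUP BY a with min(b) returns exactly one row per value of a.
one_per_a = conn.execute(
    "SELECT a, min(b) AS b FROM table1 GROUP BY a ORDER BY a"
).fetchall()

print(len(distinct_pairs))  # 4 rows, a is not unique
print(one_per_a)            # one row per a
```

The min() form is deterministic, which makes it easy to test; swap in the distinct on query on Postgres if any matching row will do.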

Use string_agg with a comma separator to collect all the values of B for each A:
select A, string_agg(distinct B,',') from table1 group by A;

Related

Looking for performance improvements in the SQL

Scenario:
There are 2 columns in the table with data as given in the sample below.
It is possible that the table has multiple rows for the same value of 'a' column.
In the example, considering column 'a', there are three rows for '1' and one row for '2'.
Sample table 't1':
|a|b |
|1|1.1|
|1|1.2|
|1|2.2|
|2|3.1|
Expected query output:
|a|b |
|1|1.2|
|2|3.1|
Requirement:
Get the row as-is if there is only one row for a given value of column 'a'.
If there are multiple rows for the same value of column 'a' and FLOOR(b) == a for all of them, then get MIN(a) and MAX(b).
If there are multiple rows for the same value of column 'a' and there is one row for which FLOOR(b) > a, then ignore that row. From the remaining rows, get MIN(a) and MAX(b).
Query I used:
select distinct min(a) over(partition by table1.a) as a,
min(b) over(partition by table1.a) as b
from (
SELECT distinct Min(table2.a) OVER (PARTITION BY table2.a) AS a,
Max(table2.b) OVER (PARTITION BY table2.a) AS b
FROM t1 table2
union
SELECT distinct Min(table3.a) OVER (PARTITION BY table3.a) AS a,
Max(table3.b) OVER (PARTITION BY table3.a) AS b
FROM t1 table3
where table3.a = FLOOR(table3.b)
) table1;
This query works and I am getting the desired output. I am looking for ways to improve it by removing the union and the extra select.
Note: t1 is not a table but a procedure call in my case, and it returns additional columns. It would help if the extra call to the procedure could be avoided.
This is how I would get the data you need:
select t1.a, max(t1.b)
from (select a, b, count(1) over(partition by t1.a) cnt from t1) t1
where t1.a = floor(t1.b) or cnt = 1
group by t1.a ,cnt;
It makes only one procedure call, so it might run significantly faster.
Also note that "union" not only appends two data sets but also removes duplicates. Removing duplicates requires additional comparisons between the data sets, which can hurt performance.
When duplicate removal is not needed, it is usually better to use "union all", which skips that check.
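The single-pass query can be checked against the sample data from the question. This is a sketch only: it runs on SQLite through Python's sqlite3 module (an assumption, since the original engine is not named), t1 is an ordinary table rather than a procedure call, and floor() is registered by hand because not every SQLite build ships the math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register floor() explicitly; some SQLite builds lack built-in math functions.
conn.create_function("floor", 1, math.floor)

conn.execute("CREATE TABLE t1 (a INTEGER, b REAL)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO t1 (a, b) VALUES (?, ?)",
    [(1, 1.1), (1, 1.2), (1, 2.2), (2, 3.1)],
)

# t1 is read once: count rows per a, keep rows where floor(b) = a
# (or the group has a single row), then take the largest b per a.
rows = conn.execute(
    """
    SELECT t.a, max(t.b) AS b
    FROM (SELECT a, b, count(*) OVER (PARTITION BY a) AS cnt FROM t1) AS t
    WHERE t.a = floor(t.b) OR t.cnt = 1
    GROUP BY t.a
    ORDER BY t.a
    """
).fetchall()

print(rows)  # [(1, 1.2), (2, 3.1)]
```

The result matches the expected output in the question. Window functions need SQLite 3.25 or later.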

Keep Track of already summed tuples sql

If we have a table with values for a and b, is there a way to add up the b's only when the a is not a duplicate? For example:
a b
1 2
2 3
2 3
so we would get only 5 (instead of 8).
Something like:
select sum(b if unique a),
from table
where ...
The following query selects the lowest value of b for each group of a:
select min(b) min_b
from mytable
group by a
You can then sum those values by selecting the sum from a derived table:
select sum(min_b) from (
select min(b) min_b
from mytable
group by a
) t
http://sqlfiddle.com/#!9/d82c5/1
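As a quick check of the derived-table approach, here is a small sketch that runs it on the three rows from the question. SQLite via Python's sqlite3 is used as a stand-in engine (an assumption; the question does not name the RDBMS).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
# The rows from the question: a = 2 appears twice.
conn.executemany(
    "INSERT INTO mytable (a, b) VALUES (?, ?)",
    [(1, 2), (2, 3), (2, 3)],
)

# Plain sum counts the duplicated a twice.
naive_total = conn.execute("SELECT sum(b) FROM mytable").fetchone()[0]

# Collapse to one b per a first, then sum the collapsed values.
dedup_total = conn.execute(
    """
    SELECT sum(min_b) FROM (
        SELECT min(b) AS min_b
        FROM mytable
        GROUP BY a
    ) AS t
    """
).fetchone()[0]

print(naive_total, dedup_total)  # 8 5
```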
You haven't specified your RDBMS, but if you are using a database that supports window functions, such as SQL Server, you can first pick out one row per value of a using a WITH clause and the ROW_NUMBER() function, and then take the SUM over those rows.
;WITH C AS(
SELECT a, b,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY a) AS Rn
FROM Table1
)
SELECT SUM(b) FROM C
WHERE Rn = 1

How to select duplicate records without a primary key in SQL Server

If I run this query:
SELECT
a,
b,
c,
...
FROM [DMS].[dbo].[CreditDebitAdjustment]
I get 24197 records.
If I run this query:
SELECT DISTINCT
a,
b,
c,
...
FROM [DMS].[dbo].[CreditDebitAdjustment]
I get 24176 records.
How do I go about selecting only the rows that are identical?
SELECT
a,
b,
c
FROM [DMS].[dbo].[CreditDebitAdjustment]
group by a,b,c
having count(*) > 1
If you want to delete those duplicates, use:
;WITH CTE AS
(
SELECT
a, b, c,
RowNum = ROW_NUMBER() OVER(PARTITION BY a,b,c ORDER BY ...(define how to order those rows)..)
FROM
[DMS].[dbo].[CreditDebitAdjustment]
)
DELETE FROM CTE
WHERE RowNum > 1
This "partitions" (groups) all your data by the tuple (a,b,c) and gives each row a number - starting at 1 for each new tuple.
So any row with a RowNum larger than 1 is a duplicate, and it gets deleted.
But really: any serious data table ought to have a proper primary key!
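The GROUP BY ... HAVING count(*) > 1 query above can be tried on a toy table. This sketch uses SQLite through Python's sqlite3 as a stand-in for SQL Server (an assumption), and the table name adjustments and its rows are hypothetical. It also shows why the two row counts in the question differ: the gap between the plain count and the DISTINCT count (24197 - 24176 = 21 in the question) is the number of surplus copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no primary key, so identical rows are allowed.
conn.execute("CREATE TABLE adjustments (a INTEGER, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO adjustments (a, b, c) VALUES (?, ?, ?)",
    [
        (1, "x", "p"), (1, "x", "p"),                 # two identical rows
        (2, "y", "q"),                                # unique row
        (3, "z", "r"), (3, "z", "r"), (3, "z", "r"),  # three identical rows
    ],
)

total_rows = conn.execute("SELECT count(*) FROM adjustments").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT count(*) FROM (SELECT DISTINCT a, b, c FROM adjustments) AS d"
).fetchone()[0]

# Each (a, b, c) combination that occurs more than once, with its copy count.
duplicates = conn.execute(
    """
    SELECT a, b, c, count(*) AS copies
    FROM adjustments
    GROUP BY a, b, c
    HAVING count(*) > 1
    ORDER BY a
    """
).fetchall()

# Surplus copies: everything beyond the first row of each duplicated tuple.
surplus = sum(copies - 1 for _, _, _, copies in duplicates)
print(total_rows - distinct_rows, surplus)  # 3 3
```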

Counting with SQL

How would I count how many times each distinct value occurs in a specific table? I have a table with a column containing different values, each occurring a varying number of times. I would like to create a table with one column listing the value and another listing how many times it occurs.
Say I have a column 'Letter' with values A A A A B B C C going down.
I just want to make a table with columns 'Letter' and 'Number' with A B C vs 4 2 2.
SELECT count(letter) occurrences,
letter
FROM table
GROUP BY letter
ORDER BY letter ASC
Basically, you're looking for the COUNT() function. Be aware that it is an aggregate function, so you must GROUP BY the non-aggregated column (letter) at the end of your SELECT statement.
If you have your letters in two columns (say col1 and col2), you should first union them into a single column and do the count afterwards, like this:
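SELECT count(letter) occurrences,
letter
FROM (SELECT col1 letter
FROM table
-- UNION ALL keeps repeated letters; plain UNION would drop them
-- and every count would come out as 1
UNION ALL
SELECT col2 letter
FROM table) t -- most databases require an alias on a derived table
GROUP BY letter
ORDER BY letter;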
The inner SELECT query appends the content of col2 to col1 and names the resulting column "letter". The outer SELECT counts the occurrences of each letter in this resulting column.
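A small sketch of both cases, run on SQLite through Python's sqlite3 (an assumption; the question does not name a database). The single-column data is the A A A A B B C C example from the question; the two-column table pairs and its rows are hypothetical. It also compares UNION ALL with plain UNION when stacking the two columns, since the choice changes the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-column case: the letters from the question.
conn.execute("CREATE TABLE letters (letter TEXT)")
conn.executemany("INSERT INTO letters VALUES (?)", [(ch,) for ch in "AAAABBCC"])
single_counts = conn.execute(
    "SELECT letter, count(letter) AS occurrences "
    "FROM letters GROUP BY letter ORDER BY letter"
).fetchall()

# Two-column case (hypothetical data): stack col1 and col2, then count.
conn.execute("CREATE TABLE pairs (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("A", "B"), ("A", "A"), ("C", "B")],
)
count_sql = """
    SELECT letter, count(letter) AS occurrences
    FROM (SELECT col1 AS letter FROM pairs
          {op}
          SELECT col2 AS letter FROM pairs) AS stacked
    GROUP BY letter
    ORDER BY letter
"""
with_union_all = conn.execute(count_sql.format(op="UNION ALL")).fetchall()
with_union = conn.execute(count_sql.format(op="UNION")).fetchall()

print(single_counts)   # [('A', 4), ('B', 2), ('C', 2)]
print(with_union_all)  # [('A', 3), ('B', 2), ('C', 1)]
print(with_union)      # [('A', 1), ('B', 1), ('C', 1)]
```

Plain UNION removes repeated letters before the count runs, so only UNION ALL gives true occurrence counts.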

In SQL Server, what's the best way to merge tables from multiple databases?

I'm sorry that I can't find a better title for my question. Now let me describe it in detail.
I have 4 databases: a, b, c and d. Database a has all the tables that appear in b, c and d, and they have the same structure with the same constraints (pk, fk, default, check). b, c and d each have only some of the tables that appear in a. There is already some data in a, b, c and d. In b, c and d there is more data than in their counterparts in a, and a probably has data duplicated in b, c and d.
Now what I want to do is export all the data from b, c and d and import it into a. I already have a solution, but I want to know the best method for such a complicated task.
Thanks.
The UNIONs (no ALL) in the subquery will remove duplicates. Then the IS NULL check in the WHERE clause ensures that only rows not already present are inserted into Table1.
Insert Into DatabaseA.dbo.Table1(ID, Value)
Select T.ID, T.Value
FROM (
Select ID, Value From DatabaseB.dbo.Table1
UNION
Select ID, Value From DatabaseC.dbo.Table1
UNION
Select ID, Value From DatabaseD.dbo.Table1
) T
LEFT JOIN DatabaseA.dbo.Table1 S ON T.ID = S.ID
WHERE S.ID IS NULL
You can perform an Insert Into statement that uses unions to collect the rows from the other databases:
Insert Into dbo.TableA(ID, Value)
Select ID, Value From DatabaseB.dbo.TableA
UNION ALL
Select ID, Value From DatabaseC.dbo.TableA
UNION ALL
Select ID, Value From DatabaseD.dbo.TableA
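The UNION plus LEFT JOIN ... IS NULL pattern from the first answer can be sketched end to end. This is an illustration under assumptions, not SQL Server itself: it uses SQLite through Python's sqlite3, with attached in-memory databases named db_b, db_c and db_d (hypothetical names) standing in for databases b, c and d, and the main database standing in for a. The sample rows are made up.

```python
import sqlite3

# "main" plays the role of database a; the attached databases play b, c and d.
conn = sqlite3.connect(":memory:")
for name in ("db_b", "db_c", "db_d"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")

# Same table structure in every database.
for schema in ("main", "db_b", "db_c", "db_d"):
    conn.execute(
        f"CREATE TABLE {schema}.table1 (id INTEGER PRIMARY KEY, value TEXT)"
    )

# Hypothetical data: id 1 already exists in a, id 2 appears in both b and c.
conn.execute("INSERT INTO main.table1 VALUES (1, 'a')")
conn.executemany("INSERT INTO db_b.table1 VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO db_c.table1 VALUES (?, ?)", [(2, "b"), (3, "c")])
conn.executemany("INSERT INTO db_d.table1 VALUES (?, ?)", [(4, "d")])

# UNION removes rows duplicated across the sources; the anti-join
# (LEFT JOIN ... WHERE s.id IS NULL) skips ids that a already has.
conn.execute(
    """
    INSERT INTO main.table1 (id, value)
    SELECT t.id, t.value
    FROM (
        SELECT id, value FROM db_b.table1
        UNION
        SELECT id, value FROM db_c.table1
        UNION
        SELECT id, value FROM db_d.table1
    ) AS t
    LEFT JOIN main.table1 AS s ON t.id = s.id
    WHERE s.id IS NULL
    """
)

merged = conn.execute("SELECT id, value FROM main.table1 ORDER BY id").fetchall()
print(merged)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

One caveat: UNION compares whole rows, so if two source databases hold the same id with different values, both rows survive and the insert would hit the primary key constraint.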