Extracting data from SQL Server with no duplicate of one particular column - sql

Similar questions have been asked before, and I tried all of those solutions, but none of them worked for me. Here is my requirement. There is a table with column1 (the PK), column2, column3, column4, and column5. Apart from the first column (PK), every column may hold duplicate values across rows. I want to pull a list of all rows with just one condition: column2 must not repeat. If multiple rows share the same value in column2, I want just one of them (say the first one, which can be found using min(column1)) and want to disregard the other rows that have the same value in column2.
I tried GROUP BY, but it didn't work because it requires me to list all columns in the GROUP BY clause, which makes the combination of all columns unique rather than column2 alone.
EDIT
Thank you everybody for putting me on the right track. I tried many things and finally I think I found the right answer. Please comment if you see any issues with it.
select * from myTable where column1 in (select MIN(column1) from myTable group by column2)

To get the complete row with the minimum column1 per value of column2, you can use analytic (window) functions to number the rows within each column2 partition, ordered by column1.
Then you can just pick the rows with row number 1 (the first row per partition) and you'll get the complete existing row with the smallest value of column1:
WITH cte AS (
SELECT column1, column2, column3, column4, column5,
ROW_NUMBER() OVER (PARTITION BY column2 ORDER BY column1) rn
FROM test
)
SELECT column1, column2, column3, column4, column5
FROM cte
WHERE rn=1;
An SQLfiddle to test with.
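For anyone who wants to try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows and the reduced column list are made up for illustration, and it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: column2 repeats, column1 is the primary key.
conn.execute(
    "CREATE TABLE test (column1 INTEGER PRIMARY KEY, column2 TEXT, column3 TEXT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "x", "first x"), (2, "x", "second x"),
     (3, "y", "first y"), (4, "y", "second y")],
)

query = """
WITH cte AS (
    SELECT column1, column2, column3,
           ROW_NUMBER() OVER (PARTITION BY column2 ORDER BY column1) AS rn
    FROM test
)
SELECT column1, column2, column3
FROM cte
WHERE rn = 1
ORDER BY column1
"""
# One complete row per distinct column2, the one with the smallest column1.
rows = conn.execute(query).fetchall()
print(rows)
```

Each returned row is an existing row of the table, not a mix of values taken from several rows.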

Just apply MIN() to all the other columns you want to retrieve, and use group by on the one column you want to be unique.
SELECT MIN(column1), column2, MIN(column3), MIN(column4), MIN(column5)
FROM your_table
GROUP BY column2
Update
As mentioned in the comments, the above solution does not return a complete row, but the minimum values across all columns in the group. To retrieve a complete row, based on the lowest value in column1:
SELECT column1, column2, column3, column4, column5
FROM your_table t1
WHERE t1.column1 = (
SELECT TOP 1 t2.column1
FROM your_table t2
WHERE t2.column2 = t1.column2
ORDER BY t2.column1
)
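To see why MIN() on every column is not the same as picking a complete row, here is a small sketch using Python's sqlite3 module. The three sample rows are invented, and because SQLite has no TOP, the "complete row" query uses the IN (SELECT MIN(column1) ...) form from the question's edit instead of the correlated TOP 1 subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (column1 INTEGER PRIMARY KEY, column2 TEXT, column3 INTEGER)"
)
# Hypothetical data: for column2 = 'a', the smallest column3 is NOT in the row
# that has the smallest column1.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "a", 30), (2, "a", 10), (3, "b", 20)],
)

# MIN() per column: the values may come from different rows of the group.
mixed = conn.execute(
    "SELECT MIN(column1), column2, MIN(column3) "
    "FROM your_table GROUP BY column2 ORDER BY column2"
).fetchall()

# Complete existing rows: keep only the row with the lowest column1 per column2.
complete = conn.execute(
    "SELECT column1, column2, column3 FROM your_table "
    "WHERE column1 IN (SELECT MIN(column1) FROM your_table GROUP BY column2) "
    "ORDER BY column1"
).fetchall()

print(mixed)     # [(1, 'a', 10), (3, 'b', 20)]  <- (1, 'a', 10) is not a real row
print(complete)  # [(1, 'a', 30), (3, 'b', 20)]
```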

Related

Is there a function in SQL that allows me to sum specific rows based on a column value?

I want to sum City4, and the two misspellings together as one row. Any input on how to do this?
SELECT column1,
column2,
count(column3),
Sum(Column4)
FROM TABLE
WHERE column1 IN ('state1',
'State2',
'State3')
AND column2 IN ('City1',
'City2',
'City3',
'City4',
'City4 misspelled1',
'City4 misspelled 2')
GROUP BY column1,
column2
ORDER BY column1;

Removing duplicates of column2 then group them based on column1 , then sum the values of column3 in sql

The table looks like
column1 column2 column3
400196 2021-07-06 33
400196 2021-07-06 33
400196 2021-08-16 33
I want the sum of column3 grouped by column1, but rows with a duplicate date (column2) should be counted only once.
The desired output is:
column1 column3
400196 66
The query I wrote is
select sum(column3)
from table_name
group by column1
But this gives me result 99
You can remove duplicate values in a subquery:
select t.column1, sum(t.column3)
from (select distinct t.column1, t.column2, t.column3
from t
) t
group by t.column1;
Note: This sort of problem can arise when you are joining tables together. Removing duplicates may not always be the right solution. Often it is better to do the calculation before joining, so you don't have duplicate values to deal with.
You could use a two step process here, first remove duplicates, then aggregate and sum:
SELECT column1, SUM(column3) AS column3
FROM (SELECT DISTINCT column1, column2, column3 FROM yourTable) t
GROUP BY column1;
Demo
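The numbers from the question can be reproduced with a short sketch in Python's sqlite3 module. The table name yourTable follows the answer above; the column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (column1 INTEGER, column2 TEXT, column3 INTEGER)"
)
# The three rows from the question; the first two are exact duplicates.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(400196, "2021-07-06", 33),
     (400196, "2021-07-06", 33),
     (400196, "2021-08-16", 33)],
)

# Plain aggregation counts the duplicated row twice.
naive = conn.execute(
    "SELECT column1, SUM(column3) FROM yourTable GROUP BY column1"
).fetchall()

# Step 1: remove duplicate rows. Step 2: aggregate what is left.
deduped = conn.execute(
    "SELECT column1, SUM(column3) AS column3 "
    "FROM (SELECT DISTINCT column1, column2, column3 FROM yourTable) t "
    "GROUP BY column1"
).fetchall()

print(naive)    # [(400196, 99)]
print(deduped)  # [(400196, 66)]
```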

Is it possible to ORDER BY a computed column without including it in the result set?

I have this query:
SELECT Column1, Column2, Column3, /* computed column */ AS SortColumn
FROM Table1
ORDER BY SortColumn
SortColumn serves no purpose other than to define the sort order of the result set. Thus I'd like to omit it from the result set to decrease the size of the data sent to the client. The following fails …
SELECT Column1, Column2, Column3
FROM (
SELECT Column1, Column2, Column3, /* computed column */ AS SortColumn
FROM Table1
ORDER BY SortColumn
) AS SortedTable1
… because of:
Msg 1033, Level 15, State 1
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
So there's this hacky solution:
SELECT Column1, Column2, Column3
FROM (
SELECT TOP /* very high number */ Column1, Column2, Column3, /* computed column */ AS SortColumn
FROM Table1
ORDER BY SortColumn
) AS SortedTable1
Is there a clean solution I'm not aware of, since this doesn't sound like a rare scenario?
Edit:
The solutions already given work indeed fine for the query I referred to. Unfortunately, I left out an important detail: The (already existent) query consists of two SELECTs with a UNION in between, which changes the matter pretty much (again simplified, and hopefully not too simplified):
SELECT Column1, Column2, Column3
FROM Table1
UNION ALL
SELECT Column1, Column2, Column3
FROM Table1
ORDER BY /* computed column */
Msg 104, Level 16, State 1
ORDER BY items must appear in the select list if the statement contains a UNION, INTERSECT or EXCEPT operator.
So this error message clearly says that I have to put the computed column in both of the select lists. So there we are again with the subquery solution which doesn't reliably work, as pointed out in the answers.
You don't need to have a computed column in the SELECT list to use it in an ORDER BY:
SELECT Column1, Column2, Column3
FROM Table1
ORDER BY /* computed column */
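As a quick check of that claim, here is a minimal sketch in Python's sqlite3 module. ABS(Column2) stands in for the computed column; the expression and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Column1 TEXT, Column2 INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("a", 3), ("b", -1), ("c", -2)],
)

# The sort expression appears only in ORDER BY, never in the SELECT list.
cur = conn.execute("SELECT Column1 FROM Table1 ORDER BY ABS(Column2)")
names = [d[0] for d in cur.description]  # columns actually sent to the client
rows = cur.fetchall()

print(names)  # ['Column1']
print(rows)   # [('b',), ('c',), ('a',)]
```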
If you need to do it using UNION, then do the UNION in a CTE and the ORDER BY in the outer SELECT, making sure the CTE includes every column the ordering needs:
WITH src AS (
SELECT Column1, Column2, Column3, /* computation */ ColumnNeededForOrderBy
FROM Table1
UNION ALL
SELECT Column1, Column2, Column3, /* computation */ ColumnNeededForOrderBy
FROM Table2
)
SELECT Column1, Column2, Column3
FROM src
ORDER BY ColumnNeededForOrderBy
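A runnable version of the CTE pattern, sketched in Python's sqlite3 module. The two tables, their rows, and the use of ABS(Column2) as the computation are all assumptions for illustration; only the shape of the query comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Column1 TEXT, Column2 INTEGER)")
conn.execute("CREATE TABLE Table2 (Column1 TEXT, Column2 INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [("a", 3), ("b", -1)])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", [("c", -2), ("d", 0)])

query = """
WITH src AS (
    -- The helper column exists only inside the CTE.
    SELECT Column1, Column2, ABS(Column2) AS SortColumn FROM Table1
    UNION ALL
    SELECT Column1, Column2, ABS(Column2) AS SortColumn FROM Table2
)
SELECT Column1, Column2
FROM src
ORDER BY SortColumn
"""
cur = conn.execute(query)
names = [d[0] for d in cur.description]
rows = cur.fetchall()

print(names)  # ['Column1', 'Column2']
print(rows)   # [('d', 0), ('b', -1), ('c', -2), ('a', 3)]
```

The ORDER BY sits on the outermost SELECT, so the ordering is guaranteed, and SortColumn never reaches the client.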
If you don't care to be specific with the column name, you can use the column index and skip the CTE. I don't like this because you might add a column to the query later and forget to update the index in the ORDER BY clause (I've done it before). Also, the query plans will likely be the same, so it's not like the CTE will cost you anything.
SELECT Column1, Column2, Column3, /* computation */
FROM Table1
UNION ALL
SELECT Column1, Column2, Column3, /* computation */
FROM Table2
ORDER BY 4
If, for whatever reason, it's not practical to do the calculation in the ORDER BY, you can do something quite similar to your attempt:
SELECT Column1, Column2, Column3
FROM (
SELECT Column1, Column2, Column3, /* computed column */ AS SortColumn
FROM Table1
) AS SortedTable1
ORDER BY SortColumn
Note that all that's changed here is that the ORDER BY is applied to the outer query. It's perfectly valid to reference columns in the ORDER BY that don't appear in the SELECT clause.
Just put the expression in the order by:
SELECT Column1, Column2, Column3
FROM Table1
ORDER BY <computed column>
The reason this is forbidden is that the ordering of the outer select has nothing to do with the ordering of the inner select - not by contract. So if you use order by without a top clause, you're obviously making a mistake. By using top the way you do, you simply hide the error, but you still have the same mistake.
Your hack only works because the engine happened to preserve the order - but that's not a given, and there's no way to enforce that (other than using order by in the outer query). For example, a different index usage or parallel execution can scramble your data.
So no, there isn't another way - you need to order by in the outer query, and that requires you to output the column you want to sort by in the subquery. And unless you're using *, it's not like it makes any difference - you don't need to select it in the outer select, just the inner one. And only the outer select is sent to the client :)
The only place for an ORDER BY is the outermost statement.
Of course there are exceptions, for example if you need the TOP record of a filtered list (e.g. the last valid value on a given date). But in these cases you must combine ORDER BY with TOP.
Only the outermost ORDER BY will sort the list you get.
After the edit, it looks like this is what you need (note that SQL Server requires an alias on the derived table):
SELECT Column1, Column2, Column3
FROM
(
SELECT Column1, Column2, Column3
FROM Table1
UNION ALL
SELECT Column1, Column2, Column3
FROM Table1
) AS CombinedRows
ORDER BY /* computed column */

how to do nested SQL select count

I'm querying a system that won't allow DISTINCT, so my alternative is to use GROUP BY to get close to the result.
My desired query was meant to look like this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT(column3)) AS column3
FROM table
For the alternative, I think I'd need some type of nested query along the lines of this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(SELECT column FROM table GROUP BY column) AS column3
FROM table
But it didn't work. Am I close?
You are using the wrong syntax for COUNT(DISTINCT). The DISTINCT part is a keyword, not a function. Based on the docs, this ought to work:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT column3) AS column3
FROM table
Do, however, read the docs. BigQuery's implementation of COUNT(DISTINCT) is a bit unusual, apparently so as to scale better for big data. If you are trying to count a large number of distinct values then you may need to specify a second parameter (and you have an inherent scaling problem).
Update:
If you have a large number of distinct column3 values to count, and you want an exact count, then perhaps you can perform a join instead of putting a subquery in the select list (which BigQuery seems not to permit):
SELECT *
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
)
CROSS JOIN (
SELECT count(*) AS column3
FROM (
SELECT column3
FROM table
GROUP BY column3
)
)
Update 2:
Not that joining two one-row tables would be at all expensive, but @FelipeHoffa got me thinking more about this, and I realized I had missed a simpler solution:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(*) AS column3
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
GROUP BY column3
)
This one computes a subtotal of column1 and column2 values, grouping by column3, then counts and totals all the subtotal rows. It feels right.
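To convince yourself that the subtotal trick matches COUNT(DISTINCT), here is a small sketch using Python's sqlite3 module rather than BigQuery. The table name t and its rows are hypothetical, and SQLite does allow COUNT(DISTINCT), so it is used here only as a reference value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 20, "a"), (3, 30, "b"), (4, 40, "c"), (5, 50, "c")],
)

# Subtotal per column3 value, then total the subtotals and count the groups.
via_group_by = conn.execute("""
    SELECT SUM(column1) AS column1, SUM(column2) AS column2, COUNT(*) AS column3
    FROM (
        SELECT SUM(column1) AS column1, SUM(column2) AS column2
        FROM t
        GROUP BY column3
    ) AS subtotals
""").fetchone()

# Reference result using COUNT(DISTINCT), where the engine allows it.
via_distinct = conn.execute(
    "SELECT SUM(column1), SUM(column2), COUNT(DISTINCT column3) FROM t"
).fetchone()

print(via_group_by)  # (15, 150, 3)
print(via_distinct)  # (15, 150, 3)
```

One caveat: if column3 can be NULL, GROUP BY produces a group for the NULLs while COUNT(DISTINCT) ignores them, so the two counts can differ by one.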
FWIW, the way you are trying to use DISTINCT isn't how it's normally used, as it's meant to show unique rows, not unique values for one column in a dataset. GROUP BY is more in line with what I believe you are ultimately trying to accomplish.
Depending upon what you need you could do one of a couple things. Using your second query, you would need to modify your subquery to get a count, not the actual values, like:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM (SELECT column3 FROM table GROUP BY column3) AS g) AS column3
FROM table
Alternatively, you could do a query off your initial query, something like this:
SELECT sum(column1), sum(column2), sum(column4) from (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
1 AS column4
FROM table GROUP BY column3)
GROUP BY column4
Edit: The above is generic SQL; I'm not too familiar with Google BigQuery.
You can probably use a CTE:
WITH result AS (SELECT column3 FROM table GROUP BY column3)
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM result) AS column3
FROM table
Instead of doing a COUNT(DISTINCT), you can get the same results by running a GROUP BY first, and then counting results.
For example, the number of different words that Shakespeare used by year:
SELECT corpus_date, COUNT(word) different_words
FROM (
SELECT word, corpus_date
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date
As a bonus, let's add a column that identifies which books were written during each year:
SELECT corpus_date, COUNT(word) different_words, GROUP_CONCAT(UNIQUE(corpus)) books
FROM (
SELECT word, corpus_date, UNIQUE(corpus) corpus
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date

appending 2 columns into one list in sql

I have 2 tables, each with 3 columns. I want a single column in which one column from each table is appended one after the other.
E.g., suppose a column in one table contains hai, how, are, you,
and a column in another table contains i, am, fine.
I want a query that returns hai, how, are, you, i, am, fine in just one column.
Can anybody give a query for this in SQL?
If I understand your schema correctly you have this
Table1: Column1
hai,
how,
are,
you.
Table2: Column2
i,
am,
fine.
Do This:
Insert Into Table1 (Column1)
Select Column2 From Table2
You will get this:
Table1: Column1
hai,
how,
are,
you.
i,
am,
fine.
If you have 3 columns, then just do this:
Insert Into Table1 (Column1, Column2, Column3) -- the column list is not necessary if those are the only columns in Table1
Select Column1, Column2, Column3 From Table2 -- this could become Select * if those are the only columns of Table2
EDIT: Do this if you don't want to modify any tables.
Select Column1, Column2, Column3
From Table1
UNION ALL
Select Column1, Column2, Column3
From Table2
Your question isn't very clear. One interpretation of it is that you want to UNION the two:
select column
from table1
union
select column
from table2;
If you really want all rows from both tables (and not the distinct values), UNION ALL will be faster than UNION.
If you want the rows in a certain order be sure to specify an ORDER BY clause.
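The difference between the two operators is easy to see with a small sketch in Python's sqlite3 module. The column name word is hypothetical, and an extra 'you' was added to the second table so the tables share one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (word TEXT)")
conn.execute("CREATE TABLE table2 (word TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)",
                 [("hai",), ("how",), ("are",), ("you",)])
conn.executemany("INSERT INTO table2 VALUES (?)",
                 [("i",), ("am",), ("fine",), ("you",)])

# UNION ALL keeps every row from both tables, duplicates included.
all_rows = [r[0] for r in conn.execute(
    "SELECT word FROM table1 UNION ALL SELECT word FROM table2"
)]

# UNION removes duplicates; ORDER BY makes the row order explicit.
distinct_sorted = [r[0] for r in conn.execute(
    "SELECT word FROM table1 UNION SELECT word FROM table2 ORDER BY word"
)]

print(len(all_rows))    # 8
print(distinct_sorted)  # ['am', 'are', 'fine', 'hai', 'how', 'i', 'you']
```

Without an ORDER BY, neither query promises any particular row order, so do not rely on the first table's rows coming out first.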