Return distinct values from multiple columns in one query - sql

I have searched but did not find a good answer. I can get distinct values, but the problem is that I am applying the query to two columns. It should return distinct values, but it returns these rows:
Au |FAA303
Au |FAA505
From my table I want Au to appear only once, since it is associated with both FAA303 and FAA505.
What I want looks like this:
Au |FAA303
|FAA505
This is my query in postgresql. I am kinda new to the database queries.
select distinct column1, column2
from table_name

The distinct keyword applies to the combination of all selected fields, not to the first one only.
Suppressing repeated values is something you would typically do in an application that connects to your database and performs the query.
Just to show you that it is possible in SQL, I provide you this query, but please consider doing this in the application instead:
select case row_number() over (partition by t.column1 order by t.column2)
when 1 then t.column1
end as column1,
t.column2
from (
select distinct column1,
column2
from table_name
) as t
order by t.column1, t.column2
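The application-side approach recommended above could look roughly like the following. This is only a sketch in Python; the function name `suppress_repeats` and the sample rows (including the `Ag` row) are made up for illustration, and the rows are assumed to be the sorted, de-duplicated result of the `select distinct` query.

```python
def suppress_repeats(rows):
    """Blank the first field of a row when it repeats the previous row's first field."""
    out = []
    previous = object()  # sentinel that never equals a real value
    for first, second in rows:
        out.append(("" if first == previous else first, second))
        previous = first
    return out


# Hypothetical query result; sorting keeps equal column1 values adjacent.
rows = sorted({("Au", "FAA303"), ("Au", "FAA505"), ("Ag", "FAA101")})
display = suppress_repeats(rows)

for first, second in display:
    print(f"{first:2} |{second}")
```

Keeping this in the display layer means the query itself still returns complete rows, which is usually easier for other consumers of the same data.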

Related

SQL query to calculate subtotal on column

I have a table with two columns, say Column1 and Column2.
Column1 holds keys and Column2 holds values.
I have to display all key-value pairs along with the sum for each key group.
Currently I am ordering the values by Column1 and calculating the sum for each key in the view using a local variable.
Can this be merged into a single SQL query?
Please see below image for further diagrammatic view.
You can use UNION ALL with an ORDER BY:
select column1 , column2
from my_table
union all
select concat(column1, ' - Sub Total') as column1, sum(column2)
from my_table
group by column1
order by column1
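To see what the UNION ALL query returns, here is a small sketch run against an in-memory SQLite database from Python. The sample data is invented, SQLite's `||` operator stands in for `concat`, and `column2` is added to the ORDER BY only to make the row order deterministic; none of that comes from the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("A", 1), ("A", 2), ("B", 5)],  # made-up sample rows
)

# Detail rows plus one subtotal row per key; the subtotal label sorts
# right after its detail rows because the key is a prefix of the label.
rows = conn.execute(
    """
    SELECT column1, column2 FROM my_table
    UNION ALL
    SELECT column1 || ' - Sub Total', SUM(column2)
    FROM my_table
    GROUP BY column1
    ORDER BY column1, column2
    """
).fetchall()

for row in rows:
    print(row)
```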

Best way to understand big and complex SQL queries with many subqueries

I just started in a new project, in a new company.
I was given a big and complex SQL, with about 1000 lines and MANY subqueries, joins, sums, group by, etc.
This SQL is used for report generation (it has no inserts nor updates).
The SQL has some flaws, and my first job in the company is to identify and correct these flaws so that the report shows the correct values (I know the correct values by accessing a legacy system written in Cobol...)
How can I make it easier for me to understand the query, so I can identify the flaws?
As an experienced Java programmer, I know how to refactor complex, badly written, monolithic Java code into small pieces that are easier to understand. But I have no clue how to do that with SQL.
The SQL looks like this:
SELECT columns
FROM
(SELECT columns
FROM
(SELECT DISTINCT columns
FROM table000 alias000
INNER JOIN
table000 alias000
ON column000 = table000.column000
LEFT JOIN
(SELECT columns
FROM (
SELECT DISTINCT columns
FROM columns
WHERE conditions) AS alias000
GROUP BY columns ) alias000
ON
conditions
WHERE conditions
) AS alias000
LEFT JOIN
(SELECT
columns
FROM many_tables
WHERE many_conditions
) )
) AS alias000
ON condition
LEFT JOIN (
SELECT columns
FROM
(SELECT
columns
FROM
many_tables
WHERE many_conditions
) ) ) AS alias001
,
(SELECT
many_columns
FROM
many_tables
WHERE many_conditions) AS alias001
) AS alias001
ON condition
LEFT JOIN
(SELECT
many_columns
FROM many_tables
WHERE many_conditions
) AS alias001
ON condition
,
(SELECT DISTINCT columns
FROM table001 alias001
INNER JOIN
table001 alias001
ON condition
LEFT JOIN
(SELECT columns
FROM (
SELECT DISTINCT columns
FROM tables
WHERE conditions
) AS alias001
GROUP BY
columns ) alias001
ON
condition
WHERE
conditions
) AS alias001
LEFT JOIN
(SELECT columns
FROM tables
WHERE conditions
) AS alias001
ON condition
LEFT JOIN (
SELECT columns
FROM
(SELECT columns
FROM tables
WHERE conditions ) AS alias001
,
(SELECT
columns
FROM
tables
WHERE conditions ) AS alias001
) AS alias001
ON condition
LEFT JOIN
(SELECT
columns
FROM
tables
WHERE conditions
) AS alias001
ON condition
WHERE
condition
) AS alias001
order by column001
How can I make it easier for me to understand the query, so I can identify the flaws?
I deal with code like this every day as we do a lot of reporting and exporting of complex data here.
Step one is to understand the meaning of what you are doing. If you don't understand the meaning, you can't evaluate whether you got the correct results. So understand exactly what you are trying to accomplish and see if you can find the results you should see for one record in the user interface. It really helps to have something to compare to, so that you can see as you go through the query how adding in new things changes the results. If your query uses single letters or something else meaningless for the derived table aliases, then as you figure out what each derived table is supposed to be doing, replace the alias with something more meaningful, like Employees instead of A. This will make it easier for the next person who works on it to decode it later.
Then start at the innermost derived table (or subquery if you prefer, but when it is being used as a table, the term derived table is more accurate). First figure out what it is supposed to be doing. For instance, maybe it is getting all the employees who have less than satisfactory performance evaluations.
Run that and check the results to see if they look correct based on the meaning of what you are doing. For instance, if you are looking at unsatisfactory evaluations and you have 10,000 employees, would 5617 seem like a reasonable result set for that chunk of data? Look for repeated records. If the same person is in there three times, then you likely have a problem where you are joining one to many and getting the many back when you only want one. This can be fixed either by using aggregate functions and group by or by putting another derived table in to replace the problem join.
Once you have the innermost part clear, start checking the results of the other derived tables, adding the code back in and checking the results until you find where records dropped out that should not have (Hey, I had 137 employees at this stage and now I only have 116. What caused that?). Remember that this is only a clue to look at why that happened. There will be times as you build a complex query when the basic results will change and times when they should not, which is why understanding the meaning of the data is critical.
Some things in general to look out for:
How null values are handled can affect results.
Mixing implicit and explicit joins can cause incorrect results in some databases. At any rate, you should always replace all implicit joins with explicit ones. That makes the code clearer and less likely to have errors.
If you have implicit joins, look for accidental cross joins. They are very easy to introduce even in short queries; in complex ones they are much more likely, which is why implicit joins should never be used.
If you have left joins, look out for places where they get accidentally converted to inner joins by putting a where clause on the left join table (other than checking whether the id is null). So this structure is a problem:
FROM table1 t1
LEFT JOIN Table2 t2 ON t1.t1id = T2.t1id
WHERE t2.somefield = 'test'
and should be
FROM table1 t1
LEFT JOIN Table2 t2 ON t1.t1id = T2.t1id
AND t2.somefield = 'test'
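The difference between the two forms is easy to check on a tiny data set. The following is a sketch using Python's built-in sqlite3 module with invented sample rows; the table and column names follow the snippet above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (t1id INTEGER)")
conn.execute("CREATE TABLE table2 (t1id INTEGER, somefield TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)", [(1, "test"), (2, "other")]
)

# Filter in WHERE: rows with no match (NULL somefield) are discarded,
# so the left join behaves like an inner join.
where_rows = conn.execute(
    """
    SELECT t1.t1id, t2.somefield
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.t1id = t2.t1id
    WHERE t2.somefield = 'test'
    ORDER BY t1.t1id
    """
).fetchall()

# Filter in ON: every table1 row survives; unmatched rows get NULL.
on_rows = conn.execute(
    """
    SELECT t1.t1id, t2.somefield
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.t1id = t2.t1id
                       AND t2.somefield = 'test'
    ORDER BY t1.t1id
    """
).fetchall()

print(where_rows)
print(on_rows)
```

A drop in row count between two stages of a large query, like the one from three rows to one here, is the kind of clue described earlier.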
Working from the middle is commonplace in SQL, and converting the set-based logic of SQL into sequential logic can lead to performance issues. Try hard to avoid this, although I know it will be very tempting to do so.
The first thing I would do is question the join syntax. Is this literally the way it is written now?
select
from tb1, tb2, tb3, tb4, tb5 ...
left join ...
That from clause should look like this
From tb1
Inner join tb2 on .....
Inner join tb3 on .....
....
Left join
http://www-03.ibm.com/software/products/en/data-studio
IBM provides an Eclipse-based analysis tool that has the capability of generating a Visual EXPLAIN graph for complex queries. It shows how indexes are used, what internal result sets are produced and combined and so on.
Example:
SELECT * FROM EMPLOYEE, DEPARTMENT WHERE WORKDEPT=DEPTNO
The solution was to simplify the query using COMMON TABLE EXPRESSIONS.
This allowed me to break the big and complex SQL query into many small and easy to understand queries.
COMMON TABLE EXPRESSIONS:
Can be used to break up complex queries, especially complex joins and sub-queries
Are a way of encapsulating a query definition.
Persist only until the next query is run.
Correct use can lead to improvements in both code quality/maintainability and speed.
Can be used to reference the resulting table multiple times in the same statement (eliminate duplication in SQL).
Can be a substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
Example:
WITH cte (Column1, Column2, Column3)
AS
(
SELECT Column1, Column2, Column3
FROM SomeTable
)
SELECT * FROM cte
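As a runnable illustration of referencing one CTE several times in the same statement, here is a sketch against an in-memory SQLite database from Python. The `sales` table, its columns and its rows are hypothetical and not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dept TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("a", 10), ("a", 5), ("b", 20), ("c", 3)],  # made-up sample rows
)

# The CTE "totals" is defined once and used twice: as the row source
# and inside the scalar subquery that finds the largest total.
top = conn.execute(
    """
    WITH totals (dept, total) AS (
        SELECT dept, SUM(amount)
        FROM sales
        GROUP BY dept
    )
    SELECT dept, total
    FROM totals
    WHERE total = (SELECT MAX(total) FROM totals)
    """
).fetchall()

print(top)
```

Without the CTE, the grouped subquery would have to be written out twice, which is the duplication the list above refers to.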
My new SQL looks like this:
------------------------------------------
--COMMON TABLE EXPRESSION 001--
------------------------------------------
WITH alias001 (column001, column002) AS (
SELECT column005, column006
FROM table001
WHERE condition001
GROUP by column008
)
--------------------------------------------
--COMMON TABLE EXPRESSION 002 --
--------------------------------------------
, alias002 (column009) as (
select distinct column009 from table002
)
--------------------------------------------
--COMMON TABLE EXPRESSION 003 --
--------------------------------------------
, alias003 (column1, column2, column3) as (
SELECT '1' AS column1, '1' as column2, 'name001' AS column3 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '1' AS column1, '1.1' as column2, 'name002' AS column3 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '1' AS column1, '1.2' as column2, 'name003' AS column3 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '2' AS column1, '2' as column2, 'name004' AS column3 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '2' AS column1, '2.1' as column2, 'name005' AS column3 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '2' AS column1, '2.2' as column2, 'name006' AS column3 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '3' AS column1, '3' as column2, 'name007' AS column3 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT '3' AS column1, '3.1' as column2, 'name008' AS column3 FROM SYSIBM.SYSDUMMY1
)
--------------------------------------------
--COMMON TABLE EXPRESSION 004 --
--------------------------------------------
, alias004 (column1) as (
select distinct column1 from table003
)
------------------------------------------------------
--COMMON TABLE EXPRESSION 005 --
------------------------------------------------------
, alias005 (column1, column2) as (
select column1, column2 from alias002, alias004
)
------------------------------------------------------
--COMMON TABLE EXPRESSION 006 --
------------------------------------------------------
, alias006 (column1, column2, column3, column4) as (
SELECT column1, column2, column3, sum(column0) as column4
FROM table004
LEFT JOIN table005 ON column01 = column02
group by column1, column2, column3
)
------------------------------------------------------
--COMMON TABLE EXPRESSION 007 --
------------------------------------------------------
, alias007 (column1, column2, column3, column4) as (
SELECT column1, column2, column3, sum(column0) as column4
FROM table006
LEFT JOIN table007 ON column01 = column02
group by column1, column2, column3
)
------------------------------------------------------
--COMMON TABLE EXPRESSION 008 --
------------------------------------------------------
, alias008 (column1, column2, column3, column4) as (
select column1, column2, column3, column4 from alias007 where column5 = 123
)
----------------------------------------------------------
--COMMON TABLE EXPRESSION 009 --
----------------------------------------------------------
, alias009 (column1, column2, column3, column4) as (
select column1, column2,
CASE WHEN column3 IS NOT NULL THEN column3 ELSE 0 END as column3,
CASE WHEN column4 IS NOT NULL THEN column4 ELSE 0 END as column4
from table007
)
----------------------------------------------------------
--COMMON TABLE EXPRESSION 010 --
----------------------------------------------------------
, alias010 (column1, column2, column3) as (
select column1, sum(column4), sum(column5)
from alias009
where column6 < 2005
group by column1
)
--------------------------------------------
-- MAIN QUERY --
--------------------------------------------
select j.column1, n.column2, column3, column4, column5, column6,
column3 + column5 AS column7,
column4 + column6 AS column8
from alias010 j
left join alias006 m ON (m.column1 = j.column1)
left join alias008 n ON (n.column1 = j.column1)
EDIT: I got downvoted on this answer, possibly because they thought I was proposing this as how you should build the final query. I should clarify that this is purely to try and understand what is going on. Once you understand the subqueries and how they link together, you would then use that knowledge to make the necessary changes to the query and rebuild it in an efficient way.
I've used the technique of intermediate temp tables to troubleshoot complex queries quite a bit. They break the logic up into smaller chunks and are also useful if the original query takes a long time. You can test how to combine these intermediate tables without the overhead of rerunning the whole query. Sometimes I'll use temporary views instead of temp tables because the query optimiser can continue to use indexes on the base tables. The temporary views then get dropped once you've finished.
I would start from the innermost subqueries and work my way to the outside.
You're looking for subqueries which appear several times under slightly different guises, and also to give them a concise description - what are they designed to do?
Eg, replace
from (
select x1.y1, x1.y2, x1.y3 ...
from tb1, tb2, tb3, tb4, tb5 ...
left join ...
where ...
group by ...
) as a1
with
from daniel_view1 as a1
where daniel_view1 is
create view daniel_view1 as
select x1.y1, x1.y2, x1.y3 ...
from tb1, tb2, tb3, tb4, tb5 ...
left join ...
where ...
group by ...
That will already make it look cleaner. Then compare the views. Can any be merged together? You won't necessarily end up keeping the views in the final product, but they will help see the broader pattern without drowning in detail.
Alternatively, you could insert the subquery into a temp table
insert #daniel_work1
select x1.y1, x1.y2, x1.y3 ...
from tb1, tb2, tb3, tb4, tb5 ...
left join ...
where ...
group by ...
Then replace the subquery with
select ... from #daniel_work1 as a1
The other thing you could do is to see if you can break it up into sequential steps.
If you see
select ... from ...
union all
select ... from ...
this could become
insert #steps
select 'step1', ...#1...
insert #steps
select 'step2', ...#2...
union is trickier because set union removes duplicate rows (rows where all of their columns are the same as another row).
By storing intermediate results in temp tables, you can look inside the query as it unfolded, and replay difficult steps. I have 'step_id' as the first column of all my debugging temp tables, so if it gets filled in stages, then you see what data applies to what stage.
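The staged temp table idea can be tried out on a small scale. Below is a sketch using Python's sqlite3 module; SQLite's `CREATE TEMP TABLE` replaces the `#steps` syntax used above, and the source tables `part_a` and `part_b` with their rows are invented stand-ins for the two halves of a `union all`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part_a (val INTEGER)")
conn.execute("CREATE TABLE part_b (val INTEGER)")
conn.executemany("INSERT INTO part_a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO part_b VALUES (?)", [(2,), (3,), (4,)])

# Debugging table: step_id is the first column, so each row records
# which stage of the original query produced it.
conn.execute("CREATE TEMP TABLE steps (step_id TEXT, val INTEGER)")
conn.execute("INSERT INTO steps SELECT 'step1', val FROM part_a")
conn.execute("INSERT INTO steps SELECT 'step2', val FROM part_b")

# Inspect how many rows each stage contributed.
counts = conn.execute(
    "SELECT step_id, COUNT(*) FROM steps GROUP BY step_id ORDER BY step_id"
).fetchall()

print(counts)
```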
There are a few tricks that give a clue about what is going on. If you see a table joined to itself like this:
select ... from mytable t1 inner join mytable t2 on t2.id < t1.id
it usually means they want a cross product of the table with itself, but without duplicates. You'll get keys 1 & 2 but not 2 & 1.
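A quick way to convince yourself of this is to run the self-join on three rows. This is a sketch with Python's sqlite3 module; the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(1,), (2,), (3,)])

# Each unordered pair of distinct ids appears exactly once, because
# the join condition only keeps the combination where t2.id is smaller.
pairs = conn.execute(
    """
    SELECT t2.id, t1.id
    FROM mytable t1
    INNER JOIN mytable t2 ON t2.id < t1.id
    ORDER BY t2.id, t1.id
    """
).fetchall()

print(pairs)
```

A full cross product of three rows would have nine combinations; the inequality keeps three of them and drops both the mirrored pairs and the rows paired with themselves.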

Sort column values to match order of values in another table column

Let's say I have table like this:
Column1 Column2
C 2
B 1
A 3
I need to exchange values in the second column to get this:
Column1 Column2
C 3
B 2
A 1
The goal is only for the numeric column to have its values sorted to follow the alphabetical order of the other column. The actual table has multiple columns; column 1 is people's names, while column 2 is the rank for rendering column 1 values in the UI.
What is the best way to do this?
I am doing this from C# code, on SQL Server, and have to use System.Data.SqlClient.SqlCommand because of a transaction. But maybe that's not important if this can all be done in SQL.
Thank you!
So you need to update Column2 with the row number according to Column1?
You can use ROW_NUMBER and a CTE:
WITH CTE AS
(
SELECT Column1, Column2, RN = ROW_NUMBER() OVER (ORDER BY Column1)
FROM MyTable
)
UPDATE CTE SET Column2 = RN;
This updates the table MyTable and works because the CTE selects a single table. If it contains more than one table you have to JOIN the UPDATE with the CTE.
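For readers without SQL Server at hand, the same renumbering can be tried in SQLite from Python. This is only a sketch: SQLite does not allow an UPDATE that targets a CTE the way the statement above does, so it uses a correlated subquery over the ROW_NUMBER result instead. It also assumes that Column1 values are unique and that the bundled SQLite is recent enough to support window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Column1 TEXT, Column2 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)", [("C", 2), ("B", 1), ("A", 3)]
)

# Number the rows by Column1, then copy each row's number into Column2.
# Matching on Column1 only works if Column1 is unique (assumption).
conn.execute(
    """
    UPDATE MyTable
    SET Column2 = (
        SELECT r.rn
        FROM (
            SELECT Column1, ROW_NUMBER() OVER (ORDER BY Column1) AS rn
            FROM MyTable
        ) AS r
        WHERE r.Column1 = MyTable.Column1
    )
    """
)

result = conn.execute(
    "SELECT Column1, Column2 FROM MyTable ORDER BY Column1"
).fetchall()

print(result)
```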

how to do nested SQL select count

I'm querying a system that won't allow using DISTINCT, so my alternative is to use a GROUP BY to get close to the result.
My desired query was meant to look like this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT(column3)) AS column3
FROM table
For the alternative, I think I'd need some type of nested query along the lines of this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(SELECT column FROM table GROUP BY column) AS column3
FROM table
But it didn't work. Am I close?
You are using the wrong syntax for COUNT(DISTINCT). The DISTINCT part is a keyword, not a function. Based on the docs, this ought to work:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT column3) AS column3
FROM table
Do, however, read the docs. BigQuery's implementation of COUNT(DISTINCT) is a bit unusual, apparently so as to scale better for big data. If you are trying to count a large number of distinct values then you may need to specify a second parameter (and you have an inherent scaling problem).
Update:
If you have a large number of distinct column3 values to count, and you want an exact count, then perhaps you can perform a join instead of putting a subquery in the select list (which BigQuery seems not to permit):
SELECT *
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
)
CROSS JOIN (
SELECT count(*) AS column3
FROM (
SELECT column3
FROM table
GROUP BY column3
)
)
Update 2:
Not that joining two one-row tables would be at all expensive, but @FelipeHoffa got me thinking more about this, and I realized I had missed a simpler solution:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(*) AS column3
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
GROUP BY column3
)
This one computes a subtotal of column1 and column2 values, grouping by column3, then counts and totals all the subtotal rows. It feels right.
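The equivalence can be checked on a few rows. The sketch below uses Python's sqlite3 module rather than BigQuery, with a hypothetical table named `t` (since `table` is a reserved word) and invented data, and compares the GROUP BY rewrite with a plain COUNT(DISTINCT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "x"), (2, 20, "x"), (3, 30, "y")],  # made-up sample rows
)

# Reference result using COUNT(DISTINCT ...).
direct = conn.execute(
    "SELECT SUM(column1), SUM(column2), COUNT(DISTINCT column3) FROM t"
).fetchone()

# Rewrite: subtotal per column3 value, then total and count the subtotals.
grouped = conn.execute(
    """
    SELECT SUM(column1), SUM(column2), COUNT(*)
    FROM (
        SELECT SUM(column1) AS column1, SUM(column2) AS column2
        FROM t
        GROUP BY column3
    )
    """
).fetchone()

print(direct, grouped)
```

One difference to keep in mind: COUNT(DISTINCT column3) ignores NULLs, while GROUP BY puts NULLs in a group of their own, so the two counts can differ by one when column3 contains NULL.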
FWIW, the way you are trying to use DISTINCT isn't how it's normally used, as it's meant to show unique rows, not unique values for one column in a dataset. GROUP BY is more in line with what I believe you are ultimately trying to accomplish.
Depending upon what you need, you could do one of a couple of things. Using your second query, you would need to modify your subquery to get a count, not the actual values, like:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM (SELECT column3 FROM table GROUP BY column3)) AS column3
FROM table
Alternatively, you could do a query off your initial query, something like this:
SELECT sum(column1), sum(column2), sum(column4) from (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
1 AS column4
FROM table GROUP BY column3)
GROUP BY column4
Edit: The above is generic SQL, not too familiar with Google Big Query
You can probably use a CTE:
WITH result AS (SELECT column3 FROM table GROUP BY column3)
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM result) AS column3
FROM table
Instead of doing a COUNT(DISTINCT), you can get the same results by running a GROUP BY first, and then counting results.
For example, the number of different words that Shakespeare used by year:
SELECT corpus_date, COUNT(word) different_words
FROM (
SELECT word, corpus_date
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date
As a bonus, let's add a column that identifies which books were written during each year:
SELECT corpus_date, COUNT(word) different_words, GROUP_CONCAT(UNIQUE(corpus)) books
FROM (
SELECT word, corpus_date, UNIQUE(corpus) corpus
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date

SQL Query to retrieve results from two equally designed tables

How can I query the results of two equally designed tables?
if table1 contains 1 column with data:
abc
def
hjj
and table2 contains 1 column with data:
uyy
iuu
pol
then i want my query to return
abc
def
hjj
uyy
iuu
pol
but I want to make sure that if I try to do the same task with multiple columns that the associations remain.
SELECT
Column1, Column2, Column3 FROM Table1
UNION
SELECT
Column1, Column2, Column5 AS Column3 FROM Table2
ORDER BY
Column1
Notice how I do an order by at the end and that Column5 in Table2 is the equivalent of Column3 in Table1. The Order By is of course optional, but allows you to control the order of items from both tables once they are combined.
Use a UNION
SELECT *
FROM TABLE_A
UNION
SELECT *
FROM TABLE_B
UNION gives you the distinct rows across both sets, whereas UNION ALL returns every row from both sets, including duplicates.
SELECT col FROM t1 UNION SELECT col FROM t2
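The difference between the two operators shows up as soon as both tables share a value. Here is a small sketch with Python's sqlite3 module; the rows are invented, with `def` deliberately present in both tables, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col TEXT)")
conn.execute("CREATE TABLE t2 (col TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("abc",), ("def",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("def",), ("pol",)])

# UNION removes the duplicate 'def'.
union_rows = [
    r[0]
    for r in conn.execute(
        "SELECT col FROM t1 UNION SELECT col FROM t2 ORDER BY col"
    )
]

# UNION ALL keeps every row from both tables.
union_all_rows = [
    r[0]
    for r in conn.execute(
        "SELECT col FROM t1 UNION ALL SELECT col FROM t2 ORDER BY col"
    )
]

print(union_rows)
print(union_all_rows)
```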
sev, since union is the solution to what you described and you say that it didn't work, perhaps you can provide the code you wrote that didn't work, as clearly we are missing part of the picture. Are you positive the second table has the records you want? How do you know for sure?