SQL: How to combine count results from multiple tables into multiple columns - sql

I have two tables that are the same in their structure, but all I really want out of them is a distinct count of a single column from each, returned as a multi-column result set.
I keep getting syntax errors, and so far I haven't been able to write something that both delivers the data I want and makes it through the parser. I'm trying to figure out if this is a SQL problem (possible given that I'm using a website's implementation, rather than native MySQL/SQL/Oracle) or a me problem (much more likely).
So what I want is two unrelated (and un-primary-keyed) tables to return a COUNT(DISTINCT column) into a single result. I have tried a couple of different approaches:
select 1,2 FROM
(SELECT COUNT(DISTINCT col1) as 1 from table1),
(SELECT COUNT(DISTINCT col2) as 2 from table2)
SELECT *
FROM (
SELECT COUNT(DISTINCT col1) AS 1
FROM table1
)
CROSS JOIN (
SELECT COUNT(DISTINCT col2) AS 2
FROM table2
)
I have also messed around with 4-5 different uses of union || union all to no avail. Curious what thoughts someone more schooled in the arts of SQL might have. Thanks.

Depending on which RDBMS you are using, some of the syntax may differ. In ANSI SQL your query should be:
select col1, col2 FROM
(SELECT COUNT(DISTINCT col1) as col1 from table1) as tab1,
(SELECT COUNT(DISTINCT col2) as col2 from table2) as tab2
You have to add an alias to every subquery. Also name your columns with words, not numbers; it is easier to understand. I don't recall whether ANSI SQL allows a bare number as an alias, though.
Without aliases for the subqueries, you can use scalar subqueries in the select list like this:
-- For MySQL, PostgreSQL, SQL Server (not sure though)
select (SELECT COUNT(DISTINCT col1)
from table1) as col1,
(SELECT COUNT(DISTINCT col2)
from table2) as col2
-- For Oracle
select (SELECT COUNT(DISTINCT col1)
from table1) as col1,
(SELECT COUNT(DISTINCT col2)
from table2) as col2
from dual
-- For DB2
select (SELECT COUNT(DISTINCT col1)
from table1) as col1,
(SELECT COUNT(DISTINCT col2)
from table2) as col2
from sysibm.sysdummy1
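Side note: you can use a number as an alias if you surround it with double quotes " (this is ANSI SQL and works in most databases; MySQL only treats double quotes as identifier quotes when the ANSI_QUOTES SQL mode is enabled, otherwise "1" is a string literal) like this: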
select "1", "2" FROM
(SELECT COUNT(DISTINCT col1) as "1" from table1) a, --don't forget the table alias
(SELECT COUNT(DISTINCT col2) as "2" from table2) b
MySQL also allows you to use backticks:
select `1`, `2` FROM
(SELECT COUNT(DISTINCT col1) as `1` from table1) a,
(SELECT COUNT(DISTINCT col2) as `2` from table2) b
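
To check the two shapes discussed above side by side, here is a minimal sketch that runs them against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for whichever engine the website uses, and the sample rows and the aliases cnt1, cnt2, t1 and t2 are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT);
    CREATE TABLE table2 (col2 TEXT);
    INSERT INTO table1 VALUES ('a'), ('a'), ('b');
    INSERT INTO table2 VALUES ('x'), ('y'), ('y'), ('z');
""")

# Variant 1: aliased derived tables combined with a cross join.
cross_join_sql = """
    SELECT t1.cnt1, t2.cnt2
    FROM (SELECT COUNT(DISTINCT col1) AS cnt1 FROM table1) AS t1
    CROSS JOIN (SELECT COUNT(DISTINCT col2) AS cnt2 FROM table2) AS t2
"""

# Variant 2: scalar subqueries in the select list, no derived-table aliases.
scalar_sql = """
    SELECT (SELECT COUNT(DISTINCT col1) FROM table1) AS cnt1,
           (SELECT COUNT(DISTINCT col2) FROM table2) AS cnt2
"""

cross_join_rows = conn.execute(cross_join_sql).fetchall()
scalar_rows = conn.execute(scalar_sql).fetchall()
print(cross_join_rows, scalar_rows)
```

Both variants return a single row with one count per column, because each subquery yields exactly one row.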

Related

SQL - Transposing rows from some columns in a table to each record in the same table

I am using a platform which accepts only a minimal set of SQL functions. The UNPIVOT function cannot be used on the platform, so I have to do this manually. I am thinking along the lines of UNION ALL and then a CROSS JOIN (which I attempted, but I ended up with the wrong record counts). Please see image attached.
Any help / pointer will be highly appreciated!
I don't know how you used UNION ALL but it can be done like this:
select col1, col2, col3 as NewCol from Table1
union all
select col1, col2, col4 from Table1
You could also use an ORDER BY clause, so that rows with the same col1 and col2 appear in subsequent rows:
select col1, col2, NewCol
from (
select col1, col2, col3 as NewCol, 1 as ord from Table1
union all
select col1, col2, col4, 2 from Table1
) t
order by col1, col2, ord
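
A quick way to confirm the record counts is to run the ordered UNION ALL on a tiny table. This is a sketch using Python's sqlite3 module with an in-memory database; the two sample rows are made up, and SQLite stands in for the asker's restricted platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col1 TEXT, col2 TEXT, col3 INTEGER, col4 INTEGER);
    INSERT INTO Table1 VALUES ('a', 'x', 10, 20), ('b', 'y', 30, 40);
""")

# Manual unpivot: one branch per source column, with ord keeping the
# col3 value ahead of the col4 value for the same (col1, col2) pair.
unpivot_sql = """
    SELECT col1, col2, NewCol
    FROM (
        SELECT col1, col2, col3 AS NewCol, 1 AS ord FROM Table1
        UNION ALL
        SELECT col1, col2, col4, 2 FROM Table1
    ) t
    ORDER BY col1, col2, ord
"""
unpivoted = conn.execute(unpivot_sql).fetchall()
for row in unpivoted:
    print(row)
```

Each source row produces exactly one output row per unpivoted column, so two rows and two columns give four rows.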
A portable approach uses union all:
select col1, col2, col3 as newcol from mytable
union all
select col1, col2, col4 from mytable
If your database supports lateral joins (also called cross apply in some databases) and values(), this can be simplified:
select t.col1, t.col2, x.newcol
from mytable t
cross join lateral (values(col3), (col4)) x(newcol)
You can use a cross join, but it requires some case logic. The exact syntax depends on the database, but something like this:
select t.col1, t.col2,
(case when n.n = 1 then t.col3 else t.col4 end) as newcol
from t cross join
(select 1 as n union all select 2) n;
To load another table, you would do one of the following:
Insert these results into a table that has already been created.
Use select into or create table as, depending on the database.
If you care about the ordering, then you can add order by t.col1, t.col2, n.n.
In most cases, a simple union all approach is fine (such as GMB suggests). That approach requires scanning the table twice, which incurs some additional overhead. However, if the "table" is really a complex query or view, then only processing it once is a bigger advantage.
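
As a sanity check that the cross join with CASE logic produces the same rows as the union all version, here is a small sketch using Python's sqlite3 module. The table t and its two sample rows are assumptions for illustration, and SQLite is only a stand-in for the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 TEXT, col2 TEXT, col3 INTEGER, col4 INTEGER);
    INSERT INTO t VALUES ('a', 'x', 10, 20), ('b', 'y', 30, 40);
""")

# Single scan of t: every row is paired with n = 1 and n = 2,
# and CASE picks col3 or col4 accordingly.
cross_join_sql = """
    SELECT t.col1, t.col2,
           CASE WHEN n.n = 1 THEN t.col3 ELSE t.col4 END AS newcol
    FROM t
    CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS n
    ORDER BY t.col1, t.col2, n.n
"""

# Two scans of t, one per unpivoted column.
union_sql = """
    SELECT col1, col2, col3 AS newcol FROM t
    UNION ALL
    SELECT col1, col2, col4 FROM t
"""

cross_rows = conn.execute(cross_join_sql).fetchall()
union_rows = conn.execute(union_sql).fetchall()
print(cross_rows)
```

The cross join version reads t once, which is the advantage mentioned above when t is really an expensive query or view.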

BigQuery: Use COUNT as LIMIT

I want to select everything from mytable1 and combine that with just as many rows from mytable2. In my case mytable1 always has fewer rows than mytable2 and I want the final table to be a 50-50 mix of data from each table. While I feel like the following code expresses what I want logically, it doesn't work syntax wise:
Syntax error: Expected "#" or integer literal or keyword CAST but got
"(" at [3:1]
(SELECT * FROM `mytable1`)
UNION ALL (
SELECT * FROM `mytable2`
LIMIT (SELECT COUNT(*) FROM`mytable1`)
)
Using standard SQL in BigQuery
The docs state that the LIMIT clause accepts only literal or parameter values. I think you can number the rows from the second table with ROW_NUMBER() and limit based on that:
SELECT col1, col2, col3
FROM mytable1
UNION ALL
SELECT col1, col2, col3
FROM (
SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
FROM mytable2
) AS x
WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
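
The same pattern can be tried outside BigQuery. Below is a sketch that runs it on an in-memory SQLite database via Python's sqlite3 module; it assumes a SQLite build with window function support (3.25 or later), and the src column and the sample values are additions made only so the result can be inspected.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable1 (col1 INTEGER)")
conn.execute("CREATE TABLE mytable2 (col1 INTEGER)")
conn.executemany("INSERT INTO mytable1 VALUES (?)", [(i,) for i in range(1, 4)])
conn.executemany("INSERT INTO mytable2 VALUES (?)", [(i,) for i in range(101, 111)])

# Number the rows of mytable2 and keep only as many as mytable1 contains.
# OVER () has no ORDER BY, so which mytable2 rows survive is arbitrary.
mix_sql = """
    SELECT 'mytable1' AS src, col1 FROM mytable1
    UNION ALL
    SELECT 'mytable2' AS src, col1
    FROM (
        SELECT col1, ROW_NUMBER() OVER () AS rn
        FROM mytable2
    ) AS x
    WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
"""
rows = conn.execute(mix_sql).fetchall()
counts = Counter(src for src, _ in rows)
print(dict(counts))
```

If a particular subset of mytable2 is wanted (for example a random sample), an ORDER BY inside the OVER clause would be the place to express it.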
Each SELECT statement within a UNION must have the same number of columns.
The columns must also have similar data types.
The columns in each SELECT statement must also be in the same order.
If mytable1 has fewer columns than mytable2, you have to select the same number of columns from each, padding the missing ones:
select col1,col2,col3,'' as col4 from mytable1 -- pad the missing column with an aliased constant
union all
select col1,col2,col3,col4 from mytable2

Most efficient way to find distinct records, retaining unique ID

I have a large dataset stored in a SQL Server table, with one unique ID and many attributes. I need to select the distinct attribute records, along with one of the unique IDs associated with each unique combination.
Example dataset:
ID|Col1|Col2|Col3...
1|big|blue|ball
2|big|red|ball
3|big|blue|ball
4|small|red|ball
Example Goal (2,3,4 would also have been acceptable) :
ID|Col1|Col2|Col3...
1|big|blue|ball
2|big|red|ball
4|small|red|ball
I have tried a few different methods, but all of them seem to be taking very long (hours), so I was wondering if there was a more efficient approach. Failing this, my next idea is to partition the table.
I have tried:
Using Where exists, e.g.
SELECT * from Table as T1
where exists (select *
from table as T2
where
ISNULL(T1.ID,'') <> ISNULL(T2.ID,'')
AND ISNULL([T1].[Col1],'') = ISNULL([T2].[Col1],'')
AND ISNULL([T1].[Col2],'') = ISNULL([T2].[Col2],'')
)
MAX(ID) and Group By Attributes.
GROUP BY Attributes, having count > 1.
How about just using group by?
select min(id), col1, col2, col3
from t
group by col1, col2, col3;
This will probably take a while. This might be more efficient:
select t.*
from t
where t.id = (select min(t2.id)
from t t2
where t.col1 = t2.col1 and t.col2 = t2.col2 and . . .
);
This requires an index on t(col1, col2, col3, . . ., id). Given your request, that is on all columns.
In addition, this will not work for columns that are NULL. Some databases support the ANSI standard is not distinct from for null-safe comparisons. If yours does, then it should use the index for this construct as well.
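
To make the NULL caveat concrete, here is a sketch using Python's sqlite3 module. It assumes SQLite as a stand-in: SQLite's IS operator behaves as a null-safe equality and plays the role of is not distinct from here. The table name t matches the answer, while rows 5 and 6 (with a NULL col2) are made-up additions to the example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO t VALUES
        (1, 'big', 'blue', 'ball'),
        (2, 'big', 'red', 'ball'),
        (3, 'big', 'blue', 'ball'),
        (4, 'small', 'red', 'ball'),
        (5, 'big', NULL, 'ball'),
        (6, 'big', NULL, 'ball');
""")

# GROUP BY treats NULLs as one group, so the NULL rows collapse to id 5.
group_by_ids = sorted(
    r[0] for r in conn.execute("SELECT MIN(id) FROM t GROUP BY col1, col2, col3")
)

# Correlated subquery with '=': NULL = NULL is unknown, so rows 5 and 6 vanish.
equals_ids = sorted(r[0] for r in conn.execute("""
    SELECT t.id FROM t
    WHERE t.id = (SELECT MIN(t2.id) FROM t AS t2
                  WHERE t.col1 = t2.col1 AND t.col2 = t2.col2 AND t.col3 = t2.col3)
"""))

# Same query with the null-safe IS operator: the NULL rows are kept.
null_safe_ids = sorted(r[0] for r in conn.execute("""
    SELECT t.id FROM t
    WHERE t.id = (SELECT MIN(t2.id) FROM t AS t2
                  WHERE t.col1 IS t2.col1 AND t.col2 IS t2.col2 AND t.col3 IS t2.col3)
"""))
print(group_by_ids, equals_ids, null_safe_ids)
```

Whether the null-safe form can still use the index depends on the engine, as noted above.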
SELECT Id,Col1,Col2,Col3 FROM (
SELECT Id,Col1,Col2,Col3,ROW_NUMBER() OVER (Partition By Col1,Col2,Col3 Order By ID,Col1,Col2,Col3) valid
from Table as T1) t
WHERE valid=1
Hope this helps...
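
For reference, the ROW_NUMBER() approach can be run against the example dataset from the question. This sketch uses Python's sqlite3 module and assumes a SQLite build with window functions (3.25 or later); the table name items is hypothetical, chosen to avoid the reserved word Table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO items VALUES
        (1, 'big', 'blue', 'ball'),
        (2, 'big', 'red', 'ball'),
        (3, 'big', 'blue', 'ball'),
        (4, 'small', 'red', 'ball');
""")

# Number the rows inside each (col1, col2, col3) group by ascending id
# and keep only the first row of every group.
dedupe_sql = """
    SELECT id, col1, col2, col3
    FROM (
        SELECT id, col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY id) AS rn
        FROM items
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
"""
distinct_rows = conn.execute(dedupe_sql).fetchall()
for row in distinct_rows:
    print(row)
```

Ordering by id alone inside the window is enough here, since the partition columns are identical within each group.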

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. To do this I have to use MAX() with a GROUP BY, which prevents me from retrieving the other columns from the row that held the max (because they are not contained in the GROUP BY or an aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there was an extra column- col3 and for the rows returned by the above query, I only wanted to return the one which was, say the minimum Col3 value- how would I do this?
If you are using SQL Server 2005 or later, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
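As I understand it, it can be something like this:
SELECT tab2.*
FROM (SELECT Col1, MAX(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN (SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
-- keep only the smallest Col3 among the rows tied on the max Col2
WHERE tab2.Col3 = (SELECT MIN(t3.Col3) FROM Table t3
WHERE t3.Col1 = tab2.Col1 AND t3.Col2 = tab2.Col2)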
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).
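
The question also asks how to pick the row with the minimum Col3 when several rows tie on the largest Col2. One way to express that with the same window function is to add Col3 as a second ORDER BY key. The sketch below assumes SQLite 3.25 or later via Python's sqlite3 module; the table name measurements and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    INSERT INTO measurements VALUES
        ('a', 5, 3), ('a', 5, 1), ('a', 5, 2), ('a', 4, 0),
        ('b', 7, 9), ('b', 6, 1);
""")

# Highest Col2 first; among rows tied on Col2, the lowest Col3 wins.
top_row_sql = """
    SELECT Col1, Col2, Col3
    FROM (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (
                   PARTITION BY Col1
                   ORDER BY Col2 DESC, Col3 ASC
               ) AS RowNumber
        FROM measurements
    ) AS T
    WHERE RowNumber = 1
    ORDER BY Col1
"""
top_rows = conn.execute(top_row_sql).fetchall()
print(top_rows)
```

Group 'a' has three rows with Col2 = 5, and only the one with Col3 = 1 is returned.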

Add Identity column to a view in SQL Server 2008

This is my view:
Create View [MyView] as
(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2
)
I need to add a new column named Id, and it has to be unique, so I am thinking of adding it as an identity column. I should mention that this view returns a large amount of data, so I need an approach that performs well. The view also combines two SELECT queries with UNION ALL, which I think might complicate things. What do you suggest?
Use the ROW_NUMBER() function in SQL Server 2008.
Create View [MyView] as
SELECT ROW_NUMBER() OVER( ORDER BY col1 ) AS id, col1, col2, col3
FROM(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2 ) AS MyResults
GO
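
To see the shape of such a view in action, here is a sketch that creates it in an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server 2008, and the sample rows are made up; col1 is unique across both tables here, so the numbering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col1 TEXT, col2 INTEGER, col3 INTEGER);
    CREATE TABLE Table2 (col1 TEXT, col2 INTEGER, col3 INTEGER);
    INSERT INTO Table1 VALUES ('a', 1, 1), ('c', 3, 3);
    INSERT INTO Table2 VALUES ('b', 2, 2), ('d', 4, 4);

    -- The id is computed every time the view is queried; it is not stored.
    CREATE VIEW MyView AS
    SELECT ROW_NUMBER() OVER (ORDER BY col1) AS id, col1, col2, col3
    FROM (
        SELECT col1, col2, col3 FROM Table1
        UNION ALL
        SELECT col1, col2, col3 FROM Table2
    ) AS MyResults;
""")

view_rows = conn.execute("SELECT id, col1 FROM MyView ORDER BY id").fetchall()
ids = [row[0] for row in view_rows]
print(view_rows)
```

The ids are unique within one execution of the view, but they are positions in the ordered result rather than stored keys, so they shift when rows are inserted or deleted.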
The view is just a stored query that does not contain the data itself, so you cannot add a stable ID to it. If you need an ID for other purposes, such as paging, you can do something like this:
create view MyView as
(
select row_number() over ( order by col1) as ID, col1 from (
Select col1 From Table1
Union All
Select col1 From Table2
) a
)
"There is no guarantee that the rows returned by a query using ROW_NUMBER() will be ordered exactly the same with each execution unless the following conditions are true:
Values of the partitioned column are unique. [partitions are parent-child, like a boss has 3 employees][ignore]
Values of the ORDER BY columns are unique. [if column 1 is unique, row_number should be stable]
Combinations of values of the partition column and ORDER BY columns are unique. [if you need 10 columns in your order by to get unique... go for it to make row_number stable]"
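There is a secondary issue here, with this being a view. ORDER BY does not always work in views: SQL Server only accepts it there together with TOP, and the row order is only guaranteed by an ORDER BY on the query that selects from the view. Ignoring the row_number() for a second: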
create view MyView as
(
select top 10000000 col1 -- or: top 99.9999999 percent
from (
Select col1 From Table1
Union All
Select col1 From Table2
) a order by col1
)
Using "row_number() over ( order by col1) as ID" is very expensive.
This way is much more efficient in cost:
Create View [MyView] as
(
Select ID = isnull(cast(newid() as varchar(40)), '')
, col1
, col2
, col3
From Table1
Union All
Select ID = isnull(cast(newid() as varchar(40)), '')
, col1
, col2
, col3
From Table2
)
Use ROW_NUMBER() with "order by (select null)"; this will be less expensive and will get your result.
Create View [MyView] as
SELECT ROW_NUMBER() over (order by (select null)) as id, *
FROM(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2 ) R
GO