SQL combine two query results

I can't use a UNION because it doesn't give the result I want, and I can't use a JOIN because there is no common column. I have tried many different SQL query structures and nothing works the way I want.
I need help to achieve what I believe is a really simple SQL query. What I am doing now is:
select a, b
from (select top 4 a from element_type order by c) as Y,
(SELECT * FROM (VALUES (NULL), (1), (2), (3)) AS X(b)) as Z
The first is part of a table and the second is a hand-created select. Run separately, they give results like this:
select a; --Give--> a,b,c,d (1 column)
select b; --Give--> 1,2,3,4 (1 column)
I need a query based on the first two that gives me (2 columns):
a,1
b,2
c,3
d,4
How can I do this? UNION, JOIN or something else? Or maybe it can't be done.
All I can get for now is this:
a,1
a,2
a,3
a,4
b,1
b,2
...

If you want to join two tables together purely on the order in which the rows appear, then I hope your database supports analytic (window) functions:
SELECT * FROM
(SELECT t.*, ROW_NUMBER() OVER(ORDER BY x) as rown FROM table1 t) t1
INNER JOIN
(SELECT t.*, ROW_NUMBER() OVER(ORDER BY x) as rown FROM table2 t) t2
ON t1.rown = t2.rown
Essentially we invent something to join them on by numbering the rows. If one of your tables already contains incrementing integers starting from 1, you don't need ROW_NUMBER() OVER() on that table, because it already has suitable data to join to; you just invent a column of incrementing numbers in the other table and then join the two together.
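A minimal sketch of the row-numbering join, run against Python's built-in sqlite3 module as a stand-in database (this assumes the bundled SQLite is 3.25 or newer, the first version with window functions). The tables letters and numbers are hypothetical stand-ins for the asker's two queries:

```python
import sqlite3

# In-memory database; "letters" and "numbers" are hypothetical stand-in tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE letters (a TEXT);
    CREATE TABLE numbers (b INTEGER);
    INSERT INTO letters VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO numbers VALUES (1), (2), (3), (4);
""")

# Number the rows on each side, then join on that invented row number.
rows = conn.execute("""
    SELECT t1.a, t2.b
    FROM (SELECT a, ROW_NUMBER() OVER (ORDER BY a) AS rown FROM letters) AS t1
    INNER JOIN (SELECT b, ROW_NUMBER() OVER (ORDER BY b) AS rown FROM numbers) AS t2
        ON t1.rown = t2.rown
    ORDER BY t1.rown
""").fetchall()

print(rows)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```

The outer ORDER BY only makes the output order predictable; which rows get paired is decided by the ORDER BY inside each OVER().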
Actually, even if it doesn't support analytics, there are ugly ways of numbering rows, such as joining the table back to itself on b.id <= a.id and using COUNT(*) ... GROUP BY a.id. I hate doing it, but if your DB doesn't support ROW_NUMBER, I'll post an example. :/
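For databases without ROW_NUMBER(), here is a sketch of that self-join numbering, again using Python's sqlite3 module as a stand-in. The letters table and its deliberately gappy ids are hypothetical, and the trick assumes id is unique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with non-consecutive ids.
conn.executescript("""
    CREATE TABLE letters (id INTEGER PRIMARY KEY, a TEXT);
    INSERT INTO letters (id, a) VALUES (10, 'a'), (20, 'b'), (35, 'c'), (40, 'd');
""")

# For each row x, count the rows y whose id is <= x.id. With unique ids this
# yields 1, 2, 3, ... in id order without using ROW_NUMBER().
rows = conn.execute("""
    SELECT x.id, x.a, COUNT(*) AS rown
    FROM letters AS x
    JOIN letters AS y ON y.id <= x.id
    GROUP BY x.id, x.a
    ORDER BY x.id
""").fetchall()

print(rows)  # [(10, 'a', 1), (20, 'b', 2), (35, 'c', 3), (40, 'd', 4)]
```

Every row is compared with every earlier row, so the work grows roughly with the square of the table size, which is part of why this approach is ugly.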
Bear in mind, of course, that RDBMSs have R in the name for a reason: related data is... well... related. They don't do so well when data is unrelated, so if your hope is to join the "chalks" table to the "cheese" table even though the two are completely unrelated, you're finding out now why it's hard work! :)

Try using row_number. I've created something that might help you. See below:
declare @tableChar table(letter varchar(1))
insert into @tableChar(letter)
select 'a';
insert into @tableChar(letter)
select 'b';
insert into @tableChar(letter)
select 'c';
insert into @tableChar(letter)
select 'd';
select letter, ROW_NUMBER() over(order by letter) as b from @tableChar

You can use row_number() to achieve this:
select a, row_number() over(order by a) as b from element_type;
As you are not taking the second column from another table, you do not need a join. But if you are doing this across different tables, you can use row_number() to create a key for both tables and join on those keys.
Hope it helps.

Related

SQL count(distinct) across both tables

I have 2 tables. Let's say Table A and Table B. Table A has a column called "name". Table B also has a column "name". I want to find out the count(distinct name). Name should take values from both the columns.
For example:
Table A
name
A
B
C
Table B
name
A
B
D
Output should be 4.
The general approach is to first combine the data the way you want in a subquery (here a CTE), and then dedupe or do the second step.
For example,
WITH COMBINED AS (
SELECT
name
FROM
TableA
UNION ALL
SELECT
name
FROM
TableB
)
SELECT
DISTINCT name
FROM
COMBINED
In your situation, the 2nd step can be accomplished by changing UNION ALL to a UNION. This will dedupe the values automatically. You won't even need a subquery or a 2nd step. But I wanted to teach you the concept because it comes up often.
SELECT name FROM TableA
UNION
SELECT name FROM TableB
The UNION in the CTE removes all duplicates,
so a plain COUNT(*) will suffice:
WITH CTE AS (
SELECT name FROM TableA
UNION
SELECT name FROM TableB
)
SELECT COUNT(*) FROM CTE
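A quick check of the UNION-then-COUNT(*) approach on the sample data from the question, using Python's sqlite3 module as a stand-in for the real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question.
conn.executescript("""
    CREATE TABLE TableA (name TEXT);
    CREATE TABLE TableB (name TEXT);
    INSERT INTO TableA VALUES ('A'), ('B'), ('C');
    INSERT INTO TableB VALUES ('A'), ('B'), ('D');
""")

# UNION (not UNION ALL) removes duplicates across both tables,
# so COUNT(*) over the CTE is the distinct count.
(total,) = conn.execute("""
    WITH CTE AS (
        SELECT name FROM TableA
        UNION
        SELECT name FROM TableB
    )
    SELECT COUNT(*) FROM CTE
""").fetchone()

print(total)  # 4
```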
I hope this query does it:
SELECT SUM(names) AS total_names
FROM (
SELECT COUNT(DISTINCT(name)) as names FROM TableA
UNION
SELECT COUNT(DISTINCT(name)) as names FROM TableB
) t;
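Note: tested with SQL Server. Be aware that summing per-table counts only gives the right answer when no name appears in both tables: shared names are counted once per table, and UNION (unlike UNION ALL) also collapses the two counts into a single row when they happen to be equal. With the example data both counts are 3, so this returns 3 rather than 4; with UNION ALL it would return 6.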
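The sketch below runs the summing query on the question's sample data, once with UNION and once with UNION ALL, using Python's sqlite3 module as a stand-in. Neither variant reproduces the expected 4, because 'A' and 'B' exist in both tables. The helper function total is hypothetical and exists only to run both variants:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question.
conn.executescript("""
    CREATE TABLE TableA (name TEXT);
    CREATE TABLE TableB (name TEXT);
    INSERT INTO TableA VALUES ('A'), ('B'), ('C');
    INSERT INTO TableB VALUES ('A'), ('B'), ('D');
""")

def total(set_op):
    # Hypothetical helper: sum the per-table distinct counts,
    # combined with the given set operator.
    sql = f"""
        SELECT SUM(names) FROM (
            SELECT COUNT(DISTINCT name) AS names FROM TableA
            {set_op}
            SELECT COUNT(DISTINCT name) AS names FROM TableB
        ) AS t
    """
    return conn.execute(sql).fetchone()[0]

with_union = total("UNION")          # both counts are 3; UNION keeps one row, so 3
with_union_all = total("UNION ALL")  # 3 + 3 = 6; 'A' and 'B' are counted twice
print(with_union, with_union_all)    # 3 6
```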
Yet another option, using BigQuery's approximate HyperLogLog++ functions:
select hll_count.merge(hll_sketch) names
from (
select hll_count.init(name) hll_sketch from tableA
union all
select hll_count.init(name) from tableB
)
HLL++ functions are approximate aggregate functions. Approximate aggregation typically requires less memory than exact aggregation functions, like COUNT(DISTINCT), but also introduces statistical error. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.
See more about benefits of using HyperLogLog++ functions

How to join in SQL-SERVER

I am trying to learn SQL-SERVER and I have created the below query:
WITH T AS
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, *
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;
I found that the only way to limit is with ROW_NUMBER().
However, when I try to run the join, I get this error:
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [8156] [S0001]: The column 'DIALOG_ID' was specified multiple times for 'T'.
The problem: In the WITH, you do SELECT * which gets all columns from both tables db and dbs. Both have a column DIALOG_ID, so a column by that name ends up twice in the result set of the WITH.
Up to this point everything is allowed, but it is not good practice: why have the same data twice?
Things go wrong when SQL Server has to determine what SELECT * FROM T means: it expands SELECT * to the actual columns of T, but it finds a duplicate column name, and then it refuses to continue.
The fix (and also highly recommended in general): be specific about the columns that you want to output. If T has no duplicate columns, then SELECT * FROM T will succeed.
Note that the even-more-pure variant is to also be specific about what columns you select from T. By doing that it becomes clear at a glance what the SELECT produces, instead of having to guess or investigate when you look at the query later on (or when someone else does).
The updated code would look like this (fill in your column names as we don't know them):
WITH T AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num,
d.DIALOG_ID, d.SOME_OTHER_COL,
ds.DS_ID, ds.SOME_OTHER_COL_2
FROM test.db AS d
INNER JOIN test.dbs AS ds ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT row_num, DIALOG_ID, SOME_OTHER_COL, DS_ID, SOME_OTHER_COL_2
FROM T
WHERE row_num <= 10;
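Alternatively, if you only need the columns from test.db, select d.* so that DIALOG_ID appears only once:
WITH T AS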
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, d.*
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;

Get minimum without using row number/window function in Bigquery

I have a table like the one shown below.
What I would like to do is get the minimum of each subject. Though I am able to do this with the row_number function, I would like to do it with a GROUP BY and MIN() approach, but it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
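The ROW_NUMBER approach from the question can be reproduced outside BigQuery. This sketch uses Python's sqlite3 module (SQLite 3.25+ assumed for window functions) with a hypothetical readings table and made-up values standing in for table A, and names the helper column rn rather than rank:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for "table A".
conn.executescript("""
    CREATE TABLE readings (subject_id INTEGER, id INTEGER, value REAL, time_1 TEXT);
    INSERT INTO readings VALUES
        (1, 100, 5.0, '2020-01-01 08:00'),
        (1, 100, 3.5, '2020-01-01 09:00'),
        (2, 200, 7.0, '2020-01-02 10:00'),
        (2, 200, 6.0, '2020-01-02 11:00');
""")

# Number the rows within each subject by ascending value and keep row 1,
# i.e. the whole row holding each subject's minimum value.
rows = conn.execute("""
    SELECT subject_id, id, value, time_1
    FROM (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rn
        FROM readings AS r
    )
    WHERE rn = 1
    ORDER BY subject_id
""").fetchall()

print(rows)
# [(1, 100, 3.5, '2020-01-01 09:00'), (2, 200, 6.0, '2020-01-02 11:00')]
```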
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, just the two columns (subject_id and id) are enough to group the items together; they differentiate the groups. So why am I not able to use the other columns in the SELECT clause? If I use the other columns, I may not get the expected output, because time_1 has different values.
I expect my output to look like the one shown below.
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After aggregating each group into an array, you want its first element. The .* expands the record referred to by a into its component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like the following?
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
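A sketch of this aggregate-then-join approach using Python's sqlite3 module, with a hypothetical readings table and made-up values. Unlike filtering on ROW_NUMBER() = 1, the join returns every row that ties for the minimum, which the sample data deliberately includes for subject 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; subject 2 has two rows tied at its minimum value.
conn.executescript("""
    CREATE TABLE readings (subject_id INTEGER, id INTEGER, value REAL, time_1 TEXT);
    INSERT INTO readings VALUES
        (1, 100, 5.0, '2020-01-01 08:00'),
        (1, 100, 3.5, '2020-01-01 09:00'),
        (2, 200, 6.0, '2020-01-02 10:00'),
        (2, 200, 6.0, '2020-01-02 11:00');
""")

# Aggregate first (MIN per subject), then join back to recover the full rows.
rows = conn.execute("""
    SELECT a.subject_id, a.id, a.value, a.time_1
    FROM readings AS a
    INNER JOIN (
        SELECT subject_id, MIN(value) AS value
        FROM readings
        GROUP BY subject_id
    ) AS b ON a.subject_id = b.subject_id AND a.value = b.value
    ORDER BY a.subject_id, a.time_1
""").fetchall()

print(rows)
# [(1, 100, 3.5, '2020-01-01 09:00'),
#  (2, 200, 6.0, '2020-01-02 10:00'),
#  (2, 200, 6.0, '2020-01-02 11:00')]
```

Whether returning all tied rows is desirable depends on the report; if exactly one row per subject is required, a tie-breaker is needed.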
If you do not need to select the time_1 column's value, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, if only the date matters, the best approach is to apply something like CAST(time_1 AS DATE) to your time column. This keeps only the date part and ignores the time of day. The query would be:
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Double-check that the CAST ... AS DATE syntax
-- in BigQuery is exactly as written here.
The following is for BigQuery Standard SQL and is the most efficient way to handle cases like the one in your question:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a "Resources exceeded" error.
Note: a self-join is also a very inefficient way of achieving your objective.
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id

Select every nth row with NHibernate

How would you implement a query that selects every nth row, with NHibernate QueryOver, HQL or Criteria?
Currently I use the following T-SQL query:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY Id) AS [Row]
FROM [TABLE_NAME]
) x WHERE (x.[Row] % 100) = 0
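The core of this T-SQL query (number the rows, keep those whose number is divisible by n) can be checked in isolation. This sketch uses Python's sqlite3 module as a stand-in, a hypothetical items table with ids 1 to 10, and n = 3 instead of 100; it does not show how to express the query in NHibernate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with ids 1..10.
conn.execute("CREATE TABLE items (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (Id) VALUES (?)", [(i,) for i in range(1, 11)])

n = 3  # keep every 3rd row (the original query uses 100)
ids = [row[0] for row in conn.execute("""
    SELECT Id FROM (
        SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS rn
        FROM items
    ) AS x
    WHERE x.rn % ? = 0
    ORDER BY Id
""", (n,))]

print(ids)  # [3, 6, 9]
```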
(Thanks to Marc Gravell)
Have you considered using an index (numbers) table in a join? What I mean is a table with as many rows as you think you will need, with an indexed integer column running from 1 to n. This can live in a master database, perhaps with a date column beside it; it's amazing how useful this method is. The query would then look like this:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY Id) AS [Row]
FROM [TABLE_NAME]
) x INNER JOIN [Index_Table] i ON i.Id*100=x.[Row]
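A sketch of the index-table join, again with Python's sqlite3 module as a stand-in. Index_Table mirrors the name in the answer, but its contents (1 to 100) and the items table are hypothetical, and the step is 3 instead of 100 so the result stays small:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data table with ids 1..10.
conn.execute("CREATE TABLE items (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (Id) VALUES (?)", [(i,) for i in range(1, 11)])

# Hypothetical index ("numbers") table holding 1..100; it only needs
# enough rows to cover (row count / step).
conn.execute("CREATE TABLE Index_Table (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Index_Table (Id) VALUES (?)", [(i,) for i in range(1, 101)])

step = 3  # the original answer uses 100
ids = [row[0] for row in conn.execute("""
    SELECT x.Id FROM (
        SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS rn
        FROM items
    ) AS x
    INNER JOIN Index_Table AS i ON i.Id * ? = x.rn
    ORDER BY x.Id
""", (step,))]

print(ids)  # [3, 6, 9]
```

The join keeps only row numbers that are exact multiples of the step, so it gives the same rows as the modulo filter while letting the index on Index_Table do the matching.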
Same as with LINQ to SQL (L2S): there's no easy way to do this without raw SQL, and the syntax would be DBMS-specific anyway.

Merge two unrelated views into a single view

Let's say I have in my first view (ClothingID, Shoes, Shirts)
and in the second view I have (ClothingID, Shoes, Shirts) HOWEVER
the data is completely unrelated; even the ID field is not related in any way.
I want them combined into a single view for reporting purposes.
So the third view (the one I'm trying to make) should look like this: (ClothingID, ClothingID2, Shoes, Shoes2, Shirts, Shirts2).
So there's no relation AT ALL; I'm just putting unrelated data side by side in the same view.
Any help would be greatly appreciated.
You want to combine the results, yet be able to tell the rows apart.
Duplicating all columns would be overkill. Instead, add a column with information about the source:
SELECT 'v1'::text AS source, clothingid, shoes, shirts
FROM view1
UNION ALL
SELECT 'v2'::text AS source, clothingid, shoes, shirts
FROM view2;
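The stacking approach in runnable form, using Python's sqlite3 module. SQLite has no ::text cast, so the source labels are plain string literals, and view1/view2 are hypothetical tables with made-up rows standing in for the two views:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the two unrelated views.
conn.executescript("""
    CREATE TABLE view1 (clothingid INTEGER, shoes TEXT, shirts TEXT);
    CREATE TABLE view2 (clothingid INTEGER, shoes TEXT, shirts TEXT);
    INSERT INTO view1 VALUES (1, 'boots', 'polo');
    INSERT INTO view2 VALUES (1, 'sandals', 'tee'), (7, 'loafers', 'oxford');
""")

# Stack the rows and tag each with its origin; no join key is needed.
rows = conn.execute("""
    SELECT 'v1' AS source, clothingid, shoes, shirts FROM view1
    UNION ALL
    SELECT 'v2' AS source, clothingid, shoes, shirts FROM view2
    ORDER BY source, clothingid
""").fetchall()

print(rows)
# [('v1', 1, 'boots', 'polo'), ('v2', 1, 'sandals', 'tee'), ('v2', 7, 'loafers', 'oxford')]
```

Note that the same clothingid (1) can appear under both sources; the source column is what keeps those rows distinguishable.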
select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_1
) v1
full outer join (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_2
) v2 on v1.row = v2.row
I think a full outer join on the new, otherwise meaningless column row will do the job.
row_number() exists in PostgreSQL 8.4 and above.
If you have an older version, you can imitate row_number as in the example below. It only works if ClothingID is unique within each view.
select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
) v1
full outer join (
select *, (select count(*) from view_2 t2
where t2.ClothingID <= t.ClothingID) as row
from view_2 t
) v2 on v1.row = v2.row
Added after comment:
I noticed and corrected a mistake in the preceding query.
I'll try to explain a bit. First of all, we have to add row numbers to both views so that there are no gaps in the IDs. This is a fairly simple way to do it:
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
This consists of two parts: a simple query selecting all rows (*):
select *
from view_1 t
and a correlated subquery (read more on Wikipedia):
(
select count(*)
from view_1 t1
where t1.ClothingID <= t.ClothingID
) as row
For each row of the outer query, this counts the rows that precede it, including the row itself. In other words, for each row in the view it counts all rows whose ClothingID is less than or equal to the current row's ClothingID. For a unique ClothingID (which I've assumed), this gives you a row numbering ordered by ClothingID.
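The correlated-subquery numbering can be tried on its own. This sketch uses Python's sqlite3 module and a hypothetical view_1 table with gappy ClothingID values; the helper column is called rn instead of row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for view_1, with gaps in ClothingID.
conn.executescript("""
    CREATE TABLE view_1 (ClothingID INTEGER, Shoes TEXT, Shirts TEXT);
    INSERT INTO view_1 VALUES
        (4, 'boots', 'polo'), (9, 'sandals', 'tee'), (15, 'loafers', 'oxford');
""")

# For each outer row t, count the rows whose ClothingID <= t.ClothingID.
# With unique ClothingIDs this is a gap-free 1, 2, 3, ... numbering.
rows = conn.execute("""
    SELECT t.ClothingID,
           (SELECT COUNT(*) FROM view_1 AS t1
            WHERE t1.ClothingID <= t.ClothingID) AS rn
    FROM view_1 AS t
    ORDER BY t.ClothingID
""").fetchall()

print(rows)  # [(4, 1), (9, 2), (15, 3)]
```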
Live example on data.stackexchange.com - row numbering.
After that we can use both subqueries with row numbers to join them (full outer join on Wikipedia), live example on data.stackexchange.com - merge two unrelated views.
You could use ROW_NUMBER as a join key, with two temp tables.
Something like:
SELECT ROW_NUMBER() OVER (ORDER BY Clothing_ID ASC) AS Row_ID, Clothing_ID, Shoes, Shirts
INTO #table1
FROM Table1;
SELECT ROW_NUMBER() OVER (ORDER BY Clothing_ID ASC) AS Row_ID, Clothing_ID, Shoes, Shirts
INTO #table2
FROM Table2;
SELECT t1.Clothing_ID, t2.Clothing_ID AS Clothing_ID2, t1.Shoes, t2.Shoes AS Shoes2,
t1.Shirts, t2.Shirts AS Shirts2
FROM #table1 t1
FULL OUTER JOIN #table2 t2 ON t1.Row_ID = t2.Row_ID;
I think that should be roughly sensible. Make sure you are using the correct join so that the full output of both queries appears.
If the views are unrelated, SQL will struggle to deal with it. You can do it, but there's a better and simpler way...
I suggest merging them one after the other, rather than side by side as you have suggested, i.e., a union rather than a join:
select 'view1' as source, ClothingID, Shoes, Shirts
from view1
union all
select 'view2', ClothingID, Shoes, Shirts
from view2
This would be the usual approach for this kind of situation, and is simple to code and understand.
Note the use of UNION ALL, which keeps every row and does not remove duplicates, as opposed to UNION, which removes duplicates (typically by sorting or hashing, which costs extra work). Neither guarantees the order of the output rows unless you add an ORDER BY.
Edited
Added a column indicating which view the row came from.
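You can try the following (PostgreSQL, where an unaliased row_number() column is named row_number, which is what using(row_number) relies on; note that over() without an ORDER BY numbers the rows in no guaranteed order):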
SELECT *
FROM (SELECT row_number() over(), * FROM table1) t1
FULL JOIN (SELECT row_number() over(), * FROM table2) t2 using(row_number)