Merge two unrelated views into a single view - sql

Let's say I have in my first view (ClothingID, Shoes, Shirts)
and in the second view I have (ClothingID, Shoes, Shirts) HOWEVER
the data is completely unrelated, even the ID field is not related in anyway.
I want them combined into 1 single view for reporting purposes.
so the 3rd view (the one I'm trying to make) should look like this: (ClothingID, ClothingID2, Shoes, Shoes2, Shirts, Shirts2)
so there's no relation AT ALL, I'm just putting them side by side, unrelated data into the same view.
Any help would be strongly appreciated

You want to combine the results, yet be able to tell the rows apart.
To duplicate all columns would be a bit of an overkill. Add a column with info about the source:
SELECT 'v1'::text AS source, clothingid, shoes, shirts
FROM view1
UNION ALL
SELECT 'v2'::text AS source, clothingid, shoes, shirts
FROM view2;

select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_1
) v1
full outer join (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_2
) v2 on v1.row = v2.row
I think that full outer join that joins table using new unrelated column row will do the job.
row_number() exists in PostgreSQL 8.4 and above.
If you have lower version you can imitate row_number, example below. It's going to work only if ClothingID is unique in a scope of view.
select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
) v1
full outer join (
select *, (select count(*) from view_2 t2
where t2.ClothingID <= t.ClothingID) as row
from view_2 t
) v2 on v1.row = v2.row
Added after comment:
I've noticed and corrected mistake in preceding query.
I'll try to explain a bit. First of all we'll have to add a row numbers to both views to make sure that there are no gaps in id's. This is quite simple way:
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
This consist of two things, simple query selecting rows(*):
select *
from view_1 t
and correlated subquery (read more on wikipedia):
(
select count(*)
from view_1 t1
where t1.ClothingID <= t.ClothingID
) as row
This counts for each row of outer query (here it's (*)) preceding rows including self. So you might say count all rows which have ClothingID less or equal like current row for each row in view. For unique ClothingID (that I've assumed) it gives you row numbering (ordered by ClothingID).
Live example on data.stackexchange.com - row numbering.
After that we can use both subqueries with row numbers to join them (full outer join on Wikipedia), live example on data.stackexchange.com - merge two unrelated views.

You could use Rownumber as a join parameter, and 2 temp tables?
So something like:
Insert #table1
SELECT ROW_NUMBER() OVER (ORDER BY t1.Clothing_ID ASC) [Row_ID], Clothing_ID, Shoes, Shirts)
FROM Table1
Insert #table2
SELECT ROW_NUMBER() OVER (ORDER BY t1.Clothing_ID ASC)[RowID], Clothing_ID, Shoes, Shirts)
FROM Table2
Select t1.Clothing_ID, t2.Clothing_ID,t1.Shoes,t2.Shoes, t1.Shirts,t2.Shirts
from #table1 t1
JOIN atable2 t2 on t1.Row_ID = t2.Row_ID
I think that should be roughly sensible. Make sure you are using the correct join so the full output for both queries appear
e;fb

If the views are unrelated, SQL will struggle to deal with it. You can do it, but there's a better and simpler way...
I suggest merging them one after the other, rather than side-by-side as you have suggested, ie a union rather than a join:
select 'view1' as source, ClothingID, Shoes, Shirts
from view1
union all
select 'view2', ClothingID, Shoes, Shirts
from view2
This would be the usual approach for this kind of situation, and is simple to code and understand.
Note the use of UNION ALL, which preserves row order as selected and does not remove duplicates, as opposed to UNION, which sorts the rows and removes duplicates.
Edited
Added a column indicating which view the row came from.

You can try following:
SELECT *
FROM (SELECT row_number() over(), * FROM table1) t1
FULL JOIN (SELECT row_number() over(), * FROM table2) t2 using(row_number)

Related

Performance for multiple partitioned subqueries

I have a database with one main table and multiple history/log tables that stores the evolution over time of properties of some rows of main table. These properties are not stored on the main table itself, but must be queried from the relevant history/log table. All these tables are big (on the order of gigabytes).
I want to dump the whole main table and join the last entry of all the history/log tables.
Currently I do it via subqueries as follows:
WITH
foo AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY date DESC) AS rownumber,
...
FROM table1),
bar AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY date DESC) AS rownumber,
...
FROM table2)
SELECT
...
FROM maintable mt
JOIN foo foo ON foo.itemid = mt.itemid AND foo.rownumber = 1
JOIN bar bar ON foo.itemid = mt.itemid AND bar.rownumber = 1
WHERE ...
The problem is that this is very slow. Is there a faster solution to this problem?
I am only allowed to perform read-only queries on this database: I can not make any changes to it.
In actual Oracle versions it's usually better to use laterals/CROSS APPLY, because CBO (oracle cost-based optimizer) can transform them (DCL - lateral view decorrelation transformation) and use optimal join method depending on your circumstances/conditions (table statistics, cardinality, etc).
So it would be something like this:
SELECT
...
FROM maintable mt
CROSS APPLY (
SELECT *
FROM table1
WHERE table1.itemid = mt.itemid
ORDER BY date DESC
fetch first 1 row only
)
CROSS APPLY (
SELECT *
FROM table2
WHERE table2.itemid = mt.itemid
ORDER BY date DESC
fetch first 1 row only
)
WHERE ...
PS. You haven't specified your oracle version, so my answer is for Oracle 12+

Joining data from two sources using bigquery

Can anyone please check whether below code is correct? In cte_1, I’m taking all dimensions and metrics from t1 excpet value1, value2, value3. In cte_2, I’m finding the unique row number for t2. In cte_3, I’m taking all distinct dimensions and metrics using join on two keys such as Date, and Ad. In cte_4, I’m taking the values for only row number 1. I’m getting sum(value1),sum(value2),sum(value3) correct ,but sum(value4) is incorrect
WITH cte_1 AS
(SELECT *except(value1, value2, value3) FROM t1 where Date >"2020-02-16" and Publisher ="fb")
-- Find unique row number from t2--
,cte_2 as(
SELECT ROW_NUMBER() OVER(ORDER BY Date) distinct_row_number, * FROM t2
,cte_3 as
(SELECT cte_2.*,cte_1.*except(Date) FROM cte_2 join cte_1
on cte_2.Date = cte_1. Date
and cte_2.Ad= cte_1.Ad))
,cte_4 AS (
(SELECT *
FROM
(
SELECT *,
row_number() OVER (PARTITION BY distinct_row_number ORDER BY Date) as rn
FROM cte_3 ) T
where rn = 1 ))
select sum(value1),sum(value2),sum(value3),sum(value4) from cte_4
Please see the sample table below:
Whilst your data does not seem compliant with the query you shared, since it is lacking the field named Ad and other fields have different names, such as Date and ReportDate, I was able to identify some issues and propose improvements.
First, within your temp table cte_1, you are only using a filter in the WHERE clause, you could use it within your from statement in your last step, such as :
SELECT * FROM (SELECT field1,field2,field3 FROM t1 WHERE Date > DATE(2020,02,16) )
Second, in cte_2, you need to select all the columns you will need from the table t2. Otherwise, your table will have only the row number and it won't be possible to join it with other tables, once it does not provide any other information. Thus, if you need the row number, you select it together with the other columns, which it has to include your primary key if you will perform any join in the future. The syntax would be as follows:
SELECT field1, field2, ROW_NUMBER() OVER(ORDER BY Date) FROM t2
Third, in cte_3, I assume you want to perform an INNER JOIN. Thus, you need to make sure that the primary keys are present in both tables, in your case Date and Ad, which I could not find within your data. Furthermore, you can not have duplicated names when joining two tables and selecting all the columns. For example, in your case you have Brand, value 1, value 2 and value 3 in both tables, it will cause an error. Thus, you need to specify where these fields should come from by selecting one by one or the using a EXCEPT clause.
Finally, in cte_4 and your final select could be together in one step. Basically, you are selecting only one row of data ordered by Date. Then summing the fields value 1, value 2 and value 3 individually based on the partition by date. Moreover, you are not selecting any identifier for the sum, which means that your table will have only the final sums. In general, when peforming a aggregation, such as SUM(), the primary key(s) is selected as well. Lastly, this step could have been performed in one step such as follows, using only the data from t2:
SELECT ReportDate, Brand, sum(value1) as sum_1,sum(value2) as sum_1,sum(value3) as sum_1, sum(value4) as sum_1 FROM (SELECT t2.*, ROW_NUMBER() OVER(PARTITION BY Date ORDER BY Date) as rn t2)
WHERE rn=1
GROUP BY ReportDate, Brand
UPDATE:
With your explanation in the comment section. I was able to created a more specific query. The fields ReportDate,Brand,Portfolio,Campaign and value1,value2,value3 are from t2. Whilst value4 is from t1. The sum is made based on the row number equals to 1. For this reason, the tables t1 and t2 are joined before being using ROW_NUMBER(). Finally, in the last Select statement rn is not selected and the data is aggregated based on ReportDate, Brand, Portfolio and t2.Campaign.
WITH cte_1 AS (
SELECT t2.ReportDate, t2.Brand, t2.Portfolio, t2.Campaign,
t2.value1, t2.value2, t2.value3, t1.value4
FROM t2 LEFT JOIN t1 on t2.ReportDate = t1.ReportDate and t1.placement=t2.Ad
),
cte_2 AS(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Date ORDER BY ReportDate) as rn FROM cte_1
)
SELECT ReportDate, Brand, Portfolio, Campaign, SUM(value1) as sum1, SUM(value2) as sum2, SUM(value3) as sum3,
SUM(value4) as sum4
FROM cte_2
WHERE rn=1
GROUP BY 1,2,3,4

How do we find frequency of one column based off two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads to solve my question. I have three columns in a table, columns being name, customer, and location. I'd like to add an additional column determining which location is most frequent, based off name and customer (first two columns).
I have included a photo of an example where name-Jane customer-BEC in my created column would be "Texas" as that has 2 occurrences as opposed to one for California. Would there be anyway to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Here is a db<>fiddle.
Both of these version put 'Texas' in all four rows. However, each can be tweaks with minimal effort to put 'California' in the row for ARC.
In Oracle, you can use aggregate function stats_mode() to compute the most occuring value in a group.
Unfortunately it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s where s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
This is more a question about understanding the concepts of a relational database. If you want that information, you would not put that in an additional column. It is calculated data over multiple columns - why would you store that in the table itself ? It is complex to code and it would also be very expensive for the database (imagine all the rows you have to calculate that value for if someone inserted a million rows)
Instead you can do one of the following
Calculate it at runtime, as shown in the other answers
if you want to make it more persisent, you could embed that query above in a view
if you want to physically store the info, you could use a materialized view
Plenty of documentation on those 3 options in the official oracle documentation
Your first step is to construct a query that determines the most frequent location, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
This isn't immediately useful, but the logic can be used in row_number(), which gives you a unique id for each row returned. In the query below, I'm ordering by count(*) in descending order so that the most frequent occurrence has the value 1.
Note that row_number() returns '1' to only one row.
So, now we have
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.

SQL combine two query results

I can't use a Union because it's not the result I want, and I can't use join because I haven't any common column. I have tried many different SQL query structures and nothing works as I want.
I need help to achieve what I believe is a really simple SQL query. What I am doing now is
select a, b
from (select top 4 a from element_type order by c) as Y,
(SELECT * FROM (VALUES (NULL), (1), (2), (3)) AS X(b)) as Z
The first is a part of a table and the second is a hand created select that gives results like this:
select a; --Give--> a,b,c,d (1 column)
select b; --Give--> 1,2,3,4 (1 column)
I need a query based on the two first that give me (2 column) :
a,1
b,2
c,3
d,4
How can i do this? UNION, JOIN or anything else? Or maybe I can't.
All I can get for now is this:
a,1
a,2
a,3
a,4
b,1
b,2
...
If you want to join two tables together purely on the order the rows appear, then I hope your database support analytic (window) functions:
SELECT * FROM
(SELECT t.*, ROW_NUMBER() OVER(ORDER BY x) as rown FROM table1 t) t1
INNER JOIN
(SELECT t.*, ROW_NUMBER() OVER(ORDER BY x) as rown FROM table2 t) t2
ON t1.rown = t2.rown
Essentially we invent something to join them on by numbering the rows. If one of your tables already contains incrementing integers from 1, you dont need to ROW_NUMBER() OVER() on that table, because it already has suitable data to join to; you just invent a fake column of incrementing nubmers in the other table and then join together
Actually, even if it doesn't support analytics, there are ugly ways of doing row numbering, such as joining the table back to itself using id < id and COUNT(*) .. GROUP BY id to number the rows. I hate doing it, but if your DB doesnt support ROW_NUMBER i'll post an example.. :/
Bear in mind, of course, that RDBMS have R in the name for a reason - related data is.. well.. related. They don't do so well when data is unrelated, so if your hope is to join the "chalks" table to the "cheese" table even though the two are completely unrelated, you're finding out now why it's hard work! :)
Try using row_number. I've created something that might help you. See below:
declare #tableChar table(letter varchar)
insert into #tableChar(letter)
select 'a';
insert into #tableChar(letter)
select 'b';
insert into #tableChar(letter)
select 'c';
insert into #tableChar(letter)
select 'd';
select letter,ROW_NUMBER() over(order by letter ) from #tableChar
You can user row_number() to achieve this,
select a,row_number() over(order by a) as b from element_type;
As you are not taking second part from other table, so you do not need to use join. But if you are doing this on different tables the you can use row_number() to create key for both the tables and bases on those keys, you can join.
Hope it will help.

Implement FIRST() in select and not in WHERE

I want to get first value in a field in Oracle when another corresponding field has max value.
Normally, we would do this using a query and a subquery. The subquery ordering by a field and the outer query with where rownum<=1.
But, I cannot do this because the table aliases persist only one level deep and this query is a part of another big query and I need to use some aliases from the outermost query.
Here's the query structure
select
(
select a --This should get first value of a after b's are sorted desc
from
(
select a,b from table1 where table1.ID=t2.ID order by b desc
)
where rownum<=1
)
) as "A",
ID
from
table2 t2
Now this is not gonna work because alias t2 wont be available at innermost query.
Real world analogy that comes to my mind is I have a table containing records for all employees of a company, their salaries(including past salaries) and the date from which the salary was effective. So, for each employee, there will multiple records. Now, I want to get latest salaries for all the employees.
With SQL server, I could have used SELECT TOP. But that's not available with Oracle and since where clauses execute before order by, I cannot use where rownum<=1 and order by in same query and expect correct results.
How do I do this?
Using your analogy of employees and their salaries, if I understand what you are trying to do, you could do something like this (haven't tested):
SELECT *
FROM (
SELECT employee_id,
salary,
effective_date,
ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY effective_date DESC) rowno
FROM employees
)
WHERE rowno=1
I would much rather see you connect the subquery up with a JOIN instead of embedding it in the SELECT. Cleaner SQL. Then you can use the windowing function that roartechs suggests.
Select t2.whatever, t1.a
From table2 t2
Inner Join (
Select tfirst.ID, tfirst.a
From (
Select ID, a,
ROW_NUMBER() Over (Partition BY ID ORDER BY b DESC) rownumber
FROM table1
) tfirst
WHERE tfirst.rownumber=1
) t1 on t2.ID=t1.ID