Right now my query is resulting data in below format.
dbid askid amid
================================
d1 m1 a1
I want to change the display as
SOURCE_ID DEST_ID
========================
m1 d1
a1 d1
We can't use union all, inner joins, cross joins because in real time it is very big query and interacts with n no.of multiple tables and it already have them many no.of
I tried using unpivot but every time it is storing column names also in to rows.
Sample Query which is resulting data right now:
select DEST_ID,SOURCE_ID from
(select
t1.id as dbid,
t2.mid as askid,
t3.m2idd as amid from
table1 t1, table2 t2, table3 t3 where
t1.actid = t2.senid
and t2.denid = t2.mkid
)
unpivot INCLUDE NULLS (SOURCE_ME_GUID FOR DEST_ME_GUID IN (amid, askid));
Use UNION ALL in combination with a WITH clause.
You got confused with your join criteria somehow. I suppose either mkid or denid resides in table3. Otherwise you'd be cross joining table3. You shouldn't use this 1980s join syntax anyway; it's been made redundant in 1992 for a reason.
with ids as
(
select t1.id as dbid, t2.mid as askid, t3.m2idd as amid
from table1 t1
join table2 t2 on t2.senid = t1.actid
join table3 t3 on t3.mkid = t2.denid
)
select askid as source_id, dbid as dest_id from ids
union all
select amid as source_id, dbid as dest_id from ids;
Solution for the above issue.
select dbid as DEST_ID, SOURCE_ID
from (
select 'd1' dbid,
cast('m1' as varchar2(200)) as askid,
cast('a1' as varchar2(200)) as amid
from dual
) unpivot INCLUDE NULLS (
SOURCE_ID FOR DEST_ID IN (amid, askid)
);
Related
I've tried to minify this problem as much as possible. I've got two tables which share some Id's (among other columns)
id id
---- ----
1 1
1 1
2 1
2
2
Firstly, I can get each table to resolve to a simple count of how many of each Id there is:
select id, count(*) from tbl1 group by id
select id, count(*) from tbl2 group by id
id | tbl1-count id | tbl2-count
--------------- ---------------
1 2 1 3
2 1 2 2
but then I'm at a loss, I'm trying to get the following output which shows the count from tbl2 for each id, divided by the count from tbl1 for the same id:
id | count of id in tbl2 / count of id in tbl1
==========
1 | 1.5
2 | 2
So far I've got this:
select tbl1.Id, tbl2.Id, count(*)
from tbl1
join tbl2 on tbl1.Id = tbl2.Id
group by tbl1.Id, tbl2.Id
which just gives me... well... something nowhere near what I need, to be honest! I was trying count(tbl1.Id), count(tbl2.Id) but get the same multiplied amount (because I'm joining I guess?) - I can't get the individual representations into individual columns where I can do the division.
This gives consideration to your naming of tables -- the query from tbl2 needs to be first so the results will include all records from tbl2. The LEFT JOIN will include all results from the first query, but only join those results that exist in tbl1. (Alternatively, you could use a FULL OUTER JOIN or UNION both results together in the first query.) I also added an IIF to give you an option if there are no records in tbl1 (dividing by null would produce null anyway, but you can do what you want).
Counts are cast as decimal so that the ratio will be returned as a decimal. You can adjust precision as required.
SELECT tb2.id, tb2.table2Count, tb1.table1Count,
IIF(ISNULL(tb1.table1Count, 0) != 0, tb2.table2Count / tb1.table1Count, null) AS ratio
FROM (
SELECT id, CAST(COUNT(1) AS DECIMAL(18, 5)) AS table2Count
FROM tbl2
GROUP BY id
) AS tb2
LEFT JOIN (
SELECT id, CAST(COUNT(1) AS DECIMAL(18, 5)) AS table1Count
FROM tbl1
GROUP BY id
) AS tb1 ON tb1.id = tb2.id
(A subqquery with a LEFT JOIN will allow the query optimizer to determine how to generate the results and will generally outperform a CROSS APPLY, as that executes a calculation for every record.)
Assuming your expected results are wrong, then this is how I would do it:
CREATE TABLE T1 (ID int);
CREATE TABLE T2 (ID int);
GO
INSERT INTO T1 VALUES(1),(1),(2);
INSERT INTO T2 VALUES(1),(1),(1),(2),(2);
GO
SELECT T1.ID AS OutID,
(T2.T2Count * 1.) / COUNT(T1.ID) AS OutCount --Might want a CONVERT to a smaller scale and precision decimal here
FROM T1
CROSS APPLY (SELECT T2.ID, COUNT(T2.ID) AS T2Count
FROM T2
WHERE T2.ID = T1.ID
GROUP BY T2.ID) T2
GROUP BY T1.ID,
T2.T2Count;
GO
DROP TABLE T1;
DROP TABLE T2;
You can aggregate in subqueries and then join:
select t1.id, t2.cnt * 1.0 / t1.cnt
from (select id, count(*) as cnt
from tbl1
group by id
) t1 join
(select id, count(*) as cnt
from tbl2
group by id
) t2
on t1.id = t2.id
I have two tables:
The first one has the colums "SomeValue" and "Timestamp". The other one has the columns "SomeOtherValue" and also "Timestamp".
What I need as an output is the following:
A table with the three Colums "SomeValue", "SomeOtherValue" and "Timestamp".
When a row in table 1 is like this: [2; 04/07/2017-20:05] and a row in table 2 is like that: [5; 04/07/2017-20:05], I want the combined output row to be [2; 5; 04/07/2017-20:05].
Until that point it would be easy done with a simple join, but I also need all other rows. So for example if we have a row in table 1 like [2; 04/07/2017-20:05] and no matching timestamp in table 2, the output should be like [2; ?; 04/07/2017-20:05]. The '?' stands for undefined or null. It would also be possible to not join two rows with the same timestamp but rather concating both tables, so that every row would have one empty cell with '?'.
I do realize that I didn't use correct Date/Time Format here in that example, but assume that it is used in the database.
I already tried using UNION ALL but it always removes one column.
For my use case it is not possible to query both tables independently. I really need both values in one row/object.
I hope someone can help me with this. Thank you!
What you are describing is a full outer join:
select t1.somevalue, t2.someothervalue, timestamp
from t1
full outer join t2 using (timestamp);
I don't know, however, whether SAP HANA supports the USING clause. Here is the same query with ON instead:
select
t1.somevalue,
t2.someothervalue,
coalesce(t1.timestamp, t2.timestamp) as timestamp
from t1
full outer join t2 on t2.timestamp = t1.timestamp;
Joining on a datetime stamp is not always going to be reliable unless you are setting a datetime variable and writing the value of that to both tables. It's probably not very efficient either.
That said, assuming you want all the results from table 1 and matching table 2 result if it exists then you need a left outer join
Select T1.[SomeValue]
, ISNULL(T2.[SomeOtherValue], '?')
, T1.[TimeStamp]
FROM Table1 T1
LEFT OUTER JOIN Table2 T2
ON T2.[TimeStamp] = T1.[TimeStamp]
Update based on comment from OP
If you need all rows from both tables then you could either do 2 queries as above but interchange T1 and T2 position then union the 2 queries.
SELECT T1.[TimeStamp]
, T1.[SomeValue]
, ISNULL(T2.[SomeOtherValue], '?')
FROM Table1 T1
LEFT OUTER JOIN Table2 T2
ON T2.[TimeStamp] = T2.[TimeStamp]
UNION
SELECT T2.[TimeStamp]
, T2.[SomeValue]
, ISNULL(T1.[SomeOtherValue], '?')
FROM Table2 T2
LEFT OUTER JOIN Table1 T1
ON T1.[TimeStamp] = T2.[TimeStamp]
;
Or you could insert the 1st query results into a table variable then add any missing from T2 rows into that table variable using a where not exists, then select the output.
DECLARE #TempTab TABLE
( [TimeStamp] [datetime] NOT NULL
, [SomeValue] [nvarchar] (MAX) -- or int if this is always an integer
, [SomeOtherValue] [nvarchar] (MAX) -- or int if this is always an integer
)
;
INSERT INTO #TempTab
( [TimeStamp]
, [SomeValue]
, [SomeOtherValue]
)
SELECT T1.[TimeStamp]
, T1.[SomeValue]
, ISNULL(T2.[SomeOtherValue], '?')
FROM Table1 T1
LEFT OUTER JOIN Table2 T2
ON T2.[TimeStamp] = T2.[TimeStamp]
;
INSERT INTO #TempTab
( [TimeStamp]
, [SomeValue]
, [SomeOtherValue]
)
SELECT T2.[TimeStamp]
, T2.[SomeValue]
, ISNULL(T1.[SomeOtherValue], '?')
FROM Table2 T2
LEFT OUTER JOIN Table1 T1
ON T1.[TimeStamp] = T2.[TimeStamp]
WHERE NOT EXISTS
( SELECT 1
FROM #TempTab T
WHERE T.[TimeStamp] = T2.[TimeStamp]
)
;
SELECT T.[TimeStamp]
, T.[SomeValue]
, T.[SomeOtherValue]
FROM #TempTab T
;
I have two tables, table1 and table2. Each with the same columns:
key, c1, c2, c3
I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE):
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3
And
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null
So my idea is that, if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query, and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this certainly let me know.
The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means that you effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1 you can do a full outer join. The following SQL assumes that all columns are NOT NULL:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */
If you want to check for duplicates and the tables have exactly the same structure and the tables do not have duplicates within them, then you can do:
select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from ((select t1.*, 1 as which from table1 t1) union all
(select t2.*, 2 as which from table2 t2)
) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2;
There are various ways that you can relax the conditions in the first paragraph, if necessary.
Note that this version also works when the columns have NULL values. These might be causing the problem with your data.
Well, the best way is calculate the hash sum of each table, and compare the sum of hash.
So no matter how many column are they, no matter what data type are they, as long as the two table has the same schema, you can use following query to do the comparison:
select sum(hash(*)) from t1;
select sum(hash(*)) from t2;
And you just need to compare the return values.
I used EXCEPT statement and it worked.
select * from Original_table
EXCEPT
select * from Revised_table
Will show us all the rows of the Original table that are not in the Revised table.
If your table is partitioned you will have to provide a partition predicate.
Fyi, partition values don't need to be provided if you use Presto and querying via SQL lab.
I would recommend you not using any JOINs to try to compare tables:
it is quite an expensive operations when tables are big (which is often the case in Hive)
it can give problems when some rows/"join keys" are repeated
(and it can also be unpractical when data are in different clusters/datacenters/clouds).
Instead, I think using a checksum approach and comparing the checksums of both tables is best.
I have developed a Python script that allows you to do easily such comparison, and see the differences in a webbrowser:
https://github.com/bolcom/hive_compared_bq
I hope that can help you!
First get count for both the tables C1 and C2. C1 and C2 should be equal. C1 and C2 can be obtained from the following query
select count(*) from table1
if C1 and C2 are not equal, then the tables are not identical.
2: Find distinct count for both the tables DC1 and DC2. DC1 and DC2 should be equal. Number of distinct records can be found using the following query:
select count(*) from (select distinct * from table1)
if DC1 and DC2 are not equal, the tables are not identical.
3: Now get the number of records obtained by performing a union on the 2 tables. Let it be U. Use the following query to get the number of records in a union of 2 tables:
SELECT count (*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2)
You can say that the data in the 2 tables is identical if distinct count for the 2 tables is equal to the number of records obtained by performing union of the 2 tables. ie DC1 = U and DC2 = U
another variant
select c1-c2 "different row counts"
, c1-c3 "mismatched rows"
from
( select count(*) c1 from table1)
,( select count(*) c2 from table2 )
,(select count(*) c3 from table1 t1, table2 t2
where t1.key= t2.key
and T1.c1=T2.c1 )
Try with WITH Clause:
With cnt as(
select count(*) cn1 from table1
)
select 'X' from dual,cnt where cnt.cn1 = (select count(*) from table2);
One easy solution is to do inner join. Let's suppose we have two hive tables namely table1 and table2. Both the table has same column namely col1, col2 and col3. The number of rows should also be same. Then the command would be as follows
**
select count(*) from table1
inner join table2
on table1.col1 = table2.col1
and table1.col2 = table2.col2
and table1.col3 = table2.col3 ;
**
If the output value is same as number of rows in table1 and table2 , then all the columns has same value, If however the output count is lesser than there are some data which are different.
Use a MINUS operator:
SELECT count(*) FROM
(SELECT t1.c1, t1.c2, t1.c3 from table1 t1
MINUS
SELECT t2.c1, t2.c2, t2.c3 from table2 t2)
I'm creating a joined view of two tables, but am getting unwanted duplicates from table2.
For example: table1 has 9000 records and I need the resulting view to contain exactly the same; table2 may have multiple records with the same FKID but I only want to return one record (random chosen is ok with my customer). I have the following code that works correctly, but performance is slower than desired (over 14 seconds).
SELECT
OBJECTID
, PKID
,(SELECT TOP (1) SUBDIVISIO
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS ProjectName
,(SELECT TOP (1) ASBUILT1
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS Asbuilt
FROM dbo.table1 AS t1
Is there a way to do something similar with joins to speed up performance?
I'm using SQL Server 2008 R2.
I got close with the following code (~.5 seconds), but 'Distinct' only filters out records when all columns are duplicate (rather than just the FKID).
SELECT
t1.OBJECTID
,t1.PKID
,t2.ProjectName
,t2.Asbuilt
FROM dbo.table1 AS t1
LEFT JOIN (SELECT
DISTINCT FKID
,ProjectName
,Asbuilt
FROM dbo.table2) t2
ON t1.PKID = t2.FKID
table examples
table1 table2
OID, PKID FKID, ProjectName, Asbuilt
1, id1 id1, P1, AB1
2, id2 id1, P5, AB5
3, id4 id2, P10, AB2
5, id5 id5, P4, AB4
In the above example returned records should be id5/P4/AB4, id2/P10/AB2, and (id1/P1/AB1 OR id1/P5/AB5)
My search came up with similar questions, but none that resolved my problem. link, link
Thanks in advance for your help. This is my first post so let me know if I've broken any rules.
This will give the results you requested and should have the best performance.
SELECT
OBJECTID
, PKID
, t2.SUBDIVISIO,
, t2.ASBUILT1
FROM dbo.table1 AS t1
OUTER APPLY (
SELECT TOP 1 *
FROM dbo.table2 AS t2
WHERE t1.PKID = t2.FKID
) AS t2
Your original query is producing arbitrary values for the two columns (the use of top with no order by). You can get the same effect with this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, min(ProjectName) as ProjectName, MIN(asBuilt) as AsBuilt
FROM dbo.table2
group by fkid
) t2
ON t1.PKID = t2.FKID
This version replaces the distinct with a group by.
To get a truly random row in SQL Server (which your syntax suggests you are using), try this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, ProjectName, AsBuilt,
ROW_NUMBER() over (PARTITION by fkid order by newid()) as seqnum
FROM dbo.table2
) t2
ON t1.PKID = t2.FKID and t2.seqnum = 1
This assumes version 2005 or greater.
If you want described result, you need to use INNER JOIN and following query will satisfy your need:
SELECT
t1.OID,
t1.PKID,
MAX(t2.ProjectName) AS ProjectName,
MAX(t2.Asbuilt) AS Asbuilt
FROM table1 t1
JOIN table2 t2 ON t1.PKID = t2.FKID
GROUP BY
t1.OID,
t1.PKID
If you want to see all rows from left table (table1) whether it has pair in right table or not, then use LEFT JOIN and same query will gave you desired result.
EDITED
This construction has good performance, and you dont need to use subqueries.
hi i am using SQL Server 2008, want to create a view to list the different between two tables.
for example
t1
id name
---- ------
1 John
2 peter
3 mary
4 joe
5 sun
t2
id name
--- ----
1 john
2 joe
how to create a view to list all the name in t1 but not in t2.
i tried to do that, and always get
"Conversion failed when converting from a character string to uniqueidentifier."
error
and also, i don`t want to use
select something not in {something}, this is slow
so is there any way can use join??
NOT IN
SELECT t1.name
FROM TABLE_1 t1
WHERE t1.name NOT IN (SELECT t2.name
FROM TABLE_2 t2)
NOT EXISTS
SELECT t1.name
FROM TABLE_1 t1
WHERE NOT EXISTS (SELECT NULL
FROM TABLE_2 t2
WHERE t2.name = t1.name)
LEFT JOIN/IS NULL:
SELECT t1.name
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Summary
Contrary to your belief, NOT IN will perform equivalent to NOT EXISTS. LEFT JOIN/IS NULL is the least efficient of the three options.
Here's a trick I use:
SELECT MAX(TABLE_NAME) AS TABLE_NAME
,[id]
,[name]
FROM (
SELECT 'T1' AS TABLE_NAME
,[id]
,[name]
FROM T1
UNION ALL
SELECT 'T2' AS TABLE_NAME
,[id]
,[name]
FROM T2
) AS X
GROUP BY [id]
,[name]
HAVING COUNT(*) = 1
ORDER BY [id]
,[name]
I actually have a universal proc I use which takes two tables and a bunch of parameters and does table compares generating dynamic SQL (the SQL above is actually cleaned up code-generated output of it), it uses either this UNION ALL trick or a combination of the three joins in OMG Ponies' answer depending on whether key columns/ignore columns/comare columns are specified.
SELECT name FROM TABLE_1
EXCEPT
SELECT name FROM TABLE_2