I have n tables, all with the same fields: Username and Value. The same Username can have multiple records in each table, but the combination Username/Value is unique within each one.
I want to combine the tables into a single one that contains only the users who appear in all the tables, with all their distinct (Username/Value) pairs.
Example
Table A: {(User1,Value1);(User1,Value2);(User2,Value2);(User3,Value4)}
Table B: {(User1,Value4);(User3,Value5)}
Table C: {(User1,Value5);(User1,Value2);(User2,Value7);(User3,Value8)}
Desired output
Table D: {(User1,Value1);(User1,Value2);(User1,Value4);(User1,Value5);(User3,Value4);(User3,Value5);(User3,Value8)}
Right now I'm doing it step by step (from Perl) like this:
SELECT *
INTO $target_table
FROM (SELECT *
FROM $table1
WHERE bname IN (SELECT DISTINCT bname FROM $table2)
UNION
SELECT *
FROM $table2
WHERE bname IN (SELECT DISTINCT bname FROM $table1)
) UN
and then I do the same between a third table and $target_table, and so on, but I think there should be a better way.
Any hints?
You can use UNION for this:
SELECT username, value
FROM $table1
UNION
SELECT username, value
FROM $table2
...
SELECT username, value
FROM $tablex
This returns distinct records. If you want to keep duplicates, use UNION ALL.
Given your edits, it appears you only want to return records if the user is in all the tables.
Breaking that down, you need to do a few things. First, combine all your records again, but this time tag each one with the table it comes from. Then count how many distinct tables each user appears in. Finally, compare that count with the total number of tables.
Here's one way using a few CTEs:
WITH CTE AS (
SELECT username, value, 1 AS tbl
FROM t1
UNION
SELECT username, value, 2 AS tbl
FROM t2
UNION
SELECT username, value, 3 AS tbl
FROM t3
),
CTECnt AS (
SELECT username, COUNT(DISTINCT tbl) tblCnt
FROM CTE
GROUP BY username
),
CTEMaxCnt AS (
SELECT COUNT(DISTINCT tbl) MaxCnt
FROM CTE
)
SELECT DISTINCT C.username, C.value
FROM CTE C
JOIN CTECnt C2 ON C.username = C2.username
JOIN CTEMaxCnt C3 ON C2.tblCnt = C3.MaxCnt
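To check the count-of-tables idea end to end, here is a minimal sketch that runs the CTE query above against the example data from the question. It assumes SQLite through Python's built-in sqlite3 module, with t1, t2 and t3 standing in for tables A, B and C. It uses SELECT DISTINCT in the final step because (User1, Value2) occurs in both t1 and t3 and would otherwise be returned twice.

```python
import sqlite3

# Example data from the question; t1, t2, t3 stand in for tables A, B, C.
TABLES = {
    "t1": [("User1", "Value1"), ("User1", "Value2"), ("User2", "Value2"), ("User3", "Value4")],
    "t2": [("User1", "Value4"), ("User3", "Value5")],
    "t3": [("User1", "Value5"), ("User1", "Value2"), ("User2", "Value7"), ("User3", "Value8")],
}

# Tag each row with its source table, count distinct tables per user,
# and keep only users whose count equals the number of non-empty tables.
QUERY = """
WITH CTE AS (
    SELECT username, value, 1 AS tbl FROM t1
    UNION
    SELECT username, value, 2 AS tbl FROM t2
    UNION
    SELECT username, value, 3 AS tbl FROM t3
),
CTECnt AS (
    SELECT username, COUNT(DISTINCT tbl) AS tblCnt FROM CTE GROUP BY username
),
CTEMaxCnt AS (
    SELECT COUNT(DISTINCT tbl) AS MaxCnt FROM CTE
)
SELECT DISTINCT C.username, C.value
FROM CTE C
JOIN CTECnt C2 ON C.username = C2.username
JOIN CTEMaxCnt C3 ON C2.tblCnt = C3.MaxCnt
ORDER BY C.username, C.value
"""


def users_in_all_tables():
    conn = sqlite3.connect(":memory:")
    for name, rows in TABLES.items():
        conn.execute(f"CREATE TABLE {name} (username TEXT, value TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in users_in_all_tables():
        print(row)
```

The rows it returns are the seven pairs listed as Table D. Note that MaxCnt only counts tables that contribute at least one row, so if a table can be empty, comparing against a hard-coded table count (as the next answer does with = 3) is the stricter check.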
With Combined As
(
Select 'A' As TableName, Username, Value
From TableA
Union All
Select 'B', Username, Value
From TableB
Union All
Select 'C', Username, Value
From TableC
)
Select C.Username, C.Value
From Combined As C
Join (
Select C1.Username
From Combined As C1
Group By C1.Username
Having Count(Distinct C1.TableName) = 3
) As Z
On Z.Username = C.Username
Group By C.Username, C.Value
Related
I need to use multiple table selections in a query in SQL. But how do I reference a table selected within a query?
For example (pseudocode):
create table C as
select distinct id, product_code
from (
select distinct id, product_code
from A where dt = '2019-06-01'
)
inner join B on (select distinct id, product_code
from A where dt='2019-06-01').id = B.id;
The code above might be wrong, but the point is that table A cannot be used directly because it is too large, so dt has to be restricted to a specific value (which is why I select from A twice above). And I need to inner join the smaller A' with another table B.
Is it possible to, say, "define" a table A_ = select distinct blabla...from A ... and then join A_ with B within the same query?
Thanks,
You just want a table alias:
select distinct id, product_code
from (select distinct id, product_code
from table_A
where dt = '2019-06-01'
) a inner join
table_B b
on a.id = b.id;
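For the "define A_ once" part of the question, a common table expression (WITH clause) does that in engines that support it: the filtered subset is named once and then joined like a table. This is a minimal sketch assuming SQLite via Python's sqlite3 module; the small table_A(id, product_code, dt) and table_B(id) tables and their rows are made up for illustration, and a_ is a hypothetical name.

```python
import sqlite3

# Hypothetical tables and rows, only for illustration.
SETUP = """
CREATE TABLE table_A (id INTEGER, product_code TEXT, dt TEXT);
CREATE TABLE table_B (id INTEGER);
INSERT INTO table_A VALUES (1, 'p1', '2019-06-01'), (1, 'p1', '2019-06-01'),
                           (2, 'p2', '2019-06-01'), (3, 'p3', '2019-05-31');
INSERT INTO table_B VALUES (1), (3);
"""

# a_ is defined once (the filtered, de-duplicated slice of table_A)
# and then joined to table_B like an ordinary table.
QUERY = """
WITH a_ AS (
    SELECT DISTINCT id, product_code
    FROM table_A
    WHERE dt = '2019-06-01'
)
SELECT DISTINCT a_.id, a_.product_code
FROM a_
INNER JOIN table_B b ON a_.id = b.id
ORDER BY a_.id
"""


def filtered_join():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(filtered_join())
```

Functionally this is the same as the derived-table alias above; the CTE form is mainly easier to read when A_ has to be referenced more than once.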
I have the following Hive tables:
Table_1
ID
1
1
2
Table_2
ID
1
2
2
I am comparing the two tables based on the count of each ID in both tables, and I need output like this:
ID
1 - 2 records in Table 1 and 1 record in Table 2
2 - 1 record in Table 1 and 2 records in Table 2
Table_1 is the parent table.
I am using the queries below:
select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;
Just do a full outer join of your two count queries on X.id = Y.id, then select from the result, using coalesce to handle the NULLs on either side (an ID missing from one table has no count there).
select coalesce(X.id, Y.id) as id,
concat(coalesce(X.cnt1, 0), " entries in table 1, ", coalesce(Y.cnt2, 0), " entries in table 2")
from (select count(*) as cnt1, id from table1 group by id) X
full outer join (select count(*) as cnt2, id from table2 group by id) Y
on X.id = Y.id;
Try this. You can use a CASE expression to choose between "record" and "records".
SELECT m.id,
CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
' record in table 2')
FROM (SELECT id
FROM table_1
UNION
SELECT id
FROM table_2) m
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_1
GROUP BY id) a
ON m.id = a.id
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_2
GROUP BY id) b
ON m.id = b.id;
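A quick way to sanity-check the UNION + LEFT JOIN approach (including the CASE expression for "record"/"records") is to run it on the sample data from the question. This sketch assumes SQLite via Python's sqlite3 module instead of Hive, so it uses the || operator rather than CONCAT.

```python
import sqlite3

# Sample data from the question.
SETUP = """
CREATE TABLE table_1 (id INTEGER);
CREATE TABLE table_2 (id INTEGER);
INSERT INTO table_1 VALUES (1), (1), (2);
INSERT INTO table_2 VALUES (1), (2), (2);
"""

# m holds every id seen in either table; a and b hold per-table counts.
# COALESCE turns a missing count into 0, CASE picks singular or plural.
QUERY = """
SELECT m.id,
       COALESCE(a.ct, 0)
       || (CASE WHEN COALESCE(a.ct, 0) = 1 THEN ' record' ELSE ' records' END)
       || ' in table 1, '
       || COALESCE(b.ct, 0)
       || (CASE WHEN COALESCE(b.ct, 0) = 1 THEN ' record' ELSE ' records' END)
       || ' in table 2' AS summary
FROM (SELECT id FROM table_1 UNION SELECT id FROM table_2) m
LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_1 GROUP BY id) a ON m.id = a.id
LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_2 GROUP BY id) b ON m.id = b.id
ORDER BY m.id
"""


def compare_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in compare_counts():
        print(row)
```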
You could use this Python program to do a full comparison of 2 Hive tables:
https://github.com/bolcom/hive_compared_bq
If you want a quick comparison just based on counts, then pass the "--just-count" option (you can also specify the group by column with "--group-by-column").
The script also allows you to visually see all the differences on all rows and all columns if you want a complete validation.
I have 9 tables, each with values like:
Level_1_tab
Code Name
ae1 hdgdgd
ae2 dhdh
level_2_tab
code Name
2 jfjfjf
3 fkfjfjf
Similarly for level_3_tab, level_4_tab, level_5_tab, and so on up to level_9_tab.
I am inserting the code column into a new table and checking for duplicates.
SELECT
code, name, COUNT(*)
FROM
new_table
GROUP BY
code, name
HAVING
COUNT(*) > 1;
Can I write a query that compares the code column of these 9 tables and checks for duplicates? All the rows with duplicate code values should be retrieved.
You can UNION ALL the 9 tables and run the same query on the result.
select code, name, count(*) from
(select code, name from level_1_tab union all
select code, name from level_2_tab union all
select code, name from level_3_tab union all
select code, name from level_4_tab union all
-- ... repeat for the remaining tables up to level_9_tab
select code, name from level_9_tab) t
group by code, name
having count(*) > 1;
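Here is a small runnable sketch of the UNION ALL + GROUP BY/HAVING idea, assuming SQLite via Python's sqlite3 module. It uses only three level tables, and their rows are hypothetical (('ae1', 'hdgdgd') is deliberately repeated in level_2_tab); the same pattern extends to all nine tables by adding entries to TABLES.

```python
import sqlite3

# Hypothetical sample rows; ('ae1', 'hdgdgd') appears in two tables.
TABLES = {
    "level_1_tab": [("ae1", "hdgdgd"), ("ae2", "dhdh")],
    "level_2_tab": [("2", "jfjfjf"), ("3", "fkfjfjf"), ("ae1", "hdgdgd")],
    "level_3_tab": [("4", "xyz")],
}


def duplicate_pairs():
    conn = sqlite3.connect(":memory:")
    for name, rows in TABLES.items():
        conn.execute(f"CREATE TABLE {name} (code TEXT, name TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    # Build "SELECT code, name FROM level_1_tab UNION ALL ..." for every table.
    union_all = " UNION ALL ".join(f"SELECT code, name FROM {t}" for t in TABLES)
    query = (
        f"SELECT code, name, COUNT(*) FROM ({union_all}) t "
        "GROUP BY code, name HAVING COUNT(*) > 1"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(duplicate_pairs())
```

UNION ALL matters here: a plain UNION would collapse the repeated pair into one row before the count, and no duplicates would be reported.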
Everyone's suggestion of UNION ALL is a great way to build your initial table to look for duplicates in. But you say you already have a temp table with all of the values from the 9 tables, which is perfect, and another good way of doing it if your dataset isn't huge.
The only step you're missing to get the actual duplicate rows is to use your duplicate query above to re-query your temp table and return the rows you want. A good way of doing this is a common table expression (CTE), which lets you build a query on top of another query without creating another temp table. So use a CTE and join back to your temp table.
;WITH CommonTableExpression AS (
SELECT
code, name, COUNT(*) AS cnt
FROM
new_table
GROUP BY
code, name
HAVING
COUNT(*) > 1
)
SELECT t.*
FROM
new_table t
INNER JOIN CommonTableExpression c
ON t.code = c.code
AND t.name = c.name
If you want to do this against each of the 9 tables independently rather than against your temp table, place the duplicates into another temp table and join on it.
SELECT
code, name, COUNT(*) AS cnt
INTO #Duplicates
FROM
new_table
GROUP BY
code, name
HAVING
COUNT(*) > 1;
SELECT
l.*
FROM
level_1_tab l
INNER JOIN #Duplicates d
ON l.Code = d.Code
AND l.name = d.name;
Since everyone loves UNION ALL, here is a way to do it without temp tables, using a chain of UNION ALLs instead. I wonder which would be the more optimized query, though.
;WITH cteAllCodeValues AS (
select code, name from level_1_tab union all
select code, name from level_2_tab union all
select code, name from level_3_tab union all
select code, name from level_4_tab union all
-- ... repeat for the remaining tables up to level_9_tab
select code, name from level_9_tab
)
, cteDuplicates AS (
SELECT code, name, RecordCount = COUNT(*)
FROM
cteAllCodeValues
GROUP BY
code, name
HAVING
COUNT(*) > 1
)
SELECT c.*
FROM
cteDuplicates d
INNER JOIN cteAllCodeValues c
ON d.code = c.code
AND d.name = c.name;
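The CTE approach can be tried out as a sketch under these assumptions: SQLite via Python's sqlite3 module, three hypothetical level tables with made-up rows, and an extra src column (not in the answer) that records which table each duplicate row came from. The HAVING COUNT(*) > 1 filter in cteDuplicates is what limits the join to duplicated pairs; without it every row would come back. SQLite does not accept the T-SQL "RecordCount = COUNT(*)" alias form, so AS is used instead.

```python
import sqlite3

# Hypothetical sample rows; ('ae1', 'hdgdgd') appears in two tables.
TABLES = {
    "level_1_tab": [("ae1", "hdgdgd"), ("ae2", "dhdh")],
    "level_2_tab": [("2", "jfjfjf"), ("3", "fkfjfjf"), ("ae1", "hdgdgd")],
    "level_3_tab": [("4", "xyz")],
}


def duplicate_rows():
    conn = sqlite3.connect(":memory:")
    for name, rows in TABLES.items():
        conn.execute(f"CREATE TABLE {name} (code TEXT, name TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    # Tag every row with its source table name (the extra src column).
    union_all = " UNION ALL ".join(
        f"SELECT code, name, '{t}' AS src FROM {t}" for t in TABLES
    )
    query = f"""
    WITH cteAllCodeValues AS ({union_all}),
    cteDuplicates AS (
        SELECT code, name, COUNT(*) AS RecordCount
        FROM cteAllCodeValues
        GROUP BY code, name
        HAVING COUNT(*) > 1
    )
    SELECT c.code, c.name, c.src
    FROM cteDuplicates d
    INNER JOIN cteAllCodeValues c
        ON d.code = c.code AND d.name = c.name
    ORDER BY c.src
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in duplicate_rows():
        print(row)
```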
I have 3 tables. All of them have a column, id. I want to find out whether any value is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence; there is no need to probe further. What I have now is something like this:
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. The database is PostgreSQL 9.0.
id is not unique in the individual tables; it is OK to have duplicates within the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged, and there is no need to check for its existence in the third table or to look for more such values. One value present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
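Here is a minimal sketch of the "distinct per table, then UNION ALL and count" query, assuming SQLite via Python's sqlite3 module and made-up id lists. Duplicates within one table are removed by each SELECT DISTINCT, so only ids that occur in two or more different tables survive the HAVING filter.

```python
import sqlite3

QUERY = """
SELECT id FROM (
    SELECT DISTINCT id FROM a
    UNION ALL
    SELECT DISTINCT id FROM b
    UNION ALL
    SELECT DISTINCT id FROM c
) x
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY id
"""


def ids_in_more_than_one_table(a, b, c):
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", a), ("b", b), ("c", c)):
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    # 1 is duplicated only inside a, so it is not flagged; 4 is in b and c.
    print(ids_in_more_than_one_table([1, 1, 2], [3, 4], [4, 5]))
```

As the next answer points out, this version has to read all rows before it can return anything, so it does not stop at the first match.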
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as x
group by id having count(t) = 3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
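The EXISTS/OR version can be checked with a small sketch, assuming SQLite via Python's sqlite3 module and made-up id lists. The query returns one row when any pair of tables shares an id and no rows otherwise; the helper just turns that into a boolean.

```python
import sqlite3

# One row if any two of the three tables share an id, no rows otherwise.
QUERY = """
SELECT 'WTF!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1 FROM a JOIN b USING (id))
   OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
   OR EXISTS (SELECT 1 FROM c JOIN b USING (id))
"""


def has_overlap(a, b, c):
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", a), ("b", b), ("c", c)):
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row is not None


if __name__ == "__main__":
    print(has_overlap([1, 1, 2], [3], [4]))  # duplicates inside a are fine
    print(has_overlap([], [5], [5]))         # b and c share 5
```

The last call also shows a difference from the first query in this answer: with FROM a,b,c, an empty table a makes the Cartesian product empty, so the overlap between b and c would be missed, whereas the separate EXISTS checks still find it.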
I have 2 identical tables with the columns user_id, name, age, date_added.
The user_id column may contain duplicate IDs.
I need to merge those 2 tables into 1 with the following condition:
If there are multiple records with the same 'name' for the same user, only the LATEST (by date_added) record should be kept.
This script will be used with MSSQL 2005, but I would also appreciate a version that does not use ROW_NUMBER(). I need this script to reload a broken table once, so performance is not critical.
example:
table1:
1,'john',21,01/01/2010
1,'john',15,01/01/2005
1,'john',71,01/01/2001
table2:
1,'john',81,01/01/2007
1,'john',15,01/01/2005
1,'john',11,01/01/2008
result:
1,'john',21,01/01/2010
UPDATE:
I think I've found my own solution. It is based on an answer to my previous question given by Larry Lustig and Joe Stefanelli.
with tmp2 as
(
SELECT * FROM table1
UNION
SELECT * FROM table2
)
SELECT * FROM tmp2 c1
WHERE (SELECT COUNT(*) FROM tmp2 c2
WHERE c2.user_id = c1.user_id AND
c2.name = c1.name AND
c2.date_added >= c1.date_added) <= 1
Could you please help me to convert this query to the one without 'WITH' clause?
Here's a variant of @Andomar's answer:
; with all_users as
(
select *
from table1 u1
union all
select *
from table2 u2
)
, ranker as (
select *,
rank() over (partition by user_id, name order by date_added desc) as [r]
from all_users
)
select * from ranker where [r] = 1
Just in the interests of giving a different approach...
WITH distinctlist
As (SELECT user_id,
name
FROM table1
UNION
SELECT user_id,
name
FROM table2)
SELECT C.*
FROM distinctlist d
CROSS APPLY (SELECT TOP 1 *
FROM (SELECT *
FROM (SELECT TOP 1 *
FROM table1
WHERE user_id = d.user_id
AND name = d.name
ORDER BY date_added DESC) T1
UNION ALL
SELECT *
FROM (SELECT TOP 1 *
FROM table2
WHERE user_id = d.user_id
AND name = d.name
ORDER BY date_added DESC) T2) T
ORDER BY date_added DESC) C
You could use not exists, like:
; with all_users as
(
select *
from table1 u1
union all
select *
from table2 u2
)
select *
from all_users u1
where not exists
(
select *
from all_users u2
where u1.user_id = u2.user_id
and u1.name = u2.name
and u1.date_added < u2.date_added
)
If the database doesn't support CTEs, expand all_users in the two places it is used.
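This is what the NOT EXISTS query looks like with all_users expanded inline (no WITH clause), run on the example rows from the question. It is a sketch assuming SQLite via Python's sqlite3 module, with dates stored as ISO-8601 text (an assumption) so that string comparison orders them chronologically. Keep in mind that if two rows tie on the latest date_added, both are returned.

```python
import sqlite3

# Example rows from the question, dates rewritten as ISO-8601 text.
TABLE1 = [(1, "john", 21, "2010-01-01"), (1, "john", 15, "2005-01-01"), (1, "john", 71, "2001-01-01")]
TABLE2 = [(1, "john", 81, "2007-01-01"), (1, "john", 15, "2005-01-01"), (1, "john", 11, "2008-01-01")]

# all_users is written out twice as a derived table instead of a CTE.
QUERY = """
SELECT *
FROM (SELECT * FROM table1 UNION ALL SELECT * FROM table2) u1
WHERE NOT EXISTS (
    SELECT 1
    FROM (SELECT * FROM table1 UNION ALL SELECT * FROM table2) u2
    WHERE u2.user_id = u1.user_id
      AND u2.name = u1.name
      AND u2.date_added > u1.date_added
)
"""


def latest_rows():
    conn = sqlite3.connect(":memory:")
    for name, rows in (("table1", TABLE1), ("table2", TABLE2)):
        conn.execute(
            f"CREATE TABLE {name} (user_id INTEGER, name TEXT, age INTEGER, date_added TEXT)"
        )
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(latest_rows())
```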
P.S. If there are only three columns (user_id, name and date_added), and no more, you could use an even simpler solution:
select user_id
, name
, MAX(date_added) as date_added
from (
select *
from table1 u1
union all
select *
from table2 u2
) sub
group by
user_id
, name