Wrong count on inner join of the same table - sql

I have a big table with millions of records, and new records are added every day. My goal is to update only the min/max/total of the newly inserted rows, but I am not getting the correct count().
sqlite> select rowid, * from t1;
1, james, 0
2, james, 6
3, peter, 8
4, james, 4
5, james, 0
6, peter, 0
7, james, 6
8, james, 5
9, peter, 0
10, james, 7
11, james, 7
The sql I am using:
sqlite> select t1.name, max(t1.score), min(t1.score), count(*) from
t1 join t1 as t2 on t1.name=t2.name
where t2.rowid > 9 group by t2.name;
james, 7, 0, 16
I am expecting james, 7, 0, 8
sqlite> select t1.name, max(t1.score), min(t1.score), count(t1.name)
from t1 join t1 as t2 on t1.name=t2.name
where t2.rowid > 5 group by t2.name;
james, 7, 0, 32
peter, 8, 0, 6
I am expecting
james, 7, 0, 8
peter, 8, 0, 3
What did I do wrong here?
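A likely explanation, offered as a sketch rather than a definitive answer: the self-join pairs every row for a name with every new row for the same name, so count(*) is multiplied by the number of new rows (8 james rows times 2 new james rows gives 16). One way around this is to select the affected names in a subquery and aggregate t1 only once. The example below runs that idea through Python's sqlite3 module on an in-memory copy of the sample data. The column names name and score and the helper stats_for_names_added_after are assumptions, since the question does not show the schema.

```python
import sqlite3

# Sample rows from the question; rowid is assigned 1..11 in insertion order.
ROWS = [
    ("james", 0), ("james", 6), ("peter", 8), ("james", 4),
    ("james", 0), ("peter", 0), ("james", 6), ("james", 5),
    ("peter", 0), ("james", 7), ("james", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (name TEXT, score INTEGER)")  # assumed schema
conn.executemany("INSERT INTO t1 (name, score) VALUES (?, ?)", ROWS)


def stats_for_names_added_after(last_rowid):
    """Aggregate all rows of every name that has a row newer than last_rowid."""
    query = """
        SELECT name, max(score), min(score), count(*)
        FROM t1
        WHERE name IN (SELECT name FROM t1 WHERE rowid > ?)
        GROUP BY name
        ORDER BY name
    """
    return conn.execute(query, (last_rowid,)).fetchall()


print(stats_for_names_added_after(9))  # [('james', 7, 0, 8)]
print(stats_for_names_added_after(5))  # [('james', 7, 0, 8), ('peter', 8, 0, 3)]
```

Because t1 is scanned once per name instead of once per matching pair, the counts match the expected values in the question.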

Related

Big Query: How to join 2 tables on user ID when 1 table contains an array of user ids?

There are somewhat similar answers already posted on StackOverflow, but they didn't address this specific case or involved a query that I was not able to understand, given that I just started my first SQL-related position.
This is the first time I have tried to join tables where the join column in one of the tables holds an array of values. When I tried to solve this myself, I ran into the following error: No matching signature for operator = for argument types: ARRAY<INT64>, STRING.
I have 2 tables that look like the following:
Table 1:
team_id user_id
1 [1, 2, 3]
2 [4, 5, 6]
3 [7, 8, 9]
4 [10, 11, 12]
Table 2:
user_id value
2 10
5 20
7 30
12 40
I want to join Table 2 to Table 1 by checking whether each user_id in Table 2 appears in the user_id array of Table 1. If it does, join on that common user_id and output the results as follows:
Desired Output
team_id user_id value
1 2 10
2 5 20
3 7 30
4 12 40
Thank you in advance for sharing your knowledge!
You can join on unnest():
select t1.team_id, t2.user_id, t2.value
from table1 t1
inner join table2 t2 on t2.user_id in unnest(t1.user_id)
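To make the semantics of the unnest() join concrete, here is a small sketch in plain Python (not BigQuery) that performs the same match by hand: a row is produced whenever a user_id from Table 2 is a member of a team's array in Table 1. The variable names and the helper join_on_array_membership are made up for illustration.

```python
# Table 1 from the question: (team_id, array of user ids).
table1 = [(1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]), (4, [10, 11, 12])]
# Table 2 from the question: (user_id, value).
table2 = [(2, 10), (5, 20), (7, 30), (12, 40)]


def join_on_array_membership(teams, values):
    """Inner join: emit (team_id, user_id, value) when user_id is in the team's array."""
    return sorted(
        (team_id, user_id, value)
        for team_id, user_ids in teams
        for user_id, value in values
        if user_id in user_ids  # same test as "user_id IN UNNEST(array)"
    )


print(join_on_array_membership(table1, table2))
# [(1, 2, 10), (2, 5, 20), (3, 7, 30), (4, 12, 40)]
```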
Below is for BigQuery Standard SQL
#standardSQL
SELECT team_id,
ARRAY_AGG(t2.user_id IGNORE NULLS) user_id,
IFNULL(SUM(value), 0) value
FROM `project.dataset.table1` t, t.user_id AS user_id
LEFT JOIN `project.dataset.table2` t2
USING(user_id)
GROUP BY team_id
You can test and play with the above using sample data similar to that in your question, as in the example below:
#standardSQL
WITH `project.dataset.table1` AS (
SELECT 1 team_id, [1, 2, 3] user_id UNION ALL
SELECT 2, [4, 5, 6] UNION ALL
SELECT 3, [7, 8, 9] UNION ALL
SELECT 4, [10, 11, 12] UNION ALL
SELECT 5, [13, 14]
), `project.dataset.table2` AS (
SELECT 2 user_id, 10 value UNION ALL
SELECT 3, 20 UNION ALL
SELECT 5, 20 UNION ALL
SELECT 7, 30 UNION ALL
SELECT 9, 1 UNION ALL
SELECT 12, 40
)
SELECT team_id,
ARRAY_AGG(t2.user_id IGNORE NULLS) user_id,
IFNULL(SUM(value), 0) value
FROM `project.dataset.table1` t, t.user_id AS user_id
LEFT JOIN `project.dataset.table2` t2
USING(user_id)
GROUP BY team_id
with output

Oracle PL/SQL: How to find duplicate sequences in large table?

I have a ~20000 row table like this (seq = sequence):
id seq_num seq_count seq_id a b c d
----------------------------------------------------
1 1 3 A400 1 0 0 0
2 2 3 A400 0 1 0 0
3 3 3 A400 0 0 1 0
4 1 2 V2303 1 1 1 1
5 2 2 V2303 1 1 1 1
6 1 3 G2 1 0 0 0
7 2 3 G2 0 1 0 0
8 3 3 G2 0 0 1 0
9 1 3 U900 1 0 0 0
10 2 3 U900 2 2 1 1
11 3 3 U900 5 3 8 5
I want to find the seq_id of a-b-c-d sequences that have duplicates in the table; the output could just be a dbms_output.put_line or anything similar. As you can see, seq_id G2 is a duplicate of A400 because all of their rows match up, but U900 has no duplicates even though one of its rows matches A400 and G2.
Is there a good way to check for duplicates like this on large sets of data? I cannot create new tables to temporarily hold data. So far I've been trying with cursors mostly but no luck.
Thank you, let me know if you need any more info about my problem.
Oracle Setup:
CREATE TABLE table_name ( id, seq_num, seq_count, seq_id, a, b, c, d ) AS
SELECT 1, 1, 3, 'A400', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 2, 2, 3, 'A400', 0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 3, 3, 3, 'A400', 0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 4, 1, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 5, 2, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 6, 1, 3, 'G2', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 7, 2, 3, 'G2', 0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 8, 3, 3, 'G2', 0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 9, 1, 3, 'U900', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 10, 2, 3, 'U900', 2, 2, 1, 1 FROM DUAL UNION ALL
SELECT 11, 3, 3, 'U900', 5, 3, 8, 5 FROM DUAL;
Query:
SELECT s.seq_id,
t.seq_id AS matched_seq_id
FROM table_name s
INNER JOIN
table_name t
ON ( s.seq_num = t.seq_num
AND s.seq_count = t.seq_count
AND s.seq_id < t.seq_id
AND s.a = t.a
AND s.b = t.b
AND s.c = t.c
AND s.d = t.d )
GROUP BY
t.seq_id,
s.seq_id
HAVING COUNT( DISTINCT t.seq_num ) = MAX( t.seq_count );
Results:
SEQ_ID MATCHED_SEQ_ID
------ --------------
A400 G2
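The self-join above uses only standard SQL, so it can be tried outside Oracle. The following is a sketch using Python's sqlite3 module with an in-memory table; the column types are assumptions, because the CREATE TABLE ... AS SELECT in the setup does not declare them.

```python
import sqlite3

# Sample rows from the question: (id, seq_num, seq_count, seq_id, a, b, c, d).
ROWS = [
    (1, 1, 3, "A400", 1, 0, 0, 0), (2, 2, 3, "A400", 0, 1, 0, 0),
    (3, 3, 3, "A400", 0, 0, 1, 0), (4, 1, 2, "V2303", 1, 1, 1, 1),
    (5, 2, 2, "V2303", 1, 1, 1, 1), (6, 1, 3, "G2", 1, 0, 0, 0),
    (7, 2, 3, "G2", 0, 1, 0, 0), (8, 3, 3, "G2", 0, 0, 1, 0),
    (9, 1, 3, "U900", 1, 0, 0, 0), (10, 2, 3, "U900", 2, 2, 1, 1),
    (11, 3, 3, "U900", 5, 3, 8, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER, seq_num INTEGER, seq_count INTEGER,"
    " seq_id TEXT, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"  # assumed types
)
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Pair rows of different sequences that agree position by position, then keep
# only the pairs where every position of the sequence found a partner.
DUPLICATE_QUERY = """
    SELECT s.seq_id, t.seq_id AS matched_seq_id
    FROM table_name s
    INNER JOIN table_name t
      ON s.seq_num = t.seq_num
     AND s.seq_count = t.seq_count
     AND s.seq_id < t.seq_id
     AND s.a = t.a AND s.b = t.b AND s.c = t.c AND s.d = t.d
    GROUP BY t.seq_id, s.seq_id
    HAVING COUNT(DISTINCT t.seq_num) = MAX(t.seq_count)
"""
matches = conn.execute(DUPLICATE_QUERY).fetchall()
print(matches)  # [('A400', 'G2')]
```

The condition s.seq_id < t.seq_id keeps each pair once and stops a sequence from matching itself.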
Assuming results fit in a string about 2000 characters long, the fastest way is probably to use listagg():
select abcds, listagg(seq_id, ',') within group (order by seq_id)
from (select seq_id, listagg(a||b||c||d, ',') within group (order by seq_num) as abcds
from table_name
group by seq_id
) t
group by abcds
having count(*) >= 2;
This returns the matches as a comma-delimited list.
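The same grouping idea can be sketched in plain Python: build one signature per seq_id from its a-b-c-d rows in seq_num order, then group the seq_ids by signature. This only illustrates the logic and is not a replacement for the Oracle query; the helper name find_duplicate_sequences is hypothetical. Using tuples as the signature avoids an ambiguity that plain concatenation such as a||b||c||d can have once values exceed one digit (1 followed by 12 reads the same as 11 followed by 2).

```python
from collections import defaultdict

# Sample rows from the question: (seq_num, seq_id, a, b, c, d).
ROWS = [
    (1, "A400", 1, 0, 0, 0), (2, "A400", 0, 1, 0, 0), (3, "A400", 0, 0, 1, 0),
    (1, "V2303", 1, 1, 1, 1), (2, "V2303", 1, 1, 1, 1),
    (1, "G2", 1, 0, 0, 0), (2, "G2", 0, 1, 0, 0), (3, "G2", 0, 0, 1, 0),
    (1, "U900", 1, 0, 0, 0), (2, "U900", 2, 2, 1, 1), (3, "U900", 5, 3, 8, 5),
]


def find_duplicate_sequences(rows):
    """Return sorted groups of seq_ids whose a-b-c-d rows are identical."""
    # Collect the rows of each sequence, keyed by seq_id.
    by_seq = defaultdict(list)
    for seq_num, seq_id, a, b, c, d in rows:
        by_seq[seq_id].append((seq_num, a, b, c, d))

    # Signature: the (a, b, c, d) tuples ordered by seq_num.
    by_signature = defaultdict(list)
    for seq_id, seq_rows in by_seq.items():
        signature = tuple(values[1:] for values in sorted(seq_rows))
        by_signature[signature].append(seq_id)

    return sorted(sorted(ids) for ids in by_signature.values() if len(ids) >= 2)


print(find_duplicate_sequences(ROWS))  # [['A400', 'G2']]
```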

How to create a hierarchical tree in SQL with two kinds of nodes?

I have the following table, tree:
id, name, boss, group
1, Boss 1, 9, false
2, Boss 2, 9, false
3, Group 1, 1, true
4, Group 2, 2, true
5, Employee 1, 3, false
6, Employee 2, 3, false
7, Employee 3, 3, false
8, Employee 4, 4, false
9, Boss 0, null, false
It must be represented as follows:
Boss 0
|___ Boss 1
|    |-- Group 1
|        |________ Employee 1
|        |________ Employee 2
|        |________ Employee 3
|___ Boss 2
     |-- Group 2
         |________ Employee 4
I can get this result:
id, name, level
9, Boss 0, 1
1, Boss 1, 2
2, Boss 2, 2
3, Group 1, 3
4, Group 2, 3
5, Employee 1, 4
6, Employee 2, 4
7, Employee 3, 4
8, Employee 4, 4
using the following query:
WITH RECURSIVE t(id, name, boss, level, group) AS
(
SELECT
p1.id,
p1.name,
p1.boss,
1 as level,
p1.group
FROM tree as p1
WHERE p1.boss is null
UNION ALL
SELECT p2.id,
p2.name,
p2.boss,
CASE WHEN p2.group = true THEN level + 1
WHEN p2.group is null THEN level
END,
p2.group
FROM tree as p2
INNER JOIN t on p2.boss = t.id
)
SELECT * FROM t WHERE t.group is null
However, what I need is the following information: how many people are directly and indirectly below each employee? For example:
Boss 0:
2 Direct
4 Indirect
That is, what I am looking for is something like this:
id, name, level
9, Boss 0, 1
1, Boss 1, 2
2, Boss 2, 2
3, Group 1, 3
4, Group 2, 3
5, Employee 1, 3
6, Employee 2, 3
7, Employee 3, 3
8, Employee 4, 3
What can I do in this case? Do you think it would be a better idea to use the nested set model for this kind of problem?
You don't specify RDBMS so I use SQL Server:
SqlFiddleDemo
WITH t(id, name, boss, [level], [group]) AS
(
SELECT
p1.id,
p1.name,
p1.boss,
1 as [level],
p1.[group]
FROM tree as p1
WHERE p1.boss IS NULL
UNION ALL
SELECT
p2.id,
p2.name,
p2.boss,
CASE WHEN t.[group] = 0 THEN [level] + 1
ELSE [level]
END,
p2.[group]
FROM tree as p2
JOIN t
ON p2.boss = t.id
)
SELECT *
FROM t
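For readers without SQL Server at hand, the same recursion can be sketched in SQLite through Python's sqlite3 module. Two things here are assumptions made for the sketch: the boolean column is renamed to is_group (group is a reserved word) and stored as 0/1, and the final ORDER BY is added only to get a stable listing.

```python
import sqlite3

# Rows from the question: (id, name, boss, is_group), booleans stored as 0/1.
ROWS = [
    (1, "Boss 1", 9, 0), (2, "Boss 2", 9, 0),
    (3, "Group 1", 1, 1), (4, "Group 2", 2, 1),
    (5, "Employee 1", 3, 0), (6, "Employee 2", 3, 0),
    (7, "Employee 3", 3, 0), (8, "Employee 4", 4, 0),
    (9, "Boss 0", None, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, name TEXT, boss INTEGER, is_group INTEGER)"
)
conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", ROWS)

# A child is one level deeper than its parent unless the parent is a group,
# in which case the child stays on the group's level.
LEVEL_QUERY = """
    WITH RECURSIVE t(id, name, boss, level, is_group) AS (
        SELECT id, name, boss, 1, is_group
        FROM tree
        WHERE boss IS NULL
        UNION ALL
        SELECT p2.id, p2.name, p2.boss,
               CASE WHEN t.is_group = 0 THEN t.level + 1 ELSE t.level END,
               p2.is_group
        FROM tree AS p2
        JOIN t ON p2.boss = t.id
    )
    SELECT id, name, level FROM t ORDER BY level, id
"""
levels = conn.execute(LEVEL_QUERY).fetchall()
for row in levels:
    print(row)
```

With the sample data, the employees come out at level 3, the same level as their groups, which is the listing the question asks for.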

Selecting rows with duplicate values grouped

I have two tables that look something like this
Table Dog:
PK, color
1, red
2, yellow
3, red
4, red
5, yellow
The dogs have toys.
Table toys
PK, FK, name
1, 2, bowser
2, 2, oscar
3, 3, greg
4, 4, alp
5, 4, hanson
6, 5, omar
7, 5, herm
I need a query that selects the count of all yellow dogs that have more than one toy.
I was thinking something like:
Select count(*)
from toys t, dogs d
where t.fk = d.pk
and d.color = 'yellow'
group by t.fk
having count(t.fk) > 1;
It should return 2, but it comes back with multiple rows.
select count(*)
from (
select FK
from Toys t
inner join Dogs d on t.FK = d.PK
where d."color" = 'yellow'
group by FK
having count(*) > 1
)
SQL Fiddle Example
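The original query returns one count per dog because of the GROUP BY; wrapping it in an outer count(*) collapses those rows into a single number. Here is a sketch of that in SQLite via Python's sqlite3 module. The table names dogs and toys, the lowercase column names, and the subquery alias multi_toy_dogs are choices made for this example (some databases require an alias on a derived table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dogs (pk INTEGER PRIMARY KEY, color TEXT);
    CREATE TABLE toys (pk INTEGER PRIMARY KEY, fk INTEGER, name TEXT);
    INSERT INTO dogs VALUES (1, 'red'), (2, 'yellow'), (3, 'red'),
                            (4, 'red'), (5, 'yellow');
    INSERT INTO toys VALUES (1, 2, 'bowser'), (2, 2, 'oscar'), (3, 3, 'greg'),
                            (4, 4, 'alp'), (5, 4, 'hanson'), (6, 5, 'omar'),
                            (7, 5, 'herm');
""")

# Inner query: one row per yellow dog with more than one toy.
# Outer query: count those rows.
COUNT_QUERY = """
    SELECT count(*)
    FROM (
        SELECT t.fk
        FROM toys t
        INNER JOIN dogs d ON t.fk = d.pk
        WHERE d.color = 'yellow'
        GROUP BY t.fk
        HAVING count(*) > 1
    ) AS multi_toy_dogs
"""
yellow_dogs_with_multiple_toys = conn.execute(COUNT_QUERY).fetchone()[0]
print(yellow_dogs_with_multiple_toys)  # 2
```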

Sql Syntax: Update All values in table based on a value in a different table

I have three tables and I want to update all values for a particular type to the same value:
table1:
id, ValueType
table2:
id, Value
table3:
id, fkValueTypeId, fkValueId
fkValueTypeId references id in table1. fkValueId references id in table2.
I am trying to set all Speed values to the same value:
i.e.
Table1:
0, speed
1, age
2, colour
Table2:
0, 10
1, 20
2, 30
3, 40
4, 18
5, 18
6, blue
7, black
8, orange
9, 33
10, 34
11, 35
Table3:
0, 0, 0 --Speed = 10
1, 0, 1 --Speed = 20
2, 0, 2 --Speed = 30
3, 0, 3 --Speed = 40
4, 1, 4 --Age = 18
5, 1, 5 --Age = 18
6, 2, 6 --Colour = Blue
7, 2, 7 --Colour = Black
8, 2, 8 --Colour = Orange
9, 0, 9 --Speed = 33
10, 0, 10 --Speed = 34
11, 0, 11 --Speed = 35
What I want to do is Update Speed to '55' for all Speed entries in the tables so that table2 looks like this:
Table2:
0, 55
1, 55
2, 55
3, 55
4, 18
5, 18
6, blue
7, black
8, orange
9, 55
10, 55
11, 55
Hope this makes sense. I am not sure on the syntax and can do it using a loop but wondered if there is a better way (which I am sure there is!).
Thank you
A rewrite of @hobodave's answer:
UPDATE table2
SET Value = 55
FROM table2
JOIN table3 ON table3.fkValueId = table2.id
WHERE table3.fkValueTypeId = 0
UPDATE table2
SET table2.Value = 55
FROM table2
JOIN table3 ON table3.fkValueId = table2.id
WHERE table3.fkValueTypeId = 0
Edit: wasn't aware of SQL server's syntax warts :)
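UPDATE ... FROM ... JOIN is vendor-specific syntax, so a portable alternative is to select the target ids with a subquery. This is a different technique from the answers above, sketched here in SQLite through Python's sqlite3 module. Two assumptions: Value is stored as text because the column mixes numbers and colour names, and the Table3 rows link to Table2 the way the comments in the question describe (row 1 pointing at value 1, and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, Value TEXT)")
conn.execute(
    "CREATE TABLE table3 (id INTEGER PRIMARY KEY,"
    " fkValueTypeId INTEGER, fkValueId INTEGER)"
)

# Table2 values from the question, ids 0..11.
table2_values = ["10", "20", "30", "40", "18", "18",
                 "blue", "black", "orange", "33", "34", "35"]
conn.executemany("INSERT INTO table2 VALUES (?, ?)", list(enumerate(table2_values)))

# Value type per Table3 row: 0 = speed, 1 = age, 2 = colour.
# Assumption: row i of table3 points at value i of table2.
value_types = [0, 0, 0, 0, 1, 1, 2, 2, 2, 0, 0, 0]
conn.executemany(
    "INSERT INTO table3 VALUES (?, ?, ?)",
    [(i, value_type, i) for i, value_type in enumerate(value_types)],
)

# Update every table2 row that table3 links to value type 0 (speed).
conn.execute(
    "UPDATE table2 SET Value = ? WHERE id IN"
    " (SELECT fkValueId FROM table3 WHERE fkValueTypeId = ?)",
    ("55", 0),
)

values = [value for (value,) in conn.execute("SELECT Value FROM table2 ORDER BY id")]
print(values)
```

All speed rows end up as 55, while the age and colour rows are left untouched.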