Say I have two database tables T1 and T2. T1 has some column C1 (among others) with values, say 1, 1, 2, 2, 2, 3, null, null. T2 has column C2 (among others) with values, say 1, 1, 2, 4, 5, 5, null. I wish to get a summary of these two columns, i.e. in one query, if possible, get to know how many times each value (null included) occurred in both columns combined. In this case, 1 occurred 4 times, 2 occurred 4 times, 3 and 4 occurred once, 5 occurred twice and null occurred 3 times.
I do not know in advance all the possible values in the columns.
You need a group by on top of a union all query:
SELECT value, COUNT(*)
FROM (SELECT c1 AS value FROM t1
      UNION ALL
      SELECT c2 AS value FROM t2)
GROUP BY value
Depending on your table sizes, your data distribution, and the indexes possibly available on C1 and C2, you might get better performance with a query like the following, since Oracle doesn't have to build the full union of both tables.
SELECT C, SUM(N) FROM
(
SELECT C1 AS C, COUNT(*) AS N FROM T1 GROUP BY C1
UNION ALL
SELECT C2, COUNT(*) FROM T2 GROUP BY C2
)
GROUP BY C;
That being said, YMMV. So if this is critical, I would suggest you carefully examine the query execution plan in order to choose the "right" solution for your particular case.
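In Oracle (which this answer assumes), a minimal sketch of how you might look at that plan, assuming you have the privileges to run EXPLAIN PLAN and query DBMS_XPLAN:
EXPLAIN PLAN FOR
SELECT value, COUNT(*)
FROM (SELECT c1 AS value FROM t1
      UNION ALL
      SELECT c2 AS value FROM t2)
GROUP BY value;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Run the same two steps for the aggregate-then-union variant and compare the plans.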
I am looking to join two time-ordered tables, such that the events in table1 are matched to the "next" event in table2 (within the same user). I am using SQL / Snowflake for this.
For argument's sake, table1 is "notification_clicked" events and table2 is "purchases".
This is one way to do it:
WITH partial_result AS (
    SELECT
        table1.userId, notificationId, notificationTimeStamp, transactionId, transactionTimeStamp
    FROM table1 CROSS JOIN table2
    WHERE table1.userId = table2.userId
      AND notificationTimeStamp <= transactionTimeStamp)
SELECT *
FROM partial_result
QUALIFY ROW_NUMBER() OVER(
    PARTITION BY userId, notificationId ORDER BY transactionTimeStamp ASC
) = 1
It is not super readable, but is this "the" way to do this?
If you're doing an AsOf join against small tables, you can use a regular Venn diagram type of join. If you're running it against large tables, a regular join will lead to an intermediate cardinality explosion before the filter.
For large tables, this is the highest-performance approach I have found to date. Rather than treating an AsOf join like a regular Venn diagram join, we can treat it like a special type of union between two tables, with a filter that uses the information from that union. The sample SQL does the following:
Unions the A and B tables so that the Entity and Time come from both tables and all other columns come from only one table. Rows originating from the other table get NULL for those values (measures 1 and 2 in this case). It also projects a column indicating the source table; we'll use this later.
In the unioned table, it uses a LAG function over a window partitioned by the Entity and ordered by the Time. For each row sourced from the A table, it lags back to the nearest preceding Time sourced from the B table, ignoring rows from the A table.
with A as
(
select
COLUMN1::int as "E", -- Entity
COLUMN2::int as "T", -- Time
COLUMN4::string as "M1" -- Measure (could be many)
from (values
(1, 7, 1, 'M1-1'),
(1, 8, 1, 'M1-2'),
(1, 41, 1, 'M1-3'),
(1, 89, 1, 'M1-4')
)
), B as
(
select
COLUMN1::int as "E", -- Entity
COLUMN2::int as "T", -- Time
COLUMN4::string as "M2" -- Different measure (could be many)
from (values
(1, 6, 1, 'M2-1'),
(1, 12, 1, 'M2-2'),
(1, 20, 1, 'M2-3'),
(1, 35, 1, 'M2-4'),
(1, 57, 1, 'M2-5'),
(1, 85, 1, 'M2-6'),
(1, 92, 1, 'M2-7')
)
), UNIONED as -- Unify schemas and union all
(
select 'A' as SOURCE_TABLE -- Project the source table
,E as AB_E -- AB_ means it's unified
,T as AB_T
,M1 as A_M1 -- A_ means it's from A
,NULL::string as B_M2 -- Make columns from B null for A
from A
union all
select 'B' as SOURCE_TABLE
,E as AB_E
,T as AB_T
,NULL::string as A_M1 -- Make columns from A null for B
,M2 as B_M2
from B
)
select AB_E as ENTITY
,AB_T as A_TIME
,lag(iff(SOURCE_TABLE = 'A', null, AB_T)) -- Lag back to
ignore nulls over -- previous B row
(partition by AB_E order by AB_T) as B_TIME
,A_M1 as M1_FROM_A
,lag(B_M2) -- Lag back to the previous non-null row.
ignore nulls -- The A sourced rows will already be NULL.
over (partition by AB_E order by AB_T) as M2_FROM_B
from UNIONED
qualify SOURCE_TABLE = 'A'
;
This will perform orders of magnitude faster for large tables because the highest intermediate cardinality is guaranteed to be the cardinality of A + B.
To simplify this refactor, I wrote a stored procedure that generates the SQL given the paths to table A and B, the entity column in A and B (right now limited to one, but if you have more it will get the SQL started), the order by (time) column in A and B, and finally the list of columns to "drag through" the AsOf join. It's rather lengthy so I posted it on Github and will work later to document and enhance it:
https://github.com/GregPavlik/AsOfJoin/blob/main/StoredProcedure.sql
If I have a relation collab(a1, a2) with rows
(a1, a2)
1, 2
1, 3
1, 3
2, 4
and another relation ident(a1)
with rows
(a1)
1,
2,
3,
4,
Then can I, for each value of a1 in ident, extract a1 and the count of a2s that are matched with this particular value of a1?
Thus, I want the result
(a1, num_a2)
1, 2
2, 1
3, 0
4, 0
If I understand your requirement correctly, the following query should help you get your required output:
Demo Here
SELECT a1, count(a11)
FROM
(
SELECT table_2.a1, table_1.a1 a11
FROM table_2
LEFT JOIN table_1 ON table_2.a1 = table_1.a1
)A
GROUP BY a1
As #GMB said, no subquery is required, as this can be achieved directly as below (COUNT(table_1.a1) counts only non-null values, so unmatched rows yield 0):
Demo Here
SELECT table_2.a1, COUNT(table_1.a1)
FROM table_2
LEFT JOIN table_1 ON table_2.a1 = table_1.a1
GROUP BY table_2.a1
I don't know if I understood correctly, but here's my little contribution:
You can use the function "COALESCE" to replace missing values with a zero. In this case you select your ID list from the table ident and left join the counts from the table collab (COUNT(DISTINCT ...) in your case, as you want unique values, like in your output), replacing all NULLs with zeros:
SELECT t1.a1, COALESCE(t2.N, 0) AS num_a2
FROM ( SELECT a1 FROM ident ) t1
LEFT JOIN
( SELECT a1, COUNT(DISTINCT a2) AS N FROM collab GROUP BY a1 ) t2
ON t1.a1 = t2.a1
In my database I have 10 user numbers; some of them have been deleted, and when I select the column it shows like this:
user_number:
1,
2,
5,
8,
10,
and I need to know if there is a script that can get the missing numbers like this. I don't want the deleted data back, I just want the missing numbers as integer data:
missing_user_number:
3,
4,
6,
7,
9,
In most versions of SQL, it is actually easier to get ranges of missing values, rather than each missing value:
select user_number + 1 as missing_range_start, next_user_number - 1 as missing_range_end
from (select t.*,
lead(user_number) over (order by user_number) as next_user_number
from t
) t
where next_user_number <> user_number + 1;
Note: This only finds internal missing numbers, as in the example in your question.
You can create an in-line numbers table that contains all 10 user numbers. Then LEFT JOIN your table to it in order to get the missing numbers:
SELECT t1.n AS missing_user_number
FROM (
SELECT 1 AS n UNION ALL SELECT 2 ... SELECT 10
) AS t1
LEFT JOIN mytable AS t2 ON t1.n = t2.user_number
WHERE t2.user_number IS NULL
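If your database supports recursive CTEs (e.g. PostgreSQL or MySQL 8+), the in-line numbers table can be generated rather than written out by hand; a minimal sketch, reusing the hypothetical mytable / user_number names from above:
WITH RECURSIVE numbers (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT numbers.n AS missing_user_number
FROM numbers
LEFT JOIN mytable AS t2 ON numbers.n = t2.user_number
WHERE t2.user_number IS NULL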
I've read and read and read but I haven't found a solution to my problem.
I'm doing something like:
SELECT a
FROM t1
WHERE t1.b IN (<external list of values>)
There are other conditions of course but this is the jist of it.
My question is: is there a way to show which in the manually entered list of values didn't find a match? I've looked but I can't find and I'm going in circles.
Create a temp table with the external list of values, then you can do:
select item
from tmptable t
where t.item not in ( select b from t1 )
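Creating and filling that temp table isn't shown above; a minimal sketch of one way to do it, assuming Oracle (as the other answers here do) and treating the inserted values as purely illustrative:
CREATE GLOBAL TEMPORARY TABLE tmptable (
  item VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;

INSERT INTO tmptable (item) VALUES ('first value');   -- illustrative values
INSERT INTO tmptable (item) VALUES ('second value');
-- ... one INSERT per externally provided value
COMMIT;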
If the list is short enough, you can do something like:
with t as (
  select case when t1.b='FIRSTITEM' then 1 else 0 end firstfound,
         case when t1.b='2NDITEM' then 1 else 0 end secondfound,
         case when t1.b='3RDITEM' then 1 else 0 end thirdfound
         ...
  from t1 where t1.b in ('FIRSTITEM', '2NDITEM', '3RDITEM', ...)
)
select sum(firstfound), sum(secondfound), sum(thirdfound), ...
from t
But with proper rights, I would use Nicholas' answer.
To display which values in the list haven't found a match, one approach is to create a nested table data type (a SQL schema object):
-- assuming that the values in the list
-- are of number datatype
create type T_NumList as table of number;
and use it as follows:
-- sample of data. generates numbers from 1 to 11
SQL> with t1(col) as(
2 select level
3 from dual
4 connect by level <= 11
5 )
6 select s.column_value as without_match
7 from table(t_NumList(1, 2, 15, 50, 23)) s -- here goes your list of values
8 left join t1 t
9 on (s.column_value = t.col)
10 where t.col is null
11 ;
Result:
WITHOUT_MATCH
-------------
15
50
23
SQLFiddle Demo
There is no easy way to convert an "externally provided" list into a table that can be used to do the comparison. One way is to use one of the (undocumented) system types to generate a table on the fly based on the values supplied:
with value_list (id) as (
select column_value
from table(sys.odcinumberlist (1, 2, 3)) -- this is the list of values
)
select l.id as missing_id
from value_list l
left join t1 on t1.id = l.id
where t1.id is null;
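A side note, not from the original answer: if the externally provided values are strings rather than numbers, the same pattern works with sys.odcivarchar2list; a minimal sketch with illustrative values:
with value_list (val) as (
  select column_value
  from table(sys.odcivarchar2list ('A', 'B', 'C')) -- your list of string values
)
select l.val as missing_value
from value_list l
left join t1 on t1.b = l.val
where t1.b is null;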
There are ways to get what you have described, but they have requirements which exceed the statement of the problem. From the minimal description provided, there's no way to have the SQL return the list of the manually-entered values that did not match.
For example, if it's possible to insert the manually-entered values into a separate table - let's call it matchtbl, with the column named b - then the following should do the job:
SELECT matchtbl.b
FROM matchtbl
WHERE matchtbl.b NOT IN (SELECT distinct b
FROM t1)
Of course, if the data is being processed by a programming language, it should be relatively easy to keep track of the set of values returned by the original query, by adding the b column to the output, and then perform the set difference.
Putting the list in an in clause makes this hard. If you can put the list in a table, then the following works:
with list as (
select val1 as value from dual union all
select val2 from dual union all
. . .
select valn from dual
)
select list.value, count(t1.b)
from list left outer join
t1
on t1.b = list.value
group by list.value;
In PostgreSQL 8.3 on Ubuntu, I have 3 tables, say T1, T2, T3, with different schemas.
Each of them contains (a few) records related to an object whose ID I know.
Using 'psql', I frequently do the 3 operations:
SELECT field-set1 FROM T1 WHERE ID='abc';
SELECT field-set2 FROM T2 WHERE ID='abc';
SELECT field-set3 FROM T3 WHERE ID='abc';
and just look at the results; seeing them is enough for me.
Is it possible to have a procedure/function/macro etc, with one parameter 'id',
just running the three SELECTS one after another,
displaying the results on the screen?
field-set1, field-set2 and field-set3 are completely different.
There is no reasonable way to JOIN the tables T1, T2, T3; these are unrelated data.
I do not want JOIN.
I want to see the three resulting sets on the screen.
Any hint?
Quick and dirty method
If the row types (data types of all columns in sequence) don't match, UNION will fail.
However, in PostgreSQL you can cast a whole row to its text representation:
SELECT t1::text AS whole_row_in_text_representation FROM t1 WHERE id = 'abc'
UNION ALL
SELECT t2::text FROM t2 WHERE id = 'abc'
UNION ALL
SELECT t3::text FROM t3 WHERE id = 'abc';
Note there is only one ; at the end, and even that one is optional for a single statement.
A more refined alternative
But it also needs a lot more code. Pick the table with the most columns first, cast every individual column to text, and give it a generic name. Add NULL values for the other tables with fewer columns. You can even insert headers between the tables:
SELECT '-t1-'::text AS c1, '---'::text AS c2, '---'::text AS c3 -- table t1
UNION ALL
SELECT '-col1-'::text, '-col2-'::text, '-col3-'::text -- 3 columns
UNION ALL
SELECT col1::text, col2::text, col3::text FROM t1 WHERE id = 'abc'
UNION ALL
SELECT '-t2-'::text, '---'::text, '---'::text -- table t2
UNION ALL
SELECT '-col_a-'::text, '-col_b-'::text, NULL::text -- 2 columns, 1 NULL
UNION ALL
SELECT col_a::text, col_b::text, NULL::text FROM t2 WHERE id = 'abc'
...
Put a union all in between and give all the columns the same name:
SELECT field-set1 as fieldset FROM T1 WHERE ID='abc'
union all
SELECT field-set2 as fieldset FROM T2 WHERE ID='abc'
union all
SELECT field-set3 as fieldset FROM T3 WHERE ID='abc';
and execute it all at once.