Get count of numbers from all columns in a large table

I have the following table
ID A1 A2 A3 A4 A5 A6
1 324 243 3432 23423 342 342
2 342 242 4345 23423 324 342
How do I write a query that will give me the number of times each number appears in any of the above columns? For example, this is the output I am looking for:
324 2
243 1
3432 1
23423 2
342 4
242 1
4345 1

There are a number of ways to do this, but my first thought is to use unnest:
rnubel=# CREATE TABLE mv (a int, b int, c int);
CREATE TABLE
rnubel=# INSERT INTO mv (a, b, c) VALUES (1, 1, 1), (2, 2, 2), (3, 4, 5);
INSERT 0 3
rnubel=# SELECT unnest(array[a, b, c]) as value, COUNT(*) from mv GROUP BY 1;
value | count
-------+-------
5 | 1
4 | 1
2 | 3
1 | 3
3 | 1
(5 rows)
unnest is a handy function that turns an array into a set of rows, so it expands the array of column values into one row per column value. Then you just group and count as usual.

Brute force method:
SELECT Value
,COUNT(1) AS ValueCount
FROM (
SELECT A1 AS Value
FROM t
UNION ALL
SELECT A2
FROM t
UNION ALL
SELECT A3
FROM t
UNION ALL
SELECT A4
FROM t
UNION ALL
SELECT A5
FROM t
UNION ALL
SELECT A6
FROM t
) x
GROUP BY Value
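If you want to try the brute-force query without a server, here is a minimal sketch. It runs the query on SQLite through Python's built-in sqlite3 module, with SQLite standing in for your database, and loads the two rows from the question into the table `t` that the answer uses:

```python
import sqlite3

# Brute-force unpivot: one SELECT per column, stacked with UNION ALL, then grouped.
UNION_ALL_SQL = """
    SELECT Value, COUNT(1) AS ValueCount
    FROM (
        SELECT A1 AS Value FROM t
        UNION ALL SELECT A2 FROM t
        UNION ALL SELECT A3 FROM t
        UNION ALL SELECT A4 FROM t
        UNION ALL SELECT A5 FROM t
        UNION ALL SELECT A6 FROM t
    ) x
    GROUP BY Value
"""

def make_table(rows):
    """Create the question's table t in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ID INTEGER, A1 INT, A2 INT, A3 INT, A4 INT, A5 INT, A6 INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn

QUESTION_ROWS = [
    (1, 324, 243, 3432, 23423, 342, 342),
    (2, 342, 242, 4345, 23423, 324, 342),
]

conn = make_table(QUESTION_ROWS)
counts = dict(conn.execute(UNION_ALL_SQL).fetchall())
print(counts)
```

Each column costs one extra scan of `t`, which is why this is the "brute force" option.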

In Postgres, you can use lateral joins to unpivot values. I find this more direct than using an array or union all:
select v.a, count(*)
from t cross join lateral
(values (a1), (a2), (a3), (a4), (a5), (a6)
) v(a)
group by v.a;
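SQLite has no LATERAL. As a rough portable alternative, you can cross join each row with a small list of column positions and use CASE to pick the column for each position. This is a swapped-in technique, not the lateral join itself. The sketch below assumes SQLite via Python's sqlite3 and the question's table `t`:

```python
import sqlite3

# Portable stand-in for the LATERAL VALUES unpivot: pair every row of t with
# the column positions 1..6 and let CASE pick the column for that position.
CASE_UNPIVOT_SQL = """
    WITH cols(n) AS (VALUES (1), (2), (3), (4), (5), (6))
    SELECT v.a, COUNT(*)
    FROM (
        SELECT CASE cols.n
                   WHEN 1 THEN t.a1 WHEN 2 THEN t.a2 WHEN 3 THEN t.a3
                   WHEN 4 THEN t.a4 WHEN 5 THEN t.a5 ELSE t.a6
               END AS a
        FROM t CROSS JOIN cols
    ) v
    GROUP BY v.a
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a1 INT, a2 INT, a3 INT, a4 INT, a5 INT, a6 INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 324, 243, 3432, 23423, 342, 342), (2, 342, 242, 4345, 23423, 324, 342)],
)
counts = dict(conn.execute(CASE_UNPIVOT_SQL).fetchall())
print(counts)
```

Like the lateral join, this reads `t` once and multiplies each row by the number of columns.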

Related

Reverse a number and sum of digits in sql

In my SQL database I have a table in which column A is of type decimal(18,0):
A
34
123
345
879
I need columns B and C like this (B is A with its digits reversed, C is the sum of A's digits):
B C
43 7
321 6
543 12
978 24

For Postgres you can use string_to_array() to split the number into digits:
with data (a) as (
values
(34),
(123),
(345),
(879)
)
select a,
string_agg(t.d::text, '' order by t.idx desc) as b,
sum(t.d::int) as c
from data,
unnest(string_to_array(a::text,null)) with ordinality as t(d, idx)
group by a;
The above returns:
a | b | c
----+-----+---
34 | 43 | 7
123 | 321 | 6
345 | 543 | 12
879 | 978 | 24
To get the reversed number, you could also use reverse() in Postgres, e.g. reverse(a::text).

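In Oracle, it can be done as follows:
SELECT
VALUE, REVERSE_VALUE, SUM(SUM_TOT) AS SUM
FROM (
SELECT
-- LEVEL is in the DISTINCT list so repeated digits (e.g. the two 1s in 112) are not collapsed
DISTINCT A AS VALUE, LEVEL AS POS, REVERSE(TO_CHAR(A)) AS REVERSE_VALUE, SUBSTR(TO_CHAR(A), LEVEL, 1) AS SUM_TOT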
FROM (
SELECT 34 AS A FROM DUAL
UNION
SELECT 123 FROM DUAL
UNION
SELECT 345 FROM DUAL
UNION
SELECT 879 FROM DUAL
)
CONNECT BY LEVEL <= LENGTH(TO_CHAR(A))
ORDER BY 1
)
GROUP BY
VALUE, REVERSE_VALUE
;
Output:
VALUE|REVERSE_VALUE|SUM
34|43|7
345|543|12
123|321|6
879|978|24
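For a dialect without reverse() or CONNECT BY, a recursive CTE can walk the digits one at a time. This is a different technique from both answers above. The sketch below assumes SQLite via Python's sqlite3, a source table named `data` (a hypothetical name), and non-negative whole numbers:

```python
import sqlite3

# Recursive CTE: peel one digit off the front of the number per step,
# prepending it to `rev` and adding it to `total`. Assumes non-negative integers.
DIGITS_SQL = """
    WITH RECURSIVE walk(a, rest, rev, total) AS (
        SELECT a, CAST(a AS TEXT), '', 0 FROM data
        UNION ALL
        SELECT a,
               substr(rest, 2),
               substr(rest, 1, 1) || rev,
               total + CAST(substr(rest, 1, 1) AS INTEGER)
        FROM walk
        WHERE rest <> ''
    )
    SELECT a, CAST(rev AS INTEGER) AS b, total AS c
    FROM walk
    WHERE rest = ''
    ORDER BY a
"""

def reverse_and_sum(numbers):
    """Return (a, reversed a, digit sum of a) for each number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (a INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?)", [(n,) for n in numbers])
    return conn.execute(DIGITS_SQL).fetchall()

print(reverse_and_sum([34, 123, 345, 879]))
```

Casting the reversed text back to an integer drops leading zeros, so 120 becomes 21.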

Incrementally comparing multiple value sets or lists between two tables in Oracle

I am trying to compare two sets of values between two Oracle tables, as shown below: I want to find groups of data in table B that match those in table A. The group number is common to both tables.
It's considered a match only if all groups and values under an id in table A are equal to the group/value pairs in table B. The matches are the rows marked Yes below. Table A can have a variable number of group/value pairs under an ida value: some ids have only one group/value pair and some have three.
Comparison Example
Ida GroupA Vala|GroupB Valb| Match?
------------------------------------------------------------------------
50 1 4 | 1 1 | No - Value doesn't match
56 1 5 | 1 1 | No - Value doesn't match
57 1 1 | 1 1 | Yes - Both Groups (1&2) and Values match
57 2 101 | 2 101 | Yes - Both Groups (1&2) and Values match
94 1 1 | 1 1 | Yes - Group and Value match
96 1 1 | 1 1 | No - Only group 1 matches
96 2 102 | 2 101 | No - Only group 1 matches. Group 2 doesn't
Trial (and Error!)
I figured I would have to use some sort of count, and tried using PARTITION BY to count the groups in table A. But I am not sure how to use this in a query to do a sequential/multi-value comparison. I looked up hierarchical functions but realized they may not fit here. What would be the best approach to deal with such a data comparison? Thanks for your help.
Happy Halloween! :)
select a.*,MAX(a.groupa) OVER (PARTITION BY a.ida ORDER BY a.groupa desc)
occurs
from tab_a a, tab_b b
where a.groupa=b.groupb and a.vala=b.valb
and a.groupa<=3
Tables
Tables A and B
create table tab_a
(
ida number,
groupa number,
vala number
);
create table tab_b
(
idb number,
groupb number,
valb number
);
Data
insert into tab_a values (50,1,4);
insert into tab_a values (56,1,5);
insert into tab_a values (57,1,1);
insert into tab_a values (57,2,101);
insert into tab_a values (58,1,1);
insert into tab_a values (58,2,104);
insert into tab_a values (60,2,102);
insert into tab_a values (94,1,1);
insert into tab_a values (95,1,1);
insert into tab_a values (95,2,101);
insert into tab_a values (96,1,1);
insert into tab_a values (96,2,102);
insert into tab_a values (97,1,1);
insert into tab_a values (97,2,101);
insert into tab_a values (97,3,201);
insert into tab_b values (752,1,1);
insert into tab_b values (752,2,101);
insert into tab_b values (752,3,201);

I don't think this is all the way there but might get you started. You can do:
select a.*, b.*,
count(case when a.groupa = b.groupb and a.vala = b.valb then a.ida end)
over (partition by a.ida) match_count,
count(distinct a.groupa||':'||a.vala)
over (partition by a.ida) val_count
from tab_a a
full outer join tab_b b on b.groupb = a.groupa and b.valb = a.vala
where a.groupa <= 3;
The distinct may not be needed, and the separator used in the concatenation needs to be a character that can't appear in any real value, to avoid false matches.
That gets:
IDA GROUPA VALA  IDB GROUPB VALB MATCH_COUNT  VAL_COUNT
--- ------ ---- ---- ------ ---- ----------- ----------
 50      1    4                            0          1
 56      1    5                            0          1
 57      1    1  752      1    1           2          2
 57      2  101  752      2  101           2          2
 58      1    1  752      1    1           1          2
 58      2  104                            1          2
 60      2  102                            0          1
 94      1    1  752      1    1           1          1
 95      1    1  752      1    1           2          2
 95      2  101  752      2  101           2          2
 96      1    1  752      1    1           1          2
 96      2  102                            1          2
 97      1    1  752      1    1           3          3
 97      2  101  752      2  101           3          3
 97      3  201  752      3  201           3          3
And then use that as a CTE or inline view and decode the results:
with t as (
select a.ida, a.groupa, a.vala, b.groupb, b.valb,
count(case when a.groupa = b.groupb and a.vala = b.valb then a.ida end)
over (partition by a.ida) match_count,
count(distinct a.groupa||':'||a.vala)
over (partition by a.ida) val_count
from tab_a a
full outer join tab_b b on b.groupb = a.groupa and b.valb = a.vala
where a.groupa <= 3
)
select ida, groupa, vala, groupb, valb,
case
when match_count = 0 then 'No - Value doesn''t match'
when match_count = val_count and val_count = 1
then 'Yes - Group and Value match'
when match_count = val_count and val_count = 2
then 'Yes - Both Group (1&2) and Values match'
when match_count = val_count and val_count = 3
then 'Yes - All Group (1&2&3) and Values match'
when match_count < val_count and val_count = 2 and valb is not null
then 'No - Only group 1 matches'
when match_count < val_count and val_count = 2 and valb is null
then 'No - Only group 1 matches. Group 2 doesn''t'
else 'Unknown scenario?'
end as "Match?"
from t;
Which gets:
IDA GROUPA VALA GROUPB VALB Match?
--- ------ ---- ------ ---- ------------------------------------------
 50      1    4             No - Value doesn't match
 56      1    5             No - Value doesn't match
 57      1    1      1    1 Yes - Both Group (1&2) and Values match
 57      2  101      2  101 Yes - Both Group (1&2) and Values match
 58      1    1      1    1 No - Only group 1 matches
 58      2  104             No - Only group 1 matches. Group 2 doesn't
 60      2  102             No - Value doesn't match
 94      1    1      1    1 Yes - Group and Value match
 95      1    1      1    1 Yes - Both Group (1&2) and Values match
 95      2  101      2  101 Yes - Both Group (1&2) and Values match
 96      1    1      1    1 No - Only group 1 matches
 96      2  102             No - Only group 1 matches. Group 2 doesn't
 97      1    1      1    1 Yes - All Group (1&2&3) and Values match
 97      2  101      2  101 Yes - All Group (1&2&3) and Values match
 97      3  201      3  201 Yes - All Group (1&2&3) and Values match
I think that gets the match results you showed in your examples; I'm not sure whether the others you didn't show are what you want. ID 97 matches on three groups/values, and the val_count = 3 branch handles that exact match. Figuring out what to show if only one or two of those three match is trickier. You could also capture the min and max B values that do match and work out from those which one(s) are missing, but then you might add a fourth group, and it doesn't scale.

This query should work:
select a.ida
from tab_a a
where a.groupa||a.vala in
(select b.groupb|| b.valb from tab_b b where b.groupb = a.groupa )
group by a.ida
having count(distinct a.groupa||a.vala) =
(select count(distinct a1.groupa||a1.vala)
from tab_a a1
where a1.ida = a.ida)
Bit of explanation:
1. The where clause gets all the rows from tab_a whose group+val combination exists in tab_b.
- So let's say 2 (out of 2) rows in tab_a match 2 (out of 3) rows in tab_b.
2. The left-hand side of the having clause counts the distinct group+val combinations among those matched rows.
- Here that count is 2.
3. The right-hand side of the having clause gives the total number of distinct group+val combinations for that ida in tab_a, regardless of any match with tab_b.
- The having clause requires both sides to be equal, so if in #1 only 1 of an ida's 2 rows in tab_a had matched, #3 would exclude that ida.
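To check the count-based query on the question's data, here is a minimal sketch. It runs the query on SQLite via Python's sqlite3 instead of Oracle. The helper `load_tabs` is a hypothetical name, and the rows are the question's INSERT statements:

```python
import sqlite3

TAB_A = [(50, 1, 4), (56, 1, 5), (57, 1, 1), (57, 2, 101), (58, 1, 1),
         (58, 2, 104), (60, 2, 102), (94, 1, 1), (95, 1, 1), (95, 2, 101),
         (96, 1, 1), (96, 2, 102), (97, 1, 1), (97, 2, 101), (97, 3, 201)]
TAB_B = [(752, 1, 1), (752, 2, 101), (752, 3, 201)]

def load_tabs():
    """Hypothetical helper: build tab_a and tab_b in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab_a (ida INTEGER, groupa INTEGER, vala INTEGER)")
    conn.execute("CREATE TABLE tab_b (idb INTEGER, groupb INTEGER, valb INTEGER)")
    conn.executemany("INSERT INTO tab_a VALUES (?, ?, ?)", TAB_A)
    conn.executemany("INSERT INTO tab_b VALUES (?, ?, ?)", TAB_B)
    return conn

# Matched distinct group+val combos per ida must equal that ida's total combos.
HAVING_SQL = """
    SELECT a.ida
    FROM tab_a a
    WHERE a.groupa || a.vala IN
          (SELECT b.groupb || b.valb FROM tab_b b WHERE b.groupb = a.groupa)
    GROUP BY a.ida
    HAVING COUNT(DISTINCT a.groupa || a.vala) =
           (SELECT COUNT(DISTINCT a1.groupa || a1.vala)
            FROM tab_a a1
            WHERE a1.ida = a.ida)
    ORDER BY a.ida
"""

conn = load_tabs()
full_matches = [ida for (ida,) in conn.execute(HAVING_SQL)]
print(full_matches)  # [57, 94, 95, 97]
```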

It's not a perfect solution, but match_strength 2 means that both group and value match, and match_strength 1 means that only one of the two columns matches.
select * from (
select a.*, b.*, case when (a.vala = b.valb and a.groupa = b.groupb) then 2
when (a.vala = b.valb or a.groupa = b.groupb) then 1
else 0 end as match_strength,
row_number() over (partition by a.rowid order by
case when (a.vala = b.valb and a.groupa = b.groupb) then 2
when (a.vala = b.valb or a.groupa = b.groupb) then 1
else 0 end desc) r
from tab_a a, tab_b b)
where r = 1;
If you want to know exactly which column matches, you can adjust the order by clause.

Assuming the requirement is to find all the ida values for which every (groupa, vala) pair can be found in tab_b (with no further information on why the ones that failed, failed), you could use the query below. The inner query actually shows why the ones that failed, failed (if you select * instead of just the ida). There is only one unusual thing in this solution: I had heard of using the IN condition (and similar) with pairs, or tuples in general, instead of scalar values, but I hadn't used it until today. I tested it on your data and it works fine.
This works in the following general sense: it is not necessary to assume that groupa is unique for each ida, or the same for table_b; that is, (ida, groupa) does not have to be unique in the first table, nor does (idb, groupb) in the second table.
select distinct ida from tab_a where ida not in
(select ida from tab_a where (groupa, vala) not in (select groupb, valb from tab_b));
IDA
------
57
95
94
97
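SQLite also supports this tuple form through row values (SQLite 3.15 or later), so the query can be tried locally with Python's sqlite3. The sketch below is an assumption-laden stand-in for Oracle, loaded with the question's rows:

```python
import sqlite3

TAB_A = [(50, 1, 4), (56, 1, 5), (57, 1, 1), (57, 2, 101), (58, 1, 1),
         (58, 2, 104), (60, 2, 102), (94, 1, 1), (95, 1, 1), (95, 2, 101),
         (96, 1, 1), (96, 2, 102), (97, 1, 1), (97, 2, 101), (97, 3, 201)]
TAB_B = [(752, 1, 1), (752, 2, 101), (752, 3, 201)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_a (ida INTEGER, groupa INTEGER, vala INTEGER)")
conn.execute("CREATE TABLE tab_b (idb INTEGER, groupb INTEGER, valb INTEGER)")
conn.executemany("INSERT INTO tab_a VALUES (?, ?, ?)", TAB_A)
conn.executemany("INSERT INTO tab_b VALUES (?, ?, ?)", TAB_B)

# Row-value (tuple) NOT IN: an ida is excluded as soon as one of its
# (groupa, vala) pairs has no counterpart in tab_b.
TUPLE_SQL = """
    SELECT DISTINCT ida FROM tab_a
    WHERE ida NOT IN
          (SELECT ida FROM tab_a
           WHERE (groupa, vala) NOT IN (SELECT groupb, valb FROM tab_b))
    ORDER BY ida
"""
tuple_matches = [ida for (ida,) in conn.execute(TUPLE_SQL)]
print(tuple_matches)  # [57, 94, 95, 97]
```

As with any NOT IN, a NULL in tab_b's groupb/valb (or in the inner ida list) would make the condition unknown, so this relies on those columns having no NULLs.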

Extracting sub data from a table

I would appreciate your help.
I have a table like this :
[MS_CODE] [MS_SML]
1 43
1 AA
2 51
3 24
3 21
4 11
4 43
5 AA
6 11
I want to write a query that will search for the [MS_SML] values that show up with an [MS_CODE] from group (1, 2 or 3) and also with an [MS_CODE] from group (4, 5 or 6).
For example:
43 and AA qualify, because 43 appears in rows where ms_code is 1 and 4, and AA appears in rows where ms_code is 1 and 5. I would like to create output like this:
[MS_Code] [MS_SML]
1 43
4 43
1 AA
5 AA
Thank you very much for your help!

One method is to use exists and apply your criteria:
select t1.*
from t t1
where exists (select 1
from t t2
where t2.ms_sml = t1.ms_sml and t2.ms_code in (1, 2, 3)
) and
exists (select 1
from t t2
where t2.ms_sml = t1.ms_sml and t2.ms_code in (4, 5, 6)
);
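A quick way to check the EXISTS query (with the outer table aliased as `t1`, which the subqueries reference) is this minimal sketch. It runs on SQLite via Python's sqlite3, with MS_SML stored as text because it mixes numbers and letters:

```python
import sqlite3

def rows_in_both_groups(rows):
    """Return (ms_code, ms_sml) rows whose ms_sml occurs in codes 1-3 and in codes 4-6."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ms_code INTEGER, ms_sml TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    sql = """
        SELECT t1.ms_code, t1.ms_sml
        FROM t t1
        WHERE EXISTS (SELECT 1 FROM t t2
                      WHERE t2.ms_sml = t1.ms_sml AND t2.ms_code IN (1, 2, 3))
          AND EXISTS (SELECT 1 FROM t t2
                      WHERE t2.ms_sml = t1.ms_sml AND t2.ms_code IN (4, 5, 6))
        ORDER BY t1.ms_sml, t1.ms_code
    """
    return conn.execute(sql).fetchall()

DATA = [(1, "43"), (1, "AA"), (2, "51"), (3, "24"), (3, "21"),
        (4, "11"), (4, "43"), (5, "AA"), (6, "11")]
print(rows_in_both_groups(DATA))  # [(1, '43'), (4, '43'), (1, 'AA'), (5, 'AA')]
```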

Here is one way to do it.
select ms_code, ms_sml
from msc
where ms_sml in
(
select ms_sml
from msc
where ms_code in (1,2,3)
intersect
select ms_sml
from msc
where ms_code in (4,5,6)
)
order by ms_sml, ms_code
Note: If there is more than one ms_code for a given ms_sml in the same group, this will return all of them.
Suppose AA is mapped to 1, 3 and 5, this will return
1 AA
3 AA
5 AA
If that is an issue, we may need additional logic to deal with it, for example picking the minimum value of ms_code within each group.
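One possible shape for that "minimum ms_code per group" logic is sketched below. It is only one option: the grouping CASE expression and the `min_code` alias are illustrative choices, and it runs on SQLite via Python's sqlite3 with the table `msc` from this answer. AA is mapped to 1, 3 and 5 to reproduce the case described above:

```python
import sqlite3

MIN_PER_GROUP_SQL = """
    SELECT MIN(ms_code) AS min_code, ms_sml
    FROM msc
    WHERE ms_code IN (1, 2, 3, 4, 5, 6)
      AND ms_sml IN (SELECT ms_sml FROM msc WHERE ms_code IN (1, 2, 3)
                     INTERSECT
                     SELECT ms_sml FROM msc WHERE ms_code IN (4, 5, 6))
    -- one row per ms_sml and group: codes 1-3 form group 1, codes 4-6 group 2
    GROUP BY ms_sml, CASE WHEN ms_code IN (1, 2, 3) THEN 1 ELSE 2 END
    ORDER BY ms_sml, min_code
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msc (ms_code INTEGER, ms_sml TEXT)")
conn.executemany("INSERT INTO msc VALUES (?, ?)", [
    (1, "43"), (1, "AA"), (2, "51"), (3, "24"), (3, "21"), (3, "AA"),
    (4, "11"), (4, "43"), (5, "AA"), (6, "11"),
])
first_codes = conn.execute(MIN_PER_GROUP_SQL).fetchall()
print(first_codes)  # [(1, '43'), (4, '43'), (1, 'AA'), (5, 'AA')]
```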

Update values of column so that it forms unique constraint with other column

I have table A with columns a1, a2, a3 and table B with columns b1, b2. Currently column a2 (table A) contains only null values. I want to update a2 with random values from b2 (table B), but when updating a2 I need to make sure that the pair (a1, a2) stays unique. What is the best way to achieve this? I am using SQL Server 2008, but I need to make sure it works in Oracle too.
I have tried the following, and selecting the random value works fine:
update A
set a2 = (SELECT TOP 1 b2 FROM B ORDER BY newid())
where a2 is null -- but (a1, a2) also needs to be unique

Note: I changed this answer after clarification that table b will have fewer rows than table a.
This can be done with a MERGE.
First, assume table a has rows with the following a1 values: 1, 1, 1, 1, 2, 2, 2, 2, 7, 7, 10, 10, 10, 10, 12, 12, 13, 13, 13, 13, 15, 15.
Next, assume table b has rows with the following b2 values: 102, 103, 104, 105, 106, 107, 108.
Each a1 value can be paired with the b2 values on a rotating basis with this query:
SELECT * FROM
(SELECT a1, ROW_NUMBER() OVER (PARTITION BY a1 ORDER BY NULL) AS RowA
FROM a) TableA
INNER JOIN
(SELECT b2, ROW_NUMBER() OVER (Order by b2) AS RowB
FROM b) TableB
ON Tablea.RowA = TableB.RowB
A1 ROWA B2 ROWB
--- ---- --- ----
1 1 102 1 <-- first a1=1 goes with b2=102
1 2 103 2 <-- second a1=1 goes with b2=103
1 3 104 3 <-- third a1=1 goes with b2=104
1 4 105 4 <-- fourth a1=1 goes with b2=105
2 1 102 1 <-- start again: first a1=2 goes with b2=102
2 2 103 2 <-- and so on...
2 3 104 3
2 4 105 4
7 1 102 1
7 2 103 2
10 1 102 1
10 2 103 2
10 3 104 3
10 4 105 4
12 1 102 1
12 2 103 2
13 1 102 1
13 2 103 2
13 3 104 3
13 4 105 4
15 1 102 1
15 2 103 2
This isn't enough for the merge because it doesn't identify table a rows uniquely, but ROWID can take care of that. Here's the full query:
MERGE INTO a
USING (
SELECT * FROM
(SELECT
a.ROWID as ID,
a1,
ROW_NUMBER() OVER (PARTITION BY a1 ORDER BY a2) AS RowA
FROM a) TableA
INNER JOIN
(SELECT b2, ROW_NUMBER() OVER (Order by b2) AS RowB
FROM b) TableB
ON Tablea.RowA = TableB.RowB) AtoB
ON (a.ROWID = AtoB.ID)
WHEN MATCHED THEN UPDATE SET a.a2 = AtoB.b2
Here's what table a looks like after the update:
SELECT a1, a2 FROM a ORDER BY a1, a2;
A1 A2
--- ----
1 102
1 103
1 104
1 105
2 102
2 103
2 104
2 105
7 102
7 103
10 102
10 103
10 104
10 105
12 102
12 103
13 102
13 103
13 104
13 105
15 102
15 103
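SQLite has no MERGE or Oracle ROWID semantics. A rough counterpart of the same rotating pairing is an UPDATE with a correlated subquery that matches on SQLite's rowid. The sketch below needs SQLite 3.25+ for window functions. It orders rows within an a1 group by rowid rather than a2, an assumption made for determinism:

```python
import sqlite3

A1_VALUES = [1, 1, 1, 1, 2, 2, 2, 2, 7, 7, 10, 10, 10, 10,
             12, 12, 13, 13, 13, 13, 15, 15]
B2_VALUES = [102, 103, 104, 105, 106, 107, 108]

# Number the rows within each a1 group (by rowid), number the b2 values,
# and give the n-th row of each group the n-th b2 value.
FILL_SQL = """
    UPDATE a
    SET a2 = (
        SELECT tb.b2
        FROM (SELECT rowid AS id,
                     ROW_NUMBER() OVER (PARTITION BY a1 ORDER BY rowid) AS rn
              FROM a) AS ta
        JOIN (SELECT b2, ROW_NUMBER() OVER (ORDER BY b2) AS rn FROM b) AS tb
          ON ta.rn = tb.rn
        WHERE ta.id = a.rowid)
    WHERE a2 IS NULL
"""

def fill_a2(a1_values, b2_values):
    """Load tables a and b, run the pairing update, return (a1, a2) sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (a1 INTEGER, a2 INTEGER, a3 INTEGER)")
    conn.execute("CREATE TABLE b (b1 INTEGER, b2 INTEGER)")
    conn.executemany("INSERT INTO a (a1) VALUES (?)", [(v,) for v in a1_values])
    conn.executemany("INSERT INTO b (b2) VALUES (?)", [(v,) for v in b2_values])
    conn.execute(FILL_SQL)
    return conn.execute("SELECT a1, a2 FROM a ORDER BY a1, a2").fetchall()

rows = fill_a2(A1_VALUES, B2_VALUES)
print(rows)
```

If an a1 group ever had more rows than table b, the extra rows would get NULL, because the subquery finds no b2 with that row number.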

You can use a query like this:
CREATE TABLE A(a1 INT ,a2 INT, a3 int)
CREATE TABLE B(b1 INT ,b2 INT )
INSERT A VALUES (2,NULL,1)
INSERT A VALUES (3,NULL,1)
INSERT A VALUES (4,NULL,1)
INSERT A VALUES (5,NULL,1)
INSERT A VALUES (2,NULL,1)
INSERT B VALUES (2,7)
INSERT B VALUES (12,7)
INSERT B VALUES (2,7)
INSERT B VALUES (2,7)
INSERT B VALUES (2,17)
INSERT B VALUES (2,70)
INSERT B VALUES (22,1)
SELECT * FROM A
UPDATE A SET a2=(SELECT TOP 1 b2 FROM B WHERE NOT EXISTS
(SELECT 1 FROM A T2 WHERE A.a1=T2.a1 AND T2.a2=B.b2 )
ORDER BY NEWID())
SELECT * FROM A
SELECT * FROM B
DROP TABLE A
DROP TABLE B
It would help if you provided CREATE and INSERT statements, so the problem can be diagnosed quickly.

Return only first item from a related group

I have a block of data like this:
RW | PK A B C D
============================
1 | 1 aa 123 x 99
2 | 2 aa 234 v 98
3 | 3 bb 321 z 11
4 | 4 bb 210 w 91
5 | 5 cc 456 y 55
How can I grab just the first item of each set (ID'd by column A), like so?
RW | A B C D
=======================
1 | aa 123 x 99
2 | bb 321 z 11
3 | cc 456 y 55
I can use GROUP BY or DISTINCT, but that's very inefficient on the data I'm looking at, while running a straight list takes less than 100 ms. Both of those options may also produce more than one instance of an item in column A, since the related values may differ.
In other words,
SELECT MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
FROM MYTABLE
is very fast (less than a second), while
SELECT MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
FROM MYTABLE
GROUP BY MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
and
SELECT DISTINCT MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
FROM MYTABLE
take much longer (minutes; I have not let them complete).
I need no aggregate functions (COUNT, SUM, etc.), just a listing, once per item. The number of occurrences per value in column A varies, so I can't just grab every x-th row.
Why don't I just run the list and use Excel or something like that to sort? I'm looking at a few million records to be returned, and I am not able to process so many records using any software that I am familiar with.

It sounds like you want something like
SELECT pk,
a,
b,
c,
d
FROM( SELECT pk,
a,
b,
c,
d,
row_number() over (partition by a order by pk asc) rnk
FROM your_table )
WHERE rnk = 1
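To see the ROW_NUMBER approach keep exactly one row per value of A, here is a minimal sketch. It runs on SQLite (3.25+ for window functions) via Python's sqlite3, using the placeholder table name `your_table` from the answer and the sample block from the question:

```python
import sqlite3

FIRST_PER_GROUP_SQL = """
    SELECT pk, a, b, c, d
    FROM (SELECT pk, a, b, c, d,
                 ROW_NUMBER() OVER (PARTITION BY a ORDER BY pk ASC) AS rnk
          FROM your_table)
    WHERE rnk = 1
    ORDER BY pk
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (pk INTEGER PRIMARY KEY, a TEXT, b INTEGER, c TEXT, d INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", [
    (1, "aa", 123, "x", 99), (2, "aa", 234, "v", 98), (3, "bb", 321, "z", 11),
    (4, "bb", 210, "w", 91), (5, "cc", 456, "y", 55),
])
first_rows = conn.execute(FIRST_PER_GROUP_SQL).fetchall()
print(first_rows)
```

The ORDER BY inside the window decides which row counts as "first"; here it is the lowest pk, as in the answer.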

Try this too:
select * from your_table where rowid in (select min(rowid) from your_table group by a);
