SQL query to get field value distribution dynamically

Is there a way to get the value distribution for a field in SQL dynamically?
I have a table with 250 fields. I would like to get the value distribution for each of these fields:
field0
value0: 10
value1: 100
value2: 30
...
valueN: X
field1
value0: 2
value1: 124
value2: 8
...
valueN: Y
....
I know that with case + sum it is possible to generate this, but then the possible values have to be put in the query in advance:
SELECT
Sum( Case When field0 = value0 Then 1 Else 0 End ) As [0]
, Sum( Case When field0 = value1 Then 1 Else 0 End ) As [1]
, Sum( Case When field0 = value2 Then 1 Else 0 End ) As [2]
, Sum( Case When field0 = value3 Then 1 Else 0 End ) As [3]
, Sum( Case When field0 = valueN Then 1 Else 0 End ) As [4]
FROM table
Is there a way to do this dynamically?

There is a way to get the result as rows (not as columns) using an aggregate function, e.g. COUNT:
SELECT field0,
COUNT(*)
FROM table
GROUP BY field0
If you want a column-oriented result as in your code, some database brands offer PIVOT functionality.
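For example, a minimal sketch on SQL Server (assuming field0 holds the values 0..4; the IN list still has to be enumerated or built as a dynamic SQL string) could look like this:
-- Sketch only: counts rows per value of field0 and returns them as columns.
SELECT [0], [1], [2], [3], [4]
FROM (SELECT field0, field0 AS val FROM [table]) AS src
PIVOT (COUNT(val) FOR field0 IN ([0], [1], [2], [3], [4])) AS p;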

With Postgres you could do something like this:
select t.name as column_name,
sum(val::int) as sum
from data d, jsonb_each_text(to_jsonb(d) - 'id') as t(name, val)
group by t.name;
The - 'id' removes the id attribute from the generated JSON. Another option to only include certain columns in the aggregation is to add a where condition:
select column_name,
sum(val::int) as sum
from (
select t.name as column_name,
t.val
from data d, jsonb_each_text(to_jsonb(d)) as t(name, val)
) t
where column_name like 'col%'
group by column_name;
With the following sample table:
create table data
(
id serial primary key,
col1 int,
col2 int,
col3 int,
col4 int,
col5 int
);
insert into data (col1, col2, col3, col4, col5)
values
(1, 2, 3, 4, 5),
(6, 7, 8, 9, 10),
(11, 12, 13, 14, 15);
The query would return:
column_name | sum
------------+----
col1        |  18
col2        |  21
col5        |  30
col4        |  27
col3        |  24
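Since the question asks for a per-value distribution (a count per distinct value) rather than a sum, a small variation of the same jsonb trick (a sketch, not tested against a 250-column table) also groups by the value:
-- Sketch: counts how often each value occurs in each column.
select t.name as column_name,
t.val as column_value,
count(*) as occurrences
from data d, jsonb_each_text(to_jsonb(d) - 'id') as t(name, val)
group by t.name, t.val
order by t.name, t.val;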

Related

How to calculate how many null values are present in each column?

I have a table A like this:
col1  col2  col3
1     0     null
null  null  null
3     null  null
null  5     1
I want an output like this in Oracle 10g:
column_name  null_count
col1         2
col2         2
col3         3
I have achieved this using UNION ALL like this:
select "col1" column_name,sum(case when col1 is null then 1 else 0 end) as null_count from A group by "col1"
union all
select "col2" column_name,sum(case when col2 is null then 1 else 0 end) as null_count from A group by "col2"
union all
select "col3" column_name,sum(case when col3 is null then 1 else 0 end) as null_count from A group by "col3";
It is working fine, but it is taking a lot of time, as there are nearly 100 UNION ALLs. I want to achieve the same output without using UNION ALL.
Is there any way to achieve this without using UNION ALL?
You can use UNPIVOT for that (I am not sure whether the ancient Oracle 10 already supported it; I haven't used that version in over a decade):
select colname, count(*) - count(val) as num_nulls
from t1
UNPIVOT include nulls
(val for colname in (col1 as 'C1',
col2 as 'C2',
col3 as 'C3'))
group by colname
order by colname;
Not sure if that is faster though.
Online example: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=4e807b8b2d8080abac36574f776dbf04
Oracle 10g doesn't support the UNPIVOT and PIVOT operators, so to do what you're after in 10g, you'd need to use a dummy table (containing the same number of rows as columns being unpivoted - in your case, that's 3), like so:
WITH your_table AS (SELECT 1 col1, 0 col2, NULL col3 FROM dual UNION ALL
SELECT NULL col1, NULL col2, NULL col3 FROM dual UNION ALL
SELECT 3 col1, NULL col2, NULL col3 FROM dual UNION ALL
SELECT NULL col1, 5 col2, 1 col3 FROM dual)
SELECT CASE WHEN dummy.id = 1 THEN 'col1'
WHEN dummy.id = 2 THEN 'col2'
WHEN dummy.id = 3 THEN 'col3'
END column_name,
COUNT(CASE WHEN dummy.id = 1 THEN CASE WHEN col1 IS NULL THEN 1 END
WHEN dummy.id = 2 THEN CASE WHEN col2 IS NULL THEN 1 END
WHEN dummy.id = 3 THEN CASE WHEN col3 IS NULL THEN 1 END
END) null_count
FROM your_table
CROSS JOIN (SELECT LEVEL ID
FROM dual
CONNECT BY LEVEL <= 3) dummy
GROUP BY dummy.id;
COLUMN_NAME NULL_COUNT
----------- ----------
col1 2
col2 2
col3 3
If you think that will take an age to write for a large number of columns, you can always write a query that will generate the bulk of the case statements yourself, e.g.:
SELECT 'when dummy.id = '||row_number() OVER (PARTITION BY owner, table_name ORDER BY column_id)||' then '''||LOWER(column_name)||'''' first_part,
'when dummy.id = '||row_number() OVER (PARTITION BY owner, table_name ORDER BY column_id)||' then case when '||column_name||' is null then 1 end' second_part
FROM all_tab_columns a
WHERE owner = ...
AND table_name = ...
-- and column_name in (...)
ORDER BY column_id;
(I used the row_number() analytic function rather than column_id because, if you're excluding some columns, the column_id values will no longer be consecutive numbers starting at 1.)

Create an array with NULL values/0 and find array length excluding null/0

I want to find the number of columns in a range in each row which has non-null and >0 value.
I currently do this using CASE WHEN statements or IF-ELSE, but the number of columns I now have to consider has increased, and with it the number of CASE statements.
So I wanted to create an array of the columns and then find the length of the array after excluding 0 and NULL values.
I tried the following code but I am getting an error.
Case 1:
SELECT [col1,col2,col3,col4,col5] from input_Table
Error: Array cannot have a null element; error in writing field
Case 2:
SELECT *,
ARRAY(SELECT col1,col2,col3,col4,col5
from input_table
WHERE col1 is not null and col2 is not null ...)
from input_Table
Error: ARRAY subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values at [2:3]
Below is a snapshot of my data
The output that I want is:
1
2
0
It would be super helpful if somebody could help me with this; I am very new to BigQuery.
I want to find the number of columns in a range in each row which has non-null and >0 value ...
Option 1
Below is for BigQuery and generic enough to work for any number of columns
SELECT *,
(SELECT COUNT(1)
FROM UNNEST(REGEXP_EXTRACT_ALL(
TO_JSON_STRING(t), r'"col\d+":(.*?)[,}]')
) value
WHERE NOT value IN ('null', '0')
) AS non_null_0_count
FROM `project.dataset.table` t
The above assumes the columns follow the pattern col1, col2, ..., colN.
You can test and play with the above using dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 col1, 0 col2, 0 col3, 0 col4, 0 col5 UNION ALL
SELECT 2, 4, 5, 0, 0, 0 UNION ALL
SELECT 3, NULL, NULL, NULL, NULL, NULL
)
SELECT *,
(SELECT COUNT(1)
FROM UNNEST(REGEXP_EXTRACT_ALL(
TO_JSON_STRING(t), r'"col\d+":(.*?)[,}]')
) value
WHERE NOT value IN ('null', '0')
) AS non_null_0_count
FROM `project.dataset.table` t
with the result:
Row  id  col1  col2  col3  col4  col5  non_null_0_count
1    1   1     0     0     0     0     1
2    2   4     5     0     0     0     2
3    3   null  null  null  null  null  0
Option 2
In case the column pattern mentioned above does not actually hold, this approach still works (see the example below); you just need to enumerate those columns within the regexp.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 abc, 0 xyz, 0 qwe, 0 asd, 0 zxc UNION ALL
SELECT 2, 4, 5, 0, 0, 0 UNION ALL
SELECT 3, NULL, NULL, NULL, NULL, NULL
)
SELECT *,
(SELECT COUNT(1)
FROM UNNEST(REGEXP_EXTRACT_ALL(
TO_JSON_STRING(t), r'"(?:abc|xyz|qwe|asd|zxc)":(.*?)[,}]')
) value
WHERE NOT value IN ('null', '0')
) AS non_null_0_count
FROM `project.dataset.table` t
with the result:
Row  id  abc   xyz   qwe   asd   zxc   non_null_0_count
1    1   1     0     0     0     0     1
2    2   4     5     0     0     0     2
3    3   null  null  null  null  null  0
Option 3
Obviously, the simplest and most straightforward option is:
#standardSQL
SELECT *,
(
SELECT COUNT(1)
FROM (
SELECT col1 col UNION ALL
SELECT col2 UNION ALL
SELECT col3 UNION ALL
SELECT col4 UNION ALL
SELECT col5
)
WHERE NOT col IS NULL AND col != 0
) AS non_null_0_count
FROM `project.dataset.table` t
One method is to simply use case -- because you know the number of columns:
select id,
(case when col1 = 0 or col1 is null then 0
when col2 = 0 or col2 is null then 1
when col3 = 0 or col3 is null then 2
when col4 = 0 or col4 is null then 3
when col5 = 0 or col5 is null then 4
else 5
end) as result
from t;
Although you can do fancy manipulations with arrays, I don't see a need for that, given that the number of columns is finite and the case expression is pretty simple.
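If the positive values are not guaranteed to sit in the leading columns, a variation along the same lines (a sketch; column names col1..col5 assumed as in the question) simply sums a 0/1 flag per column:
#standardSQL
-- Sketch: counts, per row, the columns that are non-null and greater than 0
-- (NULL > 0 evaluates to NULL, so NULLs fall through to the ELSE 0 branch).
SELECT id,
(CASE WHEN col1 > 0 THEN 1 ELSE 0 END) +
(CASE WHEN col2 > 0 THEN 1 ELSE 0 END) +
(CASE WHEN col3 > 0 THEN 1 ELSE 0 END) +
(CASE WHEN col4 > 0 THEN 1 ELSE 0 END) +
(CASE WHEN col5 > 0 THEN 1 ELSE 0 END) AS non_null_0_count
FROM `project.dataset.table`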

Pivoting rows to columns

How can I achieve the below output?
Can anyone help me out?
col_1 col_2
A 1
B 1
C 1
B 2
C 4
A 2
A 6
Output:
A B C
1 1 1
2 2 4
6
This will do the job, but it seems like quite an odd thing to want to do, so I am probably missing something?
CREATE TABLE #table (col1 CHAR(1), col2 INT);
INSERT INTO #table SELECT 'A', 1;
INSERT INTO #table SELECT 'B', 1;
INSERT INTO #table SELECT 'C', 1;
INSERT INTO #table SELECT 'B', 2;
INSERT INTO #table SELECT 'C', 4;
INSERT INTO #table SELECT 'A', 2;
INSERT INTO #table SELECT 'A', 6;
WITH Ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rank_id
FROM
#table),
Numbers AS (
SELECT 1 AS number
UNION ALL
SELECT number + 1 FROM Numbers WHERE number < 50)
SELECT
MAX(CASE WHEN col1 = 'A' THEN col2 END) AS [A],
MAX(CASE WHEN col1 = 'B' THEN col2 END) AS [B],
MAX(CASE WHEN col1 = 'C' THEN col2 END) AS [C]
FROM
Numbers n
INNER JOIN Ranked r ON r.rank_id = n.number
GROUP BY
n.number;
Results are:
A B C
1 1 1
2 2 4
6 NULL NULL
Looks like you are trying to pivot without aggregation? Here is another option:
select A, B, C from
( select col1, col2, dense_rank() over (partition by col1 order by col2) dr from #table) t
pivot
( max(t.col2) for t.col1 in (A, B, C)) pvt;
Also check this out for more examples/discussion: TSQL Pivot without aggregate function

Distinct by comparing multiple columns SQL

I have a select query
Select col1,col2,col3
from table;
The table contains following rows
col1 col2 col3
A | B | C
B | A | C
C | B | C
I need to get a distinct result that contains a single combination of A, B, C by comparing multiple columns.
The result should be something like:
col1 col2 col3
A | B | C
The order of the result rows can be different.
How can I achieve this?
Please try this out. I am not sure about your exact requirement, but based on the sample data given above I have come up with this solution:
With CTE as
(
Select MIN(col1) as col1 from MyTable
)
Select * from CTE
cross apply
(
Select MIN(col2) as col2 from MyTable
where col2 <> CTE.col1
)as a
cross apply
(
Select MIN(col3) as col3 from MyTable
where col3 not in (CTE.col1,a.col2)
)as b
SELECT *
FROM table
WHERE (col1 = 'A' AND col2 = 'B' AND col3 = 'C')
You can also go with the query below if the number of columns and the values are known.
The CASE expression is the closest thing to IF in SQL:
SELECT
CASE
WHEN (col1 = 'A' and col2 = 'B' and col3='C') or (col1 = 'C' and col2 = 'A' and col3='B') or (col1 = 'B' and col2 = 'C' and col3='A' )
THEN 1
ELSE 0
END as RESULT, *
FROM table
From the result you can take the required output by checking whether RESULT = 1 (an integer).
If you want the result as a boolean value, then CAST it, like this:
SELECT
CAST(
CASE
WHEN (col1 = 'A' and col2 = 'B' and col3='C') or (col1 = 'C' and col2 = 'A' and col3='B') or (col1 = 'B' and col2 = 'C' and col3='A' )
THEN 1
ELSE 0
END as BIT)  -- assuming SQL Server; BIT is the closest boolean-like type
as RESULT, *
FROM table
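As a usage example (a sketch only; TOP assumes SQL Server, in line with the CROSS APPLY answer above), you can wrap the first query and keep a single matching row:
-- Sketch: keep one row whose columns form the combination A, B, C per the conditions above.
SELECT TOP (1) col1, col2, col3
FROM (
SELECT col1, col2, col3,
CASE WHEN (col1 = 'A' AND col2 = 'B' AND col3 = 'C')
OR (col1 = 'C' AND col2 = 'A' AND col3 = 'B')
OR (col1 = 'B' AND col2 = 'C' AND col3 = 'A')
THEN 1 ELSE 0 END AS RESULT
FROM [table]
) x
WHERE x.RESULT = 1;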

SQL: return only those columns with different data

I have two rows from a table that has many columns. How do I return only those columns where the value for row1 does not equal the value for row2?
I'm using Oracle 11.1.0.07
~~ Edit: clarification ~~
Example:
So I've got a table with rows:
1 a b c d e f g h i j k l
2 a x c d e x g h y j k l
3 a b x d e x g h x y k z
I want to return rows where id (first column) is 1 or 3, only those columns that are different. So:
1 c f i j l
3 x x x y z
with column names.
In reality, the table I'm pulling from has 223007 rows, and 40 columns. The above is a simplified example. There are two rows (one each for primary key values) that I'm wanting to compare.
First, the SQL language was not designed for dynamic column generation. For that, you need to write dynamic SQL which should be done in a middle-tier or reporting component.
Second, if what you seek is to compare two specific rows, then the simplest solution would probably be to return those rows and analyze them in a middle-tier component. However, if you accept that we must return all columns and you insist on doing this in SQL, this is one solution:
With Inputs As
(
Select 1 As Col1,'a' As Col2,'b' As Col3,'c' As Col4,'d' As Col5,'e' As Col6,'f' As Col7,'g' As Col8,'h' As Col9,'i' As Col10,'j' As Col11,'k' As Col12,'l' As Col13
Union All Select 2,'a','x','c','d','e','x','g','h','y','j','k','l'
Union All Select 3,'a','b','x','d','e','x','g','h','x','y','k','z'
)
, TransposedInputs As
(
Select Col1, 2 As ColNum, Col2 As Value From Inputs
Union All Select Col1, 3, Col3 From Inputs
Union All Select Col1, 4, Col4 From Inputs
Union All Select Col1, 5, Col5 From Inputs
Union All Select Col1, 6, Col6 From Inputs
Union All Select Col1, 7, Col7 From Inputs
Union All Select Col1, 8, Col8 From Inputs
Union All Select Col1, 9, Col9 From Inputs
Union All Select Col1, 10, Col10 From Inputs
Union All Select Col1, 11, Col11 From Inputs
Union All Select Col1, 12, Col12 From Inputs
Union All Select Col1, 13, Col13 From Inputs
)
, UniqueValues As
(
Select Min(Col1) As Col1, ColNum, Value
From TransposedInputs
Where Col1 In(1,3)
Group By ColNum, Value
Having Count(*) = 1
)
Select Col1
, Min( Case When ColNum = 2 Then Value End ) As Col2
, Min( Case When ColNum = 3 Then Value End ) As Col3
, Min( Case When ColNum = 4 Then Value End ) As Col4
, Min( Case When ColNum = 5 Then Value End ) As Col5
, Min( Case When ColNum = 6 Then Value End ) As Col6
, Min( Case When ColNum = 7 Then Value End ) As Col7
, Min( Case When ColNum = 8 Then Value End ) As Col8
, Min( Case When ColNum = 9 Then Value End ) As Col9
, Min( Case When ColNum = 10 Then Value End ) As Col10
, Min( Case When ColNum = 11 Then Value End ) As Col11
, Min( Case When ColNum = 12 Then Value End ) As Col12
, Min( Case When ColNum = 13 Then Value End ) As Col13
From UniqueValues
Group By Col1
Results:
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10 | Col11 | Col12 | Col13
1 | NULL | NULL | c | NULL | NULL | f | NULL | NULL | i | j | NULL | l
3 | NULL | NULL | x | NULL | NULL | x | NULL | NULL | x | y | NULL | z
If you're trying to transpose or pivot your row1 and row2 into columns, then these questions might help you:
Oracle PIVOT, twice?
Oracle SQL pivot query
etc (Google for Oracle PIVOT Query)
After pivoting, you can select only those tuples that have row1_pivoted <> row2_pivoted
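For that final comparison step, a sketch (pivoted_rows, attr_name, row1_val and row2_val are hypothetical names for the output of the pivot) could keep only the mismatching attributes with a NULL-safe check:
-- Sketch only: DECODE treats two NULLs as equal, so NULL vs NULL is not reported as a difference.
SELECT attr_name, row1_val, row2_val
FROM pivoted_rows
WHERE DECODE(row1_val, row2_val, 0, 1) = 1;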
Hmm. My first stab at an answer was wrong when I re-read the question. So, for clarification: you've got some rows/values
a b c
d e f
a b c
and you'd like only the 'd e f' row returned, because it doesn't have a duplicate row elsewhere?
The number of columns in the result set can't be dynamic (without resorting to dynamic SQL).
You might be interested in the Unpivot operator. That would let you return the columns as rows.
I haven't experimented with it myself yet, so unfortunately I'm unable to help you with it :/
Edit
I wanted to give manual pivoting a shot :)
select *
from inputs;
ID C1 C2 C3 C4 C5 C6
--- -- -- -- -- -- --
1 a b c d e f
2 a x c d e x
3 a b x d e x
with unpivoted as(
select id, 'c1' as cn, c1 as cv from inputs union all
select id, 'c2' as cn, c2 as cv from inputs union all
select id, 'c3' as cn, c3 as cv from inputs union all
select id, 'c4' as cn, c4 as cv from inputs union all
select id, 'c5' as cn, c5 as cv from inputs union all
select id, 'c6' as cn, c6 as cv from inputs
)
select cn
,max(case when id = 1 then cv end) as id1
,max(case when id = 3 then cv end) as id3
from unpivoted
where id in(1,3)
group
by cn
having count(distinct cv) = 2;
CN ID1 ID3
-- --- ---
c3 c x
c6 f x
The above works by creating one row for each column and ID (2 * 6 = 12 rows).
Then I group by the column name (assigned as a literal).
I will always get 6 groups (one for each column). In each group I will have exactly two rows (one for each selected ID).
In the having clause, I count the number of unique values for the column. If the rows have the same value, then the number of unique values = 1. Else we have a mismatch.
Note 1. id in(x,y) is pushed into the view, so we are not selecting the entire table.
Note 2. This cannot be extended into comparing more than 2 rows.
Note 3. This does not deal with NULLS in either column
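On 11g the same comparison could probably be written with the UNPIVOT operator instead of the manual UNION ALL; a sketch (untested, using the inputs table above, and sharing the NULL caveat from note 3):
-- Sketch only: mirrors the manual unpivot with the 11g UNPIVOT operator.
select cn
,max(case when id = 1 then cv end) as id1
,max(case when id = 3 then cv end) as id3
from inputs
unpivot include nulls (cv for cn in (c1 as 'c1', c2 as 'c2', c3 as 'c3',
c4 as 'c4', c5 as 'c5', c6 as 'c6'))
where id in(1,3)
group by cn
having count(distinct cv) = 2;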