Gathering Database table information? - sql

I am trying to find the information below for a table, in tabular format. I can already get the row count, column names, and attributes (data types).
No.Of columns
No.Of Rows Count
Column name
Attribute (DataType)
Min Value
Max Value
Non null count
Distinct count of the column
Any idea?

Many of these items can be found in the INFORMATION_SCHEMA.COLUMNS view, and the rest can be found by querying the table itself. You say you want this data in a tabular format, but many of the items do not 'fit' together. Can you provide a sample of what the result set should look like?
-- No.Of columns
SELECT COUNT(*)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'your_table'
-- No.Of Rows Count
SELECT COUNT(*)
FROM your_table
--Column name
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'your_table'
--Attribute (DataType)
SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'your_table'
--Min Value
SELECT MIN(column_1)
FROM your_table
--Max Value
SELECT MAX(column_1)
FROM your_table
--Non null count
SELECT SUM(CASE WHEN column_1 IS NOT NULL THEN 1 ELSE 0 END) AS not_null_count
FROM your_table
--Distinct count of the column
SELECT COUNT(DISTINCT column_1)
FROM your_table
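The per-column items (row count, min, max, non-null count, distinct count) do fit together in a single row per column. The following is a minimal sketch of that idea, using Python's sqlite3 module with an in-memory database as a stand-in for the real server. The helper name profile_column and the sample data are assumptions made for illustration; your_table and column_1 are the placeholders used above.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return (total_rows, min, max, non_null_count, distinct_count) for one column.

    The table and column names are interpolated into the SQL text because
    identifiers cannot be bound as parameters, so they must be trusted input.
    """
    sql = f"""
        SELECT COUNT(*)                   AS total_rows,
               MIN("{column}")            AS min_value,
               MAX("{column}")            AS max_value,
               COUNT("{column}")          AS non_null_count,  -- COUNT(col) skips NULLs
               COUNT(DISTINCT "{column}") AS distinct_count   -- NULLs are not counted
        FROM "{table}"
    """
    return conn.execute(sql).fetchone()


# Hypothetical sample data: four rows, one NULL, one duplicate value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column_1 INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?)", [(3,), (1,), (3,), (None,)])

print(profile_column(conn, "your_table", "column_1"))
```

Running the helper once per column name taken from INFORMATION_SCHEMA.COLUMNS would give one row per column, which is probably the closest thing to the tabular layout asked for.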

Related

Oracle SQL: query to calculate percentage of non-null values for each column in a table

I know I can use this query to get all column names for a given table:
select column_name from all_tab_columns where table_name='my_table';
And I can use this query to calculate the percentage of non-null values for a given column in a table:
select count(col_1) / count(*), count(col_2) / count(*)
from my_table
But I want to combine these two queries to get the percentage of non-null values for all columns in a given table (without having to manually type out the column names for each table)
desired output for a given table:
column_name, completeness
col_1, 0.8
col_2, 1.0
col_3, 0.0
Is it possible to do this with only Select statements (no PL/SQL loops)?
The number of null values per column is already available in the data dictionary view all_tab_columns (column NUM_NULLS), once statistics have been gathered:
CREATE TABLE t AS SELECT * FROM all_objects;
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(NULL, 'T');
SELECT column_name, num_nulls FROM all_tab_columns WHERE table_name='T';
COLUMN_NAME NUM_NULLS
OWNER 0
OBJECT_NAME 0
SUBOBJECT_NAME 66502
OBJECT_ID 0
DATA_OBJECT_ID 62642
...
GATHER_TABLE_STATS analyzes the table and stores the number of null values. If you (or somebody else) insert, delete, or update rows in the table afterwards, the statistics snapshot is of course no longer exact.
To get your "completeness" ratio, you need not only num_nulls but also the total number of rows in the table. For that, join to the view ALL_TABLES (column NUM_ROWS), subtract NUM_NULLS from NUM_ROWS, and divide by NUM_ROWS:
SELECT table_name, column_name,
(t.num_rows - c.num_nulls) / NULLIF(t.num_rows, 0) AS completeness
FROM all_tables t
JOIN all_tab_columns c
USING (owner, table_name);
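If the statistics may be stale, the ratio can also be computed from the data itself by generating one SELECT per column from the catalog. The sketch below shows that approach with Python's sqlite3 module; SQLite's PRAGMA table_info plays the role of all_tab_columns here, and the function name completeness and the sample rows are assumptions for illustration. It is not the Oracle dictionary query, only the same idea in a form that can be run anywhere.

```python
import sqlite3


def completeness(conn, table):
    """Return [(column_name, fraction_of_non_null_rows), ...] for every column.

    Column names are read from the catalog, then one SELECT per column is
    generated and the pieces are combined with UNION ALL. The table name is
    interpolated into SQL text, so it must come from a trusted source.
    """
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    parts = [
        f"SELECT '{c}' AS column_name, "
        f'1.0 * COUNT("{c}") / NULLIF(COUNT(*), 0) AS completeness '
        f'FROM "{table}"'
        for c in columns
    ]
    return conn.execute(" UNION ALL ".join(parts)).fetchall()


# Hypothetical data chosen to reproduce the desired output from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER, col_3 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, None), (2, 2, None), (3, 3, None), (4, 4, None), (None, 5, None)],
)

for column_name, ratio in completeness(conn, "my_table"):
    print(column_name, ratio)
```

The NULLIF guard makes the ratio NULL instead of raising a division error when the table is empty, mirroring the NULLIF in the dictionary query above.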

How to fetch column names count from Tables based on columns passed into IN clause

I am trying to fetch column count from all_tab_columns for particular columns passed into IN clause like:
select count(COLUMN_NAME)
from ALL_TAB_COLUMNS
WHERE owner='SA' and COLUMN_NAME IN ('CASE_REPORTER2SITE', 'PRIMARY2BUS_ORG')
GROUP BY COLUMN_NAME
HAVING COLUMN_NAME IN ('CASE_REPORTER2SITE', 'PRIMARY2BUS_ORG')
ORDER BY DECODE (COLUMN_NAME, 'CASE_REPORTER2SITE', 1, 'PRIMARY2BUS_ORG', 2)
Here, when both columns exist in the database, it gives me the count of each column in 2 rows.
RESULT:
COUNT(COLUMN_NAME)
------------------
2
4
But, when I pass one existing and 1 non existing column like:
select count(COLUMN_NAME)
from ALL_TAB_COLUMNS
WHERE owner='SA' and COLUMN_NAME IN ('CASE_XYZ', 'PRIMARY2BUS_ORG')
GROUP BY COLUMN_NAME
HAVING COLUMN_NAME IN ('CASE_XYZ', 'PRIMARY2BUS_ORG')
ORDER BY DECODE (COLUMN_NAME, 'CASE_XYZ', 1, 'PRIMARY2BUS_ORG', 2)
(Assume CASE_XYZ does not exist).
It gives me count result in 1 row.
RESULT:
COUNT(COLUMN_NAME)
------------------
4
Expected Result:
COUNT(COLUMN_NAME)
------------------
0
4
How to get the count as 0 for the particular column if it does not exist?
You can solve this using a left join. With a small subquery that lists the names, this also eliminates the IN and the DECODE() in the ORDER BY.
I would phrase the query as:
with cols as (
select 'CASE_REPORTER2SITE' as col, 1 as ordering from dual union all
select 'PRIMARY2BUS_ORG', 2 as ordering from dual
)
select cols.col, count(atc.column_name)
from cols left join
all_tab_columns atc
on cols.col = atc.column_name and atc.owner = 'SA'
group by cols.col
order by max(cols.ordering);
Note: I also included the column name in the output, because I think that is a good practice.
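To see why the left join produces the 0 row, here is a small runnable sketch using Python's sqlite3 module. The all_tab_columns table created here is a hypothetical stand-in with made-up rows, not the Oracle dictionary view, and SQLite needs no "from dual". The query shape is otherwise the same as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the Oracle view, with invented rows for illustration only.
conn.execute(
    "CREATE TABLE all_tab_columns (owner TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO all_tab_columns VALUES (?, ?, ?)",
    [
        ("SA", "T1", "PRIMARY2BUS_ORG"),
        ("SA", "T2", "PRIMARY2BUS_ORG"),
        ("SA", "T3", "PRIMARY2BUS_ORG"),
        ("SA", "T4", "PRIMARY2BUS_ORG"),
        ("OTHER", "T9", "CASE_XYZ"),  # wrong owner, so it must not be counted
    ],
)

QUERY = """
    WITH cols(col, ordering) AS (
        SELECT 'CASE_XYZ', 1 UNION ALL
        SELECT 'PRIMARY2BUS_ORG', 2
    )
    SELECT cols.col, COUNT(atc.column_name)
    FROM cols
    LEFT JOIN all_tab_columns atc
           ON cols.col = atc.column_name
          AND atc.owner = 'SA'          -- filter in ON, not WHERE, to keep unmatched names
    GROUP BY cols.col
    ORDER BY MAX(cols.ordering)
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Putting the owner filter in the ON clause matters: moved to a WHERE clause, it would discard the unmatched CASE_XYZ row and the 0 would disappear again.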
The only way I can think of is to join all_tab_columns with a list of names built from dual, using a left outer join to keep the names that have no match, and to return 0 where the count is null, like this:
select t.colname, case when s.cnt is null then 0 else s.cnt end as cnt from (
SELECT 'CASE_XYZ' as colname from dual
union
SELECT 'PRIMARY2BUS_ORG' from dual) t
left outer join (
select column_name, count(COLUMN_NAME) as cnt from ALL_TAB_COLUMNS
WHERE owner='SA' and COLUMN_NAME IN ('CASE_XYZ', 'PRIMARY2BUS_ORG')
GROUP BY COLUMN_NAME) s
on t.colname = s.column_name
ORDER BY DECODE (t.colname, 'CASE_XYZ', 1, 'PRIMARY2BUS_ORG', 2)

SQL query to determine that values in a column are unique

How to write a query to just determine that the values in a column are unique?
Try this:
SELECT CASE WHEN count(distinct col1)= count(col1)
THEN 'column values are unique' ELSE 'column values are NOT unique' END
FROM tbl_name;
Note: This only works if 'col1' does not have the data type 'ntext' or 'text'. If you have one of these data types, use 'distinct CAST(col1 AS nvarchar(4000))' (or similar) instead of 'distinct col1'.
select count(distinct column_name), count(column_name)
from table_name;
If the # of unique values is equal to the total # of values, then all values are unique.
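The comparison above can be checked quickly with a small sketch in Python's sqlite3 module. The helper name is_unique and the sample tables are assumptions for illustration. Note the third case: both COUNT(col) and COUNT(DISTINCT col) skip NULLs, so repeated NULLs do not count as duplicates under this test.

```python
import sqlite3


def is_unique(conn, table, column):
    """True when every non-NULL value in the column occurs exactly once.

    Identifiers are interpolated into the SQL text, so they must be trusted.
    """
    sql = f'SELECT COUNT(DISTINCT "{column}") = COUNT("{column}") FROM "{table}"'
    return bool(conn.execute(sql).fetchone()[0])


def make_table(conn, name, values):
    """Create a one-column table holding the given values (demo helper)."""
    conn.execute(f'CREATE TABLE "{name}" (col1 INTEGER)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?)', [(v,) for v in values])


conn = sqlite3.connect(":memory:")
make_table(conn, "all_different", [1, 2, 3])
make_table(conn, "with_duplicate", [1, 2, 2])
make_table(conn, "with_nulls", [1, None, None])

for name in ("all_different", "with_duplicate", "with_nulls"):
    print(name, is_unique(conn, name, "col1"))
```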
IF NOT EXISTS (
SELECT
column_name
FROM
your_table
GROUP BY
column_name
HAVING
COUNT(*)>1
)
PRINT 'All are unique'
ELSE
PRINT 'Some are not unique'
If you want to list those that aren't unique, just take the inner query and run it. HTH.
With the following query, you not only see whether your columns are unique, you also see which combinations are the most non-unique. Furthermore, because a unique key still shows up with frequency 1, you know your results are good and not, for example, simply missing; that is less clear when using a HAVING clause.
SELECT Col1, Col2, COUNT(*) AS Freq
FROM Table
GROUP BY Col1, Col2
ORDER BY Freq DESC
Are you trying to return only distinct values of a column? If so, you can use the DISTINCT keyword. The syntax is:
SELECT DISTINCT column_name,column_name
FROM table_name;
If you want to check if all the values are unique and you care about NULL values, then do something like this:
select (case when count(distinct column_name) = count(column_name) and
(count(column_name) = count(*) or count(column_name) = count(*) - 1)
then 'All Unique'
else 'Duplicates'
end)
from table t;
select (case when count(distinct column1 ) = count(column1)
then 'Unique'
else 'Duplicates'
end)
from table_name
As I understand it, you want to know which values in a column are unique. SELECT DISTINCT does not solve that, because it lists each value once whether or not the value is actually unique.
A simple solution as follows:
SELECT COUNT(column_name), column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(column_name) = 1;
This code returns the values that occur exactly once:
SELECT code FROM #test
group by code
having count(distinct code)= count(code)
It returns 14, which is the only unique value.
Use the DISTINCT keyword inside a COUNT aggregate function as shown below:
SELECT COUNT(DISTINCT column_name) AS some_alias FROM table_name
The above query will give you the count of distinct values in that column.

T-SQL Writing a Count Statement to find two values

I am trying to get two columns to appear. I have combined a number of tables with a union, so they now appear as one table.
I now need to do a summary count of one column in this combined table.
This column contains two values, so I need a count of text value 1 and a count of text value 2.
select count (column_name) as column_name
FROM table name
where column_name = 'value1'
But I am not sure how to add value 2 to this statement. Any help would be great. Much appreciated.
You can use pivot, but I think conditional aggregation is easier in this case:
select sum(case when column_name = 'value1' then 1 else 0 end) as value1,
sum(case when column_name = 'value2' then 1 else 0 end) as value2
from table name;
If you can live with the values on two rows instead of in two columns, use group by:
select column_name, count(*)
from table name
group by column_name;
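The difference between the two shapes (one row with two columns versus one row per value) is easy to see on a few rows. This is a minimal sketch in Python's sqlite3 module; the table name combined and its contents are hypothetical stand-ins for the unioned table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE combined (column_name TEXT)")
conn.executemany(
    "INSERT INTO combined VALUES (?)",
    [("value1",), ("value2",), ("value1",), ("other",), ("value1",), ("value2",)],
)

# Conditional aggregation: one row, one output column per value of interest.
as_columns = conn.execute(
    """
    SELECT SUM(CASE WHEN column_name = 'value1' THEN 1 ELSE 0 END) AS value1,
           SUM(CASE WHEN column_name = 'value2' THEN 1 ELSE 0 END) AS value2
    FROM combined
    """
).fetchone()

# GROUP BY: one row per distinct value, including values you did not ask for.
as_rows = conn.execute(
    """
    SELECT column_name, COUNT(*)
    FROM combined
    GROUP BY column_name
    ORDER BY column_name
    """
).fetchall()

print(as_columns)
print(as_rows)
```

The GROUP BY form also reports the 'other' value; adding a WHERE column_name IN ('value1', 'value2') before the GROUP BY restricts it to the two values.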
I am not sure exactly what you want, but as far as I understand it, I think this will help:
select
Sum ( case when column_name = 'value1' then 1 else 0 end) as CountValue1,
Sum ( case when column_name = 'value2' then 1 else 0 end) as CountValue2
FROM table name
select column_name, count (*)
FROM
(
select column_name from table1
union all
select column_name from table2
) src
where column_name in ( 'value1' ,'value2')
group by column_name

Getting the first and/or last record(s) in a column

How do the following SQL statements rank from most to least efficient?
NOTE: Treat the syntax as dialect-agnostic (i.e. use TOP 1 in the proper place instead of LIMIT 1). Assume that table_name is the name of the table, column_name is the name of a column, id is the name of a column with a primary key, and the table has tens of thousands of records.
SELECT FIRST(column_name) FROM table_name
SELECT column_name FROM table_name LIMIT 1
SELECT column_name FROM table_name WHERE id=1 --assumes the first id is 1
SELECT column_name FROM table_name WHERE id=(SELECT MIN(id) FROM table_name)
AND
SELECT LAST(column_name) FROM table_name
SELECT column_name FROM table_name ORDER BY id DESC LIMIT 1
SELECT column_name FROM table_name WHERE id=(SELECT COUNT(id) FROM table_name) --assumes no values have been skipped
SELECT column_name FROM table_name WHERE id=(SELECT MAX(id) FROM table_name)
I don't know how to benchmark these statements but my guess is 2, 3, 1, 4 for getting the first record and 3, 2, 1, 4 for getting the last record.
I am guessing, but FIRST and LAST are grouping (aggregate) operations, so to get their results the query engine would need to collect the data first.
The others just need to return the first row they find.
If there's an index on 'id', ordering by it in descending order is not very expensive.
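One way to benchmark such statements is simply to time them against a table of realistic size. The sketch below does that with Python's sqlite3 and timeit modules on 50,000 generated rows. It is only an illustration under stated assumptions: SQLite has no FIRST or LAST aggregate, so only the ORDER BY ... LIMIT and MIN/MAX subquery variants are covered, the data is invented, and timings on another engine or table may rank differently.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 50001)),
)

QUERIES = {
    "first_limit": "SELECT column_name FROM table_name ORDER BY id LIMIT 1",
    "first_min": "SELECT column_name FROM table_name "
                 "WHERE id = (SELECT MIN(id) FROM table_name)",
    "last_limit": "SELECT column_name FROM table_name ORDER BY id DESC LIMIT 1",
    "last_max": "SELECT column_name FROM table_name "
                "WHERE id = (SELECT MAX(id) FROM table_name)",
}


def run(sql):
    """Execute a query and return the single value it yields."""
    return conn.execute(sql).fetchone()[0]


# Time 1000 executions of each statement; the numbers are machine dependent.
for label, sql in QUERIES.items():
    seconds = timeit.timeit(lambda: run(sql), number=1000)
    print(f"{label:12s} {seconds:.4f}s -> {run(sql)}")
```

Because id is the primary key, all four statements can be answered from the index, so the differences measured this way tend to be small; the engine's execution plan output is the more reliable guide.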