How to get unique values from each column based on a condition? - sql

I have been trying to find an optimal solution to select unique values from each column. My problem is that I don't know the column names in advance, since different tables have different numbers of columns. So first I have to find the column names, which I can do with the query below:
select column_name from information_schema.columns
where table_name='m0301010000_ds' and column_name like 'c%'
Sample output for column names:
c1, c2a, c2b, c2c, c2d, c2e, c2f, c2g, c2h, c2i, c2j, c2k, ...
Then I would use the returned column names to get the unique/distinct values in each column, not just distinct rows.
I know the simplest (and lousy) way is to write select distinct column_name from table for every single column (around 20-50 times), which is very time consuming. Since I can't use more than one DISTINCT per column in a single query, I am stuck with this old-school solution.
I am sure there must be a faster and more elegant way to achieve this; I just couldn't figure out how. I would really appreciate any help with this.

You can't just return rows, since distinct values don't go together any more.
You could return arrays, which is simpler than you might expect:
SELECT array_agg(DISTINCT c1)  AS c1_arr
     , array_agg(DISTINCT c2a) AS c2a_arr
     , array_agg(DISTINCT c2b) AS c2b_arr
     , ...
FROM   m0301010000_ds;
This returns distinct values per column. One array (possibly big) for each column. All connections between values in columns (what used to be in the same row) are lost in the output.
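For illustration, with a tiny made-up table (demo and its values are hypothetical), the result is a single row of arrays:

CREATE TEMP TABLE demo (c1 int, c2a text);
INSERT INTO demo VALUES (1, 'x'), (1, 'y'), (2, 'x');

SELECT array_agg(DISTINCT c1)  AS c1_arr
     , array_agg(DISTINCT c2a) AS c2a_arr
FROM   demo;
-- c1_arr: {1,2}   c2a_arr: {x,y}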
Build SQL automatically
CREATE OR REPLACE FUNCTION f_build_sql_for_dist_vals(_tbl regclass)
  RETURNS text AS
$func$
SELECT 'SELECT ' || string_agg(format('array_agg(DISTINCT %1$I) AS %1$I_arr'
                                     , attname)
                              , E'\n     ,' ORDER BY attnum)
    || E'\nFROM ' || _tbl
FROM   pg_attribute
WHERE  attrelid = _tbl       -- valid, visible table name
AND    attnum >= 1           -- exclude system columns (tableoid & friends)
AND    NOT attisdropped      -- exclude dropped columns
$func$ LANGUAGE sql;
Call:
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds');
Returns an SQL string as displayed above.
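To execute the generated statement directly, one convenient option (assuming psql 9.6 or later) is \gexec, which runs each value of the query result as a new statement:

SELECT f_build_sql_for_dist_vals('public.m0301010000_ds')\gexec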
I use the system catalog pg_attribute instead of the information schema. And the object identifier type regclass for the table name. More explanation in this related answer:
PLpgSQL function to find columns with only NULL values in a given table

If you need this in "real time", you won't be able to achieve it with SQL that has to do a full table scan.
I would advise you to create a separate table containing the distinct values for each column (initialized with the SQL from @Erwin Brandstetter ;) and maintain it using a trigger on the original table.
Your new table will have one column per field. The number of rows will equal the maximum number of distinct values found in any one field.
On insert: for each maintained field, check whether the value is already in the list. If not, add it.
On update: for each maintained field whose old value differs from the new value, check whether the new value is already in the list; if not, add it. For the old value, check whether any other row still has it, and if not, remove it from the list (set the field to null).
On delete: for each maintained field, check whether any other row still has the value, and if not, remove it from the list (set the field to null).
This way the load is mainly moved to the trigger, and queries on the value-list table will be super fast.
P.S.: Make sure to run all the SQL used by the trigger through EXPLAIN, to verify it uses the best possible index and execution plan. For updates/deletions, just check whether the old value still exists elsewhere (LIMIT 1). A sketch of the insert path follows below.
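To make the trigger idea concrete, here is a minimal sketch of the insert path for a single maintained column. The source table t(c1 int), the value-list table t_dist_vals(c1 int), and all other names are assumptions; the update/delete paths described above would need their own branches:

CREATE OR REPLACE FUNCTION trg_maintain_dist_vals()
  RETURNS trigger AS
$$
BEGIN
   -- add the new value to the list only if it is not there yet
   IF NEW.c1 IS NOT NULL AND NOT EXISTS (
         SELECT 1 FROM t_dist_vals WHERE c1 = NEW.c1) THEN
      INSERT INTO t_dist_vals (c1) VALUES (NEW.c1);
   END IF;
   RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_dist_vals_ins
AFTER INSERT ON t
FOR EACH ROW EXECUTE PROCEDURE trg_maintain_dist_vals();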

Related

How to subscript a Postgres column

I have a Postgres query:
SELECT main
FROM (
   SELECT CASE WHEN 1=1 THEN (col_a, col_b) END AS main
   FROM   "table1"
   LIMIT  100) inner_t
Which returns a single column of values in the format (value_a, value_b) in each row. I want the outer query to format those values so that all the value_a's and value_b's are in their own separate columns.
Is there an easy way to do this?
You can abuse row_to_json to do this, but it is probably best to avoid anonymous record types in the first place.
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
   SELECT CASE WHEN 1=1 THEN (col_a, col_b) END AS main
   FROM   "table1"
   LIMIT  100) inner_t
To give a concrete example (after running pgbench -i):
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
   SELECT CASE WHEN 1=1 THEN (aid, bid) END AS main
   FROM   pgbench_accounts
   LIMIT  100) inner_t;
But it only works in v10 and up.
This is more of an explanation than an actual answer. But it won't fit into a comment.
The thing is, SQL is a strictly typed language. Postgres demands to know the number and data types in the SELECT list at call time. The *-expansion in SELECT * FROM .. is based on registered types. Postgres knows the columns of a table because the structure is saved in the catalog tables.
The expression nested in your construct (col_a, col_b) is short for ROW(col_a, col_b) and a ROW constructor creates an anonymous record. The manual:
By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type — either the row type of a table, or a composite type created with CREATE TYPE AS.
Postgres does not know how to expand an anonymous record. *-expansion does not work.
You could cast like the manual says. But that's only an option if the type is stable, i.e. you always put in the same number of columns with the same data types. And that still would not preserve column names.
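To illustrate the cast the manual mentions, assuming col_a is an integer and col_b is text (the type name ab is made up):

CREATE TYPE ab AS (col_a int, col_b text);

SELECT (main).col_a, (main).col_b
FROM (
   SELECT CASE WHEN 1=1 THEN (col_a, col_b)::ab END AS main
   FROM   table1
   LIMIT  100) inner_t;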
So, for the best solution, first define your requirements:
Obviously you want to preserve column values.
Do you also want to preserve column names?
Do you also want to preserve column types?
And:
Is the number of columns in the expression always the same?
Are data types always the same?
Is the CASE condition stable, or based on other columns?
If the true aim of the game is to fit multiple values in a single CASE expression and you only care about the values, create a text array instead:
SELECT main[1] AS col_a, main[2] AS col_b
FROM (
SELECT CASE WHEN true THEN ARRAY[col_a::text, col_b::text] END AS main
FROM table1
LIMIT 100
) inner_t;
You lose name and type. You can cast and add aliases if you know name & type.
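For example, if col_a is known to be an integer and col_b text (assumed here), you can get typed, named columns back:

SELECT main[1]::int AS col_a, main[2] AS col_b
FROM (
   SELECT CASE WHEN true THEN ARRAY[col_a::text, col_b::text] END AS main
   FROM   table1
   LIMIT  100
   ) inner_t;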
Else you have to describe your use case more closely - in the question.
Try the query below. Note that the anonymous record has to be cast to text before split_part can be applied, and the surrounding parentheses trimmed off (this is fragile if the values themselves contain commas):
SELECT btrim(split_part(main::text, ',', 1), '(') AS val1
     , btrim(split_part(main::text, ',', 2), ')') AS val2
FROM (
   SELECT CASE WHEN 1=1 THEN (col_a, col_b) END AS main
   FROM   "table1"
   LIMIT  100) inner_t

SQL - conditionally select a column if exists

I need to select a column only if it exists in the table; otherwise it can be set to null.
Sample table below; let's say the marks column is not necessarily there, so it needs to be checked whether it exists.
Table1:
name marks
joe 10
john 11
mary 13
Query:
select
name,
marks if it exists else null as marks1 -- pseudo code
from
table1
What should go on that line to select marks?
SQL doesn't permit that. Your result set has two options:
Static inclusion of explicitly listed columns
Everything from a table or subquery, through column-expansion with * and tbl.*
Perhaps this will suit your needs: SELECT * FROM table1; you'll always get that column, if it exists.
try this (note that COL_LENGTH is SQL Server syntax):
IF COL_LENGTH('your_table_name','column_name_you_want_to_select') IS NULL
BEGIN
   -- the column does not exist or permission is denied
END
ELSE
   -- do whatever you want
It is possible to achieve this in PostgreSQL using JSON. Consider the following SQL query:
SELECT c.relname, c.relkind, c.relispartition
FROM pg_class c
WHERE c.relkind IN ('r','p') AND
c.relnamespace=(SELECT oid FROM pg_namespace WHERE nspname='public')
In PostgreSQL 10+, that will show you the names of all the tables in public schema, including whether they are partitioned and if so whether the table is the partitioned table or one of the partitions of it. However, if you try to run the same query on PostgreSQL 9.6 or earlier, it will fail since the relispartition column does not exist on the pg_class table prior to PostgreSQL 10.
An obvious solution would be to dynamically generate the SQL based on a condition, or have two different versions of the SQL. However, suppose you don't want to do that, you want to have a single query which works on both versions – in other words, you want to conditionally select the relispartition column if it exists.
The core SQL language does not have any facility to conditionally select a column, but it is achievable in PostgreSQL using the row_to_json function, as follows:
SELECT c.relname, c.relkind,
(row_to_json(c)->>'relispartition')::boolean AS relispartition
FROM pg_class c
WHERE c.relkind IN ('r','p') AND
c.relnamespace=(SELECT oid FROM pg_namespace WHERE nspname='public')
If you try running that, you will find on PostgreSQL 10+ the relispartition column is returned as true/false, whereas in pre-10 versions it is NULL. You could make it return false instead of NULL in pre-10 versions by doing COALESCE((row_to_json(c)->>'relispartition')::boolean,false).
What this is doing, is row_to_json(c) turns all the data of the row into JSON. Next, ->>'relispartition' selects the value of the relispartition JSON object key as text, which will be the same as the value of the relispartition column; if there is no such key in the JSON, the result of that will be NULL. Then, ::boolean converts the string value true/false back into a PostgreSQL boolean value. (If your column is of some other type, use the appropriate cast for the type of your column.)
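A quick illustration of the mechanism on a made-up one-row subquery:

SELECT row_to_json(t)                    -- {"a":1,"b":true}
     , (row_to_json(t)->>'b')::boolean   -- true
FROM  (SELECT 1 AS a, true AS b) t;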
(Obviously this approach will not work in Postgres versions too old to have the necessary JSON support. I have tested that it works in Postgres 9.4; while I haven't tested it in Postgres 9.3, it probably works there. However, I would not expect it to work in 9.2 or earlier: the ->> operator was added in 9.3, and the JSON type and row_to_json function were added in 9.2. That said, few people should need to support those old unsupported versions – 9.3 was released in 2013, and support for 9.2 ended in 2017.)
Try this (IF ... END IF only works in a procedural context such as a PL/pgSQL function, not in plain SQL; a runnable sketch follows below):
IF EXISTS( SELECT 1
           FROM information_schema.columns
           WHERE table_name='your_table' AND column_name='your_column') THEN
   SELECT your_column AS some_column
ELSE
   SELECT NULL AS some_column
END IF;
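A runnable PL/pgSQL version of the same idea, as a sketch (the function name is made up; table and column names follow the question's sample, and marks is assumed to be an integer):

CREATE OR REPLACE FUNCTION select_with_optional_marks()
  RETURNS TABLE (name text, marks1 int) AS
$$
BEGIN
   IF EXISTS (SELECT 1 FROM information_schema.columns
              WHERE table_name = 'table1' AND column_name = 'marks') THEN
      -- the column exists: dynamic SQL, so the reference is only parsed now
      RETURN QUERY EXECUTE 'SELECT name::text, marks::int FROM table1';
   ELSE
      RETURN QUERY SELECT t.name::text, NULL::int FROM table1 t;
   END IF;
END
$$ LANGUAGE plpgsql;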
Replying to an old question yet again, but here's my hacky solution to this problem, since I don't know how to write SQL functions... yet! %I formats the string as an identifier; if there is no such column, the scalar subquery returns NULL and the alias is used. (Note that when the column does exist, this selects the column name as a literal string, not the column's values, so it only detects existence.)
SELECT (SELECT format('%I', 'my_column') AS my_column_alias
        FROM   information_schema.columns
        WHERE  table_name='my_table'
        AND    column_name='my_column')
FROM   source_table
Hope this helps everybody out there =)

Oracle Selecting Columns with IN clause which includes NULL values

So I am comparing two Oracle databases by grabbing random rows from database A and searching for those rows in database B based on their key columns. Then I compare the rows that are returned, in Java.
I am using the following query to find rows in database B using the key columns from database A:
select * from mytable
Where (Key_Column_A,Key_Column_B,Key_Column_C)
in (('1','A', 'cat'),('2','B', 'dog'),('3','C', ''));
This works just fine for the first two sets of keys, but the third key('3','C', '') does not work because there is a null value in the third column. Changing the statement to ('3','C', NULL) or changing the SQL to
select * from mytable
Where (Key_Column_A,Key_Column_B,Key_Column_C)
in ((('1','A', 'cat'),('2','B', 'dog'),('3','C', ''))
OR (Key_Column_A,Key_Column_B,Key_Column_C) IS NULL);
will not work either.
Is there a way to include a null column in an IN clause? And if not, is there a way to efficiently do the same thing? (My only solution currently is to add a check to make sure there are no nullable columns in my keys, which would make this process rather inefficient and somewhat messy.)
You can use it this way. I think it would work.
select * from mytable
Where (NVL(Key_Column_A,''),NVL(Key_Column_B,''),NVL(Key_Column_C,''))
in (('1','A', 'cat'),('2','B', 'dog'),('3','C', ''));
I am not sure about this (Key_Column_A,Key_Column_B,Key_Column_C) IS NULL. Wouldn't this imply that all of the columns (A,B,C) are NULL ?
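One caveat on the NVL approach: Oracle treats the empty string '' as NULL, so NVL(Key_Column_C, '') returns NULL and the comparison still fails for NULL keys. A sentinel value that cannot occur in the real data works instead ('~null~' here is an arbitrary placeholder):

select * from mytable
where (NVL(Key_Column_A,'~null~'), NVL(Key_Column_B,'~null~'), NVL(Key_Column_C,'~null~'))
in (('1','A','cat'), ('2','B','dog'), ('3','C','~null~'));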

SQLite WHERE-Clause for every column?

Does SQLite offer a way to search every column of a table for a searchkey?
SELECT * FROM table WHERE id LIKE ...
Selects all rows where ... was found in the column id. But instead of only searching the column id, I want to search every column for the search string. I believe this does not work:
SELECT * FROM table WHERE * LIKE ...
Is that possible? Or what would be the next easy way?
I use Python 3 to query the SQLite database. Should I instead go the route of searching through the returned data after the query has executed?
A simple trick you can do is to concatenate the columns and search the combined string (note that in SQLite + is numeric addition; string concatenation is ||):
SELECT *
FROM table
WHERE (col1 || col2 || col3 || col4) LIKE '%something%'
This will select the record if any of these 4 columns contains the word "something".
No; you would have to list or concatenate every column in the query, or reorganize your database so that you have fewer columns.
SQLite has full-text search tables where you can search all columns at once, but such tables do not work efficiently with any other queries.
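For completeness, a minimal sketch of that full-text route (the fts5 table name is made up, and the copied content would need to be kept in sync with the source table):

CREATE VIRTUAL TABLE mytable_fts USING fts5(col1, col2, col3, col4);
INSERT INTO mytable_fts SELECT col1, col2, col3, col4 FROM mytable;
SELECT * FROM mytable_fts WHERE mytable_fts MATCH 'something';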
I could not comment on @raging-bull's answer, so I had to write a new one. My problem was that I have columns with NULL values, and got no results because the concatenated search string was NULL.
Using coalesce solves that problem: SQLite uses the column content, or an empty string ('') if it is NULL, so there is always an actual string to search.
SELECT *
FROM table
WHERE (coalesce(col1,'') || coalesce(col2,'') || coalesce(col3,'') || coalesce(col4,'')) LIKE '%something%'
I'm not quite sure if I understood your question.
If you want the whole row returned when id=searchkey, then:
select * from table where id=searchkey;
If you want to have specific columns from the row with the correct searchkey:
select col1, col2, col3 from table where id=searchkey;
If you want to search multiple columns for the "id": First narrow down which columns this could be found in - you don't want to search the whole table! Then:
select * from table where col1=searchkey or col2=searchkey or col3=searchkey;

Mysql Query data filled check

I created a table with 200 columns and inserted data.
Now I need to check whether a specific 100 columns in one row are filled or not. How can I check this using a MySQL query? The primary key is defined. Please help me out with how to resolve this.
select * from tablename where column1 is not null and column2 is not null ...
(Note that column1 != null never matches anything: any comparison with NULL is unknown in SQL, so IS NOT NULL is required.)
That is a lot of columns, so at the risk of being MySQL-version specific, you can use the information schema to get the column names and then write an SQL procedure (or something in your chosen shell/language) that iterates over them performing a test.
select distinct COLUMN_NAME as 'Field', IS_NULLABLE
from information_schema.columns
where TABLE_SCHEMA='YourDatabase'
  and TABLE_NAME='YourTableName';
The example above gives you the column name as "Field" and tells you whether it can hold a NULL. Having the field names may give you a better way of automating a column-specific test; a sketch of generating such a test follows below.
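Building on that, a sketch that generates the 100-column NULL-check statement straight from the catalog ('YourDatabase' and 'YourTableName' are placeholders as above; note that GROUP_CONCAT output is capped by group_concat_max_len):

SELECT CONCAT('SELECT * FROM YourTableName WHERE ',
              GROUP_CONCAT(CONCAT('`', COLUMN_NAME, '` IS NOT NULL')
                           SEPARATOR ' AND ')) AS generated_sql
FROM information_schema.columns
WHERE TABLE_SCHEMA = 'YourDatabase'
  AND TABLE_NAME   = 'YourTableName';
-- run the generated string with PREPARE ... EXECUTE, or from your client code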