How to get the difference between the frequencies of values in two fields? - sql

I have a PostgreSQL database table that contains two columns a and b.
When I query all the entries of the table I get:
{1, 2},
{2, 3},
{2, 3}
So the value:
(1) appeared in field a 1 time and in field b 0 times
(2) appeared in field a 2 times and in field b 1 time
(3) appeared in field a 0 times and in field b 2 times
I want to get the following output:
{1, 1},
{2, 1},
{3, -2}
where the first field is the value stored in the database and the second field is the difference.
How can I achieve that?
I first query the database and the result is stored in query_result,
then I get the frequencies of the first and second elements:
f0 = query_result |> Enum.frequencies_by(&elem(&1, 0))
f1 = query_result |> Enum.frequencies_by(&elem(&1, 1))
(Map.keys(f0) ++ Map.keys(f1))
|> Enum.uniq()
|> Enum.into(%{}, fn key -> {key, (f0[key] || 0) - (f1[key] || 0)} end)
I am looking for a simpler way to do this.

Use a single query:
SELECT val, COALESCE(a.ct, 0) - COALESCE(b.ct, 0) AS freq_diff
FROM (
   SELECT a AS val, count(*) AS ct
   FROM   tbl
   GROUP  BY 1
   ) a
FULL JOIN (
   SELECT b AS val, count(*) AS ct
   FROM   tbl
   GROUP  BY 1
   ) b USING (val);
FULL [OUTER] JOIN, because either value may be missing in the other column.
COALESCE to defend against NULL values resulting from the join.
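An equivalent single-pass variant (a sketch assuming the same tbl(a, b) layout as above) weights each occurrence in a with +1 and each occurrence in b with -1, then sums per value:

SELECT val, SUM(w) AS freq_diff
FROM (
   -- weight column a as +1, column b as -1
   SELECT a AS val, 1 AS w FROM tbl
   UNION ALL
   SELECT b, -1 FROM tbl
   ) x
GROUP BY val
ORDER BY val;

For the sample data this yields (1, 1), (2, 1), (3, -2), the same as the FULL JOIN version.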

Related

Check if array is ordered subset postgres

How to check if an array is an ordered subset of another array in PostgreSQL?
[1, 2] ordered_subset [1, 4, 2] -> true
[1, 2] ordered_subset [2, 3, 1, 5] -> false
You can first filter out elements from the second array that do not belong to the first, and then use generate_series to check whether the first array appears as a contiguous slice of the filtered array:
select exists (
         select 1
         from generate_series(1, array_length(t1.a3, 1)) v2
         where t1.a1 = t1.a3[v2 : v2 + array_length(t1.a1, 1) - 1]
       )
from (
  select t.a1,
         (select array_agg(v)
          from unnest(t.a2) v
          where exists (select 1 from unnest(t.a1) v1 where v1 = v)) a3
  from array_inp t
) t1
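For instance, a self-contained version of the same idea against the two example pairs (the column names a1 and a2 and the WITH ORDINALITY ordering are assumptions here, since the original fiddle isn't reproduced):

with array_inp(a1, a2) as (
  values (array[1,2], array[1,4,2]),
         (array[1,2], array[2,3,1,5])
)
select t1.a1, t1.a2,
       exists (
         select 1
         from generate_series(1, array_length(t1.a3, 1)) v2
         where t1.a1 = t1.a3[v2 : v2 + array_length(t1.a1, 1) - 1]
       ) as is_ordered_subset
from (
  select t.a1, t.a2,
         -- keep only the elements of a2 that occur in a1, preserving their order
         (select array_agg(v order by ord)
          from unnest(t.a2) with ordinality as u(v, ord)
          where v = any (t.a1)) as a3
  from array_inp t
) t1;

This should return true for the first pair and false for the second.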

Case statement with four columns, i.e. attributes

I have a table with values "1", "0" or "". The table has four columns: p, q, r and s.
I need help creating a case statement that returns values when the attribute is equal to 1.
For ID 5 the case statement should return "p s".
For ID 14 the case statement should return "s".
For ID 33 the case statement should return "p r s". And so on.
Do I need to come up with a case statement that has every possible combination, or is there a simpler way? Below is what I have come up with thus far.
case
  when p = 1 and q = 1 then "p q"
  when p = 1 and r = 1 then "p r"
  when p = 1 and s = 1 then "p s"
  when r = 1 then r
  when q = 1 then q
  when r = 1 then r
  when s = 1 then s
  else ''
end
One solution could be this, which uses a CASE expression for each attribute to return the matching letter, surrounded by TRIM to remove the trailing space:
with tbl(id, p, q, r, s) as (
  select 5, 1, 0, 0, 1 from dual union all
  select 14, 0, 0, 0, 1 from dual
)
select id,
       trim(regexp_replace(case p when 1 then 'p' end ||
                           case q when 1 then 'q' end ||
                           case r when 1 then 'r' end ||
                           case s when 1 then 's' end, '(.)', '\1 '))
from tbl;
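With the two sample rows, this should return 'p s' for ID 5 and 's' for ID 14, matching the expected output above.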
The real solution would be to fix the database design. This design effectively violates fourth normal form (4NF) in that it stores more than one independent attribute per row. The fact that an ID "has" or "is part of" attribute p, q, etc. should be split out. This design should be three tables: the main table with the ID, a lookup table describing the attributes an ID could have (p, q, r or s), and an associative table that joins the two where appropriate (assuming an ID row could have more than one attribute and an attribute could belong to more than one ID), which is how to model a many-to-many relationship.
main_tbl             main_attr           attribute_lookup
ID    col1   col2    main_id   attr_id   attr_id   attr_desc
 5                   5         1         1         p
14                   5         4         2         q
                     14        4         3         r
                                         4         s
Then it would be simple to query this model to build your list, easy to maintain if an attribute description changes (only 1 place to change it), etc.
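A minimal DDL sketch of that model (the table and column names follow the example above; the types are assumptions):

create table main_tbl (
  id    number primary key,
  col1  varchar2(50),
  col2  varchar2(50)
);

create table attribute_lookup (
  attr_id    number primary key,
  attr_desc  varchar2(10)
);

-- associative table resolving the many-to-many relationship
create table main_attr (
  main_id  number references main_tbl (id),
  attr_id  number references attribute_lookup (attr_id),
  primary key (main_id, attr_id)
);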
Select from it like this:
select m.ID, m.col1,
       listagg(al.attr_desc, ' ') within group (order by al.attr_desc) as attr_desc
from main_tbl m
join main_attr ma
  on m.ID = ma.main_id
join attribute_lookup al
  on ma.attr_id = al.attr_id
group by m.ID, m.col1;
You can use concatenation with decode() functions:
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as "String"
from t;
If you need spaces between the letters, consider the following:
with t(id,p,q,r,s) as
(
select 5,1,0,0,1 from dual union all
select 14,0,0,0,1 from dual union all
select 31,null,0,null,1 from dual union all
select 33,1,0,1,1 from dual
), t2 as
(
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as str
from t
), t3 as
(
select id, substr(str,level,1) as str, level as lvl
from t2
connect by level <= length(str)
and prior id = id
and prior sys_guid() is not null
)
select id, listagg(str,' ') within group (order by lvl) as "String"
from t3
group by id;
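With the four sample rows in the WITH clause, this should produce:

ID  String
--  ------
 5  p s
14  s
31  s
33  p r s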
In my opinion, it is bad practice to use columns for relationships.
You should have two tables: one called arts and another called mapping. arts looks like this:
ID - ART
1 - p
2 - q
3 - r
4 - s
...
and mapping maps your base IDs to your art IDs and looks like this:
MYID - ARTID
5 - 1
5 - 4
Afterwards, you can make use of Oracle's PIVOT operator; it is more dynamic.
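A rough PIVOT sketch over that model (the names arts(id, art) and mapping(myid, artid) follow the example above; note the IN list still has to be spelled out unless you build the statement dynamically):

select *
from (
  -- one row per (base id, attribute letter)
  select m.myid, a.art
  from mapping m
  join arts a on a.id = m.artid
)
pivot (
  count(*) for art in ('p' as p, 'q' as q, 'r' as r, 's' as s)
)
order by myid;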

How can I test whether all of the rows in a table are duplicated (except for one column)

I am working with a data warehouse table that can be split into claimed rows and computed rows.
I suspect that the computed rows are perfect duplicates of the claimed rows (with the exception of the claimed/computed column).
I tried to test this using the EXCEPT clause:
SELECT a, b, c FROM table WHERE clm_cmp_cd = 'clm'
EXCEPT
SELECT a, b, c FROM table WHERE clm_cmp_cd = 'cmp'
But all of the records were returned. I don't believe that this is possible, and I suspect it's due to null values.
Is there a way to compare the records which will compare nulls to nulls?
edit: the solution should work with an arbitrary number of fields, with varying types. In this case, I have ~100 fields, 2/3 of which may have null values. This is a data warehouse, and some degree of denormalization must be expected.
edit: I tested the query while limiting myself to non-null columns, and I got the result I expected (nothing).
But, I would still like to compare fields which potentially contain null values.
Your supposition would appear to be false. You might try this:
select a, b, c,
       sum(case when clm_cmp_cd = 'clm' then 1 else 0 end) as num_clm,
       sum(case when clm_cmp_cd = 'cmp' then 1 else 0 end) as num_cmp
from t
group by a, b, c;
This will show you the values of the three columns and the number of matches of each type.
Your problem is probably that values that look alike are not exactly the same. This could be due to slight differences in floating-point numbers or to unmatched characters in strings, such as leading spaces.
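If that is the cause, a hedged sketch of the same EXCEPT check with the values normalized first (the table name t, the column roles, and the choice of TRIM and a rounding precision of 6 are all illustrative; use your DBMS's equivalents):

SELECT a, TRIM(b) AS b, ROUND(c, 6) AS c FROM t WHERE clm_cmp_cd = 'clm'
EXCEPT
SELECT a, TRIM(b), ROUND(c, 6) FROM t WHERE clm_cmp_cd = 'cmp';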
Let's look at how Db2 handles NULL values in GROUP BY and INTERSECT:
with t(a, b, clm_cmp_cd) as (values
( 1, 1, 'clm')
, ( 1, 1, 'cmp')
, (null, 1, 'clm')
, (null, 1, 'cmp')
, ( 2, 1, 'cmp')
)
select a, b
from t
where clm_cmp_cd='clm'
intersect
select a, b
from t
where clm_cmp_cd='cmp';
with t(a, b, clm_cmp_cd) as (values
( 1, 1, 'clm')
, ( 1, 1, 'cmp')
, (null, 1, 'clm')
, (null, 1, 'cmp')
, ( 2, 1, 'cmp')
)
select a, b
from t
where clm_cmp_cd in ('clm', 'cmp')
group by a, b
having count(1)>1;
Both queries return the same result:
A B
-- --
1 1
<null> 1
NULL values are treated as the same by these operators.
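By contrast, a join on plain equality treats the NULL pair as non-matching; a quick check using the same inline data:

with t(a, b, clm_cmp_cd) as (values
  ( 1, 1, 'clm')
, ( 1, 1, 'cmp')
, (null, 1, 'clm')
, (null, 1, 'cmp')
, ( 2, 1, 'cmp')
)
select c.a, c.b
from t c
join t p
  on p.a = c.a and p.b = c.b
where c.clm_cmp_cd = 'clm' and p.clm_cmp_cd = 'cmp';

This returns only the (1, 1) row; the (null, 1) pair is lost because null = null does not evaluate to true.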
If you have too many columns in your table to specify them manually in your query, you may produce the column list with the following query:
select listagg(colname, ', ')
from syscat.columns
where tabschema='MYSCHEMA' and tabname='TABLE' and colname<>'CLM_CMP_CD';

Select from multiple rows as one row with defaults

Here is what my table looks like:
Table items
idx bigint unique
merkle char(64)
tag text
digest char(64)
Since idx is unique, I will use the subscript operator [] to denote the field in the row with the specified idx; for example, merkle[i] means the merkle field in the row whose idx is i.
What I would like is a query that, for a given i, selects tag[i], digest[i], merkle[2 * i], merkle[2 * i + 1], with default values for merkle[2 * i] and merkle[2 * i + 1] if no rows exist with those idx values.
So for example, say that I have
idx merkle tag digest
1 merk1 tag1 dig1
I would like my query to return tag1, dig1, "default", "default". If I have
idx merkle tag digest
1 merk1 tag1 dig1
2 merk2 tag2 dig2
I would like to get tag1, dig1, merk2, "default", if I have
idx merkle tag digest
1 merk1 tag1 dig1
2 merk2 tag2 dig2
3 merk3 tag3 dig3
I would like to get tag1, dig1, merk2, merk3, and so on.
How can I do such a thing? Is it possible to do it in just one transaction with the database? (Of course I could do it with three separate queries, but that looks inefficient.)
You can do it using LEFT JOIN and COALESCE:
SELECT t1.idx, t1.tag, t1.digest,
       COALESCE(t2.merkle, 'default'),
       COALESCE(t3.merkle, 'default')
FROM items AS t1
LEFT JOIN items AS t2 ON t2.idx = 2 * t1.idx
LEFT JOIN items AS t3 ON t3.idx = 2 * t1.idx + 1
This will match every row with idx = i with rows with idx = 2 * i and idx = 2 * i + 1. If there is no match for either of these indices (or both), then default will be selected.
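Since the question asks for a given i, a usage sketch restricted to one root row (i = 1 here; the output column aliases are illustrative):

SELECT t1.tag, t1.digest,
       COALESCE(t2.merkle, 'default') AS merkle_left,
       COALESCE(t3.merkle, 'default') AS merkle_right
FROM items AS t1
LEFT JOIN items AS t2 ON t2.idx = 2 * t1.idx
LEFT JOIN items AS t3 ON t3.idx = 2 * t1.idx + 1
WHERE t1.idx = 1;

With only the first sample row present this returns (tag1, dig1, default, default); with all three rows it returns (tag1, dig1, merk2, merk3).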

SQL preferred way for keyword search

I have a table in the following format:
row_key extID tag val
------- ----- --- ---
1 1 A a
2 1 A b
3 1 B c
4 2 A d
5 2 C e
Now I want to get all extIDs for which several specific (tag, val) pairs are present, for example:
(tag, val) = (A,a) AND (tag, val) = (B,c)
or,
(tag, val) = (C,e)
The number of constraints can change.
I can think of several ways to do this:
1. Perform a self-join for each constraint
2. Do the searching (iteratively) in the caller program (multiple SQL queries)
3. (Maybe?) write a SQL function to do this
4. Nested SELECT clauses (passing the "extID" to the outer level and using WHERE extID IN (SELECT extID FROM ...))
5. The only true solution that I just can't find.
Which one would be the preferred (fastest and most elegant) way to do this? (Except, of course, "Surely, 5. is the correct answer.")
I think a multiple SELF-join is quite elegant. However, I do not know if it is fast and comparatively memory-efficient.
Further, I would like an approach that works with MySQL, PostgreSQL and SQLite without adaptation; that's why I can't use PIVOT, as far as I understand.
SELECT extID
FROM tableName
WHERE (tag = 'A' AND val = 'a') OR
(tag = 'B' AND val = 'c')
GROUP BY extID
HAVING COUNT(*) = 2
This is an instance of SQL relational division.
UPDATE 1
Since you haven't mentioned whether duplicate (tag, val) combinations can occur, the DISTINCT keyword is needed:
SELECT extID
FROM tableName
WHERE (tag = 'A' AND val = 'a') OR
(tag = 'B' AND val = 'c')
GROUP BY extID
HAVING COUNT(DISTINCT tag, val) = 2
The tuple syntax would work:
SELECT extID
FROM tableName
WHERE (tag, val) in (('A', 'a'), ('B', 'c'))
GROUP BY extID
HAVING COUNT(DISTINCT tag, val) = 2
The HAVING COUNT(DISTINCT tag, val) = 2 ensures that each constraint tuple was present at least once. This means that the 2 needs to be adjusted to the number of constraint tuples in the query.
This would even work if you have two identical rows like this and the condition is ('C', 'e'):
row_key extID tag val
------- ----- --- ---
5 2 C e
6 2 C e
The query for this would look like this:
SELECT extID
FROM tableName
WHERE (tag, val) in (('C', 'e'))
GROUP BY extID
HAVING COUNT(DISTINCT tag, val) = 1
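Note that the two-argument COUNT(DISTINCT tag, val) is MySQL-specific; it does not work in PostgreSQL or SQLite. If the query has to run unchanged on all three, a sketch using one conditional aggregate per constraint also tolerates duplicate rows:

SELECT extID
FROM tableName
GROUP BY extID
-- one SUM(...) > 0 condition per required (tag, val) pair
HAVING SUM(CASE WHEN tag = 'A' AND val = 'a' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN tag = 'B' AND val = 'c' THEN 1 ELSE 0 END) > 0;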