PostgreSQL join tables on json field - sql

Let's say we have two tables:
Employee:
id integer (pk)
name char
code char
Status:
id integer (pk)
key char
data jsonb
Now here is the sample data of above tables:
Employee
+----+--------+------+
| id | name   | code |
+----+--------+------+
|  1 | Brian  | BR1  |
|  2 | Andrew | AN1  |
|  3 | Anil   | AN2  |
|  4 | Kethi  | KE1  |
|  5 | Smith  | SM1  |
+----+--------+------+
Status
+----+---------+----------------------------------------+
| id | key     | data                                   |
+----+---------+----------------------------------------+
|  1 | Admin   | {"BR1":true, "AN1":true, "KE1":false}  |
|  2 | Staff   | {"SM1":true, "AN2":true, "KE1":false}  |
|  3 | Member  | {"AN2":false, "AN1":true, "KE1":false} |
|  4 | Parking | {"BR1":true, "AN1":true, "KE1":false,  |
|    |         |  "AN2":true, "SM1":true}               |
|  5 | System  | {"AN2":false, "AN1":true, "KE1":true}  |
|  6 | Ticket  | {"AN2":false, "AN1":true, "KE1":false} |
+----+---------+----------------------------------------+
Now my goal is to get, for each employee code, the status and the names of the keys whose value is false.
I am not an expert in complex SQL queries, so any help is much appreciated.
Note: the above are just sample tables (names and data changed), but the design is similar to the original tables.

You can use the function jsonb_each_text to get the key and value from a jsonb column, and if we mix that with plain SQL ... voilà, the following query is an example for your case:
select employee.code,
       case when dat2.count is null then 'TRUE' else 'FALSE' end as status,
       case when dat2.count is null then 0 else dat2.count end as failures,
       dat2.string_agg as key
from employee
left join (
    select key, count(*), string_agg(code, ',')
    from (
        -- status.key is exposed as "code" (the status name);
        -- the jsonb key/value pairs carry the employee code and its flag
        select key as code,
               (jsonb_each_text(data)).key,
               (jsonb_each_text(data)).value
        from status
    ) as dat
    where value = 'false'
    group by 1
) dat2 on (employee.code = dat2.key);

As far as I can tell, you don't need the employee table for this (because you only want the code column which is also present in the JSON values of the status table). It's enough to unnest the JSON value and then aggregate on the code from that.
select d.code,
bool_and(d.flag::boolean) as status_flag,
count(*) filter (where not d.flag::boolean) as failures,
coalesce(string_agg(key, ', ') filter (where not d.flag::boolean), 'N/A') as keys
from status st
join lateral jsonb_each_text(st.data) as d(code, flag) on true
group by d.code
order by d.code;
The filter() option is used to include only those rows in the aggregate that match the where condition; in this case, the rows where the value for the code is false.
bool_and is an aggregate function for boolean values that returns true if all input values are true (and false otherwise).
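To make those two pieces concrete, here is a tiny self-contained sketch (my own illustration with literal values, not from the answer) of how filter() and bool_and behave:
select bool_and(flag)                                 as all_true,  -- false, one input is false
       count(*) filter (where not flag)               as failures,  -- 1
       string_agg(code, ', ') filter (where not flag) as keys       -- 'KE1'
from (values ('BR1', true), ('AN1', true), ('KE1', false)) as t(code, flag);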
Online example: https://rextester.com/PEKCZ52605

Use a left-join query between those tables, and apply the jsonb_each_text() function to the jsonb column.
The trick is to use conditionals such as case when (js).value = 'false' then .. else .. end for the aggregated columns:
select e.id, e.code,
min(case when (js).value = 'false' then 'FALSE' else 'TRUE' end ) as status,
count(case when (js).value = 'false' then 1 end) as failures,
coalesce(
string_agg(case when (js).value = 'false' then s.key end, ',' ORDER BY s.id),'NA'
) as key
from Employee e
left join
(
select *, jsonb_each_text(data) as js
from Status
) s on e.code = (js).key
group by e.id, e.code
order by e.id;
where (js).value is the value extracted from the jsonb Status.data column
Demo
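As a side note on that (js) syntax, here is a tiny sketch (my own illustration, not part of the answer) of expanding a set-returning function in the select list and then reading its fields with the (record).field notation:
select (js).key as code, (js).value as flag
from (select jsonb_each_text('{"AN1":"true","KE1":"false"}'::jsonb) as js) s;
In current Postgres a lateral join, as in the other answer above, is usually the clearer spelling.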

Related

Getting latest values from different columns based on other column

So I have a query like this:
select
user_id,
MAX(
case when attribute = 'one' then newValue else 'No Update' end
) as one,
MAX(
case when attribute = 'two' then newValue else 'No Update' end
) as two,
MAX(
case when attribute = 'thre' then newValue else 'No Update' end
) as thre
from table
group by user_id
This query returns the MAX value of newValue for each particular attribute in a separate column.
There is also an updated_at column in this table. Instead of the MAX, I want each returned column to contain the latest value according to that updated_at field.
So basically the columns one, two and thre should contain the latest values according to the updated_at field. If there are no values, then the column should contain the string No Update.
What could be the right way?
Example
user_id | attribute | newValue | updatedAt
      1 | one       | null     | 2018-01-20
      1 | one       | b        | 2018-01-21
      1 | one       | a        | 2018-01-22
      1 | two       | null     | 2018-01-23
      1 | two       | null     | 2018-01-24
      1 | two       | null     | 2018-01-25
So for the above table the current query would return this result, because b is the MAX value for attribute = one:
user_id | one | two
      1 | b   | No Update
But I want the result of column one to be the latest value according to the updatedAt column, like this:
user_id | one | two
      1 | a   | No Update
Postgres doesn't have first and last aggregation functions, but you can get similar functionality using arrays:
select user_id,
coalesce((array_agg(newvalue order by updatedAt desc) filter (where attribute = 'one'))[1], 'No update') as one,
coalesce((array_agg(newvalue order by updatedAt desc) filter (where attribute = 'two'))[1], 'No update') as two
from t
group by user_id;
Here is a db<>fiddle.
First use DISTINCT ON (user_id, attribute) to get only the rows with the latest updatedAt for each attribute, and then aggregate the results:
select user_id,
coalesce(max(case when attribute = 'one' then newvalue end), 'No Update') one,
coalesce(max(case when attribute = 'two' then newvalue end), 'No Update') two
from (
select distinct on (user_id, attribute) *
from tablename
order by user_id, attribute, updatedAt desc
) t
group by user_id
See the demo.
Results:
> user_id | one | two
> ------: | :-- | :--------
> 1 | a | No Update

A better way to aggregate into a default value

For this example I have three tables (individual, business, and ind_to_business). Individual has information on people. Business has information on businesses. And ind_to_business has information on which people are linked to which business. Here are their DDL:
CREATE TABLE individual
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
ID INTEGER PRIMARY KEY,
IND_ID INTEGER REFERENCES individual(id),
BUS_ID INTEGER REFERENCES business(id),
START_DT DATE NOT NULL,
END_DT DATE
);
I'm looking for the best way to display one row for each person. If they are linked to one business, I want to display the business's ENTERPRISE_ID. If they are linked to more than one business, I want to display the default value 'Multiple'. They will always be linked to a business, so there is no LEFT JOIN necessary. They can also be linked to a business more than once (leaving and coming back). Multiple records for the same business would be aggregated.
So for the following sample data:
Individual:
+----+------------+---------------+
| ID | NAME       | ENTERPRISE_ID |
+----+------------+---------------+
|  1 | John Smith | 53a23B7       |
|  2 | Jane Doe   | 63f2a35       |
+----+------------+---------------+
Business:
+----+----------+---------------+
| ID | NAME     | ENTERPRISE_ID |
+----+----------+---------------+
|  3 | ABC Corp | 2a34d9b       |
|  4 | XYZ Inc  | 34bf21e       |
+----+----------+---------------+
ind_to_business
+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID | START_DT    | END_DT      |
+----+--------+--------+-------------+-------------+
|  5 |      1 |      3 | 01-JAN-2000 | 31-DEC-2002 |
|  6 |      1 |      3 | 01-JAN-2015 |             |
|  7 |      2 |      3 | 01-JAN-2000 |             |
|  8 |      2 |      4 | 01-MAR-2006 | 05-JUN-2010 |
|  9 |      2 |      4 | 15-DEC-2019 |             |
+----+--------+--------+-------------+-------------+
I would expect the following output:
+---------+------------+------------+
| IND_ID  | NAME       | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b    |
| 63f2a35 | Jane Doe   | Multiple   |
+---------+------------+------------+
Here is my current query:
SELECT DISTINCT
sub.ind_id,
sub.name,
DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID,
COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
FROM individual i
INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
INNER JOIN business b ON i2b.bus_id = b.id) sub;
My query works, but it is running on a large dataset and taking a long time to run. I'm wondering if anyone has any ideas on how to improve it so that there isn't so much wasted processing (i.e. needing to do a DISTINCT on the final result, or doing COUNT(DISTINCT) in the inline view only to use that value in the DECODE above).
I've also created a DBFiddle for this question. (Link)
Thanks in advance for any input.
You could try and use a correlated subquery. This removes the need for outer distinct:
SELECT
i.enterprise_id ind_id,
i.name,
(
SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(b.enterprise_id), 'Multiple')
FROM ind_to_business i2b
INNER JOIN business b ON i2b.bus_id = b.id
WHERE i2b.ind_id = i.id
) linked_bus
FROM individual i
You can join with the aggregated ind_to_business per individual. One way to do this:
select i.id, i.name, coalesce(b.enterprise_id, 'Multiple')
from individual i
join
(
select
ind_id,
case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
from ind_to_business
group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;
First you should use a sub-query to get all needed dimensions, and then do the final aggregation using a CASE statement.
select
ind_id,
name,
case
when count(*) > 1 then 'Multiple'
else max(bus_id)
end as linked_bus
from
(
select
distinct i.enterprise_id as ind_id,
i.name,
b.enterprise_id as bus_id
from individual i
join ind_to_business i2b
on i.id = i2b.ind_id
join business b
on i2b.bus_id = b.id
) vals
group by
ind_id,
name
order by
ind_id
No need to use DISTINCT twice. You could use subquery factoring, put the in-line view in a WITH clause, and make the data set DISTINCT in the subquery itself.
WITH data AS
(
SELECT distinct
i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID
FROM individual i
JOIN ind_to_business i2b ON i.id = i2b.ind_id
JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
name,
case
when count(*) = 1 then MIN(bus_id)
else 'Multiple'
end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;
IND_ID     NAME       LINKED_BUS
---------- ---------- -------------------------
53a23B7    John Smith 2a34d9b
63f2a35    Jane Doe   Multiple

How do I transform the specific row value into column headers in hive [duplicate]

I tried to search posts, but I only found solutions for SQL Server/Access. I need a solution in MySQL (5.X).
I have a table (called history) with 3 columns: hostid, itemname, itemvalue.
If I do a select (select * from history), it will return
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1      | A        | 10        |
+--------+----------+-----------+
| 1      | B        | 3         |
+--------+----------+-----------+
| 2      | A        | 9         |
+--------+----------+-----------+
| 2      | C        | 40        |
+--------+----------+-----------+
How do I query the database to return something like
+--------+------+-----+-----+
| hostid | A    | B   | C   |
+--------+------+-----+-----+
| 1      | 10   | 3   | 0   |
+--------+------+-----+-----+
| 2      | 9    | 0   | 40  |
+--------+------+-----+-----+
I'm going to add a somewhat longer and more detailed explanation of the steps to take to solve this problem. I apologize if it's too long.
I'll start out with the base you've given and use it to define a couple of terms that I'll use for the rest of this post. This will be the base table:
select * from history;
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1      | A        | 10        |
| 1      | B        | 3         |
| 2      | A        | 9         |
| 2      | C        | 40        |
+--------+----------+-----------+
This will be our goal, the pretty pivot table:
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A    | B    | C    |
+--------+------+------+------+
| 1      | 10   | 3    | 0    |
| 2      | 9    | 0    | 40   |
+--------+------+------+------+
Values in the history.hostid column will become y-values in the pivot table. Values in the history.itemname column will become x-values (for obvious reasons).
When I have to solve the problem of creating a pivot table, I tackle it using a three-step process (with an optional fourth step):
select the columns of interest, i.e. y-values and x-values
extend the base table with extra columns -- one for each x-value
group and aggregate the extended table -- one group for each y-value
(optional) prettify the aggregated table
Let's apply these steps to your problem and see what we get:
Step 1: select columns of interest. In the desired result, hostid provides the y-values and itemname provides the x-values.
Step 2: extend the base table with extra columns. We typically need one column per x-value. Recall that our x-value column is itemname:
create view history_extended as (
select
history.*,
case when itemname = "A" then itemvalue end as A,
case when itemname = "B" then itemvalue end as B,
case when itemname = "C" then itemvalue end as C
from history
);
select * from history_extended;
+--------+----------+-----------+------+------+------+
| hostid | itemname | itemvalue | A    | B    | C    |
+--------+----------+-----------+------+------+------+
| 1      | A        | 10        | 10   | NULL | NULL |
| 1      | B        | 3         | NULL | 3    | NULL |
| 2      | A        | 9         | 9    | NULL | NULL |
| 2      | C        | 40        | NULL | NULL | 40   |
+--------+----------+-----------+------+------+------+
Note that we didn't change the number of rows -- we just added extra columns. Also note the pattern of NULLs -- a row with itemname = "A" has a non-null value for new column A, and null values for the other new columns.
Step 3: group and aggregate the extended table. We need to group by hostid, since it provides the y-values:
create view history_itemvalue_pivot as (
select
hostid,
sum(A) as A,
sum(B) as B,
sum(C) as C
from history_extended
group by hostid
);
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A    | B    | C    |
+--------+------+------+------+
| 1      | 10   | 3    | NULL |
| 2      | 9    | NULL | 40   |
+--------+------+------+------+
(Note that we now have one row per y-value.) Okay, we're almost there! We just need to get rid of those ugly NULLs.
Step 4: prettify. We're just going to replace any null values with zeroes so the result set is nicer to look at:
create view history_itemvalue_pivot_pretty as (
select
hostid,
coalesce(A, 0) as A,
coalesce(B, 0) as B,
coalesce(C, 0) as C
from history_itemvalue_pivot
);
select * from history_itemvalue_pivot_pretty;
+--------+------+------+------+
| hostid | A    | B    | C    |
+--------+------+------+------+
| 1      | 10   | 3    | 0    |
| 2      | 9    | 0    | 40   |
+--------+------+------+------+
And we're done -- we've built a nice, pretty pivot table using MySQL.
Considerations when applying this procedure:
what value to use in the extra columns. I used itemvalue in this example
what "neutral" value to use in the extra columns. I used NULL, but it could also be 0 or "", depending on your exact situation
what aggregate function to use when grouping. I used sum, but count and max are also often used (max is often used when building one-row "objects" that had been spread across many rows)
using multiple columns for y-values. This solution isn't limited to using a single column for the y-values -- just plug the extra columns into the group by clause (and don't forget to select them)
Known limitations:
this solution doesn't allow n columns in the pivot table -- each pivot column needs to be manually added when extending the base table. So for 5 or 10 x-values, this solution is nice. For 100, not so nice. There are some solutions with stored procedures generating a query, but they're ugly and difficult to get right. I currently don't know of a good way to solve this problem when the pivot table needs to have lots of columns.
SELECT
hostid,
sum( if( itemname = 'A', itemvalue, 0 ) ) AS A,
sum( if( itemname = 'B', itemvalue, 0 ) ) AS B,
sum( if( itemname = 'C', itemvalue, 0 ) ) AS C
FROM
history
GROUP BY
hostid;
Another option, especially useful if you have many items you need to pivot, is to let MySQL build the query for you:
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'ifnull(SUM(case when itemname = ''',
itemname,
''' then itemvalue end),0) AS `',
itemname, '`'
)
) INTO @sql
FROM
history;
SET @sql = CONCAT('SELECT hostid, ', @sql, '
FROM history
GROUP BY hostid');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
FIDDLE
Added some extra values to see it working
GROUP_CONCAT has a default length limit of 1024, so if you have a really big query, change this parameter before running it:
SET SESSION group_concat_max_len = 1000000;
Test:
DROP TABLE IF EXISTS history;
CREATE TABLE history
(hostid INT,
itemname VARCHAR(5),
itemvalue INT);
INSERT INTO history VALUES(1,'A',10),(1,'B',3),(2,'A',9),
(2,'C',40),(2,'D',5),
(3,'A',14),(3,'B',67),(3,'D',8);
hostid   A   B   C   D
     1  10   3   0   0
     2   9   0  40   5
     3  14  67   0   8
Taking advantage of Matt Fenwick's idea, which helped me solve the problem (many thanks), let's reduce it to only one query:
select
history.hostid,
coalesce(sum(case when itemname = "A" then itemvalue end), 0) as A,
coalesce(sum(case when itemname = "B" then itemvalue end), 0) as B,
coalesce(sum(case when itemname = "C" then itemvalue end), 0) as C
from history
group by hostid
I edited Agung Sagita's answer from a subquery to a join.
I'm not sure how much difference there is between these two approaches, but here it is as another reference.
SELECT hostid, T2.VALUE AS A, T3.VALUE AS B, T4.VALUE AS C
FROM TableTest AS T1
LEFT JOIN TableTest T2 ON T2.hostid=T1.hostid AND T2.ITEMNAME='A'
LEFT JOIN TableTest T3 ON T3.hostid=T1.hostid AND T3.ITEMNAME='B'
LEFT JOIN TableTest T4 ON T4.hostid=T1.hostid AND T4.ITEMNAME='C'
use subquery
SELECT hostid,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='A' AND hostid = t1.hostid) AS A,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='B' AND hostid = t1.hostid) AS B,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='C' AND hostid = t1.hostid) AS C
FROM TableTest AS T1
GROUP BY hostid
but it will be a problem if the subquery returns more than one row; in that case use a further aggregate function inside the subquery, as sketched below.
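A rough sketch of that suggestion (same hypothetical TableTest names as above): wrapping each scalar subquery in MAX() guarantees at most one value even when several rows match.
SELECT hostid,
       (SELECT MAX(VALUE) FROM TableTest WHERE ITEMNAME='A' AND hostid = t1.hostid) AS A,
       (SELECT MAX(VALUE) FROM TableTest WHERE ITEMNAME='B' AND hostid = t1.hostid) AS B,
       (SELECT MAX(VALUE) FROM TableTest WHERE ITEMNAME='C' AND hostid = t1.hostid) AS C
FROM TableTest AS t1
GROUP BY hostid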
If you can use MariaDB, there is a very easy solution.
Since MariaDB 10.0.2 there is a new storage engine called CONNECT that can help us convert the results of another query or table into a pivot table, just like what you want.
You can have a look at the docs.
First of all, install the CONNECT storage engine.
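(If I remember correctly, the plugin can be loaded from SQL like this; check the MariaDB docs for your version:)
INSTALL SONAME 'ha_connect';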
Now the pivot column of our table is itemname and the data for each item is located in itemvalue column, so we can have the result pivot table using this query:
create table pivot_table
engine=connect table_type=pivot tabname=history
option_list='PivotCol=itemname,FncCol=itemvalue';
Now we can select what we want from the pivot_table:
select * from pivot_table
More details here
My solution :
select h.hostid, sum(ifnull(h.A,0)) as A, sum(ifnull(h.B,0)) as B, sum(ifnull(h.C,0)) as C from (
select
hostid,
case when itemName = 'A' then itemvalue end as A,
case when itemName = 'B' then itemvalue end as B,
case when itemName = 'C' then itemvalue end as C
from history
) h group by hostid
It produces the expected results in the submitted case.
If I just GROUP BY hostid without the outer SUM, it shows only the first row with values, like:
A B C
1 10
2 3
I figured out one way to make my reports convert rows to columns almost dynamically using simple queries. You can see and test it online here.
The number of columns in the query is fixed, but the values are dynamic and based on the row values. So, I use one query to build the table header and another one to get the values:
SELECT distinct concat('<th>',itemname,'</th>') as column_name_table_header FROM history order by 1;
SELECT
hostid
,(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue else '' end) as col1
,(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue else '' end) as col2
,(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue else '' end) as col3
,(case when itemname = (select distinct itemname from history a order by 1 limit 3,1) then itemvalue else '' end) as col4
FROM history order by 1;
You can summarize it, too:
SELECT
hostid
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue end) as A
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue end) as B
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue end) as C
FROM history group by hostid order by 1;
+--------+------+------+------+
| hostid | A    | B    | C    |
+--------+------+------+------+
| 1      | 10   | 3    | NULL |
| 2      | 9    | NULL | 40   |
+--------+------+------+------+
Results of RexTester:
http://rextester.com/ZSWKS28923
As one real example of use, the report below shows in columns the departure and arrival hours of boats/buses as a visual schedule. You will see one additional unused column as the last column, which does not confuse the visualization:
** ticketing system for selling tickets online and in person
This isn't the exact answer you are looking for, but it was a solution that I needed on my project and I hope it helps someone. It lists 1 to n row items separated by commas. GROUP_CONCAT makes this possible in MySQL.
select
cemetery.cemetery_id as "Cemetery_ID",
GROUP_CONCAT(distinct(names.name)) as "Cemetery_Name",
cemetery.latitude as Latitude,
cemetery.longitude as Longitude,
c.Contact_Info,
d.Direction_Type,
d.Directions
from cemetery
left join cemetery_names on cemetery.cemetery_id = cemetery_names.cemetery_id
left join names on cemetery_names.name_id = names.name_id
left join cemetery_contact on cemetery.cemetery_id = cemetery_contact.cemetery_id
left join
(
select
cemetery_contact.cemetery_id as cID,
group_concat(contacts.name, char(32), phone.number) as Contact_Info
from cemetery_contact
left join contacts on cemetery_contact.contact_id = contacts.contact_id
left join phone on cemetery_contact.contact_id = phone.contact_id
group by cID
)
as c on c.cID = cemetery.cemetery_id
left join
(
select
cemetery_id as dID,
group_concat(direction_type.direction_type) as Direction_Type,
group_concat(directions.value , char(13), char(9)) as Directions
from directions
left join direction_type on directions.type = direction_type.direction_type_id
group by dID
)
as d on d.dID = cemetery.cemetery_id
group by Cemetery_ID
This cemetery has two common names, so the names are listed in different rows connected by a single cemetery id but two name ids, and the query produces something like this:
CemeteryID  Cemetery_Name             Latitude
1           Appleton,Sulpher Springs  35.4276242832293
You can use a couple of LEFT JOINs. Kindly use this code
SELECT t.hostid,
COALESCE(t1.itemvalue, 0) A,
COALESCE(t2.itemvalue, 0) B,
COALESCE(t3.itemvalue, 0) C
FROM history t
LEFT JOIN history t1
ON t1.hostid = t.hostid
AND t1.itemname = 'A'
LEFT JOIN history t2
ON t2.hostid = t.hostid
AND t2.itemname = 'B'
LEFT JOIN history t3
ON t3.hostid = t.hostid
AND t3.itemname = 'C'
GROUP BY t.hostid
I'm sorry to say this, and maybe I'm not solving your problem exactly, but PostgreSQL is about 10 years older than MySQL and much more advanced, and there are many ways to achieve this easily. Install PostgreSQL and execute this query
CREATE EXTENSION tablefunc;
then voila! And here's extensive documentation: PostgreSQL: Documentation: 9.1: tablefunc or this query
CREATE EXTENSION hstore;
then again voila! PostgreSQL: Documentation: 9.0: hstore
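For completeness, a minimal crosstab() sketch with tablefunc against the history table from the question (the output column list is my assumption, not part of this answer):
SELECT *
FROM crosstab(
       'select hostid, itemname, itemvalue from history order by 1, 2',
       'select distinct itemname from history order by 1'
     ) AS ct (hostid int, "A" int, "B" int, "C" int);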

Comparing different columns in SQL for each row

After some transformation I have a result from a cross join (of tables a and b) that I want to do some analysis on. The table for this looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id  | 10_1 | 10_2 | 11_1 | 11_2 | id  | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 |    1 |    0 |    1 |    0 | 222 |    1 |    0 |    1 |    0 |
| 111 |    1 |    0 |    1 |    0 | 333 |    0 |    0 |    0 |    0 |
| 111 |    1 |    0 |    1 |    0 | 444 |    1 |    0 |    1 |    1 |
| 112 |    0 |    1 |    1 |    0 | 222 |    1 |    0 |    1 |    0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
In a row are always two different IDs that are matched with each other. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values (meaning both have "1" in 10_1, 10_2, etc.) two IDs have in common on average, but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two ids have 10_1 in common. I could make something like this for example for different columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_1 = 1) OR [...] then 1 end)
To count in general how often two IDs have one thing in common, but this would of course also count cases where they have two or more things in common. Plus, I would also like to know how often two IDs have two things, three things, etc. in common.
One "problem" in my case is also that I have like ~30 columns I want to look at, so I can hardly write down for each case every possible combination.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count   |
+-----------+---------+
| 0         | 100     |
| 1         | 500     |
| 2         | 1500    |
| 3         | 5000    |
| 4         | 3000    |
+-----------+---------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as:
select id, '10_1' where "10_1" = 1
union
select id, '10_2' where "10_2" = 1
union
select id, '11_1' where "11_1" = 1
union
select id, '11_2' where "11_2" = 1;
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
The proposed solution requires first to take one step back from the cross-join table, as the identical column names are super annoying. Instead, we take the ids from the two tables and put them in a temporary table. The following query gets the result wanted in the question. It assumes table_a and table_b from the question are the same and called tbl, but this assumption is not needed and tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses the JSON trick to flatten the columns, but it works here:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)

divide a column into two based on another column value - ORACLE

First, I hope the title expresses the issue; otherwise, any suggestion is welcome. My issue is that I have the following table structure:
+----+------+------------------+-------------+
| ID | Name | recipient_sender | user        |
+----+------+------------------+-------------+
|  1 | A    |                1 | X           |
|  2 | B    |                2 | Y           |
|  3 | A    |                2 | Z           |
|  4 | B    |                1 | U           |
|    |      |                  |             |
+----+------+------------------+-------------+
In the recipient_sender column, the value 1 means the user is the recipient and the value 2 means the user is the sender.
I need to present data in the following way:
+----+------+-----------+---------+
| ID | Name | recipient | sender  |
+----+------+-----------+---------+
|  1 | A    | X         | Z       |
|  2 | B    | U         | Y       |
+----+------+-----------+---------+
I've tried self-join but it did not work. I cannot use MAX with CASE WHEN, as the number of records is too big.
Note: Please ignore the bad table design as it's just a simplified example of the real one
Please try:
SELECT
MIN(ID) ID,
Name,
max(case when recipient_sender=1 then user else null end) recipient,
max(case when recipient_sender=2 then user else null end) sender
From yourTable
group by Name
maybe you can try this:
select min(id) id,
name,
max(decode(recipient_sender, 1, user, '')) recipient,
max(decode(recipient_sender, 2, user, '')) sender
from t
group by name
You can check a demo here on SQLFiddle.
You can select values with this query
SELECT t.id,
t.name,
case
when t.recipient_sender = 1 then
t.user
ELSE
t2.user
END as recipient,
case
when t.recipient_sender = 2 then
t.user
ELSE
t2.user
END as sender
FROM your_table t
JOIN your_table t2
ON t.name = t2.name
AND t.id != t2.id
After this query you can add the DISTINCT keyword or GROUP the rows, as sketched below.
This query joins the tables on the NAME column, but if you have some identity for the message, join the tables using that instead.
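For instance, a rough sketch of the GROUP BY variant (one row per Name, picking the lowest id):
SELECT MIN(t.id) AS id,
       t.name,
       MAX(CASE WHEN t.recipient_sender = 1 THEN t.user ELSE t2.user END) AS recipient,
       MAX(CASE WHEN t.recipient_sender = 2 THEN t.user ELSE t2.user END) AS sender
FROM your_table t
JOIN your_table t2
  ON t.name = t2.name
 AND t.id != t2.id
GROUP BY t.name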
Create a new table (with a better structure):
create table <newtable> as
select distinct
id,
name,
user as recipient,
(select user from <tablename> snd
 where snd.name = recip.name and snd.recipient_sender = 2) as sender
from <tablename> recip
where recip.recipient_sender = 1
sorry, have no oracle here.