SQL query to get conflicting values in JSONB from a group

I have a table defined similar to the one below. location_id is a FK to another table. Reports are saved in an N+1 fashion: for a single location, N reporters are available, and there's one report used as the source of truth, if you will. Reports from reporters have a single-letter code (let's say R); the source of truth has a different code (let's say T). The keys for the JSONB column are regular strings; values are any combination of strings, integers, and integer arrays.
create table report (
    id integer not null primary key,
    location_id integer not null,
    report_type char(1),
    data jsonb
);
Given all the information above, how can I get all location IDs where the data values for a given set of keys (supplied at query time) are not all the same for the report_type R?

There are at least two solid approaches, depending on how complex you want to get and how numerous and/or dynamic the keys are. The first is very straightforward:
select location_id
from report
where report_type = 'R'
group by location_id
having count(distinct data->'key1') > 1
    or count(distinct data->'key2') > 1
    or count(distinct data->'key3') > 1;
The second construction is more complex, but has the advantage of needing only a simple list of keys:
-- jsonb_each returns the key/value pairs with value of type json (or jsonb),
-- so the value column can handle integers, text, arrays, etc.
-- distinct on (location_id) returns one row per location even when
-- more than one key conflicts
select distinct on (location_id) location_id
from report, jsonb_each(data)
where report_type = 'R'
  and key in ('key1', 'key2', 'key3')
group by location_id, key
having count(distinct value) > 1
order by location_id;
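To make the behavior concrete, here is a minimal, hypothetical data set (the key names match the queries above; the values are invented for illustration). Location 1's two R reports disagree on key1, location 2's R reports agree on every listed key, and the T row is excluded by the report_type filter, so both queries return only location 1:

```sql
insert into report (id, location_id, report_type, data) values
    (1, 1, 'R', '{"key1": 1, "key2": "a", "key3": [1, 2]}'),
    (2, 1, 'R', '{"key1": 2, "key2": "a", "key3": [1, 2]}'),  -- key1 differs
    (3, 2, 'R', '{"key1": 5, "key2": "b", "key3": [3]}'),
    (4, 2, 'R', '{"key1": 5, "key2": "b", "key3": [3]}'),
    (5, 2, 'T', '{"key1": 9, "key2": "z", "key3": [9]}');     -- T rows are ignored

-- Both queries above return only location_id = 1.
```

Note that count(distinct ...) ignores NULLs, so a row that is simply missing one of the keys does not by itself register as a conflict with either query.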

Related

Mapping array to composite type to a different row type

I want to map an array of key value pairs of GroupCount to a composite type of GroupsResult mapping only specific keys.
I'm using unnest to turn the array into rows, and then use 3 separate select statements to pull out the values.
This feels like a lot of code for something so simple.
Is there an easier / more concise way to do the mapping from the array type to the GroupsResult type?
create type GroupCount AS (
Name text,
Count int
);
create type GroupsResult AS (
Cats int,
Dogs int,
Birds int
);
WITH unnestedTable AS (WITH resultTable AS (SELECT ARRAY [ ('Cats', 5)::GroupCount, ('Dogs', 2)::GroupCount ] resp)
SELECT unnest(resp)::GroupCount t
FROM resultTable)
SELECT (
(SELECT (unnestedTable.t::GroupCount).count FROM unnestedTable WHERE (unnestedTable.t::GroupCount).name = 'Cats'),
(SELECT (unnestedTable.t::GroupCount).count FROM unnestedTable WHERE (unnestedTable.t::GroupCount).name = 'Dogs'),
(SELECT (unnestedTable.t::GroupCount).count FROM unnestedTable WHERE (unnestedTable.t::GroupCount).name = 'Birds')
)::GroupsResult
fiddle: http://sqlfiddle.com/#!17/56aa2/1
A bit simpler. :)
SELECT (min(u.count) FILTER (WHERE name = 'Cats')
, min(u.count) FILTER (WHERE name = 'Dogs')
, min(u.count) FILTER (WHERE name = 'Birds'))::GroupsResult
FROM unnest('{"(Cats,5)","(Dogs,2)"}'::GroupCount[]) u;
db<>fiddle here
See:
Aggregate columns with additional (distinct) filters
Subtle difference: the original raises an exception if one of the names pops up more than once, while this just returns the minimum count. That may or may not be what you want, or it may be irrelevant if duplicates can never occur.
For many different names, crosstab() is typically faster. See:
PostgreSQL Crosstab Query
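For completeness, a crosstab() sketch for this particular case. It requires the tablefunc extension; the dummy row key (the constant 1) and the explicit category list are choices made for this example. Its source query must return (row identifier, category, value):

```sql
create extension if not exists tablefunc;

select *
from crosstab(
    $$select 1, name, count
      from unnest('{"(Cats,5)","(Dogs,2)"}'::GroupCount[])
      order by 1$$,
    $$values ('Cats'), ('Dogs'), ('Birds')$$
) as ct(rn int, cats int, dogs int, birds int);
```

Categories with no matching row (Birds here) come back as NULL rather than raising an error.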

Compute value in SELECT based on whether a column is NOT NULL

I'm not sure of the correct terminology, but this should be clear when you see what I have so far.
SELECT
    (
        -- WHAT GOES HERE?
    ) AS "Type",
    COUNT(*) AS pie
FROM "people"
GROUP BY "Type"
ORDER BY COUNT(*) DESC, "Type"
I'm trying to classify people based on whether or not they have a value in any of these columns:
employee_id
student_id
with these types being possible:
Employee
Student
Both Employee & Student
(As you might have guessed from the SQL, this is going to generate a pie graph, so instead of putting anyone in 2 categories, I have a category that includes the people who are both employees and students.)
I believe a CASE expression would be suitable:
CASE
    WHEN employee_id IS NOT NULL AND student_id IS NOT NULL THEN 'Both'
    WHEN employee_id IS NOT NULL THEN 'Employee'
    WHEN student_id IS NOT NULL THEN 'Student'
    ELSE 'None'
END AS "Type"
You can also write GROUP BY 1 to group by the first expression in the SELECT list.
From https://www.postgresql.org/docs/10/sql-select.html ...
The optional GROUP BY clause has the general form
GROUP BY grouping_element [, ...]
GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions. An expression used inside a grouping_element can be an input column name, or the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.
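Putting the CASE expression and the ordinal GROUP BY together, the full query might look like this (table and column names taken from the question; the category labels are spelled out to match the desired pie-chart legend):

```sql
SELECT
    CASE
        WHEN employee_id IS NOT NULL AND student_id IS NOT NULL
             THEN 'Both Employee & Student'
        WHEN employee_id IS NOT NULL THEN 'Employee'
        WHEN student_id IS NOT NULL THEN 'Student'
        ELSE 'None'
    END AS "Type",
    COUNT(*) AS pie
FROM "people"
GROUP BY 1                          -- groups by the CASE expression
ORDER BY COUNT(*) DESC, "Type";
```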

SQL: combine two tables for a query

I want to query two tables at a time to find the key for an artist given their name. The issue is that my data comes from disparate sources with no definitive standard for presenting names (e.g. Forename Surname vs. Surname, Forename). To handle this, I have a table of definitive names used throughout the rest of my system, plus a separate table of aliases that maps the varying styles to each artist.
This is PostgreSQL but apart from the text type it's pretty standard. Substitute character varying if you prefer:
create table Artists (
    id serial primary key,
    name text
    -- other stuff, not relevant
);
create table Aliases (
    artist integer references Artists(id) not null,
    name text not null
);
Now I'd like to be able to query both sets of names in a single query to obtain the appropriate id. Any way to do this? e.g.
select id from ??? where name = 'Bloggs, Joe';
I'm not interested in revising my schema's idea of what a "name" is to something more structured, e.g. separate forename and surname, since it's inappropriate for the application. Most of my sources don't structure the data, sometimes one or the other name isn't known, it may be a pseudonym, or sometimes the "artist" may be an entity such as a studio.
I think you want:
select a.id
from artists a
where a.name = 'Bloggs, Joe'
   or exists (select 1
              from aliases aa
              where aa.artist = a.id
                and aa.name = 'Bloggs, Joe');
Actually, if you just want the id (and not other columns), then you can use:
select a.id
from artists a
where a.name = 'Bloggs, Joe'
union all -- union if there could be duplicates
select aa.artist
from aliases aa
where aa.name = 'Bloggs, Joe';
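Either form looks rows up by name in both tables, so if the tables grow large you will probably want indexes on both name columns. A sketch (the index names here are made up):

```sql
create index artists_name_idx on Artists (name);
create index aliases_name_idx on Aliases (name);
```

With those in place, both the EXISTS form and the UNION ALL form can resolve a name with two index lookups instead of sequential scans.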

Assign unique IDs to three tables in SELECT query; IDs should not overlap

I am working on SQL Server and I want to assign unique IDs to rows being pulled from three tables; the IDs should not overlap.
Let's say table one contains car data, table two contains house data, and table three contains city data. I want to pull all this data into a single table with a unique ID for each row: say cars from 1-100, houses from 101-200, and cities from 300-400.
How can I achieve this using only SELECT queries? I can't use INSERT statements.
To be more precise:
I have one table with computer systems/servers host information, which has IDs from 500-700.
I have two more tables: storage devices (IDs from 200-600) and routers (IDs from 700-900). I have already collected the systems data. Now I want to pull the storage and router data in such a way that the consolidated data at my end has a unique ID for every record. This needs to be done using only SELECT queries.
I was using SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS UniqueID and storing it in temp tables (separate ones for storage and routers), but I believe this may lead to some overlap. Please suggest another way to do this.
An extension to this question: creating a consistent integer from a string.
All I have are various strings like these:
String1
String2Hello123
String3HelloHowAreYou
I need to convert them into positive integers, something like:
String1 = 12
String2Hello123 = 25
String3HelloHowAreYou = 4567
Note that I am not expecting the numbers in any order. The only requirement is that the number generated for one string must not conflict with any other.
Now, later, after a reboot, suppose I no longer have the 2nd string but instead have a new one:
String1 = 12
String3HelloHowAreYou = 4567
String2Hello123HowAreyou = 28
Note that the number 25 generated for the 2nd string earlier cannot be reused for the new string.
Using extra storage (temp tables) is not allowed.
If you don't care where the data comes from:
with dat as (
    select 't1' src, id from table1
    union all
    select 't2' src, id from table2
    union all
    select 't3' src, id from table3
)
select *
     , id2 = row_number() over (order by _some_column_)
from dat
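If the requirement is only that the consolidated IDs never collide, another sketch is to keep the existing IDs and add a fixed per-source offset larger than any existing ID, so each source lands in its own disjoint range. The table names and offsets below are placeholders based on the ranges described in the question:

```sql
-- T-SQL sketch: offsets chosen so the resulting ranges cannot overlap
select 'systems' as src, id        as new_id from systems          -- ids 500-700, kept as-is
union all
select 'storage',        id + 1000 from storage_devices           -- ids 200-600 -> 1200-1600
union all
select 'routers',        id + 2000 from routers;                  -- ids 700-900 -> 2700-2900
```

Unlike NEWID()-based hashing, this is deterministic: the same source row always maps to the same consolidated ID across reboots, which also addresses the "consistent integer" extension as long as each source keeps its own stable ID.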

How to set/serialize values based on results from multiple rows / multiple columns in postgresql

I have a table in which I want to calculate two column values based on results from multiple rows / multiple columns. The primary key is set on the first two columns (tag, qid).
I would like to set the values of two fields, serial and total.
The serial column value is unique within each tag, so if I have two records with the same tag, record one must have serial 1, record two serial 2, and so on. The serial must be assigned in accordance with the priority field: higher priority values are numbered first.
The total column is the total number of rows for each tag in the table.
I would like to do this in plain SQL instead of creating stored procedures, cursors, etc.
the table below shows full valid settings.
 tag | qid | priority | serial | total
-----+-----+----------+--------+------
 abc |  87 |       99 |      1 |     2
 abc |  56 |       11 |      2 |     2
 xyz |  89 |       80 |      1 |     1
 pfm |  28 |       99 |      1 |     3
 pfm |  17 |       89 |      2 |     3
 pfm |  64 |       79 |      3 |     3
Many Thanks
You can readily return a result set with this information using window functions:
select tag, qid, priority,
       row_number() over (partition by tag order by priority desc) as serial,
       count(*) over (partition by tag) as total
from tbl t;
Note that both windows partition by tag alone: since (tag, qid) is the primary key, partitioning by both columns would put every row in its own partition, making every serial and total equal to 1.
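Since the question asks to set the stored columns rather than just select them, and window functions cannot appear directly in an UPDATE's SET list, one sketch is to compute them in a CTE and join back on the primary key. The table name tbl is a placeholder, as the question does not name the table:

```sql
-- compute serial/total per tag, then write them back via the (tag, qid) key
with ranked as (
    select tag, qid,
           row_number() over (partition by tag order by priority desc) as new_serial,
           count(*)     over (partition by tag)                        as new_total
    from tbl
)
update tbl t
set serial = r.new_serial,
    total  = r.new_total
from ranked r
where t.tag = r.tag
  and t.qid = r.qid;
```

Bear in mind the stored values go stale as soon as rows are inserted or deleted; if that matters, it may be better to compute them on read with the SELECT version instead.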