Flat to multi-level nested data - sql

In BigQuery, I have used ARRAY_AGG(STRUCT(...)) to restructure some flat data, but I want to go a level further: create another array of records within an array of records.
Although STRUCT does not exist in PostgreSQL, I am interested in how one would tackle that there too.
Considering the flat data:
WITH a AS (
SELECT 'ABC' company, 'adress1' address, 'name1' name, 'email1' email, 'work' ph_type, '+123' ph_nr
UNION ALL
SELECT 'ABC' company, 'adress1' address, 'name1' name, 'email1' email, 'cell' ph_type, '+987'
UNION ALL
SELECT 'DEF' company, 'adress2' address, 'name2' name, 'email2' email, 'work' ph_type, '+127'
UNION ALL
SELECT 'DEF' company, 'adress2' address, 'name2' name, 'email2' email, 'cell' ph_type, '+988'
UNION ALL
SELECT 'XYZ' company, 'adress3' address, 'name3' name, 'email3' email, 'work' ph_type, '+456'
)
I can nest contact like so
SELECT company, address, ARRAY_AGG(STRUCT(name, email, ph_type, ph_nr)) contact
FROM a
GROUP BY company, address
ORDER BY 1
but how can I nest phones as well, in the same SELECT statement (an array of records within contact)?
The JSON representation would look like - for the first contact:
[
{
"company": "ABC",
"address": "adress1",
"contact": [
{
"name": "name1",
"email": "email1",
"phone": [
{
"ph_type": "work",
"ph_nr": "+123"
},
{
"ph_type": "cell",
"ph_nr": "+987"
}
]
},
...
This can probably be done with a WITH clause or a subselect that processes the aggregations sequentially, but I am not sure this would perform well (would the data be read twice?).
I have 600M records to parse daily, so I am wondering about the most efficient way.
EDIT: corrected name definition

The answer to your question is two levels of aggregation.
However, the question itself confuses me, because the query uses name but that is not defined in the data.
Here is an example of what to do:
SELECT company, address, ARRAY_AGG(STRUCT(name, email, phones)) as contact
FROM (SELECT company, name, address, email, ARRAY_AGG(STRUCT(ph_type, ph_nr)) as phones
FROM a
GROUP BY company, name, address, email
) a
GROUP BY company, address
ORDER BY 1
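As a rough illustration of what "two levels of aggregation" means, here is a minimal Python sketch that applies the same two grouping steps to the flat rows from the question. It is not BigQuery or PostgreSQL code, and the function name `nest` is made up for this example; it only mirrors the inner and outer GROUP BY.

```python
from collections import defaultdict

# Flat rows copied from the question:
# (company, address, name, email, ph_type, ph_nr)
rows = [
    ("ABC", "adress1", "name1", "email1", "work", "+123"),
    ("ABC", "adress1", "name1", "email1", "cell", "+987"),
    ("DEF", "adress2", "name2", "email2", "work", "+127"),
    ("DEF", "adress2", "name2", "email2", "cell", "+988"),
    ("XYZ", "adress3", "name3", "email3", "work", "+456"),
]


def nest(flat_rows):
    """Hypothetical helper: nest phones inside contacts inside companies."""
    # Level 1 (inner GROUP BY company, address, name, email): collect phones.
    phones = defaultdict(list)
    for company, address, name, email, ph_type, ph_nr in flat_rows:
        key = (company, address, name, email)
        phones[key].append({"ph_type": ph_type, "ph_nr": ph_nr})

    # Level 2 (outer GROUP BY company, address): collect contacts.
    contacts = defaultdict(list)
    for (company, address, name, email), phone_list in phones.items():
        contacts[(company, address)].append(
            {"name": name, "email": email, "phone": phone_list}
        )

    # ORDER BY company, address
    return [
        {"company": company, "address": address, "contact": contacts[(company, address)]}
        for (company, address) in sorted(contacts)
    ]


nested = nest(rows)
```

Each input row is visited once in the first step, and the second step only touches the already reduced groups, which is the same shape as the subquery plus outer query above.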

Related

stack or union multiple fields in MS Access

Beginner's question here... I have a table of tree measurements being 3 fields: - ID, Diameter_1, Diameter_2
& I wish to get to these 3 fields: - ID, DiameterName, DiameterMeasurement
Input and Desired Output
SELECT DISTINCT ID, Diameter_1
FROM tblDiameters
UNION SELECT DISTINCT ID, Diameter_2
FROM tblDiameters;
Though it results in only 2 fields. How may the field: - DiameterMeasurement be brought in?
Many thanks :-)
You were on the right track to use a union. Here is one viable approach:
SELECT ID, 'Diameter_1' AS DiameterName, Diameter_1 AS DiameterMeasurement
FROM tblDiameters
UNION ALL
SELECT ID, 'Diameter_2', Diameter_2
FROM tblDiameters
ORDER BY ID, DiameterName;
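To try the UNION ALL approach without Access at hand, the same statement can be run against an in-memory SQLite database from Python. This is only a sketch: the two sample trees and their measurements are invented, and SQLite stands in for Access here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblDiameters (ID INTEGER, Diameter_1 REAL, Diameter_2 REAL)"
)
# Invented sample measurements, for illustration only.
conn.executemany(
    "INSERT INTO tblDiameters VALUES (?, ?, ?)",
    [(1, 10.5, 11.0), (2, 20.0, 19.5)],
)

# Each branch labels its rows with the name of the column they came from.
unpivoted = conn.execute(
    """
    SELECT ID, 'Diameter_1' AS DiameterName, Diameter_1 AS DiameterMeasurement
    FROM tblDiameters
    UNION ALL
    SELECT ID, 'Diameter_2', Diameter_2
    FROM tblDiameters
    ORDER BY ID, DiameterName
    """
).fetchall()

for row in unpivoted:
    print(row)
```

Every input row produces two output rows, one per diameter column, each with three fields.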

Get all tables data with Node.js and SQLite

Given these tables in a database:
Athlete with fields: athlete_id, name, surname, date_of_birth, height, weight, bio, photo_id
AthletePhoto with fields: photo_id, photo, mime_type
AthleteResult with fields: athlete_id, gold, silver, bronze
Game with fields: game_id, city, year
The db model:
The code so far can only send data for one of the tables:
db.serialize(function () {
db.all(
'SELECT athlete_id, name, surname FROM Athlete',
function (err, rows) {
return res.send(rows);
}
);
});
so it uses that query: SELECT athlete_id, name, surname FROM Athlete.
Is there a way to combine the tables and send all data?
I've tried to combine 2 tables, Athlete and AthletePhoto, but it didn't send any data:
SELECT athlete_id, name FROM Athlete UNION SELECT game_id, city, year FROM Game UNION SELECT photo_id as athlete_id, mime_type as name FROM AthletePhoto
Assuming that your database structure correctly represents your application needs, the query which you are trying to make will look something like this:
SELECT
a.athlete_id, a.name, a.surname, a.date_of_birth, a.bio, a.height, a.weight,
ap.photo, ap.mime_type,
ar.gold, ar.silver, ar.bronze,
g.city, g.year
FROM
(
(
(Athlete a JOIN AthletePhoto ap ON a.photo_id = ap.photo_id)
JOIN
AthleteResult ar ON a.athlete_id = ar.athlete_id
)
JOIN
Game g ON ar.game_id = g.game_id
)
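There is one mistake in the Athlete table: the date_of_birth column is defined twice. You should rename one of them. There is no need to use UNION if you want to combine columns from different tables; use JOIN instead. Your UNION attempt also fails because its SELECTs return different numbers of columns (2, 3 and 2).
JOIN combines tables column-wise: matching rows are placed side by side, so the result gains columns.
UNION combines tables row-wise: the rows of one result are stacked under the other, so every SELECT must return the same number of columns.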
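The following Python sketch shows what each operator does to the shape of a result, using an in-memory SQLite database. The tables are cut down to a few columns and the two athletes are invented; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Athlete (athlete_id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
    CREATE TABLE AthleteResult (athlete_id INTEGER, gold INTEGER, silver INTEGER, bronze INTEGER);
    -- Invented sample rows, for illustration only.
    INSERT INTO Athlete VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
    INSERT INTO AthleteResult VALUES (1, 2, 0, 1), (2, 0, 1, 0);
    """
)

# JOIN: one row per athlete, with the result columns added on the right.
joined = conn.execute(
    """
    SELECT a.athlete_id, a.name, a.surname, ar.gold, ar.silver, ar.bronze
    FROM Athlete a
    JOIN AthleteResult ar ON a.athlete_id = ar.athlete_id
    ORDER BY a.athlete_id
    """
).fetchall()

# UNION ALL: same two columns in both branches, rows stacked on top of each other.
stacked = conn.execute(
    """
    SELECT athlete_id, name FROM Athlete
    UNION ALL
    SELECT athlete_id, 'result row' FROM AthleteResult
    """
).fetchall()

print(joined)   # 2 rows, 6 columns
print(stacked)  # 4 rows, 2 columns
```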

Is it possible to process external data as some kind of virtual table?

I have the following problem
I need to make a funnel, where I need to show correlation between the data I have and data from DB.
I have a query of the following kind:
select name, count(distinct email) from some_table
where name = 'name_1' and email in ('email 1', 'email 2', 'email 3') or
name = 'name_2' and email in ('email 2', 'email 4', 'email 5')
group by 1
Is it possible to treat the data in the WHERE clause as if it were a table? I mean, is there a way to also count the listed emails per name, with something like this?
select name, count(emails in the list), count(distinct email)
to have the result like this
name_1 3 2
name_2 3 1
...
The listed emails can be absent from some_table, and if I were to join tables, I would have to join 3 different tables that are not directly related for every piece of data. The data I have is manually processed and has not been added to the DB.
You can use a VALUES list to construct your virtual table, and join that to your real table.
select f.name, count(distinct email) from some_table join
(VALUES ('name_1', '{email 1,email 2,email 3}'::text[]),
('name_2', '{email 2,email 4,email 5}')) f(name,emails)
on some_table.name=f.name and ARRAY[email] && emails
group by 1
I had to switch from an IN-list to an equivalent array operation, because the tables can have arrays but can't have lists.
count(emails in the list)
I'm not sure what this means. Maybe this?
select f.name, cardinality(emails), count(distinct email) from some_table join
(VALUES ('name_1', '{email 1,email 2,email 3}'::text[]),
('name_2', '{email 2,email 4,email 5}')) f(name,emails)
on some_table.name=f.name and ARRAY[email] && emails
group by 1,2
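For comparison, here is a sketch of the same "VALUES list as a virtual table" idea in SQLite, run from Python. SQLite has no array type, so this variant is an assumption of mine rather than the PostgreSQL query above: the list is written as one (name, email) row per pair, and a LEFT JOIN keeps listed emails that are missing from some_table. The contents of some_table are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE some_table (name TEXT, email TEXT);
    -- Invented sample rows, for illustration only.
    INSERT INTO some_table VALUES
        ('name_1', 'email 1'), ('name_1', 'email 1'), ('name_1', 'email 3'),
        ('name_2', 'email 4'), ('name_2', 'email 9');
    """
)

funnel = conn.execute(
    """
    WITH f(name, email) AS (
        VALUES ('name_1', 'email 1'), ('name_1', 'email 2'), ('name_1', 'email 3'),
               ('name_2', 'email 2'), ('name_2', 'email 4'), ('name_2', 'email 5')
    )
    SELECT f.name,
           COUNT(DISTINCT f.email) AS listed,  -- emails in the manual list
           COUNT(DISTINCT t.email) AS found    -- of those, emails present in some_table
    FROM f
    LEFT JOIN some_table t ON t.name = f.name AND t.email = f.email
    GROUP BY f.name
    ORDER BY f.name
    """
).fetchall()

for row in funnel:
    print(row)
```

With this sample data the two counts per name are 3 and 2 for name_1 and 3 and 1 for name_2, which is the layout the question asks for.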

Access SQL Group by with condition

I'm using MS Access for the following task (due to office restrictions). I'm quite new to SQL.
I have the following table:
I want to select all stores grouped by street, zip and place. But I only want to group them if the SumSquare (after GROUP BY) is < 1000. Rue de gare 2 should be grouped, while Bahnhofstrasse 23 should be separate lines.
As far as I know, MS Access doesn't allow a CASE statement, so my query looks like this:
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare,
FROM Table1
SWITCH (SumSquare > 1000, GROUP BY (Street, ZIP, Place))
I also tried:
GROUP BY
SWITCH (SumSquare > 1000, (Street, ZIP, Place))
But it keeps telling me I have a syntax error. Could someone please help me?
In Access, I would do this with several queries.
This would be easier to do if you had an id on the rows (such as an autonumber).
First query identifies the streets that should be summed.
query: SumTheseStreets
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING sum(Square) < 1000
Note the HAVING which is a bit like a WHERE clause that's applied outside of the GROUP BY or SUM
Second query identifies the other rows (notes on this one below):
query: StreetsNotSummed
SELECT
Table1.Street,
Table1.ZIP,
Table1.Place,
Table1.Square AS SumSquare
FROM Table1
LEFT JOIN SumTheseStreets ON (Table1.Street = SumTheseStreets.Street) AND (Table1.ZIP = SumTheseStreets.ZIP) AND (Table1.Place = SumTheseStreets.Place)
WHERE SumTheseStreets.Street IS NULL;
A couple of notes:
I've called the field SumSquare because I want it to be the same name as the SumSquare field in the first query
It uses the first query as one of the input "tables"
This uses a LEFT JOIN, which means "give me all of the rows in the first table (Table1) and, if any rows in the second table (SumTheseStreets) match, put those in as well",
but then the WHERE clause filters out the rows that DO match.
So this query only lists the streets that you want NOT summed.
So now you need a third query.
This simply includes all of the rows in both of those queries.
I'm not too sure on the Access syntax on this one, but there's a union query wizard if this isn't right.
Query: TheAnswerRequired
SELECT
Street,
ZIP,
Place,
SumSquare
FROM SumTheseStreets
UNION
SELECT
Street,
ZIP,
Place,
SumSquare
FROM StreetsNotSummed
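(It might need to be UNION ALL: plain UNION removes duplicate rows, so two unsummed stores with the same street, ZIP, place and square would collapse into one line.)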
Good luck.
You can use UNION ALL:
SELECT ts.*
FROM (SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
WHERE ts.SumSquare < 1000
UNION ALL
SELECT t1.Street, t1.Zip, t1.Place, t1.Square
FROM Table1 as t1 INNER JOIN
(SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
ON t1.Street = ts.Street AND t1.Zip = ts.Zip and t1.Place = ts.Place
WHERE ts.SumSquare >= 1000
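The UNION ALL approach can be checked quickly in an in-memory SQLite database from Python. This is a sketch under assumptions: the Store column, the ZIP codes, places and square values are invented because the original table is not shown, SQLite stands in for Access, and the second branch lists its columns explicitly so that both branches return four columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Store TEXT, Street TEXT, ZIP TEXT, Place TEXT, Square INTEGER);
    -- Invented sample rows, for illustration only.
    INSERT INTO Table1 VALUES
        ('A', 'Rue de gare 2',     '1000', 'Lausanne', 300),
        ('B', 'Rue de gare 2',     '1000', 'Lausanne', 400),
        ('C', 'Bahnhofstrasse 23', '8000', 'Zurich',   600),
        ('D', 'Bahnhofstrasse 23', '8000', 'Zurich',   600);
    """
)

result = conn.execute(
    """
    -- Addresses whose total is below 1000: one summed row each.
    SELECT ts.Street, ts.ZIP, ts.Place, ts.SumSquare
    FROM (SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
          FROM Table1
          GROUP BY Street, ZIP, Place) AS ts
    WHERE ts.SumSquare < 1000
    UNION ALL
    -- Addresses whose total is 1000 or more: the original rows, unsummed.
    SELECT t1.Street, t1.ZIP, t1.Place, t1.Square
    FROM Table1 AS t1
    INNER JOIN (SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
                FROM Table1
                GROUP BY Street, ZIP, Place) AS ts
        ON t1.Street = ts.Street AND t1.ZIP = ts.ZIP AND t1.Place = ts.Place
    WHERE ts.SumSquare >= 1000
    """
).fetchall()

for row in sorted(result):
    print(row)
```

In this sample the two Bahnhofstrasse 23 rows are identical once the store name is dropped, so they survive only because UNION ALL keeps duplicates.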

Oracle Query - Use of Analytical functions

Assume we have loaded a flat file with patient diagnosis data into a table called “Data”. The table structure is:
Create table Data (
Firstname varchar(50),
Lastname varchar(50),
Date_of_birth datetime,
Medical_record_number varchar(20),
Diagnosis_date datetime,
Diagnosis_code varchar(20))
The data in the flat file looks like this:
'jane','jones','2/2/2001','MRN-11111','3/3/2009','diabetes'
'jane','jones','2/2/2001','MRN-11111','1/3/2009','asthma'
'jane','jones','5/5/1975','MRN-88888','2/17/2009','flu'
'tom','smith','4/12/2002','MRN-22222','3/3/2009','diabetes'
'tom','smith','4/12/2002','MRN-33333','1/3/2009','asthma'
'tom','smith','4/12/2002','MRN-33333','2/7/2009','asthma'
'jack','thomas','8/10/1991','MRN-44444','3/7/2009','asthma'
You can assume that no two patients have the same firstname, lastname, and date of birth combination. However one patient might have several visits on different days. These should all have the same medical record number.
The problem is this: Tom Smith has 2 different medical record numbers. Write a query that would always show all the patients
who are like Tom Smith – patients with more than one medical record number.
I came up with the query below. It works perfectly fine, but I wanted to know if there is a better way to write it using Oracle analytic functions. Thank you in advance.
SELECT a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
FROM data a, data b
WHERE a.firstname = b.firstname
AND a.lastname = b.lastname
AND a.date_of_birth = b.date_of_birth
AND a.medical_record_number <> b.medical_record_number
GROUP BY a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
It is possible to do via analytic functions, but whether it's faster than doing the join in your query* or not depends on what data you have. You'd need to test.
with data (firstname, lastname, date_of_birth, medical_record_number, diagnosis_date, diagnosis_code)
as (select 'jane','jones','2/2/2001','MRN-11111',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'jane','jones','2/2/2001','MRN-11111',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jane','jones','5/5/1975','MRN-88888',to_date('2/17/2009', 'mm/dd/yyyy'),'flu' from dual union all
select 'tom','smith','4/12/2002','MRN-22222',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('2/7/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jack','thomas','8/10/1991','MRN-44444',to_date('3/7/2009', 'mm/dd/yyyy'),'asthma' from dual),
-- end of mimicking your table and its data
res as (select firstname,
lastname,
date_of_birth,
medical_record_number,
count(distinct medical_record_number) over (partition by firstname, lastname, date_of_birth) cnt_med_rec_nums
from data)
select distinct firstname,
lastname,
date_of_birth,
medical_record_number
from res
where cnt_med_rec_nums > 1;
*btw, the group by in your example query is not necessary; it would make much more sense to switch it out for a distinct - it makes your intent much clearer, since you're wanting to get a distinct set of records.
You can probably simplify the query a bit using a HAVING clause rather than doing a self-join
SELECT a.firstname,
a.lastname,
a.date_of_birth,
MIN(a.medical_record_number) lowest_medical_record_number,
MAX(a.medical_record_number) highest_medical_record_number
FROM data a
GROUP BY a.firstname,
a.lastname,
a.date_of_birth
HAVING COUNT( DISTINCT a.medical_record_number ) > 1
I'm returning the smallest and largest medical record number for each patient here (that's what I'd do if most of the patients with this problem have just two numbers rather than having dozens). You could return just one or you could return a comma-separated list of all the medical record numbers if you'd rather (which would probably make more sense if most of the bad folks have dozens of numbers).
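The HAVING version is easy to try outside Oracle. Below is a sketch that loads the sample rows from the question into an in-memory SQLite database from Python and runs the same GROUP BY ... HAVING query. Dates are stored as plain text here, which is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE data (
        firstname TEXT, lastname TEXT, date_of_birth TEXT,
        medical_record_number TEXT, diagnosis_date TEXT, diagnosis_code TEXT
    )
    """
)
# Sample rows copied from the question (dates kept as text).
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("jane", "jones", "2/2/2001", "MRN-11111", "3/3/2009", "diabetes"),
        ("jane", "jones", "2/2/2001", "MRN-11111", "1/3/2009", "asthma"),
        ("jane", "jones", "5/5/1975", "MRN-88888", "2/17/2009", "flu"),
        ("tom", "smith", "4/12/2002", "MRN-22222", "3/3/2009", "diabetes"),
        ("tom", "smith", "4/12/2002", "MRN-33333", "1/3/2009", "asthma"),
        ("tom", "smith", "4/12/2002", "MRN-33333", "2/7/2009", "asthma"),
        ("jack", "thomas", "8/10/1991", "MRN-44444", "3/7/2009", "asthma"),
    ],
)

# One row per patient who has more than one distinct medical record number.
duplicates = conn.execute(
    """
    SELECT firstname, lastname, date_of_birth,
           MIN(medical_record_number) AS lowest_medical_record_number,
           MAX(medical_record_number) AS highest_medical_record_number
    FROM data
    GROUP BY firstname, lastname, date_of_birth
    HAVING COUNT(DISTINCT medical_record_number) > 1
    """
).fetchall()

print(duplicates)
```

The two Jane Jones patients are kept apart by their different dates of birth, so only Tom Smith is reported.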