I have a list of profile records, where each record looks like the one below:
{
  "name": "Peter Pan",
  "contacts": [
    {
      "key": "mobile",
      "value": "1234-5678"
    }
  ],
  "addresses": [
    {
      "key": "postal",
      "value": "2356 W. Manchester Ave.\nSomewhere District\nA Country"
    },
    {
      "key": "po",
      "value": "PO Box 1234"
    }
  ],
  "emails": [
    {
      "key": "work",
      "value": "abc@work.com"
    },
    {
      "key": "personal",
      "value": "abc@personal.com"
    }
  ],
  "url": "http://www.example.com/"
}
I would think about having the following schema structure:
A profile table with id and name fields.
A profile_contact table with id, profile_id, key, and value fields.
A profile_address table with id, profile_id, key, and value fields.
A profile_email table with id, profile_id, key, and value fields.
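For concreteness, a rough DDL sketch of that structure (generic SQL; column sizes are assumptions, and only one child table is shown since the others have the same shape):

CREATE TABLE profile (
  id    INTEGER PRIMARY KEY,
  name  VARCHAR(100) NOT NULL
);

-- profile_address and profile_email look the same
CREATE TABLE profile_contact (
  id          INTEGER PRIMARY KEY,
  profile_id  INTEGER NOT NULL REFERENCES profile(id),
  key         VARCHAR(50),    -- e.g. 'mobile'
  value       VARCHAR(255)    -- e.g. '1234-5678'
  -- note: key/value may need quoting in some databases
);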
However, I think I am creating too many tables for such a simple JSON!
Would there be performance problems when I search across the tables, since many JOINS are performed to retrieve just one record?
What would be a better way to model the above JSON record in a database? In SQL, or would NoSQL be better?
It kind of depends.
If you are planning to have an "infinite" number of contacts/addresses/emails per user, then your idea is a pretty good way to go.
You could also consider (something like) the following:
PROFILE table, containing:
PROFILE_ID
NAME
EMAIL_ADDRESS_WORK
EMAIL_ADDRESS_PERSONAL
PHONE_NUMBER
MOBILE_NUMBER
ADDRESS table, containing:
ADDRESS_ID
PROFILE_ID
STREET
CITY
..etc
This means you can store two kinds of email address and two kinds of phone number per user, and they are stored with the profile itself.
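A minimal DDL sketch of that PROFILE table (generic SQL; column sizes are assumptions):

CREATE TABLE profile (
  profile_id              INTEGER PRIMARY KEY,
  name                    VARCHAR(100),
  email_address_work      VARCHAR(255),
  email_address_personal  VARCHAR(255),
  phone_number            VARCHAR(30),
  mobile_number           VARCHAR(30)
);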
Alternatively you can choose to have a separate CONTACT table which contains both phone numbers and email addresses (and maybe other types):
CONTACT_TYPE (phone, mobile, email_work, email_personal)
CONTACT_VALUE
PROFILE_ID
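In DDL form that could look like this (again a sketch; names and sizes are assumptions):

CREATE TABLE contact (
  contact_id     INTEGER PRIMARY KEY,
  profile_id     INTEGER NOT NULL REFERENCES profile(profile_id),
  contact_type   VARCHAR(30),    -- phone, mobile, email_work, email_personal, ...
  contact_value  VARCHAR(255)
);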
All three (mine and yours) could work perfectly. To decide which would work best for you, you should write down all the possibilities there are (and could be). Maybe you want to be able to add 10 email addresses per profile (then storing them with the profile would be silly); maybe you will have a very large variety of contact types, such as IM, Facebook, ICQ, Twitter (then a CONTACTS table would fit nicely).
So try to find out/list what types of data you will have and see how that will fit into a specific model, then pick the most suitable one :)
This is the most common case for database design... You should stop treating it as something new just because you included JSON :-)
Just create:
User: id, name
Contacts: user_id, id, key, value
Emails, addresses, and whatever else are like contacts.
Now you just have to select from user and inner join the other tables on user.id = contacts.user_id.
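A sketch of that query (the names users and contacts are assumptions; "user" on its own is a reserved word in some databases):

SELECT u.id, u.name, c.key, c.value
FROM users u
INNER JOIN contacts c ON c.user_id = u.id
WHERE u.id = 1;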
I want to execute a MERGE statement in BigQuery which fills in missing fields coming from late data of an upstream data pipeline. For this, I dynamically create the SQL MERGE statement using a CTE. Since it is late data, I know that the target table already has a partially filled row, so when building the dynamic MERGE statement I use this information to update only the missing fields: I set the already-present fields in the CTE to NULL and write T.already_filled_col = IFNULL(S.already_filled_col, T.already_filled_col), where S.already_filled_col is NULL.
However, since it is a CTE without a schema, BigQuery complains that the NULL value in the CTE is of the wrong type. In particular, I get the error
No matching signature for function IFNULL for argument types: INT64, STRING
How do I specify the type of the NULL value in a CTE?
Below is a minimal working example.
Given a target table with the following schema:
[
  {
    "name": "join_col",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "already_filled_col",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "update_col",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
Fill it with dummy data:
INSERT `my_project.my_dataset.test_table` (join_col, already_filled_col, update_col)
VALUES
('join_key', 'on_time_data', NULL)
My merge statement:
MERGE `my_project.my_dataset.test_table` AS T
USING (
  SELECT
    "join_key" AS join_col,
    NULL AS already_filled_col,
    "late_data" AS update_col
) AS S
ON T.join_col = S.join_col
WHEN MATCHED THEN
UPDATE SET
  T.already_filled_col = IFNULL(S.already_filled_col, T.already_filled_col),
  T.update_col = IFNULL(S.update_col, T.update_col)
I am aware that I could replace NULL AS already_filled_col in the CTE with CAST(NULL AS STRING) AS already_filled_col. However, I am hoping that there is an easier way, since for the actual data the types are not always STRING and deriving them dynamically is something I was hoping to avoid.
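For reference, the CAST workaround mentioned above, applied to the example (only the USING clause changes):

USING (
  SELECT
    "join_key" AS join_col,
    CAST(NULL AS STRING) AS already_filled_col,  -- typed NULL, so IFNULL sees (STRING, STRING)
    "late_data" AS update_col
) AS S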
I want to show all the rows in my table with all the columns except those columns that are null.
-- SELECT all users
SELECT * FROM users
ORDER BY user_id ASC;
-- SELECT a user
SELECT * FROM users
WHERE user_id = $1;
Currently my API's GET request returns something like this with the above queries:
{
"user_id": 10,
"name": "Bruce Wayne",
"username": "Batman",
"email": "bat#cave.com",
"phone": null,
"website": null
}
Is there any way I can display it like this so that the null columns aren't shown?
{
"user_id": 10,
"name": "Bruce Wayne",
"username": "Batman",
"email": "bat#cave.com"
}
I understand that you are serializing (or deserializing) JSON objects in your code. Most serialization libraries have special options for this, such as "ignore nulls".
If you generate this JSON data in the database itself, inside your SQL code, then you can use the Postgres jsonb_strip_nulls(JSONB) function. It recursively removes all keys with null values from the JSONB and returns a JSONB value.
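Applied to the query above, a sketch (assuming the users table from the question; to_jsonb converts the row to JSONB first):

SELECT jsonb_strip_nulls(to_jsonb(u)) AS user_json
FROM users u
WHERE user_id = $1;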
I would like to create a partitioned table in BigQuery. The schema for the table is in JSON format stored in my local path.
I would like to create this table with partition from the JSON file using "bq mk -t" command. Kindly help.
Thanks in advance.
bq mk --table --schema=file.json PROJECTID:DATASET.TABLE
Hope the above example helps.
You can refer to the documentation for more options.
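Since the goal is a partitioned table, you would also pass the time-partitioning flags; a sketch (assuming the schema file contains a TIMESTAMP or DATE column named partition_date):

bq mk --table \
  --time_partitioning_type=DAY \
  --time_partitioning_field=partition_date \
  --schema=file.json \
  PROJECTID:DATASET.TABLE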
One recommendation when using the JSON format for creating BigQuery tables:
(1) If you decide to partition the table for better performance, you can use ingestion-time partitioning with the pseudo columns (_PARTITIONTIME or _PARTITIONDATE).
(2) Alternatively, partition on a column; for example, partition_date, which has the data type TIMESTAMP (a DATE column works as well).
{
  "name": "partition_date",
  "type": "TIMESTAMP",
  "mode": "NULLABLE"
}
Note that the schema file only describes columns; the partitioning is not part of the schema JSON. It is specified when the table is created, either with the bq flags shown above or, when using the API's table resource, as a top-level property:
"timePartitioning": {
  "type": "DAY",
  "field": "partition_date"
}
We currently have an app for each office that only shows users who log in the data from their office. Right now that means having a separate app for each office, so if we make application changes I have to update 20 different apps.
The logins are done through APEX with a custom authentication scheme that only allows a person to log in if their office designation matches the app.
What I would like to figure out is how to write the SQL query so that when a person logs in they only see the data from their office; that way I can get down to one application instead of having an app for each office.
Here's the structure of the table that contains the logins.
"PKEY" NUMBER,
"USERNAME" VARCHAR2(50),
"PASSWORD" VARCHAR2(50),
"NAME" VARCHAR2(50),
"EMAIL" VARCHAR2(50),
"OFFICE" VARCHAR2(50),
CONSTRAINT "LOGINS_PK" PRIMARY KEY ("PKEY")
And here's code from one of the current queries that's used to bring up office submittal information.
select "PKEY",
"DATE_SUB",
"CLIENT",
"CANDIDATE",
"RECRUITER",
"CONTACT",
"SALES",
dbms_lob.getlength("RESUME") "RESUME",
"MIMETYPE",
"FILENAME",
"POSITION",
"AVAILABILITY",
"RATE",
"ISSUES",
"WHEN_INT",
"FEEDBACK",
"REQ_PRIORITY",
"OFFICE",
"NOTES",
"REJECT",
"INT_FB"
from "SUBS"
WHERE "OFFICE" = 'OFFICE1'
AND ("REJECT" = 'Accepted' or "REJECT" IS NULL)
AND SALES != 'House'
What I'm trying to figure out is how to use the "OFFICE" field in the logins table and compare it against the "OFFICE" field in the "SUBS" table, so that APEX only shows a user the rows whose "OFFICE" designation matches the one in the logins table.
Example: User Joe Smith logs in as jsmith and he's in OFFICE1, so when he logs in he'll only see data that contains OFFICE1 in the OFFICE field in the table. User Jane Brown logs in to the same app as jbrown and she's in OFFICE2 so she only sees the data that has OFFICE2 in the OFFICE field.
Thanks for all the help in advance!
Use a subquery:
select * from logins where office in (
  select office from subs
);
This will return all logins rows whose office value has a match in subs. I may have misunderstood your initial question; you may require the reverse query:
select * from subs where office in (
  select office from logins where username = :APP_USER
);
SQL keywords are case-insensitive, hence my lack of formatting; you may wish to capitalise various parts of the query. Good luck!
I know this is an old one, but in case anyone else needs to learn how to do this.
Just create a hidden field on the page and create a computation for that field using a PL/SQL function body like this:
DECLARE
  v_value VARCHAR2(4000);
BEGIN
  SELECT NAME
  INTO v_value
  FROM LOGINS
  WHERE USERNAME = :APP_USER;
  RETURN v_value;
END;
Use whatever reference you need from your table (in this case, the recruiter name) and just add an AND or WHERE clause to your query referencing that hidden field.
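For example, assuming the hidden item is named P1_OFFICE (a hypothetical name) and its computation selects OFFICE instead of NAME, the report query from the question becomes:

select *
from "SUBS"
WHERE "OFFICE" = :P1_OFFICE
AND ("REJECT" = 'Accepted' or "REJECT" IS NULL)
AND SALES != 'House'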
We have a series of tables whose schema contains a repeated record, as follows:
[{
  "name": "field1",
  "type": "RECORD",
  "mode": "REPEATED",
  "fields": [
    {"type": "STRING", "name": "subfield1"},
    {"type": "INTEGER", "name": "subfield2"}
  ]
}]
When we create a view that includes that repeated record field, we always get the error:
Error in query string: Field field1 from table xxxxx is not a leaf field.
I understand that it might be better to use FLATTEN, but this field mostly contains different filters we want to test on, and we have a lot of other non-repeated fields that would be difficult to manage if flattened.
It turned out that the problem is selecting the repeated record field from multiple tables (not creating the view itself). Is there an easy way to get around that?
Thanks
If you do SELECT field.* FROM t1, t2 you'll get an error that the * cannot be used to refer to fields in a union (as you've noticed above).
You can work around this by wrapping the union in an inner SELECT statement, as in SELECT field.* FROM (SELECT * FROM t1, t2).
To give a concrete example, this works:
SELECT payload.pages.*
FROM (
SELECT *
FROM [publicdata:samples.github_nested],
[publicdata:samples.github_nested])