Query JSON Attributes from JSON-formatted Column in Database - sql

Here is the problem I am facing.
I have a table called GAMELOG (it could be a SQL table or a NoSQL column family) that looks like this:
ID INT,
REQUESTDATE DATE,
REQUESTMESSAGE VARCHAR,
RESPONSEDATE DATE,
RESPONSEMESSAGE VARCHAR
The REQUESTMESSAGE and RESPONSEMESSAGE columns are JSON-formatted.
Let's say, for example, that a specific value in REQUESTMESSAGE is:
{
"name" : "John",
"specialty" : "Wizard",
"joinDate" : "17-Feb-1988"
}
and the corresponding value in RESPONSEMESSAGE is:
{
"name" : "John Doe",
"specialty" : "Wizard",
"joinDate" : "17-Feb-1988",
"level" : 89,
"lastSkillLearned" : "Megindo"
}
Now the data in my table has grown incredibly large (around a billion rows, a few terabytes of disk space).
What I want to do is query the rows whose REQUESTMESSAGE contains a JSON property "name" with a value of "John".
From what I understand about SQL databases (well, Oracle, which I've used before), I have to store REQUESTMESSAGE and RESPONSEMESSAGE as CLOBs and query using LIKE, i.e.
SELECT * FROM GAMELOG WHERE REQUESTMESSAGE LIKE '%"name" : "John"%';
But the result is very slow and painful.
Now I have moved to Cassandra, but I don't know how to query it properly; I haven't used Apache Hadoop yet to get at the data, which I intend to do later.
My question is: is there a database product that supports querying a JSON attribute inside a table/column family? As far as I know, MongoDB stores documents as JSON, but that means all of my column family would be stored as JSON, i.e.
{
"ID" : 1,
"REQUESTMESSAGE" : "{
"name" : "John",
"specialty" : "Wizard",
"joinDate" : "17-Feb-1988"
}",
"REQUESTDATE" : "17-Feb-1967"
"RESPONSEMESSAGE" : "{
"name" : "John Doe",
"specialty" : "Wizard",
"joinDate" : "17-Feb-1988",
"level" : 89,
"lastSkillLearned" : "Megindo"
}",
"RESPONSEDATE" : "17-Feb-1967"
}
and I still have trouble getting at the JSON attributes inside the REQUESTMESSAGE column (please correct me if I'm wrong).
Thank you very much.

If you aren't committed to storing your data in Apache Cassandra, MySQL has SQL query functions that can extract data from JSON values, in particular, you would want to look at the JSON_EXTRACT function: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html
In your case, the query should look something like the following:
SELECT REQUESTMESSAGE, JSON_EXTRACT(REQUESTMESSAGE, "$.name")
FROM GAMELOG
WHERE JSON_EXTRACT(REQUESTMESSAGE, "$.name") = "John";
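If you want to try the idea without standing up a MySQL server, SQLite's JSON1 functions behave very similarly to MySQL's JSON_EXTRACT. This is a minimal sketch, not the exact MySQL setup: the GAMELOG table is reduced to the two columns involved, and it assumes a SQLite build with JSON1 enabled (the default in recent builds):

```python
import sqlite3

# Minimal stand-in for the GAMELOG table: only ID and the JSON-formatted
# REQUESTMESSAGE column are kept.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GAMELOG (ID INTEGER, REQUESTMESSAGE TEXT)")
conn.execute(
    "INSERT INTO GAMELOG VALUES (1, ?)",
    ('{"name": "John", "specialty": "Wizard", "joinDate": "17-Feb-1988"}',),
)

# json_extract pulls a single attribute out of the JSON text, so the WHERE
# clause matches on the parsed value instead of a brittle LIKE pattern.
rows = conn.execute(
    "SELECT ID, json_extract(REQUESTMESSAGE, '$.name') FROM GAMELOG "
    "WHERE json_extract(REQUESTMESSAGE, '$.name') = 'John'"
).fetchall()
print(rows)  # [(1, 'John')]
```

Note that without an index this still scans every row; MySQL lets you add a generated column over the extracted value and index that, which is what makes the approach viable at billions of rows.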

Related

Unable to save JSON to database with field named as order (NiFi)

I have a JSON file:
[ {
"Order" : "Nestle billboard 100%x250",
"Country" : "Russia",
"Order_ID" : 287259619,
"Country_ID" : 243,
"Order_lifetime_impressions" : "3385377",
"Total_unique_visitors" : "1090850",
"Total_reach_impressions" : "3385525",
"Average_impressions_unique_visitor" : 3.1,
"Date" : "2021-07-01"
}, {
"Order" : "Nestle_june_july 2021_ mob 300x250",
"Country" : "Russia",
"Order_ID" : 28734,
"Country_ID" : 263,
"Order_lifetime_impressions" : "1997022",
"Total_unique_visitors" : "1012116",
"Total_reach_impressions" : "1997036",
"Average_impressions_unique_visitor" : 1.97,
"Date" : "2021-07-01"
}]
And a table with the same column names. I'm using the PutDatabaseRecord processor with this configuration.
When I try to save this file, I get an error:
ERROR: syntax error (at or near: ",") Position: 110
I renamed the column in the table and in the JSON to order_name and the processor was able to save it.
But I still want to save it as order if possible.
I really don't understand why this happens. Yes, order is a keyword in SQL, but it's inside double quotes. Is it a bug? How can I fix it without renaming columns?
If I keep Order as the column in the JSON but change the column name in the database, it works fine as well. But of course, I cannot save Order into this renamed column.
Order is a reserved word and you should absolutely avoid using it as a column name if you can. [1] [3]
If you absolutely can't, you need to set the Quote Column Identifiers property to True in the PutDatabaseRecord processor config. [2]
[1] https://www.postgresql.org/docs/current/sql-keywords-appendix.html
[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.PutDatabaseRecord/
[3] Postgres table column name restrictions?
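To see why quoting fixes it, here is a small, hypothetical sketch using SQLite (the table and column names are just illustrative; the same rule applies in PostgreSQL): the unquoted identifier order is parsed as the ORDER keyword and the statement fails, while the quoted identifier works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A quoted "order" is a legal column name even though ORDER is reserved.
conn.execute('CREATE TABLE ads ("order" TEXT, country TEXT)')

unquoted_failed = False
try:
    # Unquoted, the parser sees the ORDER keyword -> syntax error.
    conn.execute("INSERT INTO ads (order, country) VALUES ('x', 'Russia')")
except sqlite3.OperationalError:
    unquoted_failed = True

# Quoted, the same statement succeeds.
conn.execute('INSERT INTO ads ("order", country) VALUES (?, ?)',
             ("Nestle billboard 100%x250", "Russia"))

print(unquoted_failed)  # True
print(conn.execute('SELECT "order" FROM ads').fetchone())
```

This is exactly what the Quote Column Identifiers property makes PutDatabaseRecord do: wrap every column name in quotes before building the INSERT statement.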

Query data inside an attribute array in a json column in Postgres 9.6

I have a table, say types, which has a JSON column, say location, that looks like this:
{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
} ...
]
}
Each row in the table has this data, and every row has "type": "state" in it. I want to extract just the value for "type": "state" from every row in the table and put it in a new column. I checked out several questions on SO, like:
Query for element of array in JSON column
Index for finding an element in a JSON array
Query for array elements inside JSON type
but could not get it working. I do not need to query on this; I just need the value of this column. I apologize in advance if I missed something.
create table t(data json);
insert into t values('{"attribute":[{"type": "state","value": "CA"},{"type": "distance","value": "200.00"}]}'::json);
select elem->>'value' as state
from t, json_array_elements(t.data->'attribute') elem
where elem->>'type' = 'state';
| state |
| :---- |
| CA |
dbfiddle here
I mainly use Redshift where there is a built-in function to do this. So on the off-chance you're there, check it out.
redshift docs
It looks like Postgres has a similar function set:
https://www.postgresql.org/docs/current/static/functions-json.html
I think you'll need to chain three functions together to make this work.
SELECT
your_field::json->'attribute'->0->'value'
FROM
your_table
What I'm doing here is a JSON extract by key name, followed by a JSON array extract by index (always the first element, if your example is consistent with the full data), followed finally by another extract by key name.
Edit: got it working for your example
SELECT
'{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
}
]
}'::json->'attribute'->0->'value'
Returns "CA"
2nd edit: nested querying
@McNets has the right, better answer. But in this dive, I discovered you can nest queries in Postgres! How frickin' cool!
I stored the json as a text field in a dummy table and successfully ran this:
SELECT
(SELECT value FROM json_to_recordset(
my_column::json->'attribute') as x(type text, value text)
WHERE
type = 'state'
)
FROM dummy_table
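For intuition, here is a plain-Python sketch of what that query does (variable names are mine, not Postgres'): unnest the attribute array, keep the elements whose type is state, and project their value.

```python
import json

# Same JSON shape as the question's location column.
location = '''{"attribute": [
    {"type": "state", "value": "CA"},
    {"type": "distance", "value": "200.00"}
]}'''

# Equivalent of json_array_elements(...) plus WHERE elem->>'type' = 'state':
# iterate the array, filter on "type", project "value".
states = [elem["value"]
          for elem in json.loads(location)["attribute"]
          if elem["type"] == "state"]
print(states)  # ['CA']
```

The difference between this and the index-based `->0` approach above is that the filter keeps working even if the array elements arrive in a different order.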

Summation of sub document field based on a group unexpected behaviour

I have a document structure in which we embed the Salary document into the Employee_Detail document. As per the MongoDB documentation, we can use $unwind to deconstruct the document and use the aggregation pipeline... but it's not working. I am using the script below:
{
"_id" : ObjectId("5763d4a54da83b98f269878a"),
"First_Name" : "fgfg",
"Department" : "QA",
"Salary" : {
"HRA" : "1200",
"Basic" : "2000"
}
}
And I want to get the sum of basic salary grouped by department. The expected output is:
Department  Total_Basic
QA          2000
I used the following code to get the output: $unwind to deconstruct the document, then the aggregation pipeline to group by department and sum the basic salary.
db.Employee_Detail.aggregate([
  {$unwind: "$Salary"},
  {$group: {"_id": "$Department", total_Basic: {$sum: "$Salary.Basic"}}}
])
But I get the result below:
Department  Total_Basic
QA          0
I think $unwind is not working. Please advise.
Your main problem is that the type of the field Basic is a string. Second, you do not need $unwind unless the field Salary contains an array.
So perform an update to convert the types of Basic and HRA to numbers (see this Stack Overflow question),
And then an aggregate operation like this will give you the desired result:
db.Employee_Detail.aggregate([
  {$group: {"_id": "$Department", total_Basic: {$sum: "$Salary.Basic"}}}
])
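To make the type issue concrete, here is a hypothetical in-memory version of that $group stage in Python (the documents are invented to match the question's shape): $sum silently treats the string "2000" as 0, which is why the question got QA 0, so the value has to be converted to a number before summing.

```python
from collections import defaultdict

# Documents shaped like the question's Employee_Detail collection; Basic is
# stored as a string, which is the root cause of the $sum returning 0.
docs = [
    {"Department": "QA", "Salary": {"HRA": "1200", "Basic": "2000"}},
    {"Department": "QA", "Salary": {"HRA": "900",  "Basic": "1500"}},
]

# Equivalent of {$group: {_id: "$Department", total_Basic: {$sum: ...}}},
# with an explicit string-to-number conversion.
totals = defaultdict(float)
for doc in docs:
    totals[doc["Department"]] += float(doc["Salary"]["Basic"])

print(dict(totals))  # {'QA': 3500.0}
```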

Complex data structures Redis

Let's say I have a hash of hashes, e.g.
$data = {
'harry' : {
'age' : 25,
'weight' : 75,
},
'sally' : {
'age' : 25,
'weight' : 75,
}
}
What would be the 'usual' way to store such a data structure (or would you not)?
Would you be able to directly get a value (e.g. get harry : age)?
Once stored, could you directly change the value of a sub-key (e.g. sally : weight = 100)?
What would be the 'usual' way to store such a data structure (or would you not)?
For example, harry and sally would each be stored in a separate hash, where fields represent their properties like age and weight. Then a set structure would hold all the members (harry, sally, ...) which you have stored in Redis.
Would you be able to directly get a value (e.g. get harry : age)?
Yes, see HGET or HMGET or HGETALL.
Once stored, could you directly change the value of a sub-key (e.g. sally : weight = 100)?
Yes, see HSET.
Let's take complex data that we have to store in Redis, for example this one:
$data = {
"user:1" : {
name : "sally",
password : "123",
logs : "25th october", "30th october", "12 sept",
friends : "34", "24", "10"
},
"user:2" : {
name : "",
password : "4567",
logs : "",
friends : ""
}
}
The problem we face is that friends and logs are lists.
So what we can do to represent this data in Redis is use hashes and lists, something like this:
Option 1: A hash map with keys user:1 and user:2
hmset user:1 name "sally" password "12344"
hmset user:2 name "pally" password "232342"
then create a separate list of logs per user, such as
logs:1 { here 1 is the user id }
lpush logs:1 "" "" ""
lpush logs:2 "" "" ""
and similarly for friends.
Option 2: A hash map with dumped JSON data stored as strings
hmset user:1 name "sally" password "12344" logs "String_dumped_data" friends "string of dumped data"
Option 3: Another representation of Option 1,
something like user:1:friends -> as a list
and user:2:friends -> as a list
Please correct me if I'm wrong.
It depends on what you want to do, but if your data structure is not nested any more deeply and you need access to each field, I would recommend using hashes: http://redis.io/commands#hash
Here is a good overview of the Redis data types, each with pros and cons: http://redis.io/topics/data-types
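As a rough, hypothetical sketch of the Option 3 key layout (plain Python dicts and lists standing in for Redis structures; the hset/lpush helpers below only mimic the real HSET/LPUSH commands):

```python
# Toy in-memory model: one hash per user under user:<id>, plus per-user
# lists under user:<id>:logs and user:<id>:friends.
store = {}

def hset(key, mapping):
    # like HSET key field value [field value ...]
    store.setdefault(key, {}).update(mapping)

def lpush(key, *values):
    # like LPUSH: each value is pushed to the head, so the final order
    # is the reverse of the argument order.
    store.setdefault(key, [])[:0] = reversed(values)

hset("user:1", {"name": "sally", "password": "123"})
lpush("user:1:logs", "25th october", "30th october", "12 sept")
lpush("user:1:friends", "34", "24", "10")

print(store["user:1"]["name"])     # sally
print(store["user:1:friends"])     # ['10', '24', '34']
```

The advantage of this layout over Option 2 (JSON dumped into a hash field) is that each property and each list element stays individually addressable, so updating sally's weight or appending one log entry doesn't require rewriting a whole serialized blob.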

Effect mongodb _id generation on Indexing

I am using MongoDB as a database.
I am going to generate an _id for each document; for that I use the userId and a folderId for that user.
Here userId is different for each user, and each user also has different folderIds.
I generate _id as:
userId = "user1"
folderId = "Folder1"
_id = userId + folderId
Is there any effect of this id generation on MongoDB indexing?
Will it work as fast as the _id generated by MongoDB?
A much better solution would be to leave the _id field as it is and have separate userId and folderId fields in your document, or create a separate field with them both combined.
As for whether it will be "as fast"... it depends on your query, but for ordering by the document's creation date, for example, you would lose the ability to simply order by _id; you would also lose the benefits for sharding and distribution.
However, if you want to use both those IDs for your _id, there is one other option...
You can actually use both, but keep them separate... for example, this is a valid _id:
> var doc = { "_id" : { "userID" : 12345, "folderID" : 5152 },
"field1" : "test", "field2" : "foo" };
> db.crazy.save(doc);
> db.crazy.findOne();
{
"_id" : {
"userID" : 12345,
"folderID" : 5152
},
"field1" : "test",
"field2" : "foo"
}
>
It should be fine; the one foreseeable issue is that you'll lose the ability to reverse out the date/timestamp from the Mongo ObjectId. Why not just add another ID object within the document? You're only losing a few bytes, and you're not interfering with the built-in indexing system.