Ruby: Return data grouped by multiple columns - sql

I'm trying to group data for a web service.
The web service runs on Ruby on Rails and I'm working in my API controller (let's call it the index action of my projects_controller).
The table schema looks like this (the data types and example have been changed for NDA reasons). Unfortunately, the example here suggests that I should break employees and projects into separate tables, but please overlook that for now. This is the data that I am given:
COLUMNS:
employee, e_id, company, hire_date, project_name, project_due_date
ROWS:
John, 12345, XYZ, 01-01-2001, Project_A, 12-31-2012
John, 12345, XYZ, 01-01-2001, Project_B, 03-15-2013
John, 12345, XYZ, 01-01-2001, Project_C, 06-25-2013
Jane, 98765, XYZ, 05-22-2003, Project_Q, 01-15-2013
Jane, 98765, XYZ, 05-22-2003, Project_W, 02-25-2013
Jane, 98765, XYZ, 05-22-2003, Project_E, 08-01-2013
In order to reduce data transfer, I would like to return the above as follows:
[
  {
    "employee":"John",
    "e_id":"12345",
    "company":"XYZ",
    "hire_date":"01-01-2001",
    "projects":[
      { "project_name":"Project_A", "project_due_date":"12-31-2012" },
      { "project_name":"Project_B", "project_due_date":"03-15-2013" },
      { "project_name":"Project_C", "project_due_date":"06-25-2013" }
    ]
  },
  {
    "employee":"Jane",
    "e_id":"98765",
    "company":"XYZ",
    "hire_date":"05-22-2003",
    "projects":[
      { "project_name":"Project_Q", "project_due_date":"01-15-2013" },
      { "project_name":"Project_W", "project_due_date":"02-25-2013" },
      { "project_name":"Project_E", "project_due_date":"08-01-2013" }
    ]
  }
]
I can't figure out the best way to group my SQL query results (rows) into the organized hash(es) shown in the ideal data above. I imagine I need some .each calls and hashes to post-process the data returned by my SQL call, but I can't work out the "Ruby" way of doing it (I'm also not a seasoned Ruby developer, so any reference links would be appreciated so I can read up on the solution).
How can I accomplish this?
[EDIT]
I am performing a SQL query on the Project object. My controller is as follows:
def index
  sql = "SELECT employee, e_id, company, hire_date, project_name, project_due_date
         FROM projects
         WHERE created_at = (SELECT created_at FROM projects ORDER BY created_at DESC LIMIT 1)
         ORDER BY company, employee, project_due_date"
  result = Project.find_by_sql(sql)
  respond_with(result)
end
The data I am getting back is a bunch of Project objects in the following format:
RUBY DEBUGGER:
(rdb:2) result
[#<Project employee: "John", e_id: 12345, company: "XYZ", hire_date: "01-01-2001", project_name: "Project_A", project_due_date: "12-31-2012">,
#<Project employee: "John", e_id: 12345, company: "XYZ", hire_date: "01-01-2001", project_name: "Project_B", project_due_date: "03-15-2013">,
#<Project employee: "John", e_id: 12345, company: "XYZ", hire_date: "01-01-2001", project_name: "Project_C", project_due_date: "06-25-2013">,
#<Project employee: "Jane", e_id: 98765, company: "XYZ", hire_date: "05-22-2003", project_name: "Project_Q", project_due_date: "01-15-2013">,
#<Project employee: "Jane", e_id: 98765, company: "XYZ", hire_date: "05-22-2003", project_name: "Project_W", project_due_date: "02-25-2013">,
#<Project employee: "Jane", e_id: 98765, company: "XYZ", hire_date: "05-22-2003", project_name: "Project_E", project_due_date: "08-01-2013">]
[EDIT 2]
I know I can resolve this problem in a very naive, non-Ruby way, but I'd like to know the proper way to get it working. A basic solution would iterate through the result array and parse the data row by row, saving the employee data to a temporary hash and their project data to an array of hashes; when the iteration reaches a new employee, it would push the previous employee's data onto an output array and reset the temporary hash/array for the next employee (roughly like the sketch below). Very ugly, but very possible.
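For reference, here is a rough sketch of that naive approach, assuming result is the array returned by Project.find_by_sql above and that the rows arrive sorted by employee (as the ORDER BY guarantees):

grouped = []
current = nil

result.each do |row|
  # Start a new employee hash whenever the employee id changes.
  if current.nil? || current["e_id"] != row.e_id
    current = {
      "employee"  => row.employee,
      "e_id"      => row.e_id,
      "company"   => row.company,
      "hire_date" => row.hire_date,
      "projects"  => []
    }
    grouped << current
  end

  # Append this row's project to the current employee.
  current["projects"] << {
    "project_name"     => row.project_name,
    "project_due_date" => row.project_due_date
  }
end
# grouped now has the desired structure; grouped.to_json produces the JSON.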
However, there MUST be a Ruby way. Please help!

To produce the grouped data in the requested form, use Enumerable#group_by on the query result (data below is the array of Project objects returned by find_by_sql):
grouped_data = data.group_by do |project|
  [project.employee, project.e_id, project.company, project.hire_date]
end.map do |k, v|
  {
    "employee"  => k[0],
    "e_id"      => k[1],
    "company"   => k[2],
    "hire_date" => k[3],
    "projects"  => v.map do |p|
      {
        "project_name"     => p.project_name,
        "project_due_date" => p.project_due_date
      }
    end
  }
end
And finally use to_json to produce the JSON formatted version, e.g.:
grouped_data.to_json
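If you are returning this from the index action shown in the question, you can also hand the structure straight to the JSON renderer rather than calling to_json yourself; a minimal sketch, where group_employees is a hypothetical private helper wrapping the group_by/map logic above:

def index
  sql = "SELECT ..."                     # same query as in the question
  result = Project.find_by_sql(sql)
  render json: group_employees(result)   # render json: serializes the hashes via to_json
end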

Related

Is PostgreSQL array_to_json a bad practice?

I have a habit of writing queries that return JSON structures directly from the PostgreSQL query:
-- Something like this...
-- The function
CREATE OR REPLACE FUNCTION get_data(_team_id UUID) RETURNS JSON AS
$$
DECLARE
_output JSON;
BEGIN
SELECT ROW_TO_JSON(rec)
INTO _output
FROM (SELECT COUNT(*) AS total
FROM users,
(SELECT ARRAY_TO_JSON(ARRAY_AGG(ROW_TO_JSON(a))) AS data
FROM (SELECT id,
name,
(SELECT ARRAY_TO_JSON(ARRAY_AGG(ROW_TO_JSON(b))) AS emails
FROM (SELECT email
FROM emails
WHERE user_id = users.id) b)
FROM users
WHERE active IS TRUE
AND team_id = _team_id
ORDER BY name
LIMIT 5 OFFSET 0) a)
WHERE active IS TRUE
AND team_id = _team_id) rec;
RETURN _output;
END;
$$ LANGUAGE plpgsql;
-- The query
SELECT get_data('ee0a7ea0-3888-476b-810e-de93a58aa6f6') AS data;
This gives me the structure below for my JavaScript web application:
{
"total": 100,
"data": [
{ "id": 1, "name": "User 1", "emails": [{"email": "email1"}, {"email": "email2"}] },
{ "id": 2, "name": "User 2", "emails": [{"email": "email1"}, {"email": "email2"}] },
{ "id": 3, "name": "User 3", "emails": [{"email": "email1"}, {"email": "email2"}] },
{ "id": 4, "name": "User 4", "emails": [{"email": "email1"}, {"email": "email2"}] },
{ "id": 5, "name": "User 5", "emails": [{"email": "email1"}, {"email": "email2"}] }
]
}
I'm new to writing queries, but I find that building the structure directly in the query saves a lot of time compared with building the view models in JavaScript.
But I have some hesitations. Is this the best way to approach it? Or not?
I searched the internet and found nothing about this.
The best approach is not to write the SQL queries by hand, but to let an ORM like TypeORM (TypeScript), JPA (Java, implemented by Hibernate and EclipseLink), or Eloquent (PHP) generate them for you.
With TypeORM and a matching entity model, your code would be:
const users = await entityManager.getRepository(User)
.createQueryBuilder('user')
.where('user.active = true')
.andWhere('user.team_id = :teamId', {teamId: 1})
.getManyAndCount();
This is much cleaner than the SQL query you have written and it allows for code reuse.
That way, if you change the type of a column, or want a custom mapping (Postgres dates, ...), you can handle it in the ORM and don't have to copy and paste the code into every query.

Get data on the basis of fields in Yii2

I am trying to get data on the basis of the fields in the query param, i.e.
users/1?fields=id,name
It gives id and name using findOne:
User::findOne(1);
Result:
{
"id": 12,
"name": 'Jhon'
}
When
users?fields=id,name
It gives all fields of the User model using findAll():
User::findAll([$ids])
Result:
[
{
'id': 1
'name': abc
'age':30
'dob':1970
'email':abc#test.com
},
{
'id': 2
'name': abc
'age':30
'dob':1970
'email':abc1#test.com
},
Why does findAll() not work like the findOne() result?
I have read about Data Providers and solved the problem.

How to write an insert SQL statement that loops through each record in an array of objects and inserts into a record's specific columns accordingly?

First of all, I wanted to figure out how to even write an array of objects (like in JS) in a SQL statement, and I found nothing on the internet...
I can certainly just repeat the insert statement, but I really just want to loop through a dataset and inject it into a table for a set of columns, using exactly the same insert statement with different values. It seems there is no way to do this if the dataset is as complicated as an array of objects? Or do I have to write multiple lists/arrays to represent each column, which is really silly... no?
Thanks
Example of the data set:
[
{
name: 'abc',
gender: 'male',
},
{
name: 'bbc',
gender: 'female',
},
{
name: 'ccc',
gender: 'male',
},
]
and put them into a table with columns of
nameHere
genderThere
You can use jsonb_array_elements to extract each JSON from the array, then use that as the source for an INSERT:
create table x(name text, gender text);
insert into x (name, gender)
select t ->> 'name', t ->> 'gender'
from jsonb_array_elements(
'[
{
"name": "abc",
"gender": "male"
},
{
"name": "bbc",
"gender": "female"
},
{
"name": "ccc",
"gender": "male"
}
]'::jsonb) t;
Online example: http://rextester.com/GZF87679
Update (after the scope changed)
To deal with nested JSON structures, you need to combine the operator that returns jsonb (->) with the one that returns plain text (->>):
insert into x (name, gender)
select t -> 'name' ->> 'first', t ->> 'gender'
from jsonb_array_elements(
'[
{
"name": {"first": "a", "last": "b"},
"gender": "male"
}
]'::jsonb) t;
More details about the JSON operators can be found in the manual.
select * from json_each( (REPLACE( REPLACE( REPLACE( your_input, '},{' , ' ' ) ,'[','{') ,']','}'))::json)
This will output a table:
name | gender
-----+-------
abc | male
bbc | female
ccc | male
You can then insert it into any table you want.

Filter an object array to modify json with circe

I am evaluating Circe and couldn't find out how to use a filter on arrays to transform JSON. I read the guide on its website and the API doc, and still have no clue. Help much appreciated.
Sample data:
{
"Department" : "HR",
"Employees" :[{ "name": "abc", "age": 25 }, {"name":"def", "age" : 30 }]
}
Task:
How can I use a filter on Employees to transform the JSON into another JSON, for example one containing only the employees older than 50?
In case you ask: for some reason I can't filter at the data source before the JSON is generated.
Thanks
One possible way of doing this:
import io.circe._
import io.circe.parser._

val data = """{"Department" : "HR","Employees" :[{ "name": "abc", "age": 25 }, {"name":"def", "age":30}]}"""

// Keep only the employees whose age is greater than 26.
def ageFilter(j: Json): Json = j.withArray { x =>
  Json.fromValues(x.filter(_.hcursor.downField("age").as[Int].map(_ > 26).getOrElse(false)))
}

// Replace the "Employees" array with the filtered one.
val y: Either[ParsingFailure, Json] =
  parse(data).map(_.hcursor.downField("Employees").withFocus(ageFilter).top.get)
println(s"$y")

Get Most Recent Column Value With Nested And Repeated Fields

I have a table with the following structure:
and the following data in it:
[
{
"addresses": [
{
"city": "New York"
},
{
"city": "San Francisco"
}
],
"age": "26.0",
"name": "Foo Bar",
"createdAt": "2016-02-01 15:54:25 UTC"
},
{
"addresses": [
{
"city": "New York"
},
{
"city": "San Francisco"
}
],
"age": "26.0",
"name": "Foo Bar",
"createdAt": "2016-02-01 15:54:16 UTC"
}
]
What I'd like to do is recreate the same table (same structure) but with only the latest version of each row. In this example, let's say that I'd like to group everything by name and take the row with the most recent createdAt.
I tried to do something like this: Google Big Query SQL - Get Most Recent Column Value but I couldn't get it to work with record and repeated fields.
I really hoped someone from the Google team would provide an answer to this question, as it is a very frequent topic/problem asked here on SO. BigQuery is definitely not friendly enough when it comes to writing nested/repeated data back to BQ from a BQ query.
So, I will provide the workaround I found a relatively long time ago. I DO NOT like it, but it works (and that is why I hoped for an answer from the Google team). I hope you will be able to adapt it to your particular scenario.
So, based on your example, assume you have a table as below
and you expect to get the most recent records based on the createdAt column, so the result will look like:
The code below does this:
SELECT name, age, createdAt, addresses.city
FROM JS(
( // input table
SELECT name, age, createdAt, NEST(city) AS addresses
FROM (
SELECT name, age, createdAt, addresses.city
FROM (
SELECT
name, age, createdAt, addresses.city,
MAX(createdAt) OVER(PARTITION BY name, age) AS lastAt
FROM yourTable
)
WHERE createdAt = lastAt
)
GROUP BY name, age, createdAt
),
name, age, createdAt, addresses, // input columns
"[ // output schema
{'name': 'name', 'type': 'STRING'},
{'name': 'age', 'type': 'INTEGER'},
{'name': 'createdAt', 'type': 'INTEGER'},
{'name': 'addresses', 'type': 'RECORD',
'mode': 'REPEATED',
'fields': [
{'name': 'city', 'type': 'STRING'}
]
}
]",
"function(row, emit) { // function
var c = [];
for (var i = 0; i < row.addresses.length; i++) {
c.push({city:row.addresses[i]});
};
emit({name: row.name, age: row.age, createdAt: row.createdAt, addresses: c});
}"
)
The way the above code works is: it implicitly flattens the original records; finds the rows that belong to the most recent records (partitioned by name and age); and assembles those rows back into their respective records. The final step is processing with a JS UDF to build a proper schema that can actually be written back to a BigQuery table as nested/repeated rather than flattened.
The last step is the most annoying part of this workaround, as it needs to be customized each time for the specific schema(s).
Please note that in this example there is only one nested field inside the addresses record, so the NEST() function worked. In scenarios where you have more than one field inside, the above approach still works, but you need to concatenate those fields to put them inside NEST(), and then do extra splitting of those fields inside the JS function, etc.
You can see examples in the answers below:
Create a table with Record type column
create a table with a column type RECORD
How to store the result of query on the current table without changing the table schema?
I hope this is a good foundation for you to experiment with and make your case work!