How to expand on result of first query? - sql

This is my first time using Postgres and I would like to write a query that expands on the elements in an array. The example is as follows.
I have some object in a table, say
+-----+------+----+
| 123 | john | AZ |
+-----+------+----+
| 456 | carl | CA |
+-----+------+----+
Another table has an object that contains an array of user ids.
+-----+-----------+
| 999 | {123,456} |
+-----+-----------+
Given the two case classes
case class User(userId: Int, name: String, country: String)
case class Group(groupId: Int, users: List[User])
I would love to write a function with this signature:
def getGroupById(groupId: Int): Future[Group] // or Future[Option[Group]]
so that
getGroupById(999) ---> Group(999, List(User(123, john, AZ), User(456, carl, CA)))
For the time being I am doing it the 'brute force' way:
obtain group object with user ids
---> Future.sequence(query each user id)
---> map to desired final object
But, could I achieve this without application logic, in one single query?
I am using the slick-pg extensions for Slick to manipulate arrays in Postgres.

Related

Is there any Eloquent way to get sum of relational table and sort by it?

Example:
table Users
ID | Username | sex
1 | Tony | m
2 | Andy | m
3 | Lucy | f
table Scores
ID | user_id | score
1 | 2 | 4
2 | 1 | 3
3 | 1 | 4
4 | 2 | 3
5 | 1 | 1
6 | 3 | 3
7 | 3 | 2
8 | 2 | 3
Expected Result:
ID | Username | sex | score_sum (sum) (desc)
2 | Andy | m | 10
1 | Tony | m | 8
3 | Lucy | f | 5
The code I use so far:
User model:
class User extends Authenticatable
{
    ...
    public function scores()
    {
        return $this->hasMany('App\Score');
    }
    ...
}
Score model:
class Score extends Model
{
    // I put nothing here
}
Code in controller:
$users = User::all();
foreach ($users as $user) {
    $user->score_sum = $user->scores()->sum('score');
}
$users = collect($users)->sortByDesc('score_sum');
return view('homepage', [
    'users' => $users->values()->all()
]);
I hope my example above makes sense. My code does work, but I thought there must be an Eloquent and elegant way to do this without the foreach?
There are 2 options for doing this in an Eloquent way.
Option 1
The first way is to add score_sum as an accessor attribute on the User model, so that it is available on every user you load. This is only a good idea if you will be using score_sum the majority of the time when querying the users table. If you only need score_sum in a very specific view or for specific business logic, then I would use the second option below.
To do this, add the accessor to the User model. The documentation is here: https://laravel.com/docs/5.6/eloquent-mutators#defining-an-accessor
Here is an example for your use case:
/app/User.php
class User extends Model
{
    ...
    public function getScoreSumAttribute($value)
    {
        return $this->scores()->sum('score');
    }
}
Option 2
If you just want to do this for a single use case, then the easiest solution is just to use the sum() function in the eventual foreach loop you will be using (most likely in the view).
For example in a view:
@foreach($users as $user)
    <div>Username: {{$user->username}}</div>
    <div>Sex: {{$user->sex}}</div>
    <div>Score Sum: {{$user->scores()->sum('score')}}</div>
@endforeach
Additionally, if you do not want to do this in a foreach loop, you can use a raw expression in the Eloquent call in your controller to get the score_sum. Here is an example of how that can be done:
$users = User::select('users.*', DB::raw('(SELECT SUM(score) FROM scores WHERE scores.user_id = users.id) AS score_sum'))->get();
I did not have an environment at hand to test this, so treat it as a sketch. The WHERE clause inside the DB::raw expression is what ties each sum to its own user.
Hope this helps!
This is as nice as it gets:
User::selectRaw('*, (SELECT SUM(score) FROM scores WHERE user_id = users.id) as score_sum')
->orderBy('score_sum', 'DESC')
->get();
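
The correlated subquery in that answer can be checked outside Laravel. The following is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for the real one; the table and column names are taken from the example in the question, and the SQL is only roughly what the selectRaw call would send to the database.

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, username TEXT, sex TEXT);
CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);
INSERT INTO users  VALUES (1, 'Tony', 'm'), (2, 'Andy', 'm'), (3, 'Lucy', 'f');
INSERT INTO scores VALUES (1, 2, 4), (2, 1, 3), (3, 1, 4), (4, 2, 3),
                          (5, 1, 1), (6, 3, 3), (7, 3, 2), (8, 2, 3);
""")

# One query: every user row plus the sum of that user's scores, sorted by the sum.
rows = conn.execute("""
    SELECT *,
           (SELECT SUM(score) FROM scores WHERE user_id = users.id) AS score_sum
    FROM users
    ORDER BY score_sum DESC
""").fetchall()

for row in rows:
    print(row)
```

The sum and the sort both happen in the database, so no loop over the users is needed in application code.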

SQLAlchemy getting label names out from columns

I want to use the same labels from a SQLAlchemy table, to re-aggregate some data (e.g. I want to iterate through mytable.c to get the column names exactly).
I have some spending data that looks like the following:
| name | region | date | spending |
| John | A | .... | 123 |
| Jack | A | .... | 20 |
| Jill | B | .... | 240 |
I'm then passing it to an existing function we have, that aggregates spending over 2 periods (using a case statement) and groups by region:
grouped table:
| Region | Total (this period) | Total (last period) |
| A | 3048 | 1034 |
| B | 2058 | 900 |
The function returns a SQLAlchemy query object that I can then use subquery() on to re-query e.g.:
subquery = get_aggregated_data(original_table)
region_A_results = session.query(subquery).filter(subquery.c.region == 'A')
I want to then re-aggregate this subquery (summing every column that can be summed, and replacing the region column with a string 'other').
The problem is, if I iterate through subquery.c, I get labels that look like:
anon_1.region
anon_1.sum_this_period
anon_1.sum_last_period
Is there a way to get the textual label from a set of column objects, without the anon_1. prefix? Especially since I feel that the prefix may change depending on how SQLAlchemy decides to generate the query.
Split the name string and take the second part, and if you want to prepare for the chance that the name is not prefixed by the table name, put the code in a try - except block:
for col in subquery.c:
    try:
        print(col.name.split('.')[1])
    except IndexError:
        print(col.name)
Also, the result proxy (region_A_results) has a method keys, which returns a list of column names. Again, if you don't need the table names, you can easily get rid of them.
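
The split-with-fallback idea can also be written without a try/except block. This is a small sketch on plain strings only; bare_label is a hypothetical helper name, not part of SQLAlchemy, and the input strings are the labels quoted in the question.

```python
def bare_label(qualified_name: str) -> str:
    """Return the part after the last dot, or the whole name if there is no dot."""
    # rsplit with maxsplit=1 yields a one-element list when no dot is present,
    # so indexing with [-1] is safe in both cases.
    return qualified_name.rsplit(".", 1)[-1]


labels = [
    bare_label(name)
    for name in ["anon_1.region", "anon_1.sum_this_period", "sum_last_period"]
]
print(labels)
```

Because the helper never looks at the prefix itself, it keeps working if the generated alias changes from anon_1 to something else.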

Trying to get a third normal form

I edited the question to make it more understandable:
I have a small problem and I don't exactly know how to handle it.
Let's say I have a table with the following attributes and values, which I want to transform into third normal form. This table is created automatically by a machine:
KeyID | Action | Class | Method | StoreNr. | Country
1 | Action1 | Class1 | Method1 | 123 | GB
1 | Action2 | Class2 | Method2 | 123 | GB
2 | Action5 | Class5 | Method5 | 335 | NULL
2 | Action8 | Class8 | Method8 | 335 | NULL
3 | Action2 | Class2 | Method2 | NULL| NL
3 | Action5 | Class5 | Method5 | NULL| NL
4 | Action4 | Class4 | Method4 | NULL| NULL
4 | Action1 | Class1 | Method1 | NULL| NULL
As you can see, the attributes KeyID, Action, Class and Method can't be NULL. StoreNr and Country can be NULL.
The dependencies are the following:
Method -> Action
Action -> Class
StoreNr -> Country
My problem is the KeyID. This is a randomly created number which only serves the purpose of tracking user actions. Without a KeyID it would be impossible to say which actions User4 performed in his session.
I don't exactly know how to handle this when putting the table into third normal form.
I hope this makes my needs clearer :)
Regards
Thomas
Your stated set of dependencies
Method -> Action
Action -> Class
StoreNr -> Country
and your sample data suggest at least 6 relations, as follows. Depending on what you intended by the nulls in your sample data, this may be simpler than you are making it. Nulls do not need to be part of any properly stated business requirement or FD; they are merely a technical feature of the implementation you have created.
R1 {Method,Action} KEY {Method}
R2 {Action,Class} KEY {Action}
R3 {StoreNr,Country} KEY {StoreNr}
R4 {Method,KeyId} KEY {Method,KeyId}
R5 {Method,KeyId,Country} KEY {Method,KeyId}
R6 {Method,KeyId,StoreNr} KEY {Method,KeyId}
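
Before decomposing, it can help to check the stated dependencies against the sample rows. The sketch below is an illustration only: the holds function is a hypothetical helper, rows with a NULL on the left-hand side are skipped (an assumption, in line with treating nulls as an implementation detail), and sample data can refute a dependency but never prove it.

```python
COLUMNS = ("KeyID", "Action", "Class", "Method", "StoreNr", "Country")

# Sample rows from the question; None stands for NULL.
ROWS = [
    (1, "Action1", "Class1", "Method1", 123, "GB"),
    (1, "Action2", "Class2", "Method2", 123, "GB"),
    (2, "Action5", "Class5", "Method5", 335, None),
    (2, "Action8", "Class8", "Method8", 335, None),
    (3, "Action2", "Class2", "Method2", None, "NL"),
    (3, "Action5", "Class5", "Method5", None, "NL"),
    (4, "Action4", "Class4", "Method4", None, None),
    (4, "Action1", "Class1", "Method1", None, None),
]


def holds(lhs: str, rhs: str, rows=ROWS) -> bool:
    """True if no two rows share an lhs value but differ in rhs (lhs -> rhs)."""
    left, right = COLUMNS.index(lhs), COLUMNS.index(rhs)
    seen = {}
    for row in rows:
        key = row[left]
        if key is None:  # assumption: a NULL determinant determines nothing
            continue
        if seen.setdefault(key, row[right]) != row[right]:
            return False
    return True


for lhs, rhs in [("Method", "Action"), ("Action", "Class"),
                 ("StoreNr", "Country"), ("KeyID", "Action")]:
    print(f"{lhs} -> {rhs}: {holds(lhs, rhs)}")
```

The last check fails on the sample data, which is consistent with KeyID appearing only as part of a composite key together with Method in R4 to R6.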
The third normal form essentially means that data should be stored at its natural level. So, if you have a table of employees you should include the salary of that employee and the department they are a member of but not the average salary of the department because that is department "level" information.
Action and Class are dependent on Method. Retailer has no
dependencies. The whole dataset is dependent on the KeyID.
If Action and Class are dependent on Method you need a table that is unique on Method. You say your problem is the KeyID, but this seems to need a foreign key in the Method table. I don't know if there is any additional data you would store at this level or not. Retailer seems to be part of Action and Class from your sample data; it can't be completely independent or it has no business being in the same dataset.
Assuming the above is correct you would want tables as follows:
Keys
KeyID - PK
????
Methods
MethodID - PK
Class
Action
KeyID - FK to Keys
Retailers
RetailerNR - PK
???
This may be slightly incorrect, as your question is a little confused, but you know your own data best; store data at its natural level and you can't fail to put it into 3NF.

Extract data from one field into another in mysql

I have an old table which has a column like this
1 | McDonalds (Main Street)
2 | McDonalds (1st Ave)
3 | The Goose
4 | BurgerKing (Central Gardes)
...
I want to match the venues like ' %(%)' and then extract the content in the brackets to a second field
to result in
1 | McDonalds | Main Street
2 | McDonalds | 1st Ave
3 | The Goose | NULL
4 | BurgerKing| Central Gardes
...
How would one go about this?
MySQL provides string functions for finding characters and extracting substrings. You can also use control flow functions to handle the cases where the venue is not present.
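
As a rough illustration of that approach, here is a sketch in Python that runs the query against an in-memory SQLite database. SQLite's instr, substr, rtrim and CASE are used here as stand-ins for the corresponding MySQL string and control flow functions (for example LOCATE and SUBSTRING), so the exact function names are an assumption and would need adapting for MySQL; the venues table and its columns are taken from the other answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO venues VALUES
    (1, 'McDonalds (Main Street)'),
    (2, 'McDonalds (1st Ave)'),
    (3, 'The Goose'),
    (4, 'BurgerKing (Central Gardes)');
""")

# venue:  text before the opening bracket (or the whole name if there is none).
# branch: text between the brackets, NULL when the name has no bracket part.
SPLIT_SQL = """
SELECT id,
       CASE WHEN name LIKE '%(%)'
            THEN rtrim(substr(name, 1, instr(name, '(') - 1))
            ELSE name
       END AS venue,
       CASE WHEN name LIKE '%(%)'
            THEN substr(name, instr(name, '(') + 1,
                        length(name) - instr(name, '(') - 1)
       END AS branch
FROM venues
ORDER BY id
"""

rows = conn.execute(SPLIT_SQL).fetchall()
for row in rows:
    print(row)
```

The same two expressions could be used in an UPDATE to populate a new branch column in place.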
I installed these user defined functions
http://www.mysqludf.org/lib_mysqludf_preg/
Then I could select the "branches" via
SELECT `id`, `name`, preg_capture('/.*?\\((.*)\\)/',`name`,1) AS branch FROM `venues`

How to represent and insert into an ordered list in SQL?

I want to represent the list "hi", "hello", "goodbye", "good day", "howdy" (with that order), in a SQL table:
pk | i | val
------------
1 | 0 | hi
0 | 2 | hello
2 | 3 | goodbye
3 | 4 | good day
5 | 6 | howdy
'pk' is the primary key column. Disregard its values.
'i' is the "index" that defines that order of the values in the 'val' column. It is only used to establish the order and the values are otherwise unimportant.
The problem I'm having is with inserting values into the list while maintaining the order. For example, if I want to insert "hey" and I want it to appear between "hello" and "goodbye", then I have to shift the 'i' values of "goodbye" and "good day" (but preferably not "howdy") to make room for the new entry.
So, is there a standard SQL pattern to do the shift operation, but only shift the elements that are necessary? (Note that a simple "UPDATE table SET i=i+1 WHERE i>=3" doesn't work, because it violates the uniqueness constraint on 'i', and also it updates the "howdy" row unnecessarily.)
Or, is there a better way to represent the ordered list? I suppose you could make 'i' a floating point value and choose values between, but then you have to have a separate rebalancing operation when no such value exists.
Or, is there some standard algorithm for generating string values between arbitrary other strings, if I were to make 'i' a varchar?
Or should I just represent it as a linked list? I was avoiding that because I'd like to also be able to do a SELECT .. ORDER BY to get all the elements in order.
As I read your post, I kept thinking 'linked list',
and at the end, I still think that's the way to go.
If you are using Oracle, and the linked list is a separate table (or even the same table with a self-referencing id, which I would avoid), then you can use a CONNECT BY query and the pseudo-column LEVEL to determine the sort order.
You can easily achieve this by using a cascading trigger that updates any 'index' entry equal to the new one on the insert/update operation to the index value +1. This will cascade through all rows until the first gap stops the cascade - see the second example in this blog entry for a PostgreSQL implementation.
This approach should work independent of the RDBMS used, provided it offers support for triggers to fire before an update/insert. It basically does what you'd do if you implemented your desired behavior in code (increase all following index values until you encounter a gap), but in a simpler and more effective way.
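
The "shift until the first gap" behaviour can be sketched in application code as well. The example below is not the trigger from the referenced blog entry; it is a minimal Python illustration against an in-memory SQLite table, with items and insert_at as hypothetical names. Rows are shifted from the top of the occupied run downwards so that the uniqueness constraint on i is never violated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "pk INTEGER PRIMARY KEY, i INTEGER NOT NULL UNIQUE, val TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO items (i, val) VALUES (?, ?)",
    [(0, "hi"), (2, "hello"), (3, "goodbye"), (4, "good day"), (6, "howdy")],
)


def insert_at(conn, position, value):
    """Insert value at index position, shifting rows only up to the first gap."""
    # Find the first unused index at or after the requested position.
    gap = position
    while conn.execute("SELECT 1 FROM items WHERE i = ?", (gap,)).fetchone():
        gap += 1
    # Move the occupied run up by one, highest index first.
    for idx in range(gap - 1, position - 1, -1):
        conn.execute("UPDATE items SET i = i + 1 WHERE i = ?", (idx,))
    conn.execute("INSERT INTO items (i, val) VALUES (?, ?)", (position, value))


insert_at(conn, 3, "hey")
result = conn.execute("SELECT i, val FROM items ORDER BY i").fetchall()
print(result)
```

Here only "goodbye" and "good day" move; "howdy" keeps its index because the gap at 5 stops the shift.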
Alternatively, if you can live with a restriction to SQL Server, check the hierarchyid type. While mainly geared at defining nested hierarchies, you can use it for flat ordering as well. It somewhat resembles your approach using floats, as it allows insertion between two positions by assigning fractional values, thus avoiding the need to update other entries.
If you use strings instead of numbers, you could have a table like this:
pk | i | val
------------
1 | a0 | hi
0 | a2 | hello
2 | a3 | goodbye
3 | b | good day
5 | b1 | howdy
You may insert a4 between a3 and b, a21 between a2 and a3, a1 between a0 and a2, and so on. You would need a clever function to generate an i for a new value v between p and n. The index strings can become longer and longer, or you need a big rebalancing from time to time.
Another approach could be to implement a (doubly) linked list in the table, where you don't save indexes but links to the previous and next elements, which means that you normally have to update only 1-2 elements:
pk | prev | val
------------
1 | NULL | hi
0 | 1 | hello
2 | 0 | goodbye
3 | 2 | good day
5 | 3 | howdy
To insert hey between hello and goodbye, hey gets pk 6:
pk | prev | val
------------
1 | NULL | hi
0 | 1 | hello
6 | 0 | hey <- ins
2 | 6 | goodbye <- upd
3 | 2 | good day
5 | 3 | howdy
The previous element of hey is hello (pk=0), and goodbye, which used to link to hello, now has to link to hey.
But I don't know whether an 'ORDER BY' mechanism for this can be found in many database implementations.
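
On databases that support recursive common table expressions, one way to get the rows back in list order is to walk the prev links in the query itself. The following is a sketch in Python against an in-memory SQLite database; it assumes the first element stores NULL in prev, and the table name items is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pk INTEGER PRIMARY KEY, prev INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, None, "hi"), (0, 1, "hello"), (2, 0, "goodbye"),
     (3, 2, "good day"), (5, 3, "howdy")],
)

# Insert "hey" after "hello" (pk 0): one new row, one updated link.
conn.execute("INSERT INTO items VALUES (6, 0, 'hey')")
conn.execute("UPDATE items SET prev = 6 WHERE pk = 2")

# Start at the head (prev IS NULL) and follow the links, counting the depth.
ORDERED_SQL = """
WITH RECURSIVE chain(pk, val, depth) AS (
    SELECT pk, val, 0 FROM items WHERE prev IS NULL
    UNION ALL
    SELECT t.pk, t.val, chain.depth + 1
    FROM items AS t JOIN chain ON t.prev = chain.pk
)
SELECT val FROM chain ORDER BY depth
"""

ordered = [val for (val,) in conn.execute(ORDERED_SQL)]
print(ordered)
```

The traversal visits one row per step, so for long lists it is likely to be slower than a plain ORDER BY on an indexed column.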
Since I had a similar problem, here is a very simple solution:
Make your i column floats, but insert integer values for the initial data:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
Then, if you want to insert something in between, just compute a float value in the middle between the two surrounding values:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
6 | 2.5 | hey
This way, the number of inserts between the same two values is limited by the resolution of float values, but for almost all cases that should be more than sufficient.
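
The midpoint insert can be sketched as follows, again in Python with an in-memory SQLite table. The names items and insert_between are hypothetical, and the guard that raises when no value is left between the neighbours is an addition for illustration, marking the point where a rebalancing would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "pk INTEGER PRIMARY KEY, i REAL NOT NULL UNIQUE, val TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, 0.0, "hi"), (0, 2.0, "hello"), (2, 3.0, "goodbye"),
     (3, 4.0, "good day"), (5, 6.0, "howdy")],
)


def insert_between(conn, before_val, after_val, new_val):
    """Insert new_val with an index halfway between two existing neighbours."""
    (low,) = conn.execute("SELECT i FROM items WHERE val = ?", (before_val,)).fetchone()
    (high,) = conn.execute("SELECT i FROM items WHERE val = ?", (after_val,)).fetchone()
    mid = (low + high) / 2
    if not low < mid < high:
        # Float resolution exhausted between these two neighbours.
        raise ValueError("no value left between neighbours; rebalance the index")
    conn.execute("INSERT INTO items (i, val) VALUES (?, ?)", (mid, new_val))
    return mid


mid = insert_between(conn, "hello", "goodbye", "hey")
ordered = [val for (val,) in conn.execute("SELECT val FROM items ORDER BY i")]
print(mid, ordered)
```

No existing row is updated, and a plain SELECT .. ORDER BY i still returns the list in order.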