In the Rails console (Rails 3.2, for the record), how can I make it print my query's result table to STDOUT (i.e. the table containing the column names and values)?
User Load (6.5ms) SELECT `users`.* FROM `users`
---------------------------
Id | Name | Address | Phone
---------------------------
1 | Sam | ZZZ 10 | 55555
---------------------------
2 | xxxx | xxxxxxx | xxxxx
Tks!
PS: Note that query.explain is NOT what I need.
I wrote a gem to do exactly this! http://tableprintgem.com
The most powerful feature of table_print is the ability to see your data in the context of other objects it relates to. You can reference nested objects with the method chain required to reach them. This example is showing data from three different tables:
name from the Author table (reached through author.name)
title from the Book table (reached through author.books.title)
caption from the Photo table (reached through author.books.photos.caption)
There's a short intro screencast at http://tableprintgem.com
Good day, everyone. Hope you're doing well. I'm a Django newbie, trying to learn the basics of RESTful development while helping on a small app project. We currently want some of our models to update based on the data we submit to them, using the Django ORM and the fields that some of them share through one-to-many relationships. There's a really difficult query that I must write so that one of my fields updates automatically given that filter. First, let me explain the models. These are not the real ones, but doppelgangers that should work the same:
First we have a Report model that is a teacher's report of a student:
class Report(models.Model):
    status = models.CharField(max_length=32, choices=Statuses.choices, default=Statuses.created,)
    student = models.ForeignKey(Student, on_delete=models.CASCADE,)
    headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
    # Various dates
    results_date = models.DateTimeField(null=True, blank=True)
    report_created = models.DateTimeField(null=True, blank=True)
    # ...
    # Other fields that don't matter
Here we have two related models, which are student and headroom_teacher. It's not necessary to show their models, but their relationship with the next two models is really important. We also have an Exams model:
class Exams(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE,)
    headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
    # Various dates
    results_date = models.DateTimeField(null=True, blank=True)
    initial_exam_date = models.DateTimeField(null=True, blank=True)
    # ...
    # Other fields that don't matter
As you can see, the purpose of this app is to report on the performance of students after they complete some exams, and every Report is made by a teacher for a specific student on how they did on those exams. Finally, we have a model called StudentMood that aims to show how a student should be feeling depending on the status of their exams:
class StudentMood(models.Model):
    report = models.ForeignKey(Report, on_delete=models.CASCADE,)
    student_status = models.CharField(
        max_length=32, choices=Status.choices,
        default=None, null=True, blank=False)
    headroom_teacher = models.ForeignKey(TeacherStaff, on_delete=models.CASCADE,)
With these three models we arrive at the crux of the issue. One of our possible student_status options is called Anxious for results, which we believe a student will feel after sitting an exam and while waiting for the results.
I want to automatically set my student_status to that, using a custom manager that takes into account the date that the report has been done or the day the data has been entered. I believe this can be done by making a query taking into account initial_exam_date.
I already have my custom manager set up, and the only thing missing is this query. I have no choice but to do it with Django's ORM. However, I've come up with an approximate raw SQL query, though I'm not sure it's correct:
SELECT student_mood.id AS student_mood_id
FROM school_student_mood student_mood
LEFT JOIN school_reports report
    ON student_mood.report_id = report.id
    AND student_mood.headroom_teacher_id = report.headroom_teacher_id
JOIN school_exams exams
    ON report.headroom_teacher_id = exams.headroom_teacher_id
    AND report.student_id = exams.student_id
    AND exams.results_date > <date when the student_mood or report data is entered, I guess>
And that's what I've come to ask for help with. Could someone shed some light on how to translate this into a single ORM query?
Without an environment set up, and without knowing exactly what you want out of the data, I can only offer a starting point.
Generally speaking, the Django ORM is not great for these types of queries, and trying to use select_related or prefetches results in really complex and inefficient queries.
I've found the best way to achieve these types of queries in Django is to break each piece of your puzzle down into a query that returns a "list" of ids that you can then use in a subquery.
Then you keep working down until you have your final output:
from django.db.models import Subquery
from django.utils import timezone

# Grab the students of any exams whose results_date is later than a specific date.
student_exam_subquery = Exams.objects.filter(
    results_date__gt=timezone.now()
).values_list('student__id', flat=True)

# Grab all the student moods related to reports that relate to our "exams" where the student is anxious.
# StudentMood reaches Report through its `report` foreign key.
student_mood_subquery = StudentMood.objects.filter(
    student_status='anxious',
    report__student__in=Subquery(student_exam_subquery)
).values_list('report__id', flat=True)

# Get the final list of students (`report` is the default reverse lookup name from Student to Report).
Student.objects.values_list('id', flat=True).filter(
    report__id__in=Subquery(student_mood_subquery)
)
Now I doubt this will work out of the box, but it's really to give you an understanding of how you might go about solving this in a way that is readable to future devs and the most efficient (db wise).
So, the issue I was running into is that the school has exam cycles each period, and it was difficult to retrieve only the students' reports for the current cycle. Let's assume we have the following database:
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student | Report ID | StudentMood ID | Exam Cycle Status | Initial Exam Date | Report created a |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student 1 | 1 | 1 | Done | 01/01/2020 | 02/01/2020 |
| Student 2 | 2 | 2 | Done | 01/01/2020 | 02/01/2020 |
| Student 1 | 3 | 3 | On Going | 02/06/2020 | 01/01/2020 |
| Student 2 | 4 | 4 | On Going | 02/06/2020 | 01/01/2020 |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
Obviously, I wanted to limit my query to just the current cycle, like this:
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student | Report ID | StudentMood ID | Exam Cycle Status | Initial Exam Date | Report created a |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
| Student 1 | 3 | 3 | On Going | 02/06/2020 | 01/01/2020 |
| Student 2 | 4 | 4 | On Going | 02/06/2020 | 01/01/2020 |
+-----------+-----------+----------------+-------------------+-------------------+------------------+
Now, your answer, trent, was really useful, but I'm still having issues retrieving the data in the shape shown above:
qs_exams = Exams.objects.filter(initial_exam_date__gt=now()).values_list('student__id', flat=True)
qs_report = Report.objects.filter(student__id__in=qs_exams).values_list('id', flat=True)
qs_mood = StudentMood.objects.select_related('report') \
.filter(report__id__in=qs_report).order_by('report__student_id', '-created').distinct()
But this query is still giving me all the StudentMoods throughout the school year. Sooooo, any ideas?
I have a table with a column that holds a handful of numbers delimited by commas. I need to select all rows that include a particular value. I am using SQL Server and C#, so the solution can be in SQL or LINQ.
The data in my channels column (varchar) looks something like this: 1,5,8,22,27,33
My Media table looks like this:
MediaID | MediaName                    | MediaDate  | ChannelIDs
------- | ---------------------------- | ---------- | ---------------
1       | The Cow Jumped Over The Moon | 01/18/2015 | 1,5,8,22,27,33
2       | The Cat In The Hat           | 01/18/2015 | 2,4,9,25,28,31
3       | Robin Hood The Thief         | 01/18/2015 | 3,5,6,9,22,33
4       | Jingle Bells Batman Smells   | 01/18/2015 | 6,7,9,24,25,32
5       | Up The River Down The River  | 01/18/2015 | 5,6,10,25,26,33
etc...
My Channels Table looks like this:
ChannelID | ChannelName
--------- | --------------
1         | Animals
2         | Television
3         | Movies
4         | Nursery Rhymes
5         | Holidays
etc...
Each row of Media could contain multiple channels.
Should I be using a contains search like this?
SELECT * FROM Media WHERE CONTAINS (Channels,'22')
This would require me to full-text index this column but I don't really want to include this column in my full-text index.
Is there a better way to do this?
Thanks
You should fix your data format so you are not storing numbers as comma-delimited strings. SQL has a great data structure for lists: it is called a table, not a string. In particular, you want a junction table with one row per media entity and channel id.
That said, sometimes you are stuck with a particular data structure. If so, you can use like:
where ','+channels+',' like '%,22,%'
Note: this cannot take advantage of regular indexes, so performance will not be good. Fix the data structure if you have a large table and need better performance.
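To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server, so string concatenation is written `||` instead of `+`. The `MediaChannel` junction table and the helper function names are assumptions made for illustration; they are not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Media (MediaID INTEGER PRIMARY KEY, MediaName TEXT, ChannelIDs TEXT);
INSERT INTO Media VALUES
  (1, 'The Cow Jumped Over The Moon', '1,5,8,22,27,33'),
  (2, 'The Cat In The Hat',           '2,4,9,25,28,31'),
  (3, 'Robin Hood The Thief',         '3,5,6,9,22,33');
""")

def media_ids_like(channel_id):
    """Delimiter-wrapped LIKE: wrap both sides in commas so 2 does not match 22 or 25."""
    rows = conn.execute(
        "SELECT MediaID FROM Media "
        "WHERE ',' || ChannelIDs || ',' LIKE ? ORDER BY MediaID",
        (f"%,{channel_id},%",),
    )
    return [row[0] for row in rows]

# Hypothetical junction table: one row per (media, channel) pair.
conn.execute(
    "CREATE TABLE MediaChannel ("
    "MediaID INTEGER, ChannelID INTEGER, PRIMARY KEY (MediaID, ChannelID))"
)
for media_id, channel_ids in conn.execute(
    "SELECT MediaID, ChannelIDs FROM Media"
).fetchall():
    conn.executemany(
        "INSERT INTO MediaChannel VALUES (?, ?)",
        [(media_id, int(c)) for c in channel_ids.split(",")],
    )

def media_ids_junction(channel_id):
    """Same lookup against the junction table; this form can use an ordinary index."""
    rows = conn.execute(
        "SELECT MediaID FROM MediaChannel WHERE ChannelID = ? ORDER BY MediaID",
        (channel_id,),
    )
    return [row[0] for row in rows]
```

Both helpers return the same ids for a given channel; the difference is that only the junction-table lookup is an equality test on an indexable column.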
While trying to build a data warehousing application using Talend, we are faced with the following scenario.
We have two tables that look like
Table master
ID | CUST_NAME | CUST_EMAIL
------------------------------------
1 | FOO | FOO_BAR@EXAMPLE.COM
Events Table
ID | CUST_ID | EVENT_NAME | EVENT_DATE
---------------------------------------
1 | 1 | ACC_APPLIED | 2014-01-01
2 | 1 | ACC_OPENED | 2014-01-02
3 | 1 | ACC_CLOSED | 2014-01-02
There is a one-to-many relationship between master and the events table. Since there is a limited number of event names, I am proposing that we denormalize this structure into something that looks like
ID | CUST_NAME | CUST_EMAIL | ACC_APP_DATE_ID | ACC_OPEN_DATE_ID |ACC_CLOSE_DATE_ID
-----------------------------------------------------------------------------------------
1 | FOO | FOO_BAR@EXAMPLE.COM | 20140101 | 20140102 | 20140103
The DATE_ID columns refer to entries inside the time dimension table.
First question: Is this a good idea? What are the alternatives to this scheme?
Second question: How do I implement this using Talend Open Studio? I figured out a way in which I moved the data for each event name into its own temporary table along with cust_id using the tMap component, and later linked them together using another tMap. Is there another way to do this in Talend?
To do this in Talend you'll need to first sort your data so that it is reliably in the order of applied, opened and closed for each account and then denormalize it to a single row with a single delimited field for the dates using the tDenormalizeRows component.
After this you'll want to use tExtractDelimitedFields to split the single dates field.
Yes, this is a good idea. It is called an accumulating snapshot fact table. http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
I'm not sure how to do this in Talend (I don't know the tool), but it would be quite easy to implement in SQL using a CASE expression or a PIVOT.
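As a rough illustration of the CASE approach, the sketch below pivots the events table into one row per customer. It runs against SQLite through Python's sqlite3 module, so it only approximates what the warehouse SQL would look like. Deriving each DATE_ID by stripping the dashes from EVENT_DATE is an assumption about how the time dimension is keyed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (ID INTEGER PRIMARY KEY, CUST_NAME TEXT);
CREATE TABLE events (ID INTEGER PRIMARY KEY, CUST_ID INTEGER,
                     EVENT_NAME TEXT, EVENT_DATE TEXT);
INSERT INTO master VALUES (1, 'FOO');
INSERT INTO events VALUES
  (1, 1, 'ACC_APPLIED', '2014-01-01'),
  (2, 1, 'ACC_OPENED',  '2014-01-02'),
  (3, 1, 'ACC_CLOSED',  '2014-01-02');
""")

# One conditional aggregate per event name: each CASE yields the date key for its
# own event and NULL otherwise, and MAX collapses the group to the non-NULL value.
PIVOT_SQL = """
SELECT m.ID,
       m.CUST_NAME,
       MAX(CASE WHEN e.EVENT_NAME = 'ACC_APPLIED'
                THEN CAST(REPLACE(e.EVENT_DATE, '-', '') AS INTEGER) END) AS ACC_APP_DATE_ID,
       MAX(CASE WHEN e.EVENT_NAME = 'ACC_OPENED'
                THEN CAST(REPLACE(e.EVENT_DATE, '-', '') AS INTEGER) END) AS ACC_OPEN_DATE_ID,
       MAX(CASE WHEN e.EVENT_NAME = 'ACC_CLOSED'
                THEN CAST(REPLACE(e.EVENT_DATE, '-', '') AS INTEGER) END) AS ACC_CLOSE_DATE_ID
FROM master m
LEFT JOIN events e ON e.CUST_ID = m.ID
GROUP BY m.ID, m.CUST_NAME
"""

pivot_rows = conn.execute(PIVOT_SQL).fetchall()
```

The LEFT JOIN keeps customers that have no events yet; their date columns simply come back as NULL, which fits a row that is filled in as milestones occur.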
Regarding only your first question, it's certainly a good idea -- unless there is any possibility of the same persons applying-opening-closing their account more than once AND you want to keep all this information in their history (so UPDATE wouldn't help).
Snowflaking is definitely not a good option if you are going to design a data warehouse, so denormalizing will certainly be a good choice in this case. The following article covers scenarios like this almost perfectly:
http://www.kimballgroup.com/2008/09/design-tip-105-snowflakes-outriggers-and-bridges/
I'm creating a simple directory listing page where you can specify what kind of thing you want to list in the directory e.g. a person or a company.
Each user has a UserTypeID and there is a dbo.UserType lookup table. The dbo.UserType lookup table is like this:
UserTypeID | UserTypeParentID | Name
1          | NULL             | Person
2          | NULL             | Company
3          | 2                | IT
4          | 3                | Accounting Software
In the dbo.Users table we have records like this:
UserID | UserTypeID | Name
1      | 1          | Jenny Smith
2      | 1          | Malcolm Brown
3      | 2          | Wall Mart
4      | 3          | Microsoft
5      | 4          | Sage
My SQL (so far) is very simple: (excuse the pseudo-code style)
DECLARE @UserTypeID int

SELECT
    *
FROM
    dbo.Users u
INNER JOIN
    dbo.UserType ut ON ut.UserTypeID = u.UserTypeID
WHERE
    ut.UserTypeID = @UserTypeID
The problem here is that when people want to search for companies they will enter '2' as the UserTypeID. But neither Microsoft nor Sage will show up, because their UserTypeIDs are 3 and 4 respectively. It's the final UserTypeParentID that tells me they're both Companies.
How could I rewrite the SQL to return records where the UserTypeID = @UserTypeID or where the final UserTypeParentID is also equal to @UserTypeID? Or am I going about this the wrong way?
Schema Change
I would suggest breaking this schema down a little further to make your queries and your life simpler. With the current schema you will end up writing a recursive query every time you want to get the simplest data from your Users table, and trust me, you don't want to do this to yourself.
I would break the schema down into the following tables:
dbo.Users
UserID | UserName
1 | Jenny
2 | Microsoft
3 | Sage
dbo.UserTypes_Type
TypeID | TypeName
1 | Person
2 | IT
3 | Company
4 | Accounting Software
dbo.UserTypes
UserID | TypeID
1 | 1
2 | 2
2 | 3
3 | 2
3 | 3
3 | 4
You say that you are "creating" this - excellent because you have the opportunity to reconsider your whole approach.
Dealing with hierarchical data in a relational database is problematic because it is not designed for it - the model you choose to represent it will have a huge impact on the performance and ease of construction of your queries.
You have opted for an Adjacency List model, which is great for inserts (and deletes) but a pain for selects because the query effectively has to reconstruct the hierarchy path. By the way, an Adjacency List is the model almost everyone goes for on their first attempt.
Everything is a trade-off, so you should decide which queries will be most common: selects (and updates) or inserts (and deletes). See this question for starters. Also, since SQL Server 2008, there is a native hierarchyid data type (see this) which may be of assistance.
Of course, you could store your data in an XML file (in SQL Server or not) which is designed for hierarchical data.
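For reference, this is roughly what "reconstructing the hierarchy path" looks like if the adjacency list from the question is kept as-is. The sketch uses a recursive common table expression and runs on SQLite through Python's sqlite3 module; SQL Server accepts the same shape of query but spells it `WITH` rather than `WITH RECURSIVE`. The `type_tree` CTE name and the `users_of_type` helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserType (UserTypeID INTEGER PRIMARY KEY,
                       UserTypeParentID INTEGER, Name TEXT);
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserTypeID INTEGER, Name TEXT);
INSERT INTO UserType VALUES
  (1, NULL, 'Person'), (2, NULL, 'Company'),
  (3, 2, 'IT'), (4, 3, 'Accounting Software');
INSERT INTO Users VALUES
  (1, 1, 'Jenny Smith'), (2, 1, 'Malcolm Brown'),
  (3, 2, 'Wall Mart'), (4, 3, 'Microsoft'), (5, 4, 'Sage');
""")

# Anchor member: the requested type. Recursive member: every type whose parent
# is already in the set. The result is the requested type plus all descendants.
USERS_OF_TYPE_SQL = """
WITH RECURSIVE type_tree(UserTypeID) AS (
    SELECT UserTypeID FROM UserType WHERE UserTypeID = ?
    UNION ALL
    SELECT ut.UserTypeID
    FROM UserType ut
    JOIN type_tree t ON ut.UserTypeParentID = t.UserTypeID
)
SELECT u.Name
FROM Users u
JOIN type_tree t ON u.UserTypeID = t.UserTypeID
ORDER BY u.UserID
"""

def users_of_type(user_type_id):
    """Names of users whose type is user_type_id or any type beneath it."""
    return [row[0] for row in conn.execute(USERS_OF_TYPE_SQL, (user_type_id,))]
```

Calling `users_of_type(2)` walks Company, then IT, then Accounting Software, so Microsoft and Sage are included alongside Wall Mart.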
Let's assume I have a self-referencing hierarchical table built the classical way, like this one:
CREATE TABLE test
(name text, id serial primary key, parent_id integer
references test);
insert into test (name,id,parent_id) values
('root1',1,NULL),('root2',2,NULL),('root1sub1',3,1),('root1sub2',4,1),
('root2sub1',5,2),('root2sub2',6,2);
testdb=# select * from test;
name | id | parent_id
-----------+----+-----------
root1 | 1 |
root2 | 2 |
root1sub1 | 3 | 1
root1sub2 | 4 | 1
root2sub1 | 5 | 2
root2sub2 | 6 | 2
What I need now is a function (preferably in plain SQL) that takes the id of a test record and clones all attached records (including the given one). The cloned records need to have new ids, of course. The desired result would look like this, for example:
Select * from cloningfunction(2);
name | id | parent_id
-----------+----+-----------
root2 | 7 |
root2sub1 | 8 | 7
root2sub2 | 9 | 7
Any pointers? I'm using PostgreSQL 8.3.
Pulling this result in recursively is tricky (although possible). However, it's typically not very efficient and there is a much better way to solve this problem.
Basically, you augment the table with an extra column which traces the tree to the top - I'll call it the "Upchain". It's just a long string that looks something like this:
name | id | parent_id | upchain
root1 | 1 | NULL | 1:
root2 | 2 | NULL | 2:
root1sub1 | 3 | 1 | 1:3:
root1sub2 | 4 | 1 | 1:4:
root2sub1 | 5 | 2 | 2:5:
root2sub2 | 6 | 2 | 2:6:
root1sub1sub1 | 7 | 3 | 1:3:7:
It's very easy to keep this field updated by using a trigger on the table. (Apologies for terminology but I have always done this with SQL Server). Every time you add or delete a record, or update the parent_id field, you just need to update the upchain field on that part of the tree. That's a trivial job because you just take the upchain of the parent record and append the id of the current record. All child records are easily identified using LIKE to check for records with the starting string in their upchain.
What you're doing effectively is trading a bit of extra write activity for a big saving when you come to read the data.
When you want to select a complete branch in the tree it's trivial. Suppose you want the branch under node 1. Node 1 has an upchain '1:' so you know that any node in the branch of the tree under that node must have an upchain starting '1:...'. So you just do this:
SELECT *
FROM table
WHERE upchain LIKE '1:%'
This is extremely fast (index the upchain field of course). As a bonus it also makes a lot of activities extremely simple, such as finding partial trees, level within the tree, etc.
I've used this in applications that track large employee reporting hierarchies but you can use it for pretty much any tree structure (parts breakdown, etc.)
Notes (for anyone who's interested):
I haven't given a step-by-step of the SQL code but once you get the principle, it's pretty simple to implement. I'm not a great programmer so I'm speaking from experience.
If you already have data in the table you need to do a one time update to get the upchains synchronised initially. Again, this isn't difficult as the code is very similar to the UPDATE code in the triggers.
This technique is also a good way to identify circular references which can otherwise be tricky to spot.
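Here is a small sketch of the upchain idea, written with Python's sqlite3 module instead of SQL Server triggers. The `add_node` and `branch` helpers are hypothetical stand-ins for what the insert trigger and the branch query would do: the new row's upchain is the parent's upchain with the new id appended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER REFERENCES test, upchain TEXT)"
)

def add_node(node_id, name, parent_id=None):
    """Insert a node, building its upchain from the parent's (the trigger's job)."""
    parent_chain = ""
    if parent_id is not None:
        parent_chain = conn.execute(
            "SELECT upchain FROM test WHERE id = ?", (parent_id,)
        ).fetchone()[0]
    conn.execute(
        "INSERT INTO test VALUES (?, ?, ?, ?)",
        (node_id, name, parent_id, f"{parent_chain}{node_id}:"),
    )

for node in [(1, "root1"), (2, "root2"), (3, "root1sub1", 1), (4, "root1sub2", 1),
             (5, "root2sub1", 2), (6, "root2sub2", 2), (7, "root1sub1sub1", 3)]:
    add_node(*node)

def branch(node_id):
    """All nodes in the branch rooted at node_id, found with a prefix LIKE."""
    chain = conn.execute(
        "SELECT upchain FROM test WHERE id = ?", (node_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT name FROM test WHERE upchain LIKE ? ORDER BY id", (chain + "%",)
    )
    return [row[0] for row in rows]
```

The trailing colon on every id matters: the prefix '1:' cannot accidentally match a node whose chain starts with '10:' or '11:'.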
Joe Celko's method, which is similar to njreed's answer but more generic, can be found here:
Nested-Set Model of Trees (at the middle of the article)
Nested-Set Model of Trees, part 2
Trees in SQL -- Part III
@Maximilian: You are right, we forgot your actual requirement. How about a recursive stored procedure? I am not sure if this is possible in PostgreSQL, but here is a working SQL Server version:
CREATE PROCEDURE CloneNode
    @to_clone_id int, @parent_id int
AS
SET NOCOUNT ON
DECLARE @new_node_id int, @child_id int
INSERT INTO test (name, parent_id)
    SELECT name, @parent_id FROM test WHERE id = @to_clone_id
SET @new_node_id = @@IDENTITY
DECLARE @children_cursor CURSOR
SET @children_cursor = CURSOR FOR
    SELECT id FROM test WHERE parent_id = @to_clone_id
OPEN @children_cursor
FETCH NEXT FROM @children_cursor INTO @child_id
WHILE @@FETCH_STATUS = 0
BEGIN
    EXECUTE CloneNode @child_id, @new_node_id
    FETCH NEXT FROM @children_cursor INTO @child_id
END
CLOSE @children_cursor
DEALLOCATE @children_cursor
Your example is accomplished by EXECUTE CloneNode 2, null (the second parameter is the new parent node).
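The same recursion can be sketched outside the database. The version below uses Python's sqlite3 module with the sample data from the question; it is an illustration of the algorithm, not a PostgreSQL 8.3 function, and `clone_node` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (name TEXT, id INTEGER PRIMARY KEY,
                   parent_id INTEGER REFERENCES test);
INSERT INTO test (name, id, parent_id) VALUES
  ('root1', 1, NULL), ('root2', 2, NULL),
  ('root1sub1', 3, 1), ('root1sub2', 4, 1),
  ('root2sub1', 5, 2), ('root2sub2', 6, 2);
""")

def clone_node(to_clone_id, parent_id=None):
    """Copy one node under parent_id, then recurse into its children.

    Returns the id of the newly created copy.
    """
    row = conn.execute(
        "SELECT name FROM test WHERE id = ?", (to_clone_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no node with id {to_clone_id}")
    cursor = conn.execute(
        "INSERT INTO test (name, parent_id) VALUES (?, ?)", (row[0], parent_id)
    )
    new_id = cursor.lastrowid
    # Read the children up front so the rows inserted during recursion
    # cannot interfere with the iteration.
    children = conn.execute(
        "SELECT id FROM test WHERE parent_id = ? ORDER BY id", (to_clone_id,)
    ).fetchall()
    for (child_id,) in children:
        clone_node(child_id, new_id)
    return new_id

new_root_id = clone_node(2)
cloned_rows = conn.execute(
    "SELECT name, id, parent_id FROM test WHERE id >= ? ORDER BY id", (new_root_id,)
).fetchall()
```

With the sample data, the copies take the next free ids, so the cloned branch lines up with the desired output in the question.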
This sounds like an exercise from "SQL For Smarties" by Joe Celko...
I don't have my copy handy, but I think it's a book that'll help you quite a bit if these are the kinds of problems you need to solve.