I'm struggling to express a particular type of query in ActiveRecord's syntax.
class Company < ActiveRecord::Base
  has_many :share_prices
end
Whenever the update process runs, a new entry is created in the share_prices table, so you end up with many share price rows for each company. In such a scenario you could end up with one company having five very high share price entries in the last time period.
So let's say I want to return the companies that currently have the highest share price in the last time period: I need it to return the max price for each company from the share_prices table, then find the next highest, and so on.
I can't work out how to do that with ActiveRecord syntax. I've tried a lot of approaches inspired by other Stack Overflow answers but invariably can't get it to return unique companies, so I'm missing something around SELECT DISTINCT, joins, GROUP BY, or something else.
Environment: PostgreSQL backend, deploying to Heroku.
Help very much appreciated.
Without knowing your structure it's hard to provide a concrete answer, but something like this should work:
Company.select('MAX(share_prices.price) AS price, companies.id').joins(:share_prices).group('companies.id').order('price DESC')
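To restrict the ranking to the last time period, the underlying SQL would look something like this (a sketch; the created_at column and the seven-day window are assumptions, substitute whatever marks your time period):

SELECT companies.id, MAX(share_prices.price) AS price
FROM companies
INNER JOIN share_prices ON share_prices.company_id = companies.id
WHERE share_prices.created_at >= NOW() - INTERVAL '7 days'
GROUP BY companies.id
ORDER BY price DESC;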
I am trying to figure out how I can relate records from one table to each other.
I have a table with individual cases (e.g. disciplinary, grievance, etc.). However, multiple of these cases could relate to each other. For example, if a group of people get into a fight, all the people involved would each get an individual case, and then all those cases would need to be related/linked.
I'm struggling to figure out how best to store this data, whether in the same table or in a new table.
This is the backend to a WinForm, so I need to be able to link records; in the example data the user would select that case IDs 1-4 are linked.
So the question is: how do I store the fact that these cases are related, given that other cases might be linked at a later date?
It sounds like you need some sort of "master case" or "incident".
I think I would say that an "incident" comprises one or more "cases", and keep both an incidents table and a cases table. In fact, "cases" might be a bad name; it might be more like IncidentPerson.
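A minimal sketch of that structure (table and column names are illustrative, not prescriptive):

CREATE TABLE incidents (
    incident_id INT PRIMARY KEY,
    opened_on   DATE NOT NULL,
    description VARCHAR(500)
);

CREATE TABLE cases (
    case_id     INT PRIMARY KEY,
    incident_id INT NOT NULL REFERENCES incidents(incident_id),
    person_name VARCHAR(100) NOT NULL
);

-- Cases 1-4 from the example would all carry the same incident_id,
-- which is what marks them as related. Linking another case later
-- is just a matter of setting its incident_id to the existing incident.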
An alternative approach would be to have a "master case" id column on each case, which could be NULL or point back to the case itself. I'm not as fond of this approach because it will likely lead to confusion down the road: one analyst will count cases per month using "cases" and another using "master cases", and you'll spend a lot of time trying to figure out why the numbers are different.
I need to render a school schedule in a very detailed way (see screenshot). Every Day, Hour of the day, Room, Group, and Teacher is a separate entity, and they are related to one another in certain ways.
My issue is that I have to pull the data for every single cell via a separate query, which is how I end up with over 700 queries to get a week's schedule.
The question is: what is the best approach to store, manipulate, and pull data for such demands?
I was thinking about making a separate 'static' table to store the actual values rather than related IDs, but then I lose flexibility.
Here's my best try, a schedule table with these columns:
id
date
room_id
group_id
teacher_id
If you're sure that the data is very static, it might be more manageable to put teacher_9, group_9, etc. as columns in your schedule table. The tradeoffs are yours.
For Teacher data, you most likely want it in a different table to allow for future changes; imagine a name change. The same goes for Group and Room.
If you're concerned about query performance, know that the database will generally cache frequently used data and query plans. :)
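Either way, the whole week can be pulled in one joined query instead of one query per cell. A sketch (table and column names are assumptions based on the *_id columns above; the groups table is called student_groups here only because GROUPS can clash with reserved words):

SELECT s.date,
       r.name AS room,
       g.name AS group_name,
       t.name AS teacher
FROM schedule s
JOIN rooms r ON r.id = s.room_id
JOIN student_groups g ON g.id = s.group_id
JOIN teachers t ON t.id = s.teacher_id
WHERE s.date BETWEEN '2024-09-02' AND '2024-09-08'
ORDER BY s.date, r.name;

The frontend then arranges the rows of that single result set into the grid.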
As a first note, I only have read access to my server. Just FYI, as it seems to come up a lot...
Server: DB2 (6.1) for i (IBM)
I have a query I'm running on a table that has 19 million rows in it (I don't design them, I just query them). I've been limiting my return data to 10 rows (*) until I get this query sorted out, so that return times are a bit more reasonable.
The basic design is that I need to get data about categories of products we sell on a week-by-week basis, using the columns WEEK_ID and CATEGORY. Here's example code (with some important bits ####'d out):
SELECT WEEK_ID, CATEGORY
FROM DWQ####.SLSCATW
INNER JOIN DW####.CATEGORY
ON DWQ####.SLSCATW.CATEGORY_NUMBER = DW####.CATEGORY.CATEGORY_NUMBER
WHERE WEEK_ID
BETWEEN 200952 AND 201230 --Format is year/week
GROUP BY WEEK_ID, CATEGORY
If I comment out that last line I can get back 100 rows in 254 ms. If I put that line back in, my return takes longer than I've had the patience to wait for :-). (The longest I've waited is 10 minutes.)
This question has two parts. The first is quite rudimentary: is this normal? There are roughly 50 categories and 140 or so weeks that I'm trying to condense down to. I realize that's a lot of info to condense from 19 million rows, but I was hoping that limiting my query to 10 returned rows would minimize the amount of time.
And, if I'm not just a complete n00b, and this in fact should not take several minutes, what exactly is wrong with my SQL?
I've Googled WHERE statement optimization and can't seem to find anything. All links and explanations are more than welcome.
Apologies for such a newbie post... we all have to start somewhere, right?
(*)using SQLExplorer, my IDE, an Eclipse implementation of Squirrel SQL.
I'm not sure how the server handles GROUP BY when there are no aggregate functions in the query. Based on your answers in the comments, I'd just try adding those:
SELECT
...,
SUM(SalesCost) as SalesCost,
SUM(SalesDollars) as SalesDollars
FROM
...
Leave the rest of the query as is.
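Put together with the original query, it would look something like this (assuming SalesCost and SalesDollars are columns on SLSCATW; adjust to wherever they actually live):

SELECT WEEK_ID, CATEGORY,
       SUM(SalesCost) AS SalesCost,
       SUM(SalesDollars) AS SalesDollars
FROM DWQ####.SLSCATW
INNER JOIN DW####.CATEGORY
    ON DWQ####.SLSCATW.CATEGORY_NUMBER = DW####.CATEGORY.CATEGORY_NUMBER
WHERE WEEK_ID BETWEEN 200952 AND 201230
GROUP BY WEEK_ID, CATEGORY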
If that doesn't solve the problem, you might have missing indexes. I would try to find out whether there's an index where WEEK_ID is the only column, or where it is the first column. You could also check whether you have another temporal column (e.g. TransactionDate or something similar) on the same table that is already indexed; if so, you could use that in the WHERE clause instead.
Without the right indexes, the database server is forced to do a complete table scan, and that could explain your performance issues. 19 million rows take a not insignificant amount of time to read from disk.
Also check that the data type of WEEK_ID is int or similar, just to avoid unnecessary casting in your query.
To avoid a table scan on the Category table, you need to make sure that Category_Number is indexed as well. (It probably already is, since I assume it is a key to that table.)
Indices on WEEK_ID and CATEGORY (and possibly CATEGORY_NUMBER) are the only way to make it really fast, so you need to convince the DBA to introduce those.
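A sketch of what those indexes might look like (index names are illustrative; since the account is read-only, the DBA would have to run these):

CREATE INDEX SLSCATW_WEEK_IX ON DWQ####.SLSCATW (WEEK_ID, CATEGORY_NUMBER);
CREATE INDEX CATEGORY_NUM_IX ON DW####.CATEGORY (CATEGORY_NUMBER);

With the first index in place, the WHERE clause on WEEK_ID becomes a range scan instead of a full pass over all 19 million rows.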
I am working for a K-12 school and I am developing a gradebook system. The existing gradebook system we use, which I also developed, is based on Excel with VBA. It's always a nightmare for me to consolidate 400+ Excel workbooks at the end of every term. During the summer break, I'm planning to put all the data in a database for easier management.
My problem is this:
For a computation-intensive application like a gradebook, is it good to store the computed values in table fields, or is it better to store ONLY the raw data and do the computations on the frontend?
The way the Excel gradebook system works is like this...
The teacher records each score for each assessment of each student in the form score / highest possible score (e.g. Quiz 1 = 5/10, Homework 1 = 20/25, etc.).
The scores are converted to percentages and summarized per component. "Component" means Quiz, Homework, etc., so there will be something like "Quizzes Average = 90%, Homework Average = 80%", and so on.
Different subjects have different final grade breakdowns, like "Science = 50% Quizzes + 50% Homework, Math = 60% Quizzes + 40% Homework".
Then, the general average grade of each student is computed by taking the average across all subjects.
Everything above is very easy to make in a spreadsheet but I don't know how to implement it in a database.
As of the moment, I'm thinking something like having a table where all assessments are recorded like this:
tbl_scores
id
student_id
term_id
subject_id
component_id
assessment_id
raw_score
highest_possible_score # not sure about this, since it can be derived from assessment_id
Would it be useful to store the computed values (percentage for each score entry, component average, subject average, general average, etc.) in the database and use stored procedures and triggers to update them when a new score comes in?
Or is it better to just store the raw scores and calculate everything ONLY on the frontend? Would that option be faster than the first, given that the SELECT queries will be more complex due to subqueries?
By the way, I'm planning to use PostgreSQL for this.
Thanks in advance for any advice.
DashMug
400 students? That is peanuts for an RDBMS designed to handle millions of records. You need not worry about performance.
I have seen nothing in your description that PostgreSQL could not do with simple queries. Store only the necessary information and calculate the rest. Also, do not add a redundant highest_possible_score column to tbl_scores if that information is already in the linked assessment table; that would do more harm than good.
If (and only if) you should later find that a query is too slow, you can always use materialized views (write the output of a query to a table, and recalculate that table after changes to the base data).
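For example, the per-component averages fall out of one aggregate query, and wrapping it in a materialized view is a small change (a sketch; the assessments table name, its columns, and the view name are assumptions based on the schema above; native materialized views need PostgreSQL 9.3+):

CREATE MATERIALIZED VIEW component_averages AS
SELECT s.student_id,
       s.subject_id,
       s.component_id,
       AVG(s.raw_score::numeric / a.highest_possible_score) * 100 AS pct_average
FROM tbl_scores s
JOIN assessments a ON a.id = s.assessment_id
GROUP BY s.student_id, s.subject_id, s.component_id;

-- Recalculate after changes to the base data:
REFRESH MATERIALIZED VIEW component_averages;

Running the plain SELECT directly (no view at all) is the place to start; the materialized view is only the fallback if it ever becomes slow.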
Whether to denormalize depends on the total number of records. For a school the data will not be big enough to warrant it; the better decision is to consolidate the data with queries (yes, they will be more complex and some joins will appear; that is normal).
I'm creating a little database that has employee, emp_shift, and shift tables. Now I'm supposed to be able to calculate, at the end of the month, which employee has done the most shifts.
I've created the SQL creation and insert statements for the tables, plus a little diagram to explain what I'm trying to accomplish. I'm a beginner and this is homework I've been trying to do for the last 4 days.
Diagram: http://latinunit.net/emp_shift.jpg
SQL: http://latinunit.net/emp_shift.txt
Can you please check it, guys? The deadline is in 2 days and this is just one part of the whole database.
That is a reasonable start. Will you have more tables? If not, it will be hard to identify how to pay people; for example, it seems you might want a "pay-period" table. Then you could find the start and end dates and count the shifts within that period.
But if all you need to do is exactly what you said, that is a fair start.
(I am assuming you have other columns in mind, such as employee name, but that would be obvious).
You could start by telling us which RDBMS you are using, as some of the finer details might differ between RDBMSs.
You need to create a link between the tables (called JOINs; read this) and then perform a count of the requested data.
After you have read some of these, show us what you have done, and we can help you where you are having trouble.
Also, it would be better practice to use a single numeric column as the primary key instead of 'A', 'B', 'C', etc.
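As a starting sketch of the join-and-count (column names are guesses, since the actual DDL is in the linked file, and a numeric employee_id is used as suggested above):

SELECT e.employee_id, COUNT(*) AS shift_count
FROM employee e
JOIN emp_shift es ON es.employee_id = e.employee_id
GROUP BY e.employee_id
ORDER BY shift_count DESC;

Add a WHERE clause on the shift date to restrict the count to a single month; the first row is then the employee with the most shifts.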