I am trying to find the average in a table that includes a count in each record.
I need to find the average as though there were individual records for each count listed in the record.
For example:
+-------+------------------+-------------------+
| Color | Value_to_Average | Number_of_Records |
+-------+------------------+-------------------+
| Red | 3 | 2 |
| Red | 2 | 3 |
| Green | 5 | 2 |
| Blue | 1 | 2 |
+-------+------------------+-------------------+
When I average the values individually, the result is 2.66667. How can I get this same result from the records with the counts?
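You want a weighted average:
select sum(Value_to_Average * Number_of_Records) * 1.0 / sum(Number_of_Records)
from Color_Avg t;
Multiplying by 1.0 guards against integer division if both columns are integers and your database truncates integer quotients.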
I think you're looking for something like this (multiply each value by its count before summing, then divide by the total count):
select sum(value_to_average * number_of_records) / cast(sum(number_of_records) as double)
from Color_Avg
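To check the weighted-average formula against the sample data, here is a minimal sketch that runs the query through Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database the question uses, and the column types are assumed to be integers.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Color_Avg ("
    "Color TEXT, Value_to_Average INTEGER, Number_of_Records INTEGER)"
)
conn.executemany(
    "INSERT INTO Color_Avg VALUES (?, ?, ?)",
    [("Red", 3, 2), ("Red", 2, 3), ("Green", 5, 2), ("Blue", 1, 2)],
)

# Weight each value by its record count, then divide by the total count.
# The "* 1.0" forces floating-point division on integer columns.
weighted_avg = conn.execute(
    "SELECT SUM(Value_to_Average * Number_of_Records) * 1.0 "
    "/ SUM(Number_of_Records) FROM Color_Avg"
).fetchone()[0]

print(round(weighted_avg, 5))  # 2.66667
```

The numerator is 3*2 + 2*3 + 5*2 + 1*2 = 24 and the denominator is 9, which gives the 2.66667 from the question.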
I have a table with the following structure and data in it:
| ID | Date | Result |
|---- |------------ |-------- |
| 1 | 30/04/2020 | + |
| 1 | 01/05/2020 | - |
| 1 | 05/05/2020 | - |
| 2 | 03/05/2020 | - |
| 2 | 04/05/2020 | + |
| 2 | 05/05/2020 | - |
| 2 | 06/05/2020 | - |
| 3 | 01/05/2020 | - |
| 3 | 02/05/2020 | - |
| 3 | 03/05/2020 | - |
| 3 | 04/05/2020 | - |
I'm trying to write an SQL query (I'm using SQL Server) which returns the date of the first two consecutive negative results for a given ID.
For example, for ID no. 1, the first two consecutive negative results are on 01/05 and 05/05.
The first two consecutive negative results for ID no. 2 are on 05/05 and 06/05.
The first two consecutive negative results for ID no. 3 are on 01/05 and 02/05.
So the query should produce the following result:
| ID | FirstNegativeDate |
|---- |------------------- |
| 1 | 01/05 |
| 2 | 05/05 |
| 3 | 01/05 |
Please note that the dates aren't necessarily one day apart. Sometimes, two consecutive negative tests may be several days apart. But they should still be considered as "consecutive negative tests". In other words, two negative tests are not 'consecutive' only if there is a positive test result in between them.
How can this be done in SQL? I've done some reading and it looks like the PARTITION BY clause may be required, but I'm not sure how it works.
This is a gaps-and-islands problem, where you want the start of the first island of '-'s that contains at least two rows.
I would recommend lead() and aggregation:
select id, min(date) first_negative_date
from (
select t.*, lead(result) over(partition by id order by date) lead_result
from mytable t
) t
where result = '-' and lead_result = '-'
group by id
Use the LEAD or LAG function over a partition by ID, ordered by your Date column.
Then simply check where the LEAD/LAG column is equal to Result and Result is '-'.
You'll also need to keep only the earliest matching row for each ID.
The image attached just shows what LEAD/LAG would return
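The LEAD approach described above can be tried on the sample rows with a short sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later), and it assumes the dates are stored as ISO strings so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are stored as ISO-8601 text (assumption) so ORDER BY sorts them correctly.
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2020-04-30", "+"), (1, "2020-05-01", "-"), (1, "2020-05-05", "-"),
        (2, "2020-05-03", "-"), (2, "2020-05-04", "+"), (2, "2020-05-05", "-"),
        (2, "2020-05-06", "-"),
        (3, "2020-05-01", "-"), (3, "2020-05-02", "-"), (3, "2020-05-03", "-"),
        (3, "2020-05-04", "-"),
    ],
)

# LEAD fetches the next result for the same id; a row starts a pair of
# consecutive negatives when both it and the following row are '-'.
query = """
SELECT id, MIN(date) AS first_negative_date
FROM (
    SELECT t.*,
           LEAD(result) OVER (PARTITION BY id ORDER BY date) AS lead_result
    FROM mytable t
) t
WHERE result = '-' AND lead_result = '-'
GROUP BY id
ORDER BY id
"""
first_negative = conn.execute(query).fetchall()
print(first_negative)
```

With the sample data this should return 2020-05-01 for ID 1, 2020-05-05 for ID 2, and 2020-05-01 for ID 3, matching the expected result in the question.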
I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows with duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. If more than one row has both the same date and the same lowest code, I need to keep the row for program (prog) B. For example:
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-06-02 | 211 | B |
| 1 | 1997-08-19 | 67 | A |
| 1 | 1997-08-19 | 23 | A |
So my desired output would look like this;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-08-19 | 23 | A |
I'm struggling to come up with a solution to this, so any help greatly appreciated!
Microsoft SQL Server 2012 (X64)
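The following works with your test data:
SELECT ID, date, MIN(code) AS code, MAX(prog) AS prog FROM your_table
GROUP BY ID, date
Note that MIN(code) and MAX(prog) are computed independently, so on other data the returned prog may not belong to the row with the lowest code.
You can then use the results of this query to create or populate a new table, or to delete all records not returned by this query.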
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5
You can use the min() function:
select ID, DATE, min(CODE), max(PROG)
from your_table
group by ID, DATE
I assume that your table has a valid primary key. However, I would recommend using ID as the primary key. I hope this helps.
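Because MIN and MAX are evaluated separately, a per-row ranking may be a safer way to express "lowest code first, then prefer prog B". The sketch below is one possible alternative rather than either answerer's method. It uses ROW_NUMBER() through Python's sqlite3 module (a stand-in for SQL Server 2012, which also supports ROW_NUMBER), and the table name admissions is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "admissions" is a hypothetical table name; the question does not give one.
conn.execute(
    "CREATE TABLE admissions (id INTEGER, date TEXT, code INTEGER, prog TEXT)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?, ?)",
    [
        (1, "1996-08-16", 24, "A"),
        (1, "1997-06-02", 123, "A"),
        (1, "1997-06-02", 123, "B"),
        (1, "1997-06-02", 211, "B"),
        (1, "1997-08-19", 67, "A"),
        (1, "1997-08-19", 23, "A"),
    ],
)

# Rank rows within each (id, date): lowest code first, and among equal
# codes put prog 'B' ahead of any other program. Keep only rank 1.
query = """
SELECT id, date, code, prog
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (
               PARTITION BY id, date
               ORDER BY code, CASE WHEN prog = 'B' THEN 0 ELSE 1 END
           ) AS rn
    FROM admissions a
) ranked
WHERE rn = 1
ORDER BY id, date
"""
kept_rows = conn.execute(query).fetchall()
print(kept_rows)
```

Every returned row is a real row from the table, so the code and prog values always belong together. Rows with rn greater than 1 are the ones that would be deleted.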
Is it possible to return the count of values in a single row?
For example, this is a test table, and I want the count of daily_typing_pages:
SQL> SELECT * FROM employee_tbl;
+------+------+------------+--------------------+
| id | name | work_date | daily_typing_pages |
+------+------+------------+--------------------+
| 1 | John | 2007-01-24 | 250 |
| 2 | Ram | 2007-05-27 | 220 |
| 3 | Jack | 2007-05-06 | 170 |
| 3 | Jack | 2007-04-06 | 100 |
| 4 | Jill | 2007-04-06 | 220 |
| 5 | Zara | 2007-06-06 | 300 |
| 5 | Zara | 2007-02-06 | 350 |
+------+------+------------+--------------------+
The result of this count should be 1610; however, if I simply wrap COUNT() around the column, it returns:
SQL>SELECT COUNT(daily_typing_pages) FROM employee_tbl ;
+---------------------------+
| COUNT(daily_typing_pages) |
+---------------------------+
| 7 |
+---------------------------+
1 row in set (0.01 sec)
So it returns the number of rows instead of adding up the values in the column.
Is there a way to do what I want without using an external programming language to add it up for me?
Thanks
You want SUM instead of COUNT. COUNT merely counts the number of records; you want them summed.
You didn't mention your DBMS, but SUM is a standard aggregate function and is available in SQL Server, for example.
Did you mean you want to add up all the numbers in daily_typing_pages?
If so, you can use sum(daily_typing_pages):
SELECT SUM(daily_typing_pages) FROM employee_tbl
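To see the difference between the two aggregates side by side, here is a small sketch that loads the sample rows into an in-memory SQLite database via Python's sqlite3 module (a stand-in, since the question does not name its DBMS).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_tbl ("
    "id INTEGER, name TEXT, work_date TEXT, daily_typing_pages INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_tbl VALUES (?, ?, ?, ?)",
    [
        (1, "John", "2007-01-24", 250),
        (2, "Ram", "2007-05-27", 220),
        (3, "Jack", "2007-05-06", 170),
        (3, "Jack", "2007-04-06", 100),
        (4, "Jill", "2007-04-06", 220),
        (5, "Zara", "2007-06-06", 300),
        (5, "Zara", "2007-02-06", 350),
    ],
)

# COUNT returns how many non-NULL values there are; SUM adds them up.
row_count, page_total = conn.execute(
    "SELECT COUNT(daily_typing_pages), SUM(daily_typing_pages) "
    "FROM employee_tbl"
).fetchone()

print(row_count, page_total)  # 7 1610
```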
I have a table that looks like this:
Day | Count
----+------
1 | 59547
2 | 40448
3 | 36707
4 | 34492
And I want it to query a result set like this:
Day | Count | Percentage of 1st row
----+-------+----------------------
1 | 59547 | 1
2 | 40448 | .6793
3 | 36707 | .6164
4 | 34492 | .5792
I've tried using window functions, but can only seem to get a percentage of the total, which looks like this:
Day | Count | Percentage of 1st row
----+-------+----------------------
1 | 59547 | 0.347833452
2 | 40448 | 0.236269963
3 | 36707 | 0.214417561
4 | 34492 | 0.201479024
But I want a percentage of the first row. I know I can use a cross join, that queries just for "Day 1" but that seems to take a long time. I was wondering if there was a way to write a window function to do this.
Judging from your numbers, you may be looking for this:
SELECT *, round(ct::numeric/first_value(ct) OVER (ORDER BY day), 4) AS pct
FROM tbl;
"A percentage for each row, calculated as ct divided by ct of the first row as defined by the smallest day number."
The key is the window function first_value().
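The first_value() approach can be checked on the sample numbers with the sketch below. It runs an equivalent query in SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, so the ::numeric cast is replaced by multiplying by 1.0. The table and column names tbl, day and ct follow the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (day INTEGER, ct INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 59547), (2, 40448), (3, 36707), (4, 34492)],
)

# first_value(ct) over the window ordered by day is the count of the
# earliest day, so each row is divided by day 1's count.
# "* 1.0" replaces PostgreSQL's ::numeric cast to avoid integer division.
query = """
SELECT day, ct,
       ROUND(ct * 1.0 / FIRST_VALUE(ct) OVER (ORDER BY day), 4) AS pct
FROM tbl
ORDER BY day
"""
pct_rows = conn.execute(query).fetchall()
for row in pct_rows:
    print(row)
```

The pct column should reproduce the ratios requested in the question (1, .6793, .6164, .5792) without a cross join.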
I have a report in reporting services. In this report, I am displaying the Top N values. But my Grand Total is displaying the sum of all the values.
Right now I am getting something like this. Here N = 2.
+-------+------+-------------+
| Area |ID | Count |
+-------+------+-------------+
| - A | | 4 |
| | a1 | 1 |
| | b1 | 1 |
| | c1 | 1 |
| | d1 | 1 |
| | | |
| - B | | 3 |
| | a2 | 1 |
| | b2 | 1 |
| | c2 | 1 |
| | | |
|Grand | | 10 |
|Total | | |
+-------+------+-------------+
The correct Grand Total should be 7 instead of 10. A and B are toggle items (you can expand and collapse them).
How can I display the correct Grand Total using Top N filter?
I also want to use the filter in the report and not in the SQL query.
You should use the filter on the Dataset. Filtering the report object itself only turns off the visibility of the items (rows, for example). The item or row itself will still be part of the group and will still be used for calculations.
I found a way to solve my question. As Ido said, I worked on the dataset. I am using an Analysis cube, so in this cube I created a Named Set calculation.
In this set I used the TopCount() function. It keeps only the top N values, where N can be any integer of your choice.
So the final Named Set in this case is:
TopCount([Dim Area].[Area].[Area], 2, ([Measures].[Count]))
This will give you the Grand Total of only the top N values.
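The underlying idea, restricting the data to the top N groups before totalling, can be illustrated outside MDX. The sketch below is only a SQL analogy run through Python's sqlite3 module, not the Named Set itself. The table area_counts is hypothetical, and area C is an invented third group standing in for the 3 records that make up the difference between 10 and 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-area totals: A and B come from the report in the
# question; C is invented to account for the remaining 3 records.
conn.execute("CREATE TABLE area_counts (area TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO area_counts VALUES (?, ?)",
    [("A", 4), ("B", 3), ("C", 3)],
)

# Total over every area: what the report shows when only visibility is filtered.
unfiltered_total = conn.execute(
    "SELECT SUM(cnt) FROM area_counts"
).fetchone()[0]

# Total over the top 2 areas only: the filter is applied before aggregating.
top_n_total = conn.execute(
    "SELECT SUM(cnt) FROM ("
    "SELECT cnt FROM area_counts ORDER BY cnt DESC LIMIT 2)"
).fetchone()[0]

print(unfiltered_total, top_n_total)  # 10 7
```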