Filtering data with from statement - sql

So let's say I have this table with these rows in it
Table name: MYTABLE
ID | NUMBER | FK_ID
1 | 0 | 26
2 | 0 | 26
3 | 1 | 26
4 | 0 | 27
5 | 1 | 27
Now I want to select only the FK_ID groups that contain two or more rows where NUMBER is 0.
For instance, applying this filter to the data above would return a single row, for FK_ID 26, because it has two rows with NUMBER 0.
Is this even possible, or should I fetch all the data and filter it in my programming language instead of in the DB?

SELECT FK_ID ,
COUNT(DECODE(NUMBER ,0,1))
FROM TEST_DATA
GROUP BY FK_ID
HAVING COUNT(DECODE(NUMBER ,0,1)) >= 2
Fiddle here : http://sqlfiddle.com/#!4/44d70/4

Does this query work for you?
SELECT
FK_ID
FROM MYTABLE
WHERE NUMBER = 0
GROUP BY FK_ID
HAVING COUNT(*) >= 2;
Also, consider renaming the NUMBER column, as NUMBER is a reserved word in Oracle.
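To try the second query without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The column name num (instead of NUMBER) and the helper name fk_ids_with_repeated_zero are assumptions made for illustration only.

```python
import sqlite3

def fk_ids_with_repeated_zero(rows, minimum=2):
    """Return the FK_IDs that have at least `minimum` rows where num = 0."""
    conn = sqlite3.connect(":memory:")
    # "num" replaces NUMBER here to avoid the reserved-word problem.
    conn.execute("CREATE TABLE mytable (id INTEGER, num INTEGER, fk_id INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT fk_id
        FROM mytable
        WHERE num = 0          -- keep only the zero rows
        GROUP BY fk_id
        HAVING COUNT(*) >= ?   -- keep only groups with enough of them
        ORDER BY fk_id
        """,
        (minimum,),
    ).fetchall()
    conn.close()
    return [fk_id for (fk_id,) in result]

# Sample data from the question.
SAMPLE_ROWS = [(1, 0, 26), (2, 0, 26), (3, 1, 26), (4, 0, 27), (5, 1, 27)]
print(fk_ids_with_repeated_zero(SAMPLE_ROWS))  # [26]
```

Filtering in the WHERE clause before grouping means the database only counts the zero rows, so nothing has to be post-processed in application code.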

Related

How do you flip rows into new columns?

I've got a table that looks like this:
player_id | violation
---------------------
1 | A
1 | A
1 | B
2 | C
3 | D
3 | A
And I want to turn it into this, with a bunch of new columns that refer to the types of violations, and then the sum of the number of each individual type of violation that each player got (not that concerned with what the columns are called; a/b/c/d would work great as well):
player_id | violation_a | violation_b | violation_c | violation_d
-----------------------------------------------------------------
1 | 2 | 1 | 0 | 0
2 | 0 | 0 | 1 | 0
3 | 1 | 0 | 0 | 1
I know how I could do this, but it would take a ton of lines of code, since there are in reality 100+ types of violations. Is there any way (perhaps with a tablefunc()?) that I could do this more concisely than spelling out each of the new 100+ columns that I want and the logic for them each individually?
In pure SQL I don't see how you could avoid declaring the columns yourself. You either have to create subselects or filters in every column ..
SELECT
t.player_id,
count(*) FILTER (WHERE violation = 'A') AS violation_a,
count(*) FILTER (WHERE violation = 'B') AS violation_b,
count(*) FILTER (WHERE violation = 'C') AS violation_c,
count(*) FILTER (WHERE violation = 'D') AS violation_d
FROM t
GROUP BY t.player_id;
.. or create a pivot table:
SELECT *
FROM crosstab(
'SELECT player_id, t2.violation, count(*) FILTER (WHERE t.violation = t2.violation)::INT
FROM t,(SELECT DISTINCT violation FROM t) t2
GROUP BY player_id, t2.violation
ORDER BY 1, 2'
) AS ct(player_id INT,violation_a int,violation_b int,violation_c int,violation_d int);
Demo: db<>fiddle
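If spelling out 100+ columns by hand is the main concern, one option is to generate the column list from the data in application code. The following is a rough sketch under stated assumptions: it uses Python's sqlite3 module rather than PostgreSQL, SUM(CASE ...) instead of FILTER, and the names pivot_violations and v0, v1, ... are hypothetical.

```python
import sqlite3

def pivot_violations(conn):
    """Return (violation kinds, pivoted rows) with one count column per kind."""
    kinds = [k for (k,) in conn.execute(
        "SELECT DISTINCT violation FROM t ORDER BY violation")]
    if not kinds:
        return [], []
    # One conditional sum per violation type. Values are bound as parameters
    # and aliases are numbered, so no table data ends up in the SQL text.
    columns = ", ".join(
        f"SUM(CASE WHEN violation = ? THEN 1 ELSE 0 END) AS v{i}"
        for i in range(len(kinds))
    )
    sql = (f"SELECT player_id, {columns} FROM t "
           "GROUP BY player_id ORDER BY player_id")
    return kinds, conn.execute(sql, kinds).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (player_id INTEGER, violation TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (1, "A"), (1, "B"), (2, "C"), (3, "D"), (3, "A")],
)
kinds, rows = pivot_violations(conn)
print(kinds)  # ['A', 'B', 'C', 'D']
print(rows)   # [(1, 2, 1, 0, 0), (2, 0, 0, 1, 0), (3, 1, 0, 0, 1)]
```

The query text is still one expression per violation type, but it is produced by a loop, so adding a new violation type requires no code change.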

Count results in SQL statement additional row

I am trying to get 3% of total membership, which the code below does, but the result contains two rows: one has the percentage and the other is "0". I am not sure why, or how to get rid of it.
select
sum(Diabetes_FLAG) * 100 / (select round(count(medicaid_no) * 0.03) as percent
from membership) AS PERCENT_OF_Dia
from
prefinal
group by
Diabetes_Flag
I am not sure why it brought back a second row; I only need the percentage.
What am I doing wrong?
Output:
PERCENT_OF_DIA
1 11.1111111111111
2 0
SELECT sum(Diabetes_FLAG)*100 / (SELECT round(count(medicaid_no)*0.03) as percentt
FROM membership) AS PERCENT_OF_Dia
FROM prefinal
WHERE Diabetes_FLAG = 1
-- GROUP BY Diabetes_Flag is not needed, because the WHERE clause already limits the rows to a single flag value.
Remove the group by if you want one row:
select sum(Diabetes_FLAG)*100/( SELECT round(count(medicaid_no)*0.03) as percentt
from membership) AS PERCENT_OF_Dia
from prefinal;
When you include group by Diabetes_FLAG, it creates a separate row for each value of Diabetes_FLAG. Based on your results, I'm guessing that it takes on the values 0 and 1.
"Not sure why it brought back a second row"
This is how a GROUP BY query works. The GROUP BY clause groups rows by a given column: it collects all values of that column, reduces them to a distinct set, and returns one row for each distinct value.
Please consider this simple demo: http://sqlfiddle.com/#!9/3a38df/1
SELECT * FROM prefinal;
| Diabetes_Flag |
|---------------|
| 1 |
| 1 |
| 5 |
Usually the GROUP BY column is listed in the SELECT clause too, like this:
SELECT Diabetes_Flag, sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| Diabetes_Flag | sum(Diabetes_Flag) |
|---------------|--------------------|
| 1 | 2 |
| 5 | 5 |
As you can see, GROUP BY returns two rows, one for each unique value of the Diabetes_Flag column.
If you remove the Diabetes_Flag column from the SELECT clause, you get the same result as above, but without this column:
SELECT sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| sum(Diabetes_Flag) |
|--------------------|
| 2 |
| 5 |
So the reason you get 2 rows is that Diabetes_Flag has 2 distinct values in the table.
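The same behaviour can be reproduced locally. This is a small sketch using Python's sqlite3 module and the three demo values from above; the variable names grouped and ungrouped are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefinal (Diabetes_Flag INTEGER)")
conn.executemany("INSERT INTO prefinal VALUES (?)", [(1,), (1,), (5,)])

# With GROUP BY: one output row per distinct Diabetes_Flag value.
grouped = conn.execute(
    "SELECT SUM(Diabetes_Flag) FROM prefinal "
    "GROUP BY Diabetes_Flag ORDER BY Diabetes_Flag"
).fetchall()

# Without GROUP BY: the whole table is treated as a single group.
ungrouped = conn.execute("SELECT SUM(Diabetes_Flag) FROM prefinal").fetchall()

print(grouped)    # [(2,), (5,)]
print(ungrouped)  # [(7,)]
```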

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I am performing some queries using PostgreSQL's SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that retrieves only the latest values in the table, alongside the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by moving the DISTINCT ON into a subquery (here a CTE) and applying count(*) over () to its result:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table in three ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially in the long, thin way I write SQL), but it makes it clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.
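For anyone who wants to check the result of the join-based query against the sample data, here is a sketch using Python's sqlite3 module. It is an approximation rather than the exact PostgreSQL statement: SQLite has no DISTINCT ON, and CROSS JOIN is used in place of "join ... on true". The function name latest_rows_with_total is hypothetical.

```python
import sqlite3

QUERY = """
SELECT c.id_count AS total, a.id, a.my_field, b.max_id_reference
FROM my_table a
JOIN (SELECT id, MAX(id_reference) AS max_id_reference
      FROM my_table
      GROUP BY id) b
  ON a.id = b.id AND a.id_reference = b.max_id_reference
-- c always has exactly one row, so the cross join only adds a column
CROSS JOIN (SELECT COUNT(DISTINCT id) AS id_count FROM my_table) c
ORDER BY a.id
"""

def latest_rows_with_total(rows):
    """Return the newest row per id, each prefixed with the number of ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER, my_field TEXT, id_reference INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

SAMPLE = [(1, "a", 1), (1, "b", 2), (2, "a", 3), (2, "c", 4), (3, "x", 5)]
print(latest_rows_with_total(SAMPLE))
# [(3, 1, 'b', 2), (3, 2, 'c', 4), (3, 3, 'x', 5)]
```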

In MS Access, how do I update a table record to its current value plus the count of records in a different table?

I have two tables.
**tblMonthlyData**
ReportMonth | TotalItems | TotalVariances
Jan | 5 | 0
Feb | 1 | 1
Mar | 2 | 0
Apr | 8 | 4
May | 4 | 0
Jun | 5 | 0
Jul | 3 | 0
Aug | 5 | 0
Sep | 9 | 3
Oct | 1 | 0
Nov | 7 | 0
Dec | 6 | 0
and
**tblDailyData**
ID | ItemNum | CountedQty | SystemQty | Variance
1 | Item1 | 4 | 4 | 0
2 | Item2 | 8 | 5 | -3
3 | Item3 | 1 | 2 | 1
4 | Item4 | 6 | 4 | -2
For the sake of clarity, we'll say the above tblDailyData is from a count done today, 01/27/2017. Variance is a calculated field based on the data in both quantity fields.
I'm trying to add the count of records in tblDailyData to TotalItems in tblMonthlyData based on the date of the count (i.e. counts are done daily, and each count's data needs to be added to the appropriate month in tblMonthlyData). So for the above example I'd need to add 4 (the number of records) to TotalItems for the Jan record, giving 9, and add 3 (the number of records with a non-zero variance) to TotalVariances, giving 3.
So far, I've tried using a Make Table Query for both total items counted and total number of variances, then using an Update Query that looks like this:
UPDATE tblMonthlyData
SET TotalItems = TotalItems + tblTempTotalItems.CountOfItems,
TotalVariances = TotalVariances + tblTempTotalVariances.CountOfVariances
WHERE Format$([ReportMonth],"mmm")=Format$(Now(),"mmm");
I've also tried a similar method using select queries to count records and variances (without creating the temporary tables) and running the update query based on those. Both methods result in Access prompting for the CountOfItems and CountOfVariances parameters when the update query is run, instead of just taking the values from the specified temporary table or select query.
This seemed like it'd be such a simple operation (query the count of records and variances, add them to the appropriate monthly record in separate table), but it turns out I can't figure out how to make it work. Thanks for any help!
This does not seem to be a situation for a table, but rather for some views/queries, which will always be up to date.
Use a GROUP BY FORMAT([date_field],"mm/dd/yyyy") clause in your query for the daily item count. (If you want to add that to a monthly count, we will do that in another query.)
SELECT FORMAT([date_field],"mm/dd/yyyy") AS Date, COUNT(ID) AS TotalItems
FROM tblDailyData
GROUP BY FORMAT([date_field],"mm/dd/yyyy")
Call this query dailyTotalItems.
SELECT FORMAT([date_field],"mm/dd/yyyy") AS Date, COUNT(ID) AS TotalItemsWithVariance, SUM(
FROM tblDailyData
WHERE NOT (Variance = 0)
GROUP BY FORMAT([date_field],"mm/dd/yyyy")
Call this query dailyTotalItemsWithVariance.
SELECT MONTH([Date]) As MonthDate, SUM(TotalItems) As TotalMonthlyItems
FROM dailyTotalItems
GROUP BY MONTH([Date])
Call this query monthlyTotalItems.
SELECT MONTH([Date]) As MonthDate, SUM(TotalItemsWithVariance) As TotalMonthlyItemsWithVariance
FROM dailyTotalItemsWithVariance
GROUP BY MONTH([Date])
Call this query monthlyTotalItemsWithVariance.
Then LEFT JOIN both on MonthDate.
SELECT * FROM monthlyTotalItems
LEFT JOIN monthlyTotalItemsWithVariance ON monthlyTotalItems.MonthDate = monthlyTotalItemsWithVariance.MonthDate
NOTE: TotalItems will always be >= TotalItemsWithVariance AND every date with a variance must have had a count. So get ALL dates in monthlyTotalItems and left join to match the monthlyTotalItemsWithVariance items (which must be included, as shown above)
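The idea of deriving the monthly totals on demand, instead of storing and updating them, can be sketched outside Access as well. The example below uses Python's sqlite3 module and collapses the chain of queries into a single aggregate, which is a simplification of the approach above. The count_date column is hypothetical (the original table does not show a date field), and the variance is recomputed as SystemQty - CountedQty to match the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tblDailyData (
        ID INTEGER, ItemNum TEXT, CountedQty INTEGER, SystemQty INTEGER,
        count_date TEXT  -- hypothetical column: the day the count was done
    )
    """
)
conn.executemany(
    "INSERT INTO tblDailyData VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Item1", 4, 4, "2017-01-27"),
        (2, "Item2", 8, 5, "2017-01-27"),
        (3, "Item3", 1, 2, "2017-01-27"),
        (4, "Item4", 6, 4, "2017-01-27"),
    ],
)

# One row per month: number of counted items and number with a variance.
monthly = conn.execute(
    """
    SELECT strftime('%m', count_date) AS month_num,
           COUNT(*) AS total_items,
           SUM(CASE WHEN SystemQty - CountedQty <> 0 THEN 1 ELSE 0 END)
               AS total_variances
    FROM tblDailyData
    GROUP BY month_num
    ORDER BY month_num
    """
).fetchall()
print(monthly)  # [('01', 4, 3)]
```

Because the totals are computed from the daily rows each time, they cannot drift out of sync the way a manually updated summary table can.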

Update statement to set a column based the maximum row of another table

I have a Family table:
SELECT * FROM Family;
id | Surname | Oldest | Oldest_Age
---+----------+--------+-------
1 | Byre | NULL | NULL
2 | Summers | NULL | NULL
3 | White | NULL | NULL
4 | Anders | NULL | NULL
The Family.Oldest column is not yet populated. There is another table of Children:
SELECT * FROM Children;
id | Name | Age | Family_FK
---+----------+------+--------
1 | Jake | 8 | 1
2 | Martin | 7 | 2
3 | Sarah | 10 | 1
4 | Tracy | 12 | 3
where many children (or no children) can be associated with one family. I would like to populate the Oldest and Oldest_Age columns using an UPDATE ... SET ... statement that sets them to the name and age of the oldest child in each family. Finding the name of each oldest child is a problem that is solved quite well here: How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
However, I don't know how to use the result of this in an UPDATE statement to update the column of an associated table using the h2 database.
The following is ANSI-SQL syntax that solves this problem:
update family f
set oldest = (select name
from children c
where c.family_fk = f.id
order by age desc
fetch first 1 row only
)
In h2, I think you would use limit 1 instead of fetch first 1 row only.
EDIT:
For two columns -- alas -- the solution is two subqueries:
update family f
set oldest = (select name
from children c
where c.family_fk = f.id
order by age desc
limit 1
),
oldest_age = (select age
from children c
where c.family_fk = f.id
order by age desc
limit 1
);
Some databases (such as SQL Server, Postgres, and Oracle) support lateral joins that can help with this. row_number() can also help solve this problem. Unfortunately, older versions of H2 don't support this functionality.
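The two-subquery update can be tried against the sample data with the sketch below. It runs on Python's sqlite3 module rather than H2, so it refers to the outer table by name (family.id) instead of an alias. The extra c.id tie-breaker in ORDER BY is an addition of this sketch, so that both subqueries pick the same child when two children share the highest age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE family (id INTEGER, surname TEXT, oldest TEXT, oldest_age INTEGER);
    CREATE TABLE children (id INTEGER, name TEXT, age INTEGER, family_fk INTEGER);
    INSERT INTO family (id, surname) VALUES
        (1, 'Byre'), (2, 'Summers'), (3, 'White'), (4, 'Anders');
    INSERT INTO children VALUES
        (1, 'Jake', 8, 1), (2, 'Martin', 7, 2), (3, 'Sarah', 10, 1), (4, 'Tracy', 12, 3);
    """
)

# Each correlated subquery returns the oldest child of the current family row,
# or NULL when the family has no children.
conn.execute(
    """
    UPDATE family
    SET oldest = (SELECT name FROM children c
                  WHERE c.family_fk = family.id
                  ORDER BY age DESC, c.id LIMIT 1),
        oldest_age = (SELECT age FROM children c
                      WHERE c.family_fk = family.id
                      ORDER BY age DESC, c.id LIMIT 1)
    """
)

result = conn.execute(
    "SELECT surname, oldest, oldest_age FROM family ORDER BY id"
).fetchall()
print(result)
# [('Byre', 'Sarah', 10), ('Summers', 'Martin', 7), ('White', 'Tracy', 12), ('Anders', None, None)]
```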