How do I get counts of multiple records from a single table using a DB2 query?
Suppose I want the count of one record. I am using:
select count(*) from schema.table where record = 'x'
What I need is a count of multiple records from the same table in separate rows for each record. I am trying something like:
select count(*) from schema.table where record in('x','y','z')
The result combines the counts into one single value in a single row, which I don't want.
I mostly agree with Mureinik. You can add a WHERE clause to get separate row counts for only those records you want, e.g. (x, y, z):
SELECT record, COUNT(*) AS "count"
FROM schema.table WHERE record IN ('x', 'y', 'z')
GROUP BY record
result:
------------------
| record | count |
------------------
| x | 100 |
| y | 150 |
| z | 50 |
------------------
The group by syntax breaks the table up into groups, and allows you to perform aggregate functions (count, in your case) on each one separately:
SELECT record, COUNT(*)
FROM schema.table
GROUP BY record
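As a quick check of the GROUP BY behaviour, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. SQLite stands in for DB2 here, and the table name t and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (record TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO t (record) VALUES (?)",
    [("x",), ("x",), ("y",), ("z",), ("z",), ("z",), ("w",)],
)

# One row per requested record value, each with its own count.
counts = conn.execute(
    """
    SELECT record, COUNT(*)
    FROM t
    WHERE record IN ('x', 'y', 'z')
    GROUP BY record
    ORDER BY record
    """
).fetchall()

print(counts)  # [('x', 2), ('y', 1), ('z', 3)]
```

The row for 'w' is filtered out by the WHERE clause before grouping, so only the three requested values appear.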
I have the following from another query
name | count
a | 1000
b | 100
c | 100
d | 100
x | 100
y | 100
z | 100
I need to create the final results where "a" is left unchanged; names matching 'b', 'c', 'd' are grouped as "group_B" with its new value as the sum of the three rows; all other names are grouped as "others" with its new value as the sum of all other names. Thank you!
I would recommend changing the first query to append another column. If your original query is something like this:
SELECT things
FROM table
Change it to this:
SELECT
things,
CASE
WHEN name = 'a' THEN 'a'
WHEN name IN ('b','c','d') THEN 'group_B'
ELSE 'others'
END AS name_group
FROM table
It should then be pretty easy to group your results how you like.
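A minimal end-to-end sketch of that CASE-based grouping, run against in-memory SQLite from Python. The table name name_counts, the column cnt and the label column name_group are assumptions made for the example; here 'a' keeps its own label so it shows up unchanged in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the output of the first query.
conn.execute("CREATE TABLE name_counts (name TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO name_counts (name, cnt) VALUES (?, ?)",
    [("a", 1000), ("b", 100), ("c", 100), ("d", 100),
     ("x", 100), ("y", 100), ("z", 100)],
)

# Label each row with its group, then sum the counts per label.
grouped = conn.execute(
    """
    SELECT
        CASE
            WHEN name = 'a' THEN 'a'
            WHEN name IN ('b', 'c', 'd') THEN 'group_B'
            ELSE 'others'
        END AS name_group,
        SUM(cnt) AS total
    FROM name_counts
    GROUP BY name_group
    ORDER BY name_group
    """
).fetchall()

print(grouped)  # [('a', 1000), ('group_B', 300), ('others', 300)]
```

SQLite accepts the column alias in GROUP BY; on databases that do not, repeat the CASE expression in the GROUP BY clause or wrap the labelled query in a subquery.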
I am trying to get 3% of total membership, which the code below does, but the result has two rows: one has the percentage and the other is "0". I'm not sure why, or how to get rid of it.
select
sum(Diabetes_FLAG) * 100 / (select round(count(medicaid_no) * 0.03) as percent
from membership) AS PERCENT_OF_Dia
from
prefinal
group by
Diabetes_Flag
I'm not sure why it brought back a second row. I only need the percentage, not the second row.
I'm not sure what I am doing wrong.
Output:
PERCENT_OF_DIA
1 11.1111111111111
2 0
SELECT sum(Diabetes_FLAG)*100 / (SELECT round(count(medicaid_no)*0.03) as percentt
FROM membership) AS PERCENT_OF_Dia
FROM prefinal
WHERE Diabetes_FLAG = 1
-- GROUP BY Diabetes_Flag   (not needed: the WHERE clause already limits the rows to a single flag value)
Remove the group by if you want one row:
select sum(Diabetes_FLAG)*100/( SELECT round(count(medicaid_no)*0.03) as percentt
from membership) AS PERCENT_OF_Dia
from prefinal;
When you include group by Diabetes_FLAG, it creates a separate row for each value of Diabetes_FLAG. Based on your results, I'm guessing that it takes on the values 0 and 1.
Not sure why it brought back a second row
This is how a GROUP BY query works. The GROUP BY clause groups data by a given column: it collects all values of this column, builds the distinct set of these values, and returns one row for each distinct value.
Please consider this simple demo: http://sqlfiddle.com/#!9/3a38df/1
SELECT * FROM prefinal;
| Diabetes_Flag |
|---------------|
| 1 |
| 1 |
| 5 |
Usually the GROUP BY column is listed in the SELECT clause too, like this:
SELECT Diabetes_Flag, sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| Diabetes_Flag | sum(Diabetes_Flag) |
|---------------|--------------------|
| 1 | 2 |
| 5 | 5 |
As you can see, GROUP BY returns two rows, one for each unique value of the Diabetes_Flag column.
If you remove the Diabetes_Flag column from the SELECT clause, you get the same result as above, but without this column:
SELECT sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| sum(Diabetes_Flag) |
|--------------------|
| 2 |
| 5 |
So the reason you get 2 rows is that Diabetes_Flag has 2 distinct values in the table.
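The difference can be reproduced with a short sketch in Python against in-memory SQLite, which is used here only as a convenient stand-in for the asker's database. It loads the same three demo values and runs the aggregate with and without GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefinal (Diabetes_Flag INTEGER)")
conn.executemany(
    "INSERT INTO prefinal (Diabetes_Flag) VALUES (?)",
    [(1,), (1,), (5,)],  # same values as the demo table above
)

# With GROUP BY: one output row per distinct Diabetes_Flag value.
with_group = conn.execute(
    "SELECT SUM(Diabetes_Flag) FROM prefinal "
    "GROUP BY Diabetes_Flag ORDER BY Diabetes_Flag"
).fetchall()

# Without GROUP BY: the whole table is a single group, so one row.
without_group = conn.execute(
    "SELECT SUM(Diabetes_Flag) FROM prefinal"
).fetchall()

print(with_group)     # [(2,), (5,)]
print(without_group)  # [(7,)]
```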
I need to implement a query (or maybe a stored procedure) that will perform soft de-duplication of data in one of my tables. If any two records are similar enough, I need to "squash" them: deactivate one and update another.
The similarity is based on a score. Score is calculated the following way:
from both records, take values of column A,
values equal? add A1 to the score,
values not equal? subtract A2 from the score,
move on to the next column.
As soon as all desired value pairs are checked:
is the resulting score more than X?
yes – records are duplicates: mark the older record as "duplicate" and append its id to the duplicate_ids column of the newer record.
no – do nothing.
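The scoring rule above can be sketched in a few lines of Python to pin down the intended semantics before porting it to SQL. The column names, the per-column values standing in for A1 and A2, and the threshold standing in for X are all made up for illustration.

```python
def similarity_score(rec_a, rec_b, weights):
    """Sum per-column bonuses/penalties for two records.

    weights maps column name -> (bonus_if_equal, penalty_if_different).
    """
    score = 0
    for column, (bonus, penalty) in weights.items():
        if rec_a[column] == rec_b[column]:
            score += bonus
        else:
            score -= penalty
    return score


# Hypothetical columns, weights and threshold.
weights = {"first_name": (2, 1), "last_name": (3, 2), "email": (5, 1)}
threshold = 5

older = {"id": 1, "first_name": "Ann", "last_name": "Lee", "email": "ann@example.com"}
newer = {"id": 2, "first_name": "Anne", "last_name": "Lee", "email": "ann@example.com"}

score = similarity_score(older, newer, weights)  # -1 + 3 + 5 = 7
is_duplicate = score > threshold

print(score, is_duplicate)  # 7 True
```

In SQL the same rule would typically become a self-join with one CASE expression per column, summed into a score and compared against the threshold.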
How would I approach solving this task in SQL?
The table in question is called people. People records are entered by different admins. The de-duplication process exists to make sure the same person does not exist twice in the system.
The motivation for the task is simple: performance.
Right now the solution is implemented in a scripting language via several sub-par SQL queries with logic on top of them. However, the volume of data is expected to grow to tens of millions of records, and the script will eventually become very slow (it should run via cron every night).
I'm using postgresql.
It appears that de-duplication is generally a tough problem.
I found this: https://github.com/dedupeio/dedupe. There's a good description of how this works: https://dedupe.io/documentation/how-it-works.html.
I'm going to explore dedupe. I'm not going to try to implement it in SQL.
If I get you correctly, this could help.
You can use PostgreSQL Window Functions to get all the duplicates and use "weights" to determine which records are duplicated so you can do whatever you like with them.
Here is an example:
-- Temporary table for the test; the primary key is id and
-- we have A, B, C columns with a creation date:
CREATE TEMP TABLE test
(id serial, "colA" text, "colB" text, "colC" text,creation_date date);
-- Insert test data:
INSERT INTO test ("colA", "colB", "colC",creation_date) VALUES
('A','B','C','2017-05-01'),('D','E','F','2017-06-01'),('A','B','D','2017-08-01'),
('A','B','R','2017-09-01'),('C','J','K','2017-09-01'),('A','C','J','2017-10-01'),
('C','W','K','2017-10-01'),('R','T','Y','2017-11-01');
-- SELECT * FROM test
-- id | colA | colB | colC | creation_date
-- ----+-------+-------+-------+---------------
-- 1 | A | B | C | 2017-05-01
-- 2 | D | E | F | 2017-06-01
-- 3 | A | B | D | 2017-08-01 <-- Duplicate A,B
-- 4 | A | B | R | 2017-09-01 <-- Duplicate A,B
-- 5 | C | J | K | 2017-09-01
-- 6 | A | C | J | 2017-10-01
-- 7 | C | W | K | 2017-10-01 <-- Duplicate C,K
-- 8 | R | T | Y | 2017-11-01
-- Here is the query you can use to get the ids of the duplicate records
-- (read the comments from the innermost subquery outward):
-- Third, select the ids of the duplicates
SELECT id
FROM
(
-- Second, select all the columns needed and weight the duplicates.
-- You don't need to select every column, if only the id is needed
-- then you can only select the id
-- Query this SQL to see results:
SELECT
id,"colA", "colB", "colC",creation_date,
-- The weights are simple: if the row number is more than 1 then assign 1,
-- if the row number is 1 then assign 0; sum them all and you have a
-- total weight of 'duplicity'.
CASE WHEN "num_colA">1 THEN 1 ELSE 0 END +
CASE WHEN "num_colB">1 THEN 1 ELSE 0 END +
CASE WHEN "num_colC">1 THEN 1 ELSE 0 END as weight
FROM
(
-- First, select using window functions and assign a row number.
-- You can run this query separately to see results
SELECT *,
-- NOTE that it is order by id, if needed you can order by creation_date instead
row_number() OVER(PARTITION BY "colA" ORDER BY id) as "num_colA",
row_number() OVER(PARTITION BY "colB" ORDER BY id) as "num_colB",
row_number() OVER(PARTITION BY "colC" ORDER BY id) as "num_colC"
FROM test ORDER BY id
) count_column_duplicates
) duplicates
-- HERE YOU DEFINE WHICH WEIGHT TO SELECT; for the test,
-- it selects the rows whose weight is more than 1
WHERE weight>1
-- The full SQL returns all the duplicates according to the selected weight:
-- id
-- ----
-- 3
-- 4
-- 7
You can add this query to a stored procedure so you can run it whenever you like. Hope it helps.
I have the table below
Select X,Y from T
X | Y
------
1 | 2
1 | 3
2 | 1
3 | 5
3 | 1
Columns X and Y hold strings; I used numbers just as an example.
I need output from this table as below
1,2
1,3
3,5
i.e. the unique sets from the table. Out of row 1 (1,2) and row 3 (2,1), I need only one, because (1,2)=(2,1) as sets. Similarly, (1,3)=(3,1).
So unique sets in this table are (1,2) (1,3) and (3,5).
I tried the SQL below. Let me know if there is a better way, as I am not sure whether I can use '>' or '<' with ROWID.
SELECT X||','||Y FROM T t1
WHERE NOT EXISTS (SELECT 1 FROM T t2
WHERE t1.X=t2.Y AND t1.Y=t2.X and t1.ROWID>t2.ROWID)
select distinct least(x,y), greatest(x,y)
from the_table;
least() and greatest() put the values into a fixed order, so that 1,2 and 2,1 are both returned as 1,2. The distinct then removes the duplicates.
DISTINCT gets you distinct rows, so all you need to do is to have your pairs ordered, first the smaller then the larger. You do this with LEAST and GREATEST.
select distinct least(x,y) || ',' || greatest(x,y)
from t;
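Here is a small runnable sketch of the same idea in Python with in-memory SQLite. SQLite has no LEAST/GREATEST, so its two-argument scalar min() and max() stand in for them here; the sample rows are the ones from the question, stored as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, y TEXT)")
conn.executemany(
    "INSERT INTO t (x, y) VALUES (?, ?)",
    [("1", "2"), ("1", "3"), ("2", "1"), ("3", "5"), ("3", "1")],
)

# Normalise each pair to (smaller, larger), then let DISTINCT drop repeats.
# min(a, b) / max(a, b) with two arguments are scalar functions in SQLite,
# used here in place of LEAST / GREATEST.
pairs = conn.execute(
    """
    SELECT DISTINCT min(x, y) AS lo, max(x, y) AS hi
    FROM t
    ORDER BY lo, hi
    """
).fetchall()

print(pairs)  # [('1', '2'), ('1', '3'), ('3', '5')]
```

Because the columns hold strings, the ordering is a string comparison; that is fine here, since any consistent ordering makes (a,b) and (b,a) collapse to the same pair.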
select id from table;
+------+
| id |
+------+
| 774 |
| 2775 |
+------+
This returns 2 rows.
select count(id) as count, id from table;
+-------+-----+
| count | id |
+-------+-----+
| 2 | 774 |
+-------+-----+
But this returns only 1 row.
How can I return all rows, with the total count in each record?
SQL ???
+-------+------+
| count | id |
+-------+------+
| 2 | 774 |
| 2 | 2775 |
+-------+------+
SELECT id, (select count(*) from table) AS TotalRows
FROM table;
Although this seems unnecessary, as the total count will not change per row.
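A minimal sketch of the scalar-subquery approach, run from Python against in-memory SQLite. The table is named items here because table is a reserved word; that name is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")  # hypothetical table name
conn.executemany("INSERT INTO items (id) VALUES (?)", [(774,), (2775,)])

# The scalar subquery is evaluated over the whole table, so every row
# carries the same total alongside its own id.
rows = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM items) AS total_rows, id
    FROM items
    ORDER BY id
    """
).fetchall()

print(rows)  # [(2, 774), (2, 2775)]
```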
Use a group by
select id, count(id)
from table
group by id;
(BTW, the SQL in your question does not work, at least in Oracle and, AFAIK, in MySQL.)
I'm not sure what you're trying to do, but if you want to fetch the rows and get the total count in the same query because the query is resource-intensive and you don't want to repeat your joins/conditions/whatever in two queries, under MySQL you can do:
# Returns a regular results set
SELECT SQL_CALC_FOUND_ROWS foo, bar FROM baz WHERE qux = 'corge' LIMIT 2;
# Returns the total count of found rows (without the LIMIT)
SELECT FOUND_ROWS();
If you want the total number of rows after the LIMIT, or don't have a LIMIT at all, you can skip the SQL_CALC_FOUND_ROWS.
However, generally speaking, counting the total number of rows doesn't scale very well. If you can, find an alternative that doesn't require it. For example, if it's for paging, consider showing only 'next' / 'prev' buttons, without displaying the total number of pages. If you have 30 rows per page, you can use LIMIT 31 instead of 30, display only the first 30 rows, and check whether the 31st row exists to know if a 'next' button should be displayed.
If you are using an Oracle database, you can also use the COUNT analytic function to achieve this, as follows:
SELECT COUNT(*) OVER (PARTITION BY 1) AS COUNT, ID FROM TABLE