Count items and create an array - sql

I have two columns, A and B, and would like to get a list of items (and their counts) in column B grouped by the items in column A, and create a new table with that information. The new table would look something like:
newCol1 | newCol2
--------+--------
a1, | b1:3,b4:1,b7:11
a2, | b2:1,b3:5,b4:3,b8:2
...and so forth. (delimiters can be anything, though. If concatenating item and count is not possible, I could also have one column with a list of items and another column with a list of counts separated by a delimiter.)
I can do this in Java by first getting all the items and storing them in a map with count updates, and then update the new table, but I was wondering if there's any way to do this in PostgreSQL (perhaps by writing a function).
I've looked at the array functions in PostgreSQL but didn't get far. Any pointers, as well as suggestions for storing such data, would be appreciated.

a and b are of type text, I assume.
SELECT a, array_agg(bs) AS b_list
FROM (
SELECT a, b || ':' || count(*) AS bs -- coerced to text automatically
FROM tbl
GROUP BY a, b
ORDER BY a, b -- to sort b_list in the result
) x
GROUP BY a;
Or use string_agg() as @a_horse demonstrates to get a string instead of an array as the result.
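Since the question also asks for a new table holding the result, the aggregate query can feed CREATE TABLE ... AS directly. A minimal sketch for PostgreSQL, assuming the source table is called tbl as above; the target name tbl_summary and the sample rows are made up for illustration:

```sql
-- Hypothetical sample data; tbl and its text columns a, b follow the answer above.
CREATE TEMP TABLE tbl (a text, b text);
INSERT INTO tbl VALUES
   ('a1', 'b1'), ('a1', 'b1'), ('a1', 'b4'),
   ('a2', 'b2'), ('a2', 'b3'), ('a2', 'b3');

-- tbl_summary is an assumed name for the new table.
CREATE TABLE tbl_summary AS
SELECT a AS newcol1,
       string_agg(b || ':' || ct, ',' ORDER BY b) AS newcol2  -- ct is coerced to text
FROM  (
   SELECT a, b, count(*) AS ct   -- one row per (a, b) pair with its count
   FROM   tbl
   GROUP  BY a, b
   ) x
GROUP  BY a;
```

With the sample rows, tbl_summary should hold a1 with b1:2,b4:1 and a2 with b2:1,b3:2. Replacing string_agg(...) with array_agg(b || ':' || ct ORDER BY b) would store an array instead of a delimited string.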

You didn't supply any table definition nor input data (that should yield your output) so this is just a shot in the dark:
select a, string_agg(b || ':' || b_count, ',')
from (
select a,
b,
count(*) as b_count -- one row per (a, b) pair
from the_unknown_table
group by a, b
) t
group by a

Related

Valid SQL causes Access error requiring expression in SELECT and GROUP? [duplicate]

I have this:
SELECT name, value,
MIN(value) as find_min
FROM history
WHERE date_num >= 1609459200
AND date_num <= 1640995200
AND name IN('A')
GROUP BY name
I'm trying to get the minimum value between two dates for each name separately:
name value
A 3
B 4
C 9
A 0
C 2
I keep getting this popular error:
column "history.value" must appear in the GROUP BY clause or be used in an aggregate function
I read "must appear in the GROUP BY clause or be used in an aggregate function"
and I still do not understand:
Why do I have to include everything in GROUP BY? What is the logic?
Why is this not working?
Is MIN() OVER (PARTITION BY name) better, and if so, how can I get only a single result per name?
EDIT:
If I try GROUP BY name, find_min, it fails as well, even though in this case it could produce a unique result (the values are all the same).
That is actually easy to understand.
When you say GROUP BY name, all rows where name is the same are grouped together to form a single result row. Now the original table could contain two rows with the same name, but different value. If you add value to the SELECT list, which of those should be output? On the other hand, determining min(value) for each group is no problem.
Even if there is only a single value for the whole group (like with your find_min), you have to add the column to GROUP BY.
There is actually one exception: if the primary key of a table is in the GROUP BY clause, other columns from that table need not be in GROUP BY, because this proves automatically that there can be no different values.
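The primary key exception can be sketched as follows. This assumes a PostgreSQL version that recognizes functional dependencies (9.1 or later) and adds a hypothetical id primary key that the table in the question does not show:

```sql
-- Hypothetical definition: id is the primary key, so every other column
-- of the table is functionally dependent on it.
CREATE TEMP TABLE history (
   id       serial PRIMARY KEY,
   name     text,
   value    integer,
   date_num bigint
);

-- Accepted: name and value need not be listed in GROUP BY, because
-- grouping by the primary key leaves exactly one row per group.
SELECT id, name, value, min(value) AS find_min
FROM   history
GROUP  BY id;

-- Rejected: name is not a key, so value is ambiguous within a group.
-- SELECT name, value, min(value) FROM history GROUP BY name;
```

Grouping by the primary key makes every group a single row, so this mostly matters when the query joins to other tables and aggregates their columns.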
Try it like below:
SELECT name,
MIN(value) as find_min
FROM history
WHERE date_num >= 1609459200 AND date_num <= 1640995200
GROUP BY name
I removed name IN ('A') because you want the minimum value for every name, and that condition restricts the result to A only.
To answer your question, GROUP BY groups similar data in a table.
For example this table:
A B C
a d 1
a k 2
b d 3
And you have the query:
SELECT A, B, MIN(C)
FROM t
GROUP BY A
This would not work: there is no decisive answer for what to do with the row a k 2. You group by column A only, not by column B, so the group for a contains two rows with different B values. Therefore you have to group by every column that is not wrapped in an aggregate such as MIN, MAX, or SUM.
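On the question's remaining point about MIN() OVER (PARTITION BY name): a window function keeps one row per input row, so it needs DISTINCT to yield a single result per name. A minimal sketch for PostgreSQL, using the sample rows from the question in a hypothetical table history_sample with made-up date_num values:

```sql
-- Sample rows from the question; the date_num values are invented.
CREATE TEMP TABLE history_sample (name text, value integer, date_num bigint);
INSERT INTO history_sample VALUES
   ('A', 3, 1609459200), ('B', 4, 1612137600), ('C', 9, 1614556800),
   ('A', 0, 1617235200), ('C', 2, 1619827200);

-- Aggregate form: one row per name.
SELECT name, min(value) AS find_min
FROM   history_sample
WHERE  date_num BETWEEN 1609459200 AND 1640995200
GROUP  BY name;

-- Window form: one row per input row, so DISTINCT collapses the repeats.
SELECT DISTINCT name, min(value) OVER (PARTITION BY name) AS find_min
FROM   history_sample
WHERE  date_num BETWEEN 1609459200 AND 1640995200;
```

Both queries should return A with 0, B with 4, and C with 2. The GROUP BY form says directly what is wanted; the window form is mainly useful when the other columns of each row are needed next to the minimum.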

Create table using multiple table

I'm trying to create a table from multiple tables.
Table A has a list of IDs, and I need the count of those IDs stored in table C.
Table B has a list of IDs too, and I need their count in table C as well.
I'm trying the query below but getting an error:
Create or replace tablec as
select
Count(id) as total
from table a,
select count(ref) as ref_total
from table b
My desired output should look like below, with a date filter applied.
Consider using a subquery for each count like below.
CREATE OR REPLACE TABLE TableC AS
SELECT (SELECT COUNT(id) FROM TableA) AS total,
(SELECT COUNT(ref) FROM TableB) AS ref_total
;
But the date range is optional. For the date range, I'm thinking of putting the whole data set in the new table so that I can run a select on table C afterwards.
I think you can add a date filter to the WHERE clause of each subquery for that purpose.
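A sketch of that suggestion, keeping the CREATE OR REPLACE TABLE form used above (not every database supports it). The question does not show a date column, so created_at is a hypothetical name and the date literals are placeholders:

```sql
-- created_at is an assumed column name; the dates are placeholders.
CREATE OR REPLACE TABLE TableC AS
SELECT (SELECT COUNT(id)
        FROM   TableA
        WHERE  created_at >= DATE '2023-01-01'
        AND    created_at <  DATE '2023-02-01') AS total,
       (SELECT COUNT(ref)
        FROM   TableB
        WHERE  created_at >= DATE '2023-01-01'
        AND    created_at <  DATE '2023-02-01') AS ref_total
;
```

Because TableC then holds only two numbers, the date range cannot be changed afterwards by selecting from it; each new range means rebuilding the table with different literals.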

How can I separate same column values to a variable based on value in another column?

Suppose I have the table below:
A B
1 one
2 two
1 three
2 four
1 last
For the rows where A = 1, I need the output one;three;last.
How can I query this in Oracle SQL?
If you care whether you get the string "one;three;last" or "three;one;last" or some other combination of the three values, you'd need some additional column to order the results by (a database table is inherently unordered). If there is an id column that you're not showing, for example, that could do that, you'd order by id in the listagg.
If you don't care what order the values appear in the result, you could do something like this
select listagg( b, ';' ) within group (order by a)
from your_table
where a = 1
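If the order does matter, the ordering column goes into the WITHIN GROUP clause. A minimal sketch for Oracle, assuming a hypothetical id column that records the intended order (the question does not show one):

```sql
-- Hypothetical definition; id is an assumed column that fixes the order.
create table your_table (id number, a number, b varchar2(20));
insert into your_table values (1, 1, 'one');
insert into your_table values (2, 2, 'two');
insert into your_table values (3, 1, 'three');
insert into your_table values (4, 2, 'four');
insert into your_table values (5, 1, 'last');

-- Concatenate b for a = 1 in id order.
select listagg( b, ';' ) within group (order by id) as b_list
from your_table
where a = 1;
```

With these rows the result should be one;three;last.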

postgres query to get first row based on multiple copies of some columns

Suppose I have a table -
A B C
1 3 5
1 3 7
1 3 9
2 4 3
2 4 6
2 4 1
Here there are multiple rows for the same combination of A and B. For each combination, I want the first entry back,
so the result I want for this table is:
A B C
1 3 5
2 4 3
How can I do this in postgres sql?
Assuming you can define "first" in terms of a sort on a, b, and c you want DISTINCT ON for this.
SELECT
DISTINCT ON ("A", "B")
"A", "B", "C"
FROM Table1
ORDER BY "A", "B", "C";
E.g. http://sqlfiddle.com/#!15/9ca16/1
See SELECT for more on DISTINCT ON.
If you have made the serious mistake of assuming SQL tables have an inherent order, you're going to need to fix your table before you proceed. You can use the PostgreSQL ctid pseudo-column to guide the creation of a primary key that matches the current on-disk table order. It should be safe to just:
ALTER TABLE mytable ADD COLUMN id SERIAL PRIMARY KEY;
as PostgreSQL will tend to write the key in table order. It's not guaranteed, but neither is anything else when there's no primary key. Then you can:
SELECT
DISTINCT ON ("A", "B")
"A", "B", "C"
FROM Table1
ORDER BY "A", "B", id; -- the DISTINCT ON columns must lead the ORDER BY
(Edit: I don't recommend using ctid in queries baked into applications. It's a handy tool for solving specific problems, but it's not really public API in PostgreSQL, and it's not part of the SQL standard. It's not like ROWID in Oracle, it changes due to vacuum etc. PostgreSQL is free to break/change/remove it in future versions.)
Well, you can sort of do this. SQL tables have no concept of ordering, so you really need a column to specify the order. The following returns an arbitrary row from each group:
select distinct on (a, b) a, b, c
from table1 t
order by a, b;
Normally, you would use something like:
select distinct on (a, b) a, b, c
from table1 t
order by a, b, id desc;
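DISTINCT ON is specific to PostgreSQL. The same "first row per group" result can be written with the row_number() window function, which also works in other databases. A minimal sketch, assuming a hypothetical id column that defines what "first" means:

```sql
-- Hypothetical table; id is an assumed column that records insertion order.
CREATE TEMP TABLE table1 (id serial PRIMARY KEY, a int, b int, c int);
INSERT INTO table1 (a, b, c) VALUES
   (1, 3, 5), (1, 3, 7), (1, 3, 9),
   (2, 4, 3), (2, 4, 6), (2, 4, 1);

SELECT a, b, c
FROM  (
   SELECT a, b, c,
          -- number the rows of each (a, b) group, lowest id first
          row_number() OVER (PARTITION BY a, b ORDER BY id) AS rn
   FROM   table1
   ) t
WHERE  rn = 1          -- keep only the first row of each group
ORDER  BY a, b;
```

With the sample rows this should return (1, 3, 5) and (2, 4, 3), matching the desired output in the question.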