Align text and organize bars in Grouped Bar plot - matplotlib

I am trying to plot a grouped bar chart with a line graph on top of it using seaborn.
So far I have managed to plot the graph. However, the bars are overlapping, and I also need some help aligning the text labels.
DataFrame:
df_long = pd.DataFrame({
'group': [ 9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19, 20, 21, 22, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22],
'status': ['1st hour', '1st hour', '1st hour', '1st hour', '1st hour',
'1st hour', '1st hour', '1st hour', '1st hour', '1st hour',
'1st hour', '1st hour', '1st hour', '1st hour', '2nd hour',
'2nd hour', '2nd hour', '2nd hour', '2nd hour', '2nd hour',
'2nd hour', '2nd hour', '2nd hour', '2nd hour', '2nd hour',
'2nd hour', '2nd hour', '2nd hour', '3rd hour', '3rd hour',
'3rd hour', '3rd hour', '3rd hour', '3rd hour', '3rd hour',
'3rd hour', '3rd hour', '3rd hour', '3rd hour', '3rd hour',
'3rd hour', '3rd hour', '4th hour', '4th hour', '4th hour',
'4th hour', '4th hour', '4th hour', '4th hour', '4th hour',
'4th hour', '4th hour', '4th hour', '4th hour', '4th hour',
'4th hour', '5th hour', '5th hour', '5th hour', '5th hour',
'5th hour', '5th hour', '5th hour', '5th hour', '5th hour',
'5th hour', '5th hour', '5th hour', '5th hour', '5th hour',
'6th hour', '6th hour', '6th hour', '6th hour', '6th hour',
'6th hour', '6th hour', '6th hour', '6th hour', '6th hour',
'6th hour', '6th hour', '6th hour', '6th hour'],
"value":
[44.88, 45.56, 46.67, 47.37, 47.74, 49.1 , 50.68, 49.64, 50.97,
48.5 , 52.69, 54.38, 49.89, 58.66, 16.14, 17.22, 15.77, 16.69,
16.22, 16.41, 15.68, 16.21, 15.54, 15.55, 14.1 , 14.08, 16.44,
12.82, 6.45, 6.13, 6.12, 5.47, 5.89, 6.13, 5.92, 6.26,
6.08, 6.38, 7.88, 5.96, 5.38, 4.73, 4.14, 3.68, 3.76,
3.62, 3.69, 3.89, 3.64, 3.84, 3.73, 6.16, 3.62, 2.91,
3.27, 3.12, 3.35, 2.47, 3.25, 2.92, 3.47, 2.77, 2.51,
2.81, 3.65, 2.98, 2.18, 1.59, 2.18, 0.58, 2.6 , 2.06,
2.55, 2.57, 2.52, 2.33, 2.84, 2.6 , 2.15, 1.71, 0.93,
0.86, 1.24, 0.92]})
Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(figsize=(20, 8))
g = sns.barplot(data=df_long, x='group', y='value', hue='status', ax=ax)
for bar in g.patches:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/3, 0.7 * height, int(height),
            ha='center', va='center', color='white')
def change_width(ax, new_value):
    for patch in ax.patches:
        current_width = patch.get_width()
        diff = current_width - new_value
        # we change the bar width
        patch.set_width(new_value)
        # we recenter the bar
        patch.set_x(patch.get_x() + diff * .5)
change_width(ax, .30)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
Output:
I want to align the text and organise these bars for each group.
I also want to plot a line graph over these bars, taking the sum of the "value" column for each group.
Code:
ax2 = ax.twinx()
color = 'tab:red'
ax2.set_ylabel('AverageEfficiency', fontsize=16)
freq = df_long.groupby('group').agg({"value": "sum"})
ax2 = sns.lineplot(x=freq.index.values, y=freq.value.values)
ax2.tick_params(color=color)
But this didn't work well.

For the line plot use:
freq = df_long.groupby('group').agg({"value": "sum"}).reset_index(drop=True)
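A minimal sketch of how that could be wired into the code above (a sketch, not the original poster's final solution): the twinx axis mirrors the question's attempt, and the point of reset_index(drop=True) is that the line's x values become 0-13, which match the categorical positions seaborn assigns to the bars.
# group sums for the overlaid line
freq = df_long.groupby('group').agg({"value": "sum"}).reset_index(drop=True)

ax2 = ax.twinx()
ax2.set_ylabel('AverageEfficiency', fontsize=16)

# x = 0..13 lines up with the categorical bar positions on ax
sns.lineplot(x=freq.index.values, y=freq.value.values, color='tab:red', ax=ax2)
ax2.tick_params(axis='y', colors='tab:red')

plt.show()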

Related

How to accurately sum/aggregate a SQL Running Total?

I have the below scenario (CTE SQL example) where we have product sales data at this granularity:
date level
source (device, country)
fiscal period (year, week)
product information (group)
I have a Running Total using OVER (PARTITION BY ...), where "FYTD" = Fiscal Year To Date. It seems to work as expected, counting the running total by the various dimensions, but when I sum it in the final results the figure is inflated, because we are summing the FYTD values as of each day rather than only the value at the most recent level of granularity.
How can we return the accurate, true FYTD sum as of the most recent day in the results, with a solution that is scalable to a bigger results set with more fiscal years/weeks? I am testing this in Snowflake.
with rawdata as (
select * from
values
('2022-10-01', 2022, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-01', 2022, 1, 'Desktop', 'UK', 'Flip Flops', 1),
('2022-10-01', 2022, 1, 'Desktop', 'UK', 'Sunglasses', 5),
('2022-10-01', 2022, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-01', 2022, 1, 'Tablet', 'UK', 'Shoes', 1),
('2022-10-02', 2022, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-02', 2022, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-02', 2022, 1, 'Tablet', 'UK', 'Shoes', 4),
('2022-10-03', 2022, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-03', 2022, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-03', 2022, 1, 'Tablet', 'UK', 'Shoes', 5),
('2022-10-01', 2022, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-01', 2022, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-01', 2022, 1, 'Tablet', 'UK', 'Socks', 1),
('2022-10-02', 2022, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-02', 2022, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-02', 2022, 1, 'Tablet', 'UK', 'Socks', 4),
('2022-10-03', 2022, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-03', 2022, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-03', 2022, 1, 'Tablet', 'UK', 'Socks', 5),
('2022-10-08', 2022, 2, 'Desktop', 'UK', 'Shoes', 7),
('2022-10-08', 2022, 2, 'Mobile', 'UK', 'Shoes', 8),
('2022-10-08', 2022, 2, 'Tablet', 'UK', 'Shoes', 4),
('2022-10-09', 2022, 2, 'Desktop', 'UK', 'Shoes', 6),
('2022-10-09', 2022, 2, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-09', 2022, 2, 'Tablet', 'UK', 'Shoes', 8),
('2022-10-10', 2022, 2, 'Desktop', 'UK', 'Shoes', 12),
('2022-10-10', 2022, 2, 'Mobile', 'UK', 'Shoes', 22),
('2022-10-10', 2022, 2, 'Tablet', 'UK', 'Shoes', 5),
('2022-10-08', 2022, 2, 'Desktop', 'UK', 'Socks', 4),
('2022-10-08', 2022, 2, 'Mobile', 'UK', 'Socks', 1),
('2022-10-08', 2022, 2, 'Tablet', 'UK', 'Socks', 2),
('2022-10-09', 2022, 2, 'Desktop', 'UK', 'Socks', 3),
('2022-10-09', 2022, 2, 'Mobile', 'UK', 'Socks', 8),
('2022-10-09', 2022, 2, 'Tablet', 'UK', 'Socks', 9),
('2022-10-10', 2022, 2, 'Desktop', 'UK', 'Socks', 5),
('2022-10-10', 2022, 2, 'Mobile', 'UK', 'Socks', 4),
('2022-10-10', 2022, 2, 'Tablet', 'UK', 'Socks', 13),
('2022-10-01', 2023, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-01', 2023, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-01', 2023, 1, 'Tablet', 'UK', 'Shoes', 1),
('2022-10-02', 2023, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-02', 2023, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-02', 2023, 1, 'Tablet', 'UK', 'Shoes', 4),
('2022-10-03', 2023, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-03', 2023, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-03', 2023, 1, 'Tablet', 'UK', 'Shoes', 5),
('2022-10-01', 2023, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-01', 2023, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-01', 2023, 1, 'Tablet', 'UK', 'Socks', 1),
('2022-10-02', 2023, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-02', 2023, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-02', 2023, 1, 'Tablet', 'UK', 'Socks', 4),
('2022-10-03', 2023, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-03', 2023, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-03', 2023, 1, 'Tablet', 'UK', 'Socks', 5),
('2022-10-08', 2023, 2, 'Desktop', 'UK', 'Shoes', 7),
('2022-10-08', 2023, 2, 'Mobile', 'UK', 'Shoes', 8),
('2022-10-08', 2023, 2, 'Tablet', 'UK', 'Shoes', 4),
('2022-10-09', 2023, 2, 'Desktop', 'UK', 'Shoes', 6),
('2022-10-09', 2023, 2, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-09', 2023, 2, 'Tablet', 'UK', 'Shoes', 8),
('2022-10-10', 2023, 2, 'Desktop', 'UK', 'Shoes', 12),
('2022-10-10', 2023, 2, 'Mobile', 'UK', 'Shoes', 22),
('2022-10-10', 2023, 2, 'Tablet', 'UK', 'Shoes', 5),
('2022-10-08', 2023, 2, 'Desktop', 'UK', 'Socks', 4),
('2022-10-08', 2023, 2, 'Mobile', 'UK', 'Socks', 1),
('2022-10-08', 2023, 2, 'Tablet', 'UK', 'Socks', 2),
('2022-10-09', 2023, 2, 'Desktop', 'UK', 'Socks', 3),
('2022-10-10', 2023, 2, 'Desktop', 'UK', 'Socks', 5),
('2022-10-10', 2023, 2, 'Mobile', 'UK', 'Socks', 4),
('2022-10-10', 2023, 2, 'Tablet', 'UK', 'Socks', 13)
as a (date, fiscalyearno, fiscalweekno, devicegroup, usercountry, productgroup, bookings)
),
resultsset as (
select date
, fiscalyearno
, fiscalweekno
, devicegroup
, usercountry
, productgroup
, sum(bookings) as totalbookings
, dense_rank() over (partition by fiscalyearno, devicegroup, usercountry, productgroup
                     order by date desc, fiscalweekno desc) as fytddr
, sum(totalbookings) over (partition by fiscalyearno, devicegroup, usercountry, productgroup
                           order by date, fiscalweekno asc) as fytdbookings
from rawdata
group by 1,2,3,4,5,6
)
//select * from resultsset
//order by 1,2,3,4,5,6
select fiscalyearno
, fiscalweekno
, sum(totalbookings) as totalbookings
, sum(iff(fytddr = 1, fytdbookings, 0)) as fytdbookings
from resultsset
group by 1,2
order by 2
As you can see below, the dense_rank approach works as long as the dimensions are consistent in each time period, with values populated. Where it falls down is when a product exists in an earlier period (i.e. FW1) but not in the latest period (i.e. FW2). Below you can see that this splits the FYTD value into 6 and 161 for FW1 and FW2 respectively, whereas I need the full 167 in FW2, as that is the correct FYTD total as of FW2.
It's not overcounting. You're summing a running sum. If you have a running sum on 1, 2, 3, you'll get 1, 3, 6. If you have a sum of that running sum you'll get 10. I'm not sure why you'd want a running sum and then aggregate it. It wipes out the detail the running sum provides. Also, in order to get through the aggregation the SQL is feeding totalbookings (an alias for an aggregated sum) into the sum window function. That's interesting at best and unpredictable at worst.
You can see the sum of the running sum issue if you shortcut your CTE and look at the results of the window function:
with rawdata as (
select * from
values
('2022-10-01', 2023, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-01', 2023, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-01', 2023, 1, 'Tablet', 'UK', 'Shoes', 1),
('2022-10-02', 2023, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-02', 2023, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-02', 2023, 1, 'Tablet', 'UK', 'Shoes', 4),
('2022-10-03', 2023, 1, 'Desktop', 'UK', 'Shoes', 1),
('2022-10-03', 2023, 1, 'Mobile', 'UK', 'Shoes', 2),
('2022-10-03', 2023, 1, 'Tablet', 'UK', 'Shoes', 5),
('2022-10-01', 2023, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-01', 2023, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-01', 2023, 1, 'Tablet', 'UK', 'Socks', 1),
('2022-10-02', 2023, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-02', 2023, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-02', 2023, 1, 'Tablet', 'UK', 'Socks', 4),
('2022-10-03', 2023, 1, 'Desktop', 'UK', 'Socks', 1),
('2022-10-03', 2023, 1, 'Mobile', 'UK', 'Socks', 2),
('2022-10-03', 2023, 1, 'Tablet', 'UK', 'Socks', 5)
as a (date, fiscalyearno, fiscalweekno, devicegroup, usercountry, productgroup, bookings)
),
resultsset as (
select date
, fiscalyearno
, fiscalweekno
, devicegroup
, usercountry
, productgroup
-- , sum(bookings) as totalbookings
, sum(bookings) over (partition by fiscalyearno, fiscalweekno, devicegroup, usercountry, productgroup
                      order by date asc) as fytdbookings
from rawdata
-- group by 1,2,3,4,5,6
)
select * from resultsset;
DATE        FISCALYEARNO  FISCALWEEKNO  DEVICEGROUP  USERCOUNTRY  PRODUCTGROUP  FYTDBOOKINGS
2022-10-01  2023          1             Desktop      UK           Shoes         1
2022-10-01  2023          1             Mobile       UK           Shoes         2
2022-10-01  2023          1             Tablet       UK           Shoes         1
2022-10-02  2023          1             Desktop      UK           Shoes         2
2022-10-02  2023          1             Mobile       UK           Shoes         4
2022-10-02  2023          1             Tablet       UK           Shoes         5
2022-10-03  2023          1             Desktop      UK           Shoes         3
2022-10-03  2023          1             Mobile       UK           Shoes         6
2022-10-03  2023          1             Tablet       UK           Shoes         10
2022-10-01  2023          1             Desktop      UK           Socks         1
2022-10-01  2023          1             Mobile       UK           Socks         2
2022-10-01  2023          1             Tablet       UK           Socks         1
2022-10-02  2023          1             Desktop      UK           Socks         2
2022-10-02  2023          1             Mobile       UK           Socks         4
2022-10-02  2023          1             Tablet       UK           Socks         5
2022-10-03  2023          1             Desktop      UK           Socks         3
2022-10-03  2023          1             Mobile       UK           Socks         6
2022-10-03  2023          1             Tablet       UK           Socks         10
Notice the running sum is in some cases higher than any individual values, so that explains the higher total when summing the running sum.
As far as how to fix this, I'm not sure. It would help to have a desired output table because as previously mentioned, calculating a running sum only to aggregate it is something that loses the detail of that running sum.

How to plot a nested array using matplotlib or seaborn

I have an array from a DataFrame and I am struggling to create a multi-line graph to display the evolution of each individual's rating over time (2007-2009). I also need to use ScalarFormatter and FormatStrFormatter. Could someone please explain to me how to do this?
sample data:
columns[rank_date,rank,name,country,rating,games,birthYear,date,year]
array([['27-01-07', 6, 'Peter', 'HUN', 2749, 9, 1979,
Timestamp('2007-01-27 00:00:00'), 2007],
['27-01-07', 7, 'Levon', 'ARM', 2744, 13, 1982,
Timestamp('2007-01-27 00:00:00'), 2007],
['27-01-07', 8, 'Alexander', 'RUS', 2600, 15,
1977, Timestamp('2007-01-27 00:00:00'), 2007],
['27-01-07', 6, 'Peter', 'HUN', 2980, 9, 1979,
Timestamp('2008-01-27 00:00:00'), 2007],
['27-01-07', 7, 'Levon','ARM', 2880, 13, 1982,
Timestamp('2008-01-27 00:00:00'), 2007],
['27-01-07', 8, 'Alexander', 'RUS', 2620, 15,
1977, Timestamp('2008-01-27 00:00:00'), 2007],
['27-01-07', 6, 'Peter', 'HUN', 2900, 9, 1979,
Timestamp('2009-01-27 00:00:00'), 2007],
['27-01-07', 7, 'Levon','ARM', 2800, 13, 1982,
Timestamp('2009-01-27 00:00:00'), 2007],
['27-01-07', 8, 'Alexander', 'RUS', 2750, 15,
1977, Timestamp('2009-01-27 00:00:00'), 2007]], dtype=object)
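A minimal sketch of one way to approach this, assuming the array shown above is bound to a variable named arr (a name introduced here only for illustration) and is first wrapped in a DataFrame with the listed column names; the formatter calls are just one illustration of ScalarFormatter/FormatStrFormatter usage, not a prescribed solution:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, FormatStrFormatter

cols = ['rank_date', 'rank', 'name', 'country', 'rating',
        'games', 'birthYear', 'date', 'year']
df = pd.DataFrame(arr, columns=cols)                    # arr = the object array above
df['plot_year'] = pd.to_datetime(df['date']).dt.year    # 2007, 2008, 2009

fig, ax = plt.subplots()
for name, grp in df.groupby('name'):                    # one line per individual
    ax.plot(grp['plot_year'], grp['rating'].astype(int), marker='o', label=name)

# whole years on the x axis, plain (non-offset) rating numbers on the y axis
ax.xaxis.set_major_formatter(FormatStrFormatter('%d'))
sf = ScalarFormatter()
sf.set_useOffset(False)
ax.yaxis.set_major_formatter(sf)

ax.set_xticks([2007, 2008, 2009])
ax.set_xlabel('year')
ax.set_ylabel('rating')
ax.legend()
plt.show()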

MatPlotLib with custom dictionaries convert to graphs

Problem:
I have a list of ~108 dictionaries named list_of_dictionary and I would like to use Matplotlib to generate line graphs.
The dictionaries have the following format (this is one of 108):
{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}
Understanding the dictionary:
The car 2014 Land Rover Range Rover Sport was priced at:
59990 on datetime.datetime(2020, 1, 22, 11, 19, 26)
59890 on datetime.datetime(2020, 1, 23, 13, 12, 33)
60990 on datetime.datetime(2020, 1, 28, 12, 39, 24)
62990 on datetime.datetime(2020, 1, 29, 18, 39, 36)
59990 on datetime.datetime(2020, 1, 30, 18, 41, 31)
59690 on datetime.datetime(2020, 2, 1, 12, 39, 7)
Question:
With this structure how could one create mini-graphs with matplotlib (say 11 rows x 10 columns)?
Where each mini-graph will have:
the title of the graph from car
x-axis from the datetime
y-axis from the price
What I have tried:
df = pd.DataFrame(list_of_dictionary)
df = df.set_index('datetime')
print(df)
I don't know what to do thereafter...
Relevant Research:
Plotting a column containing lists using Pandas
Pandas column of lists, create a row for each list element
I've read these multiple times, but the more I read them, the more confused I get :(.
I don't know if it's sensible to try and plot that many plots on a figure. You'll have to make some choices to be able to fit all the axes decorations on the page (titles, axes labels, tick labels, etc...).
But the basic idea would be this:
import datetime
import matplotlib.pyplot as plt

car_data = [{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}]*108
fig, axs = plt.subplots(11, 10, figsize=(20, 22))  # adjust figsize as you please
for car, ax in zip(car_data, axs.flat):
    ax.plot(car["datetime"], car['price'], '-')
    ax.set_title(car['car'])
Ideally, all your axes could share the same x and y axes so you could have the labels only on the left-most and bottom-most axes. This is taken care of automatically if you add sharex=True and sharey=True to subplots():
fig, axs = plt.subplots(11,10, figsize=(20,22), sharex=True, sharey=True) # adjust figsize as you please
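A possible follow-up detail, not part of the original answer: with 110 axes and 108 dictionaries, the last two axes stay empty, and date tick labels usually need rotating to stay readable.
# hide the axes that received no data (110 slots, 108 cars)
for ax in axs.flat[len(car_data):]:
    ax.set_visible(False)

fig.autofmt_xdate()   # rotate and align the date tick labels
plt.tight_layout()
plt.show()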

How can I convert an index to datetime in pandas?

I have an index like this:
Index(['00:00:00', '00:15:00', '00:30:00', '00:45:00', '01:00:00', '01:15:00',
'01:30:00', '01:45:00', '02:00:00', '02:15:00', '02:30:00', '02:45:00'],
dtype='object', name='time')
and I need to convert it to datetime in %H:%M:%S format.
How can I change it?
I think you need:
idx = pd.Index(['00:00:00', '00:15:00', '00:30:00', '00:45:00', '01:00:00', '01:15:00',
'01:30:00', '01:45:00', '02:00:00', '02:15:00', '02:30:00', '02:45:00'],
dtype='object', name='time')
For a DatetimeIndex some date is needed; by default, today's date is added:
print (pd.to_datetime(idx))
DatetimeIndex(['2018-01-25 00:00:00', '2018-01-25 00:15:00',
'2018-01-25 00:30:00', '2018-01-25 00:45:00',
'2018-01-25 01:00:00', '2018-01-25 01:15:00',
'2018-01-25 01:30:00', '2018-01-25 01:45:00',
'2018-01-25 02:00:00', '2018-01-25 02:15:00',
'2018-01-25 02:30:00', '2018-01-25 02:45:00'],
dtype='datetime64[ns]', name='time', freq=None)
Or it is possible to add a custom date:
print (pd.to_datetime('2015-01-01 ' + idx))
DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 00:15:00',
'2015-01-01 00:30:00', '2015-01-01 00:45:00',
'2015-01-01 01:00:00', '2015-01-01 01:15:00',
'2015-01-01 01:30:00', '2015-01-01 01:45:00',
'2015-01-01 02:00:00', '2015-01-01 02:15:00',
'2015-01-01 02:30:00', '2015-01-01 02:45:00'],
dtype='datetime64[ns]', freq=None)
Another solution is to create a TimedeltaIndex:
print (pd.to_timedelta(idx))
TimedeltaIndex(['00:00:00', '00:15:00', '00:30:00', '00:45:00', '01:00:00',
'01:15:00', '01:30:00', '01:45:00', '02:00:00', '02:15:00',
'02:30:00', '02:45:00'],
dtype='timedelta64[ns]', name='time', freq=None)
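If the aim is to put the converted values back onto a DataFrame's index, a small illustrative addition (df here is a hypothetical DataFrame carrying the index above):
# full timestamps on the index (today's date is implied, as noted above)
df.index = pd.to_datetime(df.index)

# or, if only the %H:%M:%S text is wanted for display, format back to strings
# df.index = pd.to_datetime(df.index).strftime('%H:%M:%S')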

Selecting a JSON object of arrays from a PostgreSQL table

I have prepared a simple SQL Fiddle demonstrating my problem -
In a two-player game I store user chats in a table:
CREATE TABLE chat(
gid integer, /* game id */
uid integer, /* user id */
created timestamptz,
msg text
);
Here I fill the table with some simple test data:
INSERT INTO chat(gid, uid, created, msg) VALUES
(10, 1, NOW() + interval '1 min', 'msg 1'),
(10, 2, NOW() + interval '2 min', 'msg 2'),
(10, 1, NOW() + interval '3 min', 'msg 3'),
(10, 2, NOW() + interval '4 min', 'msg 4'),
(10, 1, NOW() + interval '5 min', 'msg 5'),
(10, 2, NOW() + interval '6 min', 'msg 6'),
(20, 3, NOW() + interval '7 min', 'msg 7'),
(20, 4, NOW() + interval '8 min', 'msg 8'),
(20, 4, NOW() + interval '9 min', 'msg 9');
And I can fetch the data by running the SELECT query:
SELECT ARRAY_TO_JSON(
COALESCE(ARRAY_AGG(ROW_TO_JSON(x)),
array[]::json[])) FROM (
SELECT
gid,
uid,
EXTRACT(EPOCH FROM created)::int AS created,
msg
FROM chat) x;
which returns me a JSON-array:
[{"gid":10,"uid":1,"created":1514813043,"msg":"msg 1"},
{"gid":10,"uid":2,"created":1514813103,"msg":"msg 2"},
{"gid":10,"uid":1,"created":1514813163,"msg":"msg 3"},
{"gid":10,"uid":2,"created":1514813223,"msg":"msg 4"},
{"gid":10,"uid":1,"created":1514813283,"msg":"msg 5"},
{"gid":10,"uid":2,"created":1514813343,"msg":"msg 6"},
{"gid":20,"uid":3,"created":1514813403,"msg":"msg 7"},
{"gid":20,"uid":4,"created":1514813463,"msg":"msg 8"},
{"gid":20,"uid":4,"created":1514813523,"msg":"msg 9"}]
This is close to what I need; however, I would like to use "gid" as the JSON object keys and the rest of the data as the values in that object:
{"10": [{"uid":1,"created":1514813043,"msg":"msg 1"},
{"uid":2,"created":1514813103,"msg":"msg 2"},
{"uid":1,"created":1514813163,"msg":"msg 3"},
{"uid":2,"created":1514813223,"msg":"msg 4"},
{"uid":1,"created":1514813283,"msg":"msg 5"},
{"uid":2,"created":1514813343,"msg":"msg 6"}],
"20": [{"uid":3,"created":1514813403,"msg":"msg 7"},
{"uid":4,"created":1514813463,"msg":"msg 8"},
{"uid":4,"created":1514813523,"msg":"msg 9"}]}
Is that doable using the PostgreSQL JSON functions?
I think you're looking for json_object_agg for that last step. Here is how I'd do it:
SELECT json_object_agg(
gid::text, array_to_json(ar)
)
FROM (
SELECT gid,
array_agg(
json_build_object(
'uid', uid,
'created', EXTRACT(EPOCH FROM created)::int,
'msg', msg)
) AS ar
FROM chat
GROUP BY gid
) x
;
I left off the coalesce because I don't think an empty array is possible. But it should be easy to put it back if your real query is something more complicated that could require it.