Selecting a JSON object of arrays from a PostgreSQL table

I have prepared a simple SQL Fiddle demonstrating my problem.
In a two-player game I store user chats in a table:
CREATE TABLE chat(
gid integer, /* game id */
uid integer, /* user id */
created timestamptz,
msg text
);
Here I fill the table with some simple test data:
INSERT INTO chat(gid, uid, created, msg) VALUES
(10, 1, NOW() + interval '1 min', 'msg 1'),
(10, 2, NOW() + interval '2 min', 'msg 2'),
(10, 1, NOW() + interval '3 min', 'msg 3'),
(10, 2, NOW() + interval '4 min', 'msg 4'),
(10, 1, NOW() + interval '5 min', 'msg 5'),
(10, 2, NOW() + interval '6 min', 'msg 6'),
(20, 3, NOW() + interval '7 min', 'msg 7'),
(20, 4, NOW() + interval '8 min', 'msg 8'),
(20, 4, NOW() + interval '9 min', 'msg 9');
And I can fetch the data by running this SELECT query:
SELECT ARRAY_TO_JSON(
         COALESCE(ARRAY_AGG(ROW_TO_JSON(x)),
                  array[]::json[]))
FROM (
  SELECT
    gid,
    uid,
    EXTRACT(EPOCH FROM created)::int AS created,
    msg
  FROM chat) x;
which returns a JSON array:
[{"gid":10,"uid":1,"created":1514813043,"msg":"msg 1"},
{"gid":10,"uid":2,"created":1514813103,"msg":"msg 2"},
{"gid":10,"uid":1,"created":1514813163,"msg":"msg 3"},
{"gid":10,"uid":2,"created":1514813223,"msg":"msg 4"},
{"gid":10,"uid":1,"created":1514813283,"msg":"msg 5"},
{"gid":10,"uid":2,"created":1514813343,"msg":"msg 6"},
{"gid":20,"uid":3,"created":1514813403,"msg":"msg 7"},
{"gid":20,"uid":4,"created":1514813463,"msg":"msg 8"},
{"gid":20,"uid":4,"created":1514813523,"msg":"msg 9"}]
This is close to what I need; however, I would like to use the "gid" values as the JSON object's properties (keys) and the remaining data as the values in that object:
{"10": [{"uid":1,"created":1514813043,"msg":"msg 1"},
{"uid":2,"created":1514813103,"msg":"msg 2"},
{"uid":1,"created":1514813163,"msg":"msg 3"},
{"uid":2,"created":1514813223,"msg":"msg 4"},
{"uid":1,"created":1514813283,"msg":"msg 5"},
{"uid":2,"created":1514813343,"msg":"msg 6"}],
"20": [{"uid":3,"created":1514813403,"msg":"msg 7"},
{"uid":4,"created":1514813463,"msg":"msg 8"},
{"uid":4,"created":1514813523,"msg":"msg 9"}]}
Is that doable using the PostgreSQL JSON functions?

I think you're looking for json_object_agg for that last step. Here is how I'd do it:
SELECT json_object_agg(
         gid::text, array_to_json(ar)
       )
FROM (
  SELECT gid,
         array_agg(
           json_build_object(
             'uid', uid,
             'created', EXTRACT(EPOCH FROM created)::int,
             'msg', msg)
         ) AS ar
  FROM chat
  GROUP BY gid
) x
;
I left off the coalesce because I don't think an empty array is possible. But it should be easy to put it back if your real query is something more complicated that could require it.
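For reference, a minimal sketch of where that COALESCE could go back in, assuming the case to guard against is a query that matches no rows at all (json_object_agg over zero rows returns NULL); the WHERE clause here is purely a hypothetical filter:
SELECT COALESCE(
         json_object_agg(gid::text, array_to_json(ar)),
         '{}'::json)        -- fall back to an empty JSON object when nothing matches
FROM (
  SELECT gid,
         array_agg(
           json_build_object(
             'uid', uid,
             'created', EXTRACT(EPOCH FROM created)::int,
             'msg', msg)
         ) AS ar
  FROM chat
  WHERE gid = 30            -- hypothetical filter that may match nothing
  GROUP BY gid
) x;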

Related

How Do I Fix This Data Frame Error for this dataset on Covid?

read.csv('covid_deaths.csv', header = TRUE)
df <-covid_deaths
covid_deaths$Age Group <- factor(covid_deaths$Age Group, levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), labels = c('All Ages', 'Under One Year', '0-17 years', '1-4 years', '5-14 years', '15-24 years', '18-29 years', '25-34 years', '30-39 years', '35-44 years', '40-49 years', '45-54 years', '50-64 years', '55-64 years', '65-74 years', '75-84 years', '85 years and older'))
#increasing the max.print limit to see more on console
options(max.print = 60)
#pulling up the age group and COVID_19 deaths
covid_deaths[, c('Age Group','COVID_19 Deaths')]
#making a vector out of the relevant columns
df2[,c( 9,10)]
vec1 <- covid_deaths$Age Group
vec2 <- covid_deaths$COVID_19 Deaths
summary(covid_deaths)
m <- covid_deaths(matrix(sample(100, 20, replace = TRUE), ncol = 9))
I am trying to get the descriptive statistics: mean, standard deviation, and variance. I keep getting an error that says:
Error in `$<-.data.frame`(`*tmp*`, `Age Group`, value = integer(0)) :
replacement has 0 rows, data has 107406

How to view hourly results during a specified time range in Postgres

I want to create a query that returns a list of time slots (day and hour) as a result when the following conditions are specified.
select day, hour
...
where
target_datetime between '2021-08-01 00:00:00' and '2021-08-01 23:59:59'
[
{ 'day': '2021-08-01', 'hour': 0 },
{ 'day': '2021-08-01', 'hour': 1 },
{ 'day': '2021-08-01', 'hour': 2 },
...
{ 'day': '2021-08-01', 'hour': 23 }
]
How can I get this?
Use generate_series to create the records from the time interval, and jsonb_build_object with jsonb_agg to create your JSON document:
SELECT
  jsonb_agg(
    jsonb_build_object(
      'day', tm::date,
      'hour', EXTRACT(HOUR FROM tm)))
FROM generate_series('2021-08-01 00:00:00'::timestamp,
                     '2021-08-01 23:59:59'::timestamp,
                     interval '1 hour') j (tm);
Demo: db<>fiddle
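If plain rows are wanted instead of a single JSON document (the question's select day, hour suggests that may be the case), the same generate_series call can be selected directly; a sketch with the same bounds:
SELECT tm::date                   AS day,
       EXTRACT(HOUR FROM tm)::int AS hour
FROM generate_series('2021-08-01 00:00:00'::timestamp,
                     '2021-08-01 23:59:59'::timestamp,
                     interval '1 hour') j (tm)
ORDER BY tm;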

How to convert SQL to a dataframe

I want to convert this SQL to a dataframe.
SELECT day,
MAX(id),
MAX(if(device = 'Mobile devices with full browsers', 'mobile', 'pc')),
AVG(replace(replace(search_imprshare, '< 10%', '10'), '%', '') / 100),
REPLACE(SUBSTRING(SUBSTRING_INDEX(add_trackingcode, '_', 1), CHAR_LENGTH(SUBSTRING_INDEX(add_trackingcode, '_', 1 - 1)) + 2), add_trackingcode, '')
FROM MY_TEST_TABLE
GROUP BY day
But I can only get as far as the code below. I don't know how to fill in the '???' parts.
df_data = df_data.groupby(['day']).agg(
    {
        'id': np.max,
        'device': ???,
        'percent': ???,
        'tracking': ???
    }
)
How should I do it?
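However the '???' aggregations end up being written on the pandas side, it may help to first give every SQL expression an explicit alias, so that each output column has a name matching a key in the agg dictionary; a sketch of the same query with assumed alias names:
SELECT day,
       MAX(id) AS id,
       MAX(if(device = 'Mobile devices with full browsers', 'mobile', 'pc')) AS device,
       AVG(replace(replace(search_imprshare, '< 10%', '10'), '%', '') / 100) AS percent,
       REPLACE(SUBSTRING(SUBSTRING_INDEX(add_trackingcode, '_', 1),
                         CHAR_LENGTH(SUBSTRING_INDEX(add_trackingcode, '_', 1 - 1)) + 2),
               add_trackingcode, '') AS tracking
FROM MY_TEST_TABLE
GROUP BY day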

Pure PostgreSQL replacement for PL/R sample() function?

Our new database does not (and will not) support PL/R usage, which we rely on extensively to implement a random weighted sample function:
CREATE OR REPLACE FUNCTION sample(
  ids bigint[],
  size integer,
  seed integer DEFAULT 1,
  with_replacement boolean DEFAULT false,
  probabilities numeric[] DEFAULT NULL::numeric[])
RETURNS bigint[]
LANGUAGE 'plr'
COST 100
VOLATILE
AS $BODY$
set.seed(seed)
ids = as.integer(ids)
if (length(ids) == 1) {
  s = rep(ids, size)
} else {
  s = sample(ids, size, with_replacement, probabilities)
}
return(s)
$BODY$;
Is there a purely SQL approach to this same function? This post shows an approach that selects a single random row, but does not have the functionality of sampling multiple groups at once.
As far as I know, SQL Fiddle does not support PL/R, so see below for a quick replication example:
CREATE TABLE test
(category text, uid integer, weight numeric)
;
INSERT INTO test
(category, uid, weight)
VALUES
('a', 1, 45),
('a', 2, 10),
('a', 3, 25),
('a', 4, 100),
('a', 5, 30),
('b', 6, 20),
('b', 7, 10),
('b', 8, 80),
('b', 9, 40),
('b', 10, 15),
('c', 11, 20),
('c', 12, 10),
('c', 13, 80),
('c', 14, 40),
('c', 15, 15)
;
SELECT category,
       unnest(diffusion_shared.sample(array_agg(uid ORDER BY uid),
                                      1,
                                      1,
                                      True,
                                      array_agg(weight ORDER BY uid))
       ) as uid
FROM test
WHERE category IN ('a', 'b')
GROUP BY category;
Which outputs:
category uid
'a' 4
'b' 8
Any ideas?
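Not a drop-in replacement for the PL/R function, but a minimal pure-SQL sketch of per-group weighted sampling without replacement, using the exponential-key trick (order each row by -ln(u)/weight and keep the first k rows per group); seeding and with_replacement are not handled here, though setseed() can make random() reproducible within a session:
SELECT category, uid
FROM (
  SELECT category,
         uid,
         row_number() OVER (
           PARTITION BY category
           ORDER BY -ln(1.0 - random()) / weight   -- larger weight => smaller key => picked earlier
         ) AS rn
  FROM test
  WHERE category IN ('a', 'b')
) t
WHERE rn <= 1;   -- sample size per category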

How to get previous value from a log

I have a situation where at some point in the past some records in a table were modified to have duplicated information.
Consider an example below:
create table #CustomerExample
(
CustomerRecordId int,
CustomerId int,
CustomerName varchar(255),
CurrentCustomerValue varchar(255)
);
create table #CustomerExampleLog
(
LogId int,
CustomerRecordId int,
CustomerId int,
LogCreateDate datetime,
NewCustomerValue varchar(255)
);
insert #CustomerExample
values
(1, 100, 'Customer 1', 'Value X'),
(2, 100, 'Customer 1', 'Value X'),
(3, 200, 'Customer 2', 'Value Z'),
(4, 200, 'Customer 2', 'Value Z'),
(5, 200, 'Customer 2', 'Value Z');
insert #CustomerExampleLog
values
(1, 1, 100, '1/1/2014', 'Value B'),
(2, 1, 100, '2/1/2014', 'Value C'),
(3, 1, 100, '3/1/2014', 'Value B'),
(4, 1, 100, '4/1/2014', 'Value X'),
(5, 1, 100, '5/1/2014', 'Value X'),
(6, 1, 100, '6/1/2014', 'Value X'),
(7, 2, 100, '1/1/2014', 'Value D'),
(8, 2, 100, '2/1/2014', 'Value E'),
(9, 2, 100, '3/1/2014', 'Value F'),
(10, 2, 100, '4/1/2014', 'Value G'),
(11, 2, 100, '5/1/2014', 'Value X'),
(12, 2, 100, '6/1/2014', 'Value X'),
(13, 3, 200, '1/2/2014', 'Value A'),
(14, 3, 200, '1/3/2014', 'Value A'),
(15, 3, 200, '1/4/2014', 'Value B'),
(16, 3, 200, '1/5/2014', 'Value Z'),
(17, 4, 200, '1/2/2014', 'Value A'),
(18, 4, 200, '1/3/2014', 'Value A'),
(19, 4, 200, '1/4/2014', 'Value Z');
Originally "Customer 1" and "Customer 2" had different values in CustomerValue column for each record in [#CustomerExample] table. However, due to lack of a proper unique constraint, a bunch of "bad" UPDATE statements resulted in duplicated info. The updates were logged to [#CustomerExampleLog] table, which contains only the ID of the updated record, the update date, and the new value. My goal is to re-trace the log entries and revert one of the duplicates to it's "last known good" value before it became a dupe.
Ideally, I want to revert the CurrentCustomerValue for one of the dupes to a previous value. In the above example it would be the LogId=3 for CustomerRecordId=1, and LogId=15 for CustomerRecordId=3.
I am completely stumped.
Do you want something like this?
SELECT *
     , prev_value = (
         SELECT TOP 1 NewCustomerValue
         FROM #CustomerExampleLog l
         WHERE c.CustomerRecordId = l.CustomerRecordId
           AND l.NewCustomerValue <> c.CurrentCustomerValue
         ORDER BY LogCreateDate DESC
       )
FROM #CustomerExample c
If you are looking to do it selectively (one record at a time), this would update the value, with @custid standing in for the CustomerRecordId you want to revert.
UPDATE c
SET    c.CurrentCustomerValue = a.NewCustomerValue
FROM   #CustomerExample c
       INNER JOIN #CustomerExampleLog a
               ON a.CustomerRecordId = c.CustomerRecordId
WHERE  a.LogId IN (SELECT MAX(l.LogId)
                   FROM   #CustomerExampleLog l
                          INNER JOIN #CustomerExample c2
                                  ON l.CustomerRecordId = c2.CustomerRecordId
                                 AND l.NewCustomerValue <> c2.CurrentCustomerValue
                   WHERE  l.CustomerRecordId = @custid);
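Since this rewrites data in place, it may be worth wrapping a run in a transaction and checking the result before committing; a sketch, with @custid again standing in for the record chosen to revert:
BEGIN TRANSACTION;

DECLARE @custid int = 1;   -- hypothetical: the duplicate chosen for reverting

-- ... run the selective UPDATE from above here ...

SELECT *                   -- review the outcome for that record
FROM #CustomerExample
WHERE CustomerRecordId = @custid;

-- COMMIT;     -- keep the change
-- ROLLBACK;   -- or undo it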