How to build SQL to get data grouped by date

I have data in a Postgres database like this:

| id | name   | start_date          | end_date            |
|----|--------|---------------------|---------------------|
| 1  | Event1 | 2018-09-14 14:22:00 | 2018-09-15 14:22:00 |
| 2  | Event2 | 2018-09-15 14:22:00 | 2018-09-15 15:22:00 |

I need SQL that returns the response grouped by date. If an event's duration (start_date to end_date) spans two days, it should appear in both days' arrays, and everything should be ordered by date. So the response should look like this:
{
  "2018-09-14": [
    {
      "id": 1,
      "name": "Event1",
      "start_date": "2018-09-14 14:22:00",
      "end_date": "2018-09-15 14:22:00"
    }
  ],
  "2018-09-15": [
    {
      "id": 1,
      "name": "Event1",
      "start_date": "2018-09-14 14:22:00",
      "end_date": "2018-09-15 14:22:00"
    },
    {
      "id": 2,
      "name": "Event2",
      "start_date": "2018-09-15 14:22:00",
      "end_date": "2018-09-15 15:22:00"
    }
  ]
}
Could you help me with this SQL?

demo: db<>fiddle
SELECT
    jsonb_object_agg(dates, data_array)
FROM (
    SELECT
        dates,
        jsonb_agg(data) AS data_array
    FROM (
        SELECT DISTINCT
            unnest(ARRAY[start_date::date, end_date::date]) AS dates,
            row_to_json(events)::jsonb AS data
        FROM events
    ) s
    GROUP BY dates
) s
Convert each table row into a JSON object with row_to_json().
Aggregate both dates into one array with ARRAY[].
unnest() expands the data into one row per date.
The result so far:
| dates      | data |
|------------|------|
| 2018-09-14 | {"id": 1, "name": "Event1", "end_date": "2018-09-15 14:22:00", "start_date": "2018-09-14 14:22:00"} |
| 2018-09-15 | {"id": 1, "name": "Event1", "end_date": "2018-09-15 14:22:00", "start_date": "2018-09-14 14:22:00"} |
| 2018-09-15 | {"id": 2, "name": "Event2", "end_date": "2018-09-15 15:22:00", "start_date": "2018-09-15 14:22:00"} |
| 2018-09-15 | {"id": 2, "name": "Event2", "end_date": "2018-09-15 15:22:00", "start_date": "2018-09-15 14:22:00"} |
DISTINCT eliminates the duplicate rows that occur when start_date and end_date fall on the same day (as with Event2).
Grouping by the dates, the JSON elements are aggregated into a JSON array with jsonb_agg.
Finally, the whole table is aggregated into one JSON object with jsonb_object_agg, with key == date and value == JSON array.
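Two notes on ordering: jsonb sorts object keys internally (shorter keys first, then byte-wise), so the equal-length ISO date keys already come out chronologically. If you also want each day's array ordered, jsonb_agg accepts an ORDER BY clause. A minimal sketch of that variant (sorting by the start_date text value is an assumption about the desired order):

-- Sketch: same query, but each day's array is sorted by the events' start_date.
SELECT
    jsonb_object_agg(dates, data_array)
FROM (
    SELECT
        dates,
        jsonb_agg(data ORDER BY data->>'start_date') AS data_array
    FROM (
        SELECT DISTINCT
            unnest(ARRAY[start_date::date, end_date::date]) AS dates,
            row_to_json(events)::jsonb AS data
        FROM events
    ) s
    GROUP BY dates
) s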

If you just want the rows, you only need these steps:
Aggregate both dates into one array with ARRAY[].
unnest() expands the data into one row per date.
DISTINCT eliminates the duplicate rows that occur when start_date and end_date fall on the same day.
Query:
SELECT DISTINCT
    unnest(ARRAY[start_date::date, end_date::date]) AS dates,
    *
FROM events
Result:
| dates      | id | name   | start_date          | end_date            |
|------------|----|--------|---------------------|---------------------|
| 2018-09-14 | 1  | Event1 | 2018-09-14 14:22:00 | 2018-09-15 14:22:00 |
| 2018-09-15 | 1  | Event1 | 2018-09-14 14:22:00 | 2018-09-15 14:22:00 |
| 2018-09-15 | 2  | Event2 | 2018-09-15 14:22:00 | 2018-09-15 15:22:00 |
demo: db<>fiddle
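One caveat worth knowing: unnest(ARRAY[start_date::date, end_date::date]) yields only the first and the last day, so an event spanning three or more days would miss the days in between. A sketch (not from the original answer) using generate_series to cover every day of an event:

-- Sketch: one output row per event per calendar day,
-- including the days between start_date and end_date.
SELECT
    d::date AS dates,
    e.*
FROM events e
CROSS JOIN generate_series(e.start_date::date, e.end_date::date, interval '1 day') AS d
ORDER BY dates;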

Related

Laravel query: sum grouped by week, returning 0 for weeks not present in the dataset

Hi, I am trying to get the sum of quantity grouped by week of the current year.
Here is my query, which is working:
Sale::selectRaw('sum(qty) as y')
    ->selectRaw('WEEK(order_date, 1) as x')
    ->whereYear('order_date', Carbon::now()->format('Y'))
    ->groupBy('x')
    ->orderBy('x', 'ASC')
    ->get();
The response I get looks like this, where x is the week number and y is the sum value:
[
  {
    "y": "50",
    "x": 2
  },
  {
    "y": "4",
    "x": 14
  }
]
I want to get 0 values for the weeks that don't have any value for y.
My desired result should look like this:
[
  {
    "y": "0",
    "x": 1
  },
  {
    "y": "50",
    "x": 2
  },
  ...
  ...
  ...
  {
    "y": "4",
    "x": 14
  }
]
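No answer is included here, but a common approach is to generate the full list of week numbers and LEFT JOIN the aggregated sums onto it, defaulting missing weeks to 0. A rough sketch in plain MySQL 8+ (the sales table and qty column names are assumptions based on the Sale model above):

-- Sketch: weeks 1..current week, LEFT JOINed to the per-week sums,
-- so weeks with no sales come back with y = 0.
WITH RECURSIVE weeks (x) AS (
    SELECT 1
    UNION ALL
    SELECT x + 1 FROM weeks WHERE x < WEEK(CURDATE(), 1)
)
SELECT w.x, COALESCE(SUM(s.qty), 0) AS y
FROM weeks w
LEFT JOIN sales s
       ON WEEK(s.order_date, 1) = w.x
      AND YEAR(s.order_date) = YEAR(CURDATE())
GROUP BY w.x
ORDER BY w.x;

In Laravel this could be run through DB::select(), or the missing weeks could be filled in PHP after the existing query returns.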

How to select a field of a JSON object coming from the WHERE condition

I have this table
| id | name  | json |
|----|-------|------|
| 1  | alex  | {"type": "user", "items": [ {"name": "banana", "color": "yellow"}, {"name": "apple", "color": "red"} ] } |
| 2  | peter | {"type": "user", "items": [ {"name": "watermelon", "color": "green"}, {"name": "pepper", "color": "red"} ] } |
| 3  | john  | {"type": "user", "items": [ {"name": "tomato", "color": "red"} ] } |
| 4  | carl  | {"type": "user", "items": [ {"name": "orange", "color": "orange"}, {"name": "nut", "color": "brown"} ] } |
Important: each JSON object can have a different number of "items", but what I need is the product name of JUST the object that matched in the WHERE condition.
My desired output would be the first two columns plus just the name of the item WHERE the color is like %red%:
| id | name  | fruit  |
|----|-------|--------|
| 1  | alex  | apple  |
| 2  | peter | pepper |
| 3  | john  | tomato |
select id, name, ***** (this is what I don't know) FROM table
where JSON_EXTRACT(json, "$.items[*].color") like '%red%'
If you are running MySQL 8.0, I would recommend json_table():
select t.id, t.name, x.name as fruit
from mytable t
cross join json_table(
    t.json,
    '$.items[*]' columns (
        name varchar(50) path '$.name',
        color varchar(50) path '$.color'
    )
) x
where x.color = 'red'
This function is not implemented in MariaDB (it was only added in MariaDB 10.6). We can unnest manually with the help of a numbers table:
select
    t.id,
    t.name,
    json_unquote(json_extract(t.json, concat('$.items[', x.num, '].name'))) as fruit
from mytable t
inner join (
    select 0 as num union all select 1 union all select 2 ...
) x
    on x.num < json_length(t.json, '$.items')
where json_unquote(json_extract(t.json, concat('$.items[', x.num, '].color'))) = 'red'
You can use the JSON_EXTRACT() function along with a recursive common table expression to generate rows dynamically, such as:
WITH RECURSIVE cte AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM cte
    WHERE cte.n < (SELECT MAX(JSON_LENGTH(json, '$.items')) FROM t)
)
SELECT id, name,
       JSON_UNQUOTE(JSON_EXTRACT(json, CONCAT('$.items[', n - 1, '].name'))) AS fruit
FROM cte
JOIN t
WHERE JSON_EXTRACT(json, CONCAT('$.items[', n - 1, '].color')) = "red"
Demo

Combining separate temporal measurement series

I have a data set that combines two temporal measurement series with one row per measurement:
time: 1, measurement: a, value: 5
time: 2, measurement: b, value: false
time: 10, measurement: a, value: 2
time: 13, measurement: b, value: true
time: 20, measurement: a, value: 4
time: 24, measurement: b, value: true
time: 30, measurement: a, value: 6
time: 32, measurement: b, value: false
In a visualization using Vega-Lite, I'd like to combine the measurement series and encode measurements a and b in a single visualization, not by simply layering their representations on a temporal axis but by representing their values in a single encoding spec.
Either the measurement a values need to be interpolated and added as a new field on the rows of measurement b,
eg:
time: 2, measurement: b, value: false, interpolatedMeasurementA: 4.6667
or the other way around, which leaves the question of how to interpolate a boolean; maybe the closest value by time, or simpler: the last value,
eg:
time: 30, measurement: a, value: 6, lastValueMeasurementB: true
I suppose this could be done either query-side, in which case this question would be regarding the InfluxDB Flux query language,
or on the visualization side, in which case this would be regarding Vega-Lite.
There isn't any true linear interpolation scheme built into Vega-Lite (though the loess transform comes close), but you can achieve roughly what you wish with a window transform.
Here is an example (view in editor):
{
  "data": {
    "values": [
      {"time": 1, "measurement": "a", "value": 5},
      {"time": 2, "measurement": "b", "value": false},
      {"time": 10, "measurement": "a", "value": 2},
      {"time": 13, "measurement": "b", "value": true},
      {"time": 20, "measurement": "a", "value": 4},
      {"time": 24, "measurement": "b", "value": true},
      {"time": 30, "measurement": "a", "value": 6},
      {"time": 32, "measurement": "b", "value": false}
    ]
  },
  "transform": [
    {
      "calculate": "datum.measurement == 'a' ? datum.value : null",
      "as": "measurement_a"
    },
    {
      "window": [
        {"op": "mean", "field": "measurement_a", "as": "interpolated"}
      ],
      "sort": [{"field": "time"}],
      "frame": [1, 1]
    },
    {"filter": "datum.measurement == 'b'"}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "time"},
    "y": {"field": "interpolated"},
    "color": {"field": "value"}
  }
}
This first uses a calculate transform to isolate the values to be interpolated, then a window transform that computes the mean over adjacent values (frame: [1, 1]), and finally a filter transform to keep only the rows carrying interpolated values.
If you wanted to go the other route, you could do a similar sequence of transforms targeting the boolean value instead, as sketched below.
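For instance, a rough sketch of that transform sequence (not from the original answer): booleans can't be averaged directly, so measurement b is first encoded as 0/1; aggregate ops ignore null, so a mean over the frame [-1, 0] effectively picks up the previous b row's value, assuming rows alternate between a and b when sorted by time:

"transform": [
  {"calculate": "datum.measurement == 'b' ? (datum.value ? 1 : 0) : null", "as": "measurement_b"},
  {
    "window": [{"op": "mean", "field": "measurement_b", "as": "last_b"}],
    "sort": [{"field": "time"}],
    "frame": [-1, 0]
  },
  {"calculate": "datum.last_b == 1", "as": "lastValueMeasurementB"},
  {"filter": "datum.measurement == 'a'"}
]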

BigQuery GIS table to GeoJSON type Feature Collection with SQL and/or Python

I have some BigQuery tables with spatial data (lon, lat, point, and linestring) representing boat tracks.
I'm trying to get them into a GeoJSON for an API output.
I have one method that uses Python to query my table into a pandas DataFrame, which I then pass to a function that creates a feature collection.
But there's so much data per day that it'd be ideal if I could perform some kind of efficient linestring union and simplification first.
My current method:
# Query BigQuery table
sql = """
SELECT
    lon, lat, lead_lon, lead_lat,
    CAST(CAST(timestamp AS DATE) AS STRING) AS date,
    UNIX_SECONDS(timestamp) AS unix_secs,
    CAST(ship_num AS STRING) AS ship_num,
    op,
    CAST(knots AS FLOAT64) AS knots,
    point,
    linestring
FROM `ship_segments`
WHERE timestamp BETWEEN '2020-04-16' AND '2020-04-17';
"""
# Make into pandas dataframe
df = client.query(sql).to_dataframe()
# df to GeoJSON fn
def data2geojson(df):
    features = []

    def insert_features(X):
        features.append(
            geojson.Feature(
                geometry=geojson.LineString((
                    [X["lead_lon"], X["lead_lat"], X["knots"], X["unix_secs"]],
                    [X["lon"], X["lat"], X["knots"], X["unix_secs"]],
                )),
                properties=dict(
                    date=X["date"],
                    mmsi=X["ship_num"],
                    operator=X["op"],
                ),
            )
        )

    df.apply(insert_features, axis=1)
    # sort_keys is a dumps() option; passing it (or indent) to FeatureCollection
    # would leak them into the output as extra members
    geojson_obj = geojson.dumps(
        geojson.FeatureCollection(features),
        sort_keys=True, ensure_ascii=False,
    )
    return geojson_obj

results = data2geojson(df)
This returns a GeoJSON:
{"features": [{"geometry": {"coordinates": [[-119.049945, 33.983277, 10.5502, 1587104709], [-119.034677, 33.975823, 10.5502, 1587104709]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "235098383", "operator": "Evergreen Marine Corp"}, "type": "Feature"}, {"geometry": {"coordinates": [[-120.176933, 34.282107, 22.7005, 1587114969], [-120.144453, 34.275147, 22.7005, 1587114969]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "235098383", "operator": "Evergreen Marine Corp"}, "type": "Feature"}, {"geometry": {"coordinates": [[-118.361737, 33.64647, 11.3283, 1587096305], [-118.356308, 33.643713, 11.3283, 1587096305]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "538005412", "operator": "Scorpio MR Pool Ltd"}, "type": "Feature"}, {"geometry": {"coordinates": [[-118.414667, 33.673013, 12.7684, 1587097278], [-118.411707, 33.671493, 12.7684, 1587097278]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "538005412", "operator": "Scorpio MR Pool Ltd"}, "type": "Feature"}, {"geometry": {"coordinates": [[-119.377783, 34.062612, 10.5456, 1587102119], [-119.384212, 34.064217, 10.5456, 1587102119]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "636018225", "operator": "Ocean Network Express Pte Ltd"}, "type": "Feature"}], "indent": 2, "sort_keys": true, "type": "FeatureCollection"}
But I'm trying something like:
SELECT
    ship_num,
    DATE(timestamp) AS date,
    AVG(speed_knots) AS avg_speed_knots,
    ST_UNION_AGG(linestring) AS multiline
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER w AS num,
        ST_GEOGPOINT(lon, lat) AS geom,
        LEAD(ST_GEOGPOINT(lon, lat)) OVER w AS geom2,
        ST_MAKELINE(ST_GEOGPOINT(lon, lat), LEAD(ST_GEOGPOINT(lon, lat)) OVER w) AS linestring,
        LEAD(STRING(timestamp), 0) OVER w AS t1,
        LEAD(STRING(timestamp), 1) OVER w AS t2
    FROM `ship_data`
    WHERE timestamp >= '2020-04-10'
    WINDOW w AS (PARTITION BY ship_num ORDER BY timestamp)
) AS q
GROUP BY ship_num, DATE(timestamp);
This gives me multilinestrings in a table, but then I need to simplify them and get them into a GeoJSON FeatureCollection output.
Any ideas that don't use PostGIS?
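Not from the thread, but one way this could stay entirely in BigQuery SQL: ST_SIMPLIFY reduces vertex count and ST_ASGEOJSON emits the geometry JSON, so the FeatureCollection can be assembled with STRING_AGG. A sketch with an inlined sample row standing in for the multiline query above (the 10-meter tolerance and the property list are placeholders):

-- Sketch: simplify each track and assemble a GeoJSON FeatureCollection in SQL.
WITH tracks AS (
    -- stand-in for the multiline query above
    SELECT 'ship1' AS ship_num, DATE '2020-04-16' AS date,
           ST_GEOGFROMTEXT('LINESTRING(-119.04 33.98, -119.03 33.97)') AS multiline
)
SELECT CONCAT(
    '{"type":"FeatureCollection","features":[',
    STRING_AGG(CONCAT(
        '{"type":"Feature","geometry":', ST_ASGEOJSON(ST_SIMPLIFY(multiline, 10)),
        ',"properties":', TO_JSON_STRING(STRUCT(ship_num, date)), '}'
    )),
    ']}'
) AS feature_collection
FROM tracks;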

PL/SQL: remove unwanted double quotes in a JSON

I have JSON like this (it is contained in a CLOB variable):
{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}
{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}
…
I need to remove the " from the values of: id, val, sg1, sg2.
Is it possible?
For example, I need to obtain this:
{"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1}
{"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": , "sg2": 0}
…
If you are using Oracle 12 (R2?) or later, you can convert your JSON to the appropriate data types and then convert it back to JSON.
Oracle 18 Setup:
CREATE TABLE test_data ( value CLOB );
INSERT INTO test_data ( value )
VALUES ( '{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}' );
INSERT INTO test_data ( value )
VALUES ( '{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}' );
Query:
SELECT JSON_OBJECT(
           'id'   IS j.id,
           'type' IS j.typ,
           'val'  IS j.val,
           'cod'  IS j.cod,
           'sg1'  IS j.sg1,
           'sg2'  IS j.sg2
       ) AS JSON
FROM test_data t
CROSS JOIN JSON_TABLE(
    t.value,
    '$'
    COLUMNS (
        id  NUMBER(5,0)  PATH '$.id',
        typ VARCHAR2(10) PATH '$.type',
        val NUMBER(5,0)  PATH '$.val',
        cod VARCHAR2(10) PATH '$.cod',
        sg1 NUMBER(5,0)  PATH '$.sg1',
        sg2 NUMBER(5,0)  PATH '$.sg2'
    )
) j
Output:
| JSON |
| :--------------------------------------------------------------- |
| {"id":33,"type":"abc","val":2,"cod":null,"sg1":1,"sg2":1} |
| {"id":359,"type":"abcef","val":52,"cod":"aa","sg1":null,"sg2":0} |
Or, if you want to use regular expressions (you shouldn't if you have the choice; use a proper JSON parser instead), then:
Query 2:
SELECT REGEXP_REPLACE(
           REGEXP_REPLACE(
               value,
               '"(id|val|sg1|sg2)": ""',
               '"\1": "null"'
           ),
           '"(id|val|sg1|sg2)": "(\d+|null)"',
           '"\1": \2'
       ) AS JSON
FROM test_data
Output:
| JSON |
| :-------------------------------------------------------------------------- |
| {"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1} |
| {"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": null, "sg2": 0} |
db<>fiddle here