Is there a way to extract data from a map(varchar, varchar) column in SQL?

The data is stored as map(varchar, varchar) and looks like this:
Date       | Info                                                | ID
2020-06-10 | {"Price":"102.45", "Time":"09:31", "Symbol":"AAPL"} | 10
2020-06-10 | {"Price":"10.28", "Time":"12:31", "Symbol":"MSFT"}  | 10
2020-06-11 | {"Price":"12.45", "Time":"09:48", "Symbol":"T"}     | 10
Is there a way to split up the Info column and return a table where each key has its own column?
Something like this:
Date       | Price  | Time  | Symbol | ID
2020-06-10 | 102.45 | 09:31 | AAPL   | 10
2020-06-10 | 10.28  | 12:31 | MSFT   | 10
Note that the Time key may not appear in every entry. For example, an entry can look like this:
Date       | Info                               | ID
2020-06-10 | {"Price":"10.28", "Symbol":"MSFT"} | 10
In that case, I would like it to be filled with a NaN value.
Thanks

You can use the subscript operator ([]) or the element_at function to access the values in the map. The difference between the two is that [] fails with an error if the key is missing from the map, while element_at returns NULL.
WITH data(dt, info, id) AS (VALUES
    (DATE '2020-06-10', map_from_entries(ARRAY[('Price', '102.45'), ('Time', '09:31'), ('Symbol', 'AAPL')]), 10),
    (DATE '2020-06-10', map_from_entries(ARRAY[('Price', '10.28'), ('Time', '12:31'), ('Symbol', 'MSFT')]), 10),
    (DATE '2020-06-11', map_from_entries(ARRAY[('Price', '12.45'), ('Time', '09:48'), ('Symbol', 'T')]), 10),
    (DATE '2020-06-12', map_from_entries(ARRAY[('Price', '20.99'), ('Symbol', 'X')]), 10))
SELECT
    dt AS "date",
    element_at(info, 'Price') AS price,
    element_at(info, 'Time') AS time,
    element_at(info, 'Symbol') AS symbol,
    id
FROM data
date | price | time | symbol | id
------------+--------+-------+--------+----
2020-06-10 | 102.45 | 09:31 | AAPL | 10
2020-06-10 | 10.28 | 12:31 | MSFT | 10
2020-06-11 | 12.45 | 09:48 | T | 10
2020-06-12 | 20.99 | NULL | X | 10

This answers the original version of the question.
If that is really a string, you can use regular expressions:
select t.*,
regexp_extract(info, '"Price":"([^"]*)"', 1) as price,
regexp_extract(info, '"Symbol":"([^"]*)"', 1) as symbol,
regexp_extract(info, '"Time":"([^"]*)"', 1) as time
from t;
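If the Info column really is a plain string, the regexp approach above can be sanity-checked outside the database with Python's re module. This is a rough sketch of the same pattern (the extract helper and the sample row are mine, mirroring the question's data); like the SQL version, it yields nothing (None) when a key such as Time is absent:

```python
import re

def extract(info, key):
    # Same idea as regexp_extract(info, '"Key":"([^"]*)"', 1):
    # return the captured value, or None when the key is absent.
    m = re.search('"%s":"([^"]*)"' % re.escape(key), info)
    return m.group(1) if m else None

row = '{"Price":"10.28", "Symbol":"MSFT"}'
print(extract(row, "Price"), extract(row, "Time"), extract(row, "Symbol"))
# 10.28 None MSFT
```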

Related

Group column by year range pyspark dataframe

I am trying to turn my year column into a year range instead of a specific year value. This is a movie dataset.
This is my code:
group =join_DF.groupby("relYear").avg("rating").withColumnRenamed("relYear", "year_range")
group.show()
This is what I have right now:
+----------+------------------+
|year_range| avg(rating)|
+----------+------------------+
| 1953|3.7107686857952533|
| 1903|3.0517241379310347|
| 1957|3.9994918537809254|
| 1897|2.9177215189873418|
| 1987|3.5399940908663594|
| 1956|3.7077949616896153|
| 2016|3.5318961695914055|
| 1936|3.8356813313560724|
| 2012|3.5490157995509457|
| |3.5151401495104130|
+----------+------------------+
This is what I want to achieve:
+-----------------+------------------+
| year_range | avg(rating) |
+-----------------+------------------+
| 1970-1979 |3.7773614199240319|
| 1960-1969 |3.8007319471419123|
| |3.5455419410410923|
| 1980-1989 |3.5778570247142313|
| 2000 onwards |3.5009940908663594|
| 1959 and earlier|3.8677949616896153|
| 1990-1999 |3.4618961695914055|
+-----------------+------------------+
The rows where year_range is null are movie titles without a stated release year.
1874 is earliest year and 2019 is the latest year.
You can divide the year by 10 to make the bins.
from pyspark.sql import functions as F

df = (df.withColumn('bin', F.floor(F.col('year_range') / 10))
        .withColumn('bin', F.when(F.col('bin') >= 200, 200)
                            .when(F.col('bin') <= 195, 195)
                            .otherwise(F.col('bin')))
        .groupby('bin')
        .agg(F.avg('avg(rating)').alias('10_years_avg')))
Then, format your year_range column.
df = df.withColumn('year_range', F.when(F.col('bin') >= 200, F.lit('2000 onwards'))
.when(F.col('bin') <= 195, F.lit('1959 and earlier'))
.otherwise(F.concat(F.col('bin'), F.lit('0-'), F.col('bin'), F.lit('9'))))
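The bucketing logic can be checked in plain Python before running it through Spark. This is a sketch of the assumed labeling only (the year_range_label helper is mine, not part of the PySpark answer): bin = floor(year / 10), clamped at 195 for "1959 and earlier" and 200 for "2000 onwards".

```python
def year_range_label(year):
    # bin = floor(year / 10), clamped at 195 and 200 as in the PySpark answer
    if year is None:
        return None  # movies with no stated release year stay unlabeled
    b = year // 10
    if b >= 200:
        return "2000 onwards"
    if b <= 195:
        return "1959 and earlier"
    return "%d0-%d9" % (b, b)

print(year_range_label(1987), year_range_label(2016), year_range_label(1874))
# 1980-1989 2000 onwards 1959 and earlier
```

Note that 1874 (the earliest year in the data) lands in "1959 and earlier" and 2019 (the latest) in "2000 onwards", matching the desired output.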

SQL Aggregate Over Date Range

Whoever answers this thank you so, so much!
Here's a little snippet of my data:
DATE       | Score | Multiplier | Weighting
2022-01-05 | 3     | 4          | 7
2022-01-05 | 4     | 7          | 8
2022-01-06 | 5     | 2          | 4
2022-01-06 | 3     | 4          | 7
2022-01-06 | 4     | 7          | 8
2022-01-07 | 5     | 2          | 4
Each row of this data is when something "happened" and multiple events occur during the same day.
What I need to do is take the rolling average of this data over the past 3 months.
So for ONLY 2022-01-05, my weighted average (called ADJUSTED) would be:
DATE       | ADJUSTED
2022-01-05 | [(3*4) + (4*7)] / (7+8)
Except I need to do this over the previous 3 months (so on Jan 5, 2022, I'd need the rolling weighted average -- using the "Weighting" column -- over the preceding 3 months; can also use previous 90 days if that makes it easier).
Not sure if this is a clear enough description, but would appreciate any help.
Thank you!
If I have interpreted this correctly, I believe a GROUP BY query will meet the need:
sample data
CREATE TABLE mytable(
DATE DATE NOT NULL
,Score INTEGER NOT NULL
,Multiplier INTEGER NOT NULL
,Weighting INTEGER NOT NULL
);
INSERT INTO mytable(DATE,Score,Multiplier,Weighting) VALUES ('2022-01-05',3,4,7);
INSERT INTO mytable(DATE,Score,Multiplier,Weighting) VALUES ('2022-01-05',4,7,8);
INSERT INTO mytable(DATE,Score,Multiplier,Weighting) VALUES ('2022-01-06',5,2,4);
INSERT INTO mytable(DATE,Score,Multiplier,Weighting) VALUES ('2022-01-06',3,4,7);
INSERT INTO mytable(DATE,Score,Multiplier,Weighting) VALUES ('2022-01-06',4,7,8);
INSERT INTO mytable(DATE,Score,Multiplier,Weighting) VALUES ('2022-01-07',5,2,4);
query
select
date
, sum(score * multiplier) sum_score_x_mult
, sum(weighting) sum_weight
, sum(score * multiplier * 1.0) / sum(weighting * 1.0) ADJUSTED
from mytable
group by date
result
+------------+------------------+------------+-------------------+
| date       | sum_score_x_mult | sum_weight | ADJUSTED          |
+------------+------------------+------------+-------------------+
| 2022-01-05 |               40 |         15 | 2.666666666666667 |
| 2022-01-06 |               50 |         19 | 2.631578947368421 |
| 2022-01-07 |               10 |          4 | 2.500000000000000 |
+------------+------------------+------------+-------------------+
Note: I have not attempted to guard against possible divide-by-zero or NULL value problems in the query above.
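As a quick sanity check, the per-day weighted average from the question, sum(score * multiplier) / sum(weighting), can be reproduced with Python's built-in sqlite3 module (a sketch; the table and column names follow the sample data above, and the rolling 3-month part is not covered here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable(date TEXT, score INT, multiplier INT, weighting INT);
    INSERT INTO mytable VALUES
        ('2022-01-05', 3, 4, 7), ('2022-01-05', 4, 7, 8),
        ('2022-01-06', 5, 2, 4), ('2022-01-06', 3, 4, 7),
        ('2022-01-06', 4, 7, 8), ('2022-01-07', 5, 2, 4);
""")
rows = conn.execute("""
    SELECT date,
           SUM(score * multiplier) * 1.0 / SUM(weighting) AS adjusted
    FROM mytable
    GROUP BY date
    ORDER BY date
""").fetchall()
# e.g. for 2022-01-05: (3*4 + 4*7) / (7 + 8) = 40/15
```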

How to average data on periods from a table in SQL

I'm trying to average values over fixed-size groups of rows and, for each group, also average the corresponding datetimes.
Having data like:
value | datetime
-------+------------------------
15 | 2015-08-16 01:00:40+02
22 | 2015-08-16 01:01:40+02
16 | 2015-08-16 01:02:40+02
19 | 2015-08-16 01:03:40+02
21 | 2015-08-16 01:04:40+02
18 | 2015-08-16 01:05:40+02
29 | 2015-08-16 01:06:40+02
16 | 2015-08-16 01:07:40+02
16 | 2015-08-16 01:08:40+02
15 | 2015-08-16 01:09:40+02
I would like to obtain something like in one query:
value | datetime
-------+------------------------
18.6 | 2015-08-16 01:03:00+02
18.8 | 2015-08-16 01:08:00+02
where value corresponds to the average of the first 5 values and datetime to the middle (or average) of the first 5 datetimes, 5 being the interval size n.
I saw some posts that put me on the track with avg, group by and averaging date format in SQL but I'm still not able to find out what to do exactly.
I'm working under PostgreSQL 9.4
You would need to share more information, but here is one way to do it (MySQL syntax):
SELECT AVG(value), AVG(datetime)
FROM database.table
WHERE datetime > date1
AND datetime < date2;
Something like
SELECT
to_timestamp(round(AVG(EXTRACT(epoch from datetime)))) as middleDate,
avg(value) AS avgValue
FROM
myTable
GROUP BY
(id) / ((SELECT Count(*) FROM myTable) / 100);
roughly fills my requirements, with 100 controlling the length of the averaged intervals (globally equal to the number of output lines).
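The fixed-size-group averaging can also be sketched outside the database in plain Python. The average_in_chunks helper below is mine; the rows and the interval size n = 5 mirror the question's example, and the group datetime is obtained by averaging epoch seconds, the same trick as the EXTRACT(epoch ...) query:

```python
from datetime import datetime, timezone, timedelta
from statistics import mean

# Sample rows from the question (UTC+02 offset)
tz = timezone(timedelta(hours=2))
rows = [(v, datetime(2015, 8, 16, 1, m, 40, tzinfo=tz))
        for v, m in [(15, 0), (22, 1), (16, 2), (19, 3), (21, 4),
                     (18, 5), (29, 6), (16, 7), (16, 8), (15, 9)]]

def average_in_chunks(rows, n):
    out = []
    for i in range(0, len(rows), n):
        chunk = rows[i:i + n]
        avg_value = mean(v for v, _ in chunk)
        # averaging epoch seconds gives the "middle" datetime of the chunk
        avg_epoch = mean(dt.timestamp() for _, dt in chunk)
        out.append((avg_value, datetime.fromtimestamp(avg_epoch, tz)))
    return out

result = average_in_chunks(rows, 5)
```

This gives 18.6 and 18.8 for the two groups, with the exact average datetimes 01:02:40 and 01:07:40 (the question's 01:03:00 / 01:08:00 appear to be rounded).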

Oracle, Mysql, how to get average

How to get average fuel consumption using only MySQL or Oracle:
SELECT te.fuelName,
zkd.fuelCapacity,
zkd.odometer,
zkd.data AS tanking
FROM ZakupKartyDrogowej zkd
JOIN TypElementu te
ON te.typElementu_Id = zkd.typElementu_Id
AND te.idFirmy = zkd.idFirmy
AND te.typElementu_Id IN (3,4,5)
WHERE zkd.idFirmy = 1054
AND zkd.kartaDrogowa_Id = 42
AND zkd.data BETWEEN to_date('2015-09-01','YYYY-MM-DD')
AND to_date('2015-09-30','YYYY-MM-DD');
Result of this query is:
fuelName | fuelCapacity | odometer | tanking
-------------------------------------------------
'ON' | 534 | 1284172 | 2015-09-29
'ON' | 571 | 1276284 | 2015-09-02
'ON' | 470 | 1277715 | 2015-09-07
'ON' | 580.01 | 1279700 | 2015-09-11
'ON' | 490 | 1281103 | 2015-09-17
'ON' | 520 | 1282690 | 2015-09-23
We can do it later in java or php, but want to get result right away from query. How should we modify above query to do that?
fuelCapacity is the number of liters of fuel that was poured into the car's tank at the gas station.
For one total average, what you need is the sum of the refills divided by the difference between the odometer readings at the start and the end, i.e. fuel used / distance travelled.
I don't have your table structure at hand, but this alteration to the select statement should do the trick:
select cast(sum(zkd.fuelCapacity) as float) / (max(zkd.odometer) - min(zkd.odometer)) as consumption ...
The cast(field AS float) does what the name implies, and typecasts the field as float, so the result will also be a float. (I do suspect that your fuelCapacity field is a float because there is one float value in your example, but this will make sure.)
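Under the answer's assumption (total liters refilled divided by odometer distance), the numbers from the result table work out as follows. This is a plain-Python check of the same arithmetic; the per-100 km scaling and the assumption that the odometer is in kilometers are mine:

```python
# (fuelCapacity, odometer) pairs copied from the result table
refills = [(534, 1284172), (571, 1276284), (470, 1277715),
           (580.01, 1279700), (490, 1281103), (520, 1282690)]

total_fuel = sum(f for f, _ in refills)     # liters poured in
odometers = [o for _, o in refills]
distance = max(odometers) - min(odometers)  # distance travelled
consumption = total_fuel / distance         # liters per unit distance
per_100km = 100.0 * consumption
print(round(per_100km, 2))  # 40.12 L/100 km, if the odometer is in km
```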

Group records by time

I have a table containing a datetime column and some misc other columns. The datetime column represents an event happening: it either contains a time (the event happened at that time) or NULL (the event didn't happen).
I now want to count the number of records happening in specific intervals (15 minutes), but do not know how to do that.
example:
id | time | foreign_key
1 | 2012-01-01 00:00:01 | 2
2 | 2012-01-01 00:02:01 | 4
3 | 2012-01-01 00:16:00 | 1
4 | 2012-01-01 00:17:00 | 9
5 | 2012-01-01 00:31:00 | 6
I now want to create a query that creates a result set similar to:
interval | COUNT(id)
2012-01-01 00:00:00 | 2
2012-01-01 00:15:00 | 2
2012-01-01 00:30:00 | 1
Is this possible in SQL or can anyone advise what other tools I could use? (e.g. exporting the data to a spreadsheet program would not be a problem)
Give this a try:
select datetime((strftime('%s', time) / 900) * 900, 'unixepoch') interval,
count(*) cnt
from t
group by interval
order by interval
I have limited SQLite background (and no practice instance), but I'd try grabbing the minutes using
strftime(FORMAT, TIMESTRING, MODIFIER, ...)
with the %M format specifier (http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html)
Then divide that by 15 and take the FLOOR of the quotient to figure out which quarter-hour you're in (e.g., 0, 1, 2, or 3)
cast(x as int)
Getting the floor value of a number in SQLite?
Strung together it might look something like:
SELECT CAST(strftime('%M', your_time_field) AS INTEGER) / 15 FROM your_table
(the cast is needed because strftime returns a string; once both operands are integers, SQLite's division already floors the result)
Then group by the quarter-hour.
Sorry I don't have exact syntax for you, but that approach should enable you to get the functional groupings, after which you can massage the output to make it look how you want.
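The 900-second bucketing from the first answer can be verified directly with Python's built-in sqlite3 module (data copied from the question; integer division of the epoch by 900 snaps each timestamp to its 15-minute bucket):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(id INTEGER, time TEXT);
    INSERT INTO t VALUES
        (1, '2012-01-01 00:00:01'), (2, '2012-01-01 00:02:01'),
        (3, '2012-01-01 00:16:00'), (4, '2012-01-01 00:17:00'),
        (5, '2012-01-01 00:31:00');
""")
rows = conn.execute("""
    SELECT datetime((strftime('%s', time) / 900) * 900, 'unixepoch') AS interval,
           COUNT(*) AS cnt
    FROM t
    GROUP BY interval
    ORDER BY interval
""").fetchall()
# rows: [('2012-01-01 00:00:00', 2), ('2012-01-01 00:15:00', 2), ('2012-01-01 00:30:00', 1)]
```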