Select JSON object keys as columns in PrestoDB (SQL)

This is my database:
mytable
SensorID | Name | Data
1        | Prox | {"O3":33, "CO2":12, "PM10":12}
3        | IR   | {"O3":33, "CO2":12, "PM10":12}
SELECT (how do I select the fields without naming the object keys?) FROM mytable WHERE SensorID=1
Actually, I tried this method and it works:
SELECT SensorID, Name, Data.O3, Data.CO2, Data.PM10 FROM mytable WHERE SensorID=1
The problem is that sometimes I don't know which keys are inside the object in the Data column.
Expected output:
SensorID | Name | O3 | CO2 | PM10
1        | Prox | 33 | 12  | 12
How can I achieve this?
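In Presto you can discover the keys at query time (for example by casting the JSON to MAP(VARCHAR, JSON) and calling map_keys), but SQL still requires the output columns to be fixed when the query is written, so the pivot itself usually happens client-side. As a sketch of that two-pass idea (the row data below is made up to match the example), in Python:

```python
import json

# Hypothetical rows, as they might come back from
# "SELECT SensorID, Name, Data FROM mytable"
rows = [
    (1, "Prox", '{"O3":33, "CO2":12, "PM10":12}'),
    (3, "IR",   '{"O3":33, "CO2":12, "PM10":12}'),
]

# Pass 1: discover every key that appears in any Data object
keys = sorted({k for _, _, data in rows for k in json.loads(data)})

# Pass 2: emit one flat record per row, one column per discovered key
header = ["SensorID", "Name"] + keys
flat = [
    [sensor_id, name] + [json.loads(data).get(k) for k in keys]
    for sensor_id, name, data in rows
]

print(header)
for record in flat:
    print(record)
```

Missing keys come back as None (SQL NULL), so rows with differing key sets still line up.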

Related

Complex aggregation with select

I have a table in my DB like this (the ID column is not a unique UUID, just some object ID; the primary key still exists, but is omitted for the example):
ID  | Option | Value | Number of searches | Search date
1   | abc    | a     | 1                  | 2021-01-01
1   | abc    | b     | 2                  | 2021-01-01
1   | abc    | a     | 3                  | 2021-01-02
1   | abc    | b     | 4                  | 2021-01-02
1   | def    | a     | 5                  | 2021-01-01
1   | def    | b     | 6                  | 2021-01-01
1   | def    | a     | 7                  | 2021-01-02
1   | def    | b     | 8                  | 2021-01-02
2   | ...    | ...   | ...                | ...
... | ...    | ...   | ...                | ...
N   | xyz    | xyz   | M                  | any date
I want to get some kind of statistics report like:
ID | Total searches | Option | Total number of option searches | Value | Total value searches
1  | 36             | abc    | 10                              | a     | 4
   |                |        |                                 | b     | 6
   |                | def    | 26                              | a     | 12
   |                |        |                                 | b     | 14
Is it possible in some way? UNION isn't working here, and I have no idea how a GROUP BY clause could solve it.
I can do it easily in Kotlin, by requesting everything and aggregating into classes like these:
data class SearchAggregate(
    val id: String,
    val options: List<Option>,
    val values: List<Value>
)

data class Option(
    val name: String,
    val totalSearches: Long
)

data class Value(
    val name: String,
    val totalSearches: Long
)
and export to a file, but I have to request the data via SQL.
You can use the COUNT() window function in a subquery to preprocess the data. For example:
select
  id,
  max(total_searches) as total_searches,
  option,
  max(total_options) as total_options,
  value,
  max(total_values) as total_values
from (
  select
    id,
    count(*) over (partition by id) as total_searches,
    option,
    count(*) over (partition by id, option) as total_options,
    value,
    count(*) over (partition by id, option, value) as total_values
  from t
) x
group by id, option, value
See running example at DB Fiddle #1.
Or you can use a shorter query, as in:
select
  id,
  sum(cnt) over (partition by id) as total_searches,
  option,
  sum(cnt) over (partition by id, option) as total_options,
  value,
  cnt
from (
  select id, option, value, count(*) as cnt from t group by id, option, value
) x
See running example at DB Fiddle #2.
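Note that both COUNT(*) versions count rows; the asker's expected totals (36, 10, 4, ...) are sums of the Number of searches column, so SUM of that column would be swapped in for COUNT(*). The nested-window pattern itself is easy to verify on any engine with window functions; here is a sqlite3 sketch (the Number of searches column is renamed to n) using SUM:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, option TEXT, value TEXT, n INTEGER)")
con.executemany("INSERT INTO t VALUES (?,?,?,?)", [
    (1, 'abc', 'a', 1), (1, 'abc', 'b', 2), (1, 'abc', 'a', 3), (1, 'abc', 'b', 4),
    (1, 'def', 'a', 5), (1, 'def', 'b', 6), (1, 'def', 'a', 7), (1, 'def', 'b', 8),
])

# Same shape as the answer's query: window totals in the subquery,
# then GROUP BY to collapse to one row per (id, option, value).
rows = con.execute("""
    select id,
           max(total_searches) as total_searches,
           option,
           max(total_options)  as total_options,
           value,
           max(total_values)   as total_values
    from (
        select id,
               sum(n) over (partition by id)                as total_searches,
               option,
               sum(n) over (partition by id, option)        as total_options,
               value,
               sum(n) over (partition by id, option, value) as total_values
        from t
    ) x
    group by id, option, value
    order by id, option, value
""").fetchall()
for r in rows:
    print(r)
```

Each output row carries the grand total (36), the per-option total (10 or 26), and the per-value total, matching the requested report.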
The first option is to use ROLLUP, as that is the intended SQL pattern. It doesn't give you the results in the exact format you asked for, but that's a reflection of the requested format not being normalised.
SELECT
id,
option,
value,
SUM(`Number of searches`) AS total_searches
FROM
your_table
GROUP BY
ROLLUP(
id,
option,
value
)
It's concise, standard practice, and SQL-friendly.
Thinking in terms of these normalised patterns will make your use of SQL much more effective.
That said, you CAN use SQL to aggregate and restructure the results. You get the structure you want, but with more code and increased maintenance, lower flexibility, etc.
SELECT
  id,
  SUM(SUM(`Number of searches`)) OVER (PARTITION BY id) AS total_by_id,
  option,
  SUM(SUM(`Number of searches`)) OVER (PARTITION BY id, option) AS total_by_id_option,
  value,
  SUM(`Number of searches`) AS total_by_id_option_value
FROM
  your_table
GROUP BY
  id,
  option,
  value
That doesn't leave blanks where you have them, but that's because to do so is a SQL Anti-Pattern, and should be handled in your presentation layer, not in the database.
Oh, and please don't use column names with spaces; stick to alphanumeric characters with underscores.
Demo : https://www.db-fiddle.com/f/fX3tNL82gqgVCRoa3v6snP/5
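ROLLUP support varies by engine (SQLite lacks it, for example), but what it produces is simply an aggregate at every prefix of the grouping list plus a grand total. A small Python sketch of that behaviour on the sample data:

```python
from collections import defaultdict

rows = [  # (id, option, value, number_of_searches)
    (1, 'abc', 'a', 1), (1, 'abc', 'b', 2), (1, 'abc', 'a', 3), (1, 'abc', 'b', 4),
    (1, 'def', 'a', 5), (1, 'def', 'b', 6), (1, 'def', 'a', 7), (1, 'def', 'b', 8),
]

# GROUP BY ROLLUP(id, option, value) aggregates at every prefix of the
# grouping list: (id, option, value), (id, option), (id,), and the grand
# total. NULL (None here) marks the rolled-up positions.
totals = defaultdict(int)
for id_, option, value, n in rows:
    for key in [(id_, option, value), (id_, option, None),
                (id_, None, None), (None, None, None)]:
        totals[key] += n

print(totals[(1, 'abc', 'a')])   # per-value subtotal
print(totals[(1, 'def', None)])  # per-option subtotal
print(totals[(1, None, None)])   # per-id total
```

The subtotal rows with NULLs are exactly what the "blanks" in the asker's desired layout correspond to.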

Is there a way to modify and rename a column that is within a RECORD in BigQuery, and then keep this column with the same name and location as before?

I have a RECORD in BigQuery with the structure:
Parent
|___Child_1
|___Child_2
|___Child_3
|___...
Child_1 is of type TIMESTAMP, so I would like to convert it from a TIMESTAMP string to an INT64 that represents the number of milliseconds since Unix Epoch. This is done via the unix_millis function.
I am having trouble getting this done for nested fields. Below are my attempts:
select *, unix_millis(parent.child_1) as parent.child_1 from `dataset.table`
When I tried the above, the query editor in BigQuery underlined "child_1" in "as parent.child_1", and gave the error Syntax error: Expected end of input but got "."
Why I expected this to work is because, for non-nested fields, it is possible to use unix_millis and then use the AS operator to rename the column.
So how would I go about performing the unix_millis function and then make sure that the resulting column has the same name and location within the RECORD as before?
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
REPLACE((
SELECT AS STRUCT * REPLACE(UNIX_MILLIS(child1) AS child1)
FROM UNNEST([parent])
) AS parent)
FROM `project.dataset.table`
You can test and play with the above using some simplified dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, STRUCT<child1 TIMESTAMP, child2 STRING, child3 INT64>(CURRENT_TIMESTAMP(), 'test1', 123) parent UNION ALL
SELECT 2, STRUCT<child1 TIMESTAMP, child2 STRING, child3 INT64>(CURRENT_TIMESTAMP(), 'test2', 456)
)
SELECT *
REPLACE((
SELECT AS STRUCT * REPLACE(UNIX_MILLIS(child1) AS child1)
FROM UNNEST([parent])
) AS parent)
FROM `project.dataset.table`
with output
Row id parent.child1 parent.child2 parent.child3
1 1 1599154064128 test1 123
2 2 1599154064128 test2 456
while original data was
Row id parent.child1 parent.child2 parent.child3
1 1 2020-09-03 17:29:09.512794 UTC test1 123
2 2 2020-09-03 17:29:09.512794 UTC test2 456
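For reference, UNIX_MILLIS just converts a timestamp to milliseconds since the Unix epoch, and the REPLACE trick swaps the converted value back in under the same field name and position. The same transformation can be sketched in Python (the record literal below is made up to mirror the example):

```python
from datetime import datetime, timezone

def unix_millis(ts: datetime) -> int:
    """Milliseconds since the Unix epoch, like BigQuery's UNIX_MILLIS."""
    return int(ts.timestamp() * 1000)

# A nested record, mirroring parent.child1 / child2 / child3 from the example
parent = {"child1": datetime(2020, 9, 3, 17, 29, 9, 512794, tzinfo=timezone.utc),
          "child2": "test1", "child3": 123}

# Replace child1 in place, keeping its name and position, like SELECT * REPLACE
parent = {**parent, "child1": unix_millis(parent["child1"])}
print(parent["child1"])
```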

SQL Rows to Columns if column values are unknown

I have a table that has demographic information about a set of users which looks like this:
User_id Category IsMember
1 College 1
1 Married 0
1 Employed 1
1 Has_Kids 1
2 College 0
2 Married 1
2 Employed 1
3 College 0
3 Employed 0
The result set I want is a table that looks like this:
User_Id|College|Married|Employed|Has_Kids
1 1 0 1 1
2 0 1 1 0
3 0 0 0 0
In other words, the table indicates the presence or absence of a category for each user. Sometimes the user will have a row for a category where the value is false; sometimes the user will have no row for a category at all, in which case IsMember is assumed to be false.
Also, from time to time additional categories will be added to the data set, and I'm wondering if it's possible to do this query without knowing all the possible category names up front; in other words, I won't be able to specify all the column names I want in the result. (Note that only user 1 has category "Has_Kids" and user 3 is missing a row for category "Married".)
(using Postgres)
Thanks.
You can use jsonb functions.
with titles as (
select jsonb_object_agg(Category, Category) as titles,
jsonb_object_agg(Category, -1) as defaults
from demog
),
the_rows as (
select null::bigint as id, titles as data
from titles
union
select User_id, defaults || jsonb_object_agg(Category, IsMember)
from demog, titles
group by User_id, defaults
)
select id, string_agg(value, '|' order by key)
from (
select id, key, value
from the_rows, jsonb_each_text(data)
) x
group by id
order by id nulls first
You can see a running example in http://rextester.com/QEGT70842
You can replace -1 with 0 for the default value and '|' with ',' for the separator.
You can install the tablefunc module and use the crosstab function.
https://www.postgresql.org/docs/9.1/static/tablefunc.html
I found a Postgres function script called colpivot here which does the trick. Ran the script to create the function, then created the table in one statement:
select colpivot ('_pivoted', 'select * from user_categories', array['user_id'],
array ['category'], '#.is_member', null);
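The effect of such a dynamic pivot can be sketched outside the database. This Python version (data copied from the question) collects the category names at run time and defaults missing memberships to 0, which is what colpivot and the jsonb approach are doing in SQL:

```python
rows = [  # (user_id, category, is_member), as in the demographics table
    (1, 'College', 1), (1, 'Married', 0), (1, 'Employed', 1), (1, 'Has_Kids', 1),
    (2, 'College', 0), (2, 'Married', 1), (2, 'Employed', 1),
    (3, 'College', 0), (3, 'Employed', 0),
]

categories = sorted({cat for _, cat, _ in rows})   # discovered, not hard-coded
users = sorted({uid for uid, _, _ in rows})
membership = {(uid, cat): m for uid, cat, m in rows}

# One list of 0/1 flags per user, one slot per discovered category
pivot = {uid: [membership.get((uid, cat), 0) for cat in categories]
         for uid in users}

print(categories)
for uid in users:
    print(uid, pivot[uid])
```

Missing (user, category) pairs fall through to the 0 default, which handles both absent rows and categories added later.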

Find certain values and show corresponding value from different field in SQL

So I found these 2 articles but they don't quite answer my question...
Find max value and show corresponding value from different field in SQL server
Find max value and show corresponding value from different field in MS Access
I have a table like this...
ID Type Date
1 Initial 1/5/15
1 Periodic 3/5/15
2 Initial 2/5/15
3 Initial 1/10/15
3 Periodic 3/6/15
4
5 Initial 3/8/15
I need to get all of the ID numbers whose Type is "Periodic" or NULL, and the corresponding date. So I want to get query results that look like this...
ID Type Date
1 Periodic 3/5/15
3 Periodic 3/6/15
4
I've tried
select id, type, date1
from Table1 as t
where type in (select type
from Table1 as t2
where ((t2.type) Is Null) or "" or ("periodic"));
But this doesn't work... From what I've read, you can't compare NULL values with the usual operators...
Why in SQL NULL can't match with NULL?
So I tried
SELECT id, type, date1
FROM Table1 AS t
WHERE type in (select type
from Table1 as t2
where ((t.Type)<>"Initial"));
But this doesn't give me the ID of 4...
Any suggestions?
Unless I'm missing something, you just want:
select id, type, date1
from Table1 as t
where (t.type Is Null) or (t.type = "") or (t.type = "periodic");
The or applies to boolean expressions, not to values being compared.
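The key point is that = never matches NULL (the comparison evaluates to NULL, which is not true), so the IS NULL test is required. A quick sqlite3 check of the answer's condition (string literals use single quotes here; the double quotes in the answer are Access style):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (id INT, type TEXT, date1 TEXT)")
con.executemany("INSERT INTO t1 VALUES (?,?,?)", [
    (1, 'Initial', '1/5/15'), (1, 'Periodic', '3/5/15'),
    (2, 'Initial', '2/5/15'),
    (3, 'Initial', '1/10/15'), (3, 'Periodic', '3/6/15'),
    (4, None, None),
    (5, 'Initial', '3/8/15'),
])

# type IS NULL is what picks up row 4; "type = NULL" would never match it
rows = con.execute("""
    SELECT id, type, date1 FROM t1
    WHERE type IS NULL OR type = '' OR type = 'Periodic'
    ORDER BY id
""").fetchall()
print(rows)
```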

how to select one tuple in rows based on variable field value

I'm quite new to SQL and I'd like to make a SELECT statement that retrieves only the first row of a set based on a column value. I'll try to make it clearer with an example table.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetch the first line with chip_id=1,2,3
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
I'd probably:
set a variable to 0
order your table by chip_id
read the table in row by row
if table[row] > variable, store table[row] in a result array and increment the variable
loop till done
return your result array
Though depending on your DB, query, and versions, you'll probably get unpredictable/unreliable returns.
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
             row_number() over (partition by chip_id order by rand()) as seqnum
      from your_table
     ) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
Provided I understood your output, if you are using PostgreSQL 9+, you can use this (note the cast, since string_agg expects text):
SELECT chip_id,
       string_agg(sample_id::text, ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, you generally want the max, the min, or some other value to represent the group. You can do sums, counts, all kinds of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.
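Both approaches are easy to check. This sqlite3 sketch (the table name chips is made up) runs the GROUP BY/MAX version on the sample data; the row_number() variant behaves the same on engines with window functions, except you control which row survives via the ORDER BY inside OVER:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chips (chip_id INT, sample_id INT)")
con.executemany("INSERT INTO chips VALUES (?,?)", [
    (1, 45), (1, 55), (1, 5986),
    (2, 453), (2, 12),
    (3, 4567), (3, 9),
])

# One row per chip_id, represented by its maximum sample_id
rows = con.execute("""
    SELECT chip_id, MAX(sample_id)
    FROM chips
    GROUP BY chip_id
    ORDER BY chip_id
""").fetchall()
print(rows)
```

Swap MAX for MIN (or any other aggregate) to change which sample represents each group.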