How to extract elements from Presto ARRAY(MAP(VARCHAR, VARCHAR))

I have an array of maps in a column of type ARRAY(MAP(VARCHAR, VARCHAR)). I'd like to extract the "product" and "amount" values from this "Item_Details" column:
+---------+---------+--------------+
| Company | Country | Item_Details |
+=========+=========+==============+
| Apple | US | [{"created":"2019-09-15","product":"apple watch", "amount": "$7,900"},{"created":"2022-09-19","product":"iPhone", "amount": "$78,300"},{"created":"2021-01-13","product":"Macbook Pro", "amount": "$163,980"}] |
| Google | US | [{"created":"2020-07-15","product":"Nest", "amount": "$78,300"},{"created":"2021-07-15","product":"Google phone", "amount": "$178,900"}] |
+---------+---------+--------------+
My expected outputs would be:
+---------+---------+--------------+
| Company | Country | Item_Details |
+=========+=========+==============+
| Apple | US | {"product": ["apple watch", "iPhone", "Macbook Pro"], "amount": ["$7,900", "$78,300", "$163,980"]} |
| Google | US | {"product": ["Nest", "Google phone"], "amount": ["$78,300", "$178,900"]} |
+---------+---------+--------------+
or
+---------+---------+-------------+----------+
| Company | Country | Product     | Amount   |
+=========+=========+=============+==========+
| Apple   | US      | apple watch | $7,900   |
| Apple   | US      | iPhone      | $78,300  |
| Apple   | US      | Macbook Pro | $163,980 |
...
+---------+---------+-------------+----------+
I tried element_at(Item_Details, 'product') and json_extract_scalar(Item_Details, '$.product'), but received the error "Unexpected parameters (array(map(varchar,varchar)), varchar(23)) for function element_at."
Any suggestions are much appreciated! Thank you in advance.

For the second output format, you can unnest the array and access the elements of each map:
-- sample data
WITH dataset(Company, Country, Item_Details) AS (
    VALUES ('Google', 'US', array[
        map(array['created', 'product', 'amount'], array['2019-09-15', 'Nest', '$78,300']),
        map(array['created', 'product', 'amount'], array['2019-09-16', 'Nest1', '$79,300'])
    ])
)
-- query: one output row per map in the array
SELECT Company,
       Country,
       m['product'] AS product,
       m['amount'] AS amount
FROM dataset d,
     unnest(Item_Details) AS t(m);
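Output:
+---------+---------+---------+---------+
| Company | Country | product | amount  |
+=========+=========+=========+=========+
| Google  | US      | Nest    | $78,300 |
| Google  | US      | Nest1   | $79,300 |
+---------+---------+---------+---------+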
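The answer covers only the second (flattened) output. As a rough illustration of both requested shapes, here is a small Python model in which a list of dicts stands in for ARRAY(MAP(VARCHAR, VARCHAR)). The helper names collect_by_key and unnest are made up for this sketch. In Presto itself the first shape would presumably be built with the array function transform, e.g. transform(Item_Details, m -> m['product']), but that variant is not tested here.

```python
# Rows modeled on the question's sample: each Item_Details value is a list of
# dicts, standing in for Presto's ARRAY(MAP(VARCHAR, VARCHAR)).
rows = [
    ("Apple", "US", [
        {"created": "2019-09-15", "product": "apple watch", "amount": "$7,900"},
        {"created": "2022-09-19", "product": "iPhone", "amount": "$78,300"},
        {"created": "2021-01-13", "product": "Macbook Pro", "amount": "$163,980"},
    ]),
    ("Google", "US", [
        {"created": "2020-07-15", "product": "Nest", "amount": "$78,300"},
        {"created": "2021-07-15", "product": "Google phone", "amount": "$178,900"},
    ]),
]

KEYS = ("product", "amount")


def collect_by_key(items, keys=KEYS):
    """First output shape: one list of values per key, array order preserved."""
    return {key: [item.get(key) for item in items] for key in keys}


def unnest(table, keys=KEYS):
    """Second output shape: one flat row per map, similar to UNNEST."""
    return [
        (company, country, *(item.get(key) for key in keys))
        for company, country, items in table
        for item in items
    ]


collected = [(company, country, collect_by_key(items)) for company, country, items in rows]
flat = unnest(rows)

for row in collected:
    print(row)
for row in flat:
    print(row)
```

A missing key yields None here, which is how the sketch chooses to handle incomplete maps; the real column may need different handling.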

Related

Distinct Sum and Group by

I have a dataset [attached example] and I want to create 2 tables out of this;
+------+------------+-------+-------+-------+--------+
| corp | product | data | Group | sales | market |
+------+------------+-------+-------+-------+--------+
| A | Eli | 43831 | A | 100 | I |
| A | Eli | 43831 | B | 100 | I |
| B | Sut | 43831 | A | 80 | I |
| A | Api | 43831 | C | 50 | C or D |
| A | Api | 43831 | D | 50 | C or D |
| B | Konkurent2 | 43831 | C | 40 | C or D |
+------+------------+-------+-------+-------+--------+
First, sum(sales) by market with duplicated rows excluded: I want the sales for each market within a specific date range (the data column), counting each product only once. The duplicates exist because one product can belong to more than one group.
So the first table, for example for MRCC I, would look like:
+--------+-------+-------+
| market | sales | data |
+--------+-------+-------+
| I | 180 | 43831 |
+--------+-------+-------+
The second table should look like the one above, but with an additional 'dictionary' column holding each unique product name within the market and date, so for MRCC I it would look like:
+--------+-------+-------+----------------+
| market | sales | data  | unique product |
+--------+-------+-------+----------------+
| I      | 180   | 43831 | Eli            |
| I      | 180   | 43831 | Sut            |
+--------+-------+-------+----------------+
The thing is, I'm not that experienced in SQL and I'm fairly new to data processing. The system I am working in lets me do some of the data processing either with "visual" recipes or with SQL code, which I'm not that familiar with. Even more confusing, I can choose between 3 SQL engines: Impala, Hive, and Spark SQL. For example, to create the market column I used Impala, and the script looks like this (I'm not sure whether this is "pure" Impala syntax):
SELECT * from
(
-- mrc I --
SELECT *,case when
(`product`="Eli")
or
(`product`="Sut")
THEN "MRCC I"
end as market
FROM x.`y`
)a
where market is not null
Could you give me some tips on a structure of a code and if this is even possible?
Thanks,
eM
import spark.implicits._
import org.apache.spark.sql.functions._
case class Sale(
corp: String,
product: String,
data: Long,
group: String,
sales: Long,
market: String
)
val df = Seq(
Sale("A", "Eli", 43831, "A", 100, "I"),
Sale("A", "Eli", 43831, "B", 100, "I"),
Sale("B", "Sut", 43831, "A", 80, "I"),
Sale("A", "Api", 43831, "C", 50, "C or D"),
Sale("A", "Api", 43831, "D", 50, "C or D"),
Sale("B", "Konkurent2", 43831, "C", 40, "C or D")
).toDF()
val t2 = df.dropDuplicates(Seq("corp", "product", "data", "market"))
.groupBy("market", "product", "data").sum("sales")
.select(
'market,
col("sum(sales)").alias("sales"),
'data,
'product.alias("unique product")
)
t2.show(false)
// +------+-----+-----+--------------+
// |market|sales|data |unique product|
// +------+-----+-----+--------------+
// |I |80 |43831|Sut |
// |I |100 |43831|Eli |
// |C or D|40 |43831|Konkurent2 |
// |C or D|50 |43831|Api |
// +------+-----+-----+--------------+
val t1 = t2.drop("unique product")
.groupBy("market", "data").sum("sales")
.select(
'market,
col("sum(sales)").alias("sales"),
'data)
t1.show(false)
// +------+-----+-----+
// |market|sales|data |
// +------+-----+-----+
// |I |180 |43831|
// |C or D|90 |43831|
// +------+-----+-----+
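Since the question asked for SQL, here is a rough equivalent of the two tables in plain SQL, run through Python's built-in sqlite3 so it can be checked. The table name sales_raw and the column name grp (instead of group, which is a reserved word) are made up for this sketch, and it has not been tried on Impala, Hive, or Spark SQL. Note two differences from the Spark code above: duplicates are removed with SELECT DISTINCT over all columns except the group, and the second table repeats the market total next to each product, as in the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_raw (corp TEXT, product TEXT, data INTEGER, "
    "grp TEXT, sales INTEGER, market TEXT)"
)
conn.executemany(
    "INSERT INTO sales_raw VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", "Eli", 43831, "A", 100, "I"),
        ("A", "Eli", 43831, "B", 100, "I"),
        ("B", "Sut", 43831, "A", 80, "I"),
        ("A", "Api", 43831, "C", 50, "C or D"),
        ("A", "Api", 43831, "D", 50, "C or D"),
        ("B", "Konkurent2", 43831, "C", 40, "C or D"),
    ],
)

# Collapse rows that differ only by group, so each product is counted once.
DEDUP = "SELECT DISTINCT corp, product, data, sales, market FROM sales_raw"

# First table: total sales per market and date.
table1 = conn.execute(f"""
    SELECT market, SUM(sales) AS sales, data
    FROM ({DEDUP}) AS d
    GROUP BY market, data
    ORDER BY market
""").fetchall()

# Second table: the market total repeated next to each unique product.
table2 = conn.execute(f"""
    SELECT t.market, t.sales, t.data, d.product AS unique_product
    FROM ({DEDUP}) AS d
    JOIN (
        SELECT market, data, SUM(sales) AS sales
        FROM ({DEDUP}) AS x
        GROUP BY market, data
    ) AS t ON t.market = d.market AND t.data = d.data
    ORDER BY t.market, d.product
""").fetchall()

print(table1)
print(table2)
```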

How can I summarize / pivot data with oracle sql

I have a table containing geological resource information.
| Property | Zone | Area | Category | Tonnage | Au_gt | Au_oz |
|----------|------|-------------|-----------|---------|-------|-------|
| Ket | Eel | Open Pit | Measured | 43400 | 5.52 | 7700 |
| Ket | Eel | Open Pit | Inferred | 51400 | 5.88 | 9700 |
| Ket | Eel | Open Pit | Indicated | 357300 | 6.41 | 73600 |
| Ket | Eel | Underground | Measured | 3300 | 7.16 | 800 |
| Ket | Eel | Underground | Inferred | 14700 | 6.16 | 2900 |
| Ket | Eel | Underground | Indicated | 168100 | 8.85 | 47800 |
I would like to summarize the data so that it can be read more easily by our clients.
| Property | Zone | Category | Open_Pit_Tonnage | Open_Pit_Au_gt | Open_Pit_Au_oz | Underground_tonnage | Underground_au_gt | Underground_au_oz | Combined_tonnage | Combined_au_gt | Combined_au_oz |
|----------|------|-----------|------------------|----------------|----------------|---------------------|-------------------|-------------------|------------------|----------------|----------------|
| Ket | Eel | Measured | 43,400 | 5.52 | 7,700 | 3,300 | 7.16 | 800 | 46,700 | 5.64 | 8,500 |
| Ket | Eel | Indicated | 357,300 | 6.41 | 73,600 | 168,100 | 8.85 | 47,800 | 525,400 | 7.19 | 121,400 |
| Ket | Eel | Inferred | 51,400 | 5.88 | 9,700 | 14,700 | 6.16 | 2,900 | 66,100 | 5.94 | 12,600 |
I'm fairly new to pivot tables. How could I write a query to translate and summarize the data?
Thanks!
If your Oracle version is 11.1 or higher (which it should be if you are a relatively new user!) then you can use the PIVOT operator, as shown below.
Note that the result of the PIVOT operation can be given an alias (I used p) - this makes it easier to write the SELECT clause.
I assumed the name of your table is geological_data - replace it with your actual table name.
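select p.*
     , open_pit_tonnage + underground_tonnage as combined_tonnage
     -- the combined grade is the tonnage-weighted average of the two grades, not their sum
     , round((open_pit_tonnage * open_pit_au_gt + underground_tonnage * underground_au_gt)
             / (open_pit_tonnage + underground_tonnage), 2) as combined_au_gt
     , open_pit_au_oz + underground_au_oz as combined_au_oz
from geological_data
pivot (sum(tonnage) as tonnage, sum(au_gt) as au_gt, sum(au_oz) as au_oz
       for area in ('Open Pit' as open_pit, 'Underground' as underground)) p
;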
Conditional aggregation is a simple method:
select Property, Zone, Category,
       max(case when area = 'Open Pit' then tonnage end) as open_pit_tonnage,
       max(case when area = 'Open Pit' then Au_gt end) as open_pit_Au_gt,
       max(case when area = 'Open Pit' then Au_oz end) as open_pit_Au_oz,
       max(case when area = 'Underground' then tonnage end) as Underground_tonnage,
       max(case when area = 'Underground' then Au_gt end) as Underground_Au_gt,
       max(case when area = 'Underground' then Au_oz end) as Underground_Au_oz
from t
group by Property, Zone, Category
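The combined grade in the expected table (5.64 g/t for Measured) matches a tonnage-weighted average of the two areas rather than a sum or a plain mean of the grades. A minimal check of that reading, using the question's data loaded into an in-memory SQLite database (SQLite is only a stand-in here; the expression itself is ordinary SQL and should carry over to Oracle, though that is not verified):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geological_data (property TEXT, zone TEXT, area TEXT, "
    "category TEXT, tonnage REAL, au_gt REAL, au_oz REAL)"
)
conn.executemany(
    "INSERT INTO geological_data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Ket", "Eel", "Open Pit", "Measured", 43400, 5.52, 7700),
        ("Ket", "Eel", "Open Pit", "Inferred", 51400, 5.88, 9700),
        ("Ket", "Eel", "Open Pit", "Indicated", 357300, 6.41, 73600),
        ("Ket", "Eel", "Underground", "Measured", 3300, 7.16, 800),
        ("Ket", "Eel", "Underground", "Inferred", 14700, 6.16, 2900),
        ("Ket", "Eel", "Underground", "Indicated", 168100, 8.85, 47800),
    ],
)

# Combined columns only: tonnage and ounces add up, the grade is weighted by tonnage.
rows = conn.execute("""
    SELECT category,
           SUM(tonnage) AS combined_tonnage,
           ROUND(SUM(tonnage * au_gt) / SUM(tonnage), 2) AS combined_au_gt,
           SUM(au_oz) AS combined_au_oz
    FROM geological_data
    GROUP BY property, zone, category
    ORDER BY category
""").fetchall()

combined = {category: (tonnage, grade, ounces) for category, tonnage, grade, ounces in rows}
for category, values in combined.items():
    print(category, values)
```

The three grades come out as 5.64, 7.19, and 5.94, which agrees with the Combined_au_gt column the question asks for.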
The SQL Server PIVOT operator converts rows to columns.
In the example below, the goal is to turn the category names from the first column of the output into multiple columns and count the number of products in each category.
You can adapt this query to your table above:
SELECT * FROM
(
SELECT
category_name,
product_id,
model_year
FROM
production.products p
INNER JOIN production.categories c
ON c.category_id = p.category_id
) t
PIVOT(
COUNT(product_id)
FOR category_name IN (
[Children Bicycles],
[Comfort Bicycles],
[Cruisers Bicycles],
[Cyclocross Bicycles],
[Electric Bikes],
[Mountain Bikes],
[Road Bikes])
) AS pivot_table;

How to modify the following cypher syntax in AgensGraph?

MATCH (wu:wiki_user)
OPTIONAL MATCH (n:wiki_doc{author:wu.uid}), (o:wiki_doc{editor:wu.uid})
RETURN wu.uid AS User_id, wu.org AS Organization, wu.email AS email, wu.token AS balance,
count(n) AS Writing, count(o) AS Modifying;
user_id | organization | email | balance | writing | modifying
--------------------------------------------------------------------------
"ailee" | "Org2" | "hazel#gbc.com" | 5 | 0 | 0
"hazel" | "Org1" | "hazel#gbc.com" | 5 | 2 | 2
match (n:wiki_doc{editor:'hazel'}) return n;
n
wiki_doc[9.11]
{"bid": "hazel_doc1", "cid": "Basic", "org": "Org1", "title": "Hello world!",
"author": "hazel", "editor": "hazel", "revnum": 1, "created": "2018-09-25
09:00:000", "hasfile": 2, "contents": "I was wrong", "modified": "2018-09-25
10:00:000"}
(1 row)
In fact, hazel has modified only 1 document, but the query above reports 2.
How can I modify the query so that the correct count of 1 is returned?
MATCH( wu:wiki_user )
OPTIONAL MATCH (n:wiki_doc{author:wu.uid}), (o:wiki_doc{editor:wu.uid})
RETURN wu.uid AS User_id, wu.org AS Organization, wu.email AS email, wu.token AS balance,
count(distinct id(n)) as Writing, count(distinct id(o)) as Modifying;
user_id | organization | email | balance | writing | modifying
+----------------------------------------------------------+
"ailee" | "Org2" | "hazel#gbc.com" | 5 | 0 | 0
"hazel" | "Org1" | "hazel#gbc.com" | 5 | 2 | 1
(2 rows)
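Why the original counts were inflated: two comma-separated patterns in one MATCH are combined pairwise, so each authored document is paired with each edited one, and a plain count() counts pairs. A small Python model of that behaviour follows. The second document (id 9.12, edited by someone_else) is invented for illustration; only document 9.11 appears in the question.

```python
from itertools import product

# Documents for user "hazel": 9.11 is from the question, 9.12 is hypothetical.
docs = [
    {"id": "9.11", "author": "hazel", "editor": "hazel"},
    {"id": "9.12", "author": "hazel", "editor": "someone_else"},
]

uid = "hazel"
authored = [d for d in docs if d["author"] == uid]  # plays the role of n
edited = [d for d in docs if d["editor"] == uid]    # plays the role of o

# Every authored document is paired with every edited document.
pairs = list(product(authored, edited))

naive_writing = len(pairs)    # what count(n) sees
naive_modifying = len(pairs)  # what count(o) sees
distinct_writing = len({n["id"] for n, _ in pairs})
distinct_modifying = len({o["id"] for _, o in pairs})

print(naive_writing, naive_modifying)        # counts of pairs
print(distinct_writing, distinct_modifying)  # counts of distinct documents
```

Under the same model, a user who authored documents but edited none would get an empty pairing, so both counts would drop to 0. If that case matters, two separate OPTIONAL MATCH clauses may be worth testing.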

Retrieve closest road when given (lat, long) using OSM in Postgres with Postgis using SQL query

Given a set (lat, long) I am trying to find the maximum speed using "max_speed" and street type using "highway".
I have loaded my database (Postgres and Postgis) as follows:
$ osm2pgsql -c -d gis --slim -C 50000 /var/lib/postgresql/data/germany-latest.osm.pbf
The closest related question I could find was How to query all shops around a certain longitude/latitude using osm-postgis?. I have taken the query, and plugged in a (lat, long) that I found in google maps for the city center of Munich (as the post was also related to city center Munich and I have the map for Germany). The result turns up empty.
gis=# SELECT name, shop FROM planet_osm_point WHERE ST_DWithin(way ,ST_SetSrid(ST_Point(48.137969, 11.573829), 900913), 100);
name | shop
------+------
(0 rows)
Also when looking into the planet_osm_nodes, which contains (lat, long) pairs directly, I end up with no results:
gis=# SELECT * FROM planet_osm_nodes WHERE ((lat BETWEEN 470000000 AND 490000000) AND (lon BETWEEN 100000000 AND 120000000)) LIMIT 10;
id | lat | lon | tags
----+-----+-----+------
(0 rows)
I verified the data is in my database:
gis=# SELECT COUNT(*) FROM planet_osm_point;
count
---------
9924531
(1 row)
and
gis=# SELECT COUNT(*) FROM planet_osm_nodes;
count
-----------
288597897
(1 row)
So ideally, my question would be
Q: How can I find the "max speed" and "highway" given a set (lat, lon)
Alternatively, my question is:
Q: How do I get the query from the other stack overflow post to work?
My best guess is that I need to transform my (lat, lon) in some way, or that I simply have the wrong data for whatever reason.
Edit: added sample data as requested:
gis=# SELECT * FROM planet_osm_point LIMIT 1;
osm_id | access | addr:housename | addr:housenumber | addr:interpolation | admin_level | aerialway | aeroway | amenity | area | barrier | bicycle | brand | bridge | boundary | building | capital | construction | covered | culvert |
cutting | denomination | disused | ele | embankment | foot | generator:source | harbour | highway | historic | horse | intermittent | junction | landuse | layer | leisure | lock | man_made | military | motorcar | name | natural | off
ice | oneway | operator | place | poi | population | power | power_source | public_transport | railway | ref | religion | route | service | shop | sport | surface | toll | tourism | tower:type | tunnel | water | waterway | wetland | wi
dth | wood | z_order | way
-----------+--------+----------------+------------------+--------------------+-------------+-----------+---------+---------+------+---------+---------+-------+--------+----------+----------+---------+--------------+---------+---------+
---------+--------------+---------+-----+------------+------+------------------+---------+----------+----------+-------+--------------+----------+---------+-------+---------+------+----------+----------+----------+------+---------+----
----+--------+----------+-------+-----+------------+-------+--------------+------------------+---------+-----+----------+-------+---------+------+-------+---------+------+---------+------------+--------+-------+----------+---------+---
----+------+---------+----------------------------------------------------
304070863 | | | | | | | | | | | | | | | | | | | |
| | | | | | | | crossing | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | |
| | | 010100002031BF0D0048E17A94F19F2941CDCCCCDCC60D5741
(1 row)
and
gis=# SELECT * FROM planet_osm_nodes LIMIT 1;
id | lat | lon | tags
--------+-----------+----------+------
234100 | 666501948 | 80442755 |
(1 row)
Edit 2: There was a mention regarding "SRID", so I added example data from another table:
gis=# SELECT * FROM spatial_ref_sys LIMIT 1;
srid | auth_name | auth_srid | srtext
| proj4text
------+-----------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------
3819 | EPSG | 3819 | GEOGCS["HD1909",DATUM["Hungarian_Datum_1909",SPHEROID["Bessel 1841",6377397.155,299.1528128,AUTHORITY["EPSG","7004"]],TOWGS84[595.48,121.69,515.35,4.115,-2.9383,0.853,-3.408],AUTHORITY["EPSG","1024"]],PR
IMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","3819"]] | +proj=longlat +ellps=bessel +towgs84=595.48,121.69,515.35,4.115,-2.9383,0.853,-3.408 +no_defs
(1 row)
PostGIS geometry uses (x, y) axis order, so longitude comes first, then latitude.
Also, to convert a point from one SRID to another, use ST_Transform(), not ST_SetSRID().
ST_Transform actually reprojects your data from one coordinate system to another:
select st_astext(st_transform(ST_SetSrid(ST_Point(11.573829,48.137969), 4326),900913));
ST_SetSRID only changes the SRID label on the object and leaves the coordinates untouched:
select st_astext(ST_SetSrid(ST_Point(11.573829,48.137969),900913));
So you have to change your SQL as follows:
SELECT name, shop
FROM planet_osm_point
WHERE ST_DWithin(way,st_transform(ST_SetSrid(ST_Point(11.573829,48.137969), 4326),900913), 100);
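To see why the original query found nothing, it helps to compute roughly what the reprojection does. The sketch below uses the standard spherical Mercator formulas, which I assume is what SRID 900913 amounts to here; it is not a substitute for ST_Transform, only an illustration of the scale involved. The function name lonlat_to_web_mercator is made up.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by spherical (web) Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude in degrees to Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


munich_lon, munich_lat = 11.573829, 48.137969
projected = lonlat_to_web_mercator(munich_lon, munich_lat)

# The original query labelled the raw degrees (in lat, lon order) as 900913
# coordinates, i.e. a point a few dozen metres from the projection's origin.
mislabeled = (48.137969, 11.573829)
gap_m = math.dist(projected, mislabeled)

print(projected)
print(gap_m)
```

The projected point lies more than a million metres from the origin on both axes, so a 100-unit search radius around the mislabeled point cannot reach anything in Germany.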

Condition within Where Clause

I am looking to add an 'if else' condition within a where clause, but I'm not sure how to do it. Looking at the table below, I want the query to return all Product line subtypes and apply an extra condition only when [Product Line Subtype] = 'Marine'. When it is Marine, only two combinations of Section and Profit Code should be kept and all other combinations omitted.
Combination 1 When Prod line Subtype = Marine then Section = Inland and Profit Code = Builders
Combination 2 When Prod line Subtype = Marine then Section = Ocean and Profit Code = Stocks
My actual table has more distinct combinations for Prod line Subtype = Marine than shown in the table below, but I want only the above two combinations in my result set. Any help would be much appreciated!
Main table
+-------------------+----------+-------------+
| Prod line Subtype | Section  | Profit Code |
+-------------------+----------+-------------+
| Marine            | Inland   | Builders    |
| Marine            | Ocean    | Stock       |
| Property          | General  | Transport   |
| Energy            | Source   | Others      |
| Property          | General  | Transport   |
| Energy            | Source   | Transport   |
| Marine            | Inland   | Transport   |
| Marine            | Floaters | Transport   |
| Marine            | Cargo    | Others      |
+-------------------+----------+-------------+
Expected Results
+-------------------+----------+-------------+
| Prod line Subtype | Section  | Profit Code |
+-------------------+----------+-------------+
| Marine            | Inland   | Builders    |
| Marine            | Ocean    | Stock       |
| Property          | General  | Transport   |
| Energy            | Source   | Others      |
| Property          | General  | Transport   |
| Energy            | Source   | Transport   |
+-------------------+----------+-------------+
My query attempt:
select *
from #Step1 c1
where c1.row_ord = 1
and c1.[Prod Line Subtype] = 'Marine' AND (
(c1.[Section] = 'Inland' AND c1.[Profit Code] = 'Builder')
OR (c1.[Section] = 'Ocean' AND c1.[Profit Code] = 'Stock')
)
What about:
Select * from [Your table]
Where ([Prod line Subtype] <> 'Marine' Or
       (Section = 'Inland' And [Profit Code] = 'Builders') Or
       (Section = 'Ocean' And [Profit Code] = 'Stocks')
      )
You can omit [Prod line Subtype] = 'Marine' from the Or branches: any row that fails the first condition is already Marine.
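A quick way to check this filter against the sample data is to load the main table into an in-memory SQLite database. The table name step1 and the simplified column names are stand-ins chosen for this sketch, and the profit code is spelled 'Stock' as in the sample table (the prose in the question says 'Stocks'), so adjust the literal to whatever the real data contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE step1 (subtype TEXT, section TEXT, profit_code TEXT)")

data = [
    ("Marine", "Inland", "Builders"),
    ("Marine", "Ocean", "Stock"),
    ("Property", "General", "Transport"),
    ("Energy", "Source", "Others"),
    ("Property", "General", "Transport"),
    ("Energy", "Source", "Transport"),
    ("Marine", "Inland", "Transport"),
    ("Marine", "Floaters", "Transport"),
    ("Marine", "Cargo", "Others"),
]
conn.executemany("INSERT INTO step1 VALUES (?, ?, ?)", data)

# Non-Marine rows always pass; Marine rows pass only for the two allowed pairs.
kept = conn.execute("""
    SELECT subtype, section, profit_code
    FROM step1
    WHERE subtype <> 'Marine'
       OR (section = 'Inland' AND profit_code = 'Builders')
       OR (section = 'Ocean' AND profit_code = 'Stock')
    ORDER BY rowid
""").fetchall()

for row in kept:
    print(row)
```

The six rows returned are the same as the expected results in the question; the three remaining Marine rows are filtered out.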