Google Public Patent Data SQL (BigQuery)

I am trying to retrieve specific CPC codes AND assignees via SQL from the Google public patent data. I am trying to search for the assignee "VOLKSWAGEN" and the cpc.code "H01M8".
But I got the error:
No matching signature for operator = for argument types: ARRAY<STRUCT<name STRING, country_code STRING>>, STRING. Supported signature: ANY = ANY at [15:3]
Code:
SELECT
  publication_number application_number,
  family_id,
  publication_date,
  filing_date,
  priority_date,
  priority_claim,
  ipc,
  cpc.code,
  inventor,
  assignee_harmonized,
FROM
  `patents-public-data.patents.publications`
WHERE
  assignee_harmonized = "VOLKSWAGEN" AND cpc.code = "H01M8"
LIMIT
  1000
I'm also interested in searching multiple assignees such as:
in ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")
I have recently started to work with SQL and do not see the mistake :/
Many thanks for your help!

Many thanks! I have now created this code to screen multiple companies.
Is it possible to get the cpc__u.code values combined into one cell per row, with a ", " to separate the codes in the output string? I would like the same for assignee_harmonized__u.name here as well!
Do you think the companies will be screened correctly with this procedure and the IN operator?
SELECT
  publication_number application_number,
  family_id,
  publication_date,
  filing_date,
  priority_date,
  priority_claim,
  cpc__u.code,
  inventor,
  assignee_harmonized,
  assignee
FROM
  `patents-public-data.patents.publications`,
  UNNEST(assignee_harmonized) AS assignee_harmonized__u,
  UNNEST(cpc) AS cpc__u
WHERE
  assignee_harmonized__u.name IN ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")
  AND cpc__u.code LIKE "H01M8%"
LIMIT
  100000

In Google BigQuery, UNNEST is needed to access ARRAY elements. This is described here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays
The following query works for me.
SELECT
  publication_number application_number,
  family_id,
  publication_date,
  filing_date,
  priority_date,
  priority_claim,
  ipc,
  cpc__u.code,
  inventor,
  assignee_harmonized
FROM
  `patents-public-data.patents.publications`,
  UNNEST(assignee_harmonized) AS assignee_harmonized__u,
  UNNEST(cpc) AS cpc__u
WHERE
  assignee_harmonized__u.name = "VOLKSWAGEN AG"
  AND cpc__u.code LIKE "H01M8%"
LIMIT
  1000
The following are the changes I made to generate results:
UNNEST(assignee_harmonized) AS assignee_harmonized__u, to access assignee_harmonized__u.name.
UNNEST(cpc) AS cpc__u, to access cpc__u.code.
assignee_harmonized__u.name = "VOLKSWAGEN AG", as "VOLKSWAGEN" returns no results.
cpc__u.code LIKE "H01M8%", as "H01M8" returns no results. An example value is H01M8/10.
This returns the following:
Query complete (2.3 sec elapsed, 29.2 GB processed)
If you want to screen multiple assignee names, IN will work as in the following; however, you need an exact match like VOLKSWAGEN AG or AUDI AG.
assignee_harmonized__u.name IN ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")
If you want to do a LIKE style match with multiple strings, you can try REGEXP_CONTAINS:
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains
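For example, a minimal sketch (the alternatives in the pattern are illustrative; extend them to your full company list):

WHERE
  REGEXP_CONTAINS(assignee_harmonized__u.name, r"VOLKSWAGEN|PORSCHE|AUDI")
  AND cpc__u.code LIKE "H01M8%"

Regarding the follow-up about getting all codes into one cell per row: here is a hedged sketch using STRING_AGG, assuming you want one row per publication_number (this aggregation is my suggestion, not something from the patents dataset documentation):

SELECT
  publication_number,
  STRING_AGG(DISTINCT cpc__u.code, ", ") AS cpc_codes,
  STRING_AGG(DISTINCT assignee_harmonized__u.name, ", ") AS assignees
FROM
  `patents-public-data.patents.publications`,
  UNNEST(assignee_harmonized) AS assignee_harmonized__u,
  UNNEST(cpc) AS cpc__u
WHERE
  cpc__u.code LIKE "H01M8%"
GROUP BY
  publication_number

STRING_AGG collapses the unnested rows back into one comma-separated string per publication.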

Related

Scrape table from JSP website using Python

I would like to scrape the table that appears when you go to this website: https://www.eprocure.gov.bd/resources/common/SearcheCMS.jsp
I used the following code based on the example shown here.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
import time
import pandas as pd

options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(executable_path="C:/Users/DefaultUser/AppData/geckodriver.exe")
driver.get("https://www.eprocure.gov.bd/resources/common/SearcheCMS.jsp")
time.sleep(5)
res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

soup = BeautifulSoup(res, 'html.parser')
table_rows = soup.find_all('table')[1].find_all('tr')
rows = []
for tr in table_rows:
    td = tr.find_all('td')
    rows.append([i.text for i in td])
delaydata = rows[3:]
df = pd.DataFrame(delaydata, columns=['S. No.', 'Ministry, Division, Organization PE', 'Procurement Nature, Type & Method', 'Tender/Proposal ID, Ref No., Title & Publishing Date', 'Contract Awarded To', 'Company Unique ID', 'Experience Certificate No', 'Contract Amount', 'Contract Start & End Date', 'Work Status'])
df
Finding the URL
Well, actually, there's no need to use Selenium. The data is available by sending a POST request to:
https://www.eprocure.gov.bd/AdvSearcheCMSServlet
How did I find this URL?
If you inspect your browser's Network calls (press F12) while the page loads, you'll see a POST request to that URL. Take note of its "Payload" tab: it is used as the data in the example below.
Great, but how do I get the data, including pagination?
To get the data, including pagination, see the following example, where we fetch the HTML table and increase pageNo to move through the pages (this is for the "eTenders" table/tab):
import requests
import pandas as pd
from bs4 import BeautifulSoup

data = {
    "action": "geteCMSList",
    "keyword": "",
    "officeId": "0",
    "contractAwardTo": "",
    "contractStartDtFrom": "",
    "contractStartDtTo": "",
    "contractEndDtFrom": "",
    "contractEndDtTo": "",
    "departmentId": "",
    "tenderId": "",
    "procurementMethod": "",
    "procurementNature": "",
    "contAwrdSearchOpt": "Contains",
    "exCertSearchOpt": "Contains",
    "exCertificateNo": "",
    "tendererId": "",
    "procType": "",
    "statusTab": "eTenders",
    "pageNo": "1",
    "size": "10",
    "workStatus": "All",
}

_columns = [
    "S. No",
    "Ministry, Division, Organization, PE",
    "Procurement Nature, Type & Method",
    "Tender/Proposal ID, Ref No., Title..",
    "Contract Awarded To",
    "Company Unique ID",
    "Experience Certificate No",
    "Contract Amount",
    "Contract Start & End Date",
    "Work Status",
]

for page in range(1, 11):  # <--- Increase the number of pages here
    print(f"Page: {page}")
    data["pageNo"] = page
    response = requests.post(
        "https://www.eprocure.gov.bd/AdvSearcheCMSServlet", data=data
    )
    # The response HTML is missing a `table` tag, so we need to add it
    soup = BeautifulSoup("<table>" + response.text + "</table>", "html.parser")
    df = pd.read_html(str(soup))[0]
    df.columns = _columns
    print(df.to_string())
Going further
How do I select the different tabs/tables on the page?
To select the different tabs on the page, you can change the "statusTab" in the data. Inspect the payload tab again, and you'll see what I mean.
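For example (hedged: "eContracts" is a hypothetical value I have not verified; read the real tab names from the Payload tab in your own DevTools):

data["statusTab"] = "eContracts"  # hypothetical tab name, not verified against the site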
Output
The above code outputs:
S. No Ministry, Division, Organization, PE Procurement Nature, Type & Method Tender/Proposal ID, Ref No., Title.. Contract Awarded To Company Unique ID Experience Certificate No Contract Amount Contract Start & End Date Work Status
0 1 Ministry of Education, Education Engineering Department, Office of the Executive Engineer, EED,Kishoreganj Zone. Works, NCT, LTM 300580, 932/EE/EED/KZ/Rev-5974/2018-19/23, Dt: 28/03/2019 Repair and Renovation Works at Chowganga Shahid Smrity High School Itna Kishoreganj. 01-Apr-2019 M/S KAZI RASEL NIRMAN SONGSTA 1051854 WD-5974- 25/e-GP/20221228/300580/0060000 475000.000 10-Jun-2019 03-Sep-2019 Completed
1 2 Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division Works, NCT, LTM 558656, CMD/T-19/100 Dated: 14-03-2021 Manufacturing supplying & installation of 01 No MS Flap gate size - 1.65 m 1.95m and 01 no. Padestal type lifting device for sluice no S-15 6-vent 02 nos MS Vertical gate size - 1.65 m 1.95m for sluice no S-15 6-vent and sluice no S-14 new 1-vent at Coxs Bazar Sadar Upazilla of CEP Polder No 66/1 under Coxsbazar O&M Division implemented by Chattogram Mechanical Division BWDB Madunaghat Chattogram during the financial year 2020-21. 15-Mar-2021 M/S. AN Corporation 1063426 CMD/COX/LTM-16/2020-21/e-GP/20221228/558656/0059991 503470.662 12-Apr-2021 05-May-2021 Completed
2 3 Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division Works, NCT, LTM 633496, CMD/T-19/263 Dated: 30-11-2021 Manufacturing, supplying & installation of 07 No M.S Flap gate for sluice no.- 6 (1-vent), sluice no.- 7 (2-vent), sluice no.-8 (2-vent), sluice no.-35 (2-vent) size :- (1.00 m × 1.00m), 01 No Padestal type lifting device for sluice no- 13(1-vent) for CEP Polder No 64/2B, at pekua Upazilla under Chattogram Mechanical Division, BWDB, Madunaghat, Chattogram, during the financial year 2021-22. 30-Nov-2021 M/S. AN Corporation 1063426 CMD/LTM-08/2021-22/e-GP/20221228/633496/0059989 648808.272 26-Dec-2021 31-Jan-2022 Completed
...
...

How to get section heading of tables in wikipedia through API

How do I get section headings for individual tables: Xia dynasty (夏朝) (2070–1600 BC), Shang dynasty (商朝) (1600–1046 BC), Zhou dynasty (周朝) (1046–256 BC) etc. for the Chinese Monarchs list on Wikipedia via API? I use the code below to connect:
from pprint import pprint
import requests, wikitextparser
r = requests.get(
    'https://en.wikipedia.org/w/api.php',
    params={
        'action': 'query',
        'titles': 'List_of_Chinese_monarchs',
        'prop': 'revisions',
        'rvprop': 'content',
        'format': 'json',
    }
)
r.raise_for_status()
pages = r.json()['query']['pages']
body = next(iter(pages.values()))['revisions'][0]['*']
doc = wikitextparser.parse(body)
print(f'{len(doc.tables)} tables retrieved')
han = doc.tables[5].data()
doc.tables[6].data()
doc.tables[i].data() only returns the table values, without the <h2> section headings. I would like the API to return a list of title strings that correspond to each of the 83 tables returned.
Original website:
https://en.wikipedia.org/wiki/List_of_Chinese_monarchs
I'm not sure why you are using doc.tables when it is the sections you are interested in. This works for me:
for i in range(1, 94):
    print(doc.sections[i].title.replace('[[', '').replace(']]', ''))
I get 94 sections rather than 83, and while you can use len(doc.sections), that count will include See also etc. There must be a more elegant way of removing the wikilinks.
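If it helps, here is a small sketch with a plain regular expression (my own suggestion, not part of wikitextparser's API) that also handles piped links like [[target|label]]:

import re

def strip_wikilinks(title):
    # [[target|label]] -> label, [[target]] -> target
    return re.sub(r'\[\[(?:[^|\]]*\|)?([^\]]*)\]\]', r'\1', title)

for section in doc.sections[1:]:
    if section.title:  # the lead section has no title
        print(strip_wikilinks(section.title))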

amadeus API list of all possible hotel "amenities"

In the Amadeus hotels API there are amenities choices, and the search results contain various amenity codes as well.
To make the amenities more user-readable, I'd like a FULL list of ALL the different possible amenities, so that I can populate a database with each amenity code and its translations.
For a client searching for hotels, values like ACC_BATHS and SAFE_DEP_BOX are not exactly reader-friendly...
I'm referring to this:
{
  "data": [
    {
      "type": "hotel-offers",
      "hotel": {
        "type": "hotel",
        "cityCode": "MIA",
        ...
        "amenities": [
          "HANDICAP_FAC",
          "ACC_BATHS",
          "ACC_WASHBASIN",
          "ACC_BATH_CTRLS",
          "ACC_LIGHT_
Where can I find a CSV of all amenities?
I contacted Amadeus tech support and they answered me with the list below.
(You can copy this list; it's CSV format: NAME_OF_AMENITY,amenity_code.)
226 codes:
PHOTOCOPIER,BUS.2
PRINTER,BUS.28
AUDIO-VIS_EQT,BUS.37
WHITE/BLACKBOARD,BUS.38
BUSINESS_CENTER,BUS.39
CELLULAR_PHONE_RENTAL,BUS.40
COMPUTER_RENTAL,BUS.41
EXECUTIVE_DESK,BUS.42
LCD/PROJECTOR,BUS.45
MEETING_ROOMS,BUS.46
OVERHEAD_PROJECTOR,BUS.48
SECRETARIAL_SERVICES,BUS.49
CONFERENCE_SUITE,BUS.94
CONVENTION_CTR,BUS.95
MEETING_FACILITIES,BUS.96
24_HOUR_FRONT_DESK,HAC.1
DISABLED_FACILITIES,HAC.101
MULTILINGUAL_STAFF,HAC.103
WEDDING_SERVICES,HAC.104
BANQUETING_FACILITIES,HAC.105
PORTER/BELLBOY,HAC.106
BEAUTY_PARLOUR,HAC.107
WOMENS_GST_RMS,HAC.110
PHARMACY,HAC.111
120_AC,HAC.113
120_DC,HAC.114
220_AC,HAC.115
220_DC,HAC.117
BARBECUE,HAC.118
BUTLER_SERVICE,HAC.136
CAR_RENTAL,HAC.15
CASINO,HAC.16
BAR,HAC.165
LOUNGE,HAC.165
TRANSPORTATION,HAC.172
WIFI,HAC.178
WIRELESS_CONNECTIVITY,HAC.179
BALLROOM,HAC.191
BUS_PARKING,HAC.192
CHILDRENS_PLAY_AREA,HAC.193
NURSERY,HAC.194
DISCO,HAC.195
24_HOUR_ROOM_SERVICE,HAC.2
COFFEE_SHOP,HAC.20
BAGGAGE_STORAGE,HAC.201
NO_KID_ALLOWED,HAC.217
KIDS_WELCOME,HAC.218
COURTESY_CAR,HAC.219
CONCIERGE,HAC.22
NO_PORN_FILMS,HAC.220
INT_HOTSPOTS,HAC.221
FREE_INTERNET,HAC.222
INTERNET_SERVICES,HAC.223
PETS_ALLOWED,HAC.224
FREE_BREAKFAST,HAC.227
CONFERENCE_FACILITIES,HAC.24
HI_INTERNET,HAC.259
EXCHANGE_FAC,HAC.26
LOBBY,HAC.276
DOCTOR_ON_CALL,HAC.28
24H_COFFEE_SHOP,HAC.281
AIRPORT_SHUTTLE,HAC.282
LUGGAGE_SERVICE,HAC.283
PIANO_BAR,HAC.284
VIP_SECURITY,HAC.285
DRIVING_RANGE,HAC.30
DUTY_FREE_SHOP,HAC.32
ELEVATOR,HAC.33
EXECUTIVE_FLR,HAC.34
GYM,HAC.35
EXPRESS_CHECK_IN,HAC.36
EXPRESS_CHECK_OUT,HAC.37
FLORIST,HAC.39
CONNECTING_ROOMS,HAC.4
FREE_AIRPORT_SHUTTLE,HAC.41
FREE_PARKING,HAC.42
FREE_TRANSPORTATION,HAC.43
GAMES_ROOM,HAC.44
GIFT_SHOP,HAC.45
HAIRDRESSER,HAC.46
ICE_MACHINES,HAC.52
GARAGE_PARKING,HAC.53
JACUZZI,HAC.55
JOGGING_TRACK,HAC.56
KENNELS,HAC.57
LAUNDRY_SVC,HAC.58
AIRLINE_DESK,HAC.6
LIVE_ENTERTAINMENT,HAC.60
MASSAGE,HAC.61
NIGHT_CLUB,HAC.62
SWIMMING_POOL,HAC.66
PARKING,HAC.68
ATM/CASH_MACHINE,HAC.7
POOLSIDE_SNACK_BAR,HAC.72
RESTAURANT,HAC.76
ROOM_SERVICE,HAC.77
SAFE_DEP_BOX,HAC.78
SAUNA,HAC.79
BABY-SITTING,HAC.8
SOLARIUM,HAC.83
SPA,HAC.84
CONVENIENCE_STOR,HAC.88
PICNIC_AREA,HAC.9
THEATRE_DESK,HAC.90
TOUR_DESK,HAC.91
TRANSLATION_SERVICES,HAC.92
TRAVEL_AGENCY,HAC.93
VALET_PARKING,HAC.97
VENDING_MACHINES,HAC.98
TELECONFERENCE,MRC.121
VOLTAGE_AVAILABLE,MRC.123
NATURAL_DAYLIGHT,MRC.126
GROUP_RATES,MRC.141
INTERNET-HIGH_SPEED,MRC.17
VIDEO_CONF_FACILITIES,MRC.53
ACC_BATHS,PHY.102
BR/L_PRINT_LIT,PHY.103
ADAPT_RM_DOORS,PHY.104
ACC_RM_WCHAIR,PHY.105
SERV_SPEC_MENU,PHY.106
WIDE_ENTRANCE,PHY.107
WIDE_CORRIDORS,PHY.108
WIDE_REST_ENT,PHY.109
ACC_LIGHT_SW,PHY.15
ACC_WCHAIR,PHY.28
SERV_DOGS_ALWD,PHY.29
ACC_WASHBASIN,PHY.3
ACC_TOILETS,PHY.32
ADAPT_BATHROOM,PHY.38
HANDRAIL_BTHRM,PHY.38
ADAPTED_PHONES,PHY.39
ACC_ELEVATORS,PHY.42
TV_SUB/CAPTION,PHY.45
DIS_PARKG,PHY.50
EMERG_COD/BUT,PHY.57
HANDICAP_FAC,PHY.6
DIS_EMERG_PLAN,PHY.60
HEAR_IND_LOOPS,PHY.65
BR/L_PRNT_MENU,PHY.66
DIS_TRAIN_STAF,PHY.71
PIL_ALARMS_AVL,PHY.76
ACC_BATH_CTRLS,PHY.79
PUTTING_GREEN,REC.5
TROUSER_PRESS,RMA.111
VIDEO,RMA.116
GAMES_SYSTEM_IN_ROOM,RMA.117
VOICEMAIL_IN_ROOM,RMA.118
WAKEUP_SERVICE,RMA.119
WI-FI_IN_ROOM,RMA.123
CD_PLAYER,RMA.129
BATH,RMA.13
MOVIE_CHANNELS,RMA.139
SHOWER,RMA.142
OUTLET_ADAPTERS,RMA.159
BIDET,RMA.16
DVD_PLAYER,RMA.163
CABLE_TELEVISION,RMA.18
OVERSIZED_ROOMS,RMA.185
TEA/COFFEE_MK_FACILITIES,RMA.19
AIR_CONDITIONING,RMA.2
TELEVISION,RMA.20
ANNEX_ROOM,RMA.204
FREE_NEWSPAPER,RMA.205
HONEYMOON_SUITES,RMA.206
INTERNETFREE_HIGH_IN_RM,RMA.207
MAID_SERVICE,RMA.208
PC_HOOKUP_INRM,RMA.209
PC_IN_ROOM,RMA.21
SATELLITE_TV,RMA.210
VIP_ROOMS,RMA.211
CORDLESS_PHONE,RMA.25
CRIBS_AVAILABLE,RMA.26
ALARM_CLOCK,RMA.3
PHONE-DIR_DIAL,RMA.31
FAX_FAC_INROOM,RMA.38
FREE_LOCAL_CALLS,RMA.45
HAIR_DRYER,RMA.50
INTERNET-HI_SPEED_IN_RM,RMA.51
IRON/IRON_BOARD,RMA.55
KITCHEN,RMA.59
BABY_LISTENING_DEVICE,RMA.6
LAUNDRY_EQUIPMENT_IN_ROOM,RMA.66
MICROWAVE,RMA.68
MINIBAR,RMA.69
NONSMOKING_RMS,RMA.74
REFRIGERATOR,RMA.88
ROLLAWAY_BEDS,RMA.91
SAFE,RMA.92
WATER_SPORTS,RST.110
ANIMAL_WATCHING,RST.126
BIRD_WATCHING,RST.127
SIGHTSEEING,RST.142
BEACH_WITH_DIRECT_ACCESS,RST.155
SKI_IN/OUT,RST.156
TENNIS_PROFESSIONAL,RST.157
FISHING,RST.20
GOLF,RST.27
FITNESS_CENTER,RST.36
BEACH,RST.5
HORSE_RIDING,RST.61
INDOOR_TENNIS,RST.62
MINIATURE_GOLF,RST.67
BOATING,RST.7
TENNIS,RST.71
SCUBA_DIVING,RST.82
SKEET_SHOOTING,RST.85
SNOW_SKIING,RST.88
BOWLING,RST.9
VOLLEYBALL,RST.98
ELEC_GENERATOR,SEC.15
EMERG_LIGHTING,SEC.19
FIRE_DETECTORS,SEC.22
GUARDED_PARKG,SEC.34
RESTRIC_RM_ACC,SEC.39
EXT_ROOM_ENTRY,SEC.40
INT_ROOM_ENTRY,SEC.41
SMOKE_DETECTOR,SEC.50
ROOMS_WITH_BALCONIES,SEC.51
SPRINKLERS,SEC.54
FIRST_AID_STAF,SEC.57
SECURITY_GUARD,SEC.58
VIDEO_SURVEIL,SEC.62
EXTINGUISHERS,SEC.89
FIRE_SAFETY,SEC.9
FEMA_FIRE_SAFETY_COMPLIANT,SEC.93
FIRE_SAF_NOT_STANDARD,SEC.95
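If it helps, here is a minimal sketch (my own, not from Amadeus) of turning that list into a lookup of readable names; the prettifying rule is just a placeholder you would replace with curated translations:

# Paste the full 226-line list from above into RAW
RAW = """PHOTOCOPIER,BUS.2
PRINTER,BUS.28
SAFE_DEP_BOX,HAC.78"""

amenity_names = {}
for line in RAW.strip().splitlines():
    name, code = line.rsplit(",", 1)
    # Crude prettifier: "SAFE_DEP_BOX" -> "Safe Dep Box"
    amenity_names[name] = name.replace("_", " ").title()

print(amenity_names["SAFE_DEP_BOX"])  # Safe Dep Box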
According to the API reference, you can filter the offers by amenities:
https://developers.amadeus.com/self-service/category/hotel/api-doc/hotel-search/api-reference
I assume the multiple-select list in the amenities property contains all the items you need.
EDIT: I noticed that, unfortunately, the response example contains additional values beyond those in the input list, so the input list alone is not enough.
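As a rough sketch of that filter (hedged: the endpoint, the parameter names, and the example hotel ID are taken from my reading of the linked reference and are not verified here; you also need an OAuth2 access token first):

import requests

token = "YOUR_ACCESS_TOKEN"  # obtained from the Amadeus OAuth2 token endpoint (not shown)
resp = requests.get(
    "https://test.api.amadeus.com/v3/shopping/hotel-offers",
    headers={"Authorization": f"Bearer {token}"},
    params={"hotelIds": "MCLONGHM", "amenities": "SWIMMING_POOL,SPA"},
)
print(resp.json())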

Oracle-XML : XMLAGG / XMLELEMENT / XMLATTRIBUTES in a SQL query to group by a specific field

The below SQL:
SELECT
XMLELEMENT("classics", xmlattributes('xxxxxxxxx' AS "eventId"),
XMLELEMENT("author",xmlattributes(FIRST_NAME AS "firstName", LAST_NAME AS "lastName", BIRTH AS "dob", DEATH AS "dod" )),
XMLELEMENT("bibliography",
XMLELEMENT("type",xmlattributes(DESCRIPTION AS "desc"),
XMLELEMENT("award", xmlattributes(NOBEL AS "nobelPrize"))),
XMLELEMENT("books",xmlattributes(BOOK_TITLE AS "title", PUBLISHED_DATE AS "published" ))))
FROM CLASSICS
WHERE AUTHOR_ID=23;
does not group correctly by book. What I am trying to achieve is to have this XML as the result:
<?xml version="1.0" encoding="UTF-8"?>
<classics eventId="234567890">
  <author firstName="Ernest " lastName="Hemingway" dob="1899-07-21" dod="1961-07-02" />
  <bibliography>
    <inner>
      <type desc="Novel">
        <award nobel="true" />
      </type>
      <books>
        <inner title="The Old Man And The Sea" published="1952" />
        <inner title="For Whom The Bell Tolls" published="1940" />
        <inner title="A Farewell To Arms" published="1929" />
      </books>
    </inner>
  </bibliography>
</classics>
At the moment I get 3 records - 3 distinct XML documents, one for each of the books. I have tried to use XMLAGG to group by AUTHOR_ID (this field links each book in the CLASSICS table to its author). It is a very simple structure - one author has published at least one book, or many - and I need to store the "classics" XML object for an author containing an array of books inside another array, "bibliography".
This is the code I tried to use inside the XMLELEMENT "bibliography" for the "books" array, with no luck:
SELECT XMLAGG(
         XMLELEMENT("books",
           XMLATTRIBUTES(BOOK_TITLE AS "title", PUBLISHED_DATE AS "published")))
FROM CLASSICS
GROUP BY AUTHOR_ID
The main goal in the end is to reach this JSON structure after I have the XML:
{
   "eventId": "234567890",
   "author": {
      "firstName": "Ernest",
      "lastName": "Hemingway",
      "dob": "1899-07-21",
      "dod": "1961-07-02"
   },
   "bibliography": [
      {
         "type": {
            "desc": "Novel",
            "award": {
               "nobelPrize": "1954"
            }
         },
         "books": [
            {
               "title": "The Old Man And The Sea",
               "published": "1952"
            },
            {
               "title": "For Whom The Bell Tolls",
               "published": "1940"
            },
            {
               "title": "A Farewell To Arms",
               "published": "1929"
            }
         ]
      }
   ]
}
But it seems quite complicated to declare arrays of objects in XML/SQL.
Any suggestions?
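One possible shape, sketched here with XMLAGG inside a single grouped row (hedged: the column names are the ones from the query above, and the MAX() calls assume the non-book columns are constant per author):

SELECT XMLELEMENT("classics",
         XMLATTRIBUTES('234567890' AS "eventId"),
         XMLELEMENT("author",
           XMLATTRIBUTES(MAX(FIRST_NAME) AS "firstName", MAX(LAST_NAME) AS "lastName",
                         MAX(BIRTH) AS "dob", MAX(DEATH) AS "dod")),
         XMLELEMENT("bibliography",
           XMLELEMENT("inner",
             XMLELEMENT("type", XMLATTRIBUTES(MAX(DESCRIPTION) AS "desc"),
               XMLELEMENT("award", XMLATTRIBUTES(MAX(NOBEL) AS "nobelPrize"))),
             XMLELEMENT("books",
               XMLAGG(
                 XMLELEMENT("inner",
                   XMLATTRIBUTES(BOOK_TITLE AS "title", PUBLISHED_DATE AS "published"))
                 ORDER BY PUBLISHED_DATE DESC)))))
FROM CLASSICS
WHERE AUTHOR_ID = 23
GROUP BY AUTHOR_ID;

The GROUP BY collapses the three book rows into one, so XMLAGG can emit one <inner> per book inside <books>.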

How can I query max date? (PostgreSQL)

Using PostgreSQL, I would like to see only the Document IDs with the latest modification timestamp. I am having difficulty getting this working and was wondering if anyone had any pointers?
Here is my current code:
SELECT cmsdw_document.document_id as "Document ID",
cmsdw_activity_meta.activity_name as "Activity Name",
cmsdw_document.title as "Title",
cmsdw_document.creation_ts as "Creation Timestamp",
cmsdw_document.modification_ts as "Modification Timestamp",
cmsdw_user.firstname as "First Name",
cmsdw_user.lastname as "Last Name",
cmsdw_container.name as "Name",
cmsdw_document_stats_fact.content_id as "Content ID",
cmsdw_document_stats_fact.views as "Views",
cmsdw_document_stats_fact.likes as "Likes",
cmsdw_document_stats_fact.bookmarks as "Bookmarks",
cmsdw_document_stats_fact.comments as "Comments",
cmsdw_document_stats_fact.shares as "Shares",
cmsdw_document_stats_fact.unique_viewers as "Unique Viewers"
FROM
public.cmsdw_document,
public.cmsdw_document_stats_fact,
public.cmsdw_container,
public.cmsdw_object,
public.cmsdw_user,
public.cmsdw_activity_fact,
public.cmsdw_activity_meta
WHERE
cmsdw_activity_fact.activity_type = cmsdw_activity_meta.activity_type AND
cmsdw_document_stats_fact.content_id = cmsdw_object.object_id AND
cmsdw_document.document_id = cmsdw_object.object_id AND
cmsdw_container.container_id = cmsdw_document.container_id AND
cmsdw_object.dw_object_id = cmsdw_activity_fact.direct_dw_object_id AND
cmsdw_object.object_type = cmsdw_activity_fact.direct_object_type AND
cmsdw_activity_fact.user_id = cmsdw_user.user_id AND
cmsdw_container.name = 'Getting Started' AND
cmsdw_object.object_type = 102 AND
cmsdw_activity_fact.activity_type = 20;
You should fix your query to have proper join syntax -- simple rule: never use commas in the from clause.
For your query, you can replace the select with:
select distinct on (cmsdw_document.document_id) . . .
The ". . ." is the rest of your query. Then add:
order by cmsdw_document.document_id, cmsdw_document.modification_ts desc
This should give you the latest document, using a Postgres extension.
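Put together, a sketch of how it could look (the SELECT list is abbreviated; the join conditions are the ones from your WHERE clause, rewritten as explicit JOINs):

SELECT DISTINCT ON (cmsdw_document.document_id)
       cmsdw_document.document_id AS "Document ID",
       cmsdw_document.title AS "Title",
       cmsdw_document.modification_ts AS "Modification Timestamp"
FROM public.cmsdw_document
JOIN public.cmsdw_object
  ON cmsdw_object.object_id = cmsdw_document.document_id
JOIN public.cmsdw_document_stats_fact
  ON cmsdw_document_stats_fact.content_id = cmsdw_object.object_id
JOIN public.cmsdw_container
  ON cmsdw_container.container_id = cmsdw_document.container_id
JOIN public.cmsdw_activity_fact
  ON cmsdw_activity_fact.direct_dw_object_id = cmsdw_object.dw_object_id
 AND cmsdw_activity_fact.direct_object_type = cmsdw_object.object_type
JOIN public.cmsdw_activity_meta
  ON cmsdw_activity_meta.activity_type = cmsdw_activity_fact.activity_type
JOIN public.cmsdw_user
  ON cmsdw_user.user_id = cmsdw_activity_fact.user_id
WHERE cmsdw_container.name = 'Getting Started'
  AND cmsdw_object.object_type = 102
  AND cmsdw_activity_fact.activity_type = 20
ORDER BY cmsdw_document.document_id, cmsdw_document.modification_ts DESC;

DISTINCT ON keeps the first row per document_id, and the ORDER BY makes that first row the one with the latest modification_ts.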