split character string in cell with sql databricks - apache-spark-sql

I have the following input:
order_id | item_order_detail
17889 | {"itemID":"1123","itemName":"banh baom1"},{"itemID":"1124","itemName":"khan tam"},{"itemID":"1125","itemName":"nuoc giat"},{"itemID":"1127","itemName":"sua tam da"},{"itemID":"1110","itemName":"phao khong9"},{"itemID":"1121","itemName":"het lam sao"},{"itemID":"1134","itemName":"trai ban"},{"itemID":"1141","itemName":"tim ban"},{"itemID":"1177","itemName":khai hang"},{"itemID":"1161","itemName":"chinh phat"}
17892 | {"itemID":"1177","itemName":"hang tang "},{"itemID":"1167","itemName":"ban tam"},{"itemID":"1188","itemName":"nam hoa"},{"itemID":"11667","itemName":"mua dong"},{"itemID":"1198","itemName":"57mua xuan yi"},{"itemID":"1156","itemName":"hai hoang"},{"itemID":"1173","itemName":"tim ve hoa"},{"itemID":"1165","itemName":"nam hoa tim hoa"},{"itemID":"1187","itemName":"tram cai cao"},{"itemID":"116653","itemName":"Set TTT44"}
I am looking for a way to split the string in the item_order_detail column into two columns, itemID and itemName, with one row per item, as in the output table below. I need to do this with SQL functions in Databricks (Spark SQL version 3.2.1).
order_id | itemID | itemName
17889 | 1123 | banh baom1
17889 | 1124 | khan tam
17889 | 1125 | nuoc giat
17889 | 1127 | sua tam da
17889 | 1110 | phao khong9
17889 | 1121 | het lam sao
17889 | 1134 | trai ban
17889 | 1141 | tim ban
17889 | 1177 | khai hang
17889 | 1161 | chinh phat
17892 | 1177 | hang tang
17892 | 1167 | ban tam
17892 | 1188 | nam hoa
17892 | 11667 | mua dong
17892 | 1198 | 57mua xuan yi
17892 | 1156 | hai hoang
17892 | 1173 | tim ve hoa
17892 | 1165 | nam hoa tim hoa
17892 | 1187 | tram cai cao
17892 | 116653 | Set TTT44
Can someone suggest a solution for me?
Thanks all
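The transformation being asked for can be sketched outside Spark to make the logic concrete. The snippet below is only an illustration in plain Python, not a Databricks solution: it assumes each cell is a comma-separated run of JSON objects, wraps it in brackets to form a JSON array, and emits one (order_id, itemID, itemName) row per object. The helper name split_items is hypothetical. In Spark SQL the same idea is usually expressed with from_json on the bracket-wrapped string followed by explode, but that mapping is not tested here.

```python
import json

def split_items(order_id, detail):
    """Return one (order_id, itemID, itemName) tuple per JSON object in detail."""
    # Assumption: the cell is a comma-separated list of JSON objects,
    # so wrapping it in [ ] turns it into a valid JSON array.
    items = json.loads("[" + detail + "]")
    # Strip stray whitespace such as the trailing space in "hang tang ".
    return [(order_id, item["itemID"], item["itemName"].strip()) for item in items]

# Two objects taken from the second input row of the question.
cell = '{"itemID":"1177","itemName":"hang tang "},{"itemID":"1167","itemName":"ban tam"}'
rows = split_items(17892, cell)
for row in rows:
    print(row)
```

A cell that is not valid JSON (for example, a value with a missing quote) makes json.loads raise json.JSONDecodeError, so malformed input would need to be cleaned or handled separately.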

Related

How to build reports in Oracle SQL developer

I need to generate 10 business reports based on the following tables, and I'm having some trouble. I know I do not have enough for 10 reports yet, but I am having trouble getting started.
CARE_CENTER
Care Center ID|Care Center Name |Care Center Address |Nurse In Charge
------------------+--------------------------+-----------------------------------+--------------------
11111 |Centers Health Care |4770 White Plains Rd, Bronx, NY |Nurse Johnson
22222 |Bronx Urgent Care |1733 E 174th Street, Bronx, NY |Nurse Robinson
33333 |BronxCare Special Care Center|1265 Fulton Avenue, Bronx, NY |Nurse Jones
44444 |Gold Crest Care Center |2316 Bruner Avenue, Bronx, NY |Nurse Gonzalez
55555 |Regeis Care Center |3200 Baychester Avenue, Bronx, NY |Nurse Waterman
66666 |MedCarePlus |1643 Westchester Avenue, Bronx, NY |Nurse Connor
77777 |ArchCare Senior Center |900 Intervale, Avenue, Bronx, NY |Nurse Rodriguez
88888 |Bronx Center for Rehab |1010 Underhill Avenu, Bronx, NY |Nurse Morales
VISIT_CARE_CENTER
Patient ID|Visit Number|Care Center ID
-------------+-----------------+-----------------
1122 |78945 |11111
2233 |89123 |22222
3344 |91456 |33333
4455 |64981 |44444
5566 |12065 |55555
6677 |98106 |66666
7788 |40169 |77777
8899 |26013 |88888
Volunteer_Assigned_Care_Center
Volunteer ID|Care Center ID
----------------+------------------
12333 |11111
23444 |22222
34555 |33333
45666 |44444
56777 |55555
67888 |66666
78999 |77777
89000 |88888
VOLUNTEER
Volunteer ID|Interest ID
---------------+-------------
12333 |00001
23444 |00002
34555 |00003
45666 |00004
56777 |00005
67888 |00006
78999 |00007
89000 |00008
INTEREST
Interest ID|Interest Description
--------------+--------------------
00001 |Organzing
00002 |Coordinating
00003 |Daily Activites
00004 |Assisting with fundraising
00005 |Planning Special Events
00006 |Feeding Patients
00007 |Cleaning Social Rooms
00008 |Caring for Visitors
I need to generate a report that shows the Care centers and the volunteer relationships.
How would I write the SQL query to generate this report based on the above table structure?
You basically need a join first:
SELECT c.CareCenterID,
       c.CareCenterName,
       c.CareCenterAddress,
       c.NurseInCharge,
       v.VolunteerID,
       i.InterestID,
       i.InterestDescription
FROM CARE_CENTER c
JOIN Volunteer_Assigned_Care_Center va
  ON c.CareCenterID = va.CareCenterID
JOIN VOLUNTEER v
  ON va.VolunteerID = v.VolunteerID
JOIN INTEREST i
  ON v.InterestID = i.InterestID
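To check the shape of that join without an Oracle instance, here is a small sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in: the column names without spaces (CareCenterID, VolunteerID, and so on) follow the answer rather than the table headers in the question, only two rows per table are loaded, and SQLite is used instead of Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: column names follow the answer's query, not the spaced headers.
conn.executescript("""
CREATE TABLE CARE_CENTER (CareCenterID TEXT, CareCenterName TEXT);
CREATE TABLE Volunteer_Assigned_Care_Center (VolunteerID TEXT, CareCenterID TEXT);
CREATE TABLE VOLUNTEER (VolunteerID TEXT, InterestID TEXT);
CREATE TABLE INTEREST (InterestID TEXT, InterestDescription TEXT);
INSERT INTO CARE_CENTER VALUES ('22222', 'Bronx Urgent Care'), ('44444', 'Gold Crest Care Center');
INSERT INTO Volunteer_Assigned_Care_Center VALUES ('23444', '22222'), ('45666', '44444');
INSERT INTO VOLUNTEER VALUES ('23444', '00002'), ('45666', '00004');
INSERT INTO INTEREST VALUES ('00002', 'Coordinating'), ('00004', 'Assisting with fundraising');
""")

# Care center -> assignment -> volunteer -> interest, one row per volunteer.
query = """
SELECT c.CareCenterID, c.CareCenterName, v.VolunteerID, i.InterestDescription
FROM CARE_CENTER c
JOIN Volunteer_Assigned_Care_Center va ON c.CareCenterID = va.CareCenterID
JOIN VOLUNTEER v ON va.VolunteerID = v.VolunteerID
JOIN INTEREST i ON v.InterestID = i.InterestID
ORDER BY c.CareCenterID
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

Each care center appears once per assigned volunteer, together with that volunteer's interest description.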

Change language for return value of weekdayname

I'm printing some weekdays to a spreadsheet using the WeekdayName function. This works fine, except that the weekdays are written in the local language (Norwegian) instead of in English, as I want them to be. Is there any way to specify what language the function returns its results in, or am I stuck making my own UDF?
Try this
Debug.Print Application.WorksheetFunction.Text(1,"[$-414]DDDD")
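This returns søndag for weekday 1 with Norwegian_Bokmal (414). For English names, use an English code from the list below, such as 409 (English_United_States), i.e. "[$-409]DDDD".
For a complete list of IDs, see List of Language Packs and Their Codes for Windows 2000 Domain Controllers.
In case that link ever dies, here is the complete list: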
Locale/language Language ID in hexadecimal
-------------------------------------------------------
Afrikaans 436
Albanian 041c
Arabic_Saudi_Arabia 401
Arabic_Iraq 801
Arabic_Egypt 0c01
Arabic_Libya 1001
Arabic_Algeria 1401
Arabic_Morocco 1801
Arabic_Tunisia 1c01
Arabic_Oman 2001
Arabic_Yemen 2401
Arabic_Syria 2801
Arabic_Jordan 2c01
Arabic_Lebanon 3001
Arabic_Kuwait 3401
Arabic_UAE 3801
Arabic_Bahrain 3c01
Arabic_Qatar 4001
Armenian 042b
Azeri_Latin 042c
Azeri_Cyrillic 082c
Basque 042d
Belarusian 423
Bulgarian 402
Catalan 403
Chinese_Taiwan 404
Chinese_PRC 804
Chinese_Hong_Kong 0c04
Chinese_Singapore 1004
Chinese_Macau 1404
Croatian 041a
Czech 405
Danish 406
Dutch_Standard 413
Dutch_Belgian 813
English_United_States 409
English_United_Kingdom 809
English_Australian 0c09
English_Canadian 1009
English_New_Zealand 1409
English_Irish 1809
English_South_Africa 1c09
English_Jamaica 2009
English_Caribbean 2409
English_Belize 2809
English_Trinidad 2c09
English_Zimbabwe 3009
English_Philippines 3409
Estonian 425
Faeroese 438
Farsi 429
Finnish 040b
French_Standard 040c
French_Belgian 080c
French_Canadian 0c0c
French_Swiss 100c
French_Luxembourg 140c
French_Monaco 180c
Georgian 437
German_Standard 407
German_Swiss 807
German_Austrian 0c07
German_Luxembourg 1007
German_Liechtenstein 1407
Greek 408
Hebrew 040d
Hindi 439
Hungarian 040e
Icelandic 040f
Indonesian 421
Italian_Standard 410
Italian_Swiss 810
Japanese 411
Kazakh 043f
Konkani 457
Korean 412
Latvian 426
Lithuanian 427
FYRO Macedonian 042f
Malay_Malaysia 043e
Malay_Brunei_Darussalam 083e
Marathi 044e
Norwegian_Bokmal 414
Norwegian_Nynorsk 814
Polish 415
Portuguese_Brazilian 416
Portuguese_Standard 816
Romanian 418
Russian 419
Sanskrit 044f
Serbian_Latin 081a
Serbian_Cyrillic 0c1a
Slovak 041b
Slovenian 424
Spanish_Traditional_Sort 040a
Spanish_Mexican 080a
Spanish_Modern_Sort 0c0a
Spanish_Guatemala 100a
Spanish_Costa_Rica 140a
Spanish_Panama 180a
Spanish_Dominican_Republic 1c0a
Spanish_Venezuela 200a
Spanish_Colombia 240a
Spanish_Peru 280a
Spanish_Argentina 2c0a
Spanish_Ecuador 300a
Spanish_Chile 340a
Spanish_Uruguay 380a
Spanish_Paraguay 3c0a
Spanish_Bolivia 400a
Spanish_El_Salvador 440a
Spanish_Honduras 480a
Spanish_Nicaragua 4c0a
Spanish_Puerto_Rico 500a
Swahili 441
Swedish 041d
Swedish_Finland 081d
Tamil 449
Tatar 444
Thai 041e
Turkish 041f
Ukrainian 422
Urdu 420
Uzbek_Latin 443
Uzbek_Cyrillic 843
Vietnamese 042a

pandas merging two multi-level series

I have two multi-level Series and would like to merge them on both index levels. The first Series looks like this:
                          # of restaurants
BORO      CUISINE
BRONX     American                     425
          Chinese                      330
          Pizza                        206
BROOKLYN  American                    1254
          Chinese                      750
          Cafe/Coffee/Tea              350
The second one has more rows and looks like this:
                          # of votes
BORO      CUISINE
BRONX     American              2425
          Caribbean              320
          Chinese               3130
          Pizza                 3336
BROOKLYN  American             21254
          Caribbean             2320
          Chinese               7250
          Cafe/Coffee/Tea       3350
          Pizza                13336
Setup:
s1 = pd.Series({('BRONX', 'American'): 425, ('BROOKLYN', 'Chinese'): 750, ('BROOKLYN', 'Cafe/Coffee/Tea'): 350, ('BRONX', 'Pizza'): 206, ('BROOKLYN', 'American'): 1254, ('BRONX', 'Chinese'): 330})
s2 = pd.Series({('BRONX', 'Caribbean'): 320, ('BRONX', 'American'): 2425, ('BROOKLYN', 'Chinese'): 7250, ('BROOKLYN', 'Cafe/Coffee/Tea'): 3350, ('BRONX', 'Pizza'): 3336, ('BROOKLYN', 'American'): 21254, ('BROOKLYN', 'Pizza'): 13336, ('BRONX', 'Chinese'): 3130, ('BROOKLYN', 'Caribbean'): 2320})
s1 = s1.rename_axis(['BORO','CUISINE']).rename('restaurants')
s2 = s2.rename_axis(['BORO','CUISINE']).rename('votes')
print (s1)
BORO      CUISINE
BRONX     American            425
          Chinese             330
          Pizza               206
BROOKLYN  American           1254
          Chinese             750
          Cafe/Coffee/Tea     350
Name: restaurants, dtype: int64
print (s2)
BORO      CUISINE
BRONX     American            2425
          Caribbean            320
          Chinese             3130
          Pizza               3336
BROOKLYN  American           21254
          Caribbean           2320
          Chinese             7250
          Cafe/Coffee/Tea     3350
          Pizza              13336
Name: votes, dtype: int64
Use concat with join='inner' if you need an inner join:
print (pd.concat([s1,s2], axis=1, join='inner'))
                          restaurants  votes
BORO      CUISINE
BRONX     American                425   2425
          Chinese                 330   3130
          Pizza                   206   3336
BROOKLYN  American               1254  21254
          Cafe/Coffee/Tea         350   3350
          Chinese                 750   7250
#join='outer' is the default, so it can be omitted
print (pd.concat([s1,s2], axis=1))
                          restaurants  votes
BORO      CUISINE
BRONX     American              425.0   2425
          Caribbean               NaN    320
          Chinese               330.0   3130
          Pizza                 206.0   3336
BROOKLYN  American             1254.0  21254
          Cafe/Coffee/Tea       350.0   3350
          Caribbean               NaN   2320
          Chinese               750.0   7250
          Pizza                   NaN  13336
Another solution is to use merge with reset_index:
#how='inner' is the default, so it can be omitted
print (pd.merge(s1.reset_index(), s2.reset_index(), on=['BORO','CUISINE']))
       BORO          CUISINE  restaurants  votes
0     BRONX         American          425   2425
1     BRONX          Chinese          330   3130
2     BRONX            Pizza          206   3336
3  BROOKLYN         American         1254  21254
4  BROOKLYN          Chinese          750   7250
5  BROOKLYN  Cafe/Coffee/Tea          350   3350
#outer join
print (pd.merge(s1.reset_index(), s2.reset_index(), on=['BORO','CUISINE'], how='outer'))
       BORO          CUISINE  restaurants  votes
0     BRONX         American        425.0   2425
1     BRONX          Chinese        330.0   3130
2     BRONX            Pizza        206.0   3336
3  BROOKLYN         American       1254.0  21254
4  BROOKLYN          Chinese        750.0   7250
5  BROOKLYN  Cafe/Coffee/Tea        350.0   3350
6     BRONX        Caribbean          NaN    320
7  BROOKLYN        Caribbean          NaN   2320
8  BROOKLYN            Pizza          NaN  13336

Insert data from another table

I've been stuck on this problem for a while. Here are the directions:
Customers who have been invited to none of the “Mona Lisa” gala_nights now get invited to the 5-jan-2014 Mona Lisa gala_night. Show the insert command and the resulting Invite table.
Here are the two tables:
SQL> select * from invite;
GALA_DATE PAINTING_NAME CUSTID
----------- ---------------------- ----------
10-nov-2013 Watercolors 1430
15-nov-2013 Woman 1502
15-nov-2013 Woman 1619
05-dec-2013 Watercolors 1207
22-dec-2013 Sunflowers 1806
22-dec-2013 Sunflowers 1904
31-dec-2013 Fiddler 1236
31-dec-2013 Fiddler 1280
05-jan-2014 Mona_Lisa 1111
05-jan-2014 Mona_Lisa 1502
25-jan-2014 Madonna 1806
25-jan-2014 Madonna 1822
25-jan-2014 Madonna 1904
18-feb-2014 Maya 1619
18-feb-2014 Maya 1822
18-feb-2014 Maya 1904
28-feb-2014 Mona_Lisa 1502
30-apr-2014 Lovers 1207
30-apr-2014 Lovers 1280
30-apr-2014 Lovers 1822
30-apr-2014 Lovers 1904
SQL> select * from customer;
CUSTID CUSTNAME CUSTBDATE CUST_TYPE BENEFACTOR DOCENT
1301 Disney 01-nov-1980 NM No No
1806 Garcia 31-dec-2000 VIP Yes No
1502 LaGardia 15-jan-1960 VIP Yes Yes
1207 Perry 20-jan-1960 VIP Yes Yes
1280 Beecham 31-dec-1979 VIP Yes No
1822 Becker 30-jan-1967 VIP Yes Yes
1140 Klim 05-apr-1990 NM No No
1509 Roberts 21-jul-1989 VIP Yes No
1619 Robins 20-feb-1990 VIP Yes Yes
1111 Bardot 28-feb-1970 VIP No No
1515 David 18-apr-1980 NM No No
1701 Martin 20-aug-1972 RM No No
1904 Gross 30-sep-1975 VIP Yes Yes
1236 Brooks 23-sep-1975 VIP Yes No
1430 Todd 15-jul-1982 VIP Yes Yes
I've tried doing
insert into invite(gala_date,painting_name,custid)
select invite.gala_date,invite.painting_name,invite.custid
from invite,customer
where invite.gala_date='05 Jan 2014' and invite.painting_name='Mona_Lisa'
and customer.custid=invite.custid
and customer.custid not in
(select custid from invite where gala_date in('05 Jan 2014')
and painting_name in('Mona_Lisa'));
0 rows created.
But as you can see the results yield "0 rows created"
Any thoughts?
Thanks!
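You just need to use NOT EXISTS to get the results you want. Your query returns no rows because it selects only customers who already have the 05-jan-2014 Mona_Lisa invitation and then excludes exactly those customers. Select from customer instead, and use a date literal that does not depend on the session's date format:
insert into invite(gala_date,painting_name,custid)
select date '2014-01-05', 'Mona_Lisa', customer.custid
from customer
where not exists (select '' from invite where invite.custid = customer.custid and painting_name = 'Mona_Lisa')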
Try this:
insert into invite(gala_date,painting_name,custid)
select date '2014-01-05', 'Mona_Lisa', a.custid
from customer a
left join invite b on (a.custid = b.custid and b.painting_name = 'Mona_Lisa')
where b.custid is null
I think that will get you close; you might need to tweak it, as I didn't test it.
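The NOT EXISTS approach can be tried on a cut-down copy of the data. The sketch below uses Python's built-in sqlite3 module rather than Oracle, stores dates as ISO text (an assumption made for SQLite's sake), and loads only four customers and three invitations taken from the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custid INTEGER, custname TEXT);
CREATE TABLE invite (gala_date TEXT, painting_name TEXT, custid INTEGER);
INSERT INTO customer VALUES (1111, 'Bardot'), (1502, 'LaGardia'), (1301, 'Disney'), (1806, 'Garcia');
INSERT INTO invite VALUES
  ('2014-01-05', 'Mona_Lisa', 1111),
  ('2014-01-05', 'Mona_Lisa', 1502),
  ('2013-12-22', 'Sunflowers', 1806);
""")

# Invite every customer who has no Mona_Lisa invitation at all.
conn.execute("""
INSERT INTO invite (gala_date, painting_name, custid)
SELECT '2014-01-05', 'Mona_Lisa', c.custid
FROM customer c
WHERE NOT EXISTS (
    SELECT 1 FROM invite i
    WHERE i.custid = c.custid AND i.painting_name = 'Mona_Lisa'
)
""")

invited = sorted(
    row[0]
    for row in conn.execute(
        "SELECT custid FROM invite "
        "WHERE gala_date = '2014-01-05' AND painting_name = 'Mona_Lisa'"
    )
)
total_rows = conn.execute("SELECT COUNT(*) FROM invite").fetchone()[0]
print(invited, total_rows)
```

Customers 1301 and 1806 gain an invitation, while 1111 and 1502, who already had one, are left alone.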

What is the effect of this order_by clause?

I don't understand what this order_by clause is doing and whether I need it or not:
select c.customerid, c.firstname, c.lastname, i.order_date, i.item, i.price from
items_ordered i, customers c
where i.customerid = c.customerid
group by c.customerid, i.item, i.order_date
order by i.order_date desc;
This produces this data:
10330 Shawn Dalton 30-Jun-1999 Pogo stick 28.00
10101 John Gray 30-Jun-1999 Raft 58.00
10410 Mary Ann Howell 30-Jan-2000 Unicycle 192.50
10101 John Gray 30-Dec-1999 Hoola Hoop 14.75
10449 Isabela Moore 29-Feb-2000 Flashlight 4.50
10410 Mary Ann Howell 28-Oct-1999 Sleeping Bag 89.22
10339 Anthony Sanchez 27-Jul-1999 Umbrella 4.50
10449 Isabela Moore 22-Dec-1999 Canoe 280.00
10298 Leroy Brown 19-Sep-1999 Lantern 29.00
10449 Isabela Moore 19-Mar-2000 Canoe paddle 40.00
10413 Donald Davids 19-Jan-2000 Lawnchair 32.00
10330 Shawn Dalton 19-Apr-2000 Shovel 16.75
10439 Conrad Giles 18-Sep-1999 Tent 88.00
10298 Leroy Brown 18-Mar-2000 Pocket Knife 22.38
10299 Elroy Keller 18-Jan-2000 Inflatable Mattress 38.00
10438 Kevin Smith 18-Jan-2000 Tent 79.99
10101 John Gray 18-Aug-1999 Rain Coat 18.30
10449 Isabela Moore 15-Dec-1999 Bicycle 380.50
10439 Conrad Giles 14-Aug-1999 Ski Poles 25.50
10449 Isabela Moore 13-Aug-1999 Unicycle 180.79
10101 John Gray 08-Mar-2000 Sleeping Bag 88.70
10299 Elroy Keller 06-Jul-1999 Parachute 1250.00
10438 Kevin Smith 02-Nov-1999 Pillow 8.50
10101 John Gray 02-Jan-2000 Lantern 16.00
10315 Lisa Jones 02-Feb-2000 Compass 8.00
10449 Isabela Moore 01-Sep-1999 Snow Shoes 45.00
10438 Kevin Smith 01-Nov-1999 Umbrella 6.75
10298 Leroy Brown 01-Jul-1999 Skateboard 33.00
10101 John Gray 01-Jul-1999 Life Vest 125.00
10330 Shawn Dalton 01-Jan-2000 Flashlight 28.00
10298 Leroy Brown 01-Dec-1999 Helmet 22.00
10298 Leroy Brown 01-Apr-2000 Ear Muffs 12.50
While if I remove the order_by clause completely, as in this query:
select c.customerid, c.firstname, c.lastname, i.order_date, i.item, i.price from
items_ordered i, customers c
where i.customerid = c.customerid
group by c.customerid, i.item, i.order_date;
I get these results:
10101 John Gray 30-Dec-1999 Hoola Hoop 14.75
10101 John Gray 02-Jan-2000 Lantern 16.00
10101 John Gray 01-Jul-1999 Life Vest 125.00
10101 John Gray 30-Jun-1999 Raft 58.00
10101 John Gray 18-Aug-1999 Rain Coat 18.30
10101 John Gray 08-Mar-2000 Sleeping Bag 88.70
10298 Leroy Brown 01-Apr-2000 Ear Muffs 12.50
10298 Leroy Brown 01-Dec-1999 Helmet 22.00
10298 Leroy Brown 19-Sep-1999 Lantern 29.00
10298 Leroy Brown 18-Mar-2000 Pocket Knife 22.38
10298 Leroy Brown 01-Jul-1999 Skateboard 33.00
10299 Elroy Keller 18-Jan-2000 Inflatable Mattress 38.00
10299 Elroy Keller 06-Jul-1999 Parachute 1250.00
10315 Lisa Jones 02-Feb-2000 Compass 8.00
10330 Shawn Dalton 01-Jan-2000 Flashlight 28.00
10330 Shawn Dalton 30-Jun-1999 Pogo stick 28.00
10330 Shawn Dalton 19-Apr-2000 Shovel 16.75
10339 Anthony Sanchez 27-Jul-1999 Umbrella 4.50
10410 Mary Ann Howell 28-Oct-1999 Sleeping Bag 89.22
10410 Mary Ann Howell 30-Jan-2000 Unicycle 192.50
10413 Donald Davids 19-Jan-2000 Lawnchair 32.00
10438 Kevin Smith 02-Nov-1999 Pillow 8.50
10438 Kevin Smith 18-Jan-2000 Tent 79.99
10438 Kevin Smith 01-Nov-1999 Umbrella 6.75
10439 Conrad Giles 14-Aug-1999 Ski Poles 25.50
10439 Conrad Giles 18-Sep-1999 Tent 88.00
10449 Isabela Moore 15-Dec-1999 Bicycle 380.50
10449 Isabela Moore 22-Dec-1999 Canoe 280.00
10449 Isabela Moore 19-Mar-2000 Canoe paddle 40.00
10449 Isabela Moore 29-Feb-2000 Flashlight 4.50
10449 Isabela Moore 01-Sep-1999 Snow Shoes 45.00
10449 Isabela Moore 13-Aug-1999 Unicycle 180.79
I'm not sure what the order_by is doing here and if it's having the intended effects.
It looks like it is ordering on i.order_date, but using string comparison rather than date comparison, which is why 30-Jun-1999 is placed before 29-Feb-2000. As strings, "30-Jun-1999" > "29-Feb-2000", but as dates the reverse is true.
Check the type of i.order_date in the items_ordered table. It should be datetime or similar. If it's varchar, you will need to either change it to a date type or convert the value to a date in the order by clause, e.g.
order by CAST(i.order_date AS DATE) desc
Whether CAST can parse the dd-Mon-yyyy format depends on the database; in MySQL, for example, STR_TO_DATE(i.order_date, '%d-%b-%Y') does the conversion.
You should always use a proper DATETIME datatype to store dates.
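The difference between string ordering and date ordering is easy to reproduce outside the database. This short Python sketch sorts three of the dates from the question in descending order, first as plain strings and then as parsed dates. It assumes English month abbreviations, which is what %b expects in the default C locale.

```python
from datetime import datetime

dates = ["30-Jun-1999", "29-Feb-2000", "30-Jan-2000"]

# Descending sort on the raw text: compares character by character.
as_strings = sorted(dates, reverse=True)

# Descending sort on the parsed dates: compares actual points in time.
as_dates = sorted(
    dates,
    key=lambda s: datetime.strptime(s, "%d-%b-%Y"),
    reverse=True,
)

print(as_strings)  # ['30-Jun-1999', '30-Jan-2000', '29-Feb-2000']
print(as_dates)    # ['29-Feb-2000', '30-Jan-2000', '30-Jun-1999']
```

The string sort reproduces the order seen in the question's first result set, which is why the column is most likely stored as text.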