Has anybody used blpapi/pdblp etc. packages to export supply chain data (SPLC) from Bloomberg?

I am using the pdblp package to gather supply chain data, but I am facing two problems. First, it only gives me the first five suppliers as of the current date, not the full list. Second, it does not give any historical data: whatever date I pass, I still get the same results. I have searched the web and found no manual or tutorial for gathering supply chain data from Bloomberg, so I was wondering if anybody has any experience with this or a solution. Thank you so much!
The following shows my code, in which I used con.bulkref() and con.bulkref_hist() to gather the suppliers of Apple Inc. As the output shows, I only get 5 of Apple's suppliers, not all of them, and I cannot change the date.
import pdblp
con = pdblp.BCon(debug=True, port=8194, timeout=100000)
con.start()
con.bulkref('AAPL US Equity', 'SUPPLY_CHAIN_SUPPLIERS',ovrds=[('DZ414',"20100626")])
pdblp.pdblp:INFO:Sending Request:
ReferenceDataRequest = {
    securities[] = {
        "AAPL US Equity"
    }
    fields[] = {
        "SUPPLY_CHAIN_SUPPLIERS"
    }
    overrides[] = {
        overrides = {
            fieldId = "DZ414"
            value = "20100626"
        }
    }
}
pdblp.pdblp:INFO:Event Type: 'RESPONSE'
pdblp.pdblp:INFO:Message Received:
ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "AAPL US Equity"
            eidData[] = {
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 0
            fieldData = {
                SUPPLY_CHAIN_SUPPLIERS[] = {
                    SUPPLY_CHAIN_SUPPLIERS = {
                        Equity Ticker = "2317 TT Equity"
                    }
                    SUPPLY_CHAIN_SUPPLIERS = {
                        Equity Ticker = "4938 TT Equity"
                    }
                    SUPPLY_CHAIN_SUPPLIERS = {
                        Equity Ticker = "2382 TT Equity"
                    }
                    SUPPLY_CHAIN_SUPPLIERS = {
                        Equity Ticker = "601138 CH Equity"
                    }
                    SUPPLY_CHAIN_SUPPLIERS = {
                        Equity Ticker = "2330 TT Equity"
                    }
                }
            }
        }
    }
}
dates = ["20100626"]
con.bulkref_hist("AAPL US Equity", ["DZ405"], dates)
pdblp.pdblp:INFO:Sending Request:
ReferenceDataRequest = {
    securities[] = {
        "AAPL US Equity"
    }
    fields[] = {
        "DZ405"
    }
    overrides[] = {
        overrides = {
            fieldId = "REFERENCE_DATE"
            value = "20100626"
        }
    }
}
which returns the following response:
pdblp.pdblp:INFO:Event Type: 'RESPONSE'
pdblp.pdblp:INFO:Message Received:
ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "AAPL US Equity"
            eidData[] = {
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 0
            fieldData = {
                DZ405[] = {
                    DZ405 = {
                        Equity Ticker = "2317 TT Equity"
                    }
                    DZ405 = {
                        Equity Ticker = "4938 TT Equity"
                    }
                    DZ405 = {
                        Equity Ticker = "2382 TT Equity"
                    }
                    DZ405 = {
                        Equity Ticker = "601138 CH Equity"
                    }
                    DZ405 = {
                        Equity Ticker = "2330 TT Equity"
                    }
                }
            }
        }
    }
}
My output from both methods is the following: only five suppliers, not all of them.
date | ticker | field | name | value | position
-- | -- | -- | -- | -- | --
20100626 | AAPL US Equity | DZ405 | Equity Ticker | 2317 TT Equity | 0
20100626 | AAPL US Equity | DZ405 | Equity Ticker | 4938 TT Equity | 1
20100626 | AAPL US Equity | DZ405 | Equity Ticker | 2382 TT Equity | 2
20100626 | AAPL US Equity | DZ405 | Equity Ticker | 601138 CH Equity | 3
20100626 | AAPL US Equity | DZ405 | Equity Ticker | 2330 TT Equity | 4

In [1]: from xbbg import blp
In [2]: blp.bds('AAPL US Equity', 'SUPPLY_CHAIN_SUPPLIERS', Supply_Chain_Count_Override=10)
Out[2]:
                   equity_ticker
AAPL US Equity    2317 TT Equity
AAPL US Equity    4938 TT Equity
AAPL US Equity    2382 TT Equity
AAPL US Equity  601138 CH Equity
AAPL US Equity    2330 TT Equity
AAPL US Equity  034220 KS Equity
AAPL US Equity  005930 KS Equity
AAPL US Equity    INTC US Equity
AAPL US Equity     JBL US Equity
AAPL US Equity    2324 TT Equity
Btw, DZ414 is not in the list of available overrides for SUPPLY_CHAIN_SUPPLIERS, and in any case its value can only be C or R.
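For completeness, the same count override can presumably be passed through pdblp as well, since con.bulkref() accepts arbitrary overrides via ovrds. A minimal sketch, assuming the override mnemonic SUPPLY_CHAIN_COUNT_OVERRIDE (the name xbbg forwards above; confirm the exact fieldId via FLDS on the terminal):

import pdblp

con = pdblp.BCon(debug=False, port=8194, timeout=100000)
con.start()

# SUPPLY_CHAIN_COUNT_OVERRIDE is an assumed mnemonic; by default the
# response only contains the top 5 names, so raise it to get more suppliers.
df = con.bulkref(
    'AAPL US Equity',
    'SUPPLY_CHAIN_SUPPLIERS',
    ovrds=[('SUPPLY_CHAIN_COUNT_OVERRIDE', 10)],
)
print(df)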

Related

data frame parsing column scala

I have a problem parsing a DataFrame:
val result = df_app_clickstream.withColumn(
  "attributes",
  explode(expr(raw"transform(attributes, x -> str_to_map(regexp_replace(x, '{\\}',''), ' '))"))
).select(
  col("userId"),
  col("attributes").getField("campaign_id").alias("app_campaign_id"),
  col("attributes").getField("channel_id").alias("app_channel_id")
)
result.show()
I have input like this:
-------------------------------------------------------------------------------
| userId | attributes |
-------------------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e |{'campaign_id':082,'channel_id':'Chnl'}|
-------------------------------------------------------------------------------
and I need to get output like this:
--------------------------------------------------------------------
| userId | campaign_id | channel_id|
--------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e | 082 | Facebook |
--------------------------------------------------------------------
but I get an error.
You can try the solution below:
import org.apache.spark.sql.functions._

val data = Seq(("f6e8252f-b5cc-48a4-b348-29d89ee4fa9e", """{'campaign_id':082, 'channel_id':'Chnl'}""")).toDF("user_id", "attributes")
val out_df = data.withColumn("splitted_col", split(regexp_replace(col("attributes"), "'|\\}|\\{", ""), ","))
  .withColumn("campaign_id", split(element_at(col("splitted_col"), 1), ":")(1))
  .withColumn("channel_id", split(element_at(col("splitted_col"), 2), ":")(1))
out_df.show(truncate = false)
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|user_id |attributes |splitted_col |campaign_id|channel_id|
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|f6e8252f-b5cc-48a4-b348-29d89ee4fa9e|{'campaign_id':082, 'channel_id':'Chnl'}|[campaign_id:082, channel_id:Chnl]|082 |Chnl |
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+

How to pass dynamic variable in Scenario outline in Karate DSL

I have a situation where I need to pass a variety of Date-type variables in Karate.
For this, I created a Java method and call it in a feature file as shown below.
I read that it is a Cucumber limitation that dynamic variables are not supported in a Scenario Outline. I also read https://github.com/intuit/karate#the-karate-way, but somehow I am not getting any idea how to solve the situation below.
Scenario Outline: test scenario outline
  * def testData = Java.type('zoomintegration.utils.DataGenerator')
  * def meetDate = testData.futureDate(2)
  * def jsonPayLoad =
    """
    {
      "meetingSource": <meetingSource>,
      "hostId": <host>,
      "topic": <topic>,
      "agenda": <topic>,
      "startDateTime": <meetingDate>",
      "timeZone": "Asia/Calcutta",
      "duration": <duration>
    }
    """
  * print jsonPayLoad

  Examples:
    | meetingSource | host | topic           | duration | meetingDate  |
    | ZOOM          | abc  | Quarter meeting | 30       | 0            |
    | SKYPE         | abc  | Quarter meeting | 30       | '1980-08-12' |
    | MS            | abc  | Quarter meeting | 30       | '2030-12-12' |
The code below works for me:

Scenario Outline: test scenario outline
  * def testData = Java.type('zoomintegration.utils.DataGenerator')
  * def meetDate = testData.futureDate(<meetingDate>)
  * def jsonPayLoad =
    """
    {
      "meetingSource": <meetingSource>,
      "hostId": <host>,
      "topic": <topic>,
      "agenda": <topic>,
      "startDateTime": #(meetDate),
      "timeZone": "Asia/Calcutta",
      "duration": <duration>
    }
    """
  * print jsonPayLoad

  Examples:
    | meetingSource | host | topic           | duration | meetingDate |
    | ZOOM          | abc  | Quarter meeting | 30       | 1           |
    | SKYPE         | abc  | Quarter meeting | 30       | 2           |
    | MS            | abc  | Quarter meeting | 30       | 3           |
Feature: test something

Scenario Outline: test scenario outline
  * def testData = Java.type('zoomintegration.utils.DataGenerator')
  * def meetDate = testData.futureDate(2)
  * def jsonPayLoad =
    """
    {
      "meetingSource": <meetingSource>,
      "hostId": <host>,
      "topic": <topic>,
      "agenda": <topic>,
      "startDateTime": <meetingDate>,
      "timeZone": "Asia/Calcutta",
      "duration": <duration>
    }
    """
  * eval if (jsonPayLoad.startDateTime == 0) jsonPayLoad.startDateTime = meetDate
  * print jsonPayLoad

  Examples:
    | meetingSource | host | topic           | duration | meetingDate  |
    | ZOOM          | abc  | Quarter meeting | 30       | 0            |
    | SKYPE         | abc  | Quarter meeting | 30       | '1980-08-12' |
    | MS            | abc  | Quarter meeting | 30       | '1999-08-12' |
You must be missing something, and it looks like you have a few typos.
Let's take a simple example that works for me:
Feature:

Background:
  * def fun = function(x){ return 'hello ' + x }

Scenario Outline:
  * match fun(name) == 'hello foo'

  Examples:
    | name |
    | foo  |
So the point is: you can plug in a function that uses data from your Examples table to dynamically generate more data.
If you are still stuck, please follow this process: https://github.com/intuit/karate/wiki/How-to-Submit-an-Issue

Spark dataframe inner join without duplicate match

I want to join two dataframes based on a certain condition in Spark Scala. However, the catch is that if a row in df1 matches any row in df2, it should not try to match the same row of df1 with any other row in df2. Below are the sample data and the outcome I am trying to get.
DF1
--------------------------------
Emp_id | Emp_Name | Address_id
1      | ABC      | 1
2      | DEF      | 2
3      | PQR      | 3
4      | XYZ      | 1

DF2
-----------------------
Address_id | City
1          | City_1
1          | City_2
2          | City_3
REST       | Some_City

Output DF
----------------------------------------
Emp_id | Emp_Name | Address_id | City
1      | ABC      | 1          | City_1
2      | DEF      | 2          | City_3
3      | PQR      | 3          | Some_City
4      | XYZ      | 1          | City_1
Note: REST is like a wildcard; any value can be equal to REST.
So in the above sample, emp_name "ABC" could match City_1, City_2, or Some_City. The output DF contains only City_1 because that is found first.
You seem to have custom logic for your join. Basically, I came up with the UDF below.
Note that you may want to change the logic of the UDF as per your requirement.
import spark.implicits._
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.first

// dataframe 1
val df_1 = Seq(("1", "ABC", "1"), ("2", "DEF", "2"), ("3", "PQR", "3"), ("4", "XYZ", "1"))
  .toDF("Emp_Id", "Emp_Name", "Address_Id")

// dataframe 2
val df_2 = Seq(("1", "City_1"), ("1", "City_2"), ("2", "City_3"), ("REST", "Some_City"))
  .toDF("Address_Id", "City_Name")

// UDF logic
val join_udf = udf((a: String, b: String) => {
  (a, b) match {
    case ("1", "1")  => true
    case ("1", _)    => false
    case ("2", "2")  => true
    case ("2", _)    => false
    case (_, "REST") => true
    case (_, _)      => false
  }
})

val dataframe_join = df_1.join(df_2, join_udf(df_1("Address_Id"), df_2("Address_Id")), "inner")
  .drop(df_2("Address_Id"))
  .orderBy($"City_Name")
  .groupBy($"Emp_Id", $"Emp_Name", $"Address_Id")
  .agg(first($"City_Name"))
  .orderBy($"Emp_Id")

dataframe_join.show(false)
Basically, after applying the UDF, what you get is all possible combinations of the matches.
When you then apply groupBy and use the first function in agg, you get only the filtered values you are looking for.
+------+--------+----------+-----------------------+
|Emp_Id|Emp_Name|Address_Id|first(City_Name, false)|
+------+--------+----------+-----------------------+
|1     |ABC     |1         |City_1                 |
|2     |DEF     |2         |City_3                 |
|3     |PQR     |3         |Some_City              |
|4     |XYZ     |1         |City_1                 |
+------+--------+----------+-----------------------+
Note that I've made use of Spark 2.3 and hope this helps!
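For readers coming from Python, roughly the same pattern (join, order, then keep the first match per employee) can be sketched in PySpark. This sketch uses an explicit join condition for the REST wildcard instead of the hardcoded UDF above, so treat it as an illustration rather than a drop-in port:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local").appName("join-example").getOrCreate()

df1 = spark.createDataFrame(
    [(1, "ABC", "1"), (2, "DEF", "2"), (3, "PQR", "3"), (4, "XYZ", "1")],
    ["Emp_id", "Emp_Name", "Address_id"])
df2 = spark.createDataFrame(
    [("1", "City_1"), ("1", "City_2"), ("2", "City_3"), ("REST", "Some_City")],
    ["Address_id", "City"])

# A row matches on the exact Address_id or on the REST wildcard.
joined = df1.join(
    df2, (df1["Address_id"] == df2["Address_id"]) | (df2["Address_id"] == "REST"),
    "inner").drop(df2["Address_id"])

# Keep one city per employee; sorting before first() mirrors the Scala answer,
# and carries the same caveat that order after groupBy is not strictly guaranteed.
result = (joined.orderBy("City")
          .groupBy("Emp_id", "Emp_Name", "Address_id")
          .agg(F.first("City").alias("City"))
          .orderBy("Emp_id"))
result.show()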
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object JoinTwoDataFrame extends App {

  val spark = SparkSession.builder()
    .master("local")
    .appName("DataFrame-example")
    .getOrCreate()

  import spark.implicits._

  val df1 = Seq(
    (1, "ABC", "1"),
    (2, "DEF", "2"),
    (3, "PQR", "3"),
    (4, "XYZ", "1")
  ).toDF("Emp_id", "Emp_Name", "Address_id")

  val df2 = Seq(
    ("1", "City_1"),
    ("1", "City_2"),
    ("2", "City_3"),
    ("REST", "Some_City")
  ).toDF("Address_id", "City")

  // City used when an Address_id has no direct match
  val restCity: Option[String] = Some(df2.filter('Address_id.equalTo("REST")).select('City).first()(0).toString)

  val res = df1.join(df2, df1.col("Address_id") === df2.col("Address_id"), "left_outer")
    .select(
      df1.col("Emp_id"),
      df1.col("Emp_Name"),
      df1.col("Address_id"),
      df2.col("City")
    )
    .withColumn("city2", when('City.isNotNull, 'City).otherwise(restCity.getOrElse("")))
    .drop("City")
    .withColumnRenamed("city2", "City")
    .orderBy("Address_id", "City")
    .groupBy("Emp_id", "Emp_Name", "Address_id")
    .agg(collect_list("City").alias("cityList"))
    .withColumn("City", 'cityList.getItem(0))
    .drop("cityList")
    .orderBy("Emp_id")

  res.show(false)

  // +------+--------+----------+---------+
  // |Emp_id|Emp_Name|Address_id|City     |
  // +------+--------+----------+---------+
  // |1     |ABC     |1         |City_1   |
  // |2     |DEF     |2         |City_3   |
  // |3     |PQR     |3         |Some_City|
  // |4     |XYZ     |1         |City_1   |
  // +------+--------+----------+---------+
}

Nested dictionary to pandas df

My first question on Stack Overflow!
I have a triple nested dictionary and I want to convert it to pandas df.
The dictionary has the following structure:
dictionary = {'CompanyA': {'Revenue':    {date1: $1, date2: $2},
                           'ProfitLoss': {date1: $0, date2: $1}},
              'CompanyB': {'Revenue':    {date1: $1, date2: $2},
                           'ProfitLoss': {date1: $0, date2: $1}},
              'CompanyC': {'Revenue':    {date1: $1, date2: $2},
                           'ProfitLoss': {date1: $0, date2: $1}}}
So far I have been able to construct a df using:
df = pd.DataFrame.from_dict(dictionary)
But the result is a df with dictionaries as values, like this:
            CompanyA       CompanyB       CompanyC
Revenue     {date1:$0,..}  {date1:$1,..}  {date1:$0,..}
ProfitLoss  {date1:$0,..}  {date1:$0,..}  {date1:$0,..}
I want the table to look like this:
                   CompanyA  CompanyB  CompanyC
Revenue     Date1  $1        $1        $1
            Date2  $2        $2        $2
ProfitLoss  Date1  $0        $0        $0
            Date2  $1        $1        $1
I have tried using pd.MultiIndex.from_dict (.from_product) and changing the index, with no result. Any idea what to do next? Any hint will be appreciated!
I see you're new, but there may be an answer to a similar question, see this. Next time try looking for a similar question using keywords. For example, I found the one I linked by searching "pandas nested dict", and that's it, the first link was the SO post!
Anyway, you need to reshape your input dict. You want a dict structured like this:
{
    'CompanyA': {
        ('Revenue', 'date1'): 1,
        ('ProfitLoss', 'date1'): 0,
    },
    ...
}
I would do something like this:
import pandas as pd

data = {
    'CompanyA': {
        'Revenue': {
            "date1": 1,
            "date2": 2
        },
        'ProfitLoss': {
            "date1": 0,
            "date2": 1
        }
    },
    'CompanyB': {
        'Revenue': {
            "date1": 4,
            "date2": 5
        },
        'ProfitLoss': {
            "date1": 2,
            "date2": 3
        }
    }
}

# Reshape your data and pass it to `DataFrame.from_dict`
df = pd.DataFrame.from_dict({i: {(j, k): data[i][j][k]
                                 for j in data[i] for k in data[i][j]}
                             for i in data}, orient="columns")
print(df)
Output:
                  CompanyA  CompanyB
ProfitLoss date1         0         2
           date2         1         3
Revenue    date1         1         4
           date2         2         5
EDIT
Using actual datetimes to respond to your comment:
import pandas as pd
import datetime as dt

date1 = dt.datetime.now()
date2 = date1 + dt.timedelta(days=365)

data = {
    'CompanyA': {
        'Revenue': {
            date1: 1,
            date2: 2
        },
        'ProfitLoss': {
            date1: 0,
            date2: 1
        }
    },
    'CompanyB': {
        'Revenue': {
            date1: 4,
            date2: 5
        },
        'ProfitLoss': {
            date1: 2,
            date2: 3
        }
    }
}

# Reshape your data and pass it to `DataFrame.from_dict`
df = pd.DataFrame.from_dict({i: {(j, k): data[i][j][k]
                                 for j in data[i] for k in data[i][j]}
                             for i in data}, orient="columns")
print(df)
Output:
                                       CompanyA  CompanyB
ProfitLoss 2018-10-08 11:19:09.006375         0         2
           2019-10-08 11:19:09.006375         1         3
Revenue    2018-10-08 11:19:09.006375         1         4
           2019-10-08 11:19:09.006375         2         5
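As a side note, the same reshape can be written more compactly by letting pandas build the (metric, date) index itself: unstacking each company's sub-frame yields a Series keyed by (metric, date), and assembling those Series as columns gives the same result. A sketch using the data dict from above:

import pandas as pd

# Each pd.DataFrame(metrics) has dates as rows and metrics as columns;
# unstack() turns it into a Series indexed by (metric, date).
df = pd.DataFrame({co: pd.DataFrame(metrics).unstack()
                   for co, metrics in data.items()})
print(df)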

Change a Pandas DataFrame with Integer Index

I have converted a Python dict to pandas dataframe:
import pandas as pd

d = {
    u'erterreherh': {
        u'account': u'rgrgrgrg',
        u'data': u'192.168.1.1',
    },
    u'hkghkghkghk': {
        u'account': u'uououopuopuop',
        u'data': '192.168.1.170',
    },
}
df = pd.DataFrame.from_dict(d, orient='index')
          data
account
aa        bbss
zz        sssss
vv        sss
"account" is the index here. I want the dataframe to look like the one below; how can I do this?
  account   data
0      aa   bbss
1      zz  sssss
2      vv    sss
You need rename_axis to change the index name, and then reset_index:
import pandas as pd

d = {
    u'erterreherh': {
        u'account': u'rgrgrgrg',
        u'data': u'192.168.1.1'
    },
    u'hkghkghkghk': {
        u'account': u'uououopuopuop',
        u'data': '192.168.1.170'
    }
}
df = pd.DataFrame.from_dict(d, orient='index')
df = df.rename_axis('account1').reset_index()
print(df)

      account1           data        account
0  erterreherh    192.168.1.1       rgrgrgrg
1  hkghkghkghk  192.168.1.170  uououopuopuop
If you need to overwrite column account with values from the index:

df = df.assign(account=df.index).reset_index(drop=True)
print(df)

            data      account
0    192.168.1.1  erterreherh
1  192.168.1.170  hkghkghkghk
df.reset_index() is indeed working for me:

df

          data
account
aa        bbss
zz        sssss
vv        sss

df = df.reset_index()

  account   data
0      aa   bbss
1      zz  sssss
2      vv    sss