I am trying to learn ML models for predicting stock prices. Initially, I tried using DataReader:
import pandas_datareader as web
df = web.DataReader('AAPL', data_source='yahoo', start='2016-01-01', end='2021-08-01')
But I get a RemoteDataError and kept hitting a dead end trying to figure it out, so I tried using Tiingo instead:
https://tiingo-python.readthedocs.io/en/latest/readme.html
I read through the documentation and tried passing a dictionary with 'api_key' as a key into my Tiingo client, i.e.:
from tiingo import TiingoClient
client = TiingoClient()
config = {}
config['session'] = True
config['api_key'] = 'my_api_key'
client = TiingoClient(config)
The documentation says I can now use TiingoClient to make API calls; however, I get:
RuntimeError: Tiingo API Key not provided. Please provide via environment variable or config argument.
It is quite challenging learning the ML models and their syntax, but what compounds the difficulty for me is the part some data scientists consider trivial, since they don't typically deal with gathering or scraping data. Maybe my question is trivial, but I've spent about an hour trying to figure out how to import stock price data properly, and the only method that has worked for me so far is
df = web.get_data_yahoo('stock symbol')
but I would like to grasp the other ways of importing stock prices via Tiingo and DataReader, so if anyone can provide explanations/tips/suggestions, I'd greatly appreciate it.
EDIT: For my Tiingo account, I did not buy any subscription plan for using their data, as I was under the impression that I can access data for free with my API key.
This is what I use, but it seems identical to what you are using.
config = {}
config['session'] = True
config['api_key'] = "key here"
client = TiingoClient(config)
Remove this line: client = TiingoClient(). Constructing TiingoClient() with no arguments before the config exists is what raises the RuntimeError.
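For completeness, here is a minimal sketch of the working pattern; the ticker, the dates, and the session flag are just illustrative, and get_dataframe is the tiingo-python call for end-of-day prices as far as I know:

from tiingo import TiingoClient

# Build the config first, then construct the client exactly once.
config = {
    'session': True,         # reuse one HTTP session across requests
    'api_key': 'my_api_key'  # your Tiingo API key
}
client = TiingoClient(config)

# End-of-day prices are available on the free tier.
df = client.get_dataframe('AAPL', startDate='2016-01-01', endDate='2021-08-01')

If you prefer to stay with pandas_datareader, it also has a Tiingo route (web.get_data_tiingo('AAPL', api_key='my_api_key')), if I remember the signature correctly.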
I have Apache Arrow data on the server (Python) and need to use it in the browser. It appears that Arrow Flight isn't implemented in JS. What are the best options for sending the data to the browser and using it there?
I don't even need it necessarily in Arrow format in the browser. This question hasn't received any responses, so I'm adding some additional criteria for what I'm looking for:
Self-describing: don't want to maintain separate schema definitions
Minimal overhead: for example, an array of float32s should transfer as something compact like a data type indicator, a length value, and a sequence of 4-byte float values
Cross-platform: Able to be easily sent from Python and received and used in the browser in a straightforward way
Surely this is a solved problem? If it is, I've been unable to find a solution. Please help!
Building off of the comments on your original post by David Li, you can implement a non-streaming version of what you want without too much code, using PyArrow on the server side and the Apache Arrow JS bindings on the client. The Arrow IPC format satisfies your requirements because it ships the schema with the data, is space-efficient and zero-copy, and is cross-platform.
Here's a toy example that generates a record batch on the server and receives it on the client.
Server:
from io import BytesIO
from flask import Flask, send_file
from flask_cors import CORS
import pyarrow as pa
app = Flask(__name__)
CORS(app)
@app.get("/data")
def data():
    # Build a small RecordBatch with a few columns of different types.
    data = [
        pa.array([1, 2, 3, 4]),
        pa.array(['foo', 'bar', 'baz', None]),
        pa.array([True, None, False, True])
    ]
    batch = pa.record_batch(data, names=['f0', 'f1', 'f2'])

    # Serialize the batch (schema included) in the Arrow IPC stream format.
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, batch.schema) as writer:
        writer.write_batch(batch)
    return send_file(BytesIO(sink.getvalue().to_pybytes()), "data.arrow")
Client:
import { tableFromIPC } from "apache-arrow";

const table = await tableFromIPC(fetch(URL));
// Do what you like with your data
Edit: I added a runnable example at https://github.com/amoeba/arrow-python-js-ipc-example.
I'm trying to prototype using the SmartRedis Python client to interact with the SmartSim Orchestrator. Is it possible to launch the orchestrator without any other models in the experiment? If so, what would be the best way to do so?
It is entirely possible to do that. A SmartSim Experiment can contain different types of 'entities', including Models, Ensembles (i.e. groups of Models), and the Orchestrator (i.e. the Redis-backed database). None of these entities, however, is 'required' to be in the Experiment.
Here's a short script that creates an experiment which includes only a database.
from smartsim import Experiment
NUM_DB_NODES = 3
exp = Experiment("Database Only")
db = exp.create_database(db_nodes=NUM_DB_NODES)
exp.generate(db)
exp.start(db)
After this, the Orchestrator (with the number of shards specified by NUM_DB_NODES) will have been spun up. You can then connect the Python client using the following lines:
import smartredis

client = smartredis.Client(db.get_address()[0], NUM_DB_NODES > 1)
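From there, a quick way to confirm the client can actually reach the database is to round-trip a small tensor; this is only a sketch, and the tensor name "test_tensor" is just an example:

import numpy as np

# Store a small array in the Orchestrator and read it back.
sent = np.array([1.0, 2.0, 3.0])
client.put_tensor("test_tensor", sent)
received = client.get_tensor("test_tensor")
assert (sent == received).all()

# Stop the database when you are done with it.
exp.stop(db)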
I'm new to Cytoscape. I want to know how I can run an app (for example the MCL clustering algorithm) multiple times with different parameters in Cytoscape. Is there any way to write a script to do that instead of running it manually multiple times with different parameters?
Thanks!
Thanks Scooter, I saw his answer.
Still I have a problem with MCODE.
I figured it out by reading the paper "Cytoscape Automation: empowering workflow-based network analysis".
I want to put the script here in case somebody else has the same question.
From Python you need to import:
import requests, json
import numpy
REST_ENDPOINT = 'http://localhost:1234'
Then, say we want to use the affinity propagation clustering algorithm: you can go to Help -> Automation -> CyREST Command API, where you can find the app and all its parameters. Load the input network in Cytoscape at the beginning.
counter = 0
ap_clusters = dict()
for i in numpy.arange(-1.0, 1.1, 0.1):
    message_body = {
        "preference": str(round(i, 1))
    }
    response = requests.post(REST_ENDPOINT + '/v1/commands/cluster/ap',
                             data=json.dumps(message_body),
                             headers={'Content-Type': 'application/json'})
    response_data = response.json()['data']
    ap_clusters[counter] = response_data['clusters']
    counter += 1
The above code calls AP clustering multiple times from Python.
For AP and MCL the code works for multiple parameter sets. However, when I tried to call MCODE with different sets of parameters, it dropped the connection and closed the Cytoscape app. It could only run for one set of parameters.
This is the error:
" raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"
Here is the code for the MCODE algorithm:
counter = 0
mcode_clusters = dict()
for i in numpy.arange(3, 6, 1):
    for j in numpy.arange(0.1, 0.56, 0.05):  # ----vertex weight percentage
        for h in ["on", "off"]:
            for f in ["on", "off"]:
                if f == "on":
                    for p in [0, 0.1, 0.2]:  # ---fluffing percentage
                        message_body = {
                            "fluff": f,
                            "fluffNodeDensityCutoff": str(round(p, 1)),
                            "haircut": h,
                            "maxDepthFromStart": str(i),
                            "nodeScoreCutoff": str(round(j, 1))
                        }
                        response = requests.post(REST_ENDPOINT + '/v1/commands/cluster/mcode',
                                                 data=json.dumps(message_body),
                                                 headers={'Content-Type': 'application/json'})
                        response_data = response.json()['data']
                        mcode_clusters[counter] = response_data['clusters']
                        counter += 1
If you have any solutions, I'd really appreciate it if you could share them with me.
Thanks.
SaRa
I think this was answered by Ruth pretty clearly in cytoscape-helpdesk:
You can do all of the above. Whatever is easiest for you.
There is a library, py2cytoscape, that you can use to issue commands to Cytoscape from Python. Info can be found here: https://py2cytoscape.readthedocs.io/en/latest/
For more info on automation in Cytoscape, check out: http://manual.cytoscape.org/en/stable/Programmatic_Access_to_Cytoscape_Features_Scripting.html
But you can also run it through automation. You can create a text file with each of your commands (for example a list of commands like: cluster mcl attribute="correlation" network=1234) and then go to Tools --> Execute Batch File to execute the whole file. I am not sure if it supports loops. If you want to loop through anything, I would recommend using Python.
Thanks,
Ruth
I'll just add that currently, looping isn't supported in batch files.
-- scooter
In regard to my problem, I have to say that:
There are two MCODE apps in Cytoscape: one is in clusterMaker and the other is Cytoscape's own MCODE app. When I called MCODE with the command '/v1/commands/cluster/mcode', I was calling the clusterMaker version, but the parameter names I used were based on the Cytoscape MCODE app. I changed the command to '/v1/commands/mcode/cluster' and the problem is now solved.
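For anyone hitting the same issue, a single corrected call looks roughly like this; it is only a sketch based on the loop above, and the exact shape of the response JSON from the MCODE app may differ from the clusterMaker version:

message_body = {
    "fluff": "on",
    "fluffNodeDensityCutoff": "0.1",
    "haircut": "on",
    "maxDepthFromStart": "3",
    "nodeScoreCutoff": "0.2"
}
response = requests.post(REST_ENDPOINT + '/v1/commands/mcode/cluster',
                         data=json.dumps(message_body),
                         headers={'Content-Type': 'application/json'})
print(response.json())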
Many thanks.
SaRa
support team,
I am using the Google BigQuery API Python library to test some operations. One purpose is to get the job information; then we can better control all our queries from the API. I found there is a get() method mentioned in the REST reference here, which can get the job information. But in the Python API library here, I cannot find any documentation about this get() method or anything that performs the same operation.
Can you help by providing any guide or documentation about a method in the Python library that can get the job information?
Thanks
Zhihong
You are looking at the documentation for the translate API rather than BigQuery. See job_from_resource under the BigQuery client documentation.
Based on Elliott's suggestion, I got the job info I need after running a query, but I have not figured out how to fetch the job info for an existing job; I think that is no longer needed if I get the query info after each operation. The Python code is as below:
from google.cloud import bigquery
client = bigquery.Client()
query = client.run_sync_query(sql)
query.use_legacy_sql = False
query.use_query_cache = True
query.run()
trows = query.total_rows
billed_byte = query.total_bytes_processed
More query info parameters can be found here and more example code can be found here.
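As a side note, the snippet above uses the older run_sync_query interface. In more recent versions of google-cloud-bigquery (an assumption about which version you have), the same information is available from a QueryJob, and an existing job can be fetched by its ID, which is the Python-side counterpart of the REST get() method:

from google.cloud import bigquery

client = bigquery.Client()

# Run a query; the returned QueryJob carries the job metadata.
query_job = client.query("SELECT 1")
query_job.result()  # wait for the job to finish

print(query_job.job_id)
print(query_job.total_bytes_processed)

# Fetch the info for an existing job later by its ID (the jobs.get equivalent).
same_job = client.get_job(query_job.job_id)
print(same_job.state)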
We currently run SQL reports to extract test execution output so that we can review how successful a test has been and then make an educated guess of which tests to add to our regression suites.
However, this is time-consuming, as it requires someone to go through all the data and make certain assumptions.
I've been tasked with looking into the possibility of using artificial intelligence to sift through the data instead, and would like to know if anyone has tried this and how they implemented it.
I'm not sure if this will do, but you can use Python's out-of-the-box scikit-learn.
It is as simple as:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
import pandas as pd
####DATA PREP##
data = pd.read_csv('filepath')
#Forgot the target xD
# target = pd.read_csv('target_data_filepath')
target = data.target #If target is in data
other_data = pd.read_csv('filepath_other')
###MAKE MODEL##
tfidf_vect = TfidfVectorizer()
mlp_class = MLPClassifier()
pipe = Pipeline([('Tfidf Vectorizer', tfidf_vect), ('MLP Classifier', mlp_class)])
pipe.fit(data, target) #remove target from data beforehand if applies
####PREDICT###
pipe.predict(other_data)
data is your text in separate entries, the whole output per single record
target is what you determined beforehand, i.e. whether it should be included somewhere or not
other_data is what you want to test
But beware that the above is just a mockup and I don't guarantee that I got all the method names correct. For reading, just follow scikit-learn's documentation, quite expensive but extensive books like Building Machine Learning Systems with Python from Packt, and lots and lots of free blogs like machinelearningmastery.com.
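If you want a quick sanity check before trusting the predictions, a minimal hold-out evaluation, reusing the data, target, and pipe names from the mockup above, could look like this:

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hold out 20% of the labelled records for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0)

pipe.fit(X_train, y_train)
print(classification_report(y_test, pipe.predict(X_test)))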