Julia: Make DataFrame from iterator output

I am attempting to create a data frame in Julia with two columns, x and y, representing the Cartesian product of the x values and y values. In summary, this could be described as using iterator output (a 2D array of tuples) as the argument for DataFrame.
Here's code for obtaining an array of the product tuples (Julia 1.4.2):
x = [0:0.1:2;]
y = [0:1.5:30;]
product = collect(Iterators.product(x, y))
I want something like this:
x y
Float64 Float64
1 0.0 0.0
2 0.1 1.5
3 0.2 3.0
4 0.3 4.5
5 0.4 6.0
6 0.5 7.5
Many thanks for looking.

A Cartesian product is a set of ordered tuples, so in this case Iterators.product returns a matrix-shaped iterator over tuples, i.e. calling collect on Iterators.product(x, y) returns a Matrix{Tuple{Float64,Float64}}.
But the DataFrame constructor accepts an iterator as an argument and returns the desired result:
x = 0:0.1:2
y = 0:1.5:30
product = Iterators.product(x, y)
df = DataFrame(product)
rename!(df, [:x, :y])
Note that you cannot specify the names of df in the constructor, as in DataFrame(product, [:x, :y]), because no such method exists.

Actually it is even easier. Just write:
julia> rename!(DataFrame(vec(product)), [:x, :y])
441×2 DataFrame
│ Row │ x │ y │
│ │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
│ 1 │ 0.0 │ 0.0 │
│ 2 │ 0.1 │ 0.0 │
⋮
│ 439 │ 1.8 │ 30.0 │
│ 440 │ 1.9 │ 30.0 │
│ 441 │ 2.0 │ 30.0 │
Another nice pattern for two columns is:
julia> flatten(DataFrame(x=x, y=Ref(y)), :y)
441×2 DataFrame
│ Row │ x │ y │
│ │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
│ 1 │ 0.0 │ 0.0 │
│ 2 │ 0.0 │ 1.5 │
⋮
│ 439 │ 2.0 │ 27.0 │
│ 440 │ 2.0 │ 28.5 │
│ 441 │ 2.0 │ 30.0 │
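To unpack why the Ref trick above works: Ref(y) makes the whole vector behave like a single scalar cell, so the constructor repeats it for every x, and flatten then expands each cell into one row per element of y. A minimal sketch:

```julia
using DataFrames

x = 0:0.1:2
y = 0:1.5:30

# each :y cell holds the entire range, one cell per element of x
nested = DataFrame(x = x, y = Ref(y))   # 21 rows
df = flatten(nested, :y)                # 21 * 21 = 441 rows
```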
Using Iterators.product without materializing it first is a bit faster than the flatten pattern, and materializing Iterators.product is fastest (but uses a bit more memory):
julia> @benchmark rename!(DataFrame(Iterators.product($x, $y)), [:x, :y])
BenchmarkTools.Trial:
memory estimate: 10.98 KiB
allocs estimate: 56
--------------
minimum time: 9.400 μs (0.00% GC)
median time: 10.000 μs (0.00% GC)
mean time: 13.129 μs (8.31% GC)
maximum time: 5.644 ms (99.56% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark flatten(DataFrame(x=$x, y=Ref($y)), :y)
BenchmarkTools.Trial:
memory estimate: 17.72 KiB
allocs estimate: 80
--------------
minimum time: 10.299 μs (0.00% GC)
median time: 11.300 μs (0.00% GC)
mean time: 14.268 μs (7.56% GC)
maximum time: 5.400 ms (99.58% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark rename!(DataFrame(vec(collect(Iterators.product($x, $y)))), [:x, :y])
BenchmarkTools.Trial:
memory estimate: 18.03 KiB
allocs estimate: 58
--------------
minimum time: 6.600 μs (0.00% GC)
median time: 7.160 μs (0.00% GC)
mean time: 9.286 μs (11.73% GC)
maximum time: 1.104 ms (98.91% GC)
--------------
samples: 10000
evals/sample: 5
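For completeness, the same product can be built without Iterators.product at all, using repeat with its inner/outer keywords; this is a sketch under the same x and y, not included in the benchmarks above:

```julia
using DataFrames

x = collect(0:0.1:2)
y = collect(0:1.5:30)

# x varies fastest, matching the column-major order of Iterators.product
df = DataFrame(x = repeat(x, outer = length(y)),
               y = repeat(y, inner = length(x)))
```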

You can also use ranges directly in the DataFrame constructor (note that this zips the two ranges element-wise rather than taking their product):
julia> using DataFrames
julia> df = DataFrame(x=0:0.1:2, y=0:1.5:30)
21×2 DataFrame
│ Row │ x │ y │
│ │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
│ 1 │ 0.0 │ 0.0 │
│ 2 │ 0.1 │ 1.5 │
│ 3 │ 0.2 │ 3.0 │
│ 4 │ 0.3 │ 4.5 │
│ 5 │ 0.4 │ 6.0 │
│ 6 │ 0.5 │ 7.5 │
⋮
│ 20 │ 1.9 │ 28.5 │
│ 21 │ 2.0 │ 30.0 │


transform function on all columns of dataframe

I have a dataframe df and I am trying to apply a function to each of the cells. According to the documentation I should use the transform function.
The function should be applied to each column so I use [:] as a selector for all columns
transform(
df, [:] .=> ByRow(x -> (if (x > 1) x else zero(Float64) end)) .=> [:]
)
but it yields an exception
ArgumentError: Unrecognized column selector: Colon() => (DataFrames.ByRow{Main.workspace293.var"#1#2"}(Main.workspace293.var"#1#2"()) => Colon())
although when I am using a single column, it works fine
transform(
df, [:K0] .=> ByRow(x -> (if (x > 1) x else zero(Float64) end)) .=> [:K0]
)
The simplest way to do it is to use broadcasting:
julia> df = DataFrame(2*rand(4,3), [:x1, :x2, :x3])
4×3 DataFrame
│ Row │ x1 │ x2 │ x3 │
│ │ Float64 │ Float64 │ Float64 │
├─────┼───────────┼──────────┼──────────┤
│ 1 │ 0.945879 │ 1.59742 │ 0.882428 │
│ 2 │ 0.0963367 │ 0.400404 │ 0.599865 │
│ 3 │ 1.23356 │ 0.807691 │ 0.547917 │
│ 4 │ 0.756098 │ 0.595673 │ 0.29678 │
julia> @. ifelse(df > 1, df, 0.0)
4×3 DataFrame
│ Row │ x1 │ x2 │ x3 │
│ │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┤
│ 1 │ 0.0 │ 1.59742 │ 0.0 │
│ 2 │ 0.0 │ 0.0 │ 0.0 │
│ 3 │ 1.23356 │ 0.0 │ 0.0 │
│ 4 │ 0.0 │ 0.0 │ 0.0 │
You can also use transform for this if you prefer:
julia> transform(df, names(df) .=> ByRow(x -> ifelse(x>1, x, 0.0)) .=> names(df))
4×3 DataFrame
│ Row │ x1 │ x2 │ x3 │
│ │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┤
│ 1 │ 0.0 │ 1.59742 │ 0.0 │
│ 2 │ 0.0 │ 0.0 │ 0.0 │
│ 3 │ 1.23356 │ 0.0 │ 0.0 │
│ 4 │ 0.0 │ 0.0 │ 0.0 │
Also, looking at the linked pandas solution, DataFrames.jl seems faster in this case:
julia> df = DataFrame(2*rand(2,3), [:x1, :x2, :x3])
2×3 DataFrame
Row │ x1 x2 x3
│ Float64 Float64 Float64
─────┼────────────────────────────
1 │ 1.48781 1.20332 1.08071
2 │ 1.55462 1.66393 0.363993
julia> using BenchmarkTools
julia> @btime @. ifelse($df > 1, $df, 0.0)
6.252 μs (58 allocations: 3.89 KiB)
2×3 DataFrame
Row │ x1 x2 x3
│ Float64 Float64 Float64
─────┼───────────────────────────
1 │ 1.48781 1.20332 1.08071
2 │ 1.55462 1.66393 0.0
(in pandas for 2x3 data frame it was ranging from 163 µs to 2.26 ms)
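If the source => function => destination machinery of transform is more than you need, mapcols from DataFrames.jl is another option: it applies a function to each column and returns a new data frame. A small sketch (the sample values are made up):

```julia
using DataFrames

df = DataFrame(x1 = [0.5, 1.5], x2 = [2.0, 0.1])

# broadcast inside the function so the thresholding is element-wise
clipped = mapcols(col -> ifelse.(col .> 1, col, 0.0), df)
```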

Duplicated columns in Julia Dataframes

In Python Pandas and R one can get rid of duplicated columns easily - just load the data, assign the column names, and select those that are not duplicated.
What are the best practices for dealing with such data in Julia DataFrames? Assigning duplicated column names is not allowed here. I understand the only way would be to massage the incoming data more and get rid of such columns before constructing a DataFrame?
The thing is that it is almost always easier to deal with duplicated columns in the dataframe that is already constructed, rather than in incoming data.
UPD: I meant the duplicated column names. I build dataframe from raw data, where columns names (and thus data) could be repeated.
UPD2: Python example added.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.hstack([np.zeros((4,1)), np.ones((4,2))]), columns=["a", "b", "b"])
>>> df
a b b
0 0.0 1.0 1.0
1 0.0 1.0 1.0
2 0.0 1.0 1.0
3 0.0 1.0 1.0
>>> df.loc[:, ~df.columns.duplicated()]
a b
0 0.0 1.0
1 0.0 1.0
2 0.0 1.0
3 0.0 1.0
I build my Julia Dataframe from a Float32 matrix and then assign column names from a vector. That is where I need to get rid of columns that have duplicated names (already present in dataframe). That is the nature of underlying data, sometimes it has dups, sometimes not, I have no control on its creation.
Is this something you are looking for (I was not 100% sure from your description - if this is not what you want then please update the question with an example):
julia> df = DataFrame([zeros(4,3) ones(4,5)])
4×8 DataFrame
│ Row │ x1 │ x2 │ x3 │ x4 │ x5 │ x6 │ x7 │ x8 │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ 1 │ 0.0 │ 0.0 │ 0.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │
│ 2 │ 0.0 │ 0.0 │ 0.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │
│ 3 │ 0.0 │ 0.0 │ 0.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │
│ 4 │ 0.0 │ 0.0 │ 0.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │ 1.0 │
julia> DataFrame(unique(last, pairs(eachcol(df))))
4×2 DataFrame
│ Row │ x1 │ x4 │
│ │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
│ 1 │ 0.0 │ 1.0 │
│ 2 │ 0.0 │ 1.0 │
│ 3 │ 0.0 │ 1.0 │
│ 4 │ 0.0 │ 1.0 │
EDIT
To deduplicate column names use makeunique keyword argument:
julia> DataFrame(rand(3,4), [:x, :x, :x, :x], makeunique=true)
3×4 DataFrame
│ Row │ x │ x_1 │ x_2 │ x_3 │
│ │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼───────────┼──────────┼──────────┼───────────┤
│ 1 │ 0.410494 │ 0.775563 │ 0.819916 │ 0.0520466 │
│ 2 │ 0.0503997 │ 0.427499 │ 0.262234 │ 0.965793 │
│ 3 │ 0.838595 │ 0.996305 │ 0.833607 │ 0.953539 │
EDIT 2
So you seem to have access to column names when creating a data frame. In this case I would do:
julia> mat = [ones(3,1) zeros(3,2)]
3×3 Array{Float64,2}:
1.0 0.0 0.0
1.0 0.0 0.0
1.0 0.0 0.0
julia> cols = ["a", "b", "b"]
3-element Array{String,1}:
"a"
"b"
"b"
julia> df = DataFrame(mat, cols, makeunique=true)
3×3 DataFrame
│ Row │ a │ b │ b_1 │
│ │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┤
│ 1 │ 1.0 │ 0.0 │ 0.0 │
│ 2 │ 1.0 │ 0.0 │ 0.0 │
│ 3 │ 1.0 │ 0.0 │ 0.0 │
julia> select!(df, unique(cols))
3×2 DataFrame
│ Row │ a │ b │
│ │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
│ 1 │ 1.0 │ 0.0 │
│ 2 │ 1.0 │ 0.0 │
│ 3 │ 1.0 │ 0.0 │
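The two steps above can be wrapped in a small helper (drop_dup_columns is a hypothetical name, and this assumes a DataFrames.jl version with the matrix-plus-names constructor); unique(f, itr) keeps the first index for each distinct column name, which avoids creating the b_1 column only to drop it:

```julia
using DataFrames

# hypothetical helper: keep only the first column for each duplicated name
function drop_dup_columns(mat::AbstractMatrix, cols::Vector{String})
    keep = unique(i -> cols[i], eachindex(cols))  # first occurrence of each name
    DataFrame(mat[:, keep], cols[keep])
end

df = drop_dup_columns([ones(3, 1) zeros(3, 2)], ["a", "b", "b"])
```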

How to convert a GroupedDataFrame to a DataFrame in Julia?

I have performed calculations on subsets of a DataFrame by using the groupby function:
using RDatasets
iris = dataset("datasets", "iris")
describe(iris)
iris_grouped = groupby(iris,:Species)
iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame)
Now I would like to plot the results, but I get an error message for the following plot:
@df iris_avg bar(:Species, :SepalLength)
Only tables are supported
What would be the best way to plot the data? My idea would be to create a single DataFrame and go from there. How would I do this, ie how do I convert a GroupedDataFrame to a single DataFrame? Thanks!
To convert GroupedDataFrame into a DataFrame just call DataFrame on it, e.g.:
julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
in your case.
You could also have written:
julia> combine(:SepalLength => mean, iris_grouped)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
on an original GroupedDataFrame or
julia> by(:SepalLength => mean, iris, :Species)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
on an original DataFrame.
I write the transformation as the first argument here, but typically, you would write it as the last (as then you can pass multiple transformations), e.g.:
julia> by(iris, :Species, :SepalLength => mean, :SepalWidth => minimum)
3×3 DataFrame
│ Row │ Species │ SepalLength_mean │ SepalWidth_minimum │
│ │ Categorical… │ Float64 │ Float64 │
├─────┼──────────────┼──────────────────┼────────────────────┤
│ 1 │ setosa │ 5.006 │ 2.3 │
│ 2 │ versicolor │ 5.936 │ 2.0 │
│ 3 │ virginica │ 6.588 │ 2.2 │
I think you might be better off using the by function to get to your iris_avg directly. by splits a DataFrame into groups and then applies the given function to each group. Often, it's used with a do block.
julia> by(iris, :Species) do df
DataFrame(sepal_mean = mean(df.SepalLength))
end
3×2 DataFrame
│ Row │ Species │ sepal_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
Or equivalently,
julia> by(iris, :Species, SepalLength_mean = :SepalLength => mean)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
See here for more details/examples.
Alternatively, you can do it in several steps as you've done, then use DataFrame constructor to convert to a proper DataFrame:
julia> iris_grouped = groupby(iris,:Species);
julia> iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame);
julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
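For reference, in more recent DataFrames.jl releases (where by and map over a GroupedDataFrame were deprecated) the idiomatic spelling of the same aggregation is combine on a GroupedDataFrame; a sketch with a small stand-in table instead of iris:

```julia
using DataFrames, Statistics

df = DataFrame(Species = ["a", "a", "b", "b"],
               SepalLength = [1.0, 3.0, 2.0, 4.0])

# source column => function => result column name
avg = combine(groupby(df, :Species), :SepalLength => mean => :SepalLength_mean)
```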

Is there an easy way to run DataFrames::by in parallel?

I have a large dataframe I want to compute on in parallel. The call I want to parallelize is
df = by(df, [:Chromosome], some_func)
Is there a way to easily parallelize this? Preferably without any copying.
Also, I guess the kinds of parallelization used should be different depending on the size of the groups created by by.
Minimal reproducible example to use in answers:
using DataFrames, CSV, Pkg
iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv"))
iris_count = by(iris, [:Species], nrow)
On Windows run in console (adjust to the number of cores/threads you have):
$ set JULIA_NUM_THREADS=4
$ julia
On Linux run in console:
$ export JULIA_NUM_THREADS=4
$ julia
Now check if it works:
julia> Threads.nthreads()
4
Run the code below (I update your code to match Julia 1.0):
using CSV, DataFrames, BenchmarkTools
iris = CSV.read(joinpath(dirname(pathof(DataFrames)),"..","test/data/iris.csv"))
iris.PetalType = iris.PetalWidth .> 2; #add an additional column for testing
Let us define some function that operates on a part of a DataFrame
function nrow2(df::AbstractDataFrame)
val = nrow(df)
#do something much more complicated...
val
end
Now the most complicated part of the puzzle comes:
function par_by(df::AbstractDataFrame,f::Function,cols::Symbol...;block_size=40)
#f needs to be precompiled - we precompile using the first row of the DataFrame.
#If we try to do it within the @threads macro
#Julia will crash in most ugly and unexpected ways
#if you comment out this line you can observe a different crash with every run
by(view(df,1:1),[cols...],f);
nr = nrow(df)
local dfs = DataFrame()
blocks = Int(ceil(nr/block_size))
s = Threads.SpinLock()
Threads.@threads for block in 1:blocks
startix = (block-1)*block_size+1
endix = min(block*block_size,nr)
rv= by(view(df,startix:endix), [cols...], f)
Threads.lock(s)
if nrow(dfs) == 0
dfs = rv
else
append!(dfs,rv)
end
Threads.unlock(s)
end
dfs
end
Let's test it and aggregate the results
julia> res = par_by(iris,nrow2,:Species)
6×2 DataFrame
│ Row │ Species │ x1 │
│ │ String │ Int64 │
├─────┼────────────┼───────┤
│ 1 │ versicolor │ 20 │
│ 2 │ virginica │ 20 │
│ 3 │ setosa │ 10 │
│ 4 │ versicolor │ 30 │
│ 5 │ virginica │ 30 │
│ 6 │ setosa │ 40 │
julia> by(res, :Species) do df;DataFrame(x1=sum(df.x1));end
3×2 DataFrame
│ Row │ Species │ x1 │
│ │ String │ Int64 │
├─────┼────────────┼───────┤
│ 1 │ setosa │ 50 │
│ 2 │ versicolor │ 50 │
│ 3 │ virginica │ 50 │
par_by also supports multiple columns:
julia> res = par_by(iris,nrow2,:Species,:PetalType)
8×3 DataFrame
│ Row │ Species │ PetalType │ x1 │
│ │ String │ Bool │ Int64 │
├─────┼───────────┼───────────┼───────┤
│ 1 │ setosa │ false │ 40 │
⋮
│ 7 │ virginica │ true │ 13 │
│ 8 │ virginica │ false │ 17 │
@Bogumił Kamiński commented that it is reasonable to use groupby() before threading. Unless for some reason the groupby cost is too high (it requires a full scan), this is the recommended way, as it makes the aggregation simpler:
ress = DataFrame(Species=String[],count=Int[])
for group in groupby(iris,:Species)
r = par_by(group,nrow2,:Species,block_size=15)
push!(ress,[r.Species[1],sum(r.x1)])
end
julia> ress
3×2 DataFrame
│ Row │ Species │ count │
│ │ String │ Int64 │
├─────┼────────────┼───────┤
│ 1 │ setosa │ 50 │
│ 2 │ versicolor │ 50 │
│ 3 │ virginica │ 50 │
Note that in the example above there are only three groups, so we parallelize over each group. However, if you have a large number of groups you could consider running:
function par_by2(df::AbstractDataFrame,f::Function,cols::Symbol...)
res = NamedTuple[]
s = Threads.SpinLock()
groups = groupby(df,[cols...])
f(view(groups[1],1:1));
Threads.@threads for g in 1:length(groups)
rv= f(groups[g])
Threads.lock(s)
key=tuple([groups[g][cc][1] for cc in cols]...)
push!(res,(key=key,val=rv))
Threads.unlock(s)
end
res
end
julia> iris.PetalType = iris.PetalWidth .> 2;
julia> par_by2(iris,nrow2,:Species,:PetalType)
4-element Array{NamedTuple,1}:
(key = ("setosa", false), val = 50)
(key = ("versicolor", false), val = 50)
(key = ("virginica", true), val = 23)
(key = ("virginica", false), val = 27)
Let me know if it worked for you.
Since more people might have a similar problem, I will turn this code into a Julia package (which is why I kept the code very general).
Start julia with julia -p 4 then run
using CSV, DataFrames
iris = CSV.read(joinpath(dirname(pathof(DataFrames)),"..","test/data/iris.csv"))
g = groupby(iris, :Species)
pmap(nrow, [i for i in g])
This will apply nrow to each group in parallel.
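With threads rather than worker processes, a similar pattern is to index into the GroupedDataFrame from a Threads.@threads loop, writing each group's result into its own slot and concatenating at the end; a sketch with a made-up frame (assumes Julia was started with JULIA_NUM_THREADS > 1 to actually run in parallel):

```julia
using DataFrames

df = DataFrame(Species = repeat(["a", "b", "c"], inner = 50), x = rand(150))

gd = groupby(df, :Species)
parts = Vector{DataFrame}(undef, length(gd))

# each iteration writes to a distinct slot, so no lock is needed
Threads.@threads for i in 1:length(gd)
    g = gd[i]
    parts[i] = DataFrame(Species = g.Species[1], n = nrow(g))
end

res = reduce(vcat, parts)
```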

Julia readtable from a stream instead of from a file

Is there a way to read a table from a network URL or from a pipe running an external command? It seems that DataFrames.readtable only supports reading from a file.
For example in R we could do:
df = read.table(url("http://example.com/data.txt"))
x = read.table(pipe("zcat data.txt | sed /^#/d | cut -f '11-13'"), colClasses=c("integer","integer","integer"), fill=TRUE, row.names=NULL)
using DataFrames, Requests
julia> resp = get("https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.csv?accessType=DOWNLOAD")
Response(200 OK, 17 headers, 27350 bytes in body)
julia> tbl = readtable(IOBuffer(resp.data));
julia> names(tbl)
46-element Array{Symbol,1}:
:JURISDICTION_NAME
:COUNT_PARTICIPANTS
:COUNT_FEMALE
:PERCENT_FEMALE
:COUNT_MALE
:PERCENT_MALE
:COUNT_GENDER_UNKNOWN
:PERCENT_GENDER_UNKNOWN
:COUNT_GENDER_TOTAL
:PERCENT_GENDER_TOTAL
:COUNT_PACIFIC_ISLANDER
:PERCENT_PACIFIC_ISLANDER
:COUNT_HISPANIC_LATINO
:PERCENT_HISPANIC_LATINO
:COUNT_AMERICAN_INDIAN
:PERCENT_AMERICAN_INDIAN
:COUNT_ASIAN_NON_HISPANIC
⋮
:PERCENT_PERMANENT_RESIDENT_ALIEN
:COUNT_US_CITIZEN
:PERCENT_US_CITIZEN
:COUNT_OTHER_CITIZEN_STATUS
:PERCENT_OTHER_CITIZEN_STATUS
:COUNT_CITIZEN_STATUS_UNKNOWN
:PERCENT_CITIZEN_STATUS_UNKNOWN
:COUNT_CITIZEN_STATUS_TOTAL
:PERCENT_CITIZEN_STATUS_TOTAL
:COUNT_RECEIVES_PUBLIC_ASSISTANCE
:PERCENT_RECEIVES_PUBLIC_ASSISTANCE
:COUNT_NRECEIVES_PUBLIC_ASSISTANCE
:PERCENT_NRECEIVES_PUBLIC_ASSISTANCE
:COUNT_PUBLIC_ASSISTANCE_UNKNOWN
:PERCENT_PUBLIC_ASSISTANCE_UNKNOWN
:COUNT_PUBLIC_ASSISTANCE_TOTAL
:PERCENT_PUBLIC_ASSISTANCE_TOTAL
julia> eltypes(tbl)
46-element Array{Type,1}:
Int64
Int64
Int64
Float64
Int64
Float64
Int64
Int64
Int64
Int64
Int64
Float64
Int64
Float64
Int64
Float64
Int64
⋮
Float64
Int64
Float64
Int64
Float64
Int64
Int64
Int64
Int64
Int64
Float64
Int64
Float64
Int64
Int64
Int64
Int64
With the deprecation of Requests in favor of HTTP, here is an example of how to use HTTP.request and the body of the resulting response.
julia> using CSV, HTTP
julia> res = HTTP.request("GET", "http://users.csc.calpoly.edu/~dekhtyar/365-Winter2015/data/CARS/cars-data.csv")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Wed, 16 May 2018 12:46:39 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Mon, 05 Jan 2015 23:29:09 GMT
ETag: "330f-50bf00ea05b40"
Accept-Ranges: bytes
Content-Length: 13071
Content-Type: text/csv
Id,MPG,Cylinders,Edispl,Horsepower,Weight,Accelerate,Year
1,18,8,307,130,3504,12,1970
2,15,8,350,165,3693,11.5,1970
3,18,8,318,150,3436,11,1970
⋮
13071-byte body
"""
julia> res_buffer = IOBuffer(res.body)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=13071, maxsize=Inf, ptr=1, mark=-1)
julia> using DataFrames, DataStreams
julia> df = CSV.read(res_buffer)
406×8 DataFrames.DataFrame
│ Row │ Id │ MPG │ Cylinders │ Edispl │ Horsepower │ Weight │ Accelerate │ Year │
├─────┼─────┼─────┼───────────┼────────┼────────────┼────────┼────────────┼──────┤
│ 1 │ 1 │ 18 │ 8 │ 307.0 │ 130 │ 3504 │ 12.0 │ 1970 │
│ 2 │ 2 │ 15 │ 8 │ 350.0 │ 165 │ 3693 │ 11.5 │ 1970 │
│ 3 │ 3 │ 18 │ 8 │ 318.0 │ 150 │ 3436 │ 11.0 │ 1970 │
⋮
│ 405 │ 405 │ 28 │ 4 │ 120.0 │ 79 │ 2625 │ 18.6 │ 1982 │
│ 406 │ 406 │ 31 │ 4 │ 119.0 │ 82 │ 2720 │ 19.4 │ 1982 │
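On current package versions the pattern is unchanged but the names differ: readtable is gone, CSV.read takes a sink argument, and HTTP.get returns a response whose body can feed an IOBuffer. Both of the R-style cases reduce to handing CSV.read a stream (the URL and command below are placeholders):

```julia
using CSV, DataFrames

# from a URL (requires HTTP.jl and network access):
#   using HTTP
#   df = CSV.read(IOBuffer(HTTP.get("http://example.com/data.txt").body), DataFrame)

# from an external command, analogous to R's pipe():
#   df = CSV.read(open(`zcat data.txt.gz`), DataFrame)

# the same mechanism with an in-memory stream:
io = IOBuffer("a,b\n1,2\n3,4\n")
df = CSV.read(io, DataFrame)
```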