How does one export a sample dataset (e.g., lalonde)?
I am able to run the MatchIt examples and export the output datasets from the MatchIt process, but I can't figure out how to export the source example dataset.
How can a reference to lalonde be made in this type of command:
tosas <- data.frame(m.data)
I'm new to R.
Thanks.
You can just bring in the data set with data() and then do what you want with it. It will be loaded as a data frame. You can export it to a CSV using write.csv(), for example.
data("lalonde", package = "MatchIt")
summary(lalonde)
write.csv(lalonde, file = "lalonde.csv")
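To connect that to the command in the question: once the data set has been loaded with data(), you can refer to lalonde by name exactly as you would m.data. A minimal sketch (tosas is just the variable name from the question):
data("lalonde", package = "MatchIt")
tosas <- data.frame(lalonde)  # reference lalonde just like m.data
write.csv(tosas, file = "lalonde.csv", row.names = FALSE)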
I am trying to export query output to a file in cloud storage.
The query output is always <1GB, but EXPORT DATA OPTIONS creates multiple smaller files.
Example:
EXPORT DATA OPTIONS(
uri='gs://test_bucket/test_file_*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=';') AS
SELECT * FROM `test.test_table`;
When I provide a filename without a wildcard (gs://test_bucket/test_file_1.csv), I see the error "Invalid uri specification. Option 'uri' value must be a wild card URI."
Is there any way to always generate only ONE file using EXPORT DATA OPTIONS?
Cross reference: the code
EXPORT DATA OPTIONS(
uri='gs://test_bucket/test_file_*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=';') AS
SELECT DISTINCT * FROM `test.test_table`;
as answered by nicolas noziere (https://stackoverflow.com/a/66388650/4985705), always generates only ONE file: adding DISTINCT forces all the data to be processed by a single worker.
I have never used SQL before and I am trying to do something that should be simple, but it is taking hours to solve. I would like to download a table that is in a project on Google Cloud Platform. I am asked to "Choose where to save the results data from the query" and I choose "CSV (Google Drive) Save up to 1GB...". However, I get this message:
Table dataset_reference { project_reference { project_id: "escolas-259115" gaia_id: 777399094185 } dataset_id: "_31e2c29542f3fa3caf4d6d069271a277dce8d215" dataset_uuid: "9872dc9f-2c66-4088-b7b1-b949e7541f07" } table_id: "anon76eccaf0_a96b_4617_8a3c_c2d5ee734662" table_uuid: "76eccaf0-a96b-4617-8a3c-c2d5ee734662" too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data.
Here is the code that I am using:
SELECT
ano,
estado_abrev,
id_municipio,
causa_basica,
idade,
genero,
raca_cor,
numero_obitos
FROM
`basedosdados.br_ms_sim.municipio_causa_idade_genero_raca`
As I said, I have never worked with SQL before.
What you tried is meant for exporting a table. Here, you want to export a query result instead. You can achieve that like this:
EXPORT DATA OPTIONS(
uri='gs://mybucket/transformed/sales-*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=';') AS
SELECT
ano,
estado_abrev,
id_municipio,
causa_basica,
idade,
genero,
raca_cor,
numero_obitos
FROM
`basedosdados.br_ms_sim.municipio_causa_idade_genero_raca`
Hope you are doing well!
I was following tutorials for process mining using PM4Py, but I ran into difficulties with my CSV file.
My CSV file has these columns: 'id', 'status', 'mailID', 'date', ... ('status' is the same as 'activity'; it contains some specific choices).
My CSV file contains a lot of data.
To follow the process mining tutorial I need columns like 'case:concept:name', but I don't know how to create them.
In your case, I assume 'id' would be the same as the Case ID in normal process mining terminology. Similarly, 'status' corresponds to Activity ID and 'date' would correspond to the timestamp.
The best option is to first read into a pandas dataframe before feeding into PM4Py.
For a detailed understanding of how to do this, here is an example below. As you have not mentioned all the columns that you have in your csv file, let us assume that currently you only have [ 'id', 'status', 'date' ] as your column list. The following code can be adapted to any number of columns you have (by adding them to the list named cols) :
import pandas as pd
from pm4py.objects.conversion.log import converter as log_converter
path = ''  # enter the path to the csv file
data = pd.read_csv(path)
# Rename the columns to the standard names PM4Py expects: case id, activity, timestamp.
cols = ['case:concept:name', 'concept:name', 'time:timestamp']
data.columns = cols
# Cast the columns to the datatypes PM4Py expects.
data['time:timestamp'] = pd.to_datetime(data['time:timestamp'])
data['concept:name'] = data['concept:name'].astype(str)
# Convert the dataframe into an event log object.
log = log_converter.apply(data, variant=log_converter.Variants.TO_EVENT_LOG)
Here we have changed the column names and their datatypes as required by the PM4Py package, and converted the dataframe into an event log using the log_converter function. Now you can perform your regular process mining tasks on this event log object. For instance, if you wish to create a Directly-Follows Graph from the event log, you can use the following lines of code:
from pm4py.algo.discovery.dfg import algorithm as dfg_algorithm
dfg = dfg_algorithm.apply(log)
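If you also want to view the resulting graph, the sketch below follows the DFG visualizer example from the PM4Py documentation (this assumes Graphviz is installed and that your PM4Py version exposes the visualizer under this import path):
from pm4py.visualization.dfg import visualizer as dfg_visualization
# Render the directly-follows graph annotated with frequencies.
gviz = dfg_visualization.apply(dfg, log=log, variant=dfg_visualization.Variants.FREQUENCY)
dfg_visualization.view(gviz)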
First you need to import your CSV file using pandas, then convert it to an event log object; finally, you can use it in PM4Py.
reference:
https://pm4py.fit.fraunhofer.de/documentation
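As a concrete sketch of those three steps, assuming a recent PM4Py 2.x release (where the convenience helpers pm4py.format_dataframe and pm4py.convert_to_event_log are available) and the column names from the question ('events.csv' is a hypothetical file name):
import pandas as pd
import pm4py
df = pd.read_csv('events.csv')  # hypothetical file name
df['date'] = pd.to_datetime(df['date'])  # make sure the timestamp column is a datetime
# Map the question's columns onto the standard PM4Py column names.
df = pm4py.format_dataframe(df, case_id='id', activity_key='status', timestamp_key='date')
# Convert the dataframe into an event log object for use with PM4Py.
event_log = pm4py.convert_to_event_log(df)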
I have a function module that counts some variables in the SAP system and exports the result as a single INT4. But when I try to use it in a gateway service, I get the error
"no output table mapped". How can I overcome this? I tried to put the variable in a table and export that instead, but I couldn't get it to work.
DATA: EV_ENQ       TYPE STANDARD TABLE OF seqg3,
      EV_TABLESIZE TYPE i. " single INT4 holding the total lock count

CALL FUNCTION 'ENQUEUE_READ'
  EXPORTING
    guname = '*'
  IMPORTING
    number = EV_TABLESIZE
  TABLES
    enq    = EV_ENQ.
Ev_Tablesize is the variable that I want to export. It holds the total lock count.
Your parameter should be mapped under your service implementation in SEGW. If it is not, you should map it again and make sure that the parameter is displayed.
I'm using EntityFramework to access a sql server to return data. The data needs to be formatted into a tab delimited file. I then want to compress the data to return to the user.
I can do the select, and then iterate over the EF objects and format all the data into one big string, but this takes forever (I'm returning about 800k rows). The query itself is quite fast; it's just the creation of the csv file in memory that is killing it.
I found this post that describes how to use sqlcmd to do this directly as an export (but with csv) from sql, which seems very promising, but I'm unclear how to pass the -E and other parameters to ExecuteSqlCommand()... or if it is even meant for this.
I tried to do something like this:
var test = context.Database.ExecuteSqlCommand(
    "select Chromosome c, StartLocation sl, Endlocation el, GeneName gn " +
    "from Gencode where c = chr1",
    "-E", "-Q", new SqlParameter("-s", "\t"));
But of course that didn't work...
Any suggestions as to how to go about this? I'm using EF 6.1 if that matters.
An alternate option, using a simple method:
F5 --> store result --> keep the file name