Export pandas dataframe to SAS sas7bdat format - pandas

The flow I have in mind in this:
1. Export a sas7bdat from SAS
2. Import that file in python with pd.read_sas and do some stuff on in
3. Export the pandas dataframe to sas7bdat (or some other SAS binary fileformat). I thought that pd.to_sas would exist, but it doesn't
4. Open the new file in SAS and do further stuff on it
Is there a solution to point 3 above? As I see it, my only options are csv or some SQL database.
This is not really a programming question. hope it won't be an issue.

Python is capable of writing to SAS .xpt format (see for example the xport library), which is SAS's open file format. SAS7BDAT is a closed file format, and not intended to be read/written to by other languages; some have reverse engineered enough of it to read at least, but from what I've seen no good SAS7BDAT writer exists (R has haven, for example, which is the best one I've seen, but it still has issues and things it can't do).
More common than XPT files, though, which can be slow to work with, is to write a CSV and then write a SAS input script in your python/etc. program. That allows you to use variable labels, value labels, types, etc., as you wish very easily; and writing a SAS input script is very easy to do. Many other software packages do this for their preferred method to produce SAS files. This has an additional advantage that it is easily cross-platform - doesn't matter if your SAS program is on a mainframe, UNIX, Windows, etc.; it's all the same.
Edit: If you do have SAS licensed locally, either via a server or local install, another option for exporting Python data to SAS is SASPy, which is a SAS-maintained open source project that allows Python to directly connect to SAS instances and directly send data. (Under the hood, I believe the data is actually transmitted as a CSV most of the time, and then read in using SAS code.) The SAS ODBC driver is also an option, but for Python SASPy will be the easiest option most likely.

"SAS7BDAT is a closed file format, and not intended to be read/written to by other languages; some have reverse engineered enough of it to read at least, but from what I've seen no good SAS7BDAT writer exists."
Although the SAS7BDAT is a proprietary format, it is not closed. It can be read and written by third-party products using SAS' own ODBC drivers. https://support.sas.com/en/software/sas-odbc-drivers.html. Since Python can use ODBC (pyodbc), just use the SAS ODBC Driver to write the SAS7BDAT file format.
IBM SPSS Statistics and IBM SPSS Modeler can also read and write the SAS7BDAT format as well as the earlier pre-version 7 formats and the SAS Transport File format (the .xpt) files noted above. These products do not require ODBC to do this and this capability is included in SPSS Statistics Base via the SAVE Translate command. It is included in SPSS Modeler Professional via the SAS Source node for reading and the SAS Export node for writing.

Related

Generate sas7bdat files from a pandas dataframe

I would like to know if there's any python library that supports this conversion, currently the options i've found are SASpy, csv or SQL database but was unsuccessful.
This is not really a programming question but hope it won't be an issue.
I've found this post:
Export pandas dataframe to SAS sas7bdat format
But was hoping to find any updates on new libraries that support sas7bdat files creation and how licensing works for SASpy.
The sas7bdat is very hard to write. The read is fairly doable (but pretty hard) but the write is brutal. SAS costs a LOT of money and cannot be purchased (it is leased). My suggestions:
Use one of the products by companies that have done it. Some examples: CoyRoc (SSIS adaptor) $, StatTransfer $, SPSS $$$, SAS (lots of dollar signs). WPS might be able to do it but they save to their format to avoid the mess. They probably also support sas7bdat export.
Do not use sas7bdat format. Consider something else like SAS Transport format. Look at my github repository (savian-net) for C# code that can do it. Translate to Python or find a python library that can handle SAS Transport.
The sas7bdat is a binary, proprietary protocol that is 100% not published anywhere. Any docs are guesses based upon binary sleuthing. It is based on an old mainframe format and 'likely remnants' appear to be included. My suggestion is to avoid it like the plague and find an alternative.
An alternative to using xport as Stu suggested - as of Viya 2021.2.6, SAS supports reading externally generated parquet files via the new parquet import engine. As such, you could export the file to parquet via Python then directly import that into SAS and save it as a .sas7bdat file.
https://communities.sas.com/t5/SAS-Communities-Library/Parquet-Support-in-SAS-Compute-Server/ta-p/811733

Karate - Get CSV Key Value from a particular section [duplicate]

I need to create data driven unit tests for different APIs in karate framework. The various elements to be passed in the JSON payload should be taken as input from an excel file.
A few points:
I recommend you look at Karate's built-in data-table capabilities, it is far more readable, integrates into your test-script and you won't need to depend on other software. Refer these examples: call-table.feature and dynamic-params.feature
Next I would recommend using JSON instead of an Excel or CSV file, it is natively supported by Karate: call-json-array.feature
Finally, if you really wanted to, you can call any Java code and if you return data in a Map / List form, it will be ready for Karate to use. This example shows how to read a database via JDBC: dogs.feature. So although this is not built into Karate, just write a simple utility to read a CSV or Excel file and you can do pretty much anything Java can do.
EDIT: Karate now supports CSV files that can be used to even do data-driven testing: https://github.com/intuit/karate#csv-files

how-to-parse-a-ofx-version-1-0-2-file-in power BI?

I just read
How to parse a OFX (Version 1.0.2) file in PHP?
I am not a developer. What easy tool can I use to make this code run with no code skill or appetence ? web browser is pretty hard to use for non dev guys.
I need this to use the file into Power BI, which accept M code, json source or xml, but not sgml ofx or PHP.
Thanks in advance
Welcome Didier to StackOverflow!
I'm going to try and give you a clue how I'd approach the problem here. But keep in mind that your question really lacks details for us to help you, and I'm asking to update your question with example data that you want to integrate into PowerBI. Also, I'm not too familiar with PowerBI nor PHP, and won't go into making that PHP code you linked run for you.
Rather, I'd suggest to convert your OFX file into XML, and then use PowerBI's XML import on that converted file.
From your linked question, I get that your OFX file is in SGML format. There's a program specifically designed to convert SGML into XML (which is just a restricted form of SGML) called osx. I've detailed how to install it on Linux and Mac OS in another question related to SGML-to-XML down-converting; if you're on Windows, you may have luck by just downloading a really ancient (32bit) version of it from ftp://ftp.jclark.com/pub/sp/win32/sp1_3_4.zip. Alternatively, you can use my sgmljs.net software as explained in Converting HTML to XML though that tutorial is really about the much more complex task of converting HTML to XML/XHTML and will probably confuse you.
Anyway, if you manage to install osx, running it on your OFX file (which I assume to have the name yourfile.ofx just for illustration) is just a matter of invoking (on the Windows or Linux/Mac OS command line):
osx yourfile.ofx > yourfile.xml
to result in yourfile.xml which you can attempt to load with PowerBI.
Chances are your OFX file has additional text at the beginning (lines like XYZ:0001 that come before <ofx>). In that case, you can just remove those lines using a text editor before invoking osx on it. Maybe you also need a .dtd file or additional instructions at the top of the OFX file informing SGML about the grammar of your file; it's really difficult to say without seeing actual test data.
Before bothering with SGML and all that, however, I suggest to remove those first few lines in your OFX file (everything until the first < character) and check if PowerBI can already recognize your changed input file as XML (which, from other OFX example files, has a good chance of succeeding). Be sure to work on a copy of your original file rather than overwriting it. Then come back and update your question with your results and example data.

GUI tools for viewing/editing Apache Parquet

I have some Apache Parquet file. I know I can execute parquet file.parquet in my shell and view it in terminal. But I would like some GUI tool to view Parquet files in more user-friendly format. Does such kind of program exist?
Check out this utility. Works for all windows versions: https://github.com/mukunku/ParquetViewer
There is Tad utility, which is cross-platform. Allows you to open Parquet files and also pivot them and export to CSV. Uses DuckDB as it's backend. More info on the DuckDB page:
GH here:
https://github.com/antonycourtney/tad
Actually I found some Windows 10 specific solution. However, I'm working on Linux Mint 18 so I would like to some Linux (or ideally cross-platform) GUI tool. Is there some other GUI tool?
https://www.channels.elastacloud.com/channels/parquet-net/how-about-viewing-parquet-files
There is a GUI tool to view Parquet and also other binary format data like ORC and AVRO. It's pure Java application so that can be run at Linux, Mac and also Windows. Please check Bigdata File Viewer for details.
It supports complex data type like array, map, struct etc. And you can save the read file in CSV format.
GUI option for Windows, Linux, MAC
You can now use DBeaver to
view parquet data
view metadata and statistics
run sql query on one or multiple files. (supports glob expressions)
generate new parquet files.
DBeaver leverages DuckDB driver to perform operations on parquet file. Features like Projection and predicate pushdown are also supported by DuckDB.
Simply create an in-memory instance of DuckDB using Dbeaver and run the queries like mentioned in this document. Right now Parquet and CSV is supported.
Here is a Youtube video that explains the same - https://youtu.be/j9_YmAKSHoA
JetBrains (IntelliJ, PyCharm etc) has a plugin for this, if you have a professional version: https://plugins.jetbrains.com/plugin/12494-big-data-tools

Export from .xls to .sql / creating sql queries

Okay guys, I've been having this problem for a few weeks now and I'm getting no-where with it. I have OpenOffice and regular Office softwares. Both produce flawed .csv files, or at least phpMyAdmin can't read neither of these. Yes, I've been trying to change server's settings of uploading, etc. I also tried to contact my web hosting service and they claimed that all the .csv files I've produced are flawed.
Anyway, I'm looking for a way to convert .xls table to SQL. Most of the softwares out there cost money that I don't have. Furthermore, I've seen PHP systems that do just that, so I know this is possible.
No need converted to. sql, you can import directly with phpmyadmin or using tools like navicat for mysql in phpmyadmin go to the option to import, find the file, select the file type (csv or csv loaddata), in part below defines the column separator (if you do not know which opens the file with notepad)
if a very large file using navicat.
Flawed is "defective"?. I assume you have problem with excel, maybe you have defined the same column separator for separating thousands or decimals, use openoffice to open the file