How to convert txt files to ped and map files using plink?

I am trying to convert a.txt and b.txt files to a1.ped and b1.map files using plink or R. I have read some manuals, but none of them specifically explains how to convert txt to ped and map.
Can somebody help me out?

Related

How to convert PDF/SVG to HPGL and GPGL files in Python?

I need to convert a vector drawing (for a cutting plotter) saved as PDF (A4 format) to an HPGL (.PLT) file and a GPGL (.PLT) file. Are there any ready-to-use libraries in Python to do this? Or any ideas on how to convert it correctly? Thanks in advance.

Mass extract part of a text file using Windows batch

I have thousands of txt files that are actually in JSON format.
Each file has the same string, with different values, namely:
"Name":"xxx","Email":"yyy#zzz.com"
I want to extract the values of these two strings from all the txt files that I put in the same folder.
I found some code in this question:
Extract part of a text file using Windows batch
but it only handles one txt file. What I need is something that processes all the files in one folder.
You can use the FORFILES command to loop through each file.
Syntax:
FORFILES [/p Path] [/m SrchMask] [/s] [/c Command] [/d [+ | -] {date | dd}]
Source:
https://ss64.com/nt/forfiles.html
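If a script is an option instead of batch, the same loop-over-every-file idea can be sketched in Python. This is not the FORFILES approach itself, only a minimal sketch under assumptions: the files sit in one folder, end in .txt, and each contains the "Name":"...","Email":"..." pair from the question. The function name extract_name_email and the folder argument are made up for illustration.

```python
import re
from pathlib import Path

# Match "Name":"<value>" and "Email":"<value>", allowing whitespace around
# the colon. Assumes the values contain no escaped double quotes.
NAME_RE = re.compile(r'"Name"\s*:\s*"([^"]*)"')
EMAIL_RE = re.compile(r'"Email"\s*:\s*"([^"]*)"')


def extract_name_email(folder):
    """Return a list of (file name, name, email) for every .txt file in folder."""
    results = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        name = NAME_RE.search(text)
        email = EMAIL_RE.search(text)
        if name and email:  # skip files that lack either field
            results.append((path.name, name.group(1), email.group(1)))
    return results


if __name__ == "__main__":
    # Print one tab-separated line per file found in the current folder.
    for file_name, name, email in extract_name_email("."):
        print(f"{file_name}\t{name}\t{email}")
```

Only the first match per file is reported; if a file can hold several records, re.finditer would be the natural replacement for re.search.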

Importing a *random* csv file from a folder into pandas

I have a folder with several csv files, with file names between 100 and 400 (Eg. 142.csv, 278.csv etc). Not all the numbers between 100-400 are associated with a file, for example there is no 143.csv. I want to write a loop that imports 5 random files into separate dataframes in pandas instead of manually searching and typing out the file names over and over. Any ideas to get me started with this?
You can use glob and read all the csv files in the directory.
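import glob
import numpy as np
import pandas as pd
files = glob.glob('*.csv')  # every csv file in the current directory
# replace=False so the same file is never picked twice
random_files = np.random.choice(files, 5, replace=False)
dataframes = []
for fp in random_files:
    dataframes.append(pd.read_csv(fp))  # one dataframe per chosen file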
This way you can choose 5 random files from the directory and then read them separately.
I hope this answers your question.

Hive output to xlsx

I am not able to open an .xlsx file. Is this the correct way to output the result to an .xlsx file?
hive -f hiveScript.hql > output.xlsx
hive -S -f hiveScript.hql > output.xls
This will work.
There is no easy way to create an Excel (.xlsx) file directly from Hive. You could write your query's output to a file with the older Excel extension (.xls), as in the answers given above, and it would open in Excel properly (with an initial warning in the latest versions of Office), but in essence it is just a text file with an .xls extension. If you open this file with any text editor, you will see the contents of the query output.
Take any .xlsx file on your system, open it with a text editor, and see what you get. It will be all junk characters, since an .xlsx file is a ZIP archive of XML parts and not a simple text file.
Having said that, there are many programming languages that allow you to read a text file and create an xlsx from it. Since no information is provided or requested on this, I will not go into details. However, you may use Pandas in Python to create Excel files.
Output a csv or tsv file instead; I used Python (the pandas library) to do the conversion.
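The convert-afterwards route can be sketched without any third-party library. The answer above used pandas; this sketch swaps in Python's standard csv module and writes .csv rather than .xlsx, which Excel also opens. It assumes the Hive CLI output (tab-separated by default) was redirected to a text file. The names tsv_to_csv, output.tsv and output.csv are illustrative only.

```python
import csv


def tsv_to_csv(tsv_path, csv_path):
    """Rewrite a tab-separated file as comma-separated; return the row count."""
    rows = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_NONE: treat quote characters in the input as ordinary text.
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(dst)  # quotes fields that contain commas
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows
```

Typical use would be to run hive -S -f hiveScript.hql > output.tsv and then call tsv_to_csv("output.tsv", "output.csv"). For a true .xlsx, pandas' read_csv with sep="\t" followed by DataFrame.to_excel does the same job, provided an Excel writer such as openpyxl is installed.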
I am away from my setup right now so really cannot test this. But you can give this a try in your hive shell:
hive -f hiveScript.hql >> output.xls

In Kettle, reading a csv file from a tar.gz file with Text File Input does not work. What might be wrong?

I have a csv file that is tarred and gzipped, so I have test.tar.gz.
I would like to read the csv file through the Text File Input step.
I tried the path tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv".
But sometimes the read fails.
It throws this exception:
org.apache.commons.vfs.FileNotFolderException:
Could not list the contents of
"tar:gz:file:///C:/test/test.tar.gz!/test.tar!/"
because it is not a folder.
I use Windows 8.1 and PDI 5.2.
What might be wrong?
When reading csv from a compressed file, the "Text File Input" step in Pentaho Kettle only reads the first file inside the compressed archive (either a Zip or a GZip file). Check the compression section of the Pentaho Wiki.
For your issue, try removing the wildcard entry, since only the first file inside the zip/gzip file will be read (as explained above).
I have put together sample code that reads both zip and gzip files. Check it here.
Hope it helps :)
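To check which csv file comes first in the archive, and so which one the step would pick up according to the answer above, the members can be listed outside Kettle. This is a minimal sketch using Python's standard tarfile module, not part of PDI; list_csv_members is an illustrative name.

```python
import tarfile


def list_csv_members(archive_path):
    """Return the names of regular .csv files in a .tar.gz, in archive order."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".csv")
        ]
```

With the archive from the question, the call would be list_csv_members(r"C:\test\test.tar.gz").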