uploading a compiled program to a C51 microcontroller - embedded

I'm trying to upload a compiled program to a microcontroller.. well my problem is not in programming or uploading things.. my problem is what to upload u.u
The program is in C and was compiled with SDCC.
The mcu is an AT89S8252 by ATMEL.
I built a simple parallel port programmer following MCU protocols for serial programming as stated in its datasheet.
So far so good.. but.. what shoud I upload to the mcu??
when compiling, SDCC generates a lot of text reports.. and then an .ihx.. I suspect I should not upload this file directly but post-process it in some way to get the actual raw bytes to upload??
any help will be highly appreciated =)

Is the .ihx file an Intel hex format file by any chance ? If it is you don't have too much more work to do before you can put your programmer to work. Intel hex format is just a specialized text format. Google Intel hex format to find what it looks like. There's a page on Wikipedia for example. Compare that to your .ihx file. If you get a match you should be able to find something to convert it to a raw binary format, ready to push down to your MCU. If you can't, you should be able to write something to do the job in an hour or so, it's very simple. I could possibly email you a tool I have written previously if you are really stuck. Good luck.

Your .ihx file is an Intel hex file I reckon. That, or the similar Motorola S-record format, is usually a good format for programmer software to read. Those formats contain the data to program, as well as the address that the data should be written to. It is more useful than a binary file, which contains no address information.
What software are you using to drive the parallel programmer, and does it accept Intel hex format or Motorola S-record files? Or, do you mean you're writing your own?
You can open an Intel hex file in a text editor and make some sense of its contents. There are many references that explain the format. E.g. Intel HEX in Wikipedia.

Related

how-to-parse-a-ofx-version-1-0-2-file-in power BI?

I just read
How to parse a OFX (Version 1.0.2) file in PHP?
I am not a developer. What easy tool can I use to make this code run with no code skill or appetence ? web browser is pretty hard to use for non dev guys.
I need this to use the file into Power BI, which accept M code, json source or xml, but not sgml ofx or PHP.
Thanks in advance
Welcome Didier to StackOverflow!
I'm going to try and give you a clue how I'd approach the problem here. But keep in mind that your question really lacks details for us to help you, and I'm asking to update your question with example data that you want to integrate into PowerBI. Also, I'm not too familiar with PowerBI nor PHP, and won't go into making that PHP code you linked run for you.
Rather, I'd suggest to convert your OFX file into XML, and then use PowerBI's XML import on that converted file.
From your linked question, I get that your OFX file is in SGML format. There's a program specifically designed to convert SGML into XML (which is just a restricted form of SGML) called osx. I've detailed how to install it on Linux and Mac OS in another question related to SGML-to-XML down-converting; if you're on Windows, you may have luck by just downloading a really ancient (32bit) version of it from ftp://ftp.jclark.com/pub/sp/win32/sp1_3_4.zip. Alternatively, you can use my sgmljs.net software as explained in Converting HTML to XML though that tutorial is really about the much more complex task of converting HTML to XML/XHTML and will probably confuse you.
Anyway, if you manage to install osx, running it on your OFX file (which I assume to have the name yourfile.ofx just for illustration) is just a matter of invoking (on the Windows or Linux/Mac OS command line):
osx yourfile.ofx > yourfile.xml
to result in yourfile.xml which you can attempt to load with PowerBI.
Chances are your OFX file has additional text at the beginning (lines like XYZ:0001 that come before <ofx>). In that case, you can just remove those lines using a text editor before invoking osx on it. Maybe you also need a .dtd file or additional instructions at the top of the OFX file informing SGML about the grammar of your file; it's really difficult to say without seeing actual test data.
Before bothering with SGML and all that, however, I suggest to remove those first few lines in your OFX file (everything until the first < character) and check if PowerBI can already recognize your changed input file as XML (which, from other OFX example files, has a good chance of succeeding). Be sure to work on a copy of your original file rather than overwriting it. Then come back and update your question with your results and example data.

Rule based PDF text extraction for verious bills and invoices

I have to extract text from invoices and bills pdf files
The files layouts can get complex, though its mostly filled with tables.
I've read a few dozens articles already about the pdf format, how easy it is for our brain to grasp it and how hard it is for a machine to understand its structure.
Also downloaded a few tools like the python's pdfminer and some java tools, some even have rule based layout extraction, like LA-PDBtext these are all great libraries, leaving you the final step.
Adobe also has an online service called exportPdf but it can't be customized
Bottom line, I understand that in order to extract text from structured pdf files and convert it to XML for example, there should be some level of manual work.
I also found From Data Extractor, a non free tool with the ability to set extraction rules that claims to do the job, though its hard to find a proper manual and it runs only on windows.
I thought I may even try a to convert those files to images and try tesseract-ocr but decided to ask for advice here before I spend more time on it.
I'll be very grateful if someone with such experience give me a hint.
I've done a lot of PDF extraction and I can confirm as you've already discovered that it can be a painful process to start. One of the important things to understand is that there is no concept of "tables" within a PDF, just text that happens to have lines around it. Also, there's no guarantee that the linear order of text within the PDF code actually matches the visual order when printed. In other words, there's no guarantee that "hello world" is written in that order, it could be draw 'word' at coord 20 then draw 'hello' at coord 10. Most PDF creators don't do this but still there's no guarantee. The more creative a PDF creator is (InDesign, Illustrator, etc) the more likely the text is going to be harder to get out. And actually, once a designer starts messing with fonts too much some programs will sometimes actually output words one character at a time, changing the font just slightly each time.
That said, I'd recommend the first one that you looked at, LA-PDFText. You can run it in discovery mode (blockify) from which you can create rules. I don't have Java installed anymore so I can't test it but it seems very promising.
Your second one, A-PDF Form Data Extractor, only really works with actual PDF forms. If this is your case I'd recommend just using an open source solution like iText/iTextSharp.
The last OCR one makes me cringe. I just can't imagine going through those hoops would get you better text representation than parsing the PDF. But then again, PDF is a visual format so maybe it would.
Personally I use iText/iTextSharp for this kind of thing but I also like to do things the hard way.
It is not clear if you are looking for the development tool to automate the data extraction from bills and invoices or just for the one time tool (utility) that can be used by the non-developer?
Anyway here are some specialized tools including engines they use:
Tabula (open-source, especially designed to extract data from tables in PDF. Can export shell scripts for batch processing, runs as the localhost web service, powered by JRuby Tabula engine)
Viet OCR (open-source .NET desktop utility for text extraction from PDF and images, based on tesseract oct engine)
Bytescout PDF Viewer (freeware closed source .NET utility, detects and extracts tables, including scanned invoices, powered by PDF Extractor SDK)
DISCLAIMER: I work for ByteScout.

Creating hex dump in Objective-C

I want to create a hex dump of a binary file in Mac OS X.
In all apps in Mac OS, when you navigate to /Contents/Resources/, there is a file (or more) with the same name as the app.
I want to write an application in Objective-C that can read that file and convert it to some kind of a hex dump, like terminal's hexdump.
Does anybody know a way to create such a hex dump using Objective-C? (or perhaps an open source hex editor, written in objective-c?)
Thanks in advance.
Hex Fiend is embeddable, if it's an editor that you're after.
Use NSData to read the file as raw data, then display the bytes however you want (for example, using a format string).

Surefire way of determining the codec of a media file

I'm looking for a surefire way of determining the codec used in an audio or video file. The two things I am currently using are the file extension (obvious), and the mime type as determined by running `file -ib' on the file.
This doesn't seem to get me all the way there: loads of formats are `wrapper' formats that hide the exact codec used within -- for example, '.ogg' files can internally use the Vorbis, Speex, or FLAC codecs. Their MIME type is also usually hidden under 'application/ogg' or similar.
The `file' program is apparently able to tell me which codec is used, but it returns this as human-readable prose:
kb.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~0 bps
and as such it is dodgy to use programmatically.
What I'm essentially asking is: is there a script out there (any language) that can wade through these wrapper formats and tell me what the meat of the file is made of?
ffmpeg includes a library called libavformat that can open and demux pretty much any media format. Obviously that's more than you actually need, but I don't think you can find anything else that's quite as complete. I've used it myself with great success. Take a look at this article for an introduction. There's also bindings for these libraries for some common scripting languages, such as python.
(If you don't want to build something using the library, you can probably use the regular ffmpeg binary.)
You can always use your own magic file, copied and modified from the pre-installed magic file, and change the return string so that it can be easily parsed by your program.
See:
http://linux.die.net/man/1/file
http://linux.die.net/man/5/magic

What file type starts with BOSS 7?

I am looking at some files generated in the early 90s. One of them seems to hold references to data packed in some binary format in a number of large files.
The first six bytes of the file are 0x42 0x4f 0x53 0x53 0x20 0x37 which spells BOSS 7.
My searches of various sources of file type information, including /usr/share/file/magic have not turned up anything. Does anyone know what software might have been used to generate files that start with these bytes? Any information on file layout would be great.
It looks like the file might have been generated by VisualWorks Smalltalk:
[BOSS 7.5]
Contains the Binary Object Streaming Service, which supports efficient storage and
retrieval of objects, including code, to and from files.
Note that for code storage, the parcel system now supercedes BOSS.
I tried to load the file using the IDE provided at http://www.cincomsmalltalk.com/ and it generated a meaningful exception:
The identifier MediaCollectionDictionary has no binding
The file does contain:
MediaCollectionDictionary
MediaCollection*
CallMediaVehDict2
etc which means, if I could now figure out what the rest of the files do and learn enough SmallTalk, I could disentangle this mess.
Of course, I am not sure if this analysis is correct. So, please if you have any other ideas, let me know. Thank you.
Much later: So, my initial assessment seems to be correct. I got some useful tips on comp.lang.smalltalk: http://groups.google.com/group/comp.lang.smalltalk/browse_thread/thread/5d55d857e2f80158#
Ask on comp.lang.smalltalk
Ask on the vwnc mailing list