The text is not distributed across the columns - read.csv

I run my program like this, but the text is not split across the columns. I want the data distributed into its columns. I am reading a CSV file that came from Twitter. Please help me find a solution.

Related

Age Analysis Dynamically Sliced with Before Date

I am trying to create an Age Analysis for Creditors using a dynamic date slicer.
I followed each step specified in David Churchward's blog, but I'm not able to replicate what he suggests there.
Here is the result of what I tried:
I'm expecting to see these values each in their own ageing bucket based on what is outstanding.
Please download my PBIX file to see for yourself, then please advise what I did wrong.
The Excel source for PBIX is also in the folder.
Thank you.
The blog you're referring to is quite old, and DAX has changed a lot since then.
Additionally, Power BI now has a built-in feature called binning which can do something similar to what you're looking for.
I was able to generate the output below using that feature, which automatically groups the data based on the bin size.
There is also a related feature called "Grouping", where you can manually choose the groups and their ranges. If you're up for it, you can use this too. Below is the output for that:
I uploaded the file with these changes in the same folder.
Another resource that might be helpful for you is Radacad's article on dynamic banding.
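For a fully dynamic version along those lines, the usual pattern is one measure per bucket that compares each invoice date to the slicer date. A minimal sketch, assuming the slicer supplies a single date; the table and column names (Creditors[Amount], Creditors[InvoiceDate], 'Slicer Dates'[Date]) are assumptions, not taken from your PBIX:

    Outstanding 0-30 =
    VAR SliceDate = SELECTEDVALUE ( 'Slicer Dates'[Date] )
    RETURN
        CALCULATE (
            SUM ( Creditors[Amount] ),
            FILTER (
                Creditors,
                DATEDIFF ( Creditors[InvoiceDate], SliceDate, DAY ) >= 0
                    && DATEDIFF ( Creditors[InvoiceDate], SliceDate, DAY ) <= 30
            )
        )

Repeat with boundaries 31-60, 61-90 and so on for the other buckets. Note that 'Slicer Dates' typically needs to be a disconnected table, so the slicer drives the calculation instead of filtering the fact table.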

Best approach for this data pipeline?

I need to design a pipeline using NiFi, but I have some questions, as I am torn between two approaches and unsure which processors to use, so maybe you can help me.
The scenario is the following: I need to ingest some .csv files into HDFS. They do not contain the date I want to use to partition the Hive tables I will use later, so I thought of two options:
At some point during the .csv processing, have NiFi launch some kind of code snippet that modifies the .csv file, adding a column with the date.
Create a temporary (internal?) table in Hive, alter the table to add the date column, and finally insert it into the table that I partition by date.
I am unsure which option is better (memory-wise, simplicity, resource management), whether it is even possible, or whether there is a better way to do it altogether. I am also unsure which NiFi processors to use.
Any help is appreciated, thanks.
You should be able to do #1 easily in NiFi without writing any code :)
The steps would be something like this:
Source processor to get your CSV from somewhere, probably GetFile
UpdateAttribute to add an attribute for the current date
UpdateRecord with a CsvReader and CsvWriter, which adds a new date field with the value from #2
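As a sketch, the key property settings would be along these lines (the Expression Language calls are real NiFi functions; the attribute name current.date and the record path /date are just illustrative):

    UpdateAttribute - add a dynamic property:
        current.date = ${now():format('yyyy-MM-dd')}

    UpdateRecord:
        Record Reader              = CSVReader
        Record Writer              = CSVRecordSetWriter
        Replacement Value Strategy = Literal Value
        /date                      = ${current.date}

With "Literal Value" as the strategy, the dynamic property /date writes the evaluated attribute value into a new date field on every record; the writer's schema just needs to include that field.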
I've created an example of how to do this and posted the template here:
https://gist.githubusercontent.com/bbende/113f8fa44250c09a5282d04ee600cd09/raw/c6fe8b1b9f31bb106f9c816e4fd5ea90ebe19f80/CsvAddDate.xml
Save that XML file and use the palette on the left of the NiFi canvas to upload it as a template. Then instantiate the template by dragging the template icon from the top toolbar onto the canvas.

Automating the process of creating a Word doc

I have a .doc template I use for building CVs for many friends.
I'm trying to automate this process using a simple library/program that can, for example, accept data like name, email, phone number, and job title, and create the .doc automatically.
What framework can I use to do this as quickly as possible?
Thanks,
Tal
Where exactly are you keeping this template, and are your friends plugging in the data or are you doing it all yourself?
No matter what, you're basically looking to do a data merge. An example of a data merge is a mail merge:
https://support.microsoft.com/en-us/help/294683/how-to-use-mail-merge-to-create-form-letters-in-word
The same thing really applies to what you're trying to accomplish.
You can take a template, specify the fields that require variable data (i.e. the information that changes), and then just use a spreadsheet to pull the data from and plug it in.
The next thing you'll probably wonder is how data merges use spreadsheets. The way they work is that each column of data in the spreadsheet corresponds to one of the changing fields in your template. I strongly recommend you read up on this further - it's not that difficult once you get the hang of it.
The last question is probably how you'll compile the data into this spreadsheet. Are your friends going to fill out an online form, perhaps? If so, you'll need an online form of some sort, some PHP, and a database to store the submissions; then just export the table as a .csv file once you see enough data populated in your database table to do a data merge.
If you don't have access to MS Office, I'm sure you can accomplish this in OpenOffice.org instead (which is free/open-source).
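If you'd rather script it than run a merge by hand, here is a minimal Python sketch using the python-docx library; note it works on .docx rather than legacy .doc, and the template path and {name}-style placeholders are assumptions for illustration:

    # pip install python-docx
    from docx import Document

    def fill_template(template_path, output_path, data):
        # Replace {key} tokens in the template with values from data.
        doc = Document(template_path)
        for paragraph in doc.paragraphs:
            for run in paragraph.runs:
                for key, value in data.items():
                    token = "{" + key + "}"
                    if token in run.text:
                        run.text = run.text.replace(token, value)
        # Caveat: Word sometimes splits a token across runs;
        # this simple run-level replace misses those cases.
        doc.save(output_path)

    # Hypothetical usage:
    fill_template("cv_template.docx", "tal_cv.docx", {
        "name": "Tal",
        "email": "tal@example.com",
        "phone": "050-1234567",
        "job_title": "Software Engineer",
    })

A mail-merge-style library such as docx-mailmerge does the same thing against Word's own MERGEFIELD fields, if you prefer to build the template with Word's merge tooling.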
Hope this helps.
At my job we do data merges all the time - for mail merges, for letters that need to be personally addressed to individual recipients - and we do this for people who need to print dozens of different business cards for different employees. We take their business card template and just do a data merge from a spreadsheet to save the time of setting up individual files. P.S. You can also use Adobe InDesign for this, if you know how to use it.

Enrich CSV with metadata from database

I've been looking around for a lightweight, scalable solution to enrich a CSV file with additional metadata from a database. Each line in the CSV represents a data item, and the columns hold the metadata belonging to that item.
Basically I have a CSV extract and I need to add additional metadata from a database. The metadata can be accessed via ODBC or REST API call.
I have a number of options in my head but I'm looking for other ideas. My options are as follows:
Import the CSV into a database table, apply the additional metadata with SQL UPDATE statements by finding the necessary metadata with SELECT statements, and then export the data back into CSV format. For this solution I was thinking of using an ETL tool, which may be a bit heavyweight for this problem.
I also thought about a Node.js-based solution where I read the CSV in, call a web service to get the metadata, and write the data back into the CSV file. However, the CSV can be quite large, with potentially tens of thousands of rows, so this could be heavy on memory, or, with line-by-line processing, not very performant.
If you have a better solution in mind, please post. Many thanks.
I think you've come up with a couple of pretty good ideas here already.
Running with your first suggestion of using an ETL tool to enrich your CSV files, you should check out https://github.com/streamsets/datacollector
It's a continuous ingestion approach, so you could even monitor a directory of CSV files and load them as you get them. While there's no specific functionality yet for doing lookups in a database, it's certainly possible in a number of ways (including writing your own custom logic in Java, or a script in Python or JavaScript).
*Full disclosure: I work on this project.
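If you do end up scripting it yourself, the memory worry goes away if you stream the file row by row instead of loading it whole. A minimal Python sketch - the endpoint URL, the id column, and the category/owner metadata fields are all assumptions:

    # pip install requests
    import csv
    import requests

    session = requests.Session()
    cache = {}  # avoid re-fetching metadata for repeated keys

    def lookup_metadata(item_id):
        # Fetch metadata for one item from the (hypothetical) REST API.
        if item_id not in cache:
            resp = session.get("https://metadata.example.com/items/" + item_id)
            resp.raise_for_status()
            cache[item_id] = resp.json()
        return cache[item_id]

    with open("extract.csv", newline="") as src, \
         open("enriched.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["category", "owner"])
        writer.writeheader()
        for row in reader:  # one row at a time keeps memory use constant
            meta = lookup_metadata(row["id"])
            row["category"] = meta.get("category", "")
            row["owner"] = meta.get("owner", "")
            writer.writerow(row)

At tens of thousands of rows, the per-row REST call, not memory, is the likely bottleneck, so caching (or batching lookups) is what keeps it performant.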

How to output the waveform data to files?

Morning,
Here is my problem,
I am trying to save the data captured from the Condition Monitoring system; the code we are using now only acquires the parameters. I'd like to save files for the data that has crossed the threshold.
Now that I have the waveform from the data, how can I write it out to files?
The files should be TDMS; the data is acquired by an FPGA.
Thank you very much!
Kind regards
Jialin
You can use TDMS Write and set your channels and groups there. You can also set properties on your data using TDMS properties.
Have a look at the examples. For instance, this one seems like a good fit for you: TDMS Write Triggered Data VI, in labview\examples\File IO\TDMS\Standard Read and Write.
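LabVIEW examples are block diagrams and can't be shown as text here, but to illustrate the group/channel/property structure that TDMS Write expects, here is the same idea sketched in Python with the npTDMS library; the group, channel, and property values are made up for illustration and this is not the poster's LabVIEW code:

    # pip install npTDMS numpy
    import numpy as np
    from nptdms import TdmsWriter, RootObject, GroupObject, ChannelObject

    waveform = np.random.randn(1000)  # stand-in for a captured waveform

    with TdmsWriter("triggered_capture.tdms") as writer:
        root = RootObject(properties={"source": "Condition Monitoring rig"})
        group = GroupObject("Vibration", properties={"threshold_crossed": True})
        # wf_increment is the standard TDMS waveform property for sample spacing
        channel = ChannelObject("Vibration", "Sensor1", waveform,
                                properties={"wf_increment": 1.0 / 25600})
        writer.write_segment([root, group, channel])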