Bulk user account creation from CSV data import/ingestion - sql

Hi all brilliant minds,
I am currently working on a fairly complex problem and I would love to get some idea brainstorming going on. I have a C# .NET web application running in Windows Azure, using SQL Azure as the primary datastore.
Every time a new user creates an account, all they need to provide is a name, email and password. Upon account creation, we store the core membership data in the SQL database, and all the secondary operations (e.g. sending emails, establishing social relationships, creating profile assets, etc.) get pushed onto an Azure Queue and are picked up/processed later.
Now I have a couple of CSV files containing hundreds of new users (names & emails) that need to be created on the system. I am thinking of automating this by breaking it into two parts:
Part 1: Write a service that ingests the CSV files, parses out the names & emails, and saves this data in storage A
- This service should be flexible enough to take files with different formats
- This service does not actually create the user accounts, so it is decoupled from the business logic layer of our application
- The choice of storage does not have to be SQL; it could also be a non-relational datastore (e.g. Azure Tables)
- This service could be a third-party solution outside of our application platform, so it is open to all suggestions
Part 2: Write a process that periodically goes through storage A and creates the user accounts from there
- This lives in the "business logic layer" of our application
- Whenever an account is successfully created, mark that specific record in storage A as processed
- This needs to be retryable in case of failures during user account creation
I'm wondering if anyone has experience importing bulk users from files, and whether what I am suggesting sounds like a decent solution.
Note that Part 1 could be a third-party solution outside of our application platform, so there's no restriction on what language/platform it has to run on. We are thinking about either using BULK INSERT, or SQL Server Integration Services 2008 (SSIS), to ingest and load the data from CSV into the SQL datastore. If anyone has worked with these and can provide some pointers, that would be greatly appreciated too. Thanks so much in advance!

If I understand this correctly, you already have a process that picks up messages from a queue and runs the core logic to create the user, assets, etc. So it sounds like you only need to automate parsing the CSV files and dumping their contents into queue messages? That sounds like a trivial task.
You can also kick off the CSV processing via a queue message (to a different queue). The message would contain the location of the CSV file, and the Worker Role running in Azure would pick it up (it could be the same worker role as the one that processes new users, if the usual load is not high).
Since you're using queues, the process is retryable.
HTH
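The parse-and-enqueue step the answer describes can be sketched as below. This is a minimal illustration (in Python for brevity, though the app is C#): the `enqueue` callback stands in for a real queue client call, and the column-alias map is one hypothetical way to honour the "flexible enough to take files with different formats" requirement.

```python
import csv
import io
import json

def csv_to_queue_messages(csv_text, enqueue):
    """Parse a CSV of new users and push one queue message per row.

    `enqueue` is whatever your queue client exposes (e.g. a thin wrapper
    around an Azure Queue put-message call); it's injected here so the
    parser stays decoupled from the queue implementation.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Tolerate differently named columns across source files.
    aliases = {"name": "name", "full name": "name",
               "email": "email", "e-mail": "email", "email address": "email"}
    count = 0
    for row in reader:
        record = {}
        for key, value in row.items():
            canonical = aliases.get(key.strip().lower())
            if canonical:
                record[canonical] = value.strip()
        if record.get("name") and record.get("email"):
            enqueue(json.dumps(record))
            count += 1
    return count

# Usage: collect messages in a list instead of a real queue.
messages = []
sample = "Full Name,E-mail\nAda Lovelace,ada@example.com\nAlan Turing,alan@example.com\n"
sent = csv_to_queue_messages(sample, messages.append)
```

With this shape, the existing account-creation worker consumes the same message format it already handles, and retries come for free from the queue.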

Related

Database for live mobile tracking

I'm developing an app that tracks a mobile device instantly (live) ... I need some advice. The application must send the location to a web service, which in turn records the received data in a database.
What would be, in your opinion, the best way to store the location values?
I'm new to big data and I'm afraid that simple SQL requests won't be able to do the job properly ... I imagine that if there are lots of users and each user sends a request every second, I'll have issues with the database ...
Any advice? Thank you very much
I think you could have a look at the geospatial queries in Mongo, if you choose to go ahead with MongoDB.
Refer here
And here
The design of the database will depend on the nature of the queries (essentially the reads and writes).
Worth having a look into.
Working at Cintric, we landed on using Elasticsearch. We process billions of location points in real time and provide advanced analytics to our users.
We started with MongoDB and ran into a lot of trouble, eventually leading to a painful migration.
Our stack currently has mobile devices dump location updates into AWS Kinesis, which are then processed by AWS Lambda handlers and dumped into Elasticsearch. We're able to serve, process and store 300 million requests/month for only a few hundred dollars/month. Analytics for our dashboard add additional cost, but for your needs I would highly recommend checking out your options on AWS.
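The Kinesis-to-Lambda-to-Elasticsearch leg of that pipeline can be sketched roughly as follows. The index name, field names, and `bulk_index` stand-in are all assumptions for illustration (in a real deployment `bulk_index` would be a call like `elasticsearch.helpers.bulk`); the Kinesis event shape, with base64-encoded record data, is what Lambda actually delivers.

```python
import base64
import json

def handler(event, context=None):
    """Sketch of a Lambda handler for a Kinesis-triggered location stream.

    Decodes each Kinesis record and turns it into an Elasticsearch bulk
    action. Index/field names are illustrative.
    """
    actions = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        actions.append({
            "_index": "locations",
            "_source": {
                "user_id": payload["user_id"],
                "location": {"lat": payload["lat"], "lon": payload["lon"]},
                "ts": payload["ts"],
            },
        })
    return bulk_index(actions)

def bulk_index(actions):
    # Stand-in: replace with a real Elasticsearch bulk call in production.
    return len(actions)

# Simulated Kinesis event with one record, as Lambda would deliver it:
raw = json.dumps({"user_id": "u1", "lat": 40.77, "lon": -73.97, "ts": 1}).encode()
event = {"Records": [{"kinesis": {"data": base64.b64encode(raw).decode()}}]}
indexed = handler(event)
```

Mapping `location` as a `geo_point` in the index then gives you geo-distance queries and aggregations on the Elasticsearch side.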

Using SES to send a personalised weekly digest email?

I am wondering where to start with setting up a weekly personalised digest email to go out to my users (over 200k).
It would pull in content specific to each of them. We currently use SES for notifications, on a Windows EC2 instance with SQL Server.
Is there a cron-style thing for Windows/IIS?
Probably the easiest way to do this is to develop a console application to send your emails, and then use the Windows Task Scheduler to run it once a week.
Within your console application you'll basically get your users from your database, foreach through each user gathering whatever personalised data you need to build up an email message, and then pass the message off to Amazon SES.
To use Amazon SES you'll need to request a sending-quota increase, because the default quotas are way below what you need: the default sending quota is 10,000 emails per 24-hour period, with a maximum send rate of 5 emails per second.
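The send loop of such a console app might look like the sketch below (Python for brevity, though the same shape works in C#). The `send_fn` and `build_message` callables are stand-ins for the real SES call (e.g. boto3's `ses.send_email`) and your per-user templating; the throttling honours the 5-emails/second default rate mentioned above.

```python
import time

def send_weekly_digest(users, build_message, send_fn,
                       max_per_second=5, sleep=time.sleep, clock=time.monotonic):
    """Iterate users, build each personalised message, and hand it to
    the mail sender while staying under max_per_second sends/second.
    """
    window_start = clock()
    sent_in_window = 0
    total = 0
    for user in users:
        if sent_in_window >= max_per_second:
            # Rate window full: wait out the remainder of the second.
            elapsed = clock() - window_start
            if elapsed < 1.0:
                sleep(1.0 - elapsed)
            window_start = clock()
            sent_in_window = 0
        send_fn(user["email"], build_message(user))
        sent_in_window += 1
        total += 1
    return total

# Usage with stand-ins instead of a real SES client:
outbox = []
users = [{"email": f"u{i}@example.com", "name": f"User {i}"} for i in range(7)]
total = send_weekly_digest(
    users,
    build_message=lambda u: f"Hi {u['name']}, here is your weekly digest.",
    send_fn=lambda addr, body: outbox.append((addr, body)),
)
```

At 200k users and 5 emails/second a full run takes around 11 hours, which is another reason to request the quota increase before going live.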
To implement this functionality you'll need these components:
1. An application that collates all the bits of information and creates an email message. The same app will probably also hand the message to the email server (SES). If every message is unique to the user, then a generic mail-merge won't cut it; hence the question: how can I send an email message programmatically? Options:
- Write your own script, in C# or any other language. It needs to connect to the DB and extract either the pieces of the email body, or a whole message collated by a SQL query / stored procedure; it will also extract the email addresses and send the email.
- Use SSIS to build the process of gathering the needed bits and sending the emails. It offers a graphical interface for the process map. It may not be as fast as a custom script, but scheduling is very simple with SQL Server Agent, and you can implement different processes depending on run-time calculations.
- Use other software to create and send the emails (except Mail Merge in MS Word, joking).
2. A scheduling tool that runs the app from step 1 on a regular basis. Options:
- The Windows scheduling tool (Task Scheduler).
- SQL Server Agent. It can run SQL scripts and stored procedures; scripts and SPs can issue file-system commands (call .exe files, read data from files, etc.), but you'll need to do some research on syntax, functionality and the necessary permissions.
- Other scheduling apps.
3. Content control. This may be done by the app, but you'll create some tables, or use files, for settings, common parts of the email message, etc. You'll also want to keep a record of the various rules used to create the custom messages (the logic).
Generic advice: for a first attempt, go with software you are familiar with. The solution may be cumbersome, but the longest way is taking shortcuts.
There are many mass-mailing and mail-merge applications around. You can find those easily, compare functionality, and maybe choose one of them.
Disclaimer: I'm the author of the library mentioned below.
You might be interested in the library I'm writing. It's called nvelopy. I created it to be easy to embed in a .NET application for sending transactional emails and campaigns. That way, it can directly access your data (users) to create a segment and send it via Amazon while respecting the sending quota.
I developed it for my own web service. I didn't want to set up another server or export/import users, and I wanted it to "talk" to my datastore (RavenDb). A SQL datastore connector is also in the works.
Let me know if you have any questions (via the contact page of the nvelopy web site).

Periodic Email Notifications (Windows Azure .Net)

I have an application written in C# ASP.NET MVC4 running as a Windows Azure Website. I would like to write a service/job to perform the following:
1. Read the user information from the website database
2. Build a user-wise site activity summary
3. Generate an HTML email message that includes the summary for each user account
4. Periodically send such emails to each user
I am new to Windows Azure Cloud Services and would like to know the best approach/solution to achieve the above.
Based on my study so far, it seems that an independent Worker Role in Cloud Services, along with SendGrid and Postal, would be the best fit. Please suggest.
You're on the right track, but... Remember that a Worker Role (or Web Role) is basically a blueprint for a Windows Server VM, and you run one or more instances of that role definition. And that VM, just like Windows Server running locally, can perform a bunch of tasks simultaneously. So... there's no need to create a separate worker role just for sending hourly emails. Think about it: for nearly an hour at a time it'll be sitting idle, and you'll be paying for it (for however many instances of the role you launch, and you cannot drop to zero: you'll always need a minimum of one instance).
If, however, you create a thread on an existing worker or web role, which simply sleeps for an hour and then does the email updates, you basically get this ability at no extra cost (and it should have minimal impact on the other tasks running on that web/worker role's instances).
One thing you'll need to handle, whether you use a separate role or reuse one: be prepared for multiple instances. That is, if you have two role instances, they'll both be running the code that checks every hour, so you'll need a scheme to prevent both instances doing the same task. This can be solved in several ways. For example, use a queue message that stays invisible for an hour and then appears; your code would check, say every minute, for a queue message, and the first instance to get it does the hourly work. Or maybe run Quartz.NET.
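The invisible-queue-message scheme can be sketched as below. A real implementation would use an Azure Storage Queue (`add_message` with a `visibility_timeout`); here a minimal in-memory queue stands in so the single-runner behaviour is visible, and all names are illustrative.

```python
# Sketch of the "invisible queue message" pattern for ensuring only one
# role instance runs the hourly email job.

class FakeQueue:
    def __init__(self):
        self._messages = []
        self._invisible = []

    def add_message(self, body, visibility_timeout=0):
        if visibility_timeout > 0:
            # In Azure the message reappears after the timeout elapses;
            # this toy just parks it so it stays hidden during the demo.
            self._invisible.append(body)
        else:
            self._messages.append(body)

    def get_message(self):
        # Like Azure's dequeue: a retrieved message is hidden from other
        # consumers, so only one instance receives it.
        return self._messages.pop(0) if self._messages else None

def poll_once(queue, do_hourly_work, results):
    msg = queue.get_message()
    if msg is not None:
        do_hourly_work()
        # Re-arm the timer: post the next trigger, invisible for an hour.
        queue.add_message("hourly-email", visibility_timeout=3600)
        results.append("ran")
    else:
        results.append("idle")

queue = FakeQueue()
queue.add_message("hourly-email")
runs = []
emails_sent = []
# Two role instances poll at the same moment; only one does the work.
poll_once(queue, lambda: emails_sent.append("digest batch"), runs)
poll_once(queue, lambda: emails_sent.append("digest batch"), runs)
```

Because the dequeued message is invisible to everyone else, the scheme needs no locks or leader election; the queue itself arbitrates which instance runs the job.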
I didn't know about Postal, but it looks like the right combination to use.

What is the best way to process and import a large csv (500k records) to SQL server using Vbscript?

I have a system that requires a large number of names and email addresses (two fields only) to be imported via CSV upload.
I can deal with the upload easily enough, but how would I verify the email addresses before I process the import?
Also, how could I run this quickly, or as a background process, without requiring the user to watch a script churning away?
Using Classic ASP / SQL Server 2008.
Please, no jibes at the Classic ASP.
Do you need to do this upload via the ASP application? If not, whatever scripting language you feel most comfortable with and can do this in with the least coding time is the best tool for the job. If you need users to be able to upload into the Classic ASP app, with a reliable process that inserts the valid records into the database and rejects the invalid ones, your options change.
Do you need to provide feedback to the users, like telling them exactly which rows were invalid?
If that second scenario is what you're dealing with, I would have the ASP app simply store the file, and have another process (a .NET service, scheduled task, or something similar) do the importing and report on its progress in a text file that the ASP app can check. That brings you back to doing it in whatever scripting language you're comfortable with, and you don't have to deal with the HTTP request timing out.
If you Google "regex valid email" you'll find a variety of regular expressions for identifying invalid email addresses.
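The store-the-file-and-process-later approach might look like this sketch (Python standing in for whatever scripting language you pick). The regex is one of the many conservative "reject obvious garbage" patterns such a search turns up, not a full RFC validator, and the `insert_row` callback and progress-file format are assumptions for illustration.

```python
import csv
import io
import re

# Conservative email check: something@something.tld, no spaces or
# extra @ signs. Deliberately loose rather than RFC-complete.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def import_users(csv_text, insert_row, progress_path=None):
    """Validate each (name, email) row, hand good rows to `insert_row`
    (your DB insert), collect the bad row numbers for user feedback,
    and write a status line the ASP page can poll.
    """
    good, bad = 0, []
    rows = list(csv.reader(io.StringIO(csv_text)))
    for i, row in enumerate(rows, start=1):
        if len(row) == 2 and EMAIL_RE.match(row[1].strip()):
            insert_row(row[0].strip(), row[1].strip())
            good += 1
        else:
            bad.append(i)
        if progress_path:
            with open(progress_path, "w") as f:
                f.write(f"{i}/{len(rows)} processed, {len(bad)} rejected\n")
    return good, bad

# Usage with a list standing in for the database insert:
inserted = []
data = "Ada Lovelace,ada@example.com\nBad Row,not-an-email\n"
good, bad = import_users(data, lambda name, email: inserted.append(email))
```

Returning the bad row numbers gives the ASP app exactly what it needs to tell users which lines were rejected.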
In a former life, I used to do this sort of thing by dragging the file into a working table using DTS and then working it over with batches of SQL commands. Today, you'd use Integration Services (SSIS).
This lets you get the data into SQL Server very quickly and prevents the script timing out; then you can use whatever method you prefer (e.g. AJAX-driven batches, redirection-driven batches, etc.) to work over discrete chunks of the data, or schedule it to run as a single batch (a SQL Server job) and just report on the results.
You might be lucky enough to get your 500k rows processed in a single batch by your upload script, but I wouldn't chance it.

Timer-based event triggers

I am currently working on a project with specific requirements. A brief overview of these are as follows:
Data is retrieved from external webservices
Data is stored in SQL 2005
Data is manipulated via a web GUI
The windows service that communicates with the web services has no coupling with our internal web UI, except via the database.
Communication with the web services needs to be both time-based, and triggered via user intervention on the web UI.
The current (pre-pre-production) model for triggering web service communication is a database table that stores trigger requests generated by manual intervention. I do not really want to have multiple trigger mechanisms, but I would like to be able to populate the database table with triggers based on the time of the call. As I see it, there are two ways to accomplish this.
1) Adapt the trigger table to store two extra parameters: a flag for "is this time-based or manually added?", and a nullable field for the timing details (exact format to be determined). If it is a manually created trigger, mark it as processed once the trigger has fired, but not if it is a timed trigger.
or
2) Create a second Windows service that creates the triggers on the fly at timed intervals.
The second option seems like a fudge to me, but managing option 1 could easily turn into a programming nightmare (how do you know whether the last poll of the table returned the event that needs to fire, and how do you then stop it re-triggering on the next poll?).
I'd appreciate it if anyone could spare a few minutes to help me decide which route (one of these two, or possibly a third, unlisted one) to take.
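For what it's worth, the re-triggering worry in option 1 can be tamed by storing a next-run time per timed trigger and advancing it when the trigger fires, so a second poll in the same window finds nothing due. The sketch below (Python, with an in-memory list standing in for the trigger table) illustrates the idea; all column/field names are hypothetical.

```python
import datetime as dt

# Option 1 sketch: one trigger table holds both manual and timed
# triggers. Timed triggers carry a next_run timestamp that is advanced
# on firing, which prevents re-triggering on the next poll.

def poll_triggers(triggers, now, fire):
    for t in triggers:
        if t["kind"] == "manual":
            if not t["processed"]:
                fire(t["name"])
                t["processed"] = True  # fire manual triggers exactly once
        elif t["kind"] == "timed":
            if now >= t["next_run"]:
                fire(t["name"])
                # Advance past `now` even if earlier polls were missed.
                while t["next_run"] <= now:
                    t["next_run"] += t["interval"]

fired = []
now = dt.datetime(2024, 1, 1, 12, 0)
triggers = [
    {"kind": "manual", "name": "user-requested-sync", "processed": False},
    {"kind": "timed", "name": "hourly-sync",
     "next_run": now, "interval": dt.timedelta(hours=1)},
]
poll_triggers(triggers, now, fired.append)                             # both fire
poll_triggers(triggers, now + dt.timedelta(minutes=5), fired.append)  # nothing due
```

In SQL terms this is just one extra datetime column plus an `UPDATE ... SET NextRunTime = ...` inside the same transaction that records the firing.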
Why not use a SQL Server Agent job instead of the Windows service? You can encapsulate all of your db "trigger" code in stored procedures. Then your UI and the SQL job can call the same stored procedures and create the triggers the same way, whether manually or at a timed interval.
The way I see it is this.
You have a Windows service playing the role of a scheduler, and in it some classes that simply call the web services and put the data in your database.
So you could use these classes directly from the web UI as well, and import the data based on the web UI trigger.
I don't like the idea of storing a user-generated action as a flag (trigger) in the database, where some service will poll it (at an interval that is not under the user's control) to execute that action.
You could even turn the whole thing into an exe that you schedule with the Windows Scheduler, and call the same exe whenever the user triggers the action from the web UI.
@Vaibhav
Unfortunately, the physical architecture of the solution will not allow any direct communication between the components, other than web UI to database, and database to service (which can then call out to the web services). I do, however, agree that reuse of the communication classes would be ideal here; I just can't do it within the confines of our business.*
*Isn't it always the way that a technically "better" solution is stymied by external factors?