Venkat Vangalapudi's Technology Blog

Tuesday, April 10, 2018

Virtual Appliances

Software Appliances

In concept, a software appliance is a tightly integrated package of enterprise application software and an operating system, designed to run on a standard industry platform. If the appliance is designed to run on a standard server, it is a software appliance; if it is designed to run on a hypervisor, it is a special kind of software appliance called a virtual appliance. In implementation, a software appliance can function much like a black box, providing a business function to end users with the complexity hidden from view. The appliance is portable and easy to move, as long as it runs on the hardware or hypervisor for which it is designed. In these ways, SaaS and software appliances are very similar. In many ways, though, software appliances offer benefits that SaaS alone cannot address. For example, software appliances are typically hosted onsite. The appliance can also be purpose-built to meet the needs of a specific customer, avoiding the multi-tenancy challenges introduced by SaaS. Unlike SaaS, each appliance can be updated individually, which gives better control over the timing of updates and rollbacks. Finally, appliances are easier to manage and more secure because they contain only the components of the OS that are required for a specific application. This reduces the footprint of the OS, the number of relevant attack vectors and the number of patches needed by the appliance.

Comparing to a Virtual Machine
The difference is that a virtual appliance is a pre-configured image with an application stack and JeOS (just enough operating system), while a virtual machine is simply an empty virtualized environment, provided by a hypervisor, that is capable of running a standard OS and contains no application software. With a virtual machine, a customer still needs to choose, install, configure and manage an OS and application stack.

The Benefits of Virtual Appliances

The benefits of virtual appliances over the SaaS model, or even the corporate hosted application model for that matter, run the gamut, from simpler distribution to bolstered security:

Simplified distribution. Because a virtual appliance is a self-contained, preconfigured application packaged with a JeOS, distribution is greatly simplified. A single file with a small footprint is easy to download via the Internet. A preconfigured, self-contained package with no external dependencies is also easy to transfer through testing and into the production environment. In addition, encapsulated virtual appliances are highly portable between virtual platforms and from server to server, which is especially beneficial in the case of disaster recovery.
Easier deployment. Deploying a compact, self-contained virtual appliance entails a single straightforward installation. Configuring a virtual appliance is simple and is accomplished through a command-line interface or a Web browser. These attributes reduce the installation and configuration costs associated with running complex stacks of software.
High performance. Virtual appliances tend to offer better performance when compared to a standard virtual machine for a given application. A small, finely tuned JeOS (just enough operating system) environment allows an application to run more efficiently, and all the components of a virtual appliance are tested and optimized to run together and without any unnecessary bloat.
Greater reliability. Extensive interoperability testing and tuning of components makes virtual appliances more robust. What’s more, the self-contained nature of virtual appliances insulates them from the surrounding system, also improving reliability.
Lower hardware costs. Because they are self-contained, virtual appliances can be installed on any commodity hardware, with no need for an underlying OS or third-party components. In addition, many virtual appliances can run simultaneously on a single virtual server, further reducing hardware costs and increasing overall hardware utilization.
Improved security. Virtual appliances tend to be more secure than comparable applications running on normal OSs. Much of that is because of the compactness of a JeOS, which contains fewer packages and thus fewer vulnerabilities to exploit. In comparison to the SaaS delivery model, virtual appliances are more secure because applications with sensitive data remain on a company’s premises and safely behind its firewall. Data is not transmitted over the Internet and stored offsite at a vendor’s location, where the company has no control.
Digital signing. Virtual appliances also simplify adding security aspects such as digital signing to ensure the trustworthiness of solutions. An authoring organization can digitally sign a virtual appliance at every step, including the image description, repository packages and the image file itself.
Reduced management workload. With a typical software stack, there are individual vendors for the application, the OS and each third-party component. These vendors all have their own update schedules, which makes keeping track of versions, patch levels and compatibility a logistical nightmare for IT organizations, and makes conflicts inevitable. Because the virtual appliance author is responsible for configuring the software stack, JeOS and third-party components, there’s just a single path for updates. By replacing OS- and application-specific management tasks with a single efficient, standardized set of processes, IT staffs gain a simplified software management lifecycle while reducing associated costs.

Friday, December 22, 2017

Google Cloud Kubernetes Cluster creation with Sample Guest book App tutorial

~$ gcloud container clusters get-credentials cluster-1 --zone us-central1-a --project northern-timer-193700

Create a directory for the test app

TUTORIALDIR=src/northern-timer-193700/gke_guestbook-2018-04-08-00-22

Clone the guestbook app

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git $TUTORIALDIR

Change directory to the guestbook sample.

cd $TUTORIALDIR/guestbook

Display the redis-master template
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ cat redis-master-controller.yaml

~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ gcloud container clusters get-credentials cluster-1 --zone us-central1-a

Create Redis Master Service
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl create -f redis-master-service.yaml

Create Redis Master Controller
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl create -f redis-master-controller.yaml

List pods, replication controllers and services.
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl get pods
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl get rc
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl get services

Create the Redis slaves
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl create -f all-in-one/redis-slave.yaml
Create Front End App
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl create -f all-in-one/frontend.yaml

Watch the Deployed App
~/src/northern-timer-193700/gke_guestbook-2018-04-08-00-22/guestbook$ kubectl get services --watch

Friday, October 13, 2017

Asp.Net Core MVC Custom OutputFormatter to send Word Document as a response

I tried to find a way to serve a Word document from a MemoryStream as a byte array, as we do in our Web API to send documents in the response. But I could not find a method on the ASP.NET Core Web API Response object that supports this, so we need to save the file from memory to disk and use that path to serve the file.

Our requirement is to open a Word document template in memory and change the document on the fly based on the user request. We were able to achieve this in ASP.NET Web API using the code below.

  public HttpResponseMessage Get()  
     {  
       GenerateMyDocument gd = new GenerateMyDocument();  
       byte[] documentContent = gd.GetDocumentParameters(3, "ISQ839343", "Venkat", "sept 19 2019");  
       HttpResponseMessage response = new HttpResponseMessage(HttpStatusCode.OK);  
       response.Content = new ByteArrayContent(documentContent);  
       response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/msword");  
       response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("inline")  
       {  
         FileName = "some.docx"  
       };  
       return response;  
     }

The default format used by ASP.NET Core MVC is JSON. To send other content types as a response, we need to implement our own OutputFormatter. To deliver a Word document as a response, we followed the steps below.

Create a new class WordOutputFormatter inheriting from OutputFormatter. We need to override the two methods below to return our Word document as the response content.

public class WordOutputFormatter : OutputFormatter
    {
        public string ContentType { get; }

        public WordOutputFormatter()
        {
            ContentType = "application/ms-word";
            SupportedMediaTypes.Add(MediaTypeHeaderValue.Parse(ContentType));
        }

        //we need to check whether the context Object is what we need to format
        // We must override this method to keep the normal execution flow for other action methods
        public override bool CanWriteResult(OutputFormatterCanWriteContext context)
        {
            return context.Object is Jurer;
        }

        public async override Task WriteResponseBodyAsync(OutputFormatterWriteContext context)
        {
            var response = context.HttpContext.Response;

            // CanWriteResult has already confirmed that the object is a Jurer
            var jurer = (Jurer)context.Object;

            response.Headers.Add("Content-Disposition", "inline;filename=jurer.docx");
            response.ContentType = ContentType;

            // Generate the document on disk and get its path
            string documentFilePath = DocumentHelper.GetDocumentParameters(jurer.JurisdictionID, jurer.JurorID, jurer.JurorName, jurer.Eventdate);

            await response.SendFileAsync(documentFilePath);
        }
    }

Now add the action method below to your controller.

     [HttpGet]  
     [DeleteFileAttribute]  
     public IActionResult Get(int juridictionid, string jurerid,string jurername, string date)  
     {  
       Jurer jurer = new Jurer()  
       {  
         JurisdictionID = juridictionid,  
         JurorID = jurerid,  
         JurorName = jurername,  
         Eventdate = date,  
       };  
       return Ok(jurer);  
     }

We use [DeleteFileAttribute], which inherits from ActionFilterAttribute, to delete the Word document that was generated from the Word template and saved to disk.

public class DeleteFileAttribute : ActionFilterAttribute
{
    public override void OnResultExecuted(ResultExecutedContext filterContext)
    {
        // Write your cleanup logic here to delete the temp files created in the previous step.
    }
}

To invoke our custom WordOutputFormatter, we need to alter the ConfigureServices method in Startup.cs as shown below.

        public void ConfigureServices(IServiceCollection services)
        {
            // Add framework services.
            services.AddMvc(options =>
            {
                options.RespectBrowserAcceptHeader = true; // false by default
                options.OutputFormatters.Insert(0,new WordOutputFormatter());
                options.FormatterMappings.SetMediaTypeMappingForFormat(
                  "docx", MediaTypeHeaderValue.Parse("application/ms-word"));
            });

          
        }

Thursday, September 14, 2017

D3 charts using React JS and Node JS error --element type is invalid: expected a string (for built-in components) or a class/function

For the last two months I have been working on Big Data technologies such as Hadoop, Python, Spark and Kafka, and on web front-end frameworks such as AngularJS, React and D3 charts.

Our requirement is to display data from Hadoop as charts. PySpark loads the data into a DataFrame and converts the DataFrame to JSON; React with D3 charts then uses this JSON to display charts on the portal.

I used the two commands below to create the sample React app provided by Facebook.

npm install -g create-react-app
create-react-app app-name

I found a sample solution in a Git repository and adapted it to run as a Node.js project by adding the necessary import and require statements.

I used npm install to install the packages.

I then ran npm run build to build the app and serve -s build to start the web application on localhost.

As I am still learning React, I struggled for two days to get this working. The silly mistake I made was this:

Components that a package provides as named exports must be imported with curly braces {}. If you omit the braces, there is no compilation error, but at run time you get a JavaScript error saying

"element type is invalid: expected a string (for built-in components) or a class/function (for composite components) but got: undefined. check the render method of 'ComponentName'"

below is the working version of the import statements in App.jsx

import React, {Component} from 'react'
import ReactDOM from 'react-dom';
import ReactD3 from 'react-d3-components';
import {Waveform} from 'react-d3-components'
import {BarChart} from 'react-d3-components'
import {PieChart} from 'react-d3-components'
import {AreaChart} from 'react-d3-components'
import {ScatterPlot} from 'react-d3-components'
import {LineChart} from 'react-d3-components'
import {Brush} from 'react-d3-components'
const d3 = require('d3');

index.js file contents

import ReactDOM from 'react-dom';
import React from 'react';
import App from './App';

const sw = require('./registerServiceWorker');
//ReactDOM.render(<App />, document.getElementById('root'));
sw.registerServiceWorker;

Below is the code to render D3 chart

render: function() {
  return (
    <div>
    <LineChart data={this.state.data}
                           width={400}
                           height={400}
                           margin={{top: 10, bottom: 50, left: 50, right: 20}}
                           xScale={this.state.xScale}
                           xAxis={{tickValues: this.state.xScale.ticks(d3.time.day, 2), tickFormat: d3.time.format("%m/%d")}}
        tooltipHtml={tooltipLine}
        />
    <div className="brush" style={{float: 'none' }}>
    <Brush width={400}
                       height={50}
                       margin={{top: 0, bottom: 30, left: 50, right: 20}}
                       xScale={this.state.xScaleBrush}
                       extent={[new Date(2015, 2, 10), new Date(2015, 2, 12)]}
                       onChange={this._onChange}
                       xAxis={{tickValues: this.state.xScaleBrush.ticks(d3.time.day, 2), tickFormat: d3.time.format("%m/%d")}}
        />
    </div>
    </div>
  );
},
_onChange: function(extent) {
  this.setState({xScale: d3.time.scale().domain([extent[0], extent[1]]).range([0, 400 - 70])});
}
});

ReactDOM.render(
<SomeComponent />,
document.getElementById('brush')
);

Wednesday, August 16, 2017

Spark Programming

What is RDD:

The main abstraction Spark provides is a Resilient Distributed Dataset (RDD), which is a fault-tolerant collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or an existing collection in the driver program, and transforming it. Users may also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.

RDD Types:

Parallelized collections, which take an existing Scala collection and run functions on it in parallel.

Parallelized collections are created by calling SparkContext’s parallelize method on an existing Scala collection (a Seq object). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.

One important parameter for parallel collections is the number of slices to cut the dataset into. Spark will run one task for each slice of the cluster. Typically, you want 2-4 slices for each CPU in your cluster. Normally, Spark tries to set the number of slices automatically based on your cluster. However, you can also set it manually by passing it as a second parameter to parallelize (e.g. sc.parallelize(data, 10)).
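To make the idea of slices concrete, here is a minimal plain-Python sketch (not Spark code) of cutting a collection into a fixed number of contiguous, roughly equal slices, one per task. The helper name slice_collection is hypothetical, and Spark's own partitioning logic may differ in detail.

```python
def slice_collection(data, num_slices):
    """Split data into num_slices contiguous slices of near-equal size."""
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    n = len(data)
    # Slice i covers the index range [i*n // num_slices, (i+1)*n // num_slices)
    return [data[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]


slices = slice_collection(list(range(1, 11)), 4)
print(slices)  # [[1, 2], [3, 4, 5], [6, 7], [8, 9, 10]]
```

Each inner list stands for the work handed to one task. Asking for more slices than elements simply produces some empty slices.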

Hadoop datasets, which run functions on each record of a file in Hadoop distributed file system or any other storage system supported by Hadoop.

Spark can create distributed datasets from any file stored in the Hadoop distributed file system (HDFS) or other storage systems supported by Hadoop (including your local file system, Amazon S3, Hypertable, HBase). Spark supports text files, SequenceFiles, and any other Hadoop InputFormat.

For other Hadoop InputFormats, you can use the SparkContext.hadoopRDD method, which takes an arbitrary JobConf and input format class, key class and value class. Set these the same way you would for a Hadoop job with your input source.

RDD Operations

RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.

For example, map is a transformation that passes each dataset element through a function and returns a new distributed dataset representing the results. On the other hand, reduce is an action that aggregates all the elements of the dataset using some function and returns the final result to the driver program.

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset. The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently – for example, we can realize that a dataset created through map will be used in a reduce and return only the result of the reduce to the driver, rather than the larger mapped dataset.
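The lazy behaviour described above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Spark's API: the LazyDataset class and its methods are hypothetical names chosen to mirror map and reduce.

```python
import functools


class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, transformations=()):
        self._source = source
        self._transformations = tuple(transformations)

    def map(self, fn):
        # Transformation: nothing is computed, the function is only recorded.
        return LazyDataset(self._source, self._transformations + (fn,))

    def reduce(self, fn):
        # Action: apply the recorded functions to each element, then aggregate.
        def compute(x):
            for transformation in self._transformations:
                x = transformation(x)
            return x
        return functools.reduce(fn, (compute(x) for x in self._source))


calls = []


def square(x):
    calls.append(x)  # side effect that shows when the function really runs
    return x * x


squares = LazyDataset([1, 2, 3, 4]).map(square)
print(len(calls))          # 0: map alone computed nothing
total = squares.reduce(lambda a, b: a + b)
print(total, len(calls))   # 30 4: the work happened inside reduce
```

Only the single aggregated value leaves reduce; the intermediate squared values are never materialized as a full collection, which is the efficiency gain the paragraph describes.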

By default, each transformed RDD is recomputed each time you run an action on it. However, you may also persist an RDD in memory using the persist (or cache) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting datasets on disk.

How to share state between nodes:

A second abstraction in Spark is shared variables that can be used in parallel operations. By default, when Spark runs a function in parallel as a set of tasks on different nodes, it ships a copy of each variable used in the function to each task. Sometimes, a variable needs to be shared across tasks, or between tasks and the driver program. Spark supports two types of shared variables:

Broadcast variables, which can be used to cache a value in memory on all nodes.

Accumulators, which are variables that are only “added” to, such as counters and sums.

Dataset (New Abstraction of Spark)

For a long time, RDD was the standard abstraction of Spark. From Spark 2.0, however, Dataset becomes the new abstraction layer. The RDD API remains available, but it becomes a low-level API, used mostly for runtime and library development. User code is expected to be written against the Dataset abstraction and its subset, the DataFrame API.

Dataset is a superset of the DataFrame API, which was released in Spark 1.3. Together, Dataset and the DataFrame API bring better performance and flexibility to the platform than the RDD API. Dataset is also expected to replace RDD as the abstraction for streaming in future releases.

SparkSession (New entry point of Spark)

In earlier versions of Spark, SparkContext was the entry point. Because RDD was the main API, RDDs were created and manipulated through the context's APIs. Every other API needed its own context: StreamingContext for streaming, SQLContext for SQL and HiveContext for Hive. Now that the Dataset and DataFrame APIs are becoming the standard, they need an entry point of their own. So Spark 2.0 introduces a new entry point for the Dataset and DataFrame APIs called SparkSession.

SparkSession essentially combines SQLContext and HiveContext, and the APIs available on those contexts are available on the Spark session as well. Structured Streaming is also reached through it, while the DStream-based StreamingContext remains a separate entry point. Internally, the Spark session holds a SparkContext that performs the actual computation.

Creating SparkSession

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .master("local")
  .appName("spark session example")
  .getOrCreate()

The above is similar to creating a SparkContext with a local master and creating a SQLContext that wraps it.

The Spark session encapsulates the existing SparkContext, so existing functionality should not be affected and developers may continue using the SparkContext as desired. However, the new SparkSession abstraction is preferred by the Spark community in Spark 2.0.0 and beyond.

Read data using Spark Session

The code below reads data from a CSV file using the Spark session.

val df = sparkSession.read
  .option("header", "true")
  .csv("src/main/resources/sales.csv")

Wednesday, June 21, 2017

Spiral Tree traversal using Stack and Queue

I was asked to traverse a binary tree level by level, but in spiral order.

                                      A
                                   /       \
                                  B       C
                                /    \      /  \
                               D    E   F   G

should be traversed as

A
CB
DEFG

And the interviewer asked me to do it using a stack and a queue. Here is the solution.

using System;
using System.Collections.Generic;

class Program
    {
        static char[] inputArray = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', };
        static void Main(string[] args)
        {
            TreeTraversalBFS treeTraversal = new TreeTraversalBFS();
            Console.WriteLine("*****************With Queue and Stack *******************");
            Node root = treeTraversal.BuildBinaryTree(inputArray);
            treeTraversal.DisplayNodesByCrisCross(root);
            Console.WriteLine("***********************************************");
            Console.ReadLine();
        }
    }
    public class Node
    {
        public Node left { get; set; }
        public Node right { get; set; }
        public char data;
        public Node(char data)
        {
            this.data = data;
        }
    }

    class TreeTraversalBFS
    {
        /// <summary>
        /// Build the binary tree
        /// </summary>
        /// <param name="inputArray"></param>
        /// <returns></returns>
        public Node BuildBinaryTree(char[] inputArray)
        {
            //to hold the nodes
            Queue<Node> queue = new Queue<Node>();
            Node root = new Node(inputArray[0]);
            queue.Enqueue(root);
            for (int i = 1; i < inputArray.Length;)
            {
                Node node = queue.Dequeue();
                Node left = new Node(inputArray[i++]);
                node.left = left;
                queue.Enqueue(left);
                if (i < inputArray.Length)
                {
                    Node right = new Node(inputArray[i++]);
                    node.right = right;
                    queue.Enqueue(right);
                }
            }
            return root;
        }
        /// <summary>
        /// breadth-first using a queue and stack
        /// </summary>
        /// <param name="root"></param>
        public void DisplayNodesByCrisCross(Node root)
        {
            if (root == null)
                return;
            Queue<Node> queue = new Queue<Node>();
            Stack<Node> stack = new Stack<Node>();
            queue.Enqueue(root);
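            // The root is level 1 (odd), so it is printed from the queue first;
            // starting at 0 would print an empty line before the root.
            int level = 1;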
            while (true)
            {
                if (level % 2 == 1)
                {
                    int queuNodeCount = queue.Count;
                    if (queuNodeCount == 0)
                        break;
                    while (queuNodeCount > 0)
                    {
                        Node queueNode = queue.Dequeue();
                        Console.Write(queueNode.data);
                        if (queueNode.left != null)
                        {
                            stack.Push(queueNode.left);
                            //insert into queue as well to display next level left to right
                            queue.Enqueue(queueNode.left);
                        }
                        if (queueNode.right != null)
                        {
                            stack.Push(queueNode.right);
                            //insert into queue as well to display next level left to right
                            queue.Enqueue(queueNode.right);
                        }
                        queuNodeCount--;
                    }
                }
                else
                {
                    int stackNodeCount = stack.Count;
                    while (stackNodeCount > 0)
                    {
                        Node stackNode = stack.Pop();
                        Node queueNode = queue.Dequeue();
                        //display data from stack
                        Console.Write(stackNode.data);
                        //add nodes from Queue and not from Stack to display next level nodes left to right
                        if (queueNode.left != null)
                            queue.Enqueue(queueNode.left);
                        if (queueNode.right != null)
                            queue.Enqueue(queueNode.right);
                        stackNodeCount--;
                    }
                }
                Console.WriteLine();
                level++;
            }
        }
    }
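For comparison, the same idea can be sketched more compactly in Python. This is an alternative formulation under my own assumptions, not a translation of the C# above: the queue drives a normal level-order walk, and a stack is used only to reverse the even-numbered levels. The names build_tree and spiral_levels are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def build_tree(values):
    """Build a complete binary tree from values given in level order."""
    if not values:
        return None
    nodes = [Node(v) for v in values]
    for i, node in enumerate(nodes):
        if 2 * i + 1 < len(nodes):
            node.left = nodes[2 * i + 1]
        if 2 * i + 2 < len(nodes):
            node.right = nodes[2 * i + 2]
    return nodes[0]


def spiral_levels(root):
    """Return one string per level; even-numbered levels are read right to left."""
    levels = []
    if root is None:
        return levels
    queue = deque([root])
    level = 1
    while queue:
        stack = []                       # holds the data of the current level
        for _ in range(len(queue)):      # process exactly one level
            node = queue.popleft()
            stack.append(node.data)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        if level % 2 == 0:
            # Popping the stack yields the level from right to left.
            levels.append("".join(stack.pop() for _ in range(len(stack))))
        else:
            levels.append("".join(stack))
        level += 1
    return levels


print("\n".join(spiral_levels(build_tree("ABCDEFG"))))
# A
# CB
# DEFG
```

Because children are always enqueued left to right, the queue never needs to know about the spiral; only the output order of each level changes.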

Saturday, April 08, 2017

Flow vs Logic Apps vs Functions vs Webjobs

Azure Functions or WebJobs
All these services are useful when "gluing" together disparate systems. They can all define:

input
actions
conditions
output

You can run each of them on a schedule or trigger. However, each service has unique advantages, and comparing them is not a question of "Which service is the best?" but one of "Which service is best suited for this situation?" Often, a combination of these services is the best way to rapidly build a scalable, full-featured integration solution.

Flow vs. Logic Apps
Microsoft Flow and Azure Logic Apps are both configuration-first integration services. They make it easy to build processes and workflows and integrate with various SaaS and enterprise applications.

Flow is built on top of Logic Apps
They have the same workflow designer
Connectors that work in one can also work in the other

Flow empowers any office worker to perform simple integrations (for example, get SMS for important emails) without going through developers or IT. Flow is a self-service tool for office workers and business users.

On the other hand, Logic Apps can enable advanced or mission-critical integrations (for example, B2B processes) where enterprise-level DevOps and security practices are required. It is typical for a business workflow to grow in complexity over time. Accordingly, you can start with a flow at first, then convert it to a logic app as needed. Logic Apps is for IT pros and developers, and is used for mission-critical operations.

Functions vs. WebJobs
Azure Functions and Azure App Service WebJobs are both code-first integration services and designed for developers. They enable you to run a script or a piece of code in response to various events, such as new Storage Blobs or a WebHook request. Here are their similarities:

Both are built on Azure App Service and enjoy features such as source control, authentication, and monitoring.
Both are developer-focused services.
Both support standard scripting and programming languages.
Both have NuGet and NPM support.

Functions is the natural evolution of WebJobs in that it takes the best things about WebJobs and improves upon them.

The improvements include:

Streamlined dev, test, and run of code, directly in the browser.
Built-in integration with more Azure services and 3rd-party services like GitHub WebHooks.
Pay-per-use, no need to pay for an App Service plan.
Automatic, dynamic scaling.
For existing customers of App Service, running on an App Service plan is still possible (to take advantage of under-utilized resources).
Integration with Logic Apps.

Functions supports in-browser editing, whereas WebJobs does not.

Both can be invoked by event triggers and timer-based scheduling. WebJobs can also run continuously.
Functions supports more event triggers than WebJobs. Functions supports:

Timer
Azure Cosmos DB
Azure Event Hubs
HTTP/WebHook (GitHub, Slack)
Azure App Service Mobile Apps
Azure Notification Hubs
Azure Service Bus
Azure Storage

WebJobs supports:

Azure Storage
Azure Service Bus

Flow vs. Logic Apps vs. Functions
Which service is best suited to you depends on your situation. For simple business optimization, use Flow. If your integration scenario is too advanced for Flow, or you need DevOps capabilities and security compliance, then use Logic Apps.
If a step in your integration scenario requires highly custom transformation or specialized code, then write a function and trigger the function as an action in your logic app.

You can call a logic app in a flow. You can also call a function in a logic app, and a logic app in a function. The integration between Flow, Logic Apps, and Functions continues to improve over time. You can build something in one service and use it in the other services. Therefore, any investment you make in these three technologies is worthwhile.