Thursday, September 14, 2017

D3 charts using React JS and Node JS error --element type is invalid: expected a string (for built-in components) or a class/function

For the last two months I have been working on Big Data technologies such as Hadoop, Python, Spark, and Kafka,
and on web front-end frameworks such as AngularJS, React JS, and D3 charts.

Our requirement is to display data from Hadoop as charts. PySpark loads the data into a DataFrame and converts the DataFrame into JSON data; React with D3 charts then uses this JSON data to display charts on the portal.
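
The hand-off from the DataFrame to the chart can be sketched in plain Python. This is only an illustration: the rows are ordinary dictionaries standing in for collected DataFrame rows, the function name rows_to_chart_json is made up for this sketch, and the {label, values: [{x, y}]} layout is an assumption about what the chart component expects.

```python
import json


def rows_to_chart_json(rows, x_field, y_field, label):
    """Shape row dictionaries into a single-series chart payload.

    `rows` stands in for dictionaries taken from a collected DataFrame;
    the {label, values: [{x, y}]} layout is an assumed chart format.
    """
    series = {
        "label": label,
        "values": [{"x": row[x_field], "y": row[y_field]} for row in rows],
    }
    return json.dumps([series])


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real Hadoop source.
    sample_rows = [
        {"day": "2017-09-01", "orders": 12},
        {"day": "2017-09-02", "orders": 7},
    ]
    print(rows_to_chart_json(sample_rows, "day", "orders", "orders per day"))
```

The React side would then fetch this JSON string and pass the parsed object to the chart as its data.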

I used the two commands below to create the sample React JS app provided by Facebook.

npm install -g create-react-app
create-react-app app-name

I found a solution in a Git repository and modified it to make it a Node.js solution by adding import and require statements.

I used npm install to install the packages.

I then ran npm run build to build the app, and serve -s build to start the web application on localhost.

As I am still learning React JS, I struggled for two days to get this working. The silly mistake I made was this:

Components that a package provides as named exports must be imported with {} braces. If you omit the braces, the statement imports the module's default export instead, so there is no compilation error, but at run time you get a JavaScript error saying

"element type is invalid: expected a string (for built-in components) or a class/function (for composite components) but got: undefined. check the render method of  'ComponentName'"

Below is the working version of the import statements in App.jsx.

import React, {Component} from 'react'
import ReactDOM from 'react-dom';
import ReactD3 from 'react-d3-components';
import {Waveform} from 'react-d3-components'
import {BarChart} from 'react-d3-components'
import {PieChart} from 'react-d3-components'
import {AreaChart} from 'react-d3-components'
import {ScatterPlot} from 'react-d3-components'
import {LineChart} from 'react-d3-components'
import {Brush} from 'react-d3-components'

const d3 = require('d3');

index.js file contents

import ReactDOM from 'react-dom';
import React from 'react';
import App from './App';

const  sw = require('./registerServiceWorker');

//ReactDOM.render(<App />, document.getElementById('root'));

Below is the code to render D3 chart

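var SomeComponent = React.createClass({
  // getInitialState (not shown) sets data, xScale and xScaleBrush
  render: function() {
    return (
      <div>
        <LineChart data={this.state.data}
                   width={400}
                   height={400}
                   margin={{top: 10, bottom: 50, left: 50, right: 20}}
                   xScale={this.state.xScale}
                   xAxis={{tickValues: this.state.xScale.ticks(d3.time.day, 2), tickFormat: d3.time.format("%m/%d")}} />
        <div className="brush" style={{float: 'none'}}>
          <Brush width={400}
                 height={50}
                 margin={{top: 0, bottom: 30, left: 50, right: 20}}
                 xScale={this.state.xScaleBrush}
                 extent={[new Date(2015, 2, 10), new Date(2015, 2, 12)]}
                 onChange={this._onChange}
                 xAxis={{tickValues: this.state.xScaleBrush.ticks(d3.time.day, 2), tickFormat: d3.time.format("%m/%d")}} />
        </div>
      </div>
    );
  },
  _onChange: function(extent) {
    this.setState({xScale: d3.time.scale().domain([extent[0], extent[1]]).range([0, 400 - 70])});
  }
});
ReactDOM.render(<SomeComponent />, document.getElementById('root'));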

Wednesday, August 16, 2017

Spark Programming

What is RDD:
The main abstraction Spark provides is a Resilient Distributed Dataset (RDD): a fault-tolerant collection of elements, partitioned across the nodes of the cluster, that can be operated on in parallel. RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or with an existing collection in the driver program, and transforming it. Users may also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.

RDD Types:

Parallelized collections, which take an existing Scala collection and run functions on it in parallel.
Parallelized collections are created by calling SparkContext’s parallelize method on an existing Scala collection (a Seq object). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.
One important parameter for parallel collections is the number of slices to cut the dataset into. Spark will run one task for each slice. Typically, you want 2-4 slices for each CPU in your cluster. Normally, Spark tries to set the number of slices automatically based on your cluster. However, you can also set it manually by passing it as a second parameter to parallelize (e.g. sc.parallelize(data, 10)).
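
To make the idea of slices concrete, here is a small plain-Python sketch. It is not Spark code and does not claim to reproduce Spark's own slicing logic; slice_collection and run_one_task_per_slice are hypothetical helpers that cut a list into roughly equal contiguous slices and run one function call per slice, the way Spark would run one task per slice.

```python
def slice_collection(data, num_slices):
    """Cut a list into num_slices contiguous, roughly equal slices."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    n = len(data)
    return [
        data[i * n // num_slices:(i + 1) * n // num_slices]
        for i in range(num_slices)
    ]


def run_one_task_per_slice(data, num_slices, task):
    """Apply task to each slice, mimicking one task per slice."""
    return [task(chunk) for chunk in slice_collection(data, num_slices)]


if __name__ == "__main__":
    numbers = list(range(1, 11))
    # Each slice is summed on its own; the partial sums are then combined.
    partial_sums = run_one_task_per_slice(numbers, 3, sum)
    print(partial_sums, sum(partial_sums))
```

With more slices than elements, some slices are simply empty, which is one reason the slice count is usually tied to the number of CPUs rather than chosen arbitrarily.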

Hadoop datasets, which run functions on each record of a file in Hadoop distributed file system or any other storage system supported by Hadoop.
Spark can create distributed datasets from any file stored in the Hadoop distributed file system (HDFS) or other storage systems supported by Hadoop (including your local file system, Amazon S3, Hypertable, HBase). Spark supports text files, SequenceFiles, and any other Hadoop InputFormat.
For other Hadoop InputFormats, you can use the SparkContext.hadoopRDD method, which takes an arbitrary JobConf and input format class, key class and value class. Set these the same way you would for a Hadoop job with your input source.

RDD Operations
RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.

For example, map is a transformation that passes each dataset element through a function and returns a new distributed dataset representing the results. On the other hand, reduce is an action that aggregates all the elements of the dataset using some function and returns the final result to the driver program.

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset. The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently – for example, we can realize that a dataset created through map will be used in a reduce and return only the result of the reduce to the driver, rather than the larger mapped dataset.

By default, each transformed RDD is recomputed each time you run an action on it. However, you may also persist an RDD in memory using the persist (or cache) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting datasets on disk.
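
The lazy behaviour described above can be modelled with a toy class in plain Python. This is a sketch for intuition only, not how Spark is implemented: LazyDataset and its log attribute are invented for this example, map only records a function, reduce triggers the computation, and persist keeps the computed elements for later actions.

```python
from functools import reduce as fold


class LazyDataset:
    """Toy stand-in for an RDD: records transformations, computes on actions."""

    def __init__(self, source, transforms=(), log=None):
        self._source = source          # base data (a plain list here)
        self._transforms = transforms  # functions recorded by map()
        self._persist = False
        self._cached = None            # filled only after persist() and an action
        self.log = log if log is not None else []  # one entry per recomputation

    def map(self, fn):
        # Transformation: nothing runs yet, we only remember fn.
        return LazyDataset(self._source, self._transforms + (fn,), self.log)

    def persist(self):
        # Ask for the computed elements to be kept after the first action.
        self._persist = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached
        self.log.append("computed")
        items = list(self._source)
        for fn in self._transforms:
            items = [fn(item) for item in items]
        if self._persist:
            self._cached = items
        return items

    def reduce(self, fn):
        # Action: triggers the computation and returns a plain value.
        return fold(fn, self._compute())


if __name__ == "__main__":
    squares = LazyDataset([1, 2, 3, 4]).map(lambda x: x * x)
    print(squares.log)                         # still empty: map was lazy
    print(squares.reduce(lambda a, b: a + b))  # the action runs the pipeline
```

Calling reduce twice on squares recomputes the mapped elements twice, while a dataset on which persist() was called is computed only once.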

How to share state between nodes:
A second abstraction in Spark is shared variables that can be used in parallel operations. By default, when Spark runs a function in parallel as a set of tasks on different nodes, it ships a copy of each variable used in the function to each task. Sometimes, a variable needs to be shared across tasks, or between tasks and the driver program. Spark supports two types of shared variables:

Broadcast variables: can be used to cache a value in memory on all nodes.
Accumulators: variables that are only “added” to, such as counters and sums.

Dataset (New Abstraction of Spark)
For a long time, RDD was the standard abstraction of Spark. But from Spark 2.0, Dataset becomes the new abstraction layer for Spark. Though the RDD API will remain available, it will become a low-level API, used mostly for runtime and library development. All userland code will be written against the Dataset abstraction and its subset, the DataFrame API.

Dataset is a superset of the DataFrame API, which was released in Spark 1.3. Dataset together with the DataFrame API brings better performance and flexibility to the platform compared to the RDD API. Dataset will also replace RDD as the abstraction for streaming in future releases.

SparkSession (New entry point of Spark)
In earlier versions of Spark, SparkContext was the entry point. As RDD was the main API, it was created and manipulated using the context APIs. For every other API, we needed a different context: StreamingContext for streaming, SQLContext for SQL, and HiveContext for Hive. But as the Dataset and DataFrame APIs are becoming the new standard APIs, we need an entry point built for them. So in Spark 2.0, we have a new entry point for the Dataset and DataFrame APIs called SparkSession.

SparkSession is essentially a combination of SQLContext and HiveContext; StreamingContext remains a separate entry point for DStream-based streaming. All the APIs available on those two contexts are also available on SparkSession. SparkSession internally holds a SparkContext for the actual computation.
Creating SparkSession
val sparkSession = SparkSession.builder
      .master("local")
      .appName("spark session example")
      .getOrCreate()
The above is similar to creating a SparkContext with local and creating a SQLContext wrapping it.
The SparkSession encapsulates the existing SparkContext, so existing functionality should not be affected and developers may continue using the SparkContext as desired. However, the new SparkSession abstraction is preferred by the Spark community in Spark 2.0.0 and beyond.

Read data using Spark Session

The code below reads data from a CSV file using the Spark session.

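// csvPath is a placeholder for the location of your CSV file
val df = sparkSession.read.option("header","true").csv(csvPath)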

Wednesday, June 21, 2017

Spiral Tree traversal using Stack and Queue

I was asked to traverse a binary tree level by level, but in spiral order.

                                      A
                                   /     \
                                  B       C
                                 / \     / \
                                D   E   F   G

should be traversed as


The interviewer asked me to do it using a stack and a queue. Here is the solution.

using System;
using System.Collections.Generic;

class Program
{
    static char[] inputArray = { 'A', 'B', 'C', 'D', 'E', 'F', 'G' };

    static void Main(string[] args)
    {
        TreeTraversalBFS treeTraversal = new TreeTraversalBFS();
        Console.WriteLine("*****************With Queue and Stack *******************");
        Node root = treeTraversal.BuildBinaryTree(inputArray);
        treeTraversal.DisplayNodesByCrisCross(root);
    }
}

public class Node
{
    public Node left { get; set; }
    public Node right { get; set; }
    public char data;

    public Node(char data)
    {
        this.data = data;
    }
}

class TreeTraversalBFS
{
    /// <summary>
    /// Build the binary tree
    /// </summary>
    /// <param name="inputArray"></param>
    /// <returns></returns>
    public Node BuildBinaryTree(char[] inputArray)
    {
        //to hold the nodes
        Queue<Node> queue = new Queue<Node>();
        Node root = new Node(inputArray[0]);
        queue.Enqueue(root);
        for (int i = 1; i < inputArray.Length;)
        {
            Node node = queue.Dequeue();
            Node left = new Node(inputArray[i++]);
            node.left = left;
            queue.Enqueue(left);
            if (i < inputArray.Length)
            {
                Node right = new Node(inputArray[i++]);
                node.right = right;
                queue.Enqueue(right);
            }
        }
        return root;
    }

    /// <summary>
    /// breadth-first using a queue and stack
    /// </summary>
    /// <param name="root"></param>
    public void DisplayNodesByCrisCross(Node root)
    {
        if (root == null)
            return;
        Queue<Node> queue = new Queue<Node>();
        Stack<Node> stack = new Stack<Node>();
        queue.Enqueue(root);
        stack.Push(root);
        int level = 0;
        while (true)
        {
            if (level % 2 == 1)
            {
                //odd level: display from the queue, left to right
                int queuNodeCount = queue.Count;
                if (queuNodeCount == 0)
                    break;
                while (queuNodeCount > 0)
                {
                    Node queueNode = queue.Dequeue();
                    Console.Write(queueNode.data + " ");
                    if (queueNode.left != null)
                    {
                        stack.Push(queueNode.left);
                        //insert into queue as well to display next level left to right
                        queue.Enqueue(queueNode.left);
                    }
                    if (queueNode.right != null)
                    {
                        stack.Push(queueNode.right);
                        //insert into queue as well to display next level left to right
                        queue.Enqueue(queueNode.right);
                    }
                    queuNodeCount--;
                }
            }
            else
            {
                //even level: display from the stack, right to left
                int stackNodeCount = stack.Count;
                while (stackNodeCount > 0)
                {
                    Node stackNode = stack.Pop();
                    Node queueNode = queue.Dequeue();
                    //display data from stack
                    Console.Write(stackNode.data + " ");
                    //add nodes from Queue and not from Stack to display next level nodes left to right
                    if (queueNode.left != null)
                        queue.Enqueue(queueNode.left);
                    if (queueNode.right != null)
                        queue.Enqueue(queueNode.right);
                    stackNodeCount--;
                }
            }
            level++;
        }
    }
}
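
For comparison, here is a compact Python sketch of the same idea. It is a variation, not a line-by-line port of the C# above: the queue walks the tree level by level, and a stack is used only to reverse the levels that should read right to left. Which levels get reversed is an assumption made here (even levels, counting the root as level 0); the names build_tree and spiral_order are made up for this sketch.

```python
from collections import deque


class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def build_tree(values):
    """Build a binary tree level by level from a list of values."""
    if not values:
        return None
    root = Node(values[0])
    pending = deque([root])  # nodes still waiting for children
    i = 1
    while i < len(values):
        parent = pending.popleft()
        parent.left = Node(values[i])
        pending.append(parent.left)
        i += 1
        if i < len(values):
            parent.right = Node(values[i])
            pending.append(parent.right)
            i += 1
    return root


def spiral_order(root):
    """Return node values level by level, alternating direction."""
    result = []
    if root is None:
        return result
    queue = deque([root])  # always holds one level, left to right
    level = 0
    while queue:
        stack = []
        for _ in range(len(queue)):
            node = queue.popleft()
            if level % 2 == 0:
                stack.append(node.data)   # even level: reverse via the stack
            else:
                result.append(node.data)  # odd level: emit left to right
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        while stack:
            result.append(stack.pop())
        level += 1
    return result


if __name__ == "__main__":
    print(" ".join(spiral_order(build_tree(list("ABCDEFG")))))
```

Children are always enqueued left to right, so the direction of one level never affects how the next level is collected.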

Friday, October 23, 2015

Vsto Excel Addin project debug error "you cannot debug or run this project, because the required version of the microsoft office application is not installed"

If you upgraded Office from 2013 to 2016 and then try to debug your Excel add-in project, you will receive the error below.

you cannot debug or run this project, because the required version of the Microsoft office application is not installed

This happens because the target Office version in your project properties still points to 2013, i.e. 15.0. Replace 15.0 with 16.0 and it will work.

Select your project in Solution Explorer and unload it, then choose the option to edit your Excel add-in project file.


The project file opens for editing. Go to the tag below and replace both occurrences of 15.0 (in OfficeVersion and in the DebugInfoExeName registry path) with 16.0.

        <ProjectProperties HostName="Excel" HostPackage="{29A7B9D7-A7F1-4328-8EF0-6B2D1A56B2C1}" OfficeVersion="15.0" VstxVersion="4.0" ApplicationType="Excel" Language="cs" TemplatesPath="VSTOTemplates" DebugInfoExeName="#Software\Microsoft\Office\15.0\Excel\InstallRoot\Path#excel.exe" DebugInfoCommandLine="/x " AddItemTemplatesGuid="{51063C3A-E220-4D12-8922-BDA915ACD783}" />

Reload the project, set it as the startup project, and start debugging. It should work now.
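
If you have several add-in projects to update, the same edit can be scripted. The sketch below is a hypothetical helper, not part of Visual Studio or VSTO: retarget_office_version takes the project file text and changes only the OfficeVersion attribute and the Office registry path, leaving every other 15.0 in the file alone.

```python
def retarget_office_version(project_xml, old="15.0", new="16.0"):
    """Return project_xml with the VSTO Office version references updated."""
    # OfficeVersion="15.0" -> OfficeVersion="16.0"
    updated = project_xml.replace(
        'OfficeVersion="{}"'.format(old), 'OfficeVersion="{}"'.format(new)
    )
    # ...\Office\15.0\... registry path used by DebugInfoExeName
    updated = updated.replace(
        "\\Office\\{}\\".format(old), "\\Office\\{}\\".format(new)
    )
    return updated


if __name__ == "__main__":
    # Shortened sample tag for illustration; a real project file has more attributes.
    sample = (
        '<ProjectProperties HostName="Excel" OfficeVersion="15.0" '
        'DebugInfoExeName="#Software\\Microsoft\\Office\\15.0\\Excel'
        '\\InstallRoot\\Path#excel.exe" />'
    )
    print(retarget_office_version(sample))
```

Read the project file into a string, pass it through the function, and write the result back while the project is unloaded; keep a backup of the original file.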

Saturday, October 17, 2015

ADFS Setup.

Software requirements
  • AD FS relies on server functionality that is built into the Windows Server® 2012 operating system.
  • The Federation Service and Federation Service Proxy role services cannot coexist on the same computer.
Setup Roles and Features
  • Go to Local Server tab, scroll down to the ROLES AND FEATURES section, then click TASKS->Add Roles and Features.
  • In Add Roles and Features Wizard, click Next to continue.
  • On next screen, keep Role-based or feature-based installation checked, click Next to continue.
  • On Server selection screen, accept default settings and click Next.
  • On Server Roles screen, check Active Directory Domain Services. This will pop up a dialog prompting to enable required features. Click Add Features to continue.
  • Check Active Directory Federation Services. Again, click Add Features in the pop-up to add required features.
  • Click Next all the way till the end of the wizard workflow, accepting all default settings.
  • Click Install to continue. Once installation completes, click Close to close the wizard.
Configure AD and Domain controller
Now you’ll see a warning icon reminding you that some additional configuration is needed.
  • Click on the icon and click on the second item, which is Promote this server to a domain controller.
  • In Active Directory Domain Services Configuration Wizard, select Add a new forest, enter “your domain name” as Root domain name, and then click Next to continue.
  • On next screen, provide a DSRM password. Uncheck Domain Name System (DNS) server as we don’t need this capability in our scenario (this is an all-in-one forest anyway). Click Next to continue.
  • Keep clicking Next till Install button is enabled. Then click Install.
  • The machine reboots.
Create Some Test Accounts
Before we move forward, let’s create a couple of user groups and a couple of test accounts.
  • Launch Active Directory Users and Computers (Windows + Q, then search for “users”).
  • Right-click on Users node, then select New->Group:
  • In New Object window, enter Manager as group name, and change Group scope to Domain local:
  • Follow the same step, create a Staff group. Right-click on Users node, then select New->User to create a new user:
  • Set up a password for the user, then finish the wizard. On a test environment, you can disallow password change and make the password never expire to simplify password management:
  • Double-click on the user name, and add the user to Manager group:
  • Create another user, and add the user to Staff group.
Configure SSL Certificate
  • Launch Internet Information Services Manager (Windows + Q, then search for “iis”).
  • Select the server node, and then double-click Server Certificates icon in the center pane.
  • In the right pane, click on Create Self-Signed Certificate… link. Give the cert a friendly name, then click OK. If you open the cert, you can see that it is issued to [your virtual machine name]. This is the reason why we used the domain name.

Configure ADFS Server
  • Go back to Server Manager. Click on the warning icon and select Run the AD FS Management snap-in.
  • Click on AD FS Federation Server Configuration Wizard link in the center pane.
  • In AD FS Federation Server Configuration Wizard, leave Create a new Federation Service checked, click Next to continue.
  • On next screen, keep New federation server farm checked, click Next to continue.
  • On next screen, you’ll see our self-issued certificate is automatically chosen. Click Next to continue.
  • On next screen, set up Administrator as the service account. Click Next. Click Next to complete the wizard.
Configure Trust Relationship with ACS – ADFS Configuration
Configure ADFS as a trusted Identity Provider of your ACS namespace. The trust relationship is mutual, which means it needs to be configured on both the ADFS side and the ACS side. On the ADFS side, we’ll configure ACS as a trusted relying party. On the ACS side, we’ll configure ADFS as a trusted identity provider. Let’s start with the ADFS configuration.
  • Back in AD FS Management snap-in, click on Required: Add a trusted relying party in the center pane.
  • In Add Relying Party Trust Wizard, click Start to continue.
  • Paste in the ACS WS-Federation metadata URL you got from your ACS namespace, and click Next to continue.
  • Keep clicking Next, then finally Close to complete the wizard.
  • This brings up the claim rules window. Close it for now.
  • Back in the main window, click on Trust Relationships->Claims Provider Trust node. You’ll see Active Directory listed in the center pane. Right-click and select Edit Claim Rules…
  • In the Edit Claim Rules for Active Directory dialog, click Add Rule… button.
  • Select Send Group Membership as a Claim template. Click Next.
  • On next screen, set the rule name as Role claim. Pick the Manager group using the Browse… button. Pick Role as output claim type. And set claim value to be Manager. Then click Finish. What we are doing here is to generate a Role claim with value Manager for all users in the Manager group in our AD.
  • Add another rule, and this time select Send LDAP Attribute as Claims template.
  • Set rule name as Name claim. Pick Active Directory as attribute store, and set up the rule to map the Given-Name attribute to the Name claim.
  • Back in the main window, click on Trust Relationships->Relying Party Trusts node. You’ll see your ACS namespace listed in the center pane. Right-click on it and select Edit Claim Rules…
  • Add a new rule using Pass Through or Filter an Incoming Claim template.
  • Pass through all Role claims
  • Similarly, add another pass-through rule for Name claim.
  • Now our ADFS server is configured to trust our ACS namespace, and it will issue a Name claim and a Role claim for authenticated users.