Of late, I have come across names like CassandraDB, MongoDB and CouchDB from friends who work with open source technologies.

When I started studying these technologies, I realized that they are all non-relational, or NoSQL, database solutions meant for Big Data analytics.

So I thought it would be more appropriate to understand what Big Data is before exploring NoSQL databases any further. In this article I am going to share some of my insights about Big Data and NoSQL databases with you all.

BIG Data

According to one recent study, we create 2.5 quintillion bytes of data every day, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.

Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.

BigData_1

 

Big Data is the term used to describe a massive volume of both structured and unstructured data that is so large that it’s difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity.

Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.

Big data is important to business and society. More data may lead to more accurate analyses. More accurate analyses may lead to more confident decision making. And better decisions can mean greater operational efficiencies, cost reductions and reduced risk.

Since the advent of the Internet, a company's BI and analytics no longer depend solely on business data. As we evolved with the Internet, we started adding weblogs, videos, images, sensor data and third-party application data from sources like Facebook and Twitter to our systems. In other words, organizations these days are seeing more unstructured data than structured data, and all of it should now be included in the analysis for decision making.

BigData_2

Popular sites like Facebook, Twitter, YouTube, Instagram and LinkedIn have all exploded in user numbers, and their users are the major content producers on the Internet. The number of images uploaded to Facebook or videos uploaded to YouTube is unfathomable.

BigData_3

  • Data is different today.
  • 80% of enterprise data is unstructured.
  • Unstructured data is growing 2X faster than structured data.

BigData_4

3 V’s of BIG data

In 2001, industry analyst Doug Laney (currently with Gartner) articulated the now mainstream definition of big data as the three Vs of big data: volume, velocity and variety.

BigData_5

BigData_6

Volume. Many factors contribute to the increase in data volume. Transaction-based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.

Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.

Variety. Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.

BigData_7

 

Why Should Big Data Matter to Any Organization?

The real issue is not about acquiring large amounts of data. It’s what we do with the data that counts. The hopeful vision is that organizations will be able to take data from any source, harness relevant data and analyze it to find answers that enable cost reductions, time reductions, new product development and optimized offerings.

For example, 80% of medical data is unstructured yet clinically relevant. The data resides in multiple places such as individual EMRs, lab and imaging systems, physician notes, medical correspondence and claims. By leveraging Big Data, we can build sustainable healthcare systems and collaborate to improve care and outcomes.

For instance, by combining big data and high-powered analytics, it is possible to:

  • Determine root causes of failures, issues and defects in near-real time, potentially saving billions of dollars annually.
  • Optimize routes for many thousands of package delivery vehicles while they are on the road.
  • Generate retail coupons at the point of sale based on the customer’s current and past purchases.
  • Send tailored recommendations to mobile devices while customers are in the right area to take advantage of offers.
  • Recalculate entire risk portfolios in minutes.
  • Quickly identify customers who matter the most.
  • Use data mining to detect fraudulent behavior.

Big Data is gaining popularity since storage is becoming cheaper and cheaper.

BigData_8

NOSQL DATABASES

NoSQL technology was pioneered by leading internet companies (including Google, Facebook, Amazon and LinkedIn) to overcome the limitations of 40-year-old relational database technology for use with modern web applications. Today, enterprises are adopting NoSQL for a growing number of use cases, a choice that is driven by four interrelated megatrends: Big Users, Big Data, the Internet of Things, and Cloud Computing.

There are many NOSQL databases which are gaining momentum in the context of Big Data.

Big Data storage with key-value databases:

BigData_9

  • Azure table Storage
  • Redis
  • MemcacheDB
  • HamsterDB
  • DynamoDB

Big Data storage with column family store databases:

BigData_10

  • HBase
  • CassandraDB
  • Amazon Simple DB

 

Big Data storage with document store databases:

BigData_11

 

  • MongoDB
  • CouchDB
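
To make the three families above a little more concrete, here is a minimal Python sketch of how one user record might be shaped in each model. It uses plain dictionaries only. The names (`key_value_store`, `column_family_store`, `document_store`, the `user:1001` key and the sample values) are hypothetical illustrations and not the API of any product listed above.

```python
import json

# Illustrative only: plain Python structures standing in for three data models.
# All names and values are made up for this example.

# Key-value store: an opaque value looked up by a single key.
key_value_store = {
    "user:1001": '{"name": "Asha", "city": "Hyderabad"}',
}

# Column family store: row key -> column family -> column -> value.
column_family_store = {
    "user:1001": {
        "profile": {"name": "Asha", "city": "Hyderabad"},
        "activity": {"last_login": "2014-10-12"},
    },
}

# Document store: one self-contained, nested document per record.
document_store = [
    {
        "_id": 1001,
        "name": "Asha",
        "address": {"city": "Hyderabad"},
        "tags": ["blogger", "developer"],
    },
]

# Key-value: the store does not look inside the value; the application parses it.
kv_city = json.loads(key_value_store["user:1001"])["city"]

# Column family: read one column without touching the other families.
cf_city = column_family_store["user:1001"]["profile"]["city"]

# Document: filter on a nested field.
doc_names = [d["name"] for d in document_store if d["address"]["city"] == "Hyderabad"]

print(kv_city, cf_city, doc_names)
```

The point of the sketch is only the shape of the data: the further down the list you go, the more structure the store itself can see and query.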

Choosing a NoSQL database doesn’t mean giving up querying capabilities. Typical features include:

  • Query Language
  • Fast Performance
  • Horizontal Scalability
  • Replication
  • Load Balance
  • File Storage

However, we will miss some of the features we are very much used to in an RDBMS:

  • No Joins support
  • No support for Transactions
  • No support for constraints
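
As an illustration of what the missing join support means in practice, the sketch below combines two collections in application code. It is plain Python over hypothetical `customers` and `orders` data, not a call into any particular NoSQL product, and the helper name `join_orders_with_customers` is made up for this example.

```python
# Hypothetical data: two separate collections, modelled as lists of dicts.
customers = [
    {"_id": 1, "name": "Asha"},
    {"_id": 2, "name": "Ravi"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "total": 250},
    {"order_id": 102, "customer_id": 2, "total": 90},
    {"order_id": 103, "customer_id": 1, "total": 40},
]


def join_orders_with_customers(orders, customers):
    """Do in application code what an SQL JOIN would do inside the database."""
    # Build a lookup table once so each order is matched by a dict lookup.
    by_id = {c["_id"]: c for c in customers}
    return [
        {
            "order_id": o["order_id"],
            "customer": by_id[o["customer_id"]]["name"],
            "total": o["total"],
        }
        for o in orders
        if o["customer_id"] in by_id  # skip orders with no matching customer
    ]


joined = join_orders_with_customers(orders, customers)
for row in joined:
    print(row)
```

The alternative that document databases encourage is to avoid the join altogether by embedding the related data inside one document.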

If your business needs do not allow for a fixed database schema with a predefined list of tables, and the schema has to change with the ever-growing needs of your users, then you should look at NoSQL databases.

I have started exploring the very interesting features of MongoDB, a scalable, open source, high-performance, document-oriented database.

BigData_12

 

MongoDB stores data using a flexible document data model that is similar to JSON. Documents contain one or more fields, including arrays, binary data and sub-documents. Fields can vary from document to document. This flexibility allows development teams to evolve the data model rapidly as their application requirements change.
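
The flexible document model described above can be sketched in plain Python. This is only an illustration of the idea, not MongoDB driver code: the `posts` list stands in for a collection, and the `get_field` helper and all field names are hypothetical.

```python
# Hypothetical documents in one collection; note that the fields differ.
posts = [
    {"title": "Big Data basics", "tags": ["bigdata", "nosql"]},
    {
        "title": "Debugger tips",
        "author": {"name": "Asha", "twitter": "@asha"},  # sub-document
        "views": 120,
    },
]


def get_field(document, path, default=None):
    """Read a possibly missing, possibly nested field such as 'author.name'."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Evolving the model: a new document introduces a 'draft' field
# without any change to the documents that already exist.
posts.append({"title": "MongoDB setup", "tags": ["mongodb"], "draft": True})

for post in posts:
    print(
        post["title"],
        get_field(post, "author.name", "unknown"),
        get_field(post, "draft", False),
    )
```

Because fields can vary from document to document, the reading code supplies defaults for anything that might be absent instead of relying on a fixed schema.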

Developers access documents through rich, idiomatic drivers available in all popular programming languages. Documents map naturally to the objects in modern languages, which allows developers to be extremely productive. Typically, there’s no need for an ORM layer.

Stay tuned for my next post on MongoDB features, installation and usage details.


12thOct_1

As developers, regardless of the programming language or platform we target, we use the debugger on a daily basis. If you are a developer using Microsoft technologies, then you need to learn how to make the most of the Visual Studio debugger so that you can be more productive and effective in your everyday development.

Irrespective of your experience level, I encourage you to read my blog post series on Visual Studio debugging techniques. This is my second blog post in this series and I am sure that by the time you finish reading this entire series, you will have some good knowledge of debugger features which you would want to use immediately when you get back to your computer!

  
  Step Into Tips

12thOct_3

 

 

 

I am sure you have used the ‘Step Into’ (F11) feature of the Debug toolbar whenever you wanted to debug the code behind a function call line by line. If the line contains a function call, Step Into executes only the call itself, then halts at the first line of code inside the function. Otherwise, Step Into executes the next statement.

12thOct_4

Now I will tell you about another interesting use of F11. Whenever we want to debug an application, we set up breakpoint(s) and hit F5. But if you start debugging by pressing F11 instead of F5, it takes you out of designer mode into debug mode and stops at the first line of your code that is going to be executed.

 

This is useful whenever you want to debug startup issues in your project. Imagine how you would have done it otherwise: you would have scanned the code, identified the first line to be executed outside the designer, set a breakpoint there before hitting F5, and removed the breakpoint later. The key here is that it saves you time.

 

Another useful tip is that you can start debugging by selecting a line in the source code and using the Run to Cursor (Ctrl+F10) option instead of Start Debugging (F5). How cool is that? Even if you want to debug the click event of a button, go directly to the source code and use the Run to Cursor (Ctrl+F10) option.

 Import and Export Data Tips

At times we want to continue a debugging session that we started on a coworker's machine, or hand the session over to someone else for quick feedback.

The first step towards accomplishing this is to use the import/export breakpoint functionality. But what if you would also like to leave some debugging comments in your code that could be helpful to whoever resumes the debugging session?

Drag-Drop Pin Data Tip

To accomplish this you need to pin a data tip. A data tip is a kind of advanced tooltip used to inspect objects or variables while debugging the application. When the debugger hits a breakpoint, you can hover the mouse over any object or variable to see its current value.

 12thOct_5


 

 

Holding down Ctrl makes the currently displayed data tip nearly transparent. It becomes fully visible again as soon as the key is released.

Change Value Using Data Tips

Data tips can also be used to change a value while debugging, which means they can be used like a watch window. From the list of pinned objects, you can change a value to see its impact on the program.

 12thOct_6

 

 

 

We can pin or unpin the data tip of any object or variable we inspect by clicking the pin icon on the data tip.

 12thOct_7

 

 

 

Click the downward “Expand to see comments” arrow and type your comments there. By the way, the comment can span multiple lines.

12thOct_8

12thOct_9

Last Session Debugging Value

This is another great feature of Visual Studio 2010 debugging. If you pin a data tip during debugging, the value of the pinned item is stored with the session.

This means that even after you stop debugging, this information is still available in Visual Studio.

We can go back to the same line and hover the mouse over the pin to see the value from the last debugging session, as shown in the picture below:

 

 

 12thOct_10

Export/Import/Clear Data Tips

Later you can use the Debug > Export Data Tips command to export the data tips and the Debug > Import Data Tips command to import them on a different machine and continue debugging. Debug > Clear Data Tips can be used to clear the data tips in the solution.

12thOct_11

Debugger Display Attribute

Notice that some type-specific information is displayed when I hover the mouse over an instance of a type.

12thOct_12

 

 

In the above example, when I hover the mouse over the SystemInfo type, the debugger displays some specific information related to this type. This is because we have decorated the SystemInfo class as shown below.

12thOct_13

 

 

 

 

 

 

 

Debugger display attributes allow the developer of the type, who specifies and best understands the runtime behavior of that type, to also specify what that type will look like when it is displayed in a debugger.

The DebuggerDisplayAttribute constructor has a single argument: a string to be displayed in the value column for instances of the type. This string can contain braces ({ and }). The text within a pair of braces is evaluated as an expression. For example, the following C# code causes “Count = 4” to be displayed when the plus sign (+) is selected to expand the debugger display for an instance of MyHashtable.

12thOct_14

Data Tips from comments

Did you know that you can get data tips from comments? While debugging, I can hover the mouse over an expression inside a comment and the debugger shows its data tip. This means you can leave comments containing useful expressions.

12thOct_15

 

 

 

You can do this on the fly as well, which means that during debugging you can add expressions as comments and check their values. How cool is that?

 

 

12thOct_16

 

 

12thOct_17

 

I will catch up with you next week with the last blog post in this series, which has some more interesting tips.

Until then Happy Debugging!!!