Tuesday, April 22, 2014

Advantages of Data Visualization Tools

In today's results-oriented environment, most companies want output delivered in a short time, at a lower cost, and with a quick turnaround.

Business Discovery is the term used for these visualization tools because the data is viewed in an interactive manner. Displaying the data in a summarized way, and slicing and dicing it within a page or across the entire document, enables the user to analyze the data better.

Visualization tools typically work on an in-memory database, i.e. when the data gets loaded to the visualization tool's server, it is compressed as far as possible so that the server can hold more data, which enables the end user to interact with the data through the tool.
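As a rough illustration of why this compression matters (this is not how QlikView or Tableau work internally, just the general idea, sketched with Python's pandas library):

    import numpy as np
    import pandas as pd

    # Illustrative only: dictionary-encoding a repetitive column, the same
    # basic idea in-memory engines use to fit more data in RAM.
    n = 1_000_000
    df = pd.DataFrame({
        "region": np.random.choice(["North", "South", "East", "West"], n),
        "sales": np.random.rand(n),
    })

    raw = df["region"].memory_usage(deep=True)
    df["region"] = df["region"].astype("category")   # dictionary-encode the column
    packed = df["region"].memory_usage(deep=True)
    print(f"raw: {raw/1e6:.1f} MB, encoded: {packed/1e6:.1f} MB")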

QlikView, Tableau and Spotfire are some of the visualization tools in the market that occupy the top three spots in this space.

Traditional Data Warehousing
Creating an enterprise Data Warehouse or a Data Mart is the traditional way we build DW systems. Typically this gets sponsored by senior managers who understand the current issues (typically that data is not integrated across systems, or that reporting is difficult for the MIS department). Once we implement the DW, companies are able to view data at the enterprise level and analyze it using OLAP technologies like SAP BusinessObjects, Cognos and MicroStrategy. OLAP enables companies to answer the "why did it happen" (ad hoc) kind of questions, which is the biggest advantage of creating the DW. It enables DW users to cut across any subject area, provided we have metadata in the OLAP tools for analysis.

Disadvantages of Traditional Data Warehousing Implementation
Being time consuming to implement is one disadvantage of the traditional DW. Deciding on the ETL technology and methodology, building data models, creating metadata models for OLAP, creating reports, understanding the data of the different OLTP systems, and creating a common reference model that fits all the operational systems all take time to get right. When time and resources are the constraint, companies instead ask for predefined reports and interactive dashboards, which cost less and take less time to implement.

To beat these disadvantages of the traditional DW implementation, companies now prefer to implement visualization tools, which display the data in the format management requires without building the DW, and put more emphasis on the outcome (the result). This is where visualization tools score better, whether implemented within a department or across the enterprise. These tools provide a way to analyze data through highly interactive reports and dashboards.

With visualization tools, we extract data from various sources and populate the documents / reports directly, rather than storing the data in a centralized repository first. This makes development faster. We refresh the data periodically, based on changes in the source data or on the business requirement. The effort required to build dashboards and interactivity with these tools is much lower.
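A minimal sketch of this extract-and-render flow, assuming a hypothetical sales_extract.csv with region, month and amount columns (real tools do this through their own connectors and UI):

    import pandas as pd
    import matplotlib.pyplot as plt

    # No central warehouse: read the extract, aggregate in memory,
    # and render straight into the report output.
    sales = pd.read_csv("sales_extract.csv")   # hypothetical source file
    summary = sales.groupby(["region", "month"], as_index=False)["amount"].sum()

    pivot = summary.pivot(index="month", columns="region", values="amount")
    pivot.plot(kind="bar", title="Sales by Region")
    plt.savefig("sales_dashboard.png")         # regenerated on each scheduled refresh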

Monday, April 21, 2014

Top 10 things to remember when you are an ETL developer

Every developer who writes programs to process data in an integration environment should follow these top 10 things to become more productive.


1. Understand the overall objective of the task.
  • Are you loading data from source to staging, from staging to the DW, or from the DW to some other system?

2. Understand the source and the source type (database, XML, flat file, etc.)
  • Where does the source exist? Is it in the file system, or is it an RDBMS?
3. Visualize / collect sample data from the source
  • Review the sample data; if there is none, create test data equivalent to realistic data, taking the help of a subject matter expert if one is available (see the test-data sketch after this list).
4. Frequency of the source data generation and acceptance
  • How often does the source generate data, and how do you identify the type of extract (full or incremental) we should apply to the source data? (See the incremental-extract sketch after this list.)
 
5. Understand the target database / files
  • What does the target file or database structure look like? What is the PK of the target tables, or how do we uniquely identify the records in the file?
6. Visualize the data based on the sample source data
  • Based on the sample input data, apply the business rules so that the transformed data can be visualized in the target structure or files.
     
7. Create a mapping document (containing the source system, source file / table, target system, target file / table, and the business rules you apply on the columns).
  • Use an Excel sheet or the standard mapping document of the company you are working for. Make sure it lists the columns you identified in the source and the target file / database columns, and document the business rules as completely as possible. By reading the mapping document, you should know the work we have to do as ETL developers with the tool / program. (A minimal mapping sketch appears after this list.)
8. Identify the gaps between the source and the data required in the target
  • After completing the mapping document, you should be able to figure out how many columns are yet to be identified. Using the mapping document, we can sit with anyone in the team to get them clarified, so that one person can take the responsibility to close them. (The gap check is part of the mapping sketch below.)
9. Develop the ETL job
  • Use all the artifacts you have collected so far and start creating the ETL job based on the mapping document (a minimal job sketch appears after this list).
10. Unit test the ETL
  • Create the unit test document and make sure it covers all possible combinations of the business rules. Identify the different ways of testing, such as row-count testing or value testing, according to the transformations we apply. (See the test sketch after this list.)
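For step 3, when there is no sample data, a small script can generate realistic-looking test data. The columns and value ranges below are assumptions to confirm with the subject matter expert:

    import csv
    import random
    from datetime import date, timedelta

    # Hypothetical sales extract layout; confirm columns and ranges with the SME.
    regions = ["North", "South", "East", "West"]
    start = date(2014, 1, 1)

    with open("test_sales.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "order_date", "region", "amount"])
        for i in range(1, 501):
            writer.writerow([
                i,
                (start + timedelta(days=random.randint(0, 90))).isoformat(),
                random.choice(regions),
                round(random.uniform(10, 5000), 2),
            ])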
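For step 4, if the source carries a reliable last-updated column, an incremental (delta) extract is usually the right choice; a full extract is safer when it does not. A minimal watermark sketch, assuming a hypothetical orders table and using SQLite as a stand-in for the real source RDBMS:

    import sqlite3

    conn = sqlite3.connect("source.db")   # stand-in for the real source database

    def extract_incremental(conn, last_watermark):
        """Pull only rows changed since the previous run (delta extract)."""
        rows = conn.execute(
            "SELECT order_id, region, amount, updated_at "
            "FROM orders WHERE updated_at > ?",
            (last_watermark,),
        ).fetchall()
        # Remember the highest updated_at seen so the next run starts from it.
        new_watermark = max((r[3] for r in rows), default=last_watermark)
        return rows, new_watermark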
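For steps 7 and 8, the mapping document usually lives in Excel, but the same rows expressed in code make the gap check obvious. Every system, column and rule here is made up, and a row whose rule is still "TBD" is exactly the kind of gap to close with the team:

    # Each dict mirrors one line of the mapping spreadsheet (all names hypothetical).
    mapping = [
        {"src_system": "ORDERS_OLTP", "src_column": "amount",
         "tgt_table": "FACT_SALES", "tgt_column": "sale_amount",
         "rule": "multiply by 1.1 exchange rate"},
        {"src_system": "ORDERS_OLTP", "src_column": "order_date",
         "tgt_table": "FACT_SALES", "tgt_column": "sale_date",
         "rule": "cast to DATE"},
        {"src_system": "CRM", "src_column": "cust_segment",
         "tgt_table": "FACT_SALES", "tgt_column": "segment_key",
         "rule": "TBD"},   # unresolved: flagged as a gap below
    ]

    # Step 8: the gaps are simply the rows nobody has specified a rule for yet.
    gaps = [m for m in mapping if m["rule"] == "TBD"]
    for g in gaps:
        print(f"Unmapped: {g['src_system']}.{g['src_column']} -> {g['tgt_column']}")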
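For step 9, a hand-rolled version of a simple job driven by those rules; in practice the ETL tool itself does this, and the rules are the same hypothetical ones as above:

    import csv

    def transform(row):
        """Apply the business rules from the mapping document to one source row."""
        return {
            "sale_amount": round(float(row["amount"]) * 1.1, 2),  # hypothetical FX rule
            "sale_date": row["order_date"][:10],                  # cast to DATE (ISO)
            "region": row["region"].strip().upper(),              # standardize codes
        }

    # Source file from the step 3 sketch; target is a flat-file stand-in for FACT_SALES.
    with open("test_sales.csv") as src, open("fact_sales.csv", "w", newline="") as tgt:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(tgt, fieldnames=["sale_amount", "sale_date", "region"])
        writer.writeheader()
        for row in reader:
            writer.writerow(transform(row))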
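For step 10, row-count and value tests for that transform can be plain asserts against small in-memory rows:

    def transform(row):
        # Same hypothetical rules as the step 9 sketch.
        return {
            "sale_amount": round(float(row["amount"]) * 1.1, 2),
            "sale_date": row["order_date"][:10],
            "region": row["region"].strip().upper(),
        }

    source_rows = [
        {"amount": "100.00", "order_date": "2014-04-01", "region": " north "},
        {"amount": "250.50", "order_date": "2014-04-02", "region": "South"},
    ]
    target_rows = [transform(r) for r in source_rows]

    # Row-count test: the transform should neither drop nor duplicate rows.
    assert len(target_rows) == len(source_rows)
    # Value tests: each business rule produced the expected output.
    assert target_rows[0] == {"sale_amount": 110.0, "sale_date": "2014-04-01", "region": "NORTH"}
    assert target_rows[1]["sale_amount"] == 275.55
    print("all unit tests passed")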
 
By applying the points mentioned above, we should be able to develop good ETL programs.