Friday, August 28, 2009

How to read a zip file using SSIS or unzip a zip file using SSIS

Today, while browsing the Internet to hunt food for my brain, I came across a very nice article by Jorg Klein on how to read a zip file in SSIS. Before reading the article, I too assumed that the author must be using some third-party library, and that the zip file would be read with it inside a Script Task.

But I was really surprised to find that this is not the case. The author uses a Visual J# library (vjslib.dll) to read the zip file, which ships as part of the Visual J# Redistributable Package. A couple of years ago I worked for a few weeks in Java to help out a project, and from that experience I knew that the Java runtime ships with libraries of this kind to read .jar files.

It's a brilliant discovery, and it serves a great purpose. With this functionality, one can download archived data from the web, unzip it, and process it to its destination. Downloading data using a web service and processing it is very much possible using SSIS, but the missing bridge was unzipping files in SSIS without third-party components. And now the bridge is ready!

The library comes for free, and the author has published a nice step-by-step tutorial describing the implementation. The entire article can be read at this link.
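Since vjslib.dll exposes the same java.util.zip API that ships with the Java platform, the calls a Script Task would make through it can be sketched in plain Java. This is a minimal sketch of the technique, not the author's exact code; the entry name and contents are invented for illustration:

```java
import java.io.*;
import java.util.zip.*;

public class UnzipSketch {
    public static void main(String[] args) throws IOException {
        // Create a small zip file first so the example is self-contained.
        File zip = File.createTempFile("sample", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("data.csv"));
            out.write("id,name\n1,foo\n".getBytes("UTF-8"));
            out.closeEntry();
        }

        // Read the archive back entry by entry -- the same ZipInputStream /
        // ZipEntry calls a Script Task would make via vjslib.dll.
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(zip))) {
            ZipEntry entry;
            byte[] buffer = new byte[1024];
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int read;
                while ((read = in.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                System.out.println(entry.getName() + ": " + content.size() + " bytes");
            }
        }
        zip.delete();
    }
}
```

In a real package, the inner loop would write each extracted entry to a staging folder for downstream data flow tasks to pick up.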

Thursday, August 27, 2009

Dimensional Modeling Tutorial : Employee Dimension

Dimensional modeling is one of the areas many aspirants want to move into. I know many SQL developers who wish to learn dimensional modeling but cannot find a starting point, and not many books are available on the subject. Even seasoned players in this area encounter difficulties, as the subject is very broad and is more techno-functional than purely technical in nature. These are my personal views and need not be universally true.

Articles on the Kimball University website, or other related websites such as Intelligent Enterprise, are some of the best sources for those who wish to start or improve their knowledge of dimensional modeling. This applies even to me. I believe skill in this area improves through experience, not just by reading a book or watching a webcast.

In dimensional modeling, we generally work on at least some of the common entities that exist in any regular business application, for example: materials, customers, employees, payroll, HR, etc.

Kimball University has recently published a good article that can serve as a tutorial on how to architect a dimensional model for the Employee dimension. I believe this article presumes some level of data warehousing knowledge and understanding from the reader.

I suggest that anyone taking a first step into dimensional modeling read this article. Thereafter, being an MS BI professional, I would implement the same using SQL Server 2008 and SSAS, populate the relational data warehouse and the cube, and experiment with different permutations and combinations in the model to experience the various technicalities of dimensional modeling.

Tuesday, August 25, 2009

What is project Gemini

This is my first post on Microsoft's new flagship tool for the theme/concept called self-service analysis: project Gemini. I have read a couple of articles from The OLAP Report and others, and I have also gone through a couple of videos on the topic. Some of the things I know and feel about this tool are as follows:

1) It has very strong integration with Microsoft Excel, which is the primary runway from which Gemini takes flight into the business intelligence space. It is also said to have nice integration with SharePoint and other report authoring and rendering tools.

2) It is said to be an in-memory storage form of Analysis Services, i.e. it can be seen as another flavor of Analysis Services. In fact, data stored in Excel via the Gemini add-in for Excel is stored in the form of an Analysis Services database, which means it can be consumed by any application that can consume data from Analysis Services. I may not be precise in the above statement, but the summary is that the data Gemini projects to its host applications can be, or rather would be, in the form of an Analysis Services database.

3) The Gemini add-in for Excel is a fantastic way for business users to get the most out of Excel while still having capabilities like those of data warehouse analysis hosted in a dashboard environment such as PerformancePoint. Here Gemini is the analysis tool, Excel stores the data in place of a data warehouse, and Excel charts and graphs act as the dashboard or visualization hosts. The Gemini add-in also makes use of Silverlight for its visualizations.

4) As heard and discussed among industry experts, this tool is Microsoft's competition to tools like QlikView in the self-service, or as I would call it "portable", business intelligence arena for business (non-technical) users. Gemini's advantage is the immense reach of Microsoft Excel in the industry, its popularity among business users, and the tool's tight integration with it.

5) Below is a link to a few videos that provide a preview of project Gemini:

Tuesday, August 18, 2009

Using SSIS with SAP

I have been working for a few months on using SSIS with SAP. While working with functional consultants, it can be very hard to get the architecture in shape, keeping their understanding of the technology in alignment with the requirements. There are certain points an SSIS developer or designer should keep in mind while working on a project where SSIS is used to feed data from legacy systems into SAP environments, and even more so if the project is a data migration.

1) An SSIS package designed for IDocs is much simpler to understand and design than one designed for screen-recorded sessions.

2) Always sort the data after reading it (or use the IsSorted property setting), and sort the data again before writing it to the file output.

3) Audit the data to a necessary and sufficient level. Too much auditing breaks the developer's shoulders; too little breaks the functional/technical SAP consultant's.

4) Never let your main package read data directly from a file input. Stage the data in relational storage, such as a table. Also, never read directly from the table; always keep an interface such as a stored procedure or a view for your package to read from. This design is more resistant to change, which is the theme of any such project.

5) Always keep the main control in the database while designing the architecture, e.g. configuration settings, loop iterator collections, file paths, etc.

6) Data objects like materials, customers, vendors, finance, payroll, employees, etc. are generally the kind of data objects designed in the SAP system. The first three are the foundation stones, to the best of my SAP knowledge to date.

7) Try to use less variety in data types; the rule of the best-fit data type does not apply to data migration projects. The more synchronization you have in your relational staging area, the less pain you will have while designing your package.

8) Be well versed with casting operators and functions, as you will generally need to use them a lot in such projects.

9) The Import/Export wizard can be a nuisance if the text file from which you are importing data has duplicate column names. If you are working with text files as input source files, make sure you do at least a basic profiling of the source files.

There are many more points worth mentioning, but for now this is just a quick starter.
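Point 2's sort-before-write rule is easy to picture outside SSIS as well. Below is a hypothetical sketch in plain Java (the record layout and key column are invented for illustration) of sorting rows on their key before they are written out, mirroring a Sort transformation placed just before a file destination:

```java
import java.util.*;

public class SortBeforeWrite {
    public static void main(String[] args) {
        // Rows as they arrive from the source -- unsorted on the key column.
        List<String[]> rows = new ArrayList<>(Arrays.asList(
            new String[]{"M-300", "Bolt"},
            new String[]{"M-100", "Nut"},
            new String[]{"M-200", "Washer"}));

        // Sort on the key (column 0) before writing to the output,
        // so downstream consumers can rely on the order.
        rows.sort((a, b) -> a[0].compareTo(b[0]));

        StringBuilder out = new StringBuilder();
        for (String[] r : rows) {
            out.append(String.join(",", r)).append('\n');
        }
        System.out.print(out);
    }
}
```

Keeping both the read side and the write side sorted on the same key is what makes merge-style matching and reconciliation against the SAP side predictable.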

Monday, August 10, 2009

Why does only one of my packages hang in BIDS, while other packages run faster?

While designing a package in BIDS, one issue you may face is BIDS becoming very slow. This shows up in symptoms such as it taking a few seconds just to place a control on the package. There can be several reasons why the package designer runs slowly; the one I want to mention in this post is errors already present on the package.

If there are many errors on the package due to incompletely configured controls, many errors are reported in the Error List window, and this list is updated on every change to the package. For example, suppose there are 6-10 flat file destinations placed on the package for future use but not yet connected to an input stream. Dozens of errors would be reported for the package, visible in the Error List window. If a Sort transformation is then placed on the package, BIDS validates the entire package again and refreshes the entire error list. This makes sense, as BIDS is trying to find out whether the change has impacted or fixed any of the existing errors.

So in essence, more time is taken, and it is taken by this entire revalidation process, not just by placing the control on the package, though it may seem that placing the control is what takes so long.

The resolution is to disable all control flow elements that have errors for known logical reasons or due to incomplete configuration. Unfortunately, as there is no disable functionality in the data flow, remove all data flow tasks that have an error for known reasons. Save the package and try to execute it again. The package designer in BIDS should now run much faster than before, provided you had quite a few erroneous tasks on the package.

Friday, August 07, 2009

Database Modelling Reverse Engineering , LINQ to SQL - Entity Developer Express

One of the things I have learned and experienced in my career is that IT cannot work in isolation. One needs to be versatile in understanding and learning technology to grow big. It need not be a universal truth, but it has been my career experience.

Coming to the point, there is a new tool out called Entity Developer Express. It is not exactly a tool for MS BI or SQL Server development or maintenance, but I am listing it because its foundations are laid on SQL Server. While working on projects that involve .NET, SQL Server, and/or Microsoft Business Intelligence, this kind of tool makes a developer's life easier.

I have neither evaluated this tool, nor am I going to, as I am presently working on a system integration project involving SAP and SSIS. So, to be honest: no time and little interest in evaluating it.

The description on the product page is quite sufficient to describe it, and reads as follows:

Entity Developer is a free of charge powerful modeling and code generation tool for ADO.NET Entity Framework and LINQ to SQL. You can design an entity model from scratch or reverse-engineer an existing database, in which case you will have the classes and the relations with just several clicks. The model is used to generate C# or Visual Basic code with flexible predefined code templates. You can also customize the templates to fit all your needs. Entity Developer for SQL Server uses standard Microsoft SqlClient, LINQ to SQL, and Entity Framework, all of them bundled with .NET Framework 3.5 SP1.

Key Features:
- ADO.NET Entity Framework support
- LINQ to SQL support
- Visual schema modelling
- Customizable and intuitive user interface
- Reverse engineering (database object to entity model)
- Well-thought predefined templates for class generation
- GUI for predefined templates
- State-of-art diagram with zooming
- Diagram overview and printing
- Undo/Redo support
- One-click help system
- Visual Studio is not required
- Free of charge

Monday, August 03, 2009

Spatial Data , Spatial Query , Visual data analysis of SQL Server - Mappoint Add-in for SQL Server

I have always been a very big admirer of the visuals that ProClarity Analytics provides. The variety of graphs it offers for analysis is amazing. Recently, I stumbled upon an add-in that stands in the same league as, or is even more appealing than, ProClarity's visuals.

This add-in is the "MapPoint Add-in for SQL Server". It's a very innovative way to present some key business analysis KPIs. For presenting MIS-related information, as of this writing, I cannot think of anything better. Just take the data already stored in SQL Server, connect it to maps (available for the US and Europe only), and it's good to go for analysis.

This would definitely require the data to be in a specific structure to make it usable by the add-in. It also requires a MapPoint installation.

From a developer perspective, this is one of the best tools for practicing with and envisioning the results of your spatial data. The add-in has great support for spatial query development and analysis. Some front end is generally required to consume and display spatial data, which typically targets geometry, and what better consumer than a real map itself?

If for nothing else, I would use it to get a good hold of, correct, and analyse my spatial query skills and spatial data analysis.
