How are variables useful in the ssis package

SQL Server 2012 Integration Services overview and best practices

Transcript

1 SQL Server 2012 Integration Services Overview and Best Practices Robert Panther, Senior Consultant CGI Group Inc.

2 Introduction Name Role Professional focus Robert Panther Senior Consultant, Logica now a part of CGI Databases: Design, Implementation & Administration (especially performance optimization and ETL with SSIS) Mobile Applications: Design & Implementation Web Applications: Implementation & Administration Technologies Microsoft SQL Server Windows Mobile .NET (VB.NET & C #), ASP, ASP.NET methods database design, object-oriented programming, structured programming, training conception & implementation, project management Certificates Special Computer Science Diploma (FH) MCTS SQL Server 2005 Implementation & Maintenance MCTS SQL Server 2005 Business Intelligence MCTS SQL Server 2008 Implementation & Maintenance MCITP SQL Server 2008 Developer Author of several specialist books and articles Head of the SQL Server Expert Group at Logica / CGI Regular speaker at specialist conferences Active PASS member 2

3 Agenda Overview SSIS What's New in SSIS 2012 Best Practices General Conventions Structure of SSIS Projects Naming Conventions Solutions for Known Problems Conclusion What is still missing? Links & Literature 3

4 SSIS overview

5 Overview of SSIS What is SSIS? SSIS = SQL Server Integration Services SSIS is an ETL tool E Extract: Extracting data T Transform: Transforming data L Load: Loading data SSIS is part of the SQL Server (or its BI tools) SSIS: SQL Server Integration Services SSAS: SQL Server Analysis Services SSRS: SQL Server Reporting Services 5

6 Overview SSIS Typical uses of SSIS: Filling of data warehouses Data migrations Regular data synchronization Export batches etc. History of SSIS: SQL Server 2000: DTS (predecessor of SSIS) SQL 2005/2008/2008 (R2): SSIS (each slightly improved) SQL 2012: SSIS heavily revised 6

7 Overview SSIS Development of SSIS packages: Up to SQL 2008 R2 with the Business Intelligence Development Studio Since SQL 2012 with the SQL Server Data Tools Execution of SSIS packages: explicitly planned in the development environment (for testing and debugging) via the SQL Server Management Studio via the SQL Server Agent via the Execute Package Utility (DTEXEC / DTEXECUI) SSIS runs as a separate process 7

8 Overview of SSIS projects and packages Solution folders: can contain several projects the projects can have different types SSIS projects: can contain several SSIS packages SSIS packages: are comparable to individual programs an SSIS package is completely saved in a DTSX file ( XML format) are completely checked out in the source code management 8

9 Overview of SSIS sequence control The basic structure of an SSIS package is determined by the sequence control (Control Flow), comparable to a program flow chart, is composed of sequence control elements Processing tasks Structural elements (loop and sequence container) Special case: Maintenance plan tasks Certain tasks can call separately defined modules: data flow task Execute package -Task Execute process -Task Execute SQL -Task Transaction handling also takes place in the flow control 9

10 Overview of SSIS process control A simple process: 10

11 Overview of SSIS sequence control Execution order: unconnected tasks are executed in parallel with arrows connected tasks are executed in the order of the arrow direction multiple output arrows mean parallel execution multiple input arrows mean all previous steps must be finished (can also be set differently) connecting arrows can be set with restrictions ( Success / failure / termination) and / or refine expressions Tasks can be grouped with sequence containers Tasks can be run through multiple times with for or foreach loop containers 11

12 Overview SSIS data flow the most important task in the flow control is the data flow task a data flow represents the actual ETL process data flows are designed in a separate view and usually consist of three components: data flow source (extract) data flow transformation (transform) data flow target (load) data flows In principle, executed in the direction of the arrow, neither in blocks nor completely per step, but in blocks. The goal is the greatest possible parallelism 12

13 Overview of SSIS data flow A simple data flow: 13

14 Overview SSIS data flow Data sources and targets: There are various data sources and targets: Flat files Excel tables XML files ADO.NET (SQL Server, Oracle, ODBC etc.) OLE DB (SQL Server, Oracle, Jet Engine / MS Access etc.) .) If a database is used as the source, either a table / view or an SQL query can be specified 14

15 Overview SSIS data flow Transformations: Transformations change the incoming data Transformations can also change the number of rows and columns Some transformation types have several inputs UNION ALL (UNION of any number of even unsorted inputs) Merge (UNION of two sorted inputs) Merge link (JOIN of two sorted inputs) some transformation types have several outputs for correct and incorrect lines multicast (several identical outputs) for complex transformations, script components can be used whose code is run through per line of the data flow 15

16 What's New in SSIS 2012

17 Innovations in SSIS 2012 In general, VS 2010 IDE uses (e.g. zoom and freely movable windows) APIs for extensions Scope of variables subsequently adapt more readable XML finally an undo in the designer! copy & paste improves Autosave & Recovery no more DTS support 17

18 Innovations in SSIS 2012 SSIS server manages complete projects, not just individual packages has its own catalog DB (SSISDB) Packages can be executed remotely via SSIS service Integration Services dashboard Administration and execution via SP's own performance counter 18

19 Innovations in SSIS 2012 SSIS projects Connection Manager at project level (no more Shared DataSource) Configuration using project parameters Deployment in the form of .ispac files Flow control Execute Package optionally from project reference (without explicit Connection Manager) Expression Task (e.g. for pre-assigning variables) 19

20 New Features in SSIS 2012 Data Flow Source / Destination Assistant Repair Work Offline mapping changes (e.g. changed column names) Pivot grouping via column names (no longer LineageIDs) in DF (with collapse & expand) Merge & Merge Join (throttled if an input is slow, saves resources ) simplified data viewer (no graphical representation) special data quality transformations (DQS adjustment) 20

21 Best Practices

22 SSIS Best Practices General Conventions General Conventions Packages should always be able to be started repeatedly at the beginning tidy up use transactions if necessary use error outputs In the simplest case in multicast containers Ideally with error logging, logging in additional tables e.g. for complex mappings configurability via variables and expressions 22

23 SSIS Best Practices General conventions Conventions for better readability Process control and data flow from top to bottom Avoid the need for horizontal scrolling Move error handling in the data flow to the side 23

24 SSIS Best Practices Structure of projects Structure of SSIS projects One package per data source or data destination One master package that calls sub-packages Configuration of the variables General preparation and follow-up De- & re-activating foreign keys Sequence container for structuring 24

25 SSIS Best Practices Structure of packages Structure of SSIS packages Preparation steps Disable (non-clustered) indexes Empty target tables Data flow Follow-up steps Recreate indexes 25

26 SSIS Best Practices Naming Conventions Naming Conventions Name of the package corresponds to the source or destination (uniform!) Use short prefixes for tasks and containers, e.g. SCR for Script Tasks SQL for SQL Tasks EP for Execute Package LKP for Search (from Lookup) ODB for OLE DB Sources & Targets 26

27 Solutions to Known Problems

28 Solutions to known problems Problem: Certain data sources (e.g. Excel, Access) cannot be used. Solution: Change the project to 32Bit Runtime. 28

29 Solutions to known problems Problem: The target in the data flow lets whole block of lines run into the error output in the event of a faulty line. Solution: 1. Provide an OLE DB target in fast loading mode with an error output. 2. Write faulty lines again in slow mode. 3. Further error output with treatment for the lines with the final error. 29

30 Solutions to Known Problems 30

31 Solutions to Known Problems Problem: Data flow writes to two target tables, one of which references the other. Possible solutions: a) Previously known IDs are used for primary and foreign keys (possibly determine the maximum ID beforehand). b) Values ​​defined in the data flow (e.g. guides generated in the source query) are used for primary and foreign keys. 31

32 Solutions to Known Problems Problem: Data flow writes to two target tables, one of which references the other. The tables use IDs with identity specification. Possible solutions: a) The first table is written using a stored procedure and the ID of the new line is returned in the data flow using output parameters. b) Foreign key column is filled with a temporary value, which is replaced by the correct ID after the data flow via postprocessing script. 32

33 Conclusion

34 Conclusion SSIS in general SSIS is ideally suited even for simple data transfers! Despite the simple visual representation, you shouldn't be fooled into the fact that very complex processes can be implemented with SSIS! SSIS is versatile. Typical batch runs can be implemented with SSIS! SSIS can be expanded almost at will: data sources, tasks and components from third-party providers own.net-code any.net-assemblies 34

35 Conclusion SSIS in general Why SSIS and not.net or SQL? Optimized for mass data processing Data sources and targets can be on different servers Error handling and debugging (redirecting incorrect lines) Using parallel processing, if necessary ,.net and SQL can be integrated 35

36 Conclusion SSIS 2012 The SQL Server Integration Services have finally grown up. Most of the shortcomings of the previous versions have been eliminated. In some places, however, there is still room for improvement, as before. What is still missing? Support in aligning objects in the Designer Commenting out steps in the data flow (e.g. data target) Structuring SSIS packages in the solution (subfolders, sorting, etc.) More convenient SQL editors Zoom in error view Data target with OUTPUT (inserted) 36

37 Links & Literature Links related to SSIS: SSIS online documentation: Resources for SSIS developers and IT specialists: Links of the speaker: SQL Server Blog: Logica / CGI: / Literature: Microsoft SQL Server Integration Services: Creating data integration and Enterprise-level data transformation solutions Bernd Jungbluth, Microsoft Press, 2011 (updated edition for SQL 2012 planned for June 2013) 37

38 A small matter on our own behalf To strengthen our SQL Server team at the Frankfurt / Sulzbach location, we are looking for database architects Microsoft SQL Server (m / f) You will be responsible for consulting, conception, project management, but also for the implementation of sophisticated solutions, whose technological focus is on Microsoft SQL Server. They help our clients to make business-critical applications highly available, secure and cost-effective. You use modern methods and techniques and integrate them into the corporate landscape. In this way, you contribute to the creation of future-proof platforms on which business and e-business can be operated securely and stably. Interest? Then talk to me or send me an email at 38

39 Thank you for your attention! Robert Panther 39