Find Duplicate Records in IBM SPSS Modeler

To help you get more out of IBM SPSS Modeler, the Version 1 SPSS experts have created a series of Tech Tips. This one shows how to find duplicate records in IBM SPSS Modeler using the Distinct node.

IBM SPSS Modeler is a comprehensive predictive analytics platform designed to bring predictive intelligence to decisions made by individuals, groups, systems, and the enterprise. Its easy-to-use drag-and-drop interface provides a complete set of tools for data access, examination, preparation, modelling, evaluation, and deployment.

Modeler gives users a complete toolset for building predictive models from start to finish, using node-based visual programming. Users pick nodes from palettes and place them on the stream canvas. Once nodes have been placed on the canvas and edited, they can be linked to form a stream. A stream represents a flow of data through a series of operations (nodes) to a destination, which can be output (text or a chart), a model, or an export of the data to another format (e.g., a database).
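
As a rough conceptual analogy only, a stream can be pictured as data passed through an ordered chain of operations. The Python sketch below assumes exactly that simplified model; it is not Modeler's scripting API, and `run_stream`, `select_adults` and `derive_age_band` are hypothetical names, the last two loosely mirroring what a Select node and a Derive node do.

```python
# Conceptual analogy only: a "stream" modelled as source data passed through
# an ordered list of node functions. This is NOT the Modeler scripting API;
# every name below is hypothetical.
from typing import Callable, Dict, List

Record = Dict[str, object]
Node = Callable[[List[Record]], List[Record]]


def run_stream(source: List[Record], nodes: List[Node]) -> List[Record]:
    """Pass the source data through each node in turn and return the result."""
    data = source
    for node in nodes:
        data = node(data)
    return data


def select_adults(records: List[Record]) -> List[Record]:
    """Loosely like a Select node: keep only records that meet a condition."""
    return [r for r in records if r["age"] >= 18]


def derive_age_band(records: List[Record]) -> List[Record]:
    """Loosely like a Derive node: add a new field computed from existing ones."""
    return [{**r, "band": "65+" if r["age"] >= 65 else "18-64"} for r in records]


source = [{"id": 1, "age": 17}, {"id": 2, "age": 40}, {"id": 3, "age": 70}]
for row in run_stream(source, [select_adults, derive_age_band]):
    print(row)  # the "destination" here is simple text output
```

The order of the list matters in the same way that the order of linked nodes matters on the canvas: each operation sees only what the previous one passed on.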

Identifying duplicate records is critical in the data understanding, examination, and preparation phases of predictive modelling. To identify duplicates, go to the Record Ops palette, select the Distinct node, and drag it onto the stream canvas. Alternatively, double-click the node in the palette to add it to the canvas. Once the node is on the canvas, connect it to your stream, then double-click it to open its settings.

The Distinct node offers three options: create a composite record for each group, include only the first record in each group, and discard only the first record in each group. ‘Create a composite record for each group’ provides a way to aggregate non-numeric fields. With ‘Include only the first record in each group’, the first record in each group of duplicates is kept and the rest are discarded.
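
To illustrate how the first two options differ, here is a minimal plain-Python sketch. It is not how Modeler implements the Distinct node, and all function names are hypothetical. It assumes records are dictionaries, treats the input order as the order within each group, and uses "most frequent value" as an example aggregation for a non-numeric field; Modeler's own composite settings offer their own choices, which this does not reproduce.

```python
# Illustrative sketch of two Distinct node options in plain Python.
# NOT Modeler's API: all function names here are hypothetical.
from collections import Counter
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]


def group_by_keys(records: List[Record], keys: Sequence[str]) -> Dict[tuple, List[Record]]:
    """Group records with identical key-field values, preserving input order."""
    groups: Dict[tuple, List[Record]] = {}
    for record in records:
        groups.setdefault(tuple(record[k] for k in keys), []).append(record)
    return groups


def include_first(records: List[Record], keys: Sequence[str]) -> List[Record]:
    """Keep the first record of each group (input order assumed) and drop the rest."""
    return [group[0] for group in group_by_keys(records, keys).values()]


def most_frequent(values: List[object]) -> object:
    """Example aggregation for a non-numeric field: its most common value."""
    return Counter(values).most_common(1)[0][0]


def composite(records: List[Record], keys: Sequence[str],
              aggregations: Dict[str, Callable[[List[object]], object]]) -> List[Record]:
    """Build one record per group from the key fields plus one aggregated value per field."""
    result: List[Record] = []
    for key_values, group in group_by_keys(records, keys).items():
        row: Record = dict(zip(keys, key_values))
        for field, aggregate in aggregations.items():
            row[field] = aggregate([record[field] for record in group])
        result.append(row)
    return result


customers = [
    {"id": 1, "city": "Leeds", "channel": "store"},
    {"id": 1, "city": "Leeds", "channel": "web"},
    {"id": 1, "city": "Leeds", "channel": "web"},
    {"id": 2, "city": "York", "channel": "phone"},
]
print(include_first(customers, ["id"]))
print(composite(customers, ["id"], {"city": most_frequent, "channel": most_frequent}))
```

With this sample data, `include_first` keeps whichever channel happens to come first for customer 1 ("store"), while `composite` reports that customer's most common channel ("web").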

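With ‘Discard only the first record in each group’, the first record in each group of duplicates is discarded and the remaining records are kept, which makes this option a convenient way to isolate the duplicates themselves for closer examination. The ‘Key fields for grouping’ option lists the field or fields used to determine whether records are identical. The Distinct node also has a sort order option, which lists the fields used to sort records within each group of duplicates and whether each field is sorted in ascending or descending order. When sort fields are specified, this order, rather than the original order of the data, determines which record counts as the first in each group.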
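
The following sketch shows how key fields and a within-group sort order interact with ‘Include only the first record’ and ‘Discard only the first record’. It is a plain-Python illustration under stated assumptions, not Modeler's implementation: records are dictionaries, the sort order is a hypothetical list of `(field, ascending)` pairs with the most significant field first, and records that tie on every sort field keep their input order.

```python
# Sketch of key fields plus a within-group sort order for the Distinct node's
# "include first" and "discard first" options. Plain Python, NOT Modeler's API;
# function names and the sort_order format are hypothetical.
from operator import itemgetter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]
SortOrder = Sequence[Tuple[str, bool]]  # (field, ascending), most significant first


def sorted_groups(records: List[Record], keys: Sequence[str],
                  sort_order: SortOrder) -> List[List[Record]]:
    """Group records by key fields, then sort each group by sort_order."""
    groups: Dict[tuple, List[Record]] = {}
    for record in records:
        groups.setdefault(tuple(record[k] for k in keys), []).append(record)
    result: List[List[Record]] = []
    for group in groups.values():
        ordered = list(group)
        # Python's sort is stable (also with reverse=True), so sorting from the
        # least to the most significant field yields a multi-field ordering.
        for field, ascending in reversed(sort_order):
            ordered.sort(key=itemgetter(field), reverse=not ascending)
        result.append(ordered)
    return result


def include_first(records: List[Record], keys: Sequence[str],
                  sort_order: SortOrder) -> List[Record]:
    """Keep only the first record of each group after sorting."""
    return [group[0] for group in sorted_groups(records, keys, sort_order)]


def discard_first(records: List[Record], keys: Sequence[str],
                  sort_order: SortOrder) -> List[Record]:
    """Drop the first record of each group and keep the remaining duplicates."""
    return [r for group in sorted_groups(records, keys, sort_order) for r in group[1:]]


orders = [
    {"customer": "A", "date": "2023-01-05", "amount": 20},
    {"customer": "A", "date": "2023-03-10", "amount": 35},
    {"customer": "B", "date": "2023-02-01", "amount": 50},
    {"customer": "A", "date": "2023-02-14", "amount": 15},
]
# ISO date strings sort chronologically; descending puts the latest order first.
most_recent_first = [("date", False)]
print(include_first(orders, ["customer"], most_recent_first))
print(discard_first(orders, ["customer"], most_recent_first))
```

Sorting customer A's orders by date in descending order makes the most recent order the "first" record, so `discard_first` returns A's two older orders, the duplicates you may want to examine, while customer B, who has no duplicates, contributes nothing. The two functions split the input into complementary sets.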

Tools Covered

IBM SPSS Modeler

Need some help?

Learn how to use SPSS from the experts

With more than 20 years of delivering highly successful training programs, Version 1 offers a wide range of training options to best suit your requirements, enabling you to optimise your IBM SPSS Software, achieve your analytical goals and continually improve your results.