Visual clustering analysis of CIS logs to inform creation of a user-configurable Web CIS interface

Y Senathirajah, S Bakken

Abstract

Background: In this paper, we describe a new method for the study of clinical information system (CIS) logfiles joined with information in the clinical data warehouse. This method uses heatmap representations and clustering techniques to examine clinicians' viewing patterns of laboratory test results. The context of our application of these techniques is to inform the creation of a widget-based interface to the CIS.

Objectives: We address the rationale, feasibility, and usefulness of our method through examination of three hypotheses: 1) The frequency distribution of laboratory test viewing will follow a 'long tail' pattern, indicating that patterns are highly variable and supporting the rationale for a widget-based configurable system. 2) Patterns of laboratory test viewing (by clinician, specialty, clinician/patient/day, and ICD-9-CM codes) can be distinguished by our methods. 3) The identified clusters will include more than 80% of the laboratory test elements found in 30 randomly selected patient records for one day.

Methods: The data were plotted as heatmaps and clustered using hierarchical clustering software. Various parameters were tested to give the optimal clusters.

Results: All the hypotheses were supported. For Hypothesis 3, 91.4% of information elements in the records were covered by the generated clusters.

Conclusions: Study findings support the rationale, feasibility, and usefulness of our methods to examine patterns of information access among clinicians and to inform the creation of widget-based interfaces. The results also contribute to our general understanding of clinicians' CIS use.

Figures

Fig. 1
System architecture as it pertains to logfile and CDW information. The system consists of many commercial and home-grown systems denoted by ‘CIS’ and ‘Ancillaries,’ which, together with the Admission, Discharge, and Transfer (ADT) system (Eagle), use a series of interfaces (Interface Engine) to transfer information between systems and store and retrieve data from the CDR, a database system optimized for speed. The interface engine draws on the MED server for terminology translation between the different systems. Data from the CDR is replicated to the CDW after processing by scripts, which put it into a relational form suitable for research queries. Entries in CDW tables include medcodes for each laboratory test or other clinical elements. The CIS generates logfiles that contain either a medcode or unique time-stamp, which can be joined with data in the CDW to identify elements viewed.
Fig. 2
Cell values correspond to colors in a heatmap. By convention, shades of red indicate high values, green low values, and black intermediate values. By analogy with microarrays, laboratory tests in our study are analogous to genes, and clinicians (or patient conditions, or user sessions) are analogous to samples. The top row shows raw values and the corresponding heatmap. The bottom row shows the values normalized by row, so that each value is expressed as the difference between the value and the row mean, divided by the row standard deviation. (The row constitutes the total views for that clinician.) The corresponding heatmap is to the right.
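The row normalization described in this caption can be sketched in a few lines. This is only an illustration: the function name `normalize_rows` and the sample counts are hypothetical, and the caption does not say whether a population or sample standard deviation was used, so the population form is assumed here.

```python
from statistics import mean, pstdev


def normalize_rows(counts):
    """Express each cell as (value - row mean) / row standard deviation."""
    normalized = []
    for row in counts:
        mu = mean(row)
        sigma = pstdev(row)  # assumption: population SD; the source does not specify
        if sigma == 0:
            # A constant row carries no contrast, so map it to zeros.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(value - mu) / sigma for value in row])
    return normalized


# Hypothetical view counts: one row per clinician, one column per laboratory test.
raw = [[2, 4, 6], [10, 10, 10]]
z = normalize_rows(raw)
```

After normalization, a clinician who views everything often and one who views everything rarely can still show the same relative pattern, which is what the heatmap colors are meant to expose.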
Fig. 3
Methods Overview: 1) Data from the database record and logfile record are joined to allow identification of which element (in this case, specific laboratory results) the user viewed for a specific time and patient. 2) Normalized counts of all the test results viewed are visualized as a heatmap, and clustered hierarchically in two dimensions (e.g. laboratory test v. user). 3) Cluster analysis is performed. 4) Resulting clusters are compared with actual patient records from the Eclipsys CIS (a commercial system in use at NYPH). 5) The cluster can be used to specify elements of a widget interface (future work).
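Step 1 of this overview (joining logfile entries to database records, then counting views) might look roughly like the sketch below. All field names (`user_id`, `patient_id`, `timestamp`, `test_name`) and the sample rows are hypothetical; the actual logfile and CDW schemas are not given in the source, and the join key used here (patient plus time-stamp) is an assumption.

```python
from collections import Counter

# Hypothetical CDW rows: one laboratory result per patient and time-stamp.
cdw_rows = [
    {"patient_id": "p1", "timestamp": "t1", "test_name": "Sodium"},
    {"patient_id": "p1", "timestamp": "t2", "test_name": "Potassium"},
    {"patient_id": "p2", "timestamp": "t3", "test_name": "Sodium"},
]

# Hypothetical logfile rows: which user opened which result.
log_rows = [
    {"user_id": "u1", "patient_id": "p1", "timestamp": "t1"},
    {"user_id": "u1", "patient_id": "p1", "timestamp": "t2"},
    {"user_id": "u2", "patient_id": "p2", "timestamp": "t3"},
    {"user_id": "u2", "patient_id": "p1", "timestamp": "t1"},
    {"user_id": "u2", "patient_id": "p9", "timestamp": "t9"},  # no matching CDW row
]


def join_views(log_rows, cdw_rows):
    """Return (user, test) pairs for log entries that match a CDW record."""
    index = {(r["patient_id"], r["timestamp"]): r["test_name"] for r in cdw_rows}
    views = []
    for entry in log_rows:
        key = (entry["patient_id"], entry["timestamp"])
        if key in index:
            views.append((entry["user_id"], index[key]))
    return views


def count_matrix(views):
    """Build a user-by-test matrix of view counts (rows: users, columns: tests)."""
    counts = Counter(views)
    users = sorted({user for user, _ in views})
    tests = sorted({test for _, test in views})
    matrix = [[counts[(user, test)] for test in tests] for user in users]
    return users, tests, matrix


views = join_views(log_rows, cdw_rows)
users, tests, matrix = count_matrix(views)
```

The resulting matrix is the kind of input that would then be normalized and passed to hierarchical clustering software in steps 2 and 3.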
Fig. 4
Distribution plot of test viewing frequency (vertical axis) versus laboratory tests (horizontal axis, not all test names shown). The distribution shows a ‘long tail’ pattern. The darker line plot is the cumulative percentage (Pareto plot), showing that >50% of total test views are of tests in the tail of the frequency distribution.
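The cumulative-percentage (Pareto) line in this plot can be computed as in the following sketch. The function name `pareto` and the view counts are invented for illustration and are not data from the study.

```python
def pareto(view_counts):
    """Rank tests by view count and attach the cumulative percentage of all views."""
    ranked = sorted(view_counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(view_counts.values())
    running = 0
    rows = []
    for name, count in ranked:
        running += count
        rows.append((name, count, 100.0 * running / total))
    return rows


# Hypothetical view counts per laboratory test.
example_counts = {"Test A": 50, "Test B": 30, "Test C": 15, "Test D": 5}
pareto_rows = pareto(example_counts)
```

Reading the third column from the top shows how many of the most-viewed tests are needed to reach a given share of all views; a long tail means that share grows slowly.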
Fig. 5
Heatmaps with laboratory test titles (vertical axis) and clinicians (horizontal axis, names obscured). On the right is an overview screenshot of the heatmap showing clusters of tests and user patterns. Horizontal groupings reflect tests commonly viewed together. Vertical bands reflect individual clinician viewing patterns. On the left is an exploded view of a section of heatmap showing the test names on the vertical axis. Horizontal red bands indicate the tests that were viewed by many users and at high frequencies. Conventional heatmap coloring (i.e., red for high numbers, green for the lowest, black for intermediate) applies.
Fig. 6
Clustered test viewing, by specialty. Left column: hospitalists, right column: nephrologists. Not all test names are shown.
Fig. 7
Discordant elements (i.e., those not viewed by the other member of the matched pair) in laboratory test viewing among matched pairs of clinicians (n = 474 pairs) viewing the same patient record on the same day.
Fig. 8
Discordant elements (i.e., those not viewed by the other member of the matched pair) in laboratory test viewing among matched pairs of clinicians (n = 474 pairs) viewing the same patient record on the same day, as a percentage of total elements viewed by both clinicians.
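One way to compute the discordance for a single matched pair is sketched below. It assumes that "total elements viewed by both clinicians" means the union of the two viewed sets; the source does not define the denominator further, so that reading is an assumption. The function name and test names are hypothetical.

```python
def discordance(viewed_a, viewed_b):
    """Return (number of discordant elements, percentage of the combined set)."""
    a, b = set(viewed_a), set(viewed_b)
    discordant = a ^ b  # elements viewed by exactly one member of the pair
    combined = a | b    # assumption: denominator is the union of both sets
    percent = 100.0 * len(discordant) / len(combined) if combined else 0.0
    return len(discordant), percent


# Hypothetical pair of clinicians viewing the same patient record on the same day.
clinician_a = {"Sodium", "Potassium", "Creatinine"}
clinician_b = {"Sodium", "Potassium", "Hemoglobin", "Glucose"}
n_discordant, pct_discordant = discordance(clinician_a, clinician_b)
```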
Fig. 9
Jaccard Index frequency distribution. We calculated the Jaccard index of similarity between the viewed element sets of the same matched pairs as in Figure 8.
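The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows, with hypothetical test names; returning 1.0 for two empty sets is a convention chosen here, not something stated in the source.

```python
def jaccard(viewed_a, viewed_b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(viewed_a), set(viewed_b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(union)


# Hypothetical viewed-element sets for one matched pair of clinicians.
pair_a = {"Sodium", "Potassium", "Creatinine"}
pair_b = {"Sodium", "Potassium", "Hemoglobin", "Glucose"}
similarity = jaccard(pair_a, pair_b)
```

A value of 1 means both clinicians viewed exactly the same elements, and 0 means they shared none.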
Fig. 10
Laboratory test views by ICD-9-CM codes. The differences in test viewing for atrial fibrillation and rectal and anal ulcer are easily visible.

Source: PubMed
