Data Science & Engineering

Navigating Change Through Informed Insights


The Data Deluge: Innovating New Approaches to Modernization

The digital realm is bursting at the seams with data. The volume of today’s data deluge is so colossal that it almost defies description. That’s why the information technology community invented the term “Big Data,” as a catchy, concise way of expressing the unprecedented magnitude and rapid growth of continuously generated data. The Age of Big Data is affecting all enterprises, from healthcare and financial services to energy and manufacturing.

The term Big Data, however, is perhaps too shallow, shortsighted, and even misleading. To be sure, today's data is high in volume, velocity, and complexity. But data is just the raw material; it's the beginning, not the end, the input rather than the output. Raw data needs to be transformed into insight, understanding, and wisdom, and it delivers high value and high impact only when merged with the right business intelligence, advanced analytics, and other new technologies to produce meaningful outcomes. Even if Big Data is one of those stylish terms riding the wave of technology's "hype cycle," the data deluge itself is very real and presents an array of challenges for enterprises.

Learn about CVP’s Data Science & Engineering Practice Competencies

Click here

To help public and private organizations address and manage those hurdles, CVP has created its Data Science & Engineering Practice (DS&E) containing four competencies: descriptive analytics, predictive analytics, data engineering and data systems. These areas of the data life cycle encompass back-end data systems, data engineering (or “the glue” that makes the data available), and the front-end results via descriptive and predictive analytics.

A closer look at the four competencies

  • Descriptive analytics focuses on an organization's mission and the nature of its business so that mission-related performance can be summarized in dashboards and reports. Based on analyses performed by its data technologists, CVP creates and delivers presentations to stakeholders and builds reports, dashboards, and visualizations that promote self-service.
  • Predictive analytics is more problem-focused: working with the customer to identify, scope, and define challenges without clear solutions, then using techniques like machine learning to infer what will happen in the future. This competency integrates machine learning models into the process and applies artificial intelligence (AI) techniques such as supervised and unsupervised learning and natural language processing (NLP). (A minimal sketch contrasting descriptive and predictive analytics follows this list.)
  • Data engineering follows the flow of data from collection to display so that the right data is sourced, prepared, and made available. It performs data modeling to meet analytical requirements, creates new metrics, dimensions, and features in support of analytics programs, and develops and administers ETL and data integration processes.
  • Data systems focuses on technology: obtaining and managing the tools and systems that support all of the above. It also leverages automation to improve reliability and scalability, architects and administers Big Data systems, and ensures data security at rest and in transit.
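To make the descriptive/predictive distinction concrete, here is a minimal, illustrative Python sketch: a summary aggregation of the kind that feeds a dashboard, followed by a simple supervised-learning model. The file name and column names (claims.csv, paid_amount, flagged_as_improper, and so on) are hypothetical and are not drawn from any CVP engagement.

```python
# Minimal sketch: descriptive vs. predictive analytics on a hypothetical
# claims dataset (file and column names are illustrative only).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

claims = pd.read_csv("claims.csv")  # hypothetical input file

# Descriptive: summarize mission-related performance for a dashboard or report.
monthly_summary = (
    claims.groupby("month")
    .agg(total_paid=("paid_amount", "sum"),
         claim_count=("claim_id", "count"))
)
print(monthly_summary.head())

# Predictive: supervised learning to infer which future claims are high risk.
features = claims[["paid_amount", "provider_visits", "patient_age"]]
labels = claims["flagged_as_improper"]  # 0/1 outcome from past reviews
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```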

These four competencies contend with an enormous amount of data generated from disparate sources, everything from government and public data to social media, commercial transactions, and sensor databases. And it's not just the obvious sources like Amazon or Facebook, but data from the Internet of Things and other burgeoning, even disruptive, sources like telemetry from self-driving vehicles. Government and industry's ability to aggregate, securely process, and harmonize mountains of both structured and unstructured data is rapidly evolving with the adoption of the cloud, advanced data analytics tools, and, increasingly and perhaps most importantly, the advent of AI and its fields of machine learning (ML) and NLP.

As the data deluge grows, the demand for new solutions also intensifies. There is a vastly increased need for sophistication and speed in delivering intelligent, actionable insights from data. To keep up, the technology world is accelerating into automation and real-time decision analytics. Much of the data is unstructured and difficult to analyze without technologies like NLP to make it actionable. Organizations are also quickly exploring and adopting technologies like storage as a service and serverless compute to handle the scale.

The emergence of these new cloud architecture options and infrastructures for immense volumes of data poses stumbling blocks for conventional thinking about data and co-located storage and compute. The latest push has been toward separating compute from storage and accessing the data through open formats. With open formats, users can access information via a variety of options, letting them perform multiple analytics operations or use different, competing tools against the same data set at the same time. These open formats and the decoupling of storage and compute are also critical in helping organizations keep up with continuous changes in the technology sphere and the surging pace of the data deluge by allowing experimentation in hours instead of months. Finally, decoupling offers independent scalability and access to underlying storage without adding costly compute resources that may not be needed.
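As a rough illustration of what decoupled storage and open formats buy in practice, the sketch below reads the same Parquet file with two unrelated engines, pandas and DuckDB, standing in for the "different, competing tools" mentioned above (heavier-weight engines such as Spark or Presto would play the same role at cluster scale). The file name events.parquet is hypothetical.

```python
# Minimal sketch: two independent engines reading the same open-format
# (Parquet) dataset, with no coupling to a single cluster's compute.
# "events.parquet" is a hypothetical file in shared storage.
import pandas as pd
import duckdb

# Engine 1: pandas, e.g. for an ad-hoc notebook analysis.
df = pd.read_parquet("events.parquet")
print(df["event_type"].value_counts().head())

# Engine 2: DuckDB SQL over the very same file, at the same time,
# without copying or reloading the data into a second system.
result = duckdb.sql("""
    SELECT event_type, count(*) AS n
    FROM 'events.parquet'
    GROUP BY event_type
    ORDER BY n DESC
    LIMIT 5
""").df()
print(result)
```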

As new solutions and technologies come roaring down the pike in the next few years and into the 2020s, we will also see the maturation of serverless compute, yielding data systems that are an order of magnitude cheaper and effectively unlimited in scalability. Patching, provisioning, and backup will be handled by the cloud provider, letting organizational leaders focus on mission objectives rather than technology problems.

Data Challenges for Healthcare Organizations

The healthcare information technology sector, a key arena of expertise for CVP, represents a microcosm of the formidable challenges wrought by the data deluge, but it is also fertile ground for leading-edge data analytics solutions, such as those offered by CVP's Data Science & Engineering Practice.

A current CVP project illustrates how healthcare organizations are turning to analytics to transform intractably large amounts of structured and unstructured data into actionable intelligence in pursuit of mission objectives. CVP is working with the federal Department of Health and Human Services' Office of Inspector General (OIG) to provide technology that supports OIG's review and analysis of large data sets for the detection of fraud, waste, and abuse in HHS programs. CVP is helping to maintain and enhance the OIG Consolidated Data Analysis Center's existing analytical tools and develop new tools as the center mines terabytes of healthcare quality and related data spanning billions of healthcare records collected by the Centers for Medicare and Medicaid Services. CVP also is developing a new generation of cloud-based technologies to increase productivity and effectiveness, as well as furnishing data management services: acquiring, formatting, and uploading data.

A focal point for CVP in healthcare IT is staying ahead of the swiftly moving advances in the field. Not so long ago, the IT function of healthcare organizations was confined to a back office where staff oversaw basic operations, such as managing infrastructure, email systems, and networks to ensure that everything ran smoothly for mission-focused users. Today, healthcare IT teams face some of the most daunting challenges of any sector: the staggering growth of unstructured data, the need for interoperability across systems, and increasingly sophisticated cyber threats. With digitization and automation, most healthcare data is collected in IT systems, including sensitive healthcare and patient data, clinical information, lab results, and diagnostic imaging. Still, some critical healthcare information, like that produced by encounters between doctors and patients, is processed manually.

A promising cutting-edge solution is NLP, a technique that falls under DS&E’s predictive analytics competency and a branch of AI that centers on the interpretation of human-generated spoken or written data. The potential benefits are enormous, from extracting relevant information from unstructured physician notes to supporting crucial decision-making by healthcare providers at the point of need.
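As a minimal illustration of the mechanics, the sketch below runs spaCy's general-purpose English model over an invented clinical-style note and prints the entities it finds. A production system for physician notes would rely on domain-trained clinical models; this is only meant to show the shape of the approach.

```python
# Minimal sketch: extracting structured entities from an unstructured
# clinical-style note with spaCy's general-purpose English model.
# The note text is invented, and a real clinical pipeline would use a
# domain-specific model rather than this generic one.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

note = (
    "Patient seen on March 3, 2019 complaining of fatigue. "
    "Prescribed 20 mg lisinopril daily; follow-up in two weeks."
)

doc = nlp(note)
for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # dates, quantities, etc. picked out automatically
```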

CVP and the Continuous Everything Culture

Until relatively recently, advances in computing and applications, such as data analytics, came in fits and starts, sometimes in paradigmatic waves spanning a decade or more. Remember when systems could be purchased and expected to have a seven- to ten-year life span?

Today, the onslaught of new technologies is unrelentingly continuous. For example, Google's deep-learning library TensorFlow was introduced only four years ago and is now being used across industries to automate tasks, such as cancer detection, that have vexed researchers for years. As another example, think of the Hadoop technology stack. Many organizations still rely on traditional data warehouses and regard Hadoop as something futuristic, yet CVP recently helped a client move off a Hadoop platform we consider legacy technology and begin leveraging second-generation Big Data technologies like Spark and Presto. In the past, an organization might have been able to deal in multiyear roadmaps to modernization.

Today, that level of predictability can no longer be assumed. The rate of change in technology is moving so quickly that systems must be extremely reliable, agile, flexible, and open.

What CVP calls a Continuous Change environment requires new modernization frameworks like the company's Continuous Everything approach. The traditional way of supporting and upgrading a large, monolithic system is becoming archaic. CVP's "slay the monolith" strategy, breaking systems into flexible, loosely coupled, and interoperable components, is critical for organizations to stay competitive and responsive to change. In the data arena, separating compute from storage and keeping data in open formats represent the "slay the monolith" line of attack.

Moreover, Continuous Everything embraces a series of continuous processes, such as Continuous Integration, Continuous Delivery, Continuous Monitoring, Continuous Testing, and Continuous Pulsing, to identify how improvements can be injected back into the modernization process. These processes were traditionally applied only to conventional business applications, but data-system upgrades can now be accomplished continuously and automatically as well. CVP's Continuous Everything principles are applied to data systems so that they are automatically built, deployed, monitored, and tested to ensure that high-quality data is available 24/7.
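As one hedged example of what "continuously tested" data might look like, the sketch below is a small pytest-style quality check that a Continuous Testing pipeline could run against every new data load. The file and column names are illustrative and are not part of CVP's actual framework.

```python
# Minimal sketch: an automated data-quality check of the kind a
# Continuous Testing pipeline could run on each load. The file name and
# columns ("claim_id", "paid_amount", "service_date") are illustrative.
import pandas as pd

def test_claims_extract_quality():
    df = pd.read_parquet("claims_extract.parquet")  # hypothetical nightly extract

    # Completeness: key fields must never be null.
    assert df["claim_id"].notna().all()

    # Uniqueness: one row per claim.
    assert not df["claim_id"].duplicated().any()

    # Validity: paid amounts are non-negative and dates parse cleanly.
    assert (df["paid_amount"] >= 0).all()
    pd.to_datetime(df["service_date"], errors="raise")
```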

The Innovation Imperative

Innovation is the continuously beating heart of CVP’s Data Science & Engineering Practice, driving forward momentum in technology development, propelling growth, and, ultimately, meeting customer needs and imperatives.

A case in point is a CVP project for a large government healthcare agency. CVP is tackling a proof of concept to move a huge data cluster to the cloud by re-architecting it. Instead of migrating the whole cluster, CVP's technologists are separating compute from storage to improve agility and system velocity, and storing data in open formats. Adopting this methodology allows for the creation of multiple data clusters, independent scaling up and down, and the use of multiple concurrent front-end query engines, while also increasing reliability and automating time-consuming manual steps.

CVP's concept of Continuous Everything lays the foundation for future technologies, such as an autonomous computing environment, including self-healing systems with scalable adaptation and instantly available servers in the cloud. Continuous Change is impacting the public and private sectors at an accelerated pace. CVP's approach is to create a roadmap to resolve client problems, retire legacy technologies, and set a course toward adaptable, scalable solutions ready for the future.

Showcasing artificial intelligence, and particularly separating the hype from the real potential, is also part of CVP's nascent strategy for meeting customer needs. CVP knows that customers don't want hype; they want to see real-life examples of what's possible in AI and how it can help them accomplish their missions.

The same strategy applies to machine learning. Data Science & Engineering technologists are always ready to take a concept and quickly turn it into reality, like an impromptu research and development shop. For example, technologists recently developed and trained a deep-learning neural network to detect skin cancer within an hour. On top of this model, they set up a serverless back end and an Application Programming Interface (API) in Google's Cloud Platform to support a mobile app.

Potential users of the app can snap a photo of a suspicious skin lesion, which is sent to the trained neural network for a prediction of whether the image shows malignant melanoma. Users can then share their concerns with a doctor. The system proved to be about 80 percent accurate after training for only an hour on 4,000 images, and we believe accuracy can be substantially improved once fully operational.
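The sketch below shows one way such a classifier can be assembled quickly with transfer learning in TensorFlow/Keras. It is illustrative only, not CVP's actual model, and the lesion_images/ directory with benign/ and malignant/ subfolders is hypothetical.

```python
# Minimal sketch: a binary skin-lesion classifier built with transfer
# learning in TensorFlow/Keras. Illustrative only; "lesion_images/" is a
# hypothetical folder with "benign/" and "malignant/" subdirectories.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "lesion_images/", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # reuse pretrained features; train only the new head

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # scale pixels to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of malignancy
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```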

The melanoma detector underscores CVP’s rapid iteration approach and its application of a “startup” mentality to research and development. Both are vital pieces of CVP’s innovation strategy as the company looks to the future and pursues its overall vision of offering advanced technical capabilities in Data Science & Engineering and all its Practice domains.

EXPLORE MORE

Interested in getting insights and updates?

Subscribe to our Data Science & Engineering Channel

Learn more about how CVP approaches Modernization. Click here


Client Success Stories

Transforming Raw Data for Analytics

The Centers for Medicare & Medicaid Services (CMS) was struggling with vast, varied, and unstructured data streaming from 38 state agencies due to reporting requirements for pilot healthcare programs. Each state reported data inconsistently, varying in type and format. CVP developed a comprehensive performance-metrics database that captured, streamlined, and transformed the data into decision-ready analytics. Our Data Acceleration solution incorporated both mixed-method advanced analytics and advanced search functionality to quickly target the right data. CVP's efforts resulted in faster, cleaner data aggregation and analytics that eased the client's process bottlenecks and saved operational costs across the board.

Boosting Customer Understanding with Predictive Analytics

A mobile wireless company was looking for ways to better serve customers while increasing profitability and reducing customer attrition. CVP analyzed over 50 characteristics of almost 5 million past and present wireless subscribers. We developed a predictive analytics model that calculated survival curves and provided critical insights on expected customer lifetimes, 30-day churn risk, risk-adjusted lifetime value, and expected impacts on customer satisfaction. The results of CVP’s advanced analytics model were put into use on the front lines of customer service to let the client target the most important factors in increasing customer retention, boosting profitability, and improving customer satisfaction.
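One way this kind of survival-based churn modeling can be sketched is with the open-source lifelines library, as below. The column names are hypothetical and the snippet is illustrative, not CVP's actual model.

```python
# Minimal sketch: survival-analysis-style churn modeling with lifelines.
# Column names are hypothetical; this only illustrates the approach.
import pandas as pd
from lifelines import CoxPHFitter

subscribers = pd.read_csv("subscribers.csv")
# Expected illustrative columns:
#   tenure_months  - how long the subscriber has been (or was) active
#   churned        - 1 if the subscriber left, 0 if still active (censored)
#   monthly_spend, dropped_calls, support_tickets - example characteristics

cph = CoxPHFitter()
cph.fit(
    subscribers[["tenure_months", "churned", "monthly_spend",
                 "dropped_calls", "support_tickets"]],
    duration_col="tenure_months",
    event_col="churned",
)
cph.print_summary()  # which characteristics drive churn risk

# Survival curves give expected lifetimes and near-term (e.g. 30-day) churn risk.
curves = cph.predict_survival_function(subscribers.head(5))
print(curves.head())
```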

Harnessing Machine Learning to Discover Secrets Hidden in Data

CVP developed and integrated a sophisticated data search, analysis, collaboration, and correlation platform for a national security agency, all powered by machine learning. CVP tapped into HP Autonomy IDOL (Intelligent Data Operating Layer) search engine’s artificial intelligence to detect and analyze patterns and trends in 150 languages and 1000 data formats encompassing text, video, image, and audio, as well as unstructured data.

The system’s knowledge discovery feature frees users from having to know what questions to ask beforehand. IDOL builds on machine learning and deep neural network algorithms to recognize patterns, trends, and relationships hidden within the data. IDOL’s approach combines sophisticated probabilistic modeling with natural language processing (NLP) algorithms to extract concepts and insights from written or spoken language in a fully automated and highly accurate manner. Unlike NLP technology, which focuses solely on linguistics, IDOL follows a language-independent, statistical approach to understanding human information that is fine-tuned by the use of linguistics.

The application was deployed in a Top Secret intelligence community computing environment, which required rigorous authentication and authorization. CVP created a core software architecture founded on role-based security and on data segregation and isolation. Incoming data was segregated into separate visibility groups based on need-to-know, and duties and roles were clearly separated among administrative, supervisory, and standard analyst users.

Meet Featured CVPros

Cal Zemelman

Director, Data Science & Engineering Practice

Cal directs CVP’s Data Science & Engineering Practice, focusing on Data Systems, Data Engineering, Descriptive Analytics, and Predictive Analytics. He’s pioneered new capabilities, tools, and technologies using next-generation techniques.
MEET CAL

Pete Grivas

Pete has over 20 years of experience leading and delivering data and application solutions to improve healthcare quality and outcomes and to achieve enterprise goals.
MEET PETE

Marquis Payne

Marquis has over 20 years of experience managing high-value process improvement and analytics programs. He leads CVP’s largest program involving big data, data analytics, and statistical modeling.
MEET MARQUIS

Matthew Schmitt

Matthew is a passionate mentor to high-performing teams in all areas of the software development life cycle, from requirements to operational support, with acute attention to the Voice of the Customer.
MEET MATTHEW

Tim Regulski

Tim brings more than 19 years of IT experience, with enterprise expertise in virtual and cloud architectures, helping pioneer new applications in healthcare and in software, web, and mobile design.
MEET TIM
