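Abstract: The big data challenge is a unique opportunity for both data mining and database research and engineering. A vast ocean of data is collected from trillions of connected devices in real time on a daily basis, and useful knowledge is usually buried in data of multiple genres, from different sources, in different formats, and with different types of representation. Many interesting patterns cannot be extracted from a single data collection, but have to be discovered through the integrative analysis of all heterogeneous data sources available. Although many algorithms have been developed to analyze multiple information sources, real applications continuously pose new challenges: data can be gigantic, noisy, unreliable, dynamically evolving, highly imbalanced, and heterogeneous. Meanwhile, users provide limited feedback, have growing privacy concerns, and ask for actionable knowledge. In this thesis, we propose to explore the power of multiple heterogeneous information sources in such challenging learning scenarios. There are two perspectives on exploring the correlations among multiple information sources: explore their similarities (consensus combination), or their differences (inconsistency detection). In consensus combination, we focus on the task of classification with multiple information sources. Multiple information sources for the same set of objects can provide complementary predictive powers, and by combining their expertise, prediction accuracy is significantly improved. However, the major challenge is that it is hard to obtain sufficient and reliable labeled data for effective training, because labeling requires the efforts of experienced human annotators. In some data sources, we may only have a large amount of unlabeled data. Although such unlabeled information does not directly generate label predictions, it provides useful constraints on the classification task. Therefore, we first propose a graph based consensus maximization framework to combine multiple supervised and unsupervised models obtained from all the available information sources. We further demonstrate the benefits of model combination in two specific learning scenarios. In transfer learning, we propose an effective model combination framework to transfer knowledge from multiple sources to a target domain with no labeled data. We also demonstrate the robustness of model combination on dynamically evolving data. On the other hand, when unexpected disagreement is encountered across diverse information sources, this might raise a red flag and require in-depth investigation. Another line of my thesis research is to explore differences among multiple information sources to find anomalies.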
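We first propose a spectral method to detect objects performing inconsistently across multiple heterogeneous information sources as a new type of anomaly. Traditional anomaly detection methods discover anomalies based on the degree of deviation from normal objects in one data source, whereas the proposed approach detects anomalies according to the degree of inconsistency across multiple sources. The principle of inconsistency detection can benefit many applications; in particular, we show how this principle can help identify anomalies in information networks and distributed systems. We propose probabilistic models to detect anomalies in a social community by comparing link and node information, and to detect system problems from connected machines in distributed systems by modeling correlations among multiple machines. In this thesis, we go beyond the scope of traditional ensemble learning to address challenges faced by many applications with multiple data sources. With the proposed consensus combination framework, labeled data are no longer a requirement for successful multi-source classification; instead, the use of existing labeling experts is maximized by integrating knowledge from relevant domains and unlabeled information sources. The proposed inconsistency detection concept opens up a new direction in anomaly detection. The detected anomalies, which cannot be found by traditional anomaly detection techniques, provide new insights into the application area. The algorithms we developed have proved useful in many areas, including social network analysis, cyber-security, and business intelligence, and have the potential of being applied to many other areas, such as healthcare, bioinformatics, and energy efficiency. As both the amount of data and the number of sources in our world are exploding, there are still great opportunities as well as numerous research challenges for inference of actionable knowledge from multiple heterogeneous sources of massive data collections.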