|
OPENING |
8:30 |
|
[8:30-8:35]
Teruaki Hayashi
Welcome and Opening Remarks
|
8:35 |
|
[8:35-8:55]
Yukihisa Fujita, Teruaki Hayashi, and Masahiro Kuwahara
Topic-Based Search: Dataset Search without Metadata and Users' Knowledge about Data
Abstract
With the advancement of information technologies, we can obtain various kinds of data, which can be leveraged for various purposes. The availability of a large amount of data is a desirable situation. However, it makes dataset retrieval a time-consuming and complex task. Conventional dataset search methods require unified metadata and knowledge about keywords representing the datasets. In other words, they require user knowledge regarding the datasets, such as the terms used in the dataset and fields in the metadata. To address this issue, we propose a topic-based search method without metadata, especially for users lacking knowledge about the datasets. The topic-based search can find datasets by using not the exact keywords but abstract keywords described as topics. In this paper, we focus on table data, which contain column names and data values and are widely used for storing data. As a preliminary analysis, we collected and analyzed public datasets available in Japanese data portals to clarify the features of datasets that should be searched through dataset search. The analysis results revealed the use of many general and common keywords as column names, but it is difficult to implement a dataset search using only column names. Therefore, based on the analysis results, we decided to use embeddings converted from the datasets to utilize both column names and data values to extract topics from datasets. The experimental results showed that we can extract topics from datasets by using the topic modeling method and obtain better search results when compared with the search method using exact keywords.
|
8:55 |
|
[8:55-9:15]
Kosuke Manabe, Yukihisa Fujita, Masahiro Kuwahara, and Teruaki Hayashi
Variable-based Learning Considering Topic Specificity in Heterogeneous Data Clustering Tasks
Abstract
TBD
|
9:15 |
|
[9:15-9:35]
Xiaolong Liu, Liangwei Yang, Chen Wang, Mingdai Yang, Zhiwei Liu, and Philip S. Yu
Multi-view Graph Convolution for Participant Recommendation
Abstract
Social networks have become essential for people's lives. The proliferation of web services further expands social networks at an unprecedented scale, leading to immeasurable commercial value for online platforms. Recently, the group buying (GB) business mode is prevalent and also becoming more popular in E-commerce. GB explicitly forms groups of users with similar interests to secure better discounts from the merchants, often operating within social networks. It is a novel way to further unlock the commercial value by explicitly utilizing the online social network in E-commerce. Participant recommendation, a fundamental problem emerging together with GB, aims to find the participants for a launched group buying process with an initiator and a target item to increase the GB success rate. This paper proposes Multi-View Graph Convolution for Participant Recommendation (MVPRec) to tackle this problem. To differentiate the roles of users (Initiator/Participant) within the GB process, we explicitly reconstruct historical GB data into initiator-view and participant-view graphs. Together with the social graph, we obtain a multi-view user representation with graph encoders. Then MVPRec fuses the GB and social representation with an attention module to obtain the user representation and learns a matching score with the initiator's social friends via a multi-head attention mechanism. Social friends with the Top-$k$ matching score are recommended for the corresponding GB process. Experiments on three datasets justify the effectiveness of MVPRec in the emerging participant recommendation problem.
|
9:35 |
|
[9:35-9:55]
Teruaki Hayashi, Yukihisa Fujita, and Masahiro Kuwahara
Exploring the Fundamental Units of Semantic Representation of Data Using Heterogeneous Variable Network in Data Ecosystems
Abstract
The value creation achieved through the exchange, distribution, and collaboration of data among different organizations has garnered significant attention as a new source of innovation. The mathematical treatment of the meaning of data helps measure its "quality" to formulate evaluation criteria for data exchange between stakeholders with distinct background knowledge in data ecosystems. This study examines the structure of data morphemes, the fundamental units of semantic representation of data, by conducting network and association analyses of variables present in metadata from diverse fields. Network analysis identifies the globally sparse and locally dense characteristics of variable co-occurrence networks and highlights essential relationships and core variables. Key findings include the discovery of "depth," "sediment/rock," and "sample code/label" as both universal variables and crucial nodes between datasets used in the experiment. Association analysis reveals vital variable pairs, such as "age" and "ring width" or "latitude" and "longitude." This research may provide an understanding of the structure and meaningful representation of data, facilitating smooth data exchange and utilization practices among stakeholders with different domains, purposes of data use, and background knowledge in data ecosystems.
|
9:55 |
|
[9:55-10:15]
Kazuhito Ogawa and Naoki Watanabe
Resale-proof Trades of Data under Budget Constraints: A Subject Experiment
Abstract
In a simple model of the market for transactions of data proposed by Nanba et al. (2022, IEEE BigData 2022), this note examines the effects of budget constraints on the prices of data and the gains from trades of data produced with their constituent variables in a subject experiment. In the model, the prices of variables are exogenously set at the initial round, and then updated for the next round immediately according to the demand for the data traded in the current round. Data are traded freely among agents and the prices are determined through transactions among agents. Resale of data is permitted among agents. The initial prices of variables are computed assuming the results of sequential trades of data. Each subject was faced with budget constraints, and surplus budgets could not be carried over to subsequent rounds. In the subject experiment, the prices of variables fluctuated, but the prices of data determined by the initial owners under budget constraints remained relatively stable without a drastic increase compared with the prices of data determined under no budget constraints. The efficiency rates in transactions made under budget constraints were not lower than those in transactions made under no budget constraints.
|
10:15 |
Coffee Break |
10:30 |
|
[10:30-11:30]
Invited Talk
Dolores Ordoñez, Director of AnySolution
Data Spaces, the new paradigm of the new data economy
Abstract
TBD
Biography
Dolores Ordoñez holds a degree in Law from Deusto University, Spain, is specialized in European Community Law, and holds an Executive Master in Innovation. She specializes in innovative strategies, mainly in line with SmartCities, tourism, smart destinations, and sustainability. She is a member of the Smart Destinations WG at the University of the Balearic Islands. She is president of Planetic (the Spanish technological platform for ICT) and vice-president of the international tourism cluster TURISTEC. She is also vice-president of the Spanish national hub of GAIA-X and co-leader of the tourism data space working group. She has been selected as one of the 10 Smart Tourism Destinations programme of the European Union. She is the coordinator of the track on smart and sustainable transition in tourism for the Intelligent Cities Challenge of the European Commission and a tourism expert for Eurochambres within EU4BCC. She is co-chair of the Digital Group of the T4T (Together for EU Tourism). With more than 20 years of experience as head of European projects in different public administrations in the Balearic Islands and in the private sector, she currently manages challenging EU projects in different fields, such as the coordination of the European Tourism Data Space project, DATES. She speaks at many conferences worldwide and teaches on topics related to the European Union, SmartCities, SmartTourism, and sustainability. As technical director of AnySolution, she is in charge of strategic innovative plans for public and private entities, the implementation of EU projects, and the development of the data-driven platform NADIA, recognized with the 2023 European award by the AIOTI.
|
11:30 |
|
[11:30-11:50]
Hiroki Sakaji and Noriyasu Kaneda
Indexing and Visualization of Climate Change Narratives Using BERT and Causal Extraction
Abstract
In this study, we propose a methodology to extract, index, and visualize "climate change narratives" (stories about the connection between causal and consequential events related to climate change). We use two natural language processing methods, BERT (Bidirectional Encoder Representations from Transformers) and causal extraction, to textually analyze newspaper articles on climate change to extract "climate change narratives." The novelty of the methodology is that it can extract and quantify the causal relationships assumed by the newspaper's writers. Looking at the extracted climate change narratives over time, we find that since 2018, an increasing number of narratives suggest the impact of the development of climate change policy discussion and the implementation of climate change-related policies on corporate behaviors, macroeconomics, and price dynamics. We also observed the recent emergence of narratives focusing on the linkages between climate change-related policies and monetary policy. Furthermore, there is a growing awareness of the negative impacts of natural disasters (e.g., abnormal weather and severe floods) related to climate change on economic activities, and this issue might be perceived as a new challenge for companies and governments. The methodology of this study is expected to be applied to a wide range of fields, as it can analyze causal relationships among various economic topics, including analysis of inflation expectation or monetary policy communication strategy.
|
11:50 |
|
[11:50-12:10]
Masahiro Suzuki, Masanori Hirano, and Hiroki Sakaji
From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models
Abstract
Instruction tuning is essential for large language models (LLMs) to become interactive. While many instruction tuning datasets exist in English, there is a noticeable lack in other languages. Also, their effectiveness has not been well verified in non-English languages. We construct a Japanese instruction dataset by expanding and filtering existing datasets and apply the dataset to a Japanese pre-trained base model. We performed Low-Rank Adaptation (LoRA) tuning on both Japanese and English existing models using our instruction dataset. We evaluated these models from both quantitative and qualitative perspectives. As a result, the effectiveness of Japanese instruction datasets is confirmed. The results also indicate that even with relatively small LLMs, performances in downstream tasks would be improved through instruction tuning. Our instruction dataset, tuned models, and implementation are publicly available online.
|
12:10 |
|
[12:10-12:30]
Takehiro Takayanagi and Kiyoshi Izumi
Harnessing Behavioral Traits to Enhance Financial Stock Recommender Systems: Tackling the User Cold Start Problem
Abstract
Recommender systems often struggle with the user cold start problem, which arises when there is a lack of interaction data for new users. This issue is particularly important in financial stock recommendations, as novice investors often lack investment experience and require personalized advice more than experienced users. Behavioral finance offers valuable insights into investor preferences and highlights the impact of psychological factors on investor behavior. In this paper, we present a novel framework that integrates behavioral finance with financial stock recommendations to effectively tackle the user cold start problem.
To that end, we first conduct a survey involving 964 Japanese individual investors to gather investment-related behavioral traits while collecting their transaction data in a trading platform. Then, we introduce the Investor Risk-tolerance Aware DropoutNet (IRAD) and show its improved performance over baseline models, demonstrating its effectiveness for stock recommendations in cold start settings. Finally, we offer an example of how incorporating investors' behavioral traits can result in more interpretable stock recommendations.
|
12:30 |
|
[12:30-12:35]
Hiroki Sakaji
General Comment and Closing Remarks |
12:35 |
CLOSING |